Message boards : Number crunching : Computation Error
Author | Message |
---|---|
Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,695,450 RAC: 5 |
Anyone with a "suspended" DEFAULT_xxxxx_205 please check the webpage for your results, and look at that one - if the "errors" line at the top says "Cancelled", you can unsuspend it and abort it. That will let it get back to the server and be finished. Thanks! |
Etienne Guyot Joined: 27 Oct 05 Posts: 10 Credit: 952,910 RAC: 0 |
Hello, I've got many computation errors with Rosetta 4.81 on all my computers, most of them ending with error 0xC0000005. Following is a sample of the BOINC log (stdoutdae.txt): 2005-12-29 22:17:57 [LHC@home] No work from project 2005-12-29 22:22:38 [---] request_reschedule_cpus: process exited 2005-12-29 22:22:38 [SETI@home] Computation for result 19fe05aa.27874.4082.498562.1.33_1 finished 2005-12-29 22:22:39 [rosetta@home] Starting result 1n0u__topology_sample_207_8081_9 using rosetta version 481 2005-12-29 22:22:41 [SETI@home] Started upload of 19fe05aa.27874.4082.498562.1.33_1_0 2005-12-29 22:23:06 [rosetta@home] Unrecoverable error for result 1n0u__topology_sample_207_8081_9 ( - exit code -1073741819 (0xc0000005)) 2005-12-29 22:23:06 [---] request_reschedule_cpus: process exited 2005-12-29 22:23:06 [rosetta@home] Computation for result 1n0u__topology_sample_207_8081_9 finished 2005-12-29 22:23:06 [SETI@home] Starting result 13dc03aa.20895.6785.561076.1.70_3 using setiathome version 418 I've noticed that this kind of error always happens at the exact moment BOINC Manager performs a project switch. Issuing a Suspend command while Rosetta is crunching, or a Quit, produces the same behavior. I also noticed that I always get a successfully completed Rosetta WU if the task has never been interrupted. Maybe a clue to fix this problem? (I'm running BOINC 5.2.15 on one computer and 5.3.6 on the other - Win32 XP, no graphics used, no screensaver, not linked to the DEFAULT_xxxx_205 WU). Regards, Gex - France |
Scribe Joined: 2 Nov 05 Posts: 284 Credit: 157,359 RAC: 0 |
Set "leave applications in memory when preempted" and see what happens... Rosetta needs this, it seems. |
Etienne Guyot Joined: 27 Oct 05 Posts: 10 Credit: 952,910 RAC: 0 |
Set "leave applications in memory when preempted" and see what happens... Rosetta needs this, it seems. Thanks for the trick. I'll try it. But it's not a long-term fix, as it's a general switch active for all projects. I need to avoid swapping too much physical memory to the hard drive, as I run other applications (the machines are not dedicated to BOINC). It slows down my computers a lot. I hope the Rosetta team will fix this quickly; otherwise I'll consider suspending this project, as I'm wasting CPU time for nothing (and so is the Rosetta project)! Regards, Gex - France |
Scribe Joined: 2 Nov 05 Posts: 284 Credit: 157,359 RAC: 0 |
They are working to fix it, but I don't know how long it will take... |
winman Joined: 5 Dec 05 Posts: 2 Credit: 267,850 RAC: 0 |
I had similar problems with my 3200+. It seems Rosetta would cause my machine to lock up and cause other problems with it. It was very frustrating: I leave on Tuesday morning and don't get back till Friday evening, so I was not happy to see that the machine had locked up an hour or two after I left and sat there idle, not crunching, for almost 4 days. My 3700+ seems to have no problems, so it is the only machine that runs Rosetta, and sadly it doesn't run when I am gone. Nice to hear I am not the only one with Rosetta problems, though. My 3200+ happily crunches SETI and Einstein 24/7 now, and my 3700+ runs Rosetta and LHC (when there is work), or SETI when LHC doesn't have anything. Live long and crunch!!! |
Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,695,450 RAC: 5 |
There are still a few of these floating around - this one (https://boinc.bakerlab.org/rosetta/workunit.php?wuid=3819739) sat in someone's queue from Dec 21 to Jan 8. Six people have had it so far out of the 10 these were set to allow. So far the next result is "unsent"... |
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
There are still a few of these floating around - this one (https://boinc.bakerlab.org/rosetta/workunit.php?wuid=3819739) sat in someone's queue from Dec 21 to Jan 8. Six people have had it so far out of the 10 these were set to allow. So far the next result is "unsent"... Could one of the project team make sure it is not sent out again, please? Maybe steal its files from the server (and wait for the questions about download errors) if you can't change its status to 'not needed'. Obviously, I don't mean just this one; maybe run a script to identify all WUs where the number of errors exceeds the new max? Just a fort ;-) |
Aquila audax Joined: 13 Dec 05 Posts: 3 Credit: 55,412 RAC: 0 |
I am also still having problems with 'computation errors'... and these are with new WUs downloaded yesterday. As Etienne Guyot noted, they all occur when a running R@H job is paused as BOINC switches to a different job. [See log snippets below] 10/01/2006 1:06:39 AM|Predictor @ Home|Restarting result h0013B_1_139120_3 using mfoldB125 version 428 10/01/2006 1:06:39 AM|SETI@home|Restarting result 15mr05aa.29724.21872.447166.1.38_2 using setiathome version 418 10/01/2006 1:06:39 AM|rosetta@home|Pausing result NO_RAND_WTS_2tif_230_6530_0 (removed from memory) 10/01/2006 1:06:40 AM|rosetta@home|Unrecoverable error for result NO_RAND_WTS_2tif_230_6530_0 ( - exit code -1073741819 (0xc0000005)) 10/01/2006 1:06:40 AM||request_reschedule_cpus: process exited ... 10/01/2006 5:14:42 AM|Einstein@Home|Restarting result r1_0930.0__761_S4R2a_1 using albert version 437 10/01/2006 5:14:42 AM|SETI@home|Restarting result 15mr05aa.29724.21872.447166.1.38_2 using setiathome version 418 10/01/2006 5:14:42 AM|rosetta@home|Pausing result MORE_FRAGS_W_BARCODE_2tif_231_6530_0 (removed from memory) 10/01/2006 5:14:42 AM|rosetta@home|Pausing result NO_RANDOM_WTS_OR_FRAGS_1dcj_223_9021_0 (removed from memory) 10/01/2006 5:14:43 AM|rosetta@home|Unrecoverable error for result MORE_FRAGS_W_BARCODE_2tif_231_6530_0 ( - exit code -1073741819 (0xc0000005)) 10/01/2006 5:14:43 AM|rosetta@home|Unrecoverable error for result NO_RANDOM_WTS_OR_FRAGS_1dcj_223_9021_0 ( - exit code -1073741819 (0xc0000005)) 10/01/2006 5:14:43 AM||request_reschedule_cpus: process exited 10/01/2006 5:14:43 AM|rosetta@home|Computation for result MORE_FRAGS_W_BARCODE_2tif_231_6530_0 finished 10/01/2006 5:14:43 AM|rosetta@home|Computation for result NO_RANDOM_WTS_OR_FRAGS_1dcj_223_9021_0 finished |
Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,695,450 RAC: 5 |
10/01/2006 1:06:39 AM|rosetta@home|Pausing result NO_RAND_WTS_2tif_230_6530_0 (removed from memory) Yes - Rosetta will error out if it is removed from memory. Until they find and fix this bug, you have to have "leave applications in memory when preempted" set to "yes" on the website preferences. Also - please edit your post to break the lines in the 'pre' blocks. This causes stretching. |
STE\/E Joined: 17 Sep 05 Posts: 125 Credit: 4,103,208 RAC: 167 |
Yes - Rosetta will error out if it is removed from memory. Until they find and fix this bug, you have to have "leave applications in memory when preempted" set to "yes" on the website preferences. ========== Bill, leaving applications in memory may help, but it is not a cure-all. I have my preferences set to that and I still get Computation Errors quite a bit. I had some this morning when I got up because of the benchmarks that ran overnight, and I've had some during the day because of suspending WUs to run another project. |
Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,695,450 RAC: 5 |
Bill, leaving applications in memory may help, but it is not a cure-all. I have my preferences set to that and I still get Computation Errors quite a bit. I had some this morning when I got up because of the benchmarks that ran overnight, and I've had some during the day because of suspending WUs to run another project. Benchmarks shouldn't be a problem with BOINC V5.2.8 or later... suspending shouldn't either with 4.45 or later. Only quitting BOINC completely will cause the "memory bug" computation error, if "leave in memory" is yes. It's possible that what you're seeing is a different problem? If it's one host only, it could be overclocked too much, or overheating, or have RAM problems - or of course just a string of bad luck, getting some of the "bad WUs". You've been around long enough - you know what to look for! :-) Seriously, out of my last 80 results, four have computation errors; and all four of those were "bad WUs", three from over the holidays, one that just crawled out of somebody's cache where it'd been hiding since then. |
STE\/E Joined: 17 Sep 05 Posts: 125 Credit: 4,103,208 RAC: 167 |
Benchmarks shouldn't be a problem with BOINC V5.2.8 or later... suspending shouldn't either with 4.45 or later. I've been running v5.2.7 for quite a while now, so suspending a WU is still a problem at times. The computation errors don't happen too often, but often enough to be irritating. I've slowly started to update all my PCs to v5.2.13 to see if I still get an error sometimes when suspending a WU; will report if I do ... :) |
Aquila audax Joined: 13 Dec 05 Posts: 3 Credit: 55,412 RAC: 0 |
I am running 5.2.13. I will try the leave in memory option. None of the other projects I have running mind being suspended. PS. Apologies for the long lines, but I can't edit the post anymore to fix. |
STE\/E Joined: 17 Sep 05 Posts: 125 Credit: 4,103,208 RAC: 167 |
Only quitting BOINC completely will cause the "memory bug" computation error, if "leave in memory" is yes. eeerrrgggg ... Yes, even after upgrading to BOINC v5.2.13 I found that out this morning. I suspended a WU with 3 hours on it & everything was Kewl, then I exited the BOINC Manager & when I started BOINC back up the WU gave me a Computation Error. This should not happen & needs to be fixed, because a lot of people don't leave their PCs running 24/7 like some of us do. |
Aquila audax Joined: 13 Dec 05 Posts: 3 Credit: 55,412 RAC: 0 |
Ok, I have been running with the 'leave in memory' option turned on for over a day now and have not had a single computation error so far. |
Trog Dog Joined: 25 Nov 05 Posts: 129 Credit: 57,345 RAC: 0 |
10/01/2006 1:06:39 AM|rosetta@home|Pausing result NO_RAND_WTS_2tif_230_6530_0 (removed from memory) This problem doesn't seem to affect Linux - I can happily crunch WUs on my two Linux boxes (removing from memory when suspended), but the two Windows boxes (XP & 98) choke with the -1073741819 (0xc0000005) error. |
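
The logs in this thread show the same failure written two ways: as the signed exit code -1073741819 and as the hex value 0xc0000005. The sketch below shows that these are the same 32-bit value, which Windows defines as STATUS_ACCESS_VIOLATION. The helper names (`to_unsigned32`, `to_signed32`, `describe`) and the one-entry lookup table are illustrative assumptions, not part of BOINC or Rosetta.

```python
# Hypothetical helpers for reading Windows exit codes as they appear in BOINC logs.

def to_unsigned32(exit_code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return exit_code & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit exit code."""
    value &= 0xFFFFFFFF
    return value - 0x1_0000_0000 if value >= 0x8000_0000 else value


# Deliberately tiny table: only the status code seen in this thread.
KNOWN_STATUS = {0xC0000005: "STATUS_ACCESS_VIOLATION"}


def describe(exit_code: int) -> str:
    """Format an exit code as hex plus a name, if the name is known."""
    code = to_unsigned32(exit_code)
    name = KNOWN_STATUS.get(code, "unknown status")
    return f"0x{code:08X} ({name})"


if __name__ == "__main__":
    print(describe(-1073741819))
```

An access violation means the process touched memory it was not allowed to, which fits the reports above that the crash happens when the application is removed from memory or shut down.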
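
Several posts above match a "Pausing result ... (removed from memory)" line with a later "Unrecoverable error" line for the same result. A minimal sketch of that check, assuming the pipe-separated `timestamp|project|message` layout seen in Aquila audax's log excerpt; the function name and the regular expressions are assumptions written for this example, not a BOINC tool.

```python
import re

# Patterns written against the message text quoted in this thread.
PAUSED = re.compile(r"Pausing result (\S+) \(removed from memory\)")
FAILED = re.compile(r"Unrecoverable error for result (\S+) ")


def errors_after_removal(lines):
    """Return names of results that errored after being removed from memory."""
    removed = set()
    hits = []
    for line in lines:
        parts = line.split("|", 2)  # timestamp | project | message
        if len(parts) != 3:
            continue  # not in the expected layout; skip it
        message = parts[2]
        match = PAUSED.search(message)
        if match:
            removed.add(match.group(1))
            continue
        match = FAILED.search(message)
        if match and match.group(1) in removed:
            hits.append(match.group(1))
    return hits


SAMPLE = [
    "10/01/2006 1:06:39 AM|SETI@home|Restarting result 15mr05aa.29724.21872.447166.1.38_2 using setiathome version 418",
    "10/01/2006 1:06:39 AM|rosetta@home|Pausing result NO_RAND_WTS_2tif_230_6530_0 (removed from memory)",
    "10/01/2006 1:06:40 AM|rosetta@home|Unrecoverable error for result NO_RAND_WTS_2tif_230_6530_0 ( - exit code -1073741819 (0xc0000005))",
    "10/01/2006 1:06:40 AM||request_reschedule_cpus: process exited",
]

if __name__ == "__main__":
    print(errors_after_removal(SAMPLE))
```

Logs in the bracketed `stdoutdae.txt` style from Etienne Guyot's post would need a different split step; only the pipe-separated form is handled here.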
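
River~~ suggests a script to identify all WUs where the number of errors exceeds the new max. The sketch below shows only the filtering idea on in-memory records. The `WorkUnit` type, its field names, the second workunit ID, and the limit of 3 are all hypothetical; the thread does not state the real limit or how the project stores these counts. The one value taken from the thread is workunit 3819739, which six hosts had received.

```python
from dataclasses import dataclass


@dataclass
class WorkUnit:
    """Hypothetical record; not the project's actual database layout."""
    wuid: int
    error_count: int


def over_error_limit(workunits, max_errors):
    """Return the IDs of workunits with more errors than max_errors."""
    return [wu.wuid for wu in workunits if wu.error_count > max_errors]


if __name__ == "__main__":
    sample = [
        WorkUnit(wuid=3819739, error_count=6),  # the WU linked in the thread
        WorkUnit(wuid=1000001, error_count=1),  # made-up ID for contrast
    ]
    # 3 is an assumed limit, used only to make the example concrete.
    print(over_error_limit(sample, max_errors=3))
```

What to do with the flagged workunits (for example, marking them 'not needed') is left out, since that depends on server-side tools not described in the thread.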
©2024 University of Washington
https://www.bakerlab.org