WU errors after hibernate?

Author	Message
Deamiter Joined: 9 Nov 05 Posts: 26 Credit: 3,793,650 RAC: 0	Message 8331 - Posted: 4 Jan 2006, 6:51:05 UTC Last modified: 4 Jan 2006, 6:51:40 UTC
I've been following the boards here pretty regularly, so I'm well aware of the issue with the Rosetta WUs erroring on some machines when they are not set to "leave in memory when suspended." I never encountered the problem on any of my machines, even when I played with the setting, but that's beside the point.
Today I set my computer to hibernate by accident; usually I try to simply suspend it (I've had trouble with hibernating in the past, though never a BOINC issue like this). When I turned it back on, the WU had errored out. Is this the same issue as above, or is it something totally different?
The error was on computer 80879 on workunit NO_RANDOM_WTS_OR_FRAGS_1b72_223_5556
ID: 8331 · Rating: 0

Paul D. Buck Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0	Message 8334 - Posted: 4 Jan 2006, 7:04:51 UTC
It looks the same ... I have had some errors when I had a forced re-boot. :(
I know the developers are working hard to find the issue. I also know that this is one of those hard bugs ...
As an example, my boss kept telling me he would load a model and it would fail. Well, actually, it was the second model he loaded that failed. Mattered not what the model was, just that the second model would fail because of an improperly initialized variable. That bug took me nearly 6 months to find because we could not clearly identify the failure mechanism. Every model that "failed" would load cleanly for the developers ... but we would just test the "bad" model ... sigh ...
Anyway, I know that they want to fix this, probably more than we do ...
ID: 8334 · Rating: 0

FluffyChicken Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0	Message 8341 - Posted: 4 Jan 2006, 10:39:25 UTC - in response to Message 8331.
> I've been following the boards here pretty regularly, so I'm well aware of the issue with the Rosetta WUs erroring on some machines when they are not set to "leave in memory when suspended." I never encountered the problem on any of my machines, even when I played with the setting, but that's beside the point.
> Today I set my computer to hibernate by accident; usually I try to simply suspend it (I've had trouble with hibernating in the past, though never a BOINC issue like this). When I turned it back on, the WU had errored out. Is this the same issue as above, or is it something totally different?
> The error was on computer 80879 on workunit NO_RANDOM_WTS_OR_FRAGS_1b72_223_5556
Yes, it happens, and "leave in memory" (as far as I remember; I will need to search for my posts) doesn't help either, or it causes a 'zero status, no finish file'.
It happens with both Suspend and Hibernate (Windows XP).
Team mauisun.org
ID: 8341 · Rating: 0

Keck_Komputers Joined: 17 Sep 05 Posts: 211 Credit: 4,246,150 RAC: 0	Message 8345 - Posted: 4 Jan 2006, 12:23:30 UTC
BOINC in general tends to not like the Windows hibernate and standby settings. Both have been known to cause complete wipes of the queue occasionally and, fairly commonly, loss of the active workunit.
BOINC WIKI
BOINCing since 2002/12/8
ID: 8345 · Rating: 0

bartsob5&alicjam Joined: 17 Sep 05 Posts: 6 Credit: 183,280 RAC: 0	Message 8360 - Posted: 4 Jan 2006, 18:35:00 UTC
And I've found that every time I suspend the Rosetta project, the result being crunched gets an error... Is this normal?
ID: 8360 · Rating: 0

Rebirther Joined: 17 Sep 05 Posts: 116 Credit: 41,315 RAC: 0	Message 8364 - Posted: 4 Jan 2006, 18:59:30 UTC - in response to Message 8360.
> And I've found that every time I suspend the Rosetta project, the result being crunched gets an error... Is this normal?
Looks like it only happens on AMD processors; my P4 hasn't had any problems...
ID: 8364 · Rating: 0

Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,701,412 RAC: 0	Message 8368 - Posted: 4 Jan 2006, 19:51:34 UTC - in response to Message 8360.
> And I've found that every time I suspend the Rosetta project, the result being crunched gets an error... Is this normal?
Suspend shouldn't cause an error as long as "leave applications in memory when preempted" is "yes"... hibernate is a different level.
ID: 8368 · Rating: 0

bartsob5&alicjam Joined: 17 Sep 05 Posts: 6 Credit: 183,280 RAC: 0	Message 8375 - Posted: 4 Jan 2006, 21:58:31 UTC - in response to Message 8368.
> Suspend shouldn't cause an error as long as "leave applications in memory when preempted" is "yes"... hibernate is a different level.
So why was this my fourth error like that?
ID: 8375 · Rating: 0

Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,701,412 RAC: 0	Message 8377 - Posted: 4 Jan 2006, 22:32:57 UTC - in response to Message 8375.
> So why was this my fourth error like that?
<message> - exit code -1073741819 (0xc0000005) </message>
<stderr_txt>
*UNHANDLED EXCEPTION**
Reason: Access Violation (0xc0000005) at address 0x77F52B6A write attempt to address 0x40253040
Exiting...
</stderr_txt>
This is absolutely typical of a host where "leave applications in memory" is a "no". Have you checked your preferences? Hit "update" on the host, and verified in the Messages tab that the preferences were picked up?
I looked at your last 8-10 errors on one host; about 1/3 seem to be the "bad WUs" that are floating around, in that you are not the only one to have errors with them. The others all look like this one. If you are truly getting the "left in memory" message when Rosetta switches out, then I would look for some other problem with your computer.
ID: 8377 · Rating: 0
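A note on the two numbers in that message: the negative exit code and the hex value are the same 32-bit pattern, read once as a signed integer and once as unsigned. The snippet below is only a minimal sketch of that conversion; the helper name `exit_code_to_ntstatus` is made up for illustration and is not part of BOINC or Rosetta.

```python
def exit_code_to_ntstatus(exit_code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value.

    Hypothetical helper for illustration; masking with 0xFFFFFFFF keeps
    the low 32 bits, which turns the negative number into its unsigned form.
    """
    return exit_code & 0xFFFFFFFF


signed_code = -1073741819  # value reported in the client message
status = exit_code_to_ntstatus(signed_code)
print(f"{signed_code} -> 0x{status:08x}")  # -1073741819 -> 0xc0000005
```

So "exit code -1073741819" and "Access Violation (0xc0000005)" describe one and the same failure, not two separate ones.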

Paul D. Buck Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0	Message 8388 - Posted: 4 Jan 2006, 22:50:19 UTC
819 errors can be bad drivers, aggressive overclocking, overheating, etc.
Are you seeing these types of errors, or actually any client errors, on other projects?
ID: 8388 · Rating: 0

sbfh Joined: 6 Dec 05 Posts: 2 Credit: 1,624 RAC: 0	Message 8397 - Posted: 5 Jan 2006, 1:45:51 UTC - in response to Message 8345.
> BOINC in general tends to not like the Windows hibernate and standby settings. Both have been known to cause complete wipes of the queue occasionally and, fairly commonly, loss of the active workunit.
I don't have these problems with my other BOINC projects, just Rosetta. I get a significantly higher rate of client errors here and have few or none on the other projects. Is this something that Rosetta is looking at?
FYI... I run SETI, Einstein and Predictor as well as Rosetta.
ID: 8397 · Rating: 0

FluffyChicken Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0	Message 8418 - Posted: 5 Jan 2006, 12:39:44 UTC - in response to Message 8397.
> > BOINC in general tends to not like the Windows hibernate and standby settings. Both have been known to cause complete wipes of the queue occasionally and, fairly commonly, loss of the active workunit.
> I don't have these problems with my other BOINC projects, just Rosetta. I get a significantly higher rate of client errors here and have few or none on the other projects. Is this something that Rosetta is looking at?
> FYI... I run SETI, Einstein and Predictor as well as Rosetta.
And no problems here with SETI or CPDN (other than a zero state file when coming out of hibernate/suspend).
If you say it is BOINC, are they looking into it? After all, it is common functionality of the OS, and it is certainly going to be more so when they try to move to the 'instant on' living room appliances etc. I use it all the time.
I believe the common Rosetta 'errors' are the cause here though, and 'leave in memory' should fix the client error for the most part. I just checked the logs and it seems to be just the 'zero error', which all projects seem to get (I assume these are just save point files, although I cannot see why BOINC shouldn't be able to handle the hibernate/suspend calls when most other things can).
Team mauisun.org
ID: 8418 · Rating: 0

sbfh Joined: 6 Dec 05 Posts: 2 Credit: 1,624 RAC: 0	Message 8479 - Posted: 6 Jan 2006, 15:03:40 UTC - in response to Message 8397.
> > BOINC in general tends to not like the Windows hibernate and standby settings. Both have been known to cause complete wipes of the queue occasionally and, fairly commonly, loss of the active workunit.
> I don't have these problems with my other BOINC projects, just Rosetta. I get a significantly higher rate of client errors here and have few or none on the other projects. Is this something that Rosetta is looking at?
> FYI... I run SETI, Einstein and Predictor as well as Rosetta.
I think I will suspend my Rosetta account for a while until this is taken care of. I just got another errored-out unit. I would rather spend my computer's CPU downtime on projects that don't have this particular problem.
ID: 8479 · Rating: 0

bartsob5&alicjam Joined: 17 Sep 05 Posts: 6 Credit: 183,280 RAC: 0	Message 8528 - Posted: 7 Jan 2006, 10:17:01 UTC
@Bill Michael
There were errors caused by aggressive overclocking, but on another host ;) Everything should be all right now ;) Since that day I haven't run any Rosetta WUs, so I don't know whether it's OK now or not.
Anyway, thanks for the help!
ID: 8528 · Rating: 0

Rich Zajac Joined: 7 Nov 05 Posts: 4 Credit: 37,323 RAC: 0	Message 8544 - Posted: 7 Jan 2006, 19:54:58 UTC
I keep getting a message similar to:
1/7/06 11:19:59 AM|rosetta@home|Unrecoverable error for result INCREASE_CYCLES_10_1dtj_226_6184_2 ( - exit code -1073741819 (0xc0000005))
Could someone please give me an idea what's causing this? It started only fairly recently, although I can't say exactly when. I'm going to suspend Rosetta until I get some idea, because all I'm doing now is wasting cycles.
Rich
ID: 8544 · Rating: 0

Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,701,412 RAC: 0	Message 8545 - Posted: 7 Jan 2006, 20:07:50 UTC - in response to Message 8544.
> I keep getting a message similar to:
> 1/7/06 11:19:59 AM|rosetta@home|Unrecoverable error for result INCREASE_CYCLES_10_1dtj_226_6184_2 ( - exit code -1073741819 (0xc0000005))
The "05" errors USUALLY mean you have "leave applications in memory when preempted" set to "no". If this is "yes", we'll need to dig deeper.
ID: 8545 · Rating: 0

Rich Zajac Joined: 7 Nov 05 Posts: 4 Credit: 37,323 RAC: 0	Message 8608 - Posted: 8 Jan 2006, 18:26:08 UTC - in response to Message 8545.
> > I keep getting a message similar to:
> > 1/7/06 11:19:59 AM|rosetta@home|Unrecoverable error for result INCREASE_CYCLES_10_1dtj_226_6184_2 ( - exit code -1073741819 (0xc0000005))
> The "05" errors USUALLY mean you have "leave applications in memory when preempted" set to "no". If this is "yes", we'll need to dig deeper.
So.....what you're telling me is that unlike the other BOINC projects, Rosetta requires that I keep large amounts of memory tied up for ALL projects since the "leave in memory" option is global?!?!?!
Please let me know when you come up with a fix....until then I'm going to put Rosetta "on the shelf".
ID: 8608 · Rating: 0

Deamiter Joined: 9 Nov 05 Posts: 26 Credit: 3,793,650 RAC: 0	Message 8618 - Posted: 9 Jan 2006, 5:33:19 UTC - in response to Message 8608.
> So.....what you're telling me is that unlike the other BOINC projects, Rosetta requires that I keep large amounts of memory tied up for ALL projects since the "leave in memory" option is global?!?!?!
> Please let me know when you come up with a fix....until then I'm going to put Rosetta "on the shelf".
Yes, it's true, and this is a problem that's being attacked as we speak. However, you should be aware that it does not leave the application in your RAM -- when left in memory, the application is automatically paged to the virtual memory on your hard drive. Unless you're REALLY hurting for hard drive space, I can't imagine why this would be such a problem.
Though in the end, there's no question that it IS a problem. I'm sure it'll get fixed eventually, but until then, I'm happy to give a few more megabytes of my HD space to the projects, since it reduces the time it takes for my computer to switch between projects with no negative side effects (besides eating another few MB out of the 60GB I have free).
ID: 8618 · Rating: 0

Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,701,412 RAC: 0	Message 8620 - Posted: 9 Jan 2006, 5:54:01 UTC - in response to Message 8608.
> So.....what you're telling me is that unlike the other BOINC projects, Rosetta requires that I keep large amounts of memory tied up for ALL projects since the "leave in memory" option is global?!?!?!
> Please let me know when you come up with a fix....until then I'm going to put Rosetta "on the shelf".
As far as I know, SETI and Einstein are the _only_ two projects that are not harmed (much...) by being swapped out of memory. uFluids and SZTAKI won't error out - they just restart at the very beginning. ClimatePrediction, Predictor and LHC restart at the last checkpoint, which can (for them) be a significant part of the hour they run by default. SETI and Einstein _also_ restart at the last checkpoint, but as long as you have not raised the default "write to disk every" setting, they have checkpoints every 0.1% or so, so you only lose a few minutes of crunching time.
The setting is there for the very small number of people (mostly on Win9x boxes) who are VERY tight on memory. Even though the application will be swapped out to virtual memory (on disk), a few K remain in RAM. If you have a computer that meets the minimum requirements of Rosetta (512MB, Win2K and better, including Linux or Mac), then there is zero downside to setting the option to "yes". If you can't meet the minimum requirements shown on the website, then you really should never have signed up at all without understanding that you were "on your own", being below those standards.
The project staff has this bug on their list of things to fix, but frankly it's only a "medium" priority compared to several other issues.
ID: 8620 · Rating: -1

River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0	Message 8639 - Posted: 9 Jan 2006, 12:08:09 UTC - in response to Message 8620.
> Please let me know when you come up with a fix....until then I'm going to put Rosetta "on the shelf".
Could be your best move if it is really causing you a problem.
> As far as I know, SETI and Einstein are the _only_ two projects that are not harmed [by keep-in-mem=no] ... ClimatePrediction and Predictor and LHC restart at the last checkpoint, which can (for them) be a significant part of the hour they by default run.
On CPDN with the min spec 800MHz box and an earlier client, a sulphur WU takes more than 1 hour to get to a checkpoint. It runs perfectly in one sense, but never progresses in real terms unless you either set a higher interval for swaps or set keep=yes.
I believe that later clients allow crunching to continue to the next checkpoint, but only on a 1-CPU box. (I'm not certain about this - I remember it being discussed but am not sure if it was actually done.)
R~~
ID: 8639 · Rating: 0