Message boards : Number crunching : Unrecoverable error
Author | Message |
---|---|
UBT - Halifax--lad Joined: 17 Sep 05 Posts: 157 Credit: 2,687 RAC: 0 |
I have just altered my settings to tell BOINC to switch every 10 hrs. That way I can still crunch Rosetta without any errors when switching; it is currently impossible for me to keep applications in memory. I hope this problem with checkpointing/switching is found soon. |
dgnuff Joined: 1 Nov 05 Posts: 350 Credit: 24,773,605 RAC: 0 |
"Have just altered my settings to tell BOINC to switch every 10 hrs that way I can still crunch Rosetta without any errors when switching, it is currently impossible for me to keep in memory." Why is it not possible to keep them in memory? You're running Win XP, so within a few moments of being suspended, the WU will be flushed from physical RAM (chip) to the swap file, and left there for the duration. Are you short of swap space or something? |
UBT - Halifax--lad Joined: 17 Sep 05 Posts: 157 Credit: 2,687 RAC: 0 |
"Why is it not possible to keep them in memory? You're running Win XP, so within a few moments of being suspended, the WU will be flushed from physical RAM (chip) to the swap file, and left there for the duration." I don't run XP at all; I have Win 2000. I also run multiple projects, and there wouldn't be enough memory available to hold all of the projects in memory. |
dgnuff Joined: 1 Nov 05 Posts: 350 Credit: 24,773,605 RAC: 0 |
"Why is it not possible to keep them in memory? You're running Win XP, so within a few moments of being suspended, the WU will be flushed from physical RAM (chip) to the swap file, and left there for the duration." You still don't get it, do you? PAUSED APPLICATIONS CONSUME SWAP SPACE, THEY DO NOT CONSUME PHYSICAL RAM! That picture shows what happened when I purposely "overloaded" this system. Note: Windows 98SE and 512 MB of RAM. As Process Explorer shows in that image, I have paused in RAM the following assortment: 3 Rosetta WUs, 2 Predictor WUs, one Einstein, a Sixtrack (I'm not sure who that belongs to, WCG?) and 7 SETI@home WUs. Not a bad haul, you will admit. System Monitor explains how I can do this, and could keep it up indefinitely. Ignore the red graph; the blue is the one that counts. It's swap file in use, i.e. data that has been removed from chip and placed on disk in the swap file. That's going up at a pretty good rate because, guess what, the good old vmem system is hauling stuff out of RAM and parking it on disk. The yellow one at the bottom is available physical memory, which is (of course) pretty much at zero. OK, so the bottom line is this: even though you only have 256 MB of RAM in that system of yours, it'll still do the same thing. As apps get paused, they'll page out to disk. Win98 can do it, so Win 2K sure as hell can do it too. Have you actually tried setting that switch on your config page? |
FZB Joined: 17 Sep 05 Posts: 84 Credit: 4,948,999 RAC: 0 |
"As Process Explorer shows in that image, I have paused in ram the following assortment: 3 rosetta WU's, 2 predictor WU's, one Einstein, I'm not sure who the Sixtrack belongs to (WCG?) and 7 seti at homes." Sixtrack belongs to LHC@home, if you run that. Sixtrack is also used in some benchmarks. -- Florian www.domplatz1.de |
UBT - Halifax--lad Joined: 17 Sep 05 Posts: 157 Credit: 2,687 RAC: 0 |
Dgnuff, I know perfectly well how BOINC works and I know what swap space is, so there is no need to stick something in bold and CAPS at me. |
Spectre Joined: 1 Nov 05 Posts: 20 Credit: 177,671 RAC: 0 |
@Bill: Tried everything, but still getting one error after another with workunits. Switched to LHC and have done 13 workunits now with NO errors at all... will go another dozen and switch back to Rosetta and see what happens. Thanks, Spectre |
dgnuff Joined: 1 Nov 05 Posts: 350 Credit: 24,773,605 RAC: 0 |
"Dgnuff I know perfectly well how BOINC works and I know what swap space is so there is no need to stick something in Bold and CAPS to me" You also state: "I don't run XP at all I have Win 2000, I also run many multiple projects, there wouldn't be enough memory available to hold all of the projects in memory" Of exactly what sort of memory do you not have enough available? It can't be chip, because when you run out of chip, we both know that it winds up on disk. See my overload job for an example of this in action. Do you have a small hard disk that is limiting swap space? |
UBT - Halifax--lad Joined: 17 Sep 05 Posts: 157 Credit: 2,687 RAC: 0 |
It just doesn't work when left in memory. I know that because I held them in memory a long time ago and the computer just sulks. My easy option is to switch every 10 hrs: all the WUs are getting done with no errors, and all get done before their deadline on this setting, so that's not an issue for me. When I finally upgrade my comp I will leave them in memory, but for now I won't. |
hob. Joined: 4 Nov 05 Posts: 64 Credit: 250,683 RAC: 0 |
boink ver 5.6.2, rosetta ver 4.80. I have been getting these errors too, on one of 5 machines running Rosetta... the other 4 are not having problems. I have now stopped Rosetta and restarted FaD (which runs OK on this machine). So far 18 of 27 jobs have failed with this error: 03/12/2005 15:43:00|rosetta@home|Temporarily failed upload of 1dtj__abrelax_rand_len10_jit02_omega_sim_filters_49156_0_0: can't resolve hostname 03/12/2005 15:43:00|rosetta@home|Backing off 2 hours, 20 minutes, and 55 seconds on upload of file 1dtj__abrelax_rand_len10_jit02_omega_sim_filters_49156_0_0 03/12/2005 15:52:52|rosetta@home|Started upload of 1n0u__abrelaxmode_random_length05_jitter02_46340_1_0 03/12/2005 15:53:09||Couldn't resolve hostname [boinc.bakerlab.org] 03/12/2005 15:53:10|rosetta@home|Temporarily failed upload of 1n0u__abrelaxmode_random_length05_jitter02_46340_1_0: can't resolve hostname 03/12/2005 15:53:10|rosetta@home|Backing off 1 hours, 14 minutes, and 19 seconds on upload of file 1n0u__abrelaxmode_random_length05_jitter02_46340_1_0 03/12/2005 16:07:16|rosetta@home|Unrecoverable error for result 1di2__abrelax_rand_len10_jit02_omega_sim_filters_63326_0 ( - exit code -1073741819 (0xc0000005)) 03/12/2005 16:07:16|rosetta@home|Too many backoffs - fetching master file 03/12/2005 16:07:16||request_reschedule_cpus: process exited 03/12/2005 16:07:16|rosetta@home|Deferring communication with project for 13 hours, 42 minutes, and 34 seconds 03/12/2005 16:07:16|rosetta@home|Computation for result 1di2__abrelax_rand_len10_jit02_omega_sim_filters_63326_0 finished 03/12/2005 16:07:16|rosetta@home|Starting result 1ogw__abrelax_rand_len10_jit02_omega_sim_filters_63399_0 using rosetta version 480 03/12/2005 16:23:59|rosetta@home|Unrecoverable error for result 1ogw__abrelax_rand_len10_jit02_omega_sim_filters_63399_0 ( - exit code -1073741819 (0xc0000005)) 03/12/2005 16:23:59|rosetta@home|Too many backoffs - fetching master file 03/12/2005 16:23:59||request_reschedule_cpus: process exited 03/12/2005 16:23:59|rosetta@home|Computation for result 1ogw__abrelax_rand_len10_jit02_omega_sim_filters_63399_0 finished 03/12/2005 16:23:59|rosetta@home|Starting result 1dtj__abrelax_rand_len10_jit02_omega_sim_filters_63333_0 using rosetta version 480 03/12/2005 16:24:03|rosetta@home|Deferring communication with project for 13 hours, 25 minutes, and 47 seconds 03/12/2005 16:45:24|rosetta@home|Unrecoverable error for result 1dtj__abrelax_rand_len10_jit02_omega_sim_filters_63333_0 ( - exit code -1073741819 (0xc0000005)) 03/12/2005 16:45:24|rosetta@home|Too many backoffs - fetching master file 03/12/2005 16:45:24||request_reschedule_cpus: process exited 03/12/2005 16:45:24|rosetta@home|Deferring communication with project for 13 hours, 4 minutes, and 26 seconds 03/12/2005 16:45:24|rosetta@home|Computation for result 1dtj__abrelax_rand_len10_jit02_omega_sim_filters_63333_0 finished 03/12/2005 16:45:24|rosetta@home|Starting result 1dcj__abrelax_rand_len10_jit02_omega_sim_filters_63289_0 using rosetta version 480 03/12/2005 17:07:30|rosetta@home|Started upload of 1n0u__abrelaxmode_random_length05_jitter02_46340_1_0 03/12/2005 17:07:47||Couldn't resolve hostname [boinc.bakerlab.org] |
Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,695,450 RAC: 11 |
"boink ver 5.6.2" You don't say if you have "Leave in memory" set to "Yes", but assuming you do, I can only throw in a few things. V5.6.2 isn't a valid BOINC version. The current release is V5.2.13, and I would definitely recommend anyone having connection problems upgrade to it. You're currently seeing two problems: a failure to reliably connect with the project, and the results erroring out. I would address one of those at a time, _unless_ you are overclocked. If you are, I would definitely do some testing of temperatures and memory stability. (Actually, checking temps would be a good idea even if not overclocked...) Never having run FaD, I have no idea how sensitive it is to stability issues. I know the various BOINC projects have varying sensitivity, with Rosetta and SETI being at the top of the list. Almost any 'glitch' will cause problems for them, as they rely on extreme accuracy of the calculations. |
hob. Joined: 4 Nov 05 Posts: 64 Credit: 250,683 RAC: 0 |
"boink ver 5.6.2" If I go to "help/about" in boink manager it tells me I have ver 5.2.6. Quote: "failure to connect to the project". That's because I only have a 52k modem and an off-peak package, so I can't stay "always on". It is not overclocked; however, cooling it is a problem as it's the top unit in this rack. "Leave in memory" is set to Yes. Windows XP Pro, Service Pack 1. |
Morphy375 Joined: 2 Nov 05 Posts: 86 Credit: 1,629,758 RAC: 0 |
"Couldn't resolve hostname [boinc.bakerlab.org]" No nameserver available. DNS misconfigured? Teddies.... |
Vester Joined: 2 Nov 05 Posts: 258 Credit: 3,651,260 RAC: 428 |
Hob, the current BOINC client is version 5.2.13. It helped me. Download from this page. |
Plum Ugly Joined: 3 Nov 05 Posts: 24 Credit: 2,005,763 RAC: 0 |
Seems I'm getting nothing but errors too. Changed the drive, CPU, memory, everything but the motherboard to see if this changed anything. No change. Not overclocked and running the newest version. 2005-12-03 21:10:18 [rosetta@home] Unrecoverable error for result 1dtj__abrelax_rand_len10_jit02_omega_sim_filters_82016_0 ( - exit code -164 (0xffffff5c)) 2005-12-03 21:10:18 [---] request_reschedule_cpus: process exited 2005-12-03 21:10:18 [rosetta@home] Computation for result 1dtj__abrelax_rand_len10_jit02_omega_sim_filters_82016_0 finished 2005-12-03 21:10:18 [rosetta@home] Starting result 1ogw__abrelax_rand_len10_jit02_omega_sim_filters_82056_0 using rosetta version 480 2005-12-03 21:00:51 [rosetta@home] Unrecoverable error for result 1di2__abrelax_rand_len10_jit02_omega_sim_filters_82001_0 ( - exit code -1073741819 (0xc0000005)) 2005-12-03 21:03:53 [rosetta@home] Unrecoverable error for result 1dtj__abrelax_rand_len10_jit02_omega_sim_filters_54889_1 ( - exit code -164 (0xffffff5c)) 2005-12-03 21:04:53 [rosetta@home] Unrecoverable error for result 1ogw__abrelax_rand_len10_jit02_omega_sim_filters_82041_0 ( - exit code -1073741819 (0xc0000005)) 2005-12-03 21:06:03 [rosetta@home] Unrecoverable error for result 1ogw__abrelax_rand_len10_jit02_omega_sim_filters_82048_0 ( - exit code -1073741819 (0xc0000005)) 2005-12-03 21:10:18 [rosetta@home] Unrecoverable error for result 1dtj__abrelax_rand_len10_jit02_omega_sim_filters_82016_0 ( - exit code -164 (0xffffff5c)) From the results page: <core_client_version>5.2.13</core_client_version> <message> - exit code -1073741819 (0xc0000005) </message> <stderr_txt> # ===================================== # random seed: 1518321 # ===================================== ***UNHANDLED EXCEPTION**** Reason: Access Violation (0xc0000005) at address 0x77F51D24 write attempt to address 0x00000000 1: 12/03/05 21:06:03 </stderr_txt> |
Vester Joined: 2 Nov 05 Posts: 258 Credit: 3,651,260 RAC: 428 |
I would check to see that Windows XP settings for cache are "system managed size" or cache is liberally allocated, about 1.5 GB. Also, run cache in the partition on which the OS is installed unless you can run it on a secondary hard drive (optimal). Tank once commented: "I have to agree with THINK about stress causing these access violations. I have found that a hastily assembled machine is much more likely to produce these errors. It may just be down to the order in which drivers are loaded during setup. I am not convinced that this accounts for all of the occurrences but a rebuild and reinstall usually sorts things out. IMHO." They were referring to heat stress. |
Dogbytes Joined: 4 Dec 05 Posts: 37 Credit: 207,563 RAC: 0 |
Looks like my Rosetta WU crunches have been erroring out since Nov 19th. I've got a PowerMac G5 2.5 running OS 10.4.3 and I got the same thing with the Super Bench Manager. I uninstalled that client and downloaded the current OS X client and I'm getting the same thing. The WU keeps crunching and crunching for 7 or more hours, then fails with a client error. <core_client_version>4.44</core_client_version> <message>Maximum CPU time exceeded </message> <stderr_txt> # ===================================== # random seed: 824341 # ===================================== </stderr_txt> If I can't get this fixed, I'll have to move this host to another project. |
Dogbytes Joined: 4 Dec 05 Posts: 37 Credit: 207,563 RAC: 0 |
I just joined Rosetta a few days ago. My PCs are OK with R@H, but my PowerMac G5 2.5 runs the WUs for about the same time as a 600 MHz Celeron, then they error out. I was using a superbench mark 4.44 client. I then uninstalled it, trashing numerous SETI work units which were hung up with Berkeley's servers BS, and installed the current Mac client. The same thing is happening. I don't do mixed projects; I crunch only one project exclusively at a time. If I can't get this fixed soon, I'll migrate all my hosts to some other project. My Mac is my pride and joy; if it can't be in on the project, nothing will. I was looking forward to the change from SETI, and I'd heard a lot of good things about this project, but this is not working. |
Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,695,450 RAC: 11 |
"I don't do mixed projects; I crunch only one project exclusively at a time. If I can't get this fixed soon, I'll migrate all my hosts to some other project. My Mac is my pride and joy, if it can't be in on the project, nothing will. I was looking forward to the change from Seti, I'd heard a lot of good things about this project, but this is not working." I will only do work for projects that _have_ a Mac client, but I do work on multiple projects at a time, so if a project is having a Mac problem for a while, I'll leave the PC going for them... I'm running Rosetta "heavy" on the PC, and "some" on the Mac Mini. Kind of playing with numbers trying to get Rosetta and Predictor over 10000, and Einstein over SETI... The only problem I've had is certain WUs that run "extremely" long times on the Mini, while others seem to just be a touch slower on the Mini than they should be, based on work downloaded the same day to the PC. (Now, I did give up on the iBook G3...) I haven't seen _any_ WUs that errored out, so I don't know what to tell you on that. If you want to "pull the plug" on Rosetta for now, that's up to you, but they are currently looking at the Mac app, wanting to make it better/faster. So don't give up completely, but come back and check it out again later. Out of curiosity, no offense intended: does it work for you to do one-project work? I know PoorBoy does that too, putting all of his (considerable!) power towards one project until he's happy with his rank, then moving to the next. I play games with resource shares and which computers are attached to which projects, but I almost always have one computer _somewhere_ doing any of the projects I'm interested in. If nothing else, it keeps me from falling quite so far before I get back to making that one "#1" again. |
Dogbytes Joined: 4 Dec 05 Posts: 37 Credit: 207,563 RAC: 0 |
Now that someone has said that they're working on a functional client, I'll stick around and bide my time. I'm like PoorBoy. I find it rather frustrating at this point in time that Macs are still treated like unwanted, redheaded stepchildren within the community. So, I'll hang loose for a while. I guess that I'm having the adult version of a temper tantrum. What's even worse is having to trash WUs! |
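
For anyone comparing logs like the ones posted in this thread, here is a minimal Python sketch (not part of BOINC or Rosetta) that tallies "Unrecoverable error" lines by exit code. It assumes the log lines look like the excerpts quoted above. The names `ERROR_RE`, `to_unsigned_hex`, `tally_exit_codes` and `SAMPLE_LOG` are made up for this illustration. The hex form is just the signed exit code viewed as an unsigned 32-bit number, which is why -1073741819 shows up as 0xc0000005, the access-violation code named in the stderr output quoted above.

```python
import re
from collections import Counter

# Hypothetical pattern, written against the log excerpts quoted in this thread:
# "Unrecoverable error for result NAME ( - exit code -164 (0xffffff5c))"
ERROR_RE = re.compile(
    r"Unrecoverable error for result (\S+) "
    r"\( - exit code (-?\d+) \(0x[0-9a-fA-F]+\)\)"
)


def to_unsigned_hex(exit_code: int) -> str:
    """Render a signed 32-bit exit code as unsigned hex, as the client log does."""
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


def tally_exit_codes(log_text: str) -> Counter:
    """Count unrecoverable errors per exit code, keyed by the hex form."""
    counts = Counter()
    for _result_name, code in ERROR_RE.findall(log_text):
        counts[to_unsigned_hex(int(code))] += 1
    return counts


# Three lines copied from the excerpts posted above (sample data only).
SAMPLE_LOG = """\
2005-12-03 21:10:18 [rosetta@home] Unrecoverable error for result 1dtj__abrelax_rand_len10_jit02_omega_sim_filters_82016_0 ( - exit code -164 (0xffffff5c))
2005-12-03 21:00:51 [rosetta@home] Unrecoverable error for result 1di2__abrelax_rand_len10_jit02_omega_sim_filters_82001_0 ( - exit code -1073741819 (0xc0000005))
2005-12-03 21:03:53 [rosetta@home] Unrecoverable error for result 1dtj__abrelax_rand_len10_jit02_omega_sim_filters_54889_1 ( - exit code -164 (0xffffff5c))
"""

if __name__ == "__main__":
    # Print each exit code with the number of results that failed with it.
    for code, n in tally_exit_codes(SAMPLE_LOG).most_common():
        print(code, n)
```

Grouping by exit code makes it easier to see whether a host is hitting one failure repeatedly (for example, only 0xc0000005) or a mix, which may help when describing the problem to the project staff.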
©2024 University of Washington
https://www.bakerlab.org