Message boards : Number crunching : Many Problems
Author | Message |
---|---|
murky Joined: 24 Sep 06 Posts: 9 Credit: 214,896 RAC: 0 |
It appears that I will have to stop working on Rosetta tasks, because the PC described below is not doing well with Rosetta. This project is the only work assigned to the PC. Rosetta is version 5.25 and BOINC is version 5.4.11. Below are the messages I was able to retrieve. I don't know whether they will help diagnose why this PC is causing problems. If any other information would help resolve these problems, please let me know.
This PC is an AMD Athlon 64 3700+, NOT OVERCLOCKED. The motherboard is an ASUS A8N-E. RAM is 1 GB of OCZ PC4200 (263 MHz) running at 200 MHz, with very relaxed timings of 2.5, 4, 4 and 8. The hard drive is a Western Digital Raptor (SATA). Processor temperature is around 43 °C.
There have also been several BSODs. One was:
"Page_Fault_IN_Nonpaged_Area" Stop: 0x00000050, Win32.sys, address BF8028A7 base at BF800000, Datestamp 43446a58
There have been several instances of the system locking up and needing a reboot to recover. This PC ran Folding@Home for several months with no problems. I have set BOINC to not take any new work and will run memtest86 and Prime95 to see if they indicate problems. Neither program found any problems when the PC was first built and run at an FSB of 220 MHz. I see no point in messing up the project with client and compute errors. My Intel Pentium 4 at 2.4 GHz is working well and is stable.
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0086009A write attempt to address 0x11B8E760
Engaging BOINC Windows Runtime Debugger
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0060AA0A read attempt to address 0x8A34AED4
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x004C9C46 read attempt to address 0xA15A7174
stderr out:
<core_client_version>5.4.11</core_client_version>
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# random seed: 3084267
# cpu_run_time_pref: 28800
**********************************************************************
Rosetta score is stuck or going too long. Watchdog is ending the run!
Stuck at score 952.206 for 3600 seconds
**********************************************************************
GZIP SILENT FILE: .xx1vie.out
# cpu_run_time_pref: 28800
ERROR:: Exit at: .initialize.cc line:1618 |
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
Eek, I'm not sure what to say. I have an AMD64 3700 San Diego, overclocked 10%, with 1 GB of OCZ Gold, an Asus A8N-E motherboard, but a Hitachi Deskstar 250 GB SATA II drive, and it works well with Rosetta 5.25. From the BSOD errors, I'd have to guess your BIOS settings are good enough for general work, but when Rosetta puts a load on it, your memory or something else goofs up. Try setting the BIOS timings back to Auto. Have you run memtest86+ and Prime95? tony Oh yeah, I'm a BOINC alpha tester, so I'm using 5.6.5. I have a hard time believing it's BOINC related, but I could direct you to the latest alpha client. |
murky Joined: 24 Sep 06 Posts: 9 Credit: 214,896 RAC: 0 |
[quote]Eek, I'm not sure what to say. From the BSOD errors, I'd have to guess your BIOS settings are good enough for general work, but when Rosetta puts a load on it, your memory or something else goofs up. Try setting the BIOS timings back to Auto. Have you run memtest86+ and Prime95? tony[/quote] Thanks for the reply. The BIOS memory settings are the Auto settings for running the memory at an FSB of 263 MHz. If I use Auto at 200 MHz, it sets tighter timings of 2.5, 3, 3, 7. I have raised the RAM voltage and CPU voltage... all the OC'ing tricks, but without success. As I stated, Prime95 and memtest86 were solid when the box was built, at an FSB of 220 MHz. I will run both tests again for at least 24 hours when the current task finishes. It's at 70% now. murky / Bob |
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
I have the Asus 6200 (cheap) PCIe video card. Have you tried turning off the graphics? I see the 0x1 error, and one task was terminated by the watchdog timer. |
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
You might go to "Rosetta preferences" and set your "CPU run time" to 1 hour, just for testing. If any task runs past 1 hour by more than 5-10 minutes, it is having a problem. |
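Tony's 1-hour test can be sketched as a few lines of Python. This is a hypothetical helper, not part of BOINC or Rosetta. It assumes you have copied each task's CPU time (in seconds) off the results page by hand, and it uses the upper end of the suggested 5-10 minute margin as the allowed overrun.

```python
# Hypothetical helper (not a BOINC/Rosetta tool): flag tasks that ran well
# past the "CPU run time" preference during the 1-hour test.
RUN_TIME_PREF_S = 3600   # 1-hour test setting
TOLERANCE_S = 10 * 60    # assumed allowed overrun: 10 minutes


def overrunning_tasks(cpu_times):
    """Return names of tasks whose CPU time exceeds preference + tolerance.

    cpu_times maps a task label (any name you like) to CPU seconds used.
    """
    limit = RUN_TIME_PREF_S + TOLERANCE_S
    return [name for name, secs in cpu_times.items() if secs > limit]


if __name__ == "__main__":
    # Made-up example values, not real results from this host.
    sample = {"task_a": 3650, "task_b": 4500, "task_c": 3590}
    print(overrunning_tasks(sample))  # ['task_b']
```

Any task the helper reports would be a candidate for the kind of stuck run the watchdog message describes.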
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
My OCZ is guaranteed at 3 V +/-. I now find that if I go below 2.9 V it just locks up. I think the default was 2.65 V, but there's no way I can get it that low. Quite frankly, I got it to a 10% overclock, got tired of trying to get more out of it, and left it there. I'll reboot and look at my settings. |
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
2530 MHz
CPU config:
DRAM config (Auto): 400 MHz, 2.5T 7T 2T 2T 11T 14T 4T 3T 2T
Both remappings enabled
HT 4X
CnQ disabled
JumperFree Overclocking (manual):
CPU freq 230
PCIe clock 100 MHz
DDR volt 3.0
CPU mult X11
CPU volt 1.5
PCI clock sync Auto
I have to run out, but will return. |
murky Joined: 24 Sep 06 Posts: 9 Credit: 214,896 RAC: 0 |
[quote]I have the Asus 6200 (cheap) PCIe video card. Have you tried turning off the graphics? I see the 0x1 error, and one task was terminated by the watchdog timer.[/quote] Thanks for all your thoughts on this problem. The current task has just over an hour to go, so I'll run the diagnostics for the next 36 hours or so. I occasionally turn on the graphics to see what is happening, but seldom for more than a couple of minutes. My card is also a low-priced PCI Express card using the NVIDIA GeForce 7300 GS chipset. The CPU run time is currently 8 hours; some tasks complete without a glitch and some don't. Memory was at 2.6 V and I raised it to 2.75 V, but there was no change in stability. Is there a way I can provide a link to the work this PC has done (its ID is 313507) so you can gain a little more insight? I will see what happens after I try memtest and Prime95. If they show stability, I will give Rosetta another try. Thanks....murky / Bob |
dcdc Joined: 3 Nov 05 Posts: 1832 Credit: 119,821,902 RAC: 15,180 |
https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=313507 ;-) As a temporary measure, you could reduce your run times so that you lose less work if a task does error out on you. |
murky Joined: 24 Sep 06 Posts: 9 Credit: 214,896 RAC: 0 |
[quote]https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=313507[/quote] Thanks for the heads-up on the computer ID URL. In that vein, here are links to the 4 tasks that errored out:
https://boinc.bakerlab.org/rosetta/result.php?resultid=41043803
https://boinc.bakerlab.org/rosetta/result.php?resultid=40834653
https://boinc.bakerlab.org/rosetta/result.php?resultid=40668898
https://boinc.bakerlab.org/rosetta/result.php?resultid=40606247
If someone can find a common thread in these that would explain why I'm having these problems, that would be great. Thanks.....murky / Bob |
FluffyChicken Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0 |
The BSOD on its own would say there was a problem. This (the Page_Fault error) could well be bad memory OR a bad hard drive (among many, many other things ;-( ). So yes, check your memory: use the default timings, hell, run it overclocked and see if it is still OK. Will it still run F@H fine? Also check your hard drive (chkdsk with a full check). Other things to do are reseat (take 'em out, pray to the god of electronics, put 'em back in) everything: memory, cables, and the CPU with a nice new thermal interface. Give it an overhaul, and don't forget your drivers...... For F@H, did you use the console version or the graphical version? Team mauisun.org |
FluffyChicken Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0 |
BTW, it could also be bad tasks/work units; there have been a few reports of client failures recently. Since you are completing valid work, keep going. Most of the invalid ones are giving debug info, so the project should see that at their end (that is, if they actually look ;-) ). Team mauisun.org |
dcdc Joined: 3 Nov 05 Posts: 1832 Credit: 119,821,902 RAC: 15,180 |
Is the Asus AI Booster thing that you're running doing dynamic overclocking? If so, I'd suggest disabling it. If you're running distributed computing (DC), you don't need any dynamic OCing: just OC it as far as you want/it'll go and you're done! I think the idea of dynamic OCing is to increase the clock rate when under load, but with DC it's always under load. Could it be that it's not overclocking when you're running Prime95 etc., but is when running Rosetta because Rosetta is low priority? (Although I think Prime95 is low priority too...) |
BennyRop Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
Murky: How were you using F@H? With -advmethods and -bigWUs (or whatever the big switch is)? If F@H was being run with default flags, it would use less RAM than Rosetta, which might explain why the problem is popping up here so frequently. You plan on ruling out RAM problems, but you should also test the HD. If you don't have a non-destructive HD test like SpinRite, there are SATA HD tests on Western Digital's site (links: verification that it works with SATA drives; WD's download page). A second possibility is software problems. Does HijackThis show any programs running that you don't recognize? The error log from the 386-second error result showed a number of applications running from the C:\cygwin directory and from an F: drive (link: Error result). Can you temporarily disable them? Or can you set up a new HD, format it, do a clean install of WinXP on it, and then try Rosetta for a week on that? (That would prove either that Windows is corrupted, or that some program running on the original drive conflicts with Rosetta.) |
murky Joined: 24 Sep 06 Posts: 9 Credit: 214,896 RAC: 0 |
[quote]The BSOD on its own would say there was a problem. This (the Page_Fault error) could well be bad memory OR a bad hard drive (among many, many other things ;-( )[/quote] FluffyChicken, dcdc and BennyRop: thanks, I will try to address all the advice in one reply :) I am running Prime95 now... 2 1/2 hours so far. I will run chkdsk after I stop Prime95, then reseat the memory etc. I thought there might be a few bad WUs, but I don't see very many people remarking on this, so I assume my system has a problem. I was using the console version of F@H, with advmethods and large work units. I do not use software to overclock... ASUS AI was uninstalled after I uninstalled BOINC, but it was only in the Start menu and was always closed. Thanks for the link to Western Digital; I will definitely check that out after this run of Prime95. This system has been dedicated to F@H since I built it in early spring. No surfing, no email, no viruses, no adware. It uses a firewall behind a router. I will have to look into the cygwin directory and the F: drive reference. At this time I cannot find a cygwin directory on either PC. There are only 3 drives: C:, D: (CD-RW) and E: (CD read-only). I have to get back to Talladega for the end of the race :) murky |
murky Joined: 24 Sep 06 Posts: 9 Credit: 214,896 RAC: 0 |
[quote]BennyRop: A second possibility is software problems. Does HijackThis show any programs running that you don't recognize? The error log from the 386-second error result showed a number of applications running from the C:\cygwin directory and from an F: drive. Error result (Can you temporarily disable them?)[/quote] BennyRop: regarding the C:\cygwin etc. references in resultid=41043803: looking through all that information, I was able to determine that they are not from my C: drive! Nor is the reference to the F: drive. Those are at Baker Labs! There is a reference to "jack schonbrun"; I Googled that name and it is associated with Baker Labs and Rosetta. There is also a reference to f:\rtm\vctools\crt_bld, and now I am starting to wonder if this is part of my problem. I studied my Windows event logs and found 3 occurrences of errors. Quote: Source: SideBySide, Type: Error, Event ID: 59. Resolve Partial Assembly failed for Microsoft.VC80.CRT. (I see a reference to vctools and crt in the f: directory above.) Continuing: Resolve Partial Assembly failed for Microsoft.VC80.CRT. Reference error message: The referenced assembly is not installed on your system. Explanation: A component or manifest could not be activated. Possible causes include: The component or manifest depends on another program or a component that is not installed. The manifest contains XML content that is not valid. The user does not have the correct permissions. I may be way out in left field, but I think there is some connection between the errors in the event log and the failed tasks. But this is way over my head :) Regards....murky |
dcdc Joined: 3 Nov 05 Posts: 1832 Credit: 119,821,902 RAC: 15,180 |
When I was running Filemon yesterday, I saw quite a few attempts to access F: (my computer also has no F:) and files with 'jack schonbrun' in the path. I assume these references should have been taken out before compilation? |
FluffyChicken Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0 |
Jack is the person who first set up the graphics/screensaver in the Rosetta@home program. So it sounds like bad jobs (compilations). I think it may be something about not including the manifest file; at least, that's what a Google search says. I also get the SideBySide errors on my computers, though I've not had any bad work. Team mauisun.org |
murky Joined: 24 Sep 06 Posts: 9 Credit: 214,896 RAC: 0 |
[quote]Jack is the person who first set up the graphics/screensaver in the Rosetta@home program.[/quote] Thanks to everyone for their input. Prime95 ran with no errors for 17 hours, and that is good enough for me :) I have just started memtest86 from a bootable CD and will give it a long run of all tests, as this is not a conditioning exercise. I will look further into this SideBySide, version 5.2, Symbolic Name: "MSG_SXS_Function_Call_Fail". If memtest runs without error, I may give Rosetta another try. I'm having no problems on the other system. murky |
murky Joined: 24 Sep 06 Posts: 9 Credit: 214,896 RAC: 0 |
[quote]If memtest runs without error, I may give Rosetta another try. I'm having no problems on the other system. murky[/quote] memtest86 ran for 21 hours before I shut it down, with 0 errors. This system will go back to F@H for Team Helix, and the Pentium 4 box will work on Rosetta. Thanks again for the input. murky |
©2024 University of Washington
https://www.bakerlab.org