Message boards : Number crunching : Report Problems with Rosetta Version 5.13
Author | Message |
---|---|
Snake Doctor Joined: 17 Sep 05 Posts: 182 Credit: 6,401,938 RAC: 0 |
I have a few - BOINC 5.4.9, Rosetta 5.13 GenuineIntel Intel(R) Pentium(R) M processor 1.86GHz Microsoft Windows XP Professional Edition, Service Pack 2, (05.01.2600.00) Memory 2039.37 MB cache 76.56 KB swap space 932.3 MB 65.54 GB
We must look for intelligent life on other planets, as it is becoming increasingly apparent we will not find any on our own. |
akma Joined: 11 May 06 Posts: 8 Credit: 159,246 RAC: 0 |
In case this helps, this is the error message I get every time it dies: HOMOLOG_ABRELAX_hom005_t283__505_7824_0 ( - exit code -1073741811 (0xc000000d)) Also, every time I've seen it exit, it has been during the switch between the initial and the full-atom relax stages. |
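The exit codes quoted in this thread appear in two forms, a signed decimal and a hex value in parentheses. They are the same 32-bit number: masking the decimal to 32 bits gives the hex form, which can then be looked up as a Windows status code. The sketch below is illustrative only; the helper names `to_unsigned32` and `describe_exit_code` are made up for this example and are not part of BOINC or Rosetta. The three status names are the standard Windows meanings of those values.

```python
# Illustrative helpers (hypothetical names, not part of BOINC or Rosetta).
# BOINC prints the process exit code as a signed 32-bit decimal; Windows
# documents the same value as an unsigned 32-bit hex status code.

KNOWN_STATUS = {
    0xC000000D: "STATUS_INVALID_PARAMETER",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0x40010004: "DBG_TERMINATE_PROCESS",
}


def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned value."""
    return code & 0xFFFFFFFF


def describe_exit_code(code: int) -> str:
    """Return the hex form plus a name, if the value is in the small table."""
    value = to_unsigned32(code)
    name = KNOWN_STATUS.get(value, "unknown")
    return f"0x{value:08x} ({name})"


if __name__ == "__main__":
    # Exit codes copied from posts in this thread.
    for code in (-1073741811, 1073807364):
        print(code, "->", describe_exit_code(code))
```

The lookup table only covers codes that posters report here; any other value falls through to "unknown" rather than being guessed.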
Seth Aaronson Joined: 5 Mar 06 Posts: 18 Credit: 3,976 RAC: 0 |
Here are my latest errors from the Messages tab in BOINC Manager: 5/15/2006 8:41:31 AM|rosetta@home|Unrecoverable error for result HBLR_1.0_1ogw_ROT_TRIALS_TRIE_462_10628_1 ( - exit code 1073807364 (0x40010004)) 5/15/2006 4:12:24 PM|rosetta@home|Unrecoverable error for result HOMOLOG_ABRELAX_hom006_t283__505_27635_0 ( - exit code 1073807364 (0x40010004)) 5/16/2006 12:27:44 AM|rosetta@home|Unrecoverable error for result TEST_HOMOLOG_ABRELAX_hom001_1opd__504_42274_0 ( - exit code 1073807364 (0x40010004)) The last error seems to have frozen my machine. I press Ctrl-Alt-Delete (the old three-finger salute), go into Task Manager, end the process 'rosetta_5.13_windows_intelx86.exe' as it's not responding, and exit the dialogue to debug the app; then BOINC seems to continue to crunch my other attached projects (Einstein and SETI). What's the prognosis, if any? -Seth |
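For anyone collecting these reports, the "Unrecoverable error" lines pasted from the Messages tab follow a regular pipe-separated pattern, so they can be tallied by exit code. This is a minimal sketch that assumes the line layout is exactly what appears in the posts above; the regular expression and the function name `parse_error_line` are assumptions for illustration, not a documented BOINC log format.

```python
import re
from collections import Counter

# Pattern inferred from the lines pasted in this thread (an assumption,
# not a documented BOINC format): timestamp|project|message.
LINE_RE = re.compile(
    r"^(?P<when>[^|]+)\|(?P<project>[^|]*)\|"
    r"Unrecoverable error for result (?P<result>\S+) "
    r"\( - exit code (?P<code>-?\d+) \((?P<hex>0x[0-9a-fA-F]+)\)\)$"
)


def parse_error_line(line):
    """Return a dict for an 'Unrecoverable error' line, or None otherwise."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return {
        "when": match.group("when"),
        "project": match.group("project"),
        "result": match.group("result"),
        "code": int(match.group("code")),
        "hex": match.group("hex").lower(),
    }


if __name__ == "__main__":
    # Lines copied from the post above.
    lines = [
        "5/15/2006 8:41:31 AM|rosetta@home|Unrecoverable error for result "
        "HBLR_1.0_1ogw_ROT_TRIALS_TRIE_462_10628_1 "
        "( - exit code 1073807364 (0x40010004))",
        "5/15/2006 4:12:24 PM|rosetta@home|Unrecoverable error for result "
        "HOMOLOG_ABRELAX_hom006_t283__505_27635_0 "
        "( - exit code 1073807364 (0x40010004))",
    ]
    parsed = [p for p in map(parse_error_line, lines) if p is not None]
    # Count how often each exit code appears.
    print(Counter(p["hex"] for p in parsed))
```

Lines that do not match the pattern are skipped rather than partially parsed, which keeps the tally honest if the client prints other message types.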
Jose Joined: 28 Mar 06 Posts: 820 Credit: 48,297 RAC: 0 |
More errors: This one is frustrating, as it occurred on a WU that was getting near completion (94%+ with 25 models done): https://boinc.bakerlab.org/rosetta/result.php?resultid=20185281 My results page is basically now a computing errors report. I am getting frustrated again. I have done everything within my power. I have given maintenance to my computer. I have reinstalled my operating systems, with all the hassle of having to reinstall many of the applications I NEED. I cannot buy a new computer. And forget about a Mac. Simply stated, most of the applications I use and NEED do not have a Mac equivalent. This thing is very inefficient: a computing error unit gets re-sent. That means a computer that could be doing a new work unit has to redo a WU. In the case I am just reporting, redo a WU that was more than 94% complete. ARGH!!!! "This and no other is the root from which a Tyrant springs; when he first appears he is a protector." Plato |
Jose Joined: 28 Mar 06 Posts: 820 Credit: 48,297 RAC: 0 |
Another error!!!! This is how my results page looks now: https://boinc.bakerlab.org/rosetta/results.php?userid=69098 The last error happened while I was using my word processor application. The way this is going, it seems that BOINC and/or Rosetta is becoming a "Windows XP need not apply" application combo. This is getting to the point where I am basically being forced to choose between being able to use my computer for the purposes I need and running Rosetta. I am sad to say that if I am forced to make that choice, I will have to stop running Rosetta. I am getting so annoyed that I am running the risk of being unfair, but it seems that the developers do not consider the "conflict" issue important enough to spend time finding a solution. This sounds harsh. But this is how I am feeling. "This and no other is the root from which a Tyrant springs; when he first appears he is a protector." Plato |
Jose Joined: 28 Mar 06 Posts: 820 Credit: 48,297 RAC: 0 |
Four more new errors. Should the last WU I have in queue fail, I will not be accepting more Rosetta work units until the issue of the 107 errors is solved. Simply stated, I am being forced to choose between using my computer for my daily tasks while watching the Rosetta WUs keep failing (some of them within seconds of starting) and having my computer as a Rosetta-only computer. (That is not a choice.) Rosetta has become for me the computational equivalent of Russian roulette. "This and no other is the root from which a Tyrant springs; when he first appears he is a protector." Plato |
Cureseekers~Kristof Joined: 5 Nov 05 Posts: 80 Credit: 689,603 RAC: 0 |
After they implemented the watchdog function, it seems that 99% of the previous errors are gone. You are the only one who regularly posts errors here. Seeing your error rate, I guess there has to be something wrong with your PC. * Is your PC overclocked? If yes, try to reset it to the normal values. * Do you use an optimized BOINC client? If yes, try the default. * Also try running a memory test (http://www.memtest.org/#downiso). Member of Dutch Power Cows |
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
Jose, you could find out if it's the Rosetta science app or your puter by attaching to a different project and running a few wus. |
Jose Joined: 28 Mar 06 Posts: 820 Credit: 48,297 RAC: 0 |
"After they implemented the watchdog function, it seems that 99% of the previous errors are gone. You are the only one who regularly posts errors here." My PC is NOT overclocked. I do not use an optimized client. I have updated and maintained my computer, and it started producing complete WUs, and now ALL I am getting is the deluge of errors... I have done everything I can to my computer. All I will do now is this: should I get another 107 error, I will leave Rosetta. Don't delude yourself into thinking that because I am the only one reporting, I am the only one having problems. Maybe I am one of the few who CARE enough to report. Wasting my time is something I don't like, and this thing is turning out to be a waste of time. "This and no other is the root from which a Tyrant springs; when he first appears he is a protector." Plato |
pieface Joined: 20 Sep 05 Posts: 17 Credit: 797,661 RAC: 0 |
Lost a Rosetta 5.13 unit overnight: 20297842. Running BM 5.4.9 on Win XP, hit one of those 0xc0000005 errors. When I looked at the machine this morning, there were several dialog boxes saying Rosetta was trying to connect to the internet / DNS server (Norton Internet Security). I don't know if they were related to this unit or one of the others that finished overnight, though. |
Moderator9 Volunteer moderator Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
"...My PC is NOT overclocked." Jose, I have spent a number of hours over the last few days looking specifically at the issues you are having. Personally, I do not believe the problem is specifically any one element of your hardware, Windows, BOINC or Rosetta by themselves. There is some other process running on the system (probably in the background) that is attempting to terminate Rosetta for some reason. Here is a quote from an e-mail I received from a Windows/BOINC developer on the "-107" error type issue - "This error code should only be triggered by an external process that is This implies that something else is going on with your system. For the most part, your errors are very consistent and in the same location. It is my belief that the problem is somehow related to the graphics functions of your system, but this is only a guess. The programmers are looking at these errors right now. While I understand your frustration, please understand that these things cannot be fixed instantly, and in many cases it can take some time to track them down even if you are sitting right at the computer. In this case there are people from all over the world (literally) looking at and trying to solve this particular problem for you. But it is not easy to do it remotely. The next release of the software is closer than you think, and it has been running well on RALPH. Give us at least another 48 hours to solve this. What I would try, if I were there to do it, is to turn off ALL screen saver functions in Windows completely (including Windows screen savers). I would then cold start the system (power off, restart). Then I would NOT run any graphics for any of the BOINC projects. I would wait to see if I get any errors under those conditions, but I would also keep track of what I am doing other than BOINC while I am running this test. If I then had a problem, I would perhaps be able to determine whether some other process (other than BOINC/Rosetta) was involved.
Moderator9 ROSETTA@home FAQ Moderator Contact |
Simon Walker Joined: 17 Oct 05 Posts: 3 Credit: 459,592 RAC: 0 |
I'm seeing problems with 2 computers, both running XP; neither is overclocked or running any optimised clients. One is an FX-53 with 2 GB of memory and the other is an AMD 64 X2 4400 with 4 GB of memory. The latter (my work PC) was doing nothing except BOINC and getting mail (Outlook open), yet managed to come up with: 16/05/2006 16:41:22|rosetta@home|Unrecoverable error for result HOMOLOG_ABRELAX_hom004_t283__505_31515_0 ( - exit code -1073741811 (0xc000000d)) 16/05/2006 16:41:22|rosetta@home|Deferring scheduler requests for 1 minutes and 0 seconds 16/05/2006 16:41:22||Rescheduling CPU: application exited 16/05/2006 16:41:22|rosetta@home|Computation for task HOMOLOG_ABRELAX_hom004_t283__505_31515_0 finished The other machine, the FX-53, has been having Rosetta problems for quite a while now, and its user (the wife) is getting close to getting it deleted. Active PCs: https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=145422 https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=193752 Results: https://boinc.bakerlab.org/rosetta/results.php?userid=5150 |
KWSN Sir Clark Joined: 18 Sep 05 Posts: 46 Credit: 387,432 RAC: 0 |
Using BOINC 5.4.9 and Rosetta 5.13. WU name: CASP_HOMOLOG_ABRELAX_hom001_t287__507_11587 (https://boinc.bakerlab.org/rosetta/workunit.php?wuid=16862883) Stuck at 1.04% after 1 hour despite having a preference setting of 1 hr. It's still crunching... If it hasn't changed by 90 min of crunch time, I'm going to abort it. In the screensaver it shows Accepted RMSD: ? Not sure whether this is a bug. Edit: It just crashed. <core_client_version>5.4.9</core_client_version> <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> # random seed: 2215394 No heartbeat from core client for 31 sec - exiting # random seed: 2215394 No heartbeat from core client for 31 sec - exiting # random seed: 2215394 # cpu_run_time_pref: 3600 </stderr_txt> |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
"In the screensaver it shows Accepted RMSD: ?" Not a bug. It means you are exploring unknown territory with one of the proteins from the CASP contest. I described my understanding of RMSD way down here in this thread. Hope this helps. PS: A WU that takes more than an hour is very common, and giving it another 30 min. isn't always going to help. In this case, it errored out. But in the future, look at the graphic and the steps and models crunched. You have to complete 1 model before the WU will be able to send anything back. And with a 1 hr time preference, you will often run over, and often get only the one model completed. Pay no attention to the progress %; it will say 1% something during all of model 1... and with your short time preference, will then often zip to 100% when it reaches the end of model one. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
KWSN Sir Clark Joined: 18 Sep 05 Posts: 46 Credit: 387,432 RAC: 0 |
I've upped it to 2 hrs and will see what happens |
senatoralex85 Joined: 27 Sep 05 Posts: 66 Credit: 169,644 RAC: 0 |
I am not sure if this is a problem with Version 5.13 or the new CASP workunits. Lately, I am finding that the workunits get stuck at 1.04 percent for an unknown amount of time (similar to Clark). I returned about 1.5 hours later and it was at 77 percent. The workunit reported successfully and I did not have to abort it. I think there may be a problem with the progress indicator under the workunit tab. I have not had this problem with Rosetta until my machine was sent a CASP workunit. I am using BOINC version 4.45. |
BennyRop Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
Jose: Download a program called HiJackThis! from here: MajorGeeks download site. When you run it, select the option that creates the log. Post the HiJackThis! log here, and we can see what's running in the background, and hopefully help the programmers identify what could be causing the 107 errors. |
Jose Joined: 28 Mar 06 Posts: 820 Credit: 48,297 RAC: 0 |
I am running another BOINC application. Let's see what happens there. I am not running Rosetta. I cannot take more frustration for the moment. Nor can I take more aggravation. I tried everything you suggested: all but the Mac. So I will stop being the odd person out and leave the project to those who can actually run it and process applications without the humongous quantity of errors I got. I seriously doubt the conflicts will be solved. Since I am a CASP 7 observer, I will keep track of Team Rosetta's progress. I wish you all success. "This and no other is the root from which a Tyrant springs; when he first appears he is a protector." Plato |
Moderator9 Volunteer moderator Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
"I am not sure if this is a problem with Version 5.13 or the new CASP workunits. Lately, I am finding that the workunits get stuck at 1.04 percent for an unknown amount of time (similar to Clark). I returned about 1.5 hours later and it was at 77 percent. The workunit reported successfully and I did not have to abort it. I think there may be a problem with the progress indicator under the workunit tab. I have not had this problem with Rosetta until my machine was sent a CASP workunit. I am using BOINC version 4.45." This is normal Rosetta behavior. The work unit reaches 1.4% complete during the initialization process. It will then stay there until it completes the first model. Depending on your time setting and the size of the protein being examined, the percent complete can do a variety of different things. If the first model takes longer than your time setting, it will jump from 1.4% to 100% complete in one leap. If the first model takes only half of your time setting, the percent will jump to 50%. It progresses in that way. There is a lot more detail in the FAQs linked in my signature. But what you are seeing is completely normal. You may not have noticed that the time to completion rises as the CPU time rises. When the percent complete jumps forward, the time remaining falls back to a more accurate number as well. This is also normal. Moderator9 ROSETTA@home FAQ Moderator Contact |
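The percent-complete behavior Moderator9 describes can be written down as a small function. This is only one reading of the description, not Rosetta's actual code: the function name `progress_fraction`, its parameters, and the placeholder value shown before the first model finishes are all assumptions made for the sketch (posters report seeing roughly 1% during that phase).

```python
def progress_fraction(models_done, cpu_seconds, run_time_pref,
                      init_fraction=0.0104):
    """Approximate the displayed progress as described in the thread.

    models_done   -- number of models finished so far
    cpu_seconds   -- CPU time used so far
    run_time_pref -- the user's run time preference, in seconds
    init_fraction -- assumed placeholder shown until model 1 completes
    """
    if models_done == 0:
        # No model has finished yet, so the display sits at the placeholder.
        return init_fraction
    # After a model completes, progress is time used over the time setting,
    # capped at 100%.
    return min(1.0, cpu_seconds / run_time_pref)


if __name__ == "__main__":
    one_hour = 3600
    # Still working on the first model.
    print(progress_fraction(0, 3000, one_hour))
    # First model took half the time setting.
    print(progress_fraction(1, 1800, one_hour))
    # First model overran the time setting.
    print(progress_fraction(1, 5000, one_hour))
```

Under this reading, a long first model combined with a short time preference explains the jump straight from about 1% to 100% that several posters noticed.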
BennyRop Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
When you're seeing reports of 1.x percent done, it's not an actual percentage. Those are specific points in the program, and it's informing us where it is, such as point 1.040, 1.042, etc. Once it finishes a model, it figures out how much time it took, looks at your time setting, and then determines the percentage done and whether there's time to create another model. Since some of the CASP7 models are taking much longer than we're used to, we're much more likely to see 4, 6, or 8 hour WUs on our machines, and they'll click slowly through the 1.x percent done messages, finish the model, notice that it's over the time limit, pop up a message about being 100% complete, and then send the WU back to the lab. |
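The "finish a model, then decide whether there is time for another" loop described above can be simulated in a few lines. The stopping rule used here (stop when the time used plus the average time per model would exceed the time setting) is an assumption chosen for illustration; the thread does not state the exact rule, and the function name `run_work_unit` is hypothetical.

```python
def run_work_unit(model_seconds, run_time_pref):
    """Simulate how many models a work unit completes.

    model_seconds -- list of durations (seconds) each model would take
    run_time_pref -- the user's run time preference, in seconds
    Returns (models_done, elapsed_seconds).
    """
    elapsed = 0.0
    done = 0
    for duration in model_seconds:
        # A model always runs to completion once started, so the first
        # model finishes even if it overruns the time setting.
        elapsed += duration
        done += 1
        # Assumed rule: only start another model if an average-length
        # model would still fit inside the time setting.
        average = elapsed / done
        if elapsed + average > run_time_pref:
            break
    return done, elapsed


if __name__ == "__main__":
    # A slow CASP-style model with a 1 hour setting.
    print(run_work_unit([5000, 5000], 3600))
    # Fast 15 minute models with the same setting.
    print(run_work_unit([900] * 10, 3600))
```

With a one-hour setting, a single model that takes longer than an hour still completes and is reported, which matches the long-running work units posters describe.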