Message boards : Number crunching : Report Problems with Rosetta Version 5.12
Author | Message |
---|---|
Moderator9 Volunteer moderator Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
Rosetta@Home Version 5.12 has been released! Please DO NOT abort Work Units from previous versions if they are running well. The science from them is still important to the project. The new software and Work Units for it will download automatically. This thread is for reporting errors related to Rosetta application version 5.12. When reporting errors please include a link to the results from any related Work Units. As always, Credit will be awarded for failed Work units on a DAILY Basis. For errors related to Version 5.07 continue to use this thread. For information on the new Version and what it is supposed to do see this post. For a message from Dr. Baker about the work unit runs in preparation for CASP see his journal entry here. For important project information updates see This Thread (you may subscribe to the thread for e-mail updates) Moderator9 ROSETTA@home FAQ Moderator Contact |
David Emigh Joined: 13 Mar 06 Posts: 158 Credit: 417,178 RAC: 0 |
Looks like I'm the first to report an error on v5.12...
Computer ID = 182506
5/10/2006 10:31:38 AM|rosetta@home|Unrecoverable error for result JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_1291_0 ( - exit code -1073741811 (0xc000000d))
5/10/2006 10:31:43 AM|rosetta@home|Unrecoverable error for result JUMP_ALLBARCODES_ANTIPARALLEL_1tul__SAVE_ALL_OUT_491_1333_0 ( - exit code -1073741811 (0xc000000d))
Rosie, Rosie, she's our gal, If she can't do it, no one shall! |
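In these error lines BOINC prints the same exit status twice: once as a signed 32-bit decimal (-1073741811) and once in hex (0xc000000d). Reading the signed value as an unsigned 32-bit integer gives the hex form, and 0xC000000D is the Windows NTSTATUS code STATUS_INVALID_PARAMETER. A minimal Python sketch of that conversion follows; the function names and the one-entry lookup table are hypothetical, and the code only decodes the number, it says nothing about what triggered the error in 5.12.

```python
# Hypothetical one-entry table; 0xC000000D is STATUS_INVALID_PARAMETER
# in the Windows NTSTATUS code list.
KNOWN_STATUS = {0xC000000D: "STATUS_INVALID_PARAMETER"}


def to_hex_status(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned 8-digit hex value."""
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


def describe(exit_code: int) -> str:
    """Show the signed code, its unsigned hex form, and a name if known."""
    unsigned = exit_code & 0xFFFFFFFF
    name = KNOWN_STATUS.get(unsigned, "unknown")
    return f"{exit_code} -> {to_hex_status(exit_code)} ({name})"


print(describe(-1073741811))
```

Masking with 0xFFFFFFFF is the usual Python idiom for the signed-to-unsigned reinterpretation, since Python integers have no fixed width.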
Stu D. Joined: 3 Mar 06 Posts: 8 Credit: 575,867 RAC: 0 |
5/10/2006 12:16:23 PM|rosetta@home|Unrecoverable error for result JUMP_NATIVEBARCODE_ANTIPARALLEL_1tul__SAVE_ALL_OUT_492_877_0 ( - exit code -1073741811 (0xc000000d))
5/10/2006 12:16:19 PM|rosetta@home|Unrecoverable error for result JUMP_ALLBARCODES_ANTIPARALLEL_1tul__SAVE_ALL_OUT_491_4396_0 ( - exit code -1073741811 (0xc000000d))
Computer...175527
Stu |
Stu D. Joined: 3 Mar 06 Posts: 8 Credit: 575,867 RAC: 0 |
5/10/2006 4:32:33 PM|rosetta@home|Unrecoverable error for result JUMP_ALLBARCODES_ANTIPARALLEL_1tul__SAVE_ALL_OUT_491_9050_0 ( - exit code -1073741811 (0xc000000d))
5/10/2006 4:32:38 PM|rosetta@home|Unrecoverable error for result JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_7224_0 ( - exit code -1073741811 (0xc000000d))
Computer...175527 AMD x2 3800
No problem like this with prior versions.
Stu |
andreas Joined: 22 Nov 05 Posts: 2 Credit: 10,465 RAC: 0 |
Trying to resume Rosetta@home after the machine was offline for over a month, I get only errors:
2006-05-10 13:13:11 [rosetta@home] execv(../../projects/boinc.bakerlab.org_rosetta/rosetta_5.12_i686-pc-linux-gnu) failed: error -1
2006-05-10 13:13:11 [rosetta@home] Starting result HOMO_7486_h008_1_LOOPRLX_IGNORE_THE_REST7486h008_dec452_1.pdb_501_7_0 using rosetta version 512
2006-05-10 13:13:12 [rosetta@home] Unrecoverable error for result HOMO_7486_h008_1_LOOPRLX_IGNORE_THE_REST7486h008_dec452_1.pdb_501_7_0 (process exited with code 26 (0x1a))
2006-05-10 13:13:12 [rosetta@home] Unrecoverable error for result HOMO_7486_h008_1_LOOPRLX_IGNORE_THE_REST7486h008_dec452_1.pdb_501_7_0 (process exited with code 26 (0x1a))
2006-05-10 13:13:12 [---] request_reschedule_cpus: process exited
2006-05-10 13:13:12 [rosetta@home] Computation for result HOMO_7486_h008_1_LOOPRLX_IGNORE_THE_REST7486h008_dec452_1.pdb_501_7_0 finished
2006-05-10 13:13:12 [rosetta@home] Starting result HOMO_7486_h008_1_LOOPRLX_IGNORE_THE_REST7486h008_dec320_1.pdb_501_7_0 using rosetta version 512
2006-05-10 13:13:12 [rosetta@home] execv(../../projects/boinc.bakerlab.org_rosetta/rosetta_5.12_i686-pc-linux-gnu) failed: error -1
2006-05-10 13:13:13 [rosetta@home] Unrecoverable error for result HOMO_7486_h008_1_LOOPRLX_IGNORE_THE_REST7486h008_dec320_1.pdb_501_7_0 (process exited with code 26 (0x1a))
2006-05-10 13:13:13 [rosetta@home] Unrecoverable error for result HOMO_7486_h008_1_LOOPRLX_IGNORE_THE_REST7486h008_dec320_1.pdb_501_7_0 (process exited with code 26 (0x1a))
2006-05-10 13:13:13 [---] request_reschedule_cpus: process exited
2006-05-10 13:13:13 [rosetta@home] Computation for result HOMO_7486_h008_1_LOOPRLX_IGNORE_THE_REST7486h008_dec320_1.pdb_501_7_0 finished
For now, I have suspended this project. |
Moderator9 Volunteer moderator Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
> trying to resume Rosetta@home after the machine was offline for over a month, I
This is a file error related to an open file. Since you have been offline for a while, try resetting the project.
Moderator9 ROSETTA@home FAQ Moderator Contact |
Kerwin Joined: 19 Sep 05 Posts: 10 Credit: 1,773,393 RAC: 0 |
I was looking at the graphics for 5.12 when they suddenly stopped, the computer froze, and my screen went blank. Then it reappeared with a message from my Catalyst driver stating that the video card was no longer accepting commands from the driver and had to be reset. Once this error message popped up, everything went back to normal, graphics and all. I'm using an ATI X850 XT. This is the second time this has happened. It also happened with a result using 5.07, with the same outcome.
Edit: I continued watching the graphics and it happened again, but this time 4 times in a row, a few seconds apart. By the fourth time, the video card stopped responding and the driver reverted to software rendering. At this point I chose to restart my machine. This happened 4.70% into JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_4697_0 |
Moderator9 Volunteer moderator Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
> I was looking at the graphics for 5.12 and suddenly they stopped, the computer froze, my screen went blank and all of a sudden it reappeared with a message from my Catalyst driver stating that the video card was no longer accepting commands from the driver and that it had to be reset. Once this error message popped up, everything went back to normal, graphics and all.
This is very interesting, and it is the first time I have seen a report of this behavior. There is another thread here with a discussion of memory and graphics software issues. You should add this report to that thread as well, as it may be a related issue.
Moderator9 ROSETTA@home FAQ Moderator Contact |
andreas Joined: 22 Nov 05 Posts: 2 Credit: 10,465 RAC: 0 |
> trying to resume Rosetta@home after the machine was offline for over a month, I
I reset this project and it upgraded itself to version 5.13, which seems to be running fine so far. |
dag Joined: 16 Dec 05 Posts: 106 Credit: 1,000,020 RAC: 0 |
JUMP_ALLBARCODES_ANTIPARALLEL_1tul__SAVE_ALL_OUT_491_23413_0
https://boinc.bakerlab.org/rosetta/result.php?resultid=19786764
Got to 100% and then just sat there, and sat there, and ... Got an interesting error, though:
*** glibc detected *** corrupted double-linked list: 0x0b4af0f8 ***
dag
--Finding aliens is cool, but understanding the structure of proteins is useful. |
Moderator9 Volunteer moderator Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
> JUMP_ALLBARCODES_ANTIPARALLEL_1tul__SAVE_ALL_OUT_491_23413_0
Well, you should get full credit tonight (check the result display for the Work Unit, not your Work Units list display). But that message bothers me a bit. It could be caused by two or three things, the most serious being a problem with your hard disk. Keep an eye on your system and see if you get any more of these.
Moderator9 ROSETTA@home FAQ Moderator Contact |
hugothehermit Joined: 26 Sep 05 Posts: 238 Credit: 314,893 RAC: 0 |
This computer: 16420818 16420797 16420710
BOINC ver 5.4.9, Win XP Home SP2, 3.00 GHz, 1GB RAM
12/05/2006 11:28:04 AM||Suspending network activity - user request - running CPU benchmarks
12/05/2006 11:28:06 AM||Running CPU benchmarks
12/05/2006 11:29:05 AM||Benchmark results:
12/05/2006 11:29:05 AM|| Number of CPUs: 2
12/05/2006 11:29:05 AM|| 1311 floating point MIPS (Whetstone) per CPU
12/05/2006 11:29:05 AM|| 1209 integer MIPS (Dhrystone) per CPU
12/05/2006 11:29:05 AM||Finished CPU benchmarks
12/05/2006 11:29:06 AM|rosetta@home|Resuming task JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_16829_0 using rosetta version 512
12/05/2006 11:29:06 AM|rosetta@home|Resuming task JUMP_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_16802_0 using rosetta version 512
12/05/2006 11:29:06 AM||Resuming computation
12/05/2006 11:29:06 AM||Rescheduling CPU: Resuming computation
12/05/2006 12:58:06 PM|rosetta@home|Aborting task JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_16829_0: exceeded disk limit: 101397335.000000 > 100000000.000000
12/05/2006 12:58:06 PM|rosetta@home|Unrecoverable error for result JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_16829_0 (Maximum disk usage exceeded)
12/05/2006 12:58:06 PM|rosetta@home|Deferring scheduler requests for 1 minutes and 0 seconds
12/05/2006 12:58:08 PM||Rescheduling CPU: application exited
12/05/2006 12:58:08 PM|rosetta@home|Computation for task JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_16829_0 finished
12/05/2006 12:58:08 PM|rosetta@home|Starting task JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_16836_0 using rosetta version 512
12/05/2006 1:38:25 PM||Rescheduling CPU: application exited
12/05/2006 1:38:25 PM|rosetta@home|Computation for task JUMP_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_16802_0 finished
12/05/2006 1:38:25 PM|rosetta@home|Starting task JUMP_ALLBARCODES_ANTIPARALLEL_1tul__SAVE_ALL_OUT_491_16837_0 using rosetta version 512
12/05/2006 9:28:15 PM|rosetta@home|Aborting task JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_16836_0: exceeded disk limit: 100258025.000000 > 100000000.000000
12/05/2006 9:28:15 PM|rosetta@home|Unrecoverable error for result JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_16836_0 (Maximum disk usage exceeded)
12/05/2006 9:28:15 PM|rosetta@home|Deferring scheduler requests for 1 minutes and 0 seconds
12/05/2006 9:28:17 PM||Rescheduling CPU: application exited
12/05/2006 9:28:17 PM|rosetta@home|Computation for task JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_16836_0 finished
12/05/2006 9:28:17 PM|rosetta@home|Starting task JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_16800_0 using rosetta version 512
13/05/2006 7:38:36 AM|rosetta@home|Aborting task JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_16800_0: exceeded disk limit: 102689604.000000 > 100000000.000000
13/05/2006 7:38:36 AM|rosetta@home|Unrecoverable error for result JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_16800_0 (Maximum disk usage exceeded)
13/05/2006 7:38:36 AM|rosetta@home|Deferring scheduler requests for 1 minutes and 0 seconds
13/05/2006 7:38:38 AM||Rescheduling CPU: application exited
13/05/2006 7:38:38 AM|rosetta@home|Computation for task JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_16800_0 finished
13/05/2006 7:38:38 AM|rosetta@home|Starting task JUMP_ALLBARCODES_ANTIPARALLEL_1tul__SAVE_ALL_OUT_491_16815_0 using rosetta version 512
Edit to add highlighting |
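Each "exceeded disk limit" line compares the task's disk usage in bytes against a 100,000,000-byte limit, and the task is aborted once usage goes over it. The Python sketch below reproduces that comparison, assuming it is a simple greater-than test on byte counts as the log wording suggests; check_disk_usage and overage_mb are hypothetical names, not BOINC functions. The usage figures are the ones from the three aborted tasks in the log above.

```python
# Sketch of the disk-limit check implied by the log lines.
# Assumption: a plain "usage > limit" comparison in bytes; the function
# names below are hypothetical and are not part of BOINC.
DISK_LIMIT_BYTES = 100_000_000.0  # limit printed in the log (100 MB)


def check_disk_usage(task: str, usage_bytes: float, limit: float = DISK_LIMIT_BYTES):
    """Return an abort message formatted like the log if usage exceeds the limit, else None."""
    if usage_bytes > limit:
        return f"Aborting task {task}: exceeded disk limit: {usage_bytes:f} > {limit:f}"
    return None


def overage_mb(usage_bytes: float, limit: float = DISK_LIMIT_BYTES) -> float:
    """How far past the limit the task went, in megabytes (10**6 bytes)."""
    return (usage_bytes - limit) / 1e6


# Disk usage reported for the aborted 5.12 tasks (keyed by task number).
observed = {
    "16829": 101397335.0,
    "16836": 100258025.0,
    "16800": 102689604.0,
}

for task, usage in observed.items():
    print(check_disk_usage(task, usage))
    print(f"  over by {overage_mb(usage):.2f} MB")
```

For these three tasks the overshoot ranges from about 0.26 MB to 2.7 MB.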
Moderator9 Volunteer moderator Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
> this computer
Hugo, you are currently working with Rosetta version 5.12. There is a problem in the BOINC API that causes the errors you are seeing. They will stop as soon as you get Rosetta version 5.13. Version 5.12 was only available for about 4 hours, but it seems your queue filled up with Work Units during that time. If you reset the project you will get the new version. If you have work you have not started, suspend those Work Units and let the ones you have started complete before you do the rest, so you will not lose the time. Then, when all the active Work Units have been reported, do the reset. That should load version 5.13 and fix the problem.
Moderator9 ROSETTA@home FAQ Moderator Contact |
Darren Joined: 6 Oct 05 Posts: 27 Credit: 43,535 RAC: 0 |
I've also got an "exceeded disk limit" error with 5.12. This is on a Gentoo Linux system, so the Windows debug code shouldn't be the problem.
Sat May 13 00:41:08 2006|rosetta@home|Aborting result JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_16783_0: exceeded disk limit: 103092700.000000 > 100000000.000000
Sat May 13 00:41:08 2006|rosetta@home|Unrecoverable error for result JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_16783_0 (Maximum disk usage exceeded)
Sat May 13 00:41:09 2006||request_reschedule_cpus: process exited
Sat May 13 00:41:09 2006|rosetta@home|Computation for result JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_16783_0 finished
It's this work unit, which is my only 5.12 work unit; the next one is 5.13. |
Moderator9 Volunteer moderator Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
> I've also got an error with 5.12 for "exceeded disk limit". This is on a Gentoo Linux system, so the windows debug code shouldn't be the problem.
Actually, the debug issue was in 5.12 and was a conflict with BOINC. While it was more prominent on Windows machines, we also saw it on some Linux machines. If you are clear of 5.12 work units, that should end the problem. 5.13 and up should be OK.
Moderator9 ROSETTA@home FAQ Moderator Contact |
senatoralex85 Joined: 27 Sep 05 Posts: 66 Credit: 169,644 RAC: 0 |
I still have a Version 5.12 work unit suspended in my queue until I finish crunching workunits for LHC. Should I abort this workunit? |
Moderator9 Volunteer moderator Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
> I still have a Version 5.12 work unit suspended in my queue until I finish crunching workunits for LHC. Should I abort this workunit?
Not if you can run it in time to report the result. No matter what happens you will get credit, so I would let it run.
Moderator9 ROSETTA@home FAQ Moderator Contact |
Delk Joined: 20 Feb 06 Posts: 25 Credit: 995,624 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=19754192
checkpoint CPU time: 28235.070000
current CPU time: 28241.740000
fraction done: 1.000000
https://boinc.bakerlab.org/rosetta/result.php?resultid=19771884
checkpoint CPU time: 28621.150000
current CPU time: 28626.950000
fraction done: 1.000000
Both of these are JUMP_ALLBARCODES_* work units and are the first real problems I've noticed since the recent Rosetta updates added the watchdog. They differed from past errors in two ways. First, fraction done is exactly 1.00 even though checkpoints had been written. Second, both were completely frozen (on two different Linux servers); by frozen I mean that not even the CPU time was increasing, unlike the stuck WUs of past Rosetta versions. The result URLs both show glibc errors, which is a first on both of these servers to the best of my knowledge, and since both WUs are JUMP_ALLBARCODES_* I figure this is not a coincidence. In the end I aborted both manually. Since I don't cache WUs, the time sent shows that each of them had been stuck for 5 days. |
Moderator9 Volunteer moderator Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
> https://boinc.bakerlab.org/rosetta/result.php?resultid=19754192
Well, you likely won't see any more of these with the new version.
Moderator9 ROSETTA@home FAQ Moderator Contact |
©2024 University of Washington
https://www.bakerlab.org