Message boards : Number crunching : Should I abort work unit?
Author | Message |
---|---|
Ed and Harriet Griffith Joined: 17 Sep 05 Posts: 39 Credit: 1,905,063 RAC: 842 | Normally Rosetta work units last between one and six hours, but I have one which has run for over 11 hours, is still working, and still shows only 1% done. It does not look right, but I hate to abort a unit if there is science there. Any advice? Should I continue to 24 hours before aborting? I run a 1.8 GHz computer with 496 MB RAM. |
adrianxw Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 23 | Have you tried stopping your BOINC core client and then restarting it? That often gets a stuck Rosetta WU going again. Also, checking the "Leave in memory" box in your application setup helps. |
STE\/E Joined: 17 Sep 05 Posts: 125 Credit: 4,102,177 RAC: 216 | If I can help it, I don't let any Rosetta WU run for more than 70-80 minutes at 1%. If it gets that far, I either shut BOINC down and restart it or I just abort it. It's a total waste of CPU time to let these WUs run hour after hour and never get past the 1% mark ... IMO |
Osku87 Joined: 1 Nov 05 Posts: 17 Credit: 280,268 RAC: 0 | I had the same problem, and restarting the BOINC client helped. I had run it for about 5 hours on a Sempron 2800+. After restarting the client, the CPU time reset. Now it is running properly and is at 70% after 1 hour and 9 minutes. |
Beatminister Joined: 23 Oct 05 Posts: 7 Credit: 206,304 RAC: 0 | |
Alan Bridgewater Joined: 24 Nov 05 Posts: 2 Credit: 10,205 RAC: 0 | I've noticed twice now that when I'm fooling around with settings like Suspending/Unsuspending tasks, Rosetta seems to have a problem with it. When I unsuspend the Rosetta WU, the % stays the same and the "To Completion" time increases by one second every 5 seconds. After a few hours of that, I just end up killing the work unit, which I don't like to do, but I'm not going to have my machine wasting its time and never completing the WU. Furthermore, after I deleted the work unit and got another one, I believe it went into the same state. I had to delete the Rosetta files and download another work unit, and then it worked perfectly. Until the next time I was fooling around with suspending/unsuspending work. |
Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,695,450 RAC: 11 | "Suspending/Unsuspending tasks": If you have "leave applications in memory when preempted" set to "no", I'd expect what you're describing (or worse). If that setting is "yes", then this is something the developers need to take a look at. |
Alan Bridgewater Joined: 24 Nov 05 Posts: 2 Credit: 10,205 RAC: 0 | Interesting. I checked my settings, and "No" was selected. My client was just in that state not long ago, but I decided to let it go for a few more hours and it somehow corrected itself and completed the WU. Anyway, I've changed the setting to "Yes", and will change that setting for all the other clients as well. Hopefully the problem won't happen again. Thanks for the tip. |
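
The rule of thumb discussed in this thread (a work unit still at about 1% after roughly 70-80 minutes is probably stuck; restart the BOINC client first and abort only if that does not help) can be written down as a small check. This is only a sketch: `TaskSnapshot`, `looks_stuck`, and `suggest_action` are hypothetical names invented for illustration, not part of BOINC or Rosetta, and the 1% and 80-minute defaults are simply the figures quoted above, which you may want to adjust for a slower machine.

```python
from dataclasses import dataclass


@dataclass
class TaskSnapshot:
    """Hypothetical record of one work unit's state (not a BOINC type)."""
    name: str
    fraction_done: float  # 0.0 to 1.0, e.g. 0.01 for "1% done"
    cpu_minutes: float    # CPU time accumulated so far, in minutes


def looks_stuck(task, max_fraction=0.01, limit_minutes=80.0):
    """Return True if the task is still at or below ~1% after the time limit."""
    return task.fraction_done <= max_fraction and task.cpu_minutes >= limit_minutes


def suggest_action(task, restarted_already=False):
    """Suggest a next step, following the advice given in the thread."""
    if not looks_stuck(task):
        return "keep running"
    # First remedy suggested in the thread: stop and restart the BOINC client.
    if not restarted_already:
        return "restart BOINC client"
    # Still stuck after a restart: aborting is the remaining option.
    return "abort work unit"


if __name__ == "__main__":
    # The original poster's case: 11 hours of work, still at 1%.
    stuck = TaskSnapshot("rosetta_wu", fraction_done=0.01, cpu_minutes=11 * 60)
    print(suggest_action(stuck))
    print(suggest_action(stuck, restarted_already=True))
```

The check only decides what to suggest; reading the progress and CPU time from the client and actually restarting or aborting are left to the user, since those steps depend on the BOINC version and setup.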
©2024 University of Washington
https://www.bakerlab.org