Message boards : Number crunching : Problem with WU?
Author | Message |
---|---|
tng* Send message Joined: 28 Oct 05 Posts: 14 Credit: 5,389,798 RAC: 0 |
Happened to notice that a WU has used almost 5 hours of CPU time (as opposed to a previous maximum of about 2) and has spent at least half an hour at 100% complete. On inspection, I see that this one was reissued to me after another machine missed the deadline. I tried exiting BOINC and restarting it, but it's still the same. Here's the WU: 141080. Machine is a 1 GHz P3, XP SP2, BOINC 5.2.2. I'm new to this project, but from what I can tell this isn't normal, and the fact that somebody else has already missed the deadline on this one causes me some concern (although that host has only completed 1 WU, the user's host list leads me to believe that one of his systems could be stuck on the same WU for a month and be overlooked). If it's like the other WU issued to that machine at the same time, this will be longer than the others my machine has run, but I'm still concerned. I would appreciate feedback on whether this sort of behavior is normal, and if not, should I just kill that WU or let it go so somebody can investigate? |
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
See this thread |
tng* Send message Joined: 28 Oct 05 Posts: 14 Credit: 5,389,798 RAC: 0 |
|
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
This is one of the larger WUs. Give it another hour and if it is still stuck email me the stdout.txt file to dekim at u.washington.edu |
tng* Send message Joined: 28 Oct 05 Posts: 14 Credit: 5,389,798 RAC: 0 |
"This is one of the larger WUs. Give it another hour and if it is still stuck email me the stdout.txt file to dekim at u.washington.edu" It finally finished (6:16:47 CPU, at least 1:45 at 100%). Next time I'll know to be more patient. |
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
"It finally finished (6:16:47 CPU, at least 1:45 at 100%). Next time I'll know" It normally shouldn't get stuck at 100% for that long, but there is a bug in the app that may cause this if the job is restarted when it is almost finished. |
Tern Send message Joined: 25 Oct 05 Posts: 576 Credit: 4,695,177 RAC: 21 |
"It normally shouldn't get stuck at 100% for that long but there is a bug in the app that may cause this if the job is restarted when it is almost finished." I have one that was at 14 hours or so (slow G3 iBook) this morning, and at 100%. Last night it was at 91% and 10 hours. I come back now and it's still at 100%, showing 17 hours, preempted by SETI at the moment. Apps are left in memory, so it shouldn't be getting reset, and "switch between" is an hour. I've suspended SETI, I'll give it another couple of hours... it's "1btn__abrelax_10472_1" if that helps. WU id 44502, which also was not returned by another user. |
Tern Send message Joined: 25 Oct 05 Posts: 576 Credit: 4,695,177 RAC: 21 |
"it's still at 100%, showing 17 hours" Now at 19:42, still at 100%... stdout has several "warnings" in it, tons of data, nothing meaningful or particularly enlightening. |
Tern Send message Joined: 25 Oct 05 Posts: 576 Credit: 4,695,177 RAC: 21 |
"it's still at 100%, showing 17 hours" Okay... this WU started on 10/29, is now at 23 hours, and has been showing 100% for the last 9 hours! The stdout file is still being written to, so it still seems to be actually accomplishing something, but this is getting ridiculous. For the last four hours it had the CPU all to itself; I've now resumed SETI, so at least that computer can be accomplishing something. I hate to abort the Rosetta result and lose a full day's work... is there any way to tell if/when this thing will ever finish? The Windows box crunches along happily... only the two Macs have had problems... |
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
Can you email me the stdout.txt file? dekim at u.washington.edu |
Tern Send message Joined: 25 Oct 05 Posts: 576 Credit: 4,695,177 RAC: 21 |
"Can you email me the stdout.txt file?" It's on its way! |
Tern Send message Joined: 25 Oct 05 Posts: 576 Credit: 4,695,177 RAC: 21 |
Just to report the final outcome - David K took a look and said it was still progressing, so I let it run, and WU 44502 finally completed after 114,203.89 seconds (31.7 hours, the last half of that sitting at 100%) and got 122.97 credits... |
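
For anyone who lands here with a WU sitting at 100%: the only evidence of progress mentioned in this thread was that stdout.txt was still being written to. Below is a minimal Python sketch of that check, plus the seconds-to-hours conversion used in the final report. The function names and the 60-second default wait are illustrative assumptions, not part of BOINC or Rosetta, and the path to stdout.txt depends on your own installation.

```python
import os
import time


def cpu_hours(cpu_seconds: float) -> float:
    """Convert a CPU time in seconds to hours, rounded to one decimal place."""
    return round(cpu_seconds / 3600.0, 1)


def is_still_changing(path: str, wait_seconds: float = 60.0) -> bool:
    """Return True if the file's size or modification time changed during the wait.

    Hypothetical helper: a changing stdout.txt only suggests the task is
    still doing something; it does not say when the task will finish.
    """
    before = os.stat(path)
    time.sleep(wait_seconds)
    after = os.stat(path)
    return (after.st_size, after.st_mtime_ns) != (before.st_size, before.st_mtime_ns)


if __name__ == "__main__":
    # The run time reported for WU 44502 in this thread.
    print(cpu_hours(114203.89))  # 31.7
```

To use the second function, call `is_still_changing()` with the path to the task's stdout.txt on your machine. A `False` result after a long wait is a reasonable cue to send the file to the project staff, as was done above, rather than a reason to abort on its own.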
©2024 University of Washington
https://www.bakerlab.org