Message boards : Number crunching : 1% for 37 hours
Author | Message |
---|---|
TimRoberts Joined: 6 Oct 05 Posts: 3 Credit: 46,136 RAC: 0 |
I have a WU running 37 hours, completed 1%. What should I do? |
mrwizer Joined: 18 Sep 05 Posts: 23 Credit: 507,085 RAC: 0 |
This thread has more info... https://boinc.bakerlab.org/rosetta/forum_thread.php?id=78 You can try restarting BOINC manager, or the WU. I have had a WU stay frozen after this though, so aborting may be the last option. |
Shaktai Joined: 21 Sep 05 Posts: 56 Credit: 575,419 RAC: 0 |
For work units stuck at 1%, restarting BOINC will fix it most of the time. It seems to happen most often on Windows machines with HT (yours?), dual cores or dual processors. The issue is being looked into by the Rosetta team. Team MacNN - The best Macintosh team ever. |
TimRoberts Joined: 6 Oct 05 Posts: 3 Credit: 46,136 RAC: 0 |
Stopping and starting BOINC didn't seem to have any effect (though I may just have been impatient!), so I aborted the WU and downloaded another, which is working fine. And no, I don't have HT (at least that I'm aware of!) or dual processors, just a standard Dell desktop multimedia machine running Windows XP Service Pack 2. |
Webmaster Yoda Joined: 17 Sep 05 Posts: 161 Credit: 162,253 RAC: 0 |
"And no, I don't have HT (at least that I'm aware of!)" Looking at the PCs listed for you, the one with the aborted WUs does appear to have HT (it shows two CPUs). I don't know why you have so many errors, though. Do you run only Rosetta on the PC? If not, have you set it to keep work units in memory when swapping? It seems to have enough memory (1 GB) to do it. (Oh, and why run a team of 1? See below :-) *** Join BOINC@Australia today *** |
adrianxw Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 28 |
Are you running any of the other BOINC projects? If so, with what result? What I'm trying to establish is whether this is a you/Rosetta problem or a you/BOINC-in-general problem. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
mrwizer Joined: 18 Sep 05 Posts: 23 Credit: 507,085 RAC: 0 |
Personally, I have had the 1% hang on AMD and Intel boxes, with or without HT, and on Win2000 and WinXP. Just had two more this morning. All running BOINC 4.45. Oh, and all currently only run Rosetta. |
Angus Joined: 17 Sep 05 Posts: 412 Credit: 321,053 RAC: 0 |
"Personally, I have had the 1% hang on AMD and Intel boxes, with or without HT, and on Win2000 and WinXP. Just had two more this morning. All running BOINC 4.45." And I've had the same experience: AMD or Intel (with and without HT), all running Windows, on BOINC 4.72 and 5.2.x. Proudly Banned from Predictator@Home and now Cosmology@home as well. Added SETI to the list today. Temporary ban only - so need to work harder :) "You can't fix stupid" (Ron White) |
adrianxw Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 28 |
I think we've all had a few stick or whatever. If you look at the OP's results, though, he is getting a very high proportion of failures relative to successes. I suspect the more general problem we all see and the problem he specifically is having may not be the same. |
The Pirate Joined: 22 Sep 05 Posts: 20 Credit: 7,090,933 RAC: 0 |
|
TimRoberts Joined: 6 Oct 05 Posts: 3 Credit: 46,136 RAC: 0 |
In answer to a couple of queries, I run other BOINC projects (SETI, EINSTEIN, Climate) without any problem. As for the team of 1, I only just set it up a couple of days ago, not finding any other Australian groups...happy to disband it and join another existing one though... Tim |
Rebirther Joined: 17 Sep 05 Posts: 116 Credit: 41,315 RAC: 0 |
And again the old problem. I don't know what the program is doing: 1% after 5h with the new 1hz7A WU, and after restarting BOINC all is fine again :( |
Scribe Joined: 2 Nov 05 Posts: 284 Credit: 157,359 RAC: 0 |
These are getting somewhat more frequent for me. Restarting BOINC has NO effect, and I leave them in memory when swapped out. It would seem the only cure is to abort them. I hope to catch them when they are less than an hour 'old', but some have run well over 6 hours before I noticed them... all stuck at 1%. |
Paul D. Buck Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
SOMETIMES, suspending them and then resuming them works, but more often you need to stop and restart BOINC, and then they will run. In general, I have BOINC View running, and I start paying lots of attention if one of my computers is 10-15 minutes into a Rosetta@Home work unit and still at 1% ... This is a "known" problem (though David Kim has not indicated whether he has any idea of what is causing it ... David?). I did suggest that a time "cap" be placed on the start-up of a work unit. It was pointed out that using a fixed amount of time is not viable, but I still think we can come up with something: if the time to complete is 4 hours, it probably should not take 30 minutes to pass 1% ... CPDN has something like this in their start-up, I think; if it does not initialize right, it will "rewind" ... |
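The relative "cap" idea in the previous post can be sketched as a small check. This is only an illustration of the proposal, not anything BOINC or Rosetta@Home actually implements: the function name `looks_stalled`, the 1% start-up milestone, and the 10x `tolerance` factor are all hypothetical choices. Instead of a fixed time limit, the cap scales with the work unit's estimated total run time.

```python
def looks_stalled(estimated_total_s: float, elapsed_s: float,
                  fraction_done: float, tolerance: float = 10.0,
                  first_milestone: float = 0.01) -> bool:
    """Hypothetical start-up stall check for a work unit.

    Flags a unit that is still at or below the first milestone (1% by
    default) after `tolerance` times the CPU time that milestone should
    take, given the estimated total run time. Both defaults are
    arbitrary assumptions, not project settings.
    """
    if estimated_total_s <= 0:
        raise ValueError("estimated_total_s must be positive")
    if fraction_done > first_milestone:
        # Already past start-up; this check has nothing to say.
        return False
    # CPU time the unit should need to reach the first milestone.
    expected_s = estimated_total_s * first_milestone
    return elapsed_s > tolerance * expected_s
```

For a 4-hour unit, 1% should take about 144 seconds, so with a 10x tolerance the cap is about 24 minutes. Under these assumptions, 30 minutes at 1% gets flagged and 10 minutes does not. A unit that stalls and then moves on (like one reported further down that jumped to 5%) is no longer flagged once it passes the milestone.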
Scribe Joined: 2 Nov 05 Posts: 284 Credit: 157,359 RAC: 0 |
I tried both suspending and restarting BOINC; neither worked for me! :-(( |
Robin Bastards Joined: 5 Nov 05 Posts: 31 Credit: 5,166 RAC: 0 |
Forgive the noob ignorance but I have a lil query. I am currently running a unit and it too is at 1%....cpu run time says over six hours and the est time to completion continues to increase. Does that suggest a unit that will fail or something? Should I try restarting boinc or just leave it be? wondering why we bother anymore |
Scribe Joined: 2 Nov 05 Posts: 284 Credit: 157,359 RAC: 0 |
It sounds like the same problem. Try exiting BOINC and restarting it... It did not work for me, and I had to abort the unit, losing 7 or so hours... |
Robin Bastards Joined: 5 Nov 05 Posts: 31 Credit: 5,166 RAC: 0 |
Yup, just did that a few minutes ago... All it appears to have done is reset the runtime. Best go check the other rigs!!! |
Robin Bastards Joined: 5 Nov 05 Posts: 31 Credit: 5,166 RAC: 0 |
Was just about to abort when it jumped to 5%... will let her run a while. |
Fuzzy Hollynoodles Joined: 7 Oct 05 Posts: 234 Credit: 15,020 RAC: 0 |
"It sounds like the same problem. Try exiting BOINC and restarting it... It did not work for me, and I had to abort the unit, losing 7 or so hours..." David has said somewhere earlier (I can't find the posts right now) that you should send the stdout file to him. He can see from it what's going on. I had one WU stuck at 1% for about half an hour, so I sent the stdout file to him, and he responded: "Let it run for a couple hours and see if it gets past 1%. From the log file, it looks like it almost finished the first structure but was restarted." So he can see what's going on. His email address is dekim at u dot washington dot edu. "I'm trying to maintain a shred of dignity in this world." - Me |
©2024 University of Washington
https://www.bakerlab.org