1% for 37 hours

Message boards : Number crunching : 1% for 37 hours

TimRoberts
Joined: 6 Oct 05
Posts: 3
Credit: 46,136
RAC: 0
Message 1505 - Posted: 19 Oct 2005, 6:38:26 UTC

I have a WU that has been running for 37 hours and has completed only 1%.
What should I do?

mrwizer
Joined: 18 Sep 05
Posts: 23
Credit: 507,085
RAC: 0
Message 1508 - Posted: 19 Oct 2005, 7:06:39 UTC

This thread has more info...

https://boinc.bakerlab.org/rosetta/forum_thread.php?id=78

You can try restarting the BOINC Manager or the WU. I have had a WU stay frozen even after this, though, so aborting may be the last option.

Shaktai
Joined: 21 Sep 05
Posts: 56
Credit: 575,419
RAC: 0
Message 1509 - Posted: 19 Oct 2005, 7:22:09 UTC

For work units stuck at 1%, restarting BOINC will fix it most of the time. It seems to happen most often on Windows machines with HT (yours?), dual cores, or dual processors. The issue is being looked into by the Rosetta team.


Team MacNN - The best Macintosh team ever.

TimRoberts
Joined: 6 Oct 05
Posts: 3
Credit: 46,136
RAC: 0
Message 1528 - Posted: 19 Oct 2005, 22:57:34 UTC

Stopping and starting BOINC didn't seem to have any effect (though I may just have been impatient!), so I aborted the WU and downloaded another, which is working fine. And no, I don't have HT (at least that I'm aware of!) or dual processors, just a standard Dell desktop multimedia machine running Windows XP Service Pack 2.

Webmaster Yoda
Joined: 17 Sep 05
Posts: 161
Credit: 162,253
RAC: 0
Message 1529 - Posted: 19 Oct 2005, 23:56:32 UTC - in response to Message 1528.  
Last modified: 19 Oct 2005, 23:58:12 UTC

> And no, I don't have HT (at least that I'm aware of!)


Looking at the PCs listed for you, the one with the aborted WUs does appear to have HT (it shows two CPUs). I don't know why you have so many errors though.

Do you run only Rosetta on the PC? If not, have you set it to keep work units in memory when they are swapped out? It seems to have enough memory (1GB) to do so.

(Oh, and why run a team of 1 - see below :-)

*** Join BOINC@Australia today ***

adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,840,739
RAC: 28
Message 1547 - Posted: 20 Oct 2005, 16:19:00 UTC

Are you running any of the other BOINC projects? If so, with what result?

What I'm trying to establish is whether this is a you/Rosetta problem or a you/BOINC-in-general problem.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

mrwizer
Joined: 18 Sep 05
Posts: 23
Credit: 507,085
RAC: 0
Message 1548 - Posted: 20 Oct 2005, 16:46:15 UTC
Last modified: 20 Oct 2005, 16:47:29 UTC

Personally, I have had the 1% hang on AMD and Intel boxes, with or without HT, and on Win2000 and WinXP. Just had two more this morning. All running BOINC 4.45.

Oh, and all currently only run Rosetta.

Angus
Joined: 17 Sep 05
Posts: 412
Credit: 321,053
RAC: 0
Message 1549 - Posted: 20 Oct 2005, 17:03:43 UTC - in response to Message 1548.  

> Personally, I have had the 1% hang on AMD and Intel boxes, with or without HT, and on Win2000 and WinXP. Just had two more this morning. All running BOINC 4.45.
>
> Oh, and all currently only run Rosetta.

And I've had the same experience: AMD or Intel (with and without HT), all running Windows, on BOINC 4.72 and 5.2.x.

Proudly Banned from Predictator@Home and now Cosmology@home as well. Added SETI to the list today. Temporary ban only - so need to work harder :)



"You can't fix stupid" (Ron White)

adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,840,739
RAC: 28
Message 1550 - Posted: 20 Oct 2005, 18:19:30 UTC

I think we've all had a few stick or whatever. If you look at the OP's results, though, he is getting a very high proportion of failures relative to successes. I suspect the more general problem we all see and the specific problem he is having may not be the same.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

The Pirate
Joined: 22 Sep 05
Posts: 20
Credit: 7,090,933
RAC: 0
Message 1555 - Posted: 21 Oct 2005, 2:28:58 UTC - in response to Message 1547.  

> Are you running any of the other BOINC projects? If so, with what result?
>
> What I'm trying to establish is whether this is a you/Rosetta problem or a you/BOINC-in-general problem.


Myself, I have only had it stick on Rosetta. All other projects run just fine.

TimRoberts
Joined: 6 Oct 05
Posts: 3
Credit: 46,136
RAC: 0
Message 1560 - Posted: 21 Oct 2005, 7:38:30 UTC

In answer to a couple of queries: I run other BOINC projects (SETI, EINSTEIN, Climate) without any problem. As for the team of 1, I only set it up a couple of days ago because I couldn't find any other Australian teams. I'm happy to disband it and join an existing one, though.

Tim

Rebirther
Joined: 17 Sep 05
Posts: 116
Credit: 41,315
RAC: 0
Message 2068 - Posted: 2 Nov 2005, 17:56:02 UTC

And again the old problem, and I don't know what the program is doing: 1% after 5h with the new 1hz7A WU. After restarting BOINC, all is fine again :(

Scribe
Joined: 2 Nov 05
Posts: 284
Credit: 157,359
RAC: 0
Message 2693 - Posted: 9 Nov 2005, 7:19:40 UTC

These are getting somewhat more frequent for me. Restarting BOINC has NO effect, and I leave them in memory when swapped out.
It would seem the only cure is to abort them. I hope to catch them when they are less than an hour 'old', but some have run well over 6 hours before I noticed them, all stuck on 1%.

Paul D. Buck
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 2703 - Posted: 9 Nov 2005, 11:47:05 UTC

SOMETIMES suspending and resuming them works, but more often you have to stop and restart BOINC before they will run.

In general, I have BOINC View running and I start paying lots of attention if one of my computers is at 10-15 minutes with Rosetta@Home at 1% ...

This is a "known" problem (though David Kim has not indicated if he has any idea of what is causing it ... David?).

I did suggest placing a time "cap" on the startup of a work unit. It was pointed out that a fixed amount of time is not viable, but I still think we can come up with something: if the time to complete is 4 hours, it probably should not take 30 minutes to pass 1%. I think CPDN has something like this in its startup: if it does not initialize correctly, it will "rewind".
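The proportional cap suggested in this post can be sketched as a simple check. This is a minimal Python illustration under stated assumptions, not anything BOINC or Rosetta@home actually implements: the `WorkUnit` fields and the `looks_stuck` name are hypothetical, and the 1/8 ratio is just the "30 minutes out of 4 hours" example expressed as a fraction.

```python
from dataclasses import dataclass

# Hypothetical values, not BOINC or Rosetta@home settings:
# the ratio is the "30 minutes out of a 4-hour estimate" example,
# and 1% is the progress value reported in this thread.
STALL_RATIO = 30 / (4 * 60)   # 0.125 of the estimated run time
STUCK_FRACTION = 0.01         # progress at or below 1%


@dataclass
class WorkUnit:
    """Hypothetical snapshot of a task, e.g. as read off BOINC Manager or BOINC View."""
    name: str
    estimated_seconds: float  # estimated total run time
    cpu_seconds: float        # CPU time used so far
    fraction_done: float      # progress from 0.0 to 1.0


def looks_stuck(wu: WorkUnit, stall_ratio: float = STALL_RATIO) -> bool:
    """Return True if the task has used more than stall_ratio of its
    estimated run time while still reporting no more than 1% progress."""
    if wu.estimated_seconds <= 0:
        return False  # no usable estimate, so no proportional limit
    limit = wu.estimated_seconds * stall_ratio
    return wu.fraction_done <= STUCK_FRACTION and wu.cpu_seconds > limit


if __name__ == "__main__":
    tasks = [
        WorkUnit("four_hour_ok", 4 * 3600, 20 * 60, 0.01),
        WorkUnit("four_hour_stuck", 4 * 3600, 45 * 60, 0.01),
        WorkUnit("past_one_percent", 4 * 3600, 2 * 3600, 0.05),
    ]
    for wu in tasks:
        print(wu.name, looks_stuck(wu))
```

Run as-is, it prints `True` only for `four_hour_stuck` (45 minutes of CPU time at 1% against a 4-hour estimate). Scaling the limit by the estimate rather than using a fixed number of minutes is the point of the suggestion: a longer work unit gets proportionally more time before it is flagged.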

Scribe
Joined: 2 Nov 05
Posts: 284
Credit: 157,359
RAC: 0
Message 2706 - Posted: 9 Nov 2005, 12:22:27 UTC

I tried both the suspend and restart of Boinc, neither worked for me! :-((

Robin Bastards
Joined: 5 Nov 05
Posts: 31
Credit: 5,166
RAC: 0
Message 2707 - Posted: 9 Nov 2005, 12:55:14 UTC
Last modified: 9 Nov 2005, 12:55:40 UTC

Forgive the noob ignorance but I have a lil query.

I am currently running a unit that is also at 1%. CPU run time says over six hours, and the estimated time to completion keeps increasing.

Does that suggest a unit that will fail or something?
Should I try restarting BOINC or just leave it be?


wondering why we bother anymore

Scribe
Joined: 2 Nov 05
Posts: 284
Credit: 157,359
RAC: 0
Message 2712 - Posted: 9 Nov 2005, 13:17:13 UTC

It sounds like the same problem. Try exiting BOINC and restarting it... it did not work for me, and I had to abort the unit, losing 7 or so hours.

Robin Bastards
Joined: 5 Nov 05
Posts: 31
Credit: 5,166
RAC: 0
Message 2713 - Posted: 9 Nov 2005, 13:19:01 UTC
Last modified: 9 Nov 2005, 13:19:46 UTC

Yup just did that a few minutes ago.........All it appears to have done is reset the runtime.

Best go check the other rigs!!!
wondering why we bother anymore

Robin Bastards
Joined: 5 Nov 05
Posts: 31
Credit: 5,166
RAC: 0
Message 2714 - Posted: 9 Nov 2005, 13:22:53 UTC

Was just about to abort it when it jumped to 5%... will let her run a while.
wondering why we bother anymore

Fuzzy Hollynoodles
Joined: 7 Oct 05
Posts: 234
Credit: 15,020
RAC: 0
Message 2720 - Posted: 9 Nov 2005, 14:26:03 UTC - in response to Message 2712.  

> It sounds like the same problem. Try exiting BOINC and restarting it... it did not work for me, and I had to abort the unit, losing 7 or so hours.


David said somewhere earlier (I can't find the posts right now) that you should send him the stdout file. From that he can see what's going on.

I had one WU stuck at 1% for about half an hour, so I sent the stdout file to him, and he responded:

"Let it run for a couple hours and see if it gets past 1%. From the log file, it looks like it almost finished the first structure but was restarted."

So he can see what's going on. His email address is dekim at u dot washington dot edu.



"I'm trying to maintain a shred of dignity in this world." - Me




©2024 University of Washington
https://www.bakerlab.org