WU not reporting CPU time

Charles Dennett
Joined: 27 Sep 05
Posts: 102
Credit: 2,081,660
RAC: 105
Message 7331 - Posted: 23 Dec 2005, 2:52:08 UTC
Last modified: 23 Dec 2005, 2:53:21 UTC

I have a Win98SE machine with a PII running 100% Rosetta. A short while ago it hit one of those short-running WUs, which aborted after 180 seconds. It then started the next WU in the queue. I happened to notice that the BOINC Manager said the WU was running, but its CPU time was empty (actually, it showed "--"). I brought up the graphics display and it *was* running. The CPU time on the graphics said 0 hr 0 min 0 sec. I let it keep going. It soon finished the first pass and started on the second, and the CPU time jumped to 0 hr 17 min 29 sec. Now that the second pass is processing, the clock is not advancing. I'm going to guess that when the second pass finishes, the clock will jump again.

I've never seen this before and I think someone else reported this earlier today (and ended up getting into a bit of a tussle with Bill Michael.) I'm going to let this keep going to see how it ends up. The WU in question can be seen at:

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=3893355

I've been running RAH on this computer (and my home Linux server) for quite a while now and neither has given me any problems before. Both run RAH 100%. See my profile for why if you're curious.
-Charlie
Tern
Joined: 25 Oct 05
Posts: 576
Credit: 4,695,450
RAC: 5
Message 7334 - Posted: 23 Dec 2005, 3:30:50 UTC

The basic problem is that Win9x doesn't "know" about CPU time... there's just not a system call that accurately tracks it. For whatever reason, it appears that Rosetta 4.81 "trips" the problem a lot more often than 4.80 did. On other projects, if you send in a "0 seconds, 0 credit" result, you may never even notice it, because the quorum approach causes you to get the middle credit anyway. Here, it's obvious.
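To illustrate the quorum point, here is a rough Python sketch. It assumes a quorum of three results and that the project grants the middle of the claimed credits, as described above. The function name and the claim values are made up for illustration and are not taken from any project's code.

```python
from statistics import median


def granted_credit(claimed):
    """Return the middle (median) claim from a quorum of results.

    Hypothetical helper: assumes credit is granted as the middle claim.
    """
    return median(claimed)


# Hypothetical quorum of three: one host claims 0 because its CPU time was lost.
claims = [0.0, 21.4, 23.0]
print(granted_credit(claims))
```

With a quorum, the zero claim is simply not the middle value, so the affected host is still granted credit. With a single result per WU there is nothing to take the middle of, and the zero stands.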

The "tussle" had nothing to do with the problem. :-/

On other projects, the recommendation has been to set "leave application in memory" to _no_ if you're on Win9x, which is just the opposite of the recommendation for Rosetta. I don't know if this will help or hurt overall - more errors, but fewer 'wasted' 0's?

Flops-counting will solve this, as the time becomes unimportant. I don't know when we'll see that though.

Charles Dennett
Joined: 27 Sep 05
Posts: 102
Credit: 2,081,660
RAC: 105
Message 7336 - Posted: 23 Dec 2005, 3:43:03 UTC

Well, I've been running RAH since September and this is the first time I've seen this particular problem. It's always recorded the cpu time continuously before. When I was running more than one project at a time I did notice the cpu time increasing for a paused WU. Boy, did that inflate the requested credit! Also, Win98 (and I assume the SE variant) cannot report cpu time. What it is actually reporting is wall time, which would explain the increasing time for a paused WU.
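The difference between wall time and CPU time can be seen with a small Python sketch. It is only an illustration on a system that does track per-process CPU time, not a reproduction of what BOINC does on Win98. The function name `measure` is made up, and the sleep stands in for a paused WU.

```python
import time


def measure(pause_seconds):
    """Return (wall_elapsed, cpu_elapsed) measured across a sleep."""
    wall_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # CPU time used by this process
    time.sleep(pause_seconds)          # stands in for a paused WU: no work done
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


wall, cpu = measure(0.5)
print(f"wall: {wall:.2f}s  cpu: {cpu:.2f}s")
```

The wall clock keeps advancing during the pause while the CPU time barely moves. If an OS can only report wall time, a paused WU appears to keep accumulating time, which matches what I saw.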

Oh, and it did take a jump in time when it finished the second trajectory as I suspected it would.

Charlie


-Charlie
Charles Dennett
Joined: 27 Sep 05
Posts: 102
Credit: 2,081,660
RAC: 105
Message 7338 - Posted: 23 Dec 2005, 3:48:11 UTC - in response to Message 7334.  
Last modified: 23 Dec 2005, 3:48:24 UTC


The "tussle" had nothing to do with the problem. :-/


Oh, I realize that. Without going back and rereading the posts, I thought the person was reporting the same (or similar) problem - the cpu clock had apparently stopped. That's all I was trying to say. Sorry if I was not clear on that. I'm in agreement with you on the whole thing.

Hope you have a great holiday.

Charlie

-Charlie
Tern
Joined: 25 Oct 05
Posts: 576
Credit: 4,695,450
RAC: 5
Message 7339 - Posted: 23 Dec 2005, 4:00:28 UTC

You might find this thread at SETI interesting. It seems that after an error, the next result processed is more likely to have "CPU timing" issues. So it may be the "short WUs" here, rather than the app version change...

Charles Dennett
Joined: 27 Sep 05
Posts: 102
Credit: 2,081,660
RAC: 105
Message 7421 - Posted: 23 Dec 2005, 21:12:09 UTC

Well, I won't do that again! I let the WU continue until it finished. Even though it said it used some amount of CPU time (probably about 18 hours - hey, give me a break! It's an old, slow 300 MHz PII!), when all was said and done and it was reported, it showed up as 0 CPU seconds for 0 credit. Next time I'll restart the WU, and if that doesn't work, I'll reboot.

WU is https://boinc.bakerlab.org/rosetta/workunit.php?wuid=3893355

Charlie
-Charlie
River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 7426 - Posted: 23 Dec 2005, 21:30:48 UTC - in response to Message 7421.  

... Next time I restart the WU and if that doesn't work, I'll reboot.


I think if you see the clock issue, reboot right away. Win98 is clearly confused and even if the WU runs OK you don't know what other problems the operating system has got. Once an OS starts to crumble, the longer you leave it the worse the symptoms!

River~~



©2024 University of Washington
https://www.bakerlab.org