Maximum CPU time Exceeded...How about some granted credit!

Message boards : Number crunching : Maximum CPU time Exceeded...How about some granted credit!



Profile nasher

Send message
Joined: 5 Nov 05
Posts: 98
Credit: 625,341
RAC: 647
Message 9666 - Posted: 23 Jan 2006, 22:37:47 UTC

I know it's a touch off topic, but...

What type of records archive are they using here? I know at my work we are stuck with a SQL Server database (sorry if I mistype the name), a nightmare to sort out or through.

I understand how difficult it is to sort through a database for a specific reference. Honestly, I would love to get credit for any WUs of mine that hit "Maximum CPU time exceeded", but with my modest ability and knowledge I would look through my own stats myself and spend my own time figuring out which work units went over the time limit, and maybe post all the ones I could find together in a request for credit. Personally I also run beta and alpha projects, so if I lose a job now and then I'm not going to spend my time hunting down every lost credit, and I don't expect the scientists to spend their time on it either.

That said, if the scientists could give us a list of what they would like to see from us when requesting credit for errored jobs, it might make it easier for me and others to supply the info.
ID: 9666
Profile Snake Doctor
Avatar

Send message
Joined: 17 Sep 05
Posts: 182
Credit: 6,401,938
RAC: 0
Message 9671 - Posted: 24 Jan 2006, 0:11:39 UTC - in response to Message 9666.  

I know it's a touch off topic, but...

What type of records archive are they using here? I know at my work we are stuck with a SQL Server database (sorry if I mistype the name), a nightmare to sort out or through.
...
That said, if the scientists could give us a list of what they would like to see from us when requesting credit for errored jobs, it might make it easier for me and others to supply the info.


Good questions. I do not know the answer to the first, but as for the second, I don't think they will need anything from us. Right now the problem is that when a WU errors out it is resent to another machine, so it takes quite some time for all of them to drain out of the active database and reach the archive. It is my understanding that at that point they can process them for credit. This happened just recently with a bad batch of WUs, and some of them are still floating around on machines with long queues.

I do know that David Kim has increased the time available for a WU to complete and reduced the number of processing cycles run on each WU to fix this problem. From what I am seeing on my systems it is working; I have gone two days without a max time error. This will of course make the 1% problem worse, since it will take longer for a WU to time out, and some of the longest WUs may still fail.
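
For anyone wondering why raising the time allowance helps: as far as I understand it (this is an assumption about how the BOINC client behaves, not anything the project has published), the client aborts a result once its CPU time passes a limit derived from the work unit's fpops bound and the host's benchmark, roughly like this:

def max_cpu_time_seconds(rsc_fpops_bound: float, whetstone_mips: float) -> float:
    """Seconds of CPU time allowed before 'Maximum CPU time exceeded'.

    Rough sketch only: assumes the limit is the work unit's
    floating-point-operation bound (rsc_fpops_bound, a standard BOINC
    field) divided by the host's benchmarked speed. The exact formula
    is an assumption, not project code.
    """
    host_flops = whetstone_mips * 1e6   # Whetstone MIPS -> flops/sec
    return rsc_fpops_bound / host_flops

# Example with made-up numbers: a 9e13-fpop bound on a 1500-MIPS host
# allows roughly 60,000 s (~17 hours) of CPU time. Raising the bound,
# or cutting the number of cycles per WU so less work is needed, both
# move slow hosts away from the abort.
print(max_cpu_time_seconds(9.0e13, 1500))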

Regards
phil
ID: 9671
Profile The Gas Giant

Send message
Joined: 20 Sep 05
Posts: 23
Credit: 58,591
RAC: 0
Message 9891 - Posted: 26 Jan 2006, 5:22:26 UTC

Fairly easy to search for the affected results: just search the stderr out section of each result for "Maximum CPU time exceeded". It might take a bucket load of hours to complete the search, but then we have contributed a bucket load of CPU time in good faith. David is being very silent on this point.
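
Something along these lines, assuming direct access to the project's server database and the standard BOINC schema (result.stderr_out, granted_credit, and so on); the connection details here are placeholders:

# Minimal sketch of the search described above, run against the BOINC
# server database. Table and column names follow the standard BOINC
# schema; host, user, password, and database names are placeholders.
import MySQLdb

db = MySQLdb.connect(host="localhost", user="boinc_ro",
                     passwd="********", db="rosetta")
cur = db.cursor()

# stderr_out is an unindexed text field, so this is a full table scan,
# which is where the "bucket load of hours" would come from.
cur.execute(
    """
    SELECT id, workunitid, userid, cpu_time, claimed_credit, granted_credit
    FROM result
    WHERE granted_credit = 0
      AND stderr_out LIKE %s
    """,
    ("%Maximum CPU time exceeded%",),
)

for rid, wu_id, user_id, cpu, claimed, granted in cur.fetchall():
    print(rid, wu_id, user_id, cpu, claimed, granted)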

Live long and crunch.

Paul.
ID: 9891
Profile Snake Doctor
Avatar

Send message
Joined: 17 Sep 05
Posts: 182
Credit: 6,401,938
RAC: 0
Message 9892 - Posted: 26 Jan 2006, 5:51:58 UTC - in response to Message 9891.  

Fairly easy to search for the affected results: just search the stderr out section of each result for "Maximum CPU time exceeded". It might take a bucket load of hours to complete the search, but then we have contributed a bucket load of CPU time in good faith. David is being very silent on this point.

Live long and crunch.

Paul.


You should not be having this problem any more. I know they are very interested in this. They have made adjustments to the WUs to fix it and are now waiting to see whether they have killed the problem. If you see any of these, please report them so they will know. If they have the problem nailed, then they can worry about the credits. Why should they put in the time to process the credits now, and then have to run the process again later, if the problem is not yet fixed?

I really do not understand what difference it makes if we get the credit now or a month from now. I for one am willing to wait for the credit until the problem is actually fixed.

Am I missing something here?

Regards
Phil


We must look for intelligent life on other planets, as
it is becoming increasingly apparent we will not find any on our own.
ID: 9892
Moderator9
Volunteer moderator

Send message
Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 9945 - Posted: 26 Jan 2006, 17:44:29 UTC - in response to Message 9935.  
Last modified: 26 Jan 2006, 18:07:57 UTC

Just had one. What info do you want?

	Host	Project	Date	Message
dbserver	DBSERVER	rosetta@home	26/01/2006 14:16:20	Throughput 6124 bytes/sec
dbserver	DBSERVER	rosetta@home	26/01/2006 14:16:20	Finished upload of NO_SIM_ANNEAL_2tif_228_635_2_0
dbserver	DBSERVER	rosetta@home	26/01/2006 14:16:11	Started upload of NO_SIM_ANNEAL_2tif_228_635_2_0
dbserver	DBSERVER	rosetta@home	26/01/2006 14:16:09	Starting result PRODUCTION_ABINITIO_1vls__250_2364_0 using rosetta version 481
dbserver	DBSERVER	rosetta@home	26/01/2006 14:16:08	Computation for result NO_SIM_ANNEAL_2tif_228_635_2 finished
dbserver	DBSERVER	---	26/01/2006 14:16:08	request_reschedule_cpus: process exited
dbserver	DBSERVER	rosetta@home	26/01/2006 11:53:37	Starting result NO_SIM_ANNEAL_2tif_228_635_2 using rosetta version 481
dbserver	DBSERVER	rosetta@home	26/01/2006 11:53:37	Computation for result PRODUCTION_ABINITIO_2vik__250_1426_0 finished
dbserver	DBSERVER	---	26/01/2006 11:53:37	request_reschedule_cpus: process exited
dbserver	DBSERVER	rosetta@home	26/01/2006 11:53:36	Unrecoverable error for result PRODUCTION_ABINITIO_2vik__250_1426_0 (Maximum CPU time exceeded)
dbserver	DBSERVER	rosetta@home	26/01/2006 11:53:36	Aborting result PRODUCTION_ABINITIO_2vik__250_1426_0: exceeded CPU time limit 61808.106462
dbserver	DBSERVER	rosetta@home	25/01/2006 16:54:50	Throughput 5380 bytes/sec
dbserver	DBSERVER	rosetta@home	25/01/2006 16:54:50	Finished upload of PRODUCTION_ABINITIO_1vls__250_1175_0_0
dbserver	DBSERVER	rosetta@home	25/01/2006 16:54:19	Started upload of PRODUCTION_ABINITIO_1vls__250_1175_0_0
dbserver	DBSERVER	rosetta@home	25/01/2006 16:54:17	Starting result PRODUCTION_ABINITIO_2vik__250_1426_0 using rosetta version 481
dbserver	DBSERVER	rosetta@home	25/01/2006 16:54:16	Computation for result PRODUCTION_ABINITIO_1vls__250_1175_0 finished


Duron 1600, 512MB RAM, XP Pro (fully patched)
Benchmarks were:
	Host	Project	Date	Message
dbserver	DBSERVER	---	24/01/2006 18:12:09	   2404 integer MIPS (Dhrystone) per CPU
dbserver	DBSERVER	---	24/01/2006 18:12:09	   1456 double precision MIPS (Whetstone) per CPU
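
Tying those two excerpts together (still assuming the limit is the work unit's fpops bound divided by the benchmarked flops, which is an assumption about the client rather than anything documented):

# Back-of-the-envelope check using the numbers from the logs above.
whetstone_flops = 1456 * 1e6          # 1456 Whetstone MIPS, from the benchmark log
cpu_time_limit = 61808.106462         # seconds, from the abort message
implied_fpops_bound = cpu_time_limit * whetstone_flops
print(f"{implied_fpops_bound:.2e}")   # ~9.0e13 floating-point operations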



MODERATOR NOTE: I am going to open a thread for Max Time Exceeded WUs and make it a sticky for the project team. With your permission I am moving your post to that thread. I will leave this post in its place here.
Moderator9
ROSETTA@home FAQ
Moderator Contact
ID: 9945



