Work Units that fail in under one minute - Report HERE




TeamBSE~Timmeh!

Joined: 16 Mar 06
Posts: 6
Credit: 76,936
RAC: 0
Message 12899 - Posted: 1 Apr 2006, 2:43:34 UTC
Last modified: 1 Apr 2006, 2:44:36 UTC

TeamBSE~Timmeh!

Joined: 16 Mar 06
Posts: 6
Credit: 76,936
RAC: 0
Message 13138 - Posted: 6 Apr 2006, 22:16:29 UTC
Last modified: 6 Apr 2006, 22:18:07 UTC

BennyRop

Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 13142 - Posted: 6 Apr 2006, 23:04:01 UTC

From Timmeh's list:
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10350511
This failed almost instantly on two Windows machines, yet ran fine on a Linux machine on the third try. What is different about the Linux client that lets it handle the WU when the Windows clients cannot?
Sam-TNO-

Joined: 15 Feb 06
Posts: 2
Credit: 252,795
RAC: 0
Message 13257 - Posted: 8 Apr 2006, 18:17:24 UTC - in response to Message 13142.  
Last modified: 8 Apr 2006, 18:28:47 UTC

BOINC just upgraded the Rosetta application from 4.83 to 4.97, and the first two WUs failed within seconds (the third is running now, at ~24 minutes and about 8.95% complete). I'm running BOINC client v. 5.2.13 under Windows XP SP2. The CPU is an Intel Pentium M 1.7 GHz (735). There was no interaction with the PC when the failures happened; only Rosetta was running.

8.4.2006 20:42:39||request_reschedule_cpus: files downloaded
8.4.2006 20:42:50|rosetta@home|Unrecoverable error for result HBLR_1.0_1r69_426_5807_0 ( - exit code -1073741819 (0xc0000005))
8.4.2006 20:42:50||request_reschedule_cpus: process exited
8.4.2006 20:42:50|rosetta@home|Computation for result HBLR_1.0_1r69_426_5807_0 finished
8.4.2006 20:42:50|rosetta@home|Starting result HBLR_1.0_1b72_426_5808_0 using rosetta version 497
8.4.2006 20:43:52|rosetta@home|Unrecoverable error for result HBLR_1.0_1b72_426_5808_0 ( - exit code -1073741819 (0xc0000005))
8.4.2006 20:43:52||request_reschedule_cpus: process exited
8.4.2006 20:43:52|rosetta@home|Computation for result HBLR_1.0_1b72_426_5808_0 finished

https://boinc.bakerlab.org/rosetta/result.php?resultid=16647868
https://boinc.bakerlab.org/rosetta/result.php?resultid=16647860
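
A note on the exit code in the messages above: -1073741819 is the signed 32-bit reading of 0xc0000005, which Windows defines as STATUS_ACCESS_VIOLATION (the process touched memory it was not allowed to access). The following is only a minimal Python sketch of that conversion, not anything taken from BOINC or Rosetta; the function name and the one-entry lookup table are made up for illustration.

```python
def to_ntstatus(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned hex status value."""
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


# Deliberately incomplete lookup table: only the code seen in this thread.
KNOWN_STATUS = {"0xc0000005": "STATUS_ACCESS_VIOLATION"}

code = to_ntstatus(-1073741819)
print(code, KNOWN_STATUS.get(code, "unknown"))
```

This only explains what the number means; it says nothing about why the 4.97 application hits the access violation.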

EDIT (ten minutes later):
(8.4.2006 20:43:52|rosetta@home|Starting result HBLR_1.0_1mky_426_5804_0 using rosetta version 497
8.4.2006 21:14:56||request_reschedule_cpus: project op
8.4.2006 21:15:01|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
8.4.2006 21:15:01|rosetta@home|Reason: Requested by user
8.4.2006 21:15:01|rosetta@home|Reporting 3 results
8.4.2006 21:15:06|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi succeeded
8.4.2006 21:21:54|rosetta@home|Unrecoverable error for result HBLR_1.0_1mky_426_5804_0 ( - exit code -1073741819 (0xc0000005))
8.4.2006 21:21:54||request_reschedule_cpus: process exited)

8.4.2006 21:21:54|rosetta@home|Computation for result HBLR_1.0_1mky_426_5804_0 finished
8.4.2006 21:21:54|rosetta@home|Starting result HBLR_1.0_1b72_426_5809_0 using rosetta version 497
8.4.2006 21:22:56|rosetta@home|Unrecoverable error for result HBLR_1.0_1b72_426_5809_0 ( - exit code -1073741819 (0xc0000005))
8.4.2006 21:22:56||request_reschedule_cpus: process exited

https://boinc.bakerlab.org/rosetta/result.php?resultid=16647818
https://boinc.bakerlab.org/rosetta/result.php?resultid=16647880

The first of these two is not an 'under one minute' failure, but I included it anyway (just in case it's helpful...)
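
To check how long each result ran before failing, the timestamps in the messages above can be subtracted. A rough Python sketch follows, assuming the dates are day.month.year and that the `Starting result` and `Unrecoverable error for result` lines look exactly as pasted here; the function name and regular expression are illustrative, and the figures are wall-clock time, not CPU time.

```python
import re
from datetime import datetime

# Start/error lines copied from the messages in this post.
LOG = """\
8.4.2006 20:42:50|rosetta@home|Starting result HBLR_1.0_1b72_426_5808_0 using rosetta version 497
8.4.2006 20:43:52|rosetta@home|Unrecoverable error for result HBLR_1.0_1b72_426_5808_0 ( - exit code -1073741819 (0xc0000005))
8.4.2006 20:43:52|rosetta@home|Starting result HBLR_1.0_1mky_426_5804_0 using rosetta version 497
8.4.2006 21:21:54|rosetta@home|Unrecoverable error for result HBLR_1.0_1mky_426_5804_0 ( - exit code -1073741819 (0xc0000005))
8.4.2006 21:21:54|rosetta@home|Starting result HBLR_1.0_1b72_426_5809_0 using rosetta version 497
8.4.2006 21:22:56|rosetta@home|Unrecoverable error for result HBLR_1.0_1b72_426_5809_0 ( - exit code -1073741819 (0xc0000005))
"""

# Timestamp | project | event + result name; a leading "(" is tolerated.
LINE = re.compile(
    r"^\(?(\d+\.\d+\.\d+ \d+:\d+:\d+)\|[^|]*\|"
    r"(Starting result|Unrecoverable error for result) (\S+)"
)


def failure_durations(log):
    """Map each failed result name to seconds elapsed between start and error."""
    started = {}
    durations = {}
    for line in log.splitlines():
        match = LINE.match(line)
        if not match:
            continue  # ignore scheduler and reschedule messages
        when = datetime.strptime(match.group(1), "%d.%m.%Y %H:%M:%S")
        event, name = match.group(2), match.group(3)
        if event == "Starting result":
            started[name] = when
        elif name in started:
            durations[name] = (when - started[name]).total_seconds()
    return durations


for name, seconds in failure_durations(LOG).items():
    print(f"{name}: {seconds:.0f} s")
```

On the pasted lines this gives 62 seconds for each of the two 1b72 results and 2282 seconds (about 38 minutes) for the 1mky result. The first failed result (1r69) has no start line in the excerpt, so it is skipped.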


Hope this helps,
Sam-TNO-
Team-SciFi
www.team-scifi.com
BennyRop

Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 13269 - Posted: 8 Apr 2006, 20:13:41 UTC

Unless your max CPU time setting is down at 1 hour, you'll see close to a 100% failure rate with these HBLR WUs on the latest Rosetta 4.97 client. Of my last 12, one lasted almost 3 hours (so the 2-hour setting would have caught it), and one more (2 in total) survived longer than 3600 seconds (1 hour). If you want to turn in useful results, you might want to reduce the max CPU time to 1 hour. If the bandwidth usage is too much, you might also want to shut down Rosetta or BOINC this weekend until the HBLRs are cleared.


charmed

Joined: 2 Nov 05
Posts: 11
Credit: 1,780,440
RAC: 0
Message 13273 - Posted: 8 Apr 2006, 20:35:38 UTC
Last modified: 8 Apr 2006, 20:38:28 UTC

All my Windows boxes, both WinXP Pro and Home, are failing on FARELAX_NOFILTERS, HBLR, etc. The Linux boxes are working fine; they all run one version or another of Fedora Core, up to 4. This started when version 4.97 was installed. I'm using BOINC 5.2.13.
Snake Doctor

Joined: 17 Sep 05
Posts: 182
Credit: 6,401,938
RAC: 0
Message 13379 - Posted: 10 Apr 2006, 4:09:53 UTC - in response to Message 13273.  

All my Windows boxes, both WinXP Pro and Home, are failing on FARELAX_NOFILTERS, HBLR, etc. The Linux boxes are working fine; they all run one version or another of Fedora Core, up to 4. This started when version 4.97 was installed. I'm using BOINC 5.2.13.



4.98 is the latest version of Rosetta. You are dealing with a known problem. You need to update your project.


