Message boards : Number crunching : WUs that die in 30 seconds..
Author | Message |
---|---|
BennyRop Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
I got one; and I've seen mention of a number of them in one of the reporting threads. One of the fellows on the Teddies team managed to get enough erroring WUs that he hit the max number of errors allowed in a day. (You're only allowed to upload 48 WUs a day?) There've been several batches of these terminal WUs in the last few months; so I ask the following questions: Do they have a 100% failure rate on Linux and Windows? And even if not, why aren't all these WUs having 1 model generated on a fast Windows machine there in the labs before being released to us? |
David Baker Volunteer moderator Project administrator Project developer Project scientist Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0 |
BennyRop wrote: "I got one; and I've seen mention of a number of them in one of the reporting threads. One of the fellows on the Teddies team managed to get enough erroring WUs that he hit the max number of errors allowed in a day. (You're only allowed to upload 48 WUs a day?)" you are right--we should be able to avoid this by always testing first on ralph. will check to see why this is still happening |
Moderator9 Volunteer moderator Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
|
Scribe Joined: 2 Nov 05 Posts: 284 Credit: 157,359 RAC: 0 |
...so should we be reporting them in here, or do you have enough info already? |
Moderator9 Volunteer moderator Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
|
BennyRop Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
HOMSdt_homDB004_1dtj__340_50_0 failed 3 times. HOMSdt_homDB004 appears in a number of the reports on this series of WU failures. Did any of the HOMSdt_homDB WUs get processed successfully? |
Darren Joined: 6 Oct 05 Posts: 27 Credit: 43,535 RAC: 0 |
BennyRop wrote: "Did any of the HOMSdt_homDB WUs get processed successfully?" I have this HOMSdt_homDB005_1dtj__352_78 that processed successfully. |
BennyRop Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
Dang.. so some of them did succeed. These all failed multiple times: HOMSdt_homDB027_1dtj__352_1364 (https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10401327), HOMSdt_homDB003_1dtj__352_40 (https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10268921), HOMSdt_homDB009_1dtj__352_458 (https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10310727), HOMSdt_homDB027_1dtj__352_1212 (https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10386127), HOMSdt_homDB011_1dtj__352_727 (https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10337611). These two both failed, but finally succeeded on the last machine: HOMSti_homDB017_1tif__352_1174 (https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10683881), HOMSmk_homDB013_1mkyA_352_1147 (https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10439262). That shows that while these WUs have an incredibly high failure rate, some systems are able to process them. |
David Baker Volunteer moderator Project administrator Project developer Project scientist Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0 |
BennyRop wrote: "HOMSdt_homDB004_1dtj__340_50_0 failed 3 times. HOMSdt_homDB004 appears in a number of the reports on this series of WU failures. Did any of the HOMSdt_homDB WUs get processed successfully?" there were only a subset that failed--Divya identified the problem and fixed it. for experts, the problem was that the "-termini" option adds a proton to the N terminus, but for proline there is no place to put the proton, and for a subset of the 1dtj homologues there was an N-terminal proline. this is the sort of mistake that only gets made once--it has now been fixed. |
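For anyone curious what a pre-release check for this particular problem might look like, here is a minimal sketch. It is not Rosetta code and says nothing about how the lab actually fixed it; it only assumes that each target's sequence is available in one-letter amino acid code, where proline is `P`. The function names and the example sequences are hypothetical.

```python
from typing import Dict, List


def has_n_terminal_proline(sequence: str) -> bool:
    """Return True if the first residue of a one-letter-code sequence is proline (P)."""
    seq = sequence.strip().upper()
    # An empty sequence has no N-terminal residue, so it is never flagged.
    return bool(seq) and seq[0] == "P"


def flag_risky_targets(targets: Dict[str, str]) -> List[str]:
    """Return the names of targets whose sequence starts with proline, sorted by name."""
    return sorted(name for name, seq in targets.items() if has_n_terminal_proline(seq))


if __name__ == "__main__":
    # Hypothetical target names and made-up sequences, for illustration only.
    example_targets = {
        "homolog_a": "PKLVTGE",
        "homolog_b": "MKLVTGE",
    }
    for name in flag_risky_targets(example_targets):
        print(name, "starts with proline; the N-terminal protonation step needs care")
```

A check like this would only catch the one failure mode described above, which fits with a subset of the homologues failing while others processed fine.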
BennyRop Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
I was wondering if running a single model of these WUs in the lab would have uncovered the problem - assuming that there was a 100% failure rate. But Darren proved that theory wrong. And as dgeiser posted (see dgeiser's sub minute failures post), it's not limited to a subset of 1dtj WUs. Any explanation for why the 1tif and 1mkyA WUs (someone else's failures listed in my last message) failed with less than 0.12 credits on the first machine or two, but managed to succeed on the last machine? Do you have any idea what was different? (hardware/software/starting code?) |
©2024 University of Washington
https://www.bakerlab.org