Message boards : Number crunching : Report stuck & aborted WU here please - II
Author | Message |
---|---|
Nite Owl Send message Joined: 2 Nov 05 Posts: 87 Credit: 3,019,449 RAC: 0 |
I just aborted Result ID 16972487, TRUNCATE_TERMINI_FULLRELAX_1ptq__433_139_0 at 1.04% after two+ hours (CPU run time preference =1 hour). |
biodoc Send message Joined: 19 Feb 06 Posts: 14 Credit: 30,717,792 RAC: 0 |
I've got a stuck work unit at 1.042% complete (4h 50min) w/ 2 hr runtime preference. No activity in graphics mode. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=13936782 Please advise; should I terminate? |
keyboards Send message Joined: 3 Mar 06 Posts: 36 Credit: 74,787 RAC: 0 |
Aborting 7485_largescale_large_fullatom_relax_dec7485_1_47_1.pdb_432_95. Completed 1.76% after 2 hours with no further advance. Set for 2 hours. !!Stupidity should be PAINFUL!! |
biodoc Send message Joined: 19 Feb 06 Posts: 14 Credit: 30,717,792 RAC: 0 |
I've got a stuck work unit at 1.042% complete (4h 50min) w/ 2 hr runtime preference. No activity in graphics mode. I've aborted this work unit after six hours. https://boinc.bakerlab.org/rosetta/result.php?resultid=17002591 |
Purple Rabbit Send message Joined: 24 Sep 05 Posts: 28 Credit: 4,296,740 RAC: 3,006 |
This one ran for 6 hours stuck at 1.04%. I restarted BOINC and the WU began again at zero. It quickly ran up to 1.04%, but seemed to have hung again according to the graphics display. I aborted the WU after 14 minutes (the second time). TRUNCATE_TERMINI_FULLRELAX_1enh__433_53_0 using rosetta version 498 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=13904970 |
Delk Send message Joined: 20 Feb 06 Posts: 25 Credit: 995,624 RAC: 0 |
The fun continues, https://boinc.bakerlab.org/rosetta/result.php?resultid=16988523 https://boinc.bakerlab.org/rosetta/result.php?resultid=17002662 Both aborted via cli due to 1% error, 12 hours lost. |
Christian Barrett Send message Joined: 17 Sep 05 Posts: 11 Credit: 14,933 RAC: 0 |
here is one that cost me dearly https://boinc.bakerlab.org/rosetta/workunit.php?wuid=13792169 10 Apr 2006 22:50:35 UTC 12 Apr 2006 3:54:52 UTC Over Client error Done 70,199.00 105.31 --- |
Delk Send message Joined: 20 Feb 06 Posts: 25 Credit: 995,624 RAC: 0 |
Another: https://boinc.bakerlab.org/rosetta/result.php?resultid=16998699 9.5 hours - 1% manually killed |
JT.Ault Send message Joined: 9 Dec 05 Posts: 1 Credit: 829,315 RAC: 0 |
home1 rosetta@home 4/11/2006 9:48:33 PM Unrecoverable error for result TRUNCATE_TERMINI_FULLRELAX_2tif__433_106_0 (aborted via GUI RPC) https://boinc.bakerlab.org/rosetta/result.php?resultid=16970267 Exit status -197 (0xffffff3b) application version 4.98 Stuck at 1.04% |
Team_Elteor_Borislavj~Intelligence Send message Joined: 7 Dec 05 Posts: 14 Credit: 56,027 RAC: 0 |
One of my clients hasn't contacted bakerlab since 23 March; will investigate this evening why it died... |
Delk Send message Joined: 20 Feb 06 Posts: 25 Credit: 995,624 RAC: 0 |
This is so not my day, maybe I'll hit the record for most lost work in a 24 hour period. https://boinc.bakerlab.org/rosetta/result.php?resultid=17029579 21,256.53 seconds still at 1% manually aborted. |
[DPC]Charley Send message Joined: 18 Mar 06 Posts: 9 Credit: 295,915 RAC: 0 |
Got another two units stuck at 1%, aborted 'em: TRUNCATE_TERMINI_FULLRELAX_1b3aA_433_355 after 6 hours and TRUNCATE_TERMINI_FULLRELAX_1b3aA_433_479_0 after 10 hours (seriously stuck, no counters increase except for the time) |
Delk Send message Joined: 20 Feb 06 Posts: 25 Credit: 995,624 RAC: 0 |
and another one, 4.5 hours stuck at 1%: https://boinc.bakerlab.org/rosetta/result.php?resultid=17043276 all these stuck units are from different systems and a mix of linux/windows. |
Christian Hagen Send message Joined: 26 Sep 05 Posts: 5 Credit: 46,795 RAC: 0 |
Got also a WU stuck at 1% and aborted it https://boinc.bakerlab.org/rosetta/result.php?resultid=17029338 TRUNCATE_TERMINI_FULLRELAX_1enh__433_691_0 after 2.5 hours |
Delk Send message Joined: 20 Feb 06 Posts: 25 Credit: 995,624 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=17060746 https://boinc.bakerlab.org/rosetta/result.php?resultid=17044331 and https://boinc.bakerlab.org/rosetta/result.php?resultid=17051524 OK, what's the word on these work units? This is really annoying. |
Jose Send message Joined: 28 Mar 06 Posts: 820 Credit: 48,297 RAC: 0 |
ARGGGGGGGGGGGGGGGGGHHHHHHHHHHHHHHHHHH Another failed project 17027763 12417632 11 Apr 2006 20:35:20 UTC 12 Apr 2006 12:28:07 UTC Over Client error Computing 44,064.20 136.62 This makes at least 5 projects with crashes and more than 5 CPU days wasted in total. What the hell is happening? To say I am frustrated is an understatement. "This and no other is the root from which a Tyrant springs; when he first appears he is a protector." Plato |
Laurenu2 Send message Joined: 6 Nov 05 Posts: 57 Credit: 3,818,778 RAC: 0 |
ARGGGGGGGGGGGGGGGGGHHHHHHHHHHHHHHHHHH Well don't feel too bad Jose, I seem to have to abort 60 to 100 Hrs of wasted CPU time every DAY. I did abort just today 7 WU's STUCK at 1.04% for a total of 80 HRs. DAVID what are you going to do about solving this problem ??? Any end in sight? Baby sitting your client does consume a lot of my time. If You Want The Best You Must forget The Rest ---------------And Join Free-DC---------------- |
Jimi@0wned.org.uk Send message Joined: 10 Mar 06 Posts: 29 Credit: 335,252 RAC: 0 |
2 WUs here stuck at 1.04% TRUNCATE_TERMINI_FULLRELAX_1ptq_433_485_0 TRUNCATE_TERMINI_FULLRELAX_1enh_433_558_0 There are two more in this series to come; I'll abort the stuck ones and see what happens. Edit: the subsequent WUs seem to be running ok, although one of them had already been aborted elsewhere. Anyway, they're both past 8% so fingers crossed. NB: my default is 4 hours and the two units above are the first to have stuck. |
David Baker Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0 |
ARGGGGGGGGGGGGGGGGGHHHHHHHHHHHHHHHHHH sounds to me like things are worse than they were a week ago, is this correct? the only change is that we increased the default run time from 2 hours to 4 hours, which reduces network traffic at the cost of an increased chance of work unit errors (because they are longer). we can set the default back to two hours and see if it helps. anyway--main question--are people seeing more stuck work units now than 7-10 days ago? |
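The trade-off described in the post above (longer default run times mean less network traffic but a higher chance that any one work unit errors out) can be illustrated with a rough sketch. It assumes a constant, independent chance of failure in each hour of run time; that model and the 1% hourly rate are hypothetical and are not figures from the project. Units that hang at a fixed point such as 1.04% may well not follow this model at all.

```python
def failure_probability(hourly_failure_rate: float, run_hours: float) -> float:
    """Chance that a work unit fails at least once during run_hours.

    Assumes each hour of run time carries the same, independent failure
    probability (an illustrative assumption, not a measured property).
    """
    return 1.0 - (1.0 - hourly_failure_rate) ** run_hours


# Hypothetical per-hour failure rate, chosen only for illustration.
HOURLY_RATE = 0.01

if __name__ == "__main__":
    for hours in (2, 4):
        p = failure_probability(HOURLY_RATE, hours)
        print(f"{hours} h default run time: {p:.4%} chance of an error")
```

Under these assumptions, doubling the default from 2 to 4 hours roughly doubles the chance that a single work unit fails, while halving the number of work units a host has to download and report.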