Message boards : Number crunching : Help us solve the 1% bug!
Author | Message |
---|---|
OhioDude Joined: 11 Dec 05 Posts: 8 Credit: 4,056,499 RAC: 0 |
Two more stuck at 1%: HB_BARCODE_30_1scjB_351_1354_0 and HB_BARCODE_30_1a19A_351_3105_1. Visit my websites honoring some of America's heroes: USS Rich DE-695 USS Bunch DE-694 / APD-79 |
Mike Joined: 21 Dec 05 Posts: 9 Credit: 35,252 RAC: 0 |
Computer reboot solved my problem. 8 hours stuck at 1%. |
Los Alcoholicos~La Muis Joined: 4 Nov 05 Posts: 34 Credit: 1,041,724 RAC: 0 |
Stuck at 1% (model 1 - step 21731): SHORTRELAX_2tif__333_146_1. Unfortunately it took 21:34 hrs before I noticed it. It was stuck once before I got it. Maybe you could consider not sending stuck WUs out again, because R@h is losing lots of valuable crunching time. |
arklms Joined: 17 Dec 05 Posts: 7 Credit: 177,488 RAC: 0 |
SSFEATURES_BARCODE_ABINITIO_5croA_334_286_0 66 hours, 1%. Can't get it to budge. |
OhioDude Joined: 11 Dec 05 Posts: 8 Credit: 4,056,499 RAC: 0 |
Stuck at 1%: HB_BARCODE_30_1b3aA_351_3464_0. Visit my websites honoring some of America's heroes: USS Rich DE-695 USS Bunch DE-694 / APD-79 |
Los Alcoholicos~La Muis Joined: 4 Nov 05 Posts: 34 Credit: 1,041,724 RAC: 0 |
Stuck at 1% (model 1 - step 21615) HB_BARCODE_30_1bk2_351_1307_0 |
OhioDude Joined: 11 Dec 05 Posts: 8 Credit: 4,056,499 RAC: 0 |
Stuck at 1%: HB_BARCODE_30_1a68__351_3815 and HB_BARCODE_30_1acf__351_3730. Visit my websites honoring some of America's heroes: USS Rich DE-695 USS Bunch DE-694 / APD-79 |
OhioDude Joined: 11 Dec 05 Posts: 8 Credit: 4,056,499 RAC: 0 |
More stuck at 1%: HOMSr6_homDB027_1r69__352_1131_0 Visit my websites honoring some of America's heroes: USS Rich DE-695 USS Bunch DE-694 / APD-79 |
Mike Joined: 21 Dec 05 Posts: 9 Credit: 35,252 RAC: 0 |
Hi all. I'm beginning to think that this problem is caused by the BOINC screen saver. I have changed my screen saver to a Windows screen saver and so far so good. Will keep you informed. |
doc :) Joined: 4 Oct 05 Posts: 47 Credit: 1,106,102 RAC: 0 |
Got my first stuck-at-1% WU on my Athlon XP PC (had 2 long ago on my Duron PC). It was stuck at step 22813 of model 1 for more than 1 hour and 30 minutes; exiting and restarting BOINC brought it past that step in less than one minute. WU - result (nothing to see there yet, it's still running here while I type this :)) |
paul and kirsty yates Joined: 14 Jan 06 Posts: 2 Credit: 179,851 RAC: 0 |
Think I've got one too: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=11223579 FA_RLXfn_hom024_1fna__360_141_0. Only 1 hour at present, but no movement on the screensaver and no increase in %. Shall I abort? |
Robert Joined: 3 Jan 06 Posts: 4 Credit: 23,755 RAC: 0 |
Got one stuck at 1%. Restarted BOINC, no help. Restarted the PC: CPU time reset to 00:00 and after 30 min still at 1%. Work unit FA_RLXai_hom020_aiu_359_363_0. Aborted via GUI RPC. |
Angus Joined: 17 Sep 05 Posts: 412 Credit: 321,053 RAC: 0 |
This thread is approaching 100 posts - Is the project ANY closer to solving the 1% bug? Proudly Banned from Predictator@Home and now Cosmology@home as well. Added SETI to the list today. Temporary ban only - so need to work harder :) "You can't fix stupid" (Ron White) |
xrobert Joined: 28 Oct 05 Posts: 3 Credit: 168,865 RAC: 0 |
The WU I have has got stuck on 1%. FA_RLXai_hom023_1aiu__359_454_0 I've now also aborted another unit which begins with ai_hom, to be safe. |
Mike Joined: 21 Dec 05 Posts: 9 Credit: 35,252 RAC: 0 |
Hi all. Since turning off the BOINC screen saver I have had no problems with the 1% bug for the last 4 days. |
vavega Joined: 2 Nov 05 Posts: 82 Credit: 519,981 RAC: 0 |
be vigilant mike, i've never run the screensaver and yesterday had 2 stuck at 1%. i tried rebooting (which twice in the past had gotten it to crunch past 1%) but alas this time failed. so they were aborted. i'm with angus on this.....can we have an update from someone? |
Carlos_Pfitzner Joined: 22 Dec 05 Posts: 71 Credit: 138,867 RAC: 0 |
Stuck at 1.00%: https://boinc.bakerlab.org/rosetta/result.php?resultid=13879990 02:37:08 CPU time - Pentium 4 1800 MHz, Rosetta 4.82, Windows. * I suspended this WU (removing it from memory); it should be restarted automatically later. If it goes past 3 hours of CPU time still stuck at 1.00%, I will abort this WU. * My preferred run time is 2 hours -:( * And I don't use a screen saver. Click signature for global team stats |
Mike Joined: 21 Dec 05 Posts: 9 Credit: 35,252 RAC: 0 |
Sorry. I should have said I have disabled all screen savers. |
Pirxx Joined: 8 Jan 06 Posts: 2 Credit: 42,312 RAC: 0 |
Hi! After crunching for quite a while, I have my first WU that has got stuck on 1%: FA_RLXbg_hom029_1bgf__359_368_0. I noticed it after it wasted about 12 hours of my CPU time :-( I tried all the tricks proposed in this thread in order to get it going again: I suspended computing of this WU, restarted BOINC, and rebooted my computer, all with no luck (maybe I was a bit too impatient in those trials, as it turned out later). I also ran the WU outside of BOINC, as proposed in David Kim's instructions. It went OK, though it took ~30 minutes to go past 1%, so maybe it would also have run in BOINC after the restart if I had waited that long. I waited only about 20 min because I thought that with a standard WU crunch time of 2 hrs, 1% couldn't take that long. I use a P4 3GHz with HT, WinXP Professional, BOINC 5.2.13. I had 'leave app in memory' set to 'no' because I did not know I should have it set to 'yes'. But with a fast computer and the switching time between projects set to 120 min, it shouldn't be such a problem. I have stdout.txt saved and I can send it, if it could be of any use. I hope the above notes will help you. Pirxx Pirxx |
napolj2 Joined: 29 Dec 05 Posts: 1 Credit: 1,021,633 RAC: 0 |
Hello. Earlier I had my first 1% bug after processing many work units with little or no problems. Result WorkUnit

I noticed the work unit was stuck at 1% after having run for 15 hours! The graphics were completely frozen (no change in # steps) at Model 1, Ab initio, step 21924, while R@H was still consuming high CPU usage.

From stdout.txt:

    command executed: projects/boinc.bakerlab.org_rosetta/rosetta_4.82_windows_intelx86.exe xx 1cg5 B -output_silent_gz -silent -increase_cycles 10 -relax_score_filter -new_centroid_packing -abrelax -output_chi_silent -stringent_relax -vary_omega -omega_weight 0.5 -farlx -ex1 -ex2 -short_range_hb_weight 0.50 -long_range_hb_weight 1.0 -no_filters -nstruct 10 -protein_name_prefix hom023_ -frags_name_prefix hom023_ -filter1 -240 -filter2 -255 -termini -cpu_run_time 7200 -constant_seed -jran 2764725

stdout.txt ended with:

    score0 done: (best, low) rms 0 0 29.1041889
    ---------------------------------------------------------
    score1 done: (best, low) rms (best,low) -38.2760544 -41.4388504 13.1321259 12.8618298
    standard trials: 20000 accepts: 1434 %: 7.17
    -----------------------------------------------------
    Alternate score2/score5...
    kk score2 score5 low_score n_low_accept rms rms_min low_rms
    0 -7.574 -7.574 -7.574 77 12.862 12.137 12.862

As per instructions, I stopped the BOINC service and ran Rosetta outside of BOINC from the command line; it did not get stuck. I closed Rosetta and then restarted BOINC. BOINC restarted the workunit from the beginning and ran it to completion, with the same random seed, without trouble. stdout.txt now continued on like this:

    score0 done: (best, low) rms 0 0 29.1041889
    ---------------------------------------------------------
    score1 done: (best, low) rms (best,low) -38.2760544 -41.4388504 13.1321259 12.8618298
    standard trials: 20000 accepts: 1434 %: 7.17
    -----------------------------------------------------
    Alternate score2/score5...
    kk score2 score5 low_score n_low_accept rms rms_min low_rms
    0 -7.574 -7.574 -7.574 77 12.862 12.137 12.862
    converged 2.55542803 104397
    1 -11.212 -11.212 -14.466 83 12.846 12.137 12.871
    converged 1.9010148 104842
    2 -18.353 -18.353 -19.036 89 12.853 12.137 12.893
    converged 1.58851826 110341
    3 -12.661 -12.661 -22.760 93 12.912 12.137 12.875
    4 16.287 16.287 -24.439 96 21.553 12.137 12.886
    5 18.037 18.037 -24.986 97 22.463 10.673 12.858
    converged 2.06075215 119404
    6 -2.563 -2.563 -29.275 98 13.062 10.673 12.902
    converged 1.70931101 123891
    7 -12.944 -12.944 -29.275 98 13.164 10.673 12.902
    8 -10.142 -10.142 -29.275 98 15.353 10.673 12.902
    9 -21.779 -21.779 -31.775 100 17.247 10.673 18.628
    converged 2.26275945 100960
    10 -31.685 -31.685 -31.763 100 18.124 10.673 18.628
    standard trials: 97008 accepts: 6066 %: 6.25309
    -----------------------------------------------------
    Starting score3 moves...
    kk,score3,low_score,rms_err,low_rms,rms_min,naccept
    0 40.546 40.546 18.628 18.628 10.673 7500
    1 60.275 24.222 18.322 18.137 10.673 12142
    pre-computing chuck/gunn move set for frag length 3
    2 42.101 21.584 17.448 18.061 10.673 15124
    3 33.776 21.028 18.461 17.431 10.673 17792
    standard trials: 40000 accepts: 4642 %: 11.605
    smooth trials: 80000 accepts: 5650 %: 7.0625
    -----------------------------------------------------
    -----------------------------------------------------

Comparing this with the earlier output, apparently BOINC (or Rosetta) got stuck somewhere after the line "0 -7.574 -7.574 -7.574 77 12.862 12.137 12.862" but before "converged 2.55542803 104397". Also, this final stdout.txt contained exactly the same numbers as the stdout.txt from when I ran Rosetta from the command line. This is weird. I can't say the problem lies with BOINC or the BOINC-Rosetta interface, because BOINC ran the exact same command without error the second time around. I can only guess that the 1% bug is caused by some odd combination of outside events, like threads executing in a different order.

I will see if I can do anything to reproduce it. Other info on my configuration: WinXP SP2. 2.6GHz Celeron CPU. 1GB RAM. BOINC 5.2.13 running as system service. Rosetta 4.82. Also running RALPH@Home. Applications kept in memory when swapping. |
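The comparison napolj2 describes (finding where the stuck run's stdout.txt stops inside the stdout.txt of the completed run) can be sketched in Python roughly as follows. This is only an illustration: `find_stall_point` and the `STALLED` / `COMPLETED` names are hypothetical, not part of BOINC or Rosetta, and the sample lines are short excerpts copied from the post above.

```python
def find_stall_point(stalled_lines, completed_lines):
    """Locate where a stalled log stops inside a completed log.

    Returns (last_common_line, next_line, is_prefix):
      last_common_line - last line both logs share from the top
      next_line        - first line only the completed log has
      is_prefix        - True if the stalled log matches the start
                         of the completed log line for line
    """
    # Ignore blank lines and surrounding whitespace in both logs.
    stalled = [ln.strip() for ln in stalled_lines if ln.strip()]
    completed = [ln.strip() for ln in completed_lines if ln.strip()]

    # Count how many leading lines are identical.
    common = 0
    for a, b in zip(stalled, completed):
        if a != b:
            break
        common += 1

    is_prefix = common == len(stalled)
    last_common = completed[common - 1] if common else None
    next_line = completed[common] if common < len(completed) else None
    return last_common, next_line, is_prefix


# Excerpts from the two stdout.txt dumps quoted in the post (hypothetical names).
STALLED = [
    "Alternate score2/score5...",
    "kk score2 score5 low_score n_low_accept rms rms_min low_rms",
    "0 -7.574 -7.574 -7.574 77 12.862 12.137 12.862",
]
COMPLETED = STALLED + [
    "converged 2.55542803 104397",
    "1 -11.212 -11.212 -14.466 83 12.846 12.137 12.871",
]

if __name__ == "__main__":
    last, nxt, is_prefix = find_stall_point(STALLED, COMPLETED)
    print("stalled log is a prefix of completed log:", is_prefix)
    print("last line written before the stall:", last)
    print("first line the stalled run never wrote:", nxt)
```

With real files, the two lists could be read with `open(path).readlines()` from a saved copy of the stuck stdout.txt and the one from the finished run; the file names are up to whoever saved them. If `is_prefix` comes back `False`, the two runs diverged before the stall, which would point away from the "same seed, same numbers" picture described in the post.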
©2024 University of Washington
https://www.bakerlab.org