Minirosetta 3.73-3.78

Message boards : Number crunching : Minirosetta 3.73-3.78

rjs5
Joined: 22 Nov 10
Posts: 273
Credit: 23,054,272
RAC: 5,361
Message 79801 - Posted: 27 Mar 2016, 0:21:12 UTC - in response to Message 79799.  

Has anyone else gotten work units for Minirosetta 3.71 that are estimated to run 14 days? I'm running on an old (2009) Mac with 8GB of memory and lately I've gotten these here and there.

Thanks

Boinc 7.6.22
Mac OS 10.11.4


Rosetta appears OK.

I just set my Rosetta PREFERENCES: CPU TARGET RUNTIME = 14 hours, enabled Rosetta computing on one of my Linux 64-bit systems, and Rosetta downloaded 50 14-hour jobs. I think the only difference between a default 6-hour job and a 14-hour job is the value Rosetta sets for the "-cpu_run_time" command-line option (21600 seconds for the default 6 hours). I don't think Rosetta jobs care what system they execute on: Mac OS, Windows, Linux x86/x86-64.

I think ALL Rosetta jobs are set up to try at most 99 decoys. The cpu_run_time limit set by the user is checked after every decoy. If that time is exceeded, Rosetta wraps up the job and it completes early, before the 99-decoy limit is reached.

NOTE: THIS is one reason why it is very, very tough to compare system performance. Jobs will stop before the preference time IF and ONLY IF they complete the 99 decoys.
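A minimal sketch of the stopping rule as described in this post: a cap of 99 decoys, with the user's cpu_run_time checked only after each decoy finishes. The per-decoy CPU times are made-up inputs, the function name is hypothetical, and (as the next reply points out) not every task type uses this rule.

```python
from typing import Sequence, Tuple

def run_decoys(decoy_seconds: Sequence[float],
               cpu_run_time: float = 21600,
               max_decoys: int = 99) -> Tuple[int, float]:
    """Return (decoys_completed, cpu_seconds_used) under the rule described above."""
    used = 0.0
    done = 0
    for t in decoy_seconds[:max_decoys]:  # never start more than max_decoys
        used += t
        done += 1
        # The time limit is only checked once a decoy has finished.
        if used >= cpu_run_time:
            break
    return done, used

print(run_decoys([60] * 200))    # fast decoys: stops at the 99-decoy cap
print(run_decoys([1000] * 200))  # slow decoys: stops once 21600 s is passed
```

With fast decoys the task hits the cap long before the time preference, which is why run times alone say little about machine speed.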

I would guess that your event log is showing some problem with the DISK SPACE available to Rosetta. BOINC has 3 possible disk limits, and I always seem to hit them accidentally:

1. maximum amount used
2. amount to leave free
3. maximum % of disk
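The three limits interact. Assuming the client simply takes the most restrictive of them, the space BOINC may use works out as in the sketch below. The function and parameter names are hypothetical and only mirror the three preferences listed above; they are not BOINC's internal API.

```python
def allowed_boinc_disk_gb(total_gb, free_gb, boinc_used_gb,
                          max_used_gb=None, min_free_gb=None, max_used_pct=None):
    """Most restrictive of the three disk preferences (None means 'not set')."""
    limits = []
    if max_used_gb is not None:
        limits.append(max_used_gb)                       # 1. maximum amount used
    if min_free_gb is not None:
        # 2. amount to leave free: BOINC may only grow until free space hits min_free_gb
        limits.append(free_gb + boinc_used_gb - min_free_gb)
    if max_used_pct is not None:
        limits.append(total_gb * max_used_pct / 100.0)   # 3. maximum % of disk
    if not limits:
        return total_gb
    return max(0.0, min(limits))

# A nearly full disk: "leave 20 GB free" is the binding limit, so nothing is allowed.
print(allowed_boinc_disk_gb(500, 10, 2, max_used_gb=100, min_free_gb=20, max_used_pct=50))
```

Whichever preference yields the smallest number is the one you will "hit" first.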


SAMPLE command line: only the leading executable name differs between operating systems, and the Rosetta server adds the command line when it dispatches the job to a system.

command: minirosetta_3.71_x86_64-apple-darwin -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:native 00001.pdb -silent_gz 1 -frag9 00001.200.9mers -out:file:silent default.out -ex1 1 -abinitio::rsd_wt_loop 0.5 -relax::default_repeats 15 -abinitio::use_filters false -abinitio::increase_cycles 10 -abinitio::rsd_wt_helix 0.5 -abinitio::rg_reweight 0.5 -in:file:boinc_wu_zip NTF2_215_N90N92K61_4_9_1_data.zip -out:file:silent default.out -silent_gz -mute all -nstruct 10000
-cpu_run_time 21600
-checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 3415526
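To see what a task was actually given, you can pull the numeric options out of the command line. A minimal sketch, assuming the wrapped lines above are joined into one string; the helper names are hypothetical. The value is in seconds, so 21600 is 6 hours; if only this value changes, as suggested above, a 14-hour preference would presumably show up as 50400.

```python
import shlex

def runtime_settings(command_line):
    """Return the integer values of -cpu_run_time and -checkpoint_interval, if present."""
    tokens = shlex.split(command_line)
    settings = {}
    for flag in ("-cpu_run_time", "-checkpoint_interval"):
        if flag in tokens:
            settings[flag.lstrip("-")] = int(tokens[tokens.index(flag) + 1])
    return settings

def hours_to_cpu_run_time(hours):
    """Convert a CPU target runtime preference in hours to seconds."""
    return int(hours * 3600)

excerpt = "-silent_gz -mute all -nstruct 10000 -cpu_run_time 21600 -checkpoint_interval 120"
settings = runtime_settings(excerpt)
print(settings)                         # {'cpu_run_time': 21600, 'checkpoint_interval': 120}
print(settings["cpu_run_time"] / 3600)  # 6.0
```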


James Adrian
Joined: 27 Apr 12
Posts: 5
Credit: 1,801,535
RAC: 0
Message 79805 - Posted: 27 Mar 2016, 16:27:07 UTC - in response to Message 79801.  

rjs5, thanks for all the info!

I checked the logs and didn't find any errors, and my prefs show 6 hours, as you mentioned. If it happens again I'll wait for the 6-hour mark, just so I can see what happens. (:-)


Snags
Joined: 22 Feb 07
Posts: 198
Credit: 2,888,320
RAC: 0
Message 79806 - Posted: 27 Mar 2016, 17:48:48 UTC - in response to Message 79801.  



I think ALL Rosetta jobs are set up to just try 99 decoys. The cpu_run_time limit set by the user is checked after every decoy. If that time is exceeded, Rosetta wraps up the job and it is completed early, before the 99 decoy limit is reached.

... Jobs will only stop before the preference time IF and ONLY IF they complete the 99 decoys.

Not all. If you check any FFD_ tasks in your list, you will see that they generate many hundreds of models (I have several with over 1,000 models generated).

If memory serves, the 99-model limit was enacted when some tasks created output files too large to be uploaded. The limit applies only to a particular type of task. Others use the "preferred CPU time plus 4 hours" method to decide when to end. When a model is completed, the task calculates whether it has time left to complete another model. If the answer is no, the task wraps things up even though there appear (to the cruncher) to be hours left. If the answer is yes, the task begins another model. Not all models are equal, however, even within the same task, so some take longer than predicted. To ensure that otherwise good models aren't cut short just before completing (and to increase the odds that the task completes at least one model), the task will continue past the preferred CPU time. At some point, though, you gotta cut your losses, so at preferred CPU time plus 4 hours the watchdog cuts bait and the task goes home. (I'm curious about the average overtime; my totally uninformed guess is that it's less than an hour.)
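The "preferred CPU time plus 4 hours" behaviour can be sketched as a small simulation. The per-model times and the estimator (the average of the models finished so far) are illustrative assumptions, not Rosetta's actual code, and the function names are hypothetical.

```python
WATCHDOG_GRACE = 4 * 3600  # watchdog limit: preferred CPU time plus 4 hours

def average(times):
    return sum(times) / len(times)

def run_task(model_seconds, pref_seconds, estimate_next=average):
    """Simulate one task; return (models_completed, cpu_seconds_used, killed_by_watchdog)."""
    used = 0.0
    done = []
    for actual in model_seconds:
        # After each finished model, start another only if it is predicted to fit.
        if done and used + estimate_next(done) > pref_seconds:
            return len(done), used, False
        # A model in progress may run past the preference, but not past the watchdog.
        if used + actual > pref_seconds + WATCHDOG_GRACE:
            return len(done), float(pref_seconds + WATCHDOG_GRACE), True
        used += actual
        done.append(actual)
    return len(done), used, False

print(run_task([1000, 3000], 3600))  # second model overruns the 1-hour preference by 400 s
```

The first model always starts, which matches the aim of finishing at least one model per task.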

There are other types of tasks in which filters are employed to cut off models early. If a model passes the filter, the task continues working on it to the end. This results in dramatically disparate counts, with one task generating hundreds of models while another task from the same batch generates only one, two, five, etc. Recently on Ralph, a filter was used to remove models, resulting in a file transfer error upon upload. The stderr output listed 13 models from 2 attempts, but since the models had been erased, the file meant to contain them didn't exist. I'm guessing, based on DEK's post (which I may well have misinterpreted), that the server, possibly as part of a validation check, automatically reports a file transfer error (client error, compute error) when this particular file isn't part of the upload.

All these different strategies result, from the cruncher's point of view, in varied behavior that we struggle to interpret. Is it a problem with my computer or a problem with Rosetta? Is it a problem at all? BOINC is complicated enough for the computer-savvy, and much more so for the majority of crunchers, who just want to maximize their participation in Rosetta and end up massively tangled in the BOINC settings. The variety of legitimate behaviors exhibited by Rosetta tasks trips up the volunteers trying to help them get untangled. From the researchers' point of view, everything may look fine and work as expected, and any issues a lone cruncher is having are most likely due to their particular setup. And they probably are, but the lack of information leaves the volunteers flailing.

I have long wished for a reference, a database of tasks, in which the tasks are divided into broad categories by the strategy employed (as above, with some info on how they "look" to the crunchers) and by what, in a most basic way, is being asked (how does this particular protein fold, how do these two proteins interact, can we create a new protein to do x, etc.).

Best,
Snags

krypton
Volunteer moderator
Project developer
Project scientist
Joined: 16 Nov 11
Posts: 108
Credit: 2,164,309
RAC: 0
Message 79807 - Posted: 28 Mar 2016, 0:55:42 UTC

Thanks for the report you guys!

I'm responsible for the *MAP* jobs. I'm getting 90% success, which is "normal", but if it turns out that some of the 10% of failures are coming from Macs, we could fix this!

I'll do some local tests on my Mac.
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 79809 - Posted: 28 Mar 2016, 8:32:49 UTC

Hi krypton.

I've had 9 of your tasks fail on one rig just today so far, all with the same error, like this one:

https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=1622019

P76481_PF12034_90-575_300-486_EN_MAP_hyb_cst_v02_i01_t000__krypton_SAVE_ALL_OUT_03_09_341621_123_0

<core_client_version>7.0.27</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>

(EDITED OUT THE REST)

Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_f513f38.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/P76481_PF12034_90-575_300-486_EN_MAP_hyb_cst_v02_i01_t000__krypton.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 14400
SIGSEGV: segmentation violation
Stack trace (22 frames):
[0xd98d38f]
[0xb7766404]
[0xb6d9849]
[0xb8ef314]
[0xb8f1a90]
[0xb8f4b33]
[0xb90ae55]
[0xb7ecda9]
[0xb8cebea]
[0xc2ff844]
[0xc31427f]
[0xabe3c0b]
[0x8d92b93]
[0xb04b065]
[0xb05021c]
[0xb0f6a35]
[0xb0f959e]
[0xb1b8bc3]
[0xb1b524d]
[0x8057071]
[0xda24988]
[0x8048131]

Exiting...

</stderr_txt>
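The three numbers in "process exited with code 193 (0xc1, -63)" are the same value written three ways: decimal, hexadecimal, and the low byte read as a signed 8-bit integer. A small sketch reproducing that formatting (the function name is hypothetical, and this says nothing about what 193 means beyond the SIGSEGV already shown in the log):

```python
def format_exit_code(code):
    """Format an exit code as 'decimal (hex of low byte, low byte as signed 8-bit)'."""
    low = code & 0xFF
    signed = low - 256 if low >= 128 else low
    return f"{code} ({hex(low)}, {signed})"

print(format_exit_code(193))  # 193 (0xc1, -63)
```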

Cap
Joined: 29 Aug 11
Posts: 3
Credit: 7,112,836
RAC: 0
Message 79812 - Posted: 28 Mar 2016, 18:47:57 UTC

I have had several tasks fail, and some of them fail to exit, leaving a BOINC slot in use but no processing being done. Those I had to force quit. They all have an error message from malloc saying that a free was done on a block that was not allocated, or that a block was corrupted after being freed. It seems that the app is using a block after it has been freed.

I don't know why BOINC isn't cleaning up after some of these.
krypton
Volunteer moderator
Project developer
Project scientist
Joined: 16 Nov 11
Posts: 108
Credit: 2,164,309
RAC: 0
Message 79814 - Posted: 28 Mar 2016, 20:13:13 UTC
Last modified: 28 Mar 2016, 21:44:12 UTC

The error turned out to be related to a new rotamer library we are using, which I happened to enable for the *MAP* jobs. I confirmed it on my Mac; it appears to happen only on Macs (and some older Linux machines).

I currently have no more jobs in the queue. For all future jobs I'll revert to the older rotamer library until the error is fixed! Thanks for the examples; they were helpful in debugging.

Update:
I just submitted a new batch of jobs *REDO_MAP*, if you get any errors from these, please report!

Thanks,
-krypton
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 79826 - Posted: 31 Mar 2016, 18:16:25 UTC

I just updated the minirosetta app to 3.73. This version includes new protocols, including the remodel protocol for design, and various bug fixes. It uses the latest Rosetta source.
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1995
Credit: 9,635,132
RAC: 6,870
Message 79828 - Posted: 1 Apr 2016, 5:44:19 UTC

It seems that the memory problem of 3.72 is not completely resolved:
806102354
806102335
Chilean
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 79840 - Posted: 4 Apr 2016, 16:21:15 UTC

Maybe instead of the blue, make the graphics "fit" the more common widescreen configuration. That way there won't be any "blank space" on the right of modern monitors.
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 79846 - Posted: 5 Apr 2016, 6:32:44 UTC

Forgot to mention, I added a project specific option for the black screen (like before).
Chilean
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 79863 - Posted: 9 Apr 2016, 12:14:38 UTC - in response to Message 79846.  

Forgot to mention, I added a project specific option for the black screen (like before).


Oh. Thanks!
Mark Kramer
Joined: 25 Jun 10
Posts: 5
Credit: 74,534
RAC: 0
Message 79918 - Posted: 24 Apr 2016, 1:15:02 UTC
Last modified: 24 Apr 2016, 1:16:16 UTC

Mine seems to have been locking up since Friday 4/22/16. I run both SETI and Rosetta, but each time it has locked up, it's been stuck on Rosetta. It's running on an XP machine, so I don't know if that's the issue. System logs don't show anything other than events stop happening. I've already uninstalled BOINC, completely wiped the BOINC folder, and reinstalled. I have noticed in Task Manager that Minirosetta has recently been starting itself while I've been using the computer, despite my preferences.

Since I haven't seen any other entries on the BOINC/SETI/Rosetta boards, I presume that it is this computer or XP. As it stands, I'm going to remove BOINC for a while, since the issues with it turning itself on while the computer is in use and locking up have become too much to ignore. However, if this is happening to anyone else and they have worked around it, feel free to post. I also wasn't sure which thread to post this in, so I posted it in the top 2 threads concerning technical problems.
robertmiles
Joined: 16 Jun 08
Posts: 1232
Credit: 14,281,662
RAC: 1,150
Message 79919 - Posted: 24 Apr 2016, 2:45:37 UTC - in response to Message 79918.  

Does the SETI@HOME work use a graphics card with an Nvidia GPU? If so, the 364.* series of drivers for Nvidia GPUs has problems, so you might want to check whether going back to the 362.00 driver fixes any problems for you.
svincent
Joined: 30 Dec 05
Posts: 219
Credit: 12,120,035
RAC: 0
Message 79931 - Posted: 25 Apr 2016, 16:11:48 UTC

I'm seeing the same issue that's been reported above: an intermittent failure to get new workunits accompanied by this message in the event log.

Rosetta Mini for Android is not available for your type of computer.

Running Do Network Communication successfully reports the task.

I'm running 2 R@h workunits/Ubuntu 14.04 LTS/Boinc 7.2.42/No workbuffer
LarryMajor
Joined: 1 Apr 16
Posts: 22
Credit: 31,533,212
RAC: 0
Message 79941 - Posted: 26 Apr 2016, 19:50:58 UTC

The same problem still exists today.
It happened on two of my machines, running 32- and 64-bit Linux 3.16.0-4. Forcing an update reports/fetches jobs and clears the 24-hour wait time, and reporting then works normally for about 12 hours, after which the cycle repeats.
Mark Kramer
Joined: 25 Jun 10
Posts: 5
Credit: 74,534
RAC: 0
Message 79942 - Posted: 26 Apr 2016, 20:20:39 UTC - in response to Message 79919.  

Does the SETI@HOME work use a graphics card with an Nvidia GPU? If so, the 364.* series of drivers for Nvidia GPUs has problems, so you might want to check whether going back to the 362.00 driver fixes any problems for you.


It does, but as it's a 9600GT, the highest driver available is 340.22. I was running 337.88, so I updated and then tried again. It has now outright crashed twice while I was running other programs (Starcraft 2 while I was redoing graphics settings, and SWTOR just now). Because both crashes completely locked up the system to the point of needing a reboot, I couldn't check Task Manager to see if Minirosetta had started itself again. Reviewing the logs under Administrative Tools didn't show me anything either.

I've uninstalled BOINC again and I'm just going to run this computer as normal for the next two days. If it crashes again during that time, then I'll know that something else is wrong with the computer. If it doesn't, then I'm probably going to lean towards it being an XP/older graphics card conflict with BOINC.
Mark Kramer
Joined: 25 Jun 10
Posts: 5
Credit: 74,534
RAC: 0
Message 79944 - Posted: 27 Apr 2016, 0:04:20 UTC - in response to Message 79942.  

Follow-up: The outright crashes seem to have been caused by the driver upgrade, so I reverted to 337.88. That doesn't resolve the problem with Minirosetta locking up or starting despite my preferences, but at least I know Minirosetta wasn't causing the outright crashes.
svincent
Joined: 30 Dec 05
Posts: 219
Credit: 12,120,035
RAC: 0
Message 79950 - Posted: 27 Apr 2016, 16:43:43 UTC

I'm seeing a lot of Robetta tasks crash immediately; the relevant line in the task log seems to be:

ERROR: Unable to open weights/patch file. None of (./)beta_cart or (./)beta_cart.wts or minirosetta_database/scoring/weights/beta_cart or minirosetta_database/scoring/weights/beta_cart.wts exist
ERROR:: Exit from: src/core/scoring/ScoreFunction.cc line: 2884
[0x4485e82]

Sample tasks:

815715361
815591894

Boinc 7.2.42
Ubuntu 14.04
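The error message spells out the lookup order: the bare name and the name with ".wts" in the working directory, then the same two names under the database's scoring/weights directory. A sketch of that search, reconstructed only from the message text (not from Rosetta's source; the function name is hypothetical). If none of the four candidates exists, which is what these tasks hit for beta_cart, it fails the same way.

```python
import os

def find_weights_file(name, database="minirosetta_database", exists=os.path.exists):
    """Return the first existing candidate path for a score-function weights file."""
    weights_dir = os.path.join(database, "scoring", "weights")
    candidates = [
        name,
        name + ".wts",
        os.path.join(weights_dir, name),
        os.path.join(weights_dir, name + ".wts"),
    ]
    for path in candidates:
        if exists(path):
            return path
    raise FileNotFoundError("None of " + " or ".join(candidates) + " exist")
```

Passing a custom `exists` function lets you check the search order without touching the disk.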
tortuga1
Joined: 16 Oct 08
Posts: 1
Credit: 734,150
RAC: 0
Message 79951 - Posted: 27 Apr 2016, 21:39:36 UTC - in response to Message 79950.  

I'm seeing a lot of robetta tasks crash immediately : the relevant line in the task log seems to be:

ERROR: Unable to open weights/patch file. None of (./)beta_cart or (./)beta_cart.wts or minirosetta_database/scoring/weights/beta_cart or minirosetta_database/scoring/weights/beta_cart.wts exist
ERROR:: Exit from: src/core/scoring/ScoreFunction.cc line: 2884
[0x4485e82]

Sample tasks:

815715361
815591894

Boinc 7.2.42
Ubuntu 14.04




©2024 University of Washington
https://www.bakerlab.org