Discussion on increasing the default run time

Message boards : Number crunching : Discussion on increasing the default run time


Previous · 1 . . . 3 · 4 · 5 · 6 · 7 · 8 · 9 . . . 10 · Next

Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1686
Credit: 17,999,343
RAC: 23,616
Message 93322 - Posted: 4 Apr 2020, 0:40:17 UTC - in response to Message 93320.  

OK, so the target time limits the run length of the task, but how is that affected by, or how does it interact with, the minimum target setting?
I have no idea what you are asking here. The Target CPU time is the minimum time the Task will run for (unless there is a problem with it).


. . Actually I am finding just the opposite: with % CPU numbers running at 50% (2 cores), one task is running just fine (it has just completed AOK and a second one has started), but when I increased CPUs to 75% (3 cores) all hell broke loose.
Most likely due to all the conflicting CPU usage restrictions, Resource share settings, cache settings & any other app_config.xml project-specific limitations, cc_config.xml settings, along with any locally set preferences that override web-based ones you may have made.
I don't have any of these sorts of configurations. I'm only running 1 project, with a 1 day + 0.2 day extra cache setting, and I'm not having any of the issues you are describing while making 100% use of the available cores & threads.
So somewhere between what I have set & what you have set is what is causing the problems.
Check the Manager Event viewer, because it will tell you if Rosetta needs more disk space, but whether or not that is a factor in your issues, I've no idea.
Grant
Darwin NT
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 93326 - Posted: 4 Apr 2020, 0:53:32 UTC - in response to Message 93322.  
Last modified: 4 Apr 2020, 0:53:46 UTC

OK, so the target time limits the run length of the task, but how is that affected by, or how does it interact with, the minimum target setting?

I have no idea what you are asking here. The Target CPU time is the minimum time the Task will run for (unless there is a problem with it).


It is just a target. If your target is 8 hours, and models each take 90 minutes, the 5th model will be completed at 7.5 hours of CPU time. At that point, one reasonably predicts running another model will take you to 9 hours, stopping now only gets you to the 7.5 hours. Neither one is an exact match. But when the prediction exceeds the preference, that is when the next model is not attempted, and the WU is marked completed and reported back, with all of the models completed so far.
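The stopping rule described above can be sketched as a short loop. This is a minimal illustration that assumes every model takes the same CPU time and that the prediction is "time so far plus one average model"; it also always completes at least one model, as the moderator confirms further down the thread. The function name `models_completed` is hypothetical and is not taken from the Rosetta source.

```python
def models_completed(target_hours: float, model_hours: float) -> tuple[int, float]:
    """Return (models finished, CPU hours used) for a hypothetical task.

    Assumes every model takes exactly model_hours of CPU time. At least one
    model is always completed, even if it overruns the target.
    """
    models = 0
    elapsed = 0.0
    while True:
        # Run one more model.
        elapsed += model_hours
        models += 1
        # Predict where the next model would finish, using the average so far.
        average = elapsed / models
        if elapsed + average > target_hours:
            # The prediction exceeds the preference: stop and report.
            return models, elapsed


print(models_completed(8.0, 1.5))  # 8 h target, 90 min models -> (5, 7.5)
print(models_completed(2.0, 3.0))  # one model longer than the target -> (1, 3.0)
```

With an 8 hour target and 90 minute models, the 5th model finishes at 7.5 hours; a 6th would be predicted to end at 9 hours, so the task stops there and reports 5 models.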
Rosetta Moderator: Mod.Sense
Stephen "Heretic"

Joined: 2 Apr 20
Posts: 21
Credit: 11,028
RAC: 0
Message 93331 - Posted: 4 Apr 2020, 2:16:20 UTC - in response to Message 93326.  
Last modified: 4 Apr 2020, 2:19:57 UTC

It is just a target. If your target is 8 hours, and models each take 90 minutes, the 5th model will be completed at 7.5 hours of CPU time. At that point, one reasonably predicts running another model will take you to 9 hours, stopping now only gets you to the 7.5 hours. Neither one is an exact match. But when the prediction exceeds the preference, that is when the next model is not attempted, and the WU is marked completed and reported back, with all of the models completed so far.


. . Thank you, now I understand what the target is about. Is there any preferred or required number of models that need to be run for each task? And is there any metric to know how long each model is taking on a particular host? One thing I have noted by going over the limited information available for my host is that the tasks that seem to run OK are using the 32 bit app; the 3 tasks that have failed (spectacularly) were all for the 64 bit app. Do you know of anything I might have done to cause an issue with the 64 bit app? Grant indicated that RAM is a major factor in running Rosetta tasks; does the 64 bit app significantly increase the demand on RAM? What are the RAM requirements per task for both the 32 bit app and the 64 bit app? Does the 64 bit app increase productivity to any appreciable degree or improve the quality of the modelling? Sorry to barrage you with so many questions, but I am starting from a position of knowing nothing about this project.

. . Thanks for any help or info you can provide.

Stephen

Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1686
Credit: 17,999,343
RAC: 23,616
Message 93332 - Posted: 4 Apr 2020, 2:28:31 UTC - in response to Message 93331.  

Grant indicated that RAM is a major factor in running Rosetta tasks; does the 64 bit app significantly increase the demand on RAM? What are the RAM requirements per task for both the 32 bit app and the 64 bit app? Does the 64 bit app increase productivity to any appreciable degree or improve the quality of the modelling? Sorry to barrage you with so many questions, but I am starting from a position of knowing nothing about this project.
I've seen next to no difference in APR (Average Processing Rate) between the Applications. The type of Task being run determines how much RAM & HDD space is needed.
The present batch of work is using around 700MB per Task (some as low as 400MB, others as high as 800MB). With previous Tasks I was seeing up to 1.3GB on some, and around 1GB for most.
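Using the per-Task figures above, a rough way to check how many Tasks a host can run at once without running short of RAM is to divide the memory free for BOINC by the per-Task footprint. This is a sketch that assumes every Task uses the same amount; the 4 GB free figure and the `max_tasks_in_ram` name are hypothetical.

```python
def max_tasks_in_ram(free_ram_mb: int, per_task_mb: int) -> int:
    """How many Tasks of a given footprint fit in free RAM (whole Tasks only)."""
    if per_task_mb <= 0:
        raise ValueError("per_task_mb must be positive")
    return free_ram_mb // per_task_mb


# Hypothetical host with 4 GB of RAM free for BOINC.
free = 4096
for label, mb in [("typical", 700), ("high", 800), ("older worst case", 1300)]:
    print(label, max_tasks_in_ram(free, mb))
```

On that hypothetical host, 5 Tasks fit at the current batch's 700 to 800MB, but only 3 at the 1.3GB seen on earlier work, so a core count that works for one batch can overcommit memory on another.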
Grant
Darwin NT
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 93348 - Posted: 4 Apr 2020, 5:20:41 UTC - in response to Message 93331.  

Is there any preferred or required number of models that need to be run for each task?


Required, one. That's it. If it takes more than a 2 hour preferred runtime to complete one model, then that is all that it will do. Each machine is different. Each protein is different. Each work unit could be used for millions of models, so the runtime preference is basically just a round number to try and break things up into a digestible size.
Rosetta Moderator: Mod.Sense
Stephen "Heretic"

Send message
Joined: 2 Apr 20
Posts: 21
Credit: 11,028
RAC: 0
Message 93442 - Posted: 5 Apr 2020, 0:08:45 UTC - in response to Message 93348.  
Last modified: 5 Apr 2020, 0:12:02 UTC

Is there any preferred or required number of models that need to be run for each task?


Required, one. That's it. If it takes more than a 2 hour preferred runtime to complete one model, then that is all that it will do. Each machine is different. Each protein is different. Each work unit could be used for millions of models, so the runtime preference is basically just a round number to try and break things up into a digestible size.


. . Just out of curiosity can you offer any reason why they call the models 'decoys'? It seems a strange term to use. I ask because I finally have some completed tasks and went looking for info in the result files.

Stephen

Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 93446 - Posted: 5 Apr 2020, 0:59:09 UTC - in response to Message 93442.  
Last modified: 5 Apr 2020, 2:48:46 UTC

IIRC it has something to do with looking at 10 million of them, with only about 10 or 100 really worth studying further. MOST of them are decoys, i.e. not the solution you are looking for.
Rosetta Moderator: Mod.Sense
Stephen "Heretic"

Send message
Joined: 2 Apr 20
Posts: 21
Credit: 11,028
RAC: 0
Message 93449 - Posted: 5 Apr 2020, 1:17:36 UTC - in response to Message 93446.  
Last modified: 5 Apr 2020, 1:19:03 UTC

IIRC it has something to do with looking at 10 million of them, with only about 10 or 100 really worth studying further. MOST of them are decoys, i.e. not the solution you are looking for.


. . OK, thanks for that ...

Stephen

:)
anndelkyl

Joined: 8 Nov 19
Posts: 2
Credit: 29,074
RAC: 0
Message 93777 - Posted: 7 Apr 2020, 21:19:28 UTC - in response to Message 93449.  

I'm getting 18-20 hour runs, but with F@H it's only 4 hours.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 93782 - Posted: 7 Apr 2020, 21:30:51 UTC - in response to Message 93777.  

R@h runtimes are based on a Rosetta-specific preference, set via the website, for each venue you have set up. Tasks that run more than 4 hours longer than the configured runtime preference are ended by the watchdog.
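The watchdog rule as described can be sketched as a single comparison. This assumes the check uses CPU time and a fixed 4 hour grace period; how often the real watchdog checks, and exactly which clock it uses, isn't stated in this thread, and `watchdog_would_end` is a hypothetical name.

```python
WATCHDOG_GRACE_HOURS = 4.0  # grace period described in the post above


def watchdog_would_end(cpu_hours: float, target_hours: float,
                       grace_hours: float = WATCHDOG_GRACE_HOURS) -> bool:
    """True if a Task has run more than grace_hours past its runtime preference."""
    return cpu_hours > target_hours + grace_hours


print(watchdog_would_end(11.0, 8.0))  # False: still inside the 12 h window
print(watchdog_would_end(12.5, 8.0))  # True: past target + 4 h
```

So a Task running 18 to 20 hours would need a runtime preference of at least 14 to 16 hours to avoid being ended under this rule.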
Rosetta Moderator: Mod.Sense
Stephen "Heretic"

Send message
Joined: 2 Apr 20
Posts: 21
Credit: 11,028
RAC: 0
Message 93796 - Posted: 8 Apr 2020, 0:51:45 UTC - in response to Message 93777.  
Last modified: 8 Apr 2020, 0:57:35 UTC

I'm getting 18-20 hour runs, but with F@H it's only 4 hours.


. . Different projects run different apps. It is not unusual for their task runtimes to differ greatly from one another. I have not tried Folding at Home, but I suspect their apps/tasks require less time to do their thing. In R@H there is a preference setting for target run time and minimum run time; with v4.12 the default target is 8 hours, while Rosetta Mini seems to have a different default. If you want to spend less time on each task, you can reduce those settings. R@H creates multiple models, called Decoys, for each task, and the run time you set will affect how many models are created when you run the task. I hope that helps.

Stephen

. .
Stevie G

Joined: 15 Dec 18
Posts: 107
Credit: 838,868
RAC: 799
Message 93827 - Posted: 8 Apr 2020, 7:23:05 UTC

It normally takes my computer between 4 and 6 hours to process a Rosetta task. Finished up with SETI@H and it's now just running R@H and Asteroids.

Rosetta just sent me 22 tasks with a due date of April 11, three days from now. Their run times are more than 5 hours each.

There's no way this computer can complete 22 tasks in 3 days. There are also numerous "errors while computing" reported.

Hadn't gotten much Rosetta work for quite a while, then they all came at once.

That's just not reasonable. I've had this problem before.

Why do they do that??

I set my preferences to store only 1 day of work.

Steven Gaber
Oldsmar, FL
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1686
Credit: 17,999,343
RAC: 23,616
Message 93833 - Posted: 8 Apr 2020, 8:13:19 UTC - in response to Message 93827.  
Last modified: 8 Apr 2020, 8:15:19 UTC

It normally takes my computer between 4 and 6 hours to process a Rosetta task. Finished up with SETI@H and it's now just running R@H and Asteroids.

Rosetta just sent me 22 tasks with a due date of April 11, three days from now. Their run times are more than 5 hours each.
The Default run time for a Rosetta task is 8 hours. You can select other target times in your preferences if you wish.


There's no way this computer can complete 22 tasks in 3 days. There are also numerous "errors while computing" reported.
There has been a batch of bad Tasks released that error out as soon as they start to run.


Hadn't gotten much Rosetta work for quite a while, then they all came at once.
There hasn't been any Rosetta work for quite a while, now there is work & the BOINC Manager is trying to honour your Resource share settings with the other projects.


That's just not reasonable. I've had this problem before.
And what did you do, abort work? If so, that just exacerbates the problem.
Just let things be, then the Manager & the Rosetta servers will figure out what your system can and can't do, and the Estimated completion times will settle down to their correct values and then the issue won't occur in the future.
Unless of course you join another project, change your cache size or change the Target CPU runtime or Resource share settings. Then it will need to figure out things all over again.
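To see why low Estimated times lead to a flood of work, here is a simplified sketch of how many Tasks it takes to fill a work buffer. It assumes the Manager fills the buffer purely from each Task's Estimated time and ignores Resource share and the other factors the real BOINC work-fetch logic uses; `tasks_to_fill` and the 2-core host are hypothetical.

```python
import math


def tasks_to_fill(buffer_days: float, cores: int, est_task_hours: float) -> int:
    """Tasks needed to fill a work buffer, given the *estimated* Task length."""
    buffer_core_hours = buffer_days * 24 * cores
    return math.ceil(buffer_core_hours / est_task_hours)


# Hypothetical 2-core host with a 1.5 day buffer.
print(tasks_to_fill(1.5, 2, 8.0))  # Estimates match an 8 h target -> 9
print(tasks_to_fill(1.5, 2, 3.0))  # Estimates far too low -> 24
```

If those 24 Tasks then each actually run to an 8 hour target, that is 192 core-hours, about 4 days on 2 cores, for a buffer that was meant to hold 1.5 days. Once the Estimates settle near the Target CPU runtime, the two numbers converge.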


I set my preferences to store only 1 day of work.
What value did you set for the additional days setting?
Grant
Darwin NT
Profile Grant (SSSF)

Send message
Joined: 28 Mar 20
Posts: 1686
Credit: 17,999,343
RAC: 23,616
Message 93932 - Posted: 8 Apr 2020, 23:46:27 UTC
Last modified: 8 Apr 2020, 23:53:35 UTC

Has there been another change in the default Target CPU runtimes?
My settings are to use the default, and the latest batch of Mini work has Estimated remaining times of 10.5 and 11.5 hours.


Edit-
Hmm.
Had a closer look. The system that has already processed quite a few Mini Tasks (the last batch of which were the instant-error Tasks) is the one with the long Estimated completion times.
The other system which has only done a couple of Mini Tasks, one of which was of the series that errored out (but this one didn't) is showing 6hr Estimated times.
I'm guessing all those instant errors scrambled things up somewhat.
Grant
Darwin NT
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 93934 - Posted: 9 Apr 2020, 0:59:59 UTC - in response to Message 93932.  
Last modified: 9 Apr 2020, 16:09:10 UTC

I'm guessing all those instant errors scrambled things up somewhat.


That puts it pretty well, I'd say. I don't take much stock in estimated runtime, until it is more than half done with the WU. And even then, I like to see 5 minutes of CPU time added, and 5 minutes of estimate pulled off. Sometimes it is still adjusting the estimate, and you don't get a 1-for-1 movement.
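That rule of thumb can be written down as a check on two snapshots of the same Task, taken a few minutes of CPU time apart. This is a sketch under stated assumptions: the (CPU minutes, remaining minutes) tuples, the 1 minute tolerance, and the name `estimate_is_settled` are all hypothetical, not part of BOINC.

```python
def estimate_is_settled(prev: tuple[float, float], curr: tuple[float, float],
                        tolerance_min: float = 1.0) -> bool:
    """prev and curr are (cpu_minutes, remaining_minutes) snapshots of one Task.

    Returns True when the Task is more than half done and the remaining-time
    estimate dropped by roughly as much CPU time as was added (1-for-1).
    """
    cpu_prev, rem_prev = prev
    cpu_now, rem_now = curr
    half_done = cpu_now > rem_now  # elapsed CPU time exceeds what is left
    cpu_added = cpu_now - cpu_prev
    estimate_removed = rem_prev - rem_now
    return half_done and abs(cpu_added - estimate_removed) <= tolerance_min


print(estimate_is_settled((300.0, 180.0), (305.0, 175.0)))  # True
print(estimate_is_settled((300.0, 180.0), (305.0, 179.0)))  # False: still adjusting
```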
Rosetta Moderator: Mod.Sense
Profile Grant (SSSF)

Send message
Joined: 28 Mar 20
Posts: 1686
Credit: 17,999,343
RAC: 23,616
Message 93936 - Posted: 9 Apr 2020, 1:37:11 UTC - in response to Message 93934.  
Last modified: 9 Apr 2020, 2:23:02 UTC

I'm guessing all those instant errors scrambled things up somewhat.
That puts it pretty well, I'd say. I don't take much stock in estimated runtime, until it is more than half done with the WU. And even then, I like to see 5 minutes of CPU time added, and 5 minutes of estimate pulled off. Sometimes it is still adjusting the estimate, and you don't get a 1-for-1 movement.
It would be good if the defaults were higher rather than lower than the eventual time; that would help stop people from getting more work than they can handle, particularly when new Applications come out or people add a new system to the project.
On work that's just downloaded, Rosetta v4.12 has got the Estimates pretty close (within 15min or so) on one system (the one that has the high Mini Estimates); on the other system that I recently added they're still a good 30min low. So over time the Estimates do settle very close to the actual Target CPU runtime, which means people end up not getting more than their cache settings ask for and don't end up missing deadlines.

It would be good if, as Tasks are allocated to a system, the Estimated time were set to their Target CPU Runtime. But given Estimated completion times are based on the FLOPs of a Task and the CPU's benchmarks, and the Credit granted is based on both of them (amongst other things), it all gets rather ugly, rather quickly.
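In much-simplified form, a benchmark-based estimate divides a Task's estimated operation count by the host's measured speed. This sketch assumes only that relationship; the real BOINC client applies further corrections, and the numbers and the `estimated_hours` name are hypothetical. Because a Rosetta Task stops near its Target CPU runtime regardless, any gap between this figure and the target is estimation error.

```python
def estimated_hours(fpops_est: float, host_flops: float) -> float:
    """Estimated run time = estimated operations / operations per second, in hours."""
    return fpops_est / host_flops / 3600


target_hours = 8.0
# Hypothetical numbers: a Task tagged with 3.6e13 operations on a 1 GFLOPS-rated core.
estimate = estimated_hours(3.6e13, 1e9)
print(estimate)                 # 10.0 hours
print(estimate - target_hours)  # 2.0 hours longer than the actual 8 h run
```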
Grant
Darwin NT
Stevie G

Send message
Joined: 15 Dec 18
Posts: 107
Credit: 838,868
RAC: 799
Message 93987 - Posted: 9 Apr 2020, 18:09:28 UTC - in response to Message 93985.  

The last time this happened, I just let the time expire on lots of tasks.

I set the storage time for 1 day of work, with the additional time of 0.5 days.

Those errors did occur immediately after starting.

SETI@H just completed its project, so the workload has changed. I may sign onto another project.

So I'll follow your advice and just let it sort itself out.

Thanks for your detailed response.

Cheers,
Stevie G.
Oldsmar, FL

Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1686
Credit: 17,999,343
RAC: 23,616
Message 94009 - Posted: 9 Apr 2020, 22:14:56 UTC - in response to Message 93987.  
Last modified: 9 Apr 2020, 22:16:13 UTC

I set the storage time for 1 day of work, with the additional time of 0.5 days.
Ok, so what that means is you'll get around 1 day's work, then when that cache runs down to the point of the Manager requesting more, you'll end up with at least 1.5 days' worth (once the Estimated times match your Target CPU runtime; all bets are off till then).
I generally go with 0.02 for my Additional days setting (approx 29min), so I don't drop much below a day's worth before getting more, and I don't end up with much more than a day's worth when I do get new work.
And when a new application is released, or there's a bunch of dodgy work & things go haywire again, I won't get swamped with work I can't handle.

When running more than one project, a cache setting of 0.5 (or less) & 0.02 will keep your systems busy, make them less likely to get swamped with work when things stuff up, and mean it takes much less time for the Manager to honour your Resource share settings and balance things out again once things settle down.
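The two cache settings can be sketched as a refill window. This assumes the Manager asks for more work once the queue drops below the first setting and then fetches up to the sum of both, measured per core, which matches the description above but is a simplification of BOINC's work-fetch logic; `work_fetch_window` is a hypothetical name.

```python
def work_fetch_window(min_days: float, extra_days: float) -> tuple[float, float]:
    """Return (refill_below_hours, fill_up_to_hours) for one core's queue.

    Simplified model: when queued work drops below min_days, the client
    requests enough to bring the queue up to min_days + extra_days.
    """
    refill_below = min_days * 24
    fill_up_to = (min_days + extra_days) * 24
    return refill_below, fill_up_to


for min_days, extra_days in [(1.0, 0.5), (1.0, 0.02), (0.5, 0.02)]:
    low, high = work_fetch_window(min_days, extra_days)
    swing_min = (high - low) * 60  # size of each top-up above the refill point
    print(f"{min_days} + {extra_days} days: refill below {low:.1f} h, "
          f"fill to {high:.1f} h, swing {swing_min:.0f} min")
```

With 1 + 0.5 the queue swings by 12 hours between refills; with 1 + 0.02 it swings by under half an hour, which is why a small Additional days value keeps the queue close to a steady day's worth.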
Grant
Darwin NT
Tom M

Joined: 20 Jun 17
Posts: 87
Credit: 15,354,449
RAC: 37,654
Message 94237 - Posted: 12 Apr 2020, 13:13:49 UTC - in response to Message 67551.  


Let me take my best shot at these, I cannot confirm the status of the original proposal.


+1

Thank you Moderator for a clear exposition on the "nuts and bolts".

Tom
Help, my tagline is missing..... Help, my tagline is......... Help, m........ Hel.....
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1686
Credit: 17,999,343
RAC: 23,616
Message 94581 - Posted: 16 Apr 2020, 0:25:34 UTC
Last modified: 16 Apr 2020, 0:26:39 UTC

Looks like we've got a few Tasks that run much longer than the Target time coming through.
rb_04_13_21398_21021_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_05_08_913288_47_0

DONE ::     1 starting structures  43759.8 cpu seconds
This process generated      1 decoys from       1 attempts
It takes 12hr 15min.

For systems with Target CPU Run times of less than 8 hours, will these end without a Valid result (Error or Invalid) because there isn't enough time to produce a Decoy?
Grant
Darwin NT



©2024 University of Washington
https://www.bakerlab.org