Message boards : Number crunching : Tells us your thoughts on granting credit for large protein, long-running tasks
Author | Message |
---|---|
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
R@h adapts to changing requirements. With these new large protein models coming soon, tagged with a 4GB memory bound, and with models that may take several hours to run, enough that the watchdog has been extended from its normal 4 hours to 10 hours, it seems credit may need some changes as well. Normally, credit is granted based on the cumulative reported CPU time per model, so a fast machine with lots of memory computes more models and gets more credit than an older system. But these 4GB WUs will not even be sent to machines that do not have at least 4GB of memory (and normally BOINC would only be allowed to use less than 100% of that, so I should say: where BOINC is allowed to use at least 4GB). So there will be no struggling Pentium 4s reporting any results to reflect the difficulty in the cumulative average. Now, a 4GB-tagged WU will generally not consume that much memory to run. It is an upper limit: if the BOINC Manager sees requests for memory that exceed that 4GB, the task is actually aborted. So, it seems reasonable that these 4GB work units should come with a premium on credits granted. But how much of a premium is reasonable? There are many ways to look at it, so we thought we'd open it up for discussion. Please keep things respectful. Probably best to just state your own perspective on the topic and not address other posts directly, and certainly no need for rebuttals here. We're trying to brainstorm. Rosetta Moderator: Mod.Sense |
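The two memory rules in the post above (a host only receives a 4GB-tagged WU if BOINC is allowed to use at least 4GB, and a task that asks for more than its bound is aborted) can be sketched roughly as follows. This is only an illustration: the function names and the `allowed_fraction` parameter are assumptions made here, not part of BOINC or Rosetta@home, and the real scheduler and client logic is not shown in this thread.

```python
# Hypothetical sketch of the memory-bound rules described in the post above.
# None of these names come from BOINC or Rosetta@home.

MEMORY_BOUND_GB = 4.0  # the bound the new large-protein WUs are tagged with


def boinc_usable_gb(host_ram_gb: float, allowed_fraction: float) -> float:
    """Memory BOINC may use: installed RAM times the share the user allows."""
    return host_ram_gb * allowed_fraction


def can_receive(host_ram_gb: float, allowed_fraction: float,
                bound_gb: float = MEMORY_BOUND_GB) -> bool:
    """A host qualifies only if BOINC's usable memory covers the bound."""
    return boinc_usable_gb(host_ram_gb, allowed_fraction) >= bound_gb


def should_abort(requested_gb: float, bound_gb: float = MEMORY_BOUND_GB) -> bool:
    """The bound is an upper limit: exceeding it aborts the task."""
    return requested_gb > bound_gb


print(can_receive(8, 0.75))   # 6GB usable, so the host qualifies
print(can_receive(4, 0.90))   # only 3.6GB usable, so it does not
print(should_abort(4.5))      # a request above the bound is aborted
```

Under these assumptions, a machine with exactly 4GB installed would not qualify unless BOINC were allowed to use all of it.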
Bryn Mawr Joined: 26 Dec 18 Posts: 389 Credit: 12,031,635 RAC: 13,511 |
A fair day’s credit for a fair day’s computation would be my take. If a normal WU takes 80,000 GFLOPs and gets 300 credits and these new resource-hungry WUs take 120,000 GFLOPs, then grant 450 credits. I know that the GFLOPs will vary between machines, but there’s surely some way of estimating it from the complexity of the task and the number of decoys created. |
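The proportional rule in this post is simple enough to write down directly. A minimal sketch, using the poster's own reference figures (80,000 GFLOPs for 300 credits); the function name is made up for illustration, and how the GFLOPs of a task would actually be estimated is left open, as in the post.

```python
# Credit scaled linearly with estimated work, using the post's reference
# figures. `proportional_credit` is a hypothetical name.

def proportional_credit(task_gflops: float,
                        ref_gflops: float = 80_000,
                        ref_credit: float = 300) -> float:
    """Grant credit in the same ratio as the task's work to the reference."""
    return ref_credit * task_gflops / ref_gflops


print(proportional_credit(120_000))  # 450.0, matching the post's example
```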
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
It will depend on the checkpointing. If these long models need a continuous span of 5 hours to complete, then a fair amount of work will be lost on machines that are not running 24/7. It will also depend on the % of time where more than 1 GB of memory is required to run it. I don't watch credits much. I'll just add some WCG to the mix (which are usually low memory WUs), and let the BOINC Manager worry about what to dispatch when. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
likeapresident Joined: 14 Mar 20 Posts: 7 Credit: 1,096,138 RAC: 0 |
How about increasing the credit in proportion to the memory requirement? If normal work units need only 1GB of memory and are issued 10 credits, then these large models, which need 4GB of memory, should be issued 40 credits. |
teacup_DPC Joined: 3 Apr 20 Posts: 6 Credit: 2,744,282 RAC: 0 |
"So, it seems reasonable that these 4GB work units should come with a premium on credits granted. But how much of a premium is reasonable?" I am still quite new to actively using distributed computing, so I hope my thoughts are of relevance. The footprint of a WU can be seen as the product of two resources: processor capacity and memory space. I am not fully aware of what disk space is needed, so I leave that aside. When it needs (worst case) 4GB of memory, the 4GB task prevents the client from running four 1GB tasks at the same time. The single 4GB task will probably use less processor capacity (1 core?) than the parallel processor capacity needed for the four separate 1GB tasks (4 cores?). So per time unit the credits should be positioned somewhere between one 1GB task and four 1GB tasks: more than one 1GB task, because the memory use is four times as much, and less than four 1GB tasks, because the single task can be completed with one core. Where exactly to position the credits between 1x1GB and 4x1GB depends on the availability of processor cores and memory in the clients capable of running the 4GB jobs; you can judge that better than I. My long shot is that the credits will end up somewhere between 2.5 and 3 1GB jobs per time unit. When running a 4GB WU with one core, more data needs to be dealt with, so it is probable the task will need more time. This can be covered by the time dependency in the credits. Maybe add an extra bonus for the somewhat higher risk of failing because of the longer throughput time, as others have suggested as well. I expect you do not want to end up with a bias toward 1GB or 4GB jobs, while both are needed. For the clients that can handle the 4GB jobs the bias should be neutral. Unless you expect a tendency towards more 4GB jobs with respect to 1 GB jobs, or the other way around, then you want a bias. |
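One way to make the "somewhere between 1x and 4x" argument concrete is to count how many 1GB tasks a single 4GB task displaces on a given host. The sketch below is only a toy model built on assumptions that are not in the post: every task uses exactly one core, small tasks use exactly 1GB, the large task uses its full 4GB, and all small tasks earn credit at the same rate. The function names are hypothetical.

```python
# Toy model: how much credit must one 4GB task earn so that a host's
# credit per hour is unchanged compared with running only 1GB tasks?

def concurrent_small_tasks(cores: int, usable_gb: int, small_gb: int = 1) -> int:
    """Number of small tasks that fit, limited by cores and by memory."""
    return min(cores, usable_gb // small_gb)


def neutral_multiplier(cores: int, usable_gb: int,
                       large_gb: int = 4, small_gb: int = 1) -> int:
    """Credit multiplier for the large task that leaves the host's total
    credit rate unchanged (1 = no premium, 4 = worth four small tasks)."""
    if cores < 1 or usable_gb < large_gb:
        raise ValueError("host cannot run the large task at all")
    before = concurrent_small_tasks(cores, usable_gb, small_gb)
    # The large task takes one core and large_gb of memory; small tasks
    # fill whatever is left over.
    small_alongside = concurrent_small_tasks(cores - 1, usable_gb - large_gb, small_gb)
    displaced = before - (small_alongside + 1)
    return 1 + displaced


for cores, gb in [(4, 4), (4, 5), (4, 6), (4, 16)]:
    print(cores, "cores,", gb, "GB usable ->", neutral_multiplier(cores, gb))
```

In this model a memory-starved host (4 cores, 4GB usable) needs the full 4x, a host with plenty of memory per core (4 cores, 16GB) needs no premium, and hosts in between land at 2x or 3x, which is the range the post guesses at.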
Sid Celery Joined: 11 Feb 08 Posts: 2112 Credit: 41,036,141 RAC: 20,543 |
Pick a number. If you get more than 3 complaints about it, increase it further. If you get 3 or fewer, keep it as it is. Only half-joking... |
CIA Joined: 3 May 07 Posts: 100 Credit: 21,059,812 RAC: 0 |
Where do I spend my credits again? I seem to be accumulating a lot of them but I'm not sure where the credit store is. I'm anxious to start redeeming them for valuable prizes and rewards. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1667 Credit: 17,438,418 RAC: 24,367 |
"I expect you do not want to end up with a bias toward 1GB or 4GB jobs, while both are needed. For the clients that can handle the 4GB jobs the bias should be neutral. Unless you expect a tendency towards more 4GB jobs with respect to 1 GB jobs, or the other way around, then you want a bias." That's the thinking. Overall, the effect should be neutral. People shouldn't lose out for processing these larger-RAM-requirement Tasks, and they shouldn't get a boost either. All the work is important, so if a Task stops 2 or more others from being processed at that time, its credit needs to offset that loss in production. Credits can't buy you a toaster, but they can let you see how you are doing, and how much you have done to help Rosetta. Grant Darwin NT |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,093,280 RAC: 5,341 |
"So, it seems reasonable that these 4GB work units should come with a premium on credits granted. But how much of a premium is reasonable?" I think the new longer tasks should get more than the credit awarded for running four 1GB tasks, the idea being that they can't be run by everyone. By definition that excludes older and slower PCs, whose owners should be encouraged to replace or update them, as over time they simply won't be able to keep up. How much more depends on the priority the Project places on these new workunits. If they are just new workunits, then only a minimal amount of credit above what four 1GB tasks would get; on the other hand, if the new tasks are a higher-than-normal priority, then a higher credit should be given to encourage people to crunch them instead. |
bkil Joined: 11 Jan 20 Posts: 97 Credit: 4,433,288 RAC: 0 |
The answer is +25% credit compensation. I've simply plugged the data (4x memory requirement) into my equations for the 3700X and 3950X in the "The most efficient cruncher rig possible" thread, which amortizes the cost of power and capital expenditure against produced RAC, and solved for the RAC correction needed to arrive at the same RAC/$/5years. I assumed 0.3W/GB of power consumption for DDR4 RAM. You could, however, rephrase the same question another way: based on supply and demand, if a great majority of volunteers have an insufficient amount of memory, how do I incentivize them to purchase more? If you put it that way, adding +50% may not be that far-fetched. |
Jonathan Joined: 4 Oct 17 Posts: 43 Credit: 1,337,472 RAC: 0 |
Fixed link for sangaku https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13791 |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,255,046 RAC: 3,833 |
Any such extra large workunit should come with a way to limit the number of that type of workunits in memory at any one time. My computers can handle more than one at a time, but not one for every virtual core. I'll leave the discussion of extra credits to others. |
WBT112 Joined: 11 Dec 05 Posts: 11 Credit: 1,382,693 RAC: 0 |
While I would still prefer limiting these "monster" workunits to 1 or 2 per host (WCG does that for 1 GB+), for extra credits we need to look at the hardware of the average active host. Let's assume the average host with MORE THAN 4 GB RAM has 4 threads and 8 GB of RAM (which is a bit low, but I don't know the numbers). The default (?) of 75% RAM usage while active means 6 GB of RAM usage is normally allowed. So usually it can process 4 "normal" 1 GB workunits, using 100% of its threads. When you now change the preference to 4 GB, this would change the capacity to only one workunit, so a 75% bonus would be needed to offset this. However, considering that the machine will likely not process only these monsters, or might have other projects running, at least +25% seems like an appropriate amount, imho. |
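The capacity arithmetic in this post can be reproduced in a few lines. This sketch takes the poster's assumed host (4 threads, 8 GB of RAM, 75% usable) at face value and assumes every task needs its full stated memory; `task_slots` is a hypothetical helper, not a BOINC function.

```python
import math

# Number of tasks a host can run at once, limited by threads and by the
# memory BOINC is allowed to use.

def task_slots(threads: int, ram_gb: float, ram_fraction: float, task_gb: float) -> int:
    usable_gb = ram_gb * ram_fraction
    return min(threads, math.floor(usable_gb / task_gb))


normal_slots = task_slots(4, 8, 0.75, 1)  # 1 GB workunits
large_slots = task_slots(4, 8, 0.75, 4)   # 4 GB workunits only
lost_share = 1 - large_slots / normal_slots

print(normal_slots, large_slots, lost_share)
```

With these inputs the host drops from 4 slots to 1, so the 75% figure in the post corresponds to the share of task slots lost when only 4 GB workunits are running.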
strongboes Joined: 3 Mar 20 Posts: 27 Credit: 5,394,270 RAC: 0 |
The credit system is broken as far as I'm concerned anyway: 2 virtually identically named units can finish within a minute of each other and have wildly different credits. It makes absolutely no sense at all. Imo having a time-based system is one issue. Instead there should be no time; the units should simply be set up so that they complete x decoys, and the researchers can set each run of work so that x decoys take a desired time. Example: rb1, 1 decoy takes approx. 1 hour on reference hardware, credits awarded per decoy 100, unit size 4 decoys, approx. runtime 4 hours. Best hardware takes 3 hours, worst 5 hours; faster hardware clearly gets more reward per hour. rb2, 1 decoy takes approx. 15 mins on reference hardware, credits awarded per decoy 25, unit size 12 decoys. Etc. Obviously size can be set to what researchers want. Perhaps instead of a time preference, crunchers could simply have a WU size preference (small, medium, or large), which would be a preference only and wouldn't preclude you getting WUs of any size. Perhaps a further option, as in WCG, where you can set a number of these proposed larger WUs to run on your machine; the default could be set to 1 per 16GB of machine memory to allow for other tasks to run. I imagine credit compensation should be on the basis that if these larger units require you to suspend a core to allow them to run, then double the credit should be awarded. Enough for this post, but is there no way to allow a workunit to use more than 1 core? This would reduce memory issues considerably and allow larger work to be run. |
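The per-decoy scheme proposed in this post can be sketched with the poster's own rb1 and rb2 numbers. The `DecoyBatch` class and its field names are invented here for illustration; this is not how Rosetta@home currently grants credit.

```python
from dataclasses import dataclass

# Sketch of the proposal: credit is fixed per decoy, calibrated on
# reference hardware, and a unit always contains a fixed number of decoys.


@dataclass
class DecoyBatch:
    name: str
    credit_per_decoy: float
    decoys_per_unit: int
    ref_minutes_per_decoy: float  # measured on the reference machine

    def unit_credit(self) -> float:
        """Credit for a completed unit, independent of who ran it."""
        return self.credit_per_decoy * self.decoys_per_unit

    def ref_runtime_hours(self) -> float:
        """Expected runtime of one unit on the reference machine."""
        return self.decoys_per_unit * self.ref_minutes_per_decoy / 60

    def credit_per_hour(self, actual_hours: float) -> float:
        """Faster hosts finish sooner and so earn more per hour."""
        return self.unit_credit() / actual_hours


rb1 = DecoyBatch("rb1", credit_per_decoy=100, decoys_per_unit=4, ref_minutes_per_decoy=60)
rb2 = DecoyBatch("rb2", credit_per_decoy=25, decoys_per_unit=12, ref_minutes_per_decoy=15)

print(rb1.unit_credit(), rb1.ref_runtime_hours())          # 400 credits, 4 hours
print(rb1.credit_per_hour(3), rb1.credit_per_hour(5))      # best vs worst hardware
print(rb2.unit_credit(), rb2.ref_runtime_hours())          # 300 credits, 3 hours
```

Both example batches pay 100 credits per reference hour, so under this scheme the credit rate depends only on how a host compares with the reference machine.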
bkil Joined: 11 Jan 20 Posts: 97 Credit: 4,433,288 RAC: 0 |
In my opinion, the credit system is not broken at all, it is working well as intended. Please read the posts that explain how credits are awarded in Rosetta@home. * https://boinc.bakerlab.org/rosetta/forum_thread.php?id=669&postid=10377 * https://boinc.bakerlab.org/rosetta/forum_thread.php?id=2194&postid=24612#24612 Basically, from what I understand, what really bothers you right now is WU-to-WU variability. Note that this will all average out in the long term, even after a few days, but definitely after 2 weeks (RAC). Thus your aggregate credit count and RAC are still as good as anything for the purpose of keeping up the friendly competition with your peers, getting feedback about your contribution, fine-tuning the performance of your hardware, checking whether your boxes are producing as expected for the given kind of hardware, etc. Hence what you are asking for (precise WU flops estimates) would require lots of development and maintenance time to be devoted to something that isn't that important at all, and would take time away from research. |
strongboes Joined: 3 Mar 20 Posts: 27 Credit: 5,394,270 RAC: 0 |
"In my opinion, the credit system is not broken at all, it is working well as intended. Please read the posts that explain how credits are awarded in Rosetta@home." I'm aware how credits are awarded; there is huge variation. I doubt that what I've suggested is much of a change: instead of a WU being called to finish after a time period, it would finish after a decoy count set by the researcher. Before a batch is sent out, it is run on known hardware to determine the credit awarded per decoy. It's very simple. |
Luigi R. Joined: 7 Feb 14 Posts: 39 Credit: 2,045,527 RAC: 0 |
"R@h adapts to changing requirements. With these new large protein models coming soon, tagged with a 4GB memory bound, and with models that may take several hours to run, enough that the watchdog has been extended from its normal 4 hours to 10 hours, it seems credit may need some changes as well." `if (maxMemoryUsed > 1) { grantedCredits = normalCredits * maxMemoryUsed; } else { grantedCredits = normalCredits; }` E.g. a host gets 40cr/h. If it uses up to 3.5GB of memory, you pay it 40*3.5=140cr/h. |
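For anyone who wants to try the rule from this post, here is a direct transcription into runnable Python. It assumes `max_memory_used_gb` is the task's peak memory in GB, as the post's example implies; the function name is chosen here and is not part of any BOINC API.

```python
# Credit scaled by peak memory in GB, but never below the normal credit.

def granted_credits(normal_credits: float, max_memory_used_gb: float) -> float:
    if max_memory_used_gb > 1:
        return normal_credits * max_memory_used_gb
    return normal_credits


print(granted_credits(40, 3.5))  # 140.0 cr/h, the post's example
print(granted_credits(40, 0.6))  # 40, small tasks are unaffected
```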
torma99 Joined: 16 Feb 20 Posts: 14 Credit: 288,937 RAC: 0 |
For the masses, I think there should be extra credit for bigger workunits. For me it doesn't matter. What I would like more is to have some descriptions of the workunits. Not novels, but maybe 50-80 words (so I can google them more if I am interested in the topic), maybe the lab which will use the data, or some info about the researchers. For me that would matter more, because I am just an average Joe with 16 threads, and moreover because I do not believe in the credit system. If I were a system administrator at a huge server farm and could convince my bosses to let me use some percent of the idle time, I would be the king cruncher, but I think the scientific results matter more. I joined in mid-February. Since then there has been an update to Rosetta, I could send back some Ralph WUs, there was the thread about the fluorescent proteins, and now there is a discussion about this more complex folding, so as a commoner I assume the project is heading in the right direction; for me that counts. And that is what convinced me that until the end of the year I will try my best to run my computer 24/7. After that I will adjust to my financial situation (maybe I will have to turn off some threads to cut a couple of euros from the electricity bill). No credits can change that. I could have quadrillion-zillion credits, but that is just a number written in a database; the real value is in the act of voluntarily donating computing time to the scientists to help them solve problems which could otherwise take more time. (Sorry for my broken English, learned the language alone ;-) ) |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1990 Credit: 9,487,316 RAC: 12,258 |
I'm very happy about this new science page, but I cannot clearly understand how these WUs will run. Are there multiple decoys in a single WU, or is it just a single big decoy? And if it is the second option and I restart the PC after 7 hours, will the WU restart from 0? If so, we lose all the "little volunteers" who don't have a 24/7 system. |
Admin Project administrator Joined: 1 Jul 05 Posts: 4805 Credit: 0 RAC: 0 |
I want to make this clear since there is some misinformation going around. These long sequences, up to 2000 residues, that are sometimes, but not often, submitted to Robetta, our protein structure prediction server, have been around for a while now. They are nothing new, may or may not be related to COVID-19, and are rare. We have just adjusted the logic to make sure these jobs are assigned enough memory and time to complete. The vast majority of jobs have a smaller memory footprint of less than 2GB and produce models on timescales of minutes, not hours. Run times and memory usage may vary, but a typical 2000-residue Rosetta comparative modeling job from Robetta (these jobs are rare) took a little over an hour to produce 1 model and used 1.8GB of RAM on our local cluster. These jobs should not be confused with the problematic cyclic peptide jobs (which have been canceled) that users reported sometimes taking longer than the CPU run time preference to complete. This was also a rare event, likely due to random trajectories that were not passing model quality filtering criteria. These cyclic peptide jobs have a small memory footprint and can produce models at a faster pace. These issues highlight the fact that Rosetta@home runs a variety of protocols for modeling and design, the results of which can be seen in the variety of research publications and projects related to diseases such as COVID-19 and cancer, vaccine development, nano-materials, cellular biology, structural biology, environmental sciences, and the list goes on. These are within the Baker lab and IPD, but there are also researchers around the world using the Robetta structure prediction server for a vast variety of research. |
©2024 University of Washington
https://www.bakerlab.org