Message boards : Number crunching : For the betterment of BOINC
Author | Message |
---|---|
Simplex0 Joined: 13 Jun 18 Posts: 14 Credit: 1,714,717 RAC: 0 |
"Since many of you seem interested in, and mystified by, the credit system at R@h, I'll try to clarify a few points. Some other projects simply use CPU seconds, or FLOPS of a task, to issue credit. The problem with such a system is that it doesn't reward machines that achieve more work per second due to CPU cache and memory available. It also depends upon the BOINC Manager to track and report the FLOPS expended on a task, and so some users began to falsify the FLOPS benchmarks and thus the FLOPS of their reported results."
If that's the case, then you should not run this project using a 16-core, 32-thread Threadripper, as it is a waste of money and energy: it is being outperformed by a factor of 2 by a single 2-core Intel(R) Celeron(R) CPU G1620 @ 2.70GHz [Family 6 Model 58 Stepping 9]. As you can see, this host https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=3391974 is ranked 45th and is producing slightly more in Rosetta than a 22-core, 44-thread Intel(R) Xeon(R) CPU E5-2696 v4 @ 2.20GHz [Family 6 Model 79 Stepping 1]. Thank you for the clarification. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1990 Credit: 9,482,262 RAC: 11,825 |
"The goal is to get more people to run BOINC, to join in coding all parts that make BOINC (client, manager, web site, forums, projects, etc.), to test everything, to get them to set up their own projects, to make BOINC a future-proof and reliable brand that isn't dependent on any one person in particular."
A lot of projects' forums seem to be "abandoned" by admins. I'd suggest a system that automatically sends a monthly email to admins reporting activity on the forum (e.g. "this month there are 2 new threads and 25 posts"). |
Chilean Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0 |
"Since many of you seem interested in, and mystified by, the credit system at R@h, I'll try to clarify a few points. Some other projects simply use CPU seconds, or FLOPS of a task, to issue credit. The problem with such a system is that it doesn't reward machines that achieve more work per second due to CPU cache and memory available. It also depends upon the BOINC Manager to track and report the FLOPS expended on a task, and so some users began to falsify the FLOPS benchmarks and thus the FLOPS of their reported results."
There are definitely multiple computers merged into that host. There's no way a Celeron could output that much RAC. Maybe it's a network (school?) composed of multiple computers with the same configuration (same CPU, RAM, HDD, OS, etc.). It happens to me regularly when I spin up VMs with similar OS and hardware... BOINC thinks it's the same machine. Not sure why people merge multiple hosts into one... (?) 2k RAC is what you get with a decently fast i5. A fast i7 could maybe pull 5k, depending on the cores. That's the "rule of thumb" benchmark I have for Rosetta. My 24/48 core/thread EPYC from AMD pulls almost 20k when sharing resources with World Community Grid... so it should pull about 30-40k running Rosetta exclusively. Edit: I just realized my EPYC VM made it into the top 30 hosts lol. Edit2: If you check the WUs of the Celeron host you linked, you'll notice it's impossible for a 2-thread machine to report so many WUs a day consistently, given that each WU takes about 30k seconds. I counted about 80 WUs returned just on the 25th of September... you'd need about 13 "2-threaded" CPUs to get that many WUs crunched per day @ 8 hours per WU. |
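For anyone who wants to check the arithmetic in the post above, here is a minimal sketch. It assumes the figures quoted there (about 80 WUs returned in one day, roughly 8 hours per WU, 2 threads per machine) and that every thread crunches around the clock; the function name is just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def machines_needed(wus_per_day, seconds_per_wu, threads_per_machine):
    """Return (threads, machines) needed to sustain a daily WU throughput.

    Assumes each thread runs one WU at a time, 24 hours a day.
    """
    cpu_seconds = wus_per_day * seconds_per_wu   # total CPU time returned per day
    threads = cpu_seconds / SECONDS_PER_DAY      # threads that must be busy all day
    return threads, threads / threads_per_machine


# Figures from the post: ~80 WUs in a day, ~8 hours each, 2-thread CPUs.
threads, machines = machines_needed(80, 8 * 3600, 2)
print(f"{threads:.1f} threads, {machines:.1f} machines")
```

With those inputs the result lands at roughly 27 busy threads, which is where the "about 13 two-thread CPUs" estimate comes from.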
Aurum Joined: 12 Jul 17 Posts: 32 Credit: 38,158,977 RAC: 0 |
1. Projects that run both CPU & GPU WUs must separate their controls. Trying to cope with a one-size-fits-all approach causes no end of problems. Right now you go to Edit Project Preferences in My Account and check the boxes for the kinds of WUs you'll get. If you're running multiple computers, then some will not have enough memory to run the CPU WUs but can run the GPU WUs no problem. Or it may be better to give more client-side control than is presently available with cc_config & app_config. That's not in keeping with your request to make it easy for anybody to join and run BOINC without having to commit any time to learning the details. At present I usually have no choice but to only run GPU WUs for one project and get my CPU WUs from other projects.
2. Every project should have a table spelling out what kinds of WUs it provides, with their minimum requirements. I tried to come back to Rosetta yesterday and The Beast of BOINC ate all 16 GB of my RAM and gave 3 of my computers LockJaw. I'm here looking for requirements and ideas for an app_config to adapt to each computer I run Rosetta on, but it should be as easy as clicking Requirements. |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"At present I usually have no choice but to only run GPU WUs for one project and get my CPU WUs from other projects."
There is another option, though I recommend it with some hesitation: run two BOINC instances, one for the CPU and the other for the GPU. I do it as a matter of routine now, on both Windows and Ubuntu machines, and it does not add much time to the basic setup. But the first time through will be a bit sticky. Here is the general idea: https://www.overclock.net/forum/18056-boinc-guides-tutorials/1628924-guide-setting-up-multiple-boinc-instances.html There are a couple of tricks left out (as I recall) on how to start it automatically upon reboot in Windows; I use the Task Scheduler. Also, on Windows you have to put the GPU on the original BOINC instance and the CPU on the additional BOINC2 instance, or the GPU won't run. (Similarly with a VirtualBox project: it needs to be on the original BOINC instance.) On Linux, I don't think you have those limitations, at least for the GPU; I routinely put the GPU on the BOINC2 instance there. There are probably a few other fine points you will need to figure out, but I now find it quite useful. |
marmot Joined: 10 Nov 15 Posts: 17 Credit: 2,449,489 RAC: 544 |
End users need better control over workloads.
1) WU Black List: A black list of WUs that the end user refuses to accept, controlled at the client, not on the server. For example, I do not want to run any mini-Rosetta on my machines, and there is no way to stop them except manually deleting them on 34 clients. <max_concurrent>0</max_concurrent> is not properly interpreted by BOINC.exe. I had a similar situation at LHC@home where, even when you choose no ATLAS WUs on the server side, the project still sent down the ~6GB virtual machine data set. They actually sent down every WU data set and filled my SSD. So if we black-list a WU, none of its parts should be accepted by the machine.
2) Project core affinity control: The BOINC workload cache was designed in years when there were few cores per machine. Now we have 32-core desktops available. There needs to be a core management tab in the advanced BOINC client controls where we assign project affinities to each core or set of X cores (4 preferred). This solves the issue where BOINC doesn't adjust workloads according to the <max_concurrent> choices made for work units. Assigning the current work cache system to each group of 4 cores, and so having 8 working caches on a 32-core machine, would be a possible easy first incarnation. Setting up 8 BOINC client directories can work, but it's not friendly and increases long-term management time, whereas the affinity tab would have a time-consuming initial cost but would require less time in the long haul. Setting up a series of 4-core virtual machines on a 32-core host also works, but it's time-consuming, increases management time a great deal, and introduces RAM and core cycle inefficiencies as 8 OSes are running.
3) Priority WU List: We need methods for handling limited WU releases during a week, or very short daily punctuated WU releases. Accept the fact that some projects have limited work available and users might not want to wait days for the working cache to properly prioritize the projects. Give us a method to prioritize certain work units so that the client receives them when available, and so that these WUs can even preempt other work units to the point that those work units miss their deadlines. On the BOINC main forums, one of the devs admitted that BOINC works best when it can assume an infinite stream of work units and doesn't deal well with outages or short punctuated work releases. High-priority WUs need to be checked for every 1 to 5 minutes, regardless of whether the same project already has a full work cache or the cache is full of other projects' WUs.
4) Project Priority Ranking: A tool for dealing with aggressive project deadlines or work cache abuse. My management time could be reduced by hours a week if I were given the option of ranking projects in order of preemption. Assign a rank to projects, and the higher-ranked projects' work units can preempt other projects' work units in cases where the lower-priority WU has a shorter deadline. This gives users some method of control over the projects (every project thinks theirs is the most important) that set aggressive deadlines in order to manipulate users' work caches to favor their project. Projects are in competition for computation time, and aggressive deadlines are one method. Currently it takes setting "switch between tasks every 9999 minutes" and manual WU management.
5) Deal with credit inflation: Credit has been talked about the most in this forum thread. I'll just say that some projects are using credit-inflating techniques as another method of attracting computation base away from other projects. There needs to be a secondary credit normalization formula (or better governance from the BOINC central committee) to prevent an increasing inflationary cycle as each new project has to up the ante. I can tell this is happening because I worry more about the magnitude of my crypto coin, and my machines almost always end up on projects with low credit returns because the inflated-credit projects have lured away my competition. |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
6) Get rid of the BOINC scheduler entirely, and just assign each core to a project in a priority list. Then, if one project is out of work, it goes on to the next. You don't try to run 37.6% of one project, 22.1% of the next, ad infinitum. It solves most of the problems, especially with the longer work units such as Rosetta, which tend to get preempted unnecessarily but may remain in memory. Most importantly, it ends the problem of a core sitting there idle with nothing to do just because the BOINC scheduler thinks it doesn't deserve it. I have concluded that a preemptive strike against the BOINC scheduler is the only way to deal with it. |
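The proposal in the post above can be sketched in a few lines. This is only an illustration of the idea as described (each core walks its own priority list and takes the first project that has work); the function, the core numbering, and the project names are hypothetical, and nothing here reflects how the actual BOINC client schedules work.

```python
def assign_cores(core_priorities, has_work):
    """For each core, pick the first project in its priority list that has work.

    core_priorities: dict mapping core id -> list of project names, highest priority first
    has_work: dict mapping project name -> True if the project currently has tasks
    Returns a dict mapping core id -> chosen project (or None if nothing has work).
    """
    assignment = {}
    for core, priorities in core_priorities.items():
        # next() with a default of None leaves the core idle only when
        # every project on its list is out of work.
        assignment[core] = next(
            (project for project in priorities if has_work.get(project, False)),
            None,
        )
    return assignment


# Hypothetical quad-core setup: one core prefers project A, three prefer project B.
cores = {
    0: ["A", "B"],
    1: ["B", "A"],
    2: ["B", "A"],
    3: ["B", "A"],
}

print(assign_cores(cores, {"A": True, "B": True}))
print(assign_cores(cores, {"A": True, "B": False}))  # B is out of work, so A takes over
```

The second call shows the fallback behaviour the post asks for: when project B runs dry, its cores move down the list instead of sitting idle.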
marmot Joined: 10 Nov 15 Posts: 17 Credit: 2,449,489 RAC: 544 |
Yeah, the 2 most important changes that could ease management are: Project Assignments per Core, and/or "How many tasks at once?" with options from 0 to max core count. (Yes, zero needs to be an option.) Example problem I ran into today: Android Rosetta finishes fine when it's 1 task along with 3 WCG Zikas on my Lollipop 5.1 tablet. Even though I've reduced the z-cache to 32MB (elimination eludes me), 4x Rosetta fill the 800MB RAM, and it's constantly swapping, then just gives up doing any work once the screen goes into energy-saving mode. All I would want is a user dialog asking me "How many tasks running at once?" so I could choose 1 or 2. Instead I have to pause Rosetta, get WCG to d/l tasks, pause one of those 4 running, and unpause Rosetta, at which point I finally get 1 Rosetta + 3 Zika running. No common user should EVER need to touch an app_config.xml (and there are no config file edits on Android at all) just to get only 1 task running at a time... |
Breno Joined: 8 Apr 20 Posts: 30 Credit: 12,842,782 RAC: 13,605 |
Hi there. I bring a suggestion for motivation in COVID-19 times and beyond. It would be very interesting, if possible, to add a column in BOINC Manager called "Reach" or "Scope", which shows the particular disease or set of diseases that the folding task targets. For instance:
Project | Progress | Name ... | Reach
Rosetta@home | 40,43% | PFU_rah_2982383 ... | COVID-19
Rosetta@home | 76,19% | RAW_res_10293u2 ... | ALL
Rosetta@home | 01,29% | QWE_rty_123456 ... | COVID-19 + CANCER
Something along these lines. The main reason I suggest this is purely motivational. Sincerely, Breno. |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,091,870 RAC: 5,224 |
One thing I see asked most of all in the Forums is 'why can't I set project A to run 1 task at a time and project B to run 3 tasks at a time on my quad-core PC and have it stay that way'. They aren't talking Resource Share; they are talking about a per-CPU-core setting. Waaay too many people are saying 'I want more of project B running but all I see is project A running for days on end'. Especially in this time of pandemics and other crises, it would be nice to be able to crunch for both a pandemic project and my favorite project at the same time, full time, not swapping back and forth between them.
Another thing is the problem with cache size when using CPUs and GPUs at the same project: with the GPU being a lot faster, it needs more work, and that causes tons of CPU workunits to be downloaded as well. With ever-faster GPUs coming, it would be nice if users could set a CPU cache size and a separate GPU cache size. Sure, it's solved by running different projects for each, but most people start out with a single PC, so that's not possible for them, and they often give up because it's 'too hard'. Maybe have a switch that keeps them together by default and can then be changed to separate them.
The last thing, for now, is to list EVERY BOINC project in the list, even if it's limited, alpha, beta, etc. Maybe keep the current regular listing, but add a separate section below it with a line like 'these projects could have problems with limited connectivity and are still in the testing phase, use at your own risk'. TN-Grid, for example, is doing COVID-19 work, but not a lot of people know about it. YES, it is new and is having growing pains, but it's still there for those that want to crunch for it. And for some reason WUProp is still not listed, but it has been around long enough for people to get badges etc. |
yoerik Joined: 24 Mar 20 Posts: 128 Credit: 169,525 RAC: 0 |
"One thing I see asked most of all in the Forums is 'why can't I set project A to run 1 task at a time and project B to run 3 tasks at a time on my quad-core PC and have it stay that way'. They aren't talking Resource Share; they are talking about a per-CPU-core setting. Waaay too many people are saying 'I want more of project B running but all I see is project A running for days on end'. Especially in this time of pandemics and other crises, it would be nice to be able to crunch for both a pandemic project and my favorite project at the same time, full time, not swapping back and forth between them."
This! I have to manually suspend and resume tasks to literally force the client to do something like this... and because it doesn't fetch work when a task is suspended, my queue is limited for that project. My computer can handle a larger queue, but it runs out every few days because of that additional requirement: necessary, but harmful because of this client-end limitation. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1667 Credit: 17,433,168 RAC: 23,962 |
"I have to manually suspend and resume tasks to literally force the client to do something like this... and because it doesn't fetch work when a task is suspended, my queue is limited for that project. My computer can handle a larger queue, but it runs out every few days because of that additional requirement: necessary, but harmful because of this client-end limitation."
You have to keep doing it, because you keep doing it. If you were to set things the way you want them, then just let things be, they would eventually balance out according to what you have set for your Resource share. But since you micro-manage, that will never occur. Grant Darwin NT |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1667 Credit: 17,433,168 RAC: 23,962 |
"One thing I see asked most of all in the Forums is 'why can't I set project A to run 1 task at a time and project B to run 3 tasks at a time on my quad-core PC and have it stay that way'. They aren't talking Resource Share; they are talking about a per-CPU-core setting. Waaay too many people are saying 'I want more of project B running but all I see is project A running for days on end'. Especially in this time of pandemics and other crises, it would be nice to be able to crunch for both a pandemic project and my favorite project at the same time, full time, not swapping back and forth between them."
That is presently possible, if you have hardware with enough cores. Otherwise it is a case of doing enough work to balance the debt between your projects, and if there aren't enough cores to do so simultaneously, then as a project's debt builds up it will be given more processing time with the next request for work. The advantage of Resource share over x number of cores for a given project is that it doesn't need re-setting when you add or remove or suspend or whatever various projects. It sorts itself out (eventually, if you let it). It also allows all cores to be used by any project if one of the chosen projects has no work, and if that project has work again, then it will start processing work for it. Maybe what's needed is an easier method of setting max_concurrent, avg_ncpus, project_max_concurrent etc. from the BOINC Manager Advanced view (similar to the Event log options) for those that feel the need to micro-manage things.
"Another thing is the problem with cache size when using CPUs and GPUs at the same project: with the GPU being a lot faster, it needs more work, and that causes tons of CPU workunits to be downloaded as well."
BOINC has been able to provide a full cache's worth of work for CPU & GPU, without one getting too much or the other not getting enough, regardless of the difference in processing abilities, for years now. As long as the estimated completion times for each computing resource (CPU or GPU) are accurate, the number of Tasks downloaded for each resource will be enough to meet your cache settings (unless the project itself has a hard limit on the number of Tasks a given host can have, to limit the load on the servers). So if that is an issue you are experiencing, you need to take it up with the project in question and ask them why it is occurring (most likely they are still using DCF (Duration Correction Factor), which was deprecated years ago). Grant Darwin NT |
yoerik Joined: 24 Mar 20 Posts: 128 Credit: 169,525 RAC: 0 |
I joined in March - and have adjusted resource share to attempt to get this result for over 2 months without manual intervention on the client end. Because credit can vary between tasks, the resource share has tended to be fairly useless on my 4-core Pentium laptop. Which is why it makes sense to have a setting, instead of playing with resource share in vain - putting you at the mercy of credit, which you have no control over - and has been discussed numerous times across this forum and others, how broken it can be. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1667 Credit: 17,433,168 RAC: 23,962 |
"I joined in March - and have adjusted resource share to attempt to get this result for over 2 months without manual intervention on the client end."
And adjusting the Resource share also changes things, as does suspending/restarting of tasks or projects, so it has to re-work things, yet again. Having a low-powered, low-core-count system with a (relatively) large cache setting, and running multiple projects, means it will take months for things to settle down from the time you make your last change to the settings, with no further suspending/restarting of tasks or projects. As soon as you change something, it has to re-work things, yet again. With zero cache, and with the Resource share setting in WCG set to play well with the rest (WCG really does seem to make things difficult when it comes to Resource share), things should settle down within 2 months with no further tweaking in that time (all depending on how much uptime the system has, and how much of that time it can do BOINC work, of course). The more projects, the larger the cache, the lower the computation ability, the lower the number of cores/threads, and the less time the system is able to do BOINC work, the longer it takes for things to stabilise.
"Because credit can vary between tasks, the resource share has tended to be fairly useless on my 4-core Pentium laptop."
Resource share & job scheduling don't actually use Credit awarded. They use REC (Recent Estimated Credit) in order to overcome the problems of differing Credit allocation between projects, as well as the differing processing time for Tasks impacting on the granting of Credit. Grant Darwin NT |
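To make the point about REC a bit more concrete, here is a toy model, not the BOINC client's actual code. It assumes REC is a running estimate of recent work that decays exponentially with some half-life (the 10-day value below is an assumption for this sketch), and that a project whose share of REC is below its Resource share gets higher priority. The function names and the priority formula are illustrative only.

```python
HALF_LIFE_DAYS = 10.0  # assumed half-life for this sketch


def decay_rec(rec, days_elapsed, half_life_days=HALF_LIFE_DAYS):
    """Decay a project's recent-estimated-credit value exponentially over time."""
    return rec * 0.5 ** (days_elapsed / half_life_days)


def priorities(rec, share):
    """Toy priority: the further a project's REC fraction is below its
    resource-share fraction, the higher (less negative) its priority."""
    total_rec = sum(rec.values()) or 1.0  # avoid dividing by zero on a fresh client
    total_share = sum(share.values())
    return {
        project: -(rec[project] / total_rec) / (share[project] / total_share)
        for project in share
    }


# Two projects with equal Resource share, but A has done three times the recent work.
rec = {"A": 300.0, "B": 100.0}
share = {"A": 50.0, "B": 50.0}
prio = priorities(rec, share)
print(prio)                   # B ends up with the higher priority
print(decay_rec(100.0, 10))   # one half-life later the value has halved
```

In this model the credit a project actually grants never enters the calculation, and any imbalance fades only as old work decays, which is consistent with the long settling times described above.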
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,091,870 RAC: 5,224 |
"One thing I see asked most of all in the Forums is 'why can't I set project A to run 1 task at a time and project B to run 3 tasks at a time on my quad-core PC and have it stay that way'. They aren't talking Resource Share; they are talking about a per-CPU-core setting. Waaay too many people are saying 'I want more of project B running but all I see is project A running for days on end'. Especially in this time of pandemics and other crises, it would be nice to be able to crunch for both a pandemic project and my favorite project at the same time, full time, not swapping back and forth between them."
"That is presently possible, if you have hardware with enough cores. Otherwise it is a case of doing enough work to balance the debt between your projects, and if there aren't enough cores to do so simultaneously, then as a project's debt builds up it will be given more processing time with the next request for work."
Nooo, you missed the part about running project A on 1 CPU core all the time and project B on 3 CPU cores all the time; that can NEVER happen with the current resource share settings. Waaaaay too many times you get project A running on all CPU cores, then project B running on all CPU cores, with an occasional blip of one from project A or B running on one CPU core as it switches projects to match the resource share. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1667 Credit: 17,433,168 RAC: 23,962 |
"Nooo, you missed the part about running project A on 1 CPU core all the time and project B on 3 CPU cores all the time; that can NEVER happen with the current resource share settings. Waaaaay too many times you get project A running on all CPU cores, then project B running on all CPU cores, with an occasional blip of one from project A or B running on one CPU core as it switches projects to match the resource share."
Re-check what I said. I didn't miss that point; I pointed out why it's not the best way to do things and that Resource share is the better method. Then I suggested a way to make it easier to do anyway, rather than having to edit an XML config file to achieve it as you do now. Grant Darwin NT |
Sid Celery Joined: 11 Feb 08 Posts: 2112 Credit: 41,032,351 RAC: 20,241 |
"One thing I see asked most of all in the Forums is 'why can't I set project A to run 1 task at a time and project B to run 3 tasks at a time on my quad-core PC and have it stay that way'. They aren't talking Resource Share; they are talking about a per-CPU-core setting. Waaay too many people are saying 'I want more of project B running but all I see is project A running for days on end'. Especially in this time of pandemics and other crises, it would be nice to be able to crunch for both a pandemic project and my favorite project at the same time, full time, not swapping back and forth between them."
I think the answer is "stop wanting it". It's an overly narrow perspective. There's no difference between wanting 3+1 running at all times and running 12 of one then 4 of the other over a day, or 36 and 12 over the 3 days of this project's deadlines. Sure, there may be memory constraints of one project over another, as we occasionally see, but that's an entirely different matter. To me (and I've given myself this rep and I stand by it) it's an example of contributors thinking projects exist for the benefit of the weird quirks of individual users, rather than users offering their capacity to projects for whatever the projects need us to run for them. A desire for the tail to wag the dog. Well, that's not how it works, nor how it should work. Whatever this suggestion even is, it's not for the betterment of BOINC, nor the betterment of individual projects. So it's a no from me. To be more frank, it's a "not ever". |
yoerik Joined: 24 Mar 20 Posts: 128 Credit: 169,525 RAC: 0 |
"One thing I see asked most of all in the Forums is 'why can't I set project A to run 1 task at a time and project B to run 3 tasks at a time on my quad-core PC and have it stay that way'. They aren't talking Resource Share; they are talking about a per-CPU-core setting. Waaay too many people are saying 'I want more of project B running but all I see is project A running for days on end'. Especially in this time of pandemics and other crises, it would be nice to be able to crunch for both a pandemic project and my favorite project at the same time, full time, not swapping back and forth between them."
BOINC needs to be as user friendly as possible. That it will take me 3 months, after wasting months finding the correct resource share, is not user friendly. Never mind that it can be completely undone by a temporary WU shortage at any single project, or an issue with credits, completely resetting the process. This is a volunteer project, and giving volunteers as much control over their contribution as possible isn't too much to ask. Just as it isn't too much to ask to be kept informed about the impact of the community's overall contribution, but that's a different issue. The point is, it isn't insane to ask the developers to add this to their bucket list for the future, so that user control over their contribution remains a priority. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1667 Credit: 17,433,168 RAC: 23,962 |
"BOINC needs to be as user friendly as possible. That it will take me 3 months, after wasting months finding the correct resource share, is not user friendly."
And as I pointed out, that is a result of your choice of projects (how about getting on WCG's case to actually follow the BOINC standard? That would help a lot), your choice of cache setting, and frequent micro-managing.
"Never mind that it can be completely undone by a temporary WU shortage at any single project, or an issue with credits, completely resetting the process."
Never mind that it is not undone by Credits; as I pointed out, they are not used for work scheduling or Resource share. And it is not undone by WU shortages, as the Manager will juggle the work mix to meet your Resource share once the WU shortage is over. The fact is that it is already possible to limit the number of cores/threads a project can use, but it requires people to manually edit configuration files. And it requires them to remember what they have done, because the complaints to projects often come in when their systems end up short of work due to project WU issues, with their settings stopping other projects from taking advantage of those unused cores/threads until the other project's issues are resolved. That doesn't occur with work allocation based only on Resource share. Grant Darwin NT |
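For reference, the manual route mentioned above usually means an app_config.xml file in the project's directory. The sketch below generates one with Python rather than by hand. The app name "rosetta" and the limits are assumptions for illustration (the name has to match whatever the project actually calls its application on your client), and ET.indent needs Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_limits, project_limit=None):
    """Build an <app_config> tree limiting concurrent tasks.

    app_limits: dict mapping app name -> max tasks of that app to run at once
    project_limit: optional cap on tasks from the whole project at once
    """
    root = ET.Element("app_config")
    for name, limit in app_limits.items():
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = name
        ET.SubElement(app, "max_concurrent").text = str(limit)
    if project_limit is not None:
        ET.SubElement(root, "project_max_concurrent").text = str(project_limit)
    return root


# Assumed example: allow only one task of an app called "rosetta" at a time.
root = build_app_config({"rosetta": 1}, project_limit=1)
ET.indent(root)  # pretty-print, one tag per line (Python 3.9+)
print(ET.tostring(root, encoding="unicode"))
```

The printed XML would be saved as app_config.xml in that project's folder and picked up after the client re-reads its config files. As noted in the post, a limit like this also stops the project from using otherwise idle cores, so it is worth remembering that it is there.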
©2024 University of Washington
https://www.bakerlab.org