Message boards : Number crunching : Rosetta 4.0+
Author | Message |
---|---|
[AF>Le_Pommier] Jerome_C2005 Joined: 22 Aug 06 Posts: 42 Credit: 1,258,039 RAC: 0 |
As expected it canceled hundreds of tasks. I'm afraid I spoke too soon: it started to request too many tasks again without my changing anything in my small cache. I still have 120 waiting to run, and it has already canceled 160 more because of the deadline in the past few days... with only 17 valid tasks in the log... So it is still requesting tasks way above the cache setting :( |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"I'm afraid I spoke too soon: it started to request too many tasks again without my changing anything in my small cache. I still have 120 waiting to run" It looks like that is your machine with BOINC 7.16.6. I had the same problem on WCG after I upgraded BOINC from 7.14.2 to the next version, whatever it was. It went berserk and downloaded work units until it reached the 10-day limit (or got exhausted, whichever came first). I ended up with hundreds of work units. It is apparently due to a change in the BOINC scheduler, but the servers don't necessarily know how to deal with it, at least until they "learn". I posted about it on the WCG forum a few months ago. I never found a good solution, except to manually control the downloads. After a while, it starts working again. Good luck. |
Jonathan Joined: 4 Oct 17 Posts: 43 Credit: 1,337,472 RAC: 0 |
Try exiting BOINC and all tasks. Edit your BOINC preferences on the project to use 8 CPUs out of 24, roughly 33%. Start up BOINC. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1699 Credit: 18,186,917 RAC: 24,275 |
"Try exiting BOINC and all tasks. Edit your BOINC preferences on the project to use 8 CPUs out of 24, roughly 33%. Start up BOINC." No need to exit BOINC to do that; just make the changes on the Web site and update them. Then the next time the BOINC Manager contacts the server (or you click on Update) it will get the new settings. Having said that, hopefully this will be noticed by those who can do something, so they can check their changes; it shouldn't be occurring with the latest work allocation changes. Grant Darwin NT |
Bryn Mawr Joined: 26 Dec 18 Posts: 396 Credit: 12,254,928 RAC: 11,616 |
"I'm afraid I spoke too soon: it started to request too many tasks again without my changing anything in my small cache. I still have 120 waiting to run" I changed the 10-day limit in the cc_config file down to 1 day because I didn’t like the way one project would run away with the machine after not having WUs for a while. I’m not certain that this also controls the time it takes to learn a machine’s throughput, but I suspect it does. |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"I changed the 10-day limit in the cc_config file down to 1 day because I didn’t like the way one project would run away with the machine after not having WUs for a while. I’m not certain that this also controls the time it takes to learn a machine’s throughput, but I suspect it does." Mine was set to the default (0.1 + 0.5 days). It ignored that. But it seems to have finally collapsed once it reached the 10-day limit, which may be part of the server code. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1699 Credit: 18,186,917 RAC: 24,275 |
"I’m not certain that this also controls the time it takes to learn a machine’s throughput, but I suspect it does." The larger the cache, the longer it takes to determine how long different Tasks on different applications on different projects run for. Until it sorts that out, there's no way it can meet your Resource share settings. The smaller the cache, the sooner it can get things sorted. Grant Darwin NT |
Bryn Mawr Joined: 26 Dec 18 Posts: 396 Credit: 12,254,928 RAC: 11,616 |
"I changed the 10-day limit in the cc_config file down to 1 day because I didn’t like the way one project would run away with the machine after not having WUs for a while. I’m not certain that this also controls the time it takes to learn a machine’s throughput, but I suspect it does." Not the buffer size, but this option: <rec_half_life_days>X</rec_half_life_days> "A project's scheduling priority is determined by its estimated credit in the last X days. Default is 10; set it larger if you run long high-priority jobs." |
[AF>Le_Pommier] Jerome_C2005 Joined: 22 Aug 06 Posts: 42 Credit: 1,258,039 RAC: 0 |
@Jonathan & Grant: "Edit your BOINC preferences on the project to use 8 CPUs out of 24, roughly 33%." I obviously don't want to do this: I want all 24 cores to be used, not only 8 out of 24 (I wouldn't rent such a host in that case). I now limit Rosetta to 6 via an app_config (I found out even 8 was too much for the 8 GB of the machine...) and all the rest is crunching Universe tasks at the moment. I suspect this might be the reason why the Rosetta cache is too big: maybe it actually calculates the required number with 24 cores and not 8 or 6? But still, with the very small cache I have set, it doesn't make much sense. I assume it will self-regulate after some time; it now has 118 ongoing tasks (and 95 recently canceled for deadline), which is much less than the 1000 I had at the very beginning (when I had a bigger cache). And anyway it is standard BOINC behavior to cancel unprocessed tasks at the deadline, so let it be. |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"Not the buffer size :-" OK, I see what you are saying, but I am not sure why you would set that larger. I want the estimated time to converge faster, so I routinely set mine as follows when installing BOINC: <rec_half_life_days>1.000000</rec_half_life_days> That was not the source of my problem. It was some incompatibility between the new BOINC (after 7.14.2) and the server. It worked OK on some projects, and not others. I have not seen the problem for a while now, so it eventually corrects itself. |
Bryn Mawr Joined: 26 Dec 18 Posts: 396 Credit: 12,254,928 RAC: 11,616 |
"Not the buffer size :-" I have also set mine down to 1 day (the "set it larger" was part of the official documentation). I didn’t know whether it would help you, so I made the suggestion on the off chance that it would. |
furukitsune Joined: 19 Mar 16 Posts: 9 Credit: 7,221,780 RAC: 724 |
Getting an error on all v4.20 tasks, which still validate: https://boinc.bakerlab.org/result.php?resultid=1214389007 Extracting in slot directory: minirosetta_database.zip error: cannot create ./minirosetta_database/scoring/qsar/shape_histogram_data.js Permission denied. This also happened on other versions. All other files in the minirosetta database unpack without problems. I can copy minirosetta to another directory and unpack this file with no problem. Running Win7, JavaScript is enabled. The file seems to contain numbers only. Any suggestions? fk |
Brian Nixon Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
furukitsune wrote: "error: cannot create ./minirosetta_database/scoring/qsar/shape_histogram_data.js" Are you running security software which might be blocking this file? If so, check the logs for that software. If it is blocking it, consider telling it to ignore the BOINC application and/or data directory. |
Stevie G Joined: 15 Dec 18 Posts: 107 Credit: 865,910 RAC: 2,310 |
Project server status shows work has been available, but I have not received any Rosetta tasks for over a week. Why is that? Steven Gaber Oldsmar, FL |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1699 Credit: 18,186,917 RAC: 24,275 |
"Project server status shows work has been available, but I have not received any Rosetta tasks for over a week." See my response to the same question you asked in another thread. Grant Darwin NT |
furukitsune Joined: 19 Mar 16 Posts: 9 Credit: 7,221,780 RAC: 724 |
I know this is an old post, but I thought I would post this in case anyone does a search. What was blocking the unzip of the .js file was Windows "User Account Control". This is a security feature in Windows which prevents programs from starting other programs. Since JavaScript is a scripting language, the .js file apparently triggered UAC, which thought it might be a virus. Windows did not generate an error message, so I found this by accident. fk |
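
For anyone who wants to try the two client-side settings discussed in this thread, here is a minimal sketch of both files. It assumes a stock BOINC client of roughly the versions mentioned above. The values 1 and 6 are simply the ones the posters quoted, and the thread does not say which app_config element Jerome actually used, so `project_max_concurrent` is only one plausible way to get a limit of 6. The first fragment would go in `cc_config.xml` in the BOINC data directory:

```xml
<!-- cc_config.xml: sketch only, placed in the BOINC data directory -->
<cc_config>
  <options>
    <!-- Period (in days) over which estimated credit counts toward
         project scheduling priority. Default is 10; 1 is the value
         Jim1348 and Bryn Mawr report using. -->
    <rec_half_life_days>1</rec_half_life_days>
  </options>
</cc_config>
```

The second would go in an `app_config.xml` inside the Rosetta project folder under the data directory (the exact folder name depends on your installation):

```xml
<!-- app_config.xml: sketch only, placed in the Rosetta project folder -->
<app_config>
  <!-- Assumed way of capping Rosetta at 6 simultaneous tasks, leaving
       the remaining cores to other projects. -->
  <project_max_concurrent>6</project_max_concurrent>
</app_config>
```

After editing either file, "Read config files" in the BOINC Manager should pick up the change. If it does not seem to take effect, restarting the client is the safer bet. Note that neither setting is the work buffer (cache) size, which is set in the computing preferences.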
©2024 University of Washington
https://www.bakerlab.org