Message boards : Number crunching : High number of invalid tasks
Author | Message |
---|---|
tvdsluis Send message Joined: 27 Mar 20 Posts: 11 Credit: 514,960 RAC: 0 |
Lately I have a high number of validation errors. Like this one: https://boinc.bakerlab.org/rosetta/result.php?resultid=1446981031 Nothing changed on my end, and the system is running fine and stable as always. This system also runs a number of WCG tasks without any problems or errors. Also the reported runtime of 2 minutes is not correct. This task ran 4+ hours before being invalidated. It's a Win10 Pro system with an AMD Ryzen 7 2700 with 16 GB of memory. It also runs FAH tasks on GPU and CPU without any problems. Anyone else have a growing number of validation errors? |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
Anyone else have a growing number of validation errors? Yes, I have so far today a 10% invalid rate, which is unusually high. I have five Ubuntu machines on it, three with VirtualBox to run the pythons too. |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,096,846 RAC: 5,633 |
Lately I have a high number of validation errors. Each Rosetta task blocks off 8 GB of memory for itself, so the problem could be that your other tasks, and whatever else you do with the PC, are trying to use more than the remaining 8 GB. If they also push the envelope, both end up trying to use the same blocks of memory and invalid units are inevitable. For me, on my Win10/11 PC, the WCG Africa Rainfall tasks use 5 GB each. I don't run FAH, so I don't know how much memory it uses. |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
I see the same message: ERROR: [ERROR] Unable to open constraints file: m_09051a5a5815e8c4a7a718313fa04930_0001_000000061_0001_1_35_49_H_._HHH_b2_06813_0002_1_0001.MSAcst https://boinc.bakerlab.org/rosetta/result.php?resultid=1447459313 It is strange that it is classified as "invalid" rather than "error". |
sam6861 Send message Joined: 25 Mar 20 Posts: 4 Credit: 2,411,420 RAC: 0 |
So far, my computer's invalids appear to happen randomly to: 5nvx_graft_buwei_xab and 5nvx_graft_buwei_xad. For my computer, nearly all of my invalids ran for less than 3 minutes. The log shows an error. Sent time: 8 Nov 2021, 9:07:29 UTC Received: 8 Nov 2021, 9:10:38 UTC Run time: 2 min 54 sec Validate state: Invalid ERROR: [ERROR] Unable to open constraints file https://boinc.bakerlab.org/rosetta/result.php?resultid=1447267533 |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1667 Credit: 17,445,399 RAC: 24,782 |
Anyone else have a growing number of validation errors? Nope, for me there is a slightly higher than usual number of compute Errors. Don't worry about it: 5nvx_graft_buwei_ Tasks have produced a steady stream of Invalids & Errors ever since they were first released, and the percentage of them compared to Valids varies as you get batches of work that have more or fewer than the usual number of error-producing Tasks in them. Most of them die within a matter of minutes. If you start getting Invalids or Errors that aren't 5nvx_graft_buwei_ Tasks (other than the very occasional RB Task), and the computer that processes the resent Task doesn't get an error, then it's time to be concerned. Also the reported runtime of 2 minutes is not correct. Not so. It did run for only a few minutes. Run time 2 min 46 sec CPU time 2 min 25 sec Grant Darwin NT |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1667 Credit: 17,445,399 RAC: 24,782 |
It is strange that it is classified as "invalid" rather than "error". Yep, ever since those Tasks were released the exact same error in the stderr output can result in either a Computation error or a Validation error. Grant Darwin NT |
tvdsluis Send message Joined: 27 Mar 20 Posts: 11 Credit: 514,960 RAC: 0 |
Thanks for the responses. For now I will limit Rosetta to 1 task at a time, to see if the memory constraint is the factor. I changed it from 1 to 2 not so long ago, so we'll see what happens in the next few days. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1667 Credit: 17,445,399 RAC: 24,782 |
Thanks for the responses. Why? The problem has nothing to do with memory issues. As we mentioned, the problem is with those particular Tasks. Yes, your system is very low on RAM for the number of cores/threads it has (for Rosetta you need to allow 1.3GB RAM per core/thread in use to avoid problems due to lack of RAM; more if you process Python Tasks). But since you aren't using all of them for Rosetta, it's not going to be an issue. Memory issues will usually result in an unhandled exception error. Grant Darwin NT |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1667 Credit: 17,445,399 RAC: 24,782 |
My Invalids have more than quadrupled overnight. A particularly bad group of 5nvx_graft_buwei_ Tasks making their way through the system. Grant Darwin NT |
Bryn Mawr Send message Joined: 26 Dec 18 Posts: 389 Credit: 12,042,562 RAC: 14,436 |
My Invalids have more than quadrupled overnight. A particularly bad group of 5nvx_graft_buwei_ Tasks making their way through the system. I had 10, 9 of which were the standard “Unable to open constraints file” but the 10th lasted the full 8 hours and had several hundred “Invalid pointer” errors :- Task 1448313622 Name 5nvx_graft_buwei_xad_SAVE_ALL_OUT_IGNORE_THE_REST_5gk5dq7m_1731808_18_1 Workunit 1291016882 Created 10 Nov 2021, 8:43:42 UTC Sent 10 Nov 2021, 8:47:51 UTC Report deadline 13 Nov 2021, 8:47:51 UTC Received 10 Nov 2021, 17:38:40 UTC Server state Over Outcome Validate error Client state Done Exit status 0 (0x00000000) Computer ID 3563484 Run time 8 hours 2 min CPU time 8 hours 0 min 10 sec Validate state Invalid Credit 470.80 Device peak FLOPS 7.03 GFLOPS Application version Rosetta v4.20 x86_64-pc-linux-gnu Peak working set size 995.81 MB Peak swap size 1,130.45 MB Peak disk usage 30.48 MB Stderr output <core_client_version>7.16.17</core_client_version> <![CDATA[ <stderr_txt> * *** Error in `../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-gnu': free(): invalid pointer: 0x00000000067bd783 *** *** Error in `../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-gnu': free(): invalid pointer: 0x00000000067bd783 *** . . . 
*** Error in `../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-gnu': free(): invalid pointer: 0x00000000067bd783 *** *** Error in `../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-gnu': free(): invalid pointer: 0x00000000067bd783 *** *** Error in `../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-gnu': free(): invalid pointer: 0x00000000067bd783 *** *** Error in `../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-gnu': free(): invalid pointer: 0x00000000067bd783 *** ====================================================== DONE :: 649 starting structures 28810.4 cpu seconds This process generated 649 decoys from 649 attempts ====================================================== BOINC :: WS_max 1.04103e+09 17:01:05 (1950): called boinc_finish(0) </stderr_txt> ]]> |
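For anyone who wants to sort their own failed results by the two signatures seen so far in this thread, here is a minimal sketch that tallies them from a saved copy of a task's stderr output. The helper name `summarize_stderr` is made up for illustration and is not part of BOINC or Rosetta; the search strings are copied from the excerpts quoted above, and other failure modes are not covered.

```python
import re

# Message fragments copied from the stderr excerpts quoted in this thread.
CONSTRAINTS_ERROR = "Unable to open constraints file"
INVALID_POINTER = "free(): invalid pointer"
DECOY_LINE = re.compile(r"This process generated (\d+) decoys from (\d+) attempts")


def summarize_stderr(stderr_text: str) -> dict:
    """Tally the failure signatures discussed in this thread (hypothetical helper)."""
    decoys = DECOY_LINE.search(stderr_text)
    return {
        # True when the task died on the missing constraints file.
        "constraints_file_missing": CONSTRAINTS_ERROR in stderr_text,
        # Number of repeated glibc "invalid pointer" complaints.
        "invalid_pointer_count": stderr_text.count(INVALID_POINTER),
        # Decoys reported at the end of the run, or None if the run never got there.
        "decoys": int(decoys.group(1)) if decoys else None,
        # True when the application reported a normal exit to the BOINC client.
        "finished_cleanly": "called boinc_finish(0)" in stderr_text,
    }
```

A result like the 8-hour one above would show a non-zero `invalid_pointer_count` together with `finished_cleanly` set to `True`, while the short-lived ones would show `constraints_file_missing` set to `True`.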
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1667 Credit: 17,445,399 RAC: 24,782 |
I had 10, 9 of which were the standard “Unable to open constraints file” but the 10th lasted the full 8 hours and had several hundred “Invalid pointer” errors :- So even though it produced valid work, it gave a Validation error due to the invalid pointer issue. At least you got Credit for the time spent & work done, even if it still gets counted as Invalid. Grant Darwin NT |
Bryn Mawr Send message Joined: 26 Dec 18 Posts: 389 Credit: 12,042,562 RAC: 14,436 |
I had 10, 9 of which were the standard “Unable to open constraints file” but the 10th lasted the full 8 hours and had several hundred “Invalid pointer” errors :- So even though it produced valid work, it gave a Validation error due to the invalid pointer issue. That’s about the size of it. I posted it as I’ve not seen one like that before. |
tvdsluis Send message Joined: 27 Mar 20 Posts: 11 Credit: 514,960 RAC: 0 |
An update: After switching back to running RAH on just one core, all invalids are gone. I now have 27 consecutive valid tasks. @mikey, it looks like you're spot on with the memory requirements. The other 7 cores now run FAH and WCG, and because I only have 16 GB on this system, just 1 runs RAH. |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,096,846 RAC: 5,633 |
An update: And that's exactly why I also limit Rosetta tasks to one at a time on all my machines; most only have 16 GB of RAM anyway. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1667 Credit: 17,445,399 RAC: 24,782 |
An update: That is not what happened. What happened is that a new batch of work was released that doesn't include the types of Task that were producing the errors you were posting about. So regardless of whether you use 1 core or 256, the errors won't occur if you're not processing those particular Tasks, and they will re-occur if such Tasks are released again, all regardless of the number of cores you use. If the errors were due to memory issues, you would have got a different error message (and in many cases it would have occurred after the Task had been processed for some time, not when it had just started up). As long as you allow 1.3GB of RAM per Rosetta 4.20 Task being processed, you won't have any issues with memory. If you do, then go to your account, Computing preferences, Memory, and set both "When computer is in use, use at most" and "When computer is not in use, use at most" to 95 % each. With the amount of RAM you have vs. the number of cores/threads, that will allow you to process 12 Tasks at a time without issue (most of the time it would actually be possible to do 16, but the amount of RAM used by Tasks does vary. Presently it's between 700MB and 1GB. It can be as high as 2GB, or as low as 200MB). However, 1 Python Task would use half the system's RAM, limiting the amount of other work that could be done. If you don't use VirtualBox for any other projects, then re-installing BOINC using the version that doesn't include VirtualBox would solve that problem. @mikey, it looks like you're spot on with the memory requirements. Mikey has been spouting unhelpful rubbish. Only the Python Tasks require 8GB of RAM due to their use of VirtualBox; as I pointed out above, Rosetta 4.20 Tasks don't require nearly as much RAM. Grant Darwin NT |
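The RAM arithmetic in the post above can be sketched in a few lines. This is only a rough illustration, assuming the per-Task figures quoted in the thread (1.3GB as the allowance, 700MB to 2GB as the observed range) and a 16GB machine; the function name and its parameters are made up for this example and are not BOINC settings.

```python
import math


def max_concurrent_tasks(total_ram_gb: float,
                         ram_per_task_gb: float,
                         usable_fraction: float = 1.0) -> int:
    """Estimate how many Tasks fit in RAM at once (hypothetical helper).

    usable_fraction models a "use at most N %" memory preference.
    """
    if total_ram_gb <= 0 or ram_per_task_gb <= 0:
        raise ValueError("RAM sizes must be positive")
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    return math.floor(total_ram_gb * usable_fraction / ram_per_task_gb)


if __name__ == "__main__":
    print(max_concurrent_tasks(16, 1.3))        # 12, the figure quoted above
    print(max_concurrent_tasks(16, 1.3, 0.95))  # 11 once the 95 % cap is applied
    print(max_concurrent_tasks(16, 1.0))        # 16 at the upper end of the present range
    print(max_concurrent_tasks(16, 2.0))        # 8 in the worst case mentioned
```

Under these assumptions the quoted figure of 12 Tasks corresponds to dividing the full 16GB by 1.3GB; with the 95 % preference applied the estimate drops to 11. Either way it is well above the 1 or 2 Tasks being run here, and it leaves out whatever the other projects on the machine are using.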
©2024 University of Washington
https://www.bakerlab.org