Lot of failures

Message boards : Number crunching : Lot of failures

adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,840,739
RAC: 28
Message 105770 - Posted: 1 Apr 2022, 8:22:39 UTC

A large percentage of the work units sent here today failed quickly (~20 seconds) with...

-1073741819 (0xC0000005) STATUS_ACCESS_VIOLATION

... two, however, have started, have been running for a couple of hours, and are showing 12% complete.
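
For anyone wondering how the negative decimal number and the hex value relate: Windows status codes are 32-bit values, and the same code can be shown as a signed or an unsigned integer. A minimal sketch of the conversion follows; the function names are made up for illustration and are not part of BOINC or Rosetta.

```python
def to_ntstatus_hex(exit_code: int) -> str:
    """Format an exit code as its 32-bit unsigned hex representation."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


def to_signed32(value: int) -> int:
    """Reinterpret a 32-bit value as a signed integer (two's complement)."""
    value &= 0xFFFFFFFF
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


print(to_ntstatus_hex(-1073741819))  # 0xC0000005
print(to_signed32(0xC0000005))       # -1073741819
```

So a client that reports the unsigned form of 0xC0000005 is describing the same access violation.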
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
Bryn Mawr

Joined: 26 Dec 18
Posts: 393
Credit: 12,114,842
RAC: 4,200
Message 105771 - Posted: 1 Apr 2022, 10:30:39 UTC - in response to Message 105770.  

A large percentage of the work units sent here today failed quickly (~20 seconds) with...

-1073741819 (0xC0000005) STATUS_ACCESS_VIOLATION

... two, however, have started, have been running for a couple of hours, and are showing 12% complete.


They’re all working fine on my Ubuntu boxes.

If this is another case of "works on Linux / crashes on Windows", could those who are crashing and who do not run VBox tasks set NNT (No New Tasks), so that the work lasts longer for us work-starved souls who don't have VBox-level resources?
adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,840,739
RAC: 28
Message 105773 - Posted: 1 Apr 2022, 10:50:02 UTC

The two I mentioned earlier are still crunching away, 24% and 26% respectively. VBox is here, and a couple of projects use it without issue.

I know what you mean by "work starved". I always regarded Rosetta as an endless supply of work units which caused no problems for me, (Windows 8.1 x64), but work units are few and far between nowadays. The Italian project TN-Grid, (http://gene.disi.unitn.it/test/), is taking up most of the slack though. Generally, the number of projects I consider to be worth supporting has fallen markedly.
Bryn Mawr

Joined: 26 Dec 18
Posts: 393
Credit: 12,114,842
RAC: 4,200
Message 105775 - Posted: 1 Apr 2022, 11:18:57 UTC - in response to Message 105773.  

The two I mentioned earlier are still crunching away, 24% and 26% respectively. VBox is here, and a couple of projects use it without issue.

I know what you mean by "work starved". I always regarded Rosetta as an endless supply of work units which caused no problems for me, (Windows 8.1 x64), but work units are few and far between nowadays. The Italian project TN-Grid, (http://gene.disi.unitn.it/test/), is taking up most of the slack though. Generally, the number of projects I consider to be worth supporting has fallen markedly.



Of my 5+1 projects (Ralph is the +1), I'm down to 2 giving me work, and of those TN-Grid can't take the strain: it's bumping along, with all the work going out as soon as it's loaded into the queue, because the work generator is unable to keep up with demand.
Jean-David Beyer

Joined: 2 Nov 05
Posts: 188
Credit: 6,461,090
RAC: 6,053
Message 105776 - Posted: 1 Apr 2022, 11:53:08 UTC - in response to Message 105770.  


A large percentage of the work units sent here today failed quickly (~20 seconds) with...
-1073741819 (0xC0000005) STATUS_ACCESS_VIOLATION

Not here. Of the work units I have received, the first six completed successfully. I am currently running six more, and each has accumulated around four hours of CPU time. I am running:
Computer: 5910575
CPU type: GenuineIntel Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7]
Number of processors: 16
Operating System: Linux Red Hat Enterprise Linux 8.5 (Ootpa) [4.18.0-348.20.1.el8_5.x86_64|libc 2.28 (GNU libc)]
BOINC version: 7.16.11

adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,840,739
RAC: 28
Message 105777 - Posted: 1 Apr 2022, 12:50:18 UTC - in response to Message 105776.  
Last modified: 1 Apr 2022, 12:52:34 UTC

Hi JD, yes, Linux systems certainly do seem immune to this problem. If the project is getting enough work done, there is little impetus to fix the problem with Windows, or issue VBox work to the same crunchers.

As long as the work is getting done, fair enough. Twenty years ago, comms were so slow that it would have been a serious annoyance.
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1995
Credit: 9,634,307
RAC: 6,840
Message 105780 - Posted: 1 Apr 2022, 14:33:28 UTC - in response to Message 105777.  

Hi JD, yes, Linux systems certainly do seem immune to this problem. If the project is getting enough work done, there is little impetus to fix the problem with Windows


Yes. But the LARGE part of clients/volunteers runs Windows.

P.S.
Same error on my Win11:
<message>
(unknown error) - exit code 3221225477 (0xc0000005)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @11AA_YIL_YIL_11mer1282_000088_extract_A.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 2442623
Using database: database_357d5d93529_n_methylminirosetta_database


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0000006300000062
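
When comparing many failed tasks, it can help to pull the exit code and fault address out of the stderr text rather than reading each log by eye. The following is only a sketch: the regular expressions assume the log lines look exactly like the ones pasted above, and `summarize` and the pattern names are hypothetical helpers, not part of any BOINC tool.

```python
import re

# Sample text copied from the log lines quoted in this post.
SAMPLE = """(unknown error) - exit code 3221225477 (0xc0000005)
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0000006300000062
"""

# Assumed line shapes; other Rosetta versions may format these differently.
EXIT_RE = re.compile(r"exit code (\d+) \((0x[0-9a-fA-F]+)\)")
REASON_RE = re.compile(
    r"Reason: (.+?) \((0x[0-9a-fA-F]+)\) at address (0x[0-9a-fA-F]+)"
)


def summarize(stderr_text: str) -> dict:
    """Extract the exit code, exception reason and fault address, if present."""
    summary = {}
    exit_match = EXIT_RE.search(stderr_text)
    if exit_match:
        summary["exit_code"] = int(exit_match.group(1))
        summary["exit_hex"] = exit_match.group(2).lower()
    reason_match = REASON_RE.search(stderr_text)
    if reason_match:
        summary["reason"] = reason_match.group(1)
        summary["status"] = reason_match.group(2).lower()
        summary["address"] = reason_match.group(3).lower()
    return summary


print(summarize(SAMPLE))
```

Two logs that yield the same reason and status on different hosts point toward the task rather than the machine.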

adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,840,739
RAC: 28
Message 105781 - Posted: 1 Apr 2022, 14:51:28 UTC
Last modified: 1 Apr 2022, 14:53:59 UTC

>>> LARGE part of clients/volunteers runs Windows

Certainly. Let us, however, consider the purpose. We are trying to help them. It is up to them to decide if Linux users are sufficient in number to achieve the result they require. It has already been mentioned in this thread that the amount of work from the project is a lot less than it used to be. They will, however, pick up on the fact that Windows users are seeing crashed work and stop issuing it to these people, which obviously includes me.
rjs5

Joined: 22 Nov 10
Posts: 273
Credit: 23,054,272
RAC: 5,361
Message 105783 - Posted: 1 Apr 2022, 15:45:45 UTC - in response to Message 105781.  

>>> LARGE part of clients/volunteers runs Windows

Certainly. Let us, however, consider the purpose. We are trying to help them. It is up to them to decide if Linux users are sufficient in number to achieve the result they require. It has already been mentioned in this thread that the amount of work from the project is a lot less than it used to be. They will, however, pick up on the fact that Windows users are seeing crashed work and stop issuing it to these people, which obviously includes me.


I looked at the details of a number of your failing WUs, and in each case the other machine running the WU failed the same way.
It looks like the WU are bad and you are OK.
adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,840,739
RAC: 28
Message 105793 - Posted: 2 Apr 2022, 6:01:38 UTC - in response to Message 105783.  
Last modified: 2 Apr 2022, 6:02:16 UTC

>>> It looks like the WU are bad and you are OK.

Yes, I know.
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 105819 - Posted: 3 Apr 2022, 12:04:56 UTC
Last modified: 3 Apr 2022, 12:13:44 UTC

We have discovered over time that a certain researcher (unknown, but this bug seems to be related to them) compiles tasks just for Linux and not for Windows. So they bomb on Windows machines (no matter what kind) and work just fine on Linux. We have tried to reach out to the project about this, but have been ignored (this is also typical now). So if you get a 4.2 task that bombs within a few seconds or under a minute, it is from that researcher.

The best way to check whether it is your machine or the task is to find the crashed task among the error results on the tasks page of your account.
Click on the workunit, not the task, and see what your wingman has done. If they completed it (4.2 tasks), then look at the machine's OS. If it is Windows, then there is also an issue we are exploring (mainly with Python) where certain older CPUs do not run certain tasks while newer CPUs do (a long, complicated, unfolding story). If the task completes on a Linux machine, then you know the answer.

In short, certain 4.2 tasks are designed for Linux-only machines and not Windows. This is one of them.
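
The check described above can be written down as a small decision rule. This is a rough sketch only: the `Result` record and the `diagnose` function are invented for illustration, the OS names and outcomes would have to be copied by hand from the workunit page, and the labels it returns are a reading of the steps in this post rather than anything official.

```python
from dataclasses import dataclass


@dataclass
class Result:
    os_name: str      # OS as shown for the host, e.g. "Microsoft Windows 11"
    succeeded: bool   # True if that host completed the task


def diagnose(results: list[Result]) -> str:
    """Guess whether a Windows failure is down to the task or the host."""
    windows = [r for r in results if "windows" in r.os_name.lower()]
    linux = [r for r in results if "linux" in r.os_name.lower()]

    if not any(not r.succeeded for r in windows):
        return "no-windows-failure"
    if any(r.succeeded for r in windows):
        # Another Windows machine finished it: suspect this host (e.g. older CPU).
        return "host-specific"
    if any(r.succeeded for r in linux):
        # Fails on Windows, completes on Linux: the case described above.
        return "task-linux-only"
    if len(results) >= 2:
        # Every host so far failed, wingman included: suspect the task.
        return "task-bad"
    return "inconclusive"


# Two Windows hosts both failing the same workunit.
print(diagnose([
    Result("Microsoft Windows 11", False),
    Result("Microsoft Windows 10", False),
]))  # task-bad
```

With only a single failed result there is nothing to compare against, which is why the sketch returns "inconclusive" until a wingman has reported.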

For example, look here: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1318176948
This is one that bombed on my system and was resent; another Windows user got it and it bombed too.
Exact same error code as yours.

Look through the "Problems and Technical Issues with Rosetta@home" thread above for more information on all the current bugs and complaints.

©2024 University of Washington
https://www.bakerlab.org