Report Problems with Rosetta Version 5.16 II

rdickjune

Joined: 15 May 06
Posts: 5
Credit: 5,529
RAC: 0
Message 17219 - Posted: 27 May 2006, 5:26:27 UTC

I'm getting errors occasionally and need to know where to report them for my version of Rosetta. The error messages have been as follows:

(type is in red:)
rosetta@home 5/26/2006 9:19:51 PM rosetta not responding to screensaver, exiting
rosetta@home 5/26/2006 9:19:51 PM Unrecoverable error.....,etc... (-exit code_ 1 (0xcffffffff)

.....after more dialog, the end result states that the application is terminated.

I have had this same error a number of times since I started running Rosetta less than two weeks ago. I think I read somewhere on the site that credit is earned for all work done, so I am not concerned in that regard. I see that there are links for reporting errors in specific versions of Rosetta, but I don't see one for the newer clients, which is why I'm using this thread. If you move this to another thread, please let me know where it has been moved, for future reference. Thank you.

Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 17221 - Posted: 27 May 2006, 5:45:08 UTC
Last modified: 27 May 2006, 5:52:49 UTC

I have started this new thread for reporting version 5.16 issues and problems. The first thread was getting rather long and was taking too long to load.

The original thread is located here, but please start using this thread for your reports.

Moderator9
ROSETTA@home FAQ
Moderator Contact
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 17252 - Posted: 27 May 2006, 19:19:11 UTC

See the last post in "Report Problems with Rosetta Version 5.16 I". This is my second error since turning the screensaver back on. It's wuid=18193778.

Result ID 21719438
Name T0283_CONTACTS_MAP_FROM_hom006_535_21537_0
Workunit 18193778
Created 27 May 2006 4:39:53 UTC
Sent 27 May 2006 6:37:21 UTC
Received 27 May 2006 19:14:59 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status -1073741811 (0xc000000d)
Computer ID 212252
Report deadline 3 Jun 2006 6:37:21 UTC
CPU time 1213.703125
stderr out <core_client_version>5.4.9</core_client_version>
<message>
- exit code -1073741811 (0xc000000d)
</message>
<stderr_txt>
# cpu_run_time_pref: 28800
# random seed: 3172964

</stderr_txt>


Validate state Invalid
Claimed credit 4.80583329460571
Granted credit 0
application version 5.16
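
For anyone comparing the two numbers in the exit status: the decimal value and the hexadecimal code are the same 32-bit value, shown once as a signed integer and once as unsigned. The following is a minimal Python sketch of that conversion; the function name is made up for illustration and is not part of BOINC or Rosetta. Windows documents 0xC000000D as the NTSTATUS code STATUS_INVALID_PARAMETER.

```python
def exit_status_to_hex(status: int) -> str:
    """Render a signed 32-bit exit status as unsigned hexadecimal.

    Hypothetical helper for illustration only.
    """
    # Masking with 0xFFFFFFFF reinterprets the two's-complement value as unsigned.
    return f"0x{status & 0xFFFFFFFF:08x}"


print(exit_status_to_hex(-1073741811))  # prints 0xc000000d
```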
Jose

Joined: 28 Mar 06
Posts: 820
Credit: 48,297
RAC: 0
Message 17255 - Posted: 27 May 2006, 20:35:54 UTC
Last modified: 27 May 2006, 20:39:22 UTC

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=18032663

The computing error came after 13,500+ seconds of processing and 6 models (it was working on number 7).
“This and no other is the root from which a Tyrant springs; when he first appears he is a protector.”
Plato
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 17257 - Posted: 27 May 2006, 22:19:40 UTC

OK, now there have been three fatal Windows errors since turning the screensaver back on. wuid=18250663. I'm now going to turn the screensaver back off.

tony


Result ID 21780793
Name T0283_CONTACTS_CONSERVATIVE_MAP_FROM_hom006_547_8549_0
Workunit 18250663
Created 27 May 2006 17:01:15 UTC
Sent 27 May 2006 19:15:00 UTC
Received 27 May 2006 22:16:39 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status -1073741811 (0xc000000d)
Computer ID 212252
Report deadline 3 Jun 2006 19:15:00 UTC
CPU time 9206.765625
stderr out <core_client_version>5.4.9</core_client_version>
<message>
- exit code -1073741811 (0xc000000d)
</message>
<stderr_txt>
# random seed: 2943352
# cpu_run_time_pref: 28800

</stderr_txt>


Validate state Invalid
Claimed credit 36.4555218363274
Granted credit 0
application version 5.16
David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 17260 - Posted: 28 May 2006, 0:07:32 UTC - in response to Message 17257.  

OK, now there have been three fatal Windows errors since turning the screensaver back on. wuid=18250663. I'm now going to turn the screensaver back off.

tony


Result ID 21780793
Name T0283_CONTACTS_CONSERVATIVE_MAP_FROM_hom006_547_8549_0
Workunit 18250663
Created 27 May 2006 17:01:15 UTC
Sent 27 May 2006 19:15:00 UTC
Received 27 May 2006 22:16:39 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status -1073741811 (0xc000000d)
Computer ID 212252
Report deadline 3 Jun 2006 19:15:00 UTC
CPU time 9206.765625
stderr out <core_client_version>5.4.9</core_client_version>
<message>
- exit code -1073741811 (0xc000000d)
</message>
<stderr_txt>
# random seed: 2943352
# cpu_run_time_pref: 28800

</stderr_txt>

Validate state Invalid
Claimed credit 36.4555218363274
Granted credit 0
application version 5.16


Yes. Rom, in analyzing the current error breakdown, thinks that most are associated with the graphics failing. He is testing a solution in which Rosetta keeps going and results get returned even if there is a problem with the graphics.

Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 17262 - Posted: 28 May 2006, 0:56:03 UTC - in response to Message 17260.  

Yes. Rom, in analyzing the current error breakdown, thinks that most are associated with the graphics failing. He is testing a solution in which Rosetta keeps going and results get returned even if there is a problem with the graphics.

If you don't mind, I'll email Rom directly to see if there's anything I can do. I'm one of his Alpha testers anyway.

tony
senatoralex85

Joined: 27 Sep 05
Posts: 66
Credit: 169,644
RAC: 0
Message 17265 - Posted: 28 May 2006, 5:34:53 UTC
Last modified: 28 May 2006, 5:38:55 UTC

1/8/2005 3:36:18 PM||request_reschedule_cpus: process exited
1/8/2005 3:36:18 PM|rosetta@home|Computation for result T0283_CONTACTS_MAP_FROM_hom006_535_22929_0 finished
1/8/2005 3:36:19 PM|rosetta@home|Started upload of T0283_CONTACTS_MAP_FROM_hom006_535_22929_0_0
1/8/2005 3:36:24 PM|rosetta@home|Finished upload of T0283_CONTACTS_MAP_FROM_hom006_535_22929_0_0
1/8/2005 3:36:24 PM|rosetta@home|Throughput 31466 bytes/sec
1/8/2005 3:54:13 PM|rosetta@home|Deferring communication with project for 1 days, 19 hours, 59 minutes, and 57 seconds
1/8/2005 4:01:56 PM||Insufficient work; requesting more
1/8/2005 4:01:56 PM|LHC@home|Deferring communication with project for 71 weeks, 5 days, 7 hours, 29 minutes, and 28 seconds
1/8/2005 4:54:14 PM|rosetta@home|Deferring communication with project for 1 days, 18 hours, 59 minutes, and 56 seconds
1/8/2005 11:02:00 PM||Insufficient work; requesting more
1/8/2005 11:02:00 PM|LHC@home|Deferring communication with project for 71 weeks, 5 days, 0 hours, 29 minutes, and 24 seconds
1/8/2005 11:54:18 PM|rosetta@home|Deferring communication with project for 1 days, 11 hours, 59 minutes, and 53 seconds
1/9/2005 12:02:00 AM||Insufficient work; requesting more
1/9/2005 12:02:00 AM|LHC@home|Deferring communication with project for 71 weeks, 4 days, 23 hours, 29 minutes, and 24 seconds
1/9/2005 12:30:39 AM||request_reschedule_cpus: project op
1/9/2005 12:30:40 AM|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
1/9/2005 12:30:40 AM|rosetta@home|Requesting 0 seconds of work, returning 1 results
1/9/2005 12:30:42 AM|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi succeeded
1/9/2005 12:31:17 AM||request_reschedule_cpus: project op
1/9/2005 12:31:19 AM|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
1/9/2005 12:31:19 AM|rosetta@home|Requesting 8640 seconds of work, returning 0 results
1/9/2005 12:31:20 AM|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi succeeded
1/9/2005 12:31:20 AM|rosetta@home|Message from server: Not sending work - last RPC too recent: 38 sec
1/9/2005 12:31:20 AM|rosetta@home|No work from project
1/9/2005 12:31:21 AM|rosetta@home|Deferring communication with project for 4 minutes and 1 seconds

The homepage says there are 19,000 workunits in the queue, yet I cannot get any. 6 hours of compute time wasted... argh. Is anyone else having this problem?

****Edit****

Hmm, I just got work now? This is interesting. I noticed that for some reason workunits get stuck in the "ready to report" status under the Work tab but never actually get uploaded, even though BOINC has contacted the Rosetta servers. Only after I manually press the Update button does the workunit go through. I am running version 4.45. Any ideas?
Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 17266 - Posted: 28 May 2006, 6:16:36 UTC - in response to Message 17265.  

....Hmm, I just got work now? This is interesting. I noticed that for some reason workunits get stuck in the "ready to report" status under the Work tab but never actually get uploaded, even though BOINC has contacted the Rosetta servers. Only after I manually press the Update button does the workunit go through. I am running version 4.45. Any ideas?

You might want to upgrade to version 5.4.9, which is the current version of BOINC.
Moderator9
ROSETTA@home FAQ
Moderator Contact
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 17270 - Posted: 28 May 2006, 10:04:18 UTC - in response to Message 17265.  

This is interesting. I noticed that for some reason workunits get stuck in the "ready to report" status under the Work tab but never actually get uploaded, even though BOINC has contacted the Rosetta servers. Only after I manually press the Update button does the workunit go through. I am running version 4.45. Any ideas?

Reporting is done separately from uploading to reduce network traffic on the server side. JM7 (the man who wrote the scheduler) says this:

Results are reported any time the project is contacted for an update. Updates occur at the first of:

1) A result report is due within 24 hours.
2) It has been at least the connect interval since the result completed.
3) (5.4) It is less than the connect interval till the report deadline.
4) Work is needed.
5) A manual update.
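
The five conditions above can be sketched as a single check. This is only an illustration in Python, not the BOINC scheduler's actual code: the function and parameter names are hypothetical, and condition 3 is modeled as applying only to 5.4 clients because of the "(5.4)" marker in the list.

```python
from datetime import datetime, timedelta


def should_contact_project(now, completed_at, report_deadline, connect_interval,
                           work_needed=False, manual_update=False,
                           client_is_5_4=True):
    """Return True if any of the five listed conditions holds (hypothetical names)."""
    time_to_deadline = report_deadline - now
    conditions = [
        # 1) A result report is due within 24 hours.
        time_to_deadline <= timedelta(hours=24),
        # 2) At least the connect interval has passed since the result completed.
        now - completed_at >= connect_interval,
        # 3) (5.4) Less than the connect interval remains until the report deadline.
        client_is_5_4 and time_to_deadline < connect_interval,
        # 4) Work is needed.
        work_needed,
        # 5) A manual update.
        manual_update,
    ]
    return any(conditions)


# Example: result finished an hour ago, deadline six days away, connect interval of one day.
now = datetime(2006, 5, 28, 10, 0, 0)
print(should_contact_project(now, now - timedelta(hours=1),
                             now + timedelta(days=6), timedelta(days=1)))
```

Under these assumptions the example prints False, which matches the "ready to report" state described earlier in the thread: the result is uploaded but nothing yet triggers a contact. Passing manual_update=True makes the check succeed.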
Robinski

Joined: 7 Mar 06
Posts: 51
Credit: 85,383
RAC: 0
Message 17280 - Posted: 28 May 2006, 20:52:35 UTC

I just saw I had an error today with

r287__CONTACTEIGHT_SHORTRELAX_SAVE_ALL_OUT_hom001__563_711
see: Result

Possibly this is because I manually stopped the BOINC service, but I am not sure whether that was around the same time.

Otherwise it is just an error that occurred.


It was an "Incorrect function" error (reported in Dutch as "Onjuiste functie"):
<core_client_version>5.5.0</core_client_version>
<message>
Onjuiste functie. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# random seed: 2482790
# cpu_run_time_pref: 3600
ERROR:: Exit at: .dock_structure.cc line:401

</stderr_txt>
Member of the Dutch Power Cows

Trying to get the world on IPv6, do you have it? check here: IPv6.RHarmsen.nl
anders n

Joined: 19 Sep 05
Posts: 403
Credit: 537,991
RAC: 0
Message 17301 - Posted: 29 May 2006, 14:04:44 UTC

Hi

I'm crunching this WU now: https://boinc.bakerlab.org/rosetta/result.php?resultid=21976659

When I select print-to-screen in my accounting program, the WU halts.

After that there is no movement whatsoever in the graphics.

Anders n
Rollo

Joined: 2 Jan 06
Posts: 21
Credit: 106,369
RAC: 0
Message 17317 - Posted: 29 May 2006, 18:39:25 UTC
Last modified: 29 May 2006, 18:41:56 UTC

I am crunching on 21970571 right now. It stops after reaching 1.210% at time step 2833. If I stop BOINC and let it restart from the last checkpoint (here: from the beginning), it stops at the same step 2833 in model 1. To me this seems reproducible.
Any suggestions on what I can do to produce a useful error report, other than aborting the workunit or waiting for the watchdog?
tralala

Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 17321 - Posted: 29 May 2006, 20:29:19 UTC - in response to Message 17317.  

I am crunching on 21970571 right now. It stops after reaching 1.210% at time step 2833. If I stop BOINC and let it restart from the last checkpoint (here: from the beginning), it stops at the same step 2833 in model 1. To me this seems reproducible.
Any suggestions on what I can do to produce a useful error report, other than aborting the workunit or waiting for the watchdog?


I'd wait at least an hour. In theory the watchdog should terminate it after an hour. This is a good opportunity to see if it really works as it should. In any case, let it run a few hours, and if it really stays stuck at step 2833, abort it; you will get all the credit for the time crunched.
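
The behaviour described here, ending a task whose step counter has not moved for an hour, can be pictured with a small sketch. It is an illustration in Python under stated assumptions, not Rosetta's actual watchdog: the class name, the one-hour default, and the idea of polling a step counter are all hypothetical.

```python
class StallWatchdog:
    """Flags a task as stuck when its step counter stops advancing (illustrative only)."""

    def __init__(self, timeout_seconds=3600.0):
        self.timeout_seconds = timeout_seconds
        self.last_step = None
        self.last_change = None

    def is_stuck(self, step, now):
        """Record the current step at time `now` (in seconds); return True once stalled too long."""
        if step != self.last_step:
            # Progress was made, so restart the stall timer.
            self.last_step = step
            self.last_change = now
            return False
        return now - self.last_change >= self.timeout_seconds


# Example: a task that sits at step 2833 and is polled every half hour.
watchdog = StallWatchdog()
for seconds in (0, 1800, 3600):
    print(seconds, watchdog.is_stuck(2833, now=seconds))
```

In this sketch the first two polls report that the task is not stuck, and the third, a full hour after the step last changed, reports that it is.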
Aglarond

Joined: 29 Jan 06
Posts: 26
Credit: 446,212
RAC: 0
Message 17325 - Posted: 29 May 2006, 23:29:58 UTC

I had another one of those nasty R@H screensaver crashes. It was result T0283_CONTACTS_CONSERVATIVE_HALFHB_MAP_FROM_hom006_575_8907_0. This time, however, I zipped the memory dump that Windows was going to send to Microsoft. If you think it will help, you can download WERa78d.dir00.zip (16.1 MB). (I will leave it available for download for at least a month.)

I was also wondering why this happens only on this particular computer. It is the only one of my computers that has a localized version of Windows (Slovak language version). Do you think that could be the reason for the screensaver crash?
Winkle

Joined: 22 May 06
Posts: 88
Credit: 1,354,930
RAC: 0
Message 17337 - Posted: 30 May 2006, 7:11:39 UTC
Last modified: 30 May 2006, 7:23:30 UTC

I am currently crunching JUMP_RELAX_LONGRANGEPAIR_PARALLEL_t285__SAVE_ALL_OUT_548_11268_0 using Rosetta version 5.16.
It is at 1% after 2.5 hrs. BoincView tells me that it has 5.25 hrs to complete.
Normally it takes around 2.8 hrs per WU.
It is running on a Dell P3 1G #225837

Wait....

That was weird...
It just went straight to 100% at 2:43.
Any ideas?

Edit...
I think this is it...
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=18401795
https://boinc.bakerlab.org/rosetta/result.php?resultid=21942605
End Edit...
Thanks
Ian
tralala

Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 17340 - Posted: 30 May 2006, 7:38:35 UTC - in response to Message 17337.  

I am currently crunching JUMP_RELAX_LONGRANGEPAIR_PARALLEL_t285__SAVE_ALL_OUT_548_11268_0 using Rosetta version 5.16.
It is at 1% after 2.5 hrs. BoincView tells me that it has 5.25 hrs to complete.
Normally it takes around 2.8 hrs per WU.
It is running on a Dell P3 1G #225837

Wait....

That was weird...
It just went straight to 100% at 2:43.
Any ideas?

Edit...
I think this is it...
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=18401795
https://boinc.bakerlab.org/rosetta/result.php?resultid=21942605
End Edit...
Thanks
Ian


Quoting from the FAQ:


Depending on how the Wu is configured, some may have over 1,500,000 steps in the first model and still not reach 1%. This can take over 5 hours of CPU time. There are a few even larger ones.

Winkle

Joined: 22 May 06
Posts: 88
Credit: 1,354,930
RAC: 0
Message 17346 - Posted: 30 May 2006, 9:10:00 UTC - in response to Message 17340.  

Thanks
I will have a read
Rollo

Joined: 2 Jan 06
Posts: 21
Credit: 106,369
RAC: 0
Message 17350 - Posted: 30 May 2006, 11:25:14 UTC - in response to Message 17321.  

I am crunching on 21970571 right now. It stops after reaching 1.210% at time step 2833. If I stop BOINC and let it restart from the last checkpoint (here: from the beginning), it stops at the same step 2833 in model 1. To me this seems reproducible.
Any suggestions on what I can do to produce a useful error report, other than aborting the workunit or waiting for the watchdog?


I'd wait at least an hour. In theory the watchdog should terminate it after an hour. This is a good opportunity to see if it really works as it should. In any case, let it run a few hours, and if it really stays stuck at step 2833, abort it; you will get all the credit for the time crunched.


The watchdog killed the workunit. I have made a backup, so I can rerun it, if that is of any interest.
ebahapo
Joined: 17 Sep 05
Posts: 29
Credit: 413,302
RAC: 0
Message 17367 - Posted: 30 May 2006, 14:55:07 UTC
Last modified: 30 May 2006, 14:55:34 UTC

I have a runaway WU (here). It reports 100% done, but even after over 11h it keeps on running, even though I limited WU time to 1h.

HTH
