Newbie ?: Was 24 hours of cpu time "wasted" on "Unrecoverable error for result"

The_Bad_Penguin
Joined: 5 Jun 06
Posts: 2751
Credit: 4,271,025
RAC: 0
Message 19699 - Posted: 2 Jul 2006, 16:45:38 UTC

Hi!

Newbie here, crunching 24 hours at a time.

What is the implication of the following messages I received?

Is there any way to "recover" the 24 hours of work?

Sorry, just new to grid computing.

Thanx in advance.

7/2/2006 8:03:08 AM|rosetta@home|Computation for task FRA_t329_CASP7_hom001_6_t329_6_2ah5A_IGNORE_THE_REST_543_852_30_1 finished

7/2/2006 8:03:08 AM|rosetta@home|Starting task v350__CASP7_ABRELAX_SAVE_ALL_OUT_hom008__857_194_0 using rosetta version 525

7/2/2006 8:03:09 AM|rosetta@home|Unrecoverable error for result FRA_t329_CASP7_hom001_6_t329_6_2ah5A_IGNORE_THE_REST_543_852_30_1 (<file_xfer_error> <file_name>FRA_t329_CASP7_hom001_6_t329_6_2ah5A_IGNORE_THE_REST_543_852_30_1_0</file_name> <error_code>-161</error_code></file_xfer_error>)
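
For anyone who wants to scan a message log for lines like the ones above, here is a minimal Python sketch. It assumes every line has the shape `timestamp|project|message`, that the timestamp is month/day/year with a 12-hour clock, and that a transfer error always appears as the `<file_xfer_error>` fragment shown above. The function name `parse_log_line` is made up for this illustration and is not part of BOINC.

```python
import re
from datetime import datetime

# Matches the <file_xfer_error> fragment as it appears in the log above.
XFER_ERROR = re.compile(
    r"<file_xfer_error>\s*<file_name>(?P<file_name>[^<]+)</file_name>\s*"
    r"<error_code>(?P<error_code>-?\d+)</error_code>\s*</file_xfer_error>"
)


def parse_log_line(line):
    """Split one 'timestamp|project|message' line into a dict (hypothetical helper)."""
    # Split on the first two pipes only, so pipes inside the message survive.
    timestamp, project, message = line.strip().split("|", 2)
    entry = {
        # Assumption: month/day/year, 12-hour clock with AM/PM.
        "time": datetime.strptime(timestamp, "%m/%d/%Y %I:%M:%S %p"),
        "project": project,
        "message": message,
        "xfer_error": None,
    }
    match = XFER_ERROR.search(message)
    if match:
        entry["xfer_error"] = {
            "file_name": match.group("file_name"),
            "error_code": int(match.group("error_code")),
        }
    return entry


task = "FRA_t329_CASP7_hom001_6_t329_6_2ah5A_IGNORE_THE_REST_543_852_30_1"
sample = (
    "7/2/2006 8:03:09 AM|rosetta@home|Unrecoverable error for result " + task
    + " (<file_xfer_error> <file_name>" + task + "_0</file_name>"
    + " <error_code>-161</error_code></file_xfer_error>)"
)
entry = parse_log_line(sample)
print(entry["time"], entry["project"], entry["xfer_error"])
```

The sketch only pulls the file name and error code out of the line; it does not say what the code means or how long the task ran. For that you still need to look at the result on the project website.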

Christoph Jansen
Joined: 6 Jun 06
Posts: 248
Credit: 267,153
RAC: 0
Message 19705 - Posted: 2 Jul 2006, 17:59:46 UTC - in response to Message 19699.  
Last modified: 2 Jul 2006, 18:01:19 UTC

Hi!

What is the implication of the following messages I received?

Is there any way to "recover" the 24 hours of work?


Hi, Bad Penguin,

just a misunderstanding there. If you look at your results, you will notice that the aborted WU ran for just 10 seconds, so you lost virtually no computing time. If you click the WU's number on the left (26737584), you will see that it was created today, so there was no miscalculation by the program either. You can also see that WU number 3 in your list (26736814) finished directly before the one that ran for only a few seconds and was also reported today. That was the one you ran overnight.

That happens from time to time in any project, and often you are lucky in that the problem shows up right at the start of a calculation.


Regards,

Christoph
MikeMarsUK

Joined: 15 Jan 06
Posts: 121
Credit: 2,637,872
RAC: 0
Message 19706 - Posted: 2 Jul 2006, 18:27:33 UTC

If you get more 'errored' results, it may be a good idea to get smaller work units. I do however note that the other run of the same workunit also failed, so it might be a problem with the workunit rather than your PC.

Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 19763 - Posted: 4 Jul 2006, 15:17:12 UTC

Welcome to Rosetta! You have found the best DC project to participate in. The science is being discovered with help from all of us.

The idea behind the "smaller" WU is that if there is a failure, you lose only a shorter crunch time. But, in general, you are just as likely to have a failure. The project recognizes that failure is part of progress. There are cases where a specific WU fails for everyone, or where a specific random start fails (each of us begins from a random start). When the failure is reported back, the project team will study the cause and correct it.

You still get credit for the work, and you are still helping the project. Please try some more WUs and if problems continue, add a description to this thread.




©2024 University of Washington
https://www.bakerlab.org