Suspend to disk

Message boards : Number crunching : Suspend to disk


dcdc
Joined: 3 Nov 05
Posts: 1831
Credit: 119,448,284
RAC: 11,030
Message 21837 - Posted: 4 Aug 2006, 14:33:30 UTC

Hi

Does anyone know if it would be possible/practical to suspend the Rosetta thread to disk on shutdown? I'm thinking something like Windows hibernating, but just for the Rosetta thread.

It seems we're losing a massive amount of processing time to reboots, and this will increase as the models get bigger (unless more frequent checkpointing can be implemented).

cheers
Danny
Christoph

Joined: 10 Dec 05
Posts: 57
Credit: 1,512,386
RAC: 0
Message 21841 - Posted: 4 Aug 2006, 15:28:42 UTC

More frequent checkpoints would be very nice! Some proteins take more than 2 hours to reach a checkpoint.
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 21849 - Posted: 4 Aug 2006, 16:48:46 UTC

I agree. As we've been studying larger proteins, the time between checkpoints has grown too large (over an hour even on fast machines).

Before they added checkpointing, they said it was difficult. There are points in the analysis where they'd have 300+ MB of data they'd like to stash away, and it would be difficult to write code to push all of that to a file and retrieve it again.

Later they found points in the code where they reset a portion of the search, areas where less working storage was required, and that is where they added the checkpoints.
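
A minimal sketch of that general idea (checkpoint only at boundaries where the working state is small) might look like this in Python. It's an illustration under assumptions, not Rosetta's actual code: `run_model`, `crunch`, the JSON checkpoint file, and the one-checkpoint-per-model granularity are all hypothetical.

```python
import json
import os
import random
import tempfile


def run_model(seed):
    """Hypothetical stand-in for one expensive search. The large working
    state exists only inside this call and is discarded when it returns."""
    rng = random.Random(seed)
    working_state = [rng.random() for _ in range(10_000)]  # large, transient
    return min(working_state)  # small summary worth keeping


def save_checkpoint(path, state):
    """Write atomically: dump to a temp file, then rename it over the old
    checkpoint, so a crash mid-write never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path):
    """Resume from the last checkpoint if one exists, else start fresh."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_model": 0, "results": []}


def crunch(path, n_models, stop_after=None):
    """Run n_models searches, checkpointing only between models.
    stop_after simulates a shutdown after that many models this session."""
    state = load_checkpoint(path)
    done_this_session = 0
    while state["next_model"] < n_models:
        if stop_after is not None and done_this_session >= stop_after:
            # Simulated shutdown between models; a real one mid-model
            # would lose only that model's progress.
            return state
        i = state["next_model"]
        state["results"].append(run_model(seed=i))
        state["next_model"] = i + 1
        save_checkpoint(path, state)  # tiny: an index plus finished results
        done_this_session += 1
    return state
```

Because the saved state is only an index plus the finished results, each checkpoint is tiny. The price is that a shutdown in the middle of a model throws away that model's progress, which is exactly the lost time being discussed here.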

We'll have to see how the AIDS vaccine studies crunch and the checkpoint frequency of those.

The idea of pushing everything to disk upon suspend is great... but I don't believe that's how BOINC works. It just tells the thread to stop crunching and whether or not it should remain in memory. It doesn't really give the application a chance to react and stash its state.

As for losing massive amounts of time due to reboots... I had the same thought, and expected that when checkpointing was originally added we would see maybe a 10% increase in TFLOPS for the project... but it didn't work out that way. So I conclude that many people are crunching 24/7 and/or have changed their general preference to leave applications in memory while preempted, and don't reboot very often.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/




©2024 University of Washington
https://www.bakerlab.org