CASP 10

TJ Volunteer moderator Project developer Project scientist Joined: 22 Oct 10 Posts: 9 Credit: 216,670 RAC: 0	Message 72941 - Posted: 30 Apr 2012, 19:37:35 UTC
Hello everyone!
CASP 10, a community-wide experiment in structure prediction, starts tomorrow, May 1st, and runs to August 1st. During this time we will be using BOINC heavily for structure prediction. If your work unit starts with the label rb, you're running a CASP 10 target! rb is short for Robetta, which is our publicly available server for structure prediction.
CASP
CASP is an international experiment to assess the state of the art of the protein structure prediction field. Sequences whose structures have been solved but not yet published are sent out to participating teams, and we have 3 days to send back predictions. The whole thing is conducted in a double-blind fashion, ensuring fair assessment and truly blind prediction.
Robetta
Structure prediction for the community, by the community. Robetta is a server for protein structure prediction that offers Rosetta's structure prediction capabilities to the scientific community (and to the public). The computation for this will be conducted on BOINC, meaning that you guys will be crunching protein structure prediction jobs for real scientific studies conducted by researchers all over the world.
Improvements since CASP 9
Over the last two years we have extensively modified our structure prediction methodology. Preliminary results indicate that we've made more improvement in the last two years than in the previous 6 years combined. For the first time there is significant doubt whether humans can improve upon the results from computers. So this could be a very exciting CASP.
Thanks again everyone for crunching, we wouldn't be able to do this stuff without you!
Excitedly yours,
Chris, Ray, Frank, Yifan, David Baker, David Kim, Hetu and TJ

Sid Celery Joined: 11 Feb 08 Posts: 2599 Credit: 47,220,881 RAC: 0	Message 72949 - Posted: 1 May 2012, 4:52:43 UTC - in response to Message 72941.
You state you have 3 days to send back predictions. Can I ask a very specific question that I've raised before:
The default work buffer is set to 0.25 days with a 3-hour runtime, but some of us maintain a larger work buffer in order to avoid task outages. I personally use 2.0 days, but others may use a larger amount. The default settings allow tasks to be returned to you in good time, but is it true to say that if the work buffer + runtime totals more than 3 days, then the work we grab will not be returned to you in sufficient time for the results to count?
I will assume this is the case, so I'm reducing my work buffer to 1.5 days - plus my 8-hour runtime - to allow a certain leeway for you to receive work back in time. Please confirm so that others can make similar adjustments.
Obviously, with reduced work buffers, there's an equivalent requirement for tasks to be reliably available at your end, so an extra degree of monitoring would be wise. On the assumption that my guesses are correct, you may see a reduced rate of task downloads while our buffers are run down, though tasks will be returned a certain amount sooner after release. As long as tasks are readily available there should be no reduction in the results you see back.
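The question above comes down to simple arithmetic, sketched here in Python. This is only a rough model, assuming the worst case is "a task waits for one full work buffer, then runs for the full runtime preference". It ignores upload delays, machines that are switched off, and time shared with other projects. The function names and the `CASP_WINDOW_DAYS` constant are made up for illustration; the 3-day figure comes from the announcement.

```python
# Rough worst-case turnaround estimate for a BOINC task (illustrative model only).

CASP_WINDOW_DAYS = 3.0  # from the announcement: 3 days to send back predictions


def worst_case_turnaround_days(buffer_days: float, runtime_hours: float) -> float:
    """Assume a task sits in the queue for a full work buffer, then runs to completion."""
    return buffer_days + runtime_hours / 24.0


def fits_window(buffer_days: float, runtime_hours: float,
                window_days: float = CASP_WINDOW_DAYS) -> bool:
    """True if the estimated turnaround is within the given window."""
    return worst_case_turnaround_days(buffer_days, runtime_hours) <= window_days


if __name__ == "__main__":
    settings = [
        ("defaults", 0.25, 3),          # 0.25-day buffer, 3-hour runtime
        ("2.0-day buffer", 2.0, 8),     # buffer and runtime mentioned in the post
        ("1.5-day buffer", 1.5, 8),     # the reduced buffer proposed in the post
    ]
    for label, buffer_days, runtime_hours in settings:
        total = worst_case_turnaround_days(buffer_days, runtime_hours)
        print(f"{label}: about {total:.2f} days, "
              f"within window: {fits_window(buffer_days, runtime_hours)}")
```

Passing a smaller `window_days` shows how quickly the larger buffers stop fitting if the project needs results sooner than the full 3 days.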

P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0	Message 72954 - Posted: 1 May 2012, 5:31:17 UTC
Hi TJ.
"If your work unit starts with the label rb you're running a CASP 10 target! rb is short for Robetta which is our publicly available server for structure prediction."
The only problem is I've been seeing these task names for weeks now. Is there going to be some other way to tell which are really CASP tasks? Something added to the task naming, maybe.

David Baker Volunteer moderator Project administrator Project developer Project scientist Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0	Message 72956 - Posted: 1 May 2012, 5:52:06 UTC - in response to Message 72949.
Yes, you are absolutely right. For CASP we need results back within a day or two, as our approach is iterative: we analyze the results after one day and send out another set of WUs based on these results for two days of computing, then collect the results and submit to CASP. So please do set your buffer to a shorter time, and let us know if you are running out of WUs. Thanks!

Rocco Moretti Joined: 18 May 10 Posts: 66 Credit: 585,745 RAC: 0	Message 72962 - Posted: 1 May 2012, 16:23:03 UTC - in response to Message 72954.
"The only problem is I've been seeing these task names for weeks now, is there going to be some other way to tell which are really CASP tasks."
A large number of those workunits have been pre-CASP testing - that is, running the entries from previous CASPs through the CASP10 structure prediction machinery and checking that everything is working properly. Now that CASP has started, that testing is pretty much over (although there might be occasional tests to double-check something, or to try a last-minute fix).
A small portion of those workunits were for structure prediction jobs which were submitted to Robetta by other research groups. But to conserve resources, that public submission is going to be disabled for the duration of CASP.
So if you see an rb task in the next few months, in all likelihood it should be for CASP.
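For anyone who wants to sort their own task list by this rule of thumb, here is a minimal Python sketch. It assumes only what the thread states: that a name starting with "rb" is, for the duration of CASP, probably a CASP 10 target. The exact naming scheme (for example, what follows the prefix) is not given here, so `rb_hypothetical_target_0001` below is an invented name. `ab_centroidAbrelax_cst_3qc7A` is a non-rb task name reported elsewhere in this thread.

```python
# Prefix-based guess at whether a task is a CASP 10 target (rule of thumb only).
from collections import Counter


def looks_like_casp10(task_name: str) -> bool:
    """Per the announcement: work units whose names start with 'rb' are Robetta jobs."""
    return task_name.startswith("rb")


def summarize(task_names):
    """Count how many tasks look like CASP 10 targets versus everything else."""
    return Counter(
        "likely CASP 10" if looks_like_casp10(name) else "other"
        for name in task_names
    )


if __name__ == "__main__":
    names = [
        "rb_hypothetical_target_0001",   # invented name, for illustration only
        "ab_centroidAbrelax_cst_3qc7A",  # non-rb name reported in this thread
    ]
    print(summarize(names))
```

As the post above notes, the prefix alone could not separate CASP targets from pre-CASP tests before May 1st, so this is a heuristic for the CASP period, not a guarantee.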

Sean Kiely Joined: 31 Jan 06 Posts: 65 Credit: 43,992 RAC: 0	Message 72964 - Posted: 1 May 2012, 17:19:26 UTC - in response to Message 72956.
I would recommend that you post an item under "News" on the homepage (and also a new thread in the number-crunching forum) asking participants to check their work buffer settings and reduce them to no higher than 1.5 days. This might reduce the number of CASP units that are processed but not returned quickly enough to be useful.

Sid Celery Joined: 11 Feb 08 Posts: 2599 Credit: 47,220,881 RAC: 0	Message 72966 - Posted: 1 May 2012, 18:11:49 UTC - in response to Message 72956.
Thanks for the quick reply. I didn't anticipate you did post-processing of results - is ~1.83 days (1.5 days + 8-hour runtime) sufficient for you? What would your ideal maximum turnaround time be?
Another issue that arose last year was the fact that the BOINC manager doesn't help us adhere to a quicker-than-usual turnaround time because of issues like "debt" between projects (I'm not qualified to talk about this tbh but I know there's a factor involved). Personally I'll be setting WCG to "No New Tasks" for the duration as Rosetta is my primary project.
The biggest issue last year, though, was the "Deadline" we see in the BOINC Manager being set at 10 days from download - especially when a contributor runs more than one project (due to Para 2). Is there any way you can set the deadline for specific CASP10 tasks to your preference - your ideal maximum turnaround time? That way, the BOINC manager will ensure that targets are met rather than (effectively) working toward missing them.
I notice this afternoon I've received a non-"rb" task from Rosetta (ab_centroidAbrelax_cst_3qc7A) after the first rb tasks have come down. In order to distinguish between urgent and non-urgent tasks, CASP10 tasks should have (say) 2-day deadlines & all others the usual 10-day deadline. Can this be done from your end?
I can't think of any other issues that might prevent the CASP exercise from operating successfully.
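The two-tier deadline suggested above can be sketched in a few lines of Python. This is not BOINC server code and does not reflect how the project actually assigns deadlines. It only illustrates the proposal, assuming urgency is decided by the "rb" name prefix and using the 2-day and 10-day figures from the post. The function name, constants, and the rb task name are invented for the example.

```python
# Illustration of the proposed two-tier deadlines (not actual BOINC server logic).
from datetime import datetime, timedelta, timezone

URGENT_DEADLINE = timedelta(days=2)    # proposed for CASP10 (rb) tasks
DEFAULT_DEADLINE = timedelta(days=10)  # the usual deadline mentioned in the post


def report_deadline(task_name: str, sent_at: datetime) -> datetime:
    """Return the time by which a task sent at `sent_at` would be due."""
    window = URGENT_DEADLINE if task_name.startswith("rb") else DEFAULT_DEADLINE
    return sent_at + window


if __name__ == "__main__":
    sent = datetime(2012, 5, 1, 18, 0, tzinfo=timezone.utc)
    for name in ("rb_hypothetical_target_0001",     # invented name
                 "ab_centroidAbrelax_cst_3qc7A"):   # name quoted in the post
        print(name, "->", report_deadline(name, sent).isoformat())
```

With a shorter deadline attached to the task itself, the client's own scheduler would have the information it needs to prioritize urgent work, which is the point of the suggestion.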

[VENETO] boboviz Joined: 1 Dec 05 Posts: 2207 Credit: 13,720,774 RAC: 2	Message 72968 - Posted: 2 May 2012, 15:56:00 UTC
I'm downloading CASP9_benchmark again...

TJ Volunteer moderator Project developer Project scientist Joined: 22 Oct 10 Posts: 9 Credit: 216,670 RAC: 0	Message 72969 - Posted: 2 May 2012, 16:28:19 UTC - in response to Message 72968.
"I'm downloading CASP9_benchmark again..."
We'll be using CASP 9 to test our system while CASP 10 runs. We know what the correct solution for CASP 9 is but won't know the solutions for CASP 10 for a couple of weeks.

Aegis Maelstrom Joined: 29 Oct 08 Posts: 61 Credit: 2,137,555 RAC: 0	Message 72970 - Posted: 2 May 2012, 16:35:15 UTC - in response to Message 72966. Last modified: 2 May 2012, 16:36:12 UTC
Hi, I second Sid Celery in his proposals.
My guess is that a majority of R@h crunchers are not meeting your 1-2 day turnaround requirement for a task. To be effective they need to be forced (or at least informed) to change their behaviour.
As CASP10 has already started, we would need to inform them in a blink. Furthermore, let's be honest - a vast majority of crunchers do not read information from the projects or their teams on a daily basis. They even lag severely in e-mail communication. Moreover, even if they learn about new requirements, a fair number of participants will forget, be unable etc. to adjust their crunching pattern.
In this situation, changing the deadlines for WUs in addition to the information about the issue (very important for computers without permanent access to the Internet, used for a small amount of time per day, set on longer run times etc.) seems to be the best option. That, or sending CASP10 WUs strictly based on behavioural patterns (only to "fast" crunchers, if their computing power is big enough).
Best Regards and Happy Crunching.

EmSti [BlackOps] Joined: 28 Apr 12 Posts: 1 Credit: 536,791 RAC: 0	Message 72972 - Posted: 2 May 2012, 17:14:16 UTC
Will crunchers still get full credit for tasks successfully run, but not within 3 days?

Plomos Joined: 4 Mar 11 Posts: 11 Credit: 439,043 RAC: 0	Message 72973 - Posted: 2 May 2012, 18:12:24 UTC
Can I ask why WUs that seem to be for something other than CASP are being run? I am seeing names like heterodimer_design_21_pose_B_abinitio_SAVE_ALL_OUT_46202_4097_0 and ab_11_29__optpps_T5611_optpps_03_09_35686_255345_0 in addition to the CASP9 benchmark and rb WUs. Are all of these being used for CASP? If not, then which are, and why are other things being sent out?

.clair. Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0	Message 72974 - Posted: 2 May 2012, 19:49:29 UTC
The easiest way for Rosetta to get the results back in two days max is to simply give the tasks a 48-hour deadline from the moment the scheduler sends them to us, and BOINC manager will happily go into panic mode and crunch them in High Priority mode as soon as it downloads them. No fancy coding needed.

Sid Celery Joined: 11 Feb 08 Posts: 2599 Credit: 47,220,881 RAC: 0	Message 72978 - Posted: 3 May 2012, 2:32:58 UTC - in response to Message 72970. Last modified: 3 May 2012, 2:33:50 UTC
"Hi, I second Sid Celery in his proposals."
Thanks
"My guess is that a majority of R@h crunchers is not meeting your 1-2 days deadline for a task requirement. To be effective they need to be forced (or at least informed) to change their behaviour."
I would guess this isn't true actually. Most people will work with the defaults of 0.25 days buffer & 3-hour run-times, so in the main everything ought to be fine. The problem will be that inveterate fiddlers (presumably like you & I) will have tweaked our settings. Hopefully they cast their eye over the forums too & will catch this wrinkle before long. At the same time these same people will possibly be those who dedicate more resources to Rosetta, so it may make a disproportionate amount of difference.
Just speculating, obviously. As long as we tweak things appropriately, everyone should get what they want. It's also worth a shout to say when CASP10 is over so we can revert to our individual preferences afterwards.

[VENETO] boboviz Joined: 1 Dec 05 Posts: 2207 Credit: 13,720,774 RAC: 2	Message 72983 - Posted: 3 May 2012, 12:07:24 UTC - in response to Message 72966.
"In order to distinguish between urgent and non-urgent tasks, CASP10 tasks should have (say) 2-day deadlines & all others the usual 10-day deadline. Can this be done from your end?"
This is a GOOD idea!

David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1480 Credit: 4,334,829 RAC: 0	Message 72990 - Posted: 4 May 2012, 20:13:10 UTC - in response to Message 72983. Last modified: 4 May 2012, 20:14:26 UTC
"In order to distinguish between urgent and non-urgent tasks, CASP10 tasks should have (say) 2-day deadlines & all others the usual 10-day deadline. Can this be done from your end?"
This can be done on our end but we do not want to change things at this point. It's a great suggestion, but it may cause errors initially due to past deadlines until the client can adjust to appropriately estimate run times. The majority of users use the default run time setting of 3 hours. Sid Celery is correct. Things are working well on our end with the current settings, so we do not want to modify things at this point.
Please feel free to update your CPU run time preference and buffer time (the defaults are ideal); it will definitely help to get results quickly.

David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1480 Credit: 4,334,829 RAC: 0	Message 72991 - Posted: 4 May 2012, 20:15:43 UTC - in response to Message 72972.
"Will crunchers still get full credit for tasks sucessfully run, but not within 3 days?"
Yes, of course! Also, the results may be useful for our human-based predictions.

Sid Celery Joined: 11 Feb 08 Posts: 2599 Credit: 47,220,881 RAC: 0	Message 72994 - Posted: 5 May 2012, 4:15:04 UTC - in response to Message 72990.
I guess if things are running well then it's ok - BOINC manager's scheduling is flaky at the best of times - but if it could be relied upon it would cater for all the weird & wonderful settings people use without intervention or having to read the right forum thread at the right time. You'd have the best overview, I'm sure.
Worth thinking about for the future though. Maybe a test during a non-critical period is something that can be tried (WCG uses short deadlines when a task has to be reissued and it seems to go through without a hitch).

LanDroid Joined: 28 Sep 05 Posts: 3 Credit: 1,814,244 RAC: 0	Message 73003 - Posted: 6 May 2012, 3:28:34 UTC
"Structure prediction for the community, by the community. ... For the first time there is significant doubt whether humans can improve upon the results from computers. So this could be a very exciting CASP."
So this is not just a competition between Boinc/Rosetta Vs. Independent Labs, it is also Boinc/Rosetta Vs. Humans!

Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 73005 - Posted: 6 May 2012, 16:18:25 UTC Last modified: 6 May 2012, 16:21:02 UTC
CASP has one side of things for automated structure prediction, but there is another side for human predictions (many of which are done with various degrees of automated tooling behind them).
CASP is for the world's scientific community. All researchers studying the subject are potential participants. It is not truly a competition. It is done more as a measure of the current state of the art. At the end, the "winners" present to the others the methods and ideas to which they attribute their superior predictions. So as the ideas are assimilated into the various other approaches and combined, the next round is always a greater challenge than the last.
When reference is made to Rosetta vs. the rest, it is just people rooting for their "home team" to do well and continue to demonstrate that science is progressing and that we're truly learning how proteins work. This leads to vaccines and treatments for many of life's diseases.
When I started with Rosetta@home, the project was graciously making good use of 15 TeraFlops of computing power. Dr. Baker said it had taken off beyond all expectation (of a campus-wide distributed system). He also said that ten times more computing power would really open new frontiers to the research he and his lab have dedicated their lives to. It is truly rewarding to now see the project churning at over 150 TFlops with people from all over the planet contributing to a common cause for the common good.
Keep crunching!
Rosetta Moderator: Mod.Sense