Questions about the search space

Author	Message
PhotonSmasher Joined: 26 Aug 09 Posts: 1 Credit: 224,496 RAC: 0
Message 63116 - Posted: 2 Sep 2009, 0:13:28 UTC
First of all, congratulations on the great work. I am very interested to know what you have discovered about the search space of protein folding and about the details of your search algorithm. Are there any good sources of information in addition to the "Macromolecular Modeling with Rosetta" article?
In my results section I can see a 2D plot for each protein for which I contributed results. A few of those proteins have around 200,000 predictions from all users. From the plots it seems likely that the optimum has not yet been found. It would be very interesting to see plots for proteins with a very large number of predictions, as that would give at least some idea of the search space. It would be even more interesting to see a plot that also contains the best FoldIt result.
I would also like to hear your thoughts on the importance of improving the energy function vs. improving the search algorithm. Thanks a lot for your time.

Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0
Message 63122 - Posted: 2 Sep 2009, 13:55:44 UTC
I will defer to the Project Team for more details. But I believe it is safe to say they are constantly improving both the energy function and the search algorithm. Both are important. Here is some background...
The system uses a Monte Carlo approach to finding the lowest energy level. It is an estimation at best, and here is why...
The search space in its entirety is approximately 3 to the power of the number of amino acids that comprise the protein. So, if the protein is 100 amino acids long (relatively small), the total number of possible structures is estimated at roughly:
3^100 = 5.1537752073201133103646112976562e+47
so that is roughly a 5 followed by 47 more digits!
If you had a supercomputer that could examine one of the structures every microsecond (which would be very fast considering the complexity of the energy function in a 3D structure), it would take
5.1537752073201133103646112976562e+41 seconds, which is
1.4316042242555870306568364715712e+38 hours, which is
5.9650176010649459610701519648799e+36 days, which is
1.6342513975520399893342882095561e+34 years!
By comparison, the current estimated age of the universe is 14,000,000,000 years, which is 1.4e+10 years.
So it would take 1,167,322,426,822,885,706,667,348 supercomputers 14 billion years to search the entire space... and that is just ONE of the 100,000+ proteins that need to be studied.
This should shed some light on why the protein folding problem is so complex, and why having a faster computer is not all that is required to solve the problem. To have some confidence level that your model is close to the native structure with "only" 200,000 models run is really quite amazing. And how much your confidence level improves if you run 400,000 models is also an interesting question that I'm sure the researchers in BakerLab also consider on a routine basis.
Rosetta Moderator: Mod.Sense
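
For anyone who wants to check the arithmetic in the post above, here is a minimal Python sketch. It only reproduces the figures quoted there; the constants (3 states per amino acid, 100 amino acids, one structure per microsecond, 365-day years, a 14-billion-year-old universe) are the post's own assumptions, and the variable names are made up for illustration.

```python
# Back-of-the-envelope check of the search-space estimate.
# All constants are the assumptions stated in the post, not measured values.
STATES_PER_RESIDUE = 3
RESIDUES = 100
EVALS_PER_SECOND = 1_000_000          # one structure every microsecond
UNIVERSE_AGE_YEARS = 14_000_000_000

structures = STATES_PER_RESIDUE ** RESIDUES   # exact Python integer
seconds = structures / EVALS_PER_SECOND
hours = seconds / 3600
days = hours / 24
years = days / 365
universe_ages = years / UNIVERSE_AGE_YEARS

print(f"structures:    {structures:.4e}")
print(f"seconds:       {seconds:.4e}")
print(f"hours:         {hours:.4e}")
print(f"days:          {days:.4e}")
print(f"years:         {years:.4e}")
print(f"universe ages: {universe_ages:.4e}")
```

Changing RESIDUES shows how quickly the numbers grow: every extra amino acid multiplies the total by 3.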

Otto Joined: 6 Apr 07 Posts: 27 Credit: 3,567,665 RAC: 0
Message 63123 - Posted: 2 Sep 2009, 14:35:26 UTC
So to truly and effectively solve any and every imaginable problem, you'd need infinite computing power? Depressing.

mikey Joined: 5 Jan 06 Posts: 1900 Credit: 12,902,147 RAC: 84
Message 63134 - Posted: 3 Sep 2009, 9:25:34 UTC - in response to Message 63123.
Otto wrote: "So to truly and effectively solve any and every imaginable problem, you'd need infinite computing power? Depressing."
According to Moore's Law http://en.wikipedia.org/wiki/Moore%27s_law computing power will double every 2 years at the max. So far we have kept up with that, which kinda brings the problem down to the thinkable range, for me anyway. If we keep doubling our ability to compute and the problem doesn't get any bigger, then the end of the problem is in the range of possibility. How far out is it? Only a math wizard could figure that out! I am not one of those, so I contribute my computers to the problems.
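
The "how far out is it" question can be roughed out without a math wizard. The sketch below is only an illustration under stated assumptions: it takes the 3^100 structures and one-structure-per-microsecond machine from Mod.Sense's post, takes the 2-year doubling period from the post above, and picks an arbitrary target of enumerating the whole space in one year. None of these are project figures.

```python
import math

# Assumptions: 3^100 structures and 1e6 evaluations/second come from
# Mod.Sense's estimate; the 2-year doubling is mikey's reading of Moore's Law;
# the 1-year target is an arbitrary choice for illustration.
STRUCTURES = 3 ** 100
EVALS_PER_SECOND_TODAY = 1_000_000
DOUBLING_PERIOD_YEARS = 2
TARGET_YEARS = 1
SECONDS_PER_YEAR = 365 * 24 * 3600

# Time a single machine would need today for an exhaustive search.
years_today = STRUCTURES / EVALS_PER_SECOND_TODAY / SECONDS_PER_YEAR

# Number of speed doublings needed to shrink that to the target.
speedup_needed = years_today / TARGET_YEARS
doublings = math.ceil(math.log2(speedup_needed))
wait_years = doublings * DOUBLING_PERIOD_YEARS

# Each extra amino acid triples the space, i.e. costs log2(3) more doublings.
extra_doublings_per_residue = math.log2(3)

print(f"doublings needed: {doublings}")
print(f"years of doubling: {wait_years}")
print(f"extra doublings per added residue: {extra_doublings_per_residue:.3f}")
```

Under these assumptions the answer comes out at 114 doublings, or about 228 years, and each additional amino acid adds roughly 1.6 doublings (a bit over 3 years). That only covers brute-force enumeration of one 100-residue protein, so it says nothing about smarter search.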

Otto Joined: 6 Apr 07 Posts: 27 Credit: 3,567,665 RAC: 0
Message 63141 - Posted: 3 Sep 2009, 11:41:25 UTC
OK, let's hope Moore's Law at least holds up its 2yr/2x pace (an acceleration would be more than welcome, but would probably need some major scientific/technological breakthroughs, especially ones that cannot be predicted).

Michael G.R. Joined: 11 Nov 05 Posts: 264 Credit: 11,247,510 RAC: 0
Message 63142 - Posted: 3 Sep 2009, 14:15:07 UTC
It's not as bad as it looks. Nature repeats patterns and conserves useful mutations; only a very small subset of all those possibilities is usually found in proteins, so the search space can be narrowed down quite a lot. Not that it's easy to narrow it down to something very small, but as far as I know, progress is being made on that.

dcdc Joined: 3 Nov 05 Posts: 1836 Credit: 124,981,563 RAC: 178
Message 63145 - Posted: 3 Sep 2009, 16:59:24 UTC - in response to Message 63134.
mikey wrote: "According to Moore's Law http://en.wikipedia.org/wiki/Moore%27s_law computing power will double every 2 years at the max. So far we have kept up with that, that kinda brings the problem down to the thinkable range for me anyway. If we keep doubling our ability to compute and the problem doesn't get any bigger, then the problem end is in the range of possibility. How far out is it, only a math wizard could figure that out! I am not one of those so I contribute my computers to the problems."
I thought Moore's law was for transistor count, and that computer power has increased substantially faster than that? I had a 386DX 25 MHz back in 1993 (I think), and I'm sure my current quad is more than 32x as powerful as that! I think it's something like 2 MFLOPS vs 2 GFLOPS, so probably on the order of a 1000x speedup, but obviously difficult to measure because of all the various extensions giving much higher theoretical throughput etc., and of course storage and memory have increased too...

mikey Joined: 5 Jan 06 Posts: 1900 Credit: 12,902,147 RAC: 84
Message 63172 - Posted: 5 Sep 2009, 10:39:10 UTC - in response to Message 63145.
dcdc wrote: "I thought moore's law was for transistor count and that computer power has increased substantially faster than that? I had a 386DX 25mhz back in 1993 (i think) - i'm sure my current quad is more than 32x as powerful as that! I think it's something like 2 mflops vs 2 gflops so probably in the order of 1000x speedup but obviously difficult to measure because of all the various extensions giving much higher theoretical throughput etc and of course storage and memory have increased too..."
You are correct in your statements... Moore's Law is about transistor count, but that translates to speed, and we are currently progressing faster than his law said we would. There is some concern that we will slow down in the future, but of course some very smart people are working very hard to make sure that doesn't happen.

dumas777 Joined: 19 Nov 05 Posts: 39 Credit: 2,762,081 RAC: 0
Message 63449 - Posted: 25 Sep 2009, 6:04:00 UTC Last modified: 25 Sep 2009, 6:04:43 UTC
As sort of alluded to under future tech, the elephant in the room is quantum computers. We are probably still decades away from a useful working prototype with a large number of qubits, but the increase in computation power will be orders of magnitude greater than Moore's Law. I really do believe that well within the next century we will fully master our own biology at the nano level. Now colonizing other worlds might take a bit longer :).

David Baker Volunteer moderator Project administrator Project developer Project scientist Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0
Message 63491 - Posted: 28 Sep 2009, 5:51:58 UTC - in response to Message 63116.
PhotonSmasher wrote: "First of all, congratulations for the great work. I am very interested to know what you discovered about the search space of protein folding and about the details of your search algorithm. Are there any good sources of information in addition to the 'Macromolecular Modeling with Rosetta' article? In my results section I can see a 2d plot for proteins for which I contributed results. I can see there a few proteins with around 200,000 predictions from all users. From the plots it seems likely that the optimum was not yet found. It will be very interesting for me to see plots from proteins with a very large number of predictions, as it will give at least some idea about the search space. It will be even more interesting to see a plot which also contains the best FoldIt result. It will also be interesting to hear your thoughts about the importance of improving the energy function vs. improving the search algorithm. Thanks a lot for your time."
Good questions! Mike Tyka is just finalizing a manuscript which describes the folding landscapes for over 100 proteins, derived from many hundreds of thousands of Rosetta@home trajectories. These landscapes answer some of your questions, and we will post the figures for you to study as soon as they are finalized.
The answer to the question about the best fold.it solution depends very much on what the players start with. Generally, fold.it players can improve on any puzzle starting point, but it is hard to do better than the best structures sampled in Rosetta@home; i.e., people are better than one processor but not as good as 40,000!
The search is the main bottleneck for sure. The energy function is still not perfect, which is why we are still working to improve it, but in almost all cases the native structure is lower in energy than other structures, so the main problem is to find it.
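
To make the "the main problem is to find it" point concrete, here is a toy Metropolis Monte Carlo search in Python. It is emphatically not Rosetta's algorithm or energy function: the 3-state-per-residue chain follows Mod.Sense's simplification earlier in the thread, and `toy_energy`, `metropolis_search`, the temperature, and the step count are all hypothetical choices for illustration. The toy energy simply rewards agreement with a hidden "native" conformation, so the native state has the lowest energy by construction.

```python
import math
import random

def toy_energy(conf, native):
    # Hypothetical energy: one unit lower for every position that matches
    # the hidden "native" state, so the native conformation is the minimum.
    return -sum(1 for c, n in zip(conf, native) if c == n)

def metropolis_search(native, steps, temperature, rng):
    n = len(native)
    conf = [rng.randrange(3) for _ in range(n)]   # random starting point
    energy = toy_energy(conf, native)
    best_conf, best_energy = conf[:], energy
    for _ in range(steps):
        # Propose a move: change one randomly chosen position.
        pos = rng.randrange(n)
        old_state = conf[pos]
        conf[pos] = rng.randrange(3)
        new_energy = toy_energy(conf, native)
        delta = new_energy - energy
        # Metropolis criterion: always accept downhill moves, accept uphill
        # moves with probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy = new_energy
            if energy < best_energy:
                best_conf, best_energy = conf[:], energy
        else:
            conf[pos] = old_state                 # reject: undo the move
    return best_conf, best_energy

rng = random.Random(0)
native = [rng.randrange(3) for _ in range(30)]
best_conf, best_energy = metropolis_search(native, steps=5000,
                                           temperature=0.5, rng=rng)
print("best energy found:", best_energy, "lowest possible:", -len(native))
print("fraction of space sampled (at most):", 5000 / 3 ** 30)
```

The search looks at no more than 5,000 of the 3^30 possible conformations, yet it can get close to the minimum because this toy landscape is a smooth funnel. Real folding landscapes are far more rugged, which is where the many independent trajectories come in.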

robertmiles Joined: 16 Jun 08 Posts: 1264 Credit: 14,424,358 RAC: 4
Message 63512 - Posted: 29 Sep 2009, 2:46:28 UTC - in response to Message 63491.
David Baker wrote: "the search is the main bottleneck for sure. the energy function is still not perfect, which is why we are still working to improve it, but in almost all cases the native structure is lower in energy than other structures, so the main problem is to find it."
You may want to see if your software is up to handling this protein, which apparently has two different native structures:
Scientists solve mystery of how largest cellular motor protein powers movement http://www.physorg.com/news83861961.html
Muscular protein bond -- strongest yet found in nature http://www.physorg.com/news167329628.html
Motor proteins may be vehicles for drug delivery http://www.physorg.com/news156775386.html