Questions about the search space

Message boards : Rosetta@home Science : Questions about the search space

PhotonSmasher

Joined: 26 Aug 09
Posts: 1
Credit: 224,496
RAC: 0
Message 63116 - Posted: 2 Sep 2009, 0:13:28 UTC

First of all, congratulations on the great work.
I am very interested to know what you have discovered about the search space of protein folding and about the details of your search algorithm.
Are there any good sources of information in addition to the "Macromolecular Modeling with Rosetta" article?
In my results section I can see a 2D plot for each protein for which I contributed results. A few of those proteins have around 200,000 predictions from all users, and from the plots it seems likely that the optimum has not yet been found. It would be very interesting to see plots for proteins with a very large number of predictions, as they would give at least some idea about the search space.
It would be even more interesting to see a plot that also contains the best FoldIt result.
It would also be interesting to hear your thoughts about the importance of improving the energy function vs. improving the search algorithm.

Thanks a lot for your time.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 63122 - Posted: 2 Sep 2009, 13:55:44 UTC

I will defer to the Project Team for more details. But I believe it is safe to say they are constantly improving both the energy function and the search algorithm. Both are important. Here is some background...

The system uses a Monte Carlo approach to finding the lowest energy level. It is an estimation at best, and here is why...

The search space in its entirety is approximately 3 to the power of the number of amino acids that make up the protein. So, if the protein is 100 amino acids long (relatively small), the total number of possible structures is estimated at roughly:
3^100 = 5.1537752073201133103646112976562e+47
so that is roughly a 5 followed by 47 more digits!
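
For anyone who wants to check that figure, here is a minimal Python sketch. The function name `search_space_size` and the default of 3 states per amino acid are illustrative assumptions taken from the estimate above, not anything from the Rosetta code.

```python
def search_space_size(num_residues: int, states_per_residue: int = 3) -> int:
    """Count conformations assuming each residue independently takes one of a few states."""
    return states_per_residue ** num_residues


size = search_space_size(100)
print(f"{size:.4e}")   # 5.1538e+47
print(len(str(size)))  # 48 (the number of decimal digits in 3^100)
```

Python integers are arbitrary precision, so the count is exact; the scientific notation is only for display.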

If you had a supercomputer that could examine one of the structures every microsecond (which would be very fast considering the complexity of the energy function in a 3D structure), it would take
5.1537752073201133103646112976562e+41 seconds, which is
1.4316042242555870306568364715712e+38 hours, which is
5.9650176010649459610701519648799e+36 days, which is
1.6342513975520399893342882095561e+34 years!

By comparison, the current estimated age of the universe is
14,000,000,000 years, which is
1.4e+10 years

So it would take 1,167,322,426,822,885,706,667,348 supercomputers 14 billion years to search the entire space... and that is just ONE of the 100,000+ proteins that need to be studied.
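
The chain of conversions above can be reproduced with a short Python sketch. It assumes the same inputs as the post: one structure per microsecond, a 365-day year, and a universe age of 1.4e+10 years. The function and constant names are made up for illustration.

```python
SECONDS_PER_HOUR = 3600
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365            # the figures above use a 365-day year
AGE_OF_UNIVERSE_YEARS = 1.4e10


def exhaustive_search_years(num_structures: int, structures_per_second: float = 1e6) -> float:
    """Years needed to evaluate every structure once at a fixed rate."""
    seconds = num_structures / structures_per_second
    return seconds / SECONDS_PER_HOUR / HOURS_PER_DAY / DAYS_PER_YEAR


years = exhaustive_search_years(3 ** 100)
machines = years / AGE_OF_UNIVERSE_YEARS  # machines needed to finish in one universe age

print(f"{years:.3e} years")        # 1.634e+34 years
print(f"{machines:.3e} machines")  # 1.167e+24 machines
```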

This should shed some light on why the protein folding problem is so complex, and why having a faster computer is not all that is required to solve the problem.

To have some confidence that your model is close to the native structure with "only" 200,000 models run is really quite amazing. How much that confidence improves if you run 400,000 models is also an interesting question, one that I'm sure the researchers in BakerLab consider on a routine basis.
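
To make the Monte Carlo idea concrete, here is a toy Python sketch of Metropolis sampling on a made-up one-dimensional energy landscape. Everything in it is an assumption for illustration: `toy_energy`, the step size, the temperature, and the trajectory counts are hypothetical and have nothing to do with Rosetta's real energy function or move set. It only shows the general pattern of running many independent trajectories and keeping the lowest energy seen.

```python
import math
import random


def toy_energy(x: float) -> float:
    """Hypothetical rugged landscape: a broad bowl plus ripples (NOT Rosetta's energy function)."""
    return 0.1 * x * x + math.sin(5.0 * x)


def metropolis_trajectory(rng: random.Random, steps: int = 2000,
                          step_size: float = 0.5, temperature: float = 0.5) -> float:
    """Run one random-start trajectory and return the lowest energy it visited."""
    x = rng.uniform(-10.0, 10.0)
    energy = toy_energy(x)
    best = energy
    for _ in range(steps):
        candidate = x + rng.uniform(-step_size, step_size)
        candidate_energy = toy_energy(candidate)
        delta = candidate_energy - energy
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            x, energy = candidate, candidate_energy
            best = min(best, energy)
    return best


def best_of(num_trajectories: int, seed: int = 0) -> float:
    """Lowest energy found across several independent trajectories."""
    rng = random.Random(seed)
    return min(metropolis_trajectory(rng) for _ in range(num_trajectories))


print(best_of(20), best_of(40))
```

With the same seed, the 40-trajectory run repeats the first 20 trajectories and adds 20 more, so its best energy can only match or improve on the 20-trajectory result. How much it improves is the open question raised above.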
Rosetta Moderator: Mod.Sense
Otto

Joined: 6 Apr 07
Posts: 27
Credit: 3,567,665
RAC: 0
Message 63123 - Posted: 2 Sep 2009, 14:35:26 UTC

So to truly and effectively solve any and every imaginable problem, you'd need infinite computing power? Depressing.
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,168,954
RAC: 3,909
Message 63134 - Posted: 3 Sep 2009, 9:25:34 UTC - in response to Message 63123.  

According to Moore's Law (http://en.wikipedia.org/wiki/Moore%27s_law), computing power will double every 2 years at the max. So far we have kept up with that, which kinda brings the problem down to the thinkable range, for me anyway. If we keep doubling our ability to compute and the problem doesn't get any bigger, then the end of the problem is in the range of possibility. How far out is it? Only a math wizard could figure that out! I am not one of those, so I contribute my computers to the problems.
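
As a rough stab at the "how far out" question, here is a small Python sketch. It assumes a clean doubling every 2 years with no end, uses Mod.Sense's brute-force figure of about 1.634e+34 machine-years for a 100-residue protein, and picks a target of finishing in one year on a single machine. The function name and the one-year target are hypothetical choices, and real searches do not enumerate the whole space.

```python
import math


def years_until_speedup(speedup: float, years_per_doubling: float = 2.0) -> float:
    """Years of steady doubling needed to gain the given speed-up factor."""
    return math.log2(speedup) * years_per_doubling


# Speed-up needed to shrink ~1.634e34 years of brute-force search to one year.
required_speedup = 1.634e34
wait_years = years_until_speedup(required_speedup)
print(f"about {wait_years:.0f} years of doubling")
```

Under those assumptions the wait comes out at a bit over two centuries, which is long but nothing like the age of the universe.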
Otto

Joined: 6 Apr 07
Posts: 27
Credit: 3,567,665
RAC: 0
Message 63141 - Posted: 3 Sep 2009, 11:41:25 UTC

Ok, let's hope Moore's Law at least holds up its 2yr/2x pace (acceleration of it would be more than welcome, but probably needs some major scientific/technological breakthroughs, especially those that cannot be predicted).
Michael G.R.

Joined: 11 Nov 05
Posts: 264
Credit: 11,247,510
RAC: 0
Message 63142 - Posted: 3 Sep 2009, 14:15:07 UTC

It's not as bad as it looks. Nature repeats patterns and conserves useful mutations; only a very small subset of all those possibilities is usually found in proteins, so the search space can be narrowed down quite a lot.

Not that it's easy to narrow it down to something very small, but as far as I know, progress is being made on that.
dcdc

Joined: 3 Nov 05
Posts: 1831
Credit: 119,617,765
RAC: 11,361
Message 63145 - Posted: 3 Sep 2009, 16:59:24 UTC - in response to Message 63134.  

I thought Moore's law was for transistor count and that computer power has increased substantially faster than that? I had a 386DX 25 MHz back in 1993 (I think) - I'm sure my current quad is more than 32x as powerful as that! I think it's something like 2 MFLOPS vs 2 GFLOPS, so probably on the order of a 1000x speedup, but that is obviously difficult to measure because of all the various extensions giving much higher theoretical throughput etc., and of course storage and memory have increased too...
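
A quick Python check of that comparison, assuming a doubling every 2 years between 1993 and 2009 and taking the 2 MFLOPS and 2 GFLOPS figures as the rough guesses they are (the function name is made up for illustration):

```python
def doubling_factor(years: float, years_per_doubling: float = 2.0) -> float:
    """Growth factor after steady doubling over the given number of years."""
    return 2.0 ** (years / years_per_doubling)


predicted = doubling_factor(2009 - 1993)  # transistor-count style prediction
observed = 2e9 / 2e6                      # guessed 2 GFLOPS vs guessed 2 MFLOPS

print(predicted)  # 256.0
print(observed)   # 1000.0
```

So a two-year doubling over those 16 years predicts 256x, which makes "more than 32x" a safe claim, and the guessed 1000x would indeed be faster than that pace.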


mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,168,954
RAC: 3,909
Message 63172 - Posted: 5 Sep 2009, 10:39:10 UTC - in response to Message 63145.  

You are correct in your statements... Moore's Law is about transistor count, but that translates to speed, and we are currently progressing faster than his law said we would. There is some concern that we will slow down in the future, but of course some very smart people are working very hard to make sure that doesn't happen.
dumas777

Joined: 19 Nov 05
Posts: 39
Credit: 2,762,081
RAC: 0
Message 63449 - Posted: 25 Sep 2009, 6:04:00 UTC
Last modified: 25 Sep 2009, 6:04:43 UTC

As sort of alluded to as future tech, the elephant in the room is quantum computers. We are probably still decades away from a useful working prototype with a large number of qubits, but the increase in computation power will be orders of magnitude greater than Moore's Law. I really do believe that well within the next century we will fully master our own biology at the nano level. Now colonizing other worlds might take a bit longer :).
David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 63491 - Posted: 28 Sep 2009, 5:51:58 UTC - in response to Message 63116.  

Good questions! Mike Tyka is just finalizing a manuscript which describes the folding landscapes for over 100 proteins derived from many hundreds of thousands of Rosetta@home trajectories. These landscapes answer some of your questions, and we will post the figures for you to study as soon as they are finalized.

The answer to the best fold.it solution depends very much on what the players start with. Generally, fold.it players can improve on any puzzle starting point, but it is hard to do better than the best structures sampled in Rosetta@home; i.e., people are better than one processor but not as good as 40,000!

The search is the main bottleneck for sure. The energy function is still not perfect, which is why we are still working to improve it, but in almost all cases the native structure is lower in energy than other structures, so the main problem is to find it.

robertmiles

Joined: 16 Jun 08
Posts: 1232
Credit: 14,281,662
RAC: 1,807
Message 63512 - Posted: 29 Sep 2009, 2:46:28 UTC - in response to Message 63491.  

You may want to see if your software is up to handling this protein, which apparently has two different native structures:


Scientists solve mystery of how largest cellular motor protein powers movement

http://www.physorg.com/news83861961.html


Muscular protein bond -- strongest yet found in nature

http://www.physorg.com/news167329628.html


Motor proteins may be vehicles for drug delivery

http://www.physorg.com/news156775386.html
