Help me explain the science behind Rosetta@home!

Author	Message
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0	Message 42383 - Posted: 20 Jun 2007, 20:24:16 UTC Last modified: 20 Jun 2007, 20:42:10 UTC
Tom, I like the idea of your page with a status, and how simple it is. I'm not sure how to represent some of the raw research they do that isn't directly related to the study of a specific disease, but is more broadly trying to see, for example, whether they can make a good prediction of how two proteins will dock.
I would just like to see something like the total number of models they plan to run for a given task, and how many are completed, in progress, or ready to send. Then link it to the results graphic that shows how your own predictions compare to the rest of the results. And link it to the aforementioned improved task descriptions as well.
Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/

Greg_BE Joined: 30 May 06 Posts: 5770 Credit: 6,139,760 RAC: 0	Message 42384 - Posted: 20 Jun 2007, 20:25:21 UTC - in response to Message 42375. Last modified: 20 Jun 2007, 20:25:46 UTC
At first glance it looks good overall; a quick read of the cancer section looked good. I will study it in more detail when I have time later this week or on the weekend. Thanks for the 5 min attempt.
> Here's a 5min attempt at a possible progress report subpage: Click here
> Do you think this could be useful?

Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0	Message 42419 - Posted: 21 Jun 2007, 20:10:40 UTC
Many people are confused about RMSD. We've read that an RMSD of zero is a perfect match with the native structure as revealed by other scientific methods. But it's really not clear what it means when I complete a model and it says it found an RMSD of 1.1 or something. And it gets more confusing when you learn that an RMSD of 1.0 could generally be "close enough" for biopharma to make use of the models. If the protein is large, would an RMSD of 2.0 be relatively as correct as an RMSD of 1.0 for a smaller protein?
Again, an animation of a protein's native structure, then showing a twist made to one of the AAs in the chain and the resulting movement in the structure, and then perhaps more than one twist for a cumulative 1.0 RMSD, would help visualize the concept.
It also gets confusing to understand what RMSD means when looking at a docking task. How many of the AAs in the chain are bound at the proper point to the other protein?

Greg_BE Joined: 30 May 06 Posts: 5770 Credit: 6,139,760 RAC: 0	Message 42472 - Posted: 22 Jun 2007, 21:24:09 UTC
See this thread for what someone posted in regard to 1gidA. Now would someone translate that into plain English? And then add why Baker Lab sees this as an important protein to study, and which particular disease this protein is relevant to?

Tom Philippart Joined: 29 May 06 Posts: 183 Credit: 834,667 RAC: 0	Message 42970 - Posted: 1 Jul 2007, 16:08:23 UTC
Any news about your work, Ian Davis? (I don't mean to be pushy, but the thread was about to be forgotten ;) )

Greg_BE Joined: 30 May 06 Posts: 5770 Credit: 6,139,760 RAC: 0	Message 42979 - Posted: 1 Jul 2007, 17:15:29 UTC
I think we buried him under too many ideas at once :) He hasn't come up for air yet.

AgnosticPope Joined: 16 Dec 05 Posts: 18 Credit: 148,821 RAC: 0	Message 43012 - Posted: 2 Jul 2007, 4:05:49 UTC - in response to Message 42419.
> Many people are confused about RMSD. We've read that an RMSD of zero is a perfect match with the native structure as revealed by other scientific methods. But it's really not clear what it means when I complete a model and it says it found an RMSD of 1.1 or something. And it gets more confusing when you learn that an RMSD of 1.0 could generally be "close enough" for biopharma to make use of the models. If the protein is large, would an RMSD of 2.0 be relatively as correct as an RMSD of 1.0 for a smaller protein?
> Again, an animation of a protein's native structure, then showing a twist made to one of the AAs in the chain and the resulting movement in the structure, and then perhaps more than one twist for a cumulative 1.0 RMSD, would help visualize the concept.
> It also gets confusing to understand what RMSD means when looking at a docking task. How many of the AAs in the chain are bound at the proper point to the other protein?
Question: do you understand the idea behind Root Mean Square Deviation (RMSD) in the first place?
The questions I would ask about RMSD are more like this: what is the distance metric being used to compute the variation between the two molecules (the computed and the native)? Also, how do you select the zero-reference point for measuring the distance between atomic positions for the two molecules? In other words, how do you prevent larger measured errors that result merely from misaligning the two protein structures you are comparing? (This could be stated more as "how do you normalize the positioning of the two molecules being compared?")
And what are the useful values for RMSD? Do you merely need to find the lowest value? Or is there, as is suggested by the quote above, a target value that the RMSD must fall below before the result is considered useful?
I certainly hope it isn't 1.0 (as stated above), since only 33 of the 140+ work units currently have a "best prediction" under 1.0 RMSD. The highest value for the "best prediction" RMSD is 5.87 (as of this minute).
Once the RMSD situation is better understood, I would probably ask for a similar tutorial on the scoring method.
Anyway, if this isn't what this is about, then never mind!
== Bill
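
For readers following the RMSD questions above, here is a minimal sketch of the usual definition: RMSD = sqrt((1/N) * sum of squared distances between matching atoms), reported in the same length unit as the coordinates. It is not Rosetta's code. The coordinates are made up, the function names are hypothetical, and only the translation part of the alignment is handled (both structures are moved so their centroids coincide). Structure-comparison tools normally also search for the best rotation before measuring, for example with the Kabsch algorithm, and that step is left out here to keep the example short. Which atoms Rosetta@home includes in the sum is not stated in this thread.

```python
import math


def centroid(coords):
    """Average position of a list of (x, y, z) points."""
    n = len(coords)
    return tuple(sum(p[i] for p in coords) / n for i in range(3))


def centered(coords):
    """Shift the points so that their centroid sits at the origin."""
    c = centroid(coords)
    return [tuple(p[i] - c[i] for i in range(3)) for p in coords]


def rmsd(model, native):
    """Root mean square deviation between matching atoms.

    Both structures are centered first, so a pure shift of the whole
    molecule does not count as error. Rotation is NOT optimized here
    (assumption: the inputs already share an orientation).
    """
    if len(model) != len(native) or not model:
        raise ValueError("structures must be non-empty and the same length")
    a = centered(model)
    b = centered(native)
    total = sum(
        sum((pa[i] - pb[i]) ** 2 for i in range(3))
        for pa, pb in zip(a, b)
    )
    return math.sqrt(total / len(a))


# Toy four-atom chain; the numbers are illustrative only.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
# Same shape, moved somewhere else in space.
shifted = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in native]
# Same chain with the last atom bent out of line.
bent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (7.6, 3.8, 0.0)]

print("shifted vs native:", rmsd(shifted, native))
print("bent vs native:   ", rmsd(bent, native))
```

Because the sum is divided by the number of atoms, the result is an average per-atom deviation rather than a total. One consequence is that a single number cannot show where the error sits: one badly placed loop and a small error spread evenly over the whole chain can give the same RMSD.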

Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0	Message 43062 - Posted: 2 Jul 2007, 20:27:51 UTC - in response to Message 43012.
> Question: do you understand the idea behind Root Mean Square Deviation (RMSD) in the first place?
...well... the idea? Yes, but I don't know the specifics (perhaps that is obvious from the questions I had asked). I've read in prior posts that RMSDs of 2-3 are close enough to be of use to biopharma... but I'm not positive why. I suspect that an RMSD that low means that much of the conformation in our model is correct, and therefore biopharma still have a better than average shot at devising a drug that will dock with the protein being modelled. But it would be great to understand all of that better.
Is there some way to express a degree of confidence about each section of the protein model? I mean, if part of the model is suspected to be incorrect, would there be a way to assess where it most likely is? And then the biopharma folks could focus their efforts on another portion of the protein where the confidence level is higher?
I really think a real-world, tangible, touch-it type of analogy is required. The video uses the example of dropping a rope into a gravityless box. I believe that we are supposed to take for granted first that the rope will drop in the first place, and then the word "gravityless" is just meant to express that the rope will take a three-dimensional shape. But the analogy doesn't help me much, because I've never had a gravityless box to play with.
I was burying some 4-inch drainage pipe this weekend. It comes in a roll. When you unroll it, it doesn't ever really get back to straight. If you grab one end and turn it clockwise, the opposite end will turn as well. If I were to apply a force in the middle of the tube, I could turn the end some very small amount and not see a reaction on the other end, because it is absorbed in the middle. I've already posted a link to my drinking straw analogy.
I'm not sure if that works for most people or not. But what is needed is some sort of real-world object that people can visualize, followed by a discussion of how doing various things to it, and seeing the reactions, maps to the atoms in the proteins. With my drainage pipe example it might go something like this:
"If we step down on the middle of the tube and hold it to the ground, we are mimicking a strong atomic bond in the protein chain. Such bonds often occur between atoms of X and Y. The result of having that strong bond in the middle of the chain is that a greater force would have to be applied to the rest of the protein on each end in order to have the effect manifest itself on the opposite end. In other words, with the pressure applied, we must twist the end much further in order to impact the other end. And if a twist is already in one end, then less of a twist on the other end in the same direction would overcome the force applied to the middle of the tube and the entire length would be twisted."

Greg_BE Joined: 30 May 06 Posts: 5770 Credit: 6,139,760 RAC: 0	Message 43182 - Posted: 4 Jul 2007, 21:21:18 UTC
Maybe you could start a thread where you or someone can explain in plain English what these new work units are in 5.7 beta; the mfr stuff is what I am referring to. Also, what is the curated_bq_cterm with t386? All these terms baffle me totally.

hugothehermit Joined: 26 Sep 05 Posts: 238 Credit: 314,893 RAC: 0	Message 45308 - Posted: 23 Aug 2007, 10:29:06 UTC Last modified: 23 Aug 2007, 10:50:23 UTC
I've had a few beers, but here goes (I'm not looking for you to answer these questions in the forum, but trying to help with the educational side of things):
DNA is deoxyribonucleic acid and is a double helix. Why is it interesting, who did it, who was the lady that didn't get a Nobel prize, and why? (Maybe the difference between DNA and RNA, and what is this "deoxy" anyway?)
Human Genome. Who did it and why?
Genetic sequence. Why is it interesting?
Amino acids. What are they and how small are they? Maybe the difference between essential amino acids and AAs that humans can make?
Codons. Some interesting facts about different DNA sequences that produce the same AA, and who came up with it.
Proteins. What are they and how small are they?
Protein backbone. Which atom does the AA hang off, and which bits of the AA can twist?
Protein shape. Who hypothesized that the most likely shape is the lowest-energy one, how is the energy calculated, what happens if a protein is subjected to a high temperature, and how does this fit with the energy, etc.? I've noticed that some people think that the energy level is "how much energy it takes to fold a protein", and there are also "the end that folds first" type questions. How they fold, as opposed to what their shape is.
Maybe an explanation of heuristics, something to do with the travelling salesman problem, to explain why computers cannot try every combination in a sensible amount of time. How does R@H use Monte Carlo minimisation in this regard?
Homologues. Why the need for them? How are they found? Is it the DNA or the protein sequence that R@H is using, does it matter, and why?
What makes amino acids hydrophobic or hydrophilic, which ones are which, and which ones, if any, don't care? (What do they look like in the R@H graphics?)
Which AAs want to mate with another AA, and is this important in the prediction of the final shape of the protein?
Docking molecules. The lock-and-key definition, but do the proteins' backbones and/or the AAs twist to produce the final docked shape?
Docking novel proteins for health benefits; they may exist in nature but just haven't been found yet. Novel molecules for other purposes.
mRNA helper proteins (chaperones).
Catalysts in humans. What are they, enzymes? How is it possible that a protein can cut a DNA strand?
Maybe some stuff on NMR, if it looks like it really will work, but that gets into the dreaded and incomprehensible quantum black magic :(
This is just what I can think of at the moment, hope it helps.
Hugo
edit for spelling

darkpella Joined: 27 Sep 05 Posts: 13 Credit: 66,840 RAC: 0	Message 45543 - Posted: 28 Aug 2007, 16:49:01 UTC Last modified: 28 Aug 2007, 16:53:48 UTC
Hi, there are two points I could not really understand in the explanation I found here about what Rosetta basically does.
On this page, if I understood it right, it says that, starting from one point in the trajectory (the current "accepted" configuration), Rosetta will try to move some part of the amino-acid chain a bit in a random manner and then decide, based on the energy of this "trial" configuration, whether it can become the new "accepted" one. If so, the "trial" becomes the new "accepted" configuration and the process starts again by modifying this new "accepted" configuration. Otherwise the old "accepted" configuration is modified again in a different way to get a new "trial" one, the energy of which is then calculated to see whether this new "trial" can be taken as the new "accepted", and so on.
The subsequent "accepted" configurations form a trajectory. Rosetta keeps track of the lowest-energy "accepted" configuration found along the trajectory and, when the trajectory calculation ends, takes this lowest-energy configuration as the best prediction for that trajectory. 5 to 20 trajectories are calculated for each WU, and the lowest-energy configuration found among them all is returned as the best prediction for that particular WU.
Now the points I don't get are:
What criterion is used to decide whether the energy of the "trial" configuration is right to make a new "accepted" one? It cannot simply be that its energy is lower than that of the "accepted" configuration it was obtained from, since otherwise the energy would always be decreasing, hence the last "accepted" configuration would always be the lowest-energy one, which is not what happens.
How does Rosetta decide that the calculation of a particular trajectory has come to an end? For the same reason as before, it cannot be that it has found a local minimum.
Bye
darkpella

Ian Davis Joined: 10 Dec 06 Posts: 14 Credit: 42,603 RAC: 0	Message 46017 - Posted: 11 Sep 2007, 18:52:36 UTC - in response to Message 45743.
> What criterion is used to decide whether the energy of the "trial" configuration is right to make a new "accepted" one? It cannot simply be that its energy is lower than that of the "accepted" configuration it was obtained from, since otherwise the energy would always be decreasing, hence the last "accepted" configuration would always be the lowest-energy one, which is not what happens.
Correct. Lower-energy configurations are always accepted. Higher-energy configurations are sometimes accepted, based on a virtual "roll of the dice": a configuration that is only a little higher in energy than the previous one is much more likely to be accepted than one that is much higher in energy. This helps avoid getting trapped in local minima, but means you'll almost never accept a really bad configuration. Look up "Metropolis Monte Carlo" for more info.
> How does Rosetta decide that the calculation of a particular trajectory has come to an end? For the same reason as before, it cannot be that it has found a local minimum.
Also correct. The usual answer is that it just takes a set number of steps and then stops. That number is determined empirically for different problems, based on how long the simulation takes to converge. In some cases, it may bail out early if it looks like things are going very badly, and just start over.
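
The accept/reject rule and the fixed step count described in the reply above can be sketched in a few lines. This is a toy illustration of the standard Metropolis criterion, not Rosetta's implementation: the one-dimensional "energy" function, the step size, the temperature, and all function names are assumptions invented for this example. Real trajectories move many atoms at once and use a far more elaborate energy function.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng):
    """Standard Metropolis rule.

    Downhill (or equal-energy) moves are always accepted. Uphill moves
    are accepted with probability exp(-delta_e / temperature), so a
    slightly worse trial is often kept and a much worse one almost never.
    """
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def toy_energy(x):
    """Made-up 1D landscape with two wells; the one near x = -1 is lower."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def run_trajectory(energy, start, n_steps, step_size=0.5,
                   temperature=1.0, seed=0):
    """Run a fixed number of steps and return the lowest-energy state seen."""
    rng = random.Random(seed)
    current, current_e = start, energy(start)
    best, best_e = current, current_e
    for _ in range(n_steps):
        # "Trial" configuration: a small random change to the accepted one.
        trial = current + rng.uniform(-step_size, step_size)
        trial_e = energy(trial)
        if metropolis_accept(trial_e - current_e, temperature, rng):
            current, current_e = trial, trial_e
            # The accepted energy can go up as well as down, so the best
            # configuration has to be tracked separately.
            if current_e < best_e:
                best, best_e = current, current_e
    return best, best_e


# Start in the shallower well near x = +1 and see what the walk finds.
best_x, best_energy = run_trajectory(toy_energy, start=1.0, n_steps=2000)
print("lowest-energy x found:", best_x)
print("its energy:           ", best_energy)
```

The separate `best` variable mirrors the point raised in the question: since uphill moves are sometimes accepted, the last accepted configuration is not necessarily the lowest-energy one, so the lowest one seen so far is remembered along the way.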

hugothehermit Joined: 26 Sep 05 Posts: 238 Credit: 314,893 RAC: 0	Message 46758 - Posted: 22 Sep 2007, 7:14:57 UTC
Does Rosetta@Home use the DNA sequence or the protein sequence to identify homologues? Are two similar DNA sequences better at producing homologues than different DNA sequences that produce the same AA sequence?

Old Member Joined: 21 Sep 07 Posts: 1 Credit: 233,375 RAC: 0	Message 46798 - Posted: 22 Sep 2007, 16:24:17 UTC - in response to Message 42370.
> Is the disease related research terminated or still going on? That question should be covered in the first place. A subpage, with a list of the diseases covered and a short progress message (1 sentence is enough) on the current stage of the research (planned, ongoing, terminated, results published, ...) This type of progress report encourages many people to keep on supporting the project at WorldCommunityGrid and Einstein.
> Just to be explicit, yes, disease-related research is still going on. The summary on the front page is the best source I know of right now for what diseases are currently being targeted. But the most important thing is that as Rosetta improves, it will become possible to target all disease in a much more precise way, leading to more effective treatments and fewer side effects, because we're not thrashing around in the dark. In the long run, I think projects like this will revolutionize medicine, but it's important to realize that it will unfold over a period of years and decades -- we're not walking down the hall to inject Rosetta in a sick patient :)
With all due respect, the page you cite contains this statement:
"The above projects are not currently running on BOINC because we don't yet have an efficient queuing system which lets people submit jobs easily, but look for them soon! Also, rest assured that the structure prediction calculations currently running on your computers will have direct bearing on treating disease. There is a three-fold explanation for this direct relationship between structure prediction and disease treatment:....."
It then goes on to describe what sounds to me like drug research. Doesn't Pfizer make enough money to do its own research?
So my question is this: are we doing drug research, disease research, or some combination? I'm no scientist, so perhaps I don't comprehend the difference. I'm new to this project, so perhaps I have misunderstood.

rbpeake Joined: 25 Sep 05 Posts: 168 Credit: 247,828 RAC: 0	Message 46801 - Posted: 22 Sep 2007, 16:47:09 UTC
I recently learned about some of the different types of mathematics used in computational chemistry. It seems there is a branch of computational chemistry that uses deterministic methods, and another that uses probabilistic methods, and each has its use depending on what you want to do and the problem at hand! I found it extremely fascinating, so I think a subtopic explaining computational chemistry methods, and which of them Rosetta uses and how, would be very interesting! :)
Regards, Bob P.

Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0	Message 51256 - Posted: 9 Feb 2008, 0:58:24 UTC
So what is Ian working on now??

Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0	Message 51579 - Posted: 23 Feb 2008, 18:10:58 UTC
Dr. Baker mentioned in a journal entry that Ian was training high school teachers on the Rosetta Game and incorporating it into the curriculum for students. I think there was also mention of incorporating general information about Rosetta and protein study into a high school curriculum, but I don't believe we ever heard anything more about it. Was this done? How did the students respond? Is there educational material available for teachers?