Message boards : Rosetta@home Science : Comments/questions on Rosetta@home journal
rbpeake Send message Joined: 25 Sep 05 Posts: 168 Credit: 247,828 RAC: 0 |
You might also want to think up a way to encourage a production increase in the smaller teams, and the vast horde of those who don't belong to a team at all. I agree! I'll bet the vast majority of the computing power comes from the "all others". Don't forget the little guy in your efforts to recruit the big guys! There should be some mechanism by which everyone feels they have a shot at some sort of recognition for their efforts. :) Regards, Bob P. |
TioSuper Send message Joined: 2 May 06 Posts: 17 Credit: 164 RAC: 0 |
You might also want to think up a way to encourage a production increase in the smaller teams, and the vast horde of those who don't belong to a team at all. Perhaps look for increases in models/month, and weight those that have added an extra machine or five a little higher than those that increase Rosetta's BOINC share from 10% to 100% (i.e. grab a couple of both). Perhaps pick one out at random, so a non-teamed member who runs a second CPU for all of CASP7 has a chance of being picked. Hey, something like the NCAA: a competition within the big teams, the middle teams, and the small teams, and a competition for the independents. But let's make this clear: the main goal is to motivate production; the competition cannot be allowed to degenerate into name-calling and all the worst things that competition brings out. |
David Baker Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0 |
You might also want to think up a way to encourage a production increase in the smaller teams, and the vast horde of those who don't belong to a team at all. Hi Bob, any suggestions? We can also have an award for the lowest-energy model for each target (we can't do this for the low-RMSD model as we do now, because we won't know the true structure!). Rom is starting now to look at the credits issue--I'll direct him to the discussion here. |
rbpeake Send message Joined: 25 Sep 05 Posts: 168 Credit: 247,828 RAC: 0 |
Hi Bob, any suggestions? we can also have an award for the lowest energy model for each target (we can't do this for the low rmsd model as we are now, because we won't know the true structure!). Sounds like a good idea! Thinking out loud, perhaps in addition a lottery would be nice, with each structure prediction a "lottery ticket", or something like that. You would pick winners at random from the pool of structure predictions (whether they are correct or not is not the relevant issue; the purpose of this lottery is to obtain as many structure predictions as possible). Thus, as with lottery tickets, the more structure prediction "entries" one has, the greater chance one has of winning! Others may have suggestions as well.... Regards, Bob P. |
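The ticket-per-prediction idea above amounts to a weighted random draw. A minimal sketch in Python (the function name `draw_winner` and the dict layout are mine for illustration, not anything the project actually uses): each participant holds one "ticket" per structure prediction, and the winner is drawn with probability proportional to their count.

```python
import random

def draw_winner(entries, rng=random):
    # entries: dict mapping participant -> number of structure
    # predictions submitted. Each prediction is one "lottery ticket",
    # so the draw is weighted by contribution: twice the predictions,
    # twice the chance of winning.
    names = list(entries)
    return rng.choices(names, weights=[entries[n] for n in names], k=1)[0]

# A participant with zero predictions can never win:
print(draw_winner({"alice": 5, "bob": 0}))  # always "alice"
```

This keeps the draw fair to small contributors: everyone with at least one prediction has a nonzero chance, while heavy crunchers are still favored in proportion to their work.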
Feet1st Send message Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
If anyone else remembers the article, it talked about an entrepreneur who's working on methods of doing X-ray crystallography FASTER, and how each protein can cost $100,000s of USD. OK, I finally found it. I guess it was 10s of thousands, not 100s... either way, multiply it by a billion proteins and it's outta MY budget ;) This article in Wired discusses what some others are doing in the pursuit of protein folding. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
David Baker Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0 |
Yes, we will probably get the solutions to the prediction problems, the native structures, in September or October, and we can definitely post the winners for each target retrospectively. |
adrianxw Send message Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 28 |
A somewhat philosophical question, asked by someone who has, in the last week, doubled his Rosetta quota, but is wondering... The aim of protein structure prediction is to be able to predict protein structure from an AA sequence. CASP is a "competition" to see which group or agent can best predict the structure of a sequence. This is a compute-intensive task. No controversy so far. Is there a danger that a real breakthrough algorithm may be lost because its developers received no ongoing financial backing because they did not do well in CASP? Is it possible that inferior methods may get funded and progress simply because they had more computer power available to them? ... asked by someone who has actually reduced his quota at Predictor@Home in order to increase Rosetta, and has to admit to feeling a bit bad about that. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
Feet1st Send message Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
Is it possible that inferior methods may get funded and progress simply because they had more computer power available to them? I believe you are asking if the one with the most computing power is going to win CASP, more or less regardless of the merits of their scientific approach. And the answer is no. In fact, if I were CASP, I'd structure the event such that it would take more than that to win. Indeed SOME teams make entries based entirely on human analysis of the AA sequence. And in fact, the Baker team has out-performed all other teams at CASP for some time, and done it without a distributed computing project. A real breakthrough algorithm would be one that can produce a more accurate model and do it with less computing power. When Baker's work is "done", you'll be able to enter the AA data and determine the structure on a single computer in less than an afternoon. But at this point, there are too many unknowns to achieve that. And no algorithm exists to take you to the solution in a straight line. By contributing your time to Rosetta, you are helping prove that their approach to solving the problem is technically superior to the approaches that other teams are taking. ...or maybe the other teams prove to have a better approach! Bottom line is that the best predictors will present to the community how they did it! So, everyone wins. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
adrianxw Send message Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 28 |
A real breakthrough algorithm would be one that can produce a more accurate model and do it with less computing power. Forgive me for not being more precise. My concern was that a small funded research effort may offer a better modelling tool, in terms of less compute time needed, but it would never appear because, although it presented a small footprint, it was "swamped" by the massive brute-force crunching available to popular DC projects or well-funded commercial sites. Simple analogy: if I had a model which worked well with 100 CPU units, but was beaten by Rosetta using 100,000 CPU units of computer time, I might die because I was simply out-computed. It need not be the case that my competitor was better; I was beaten (and the world loses) because the winner had more CPU time. Maybe I don't explain this so well, but it is an obvious issue. Note, I raise this issue having read the CASP site in some detail... Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
Dimitris Hatzopoulos Send message Joined: 5 Jan 06 Posts: 336 Credit: 80,939 RAC: 0 |
My concern was that a small funded research effort may offer a better, in terms of less compute time needed modelling tool, but would never appear because although it presented a small footprint, it was "swamped" by the massive brute force crunching avaiable to popular DC projects or well funded commercial sites. I've had the same concern, but this is exactly the beauty of BOINC and DC, that it offers a "democratisation" of scientific research, i.e. a project with minimal funding and no "public relations" power can tap the huge userbase of BOINC donor community, IF it can appeal to them. Until a few years ago, scientific DC projects had to find corporate sponsors like Intel, or Google or IBM, able to push press-releases to the media, so they could get known to the public. Right now there are several life-science DC projects. I find Rosetta@home the one most compatible with my own priorities (see my DC-howto doc in my sig), but I would still be happy to see more coming online and "compete" for CPU cycles and I regularly re-evaluate my resource share. And personally, knowing my character, I would tend to favor a smaller "underdog" project. Best UFO Resources Wikipedia R@h How-To: Join Distributed Computing projects that benefit humanity |
David Baker Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0 |
A somewhat philosophical question, asked by someone who has in the last week, doubled his Rosetta quota, but is wondering... This is a good question. There are a couple of issues: First, CASP is really an experiment rather than a competition. The purpose is for researchers to learn what the major bottlenecks to progress are, so that the community as a whole can progress as fast as possible. Second, the general feeling in the CASP community, and certainly my feeling up until about a year ago, is that computer power is not limiting the quality of the predictions. This is because with the energy functions used previously, it was always possible to rapidly generate models with lower energies than the native structure, and it was not possible to choose the best among these models based on their energies. In our case, we could generate large sets of models quickly with the ab initio part of rosetta, but we did not have any way to pick out the best models from the sets, so more computer power really would not have made a difference in our previous CASP efforts. The new step forward for us a year ago was that with the improved high resolution refinement protocol, we COULD pick out the best structures made for a number of small proteins. Now, the rest of the research community is probably, and rightly so, somewhat skeptical about whether our model of the energetics is really accurate enough to reliably recognize correct models (especially since the dogma in the field is that energy functions are not accurate enough to do this). Hence the importance of CASP--if we can predict accurate structures in a completely blind test, then everybody (including ourselves!) will be convinced that accurate prediction is possible, and researchers everywhere can build on this work. |
BennyRop Send message Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
If I understand CASP correctly, you're only allowed to turn in 5 guesses... and if using lowest energy, then you'll send in the blue dot, and the two models on the left and the two on the right that are just a little higher in energy than the blue dot. Since you can't graph the CASP7 results like this (not having the native structure to compare RMSD against), there's no way to select the higher-energy models 1 Angstrom to the left of the 5 low-energy models ((** .. X ..), i.e. the two asterisks)? Here we have a CASP6 entry, where the lowest RMSD is around 5.2 Angstroms, and the lowest-energy models have an RMSD of around 7.0 and 7.1 Angstroms (eyeballed...). Looking at the CASP6 target model page that is linked for t198, I see 3 results with the name Baker attached to them. The best one is labeled just "Baker", and around 65% of the atoms in the model are less than 5 Angstroms away from where they are in the native structure. Two more are labeled "Baker-Robetta", and only around 15% of the atoms in those models are less than 5 Angstroms away from where they are in the native structure. How much better is our current best low-energy CASP6 model than the ones that were turned in by your lab for CASP6? And, with a structure this big, what is the RMSD at which the model becomes usable? 2, 3, 4 Angstroms, etc.? How close are we to being able to create models that will be used for something other than a picture book with labels that read "T250 should look like this..."? |
hugothehermit Send message Joined: 26 Sep 05 Posts: 238 Credit: 314,893 RAC: 0 |
Now that CASP7 has started I realise that this may not be the appropriate time but... After watching the video I have a much better understanding of the problem at hand and would highly recommend watching it to everybody (it's big; the audio itself (from my poor memory) is about 8 meg and isn't really enough by itself (I tried), and the small video version is about 120 meg, so dial-up/capped users be warned). I read some time ago that you had a global optimisation problem; not being current in such things, I assumed that you meant that you needed to optimise global variables, which I must admit I thought very strange indeed. Having educated myself a bit more, I can see now what you mean. I'm unsure what the difference between heuristic programming and global optimisation is, so I'm going to use them interchangeably. That being said, I'm sure that you have some heuristics that are better for some types of proteins, and I would think that the addition of a sort of heuristic abstraction layer may help, i.e. heuristic_1 //good for protein type 1 heuristic_2 //good for protein type 2 heuristic_3 //good for protein type 3 ... heuristic_n // etc... weighting each of them for protein family, or sequence similarity to a known structure, etc., and updating the weights as new information comes in. So you would have a heuristic search on a heuristic search, so to speak. I would still very much like to know if ran3() is contributing to the clustering. I understand that the most likely and probable reason is your weighting values, but I have run ran2() against ran3() 400000000 times; ran3() hit one number 38 times, and hit 8 numbers 0 times. A trial run at Ralph would be able to clear this up. Just some thoughts to bounce off the Rosetta@Home team. |
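The ran2()/ran3() comparison above can be framed as a standard uniformity check. A minimal sketch in Python (the function name `chi_square_uniform` is mine, and Python's built-in Mersenne Twister stands in for the Numerical Recipes generators being discussed): bin the generator's output and compute the chi-square statistic against the expected flat distribution.

```python
import random

def chi_square_uniform(samples, bins=100):
    # Chi-square goodness-of-fit statistic for uniformity on [0, 1).
    # A healthy generator lands near bins - 1 (the degrees of freedom);
    # a generator that clusters on some values and never hits others,
    # the way ran3() is described above, scores far higher.
    counts = [0] * bins
    for x in samples:
        counts[min(int(x * bins), bins - 1)] += 1
    expected = len(samples) / bins
    return sum((c - expected) ** 2 / expected for c in counts)

rng = random.Random(1)
uniform = [rng.random() for _ in range(100000)]
skewed = [rng.random() * 0.5 for _ in range(100000)]  # artificially clustered
print(chi_square_uniform(uniform) < chi_square_uniform(skewed))  # True
```

A Ralph-style trial as suggested above would amount to running this kind of test on the generator's actual output stream and comparing the statistic against the chi-square critical value for the chosen number of bins.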
adrianxw Send message Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 28 |
Just read this in today's journal entry... One of the sequences is clearly similar to a protein with known structure, and we will use the known structure as a starting point in the searches. ... isn't that a potential trap? Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
FluffyChicken Send message Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0 |
Just read this in todays journal entry... I would assume if it was, then the prediction would soon fail or divert to something else showing this. Team mauisun.org |
Rollo Send message Joined: 2 Jan 06 Posts: 21 Credit: 106,369 RAC: 0 |
One could use the following approach: use the lowest-energy structure as a reference and calculate the RMSD to it. Then submit as guesses 2 to 5 only structures which have an RMSD > x compared to the lowest-energy structure. |
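The selection rule above can be sketched in a few lines. A minimal sketch in Python (the function names `rmsd` and `pick_submissions` are mine; the RMSD here assumes the structures are already superimposed, glossing over the alignment step a real comparison needs): rank models by energy, then greedily keep only those at least `min_rmsd` away from every model already chosen, so the five guesses are structurally diverse rather than five near-copies of the lowest-energy model.

```python
import math

def rmsd(a, b):
    # Root-mean-square deviation between two equal-length lists of
    # (x, y, z) atom coordinates; assumes pre-aligned structures.
    assert len(a) == len(b)
    total = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
                for (ax, ay, az), (bx, by, bz) in zip(a, b))
    return math.sqrt(total / len(a))

def pick_submissions(models, n=5, min_rmsd=2.0):
    # models: list of (energy, coords) pairs. Walk the models from
    # lowest to highest energy, skipping any model within min_rmsd
    # of one already chosen, until n diverse guesses are collected.
    ranked = sorted(models, key=lambda m: m[0])
    chosen = []
    for energy, coords in ranked:
        if all(rmsd(coords, c) >= min_rmsd for _, c in chosen):
            chosen.append((energy, coords))
        if len(chosen) == n:
            break
    return chosen
```

This addresses BennyRop's worry above: near-duplicates of the blue dot are filtered out, so the remaining submission slots can cover other low-energy basins.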
adrianxw Send message Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 28 |
I would assume if it was, then the prediction would soon fail or divert to something else showing this. Not necessarily. If the structure of the similar protein is known and presumably the lowest energy structure, the unknown sequence may well have a deep energy well thereabouts, but there could be a totally different configuration which is a deeper well. I don't know how likely that is however, just an observation. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
Feet1st Send message Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
Global optimization vs heuristics: I can tell you they aren't the same thing. But I'm not certain what global optimization is either. I'm assuming it's a mathematical concept of trying to solve a complex equation. Heuristics is basically the use of historical statistics. Let's say you want to write a computer program to play chess. If your computer is not powerful enough to analyze ALL of the possible moves that lie ahead, then what you do is cheat. One way to cheat is to only look 5 or 6 moves ahead... you do this because "heuristically" you've found that if your decision still looks like a good one that far down the road, then typically it proves to be a good one through the end of the game... even though you've not looked that far ahead yet. In evaluating what is a "good move" you have to devise some method of "scoring" the current game board, i.e. you need some means of analyzing proposed move 1 and comparing it with proposed move 2. And so another way to cheat is to look at the field of next possible moves (say there are 20), and throw out the worst-scoring choices... and heuristics (i.e. your past experiences) will determine how many of the poorly scoring choices to throw out and how many to pursue further (i.e. continue looking forward at the moves that would follow those). Say you determine you can throw out 10 of the choices; you then look forward on only 10 possible moves rather than 20. This cuts the computing time downstream from there in half! As you can see, if your scoring mechanism isn't good, you might throw away some moves that prove themselves to be the BEST possible... 3 moves later in the game. Playing chess is similar in many ways to what Rosetta is doing with atoms and molecules. The key is in the "Rosetta score". If the scoring mechanism is perfect (and it's not), you can dramatically narrow your field of possible "moves" (i.e. rotational possibilities) to pursue further.
Some words to Google if you are interested: game theory, game tree, depth-first search, breadth-first search, backtracking, traveling salesman problem, knapsack algorithm. So, the team has devised several different approaches to solving protein structures. As you've read in Dr. Baker's posts, they are finding creative ways of combining the approaches, and finding heuristically that these combination approaches are yielding better results. As they devise new approaches, they should expect to find some work better for some situations than others. For example, a given approach or scoring method works great for proteins less than 100 amino acids long, but doesn't seem to work well for a 200AA protein. Also, as they find proteins where their approach fails to produce a viable structure... they look for more new approaches :) Since this is all still a new science, and a blind study where you don't KNOW the right answer, I would expect them to try all of their approaches on each protein to some extent. And they'll have to gauge which looks most likely to produce the best prediction to bring work out to R@H. You have to look at the protein you are handed and determine first whether it is a "screw" or a "nail" or a "bolt", and then decide whether you should reach for your screwdriver, your hammer or your wrench to work with it. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
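The chess analogy above — score every candidate, throw out the worst, and only look deeper on the survivors — is essentially beam search with a depth horizon. A minimal sketch in Python (the function names and the toy expand/score callbacks are mine, not Rosetta's actual search; lower score is "better", like the Rosetta energy):

```python
def beam_prune(candidates, score, keep=10):
    # Heuristic pruning: keep only the `keep` best-scoring candidates
    # (lower is better), cutting the work at the next search depth.
    return sorted(candidates, key=score)[:keep]

def search(state, depth, expand, score, keep=10):
    # Depth-limited look-ahead with pruning, as in the chess example:
    # generate every successor, but recurse only into the best few.
    # Returns the best (score, state) pair found at the horizon.
    if depth == 0:
        return (score(state), state)
    children = beam_prune(expand(state), score, keep)
    if not children:
        return (score(state), state)
    return min(search(c, depth - 1, expand, score, keep) for c in children)

# Toy example: states are integers, "moves" add 1 or 2, and the
# heuristic prefers larger numbers (score = -n).
print(search(0, 2, lambda n: [n + 1, n + 2], lambda n: -n, keep=1))  # (-4, 4)
```

As the post notes, the risk is exactly the `keep` cutoff: a move (or conformation) pruned here can never be rescued later, so a flawed scoring function can silently discard the best answer.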
Robert Everly Send message Joined: 8 Oct 05 Posts: 27 Credit: 665,094 RAC: 0 |
Today (Friday) we have closed accepting server predictions for the first What should be done with T0283 WUs that are still on our machines? Also, should the deadlines for the CASP7 WUs be just a bit earlier than the real deadlines? Back to T0283: one that I still have running has a deadline of May 27. But if the CASP deadline was May 12, will those results be of any benefit? |
Rollo Send message Joined: 2 Jan 06 Posts: 21 Credit: 106,369 RAC: 0 |
Today (Friday) we have closed accepting server predictions for the first Does Rosetta belong to the group 'server prediction' or 'human expert prediction'? If the latter, then it should be called 'human expert prediction (computer assisted)'. |
©2024 University of Washington