Validation (not for credits, but for scientific reasons)

mage492

Joined: 12 Apr 06
Posts: 48
Credit: 17,966
RAC: 0
Message 13718 - Posted: 14 Apr 2006, 12:50:56 UTC

Okay, I've been reading page after page of discussion about validation with regard to fair granting of credit, and such. I've also seen a lot of posts about the science being more important than credit. I'm not touching that debate with a ten-foot pole.

Here's another question about validation, though. In every branch of science, findings go through a "peer-review" process. Isn't that reason enough to do the same here? We have thousands of computers running millions (or billions) of operations per second. With overclocking, possibly faulty hardware, network corruption, and a whole host of other factors in play, how do we know that any single result is valid?

Here's a possible suggestion (based on my imperfect knowledge of the workings of the Rosetta application). What if two people (or maybe three) ran the same protein with different random "seed" values? This suggestion has been mentioned by others, but here are its scientific benefits, as I see them:

1. No work is duplicated. This means it won't take a huge bite out of productivity.

2. This one I'm less sure of. Even with different seed values, wouldn't the two computers (after enough steps) come up with relatively similar values? The match wouldn't be exact, but they'd be in the same ballpark, right? (A toy sketch of the kind of check I mean follows.)
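Something like this toy check is what I have in mind (the function name and the 10% tolerance are my own inventions, not anything Rosetta actually does):

[code]
def ballpark_match(energy_a, energy_b, tol=0.10):
    """True if two independently seeded runs agree within a relative tolerance."""
    scale = max(abs(energy_a), abs(energy_b))
    return scale == 0.0 or abs(energy_a - energy_b) / scale <= tol

print(ballpark_match(-152.3, -148.9))  # True  -> the runs corroborate each other
print(ballpark_match(-152.3, -31.7))   # False -> one of them looks suspect
[/code]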

Granted, I'm new to the project (still working on my first WU), but I find it a touch alarming that our WUs aren't being validated. As mentioned above, there are plenty of ways that even an honest user can give a bad result. After all, that's why results in a laboratory are duplicated and journals are peer-reviewed. If my computer gives a bogus result, I want to at least know that someone else caught it.

Also, there's the issue of credibility. In order for the results of the project to be used, researchers need to trust the data. It's a lot easier to trust results that two computers arrived at independently. It doesn't matter how accurate the data is, if you can't convince a researcher to trust it, right?

And if this happens to also fix the credit problems, so much the better!
"There are obviously many things which we do not understand, and may never be able to."
Leela (From the Mac game "Marathon", released 1995)
tralala

Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 13722 - Posted: 14 Apr 2006, 14:37:34 UTC - in response to Message 13718.  

It is my understanding that at the current stage Rosetta takes many proteins and tries to predict their final shapes. For the proteins we currently process, the final shape is already known from experimental science. The goal is to refine the algorithm until it makes accurate predictions, which can then be applied to proteins whose shapes are not yet known. If Rosetta reaches the stage of predicting the shapes of unknown proteins, some kind of validation procedure will be necessary.

AMD_is_logical

Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 13726 - Posted: 14 Apr 2006, 15:47:19 UTC

Although it takes a while to crunch a result, testing whether a result is valid can be done quickly. Thus, the developers are able to test the validity of all results returned to them. That is why they don't need redundancy.
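A toy illustration of that asymmetry (the functions are invented stand-ins, not Rosetta's real scoring code): finding a low-energy result takes many evaluations, but checking any one returned answer takes just one.

[code]
import random

def energy(x):
    return (x - 3.0) ** 2 - 5.0          # pretend energy landscape

def crunch(seed, trials=100_000):
    """Volunteer side: expensive random search for a low-energy point."""
    rng = random.Random(seed)
    best = rng.uniform(-100, 100)
    for _ in range(trials):
        x = rng.uniform(-100, 100)
        if energy(x) < energy(best):
            best = x
    return best, energy(best)

def validate(x, claimed, tol=1e-9):
    """Server side: one cheap evaluation confirms the claimed energy."""
    return abs(energy(x) - claimed) <= tol

x, e = crunch(seed=42)     # 100,000 evaluations to produce
print(validate(x, e))      # 1 evaluation to check -> True
[/code]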
Dimitris Hatzopoulos

Joined: 5 Jan 06
Posts: 336
Credit: 80,939
RAC: 0
Message 13753 - Posted: 14 Apr 2006, 19:12:18 UTC
Last modified: 14 Apr 2006, 19:13:55 UTC

Looking at the top predictions page, each model we run on our PCs is represented by one dot on the chart: the lowest-energy structure is highlighted in blue, and all other predictions are in red.

As you can see, we're "looking for a needle in a haystack". For the particular result that found the global energy minimum (in blue, at the bottom-left of the chart), the Rosetta project can re-run that particular model and verify that it is indeed a correct calculation and not a "fluke".

That's why R@H (thankfully) doesn't need redundancy. It needs to send out as many "explorers" as possible.
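The re-check works because the search is reproducible from its seed. A minimal sketch of the idea (invented names, and a single-machine toy; cross-platform floating point is messier in practice):

[code]
import random

def run_model(seed, steps=1000):
    rng = random.Random(seed)
    x = 0.0
    for _ in range(steps):
        x += rng.gauss(0.0, 1.0)   # stand-in for one Monte Carlo move
    return x                        # stand-in for the final model's energy

volunteer = run_model(seed=123456)
lab_rerun = run_model(seed=123456)  # same WU + same seed -> same trajectory
assert volunteer == lab_rerun       # the "blue dot" can be reproduced in-house
[/code]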

But it also means that people can (and do) claim arbitrary credits. I hope R@H implements the SETI-Beta credit system as soon as it becomes mainstream.

Right now, as you can tell from the posts, they have their hands full: the biennial CASP7 coming up, big WUs causing misunderstandings, some buggy WUs, BOINC+Win9x credit problems, testing new stuff on RALPH, etc.


Best UFO Resources
Wikipedia R@h
How-To: Join Distributed Computing projects that benefit humanity
Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 13759 - Posted: 14 Apr 2006, 19:40:42 UTC - in response to Message 13753.  
Last modified: 14 Apr 2006, 19:46:04 UTC

[quote]Looking at the top predictions page, each model we run on our PCs is represented by one dot on the chart... That's why R@H (thankfully) doesn't need redundancy. It needs to send out as many "explorers" as possible. ...[/quote]

Also keep in mind that redundancy, in part, validates two or more results against each other: if they differ, they do not validate. In the case of Rosetta, if two results matched it would be a miracle. The chart you displayed illustrates this very well.

All of those dots represent models, and all of them were returned from the same class of WUs. So in fact there is no way to validate the results from two different computers against each other, as is done on other projects. You are correct that the answer is something akin to the SETI@Home Enhanced credit system, and when that becomes the standard for BOINC projects, it will be the standard here as well.

Of course, what is interesting about that system is that two machines with very different CPU times claim the same credit. In other words, a slow system that takes 200,000 CPU seconds claims the same credit as a fast system that takes 100,000 CPU seconds to process a WU. So one has to wonder how this differs from simply counting the number of WUs processed as the score.
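In numbers (all invented, just to make the point concrete):

[code]
# FLOP-counted credit depends only on the work in the WU, not on how
# long a particular machine took to finish it. (Numbers invented.)

WU_FLOPS = 1.0e15                 # pretend one WU costs 10^15 floating-point ops
CREDITS_PER_FLOP = 100 / 1.0e12   # pretend rate: 100 credits per 10^12 ops

fast_seconds = 100_000            # fast machine's CPU time for the WU
slow_seconds = 200_000            # slow machine's CPU time for the same WU
# ...note that neither time enters the credit formula:

print(WU_FLOPS * CREDITS_PER_FLOP)  # 100000.0 credits, identical either way,
# which is indeed equivalent to a flat "N credits per finished WU".
[/code]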

Moderator9
ROSETTA@home FAQ
Moderator Contact
BennyRop

Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 13761 - Posted: 14 Apr 2006, 19:50:10 UTC

What if two people (or maybe three) ran the same protein with different random "seed" values? This suggestion has been mentioned by others, but here are its scientific benefits, as I see them:

For every WU we've been doing, the Rosetta team is trying to get 10,000 models/decoys back. And they've pointed out that they don't care whether they get 1 model/decoy from each of 10,000 computers, 10 from each of 1,000 computers, 100 from each of 100 computers, or any other combination, as long as they get 10,000 models/decoys back. Each of us gets a different seed value to start with.
The current approach yields just a few models/decoys anywhere near the one with the lowest RMSD value, and just a few anywhere near the one with the lowest energy value, which suggests to me that the client can't reach either of these searched-for states from just any starting point (random seed value).
To validate our results, some have recommended sending the exact same seed value out to multiple participants and comparing the results. David Baker and/or David Kim have pointed out that this would cut our production to 1/2 or 1/3 of its present level, just to verify that the 9,995 or so poor results really were poor. (They validate the handful of good results in the lab.) Rather than waste 1/2 or 2/3 of our computing power proving that the terrible results were terrible, that power is better spent crunching 20,000 or 30,000 decoys/models and producing 2 or 3 times as many results close to the current lowest RMSD and the current lowest energy.
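The trade-off in rough numbers (made-up totals, same proportions):

[code]
# With a fixed compute budget, every extra copy of each WU divides the
# count of *unique* decoys explored. (Totals invented for illustration.)

total_runs = 30_000                 # whatever the volunteers can crunch

for replication in (1, 2, 3):
    unique = total_runs // replication
    print(f"replication x{replication}: {unique:>6} unique decoys")

# x1: 30000 unique decoys
# x2: 15000 (half the production spent re-proving that bad decoys are bad)
# x3: 10000
[/code]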

If you look at the top results for 2chf on this page: Top Results, you'll notice that Nec, one of the refugees from FaD, got the best RMSD result for this protein. And the results look like a fingerprint taken from someone's police file... coincidence? /e ducks. (Disclaimer: I'm a refugee from FaD as well.) And then there's Nightlord's lowest-energy result.

What is the chance that a result in those two clumps is not only wrong, but would be near either of the best results in RMSD or energy score?
mage492

Joined: 12 Apr 06
Posts: 48
Credit: 17,966
RAC: 0
Message 13812 - Posted: 15 Apr 2006, 8:36:32 UTC

Okay, that makes sense. So, because they only need to verify the "best" result, we can have validation without sending out duplicate work. Thanks for the explanation(s)!
"There are obviously many things which we do not understand, and may never be able to."
Leela (From the Mac game "Marathon", released 1995)
uioped1

Joined: 9 Feb 06
Posts: 15
Credit: 1,058,481
RAC: 0
Message 14075 - Posted: 18 Apr 2006, 20:24:38 UTC - in response to Message 13726.  

Although it takes a while to crunch a result, testing whether a result is valid can be done quickly. Thus, the developers are able to test the validity of all results returned to them. That is why they don't need redundancy.


This would be true if the problem were NP-complete (solutions to problems in NP can be checked in polynomial time). I strongly suspect that these results are not verifiable in polynomial time, at least for problems where we don't know the correct answer ahead of time; I don't think we've run any like that yet.

Perhaps someone from the project could verify that statement?
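To illustrate the distinction (a toy problem, nothing to do with Rosetta's actual scoring): checking a claimed score is one cheap evaluation, but certifying it as the global minimum still means searching the whole space.

[code]
from itertools import product

WEIGHTS = (3, -7, 5, -2, 4)

def score(bits):
    return sum(b * w for b, w in zip(bits, WEIGHTS))   # toy "energy"

claimed_bits, claimed_score = (0, 1, 0, 1, 0), -9

# Cheap check: does the returned answer really score -9?  One evaluation.
print(score(claimed_bits) == claimed_score)             # True

# Expensive check: is -9 the lowest score anywhere?  Feasible at 2^5
# states; hopeless at protein scale.
print(min(score(b) for b in product((0, 1), repeat=5))) # -9
[/code]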

Also, to paraphrase what some others have posted, including the moderator: from a redundancy standpoint, we are all essentially running the same WUs when we work on the same job. It is true, as I think BennyRop was suggesting, that someone could fraudulently claim to have 'found' the best decoy by working backward from the known result we've been sending out for RMSD calculations. But when it comes to real scientific applications of Rosetta this won't be a problem, because we obviously won't be sending that information out (it's exactly what we'll be trying to calculate), and, as the moderator points out, there's nothing we can really do about it anyway.

Finally, with regard to trusting the data: as was pointed out over in the science journal some time ago, I believe they have developed a method of combining the results that have the lowest energy and determining which of those have the lowest RMSD (and, I think, even estimating what that RMSD is), thus ensuring that the results are scientifically useful.
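If I understand the idea, it is something like a consensus/clustering step. A hypothetical sketch (my own, not the project's actual method):

[code]
import numpy as np

# Intuition: among low-energy decoys, the one sitting in the most heavily
# sampled neighborhood is the most trustworthy consensus pick, even
# without knowing the native structure. (Invented stand-in code.)

rng = np.random.default_rng(0)
decoys = rng.normal(size=(200, 3))        # stand-in coordinates, one row per decoy

diff = decoys[:, None, :] - decoys[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))  # stand-in for pairwise RMSD

neighbors = (dist < 1.0).sum(axis=1)      # decoys within 1.0 of each decoy
best = int(neighbors.argmax())            # center of the biggest cluster
print(f"decoy {best} has {neighbors[best]} close neighbors -> consensus pick")
[/code]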
