Message boards : Number crunching : No credit!
Author | Message |
---|---|
Pixiebot Send message Joined: 6 Nov 05 Posts: 50 Credit: 60,515 RAC: 0 |
Windows 98SE. Rosetta 4.81. The job below took at least 1 hour 40 minutes, ran to 100%, and was returned as valid, yet it earned no credit. Why? Sent: 22 Dec 2005 15:01:49 UTC; reported: 22 Dec 2005 20:20:10 UTC; server state: Over; outcome: Success; client state: Done; CPU time: 0.00; claimed credit: 0.00; granted credit: 0.00. Just checked a Windows ME machine, same thing. Is there any point in running 9X machines? I know the minimum required is XP, but these machines have run well up to now, until Rosetta 4.81 that is :( Oh and I'm now seeing errors on all my machines.... Not good. When will these bad jobs flush out of the system? |
Andrew Send message Joined: 19 Sep 05 Posts: 162 Credit: 105,512 RAC: 0 |
|
Pixiebot Send message Joined: 6 Nov 05 Posts: 50 Credit: 60,515 RAC: 0 |
One can hardly call that credit! However, yes you are right, the job took longer than the credit granted. |
Tern Send message Joined: 25 Oct 05 Posts: 576 Credit: 4,695,177 RAC: 21 |
Quoted: "Just checked a Windows ME machine, same thing." I didn't realize Rosetta's minimum was Win2000/XP until today. Some projects go back as far as Win95, so BOINC itself will handle back that far. However, the "not reporting CPU time" problem is pretty common on Win9x; it's just not a big deal at other projects because of the quorum. You request 0 credits, so you're the "low" value that's thrown out, and all of you get the "middle" value. Here, with no quorum, it's "you get what you ask for"... Once flops-counting is implemented, it shouldn't matter, as that will replace the benchmark*time approach. No idea when that will be, however. |
Andrew Send message Joined: 19 Sep 05 Posts: 162 Credit: 105,512 RAC: 0 |
Quoted: "Once flops-counting is implemented, it shouldn't matter, as that will replace the benchmark*time approach. No idea when that will be, however." So unless you want to crunch for no credit... suspend until a new client is released that addresses this issue :( |
River~~ Send message Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
Quoted: "Windows 98SE. Rosetta 4.81.... no credit" This ME box has had credit from the 4.81 app. I think you have just been hit worse than most by the batches of bad WUs that are coming round just now. If you or Bill ask, I can boot up my Win98SE box again to test 4.81 (it last ran with app 4.80); currently it's booted into Linux. River~~ |
Tern Send message Joined: 25 Oct 05 Posts: 576 Credit: 4,695,177 RAC: 21 |
Quoted: "This ME box has had credit from the 4.81 app. I think you have just been hit worse than most with the batches of bad WU that are coming round just now." River, I'm not sure that it would tell us anything. The "not reporting the correct amount of CPU time" issue has been known on Win95/Win98/WinME since the beginning of BOINC, and it comes down to the OS not being "true multitasking". Sometimes it works, sometimes it doesn't. ONE fix I've heard recommended is to either set "leave applications in memory when preempted" to _no_ (just the opposite of the normal Rosetta recommendation), or to run only a single project, so it doesn't get switched out. I don't think either guarantees success, but they improve your odds. I really doubt it's a 4.80 vs 4.81 issue. If anything, 4.81 probably checkpoints _more_, but I don't see why that would cause the problem. |
Pixiebot Send message Joined: 6 Nov 05 Posts: 50 Credit: 60,515 RAC: 0 |
Some jobs run ok, others don't. None of my 9X machines reported valid jobs with 0 time/credit prior to 4.81; I've just checked them all. All instances have happened on or since the 21st December, when 4.81 was sent out. 17 instances of valid jobs with 0 time/credit (total) on 5 different boxes since version 4.81 is not coincidence. These are not the bad batch jobs, River; I've had them too. EDIT: I have found 1 valid job with 0 time/credit prior to 4.81. So I take back some of what I said above, but it is happening a lot more frequently (16 instances since 4.81). |
River~~ Send message Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
Thanks for the figures - I agree a sudden increase on that scale is likely to be causal, given that it remains after excluding the bad batches. It seems then that the difference is one of degree. Not unseen before, but significantly more frequent now. How unreliable it has to get before you take your box to another project is an area where different people will have different feelings. Myself, I'd already got fed up with win-98 on another project, which is why that box dual boots with Linux - but to go there would be off-topic... It is an alternative to going to another project tho ;-) River~~ |
Tern Send message Joined: 25 Oct 05 Posts: 576 Credit: 4,695,177 RAC: 21 |
Quoted: "17 instances of valid jobs 0 time/credit (total) on 5 different boxes, since version 4.81 is not coincidence." Did you notice if the 0-credit jobs _followed_ a bad-batch error job? I posted a comment somewhere around here that on SETI, they saw that a "-6" error on one WU would cause the next WU to have 0 CPU time. Edit: Looking here at one of your Win98 boxes, this actually does seem to be a strong possibility... |
Pixiebot Send message Joined: 6 Nov 05 Posts: 50 Credit: 60,515 RAC: 0 |
I've already gone! After 5 weeks or so of very smooth running, I was dismayed at all these things seemingly going wrong at once. Seemed like I was beta testing, and I don't recollect signing up for that. Is the science valid on these zero time units? Personal credit isn't the only thing that drives me to run projects, but good science and full credit are the least we should expect, I would have thought. Seems at least one of these things isn't happening here at the moment. I was also trying to troubleshoot a teammate's inability to return valid work, then these bad units entered the mix, and hey ho, what's the point.... I was gobsmacked to read in another thread that there is as yet no in-house testing of work about to be sent out, and to make changes with no testing and before the holidays seems ludicrous, and dare I say a touch amateurish. But, I would just like to say thanks to all the volunteers here who have helped me during this brief stay. Onward to a more stable and Windows 9X friendly project methinks. |
Jack Schonbrun Send message Joined: 1 Nov 05 Posts: 115 Credit: 5,954 RAC: 0 |
Everyone has to do what they are comfortable with, and I hope you will give us another chance once things have stabilized. We do have in-house testing, but it has obviously been demonstrated to be inadequate. Distributed computing challenges code in ways that can be unanticipated. I think one of the things that makes Rosetta@home interesting is that code and algorithms will be constantly evolving. This probably makes it more likely that we will send out faulty work units. I think this is just going to be that kind of project. From my experience on the Rosetta project before it moved to distributed computing, I know that David Baker will always want to be updating the code with new ideas. Because of this especially, we will definitely need to implement a more rigorous test suite, and I wouldn't blame anyone for waiting to sign back up until that happens. |
River~~ Send message Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
True. That is one of the issues which led me to suggest a Rosetta Alpha project, where geeks like myself could head off new releases of WU before they hit the majority of participants. I found it encouraging that within the day there were postings from two paid team members endorsing the idea. Personally I find it easier to accept the team's apologies precisely because they take on practical suggestions about how problem X can be avoided in future.
Yes. The zero time problem arises because of poor communication between the client and the application. The client is the part of the software that is common to any BOINC project; the application is the science-specific part. They run as separate processes so that they can be compiled separately, and so you can change one without changing the other. The Win95-98-ME family of operating systems is particularly poor at inter-process communication. In contrast, the science is all contained within the app. There the problems of inter-process communication do not arise. There will be bigger differences between Linux and winXP than between winME and winXP as far as the science output is concerned. The work you did, whether credited or not, is no less likely to be valid than stuff run on winXP or Linux. River~~ |
dgnuff Send message Joined: 1 Nov 05 Posts: 350 Credit: 24,773,605 RAC: 0 |
Quoted: "Windows 98SE. Rosetta 4.81.... no credit" I wouldn't burn too much time on that. My 4 win98 boxes are all starting to show 0 credits for results that took reasonable time (i.e. several hours "wallclock time"), presumably as a result of 4.81. In the grand scheme of things, the zero credits don't make the slightest bit of difference; the main thing is that the Rosetta team get back meaningful results. What I'll probably do is wait it out till the new year, and see if Flops-Counting is on their radar. If not, I'll reinstall them as Win2K. |
Tern Send message Joined: 25 Oct 05 Posts: 576 Credit: 4,695,177 RAC: 21 |
Quoted: "I wouldn't burn too much time on that. My 4 win98 boxes are all starting to show 0 credits for results that took reasonable time (i.e. several hours 'wallclock time'), presumably as a result of 4.81" dg, can you check and see if the "zero credit" results followed a "short WU" error result? I'm thinking this is not a 4.81 problem so much as a "if an error occurred, the next one won't have the CPU clock running" problem. |
dgnuff Send message Joined: 1 Nov 05 Posts: 350 Credit: 24,773,605 RAC: 0 |
Quoted: "I wouldn't burn too much time on that. My 4 win98 boxes are all starting to show 0 credits for results that took reasonable time (i.e. several hours 'wallclock time'), presumably as a result of 4.81" In at least one case, a "zero credit" definitely did follow an error result. It took a bit of poking through the host messages, and the results, but this WU errored on host 65292 immediately prior to this WU getting zero credits, same system: 65292, which is one of my 98 se boxen. Something else that is food for thought: Boincview has a "CPU efficiency" display, and on some (but not all) of my 98 se boxen that shows zero, even though they're working 100% on Rosetta. Do you want me to try to find a few more cases of this? |
Tern Send message Joined: 25 Oct 05 Posts: 576 Credit: 4,695,177 RAC: 21 |
Quoted: "Do you want me to try to find a few more cases of this?" No... that makes "several"; I think it's time to (if it matters) do the search using SQL on the server side. |
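
The hypothesis in the thread is that a valid result with zero CPU time tends to come straight after an error result on the same host, and that zero CPU time means zero credit under the benchmark*time approach. The following is a minimal Python sketch of that check, not the project's actual server-side query. The `Result` fields, the outcome labels, the `scale` factor, and the sample rows are all hypothetical and are not taken from the real BOINC database schema or from any host in this thread. It also assumes that report order matches the order in which a host ran its work units.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Result:
    host_id: int        # hypothetical host identifier
    reported_at: str    # ISO-style timestamp; sorts chronologically as text
    outcome: str        # "success" or "error" (hypothetical labels)
    cpu_seconds: float  # CPU time reported by the client


def claimed_credit(cpu_seconds: float, benchmark_score: float, scale: float = 1.0) -> float:
    """Benchmark*time claim: zero reported CPU time always yields a zero claim.

    `scale` stands in for the real conversion constant, which is not given here.
    """
    return cpu_seconds * benchmark_score * scale


def zero_after_error(results: List[Result]) -> Tuple[int, int]:
    """Count zero-CPU successes that do / do not directly follow an error on the same host."""
    by_host: Dict[int, List[Result]] = {}
    for r in sorted(results, key=lambda r: (r.host_id, r.reported_at)):
        by_host.setdefault(r.host_id, []).append(r)

    followed, not_followed = 0, 0
    for host_results in by_host.values():
        prev = None
        for r in host_results:
            if r.outcome == "success" and r.cpu_seconds == 0:
                if prev is not None and prev.outcome == "error":
                    followed += 1
                else:
                    not_followed += 1
            prev = r
    return followed, not_followed


# Made-up sample rows, purely for illustration.
sample = [
    Result(1, "2005-12-22T10:00", "error", 12.0),
    Result(1, "2005-12-22T12:00", "success", 0.0),
    Result(1, "2005-12-22T15:00", "success", 5400.0),
    Result(2, "2005-12-22T09:00", "success", 0.0),
    Result(2, "2005-12-22T11:00", "error", 3.0),
]

if __name__ == "__main__":
    print(zero_after_error(sample))
```

If most zero-CPU results land in the first count, that would support the "error first, then no CPU clock" explanation over a plain 4.80 vs 4.81 difference; a large second count would point the other way.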
©2024 University of Washington
https://www.bakerlab.org