Message boards : Number crunching : Low Scores Anyone?
Author | Message |
---|---|
Keith Akins Joined: 22 Oct 05 Posts: 176 Credit: 71,779 RAC: 0 |
v5.34 is a much more stable release than 5.32. However, I'm confused as to the reduction in granted credit. My last two completed WUs scored in the mid 40s, where I used to average 75 per 8hr WU. Dell D-300 2.8GHz P4 1 GB 333MHz DDR-2 Ram. |
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
No thanks, I just had one. ;-) |
dag Joined: 16 Dec 05 Posts: 106 Credit: 1,000,020 RAC: 0 |
How much L1/L2 cache do you have? I've seen some whopping big WUs recently. dag --Finding aliens is cool, but understanding the structure of proteins is useful. |
Keith Akins Joined: 22 Oct 05 Posts: 176 Credit: 71,779 RAC: 0 |
According to memtest86, I have 16k L1 and 1 meg L2. I also checked the memory footprint in Task Manager and I've not noticed anything larger than what ran during CASP 7. Some of those were 300 to 400 AA's long. |
BennyRop Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
Someone last night mentioned having a few WUs turned in that were given much lower credits and blamed it on 5.34. They were getting around what they asked for by 5.34 until they started running WUs that were similar to this: 1hz6A_BOINC_NATIVEJUMPS_CLOSE_CHAINBREAKS_VARY_ALL_BOND_ANGLES_SAVE_ALL_OUT__1306_8456_0 and then they were suddenly getting 1/3rd of their claimed credit. You're doing much better in that regard; getting 44 and 48 when asking for about 60. (Although one didn't get to the 25k second range.) With the large range of problems with your winxp machine, would you mind trying to diagnose the problems? Rule out hardware problems on your end by testing the ram with Memtest86+, and by some type of HD check (Windows checkdisk is a start, but the diagnostics from your HD's manufacturer would be a better second test). And if those don't help identify the problem, perhaps try turning off the Rosetta screensaver and seeing if it is contributing to the problems on your system. (By eating up more ram than usual, it may be causing the main Rosetta application to use defective ram that hasn't been used on other projects or WUs that didn't have an error.) |
Keith Akins Joined: 22 Oct 05 Posts: 176 Credit: 71,779 RAC: 0 |
I ran memtest86 for several hours completing over 60 passes and everything passed. That rules out memory. As for hard drive, the earlier versions of Rosetta didn't seem to use page-file. However with 5.32, I did notice some page-file swapping going on. Windows normally experiences a majority of its page faults with page-file swapping. That's unusual considering 1GB of ram. I'll see how the next units perform and then check the HD. By the way thanks for the help. |
Gen_X_Accord Joined: 5 Jun 06 Posts: 154 Credit: 279,018 RAC: 0 |
My granted are lower too Keith, you can see it in my results and granted credit section. I used to get 60's and 70's granted for 8 hours, now it is in the 40's and 50's. And in my results chart, you can see when it happened too, right after a computational error on one work unit on Oct. 24. My RAC has suffered for it ever since. Can this be fixed??? |
adrianxw Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 28 |
As I understand it, the credit you get is proportional to the number of decoys you produce. Typically, this machine, with 18 hour wu's produces dozens of decoys per wu, sometimes over a hundred. A couple of low scoring wu's recently have only produced 5-6 decoys. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
In reply to adrianxw ("As I understand it, the credit you get is proportional to the number of decoys you produce. Typically, this machine, with 18 hour wu's produces dozens of decoys per wu, sometimes over a hundred. A couple of low scoring wu's recently have only produced 5-6 decoys."): For a given type of WU, credit is proportional to the number of decoys. The credit per decoy varies between WUs of differing types, as determined by an average of other people's runs on WUs of that type (more info here). If your credit per hour has dropped, it means either a) your hardware is not as efficient on these runs as on previous runs (efficient as compared with the mythical average machine), for example the new run might have overspilled your level 1 cache whereas the previous runs did not; or b) the set of machines over which the average was determined has improved in its performance as compared to yours, due to a different set of machines being involved; or c) the set of machines over which the average was determined has improved in its performance as compared to yours, due to the new run fitting the average hardware rather better than previous types of WU (e.g. new runs no longer overspill typical level 1 cache where previous WUs did); or d) someone has made a mistake. In general I'd rate (a) more likely than (b), both more likely than (c), all more likely than (d). All are possible, one at a time or in any mix. River~~ |
netwraith Joined: 3 Sep 06 Posts: 80 Credit: 13,483,227 RAC: 0 |
Comparison anyone ??? Before I start... I am neutral when it comes to AMD vs. Intel, but .... (with numbers like this, I would trade in all my XEONs if I only did number crunching... I actually have about a 4 to 1 core ratio... XEON/AMD) A comparison of two of my systems. Qualifiers: I chose these by the 1hz6A name and by the fact of the wildly diverse claim/grant scores. I was mildly pleased by the fact that the clock rates for the two systems were very similar, so that AMD's performance numbers were not part of the comparison. The AMD is also PRE performance numbers and is rated solely on its clock rate. It could be that the large L1 on the AMD is the deciding factor. I am certain that the extra cache stages on the Intel cause a hefty performance penalty, but should not be the cause of the difference. (The function of the L4 is mostly to speed up access to PC-133 memory, saving the purchase cost of DDR.) The AMD job exceeded preferred run time, so, while not ignoring the factor, I am not weighing it heavily either. The Intel would not have been helped much by equal time. Both systems are CentOS 4.4 and both are using appropriate models of kernel: 2.6.9-42.0.3.EL. System A is a local SCSI drive, System B is an FCAL/CLARiiON raid-1. For the sake of this, the disk read/write speed is similar. System A: AMD Athlon TBIRD Uni-Processor @ 1533MHZ, cache 64kI/64kD, 256K-L2. System B: Intel XEON MP @ 1500MHZ (one of 4 physical/8 logical), cache 12kI/8kD, 512K-L2, 1024K-L3, 32MB-L4. Results (AMD / INTEL): Actual-time 48402.95 / 33413.59; claim 59.7020724185046 / 12.3420001842521; grant 101.790368563704 / 9.82971861593104; decoys-gen 30 / 1; Preferred-time 43200 / 43200; SUFFIX 1306_38409_0 / 1306_34283_0. System A -- Rosetta id's and name. 
https://boinc.bakerlab.org/rosetta/result.php?resultid=44225313 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=39021279 1hz6A_BOINC_NATIVEJUMPS_CLOSE_CHAINBREAKS_VARY_ALL_BOND_DISTANCES_SAVE_ALL_OUT__1306_38409_0 System B -- Rosetta id's and name. https://boinc.bakerlab.org/rosetta/result.php?resultid=44199022 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=38996524 1hz6A_BOINC_NATIVEJUMPS_CLOSE_CHAINBREAKS_VARY_ALL_BOND_DISTANCES_SAVE_ALL_OUT__1306_34283_0 Looking for a team ??? Join BoincSynergy!! |
Mac-Nic Joined: 6 Jul 06 Posts: 7 Credit: 50,523 RAC: 0 |
Question: How do you know when there's something wrong with the project? Answer: When an AMD Duron 1200 on a cheap all-in-one (even the CPU) main-board with PC133 RAM gets more credits/hour than a 2.8GHz FSB533 Northwood on a decent main-board with DDR400 MHz RAM. BTW, I have no problems with other projects. Both are WIN98 SE machines running 24/7 with only the bare minimum processes (10), BOINC inclusive. Goodbye for now, I'll be back when those problems are gone. |
Rhiju Volunteer moderator Joined: 8 Jan 06 Posts: 223 Credit: 3,546 RAC: 0 |
In reply to Keith Akins ("I ran memtest86 for several hours completing over 60 passes and every thing passed. That rules out memory. As for hard drive, the earlier versions of Rosetta didn't seem to use page-file. However with 5.32, I did notice some page-file swapping going on. Windows normally experiences a majority of its page faults with page-file swapping. That's unusual considering 1GB of ram."): Keith and others: Thanks for bringing this to our attention ... I've been looking into it a bit. The work units like 1hz6A_BOINC_NATIVEJUMPS_CLOSE_CHAINBREAKS_VARY_ALL_BOND_ANGLES_SAVE_ALL_OUT__1306_8456_0 and 1hz6A_BOINC_NATIVEJUMPS_CLOSE_CHAINBREAKS_VARY_ALL_BOND_ANGLES_ALL_BOND_DISTANCES_SAVE_ALL_OUT__ are the ones that are producing the puzzling low credits. These are computationally intensive workunits -- so the average credit per decoy is significantly higher. In principle, that should translate into a similar credit for each workunit, but that's obviously not what you're reporting. First of all, I'm canceling any more of these workunits, until we figure this out. Second, I've got a hunch as to what might be going on. This kind of workunit does not have checkpointing. So if your client switches to another project (or the screensaver is interrupted), it has to start from scratch unless your user preference is to "Keep in Memory when Preempted". Can you post here whether you are "keeping in memory" or not? Thanks! |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
Rhiju, regarding the no checkpointing... this could explain why several folks have noticed their RAC declining. Especially if they have several projects running and switch between projects every... I think the default is 60min (i.e. NOT enough time to complete a model with many of the new WUs). But, if I don't keep the app. in memory, and R@H gets kicked out and restarts a few times before BOINC randomly gives it say two time segments in a row... will the WU reported CPU time be higher than someone that crunches an identical task from an identical seed straight through to completion? What I'm getting at is that I believe the reported CPU time gets reset when the task gets reset, so both of the above scenarios would report similar runtime... yet there seem to be cases where the credit issued varies significantly from prior R@H versions. ...I'm thinking perhaps some of the new science in the new WUs adds more variation to the runtime of each model?? Although that should result in some reports of excessive credit too, which I've not seen. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
R.L. Casey Joined: 7 Jun 06 Posts: 91 Credit: 2,728,885 RAC: 0 |
Question: Thanks for the summary info, Screbilde. If you would unhide your computers, perhaps someone could help you. It would also be good to report the settings you see on your computers under "View Computers". The WUs generate rather varied workloads--even for similar WUs--and the variation you are seeing is likely temporary. |
Gen_X_Accord Joined: 5 Jun 06 Posts: 154 Credit: 279,018 RAC: 0 |
I only run Rosetta, no other project. I run it 24/7, and I do have "leaving the work in memory if interrupted" selected. And you can see my computer, and what the screwy credit on those work units was. (Must be another conspiracy by the Vulcans.) |
R.L. Casey Joined: 7 Jun 06 Posts: 91 Credit: 2,728,885 RAC: 0 |
In reply to Gen_X_Accord ("I only run Rosetta, no other project. I run it 24/7, and I do have 'leaving the work in memory if interrupted' selected. And you can see my computer, and what the screwy credit on those work units was. (Must be another conspiracy by the Vulcans.)"): Thanks for the info, Gen_X_Accord. For possible comparison purposes, I have a WU task result https://boinc.bakerlab.org/rosetta/result.php?resultid=44126512 from the same series of "1hz6A_BOINC_NATIVEJUMPS_SAVE_ALL_OUT__1306_xxxx" WUs that ran on my Intel Celeron M, HostID 246746, that produced 17 decoys in 20402.98 CPU seconds, resulting in Claimed credit 45.3594354091797 and Granted credit 54.4135431404089, about the same (granted) credit as the task result you reported for 27857.09 CPU seconds and 19 decoys. >>>>> Stats for the host: Name: Laptop; Created: 7 Jun 2006 1:25:17 UTC; Total Credit: 22,498.14; Recent average credit: 190.32; CPU type: GenuineIntel Intel(R) Celeron(R) M processor 1.40GHz; Number of CPUs: 1; Operating System: Microsoft Windows XP Home Edition, Service Pack 2, (05.01.2600.00); Memory: 503.37 MB; Cache: 976.56 KB; Swap space: 1226.04 MB; Total disk space: 34.23 GB. >>>>> General Preferences: Separate preferences for school. Processor usage: Do work while computer is running on batteries? (matters only for portable computers): yes; Do work while computer is in use?: yes; Do work only between the hours of: (no restriction); Leave applications in memory while suspended? (suspended applications will consume swap space if 'yes'): no; Switch between applications every (recommended: 60 minutes): 60 minutes; On multiprocessors, use at most: 1 processors; Use at most: 100 percent of CPU time. Disk and memory usage: Use no more than: 100 GB disk space; Leave at least (Values smaller than 0.001 are ignored): 0.001 GB disk space free; Use no more than: 50% of total disk space; Write to disk at most every: 3 seconds; Use no more than: 100% of total virtual memory. Network usage: Connect to network about every (determines size of work cache; maximum 10 days): 1 days; Confirm before connecting to Internet? (matters only if you have a modem, ISDN or VPN connection): no; Disconnect when done? (matters only if you have a modem, ISDN or VPN connection): no; Maximum download rate: no limit; Maximum upload rate: no limit; Use network only between the hours of (Enforced by versions 4.46 and greater): (no restriction); Skip image file verification? (Check this ONLY if your Internet provider modifies image files (UMTS does this, for example). Skipping verification reduces the security of BOINC.): no. >>>>> Rosetta Preferences (at the time the task was run): Separate preferences for school. Resource share (If you participate in multiple BOINC projects, this is the proportion of your resources used by Rosetta@home): 400; Percentage of CPU time used for graphics: not selected; Number of frames per second for graphics: not selected; Target CPU run time: 6 hours. --------------------------------- Free Disk Space: 18.26 GB; Measured floating point speed: 1235.52 million ops/sec; Measured integer speed: 2499.75 million ops/sec; Average upload rate: 0.92 KB/sec; Average download rate: Unknown; Average turnaround time: 1.19 days; Maximum daily WU quota per CPU: 100/day; Results: 88; Number of times client has contacted server: 95; Last time contacted server: 29 Oct 2006 1:39:26 UTC; % of time BOINC client is running: 97.675 %; While BOINC running, % of time work is allowed: 99.5911 %; Average CPU efficiency: 0.944082; Result duration correction factor: 0.597361. ---------------------------- I'd be interested in any other data or analysis that you can provide. Keep crunching Rosetta! :-) And... Live Long and Prosper! |
AMD_is_logical Joined: 20 Dec 05 Posts: 299 Credit: 31,460,681 RAC: 0 |
On a 1.8GHz Duron: 1hz6A_BOINC_NATIVEJUMPS_CLOSE_CHAINBREAKS_VARY_ALL_BOND_ANGLES_ALL_BOND_DISTANCES_SAVE_ALL_OUT__1306_33105; on an Athlon XP 1800+: 1hz6A_BOINC_NATIVEJUMPS_CLOSE_CHAINBREAKS_VARY_ALL_BOND_ANGLES_ALL_BOND_DISTANCES_SAVE_ALL_OUT__1306_41102. Both computers run Linux, both are dedicated 24/7 crunchers running only Rosetta, both have keep in memory set to yes. Both WUs crunched only 2 decoys and only got around 1/4 the usual credit. |
netwraith Joined: 3 Sep 06 Posts: 80 Credit: 13,483,227 RAC: 0 |
My preferences are set to "Keep in memory when preempted". Most of my machines have a GB or more, so, this is not a problem. I am sure that this did assist some WU's, but, not all... Looking for a team ??? Join BoincSynergy!! |
Rhiju Volunteer moderator Joined: 8 Jan 06 Posts: 223 Credit: 3,546 RAC: 0 |
In reply to Feet1st ("Rhiju, regarding the no checkpointing... this could explain why several folks have noticed their RAC declining. Especially if they have several projects running and switch between projects every... I think the default is 60min (i.e. NOT enough time to complete a model with many of the new WUs)."): Thanks for your replies everyone -- this is definitely puzzling, especially because so many of the problem clients are leaving the app in memory. I'll have to talk to David Kim on Monday about whether there may be an issue in how credit is assigned. In the meanwhile, I'll be starting tests on ralph tomorrow (Sunday) on some ways to accelerate the new kind of science workunits. For your information, these new workunits allow the bond lengths and angles to move like they do in real proteins -- the additional degrees of freedom result in some computational cost, but appear to produce much better looking models. The new ralph app and implementation should increase the speed by two to three fold... let's see how it goes! |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
Is it possible that when a WU errors out (or is aborted) and reports back, that it gets averaged into the credit per model as a zero? And then during the nightly run is granted some credit for the failure, which is not added back in to the running average claim? Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
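
A minimal sketch of the credit rule as River~~ describes it in this thread: for one WU type, granted credit is the number of decoys times a per-decoy value averaged from other hosts' claims. The function names and the plain arithmetic mean are assumptions made for illustration; the thread does not show how the server actually computes its average. The figures in the last lines are the ones netwraith posted for the two 1hz6A tasks.

```python
from statistics import mean


def credit_per_decoy(reports):
    """Average claimed credit per decoy for one WU type.

    `reports` is a list of (claimed_credit, decoys) pairs from earlier
    results. Assumption: a plain mean of per-result ratios; the real
    server-side average may be weighted or computed differently.
    """
    ratios = [claimed / decoys for claimed, decoys in reports if decoys > 0]
    if not ratios:
        raise ValueError("need at least one report with decoys > 0")
    return mean(ratios)


def granted_credit(decoys, per_decoy):
    """Credit granted to a result: proportional to the decoys it produced."""
    return decoys * per_decoy


def credit_per_hour(granted, cpu_seconds):
    """Granted credit normalised by CPU time, for comparing hosts."""
    return granted / (cpu_seconds / 3600.0)


# Figures posted by netwraith in this thread (granted credit, CPU seconds).
amd_rate = credit_per_hour(101.790368563704, 48402.95)
intel_rate = credit_per_hour(9.82971861593104, 33413.59)
print(f"AMD:   {amd_rate:.2f} credits/hour")
print(f"Intel: {intel_rate:.2f} credits/hour")
```

Under this model, a host that produces fewer decoys in the same CPU time earns proportionally less credit per hour, so comparing credit per hour across hosts is a quick way to spot the kind of gap reported here.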
©2024 University of Washington
https://www.bakerlab.org