Low Scores Anyone?

Message boards : Number crunching : Low Scores Anyone?

Keith Akins

Send message
Joined: 22 Oct 05
Posts: 176
Credit: 71,779
RAC: 0
Message 30124 - Posted: 27 Oct 2006, 18:00:23 UTC

v5.34 is a much more stable release than 5.32. However, I'm confused by the reduction in granted credit. My last two completed WUs scored in the mid 40s, where I used to average 75 per 8-hour WU.

Dell D-300 2.8GHz P4 1 GB 333MHz DDR-2 Ram.
ID: 30124 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile River~~
Avatar

Send message
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 30131 - Posted: 27 Oct 2006, 19:09:28 UTC


No thanks, I just had one.

;-)
ID: 30131 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile dag
Avatar

Send message
Joined: 16 Dec 05
Posts: 106
Credit: 1,000,020
RAC: 0
Message 30142 - Posted: 27 Oct 2006, 20:27:15 UTC

How much L1/L2 cache do you have? I've seen some whopping big WUs recently.
dag
--Finding aliens is cool, but understanding the structure of proteins is useful.
ID: 30142 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Keith Akins

Send message
Joined: 22 Oct 05
Posts: 176
Credit: 71,779
RAC: 0
Message 30143 - Posted: 27 Oct 2006, 20:48:09 UTC

According to memtest86, I have 16k L1 and 1 meg L2. I also checked the memory footprint in Task Manager, and I've not noticed anything larger than what ran during CASP 7. Some of those were 300 to 400 AAs long.

ID: 30143 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
BennyRop

Send message
Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 30145 - Posted: 27 Oct 2006, 20:51:31 UTC

Someone last night mentioned turning in a few WUs that were granted much lower credit, and blamed it on 5.34. They were getting around what they asked for under 5.34 until they started running WUs similar to this:
1hz6A_BOINC_NATIVEJUMPS_CLOSE_CHAINBREAKS_VARY_ALL_BOND_ANGLES_SAVE_ALL_OUT__1306_8456_0
and then they were suddenly getting 1/3rd of their claimed credit. You're doing much better in that regard, getting 44 and 48 when asking for about 60. (Although one didn't get to the 25k second range.)

With the large range of problems with your WinXP machine, would you mind trying to diagnose them? Rule out hardware problems on your end by testing the RAM with Memtest86+ and running some type of HD check (Windows checkdisk is a start, but the diagnostics from your HD's manufacturer would be a better second test). If those don't help identify the problem, perhaps try turning off the Rosetta screensaver to see whether it is contributing to the problems on your system. (By eating up more RAM than usual, it may be causing the main Rosetta application to use defective RAM that hasn't been used on other projects or on WUs that didn't have an error.)
ID: 30145 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Keith Akins

Send message
Joined: 22 Oct 05
Posts: 176
Credit: 71,779
RAC: 0
Message 30146 - Posted: 27 Oct 2006, 21:02:33 UTC

I ran memtest86 for several hours, completing over 60 passes, and everything passed. That rules out memory. As for the hard drive, the earlier versions of Rosetta didn't seem to use the page file. However, with 5.32 I did notice some page-file swapping going on. Windows normally experiences a majority of its page faults with page-file swapping. That's unusual considering 1GB of RAM.

I'll see how the next units perform and then check the HD.

By the way thanks for the help.
ID: 30146 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Gen_X_Accord
Avatar

Send message
Joined: 5 Jun 06
Posts: 154
Credit: 279,018
RAC: 0
Message 30167 - Posted: 28 Oct 2006, 7:37:26 UTC

My granted credits are lower too, Keith; you can see it in my results and granted credit section. I used to get 60s and 70s granted for 8 hours; now it is in the 40s and 50s. In my results chart you can see when it happened too: right after a computational error on one work unit on Oct. 24. My RAC has suffered for it ever since. Can this be fixed???
ID: 30167 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile adrianxw
Avatar

Send message
Joined: 18 Sep 05
Posts: 653
Credit: 11,840,739
RAC: 28
Message 30172 - Posted: 28 Oct 2006, 8:54:04 UTC

As I understand it, the credit you get is proportional to the number of decoys you produce. Typically this machine, with 18-hour WUs, produces dozens of decoys per WU, sometimes over a hundred. A couple of low-scoring WUs recently have produced only 5-6 decoys.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 30172 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile River~~
Avatar

Send message
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 30175 - Posted: 28 Oct 2006, 11:37:15 UTC - in response to Message 30172.  
Last modified: 28 Oct 2006, 11:41:13 UTC

For a given type of WU, credit is proportional to the number of decoys.

The credit per decoy varies between WU of differing types as determined by an average of other people's runs on WU of that type (more info here). If your credit per hour has dropped it means either

a) your hardware is not so efficient on these runs as on previous runs (efficient as compared with the mythical average machine). For example the new run might have overspilled your level1 cache whereas the previous runs did not.

or

b) the set of machines over which the average was determined has improved in its performance as compared to yours, due to a different set of machines being involved

or

c) the set of machines over which the average was determined has improved in its performance as compared to yours, due to the new run fitting the average hardware rather better than previous types of WU (e.g. new runs no longer overspill a typical level1 cache where previous WUs did)

or

d) someone has made a mistake.

In general I'd rate (a) more likely than (b), both more likely than (c), and all more likely than (d). All are possible, one at a time or in any mix.
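
To make the scheme concrete, here is a minimal sketch of "credit proportional to decoys, at a per-WU-type rate averaged from other people's runs". The thread does not spell out how the project actually computes the average, so the plain mean, the function names and the sample claims below are all assumptions for illustration only.

```python
from statistics import mean

def average_credit_per_decoy(reports):
    """Hypothetical averaging: plain mean of per-decoy claims.

    reports: list of (claimed_credit, decoys) pairs from other hosts
    that ran the same type of WU. Results with no decoys are skipped.
    """
    return mean(claimed / decoys for claimed, decoys in reports if decoys > 0)

def granted_credit(decoys, credit_per_decoy):
    """For a given WU type, credit is proportional to the number of decoys."""
    return decoys * credit_per_decoy

# Made-up claims from three hosts on the same WU type.
reports = [(60.0, 20), (45.0, 15), (90.0, 30)]
rate = average_credit_per_decoy(reports)
print("credit per decoy:", rate)
print("granted for 10 decoys:", granted_credit(10, rate))
```

Under this sketch, a host that is slower than the average machine on a particular WU type turns in fewer decoys per hour and so earns less per hour, which is case (a) above.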

River~~
ID: 30175 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile netwraith
Avatar

Send message
Joined: 3 Sep 06
Posts: 80
Credit: 13,483,227
RAC: 0
Message 30184 - Posted: 28 Oct 2006, 16:42:26 UTC
Last modified: 28 Oct 2006, 17:07:10 UTC

Comparison anyone ???

Before I start... I am neutral when it comes to AMD vs. Intel, but... (with numbers like this, I would trade in all my Xeons if I only did number crunching. I actually have about a 4-to-1 core ratio, Xeon to AMD.)

A comparison of two of my systems.

Qualifiers: I chose these by the 1hz6A name and by the wildly diverse claim/grant scores. I was mildly pleased that the clock rates for the two systems were very similar, so that AMD's performance numbers were not part of the comparison. The AMD also predates the performance-number ratings and is rated solely on its clock rate. It could be that the large L1 on the AMD is the deciding factor. I am certain that the extra cache stages on the Intel cause a hefty performance penalty, but they should not be the cause of the difference. (The function of the L4 is mostly to speed up access to PC-133 memory, saving the purchase cost of DDR.)

The AMD job exceeded preferred run time, so, while not ignoring the factor, I am not weighing it heavily either. The Intel would not have been helped much by equal time.

Both systems are CentOS 4.4 and both are using appropriate models of kernel: 2.6.9-42.0.3.EL. System A is a local SCSI drive, System B is an FCAL/CLARiiON raid-1. For the sake of this, the disk read/write speed is similar.

System A -- AMD Athlon TBIRD Uni-Processor @ 1533MHZ
System B -- Intel XEON MP @ 1500MHZ (one of 4 physical/8 logical)

System A -- 64kI/64kD 256K-L2
System B -- 12kI/8kD 512K-L2 1024K-L3 32MB-L4

Metric                System A (AMD)      System B (Intel)
------------------------------------------------------------
Actual time (s)       48402.95            33413.59
Claimed credit        59.7020724185046    12.3420001842521
Granted credit        101.790368563704    9.82971861593104
Decoys generated      30                  1
Preferred time (s)    43200               43200
WU suffix             1306_38409_0        1306_34283_0


System A -- Rosetta id's and name.
https://boinc.bakerlab.org/rosetta/result.php?resultid=44225313
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=39021279
1hz6A_BOINC_NATIVEJUMPS_CLOSE_CHAINBREAKS_VARY_ALL_BOND_DISTANCES_SAVE_ALL_OUT__1306_38409_0

System B -- Rosetta id's and name.
https://boinc.bakerlab.org/rosetta/result.php?resultid=44199022
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=38996524
1hz6A_BOINC_NATIVEJUMPS_CLOSE_CHAINBREAKS_VARY_ALL_BOND_DISTANCES_SAVE_ALL_OUT__1306_34283_0
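
For anyone who wants to compare their own results the same way, the figures above can be normalised to granted credit per CPU hour and per decoy. This is only a small arithmetic sketch; the helper names are mine, and the numbers are the ones posted for these two results.

```python
def credit_per_hour(granted, cpu_seconds):
    """Granted credit divided by CPU time expressed in hours."""
    return granted / (cpu_seconds / 3600.0)

def credit_per_decoy(granted, decoys):
    """Granted credit divided by the number of decoys returned."""
    return granted / decoys

# Figures taken from the two results listed above.
systems = {
    "A (AMD)":   {"granted": 101.790368563704, "cpu_s": 48402.95, "decoys": 30},
    "B (Intel)": {"granted": 9.82971861593104, "cpu_s": 33413.59, "decoys": 1},
}

for name, s in systems.items():
    per_hour = credit_per_hour(s["granted"], s["cpu_s"])
    per_decoy = credit_per_decoy(s["granted"], s["decoys"])
    print(f"System {name}: {per_hour:.2f} credits/hour, {per_decoy:.2f} credits/decoy")
```

On these two results the AMD earns roughly seven times the granted credit per CPU hour of the Intel, even though the Intel is granted more per decoy.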

Looking for a team ??? Join BoincSynergy!!


ID: 30184 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Mac-Nic

Send message
Joined: 6 Jul 06
Posts: 7
Credit: 50,523
RAC: 0
Message 30198 - Posted: 28 Oct 2006, 22:53:28 UTC
Last modified: 28 Oct 2006, 23:24:03 UTC

Question:
How do you know when there's something wrong with the project?

Answer:
When an AMD Duron 1200 on a cheap all-in-one (even the CPU) mainboard with PC133 RAM gets more credits/hour
than a 2.8GHz FSB533 Northwood on a decent mainboard with DDR400 RAM.

BTW. I have no problems with other projects.
Both WIN98 SE machines running 24/7 only with the bare minimum processes(10) Boinc inclusive.

Goodbye for now; I'll be back when those problems are gone.
ID: 30198 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Rhiju
Volunteer moderator

Send message
Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 30203 - Posted: 29 Oct 2006, 1:13:03 UTC - in response to Message 30146.  

Keith and others:

Thanks for bringing this to our attention ... I've been looking into it a bit. The work units like:

1hz6A_BOINC_NATIVEJUMPS_CLOSE_CHAINBREAKS_VARY_ALL_BOND_ANGLES_SAVE_ALL_OUT__1306_8456_0
1hz6A_BOINC_NATIVEJUMPS_CLOSE_CHAINBREAKS_VARY_ALL_BOND_ANGLES_ALL_BOND_DISTANCES_SAVE_ALL_OUT__

are the ones that are producing the puzzling low credits. These are computationally intensive workunits -- so the average credit per decoy is significantly higher. In principle, that should translate into a similar credit for each workunit, but that's obviously not what you're reporting.

First of all, I'm canceling any more of these workunits, until we figure this out.

Second, I've got a hunch as to what might be going on. This kind of workunit does not have checkpointing. So if your client switches to another project (or the screensaver is interrupted), it has to start from scratch unless your user preference is to "Keep in Memory when Preempted". Can you post here whether you are "keeping in memory" or not? Thanks!
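
To illustrate the hunch, here is a toy model of what missing checkpoints could do to the decoy count. It is a sketch under simplifying assumptions that are not confirmed for the real application: every decoy takes the same fixed CPU time, finished decoys are kept, and only the decoy in progress is lost when the task is swapped out without being kept in memory. The function name and the sample timings are hypothetical.

```python
def decoys_completed(total_cpu_s, decoy_s, slice_s, keep_in_memory):
    """Count decoys finished in total_cpu_s seconds of CPU time.

    decoy_s: CPU seconds needed per decoy (assumed constant).
    slice_s: CPU seconds the task gets before it is preempted.
    keep_in_memory: if True, preemption loses no work.
    """
    if keep_in_memory:
        return int(total_cpu_s // decoy_s)
    # No checkpointing: work on the unfinished decoy is discarded
    # at the end of every time slice.
    full_slices, remainder = divmod(total_cpu_s, slice_s)
    return int(full_slices * (slice_s // decoy_s) + remainder // decoy_s)

eight_hours = 8 * 3600
for decoy_s in (1500, 4000):
    kept = decoys_completed(eight_hours, decoy_s, 3600, keep_in_memory=True)
    lost = decoys_completed(eight_hours, decoy_s, 3600, keep_in_memory=False)
    print(f"{decoy_s}s per decoy: {kept} decoys kept in memory, {lost} otherwise")
```

In this model, a decoy that takes longer than one time slice never finishes at all unless the app stays in memory, which is the worst case described above.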

ID: 30203 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Feet1st
Avatar

Send message
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 30206 - Posted: 29 Oct 2006, 1:24:46 UTC

Rhiju, regarding the no checkpointing... this could explain why several folks have noticed their RAC declining. Especially if they have several projects running and switch between projects every... I think the default is 60min (i.e. NOT enough time to complete a model with many of the new WUs).

But if I don't keep the app in memory, and R@H gets kicked out and restarts a few times before BOINC randomly gives it, say, two time segments in a row, will the WU's reported CPU time be higher than that of someone who crunches an identical task from an identical seed straight through to completion?

What I'm getting at is that I believe the reported CPU time gets reset when the task gets reset, so both of the above scenarios would report similar runtime... yet there seem to be cases where the credit issued varies significantly from prior R@H versions. ...I'm thinking perhaps some of the new science in the new WUs adds more variation to the runtime of each model?? Although that should result in some reports of excessive credit too, which I've not seen.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 30206 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
R.L. Casey

Send message
Joined: 7 Jun 06
Posts: 91
Credit: 2,728,885
RAC: 0
Message 30208 - Posted: 29 Oct 2006, 2:08:12 UTC - in response to Message 30198.  

Thanks for the summary info, Screbilde. If you would unhide your computers, perhaps someone could help you. It would also be good to report the settings you see on your computers under "View Computers". The WUs generate rather varied workloads, even for similar WUs, and the variation you are seeing is likely temporary.
ID: 30208 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Gen_X_Accord
Avatar

Send message
Joined: 5 Jun 06
Posts: 154
Credit: 279,018
RAC: 0
Message 30209 - Posted: 29 Oct 2006, 2:28:00 UTC
Last modified: 29 Oct 2006, 2:29:54 UTC

I only run Rosetta, no other project. I run it 24/7, and I do have "leaving the work in memory if interrupted" selected. You can see my computer, and what the screwy credit on those work units was. (Must be another conspiracy by the Vulcans.)
ID: 30209 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
R.L. Casey

Send message
Joined: 7 Jun 06
Posts: 91
Credit: 2,728,885
RAC: 0
Message 30211 - Posted: 29 Oct 2006, 3:52:28 UTC - in response to Message 30209.  
Last modified: 29 Oct 2006, 3:54:45 UTC

Thanks for the info, Gen_X_Accord.
For possible comparison purposes, I have a WU task result https://boinc.bakerlab.org/rosetta/result.php?resultid=44126512 from the same series of "1hz6A_BOINC_NATIVEJUMPS_SAVE_ALL_OUT__1306_xxxx" WUs that ran on my Intel Celeron M, HostID 246746, that produced 17 decoys in 20402.98 CPU seconds, resulting in
Claimed credit 45.3594354091797 and
Granted credit 54.4135431404089,
about the same (granted) credit as the task result you reported for 27857.09 CPU seconds and 19 decoys.

>>>>> Stats for the host:
Name: Laptop
Created: 7 Jun 2006 1:25:17 UTC
Total Credit: 22,498.14
Recent average credit: 190.32
CPU type: GenuineIntel, Intel(R) Celeron(R) M processor 1.40GHz
Number of CPUs: 1
Operating System: Microsoft Windows XP Home Edition, Service Pack 2, (05.01.2600.00)
Memory: 503.37 MB
Cache: 976.56 KB
Swap space: 1226.04 MB
Total disk space: 34.23 GB

>>>>> General Preferences:
Separate preferences for school

Processor usage
Do work while computer is running on batteries? (matters only for portable computers): yes
Do work while computer is in use?: yes
Do work only between the hours of: (no restriction)
Leave applications in memory while suspended? (suspended applications will consume swap space if 'yes'): no
Switch between applications every (recommended: 60 minutes): 60 minutes
On multiprocessors, use at most: 1 processors
Use at most: 100 percent of CPU time

Disk and memory usage
Use no more than: 100 GB disk space
Leave at least (values smaller than 0.001 are ignored): 0.001 GB disk space free
Use no more than: 50% of total disk space
Write to disk at most every: 3 seconds
Use no more than: 100% of total virtual memory

Network usage
Connect to network about every (determines size of work cache; maximum 10 days): 1 days
Confirm before connecting to Internet? (matters only if you have a modem, ISDN or VPN connection): no
Disconnect when done? (matters only if you have a modem, ISDN or VPN connection): no
Maximum download rate: no limit
Maximum upload rate: no limit
Use network only between the hours of (enforced by versions 4.46 and greater): (no restriction)
Skip image file verification? (Check this ONLY if your Internet provider modifies image files; UMTS does this, for example. Skipping verification reduces the security of BOINC.): no

>>>>> Rosetta Preferences (at the time the task was run):
Separate preferences for school
Resource share (if you participate in multiple BOINC projects, this is the proportion of your resources used by Rosetta@home): 400
Percentage of CPU time used for graphics: not selected
Number of frames per second for graphics: not selected
Target CPU run time: 6 hours

---------------------------------

Free Disk Space: 18.26 GB
Measured floating point speed: 1235.52 million ops/sec
Measured integer speed: 2499.75 million ops/sec
Average upload rate: 0.92 KB/sec
Average download rate: Unknown
Average turnaround time: 1.19 days
Maximum daily WU quota per CPU: 100/day
Results: 88
Number of times client has contacted server: 95
Last time contacted server: 29 Oct 2006 1:39:26 UTC
% of time BOINC client is running: 97.675 %
While BOINC running, % of time work is allowed: 99.5911 %
Average CPU efficiency: 0.944082
Result duration correction factor: 0.597361
----------------------------

I'd be interested in any other data or analysis that you can provide.

Keep crunching Rosetta! :-)

And... Live Long and Prosper!
ID: 30211 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
AMD_is_logical

Send message
Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 30213 - Posted: 29 Oct 2006, 4:41:39 UTC
Last modified: 29 Oct 2006, 4:42:00 UTC

On a 1.8GHz Duron
1hz6A_BOINC_NATIVEJUMPS_CLOSE_CHAINBREAKS_VARY_ALL_BOND_ANGLES_ALL_BOND_DISTANCES_SAVE_ALL_OUT__1306_33105

On an Athlon XP 1800+
1hz6A_BOINC_NATIVEJUMPS_CLOSE_CHAINBREAKS_VARY_ALL_BOND_ANGLES_ALL_BOND_DISTANCES_SAVE_ALL_OUT__1306_41102

Both computers run Linux, both are dedicated 24/7 crunchers running only Rosetta, both have keep in memory set to yes.

Both WUs crunched only 2 decoys and only got around 1/4 the usual credit.
ID: 30213 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile netwraith
Avatar

Send message
Joined: 3 Sep 06
Posts: 80
Credit: 13,483,227
RAC: 0
Message 30214 - Posted: 29 Oct 2006, 4:48:41 UTC - in response to Message 30203.  


My preferences are set to "Keep in memory when preempted". Most of my machines have a GB or more, so this is not a problem. I am sure that this did assist some WUs, but not all.

Looking for a team ??? Join BoincSynergy!!


ID: 30214 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Rhiju
Volunteer moderator

Send message
Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 30217 - Posted: 29 Oct 2006, 7:21:05 UTC - in response to Message 30206.  

Thanks for your replies everyone -- this is definitely puzzling, especially because so many of the problem clients are leaving the app in memory. I'll have to talk to David Kim on Monday about whether there may be an issue in how credit is assigned.

In the meanwhile, I'll be starting tests on ralph tomorrow (Sunday) on some ways to accelerate the new kind of science workunits. For your information, these new workunits allow the bond lengths and angles to move like they do in real proteins -- the additional degrees of freedom result in some computational cost, but appear to produce much better looking models. The new ralph app and implementation should increase the speed by two to three fold... let's see how it goes!
ID: 30217 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Feet1st
Avatar

Send message
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 30229 - Posted: 29 Oct 2006, 15:41:56 UTC

Is it possible that when a WU errors out (or is aborted) and reports back, that it gets averaged into the credit per model as a zero? And then during the nightly run is granted some credit for the failure, which is not added back in to the running average claim?
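
A quick sketch of why that would matter, purely to illustrate the question: I don't know how the server actually maintains its average, and the function name and per-decoy claims below are made up. If failed results entered the running average as zeros, the credit per model would be pulled down for everyone on that WU type.

```python
def running_average(claims_per_decoy):
    """Plain mean of the per-decoy claims seen so far (hypothetical scheme)."""
    return sum(claims_per_decoy) / len(claims_per_decoy)

# Made-up per-decoy claims from four successful results.
good_results = [3.0, 3.2, 2.8, 3.0]
# The same results plus two failed WUs counted as zero claims.
with_failures = good_results + [0.0, 0.0]

print("successes only:      ", running_average(good_results))
print("failures counted as 0:", running_average(with_failures))
```

With these made-up numbers, two zeros among six results cut the average by a third.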
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 30229 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote


©2024 University of Washington
https://www.bakerlab.org