Something wrong with Server-Side-Scheduler

Message boards : Number crunching : Something wrong with Server-Side-Scheduler

Yeti
Joined: 2 Nov 05
Posts: 45
Credit: 14,945,062
RAC: 0
Message 5939 - Posted: 12 Dec 2005, 9:52:35 UTC
Last modified: 12 Dec 2005, 9:53:55 UTC

Hi,

There must be something wrong with the server-side scheduler. I reactivated an old client and attached it to Rosetta, but didn't get any work.

While I was wondering what the problem could be, the box contacted Predictor@Home, and there I got the explanation: not enough free disk space on the client. I think that message should have come from Rosetta as well...

Below are the relevant entries from the Messages tab:

12/12/2005 10:29:36|rosetta@home|Successfully attached to rosetta@home
12/12/2005 10:30:28||request_reschedule_cpus: project op
12/12/2005 10:30:31|LHC@home|Sending scheduler request to http://lhcathome-sched1.cern.ch/scheduler/cgi
12/12/2005 10:30:31|LHC@home|Reason: To fetch work
...
12/12/2005 10:30:36|LHC@home|No work from project
12/12/2005 10:30:49||request_reschedule_cpus: project op
12/12/2005 10:30:51||request_reschedule_cpus: project op
12/12/2005 10:30:54||request_reschedule_cpus: project op
12/12/2005 10:31:20||request_reschedule_cpus: project op
12/12/2005 10:31:22|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
12/12/2005 10:31:22|rosetta@home|Reason: Requested by user
12/12/2005 10:31:22|rosetta@home|Requesting 86400 seconds of new work
12/12/2005 10:31:27|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi succeeded
12/12/2005 10:34:05||request_reschedule_cpus: project op
12/12/2005 10:34:07|ProteinPredictorAtHome|Sending scheduler request to http://predictor.scripps.edu/predictor_cgi/cgi
12/12/2005 10:34:07|ProteinPredictorAtHome|Reason: To fetch work
12/12/2005 10:34:07|ProteinPredictorAtHome|Requesting 86400 seconds of new work
12/12/2005 10:34:17|ProteinPredictorAtHome|Scheduler request to http://predictor.scripps.edu/predictor_cgi/cgi succeeded
12/12/2005 10:34:17|Predictor @ Home|Message from server: No work sent
12/12/2005 10:34:17|Predictor @ Home|Message from server: (there was work but you don't have enough disk space allocated)
12/12/2005 10:34:17|Predictor @ Home|Message from server: No disk space (YOU must free 323.8 MB before BOINC gets space). Review preferences for minimum disk free space allowed.
12/12/2005 10:34:17|Predictor @ Home|No work from project
12/12/2005 10:35:33|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
12/12/2005 10:35:33|rosetta@home|Reason: To fetch work
12/12/2005 10:35:33|rosetta@home|Requesting 86400 seconds of new work
12/12/2005 10:35:38|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi succeeded
12/12/2005 10:35:38|rosetta@home|No work from project
12/12/2005 10:39:44|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
12/12/2005 10:39:44|rosetta@home|Reason: To fetch work
12/12/2005 10:39:44|rosetta@home|Requesting 86400 seconds of new work
12/12/2005 10:39:54|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi succeeded
12/12/2005 10:39:55|rosetta@home|Started download of rosetta_4.80_windows_intelx86.exe
...


Supporting BOINC, a great concept !

Yeti
Joined: 2 Nov 05
Posts: 45
Credit: 14,945,062
RAC: 0
Message 5942 - Posted: 12 Dec 2005, 10:28:33 UTC

Follow up:

I attached one more box, and again it didn't get any work!

This box has enough free disk space; the resource share is configured so that Rosetta gets nearly 49%, Predictor@Home only 1%, and LHC@Home nearly 49%.

12/12/2005 11:17:01|rosetta@home|Successfully attached to rosetta@home
12/12/2005 11:17:19||request_reschedule_cpus: project op
12/12/2005 11:17:22|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
12/12/2005 11:17:22|rosetta@home|Reason: Requested by user
12/12/2005 11:17:22|rosetta@home|Requesting 172800 seconds of new work
12/12/2005 11:17:27|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi succeeded
12/12/2005 11:18:45||request_reschedule_cpus: project op
12/12/2005 11:18:47|LHC@home|Sending scheduler request to http://lhcathome-sched1.cern.ch/scheduler/cgi
12/12/2005 11:18:47|LHC@home|Reason: To fetch work
12/12/2005 11:18:47|LHC@home|Requesting 172800 seconds of new work
12/12/2005 11:18:52|LHC@home|Scheduler request to http://lhcathome-sched1.cern.ch/scheduler/cgi succeeded
12/12/2005 11:18:52|LHC@home|No work from project
12/12/2005 11:19:07||request_reschedule_cpus: project op
12/12/2005 11:19:07|ProteinPredictorAtHome|Sending scheduler request to http://predictor.scripps.edu/predictor_cgi/cgi
12/12/2005 11:19:07|ProteinPredictorAtHome|Reason: To fetch work
12/12/2005 11:19:07|ProteinPredictorAtHome|Requesting 172800 seconds of new work
12/12/2005 11:19:17|ProteinPredictorAtHome|Scheduler request to http://predictor.scripps.edu/predictor_cgi/cgi succeeded
12/12/2005 11:19:18|Predictor @ Home|Started download of bprion_4_68243.ini
...



Supporting BOINC, a great concept !

Webmaster Yoda
Joined: 17 Sep 05
Posts: 161
Credit: 162,253
RAC: 0
Message 5943 - Posted: 12 Dec 2005, 11:02:50 UTC - in response to Message 5942.  
Last modified: 12 Dec 2005, 11:06:14 UTC

This box has enough disk space free; the ResourceShare is configured, that Rosetta gets nearly 49%, Predictor@Home only 1%, LHC@Home nearly 49%.


And what are the "Disk and memory usage" settings? If they are lower than what the projects need (taking into account what you might already have allocated), BOINC won't download more work as it thinks there won't be enough space.

As it says in one of the message logs you posted: "Message from server: No disk space (YOU must free 323.8 MB before BOINC gets space). Review preferences for minimum disk free space allowed." You may have, say, 30 GB free, but if BOINC is set to use no more than 1% of the disk, 323.8 MB isn't going to fit in the space "reserved" for BOINC.
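Webmaster Yoda's point is that a download is checked against the space the preferences reserve for BOINC, not against the raw free space on the disk. The Python sketch below is a simplified model of that check. It assumes the disk preferences act as three limits (an absolute GB cap, a minimum free-space floor, and a percentage of the total disk) and that the tightest one wins. The function names, the 30 GB example box, and the 400 MB work-unit size are hypothetical. This is not BOINC's actual client code.

```python
GB = 1024 ** 3
MB = 1024 ** 2


def boinc_disk_headroom(total_disk, free_disk, boinc_used,
                        max_used_gb, min_free_gb, max_used_pct):
    """Bytes BOINC may still fill, under an assumed three-limit model (hypothetical)."""
    limit_absolute = max_used_gb * GB                  # "use at most X GB"
    limit_percent = total_disk * max_used_pct / 100.0  # "use at most X% of the disk"
    # "leave at least X GB free": BOINC may grow only until free space hits min_free_gb
    limit_min_free = boinc_used + free_disk - min_free_gb * GB
    allowed_total = min(limit_absolute, limit_percent, limit_min_free)
    return max(0.0, allowed_total - boinc_used)


def work_fits(needed_bytes, **prefs):
    """True if a download of needed_bytes fits in the space reserved for BOINC."""
    return needed_bytes <= boinc_disk_headroom(**prefs)


if __name__ == "__main__":
    # Hypothetical box: 30 GB disk, 20 GB free, BOINC currently using nothing.
    box = dict(total_disk=30 * GB, free_disk=20 * GB, boinc_used=0,
               max_used_gb=10, min_free_gb=0.1)
    need = 400 * MB  # hypothetical work-unit footprint
    print(work_fits(need, max_used_pct=1, **box))  # False: 1% of 30 GB is ~307 MB
    print(work_fits(need, max_used_pct=5, **box))  # True: 5% of 30 GB is ~1.5 GB
```

Plenty of free disk space does not help if any single limit is tighter than the download. That is why nudging the percentage or GB settings, as suggested later in the thread, can make the problem go away.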


*** Join BOINC@Australia today ***

Yeti
Joined: 2 Nov 05
Posts: 45
Credit: 14,945,062
RAC: 0
Message 5945 - Posted: 12 Dec 2005, 11:06:06 UTC - in response to Message 5943.  
Last modified: 12 Dec 2005, 11:07:24 UTC

This box has enough disk space free; the ResourceShare is configured, that Rosetta gets nearly 49%, Predictor@Home only 1%, LHC@Home nearly 49%.


And what are the "Disk and memory usage" settings? If they are lower than what the projects need (taking into account what you might already have allocated), BOINC won't download more work as it thinks there won't be enough space.

As it says in one of the message logs you posted: "Review preferences for minimum disk free space allowed."


The box has roughly 50 GB free. It immediately downloaded WUs from Predictor, and by now it has also received work from Rosetta.

[Edit]The logs are from two different boxes; one really did have problems with disk space, the other did not![/Edit]


Supporting BOINC, a great concept !

Tern
Joined: 25 Oct 05
Posts: 576
Credit: 4,695,450
RAC: 5
Message 5946 - Posted: 12 Dec 2005, 11:12:07 UTC

The newer V5 versions of BOINC calculate disk space a bit differently and are more 'strict' than the V4s. Generally the problem is in the "leave x% free" setting, IF there really is a problem. Sometimes simply going to that part of the preferences on the website, changing "something", and changing it right back will cause BOINC to start working; the actual defaults, when no change has been made, apparently don't match the defaults displayed. It's a BOINC-level problem; it hits all projects equally.

So... my advice is to go to the preferences page and increase the % allowed by a bit, and if that doesn't solve it, increase the GB allowed by a fraction and try again. It's asking for a bit over 300MB, but I doubt it'll actually use anywhere near that.

Looking at the log, I think it _is_ for Predictor that you'll need to do this, not Rosetta.

The "No work from project" message from Rosetta means that none is being sent out at the moment, not that you don't have room for it. I've seen it myself: it is generally temporary, lasting only a few minutes, regardless of what the server status page shows as ready to send.

Yeti
Joined: 2 Nov 05
Posts: 45
Credit: 14,945,062
RAC: 0
Message 5947 - Posted: 12 Dec 2005, 11:22:04 UTC - in response to Message 5946.  
Last modified: 12 Dec 2005, 11:23:00 UTC

The newer V5 versions of BOINC calculate disk space a bit differently, and are more 'strict' than the V4's. Generally the problem is in the "leave x% free" setting - IF there really is a problem. Sometimes simply going to that part of the preferences on the website and changing "something", and changing it right back, will cause BOINC to start working - the actual defaults when no change has been made apparently don't match the defaults displayed. It's a BOINC-level problem, it hits all the projects equally.

You are right, sometimes there are problems like this, but that can't be the problem here, because Predictor work was downloaded successfully while nothing came from Rosetta (at first).
The message of "no work from project" from Rosetta implies none is being sent out at present, not that you don't have room for it; this generally is temporary, lasting only a few minutes, I've seen it myself, regardless of what the server status page shows as ready to send.

Getting no work from Rosetta isn't a real problem for me; I'm attached to enough projects that my clients always have something to do. But at the end of this week, when FAD finally closes and a lot of people come over, there will be a problem if they are attached only to Rosetta...

Supporting BOINC, a great concept !

Tern
Joined: 25 Oct 05
Posts: 576
Credit: 4,695,450
RAC: 5
Message 5953 - Posted: 12 Dec 2005, 11:45:02 UTC - in response to Message 5947.  

...this generally is temporary, lasting only a few minutes...

I don't have a real problem with getting no work from Rosetta; I have attached enough projects, so my clients always have something todo. But if at the end of this week, when FAD finally closes and a lot of people coming over, there will be a problem, if they are only attached to Rosetta ...


BOINC will retry if it doesn't get work, and if it's attached to only one project, it will retry quite often. I keep my cache setting at 0.25 to 0.5 days on my systems, much lower than many do, and I have _never_ run out of Rosetta work. The only reason I know I've ever gotten the message at all is that I tend to look at my Messages tab a lot, and I've seen "no work from project" followed just a few minutes later by a couple more results arriving. But there were always at least a couple already queued, running or ready to run, so it has never come close to costing a CPU second on my machines.
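The retries Tern describes show up in client logs as "Deferring communication with project for ... minutes and ... seconds". Growing, randomized waits like these are commonly implemented as randomized exponential backoff. The Python sketch below models that technique. The base delay, the cap, the doubling rule, and the function names are all hypothetical. They are not BOINC's real parameters or formula.

```python
import random


def backoff_delays(failures, base_s=60.0, cap_s=4 * 3600.0, rng=None):
    """Hypothetical randomized exponential backoff (not BOINC's actual formula).

    After the n-th consecutive "No work from project" reply, wait a random time
    between half and all of min(cap_s, base_s * 2**n) seconds.
    """
    rng = rng if rng is not None else random.Random()
    delays = []
    for n in range(failures):
        ceiling = min(cap_s, base_s * 2 ** n)
        # Jitter spreads retries out so that many clients don't hit the
        # scheduler at the same instant.
        delays.append(rng.uniform(ceiling / 2, ceiling))
    return delays


def fmt(seconds):
    """Format a delay the way the client log phrases it."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes} minutes and {secs} seconds"


if __name__ == "__main__":
    for n, d in enumerate(backoff_delays(8, rng=random.Random(42)), start=1):
        print(f"failure {n}: deferring communication for {fmt(d)}")
```

The cap keeps a client from going quiet for too long. The doubling keeps a crowd of idle clients from hammering an already struggling server.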

When _initially_ connecting, as you were, it's noticeable. I'm not sure what the problem is, possibly something in the feeder, and someone should probably look into it - but I don't think it's anything critical.

Yeti
Joined: 2 Nov 05
Posts: 45
Credit: 14,945,062
RAC: 0
Message 5955 - Posted: 12 Dec 2005, 11:51:35 UTC - in response to Message 5953.  

I'm not sure what the problem is, possibly something in the feeder, and someone should probably look into it

That's what I think !

but I don't think it's anything critical.

Maybe not at the moment, but with a growing user base it could become critical.



Supporting BOINC, a great concept !

ralic
Joined: 22 Sep 05
Posts: 16
Credit: 46,481
RAC: 0
Message 5977 - Posted: 12 Dec 2005, 14:47:16 UTC
Last modified: 12 Dec 2005, 14:50:35 UTC

Perhaps related to this thread.

Server Status shows:
Ready to send 180,708

12/12/2005 16:11:25|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
12/12/2005 16:11:25|rosetta@home|Requesting 863 seconds of work, returning 0 results
12/12/2005 16:11:32|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi succeeded
12/12/2005 16:11:32|rosetta@home|No work from project
12/12/2005 16:11:33|rosetta@home|Deferring communication with project for 4 minutes and 1 seconds

.
[snip more of the same]
.
12/12/2005 16:40:12|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
12/12/2005 16:40:12|rosetta@home|Requesting 10126 seconds of work, returning 0 results
12/12/2005 16:40:15|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi succeeded
12/12/2005 16:40:15|rosetta@home|No work from project
12/12/2005 16:40:16|rosetta@home|Deferring communication with project for 34 minutes and 32 seconds


It seems strange that there are so many ready to send but no work available...
[edit]
Maybe all those ready-to-send results are for an application version other than Rosetta 4.80?
[/edit]

STE\/E
Joined: 17 Sep 05
Posts: 125
Credit: 4,103,208
RAC: 167
Message 5979 - Posted: 12 Dec 2005, 14:57:27 UTC

There does seem to be something wrong; I haven't been able to download any new WUs to any of my PCs all morning. All I keep getting is the "No work from project" message... ???

Scribe
Joined: 2 Nov 05
Posts: 284
Credit: 157,359
RAC: 0
Message 5982 - Posted: 12 Dec 2005, 15:06:39 UTC

Yup, I started getting "No work" in the past hour......

Andrew
Joined: 19 Sep 05
Posts: 162
Credit: 105,512
RAC: 0
Message 5983 - Posted: 12 Dec 2005, 15:07:23 UTC
Last modified: 12 Dec 2005, 15:11:12 UTC

There is a thread specific to the possible "No work from project" issue here: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=648

We don't want to hijack this thread with another issue :)

AnRM
Joined: 18 Sep 05
Posts: 123
Credit: 1,355,486
RAC: 0
Message 5984 - Posted: 12 Dec 2005, 15:07:31 UTC
Last modified: 12 Dec 2005, 15:30:19 UTC

There is a problem.......we are getting intermittent 'no work' messages on all machines. The work queue seems to be in good shape.....server load problems caused a similar symptom about a month ago. The recent load increase must be very dramatic and will only get worse as FAD closes down.

Tern
Joined: 25 Oct 05
Posts: 576
Credit: 4,695,450
RAC: 5
Message 5991 - Posted: 12 Dec 2005, 15:53:17 UTC

I just got new work... odd...

AnRM
Joined: 18 Sep 05
Posts: 123
Credit: 1,355,486
RAC: 0
Message 5993 - Posted: 12 Dec 2005, 16:14:29 UTC - in response to Message 5991.  
Last modified: 12 Dec 2005, 16:15:49 UTC

I just got new work... odd...


>Bill, new work is getting through OK.....I don't want to create the impression that we are not getting enough to process (at least for us anyway). The point is that the 'no work from project' error messages used to be very unusual and are now occurring on a regular basis....Cheers, Rog.

Scribe
Joined: 2 Nov 05
Posts: 284
Credit: 157,359
RAC: 0
Message 6002 - Posted: 12 Dec 2005, 17:05:57 UTC
Last modified: 12 Dec 2005, 17:10:05 UTC

Just got one through also.....

I have one machine that just got one after 3 hours of "No Work" and the other machine still has not got one after 2 hours solid of "No Work"....

Yeti
Joined: 2 Nov 05
Posts: 45
Credit: 14,945,062
RAC: 0
Message 6005 - Posted: 12 Dec 2005, 18:02:19 UTC

I just checked my 15 boxes; most of them show "No work from project" when they contact Rosetta :-(

I hope somebody on the project team notices this soon; it would be the worst case if a lot of crunchers come over from FAD (and old SETI?) and then get no work.



Supporting BOINC, a great concept !

AnRM
Joined: 18 Sep 05
Posts: 123
Credit: 1,355,486
RAC: 0
Message 6012 - Posted: 12 Dec 2005, 18:40:04 UTC
Last modified: 12 Dec 2005, 18:47:50 UTC

It's definitely getting worse......major database purge again?? IP problems??..:(
The beauty of BOINC will make E@H happy.....

Scribe
Joined: 2 Nov 05
Posts: 284
Credit: 157,359
RAC: 0
Message 6023 - Posted: 12 Dec 2005, 19:44:40 UTC

Copied from the Homepage Technical News -

December 12, 2005
Our work unit feeder is having a tough time keeping up with all the client requests for work. A short term fix (as has been done before), is to optimize the database tables. We will be doing this later today at 3pm and also backing up the database. As stated before, we are going to expand our servers soon to deal with this issue.
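The Technical News item blames the "No work from project" replies on the feeder. That would explain how the server status page can show 180,708 results ready to send while clients are still turned away. Assume the scheduler can hand out only what the feeder has already staged in a small, fixed-size cache, not the whole database queue. Then a slow feeder starves requests even when the database is full. The Python toy model below illustrates that bottleneck. The classes are hypothetical, and the slot count, refill rate, and request volume are invented numbers. This is not BOINC's feeder or scheduler code.

```python
from collections import deque


class FeederModel:
    """Toy model: a feeder copies queued results into a small slot cache (hypothetical sizes)."""

    def __init__(self, ready_to_send, slots=100, refill_per_tick=20):
        self.db_queue = deque(range(ready_to_send))  # results waiting in the database
        self.cache = deque()                         # results the scheduler can hand out
        self.slots = slots
        self.refill_per_tick = refill_per_tick

    def feeder_tick(self):
        # The feeder moves only a limited number of results per pass,
        # e.g. because its database queries are slow.
        room = self.slots - len(self.cache)
        for _ in range(min(room, self.refill_per_tick, len(self.db_queue))):
            self.cache.append(self.db_queue.popleft())

    def scheduler_request(self, results_wanted=1):
        # The scheduler never reads db_queue directly, only the cache.
        sent = []
        while self.cache and len(sent) < results_wanted:
            sent.append(self.cache.popleft())
        return sent  # an empty list is what the client logs as "No work from project"


def simulate(ticks, requests_per_tick, **model_args):
    """Return (requests that got no work, results still waiting in the database)."""
    model = FeederModel(**model_args)
    empty_replies = 0
    for _ in range(ticks):
        model.feeder_tick()
        for _ in range(requests_per_tick):
            if not model.scheduler_request():
                empty_replies += 1
    return empty_replies, len(model.db_queue)


if __name__ == "__main__":
    # A slow feeder turns most requests away while the database stays full.
    print(simulate(10, 50, ready_to_send=180_708, refill_per_tick=20))   # (300, 180508)
    # A feeder that keeps up serves every request from the same queue.
    print(simulate(10, 50, ready_to_send=180_708, refill_per_tick=100))  # (0, 180158)
```

In this model, making the feeder's database queries faster corresponds to raising `refill_per_tick`. Optimizing the tables is aimed at the same kind of effect.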

John Price
Joined: 4 Dec 05
Posts: 4
Credit: 6,142
RAC: 0
Message 6028 - Posted: 12 Dec 2005, 19:56:45 UTC - in response to Message 5942.  
Last modified: 12 Dec 2005, 19:57:34 UTC

Follow up:

I attached one more box; it didn't get work again !

This box has enough disk space free; the ResourceShare is configured, that Rosetta gets nearly 49%, Predictor@Home only 1%, LHC@Home nearly 49%.


I too am unable to get work from Rosetta.
Slightly off topic, a question for you, Yeti. It is clear that you are able to do work for Predictor. For some 3 months I have been unable to get through to the Predictor server. I can't even get through to www.scripps.edu. What's the secret? Is there a new URL or something? As I can't get through to the website, I can't ask on the Predictor message board.





©2024 University of Washington
https://www.bakerlab.org