Questions and Answers : Windows : all 16 of my boxes can't communicate with Rosetta anymore
Author | Message |
---|---|
amgthis Joined: 25 Mar 06 Posts: 81 Credit: 203,879,282 RAC: 0 |
A snippet from the log:
2006-09-06 17:48:33 [---] Starting BOINC client version 5.4.11 for windows_intelx86
2006-09-06 17:48:33 [---] libcurl/7.15.3 OpenSSL/0.9.8a zlib/1.2.3
2006-09-06 17:48:33 [---] Data directory: D:\Program Files\BOINC
2006-09-06 17:48:33 [---] Processor: 1 AuthenticAMD AMD Athlon(tm) 64 Processor 4000+
2006-09-06 17:48:33 [---] Memory: 510.42 MB physical, 1.22 GB virtual
2006-09-06 17:48:33 [---] Disk: 71.52 GB total, 64.99 GB free
2006-09-06 17:48:33 [rosetta@home] URL: https://boinc.bakerlab.org/rosetta/; Computer ID: 224998; location: home; project prefs: default
2006-09-06 17:48:33 [---] General prefs: from rosetta@home (last modified 2006-03-25 11:18:10)
2006-09-06 17:48:33 [---] General prefs: no separate prefs for home; using your defaults
2006-09-06 17:48:33 [---] Local control only allowed
2006-09-06 17:48:33 [---] Listening on port 31416
2006-09-06 17:48:33 [rosetta@home] Started upload of file 1dtj__CHEAT_ABRELAX_SAVE_ALL_OUT_BARCODE__1222_8420_0_0
2006-09-06 17:48:33 [rosetta@home] Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
2006-09-06 17:48:33 [rosetta@home] Reason: To fetch work
2006-09-06 17:48:33 [rosetta@home] Requesting 8640 seconds of new work
2006-09-06 17:48:41 [---] Project communication failed: attempting access to reference site
2006-09-06 17:48:41 [---] Access to reference web site failed - check network connection or proxy configuration.
2006-09-06 17:48:42 [rosetta@home] Temporarily failed upload of 1dtj__CHEAT_ABRELAX_SAVE_ALL_OUT_BARCODE__1222_8420_0_0: http error
2006-09-06 17:48:42 [rosetta@home] Backing off 1 minutes and 0 seconds on upload of file 1dtj__CHEAT_ABRELAX_SAVE_ALL_OUT_BARCODE__1222_8420_0_0
2006-09-06 17:48:43 [rosetta@home] Scheduler request failed: Unrecognized HTTP Content-Encoding
2006-09-06 17:48:43 [rosetta@home] Deferring scheduler requests for 1 minutes and 5 seconds
2006-09-06 17:49:02 [---] Project communication failed: attempting access to reference site
2006-09-06 17:49:05 [---] Access to reference site failed - check network connection or proxy configuration.
2006-09-06 17:49:06 [---] Project communication failed: attempting access to reference site
2006-09-06 17:49:11 [---] Project communication failed: attempting access to reference site
2006-09-06 17:49:14 [---] Project communication failed: attempting access to reference site
2006-09-06 17:49:16 [---] Rescheduling CPU: project reset by user
2006-09-06 17:49:16 [rosetta@home] Resetting project
2006-09-06 17:49:16 [---] Rescheduling CPU: exit_tasks
2006-09-06 17:49:16 [rosetta@home] Persistent file transfer object not found
2006-09-06 17:49:16 [rosetta@home] Persistent file transfer object not found
I haven't changed my proxy settings from what I've been running for many months. /Mike |
BennyRop Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
What do you get if you click on Start, choose Run, type in "cmd" and hit Enter? Click on the DOS box, type "ping boinc.bakerlab.org" and hit Enter. If it converts the domain name into an IP address, then DNS is working; and if you get ping times, then at least some communication is possible. Does tracert show any network issues between you and boinc.bakerlab.org? If you're well beyond the tech level of the first two questions, then run Ethereal and capture the first few minutes of BOINC starting up Rosetta. Posting the network messages that show the results of attempting to contact the Rosetta servers should give better clues as to the problem. Posting those results at https://boinc.bakerlab.org/rosetta/forum_thread.php?id=1891 will get more attention. What proxy and version number are you using? Has the proxy server been rebooted? One of our clients decided to save money by not renewing their antivirus subscriptions for a year. They plugged an infected machine into the network, which infected all the other machines on the network, and all the infected machines managed to overload the router. You might want to verify that your machines haven't been compromised. |
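For anyone who would rather script the first two checks, here is a minimal sketch in Python. It is not something posted in this thread, and it is not equivalent to ping: ping uses ICMP, while this attempts a TCP connection. Port 443 is an assumption taken from the https:// URLs in the log above, the function name `check_host` is made up for illustration, and the sketch connects directly, so it tells you nothing about a path that goes through a proxy.

```python
import socket


def check_host(host, port=443, timeout=5.0,
               resolver=socket.gethostbyname,
               connector=socket.create_connection):
    """Report whether `host` resolves in DNS and accepts a TCP connection.

    `resolver` and `connector` are injectable only so the function can be
    exercised without touching the network; the defaults use the real stack.
    """
    result = {"host": host, "ip": None, "dns_ok": False,
              "tcp_ok": False, "error": None}
    try:
        # Step 1: the same question ping answers first, does the name resolve?
        result["ip"] = resolver(host)
        result["dns_ok"] = True
    except OSError as exc:
        result["error"] = f"DNS lookup failed: {exc}"
        return result
    try:
        # Step 2: can a TCP connection be opened to the assumed port?
        conn = connector((result["ip"], port), timeout)
        conn.close()
        result["tcp_ok"] = True
    except OSError as exc:
        result["error"] = f"TCP connect failed: {exc}"
    return result

# Example call (needs a working network connection):
# print(check_host("boinc.bakerlab.org"))
```

If `dns_ok` is true but `tcp_ok` is false, the name resolves but something between the machine and the server is refusing or dropping the connection, which narrows things down more than a browser test does.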
amgthis Joined: 25 Mar 06 Posts: 81 Credit: 203,879,282 RAC: 0 |
Thanks for the quick reply, BennyRop! I fired up Folding on all 16 machines last night when I couldn't resolve this issue. Folding won't run very well over my satellite ISP, though, because the returned work result packets are much too large for the satellite's very slow uplink speeds. [quote] What do you get if you click on Start, choose Run, type in "cmd" and hit Enter? Click on the DOS box, type "ping boinc.bakerlab.org" and hit Enter. If it converts the domain name into an IP address, then DNS is working; and if you get ping times, then at least some communication is possible. [/quote] Yes, I can ping this domain and load all of the Rosetta sites with my browser. [quote] Does tracert show any network issues between you and boinc.bakerlab.org? [/quote] I haven't checked that yet but will try. [quote] If you're well beyond the tech level of the first two questions, then run Ethereal and capture the first few minutes of BOINC starting up Rosetta. Posting the network messages that show the results of attempting to contact the Rosetta servers should give better clues as to the problem. Posting those results at https://boinc.bakerlab.org/rosetta/forum_thread.php?id=1891 will get more attention. [/quote] I'll try that, too. I rebooted my router. I think the real nature of the problem is that my satellite ISP has become less than impressed with me tying up too much bandwidth with 16 boxes over one proxy server, and somehow they've managed to block traffic for work packets while still allowing some connections to boinc.bakerlab.org. Is this a possible scenario? [quote] What proxy and version number are you using? Has the proxy server been rebooted? [/quote] I'm just proxying all boxes to the gateway IP address of my Linksys router and the satellite ISP's default gateway IP. [quote] One of our clients decided to save money by not renewing their antivirus subscriptions for a year. They plugged an infected machine into the network, which infected all the other machines on the network, and all the infected machines managed to overload the router. You might want to verify that your machines haven't been compromised. [/quote] I turned off ZoneAlarm and my A/V just to test, rebooted everything, and got the same problem. I'll keep you posted on what happens with the other tests you suggested. Thanks again. /mike |
BennyRop Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
Have you switched to 24-hour WUs, so only 16 tasks are uploaded and downloaded each day? There was a description of setting up machines so they would only communicate with the server during certain hours of the day. Perhaps you could set up your computers as 8 sets of 2 systems, with each set of 2 only allowed to communicate with the server for a 3-hour block of time each day. If you turn everything off and then turn the machines on one at a time, letting each machine upload its data and download the next WU, does each machine still have errors uploading/downloading? |
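The 8-sets-of-2 idea works out to one 3-hour window per pair (24 hours / 8 sets). Here is a rough Python sketch of that arithmetic. The host names `box01` to `box16` and the function name `communication_windows` are placeholders invented for illustration. The sketch only computes the windows; it does not apply them to BOINC, which would still have to be done through whatever network-hours restriction the description mentioned above relies on.

```python
def communication_windows(machines, group_size=2, hours_in_day=24):
    """Split machines into groups and give each group an equal block of hours.

    Returns a list of (start_hour, end_hour, group) tuples. Any hours left
    over when the day does not divide evenly are simply left unassigned.
    """
    if not machines:
        return []
    groups = [machines[i:i + group_size]
              for i in range(0, len(machines), group_size)]
    block = hours_in_day // len(groups)  # whole hours per group
    schedule = []
    for index, group in enumerate(groups):
        start = index * block
        schedule.append((start, start + block, group))
    return schedule


# Hypothetical host names for the 16 boxes
boxes = [f"box{n:02d}" for n in range(1, 17)]
for start, end, group in communication_windows(boxes):
    print(f"{start:02d}:00-{end:02d}:00  {', '.join(group)}")
```

With 16 machines and the default group size this prints eight windows, from 00:00-03:00 for box01 and box02 through 21:00-24:00 for box15 and box16.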
amgthis Joined: 25 Mar 06 Posts: 81 Credit: 203,879,282 RAC: 0 |
What do you get if you click on Start, choose Run, type in "cmd" and hit enter. |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
I have a satellite ISP as well. No other "high-speed" option for me. When you use a lot of bandwidth, I think they just penalize you by slowing you down; it shouldn't mean being unable to connect, just that your speed would be slow. Also, with Rosetta, you download a fairly large (3-5 MB) WU, crunch it for roughly your WU runtime preference (up to 24 hrs!), and then send back a fairly small (~250 KB) file with the results. So the limited upload bandwidth should not be a problem for you. I find my PC seems to lose internet access and I have to reboot. I'm thinking it's a Windows problem, because it worked GREAT until my PC crashed and I had to reload it. Perhaps I still need a BIOS update to get back to where I was; not sure. Anyway, keep crunchin'! And be sure to post any problems you run across. Add this signature to your email: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
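To put rough numbers on that, here is a small back-of-the-envelope sketch in Python using the sizes quoted above (3-5 MB down and about 250 KB up per WU). It assumes each machine crunches one work unit at a time, back to back, which is a simplification. The 3-hour runtime in the second call is purely a hypothetical comparison point, not a figure from this thread, and the function name is made up for illustration.

```python
def daily_traffic_mb(machines, wu_runtime_hours, download_mb, upload_kb):
    """Estimate daily (download_mb, upload_mb) for a group of machines.

    Assumes one work unit per machine at a time, run back to back.
    """
    wus_per_day = machines * 24 / wu_runtime_hours
    download_total_mb = wus_per_day * download_mb
    upload_total_mb = wus_per_day * upload_kb / 1000  # 1 MB taken as 1000 KB
    return download_total_mb, upload_total_mb


# 16 boxes on 24-hour work units, using the upper 5 MB download figure
print(daily_traffic_mb(16, 24, 5, 250))   # (80.0, 4.0)

# Hypothetical 3-hour runtime, for comparison only
print(daily_traffic_mb(16, 3, 5, 250))    # (640.0, 32.0)
```

Under those assumptions, 16 boxes on 24-hour work units move at most about 80 MB down and 4 MB up per day, and the upload side stays small even with much shorter runtimes.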
BennyRop Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
[quote] BennyRop - tonight it's back up and talking to the project. I leave everything on 24/7 so I don't know what changed but I suspect it was a deal with [/quote] If you think it's a matter of them complaining about your bandwidth usage, switch to 24-hour WUs to minimize the amount of bandwidth you use. |
amgthis Joined: 25 Mar 06 Posts: 81 Credit: 203,879,282 RAC: 0 |
[quote] BennyRop - tonight it's back up and talking to the project. I leave everything on 24/7 so I don't know what changed but I suspect it was a deal with [/quote] Thanks for that idea too, Benny. I don't know if it's a problem yet, as I haven't heard anything from the ISP, but if it happens again I'll look into the 24-hour WUs. /Mike |
amgthis Joined: 25 Mar 06 Posts: 81 Credit: 203,879,282 RAC: 0 |
All I can figure is that it is a StarBand satellite latency issue. Sometimes I can connect, sometimes I can't. Usually I can connect and get work maybe once a day, but that is all. The units get crunched and there are no more communications for another day or so, usually after I quit and restart BOINC. I have returned to Folding@home and I seem to be able to run steadily over the same satellite / networked connections. Maybe Stanford has a more relaxed protocol with their server network. Someone on the BOINC support site suggested many tests. I pinged sites with options to set the MTU size and had numerous timeout problems with many missed packets, almost no matter what size I set the packet to. That is why I'm betting it's a satellite delay problem. For whatever reason, BOINC and Rosetta hate it, while I can browse the web and even download complete Linux ISOs with no problems. Folding@home will have to do for now, but I'll keep coming back and trying all the latest BOINC clients until I can find one that works more smoothly with my admittedly marginal satellite setup. Regards and keep crunching, /mike |
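For anyone repeating the MTU test, this Python sketch shows one way it could be automated. It is not the procedure that was actually suggested on the BOINC support site. It relies on the Windows ping options `-f` (set Don't Fragment), `-l` (payload size) and `-n` (count). Treating a zero exit code as "the ping got through" is an assumption, and parsing the output may be more reliable. The function names are made up for illustration.

```python
import subprocess


def ping_ok(host, payload, count=2):
    """Run Windows ping with Don't Fragment set and a given payload size.

    Assumption: an exit code of 0 means at least one reply came back.
    """
    cmd = ["ping", "-n", str(count), "-f", "-l", str(payload), host]
    return subprocess.run(cmd, capture_output=True).returncode == 0


def largest_payload(probe, low=0, high=1472):
    """Binary-search the largest payload size for which probe(size) is True.

    1472 is the usual ceiling on Ethernet: a 1500-byte MTU minus 28 bytes
    of IP and ICMP headers. Returns None if no size in the range passes.
    """
    best = None
    while low <= high:
        mid = (low + high) // 2
        if probe(mid):
            best = mid        # this size passed, try something larger
            low = mid + 1
        else:
            high = mid - 1    # this size failed, try something smaller
    return best

# Example call (Windows, needs a working connection):
# size = largest_payload(lambda n: ping_ok("boinc.bakerlab.org", n))
# if size is not None:
#     print("path MTU is roughly", size + 28)
```

One caveat: the binary search assumes that every size below the limit passes and every size above it fails. On a link that drops packets at random, as described above, the answer will jump around from run to run, and that inconsistency is itself a hint that the problem is loss or delay rather than packet size.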
©2024 University of Washington
https://www.bakerlab.org