Message boards : Number crunching : WU will not finish upload of 883 KB result
Author | Message |
---|---|
David Ball Joined: 25 Nov 05 Posts: 25 Credit: 1,439,333 RAC: 0 |
I have a workunit on a Linux BOINC 5.4.9 box that has been trying to upload a large Rosetta result for a couple of days. Each time, it starts at zero percent, jumps to the percent where it timed out before, maybe uploads a few more KB and then goes into waiting to retry communications. Right now, it's at just over 527 KB of the 883 KB result. Each try, it seems to jump from 0% to 0.29% (sometimes the 0.29% is skipped or likely happens so fast it doesn't display) to the percentage where it left off on the last upload attempt and gets a few more KB through.
Does the server have a problem with results over a certain size? I don't think I've seen one this large before for Rosetta. The machine is set to run a WU for 24 hours since we're getting some complex WUs that take a long time per model. No other projects on that machine are having problems.
DOC_1QFU_pose_u_pert_with_bbmin_1282_1198_0 ResultID 42657928
The result seems to have finished OK, with a runtime of 23:49:13, progress of 100%, and status of uploading. It's the large upload that is having problems. The machine is also running CPDN, Einstein, SETI, and Docking@home Alpha with no problems uploading or downloading on the others. It spends just over 44% of the time on Rosetta.
The machine is a Socket A Sempron 2500+ with 1 GB RAM and 1 GB swap space, running FC3, and is set to "Leave applications in memory while suspended". It runs 24 hours a day and rarely runs anything but BOINC since I haven't started the project I plan to develop on it. It's running the standard BOINC Linux client 5.4.9 and has been uploading and downloading to other projects fine during the 2 days it's been trying to upload the Rosetta result.
On Rosetta, that machine has a total credit of 9,868.54 and a RAC of 68.99, so it's been running Rosetta, and the other projects, for several months. That particular machine is also past the 40 year milestone for the current CPDN 160 year simulation.
Thanks, -- David Have you read a good Science Fiction book lately? |
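As a rough check on the figures in the post above, the percentages and sizes can be converted into each other. This is only a sketch: it assumes the manager reports progress against the full 883 KB, and the helper names are made up for illustration.

```python
def percent_to_kb(percent, total_kb):
    """Convert a progress percentage into KB of the file."""
    return total_kb * percent / 100.0


def kb_to_percent(done_kb, total_kb):
    """Convert KB already transferred into a progress percentage."""
    return 100.0 * done_kb / total_kb


TOTAL_KB = 883  # result size quoted in the post

# The brief 0.29% reading corresponds to roughly this many KB.
first_jump_kb = round(percent_to_kb(0.29, TOTAL_KB), 2)

# 527 KB already on the server corresponds to roughly this percentage.
resumed_percent = round(kb_to_percent(527, TOTAL_KB), 1)

print(first_jump_kb, resumed_percent)
```

Under that assumption the 0.29% reading is about 2.56 KB, and 527 KB is about 59.7% of the file.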
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
"I have a workunit on a linux BOINC 5.4.9 box that has been trying to upload a large Rosetta result for a couple of days.... Each try, it seems to jump from 0% to 0.29% (sometimes the 0.29% is skipped or likely happens so fast it doesn't display) to the percentage where it left off last upload attempt and gets a few more KB through."
The jumping is normal. The client starts off assuming the server does not have any of the file (i.e. each transfer is a whole new experience at your end), but the server comes back with a message to say "I've already got nn MB" (so the receiving end remembers the outcome of the previous attempts).
I don't know for sure about the 0.29%, but obviously this would be a bigger number if the file was smaller, and different for every upload, so the effect may well be commonplace. One guess is that the client sends the first block out before it is told the server already has it (does 0.29% correspond to a later jump in the size?). This would make sense as most transfers will be starting from zero, and it would save a round trip delay between you and bakerlab.
If there were a maximum file size or maximum transfer size, I would hope the transfer algorithm would stop the upload immediately on retry, whereas the server is clearly responding with the size it already holds.
Have you left it trying for 24 hrs on its automatic backoff? From here (UK) the net to bakerlab is often very slow, especially 1200-1800 UT when the US is starting the day. The best time to try, to eliminate net issues, seems to be 0600-1000 UT. You may find it just whizzes out eventually, probably just after you leave the room.
Edit, added: "That particular machine is also past the 40 year milestone for the current CPDN 160 year simulation."
Past as in just past (and might still be uploading gigabytes to CPDN)? Or past as in well past, and therefore we know your client can push out big files? Do you know how this Rosetta file compares with the bigger CPDN files you have uploaded? River~~ |
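The resume behaviour described in the post above can be illustrated with a small simulation. This is a minimal sketch and not BOINC's actual transfer code: `FakeUploadServer`, `attempt_upload`, the chunk size and the file name are all hypothetical, and a dropped connection is modelled simply as a limit on chunks per attempt.

```python
class FakeUploadServer:
    """Hypothetical stand-in for an upload handler that keeps partial files."""

    def __init__(self):
        self.received = {}  # file name -> bytes stored so far

    def size_on_server(self, name):
        """Report how many bytes of the named file are already stored."""
        return len(self.received.get(name, b""))

    def append(self, name, offset, chunk):
        """Store a chunk, but only if it starts where the stored data ends."""
        current = self.received.get(name, b"")
        if offset != len(current):
            raise ValueError("offset does not match stored size")
        self.received[name] = current + chunk


def attempt_upload(server, name, data, chunk_size, max_chunks):
    """One upload attempt: ask for the offset, then send a few chunks.

    max_chunks models a connection that fails partway through.
    Returns True once the whole file is on the server.
    """
    offset = server.size_on_server(name)  # the "I've already got nn" reply
    sent = 0
    while offset < len(data) and sent < max_chunks:
        chunk = data[offset:offset + chunk_size]
        server.append(name, offset, chunk)
        offset += len(chunk)
        sent += 1
    return offset == len(data)


server = FakeUploadServer()
payload = bytes(883)  # 883 placeholder bytes standing in for the result file
attempts = 0
done = False
while not done:
    attempts += 1
    done = attempt_upload(server, "result_0", payload, chunk_size=100, max_chunks=2)

print(attempts, server.size_on_server("result_0"))
```

Each attempt begins by asking how much the server holds, so the displayed progress would start at zero and then jump to the previous stopping point, while every retry only adds the few chunks that get through before the connection fails.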
David Ball Joined: 25 Nov 05 Posts: 25 Credit: 1,439,333 RAC: 0 |
"The jumping is normal. The client starts off assuming the server does not have any of the file (ie each transfer is a whole new experience at your end), but the server comes back with a message to say 'I've already got nn Mb' (so the receiving end remembers the outcome of the previous attempts.)"
It's been doing retries for more than 24 hours. On the transfer tab, the cumulative total time it's spent trying to upload this workunit is about 2.5 hours. Since the backoffs were reaching the 3+ hour range, I've been hitting the retry button about every thirty minutes when I'm in the room. It's now up to about 680 KB of 883 KB. The next Rosetta WU is almost done and it's going to be interesting to see if it has the same problem or just uploads quickly like they have in the past. Usually, either the server is down or the upload goes in one try. This one must be in the 30-60 tries range.
BTW, when I hit the update button for Rosetta on the manager to update the points total, it updates within a few seconds. I haven't noticed any problems with Rosetta on my Windows machine on the same link, but I don't recall if it's tried to upload a WU this weekend.
"That particular machine is also past the 40 year milestone for the current CPDN 160 year simulation."
I think it's been close to a month since it uploaded the 40 year milestone. It's done several small trickle-ups since then.
BTW, the next WU just finished and it took about 5 retries to upload its 133 KB result. The 883 KB result is still getting a few more KB each time it tries. I'm retrying it manually about every 20-30 minutes since it has reached the point of trying 13 hour backoffs. The downloads for the two WUs to replace these were very fast and included some files over 1 MB in size. The downloads experienced no retries. It's just the uploads that are having a problem, and they show "http error" when they fail and back off for another try.
Oh well, it's got another day before the deadline so it should eventually finish uploading. BTW, I'm in the central time zone in the US. Thanks, -- David Have you read a good Science Fiction book lately? |
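The growing retry delays mentioned in the post above (3+ hours, then 13 hours) are typical of a backoff scheme where each failure lengthens the wait. The sketch below shows a plain capped exponential backoff to illustrate the idea. It is not BOINC's actual policy: the base delay, growth factor and cap are arbitrary assumptions, and `backoff_delays` is a hypothetical helper.

```python
def backoff_delays(base_seconds, factor, cap_seconds, failures):
    """Return the wait before each retry for a run of consecutive failures.

    The delay grows by `factor` after every failure and is clamped at
    `cap_seconds`. All parameter values used below are illustrative only.
    """
    delays = []
    delay = base_seconds
    for _ in range(failures):
        delays.append(min(delay, cap_seconds))
        delay *= factor
    return delays


# Assumed values: start at 1 minute, double each time, cap at 4 hours.
delays = backoff_delays(base_seconds=60, factor=2, cap_seconds=4 * 3600, failures=10)

for attempt, seconds in enumerate(delays, start=1):
    print(f"failure {attempt}: wait {seconds / 3600:.2f} h")
```

With waits of this kind, a transfer that only gains a few KB per attempt makes very slow progress on its own, which is why pressing retry every 20 to 30 minutes moves the upload along faster than leaving it to the automatic schedule.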
FluffyChicken Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0 |
"The jumping is normal. The client starts off assuming the server does not have any of the file (ie each transfer is a whole new experience at your end), but the server comes back with a message to say 'I've already got nn Mb' (so the receiving end remembers the outcome of the previous attempts.)"
Are you on dial-up and/or using a proxy? I had this happen before (quite some time ago now) on dial-up through a proxy server. The only solution was to abort the upload and forget about it. You could try installing BOINC 5.6.5 and see if that works any better. It has a newer 'file transfer' part with some other tweaks. Team mauisun.org |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Well, I just uploaded a 330 KB result file while downloading 2 SETI WUs at the same time without a problem. I'm on a 512/128 DSL service, not that that helps you. You could just have a local network problem. |
©2024 University of Washington
https://www.bakerlab.org