Message boards : Number crunching : Report problems with Rosetta version 5.36
Author | Message |
---|---|
[B^S] Gamma^Ray Joined: 20 Apr 06 Posts: 12 Credit: 21,284 RAC: 0 |
Wow, did I get burned by this WU. It ran just over 3 CPU hours, which is an hour over my set CPU time. I noticed the screensaver was frozen at, I believe, 1.59 percent done. I watched it sit there for around 20 minutes without anything updating except the CPU time. When I watched the manager, the To Completion time was continually rising, and I believe it was up to 6 hours at that point. I first tried to suspend and then restart it, which didn't change anything; it was still stuck at the same spot. It did have excessive red bands hanging off the main structure, as has been reported already, although with this one they were attached to the backbone structure and hanging all the way down past the window box. So as a last resort, I paused the WU, exited the manager completely, then restarted the manager and resumed the WU. This time it started completely from the beginning, as if it had never run at all, so I aborted it. So if you look at the WU's details, it shows I only ran it for 450, but that's the second run. The first 3-hour run is now history and not listed. :( https://boinc.bakerlab.org/rosetta/result.php?resultid=45567927 Workunit 38766177 name FRA_2rio_154E_hom001_1_2rio_1_2bdwA_IGNORE_THE_REST_36_1305_21 Computer ID 285870 <core_client_version>5.2.13</core_client_version> XP Pro-AMD 3000+ G^R |
Conan Joined: 11 Oct 05 Posts: 151 Credit: 4,244,078 RAC: 2,272 |
> The screensaver problem started back with 5.32, first in Ralph then in Rosetta. It has caused my computer to freeze to the point that I had to reinstall Boinc, and I lost data because the current version of Boinc wiped out the existing version and installed as if on a new machine. I have had the screensaver disabled for nearly 2 weeks now and have not had any lockups or freezing problems, with all work units going OK, including the Ralph ones. I have posted this before but the problem remains. The problem with long WU times, low decoy counts and low credit that cropped up in 5.34 still seems to occur occasionally in 5.36. The following WU, https://boinc.bakerlab.org/rosetta/workunit.php?wuid=39776702, ran for 29517.153582 seconds (preference = 21600), generated 19 decoys off 13 (nstruct) times for 27.8752 cobblestones. This is equal to 3.4 c/h, which is about half what it should be. |
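The credit rate quoted in the post above can be checked with a little arithmetic. This is only a minimal sketch: the function name `credit_per_hour` is made up for illustration, and the numbers are copied straight from the post.

```python
def credit_per_hour(credit: float, cpu_seconds: float) -> float:
    """Return granted credit (cobblestones) per CPU hour."""
    return credit / (cpu_seconds / 3600.0)


# Values quoted in the post above.
cpu_seconds = 29517.153582  # actual run time of the work unit
preference = 21600          # preferred run time in seconds (6 hours)
credit = 27.8752            # cobblestones granted

rate = credit_per_hour(credit, cpu_seconds)
overrun = cpu_seconds / preference

print(f"{rate:.1f} credits/hour, {overrun:.2f}x the preferred run time")
```

The rate works out to roughly 3.4 credits per hour, which matches the figure in the post; whether that is "about half what it should be" depends on the host's usual rate, which the post does not give.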
Hoelder1in Joined: 30 Sep 05 Posts: 169 Credit: 3,915,947 RAC: 0 |
Clicking through my team-mates' computer pages, it turned out that a quarter of the computers (6 of 24) are showing errors on a more or less regular basis. Below I have linked to the "results for computer" pages with two or more errors. These errors fall into two groups: about half are access violations (exit code 0xc0000005) and the other half are "validate errors" giving the message "Rosetta score is stuck or going too long. Watchdog is ending the run!" While some of the errors occurred with version 5.34, it doesn't look like the error rate declined with the most recent app version, 5.36. With the links I included the type of error (AV for access violation and S for "stuck or going too long") and also whether the errors occurred in the most recent version: 298712 (AV, S), 287935 (5.36, AV, S), 282932 (5.36, AV), 301240 (5.36, AV), 287942 (AV, S), 289776 (5.36, S) I am a bit concerned that this relatively high error rate will discourage people from crunching. So, I guess it would be good if these problems were being looked into. Thanks, -H. Team betterhumans.com - discuss and celebrate the future - hoelder1in.org |
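The host list in the post above can be tallied mechanically. The following is a rough sketch only: the dictionary layout and variable names are illustrative, the data is transcribed from the post, and it counts hosts showing each error type rather than individual failed results (the post does not give per-result counts).

```python
from collections import Counter

# Host ID -> (error types seen, errors seen on 5.36?), transcribed from the post.
# AV = access violation (exit code 0xc0000005), S = "stuck or going too long".
hosts = {
    298712: ({"AV", "S"}, False),
    287935: ({"AV", "S"}, True),
    282932: ({"AV"}, True),
    301240: ({"AV"}, True),
    287942: ({"AV", "S"}, False),
    289776: ({"S"}, True),
}
TEAM_SIZE = 24  # computers checked, per the post

# Count how many hosts show each error type.
by_type = Counter(kind for kinds, _ in hosts.values() for kind in kinds)
# Count how many hosts show errors under the most recent version.
on_536 = sum(1 for _, recent in hosts.values() if recent)

print(f"hosts with errors: {len(hosts)}/{TEAM_SIZE} = {len(hosts) / TEAM_SIZE:.0%}")
print(f"hosts with AV errors: {by_type['AV']}, with S errors: {by_type['S']}")
print(f"hosts with errors on 5.36: {on_536}")
```

Counted per host, access violations show up on five machines and watchdog terminations on four, which is consistent with the "about half and half" split described in the post.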
Keith Akins Joined: 22 Oct 05 Posts: 176 Credit: 71,779 RAC: 0 |
Are your team members running with BOINC screensaver enabled? My error rate has disappeared since I switched screensaver to blank. I wonder if RALPH@Home users are running with screensaver on. This might explain why a number of bugs don't seem to be detected while the next updates get tested. |
Buckley Joined: 7 Aug 06 Posts: 1 Credit: 45,505 RAC: 0 |
I'm another one with problems. Normally Rosetta locks up at least once a day. Normally it happens when a unit is completed and the system tries to send the results to Rosetta. The other projects, Seti & Einstein continue to work and seem fine. |
Keith Jillings Joined: 26 Sep 06 Posts: 7 Credit: 536,631 RAC: 0 |
I thought it was just me, till I saw this thread. My computer locks up daily now if I run Rosetta - I've turned Rosetta off until it's fixed. It was fine until a few weeks ago. I get the same thing every time. Machine frozen, won't respond to keypresses or mouse clicks; no mouse icon on the screen, and a message from ZoneAlarm telling me that Rosetta_5.36_windows_intelx86.exe is trying to access the Internet. I have to restart it to get it to work again. Every other programme that's come up against ZoneAlarm doesn't lock up the machine: the others just wait for me to click "Allow" and all is well. Something odd in Rosetta_blahblah.exe, I assume. I've subscribed to this thread so that I can read when it's been fixed and I can start crunching Rosetta again. |
Conan Joined: 11 Oct 05 Posts: 151 Credit: 4,244,078 RAC: 2,272 |
>> @Keith Jillings, your problem is, I believe, with the Boinc screensaver. You will be able to keep crunching as long as you turn the Boinc screensaver off. Other screensavers work OK, but not the Boinc one (only Rosetta and Ralph are affected by this problem). I noticed this back with 5.32, then 5.34, and reported it more than once, but it is still a problem. I have had no Rosetta problems since turning the Boinc screensaver off. Give it a whirl and see if it works. |
Keith Jillings Joined: 26 Sep 06 Posts: 7 Credit: 536,631 RAC: 0 |
Thanks - BOINC screensaver duly turned off, and Rosetta back on. I'll see what happens. SETI is turned off at the moment, too - that's failing in something like 50% of work units, but a different problem with a different solution, I'm sure. |
Keith Akins Joined: 22 Oct 05 Posts: 176 Credit: 71,779 RAC: 0 |
Thanks Conan. Actually I've been running with it disabled for four days and no problems. This goes back to the question of how many on RALPH are doing the same. This may explain why this bug hasn't been fixed yet. You gotta break it to fix it. If a majority of the RALPH users are indeed running with the screensaver off, then that would explain a lot. I'll hold steady and take a peek every so often. |
Hoelder1in Joined: 30 Sep 05 Posts: 169 Credit: 3,915,947 RAC: 0 |
"Are your team members running with BOINC screensaver enabled?" I haven't yet had enough feedback from the team to say for sure whether all of this is a screensaver issue, but it definitely is a possibility (I believe the errors first appeared in version 5.32, which updated the graphics). Thanks for pointing this out. Team betterhumans.com - discuss and celebrate the future - hoelder1in.org |
genes Joined: 8 Oct 05 Posts: 60 Credit: 704,566 RAC: 347 |
I'm running both Ralph and Rosetta with the screensaver ON. I agree that if we take the easy way out and just turn it off, we will have no problems, but they will never fix it. I'm reporting errors both in Ralph and Rosetta. Here's some more errors, BTW, but I can't say if they are due to the screensaver: resultid=45561965 resultid=45523378 resultid=45492781 Crunch on! |
RichardJ Joined: 19 Mar 06 Posts: 8 Credit: 73,014 RAC: 0 |
I am getting the following messages: 07/11/2006 10:20:36|rosetta@home|Message from server: Your computer has only 234340352 bytes of memory; workunit requires 265659648 more bytes 07/11/2006 10:20:36|rosetta@home|Message from server: No work sent 07/11/2006 10:20:36|rosetta@home|Message from server: (there was work but your computer doesn't have enough memory) 07/11/2006 10:20:36|rosetta@home|No work from project Is there any way to increase the memory? My operating system is Windows XP. I have been running Rosetta continuously for over 6 months without this problem occurring before. |
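The two byte counts in the server message above add up to a round number, which hints at the memory the workunit is asking for. Here is a small sketch that pulls the figures out of the message text; it assumes that "requires N more bytes" means the shortfall on top of what the host reported, which is a reading of the wording rather than something confirmed in this thread.

```python
import re

# Scheduler reply text copied from the message log in the post above.
message = ("Your computer has only 234340352 bytes of memory; "
           "workunit requires 265659648 more bytes")

match = re.search(
    r"has only (\d+) bytes of memory; workunit requires (\d+) more bytes",
    message,
)
available, shortfall = (int(group) for group in match.groups())

# Assumed interpretation: requirement = reported memory + reported shortfall.
required = available + shortfall

MIB = 1024 * 1024
print(f"available: {available / MIB:.1f} MiB")
print(f"required:  {required / MIB:.1f} MiB ({required:,} bytes)")
```

Under that reading the requirement is exactly 500,000,000 bytes, while the host reports a little under 224 MiB.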
Conan Joined: 11 Oct 05 Posts: 151 Credit: 4,244,078 RAC: 2,272 |
>> @ Rosetta/Ralph Project Team, There is a problem with the Rosetta and Ralph projects that relates to the Boinc screensaver and that has been happening since Ralph release 5.28, on or about the 6/10/06. The problems that were occurring then, and were reported in the Ralph thread "http://ralph.bakerlab.org/forum_thread.php?id=255", are still occurring. The problems include screens locking up and not responding; workunits needing to be killed with Task Manager to regain access to the computer (during this the processor is doing very little and so is the computer, and the workunit then often promptly errors out with the 'error 161' error code); workunits running so long that the Boinc watchdog has to kill the job with the error 'workunit stuck'; and Windows runtime errors (in the worst case for me, these needed a computer reformat and rebuild, plus a $200+ computer tech visit, to get working again). Ralph releases 5.28 (the start of the problems) through to the current 5.38 all have the problem. Rosetta releases 5.32 through to the current 5.36 all have the problem. Numerous reports have been made in a number of threads. The only way I have been able to continue working with Ralph and Rosetta has been to effectively turn off the Boinc screensaver and use a default Windows one; it is not as pretty or informative, but the computer keeps working and I have had no more lockups, lost workunits or errors caused by this problem. It is only a stopgap 'fix', as it also stops me seeing the screensaver for other projects (I run between 2 and 5 projects on my Windows machines, and Rosetta and Ralph are the only ones having the problem). My 5 other computers (both Linux and XP) do not have the problem, as they do not have graphics enabled (Linux does not have graphics at all, and Boinc is installed as a service on the XP machines). When a lot of us turn off the graphics so we can keep working, it masks the problems that are still there (as pointed out by fellow tester 'genes'). 
We need this problem fixed, because putting all that work into the graphics only to have people turn them off to keep working really wastes a lot of your time and resources. This covers most of the reasons for this post (it has taken me over an hour, as I accidentally wiped all my typing twice when I switched screens; need to save as I type, repeat, need to save as I type). Thanks for your time. Keep smiling, it makes others wonder what you have been up to. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
I got this error last night after 38 minutes. I was not using graphics and it was not preempted, so I have no idea why. https://boinc.bakerlab.org/rosetta/result.php?resultid=45855465 |
FluffyChicken Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0 |
Hi, there are many debugging options we can use, and one is targeted at screen savers. I posted about it here: https://boinc.bakerlab.org/forum_thread.php?id=2550 Maybe some of the people having screensaver troubles could enable some of the debugging info and then post what it says in the message tab related to it? Team mauisun.org |
Conan Joined: 11 Oct 05 Posts: 151 Credit: 4,244,078 RAC: 2,272 |
"Hi, there are many debugging options we can use and one targeted at screen savers" >> Thanks FluffyChicken. As this has been going on for over a month now with no cure in sight, I will take your advice and try to debug the thing myself. I followed your link and have created the 'cc_config.xml' file, placing it in the Boinc folder. I hope I have created it correctly; I did not include any options, only the flags 'guirpc_debug' and 'scrsave_debug'. I included the guirpc one as I was not sure what it did, and GUI is graphical user interface, is it not? I am having graphics problems, so I included that debug feature. I have enabled the Boinc screensaver again on one machine and will wait and see what I find. |
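For anyone else unsure whether a hand-written cc_config.xml came out correctly, a quick well-formedness check is possible outside BOINC. This is a minimal sketch under stated assumptions: the helper name `enabled_log_flags` is made up, the file layout follows the `<cc_config>`/`<log_flags>` structure given elsewhere in this thread, and the check only confirms that the XML parses and lists the flags set to 1. It does not know which flag names the BOINC client actually accepts.

```python
import xml.etree.ElementTree as ET


def enabled_log_flags(xml_text: str) -> list:
    """Parse cc_config.xml text and return the names of log flags set to 1.

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed XML.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "cc_config":
        raise ValueError(f"unexpected root element: {root.tag!r}")
    log_flags = root.find("log_flags")
    if log_flags is None:
        return []
    return [flag.tag for flag in log_flags if (flag.text or "").strip() == "1"]


# Example content using the two flags mentioned in the post above.
sample = """<cc_config>
  <log_flags>
    <guirpc_debug>1</guirpc_debug>
    <scrsave_debug>1</scrsave_debug>
  </log_flags>
</cc_config>"""

print(enabled_log_flags(sample))
```

A missing closing tag or a typo in the element nesting shows up here as a `ParseError`, which is easier to spot than a flag that is silently ignored.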
genes Joined: 8 Oct 05 Posts: 60 Credit: 704,566 RAC: 347 |
Here's one I just had fail due to the screensaver: resultid=46126708 The machine is a dual Xeon with HT, so 4 processors, and BOINC is running 4 projects at a time. I just switched to Boinc CC version 5.7.2, but that had no effect on the behavior of Rosetta, it did the same things under 5.4.11. Here's how it went, because I was exercising at the time and I saw it happen: Boinc went into screensaver mode, and Seti was displayed. After 10 minutes the CC changed the screensaver to Rosetta. Rosetta was initially running, and the graphics were changing. Sometime during its 10 minute slice, it froze (the cpu time counter on the graphics stopped updating) while in the "relax" phase. At the end of the slice, the graphics changed to QMC, no problem (but Rosetta was already dead). Then CDPN, and Seti again, then it was Rosetta's turn. The Seti graphics just stopped updating but remained on the screen, and the taskbar appeared. I could see the Rosetta app on the taskbar, and I could move the mouse onto the taskbar, but no programs responded. Ctrl-alt-del got me the task manager, and I killed the Rosetta app. The screen came back to life, and everything worked normally after that. The Rosetta WU showed as a Computation error in the Boinc manager. I manually reported it a few minutes ago. I'll look at the debugging options mentioned in the last post to see what I can do to help. |
FluffyChicken Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0 |
"Hi, there are many debugging options we can use and one targeted at screen savers" Don't do the GUIRPC one, otherwise you'll have a never-ending list updating very, very fast. I would stick with the screensaver one. Just in case, the cc_config.xml file for the screensaver flag should be: <cc_config> <log_flags> <scrsave_debug>1</scrsave_debug> </log_flags> </cc_config> Also attach to Ralph@Home, as we are up to R@H 5.40 there, trying to fix some error code: http://ralph.bakerlab.org You will know the logging is working because it'll be logging from the beginning: 08/11/2006 15:22:51||[scrsave_debug] ACTIVE_TASK::check_graphics_mode_ack(): got graphics ack <mode_hide_graphics/> for 1dcj__ETABLE_TEST_ABRELAX_rhh13sm6__1470_193_0, previous mode <mode_unsupported/> Here is also an example of the mem_usage debug output, updated every 10 seconds: 08/11/2006 15:22:59|ralph@home|[mem_usage_debug] 1dcj__ETABLE_TEST_ABRELAX_rhh13sm6__1470_193_0: RAM 28.30MB, page 54.39MB, 710.91 page faults/sec, user CPU 8.903, kernel CPU 0.110 Team mauisun.org |
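If the memory debug lines are being collected to post back here, they can be split into fields with a short script. The sketch below is based only on the single `[mem_usage_debug]` sample line in the post above; the regular expression, the field names, and the `parse_mem_usage` helper are all illustrative assumptions, and other BOINC client versions may format the line differently.

```python
import re
from typing import Optional

# Sample line copied from the post above.
LINE = ("08/11/2006 15:22:59|ralph@home|[mem_usage_debug] "
        "1dcj__ETABLE_TEST_ABRELAX_rhh13sm6__1470_193_0: RAM 28.30MB, page 54.39MB, "
        "710.91 page faults/sec, user CPU 8.903, kernel CPU 0.110")

# Pattern inferred from that one sample; treat it as a starting point.
PATTERN = re.compile(
    r"^(?P<timestamp>[^|]*)\|(?P<project>[^|]*)\|\[mem_usage_debug\] "
    r"(?P<task>\S+): RAM (?P<ram_mb>[\d.]+)MB, page (?P<page_mb>[\d.]+)MB, "
    r"(?P<faults_per_sec>[\d.]+) page faults/sec, "
    r"user CPU (?P<user_cpu>[\d.]+), kernel CPU (?P<kernel_cpu>[\d.]+)$"
)

NUMERIC_FIELDS = ("ram_mb", "page_mb", "faults_per_sec", "user_cpu", "kernel_cpu")


def parse_mem_usage(line: str) -> Optional[dict]:
    """Return the fields of a mem_usage_debug line, or None if it does not match."""
    match = PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    for key in NUMERIC_FIELDS:
        fields[key] = float(fields[key])
    return fields


record = parse_mem_usage(LINE)
print(record["task"], record["ram_mb"], record["page_mb"])
```

Lines that do not match (for example the `[scrsave_debug]` entries) simply return `None`, so the function can be run over a whole message log to pick out only the memory samples.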
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
This WU failed when I did a file->exit of BOINC prior to rebooting the PC. I have a 24hr preference, and it only ran 15hrs, and ended upon my reboot, so, I'm pretty sure this CAUSED it to end. This is a 1ogw__BOINC_POSE_ABRELAX_NEWRELAXFLAGS__1341_6529_0 WU. Messages just shows the WU resuming after starting BOINC, and then 80 seconds later says computation is finished. It shows a successful outcome, but it shouldn't have ended when it did. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
Rhiju Volunteer moderator Joined: 8 Jan 06 Posts: 223 Credit: 3,546 RAC: 0 |
Hi Conan: Thanks for keeping us posted on this -- your problems (including the reformat) sound bad, and we're really glad that you're sticking with Rosetta@home and ralph and helping us debug. Based on your first reports with 5.28, Chu and I thought your problems might stem from one of the new kinds of workunits that we are testing. Your report that things run smoothly with the screensaver off tells us that this is probably not the case. So it seems to be a graphics-induced problem. We were able to trap such a problem in 5.28-5.30 and fix it, but apparently the app still isn't working for you. Please keep us posted on whether FluffyChicken's fix helps you. Otherwise, we'll need to track down what is particularly wrong with your system (our overall error rate from Windows machines remains as low as with our pre-5.28 applications). >> @ Rosetta/Ralph Project Team, |
©2024 University of Washington
https://www.bakerlab.org