Problems and Technical Issues with Rosetta@home

Author	Message
BarryAZ Send message Joined: 27 Dec 05 Posts: 153 Credit: 30,845,917 RAC: 0	Message 81105 - Posted: 30 Jan 2017, 16:36:49 UTC - in response to Message 81104. As should be quite apparent to the truly active participants of this project, communication from the project to its active participants is a VERY low priority in the Rosetta scheme of things. That means that issues perhaps viewed as insignificant by the project folks (or perhaps issues that they are simply not aware of) only get a passing response. I believe it is an informed choice made by the project to not allocate time and resources to the 'care and feeding' of the active user community. Users get to prioritize as well. As a long-time participant (going back over ten years), there have been times when I maintained a daily completed work traffic generating 30 to 40 thousand credits. These days, it is more like 5 thousand credits, as I shifted MY priorities to the WorldGrid project. We all make choices. And another blank day for stats. ID: 81105 · Rating: 0 · rate: / Reply Quote

Sid Celery Send message Joined: 11 Feb 08 Posts: 2538 Credit: 47,093,569 RAC: 3,986	Message 81106 - Posted: 31 Jan 2017, 3:26:27 UTC - in response to Message 81105.
As should be quite apparent to the truly active participants of this project, communication from the project to its active participants is a VERY low priority in the Rosetta scheme of things. That means that issues perhaps viewed as insignificant by the project folks (or perhaps issues that they are simply not aware of) only get a passing response. I believe it is an informed choice made by the project to not allocate time and resources to the 'care and feeding' of the active user community. Users get to prioritize as well. As a long-time participant (going back over ten years), there have been times when I maintained a daily completed work traffic generating 30 to 40 thousand credits. These days, it is more like 5 thousand credits, as I shifted MY priorities to the WorldGrid project. We all make choices. And another blank day for stats.
Aside from that, I've wondered whether the project is subject to communication restrictions imposed from above, though that kind of subject may be more appropriately discussed in Café Rosetta. ID: 81106 · Rating: 0 · rate: / Reply Quote

dcdc Send message Joined: 3 Nov 05 Posts: 1835 Credit: 124,950,919 RAC: 3,766	Message 81108 - Posted: 1 Feb 2017, 10:03:37 UTC - in response to Message 81106. I think it's probably more likely that there's no funding for someone dedicated to the role, and everyone else has other priorities so it falls to no-one. There could obviously be lots more compute power available here but it might be that there is sufficient as-is so it works for getting the science done, regardless of how frustrating it is for users. I just found my computer sat idle with the 24 hour back-off bug. I'll add a second project, but because the server code here is so old I don't believe I can add a project as a backup - only as a low % so I'll do that. Maybe it would be useful if we maintained a sticky thread (Mod.Sense!) where we list the priorities from our point of view, so the team can see what we think needs fixing. I.e. under URGENT, we'd have the 24hr bug, or make work, and then under the next heading (Less urgent?) we'd have the server upgrade, maybe with a link to the discussion of it. If anything breaks, stick it at the top. Might that help? D ID: 81108 · Rating: 0 · rate: / Reply Quote

Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0	Message 81110 - Posted: 1 Feb 2017, 14:09:57 UTC But I don't quite understand why our moderator can't just email someone at UW when exceptional problems arise. Aren't they on speaking terms? If they don't need more work done, so be it. But they should tell us I believe. ID: 81110 · Rating: 0 · rate: / Reply Quote

BarryAZ Send message Joined: 27 Dec 05 Posts: 153 Credit: 30,845,917 RAC: 0	Message 81112 - Posted: 1 Feb 2017, 15:49:12 UTC - in response to Message 81110. That raises an interesting question. Maybe a driving reason for the project paying essentially NO attention to the care and feeding of its most active participants is an internal decision that they already have too much work to process internally, and that by actively ignoring the user community they are hoping to reduce the number of work units processed. I can confirm that approach has worked just fine for me....
But I don't quite understand why our moderator can't just email someone at UW when exceptional problems arise. Aren't they on speaking terms? If they don't need more work done, so be it. But they should tell us I believe. ID: 81112 · Rating: 0 · rate: / Reply Quote

Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 81113 - Posted: 1 Feb 2017, 15:56:48 UTC - in response to Message 81110. Last modified: 1 Feb 2017, 15:59:26 UTC
But I don't quite understand why our moderator can't just email someone at UW when exceptional problems arise. Aren't they on speaking terms? If they don't need more work done, so be it. But they should tell us I believe.
Actually, over the weekend, I did send an email to DK and Dr. Baker, with links to these msg boards, summarizing some suggested "todos" that would eliminate some of the annoyances I believe can be easily addressed. I cited: the 24hr backoff, the short (2 day) deadlines, the mention of "Android" in the message when your scheduler request does not return work, and the suggestion that the logic that decides whether or not to run the next model take the deadline into account in addition to the runtime preference. We know the server upgrade is coming for hardware, and it is on their list for the BOINC server code. Rosetta Moderator: Mod.Sense ID: 81113 · Rating: 0 · rate: / Reply Quote
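
Editor's sketch of the last suggestion above (deadline-aware model scheduling). This is not Rosetta's application code: the function name, its parameters, and the `report_margin` allowance are all hypothetical, and the runtime-preference check is only an assumption about how the existing logic might look.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not Rosetta's actual code: decide whether a task
 * should start another model. All times are in seconds. */
bool should_start_next_model(double elapsed,
                             double avg_model_time,
                             double runtime_pref,
                             double time_to_deadline,
                             double report_margin)
{
    assert(avg_model_time > 0.0 && report_margin >= 0.0);

    /* Assumed existing rule: stop once the next model would push the task
     * past the user's runtime preference. */
    bool fits_runtime_pref = (elapsed + avg_model_time) <= runtime_pref;

    /* Suggested addition: also stop if finishing the next model would leave
     * too little time to report the result before the deadline. */
    bool fits_deadline = (avg_model_time + report_margin) <= time_to_deadline;

    return fits_runtime_pref && fits_deadline;
}
```

With an 8 hour runtime preference and 10 minute models, a task that has run for 1 hour would continue if the deadline is a day away, but would stop if only 30 minutes remain and a 1 hour reporting margin is wanted.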

Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0	Message 81114 - Posted: 1 Feb 2017, 17:01:52 UTC - in response to Message 81113. Actually, over the weekend, I did send an EMail to DK and Dr. Baker, with links to these msg boards, summarizing some suggested "todos" that would eliminate some of the annoyances I believe can be easily addressed. Thanks very much. It will be interesting to see their response. I sometimes think like BarryAZ that it is just an indirect way of managing their workload. It works for a while, but I don't think the long-term prospects are good. ID: 81114 · Rating: 0 · rate: / Reply Quote

David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1480 Credit: 4,334,829 RAC: 0	Message 81116 - Posted: 1 Feb 2017, 19:26:12 UTC In response to Mod.Sense's feedback/recommendations, I increased the short 2 day deadline to 3 days. I don't think I can increase the deadline any further for these high priority jobs. The deadlines are short for an important reason: there are time constraints for these jobs (weekly CAMEO benchmarks for Robetta). I also increased the standard deadline from 5-7 days to 2 weeks, which should help. If anyone knows how to change the 24 hour backoff, please chime in. I'm not sure whether it's server or client logic, or whether it's configurable. Also, if anyone knows how to fix the "Android" alert, please let us know. I'll of course look into this also. The last point I think can be coded into our application, and I'll put it on the list of things to do for the next update. Also, if you are not getting work, it is most likely because there isn't any to issue at the time. Our demand comes in waves as projects progress. However, our public structure prediction server, Robetta, usually provides continual work. It was down for a few days last week for updates though. ID: 81116 · Rating: 0 · rate: / Reply Quote

BarryAZ Send message Joined: 27 Dec 05 Posts: 153 Credit: 30,845,917 RAC: 0	Message 81123 - Posted: 2 Feb 2017, 7:40:23 UTC - in response to Message 81116. David, thanks much for jumping in here -- the air had been getting rather thin. My sense is that the 24 hour backoff is a server-specific function. In the multiple other projects I work with, there is a progressive backoff, typically starting at either 5 minutes or 1 hour and progressing with repeated non-responses up to as much as 3 to 5 hours, then recycling to a 1 hour backoff. It is only with Rosetta that I have seen the 24 hour backoff. As to the Android no-work report -- again, that is likely a project-specific configuration. Other projects provide a 'no work for your applications' message, but with Rosetta it seems specific to Android work -- I would think that could be configured out. I don't do code though... so it's all speculative. The other issue -- for which your post is seriously appreciated -- is the sense of the active user community being a bit 'unloved' due to a lack of periodic responses from folks such as yourself. I'm sure you have more work than time, but even a weekly "we're here and watching" message might reduce that sense. Thanks again for your message.
In response to Mod.Sense's feedback/recommendations, I increased the short 2 day deadline to 3 days. I don't think I can increase the deadline any further for these high priority jobs. The deadlines are short for an important reason: there are time constraints for these jobs (weekly CAMEO benchmarks for Robetta). I also increased the standard deadline from 5-7 days to 2 weeks, which should help. If anyone knows how to change the 24 hour backoff, please chime in. I'm not sure whether it's server or client logic, or whether it's configurable. Also, if anyone knows how to fix the "Android" alert, please let us know. I'll of course look into this also. The last point I think can be coded into our application, and I'll put it on the list of things to do for the next update.
Also, if you are not getting work, it is most likely because there isn't any to issue at the time. Our demand comes in waves as projects progress. However, our public structure prediction server, Robetta, usually provides continual work. It was down for a few days last week for updates though. ID: 81123 · Rating: 0 · rate: / Reply Quote
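
Editor's sketch of the progressive backoff described in this post. The constants are hypothetical values picked to match the description (a 5 minute start, a cap of a few hours); they are not taken from the BOINC client source. Real clients typically also add random jitter, and the "recycling" step mentioned above is not modelled here.

```c
#include <assert.h>

/* Hypothetical values chosen to match the behaviour described in the post;
 * they are not taken from the BOINC client source. */
#define BACKOFF_MIN_SECS (5 * 60)     /* first retry after 5 minutes */
#define BACKOFF_MAX_SECS (4 * 3600)   /* never wait more than 4 hours */

/* Delay before the next scheduler request after `failures` consecutive
 * requests that returned no work (failures >= 1). The delay doubles with
 * each failure and is capped at BACKOFF_MAX_SECS. */
long backoff_delay_secs(int failures)
{
    assert(failures >= 1);
    long delay = BACKOFF_MIN_SECS;
    for (int i = 1; i < failures && delay < BACKOFF_MAX_SECS; i++) {
        delay *= 2;
    }
    return delay < BACKOFF_MAX_SECS ? delay : BACKOFF_MAX_SECS;
}
```

Under these assumptions a host that keeps getting no work waits 5, 10, 20, 40, 80 and 160 minutes, and from then on 4 hours between requests, which is far shorter than a fixed 24 hour delay.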

David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1480 Credit: 4,334,829 RAC: 0	Message 81129 - Posted: 2 Feb 2017, 19:37:04 UTC I found where the relevant parameters are set in the scheduling code. I'm open to suggestions and feedback for more preferable values as long as it doesn't cause too much load on our servers.
// various delay params.
// Any of these could be moved into SCHED_CONFIG, if projects need control.
#define DELAY_MISSING_KEY 3600
    // account key missing or invalid
#define DELAY_UNACCEPTABLE_OS 3600*24
    // Darwin 5.x or 6.x (E@h only)
#define DELAY_BAD_CLIENT_VERSION 3600*24
    // client version < config.min_core_client_version
#define DELAY_NO_WORK_SKIP 0
    // no work, config.nowork_skip is set
    // Rely on the client's exponential backoff in this case
#define DELAY_PLATFORM_UNSUPPORTED 3600*24
    // platform not in our DB
#define DELAY_DISK_SPACE 3600
    // too little disk space or prefs (locality scheduling)
#define DELAY_DELETE_FILE 3600*4
    // wait for client to delete a file (locality scheduling)
#define DELAY_ANONYMOUS 3600*4
    // anonymous platform client doesn't have version
#define DELAY_NO_WORK_TEMP 0
    // client asked for work but we didn't send any,
    // because of a reason that could be fixed by user
    // (e.g. prefs, or run BOINC more)
    // Rely on the client's exponential backoff in this case
#define DELAY_NO_WORK_PERM 3600*24
    // client asked for work but we didn't send any,
    // because of a reason not easily changed
    // (like wrong kind of computer)
#define DELAY_NO_WORK_CACHE 0
    // client asked for work but we didn't send any,
    // because user had too many results in cache.
    // Rely on client's exponential backoff
#define DELAY_MAX (2*86400)
    // maximum delay request
ID: 81129 · Rating: 0 · rate: / Reply Quote
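
Editor's sketch of the arithmetic behind these values, assuming the 24 hour entries are written as 3600*24 (one hour in seconds, times 24). The two helper functions are hypothetical stand-ins, not the scheduler's own code, and the thread does not confirm which constant is responsible for the backoff users see. The macros are parenthesized here so they expand safely inside larger expressions.

```c
#include <assert.h>

/* Assumed values: 3600*24 for the 24 hour delay, 2*86400 for the maximum. */
#define DELAY_NO_WORK_PERM (3600 * 24)   /* 86400 s = 24 hours */
#define DELAY_MAX          (2 * 86400)   /* 172800 s = 48 hours */

/* Hypothetical stand-in for how a scheduler might bound a requested delay:
 * clamp it into the range [0, DELAY_MAX]. */
double clamp_request_delay(double secs)
{
    if (secs < 0) return 0;
    if (secs > DELAY_MAX) return DELAY_MAX;
    return secs;
}

/* Convert a delay in seconds to hours for display. */
double delay_hours(double secs)
{
    assert(secs >= 0);
    return secs / 3600.0;
}
```

Under these assumptions, `delay_hours(clamp_request_delay(DELAY_NO_WORK_PERM))` evaluates to 24.0, while a value of 3600 would give a 1 hour delay.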

Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 81130 - Posted: 2 Feb 2017, 22:00:45 UTC I believe this one is the behavior people are seeing elsewhere and expecting:
#define DELAY_NO_WORK_SKIP 0
    // no work, config.nowork_skip is set
    // Rely on the client's exponential backoff in this case
Rosetta Moderator: Mod.Sense ID: 81130 · Rating: 0 · rate: / Reply Quote

Sid Celery Send message Joined: 11 Feb 08 Posts: 2538 Credit: 47,093,569 RAC: 3,986	Message 81135 - Posted: 3 Feb 2017, 13:17:21 UTC - in response to Message 81129. I found where the relevant parameters are set in the scheduling code. I'm open to suggestions and feedback for more preferable values as long as it doesn't cause too much load on our servers. I'd suggest 1 hour as a reasonable compromise between server load and our buffer sizes ID: 81135 · Rating: 0 · rate: / Reply Quote

Erich56 Send message Joined: 11 Jan 16 Posts: 35 Credit: 1,437,503 RAC: 0	Message 81143 - Posted: 6 Feb 2017, 14:19:13 UTC For several days, I've been back to Rosetta with one of my PCs, and have crunched 14 tasks since then. Today, when BOINC was trying to download the next task, I got the notice "06.02.2017 14:49:08 | rosetta@home | Rosetta Mini for Android is not available for your type of computer." How come? I am not trying to crunch Rosetta Mini for Android on my Windows PC. ID: 81143 · Rating: 0 · rate: / Reply Quote

Erich56 Send message Joined: 11 Jan 16 Posts: 35 Credit: 1,437,503 RAC: 0	Message 81144 - Posted: 6 Feb 2017, 16:01:06 UTC - in response to Message 81143.
"06.02.2017 14:49:08 | rosetta@home | Rosetta Mini for Android is not available for your type of computer."
Just now, after a while, a new task was downloaded :-) ID: 81144 · Rating: 0 · rate: / Reply Quote

Erich56 Send message Joined: 11 Jan 16 Posts: 35 Credit: 1,437,503 RAC: 0	Message 81145 - Posted: 6 Feb 2017, 16:44:08 UTC Unfortunately, now again the BOINC messenger shows the message "6.02.2017 17:38:42 | rosetta@home | Rosetta Mini for Android is not available for your type of Computer" when trying to download a new task on my Windows system. Why so? What's going wrong? ID: 81145 · Rating: 0 · rate: / Reply Quote

[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 2151 Credit: 12,881,353 RAC: 3,926	Message 81146 - Posted: 6 Feb 2017, 17:05:03 UTC - in response to Message 81145.
"6.02.2017 17:38:42 | rosetta@home | Rosetta Mini for Android is not available for your type of Computer" when trying to download a new task on my Windows system. Why so? What's going wrong?
A long time ago, in a galaxy far.... ID: 81146 · Rating: 0 · rate: / Reply Quote

Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0	Message 81147 - Posted: 6 Feb 2017, 18:19:49 UTC - in response to Message 81145. Last modified: 6 Feb 2017, 18:26:04 UTC
Unfortunately, now again the BOINC messenger shows the message "6.02.2017 17:38:42 | rosetta@home | Rosetta Mini for Android is not available for your type of Computer"
Erich, this is a known problem (Rosetta Mini for Android problem). Read earlier in this thread. They are working to find and fix it. But it has gotten better for me the last day or two, whether that means anything. ID: 81147 · Rating: 0 · rate: / Reply Quote

David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1480 Credit: 4,334,829 RAC: 0	Message 81148 - Posted: 6 Feb 2017, 20:24:53 UTC - in response to Message 81147.
Unfortunately, now again the BOINC messenger shows the message "6.02.2017 17:38:42 | rosetta@home | Rosetta Mini for Android is not available for your type of Computer"
Erich, this is a known problem (Rosetta Mini for Android problem). Read earlier in this thread. They are working to find and fix it. But it has gotten better for me the last day or two, whether that means anything.
I think this alert only occurs when there are no non-Android work units available. It's not a serious issue. ID: 81148 · Rating: 0 · rate: / Reply Quote

Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 81149 - Posted: 6 Feb 2017, 20:33:25 UTC - in response to Message 81148.
Unfortunately, now again the BOINC messenger shows the message "6.02.2017 17:38:42 | rosetta@home | Rosetta Mini for Android is not available for your type of Computer"
Erich, this is a known problem (Rosetta Mini for Android problem). Read earlier in this thread. They are working to find and fix it. But it has gotten better for me the last day or two, whether that means anything.
I think this alert only occurs when there are no non-Android work units available. It's not a serious issue.
We know it is not a serious issue, but EVERYONE who encounters this message immediately feels things are not running properly (except, I suppose, an Android user). That is why the request was made to improve the wording of the message. Rosetta Moderator: Mod.Sense ID: 81149 · Rating: 0 · rate: / Reply Quote

Sid Celery Send message Joined: 11 Feb 08 Posts: 2538 Credit: 47,093,569 RAC: 3,986	Message 81151 - Posted: 7 Feb 2017, 5:17:50 UTC - in response to Message 81148.
Unfortunately, now again the BOINC messenger shows the message "6.02.2017 17:38:42 | rosetta@home | Rosetta Mini for Android is not available for your type of Computer"
Erich, this is a known problem (Rosetta Mini for Android problem). Read earlier in this thread. They are working to find and fix it. But it has gotten better for me the last day or two, whether that means anything.
I think this alert only occurs when there are no non-Android work units available. It's not a serious issue.
But the 24hr backoff that results directly from it <is> a serious issue for <users>, if not for the Rosetta project itself. Our buffers run out and we either run nothing or many tasks get downloaded from backup projects if we have one set. I can't even believe you said that tbh. It came up on 3 of my devices today and 1 of my team-members' - all coming up with the 24hr backoff message. 2 of those 4 are attended, 2 aren't. If the unattended ones are unlucky they'll re-poll after 24hrs and maybe find there aren't tasks again, which'll mean they get another 24hr backoff and run out of Rosetta work. And when I get to them later in the week I'll spend a few days forcing a heap of non-preferred projects' tasks to run in order to clear them down so there's space to get Rosetta tasks back into their buffer. Then when I get back here a few days later I may find the same here and do the same. You may think it's not serious. I think it's a circus that's been driving me crazy for the last few months without a break. So if you could see your way clear to changing that backoff to 1 hour instead of 24 hours (sounds like two minutes' work to me) - because it hasn't happened yet - I'd kind of appreciate it, if that's not too much to ask.
And if you could avoid saying all this manual intervention I'm having to do week after week after week after week "isn't a serious issue" ever again in your entire lifetime, that would be kind of neat too. No rush, obviously... ID: 81151 · Rating: 0 · rate: / Reply Quote
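
Editor's sketch of the unattended-host scenario described in this post. It is a simplified, hypothetical model: a host is told at time 0 that there is no work, then re-polls at a fixed interval, and work becomes available some time later. The function name and the fixed-interval assumption are illustrative only.

```c
#include <assert.h>

/* Hypothetical model: the host polls at time 0, gets no work, and then
 * re-polls every `delay_secs`. Work becomes available `available_at`
 * seconds after the first poll. Returns the time (in seconds) of the first
 * poll that can actually receive work. */
long first_successful_poll(long available_at, long delay_secs)
{
    assert(available_at > 0 && delay_secs > 0);
    /* Round available_at up to the next multiple of delay_secs. */
    long polls = (available_at + delay_secs - 1) / delay_secs;
    return polls * delay_secs;
}
```

In this model, if work reappears 2 hours after the failed request, a host on a 24 hour delay first sees it at 86400 seconds (24 hours), while a host on a 1 hour delay sees it at 7200 seconds (2 hours).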