Message boards : Number crunching : Rosetta Beta 6.00
Author | Message |
---|---|
Jean-David Beyer Send message Joined: 2 Nov 05 Posts: 187 Credit: 6,328,464 RAC: 6,028 |
I have been a participant in rosetta@home for 8 years, and only rarely do my allocated tasks fail due to computation errors. Yet lately, all but one of about 20 of my tasks on the beta 6.03 app with the 7hal prefix in the task name have ended with a 'computation error' message. Sometimes this happens within a few moments of starting, but much more frequently after running for many times the originally estimated 'remaining time'. When I first noticed the problem, every beta task I received failed, but they were mostly short ones. As time passed, more and more of the failures were long ones. Still, every one failed. Right now I have only Rosetta 4.20 tasks, which work just fine. |
Jeff Send message Joined: 24 Jan 15 Posts: 4 Credit: 1,339,258 RAC: 785 |
Thanks for that, Jean-David. I'll abort any 7hal tasks that download, till I learn the problem has been sorted. |
mmonnin Send message Joined: 2 Jun 16 Posts: 57 Credit: 23,165,110 RAC: 57,119 |
Thanks for that, Jean-David. I'll abort any 7hal tasks that download, till I learn the problem has been sorted. Just abort the ones that never checkpoint. They either fail immediately, never checkpoint and end in a compute error, or complete OK. Mine would checkpoint within 11-12 min if they were going to. Not sure if this changes based on the run time set in preferences. Mine were set for 12 hours. |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1990 Credit: 9,488,078 RAC: 12,215 |
I'm now crunching some "Hb_zero_test_7hal" tasks. Maybe they decided to test a new version of this family of WUs (7hal). |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1990 Credit: 9,488,078 RAC: 12,215 |
I'm crunching, now, some "Hb_zero_test_7hal" These WUs are all OK. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2112 Credit: 41,044,764 RAC: 21,216 |
I'm crunching, now, some "Hb_zero_test_7hal" I didn't notice those - interesting. No sign of corrected 7hal tasks coming through yet, but I did grab some more rb tasks today |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2112 Credit: 41,044,764 RAC: 21,216 |
I'm crunching, now, some "Hb_zero_test_7hal" I spoke 12hrs too soon. A fair few Beta 6.03 tasks have come down this morning with the name 7mer_run_af2_hal |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2112 Credit: 41,044,764 RAC: 21,216 |
I'm crunching, now, some "Hb_zero_test_7hal" 24hrs later, a lot of Rosetta Beta 6.03 tasks have run, completed and been credited with no errors, and I'm now starting 8mer_run_af2_hal tasks as well, which have started fine too. Looks like it was the tasks, not the app. Good news. |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1990 Credit: 9,488,078 RAC: 12,215 |
Maybe, in the future, we will see a new version of the app |
kotenok2000 Send message Joined: 22 Feb 11 Posts: 257 Credit: 483,503 RAC: 325 |
0 should mean zero. -1 should mean no limit. |
Link Send message Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
0 should mean zero. I agree, either that or dedicated tags in the app_config.xml file, one to disable the entire app and one to disable a specific app version: <app_config> [<app> <name>Application_Name</name> <max_concurrent>1</max_concurrent> [<report_results_immediately/>] [<fraction_done_exact/>] [<disable_app/>] <gpu_versions> <gpu_usage>.5</gpu_usage> <cpu_usage>.4</cpu_usage> </gpu_versions> </app>] ... [<app_version> <app_name>Application_Name</app_name> [<plan_class>mt</plan_class>] [<avg_ncpus>x</avg_ncpus>] [<ngpus>x</ngpus>] [<cmdline>--nthreads 7</cmdline>] [<disable_app_version/>] </app_version>] ... [<project_max_concurrent>N</project_max_concurrent>] [<report_results_immediately/>] </app_config> This would also help in many other cases, for example to disable the CUDA app on not ancient Nvidia cards on Moo!, or in general to disable inefficient or problematic app versions without the servers retesting them every now and then. |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1990 Credit: 9,488,078 RAC: 12,215 |
On the applications page there is a new version, 6.04. Bugfix? New functionality? |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1990 Credit: 9,488,078 RAC: 12,215 |
In the application page there is the new version, 6.04 And, for Linux, a new version (6.05) at the end of October. As usual, no info. |
JohnDK Send message Joined: 6 Apr 20 Posts: 33 Credit: 2,390,240 RAC: 0 |
Out of the 13 WUs I got, 11 of them had a computation error within 30 secs. The other 2 have so far run for 2 hours. <core_client_version>7.24.1</core_client_version> <![CDATA[ <message> Forkert funktion. (0x1) - exit code 1 (0x1)</message> <stderr_txt> command: projects/boinc.bakerlab.org_rosetta/rosetta_beta_6.04_windows_x86_64.exe @07aaNewf_af2_7aa_hal_9.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 1920894 Using database: database_0f7f01a1b07database ERROR: Unable to find desired residue 'LEU:N_Methylation' with variant 'LOWER_TERMINUS_VARIANT'. Attempted to add target variant(s) to ResidueType using both ResidueType base name 'LEU' and base ResidueType. Was attempting to add new variant type 'LOWER_TERMINUS_VARIANT' ERROR:: Exit from: src/core/chemical/ResidueTypeSet.cc line: 980 BOINC:: Error reading and gzipping output datafile: default.out 19:41:27 (11480): called boinc_finish(1) </stderr_txt> ]]> |
Bryn Mawr Send message Joined: 26 Dec 18 Posts: 389 Credit: 12,040,592 RAC: 14,271 |
Out of the 13 WUs I got, 11 of them had a computation error within 30 secs. The other 2 have so far run for 2 hours. Yes, I've just come in to report the same error. I'm running Ubuntu 22.04.3 LTS with BOINC 7.20.5; the tasks are Beta v6.05 with names starting 08aaNewf_af2_8aa_dif_8. I've reset the project to pull down a new set of master files, but it's a hard error. |
Clint Send message Joined: 1 Oct 10 Posts: 4 Credit: 5,044,833 RAC: 393 |
38 of 41 jobs errored out last night for many different reasons. ERROR: Unable to find desired residue 'DVAL:N_Methylation' with variant 'LOWER_TERMINUS_VARIANT'. Attempted to add target variant(s) to ResidueType using both ResidueType base name 'DVAL' and base ResidueType. Was attempting to add new variant type 'LOWER_TERMINUS_VARIANT' ERROR:: Exit from: src/core/chemical/ResidueTypeSet.cc line: 980 BOINC:: Error reading and gzipping output datafile: default.out ERROR: Error in simple_cycpep_predict app: The N-methylation position indices must be within the pose! ERROR:: Exit from: src/protocols/cyclic_peptide_predict/SimpleCycpepPredictApplication.cc line: 2279 BOINC:: Error reading and gzipping output datafile: default.out ERROR: Unable to find desired residue 'PHE:N_Methylation' with variant 'LOWER_TERMINUS_VARIANT'. Attempted to add target variant(s) to ResidueType using both ResidueType base name 'PHE' and base ResidueType. Was attempting to add new variant type 'LOWER_TERMINUS_VARIANT' ERROR:: Exit from: src/core/chemical/ResidueTypeSet.cc line: 980 BOINC:: Error reading and gzipping output datafile: default.out ERROR: Unable to find desired residue 'DLEU:N_Methylation' with variant 'LOWER_TERMINUS_VARIANT'. Attempted to add target variant(s) to ResidueType using both ResidueType base name 'DLEU' and base ResidueType. Was attempting to add new variant type 'LOWER_TERMINUS_VARIANT' ERROR:: Exit from: src/core/chemical/ResidueTypeSet.cc line: 980 BOINC:: Error reading and gzipping output datafile: default.out ERROR: Unable to find desired residue 'DPHE:N_Methylation' with variant 'LOWER_TERMINUS_VARIANT'. Attempted to add target variant(s) to ResidueType using both ResidueType base name 'DPHE' and base ResidueType. Was attempting to add new variant type 'LOWER_TERMINUS_VARIANT' ERROR:: Exit from: src/core/chemical/ResidueTypeSet.cc line: 980 BOINC:: Error reading and gzipping output datafile: default.out |
PMH_UK Send message Joined: 9 Aug 08 Posts: 16 Credit: 1,243,749 RAC: 0 |
Had 3 re-sends, all errored with variants on "ERROR: Unable to find desired residue...", as did originals. Paul. |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1990 Credit: 9,488,078 RAC: 12,215 |
38 of 41 jobs error out last night for many different reasons. I also have: ERROR: Unable to find desired residue 'DALA:N_Methylation' with variant 'LOWER_TERMINUS_VARIANT'. Attempted to add target variant(s) to ResidueType using both ResidueType base name 'DALA' and base ResidueType. Was attempting to add new variant type 'LOWER_TERMINUS_VARIANT' |
PMH_UK Send message Joined: 9 Aug 08 Posts: 16 Credit: 1,243,749 RAC: 0 |
Now 9 of 9 re-sends with similar failure - "ERROR: Unable to find desired residue...", as did originals. Paul. |
Jean-David Beyer Send message Joined: 2 Nov 05 Posts: 187 Credit: 6,328,464 RAC: 6,028 |
I got a batch of 36. All but one failed after a second or two of CPU time; the remaining one ran to completion correctly. Here is my machine: Computer 5910575 CPU type GenuineIntel Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7] Number of processors 16 Coprocessors --- Operating System Linux Red Hat Enterprise Linux Red Hat Enterprise Linux 8.8 (Ootpa) [4.18.0-477.27.1.el8_8.x86_64|libc 2.28] BOINC version 7.20.2 Memory 128086.02 MB Cache 16896 KB Swap space 15992 MB This is the error message of one of the failures. Stderr output <core_client_version>7.20.2</core_client_version> <![CDATA[ <message> process exited with code 1 (0x1, -255)</message> <stderr_txt> command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_beta_6.05_x86_64-pc-linux-gnu @08aaNewf_af2_8aa_hal_3.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 1874700 Using database: database_0f7f01a1b07/database ERROR: Error in simple_cycpep_predict app: The N-methylation position indices must be within the pose! ERROR:: Exit from: src/protocols/cyclic_peptide_predict/SimpleCycpepPredictApplication.cc line: 2279 BOINC:: Error reading and gzipping output datafile: default.out 18:12:37 (147984): called boinc_finish(1) </stderr_txt> ]]> |
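
The stderr dumps quoted in this thread appear to fall into two families: a missing `N_Methylation` residue variant, and N-methylation indices outside the pose. For anyone wanting to tally their own failed tasks, here is a minimal sketch in Python. The function names `classify_stderr` and `tally` and the label strings are made up for illustration and are not part of BOINC or Rosetta; the patterns are copied only from the messages posted above, so anything else is reported as "unknown".

```python
import re
from collections import Counter

# Patterns copied from the stderr excerpts quoted in this thread.
# They are assumptions about the message wording, not an official list.
MISSING_RESIDUE = re.compile(
    r"ERROR: Unable to find desired residue '([^']+)' with variant '([^']+)'"
)
INDEX_OUT_OF_POSE = re.compile(
    r"The N-methylation position indices must be within the pose"
)


def classify_stderr(text):
    """Return a short label for the first known error found in a stderr dump."""
    match = MISSING_RESIDUE.search(text)
    if match:
        residue, variant = match.groups()
        return f"missing_residue:{residue}:{variant}"
    if INDEX_OUT_OF_POSE.search(text):
        return "n_methylation_index_out_of_pose"
    return "unknown"


def tally(stderr_dumps):
    """Count how often each error label occurs across several stderr dumps."""
    return Counter(classify_stderr(t) for t in stderr_dumps)
```

Pasting each failed task's stderr text into a list and passing it to `tally` gives a per-residue count, which may make it easier to see whether every failure in a batch shares the same cause.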
©2024 University of Washington
https://www.bakerlab.org