Message boards : Number crunching : Minirosetta 3.73-3.78
Author | Message |
---|---|
Tero Joined: 22 Jul 17 Posts: 1 Credit: 939,545 RAC: 0 |
It seems that version 3.78 broke compatibility with the Linux client. After the Minirosetta 3.78 update, tasks started to fail with "computation error". The latest version of the "regular" Rosetta works fine. I run CentOS Linux 7.3 with client 7.6.22. The error seems to be in how the new version handles files: ERROR: in::file::zip minirosetta_database.zip does not exist! ERROR:: Exit from: src/apps/public/boinc/minirosetta.cc line: 195 (Example workunit 853521226) There is a database zip file, but its name is minirosetta_database_d0bf94b.zip. If I copy that zip file to minirosetta_database.zip, I get file errors like "ERROR: ERROR: Option file open failed for: 'flags_rb_10_11_78082_120670__t000__0_C1_robetta'" (workunit 854223185). That file was present in the project folder. |
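A quick way to confirm the file-name mismatch described in this post is to list what is actually in the project folder. The following is only a diagnostic sketch: the function name and the matching rule are hypothetical, and it does not repair anything (the post reports that simply copying the archive to the expected name led to a different error).

```python
EXPECTED_NAME = "minirosetta_database.zip"  # name taken from the error message in the post


def find_database_zips(filenames):
    """Return (expected_present, similarly_named_zips) for a list of file names.

    Hypothetical matching rule: anything that starts with
    "minirosetta_database" and ends with ".zip", other than the exact
    expected name, counts as "similarly named".
    """
    names = list(filenames)
    expected_present = EXPECTED_NAME in names
    candidates = sorted(
        n for n in names
        if n.startswith("minirosetta_database")
        and n.endswith(".zip")
        and n != EXPECTED_NAME
    )
    return expected_present, candidates


# Example with the file name reported in the post.
present, others = find_database_zips(
    ["minirosetta_database_d0bf94b.zip", "some_other_file"]
)
print("expected file present:", present)
print("similarly named archives:", others)
```

To run it against a real installation, pass the result of `os.listdir()` on your own project directory instead of the example list.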
planetclown Joined: 27 Jan 12 Posts: 5 Credit: 12,973,394 RAC: 6,758 |
Hello, I'm occasionally seeing two different errors on the following apps:
Rosetta Mini v3.78 i686-pc-linux-gnu
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
*** glibc detected *** ../../projects/boinc.bakerlab.org_rosetta/minirosetta_3.78_x86_64-pc-linux-gnu: free(): invalid pointer: 0x13867fb8 ***
======= Backtrace: =========
[0xdf36941] [0xdf3a45b] [0xede768c] [0xdeffb51] [0x81630ad] [0xd45eb92] [0xd45ebcb] [0xd465336] [0xd46ca67] [0xd46feef] [0xd474232] [0xd400a01] [0xd40c69a] [0xc9ac83d] [0xc9ad47f] [0xca8b53f] [0xb08de97] [0xb265920] [0xb2a83b6] [0xb29f4d2] [0x8aaae73] [0x8aae71d] [0x8ab361b] [0x8a925f9] [0x8a65a47] [0xb371855] [0xb3743be] [0xb434b13] [0xb43119d] [0x8a6fa23] [0x8056303] [0xdf0cfd8] [0x8048131]
======= Memory map: ========
08048000-0ede4000 r-xp 00000000 08:05 1183736 /var/lib/boinc-client/projects/boinc.bakerlab.org_rosetta/minirosetta_3.78_x86_64-pc-linux-gnu
0ede4000-0edec000 rw-p 06d9c000 08:05 1183736 /var/lib/boinc-client/projects/boinc.bakerlab.org_rosetta/minirosetta_3.78_x86_64-pc-linux-gnu
0edec000-0f115000 rw-p 00000000 00:00 0
10d45000-17e18000 rw-p 00000000 00:00 0 [heap]
ebd2d000-f2cd4000 rw-p 00000000 00:00 0
f305c000-f3d64000 rw-p 00000000 00:00 0
f4200000-f4221000 rw-p 00000000 00:00 0
f4221000-f4300000 ---p 00000000 00:00 0
f517e000-f517f000 ---p 00000000 00:00 0
f517f000-f5e8f000 rw-p 00000000 00:00 0
f5e8f000-f7667000 rw-s 00000000 08:05 1581177 /var/lib/boinc-client/slots/11/boinc_minirosetta_11
f7667000-f7668000 ---p 00000000 00:00 0
f7668000-f766b000 rw-p 00000000 00:00 0
f766b000-f766d000 rw-s 00000000 08:05 1581173 /var/lib/boinc-client/slots/11/boinc_mmap_file
f766d000-f776a000 rw-p 00000000 00:00 0
f776a000-f776c000 r--p 00000000 00:00 0 [vvar]
f776c000-f776e000 r-xp 00000000 00:00 0 [vdso]
ffc6c000-ffc8e000 rw-p 00000000 00:00 0 [stack]
</stderr_txt> ]]>
The second error is SIGSEGV: segmentation violation
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
SIGSEGV: segmentation violation
Stack trace (4 frames):
[0xde75dcf] [0xf77ceca0] [0xdf36358] [0xeffb51ff]
Exiting...
</stderr_txt> ]]>
I haven't seen any errors while running Rosetta v4.06 app or other BOINC projects. Any help would be appreciated. Thank you! |
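The two failure signatures in this post can be told apart mechanically when going through many failed tasks. Here is a minimal sketch that labels a stderr text by the marker strings seen in the logs; the function name and the labels are made up for illustration, and the matching is deliberately crude.

```python
from collections import Counter


def classify_stderr(stderr_text):
    """Return a coarse label for a task's stderr text.

    The markers are the literal strings from the two logs quoted in the post.
    """
    if "glibc detected" in stderr_text and "invalid pointer" in stderr_text:
        return "glibc_invalid_pointer"
    if "SIGSEGV" in stderr_text:
        return "sigsegv"
    return "other"


# Shortened excerpts of the two logs from the post, plus a clean run.
samples = [
    "*** glibc detected *** minirosetta_3.78_x86_64-pc-linux-gnu: "
    "free(): invalid pointer: 0x13867fb8 ***",
    "SIGSEGV: segmentation violation Stack trace (4 frames):",
    "BOINC:: Worker startup. Starting watchdog... Watchdog active.",
]
tally = Counter(classify_stderr(s) for s in samples)
print(dict(tally))
```

A tally like this only says which symptom showed up how often; it says nothing about whether the cause is the application or the hardware.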
floyd Joined: 26 Jun 14 Posts: 23 Credit: 10,268,639 RAC: 0 |
"Hello, I'm occasionally seeing two different errors on the following apps:"
Sorry to say that, but your crappy Ryzen is the problem. It would be good if we had the choice to run only Rosetta tasks and not Rosetta Mini. Come on, project staff, it can't be that hard to do. Every project I know allows you to choose your applications; it's probably already in the standard server code. In the meantime, Ryzen users could reduce their run time to lose less time per crash, or switch to other projects. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,623,704 RAC: 9,591 |
"Sorry to say that but your crappy Ryzen is the problem"
Ryzen is crappy? Are you a troll? |
floyd Joined: 26 Jun 14 Posts: 23 Credit: 10,268,639 RAC: 0 |
"Ryzen is crappy? Are you a troll?"
Yes. No. You don't seem to own a Ryzen. I do. Let me give some brief information about my current computers. One Ryzen 7 1700, right now showing 516 valid tasks and 60 errors. And one FX-8320E, 67 valid and 1 error. I can assure you the Ryzen behaves exactly as planetclown describes. Either the application crashes outright with a segmentation fault, or the C library kills it because it detected an invalid pointer, this way preventing a possible segfault. If you think about it there must also be cases where an invalid pointer goes unnoticed but doesn't cause a segfault. The result could be anything. I wouldn't rely on a Ryzen for something important; let's hope this project's validator is good. If you dig through the project's host list you'll find more Ryzens showing these symptoms, the most obvious running Linux, but also some Windows hosts with a high number of access violations that could be related. Also as planetclown describes, the errors don't seem to happen with the new Rosetta application or at other projects, so you could be tempted to dismiss this as an application error in Rosetta Mini. But there's at least one other example of spontaneous segfaults on Ryzens. Search for "kill_ryzen" or "marginality error" and you'll find many reports on Ryzens segfaulting in a particular use case: massive parallel compiler runs on Linux. An extreme scenario, but not unrealistic, and there's no excuse for simply crashing. People there claim you're safe if you don't do that kind of thing, but without arguments, and Rosetta proves them wrong. So there are at least two completely unrelated cases of several Ryzens segfaulting out of the blue, and no valid reason to assume that's all. In other words, those things can unpredictably crash for unknown reasons, and if they don't crash you still can't trust the results. Crap. |
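For a rough sense of scale, the task counts floyd gives for his two hosts can be turned into error rates. This sketch assumes that "valid" plus "errors" covers all finished tasks on each host, which the post does not state, so treat the percentages as approximate.

```python
def error_rate(valid, errors):
    """Fraction of finished tasks that errored (0.0 if there are no tasks)."""
    total = valid + errors
    return errors / total if total else 0.0


# (valid, errors) as reported in the post.
hosts = {
    "Ryzen 7 1700": (516, 60),
    "FX-8320E": (67, 1),
}

for name, (valid, errors) in hosts.items():
    print(f"{name}: {error_rate(valid, errors):.1%} of finished tasks errored")
# Ryzen 7 1700: 10.4% of finished tasks errored
# FX-8320E: 1.5% of finished tasks errored
```

The FX-8320E sample is small (68 tasks), so its rate is far less certain than the Ryzen's.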
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,623,704 RAC: 9,591 |
"Either the application crashes outright with a segmentation fault, or the C library kills it because it detected an invalid pointer, this way preventing a possible segfault. If you think about it there must also be cases where an invalid pointer goes unnoticed but doesn't cause a segfault."
If you have an invalid pointer in your software, it's your problem, not a CPU problem.
"Search for 'kill_ryzen' or 'marginality error' and you'll find many reports on Ryzens segfaulting in a particular use case: massive parallel compiler runs on Linux. An extreme scenario, but not unrealistic, and there's no excuse for simply crashing."
Problem solved months ago, with free replacements of early Ryzens and with a BIOS update (AGESA 1.0.0.6b). |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"Problem solved months ago, with free replacements of early Ryzens and with a BIOS update (AGESA 1.0.0.6b)."
I purchased a Ryzen 1700 made in week 33 of 2017, so it is a fixed version. It is on an ASRock Fatal1ty X370 Gaming X motherboard with the AGESA 1.0.0.6b BIOS, and with 32 GB of Patriot DDR4 memory (15-15-15-36). The CPU is not overclocked, and the machine runs Ubuntu 17.10. I just started running Rosetta on 15 cores, with the other core supporting a GTX 970 on Folding. Previously, it had been running WCG for about a month with no errors, but that is too easy. https://boinc.bakerlab.org/rosetta/results.php?hostid=3299745 In addition to errors, I am interested in the output. These are the 24-hour work units, and I was averaging about 800 points each on an i7-3770 (7 cores, with one reserved for a GPU, also on Ubuntu) for those that ran the full 24 hours. We will see how it goes. |
mmonnin Joined: 2 Jun 16 Posts: 59 Credit: 24,222,307 RAC: 83,030 |
You can RMA segfault Zen chips. http://www.extremetech.com/computing/254750-amd-replaces-ryzen-cpus-users-affected-rare-linux-bug |
floyd Joined: 26 Jun 14 Posts: 23 Credit: 10,268,639 RAC: 0 |
"If you have an invalid pointer in your software, it's your problem, not a CPU problem."
I'm missing the word "because" in that sentence.
"Problem solved months ago"
I'm not aware of an official statement saying the problem's been identified, let alone solved. Care to give me a pointer? *giggles*
"with free replacements of early Ryzens"
That's not a solution, it's an emergency measure. And of course I expect it to be free. Good thing this option exists though. But in this RMA process they'll ask you to run tests and document them with photos. Believe it or not, I have no means to take photos, so no RMA for me.
"and with a BIOS update (AGESA 1.0.0.6b)."
AGESA 1.0.0.6b doesn't solve this. Is it even supposed to? |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,623,704 RAC: 9,591 |
"That's not a solution, it's an emergency measure. And of course I expect it to be free. Good thing this option exists though. But in this RMA process they'll ask you to run tests and document them with photos. Believe it or not, I have no means to take photos, so no RMA for me."
There is a radical solution: switch to Windows 10. Problem goes away :-P Or wait for 4.06 to become the default application. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,623,704 RAC: 9,591 |
"Problem solved months ago"
"I'm not aware of an official statement saying the problem's been identified, let alone solved. Care to give me a pointer? *giggles*"
The new "RMA Ryzen" doesn't have this problem, so they found it and resolved it... |
floyd Joined: 26 Jun 14 Posts: 23 Credit: 10,268,639 RAC: 0 |
"The new 'RMA Ryzen' doesn't have this problem, so they found it and resolved it..."
I can't agree with that conclusion. The fact that you get a "good" processor (i.e. one that passes this particular test) back only shows that those things exist. It does not prove that current processors in general are good, nor that anything has changed at all. |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"We will see how it goes."
I have gotten rather poor performance with the Ryzen 1700, somewhat less output per core than an i7-3770, and three errors. But I have now disabled SMT in the BIOS. There were some problems with that early on with Ryzen, and maybe Rosetta does not work well with it on AMD. So I am now running Rosetta on 7 full cores, with one core reserved for the GPU. I will run it for about two or three more days to see. |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,281,662 RAC: 1,807 |
"If you have an invalid pointer in your software, it's your problem, not a CPU problem."
"I'm missing the word 'because' in that sentence."
I just saw a similar problem, but under Windows 10 and on an Intel CPU. 7H2LD3_51C703_fold_and_dock_SAVE_ALL_OUT_538615_1685 https://boinc.bakerlab.org/workunit.php?wuid=864346673 Rosetta Mini 3.78, 64-bit Windows 10, Intel i7-5950X, 32 GB, SSD. Perhaps someone could check if it's the same problem, but under conditions much less likely to have the problem become visible. |
floyd Joined: 26 Jun 14 Posts: 23 Credit: 10,268,639 RAC: 0 |
"I just saw a similar problem but under Windows 10 and on an Intel CPU."
There are many possible causes for an access violation. Your task list doesn't show any other errors, and you're unlikely to ever find out what happened in this single incident. If it doesn't happen repeatedly, just ignore it. |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"I have gotten rather poor performance with the Ryzen 1700, somewhat less output per core than an i7-3770, and three errors. But I have now disabled SMT in the BIOS. There were some problems with that early on with Ryzen, and maybe Rosetta does not work well with it on AMD. So I am now running Rosetta on 7 full cores, with one core reserved for the GPU. I will run it for about two or three more days to see."
After disabling SMT in the BIOS on my Ryzen 1700 machine (Ubuntu 17.10), I have obtained the following results, which are slightly complicated: https://boinc.bakerlab.org/results.php?hostid=3299745
Good News: No more errors, with 31 work units completed successfully. This compares with 3 errors out of 21 work units when SMT was enabled.
Bad News: The output, as measured by the credits, is still quite low even on full Ryzen cores (running 7 cores, with the other one dedicated to a GPU) when you are running only Rosetta (but see below). And the credits are all over the place. Just considering the Rosetta Mini 3.78 work units that ran the full 24 hours, they range from 178 to 815 (except for the last, at 1160 points), and averaged 337 points. That seems to be about the same (per core) as with SMT enabled and running Rosetta on 15 cores, so enabling SMT should at least increase the total output, even with errors. However, in neither case is the Ryzen as good as the i7-3770 (with hyperthreading). On the i7-3770, I get no errors on 3.78, and credits average around 800 points per work unit running with 7 cores. I see no advantage to Ryzen thus far as compared to Ivy Bridge if you run only Rosetta. But the Ryzen 1700 does much better on WCG (running mainly MCM and MIP, with a few of the others). There I get no errors, and twice the output of the i7-3770. So there is something wrong with how the Rosetta AMD app runs on Ryzen. I hope they can fix it, as I will probably be converting most of my machines to AMD eventually.
And, in another twist, the last of the Rosettas did quite well at 1160 points. That was because as I was finishing the Rosettas, I allowed the WCG work units to run. Therefore, when most of the cores were running WCG, the last Rosetta got very good points (though the very last of the 3.78 got stuck and I had to abort it). Moral: Until they fix Rosetta to run properly on Ryzen, it would be best to mix Rosetta with something else on the majority of the cores (WCG works). You will probably need to experiment to find out what works best though.
=====================================================================================================
Work units that ran the full 24 hours (3.78 only) run with SMT disabled (running on 7 full cores):
Returned 9 Dec: 1160.19 815.46 187.98 186.55
Returned 8 Dec: 815.21 178.20 184.03 182.54 747.96 796.89
Returned 7 Dec: 184.87 187.17 186.49 182.87 183.08 183.50 181.75
Ave: 337 points (excluding the last work unit at 1160 points).
NOTE: very little difference in credits per core with SMT enabled (but twice the number of cores).
Addendum: I don't know how 4.06 Rosetta runs on Ryzen, except that the points are lower as compared to 3.78 Rosetta mini. But how it runs on an Intel chip is another matter. |
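The 337-point average quoted in this post can be reproduced from the listed credits. The sketch below recomputes it and also splits the values into a low and a high group; the 500-point split is an arbitrary threshold chosen for illustration, not something from the post.

```python
from statistics import mean

# Credits for the 24-hour work units, as listed in the post.
credits_by_day = {
    "9 Dec": [1160.19, 815.46, 187.98, 186.55],
    "8 Dec": [815.21, 178.20, 184.03, 182.54, 747.96, 796.89],
    "7 Dec": [184.87, 187.17, 186.49, 182.87, 183.08, 183.50, 181.75],
}

all_credits = [c for day in credits_by_day.values() for c in day]

# The post excludes the 1160.19 work unit, which ran alongside WCG tasks.
without_outlier = [c for c in all_credits if c != 1160.19]
average = mean(without_outlier)
print(round(average))  # 337

# Arbitrary 500-point threshold (an assumption) to show the two clusters.
low = [c for c in without_outlier if c < 500]
high = [c for c in without_outlier if c >= 500]
print(len(low), "low-credit units,", len(high), "high-credit units")
```

Seen this way, the average hides that the credits fall into two groups, one near 180 to 190 points and one near 750 to 815, rather than being spread evenly.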
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,623,704 RAC: 9,591 |
Error after 5 hours.... 958310977 -529697949 (0xE06D7363) Unknown error code |
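The negative number and the hexadecimal value in this post are the same 32-bit exit code written two ways. 0xE06D7363 is the exception code the Microsoft Visual C++ runtime uses for a C++ exception that was thrown and not handled, so BOINC's "Unknown error code" here most likely means an uncaught C++ exception on Windows. That identifies the kind of failure, not its cause. A small sketch of the conversion (the helper name is made up):

```python
def as_unsigned32(code):
    """Reinterpret a signed 32-bit exit code as an unsigned value."""
    return code & 0xFFFFFFFF


exit_code = -529697949  # signed value from the post
unsigned = as_unsigned32(exit_code)
print(hex(unsigned))  # 0xe06d7363

# The low three bytes of this particular code are ASCII letters.
tag = (unsigned & 0xFFFFFF).to_bytes(3, "big").decode("ascii")
print(tag)  # msc
```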
planetclown Joined: 27 Jan 12 Posts: 5 Credit: 12,973,394 RAC: 6,758 |
"Ryzen is crappy? Are you a troll?"
"Yes. No. You don't seem to own a Ryzen. I do."
Just want to reply with an updated status on my segfault issues with the Ryzen 7 1700. I was able to reproduce the segmentation faults using the “kill ryzen” test. I also got a replacement Ryzen through AMD’s RMA process. It took about a week from when I mailed it back to when I received the replacement. My original CPU had a manufacture date in the 21st week of the year, the replacement in the 39th week (it’s believed that chips produced in the 25th week or earlier may have issues). I have now completed 97 Rosetta Mini v3.78 tasks on Linux without a single error. It appears that RMAing the Ryzen was the solution. Thank you floyd for providing information on the issues that people have been having with the Ryzen chips! Results from my desktop with the latest Ryzen: https://boinc.bakerlab.org/results.php?hostid=3297625&offset=0&show_names=0&state=0&appid=4 |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,623,704 RAC: 9,591 |
963219454 ERROR: get_jump_that_builds_residue: not build by a jump! |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,623,704 RAC: 9,591 |
"Sorry to say that but your crappy Ryzen is the problem."
Oh, boys, it's not a bug, it's a feature. :-P Intel security patch |
©2024 University of Washington
https://www.bakerlab.org