Message boards : Number crunching : Discussion of the merits and challenges of using GPUs
Author | Message |
---|---|
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1990 Credit: 9,492,874 RAC: 12,663 |
Intel oneAPI has landed with full SYCL support. ROCm 4 is on the way (with support for Xilinx FPGAs and the consumer AMD 68xx GPU series). An interesting oneAPI example. The AMD Instinct MI100 delivers up to 11.5 TFLOPS of double precision. |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1990 Credit: 9,492,874 RAC: 12,663 |
I know, I know, we will never see a GPU app on R@H, but other projects (WCG OpenPandemics)... "On average we anticipate a ~500x average speed-up in processing current packages on the mix of GPUs and CPUs from volunteers, which include from Raspberry PIs to laptops to high-end GPUs. On more powerful GPUs, we see up to 4000x speedups overall compared to a single CPU core" |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,260,897 RAC: 4,354 |
"I know, I know, we will never see a GPU app on R@H, but other projects (WCG OpenPandemics)..." They might consider separating R@H workunits into two classes: 1. One starting point, with many steps to improve on that starting point. 2. A list of starting points, with one step each to do something on them, but the same step for all of them. The first class is generally not a good candidate for GPU speedup, but the second is much more likely to be one. |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,097,357 RAC: 5,678 |
"I know, I know, we will never see a GPU app on R@H, but other projects (WCG OpenPandemics)..." Another option might be to consider only using GPUs with 8 GB or more of RAM; that way more of the task will fit into onboard RAM, much like a CPU task does now. Of course, a GPU uses memory differently, so maybe the limit would be 12 GB or more, for example. The idea is that a GPU should not be discarded out of hand and many options should be considered. Personally I DO like the idea of splitting a GPU task into smaller parts, maybe an A part and a B part of the same task that can each be crunched by a different PC, or even by the same PC to keep things normalized, and then put together into one full task. I would even go with a multipart task beyond 2 parts if that worked and provided provable and reliable science. To me the question isn't "why" but "why not", and WHY are we wasting the opportunity to advance this project into the petaflop range, as mentioned in this message: "Good afternoon. I have seen that the project is increasing the computational power in recent times, being currently with a power of 835 teraflop. We are going to increase its power for petaflops. Encourage friends, family and others to join the project. The more people who help, the faster the searches. Make the project reach more people, talk about it." written by Tiago Martins Barreiros. For info, the message number is: Message 100765 - Posted: 18 Mar 2021, 17:06:10 UTC |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1990 Credit: 9,492,874 RAC: 12,663 |
"Another option might be to consider only using GPUs with 8 GB or more of RAM; that way more of the task will fit into onboard RAM, much like a CPU task does now. Of course, a GPU uses memory differently, so maybe the limit would be 12 GB or more, for example. The idea is that a GPU should not be discarded out of hand and many options should be considered." The last time they made a "public/known" test on GPU was years ago (if I'm not wrong, over 7 years ago), and they had problems with GPU RAM. But at that time GPUs had, at most, 4 GB of RAM on board (top-level GPUs, like the Radeon R9 290); the others had 1 or 2 GB. Now top-level GPUs have 12/16 GB (and a different, much faster kind of memory). Other considerations concern the software for GPUs: languages (like CUDA, OpenCL, ROCm or oneAPI), frameworks and tools have changed A LOT during these years. So hardware and software problems are present for sure, but I think that the first and most important problem is the will to do it... see, for example, the idea of having an optimized CPU app (SSEx, AVX, etc.). |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1667 Credit: 17,445,762 RAC: 24,739 |
"Personally I DO like the idea of splitting a GPU task into smaller parts, maybe an A part and a B part of the same task that can each be crunched by a different PC, or even by the same PC to keep things normalized, and then put together into one full task. I would even go with a multipart task beyond 2 parts if that worked and provided provable and reliable science." A CPU Task and a GPU Task should be the same to keep things simple for the project: a Work Unit can be processed on a GPU using the GPU application, and it can be processed by the CPU using the CPU application. As for splitting up a Task, that is how Seti was able to use GPUs to process the same data as a CPU. The GPU application broke the Work Unit into multiple blocks, processed each block as necessary, and the results of each block were then recombined to give the final result, producing the same result as it would have if it had been processed on the CPU. However, instead of taking an hour or more, a high-end GPU of the time could do it in 25 seconds or so. Grant Darwin NT |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1990 Credit: 9,492,874 RAC: 12,663 |
"As for splitting up a Task, that is how Seti was able to use GPUs to process the same data as a CPU." The OpenPandemics team does it: the new app runs the same simulations, but a single CPU WU contains from 1 to 5 simulations (and takes 2 hours on a modern CPU), while a GPU WU contains from 30 to 70 simulations (and runs in 15 minutes on an old GPU, a GTX 750). |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,260,897 RAC: 4,354 |
"I know, I know, we will never see a GPU app on R@H, but other projects (WCG OpenPandemics)..." Looks like you're overestimating the speedup based on some incorrect assumptions about the GPUs. 1. GPUs have a different clock speed from the CPU cores: their clock speed is typically about a quarter of the CPU's clock speed, at least for Nvidia and AMD GPUs. Therefore, a GPU core can do about a quarter as much as a CPU core can do in the same amount of time. I haven't seen similar information about Intel GPUs. 2. CPU cores get their instructions independently; each CPU core has a register containing the memory address it gets its next instruction from, and it goes on to the next memory address unless the instruction makes it load a new address into this register. GPU cores (at least Nvidia and AMD) come in groups with an instruction unit that sends the same instruction to every member of the group, plus a mask to determine which cores in the group execute that instruction. The other cores in the group do nothing while this happens. If there is an if ... then ... else ..., the then branch and the else branch cannot execute simultaneously for cores within the same group. For Nvidia GPUs, each group is called a warp and contains 32 threads. This means that work doing the same operations on multiple sets of data is more compatible with the hardware, regardless of which computer language is used. As a result, the maximum possible speed of a GPU application divided by the CPU speed is about the number of GPU cores divided by 4. Achieving this speed is very rare; more typical values are 10 to 20 times the speed of the CPU application. It's even possible for the GPU application to run at only a quarter of the speed of the CPU application, but BOINC projects seldom release a GPU application that doesn't run at least 10 times as fast as the CPU application. |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,097,357 RAC: 5,678 |
"I know, I know, we will never see a GPU app on R@H, but other projects (WCG OpenPandemics)..." All that is true, and it means it's definitely worth a try on the newer GPUs with their faster memory and much larger amounts of it. The problem could be the lack of access to new ones and getting them into the hands of the people who can then try to make them work here at Rosetta. Since Rosetta has tried to make GPUs work in the past, tweaking the existing programming to accommodate the new GPUs shouldn't be that hard. Personally I think a call to Nvidia and/or AMD, finding the right person in Marketing, should get one on its way, with the understanding that it goes back when they are done with it and, if it works, the company gets the necessary public accolades. |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1990 Credit: 9,492,874 RAC: 12,663 |
"I know, I know, we will never see a GPU app on R@H, but other projects (WCG OpenPandemics)..." I'm not overestimating. It's a message from an OpenPandemics admin. Maybe they know their code, benchmarks and results better than you do. |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1990 Credit: 9,492,874 RAC: 12,663 |
"Looks like you're overestimating the speedup based on some incorrect assumptions about the GPUs." From my previous post (it's DATA from the OpenPandemics admin): a CPU WU contains 1 to 5 simulations (and takes 2 hours on a modern CPU), while a GPU WU contains 30 to 70. A volunteer with an RTX 2080 completes 2 GPU WUs in less than 2 minutes. So, assuming the maximum number of simulations per WU, a CPU core completes 5 "steps" in 2 hours, while a GPU completes 140 "steps" in 2 minutes. Do the math. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,260,897 RAC: 4,354 |
World Community Grid now has a GPU application for their OpenPandemics application. OpenPandemics - COVID-19 Now Running on Machines with Graphics Processing Units https://www.worldcommunitygrid.org/about_us/viewNewsArticle.do?articleId=693 If you already have BOINC installed, select the OpenPandemics project under World Community Grid and enable GPU use. World Community Grid https://join.worldcommunitygrid.org?recruiterId=480838 They are currently running a GPU stress test, so expect internet use to be especially high. Expect a few CPU tasks at first, soon switching to GPU tasks only if you have enabled at least one non-WCG BOINC project offering only CPU tasks. |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1990 Credit: 9,492,874 RAC: 12,663 |
IWOCL 2021 conference about OpenCl/Sycl/OneAPI |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1990 Credit: 9,492,874 RAC: 12,663 |
Interesting paper about gpu and QM/MM simulations P.S. Amber and Rosetta are compatible through AMBRose |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2112 Credit: 41,044,764 RAC: 21,216 |
"World Community Grid now has a GPU application for their OpenPandemics application." I had some come down. Killing my PCs. I'm only allowing them to run on PCs I'm not using, as it makes everything drop to an unbearable crawl. |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1990 Credit: 9,492,874 RAC: 12,663 |
"I had some come down. Killing my PCs." Uh, that's strange. How many concurrent GPU WUs are you crunching? What is your GPU? Entry level? |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,097,357 RAC: 5,678 |
"I had some come down. Killing my PCs." I agree that it's strange. I am running a laptop with an Nvidia 1660 Ti GPU and running the WCG GPU tasks, one at a time, and it's working just fine; I'm typing this on it. Now, I do leave 3 HT CPU cores free for internet browsing, typing in forums, etc. |
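
The "do the math" comparison from boboviz's post above can be worked through as a short sketch. It uses only the figures quoted in the thread (5 simulations per CPU work unit in 2 hours, 2 GPU work units of 70 simulations each in 2 minutes on an RTX 2080). It assumes, as the post does, that a simulation costs the same in a CPU and a GPU work unit, and it treats "less than 2 minutes" as exactly 2 minutes. The function name `speedup` is only illustrative.

```python
def speedup(cpu_sims, cpu_minutes, gpu_sims, gpu_minutes):
    """Ratio of GPU throughput to single-CPU-core throughput.

    Throughput is measured in simulations per minute. The ratio is
    written as one fraction so that integer inputs give an exact result.
    """
    return (gpu_sims * cpu_minutes) / (cpu_sims * gpu_minutes)


# Figures from the thread (assumed comparable, see the note above):
# one CPU core: 5 simulations in 2 hours (120 minutes)
# one RTX 2080: 2 work units x 70 simulations in 2 minutes
ratio = speedup(cpu_sims=5, cpu_minutes=120, gpu_sims=2 * 70, gpu_minutes=2)
print(f"{ratio:.0f}x")  # prints "1680x"
```

Under these assumptions the GPU comes out at roughly 1680 times the throughput of one CPU core, which sits between the ~500x average and the up to 4000x peak quoted from the OpenPandemics team. The comparison is against a single core, not a whole multi-core CPU.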
©2024 University of Washington
https://www.bakerlab.org