Fast GPU, not enough CPU power to keep up?

Frontiers · Post by **Frontiers** » Mon Aug 14, 2023 3:34 pm

Peter_Hucker wrote (Sat Apr 22, 2023 8:41 pm): It would be better if Folding stored the data on the GPU, but perhaps the computation makes this impossible.

GPU VRAM is usually much hotter than system RAM, because it sits close to the massive, hot GPU chip and the hot VRM MOSFETs, and because it runs at higher clocks.
Since VRAM has no ECC on roughly 995 of every 1,000 participating cards, it's much safer to keep more of the data in system RAM than in the hotter GPU VRAM. You can also get ECC system RAM by building with an ECC-capable CPU and a motherboard that supports ECC, which is nowhere near as expensive as GPUs with ECC VRAM.
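To make the ECC point concrete, here is a toy Hamming(7,4) code in Python that corrects any single flipped bit in a 4-bit value. It is only an illustration: the function names are hypothetical, and real ECC DRAM typically applies a SECDED code to 64-bit words (72 bits stored), not this 7-bit code.

```python
def hamming74_encode(nibble: int) -> list[int]:
    """Encode 4 data bits into 7 bits (positions 1..7; parity at 1, 2 and 4)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8  # index 0 unused so list indices match bit positions
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]  # parity over positions with bit 0 set
    code[2] = code[3] ^ code[6] ^ code[7]  # parity over positions with bit 1 set
    code[4] = code[5] ^ code[6] ^ code[7]  # parity over positions with bit 2 set
    return code[1:]


def hamming74_decode(bits: list[int]) -> int:
    """Correct up to one flipped bit, then return the 4 data bits."""
    code = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of the positions of all set bits
    if syndrome:
        code[syndrome] ^= 1  # a nonzero syndrome names the flipped position
    d = [code[3], code[5], code[6], code[7]]
    return sum(bit << i for i, bit in enumerate(d))


stored = hamming74_encode(0b1011)
stored[4] ^= 1  # simulate one bit flipping while the value sits in memory
print(hamming74_decode(stored) == 0b1011)  # True: the flip was corrected
```

Without the parity bits, the same flip would silently change the stored value, which is the risk being described for non-ECC VRAM.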

Peter_Hucker wrote (Sat Apr 22, 2023 8:41 pm): A lot of BOINC projects will load a few GB of data onto the GPU at the start, and they can refer to that much faster than having to get it from main RAM.

Einstein@Home does that, but it computes each WU on at least 2 different machines until it obtains identical results; if the 2 results differ, it re-issues the same WU to another machine until it gets 2 completely identical results. Because the whole WU sits in GPU VRAM, it needs very little PCIe bandwidth. And I saw at least 1 in 100 Einstein WUs come back broken and get re-issued, on a GPU that is pretty weak by today's standards, with a mild chip-clock overclock, VRAM at completely stock clock, and only ~1500 MB of VRAM in use.
GPUGRID, which is built on the same OpenMM as FAH but supports only CUDA, computes very differently from Einstein@Home. Their ATMbeta tasks use a small amount of VRAM, ~10 times less than Einstein, but nearly the same PCIe utilization as FAH, in order to store the results of the computation in system RAM, which is much cooler and less prone to errors.
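The validation scheme described above (issue the WU, wait for two identical results, re-issue on mismatch) can be sketched in a few lines of Python. This follows the post's description only; `run_on_host` and `fake_host` are hypothetical stand-ins, and real BOINC validation is more involved than an exact string comparison.

```python
from typing import Callable, Optional, Tuple


def validate_workunit(
    wu_id: int,
    run_on_host: Callable[[int, int], str],
    max_issues: int = 10,
) -> Tuple[Optional[str], int]:
    """Issue one WU to successive hosts until two results match exactly.

    Returns (accepted_result, hosts_used); accepted_result is None if no two
    results agreed within max_issues attempts.
    """
    seen: list[str] = []
    for host in range(max_issues):
        result = run_on_host(wu_id, host)
        if result in seen:
            return result, host + 1  # quorum of two identical results
        seen.append(result)  # no match yet: re-issue to another host
    return None, max_issues


def fake_host(wu_id: int, host: int) -> str:
    # Hypothetical: host 1 has flaky memory and returns a corrupted result.
    return "corrupt" if host == 1 else f"result-{wu_id}"


print(validate_workunit(42, fake_host))  # ('result-42', 3)
```

One bad host costs one extra issue of the WU, which is why occasional memory errors show up as re-issued WUs rather than as wrong science.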

Peter_Hucker · Post by **Peter_Hucker** » Mon Aug 14, 2023 3:53 pm

I was not aware GPU RAM had errors. Do you have a link to this information? I suppose it's possible, since all you'd get in a game is a miscoloured pixel, but I would have thought if the RAM was hot enough to produce errors, it would be getting really stressed and soon fail.

They ought to design the cards with the VRAM closer to the heatsink, mine are a whole mm away!

muziqaz · Post by **muziqaz** » Mon Aug 14, 2023 4:49 pm

VRAM rarely gets errors. If it did, all games would be artefacting left and right for everyone. VRAM is designed for those temperatures. There are other reasons why it is not used for RAM-like applications.

Peter_Hucker · Post by **Peter_Hucker** » Mon Aug 14, 2023 5:20 pm

Everything other than Folding seems to use mostly VRAM and hardly any system RAM. I don't know whether that's down to the program or a big data set in VRAM. I assume it's the data, as the CPU is doing the processing and making calls to the GPU. Presumably Folding needs the CPU to access the data too. I guess if it's static data it could be placed in both to avoid the bandwidth problem. Maybe I'm the only one with risers? I thought they were common. Since they were originally made for bitcoin mining, most run at low speeds: PCI-E v2, 1 lane.

Post by **Joe_H** » Mon Aug 14, 2023 6:33 pm

The data is not "static". The WU comes in with data describing the initial positions and velocity vectors for each atom in the simulation. The CPU is used to prepare a batch of that data to be processed on the GPU, and works through each time step over the entire WU. At the end of each time step the positions and velocity vectors have changed. Except for very small WUs and very large GPUs, the entire set of data will not fit in the shader units on the GPU at one time, so a lot of data passes back and forth between the CPU and the GPU.
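A minimal Python sketch of the per-step loop described above, under loud assumptions: a 1D toy in which atoms do not interact (each just sits in a harmonic well), `device_step` is a hypothetical stand-in for the GPU kernel, and 8 bytes per value is assumed. It is not how OpenMM or the folding core is written; it only shows the shape of the traffic: the CPU batches the state, ships it out, gets changed state back, and repeats every time step.

```python
from dataclasses import dataclass

# Assumption for this toy: one float64 (8 bytes) each for position and velocity.
BYTES_PER_ATOM = 2 * 8


@dataclass
class Atom:
    x: float  # position (1D toy model)
    v: float  # velocity


def device_step(batch: list[Atom], dt: float, k: float) -> list[Atom]:
    """Hypothetical stand-in for the GPU kernel: one symplectic-Euler step
    in a harmonic well (force = -k * x, unit mass) for each atom in the batch."""
    out = []
    for a in batch:
        v = a.v - k * a.x * dt  # update velocity from the force
        out.append(Atom(a.x + v * dt, v))  # then position from the new velocity
    return out


def simulate(atoms: list[Atom], steps: int, batch_size: int,
             dt: float = 0.01, k: float = 1.0) -> tuple[list[Atom], int]:
    """Run `steps` time steps, shipping the state to the 'device' in batches.
    Returns the final atoms and the total bytes moved in both directions."""
    transferred = 0
    for _ in range(steps):
        updated: list[Atom] = []
        for start in range(0, len(atoms), batch_size):
            batch = atoms[start:start + batch_size]  # CPU prepares a batch
            transferred += len(batch) * BYTES_PER_ATOM  # host -> device
            result = device_step(batch, dt, k)
            transferred += len(result) * BYTES_PER_ATOM  # device -> host
            updated.extend(result)
        atoms = updated  # every atom changed this step: the data is not static
    return atoms, transferred


atoms = [Atom(float(i), 0.0) for i in range(10)]
final, nbytes = simulate(atoms, steps=3, batch_size=4)
print(nbytes)  # 960
```

Because nothing stays resident between steps in this sketch, the traffic grows with both the atom count and the step count.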

Things like cryptocurrency and hashing have a very small amount of data, just a large amount of processing. So they do not need the PCIe bandwidth. Just going to 2 or 4 lane risers can make a bit of difference for F@h GPU processing.
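For a sense of scale on riser lanes, here is the theoretical per-direction PCIe bandwidth computed from each generation's signalling rate and line encoding (2.5 and 5 GT/s with 8b/10b for 1.0 and 2.0; 8 and 16 GT/s with 128b/130b for 3.0 and 4.0). The function name is hypothetical, and real throughput is lower because of packet and protocol overhead.

```python
# Per-lane signalling rate (GT/s) and line-code efficiency by PCIe generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),      # 8b/10b encoding
    2: (5.0, 8 / 10),      # 8b/10b encoding
    3: (8.0, 128 / 130),   # 128b/130b encoding
    4: (16.0, 128 / 130),  # 128b/130b encoding
}


def pcie_bandwidth_mb_s(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth in MB/s (10**6 bytes/s).
    Ignores packet/protocol overhead, so real throughput is lower."""
    gt_per_s, efficiency = PCIE_GENERATIONS[gen]
    payload_bits_per_s = gt_per_s * 1e9 * efficiency * lanes
    return payload_bits_per_s / 8 / 1e6


for gen, lanes in [(2, 1), (2, 4), (3, 1), (3, 4), (3, 16)]:
    print(f"PCIe {gen}.0 x{lanes}: {pcie_bandwidth_mb_s(gen, lanes):,.0f} MB/s")
```

By this arithmetic a PCIe 2.0 x1 riser offers about 500 MB/s per direction, and going to x2 or x4 doubles or quadruples that.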

Peter_Hucker · Post by **Peter_Hucker** » Tue Aug 15, 2023 4:28 am

Multi-lane risers are very hard to come by or very expensive. The ribbon ones are of course 16 lanes, but they don't reach as far. I use 1 metre long USB ones. One of my machines has 6 GPUs; it's got 12 PCI-E sockets, so I can have 12 cards without sharing a socket, or any number of cards if I do share (you can daisy-chain the 4-way splitters). It's OK with slower cards like the 280X (4000 GFLOPS): I can run three of those off one lane of PCI-E v2 on Folding (the USB risers limit it to PCI-E v2). But if I had a card faster than 12000 GFLOPS, even on its own it would get throttled. I also have a riser that shares 4 (or is it 8?) lanes between 6 (or 8) cards, which I'm not using (a gift from a generous Einstein user). I'm not sure how it shares the lanes, or whether fewer cards on it would give each one more lanes.
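Peter's estimate can be written as a quick budget check. The 12,000 GFLOPS-per-lane figure is only his rule of thumb from running three 280Xs on one PCIe 2.0 lane, not a measured constant; the linear scaling with lane count is an extra assumption, and the function names are hypothetical.

```python
# Rule of thumb taken from the post above, NOT a measured constant: one
# PCIe 2.0 x1 lane keeps up with roughly 12,000 GFLOPS of folding GPUs
# (three 280X cards at ~4,000 GFLOPS each). Linear scaling with lane count
# is a further assumption; real demand depends on the WU, not just GFLOPS.
GFLOPS_PER_GEN2_LANE = 12_000


def lane_utilization(card_gflops: list[float], lanes: int = 1) -> float:
    """Fraction of the estimated lane budget that the cards would use."""
    return sum(card_gflops) / (GFLOPS_PER_GEN2_LANE * lanes)


def is_throttled(card_gflops: list[float], lanes: int = 1) -> bool:
    """True if the cards are estimated to need more than the lanes supply."""
    return lane_utilization(card_gflops, lanes) > 1.0


print(lane_utilization([4000] * 3))     # 1.0: three 280X cards, right at the limit
print(is_throttled([13_000]))           # True: one faster card on its own
print(is_throttled([13_000], lanes=2))  # False under the linear-scaling assumption
```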

So what you're saying is the CPU creates the new velocity vectors? I guess that part can't be computed on the GPU, so the new data has to get transferred regularly.

Things like Einstein, I'm guessing the data is from the telescope and searched through, but doesn't change?

Post by **Joe_H** » Tue Aug 15, 2023 5:36 am

Peter_Hucker wrote (Tue Aug 15, 2023 4:28 am): So what you're saying is the CPU creates the new velocity vectors? I guess that part can't be computed on the GPU, so the new data has to get transferred regularly.

The new locations and velocity vectors for the atoms are created by the processing on the GPU; however, the whole WU's data will not fit at once. So chunks of the WU data are sent, processed and returned. Then the results from the chunks are reconciled to account for the attractions between atoms in adjacent chunks. Most of the time the CPU is just putting together those batches of data to send to the GPU for processing. However, at every checkpoint the folding core also runs a sanity-check calculation on the CPU, using the WU data coming back from the GPU.
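The chunking, reconciliation and CPU sanity check can be illustrated with a 1D Python toy. This is not how OpenMM or the folding core actually implements it: the -1/r attraction, the cutoff, the chunk width and using an energy comparison as the "sanity check" are all assumptions made for illustration.

```python
import math
from collections import defaultdict

CUTOFF = 2.5  # hypothetical interaction cutoff (arbitrary units)


def pair_energy(r: float) -> float:
    """Toy attraction between two atoms: -1/r inside the cutoff, 0 beyond it."""
    return -1.0 / r if r < CUTOFF else 0.0


def chunked_energy(positions: list[float]) -> float:
    """'GPU-style' path: bin atoms into chunks one cutoff wide, sum the pairs
    inside each chunk, then reconcile pairs that straddle adjacent chunks.
    Chunks two or more apart are always farther than CUTOFF, so they are skipped."""
    chunks: dict[int, list[float]] = defaultdict(list)
    for x in positions:
        chunks[math.floor(x / CUTOFF)].append(x)
    total = 0.0
    for idx, members in chunks.items():
        for i in range(len(members)):  # pairs within this chunk
            for j in range(i + 1, len(members)):
                total += pair_energy(abs(members[i] - members[j]))
        for a in members:  # reconciliation with the next chunk up
            for b in chunks.get(idx + 1, []):
                total += pair_energy(abs(a - b))
    return total


def reference_energy(positions: list[float]) -> float:
    """'CPU' path: slow, direct sum over every pair, used only as a cross-check."""
    total = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            total += pair_energy(abs(positions[i] - positions[j]))
    return total


def checkpoint_sanity_check(positions: list[float], rel_tol: float = 1e-9) -> bool:
    """Accept the chunked result only if it agrees with the reference.
    A tolerance is needed because the two paths add terms in a different order."""
    return math.isclose(chunked_energy(positions), reference_energy(positions),
                        rel_tol=rel_tol)


positions = [0.1, 1.0, 2.4, 2.6, 4.9, 5.2, 9.0]  # distinct, so r is never 0
print(checkpoint_sanity_check(positions))  # True
```

Without the reconciliation loop, pairs such as 2.4 and 2.6 (which fall in different chunks) would be missed and the sanity check would fail.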

So after every step the WU data has been changed by the GPU processing; it is dynamic instead of static. At the end, the WU data is packaged up and sent back to the servers, where it becomes the starting point for the next Gen of that project's trajectory, as defined by the Run and Clone numbers.
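The Project/Run/Clone/Gen bookkeeping can be sketched as a small Python dataclass. The class and method names and the project number are made up for illustration; the only idea taken from the post is that Project, Run and Clone pick out one trajectory, and each returned WU seeds the next Gen on it.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class WorkUnitId:
    """Hypothetical identifier for one F@h work unit."""
    project: int
    run: int
    clone: int
    gen: int

    @property
    def trajectory(self) -> tuple[int, int, int]:
        """A trajectory is fixed by project, run and clone; gen walks along it."""
        return (self.project, self.run, self.clone)

    def next_gen(self) -> "WorkUnitId":
        """The WU built from this one's returned results: same trajectory, gen + 1."""
        return replace(self, gen=self.gen + 1)


wu = WorkUnitId(project=12345, run=7, clone=3, gen=41)  # made-up numbers
nxt = wu.next_gen()
print(nxt)  # WorkUnitId(project=12345, run=7, clone=3, gen=42)
```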

Peter_Hucker wrote (Tue Aug 15, 2023 4:28 am): Things like Einstein, I'm guessing the data is from the telescope and searched through, but doesn't change?

That would be my guess as well. The data may cover a time period, but the processing is looking for certain patterns in it. It is not tracking how the data changes as it runs through a series of calculations.
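To contrast with the folding loop, here is a generic Python sketch of a pattern search over static data. It is not Einstein@Home's actual search: it assumes a simple sliding dot product (cross-correlation) against a small bank of made-up templates. The point is that the data is a read-only tuple that every template scans, so it could be loaded once and left in place.

```python
def best_match(data: tuple[float, ...], template: tuple[float, ...]) -> tuple[int, float]:
    """Slide the template over the data; return (offset, score) of the highest
    correlation. The data is only read, never modified."""
    best_offset, best_score = -1, float("-inf")
    for offset in range(len(data) - len(template) + 1):
        score = sum(d * t for d, t in zip(data[offset:], template))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


def search_bank(data, templates):
    """Run every template over the same, unchanging data; return the best
    (template_index, offset, score)."""
    results = [(i, *best_match(data, t)) for i, t in enumerate(templates)]
    return max(results, key=lambda r: r[2])


data = (0.0,) * 5 + (1.0, -1.0, 1.0) + (0.0,) * 4  # pattern hidden at offset 5
templates = [(1.0, 1.0, 1.0), (1.0, -1.0, 1.0)]
print(search_bank(data, templates))  # (1, 5, 3.0)
```

The raw dot product only compares fairly when the templates have similar magnitudes, as they do here.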

Peter_Hucker · Post by **Peter_Hucker** » Sun Aug 27, 2023 8:33 pm

Joe_H wrote (Tue Aug 15, 2023 5:36 am): however the whole WU's data will not fit at once

How big are we talking? Some GPUs have a lot of VRAM.

Post by **Joe_H** » Sun Aug 27, 2023 11:22 pm

Peter_Hucker wrote (Sun Aug 27, 2023 8:33 pm):
Joe_H wrote (Tue Aug 15, 2023 5:36 am): however the whole WU's data will not fit at once
How big are we talking? Some GPUs have a lot of VRAM.

F@h does not use the VRAM, it uses the shaders. For example, an Nvidia RTX 4090 has 16,384 shaders. I don't know all of the details; some VRAM may be used to cache data, but not a lot. There have been tests between otherwise identical GPUs with differing amounts of VRAM, and processing speed was essentially the same.

People will object that they have heard differently. Those objections often come from examples of cards like the GTX 1060, which came in 3, 5, and 6 GB configurations, and the 6 GB version was definitely faster. But there is a reason for that: the versions are not identical. The 3 GB card has 1152 shaders while the 6 GB card has 1280. I do not recall reports about the 5 GB version; it also has 1280 shaders, but less L2 cache and lower memory bandwidth.

Before the card makers started doing things like this, if someone was just buying a card for folding the lower VRAM card would often be recommended. It would process as fast, and use less power and generate less heat from the VRAM.
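As a rough check on the GTX 1060 example, assuming both versions run at the same clocks and folding throughput scales linearly with shader count (both simplifications), the 6 GB card's extra shaders alone would be worth about 11%:

```latex
\frac{\text{shaders}_{6\,\mathrm{GB}}}{\text{shaders}_{3\,\mathrm{GB}}}
  = \frac{1280}{1152} = \frac{10}{9} \approx 1.11
```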

Peter_Hucker · Post by **Peter_Hucker** » Mon Aug 28, 2023 2:05 am

Thanks for the explanation.

P.S. how do you make a signature like that?

Post by **Joe_H** » Mon Aug 28, 2023 3:35 am

Peter_Hucker wrote (Mon Aug 28, 2023 2:05 am): Thanks for the explanation.

P.S. how do you make a signature like that?

I use the user-info-only variation from this page: https://folding.extremeoverclocking.com/?nav=IMAGES. You can edit your signature through the User Control Panel. I entered the text separately from the EOC stats image.

Peter_Hucker · Post by **Peter_Hucker** » Mon Aug 28, 2023 4:46 am

Looks complicated. And I just realised I have my points split equally between my old account and my new one (from when I added Gridcoin), so I won't bother.

P.S. When I tried to come in here from the email link, I was told I was banned for 3 days; I turned off the VPN and was magically unbanned. Weird.

Post by **Joe_H** » Mon Aug 28, 2023 5:30 am

There are a lot of VPNs that have lax controls over spam. The address it assigned you may have matched one on the temporary ban list for the forum.

Peter_Hucker · Post by **Peter_Hucker** » Mon Aug 28, 2023 5:34 am

I don't think VPNs have controls over anything, that's the whole point.

Post by **Joe_H** » Mon Aug 28, 2023 5:43 am

Actually they do; some are just very lax. There are VPN services that I never or rarely see spam from, and others where it shows up on a regular basis. Some can be traced to Russian and similar owners. Much of the spam through those is for sites in Eastern Europe or gambling sites around the world.

Folding Forum
