Nvidia vs AMD CPU usage

If you think it might be a driver problem, see viewforum.php?f=79

Moderators: Site Moderators, FAHC Science Team

arisu
Posts: 270
Joined: Mon Feb 24, 2025 11:11 pm

Nvidia vs AMD CPU usage

Post by arisu »

I've heard that the CPU thread that feeds an Nvidia GPU is always 100% loaded no matter how much work the GPU is doing, but the load on the CPU thread that feeds an AMD GPU is proportional to the amount of processing being done on the GPU. Is this true? And why is this, on a technical level?

Unless I'm wrong, the purpose of the CPU thread for any GPU project is:
- Transferring data to and from the GPU and doing bookkeeping
- Performing occasional sanity checks (initially and during each checkpoint I think)
- Reconciling computed forces between independently-processed slices/ranks (or is that only a thing for GROMACS?)

But the work required to do any of that is proportional to the amount of work the GPU is doing. So why does the CPU thread that manages folding on an Nvidia GPU always use 100% but not on an AMD GPU? I guess it's got something to do with OpenCL vs CUDA?

Not complaining about high usage or anything, just curious.
Joe_H
Site Admin
Posts: 8092
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Studio M1 Max 32 GB smp6
Mac Hack i7-7700K 48 GB smp4
Location: W. MA

Re: Nvidia vs AMD CPU usage

Post by Joe_H »

From what I understand, the difference is in how Nvidia and AMD wrote their drivers. Nvidia's driver does a spin-wait, looking for instructions to be processed and sent to the GPU. AMD, from the explanations I have seen, implemented this as an interrupt instead: as soon as something is handed off to the driver to process, it wakes up, takes CPU cycles to handle the request, and then goes inactive until the next request. So the Nvidia driver process is always active, but the actual amount of work done by the CPU may be only a fraction of the cycles available.
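To make the spin-wait versus interrupt distinction concrete, here is a minimal Python sketch. It is only an analogy, not the actual driver code: a `threading.Timer` stands in for the GPU finishing a batch of work, and the function names `wait_spinning`, `wait_blocking`, and `run` are made up for illustration.

```python
import threading
import time


def wait_spinning(done: threading.Event) -> int:
    """Busy-wait: keep checking the flag; returns how many checks were made."""
    polls = 0
    while not done.is_set():
        polls += 1
    return polls


def wait_blocking(done: threading.Event) -> bool:
    """Blocking wait: sleep in the OS until another thread sets the flag."""
    return done.wait()


def run(waiter, delay_s: float = 0.05):
    """Simulate a 'GPU' that finishes after delay_s; return (result, cpu_seconds)."""
    done = threading.Event()
    timer = threading.Timer(delay_s, done.set)  # stands in for the GPU
    cpu_start = time.process_time()
    timer.start()
    result = waiter(done)
    cpu_used = time.process_time() - cpu_start
    timer.join()
    return result, cpu_used


if __name__ == "__main__":
    polls, spin_cpu = run(wait_spinning)
    _, block_cpu = run(wait_blocking)
    print(f"spin-wait: {polls} polls, {spin_cpu:.4f} s CPU")
    print(f"blocking wait: {block_cpu:.4f} s CPU")
```

Both waiters return at about the same moment, but the spinning one burns CPU time for the whole delay while the blocking one is asleep for almost all of it. That matches the picture above: a thread that looks 100% busy while doing very little useful work.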

Your understanding of the CPU thread usage is similar to mine, though I am not certain exactly how OpenMM handles the reconciliation between blocks of data from the WU sent to the GPU for processing. There are also options in OpenMM to pass other functions to the CPU that the GPU folding cores for F@h currently do not use. For example, they could pass 64-bit calculations to the CPU and so use GPUs without 64-bit support. But that would mostly be needed only for older or less powerful GPUs, and from past testing it would also slow down processing on the rest of the GPUs.
arisu
Posts: 270
Joined: Mon Feb 24, 2025 11:11 pm

Re: Nvidia vs AMD CPU usage

Post by arisu »

It's shocking that the Nvidia drivers would be polling-driven instead of interrupt-driven. Do they do this on both Windows and Linux?

When HIP is rolled out to FAH (pull request #328 in fah-client-bastet gives me hope), I pray it will increase AMD performance so that it is comparable to Nvidia and so the polling-driven wait loops can be avoided. CUDA source code can be transpiled into HIP source code, so in theory every CUDA project can immediately switch to HIP on AMD platforms.

Is there a high-level overview somewhere about how OpenMM handles slices/ranks and reconciling forces between them?
Joe_H
Site Admin
Posts: 8092
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Studio M1 Max 32 GB smp6
Mac Hack i7-7700K 48 GB smp4
Location: W. MA

Re: Nvidia vs AMD CPU usage

Post by Joe_H »

I don't know the details of the Nvidia driver implementation across OSs, but from the reports I have seen posted here, there appears to be a CPU thread continuously active while folding on both Windows and Linux. A few years ago some people experimented with multi-GPU systems, and it appeared that a single CPU core could be enough to handle the driver overhead for two or more GPUs. But that was with somewhat less powerful GPUs.

Originally OpenCL was used on both Nvidia and AMD GPUs. But Nvidia always gave less than the best support for OpenCL, and eventually the GPU core developers took on the extra programming overhead to support both CUDA and OpenCL. There are some limitations: supporting both the latest GPUs and older ones could require more core versions. At the moment the least-common-denominator CUDA library supports Maxwell through the newest cards, and Kepler cards end up falling back to OpenCL. They are working on adding HIP support for AMD, but there have been some issues. Eventually it may be ready for release, but I have no idea when exactly.

I don't know if there is an overview available, but the site for OpenMM may have something - https://openmm.org.
arisu
Posts: 270
Joined: Mon Feb 24, 2025 11:11 pm

Re: Nvidia vs AMD CPU usage

Post by arisu »

Would there be much utility in having a program that monitors GPU usage and uses cgroups to limit the CPU usage of the corresponding thread (actually limiting usage, not priority)? It would slowly lower the max CPU use of the thread until the GPU usage starts to decrease, or raise it until GPU usage stops rising. Cgroups CPU limiting works by refusing to give the thread a timeslice if the thread has exceeded its CPU usage limit over a short period (like a millisecond), even if it means scheduling SCHED_IDLE tasks or the idle kthread instead.

In theory, that would reduce the waste from polling.
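The feedback loop described above could be sketched roughly as follows. This is a toy under stated assumptions, not a working tool: `measure_gpu` is a hypothetical callback that would apply a quota and read back GPU utilization, `simulated_gpu` is an invented stand-in for real hardware, and nothing here touches the cgroup filesystem. The only real interface referenced is the cgroup v2 `cpu.max` file, whose value has the form `<quota_us> <period_us>` (the default period is 100000 µs).

```python
def cpu_max_line(quota_pct: int, period_us: int = 100_000) -> str:
    """Format a cgroup v2 cpu.max value: '<quota_us> <period_us>'."""
    return f"{period_us * quota_pct // 100} {period_us}"


def tune_quota(measure_gpu, start_pct=100, step_pct=5, floor_pct=5, tolerance=1.0):
    """Lower the CPU quota step by step while GPU utilization holds.

    measure_gpu(quota_pct) is a hypothetical callback: it would apply the
    quota, wait for the reading to settle, and return GPU utilization in %.
    """
    baseline = measure_gpu(start_pct)
    quota = start_pct
    while quota - step_pct >= floor_pct:
        candidate = quota - step_pct
        if measure_gpu(candidate) < baseline - tolerance:
            break  # GPU started to starve; keep the last good quota
        quota = candidate
    return quota


def simulated_gpu(quota_pct):
    """Toy model (an assumption): the GPU needs 30% of a core to stay fed."""
    return 98.0 if quota_pct >= 30 else 98.0 * quota_pct / 30


if __name__ == "__main__":
    best = tune_quota(simulated_gpu)
    print(best, cpu_max_line(best))
```

One caveat with this idea: if the feeding thread really is spinning, a quota deschedules it at arbitrary points, so it may be asleep at the moment the GPU needs it. A real controller would probably have to watch PPD or time per frame rather than GPU utilization alone.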
muziqaz
Posts: 1551
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London

Re: Nvidia vs AMD CPU usage

Post by muziqaz »

Nvidia is so sensitive to CPU load that it even slows down if the CPU is folding on the free cores. It gains quite a lot of points per day just by not folding on the CPU at all.
FAH Omega tester
arisu
Posts: 270
Joined: Mon Feb 24, 2025 11:11 pm

Re: Nvidia vs AMD CPU usage

Post by arisu »

Wow! Maybe the scheduler jostling the thread around hurts performance. I bet a solution would be to bind the Nvidia GPU's CPU thread to a specific core with taskset, and use cpusets to keep the scheduler from placing anything else on that core (and its SMT sibling).
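For anyone who wants to try the pinning half of this, a minimal Linux-only sketch using Python's `os.sched_setaffinity` is below; it does the same job as `taskset -cp <cpu> <tid>`. The helper name `pin_to_cpu` is made up, and finding the thread ID of the GPU-feeding thread is left out. The cpuset part (keeping other tasks off that core) is not shown.

```python
import os


def pin_to_cpu(pid: int, cpu: int) -> set:
    """Restrict pid (0 = the caller) to one CPU; returns the new affinity set."""
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    allowed = os.sched_getaffinity(0)  # CPUs this process may currently use
    target = min(allowed)              # arbitrary choice for the demo
    print(pin_to_cpu(0, target))
```

On Linux the `pid` argument can also be a thread ID, so a single thread can be pinned without moving the rest of the process.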
muziqaz
Posts: 1551
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London

Re: Nvidia vs AMD CPU usage

Post by muziqaz »

arisu wrote: Thu Mar 06, 2025 7:26 am Wow! Maybe the scheduler jostling the thread around hurts performance. I bet a solution would be to bind the Nvidia GPU's CPU thread to a specific core with taskset, and use cpusets to keep the scheduler from placing anything else on that core (and its SMT sibling).
I think it was tried by Nvidians. No luck.
FAH Omega tester
arisu
Posts: 270
Joined: Mon Feb 24, 2025 11:11 pm

Re: Nvidia vs AMD CPU usage

Post by arisu »

After looking into CUDA programming a little, it seems there is a way to switch synchronization from polling in a loop to interrupt-driven waiting. Because it's so easy to make that switch, there is likely a good reason it hasn't been done. I'm guessing it's because latency would increase, and maybe the Nvidia driver has unacceptable latency when using the interrupt-driven approach.
muziqaz
Posts: 1551
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London

Re: Nvidia vs AMD CPU usage

Post by muziqaz »

The person who integrated CUDA into FAHcore was working for Nvidia at that time, so I'm pretty sure they knew what they were doing :)
FAH Omega tester
toTOW
Site Moderator
Posts: 6421
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Nvidia vs AMD CPU usage

Post by toTOW »

The fun fact is that OpenCL uses passive polling and doesn't use much CPU, while CUDA uses active polling and a full CPU thread ... and this is all on Nvidia (Windows).

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
arisu
Posts: 270
Joined: Mon Feb 24, 2025 11:11 pm

Re: Nvidia vs AMD CPU usage

Post by arisu »

Does CUDA transpiled to HIP use busy polling? It will be annoying if AMD gets a speedup at the expense of a CPU core, at least on lower-end devices where the CPU makes up a good fraction of the PPD.
muziqaz
Posts: 1551
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London

Re: Nvidia vs AMD CPU usage

Post by muziqaz »

No, there were no signs of that in initial HIP testing.
FAH Omega tester
arisu
Posts: 270
Joined: Mon Feb 24, 2025 11:11 pm

Re: Nvidia vs AMD CPU usage

Post by arisu »

When HIP is rolled out, will Nvidia systems use it instead of CUDA since hipify (supposedly) produces equally-performant kernels?
muziqaz
Posts: 1551
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London

Re: Nvidia vs AMD CPU usage

Post by muziqaz »

arisu wrote: Tue Mar 11, 2025 12:35 am When HIP is rolled out, will Nvidia systems use it instead of CUDA since hipify (supposedly) produces equally-performant kernels?
No
FAH Omega tester