You may be correct regarding "1 CPU core per 1 GPU slot" for Nvidia - unfortunately I've never really folded with Nvidia cards before, so I can't contribute much. All of my folding history has been with AMD cards coupled with Intel processors, running Windows or Linux.

MeeLee wrote:
I'd like to correct that,

Dark_Vera wrote:
I've learned from researching other forums and threads that F@H (and certain BOINC apps) handle CPU resource allocation differently in Linux than under Windows. The main observation has been that under Windows F@H REQUIRES 1 CPU core per GPU folding slot, yet under Linux a GPU folding slot needs only a fraction of a core (also borne out by my rig discussed earlier in this thread, at least until paging begins).

bruce wrote:
A lot depends on whether you run Linux or Windows. My main machine can boot either Windows or Linux (I'm not at home today, so I can't verify every detail). It has a GTX 960 and a GTX 750 Ti plus an 8-way CPU. I commonly run 3 slots (including 6 virtual CPU cores dedicated to FAH), plus often a browser. Those 3 slots generally push me into the paging range. If I stop one slot, there's no paging.
The monitor runs off of the GTX 750 Ti. In Windows, the screen lags appreciably, but works fine if I pause the WU that's running on the same GPU as the desktop. If I pause the CPU slot or the other GPU slot, there's still a lag, so it's not paging that's causing the screen lag, it's the limitation of sharing the GTX 750 Ti. If I pause that slot, the browser works fine.
If I switch Windows to CPU rendering, it doesn't help, which surprises me. I guess that's a paging issue. If I could add a third (slow) GPU and dedicate it to the Windows desktop, it might work fine. (The M/B has no more slots except the PCIe x1, and I haven't figured out how to use that yet.)
On Linux, I don't notice the same limitations. Then, too, Linux gets better PPD.
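A quick way to confirm that it really is paging (rather than something else) is to watch the swap counters while the slots run. A minimal sketch, assuming Python with the third-party psutil package is installed - this is nothing FAH itself provides:

    # Rough sketch: check whether slowdowns coincide with active paging.
    # Assumes the psutil package (pip install psutil); note that on Windows
    # psutil reports sin/sout as 0, so rely on the "used" percentage there.
    import time
    import psutil

    last = psutil.swap_memory()
    while True:
        time.sleep(5)
        now = psutil.swap_memory()
        # sin/sout are cumulative bytes swapped in/out since boot.
        print(f"swap used {now.percent:.1f}% | "
              f"swapped in {(now.sin - last.sin) / 2**20:.1f} MiB | "
              f"swapped out {(now.sout - last.sout) / 2**20:.1f} MiB (last 5 s)")
        last = now

If the swap-in/swap-out numbers stay near zero while the lag is happening, the bottleneck is something other than paging.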
Additionally, GPU BOINC apps and Folding perform far better under Linux than under Windows, and the gap is even more apparent when the GPUs are bottlenecked by PCIe x1 slots (versus the recommended x4, x8 or x16 links). My GPUs would choke even when running one slot at a time under Windows, as I'm running a K37 mining board with all slots restricted to PCIe x1 speeds. Under Linux, I get roughly 2.5 times the PPD per GPU.
Overall, Linux somehow squeezes better performance out of GPUs throttled at the PCIe-lane level while also using significantly less CPU when running multiple GPUs - a pattern reported throughout this thread and in dozens of other discussions online.
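For anyone who wants to verify what link width the cards actually negotiated, rather than trusting the board's spec sheet, Linux exposes it in sysfs. A rough sketch; the sysfs attribute names are standard, but using the "display controller" PCI class as a stand-in for "GPU" is my assumption, so cross-check against lspci:

    # Rough sketch (Linux only): list PCI display devices and the PCIe link
    # width/speed they negotiated.
    from pathlib import Path

    for dev in sorted(Path("/sys/bus/pci/devices").iterdir()):
        pci_class = (dev / "class").read_text().strip()
        if not pci_class.startswith("0x03"):     # 0x03xxxx = display controller
            continue
        width_file = dev / "current_link_width"
        if not width_file.exists():              # skip non-PCIe display devices
            continue
        width = width_file.read_text().strip()
        speed = (dev / "current_link_speed").read_text().strip()
        print(f"{dev.name}: x{width} @ {speed}")

On a mining board like mine, every card should report x1 here regardless of what the GPU itself supports.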
For full-speed results, both Windows and Linux require one CPU core per GPU if the GPU is an Nvidia card.
Nvidia and AMD GPUs of similar performance should actually put about the same real load on the CPU.
The difference is that with AMD GPUs the reported CPU usage reflects that real load, while the Nvidia drivers fill the CPU's passive time with idle data (busy-waiting), so the core reads as fully used.
That being said, if you have a 4 GHz CPU, you could easily share one CPU core between two RTX 2060 or 2070 GPUs, since the CPU time that would otherwise go to idle data can just as easily be allocated to the second GPU, much like on AMD.
The difference is that each GPU then runs a bit slower, just like AMD GPUs do with the AMD drivers. The idle data the Nvidia drivers push through the CPU is actually helping the GPU reach higher performance.
If, however, you have more GPUs than CPU cores and your CPU is fast enough, you can share one CPU core between 2 GPUs (or 3 GPUs on 1 core if the CPU supports hyperthreading).
The '1 GPU per CPU core' phenomenon applies to the Nvidia drivers, and it holds on both Windows and Linux.
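One way to actually test the core-sharing idea on Linux is to pin the folding core processes to a single CPU and watch what the PPD does. This is a sketch of that experiment only - the "FahCore" process-name match, the psutil dependency, and the choice of core 0 are assumptions, and it needs to run as the same user as the cores (or as root):

    # Rough sketch (Linux): pin all running folding core processes to one CPU
    # so two GPU slots share a single core, as described above.
    import os
    import psutil

    SHARED_CORE = {0}    # CPU indices the folding cores are allowed to run on

    for proc in psutil.process_iter(["pid", "name"]):
        name = proc.info["name"] or ""
        if name.startswith("FahCore"):
            os.sched_setaffinity(proc.info["pid"], SHARED_CORE)
            print(f"pinned {name} (pid {proc.info['pid']}) to {sorted(SHARED_CORE)}")

Whether sharing a core helps or hurts is exactly what's being debated here, so measure PPD before and after.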
Using Windows 10, the rig that I'm running now (Celeron + 8x RX 460s) was getting crushed and throttled to the bone, presumably by CPU usage (100% usage all day long when running 8 GPU slots at once). I did observe the "1 core for 1 GPU" rule in effect on Windows 10, to deleterious effect.
I'm running the same PC now with Arch Linux, and from the beginning (when it wasn't paging) CPU usage spiked at a maximum of 60% with 8 GPUs folding on the "Full" preset. Now that I've seemingly resolved my RAM leak issues and paging is no longer happening, my CPU use has maxed out at 32%.
So we might tentatively conclude that Windows "forces" 1 CPU core per 1 GPU slot whereas Arch Linux does not, and that Linux is therefore far more efficient with CPU management where folding is concerned.
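If anyone wants to put numbers on that on their own box, the per-process CPU of the folding cores is easy to sample. Another rough sketch, again assuming psutil and the "FahCore" process-name prefix; it should behave the same way on Windows and Linux:

    # Rough sketch: sample each folding core's CPU usage, expressed as a
    # percentage of ONE core (so ~100 means it is eating a full core).
    import time
    import psutil

    cores = [p for p in psutil.process_iter(["name"])
             if (p.info["name"] or "").startswith("FahCore")]

    for p in cores:
        p.cpu_percent(None)      # prime the counters; first call returns 0.0

    time.sleep(10)               # measurement window

    for p in cores:
        print(f"{p.info['name']} (pid {p.pid}): {p.cpu_percent(None):.0f}% of one core")

On a setup like mine you'd expect each GPU core to sit near 100% of a core on Windows and well below that on Linux.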