Low GPU utilization

scm2000 · Post by **scm2000** » Wed May 13, 2020 1:22 pm

I had 3 GPUs in a system with a 2 core Celeron CPU...
It had reasonable performance.

I just added a 4th GPU and now all the GPUs are lucky to get slightly above 50% utilization.

I thought the CPU thread per GPU should not be doing much actual work, but is it the case that I need a full CPU core per GPU?

Or is there some current issue with FAH GPU work units? I see the latest software was supposed to address GPU utilization problems.
I installed that, but it did not help GPU utilization.

Neil-B · Post by **Neil-B** » Wed May 13, 2020 1:43 pm

Could you post your log, including the top 100 lines or so with the configuration details? It may help identify what is happening.

A number of factors that might be playing into this include the OS, AMD or Nvidia, the age of the GPUs, and various configuration settings. Once there is a clearer picture of what your setup is (the log should provide that), any advice is likely to be more relevant.

scm2000 · Post by **scm2000** » Wed May 13, 2020 2:00 pm

Neil-B wrote: Could you post your log, including the top 100 lines or so with the configuration details? It may help identify what is happening.

A number of factors that might be playing into this include the OS, AMD or Nvidia, the age of the GPUs, and various configuration settings. Once there is a clearer picture of what your setup is (the log should provide that), any advice is likely to be more relevant.

*********************** Log Started 2020-05-12T21:02:43Z ***********************
21:02:43:Trying to access database...
21:02:44:Successfully acquired database lock
21:02:44:Read GPUs.txt
21:03:16:Enabled folding slot 01: READY gpu:0:GP107 [GeForce GTX 1050 Ti]  2138
21:03:16:Enabled folding slot 02: READY gpu:1:GK110 [Tesla K40m]
21:03:16:Enabled folding slot 03: READY gpu:2:GK110 [Tesla K40m]
21:03:16:Enabled folding slot 00: READY gpu:3:GP107 [GeForce GTX 1050 Ti]  2138
21:03:16:****************************** FAHClient ******************************
21:03:16:        Version: 7.6.13
21:03:16:         Author: Joseph Coffland <[email protected]>
21:03:16:      Copyright: 2020 foldingathome.org
21:03:16:       Homepage: https://foldingathome.org/
21:03:16:           Date: Apr 27 2020
21:03:16:           Time: 21:21:01
21:03:16:       Revision: 5a652817f46116b6e135503af97f18e094414e3b
21:03:16:         Branch: master
21:03:16:       Compiler: Visual C++ 2008
21:03:16:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
21:03:16:       Platform: win32 10
21:03:16:           Bits: 32
21:03:16:           Mode: Release
21:03:16:         Config: C:\Users\steph\AppData\Roaming\FAHClient\config.xml
21:03:16:******************************** CBang ********************************
21:03:16:           Date: Apr 24 2020
21:03:16:           Time: 17:07:55
21:03:16:       Revision: ea081a3b3b0f4a37c4d0440b4f1bc184197c7797
21:03:16:         Branch: master
21:03:16:       Compiler: Visual C++ 2008
21:03:16:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
21:03:16:       Platform: win32 10
21:03:16:           Bits: 32
21:03:16:           Mode: Release
21:03:16:******************************* System ********************************
21:03:16:            CPU: Intel(R) Celeron(R) CPU G3930 @ 2.90GHz
21:03:16:         CPU ID: GenuineIntel Family 6 Model 158 Stepping 9
21:03:16:           CPUs: 2
21:03:16:         Memory: 7.70GiB
21:03:16:    Free Memory: 5.40GiB
21:03:16:        Threads: WINDOWS_THREADS
21:03:16:     OS Version: 6.2
21:03:16:    Has Battery: false
21:03:16:     On Battery: false
21:03:16:     UTC Offset: -4
21:03:16:            PID: 8284
21:03:16:            CWD: C:\Users\steph\AppData\Roaming\FAHClient
21:03:16:  Win32 Service: false
21:03:16:             OS: Windows 10 Enterprise
21:03:16:        OS Arch: AMD64
21:03:16:           GPUs: 4
21:03:16:          GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:7 GP107 [GeForce GTX 1050 Ti] 2138
21:03:16:          GPU 1: Bus:3 Slot:0 Func:0 NVIDIA:3 GK110 [Tesla K40m]
21:03:16:          GPU 2: Bus:2 Slot:0 Func:0 NVIDIA:3 GK110 [Tesla K40m]
21:03:16:          GPU 3: Bus:4 Slot:0 Func:0 NVIDIA:7 GP107 [GeForce GTX 1050 Ti] 2138
21:03:16:  CUDA Device 0: Platform:0 Device:0 Bus:2 Slot:0 Compute:3.5 Driver:11.0
21:03:16:  CUDA Device 1: Platform:0 Device:1 Bus:3 Slot:0 Compute:3.5 Driver:11.0
21:03:16:  CUDA Device 2: Platform:0 Device:2 Bus:1 Slot:0 Compute:6.1 Driver:11.0
21:03:16:  CUDA Device 3: Platform:0 Device:3 Bus:4 Slot:0 Compute:6.1 Driver:11.0
21:03:16:OpenCL Device 0: Platform:0 Device:0 Bus:NA Slot:NA Compute:2.1 Driver:26.20
21:03:16:OpenCL Device 2: Platform:1 Device:0 Bus:2 Slot:0 Compute:1.2 Driver:445.87
21:03:16:OpenCL Device 3: Platform:1 Device:1 Bus:3 Slot:0 Compute:1.2 Driver:445.87
21:03:16:OpenCL Device 4: Platform:1 Device:2 Bus:1 Slot:0 Compute:1.2 Driver:445.87
21:03:16:OpenCL Device 5: Platform:1 Device:3 Bus:4 Slot:0 Compute:1.2 Driver:445.87
21:03:16:******************************* libFAH ********************************
21:03:16:           Date: Apr 15 2020
21:03:16:           Time: 14:53:14
21:03:16:       Revision: 216968bc7025029c841ed6e36e81a03a316890d3
21:03:16:         Branch: master
21:03:16:       Compiler: Visual C++ 2008
21:03:16:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
21:03:16:       Platform: win32 10
21:03:16:           Bits: 32
21:03:16:           Mode: Release
21:03:16:***********************************************************************
21:03:16:<config>
21:03:16:  <!-- Folding Slot Configuration -->
21:03:16:  <cause v='COVID_19'/>
21:03:16:
21:03:16:  <!-- HTTP Server -->
21:03:16:  <allow v='127.0.0.1 192.168.1.0/24'/>
21:03:16:
21:03:16:  <!-- Network -->
21:03:16:  <proxy v=':8080'/>
21:03:16:
21:03:16:  <!-- Remote Command Server -->
21:03:16:  <password v='*****'/>
21:03:16:
21:03:16:  <!-- Slot Control -->
21:03:16:  <power v='full'/>
21:03:16:
21:03:16:  <!-- User Information -->
21:03:16:  <passkey v='*****'/>
21:03:16:  <team v='41355'/>
21:03:16:  <user v='scm2000'/>
21:03:16:
21:03:16:  <!-- Folding Slots -->
21:03:16:  <slot id='1' type='GPU'/>
21:03:16:  <slot id='2' type='GPU'/>
21:03:16:  <slot id='3' type='GPU'/>
21:03:16:  <slot id='0' type='GPU'/>
21:03:16:</config>

JimboPalmer · Post by **JimboPalmer** » Wed May 13, 2020 2:05 pm

scm2000, it depends on the GPU vendor.

AMD uses interrupts and loads the CPUs fairly lightly, although I would still not recommend more GPUs than CPU threads.

Nvidia uses polled I/O, and this fully utilizes a CPU thread per GPU. You will always have difficulty running more GPUs than CPU threads. Some folders need more CPU threads than GPUs because they also use their PCs.
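
As a rough illustration of the rule of thumb above, the following sketch reads the `CPUs:` and `GPUs:` lines from a FAHClient log like the one posted earlier and reports whether there are more GPUs than CPU threads. The function names and the `threads_per_gpu` parameter are made up for this example; the "one thread per GPU" figure is the assumption stated in this post, not something the client reports.

```python
import re


def cpu_gpu_counts(log_text: str) -> tuple[int, int]:
    """Return (cpu_threads, gpus) from the System section of a FAHClient log."""
    cpus = re.search(r"\bCPUs:\s*(\d+)", log_text)
    gpus = re.search(r"\bGPUs:\s*(\d+)", log_text)
    if cpus is None or gpus is None:
        raise ValueError("log does not contain both 'CPUs:' and 'GPUs:' lines")
    return int(cpus.group(1)), int(gpus.group(1))


def oversubscribed(log_text: str, threads_per_gpu: int = 1) -> bool:
    """True when the GPUs would need more CPU threads than the system has.

    threads_per_gpu=1 is an assumption taken from the forum advice for
    Nvidia cards; it is not read from the log.
    """
    cpus, gpus = cpu_gpu_counts(log_text)
    return gpus * threads_per_gpu > cpus


if __name__ == "__main__":
    # Two lines copied from the log posted in this thread.
    sample = (
        "21:03:16:           CPUs: 2\n"
        "21:03:16:           GPUs: 4\n"
    )
    print(cpu_gpu_counts(sample), oversubscribed(sample))
```

For the log in this thread, the counts are 2 CPU threads and 4 GPUs, so the check flags the system as oversubscribed under that assumption.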

scm2000 · Post by **scm2000** » Wed May 13, 2020 2:10 pm

Looks like I'm going to do a CPU upgrade then... because I have all NVIDIA GPUs

Neil-B · Post by **Neil-B** » Wed May 13, 2020 2:46 pm

I'll be honest … not the group of cards I expected given the CPU … Interesting collection

… I'm not a full-time GPU folder, but my gut instinct would be (until your CPU upgrade) to pause the 1050s and let the K40ms have a full CPU core each, check what PPD they are pushing out over a few WUs, then un-pause one of the 1050s and test the impact.

Your Celeron, as JimboPalmer said, should only support two cards properly (as they are Nvidia), but "should" and the realities of what actually happens can sometimes surprise, and you said you have had three running reasonably before. Baselining "reasonable" with just the two cards and then adding the third will soon tell you if it is worth it until such time as you upgrade the CPU … but yes, a CPU upgrade would make things easier/better.
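
The baseline-then-add test described above comes down to comparing total PPD across two configurations. A minimal sketch, assuming you note the per-slot PPD yourself (for example from the client's status display) once each configuration has settled; the function names and slot labels are hypothetical:

```python
def total_ppd(per_slot_ppd: dict[str, float]) -> float:
    """Sum the PPD of every running slot in one configuration."""
    return sum(per_slot_ppd.values())


def worth_unpausing(baseline: dict[str, float],
                    with_extra: dict[str, float]) -> bool:
    """True if the configuration with the extra card yields more total PPD.

    baseline   : per-slot PPD with fewer cards running
    with_extra : per-slot PPD after un-pausing one more card
    """
    return total_ppd(with_extra) > total_ppd(baseline)
```

The extra card only earns its place if the PPD it adds outweighs whatever the other cards lose by sharing the CPU.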

scm2000 · Post by **scm2000** » Wed May 13, 2020 6:06 pm

Neil-B wrote: I'll be honest … not the group of cards I expected given the CPU … Interesting collection … I'm not a full-time GPU folder, but my gut instinct would be (until your CPU upgrade) to pause the 1050s and let the K40ms have a full CPU core each, check what PPD they are pushing out over a few WUs, then un-pause one of the 1050s and test the impact.

Your Celeron, as JimboPalmer said, should only support two cards properly (as they are Nvidia), but "should" and the realities of what actually happens can sometimes surprise, and you said you have had three running reasonably before. Baselining "reasonable" with just the two cards and then adding the third will soon tell you if it is worth it until such time as you upgrade the CPU … but yes, a CPU upgrade would make things easier/better.

I bought the motherboard used, and the 2-core Celeron came with it. I've been adding a hodgepodge of GPUs, all the while suspecting I should upgrade the CPU, so I guess now is the time.

bruce · Post by **bruce** » Wed May 13, 2020 9:21 pm

First, I would never expect a 2-core Celeron to be able to supply enough data to 3 GPUs, let alone 4, to keep those GPUs busy. Second, what are the speeds of the PCIe slots?

Third, your [GeForce GTX 1050 Ti]s are respectable GPUs and should be able to produce nicely. The [Tesla K40m]s are also pretty good. A lot is going to depend on the speed of the PCIe slot.
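
To put numbers on "speed of the PCIe slot", here is a small sketch that computes the theoretical one-direction bandwidth of a slot from its PCIe generation and lane count, using the published per-lane transfer rates and encoding overheads. The function name is made up for this example, and which generation and width each slot on this particular motherboard provides is not stated anywhere in the thread.

```python
# Per-lane signalling rate in GT/s and line-encoding efficiency per generation.
_PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),      # 8b/10b encoding
    2: (5.0, 8 / 10),      # 8b/10b encoding
    3: (8.0, 128 / 130),   # 128b/130b encoding
}


def pcie_bandwidth_gb_s(generation: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s (protocol overhead ignored)."""
    rate_gt_s, efficiency = _PCIE_GENERATIONS[generation]
    return rate_gt_s * efficiency / 8 * lanes  # divide by 8: bits to bytes


if __name__ == "__main__":
    for gen, lanes in [(3, 16), (3, 4), (2, 1)]:
        print(f"PCIe {gen}.0 x{lanes}: {pcie_bandwidth_gb_s(gen, lanes):.2f} GB/s")
```

The gap between a full x16 slot and an x1 slot (or riser) is large, which is why slot width can matter for keeping a GPU fed.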

Neil-B · Post by **Neil-B** » Wed May 13, 2020 9:40 pm

… and that (bruce's post) just goes to show I'm not a GPU folder

It does surprise me though that the Tesla K40ms are considered rather weak … I know their clocks are down in comparison to the 1050s, but with the significantly larger shader count (x4 ish) and higher FLOPS performance (x2 ish) I'd have expected them to have been better. TechPowerUp rates the 1050s 9% behind the HD7970 in relative performance and the K40ms 23% ahead of the HD7970, but I suppose it comes down to what type of relative performance and how that equates to FAH loadings … as I said, it just shows how little I know about GPUs.

_r2w_ben · Post by **_r2w_ben** » Wed May 13, 2020 9:51 pm

Neil-B wrote:… and that (bruce's post) just goes to show I'm not a GPU folder

It does surprise me though that the Tesla K40ms are considered rather weak … I know their clocks are down in comparison to the 1050s, but with the significantly larger shader count (x4 ish) and higher FLOPS performance (x2 ish) I'd have expected them to have been better. TechPowerUp rates the 1050s 9% behind the HD7970 in relative performance and the K40ms 23% ahead of the HD7970, but I suppose it comes down to what type of relative performance and how that equates to FAH loadings … as I said, it just shows how little I know about GPUs.

Your assessment is probably closer to reality. K40m is in the same neighbourhood as a GTX 970. TechPowerUp reports peak FLOPS for the K40m as more than double a 1050 Ti. Combine that with QRB and it could be 3x the points per day!
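
The "2x FLOPS could be 3x PPD" estimate can be checked with the Quick Return Bonus formula. The sketch below uses the formula as commonly described for Folding@home, final points = base points × max(1, sqrt(k × deadline / elapsed)); treat that as an assumption here, along with the idea that folding speed scales directly with peak FLOPS. The function names are made up, and `k` and the deadline are per-project constants that you would need to look up.

```python
import math


def qrb_points(base_points: float, k: float,
               deadline_days: float, elapsed_days: float) -> float:
    """Credit for one WU under the (assumed) Quick Return Bonus formula."""
    bonus = max(1.0, math.sqrt(k * deadline_days / elapsed_days))
    return base_points * bonus


def points_per_day(base_points: float, k: float,
                   deadline_days: float, elapsed_days: float) -> float:
    """PPD if every WU takes `elapsed_days` to complete and return."""
    return qrb_points(base_points, k, deadline_days, elapsed_days) / elapsed_days


def ppd_ratio(speedup: float) -> float:
    """PPD multiplier for a card `speedup` times faster, while the bonus applies.

    Points per WU grow with sqrt(speedup) and WUs per day grow with speedup,
    so the combined effect is speedup ** 1.5.
    """
    return speedup ** 1.5
```

With a 2x speedup the multiplier is 2^1.5, roughly 2.8, which is in line with the "3x" figure above.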

Neil-B · Post by **Neil-B** » Wed May 13, 2020 10:01 pm

My "assessment" was simply a paper one with no grounding in reality - hence why I happily defer to those who do GPU folding for real

The extent of my GPU folding is a Quadro K420 1GB and a Quadro M1000M 2GB that I run a WU through once in a blue moon just cause I can when I get bored and want to watch paint dry (they still make deadlines and occasionally Timeouts) and a GTX 750 Ti 2GB from my late father that I was going to toss in the recycling until in a moment of madness I ran a WU through it and found it actually gets pretty much the same ppd as my 24/56 core CPU slot - so I have left it running out of amusement and in his memory as I know he would have laughed about it

scm2000 · Post by **scm2000** » Thu May 14, 2020 2:11 am

The absolute power of the GPUs I have is actually not important to me... I chose them for various reasons.

2 of them perform at about 100 percent utilization each with a 2-core CPU... Running 2 of them (a 1050 Ti and a K40) for a while made me think they are on par with each other. If the Teslas are under-powered, it's not a problem anyway.

Simply upgrading to a 4-core CPU should get 4 GPUs back up to about 100 percent each, based on my experience with 2 alone and the information that the CPU threads use polling.

I do have a question though: why are NVIDIA GPUs polled and not interrupt driven? Am I to believe that NVIDIA does not know how to build GPUs with interrupt capability? Or write their SDK to sleep for an interrupt? So what's the real story here...?

Joe_H · Post by **Joe_H** » Thu May 14, 2020 2:36 am

scm2000 wrote: I do have a question though: why are NVIDIA GPUs polled and not interrupt driven? Am I to believe that NVIDIA does not know how to build GPUs with interrupt capability? Or write their SDK to sleep for an interrupt? So what's the real story here...?

That is how nVidia wrote the driver code to handle OpenCL commands. From what I understand, CUDA commands are handled by interrupts.

There has been much conjecture on why they chose to do it that way. As far as I know, nVidia has not made any statement as to the reason.
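
For anyone wondering what "polled" versus "interrupt driven" means for CPU load, here is a generic sketch using plain Python threads. It is not driver code and says nothing about how the Nvidia driver is actually implemented; a timer stands in for a GPU finishing its work, and the two functions differ only in how the host thread waits for it. All names are made up for this example.

```python
import threading
import time


def wait_by_polling(done: threading.Event) -> float:
    """Busy-wait until `done` is set; return CPU seconds this thread used."""
    start = time.thread_time()
    while not done.is_set():
        pass  # spin: keep asking "finished yet?"
    return time.thread_time() - start


def wait_by_blocking(done: threading.Event) -> float:
    """Sleep until `done` is set; return CPU seconds this thread used."""
    start = time.thread_time()
    done.wait()  # the OS wakes this thread when the event fires
    return time.thread_time() - start


def measure(waiter, work_seconds: float = 0.2) -> float:
    """Run `waiter` while a timer stands in for the device finishing its work."""
    done = threading.Event()
    timer = threading.Timer(work_seconds, done.set)
    timer.start()
    cpu_seconds = waiter(done)
    timer.join()
    return cpu_seconds


if __name__ == "__main__":
    print(f"polling : {measure(wait_by_polling):.3f} CPU s")
    print(f"blocking: {measure(wait_by_blocking):.3f} CPU s")
```

The polling thread burns CPU time for as long as the "device" is busy, while the blocking thread uses almost none. That is the same trade-off as a full CPU thread per GPU versus a lightly loaded one, and polling is usually chosen where lower wake-up latency is worth the extra CPU cost.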

JimboPalmer · Post by **JimboPalmer** » Thu May 14, 2020 2:46 am

scm2000 wrote: I do have a question though: why are NVIDIA GPUs polled and not interrupt driven? Am I to believe that NVIDIA does not know how to build GPUs with interrupt capability? Or write their SDK to sleep for an interrupt? So what's the real story here...?

I do not have the answer, but I do have an opinion.

CUDA is a proprietary interface that locks you into Nvidia cards forever.

OpenCL is an open standard interface that can run anywhere.

How can Nvidia make CUDA look wildly more attractive than OpenCL while still supporting open standards?

scm2000 · Post by **scm2000** » Thu May 14, 2020 3:05 am

JimboPalmer wrote:
scm2000 wrote: I do have a question though: why are NVIDIA GPUs polled and not interrupt driven? Am I to believe that NVIDIA does not know how to build GPUs with interrupt capability? Or write their SDK to sleep for an interrupt? So what's the real story here...?
I do not have the answer, but I do have an opinion.

CUDA is a proprietary interface that locks you into Nvidia cards forever.

OpenCL is an open standard interface that can run anywhere.

How can Nvidia make CUDA look wildly more attractive than OpenCL while still supporting open standards?

I have the CUDA SDK and 4 Nvidia GPUs, and I am happily writing code for them. I'm not locked in to anything... if I buy an AMD GPU, I'll use whatever SDK I need to program it.
