Low GPU utilization
Moderators: Site Moderators, FAHC Science Team
I had 3 GPUs in a system with a 2 core Celeron CPU...
It had reasonable performance.
I just added a 4th GPU and now all the GPUs are lucky to get slightly above 50% utilization.
I thought the CPU thread feeding each GPU should not be doing much actual work, but is it the case that I need a full CPU core per GPU?
Or is there some current issue with FAH GPU work units? I see the latest software was supposed to address GPU utilization problems.
I installed it, but it did not help GPU utilization.
-
- Posts: 1996
- Joined: Sun Mar 22, 2020 5:52 pm
- Hardware configuration: 1: 2x Xeon [email protected], 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon [email protected], 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: [email protected], 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21 - Location: UK
Re: Low GPU utilization
Could you post your log, including the top 100 lines or so with the configuration details? It may help identify what is happening.
A number of factors might be playing into this, including the OS, AMD or Nvidia, the age of the GPUs, and various configuration settings. Once there is a clearer picture of your setup (the log should provide that), any advice is likely to be more relevant.
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070
(Green/Bold = Active)
Re: Low GPU utilization
Neil-B wrote:Could you post your log, including the top 100 lines or so with the configuration details? It may help identify what is happening.
A number of factors might be playing into this, including the OS, AMD or Nvidia, the age of the GPUs, and various configuration settings. Once there is a clearer picture of your setup (the log should provide that), any advice is likely to be more relevant.
Code: Select all
*********************** Log Started 2020-05-12T21:02:43Z ***********************
21:02:43:Trying to access database...
21:02:44:Successfully acquired database lock
21:02:44:Read GPUs.txt
21:03:16:Enabled folding slot 01: READY gpu:0:GP107 [GeForce GTX 1050 Ti] 2138
21:03:16:Enabled folding slot 02: READY gpu:1:GK110 [Tesla K40m]
21:03:16:Enabled folding slot 03: READY gpu:2:GK110 [Tesla K40m]
21:03:16:Enabled folding slot 00: READY gpu:3:GP107 [GeForce GTX 1050 Ti] 2138
21:03:16:****************************** FAHClient ******************************
21:03:16: Version: 7.6.13
21:03:16: Author: Joseph Coffland <[email protected]>
21:03:16: Copyright: 2020 foldingathome.org
21:03:16: Homepage: https://foldingathome.org/
21:03:16: Date: Apr 27 2020
21:03:16: Time: 21:21:01
21:03:16: Revision: 5a652817f46116b6e135503af97f18e094414e3b
21:03:16: Branch: master
21:03:16: Compiler: Visual C++ 2008
21:03:16: Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
21:03:16: Platform: win32 10
21:03:16: Bits: 32
21:03:16: Mode: Release
21:03:16: Config: C:\Users\steph\AppData\Roaming\FAHClient\config.xml
21:03:16:******************************** CBang ********************************
21:03:16: Date: Apr 24 2020
21:03:16: Time: 17:07:55
21:03:16: Revision: ea081a3b3b0f4a37c4d0440b4f1bc184197c7797
21:03:16: Branch: master
21:03:16: Compiler: Visual C++ 2008
21:03:16: Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
21:03:16: Platform: win32 10
21:03:16: Bits: 32
21:03:16: Mode: Release
21:03:16:******************************* System ********************************
21:03:16: CPU: Intel(R) Celeron(R) CPU G3930 @ 2.90GHz
21:03:16: CPU ID: GenuineIntel Family 6 Model 158 Stepping 9
21:03:16: CPUs: 2
21:03:16: Memory: 7.70GiB
21:03:16: Free Memory: 5.40GiB
21:03:16: Threads: WINDOWS_THREADS
21:03:16: OS Version: 6.2
21:03:16: Has Battery: false
21:03:16: On Battery: false
21:03:16: UTC Offset: -4
21:03:16: PID: 8284
21:03:16: CWD: C:\Users\steph\AppData\Roaming\FAHClient
21:03:16: Win32 Service: false
21:03:16: OS: Windows 10 Enterprise
21:03:16: OS Arch: AMD64
21:03:16: GPUs: 4
21:03:16: GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:7 GP107 [GeForce GTX 1050 Ti] 2138
21:03:16: GPU 1: Bus:3 Slot:0 Func:0 NVIDIA:3 GK110 [Tesla K40m]
21:03:16: GPU 2: Bus:2 Slot:0 Func:0 NVIDIA:3 GK110 [Tesla K40m]
21:03:16: GPU 3: Bus:4 Slot:0 Func:0 NVIDIA:7 GP107 [GeForce GTX 1050 Ti] 2138
21:03:16: CUDA Device 0: Platform:0 Device:0 Bus:2 Slot:0 Compute:3.5 Driver:11.0
21:03:16: CUDA Device 1: Platform:0 Device:1 Bus:3 Slot:0 Compute:3.5 Driver:11.0
21:03:16: CUDA Device 2: Platform:0 Device:2 Bus:1 Slot:0 Compute:6.1 Driver:11.0
21:03:16: CUDA Device 3: Platform:0 Device:3 Bus:4 Slot:0 Compute:6.1 Driver:11.0
21:03:16:OpenCL Device 0: Platform:0 Device:0 Bus:NA Slot:NA Compute:2.1 Driver:26.20
21:03:16:OpenCL Device 2: Platform:1 Device:0 Bus:2 Slot:0 Compute:1.2 Driver:445.87
21:03:16:OpenCL Device 3: Platform:1 Device:1 Bus:3 Slot:0 Compute:1.2 Driver:445.87
21:03:16:OpenCL Device 4: Platform:1 Device:2 Bus:1 Slot:0 Compute:1.2 Driver:445.87
21:03:16:OpenCL Device 5: Platform:1 Device:3 Bus:4 Slot:0 Compute:1.2 Driver:445.87
21:03:16:******************************* libFAH ********************************
21:03:16: Date: Apr 15 2020
21:03:16: Time: 14:53:14
21:03:16: Revision: 216968bc7025029c841ed6e36e81a03a316890d3
21:03:16: Branch: master
21:03:16: Compiler: Visual C++ 2008
21:03:16: Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
21:03:16: Platform: win32 10
21:03:16: Bits: 32
21:03:16: Mode: Release
21:03:16:***********************************************************************
21:03:16:<config>
21:03:16: <!-- Folding Slot Configuration -->
21:03:16: <cause v='COVID_19'/>
21:03:16:
21:03:16: <!-- HTTP Server -->
21:03:16: <allow v='127.0.0.1 192.168.1.0/24'/>
21:03:16:
21:03:16: <!-- Network -->
21:03:16: <proxy v=':8080'/>
21:03:16:
21:03:16: <!-- Remote Command Server -->
21:03:16: <password v='*****'/>
21:03:16:
21:03:16: <!-- Slot Control -->
21:03:16: <power v='full'/>
21:03:16:
21:03:16: <!-- User Information -->
21:03:16: <passkey v='*****'/>
21:03:16: <team v='41355'/>
21:03:16: <user v='scm2000'/>
21:03:16:
21:03:16: <!-- Folding Slots -->
21:03:16: <slot id='1' type='GPU'/>
21:03:16: <slot id='2' type='GPU'/>
21:03:16: <slot id='3' type='GPU'/>
21:03:16: <slot id='0' type='GPU'/>
21:03:16:</config>
-
- Posts: 2522
- Joined: Mon Feb 16, 2009 4:12 am
- Location: Greenwood MS USA
Re: Low GPU utilization
scm2000, it depends which GPU vendor.
AMD uses interrupts and loads the CPUs fairly lightly, although I would still not recommend more GPUs than CPU threads.
Nvidia uses polled I/O, and this fully utilizes a CPU thread per GPU. You will always have difficulty running more GPUs than CPU threads; some folders need more CPU threads than GPUs because they also use their PCs.
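For anyone curious what "polled I/O" versus "interrupts" means in practice, here is a minimal Python sketch (purely illustrative, not Nvidia driver code) of the two waiting strategies described above. The flag, event, and function names are all made up for the example:

```python
import threading
import time

def fake_gpu_kernel(done_flag, event, duration=0.05):
    # Stand-in for a GPU kernel: "completes" after `duration` seconds.
    time.sleep(duration)
    done_flag["done"] = True   # completion flag the polling thread watches
    event.set()                # the "interrupt" that wakes a blocked thread

def polled_wait(done_flag):
    # Polled I/O: the CPU thread spins on the completion flag,
    # pinning one core at ~100% for as long as the kernel runs.
    spins = 0
    while not done_flag["done"]:
        spins += 1
    return spins

def interrupt_style_wait(event):
    # Interrupt-style: the thread sleeps until signalled,
    # leaving the core free for other work in the meantime.
    event.wait()

done_flag = {"done": False}
event = threading.Event()
worker = threading.Thread(target=fake_gpu_kernel, args=(done_flag, event))
worker.start()
spin_count = polled_wait(done_flag)   # burns a CPU core the whole time
interrupt_style_wait(event)           # returns without spinning
worker.join()
```

The polled version keeps one core busy for the entire kernel duration, which is why each Nvidia GPU slot effectively wants its own CPU thread, while the interrupt-style wait costs essentially nothing while the GPU works.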
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
Re: Low GPU utilization
Looks like I'm going to do a CPU upgrade then... because I have all NVIDIA GPUs
-
- Posts: 1996
- Joined: Sun Mar 22, 2020 5:52 pm
- Hardware configuration: 1: 2x Xeon [email protected], 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon [email protected], 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: [email protected], 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21 - Location: UK
Re: Low GPU utilization
I'll be honest … not the group of cards I expected given the CPU … Interesting collection … I'm not a full-time GPU folder, but my gut instinct would be (until your CPU upgrade) to pause the 1050s and let the K40ms have a full CPU core each, check what PPD they are pushing out over a few WUs, then un-pause one of the 1050s and test the impact.
As JimboPalmer said, your Celeron should only support two cards properly (as Nvidia), but "should" and the realities of what actually happens can sometimes surprise - and you said you had three running reasonably before. Baselining "reasonable" with just the two cards and then adding the third will soon tell you if it is worth it until such time as you upgrade the CPU … but yes, a CPU upgrade would make things easier/better.
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070
(Green/Bold = Active)
Re: Low GPU utilization
I bought the motherboard used; the 2 core Celeron came with it. I've been adding a hodgepodge of GPUs, all the while suspecting I should upgrade the CPU... so I guess now is the time.
Neil-B wrote:I'll be honest … not the groups of cards I expected given the CPU … Interesting collection … I'm not a full time GPU folder, but my gut instinct would be (until you CPU upgrade) to pause the 1050s and let the K40ms have a full CPU core each, check what PPD they are pushing out over a few WUs then un-pause one of the 1050s and test the impact.
Your Celeron as JimboPalmer said should only support two cards properly (as nvidia) but "should", and the realities of what actually happens can sometimes surprise - and you said you have had three running reasonably before - baselining "reasonable" with just the two cards then adding the third will soon tell you is it is worth it until such time as you CPU upgrade … but yes a CPU upgrade would make things easier/better.
Re: Low GPU utilization
I would never expect a 2 core Celeron to be able to supply enough data to 3 GPUs, let alone 4, to keep those GPUs busy. Second, what are the speeds of the PCIe slots?
Third, your [GeForce GTX 1050 Ti]s are respectable GPUs and should be able to produce nicely. The [Tesla K40m]s are also pretty good. A lot is going to depend on the speed of the PCIe slot.
Last edited by bruce on Wed May 13, 2020 10:32 pm, edited 1 time in total.
Reason: Incorrect information has been corrected.
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Posts: 1996
- Joined: Sun Mar 22, 2020 5:52 pm
- Hardware configuration: 1: 2x Xeon [email protected], 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon [email protected], 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: [email protected], 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21 - Location: UK
Re: Low GPU utilization
… and that (bruce's post) just goes to show I'm not a GPU folder
It does surprise me, though, that the Tesla K40ms are considered rather weak … I know their clocks are down in comparison to the 1050s, but with the significantly larger shader count (x4 ish) and higher FLOPS performance (x2 ish) I'd have expected them to have been better. TechPowerUp rate the 1050s 9% behind HD7970 relative performance and the K40ms 23% ahead of it, but I suppose it comes down to what type of relative performance and how that equates to FAH loadings … as I said, it just shows how little I know about GPUs.
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070
(Green/Bold = Active)
Re: Low GPU utilization
Your assessment is probably closer to reality. The K40m is in the same neighbourhood as a GTX 970. TechPowerUp reports peak FLOPS for the K40m as more than double a 1050 Ti. Combine that with QRB and it could be 3x the points per day!
Neil-B wrote:… and that (bruce's post) just goes to show I'm not a GPU folder
It does surprise me though that the Tesla K40ms are considered rather weak … I know their clocks are down in comparison to the 1050s but with the significantly larger shader count (x4 ish) and higher FLOPs performance (x2 ish) I'd have expected them to have been better - Techpowerup rate the 1050s 9% behind HD7970 relative performance and the K40ms 23% ahead of HD7970 but I suppose it comes down to what type of relative performance and how that equates to FAH loadings … as I said just shows how little I know about GPUs.
-
- Posts: 1996
- Joined: Sun Mar 22, 2020 5:52 pm
- Hardware configuration: 1: 2x Xeon [email protected], 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon [email protected], 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: [email protected], 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21 - Location: UK
Re: Low GPU utilization
My "assessment" was simply a paper one with no grounding in reality - hence why I happily defer to those who do GPU folding for real
The extent of my GPU folding is a Quadro K420 1GB and a Quadro M1000M 2GB that I run a WU through once in a blue moon, just because I can, when I get bored and want to watch paint dry (they still make deadlines and occasionally timeouts), plus a GTX 750 Ti 2GB from my late father that I was going to toss in the recycling until, in a moment of madness, I ran a WU through it and found it actually gets pretty much the same PPD as my 24/56 core CPU slot - so I have left it running out of amusement and in his memory, as I know he would have laughed about it.
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070
(Green/Bold = Active)
Re: Low GPU utilization
The absolute power of the GPUs I have is actually not important to me... I chose them for various reasons.
Two of them perform at about 100 percent utilization each with a 2 core CPU... Running 2 of them (a 1050 Ti and a K40) for a while made me think they are on par with each other. If the Teslas are under-powered, it's not a problem anyway.
Simply upgrading to a 4 core CPU should get 4 GPUs back up to about 100 percent each, based on my experience with 2 alone and the information that the CPU threads use polling.
I do have a question though, and that is: why are NVIDIA GPUs polled and not interrupt driven? Am I to believe that NVIDIA does not know how to build GPUs with interrupt capability? Or write their SDK to sleep for an interrupt? So what's the real story here...?
-
- Site Admin
- Posts: 7937
- Joined: Tue Apr 21, 2009 4:41 pm
- Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2 - Location: W. MA
Re: Low GPU utilization
That is how nVidia wrote the driver code to handle OpenCL commands. From what I understand, CUDA commands are handled by interrupts.
scm2000 wrote:I do have a question though. and that is why are NVIDIA GPUs polled and not interrupt driven? Am I to believe that NVIDIA does not know how to build GPUs with interrupt capability? Or write their SDK to sleep for an interrupt? So whats the real story here...?
There has been much conjecture on why they chose to do it that way. As far as I know, nVidia has not made any statement as to the reason.
iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
-
- Posts: 2522
- Joined: Mon Feb 16, 2009 4:12 am
- Location: Greenwood MS USA
Re: Low GPU utilization
I do not have the answer, but I do have an opinion.
scm2000 wrote:I do have a question though. and that is why are NVIDIA GPUs polled and not interrupt driven? Am I to believe that NVIDIA does not know how to build GPUs with interrupt capability? Or write their SDK to sleep for an interrupt? So whats the real story here...?
CUDA is a proprietary interface that locks you into Nvidia cards forever.
OpenCL is an open standard interface that can run anywhere.
How can Nvidia make CUDA look wildly more attractive than OpenCL while still supporting open standards?
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
Re: Low GPU utilization
I have the CUDA SDK and 4 Nvidia GPUs, and I'm happily writing code for them. I'm not locked into anything... if I buy an AMD GPU, I'll use whatever SDK I need to program it.
JimboPalmer wrote:I do not have the answer, but I do have an opinion.
scm2000 wrote:I do have a question though. and that is why are NVIDIA GPUs polled and not interrupt driven? Am I to believe that NVIDIA does not know how to build GPUs with interrupt capability? Or write their SDK to sleep for an interrupt? So whats the real story here...?
CUDA is a proprietary interface that locks you into Nvidia cards forever.
OpenCL is an open standard interface that can run anywhere.
How can Nvidia make CUDA look wildly more attractive than OpenCL while still supporting open standards?