GPU on PCIE 1x possible?

A forum for discussing FAH-related hardware choices and info on actual products (not speculation).

MeeLee
Posts: 1339
Joined: Tue Feb 19, 2019 10:16 pm

Re: GPU on PCIE 1x possible?

Post by MeeLee »

Running Core 22 over PCIE x1 mostly affects the fastest GPUs (RTX 2060 and above) negatively.
With the old Core 21, it was perfectly possible to run even an RTX 2070 on a PCIE 3.0 x1 slot (in Linux) without much PPD loss.
The newer core actually lowers PPD versus the old core on an x1 slot.
Which is why you'd want to reserve x1 slots for your slowest GPUs.
DF1
Posts: 4
Joined: Sun May 03, 2020 1:02 am

Re: GPU on PCIE 1x possible?

Post by DF1 »

Nuitari - I think I have started the same journey as you have traversed. Can you please share your Config file?
I'm walking before I run, so only two GPUs at the moment on an ASUS Mining Expert motherboard capable of 19 slots. What I am seeing: both GPUs detected, CUDA and OpenCL enabled, etc. For some reason both folds are assigned to the first GPU per NVIDIA-SMI; nothing is assigned to the second GPU. FAH reports both are folding, however. I'm guessing my config is off.
I will note that per NVIDIA's utilization graph the first GPU is pinned at 99% - no dips - so it's not data-starved by the single PCI-E lane. Memory utilization on the GPU is minimal - well below half a gigabyte on a 5 GB card per work unit.
My assumption is the CPU in this scenario should not fold and should simply be an I/O processor for the GPUs. I have a lame Celeron in there because that was all I thought the rig needed. (I built the rig for this and number crunching - it has never mined and never will.)

Please forgive the newbie questions.
And one more - is there a "test" fold that can be run for benchmarking? At 99% I am getting this puppy very hot - I want something I can load that will generate a metric I can evaluate as I tweak parameters, i.e., perhaps downshifting the GPU clock will reduce the stress on the card. It will take longer, but that is relative.

Many thanks
foldy
Posts: 2040
Joined: Sat Dec 01, 2012 3:43 pm
Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441

Re: GPU on PCIE 1x possible?

Post by foldy »

Each GPU needs one CPU thread to feed it.

https://fahbench.github.io/

FAHBench built with Core22 for Windows and a real work unit:
https://www.file-upload.net/download-13 ... n.zip.html
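For reference, the one-CPU-thread-per-GPU point can be reflected in config.xml by removing (or capping) the CPU folding slot so those threads stay free to feed the GPU slots. This is a hedged sketch only; the slot ids and the cpus value are illustrative, not taken from any machine in this thread:

```xml
<config>
  <!-- Two GPU slots; on a 2-thread CPU, both threads are left free
       to feed them, so no CPU folding slot is configured at all. -->
  <slot id='0' type='GPU'>
    <gpu-index v='0'/>
  </slot>
  <slot id='1' type='GPU'>
    <gpu-index v='1'/>
  </slot>
  <!-- On a CPU with spare threads, a capped CPU slot could be added
       instead, e.g. <slot id='2' type='CPU'><cpus v='2'/></slot> -->
</config>
```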
Last edited by foldy on Mon May 04, 2020 8:40 pm, edited 1 time in total.
Joe_H
Site Admin
Posts: 7929
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: GPU on PCIE 1x possible?

Post by Joe_H »

For a test load, FAHBench uses data from an old GPU WU, and the code to execute it is derived from the same code used in the Core_21 folding core. There is also an unofficial build using the updated code from Core_22.

You mention that your motherboard could support up to 19 slots; however, the current folding client is only capable of 10 slots, the single digits 0-9.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
HaloJones
Posts: 906
Joined: Thu Jul 24, 2008 10:16 am

Re: GPU on PCIE 1x possible?

Post by HaloJones »

DF1 wrote:...What I am seeing is the two GPU's / CUDA and OPENCL enabled etc. For some reason the two folds are assigned to the first GPU per NVIDIA-SMI. Nothing assigned per NVIDIA-SMI to the second GPU. FAH reports both are folding however. I'm guessing my config is off...
If you could post the config of your system we may be able to help. With more than one card, it is critical to get the drivers, OpenCL, and the configuration of the GPU slots all lined up.
single 1070

Nuitari
Posts: 78
Joined: Sun Jun 09, 2019 4:03 am
Hardware configuration: 1x Nvidia 1050ti
1x Nvidia 1660Super
1x Nvidia GTX 660
1x Nvidia 1060 3gb
1x AMD rx570
2x AMD rx560
1x AMD Ryzen 7 PRO 1700
1x AMD Ryzen 7 3700X
1x AMD Phenom II
1x AMD A8-9600
1x Intel i5-4590S

Re: GPU on PCIE 1x possible?

Post by Nuitari »

@DF1, I wouldn't go above 2 GPUs per CPU core; starvation happens quickly and you start losing performance. This is where your bottleneck is going to be.

I had to stop CPU (AMD A8-9600 RADEON R7) folding on my 7 GPU rig as it was cratering the productivity of the GPUs.

It is entirely possible to have multiple loads on one GPU, and I do not know whether the client tries to prevent it.

My configuration, for the slots:

Code: Select all

  <slot id='1' type='GPU'>
    <gpu-index v='0'/>
  </slot>
  <slot id='3' type='GPU'>
    <gpu-index v='1'/>
  </slot>
  <slot id='2' type='GPU'>
    <gpu-index v='6'/>
  </slot>
  <slot id='4' type='GPU'>
    <gpu-index v='2'/>
  </slot>
  <slot id='5' type='GPU'>
    <gpu-index v='3'/>
  </slot>
  <slot id='7' type='GPU'>
    <gpu-index v='5'/>
  </slot>
  <slot id='6' type='GPU'>
    <gpu-index v='4'/>
  </slot>
It would be helpful to see your configuration.
DF1
Posts: 4
Joined: Sun May 03, 2020 1:02 am

Re: GPU on PCIE 1x possible?

Post by DF1 »

Thank you Nuitari. You have saved me quite a lot of time.

DF1
DF1
Posts: 4
Joined: Sun May 03, 2020 1:02 am

Re: GPU on PCIE 1x possible?

Post by DF1 »

Thanks for the help.
Here's the situation from multiple perspectives:
1) LOG file

Code: Select all

*********************** Log Started 2020-05-06T02:44:45Z ***********************
02:44:45:****************************** FAHClient ******************************
02:44:45:        Version: 7.6.9
02:44:45:         Author: Joseph Coffland <[email protected]>
02:44:45:      Copyright: 2020 foldingathome.org
02:44:45:       Homepage: https://foldingathome.org/
02:44:45:           Date: Apr 17 2020
02:44:45:           Time: 11:13:06
02:44:45:       Revision: 398c2b17fa535e0cc6c9d10856b2154c32771646
02:44:45:         Branch: master
02:44:45:       Compiler: Visual C++ 2008
02:44:45:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
02:44:45:       Platform: win32 10
02:44:45:           Bits: 32
02:44:45:           Mode: Release
02:44:45:         Config: C:\Users\sysadmin\AppData\Roaming\FAHClient\config.xml
02:44:45:******************************** CBang ********************************
02:44:45:           Date: Apr 17 2020
02:44:45:           Time: 11:10:09
02:44:45:       Revision: 2fb0be7809c5e45287a122ca5fbc15b5ae859a3b
02:44:45:         Branch: master
02:44:45:       Compiler: Visual C++ 2008
02:44:45:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
02:44:45:       Platform: win32 10
02:44:45:           Bits: 32
02:44:45:           Mode: Release
02:44:45:******************************* System ********************************
02:44:45:            CPU: Intel(R) Celeron(R) CPU G3900 @ 2.80GHz
02:44:45:         CPU ID: GenuineIntel Family 6 Model 94 Stepping 3
02:44:45:           CPUs: 2
02:44:45:         Memory: 15.89GiB
02:44:45:    Free Memory: 14.47GiB
02:44:45:        Threads: WINDOWS_THREADS
02:44:45:     OS Version: 6.1
02:44:45:    Has Battery: false
02:44:45:     On Battery: false
02:44:45:     UTC Offset: -7
02:44:45:            PID: 2748
02:44:45:            CWD: C:\Users\sysadmin\AppData\Roaming\FAHClient
02:44:45:             OS: Windows 7 Professional
02:44:45:        OS Arch: AMD64
02:44:45:           GPUs: 2
02:44:45:          GPU 0: Bus:11 Slot:0 Func:0 NVIDIA:3 GK110 [Tesla K20M]
02:44:45:          GPU 1: Bus:13 Slot:0 Func:0 NVIDIA:3 GK110 [Tesla K20M]
02:44:45:  CUDA Device 0: Platform:0 Device:0 Bus:11 Slot:0 Compute:3.5 Driver:6.5
02:44:45:  CUDA Device 1: Platform:0 Device:1 Bus:13 Slot:0 Compute:3.5 Driver:6.5
02:44:45:OpenCL Device 0: Platform:0 Device:0 Bus:NA Slot:NA Compute:1.2 Driver:21.20
02:44:45:OpenCL Device 2: Platform:1 Device:0 Bus:11 Slot:0 Compute:1.1 Driver:342.0
02:44:45:OpenCL Device 3: Platform:1 Device:1 Bus:13 Slot:0 Compute:1.1 Driver:342.0
02:44:45:  Win32 Service: false
02:44:45:******************************* libFAH ********************************
02:44:45:           Date: Apr 15 2020
02:44:45:           Time: 14:53:14
02:44:45:       Revision: 216968bc7025029c841ed6e36e81a03a316890d3
02:44:45:         Branch: master
02:44:45:       Compiler: Visual C++ 2008
02:44:45:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
02:44:45:       Platform: win32 10
02:44:45:           Bits: 32
02:44:45:           Mode: Release
02:44:45:***********************************************************************
02:44:45:<config>
02:44:45:  <!-- Network -->
02:44:45:  <proxy v=':8080'/>
02:44:45:
02:44:45:  <!-- Slot Control -->
02:44:45:  <power v='full'/>
02:44:45:
02:44:45:  <!-- User Information -->
//REMOVED 
02:44:45:
02:44:45:  <!-- Folding Slots -->
02:44:45:  <slot id='0' type='CPU'>
02:44:45:    <paused v='true'/>
02:44:45:  </slot>
02:44:45:  <slot id='1' type='GPU'>
02:44:45:    <cuda-index v='0'/>
02:44:45:    <gpu-index v='0'/>
02:44:45:    <opencl-index v='0'/>
02:44:45:    <paused v='true'/>
02:44:45:  </slot>
02:44:45:  <slot id='2' type='GPU'>
02:44:45:    <cuda-index v='0'/>
02:44:45:    <gpu-index v='1'/>
02:44:45:    <opencl-index v='0'/>
02:44:45:    <paused v='true'/>
02:44:45:  </slot>
02:44:45:</config>
02:44:45:Trying to access database...
02:44:45:Successfully acquired database lock
02:44:45:Enabled folding slot 00: PAUSED cpu:1 (by user)
02:44:45:Enabled folding slot 01: PAUSED gpu:0:GK110 [Tesla K20M] (by user)
02:44:45:Enabled folding slot 02: PAUSED gpu:1:GK110 [Tesla K20M] (by user)
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 
2) Config file

Code: Select all

<config>
  <!-- Network -->
  <proxy v=':8080'/>

  <!-- Slot Control -->
  <power v='full'/>

  <!-- User Information -->
//REMOVED
  <!-- Folding Slots -->
  <slot id='0' type='CPU'>
    <paused v='true'/>
  </slot>
  <slot id='1' type='GPU'>
    <cuda-index v='0'/>
    <gpu-index v='0'/>
    <opencl-index v='0'/>
    <paused v='true'/>
  </slot>
  <slot id='2' type='GPU'>
    <cuda-index v='0'/>
    <gpu-index v='1'/>
    <opencl-index v='0'/>
    <paused v='true'/>
  </slot>
</config>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
3) NVIDIA-SMI (possibly from a different run, but it is characteristic. Note the two processes on GPU 0 and nothing on GPU 1.)
/* Even at 99% it's not pegging the wattmeter */

Code: Select all

PS C:\Program Files\NVIDIA Corporation\NVSMI> .\nvidia-smi.exe
Sat May 02 17:31:28 2020
+------------------------------------------------------+
| NVIDIA-SMI 342.00     Driver Version: 342.00         |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20m          TCC  | 0000:0B:00.0     Off |                    0 |
| N/A   49C    P0    93W / 225W |    523MiB /  4799MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K20m          TCC  | 0000:0D:00.0     Off |                    0 |
| N/A   27C    P8    13W / 225W |     47MiB /  4799MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0      4956  ...ome.org\v7\win\64bit\Core_22.fah\FahCore_22.exe   198MiB |
|    0      4976  ...ome.org\v7\win\64bit\Core_22.fah\FahCore_22.exe   274MiB |
+-----------------------------------------------------------------------------+

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>..

4) NVIDIA's GPU utilization meter shows the interfaces barely loaded. Plenty of memory available. No surprise.

EOF
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud
Contact:

Re: GPU on PCIE 1x possible?

Post by PantherX »

Welcome to the F@H Forum DF1,

The issue is with how you configured your GPU Slots. Give this a shot:

Code: Select all

  <slot id='1' type='GPU'>
    <cuda-index v='0'/>
    <gpu-index v='0'/>
    <opencl-index v='0'/>
    <paused v='true'/>
  </slot>
  <slot id='2' type='GPU'>
    <cuda-index v='1'/>
    <gpu-index v='1'/>
    <opencl-index v='1'/>
    <paused v='true'/>
  </slot>
I would also suggest that you remove the CPU slot, since you have 2 CPU threads and 2 GPUs, and each Nvidia GPU requires a CPU thread to feed it.
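A quick way to verify that each work unit actually lands on its own card is to count compute processes per bus id in nvidia-smi's CSV query output. A sketch in Python; the query fields are standard nvidia-smi options, but the helper itself is hypothetical:

```python
import csv
import io
from collections import Counter

def processes_per_gpu(csv_text):
    """Count compute processes per GPU bus id.

    csv_text is the output of:
      nvidia-smi --query-compute-apps=gpu_bus_id,pid,process_name,used_memory \
                 --format=csv,noheader
    """
    counts = Counter()
    for row in csv.reader(io.StringIO(csv_text)):
        if row:
            counts[row[0].strip()] += 1
    return dict(counts)

# Sample mirroring DF1's situation: both FahCore processes on one bus id.
sample = (
    "00000000:0B:00.0, 4956, FahCore_22.exe, 198 MiB\n"
    "00000000:0B:00.0, 4976, FahCore_22.exe, 274 MiB\n"
)
print(processes_per_gpu(sample))  # -> {'00000000:0B:00.0': 2}, i.e. both WUs on GPU 0
```

A correctly configured two-GPU rig should show one FahCore process per bus id.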
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
DF1
Posts: 4
Joined: Sun May 03, 2020 1:02 am

Re: GPU on PCIE 1x possible?

Post by DF1 »

Thanks PantherX (and all).
That was what I suspected. Is that 1:1 CPU core to GPU a hard rule or just optimal? The MOBO only supports up to a 4 core/8 thread i7-7700K best case. I'll upgrade if it makes a measurable difference.
Joe_H
Site Admin
Posts: 7929
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: GPU on PCIE 1x possible?

Post by Joe_H »

Depends a bit on what GPU that CPU core is feeding. If all were in the 2070/2080 class of GPU, it would be pretty close to a hard rule, to keep data flowing fast enough to and from the GPU to fully utilize it. People who have tested on lower-performance cards report that a couple can share a CPU core with some moderate impact on throughput.

So: optimal in all cases, and for high-end cards the difference will be very measurable.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud
Contact:

Re: GPU on PCIE 1x possible?

Post by PantherX »

DF1 wrote:...The MOBO only supports up to a 4 core/8 thread i7-7700K best case. I'll upgrade if it makes a measurable difference.
With 2 CPU threads consumed by the dual GPUs, that will leave 6. I am not sure what your definition of "measurable difference" is. If you mean that you're performing important scientific work, then yes, your CPU will be contributing to that. If you mean PPD measurable relative to the GPUs, it will not be, since GPUs are significantly more powerful than CPUs (though more limited in the kinds of calculations they can do).
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
MeeLee
Posts: 1339
Joined: Tue Feb 19, 2019 10:16 pm

Re: GPU on PCIE 1x possible?

Post by MeeLee »

It would be interesting to see whether other users running a modern Ryzen system with PCIE 4.0, and a PCIE 4.0-capable GPU (only AMD has those at the moment), could record their average PPD on both a full-size slot and a PCIE x1 slot with a riser.
PCIE 4.0 x1 should in theory have enough bandwidth to run Core 22 WUs on an RX 5700 (or on the RX 5500 XT, which is the only other consumer-grade GPU supporting PCIE 4.0 on the market right now).
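As a rough sanity check on that claim, per-lane bandwidth follows from the line rate and the 128b/130b encoding both generations use (approximate figures; real throughput is lower due to protocol overhead):

```python
def lane_bandwidth_gbps(gt_per_s, encoding=128 / 130):
    """Approximate one-direction bandwidth in GB/s for one PCIe lane."""
    return gt_per_s * encoding / 8  # 8 bits per byte

# PCIe 3.0 runs at 8 GT/s per lane, PCIe 4.0 at 16 GT/s, both 128b/130b.
pcie3_x1 = lane_bandwidth_gbps(8)   # ~0.98 GB/s
pcie4_x1 = lane_bandwidth_gbps(16)  # ~1.97 GB/s
print(f"PCIe 3.0 x1 = {pcie3_x1:.2f} GB/s, PCIe 4.0 x1 = {pcie4_x1:.2f} GB/s")
# A PCIe 4.0 x1 link therefore carries about as much as a PCIe 3.0 x2 link.
```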