
Re: PCI-e bandwidth/capacity limitations

Posted: Thu Dec 29, 2016 9:18 pm
by yalexey
Is it possible to organize some kind of queue in OpenCL and preload data, perhaps by processing two or more threads on a single GPU? The cost of a multi-GPU system depends substantially on the required bus bandwidth.
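
What I have in mind is something like the double-buffering pattern below: a generic pyopencl sketch with two command queues and an invented toy kernel, so the upload of the next data chunk overlaps the kernel running on the current one. This is only an illustration of the queue/preload idea, not anything the folding core actually does.

Code: Select all

# Hedged sketch only: generic pyopencl double buffering, not FAH code.
# Two command queues let the upload of chunk k+1 overlap the kernel on chunk k.
import numpy as np
import pyopencl as cl

KERNEL_SRC = """
__kernel void scale(__global const float *in, __global float *out, float f) {
    int i = get_global_id(0);
    out[i] = in[i] * f;
}
"""

ctx = cl.create_some_context()
xfer_q = cl.CommandQueue(ctx)   # host -> device transfers
comp_q = cl.CommandQueue(ctx)   # kernel execution + read-back
prog = cl.Program(ctx, KERNEL_SRC).build()

n = 1 << 20                                              # elements per chunk (made up)
chunks = [np.random.rand(n).astype(np.float32) for _ in range(8)]
mf = cl.mem_flags
in_bufs = [cl.Buffer(ctx, mf.READ_ONLY, chunks[0].nbytes) for _ in range(2)]
out_bufs = [cl.Buffer(ctx, mf.WRITE_ONLY, chunks[0].nbytes) for _ in range(2)]

# Preload the first chunk before the loop starts.
upload = cl.enqueue_copy(xfer_q, in_bufs[0], chunks[0], is_blocking=False)

for k, chunk in enumerate(chunks):
    slot = k % 2
    # Run the kernel on the chunk whose upload was already queued.
    run = prog.scale(comp_q, (n,), None, in_bufs[slot], out_bufs[slot],
                     np.float32(2.0), wait_for=[upload])
    # While it runs, push the next chunk over the bus into the other slot.
    if k + 1 < len(chunks):
        upload = cl.enqueue_copy(xfer_q, in_bufs[(k + 1) % 2], chunks[k + 1],
                                 is_blocking=False)
    # Blocking read-back; by the time it returns the kernel has finished,
    # so reusing this slot two iterations later is safe.
    result = np.empty_like(chunk)
    cl.enqueue_copy(comp_q, result, out_bufs[slot], wait_for=[run])

xfer_q.finish()
comp_q.finish()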

The other day I saw a description and photo of a 12-GPU Radeon system on a single Supermicro board. It really works for mining, but it is not suitable for these calculations because of the issues discussed in this thread.
https://i.gyazo.com/cc8ca224dd86317f4fc ... b89e36.jpg

EDIT by Mod:
Replaced a large image of that system with a link to that image.
(Images are prohibited, to save bandwidth.)

Re: PCI-e bandwidth/capacity limitations

Posted: Fri Jan 06, 2017 4:30 pm
by foldy
I understand that the bus usage limit on fast GPUs is a problem with pcie 3.0 x1 on Windows but not on Linux.

12 GPUs is a bit much, but 8 should be possible. If each GPU needs PCIe 3.0 x4 at a minimum, then with 8 GPUs you need 8 x 4 = 32 lanes. And for Nvidia GPUs you need one CPU core each to feed them. An Intel Core i7-6900K has 8 real cores and 40 PCIe lanes - that matches. But the CPU and mainboard are expensive.

Another alternative may be a mainboard with a PEX switch chip, where the PCIe lanes are allocated dynamically. This mainboard would then even offer 7 PCIe 3.0 x8 slots.
https://www.asus.com/de/Motherboards/X9 ... fications/

I don't know whether 16 GPUs at PCIe 3.0 x2 would be possible with this board using splitters.

Most users find it cheaper and easier to just build several dual-GPU systems.

Re: PCI-e bandwidth/capacity limitations

Posted: Fri Jan 06, 2017 7:58 pm
by Aurum
This MSI Z87-G45 GAMING motherboard (https://us.msi.com/Motherboard/Z87-G45- ... cification) has 3xPCIe 3.0 x16 slots with operating modes: x16x0x0, x8x8x0, or x8x4x4. So with only two cards it's running x8x8. I may be able to add an RX 480 in the third slot and test it.

Simultaneous FAHbench, January 6, 2017
CPU, Card, GPU, GDDR (GB), Brand, GPU Clock (MHz), Memory Clock (MHz), Shaders, Compute, Precision, WU, Accuracy Check, NaN Check, Run Length, Score, Scaled Score, Atoms, Mode
Intel Core i7 4771 @ 3.50GHz, RX470, Ellesmere, 4, ASUS, 1250, 1650, 2048, OpenCL, single, dhfr, enabled, disabled, 1000 s, 66.7551, 66.7551, 23558, in tandem
Intel Core i7 4771 @ 3.50GHz, RX470, Ellesmere, 4, MSI, 1230, 1650, 2048, OpenCL, single, dhfr, enabled, disabled, 1000 s, 66.5858, 66.5858, 23558, in tandem

Individual ASUS RX470 FAHbench, January 6, 2017
Intel Core i7 4771 @ 3.50GHz, RX470, Ellesmere, 4, ASUS, 1250, 1650, 2048, OpenCL, single, dhfr, enabled, disabled, 120 s, 66.1621, 66.1621, 23558, alone

Individual MSI RX470 FAHbench, January 6, 2017
Intel Core i7 4771 @ 3.50GHz, RX470, Ellesmere, 4, MSI, 1230, 1650, 2048, OpenCL, single, dhfr, enabled, disabled, 120 s, 65.2190, 65.2190, 23558, alone

Re: PCI-e bandwidth/capacity limitations

Posted: Fri Jan 06, 2017 8:00 pm
by 7im
foldy wrote:I understand that the bus usage limit on fast GPUs is a problem with pcie 3.0 x1 on Windows but not on Linux.
Understand this how, please?

Re: PCI-e bandwidth/capacity limitations

Posted: Fri Jan 06, 2017 8:04 pm
by Aurum
foldy wrote:Another alternative may be mainboards with PEX switch chip where the pcie lanes used dynamically. This mainboard then even would offers 7 pcie 3.0 x8.
https://www.asus.com/de/Motherboards/X9 ... fications/
I love the board, and look, it's only $510 :shock: :shock: :shock:

Amazon just told me they cancelled the motherboard they sold me because they don't actually have it, so I'm looking for another, hopefully under $200.

Re: PCI-e bandwidth/capacity limitations

Posted: Fri Jan 06, 2017 8:57 pm
by foldy
7im wrote:
foldy wrote:I understand that the bus usage limit on fast GPUs is a problem with pcie 3.0 x1 on Windows but not on Linux.
Understand this how, please?
I mean this is what I read in the forum threads.

Re: PCI-e bandwidth/capacity limitations

Posted: Sat Jan 07, 2017 2:17 am
by Aurum
Card, Score, note
GTX 1070 , 87.1, alone 16x slot
GTX 1070 , 54.9, alone 1x slot
GTX 980 Ti, 81.7, alone 16x slot
GTX 980 Ti, 41.1, alone 1x slot
GTX 1070 , 89.1, in tandem 16x slot
GTX 1070 , 49.0, in tandem 1x slot
GTX 980 Ti, 79.1, in tandem 16x slot
GTX 980 Ti, 39.7, in tandem 1x slot

Single precision, dhfr WU (23,558 atoms), Accuracy Check enabled, NaN Check 10 steps, Run Length 60 s alone or 120 s in tandem
Intel Core i3-4130T @ 2.9 GHz, Windows 7 64-bit, 8 GB RAM, 250 GB SATA-III SSD, Corsair AX1200
Nvidia ForceWare 376.48, FAH 7.4.15, FAHbench 2.2.5
EVGA GTX 1070, GP104, 8 GB, GPU Clock 1595 MHz, Memory 2002 MHz, 1920 shaders
EVGA GTX 980 Ti, GM200, 6 GB, GPU Clock 1102 MHz, Memory 1753 MHz, 2816 shaders
ASRock H81 Pro BTC: 1xPCIe 2.0 x16 + 5xPCIe 2.0 x1

Re: PCI-e bandwidth/capacity limitations

Posted: Sat Jan 07, 2017 9:43 am
by foldy
To summarize:
GTX 1070: x16 ≈ 90 ns/day, x1 ≈ 50 ns/day

GTX 980 Ti: x16 ≈ 80 ns/day, x1 ≈ 40 ns/day

This is up to 50% performance loss for fast GPUs on x1.

Did you measure this on Windows or Linux?

Can you also measure x4? Both GPUs in tandem, just one measurement.

Re: PCI-e bandwidth/capacity limitations

Posted: Sat Jan 07, 2017 3:13 pm
by Aurum
foldy wrote:Did you measure this on Windows or Linux?
I revised the post (see above) to make it easier to read and more complete. Win7-64.
foldy wrote:Can you also measure x4? Both in tandem only one measurement.
Not on this cheap MB, which is only on my bench while I build a frame to mount 1x HD 5830 + 5x HD 5970. I will on another MB.

So the Score is some timed event in nanoseconds??? I have yet to see documentation that explains what FAHbench is doing. E.g., the final score seems to be the last recorded value rather than an average of all measurements. This is a problem when running a tandem test, because one GPU always finishes first and the second-place GPU's score jumps up at the end.

What's the difference between DHFR (23,558 atoms) and DHFR-implicit (2,489 atoms)??? Single versus double precision???
NAV is small so I wonder if it's useful to run.

Re: PCI-e bandwidth/capacity limitations

Posted: Sat Jan 07, 2017 3:45 pm
by rwh202
Some numbers for FAH Bench on Linux Mint 17.3 using drivers 367.44.
GPU: EVGA GTX 1080 FTW
MB: MSI Z87-G41
CPU: Pentium G3258 @ 3.2 GHz

Code: Select all

x16 Gen3 (CPU)  1% bus usage. Score: 149.455 (100%)
x16 Gen2 (CPU)  2% bus usage. Score: 148.494 (99.4%)
x4  Gen2 (MB)   5% bus usage. Score: 135.417 (90.6%)
x1  Gen3 (CPU) 13% bus usage. Score: 143.917 (96.3%)
x1  Gen2 (CPU) 23% bus usage. Score: 137.669 (92.1%)
x1  Gen2 (MB)  17% bus usage. Score: 123.570 (82.6%)
So, PCIe bus does have an effect, but the connection (either via MB chipset or direct to CPU) seems to have a greater effect than the nominal link speed.

However, on Linux the performance drop-off appears to be smaller than what is being reported on Windows.

Re: PCI-e bandwidth/capacity limitations

Posted: Sat Jan 07, 2017 3:57 pm
by Aurum
rwh202, How do you control whether you route via MB chipset or direct to CPU??? How do you monitor bus usage, a Linux feature???

Re: PCI-e bandwidth/capacity limitations

Posted: Sat Jan 07, 2017 4:09 pm
by rwh202
Aurum wrote:rwh202, How do you control whether you route via MB chipset or direct to CPU??? How do you monitor bus usage, a Linux feature???
The slots on my motherboard are hard-wired to either the PCH or the CPU - I don't have any control over it. Some MBs have more options in the BIOS for how the lanes are allocated and shared between slots, but I just moved the card between slots and used an x1 riser to drop to x1.

Bus usage is reported by the driver to the nvidia x-server settings app in Linux and to the nvidia-smi interface - I think it's the same number reported by GPU-z and other utilities in Windows.
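
If you want to poll this from a script rather than the GUI, something like the sketch below works (the query field names are taken from nvidia-smi --help-query-gpu; note it reports the negotiated link generation/width and GPU load, not the bus usage percentage itself).

Code: Select all

# Hedged sketch: poll the negotiated PCIe link via nvidia-smi from a script.
# Shows current link gen/width and GPU load, not a bus-utilisation percentage.
import subprocess

fields = "index,name,pcie.link.gen.current,pcie.link.width.current,utilization.gpu"
out = subprocess.check_output(
    ["nvidia-smi", "--query-gpu=" + fields, "--format=csv,noheader"],
    universal_newlines=True)

for line in out.strip().splitlines():
    print(line)   # one CSV line per GPU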

Re: PCI-e bandwidth/capacity limitations

Posted: Sat Jan 07, 2017 4:50 pm
by Aurum
rwh202, so if you moved a single 1080 to different slots, does the second x16 slot (PCI_E4) run at x4 when used alone??? I see from the photo that MSI labels the first x16 slot (PCI_E2) as PCI-E3.0, but there is no label on PCI_E4, just a different lock style.
The MSI web page spec for your MB says:
• 1 x PCIe 3.0 x16 slot
• 1 x PCIe 2.0 x16 slot
- PCI_E4 supports up to PCIe 2.0 x4 speed
• 2 x PCIe 2.0 x1 slots
TIA, just trying to learn this stuff as I've never thought about it before and would like to get the most out of my multi-GPU rigs.
Thanks for the GPU-Z tip, I see the Sensors tab has some interesting monitors. While folding, my x1 slot with the 980 Ti shows a Bus Interface Load of 74%, and my PCIe 2.0 x16 slot with the 1070 shows 52%. It even tells me why GPU performance is capped.

Re: PCI-e bandwidth/capacity limitations

Posted: Sat Jan 07, 2017 6:06 pm
by foldy
You could also load a real work unit into FahBench. Just copy one from the FahClient work folder to FahBench\share\fahbench\workunits and rename accordingly.
On my GTX 970 on Windows 7 64-bit at PCIe 2.0 x8, the default FahBench dhfr shows 38% bus usage, while a real work unit in FahBench uses 60% bus usage, the same as in FahClient.
I always run FahBench with default settings, except for using a real work unit for the bus usage test.
http://www.filedropper.com/real_2
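
The copy step amounts to something like the sketch below. Both paths are examples only (the FAHClient work location and FAHBench install path vary), and you still have to rename the files to whatever FAHBench expects in its workunits folder.

Code: Select all

# Hedged sketch: example paths only; point src at the slot folder that actually
# holds the work unit, then rename the copied files as FAHBench expects.
import shutil
from pathlib import Path

src = Path(r"C:\ProgramData\FAHClient\work\01")                        # assumed
dst = Path(r"C:\Program Files\FAHBench\share\fahbench\workunits\real_wu")

shutil.copytree(str(src), str(dst))   # copy the whole work-unit folder
print("Copied", src, "->", dst)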

Re: PCI-e bandwidth/capacity limitations

Posted: Sat Jan 07, 2017 6:23 pm
by foldy
So on Linux with a GTX 1080, going from gen3 x16 to gen3 x1 you lose only about 4%, and another 4% going down to gen2 x1.
But on your particular mainboard you lose another 10% when connecting through the MB chipset instead of the CPU.