Multi-GPU PPD drop

knopflerbruce · Post by **knopflerbruce** » Sun Jan 20, 2019 5:30 pm

Has anyone tried to run more than two GPUs in their Pascal/Turing rigs using HW that won't make PCIe bandwidth a bottleneck? My mobos are getting old (LGA115x/X79/X99), and I was wondering if I might just as well look for some space-saving solutions, similar to those miners use, through some x8 PCIe risers. But if the PPD drop is substantial, 3-7 GPU setups aren't the way to go. Currently running two per rig.

Post by **bruce** » Sun Jan 20, 2019 8:10 pm

There's no Yes or No answer. It will depend on the speed of your GPU, the speed of your PCIe connection, and the particular project you have been assigned.

What defines a bottleneck vs. efficient use of the bus?

Data transfers across the PCIe bus happen in bursts. The duration of those bursts depends on the atom-count of the protein compared to the speed of transfer. The GPU will process the data for a length of time dependent on the speed of the GPU and the number of atoms. Sometimes the GPU may get far enough ahead of the data transfer process that the GPU will have to wait for new data to arrive. The %Utilization of the PCIe bus isn't a very good way of predicting whether the data burst overlaps with the GPU finishing processing the last data block, or how much overlap there is.
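To make the burst-versus-compute picture concrete, here is a toy model, not a measurement of any real FAH core. It assumes each step moves one burst whose size scales with atom count, that the burst overlaps with processing of the previous block, and that there is a fixed per-burst latency. The function name, parameters, and all numbers are hypothetical.

```python
def stall_per_step(atoms, bytes_per_atom, bus_bytes_per_s, gpu_atoms_per_s,
                   latency_s=0.0):
    """Seconds the GPU sits idle per step in a toy overlapped pipeline."""
    # Duration of one PCIe burst: fixed latency plus payload over bandwidth.
    transfer_s = latency_s + atoms * bytes_per_atom / bus_bytes_per_s
    # Time the GPU spends processing one data block.
    compute_s = atoms / gpu_atoms_per_s
    # The GPU only waits if the next burst takes longer than the current block.
    return max(0.0, transfer_s - compute_s)


# Hypothetical project: 100k atoms, 100 bytes per atom, GPU handles 50M atoms/s.
wide = stall_per_step(100_000, 100, 16e9, 50e6)   # roughly an x16 gen 3 link
narrow = stall_per_step(100_000, 100, 1e9, 50e6)  # roughly an x1 gen 3 link
print(f"wide link stall:   {wide * 1e3:.2f} ms per step")
print(f"narrow link stall: {narrow * 1e3:.2f} ms per step")
```

In this sketch the wide link never stalls while the narrow link does, even though the narrow link would show well under 100% average utilization if the step time were measured as a whole, which is the point about %Utilization being a poor predictor.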

As multiple GPUs are added, there may be times when the CPU cannot supply data immediately when the GPU asks the bus to transfer more data, but that should be very rare as long as there is a free CPU thread allocated for each GPU. (That's supposed to ensure a CPU thread is idle and available when data is needed.)

Risers that split a 16x slot into multiple 1x slots (or other numbers) can make the PCIe slot appear busy to one GPU that wants data while a different GPU on the same riser is receiving data.

In other words, contention for any resource only becomes a problem when the statistics allow that conflict to be big enough that you notice.

Rarely has anybody who tested their system also determined which resource was responsible for excessive contention. Instead they give a simple Y/N answer to a complex issue, like "PCIe 2.0 x8 is enough for most projects" (omitting their GPU speed or the atom-counts of the projects that were active when they ran the test).

Bottom line: Adding a 3rd GPU probably won't turn contention into a bottleneck unless it turns a small limitation your system already had into a big one, or unless you are forced to put it in a PCIe slot with too little bandwidth. Put the fastest GPU on the fastest slot and the slowest GPU on the slowest slot.
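The "fastest GPU on the fastest slot" rule can be sketched as a simple sort-and-pair. This is only an illustration: the function name is made up, the PPD figures are rough values quoted elsewhere in this thread, and lane count is used as a stand-in for slot speed (it ignores PCIe generation).

```python
def assign_slots(gpu_ppd, slot_lanes):
    """Pair GPUs with slots: fastest GPU gets the widest slot, and so on."""
    gpus = sorted(gpu_ppd, key=gpu_ppd.get, reverse=True)
    slots = sorted(slot_lanes, key=slot_lanes.get, reverse=True)
    # zip stops at the shorter list, so extra GPUs or slots are left unassigned.
    return dict(zip(gpus, slots))


# Illustrative numbers in k PPD; slot names are hypothetical.
gpus = {"GTX 1050": 133, "RTX 2070": 1100, "GTX 1070": 600}
slots = {"slot_a": 16, "slot_b": 4, "slot_c": 16}
for gpu, slot in assign_slots(gpus, slots).items():
    print(f"{gpu} -> {slot} (x{slots[slot]})")
```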

foldy · Post by **foldy** » Sun Jan 20, 2019 8:16 pm

On Windows, PCIe 3.0 x4 is the minimum for fast GPUs to avoid too much of a bottleneck. On Linux it doesn't matter much; even x1 risers work with only a slight bottleneck.

ProDigit · Post by **ProDigit** » Sun Jan 20, 2019 8:25 pm

I'm running a GTX 1050 on a PCIE 1x bus, and with a little overclock it does 104k PPD, vs 133k on the PCIE 16x slot.
Which means an AMD Radeon RX 550 is about as fast as you can go on a PCIE 1x slot; or in theory you could run 2x GT 1030s from a 1x slot.

With every added card, there will be a performance loss per card, regardless of how fast the bus is (even PCIE 5.0 slots will lose some performance running 2 cards vs 1 card, due to the 'collision of data' that occasionally happens on these buses, which are meant to run only 1 card).

That being said, by the end of the month, a new mobo will arrive, and I have purchased a PCIE 1x splitter (1 to 4 cards), and will run it from Linux, to see how effective it is (provided the hardware I've ordered is compatible and working).

So far, the only PCIE 16x splitters I've seen online are cards, and ridiculously expensive.
The only affordable options are the 1x cards.

foldy · Post by **foldy** » Mon Jan 21, 2019 12:01 pm

I'm not sure whether you lose the same 20% performance on Linux with either a GTX 1050 or an RTX 2080 on PCIe x1, because both wait for CPU/PCIe data.
But the RTX 2080 would still produce 80% of its 1400k PPD, which is about 1100k PPD, while the GTX 1050 gets 104k PPD?
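The arithmetic behind those figures, as a quick sketch. It uses only the numbers quoted in this thread (in k PPD). Applying the same 80% factor to the RTX 2080 is the open assumption being questioned, not a measured result, and the helper names are made up.

```python
def retained_fraction(ppd_narrow, ppd_wide):
    """Fraction of full-bandwidth PPD that survives on the narrower link."""
    return ppd_narrow / ppd_wide


def projected_ppd(ppd_wide, fraction):
    """PPD expected on the narrow link IF the same fraction applied."""
    return ppd_wide * fraction


gtx1050 = retained_fraction(104, 133)   # GTX 1050: x1 vs x16
rtx2080_x1 = projected_ppd(1400, 0.80)  # assumes the same ~20% loss
print(f"GTX 1050 keeps {gtx1050:.0%} of its PPD on x1")
print(f"RTX 2080 at 80%: {rtx2080_x1:.0f}k PPD")
```

So even with a 20% penalty, the faster card on x1 would still be worth roughly ten times the slower one, if the penalty really is the same for both.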

Holdolin · Post by **Holdolin** » Mon Jan 21, 2019 4:00 pm

My newest folding rig has 3 RTX 2070s on an old Asus P6T motherboard (PCIe 2.0 in 16/16/4 mode) and all 3 cards are pushing the 1.1M PPD generally seen from that card, so no, I'm seeing no drop-off for multiple GPUs.

Post by **Joe_H** » Mon Jan 21, 2019 4:11 pm

Holdolin wrote: My newest folding rig has 3 RTX 2070s on an old Asus P6T motherboard (PCIe 2.0 in 16/16/4 mode) and all 3 cards are pushing the 1.1M PPD generally seen from that card, so no, I'm seeing no drop-off for multiple GPUs.

Running Windows or Linux? That matters.

In any case, there is a long topic that discusses this issue - viewtopic.php?f=38&t=28847 - as it is ultimately related to PCIe bandwidth available to a GPU.

Holdolin · Post by **Holdolin** » Mon Jan 21, 2019 5:15 pm

Joe_H wrote:
Holdolin wrote: My newest folding rig has 3 RTX 2070s on an old Asus P6T motherboard (PCIe 2.0 in 16/16/4 mode) and all 3 cards are pushing the 1.1M PPD generally seen from that card, so no, I'm seeing no drop-off for multiple GPUs.
Running Windows or Linux? That matters.

In any case, there is a long topic that discusses this issue - viewtopic.php?f=38&t=28847 - as it is ultimately related to PCIe bandwidth available to a GPU.

Linux - because it matters

knopflerbruce · Post by **knopflerbruce** » Sat Feb 02, 2019 1:06 am

I completely forgot about this thread. I was thinking about running some cards in WS mobos, with x8 Gen 3 bandwidth. I recall there was a PPD drop even in high bandwidth scenarios before, when using multiple GPUs.

gordonbb · Post by **gordonbb** » Sat Feb 02, 2019 7:47 am

knopflerbruce wrote:I completely forgot about this thread. I was thinking about running some cards in WS mobos, with x8 Gen 3 bandwidth. I recall there was a PPD drop even in high bandwidth scenarios before, when using multiple GPUs.

As long as you have a thread or core dedicated to each GPU (if they’re NVidia) and you have adequate airflow, PCIe3 x8 should be fine even under Windows, and you shouldn’t see any reduction in PPD.

I’m running a 1070 Ti and a 2070 on an LGA1151 board with a 3.9GHz Pentium G5500 on PCIe3 x8 on Linux, and I see no reduction in PPD compared to running them on PCIe3 x16.

HaloJones · Post by **HaloJones** » Thu Jun 20, 2019 1:20 pm

To resurrect an old thread, I have just built a dual 1070 rig running Windows and each card is struggling to do 600K, when in its previous iteration a single 1070 was getting 850K. However, the very high PPD was using 11733 and the beta Core_22. Those units appear scarce right now, so this new rig is running various 14xxx units. Is Windows with two cards always this bad, or is it Windows+Core_21 that is this bad?

(extra info, cpu is 2600K no CPU client; cards are water-cooled and <45C; cards are running at 2GHz, CPU at 4GHz, both GPU slots report PCIE gen 2x16)

foldy · Post by **foldy** » Thu Jun 20, 2019 2:13 pm

Windows+Core_21 is this bad. And with a 2600K CPU you run the 2 PCIe slots at gen 2 x8, which equals gen 3 x4 and is also the lower limit. If you run Linux instead (e.g. as dual boot if you need Windows too), it will give more PPD. On Linux you could even run 4 GPUs at PCIe gen 2 x4 if feasible.
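For anyone wanting to check the "gen 2 x8 = gen 3 x4" equivalence, here is a small sketch using the published per-lane signalling rates and line encodings (8b/10b for gen 1 and 2, 128b/130b for gen 3). It gives theoretical one-direction bandwidth only; real throughput is lower because of protocol overhead, and the function name is made up.

```python
# Per-lane signalling rate (GT/s) and line-encoding efficiency per PCIe generation.
PCIE = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
}


def bandwidth_gb_s(gen, lanes):
    """Theoretical one-direction bandwidth in GB/s for a PCIe link."""
    rate_gt_s, efficiency = PCIE[gen]
    # GT/s * efficiency = usable Gbit/s per lane; divide by 8 for GB/s.
    return rate_gt_s * efficiency / 8 * lanes


for gen, lanes in [(2, 16), (2, 8), (3, 4), (2, 4), (3, 1)]:
    print(f"gen {gen} x{lanes}: {bandwidth_gb_s(gen, lanes):.2f} GB/s")
```

Gen 2 x8 works out to 4.00 GB/s and gen 3 x4 to about 3.94 GB/s, so the two are equivalent for practical purposes.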

Nathan_P · Post by **Nathan_P** » Thu Jun 20, 2019 2:32 pm

@ HaloJones 600k for a 1070 has been the long reported norm under windows, anything over that is a bonus and/or being run on linux. My 1070 gets anything from 600k to 825k on linux.

As you said you were getting 850k with the new core which is to be expected, the new core is doing more science and the PPD should reflect that.

HaloJones · Post by **HaloJones** » Thu Jun 20, 2019 4:04 pm

@Nathan_P, another machine (i5-2400, P67 mobo, 1070 at the same clock, Windows 10) is getting 700K with these units, so I'm just wondering if the relative slowness of this dual 1070 rig could be down to the alleged latency built into Z77 boards that use a PLX chip to achieve 2 slots at x16. I'd never read about this inherent latency until today. I had a simpler Z77 board with these two 1070s but it died when the PSU blew. I replaced it with this Gigabyte Sniper G1 board that promised 2x x16, but now it may be that this comes at the cost of some latency on the PCIe channels. Hopefully when some Core22 units come back I can get a better picture of how much this improves the output.

Post by **bruce** » Thu Jun 20, 2019 4:15 pm

The core_22 beta is just that, a beta. It's not bug-free yet. It does crash periodically on many machines and in its present form will not average your reported 700K even though individual WU will. There's no prediction for what may happen when it is released ... except for the fact that it's currently doing more science and that's likely to continue.

Folding Forum
