Multi-GPU PPD drop
Moderator: Site Moderators
Forum rules
Please read the forum rules before posting.
-
- Posts: 31
- Joined: Tue Dec 01, 2009 1:31 pm
Multi-GPU PPD drop
Has anyone tried to run more than two GPUs in their Pascal/Turing rigs using HW that won't make PCIe bandwidth a bottleneck? My mobos are getting old (LGA115x/X79/X99), and I was wondering if I might just as well look for some space-saving solutions similar to those miners use, with some x8 PCIe risers. But if the PPD drop is substantial, 3-7 GPU setups aren't the way to go. Currently running two per rig.
Re: Multi-GPU PPD drop
There's no Yes or No answer. It will depend on the speed of your GPU, the speed of your PCIe connection, and the particular project you have been assigned.
What defines a bottleneck vs. efficient use of the bus?
Data transfers across the PCIe bus happen in bursts. The duration of each burst depends on the atom count of the protein relative to the transfer speed. The GPU then processes the data for a length of time that depends on the speed of the GPU and the number of atoms. Sometimes the GPU gets far enough ahead of the data transfer that it has to wait for new data to arrive. The percent utilization of the PCIe bus isn't a very good way of predicting whether a data burst overlaps with the GPU finishing the previous data block, or how much overlap there is.
As more GPUs are added, there may be times when the CPU cannot supply data immediately when a GPU asks for a transfer, but that should be very rare as long as a free CPU thread is allocated to each GPU. (That is supposed to guarantee the CPU is available when data is needed.)
Risers that split an x16 slot into multiple x1 slots (or other widths) can make a PCIe slot appear busy: while one GPU is receiving data, a second GPU that wants data has to wait.
In other words, contention for any resource only becomes a problem when the statistics allow that conflict to be big enough that you notice.
People who test their systems rarely determine which resource was responsible for excessive contention. Instead, they give a simple yes/no answer to a complex issue, such as "PCIe 2.0 x8 is enough for most projects", omitting their GPU speed and the atom counts of the projects that were active when they ran the test.
Bottom line: Adding a 3rd GPU probably won't turn the contention probability into a bottleneck unless it turns a small limitation your system already had into a big one, or unless you are forced to put the card in a PCIe slot with too little bandwidth. Put the fastest GPU in the fastest slot and the slowest GPU in the slowest slot.
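To put a rough number on the contention argument above, here is a toy sketch. It assumes each GPU occupies a shared link for a fixed fraction of the time (the `duty_cycle` parameter, a made-up illustrative value, not a measurement) and that the GPUs' bursts are independent of each other. Real FAH traffic is unlikely to be that tidy, so treat the result as an illustration of the trend, not a prediction of PPD loss.

```python
def contention_probability(n_gpus: int, duty_cycle: float) -> float:
    """Chance that a GPU asking for a transfer finds the shared link busy.

    Toy assumption: each of the other (n_gpus - 1) GPUs holds the link for
    a fraction `duty_cycle` of the time, independently of the others.
    """
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    # The link is free only if none of the other GPUs is transferring.
    return 1.0 - (1.0 - duty_cycle) ** (n_gpus - 1)


if __name__ == "__main__":
    # Hypothetical 5% duty cycle per GPU, for 2 to 7 GPUs on one shared link.
    for n in (2, 3, 4, 7):
        print(n, round(contention_probability(n, 0.05), 3))
```

With short bursts the probability stays small as GPUs are added, which matches the point that a 3rd GPU mostly hurts when some limitation is already close to the surface.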
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Posts: 2040
- Joined: Sat Dec 01, 2012 3:43 pm
- Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441
Re: Multi-GPU PPD drop
On Windows, PCIe 3.0 x4 is the minimum for fast GPUs to avoid a significant bottleneck. On Linux it doesn't matter much; even x1 risers work with only a slight bottleneck.
Re: Multi-GPU PPD drop
I'm running a GTX 1050 on a PCIe x1 slot, and with a little overclock it does 104k PPD, vs 133k in the PCIe x16 slot.
Which means an AMD Radeon RX 550 is about as fast as you can go on a PCIe x1 slot; or in theory you could run 2x GT 1030s from an x1 slot.
Every added card costs each card some performance, regardless of how fast the bus is (even PCIe 5.0 slots will lose some performance running 2 cards vs. 1 card; this is due to the 'collision of data' that occasionally happens on buses that are meant to drive only 1 card).
That being said, a new mobo will arrive by the end of the month, and I have purchased a PCIe x1 splitter (1 to 4 cards). I will run it under Linux to see how effective it is (provided the hardware I've ordered is compatible and working).
So far, the only PCIe x16 splitters I've seen online are cards, and ridiculously expensive.
The only affordable options are the x1 cards.
-
- Posts: 2040
- Joined: Sat Dec 01, 2012 3:43 pm
- Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441
Re: Multi-GPU PPD drop
I'm not sure whether you lose 20% performance on Linux with either a GTX 1050 or an RTX 2080 on PCIe x1, since both wait for CPU/PCIe data.
But the RTX 2080 would still produce 80% of its 1400k PPD, which is about 1100k PPD, while the GTX 1050 gets 104k PPD?
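The arithmetic behind that comparison can be checked with a short sketch. The figures are the ones quoted in this thread (104 vs 133 for the GTX 1050, and a 1400k PPD baseline for the RTX 2080); the function names are made up for illustration, and applying the same 80% figure to a much faster card is exactly the assumption being questioned, not an established result.

```python
def retained_fraction(ppd_narrow_slot: float, ppd_wide_slot: float) -> float:
    """Fraction of PPD kept when moving a card from the wide slot to the narrow one."""
    return ppd_narrow_slot / ppd_wide_slot


def projected_ppd(baseline_ppd: float, retained: float) -> float:
    """PPD expected if a card keeps the given fraction of its baseline."""
    return baseline_ppd * retained


if __name__ == "__main__":
    # GTX 1050 figures from the thread: x1 slot vs x16 slot (same units for both).
    kept = retained_fraction(104, 133)
    print(f"GTX 1050 keeps {kept:.1%} of its x16 PPD on x1")

    # Hypothetical: an RTX 2080 at 1400k PPD keeping a flat 80%.
    print(f"RTX 2080 at 80%: {projected_ppd(1400, 0.80):.0f}k PPD")
```

The GTX 1050 numbers work out to roughly 78% retained, so the "about 20% loss" reading is consistent with the figures posted; whether a faster card loses the same share on x1 is the open question.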
Re: Multi-GPU PPD drop
My newest folding rig has 3 RTX 2070s on an old Asus P6T motherboard (PCIe 2.0 in 16/16/4 mode), and all 3 cards are pushing the 1.1M PPD generally seen from that card, so no, I'm seeing no drop-off with multiple GPUs.
-
- Site Admin
- Posts: 7937
- Joined: Tue Apr 21, 2009 4:41 pm
- Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2 - Location: W. MA
Re: Multi-GPU PPD drop
Holdolin wrote: My newest folding rig has 3 RTX 2070s on an old Asus P6T motherboard (PCIe 2.0 in 16/16/4 mode), and all 3 cards are pushing the 1.1M PPD generally seen from that card, so no, I'm seeing no drop-off with multiple GPUs.
Running Windows or Linux? That matters.
In any case, there is a long topic that discusses this issue - viewtopic.php?f=38&t=28847 - as it is ultimately related to PCIe bandwidth available to a GPU.
iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Re: Multi-GPU PPD drop
Joe_H wrote: Running Windows or Linux? That matters.
Linux, because it matters.
-
- Posts: 31
- Joined: Tue Dec 01, 2009 1:31 pm
Re: Multi-GPU PPD drop
I completely forgot about this thread. I was thinking about running some cards in WS mobos, with x8 Gen 3 bandwidth. I recall there was a PPD drop even in high bandwidth scenarios before, when using multiple GPUs.
-
- Posts: 511
- Joined: Mon May 21, 2018 4:12 pm
- Hardware configuration: Ubuntu 22.04.2 LTS; NVidia 525.60.11; 2 x 4070ti; 4070; 4060ti; 3x 3080; 3070ti; 3070
- Location: Great White North
Re: Multi-GPU PPD drop
knopflerbruce wrote: I completely forgot about this thread. I was thinking about running some cards in WS mobos, with x8 Gen 3 bandwidth. I recall there was a PPD drop even in high bandwidth scenarios before, when using multiple GPUs.
As long as you have a thread or core dedicated to each GPU (if they're NVidia) and you have adequate airflow, PCIe 3.0 x8 should be fine even under Windows, and you shouldn't see any reduction in PPD.
I'm running a 1070 Ti and a 2070 on an LGA1151 board with a 3.9GHz Pentium G5500 on PCIe 3.0 x8 on Linux, and I see no reduction in PPD compared to running them on PCIe 3.0 x16.
Re: Multi-GPU PPD drop
To resurrect an old thread: I have just built a dual 1070 rig running Windows, and each card is struggling to do 600K, whereas in its previous iteration a single 1070 was getting 850K. However, that very high PPD was with 11733 and the beta Core_22. Those units appear scarce right now, so this new rig is running various 14xxx units. Is Windows with two cards always this bad, or is it Windows+Core_21 that is this bad?
(Extra info: CPU is a 2600K with no CPU client; cards are water-cooled and <45C; cards are running at 2GHz, CPU at 4GHz; both GPU slots report PCIe gen 2 x16.)
single 1070
-
- Posts: 2040
- Joined: Sat Dec 01, 2012 3:43 pm
- Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441
Re: Multi-GPU PPD drop
It's Windows+Core_21 that is this bad. Also, with a 2600K CPU you run 2 PCIe slots at gen 2 x8, which is equivalent to gen 3 x4 and is also the lower limit. If you run Linux instead (e.g. as dual boot if you need Windows too), it will give more PPD. On Linux you could even run 4 GPUs at PCIe gen 2 x4 if feasible.
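The "gen 2 x8 = gen 3 x4" equivalence can be checked from the published PCIe line rates and encodings (2.5 and 5 GT/s with 8b/10b for gen 1 and 2, 8 GT/s with 128b/130b for gen 3). The sketch below computes the theoretical one-direction payload bandwidth only; protocol overhead and driver behavior are ignored, so real throughput will be lower, and the helper name is made up for illustration.

```python
# Per-lane line rate (GT/s) and encoding efficiency for each PCIe generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),      # 8b/10b encoding
    2: (5.0, 8 / 10),      # 8b/10b encoding
    3: (8.0, 128 / 130),   # 128b/130b encoding
}


def link_bandwidth_gb_s(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s for a PCIe link."""
    rate_gt_s, efficiency = PCIE_GENERATIONS[gen]
    # GT/s * efficiency gives usable Gbit/s per lane; divide by 8 for bytes.
    return rate_gt_s * efficiency * lanes / 8


if __name__ == "__main__":
    for gen, lanes in ((2, 8), (3, 4), (2, 4), (3, 1)):
        print(f"gen {gen} x{lanes}: {link_bandwidth_gb_s(gen, lanes):.2f} GB/s")
```

Gen 2 x8 comes out at 4.0 GB/s and gen 3 x4 at about 3.94 GB/s, so treating them as equivalent is a fair approximation.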
-
- Posts: 1164
- Joined: Wed Apr 01, 2009 9:22 pm
- Hardware configuration: Asus Z8NA D6C, 2 [email protected] Ghz, , 12gb Ram, GTX 980ti, AX650 PSU, win 10 (daily use)
Asus Z87 WS, Xeon E3-1230L v3, 8gb ram, KFA GTX 1080, EVGA 750ti , AX760 PSU, Mint 18.2 OS
Not currently folding
Asus Z9PE- D8 WS, 2 [email protected] Ghz, 16Gb 1.35v Ram, Ubuntu (Fold only)
Asus Z9PA, 2 Ivy 12 core, 16gb Ram, H folding appliance (fold only) - Location: Jersey, Channel islands
Re: Multi-GPU PPD drop
@HaloJones: 600k for a 1070 has long been the reported norm under Windows; anything over that is a bonus and/or is being run on Linux. My 1070 gets anything from 600k to 825k on Linux.
As you said, you were getting 850k with the new core, which is to be expected: the new core is doing more science and the PPD should reflect that.
Re: Multi-GPU PPD drop
@Nathan_P, another machine (i5-2400, P67 mobo, 1070 at the same clock, Windows 10) is getting 700K with these units, so I'm just wondering if the relative slowness of this dual 1070 rig could be down to the alleged latency built into Z77 boards that use a PLX chip to achieve 2 slots at x16. I'd never read about this inherent latency until today. I had a simpler Z77 board with these two 1070s, but it died when the PSU blew. I replaced it with this Gigabyte Sniper G1 board that promised 2x x16, but it may be that this comes at the cost of some latency on the PCIe channels. Hopefully when some Core_22 units come back I can get a better picture of how much this improves the output.
single 1070
Re: Multi-GPU PPD drop
The Core_22 beta is just that, a beta. It's not bug-free yet. It does crash periodically on many machines, and in its present form it will not average your reported 700K even though individual WUs will. There's no prediction for what may happen when it is released, except for the fact that it's currently doing more science and that's likely to continue.
Posting FAH's log:
How to provide enough info to get helpful support.