bruce wrote:The OpenCL driver for Linux is significantly different from the Windows OpenCL driver. Each GPU uses at most one CPU thread. Many projects do significantly better if you have a faster CPU. The GPU can't get data fast enough to stay busy ...
And how much more CPU does one need to run 2 GPUs (with no CPU folding) than an FX-8320 Eight-Core Processor running at 3500 MHz?
ComputerGenie wrote:And how much more CPU does one need to run 2 GPUs (with no CPU folding) than an FX-8320 Eight-Core Processor running at 3500 MHz?
Unfortunately, having 8 cores doesn't matter. The drivers are not multi-threaded, so each GPU uses one CPU core. I'll bet you could create a slot that uses 6 CPUs without changing the GPU performance ... unless you already run some "heavy" applications that continuously use some of your CPU threads.
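For reference, the kind of slot split bruce describes lives in FAHClient's config.xml. A minimal sketch, assuming the usual v7 slot syntax (the slot ids and comments are illustrative):

    <config>
      <!-- GPU slot; the OpenCL driver still ties up roughly one CPU thread -->
      <slot id='0' type='GPU'/>
      <!-- CPU slot capped at 6 threads, leaving the rest free to feed the GPU(s) -->
      <slot id='1' type='CPU'>
        <cpus v='6'/>
      </slot>
    </config>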
OK, so to re-ask:
And how much more CPU does one need to run two GTX 1080 GPUs (with no CPU folding) than a CPU running at 3500 MHz?
Ignore the clock speed of the CPU; it is a relatively meaningless measure of performance. Clock speed is only useful for comparing processors from the same family. As a design that dates back nearly 5 years, the FX-8320 is probably fast enough in most cases, but its ability to transfer data between the CPU and a GPU card in a PCIe slot also depends on the chipset that connects them.
Joe_H wrote:... its ability to transfer data between the CPU and a GPU card in a PCIe slot also depends on the chipset that connects them.
M5A99FX PRO R2.0:
Chipset - AMD 990FX/SB950
System Bus - Up to 5.2 GT/s HyperTransport™ 3.0
Still not sure how that could/would have a massive effect on one given RCG (Run/Clone/Gen) and not another RCG in the same project.
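For rough scale, a back-of-envelope on those links, with nominal numbers only (this assumes the 990FX's dual PCIe 2.0 x16 slots, which hang off the northbridge rather than the CPU):

    # Back-of-envelope link bandwidth; nominal numbers, not measurements.
    ht_gtps = 5.2              # HyperTransport 3.0 link speed from the board specs, GT/s
    ht_bytes_per_transfer = 2  # 16-bit link, per direction
    ht_gbs = ht_gtps * ht_bytes_per_transfer       # ~10.4 GB/s each way

    pcie2_lane_mbs = 500       # PCIe 2.0: 5 GT/s with 8b/10b coding -> 500 MB/s per lane
    pcie2_x16_gbs = 16 * pcie2_lane_mbs / 1000     # ~8.0 GB/s each way

    print(f"HT 3.0 to CPU: ~{ht_gbs:.1f} GB/s; one PCIe 2.0 x16 slot: ~{pcie2_x16_gbs:.1f} GB/s")

Two busy x16 cards could, in principle, ask for more than the single HyperTransport link to the CPU can carry, which is one way the chipset (rather than the CPU clock) can become the limit.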
It could be dependent on the atom count ... in fact, that's pretty likely. It also depends a lot on how NVidia wrote their OpenCL driver. A certain amount of the OpenCL work is done on the CPU, preparing data to be transferred to the GPU. Add the time for the chipset to transfer the data to the GPU, and the result is less-than-ideal performance. As I've said several times, PG is aware of the problem and working toward an improvement, but they're still dependent on whatever drivers NVidia distributes.
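To illustrate that division of labor, here is a toy model of a single work step (all timings invented): the CPU prepares data, the chipset moves it, then the GPU computes.

    # Toy model of one GPU work step: CPU prep, then transfer, then compute.
    # All timings below are invented for illustration.
    def shader_busy(cpu_prep_ms, transfer_ms, compute_ms):
        """Fraction of wall time the shaders spend computing, fully serial case."""
        return compute_ms / (cpu_prep_ms + transfer_ms + compute_ms)

    # A faster GPU shrinks only the compute term, so the fixed CPU/chipset
    # overhead eats a larger share of each step:
    for compute_ms in (10.0, 5.0, 2.5):   # smaller = faster card
        print(f"compute {compute_ms:4.1f} ms -> shaders ~{shader_busy(1.0, 1.0, compute_ms):.0%} busy")

The fixed prep-and-transfer overhead is why the same project can keep a slower card fully fed while leaving a faster one starved.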
Any update on this? This project seems to have spread out. Currently all nine of my GPUs are running it at the same time, from 980s to 1080 Tis, all with the same impact. Ubuntu 14.04/16.04, NVIDIA drivers 370.28 to 381.22, so a mix.
When there are many projects that are designed to work with your hardware, you'll get a variety of assignments. If some of those projects happen to be off-line, the frequency of assignments from the remaining projects will increase. If it happens that only a few projects (or even only one) are available at the time, it's conceivable that all your systems will be running the same project.
Rest assured that every WU is unique. FAH doesn't reassign the same WU multiple times -- except after somebody fails to return results by the deadline.
QuintLeo wrote:I'm starting to wonder if this project has memory latency issues, perhaps due to its size, given it also doesn't seem to run well on the GTX 1080.
It's the same for me: I've been folding nothing but project 10496 on my GTX 1080 Ti for the last two days. Anyway, this is my personal theory: what if users block this project to fold more "profitable" WUs? I read something similar a while ago in another thread: they just blocked incoming data from a particular server that had been offline for several days.
QuintLeo wrote:GTX 1080 Ti also runs low PPD on this, commonly less than 800k PPD (vs 1 million PLUS for everything else my 1080 Ti cards have seen to date).
Is there a possibility your GPU is throttling due to heat buildup? I have three 1080 Tis, all of which typically process a 10496 work unit at about 1M PPD. I have overclocked the GPUs, but only with a minimal 100 MHz core boost. They stay relatively cool, typically at about 65 to 70°C core temp.
Keep in mind also that a 1080 Ti working a 10496 WU pumps a lot of data through the PCIe bus. If there are multiple video cards folding on the same motherboard, that can very easily saturate the bus.
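One way to separate the two explanations is to log temperature, SM clock, and utilization while a 10496 WU runs. A minimal sketch (assumes a Linux box with nvidia-smi on the PATH; the query properties are standard nvidia-smi fields):

    import subprocess
    import time

    # Poll temperature, SM clock, and GPU utilization once a second.
    FIELDS = "temperature.gpu,clocks.sm,utilization.gpu"

    while True:
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=" + FIELDS,
             "--format=csv,noheader,nounits"], text=True)
        for idx, line in enumerate(out.strip().splitlines()):
            temp, clk, util = (field.strip() for field in line.split(","))
            print(f"GPU{idx}: {temp} C, {clk} MHz, {util}% busy")
        time.sleep(1)

A sinking SM clock alongside high temperature points at throttling; steady clocks with chronically low utilization point at the bus or CPU instead.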
Unfortunately not - and this project is very bad on a 1080 (the same PPD as a 1070, to a hair less) and VERY bad on a 1080 Ti (it gives about 10% more PPD than a 1070). 10494 is very similar.
Not nearly as bad on the 1080 Ti as 9415 and 9414, though, where a 1080 Ti gets A LOT WORSE PPD than a 1070 - literally about 60% of it, based on the ones I've accumulated in HFM so far.
The performance is SO CRAZY BAD on my 1080 Ti cards that I'm seriously considering blocking the work server on that machine - it's ridiculous to WASTE top-end folding cards on a work unit that performs so much BETTER on MUCH WORSE hardware.
I think it's fair to assume that the GPU is either waiting for data to process or computing something. In other words, it's either waiting on the PCIe bus or busy in the shaders. If the shaders are 80% busy, that means that 20% of the time the GPU is waiting on the PCIe bus to give it data to work on -- and the PCIe bus is either moving data or waiting on the CPU to prepare that data. (In fact, those numbers can add up to more than 100% because it's possible to transfer data concurrently with computing.)
Given your ability to measure the %Busy for the shaders, the %Busy for the bus, and the %Busy for the FAHCore, give us your estimate of what's going on and why it's doing whatever it's doing.
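To make that accounting concrete, with invented numbers:

    # bruce's accounting, with invented example numbers.
    shader_busy = 0.80   # fraction of time the shaders are computing
    bus_busy = 0.35      # fraction of time the PCIe bus is moving data

    # Fully serial, these could sum to at most 1.0; any excess is overlap:
    overlap = max(0.0, shader_busy + bus_busy - 1.0)
    print(f"implied compute/transfer overlap: {overlap:.0%}")

    # The shaders' idle share is time spent waiting on the bus or the CPU:
    print(f"shaders idle (waiting on bus/CPU): {1.0 - shader_busy:.0%}")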