Nathan_P wrote: If you haven't already, try Linux as the OS. That 20k is low for a pair of X5650s.
Nilem wrote: I didn't try; I actually thought Windows would be a bit faster. Do you have an estimate of how many more points I should get under Linux?
It's been a long time since I ran them, but under core A7 they were getting roughly 50k.
Dual X5650 giving me 20k PPD, why so slow?
Moderator: Site Moderators
Forum rules
Please read the forum rules before posting.
-
- Posts: 1164
- Joined: Wed Apr 01, 2009 9:22 pm
- Hardware configuration: Asus Z8NA D6C, 2 [email protected] Ghz, 12gb Ram, GTX 980ti, AX650 PSU, win 10 (daily use)
Asus Z87 WS, Xeon E3-1230L v3, 8gb ram, KFA GTX 1080, EVGA 750ti , AX760 PSU, Mint 18.2 OS
Not currently folding
Asus Z9PE- D8 WS, 2 [email protected] Ghz, 16Gb 1.35v Ram, Ubuntu (Fold only)
Asus Z9PA, 2 Ivy 12 core, 16gb Ram, H folding appliance (fold only)
- Location: Jersey, Channel islands
Re: Dual X5650 giving me 20k PPD, why so slow?
MeeLee wrote: Disabling HT is also a good way to save power.
On some CPUs, folding with hyperthreading is actually more efficient than without. For example, Chris found that on a 3950X, enabling hyperthreading and folding on max-2 threads was the most efficient way to CPU fold, at least on that processor: https://greenfoldingathome.com/2020/08/ ... threading/
Online: GTX 1660 Super + occasional CPU folding in the cold.
Offline: Radeon HD 7770, GTX 1050 Ti 4G OC, RX580
Re: Dual X5650 giving me 20k PPD, why so slow?
A lot depends on the specific project.
Suppose your WU uses 75% FP32 instructions, 3% FP64 instructions and 22% other instructions. Your FPU will be 78% busy.
Suppose another WU running on the other "half" of a hyperthreaded pair does the same. The shared FPU will be 100% busy (since it can't be (78+78)% busy), but you'll be getting more total work done: 100/78 ≈ 1.28, which is greater than 1.0.
OTOH, if it's not FAH running on the other "half" it may use as little as 0% FP instructions and much more than 22% other instructions so the competition for shared resources ceases to be a problem.
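To put numbers on that argument, here's a toy back-of-the-envelope calculation in Python (just the arithmetic above, not a measurement):

```python
# Toy model of the shared-FPU argument above, not a measurement: two FAH
# threads on one hyperthreaded core each want the shared FPU 78% of the time,
# but the single FPU can only ever be 100% busy.
fpu_demand_per_thread = 0.78          # 75% FP32 + 3% FP64 from the example WU

combined_demand = 2 * fpu_demand_per_thread          # 1.56, but capped at 1.0
combined_throughput = min(combined_demand, 1.0) / fpu_demand_per_thread

print(f"HT throughput vs. single thread: {combined_throughput:.2f}x")  # ~1.28x
```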
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Posts: 70
- Joined: Thu Jul 09, 2020 12:07 pm
- Hardware configuration: Dell T420, 2x Xeon E5-2470 v2, NetBSD 10, SunFire X2270 M2, 2x Xeon X5675, NetBSD 9; various other Linux/NetBSD PCs, Macs and virtual servers.
- Location: Germany
Re: Dual X5650 giving me 20k PPD, why so slow?
Neil-B wrote: With Xeons (at least the more recent ones) the contention impacts the scenario less than on other multi-threaded CPUs (or at least that is my experience) .. the output/throughput from my 14-core Xeons increases significantly (admittedly not quite double, but fairly close) when running 28 threads ... but it's worth a few tests using an offline core/WU to see the real difference.
I have the X5675 (which is the same family as the X5650 and should be quite similar except for the clock speed), and I get almost identical results with 24 threads vs. 12 threads with CPU affinity set to use only the first thread of each core.
That said, I get close to 100k PPD on a dual-CPU (12 core, 24 thread) system, so 20K for the 5650 seems a bit slow.
Cheers,
HG
Dell PowerEdge T420: 2x Xeon E5-2470 v2
Re: Dual X5650 giving me 20k PPD, why so slow?
Hopfgeist wrote: I have the X5675 (which is the same family as the X5650 and should be quite similar except for the clock speed), and I get almost identical results with 24 threads vs. 12 threads with CPU affinity set to use only the first thread of each core. That said, I get close to 100k PPD on a dual-CPU (12-core, 24-thread) system, so 20K for the X5650 seems a bit slow.
Manually setting affinity isn't really feasible on Windows, as the affinity gets reset with each new WU.
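If you really wanted pinned affinity on Windows, one hypothetical workaround (a rough sketch only, assuming the third-party psutil package; the process-name prefix and CPU list below are illustrative, not values from this thread) is a small watcher that re-applies the mask whenever a new FahCore process appears:

```python
# Hypothetical workaround, not an official FAH feature: poll for FahCore
# processes and re-apply a fixed affinity whenever a new one starts, since
# each new WU spawns a fresh FahCore process. Needs the third-party psutil
# package; the CPU list below is only an example.
import time
import psutil

WANTED_CPUS = list(range(12))      # e.g. logical CPUs 0-11
NAME_PREFIX = "FahCore"            # FAH science cores are named FahCore_*

def pin_fahcores():
    for proc in psutil.process_iter(["name"]):
        try:
            name = proc.info["name"] or ""
            if name.startswith(NAME_PREFIX) and proc.cpu_affinity() != WANTED_CPUS:
                proc.cpu_affinity(WANTED_CPUS)
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            pass                   # process exited, or needs elevated rights

if __name__ == "__main__":
    while True:
        pin_fahcores()
        time.sleep(30)
```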
Re: Dual X5650 giving me 20k PPD, why so slow?
PantherX wrote: Here's a Google Sheet that has some information for you to make an informed decision. Same set of hardware for each test: https://docs.google.com/spreadsheets/d/ ... edit#gid=0
Thanks! This is very informative!
MeeLee wrote: If you have PCIe ports available, it may be better to plug a few GPUs in, as a single GT 1030 will get a higher score than both of those CPUs.
The HP Proliant GL380 I run does have an extra PCIe slot, and I did try to put in a GPU I had lying around, but since my server is a 1U, I only have one available slot and my GPU requires two... I was about to search for a GPU that would fit, but I also read that adding non-HP hardware makes the fans run at full speed because of the missing sensor, and this server is capable of making the same noise as a jet.
Re: Dual X5650 giving me 20k PPD, why so slow?
Nilem wrote: [...] since my server is a 1U, I only have one available slot and my GPU requires two [...] adding non-HP hardware makes the fans run at full speed [...]
Being stuck with a single-slot GPU limits the options for GPU folding. There are some single-slot GTX 1050 GPUs, or lower.
Adding a GPU shouldn't increase fan speed by much. A 1050 operates at 75 W, a 1030 at 35 W, and a 730 at 25 W. The GPU fans are controlled by the Nvidia software, but you could set them manually however you like with MSI Afterburner.
Re: Dual X5650 giving me 20k PPD, why so slow?
Hopfgeist wrote: That said, I get close to 100k PPD on a dual-CPU (12-core, 24-thread) system, so 20K for the X5650 seems a bit slow.
Interesting. Do you run them under Linux or Windows?
I see the X5675 is a year younger than the X5650. Maybe there is more to it than just clock speed? But if so, I can't find what on Intel's website.
Re: Dual X5650 giving me 20k PPD, why so slow?
Nilem wrote: Interesting. Do you run them under Linux or Windows? I see the X5675 is a year younger than the X5650. Maybe there is more to it than just clock speed?
It may also depend on what memory he's running.
Apparently these CPUs support anywhere from 800 MHz to 1333 MHz RAM.
I bet 800 MHz RAM could be restrictive.
-
- Posts: 70
- Joined: Thu Jul 09, 2020 12:07 pm
- Hardware configuration: Dell T420, 2x Xeon E5-2470 v2, NetBSD 10, SunFire X2270 M2, 2x Xeon X5675, NetBSD 9; various other Linux/NetBSD PCs, Macs and virtual servers.
- Location: Germany
Re: Dual X5650 giving me 20k PPD, why so slow?
MeeLee wrote: May also depend on what memory he's running. Apparently these CPUs support anywhere from 800 MHz to 1333 MHz RAM. I bet 800 MHz RAM could be restrictive.
I run the Linux client on NetBSD in its Linux emulation, which has very little (for FAH practically zero) overhead; it just maps Linux system calls to NetBSD system calls. I use monit to check for the presence of the FahCore process and then have it rescheduled to use only CPUs 0 through 11, which are (on NetBSD) the first threads of each physical core.
The standard NetBSD scheduler is not (yet?) smart enough to avoid running two (process) threads on two (CPU) threads of the same physical core, even though the kernel detects the difference, so it is significantly faster with the CPU affinity set explicitly. Task monitoring software (such as "top") shows only 50% CPU usage, but in fact the FPUs are fully loaded, and get about as hot as when using 24 calculation threads.
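For comparison, the same pinning could be scripted directly on Linux. This is a sketch only, assuming the psutil package and the Linux sysfs topology files, not my NetBSD/monit setup:

```python
# Sketch only (assumes Linux and the third-party psutil package, unlike the
# NetBSD/monit setup described above): read the sysfs topology to pick the
# first logical CPU of each physical core, then pin every FahCore process to
# that set so the hyperthread siblings stay idle.
import glob
import psutil

def first_thread_of_each_core():
    chosen = set()
    for path in glob.glob("/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"):
        with open(path) as f:
            siblings = f.read().strip()        # e.g. "0,12" or "0-1"
        chosen.add(int(siblings.replace("-", ",").split(",")[0]))
    return sorted(chosen)

cpus = first_thread_of_each_core()
for proc in psutil.process_iter(["name"]):
    try:
        if (proc.info["name"] or "").startswith("FahCore"):
            proc.cpu_affinity(cpus)
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        pass
print("Pinned FahCore processes to CPUs:", cpus)
```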
I am quite certain I use 1333 MHz RAM; the machine is a Sun Fire X2270 M2 with 40 GB RAM. Even at 9 years old it's still a very capable machine.
According to Passmark, the X5675 is only a bit faster, even less than the difference in clock speed. Since Passmark tests more than just the CPU core, it seems plausible that they use the same architecture.
It certainly would not be so much slower that it ends up at only 20k PPD for a dual-CPU system. Although early work-unit returns are rewarded, and therefore PPD is not linear with processing speed, that's still too big a difference.
Cheers,
HG
Dell PowerEdge T420: 2x Xeon E5-2470 v2
Re: Dual X5650 giving me 20k PPD, why so slow?
I was about to ask "Did you convert the binaries (from RPM or Debian)? NetBSD/FreeBSD/OpenBSD are not supported operating systems," until I saw you're emulating them.
You'll ALWAYS get higher PPD from running a program natively than from running it in emulation (even if it's just a natively supported VM).
-
- Posts: 70
- Joined: Thu Jul 09, 2020 12:07 pm
- Hardware configuration: Dell T420, 2x Xeon E5-2470 v2, NetBSD 10, SunFire X2270 M2, 2x Xeon X5675, NetBSD 9; various other Linux/NetBSD PCs, Macs and virtual servers.
- Location: Germany
Re: Dual X5650 giving me 20k PPD, why so slow?
MeeLee wrote: I was about to ask "Did you convert the binaries (from RPM or Debian)? NetBSD/FreeBSD/OpenBSD are not supported operating systems," until I saw you're emulating them. You'll ALWAYS get higher PPD from running a program natively than from running it in emulation (even if it's just a natively supported VM).
The binaries are not running in a virtual machine; they run natively in the operating system. It is rather like running Linux binaries that were compiled for an older kernel version and/or an older libc version. Unless you make lots and lots of syscalls, there is no measurable overhead. (Incidentally, the a7 core uses a lot more syscalls than the a8 core, but even so, the system load is basically zero (< 0.05%) when running FaH and nothing else.)
Since the system is running NetBSD for various unrelated reasons, syscall mapping is certainly the most efficient way to run FaH, especially compared to a virtual machine.
Straight from the horse's mouth:
[...] it is only a thin software layer, mostly for system calls which are already very similar between the two systems. The application code itself is processed at the full speed of your CPU, so you don't get degraded performance with the Linux emulation [...]
Besides, I'm not the OP complaining about low PPD. I get pretty good PPD, given that it's a 10-year-old system.
Cheers,
HG
Dell PowerEdge T420: 2x Xeon E5-2470 v2
Re: Dual X5650 giving me 20k PPD, why so slow?
Hopfgeist wrote: The binaries are not running in a virtual machine; they run natively in the operating system. [...] Unless you make lots and lots of syscalls, there is no measurable overhead.
Reminds me of what they said about Wine: it's a sort of older Windows that should have very little overhead.
However, you still have the overhead of the original operating system you're running on, even if the VM is very efficient.
-
- Posts: 46
- Joined: Fri Mar 20, 2020 3:13 am
- Hardware configuration: EVGA SR-2 motherboard
2x Xeon x5670 CPU
64 GB ECC DDR3
Nvidia RTX 2070
Re: Dual X5650 giving me 20k PPD, why so slow?
Hi,
I can tell you for sure that 20K PPD for these CPUs is somewhat low.
Do you have a passkey? Have you already completed something like 10+ work units (if I remember correctly), so the quick-return bonus kicks in?
Do you pause the folding or turn off the PC often? (I usually click the Finish button and wait for all WUs to complete before shutting down.)
If you pause a work unit, it is returned later and you lose some of the bonus.
I have a dual Xeon X5670 setup, so similar to yours but with a slightly higher clock speed. My CPU slots do somewhere between 60-90K PPD total (depending on the work unit).
Also, I have all memory slots populated, as these CPUs have a 3-channel memory controller per CPU. This might add some speed if the calculations don't fit in the cache, but I don't know that for sure.
I run a clean Linux Mint install for the folding stuff overnight to keep the room warm; nothing else runs except a task manager, some hardware-monitor apps, and maybe a browser.
What could help:
Do not use only one CPU slot, as some work units do not like the large number of threads these dual Xeons have.
They also sometimes don't like it when the number of CPU threads has prime factors larger than 3 (or something like that; see the quick check sketched below).
I have a CPU:12 and a CPU:9 slot, with some cores left on the second CPU for the GPU slot and background stuff.
12 = 2 × 2 × 3 and 9 = 3 × 3, so no large prime factors.
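Here's a quick way to check a candidate thread count against that rule of thumb (just the rule above; the exact counts the a7 core accepts are more nuanced than this):

```python
# Rule-of-thumb check from the post above: does a candidate CPU thread count
# have any prime factor larger than 3? (The exact counts the a7 core accepts
# are more nuanced than this.)
def prime_factors(n):
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

for threads in (9, 12, 21, 22, 24):
    large = [p for p in prime_factors(threads) if p > 3]
    print(threads, prime_factors(threads), "avoid" if large else "ok")
```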
-
- Site Moderator
- Posts: 6986
- Joined: Wed Dec 23, 2009 9:33 am
- Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB
Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
- Location: Land Of The Long White Cloud
- Contact:
Re: Dual X5650 giving me 20k PPD, why so slow?
I would suggest that you experiment with these values:
CPU:18
CPU:20
CPU:21
Since they have been shown to work in the majority of cases, based on this neat chart that _r2w_ben created a while ago: viewtopic.php?f=72&t=34350&start=45
Note that FahCore_a8 WUs can use whatever number of CPUs you give them without throwing errors the way FahCore_a7 does.
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time
Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues