Dual X5650 giving me 20k PPD, why so slow?
Moderator: Site Moderators
Forum rules
Please read the forum rules before posting.
Dual X5650 giving me 20k PPD, why so slow?
I've just found an old HP server with dual Xeon X5650s and started to fold on it. I know it's old, but since each CPU has a CPU Mark of 5863, I was thinking the result should be okay.
But it's been disappointing. I'm currently folding a FAHCore_a8 WU from project 16810 and only making 20k-ish PPD with all 24 threads on it. Meanwhile, I also have a single 6-core i5-8500B (CPU Mark of 9538) that got the same WU, and it could crush it at a speed that would end up with over 100k PPD.
Is this normal? Are my dual X5650s just that slow? Or is there something I should check to unleash their power?
So far, I've tried splitting the threads into slots like 12/12, 8/8/6, etc., and couldn't really get much better results. But even when the 24 threads are in one slot, the monitor tells me most cores are pretty much in full use.
Also, sometimes it suddenly starts to fold very fast and gets up to 150k PPD, but this is always just a glitch: when this happens, progress stops being printed in the log tab and after a few minutes, the progress bar just goes back down to where it was before the speed-up. (Example: in a few minutes I get from 55% to 60%, but none of the progress was logged and it suddenly goes back down to 55.40%.) Not sure if this has anything to do with my main question.
-
- Site Admin
- Posts: 7937
- Joined: Tue Apr 21, 2009 4:41 pm
- Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2 - Location: W. MA
Re: Dual X5650 giving me 20k PPD, why so slow?
To start, yes, the X5650 is that old. It is a 10-year-old Westmere design and only supports the SSE family of instructions (up to SSE4.2), not AVX. Newer chips support AVX or AVX2, which gives a significant speedup over the SSE2 instructions used on your processors.
You may find depending on the WU that using just the 12 main CPU threads will give nearly the highest PPD.
The PPD and ETA figures right after a start are not reliable estimates. The client needs to see at least 1-2% progress before they will settle down to accurate numbers.
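If you want to confirm which SIMD instruction sets a machine reports, here is a minimal sketch, assuming a Linux x86 system where /proc/cpuinfo has a "flags" line. The function name `simd_support` and the shortened sample line are made up for illustration; they are not part of the F@h client.

```python
def simd_support(cpuinfo_text):
    """Report which of a few SIMD flags appear in /proc/cpuinfo-style text."""
    wanted = ("sse2", "sse4_2", "avx", "avx2")
    flags = set()
    for line in cpuinfo_text.splitlines():
        # Every logical CPU repeats the same flags line, so the first one is enough.
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
            break
    return {name: name in flags for name in wanted}


# Hypothetical, heavily shortened flags line for a CPU without AVX.
sample = "flags\t\t: fpu sse sse2 ssse3 sse4_1 sse4_2 aes"
print(simd_support(sample))

# On a real Linux box you could instead run:
# with open("/proc/cpuinfo") as f:
#     print(simd_support(f.read()))
```

If "avx" comes back False, the CPU cannot use an AVX code path and will be limited to SSE-level code.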
iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Re: Dual X5650 giving me 20k PPD, why so slow?
Joe_H wrote: Newer chips support AVX or AVX2 which gives a significant speedup over the SSE2 instructions used when on your processors.
Well, here is an important detail I totally missed! Thanks for the info.
Joe_H wrote: You may find depending on the WU that using just the 12 main CPU threads will give nearly the highest PPD.
You mean that running just 1 slot with 12 threads might get higher PPD than 1 slot with 24 threads? What would be the reason for not using all the HT?
Re: Dual X5650 giving me 20k PPD, why so slow?
Nilem wrote: You mean that running just 1 slot with 12 threads might get higher PPD than 1 slot with 24 threads? What would be the reason for not using all the HT?
FAHCore_a8 does a better job of keeping execution units busy than FAHCore_a7. Spawning and syncing the extra threads that use HT might be slowing things down more than it helps.
The current build of FAHCore_a8 is de-optimized for multi-socket environments (referred to as -ntmpi 1 here). This results in extra syncing between sockets, so fewer threads might be faster.
-
- Site Admin
- Posts: 7937
- Joined: Tue Apr 21, 2009 4:41 pm
- Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2 - Location: W. MA
Re: Dual X5650 giving me 20k PPD, why so slow?
Yes, 1 slot with 12 threads.
When using the 2 threads per core provided through HT, the processing threads are in constant contention for use of the single FPU available in each core. Each thread needs to be synchronized with the others, so overall processing is limited by the speed of the slowest threads. In addition, while some WUs will process on all 24 threads, depending on their size in atoms, they may not process much faster past a lower thread count.
Depending on what else is running on the system, including system processes for the OS, leaving at least 1 or 2 threads unused by F@h can result in less interruption of the threads that are being used.
Finally, one other thing I did not mention before is the NUMA related settings in your BIOS. They can affect how threads on one physical CPU communicate with those on the other. Those settings may speed up or slow down folding, or have no discernible effect.
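To illustrate the "limited by the slowest thread" point, here is a toy model, not a measurement. All per-thread times below are invented numbers chosen only to show the effect, and the function names are hypothetical.

```python
def synced_step_time(thread_times):
    """A synchronized step finishes only when the slowest thread is done."""
    return max(thread_times)


def relative_speed(thread_times, baseline_times):
    """Speed relative to a baseline run (above 1.0 means faster than baseline)."""
    return synced_step_time(baseline_times) / synced_step_time(thread_times)


# Invented numbers: time each thread needs for its share of one step.
twelve = [1.00] * 12                  # one thread per physical core
ht_shared = [0.60] * 24               # half the work each, but HT siblings share an FPU
ht_one_slow = [0.60] * 23 + [1.10]    # same, with one thread interrupted by the OS

print(relative_speed(ht_shared, twelve))    # HT helps in this made-up case
print(relative_speed(ht_one_slow, twelve))  # one delayed thread wipes out the gain
```

In this sketch a single delayed thread out of 24 makes the whole step slower than the 12-thread run, which is why leaving a thread or two free for the OS can pay off. Real results depend on the WU, so testing on your own hardware is the only way to know.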
iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
-
- Posts: 1996
- Joined: Sun Mar 22, 2020 5:52 pm
- Hardware configuration: 1: 2x Xeon [email protected], 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon [email protected], 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: [email protected], 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21 - Location: UK
Re: Dual X5650 giving me 20k PPD, why so slow?
With Xeons (at least the more recent ones) the contention impacts the scenario less than on other multi-threaded CPUs (or at least that is my experience). The throughput from my 14-core Xeons increases significantly (admittedly not quite double, but fairly close) when running 28 threads. Still, it's worth a few tests using an offline core/WU to see the real difference.
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070
(Green/Bold = Active)
Re: Dual X5650 giving me 20k PPD, why so slow?
Nilem wrote: [...]I also have a single i5-8500B 6 cores (CPU mark of 9538) that got the same WU, and it could crush it at a speed that would end up with over 100k PPD.[...]
I can't help you with your problem, but I would like to know how you squeeze 100k PPD out of an i5.
I have an i7 (I guess an i7-8550U or something) and I'm getting at most 36k (more likely 30k) PPD.
-
- Posts: 2522
- Joined: Mon Feb 16, 2009 4:12 am
- Location: Greenwood MS USA
Re: Dual X5650 giving me 20k PPD, why so slow?
sptn. wrote: I have an i7 (i guess i7-8550U something) and getting max 36k (more likely 30) PPD.
Have you entered a passkey for your Folding?
https://foldingathome.org/support/faq/points/passkey/
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
Re: Dual X5650 giving me 20k PPD, why so slow?
sptn. wrote: I can't help you with your Problem, but I would like to know how you squeeze 100k PPD out of an i5. I have an i7 (i guess i7-8550U something) and getting max 36k (more likely 30) PPD.
The "i7" doesn't mean much. An i7-8550U is a notebook CPU and it is nearly half as powerful as an i5-8500B, but it consumes only 15 W compared to 65 W. The "i7" that would compare better is the i7-8700B, which has 6 cores with HT (the i5-8500B has no HT) and delivers 30% more.
https://www.cpubenchmark.net/compare/In ... 3064vs3388
Re: Dual X5650 giving me 20k PPD, why so slow?
So I got my best PPD with two slots, 12/10. I could then achieve around 25k PPD.
But I see that the lack of AVX makes this machine very inefficient under FAH. I've tried Rosetta on it, and the results were more proportional to the raw power of that dual X5650, so I will probably keep it there for now and run FAH on my more recent machine.
Re: Dual X5650 giving me 20k PPD, why so slow?
JimboPalmer wrote: Have you entered a passkey for your Folding? https://foldingathome.org/support/faq/points/passkey/
Yes I did.
Nilem wrote: The "i7" doesn't mean much. An i7-8550U is a notebook CPU and it is nearly half as powerful as an i5-8500B, but it consumes only 15 W compared to 65 W. The "i7" that would compare better is the i7-8700B, which has 6 cores with HT (the i5-8500B has no HT) and delivers 30% more. https://www.cpubenchmark.net/compare/In ... 3064vs3388
Ah, this makes sense.
-
- Posts: 1164
- Joined: Wed Apr 01, 2009 9:22 pm
- Hardware configuration: Asus Z8NA D6C, 2 [email protected] Ghz, , 12gb Ram, GTX 980ti, AX650 PSU, win 10 (daily use)
Asus Z87 WS, Xeon E3-1230L v3, 8gb ram, KFA GTX 1080, EVGA 750ti , AX760 PSU, Mint 18.2 OS
Not currently folding
Asus Z9PE- D8 WS, 2 [email protected] Ghz, 16Gb 1.35v Ram, Ubuntu (Fold only)
Asus Z9PA, 2 Ivy 12 core, 16gb Ram, H folding appliance (fold only) - Location: Jersey, Channel islands
Re: Dual X5650 giving me 20k PPD, why so slow?
If you haven't already, try Linux as the OS. That 20k is low for a pair of X5650s.
Re: Dual X5650 giving me 20k PPD, why so slow?
Nathan_P wrote: If you haven't already, try linux as the OS. That 20k is low for a pair of x5650's
I didn't try, I actually thought Windows would be a bit faster. Do you have an estimate of how many more points I should get under Linux?
-
- Site Moderator
- Posts: 6986
- Joined: Wed Dec 23, 2009 9:33 am
- Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB
Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400 - Location: Land Of The Long White Cloud
- Contact:
Re: Dual X5650 giving me 20k PPD, why so slow?
Nilem wrote: ...I didn't try, I actually thought Windows would be a bit faster. Do you have an estimate of how many more points I should get under Linux?
Currently, folding on Linux does generate more points, though the increase differs for each project. Do note that while this is the current situation, it might change in the future. Here's a Google Sheet with some information to help you make an informed decision:
Same set of hardware for each test: https://docs.google.com/spreadsheets/d/ ... edit#gid=0
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time
Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
Re: Dual X5650 giving me 20k PPD, why so slow?
The idea of running one slot per CPU is probably interesting.
I don't know how you'd do that in the slots setting.
But if you run 2 CPUs and force a WU on them that utilizes both CPUs, the connection between the CPUs is most definitely going to slow each unit down.
Disabling HT is also a good way to save power.
If you have PCIe slots available, it may be better to plug a few GPUs in, as a single GT 1030 will get a higher score than both of those CPUs.
Or, if you have some money to spare, get a Ryzen 3900X. It's not only more than twice as fast at 3.8-4 GHz, it also gets extra bonus points (due to the many threads and faster-finishing WUs) AND it uses less power (around 150 W stock, up to 200 W with PBO, measured at the wall for a system doing CPU folding only).
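To show why faster-finishing WUs earn disproportionately more, here is a rough sketch of the quick-return bonus as I understand it from the F@h points FAQ: points = base * max(1, sqrt(k * deadline / elapsed)). Treat the exact formula as an assumption on my part; the base points, k, and deadline values below are invented for illustration and do not belong to any real project.

```python
import math


def wu_points(base_points, k, deadline_days, elapsed_days):
    """Points for one WU under the assumed quick-return bonus formula."""
    bonus = max(1.0, math.sqrt(k * deadline_days / elapsed_days))
    return base_points * bonus


def ppd(base_points, k, deadline_days, elapsed_days):
    """Points per day if the machine keeps finishing WUs at this pace."""
    return wu_points(base_points, k, deadline_days, elapsed_days) / elapsed_days


# Hypothetical WU: 1000 base points, k = 0.75, 3-day deadline.
slow = ppd(1000, 0.75, 3.0, elapsed_days=1.0)
fast = ppd(1000, 0.75, 3.0, elapsed_days=0.5)
print(slow, fast, fast / slow)
```

Under this formula, finishing a WU in half the time gives about 2.8 times the PPD rather than 2 times, as long as the bonus applies. That is part of why a slow CPU looks even worse in PPD than its raw benchmark score would suggest.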