Bug report: FAH client cannot detect more than 32 CPUs

Moderators: Site Moderators, FAHC Science Team

MeeLee
Posts: 1339
Joined: Tue Feb 19, 2019 10:16 pm

Re: Bug report: FAH client cannot detect more than 32 CPUs

Post by MeeLee »

But 2 slots of 32 cores is still better than a single one.
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: Bug report: FAH client cannot detect more than 32 CPUs

Post by PantherX »

MeeLee wrote:But 2 slots of 32 cores is still better than a single one.
Hmm... can you please elaborate on the assumptions or use case under which you would suggest 2 CPU:32 slots instead of 1 CPU:64 slot?
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
jnv11
Posts: 31
Joined: Wed Sep 02, 2020 5:49 am
Hardware configuration: CPU: Intel® Xeon® W-2295 Processor
GPU: Nvidia Titan RTX
OS: Windows 10 Pro
Motherboard: Asus WS C422 SAGE/10G
RAM: 4x16GB Crucial DDR4-2933 RDIMMs
Location: Morrisville, NC, USA

Re: Bug report: FAH client cannot detect more than 32 CPUs

Post by jnv11 »

PantherX wrote:
MeeLee wrote:But 2 slots of 32 cores is still better than a single one.
Hmm... can you please elaborate on the assumptions or use case under which you would suggest 2 CPU:32 slots instead of 1 CPU:64 slot?
I think that MeeLee was trying to say that 2 slots of 32 cores is better than 1 slot of 32 cores. Personally, I can only think that 2 slots of 32 cores is better than 1 slot of 64 cores if the tasks assigned are so small that at least one of the tasks would leave several cores idle. There are work units with so few atoms that the folding core will refuse to use all CPU cores if given a large number of them. It may be server-driven, though. I don't remember exactly what my logs said when this happened to my computer.
Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: Bug report: FAH client cannot detect more than 32 CPUs

Post by Neil-B »

In my case, for the moment I run a 32/56 and a 24/56 slot because of this issue ... I have been providing some analysis to the team, which I posted elsewhere but is relevant here:

Working "outside" of the Client, running the new A8 core on a test p16810 WU, the current 24 core slot and a 32 core slot produced less than a single 54 core slot (54 was better than 56) ... Not only did the single slot process slightly more over time (20.5 WUs/day versus 20 WUs/day), but 11 of those WUs completed 60 mins quicker using a single 54 core slot (in 70 mins rather than 130 mins) and the other 9 completed 90 mins quicker (in 70 mins rather than 160 mins) ... Only a little bit more science, but one heck of a lot faster completion of each WU.
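
The daily totals quoted above follow from the completion times alone. The sketch below redoes that arithmetic; it assumes each slot runs back to back with no gaps between WUs, and the mapping of times to slots (32 cores at 130 mins, 24 cores at 160 mins) is an inference from the post, not something it states.

```python
MINUTES_PER_DAY = 24 * 60


def wus_per_day(minutes_per_wu: float) -> float:
    """Work units one slot finishes per day at a fixed time per WU."""
    return MINUTES_PER_DAY / minutes_per_wu


# Completion times quoted in the post. Which split slot produced which
# time is an assumption: 32 cores -> 130 mins, 24 cores -> 160 mins.
split_slots = {"32-core slot": 130, "24-core slot": 160}
single_slot_minutes = 70  # single 54-core slot

split_total = sum(wus_per_day(m) for m in split_slots.values())
single_total = wus_per_day(single_slot_minutes)

for name, minutes in split_slots.items():
    print(f"{name}: {wus_per_day(minutes):.2f} WUs/day")
print(f"two slots combined: {split_total:.2f} WUs/day")
print(f"single 54-core slot: {single_total:.2f} WUs/day")
```

The two slots come out at roughly 11 and 9 WUs/day, about 20 combined, against roughly 20.5 for the single slot, which matches the figures in the post.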

This is being looked into - but until a Client fix that resolves this arrives, we just need to be patient, as it is one of many enhancements being worked on ... We have a workaround using multiple slots, so we can still utilise the cores - just not quite as efficiently as we might be able to in the future :)
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

(Green/Bold = Active)
PantherX

Re: Bug report: FAH client cannot detect more than 32 CPUs

Post by PantherX »

jnv11 wrote:...Personally, I can only think that 2 slots of 32 cores is better than 1 slot of 64 cores if the tasks assigned are so small that at least one of the tasks would leave several cores idle. There are work units with so few atoms that the folding core will refuse to use all CPU cores if given a large number of them. It may be server-driven, though. I don't remember exactly what my logs said when this happened to my computer.
Most of the time, your client will request WUs for X CPUs and the Server will allocate you something that matches that. If that fails, it might assign you a WU for Y CPUs (where Y < X) to ensure that your system does work instead of being idle.

In FahCore_a7, if a small WU is assigned and it can't be partitioned equally between the CPUs, it will throw a domain decomposition error and resolving it would require lowering the CPU value manually.

In FahCore_a8, the current workaround prevents any domain decomposition errors from happening. However, the plan is to have a much more intelligent and robust method to handle domain decomposition errors on CPUs with many cores, since Threadripper and high-end Ryzen 9 CPUs have a lot more cores than what was common a few years ago. Glad to see rapid hardware advancements, and it will take a bit of time for software to catch up :)
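
For the FahCore_a7 case, where the CPU value has to be lowered by hand, a small helper can suggest a candidate value. This is only a sketch of the rule of thumb of avoiding thread counts with a large prime factor; the threshold of 5 and both function names are assumptions for illustration, and the core's real decomposition rules are not described in this thread.

```python
def largest_prime_factor(n: int) -> int:
    """Return the largest prime factor of n (n >= 2)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    largest = 1
    factor = 2
    while factor * factor <= n:
        while n % factor == 0:
            largest = factor
            n //= factor
        factor += 1
    if n > 1:  # whatever remains is itself a prime factor
        largest = n
    return largest


def nearest_friendly_count(threads: int, max_prime: int = 5) -> int:
    """Largest count <= threads whose prime factors are all <= max_prime.

    max_prime=5 is an assumed threshold, not a documented FahCore limit.
    """
    for candidate in range(threads, 0, -1):
        if candidate == 1 or largest_prime_factor(candidate) <= max_prime:
            return candidate
    raise ValueError("threads must be at least 1")


for threads in (56, 64):
    print(threads, "->", nearest_friendly_count(threads))
```

On a 56-thread machine this suggests 54 (56 contains the prime factor 7), while 64 is left unchanged.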
MeeLee

Re: Bug report: FAH client cannot detect more than 32 CPUs

Post by MeeLee »

Neil-B wrote:...the current 24 core slot and a 32 core slot produced less than a single 54 core slot (54 was better than 56)...
One of the problems with Ryzen and Threadripper CPUs is that all CPU chiplets communicate with one another over the Infinity Fabric.
It's basically a similar idea to (and probably derived from) Intel Core i's ring bus technology, except that a ring bus goes from core to core, while Infinity Fabric is some sort of blazing fast network connecting all cores pretty much directly to one another. This poses a problem when a lot of data is moved around, which is why a Ryzen speeds up so much with faster RAM (and thus also faster Infinity Fabric).

Data to and from RAM also gets transported via this Infinity network.
You can imagine an Infinity Fabric running at 1.8 GHz having to provide data to all CPU cores and RAM, which run at twice that frequency...
Thankfully they're equipped with (from what I can understand) 4 lanes connecting each chiplet to another, and an additional 6 lanes connecting cores within the chiplet (the chiplet is the block enveloping a quad-core, 6-core, or octa-core CPU. Ryzen and Threadripper have many chiplets stacked together, and within each chiplet are many CPUs; basically a CPU core on a chiplet is a CPU within a CPU). The marketing names don't make things easier...

Anyway, the reason why your Threadripper (and even Ryzen 3900X/XT or 3950X) isn't operating as fast at an all-core load is threefold.
One, because of bottlenecking of the IF.
Two, I firmly believe that Threadrippers have too many cores asking for data from RAM, so the RAM is somewhat bottlenecked too.
Three, because having a few cores passive will allow that power to be routed to the other cores, allowing them to have a higher clock speed.

This, in my opinion, is the difference between Intel and AMD.
Intel would never (or at least never has) fabricate a CPU where the additional cores would slow down operation.
They even measure voltages and power consumption levels so that each CPU is optimized pretty much as well as can be, so that the average of a bunch of tasks that need completion will be done at the lowest carbon footprint possible for that technology.
Meaning, if they'd increase the CPU frequency, the CPU would need more power, and overall the energy used to finish the job would rise.
Conversely, lowering the frequency lowers the power requirements, but the task also takes longer to complete, resulting in an increase in total energy used as well.
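
The trade-off described in the last two sentences can be shown with a toy model. Everything in it is an assumption made for illustration: power is taken as a fixed static part plus a part growing with the cube of the frequency, run time as inversely proportional to frequency, and the constants are invented rather than measured on any Intel or AMD CPU.

```python
def energy_to_finish(freq_ghz: float,
                     work_gigacycles: float = 1000.0,
                     static_watts: float = 20.0,
                     dyn_coeff: float = 1.0) -> float:
    """Energy in joules to finish a fixed job under a toy power model.

    Assumed model: power = static_watts + dyn_coeff * freq^3,
                   time  = work_gigacycles / freq_ghz.
    """
    power_watts = static_watts + dyn_coeff * freq_ghz ** 3
    seconds = work_gigacycles / freq_ghz
    return power_watts * seconds


# Sweep 1.0 .. 5.0 GHz in 0.1 GHz steps and pick the cheapest frequency.
freqs = [i / 10 for i in range(10, 51)]
best = min(freqs, key=energy_to_finish)

# Setting dE/df = 0 for this model gives f^3 = static_watts / (2 * dyn_coeff).
analytic = (20.0 / (2 * 1.0)) ** (1 / 3)

print(f"lowest-energy frequency in sweep: {best:.1f} GHz")
print(f"analytic optimum for this model: {analytic:.2f} GHz")
```

Under these assumptions the energy per job rises on both sides of a single optimum: below it the static power is paid for longer, above it the dynamic power dominates.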
AMD, on the other hand, I don't feel looks at this.
They either just shoot for the fastest CPU frequency or, in the case of 3000 series Ryzens and Threadrippers, their CPU algorithms are a total mess!
You could be running a 6-thread workload on a 3900X, and instead of running that at a maximum rated frequency of 4.? GHz, the CPU is running it at 2.5 GHz.
Not to mention their initial BIOS issues on all Ryzens!
I feel their latest products were released hastily, and aren't as refined as Intel's, despite running on a smaller lithography (7nm) and having more cores...
Some tests are showing Intel gaining superiority back with their 11th gen CPUs, which are running workloads more efficiently at 10nm than AMD at 7nm.
jnv11

Re: Bug report: FAH client cannot detect more than 32 CPUs

Post by jnv11 »

MeeLee wrote:...Intel would never (or at least never has) fabricate a CPU where the additional cores would slow down operation...
Actually, there are problems where having too many processes competing with each other can cause slowdowns. For example, after all of the BOINC clients that are part of NFS@home return their results, the staff who set up the back-end postprocessing of those results reserve many more cores on the compute servers than they actually use. That way the cores that do process instructions have little or no competition for the memory controllers, since the back-end processes are not CPU hogs but memory-throughput hogs. Having many cores sit idle drastically speeds up the cores that spend most of their time waiting for memory accesses.

As for AMD Ryzen, this is a complex beast. Each mainstream third generation Ryzen CPU (which uses the Zen 2 architecture) has one I/O chip and one or two chiplets with 8 cores per chiplet. Further complicating things, each chiplet contains two core complexes, or CCXes, each with 4 cores and one pool of L3 cache. Communication between the cores and the shared L3 cache in a CCX is lightning fast. Communication between CCXes depends on how fast the Infinity Fabric runs, so it is slower, and the fabric is clocked to the same speed as the system RAM to minimize latency.

In Threadripper processors based on the Zen or Zen+ architectures (i.e. the first and second generation Threadrippers), communication between the dies is slower than communication kept within a CPU die, so the ideal mode is to use NUMA mode and pin one slot per CPU die. For the third-generation Threadripper CPUs, all CPU core chiplets share 1 I/O die, so they behave a lot better if NUMA mode is disabled and the chip appears as one huge chip to the OS. This results in a very large shared L3 cache with high latency. NUMA mode has split L3 caches that have lower latency, since cores do not need to search the L3 caches outside of their own chiplets when they have an L2 cache miss.

All generations of Threadripper will require benchmarking to see if NUMA mode or non-NUMA mode works better for each generation once a 64-bit client that is properly NUMA-aware is developed and shipped.
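
The "one slot per CPU die" layout mentioned above can be sketched as a plain partition of logical CPU indices. The function name and the example numbers are hypothetical, and the sketch assumes the OS numbers the logical CPUs of each die contiguously, which has to be checked on a real system. It only computes the groups; it does not set any affinity and does not use any Folding@home option.

```python
from typing import List


def cpus_per_die(total_logical_cpus: int, dies: int) -> List[range]:
    """Split logical CPU indices into one contiguous block per die.

    Assumes contiguous numbering per die, which real systems may not follow.
    """
    if dies < 1 or total_logical_cpus % dies != 0:
        raise ValueError("logical CPUs must divide evenly across dies")
    per_die = total_logical_cpus // dies
    return [range(d * per_die, (d + 1) * per_die) for d in range(dies)]


# Hypothetical example: 64 logical CPUs spread over 4 dies.
for die, cpus in enumerate(cpus_per_die(64, 4)):
    print(f"slot {die}: logical CPUs {cpus.start}-{cpus.stop - 1}")
```

Each resulting block would be the set of CPUs a NUMA-aware client could pin one slot to.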

Also, you have to consider that AMD was racing against the financial clock since it was going to go bankrupt if Ryzen was not released in time, so AMD did not have the time to release a completely polished product. Once Ryzen sold well and rescued AMD from its death spiral, AMD worked on mitigating the worst flaws with later models.
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud
Contact:

Re: Bug report: FAH client cannot detect more than 32 CPUs

Post by PantherX »

MeeLee wrote:...I feel their latest products were released hastily, and aren't as refined as Intel's, despite running on a smaller lithography (7nm) and having more cores...
Some tests are showing Intel gaining superiority back with their 11th gen CPUs, which are running workloads more efficiently at 10nm than AMD at 7nm.
Personally, it seems slightly unfair to compare a new architecture that has been out for ~3 years to something that has been out for ~11 years. I am sure that the first few Intel Core i Series had their own heat issues too. However, I am still thankful to AMD for shaking up the market and producing CPUs at a more affordable price than Intel. That's healthy competition, which is needed as it benefits customers like us and drives innovation.
PantherX

Re: Bug report: FAH client cannot detect more than 32 CPUs

Post by PantherX »

jnv11 wrote:...All generations of Threadripper will require benchmarking to see if NUMA mode or non-NUMA mode works better for each generation once a 64-bit client that is properly NUMA-aware is developed and shipped...
AFAIK, there's no real benefit for a 64-bit client. All folding is done on FahCore_a7 and FahCore_a8 which are 64-bit compatible.
gunnarre
Posts: 559
Joined: Sun May 24, 2020 7:23 pm
Location: Norway

Re: Bug report: FAH client cannot detect more than 32 CPUs

Post by gunnarre »

If the 32 bit client can get correct answers from the OS about the structure of a 64 bit system (core count, NUMA, multi-CPU structure, RAM bandwidth), then that shouldn't be a problem, but if it can't then those questions would need to be asked by the 64 bit folding core instead and either passed up to the client or managed completely within the folding core.
Online: GTX 1660 Super + occasional CPU folding in the cold.
Offline: Radeon HD 7770, GTX 1050 Ti 4G OC, RX580
PantherX

Re: Bug report: FAH client cannot detect more than 32 CPUs

Post by PantherX »

Agreed! Let's wait and see what happens :)
Neil-B

Re: Bug report: FAH client cannot detect more than 32 CPUs

Post by Neil-B »

MeeLee wrote:Anyway, the reason why your Threadripper (and even Ryzen 3900X/XT or 3950X) isn't operating as fast at an all-core load is threefold.
If you were referring to my comment re 54/56 being quicker than 56/56, just to point out this was on a dual Intel Xeon server .. and tbh the reason was probably contention and overhead from other software on the server during the testing .. and I kind of like 54 as a count as it doesn't have any "large" prime issues :) .. and yes I know A8 shouldn't have these, but at least for a while the ingrained avoidance twitch will still kick in :)

If you were referring to the world in general re Threadripper then I can't comment as I am an Intel only user !!
MeeLee

Re: Bug report: FAH client cannot detect more than 32 CPUs

Post by MeeLee »

PantherX wrote:Personally, it seems slightly unfair to compare a new architecture that has been out for ~3 years to something that has been out for ~11 years. I am sure that the first few Intel Core i Series had their own heat issues too. However, I am still thankful to AMD for shaking up the market and producing CPUs at a more affordable price than Intel. That's healthy competition, which is needed as it benefits customers like us and drives innovation.
But they aren't. The Ryzen 3000 series only came out last year, and Intel Core i 11th gen is a big leap from the 2nd to 9th gens.
A lot changed after 9th gen, with 10th gen being a lot more efficient than any iteration before. 11th gen is mainly 10th gen with an improved 10nm design, and due to it being 10nm, they added a few CPU and iGPU cores.
Neil-B wrote: If you were referring to my comment re 54/56 being quicker than 56/56, just to point out this was on a dual Intel Xeon server .. and tbh the reason was probably contention and overhead from other software on the server during the testing .. and I kind of like 54 as a count as it doesn't have any "large" prime issues :) .. and yes I know A8 shouldn't have these, but at least for a while the ingrained avoidance twitch will still kick in :)

If you were referring to the world in general re Threadripper then I can't comment as I am an Intel only user !!
It's a general rule of thumb that if you're folding on that many cores, one core is needed to feed them, much like when you're feeding a GPU.
And if you're running Windows, it's generally recommended to keep 1 core spare for the OS's background activity.

AMD has the issue amplified due to the issues mentioned above.
Another factor, perhaps, is that the first 2 CCXs are much more efficient and of higher quality than the last two, so more threads bring down the overall boost frequencies even more...
jnv11

Re: Bug report: FAH client cannot detect more than 32 CPUs

Post by jnv11 »

PantherX wrote:
jnv11 wrote:...All generations of Threadripper will require benchmarking to see if NUMA mode or non-NUMA mode works better for each generation once a 64-bit client that is properly NUMA-aware is developed and shipped...
AFAIK, there's no real benefit for a 64-bit client. All folding is done on FahCore_a7 and FahCore_a8 which are 64-bit compatible.
Microsoft reworked the data structures for 64-bit Windows to greatly expand its ability to cope with more cores. A 32-bit client could not hope to get accurate data structures since the capacity limits of the 32-bit structures are being exceeded by high end systems today. The only thing that it could accurately get is a core count by using a new API, and there is no hope to get accurate versions of the rest of the data entirely within the 32-bit client. A 64-bit client could use the 64-bit versions of the data structures. Since the new structures are designed to use the same coding unless one is using assembly code, a recompile without changing the high level language code could likely fix the issue.
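
For the core count specifically, one candidate for the "new API" is `GetActiveProcessorCount` in kernel32, called with `ALL_PROCESSOR_GROUPS`. The sketch below calls it through `ctypes` and falls back to `os.cpu_count()` on other platforms. It is not taken from the Folding@home client, the post does not say which API was meant, and whether the full count is reported inside a 32-bit process should be verified on the target system.

```python
import ctypes
import os
import sys

ALL_PROCESSOR_GROUPS = 0xFFFF  # constant from the Windows SDK headers


def logical_processor_count() -> int:
    """Count active logical processors across all processor groups."""
    if sys.platform == "win32":
        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        get_count = kernel32.GetActiveProcessorCount
        get_count.argtypes = [ctypes.c_ushort]  # WORD GroupNumber
        get_count.restype = ctypes.c_ulong      # DWORD
        count = get_count(ALL_PROCESSOR_GROUPS)
        if count:  # 0 signals failure; fall through to the portable path
            return count
    # Portable fallback for non-Windows systems or a failed call.
    return os.cpu_count() or 1


print("logical processors:", logical_processor_count())
```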
jnv11

Re: Bug report: FAH client cannot detect more than 32 CPUs

Post by jnv11 »

gunnarre wrote:If the 32 bit client can get correct answers from the OS about the structure of a 64 bit system (core count, NUMA, multi-CPU structure, RAM bandwidth), then that shouldn't be a problem, but if it can't then those questions would need to be asked by the 64 bit folding core instead and either passed up to the client or managed completely within the folding core.
Since Microsoft greatly expanded the data structures when designing 64-bit Windows to accommodate more cores and my system already exceeds the limits for the 32-bit data structures, a 32-bit Windows client getting accurate information by itself is ruled out. Asking the folding cores to pass the data to the client would create loads of messy complexity that is asking for more spaghetti code which is a nightmare to maintain. Furthermore, the current Windows Folding@home client will ignore user requests to set the number of CPU cores in one folding slot to more than 32 cores, so that will need to be changed in the next version of the Folding@home client software. Asking the cores to manage themselves will anger users who want manual control to be able to set more than 32 cores per slot.
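
The 32-core ceiling is what one would expect from any per-CPU bitmask stored in a pointer-sized integer, as Windows affinity masks are: such a mask has 32 bits in a 32-bit process and 64 bits in a 64-bit one. The sketch below only illustrates that width limit. The function name is made up, and the thread does not confirm which structure the client actually reads.

```python
def cpus_visible_in_mask(logical_cpus: int, mask_bits: int) -> int:
    """Count the CPUs that fit in a bitmask of the given width."""
    full_mask = (1 << logical_cpus) - 1           # one bit per logical CPU
    truncated = full_mask & ((1 << mask_bits) - 1)  # keep the low mask_bits bits
    return bin(truncated).count("1")


# 36 logical CPUs matches an 18-core/36-thread Xeon W-2295;
# 56 matches the dual-Xeon server mentioned earlier in the thread.
for cpus in (36, 56):
    print(cpus, "CPUs ->",
          cpus_visible_in_mask(cpus, 32), "visible in a 32-bit mask,",
          cpus_visible_in_mask(cpus, 64), "visible in a 64-bit mask")
```

Both machines show only 32 CPUs through the 32-bit mask and their full count through the 64-bit one.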