Thread Limitation?
Moderators: Site Moderators, FAHC Science Team
-
- Posts: 8
- Joined: Fri Sep 20, 2024 5:00 pm
Thread Limitation?
Hey guys,
So first of all, a little introduction: I've been the leader of the LTT folding team for almost 10 years, taking the team from number 16 to second in the rankings. I've been involved in folding for longer than that, but you get the idea.
Recently it was time for a folding server upgrade, finally retiring my 22-core Xeon and replacing it with an EPYC 7K62. The board supports two CPUs and I'm hopeful of getting the second chip in the near future, but currently it is running a single 48-core/96-thread CPU (along with two 4070 Tis).
My issue is that the client seems to be limited to 64 threads, which is a problem when I built the machine specifically to help with processing CPU WUs.
So is there any way to force it to use more threads? Especially once I get the second chip.
My current possible solution is to run VMs, but obviously that is less than ideal, so I thought I would make a thread here first in the hope that there is a solution other than virtual machines.
Happy folding,
Spec.
-
- Posts: 8
- Joined: Fri Sep 20, 2024 5:00 pm
Re: Thread Limitation?
So to update this,
On the web interface it says it has allocated a 64 and a 24 thread slot, with obviously 2 more used for each GPU...
But in task manager it is showing only 64 threads being utilised... Which is even more weird than the original issue of only allocating a single 64 thread slot
-
- Site Moderator
- Posts: 1115
- Joined: Sat Dec 08, 2007 1:33 am
- Location: San Francisco, CA
- Contact:
Re: Thread Limitation?
The 64 thread limit is probably a limit imposed by the assigned WU.
Most assignments say 1 to 64 cpus.
You can see the assignment info in log by installing the debug build.
If you have v8.4.4+, you can also use `fahctl state`.
You will need python 3.6+ and `pip install websocket-client`.
For linux, you might use `apt install python3-websocket`
Yes, there are some funky historical naming issues with the module.
For more control, you can use resource groups.
Please see https://foldingathome.org/v8-3-client-guide/
You might want to create a separate group for each gpu.
Use default group for cpu folding.
Gpu groups can have cpus set to zero, but they may still consume a cpu thread each.
Avoid over-allocating cpus across groups, and leave one or more threads for the system.
Running gpus in groups with zero cpus can prevent interruptions to cpu folding as the number of available cpus fluctuates.
Ideally, the client will better manage the allocated resource pool dynamically.
But I would say it's not quite there yet.
Feedback on how the client manages the single group is valuable.
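To make the thread budgeting above concrete, here is a rough sketch of the arithmetic. It assumes the 1 to 64 cpu range from the assignment, one thread consumed per gpu group, and one thread left for the system. The function name and its defaults are made up for illustration and are not part of the client.

```python
def plan_cpu_groups(total_threads, gpu_count, system_reserve=1, wu_cap=64):
    """Split the folding threads into CPU resource groups no larger than wu_cap.

    Assumptions (not taken from the client): each GPU group still uses one
    CPU thread, and system_reserve threads are left idle for the OS.
    """
    available = total_threads - gpu_count - system_reserve
    if available <= 0:
        raise ValueError("no threads left for CPU folding")
    groups = []
    while available > 0:
        size = min(wu_cap, available)  # never exceed the per-WU limit
        groups.append(size)
        available -= size
    return groups


# One 48-core/96-thread CPU with two GPUs, as described in this thread.
print(plan_cpu_groups(96, 2))  # [64, 29]
```

These numbers are only a ceiling for each group. The core may still start a WU with fewer threads than the group allows.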
-
- Posts: 8
- Joined: Fri Sep 20, 2024 5:00 pm
Re: Thread Limitation?
So I'm suspecting it's a windows issue rather than a client issue, just installing Ubuntu now to see.
I'll get back to you!
I was running Windows for testing purposes but I'm a mostly Linux guy these days, so it was going to end up with Ubuntu on it either way.
-
- Site Admin
- Posts: 7937
- Joined: Tue Apr 21, 2009 4:41 pm
- Hardware configuration: Mac Pro 2.8 quad 12 GB smp4, MacBook Pro 2.9 i7 8 GB smp2
- Location: W. MA
Re: Thread Limitation?
Windows is a bit behind the hardware with respect to high core count processors, and also has licensing issues on how threads are utilized. Some of that is unlocked with a Win 10 Pro Workstation or Enterprise license. But one constant in recent Windows versions is use of 64 thread processor groups. Those processor groups will be treated like separate processor chips on a multi processor logic board. There are issues with inter-thread communication across processor groups.
I haven't looked at Win 11 and its licensing levels to see how this treatment of threads goes and what if any changes MS has made.
As for the CPU folding cores, testing in the past had the code scaling well into the 100+ thread range. But that only worked well if the WU was large enough in atom count to have enough in each region the WU was decomposed into for each thread. Currently most of the large atom count projects are being run as GPU projects. So at assignment time most CPU projects do have upper limits on the number of threads used by a WU.
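A small sketch of the processor group arithmetic, assuming only the 64 logical processor limit per group mentioned above. How Windows actually distributes threads between groups is not modelled here, and the helper names are hypothetical.

```python
import math

MAX_GROUP_SIZE = 64  # a Windows processor group holds at most 64 logical processors


def min_processor_groups(logical_processors):
    """Smallest number of processor groups that can hold this many logical CPUs."""
    if logical_processors < 1:
        raise ValueError("need at least one logical processor")
    return math.ceil(logical_processors / MAX_GROUP_SIZE)


def single_group_ceiling(logical_processors):
    """Upper bound on threads available to a process confined to one group."""
    return min(logical_processors, MAX_GROUP_SIZE)


# 44 = the old 22-core Xeon, 96 = one EPYC 7K62, 192 = two of them.
for n in (44, 96, 192):
    print(n, min_processor_groups(n), single_group_ceiling(n))
```

So a 96-thread machine needs at least two groups, and a process that stays inside one group cannot be scheduled on more than 64 threads at once.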
iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
-
- Posts: 8
- Joined: Fri Sep 20, 2024 5:00 pm
Re: Thread Limitation?
In a move that surprises no one, Linux fixed all of my issues
Last edited by GOTspectrum on Sat Sep 21, 2024 1:31 pm, edited 1 time in total.
-
- Posts: 8
- Joined: Fri Sep 20, 2024 5:00 pm
Re: Thread Limitation?
More issues...
The CPU is underperforming so much I don't think I'll even meet the deadline for the WUs....
I just dumped a slot, it got a new WU and worked up to 1% fine, now it's been sat at 1% for at least 20 minutes, system monitor says the cores are being used, but the PPD is just dropping like a stone and the completion percentage is not moving...
-
- Site Admin
- Posts: 7937
- Joined: Tue Apr 21, 2009 4:41 pm
- Hardware configuration: Mac Pro 2.8 quad 12 GB smp4, MacBook Pro 2.9 i7 8 GB smp2
- Location: W. MA
Re: Thread Limitation?
Try setting up a resource group with just 32 CPUs assigned to it. It is possible you got WUs that were not large enough and the folding core spent all of the CPU time reconciling the separate threads. Possibly post the log showing the startup of a WU and what is reported as it runs.
iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
-
- Posts: 8
- Joined: Fri Sep 20, 2024 5:00 pm
Re: Thread Limitation?
Sorry, I didn't notice you replied; not getting notifications for replies is kinda meh.
So the WUs run. I have a 64-thread and a 28-thread slot, running V8, but the slots are performing like a 2400k...
I'm planning to do some investigation to determine whether the CPU is underperforming in all tasks, e.g. an issue with the hardware or configuration, or whether it's only an issue for folding workloads.
Hopefully I'll be running some Geekbench and other benchmark runs this evening and will report back with my findings.
For reference, I'm running Ubuntu 24.04.1 on an EPYC 7K62 in an H11DSi in single-socket configuration. It's currently running only 4 memory channels, but that shouldn't account for the massive underperformance I'm seeing. It's running an NVMe drive too, not that I imagine that would affect folding performance.
The GPUs on the platform are, for the most part, performing as expected.
-
- Site Admin
- Posts: 7937
- Joined: Tue Apr 21, 2009 4:41 pm
- Hardware configuration: Mac Pro 2.8 quad 12 GB smp4, MacBook Pro 2.9 i7 8 GB smp2
- Location: W. MA
Re: Thread Limitation?
You need to opt-in for notifications when there is a reply. That is one of the options under the text entry window when using the full editor such as through "Post reply". If after enabling reply notifications they don't show up, there could be a problem with the mailer, but it is working for other purposes. Let me know and I can have someone look into it.
-
- Posts: 8
- Joined: Fri Sep 20, 2024 5:00 pm
Re: Thread Limitation?
So Geekbench showed what I expected: slightly reduced performance, but nothing on the scale of what I was seeing...
But here is a new issue: GPUs not working.
18:34:48:W :Unrecognized option 'gpu'
18:34:48:W :Unrecognized option 'power'
18:34:48:I1:*********************** Folding@home Client ***********************
18:34:48:I1: Version: 8.3.18
18:34:48:I1: Author: Joseph Coffland <[email protected]>
18:34:48:I1: Org: foldingathome.org
18:34:48:I1: Copyright: 2023-2024, foldingathome.org
18:34:48:I1: Homepage: https://foldingathome.org/
18:34:48:I1: License: GPL-3.0-or-later
18:34:48:I1: URL: https://v8-3.foldingathome.org/
18:34:48:I1: Date: Jul 12 2024
18:34:48:I1: Time: 13:26:31
18:34:48:I1: Revision: 99ae953ee7b1c0b3070161cfcf9150184f76bd96
18:34:48:I1: Branch: master
18:34:48:I1: Compiler: GNU 8.3.0
18:34:48:I1: Options: -Wsuggest-override -faligned-new -std=c++17 -fsigned-char
18:34:48:I1: -ffunction-sections -fdata-sections -O3 -funroll-loops -fno-pie
18:34:48:I1: Platform: linux 4.19.0-26-cloud-amd64
18:34:48:I1: Bits: 64
18:34:48:I1: Mode: Release
18:34:48:I1: Args: --config=/etc/fah-client/config.xml
18:34:48:I1: --log=/var/log/fah-client/log.txt
18:34:48:I1: --log-rotate-dir=/var/log/fah-client/
18:34:48:I1: Config: /etc/fah-client/config.xml
18:34:48:I1:****************************** CBang ******************************
18:34:48:I1: Version: 1.7.2
18:34:48:I1: Author: Joseph Coffland <[email protected]>
18:34:48:I1: Org: Cauldron Development
18:34:48:I1: Copyright: Cauldron Development, 2003-2024
18:34:48:I1: Homepage: https://cauldrondevelopment.com/
18:34:48:I1: License: LGPL-2.1-or-later
18:34:48:I1: Date: Jun 24 2024
18:34:48:I1: Time: 13:29:44
18:34:48:I1: Revision: 1b05ea96f0ed3043c32b78a66dbf50a9b2002289
18:34:48:I1: Branch: master
18:34:48:I1: Compiler: GNU 8.3.0
18:34:48:I1: Options: -Wsuggest-override -faligned-new -std=c++17 -fsigned-char
18:34:48:I1: -ffunction-sections -fdata-sections -O3 -funroll-loops -fno-pie
18:34:48:I1: -fPIC
18:34:48:I1: Platform: linux 4.19.0-26-cloud-amd64
18:34:48:I1: Bits: 64
18:34:48:I1: Mode: Release
18:34:48:I1:***************************** System ******************************
18:34:48:I1: CPU: AMD EPYC 7K62 48-Core Processor
18:34:48:I1: CPU ID: AuthenticAMD Family 23 Model 49 Stepping 0
18:34:48:I1: CPUs: 96
18:34:48:I1: Memory: 125.63GiB
18:34:48:I1:Free Memory: 123.79GiB
18:34:48:I1: OS Version: 6.8
18:34:48:I1:Has Battery: false
18:34:48:I1: On Battery: false
18:34:48:I1: Hostname: spec-Super-Server
18:34:48:I1: UTC Offset: 1
18:34:48:I1: PID: 2282
18:34:48:I1: CWD: /var/lib/fah-client
18:34:48:I1: Exec: /usr/bin/fah-client
18:34:48:I1:*******************************************************************
18:34:48:I2:<config/>
18:34:48:I1:Opening Database
18:34:48:I1:F@H ID = Eytq9oYSqk-IRoNgNBMoDSIG2NbtgNRXohGGQO1uy-o
18:34:48:I3:Loading default group
18:34:48:I3:Loading default resource group
18:34:48:I1:Listening for HTTP on 127.0.0.1:7396
18:34:48:I3:WU48:Loading work unit 48 with ID ku2tdQFg1VAQjq7127q3o-7vQ3UcSirJQuKEiO-0U4A
18:34:48:I3:WU49:Loading work unit 49 with ID dycJQWTjaTTjex4OsbJA3mEEY5aygXKuQ9bkVoXwE9E
18:34:48:I3:WU50:Loading work unit 50 with ID 0fAW8v1upcMpJTp_rs0yfVmCeEFVNN5hUgCeVF55BiI
18:34:48:I3:Loaded 3 wus.
18:34:48:W :OpenCL not supported: clGetPlatformIDs() returned -1001
18:34:48:W :CUDA not supported: cuInit() returned 100
18:34:48:I3:gpus = {
18:34:48:I3: "gpu:132:00:00": {"vendor": 4318, "device": 10114, "type": "nvidia", "supported": false, "description": "AD104 [GeForce RTX 4070 Ti]"},
18:34:48:I3: "gpu:65:00:00": {"vendor": 4318, "device": 10114, "type": "nvidia", "supported": false, "description": "AD104 [GeForce RTX 4070 Ti]"}
18:34:48:I3:}
18:34:49:I1:Loaded cores/fahcore-a8-lin-64bit-avx2_256-0.0.12/FahCore_a8
18:34:49:I3:WU50:Running FahCore: /var/lib/fah-client/cores/fahcore-a8-lin-64bit-avx2_256-0.0.12/FahCore_a8 -dir 0fAW8v1upcMpJTp_rs0yfVmCeEFVNN5hUgCeVF55BiI -suffix 01 -version 8.3.18 -lifeline 2282 -np 34
18:34:49:I3:WU50:Started FahCore on PID 2296
18:34:49:I1:WU50:*********************** Log Started 2024-09-23T18:34:49Z ***********************
18:34:49:I1:WU50:************************** Gromacs Folding@home Core ***************************
18:34:49:I1:WU50: Core: Gromacs
18:34:49:I1:WU50: Type: 0xa8
18:34:49:I1:WU50: Version: 0.0.12
18:34:49:I1:WU50: Author: Joseph Coffland <[email protected]>
18:34:49:I1:WU50: Copyright: 2020 foldingathome.org
18:34:49:I1:WU50: Homepage: https://foldingathome.org/
18:34:49:I1:WU50: Date: Jan 16 2021
18:34:49:I1:WU50: Time: 19:24:44
18:34:49:I1:WU50: Compiler: GNU 8.3.0
18:34:49:I1:WU50: Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
18:34:49:I1:WU50: -fdata-sections -O3 -funroll-loops -fno-pie
18:34:49:I1:WU50: Platform: linux2 4.15.0-128-generic
18:34:49:I1:WU50: Bits: 64
18:34:49:I1:WU50: Mode: Release
18:34:49:I1:WU50: SIMD: avx2_256
18:34:49:I1:WU50: OpenMP: ON
18:34:49:I1:WU50: CUDA: OFF
18:34:49:I1:WU50: Args: -dir 0fAW8v1upcMpJTp_rs0yfVmCeEFVNN5hUgCeVF55BiI -suffix 01
18:34:49:I1:WU50: -version 8.3.18 -lifeline 2282 -np 34
18:34:49:I1:WU50:************************************ libFAH ************************************
18:34:49:I1:WU50: Date: Jan 16 2021
18:34:49:I1:WU50: Time: 19:21:38
18:34:49:I1:WU50: Compiler: GNU 8.3.0
18:34:49:I1:WU50: Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
18:34:49:I1:WU50: -fdata-sections -O3 -funroll-loops -fno-pie
18:34:49:I1:WU50: Platform: linux2 4.15.0-128-generic
18:34:49:I1:WU50: Bits: 64
18:34:49:I1:WU50: Mode: Release
18:34:49:I1:WU50:************************************ CBang *************************************
18:34:49:I1:WU50: Date: Jan 16 2021
18:34:49:I1:WU50: Time: 19:21:24
18:34:49:I1:WU50: Compiler: GNU 8.3.0
18:34:49:I1:WU50: Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
18:34:49:I1:WU50: -fdata-sections -O3 -funroll-loops -fno-pie -fPIC
18:34:49:I1:WU50: Platform: linux2 4.15.0-128-generic
18:34:49:I1:WU50: Bits: 64
18:34:49:I1:WU50: Mode: Release
18:34:49:I1:WU50:************************************ System ************************************
18:34:49:I1:WU50: CPU: AMD EPYC 7K62 48-Core Processor
18:34:49:I1:WU50: CPU ID: AuthenticAMD Family 23 Model 49 Stepping 0
18:34:49:I1:WU50: CPUs: 96
18:34:49:I1:WU50: Memory: 125.63GiB
18:34:49:I1:WU50:Free Memory: 123.72GiB
18:34:49:I1:WU50: Threads: POSIX_THREADS
18:34:49:I1:WU50: OS Version: 6.8
18:34:49:I1:WU50:Has Battery: false
18:34:49:I1:WU50: On Battery: false
18:34:49:I1:WU50: UTC Offset: 1
18:34:49:I1:WU50: PID: 2296
18:34:49:I1:WU50: CWD: /var/lib/fah-client/work
18:34:49:I1:WU50:********************************************************************************
18:34:49:I1:WU50:Project: 19229 (Run 5394, Clone 3, Gen 5)
18:34:49:I1:WU50:Unit: 0x00000000000000000000000000000000
18:34:49:I1:WU50:Digital signatures verified
18:34:49:I1:WU50:Calling: mdrun -c md5.gro -s md5.tpr -x md5.xtc -cpi state.cpt -cpt 5 -nt 34 -ntmpi 1
18:34:49:I1:WU50:Steps: first=2500000 total=3000000
18:34:50:E :Exception: Failed to prevent sleep: Permission denied
18:35:04:I1:WU50:Completed 16802 out of 500000 steps (3%)
18:36:48:I1:WU50:Caught signal SIGINT(2) on PID 2296
18:36:48:I1:WU50:Exiting, please wait. . .
18:36:50:I1:WU50:Folding@home Core Shutdown: INTERRUPTED
18:36:50:I1:WU50:Core returned INTERRUPTED (102)
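For anyone reading a log like this one, a quick sketch that pulls out just the GPU-related lines. It assumes the line shapes seen above and nothing else about the log format; the function name is hypothetical. For context, -1001 from clGetPlatformIDs() is the OpenCL loader reporting that it found no platform, and 100 from cuInit() is the CUDA "no device" error, both of which fit a driver that was not loaded when the client started.

```python
import re

# Patterns modelled on the warning and "gpus = {" lines in the log above.
NOT_SUPPORTED = re.compile(r"(OpenCL|CUDA) not supported: (.+)$")
GPU_ENTRY = re.compile(r'"(gpu:[^"]+)":\s*\{.*"supported":\s*(true|false)')


def summarize_gpu_lines(log_text):
    """Return ({api: error text}, {gpu id: supported flag}) found in a client log."""
    api_errors = {}
    gpus = {}
    for line in log_text.splitlines():
        match = NOT_SUPPORTED.search(line)
        if match:
            api_errors[match.group(1)] = match.group(2).strip()
            continue
        match = GPU_ENTRY.search(line)
        if match:
            gpus[match.group(1)] = match.group(2) == "true"
    return api_errors, gpus


SAMPLE = """\
18:34:48:W :OpenCL not supported: clGetPlatformIDs() returned -1001
18:34:48:W :CUDA not supported: cuInit() returned 100
18:34:48:I3:gpus = {
18:34:48:I3: "gpu:132:00:00": {"vendor": 4318, "device": 10114, "type": "nvidia", "supported": false, "description": "AD104 [GeForce RTX 4070 Ti]"},
18:34:48:I3: "gpu:65:00:00": {"vendor": 4318, "device": 10114, "type": "nvidia", "supported": false, "description": "AD104 [GeForce RTX 4070 Ti]"}
18:34:48:I3:}
"""

errors, gpus = summarize_gpu_lines(SAMPLE)
print(errors)
print(gpus)
```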
-
- Posts: 8
- Joined: Fri Sep 20, 2024 5:00 pm
Re: Thread Limitation?
Sorry I never got to replying to this..
I ended up buying a Windows 10 Enterprise licence and running that, and so far with the V8 client it is running as expected.
I suspect the issues are Nvidia related and nothing to do with the client itself.
HOWEVER, the issue is the way Folding@home loads on Linux: it loads before the GPU drivers do, so you can't enable your GPUs without killing and restarting the process.
SOLUTION: add an option to delay the opening of the client by maybe 60 seconds after the user logs in. That would, in theory, remedy the issue.
-
- Site Moderator
- Posts: 1115
- Joined: Sat Dec 08, 2007 1:33 am
- Location: San Francisco, CA
- Contact:
Re: Thread Limitation?
I think what you want would be to modify the After rules in the fah-client.service file on Linux.
Someone else might know how.
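In the meantime, a stopgap sketch rather than a fix to the service file: wait until the Nvidia driver answers, then restart the client. This assumes `nvidia-smi` is installed, that the unit is named fah-client as above, and that the script runs with enough privilege to restart it. The function names and timings are made up.

```python
import subprocess
import time


def wait_until(check, timeout=120.0, interval=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it returns True or timeout seconds have passed."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def nvidia_driver_ready():
    """True once `nvidia-smi -L` exits cleanly, i.e. the driver can list GPUs."""
    try:
        result = subprocess.run(["nvidia-smi", "-L"],
                                capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def main():
    # Restart only after the driver responds, so the client re-detects the GPUs.
    if not wait_until(nvidia_driver_ready):
        raise SystemExit("GPU driver never became ready")
    subprocess.run(["systemctl", "restart", "fah-client"], check=True)
```

Calling `main()` once after boot would do roughly what the suggested 60 second delay does, but it waits on the driver itself instead of a fixed time.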