another question about threads
Moderators: Site Moderators, FAHC Science Team
-
- Posts: 127
- Joined: Tue Mar 24, 2020 12:47 pm
another question about threads
I apologize if this is answered elsewhere, but I see something that I cannot understand.
I am using Windows 10 on a 20-thread machine. That should mean 5% CPU for each thread. I gave my CPU slot 17 threads. The Windows Task Manager (Details tab) shows 17 threads allocated to the a7 process. However, it is only using 80% of the CPU. I would think that should be 85%. What am I missing?
Additionally, how should the a8 core act?
-
- Posts: 2040
- Joined: Sat Dec 01, 2012 3:43 pm
- Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441
Re: another question about threads
That sounds like a math riddle ... maybe the FAH work unit did not like using 17 threads and lowered it to 16?
-
- Posts: 127
- Joined: Tue Mar 24, 2020 12:47 pm
Re: another question about threads
Well, the a8 core is using 85% (what I expected). So this is either an a7 feature, or (as foldy suggested) something associated with the specific work unit.
-
- Posts: 1996
- Joined: Sun Mar 22, 2020 5:52 pm
- Hardware configuration: 1: 2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
- Location: UK
Re: another question about threads
Check the log ... it will show how many threads are actually used for the a7 WU ... I would expect 17 to be stepped down.
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070
(Green/Bold = Active)
-
- Site Admin
- Posts: 7939
- Joined: Tue Apr 21, 2009 4:41 pm
- Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
- Location: W. MA
Re: another question about threads
If you check the log file for the run command information connected with the CPU WU being processed, you will probably find the core using 16 threads. 17 is a large prime and will not be used directly for computing. The log entry will look a bit like this:
Code: Select all
08:35:44:WU00:FS00:0xa7: SIMD: avx_256
08:35:44:WU00:FS00:0xa7:********************************************************************************
08:35:44:WU00:FS00:0xa7:Project: 16927 (Run 7, Clone 6, Gen 50)
08:35:44:WU00:FS00:0xa7:Unit: 0x000000328120d1c95f930c26c013ad6d
08:35:44:WU00:FS00:0xa7:Digital signatures verified
08:35:44:WU00:FS00:0xa7:Calling: mdrun -s frame50.tpr -o frame50.trr -cpi state.cpt -cpt 15 -nt 2
The '-nt 2' indicates that 2 threads will be used. Depending on the OS and libraries in use, 1 or 2 more threads will be present but mostly inactive for the main code of the core executable. In the example I have here, the 2 threads slice the region being simulated into 2 sections. A higher thread count will result in more slices in up to 3 dimensions. This is why 17 and similar "large" primes and their multiples aren't used: as factors, they result in slices that are too thin in some dimension.
Beyond about 16-18 threads, the Gromacs code used in A7 and A8 can reserve some threads for doing PME calculations separately from the threads for each section. Details on this breakdown would be in either the science.log or md.log files that are part of the work files connected with the running WU.
The current version of the A8 core was created with its '--ntmpi' parameter set to 1. That has some implications for thread usage on larger systems, especially multiple processor systems. It also allows use of some thread counts that would not be used by the A7 core; the full implications of that are still being figured out. One that is known: depending on the WU size in atoms and space, different projects have thread counts beyond which there is little or no improvement in processing time, but the core will still use that many.
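To make the slicing argument a bit more concrete, here is a small Python sketch of my own (not Gromacs code). It lists the most balanced way to split a thread count into a 3-D grid of sections. The function name `most_balanced_grid` is made up for this illustration, and real Gromacs also weighs the box shape, cut-offs, and separate PME threads, all of which this ignores.

```python
def most_balanced_grid(n):
    """Return the (x, y, z) split of n whose largest side is smallest.

    Illustrative only: real Gromacs domain decomposition also considers
    the box dimensions, cut-offs, and separate PME ranks.
    """
    best = None
    for x in range(1, n + 1):
        if n % x:
            continue  # x must divide n exactly
        for y in range(1, n // x + 1):
            if (n // x) % y:
                continue  # y must divide what is left
            z = n // (x * y)
            # Sort descending so tuple comparison looks at the longest side first.
            grid = tuple(sorted((x, y, z), reverse=True))
            if best is None or grid < best:
                best = grid
    return best

for threads in (16, 17, 18):
    print(threads, most_balanced_grid(threads))
```

For 16 this gives (4, 2, 2) and for 18 it gives (3, 3, 2), but a prime like 17 can only be split as (17, 1, 1), meaning 17 thin slices along a single dimension.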
iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
-
- Posts: 127
- Joined: Tue Mar 24, 2020 12:47 pm
Re: another question about threads
Interesting. The log file did show 16 threads on the a7 core and 17 threads on the a8. I never noticed this in the log file before. This answers my question. Thanks.
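For anyone who wants to check this without reading the log by eye, here is a rough Python sketch. It assumes the core writes a line of the form `Calling: mdrun ... -nt N`, as in the a7 example posted earlier in this thread; other cores or client versions may log this differently. The helper names `threads_from_log_line` and `expected_cpu_percent` are hypothetical, just for this example.

```python
import re

def threads_from_log_line(line):
    """Return the value after '-nt' in a 'Calling: mdrun ...' log line, or None."""
    match = re.search(r"Calling: mdrun\b.*?\s-nt\s+(\d+)", line)
    return int(match.group(1)) if match else None

def expected_cpu_percent(threads_used, logical_cpus):
    """Share of total CPU that fully busy threads would show, in percent."""
    return 100.0 * threads_used / logical_cpus

# Log line copied from the example earlier in this thread.
line = ("08:35:44:WU00:FS00:0xa7:Calling: mdrun -s frame50.tpr "
        "-o frame50.trr -cpi state.cpt -cpt 15 -nt 2")
print(threads_from_log_line(line))     # 2
print(expected_cpu_percent(16, 20))    # 80.0
print(expected_cpu_percent(17, 20))    # 85.0
```

On a 20-thread machine, 16 busy threads work out to the 80% seen for a7 and 17 busy threads to the 85% seen for a8.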
-
- Posts: 127
- Joined: Tue Mar 24, 2020 12:47 pm
Re: another question about threads
OK, I finally understand that there is a difference between the number of threads you give to the client and the number of threads the core actually uses. The a7 core has a number of restrictions (prime numbers above 3, the value 10, and there may be more). Apparently a8 has a different set of restrictions. In any event, the client will use the largest number of threads that it can, which should process a given work unit as fast as possible on your machine.
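As a rough illustration of the "prime number above 3" observation only, here is a Python sketch that lowers a requested thread count until none of its prime factors exceeds a limit. This is an assumed rule written for this post, not the client's or core's actual logic; `step_down` and its `max_prime` parameter are hypothetical names.

```python
def largest_prime_factor(n):
    """Largest prime factor of n (returns 1 for n == 1)."""
    largest = 1
    factor = 2
    while factor * factor <= n:
        while n % factor == 0:
            largest = factor
            n //= factor
        factor += 1
    if n > 1:
        largest = n  # whatever remains is itself prime
    return largest

def step_down(requested, max_prime=3):
    """Lower the count until no prime factor exceeds max_prime (assumed rule)."""
    count = requested
    while count > 1 and largest_prime_factor(count) > max_prime:
        count -= 1
    return count

for requested in (17, 16):
    print(requested, "->", step_down(requested))
```

With `max_prime=3` this turns 17 into 16, which matches what the a7 log showed here. For other counts, trust your own log file over this sketch, since the real restrictions may differ.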
-
- Posts: 1996
- Joined: Sun Mar 22, 2020 5:52 pm
- Hardware configuration: 1: 2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
- Location: UK
Re: another question about threads
FYI ... There is also another circumstance, and that is when researchers limit CPU projects to certain thread counts (such as less than 11 or more than 18). Generally your kit won't see those projects if the rule applies. However, occasionally, if there are no other WUs available, the AS (assignment server) may assign you one of these instead of nothing. For instance, my 24 and 32 thread slots occasionally get 10-thread-max WUs when there is a lack of WUs for bigger thread counts.
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070
(Green/Bold = Active)
Re: another question about threads
Task Manager is inaccurate compared to, e.g., HWMonitor.
I think Task Manager is a program carried over from the old Windows NT/95 days.
It doesn't know how to deal with the turbo boost frequencies of CPUs.
It can show them, but the graphs don't accurately represent them.
Try HWMonitor for more accurate results.
-
- Posts: 127
- Joined: Tue Mar 24, 2020 12:47 pm
Re: another question about threads
I already use HWMonitor (for the same reason). However, I have found the details tab of task manager to be pretty accurate.
Re: another question about threads
On my Win 10 computer, Task Manager lies like there is no tomorrow; it is nearly useless. On an 8 core/thread CPU, using 6 threads for CPU folding and one for the GPU, Task Manager shows 100% use, and no, I do not have so much other junk running that it takes up the whole last core. HWMonitor reports a more realistic 90% use.