If you check the log for the run command used for the CPU WU being processed, you will probably find the core using 16 threads: 17 is a large prime and will not be used directly for computing. The log entry will look something like this:
Code:
08:35:44:WU00:FS00:0xa7: SIMD: avx_256
08:35:44:WU00:FS00:0xa7:********************************************************************************
08:35:44:WU00:FS00:0xa7:Project: 16927 (Run 7, Clone 6, Gen 50)
08:35:44:WU00:FS00:0xa7:Unit: 0x000000328120d1c95f930c26c013ad6d
08:35:44:WU00:FS00:0xa7:Digital signatures verified
08:35:44:WU00:FS00:0xa7:Calling: mdrun -s frame50.tpr -o frame50.trr -cpi state.cpt -cpt 15 -nt 2
The '-nt 2' indicates that 2 threads will be used. Depending on the OS and libraries in use, one or two additional threads may be present but mostly idle as far as the core executable's main code is concerned. In the example here, the 2 threads slice the region being simulated into 2 sections. A higher thread count results in more slices, in up to 3 dimensions. This is why 17 and similar "large" primes, and their multiples, aren't used: as factors they would produce slices that are too thin in some dimension.
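As a rough illustration of the point (this is not the actual GROMACS decomposition algorithm, just a sketch of the arithmetic), a thread count decomposes well when it factors into up to three reasonably balanced dimensions. A large prime like 17 only factors as 1×1×17, which forces very thin slabs:

```python
def factorizations_3d(n):
    """Return all ways to write n as a*b*c with a <= b <= c."""
    out = []
    for a in range(1, int(round(n ** (1 / 3))) + 1):
        if n % a:
            continue
        m = n // a
        for b in range(a, int(m ** 0.5) + 1):
            if m % b == 0:
                out.append((a, b, m // b))
    return out

def most_balanced(n):
    """Pick the factorization with the smallest largest/smallest ratio."""
    return min(factorizations_3d(n), key=lambda f: f[2] / f[0])

print(most_balanced(16))  # (2, 2, 4): reasonably box-like slices
print(most_balanced(17))  # (1, 1, 17): a prime forces 17 thin slabs
```

With 16 threads the best split is 2×2×4; with 17 the only option is 1×1×17, and each slab would be too thin in one dimension to hold a sensible chunk of the simulated region.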
Beyond about 16-18 threads, the Gromacs code used in A7 and A8 can reserve some threads for doing PME calculations separately from the threads assigned to each section. Details on this breakdown are in the science.log or md.log files among the work files for the running WU.
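Gromacs decides the PP/PME split with its own tuning heuristics and logs the choice in md.log; the sketch below only illustrates the general shape of the idea. The threshold and fraction values are made up for the example, not real Gromacs tuning numbers:

```python
def split_pme(total_threads, threshold=18, pme_fraction=0.25):
    """Illustrative split of threads into direct-space (PP) and
    dedicated PME groups. threshold and pme_fraction are placeholder
    values for the sketch, not Gromacs's actual heuristics."""
    if total_threads < threshold:
        # below the threshold every thread handles both kinds of work
        return total_threads, 0
    pme = max(1, round(total_threads * pme_fraction))
    return total_threads - pme, pme

print(split_pme(16))  # (16, 0): no separate PME group
print(split_pme(32))  # (24, 8): some threads reserved for PME
```

The real split also depends on the system size and box shape, which is why md.log for the specific WU is the place to see what was actually chosen.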
The current version of the A8 core was built with its '--ntmpi' parameter set to 1. That has some implications for thread usage on larger systems, especially multi-processor systems. It also allows some thread counts that the A7 core would not use; the full implications of that are still being worked out. One known effect is that, depending on the WU's size in atoms and space, each project has a thread count beyond which there is little or no improvement in processing time. But the core will still use that many threads.
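The diminishing returns at high thread counts follow the usual strong-scaling pattern: once each thread's slice of the system gets small, fixed overheads dominate. A simple Amdahl's-law sketch shows the shape of the curve; the 5% serial fraction here is an arbitrary illustration, not a measured number for any FAH project:

```python
def amdahl_speedup(threads, serial_fraction=0.05):
    """Ideal speedup with a fixed serial fraction (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / threads)

# Speedup plateaus well below the thread count as threads grow.
for n in (2, 8, 16, 32, 64):
    print(n, round(amdahl_speedup(n), 1))
```

With a 5% serial fraction, 64 threads deliver only about 15x, barely better than 32 threads, which matches the observation that adding threads past a project-dependent point buys little while the core still occupies them.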