You have to get all the work done on the P-cores; if E-cores get involved, they'll just slow FAH down. That processor has 8 P-cores, which can run 16 threads. Set the number of threads to 16 (in FAH Control: Configure -> Slots -> cpu slot -> CPUs). Then use Process Lasso to bind the FahCore A8 / A9 process to the P-cores. I don't use Process Lasso myself, so I can't tell you exactly how to do that.
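To make the "bind to the P-cores" step a bit more concrete, here is a rough sketch of which logical CPUs that would mean and what the matching affinity bitmask looks like. It assumes the P-core threads are enumerated first (logical CPUs 0-15 on an 8 P-core chip with Hyper-Threading), which is the usual layout on these Intel chips but worth checking on your own machine. The function names are made up for illustration; this is not anything FAH or Process Lasso provides.

```python
def p_core_logical_cpus(p_cores: int, threads_per_core: int = 2) -> list[int]:
    """Logical CPU indices belonging to the P-cores.

    ASSUMPTION: the OS lists the P-core threads first, so they occupy
    indices 0 .. p_cores * threads_per_core - 1. Verify this on your system.
    """
    return list(range(p_cores * threads_per_core))


def affinity_mask(cpus: list[int]) -> int:
    """Build a CPU affinity bitmask with one bit set per logical CPU index."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


if __name__ == "__main__":
    cpus = p_core_logical_cpus(8)      # 8 P-cores, 2 threads each
    print(cpus)                        # logical CPUs 0 through 15
    print(hex(affinity_mask(cpus)))    # 0xffff
```

The list of indices is what you would tick in an affinity dialog, and the hex mask is the same selection in bitmask form. If you would rather script it, the third-party psutil package has a Process.cpu_affinity() method that takes such a list of CPU indices; that route is untested against FahCore here.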
The CPU core processes are either A8 or A9 (A9 is fairly new, and won't pop up very often yet). Chances are very low that you'd still get A7. The GPU core process is something like FahCore 22 - that can happily run on an E-core if you want to fold with your GPU.
When you increase the number of threads (CPUs), the change is only applied to the next job. When you decrease the number of threads, it's applied immediately. There is probably some good design reason for that, but it confused the hell out of me when I started on this, so I'm just mentioning it.
Even in the best case, I would not expect (much) more than 400k PPD.
The thing with Intel 12th and 13th gen is the big.LITTLE-style hybrid architecture (P-cores plus E-cores). Windows 11 is supposed to be able to deal with that, but it still seems to have issues with it, and applications aren't always tuned for it either. FAH, at least, is not yet.