Best practices for CPU folding on dual-CCD AMD Ryzen 3000/5000/7000 series
Posted: Thu May 18, 2023 12:38 am
Since 12- and 16-core Ryzen CPUs have dual CCDs/chiplets, is it generally recommended to divide the available CPU cores into two CPU folding slots with up to 16 threads each (multithreading enabled, of course), or not? I'm on Windows 11, by the way. This seems particularly relevant for the 12- and 16-core Ryzen 7000-series X3D parts, since the CCD with the stacked L3 cache die is clocked lower than the other CCD (and as we know, CPU folding is always limited to the speed of the lowest-performing core).
I have a newly purchased 7950X3D CPU with AMD's thread management driver installed, and it seems that CPU folding threads still get bounced around more or less randomly between the CCDs by the Windows scheduler, which is generally bad for performance on AMD hardware. Why is the bloody scheduler still so brainless, after all this time?
I tried using Process Lasso to limit one of the CPU folding slots to logical cores 1-15 only, and the other folding slot to logical cores 17-32 (leaving one logical core free per CCD), but it seems Process Lasso is unable to distinguish between multiple instances of the same executable, and would set both folding slots to utilize the same set of CPU cores (rrrrroooggggnnntudjuuuuuu!!!! lol), with the expected disastrous performance dropoff as a result.
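For anyone who wants to try pinning by hand, here is a rough sketch of how the two affinity bitmasks for that split could be worked out. It assumes Windows numbers logical processors from 0, with 0-15 on the first CCD and 16-31 on the second; that is my assumption, not something I have confirmed for every board or BIOS. The helper name `affinity_mask` is made up for illustration.

```python
def affinity_mask(first: int, last: int, reserve: int = 0) -> int:
    """Return a bitmask selecting logical processors first..last
    (0-based, inclusive), leaving the highest `reserve` processors
    in that range unselected."""
    if first < 0 or last < first or reserve < 0 or reserve > last - first:
        raise ValueError("invalid processor range or reserve count")
    mask = 0
    for cpu in range(first, last - reserve + 1):
        mask |= 1 << cpu  # one bit per logical processor
    return mask


# ASSUMPTION: logical processors 0-15 sit on one CCD and 16-31 on the other.
# One logical processor per CCD is left free, as in the post above.
CCD0 = affinity_mask(0, 15, reserve=1)
CCD1 = affinity_mask(16, 31, reserve=1)

print(f"CCD0 slot mask: {CCD0:#x}")  # 0x7fff
print(f"CCD1 slot mask: {CCD1:#x}")  # 0x7fff0000
```

The two masks do not overlap, so if each folding slot runs as its own process (another assumption on my part), they would have to be applied per process ID rather than per executable name, which is exactly where the by-name approach fell over for me.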
Could we please maybe perhaps get a core/thread affinity locking feature built right into the client instead? It would seem the best solution to the issue. Who knows how much performance is lost by Windows' braindead scheduler spreading a folding slot across two separate CCDs (maybe more than two, in the case of a Threadripper/Epyc processor).
I know for a fact the scheduler will spread threads across both CCDs on my CPU. To see this in action, simply pause one of the CPU folding slots and open Task Manager's CPU load graph (set to show logical processors).