Ryzen 9 3950x Benchmark Machine: What should I test for you?

PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud
Contact:

Re: Ryzen 9 3950x Benchmark Machine: What should I test for

Post by PantherX »

Starman157 wrote:...I've noticed that GPU processing of WU 13444 has higher performance for the first 27%, then drops about 200,000 PPD for the next 25%, rises again to starting values for the following 25%, then drops 200,000 PPD again to the end. At least, that's on the 6900xt. So it appears non-uniformity isn't just a small-scale issue, but can appear long term and vary depending on where in the WU the processing is. What I was trying to figure out was whether simultaneous CPU processing was causing it. It wasn't. I guess it's just a feature of that WU...
You're right that it's a feature of the WU; specifically, it's a feature of the Moonshot Projects (134XX Series). Those Projects have four phases of workload, where the first and third have the same performance and the second and fourth have the same performance. Here's the sequence:
Equilibrium in molecule A
Nonequilibrium switch from A to B
Equilibrium in B
Nonequilibrium switch from B to A

Thus, you may notice the difference in the TPF (time per frame, reflected in PPD) on your system. This is specific to those GPU Projects. Of course, if there's CPU contention between the GPU and other applications, it can affect the size of that difference.
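To make the TPF-to-PPD relationship concrete, here's a minimal Python sketch using the commonly documented quick-return-bonus formula; the base credit, k-factor, timeout, and TPF values are made-up illustrative numbers, not the actual Project 13444 constants:

```python
import math

def estimated_ppd(tpf_seconds, base_credit, k_factor, timeout_days, frames=100):
    """Rough PPD estimate from TPF, using the commonly documented F@H
    quick-return-bonus formula:
        credit = base_credit * max(1, sqrt(k_factor * timeout_days / wu_days))
    All project constants passed in here are illustrative only."""
    wu_days = tpf_seconds * frames / 86400.0                 # time to finish one WU
    bonus = max(1.0, math.sqrt(k_factor * timeout_days / wu_days))
    credit = base_credit * bonus                             # points per WU
    return credit / wu_days                                  # (WUs/day) * (points/WU)

# Hypothetical numbers: a 60 s TPF in the equilibrium phases vs. 66 s in the
# nonequilibrium switching phases. A ~10% TPF change becomes a much larger
# PPD swing because of the square-root bonus term.
for tpf in (60.0, 66.0):
    print(f"TPF {tpf:5.1f} s -> ~{estimated_ppd(tpf, 32000, 0.75, 2.0):,.0f} PPD")
```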
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
Starman157
Posts: 30
Joined: Tue Jul 14, 2020 12:55 pm
Hardware configuration: 3950x/5700XT, 2600x/5700XT, 2500/1070ti, 1090T/7950, 3570K/NA

Re: Ryzen 9 3950x Benchmark Machine: What should I test for

Post by Starman157 »

@PantherX
Yeah. I've been trying to "nail jelly to the tree": specifically, maximizing efficiency on top of a rather shaky underpinning of ever-changing CPU and GPU demands. Thanks for explaining the idiosyncrasies of the 13xxx WUs. Since I noticed that there was a distinct pattern to the changes in GPU performance, I assumed it was how the WU was "programmed".

As for contention, that's what I've been trying to minimize since I was lucky enough to "score" a 6900xt. The previous card was a 5700xt, and the performance difference between the two is quite shocking. At first blush, it appears the 6900xt is about 3x-5x faster than the 5700xt. As such, I assumed feeding the beastly 6900xt would put more pressure on CPU resources to keep it fed (never mind that its power requirements are at best only a minor increase over the 5700xt's). Tracking down these contention issues when one doesn't fully understand the background of WU processing involves a lot of guesswork. Anyway, it seems everything is running smoothly and I'm happy with the balance I've finally achieved.

Folding ON!
Starman157
Posts: 30
Joined: Tue Jul 14, 2020 12:55 pm
Hardware configuration: 3950x/5700XT, 2600x/5700XT, 2500/1070ti, 1090T/7950, 3570K/NA

Re: Ryzen 9 3950x Benchmark Machine: What should I test for

Post by Starman157 »

@MeeLee
I suspect that the Windows scheduling issues are more of a core-quality issue than a cache issue. I'm assuming, of course, that the Windows scheduler is completely ignorant of the programming of a particular thread and of any possible parallelism that could be achieved by stuffing another thread onto an already-consumed core. As far as the scheduler goes, it only sees that there is a process with something to do: OK, look around for "who" can do it. This is where I think the scheduler gets it wrong. The quality scores (as reported by CTR, "Clock Tuner for Ryzen") on my 3950x range from 193 down to 136. I suspect that Windows is taking these into account when loading up cores. As such, I'm "guessing" that it'd prefer to load a second thread onto a busy, high-quality core rather than use one of the lower-quality cores that isn't doing anything (or is at least doing very little). All I'm doing now is forcing the issue by ensuring that 15 of my best cores are used, with the background Windows services et al. being forced onto the last remaining core. I've run across a free program, Process Lasso, that allows me to set priorities and affinities for the various processes, since FAHClient doesn't have affinity control built into the client.
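For illustration, here's a minimal Python sketch of that kind of affinity forcing using psutil. The logical-CPU numbering, the mapping of the 15 preferred cores to logical CPUs 0-29, and the "FahCore" process-name prefix are my assumptions for the example, not anything built into FAHClient:

```python
import psutil

# Sketch of the affinity forcing described above, in the spirit of what
# Process Lasso is being used for here. Assumptions: a 3950x exposes
# logical CPUs 0-31, Windows enumerates SMT siblings as consecutive pairs,
# the 15 preferred cores map to logical CPUs 0-29, and the FAH worker
# processes have names starting with "FahCore". Logical CPUs 30-31 (the
# last physical core) are deliberately left free for Windows services et al.
BEST_CPUS = list(range(0, 30))

def pin_fah_cores():
    for proc in psutil.process_iter(["name"]):
        name = (proc.info["name"] or "").lower()
        try:
            if name.startswith("fahcore"):
                proc.cpu_affinity(BEST_CPUS)  # keep folding work off the spare core
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            pass  # worker processes come and go, and some processes are protected

if __name__ == "__main__":
    pin_fah_cores()
```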

However, I am forcing the GPU-feeding threads onto the already-busy CPU threads as second threads, since I suspect the workloads are quite different and shouldn't create an FP32 contention issue. Also, the GPU's CPU needs are fairly bursty, at odd occasions (mainly at checkpoint times), so the overall impact of GPU interruptions should be minimal.

Yes, it's difficult measuring all this with so many moving "parts". As such, I'm only interested in maximizing efficiency with the least user input and "futzing". So running down your provided list:

1. Manual is not the way to go. PBO, set and forget. You don't get the same granular capability when going manual, and you only end up creating a LOT of heat. PBO does a much better job than I can and I've been overclocking CPUs since 1985.
2. BIOS automatics only when absolutely necessary. I've hand tuned memory timings along with IF timings too.
3. I don't have control over that other than to ensure it has the latest "production" BIOS. I carefully selected the mobo for the 3950x believing that it's a more than adequate match (I'm using an Asus ROG Crosshair VIII Hero WiFi). The power delivery stages, as determined by others, are more than adequate for stable overclocking of a 3950x.
4. Power settings. Nope. Full power all the time.
5. Cooling the CPU. Thermaltake Water 3.0 360mm radiator using 3 120mm fans on full speed.
6. Never pause a WU unless absolutely necessary; after all, PPD is calculated against TIME.
7. Already know about PPD differences.
8. Ah, the rest of the Windows crap. Turned off if I can do without, or relegated to the unused core if I cannot.

The case is a Lian Li O11 Dynamic XL with the sides off. It's really only meant to "hold" the components. Including the 3 fans for the rad, there are 9 fans in total in the case moving air around to various areas. I learned early on that full-time folding creates a lot of heat, so thermals are an important consideration in my builds (and always have been). Powering all this is a Seasonic Prime Titanium 850W, which also happens to be the lowest recommended power level for a Radeon 6900xt (which runs maxed out at almost 2.7 GHz at 60 C, 80 C Tjunction), presently consuming 241 W (although I've seen it as low as 200 W).

The 3950x runs at 70-75 C (depending on WU), 4.2 GHz (thanks PBO) at 1.3 V or so.

I've taken many considerations into account for this Folding build. The only thing left to maximize efficiency was the affinity control, hence my request; I figured that native control within the application that needs it would be better than external solutions (Process Lasso). I would still like FAHClient to do what is needed, since I'm brute-forcing it after the fact and there's a minor impact to performance at process startup (before Process Lasso gets its hands on things).
BobWilliams757
Posts: 522
Joined: Fri Apr 03, 2020 2:22 pm
Hardware configuration: ASRock X370M PRO4
Ryzen 2400G APU
16 GB DDR4-3200
MSI GTX 1660 Super Gaming X

Re: Ryzen 9 3950x Benchmark Machine: What should I test for

Post by BobWilliams757 »

PantherX wrote:
Starman157 wrote:...I've noticed that GPU processing of WU 13444 has higher performance for the first 27%, then drops about 200,000 PPD for the next 25%, rises again to starting values for the following 25%, then drops 200,000 PPD again to the end...
You're right that it's a feature of the WU; specifically, it's a feature of the Moonshot Projects (134XX Series), which have four phases of workload... Thus, you may notice the difference in the TPF (reflected in PPD) on your system. This is specific to those GPU Projects...
Interesting stuff. I've run a couple of those WUs but never noticed. But then again, on my system the variations are small because my PPD average is small.
Fold them if you get them!
MeeLee
Posts: 1339
Joined: Tue Feb 19, 2019 10:16 pm

Re: Ryzen 9 3950x Benchmark Machine: What should I test for

Post by MeeLee »

Starman157 wrote:@MeeLee
I suspect that the Windows scheduling issues are more of a core-quality issue than a cache issue. I'm assuming, of course, that the Windows scheduler is completely ignorant of the programming of a particular thread... All I'm doing now is forcing the issue by ensuring that 15 of my best cores are used, with the background Windows services et al. being forced onto the last remaining core...
1. Manual is not the way to go. PBO, set and forget...
6. Never pause a WU unless absolutely necessary; after all, PPD is calculated against TIME...
I'd have to disagree with you on some points.
Windows is actually aware of which L-cache the program's data is buffered in. It is more aware than we think!
It even predicts which data will be loaded into the L-cache before the program calls for it.

1- Manual overclocking on a Ryzen is a skill. As long as you have consistent data to crunch (e.g. CPU folding of one specific WU or project), a manual overclock is much better than PBO.
You can increase the CPU frequency by about 5-15% over PBO, because the cores are fixed rather than constantly fluctuating.
On my 3900x, for instance, PBO runs around 3.85 GHz, while I can bump it to 3.92 GHz.
Other projects run at 3.58 GHz, and I can bump them to 3.87 GHz with a manual overclock.
The problem is, when a project uses a power-hungry extension of the CPU, like AVX, the CPU may hit an undervolt and error out.
That's where PBO becomes interesting, especially if low-CPU-intensity projects are mixed with high-intensity projects.

6- People pause WUs when they try to measure performance between hardware, so they can be sure to run a WU from the same project on both pieces of hardware, for instance to measure whether an Asus RTX 2060 is as fast as an MSI 2060 or another card...
The small pause introduced (sometimes with a PC reset) will lower PPD and skew the score.
Starman157
Posts: 30
Joined: Tue Jul 14, 2020 12:55 pm
Hardware configuration: 3950x/5700XT, 2600x/5700XT, 2500/1070ti, 1090T/7950, 3570K/NA

Re: Ryzen 9 3950x Benchmark Machine: What should I test for

Post by Starman157 »

@MeeLee
1 - Manual overclocking with the tools provided by AMD does not have the same voltage and speed granularity that PBO provides. As for my specific 3950x, my own experience has shown that PBO provides 75-100 MHz higher clocks than a manual overclock at a fixed speed and voltage. PBO is also far more dynamic in its settings and adjusts according to load: faster under light loads, slower under heavy ones. Makes sense. A manual overclock also eliminates any possibility of reaching the max boost speed on the 3950x (specifically 4.7 GHz), so say goodbye to your lovely single-thread performance. The Ryzen Master software also forces any overclock to a specific speed (across an entire CCX) and an entirely fixed voltage (which is usually set to achieve stable operation at the speed chosen). Nothing dynamic about it, contrary to PBO. I have managed to pump about 185 W through the 3950x at ALL-CORE LOAD in the quest for the highest stable clocks. The end result? Massive amounts of heat. Fierce, really. Performance? Less than what PBO achieves at much lower voltages, even with the increased speed. Funny thing about overclocking: YMMV (your mileage may vary). So my 3950x is currently running 15 threads of Folding, consuming between 120-130 W (less than the 140 W max) at 4.175 GHz (although it does get up to 4.25 GHz), at the same time as I'm crunching on a 6900xt consuming 250 W at 2.7 GHz. Happy as a clam, as the saying goes.

As for point 6: sure, use whatever tricks you need to "benchmark" a program that wasn't designed as a benchmark by its programmers. Just be aware that those tricks can impact your numbers (so why are you doing it if you want accurate numbers?). Also, since the calculation loads vary by work unit, it's kind of dubious what you're going to find; what may be good for one WU may not be for another. The WUs that you use for benchmarking aren't fixed, and neither is their programming. Just take a look at the nature of the changes in the COVID Moonshot WUs (13xxx) for an example (see earlier in this forum thread).

I'm looking for the most efficient use of my entire set of computing resources (overall system performance - CPU + GPU) for the electricity put in to achieve it, with waste heat (and the noise necessary to remove it) as the byproduct (which is fine for winter here in the cold north - summer is a different story).
Paragon
Posts: 137
Joined: Fri Oct 21, 2011 3:24 am
Hardware configuration: Rig1 (Dedicated SMP): AMD Phenom II X6 1100T, Gigabyte GA-880GMA-USB3 board, 8 GB Kingston 1333 DDR3 Ram, Seasonic S12 II 380 Watt PSU, Noctua CPU Cooler

Rig2 (Part-Time GPU): Intel Q6600, Gigabyte 965P-S3 Board, EVGA 460 GTX Graphics, 8 GB Kingston 800 DDR2 Ram, Seasonic Gold X-650 PSU, Arctic Cooling Freezer 7 CPU Cooler
Location: United States

Re: Ryzen 9 3950x Benchmark Machine: What should I test for

Post by Paragon »

Hi everyone,

Sorry for the delay. I had to do more testing than planned due to the new core a8 work units messing things up (had to wait for a7 work units to be comparable to previous plots). I also got side-tracked by my main writing gig (sci-fi novels), but I wasn't going to leave you all hanging.

Here's part 4: https://greenfoldingathome.com/2021/02/ ... e-and-smt/

Key takeaways: The auto-overclocking on the Ryzen 9 (CPB) takes a huge chunk out of efficiency for only a modest performance improvement.

Side-Note: A8 work units are pretty great, but I didn't do much with them because I don't have a basis of comparison to the older tests.

Also, it looks like the reason there is a big nose-dive in performance and efficiency in the 17-25 thread region is Windows 10 itself, namely in how it chooses to keep a few physical CPU cores free and loads up logical processors with the work units. I have a few Ryzen Master screenshots in there showing this activity. Eventually, when you throw a hard enough problem at the processor with enough threads, Windows stops being silly and really cranks it up. It's now on my list to someday investigate this in Ubuntu to see if the Linux task scheduler does this or not.
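To see whether logical processors on the same physical core are being doubled up while other cores sit mostly idle, a small monitor like the Python sketch below can help. The assumption that SMT siblings are numbered as consecutive pairs is mine, not something stated in Paragon's post, and is worth verifying on your own machine:

```python
import psutil

# Sample per-logical-CPU load and print it grouped by (assumed) SMT sibling
# pairs. Assumption: logical CPUs are enumerated as consecutive sibling pairs
# (0,1), (2,3), ..., which is typical for Ryzen on Windows but should be
# verified locally (e.g. with a tool like Coreinfo).
def sample_sibling_pairs(interval=2.0):
    loads = psutil.cpu_percent(interval=interval, percpu=True)
    for core in range(len(loads) // 2):
        a, b = loads[2 * core], loads[2 * core + 1]
        print(f"core {core:2d}: {a:5.1f}% / {b:5.1f}%")

if __name__ == "__main__":
    while True:
        sample_sibling_pairs()
        print("-" * 24)
```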
MeeLee
Posts: 1339
Joined: Tue Feb 19, 2019 10:16 pm

Re: Ryzen 9 3950x Benchmark Machine: What should I test for

Post by MeeLee »

Paragon wrote:...it looks like the reason there is a big nose-dive in performance and efficiency in the 17-25 thread region is Windows 10 itself, namely in how it chooses to keep a few physical CPU cores free and loads up logical processors with the work units... It's now on my list to someday investigate this in Ubuntu to see if the Linux task scheduler does this or not.
I think this is more about how the AMD drivers work.
They don't utilize the CPU fully until there's a >75-80% CPU load.
The same is true for Linux, by the way.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Ryzen 9 3950x Benchmark Machine: What should I test for

Post by bruce »

It depends on how the task scheduler allocates resources and what the programmer was thinking the last time he changed that code. For example, does the OS change its behavior when it encounters hardware that's running big.LITTLE ... and how does it handle a pair of HyperThreaded CPUs when both code segments are predominantly integer or predominantly floating point (or one of each)? [Is your OS smart enough to handle FAHCore_a* differently than FAHCore_7* / _8*?]