Re: Ryzen 9 3950x Benchmark Machine: What should I test for
Posted: Sat May 23, 2020 11:21 pm
by Paragon
MeeLee wrote:The problem with the 3900x and the 3950x is that motherboards with a VRM count of 10 usually can't supply enough power to run the CPU at its rated frequencies.
MSI sells boards that have 12 VRMs, where the CPU uses 3x 4-pin power connectors from the PSU.
I actually bought the Asus Prime X570-P board because it has the same VRM as the top-tier boards, and heavy-duty heat sinking (although not active cooling). It has one 4-pin and one 8-pin CPU power connector, and supports over 340 watts of socket power. I did try pushing it and found that I could get to the advertised 4.3 GHz all-core turbo fairly easily. Temps were warm at 90 C, but it never thermally throttled (that happens at 95 C). Before I upgraded to the Noctua cooler, I was using a cheap tower cooler that hit 95 C at near-stock frequencies.
MeeLee wrote:The second issue is that you'll possibly need to set the CPU frequency to fixed.
If you're using PBO, your frequencies are all over the place.
And quite often you can gain 25-50 MHz just by going to a fixed frequency.
For this testing, I disabled PBO and CPB, and am running all cores at the stock 3.5 GHz. This is to get a fair apples to apples comparison of thread scaling, taking the MHz out of the equation. I plan to redo all tests with frequency scaling enabled to see how that reshapes the performance and efficiency curves.
MeeLee wrote:
Fourth, the performance of Ryzen 9 3000 series CPUs depends heavily on RAM speed.
If you run them with stock 2133 MHz RAM, they perform much worse than with faster RAM.
Tests published online indicate that they run best with DDR4 3700 MHz modules, because the RAM and the Infinity Fabric operate at the same frequency.
Most older Ryzen 9 3000 CPUs had an Infinity Fabric that could do up to 1800 MHz.
Newer chiplets have been slightly optimized and can run the Infinity Fabric at up to 1900 MHz.
Since RAM is Double Data Rate, a 3600 MHz module actually operates at 1800 MHz.
Your Infinity Fabric speed should be linked to the RAM speed (don't set it to auto).
There is some debate, though, about whether setting IF to auto slows down the bus, resulting in less heat and higher core boost frequencies.
The downside is that data will be read more slowly from RAM.
That is why most BIOS versions offer a 'performance' setting for the IF, which keeps it running at its maximum speed all the time.
The heat penalty is minor, so long as the system runs stable.
On average, IF can be safely overclocked to 1850 MHz (not on older CPUs); paired with DDR4 4000 MHz memory, you can then get the most out of your system by running the memory at 3700 MHz with lower CAS latencies.
But if that memory is too costly (it's hard to get nowadays), Amazon sells 3600 MHz modules for $75 (for 2x8GB sticks), which can safely be overclocked to 3700 MHz using the same latencies as at 3600 MHz.
Got that covered too. I bought some pretty nice Corsair Vengeance LPX modules, rated for 3600 MHz out of the box. The XMP profile / memory auto-tune was enabled in the BIOS. Before running all the F@H tests, I confirmed the memory was at 3600 MHz. The Infinity Fabric was running in linked mode with the memory (configured in Ryzen Master), which for 3600 MHz DDR4 means an 1800 MHz fabric clock. Good tip about pushing to 3700 MHz. I might try that, and also try dropping down to 2133 MHz to see the effect on performance and efficiency.
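The clock arithmetic in the quoted advice can be sketched as follows. This is only an illustration: the function names are made up for this example, and the 1800 MHz and 1900 MHz fabric ceilings are the rough figures quoted above, not guaranteed limits for any particular chip.

```python
def memory_clock_mhz(ddr_rating: int) -> float:
    """Actual memory clock for a DDR rating (two transfers per clock cycle)."""
    return ddr_rating / 2


def can_run_linked(ddr_rating: int, fabric_limit_mhz: float = 1800.0) -> bool:
    """True if a 1:1 linked Infinity Fabric clock stays within the assumed ceiling.

    fabric_limit_mhz is an assumption taken from the quoted post, not a spec value.
    """
    return memory_clock_mhz(ddr_rating) <= fabric_limit_mhz


for rating in (2133, 3600, 3700, 4000):
    print(rating, memory_clock_mhz(rating), can_run_linked(rating))
```

With the default 1800 MHz ceiling, 3600 MHz memory stays linked while 3700 MHz does not; passing fabric_limit_mhz=1900 models the newer chiplets mentioned in the quote.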
MeeLee wrote:Once you have the PPD results of all core/threads, you'll have to redo the CPU tuning with SMT disabled, as you probably will be able to overclock to higher CPU speeds.
It's a tedious project that will probably keep you busy for an entire day, just to get the voltages, and PBO settings correctly, without running the CPU in excess of 90C (60-80C preferably).
The plan for now is to rerun with SMT disabled while keeping CPB off, to see the effect of that one change. Whichever curve set ends up being faster (SMT on vs. SMT off, both with CPB off) will then be tuned with CPB on and/or PBO. I don't think temps will be a problem, since I went completely overkill on cooling and have already done a few tests with PBO enabled and +100 watts of socket power set in the BIOS; running all-core at 4.3 GHz, it stayed under 90 C.
MeeLee wrote:
But (stable) manual overclocking offers much more consistent results than with PBO.
Trying to determine PPD with PBO enabled, will not only lower your PPD, but will also be very inconsistent.
Almost as if you're trying to get an average on a random number generator.
Yeah, I saw some of that in preliminary testing. We're talking about a 5% performance boost (if that) at the expense of something like 100 watts more electrical consumption. Some offset frequency / voltage tuning is likely the way to go. This project makes me feel like I'll be busy tweaking this CPU for the rest of the year and never get back to GPUs (which is fine considering my compute budget is blown for the year already).
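To put the "5% for 100 watts" trade-off in efficiency terms, here is a minimal sketch. The helper names are hypothetical, and the 200 W baseline is a placeholder rather than a measured value; only the 5% gain and the extra 100 W come from the post above (the 400K PPD figure is the rough number from the first test run quoted later in the thread).

```python
def ppd_per_watt(ppd: float, watts: float) -> float:
    """Efficiency metric: points per day per watt."""
    return ppd / watts


def efficiency_change(base_ppd: float, base_watts: float,
                      ppd_gain_fraction: float, extra_watts: float) -> float:
    """Relative change in PPD/W after a tuning change (negative means worse)."""
    before = ppd_per_watt(base_ppd, base_watts)
    after = ppd_per_watt(base_ppd * (1 + ppd_gain_fraction),
                         base_watts + extra_watts)
    return after / before - 1


# Placeholder baseline (NOT measured): 400K PPD at 200 W.
change = efficiency_change(400_000, 200.0, 0.05, 100.0)
print(f"PPD per watt changes by {change:+.1%}")
```

Whatever the real baseline power turns out to be, a small PPD gain bought with a large power increase lowers PPD per watt, which is why offset tuning looks more attractive here.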
Re: Ryzen 9 3950x Benchmark Machine: What should I test for
Posted: Sun May 24, 2020 12:33 am
by MeeLee
You can just put your finger on the VRM cooling module. If you're using air cooling, it should be warm. But with water cooling and insufficient airflow, they tend to surpass 60 C on the top (meaning the VRMs will run close to 90 C), at which point putting your finger on it is actually painful.
In that scenario, some sort of extra airflow is needed.
I'm wondering if there's a FAHBench update for core 22, to see if this gives quicker and more consistent results?
I'm fairly sure core 22 and a fixed WU (or a set of WUs of different projects) can be run multiple times over.
The chances of getting a WU of a specific project are pretty slim.
And getting an average over all projects would take weeks.
It'd be much easier to achieve if you had three identical PCs with three different configurations running the same projects for the same number of weeks, to get a consistent result.
Re: Ryzen 9 3950x Benchmark Machine: What should I test for
Posted: Sun May 24, 2020 1:42 am
by Paragon
MeeLee wrote:You can just put your finger on the VRM cooling module. If you're using air cooling, it should be warm. But with water cooling and insufficient airflow, they tend to surpass 60 C on the top (meaning the VRMs will run close to 90 C), at which point putting your finger on it is actually painful.
In that scenario, some sort of extra airflow is needed.
I'm wondering if there's a FAHBench update for core 22, to see if this gives quicker and more consistent results?
I'm fairly sure core 22 and a fixed WU (or a set of WUs of different projects) can be run multiple times over.
The chances of getting a WU of a specific project are pretty slim.
And getting an average over all projects would take weeks.
It'd be much easier to achieve if you had three identical PCs with three different configurations running the same projects for the same number of weeks, to get a consistent result.
Man I wish I had three of these! Just one broke the bank, haha. It is going to take a while, but that's fine. The results should be worth it...I haven't seen an in-depth core study like this done for a while.
I am using air cooling (Dual tower Noctua NH-D15 SE):
https://www.amazon.com/Noctua-NH-D15-SE ... B01NC06ZYT
It's got two 140 mm fans, and all that air blows over the VRMs and then goes out the 120 mm case exhaust at the back. The case also has a 120 mm top fan and an 80 mm side fan up by the CPU to dump heat, and is fed by two 120 mm front intakes (one of which I custom mounted in the 5.25 drive bays so it blows cold air right at the CPU cooler). With this setup, I see 30-35 C at CPU idle, 55 C fully loaded (CPB off), 70 C fully loaded (CPB on), and 90 C maxed out (maximum PBO wattage settings). The VRMs on the Asus Prime X570-P are pretty good but not great; still, they are enough to overclock the 3950x according to what I've read (and I was able to push it nicely). I'll have to bust out my thermal camera and image the VRMs to see how they are doing.
Re: Ryzen 9 3950x Benchmark Machine: What should I test for
Posted: Sun May 24, 2020 3:09 pm
by MeeLee
Well, the 3950x is listed at 3500 MHz, so anything above this is overclocking.
Personally, I run them at 3.7-3.9 GHz with only 10 VRMs (2x 4-pin CPU plug from the PSU).
Only one 3900x I run at 3.5 GHz, at the 'eco' setting, because the case is too small to keep the CPU, GPU, and VRMs cool.
Re: Ryzen 9 3950x Benchmark Machine: What should I test for
Posted: Wed May 27, 2020 7:03 am
by PantherX
MeeLee wrote:...I'm wondering if there's a FAHBench update for core 22, to see if this gives quicker and more consistent results?
I'm fairly sure core 22 and a fixed WU (or a set of WUs of different projects) can be run multiple times over.
The chances of getting a WU of a specific project are pretty slim...
An update to FAHBench is planned and will be next after the CUDA version of FahCore_22. There's no ETA.
In the meantime, I did write about how to capture a WU and use it to benchmark the system, so that you get consistent results across various hardware without waiting for weeks or hurting science progress: viewtopic.php?p=335217#p335217
Re: Ryzen 9 3950x Benchmark Machine: What should I test for
Posted: Wed Jun 03, 2020 6:46 am
by Juggy
I am getting rid of my 9900KF and replacing it with a 3950X on an Asus X570I Gaming ITX board. According to BuildZoid it has a very, very good VRM, so I am hoping I can run it at around 4 GHz permanently.
I'm also putting in 2x16GB 3600 G-Skill Trident Z Neo CL16 modules, so hopefully it will run smoothly.
I may have missed it somewhere, but what are the ideal slot settings for the 3950X, bearing in mind I have a 2080 Super as well?
Re: Ryzen 9 3950x Benchmark Machine: What should I test for
Posted: Wed Jun 03, 2020 7:05 am
by MeeLee
It has 16 cores and 32 threads.
If you go Linux, 15 cores (no SMT) is good.
For Windows, you might need to set it to 14 cores.
With SMT, both Windows and Linux should work well with 30 out of 32 threads.
While I have a 3900x and a 3950x, they are currently occupied running other projects, so I won't test F@H on them.
But PPD still needs to be tested with SMT enabled and disabled.
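The rule of thumb in this post can be written down as a small helper. This is just a sketch of the suggestion above, not an official F@H recommendation; the function name and parameters are made up for illustration, and the reserve counts are generalized from the 16-core numbers given in the post.

```python
def suggested_cpu_threads(physical_cores: int, smt_enabled: bool, os_name: str) -> int:
    """Thread count for the CPU slot, following the rule of thumb above."""
    if smt_enabled:
        # With SMT, leave two hardware threads free (e.g. 30 of 32).
        return physical_cores * 2 - 2
    # Without SMT, leave one core free on Linux and two on Windows.
    reserve = 1 if os_name.lower() == "linux" else 2
    return physical_cores - reserve


for os_name in ("Linux", "Windows"):
    for smt in (False, True):
        print(os_name, "SMT" if smt else "no SMT",
              suggested_cpu_threads(16, smt, os_name))
```

The threads left free are what keep the GPU slot (the 2080 Super in this case) and the OS fed.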
Re: Ryzen 9 3950x Benchmark Machine: What should I test for
Posted: Wed Jun 03, 2020 8:37 am
by PantherX
This post from _r2w_ben has some good insights into the CPU Projects and the usable CPU values: viewtopic.php?f=72&t=34350&start=45
Re: Ryzen 9 3950x Benchmark Machine: What should I test for
Posted: Thu Jun 04, 2020 6:25 pm
by Nathan_P
Paragon wrote:Thanks for all the replies! I can see there are endless possibilities for testing here. One thing I've found is that this is going to be an ongoing adventure, since it takes so long to even get one data set across these 32 threads.
Here is part 1, the simplest test: 1-32 cores, just pulling PPD out of the client. The plot everyone wants to see is at the bottom. Work unit variability messes up the plot towards the high end, but I think the trend is pretty clear. Also, seeing a CPU do over 400K PPD is pretty nuts.
I am currently running this all over again, logging power as well as running multiple work units per core setting. I think some averaging will clear up the trend.
https://greenfoldingathome.com/2020/05/ ... f-threads/
Run some work units on Linux; you should see quite the jump in PPD.
Re: Ryzen 9 3950x Benchmark Machine: What should I test for
Posted: Sat Jun 06, 2020 2:04 pm
by jchang6
Interesting. My 10-core Xeon E5 (SMT on, underclocked to 2 GHz from its rated 2.2 GHz, 2.5 GHz turbo) is currently getting 50K PPD, but I have seen it as high as 70-100K. It is very unclear to me what the effects of SMT are, as there is variability from run to run; is it project related? High SMT scaling implies that much of the core time is spent idle, waiting on round-trip memory accesses. Workloads that stream memory or run entirely within the on-die cache may show poor, zero, or even negative SMT scaling. The memory footprint of FahCore_a7.exe as shown in Task Manager seems low, about 10 MB per CPU? But what is the true, very frequently accessed memory footprint: somewhat lower, or much lower? The Xeon E5 has 256 KB of L2 per core and 2.5 MB of shared L3 per slice. The Ryzen L3 is 16 MB shared over 4 cores. So is the better per-core performance of the Ryzen due to the higher frequency or the larger L2/L3, and to what degree?
Re: Ryzen 9 3950x Benchmark Machine: What should I test for
Posted: Sat Jun 06, 2020 2:26 pm
by Neil-B
@jchang6 ... Have you had a recent patch updating the Intel microcode? If so, your recent lower PPD may well be the impact of that ...
viewtopic.php?f=38&t=35473
Re: Ryzen 9 3950x Benchmark Machine: What should I test for
Posted: Sat Jun 06, 2020 4:00 pm
by Paragon
jchang6 wrote:Interesting. My 10-core Xeon E5 (SMT on, underclocked to 2 GHz from its rated 2.2 GHz, 2.5 GHz turbo) is currently getting 50K PPD, but I have seen it as high as 70-100K. It is very unclear to me what the effects of SMT are, as there is variability from run to run; is it project related? High SMT scaling implies that much of the core time is spent idle, waiting on round-trip memory accesses. Workloads that stream memory or run entirely within the on-die cache may show poor, zero, or even negative SMT scaling. The memory footprint of FahCore_a7.exe as shown in Task Manager seems low, about 10 MB per CPU? But what is the true, very frequently accessed memory footprint: somewhat lower, or much lower? The Xeon E5 has 256 KB of L2 per core and 2.5 MB of shared L3 per slice. The Ryzen L3 is 16 MB shared over 4 cores. So is the better per-core performance of the Ryzen due to the higher frequency or the larger L2/L3, and to what degree?
It's hard to say what drives the Ryzen's performance vs. a Xeon. The cache differences are certainly part of it, but the overall architecture is also much newer on the Ryzen (2019) than the Xeon (2016). The clock rate difference is also massive (3.5 GHz on the Ryzen vs. 2.2 on the Xeon). That Intel security patch doesn't do the Xeon any favors either.
Looking at a non-F@h multicore benchmark is helpful. Here's the Xeon:
https://www.cpubenchmark.net/cpu.php?cp ... Hz&id=2758
And here's the Ryzen 3950x
https://www.cpubenchmark.net/cpu.php?cp ... 0X&id=3598
The Ryzen is scoring almost 4x higher in the PassMark multicore test. This is due to many factors (6 more cores, 1.3 GHz or so faster, more cache, faster bus, not crippled by security updates, etc.).
A more fair comparison would be to run a 10 core solve on the Ryzen with the frequency set to match the Xeon and see what happens.
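As a rough illustration of how much of that gap cores and clock speed alone could account for, here is a back-of-the-envelope sketch. It assumes (unrealistically) perfect scaling with core count and frequency, takes the 16-core / 3.5 GHz and 10-core / 2.2 GHz figures from this thread, and rounds "almost 4x" to 4.0; the variable names are just for illustration.

```python
# Figures from the thread; base clocks only, turbo ignored.
ryzen_cores, ryzen_ghz = 16, 3.5
xeon_cores, xeon_ghz = 10, 2.2

# Naive throughput ratio if performance scaled perfectly with cores x clock.
naive_ratio = (ryzen_cores * ryzen_ghz) / (xeon_cores * xeon_ghz)

# "Almost 4x" from the PassMark pages, rounded up; an approximation.
observed_ratio = 4.0

# Whatever is left over would come from architecture, cache, memory, patches, etc.
residual = observed_ratio / naive_ratio

print(f"cores x clock alone: {naive_ratio:.2f}x")   # 2.55x
print(f"left to explain:     {residual:.2f}x")      # 1.57x
```

So under these assumptions, core count and frequency cover roughly 2.5x of the difference, and the remaining factor of about 1.5 is what a frequency-matched 10-core run would help isolate.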
And yes, there is definitely large variation in work unit PPD, by as much as 10-20 percent. My in-process testing shows a lot of variation when running with more threads than physical CPU cores (relying on SMT), so you might be on to something there. Work units that are hogging the entire FPU on each core, executing within cache, will show little to no improvement as more SMT threads are added. I am going to do my testing on both one work unit (swept through thread count as suggested above) and on a random mix of work units (averaging the result), and will present a plot of the mean, variance, and 95 percent confidence interval of where the typical work unit score falls for a given CPU thread count.
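A minimal sketch of that kind of summary, assuming PPD readings are collected per thread count. The function name is hypothetical, the sample values are placeholders rather than real measurements, and the interval is a normal-approximation confidence interval for the mean (with only a handful of work units per setting, a t-based interval would be wider).

```python
import math
import statistics


def summarize_ppd(samples):
    """Return (mean, sample variance, 95% CI for the mean) of PPD readings."""
    n = len(samples)
    mean = statistics.mean(samples)
    variance = statistics.variance(samples)  # sample variance, divides by n - 1
    half_width = 1.96 * math.sqrt(variance / n)  # normal approximation
    return mean, variance, (mean - half_width, mean + half_width)


# Placeholder PPD readings keyed by thread count; NOT real measurements.
runs = {
    16: [300_000, 310_000, 290_000],
    32: [400_000, 440_000, 360_000],
}

for threads, samples in sorted(runs.items()):
    mean, variance, (low, high) = summarize_ppd(samples)
    print(f"{threads} threads: mean={mean:.0f} var={variance:.0f} "
          f"95% CI=({low:.0f}, {high:.0f})")
```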
Re: Ryzen 9 3950x Benchmark Machine: What should I test for
Posted: Sat Jun 06, 2020 4:13 pm
by MeeLee
Paragon wrote:jchang6 wrote:Interesting. My 10-core Xeon E5 (SMT on, underclocked to 2 GHz from its rated 2.2 GHz, 2.5 GHz turbo) is currently getting 50K PPD, but I have seen it as high as 70-100K. It is very unclear to me what the effects of SMT are, as there is variability from run to run; is it project related? High SMT scaling implies that much of the core time is spent idle, waiting on round-trip memory accesses. Workloads that stream memory or run entirely within the on-die cache may show poor, zero, or even negative SMT scaling. The memory footprint of FahCore_a7.exe as shown in Task Manager seems low, about 10 MB per CPU? But what is the true, very frequently accessed memory footprint: somewhat lower, or much lower? The Xeon E5 has 256 KB of L2 per core and 2.5 MB of shared L3 per slice. The Ryzen L3 is 16 MB shared over 4 cores. So is the better per-core performance of the Ryzen due to the higher frequency or the larger L2/L3, and to what degree?
It's hard to say what drives the Ryzen's performance vs. a Xeon. The cache differences are certainly part of it, but the overall architecture is also much newer on the Ryzen (2019) than the Xeon (2016). The clock rate difference is also massive (3.5 GHz on the Ryzen vs. 2.2 on the Xeon). That Intel security patch doesn't do the Xeon any favors either.
Looking at a non-F@h multicore benchmark is helpful. Here's the Xeon:
https://www.cpubenchmark.net/cpu.php?cp ... Hz&id=2758
And here's the Ryzen 3950x
https://www.cpubenchmark.net/cpu.php?cp ... 0X&id=3598
The Ryzen is scoring almost 4x higher in the PassMark multicore test. This is due to many factors (6 more cores, 1.3 GHz or so faster, more cache, faster bus, not crippled by security updates, etc.).
A more fair comparison would be to run a 10 core solve on the Ryzen with the frequency set to match the Xeon and see what happens.
And yes, there is definitely large variation in work unit PPD, by as much as 10-20 percent. My in-process testing shows a lot of variation when running with more threads than physical CPU cores (relying on SMT), so you might be on to something there. Work units that are hogging the entire FPU on each core, executing within cache, will show little to no improvement as more SMT threads are added. I am going to do my testing on both one work unit (swept through thread count as suggested above) and on a random mix of work units (averaging the result), and will present a plot of the mean, variance, and 95 percent confidence interval of where the typical work unit score falls for a given CPU thread count.
Not to mention RAM speeds! Intel Xeons are made for 2133 MHz (DDR4) or 1600 MHz (DDR3). Ryzens were made for 3200 MHz, but some seem to successfully run 4400 MHz memory!
Re: Ryzen 9 3950x Benchmark Machine: What should I test for
Posted: Sat Jun 06, 2020 9:01 pm
by jchang6
This morning when I posted, the 10-core Xeon E5-2630 v4 (rated for 2.2 GHz, which I power limited to 2.0 GHz) was running at about 50K PPD (Project 14704); it is now showing 77K PPD on a different project (13850). Also, I have an RX5600 that is normally at 700K PPD, but it can drop to 200K (13409). The 2080 Super is typically between 1.4M and 2M PPD.
Many people talk about memory "speed", which is mostly bandwidth. I am not familiar with the FaH workload, but given the small memory footprint (10 MB per thread?), I am disinclined to believe that it is streaming such a small amount. Note that gaming memory that can run over 3 GHz is also set to a tCAS of 10-11 ns, versus ~14.5 ns for conventional memory. In database transaction processing workloads, performance scaling is almost linear with SMT, indicating a high percentage of dead cycles waiting for round-trip memory access, which is why IBM POWER is 4- or 8-way SMT.
I am trying to find a DIMM maker who is willing to disable the upper Row address bit, giving half the memory on each chip, the half furthest away from the sense amp to see what latency can be achieved.
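For anyone wanting to compare kits on latency rather than bandwidth, the CAS latency in nanoseconds can be worked out from the CL number and the DDR rating. This is a small sketch; the function name is made up, and CL15 for DDR4-2133 is an assumption about a typical standard module, not something stated in this thread.

```python
def cas_latency_ns(cas_cycles: int, ddr_rating: int) -> float:
    """CAS latency in nanoseconds.

    The memory clock is half the DDR rating (in MHz), so one clock cycle
    lasts 2000 / ddr_rating nanoseconds.
    """
    return cas_cycles * 2000 / ddr_rating


# DDR4-3600 CL16: the kit mentioned earlier in the thread.
print(round(cas_latency_ns(16, 3600), 2))
# DDR4-2133 CL15: assumed timing for a standard, non-gaming module.
print(round(cas_latency_ns(15, 2133), 2))
```

This only covers the CAS component; the full round-trip latency seen by a core also includes the memory controller and fabric, which the formula does not model.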
Re: Ryzen 9 3950x Benchmark Machine: What should I test for
Posted: Sun Jun 07, 2020 3:58 pm
by MeeLee
For Ryzen 3000 series CPUs, RAM speed makes a lot of difference, especially for CPU folding, because the higher the RAM speed, the faster the inter-core interconnect (Infinity Fabric) operates.
Theoretically, Infinity Fabric runs fastest at 1800 MHz (with 3600 MHz DDR RAM), and can take a small overclock to 1850 MHz (3700 MHz for the RAM).
Anything beyond that will drop Infinity Fabric to a lower state (and thus slow down data sharing between CPU cores).
Ryzen 4000 CPUs should not have this limitation (according to rumors).