Does OpenMM take full advantage of rDNA for Navi GPUs?
Moderators: Site Moderators, FAHC Science Team
-
- Posts: 168
- Joined: Tue Apr 07, 2020 2:38 pm
Does OpenMM take full advantage of rDNA for Navi GPUs?
Several people have pointed out in my thread on "Top GPUs for Folding@Home" (viewtopic.php?f=38&t=34240) that the 2060 Super performs better than the 5700 XT, despite the 5700 XT having more FP32 cores. Juggy and Tohya posted benchmarks using the new FAHBench compiled by foldy; their results showed the 2060 Super outperforming the 5700 XT by 15%, even though the 5700 XT was also running at a higher clock rate. By my calculations, the 5700 XT should have outperformed the 2060 Super by 23% at those frequencies. gordonbb and foldinghomealone2 both suggested that the AMD drivers are not as optimized as the NVIDIA drivers. While that may be the whole story, another possibility is that some tasks in OpenMM are using GCN instead of rDNA on Navi GPUs. Any chance that could be the case? Thanks for any information!
-
- Posts: 2522
- Joined: Mon Feb 16, 2009 4:12 am
- Location: Greenwood MS USA
Re: Does OpenMM take full advantage of rDNA for Navi GPUs?
As the title suggests, OpenMM is open source, you are free to read it yourself.
http://openmm.org/
https://en.wikipedia.org/wiki/RDNA_(microarchitecture)
As I read this, much of the performance improvement is in the render pipeline. Folding does not render.
OTOH, AMD has invested in 'primitive shaders' (FP16, half precision) and just like Nvidia's FP16 shaders, they do not help F@H as it needs more precision. In both cases they are idle.
By some definition, F@H does not use rDNA, it uses OpenCL. If AMD's OpenCL code improves, then F@h will speed up without change.
Last edited by JimboPalmer on Tue Apr 14, 2020 7:59 pm, edited 1 time in total.
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
-
- Posts: 2040
- Joined: Sat Dec 01, 2012 3:43 pm
- Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441
Re: Does OpenMM take full advantage of rDNA for Navi GPUs?
The 5700 XT's lower-than-expected performance may be the result of a low atom count work unit. It needs to be retested using FAHBench and a high atom count work unit.
-
- Posts: 168
- Joined: Tue Apr 07, 2020 2:38 pm
Re: Does OpenMM take full advantage of rDNA for Navi GPUs?
JimboPalmer wrote: As the title suggests, OpenMM is open source, you are free to read it yourself.
http://openmm.org/
https://en.wikipedia.org/wiki/RDNA_(microarchitecture)
As I read this, much of the performance improvement is in the render pipeline. Folding does not render.
OTOH, AMD has invested in 'primitive shaders' (FP16, half precision) and just like Nvidia's FP16 shaders, they do not help F@H as it needs more precision. In both cases they are idle.

I was hoping there might be someone familiar with OpenMM here. I have been looking at it, but I am not a computer scientist, much less one trained for GPU optimization. Navi can compute using either GCN or rDNA, not just for rendering. GCN can perform an FP16 operation in 1 clock cycle, an FP32 op in 2 clock cycles, and an FP64 op in 4 clock cycles. rDNA changes the design of their execution units so they can perform an FP16 op in one cycle in GCN mode, an FP32 op in one clock cycle in Wave32 mode, or an FP64 op in one clock cycle in Wave64 mode.
https://www.amd.com/system/files/docume ... epaper.pdf
-
- Site Admin
- Posts: 7937
- Joined: Tue Apr 21, 2009 4:41 pm
- Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2 - Location: W. MA
Re: Does OpenMM take full advantage of rDNA for Navi GPUs?
Not sure of the details, but I do know they had to update code in OpenMM to support the RDNA based cards. Before that they were not usable for F@h. The GPU folding core that supported them was released for beta testing in late December, and not released to full use until the end of January.
iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
-
- Posts: 168
- Joined: Tue Apr 07, 2020 2:38 pm
Re: Does OpenMM take full advantage of rDNA for Navi GPUs?
foldy wrote: The 5700 XT's lower-than-expected performance may be the result of a low atom count work unit. It needs to be retested using FAHBench and a high atom count work unit.

They both ran a simulation of 64614 atoms. Maybe that wasn't large enough to fully load the processors? If that turns out to be the case, then we'll want to recommend longer test times when benchmarking with FAHBench.
-
- Posts: 146
- Joined: Sun Jul 30, 2017 8:40 pm
Re: Does OpenMM take full advantage of rDNA for Navi GPUs?
NoMoreQuarantine wrote: They both ran a simulation of 64614 atoms. Maybe that wasn't large enough to fully load the processors? If that turns out to be the case, then we'll want to recommend longer test times when benchmarking with FAHBench.

Testing as long as you get results you want to see?
And it's not about the time (although that matters to see the effect of the cooling solution) but about atom count.
Better go to AMD and complain about their crappy OpenCL-implementation
-
- Posts: 168
- Joined: Tue Apr 07, 2020 2:38 pm
Re: Does OpenMM take full advantage of rDNA for Navi GPUs?
foldinghomealone2 wrote: Testing as long as you get results you want to see?
Better go to AMD and complain about their crappy OpenCL-implementation

People simulate large atoms for FAH. If the atom size at 1 minute is too small to give an accurate representation of performance for FAH, then it's not very helpful to the people trying to benchmark. That said, I doubt that is the issue. I also doubt it's the OpenCL implementation.
Re: Does OpenMM take full advantage of rDNA for Navi GPUs?
The issue of protein atom-count is prominent in discussions of the NVidia GPUs, too. For GPUs with large numbers of shaders, their performance also drops for small proteins but is acceptable for large proteins. As far as I know, the OpenCL 1.2 API and the FAHCore itself being used are identical.
Back to nV: I consider the 2060 and above examples of the same issue and apparently we're talking about the same order of shader counts. For those GPUs which support half precision, it's also a wasted feature for them.
If you post the project number(s) that you're testing and the specific Navi model, I'll ask around but I don't have the equipment to be able to personally compare results.
As was mentioned earlier, it's probably best to ignore the rDNA benchmarks. Comparing the FP32 FLOPS plus a small percentage of FP64 FLOPS is a reasonable approximation for FAH benchmarks, but of course the actual benchmark is better than that sort of approximation.
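For anyone who wants to play with that approximation, here is a minimal sketch. The 5% weight is only a placeholder for "a small percentage", and the FP64 rates (1/16 of FP32 for the 5700 XT, 1/32 for the 2060 Super) are assumptions taken from spec sheets rather than anything measured in this thread. The FP32 TFLOPS figures are the ones quoted elsewhere in this thread.

```python
def blended_tflops(fp32_tflops, fp64_tflops, fp64_weight=0.05):
    """Rough FAH throughput proxy: FP32 rate plus a small share of the FP64 rate.

    fp64_weight is a hypothetical placeholder, not a value used by FAH or OpenMM.
    """
    return fp32_tflops + fp64_weight * fp64_tflops


# FP32 TFLOPS as quoted in this thread; the FP64 ratios are assumptions.
rx_5700_xt = blended_tflops(10.5, 10.5 / 16)
rtx_2060_super = blended_tflops(8.1, 8.1 / 32)

print(f"5700 XT:    {rx_5700_xt:.2f}")
print(f"2060 Super: {rtx_2060_super:.2f}")
print(f"ratio:      {rx_5700_xt / rtx_2060_super:.2f}")
```

With a weight this small the FP64 term barely moves the result, so the estimate is dominated by the FP32 figures.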
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Posts: 2522
- Joined: Mon Feb 16, 2009 4:12 am
- Location: Greenwood MS USA
Re: Does OpenMM take full advantage of rDNA for Navi GPUs?
NoMoreQuarantine wrote: rDNA changes the design of their execution units so they can perform an FP16 op in one cycle in GCN mode, an FP32 op in one clock cycle in Wave32 mode, or an FP64 op in one clock cycle in Wave64 mode.
I also doubt it's the OpenCL implementation.

If the OpenCL implementation is correctly setting all these modes, you may be right. But it would be interesting to watch a trace to see how often they are in the 'wrong' mode and what the overhead of swapping modes is.
We can see that the OpenMM programmer was tuning at the level he controls but it would not surprise me if AMD is staying in GCN mode more often than is optimal. "we've always done it that way" is never a good excuse.
The comments about Very Long Instruction Word refer to the even older Terascale2 and 3 GPUs.
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
-
- Posts: 168
- Joined: Tue Apr 07, 2020 2:38 pm
Re: Does OpenMM take full advantage of rDNA for Navi GPUs?
bruce wrote: If you post the project number(s) that you're testing and the specific Navi model, I'll ask around but I don't have the equipment to be able to personally compare results.

I don't own a Navi GPU, so we would have to get someone else to assist.

bruce wrote: As was mentioned earlier, it's probably best to ignore the rDNA benchmarks. Comparing the FP32 FLOPS plus a small percentage of FP64 FLOPS is a reasonable approximation for FAH benchmarks, but of course the actual benchmark is better than that sort of approximation.

The 5700 XT that was tested should have 10.5 FP32 TFLOPS and the 2060 Super should have 8.1 FP32 TFLOPS at the frequencies posted. The reason I made this thread was because I was making a list of the current generation of AMD & NVIDIA GPU specs and found that discrepancy when comparing actual performance.
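To put a number on that discrepancy, here is a rough sketch using only figures from this thread: the two theoretical TFLOPS values above and the 15% FAHBench lead reported for the 2060 Super. Treating benchmark score as directly comparable to theoretical FLOPS is an assumption, and the variable names are just for illustration.

```python
# Theoretical FP32 throughput quoted in this thread (TFLOPS).
FP32_5700_XT = 10.5
FP32_2060_SUPER = 8.1

# FAHBench result reported in the first post: 2060 Super ahead by 15%.
OBSERVED_2060_LEAD = 0.15

# 5700 XT relative to 2060 Super, on paper and as measured.
expected_ratio = FP32_5700_XT / FP32_2060_SUPER
observed_ratio = 1.0 / (1.0 + OBSERVED_2060_LEAD)

# Throughput per theoretical FLOP of the 5700 XT relative to the 2060 Super.
relative_efficiency = observed_ratio / expected_ratio

print(f"expected 5700 XT / 2060 Super: {expected_ratio:.2f}")
print(f"observed 5700 XT / 2060 Super: {observed_ratio:.2f}")
print(f"relative per-FLOP efficiency:  {relative_efficiency:.2f}")
```

On these inputs the 5700 XT delivers roughly two thirds of the per-FLOP throughput of the 2060 Super in that one benchmark. Note that these two TFLOPS figures imply a paper advantage closer to 30% than the 23% mentioned in the first post; the sketch simply uses the TFLOPS figures as given.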
Re: Does OpenMM take full advantage of rDNA for Navi GPUs?
Understood but the small protein problem does reduce the performance measurably below the large protein performance.
(Which is another way of saying that GPU folding performance is NOT linear.)
Somebody who knows the internals of OpenMM may have some cogent comments.
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Posts: 146
- Joined: Sun Jul 30, 2017 8:40 pm
Re: Does OpenMM take full advantage of rDNA for Navi GPUs?
foldinghomealone2 wrote: Testing as long as you get results you want to see?
Better go to AMD and complain about their crappy OpenCL-implementation

NoMoreQuarantine wrote: People simulate large atoms for FAH. If the atom size at 1 minute is too small to give an accurate representation of performance for FAH, then it's not very helpful to the people trying to benchmark. That said, I doubt that is the issue. I also doubt it's the OpenCL implementation.

Believe what you want to believe. That'll make your theoretical approaches not a bit better.
But you don't have to believe me, you can test it yourself.
FahBench's run length has nothing to do with atom count. It's the same WU with 64k atoms. You just run the benchmark longer if you increase the time.
With increased times you can test your system when it's heated through and see if it's stable then.
And why should a higher atom count be 'better'?
It should only be higher if all current WUs have more atoms to see realistic results.
But atom counts differ from project to project. Like currently released projects p14549 (28k atoms) and p14415 (290k atoms).
I don't know what the best number of atoms to bench would be.
Maybe 64k resembles a good average of current projects, maybe it should be higher.
But just saying it should be higher because you think the 5700XT would score higher is the wrong approach.
And what about all the 'slower' GPUs that can't handle much more atoms well? I guess then the score would under-represent their value to folding.
To be on the safe side it would be necessary to run several benchmarks with low atom count, average/medium atom count and high atom count.
And run 1min tests and 15min tests (to even out starting conditions and to reflect a real-world folding scenario with 'hot' GPUs)
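As a sketch of what such a test matrix could look like, the snippet below just enumerates the combinations. The atom counts are ones mentioned in this thread; the low/medium/high labels and the function name are illustrative, and it deliberately does not invoke FAHBench, since the right work units and settings would have to be chosen by whoever runs it.

```python
from itertools import product

# Atom counts mentioned in this thread; the size labels are illustrative.
ATOM_COUNTS = {
    "low": 28_000,      # p14549
    "medium": 64_614,   # work unit used in the posted FAHBench results
    "high": 290_000,    # p14415
}
RUN_MINUTES = (1, 15)   # short run vs. heated-through run


def benchmark_plan():
    """Return every (size label, atom count, minutes) combination to run."""
    return [
        (label, atoms, minutes)
        for (label, atoms), minutes in product(ATOM_COUNTS.items(), RUN_MINUTES)
    ]


for label, atoms, minutes in benchmark_plan():
    print(f"{label:>6}: {atoms:>7} atoms for {minutes:>2} min")
```

That gives six runs per GPU, which keeps the comparison manageable while still separating the atom count effect from the warm-up effect.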
-
- Posts: 168
- Joined: Tue Apr 07, 2020 2:38 pm
Re: Does OpenMM take full advantage of rDNA for Navi GPUs?
foldinghomealone2 wrote: Believe what you want to believe. That'll make your theoretical approaches not a bit better.

Did I say something to offend you at some point? I think you've been insulting my effort since the first time I saw your username.

foldinghomealone2 wrote: But you don't have to believe me, you can test it yourself.
FahBench's run length has nothing to do with atom count. It's the same WU with 64k atoms. You just run the benchmark longer if you increase the time.
With increased times you can test your system when it's heated through and see if it's stable then.

Yep, you have to change the WU to get a different atom count. I wasn't thinking when I wrote about run length.

foldinghomealone2 wrote: And why should a higher atom count be 'better'?
It should only be higher if all current WUs have more atoms to see realistic results.
But atom counts differ from project to project. Like currently released projects p14549 (28k atoms) and p14415 (290k atoms).

I don't think it would be better, but foldy proposed that a low atom count may be the reason for the performance difference.

foldinghomealone2 wrote: I don't know what the best number of atoms to bench would be.
Maybe 64k resembles a good average of current projects, maybe it should be higher.

We'd have to look at what kind of distribution the projects have. Likely all over the place.

foldinghomealone2 wrote: But just saying it should be higher because you think the 5700XT would score higher is the wrong approach.
And what about all the 'slower' GPUs that can't handle much more atoms well? I guess then the score would under-represent their value to folding.

I'm glad nobody said that then. Good point with the slower GPUs. I don't know, it could accurately represent their value, depends on how FAH distributes WUs to slower GPUs.

foldinghomealone2 wrote: To be on the safe side it would be necessary to run several benchmarks with low atom count, average/medium atom count and high atom count.
And run 1min tests and 15min tests (to even out starting conditions and to reflect a real-world folding scenario with 'hot' GPUs)

Sounds reasonable.
-
- Site Moderator
- Posts: 6986
- Joined: Wed Dec 23, 2009 9:33 am
- Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB
Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400 - Location: Land Of The Long White Cloud
- Contact:
Re: Does OpenMM take full advantage of rDNA for Navi GPUs?
NoMoreQuarantine wrote: ...We'd have to look at what kind of distribution the projects have. Likely all over the place...

The smallest GPU Project (14321) right now has 13,252 atoms
The largest GPU Project (14416) right now has 307,167 atoms
https://apps.foldingathome.org/psummary
If/when new GPU Projects are released, the above values may change.
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time
Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues