FAHBench (OpenMM 5.1)

Post by **JimF** » Tue May 28, 2013 5:14 pm

OK, I just wanted to show that it would work, but it is taking too long to finish.

Post by **7im** » Wed May 29, 2013 6:31 pm

Asus GT430 @ 730 MHz core, 900 MHz memory (stock)
WinXP 32-bit, Nvidia 314.22 drivers

OpenCL Explicit SP: 3.91663 ns/day
OpenCL Implicit SP: 16.9768 ns/day

Intel Core2 Duo (E8400) @ 3.0 GHz (stock)
WinXP 32-bit

Explicit SP: 0.7763 ns/day
Implicit SP: 0.929 ns/day

Post by **Jesse_V** » Thu May 30, 2013 7:54 am

Nvidia GT 240m.

OpenCL explicit single-precision: 2.14831 ns/day
OpenCL implicit single-precision: 11.983 ns/day

According to FAHBench, double precision is unsupported on my GPU. CUDA is supported, but I don't have VS2010 installed for that FAHBench option.
I enabled "verify accuracy".

PantherX wrote: First post updated.

Thanks for the attention to detail you put into the first post there, it's nice.

Post by **PantherX** » Thu May 30, 2013 7:59 am

Thanks for those kind words, Jesse_V.

The first post has been updated with your data.

Post by **Veix** » Thu May 30, 2013 8:14 pm

The AnandTech article on the GTX 770, which included FAHBench, brought me back here.

FAHBench 1.2.0
Win7 64Bit, latest 3071 GPU driver
Running on CPU, Intel i5-3570K

Before Intel OpenCL 2013 SDK install
OpenCL explicit single-precision: 2.78193 ns/day

After Intel OpenCL 2013 SDK install
OpenCL explicit single-precision: 3.04155 ns/day

OpenCL explicit double-precision: 2.00282 ns/day

And yes, testing on the HD4000 crashes the display driver.

Post by **Napoleon** » Wed Jun 05, 2013 11:37 am

I updated the GPU QRB PPD predictor spreadsheet I mentioned in a previous post. Thanks to folding_hoomer, who kindly provided reference values for ns/day and an SMP project on a processor that matches the official SMP benchmark machine pretty closely. According to the chart, GPUs with the following OpenCL SP explicit ns/day results should score as follows (based on FAHBench v1.2.0 results):

5 ns/day: 11 178 PPD
10 ns/day: 31 617 PPD (2x speed, 31 617 / 11 178 == 2.83x PPD)
20 ns/day: 89 425 PPD (4x speed, 8.00x PPD)
30 ns/day: 164 285 PPD (6x speed, 14.70x PPD)
40 ns/day: 252 933 PPD (8x speed, 22.63x PPD)

As the QRB formula suggests, for every doubling of speed (== halving the TPF) you should receive 2*√2 == ~2.83x more points. FYI, 11 178*(2*√2)³ == ~252 929, so the spreadsheet has small rounding errors. Anyway, we'll see how it really goes once core_17 gets released to production.
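
The scaling above can be checked with a short sketch. It assumes only what the post states: PPD grows by 2*√2 for every doubling of speed, i.e. PPD is proportional to (ns/day)^1.5, anchored at the 5 ns/day, 11 178 PPD reference point. The name `predicted_ppd` is made up for illustration and is not part of the spreadsheet.

```python
# Sketch of the QRB scaling quoted in the post: PPD grows as speed ** 1.5,
# anchored at the spreadsheet's 5 ns/day -> 11 178 PPD reference point.
REF_NS_PER_DAY = 5.0
REF_PPD = 11178.0


def predicted_ppd(ns_per_day):
    """Scale the reference PPD by (speed ratio) ** 1.5."""
    return REF_PPD * (ns_per_day / REF_NS_PER_DAY) ** 1.5


if __name__ == "__main__":
    for speed in (5, 10, 20, 30, 40):
        ppd = predicted_ppd(speed)
        print(f"{speed:>2} ns/day  {ppd:>9.0f} PPD  ({ppd / REF_PPD:.2f}x)")
```

The printed values should land within a few PPD of the spreadsheet's figures, which is consistent with the small rounding differences mentioned above.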

Post by **AndyE** » Wed Jun 05, 2013 5:20 pm

Napoleon,
I am not sure I can follow the model. For instance: a 7970 does approx. 40 ns/day, but I haven't yet seen any of my 7970s delivering these PPDs.

Rgds,
Andy

Post by **mdk777** » Wed Jun 05, 2013 6:05 pm

I am not sure I can follow the model. For instance: a 7970 does approx. 40 ns/day, but I haven't yet seen any of my 7970s delivering these PPDs.

Yeah, that is his point.

The current beta projects, while obviously a huge improvement, have not yet yielded points as anticipated if you simply extrapolate out from current projects.

Post by **AndyE** » Wed Jun 05, 2013 6:41 pm

Ah, thanks.

I read the term "should" in a different way ....

Post by **Napoleon** » Wed Jun 05, 2013 7:03 pm

Okay, I'll try to give an example: you have an OpenCL-capable CPU that can do 2.0 ns/day OpenCL explicit SP in FAHBench and has a 4.0 min TPF for an explicit solvent SMP project A, resulting in B points with the SMP QRB. GROMACS is much more optimized than OpenCL (about 2.5x faster), so the GPU version of project A would have to be run on a GPU capable of 2.5 x 2.0 ns/day == 5.0 ns/day to reach the same TPF, and hence B points with the QRB (equal pay for equal work).

The idea is to scale the TPF of a known CPU and SMP project with a factor of "2.5 * CPU ns/day" / "GPU ns/day" and calculate the expected "equal pay for equal work" PPD based on the scaled TPF. The spreadsheet plots an XY graph for "GPU ns/day" in the [3.0, 3.1, 3.2, ... , 40.0] range. Since the reference CPU currently used in the chart is pretty close to 2.0 ns/day, the TPF scaling factor could be simplified to be approximately "5 ns/day" / "GPU ns/day".
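
The TPF scaling just described can be sketched in a few lines. The function name `scaled_tpf_minutes` is hypothetical, and the 2.5x GROMACS-versus-OpenCL factor is the estimate quoted in this post, not a measured constant.

```python
# Sketch of the TPF scaling described in the post. The 2.5x factor is the
# post's own GROMACS-vs-OpenCL estimate, not a measured constant.
GROMACS_SPEEDUP = 2.5


def scaled_tpf_minutes(cpu_tpf_min, cpu_ns_per_day, gpu_ns_per_day):
    """Scale a known SMP TPF to the TPF expected from a GPU of a given speed."""
    factor = (GROMACS_SPEEDUP * cpu_ns_per_day) / gpu_ns_per_day
    return cpu_tpf_min * factor


if __name__ == "__main__":
    # Example from the post: a 2.0 ns/day CPU with a 4.0 min TPF.
    for gpu_speed in (5.0, 10.0, 20.0, 40.0):
        tpf = scaled_tpf_minutes(4.0, 2.0, gpu_speed)
        print(f"{gpu_speed:>4} ns/day GPU -> {tpf:.2f} min TPF")
```

With the example numbers, a 5.0 ns/day GPU comes out at the same 4.0 min TPF as the reference CPU, and each doubling of GPU speed halves the TPF.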

Then it's just a matter of inserting the scaled TPF into the QRB PPD formula, using the QRB parameters of the SMP project. The QRB PPD formula used in my spreadsheet is the same as in http://www.linuxforge.net/bonuscalc2.php.
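
A minimal sketch of that last step, assuming the commonly quoted QRB form: final points = base points x max(1, √(k x deadline / elapsed time)), with one WU taken to be 100 frames. Whether this matches the linked calculator in every detail (for instance which deadline it uses, or any eligibility cutoff) is an assumption, and the project parameters below are invented purely for illustration.

```python
import math

FRAMES_PER_WU = 100  # assumption: one WU consists of 100 frames


def qrb_ppd(tpf_minutes, base_points, k_factor, deadline_days):
    """Estimate PPD from a TPF using the assumed QRB bonus form."""
    wu_days = tpf_minutes * FRAMES_PER_WU / (60.0 * 24.0)
    bonus = max(1.0, math.sqrt(k_factor * deadline_days / wu_days))
    return base_points * bonus / wu_days


if __name__ == "__main__":
    # Hypothetical project parameters, not taken from any real project.
    slow = qrb_ppd(4.0, base_points=1000.0, k_factor=2.0, deadline_days=6.0)
    fast = qrb_ppd(2.0, base_points=1000.0, k_factor=2.0, deadline_days=6.0)
    print(f"4.0 min TPF: {slow:.0f} PPD")
    print(f"2.0 min TPF: {fast:.0f} PPD ({fast / slow:.2f}x)")
```

Under this form, halving the TPF while the bonus is active multiplies PPD by 2*√2, which is where the 2.83x per doubling comes from.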

FYI, it has been mentioned elsewhere that we should not extrapolate our GPU QRB expectations from FAHBench results. Human nature (at least mine) being what it is, I simply had to do just that.

Never push this button, or else...

Post by **bruce** » Wed Jun 05, 2013 7:50 pm

It's not uncommon for a benchmark to fail to scale uniformly.

Suppose we're talking about a CPU benchmark. A benchmark that uses a small amount of RAM may be accurate for small programs, but as the program grows it eventually exceeds the size of the cache and may eventually exceed the size of main RAM, requiring paging. Which of those benchmarks is "correct"?

The same thing is true for GPUs. Does FAHBench just measure the compute capability or does it include "proportional" delays due to the speed of VRAM or the speed of the PCIe transfers? Would the concept of "proportional" apply equally to FAHBench and FahCore_17 with assorted proteins? In fact, that's not possible, no matter how you define proportional because it changes based on the project.

Post by **AndyE** » Wed Jun 05, 2013 10:35 pm

Thanks Napoleon for your explanation.

One question though: Where is the 2.5x factor coming from?
Are there identical projects available with both a GROMACS and an OpenCL implementation?

Post by **mdk777** » Wed Jun 05, 2013 11:50 pm

One question though: Where is the 2.5x factor coming from?

From online discussions with Proteneer regarding the established efficiency of GROMACS compared to OpenCL.

Are there identical projects available with both a GROMACS and an OpenCL implementation?

Not for us, but there is no reason that PG cannot do testing.

This is of course the missing link for FAHBench. You can compare any hardware under OpenCL or CUDA, but you don't have an accurate conversion factor for comparing the same hardware under GROMACS.

It is a great tool for picking your next graphics card (against all other graphics cards), but not necessarily for comparing how that graphics card will produce compared to a 4P rig (at, say, similar cost and power consumption).

This is also what Napoleon is getting at with his exercise.

Post by **AndyE** » Thu Jun 06, 2013 12:20 am

Thanks mdk777.

Proteneer should know.
I would have questioned the accuracy of this factor if it had been inferred from unrelated work units.

Andy

Post by **Quisarious** » Thu Jun 06, 2013 12:49 am

mdk777 wrote: This is also what Napoleon is getting at with his exercise.

But attaching some numbers doesn't make this exercise any more informative than speculation. FAHBench provides the relative performance of GPUs and CPUs on the simulation inside FAHBench. The relative rankings, in particular between GPUs and CPUs, depend on the WU being simulated, as does the OpenCL-GROMACS fudge factor.

I understand the motivation behind this endeavor, but I think people are assuming that, because there are actual numbers, the numbers are meaningful.
