Question on Fermi GPU P10927-10978 project design

Moderators: Site Moderators, FAHC Science Team

GreyWhiskers
Posts: 660
Joined: Mon Oct 25, 2010 5:57 am
Hardware configuration: a) Main unit
Sandybridge in HAF922 w/200 mm side fan
--i7 [email protected] GHz
--ASUS P8P67 DeluxeB3
--4GB ADATA 1600 RAM
--750W Corsair PS
--2Seagate Hyb 750&500 GB--WD Caviar Black 1TB
--EVGA 660GTX-Ti FTW - Signature 2 GPU@ 1241 Boost
--MSI GTX560Ti @900MHz
--Win7Home64; FAH V7.3.2; 327.23 drivers

b) 2004 HP a475c desktop, 1 core Pent 4 [email protected] GHz; Mem 2GB;HDD 160 GB;Zotac GT430PCI@900 MHz
WinXP SP3-32 FAH v7.3.6 301.42 drivers - GPU slot only

c) 2005 Toshiba M45-S551 laptop w/2 GB mem, 160GB HDD;Pent M 740 CPU @ 1.73 GHz
WinXP SP3-32 FAH v7.3.6 [Receiving Core A4 work units]
d) 2011 lappy-15.6"-1920x1080;i7-2860QM,2.5;IC Diamond Thermal Compound;GTX 560M 1,536MB u/c@700;16GB-1333MHz RAM;HDD:500GBHyb w/ 4GB SSD;Win7HomePrem64;320.18 drivers FAH 7.4.2ß
Location: Saratoga, California USA

Question on Fermi GPU P10927-10978 project design

Post by GreyWhiskers »

As I mentioned in my earlier post Anatomy of a series of GPU Work Units from the trenches, the vast majority (450 out of 455 by my count on the HFM Work Unit Viewer) of the GPU Work Units for my GTX-560Ti have been the P6801 series, of which I profiled a set of some 370 WUs.

But, I've gotten several of the Project 10965-like WUs, and see them perform quite differently on my system. Note that the observations below are a bit like reading tea leaves - just looking at external factors with no idea what's going on inside the black box.

P10965 WUs have a TPF of 00:00:42, and complete in just over an hour. HFM computes 19,048 PPD, vs the P6801 PPD of 14,378.
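
For reference, that figure follows from the frame time and the 925-point base credit listed in the project description below. Here's a minimal sketch of the arithmetic in Python, assuming the standard non-bonus formula (credit × work units completed per day) and 100 frames per WU - HFM's exact number may differ slightly:

[code]
# Rough non-bonus PPD estimate from TPF (time per frame) and base credit.
# Assumes 100 frames per WU and no bonus; HFM's exact figure will differ slightly.

def ppd_from_tpf(tpf_seconds, credit, frames=100):
    wu_seconds = tpf_seconds * frames      # time to finish one WU
    wus_per_day = 86400.0 / wu_seconds     # 86,400 seconds in a day
    return credit * wus_per_day

# P10965: TPF 42 s, 925 base points -> roughly 19,000 PPD (HFM reports 19,048)
print(round(ppd_from_tpf(42, 925)))
[/code]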

I had assumed the design of the P10965 WUs must be making the GPU work harder, so I was surprised to see MSI Afterburner and GPU-Z showing 99% GPU utilization on both P6801 and P10965-like WUs, while the wall-plug power draw for the system dropped from a rock-steady 288 watts (P6801 GPU WU plus a -bigadv SMP-7 WU) to 252 watts with P10965.

When it finished the P10965 and picked up the next P6801, now on Generation 13, the wall-plug power went right back up to 288 watts.

Wow!! More PPD, less power.

Just as a matter of curiosity, what are they doing in the P10965-like project designs to make this difference? I see from the project description that this is a beta project for the OpenMM core with the GB model. Whatever they're doing, the design seems highly desirable - especially the lower electrical power draw!
Project 10965

P10927-10978: Test simulations of Protein-G peptide with gpu openmm-gromacs (Fermi boards)

These beta tests are to evaluate the performance of a new core (openmm-gromacs) on gpu with Generalized Born (GB) model used as implicit solvent. Different force fields and different inner dielectric constants are used for this set of simulations.

Points and deadlines:
project 10927-10978: 925 points, preferred deadline 14 days, final deadline 20 days
k1wi
Posts: 909
Joined: Tue Sep 22, 2009 10:48 pm

Re: Question on Fermi GPU P10927-10978 project design

Post by k1wi »

The first thing you'll notice is that the P6801 series has a larger atom count (~600), whereas the 10927-10978 series has 247.

There has been a lot of discussion about different GPUs folding 'better' on projects with different atom counts, depending on whether they have more or fewer shaders than the benchmark machine. I suspect this is the most likely cause of the difference in PPD and temperature.

To elaborate further: the benchmark machine folds both work units for equal PPD, so the difference you see comes down to how your GPU compares with the benchmark machine. Cheaper GPUs with fewer shaders see a drop in PPD on big work units relative to small ones, and vice versa for more powerful GPUs with more shaders.
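
A toy numerical sketch of that point, in Python, with every number hypothetical - the benchmark card earns the same PPD on both sizes by construction, so your PPD on each project simply scales with how fast your card runs that particular WU relative to the benchmark:

[code]
# Toy illustration (all numbers hypothetical) of why PPD diverges from the benchmark:
# the benchmark card earns the same PPD on both WU sizes by construction, so your
# PPD on each size scales with your speed relative to the benchmark on that size.

BENCHMARK_PPD = 10000.0  # hypothetical: equal for both projects on the benchmark card

# Hypothetical relative speed of "your" GPU vs the benchmark, per WU size.
relative_speed = {
    "small-atom WU (247 atoms)": 1.30,   # your card handles small WUs relatively well
    "large-atom WU (~600 atoms)": 0.95,  # but loses ground on the bigger ones
}

for wu_size, speed in relative_speed.items():
    print(wu_size, "->", round(BENCHMARK_PPD * speed), "PPD")
[/code]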

On the temperature side of things, others may be able to answer more accurately, and correct me where necessary, but the work unit with the higher atom count will be putting more strain on your GPU than the smaller one, beyond what the 100% utilisation reported in your monitoring software suggests.

(in the same way that F@H at 100% CPU utilisation runs a lot hotter than most other applications running @ 100% CPU utilisation)
Napoleon
Posts: 887
Joined: Wed May 26, 2010 2:31 pm
Hardware configuration: Atom330 (overclocked):
Windows 7 Ultimate 64bit
Intel Atom330 dualcore (4 HyperThreads)
NVidia GT430, core_15 work
2x2GB Kingston KVR1333D3N9K2/4G 1333MHz memory kit
Asus AT3IONT-I Deluxe motherboard
Location: Finland

Re: Question on Fermi GPU P10927-10978 project design

Post by Napoleon »

I'm thinking the number of atoms makes a difference, too. Proportionally, the effect is much more pronounced on a GT430. The vast majority of WUs I receive are P68xx: they draw more power, temps are higher, about 4200 PPD. The 112xx projects draw less power, run at lower temps, and give about 8400 PPD. I haven't seen the 109xx projects yet, but based on atom count I'd expect them to behave similarly to the 112xx on a GT430.

EDIT: Spoke too soon - I have seen at least one of these WUs in the past, and what I said above seems to hold. As far as guesses go, I suppose they might be doing something in a more efficient (or simpler) way, giving them a fixed 4000+ PPD boost. FahCore_15.exe CPU utilization on my setup is much higher with these small WUs, though, which is not necessarily good news for SMP+GPU folders.
viewtopic.php?f=38&t=17186&p=173029#p173070

For obvious reasons, I personally wouldn't mind receiving these smaller WUs instead of P68xx. :twisted:
Win7 64bit, FAH v7, OC'd
2C/4T Atom330 3x667MHz - GT430 2x832.5MHz - ION iGPU 3x466.7MHz
NaCl - Core_15 - display
rklapp
Posts: 6
Joined: Thu Feb 04, 2010 8:00 am

Re: Question on Fermi GPU P10927-10978 project design

Post by rklapp »

My GPU PPD goes up 5k with these WUs, but my SMP also drops 3k PPD. They appear to use more CPU cycles. Perhaps the drop in watts is down to the difference in power efficiency between the CPU and GPU cores.
GreyWhiskers
Posts: 660
Joined: Mon Oct 25, 2010 5:57 am

Re: Question on Fermi GPU P10927-10978 project design

Post by GreyWhiskers »

@rklapp: I haven't seen that difference in the SMP WUs - maybe because I wasn't watching the frame-by-frame performance at the same time, maybe because the p109xx and their friends are in and out so quickly (about one hour out of a 2+ day P6900 SMP run), and maybe because I am running -smp 7 on the Sandy Bridge for the whole of that P6900 run.

In any case, I've observed a PPD increase of 4,670 between the GPU p6801 and the p109xx. Maybe that's a good trade if your instantaneous SMP PPD only drops by 3k, especially since that lasts for only a very short part of the SMP WU's life span and shouldn't materially affect its time to complete.
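
As a rough back-of-the-envelope check on that trade (only a sketch - the 3k SMP drop comes from rklapp's system, not mine, and the one-hour WU length is approximate):

[code]
# Back-of-the-envelope net effect of one p109xx WU on combined GPU + SMP output,
# using the rough figures quoted in this thread (different systems, so only a sketch).

gpu_ppd_gain = 4670    # GPU PPD increase on p109xx vs p6801
smp_ppd_drop = 3000    # instantaneous SMP PPD drop while the p109xx runs (rklapp's figure)
wu_hours = 1.0         # a p109xx WU lasts roughly an hour here

net_points = (gpu_ppd_gain - smp_ppd_drop) * (wu_hours / 24.0)
print("Net gain per p109xx WU: about", round(net_points), "points")
[/code]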

But we turn on our systems and process whatever the Stanford servers throw at us, whether it's good news like the p109xx or not-so-good news like the infamous p2684s on SMP.
k1wi
Posts: 909
Joined: Tue Sep 22, 2009 10:48 pm

Re: Question on Fermi GPU P10927-10978 project design

Post by k1wi »

@Napoleon, I believe the reason for the higher CPU load is that the GPU works through the computations faster, so it has to go back to the CPU more frequently (I'm not sure whether that's for synchronization or for issuing new instructions). With larger atom counts it doesn't have to go back as often.

@Greywhiskers, it is most likely because you are running -smp 7.

I think comparing them to the p2684s is a bit of a stretch. If anything, these GPUs are getting more credit than their 'proportional power' would suggest on the smaller work units. At the end of the day, Stanford can't be limited to a single range of atom counts, and it also can't benchmark against every type of GPU out there. Machines with fewer shaders or lower performance than the benchmark machine will always get a higher-than-expected PPD on small atom-count work units (because the benchmark machine isn't as efficient at, or fully utilised by, them), while getting a more representative or lower PPD on the large atom-count work units.

The question then becomes: should we differentiate between small, medium, and large GPU projects and allow GPU users to choose the category that folds most effectively on their machine? Or should we 'teach' the client to differentiate for us? Of course, that doesn't necessarily mean everyone gets the highest-PPD projects, since Stanford could also recalculate the points - rather than letting everyone avoid the 'harder, lower-PPD' projects, the higher-PPD work units could be re-pointed down to the lower rate (reflecting the lower performance relative to the higher-performance GPUs).