Question on Fermi GPU P10927-10978 project design

Moderators: Site Moderators, FAHC Science Team

GreyWhiskers
Posts: 660
Joined: Mon Oct 25, 2010 5:57 am
Hardware configuration: a) Main unit
Sandybridge in HAF922 w/200 mm side fan
--i7 [email protected] GHz
--ASUS P8P67 DeluxeB3
--4GB ADATA 1600 RAM
--750W Corsair PS
--2Seagate Hyb 750&500 GB--WD Caviar Black 1TB
--EVGA 660GTX-Ti FTW - Signature 2 GPU@ 1241 Boost
--MSI GTX560Ti @900MHz
--Win7Home64; FAH V7.3.2; 327.23 drivers

b) 2004 HP a475c desktop, 1 core Pent 4 [email protected] GHz; Mem 2GB;HDD 160 GB;Zotac GT430PCI@900 MHz
WinXP SP3-32 FAH v7.3.6 301.42 drivers - GPU slot only

c) 2005 Toshiba M45-S551 laptop w/2 GB mem, 160GB HDD;Pent M 740 CPU @ 1.73 GHz
WinXP SP3-32 FAH v7.3.6 [Receiving Core A4 work units]
d) 2011 lappy-15.6"-1920x1080;i7-2860QM,2.5;IC Diamond Thermal Compound;GTX 560M 1,536MB u/c@700;16GB-1333MHz RAM;HDD:500GBHyb w/ 4GB SSD;Win7HomePrem64;320.18 drivers FAH 7.4.2ß
Location: Saratoga, California USA

Question on Fermi GPU P10927-10978 project design

Post by GreyWhiskers »

As I mentioned in my earlier post Anatomy of a series of GPU Work Units from the trenches, the vast majority (450 out of 455 by my count on the HFM Work Unit Viewer) of the GPU Work Units for my GTX-560Ti have been the P6801 series, of which I profiled a set of some 370 WUs.

But, I've gotten several of the Project 10965-like WUs, and see them perform quite differently on my system. Note that the observations below are a bit like reading tea leaves - just looking at external factors with no idea what's going on inside the black box.

P10965 WUs have a TPF of 00:00:42, and complete in just over an hour. HFM computes 19,048 PPD, vs the P6801 PPD of 14,378.
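
For reference, that figure follows from the frame time and the 925-point base credit listed in the project description below. Here's a minimal sketch of the arithmetic in Python, assuming the standard non-bonus formula (credit × work units completed per day) and 100 frames per WU - HFM's exact number may differ slightly:

[code]
# Rough non-bonus PPD estimate from TPF (time per frame) and base credit.
# Assumes 100 frames per WU and no bonus; HFM's exact figure will differ slightly.

def ppd_from_tpf(tpf_seconds, credit, frames=100):
    wu_seconds = tpf_seconds * frames      # time to finish one WU
    wus_per_day = 86400.0 / wu_seconds     # 86,400 seconds in a day
    return credit * wus_per_day

# P10965: TPF 42 s, 925 base points -> roughly 19,000 PPD (HFM reports 19,048)
print(round(ppd_from_tpf(42, 925)))
[/code]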

I had assumed the design of the P10965 WUs must be making the GPU work harder, so I was surprised to see MSI Afterburner and GPU-Z showing 99% GPU utilization on both P6801 and P10965-like WUs, while the wall-plug power draw for the system dropped from a rock-steady 288 watts (P6801 GPU WU plus a -bigadv SMP-7 WU) to 252 watts with P10965.

When it finished the P10965 and picked up the next P6801, now on Generation 13, the wall-plug power went right back up to 288 watts.

Wow!! More PPD, less power.

Just as a matter of curiosity, what are they doing in the P10965-like project designs to make this difference? I see from the project description that this is a beta project for the OpenMM core with the GB model. Whatever they're doing, the design seems highly desirable - especially the lower electrical power draw!
Project 10965

P10927-10978: Test simulations of Protein-G peptide with gpu openmm-gromacs (Fermi boards)

These beta tests are to evaluate the performance of a new core (openmm-gromacs) on gpu with Generalized Born (GB) model used as implicit solvent. Different force fields and different inner dielectric constants are used for this set of simulations.

Points and deadlines:
project 10927-10978: 925 points, preferred deadline 14 days, final deadline 20 days
k1wi
Posts: 909
Joined: Tue Sep 22, 2009 10:48 pm

Re: Question on Fermi GPU P10927-10978 project design

Post by k1wi »

The first thing you'll notice is that the P6801 series has a larger atom count (~600), whereas the 10927-10978 series has 247.

There has been a lot of discussion about different GPUs folding 'better' on projects with different atom counts, depending on whether they have more or fewer shaders than the benchmark machine. I suspect this is the most likely cause of the difference in PPD and temperature.

To elaborate further: the benchmark machine folds both work units for equal PPD, so the difference you see comes down to how your GPU compares with the benchmark machine. Cheaper GPUs with fewer shaders see a drop in PPD on big work units relative to small ones, and vice versa for more powerful GPUs with more shaders.
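
A toy numerical sketch of that point, in Python, with every number hypothetical - the benchmark card earns the same PPD on both sizes by construction, so your PPD on each project simply scales with how fast your card runs that particular WU relative to the benchmark:

[code]
# Toy illustration (all numbers hypothetical) of why PPD diverges from the benchmark:
# the benchmark card earns the same PPD on both WU sizes by construction, so your
# PPD on each size scales with your speed relative to the benchmark on that size.

BENCHMARK_PPD = 10000.0  # hypothetical: equal for both projects on the benchmark card

# Hypothetical relative speed of "your" GPU vs the benchmark, per WU size.
relative_speed = {
    "small-atom WU (247 atoms)": 1.30,   # your card handles small WUs relatively well
    "large-atom WU (~600 atoms)": 0.95,  # but loses ground on the bigger ones
}

for wu_size, speed in relative_speed.items():
    print(wu_size, "->", round(BENCHMARK_PPD * speed), "PPD")
[/code]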

On the temperature side of things, others may be able to answer more accurately, and correct me where necessary, but the work unit with the higher atom count will be putting more strain on your GPU than the smaller one, beyond what the 100% utilisation reported in your monitoring software suggests.

(in the same way that F@H at 100% CPU utilisation runs a lot hotter than most other applications running @ 100% CPU utilisation)
Napoleon
Posts: 887
Joined: Wed May 26, 2010 2:31 pm
Hardware configuration: Atom330 (overclocked):
Windows 7 Ultimate 64bit
Intel Atom330 dualcore (4 HyperThreads)
NVidia GT430, core_15 work
2x2GB Kingston KVR1333D3N9K2/4G 1333MHz memory kit
Asus AT3IONT-I Deluxe motherboard
Location: Finland

Re: Question on Fermi GPU P10927-10978 project design

Post by Napoleon »

I'm thinking the number of atoms makes a difference, too. Proportionally, the effect is much more pronounced on a GT430. The vast majority of WUs I receive are P68xx: they draw more power, temps are higher, about 4200 PPD. The 112xx projects draw less power, run at lower temps, and give about 8400 PPD. I haven't seen the 109xx projects yet, but based on atom count I'd expect them to behave similarly to the 112xx on a GT430.

EDIT: Spoke too soon - I have seen at least one of these WUs in the past, and what I said above seems to hold. As far as guesses go, I suppose they might be doing something in a more efficient (or simpler) way, giving them a fixed 4000+ PPD boost. FahCore_15.exe CPU utilization on my setup is much higher with these small WUs, though, which is not necessarily good news for SMP+GPU folders.
viewtopic.php?f=38&t=17186&p=173029#p173070

For obvious reasons, I personally wouldn't mind receiving these smaller WUs instead of P68xx. :twisted:
Win7 64bit, FAH v7, OC'd
2C/4T Atom330 3x667MHz - GT430 2x832.5MHz - ION iGPU 3x466.7MHz
NaCl - Core_15 - display
rklapp
Posts: 6
Joined: Thu Feb 04, 2010 8:00 am

Re: Question on Fermi GPU P10927-10978 project design

Post by rklapp »

My GPU PPD goes up 5k with these WUs, but my SMP also drops 3k PPD. They appear to use more CPU cycles. Perhaps the drop in watts is down to the difference in power efficiency between the CPU and GPU cores.
GreyWhiskers
Posts: 660
Joined: Mon Oct 25, 2010 5:57 am

Re: Question on Fermi GPU P10927-10978 project design

Post by GreyWhiskers »

@rklapp: I haven't seen that difference in the SMP WUs - maybe because I wasn't watching the frame-by-frame performance at the same time, maybe because the p109xx and their friends are in and out so quickly (about one hour out of a 2+ day P6900 SMP run), and maybe because I am running -smp 7 on the Sandy Bridge for the whole of that P6900 run.

In any case, I've observed a PPD increase of 4,670 between the GPU p6801 and the p109xx. Maybe that's a good trade if your instantaneous SMP PPD only drops by 3k, especially since that lasts for only a very short part of the SMP WU's life span and shouldn't materially affect its time to complete.
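
As a rough back-of-the-envelope check on that trade (only a sketch - the 3k SMP drop comes from rklapp's system, not mine, and the one-hour WU length is approximate):

[code]
# Back-of-the-envelope net effect of one p109xx WU on combined GPU + SMP output,
# using the rough figures quoted in this thread (different systems, so only a sketch).

gpu_ppd_gain = 4670    # GPU PPD increase on p109xx vs p6801
smp_ppd_drop = 3000    # instantaneous SMP PPD drop while the p109xx runs (rklapp's figure)
wu_hours = 1.0         # a p109xx WU lasts roughly an hour here

net_points = (gpu_ppd_gain - smp_ppd_drop) * (wu_hours / 24.0)
print("Net gain per p109xx WU: about", round(net_points), "points")
[/code]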

But we turn on our systems and process whatever the Stanford servers throw at us, whether it's good news like the p109xx or not-so-good news like the infamous p2684s on SMP.
k1wi
Posts: 909
Joined: Tue Sep 22, 2009 10:48 pm

Re: Question on Fermi GPU P10927-10978 project design

Post by k1wi »

@Napoleon, I believe the reason for the higher CPU load is that the GPU works through the computations faster, so it has to go back to the CPU more frequently (I'm not sure whether that's for synchronization or for issuing new instructions). With larger atom counts it doesn't have to go back as often.

@Greywhiskers, it is most likely because you are running -smp 7.

I think comparing them to the p2684s is a bit of a stretch. If anything, these GPUs are getting more credit than their 'proportional power' would suggest on the smaller work units. At the end of the day, Stanford can't be limited to a single range of atom counts, and it also can't benchmark against every type of GPU out there. Machines with fewer shaders or lower performance than the benchmark machine will always get a higher-than-expected PPD on small atom-count work units (because the benchmark machine isn't as efficient at, or fully utilised by, them), while getting a more representative or lower PPD on the large atom-count work units.

The question then becomes: should we differentiate between small, medium, and large GPU projects and allow GPU users to choose the category that folds most effectively on their machine? Or should we 'teach' the client to differentiate for us? Of course, that doesn't necessarily mean everyone gets the highest-PPD projects, since Stanford could also recalculate the points - rather than letting everyone avoid the 'harder, lower-PPD' projects, the higher-PPD work units could be re-pointed down to the lower rate (reflecting the lower performance relative to the higher-performance GPUs).