Better Performance Without Hyperthreading

A forum for discussing FAH-related hardware choices and info on actual products (not speculation).

HendricksSA
Posts: 336
Joined: Fri Jun 26, 2009 4:34 am

Better Performance Without Hyperthreading

Post by HendricksSA »

I recently completed a new 2p computer and, yes, I know bigadv work units are ending soon. Bruce and 7im regularly remind us to check settings in our own computing environments, so I decided to evaluate the machine running only SMP work units, with and without hyperthreading. Unlike many reports, I found folding was faster with hyperthreading turned off. The machine is built around a SuperMicro motherboard with E5-2680v3 processors and DDR4 memory. Going from 48 threads (hyperthreading on) to 24 threads (hyperthreading off, one thread per physical core) improved performance by 13.3% to 20.8%, with a corresponding increase in points earned. Specifics follow:
Threads / Project / TPF in min:sec (average of 10 frames across the same and similar projects)
48 / 6095 / 2:20 - 2:31
24 / 6095 / 2:03 - 2:05 (improvement range of 13.8% to 20.8%)
48 / 6096 / 2:16
24 / 6096 / 1:53 - 2:00 (improvement range of 13.3% to 20.4%)
48 / 9009 / :21
24 / 9009 / :18 (improvement of 16.6%)
48 / 9010 / :21
24 / 9010 / :18 (improvement of 16.6%)
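
For anyone who wants to repeat the arithmetic, here is a minimal Python sketch. It assumes the percentages above are computed as the slower TPF divided by the faster TPF, minus one. That reading reproduces the 6095 figures, but the calculation method is not actually stated in the post. The function names are made up for illustration.

```python
def tpf_seconds(tpf: str) -> int:
    """Convert a TPF string such as '2:20' or ':21' to whole seconds."""
    minutes, _, seconds = tpf.rpartition(":")
    return int(minutes or 0) * 60 + int(seconds)


def improvement_pct(slow_tpf: str, fast_tpf: str) -> float:
    """Throughput gain of the faster run over the slower run, in percent.

    Assumption: gain = slow / fast - 1 (ratio of frame times).
    """
    return (tpf_seconds(slow_tpf) / tpf_seconds(fast_tpf) - 1.0) * 100.0


# Project 6095: best-vs-best and worst-vs-worst frame times from the list above.
print(round(improvement_pct("2:20", "2:03"), 1))  # 13.8
print(round(improvement_pct("2:31", "2:05"), 1))  # 20.8
```

The same call on the 9009 times (":21" versus ":18") gives roughly 16.7%, so the 16.6% in the list looks truncated rather than rounded.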
EXT64
Posts: 323
Joined: Mon Apr 09, 2012 11:54 pm

Re: Better Performance Without Hyperthreading

Post by EXT64 »

Two quick questions to add a little more information:

Operating System?

And are you using The Kraken (affinity wrapper and dlb starter)?
Nathan_P
Posts: 1164
Joined: Wed Apr 01, 2009 9:22 pm
Hardware configuration: Asus Z8NA D6C, 2 [email protected] Ghz, , 12gb Ram, GTX 980ti, AX650 PSU, win 10 (daily use)

Asus Z87 WS, Xeon E3-1230L v3, 8gb ram, KFA GTX 1080, EVGA 750ti , AX760 PSU, Mint 18.2 OS

Not currently folding
Asus Z9PE- D8 WS, 2 [email protected] Ghz, 16Gb 1.35v Ram, Ubuntu (Fold only)
Asus Z9PA, 2 Ivy 12 core, 16gb Ram, H folding appliance (fold only)
Location: Jersey, Channel islands

Re: Better Performance Without Hyperthreading

Post by Nathan_P »

Have to agree with EXT64. I ran a similar experiment and found the exact opposite. I was using older Xeons, either Westmere or Sandy Bridge based, but the results should be similar. Linux is the fastest OS, and installing The Kraken will get the most out of the machine. I'll see if I can locate my old post and link it.

As for a 2p machine, don't sweat it. I have a pair of Ivy Bridge based 12c/24t Xeons on their way to me to upgrade one of my 2p's. Back in the day, a 2p running SMP was the fastest way to get the most points. With the latest Xeons that will still be the case unless you go for a multi-GPU setup.
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: Better Performance Without Hyperthreading

Post by 7im »

I was just thinking someone had done a lot of testing like this before. ;)

What I did not recall was if it was done on BA work units or SMP work units. If I had to guess, BA work units are larger and so they would scale better with thread count. Looking forward to that link.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
HendricksSA
Posts: 336
Joined: Fri Jun 26, 2009 4:34 am

Re: Better Performance Without Hyperthreading

Post by HendricksSA »

EXT64 and Nathan_P, I was thinking last night that I forgot the OS info. I am running Fedora 21 Workstation with X. I would not mind running the server version or The Kraken if it weren't for my weakness with wireless USB adapter networking. I do not know how to start and log on to a wireless network from a terminal. I am certainly up for optimizing and, like 7im, am looking forward to your reply.
Nathan_P
Posts: 1164
Joined: Wed Apr 01, 2009 9:22 pm

Re: Better Performance Without Hyperthreading

Post by Nathan_P »

Found the link I posted earlier in the year.

viewtopic.php?f=16&t=26399. The test was run with an SMP WU; once BA finishes I'll run some more tests.
Grandpa_01
Posts: 1122
Joined: Wed Mar 04, 2009 7:36 am
Hardware configuration: 3 - Supermicro H8QGi-F AMD MC 6174=144 cores 2.5Ghz, 96GB G.Skill DDR3 1333Mhz Ubuntu 10.10
2 - Asus P6X58D-E i7 980X 4.4Ghz 6GB DDR3 2000 A-Data 64GB SSD Ubuntu 10.10
1 - Asus Rampage Gene III 17 970 4.3Ghz DDR3 2000 2-500GB Segate 7200.11 0-Raid Ubuntu 10.10
1 - Asus G73JH Laptop i7 740QM 1.86Ghz ATI 5870M

Re: Better Performance Without Hyperthreading

Post by Grandpa_01 »

You cannot compare one WU to the next; that approach is flawed from the beginning. You need to save a WU at 0% and run it in both scenarios. There is too much variation from one WU to the next to compare two different WUs. :wink:

I am a bit surprised by the difference you are seeing, though. It is just the opposite of all the tests I have run. I suspect DLB may be engaging in one scenario and not in the other, which could account for that kind of difference in TPF. Or maybe the v3 Intels really suck at HT scaling. There are some BOINC projects that do better with HT off, but that is because they use AVX, and with HT on it bottlenecks the memory lanes. AVX is far more efficient and there is too much data being transferred, so with HT on the threads are continually waiting in line to proceed. To my knowledge, F@H is not using AVX at this point in time.
2 - SM H8QGi-F AMD 6xxx=112 cores @ 3.2 & 3.9Ghz
5 - SM X9QRI-f+ Intel 4650 = 320 cores @ 3.15Ghz
2 - I7 980X 4.4Ghz 2-GTX680
1 - 2700k 4.4Ghz GTX680
Total = 464 cores folding
HendricksSA
Posts: 336
Joined: Fri Jun 26, 2009 4:34 am

Re: Better Performance Without Hyperthreading

Post by HendricksSA »

Grandpa_01, I took work unit variation into account for my somewhat limited testing. I ran the work unit to about 50%, stopped the client, changed hyperthread settings and then finished the work unit with the revised processor count. I also tested it going both ways, from 24 to 48 and from 48 to 24. The results seemed pretty consistent. I do not know how informative the Fedora system monitor is but with hyperthreading on during Folding, it looks like the real cores run at/near 100% and what I assume are the hyperthread group seem to hover about 75%. I too am surprised by my results. I keep track of my project frame times and I will try running all next week with hyperthreads on.
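
One way to check which logical CPUs are hyperthread partners, rather than guessing from the system monitor, is to read the kernel's topology files. The sketch below is illustrative only: it assumes a Linux system that exposes thread_siblings_list under /sys/devices/system/cpu, and the function names are hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def parse_cpu_list(text: str) -> List[int]:
    """Expand a kernel CPU list such as '0,24' or '0-1' into CPU numbers."""
    cpus: List[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def sibling_groups(sysfs_root: str = "/sys/devices/system/cpu") -> List[Tuple[int, ...]]:
    """Return one sorted tuple of logical CPU numbers per physical core."""
    groups = set()
    # cpu[0-9]* matches cpu0, cpu1, ... but skips entries such as cpufreq.
    for path in Path(sysfs_root).glob("cpu[0-9]*/topology/thread_siblings_list"):
        groups.add(tuple(sorted(parse_cpu_list(path.read_text()))))
    return sorted(groups)


if __name__ == "__main__":
    # With hyperthreading on, each printed group should hold two logical CPUs;
    # with it off, each group should hold one.
    for group in sibling_groups():
        print(group)
```

Knowing the pairs makes it possible to tell whether the CPUs hovering near 75% really are the second thread on each core.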

Nathan_P, I read your ref. I can't account for what I am experiencing. Perhaps it is a Fedora 21 thing? This is my first use of 21. My other systems were Fedora 20 and the Intel ones were faster with threading on. If I can find some time, I'll try to test comparing Fedora 20 to 21. Perhaps this is something that crops up with SMP vs Bigadv or is somehow associated with bigger caches in v3 processors or use of DDR4. One last thought. I have NUMA turned on. What is the current thinking on that ... on or off?
Nathan_P
Posts: 1164
Joined: Wed Apr 01, 2009 9:22 pm

Re: Better Performance Without Hyperthreading

Post by Nathan_P »

Not sure on the NUMA thing. I set my last client up over a year ago and have not tinkered with them since. I'm not familiar with Fedora; I use Ubuntu. Two of my systems run the optimised [H]folding appliance and the other one runs Ultimate Edition 3.4. If you had better performance with Fedora 20 and HT on, then something is definitely wrong. Have you tried 20 on the new machine to see if you get the same results? If you do, maybe your OS needs more updates to work better/faster with the Haswell Xeons and DDR4.
HendricksSA
Posts: 336
Joined: Fri Jun 26, 2009 4:34 am

Re: Better Performance Without Hyperthreading

Post by HendricksSA »

Lots of testing later, my earlier observations are confirmed. My server is slower with hyperthreading on, but I think I finally figured out what may be causing it. Fedora's bug tracking software and the system logs are showing a Direct Cache Access (DCA) error. This problem dates back some time (at least to Fedora 19) and was supposedly fixed last year. I have filed a bug report with Red Hat (https://bugzilla.redhat.com/show_bug.cgi?id=1185660) and plan to inform SuperMicro. I am pretty sure memory and cache access problems will slow the computer down and account for my observations, which are at odds with the prevailing thinking here. If this is ever resolved I will report back. I guess it is time to look for a new OS.
Gooders
Posts: 83
Joined: Sun Jan 12, 2014 8:17 pm
Hardware configuration: HP z600-dual 5650 xeons (6 cores-2.67 x2) , 32g ram, gtx780
Location: UK

Re: Better Performance Without Hyperthreading

Post by Gooders »

Has anyone tested the effect of turning off hyperthreading in Windows 7? I'm guessing the gains would be nowhere near as large?
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm

Re: Better Performance Without Hyperthreading

Post by 7im »

There are no gains to folding from turning off HT in any version of Windows. That's why HendricksSA is reporting a bug: his finding of better performance without HT is the opposite of what everyone else has reported about using HT. See the link that Nathan_P posted above.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Better Performance Without Hyperthreading

Post by bruce »

First assume that you have a choice between running one WU on N real cores and running two WUs on 0.5*N cores each. During a given day, you'll complete the same total number of WUs because although you're running twice as many, each one will take twice as long, so there's no net gain in baseline points.

Now assume those two CPUs are HT partners, sharing the same FPU/SSE components. Total performance will increase slightly (reports indicate it's between 10% and 30%) but the scientific value goes down because WUs are returned more slowly. When faced with this challenge, the Pande Group designed a non-linear points system called the Quick Return Bonus (QRB) so that you'll earn more points NOT running twice as many WUs.

Now assume that you have a choice of running ONE WU using either a dedicated non-HT CPU or using both partners which share a HT pair. You will gain between 10% and 30% in total throughput, and somewhat more than that in PPD.

Now suppose I have a 32-way machine which, with HT, can run 64 threads. Further suppose I have a WU which will run successfully on 32 CPUs but which blows up when given 64 threads. I can compare running it on 32 dedicated CPUs (with HT off) to running it with 32 threads (with HT on), but in the second case I'll be using only half of my hardware. Guess what: using only half of my hardware is slower than using it all.

Increasing the slot setting from 32 (non HT) to 64 (with HT) is a (small) benefit, but only if the WU doesn't blow up.
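
To put rough numbers on the QRB argument, here is a small Python sketch. It assumes the bonus has the square-root form final = base * max(1, sqrt(k * deadline / elapsed)), which is how the QRB formula is commonly described. The base points, k, and deadline values are hypothetical and chosen only so the bonus is active, and the function names are made up for illustration.

```python
import math


def wu_points(base_points: float, k: float, deadline_days: float, elapsed_days: float) -> float:
    """Credit for one WU under an assumed square-root quick-return bonus."""
    bonus = max(1.0, math.sqrt(k * deadline_days / elapsed_days))
    return base_points * bonus


def ppd(base_points: float, k: float, deadline_days: float,
        elapsed_days: float, concurrent_wus: int = 1) -> float:
    """Points per day when concurrent_wus WUs each take elapsed_days to finish."""
    return concurrent_wus * wu_points(base_points, k, deadline_days, elapsed_days) / elapsed_days


# Hypothetical WU parameters (not taken from any real project).
BASE, K, DEADLINE = 1000.0, 2.0, 6.0

# One WU on N cores versus two WUs on N/2 cores each (each takes twice as long).
one_fast = ppd(BASE, K, DEADLINE, elapsed_days=0.5)
two_slow = ppd(BASE, K, DEADLINE, elapsed_days=1.0, concurrent_wus=2)
print(round(two_slow / one_fast, 3))  # 0.707

# One WU that finishes 20% faster because both HT partners are used.
ht_gain = ppd(BASE, K, DEADLINE, elapsed_days=0.5 / 1.2) / one_fast
print(round(ht_gain, 3))  # 1.315
```

Under that assumed formula, splitting the hardware between two WUs costs about 29% of PPD, while a throughput gain of s on a single WU raises PPD by s to the power 1.5. That is consistent with the "somewhat more than that in PPD" remark above.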
HendricksSA
Posts: 336
Joined: Fri Jun 26, 2009 4:34 am

Re: Better Performance Without Hyperthreading

Post by HendricksSA »

Given the before-mentioned direct cache access error, I transitioned from Fedora 21 to Ubuntu 14.10. I was initially bummed to see the same error reported in the dmesg output for Ubuntu 14.10. However, Ubuntu apparently installs a really good workaround to the problem and performance with hyperthreading is back to what you would expect (and then some). Initial results:


Project   Fedora 21                   Ubuntu 14.10                Improvement
88xx      48 threads, avg frame 2:13  48 threads, avg frame 1:42  30%
7520      24 threads, avg frame 1:12  48 threads, avg frame 0:52  38%

(Note: in Fedora, 24 threads outperformed 48 threads in all cases.)
Note: I had some minor difficulty installing client 7.4.4 following the Ubuntu install directions. It installed without requesting any user input and began running the service immediately as anonymous. Stopping the client and copying over my config.xml file into /etc/fahclient solved the problem.
EXT64
Posts: 323
Joined: Mon Apr 09, 2012 11:54 pm

Re: Better Performance Without Hyperthreading

Post by EXT64 »

Yep, I've had that problem before. It seems to depend on the distro and the version of that distro, and I can never remember which ones have the problem and which don't. So I always end up hitting the glitch and then fixing it manually.