Better Performance Without Hyperthreading

Posted: Sun Dec 21, 2014 10:38 pm
by HendricksSA
I recently completed a new 2p computer and yes, I know bigadv work units are ending soon. Bruce and 7im remind us regularly to check settings in our own computing environments, so I decided to evaluate the machine running only SMP work units with and without hyperthreading. Unlike many reports, I found I got faster folding performance with hyperthreading turned off. The machine is based around a SuperMicro motherboard with E5-2680v3 processors and DDR4 memory. Going from 48 threads (hyperthreading on) to 24 cores (hyperthreading off), the performance increase ranged from 13.3% to 20.8%, with a corresponding increase in points earned. Specifics follow:
Threads / Project / TPF in min:sec (average of 10 frames across the same and similar projects)
48 / 6095 / 2:20 - 2:31
24 / 6095 / 2:03 - 2:05 (improvement range of 13.8% to 20.8%)
48 / 6096 / 2:16
24 / 6096 / 1:53 - 2:00 (improvement range of 13.3% to 20.4%)
48 / 9009 / :21
24 / 9009 / :18 (improvement of 16.7%)
48 / 9010 / :21
24 / 9010 / :18 (improvement of 16.7%)
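
For anyone wanting to check the arithmetic, here is a minimal sketch of how the percentages above appear to be derived: the speed gain is the 48-thread TPF divided by the 24-thread TPF, minus one. The function names are made up for illustration, and the pairing of range endpoints (fastest with fastest, slowest with slowest for 6095) is inferred from the posted numbers rather than stated by the poster.

```python
def tpf_seconds(tpf: str) -> int:
    """Convert a TPF string such as '2:20' or ':21' to seconds."""
    minutes, seconds = tpf.strip().split(":")
    return int(minutes or 0) * 60 + int(seconds)


def improvement_pct(ht_on_tpf: str, ht_off_tpf: str) -> float:
    """Speed gain of the HT-off run over the HT-on run, in percent."""
    slow = tpf_seconds(ht_on_tpf)
    fast = tpf_seconds(ht_off_tpf)
    return (slow / fast - 1.0) * 100.0


# (project, TPF with 48 threads, TPF with 24 threads), taken from the table above
samples = [
    ("6095", "2:20", "2:03"),
    ("6095", "2:31", "2:05"),
    ("6096", "2:16", "2:00"),
    ("6096", "2:16", "1:53"),
    ("9009", ":21", ":18"),
]

for project, ht_on, ht_off in samples:
    print(f"{project}: {improvement_pct(ht_on, ht_off):.1f}%")
```

Computed this way, the 21 s to 18 s case works out to about 16.7%. Note that with TPFs this short, a one-second rounding difference moves the result by several percentage points.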

Re: Better Performance Without Hyperthreading

Posted: Sun Dec 21, 2014 11:41 pm
by EXT64
Two quick questions to add a little more information:

Operating System?

And are you using The Kraken (affinity wrapper and dlb starter)?

Re: Better Performance Without Hyperthreading

Posted: Mon Dec 22, 2014 10:23 am
by Nathan_P
Have to agree with EXT64, I've run a similar experiment and found the exact opposite. I was using older Xeons, either Westmere or Sandy Bridge based, but the results should be similar. Linux is the fastest OS, and installing The Kraken will get the most out of the machine. I'll see if I can locate my old post and link it.

As for a 2p machine - don't sweat it - I have a pair of Ivy Bridge based 12c/24t Xeons on their way to me to upgrade one of my 2p's. Back in the day a 2p running SMP was the fastest way to get the most points - with the latest Xeons that will still be the case unless you start going for a multi-GPU setup.

Re: Better Performance Without Hyperthreading

Posted: Mon Dec 22, 2014 2:05 pm
by 7im
I was just thinking someone had done a lot of testing like this before. ;)

What I did not recall was if it was done on BA work units or SMP work units. If I had to guess, BA work units are larger and so they would scale better with thread count. Looking forward to that link.

Re: Better Performance Without Hyperthreading

Posted: Mon Dec 22, 2014 2:58 pm
by HendricksSA
EXT64 and Nathan_P, I realized last night that I forgot the OS info. I am running Fedora 21 Workstation with X. I would not mind running the server version or The Kraken if it wasn't for my weakness with wireless USB adapter networking. I do not know how to start and log on to a wireless network from a terminal. I am certainly up for optimizing and, like 7im, I am looking forward to your reply.

Re: Better Performance Without Hyperthreading

Posted: Tue Dec 23, 2014 4:45 pm
by Nathan_P
Found the link I posted earlier in the year.

viewtopic.php?f=16&t=26399. The test was run with an SMP WU; once BA finishes I'll run some more tests.

Re: Better Performance Without Hyperthreading

Posted: Wed Dec 24, 2014 6:06 pm
by Grandpa_01
You cannot compare one WU to the next; that approach is flawed from the beginning. You need to save a WU at 0% and run it in both scenarios, because there is too much variation from one WU to the next to compare two different WUs. :wink:

I am a bit surprised by the difference you are seeing though; that is just the opposite of all the tests I have run. I suspect DLB may be engaging in one scenario and not in the other, which can account for that type of difference in TPF. Or maybe the v3 Intels really suck at HT scaling. There are some BOINC projects that do better with HT off, but that is because they use AVX, and with HT on it bottlenecks the memory lanes. AVX is far more efficient and there is too much data being transferred, so with HT on the threads are continually waiting in line to proceed. To my knowledge F@H is not using AVX at this point in time.

Re: Better Performance Without Hyperthreading

Posted: Wed Dec 24, 2014 10:48 pm
by HendricksSA
Grandpa_01, I took work unit variation into account for my somewhat limited testing. I ran the work unit to about 50%, stopped the client, changed hyperthread settings and then finished the work unit with the revised processor count. I also tested it going both ways, from 24 to 48 and from 48 to 24. The results seemed pretty consistent. I do not know how informative the Fedora system monitor is but with hyperthreading on during Folding, it looks like the real cores run at/near 100% and what I assume are the hyperthread group seem to hover about 75%. I too am surprised by my results. I keep track of my project frame times and I will try running all next week with hyperthreads on.
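
As a side note on reading per-CPU load graphs: on Linux, sysfs reports which logical CPUs share a physical core in `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`. Both logical CPUs in a pair share one core, so neither is the "real" one. The sketch below groups logical CPUs into sibling pairs; the function names are invented for illustration, and it assumes the usual comma/range list format (for example `0,24` or `0-1`).

```python
from pathlib import Path


def parse_cpu_list(text: str) -> list[int]:
    """Parse a kernel CPU list such as '0,24' or '0-1' into CPU numbers."""
    cpus = []
    for part in text.strip().split(","):
        if "-" in part:
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        elif part:
            cpus.append(int(part))
    return cpus


def sibling_groups(sysfs_root: str = "/sys/devices/system/cpu") -> list[tuple[int, ...]]:
    """Return the distinct sets of logical CPUs that share a physical core."""
    groups = set()
    pattern = "cpu[0-9]*/topology/thread_siblings_list"
    for path in Path(sysfs_root).glob(pattern):
        groups.add(tuple(parse_cpu_list(path.read_text())))
    return sorted(groups)


if __name__ == "__main__":
    # With hyperthreading on, each group should contain two logical CPUs;
    # with it off, each group should contain one.
    for group in sibling_groups():
        print(group)
```

Matching these groups against the system monitor's CPU numbering shows which graphs belong to the same core.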

Nathan_P, I read your ref. I can't account for what I am experiencing. Perhaps it is a Fedora 21 thing? This is my first use of 21. My other systems were Fedora 20 and the Intel ones were faster with threading on. If I can find some time, I'll try to test comparing Fedora 20 to 21. Perhaps this is something that crops up with SMP vs Bigadv or is somehow associated with bigger caches in v3 processors or use of DDR4. One last thought. I have NUMA turned on. What is the current thinking on that ... on or off?

Re: Better Performance Without Hyperthreading

Posted: Thu Dec 25, 2014 9:20 pm
by Nathan_P
Not sure on the NUMA thing - I set my last client up over a year ago and have not tinkered with them since. I'm not familiar with Fedora - I use Ubuntu; 2 of my systems run the optimised [H]folding appliance and the other one runs Ultimate Edition 3.4. If you had better performance with Fedora 20 and HT on then it's definitely something wrong. Have you tried 20 on the new machine to see if you get the same results? If you do, maybe your OS needs more updates to work better/faster with the Haswell Xeons & DDR4.

Re: Better Performance Without Hyperthreading

Posted: Tue Jan 27, 2015 4:45 pm
by HendricksSA
Lots of testing later, my earlier observations are confirmed. My server is slower with Hyperthreading on, but I think I finally figured out what may be causing it. Fedora's bug tracking software and the system logs are showing a Direct Cache Access (DCA) error. This problem dates back some time (at least Fedora 19) and was supposedly fixed last year. I have filed a bug report with Red Hat (https://bugzilla.redhat.com/show_bug.cgi?id=1185660) and plan to inform SuperMicro. I am pretty sure memory and cache access problems will slow the computer down and account for my observations that are at odds with the prevailing thinking here. If this is ever resolved I will report back. I guess it is time to look for a new OS.

Re: Better Performance Without Hyperthreading

Posted: Tue Jan 27, 2015 6:39 pm
by Gooders
Has anyone done any testing on the effect of turning off hyperthreading in Windows 7? I'm guessing the gains are going to be nowhere near as large?

Re: Better Performance Without Hyperthreading

Posted: Tue Jan 27, 2015 7:32 pm
by 7im
There are no gains to folding when turning off HT in Windows of any version. That's why HendricksSA is reporting a bug, because his finding of better performance without HT is the opposite of what everyone else has reported about using HT. See the link that Nathan_P posted above.

Re: Better Performance Without Hyperthreading

Posted: Wed Jan 28, 2015 12:00 am
by bruce
First assume that you have a choice between running one WU on N real cores and running two WUs, each on 0.5*N cores. During a given day, you'll complete the same total number of WUs because although you're running twice as many, each one will take twice as long, so there's no net gain in baseline points.

Now assume those two CPUs are HT partners, sharing the same FPU/SSE components. Total performance will increase slightly (reports indicate it's between 10% and 30%) but the scientific value goes down because WUs are returned more slowly. When faced with this challenge, the Pande Group designed a non-linear points system called the Quick Return Bonus (QRB) so that you'll earn more points by NOT running twice as many WUs.

Now assume that you have a choice of running ONE WU using either a dedicated non-HT CPU or using both partners which share a HT pair. You will gain between 10% and 30% in total throughput, and somewhat more than that in PPD.
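
To put rough numbers on the two cases above, here is a small sketch. It assumes the commonly cited form of the bonus, final points = base points * max(1, sqrt(k * deadline / elapsed)); treat that formula as an assumption rather than something stated in this thread. The base points, `k` value and deadline are hypothetical placeholders, not taken from any real project.

```python
import math


def qrb_points(base_points: float, k: float, deadline_days: float, elapsed_days: float) -> float:
    """Credit for one WU under the assumed QRB formula."""
    bonus = max(1.0, math.sqrt(k * deadline_days / elapsed_days))
    return base_points * bonus


def ppd(base_points: float, k: float, deadline_days: float,
        elapsed_days: float, concurrent_wus: int = 1) -> float:
    """Points per day when running `concurrent_wus` WUs that each take `elapsed_days`."""
    per_wu = qrb_points(base_points, k, deadline_days, elapsed_days)
    return concurrent_wus * per_wu / elapsed_days


# Hypothetical project parameters, for illustration only.
BASE, K, DEADLINE = 1000.0, 0.75, 6.0

# Case 1: one WU finishing in 1 day vs two WUs at once, each taking 2 days.
one_wu = ppd(BASE, K, DEADLINE, elapsed_days=1.0, concurrent_wus=1)
two_wus = ppd(BASE, K, DEADLINE, elapsed_days=2.0, concurrent_wus=2)
print(f"one WU vs two WUs: {one_wu / two_wus:.2f}x the PPD")

# Case 2: one WU, with HT adding 10% or 30% throughput.
for gain in (1.10, 1.30):
    with_ht = ppd(BASE, K, DEADLINE, elapsed_days=1.0 / gain)
    print(f"throughput x{gain:.2f} -> PPD x{with_ht / one_wu:.2f}")
```

Under this assumed formula, and while the bonus is above 1, PPD grows as throughput to the power 1.5. A 10% to 30% throughput gain therefore becomes roughly a 15% to 48% PPD gain, and one fast WU earns about 1.41 times the PPD of two half-speed WUs.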

Now suppose I have a 32-way machine which, with HT, can run 64 threads. Further suppose I have a WU which will run successfully on 32 CPUs but which blows up when given 64 threads. I can compare it running on 32 dedicated CPUs (with HT off) to running it with 32 threads (with HT on) but in the second case I'll be using only half of my hardware. Guess what, using only half of my hardware is slower than using it all.

Increasing the slot setting from 32 (non HT) to 64 (with HT) is a (small) benefit, but only if the WU doesn't blow up.

Re: Better Performance Without Hyperthreading

Posted: Sun Feb 08, 2015 8:55 pm
by HendricksSA
Given the aforementioned direct cache access error, I transitioned from Fedora 21 to Ubuntu 14.10. I was initially bummed to see the same error reported in the dmesg output for Ubuntu 14.10. However, Ubuntu apparently installs a really good workaround to the problem and performance with hyperthreading is back to what you would expect (and then some). Initial results:

Project / Fedora 21 avg frame (threads) / Ubuntu 14.10 avg frame (threads) / Improvement
88xx / 2:13 (48) / 1:42 (48) / 30%
7520 / 1:12 (24) / 0:52 (48) / 38%
(Note: in Fedora, 24 threads outperformed 48 threads in all cases.)

Note: I had some minor difficulty installing client 7.4.4 following the Ubuntu install directions. It installed without requesting any user input and began running the service immediately as anonymous. Stopping the client and copying over my config.xml file into /etc/fahclient solved the problem.

Re: Better Performance Without Hyperthreading

Posted: Mon Feb 09, 2015 3:35 am
by EXT64
Yep - I've had that problem before. It seems to depend on which distro and which version of which distro - and I can never remember which have the problem and which don't. So I always end up having it glitch and then fixing it manually.