other than FAH work
Moderator: Site Moderators
other than FAH work
As far as I know (please correct me if I am wrong), FAH uses almost exclusively the floating-point operations of the CPU. I am wondering whether anyone has tried running folding@home SMP alongside another science program that is not floating-point heavy?
-
- Posts: 887
- Joined: Wed May 26, 2010 2:31 pm
- Hardware configuration: Atom330 (overclocked):
Windows 7 Ultimate 64bit
Intel Atom330 dualcore (4 HyperThreads)
NVidia GT430, core_15 work
2x2GB Kingston KVR1333D3N9K2/4G 1333MHz memory kit
Asus AT3IONT-I Deluxe motherboard
- Location: Finland
Re: other than FAH work
I haven't done so specifically for scientific apps, but I've been pleasantly surprised by how well HyperThreading does on my 2C/4T Atom330, allowing me to run my occasional ALU-intensive workloads concurrently with CPU folding. Not quite as good as having four real CPU cores, no, but I can get quite close in some specific real-life scenarios of mine. One recent example is in viewtopic.php?f=66&t=20407&start=30#p203413 / scenario 7. CPU TPF increased roughly 13min => 17min and combined compression speed dropped about 800kBps => 600kBps, a far cry from the 13min => 26min and 800kBps => 400kBps I would expect without HT doing its magic.
Then again, my scenario 7 was a nice symmetric workload, quite friendly for FAH and HT. Got to take another look at BOINC, for example, in case there is something similar available.
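To put those scenario 7 figures side by side, here is a minimal Python sketch. It only restates the numbers quoted above; the helper name `retained` and the idea of adding the two fractions into a single "combined" figure are illustrative assumptions, not anything FAH or the compressor reports.

```python
# Figures quoted in the post: folding time per frame (TPF, minutes) and
# combined compression speed (kB/s), each measured alone and then together.
tpf_alone, tpf_shared = 13.0, 17.0
comp_alone, comp_shared = 800.0, 600.0


def retained(alone: float, shared: float, lower_is_better: bool = False) -> float:
    """Fraction of its stand-alone speed a task keeps when sharing the CPU."""
    return alone / shared if lower_is_better else shared / alone


fah = retained(tpf_alone, tpf_shared, lower_is_better=True)  # 13 / 17
comp = retained(comp_alone, comp_shared)                     # 600 / 800

# 1.0 would mean plain time-slicing (each task at half speed),
# 2.0 would mean both tasks running as fast as they do alone.
combined = fah + comp

print(f"FAH keeps {fah:.0%}, compression keeps {comp:.0%}, combined {combined:.2f}x")
```

With the no-HT expectation (13min => 26min, 800kBps => 400kBps) both fractions would be 50% and the combined figure exactly 1.0.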
Win7 64bit, FAH v7, OC'd
2C/4T Atom330 3x667MHz - GT430 2x832.5MHz - ION iGPU 3x466.7MHz
NaCl - Core_15 - display
Re: other than FAH work
beer wrote: as far as I know (please correct me if I am wrong) FAH is nearly only using floating operations of the CPU. I am wondering if someone has tried to use folding@home smp and another non-floating-operations-heavy program for science?
Napoleon has explained it very well. Here's some additional detail:
There are two limiting factors here. One is whether the FPU and the ALU can both be active at the same time and the other is whether the OS can manage your workload.
Hyperthreading (or Bulldozer) gives the OS the capability of running two threads that have to share the same FPU. If both tasks need the FPU, they compete with each other and both slow down to about half speed. If only one needs the FPU and the other uses the ALU, both tasks can run at (almost) full speed.
Assume a Quad plus HT, which gives your OS 8 threads to work with. A) Run SMP8 and there's nothing free to run anything else. Add 4 ALU tasks and they'll have to compete for OS resources, slowing things down. B) Run SMP4 plus 4 other tasks that use just the ALU and (if the OS assigns them in the optimum order) it's possible that all 8 will run at "normal" speed. In one case, HT gives you no extra performance; in the other case, you get twice the throughput you would get without HT. That's why the advertisements for HT are very careful to use the words "depending on..." or "as much as..."
In fact, in an empty machine, SMP8 does give maybe 15% faster results than SMP4 but that's because no code uses ONLY the FPU. One FAH task uses maybe 90% of the FPU and uses the ALU the rest of the time. Another SMP thread can use the unused resources. My "about half speed" allows for that extra capacity.
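The reasoning above can be turned into a rough toy model. This is only a sketch under strong assumptions: one FPU and one ALU per physical core, each thread spending a fixed fraction of its time on the FPU and the rest on the ALU, and two sibling threads slowing down equally until the busier unit is no longer oversubscribed. The function name `pair_speed` and the 0.9 figure for FAH (taken from the "maybe 90%" estimate) are illustrative; real CPUs are far more complicated.

```python
def pair_speed(fpu_a: float, fpu_b: float) -> float:
    """Speed (1.0 = running alone) of two threads sharing one core.

    fpu_a and fpu_b are the fractions of time each thread spends on the
    FPU; the remainder is assumed to be spent on the ALU.
    """
    fpu_demand = fpu_a + fpu_b
    alu_demand = (1.0 - fpu_a) + (1.0 - fpu_b)
    # The two demands always add up to 2.0, so the larger one is >= 1.0
    # and both threads are throttled by that factor.
    return min(1.0, 1.0 / max(fpu_demand, alu_demand))


FAH = 0.9       # "maybe 90% of the FPU", per the post
ALU_ONLY = 0.0  # a purely integer workload

two_fah = pair_speed(FAH, FAH)     # two folding threads on one core
mixed = pair_speed(FAH, ALU_ONLY)  # folding thread plus an ALU-only task

print(f"FAH + FAH: each at {two_fah:.2f}, core throughput {2 * two_fah:.2f}x")
print(f"FAH + ALU: each at {mixed:.2f}, core throughput {2 * mixed:.2f}x")
```

In this model two folding threads each run at a bit over half speed, so the core gains roughly 11%, the same ballpark as the "maybe 15%" quoted for SMP8 over SMP4, while the mixed pair runs at about 0.91 each, i.e. "(almost) full speed".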
Posting FAH's log:
How to provide enough info to get helpful support.
Re: other than FAH work
Unless I'm mistaken, distributed.net OGR is ALU-only. This particular math puzzle seems to have some interesting practical applications:
"OGR's have many applications including sensor placements for X-ray crystallography and radio astronomy. Golomb rulers can also play a significant role in combinatorics, coding theory and communications, and Dr. Golomb was one of the first to analyze them for use in these areas."
X-ray crystallography is used in protein studies. Who knows, OGR might even benefit FAH indirectly. According to Wikipedia, one of the coordinators of distributed.net in its current form is a certain Adam L. Beberg. Hmm, why does the name Beberg sound vaguely familiar...
For starters, I was surprised to see that running 4 OGR crunchers instead of just 2 gave me a performance boost of over 1.5x. Then again, ALUs are much simpler than FPUs and HyperThreads have some resources duplicated, so I presume the ALU side of my 2C/4T gets closer to a true quad than the FPU side. Consider AMD Bulldozer, for example; the ALU side is a true octocore but it actually has only 4 FPUs... Without further ado, let's have FAH and OGR duke it out on my 2C/4T CPU.
OGR only:
- 28 Mnodes / s == 36ms / Mnode (2 crunchers, 50% CPU)
- 43 Mnodes / s == 23ms / Mnode (4 crunchers, 100% CPU)
- 27min TPF (50% CPU)
- 29min TPF + 20 Mnodes / s == 50ms / Mnode (50% uni + 48% OGR, 100% total)
- 29min TPF (50% uni + 0.5% GPU2 + 3.5% GPU3)
- 34min TPF + 17 Mnodes / s == 59ms / Mnode (50% uni + 0.5% GPU2 + 3.5% GPU3 + 43% OGR, 100% total)
- OGR is Low priority and running on cores 0 and 3 along with GPU cores, ensuring access to both physical ALUs
- Uniprocessor slots are Above Normal priority and running on cores 1 and 2, ensuring access to both physical FPUs as well as minimal preemption from normal processes
- GPU2 slot is High priority and running on core 0, preempting just about everything on it
- GPU3 slot is High priority and running on core 3, preempting just about everything on it
- GPU folding performance remains unaffected in all cases, no surprises there
- GPU folding requires also some kernel time because it needs to access the GPU hardware through drivers
- CPU kernel time produced by GPU folding seems to stick to cores 0 and 3 according to Task Manager graphs
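For anyone wanting to reproduce the core assignments listed above: tools that pin a process to cores usually want a bitmask (the Windows `start` command, for instance, takes a hexadecimal mask via `/AFFINITY` along with priority switches such as `/LOW`, `/ABOVENORMAL` and `/HIGH`). The sketch below only computes those masks; the helper name `affinity_mask` is made up for illustration, and the post doesn't say which tool was actually used to set affinity and priority.

```python
def affinity_mask(cores):
    """Return the bitmask with one bit set per logical core number."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return mask


# Core assignments as described in the list above.
layout = {
    "OGR (Low)": [0, 3],
    "uniprocessor slots (Above Normal)": [1, 2],
    "GPU2 slot (High)": [0],
    "GPU3 slot (High)": [3],
}

for name, cores in layout.items():
    print(f"{name}: cores {cores} -> mask 0x{affinity_mask(cores):X}")
```

The layout only makes sense if logical cores 0/1 sit on one physical core and 2/3 on the other, which is what "access to both physical ALUs" and "both physical FPUs" implies; it is worth checking how your own OS numbers its HyperThreads before copying the masks.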
Conclusion: I'm going to stick with 2x uni + 2x GPU + 2x OGR. About (34-29) / 29 * 100% == 17% increase in uniprocessor TPF introduced by adding OGR to the mix isn't that bad. Uniprocessor WUs have long deadlines anyway, and for some reason I never get any A4 WUs, so it's not like I'm losing any QRBes either.
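The arithmetic behind these figures is easy to re-check. A minimal sketch, using only the numbers quoted in this post (the variable names and the `ms_per_mnode` helper are just for illustration):

```python
def ms_per_mnode(mnodes_per_s: float) -> float:
    """Convert an OGR rate in Mnodes/s to milliseconds per Mnode."""
    return 1000.0 / mnodes_per_s


ogr_2_crunchers = 28.0   # Mnodes/s, OGR only, 50% CPU
ogr_4_crunchers = 43.0   # Mnodes/s, OGR only, 100% CPU
ogr_with_fah = 17.0      # Mnodes/s, alongside uni + GPU folding
tpf_without_ogr = 29.0   # minutes, uni + GPU folding
tpf_with_ogr = 34.0      # minutes, uni + GPU folding + OGR

ht_scaling = ogr_4_crunchers / ogr_2_crunchers
tpf_increase = (tpf_with_ogr - tpf_without_ogr) / tpf_without_ogr * 100.0

print(f"2 -> 4 crunchers: {ht_scaling:.2f}x")
print(f"OGR alone: {ms_per_mnode(ogr_4_crunchers):.0f} ms/Mnode, "
      f"with FAH: {ms_per_mnode(ogr_with_fah):.0f} ms/Mnode")
print(f"Uniprocessor TPF increase: {tpf_increase:.0f}%")
```

So the "over 1.5x" from HyperThreads and the 17% TPF penalty both check out, and OGR still gets 17 of its 43 Mnodes/s while folding runs in parallel.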
Last edited by Napoleon on Mon Jan 23, 2012 6:59 pm, edited 5 times in total.
-
- Posts: 450
- Joined: Tue Dec 04, 2007 8:36 pm
Re: other than FAH work
Napoleon wrote: Unless I'm mistaken, distributed.net OGR is ALU-only, and they even have a 64bit client available.
I don't understand why you mentioned 64bit. If we're still talking about HT and sharing ALU-only with FPU-mostly code, it shouldn't matter whether it's 32bit or 64bit.
Perhaps you're responding to the discussions about the absence of a 64bit v7 client for F@h. If so, perhaps someone needs to remind you that F@h has a 64bit SMP core which works with the 32bit client and covers those bigadv cases where large amounts of RAM are needed.
Perhaps you meant something else.
Re: other than FAH work
Okay, I edited the offending sentence and corrected a typo or two. What I meant is that distributed.net has both 32bit and 64bit client versions available. All the rest is strictly about "HT and sharing ALU-only with FPU-mostly code", as you put it. My choice of words would have been "HT and running ALU-only code concurrently with FPU-mostly code".
Gee, I had no idea that merely mentioning 64bit in a single subordinate clause could make all the other sentences in my post appear to be off-topic. Better now? Bear with me, English isn't my native language.
Re: other than FAH work
Napoleon wrote: ...English isn't my native language...
Trust me, your English is much better than that of many 'native' speakers...