I have a quad running 3 SMP clients on cores 2-4, and an ATI GPU client on core 1 + a uniprocessor client on core 1 picking up the slack. This box is on Vista 64-bit by the way.
The SMP clients average about 1200-1250 PPD each, so roughly 3600-3700 PPD on 3 cores.
If I run one SMP client on all cores at idle priority and the GPU client on low, the SMP gets about 2000 PPD (it varies some, say 1700-2200).
If I run one SMP client on 2 cores, it produces about the same, e.g. 2000-2100 PPD.
If I run one SMP client on 3 cores, it varies a lot; I've seen anywhere from 1400 to 2500 PPD. Generally it seems 2 cores are faster, and for sure one SMP client per core is the fastest, by far.
My work PC has a Core 2 Duo which, when idle and running one SMP client, gets about 1500-1600 PPD in Vista. When I use it (standard office use, basically) it slows down a great deal; when I checked it today I was down at just 500 PPD. It doesn't do a great deal of work otherwise, but I guess the constant processor use by higher-priority processes really messes up the MPI. I think I'm going to stick a uniprocessor client on this one instead.
Quad-core 2Ghz vs Dual-core 4Ghz - Which faster?
-
- Posts: 10179
- Joined: Thu Nov 29, 2007 4:30 pm
- Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
- Location: Arizona
Re: Quad-core 2Ghz vs Dual-core 4Ghz - Which faster?
Hold on there shatteredsilicon, your post above talks about 1 client on 4 cores vs. 4 clients on 4 cores? That's quite a bit different from the original question of 2x2 GHz vs. 1x4 GHz.
And I think in our hypothetical discussion, we can agree that it should be about dedicated folding machines when debating the performance of one config vs. another. If you want to start throwing in real-world variable CPU loads, we might as well close the thread. There is no way to accurately predict how those non-dedicated machines would perform. But I venture to guess the 1x4 GHz config would perform less well.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
-
- Posts: 87
- Joined: Tue Jul 08, 2008 2:27 pm
- Hardware configuration: 1x Q6600 @ 3.2GHz, 4GB DDR3-1333
1x Phenom X4 9950 @ 2.6GHz, 4GB DDR2-1066
3x GeForce 9800GX2
1x GeForce 8800GT
CentOS 5 x86-64, WINE 1.x with CUDA wrappers
Re: Quad-core 2Ghz vs Dual-core 4Ghz - Which faster?
7im wrote: Hold on there shatteredsilicon, your post above talks about 1 client on 4 cores vs. 4 clients on 4 cores? That's quite a bit different than the original question of 2x2 GHz vs. 1x4 GHz.
No, actually, the cases are as equivalent as you can get. Single-core performance scales linearly with clock speed: assuming that a single 4GHz core will perform twice as fast as a single 2GHz core is pretty much bang on the money. Thus, if you have a 2GHz 2-core CPU, you can measure the performance of the SMP client bound to just one core and then to two cores. RAM I/O doesn't seem to be a bottleneck on my machines: running one client per CPU core scales linearly (PPD-wise) with the number of cores used, which indicates the problem is CPU bound rather than memory bound. So any discrepancy between twice the single-core speed at 2GHz (the guesstimated 1x4GHz performance) and the speed of the client running on 2x2GHz cores simultaneously comes directly from the imbalances and overheads incurred.
7im wrote: And I think in our hypothetical discussion, we can agree that the discussion should be about dedicated folding machines when debating performance of one config vs. another. If you want to start throwing in real world variable CPU loads, we might as well close the thread. There is no way to accurately predict how those non-dedicated machines would perform. But I venture to guess the 1x4 GHz config would perform less well.
I disagree on several points here:
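The comparison described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name is made up for the example, and the PPD figures passed in are placeholders, not measurements from any machine in this thread.

```python
def estimate_smp_overhead(ppd_one_core, ppd_two_cores):
    """Compare a guesstimated 1x4GHz result with a measured 2x2GHz result.

    ppd_one_core:  PPD of the SMP client bound to one 2GHz core (measured)
    ppd_two_cores: PPD of the SMP client bound to both 2GHz cores (measured)
    Assumes single-core performance scales linearly with clock speed.
    """
    est_single_4ghz = 2 * ppd_one_core           # linear scaling assumption
    overhead = est_single_4ghz - ppd_two_cores   # PPD lost to imbalance/overhead
    return est_single_4ghz, overhead, overhead / est_single_4ghz


# Placeholder inputs only, chosen to make the arithmetic easy to follow.
estimate, lost, fraction = estimate_smp_overhead(1000, 1700)
print(estimate, lost, fraction)
```

A positive overhead would mean the two slower cores lose PPD relative to the guesstimated single fast core.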
1) I don't think it makes sense to discuss this just in the context of dedicated machines in this way, especially because throwing a GPU client on this dedicated folding machine will blow the whole thing out of the water with the mentioned imbalance problem. The point being that a dedicated folding machine isn't necessarily a machine dedicated only to SMP folding. GPU client or other "unpredictable" load, it doesn't really matter, it'll kill the scaling just the same.
2) Real-world variable CPU loads are what most people's machines will face. Only a very small number of people have dedicated folding farms. I would imagine that most contributors have various machines they use for other things, and F@H is just a way to make extra use of them by running the client at low priority whenever there is some spare CPU time to be had (e.g. various servers). It is much easier to justify running F@H on a machine that is already on and serving some load than to justify running a machine 24/7 that would otherwise only be switched on for a few hours a day (e.g. a home desktop machine).
3) While you cannot predict exactly how a non-dedicated machine would perform, formulating a model is actually very easy once you recognize that the folding process scales with the slowest thread. Take a 4-core CPU running an SMP client and a GPU client. Under Linux, on my own systems, the GPU client uses around 19-25% of one core depending on the performance of the processor; let's assume 25% since it's a nice round figure to work with. The SMP client will then only perform at about 75% capacity, with 3x25% being left idle. In this case the SMP client running on 4 cores is at best only 50% faster than when limited to 2 cores. This is also a somewhat optimistic guesstimate, because there are significant overheads incurred by migrating processes between CPU cores in an attempt to load-balance them, which also causes cache misses, etc. AMD CPUs may handle this more efficiently than Core2 class CPUs.
4) The 1x4GHz scenario would perform better because there is no wasted/idle time caused by the slowest-thread limitation. On a single-core CPU, all processes run on the same core, so they all get scheduled one after another. That means all processes are automatically balanced to the full capacity of the CPU. Whatever the GPU client uses, the SMP client gets all of the remainder. In the 4x1GHz case, with the GPU client taking 25% of one core, you'd get 3GHz equivalent on the SMP client, 0.25GHz GPU, 0.75GHz idle. In the 1x4GHz case the same GPU load gives you 0.25GHz GPU, 3.75GHz SMP, and no idle time.
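A minimal Python sketch of the slowest-thread model from points 3) and 4). It assumes one SMP thread per core, that every thread advances at the pace of the slowest one, and that the GPU client costs a fixed 0.25GHz of work on a single core. The function name and the "GHz equivalent" bookkeeping are illustrative only, and migration and cache overheads are ignored.

```python
def smp_throughput(core_ghz, n_cores, gpu_load_ghz):
    """Return (smp_ghz, idle_ghz) under the slowest-thread assumption."""
    # Capacity left on each core after the GPU client takes its share of core 0.
    free = [core_ghz] * n_cores
    free[0] -= gpu_load_ghz

    # Every SMP thread is held to the pace of the slowest one.
    slowest = min(free)
    smp_ghz = slowest * n_cores

    # Whatever the faster cores could have done beyond that pace is wasted.
    idle_ghz = sum(free) - smp_ghz
    return smp_ghz, idle_ghz


print(smp_throughput(1.0, 4, 0.25))  # 4x1GHz with GPU client -> (3.0, 0.75)
print(smp_throughput(4.0, 1, 0.25))  # 1x4GHz with GPU client -> (3.75, 0.0)

# "At best 50% faster than 2 cores": assumes the 2-core SMP run sits on
# cores that are not shared with the GPU client.
four_core, _ = smp_throughput(1.0, 4, 0.25)
two_core, _ = smp_throughput(1.0, 2, 0.0)
print(four_core / two_core)          # -> 1.5
```

Changing `gpu_load_ghz` to 0.19 covers the lower end of the 19-25% range mentioned in point 3).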
-
- Posts: 10179
- Joined: Thu Nov 29, 2007 4:30 pm
- Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
- Location: Arizona
Re: Quad-core 2Ghz vs Dual-core 4Ghz - Which faster?
No, those are not as equivalent as you can get. 1 client on 4 cores is one of the least efficient configurations (CPU usage, NOT folding usage, big difference) because the SMP fahcores do not yet scale well (ignoring the scant few sightings of the a2 core).
When making the comparison, it's MUCH easier to determine a winner when you eliminate as many variables as possible. Now you want to add as many variables as possible. With those assumptions and guesses, as I said before, we might as well close this thread. No clear determination could be made, let alone be supported by any hard facts.
Feel free to start a new thread if you like. It might be interesting to see what title you choose.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.