Dynamic Load Balancing
Posted: Sat Jan 21, 2012 10:06 am
by One_Box
I know this subject has been discussed before. Does V7 suffer from the same issues as 6.34 (Linux)?
I don't suppose there is a solution to the problem yet?
Re: Dynamic Load Balancing
Posted: Sat Jan 21, 2012 2:31 pm
by Grandpa_01
Dynamic Load Balancing is an OS issue, not a GROMACS or client version issue.
Re: Dynamic Load Balancing
Posted: Sat Jan 21, 2012 3:46 pm
by One_Box
I have tried Ubuntu 10.10, Ubuntu 11.04 and Linux Mint 12.0 (which seems to be the best from personal observation) and they all suffer from the problem to some degree.
Are you aware of an OS that doesn't have the problem?
Thanks for your help.
Re: Dynamic Load Balancing
Posted: Sat Jan 21, 2012 3:57 pm
by 7im
Like fahlimit?
Edit: never mind, wrong load...
Re: Dynamic Load Balancing
Posted: Sat Jan 21, 2012 5:20 pm
by artoar_11
I have also noticed that "dynamic load balancing" is turned on in some WUs. In that case, the end of the WU shows the following (only in Linux, only in the terminal window) (p75xx):
Code:
[04:12:11] Completed 0 out of 500000 steps (0%)
NOTE: Turning on dynamic load balancing
[04:17:28] Completed 5000 out of 500000 steps (1%)
[04:22:29] Completed 10000 out of 500000 steps (2%)
..............
[12:32:27] Completed 500000 out of 500000 steps (100%)
Writing final coordinates.
Average load imbalance: 0.4 %
Part of the total run time spent waiting due to load imbalance: 0.2 %
Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 %
When "dynamic load balancing" is not activated, I see the following (p6098):
Code:
[07:06:37] Completed 0 out of 500000 steps (0%)
[07:19:17] Completed 5000 out of 500000 steps (1%)
[07:31:57] Completed 10000 out of 500000 steps (2%)
[07:44:35] Completed 15000 out of 500000 steps (3%)
..................
[04:09:53] Completed 500000 out of 500000 steps (100%)
Writing final coordinates.
Average load imbalance: 2.2 %
Part of the total run time spent waiting due to load imbalance: 0.9 %
Note the difference in the percentages. It is not much, but the processor calculates faster when it is activated. "Dynamic Load Balancing" has been discussed before:
viewtopic.php?f=44&t=13324&start=0&hilit=Dynamic+Load+Balancing#p135781
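For anyone who wants to put numbers on the two logs above, here is a minimal Python sketch that computes the time per frame (TPF) from the "Completed" lines. The function name `time_per_frame` and the regular expression are my own, written only for the log format shown in this post. Keep in mind that the two logs come from different projects (p75xx and p6098), so the difference in TPF cannot be attributed to dynamic load balancing alone.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as "[04:17:28] Completed 5000 out of 500000 steps (1%)"
LINE = re.compile(r"\[(\d{2}:\d{2}:\d{2})\] Completed (\d+) out of (\d+) steps")

def time_per_frame(log_lines):
    """Return the time elapsed between consecutive 'Completed' lines."""
    stamps = []
    for line in log_lines:
        match = LINE.search(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%H:%M:%S"))
    deltas = []
    for earlier, later in zip(stamps, stamps[1:]):
        delta = later - earlier
        if delta < timedelta(0):
            # The log has no date, so a negative gap means it crossed midnight.
            delta += timedelta(days=1)
        deltas.append(delta)
    return deltas

p75xx = [
    "[04:12:11] Completed 0 out of 500000 steps (0%)",
    "NOTE: Turning on dynamic load balancing",
    "[04:17:28] Completed 5000 out of 500000 steps (1%)",
    "[04:22:29] Completed 10000 out of 500000 steps (2%)",
]
p6098 = [
    "[07:06:37] Completed 0 out of 500000 steps (0%)",
    "[07:19:17] Completed 5000 out of 500000 steps (1%)",
    "[07:31:57] Completed 10000 out of 500000 steps (2%)",
    "[07:44:35] Completed 15000 out of 500000 steps (3%)",
]

for name, log in (("p75xx", p75xx), ("p6098", p6098)):
    print(name, [str(d) for d in time_per_frame(log)])
```

Only consecutive "Completed" lines are compared, so the elided parts of the logs are not needed.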
Re: Dynamic Load Balancing
Posted: Sat Jan 21, 2012 6:02 pm
by Grandpa_01
Any generic-ck build of the Linux kernel uses the BFS scheduler. You can find Musky's guide for installing it over at the [H]:
http://hardforum.com/showthread.php?t=1601608
Re: Dynamic Load Balancing
Posted: Sat Jan 21, 2012 6:16 pm
by brutis
Have you tried tear's Kraken?
http://www.amdzone.com/phpbb3/viewtopic ... 1&t=138463
Re: Dynamic Load Balancing
Posted: Sat Jan 21, 2012 6:22 pm
by 7im
One_Box wrote:I know this subject has been discussed before. Does V7 suffer from the same issues as 6.34 (Linux)?
I don't suppose there is a solution to the problem yet?
Actually, this is probably fixed in the more recent fahcores as they are using a newer version of gromacs.
Also, if people were still seeing a 10x slowdown as the previous thread mentioned, there would be a lot more posts about this, and there aren't.
Re: Dynamic Load Balancing
Posted: Sat Jan 21, 2012 7:01 pm
by bruce
artoar_11 wrote:I also noticed that in some WUs "dynamic load balancing" worked. Then at the end of WU (only in Linux, only in the terminal window) shows the following (p75xx): . . .
There are at least two (actually 3) very different sources of Load Imbalance.
1) The one most of you are thinking of is what we see when we allocate SMP threads to each of our processors and GROMACS is not running on a dedicated machine. Some higher priority task steps in and preempts the processing of one or more threads, disrupting the desired pattern of equal CPU resources being available to each thread.
2) When a given protein is processed in an SMP environment, a specific group of atoms is assigned to each thread. The basic assumption that each atom requires exactly the same amount of processing and that the total number of atoms can be divided into N equal groups is only approximately true, so even without any disruptions from item 1, the processing times of each thread may not be equal.
3) We use GROMACS in a Symmetric MultiProcessing environment. Although it doesn't apply to us, GROMACS is capable of running on an Asymmetric cluster of nodes, where the processing speed of the various CPUs may be unequal.
In the case where the workload is locked to specific processors, items 2 and 3 are pretty stable. If one group of atoms consistently needs more processor time than another group, dynamic load balancing can make adjustments by moving a few atoms from one thread to another. (Similarly if one CPU is consistently slower than another, but remember 3 doesn't apply to us.)
Now let's consider item 1. The amount of CPU time used by other processes is highly variable from step to step (except in an idle machine). In most cases, the processor that is preempted to do that non-FAH work is chosen by the OS and that will vary. GROMACS cannot find a consistent, discernible pattern of one CPU being slower than another by precisely the same amount at every time-step. Dynamic Load Balancing can decide that one CPU is slow ... reduce the work in that thread ... and then find that on the next step, that processor is no longer the slow one. It's really unlikely that DLB can do anything useful about item 1 because conditions vary so much from time-step to time-step.
In our environment, DLB does help with item 2. While folding proceeds, atoms move, both randomly over short distances and systematically over longer distances. During the run, as the systematic motions proceed, atoms that were formerly far from other atoms may get closer to more atoms or they may move away from other atoms. The calculation of forces on an individual atom grows more complicated when there are more atoms nearby and becomes less complicated when there are fewer close neighbors. Thus the time spent calculating each group of atoms can change over the course of many steps. Changes to the calculations that are caused by changes in protein shape can be dynamically balanced.
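The contrast between item 1 and item 2 can be illustrated with a toy simulation. This is only a sketch under my own assumptions, not GROMACS's actual algorithm (GROMACS adjusts domain decomposition cell sizes rather than moving atoms one at a time). The function names, the 0.5 damping factor, the 20% preemption penalty, and the imbalance formula (max - mean) / mean are all hypothetical choices made for illustration.

```python
import random

def imbalance(times):
    """Load imbalance as (max - mean) / mean (an assumed definition for this sketch)."""
    mean = sum(times) / len(times)
    return (max(times) - mean) / mean

def simulate(per_atom_cost, steps=200, balance=True, preempt=0.0, seed=0):
    """Return the mean imbalance over the last 50 steps of a toy run.

    per_atom_cost: persistent cost per atom for each thread (item 2).
    preempt: fractional slowdown applied to one random thread per step (item 1).
    """
    rng = random.Random(seed)
    n = len(per_atom_cost)
    atoms = [1000.0] * n              # start with an equal split
    history = []
    for _ in range(steps):
        times = [a * c for a, c in zip(atoms, per_atom_cost)]
        if preempt:
            victim = rng.randrange(n)  # the OS picks a different thread each step
            times[victim] *= 1.0 + preempt
        history.append(imbalance(times))
        if balance:
            slow = times.index(max(times))
            fast = times.index(min(times))
            # The balancer only sees measured times, so it estimates cost per atom.
            cost_slow = times[slow] / atoms[slow]
            cost_fast = times[fast] / atoms[fast]
            # Move enough atoms to close half of the measured gap.
            moved = 0.5 * (times[slow] - times[fast]) / (cost_slow + cost_fast)
            atoms[slow] -= moved
            atoms[fast] += moved
    tail = history[-50:]
    return sum(tail) / len(tail)

# Item 2: one group of atoms is persistently 20% more expensive.
print("persistent, no DLB:", simulate([1.0, 1.0, 1.0, 1.2], balance=False))
print("persistent, DLB   :", simulate([1.0, 1.0, 1.0, 1.2], balance=True))
# Item 1: equal work, but a random thread is preempted on every step.
print("preempted, no DLB :", simulate([1.0] * 4, balance=False, preempt=0.2))
print("preempted, DLB    :", simulate([1.0] * 4, balance=True, preempt=0.2))
```

In this toy model the persistent imbalance of about 14% is driven close to zero once balancing is on, because the same thread is slow on every step. With random preemption the balancer keeps correcting for a slowdown that has already moved to another thread, so the imbalance does not go away.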
Re: Dynamic Load Balancing
Posted: Sat Jan 21, 2012 8:08 pm
by artoar_11
@ bruce, thanks for the detailed explanation.
In terms of time (as a percentage), how much difference does the BFS scheduler make, if you have that information? I am running Ubuntu 10.10 64-bit now, and I can try it if it has a good effect.
@ brutis, some time ago I asked tear about Kraken. Kraken has been shown to be effective only in multiprocessor configurations. I tried it on my Q9400 and the effect was zero.
Re: Dynamic Load Balancing
Posted: Sat Jan 21, 2012 8:32 pm
by brutis
good to know and thanks artoar_11
Re: Dynamic Load Balancing
Posted: Sat Jan 21, 2012 10:05 pm
by Grandpa_01
I have seen results of up to 2 min. less per frame on 6903 and 6904 WUs. That was on early versions of 10.10, I believe .22. On the latest version, .35, I really have not noticed that large a difference, but when you watch the CPU usage in the monitor you can see the difference. If you search the [H] forums there is some documentation from a while back on the results of testing BFS vs the default scheduler. And yes, the Kraken is for MP rigs and BFS is for single P rigs.
Re: Dynamic Load Balancing
Posted: Sun Jan 22, 2012 6:54 am
by One_Box
@bruce, thanks for the detailed explanation of DLB.
I'll try a ck kernel and see if that improves matters.
Re: Dynamic Load Balancing
Posted: Sun Jan 22, 2012 1:27 pm
by artoar_11
Grandpa_01 wrote:I have seen results of up to 2 min. less per frame on 6903 and 6904 WUs. That was on early versions of 10.10, I believe .22. On the latest version, .35, I really have not noticed that large a difference, but when you watch the CPU usage in the monitor you can see the difference. If you search the [H] forums there is some documentation from a while back on the results of testing BFS vs the default scheduler. And yes, the Kraken is for MP rigs and BFS is for single P rigs.
Sorry for straying from the topic title.
I installed BFS with Linux kernel 2.6.35-30-generic-ck.
In System Monitor, %CPU is now ~400%; before it was 392-396%. But TPF has not changed.
I suppose BFS reduces TPF on processors with more cores (6/8/12 ...) or on processors with HT?
I tried it on a Q9400 with 6 MB of L2 cache. I had to try. Thank you, Grandpa_01.