Dynamic Load Balancing
I know this subject has been discussed before, but does V7 suffer from the same issues as 6.34 (Linux)?
I don't suppose there is a solution to the problem yet?
- Posts: 1122
- Joined: Wed Mar 04, 2009 7:36 am
- Hardware configuration: 3 - Supermicro H8QGi-F AMD MC 6174=144 cores 2.5GHz, 96GB G.Skill DDR3 1333MHz Ubuntu 10.10
2 - Asus P6X58D-E i7 980X 4.4GHz 6GB DDR3 2000 A-Data 64GB SSD Ubuntu 10.10
1 - Asus Rampage Gene III i7 970 4.3GHz DDR3 2000 2-500GB Seagate 7200.11 RAID 0 Ubuntu 10.10
1 - Asus G73JH Laptop i7 740QM 1.86GHz ATI 5870M
Re: Dynamic Load Balancing
Dynamic Load Balancing is an OS issue, not a GROMACS or client version issue.
2 - SM H8QGi-F AMD 6xxx=112 cores @ 3.2 & 3.9GHz
5 - SM X9QRI-f+ Intel 4650 = 320 cores @ 3.15GHz
2 - i7 980X 4.4GHz 2-GTX680
1 - 2700K 4.4GHz GTX680
Total = 464 cores folding
Re: Dynamic Load Balancing
I have tried Ubuntu 10.10, Ubuntu 11.04 and Linux Mint 12.0 (which, from personal observation, seems to be the best), and they all suffer from the problem to some degree.
Are you aware of an OS that doesn't have the problem?
Thanks for your help.
- Posts: 10179
- Joined: Thu Nov 29, 2007 4:30 pm
- Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengeance (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicone fan gaskets and silicone mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
- Location: Arizona
Re: Dynamic Load Balancing
Like fahlimit?
Edit: never mind, wrong load...
Last edited by 7im on Sat Jan 21, 2012 6:24 pm, edited 1 time in total.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
- Posts: 652
- Joined: Sun Nov 22, 2009 8:42 pm
- Hardware configuration: AMD R7 3700X @ 4.0 GHz; ASUS ROG STRIX X470-F GAMING; DDR4 2x8GB @ 3.0 GHz; GByte RTX 3060 Ti @ 1890 MHz; Fortron-550W 80+ bronze; Win10 Pro/64
- Location: Bulgaria/Team #224497/artoar11_ALL_....
Re: Dynamic Load Balancing
I have also noticed that dynamic load balancing was turned on in some WUs. In that case, the end of the WU (only on Linux, and only in the terminal window) shows the following (p75xx):
When dynamic load balancing is not activated, I see the following (p6098):
Note the difference in the percentages. It is not much, but the processor calculates faster when DLB is activated. Dynamic Load Balancing has also been discussed before:
viewtopic.php?f=44&t=13324&start=0&hilit=Dynamic+Load+Balancing#p135781
Code (p75xx, dynamic load balancing turned on):
[04:12:11] Completed 0 out of 500000 steps (0%)
NOTE: Turning on dynamic load balancing
[04:17:28] Completed 5000 out of 500000 steps (1%)
[04:22:29] Completed 10000 out of 500000 steps (2%)
..............
[12:32:27] Completed 500000 out of 500000 steps (100%)
Writing final coordinates.
Average load imbalance: 0.4 %
Part of the total run time spent waiting due to load imbalance: 0.2 %
Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 %
Code (p6098, dynamic load balancing not activated):
[07:06:37] Completed 0 out of 500000 steps (0%)
[07:19:17] Completed 5000 out of 500000 steps (1%)
[07:31:57] Completed 10000 out of 500000 steps (2%)
[07:44:35] Completed 15000 out of 500000 steps (3%)
..................
[04:09:53] Completed 500000 out of 500000 steps (100%)
Writing final coordinates.
Average load imbalance: 2.2 %
Part of the total run time spent waiting due to load imbalance: 0.9 %
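The two logs above can also be compared by time per frame (TPF). A minimal Python sketch, assuming a frame is 1% of the WU's steps and that consecutive progress lines are less than 24 hours apart (the second log crosses midnight). The helpers `seconds_per_frame` and `as_min_sec` are hypothetical, not part of any FAH tool.

```python
import re

# Matches progress lines such as "[04:12:11] Completed 5000 out of 500000 steps (1%)".
PROGRESS = re.compile(r"\[(\d{2}):(\d{2}):(\d{2})\] Completed (\d+) out of (\d+) steps")


def seconds_per_frame(log_text: str) -> float:
    """Average wall-clock seconds per frame (1% of the WU's steps).

    Non-matching lines (e.g. the "....." elisions) are skipped. A timestamp
    earlier than the previous one is treated as having crossed midnight, which
    assumes no two consecutive progress lines are 24 hours or more apart.
    """
    elapsed = 0        # seconds since the first progress line
    prev = None        # previous timestamp, in seconds after midnight
    points = []        # (elapsed seconds, steps completed)
    total_steps = 0
    for m in PROGRESS.finditer(log_text):
        hh, mm, ss, done, total_steps = map(int, m.groups())
        now = hh * 3600 + mm * 60 + ss
        if prev is not None:
            delta = now - prev
            if delta < 0:  # the wall clock wrapped past midnight
                delta += 24 * 3600
            elapsed += delta
        prev = now
        points.append((elapsed, done))
    if len(points) < 2:
        raise ValueError("need at least two progress lines")
    (t0, s0), (t1, s1) = points[0], points[-1]
    frames = 100 * (s1 - s0) / total_steps
    return (t1 - t0) / frames


def as_min_sec(seconds: float) -> str:
    """Format seconds as m:ss, rounded to the nearest second."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}:{secs:02d}"
```

Fed the first log it gives 300.16 s (5:00 per frame); the second gives 757.96 s (12:38 per frame). Since the logs come from different projects (p75xx vs. p6098), that TPF gap cannot be attributed to DLB alone; the imbalance percentages are the better comparison.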
Re: Dynamic Load Balancing
Any generic-ck Linux kernel uses the BFS scheduler. You can find Musky's guide to installing it over at [H]ardForum: http://hardforum.com/showthread.php?t=1601608
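Before comparing schedulers, it helps to confirm which kernel is actually running. A hedged Python sketch: the name check relies on the -ck naming seen later in this thread (2.6.35-30-generic-ck), and the check for `rr_interval` and `iso_cpu` under /proc/sys/kernel rests on our assumption that BFS exposes those tunables while the default scheduler does not. Both function names are hypothetical.

```python
import os
import platform


def looks_like_ck_kernel(release: str) -> bool:
    """Heuristic only: -ck kernel builds usually carry "-ck" in their release
    string, e.g. "2.6.35-30-generic-ck". A matching name is a hint, not proof."""
    return "-ck" in release


def has_bfs_tunables(proc_dir: str = "/proc/sys/kernel") -> bool:
    """ASSUMPTION: BFS adds rr_interval and iso_cpu tunables under
    /proc/sys/kernel, which the default scheduler does not provide."""
    return all(os.path.exists(os.path.join(proc_dir, name))
               for name in ("rr_interval", "iso_cpu"))


if __name__ == "__main__":
    release = platform.release()  # same string that `uname -r` prints
    print(f"kernel {release}: ck name {looks_like_ck_kernel(release)}, "
          f"BFS tunables {has_bfs_tunables()}")
```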
Re: Dynamic Load Balancing
Bazinga!
Re: Dynamic Load Balancing
One_Box wrote: I know this subject has been discussed before, does V7 suffer from the same issues as 6.34 (Linux).
I don't suppose there is a solution to the problem yet ?
Actually, this is probably fixed in the more recent FahCores, as they use a newer version of GROMACS.
Also, if people were still seeing the 10x slowdown mentioned in the previous thread, there would be a lot more posts about this, and there aren't.
Re: Dynamic Load Balancing
artoar_11 wrote: I also noticed that in some WUs "dynamic load balancing" worked. Then at the end of WU (only in Linux, only in the terminal window) shows the following (p75xx): . . .
There are at least two (actually three) very different sources of load imbalance:
1) The one most of you are thinking of is what we see when we allocate SMP threads to each of our processors and GROMACS is not running on a dedicated machine. Some higher-priority task steps in and preempts the processing of one or more threads, disrupting the desired pattern of equal CPU resources being available to each thread.
2) When a given protein is processed in an SMP environment, a specific group of atoms is assigned to each thread. The basic assumption that each atom requires exactly the same amount of processing, and that the total number of atoms can be divided into N equal groups, is only approximately true. So even without any disruptions from item 1, the processing times of the threads may not be equal.
3) We use GROMACS in a Symmetric MultiProcessing environment. GROMACS is also capable of running on an asymmetric cluster of nodes, where the processing speeds of the various CPUs may be unequal. That doesn't apply to us, but it is a third possible source of imbalance.
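Items 1 and 2 end up in the same place: every step takes as long as its slowest thread. A minimal Python sketch of that model, assuming a thread's step time is just the sum of its atoms' costs and that idle threads simply wait at the end of the step. The helpers `static_split` and `imbalance_stats` and the 10% cost figure are hypothetical illustrations, not GROMACS code. GROMACS's own "Part of the total run time spent waiting" figure is presumably measured against the whole run time, so do not expect this sketch to reproduce the logged percentages.

```python
import math


def static_split(atom_costs, n_threads):
    """Per-thread compute time when atoms are split into n equal-sized groups.

    Assumes len(atom_costs) is divisible by n_threads; each cost is the
    (hypothetical) time needed to compute the forces on one atom.
    """
    size = len(atom_costs) // n_threads
    return [math.fsum(atom_costs[i * size:(i + 1) * size]) for i in range(n_threads)]


def imbalance_stats(thread_times):
    """Return (imbalance, waiting_fraction) for one step.

    Simplified model: the step ends when the slowest thread finishes, so each
    other thread idles for (slowest - t_i).
    """
    slowest = max(thread_times)
    mean = math.fsum(thread_times) / len(thread_times)
    imbalance = slowest / mean - 1     # slowest thread vs. the average
    waiting = 1 - mean / slowest       # share of total thread-time spent idle
    return imbalance, waiting


# Item 2 in miniature: 4000 atoms, 4 threads, equal atom counts, but the
# third group (hypothetically a crowded region) costs 10% more per atom.
costs = [1.0] * 2000 + [1.1] * 1000 + [1.0] * 1000
times = static_split(costs, 4)
imb, wait = imbalance_stats(times)
print(f"imbalance {imb:.1%}, waiting {wait:.1%}")
```

Here the thread times are 1000, 1000, 1100 and 1000, so the sketch prints "imbalance 7.3%, waiting 6.8%": one slightly heavier group is enough to make the other three threads idle.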
When the workload is locked to specific processors, items 2 and 3 are fairly stable. If one group of atoms consistently needs more processor time than another group, dynamic load balancing can make adjustments by moving a few atoms from one thread to another. (The same applies if one CPU is consistently slower than another, but remember that item 3 doesn't apply to us.)
Now let's consider item 1. The amount of CPU time used by other processes is highly variable from step to step (except in an idle machine). In most cases, the processor that is preempted to do that non-FAH work is chosen by the OS and that will vary. GROMACS cannot find a consistent, discernible pattern of one CPU being slower than another by precisely the same amount at every time-step. Dynamic Load Balancing can decide that one CPU is slow ... reduce the work in that thread ... and then find that on the next step, that processor is no longer the slow one. It's really unlikely that DLB can do anything useful about item 1 because conditions vary so much from time-step to time-step.
In our environment, DLB does help with item 2. While folding proceeds, atoms move, both randomly over short distances and systematically over longer distances. During the run, as the systematic motions proceed, atoms that were formerly far from other atoms may get closer to more atoms or they may move away from other atoms. The calculation of forces on an individual atom grows more complicated when there are more atoms nearby and becomes less complicated when there are fewer close neighbors. Thus the time spent calculating each group of atoms can change over the course of many steps. Changes to the calculations that are caused by changes in protein shape can be dynamically balanced.
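The argument about what DLB can and cannot fix can be illustrated with a toy simulation. This Python sketch rests on loudly simplified assumptions: work is a continuous quantity split among 4 threads, "DLB" re-splits it after every step in proportion to each thread's speed on the previous step, and the costs, the 20-unit preemption and all names are made up for illustration. Real GROMACS DLB adjusts domain-decomposition cell sizes rather than handing over individual atoms, and it is not claimed to use this rule.

```python
import random

N_THREADS = 4
TOTAL_WORK = 400.0  # "atoms", treated as continuously divisible
STEPS = 1000


def average_imbalance(cost_per_atom, delays_for_step, use_dlb, steps=STEPS, seed=1):
    """Mean (slowest / average - 1) over `steps` simulated time steps.

    cost_per_atom(step, thread) -> cost of one atom on that thread this step
    delays_for_step(rng)        -> per-thread time lost to other processes
    With use_dlb=True, work is re-split after every step in proportion to each
    thread's measured speed (work done / time taken) on the previous step.
    """
    rng = random.Random(seed)
    shares = [TOTAL_WORK / N_THREADS] * N_THREADS
    total = 0.0
    for step in range(steps):
        delays = delays_for_step(rng)
        times = [shares[i] * cost_per_atom(step, i) + delays[i] for i in range(N_THREADS)]
        total += max(times) / (sum(times) / N_THREADS) - 1
        if use_dlb:
            speeds = [shares[i] / times[i] for i in range(N_THREADS)]
            total_speed = sum(speeds)
            shares = [TOTAL_WORK * v / total_speed for v in speeds]
    return total / steps


def stable(step, i):
    # Item 2: thread 2's atoms stay 20% more expensive for the whole run.
    return 1.2 if i == 2 else 1.0


def drifting(step, i):
    # Item 2 as the protein changes shape: thread 2's cost drifts from 1.0 to 1.3.
    return 1.0 + 0.3 * step / STEPS if i == 2 else 1.0


def uniform(step, i):
    return 1.0


def no_delays(rng):
    return [0.0] * N_THREADS


def random_preemption(rng):
    # Item 1: each step the OS preempts one randomly chosen thread for 20 units.
    delays = [0.0] * N_THREADS
    delays[rng.randrange(N_THREADS)] = 20.0
    return delays


scenarios = {
    "stable cost difference": (stable, no_delays),
    "drifting cost difference": (drifting, no_delays),
    "random preemption": (uniform, random_preemption),
}
for name, (cost, delays) in scenarios.items():
    static = average_imbalance(cost, delays, use_dlb=False)
    dlb = average_imbalance(cost, delays, use_dlb=True)
    print(f"{name:26s} static {static:6.2%}   DLB {dlb:6.2%}")
```

In this toy model DLB removes the stable imbalance after one step and follows the slow drift with only a one-step lag, but against random preemption it does no better than the static split (about 14.3% either way), because each correction is aimed at whichever thread happened to be slow on the previous step.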
Posting FAH's log:
How to provide enough info to get helpful support.
Re: Dynamic Load Balancing
@bruce, thanks for the detailed explanation.
Grandpa_01 wrote: Any Generic-ck distro of Linux uses the BFS scheduler. You can find Musky's guide for installing it over at the [H] http://hardforum.com/showthread.php?t=1601608
If you have the information, how much time (in percent) does the BFS scheduler save compared with running without it? I'm running Ubuntu 10.10 64-bit now, and I can try it if it has a worthwhile effect.
@brutis, some time ago I asked tear about Kraken. Kraken has been shown to be effective only in multiprocessor configurations. I tried it on my Q9400: no effect.
Re: Dynamic Load Balancing
I have seen results of up to 2 min less per frame on 6903 and 6904 WUs. That was on early versions of 10.10, .22 I believe. On the latest version, .35, I really haven't noticed that large a difference, but when you watch the CPU usage in the monitor you can see the difference. If you search the [H] forums, there is some documentation from a while back on the results of testing BFS vs. the default scheduler. And yes, Kraken is for MP rigs and BFS is for single-processor rigs.
Re: Dynamic Load Balancing
@bruce, thanks for the detailed explanation of DLB.
I'll try a ck kernel and see if that improves matters.
Re: Dynamic Load Balancing
Grandpa_01 wrote: I have seen results of up to 2 min. less per frame on 6903 and 6904 Wu's. That was on early versions of 10.10 I believe .22 on the latest version .35 I really have not noticed that large of a difference but when you watch the cpu usage in the monitor you can see the difference. If you search the [H] forums there is some documentation from a while back on the results of testing BFS vs the default scheduler. And yes the Kraken is for MP rigs and BFS is for single P rigs.
Sorry for straying from the topic of the thread.
I installed BFS with kernel Linux 2.6.35-30-generic-ck.
In System Monitor, %CPU is now ~400%; before, it was 392-396%. But TPF has not changed.
I suppose BFS reduces TPF mainly on processors with more cores (6/8/12...) or on processors with HT?
I tried it with a Q9400 (6 MB L2 cache). I had to try. Thank you, Grandpa_01.