fahclient 7.6.x ignore <power value="*"/>


PeterGarlic
Posts: 29
Joined: Fri May 08, 2020 6:12 pm

Re: fahclient 7.6.x ignore <power value="*"/>

Post by PeterGarlic »

PantherX wrote:
PeterGarlic wrote:...

Code:

<config>
  ...
  <power v='light'/>
  ...
  <slot id='0' type='CPU'/>
</config>

I set the basic options: use 1 of 4 CPUs and power light. After I start the client, top shows me this (pressing 1 to see all CPUs):

Code:

top - 12:08:40 up  1:26,  1 user,  load average: 0,94, 0,46, 0,23
Tasks: 261 total,   2 running, 200 sleeping,   0 stopped,  10 zombie
%Cpu0  :  0,3 us,  0,0 sy,  0,0 ni, 99,7 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu1  :  0,0 us,  0,0 sy,100,0 ni,  0,0 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu2  :  0,3 us,  0,3 sy,  0,0 ni, 99,3 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu3  :  0,0 us,  0,0 sy,  0,0 ni,100,0 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
KiB Mem :  4030252 total,  2115832 free,  1064676 used,   849744 buff/cache
KiB Swap:   999420 total,   999420 free,        0 used.  2728880 avail Mem

One CPU is at 100%, ignoring the <power v='light'/> directive...

The expected behavior of having light on 4 CPUs is 2 CPUs being used. In your example, only 1 CPU is being used and there's an edge case. If the WU was downloaded before the CPU slot was configured, it will only run on a single CPU thread regardless of the power value and the number of CPUs you have. Once that WU has completed, it will download the correct WU depending on what settings you have used. Checking the log will provide additional details to explain this situation.

Generally speaking, you can decrease the number of CPUs assigned to a WU, but you can't increase it beyond the value the WU had when it was first assigned.
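
The rule described above can be written down as a one-line model. This is only a sketch of the behaviour as explained in this thread, not FAHClient's actual logic; the function name and both parameter names are made up for illustration.

```python
def effective_threads(requested: int, assigned_at_download: int) -> int:
    """Model of the rule described above: a WU can run on fewer CPUs than it
    was assigned with, but never on more.

    Hypothetical helper for illustration only, not FAHClient code.
    """
    if requested < 1 or assigned_at_download < 1:
        raise ValueError("thread counts must be at least 1")
    return min(requested, assigned_at_download)


# A WU downloaded while the slot was limited to 1 CPU stays on 1 thread,
# even if the slot now asks for 2.
print(effective_threads(requested=2, assigned_at_download=1))
# A WU that was assigned with 4 CPUs can be reduced to 2.
print(effective_threads(requested=2, assigned_at_download=4))
```

If that model is right, the single busy core in the top output is what you would expect until the current WU finishes.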

Got it!!
That sentence solves a problem I was not able to pin down in our sessions: we didn't consider the CPU-WU relation.
Now our lab tests make more sense and we can move on.
Thx

Post by PeterGarlic »

bruce wrote:FAH doesn't specify which CPU it will use, only how much ... it's up to the scheduler in the OS to pick the available CPU. From the perspective of the program, 100% of 1 CPU is the same as 50% of two CPUs or 33% of three CPUs.
From the point of view of a physical multicore CPU, I think this makes a big difference!
Some cores will overheat and some will sleep, producing an unbalanced thermal distribution.

Am I wrong?
NRT_AntiKytherA
Posts: 107
Joined: Sun May 10, 2020 11:50 pm

Post by NRT_AntiKytherA »

Apart from heat spreaders, which are designed to negate any such hot spots, the operating system's scheduler should also distribute workloads evenly across the available CPU cores. Even if you have only 2 physical cores actually folding, they will not be adjacent to each other. Any other processes running on the host system will also even out temperatures a little. There's also another layer of protection via automatic thermal throttling at EFI or legacy BIOS level to prevent damage if your system's cooling solution isn't good enough.
Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Post by Neil-B »

PeterGarlic wrote:Am I wrong?

Possibly not, but ime I can't recall ever seeing all cores/threads of any CPU synchronously loading up and down through the load range … there are always some threads under load while others are idle … in fact I have probably seen the opposite most times, in that separate cores/threads get loaded up and spike, sometimes singly or in clusters/groups … I have seen this whilst watching/nursemaiding a variety of machine learning tools on both Windows and Linux platforms.

Now I have no clue as to whether old school HPC kit (the nicely tuned, really cool hand-built stuff, not the more modern large clusters of utility compute) is better at loading its processing in a balanced manner, though I guess it might be … or might not, as even at their internal speeds farming out one "chunk" of work across say 10,000 processing cores will have an overhead, and just dumping it on one or a few may well be the optimised approach?

… and I'm fairly sure the CPU manufacturers will have designed safeguards into their CPUs to allow for the way they will tend to be used by the OSs … but there again …

So wrong? Probably not, there will be "hotspotting" of heat generation … but is this the expected behaviour, the nature of the beast? Probably :)
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

(Green/Bold = Active)

Post by NRT_AntiKytherA »

Neil-B wrote:
PeterGarlic wrote:Am I wrong?
Possibly not but ime I can't recall ever seeing all cores/threads of any CPU synchronously loading up and down through the load range …
It is probably OS dependent and to some extent hardware architecture dependent, but this is mine on medium folding power:

Image

Post by Neil-B »

… which to me doesn't look like synchronous loading of all cores?

Now if all 12 threads were at the same levels with the same spikes then I'd admit the OSs do a good job at balanced distribution :) … though it could be that the lower loaded threads are actually cores nearer the middle of the die and that the OS is very cleverly balancing heat production over the whole CPU … it isn't just the load as per "Task Miss-Manager" (I tend to ignore what it purports to say) but also the individual clocks on each core, the utilisation on each thread and the individual core temps, which in Windows I use HWMonitor to watch (other tools exist and may/may not be better/worse) … Linux has always been harder for me to monitor (but I avoid it where possible as I know I lack the skills needed there) … lesser loaded threads (not 100%) on a core have a bit of correlation with a higher clock on that core (noted, but not investigated/proven) … some workloads load much more evenly (OK, I'll admit it) than others. I guess there is only so much the OS can do to balance a horribly single-threaded bit of programming; software designed to be multi-threaded will most likely make it easier for the OS to balance.
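
On the Linux side, per-core load can be watched without any extra tools by sampling /proc/stat twice and comparing the counters. The following is a minimal sketch, assuming the usual column layout of /proc/stat (user, nice, system, idle, iowait, irq, softirq, steal, ...); the function names are my own and nothing here is specific to FAH.

```python
import time


def parse_proc_stat(text: str) -> dict:
    """Return {cpu_name: (busy_ticks, total_ticks)} from /proc/stat content."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-core lines look like "cpu0 ..."; the aggregate line is just "cpu".
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        ticks = [int(x) for x in fields[1:]]
        # Columns: user nice system idle iowait irq softirq steal guest guest_nice
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)
        # guest and guest_nice are already counted in user and nice.
        total = sum(ticks[:8])
        result[fields[0]] = (total - idle, total)
    return result


def per_core_load(before: dict, after: dict) -> dict:
    """Percentage of non-idle time per core between two samples."""
    load = {}
    for cpu, (busy_1, total_1) in after.items():
        busy_0, total_0 = before.get(cpu, (0, 0))
        elapsed = total_1 - total_0
        load[cpu] = 100.0 * (busy_1 - busy_0) / elapsed if elapsed > 0 else 0.0
    return load


def sample(interval: float = 1.0, path: str = "/proc/stat") -> dict:
    """Take two snapshots `interval` seconds apart (Linux only)."""
    with open(path) as f:
        first = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_proc_stat(f.read())
    return per_core_load(first, second)
```

Calling sample() in a loop and printing the result gives roughly the same per-core view as pressing 1 in top, which makes it easy to log whether the load really moves between cores over time.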

First time for everything, so I'll try to post a screen capture of my 56-thread system running a 24/56 folding slot. I see this type of thing whatever I am doing; whilst there may be some general load balancing effect, it can in no way be called uniform :)

https://imgur.com/gallery/yNAnMvB

Post by NRT_AntiKytherA »

That was of course a snapshot image, which can never be truly reflective. The core loads shift around dynamically, so while it's not 100% synchronous the loading is near enough equal across all 12 threads at any given time. I've also got the zip version of HWMonitor and this seems to ring true in that application too. None stay at 100% for more than a few milliseconds and no threads seem constantly loaded more than the others in that regard either. The distribution seems to be uniform across the die; the higher loaded cores are well spread out. Threads #2 and #3, #6 and #7, and #9 and #10 have marginally higher load values than the others one minute and then it switches to other pairs the next, so specific cores are being loaded dynamically by the OS scheduler. Thermal management? Maybe, but the max temp for the processor is 95 Celsius and mine is peaking around 70 under load with the stock AMD cooler.

Post by Neil-B »

Yup, I agree … things are reasonably distributed over time, but in a very ephemeral, unbalanced way … it is not uniform loading: all threads don't run at the same percentage at the same time, which appeared to be what the OP was after?

By the way, how do you get your screen capture to display in the forum - I have only found guides for posting images as links :(

Post by NRT_AntiKytherA »

I think so too, but that might not be possible to achieve with either Windows or Linux.

I posted them as

Code:

[IMG]urlgoeshere[/img]

They are stored on my Dropbox.

Post by PeterGarlic »

Good. Thanks for all the answers.
My question came from analysing a running 4-core Linux VM on a KVM virtualization server: once started, the WU process remains attached to the vCPU it was assigned.

Maybe this is related to the virtualization platform, but to be sure I have scheduled a test session on physical hardware to see how the scheduler behaves.

I will report the test results.
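
For a test like this, one way to see whether a process really stays on one CPU is to read the "processor" field (field 39, the CPU the task last ran on) from /proc/<pid>/stat at intervals. A minimal sketch, assuming a Linux guest or host; the function names are my own and the PID of the folding core has to be looked up separately.

```python
def last_cpu_from_stat(stat_line: str) -> int:
    """Extract field 39 ("processor") from the contents of /proc/<pid>/stat."""
    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces or ")", so split on the last ")" rather than on whitespace.
    after_comm = stat_line.rsplit(")", 1)[1].split()
    # after_comm[0] is field 3 (state), so field 39 sits at index 36.
    return int(after_comm[36])


def last_cpu_of(pid: int) -> int:
    """Read the CPU number a process last ran on (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return last_cpu_from_stat(f.read())
```

Polling last_cpu_of(pid) once a second and logging the values should show whether the number ever changes. For a multi-threaded core, the same file exists per thread under /proc/<pid>/task/<tid>/stat.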

Post by Neil-B »

iirc it is possible to assign a task within an OS to a specific core/thread if one wants to (how, I haven't a clue tbh), but it might be that within the Linux VM that type of assignment is actually the norm?

… or could it be that it is reported as such by the internal VM monitors, but that outside the VM, at the physical hardware layer, it is better balanced across the real physical cores/threads? … I mention the latter as I seem to recall watching an 8 vCPU Hyper-V VM running flat out (as per the resource monitor in the VM) but with a much wider spread/utilisation of cores/threads in the host operating system. It is a while since I ran that type of configuration, though, so I may be misremembering.

Post by PeterGarlic »

Neil-B wrote:iirc it is possible to assign a task within an OS to a specific core/thread if one wants to (how, I haven't a clue tbh), but it might be that within the Linux VM that type of assignment is actually the norm?

… or could it be that it is reported as such by the internal VM monitors, but that outside the VM, at the physical hardware layer, it is better balanced across the real physical cores/threads? … I mention the latter as I seem to recall watching an 8 vCPU Hyper-V VM running flat out (as per the resource monitor in the VM) but with a much wider spread/utilisation of cores/threads in the host operating system. It is a while since I ran that type of configuration, though, so I may be misremembering.
1 - I think it is normally better to leave this to the internal scheduler, but process affinity can be assigned to cores/threads using the taskset command or with a slightly more complex kernel masking configuration (there are many posts on the internet about it). In my case I will never use this approach, because I have to deploy many instances to cluster groups and it would make no sense. There are other cluster-related configurations that are more appropriate for that.

2 - The second point is really interesting: the VM instance is an abstraction over the physical layer, and what top/htop shows can differ from the real hardware status. Unfortunately I don't have a free cluster at the moment and I cannot check what is happening inside the hypervisor nodes, because there are too many instances running and it would be a complicated and time-consuming task, but I will check this the next time we rebuild the lab cluster.

Thanks Neil
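
For anyone who wants to try the affinity approach from point 1 anyway, here is a minimal sketch of what pinning looks like from Python on Linux, using os.sched_setaffinity and os.sched_getaffinity from the standard library. It pins the script's own process for demonstration; the helper name pin_to_cpus is my own, and pinning a folding core would need its real PID and suitable permissions.

```python
import os


def pin_to_cpus(pid: int, cpus: set) -> set:
    """Restrict `pid` (0 means the calling process) to `cpus` and return
    the affinity set the kernel reports afterwards. Linux only."""
    os.sched_setaffinity(pid, cpus)
    return os.sched_getaffinity(pid)


if hasattr(os, "sched_setaffinity"):
    original = os.sched_getaffinity(0)   # CPUs this process may use right now
    one_cpu = {min(original)}            # pick one of them
    print("pinned to:", pin_to_cpus(0, one_cpu))
    os.sched_setaffinity(0, original)    # undo the pinning
else:
    print("sched_setaffinity is not available on this platform")
```

This is roughly the same thing taskset does from the shell. Without pinning, the scheduler is free to move the process between any of the allowed CPUs.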

Post by PeterGarlic »

Many interesting topics have been discussed in this thread and, after another day of testing, the situation seems to be less complicated.

Let me summarize what I have understood about CPU configuration for FAHClient; please correct me if I'm wrong.

Assuming a 4-core CPU, there are several ways to configure it in FAHClient:

Single slot assignment
The first way is to use each core/CPU as its own slot. This is not the best approach because each configured slot will be used at 100% and each one will be slower than the sum of available processing power. It may be useful for some special configurations (for example, fewer slots can be assigned than the available CPUs), but the power control is unusable and I'm not sure about the activity of the OS scheduler.

Code:

<!-- Folding Slots -->
  <slot id='0' type='CPU'/>
  <slot id='1' type='CPU'/>
  <slot id='2' type='CPU'/>
  <slot id='3' type='CPU'/>
Grouping CPU in one slot:
With this configuration 4 CPUs will be available and the power control (thanks PantherX) will use:
light: 1 core
medium: 3 cores
full: 4 cores

Code:

<!-- Folding Slots -->
<slot id='0' type='CPU'>
   <cpus v='4'/>
</slot>
Grouping CPU in more slots
With this configuration 2 groups of 2 CPUs will be available and the power control will use:
light: 1 core
medium: 1 core
full: 2 cores

Code:

<!-- Folding Slots -->
<slot id='0' type='CPU'>
  <cpus v='2'/>
</slot>
<slot id='1' type='CPU'>
  <cpus v='2'/>
</slot>
Setting -1 in FAHControl
If you set -1 for the CPU slot, the system goes into automatic mode and your slot configuration will be something like:

Code:

<!-- Folding Slots -->
  <slot id='12345678901234567890' type='CPU'/>
where the slot id is a "pseudo-random" 20-digit number assigned to the user (?)/team (?), but it is always the same for all the instances running with the same account. Once again you will have full control of power, with the same assignment seen under "Grouping CPU in one slot".
Note: I'm not sure about that, but our lab VMs are working this way (does anybody have an explanation for that?)

Is that all right, or do I have to study more?
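
When comparing layouts like the ones above across many VMs, it can help to read the slot configuration out of the config programmatically. The following is a minimal sketch using Python's xml.etree.ElementTree; it assumes only the element and attribute names that appear in the fragments in this thread (config, power, slot, cpus, v), and the SAMPLE string is an illustrative fragment, not a complete config file.

```python
import xml.etree.ElementTree as ET

# Fragment assembled from the layouts shown in this thread;
# a real config.xml contains more elements than this.
SAMPLE = """
<config>
  <power v='light'/>
  <slot id='0' type='CPU'>
    <cpus v='2'/>
  </slot>
  <slot id='1' type='CPU'>
    <cpus v='2'/>
  </slot>
</config>
"""


def summarize(config_xml: str) -> dict:
    """Return the power setting and the CPU count configured per slot."""
    root = ET.fromstring(config_xml)
    power = root.find("power")
    slots = []
    for slot in root.findall("slot"):
        cpus = slot.find("cpus")
        slots.append({
            "id": slot.get("id"),
            "type": slot.get("type"),
            # No <cpus> element: the count is not set explicitly in the file.
            "cpus": int(cpus.get("v")) if cpus is not None else None,
        })
    return {
        "power": power.get("v") if power is not None else None,
        "slots": slots,
    }


print(summarize(SAMPLE))
```

Running this against the config of each cloned VM would also show quickly whether they all share the same slot id.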

Post by Neil-B »

From the "getting science done" perspective, one slot with 4 vCPUs (on full power if possible) is the "common knowledge" preference: even if it is not as efficient, the individual WUs actually get turned round quickest.

I believe that when cloning VMs one shouldn't connect the Gold VM to the internet after installing the FAH software and before taking the clone copy … this may be part of what you are seeing … that way the VM gets a chance to set its own ID.

Post by PeterGarlic »

Neil-B wrote:From the "getting science done" perspective, one slot with 4 vCPUs (on full power if possible) is the "common knowledge" preference: even if it is not as efficient, the individual WUs actually get turned round quickest.

I believe that when cloning VMs one shouldn't connect the Gold VM to the internet after installing the FAH software and before taking the clone copy … this may be part of what you are seeing … that way the VM gets a chance to set its own ID.
1 - Could you explain your first sentence in a bit more depth?
2 - By "that way the VM gets a chance to set its own ID", do you mean what I called the "pseudo-random" 20-digit number?