Running FAHClient on cloud resources on temporary VMs

Moderators: Site Moderators, FAHC Science Team

Gavelock
Posts: 4
Joined: Thu May 07, 2020 9:08 pm

Running FAHClient on cloud resources on temporary VMs

Post by Gavelock »

Hello!

I work in an organization that has a private cloud for its business needs.
The problem is that the cloud is never 100% busy; there are always some resources available: 40-100 CPU cores.
We have an idea to use these "free" resources for Folding@home.
The operating system on the VMs is CentOS 7.
Unfortunately, we can only allow a VM to run continuously for up to a week.
What we can do is create a new VM, complete exactly one work unit, and delete the VM. If free resources are still available, repeat.
What I need right now is a command like:
FAHClient --amount-of-workunits=1 --user=username --team=12345 --passkey=***** --gpu=false --cpu-usage=100

This command should request a work unit and, when that unit is done, exit with code 0.
I did not find anything like that in the FAHClient help. I tried the cycles option, but it does something different.

Basically, there are two questions:
1. Is this cloud use case useful for the Folding@home project? (VMs are created in the cloud and removed after one work unit is done.)
2. If the answer to the first question is yes, how can we limit the number of work units FAHClient processes during one run?
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Post by PantherX »

Welcome to the F@H Forum, Gavelock,

This is the option that should work for you:

Code:

  max-units <integer=0>
    Process at most this number of units, then pause.

You can experiment with the value, say 3 WUs, which could potentially be finished within one week, assuming the VM runs 24/7 and has multiple CPU cores to fold with.

Since you're using company-owned hardware, please ensure that you have permission (usually written) from the person authorized to make such decisions (internal IT, CTO, etc.). Folding on CPUs is valuable and important scientific work, so whatever your business can contribute would be appreciated :)
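
As a rough way to pick a starting value, one could divide the time a VM is allowed to live by the time a typical WU takes on that VM. The sketch below does only that arithmetic. The function name `estimate_max_units` and the example figures are illustrative assumptions, and the reading of `0` as "no limit" is a guess based on the default shown in the option help above.

```bash
#!/bin/bash
# Rough sketch: how many WUs fit into a VM's allowed lifetime?
# Both inputs are whole hours; the per-WU time is something you would measure yourself.
estimate_max_units() {
    local lifetime_hours="$1"
    local hours_per_wu="$2"
    if [ "$hours_per_wu" -le 0 ]; then
        echo "hours_per_wu must be positive" >&2
        return 1
    fi
    # Integer division rounds down, so a partly finished WU is not counted.
    local units=$(( lifetime_hours / hours_per_wu ))
    # Clamp to 1: 0 is the option's default and is assumed here to mean "no limit".
    if [ "$units" -lt 1 ]; then
        units=1
    fi
    echo "$units"
}
```

For a one-week VM (168 hours) and a WU that takes about 48 hours, `estimate_max_units 168 48` prints 3.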
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
Gavelock

Post by Gavelock »

Hi PantherX,

Thank you for your answer!
I have started a test run with the max-units parameter.
The company is interested in helping COVID-19 research projects. Right now it is just a request to study whether we can participate in F@H. Once that is done, I hope we will run real campaigns.
MeeLee
Posts: 1339
Joined: Tue Feb 19, 2019 10:16 pm

Post by MeeLee »

I think it would be better to run a script on your servers that pauses the VMs once your free resources drop below a set number of threads.
Run one major VM running FAH on multiple cores, plus a few smaller ones (say 4 to 8 cores each) that you can easily pause.

That way you don't have to set up and reload each VM.
As long as the (average) WU is able to continue within ~8-14 hours (on average hardware: a ~3 GHz quad core or better), it should make the deadline.
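
A minimal sketch of the decision part of such a script is below. Everything in it is an assumption for illustration: the function names, the idea of estimating free threads from the 1-minute load average in `/proc/loadavg`, and the threshold. It only prints "pause" or "resume"; the actual pausing would be done with whatever suspend command your platform offers. Note that the load average also includes the load produced by the folding VMs themselves, so a real script would have to discount that.

```bash
#!/bin/bash
# Sketch: decide whether folding VMs should be paused or resumed,
# based on how many CPU threads are currently free on the host.

# Estimate free threads as total threads minus the 1-minute load average (rounded up).
free_threads() {
    local total load free
    total="$(nproc)"
    load="$(awk '{ l = int($1); if ($1 > l) l++; print l }' /proc/loadavg)"
    free=$(( total - load ))
    if [ "$free" -lt 0 ]; then
        free=0
    fi
    echo "$free"
}

# Print "pause" when fewer than $2 threads are free, otherwise "resume".
decide_action() {
    local free="$1"
    local min_free="$2"
    if [ "$free" -lt "$min_free" ]; then
        echo "pause"
    else
        echo "resume"
    fi
}
```

Run from cron or a loop, something like `decide_action "$(free_threads)" 8` would give the action to apply to the smaller VMs.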
[WHGT]Cyberman
Posts: 82
Joined: Sat Dec 17, 2011 4:22 pm
Hardware configuration: none anymore, FAH doesn't want it, it seems.

Post by [WHGT]Cyberman »

Instead of a full VM just for FAH, you could also run FAH inside a Docker container that runs inside any other VM.
Probably somewhat less efficient, but much less work to set up every time.
There are several Dockerfiles on Docker Hub to use as inspiration.
It seems I can't write a signature that both conveys my feelings and doesn't look like a miserable trolling attempt...
PantherX

Post by PantherX »

If you're planning on using Docker, have a look here (https://github.com/FoldingAtHome/containers). If you're planning to use VMware, have a look here (https://flings.vmware.com/vmware-applia ... lding-home). Please note that the VMware appliance isn't officially supported.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Post by bruce »

Catalina588
Posts: 41
Joined: Thu Oct 09, 2008 8:59 pm

Post by Catalina588 »

See also this issue about managing preemptible VMs: https://github.com/FoldingAtHome/fah-issues/issues/1458
PeterGarlic
Posts: 29
Joined: Fri May 08, 2020 6:12 pm

Post by PeterGarlic »

Hi Gavelock,
I have a similar situation and would like to ask what configuration you are using for your VMs (vCPU, RAM, disk).
Like you, we are testing a private cloud deployment (KVM clusters), and our next step is to find the best VM configuration for maximum performance.
Thanks in advance
PantherX

Post by PantherX »

The most stable CPU counts are 2, 4, 8, 12, and 16. RAM should be what the OS needs plus a bit more, since F@H isn't RAM-intensive when folding on CPU only. For storage, a faster disk means less time writing checkpoints, but F@H isn't disk-heavy: disk activity occurs only when reading/writing checkpoints and when packing/unpacking WUs to be sent/received.
Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon [email protected], 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon [email protected], 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: [email protected], 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Post by Neil-B »

24 and 32 are also pretty rock solid, so if your VMs can scale to that, they will complete WUs much faster: depending on the underlying hardware and the specific project, probably in the 45-minute to 4-hour window.
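
If the VM size is chosen by a script, the core counts mentioned in this thread could be applied roughly like this. It is only a sketch: the function name `pick_cpu_count` is made up, and the list is simply the values reported as stable above, not an official list.

```bash
#!/bin/bash
# Sketch: choose the largest core count from the values reported as stable
# in this thread (2, 4, 8, 12, 16, 24, 32) that fits into the free cores.
pick_cpu_count() {
    local free_cores="$1"
    local best=0
    local n
    for n in 2 4 8 12 16 24 32; do
        if [ "$n" -le "$free_cores" ]; then
            best="$n"
        fi
    done
    # best stays 0 when fewer than 2 cores are free.
    echo "$best"
}
```

With 20 free cores, `pick_cpu_count 20` prints 16; with 100 free cores it prints 32.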
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

(Green/Bold = Active)
Gavelock

Post by Gavelock »

Hello PeterGarlic,

We are using 1 CPU core, 4 GB of RAM, a 15 GB disk, and CentOS 7 for that task. That was done so that even the smallest pieces of free CPU resources can be filled.

Kind regards,
Gavelock

Post by Gavelock »

Hello PantherX,

Thanks again for your help with FAHClient. I would like to share some information about our solution.
At the institute (Joint Institute for Nuclear Research) we have a cloud. Some other members of our institute also have clouds. These clouds are partially used to run batch jobs on either dedicated or free resources. A batch job here is a shell script to be executed. All the clouds are joined together by DIRAC Interware, an open-source platform used in science to organize distributed heterogeneous systems and run High Throughput Computing workloads on them. When jobs for cloud resources appear, DIRAC spawns VMs on the available clouds. After contextualization, each VM asks the central DIRAC service for one job, and DIRAC sends it one job from the queue. When the job is done, the VM asks DIRAC to delete it. If there are still jobs in the queue, DIRAC will try to spawn new VMs on the freed resources.

So the task was to create a shell script that runs FAHClient as a job and finishes once the FAH work unit is completed. The shell script for the job is super simple:

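Code:

#!/bin/bash
# Usage: <script> <fah-username> <fah-passkey>
set -x      # trace every command in the job log (note: this also logs the passkey)
echo "$1"   # FAH username
echo "$2"   # FAH passkey
# Fold exactly one work unit on the CPU, then exit so the job can finish.
FAHClient --cause=covid-19 --user="$1" --team=265602 --passkey="$2" --gpu=false --cycles=-1 --cpu-usage=100 --exit-when-done --max-units=1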
Another part is a program that sends jobs to the DIRAC job queue, but it is closely tied to the DIRAC API, so I will not post it here. This program checks the status of the queues and resources and sends FAH jobs to the queue. Each job carries parameters that depend on the resource it will run on. The parameters contain the FAH username and FAH passkey, which lets us keep track of each cloud in the joint infrastructure.
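
For anyone adapting the job script, a slightly more defensive variant might look like the sketch below. It is not the script we run. It uses the same FAHClient flags as above, but checks that both parameters were passed, avoids printing the passkey, and returns FAHClient's exit status to the batch system. The function name `run_one_wu` and the `FAH_BIN` override are hypothetical, added only so the function can be tried without FAHClient installed.

```bash
#!/bin/bash
# Sketch of a more defensive job script (same FAHClient flags as the script above).
# FAH_BIN is a hypothetical override for trying the function without FAHClient installed.
FAH_BIN="${FAH_BIN:-FAHClient}"

run_one_wu() {
    # Expect exactly two non-empty arguments: FAH username and FAH passkey.
    if [ "$#" -ne 2 ] || [ -z "$1" ] || [ -z "$2" ]; then
        echo "usage: run_one_wu <fah-username> <fah-passkey>" >&2
        return 2
    fi
    # The passkey is deliberately not echoed and tracing is left off.
    # The client's exit status becomes the function's return value.
    "$FAH_BIN" --cause=covid-19 --user="$1" --team=265602 --passkey="$2" \
        --gpu=false --cycles=-1 --cpu-usage=100 --exit-when-done --max-units=1
}
```

The job script would then end with `run_one_wu "$1" "$2"`, so a failed client run shows up as a failed job.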

Our team is Joint Institute for Nuclear Research, ID: 265602. It has been 3 months since the start of this activity. The team is ranked around 7000 and has received 23M credits (https://stats.foldingathome.org/team/265602). We are happy that idle resources are now being used for a good cause.

Thank you, PantherX, for your help!
Thanks, everybody, for the reactions in this thread! It was a surprise when I came here today: I found very interesting suggestions and ideas.

Kind regards,
Igor Pelevanyuk
gunnarre
Posts: 559
Joined: Sun May 24, 2020 7:23 pm
Location: Norway

Post by gunnarre »

Thank you for helping out with the research.

The advantage of running one WU at a time as you do, compared to running it as an interruptible instance, is that you are more likely to complete the work within the timeout. So it is preferable if the alternative is to have the instance paused for days, which might cause the effort to be wasted.

On a very heterogeneous platform, doing it as you have, with just one CPU thread per instance, may indeed be the way to go. But note that CPU folding takes good advantage of multi-threading, so for most VM hosts it might be better to run just one or a few instances with, say, 8 or 16 threads at low priority to use idle resources.
Online: GTX 1660 Super + occasional CPU folding in the cold.
Offline: Radeon HD 7770, GTX 1050 Ti 4G OC, RX580
Neil-B

Post by Neil-B »

... so to consider ... whilst a multi-core VM will tie up more cores, it will do so for a shorter time ... even just 2 or 4 cores will significantly assist the science by returning WUs quicker than 2x or 4x single-core VMs