FAH runs fine on one server, dies on another.
Posted: Thu May 06, 2021 11:54 pm
I've just configured two servers to run FAH. They're in remote data centres, so all that's running on them is FAHClient, and I monitor it from my desktop.
The older one (8 cores, slower everything) is running fine at 17.5K PPD.
The newer one (12 cores, faster bus/RAM/CPU, etc.) starts F@h, reports about 2.3K PPD, and then an hour or so later just stops. I tried turning the number of cores down, with the same result.
All other processes are stable.
Snippets from the top and bottom of the logs:
Code:
*********************** Log Started 2021-05-06T13:42:36Z ***********************
13:42:36:************************* Folding@home Client *************************
13:42:36: Website: http://folding.stanford.edu/
13:42:36: Copyright: (c) 2009-2014 Stanford University
13:42:36: Author: Joseph Coffland <[email protected]>
13:42:36: Args: --child --lifeline 3680 /etc/fahclient/config.xml --run-as
13:42:36: fahclient --pid-file=/var/run/fahclient.pid --daemon
13:42:36: Config: /etc/fahclient/config.xml
13:42:36:******************************** Build ********************************
13:42:36: Version: 7.4.4
13:42:36: Date: Mar 4 2014
13:42:36: Time: 12:01:17
13:42:36: SVN Rev: 4130
13:42:36: Branch: fah/trunk/client
13:42:36: Compiler: GNU 4.1.2 20080704 (Red Hat 4.1.2-46)
13:42:36: Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
13:42:36: -fno-unsafe-math-optimizations -msse2
13:42:36: Platform: linux2 2.6.18-164.11.1.el5
13:42:36: Bits: 64
13:42:36: Mode: Release
13:42:36:******************************* System ********************************
13:42:36: CPU: Intel(R) Xeon(R) E-2136 CPU @ 3.30GHz
13:42:36: CPU ID: GenuineIntel Family 6 Model 158 Stepping 10
13:42:36: CPUs: 12
13:42:36: Memory: 31.25GiB
13:42:36:Free Memory: 23.98GiB
13:42:36: Threads: POSIX_THREADS
13:42:36: OS Version: 4.19
13:42:36:Has Battery: false
13:42:36: On Battery: false
13:42:36: UTC Offset: -4
13:42:36: PID: 3682
13:42:36: CWD: /var/lib/fahclient
13:42:36: OS: Linux 4.19.62-mod-std-ipv6-64-rescue x86_64
13:42:36: OS Arch: AMD64
13:42:36: GPUs: 0
13:42:36: CUDA: Not detected
13:42:36:***********************************************************************
13:42:36:<config>
13:42:36: <!-- Folding Core -->
13:42:36: <core-priority v='low'/>
13:42:36:
13:42:36: <!-- Folding Slot Configuration -->
13:42:36: <gpu v='false'/>
13:42:36:
13:42:36: <!-- HTTP Server -->
13:42:36: <allow v='127.0.0.1 148.170.166.209'/>
13:42:36:
13:42:36: <!-- Network -->
13:42:36: <proxy v=':8080'/>
13:42:36:
13:42:36: <!-- Remote Command Server -->
13:42:36: <command-allow-no-pass v='127.0.0.1 148.170.166.209'/>
13:42:36: <password v='********************'/>
13:42:36:
13:42:36: <!-- Slot Control -->
13:42:36: <power v='full'/>
13:42:36:
13:42:36: <!-- User Information -->
13:42:36: <user v='instance'/>
13:42:36:
13:42:36: <!-- Folding Slots -->
13:42:36: <slot id='0' type='CPU'>
13:42:36: <cpus v='10'/>
13:42:36: </slot>
13:42:36:</config>
13:42:36:Switching to user fahclient
13:42:36:Trying to access database...
13:42:36:Successfully acquired database lock
13:42:36:Enabled folding slot 00: READY cpu:10
13:42:36:WU00:FS00:Starting
13:42:36:WU00:FS00:Removing old file './work/00/logfile_01-20210506-064720.txt'
13:42:36:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit-avx-256/a8-0.0.12/Core_a8.fah/FahCore_a8 -dir 00 -suffix 01 -version 704 -lifeline 3682 -checkpoint 15 -np 10
13:42:36:WU00:FS00:Started FahCore on PID 3691
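For comparing the two servers side by side, the header fields of the client log (version, CPU count, kernel string and so on) can be pulled out with a few lines of Python. This is only a rough sketch that assumes the `HH:MM:SS: Key: value` layout seen in the snippet above; `parse_header` and `sample_header` are names made up for this illustration, not part of FAHClient.

```python
import re

# Matches client header lines such as "13:42:36: Version: 7.4.4".
# Work unit lines ("13:42:36:WU00:FS00:...") contain digits in the key
# position and are therefore skipped.
HEADER_LINE = re.compile(r"^\d{2}:\d{2}:\d{2}:\s*([A-Za-z][A-Za-z ]*?):\s+(.*)$")


def parse_header(lines):
    """Return a dict of header fields, keeping the first value seen per key."""
    fields = {}
    for line in lines:
        match = HEADER_LINE.match(line)
        if match:
            fields.setdefault(match.group(1), match.group(2).strip())
    return fields


# A few lines copied from the log above.
sample_header = [
    "13:42:36: Version: 7.4.4",
    "13:42:36: CPUs: 12",
    "13:42:36:Free Memory: 23.98GiB",
    "13:42:36: OS: Linux 4.19.62-mod-std-ipv6-64-rescue x86_64",
    "13:42:36:WU00:FS00:Starting",
]
fields = parse_header(sample_header)
for key in ("Version", "CPUs", "OS"):
    print(key, "=", fields[key])
```

Running the same function over the log from the working server and diffing the two dicts should show which of these fields differ between the machines.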
Code:
14:39:46:WU00:FS00:0xa8:Calling: mdrun -c frame34.gro -s frame34.tpr -x frame34.xtc -cpi state.cpt -cpt 15 -nt 10 -ntmpi 1
14:39:46:WU00:FS00:0xa8:Steps: first=85000000 total=87500000
14:39:46:WU00:FS00:0xa8:Completed 32002 out of 2500000 steps (1%)
14:40:44:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
14:40:45:WU00:FS00:Starting
14:40:45:WU00:FS00:Removing old file './work/00/logfile_01-20210506-140843.txt'
14:40:45:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit-avx-256/a8-0.0.12/Core_a8.fah/FahCore_a8 -dir 00 -suffix 01 -version 704 -lifeline 3682 -checkpoint 15 -np 10
14:40:45:WU00:FS00:Started FahCore on PID 21360
14:40:45:WU00:FS00:Core PID:21364
14:40:45:WU00:FS00:FahCore 0xa8 started
14:40:46:WU00:FS00:0xa8:*********************** Log Started 2021-05-06T14:40:45Z ***********************
14:40:46:WU00:FS00:0xa8:************************** Gromacs Folding@home Core ***************************
14:40:46:WU00:FS00:0xa8: Core: Gromacs
14:40:46:WU00:FS00:0xa8: Type: 0xa8
14:40:46:WU00:FS00:0xa8: Version: 0.0.12
14:40:46:WU00:FS00:0xa8: Author: Joseph Coffland <[email protected]>
14:40:46:WU00:FS00:0xa8: Copyright: 2020 foldingathome.org
14:40:46:WU00:FS00:0xa8: Homepage: https://foldingathome.org/
14:40:46:WU00:FS00:0xa8: Date: Jan 16 2021
14:40:46:WU00:FS00:0xa8: Time: 19:23:19
14:40:46:WU00:FS00:0xa8: Compiler: GNU 8.3.0
14:40:46:WU00:FS00:0xa8: Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
14:40:46:WU00:FS00:0xa8: -fdata-sections -O3 -funroll-loops -fno-pie
14:40:46:WU00:FS00:0xa8: Platform: linux2 4.15.0-128-generic
14:40:46:WU00:FS00:0xa8: Bits: 64
14:40:46:WU00:FS00:0xa8: Mode: Release
14:40:46:WU00:FS00:0xa8: SIMD: avx_256
14:40:46:WU00:FS00:0xa8: OpenMP: ON
14:40:46:WU00:FS00:0xa8: CUDA: OFF
14:40:46:WU00:FS00:0xa8: Args: -dir 00 -suffix 01 -version 704 -lifeline 21360 -checkpoint 15 -np
14:40:46:WU00:FS00:0xa8: 10
14:40:46:WU00:FS00:0xa8:************************************ libFAH ************************************
14:40:46:WU00:FS00:0xa8: Date: Jan 16 2021
14:40:46:WU00:FS00:0xa8: Time: 19:21:38
14:40:46:WU00:FS00:0xa8: Compiler: GNU 8.3.0
14:40:46:WU00:FS00:0xa8: Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
14:40:46:WU00:FS00:0xa8: -fdata-sections -O3 -funroll-loops -fno-pie
14:40:46:WU00:FS00:0xa8: Platform: linux2 4.15.0-128-generic
14:40:46:WU00:FS00:0xa8: Bits: 64
14:40:46:WU00:FS00:0xa8: Mode: Release
14:40:46:WU00:FS00:0xa8:************************************ CBang *************************************
14:40:46:WU00:FS00:0xa8: Date: Jan 16 2021
14:40:46:WU00:FS00:0xa8: Time: 19:21:24
14:40:46:WU00:FS00:0xa8: Compiler: GNU 8.3.0
14:40:46:WU00:FS00:0xa8: Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
14:40:46:WU00:FS00:0xa8: -fdata-sections -O3 -funroll-loops -fno-pie -fPIC
14:40:46:WU00:FS00:0xa8: Platform: linux2 4.15.0-128-generic
14:40:46:WU00:FS00:0xa8: Bits: 64
14:40:46:WU00:FS00:0xa8: Mode: Release
14:40:46:WU00:FS00:0xa8:************************************ System ************************************
14:40:46:WU00:FS00:0xa8: CPU: Intel(R) Xeon(R) E-2136 CPU @ 3.30GHz
14:40:46:WU00:FS00:0xa8: CPU ID: GenuineIntel Family 6 Model 158 Stepping 10
14:40:46:WU00:FS00:0xa8: CPUs: 12
14:40:46:WU00:FS00:0xa8: Memory: 31.25GiB
14:40:46:WU00:FS00:0xa8:Free Memory: 23.97GiB
14:40:46:WU00:FS00:0xa8: Threads: POSIX_THREADS
14:40:46:WU00:FS00:0xa8: OS Version: 4.19
14:40:46:WU00:FS00:0xa8:Has Battery: false
14:40:46:WU00:FS00:0xa8: On Battery: false
14:40:46:WU00:FS00:0xa8: UTC Offset: -4
14:40:46:WU00:FS00:0xa8: PID: 21364
14:40:46:WU00:FS00:0xa8: CWD: /var/lib/fahclient/work
14:40:46:WU00:FS00:0xa8:********************************************************************************
14:40:46:WU00:FS00:0xa8:Project: 16959 (Run 31, Clone 401, Gen 34)
14:40:46:WU00:FS00:0xa8:Unit: 0x00000000000000000000000000000000
14:40:46:WU00:FS00:0xa8:Digital signatures verified
14:40:46:WU00:FS00:0xa8:Calling: mdrun -c frame34.gro -s frame34.tpr -x frame34.xtc -cpi state.cpt -cpt 15 -nt 10 -ntmpi 1
14:40:46:WU00:FS00:0xa8:Steps: first=85000000 total=87500000
14:40:46:WU00:FS00:0xa8:Completed 32002 out of 2500000 steps (1%)
14:41:45:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
14:41:45:WU00:FS00:Starting
14:41:45:WU00:FS00:Removing old file './work/00/logfile_01-20210506-140944.txt'
14:41:45:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit-avx-256/a8-0.0.12/Core_a8.fah/FahCore_a8 -dir 00 -suffix 01 -version 704 -lifeline 3682 -checkpoint 15 -np 10
14:41:45:WU00:FS00:Started FahCore on PID 21439
14:41:45:WU00:FS00:Core PID:21443
14:41:45:WU00:FS00:FahCore 0xa8 started
14:41:46:WU00:FS00:0xa8:*********************** Log Started 2021-05-06T14:41:45Z ***********************
14:41:46:WU00:FS00:0xa8:************************** Gromacs Folding@home Core ***************************
14:41:46:WU00:FS00:0xa8: Core: Gromacs
14:41:46:WU00:FS00:0xa8: Type: 0xa8
14:41:46:WU00:FS00:0xa8: Version: 0.0.12
14:41:46:WU00:FS00:0xa8: Author: Joseph Coffland <[email protected]>
14:41:46:WU00:FS00:0xa8: Copyright: 2020 foldingathome.org
14:41:46:WU00:FS00:0xa8: Homepage: https://foldingathome.org/
14:41:46:WU00:FS00:0xa8: Date: Jan 16 2021
14:41:46:WU00:FS00:0xa8: Time: 19:23:19
14:41:46:WU00:FS00:0xa8: Compiler: GNU 8.3.0
14:41:46:WU00:FS00:0xa8: Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
14:41:46:WU00:FS00:0xa8: -fdata-sections -O3 -funroll-loops -fno-pie
14:41:46:WU00:FS00:0xa8: Platform: linux2 4.15.0-128-generic
14:41:46:WU00:FS00:0xa8: Bits: 64
14:41:46:WU00:FS00:0xa8: Mode: Release
14:41:46:WU00:FS00:0xa8: SIMD: avx_256
14:41:46:WU00:FS00:0xa8: OpenMP: ON
14:41:46:WU00:FS00:0xa8: CUDA: OFF
14:41:46:WU00:FS00:0xa8: Args: -dir 00 -suffix 01 -version 704 -lifeline 21439 -checkpoint 15 -np
14:41:46:WU00:FS00:0xa8: 10
14:41:46:WU00:FS00:0xa8:************************************ libFAH ************************************
14:41:46:WU00:FS00:0xa8: Date: Jan 16 2021
14:41:46:WU00:FS00:0xa8: Time: 19:21:38
14:41:46:WU00:FS00:0xa8: Compiler: GNU 8.3.0
14:41:46:WU00:FS00:0xa8: Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
14:41:46:WU00:FS00:0xa8: -fdata-sections -O3 -funroll-loops -fno-pie
14:41:46:WU00:FS00:0xa8: Platform: linux2 4.15.0-128-generic
14:41:46:WU00:FS00:0xa8: Bits: 64
14:41:46:WU00:FS00:0xa8: Mode: Release
14:41:46:WU00:FS00:0xa8:************************************ CBang *************************************
14:41:46:WU00:FS00:0xa8: Date: Jan 16 2021
14:41:46:WU00:FS00:0xa8: Time: 19:21:24
14:41:46:WU00:FS00:0xa8: Compiler: GNU 8.3.0
14:41:46:WU00:FS00:0xa8: Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
14:41:46:WU00:FS00:0xa8: -fdata-sections -O3 -funroll-loops -fno-pie -fPIC
14:41:46:WU00:FS00:0xa8: Platform: linux2 4.15.0-128-generic
14:41:46:WU00:FS00:0xa8: Bits: 64
14:41:46:WU00:FS00:0xa8: Mode: Release
14:41:46:WU00:FS00:0xa8:************************************ System ************************************
14:41:46:WU00:FS00:0xa8: CPU: Intel(R) Xeon(R) E-2136 CPU @ 3.30GHz
14:41:46:WU00:FS00:0xa8: CPU ID: GenuineIntel Family 6 Model 158 Stepping 10
14:41:46:WU00:FS00:0xa8: CPUs: 12
14:41:46:WU00:FS00:0xa8: Memory: 31.25GiB
14:41:46:WU00:FS00:0xa8:Free Memory: 23.97GiB
14:41:46:WU00:FS00:0xa8: Threads: POSIX_THREADS
14:41:46:WU00:FS00:0xa8: OS Version: 4.19
14:41:46:WU00:FS00:0xa8:Has Battery: false
14:41:46:WU00:FS00:0xa8: On Battery: false
14:41:46:WU00:FS00:0xa8: UTC Offset: -4
14:41:46:WU00:FS00:0xa8: PID: 21443
14:41:46:WU00:FS00:0xa8: CWD: /var/lib/fahclient/work
14:41:46:WU00:FS00:0xa8:********************************************************************************
14:41:46:WU00:FS00:0xa8:Project: 16959 (Run 31, Clone 401, Gen 34)
14:41:46:WU00:FS00:0xa8:Unit: 0x00000000000000000000000000000000
14:41:46:WU00:FS00:0xa8:Digital signatures verified
14:41:46:WU00:FS00:0xa8:Calling: mdrun -c frame34.gro -s frame34.tpr -x frame34.xtc -cpi state.cpt -cpt 15 -nt 10 -ntmpi 1
14:41:46:WU00:FS00:0xa8:Steps: first=85000000 total=87500000
14:41:46:WU00:FS00:0xa8:Completed 32002 out of 2500000 steps (1%)
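In the bottom snippet the core returns INTERRUPTED about once a minute and each restart reports the same step count (32002), so the work unit does not appear to advance. A small Python sketch for checking this over the whole log, assuming the line formats shown above (`summarize` and `sample_log` are illustrative names, and timestamps that wrap past midnight are not handled):

```python
import re
from datetime import datetime

# "14:40:44:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)"
RETURNED = re.compile(
    r"^(\d{2}:\d{2}:\d{2}):WU\d+:FS\d+:FahCore returned: (\w+) \(\d+ = 0x[0-9a-fA-F]+\)"
)
# "14:40:46:WU00:FS00:0xa8:Completed 32002 out of 2500000 steps (1%)"
COMPLETED = re.compile(
    r"^\d{2}:\d{2}:\d{2}:WU\d+:FS\d+:0x[0-9a-fA-F]+:Completed (\d+) out of \d+ steps"
)


def summarize(lines):
    """Collect core exit statuses, seconds between exits, and reported steps."""
    exits = []
    steps = []
    for line in lines:
        match = RETURNED.match(line)
        if match:
            when = datetime.strptime(match.group(1), "%H:%M:%S")
            exits.append((when, match.group(2)))
            continue
        match = COMPLETED.match(line)
        if match:
            steps.append(int(match.group(1)))
    gaps = [int((b[0] - a[0]).total_seconds()) for a, b in zip(exits, exits[1:])]
    return {
        "exits": [status for _, status in exits],
        "gaps_seconds": gaps,
        "steps": steps,
    }


# Lines copied from the log above.
sample_log = [
    "14:39:46:WU00:FS00:0xa8:Completed 32002 out of 2500000 steps (1%)",
    "14:40:44:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)",
    "14:40:46:WU00:FS00:0xa8:Completed 32002 out of 2500000 steps (1%)",
    "14:41:45:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)",
    "14:41:46:WU00:FS00:0xa8:Completed 32002 out of 2500000 steps (1%)",
]
summary = summarize(sample_log)
print(summary)
```

If `steps` never changes while `exits` keeps growing, the slot is stuck in a restart loop rather than folding slowly.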
I'm paying for this beast until the end of the month and would like to put it to good use...