
Re: Cannot fold A4 cpu but can fold A7

Posted: Wed Feb 21, 2018 5:37 pm
by beer
Hi
As far as I can see, I currently have WUs (do they refer to slots?).
WU01 is the CPU one, and within /var/lib/fahclient/work/01 I find a log file called logfile_01.txt:

Code:

*------------------------------*
Folding@Home Gromacs GB Core
Version 2.27 (Dec. 15, 2010)

Preparing to commence simulation
- Ensuring status. Please wait.
- Looking at optimizations...
- Working with standard loops on this execution.
Examination of work files indicates 8 consecutive improper terminations of core.
- Expanded 825690 -> 1402156 (decompressed 169.8 percent)
Called DecompressByteArray: compressed_data_size=825690 data_size=1402156, decompressed_data_size=1402156 diff=0
- Digital signature verified

Project: 9035 (Run 154, Clone 0, Gen 1055)

Entering M.D.

The CPU work folder (WU01) does not contain any subfolders.

However, the other one does (/var/lib/fahclient/work/00/01), and in there I found a log that looks like this:

Code:

Project: 11432 (Run 0, Clone 981, Gen 40)
Unit: 0x000000338ca304e85a5a6c4686da8781
CPU: 0x00000000000000000000000000000000
Machine: 1
Reading tar file core.xml
Reading tar file integrator.xml
Reading tar file state.xml
Reading tar file system.xml
Digital signatures verified
*************************** Core21 Folding@home Core ***************************
       Type: 33
       Core: Core21
    Website: http://folding.stanford.edu/
  Copyright: (c) 2009-2014 Stanford University
     Author: Yutong Zhao <[email protected]>
       Args: -dir 00 -suffix 01 -version 704 -lifeline 8816 -checkpoint 30
             -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device
             0 -gpu 0
     Config: <none>
************************************ Build *************************************
    Version: 0.0.18
       Date: Jan 20 2017
       Time: 03:42:31
 Repository: Git
   Revision: 2745fc8067662d2e7b9e455232edb5ebd8790640
     Branch: HEAD
   Compiler: GNU 4.4.7 20120313 (Red Hat 4.4.7-17)
    Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
             -fno-unsafe-math-optimizations -msse2
   Platform: linux2 4.4.39-moby
       Bits: 64
       Mode: Release
************************************ System ************************************
        CPU: Intel(R) Core(TM) i7-4770S CPU @ 3.10GHz
     CPU ID: GenuineIntel Family 6 Model 60 Stepping 3
       CPUs: 8
     Memory: 7.73GiB
Free Memory: 4.68GiB
    Threads: POSIX_THREADS
 OS Version: 4.14
Has Battery: false
 On Battery: false
 UTC Offset: 1
        PID: 8820
        CWD: /var/lib/fahclient/work
         OS: Linux 4.14.0-3-amd64 x86_64
    OS Arch: AMD64
       GPUs: 0
       CUDA: Not detected
     OpenCL: Not detected
********************************************************************************
Folding@home GPU Core21 Folding@home Core
Version 0.0.18
[1] compatible platform(s):
  -- 0 --
  PROFILE = FULL_PROFILE
  VERSION = OpenCL 1.2 CUDA 9.0.282
  NAME = NVIDIA CUDA
  VENDOR = NVIDIA Corporation

(1) device(s) found on platform 0:
  -- 0 --
  DEVICE_NAME = GeForce GTX 1070
  DEVICE_VENDOR = NVIDIA Corporation
  DEVICE_VERSION = OpenCL 1.2 CUDA

[ Entering Init ]
  Launch time: 2018-02-21T17:14:26Z
  Arguments passed: -dir 00 -suffix 01 -version 704 -lifeline 8816 -checkpoint 30 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0 
  For testState comparison of CPU and GPU, will use:
    forceTolerance: 5 kJ/mol/nm
    energyTolerance: 10 kJ/mol
[ Leaving  Init ]
[ Entering Main ]
  Reading core settings...
  Total number of steps: 5000000
  XTC write frequency: 250000
  Checkpoint write frequency: 250000 (5%)
  Number of frames per WU: 20
[ Initializing Core Contexts ]
  Using platform OpenCL
  Looking for vendor: nvidia...found on platformId 0
  Setting platform precision to mixed
  Deserializing System...
  Found MonteCarloBarostat @ 1.01325 (default) Bar, 300 Kelvin, 50 pressure change frequency.
    Found: 58738 atoms, 9 forces.
  Deserializing State...  done.
    Ewald error tolerance in force 6 is 0.0005
    Ewald parameters: alpha 2.920289872087185 nx 75 ny 75 nz 75
    Integrator Type: N6OpenMM18LangevinIntegratorE
    Constraint Tolerance: 1e-05
    Time Step in PS: 0.002
    Temperature: 300
    Friction Coeff: 5
    Using CPU platform for reference calculations.
  Checking core state against reference...
  Checking checkpoint state against reference...
[ Initialized Core Contexts... ]
  Using OpenCL on platformId 0 and gpu 0
  v(^_^)v  MD ready starting from step 0

Completed 0 out of 5000000 steps (0%)
Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
Completed 50000 out of 5000000 steps (1%)
Completed 100000 out of 5000000 steps (2%)
Completed 150000 out of 5000000 steps (3%)
Completed 200000 out of 5000000 steps (4%)

With a bit of luck I then got an A7 core, and the folder /var/lib/fahclient/work/01/01 was created with a log called md.log:

Code:

Log file opened on Wed Feb 21 18:33:46 2018
Host: debian  pid: 9564  rank ID: 0  number of ranks:  1
GROMACS:    GROMACS, VERSION 5.0.4-20161122-4846b12-unknown

GROMACS is written by:
Emile Apol         Rossen Apostolov   Herman J.C. Berendsen Par Bjelkmar       
Aldert van Buuren  Rudi van Drunen    Anton Feenstra     Sebastian Fritsch  
Gerrit Groenhof    Christoph Junghans Peter Kasson       Carsten Kutzner    
Per Larsson        Justin A. Lemkul   Magnus Lundborg    Pieter Meulenhoff  
Erik Marklund      Teemu Murtola      Szilard Pall       Sander Pronk       
Roland Schulz      Alexey Shvetsov    Michael Shirts     Alfons Sijbers     
Peter Tieleman     Christian Wennberg Maarten Wolf       
and the project leaders:
Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel

Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2014, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.


GROMACS:      GROMACS, VERSION 5.0.4-20161122-4846b12-unknown

Gromacs version:    VERSION 5.0.4-20161122-4846b12-unknown
GIT SHA1 hash:      4846b12ba1ad097bbb8e24164b2e54ab4c5dc17b
Branched from:      unknown
Precision:          single
Memory model:       64 bit
MPI library:        thread_mpi
OpenMP support:     disabled
GPU support:        disabled
invsqrt routine:    gmx_software_invsqrt(x)
SIMD instructions:  AVX_256
FFT library:        fftw-3.3.4-sse2-avx
RDTSCP usage:       disabled
C++11 compilation:  disabled
TNG support:        enabled
Tracing support:    disabled
Built on:           Wed Mar 22 01:02:31 UTC 2017
Built by:           root@69562b3fdcef [CMAKE]
Build OS/arch:      Linux 4.9.0-1-amd64 x86_64
Build CPU vendor:   GenuineIntel
Build CPU brand:    Intel(R) Core(TM) i7-3770S CPU @ 3.10GHz
Build CPU family:   6   Model: 58   Stepping: 9
Build CPU features: aes apic avx clfsh cmov cx8 cx16 f16c htt lahf_lm mmx msr nonstop_tsc pcid pclmuldq pdcm popcnt pse rdrnd rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
C compiler:         /usr/bin/cc GNU 4.9.2
C compiler flags:    -mavx   -I/host/debian-stable-64bit-core-a7-avx-release/libfah/build/src -Wno-maybe-uninitialized -Wextra -Wno-missing-field-initializers -Wno-sign-compare -Wpointer-arith -Wall -Wno-unused -Wunused-value -Wunused-parameter -Wno-unknown-pragmas  -O3 -DNDEBUG -fomit-frame-pointer -funroll-all-loops -fexcess-precision=fast  -Wno-array-bounds 
C++ compiler:       /usr/bin/c++ GNU 4.9.2
C++ compiler flags:  -mavx   -I/host/debian-stable-64bit-core-a7-avx-release/libfah/build/src -Wextra -Wno-missing-field-initializers -Wpointer-arith -Wall -Wno-unused-function -Wno-unknown-pragmas  -O3 -DNDEBUG -fomit-frame-pointer -funroll-all-loops -fexcess-precision=fast  -Wno-array-bounds 
Boost version:      1.62.0 (external)



++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------

Input Parameters:
   integrator                     = md
   tinit                          = 0
   dt                             = 0.002
   nsteps                         = 2500000
   init-step                      = 47500000
   simulation-part                = 1
   comm-mode                      = Linear
   nstcomm                        = 100
   bd-fric                        = 0
   ld-seed                        = 1959294850
   emtol                          = 10
   emstep                         = 0.01
   niter                          = 20
   fcstep                         = 0
   nstcgsteep                     = 1000
   nbfgscorr                      = 10
   rtpi                           = 0.05
   nstxout                        = 500000
   nstvout                        = 500000
   nstfout                        = 0
   nstlog                         = 50000
   nstcalcenergy                  = 100
   nstenergy                      = 50000
   nstxout-compressed             = 50000
   compressed-x-precision         = 1000
   cutoff-scheme                  = Verlet
   nstlist                        = 10
   ns-type                        = Grid
   pbc                            = xyz
   periodic-molecules             = FALSE
   verlet-buffer-tolerance        = 0.005
   rlist                          = 0.903
   rlistlong                      = 0.903
   nstcalclr                      = 10
   coulombtype                    = PME
   coulomb-modifier               = Potential-shift
   rcoulomb-switch                = 0
   rcoulomb                       = 0.9
   epsilon-r                      = 1
   epsilon-rf                     = inf
   vdw-type                       = Cut-off
   vdw-modifier                   = Potential-shift
   rvdw-switch                    = 0
   rvdw                           = 0.9
   DispCorr                       = EnerPres
   table-extension                = 1
   fourierspacing                 = 0.16
   fourier-nx                     = 24
   fourier-ny                     = 24
   fourier-nz                     = 24
   pme-order                      = 4
   ewald-rtol                     = 1e-05
   ewald-rtol-lj                  = 0.001
   lj-pme-comb-rule               = Geometric
   ewald-geometry                 = 0
   epsilon-surface                = 0
   implicit-solvent               = No
   gb-algorithm                   = Still
   nstgbradii                     = 1
   rgbradii                       = 1
   gb-epsilon-solvent             = 80
   gb-saltconc                    = 0
   gb-obc-alpha                   = 1
   gb-obc-beta                    = 0.8
   gb-obc-gamma                   = 4.85
   gb-dielectric-offset           = 0.009
   sa-algorithm                   = Ace-approximation
   sa-surface-tension             = 2.05016
   tcoupl                         = Nose-Hoover
   nsttcouple                     = 10
   nh-chain-length                = 1
   print-nose-hoover-chain-variables = FALSE
   pcoupl                         = No
   pcoupltype                     = Isotropic
   nstpcouple                     = -1
   tau-p                          = 1
   compressibility (3x3):
      compressibility[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      compressibility[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      compressibility[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   ref-p (3x3):
      ref-p[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      ref-p[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      ref-p[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   refcoord-scaling               = No
   posres-com (3):
      posres-com[0]= 0.00000e+00
      posres-com[1]= 0.00000e+00
      posres-com[2]= 0.00000e+00
   posres-comB (3):
      posres-comB[0]= 0.00000e+00
      posres-comB[1]= 0.00000e+00
      posres-comB[2]= 0.00000e+00
   QMMM                           = FALSE
   QMconstraints                  = 0
   QMMMscheme                     = 0
   MMChargeScaleFactor            = 1
qm-opts:
   ngQM                           = 0
   constraint-algorithm           = Lincs
   continuation                   = TRUE
   Shake-SOR                      = FALSE
   shake-tol                      = 0.0001
   lincs-order                    = 4
   lincs-iter                     = 1
   lincs-warnangle                = 30
   nwall                          = 0
   wall-type                      = 9-3
   wall-r-linpot                  = -1
   wall-atomtype[0]               = -1
   wall-atomtype[1]               = -1
   wall-density[0]                = 0
   wall-density[1]                = 0
   wall-ewald-zfac                = 3
   pull                           = no
   rotation                       = FALSE
   interactiveMD                  = FALSE
   disre                          = No
   disre-weighting                = Conservative
   disre-mixed                    = FALSE
   dr-fc                          = 1000
   dr-tau                         = 0
   nstdisreout                    = 100
   orire-fc                       = 0
   orire-tau                      = 0
   nstorireout                    = 100
   free-energy                    = no
   cos-acceleration               = 0
   deform (3x3):
      deform[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   simulated-tempering            = FALSE
   E-x:
      n = 0
   E-xt:
      n = 0
   E-y:
      n = 0
   E-yt:
      n = 0
   E-z:
      n = 0
   E-zt:
      n = 0
   swapcoords                     = no
   adress                         = FALSE
   userint1                       = 0
   userint2                       = 0
   userint3                       = 0
   userint4                       = 0
   userreal1                      = 0
   userreal2                      = 0
   userreal3                      = 0
   userreal4                      = 0
grpopts:
   nrdf:      761.79     10122.2
   ref-t:         360         360
   tau-t:           1           1
annealing:          No          No
annealing-npoints:           0           0
   acc:	           0           0           0
   nfreeze:           N           N           N
   energygrp-flags[  0]: 0

Initializing Domain Decomposition on 6 ranks
Dynamic load balancing: auto
Will sort the charge groups at every domain (re)decomposition
Initial maximum inter charge-group distances:
    two-body bonded interactions: 0.404 nm, LJ-14, atoms 19 30
  multi-body bonded interactions: 0.454 nm, CMAP Dih., atoms 175 184
Minimum cell size due to bonded interactions: 0.500 nm
Maximum distance for 5 constraints, at 120 deg. angles, all-trans: 0.222 nm
Estimated maximum distance required for P-LINCS: 0.222 nm
Using 0 separate PME ranks, as there are too few total
 ranks for efficient splitting
Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
Optimizing the DD grid for 6 cells with a minimum initial size of 0.625 nm
The maximum allowed number of cells is: X 5 Y 5 Z 5
Domain decomposition grid 3 x 2 x 1, separate PME ranks 0
PME domain decomposition: 3 x 2 x 1
Domain decomposition rank 0, coordinates 0 0 0

Using 6 MPI threads

Detecting CPU SIMD instructions.
Present hardware specification:
Vendor: GenuineIntel
Brand:  Intel(R) Core(TM) i7-4770S CPU @ 3.10GHz
Family:  6  Model: 60  Stepping:  3
Features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt lahf_lm mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
SIMD instructions most likely to fit this hardware: AVX2_256
SIMD instructions selected at GROMACS compile time: AVX_256


Binary not matching hardware - you might be losing performance.
SIMD instructions most likely to fit this hardware: AVX2_256
SIMD instructions selected at GROMACS compile time: AVX_256


The current CPU can measure timings more accurately than the code in
GROMACS was configured to use. This might affect your simulation
speed as accurate timings are needed for load-balancing.
Please consider rebuilding GROMACS with the GMX_USE_RDTSCP=OFF CMake option.

Will do PME sum in reciprocal space for electrostatic interactions.

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen 
A smooth particle mesh Ewald method
J. Chem. Phys. 103 (1995) pp. 8577-8592
-------- -------- --- Thank You --- -------- --------

Will do ordinary reciprocal space Ewald sum.
Using a Gaussian width (1/beta) of 0.288146 nm for Ewald
Cut-off's:   NS: 0.903   Coulomb: 0.9   LJ: 0.9
Long Range LJ corr.: <C6> 3.0399e-04
System total charge: 0.000
Generated table with 951 data points for Ewald.
Tabscale = 500 points/nm
Generated table with 951 data points for LJ6.
Tabscale = 500 points/nm
Generated table with 951 data points for LJ12.
Tabscale = 500 points/nm
Generated table with 951 data points for 1-4 COUL.
Tabscale = 500 points/nm
Generated table with 951 data points for 1-4 LJ6.
Tabscale = 500 points/nm
Generated table with 951 data points for 1-4 LJ12.
Tabscale = 500 points/nm

Using AVX_256 4x4 non-bonded kernels

Using Lorentz-Berthelot Lennard-Jones combination rule

Potential shift: LJ r^-12: -3.541e+00 r^-6: -1.882e+00, Ewald -1.000e-05
Initialized non-bonded Ewald correction tables, spacing: 8.85e-04 size: 2151


NOTE: The number of threads is not equal to the number of (logical) cores
      and the -pin option is set to auto: will not pin thread to cores.
      This can lead to significant performance degradation.
      Consider using -pin on (and -pinoffset in case you run multiple jobs).


Initializing Parallel LINear Constraint Solver

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess
P-LINCS: A Parallel Linear Constraint Solver for molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 116-122
-------- -------- --- Thank You --- -------- --------

The number of constraints is 150
There are inter charge-group constraints,
will communicate selected coordinates each lincs iteration

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Miyamoto and P. A. Kollman
SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid
Water Models
J. Comp. Chem. 13 (1992) pp. 952-962
-------- -------- --- Thank You --- -------- --------

Setting the maximum number of constraint warnings to -1
maxwarn < 0, will not stop on constraint errors

Linking all bonded interactions to atoms
There are 6745 inter charge-group exclusions,
will use an extra communication step for exclusion forces for PME

The initial number of communication pulses is: X 1 Y 1
The initial domain decomposition cell size is: X 1.23 nm Y 1.85 nm

The maximum allowed distance for charge groups involved in interactions is:
                 non-bonded interactions           0.903 nm
            two-body bonded interactions  (-rdd)   0.903 nm
          multi-body bonded interactions  (-rdd)   0.903 nm
  atoms separated by up to 5 constraints  (-rcon)  1.233 nm

When dynamic load balancing gets turned on, these settings will change to:
The maximum number of communication pulses is: X 1 Y 1
The minimum size for domain decomposition cells is 0.903 nm
The requested allowed shrink of DD cells (option -dds) is: 0.80
The allowed shrink of domain decomposition cells is: X 0.73 Y 0.49
The maximum allowed distance for charge groups involved in interactions is:
                 non-bonded interactions           0.903 nm
            two-body bonded interactions  (-rdd)   0.903 nm
          multi-body bonded interactions  (-rdd)   0.903 nm
  atoms separated by up to 5 constraints  (-rcon)  0.903 nm


Making 2D domain decomposition grid 3 x 2 x 1, home cell index 0 0 0

Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
  0:  rest
There are: 5365 Atoms
Charge group distribution at step 47500000: 911 881 887 890 889 907
Initial temperature: 365.706 K

Started mdrun on rank 0 Wed Feb 21 18:33:47 2018
           Step           Time         Lambda
       47500000    95000.00000        0.00000

   Energies (kJ/mol)
           Bond            U-B    Proper Dih.  Improper Dih.      CMAP Dih.
    2.81980e+02    8.53134e+02    4.00920e+02    5.02920e+01   -8.66576e+01
          LJ-14     Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)
    2.30543e+02    4.01757e+03    1.04793e+04   -9.91009e+02   -8.30525e+04
   Coul. recip.      Potential    Kinetic En.   Total Energy  Conserved En.
    8.03228e+02   -6.70131e+04    1.65673e+04   -5.04458e+04   -5.04458e+04
    Temperature Pres. DC (bar) Pressure (bar)   Constr. rmsd
    3.66149e+02   -3.25382e+02    2.53839e+03    3.14117e-06

DD  step 47500009 load imb.: force 20.7%

At step 47500010 the performance loss due to force load imbalance is 3.9 %

NOTE: Turning on dynamic load balancing

DD  step 47549999  vol min/aver 0.807  load imb.: force  1.7%

           Step           Time         Lambda
       47550000    95100.00000        0.00000

   Energies (kJ/mol)
           Bond            U-B    Proper Dih.  Improper Dih.      CMAP Dih.
    3.09016e+02    8.04182e+02    3.14941e+02    5.97698e+01   -8.77557e+01
          LJ-14     Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)
    1.88819e+02    4.01889e+03    1.01774e+04   -9.91009e+02   -8.17578e+04
   Coul. recip.      Potential    Kinetic En.   Total Energy  Conserved En.
    7.75933e+02   -6.61877e+04    1.63524e+04   -4.98353e+04   -5.01356e+04
    Temperature Pres. DC (bar) Pressure (bar)   Constr. rmsd
    3.61398e+02   -3.25382e+02    2.19064e+03    3.08043e-06




Do any of those logs help?

Re: Cannot fold A4 cpu but can fold A7

Posted: Wed Feb 21, 2018 10:28 pm
by bruce
beer wrote:As far as I can see then it is all project that is using A4 that fails like that.

Here is a project that is using A4

Code:

05:24:11:WU00:FS00:0xa4:*------------------------------*
05:24:11:WU00:FS00:0xa4:Folding@Home Gromacs GB Core
05:24:11:WU00:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
05:24:11:WU00:FS00:0xa4:
05:24:11:WU00:FS00:0xa4:Preparing to commence simulation
05:24:11:WU00:FS00:0xa4:- Ensuring status. Please wait.
05:24:20:WU00:FS00:0xa4:- Looking at optimizations...
05:24:20:WU00:FS00:0xa4:- Working with standard loops on this execution.
05:24:20:WU00:FS00:0xa4:Examination of work files indicates 8 consecutive improper terminations of core.
05:24:20:WU00:FS00:0xa4:- Expanded 887891 -> 2072336 (decompressed 233.3 percent)
05:24:20:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=887891 data_size=2072336, decompressed_data_size=2072336 diff=0
05:24:20:WU00:FS00:0xa4:- Digital signature verified
05:24:20:WU00:FS00:0xa4:
05:24:20:WU00:FS00:0xa4:Project: 8633 (Run 1, Clone 34, Gen 95)
05:24:20:WU00:FS00:0xa4:
05:24:20:WU00:FS00:0xa4:Entering M.D.
05:24:26:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)

I'm asking you for the md.log associated with the A4 problem you originally reported. The logs you found are from Core_21 and Core_a7. You need to capture the log after an A4 WU has been downloaded AND just after the error occurs. That log may say
WU00:FS00:0xa4 or
WU01:FS00:0xa4 or
WU02:FS00:0xa4 or even
WU03:FS00:0xa4

It will be in /var/lib/fahclient/work/??/01 where the ?? will be the number of the WU, not necessarily the number of the slot.

If you no longer have a problem, fine. If it breaks again, please follow my instructions.
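
As a rough sketch of the lookup described above: the client log tags each core line with the WU number and the core type (for example WU00:FS00:0xa4), so the WU numbers can be pulled out of the log and turned into work directory paths. The function name a4_work_dirs is made up for illustration, and the location of the client log (/var/lib/fahclient/log.txt) is an assumption that may differ on your install.

```shell
# Read FAHClient log text on stdin and print the work directory of every
# WU whose log lines carry the 0xa4 core tag (e.g. "WU00:FS00:0xa4:").
# work_root defaults to the path used in this thread.
a4_work_dirs() {
    work_root=${1:-/var/lib/fahclient/work}
    sed -n 's/^.*:WU\([0-9][0-9]\):FS[0-9][0-9]:0xa4:.*$/\1/p' |
        sort -u |
        while read -r wu; do
            printf '%s/%s\n' "$work_root" "$wu"
        done
}

# Example (the log path is an assumption):
#   a4_work_dirs < /var/lib/fahclient/log.txt
```

For each directory printed, running ls -l on it and on its 01 subfolder (if there is one) shows what the core managed to write before it stopped.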

Re: Cannot fold A4 cpu but can fold A7

Posted: Thu Feb 22, 2018 4:13 am
by beer
Sorry for not being clear before. The folder /var/lib/fahclient/work/??/01, where ?? matches the WU that has the A4 problem, is never created. I can find the /var/lib/fahclient/work/??/ folder, but there are no subfolders in it, and none is created either before or after the problem occurs. So I am unable to capture that file, since it never existed in the first place.
The only log I can find in that folder looks like this:

Code:

------------------------------*
Folding@Home Gromacs GB Core
Version 2.27 (Dec. 15, 2010)

Preparing to commence simulation
- Ensuring status. Please wait.
- Looking at optimizations...
- Working with standard loops on this execution.
- Previous termination of core was improper.
- Going to use standard loops.
- Files status OK
- Expanded 825030 -> 1398040 (decompressed 169.4 percent)
Called DecompressByteArray: compressed_data_size=825030 data_size=1398040, decompressed_data_size=1398040 diff=0
- Digital signature verified

Project: 9039 (Run 292, Clone 2, Gen 752)

Entering M.D.

And every time it fails to start, another log is created, just with a timestamp in the name.

The only workaround is still to drop all the A4 cores I receive until I get an A7. I have been looking around in the user interface, but I cannot find any setting that says I would like to fold A7 cores.

Re: Cannot fold A4 cpu but can fold A7

Posted: Thu Feb 22, 2018 5:20 pm
by bruce
That sounds like a permissions problem.

FAHClient normally runs as a service within a special user name created at install time. All subdirectories of /var/lib/fahclient should be owned by that user, not by your user name or by root. (The username is "fahclient" or something like that.) If that user cannot create new files whenever and wherever it wants inside of /var/lib/fahclient and later remove them, it will probably fail in the way you've described.

I'm assuming the client is always started using "sudo /etc/init.d/FAHClient start", whether it's started automatically at boot time or you manually invoke that script.
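
One way to test the ownership idea, as a sketch only: list everything under the data directory that is not owned by the service account. The function name not_owned_by is hypothetical, and "fahclient" as the account name is an assumption (check what your install actually created). Run it as root so that no subdirectory is skipped for lack of read access.

```shell
# Print every file or directory under DIR ($1) that is NOT owned by
# OWNER ($2). OWNER may be a user name or a numeric user ID.
not_owned_by() {
    find "$1" ! -user "$2" -print
}

# Example (the account name is an assumption):
#   not_owned_by /var/lib/fahclient fahclient
```

No output means every entry belongs to that account; any path that is printed is a candidate for the mixed-ownership problem.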

Re: Cannot fold A4 cpu but can fold A7

Posted: Thu Feb 22, 2018 8:55 pm
by beer
Are you talking about this line that I found in the log?
20:03:10:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 704 -lifeline 659 -checkpoint 30 -np 6

I am no expert in sudo, but the last time I looked into it, users and their privileges were listed under "# User privilege specification" and "# Allow members of group sudo to execute any command". I cannot see any trace of FAH in my /etc/sudoers file:

Code:

#
# This file MUST be edited with the 'visudo' command as root.
#
# Please consider adding local content in /etc/sudoers.d/ instead of
# directly modifying this file.
#
# See the man page for details on how to write a sudoers file.
#
Defaults        env_reset
Defaults        mail_badpass
Defaults        secure_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"

# Host alias specification

# User alias specification

# Cmnd alias specification

# User privilege specification
root    ALL=(ALL:ALL) ALL

# Allow members of group sudo to execute any command
%sudo   ALL=(ALL:ALL) ALL

# See sudoers(5) for more information on "#include" directives:

#includedir /etc/sudoers.d

I am no expert in permissions, but don't the lines below mean that everyone has access to everything in those folders?

Code:

beer@debian:/var/lib/fahclient$ ls -ld work/
drwxrwxrwx 4 fahclient root 4096 Feb 22 19:35 work/
beer@debian:/var/lib/fahclient$ cd work/

Code:

beer@debian:/var/lib/fahclient/work$ ls -ld 00/
drwxrwxrwx 3 fahclient root 4096 Feb 22 21:51 00/
beer@debian:/var/lib/fahclient/work$

And there is something I don't understand (presuming you are right). If FAH creates the folder /var/lib/fahclient/work/01 itself, it should also have permission to create subfolders in it, so how can it be a permissions problem?

Re: Cannot fold A4 cpu but can fold A7

Posted: Fri Feb 23, 2018 5:08 am
by bruce
The script "/etc/init.d/FAHClient" that I mentioned above will start the FAHClient service provided it is started with root privileges and the files will, in fact, then belong to that special user I mentioned. On the other hand, if you start FAHClient manually, the files created will belong to you. If a mixture of file ownerships has been created, it needs to be resolved.

Is it possible that FAHClient has been started by two different methods?
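
To see which account the client is actually running under (and whether two copies were started by different users), the process list can be filtered by command name. This is a minimal sketch: owners_of is a made-up helper name, and it assumes the process is called FAHClient, as in the init script path above. Note that some versions of ps shorten long user names or print the numeric ID instead.

```shell
# Read "USER COMMAND" pairs on stdin and print the distinct users
# that own a process whose command name equals the first argument.
owners_of() {
    awk -v name="$1" '$2 == name { print $1 }' | sort -u
}

# Example: feed it the process list without headers.
#   ps -e -o user= -o comm= | owners_of FAHClient
```

One line of output naming the service account is the expected case; your own user name, or two different names, would be worth a closer look.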

Re: Cannot fold A4 cpu but can fold A7

Posted: Fri Feb 23, 2018 6:20 am
by beer
I don't recall setting up anything special this time from my own user (but my memory could be wrong, so let's assume it is). Do you know how I can check which privileges it is running with?

Another thing that puzzles me is that both A7 and A4 are managed by the same slot. The A7 runs perfectly fine and only A4 cores fail. So we are hunting down the user that starts FAHClient, and that user has write privileges when it runs A7 but does not have all the privileges when running A4?
And that user can write to /var/lib/fahclient/work/00/ but not to /var/lib/fahclient/work/00/01/?

I am a bit of a noob at this. How do I check what is needed so we can make progress on the problem?

Re: Cannot fold A4 cpu but can fold A7

Posted: Fri Feb 23, 2018 9:44 pm
by bruce
Check the ownership of /var/lib/fahclient/work/* and of any subdirectories you find there. (You can ignore anything that's missing.)
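
A quick way to get that overview, sketched under the assumption that a POSIX find, ls and awk are available: count how many entries belong to each owner and group. ownership_summary is a hypothetical helper name.

```shell
# Count the entries under DIR ($1) per "owner:group" pair,
# most common pair first.
ownership_summary() {
    find "$1" -exec ls -ld {} + |
        awk '{ print $3 ":" $4 }' |
        sort | uniq -c | sort -rn
}

# Example:
#   ownership_summary /var/lib/fahclient/work
```

A single line of output means the ownership is uniform; more than one line means it is mixed, and the less common pairs are the ones to inspect.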

Re: Cannot fold A4 cpu but can fold A7

Posted: Sat Feb 24, 2018 5:44 am
by beer
This information?

Code:

root@debian:/var/lib/fahclient/work# ls -l 00/
total 5312
drwxrwxrwx 2 fahclient root    4096 Feb 24 06:38 01
-rw-r--r-- 1 fahclient root     592 Feb 24 06:38 logfile_01.txt
-rw-r--r-- 1 fahclient root   11883 Feb 24 06:38 viewerFrame1.json
-rw-r--r-- 1 fahclient root   10075 Feb 24 06:38 viewerTop.json
-rw-r--r-- 1 fahclient root 5395968 Feb 24 06:37 wudata_01.dat
-rw-rw-rw- 1 fahclient root       5 Feb 24 06:37 wudata_01.lock
-rw-r--r-- 1 fahclient root     512 Feb 24 06:38 wuinfo_01.dat

Code:

root@debian:/var/lib/fahclient/work# ls -l .
total 60
drwxrwxrwx 3 fahclient root  4096 Feb 24 06:39 00
drwxrwxrwx 2 fahclient root  4096 Feb 24 06:39 02
-rw-r--r-- 1 fahclient root 32768 Feb 24 06:39 client.db
-rw-r--r-- 1 fahclient root 16928 Feb 24 06:39 client.db-journal

Code:

root@debian:/var/lib/fahclient/work# ls -l 02
total 2320
-rw-rw-rw- 1 fahclient root       0 Feb 23 08:24 core78.sta
-rwxr-x--- 1 fahclient root     577 Feb 23 20:58 logfile_01-20180223-195916.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 20:59 logfile_01-20180223-200016.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:00 logfile_01-20180223-200116.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:01 logfile_01-20180223-200216.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:02 logfile_01-20180223-200316.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:03 logfile_01-20180223-200416.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:04 logfile_01-20180223-200516.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:05 logfile_01-20180223-200616.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:06 logfile_01-20180223-200716.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:07 logfile_01-20180223-200816.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:08 logfile_01-20180223-200916.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:09 logfile_01-20180223-201016.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:10 logfile_01-20180223-201116.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:11 logfile_01-20180223-201216.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:12 logfile_01-20180223-201316.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:13 logfile_01-20180223-201416.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:14 logfile_01-20180223-201516.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:15 logfile_01-20180223-201616.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:16 logfile_01-20180223-201716.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:17 logfile_01-20180223-201816.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:18 logfile_01-20180223-201916.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:19 logfile_01-20180223-202016.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:20 logfile_01-20180223-202116.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:21 logfile_01-20180223-202216.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:22 logfile_01-20180223-202316.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:23 logfile_01-20180223-202416.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:24 logfile_01-20180224-053649.txt
-rwxr-x--- 1 fahclient root     577 Feb 24 06:36 logfile_01-20180224-053705.txt
-rwxr-x--- 1 fahclient root     577 Feb 24 06:37 logfile_01-20180224-053805.txt
-rwxr-x--- 1 fahclient root     577 Feb 24 06:38 logfile_01-20180224-053905.txt
-rwxr-x--- 1 fahclient root     577 Feb 24 06:39 logfile_01-20180224-054005.txt
-rwxr-x--- 1 fahclient root     577 Feb 24 06:40 logfile_01-20180224-054105.txt
-rwxr-x--- 1 fahclient root     158 Feb 24 06:41 logfile_01.txt
-rw-r--r-- 1 fahclient root  826720 Feb 23 08:24 wudata_01.dat
-rw-rw-rw- 1 fahclient root      64 Feb 23 08:31 wudata_01.dyn
-rw-rw-rw- 1 fahclient root       0 Feb 24 06:40 wudata_01.log
-rw-rw-rw- 1 fahclient root 1402156 Feb 24 06:40 wudata_01.tpr
-rwxr-x--- 1 fahclient root     512 Feb 24 06:40 wuinfo_01.dat

Code: Select all

root@debian:/var/lib/fahclient/work/00# ls -l 01/
total 8256
-rw-r--r-- 1 fahclient root 2864327 Feb 24 06:41 checkpointState.xml
-rw-r--r-- 1 fahclient root     536 Feb 24 06:41 checkpt.crc
-rw-r--r-- 1 fahclient root      74 Feb 24 06:37 core.xml
-rw-r--r-- 1 fahclient root     165 Feb 24 06:37 integrator.xml
-rw-rw-rw- 1 fahclient root    4032 Feb 24 06:41 log.txt
-rw-rw-rw- 1 fahclient root  167956 Feb 24 06:41 positions.xtc
-rw-r--r-- 1 fahclient root 2864276 Feb 24 06:37 state.xml
-rw-r--r-- 1 fahclient root       6 Feb 24 06:41 stepsDone
-rw-r--r-- 1 fahclient root 2526633 Feb 24 06:37 system.xml

Code: Select all

root@debian:/var/lib/fahclient/work/02# ls -l .
total 2320
-rw-rw-rw- 1 fahclient root       0 Feb 23 08:24 core78.sta
-rwxr-x--- 1 fahclient root     577 Feb 23 20:59 logfile_01-20180223-200016.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:00 logfile_01-20180223-200116.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:01 logfile_01-20180223-200216.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:02 logfile_01-20180223-200316.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:03 logfile_01-20180223-200416.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:04 logfile_01-20180223-200516.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:05 logfile_01-20180223-200616.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:06 logfile_01-20180223-200716.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:07 logfile_01-20180223-200816.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:08 logfile_01-20180223-200916.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:09 logfile_01-20180223-201016.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:10 logfile_01-20180223-201116.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:11 logfile_01-20180223-201216.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:12 logfile_01-20180223-201316.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:13 logfile_01-20180223-201416.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:14 logfile_01-20180223-201516.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:15 logfile_01-20180223-201616.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:16 logfile_01-20180223-201716.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:17 logfile_01-20180223-201816.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:18 logfile_01-20180223-201916.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:19 logfile_01-20180223-202016.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:20 logfile_01-20180223-202116.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:21 logfile_01-20180223-202216.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:22 logfile_01-20180223-202316.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:23 logfile_01-20180223-202416.txt
-rwxr-x--- 1 fahclient root     577 Feb 23 21:24 logfile_01-20180224-053649.txt
-rwxr-x--- 1 fahclient root     577 Feb 24 06:36 logfile_01-20180224-053705.txt
-rwxr-x--- 1 fahclient root     577 Feb 24 06:37 logfile_01-20180224-053805.txt
-rwxr-x--- 1 fahclient root     577 Feb 24 06:38 logfile_01-20180224-053905.txt
-rwxr-x--- 1 fahclient root     577 Feb 24 06:39 logfile_01-20180224-054005.txt
-rwxr-x--- 1 fahclient root     577 Feb 24 06:40 logfile_01-20180224-054105.txt
-rwxr-x--- 1 fahclient root     577 Feb 24 06:41 logfile_01-20180224-054205.txt
-rwxr-x--- 1 fahclient root     577 Feb 24 06:42 logfile_01.txt
-rw-r--r-- 1 fahclient root  826720 Feb 23 08:24 wudata_01.dat
-rw-rw-rw- 1 fahclient root      64 Feb 23 08:31 wudata_01.dyn
-rw-rw-rw- 1 fahclient root       0 Feb 24 06:42 wudata_01.log
-rw-rw-rw- 1 fahclient root 1402156 Feb 24 06:42 wudata_01.tpr
-rwxr-x--- 1 fahclient root     512 Feb 24 06:42 wuinfo_01.dat

Code: Select all

root@debian:/var/lib/fahclient/work/00/01# ls -l .
total 8384
-rw-r--r-- 1 fahclient root 2864205 Feb 24 06:43 checkpointState.xml
-rw-r--r-- 1 fahclient root     656 Feb 24 06:43 checkpt.crc
-rw-r--r-- 1 fahclient root      74 Feb 24 06:37 core.xml
-rw-r--r-- 1 fahclient root     165 Feb 24 06:37 integrator.xml
-rw-rw-rw- 1 fahclient root    4315 Feb 24 06:43 log.txt
-rw-rw-rw- 1 fahclient root  293892 Feb 24 06:43 positions.xtc
-rw-r--r-- 1 fahclient root 2864276 Feb 24 06:37 state.xml
-rw-r--r-- 1 fahclient root       6 Feb 24 06:43 stepsDone
-rw-r--r-- 1 fahclient root 2526633 Feb 24 06:37 system.xml


Re: Cannot fold A4 cpu but can fold A7

Posted: Sat Feb 24, 2018 6:09 pm
by bruce
What WU is shown in /var/lib/fahclient/work/02/logfile_01.txt? (I suspect that all of the other 577 byte files contain the same information.)
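
One possible way to answer both questions at once is sketched below. It assumes the core log contains a line starting with "Project:" (as the a4 logs in this thread do) and that the rotated logs follow the logfile_01-*.txt naming shown in the listing. The helper name summarize_wu_logs is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the "Project:" line of the current core log,
# then the number of distinct contents among the rotated logs in DIR.
summarize_wu_logs() {
    dir=$1
    grep -h 'Project:' "$dir/logfile_01.txt"
    # cksum prints "<crc> <size> <name>"; keep crc and size only, then
    # count the unique pairs. A result of 1 suggests identical files.
    cksum "$dir"/logfile_01-*.txt | awk '{print $1, $2}' | sort -u | awk 'END {print NR}'
}

# Example (assumed path; may need root to read the files):
#   summarize_wu_logs /var/lib/fahclient/work/02
```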

That WU is clearly having difficulties though I'm not sure what the problem is.

I suspect the problem will be cured if you run ./fahclient.exe --dump 02 but I'd like to know more about what's currently happening there.
You'll probably need to pause all WUs before doing that.

(From the information I see here, we don't know which slot is currently processing that WU.)

Also, there's a chance that this report may help. Subject: Core a4 WUs segfaulting on kernel 4.15
Apparently the fix found there also applies as far back as kernel 4.8 so it's worth a try.

Core a4 WUs segfaulting on kernel 4.15

Posted: Sat Feb 24, 2018 6:47 pm
by butc8
Ever since upgrading the kernel to 4.15, core a4 WUs have stopped working (core a7 WUs still work). The WU doesn't error out but is interrupted and restarted, hence making no progress. Log attached.

Code: Select all

*********************** Log Started 2018-02-24T18:40:47Z ***********************
18:40:47:************************* Folding@home Client *************************
18:40:47:    Website: http://folding.stanford.edu/
18:40:47:  Copyright: (c) 2009-2014 Stanford University
18:40:47:     Author: Joseph Coffland <[email protected]>
18:40:47:       Args: --config /opt/fah/config.xml --exec-directory=/opt/fah
18:40:47:             --data-directory=/opt/fah
18:40:47:     Config: /opt/fah/config.xml
18:40:47:******************************** Build ********************************
18:40:47:    Version: 7.4.4
18:40:47:       Date: Mar 4 2014
18:40:47:       Time: 12:02:38
18:40:47:    SVN Rev: 4130
18:40:47:     Branch: fah/trunk/client
18:40:47:   Compiler: GNU 4.4.7
18:40:47:Started thread 1 on PID 4364
18:40:47:    Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
18:40:47:             -fno-unsafe-math-optimizations -msse2
18:40:47:   Platform: linux2 3.2.0-1-amd64
18:40:47:       Bits: 64
18:40:47:       Mode: Release
18:40:47:******************************* System ********************************
18:40:47:        CPU: Intel(R) Core(TM) i5-6300HQ CPU @ 2.30GHz
18:40:47:     CPU ID: GenuineIntel Family 6 Model 94 Stepping 3
18:40:47:       CPUs: 4
18:40:47:     Memory: 15.40GiB
18:40:47:Free Memory: 6.09GiB
18:40:47:    Threads: POSIX_THREADS
18:40:47: OS Version: 4.15
18:40:47:Has Battery: true
18:40:47: On Battery: false
18:40:47: UTC Offset: 2
18:40:47:        PID: 4364
18:40:47:        CWD: /opt/fah
18:40:47:         OS: Linux 4.15.5-1-ck-skylake x86_64
18:40:47:    OS Arch: AMD64
18:40:47:       GPUs: 1
18:40:47:      GPU 0: NVIDIA:4 GM107 [GeForce GTX 960M]
18:40:47:       CUDA: 5.0
18:40:47:CUDA Driver: 9010
18:40:47:***********************************************************************
18:40:47:<config>
18:40:47:  <!-- Client Control -->
18:40:47:  <client-threads v='6'/>
18:40:47:  <cycle-rate v='4'/>
18:40:47:  <cycles v='-1'/>
18:40:47:  <data-directory v='/opt/fah'/>
18:40:47:  <disable-sleep-when-active v='true'/>
18:40:47:  <exec-directory v='/opt/fah'/>
18:40:47:  <exit-when-done v='false'/>
18:40:47:  <fold-anon v='false'/>
18:40:47:  <idle-seconds v='300'/>
18:40:47:  <open-web-control v='false'/>
18:40:47:
18:40:47:  <!-- Configuration -->
18:40:47:  <config-rotate v='true'/>
18:40:47:  <config-rotate-dir v='configs'/>
18:40:47:  <config-rotate-max v='16'/>
18:40:47:
18:40:47:  <!-- Debugging -->
18:40:47:  <assignment-servers>
18:40:47:    assign3.stanford.edu:8080 assign4.stanford.edu:80
18:40:47:  </assignment-servers>
18:40:47:  <auth-as v='true'/>
18:40:47:  <capture-directory v='capture'/>
18:40:47:  <capture-on-error v='false'/>
18:40:47:  <capture-packets v='false'/>
18:40:47:  <capture-requests v='false'/>
18:40:47:  <capture-responses v='false'/>
18:40:47:  <capture-sockets v='false'/>
18:40:47:  <core-exec v='FahCore_$type'/>
18:40:47:  <core-wrapper-exec v='FAHCoreWrapper'/>
18:40:47:  <debug-sockets v='false'/>
18:40:47:  <exception-locations v='true'/>
18:40:47:  <gpu-assignment-servers>
18:40:47:    assign-GPU.stanford.edu:80 assign-GPU2.stanford.edu:80
18:40:47:  </gpu-assignment-servers>
18:40:47:  <stack-traces v='false'/>
18:40:47:
18:40:47:  <!-- Error Handling -->
18:40:47:  <max-slot-errors v='10'/>
18:40:47:  <max-unit-errors v='5'/>
18:40:47:
18:40:47:  <!-- Folding Core -->
18:40:47:  <checkpoint v='15'/>
18:40:47:  <core-dir v='cores'/>
18:40:47:  <core-priority v='idle'/>
18:40:47:  <cpu-affinity v='false'/>
18:40:47:  <cpu-usage v='100'/>
18:40:47:  <gpu-usage v='100'/>
18:40:47:  <no-assembly v='false'/>
18:40:47:
18:40:47:  <!-- Folding Slot Configuration -->
18:40:47:  <cause v='ANY'/>
18:40:47:  <client-subtype v='LINUX'/>
18:40:47:  <client-type v='normal'/>
18:40:47:  <cpu-species v='X86_PENTIUM_II'/>
18:40:47:  <cpu-type v='AMD64'/>
18:40:47:  <cpus v='-1'/>
18:40:47:  <gpu v='true'/>
18:40:47:  <max-packet-size v='normal'/>
18:40:47:  <os-species v='UNKNOWN'/>
18:40:47:  <os-type v='LINUX'/>
18:40:47:  <project-key v='0'/>
18:40:47:  <smp v='true'/>
18:40:47:
18:40:47:  <!-- GUI -->
18:40:47:  <gui-enabled v='true'/>
18:40:47:
18:40:47:  <!-- HTTP Server -->
18:40:47:  <allow v='127.0.0.1'/>
18:40:47:  <connection-timeout v='60'/>
18:40:47:  <deny v='0/0'/>
18:40:47:  <http-addresses v='0:7396'/>
18:40:47:  <https-addresses v=''/>
18:40:47:  <max-connect-time v='900'/>
18:40:47:  <max-connections v='800'/>
18:40:47:  <max-request-length v='52428800'/>
18:40:47:  <min-connect-time v='300'/>
18:40:47:  <threads v='4'/>
18:40:47:
18:40:47:  <!-- Logging -->
18:40:47:  <log v='log.txt'/>
18:40:47:  <log-color v='true'/>
18:40:47:  <log-crlf v='false'/>
18:40:47:  <log-date v='false'/>
18:40:47:  <log-date-periodically v='21600'/>
18:40:47:  <log-debug v='true'/>
18:40:47:  <log-domain v='false'/>
18:40:47:  <log-header v='true'/>
18:40:47:  <log-level v='true'/>
18:40:47:  <log-no-info-header v='true'/>
18:40:47:  <log-redirect v='false'/>
18:40:47:  <log-rotate v='true'/>
18:40:47:  <log-rotate-dir v='logs'/>
18:40:47:  <log-rotate-max v='16'/>
18:40:47:  <log-short-level v='false'/>
18:40:47:  <log-simple-domains v='true'/>
18:40:47:  <log-thread-id v='false'/>
18:40:47:  <log-thread-prefix v='true'/>
18:40:47:  <log-time v='true'/>
18:40:47:  <log-to-screen v='true'/>
18:40:47:  <log-truncate v='false'/>
18:40:47:  <verbosity v='5'/>
18:40:47:
18:40:47:  <!-- Network -->
18:40:47:  <proxy v=':8080'/>
18:40:47:  <proxy-enable v='false'/>
18:40:47:  <proxy-pass v=''/>
18:40:47:  <proxy-user v=''/>
18:40:47:
18:40:47:  <!-- Process Control -->
18:40:47:  <child v='false'/>
18:40:47:  <daemon v='false'/>
18:40:47:  <fork v='false'/>
18:40:47:  <pid v='false'/>
18:40:47:  <pid-file v='Folding@home Client.pid'/>
18:40:47:  <respawn v='false'/>
18:40:47:  <service v='false'/>
18:40:47:
18:40:47:  <!-- Remote Command Server -->
18:40:47:  <command-address v='0.0.0.0'/>
18:40:47:  <command-allow-no-pass v='127.0.0.1'/>
18:40:47:  <command-deny-no-pass v='0/0'/>
18:40:47:  <command-enable v='true'/>
18:40:47:  <command-port v='36330'/>
18:40:47:
18:40:47:  <!-- Slot Control -->
18:40:47:  <idle v='false'/>
18:40:47:  <max-shutdown-wait v='60'/>
18:40:47:  <pause-on-battery v='true'/>
18:40:47:  <pause-on-start v='false'/>
18:40:47:  <paused v='false'/>
18:40:47:  <power v='full'/>
18:40:47:
18:40:47:  <!-- User Information -->
18:40:47:  <machine-id v='0'/>
18:40:47:  <passkey v='********************************'/>
18:40:47:  <team v='224497'/>
18:40:47:  <user v='MeissnerEffect'/>
18:40:47:
18:40:47:  <!-- Web Server -->
18:40:47:  <web-allow v='127.0.0.1'/>
18:40:47:  <web-deny v='0/0'/>
18:40:47:  <web-enable v='true'/>
18:40:47:
18:40:47:  <!-- Web Server Sessions -->
18:40:47:  <session-cookie v='sid'/>
18:40:47:  <session-lifetime v='86400'/>
18:40:47:  <session-timeout v='3600'/>
18:40:47:
18:40:47:  <!-- Work Unit Control -->
18:40:47:  <dump-after-deadline v='true'/>
18:40:47:  <max-queue v='16'/>
18:40:47:  <max-units v='0'/>
18:40:47:  <next-unit-percentage v='99'/>
18:40:47:  <stall-detection-enabled v='false'/>
18:40:47:  <stall-percent v='5'/>
18:40:47:  <stall-timeout v='1800'/>
18:40:47:
18:40:47:  <!-- Folding Slots -->
18:40:47:  <slot id='0' type='CPU'>
18:40:47:    <cpus v='3'/>
18:40:47:    <paused v='true'/>
18:40:47:  </slot>
18:40:47:  <slot id='1' type='GPU'/>
18:40:47:</config>
18:40:47:Trying to access database...
18:40:47:Successfully acquired database lock
18:40:47:Enabled folding slot 00: PAUSED cpu:3 (by user)
18:40:47:Enabled folding slot 01: READY gpu:0:GM107 [GeForce GTX 960M]
18:40:47:Started thread 5 on PID 4364
18:40:47:WU01:FS01:Starting
18:40:47:Started thread 10 on PID 4364
18:40:47:Started thread 6 on PID 4364
18:40:47:Started thread 7 on PID 4364
18:40:47:Started thread 8 on PID 4364
18:40:47:Started thread 9 on PID 4364
18:40:47:WU01:FS01:Running FahCore: /opt/fah/FAHCoreWrapper /opt/fah/cores/fahwebx.stanford.edu/cores/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 01 -suffix 01 -version 704 -lifeline 4364 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
18:40:47:WU01:FS01:Started FahCore on PID 4435
18:40:47:Started thread 11 on PID 4364
18:40:47:WU01:FS01:Core PID:4439
18:40:47:WU01:FS01:FahCore 0x21 started
18:40:47:WU01:FS01:0x21:*********************** Log Started 2018-02-24T18:40:47Z ***********************
18:40:47:WU01:FS01:0x21:Project: 9842 (Run 6, Clone 2, Gen 445)
18:40:47:WU01:FS01:0x21:Unit: 0x000001f2ab436ca05a1f05abfdc0e298
18:40:47:WU01:FS01:0x21:CPU: 0x00000000000000000000000000000000
18:40:47:WU01:FS01:0x21:Machine: 1
18:40:47:WU01:FS01:0x21:Digital signatures verified
18:40:47:WU01:FS01:0x21:Folding@home GPU Core21 Folding@home Core
18:40:47:WU01:FS01:0x21:Version 0.0.18
18:40:47:WU01:FS01:0x21:  Found a checkpoint file
18:40:49:Started thread 12 on PID 4364
18:40:49:WU01:FS01:0x21:Completed 1600000 out of 2400000 steps (66%)
18:40:49:WU01:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
18:40:54:FS00:Unpaused
18:40:54:WU02:FS00:Starting
18:40:54:WU02:FS00:Removing old file '/opt/fah/work/02/logfile_01-20180224-121548.txt'
18:40:54:WU02:FS00:Running FahCore: /opt/fah/FAHCoreWrapper /opt/fah/cores/fahwebx.stanford.edu/cores/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 02 -suffix 01 -version 704 -lifeline 4364 -checkpoint 15 -np 3
18:40:54:WU02:FS00:Started FahCore on PID 4528
18:40:54:Started thread 13 on PID 4364
18:40:54:WU02:FS00:Core PID:4532
18:40:54:WU02:FS00:FahCore 0xa4 started
18:40:55:WU02:FS00:0xa4:
18:40:55:WU02:FS00:0xa4:*------------------------------*
18:40:55:WU02:FS00:0xa4:Folding@Home Gromacs GB Core
18:40:55:WU02:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
18:40:55:WU02:FS00:0xa4:
18:40:55:WU02:FS00:0xa4:Preparing to commence simulation
18:40:55:WU02:FS00:0xa4:- Ensuring status. Please wait.
18:41:04:WU02:FS00:0xa4:- Looking at optimizations...
18:41:04:WU02:FS00:0xa4:- Working with standard loops on this execution.
18:41:04:WU02:FS00:0xa4:Examination of work files indicates 8 consecutive improper terminations of core.
18:41:04:WU02:FS00:0xa4:- Expanded 659985 -> 1554608 (decompressed 235.5 percent)
18:41:04:WU02:FS00:0xa4:Called DecompressByteArray: compressed_data_size=659985 data_size=1554608, decompressed_data_size=1554608 diff=0
18:41:04:WU02:FS00:0xa4:- Digital signature verified
18:41:04:WU02:FS00:0xa4:
18:41:04:WU02:FS00:0xa4:Project: 14041 (Run 7, Clone 64, Gen 1)
18:41:04:WU02:FS00:0xa4:
18:41:04:WU02:FS00:0xa4:Entering M.D.
18:41:11:WU02:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
18:41:11:WU02:FS00:Starting
18:41:11:WU02:FS00:Removing old file '/opt/fah/work/02/logfile_01-20180224-121648.txt'
18:41:11:WU02:FS00:Running FahCore: /opt/fah/FAHCoreWrapper /opt/fah/cores/fahwebx.stanford.edu/cores/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 02 -suffix 01 -version 704 -lifeline 4364 -checkpoint 15 -np 3
18:41:11:WU02:FS00:Started FahCore on PID 4576
18:41:11:Started thread 14 on PID 4364
18:41:11:WU02:FS00:Core PID:4580
18:41:11:WU02:FS00:FahCore 0xa4 started
18:41:11:WU02:FS00:0xa4:
18:41:11:WU02:FS00:0xa4:*------------------------------*
18:41:11:WU02:FS00:0xa4:Folding@Home Gromacs GB Core
18:41:11:WU02:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
18:41:11:WU02:FS00:0xa4:
18:41:11:WU02:FS00:0xa4:Preparing to commence simulation
18:41:11:WU02:FS00:0xa4:- Ensuring status. Please wait.
18:41:20:WU02:FS00:0xa4:- Looking at optimizations...
18:41:20:WU02:FS00:0xa4:- Working with standard loops on this execution.
18:41:20:WU02:FS00:0xa4:Examination of work files indicates 8 consecutive improper terminations of core.
18:41:21:WU02:FS00:0xa4:- Expanded 659985 -> 1554608 (decompressed 235.5 percent)
18:41:21:WU02:FS00:0xa4:Called DecompressByteArray: compressed_data_size=659985 data_size=1554608, decompressed_data_size=1554608 diff=0
18:41:21:WU02:FS00:0xa4:- Digital signature verified
18:41:21:WU02:FS00:0xa4:
18:41:21:WU02:FS00:0xa4:Project: 14041 (Run 7, Clone 64, Gen 1)
18:41:21:WU02:FS00:0xa4:
18:41:21:WU02:FS00:0xa4:Entering M.D.
18:41:27:WU02:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
Having a look at the output of dmesg, it seems there is a segfault.

Code: Select all

[ 7290.795572] FahCore_a4[5393]: segfault at ffffffffff600400 ip ffffffffff600400 sp 00007f316fa83d88 error 15
[ 7350.913653] FahCore_a4[5626] vsyscall attempted with vsyscall=none ip:ffffffffff600400 cs:33 sp:7f395887bd88 ax:ffffffffff600400 si:7f395887bc2a di:7f395887bdc8
This message shows up every time the core is restarted.
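
For anyone who wants to check their own machine for the same symptom, here is a rough sketch that filters kernel log text for FahCore segfault and vsyscall lines. The pattern is only modelled on the two lines above, and the helper name filter_fahcore_faults is made up for illustration. The faulting address ffffffffff600400 lies inside the legacy vsyscall page, which starts at ffffffffff600000.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and keep only the
# FahCore lines that report a segfault or a blocked vsyscall.
filter_fahcore_faults() {
    grep -E 'FahCore_[0-9a-f]+\[[0-9]+\]:? (segfault|vsyscall)'
}

# Example (dmesg may require root on some systems):
#   dmesg | filter_fahcore_faults
```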

My specs are

Dell Inspiron 7559
Core i5-6300HQ
Arch Linux
Linux kernel 4.15.5-1

Thank you :)

Re: Core a4 WUs segfaulting on kernel 4.15

Posted: Sat Feb 24, 2018 7:39 pm
by toTOW
I think I read somewhere that this kernel was full of bugs and should be avoided (and not specifically for FAH) ... I don't remember exactly which version that applied to, but you might be facing one of those bugs ...

Re: Core a4 WUs segfaulting on kernel 4.15

Posted: Sat Feb 24, 2018 8:05 pm
by butc8
I have discovered a workaround: adding the kernel parameter "vsyscall=emulate" seems to fix the problem.
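
On a system that boots through GRUB, one way to make the parameter permanent is sketched below. It assumes the defaults file is /etc/default/grub and that GRUB_CMDLINE_LINUX_DEFAULT is written with double quotes. The helper name add_vsyscall_emulate is made up for illustration, and other boot loaders need a different procedure.

```shell
#!/bin/sh
# Hypothetical helper: append vsyscall=emulate to GRUB_CMDLINE_LINUX_DEFAULT
# in the given GRUB defaults file, unless a vsyscall= setting is already there.
# A backup of the original file is kept next to it with a .bak suffix.
add_vsyscall_emulate() {
    file=$1
    if grep -q '^GRUB_CMDLINE_LINUX_DEFAULT=.*vsyscall=' "$file"; then
        return 0    # already configured; leave the file untouched
    fi
    # Insert the parameter just before the closing double quote of the value.
    sed -i.bak 's/^\(GRUB_CMDLINE_LINUX_DEFAULT="[^"]*\)"/\1 vsyscall=emulate"/' "$file"
}

# Assumed usage, run as root:
#   add_vsyscall_emulate /etc/default/grub
#   update-grub                              # Debian/Ubuntu
#   grub-mkconfig -o /boot/grub/grub.cfg     # Arch and others
# After a reboot, "cat /proc/cmdline" should contain vsyscall=emulate.
```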

Re: Core a4 WUs segfaulting on kernel 4.15

Posted: Wed Feb 28, 2018 6:00 am
by beer
Hi
I think I have the same situation:
viewtopic.php?f=72&t=30654&start=15

I am a bit of a Linux newbie. I have tried to google how and where to enter "vsyscall=emulate" on a Debian system, but without any good results. Can you point me in the right direction? (I am running Debian testing.)

Re: Core a4 WUs segfaulting on kernel 4.15

Posted: Wed Feb 28, 2018 5:10 pm
by Dr. Merkwürdigliebe
No problems here with 4.15.4 and 4.15.7 whatsoever...Ubuntu 17.10 x64

Edit:

That is probably because the Ubuntu kernel has this option set at compile time:

Code: Select all

/boot$ cat config-4.15.4-041504-generic | grep -i emulate
CONFIG_LEGACY_VSYSCALL_EMULATE=y
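
The same check can be made a little more general. The sketch below prints whichever legacy vsyscall mode a kernel was built with, assuming the distribution installs the kernel config as a plain text file (for example under /boot). The helper name vsyscall_build_mode is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the legacy vsyscall mode compiled into a kernel,
# given the path to its config file.
vsyscall_build_mode() {
    config=$1
    # Matching lines look like CONFIG_LEGACY_VSYSCALL_EMULATE=y; print only
    # the mode name. Commented-out "is not set" lines are ignored.
    sed -n 's/^CONFIG_LEGACY_VSYSCALL_\([A-Z]*\)=y$/\1/p' "$config"
}

# Example (assumed location of the config file):
#   vsyscall_build_mode "/boot/config-$(uname -r)"
```

A kernel built with the NONE mode would match the "vsyscall=none" message in the dmesg output earlier in the thread, and a "vsyscall=" parameter on the kernel command line overrides the compiled-in default.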