Domain decomposition issue with project 17215

Posted: Sat Nov 21, 2020 5:50 pm
by Gnomuz
In the past 12 hours, I got my first three failed WUs (see one example in the log below) with project 17215, which seems to be a recent project.

Although I'm rather new to F@H, I think I understand it is a hardware compatibility issue. I have a 16-thread CPU (AMD Ryzen 7 3700X), and also fold with a Quadro P2000 GPU. So 15 threads are available for CPU computing, which is evidently not compatible with this particular project. I read in a pinned thread by PantherX that these issues should be reported here so that the F@H team can prevent the project from being assigned to other folders with a similar setup.
I hope that helps, and let's keep on folding!

06:40:58:WU01:FS00:Connecting to assign1.foldingathome.org:80
06:40:59:WU01:FS00:Assigned to work server 128.252.203.10
06:40:59:WU01:FS00:Requesting new work unit for slot 00: cpu:15 from 128.252.203.10
06:40:59:WU01:FS00:Connecting to 128.252.203.10:8080
06:41:00:WU01:FS00:Downloading 2.07MiB
06:41:02:WU01:FS00:Download complete
06:41:02:WU01:FS00:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:17215 run:1649 clone:0 gen:47 core:0xa7 unit:0x0000003680fccb0a5fab2b3dc12ec25e
06:41:02:WU01:FS00:Starting
06:41:02:WU01:FS00:Running FahCore: /app/usr/bin/FAHCoreWrapper /config/cores/cores.foldingathome.org/lin/64bit-avx-256/a7-0.0.19/Core_a7.fah/FahCore_a7 -dir 01 -suffix 01 -version 706 -lifeline 269 -checkpoint 10 -np 15
06:41:02:WU01:FS00:Started FahCore on PID 586
06:41:02:WU01:FS00:Core PID:590
06:41:02:WU01:FS00:FahCore 0xa7 started
06:41:03:WU01:FS00:0xa7:*********************** Log Started 2020-11-21T06:41:02Z ***********************
06:41:03:WU01:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
06:41:03:WU01:FS00:0xa7:       Type: 0xa7
06:41:03:WU01:FS00:0xa7:       Core: Gromacs
06:41:03:WU01:FS00:0xa7:       Args: -dir 01 -suffix 01 -version 706 -lifeline 586 -checkpoint 10 -np 15
06:41:03:WU01:FS00:0xa7:************************************ CBang *************************************
06:41:03:WU01:FS00:0xa7:       Date: Nov 27 2019
06:41:03:WU01:FS00:0xa7:       Time: 11:26:54
06:41:03:WU01:FS00:0xa7:   Revision: d25803215b59272441049dfa05a0a9bf7a6e3c48
06:41:03:WU01:FS00:0xa7:     Branch: master
06:41:03:WU01:FS00:0xa7:   Compiler: GNU 8.3.0
06:41:03:WU01:FS00:0xa7:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
06:41:03:WU01:FS00:0xa7:             -fno-pie -fPIC
06:41:03:WU01:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
06:41:03:WU01:FS00:0xa7:       Bits: 64
06:41:03:WU01:FS00:0xa7:       Mode: Release
06:41:03:WU01:FS00:0xa7:************************************ System ************************************
06:41:03:WU01:FS00:0xa7:        CPU: AMD Ryzen 7 3700X 8-Core Processor
06:41:03:WU01:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
06:41:03:WU01:FS00:0xa7:       CPUs: 16
06:41:03:WU01:FS00:0xa7:     Memory: 31.36GiB
06:41:03:WU01:FS00:0xa7:Free Memory: 1.29GiB
06:41:03:WU01:FS00:0xa7:    Threads: POSIX_THREADS
06:41:03:WU01:FS00:0xa7: OS Version: 5.8
06:41:03:WU01:FS00:0xa7:Has Battery: false
06:41:03:WU01:FS00:0xa7: On Battery: false
06:41:03:WU01:FS00:0xa7: UTC Offset: 1
06:41:03:WU01:FS00:0xa7:        PID: 590
06:41:03:WU01:FS00:0xa7:        CWD: /config/work
06:41:03:WU01:FS00:0xa7:******************************** Build - libFAH ********************************
06:41:03:WU01:FS00:0xa7:    Version: 0.0.19
06:41:03:WU01:FS00:0xa7:     Author: Joseph Coffland <[email protected]>
06:41:03:WU01:FS00:0xa7:  Copyright: 2019 foldingathome.org
06:41:03:WU01:FS00:0xa7:   Homepage: https://foldingathome.org/
06:41:03:WU01:FS00:0xa7:       Date: Nov 26 2019
06:41:03:WU01:FS00:0xa7:       Time: 00:41:42
06:41:03:WU01:FS00:0xa7:   Revision: d5b5c747532224f986b7cd02c968ed9a20c16d6e
06:41:03:WU01:FS00:0xa7:     Branch: master
06:41:03:WU01:FS00:0xa7:   Compiler: GNU 8.3.0
06:41:03:WU01:FS00:0xa7:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
06:41:03:WU01:FS00:0xa7:             -fno-pie
06:41:03:WU01:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
06:41:03:WU01:FS00:0xa7:       Bits: 64
06:41:03:WU01:FS00:0xa7:       Mode: Release
06:41:03:WU01:FS00:0xa7:************************************ Build *************************************
06:41:03:WU01:FS00:0xa7:       SIMD: avx_256
06:41:03:WU01:FS00:0xa7:********************************************************************************
06:41:03:WU01:FS00:0xa7:Project: 17215 (Run 1649, Clone 0, Gen 47)
06:41:03:WU01:FS00:0xa7:Unit: 0x0000003680fccb0a5fab2b3dc12ec25e
06:41:03:WU01:FS00:0xa7:Reading tar file core.xml
06:41:03:WU01:FS00:0xa7:Reading tar file frame47.tpr
06:41:03:WU01:FS00:0xa7:Digital signatures verified
06:41:03:WU01:FS00:0xa7:Calling: mdrun -s frame47.tpr -o frame47.trr -x frame47.xtc -cpt 10 -nt 15
06:41:03:WU01:FS00:0xa7:Steps: first=11750000 total=250000
06:41:03:WU01:FS00:0xa7:ERROR:
06:41:03:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
06:41:03:WU01:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
06:41:03:WU01:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
06:41:03:WU01:FS00:0xa7:ERROR:
06:41:03:WU01:FS00:0xa7:ERROR:Fatal error:
06:41:03:WU01:FS00:0xa7:ERROR:There is no domain decomposition for 15 ranks that is compatible with the given box and a minimum cell size of 1.45733 nm
06:41:03:WU01:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
06:41:03:WU01:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
06:41:03:WU01:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
06:41:03:WU01:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
06:41:03:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
06:41:07:WU01:FS00:0xa7:WARNING:Unexpected exit
06:41:08:WARNING:WU01:FS00:FahCore returned: EARLY_UNIT_END (123 = 0x7b)
06:41:08:WU01:FS00:Starting
06:41:08:WU01:FS00:Running FahCore: /app/usr/bin/FAHCoreWrapper /config/cores/cores.foldingathome.org/lin/64bit-avx-256/a7-0.0.19/Core_a7.fah/FahCore_a7 -dir 01 -suffix 01 -version 706 -lifeline 269 -checkpoint 10 -np 15
06:41:08:WU01:FS00:Started FahCore on PID 608
06:41:08:WU01:FS00:Core PID:612
06:41:08:WU01:FS00:FahCore 0xa7 started
06:41:08:WU01:FS00:0xa7:*********************** Log Started 2020-11-21T06:41:08Z ***********************
06:41:08:WU01:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
06:41:08:WU01:FS00:0xa7:       Type: 0xa7
06:41:08:WU01:FS00:0xa7:       Core: Gromacs
06:41:08:WU01:FS00:0xa7:       Args: -dir 01 -suffix 01 -version 706 -lifeline 608 -checkpoint 10 -np 15
06:41:08:WU01:FS00:0xa7:************************************ CBang *************************************
06:41:08:WU01:FS00:0xa7:       Date: Nov 27 2019
06:41:08:WU01:FS00:0xa7:       Time: 11:26:54
06:41:08:WU01:FS00:0xa7:   Revision: d25803215b59272441049dfa05a0a9bf7a6e3c48
06:41:08:WU01:FS00:0xa7:     Branch: master
06:41:08:WU01:FS00:0xa7:   Compiler: GNU 8.3.0
06:41:08:WU01:FS00:0xa7:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
06:41:08:WU01:FS00:0xa7:             -fno-pie -fPIC
06:41:08:WU01:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
06:41:08:WU01:FS00:0xa7:       Bits: 64
06:41:08:WU01:FS00:0xa7:       Mode: Release
06:41:08:WU01:FS00:0xa7:************************************ System ************************************
06:41:08:WU01:FS00:0xa7:        CPU: AMD Ryzen 7 3700X 8-Core Processor
06:41:08:WU01:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
06:41:08:WU01:FS00:0xa7:       CPUs: 16
06:41:08:WU01:FS00:0xa7:     Memory: 31.36GiB
06:41:08:WU01:FS00:0xa7:Free Memory: 1.28GiB
06:41:08:WU01:FS00:0xa7:    Threads: POSIX_THREADS
06:41:08:WU01:FS00:0xa7: OS Version: 5.8
06:41:08:WU01:FS00:0xa7:Has Battery: false
06:41:08:WU01:FS00:0xa7: On Battery: false
06:41:08:WU01:FS00:0xa7: UTC Offset: 1
06:41:08:WU01:FS00:0xa7:        PID: 612
06:41:08:WU01:FS00:0xa7:        CWD: /config/work
06:41:08:WU01:FS00:0xa7:******************************** Build - libFAH ********************************
06:41:08:WU01:FS00:0xa7:    Version: 0.0.19
06:41:08:WU01:FS00:0xa7:     Author: Joseph Coffland <[email protected]>
06:41:08:WU01:FS00:0xa7:  Copyright: 2019 foldingathome.org
06:41:08:WU01:FS00:0xa7:   Homepage: https://foldingathome.org/
06:41:08:WU01:FS00:0xa7:       Date: Nov 26 2019
06:41:08:WU01:FS00:0xa7:       Time: 00:41:42
06:41:08:WU01:FS00:0xa7:   Revision: d5b5c747532224f986b7cd02c968ed9a20c16d6e
06:41:08:WU01:FS00:0xa7:     Branch: master
06:41:08:WU01:FS00:0xa7:   Compiler: GNU 8.3.0
06:41:08:WU01:FS00:0xa7:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
06:41:08:WU01:FS00:0xa7:             -fno-pie
06:41:08:WU01:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
06:41:08:WU01:FS00:0xa7:       Bits: 64
06:41:08:WU01:FS00:0xa7:       Mode: Release
06:41:08:WU01:FS00:0xa7:************************************ Build *************************************
06:41:08:WU01:FS00:0xa7:       SIMD: avx_256
06:41:08:WU01:FS00:0xa7:********************************************************************************
06:41:08:WU01:FS00:0xa7:Project: 17215 (Run 1649, Clone 0, Gen 47)
06:41:08:WU01:FS00:0xa7:Unit: 0x0000003680fccb0a5fab2b3dc12ec25e
06:41:08:WU01:FS00:0xa7:Reading tar file core.xml
06:41:08:WU01:FS00:0xa7:Reading tar file frame47.tpr
06:41:08:WU01:FS00:0xa7:Digital signatures verified
06:41:08:WU01:FS00:0xa7:Calling: mdrun -s frame47.tpr -o frame47.trr -x frame47.xtc -cpt 10 -nt 15
06:41:08:WU01:FS00:0xa7:Steps: first=11750000 total=250000
06:41:08:WU01:FS00:0xa7:ERROR:
06:41:08:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
06:41:08:WU01:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
06:41:08:WU01:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
06:41:08:WU01:FS00:0xa7:ERROR:
06:41:08:WU01:FS00:0xa7:ERROR:Fatal error:
06:41:08:WU01:FS00:0xa7:ERROR:There is no domain decomposition for 15 ranks that is compatible with the given box and a minimum cell size of 1.45733 nm
06:41:08:WU01:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
06:41:08:WU01:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
06:41:08:WU01:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
06:41:08:WU01:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
06:41:08:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
06:41:13:WU01:FS00:0xa7:WARNING:Unexpected exit
06:41:13:WARNING:WU01:FS00:FahCore returned: EARLY_UNIT_END (123 = 0x7b)
06:42:08:WU01:FS00:Starting
06:42:08:WU01:FS00:Running FahCore: /app/usr/bin/FAHCoreWrapper /config/cores/cores.foldingathome.org/lin/64bit-avx-256/a7-0.0.19/Core_a7.fah/FahCore_a7 -dir 01 -suffix 01 -version 706 -lifeline 269 -checkpoint 10 -np 15
06:42:08:WU01:FS00:Started FahCore on PID 630
06:42:08:WU01:FS00:Core PID:634
06:42:08:WU01:FS00:FahCore 0xa7 started
06:42:08:WU01:FS00:0xa7:*********************** Log Started 2020-11-21T06:42:08Z ***********************
06:42:08:WU01:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
06:42:08:WU01:FS00:0xa7:       Type: 0xa7
06:42:08:WU01:FS00:0xa7:       Core: Gromacs
06:42:08:WU01:FS00:0xa7:       Args: -dir 01 -suffix 01 -version 706 -lifeline 630 -checkpoint 10 -np 15
06:42:08:WU01:FS00:0xa7:************************************ CBang *************************************
06:42:08:WU01:FS00:0xa7:       Date: Nov 27 2019
06:42:08:WU01:FS00:0xa7:       Time: 11:26:54
06:42:08:WU01:FS00:0xa7:   Revision: d25803215b59272441049dfa05a0a9bf7a6e3c48
06:42:08:WU01:FS00:0xa7:     Branch: master
06:42:08:WU01:FS00:0xa7:   Compiler: GNU 8.3.0
06:42:08:WU01:FS00:0xa7:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
06:42:08:WU01:FS00:0xa7:             -fno-pie -fPIC
06:42:08:WU01:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
06:42:08:WU01:FS00:0xa7:       Bits: 64
06:42:08:WU01:FS00:0xa7:       Mode: Release
06:42:08:WU01:FS00:0xa7:************************************ System ************************************
06:42:08:WU01:FS00:0xa7:        CPU: AMD Ryzen 7 3700X 8-Core Processor
06:42:08:WU01:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
06:42:08:WU01:FS00:0xa7:       CPUs: 16
06:42:08:WU01:FS00:0xa7:     Memory: 31.36GiB
06:42:08:WU01:FS00:0xa7:Free Memory: 1.27GiB
06:42:08:WU01:FS00:0xa7:    Threads: POSIX_THREADS
06:42:08:WU01:FS00:0xa7: OS Version: 5.8
06:42:08:WU01:FS00:0xa7:Has Battery: false
06:42:08:WU01:FS00:0xa7: On Battery: false
06:42:08:WU01:FS00:0xa7: UTC Offset: 1
06:42:08:WU01:FS00:0xa7:        PID: 634
06:42:08:WU01:FS00:0xa7:        CWD: /config/work
06:42:08:WU01:FS00:0xa7:******************************** Build - libFAH ********************************
06:42:08:WU01:FS00:0xa7:    Version: 0.0.19
06:42:08:WU01:FS00:0xa7:     Author: Joseph Coffland <[email protected]>
06:42:08:WU01:FS00:0xa7:  Copyright: 2019 foldingathome.org
06:42:08:WU01:FS00:0xa7:   Homepage: https://foldingathome.org/
06:42:08:WU01:FS00:0xa7:       Date: Nov 26 2019
06:42:08:WU01:FS00:0xa7:       Time: 00:41:42
06:42:08:WU01:FS00:0xa7:   Revision: d5b5c747532224f986b7cd02c968ed9a20c16d6e
06:42:08:WU01:FS00:0xa7:     Branch: master
06:42:08:WU01:FS00:0xa7:   Compiler: GNU 8.3.0
06:42:08:WU01:FS00:0xa7:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
06:42:08:WU01:FS00:0xa7:             -fno-pie
06:42:08:WU01:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
06:42:08:WU01:FS00:0xa7:       Bits: 64
06:42:08:WU01:FS00:0xa7:       Mode: Release
06:42:08:WU01:FS00:0xa7:************************************ Build *************************************
06:42:08:WU01:FS00:0xa7:       SIMD: avx_256
06:42:08:WU01:FS00:0xa7:********************************************************************************
06:42:08:WU01:FS00:0xa7:Project: 17215 (Run 1649, Clone 0, Gen 47)
06:42:08:WU01:FS00:0xa7:Unit: 0x0000003680fccb0a5fab2b3dc12ec25e
06:42:08:WU01:FS00:0xa7:Reading tar file core.xml
06:42:08:WU01:FS00:0xa7:Reading tar file frame47.tpr
06:42:08:WU01:FS00:0xa7:Digital signatures verified
06:42:08:WU01:FS00:0xa7:Calling: mdrun -s frame47.tpr -o frame47.trr -x frame47.xtc -cpt 10 -nt 15
06:42:08:WU01:FS00:0xa7:Steps: first=11750000 total=250000
06:42:08:WU01:FS00:0xa7:ERROR:
06:42:08:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
06:42:08:WU01:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
06:42:08:WU01:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
06:42:08:WU01:FS00:0xa7:ERROR:
06:42:08:WU01:FS00:0xa7:ERROR:Fatal error:
06:42:08:WU01:FS00:0xa7:ERROR:There is no domain decomposition for 15 ranks that is compatible with the given box and a minimum cell size of 1.45733 nm
06:42:08:WU01:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
06:42:08:WU01:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
06:42:08:WU01:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
06:42:08:WU01:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
06:42:08:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
06:42:13:WU01:FS00:0xa7:WARNING:Unexpected exit
06:42:13:WARNING:WU01:FS00:FahCore returned: EARLY_UNIT_END (123 = 0x7b)
06:43:08:WU01:FS00:Starting
06:43:08:WU01:FS00:Running FahCore: /app/usr/bin/FAHCoreWrapper /config/cores/cores.foldingathome.org/lin/64bit-avx-256/a7-0.0.19/Core_a7.fah/FahCore_a7 -dir 01 -suffix 01 -version 706 -lifeline 269 -checkpoint 10 -np 15
06:43:08:WU01:FS00:Started FahCore on PID 652
06:43:08:WU01:FS00:Core PID:656
06:43:08:WU01:FS00:FahCore 0xa7 started
06:43:08:WU01:FS00:0xa7:*********************** Log Started 2020-11-21T06:43:08Z ***********************
06:43:08:WU01:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
06:43:08:WU01:FS00:0xa7:       Type: 0xa7
06:43:08:WU01:FS00:0xa7:       Core: Gromacs
06:43:08:WU01:FS00:0xa7:       Args: -dir 01 -suffix 01 -version 706 -lifeline 652 -checkpoint 10 -np 15
06:43:08:WU01:FS00:0xa7:************************************ CBang *************************************
06:43:08:WU01:FS00:0xa7:       Date: Nov 27 2019
06:43:08:WU01:FS00:0xa7:       Time: 11:26:54
06:43:08:WU01:FS00:0xa7:   Revision: d25803215b59272441049dfa05a0a9bf7a6e3c48
06:43:08:WU01:FS00:0xa7:     Branch: master
06:43:08:WU01:FS00:0xa7:   Compiler: GNU 8.3.0
06:43:08:WU01:FS00:0xa7:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
06:43:08:WU01:FS00:0xa7:             -fno-pie -fPIC
06:43:08:WU01:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
06:43:08:WU01:FS00:0xa7:       Bits: 64
06:43:08:WU01:FS00:0xa7:       Mode: Release
06:43:08:WU01:FS00:0xa7:************************************ System ************************************
06:43:08:WU01:FS00:0xa7:        CPU: AMD Ryzen 7 3700X 8-Core Processor
06:43:08:WU01:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
06:43:08:WU01:FS00:0xa7:       CPUs: 16
06:43:08:WU01:FS00:0xa7:     Memory: 31.36GiB
06:43:08:WU01:FS00:0xa7:Free Memory: 1.25GiB
06:43:08:WU01:FS00:0xa7:    Threads: POSIX_THREADS
06:43:08:WU01:FS00:0xa7: OS Version: 5.8
06:43:08:WU01:FS00:0xa7:Has Battery: false
06:43:08:WU01:FS00:0xa7: On Battery: false
06:43:08:WU01:FS00:0xa7: UTC Offset: 1
06:43:08:WU01:FS00:0xa7:        PID: 656
06:43:08:WU01:FS00:0xa7:        CWD: /config/work
06:43:08:WU01:FS00:0xa7:******************************** Build - libFAH ********************************
06:43:08:WU01:FS00:0xa7:    Version: 0.0.19
06:43:08:WU01:FS00:0xa7:     Author: Joseph Coffland <[email protected]>
06:43:08:WU01:FS00:0xa7:  Copyright: 2019 foldingathome.org
06:43:08:WU01:FS00:0xa7:   Homepage: https://foldingathome.org/
06:43:08:WU01:FS00:0xa7:       Date: Nov 26 2019
06:43:08:WU01:FS00:0xa7:       Time: 00:41:42
06:43:08:WU01:FS00:0xa7:   Revision: d5b5c747532224f986b7cd02c968ed9a20c16d6e
06:43:08:WU01:FS00:0xa7:     Branch: master
06:43:08:WU01:FS00:0xa7:   Compiler: GNU 8.3.0
06:43:08:WU01:FS00:0xa7:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
06:43:08:WU01:FS00:0xa7:             -fno-pie
06:43:08:WU01:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
06:43:08:WU01:FS00:0xa7:       Bits: 64
06:43:08:WU01:FS00:0xa7:       Mode: Release
06:43:08:WU01:FS00:0xa7:************************************ Build *************************************
06:43:08:WU01:FS00:0xa7:       SIMD: avx_256
06:43:08:WU01:FS00:0xa7:********************************************************************************
06:43:08:WU01:FS00:0xa7:Project: 17215 (Run 1649, Clone 0, Gen 47)
06:43:08:WU01:FS00:0xa7:Unit: 0x0000003680fccb0a5fab2b3dc12ec25e
06:43:08:WU01:FS00:0xa7:Reading tar file core.xml
06:43:08:WU01:FS00:0xa7:Reading tar file frame47.tpr
06:43:08:WU01:FS00:0xa7:Digital signatures verified
06:43:08:WU01:FS00:0xa7:Calling: mdrun -s frame47.tpr -o frame47.trr -x frame47.xtc -cpt 10 -nt 15
06:43:08:WU01:FS00:0xa7:Steps: first=11750000 total=250000
06:43:08:WU01:FS00:0xa7:ERROR:
06:43:08:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
06:43:08:WU01:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
06:43:08:WU01:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
06:43:08:WU01:FS00:0xa7:ERROR:
06:43:08:WU01:FS00:0xa7:ERROR:Fatal error:
06:43:08:WU01:FS00:0xa7:ERROR:There is no domain decomposition for 15 ranks that is compatible with the given box and a minimum cell size of 1.45733 nm
06:43:08:WU01:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
06:43:08:WU01:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
06:43:08:WU01:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
06:43:08:WU01:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
06:43:08:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
06:43:13:WU01:FS00:0xa7:WARNING:Unexpected exit
06:43:13:WARNING:WU01:FS00:FahCore returned: EARLY_UNIT_END (123 = 0x7b)
06:44:08:WU01:FS00:Starting
06:44:08:WU01:FS00:Running FahCore: /app/usr/bin/FAHCoreWrapper /config/cores/cores.foldingathome.org/lin/64bit-avx-256/a7-0.0.19/Core_a7.fah/FahCore_a7 -dir 01 -suffix 01 -version 706 -lifeline 269 -checkpoint 10 -np 15
06:44:08:WU01:FS00:Started FahCore on PID 674
06:44:08:WU01:FS00:Core PID:678
06:44:08:WU01:FS00:FahCore 0xa7 started
06:44:08:WU01:FS00:0xa7:*********************** Log Started 2020-11-21T06:44:08Z ***********************
06:44:08:WU01:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
06:44:08:WU01:FS00:0xa7:       Type: 0xa7
06:44:08:WU01:FS00:0xa7:       Core: Gromacs
06:44:08:WU01:FS00:0xa7:       Args: -dir 01 -suffix 01 -version 706 -lifeline 674 -checkpoint 10 -np 15
06:44:08:WU01:FS00:0xa7:************************************ CBang *************************************
06:44:08:WU01:FS00:0xa7:       Date: Nov 27 2019
06:44:08:WU01:FS00:0xa7:       Time: 11:26:54
06:44:08:WU01:FS00:0xa7:   Revision: d25803215b59272441049dfa05a0a9bf7a6e3c48
06:44:08:WU01:FS00:0xa7:     Branch: master
06:44:08:WU01:FS00:0xa7:   Compiler: GNU 8.3.0
06:44:08:WU01:FS00:0xa7:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
06:44:08:WU01:FS00:0xa7:             -fno-pie -fPIC
06:44:08:WU01:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
06:44:08:WU01:FS00:0xa7:       Bits: 64
06:44:08:WU01:FS00:0xa7:       Mode: Release
06:44:08:WU01:FS00:0xa7:************************************ System ************************************
06:44:08:WU01:FS00:0xa7:        CPU: AMD Ryzen 7 3700X 8-Core Processor
06:44:08:WU01:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
06:44:08:WU01:FS00:0xa7:       CPUs: 16
06:44:08:WU01:FS00:0xa7:     Memory: 31.36GiB
06:44:08:WU01:FS00:0xa7:Free Memory: 1.24GiB
06:44:08:WU01:FS00:0xa7:    Threads: POSIX_THREADS
06:44:08:WU01:FS00:0xa7: OS Version: 5.8
06:44:08:WU01:FS00:0xa7:Has Battery: false
06:44:08:WU01:FS00:0xa7: On Battery: false
06:44:08:WU01:FS00:0xa7: UTC Offset: 1
06:44:08:WU01:FS00:0xa7:        PID: 678
06:44:08:WU01:FS00:0xa7:        CWD: /config/work
06:44:08:WU01:FS00:0xa7:******************************** Build - libFAH ********************************
06:44:08:WU01:FS00:0xa7:    Version: 0.0.19
06:44:08:WU01:FS00:0xa7:     Author: Joseph Coffland <[email protected]>
06:44:08:WU01:FS00:0xa7:  Copyright: 2019 foldingathome.org
06:44:08:WU01:FS00:0xa7:   Homepage: https://foldingathome.org/
06:44:08:WU01:FS00:0xa7:       Date: Nov 26 2019
06:44:08:WU01:FS00:0xa7:       Time: 00:41:42
06:44:08:WU01:FS00:0xa7:   Revision: d5b5c747532224f986b7cd02c968ed9a20c16d6e
06:44:08:WU01:FS00:0xa7:     Branch: master
06:44:08:WU01:FS00:0xa7:   Compiler: GNU 8.3.0
06:44:08:WU01:FS00:0xa7:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
06:44:08:WU01:FS00:0xa7:             -fno-pie
06:44:08:WU01:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
06:44:08:WU01:FS00:0xa7:       Bits: 64
06:44:08:WU01:FS00:0xa7:       Mode: Release
06:44:08:WU01:FS00:0xa7:************************************ Build *************************************
06:44:08:WU01:FS00:0xa7:       SIMD: avx_256
06:44:08:WU01:FS00:0xa7:********************************************************************************
06:44:08:WU01:FS00:0xa7:Project: 17215 (Run 1649, Clone 0, Gen 47)
06:44:08:WU01:FS00:0xa7:Unit: 0x0000003680fccb0a5fab2b3dc12ec25e
06:44:08:WU01:FS00:0xa7:Reading tar file core.xml
06:44:08:WU01:FS00:0xa7:Reading tar file frame47.tpr
06:44:08:WU01:FS00:0xa7:Digital signatures verified
06:44:08:WU01:FS00:0xa7:Calling: mdrun -s frame47.tpr -o frame47.trr -x frame47.xtc -cpt 10 -nt 15
06:44:08:WU01:FS00:0xa7:Steps: first=11750000 total=250000
06:44:08:WU01:FS00:0xa7:ERROR:
06:44:08:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
06:44:08:WU01:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
06:44:08:WU01:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
06:44:08:WU01:FS00:0xa7:ERROR:
06:44:08:WU01:FS00:0xa7:ERROR:Fatal error:
06:44:08:WU01:FS00:0xa7:ERROR:There is no domain decomposition for 15 ranks that is compatible with the given box and a minimum cell size of 1.45733 nm
06:44:08:WU01:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
06:44:08:WU01:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
06:44:08:WU01:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
06:44:08:WU01:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
06:44:08:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
06:44:13:WU01:FS00:0xa7:WARNING:Unexpected exit
06:44:13:WARNING:WU01:FS00:FahCore returned: EARLY_UNIT_END (123 = 0x7b)
06:44:13:WARNING:WU01:FS00:Too many errors, failing
06:44:13:WU01:FS00:Sending unit results: id:01 state:SEND error:FAILED project:17215 run:1649 clone:0 gen:47 core:0xa7 unit:0x0000003680fccb0a5fab2b3dc12ec25e
06:44:13:WU01:FS00:Connecting to 128.252.203.10:8080
06:44:14:WU01:FS00:Server responded WORK_ACK (400)
06:44:14:WU01:FS00:Cleaning up

Re: Domain decomposition issue with project 17215

Posted: Sat Nov 21, 2020 6:22 pm
by foldy
If you get more of these errors, you can temporarily limit your FAH CPU slot to fewer threads, such as 14 or 12. But it should get fixed on the server side...

Re: Domain decomposition issue with project 17215

Posted: Sat Nov 21, 2020 6:26 pm
by Neil-B
Not hardware incompatibility, simply a settings issue .. that project seems not to like 15 threads .. 15 is not a great number for the old CPU core tbh .. adjusting the slot thread count to 12 in Advanced Control will stop the issue if you continue to have it .. 14 and 13 aren't great either .. the newer a8 core doesn't have this issue, but there are still a good few projects using the a7 core and probably will be for a while.

Reporting here is good, as the researchers can put a block on issuing this project to 15-thread slots.
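
To illustrate why some thread counts fail, here is a minimal sketch of the constraint behind the error message. It assumes the simulation box is split into an nx x ny x nz grid with one cell per rank, and that every cell must be at least the minimum cell size (1.45733 nm in the log) along each axis. The box dimensions are hypothetical, since the log does not give them, and the real mdrun logic weighs more factors than this.

```python
def grids(ranks):
    """Yield every (nx, ny, nz) with nx * ny * nz == ranks."""
    for nx in range(1, ranks + 1):
        if ranks % nx:
            continue
        rest = ranks // nx
        for ny in range(1, rest + 1):
            if rest % ny == 0:
                yield nx, ny, rest // ny


def feasible_grids(ranks, box_nm, min_cell_nm):
    """Return the grids whose cells are at least min_cell_nm wide on every axis."""
    return [
        grid for grid in grids(ranks)
        if all(length / n >= min_cell_nm for length, n in zip(box_nm, grid))
    ]


MIN_CELL_NM = 1.45733      # taken from the error message in the log
BOX_NM = (6.0, 6.0, 6.0)   # hypothetical box, NOT taken from project 17215

for ranks in (8, 12, 15, 16):
    print(ranks, feasible_grids(ranks, BOX_NM, MIN_CELL_NM))
```

With this made-up box, at most 4 cells fit along any axis, so 15 (which needs a factor of 5 or 15 on some axis) has no valid grid, while 8, 12 and 16 each have several.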

Re: Domain decomposition issue with project 17215

Posted: Sat Nov 21, 2020 6:45 pm
by Joe_H
I have sent a message to the researcher running this project. From what I can find, assignment to CPU slots whose thread count is a multiple of 5 was supposed to be excluded.

Re: Domain decomposition issue with project 17215

Posted: Sat Nov 21, 2020 6:49 pm
by psaam0001
Let my legacy quad-core CPUs have a shot at crunching 'em.... :D

Paul

Re: Domain decomposition issue with project 17215

Posted: Sat Nov 21, 2020 6:51 pm
by Gnomuz
Thanks for the replies, they largely confirm what I had understood from reading other threads. When I mentioned a "hardware incompatibility", I meant an incompatibility between that particular project and the number of threads available (i.e. 15). And the recent post by Joe_H seems to confirm it's a settings issue.

So far, out of a bit more than 300 CPU-computed WUs, only 3 on that specific project have failed. And as the log shows, it only takes a little over 3 minutes (06:40:58 to 06:44:14) to fail and switch to another WU, which will very likely complete properly with 15 threads. Overall, the loss in production is rather limited, unless, of course, I start getting a much larger percentage of failed WUs.
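
As a rough check of that estimate, the arithmetic can be sketched as follows. The two timestamps are read straight from the log (first connection and final cleanup); treating all three failures as costing the same time, and rounding "a bit more than 300" down to 300, are assumptions.

```python
from datetime import datetime

FMT = "%H:%M:%S"

# Timestamps copied from the log: first connection and final cleanup.
start = datetime.strptime("06:40:58", FMT)
end = datetime.strptime("06:44:14", FMT)
seconds_per_failure = (end - start).total_seconds()

failed_wus = 3    # from the post
total_wus = 300   # "a bit more than 300", rounded down (assumption)

failure_rate = failed_wus / total_wus
# Assumption: each failure costs about as long as the one shown in the log.
lost_minutes = failed_wus * seconds_per_failure / 60

print(f"time per failed WU: {seconds_per_failure:.0f} s")
print(f"failure rate: {failure_rate:.1%}")
print(f"total time lost: {lost_minutes:.1f} min")
```

Under those assumptions the three failures cost roughly ten minutes in total, which is small next to the time spent on some 300 completed WUs.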

But as foldy and Neil-B suggested, I am also considering reducing the current CPU slot to 12 threads and creating a second CPU slot with 3 threads, so that all 15 available threads are used at any time. Now, I don't know whether it would be better to fold with:
- a single 15-thread slot, which will suffer from a small percentage of failed WUs but makes the best use of the CPU
- two slots (12 threads and 3 threads), which should encounter fewer failed WUs but may be less efficient.

Thanks in advance for any idea and/or personal experience on this possible trade-off.

Re: Domain decomposition issue with project 17215

Posted: Sat Nov 21, 2020 6:56 pm
by sacampb
I got the same error multiple times with 11 threads; bumping it down to 8 resolved the problem for me.
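
A rule of thumb that fits the reports in this thread (not an official F@H or GROMACS rule): the thread counts that failed or were called "not great" (15, 11, 14, 13) all contain a prime factor of 5 or more, while the counts that worked (12, 8) factor into 2s and 3s only. A small sketch to check a given count:

```python
def largest_prime_factor(n):
    """Return the largest prime factor of n (for n >= 2)."""
    factor, largest = 2, 1
    while factor * factor <= n:
        while n % factor == 0:
            largest, n = factor, n // factor
        factor += 1
    # Whatever is left above 1 is a prime larger than any factor found so far.
    return n if n > 1 else largest


for threads in (8, 11, 12, 13, 14, 15, 16):
    print(threads, largest_prime_factor(threads))
```

Counts where the printed value is 2 or 3 are the ones reported as trouble-free here; anything larger is a candidate for the domain decomposition error on a7 projects.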

Re: Domain decomposition issue with project 17215

Posted: Sat Nov 21, 2020 7:08 pm
by Neil-B
tbh if the 15-thread slot rarely fails for you, stick with it - most of the time the slot count incompatibility is reported and the project appropriately stopped from release for these counts during beta - sounds from Joe_H's post that it was spotted but for some reason the hold on release to multiples of 5 didn't stick - so good report :) ... a 15 is normally better than a 12 and a 3 for throughput/science as long as it doesn't error ... as more projects are released to the a8 core and fewer to the a7 this will become a thing of the past ... fold on

Re: Domain decomposition issue with project 17215

Posted: Sat Nov 21, 2020 8:16 pm
by Gnomuz
Thanks for the clear answer, Neil-B. I'm going to stick with 15 threads unless I encounter errors too frequently. And since it's future-proof with a8, let's keep it simple!