
Project: 14245 (Run 0, Clone 84, Gen 146)

Posted: Sat Nov 30, 2019 6:38 pm
by DeeGee
I keep getting this workunit sent to my Kubuntu box, and it fails instantly. Apparently there was also Project: 13829 (Run 124, Clone 0, Gen 64), which failed at 70% with the same error.

18:25:44:WU00:FS00:Starting
18:25:44:WU00:FS00:Removing old file './work/00/logfile_01-20191130-175343.txt'
18:25:44:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 705 -lifeline 18256 -checkpoint 15 -np 11
18:25:44:WU00:FS00:Started FahCore on PID 4729
18:25:44:WU00:FS00:Core PID:4733
18:25:44:WU00:FS00:FahCore 0xa7 started
18:25:44:WU00:FS00:0xa7:*********************** Log Started 2019-11-30T18:25:44Z ***********************
18:25:44:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
18:25:44:WU00:FS00:0xa7:       Type: 0xa7
18:25:44:WU00:FS00:0xa7:       Core: Gromacs
18:25:44:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 705 -lifeline 4729 -checkpoint 15 -np
18:25:44:WU00:FS00:0xa7:             11
18:25:44:WU00:FS00:0xa7:************************************ CBang *************************************
18:25:44:WU00:FS00:0xa7:       Date: Nov 5 2019
18:25:44:WU00:FS00:0xa7:       Time: 06:06:57
18:25:44:WU00:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
18:25:44:WU00:FS00:0xa7:     Branch: master
18:25:44:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
18:25:44:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
18:25:44:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
18:25:44:WU00:FS00:0xa7:       Bits: 64
18:25:44:WU00:FS00:0xa7:       Mode: Release
18:25:44:WU00:FS00:0xa7:************************************ System ************************************
18:25:44:WU00:FS00:0xa7:        CPU: AMD Ryzen 7 2700X Eight-Core Processor
18:25:44:WU00:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
18:25:44:WU00:FS00:0xa7:       CPUs: 16
18:25:44:WU00:FS00:0xa7:     Memory: 31.40GiB
18:25:44:WU00:FS00:0xa7:Free Memory: 231.17MiB
18:25:44:WU00:FS00:0xa7:    Threads: POSIX_THREADS
18:25:44:WU00:FS00:0xa7: OS Version: 5.0
18:25:44:WU00:FS00:0xa7:Has Battery: false
18:25:44:WU00:FS00:0xa7: On Battery: false
18:25:44:WU00:FS00:0xa7: UTC Offset: 2
18:25:44:WU00:FS00:0xa7:        PID: 4733
18:25:44:WU00:FS00:0xa7:        CWD: /var/lib/fahclient/work
18:25:44:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
18:25:44:WU00:FS00:0xa7:    Version: 0.0.18
18:25:44:WU00:FS00:0xa7:     Author: Joseph Coffland <[email protected]>
18:25:44:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
18:25:44:WU00:FS00:0xa7:   Homepage: https://foldingathome.org/
18:25:44:WU00:FS00:0xa7:       Date: Nov 5 2019
18:25:44:WU00:FS00:0xa7:       Time: 06:13:26
18:25:44:WU00:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
18:25:44:WU00:FS00:0xa7:     Branch: master
18:25:44:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
18:25:44:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
18:25:44:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
18:25:44:WU00:FS00:0xa7:       Bits: 64
18:25:44:WU00:FS00:0xa7:       Mode: Release
18:25:44:WU00:FS00:0xa7:************************************ Build *************************************
18:25:44:WU00:FS00:0xa7:       SIMD: avx_256
18:25:44:WU00:FS00:0xa7:********************************************************************************
18:25:44:WU00:FS00:0xa7:Project: 14245 (Run 0, Clone 84, Gen 146)
18:25:44:WU00:FS00:0xa7:Unit: 0x000000d980fccb0a5d6fe0b4a6c93987
18:25:44:WU00:FS00:0xa7:Reading tar file core.xml
18:25:44:WU00:FS00:0xa7:Reading tar file frame146.tpr
18:25:44:WU00:FS00:0xa7:Digital signatures verified
18:25:44:WU00:FS00:0xa7:Reducing thread count from 11 to 10 to avoid domain decomposition by a prime number > 3
18:25:44:WU00:FS00:0xa7:Calling: mdrun -s frame146.tpr -o frame146.trr -x frame146.xtc -cpt 15 -nt 10
18:25:44:WU00:FS00:0xa7:Steps: first=36500000 total=250000
18:25:44:WU00:FS00:0xa7:ERROR:
18:25:44:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
18:25:44:WU00:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
18:25:44:WU00:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
18:25:44:WU00:FS00:0xa7:ERROR:
18:25:44:WU00:FS00:0xa7:ERROR:Fatal error:
18:25:44:WU00:FS00:0xa7:ERROR:There is no domain decomposition for 10 ranks that is compatible with the given box and a minimum cell size of 1.45733 nm
18:25:44:WU00:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
18:25:44:WU00:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
18:25:44:WU00:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
18:25:44:WU00:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
18:25:44:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
18:25:49:WU00:FS00:0xa7:WARNING:Unexpected exit() call
18:25:49:WU00:FS00:0xa7:WARNING:Unexpected exit from science code
18:25:49:WU00:FS00:0xa7:Saving result file ../logfile_01.txt
18:25:49:WU00:FS00:0xa7:Saving result file md.log
18:25:49:WU00:FS00:0xa7:Saving result file science.log
18:25:49:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
Project 13829:

04:30:01:WU00:FS00:0xa7:Completed 1250000 out of 1250000 steps (100%)
04:30:03:WU00:FS00:0xa7:Saving result file ../logfile_01.txt
04:30:03:WU00:FS00:0xa7:Saving result file frame15.trr
04:30:03:WU00:FS00:0xa7:Saving result file md.log
04:30:03:WU00:FS00:0xa7:Saving result file science.log
04:30:03:WU00:FS00:0xa7:Saving result file traj_comp.xtc
04:30:03:WU00:FS00:0xa7:Folding@home Core Shutdown: FINISHED_UNIT
04:30:03:WU00:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
04:30:04:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:14188 run:1 clone:768 gen:15 core:0xa7 unit:0x000000150002894b5d77e69b2c1adbdf
04:30:04:WU00:FS00:Uploading 9.80MiB to 155.247.166.219
04:30:04:WU00:FS00:Connecting to 155.247.166.219:8080
04:30:10:WU00:FS00:Upload 64.39%
04:30:18:WU00:FS00:Upload 93.71%
04:30:25:WU00:FS00:Upload complete
04:30:25:WU00:FS00:Server responded WORK_ACK (400)
04:30:25:WU00:FS00:Final credit estimate, 35181.00 points
04:30:25:WU00:FS00:Cleaning up
******************************* Date: 2019-11-30 *******************************
11:25:41:WU00:FS00:Connecting to 65.254.110.245:8080
11:25:42:WU00:FS00:Assigned to work server 128.252.203.9
11:25:42:WU00:FS00:Requesting new work unit for slot 00: READY cpu:11 from 128.252.203.9
11:25:42:WU00:FS00:Connecting to 128.252.203.9:8080
11:25:42:WU00:FS00:Downloading 8.14MiB
11:25:44:WU00:FS00:Download complete
11:25:44:WU00:FS00:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:13829 run:124 clone:0 gen:64 core:0xa7 unit:0x0000004d80fccb095d66d49f091a544d
11:25:44:WU00:FS00:Starting
11:25:44:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 705 -lifeline 18256 -checkpoint 15 -np 11
11:25:44:WU00:FS00:Started FahCore on PID 30223
11:25:44:WU00:FS00:Core PID:30227
11:25:44:WU00:FS00:FahCore 0xa7 started
11:25:44:WU00:FS00:0xa7:*********************** Log Started 2019-11-30T11:25:44Z ***********************
11:25:44:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
11:25:44:WU00:FS00:0xa7:       Type: 0xa7
11:25:44:WU00:FS00:0xa7:       Core: Gromacs
11:25:44:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 705 -lifeline 30223 -checkpoint 15 -np
11:25:44:WU00:FS00:0xa7:             11
11:25:44:WU00:FS00:0xa7:************************************ CBang *************************************
11:25:44:WU00:FS00:0xa7:       Date: Nov 5 2019
11:25:44:WU00:FS00:0xa7:       Time: 06:06:57
11:25:44:WU00:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
11:25:44:WU00:FS00:0xa7:     Branch: master
11:25:44:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
11:25:44:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
11:25:44:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
11:25:44:WU00:FS00:0xa7:       Bits: 64
11:25:44:WU00:FS00:0xa7:       Mode: Release
11:25:44:WU00:FS00:0xa7:************************************ System ************************************
11:25:44:WU00:FS00:0xa7:        CPU: AMD Ryzen 7 2700X Eight-Core Processor
11:25:44:WU00:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
11:25:44:WU00:FS00:0xa7:       CPUs: 16
11:25:44:WU00:FS00:0xa7:     Memory: 31.40GiB
11:25:44:WU00:FS00:0xa7:Free Memory: 244.66MiB
11:25:44:WU00:FS00:0xa7:    Threads: POSIX_THREADS
11:25:44:WU00:FS00:0xa7: OS Version: 5.0
11:25:44:WU00:FS00:0xa7:Has Battery: false
11:25:44:WU00:FS00:0xa7: On Battery: false
11:25:44:WU00:FS00:0xa7: UTC Offset: 2
11:25:44:WU00:FS00:0xa7:        PID: 30227
11:25:44:WU00:FS00:0xa7:        CWD: /var/lib/fahclient/work
11:25:44:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
11:25:44:WU00:FS00:0xa7:    Version: 0.0.18
11:25:44:WU00:FS00:0xa7:     Author: Joseph Coffland <[email protected]>
11:25:44:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
11:25:44:WU00:FS00:0xa7:   Homepage: https://foldingathome.org/
11:25:44:WU00:FS00:0xa7:       Date: Nov 5 2019
11:25:44:WU00:FS00:0xa7:       Time: 06:13:26
11:25:44:WU00:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
11:25:44:WU00:FS00:0xa7:     Branch: master
11:25:44:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
11:25:44:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
11:25:44:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
11:25:44:WU00:FS00:0xa7:       Bits: 64
11:25:44:WU00:FS00:0xa7:       Mode: Release
11:25:44:WU00:FS00:0xa7:************************************ Build *************************************
11:25:44:WU00:FS00:0xa7:       SIMD: avx_256
11:25:44:WU00:FS00:0xa7:********************************************************************************
11:25:44:WU00:FS00:0xa7:Project: 13829 (Run 124, Clone 0, Gen 64)
11:25:44:WU00:FS00:0xa7:Unit: 0x0000004d80fccb095d66d49f091a544d
11:25:44:WU00:FS00:0xa7:Reading tar file core.xml
11:25:44:WU00:FS00:0xa7:Reading tar file frame64.tpr
11:25:44:WU00:FS00:0xa7:Digital signatures verified
11:25:44:WU00:FS00:0xa7:Reducing thread count from 11 to 10 to avoid domain decomposition by a prime number > 3
11:25:44:WU00:FS00:0xa7:Calling: mdrun -s frame64.tpr -o frame64.trr -x frame64.xtc -cpt 15 -nt 10
11:25:44:WU00:FS00:0xa7:Steps: first=8000000 total=125000
11:25:47:WU00:FS00:0xa7:Completed 1 out of 125000 steps (0%)
11:26:58:WU00:FS00:0xa7:Completed 1250 out of 125000 steps (1%)
11:28:08:WU00:FS00:0xa7:Completed 2500 out of 125000 steps (2%)
11:29:20:WU00:FS00:0xa7:Completed 3750 out of 125000 steps (3%)
11:30:33:WU00:FS00:0xa7:Completed 5000 out of 125000 steps (4%)
11:31:43:WU00:FS00:0xa7:Completed 6250 out of 125000 steps (5%)
11:32:53:WU00:FS00:0xa7:Completed 7500 out of 125000 steps (6%)
11:34:04:WU00:FS00:0xa7:Completed 8750 out of 125000 steps (7%)
11:35:14:WU00:FS00:0xa7:Completed 10000 out of 125000 steps (8%)
11:36:25:WU00:FS00:0xa7:Completed 11250 out of 125000 steps (9%)
11:37:36:WU00:FS00:0xa7:Completed 12500 out of 125000 steps (10%)
11:38:47:WU00:FS00:0xa7:Completed 13750 out of 125000 steps (11%)
11:39:58:WU00:FS00:0xa7:Completed 15000 out of 125000 steps (12%)
11:41:10:WU00:FS00:0xa7:Completed 16250 out of 125000 steps (13%)
11:42:22:WU00:FS00:0xa7:Completed 17500 out of 125000 steps (14%)
11:43:32:WU00:FS00:0xa7:Completed 18750 out of 125000 steps (15%)
11:44:42:WU00:FS00:0xa7:Completed 20000 out of 125000 steps (16%)
11:45:53:WU00:FS00:0xa7:Completed 21250 out of 125000 steps (17%)
11:47:03:WU00:FS00:0xa7:Completed 22500 out of 125000 steps (18%)
11:48:14:WU00:FS00:0xa7:Completed 23750 out of 125000 steps (19%)
11:49:25:WU00:FS00:0xa7:Completed 25000 out of 125000 steps (20%)
11:50:33:WU00:FS00:0xa7:Completed 26250 out of 125000 steps (21%)
11:51:39:WU00:FS00:0xa7:Completed 27500 out of 125000 steps (22%)
11:52:44:WU00:FS00:0xa7:Completed 28750 out of 125000 steps (23%)
11:53:55:WU00:FS00:0xa7:Completed 30000 out of 125000 steps (24%)
11:55:06:WU00:FS00:0xa7:Completed 31250 out of 125000 steps (25%)
11:56:17:WU00:FS00:0xa7:Completed 32500 out of 125000 steps (26%)
11:57:26:WU00:FS00:0xa7:Completed 33750 out of 125000 steps (27%)
11:58:37:WU00:FS00:0xa7:Completed 35000 out of 125000 steps (28%)
11:59:47:WU00:FS00:0xa7:Completed 36250 out of 125000 steps (29%)
12:00:55:WU00:FS00:0xa7:Completed 37500 out of 125000 steps (30%)
12:02:03:WU00:FS00:0xa7:Completed 38750 out of 125000 steps (31%)
12:03:11:WU00:FS00:0xa7:Completed 40000 out of 125000 steps (32%)
12:04:21:WU00:FS00:0xa7:Completed 41250 out of 125000 steps (33%)
12:05:30:WU00:FS00:0xa7:Completed 42500 out of 125000 steps (34%)
12:06:40:WU00:FS00:0xa7:Completed 43750 out of 125000 steps (35%)
12:07:51:WU00:FS00:0xa7:Completed 45000 out of 125000 steps (36%)
12:09:00:WU00:FS00:0xa7:Completed 46250 out of 125000 steps (37%)
12:10:08:WU00:FS00:0xa7:Completed 47500 out of 125000 steps (38%)
12:11:18:WU00:FS00:0xa7:Completed 48750 out of 125000 steps (39%)
12:12:28:WU00:FS00:0xa7:Completed 50000 out of 125000 steps (40%)
12:13:44:WU00:FS00:0xa7:Completed 51250 out of 125000 steps (41%)
12:14:57:WU00:FS00:0xa7:Completed 52500 out of 125000 steps (42%)
12:16:09:WU00:FS00:0xa7:Completed 53750 out of 125000 steps (43%)
12:17:22:WU00:FS00:0xa7:Completed 55000 out of 125000 steps (44%)
12:18:36:WU00:FS00:0xa7:Completed 56250 out of 125000 steps (45%)
12:19:50:WU00:FS00:0xa7:Completed 57500 out of 125000 steps (46%)
12:21:03:WU00:FS00:0xa7:Completed 58750 out of 125000 steps (47%)
12:22:14:WU00:FS00:0xa7:Completed 60000 out of 125000 steps (48%)
12:23:29:WU00:FS00:0xa7:Completed 61250 out of 125000 steps (49%)
12:24:43:WU00:FS00:0xa7:Completed 62500 out of 125000 steps (50%)
12:25:55:WU00:FS00:0xa7:Completed 63750 out of 125000 steps (51%)
12:27:08:WU00:FS00:0xa7:Completed 65000 out of 125000 steps (52%)
12:28:21:WU00:FS00:0xa7:Completed 66250 out of 125000 steps (53%)
12:29:33:WU00:FS00:0xa7:Completed 67500 out of 125000 steps (54%)
12:30:46:WU00:FS00:0xa7:Completed 68750 out of 125000 steps (55%)
12:31:59:WU00:FS00:0xa7:Completed 70000 out of 125000 steps (56%)
12:33:12:WU00:FS00:0xa7:Completed 71250 out of 125000 steps (57%)
12:34:25:WU00:FS00:0xa7:Completed 72500 out of 125000 steps (58%)
12:35:40:WU00:FS00:0xa7:Completed 73750 out of 125000 steps (59%)
12:36:54:WU00:FS00:0xa7:Completed 75000 out of 125000 steps (60%)
12:38:09:WU00:FS00:0xa7:Completed 76250 out of 125000 steps (61%)
12:39:23:WU00:FS00:0xa7:Completed 77500 out of 125000 steps (62%)
12:40:37:WU00:FS00:0xa7:Completed 78750 out of 125000 steps (63%)
12:41:49:WU00:FS00:0xa7:Completed 80000 out of 125000 steps (64%)
12:43:02:WU00:FS00:0xa7:Completed 81250 out of 125000 steps (65%)
12:44:15:WU00:FS00:0xa7:Completed 82500 out of 125000 steps (66%)
12:45:27:WU00:FS00:0xa7:Completed 83750 out of 125000 steps (67%)
12:46:39:WU00:FS00:0xa7:Completed 85000 out of 125000 steps (68%)
12:47:51:WU00:FS00:0xa7:Completed 86250 out of 125000 steps (69%)
12:49:10:WU00:FS00:0xa7:Completed 87500 out of 125000 steps (70%)
17:03:43:WU00:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
17:03:43:WU00:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
17:03:43:WU00:FS00:0xa7:ERROR:
17:03:43:WU00:FS00:0xa7:ERROR:Fatal error:
17:03:43:WU00:FS00:0xa7:ERROR:There is no domain decomposition for 10 ranks that is compatible with the given box and a minimum cell size of 1.45733 nm
17:03:43:WU00:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
17:03:43:WU00:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
17:03:43:WU00:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
17:03:43:WU00:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
17:03:43:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
17:03:48:WU00:FS00:0xa7:WARNING:Unexpected exit() call
17:03:48:WU00:FS00:0xa7:WARNING:Unexpected exit from science code
17:03:48:WU00:FS00:0xa7:Saving result file ../logfile_01.txt
17:03:48:WU00:FS00:0xa7:Saving result file md.log
17:03:48:WU00:FS00:0xa7:Saving result file science.log
17:03:48:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)

Re: Project: 14245 (Run 0, Clone 84, Gen 146)

Posted: Sat Dec 14, 2019 11:33 am
by DeeGee
After a couple of weeks of normal workunits, I have now gotten another one with the same problem. Because of the way it errors out, the client just keeps retrying the same borked workunit again and again. I have to delete the workunit's folder to clear it.

Project: 14246 (Run 0, Clone 63, Gen 143)

00:47:40:WU00:FS00:0xa7:Completed 1250000 out of 1250000 steps (100%)
00:47:41:WU00:FS00:0xa7:Saving result file ../logfile_01.txt
00:47:41:WU00:FS00:0xa7:Saving result file frame14.trr
00:47:41:WU00:FS00:0xa7:Saving result file md.log
00:47:41:WU00:FS00:0xa7:Saving result file science.log
00:47:41:WU00:FS00:0xa7:Saving result file traj_comp.xtc
00:47:41:WU00:FS00:0xa7:Folding@home Core Shutdown: FINISHED_UNIT
00:47:42:WU00:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
00:47:42:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:14188 run:0 clone:371 gen:14 core:0xa7 unit:0x000000110002894b5d77e723a4130b2a
00:47:42:WU00:FS00:Uploading 12.72MiB to 155.247.166.219
00:47:42:WU00:FS00:Connecting to 155.247.166.219:8080
00:47:48:WU00:FS00:Upload 74.20%
00:47:51:WU00:FS00:Upload complete
00:47:51:WU00:FS00:Server responded WORK_ACK (400)
00:47:51:WU00:FS00:Final credit estimate, 36011.00 points
00:47:51:WU00:FS00:Cleaning up
******************************* Date: 2019-12-14 *******************************
11:12:11:WU00:FS00:Connecting to 65.254.110.245:8080
11:15:19:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
11:15:19:WU00:FS00:0xa7:    Version: 0.0.18
11:15:19:WU00:FS00:0xa7:     Author: Joseph Coffland <[email protected]>
11:15:19:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
11:15:19:WU00:FS00:0xa7:   Homepage: https://foldingathome.org/
11:15:19:WU00:FS00:0xa7:       Date: Nov 5 2019
11:15:19:WU00:FS00:0xa7:       Time: 06:13:26
11:15:19:WU00:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
11:15:19:WU00:FS00:0xa7:     Branch: master
11:15:19:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
11:15:19:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
11:15:19:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
11:15:19:WU00:FS00:0xa7:       Bits: 64
11:15:19:WU00:FS00:0xa7:       Mode: Release
11:15:19:WU00:FS00:0xa7:************************************ Build *************************************
11:15:19:WU00:FS00:0xa7:       SIMD: avx_256
11:15:19:WU00:FS00:0xa7:********************************************************************************
11:15:19:WU00:FS00:0xa7:Project: 14246 (Run 0, Clone 63, Gen 143)
11:15:19:WU00:FS00:0xa7:Unit: 0x000000e480fccb0a5d6fe220565baff4
11:15:19:WU00:FS00:0xa7:Reading tar file core.xml
11:15:19:WU00:FS00:0xa7:Reading tar file frame143.tpr
11:15:19:WU00:FS00:0xa7:Digital signatures verified
11:15:19:WU00:FS00:0xa7:Reducing thread count from 11 to 10 to avoid domain decomposition by a prime number > 3
11:15:19:WU00:FS00:0xa7:Calling: mdrun -s frame143.tpr -o frame143.trr -x frame143.xtc -cpt 15 -nt 10
11:15:19:WU00:FS00:0xa7:Steps: first=35750000 total=250000
11:15:19:WU00:FS00:0xa7:ERROR:
11:15:19:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
11:15:19:WU00:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
11:15:19:WU00:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
11:15:19:WU00:FS00:0xa7:ERROR:
11:15:19:WU00:FS00:0xa7:ERROR:Fatal error:
11:15:19:WU00:FS00:0xa7:ERROR:There is no domain decomposition for 10 ranks that is compatible with the given box and a minimum cell size of 1.45733 nm
11:15:19:WU00:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
11:15:19:WU00:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
11:15:19:WU00:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
11:15:19:WU00:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
11:15:19:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
11:15:24:WU00:FS00:0xa7:WARNING:Unexpected exit() call
11:15:24:WU00:FS00:0xa7:WARNING:Unexpected exit from science code
11:15:24:WU00:FS00:0xa7:Saving result file ../logfile_01.txt
11:15:24:WU00:FS00:0xa7:Saving result file md.log
11:15:24:WU00:FS00:0xa7:Saving result file science.log
11:15:24:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
11:16:19:WU00:FS00:Starting
11:16:19:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 705 -lifeline 1820 -checkpoint 15 -np 11
11:16:19:WU00:FS00:Started FahCore on PID 1749
11:16:19:WU00:FS00:Core PID:1753
11:16:19:WU00:FS00:FahCore 0xa7 started
11:16:19:WU00:FS00:0xa7:*********************** Log Started 2019-12-14T11:16:19Z ***********************
11:16:19:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
11:16:19:WU00:FS00:0xa7:       Type: 0xa7
11:16:19:WU00:FS00:0xa7:       Core: Gromacs
11:16:19:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 705 -lifeline 1749 -checkpoint 15 -np
11:16:19:WU00:FS00:0xa7:             11
11:16:19:WU00:FS00:0xa7:************************************ CBang *************************************
11:16:19:WU00:FS00:0xa7:       Date: Nov 5 2019
11:16:19:WU00:FS00:0xa7:       Time: 06:06:57
11:16:19:WU00:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
11:16:19:WU00:FS00:0xa7:     Branch: master
11:16:19:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
11:16:19:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
11:16:19:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
11:16:19:WU00:FS00:0xa7:       Bits: 64
11:16:19:WU00:FS00:0xa7:       Mode: Release
11:16:19:WU00:FS00:0xa7:************************************ System ************************************
11:16:19:WU00:FS00:0xa7:        CPU: AMD Ryzen 7 2700X Eight-Core Processor
11:16:19:WU00:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
11:16:19:WU00:FS00:0xa7:       CPUs: 16
11:16:19:WU00:FS00:0xa7:     Memory: 31.40GiB
11:16:19:WU00:FS00:0xa7:Free Memory: 3.46GiB
11:16:19:WU00:FS00:0xa7:    Threads: POSIX_THREADS
11:16:19:WU00:FS00:0xa7: OS Version: 5.0
11:16:19:WU00:FS00:0xa7:Has Battery: false
11:16:19:WU00:FS00:0xa7: On Battery: false
11:16:19:WU00:FS00:0xa7: UTC Offset: 2
11:16:19:WU00:FS00:0xa7:        PID: 1753
11:16:19:WU00:FS00:0xa7:        CWD: /var/lib/fahclient/work
11:16:19:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
11:16:19:WU00:FS00:0xa7:    Version: 0.0.18
11:16:19:WU00:FS00:0xa7:     Author: Joseph Coffland <[email protected]>
11:16:19:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
11:16:19:WU00:FS00:0xa7:   Homepage: https://foldingathome.org/
11:16:19:WU00:FS00:0xa7:       Date: Nov 5 2019
11:16:19:WU00:FS00:0xa7:       Time: 06:13:26
11:16:19:WU00:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
11:16:19:WU00:FS00:0xa7:     Branch: master
11:16:19:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
11:16:19:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
11:16:19:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
11:16:19:WU00:FS00:0xa7:       Bits: 64
11:16:19:WU00:FS00:0xa7:       Mode: Release
11:16:19:WU00:FS00:0xa7:************************************ Build *************************************
11:16:19:WU00:FS00:0xa7:       SIMD: avx_256
11:16:19:WU00:FS00:0xa7:********************************************************************************
11:16:19:WU00:FS00:0xa7:Project: 14246 (Run 0, Clone 63, Gen 143)
11:16:19:WU00:FS00:0xa7:Unit: 0x000000e480fccb0a5d6fe220565baff4
11:16:19:WU00:FS00:0xa7:Reading tar file core.xml
11:16:19:WU00:FS00:0xa7:Reading tar file frame143.tpr
11:16:19:WU00:FS00:0xa7:Digital signatures verified
11:16:19:WU00:FS00:0xa7:Reducing thread count from 11 to 10 to avoid domain decomposition by a prime number > 3
11:16:19:WU00:FS00:0xa7:Calling: mdrun -s frame143.tpr -o frame143.trr -x frame143.xtc -cpt 15 -nt 10
11:16:19:WU00:FS00:0xa7:Steps: first=35750000 total=250000
11:16:19:WU00:FS00:0xa7:ERROR:
11:16:19:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
11:16:19:WU00:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
11:16:19:WU00:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
11:16:19:WU00:FS00:0xa7:ERROR:
11:16:19:WU00:FS00:0xa7:ERROR:Fatal error:
11:16:19:WU00:FS00:0xa7:ERROR:There is no domain decomposition for 10 ranks that is compatible with the given box and a minimum cell size of 1.45733 nm
11:16:19:WU00:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
11:16:19:WU00:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
11:16:19:WU00:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
11:16:19:WU00:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
11:16:19:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
11:16:24:WU00:FS00:0xa7:WARNING:Unexpected exit() call
11:16:24:WU00:FS00:0xa7:WARNING:Unexpected exit from science code
11:16:24:WU00:FS00:0xa7:Saving result file ../logfile_01.txt
11:16:24:WU00:FS00:0xa7:Saving result file md.log
11:16:24:WU00:FS00:0xa7:Saving result file science.log
11:16:24:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
11:17:19:WU00:FS00:Starting
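The loop is visible in the log itself: the core starts, exits with INTERRUPTED, and the client starts the same unit again about a minute later. As a rough aid for spotting a stuck unit before clearing its folder by hand, here is a minimal Python sketch that counts INTERRUPTED exits per unit, assuming the log line format shown in this thread. The function names and the threshold of 2 are hypothetical; this is not part of FAHClient.

```python
import re
from collections import Counter

# Matches the core's banner line, e.g. "Project: 14246 (Run 0, Clone 63, Gen 143)".
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
# Matches the client's report of the core's exit status.
RETURN_RE = re.compile(r"FahCore returned: (\w+)")


def count_interrupted(lines):
    """Count INTERRUPTED exits per (project, run, clone, gen).

    Assumes each 'FahCore returned' line belongs to the most recent
    'Project:' banner that appeared before it in the log.
    """
    current = None
    counts = Counter()
    for line in lines:
        m = PROJECT_RE.search(line)
        if m:
            current = tuple(int(g) for g in m.groups())
            continue
        m = RETURN_RE.search(line)
        if m and m.group(1) == "INTERRUPTED" and current is not None:
            counts[current] += 1
    return counts


def stuck_units(lines, threshold=2):
    """Units that ended in INTERRUPTED at least `threshold` times (hypothetical heuristic)."""
    return [unit for unit, n in count_interrupted(lines).items() if n >= threshold]


if __name__ == "__main__":
    sample = [
        "11:15:19:WU00:FS00:0xa7:Project: 14246 (Run 0, Clone 63, Gen 143)",
        "11:15:24:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)",
        "11:16:19:WU00:FS00:0xa7:Project: 14246 (Run 0, Clone 63, Gen 143)",
        "11:16:24:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)",
    ]
    print(stuck_units(sample))  # [(14246, 0, 63, 143)]
```

A unit that finishes normally ends with FINISHED_UNIT instead, so it never reaches the threshold.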

Re: Project: 14245 (Run 0, Clone 84, Gen 146)

Posted: Sat Dec 14, 2019 3:42 pm
by bollix47
It's not clear from your posts, which don't include your configuration section, why your slot is set to 10 or 11 CPUs for folding. In any case, 11 (11x1) never works, and for some projects neither does 10 (2x5), because of a prime factor. The only prime factors that work consistently are 2 and 3. Try reconfiguring your CPU slot to use either 12 (3x4) or 9 (3x3) CPUs.
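The core's own message, "Reducing thread count from 11 to 10 to avoid domain decomposition by a prime number > 3", suggests it only checks whether the count itself is a prime above 3, which would explain why 10 (2x5) is let through. Here is a minimal Python sketch contrasting that inferred rule with the advice above (use counts whose only prime factors are 2 and 3). Both rules are hypothetical illustrations; the core's actual logic is an assumption read off the log line, not its source code.

```python
from __future__ import annotations


def is_prime(n: int) -> bool:
    """Trial-division primality test; fine for CPU-count-sized n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def prime_factors(n: int) -> list[int]:
    """Prime factors of n with multiplicity, e.g. 12 -> [2, 2, 3]."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def core_style_reduce(threads: int) -> int:
    """Assumed core rule (inferred from the log): step down while the count is a prime > 3."""
    while threads > 3 and is_prime(threads):
        threads -= 1
    return threads


def only_2_and_3(threads: int) -> bool:
    """The advice above: every prime factor of the count is 2 or 3."""
    return all(p in (2, 3) for p in prime_factors(threads))


if __name__ == "__main__":
    for n in (9, 10, 11, 12):
        print(n, prime_factors(n), core_style_reduce(n), only_2_and_3(n))
    # 9 [3, 3] 9 True
    # 10 [2, 5] 10 False
    # 11 [11] 10 False
    # 12 [2, 2, 3] 12 True
```

Under the inferred rule, 11 is reduced to 10, but 10 passes unchanged even though its factor 5 is exactly what the advice warns against.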

Re: Project: 14245 (Run 0, Clone 84, Gen 146)

Posted: Sat Dec 14, 2019 5:36 pm
by bruce
Documented the restart loop as a bug in FahCore_a7.
https://github.com/FoldingAtHome/fah-issues/issues/1307

Re: Project: 14245 (Run 0, Clone 84, Gen 146)

Posted: Sun Dec 15, 2019 4:48 pm
by DeeGee
@bollix47, ah, I forgot that I had set the thread count to 11. I'll change it to 12, which should still leave enough threads free for other things running on the same machine. That said, if 10 doesn't work either, the client should automatically drop the thread count to something else.
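To illustrate why a fixed "avoid primes > 3" rule is not enough, and what an automatic fallback might look like, here is a minimal Python sketch of a simplified domain decomposition check. It assumes a rectangular box, that every rank does particle work (no separate PME ranks), and no `-dds` load-balancing margin. The 6.0 nm cubic box is a hypothetical value, since the work unit's real box size does not appear in the logs; only the 1.45733 nm minimum cell size comes from the error message. The function names are illustrative, not GROMACS or FAHClient APIs.

```python
MIN_CELL_NM = 1.45733  # minimum cell size from the GROMACS error above


def grids(n):
    """Yield every (nx, ny, nz) with nx * ny * nz == n."""
    for nx in range(1, n + 1):
        if n % nx:
            continue
        rest = n // nx
        for ny in range(1, rest + 1):
            if rest % ny == 0:
                yield nx, ny, rest // ny


def feasible_grid(n, box, min_cell=MIN_CELL_NM):
    """Return the first grid whose cells are all >= min_cell, else None.

    Simplified model: rectangular box (lengths in nm), every rank does
    particle work, and no -dds load-balancing margin is applied.
    """
    for grid in grids(n):
        if all(length / cells >= min_cell for length, cells in zip(box, grid)):
            return grid
    return None


def fallback_thread_count(requested, box, min_cell=MIN_CELL_NM):
    """Hypothetical auto-fallback: largest n <= requested with a valid grid."""
    for n in range(requested, 0, -1):
        if feasible_grid(n, box, min_cell) is not None:
            return n
    return None  # only if the box is smaller than one cell in some dimension


if __name__ == "__main__":
    box = (6.0, 6.0, 6.0)  # hypothetical cubic box; the real box is not in the logs
    for n in (9, 10, 11, 12):
        print(n, feasible_grid(n, box))
    # 9 (1, 3, 3)
    # 10 None
    # 11 None
    # 12 (1, 3, 4)
    print("fallback from 11:", fallback_thread_count(11, box))  # fallback from 11: 9
```

With that hypothetical box, 10 and 11 have no valid grid because every factorization needs at least 5 cells along one 6.0 nm edge (6.0 / 5 = 1.2 nm, below 1.45733 nm), while 9 and 12 fit. A fallback that simply steps down from 11 therefore lands on 9, not 10.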

Re: Project: 14245 (Run 0, Clone 84, Gen 146)

Posted: Sun Jun 21, 2020 8:29 am
by uyaem
This project doesn't seem to work with -np 21 (which results in 16 + 5 PME ranks); I had one fail overnight.
I believe projects can be configured not to be assigned to certain configurations, which is why I'm reporting it.

Unlike for the others in this thread, FAHClient cleaned up immediately and grabbed another one :)
Maybe that's an improvement in version 7.6.13?

Here are the relevant parts of the log:

05:25:45:WU02:FS00:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:14245 run:0 clone:79 gen:270 core:0xa7 unit:0x000001d380fccb0a5d6fe0b4d8c36094
[...]
05:25:45:WU02:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\X\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/avx/Core_a7.fah/FahCore_a7.exe -dir 02 -suffix 01 -version 706 -lifeline 11704 -checkpoint 15 -np 21
[...]
05:25:45:WU02:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
05:25:45:WU02:FS00:0xa7:ERROR:Source code file: C:\build\fah\core-a7-avx-release\windows-10-64bit-core-a7-avx-release\gromacs-core\build\gromacs\src\gromacs\mdlib\domdec.c, line: 6902
05:25:45:WU02:FS00:0xa7:ERROR:
05:25:45:WU02:FS00:0xa7:ERROR:Fatal error:
05:25:45:WU02:FS00:0xa7:ERROR:There is no domain decomposition for 16 ranks that is compatible with the given box and a minimum cell size of 1.45733 nm
[...]
05:25:50:WU02:FS00:0xa7:Folding@home Core Shutdown: BAD_WORK_UNIT
05:25:51:WARNING:WU02:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
05:25:51:WU02:FS00:Sending unit results: id:02 state:SEND error:FAULTY project:14245 run:0 clone:79 gen:270 core:0xa7 unit:0x000001d380fccb0a5d6fe0b4d8c36094
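The same minimum cell size gives a rough lower bound on how large the box must be before a given number of particle-particle (PP) ranks can be decomposed at all. Here is a minimal Python sketch, assuming a cubic box and ignoring the extra margins GROMACS applies; the 16 PP ranks are the 21 threads minus the 5 PME ranks reported above. The function names are illustrative only.

```python
MIN_CELL_NM = 1.45733  # from the GROMACS error message


def grids(n):
    """All (nx, ny, nz) with nx * ny * nz == n."""
    out = []
    for nx in range(1, n + 1):
        if n % nx == 0:
            rest = n // nx
            for ny in range(1, rest + 1):
                if rest % ny == 0:
                    out.append((nx, ny, rest // ny))
    return out


def min_cubic_edge(n_pp, min_cell=MIN_CELL_NM):
    """Smallest cubic box edge (nm) that admits some n_pp-rank grid.

    The best grid is the one whose largest dimension is smallest,
    because every cell along that axis must be at least min_cell wide.
    """
    best = min(max(g) for g in grids(n_pp))
    return best * min_cell


if __name__ == "__main__":
    total_threads, pme_ranks = 21, 5      # from the Windows log above
    pp_ranks = total_threads - pme_ranks  # 16 particle-particle ranks
    for n in (9, 10, 11, 12, pp_ranks):
        print(n, round(min_cubic_edge(n), 2))
    # 9 4.37
    # 10 7.29
    # 11 16.03
    # 12 4.37
    # 16 5.83
```

Under these assumptions, 12 ranks need no larger a box than 9, while 10 needs room for 5 cells across and 16 needs room for 4. That may be why a unit that runs on smaller slots can still fail with -np 21.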

Re: Project: 14245 (Run 0, Clone 84, Gen 146)

Posted: Sun Jun 21, 2020 1:57 pm
by Joe_H
That is a difference between the Windows and Linux clients, or more precisely, between the folding cores on each OS. The error code the core sets changes how the client counts errors before discarding the WU and moving on to the next one (in the logs above, the Linux core returned INTERRUPTED (102 = 0x66), while the Windows core returned BAD_WORK_UNIT (114 = 0x72)). So on Windows the client reaches the error limit, while on Linux the WU goes into a restart loop.
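A minimal model of the behavior described above, using the exit codes that appear in this thread's logs. It treats BAD_WORK_UNIT as ending the unit immediately, as in the Windows log, and INTERRUPTED as a plain restart that never counts toward the limit, as in the Linux logs. The error limit, the function names and the decision logic are illustrative assumptions, not FAHClient's actual code.

```python
from enum import IntEnum


class CoreExit(IntEnum):
    """Core exit codes as they appear in the logs in this thread."""
    FINISHED_UNIT = 0x64  # 100
    INTERRUPTED = 0x66    # 102
    BAD_WORK_UNIT = 0x72  # 114


ERROR_LIMIT = 5  # hypothetical value; the client's real limit is not shown here


def next_action(code, failures):
    """Return (action, failures) after one core exit. Illustrative model only."""
    if code == CoreExit.FINISHED_UNIT:
        return "send_results", failures
    if code == CoreExit.BAD_WORK_UNIT:
        # Matches the Windows log: the unit is sent back as FAULTY.
        return "send_faulty", failures
    if code == CoreExit.INTERRUPTED:
        # Assumed not to count as a failure, so the same WU is restarted.
        return "restart", failures
    # Any other code is counted; the WU is dumped once the limit is reached.
    failures += 1
    return ("dump" if failures >= ERROR_LIMIT else "restart"), failures


def simulate(code, max_attempts=10):
    """Feed the same exit code repeatedly and collect the actions taken."""
    actions, failures = [], 0
    for _ in range(max_attempts):
        action, failures = next_action(code, failures)
        actions.append(action)
        if action != "restart":
            break
    return actions


if __name__ == "__main__":
    print(simulate(CoreExit.BAD_WORK_UNIT))     # ['send_faulty']
    print(len(simulate(CoreExit.INTERRUPTED)))  # 10: the loop never ends on its own
```

In this model, only the `max_attempts` cap in the simulation stops the INTERRUPTED case, which corresponds to deleting the work folder by hand on Linux.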