Issues, perhaps bad WU? - 16417


pasamio
Posts: 4
Joined: Thu Apr 02, 2020 7:05 am

Re: Issues, perhaps bad WU? - 16417

Post by pasamio »

Ran into this today with PRCG 14523 (433, 0, 24): it kept failing with a CPU configuration of 11 or 12. I reduced it to 4 and it seems to have unblocked itself.

19:41:42:WARNING:WU00:FS00:FahCore returned: WU_STALLED (127 = 0x7f)
19:41:42:WU00:FS00:Starting
19:41:42:WU00:FS00:Running FahCore: /usr/local/bin/FAHCoreWrapper "/Library/Application Support/FAHClient/cores/cores.foldingathome.org/v7/osx/64bit/avx/Core_a7.fah/FahCore_a7" -dir 00 -suffix 01 -version 705 -lifeline 58 -checkpoint 15 -np 11
19:41:42:WU00:FS00:Started FahCore on PID 60027
19:41:42:WU00:FS00:Core PID:60028
19:41:42:WU00:FS00:FahCore 0xa7 started
19:41:43:WU00:FS00:0xa7:*********************** Log Started 2020-04-26T19:41:42Z ***********************
19:41:43:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
19:41:43:WU00:FS00:0xa7:       Type: 0xa7
19:41:43:WU00:FS00:0xa7:       Core: Gromacs
19:41:43:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 705 -lifeline 60027 -checkpoint 15 -np
19:41:43:WU00:FS00:0xa7:             11
19:41:43:WU00:FS00:0xa7:************************************ CBang *************************************
19:41:43:WU00:FS00:0xa7:       Date: Oct 26 2019
19:41:43:WU00:FS00:0xa7:       Time: 03:00:53
19:41:43:WU00:FS00:0xa7:   Revision: 3b1c887e9f30a608262e0d62833b273e843f7c1b
19:41:43:WU00:FS00:0xa7:     Branch: master
19:41:43:WU00:FS00:0xa7:   Compiler: GNU 4.2.1 Compatible Apple LLVM 11.0.0 (clang-1100.0.33.8)
19:41:43:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -mmacosx-version-min=10.7
19:41:43:WU00:FS00:0xa7:             -Wno-unused-local-typedefs -stdlib=libc++ -fPIC
19:41:43:WU00:FS00:0xa7:   Platform: darwin 19.0.0
19:41:43:WU00:FS00:0xa7:       Bits: 64
19:41:43:WU00:FS00:0xa7:       Mode: Release
19:41:43:WU00:FS00:0xa7:************************************ System ************************************
19:41:43:WU00:FS00:0xa7:        CPU: Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
19:41:43:WU00:FS00:0xa7:     CPU ID: GenuineIntel Family 6 Model 158 Stepping 10
19:41:43:WU00:FS00:0xa7:       CPUs: 12
19:41:43:WU00:FS00:0xa7:     Memory: 32.00GiB
19:41:43:WU00:FS00:0xa7:Free Memory: 166.48MiB
19:41:43:WU00:FS00:0xa7:    Threads: POSIX_THREADS
19:41:43:WU00:FS00:0xa7: OS Version: 10.14
19:41:43:WU00:FS00:0xa7:Has Battery: true
19:41:43:WU00:FS00:0xa7: On Battery: false
19:41:43:WU00:FS00:0xa7: UTC Offset: -7
19:41:43:WU00:FS00:0xa7:        PID: 60028
19:41:43:WU00:FS00:0xa7:        CWD: /Library/Application Support/FAHClient/work
19:41:43:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
19:41:43:WU00:FS00:0xa7:    Version: 0.0.18
19:41:43:WU00:FS00:0xa7:     Author: Joseph Coffland <[email protected]>
19:41:43:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
19:41:43:WU00:FS00:0xa7:   Homepage: https://foldingathome.org/
19:41:43:WU00:FS00:0xa7:       Date: Oct 26 2019
19:41:43:WU00:FS00:0xa7:       Time: 03:06:33
19:41:43:WU00:FS00:0xa7:   Revision: fcc08f30b8997509aaba3a213354c363f474e056
19:41:43:WU00:FS00:0xa7:     Branch: master
19:41:43:WU00:FS00:0xa7:   Compiler: GNU 4.2.1 Compatible Apple LLVM 11.0.0 (clang-1100.0.33.8)
19:41:43:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -mmacosx-version-min=10.7
19:41:43:WU00:FS00:0xa7:             -Wno-unused-local-typedefs -stdlib=libc++
19:41:43:WU00:FS00:0xa7:   Platform: darwin 19.0.0
19:41:43:WU00:FS00:0xa7:       Bits: 64
19:41:43:WU00:FS00:0xa7:       Mode: Release
19:41:43:WU00:FS00:0xa7:************************************ Build *************************************
19:41:43:WU00:FS00:0xa7:       SIMD: avx_256
19:41:43:WU00:FS00:0xa7:********************************************************************************
19:41:43:WU00:FS00:0xa7:Project: 14523 (Run 433, Clone 0, Gen 24)
19:41:43:WU00:FS00:0xa7:Unit: 0x0000002480fccb0a5e459bdfb691f19b
19:41:43:WU00:FS00:0xa7:Reading tar file core.xml
19:41:43:WU00:FS00:0xa7:Reading tar file frame24.tpr
19:41:43:WU00:FS00:0xa7:Digital signatures verified
19:41:43:WU00:FS00:0xa7:Reducing thread count from 11 to 10 to avoid domain decomposition by a prime number > 3
19:41:43:WU00:FS00:0xa7:Calling: mdrun -s frame24.tpr -o frame24.trr -x frame24.xtc -cpt 15 -nt 10
19:41:43:WU00:FS00:0xa7:Steps: first=6000000 total=250000
19:41:43:WU00:FS00:0xa7:ERROR:
19:41:43:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
19:41:43:WU00:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
19:41:43:WU00:FS00:0xa7:ERROR:Source code file: /Users/buildbot/fah/osx-10.11-64bit-core-a7-avx-release/osx-10.11-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
19:41:43:WU00:FS00:0xa7:ERROR:
19:41:43:WU00:FS00:0xa7:ERROR:Fatal error:
19:41:43:WU00:FS00:0xa7:ERROR:There is no domain decomposition for 10 ranks that is compatible with the given box and a minimum cell size of 1.4227 nm
19:41:43:WU00:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
19:41:43:WU00:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
19:41:43:WU00:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
19:41:43:WU00:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
19:41:43:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
19:41:48:WU00:FS00:0xa7:WARNING:Unexpected exit() call
19:41:48:WU00:FS00:0xa7:WARNING:Unexpected exit from science code
19:41:48:WU00:FS00:0xa7:Saving result file ../logfile_01.txt
19:41:48:WU00:FS00:0xa7:Saving result file md.log
19:41:48:WU00:FS00:0xa7:Saving result file science.log
19:51:43:WARNING:WU00:FS00:FahCore returned: WU_STALLED (127 = 0x7f)
19:51:43:WU00:FS00:Starting

This was at the end of md.log:

Initializing Domain Decomposition on 10 ranks
Dynamic load balancing: auto
Will sort the charge groups at every domain (re)decomposition
Initial maximum inter charge-group distances:
    two-body bonded interactions: 0.425 nm, LJ-14, atoms 1187 1195
  multi-body bonded interactions: 0.425 nm, Proper Dih., atoms 1187 1195
Minimum cell size due to bonded interactions: 0.467 nm
Maximum distance for 7 constraints, at 120 deg. angles, all-trans: 1.138 nm
Estimated maximum distance required for P-LINCS: 1.138 nm
This distance will limit the DD cell size, you can override this with -rcon
Using 0 separate PME ranks, as there are too few total
 ranks for efficient splitting
Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
Optimizing the DD grid for 10 cells with a minimum initial size of 1.423 nm
The maximum allowed number of cells is: X 4 Y 4 Z 3
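
The numbers in this md.log excerpt can be checked by hand. What follows is a minimal sketch, not the actual GROMACS grid search: it only tests whether a rank count can be split into an X by Y by Z grid that stays within the per-axis cell limits printed above. The function name `feasible_grids` is made up for illustration, and the constants are copied from the log.

```python
from itertools import product

def feasible_grids(n_ranks, max_cells):
    """Return every (nx, ny, nz) with nx*ny*nz == n_ranks and each axis within max_cells."""
    mx, my, mz = max_cells
    return [
        (nx, ny, nz)
        for nx, ny, nz in product(range(1, mx + 1), range(1, my + 1), range(1, mz + 1))
        if nx * ny * nz == n_ranks
    ]

# Values copied from the md.log excerpt; nothing here is read from GROMACS itself.
MAX_CELLS = (4, 4, 3)                 # "The maximum allowed number of cells is: X 4 Y 4 Z 3"
min_cell = max(0.467, 1.138) / 0.8    # larger of bonded / P-LINCS distance, scaled by 1/0.8 (-dds)

print(f"minimum initial cell size from the rounded log values: {min_cell:.4f} nm")
for n in range(1, 13):
    grids = feasible_grids(n, MAX_CELLS)
    print(n, grids[0] if grids else "no grid fits")
```

Under this simplified check, 10 ranks has no valid grid: every way of factoring 10 puts a 5 or a 10 on some axis, and no axis allows more than 4 cells. The same check would accept 12 (for example 4 x 3 x 1), yet 12 is reported as failing too, so this sketch should not be read as the full mdrun logic.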

PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Location: Land Of The Long White Cloud

Re: Issues, perhaps bad WU? - 16417

Post by PantherX »

Thanks for that. I will notify the researcher about this :)

bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Issues, perhaps bad WU? - 16417

Post by bruce »

I'm a bit surprised that 8 or 9 didn't work. For 10, the factor of 5 does often cause problems but it's not consistent.
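
To see which thread counts carry a large prime factor, here is a small sketch that lists the largest prime factor for each count from 1 to 12. The helper `largest_prime_factor` is a made-up name for illustration and is not part of FAHClient or GROMACS; the "> 3" threshold comes from the core's own log message about avoiding "a prime number > 3".

```python
def largest_prime_factor(n):
    """Largest prime factor of n by trial division (returns 1 for n == 1)."""
    largest, p = 1, 2
    while p * p <= n:
        while n % p == 0:
            largest, n = p, n // p
        p += 1
    # Whatever remains above 1 is itself a prime factor, and the largest one.
    return n if n > 1 else largest

for threads in range(1, 13):
    lpf = largest_prime_factor(threads)
    note = "has a prime factor > 3" if lpf > 3 else ""
    print(f"{threads:2d} threads: largest prime factor {lpf} {note}")
```

In that range only 5, 7, 10 and 11 are flagged. That fits the log, where the core dropped 11 to 10, and 10 still carries the factor of 5. Counts such as 8, 9 and 12 factor into 2s and 3s only.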