ERROR:There is no domain decomposition for 10 ranks that is

Moderators: Site Moderators, FAHC Science Team

HannuN
Posts: 6
Joined: Sun Mar 29, 2020 5:41 pm

ERROR:There is no domain decomposition for 10 ranks that is

Post by HannuN »

Code: Select all

...
02:36:12:WU01:FS00:Starting
02:36:12:WU01:FS00:Removing old file 'work/01/logfile_01-20200621-020412.txt'
02:36:12:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 01 -suffix 01 -version 706 -lifeline 23033 -checkpoint 15 -np 11
02:36:12:WU01:FS00:Started FahCore on PID 3582
02:36:12:WU01:FS00:Core PID:3586
02:36:12:WU01:FS00:FahCore 0xa7 started
02:36:13:WU01:FS00:0xa7:*********************** Log Started 2020-06-21T02:36:12Z ***********************
02:36:13:WU01:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
02:36:13:WU01:FS00:0xa7:       Type: 0xa7
02:36:13:WU01:FS00:0xa7:       Core: Gromacs
02:36:13:WU01:FS00:0xa7:       Args: -dir 01 -suffix 01 -version 706 -lifeline 3582 -checkpoint 15 -np
02:36:13:WU01:FS00:0xa7:             11
02:36:13:WU01:FS00:0xa7:************************************ CBang *************************************
02:36:13:WU01:FS00:0xa7:       Date: Nov 5 2019
02:36:13:WU01:FS00:0xa7:       Time: 06:06:57
02:36:13:WU01:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
02:36:13:WU01:FS00:0xa7:     Branch: master
02:36:13:WU01:FS00:0xa7:   Compiler: GNU 8.3.0
02:36:13:WU01:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
02:36:13:WU01:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
02:36:13:WU01:FS00:0xa7:       Bits: 64
02:36:13:WU01:FS00:0xa7:       Mode: Release
02:36:13:WU01:FS00:0xa7:************************************ System ************************************
02:36:13:WU01:FS00:0xa7:        CPU: Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
02:36:13:WU01:FS00:0xa7:     CPU ID: GenuineIntel Family 6 Model 63 Stepping 2
02:36:13:WU01:FS00:0xa7:       CPUs: 12
02:36:13:WU01:FS00:0xa7:     Memory: 31.29GiB
02:36:13:WU01:FS00:0xa7:Free Memory: 26.72GiB
02:36:13:WU01:FS00:0xa7:    Threads: POSIX_THREADS
02:36:13:WU01:FS00:0xa7: OS Version: 5.3
02:36:13:WU01:FS00:0xa7:Has Battery: false
02:36:13:WU01:FS00:0xa7: On Battery: false
02:36:13:WU01:FS00:0xa7: UTC Offset: 2
02:36:13:WU01:FS00:0xa7:        PID: 3586
02:36:13:WU01:FS00:0xa7:        CWD: /var/lib/fahclient/work
02:36:13:WU01:FS00:0xa7:******************************** Build - libFAH ********************************
02:36:13:WU01:FS00:0xa7:    Version: 0.0.18
02:36:13:WU01:FS00:0xa7:     Author: Joseph Coffland <[email protected]>
02:36:13:WU01:FS00:0xa7:  Copyright: 2019 foldingathome.org
02:36:13:WU01:FS00:0xa7:   Homepage: https://foldingathome.org/
02:36:13:WU01:FS00:0xa7:       Date: Nov 5 2019
02:36:13:WU01:FS00:0xa7:       Time: 06:13:26
02:36:13:WU01:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
02:36:13:WU01:FS00:0xa7:     Branch: master
02:36:13:WU01:FS00:0xa7:   Compiler: GNU 8.3.0
02:36:13:WU01:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
02:36:13:WU01:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
02:36:13:WU01:FS00:0xa7:       Bits: 64
02:36:13:WU01:FS00:0xa7:       Mode: Release
02:36:13:WU01:FS00:0xa7:************************************ Build *************************************
02:36:13:WU01:FS00:0xa7:       SIMD: avx_256
02:36:13:WU01:FS00:0xa7:********************************************************************************
02:36:13:WU01:FS00:0xa7:Project: 14523 (Run 915, Clone 2, Gen 50)
02:36:13:WU01:FS00:0xa7:Unit: 0x0000004f80fccb0a5e459bc34b84af55
02:36:13:WU01:FS00:0xa7:Reading tar file core.xml
02:36:13:WU01:FS00:0xa7:Reading tar file frame50.tpr
02:36:13:WU01:FS00:0xa7:Digital signatures verified
02:36:13:WU01:FS00:0xa7:Reducing thread count from 11 to 10 to avoid domain decomposition by a prime number > 3
02:36:13:WU01:FS00:0xa7:Calling: mdrun -s frame50.tpr -o frame50.trr -x frame50.xtc -cpt 15 -nt 10
02:36:13:WU01:FS00:0xa7:Steps: first=12500000 total=250000
02:36:13:WU01:FS00:0xa7:ERROR:
02:36:13:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
02:36:13:WU01:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
02:36:13:WU01:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
02:36:13:WU01:FS00:0xa7:ERROR:
02:36:13:WU01:FS00:0xa7:ERROR:Fatal error:
02:36:13:WU01:FS00:0xa7:ERROR:There is no domain decomposition for 10 ranks that is compatible with the given box and a minimum cell size of 1.4227 nm
02:36:13:WU01:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
02:36:13:WU01:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
02:36:13:WU01:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
02:36:13:WU01:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
02:36:13:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
02:36:17:WU01:FS00:0xa7:WARNING:Unexpected exit() call
02:36:17:WU01:FS00:0xa7:WARNING:Unexpected exit from science code
02:36:17:WU01:FS00:0xa7:Saving result file ../logfile_01.txt
02:36:17:WU01:FS00:0xa7:Saving result file md.log
02:36:17:WU01:FS00:0xa7:Saving result file science.log
02:36:18:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
Mod Edit: Added Code Tags - PantherX
uyaem
Posts: 219
Joined: Sat Mar 21, 2020 7:35 pm
Location: Esslingen, Germany

Re: ERROR:There is no domain decomposition for 10 ranks that

Post by uyaem »

Did your client dump the WU after, or is it stuck in a loop with this?
If stuck in a loop, temporarily reduce the CPU count to 9 or 8.
CPU: Ryzen 9 3900X (1x21 CPUs) ~ GPU: nVidia GeForce GTX 1660 Super (Asus)
HannuN
Posts: 6
Joined: Sun Mar 29, 2020 5:41 pm

Re: ERROR:There is no domain decomposition for 10 ranks that

Post by HannuN »

It is stuck in a loop.

I'm fairly fed up with this. It happens more often recently.
I check the clients at most once per 24h.

Code: Select all

21:12:02:WU02:FS00:0xa7:ERROR:
21:12:02:WU02:FS00:0xa7:ERROR:-------------------------------------------------------
21:12:02:WU02:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
21:12:02:WU02:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
21:12:02:WU02:FS00:0xa7:ERROR:
21:12:02:WU02:FS00:0xa7:ERROR:Fatal error:
21:12:02:WU02:FS00:0xa7:ERROR:There is no domain decomposition for 10 ranks that is compatible with the given box and a minimum cell size of 1.4227 nm
21:12:02:WU02:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
21:12:02:WU02:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
21:12:02:WU02:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
21:12:02:WU02:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
21:12:02:WU02:FS00:0xa7:ERROR:-------------------------------------------------------
21:12:02:WU01:FS01:0x22:Completed 90000 out of 600000 steps (15%)
21:12:07:WU02:FS00:0xa7:WARNING:Unexpected exit() call
21:12:07:WU02:FS00:0xa7:WARNING:Unexpected exit from science code

Joe_H
Site Admin
Posts: 7937
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: ERROR:There is no domain decomposition for 10 ranks that

Post by Joe_H »

Then change the CPU thread setting to something else. As shown in your first log, you have 12 CPU threads, and half of those come from HT. Using just the 6 FPUs by assigning a value of 6 will get about 75-80% of the possible processing power out of your system. Leaving it set at 11 will always be adjusted down to 10.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
JimboPalmer
Posts: 2522
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA

Re: ERROR:There is no domain decomposition for 10 ranks that

Post by JimboPalmer »

Welcome to Folding@Home!

I am going to give some Philosophy, then advice.

If a GPU exists, one thread (F@H calls them CPUs) is devoted to each GPU. In your example, while you have 12 threads, the most you can fold on is 11, as one is feeding data to the GPU.

The science code is called GROMACS, and there is a team of programmers who work on improving it. Then F@H uses GROMACS to fold on CPUs.
https://en.wikipedia.org/wiki/GROMACS

Sadly, GROMACS has issues with 'large' prime numbers and numbers with 'large' prime factors. 2 and 3 are not large, 5 is sometimes large, and 7 and up are always large. In your case, 10 has 2 and 5 as factors and 5 is causing trouble. You will notice that the numbers being recommended only have 2 and 3 as factors.

(During your schooling, they swore you would use that math as a grown up. This is it)

So that is why folks are suggesting 9, 8, or 6 as better choices than 11 or 10.
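To make the rule of thumb above concrete, here is a minimal Python sketch that labels each thread count by its largest prime factor. The function names and the three labels are hypothetical and only encode the guideline described in this post (2 and 3 fine, 5 sometimes a problem, 7 and up a problem); this is not how GROMACS itself decides whether a decomposition exists, since that also depends on the box and the minimum cell size.

```python
def largest_prime_factor(n: int) -> int:
    """Return the largest prime factor of n (requires n >= 2)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    largest = 1
    p = 2
    while p * p <= n:
        while n % p == 0:
            largest = p
            n //= p
        p += 1
    if n > 1:
        # Whatever is left is a prime larger than every factor removed so far.
        largest = n
    return largest


def classify(threads: int) -> str:
    """Label a thread count using the forum rule of thumb (an assumption)."""
    if threads == 1:
        return "good"
    factor = largest_prime_factor(threads)
    if factor <= 3:
        return "good"
    if factor == 5:
        return "sometimes fails"
    return "risky"


if __name__ == "__main__":
    for n in range(1, 13):
        print(n, classify(n))
```

With these labels, 10 comes out as "sometimes fails" (factors 2 and 5) and 11 as "risky", while 6, 8, 9 and 12 are "good".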

You are using Linux, which I know little about, but if you have a GUI that supports FAHControl, here are the directions.

Start fahcontrol

On the screen to the left is a Configure button, click it

Now you get a screen with a Slots tab, click it

On this white field should be a cpu item, click it and then click edit

By default F@H sets the number of CPUs to -1, meaning let the software decide.
You can enter any number from 1 to the number of threads your CPU supports.

If you have GPUs, F@H reserves one CPU per GPU to feed it data across the PCIE bus.

1, 2, 3, 4, 6, 8, and 9 are good numbers of CPUs to choose.
5 and 10 may work most of the time. Other numbers will bite you.
Type the number you want, and click save.
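If you would rather work out the number than read it off the list, the sketch below picks the largest count whose only prime factors are 2 and 3, after setting aside one thread per GPU as described above. The helper names are hypothetical and the selection rule is just the guideline from this post, not something the F@H client does for you.

```python
def has_only_factors_2_and_3(n: int) -> bool:
    """True if n is 1 or is built only from the primes 2 and 3."""
    for p in (2, 3):
        while n % p == 0:
            n //= p
    return n == 1


def suggest_cpu_count(total_threads: int, gpus: int = 0) -> int:
    """Largest 'safe' CPU count after reserving one thread per GPU."""
    available = total_threads - gpus
    if available < 1:
        raise ValueError("no threads left for CPU folding")
    for n in range(available, 0, -1):
        if has_only_factors_2_and_3(n):
            return n
    raise AssertionError("unreachable: 1 always qualifies")


if __name__ == "__main__":
    # 12 threads with one GPU leaves 11; 11 and 10 are skipped, 9 is chosen.
    print(suggest_cpu_count(12, gpus=1))
```

This only maximizes the thread count under the factor rule; a lower value such as the number of physical cores can still be the better choice on a hyperthreaded CPU.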
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
HannuN
Posts: 6
Joined: Sun Mar 29, 2020 5:41 pm

Re: ERROR:There is no domain decomposition for 10 ranks that

Post by HannuN »

...
14:38:43:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 25533
14:38:43:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 25533
14:38:43:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 25533
14:38:43:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 25533
14:38:43:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 25533
14:38:43:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 25533
14:38:43:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 25533
14:38:43:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 25533
14:38:43:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 25533
14:38:43:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 25533
14:38:43:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 25533
14:38:43:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
_r2w_ben
Posts: 285
Joined: Wed Apr 23, 2008 3:11 pm

Re: ERROR:There is no domain decomposition for 10 ranks that

Post by _r2w_ben »

HannuN, thanks for the reports! Could you include the project number for the most recent occurrence?