As of last night, CPU folding is not working for me; my 2 GPUs are working just fine.
I'm running Ubuntu and have been trying to kill only the FahCore_a7 process, but I'm not getting any results.
Please advise (a sample of the error log is pasted below).
Because of the error, my disk is filling up, as the following line gets printed over and over:
12441:04:30:55:WU02:FS00:0xa7:Caught signal SIGSEGV(11) on PID 3167
Thanks!
381:04:30:50:WU02:FS00:Core PID:3167
382:04:30:50:WU02:FS00:FahCore 0xa7 started
383:04:30:51:WU02:FS00:0xa7:*********************** Log Started 2020-06-11T04:30:50Z ***********************
384:04:30:51:WU02:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
385:04:30:51:WU02:FS00:0xa7: Type: 0xa7
386:04:30:51:WU02:FS00:0xa7: Core: Gromacs
387:04:30:51:WU02:FS00:0xa7: Args: -dir 02 -suffix 01 -version 706 -lifeline 3163 -checkpoint 15 -np
388:04:30:51:WU02:FS00:0xa7: 22
389:04:30:51:WU02:FS00:0xa7:************************************ CBang *************************************
390:04:30:51:WU02:FS00:0xa7: Date: Nov 5 2019
391:04:30:51:WU02:FS00:0xa7: Time: 06:06:57
392:04:30:51:WU02:FS00:0xa7: Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
393:04:30:51:WU02:FS00:0xa7: Branch: master
394:04:30:51:WU02:FS00:0xa7: Compiler: GNU 8.3.0
395:04:30:51:WU02:FS00:0xa7: Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
396:04:30:51:WU02:FS00:0xa7: Platform: linux2 4.19.0-5-amd64
397:04:30:51:WU02:FS00:0xa7: Bits: 64
398:04:30:51:WU02:FS00:0xa7: Mode: Release
399:04:30:51:WU02:FS00:0xa7:************************************ System ************************************
400:04:30:51:WU02:FS00:0xa7: CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
401:04:30:51:WU02:FS00:0xa7: CPU ID: GenuineIntel Family 6 Model 63 Stepping 2
402:04:30:51:WU02:FS00:0xa7: CPUs: 24
403:04:30:51:WU02:FS00:0xa7: Memory: 15.54GiB
404:04:30:51:WU02:FS00:0xa7:Free Memory: 10.66GiB
405:04:30:51:WU02:FS00:0xa7: Threads: POSIX_THREADS
406:04:30:51:WU02:FS00:0xa7: OS Version: 5.4
407:04:30:51:WU02:FS00:0xa7:Has Battery: false
408:04:30:51:WU02:FS00:0xa7: On Battery: false
409:04:30:51:WU02:FS00:0xa7: UTC Offset: 2
410:04:30:51:WU02:FS00:0xa7: PID: 3167
411:04:30:51:WU02:FS00:0xa7: CWD: /var/lib/fahclient/work
412:04:30:51:WU02:FS00:0xa7:******************************** Build - libFAH ********************************
413:04:30:51:WU02:FS00:0xa7: Version: 0.0.18
414:04:30:51:WU02:FS00:0xa7: Author: Joseph Coffland <[email protected]>
415:04:30:51:WU02:FS00:0xa7: Copyright: 2019 foldingathome.org
416:04:30:51:WU02:FS00:0xa7: Homepage: https://foldingathome.org/
417:04:30:51:WU02:FS00:0xa7: Date: Nov 5 2019
418:04:30:51:WU02:FS00:0xa7: Time: 06:13:26
419:04:30:51:WU02:FS00:0xa7: Revision: 490c9aa2957b725af319379424d5c5cb36efb656
420:04:30:51:WU02:FS00:0xa7: Branch: master
421:04:30:51:WU02:FS00:0xa7: Compiler: GNU 8.3.0
422:04:30:51:WU02:FS00:0xa7: Options: -std=c++11 -O3 -funroll-loops -fno-pie
423:04:30:51:WU02:FS00:0xa7: Platform: linux2 4.19.0-5-amd64
424:04:30:51:WU02:FS00:0xa7: Bits: 64
425:04:30:51:WU02:FS00:0xa7: Mode: Release
426:04:30:51:WU02:FS00:0xa7:************************************ Build *************************************
427:04:30:51:WU02:FS00:0xa7: SIMD: avx_256
428:04:30:51:WU02:FS00:0xa7:********************************************************************************
429:04:30:51:WU02:FS00:0xa7:Project: 14524 (Run 706, Clone 5, Gen 17)
430:04:30:51:WU02:FS00:0xa7:Unit: 0x0000001f80fccb0a5e781bc15bdeaaff
431:04:30:51:WU02:FS00:0xa7:Reading tar file core.xml
432:04:30:51:WU02:FS00:0xa7:Reading tar file frame17.tpr
433:04:30:51:WU02:FS00:0xa7:Digital signatures verified
434:04:30:51:WU02:FS00:0xa7:Reducing thread count from 22 to 21 to avoid domain decomposition with large prime factor 11
435:04:30:51:WU02:FS00:0xa7:Calling: mdrun -s frame17.tpr -o frame17.trr -x frame17.xtc -cpt 15 -nt 21
436:04:30:51:WU02:FS00:0xa7:Steps: first=4250000 total=250000
437:04:30:51:WU02:FS00:0xa7:ERROR:
438:04:30:51:WU02:FS00:0xa7:ERROR:-------------------------------------------------------
439:04:30:51:WU02:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
440:04:30:51:WU02:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
441:04:30:51:WU02:FS00:0xa7:ERROR:
442:04:30:51:WU02:FS00:0xa7:ERROR:Fatal error:
443:04:30:51:WU02:FS00:0xa7:ERROR:There is no domain decomposition for 16 ranks that is compatible with the given box and a minimum cell size of 1.4227 nm
444:04:30:51:WU02:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
445:04:30:51:WU02:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
446:04:30:51:WU02:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
447:04:30:51:WU02:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
448:04:30:51:WU02:FS00:0xa7:ERROR:-------------------------------------------------------
449:04:30:55:WU02:FS00:0xa7:WARNING:Unexpected exit() call
450:04:30:55:WU02:FS00:0xa7:WARNING:Unexpected exit from science code
451:04:30:55:WU02:FS00:0xa7:Saving result file ../logfile_01.txt
452:04:30:55:WU02:FS00:0xa7:Saving result file md.log
453:04:30:55:WU02:FS00:0xa7:Saving result file science.log
454:04:30:55:WU02:FS00:0xa7:Caught signal SIGSEGV(11) on PID 3167
455:04:30:55:WU02:FS00:0xa7:Caught signal SIGSEGV(11) on PID 3167
2340:04:30:55:WU02:FS00:0xa7:Caught signal SIGSEGV(11) on PID 3167
2341:04:30:55:WU02:FS00:0xa7:Caught signal SIGSEGV(11) on PID 3167
2342:04:30:55:WU02:FS00:0xa7:Caught signal SIGSEGV(11) on PID 3167
2343:04:30:55:WU02:FS00:0xa7:Caught signal SIGSEGV(11) on PID 3167
2344:04:30:55:WU02:FS00:0xa7:Caught signal SIGSEGV(11) on PID 3167
2345:04:30:55:WU02:FS00:0xa7:Caught signal SIGSEGV(11) on PID 3167
2346:04:30:55:WU02:FS00:0xa7:Caught signal SIGSEGV(11) on PID 3167
2789:04:30:55:WU02:FS00:0xa7:Caught signal SIGSEGV(11) on PID 3167
2790:04:30:55:WU02:FS00:0xa7:Caught signal SIGSEGV(11) on PID 3167
12445:04:30:55:WU02:FS00:0xa7:Caught signal SIGSEGV(11) on PID 3167
12446:04:30:55:WU02:FS00:0xa7:Caught signal SIGSEGV(11) on PID 3167
12447:04:30:55:WU02:FS00:0xa7:Caught signal SIGSEGV(11) on PID 3167
12448:04:30:56:WU02:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
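In case it helps to quantify the flood, here is a rough Python sketch for counting how often each message repeats in a log like the one above. It assumes every line starts with an optional line number followed by an HH:MM:SS timestamp, as in my paste; the function name `count_repeated_messages` is just something I made up for illustration, not part of the FAH client.

```python
import re
from collections import Counter

# Matches an optional leading line number ("12441:") plus the HH:MM:SS: timestamp,
# so identical messages logged at different times are grouped together.
PREFIX = re.compile(r"^(?:\d+:)?\d{2}:\d{2}:\d{2}:")


def count_repeated_messages(lines):
    """Return a Counter mapping each message (prefix stripped) to its count."""
    counts = Counter()
    for line in lines:
        message = PREFIX.sub("", line.strip())
        if message:  # skip blank lines
            counts[message] += 1
    return counts


# Small sample taken from the log above.
sample = [
    "454:04:30:55:WU02:FS00:0xa7:Caught signal SIGSEGV(11) on PID 3167",
    "455:04:30:55:WU02:FS00:0xa7:Caught signal SIGSEGV(11) on PID 3167",
    "12448:04:30:56:WU02:FS00:FahCore returned: INTERRUPTED (102 = 0x66)",
]
for message, n in count_repeated_messages(sample).most_common():
    print(n, message)
```

To run it on a real log, pass an open file object instead of `sample`. I would guess the client log is at /var/lib/fahclient/log.txt based on the CWD shown above, but that path is an assumption and may differ on your install.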