I currently have about 130 CPU cores folding, most of them in my servers. One AMD GPU is running fine, but for several days now I have been trying, without success, to get fahclient to run on my workstation's NVIDIA GPU.
All my computers run Linux. My workstation runs openSUSE Leap 15.1 with the current NVIDIA drivers.
I installed fahclient 7.6.9 from the 64-bit RPM.
The client recognizes the GPU, downloads a work unit (WU) and starts folding, but stops with an error almost immediately.
Here are some logs:
16:52:33:WU00:FS01:Downloading 11.98MiB
16:52:39:WU00:FS01:Download 43.31%
16:52:45:WU00:FS01:Download 66.80%
16:52:49:WU00:FS01:Download complete
16:52:49:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:11747 run:0 clone:9683 gen:10 core:0x22 unit:0x0000001b8ca304e75e6baf22f72b2965
16:52:49:WU00:FS01:Starting
16:52:49:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 00 -suffix 01 -version 706 -lifeline 1475 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
16:52:49:WU00:FS01:Started FahCore on PID 28973
16:52:49:WU00:FS01:Core PID:28977
16:52:49:WU00:FS01:FahCore 0x22 started
16:52:50:WU00:FS01:0x22:*********************** Log Started 2020-04-22T16:52:49Z ***********************
16:52:50:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
16:52:50:WU00:FS01:0x22: Type: 0x22
16:52:50:WU00:FS01:0x22: Core: Core22
16:52:50:WU00:FS01:0x22: Website: https://foldingathome.org/
16:52:50:WU00:FS01:0x22: Copyright: (c) 2009-2018 foldingathome.org
16:52:50:WU00:FS01:0x22: Author: John Chodera <[email protected]> and Rafal Wiewiora
16:52:50:WU00:FS01:0x22: <[email protected]>
16:52:50:WU00:FS01:0x22: Args: -dir 00 -suffix 01 -version 706 -lifeline 28973 -checkpoint 15
16:52:50:WU00:FS01:0x22: -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device
16:52:50:WU00:FS01:0x22: 0 -gpu 0
16:52:50:WU00:FS01:0x22: Config: <none>
16:52:50:WU00:FS01:0x22:************************************ Build *************************************
16:52:50:WU00:FS01:0x22: Version: 0.0.2
16:52:50:WU00:FS01:0x22: Date: Dec 6 2019
16:52:50:WU00:FS01:0x22: Time: 21:20:17
16:52:50:WU00:FS01:0x22: Repository: Git
16:52:50:WU00:FS01:0x22: Revision: f87d92b58abdf7e6bf2e173cfbc4dc3e837c7042
16:52:50:WU00:FS01:0x22: Branch: core22
16:52:50:WU00:FS01:0x22: Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
16:52:50:WU00:FS01:0x22: Options: -std=gnu++98 -O3 -funroll-loops
16:52:50:WU00:FS01:0x22: Platform: linux2 4.9.87-linuxkit-aufs
16:52:50:WU00:FS01:0x22: Bits: 64
16:52:50:WU00:FS01:0x22: Mode: Release
16:52:50:WU00:FS01:0x22:************************************ System ************************************
16:52:50:WU00:FS01:0x22: CPU: AMD Ryzen 9 3900X 12-Core Processor
16:52:50:WU00:FS01:0x22: CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
16:52:50:WU00:FS01:0x22: CPUs: 24
16:52:50:WU00:FS01:0x22: Memory: 62.85GiB
16:52:50:WU00:FS01:0x22:Free Memory: 48.94GiB
16:52:50:WU00:FS01:0x22: Threads: POSIX_THREADS
16:52:50:WU00:FS01:0x22: OS Version: 4.12
16:52:50:WU00:FS01:0x22:Has Battery: false
16:52:50:WU00:FS01:0x22: On Battery: false
16:52:50:WU00:FS01:0x22: UTC Offset: 2
16:52:50:WU00:FS01:0x22: PID: 28977
16:52:50:WU00:FS01:0x22: CWD: /var/lib/fahclient/work
16:52:50:WU00:FS01:0x22: OS: Linux 4.12.14-lp151.28.44-default x86_64
16:52:50:WU00:FS01:0x22: OS Arch: AMD64
16:52:50:WU00:FS01:0x22:********************************************************************************
16:52:50:WU00:FS01:0x22:Project: 11747 (Run 0, Clone 9683, Gen 10)
16:52:50:WU00:FS01:0x22:Unit: 0x0000001b8ca304e75e6baf22f72b2965
16:52:50:WU00:FS01:0x22:Reading tar file core.xml
16:52:50:WU00:FS01:0x22:Reading tar file integrator.xml
16:52:50:WU00:FS01:0x22:Reading tar file state.xml
16:52:51:WU00:FS01:0x22:Reading tar file system.xml
16:52:52:WU00:FS01:0x22:Digital signatures verified
16:52:52:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
16:52:52:WU00:FS01:0x22:Version 0.0.2
16:52:52:WU00:FS01:0x22:ERROR:exception: There is no registered Platform called "OpenCL"
16:52:52:WU00:FS01:0x22:Saving result file ../logfile_01.txt
16:52:52:WU00:FS01:0x22:Saving result file science.log
16:52:52:WU00:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
16:52:52:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
16:52:52:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:11747 run:0 clone:9683 gen:10 core:0x22 unit:0x0000001b8ca304e75e6baf22f72b2965
The relevant error line is:
16:52:52:WU00:FS01:0x22:ERROR:exception: There is no registered Platform called "OpenCL"
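The message suggests the core could not register an OpenCL platform at all. One way to see which OpenCL drivers (ICDs) are registered system-wide is to list the vendor files the ICD loader reads. This is a minimal sketch assuming the standard Khronos layout, where each `.icd` file in /etc/OpenCL/vendors names one vendor library; the function name list_opencl_icds is hypothetical.

```shell
# Hypothetical helper: list registered OpenCL ICDs.
# Assumes the Khronos ICD layout: every *.icd file in the vendors directory
# contains the name or path of one vendor OpenCL library.
list_opencl_icds() {
  local dir="${1:-/etc/OpenCL/vendors}"
  local f found=0
  for f in "$dir"/*.icd; do
    # If the glob matched nothing, "$f" is the literal pattern; skip it.
    [ -e "$f" ] || continue
    found=1
    printf '%s -> %s\n' "$(basename "$f")" "$(head -n 1 "$f")"
  done
  if [ "$found" -eq 0 ]; then
    echo "no ICD files in $dir" >&2
    return 1
  fi
}
```

Run it as `list_opencl_icds` (or pass another directory). With the NVIDIA driver installed one would expect an entry such as nvidia.icd pointing at libnvidia-opencl.so.1, although the exact file name can differ between driver packages.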
Another log from today:
18:40:23:WU00:FS01:Assigned to work server 128.252.203.10
18:40:23:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:TU116 [GeForce GTX 1660] from 128.252.203.10
18:40:23:WU00:FS01:Connecting to 128.252.203.10:8080
18:42:34:WARNING:WU00:FS01:WorkServer connection failed on port 8080 trying 80
18:42:34:WU00:FS01:Connecting to 128.252.203.10:80
18:43:37:WU00:FS01:Downloading 29.59MiB
18:43:43:WU00:FS01:Download 13.31%
18:43:49:WU00:FS01:Download 27.25%
18:43:55:WU00:FS01:Download 43.93%
18:44:01:WU00:FS01:Download 56.82%
18:44:07:WU00:FS01:Download 69.70%
18:44:13:WU00:FS01:Download 83.64%
18:44:19:WU00:FS01:Download 95.26%
18:44:21:WU00:FS01:Download complete
18:44:21:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:11761 run:0 clone:1500 gen:45 core:0x22 unit:0x0000005480fccb0a5e6d7d2c494df012
18:44:21:WU00:FS01:Starting
18:44:21:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 00 -suffix 01 -version 706 -lifeline 1577 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
18:44:21:WU00:FS01:Started FahCore on PID 2322
18:44:21:WU00:FS01:Core PID:2326
18:44:21:WU00:FS01:FahCore 0x22 started
18:44:21:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
18:44:21:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:11761 run:0 clone:1500 gen:45 core:0x22 unit:0x0000005480fccb0a5e6d7d2c494df012
18:44:21:WU00:FS01:Uploading 7.00KiB to 128.252.203.10
18:44:21:WU00:FS01:Connecting to 128.252.203.10:8080
And another one:
18:44:22:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:TU116 [GeForce GTX 1660] from 52.224.109.74
18:44:22:WU01:FS01:Connecting to 52.224.109.74:8080
18:45:39:WU01:FS01:Downloading 161.51MiB
18:45:45:WU01:FS01:Download 2.32%
18:45:51:WU01:FS01:Download 4.60%
[...]
18:49:57:WU01:FS01:Download 97.40%
18:50:03:WU01:FS01:Download 99.99%
18:50:03:WU01:FS01:Download complete
18:50:03:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:13877 run:0 clone:1458 gen:37 core:0x22 unit:0x0000003234e06d4a5e80cfeac2e5a163
18:50:03:WU01:FS01:Starting
18:50:03:WU01:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 1577 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
18:50:03:WU01:FS01:Started FahCore on PID 2421
18:50:03:WU01:FS01:Core PID:2425
18:50:03:WU01:FS01:FahCore 0x22 started
18:50:03:WU01:FS01:0x22:*********************** Log Started 2020-04-22T18:50:03Z ***********************
18:50:03:WU01:FS01:0x22:*************************** Core22 Folding@home Core ***************************
18:50:03:WU01:FS01:0x22: Type: 0x22
18:50:03:WU01:FS01:0x22: Core: Core22
18:50:03:WU01:FS01:0x22: Website: https://foldingathome.org/
18:50:03:WU01:FS01:0x22: Copyright: (c) 2009-2018 foldingathome.org
18:50:03:WU01:FS01:0x22: Author: John Chodera <[email protected]> and Rafal Wiewiora
18:50:03:WU01:FS01:0x22: <[email protected]>
18:50:03:WU01:FS01:0x22: Args: -dir 01 -suffix 01 -version 706 -lifeline 2421 -checkpoint 15
18:50:03:WU01:FS01:0x22: -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device
18:50:03:WU01:FS01:0x22: 0 -gpu 0
18:50:03:WU01:FS01:0x22: Config: <none>
18:50:03:WU01:FS01:0x22:************************************ Build *************************************
18:50:03:WU01:FS01:0x22: Version: 0.0.2
18:50:03:WU01:FS01:0x22: Date: Dec 6 2019
18:50:03:WU01:FS01:0x22: Time: 21:20:17
18:50:03:WU01:FS01:0x22: Repository: Git
18:50:03:WU01:FS01:0x22: Revision: f87d92b58abdf7e6bf2e173cfbc4dc3e837c7042
18:50:03:WU01:FS01:0x22: Branch: core22
18:50:03:WU01:FS01:0x22: Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
18:50:03:WU01:FS01:0x22: Options: -std=gnu++98 -O3 -funroll-loops
18:50:03:WU01:FS01:0x22: Platform: linux2 4.9.87-linuxkit-aufs
18:50:03:WU01:FS01:0x22: Bits: 64
18:50:03:WU01:FS01:0x22: Mode: Release
18:50:03:WU01:FS01:0x22:************************************ System ************************************
18:50:03:WU01:FS01:0x22: CPU: AMD Ryzen 9 3900X 12-Core Processor
18:50:03:WU01:FS01:0x22: CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
18:50:03:WU01:FS01:0x22: CPUs: 24
18:50:03:WU01:FS01:0x22: Memory: 62.85GiB
18:50:03:WU01:FS01:0x22:Free Memory: 49.31GiB
18:50:03:WU01:FS01:0x22: Threads: POSIX_THREADS
18:50:03:WU01:FS01:0x22: OS Version: 4.12
18:50:03:WU01:FS01:0x22:Has Battery: false
18:50:03:WU01:FS01:0x22: On Battery: false
18:50:03:WU01:FS01:0x22: UTC Offset: 2
18:50:03:WU01:FS01:0x22: PID: 2425
18:50:03:WU01:FS01:0x22: CWD: /var/lib/fahclient/work
18:50:03:WU01:FS01:0x22: OS: Linux 4.12.14-lp151.28.44-default x86_64
18:50:03:WU01:FS01:0x22: OS Arch: AMD64
18:50:03:WU01:FS01:0x22:********************************************************************************
18:50:03:WU01:FS01:0x22:Project: 13877 (Run 0, Clone 1458, Gen 37)
18:50:03:WU01:FS01:0x22:Unit: 0x0000003234e06d4a5e80cfeac2e5a163
18:50:03:WU01:FS01:0x22:Reading tar file core.xml
18:50:03:WU01:FS01:0x22:Reading tar file integrator.xml
18:50:03:WU01:FS01:0x22:Reading tar file state.xml
18:50:03:WU01:FS01:0x22:Reading tar file system.xml
18:50:04:WU01:FS01:0x22:Digital signatures verified
18:50:04:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
18:50:04:WU01:FS01:0x22:Version 0.0.2
18:50:04:WU01:FS01:0x22:ERROR:exception: There is no registered Platform called "OpenCL"
18:50:04:WU01:FS01:0x22:Saving result file ../logfile_01.txt
18:50:04:WU01:FS01:0x22:Saving result file science.log
18:50:04:WU01:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
18:50:04:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
18:50:04:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13877 run:0 clone:1458 gen:37 core:0x22 unit:0x0000003234e06d4a5e80cfeac2e5a163
18:50:04:WU01:FS01:Uploading 7.00KiB to 52.224.109.74
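All three logs end with the core returning BAD_WORK_UNIT. To check whether every GPU attempt fails the same way, the relevant lines can be pulled out of the client log. This sketch assumes the log is /var/lib/fahclient/log.txt (the client's working directory above is /var/lib/fahclient); both helper names are hypothetical.

```shell
# Hypothetical helpers for scanning a FAHClient log.
# Assumption: default log location is /var/lib/fahclient/log.txt.

# Print "WUxx: <status>" for every "FahCore returned:" line.
fah_core_results() {
  local log="${1:-/var/lib/fahclient/log.txt}"
  sed -n 's/.*:\(WU[0-9][0-9]\):FS[0-9][0-9]:FahCore returned: \(.*\)$/\1: \2/p' "$log"
}

# Print core-side exception lines, e.g. the missing "OpenCL" platform message.
fah_core_errors() {
  local log="${1:-/var/lib/fahclient/log.txt}"
  grep -F ':ERROR:' "$log"
}
```

For the first log above, fah_core_results would print a line like `WU00: BAD_WORK_UNIT (114 = 0x72)`.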
I suspect my NVIDIA CUDA/OpenCL installation is somehow broken.
I have checked it as thoroughly as I can, but I can't find the problem.
Here's what I have found so far:
nvidia-smi reports the installed driver and CUDA versions:
andreas@ws1:~> nvidia-smi
Wed Apr 22 20:17:41 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82 Driver Version: 440.82 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1660 Off | 00000000:2D:00.0 On | N/A |
| 47% 36C P8 9W / 130W | 870MiB / 5941MiB | 2% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 3875 G /usr/bin/X 615MiB |
| 0 7417 G kwin_x11 98MiB |
| 0 7422 G /usr/bin/krunner 2MiB |
| 0 7468 G /usr/bin/nextcloud 3MiB |
| 0 9004 G /usr/lib64/firefox/firefox 2MiB |
| 0 21618 G /usr/bin/plasmashell 138MiB |
+-----------------------------------------------------------------------------+
Other programs, such as hashcat, can use the GPU through OpenCL:
andreas@ws1:~> hashcat -b
hashcat (v3.00) starting in benchmark-mode...
OpenCL Platform #1: NVIDIA Corporation
======================================
- Device #1: GeForce GTX 1660, 1485/5941 MB allocatable, 22MCU
Hashtype: MD4
Speed.Dev.#1.: 32367.8 MH/s (95.09ms)
Hashtype: MD5
Speed.Dev.#1.: 17449.0 MH/s (97.95ms)
Hashtype: Half MD5
Speed.Dev.#1.: 11695.3 MH/s (96.60ms)
Hashtype: SHA1
Speed.Dev.#1.: 6489.4 MH/s (97.61ms)
...
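hashcat runs as the desktop user, whereas the client daemon drops to a separate account (fahclient, per the --run-as option in the client log). A sketch to compare how many OpenCL platforms clinfo reports for each account, assuming sudo is configured and clinfo is installed; the function names are hypothetical.

```shell
# Read clinfo output on stdin and print the platform count from its
# "Number of platforms" line.
opencl_platform_count() {
  awk '/^Number of platforms/ { print $NF; exit }'
}

# Compare OpenCL visibility for the current user and the fahclient account.
# Assumption: sudo may switch to fahclient; clinfo is in PATH for both.
compare_opencl_users() {
  local user
  for user in "$(id -un)" fahclient; do
    printf '%s: %s platform(s)\n' "$user" \
      "$(sudo -u "$user" clinfo 2>/dev/null | opencl_platform_count)"
  done
}
```

If fahclient saw 0 platforms while the desktop user saw 1, that would point at an environment or permission difference rather than a broken driver. This only approximates the environment in which the daemon starts the core.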
The FAHClient startup log detects the GPU both as a CUDA device and as an OpenCL device:
*********************** Log Started 2020-04-22T18:23:14Z ***********************
18:23:14:****************************** FAHClient ******************************
18:23:14: Version: 7.6.9
18:23:14: Author: Joseph Coffland <[email protected]>
18:23:14: Copyright: 2020 foldingathome.org
18:23:14: Homepage: https://foldingathome.org/
18:23:14: Date: Apr 17 2020
18:23:14: Time: 18:11:30
18:23:14: Revision: 398c2b17fa535e0cc6c9d10856b2154c32771646
18:23:14: Branch: master
18:23:14: Compiler: GNU 4.9.4
18:23:14: Options: -std=c++11 -ffunction-sections -fdata-sections -O3
18:23:14: -funroll-loops
18:23:14: Platform: linux2 4.19.0-5-amd64
18:23:14: Bits: 64
18:23:14: Mode: Release
18:23:14: Args: --child /etc/fahclient/config.xml --run-as fahclient
18:23:14: --pid-file=/var/run/fahclient.pid --daemon
18:23:14: Config: /etc/fahclient/config.xml
18:23:14:******************************** CBang ********************************
18:23:14: Date: Apr 17 2020
18:23:14: Time: 18:10:08
18:23:14: Revision: 2fb0be7809c5e45287a122ca5fbc15b5ae859a3b
18:23:14: Branch: master
18:23:14: Compiler: GNU 4.9.4
18:23:14: Options: -std=c++11 -ffunction-sections -fdata-sections -O3
18:23:14: -funroll-loops -fPIC
18:23:14: Platform: linux2 4.19.0-5-amd64
18:23:14: Bits: 64
18:23:14: Mode: Release
18:23:14:******************************* System ********************************
18:23:14: CPU: AMD Ryzen 9 3900X 12-Core Processor
18:23:14: CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
18:23:14: CPUs: 24
18:23:14: Memory: 62.85GiB
18:23:14: Free Memory: 49.88GiB
18:23:14: Threads: POSIX_THREADS
18:23:14: OS Version: 4.12
18:23:14: Has Battery: false
18:23:14: On Battery: false
18:23:14: UTC Offset: 2
18:23:14: PID: 1577
18:23:14: CWD: /var/lib/fahclient
18:23:14: OS: Linux 4.12.14-lp151.28.44-default x86_64
18:23:14: OS Arch: AMD64
18:23:14: GPUs: 1
18:23:14: GPU 0: Bus:45 Slot:0 Func:0 NVIDIA:7 TU116 [GeForce GTX 1660]
18:23:14: CUDA Device 0: Platform:0 Device:0 Bus:45 Slot:0 Compute:7.5 Driver:10.2
18:23:14:OpenCL Device 0: Platform:0 Device:0 Bus:45 Slot:0 Compute:1.2 Driver:440.82
18:23:14:******************************* libFAH ********************************
18:23:14: Date: Apr 15 2020
18:23:14: Time: 21:43:27
18:23:14: Revision: 216968bc7025029c841ed6e36e81a03a316890d3
18:23:14: Branch: master
18:23:14: Compiler: GNU 4.9.4
18:23:14: Options: -std=c++11 -ffunction-sections -fdata-sections -O3
18:23:14: -funroll-loops
18:23:14: Platform: linux2 4.19.0-5-amd64
18:23:14: Bits: 64
18:23:14: Mode: Release
18:23:14:***********************************************************************
The fahclient user belongs to the video group, which has read/write access to /dev/nvidia0 and /dev/nvidiactl:
andreas@ws1:/var/lib/fahclient> id fahclient
uid=454(fahclient) gid=100(users) groups=100(users),484(video)
andreas@ws1:/var/lib/fahclient> ll /dev/nvidia*
crw-rw----+ 1 root video 195, 0 21. Apr 17:10 /dev/nvidia0
crw-rw----+ 1 root video 195, 255 21. Apr 17:10 /dev/nvidiactl
crw-rw----+ 1 root video 195, 254 21. Apr 17:10 /dev/nvidia-modeset
crw-rw-rw-+ 1 root root 241, 0 21. Apr 17:10 /dev/nvidia-uvm
crw-rw-rw- 1 root root 241, 1 21. Apr 17:17 /dev/nvidia-uvm-tools
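The trailing + on the device nodes means ACLs are set, so the mode bits alone don't tell the whole story. A simple check is to let the kernel decide by testing read/write access while running as fahclient. Hypothetical helper:

```shell
# Report whether the current user can read and write each given device node.
# [ -r ] / [ -w ] go through the kernel's access checks, so ACLs are honored.
check_device_access() {
  local dev status=0
  for dev in "$@"; do
    if [ -r "$dev" ] && [ -w "$dev" ]; then
      echo "ok: $dev"
    else
      echo "denied: $dev"
      status=1
    fi
  done
  return "$status"
}
```

To run it as the client's account: `sudo -u fahclient bash -c "$(declare -f check_device_access); check_device_access /dev/nvidia*"`.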
Installed NVIDIA, Mesa, OpenCL ICD and clinfo packages:
andreas@ws1:/var/lib/fahclient> rpm -qa | grep -i "nvidia\|mesa\|icd\|clinfo" | sort
clinfo-2.2.18.04.06-lp151.2.3.x86_64
libOSMesa8-18.3.2-lp151.23.9.1.x86_64
libOSMesa8-32bit-18.3.2-lp151.23.9.1.x86_64
Mesa-18.3.2-lp151.23.9.1.x86_64
Mesa-32bit-18.3.2-lp151.23.9.1.x86_64
Mesa-demo-x-8.3.0-lp151.2.3.x86_64
Mesa-dri-18.3.2-lp151.23.9.1.x86_64
Mesa-dri-32bit-18.3.2-lp151.23.9.1.x86_64
Mesa-gallium-18.3.2-lp151.23.9.1.x86_64
Mesa-gallium-32bit-18.3.2-lp151.23.9.1.x86_64
Mesa-KHR-devel-18.3.2-lp151.23.9.1.x86_64
Mesa-libEGL1-18.3.2-lp151.23.9.1.x86_64
Mesa-libEGL-devel-18.3.2-lp151.23.9.1.x86_64
Mesa-libGL1-18.3.2-lp151.23.9.1.x86_64
Mesa-libGL1-32bit-18.3.2-lp151.23.9.1.x86_64
Mesa-libglapi0-18.3.2-lp151.23.9.1.x86_64
Mesa-libglapi0-32bit-18.3.2-lp151.23.9.1.x86_64
Mesa-libGL-devel-18.3.2-lp151.23.9.1.x86_64
Mesa-libGLESv1_CM1-18.3.2-lp151.23.9.1.x86_64
Mesa-libGLESv2-2-18.3.2-lp151.23.9.1.x86_64
Mesa-libva-18.3.2-lp151.23.9.1.x86_64
nvidia-computeG05-440.82-lp151.25.1.x86_64
nvidia-gfxG05-kmp-default-440.82_k4.12.14_lp151.27-lp151.25.1.x86_64
nvidia-glG05-440.82-lp151.25.1.x86_64
ocl-icd-devel-2.2.11-lp151.3.1.x86_64
x11-video-nvidiaG05-440.82-lp151.25.1.x86_64
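The list shows ocl-icd-devel, but the OpenCL ICD loader library itself may live in a package whose name the grep pattern above doesn't match. A sketch to list the libOpenCL files the dynamic linker knows about and the RPM that owns each, assuming an RPM-based system with ldconfig in PATH or /sbin; the function names are hypothetical.

```shell
# Print the path of every libOpenCL entry in `ldconfig -p` output (stdin).
opencl_loader_paths() {
  awk '/libOpenCL\.so/ { print $NF }' | sort -u
}

# Show each libOpenCL file in the linker cache and its owning RPM.
# No output means the linker cache contains no libOpenCL at all.
opencl_loader_owners() {
  local ldc lib
  ldc=$(command -v ldconfig || echo /sbin/ldconfig)
  "$ldc" -p | opencl_loader_paths | while read -r lib; do
    printf '%s: %s\n' "$lib" "$(rpm -qf "$lib")"
  done
}
```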
What else can I do to get F@h running on this GPU?
I'm running out of ideas...
Any help getting it up and running is much appreciated!
- andreas