
Fedora 31, Nvidia GT 1030: clGetDeviceIDs() returned -1

Posted: Sat Mar 21, 2020 11:02 am
by Markus_Laker
I'm running 64-bit Fedora 31 on a machine with an Nvidia GeForce GT 1030. I've installed version 440.64 of the binary Nvidia drivers from rpmfusion. I've also (after reading around) installed the ocl-icd-devel package. I'm using a version of FAHControl that I've modified in one way: the shebang line explicitly requests Python 2, because the default on F31 is Python 3. I've made no other changes to FAH software. When I try to add a GPU slot, FAHControl says no GPU is available. Here's my current log:

Code:

*********************** Log Started 2020-03-21T10:39:33Z ***********************
10:39:33:************************* Folding@home Client *************************
10:39:33:      Website: https://foldingathome.org/
10:39:33:    Copyright: (c) 2009-2018 foldingathome.org
10:39:33:       Author: Joseph Coffland <[email protected]>
10:39:33:         Args: --child --lifeline 13755 /etc/fahclient/config.xml --run-as
10:39:33:               fahclient --pid-file=/var/run/fahclient.pid --daemon
10:39:33:       Config: /etc/fahclient/config.xml
10:39:33:******************************** Build ********************************
10:39:33:      Version: 7.5.1
10:39:33:         Date: May 12 2018
10:39:33:         Time: 22:51:07
10:39:33:   Repository: Git
10:39:33:     Revision: 4705bf53c635f88b8fe85af7675557e15d491ff0
10:39:33:       Branch: master
10:39:33:     Compiler: GNU 4.4.7 20120313 (Red Hat 4.4.7-18)
10:39:33:      Options: -std=gnu++98 -O3 -funroll-loops
10:39:33:     Platform: linux2 4.14.0-3-amd64
10:39:33:         Bits: 64
10:39:33:         Mode: Release
10:39:33:******************************* System ********************************
10:39:33:          CPU: AMD Ryzen Threadripper 1920X 12-Core Processor
10:39:33:       CPU ID: AuthenticAMD Family 23 Model 1 Stepping 1
10:39:33:         CPUs: 24
10:39:33:       Memory: 62.74GiB
10:39:33:  Free Memory: 51.04GiB
10:39:33:      Threads: POSIX_THREADS
10:39:33:   OS Version: 5.5
10:39:33:  Has Battery: false
10:39:33:   On Battery: false
10:39:33:   UTC Offset: 0
10:39:33:          PID: 13757
10:39:33:          CWD: /var/lib/fahclient
10:39:33:           OS: Linux 5.5.8-200.fc31.x86_64 x86_64
10:39:33:      OS Arch: AMD64
10:39:33:         GPUs: 0
10:39:33:CUDA Device 0: Platform:0 Device:0 Bus:65 Slot:0 Compute:6.1 Driver:10.2
10:39:33:       OpenCL: Not detected: clGetDeviceIDs() returned -1
10:39:33:***********************************************************************
10:39:33:<config>
10:39:33:  <!-- Network -->
10:39:33:  <proxy v=':8080'/>
10:39:33:
10:39:33:  <!-- User Information -->
10:39:33:  <passkey v='********************************'/>
10:39:33:  <team v='12501'/>
10:39:33:  <user v='Markus_Laker'/>
10:39:33:
10:39:33:  <!-- Folding Slots -->
10:39:33:  <slot id='0' type='CPU'>
10:39:33:    <cpus v='22'/>
10:39:33:  </slot>
10:39:33:</config>
10:39:33:Switching to user fahclient
10:39:33:Trying to access database...
10:39:35:Successfully acquired database lock
10:39:35:Enabled folding slot 00: READY cpu:22
10:39:35:WU00:FS00:Starting
10:39:35:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 705 -lifeline 13757 -checkpoint 15 -np 22
10:39:35:WU00:FS00:Started FahCore on PID 13809
10:39:35:WU00:FS00:Core PID:13813
10:39:35:WU00:FS00:FahCore 0xa7 started
10:39:35:WU00:FS00:0xa7:*********************** Log Started 2020-03-21T10:39:35Z ***********************
10:39:35:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
10:39:35:WU00:FS00:0xa7:       Type: 0xa7
10:39:35:WU00:FS00:0xa7:       Core: Gromacs
10:39:35:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 705 -lifeline 13809 -checkpoint 15 -np
10:39:35:WU00:FS00:0xa7:             22
10:39:35:WU00:FS00:0xa7:************************************ CBang *************************************
10:39:35:WU00:FS00:0xa7:       Date: Nov 5 2019
10:39:35:WU00:FS00:0xa7:       Time: 06:06:57
10:39:35:WU00:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
10:39:35:WU00:FS00:0xa7:     Branch: master
10:39:35:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
10:39:35:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
10:39:35:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
10:39:35:WU00:FS00:0xa7:       Bits: 64
10:39:35:WU00:FS00:0xa7:       Mode: Release
10:39:35:WU00:FS00:0xa7:************************************ System ************************************
10:39:35:WU00:FS00:0xa7:        CPU: AMD Ryzen Threadripper 1920X 12-Core Processor
10:39:35:WU00:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 1 Stepping 1
10:39:35:WU00:FS00:0xa7:       CPUs: 24
10:39:35:WU00:FS00:0xa7:     Memory: 62.74GiB
10:39:35:WU00:FS00:0xa7:Free Memory: 51.23GiB
10:39:35:WU00:FS00:0xa7:    Threads: POSIX_THREADS
10:39:35:WU00:FS00:0xa7: OS Version: 5.5
10:39:35:WU00:FS00:0xa7:Has Battery: false
10:39:35:WU00:FS00:0xa7: On Battery: false
10:39:35:WU00:FS00:0xa7: UTC Offset: 0
10:39:35:WU00:FS00:0xa7:        PID: 13813
10:39:35:WU00:FS00:0xa7:        CWD: /var/lib/fahclient/work
10:39:35:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
10:39:35:WU00:FS00:0xa7:    Version: 0.0.18
10:39:35:WU00:FS00:0xa7:     Author: Joseph Coffland <[email protected]>
10:39:35:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
10:39:35:WU00:FS00:0xa7:   Homepage: https://foldingathome.org/
10:39:35:WU00:FS00:0xa7:       Date: Nov 5 2019
10:39:35:WU00:FS00:0xa7:       Time: 06:13:26
10:39:35:WU00:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
10:39:35:WU00:FS00:0xa7:     Branch: master
10:39:35:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
10:39:35:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
10:39:35:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
10:39:35:WU00:FS00:0xa7:       Bits: 64
10:39:35:WU00:FS00:0xa7:       Mode: Release
10:39:35:WU00:FS00:0xa7:************************************ Build *************************************
10:39:35:WU00:FS00:0xa7:       SIMD: avx_256
10:39:35:WU00:FS00:0xa7:********************************************************************************
10:39:35:WU00:FS00:0xa7:Project: 13851 (Run 0, Clone 11538, Gen 8)
10:39:35:WU00:FS00:0xa7:Unit: 0x00000008287234c95e73024da3060dfa
10:39:35:WU00:FS00:0xa7:Digital signatures verified
10:39:35:WU00:FS00:0xa7:Reducing thread count from 22 to 21 to avoid domain decomposition with large prime factor 11
10:39:35:WU00:FS00:0xa7:Calling: mdrun -s frame8.tpr -o frame8.trr -x frame8.xtc -e frame8.edr -cpi state.cpt -cpt 15 -nt 21
10:39:35:WU00:FS00:0xa7:Steps: first=4000000 total=500000
10:39:36:WU00:FS00:0xa7:Completed 11002 out of 500000 steps (2%)
10:40:11:WU00:FS00:0xa7:Completed 15000 out of 500000 steps (3%)
(etc.)
OpenCL is installed, but doesn't seem quite right:

Code:

[msl@localhost ~]$ clinfo -l
X server found. dri2 connection failed! 
DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument
Assuming 131072kB available aperture size.
May lead to reduced performance or incorrect rendering.
get chip id failed: -1 [22]
param: 4, val: 0
X server found. dri2 connection failed! 
DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument
Assuming 131072kB available aperture size.
May lead to reduced performance or incorrect rendering.
get chip id failed: -1 [22]
param: 4, val: 0
cl_get_gt_device(): error, unknown device: 0
X server found. dri2 connection failed! 
DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument
Assuming 131072kB available aperture size.
May lead to reduced performance or incorrect rendering.
get chip id failed: -1 [22]
param: 4, val: 0
X server found. dri2 connection failed! 
DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument
Assuming 131072kB available aperture size.
May lead to reduced performance or incorrect rendering.
get chip id failed: -1 [22]
param: 4, val: 0
cl_get_gt_device(): error, unknown device: 0
X server found. dri2 connection failed! 
DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument
Assuming 131072kB available aperture size.
May lead to reduced performance or incorrect rendering.
get chip id failed: -1 [22]
param: 4, val: 0
cl_get_gt_device(): error, unknown device: 0
Platform #0: NVIDIA CUDA
 `-- Device #0: GeForce GT 1030
Platform #1: Portable Computing Language
 `-- Device #0: pthread-AMD Ryzen Threadripper 1920X 12-Core Processor
Platform #2: Clover
Platform #3: Intel Gen OCL Driver
[msl@localhost ~]$
How can I get FAH to use my GPU?

Thanks for any help you can give, and for all the amazing work done by the FAH volunteers.
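For anyone decoding the log line: in the Khronos OpenCL headers, -1 is CL_DEVICE_NOT_FOUND, which the OpenCL specification says clGetDeviceIDs() returns when a platform has no device of the requested type. The clinfo -l output above shows two platforms (Clover and Intel Gen OCL Driver) with no devices at all. The sketch below decodes the status code and picks the device-less platforms out of that listing. Whether FAHClient 7.5.1 is actually stopping at one of those empty platforms is an assumption; the log does not say which platform it queried. The helper names are hypothetical.

```python
from typing import List

# A few OpenCL status codes, with the values defined in the Khronos CL/cl.h header.
CL_STATUS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -30: "CL_INVALID_VALUE",
    -31: "CL_INVALID_DEVICE_TYPE",
    -32: "CL_INVALID_PLATFORM",
}


def describe_cl_status(code: int) -> str:
    """Map a numeric OpenCL status code to its symbolic name."""
    return CL_STATUS.get(code, "unknown status %d" % code)


def platforms_without_devices(listing: str) -> List[str]:
    """Return the platforms in `clinfo -l` output that list no devices."""
    empty: List[str] = []
    current = None
    has_device = False
    for line in listing.splitlines():
        text = line.strip()
        if text.startswith("Platform #"):
            # Close out the previous platform before starting the next one.
            if current is not None and not has_device:
                empty.append(current)
            current = text.split(": ", 1)[1]
            has_device = False
        elif "Device #" in text and current is not None:
            has_device = True
    if current is not None and not has_device:
        empty.append(current)
    return empty


# The platform/device part of the clinfo -l output posted above.
LISTING = """\
Platform #0: NVIDIA CUDA
 `-- Device #0: GeForce GT 1030
Platform #1: Portable Computing Language
 `-- Device #0: pthread-AMD Ryzen Threadripper 1920X 12-Core Processor
Platform #2: Clover
Platform #3: Intel Gen OCL Driver
"""

print(describe_cl_status(-1))              # CL_DEVICE_NOT_FOUND
print(platforms_without_devices(LISTING))  # ['Clover', 'Intel Gen OCL Driver']
```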

Re: Fedora 31, Nvidia GT 1030: clGetDeviceIDs() returned -1

Posted: Sat Mar 21, 2020 2:48 pm
by goodyca
One thing to check is the existence of the GPUs.txt file. It should be in the /var/lib/fahclient directory. When I added the GPU on a Fedora 31 install, it was missing. The GPUs.txt file can be downloaded from:

https://apps.foldingathome.org/GPUs.txt
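The check-then-download step can be sketched as below. This is only an illustration, assuming the file belongs in /var/lib/fahclient as described above and that the URL above serves it directly; the function names are hypothetical, and the downloaded file presumably needs to be readable by the user the client runs as (the log shows "Switching to user fahclient").

```python
import os
import urllib.request

GPUS_URL = "https://apps.foldingathome.org/GPUs.txt"
DEFAULT_DIR = "/var/lib/fahclient"


def gpus_txt_status(directory: str = DEFAULT_DIR) -> str:
    """Report whether GPUs.txt in `directory` is missing, empty or present."""
    path = os.path.join(directory, "GPUs.txt")
    if not os.path.isfile(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    return "present"


def download_gpus_txt(directory: str = DEFAULT_DIR) -> str:
    """Fetch GPUs.txt into `directory`; needs network access and write permission."""
    path = os.path.join(directory, "GPUs.txt")
    with urllib.request.urlopen(GPUS_URL, timeout=30) as resp:
        data = resp.read()
    with open(path, "wb") as out:
        out.write(data)
    return path
```

Typical use would be to call gpus_txt_status() first and only call download_gpus_txt() (as root, since the directory belongs to the client) when it reports anything other than "present".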

Re: Fedora 31, Nvidia GT 1030: clGetDeviceIDs() returned -1

Posted: Sat Mar 21, 2020 6:08 pm
by Markus_Laker
It's already there, thanks, and it mentions my graphics card:

Code:

[msl@localhost ~]$ grep 1030 /var/lib/fahclient/GPUs.txt 
0x10de:0x1d01:2:5:GP108 [GeForce GT 1030]
0x10de:0x1d12:2:5:GP108 [GeForce MX150 (GT 1030) Max-Q]
[msl@localhost ~]$
What else should I try? Do I need to do something about the errors reported by `clinfo -l', or are they not relevant to FAH?

Thanks again,

Markus

Re: Fedora 31, Nvidia GT 1030: clGetDeviceIDs() returned -1

Posted: Sat Mar 21, 2020 7:42 pm
by Joe_H
Does the device ID for your 1030 match either of those two entries? I'm just asking in case they have come out with a variant other than those. If the ID of your card does not match, a new entry would be needed for it.
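The comparison is between numeric PCI IDs, not card names. A minimal sketch of that lookup, assuming the colon-separated layout seen in the grep output above (vendor ID, device ID, two numeric fields, description). The two middle fields are deliberately not interpreted here, and the function names and the unlisted ID 0x1234 are hypothetical.

```python
from typing import Dict, Optional, Tuple


def parse_gpus_txt(text: str) -> Dict[Tuple[int, int], str]:
    """Index GPUs.txt lines by their (vendor ID, device ID) pair.

    Lines look like 0x10de:0x1d01:2:5:GP108 [GeForce GT 1030]. Only the
    first two fields and the trailing description are used; the two numeric
    fields in between are left alone because their meaning is not covered here.
    """
    entries: Dict[Tuple[int, int], str] = {}
    for line in text.splitlines():
        parts = line.strip().split(":", 4)
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        try:
            key = (int(parts[0], 16), int(parts[1], 16))
        except ValueError:
            continue
        entries[key] = parts[4]
    return entries


def lookup_gpu(entries: Dict[Tuple[int, int], str],
               vendor_id: int, device_id: int) -> Optional[str]:
    """Return the GPUs.txt description for a PCI ID, or None if it is not listed."""
    return entries.get((vendor_id, device_id))


# The two entries from the grep output above.
SAMPLE = """\
0x10de:0x1d01:2:5:GP108 [GeForce GT 1030]
0x10de:0x1d12:2:5:GP108 [GeForce MX150 (GT 1030) Max-Q]
"""

entries = parse_gpus_txt(SAMPLE)
print(lookup_gpu(entries, 0x10de, 0x1d01))  # GP108 [GeForce GT 1030]
print(lookup_gpu(entries, 0x10de, 0x1234))  # None (hypothetical unlisted ID)
```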

You have installed:

nVidia driver - check
opencl dev kit - check
nvidia OpenCL - ? asking in case it was not part of the driver rpm

Re: Fedora 31, Nvidia GT 1030: clGetDeviceIDs() returned -1

Posted: Sat Mar 21, 2020 10:54 pm
by Markus_Laker
About the device ID -- I have this:

Code:

[msl@localhost ~]$ lspci | grep -i vga
41:00.0 VGA compatible controller: NVIDIA Corporation GP108 [GeForce GT 1030] (rev a1)
[msl@localhost ~]$
I'm well outside my area of expertise. Is that string "GP108 [GeForce GT 1030]" the device ID you have in mind? If so, it looks like a match.
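For reference, "GP108 [GeForce GT 1030]" is the device name; the device ID is the hexadecimal number that lspci prints in square brackets, after the name, when run as lspci -nn. A sketch for pulling the vendor:device pair out of one such line follows. The example line is a hypothetical illustration of the -nn format, not output captured from this machine, and the function name is hypothetical too.

```python
import re
from typing import Optional, Tuple

# `lspci -nn` appends the numeric [vendor:device] pair in square brackets.
# The class code (e.g. [0300]) has no colon, so it does not match this pattern.
PCI_ID_RE = re.compile(r"\[([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\]")


def pci_id_from_lspci_nn(line: str) -> Optional[Tuple[int, int]]:
    """Extract (vendor ID, device ID) from one line of `lspci -nn` output."""
    matches = PCI_ID_RE.findall(line)
    if not matches:
        return None  # plain `lspci` output carries no numeric IDs
    vendor, device = matches[-1]  # the ID pair follows the device name
    return int(vendor, 16), int(device, 16)


# Hypothetical line showing the expected format; not captured from this machine.
example = ("41:00.0 VGA compatible controller [0300]: NVIDIA Corporation "
           "GP108 [GeForce GT 1030] [10de:1d01] (rev a1)")
print(["0x%04x" % n for n in pci_id_from_lspci_nn(example)])  # ['0x10de', '0x1d01']
```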

About Nvidia OpenCL: I don't know, but this --

Code:

[msl@localhost ~]$ find /usr/lib64 -iname \*opencl\* | regrep -viX '/ (?: libreoffice | clang ) /'
/usr/lib64/libMesaOpenCL.so
/usr/lib64/libOpenCL.so
/usr/lib64/libnvidia-opencl.so.1
/usr/lib64/libOpenCL.so.1
/usr/lib64/libnvidia-opencl.so.440.64
/usr/lib64/libMesaOpenCL.so.1.0.0
/usr/lib64/libOpenCL.so.1.0.0
/usr/lib64/pkgconfig/OpenCL.pc
/usr/lib64/libMesaOpenCL.so.1
[msl@localhost ~]$
-- looks to me as if the answer might be yes.

libOpenCL.so is a symlink to libOpenCL.so.1.0.0 in the usual way.

Thanks again,

Markus

Re: Fedora 31, Nvidia GT 1030: clGetDeviceIDs() returned -1

Posted: Sat Mar 21, 2020 11:27 pm
by Joe_H
Could you run FAHClient --lspci from a command line? That lists the various PCI devices, including the GPU, and gives the device ID as well. You may need to cd /usr/bin first.
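The rows that command prints follow the header VendorID:DeviceID:PCI Bus:PCI Slot:PCI function:Vendor Name:Description, so they can be filtered for NVIDIA's vendor ID (0x10de, as in GPUs.txt). A sketch under two assumptions: bus/slot/function are decimal (the log reports the GT 1030 at Bus:65, and 0x41 from lspci's 41:00.0 is 65), and any line not starting with 0x is driver noise to skip. The names are hypothetical.

```python
from typing import List, NamedTuple

NVIDIA_VENDOR_ID = 0x10DE


class PciDevice(NamedTuple):
    vendor_id: int
    device_id: int
    bus: int
    slot: int
    function: int
    vendor_name: str
    description: str


def parse_fah_lspci(output: str) -> List[PciDevice]:
    """Parse the device rows of `FAHClient --lspci` output, skipping other lines.

    Bus, slot and function are read as decimal (an assumption, see above).
    """
    devices: List[PciDevice] = []
    for line in output.splitlines():
        parts = line.split(":", 6)
        if len(parts) != 7 or not parts[0].startswith("0x"):
            continue  # header row, OpenCL driver diagnostics, blank lines
        devices.append(PciDevice(
            int(parts[0], 16), int(parts[1], 16),
            int(parts[2]), int(parts[3]), int(parts[4]),
            parts[5], parts[6]))
    return devices


def nvidia_devices(devices: List[PciDevice]) -> List[PciDevice]:
    """Keep only devices whose vendor ID is NVIDIA's."""
    return [d for d in devices if d.vendor_id == NVIDIA_VENDOR_ID]


SAMPLE = """\
get chip id failed: -1 [22]
VendorID:DeviceID:PCI Bus:PCI Slot:PCI function:Vendor Name:Description
0x1022:0x1450:0:0:0:Advanced Micro Devices, Inc. [AMD]:
0x8086:0x1533:3:0:0:Intel Corporation:
"""

devices = parse_fah_lspci(SAMPLE)
print(len(devices), nvidia_devices(devices))  # 2 []
```

Applied to the full output posted below, this would find no 0x10de row before the segfault, so the GT 1030 never got listed.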

Re: Fedora 31, Nvidia GT 1030: clGetDeviceIDs() returned -1

Posted: Sun Mar 22, 2020 11:12 am
by Markus_Laker
Hi, Joe. That command segfaults, unfortunately:

Code:

[msl@localhost bin]$ FAHClient --lspci
X server found. dri2 connection failed! 
DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument
Assuming 131072kB available aperture size.
May lead to reduced performance or incorrect rendering.
get chip id failed: -1 [22]
param: 4, val: 0
X server found. dri2 connection failed! 
DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument
Assuming 131072kB available aperture size.
May lead to reduced performance or incorrect rendering.
get chip id failed: -1 [22]
param: 4, val: 0
cl_get_gt_device(): error, unknown device: 0
X server found. dri2 connection failed! 
DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument
Assuming 131072kB available aperture size.
May lead to reduced performance or incorrect rendering.
get chip id failed: -1 [22]
param: 4, val: 0
X server found. dri2 connection failed! 
DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument
Assuming 131072kB available aperture size.
May lead to reduced performance or incorrect rendering.
get chip id failed: -1 [22]
param: 4, val: 0
cl_get_gt_device(): error, unknown device: 0
VendorID:DeviceID:PCI Bus:PCI Slot:PCI function:Vendor Name:Description
0x1022:0x1450:0:0:0:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x1451:0:0:2:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x1452:0:1:0:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x1453:0:1:1:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x1452:0:2:0:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x1452:0:3:0:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x1452:0:4:0:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x1452:0:7:0:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x1454:0:7:1:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x1452:0:8:0:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x1454:0:8:1:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x790b:0:20:0:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x790e:0:20:3:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x1460:0:24:0:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x1461:0:24:1:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x1462:0:24:2:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x1463:0:24:3:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x1464:0:24:4:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x1465:0:24:5:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x1466:0:24:6:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x1467:0:24:7:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x1460:0:25:0:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x1461:0:25:1:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x1462:0:25:2:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x1463:0:25:3:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x1464:0:25:4:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x1465:0:25:5:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x1466:0:25:6:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x1467:0:25:7:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x43ba:1:0:0:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x43b6:1:0:1:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x43b1:1:0:2:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x43b4:2:0:0:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x43b4:2:1:0:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x43b4:2:2:0:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x43b4:2:3:0:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x43b4:2:4:0:Advanced Micro Devices, Inc. [AMD]:
0x1022:0x43b4:2:9:0:Advanced Micro Devices, Inc. [AMD]:
0x8086:0x1533:3:0:0:Intel Corporation:
0x8086:0x24fd:4:0:0:Intel Corporation:
0x8086:0x1533:5:0:0:Intel Corporation:
Segmentation fault (core dumped)
[msl@localhost bin]$ 
My guess is that it doesn't correctly handle the failures we see at the start of the transcript. strace doesn't tell us much about what FAHClient was doing when it segfaulted:

Code:

...
write(1, "0x1022:0x43b4:2:9:0:Advanced Mic"..., 560x1022:0x43b4:2:9:0:Advanced Micro Devices, Inc. [AMD]:
) = 56
write(1, "0x8086:0x1533:3:0:0:Intel Corpor"..., 390x8086:0x1533:3:0:0:Intel Corporation:
) = 39
write(1, "0x8086:0x24fd:4:0:0:Intel Corpor"..., 390x8086:0x24fd:4:0:0:Intel Corporation:
) = 39
write(1, "0x8086:0x1533:5:0:0:Intel Corpor"..., 390x8086:0x1533:5:0:0:Intel Corporation:
) = 39
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x8} ---
rt_sigaction(SIGHUP, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f15b9ef8b20}, NULL, 8) = 0
rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f15b9ef8b20}, NULL, 8) = 0
rt_sigaction(SIGPIPE, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f15b9ef8b20}, NULL, 8) = 0
rt_sigaction(SIGTERM, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f15b9ef8b20}, NULL, 8) = 0
rt_sigaction(SIGUSR2, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f15b9ef8b20}, NULL, 8) = 0
rt_sigaction(SIGILL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f15b9ef8b20}, NULL, 8) = 0
rt_sigaction(SIGTRAP, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f15b9ef8b20}, NULL, 8) = 0
rt_sigaction(SIGABRT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f15b9ef8b20}, NULL, 8) = 0
rt_sigaction(SIGFPE, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f15b9ef8b20}, NULL, 8) = 0
rt_sigaction(SIGBUS, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f15b9ef8b20}, NULL, 8) = 0
rt_sigaction(SIGSEGV, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f15b9ef8b20}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f15b9ef8b20}, NULL, 8) = 0
rt_sigaction(SIGSYS, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f15b9ef8b20}, NULL, 8) = 0
rt_sigaction(SIGXCPU, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f15b9ef8b20}, NULL, 8) = 0
rt_sigaction(SIGXFSZ, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f15b9ef8b20}, NULL, 8) = 0
rt_sigaction(SIGUSR1, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f15b9ef8b20}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, ~[RTMIN RT_1], NULL, 8) = 0
stat("/home/msl/.cache/pocl/kcache/tempfile-19-4c-11-df-40", {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
unlink("/home/msl/.cache/pocl/kcache/tempfile-19-4c-11-df-40") = 0
rt_sigreturn({mask=[PIPE]})             = 12283840
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x8} ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)
[msl@localhost bin]$
So perhaps the piece of code it was about to run forgets to check whether an earlier piece of setup code managed to read in the necessary data or not.
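The fault address supports that guess without proving it: si_addr=0x8 with SEGV_MAPERR is a read from an unmapped address just above zero, the usual signature of dereferencing a field 8 bytes into a NULL pointer. It says nothing about which structure was NULL. A small sketch that pulls such near-NULL faults out of an strace log; the 4096-byte threshold (the x86-64 base page size) and the function name are assumptions.

```python
import re
from typing import List, Tuple

# Matches strace's signal report, e.g.
# --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x8} ---
# strace prints a zero fault address as NULL, so that form is accepted too.
SIGSEGV_RE = re.compile(
    r"--- SIGSEGV \{.*?si_code=(\w+), si_addr=(0x[0-9a-fA-F]+|NULL)\} ---")


def near_null_faults(trace: str, page_size: int = 4096) -> List[Tuple[str, int]]:
    """Return (si_code, address) for each SIGSEGV whose address is in the first page."""
    hits: List[Tuple[str, int]] = []
    for match in SIGSEGV_RE.finditer(trace):
        code, addr_text = match.groups()
        addr = 0 if addr_text == "NULL" else int(addr_text, 16)
        if addr < page_size:  # NULL plus a small struct offset lands here
            hits.append((code, addr))
    return hits


# The two SIGSEGV lines from the strace output above.
TRACE = """\
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x8} ---
rt_sigreturn({mask=[PIPE]})             = 12283840
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x8} ---
"""

print(near_null_faults(TRACE))  # [('SEGV_MAPERR', 8), ('SEGV_MAPERR', 8)]
```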

Thanks for persevering with this.

M.

Re: Fedora 31, Nvidia GT 1030: clGetDeviceIDs() returned -1

Posted: Sun Mar 22, 2020 4:34 pm
by Joe_H
Perhaps try with a single '-': FAHClient -lspci. There are so many different commands, some using a single dash and others a double one, that I sometimes forget which and have to try both.

Another possibility is that the command ran into a permissions issue; it may need to be run with sudo. Judging from the output it did generate, it managed to scan part of the system and return values for AMD and Intel devices before segfaulting. Recent versions of Linux have been moving GPU/video resources into groups whose access can require privileges.
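One way to test the permissions theory without sudo is to check whether the current user can open the GPU's device nodes. A sketch, assuming the proprietary driver exposes them as /dev/nvidia* (it normally creates /dev/nvidiactl and /dev/nvidia0, but the exact set can vary); the function name is hypothetical.

```python
import glob
import os
from typing import Dict


def device_node_access(pattern: str = "/dev/nvidia*") -> Dict[str, bool]:
    """Map each matching device node to whether the current user can read and write it."""
    return {path: os.access(path, os.R_OK | os.W_OK)
            for path in sorted(glob.glob(pattern))}


for node, ok in device_node_access().items():
    print(node, "read/write OK" if ok else "no read/write access")
```

Since the client switches to user fahclient after startup, running this both as that user and as yourself would show whether the two differ.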

Re: Fedora 31, Nvidia GT 1030: clGetDeviceIDs() returned -1

Posted: Sun Mar 22, 2020 4:55 pm
by Markus_Laker
Hi, Joe. -lspci with a single dash produced an error message:

Code:

16:52:23:ERROR:Exception: Invalid argument '-lspci'
16:52:23:ERROR:Caused by: Option '-lspci' does not exist.
`sudo FAHClient --lspci', with two dashes and a `sudo', produced the same output as it does without `sudo', and ended by segfaulting in the same place as before.

Cheers,

Markus

Re: Fedora 31, Nvidia GT 1030: clGetDeviceIDs() returned -1

Posted: Sun Mar 22, 2020 4:58 pm
by bruce
My hunch: you need the proprietary drivers from NVidia, not the ones commonly distributed with Linux. You'll probably find that they include an OpenCL package that works with their CUDA API, but I have not updated my Linux servers in a long time.

Be especially careful if you install drivers while X is running (i.e. don't, unless you get a package that was tested on the GUI version of Linux).

Re: Fedora 31, Nvidia GT 1030: clGetDeviceIDs() returned -1

Posted: Mon Mar 23, 2020 10:56 am
by Markus_Laker
Hi, Bruce. You're right -- Nouveau, the open source driver that ships with Fedora, doesn't support OpenCL. I reluctantly installed Nvidia's proprietary drivers from RPM Fusion last year, because I found Nouveau too unstable with dual monitors, and I'm currently running version 440.64. The clinfo transcript I posted here over the weekend and the presence of /usr/lib64/libnvidia-opencl.so.440.64 and /etc/OpenCL/vendors/nvidia.icd suggest that these drivers support OpenCL, or at least try to.

Nouveau is still installed, but disabled via kernel boot parameters. That's the way the driver from RPM Fusion sets itself up by default. I don't want to uninstall Nouveau, because I know that Nvidia proprietary drivers can stop working when the kernel is upgraded, and Fedora upgrades its kernels quite aggressively.

Intel's Beignet software is also installed on this machine, which seems a little odd, since I have an AMD processor and Nvidia graphics card. I'm guessing that Fedora always installs Beignet on AMD64 systems.
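The ocl-icd loader builds its platform list from the *.icd files in /etc/OpenCL/vendors, each naming one vendor library; that is presumably how Mesa's Clover and Beignet (whose platform name is "Intel Gen OCL Driver") end up next to NVIDIA CUDA in the clinfo listing. A sketch for seeing what is registered; the function name is hypothetical, and the directory is the standard one mentioned above.

```python
import glob
import os
from typing import Dict


def registered_icds(vendors_dir: str = "/etc/OpenCL/vendors") -> Dict[str, str]:
    """Map each *.icd file in `vendors_dir` to the library name it contains."""
    icds: Dict[str, str] = {}
    for path in sorted(glob.glob(os.path.join(vendors_dir, "*.icd"))):
        with open(path) as handle:
            icds[os.path.basename(path)] = handle.read().strip()
    return icds


for name, library in registered_icds().items():
    print(name, "->", library)
```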

Re: Fedora 31, Nvidia GT 1030: clGetDeviceIDs() returned -1

Posted: Mon Mar 23, 2020 11:59 am
by katakaio
Joe_H wrote: You have installed:

nVidia driver - check
opencl dev kit - check
nvidia OpenCL - ? asking in case it was not part of the driver rpm
It's not clear to me if OP installed nvidia-opencl-dev in addition to ocl-icd-opencl-dev. I'd double-check this before pursuing bruce's wise suggestion of installing drivers straight from NVidia.

Re: Fedora 31, Nvidia GT 1030: clGetDeviceIDs() returned -1

Posted: Mon Mar 23, 2020 1:24 pm
by Markus_Laker
Hi, katakaio,

When I google `nvidia-opencl-dev', I find a package for Debian-based distros, but I'm running Fedora. Fedora has no package of that name that I can find.

Thanks,

Markus

Re: Fedora 31, Nvidia GT 1030: clGetDeviceIDs() returned -1

Posted: Mon Mar 23, 2020 1:40 pm
by katakaio
Thanks Markus - sorry for the bad tip! My Fedora knowledge is rusty, and I didn't realize that such a package didn't exist for Fedora (or if it does, I don't know the name). Installing ocl-icd-devel may be sufficient, but I install nvidia-opencl-dev on my Ubuntu rig (along with packaged NVidia drivers from a PPA) just to be safe.

At this point, I'd second bruce's suggestion of installing proprietary drivers from NVidia's .run files. They tend to bundle everything that you'd need (OpenCL, CUDA, etc), so let us know if you have more success with that.

Re: Fedora 31, Nvidia GT 1030: clGetDeviceIDs() returned -1

Posted: Mon Mar 23, 2020 11:53 pm
by Markus_Laker
Thanks, katakaio, but I've been burnt that way once before. I tried installing Nvidia's drivers by following the instructions on a site called If Not True Then False. The instructions had me uninstall Nouveau and then reboot, and then I found that the Nvidia drivers wouldn't build on my system, and so I ended up without an X server. I eventually had to rebuild it by booting up a live image and using a chroot to reinstall Nouveau. It's not the first time I've had Nvidia drivers that wouldn't build -- it happened to me several times on Debian in a past life.

I daren't do anything to destabilise this machine -- I need it all day, every day, to work from home during the Covid-19 outbreak. It sounds as if I won't be able to do GPU folding at present. But I thank you, along with Bruce, Joe_H and goodyca for your time and suggestions. CPU folding on a 12-core Threadripper is still a decent contribution to be able to make to the cause. :)