cuda failure - core 0.0.13 - both win10 and linux

Posted: Thu Oct 01, 2020 4:28 pm
by astrorob
Hi all -

i see the new 0.0.13 core trying to create a CUDA context on both linux and W10, and both failing.

on windows 10 home, the error is:

Failed to create CUDA context
Error loading CUDA module: CUDA_ERROR_FILE_NOT_FOUND (301)

on this machine i have VB2019 and cuda 11.1.0 installed. the NVIDIA control panel reports driver version 456.43 loaded on both my RTX2060 and GTX 1060. i think this must be the latest.

over on linux, i see a similar error:

Error launching CUDA compiler: 256
gcc: error trying to exec 'cc1plus': execvp: No such file or directory.

on this machine an older nvidia driver is loaded but g++ is definitely installed.
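One way to narrow down the 'cc1plus' message is to ask the gcc driver where it expects its C++ front end to be. The sketch below relies on `gcc -print-prog-name=cc1plus`, which prints a full path when the program is found and echoes the bare name back when it is not. The function names are made up for illustration, and it assumes the `gcc` on PATH is the same one the CUDA compiler step ends up calling, which may not hold on every setup.

```python
import os
import subprocess


def cc1plus_path(gcc_output):
    """Interpret the output of `gcc -print-prog-name=cc1plus`.

    Returns the path if gcc reported one, or None if gcc only echoed the
    bare program name back (its way of saying "not found").
    """
    candidate = gcc_output.strip()
    if not candidate or os.path.basename(candidate) == candidate:
        return None
    return candidate


def locate_cc1plus(gcc="gcc"):
    """Run the gcc driver and return the cc1plus path, or None."""
    try:
        result = subprocess.run(
            [gcc, "-print-prog-name=cc1plus"],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        return None  # the gcc driver itself is not on PATH
    return cc1plus_path(result.stdout)
```

Calling `locate_cc1plus()` and getting `None` back would suggest that the C++ compiler package matching that particular gcc version is missing, even if some other g++ is installed.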

both errors look like some kind of PATH problem, but i'm not sure how to set up PATH for the FAH daemons on either platform. further, i'm not even sure which directory is missing from PATH.

has anyone faced this particular problem on either platform?

thanks

Re: cuda failure - core 0.0.13 - both win10 and linux

Posted: Thu Oct 01, 2020 5:13 pm
by foldy
On Windows, uninstalling the CUDA toolkit SDK would help. Or remove the CUDA_PATH... before launching FAH
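To see what that suggestion would touch, here is a minimal sketch that lists every environment variable whose name starts with CUDA_PATH and builds a copy of the environment without them. It assumes the toolkit installer created variables with that prefix. The helper names are hypothetical, and nothing here changes system-wide settings.

```python
import os


def cuda_path_vars(env=None):
    """Return {name: value} for variables whose names start with CUDA_PATH."""
    env = os.environ if env is None else env
    return {k: v for k, v in env.items() if k.upper().startswith("CUDA_PATH")}


def env_without_cuda_path(env=None):
    """Return a copy of the environment with those variables left out.

    The original mapping is not modified.
    """
    cleaned = dict(os.environ if env is None else env)
    for name in cuda_path_vars(cleaned):
        del cleaned[name]
    return cleaned
```

The cleaned copy could be passed as `env=` to `subprocess.Popen` when starting a program by hand. Whether that fits the way FAH is launched on a given machine (for example as a service) is a separate question.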

Re: cuda failure - core 0.0.13 - both win10 and linux

Posted: Thu Oct 01, 2020 5:49 pm
by Kjetil
Running 441.66 win 10. 49d online. 2x2060s and beta is now 134xx
On win uninstall cuda toolkit sdk?

Re: cuda failure - core 0.0.13 - both win10 and linux

Posted: Thu Oct 01, 2020 5:59 pm
by foldy
Only uninstall the CUDA toolkit SDK if FAH fails to run in CUDA mode and falls back to OpenCL. Or wait for a fixed FAHcore.

Re: cuda failure - core 0.0.13 - both win10 and linux

Posted: Thu Oct 01, 2020 6:22 pm
by astrorob
foldy wrote:On Windows, uninstalling the CUDA toolkit SDK would help. Or remove the CUDA_PATH... before launching FAH
how do you do this on windows? FAH is started by some kind of windows service daemon. is there a registry setting or something?

Re: cuda failure - core 0.0.13 - both win10 and linux

Posted: Thu Oct 01, 2020 6:43 pm
by astrorob
ok well i couldn't figure out the CUDA_PATH thing on windows, but i found that i had cuda 10 and cuda 11 both installed simultaneously. removed all of cuda 10 (probably not necessary) and then removed the cuda 11 development stuff and now at least on my windows 10 box FAH is using CUDA to fold on both nvidia GPUs.

this leaves linux where i suppose the problem is of a similar nature? i need to remove the dev kit portion of the cuda installation?

Re: cuda failure - core 0.0.13 - both win10 and linux

Posted: Fri Oct 02, 2020 3:14 pm
by bruce
I expect a later revision than 0.0.13 will resolve the issue for those people who need a different SDK toolkit version than the one FAH is delivering. For the rest of the world, removing an unused SDK works now.

Re: cuda failure - core 0.0.13 - both win10 and linux

Posted: Sun Oct 04, 2020 3:00 am
by astrorob
thanks - are there any instructions on how to do this on Ubuntu? i think i just installed nvidia's cuda package and i don't think there's any SDK version mismatch (though the installer probably did install the SDK)

Re: cuda failure - core 0.0.13 - both win10 and linux

Posted: Mon Oct 05, 2020 8:24 pm
by bruce
The SDK includes a lot of things that are non-essential for FAH but would be used by a developer. The nVidia drivers (if you get them directly from nV) include OpenCL and everything that is needed to run it. FAHCore_22 delivers the parts of CUDA needed to run CUDA code, but that does not equip you to develop for CUDA.

Re: cuda failure - core 0.0.13 - both win10 and linux

Posted: Thu Oct 08, 2020 2:29 pm
by Azmodes
astrorob wrote:removed the cuda 11 development stuff and now at least on my windows 10 box FAH is using CUDA to fold on both nvidia GPUs.
That worked for me on Windows too, had the exact same issue. :)

Most of my other crunchers are running some flavour of Linux, though (primarily Ubuntu), and none of them are using CUDA. There isn't even an error, it's not showing up at all:

14:55:54:WU00:FS01:Connecting to 65.254.110.245:80
14:55:55:WU00:FS01:Assigned to work server 66.170.111.50
14:55:55:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GP104 [GeForce GTX 1070 Ti] 8186 from 66.170.111.50
14:55:55:WU00:FS01:Connecting to 66.170.111.50:8080
14:55:55:WU00:FS01:Downloading 11.17MiB
14:56:01:WU00:FS01:Download 47.58%
14:56:06:WU00:FS01:Download complete
14:56:06:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:14485 run:0 clone:1439 gen:81 core:0x22 unit:0x0000006e42aa6f325f45deaa14b9e36d
14:56:06:WU00:FS01:Starting
14:56:06:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /etc/init.d/cores/cores.foldingathome.org/lin/64bit/22-0.0.13/Core_22.fah/FahCore_22 -dir 00 -suffix 01 -version 704 -lifeline 2281 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
14:56:06:WU00:FS01:Started FahCore on PID 47834
14:56:06:WU00:FS01:Core PID:47838
14:56:06:WU00:FS01:FahCore 0x22 started
14:56:06:WU00:FS01:0x22:*********************** Log Started 2020-10-08T14:56:06Z ***********************
14:56:06:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
14:56:06:WU00:FS01:0x22:       Core: Core22
14:56:06:WU00:FS01:0x22:       Type: 0x22
14:56:06:WU00:FS01:0x22:    Version: 0.0.13
14:56:06:WU00:FS01:0x22:     Author: Joseph Coffland <[email protected]>
14:56:06:WU00:FS01:0x22:  Copyright: 2020 foldingathome.org
14:56:06:WU00:FS01:0x22:   Homepage: https://foldingathome.org/
14:56:06:WU00:FS01:0x22:       Date: Sep 19 2020
14:56:06:WU00:FS01:0x22:       Time: 01:10:35
14:56:06:WU00:FS01:0x22:   Revision: 571cf95de6de2c592c7c3ed48fcfb2e33e9ea7d3
14:56:06:WU00:FS01:0x22:     Branch: core22-0.0.13
14:56:06:WU00:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
14:56:06:WU00:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
14:56:06:WU00:FS01:0x22:             -funroll-loops -DOPENMM_GIT_HASH="\"189320d0\""
14:56:06:WU00:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
14:56:06:WU00:FS01:0x22:       Bits: 64
14:56:06:WU00:FS01:0x22:       Mode: Release
14:56:06:WU00:FS01:0x22:Maintainers: John Chodera <[email protected]> and Peter Eastman
14:56:06:WU00:FS01:0x22:             <[email protected]>
14:56:06:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 704 -lifeline 47834 -checkpoint 15 -gpu
14:56:06:WU00:FS01:0x22:             0 -gpu-vendor nvidia
14:56:06:WU00:FS01:0x22:************************************ libFAH ************************************
14:56:06:WU00:FS01:0x22:       Date: Sep 15 2020
14:56:06:WU00:FS01:0x22:       Time: 05:14:43
14:56:06:WU00:FS01:0x22:   Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
14:56:06:WU00:FS01:0x22:     Branch: HEAD
14:56:06:WU00:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
14:56:06:WU00:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
14:56:06:WU00:FS01:0x22:             -funroll-loops
14:56:06:WU00:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
14:56:06:WU00:FS01:0x22:       Bits: 64
14:56:06:WU00:FS01:0x22:       Mode: Release
14:56:06:WU00:FS01:0x22:************************************ CBang *************************************
14:56:06:WU00:FS01:0x22:       Date: Sep 15 2020
14:56:06:WU00:FS01:0x22:       Time: 05:11:04
14:56:06:WU00:FS01:0x22:   Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
14:56:06:WU00:FS01:0x22:     Branch: HEAD
14:56:06:WU00:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
14:56:06:WU00:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
14:56:06:WU00:FS01:0x22:             -funroll-loops -fPIC
14:56:06:WU00:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
14:56:06:WU00:FS01:0x22:       Bits: 64
14:56:06:WU00:FS01:0x22:       Mode: Release
14:56:06:WU00:FS01:0x22:************************************ System ************************************
14:56:06:WU00:FS01:0x22:        CPU: AMD Ryzen 7 3700X 8-Core Processor
14:56:06:WU00:FS01:0x22:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
14:56:06:WU00:FS01:0x22:       CPUs: 16
14:56:06:WU00:FS01:0x22:     Memory: 23.49GiB
14:56:06:WU00:FS01:0x22:Free Memory: 16.82GiB
14:56:06:WU00:FS01:0x22:    Threads: POSIX_THREADS
14:56:06:WU00:FS01:0x22: OS Version: 5.4
14:56:06:WU00:FS01:0x22:Has Battery: false
14:56:06:WU00:FS01:0x22: On Battery: false
14:56:06:WU00:FS01:0x22: UTC Offset: 2
14:56:06:WU00:FS01:0x22:        PID: 47838
14:56:06:WU00:FS01:0x22:        CWD: /etc/init.d/work
14:56:06:WU00:FS01:0x22:************************************ OpenMM ************************************
14:56:06:WU00:FS01:0x22:   Revision: 189320d0
14:56:06:WU00:FS01:0x22:********************************************************************************
14:56:06:WU00:FS01:0x22:Project: 14485 (Run 0, Clone 1439, Gen 81)
14:56:06:WU00:FS01:0x22:Unit: 0x0000006e42aa6f325f45deaa14b9e36d
14:56:06:WU00:FS01:0x22:Reading tar file core.xml
14:56:06:WU00:FS01:0x22:Reading tar file integrator.xml.bz2
14:56:06:WU00:FS01:0x22:Reading tar file state.xml.bz2
14:56:06:WU00:FS01:0x22:Reading tar file system.xml.bz2
14:56:06:WU00:FS01:0x22:Digital signatures verified
14:56:06:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
14:56:06:WU00:FS01:0x22:Version 0.0.13
14:56:06:WU00:FS01:0x22:  Checkpoint write interval: 25000 steps (2%) [50 total]
14:56:06:WU00:FS01:0x22:  JSON viewer frame write interval: 12500 steps (1%) [100 total]
14:56:06:WU00:FS01:0x22:  XTC frame write interval: 10000 steps (0.8%) [125 total]
14:56:06:WU00:FS01:0x22:  Global context and integrator variables write interval: disabled
14:56:06:WU00:FS01:0x22:No -opencl-device specified; using deprecated -gpu argument as an alias for -opencl-device.
14:56:06:WU00:FS01:0x22:Please consider upgrading your client version.
14:56:06:WU00:FS01:0x22:There are 3 platforms available.
14:56:06:WU00:FS01:0x22:Platform 0: Reference
14:56:06:WU00:FS01:0x22:Platform 1: CPU
14:56:06:WU00:FS01:0x22:Platform 2: OpenCL
14:56:06:WU00:FS01:0x22:  opencl-device 0 specified
14:56:14:WU00:FS01:0x22:Attempting to create OpenCL context:
14:56:14:WU00:FS01:0x22:  Configuring platform OpenCL
14:56:20:WU00:FS01:0x22:  Using OpenCL on platformId 0 and gpu 0
14:56:20:WU00:FS01:0x22:Completed 0 out of 1250000 steps (0%)
14:56:20:WU00:FS01:0x22:Checkpoint completed at step 0
14:57:11:WU00:FS01:0x22:Completed 12500 out of 1250000 steps (1%)
Is there anything I have to install or remove to make this work? They're all Pascal or Turing cards, drivers vary between 440.82 and 450.66 (all installed directly via a driver executable downloaded from Nvidia's site, NOT with the Linux utility), OS versions are mostly Ubuntu 20.04, some 16/18.04 and one runs Debian. All cards can run BOINC tasks using CUDA just fine, fwiw.

Re: cuda failure - core 0.0.13 - both win10 and linux

Posted: Thu Oct 08, 2020 10:27 pm
by bruce
The FAHCore downloads the necessary CUDA support code. The bug that you're dealing with is a conflict between the version of CUDA in your SDK and the code downloaded with FAHCore_22. You can probably uninstall the SDK and avoid the issue, unless you're actually using it to develop CUDA code.

Re: cuda failure - core 0.0.13 - both win10 and linux

Posted: Fri Oct 09, 2020 4:17 am
by Rel25917
Upgrade your client version. I had to do that on one of mine to get CUDA recognized properly.

Re: cuda failure - core 0.0.13 - both win10 and linux

Posted: Fri Oct 09, 2020 7:31 am
by foldy
@Azmodes: FAHclient should be v7.6.13

Re: cuda failure - core 0.0.13 - both win10 and linux

Posted: Fri Oct 09, 2020 5:15 pm
by Azmodes
foldy wrote:@Azmodes: FAHclient should be v7.6.13
That did it. Thank you!

Re: cuda failure - core 0.0.13 - both win10 and linux

Posted: Mon Oct 12, 2020 7:37 am
by florinandrei
I still have this problem with 7.6.13 and NVidia 456.71, Win10, TitanX Pascal.

07:28:37:WU00:FS01:0x22:There are 4 platforms available.
07:28:37:WU00:FS01:0x22:Platform 0: Reference
07:28:37:WU00:FS01:0x22:Platform 1: CPU
07:28:37:WU00:FS01:0x22:Platform 2: OpenCL
07:28:37:WU00:FS01:0x22: opencl-device 0 specified
07:28:37:WU00:FS01:0x22:Platform 3: CUDA
07:28:37:WU00:FS01:0x22: cuda-device 0 specified
07:28:47:WU00:FS01:0x22:Attempting to create CUDA context:
07:28:47:WU00:FS01:0x22: Configuring platform CUDA
07:28:48:WU00:FS01:0x22:Failed to create CUDA context:
07:28:48:WU00:FS01:0x22:Error loading CUDA module: CUDA_ERROR_FILE_NOT_FOUND (301)
07:28:48:WU00:FS01:0x22:Attempting to create OpenCL context:
07:28:48:WU00:FS01:0x22: Configuring platform OpenCL
07:29:02:WU00:FS01:0x22: Using OpenCL on platformId 0 and gpu 0
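For comparing logs like the two in this thread, a small sketch that pulls out which platforms the core listed, which context creation failed, and which platform it ended up using. The patterns are modeled only on the lines quoted here. In particular, the wording of a successful CUDA start is an assumption (taken to mirror the "Using OpenCL on platformId" line), and `summarize_core_log` is a made-up name.

```python
import re

# Patterns modeled on the log lines quoted in this thread.
PLATFORM_RE = re.compile(r"Platform (\d+): (\w+)")
FAILED_RE = re.compile(r"Failed to create (\w+) context")
# Assumption: a successful CUDA start is worded like the OpenCL one.
USING_RE = re.compile(r"Using (\w+) on platformId")


def summarize_core_log(lines):
    """Return the platforms offered, the failed contexts, and the one in use."""
    summary = {"platforms": [], "failed": [], "using": None}
    for line in lines:
        match = PLATFORM_RE.search(line)
        if match:
            summary["platforms"].append(match.group(2))
            continue
        match = FAILED_RE.search(line)
        if match:
            summary["failed"].append(match.group(1))
            continue
        match = USING_RE.search(line)
        if match:
            summary["using"] = match.group(1)
    return summary
```

If "CUDA" is missing from `platforms`, the core never offered CUDA at all, as in the Linux log earlier in the thread. If it is listed but also shows up in `failed`, that matches the CUDA_ERROR_FILE_NOT_FOUND case above.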