GPU unsupported after restart

Moderators: Site Moderators, FAHC Science Team

devachnid
Posts: 1
Joined: Wed Feb 12, 2025 12:30 pm

Post by devachnid »

I'm running 8.4.9 in a Debian LXC on Proxmox, and it was folding perfectly using both the GPU and CPUs. However, after a reboot of my host, the GPU is suddenly "unsupported":

Code:

19:12:06:I3:gpus = {
19:12:06:I3: "gpu:66:00:00": {"vendor": 4318, "device": 9604, "type": "nvidia", "supported": false, "description": "GA107 [GeForce RTX 3050 6GB]"}
19:12:06:I3:}
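
For readers who want to check this flag across several log files, here is a minimal sketch that parses one entry line of the `gpus = {` block. It assumes each entry line has the shape shown above (a log prefix followed by a single JSON key/value pair); `parse_gpu_line` is a hypothetical helper name, not part of the client.

```python
import json

# The log line quoted above, copied verbatim.
LOG_LINE = ('19:12:06:I3: "gpu:66:00:00": {"vendor": 4318, "device": 9604, '
            '"type": "nvidia", "supported": false, '
            '"description": "GA107 [GeForce RTX 3050 6GB]"}')


def parse_gpu_line(line):
    """Return (gpu_id, info_dict) for one entry line of the 'gpus = {' block."""
    # Everything before the first double quote is the timestamp/level prefix.
    fragment = line[line.index('"'):].rstrip().rstrip(",")
    # The fragment is one key/value pair, so wrap it in braces to get valid JSON.
    ((gpu_id, info),) = json.loads("{" + fragment + "}").items()
    return gpu_id, info


gpu_id, info = parse_gpu_line(LOG_LINE)
print(gpu_id, hex(info["vendor"]), hex(info["device"]), info["supported"])
# prints: gpu:66:00:00 0x10de 0x2584 False
```

The vendor value 4318 is 0x10de in hexadecimal, which is NVIDIA's PCI vendor ID, so the card itself is being detected; only the "supported" flag is false.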
I've tried the suggested fix listed here (viewtopic.php?p=367227&hilit=gpu+suppor ... se#p367227) of setting

Code:

NoNewPrivileges=no 
in the service file and restarting the service, but unfortunately that doesn't fix it.
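
When a service-file change seems to have no effect, it can help to confirm which value actually ends up in the [Service] section. The following is a minimal sketch, assuming a plain systemd unit file; `service_setting` is a hypothetical helper, and the unit text is made up for illustration because the real file is not shown in this thread.

```python
def service_setting(unit_text, key):
    """Return the last value assigned to `key` in the [Service] section, or None."""
    section, value = None, None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]  # entering a new section such as [Service]
        elif section == "Service" and "=" in line:
            name, _, val = line.partition("=")
            if name.strip() == key:
                value = val.strip()  # a later assignment replaces an earlier one
    return value


# Hypothetical unit contents for illustration only.
EXAMPLE_UNIT = """\
[Unit]
Description=example

[Service]
NoNewPrivileges=yes
NoNewPrivileges=no
"""

print(service_setting(EXAMPLE_UNIT, "NoNewPrivileges"))
# prints: no
```

Note that systemd only picks up an edited unit file after `systemctl daemon-reload`, and `systemctl show fah-client -p NoNewPrivileges` reports the value the running service manager is actually using.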

The nvidia-smi command works fine in both the LXC and the host, and there has been no update to the drivers or kernel.
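
A working nvidia-smi does not by itself show that compute will work: nvidia-smi goes through NVML, while CUDA also needs the /dev/nvidia-uvm device node. As one thing to rule out, here is a minimal sketch that lists which of the usual NVIDIA device nodes are absent inside the container. The list of expected nodes is an assumption (a single GPU at index 0), and `missing_nodes` is a hypothetical helper.

```python
import os

# Device nodes commonly created by the NVIDIA Linux driver. Which ones a given
# container needs is an assumption here, not something confirmed in this thread.
EXPECTED_NODES = ["/dev/nvidiactl", "/dev/nvidia0", "/dev/nvidia-uvm"]


def missing_nodes(paths=EXPECTED_NODES, exists=os.path.exists):
    """Return the subset of `paths` that does not exist."""
    return [p for p in paths if not exists(p)]


if __name__ == "__main__":
    missing = missing_nodes()
    print("missing:", missing if missing else "none")
```

Running it once on the host and once inside the LXC after a reboot would show whether a node exists on the host but was not passed through to the container.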

Any ideas why the GPU would suddenly be unsupported and stop working after a restart?

Full log:

Code:

19:12:06:I1: Version: 8.4.9
19:12:06:I1: Author: Joseph Coffland <[email protected]>
19:12:06:I1: Org: foldingathome.org
19:12:06:I1: Copyright: 2023-2024, foldingathome.org
19:12:06:I1: Homepage: https://foldingathome.org/
19:12:06:I1: License: GPL-3.0-or-later
19:12:06:I1: URL: https://v8-4.foldingathome.org/
19:12:06:I1: Date: Nov 20 2024
19:12:06:I1: Time: 14:47:19
19:12:06:I1: Revision: 360fe71b1bd05bb89814bfb97b73a5bda84802d6
19:12:06:I1: Branch: master
19:12:06:I1: Compiler: GNU 8.3.0
19:12:06:I1: Options: -Wsuggest-override -faligned-new -std=c++17 -fsigned-char
19:12:06:I1: -ffunction-sections -fdata-sections -O3 -funroll-loops -fno-pie
19:12:06:I1: Platform: linux 4.19.0-27-cloud-amd64
19:12:06:I1: Bits: 64
19:12:06:I1: Mode: Release
19:12:06:I1: Args: --config=/etc/fah-client/config.xml
19:12:06:I1: --log=/var/log/fah-client/log.txt
19:12:06:I1: --log-rotate-dir=/var/log/fah-client/
19:12:06:I1: Config: /etc/fah-client/config.xml
19:12:06:I1:****************************** CBang ******************************
19:12:06:I1: Version: 1.7.2
19:12:06:I1: Author: Joseph Coffland <[email protected]>
19:12:06:I1: Org: Cauldron Development
19:12:06:I1: Copyright: Cauldron Development, 2003-2024
19:12:06:I1: Homepage: https://cauldrondevelopment.com/
19:12:06:I1: License: LGPL-2.1-or-later
19:12:06:I1: Date: Nov 19 2024
19:12:06:I1: Time: 21:54:38
19:12:06:I1: Revision: 443c54e909eb8d8994405a18fb328b5b05a623a5
19:12:06:I1: Branch: master
19:12:06:I1: Compiler: GNU 8.3.0
19:12:06:I1: Options: -Wsuggest-override -faligned-new -std=c++17 -fsigned-char
19:12:06:I1: -ffunction-sections -fdata-sections -O3 -funroll-loops -fno-pie
19:12:06:I1: -fPIC
19:12:06:I1: Platform: linux 4.19.0-27-cloud-amd64
19:12:06:I1: Bits: 64
19:12:06:I1: Mode: Release
19:12:06:I1:***************************** System ******************************
19:12:06:I1: CPU: Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
19:12:06:I1: CPU ID: GenuineIntel Family 6 Model 62 Stepping 4
19:12:06:I1: CPUs: 8
19:12:06:I1: Memory: 125.82GiB
19:12:06:I1:Free Memory: 35.12GiB
19:12:06:I1: OS Version: 6.8
19:12:06:I1:Has Battery: false
19:12:06:I1: On Battery: false
19:12:06:I1: Hostname: FaH
19:12:06:I1: UTC Offset: 0
19:12:06:I1: PID: 400
19:12:06:I1: CWD: /var/lib/fah-client
19:12:06:I1: Exec: /usr/bin/fah-client
19:12:06:I1:*******************************************************************
19:12:06:I2:<config>
19:12:06:I2: <!-- Account -->
19:12:06:I2: <account-token v='nZPqDnZPkhE3qkhEGaWKwGaVHQFOFHQJRBXuZRBe_Wk'/>
19:12:06:I2: <machine-name v='Proxmox'/>
19:12:06:I2:</config>
19:12:06:I1:Opening Database
19:12:06:I1:F@H ID = TQAWUFGH4FEY2bFua4_7MlksLh0o8pEQ3p6oElpsxM4
19:12:06:I3:Loading default resource group
19:12:06:I1:Listening for HTTP on 127.0.0.1:7396
19:12:06:I3:WU52:Loading work unit 52 with ID 3fCSsdY0hT1RzYVbiicpuN57YO_3AKqyROWZnS7I8rY
19:12:06:I3:WU58:Loading work unit 58 with ID azsIVqvskE4tGInHSL7NGu9M-YiPj1cljdOidWsF_EQ
19:12:06:I3:Loaded 2 wus.
19:12:06:I3:gpus = {
19:12:06:I3: "gpu:66:00:00": {"vendor": 4318, "device": 9604, "type": "nvidia", "supported": false, "description": "GA107 [GeForce RTX 3050 6GB]"}
19:12:06:I3:}
19:12:06:I1:Loaded cores/openmm-core-24/centos-7.9.2009-64bit/release/fahcore-24-centos-7.9.2009-64bit-release-8.1.4/FahCore_24
19:12:06:I1:Loaded cores/fahcore-a8-lin-64bit-avx_256-0.0.12/FahCore_a8
19:12:06:I3:Running FahCore: /var/lib/fah-client/cores/fahcore-a8-lin-64bit-avx_256-0.0.12/FahCore_a8 -dir azsIVqvskE4tGInHSL7NGu9M-YiPj1cljdOidWsF_EQ -suffix 01 -version 8.4.9 -lifeline 400 -np 7
19:12:06:I3:WU58:Started FahCore on PID 407
19:12:06:I1:OUT1:> GET https://api.foldingathome.org/machine/TQAWUFGH4FEY2bFua4_7MlksLh0o8pEQ3p6oElpsxM4 HTTP/1.1
19:12:06:I1:WU58:*********************** Log Started 2025-02-23T19:12:06Z ***********************
19:12:06:I1:WU58:************************** Gromacs Folding@home Core ***************************
19:12:06:I1:WU58: Core: Gromacs
19:12:06:I1:WU58: Type: 0xa8
19:12:06:I1:WU58: Version: 0.0.12
19:12:06:I1:WU58: Author: Joseph Coffland <[email protected]>
19:12:06:I1:WU58: Copyright: 2020 foldingathome.org
19:12:06:I1:WU58: Homepage: https://foldingathome.org/
19:12:06:I1:WU58: Date: Jan 16 2021
19:12:06:I1:WU58: Time: 19:23:19
19:12:06:I1:WU58: Compiler: GNU 8.3.0
19:12:06:I1:WU58: Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
19:12:06:I1:WU58: -fdata-sections -O3 -funroll-loops -fno-pie
19:12:06:I1:WU58: Platform: linux2 4.15.0-128-generic
19:12:06:I1:WU58: Bits: 64
19:12:06:I1:WU58: Mode: Release
19:12:06:I1:WU58: SIMD: avx_256
19:12:06:I1:WU58: OpenMP: ON
19:12:06:I1:WU58: CUDA: OFF
19:12:06:I1:WU58: Args: -dir azsIVqvskE4tGInHSL7NGu9M-YiPj1cljdOidWsF_EQ -suffix 01
19:12:06:I1:WU58: -version 8.4.9 -lifeline 400 -np 7
19:12:06:I1:WU58:************************************ libFAH ************************************
19:12:06:I1:WU58: Date: Jan 16 2021
19:12:06:I1:WU58: Time: 19:21:38
19:12:06:I1:WU58: Compiler: GNU 8.3.0
19:12:06:I1:WU58: Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
19:12:06:I1:WU58: -fdata-sections -O3 -funroll-loops -fno-pie
19:12:06:I1:WU58: Platform: linux2 4.15.0-128-generic
19:12:06:I1:WU58: Bits: 64
19:12:06:I1:WU58: Mode: Release
19:12:06:I1:WU58:************************************ CBang *************************************
19:12:06:I1:WU58: Date: Jan 16 2021
19:12:06:I1:WU58: Time: 19:21:24
19:12:06:I1:WU58: Compiler: GNU 8.3.0
19:12:06:I1:WU58: Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
19:12:06:I1:WU58: -fdata-sections -O3 -funroll-loops -fno-pie -fPIC
19:12:06:I1:WU58: Platform: linux2 4.15.0-128-generic
19:12:06:I1:WU58: Bits: 64
19:12:06:I1:WU58: Mode: Release
19:12:06:I1:WU58:************************************ System ************************************
19:12:06:I1:WU58: CPU: Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
19:12:06:I1:WU58: CPU ID: GenuineIntel Family 6 Model 62 Stepping 4
19:12:06:I1:WU58: CPUs: 8
19:12:06:I1:WU58: Memory: 125.82GiB
19:12:06:I1:WU58:Free Memory: 35.11GiB
19:12:06:I1:WU58: Threads: POSIX_THREADS
19:12:06:I1:WU58: OS Version: 6.8
19:12:06:I1:WU58:Has Battery: false
91Notch
Posts: 3
Joined: Sat Jul 02, 2022 9:01 pm

Re: GPU unsupported after restart

Post by 91Notch »

I had the same problem and was going to respond to this to see if anyone had found a solution, and in the process I found one. I had this GPU working previously in a Windows 11 VM, so I had to unwind all of the PCI passthrough configuration for that, and then went through the mediated device passthrough steps. The GPU was recognized in the container in nvtop, and it showed up in the web control page, but it was greyed out in the settings. I double-checked the previous VM passthrough settings, stripped the vfio lines out of /etc/modules, and restarted the host, and still no joy. I was just about ready to give up, but tried one last thing: apt update and apt upgrade, and the GPU started folding. I didn't take note of which files got updated/upgraded in that process, but there weren't that many.
In retrospect, it seems kind of obvious, but hopefully it will help someone get over that last hurdle.
muziqaz
Posts: 1531
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London

Re: GPU unsupported after restart

Post by muziqaz »

Try
sudo systemctl restart fah-client

On my Linux Mint 22.1 machine, fah-client loads at the same time as, or before, everything else, so it does not manage to scan the OpenCL/CUDA libraries in time (or something like that; it's just my theory). Restarting fah-client with the above command forces the client to scan again, but this time everything in the distro has already been loaded.
I'm honestly not sure if this will help.
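
To test the theory above independently of the client, one option is to ask the system's own library lookup whether the CUDA and OpenCL libraries can be found at all. This is a minimal sketch under the assumption that the relevant libraries are the ones named libcuda and libOpenCL; how fah-client itself locates them is not confirmed in this thread, and `find_gpu_libraries` is a hypothetical helper.

```python
import ctypes.util


def find_gpu_libraries(names=("cuda", "OpenCL"), finder=ctypes.util.find_library):
    """Map each library name to what the system library lookup finds, or None."""
    return {name: finder(name) for name in names}


if __name__ == "__main__":
    for name, found in find_gpu_libraries().items():
        print(f"{name}: {found or 'not found'}")
```

If a library is reported as not found here, restarting fah-client alone is unlikely to change the result, which would point at the driver libraries inside the guest instead of at startup ordering.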
FAH Omega tester
91Notch

Re: GPU unsupported after restart

Post by 91Notch »

Okay, my client worked, until it didn't. I checked a few hours later, and the CPU for the container was pinned and the console was unresponsive, so I reset the container. Now I'm back to where I started, with a client that says gpu:07:00:00 is not supported; OpenCL and CUDA both show as unsupported. Restarting fah-client has not resolved it, and apt update / apt upgrade both show everything is up to date. Rebooting the guest and rebooting the host don't get it any further along. It was running but didn't complete a work unit before it crashed, and then it wouldn't run again.
However, after I reinstalled with dpkg -i, it's working again. So I'll keep an eye on it and see how far it gets, and whether it can complete anything.
muziqaz

Re: GPU unsupported after restart

Post by muziqaz »

Key word there is container...