Re: Unable to run NVIDIA GPU with driver 535 [Solved]
Posted: Fri Dec 06, 2024 1:57 pm
by AthanSpod
Hmmm, except after shutting down and booting back up:
- `nvidia_uvm` module is and was loaded,
- But fah-client again didn't think CUDA was supported. The usual `CUDA not supported: cuInit() returned 999` logged.
- `systemctl restart fah-client.service` has it working OK again.
Logging for module insertion and fah-client startup:
Code:
2024-12-06T08:26:46.967616+00:00 emilia systemd-modules-load[552]: Inserted module 'nvidia_uvm'
2024-12-06T08:26:58.410268+00:00 emilia systemd[1]: Started fah-client.service - Folding@home Client.
So you'd think that was timed such that it should have worked.
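A minimal sketch of automating the manual restart above, in case it's useful. It only matches the exact error string quoted in this post. `cuda_init_failed` is a hypothetical helper name, and reading the client's messages from the systemd journal is an assumption (the client may log to its own file instead, in which case feed that file to the helper).

```shell
#!/bin/sh
# Hypothetical helper: reads log text on stdin and succeeds (exit 0)
# if it contains the cuInit() 999 failure message quoted above.
cuda_init_failed() {
    grep -qF 'CUDA not supported: cuInit() returned 999'
}

# Usage sketch (needs root; assumes the client logs to the journal):
#   journalctl -b -u fah-client.service | cuda_init_failed \
#       && systemctl restart fah-client.service
```

This doesn't explain the failure, it only repeats the `systemctl restart` step when the error shows up in the current boot's log.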
Re: Unable to run NVIDIA GPU with driver 535 [Solved]
Posted: Wed Dec 18, 2024 12:11 am
by HackinDoge
Not sure if valuable/applicable, but my workaround has been running /usr/bin/nvidia-smi right before starting up FAH. No insight as to how/why that works, but it does...
Code:
$ cat /sys/module/nvidia/version
550.135
Code:
$ nvidia-smi
Tue Dec 17 16:08:57 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.135                Driver Version: 550.135        CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1070 Ti     Off |   00000000:01:00.0 Off |                  N/A |
| 38%   69C    P2            156W /  180W |    1035MiB /   8192MiB |    100%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     30083      C   ...2009-64bit-release-8.1.4/FahCore_24       1032MiB |
+-----------------------------------------------------------------------------------------+
Code:
$ podman image ls
REPOSITORY                         TAG     IMAGE ID      CREATED       SIZE
lscr.io/linuxserver/foldingathome  latest  53f4ad7aec5a  21 hours ago  420 MB
Code:
$ podman container ls
CONTAINER ID  IMAGE                                     COMMAND  CREATED       STATUS       PORTS                   NAMES
9d734ec5fb0d  lscr.io/linuxserver/foldingathome:latest           17 hours ago  Up 17 hours  0.0.0.0:7396->7396/tcp  foldingathome
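The workaround described in this post (run `nvidia-smi`, then start FAH) could be wrapped up roughly like this. It's only a sketch: `warm_up_then_start` is a hypothetical name, and the container name in the usage line is taken from the `podman container ls` output above.

```shell
#!/bin/sh
# Hypothetical wrapper: run a "warm-up" command first, and only if it
# succeeds run the real start command (all remaining arguments).
warm_up_then_start() {
    warmup=$1
    shift
    if ! "$warmup" >/dev/null 2>&1; then
        echo "warm-up command failed: $warmup" >&2
        return 1
    fi
    "$@"
}

# Usage sketch:
#   warm_up_then_start /usr/bin/nvidia-smi podman start foldingathome
#   warm_up_then_start /usr/bin/nvidia-smi systemctl start fah-client.service
```

Taking the start command as separate arguments avoids word-splitting surprises and lets the same wrapper cover both the container and the systemd service.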
Re: Unable to run NVIDIA GPU with driver 535 [Solved]
Posted: Wed Dec 18, 2024 11:26 am
by Marcos FRM
Up until version 8.4.9, the fah-client service has NoNewPrivileges=yes set, meaning no process running in the service can escalate privileges, for example by running SUID root binaries. It's a crucial security measure, as nothing in the fah-client needs root privileges.
Unfortunately, the Nvidia driver is buggy and, under certain circumstances, relies on the nvidia-modprobe binary (see https://manpages.ubuntu.com/manpages/or ... obe.1.html) to create device nodes and do other tweaks. This binary will be invoked, I'm not entirely sure how, by the process requesting CUDA: in other words, by a process running within the fah-client service, running as the fah-client user, which is restricted by NoNewPrivileges=yes. So it ends up failing.
For the next version, we're disabling this feature (reluctantly) to avoid this issue, hoping Nvidia fixes their driver someday.
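To check whether a given machine is in the situation described above, something along these lines might help. It's a sketch under assumptions: both helper names are hypothetical, and the device node list (`nvidiactl`, `nvidia0`, `nvidia-uvm`) assumes a single GPU at index 0, as in the `nvidia-smi` output earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper: reads `systemctl show` output on stdin and
# succeeds if it contains the exact line NoNewPrivileges=yes.
no_new_privs_enabled() {
    grep -qx 'NoNewPrivileges=yes'
}

# Hypothetical helper: prints the NVIDIA device nodes that are missing
# under the given directory (default /dev). Assumes one GPU, index 0.
missing_nvidia_nodes() {
    dev_dir=${1:-/dev}
    for node in nvidiactl nvidia0 nvidia-uvm; do
        [ -e "$dev_dir/$node" ] || echo "$node"
    done
}

# Usage sketch:
#   systemctl show -p NoNewPrivileges fah-client.service | no_new_privs_enabled \
#       && missing_nvidia_nodes
```

If the service is restricted and any node is listed as missing before the client starts, that would be consistent with the nvidia-modprobe explanation, though it doesn't prove it.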