
Replaced failed RTX 3080 Ti - FAH shows GPU disabled

Posted: Thu Jun 13, 2024 10:17 pm
by jchang6
I had an RTX 3080 Ti fail after about 3 years. The display went dark (the motherboard did not switch over to the iGPU), but the system was still running and accessible over the network. FAHControl on a remote system showed the machine as up, but with the GPU disabled.
I was only folding on the GPU; I had deleted the CPU slot.
I shut down the system and replaced the 3080 (the card was warm, not hot) with a 4060 Ti.
The system now works and the display is good. I updated the NVIDIA driver, but FAHControl says the GPU is disabled.
I uninstalled FAH, including data, and reinstalled it. FAH now shows the CPU and the GPU, but the GPU is still disabled.
Any ideas?
Thanks

PS: I have lost 12 places while the 3080 was disabled; I will need to get a couple of additional 4060 Tis to catch up.

Posted: Thu Jun 13, 2024 11:02 pm
by bikeaddict
The Log and System Info tabs in FAHControl should show any CUDA or OpenCL errors with the GPU.

Posted: Fri Jun 14, 2024 12:06 am
by jchang6
22:05:41:WARNING:FS01:Disabling beta GPU slot 01: gpu:1:0. Beta GPUs can be tested for no points by setting ``gpu-beta=true`` in the configuration.

Posted: Fri Jun 14, 2024 12:10 am
by jchang6
On a working system, the log shows:
13:36:03: GPUs: 1
13:36:03: GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:8 GA106 [GeForce RTX 3060 Lite Hash
13:36:03: Rate]
13:36:03: CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:8.6 Driver:12.5
13:36:03:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:3.0 Driver:555.99
13:36:03:OpenCL Device 1: Platform:1 Device:0 Bus:NA Slot:NA Compute:3.0 Driver:31.0

On the non-functional system:
22:05:41: GPUs: 1
22:05:41: GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:1
22:05:41: CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:8.9 Driver:12.5
22:05:41:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:3.0 Driver:555.99

Posted: Fri Jun 14, 2024 12:21 am
by bikeaddict
It usually gives the beta GPU message when it has failed to download the GPUs.txt file from the F@H server. Sometimes the network isn't initialized yet when the F@H service starts at boot. You can try deleting the GPUs.txt file, or downloading it manually from https://apps.foldingathome.org/GPUs.txt, and then restarting the client.
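A minimal sketch of checking whether a particular card is listed in a GPUs.txt file. It assumes each entry has the colon-separated form vendor:device:type:species:description, as in the 4060 Ti line quoted later in this thread (0x10de:0x2788:2:9:AD104 [GeForce RTX 4060 Ti]). The field names, the meaning of the third field, and the rule of skipping lines that do not start with "0x" are assumptions, not the client's own parser; parse_gpus_txt and lookup are hypothetical helper names.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class GpuEntry(NamedTuple):
    vendor: int        # PCI vendor ID, e.g. 0x10de for NVIDIA
    device: int        # PCI device ID, e.g. 0x2788
    gpu_type: int      # third field; its meaning is an assumption
    species: int       # fourth field; shows up as "NVIDIA:9" in the client log
    description: str   # e.g. "AD104 [GeForce RTX 4060 Ti]"


def parse_gpus_txt(text: str) -> Dict[Tuple[int, int], GpuEntry]:
    """Parse GPUs.txt-style lines (vendor:device:type:species:description).

    Assumption: lines that do not start with "0x", or that fail to parse,
    are skipped rather than treated as errors.
    """
    entries: Dict[Tuple[int, int], GpuEntry] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("0x"):
            continue
        fields = line.split(":", 4)  # keep any colons inside the description
        if len(fields) != 5:
            continue
        try:
            entry = GpuEntry(
                vendor=int(fields[0], 16),   # int() accepts the "0x" prefix with base 16
                device=int(fields[1], 16),
                gpu_type=int(fields[2]),
                species=int(fields[3]),
                description=fields[4].strip(),
            )
        except ValueError:
            continue
        entries[(entry.vendor, entry.device)] = entry
    return entries


def lookup(entries: Dict[Tuple[int, int], GpuEntry],
           vendor: int, device: int) -> Optional[GpuEntry]:
    """Return the entry for a vendor/device pair, or None if the card is not listed."""
    return entries.get((vendor, device))


if __name__ == "__main__":
    # Sample line taken from this thread. For a real check, read the client's
    # GPUs.txt instead (its location depends on the install; assumption).
    sample = "0x10de:0x2788:2:9:AD104 [GeForce RTX 4060 Ti]\n"
    entry = lookup(parse_gpus_txt(sample), 0x10DE, 0x2788)
    if entry is None:
        print("Not listed: the client may treat this GPU as unknown/beta")
    else:
        print(f"{entry.description} (species {entry.species})")
```

If lookup returns None for your card's vendor/device pair, that would be consistent with the "Disabling beta GPU slot" warning above, even when GPUs.txt itself downloaded fine.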

Posted: Fri Jun 14, 2024 12:36 am
by jchang6
I did notice there is a GPUs.txt file that does include the 4060 Ti.
In retrospect, I have seen this problem before, and it eventually cleared itself, so what you say makes sense.
I will just reboot daily until it clears.

Posted: Mon Jun 17, 2024 1:09 pm
by jchang6
I removed the 4060 Ti from the first machine and put it in a different machine. Still the same.
FAHControl System Info says GPU 0 Bus:1 Slot:0 NVIDIA, and the slot status says: Disabled, description gpu:1:0.
On the first machine, I put in an old AMD R7; it is also disabled, but the status does say R7 ...

Posted: Mon Jun 24, 2024 7:33 pm
by toTOW
I guess this GPU has a new Device ID; NVIDIA likes to ship the same model under different IDs. See this post for how to get the ID and request that it be added: viewtopic.php?p=262894#p262894

Posted: Thu Aug 22, 2024 3:44 pm
by jchang6
So I gave up on this. But just yesterday I felt heat coming from the formerly largely inactive GPU.
FAH now shows, starting with "Log Started 2024-08-20T03:58:07Z":
03:58:07: GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:9 AD104 [GeForce RTX 4060 Ti]

Before, it was:
21:36:23: GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:1

The GPUs.txt file is dated 8/19/2024 11:58 PM (Eastern Daylight Time).

Posted: Thu Aug 22, 2024 4:41 pm
by Joe_H
You never responded to toTOW's directions to post the Device ID of the new card, so no one had the necessary info to add it. toTOW obtained a list of some new GPUs and their Device IDs through other sources and added them last Saturday:

viewtopic.php?p=365207#p365207 & viewtopic.php?p=365208#p365208

That appears to have included your new card, and the automatic update the client does about once a month pulled in an updated GPUs.txt. There are now three different entries for RTX 4060 Ti cards. Cards whose Device IDs are provided by folders do get added more quickly.

Posted: Thu Aug 22, 2024 7:15 pm
by jchang6
According to GPU-Z, the Device ID is:
10DE 2788 - 1462 5121

The entry in GPUs.txt seems to be:
0x10de:0x2788:2:9:AD104 [GeForce RTX 4060 Ti]

Posted: Thu Aug 22, 2024 7:23 pm
by Joe_H
Yes, that is correct; it was added in the second post I linked to. GPU-Z just shows the hexadecimal digits of the ID; the "0x" prefix is a standard way of indicating that what follows is hexadecimal. 10de is the manufacturer (vendor) code for NVIDIA, and 2788 is the designation NVIDIA assigned to that particular GPU.
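A small sketch of the conversion described above, going from the GPU-Z string to the "0x10de:0x2788" form that begins the GPUs.txt line quoted above. It assumes GPU-Z prints the ID as "VVVV DDDD - SSSS ssss", with vendor and device first; the pair after the dash is assumed here to be the board maker's subsystem IDs, which that GPUs.txt line does not use. The function names are hypothetical.

```python
import re
from typing import Tuple


def parse_gpuz_device_id(text: str) -> Tuple[int, int, int, int]:
    """Split a GPU-Z style "VVVV DDDD - SSSS ssss" string into four integers.

    Assumption: the first pair is vendor/device and the pair after the dash
    is the subsystem vendor/device, which is ignored for the GPUs.txt key.
    """
    groups = re.findall(r"\b[0-9A-Fa-f]{4}\b", text)
    if len(groups) != 4:
        raise ValueError(f"expected four 4-digit hex groups, got {groups!r}")
    vendor, device, sub_vendor, sub_device = (int(g, 16) for g in groups)
    return vendor, device, sub_vendor, sub_device


def gpus_txt_key(vendor: int, device: int) -> str:
    """Format vendor/device the way a GPUs.txt entry begins (lowercase, 0x prefix)."""
    return f"0x{vendor:04x}:0x{device:04x}"


if __name__ == "__main__":
    vendor, device, _, _ = parse_gpuz_device_id("10DE 2788 - 1462 5121")
    print(gpus_txt_key(vendor, device))  # 0x10de:0x2788
    print(vendor, device)                # 4318 10120 (the same IDs in decimal)
```

The decimal line is only there to show that "10DE" and "0x10de" name the same number; the hex form is what you would search for in GPUs.txt.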

Posted: Thu Aug 22, 2024 7:38 pm
by jchang6
It would appear that a 4060 Ti can be built either from the size-appropriate AD106 chip or from the larger AD104 chip, which can also be used for the 4070, 4070 Super, or 4070 Ti, though not the 4070 Ti Super.
So it is really a matter of noting which products can be built from each of the chips.

Posted: Fri Aug 23, 2024 12:55 am
by Joe_H
NVIDIA does that a lot. They may take binned GPU chips with some sections that don't pass tests and disable those sections, then use them for a lesser GPU model. Sometimes they may even do it with chips that pass completely; it depends on the cost of having a fab schedule production of a different chip versus more of the current one. The end result is a lower overall cost to NVIDIA for producing a range of cards, and cards with the same model designation will be almost identical in processing power.