Replaced failed RTX 3080 Ti - Fah shows gpu disabled

jchang6
Posts: 65
Joined: Sat May 09, 2020 2:13 pm
Hardware configuration: Intel Xeon E3/E5, various generations from Westmere to Skylake. AMD Radeon RX5x00 and nVidia RTX 2080 Super.
Location: Boston
Contact:

Replaced failed RTX 3080 Ti - Fah shows gpu disabled

Post by jchang6 »

I had an RTX 3080 Ti fail after about 3 years. The display went dark (the motherboard did not switch to the iGPU), but the system was still running and accessible over the network. FAHControl on a remote system showed the machine as up, but with the GPU disabled.
I was only folding on the GPU; I had deleted the CPU slot.
I shut down the system and replaced the 3080 Ti (the card was warm, not hot) with a 4060 Ti.
The system now works, the display is good, and I updated the nVidia driver, but
FAHControl says the GPU is disabled.
I uninstalled FAH, including data, and
reinstalled it. FAH now shows CPU and GPU slots, but the GPU is still disabled.
Any ideas?
Thanks

PS: I have lost 12 places in the time the 3080 Ti was disabled, and will need to get a couple of additional 4060 Ti's to catch up.
bikeaddict
Posts: 210
Joined: Sun May 03, 2020 1:20 am

Re: Replaced failed RTX 3080 Ti - Fah shows gpu disabled

Post by bikeaddict »

The Log and System Info tabs in FAHControl should show any CUDA or OpenCL errors with the GPU.
jchang6

Re: Replaced failed RTX 3080 Ti - Fah shows gpu disabled

Post by jchang6 »

22:05:41:WARNING:FS01:Disabling beta GPU slot 01: gpu:1:0. Beta GPUs can be tested for no points by setting ``gpu-beta=true`` in the configuration.
jchang6

Re: Replaced failed RTX 3080 Ti - Fah shows gpu disabled

Post by jchang6 »

On a working system, the log shows:
13:36:03: GPUs: 1
13:36:03: GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:8 GA106 [GeForce RTX 3060 Lite Hash
13:36:03: Rate]
13:36:03: CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:8.6 Driver:12.5
13:36:03:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:3.0 Driver:555.99
13:36:03:OpenCL Device 1: Platform:1 Device:0 Bus:NA Slot:NA Compute:3.0 Driver:31.0

On the non-functional system, it shows:
22:05:41: GPUs: 1
22:05:41: GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:1
22:05:41: CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:8.9 Driver:12.5
22:05:41:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:3.0 Driver:555.99
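
The difference between the two logs is the missing model name after `NVIDIA:1`. A minimal Python sketch for spotting that automatically follows. It assumes the log line layout is exactly as pasted above; the function name `find_unnamed_gpus` is made up for illustration and is not part of the FAH client.

```python
import re

# Matches client log lines such as:
#   13:36:03: GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:8 GA106 [GeForce RTX 3060 ...
#   22:05:41: GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:1
# The layout is taken from the logs pasted above (assumption: it is stable).
GPU_LINE = re.compile(
    r"GPU (?P<index>\d+): Bus:(?P<bus>\d+) Slot:(?P<slot>\d+) Func:(?P<func>\d+)"
    r" (?P<vendor>[A-Za-z]+):(?P<number>\d+)(?P<name>.*)$"
)


def find_unnamed_gpus(log_text):
    """Return (index, bus, slot) for each GPU line that carries no model name."""
    unnamed = []
    for line in log_text.splitlines():
        match = GPU_LINE.search(line)
        if match and not match.group("name").strip():
            unnamed.append(
                (int(match.group("index")),
                 int(match.group("bus")),
                 int(match.group("slot")))
            )
    return unnamed
```

Feeding it the text of the Log tab from the non-functional system should report GPU 0 on bus 1, slot 0, while the working system's log should produce an empty list.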
bikeaddict

Re: Replaced failed RTX 3080 Ti - Fah shows gpu disabled

Post by bikeaddict »

It usually gives the beta GPUs message when it has failed to download the GPUs.txt file from the F@H server. Sometimes the network isn't initialized yet when the F@H service starts at boot. You can try deleting the GPUs.txt file, or downloading it manually from https://apps.foldingathome.org/GPUs.txt, and then restarting the client.
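
If you prefer to script the manual download, here is a rough Python sketch. The URL is the one given above; the function name `fetch_gpus_txt` and the destination path are placeholders, since where the client keeps its data depends on your OS and install.

```python
import shutil
import urllib.request
from pathlib import Path

GPUS_URL = "https://apps.foldingathome.org/GPUs.txt"  # URL from the post above


def fetch_gpus_txt(dest, url=GPUS_URL, timeout=30):
    """Download the GPU list to dest, replacing an existing copy only on success."""
    dest = Path(dest)
    # Write to a temporary ".part" file first so a failed download
    # does not leave a truncated GPUs.txt behind.
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out)
    tmp.replace(dest)
    return dest
```

Stop the client first, call `fetch_gpus_txt(...)` with the path to GPUs.txt in your own client data directory, then restart the client.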
jchang6

Re: Replaced failed RTX 3080 Ti - Fah shows gpu disabled

Post by jchang6 »

I did notice there is a GPUs.txt file that does list the 4060 Ti.
In retrospect, I have seen this problem before, and it eventually cleared itself, so
what you say makes sense.
I will just reboot daily until it clears.
jchang6

Re: Replaced failed RTX 3080 Ti - Fah shows gpu disabled

Post by jchang6 »

I removed the 4060 Ti from the first machine and put it in a different machine. Still the same.
FAHControl System Info says: GPU 0 Bus:1 Slot:0 NVIDIA
Status says: Disabled, description gpu:1:0
On the first machine, I put in an old AMD R7. It is also disabled, but the status does say R7 ...
toTOW
Site Moderator
Posts: 6359
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France
Contact:

Re: Replaced failed RTX 3080 Ti - Fah shows gpu disabled

Post by toTOW »

I guess this GPU has a new Device ID; nVidia likes to ship the same model with different IDs ... See this post for how to get the ID and request that it be added: viewtopic.php?p=262894#p262894

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
jchang6

Re: Replaced failed RTX 3080 Ti - Fah shows gpu disabled

Post by jchang6 »

So I gave up on this. But just yesterday I felt heat coming from the formerly largely inactive GPU.
FAH now shows the following, starting with Log Started 2024-08-20T03:58:07Z:
03:58:07: GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:9 AD104 [GeForce RTX 4060 Ti]

Before, it was:
21:36:23: GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:1

The GPUs.txt file is dated 8/19/2024 11:58 PM (EST/DLST).
Joe_H
Site Admin
Posts: 7937
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Replaced failed RTX 3080 Ti - Fah shows gpu disabled

Post by Joe_H »

You never responded to toTOW's directions to post the Device ID of the new card, so no one had the necessary info to add it. toTOW obtained a list of some new GPUs and their Device IDs through other sources and added them last Saturday:

viewtopic.php?p=365207#p365207 & viewtopic.php?p=365208#p365208

That appears to have included your new card, and the autoupdate the client does about once a month pulled in an updated GPUs.txt. There are now three different entries for RTX 4060 Ti cards. Card info with Device IDs provided by folders do get added more quickly.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
jchang6

Re: Replaced failed RTX 3080 Ti - Fah shows gpu disabled

Post by jchang6 »

According to GPU-Z, the IDs are:
10DE 2788 - 1462 5121

The entry in GPUs.txt seems to be:
0x10de:0x2788:2:9:AD104 [GeForce RTX 4060 Ti]
Joe_H
Site Admin
Posts: 7937
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Replaced failed RTX 3080 Ti - Fah shows gpu disabled

Post by Joe_H »

Yes, that is correct, and it was added in the second post I linked to. GPU-Z just shows the hexadecimal portion of the ID; the "0x" prefix is a standard way of indicating that what follows is hexadecimal. 10de is the vendor ID for Nvidia, and 2788 is the device ID assigned by Nvidia to that particular GPU.
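
To make the matching concrete, here is a small Python sketch that compares the hex IDs GPU-Z reports against GPUs.txt entries. It assumes each entry has five colon-separated fields, as in the line quoted above. The meaning of the third and fourth fields is not spelled out in this thread (the 9 does match the `NVIDIA:9` in the log), so they are kept as plain numbers. Skipping `#` comment lines and malformed lines is also an assumption, and the function names are made up for illustration.

```python
def parse_gpus_line(line):
    """Split one GPUs.txt entry into (vendor_id, device_id, field3, field4, description)."""
    vendor, device, field3, field4, description = line.strip().split(":", 4)
    # int(..., 16) accepts hex both with and without the "0x" prefix.
    return int(vendor, 16), int(device, 16), int(field3), int(field4), description


def find_device(gpus_text, vendor_hex, device_hex):
    """Return the description for a vendor/device pair given as hex strings, or None."""
    wanted = (int(vendor_hex, 16), int(device_hex, 16))
    for line in gpus_text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # assumption: blank lines and comments may be present
        try:
            vendor, device, _, _, description = parse_gpus_line(line)
        except ValueError:
            continue  # not a five-field entry; ignore it
        if (vendor, device) == wanted:
            return description
    return None
```

For example, `find_device(text, "10DE", "2788")` on a GPUs.txt containing the entry above returns `AD104 [GeForce RTX 4060 Ti]`. A result of `None` would mean that vendor/device pair is not in the file.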
jchang6

Re: Replaced failed RTX 3080 Ti - Fah shows gpu disabled

Post by jchang6 »

It would appear that a 4060 Ti can be built either from the size-appropriate AD106 chip or from the larger AD104 chip, which can also accommodate the 4070, 4070 Super, or 4070 Ti, though not the 4070 Ti Super.
So it is really a matter of noting which products can be built from each of the chips.
Joe_H
Site Admin
Posts: 7937
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Replaced failed RTX 3080 Ti - Fah shows gpu disabled

Post by Joe_H »

Nvidia does that a lot. They may take binned GPU chips with some sections that don't pass tests and disable those sections. Then they use them for a lesser model of GPU card. Sometimes they may even do it with chips that pass completely; it depends on the cost of having a fab schedule production of a different chip versus more of the current one. The end result is a lower overall cost to Nvidia to produce a range of cards, and the ones with the same model designation will be almost identical in processing power.