Anyone having trouble with the newest NVIDIA Driver 546.65?
Posted: Sun Jan 21, 2024 10:28 pm
by Beercules48
Today FaH started to cause my PC to crash. Either the PC black-screens and Windows restarts, or it freezes completely so I have to do the old choke press on the power button. It happened with the DUD-E and Alzheimer's tasks. Happens every time, reproducible. The CPU/GPU doesn't overheat or anything, and other non-FaH computing jobs run perfectly fine.
Is anyone aware of any changes on the side of FaH that may cause this? The only thing I changed is that I updated the graphics driver, but I'd hate to revert that...
Anyone else with the same issue?
Re: Anyone having trouble with the newest NVIDIA Driver 546.65?
Posted: Mon Jan 22, 2024 5:04 am
by HaloJones
Haven't updated my driver and I'm getting behaviour identical to what you describe.
Re: Anyone having trouble with the newest NVIDIA Driver 546.65?
Posted: Mon Jan 22, 2024 6:10 am
by Beercules48
Ah, that is very interesting. Makes me think it might be unrelated to the driver and saves me from the hassle of DDUing the driver and all those shenanigans.... For now I think I will try again later today and see if it is fixed...
Re: Anyone having trouble with the newest NVIDIA Driver 546.65?
Posted: Mon Jan 22, 2024 8:46 am
by HaloJones
getting a Windows DPC Watchdog Violation. If I have the logs open at the moment of failure it's showing as a CUDA error before it freezes. No changes on my end. Computer is 100% stable if folding is paused. no overclock. single watercooled 1070. no cpu folding.
Re: Anyone having trouble with the newest NVIDIA Driver 546.65?
Posted: Mon Jan 22, 2024 5:37 pm
by Beercules48
having that open when one anticipates a crash is a very smart idea, i might try that later IF i can be bothered to crash my PC on purpose. well, i hope that it's just an issue of work units being broken that resolves itself soon.....
Re: Anyone having trouble with the newest NVIDIA Driver 546.65?
Posted: Tue Jan 23, 2024 12:31 pm
by HaloJones
if it's any use to you, testing my gpu has caused it to fail completely. I think it has been failing for a few days and is now gone entirely.
Re: Anyone having trouble with the newest NVIDIA Driver 546.65?
Posted: Tue Jan 23, 2024 4:04 pm
by Joe_H
There is no general report of problems with the most recent driver from Nvidia. Please provide extracts from your logs showing the system and folding configuration as well as the WUs that are failing with the associated error messages.
Re: Anyone having trouble with the newest NVIDIA Driver 546.65?
Posted: Tue Jan 23, 2024 6:26 pm
by Beercules48
HaloJones wrote: ↑Tue Jan 23, 2024 12:31 pm
if it's any use to you, testing my gpu has caused it to fail completely. I think it has been failing for a few days and is now gone entirely.
well, that is troubling news that your GPU failed. doesn't bode well for mine, of course. so far I only have issues with FaH, so maybe the CUDA platform in my GPU has failed...
which test did you run so I can try that myself?
Re: Anyone having trouble with the newest NVIDIA Driver 546.65?
Posted: Tue Jan 23, 2024 6:40 pm
by Beercules48
I had some time and ran a few tests. It is not driver related: I reverted to 546.33, which I know for a fact has worked flawlessly in the past. Same crashes.
I checked the Windows logs, nothing suspicious there either. Since the GPU runs fine for hours during heavy gaming where she also pulls upwards of 440 Watts, it can't reasonably be the PSU. Temps are not the issue either.
So yeah, it points to a faulty CUDA platform in my GPU. Sad times....
I can't attach text files here, so I will keep the extracts from the log files as brief as I can while retaining the relevant information. Please let me know what additional data might be helpful.
Code:
18:17:01:I1::WU88:Project: 18215 (Run 11155, Clone 1, Gen 11)
...
18:17:01:I1::WU88: Core: Core23
18:17:01:I1::WU88: Type: 0x23
18:17:01:I1::WU88: Version: 8.0.3
...
18:17:01:I1::WU88:There are 4 platforms available.
18:17:01:I1::WU88:Platform 0: Reference
18:17:01:I1::WU88:Platform 1: CPU
18:17:01:I1::WU88:Platform 2: OpenCL
18:17:01:I1::WU88: opencl-device 0 specified
18:17:01:I1::WU88:Platform 3: CUDA
18:17:01:I1::WU88: cuda-device 0 specified
18:17:17:I1::WU88:Attempting to create CUDA context:
18:17:17:I1::WU88: Configuring platform CUDA
18:17:22:I1::WU88: Using CUDA on CUDA Platform and gpu 0
18:17:22:I1::WU88: GPU info: Platform: CUDA
18:17:22:I1::WU88: GPU info: PlatformIndex: 0
18:17:22:I1::WU88: GPU info: Device: NVIDIA GeForce RTX 3090 Ti
18:17:22:I1::WU88: GPU info: DeviceIndex: 0
18:17:22:I1::WU88: GPU info: Vendor: 0x10de
18:17:22:I1::WU88: GPU info: PCI: 45:00:00
18:17:22:I1::WU88: GPU info: Compute: 8.6
18:17:22:I1::WU88: GPU info: Driver: 12.3
18:17:22:I1::WU88: GPU info: GPU: true
18:17:22:I1::WU88:Completed 0 out of 1250000 steps (0%)
18:17:23:I1::WU88:Checkpoint completed at step 0
18:18:10:I1::WU88:Completed 12500 out of 1250000 steps (1%)
18:18:57:I1::WU88:Completed 25000 out of 1250000 steps (2%)
18:18:58:I1::WU88:Checkpoint completed at step 25000
18:19:45:I1::WU88:Completed 37500 out of 1250000 steps (3%)
[log ends abruptly]
Code:
17:56:11:I1::WU87:Project: 12245 (Run 0, Clone 337, Gen 12)
17:56:11:I1::WU87:Reading tar file core.xml
17:56:11:I1::WU87:Reading tar file integrator.xml
17:56:11:I1::WU87:Reading tar file state.xml.bz2
17:56:11:I1::WU87:Reading tar file system.xml.bz2
17:56:11:I1::WU87:Digital signatures verified
17:56:11:I1::WU87:Folding@home GPU Core23 Folding@home Core
17:56:11:I1::WU87:Version 8.0.3
17:56:11:I1::WU87: Checkpoint write interval: 50000 steps (2%) [50 total]
17:56:11:I1::WU87: JSON viewer frame write interval: 25000 steps (1%) [100 total]
17:56:11:I1::WU87: XTC frame write interval: 25000 steps (1%) [100 total]
17:56:11:I1::WU87: Global context and integrator variables write interval: disabled
17:56:11:I1::WU87:There are 4 platforms available.
17:56:11:I1::WU87:Platform 0: Reference
17:56:11:I1::WU87:Platform 1: CPU
17:56:11:I1::WU87:Platform 2: OpenCL
17:56:11:I1::WU87: opencl-device 0 specified
17:56:11:I1::WU87:Platform 3: CUDA
17:56:11:I1::WU87: cuda-device 0 specified
17:56:14:I1::WU87:Attempting to create CUDA context:
17:56:14:I1::WU87: Configuring platform CUDA
17:56:17:I1::WU87: Using CUDA on CUDA Platform and gpu 0
17:56:17:I1::WU87: GPU info: Platform: CUDA
17:56:17:I1::WU87: GPU info: PlatformIndex: 0
17:56:17:I1::WU87: GPU info: Device: NVIDIA GeForce RTX 3090 Ti
17:56:17:I1::WU87: GPU info: DeviceIndex: 0
17:56:17:I1::WU87: GPU info: Vendor: 0x10de
17:56:17:I1::WU87: GPU info: PCI: 45:00:00
17:56:17:I1::WU87: GPU info: Compute: 8.6
17:56:17:I1::WU87: GPU info: Driver: 12.3
17:56:17:I1::WU87: GPU info: GPU: true
17:56:17:I1::WU87:Completed 0 out of 2500000 steps (0%)
17:56:17:I1::WU87:Checkpoint completed at step 0
.....
18:06:08:I1::WU87:Completed 875000 out of 2500000 steps (35%)
18:06:25:I1::WU87:Completed 900000 out of 2500000 steps (36%)
18:06:25:I1::WU87:Checkpoint completed at step 900000
18:06:42:I1::WU87:Completed 925000 out of 2500000 steps (37%)
[log ends]
same issue for "Project: 12280 (Run 0, Clone 336, Gen 46)", log looks basically the same so I didn't post it
I did not receive any error message and am unsure where I would find one.
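For anyone who wants to check logs like these mechanically rather than by eye, here is a minimal Python sketch. It is not part of FaH; it only assumes the line shape visible in the excerpts above (`HH:MM:SS:I1::WU<n>:<message>`). The names `summarize`, `WUSummary` and `SAMPLE` are made up for illustration. For each work unit it records the last reported step, the last checkpoint, and whether a "Core returned ..." line was ever written. A unit with progress lines but no exit line is what a hard freeze looks like in the log.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

# Line shape taken from the excerpts above: HH:MM:SS:<level>::WU<n>:<message>
LINE = re.compile(r"^(\d\d:\d\d:\d\d):\w+::(WU\d+):\s*(.*)$")
PROGRESS = re.compile(r"Completed (\d+) out of (\d+) steps")
CHECKPOINT = re.compile(r"Checkpoint completed at step (\d+)")
RETURNED = re.compile(r"Core returned (\w+)")


@dataclass
class WUSummary:
    last_seen: str = ""
    last_step: int = 0
    total_steps: int = 0
    last_checkpoint: int = 0
    result: Optional[str] = None  # stays None if the core never reported back


def summarize(lines: Iterable[str]) -> Dict[str, WUSummary]:
    """Collect the last known state of every work unit found in the log lines."""
    units: Dict[str, WUSummary] = {}
    for raw in lines:
        match = LINE.match(raw.strip())
        if match is None:
            continue  # not a per-WU line (client messages, elisions, etc.)
        stamp, wu, message = match.groups()
        summary = units.setdefault(wu, WUSummary())
        summary.last_seen = stamp
        progress = PROGRESS.search(message)
        if progress:
            summary.last_step = int(progress.group(1))
            summary.total_steps = int(progress.group(2))
        checkpoint = CHECKPOINT.search(message)
        if checkpoint:
            summary.last_checkpoint = int(checkpoint.group(1))
        returned = RETURNED.search(message)
        if returned:
            summary.result = returned.group(1)
    return units


# A few lines copied from the logs posted in this thread.
SAMPLE = """\
18:18:57:I1::WU88:Completed 25000 out of 1250000 steps (2%)
18:18:58:I1::WU88:Checkpoint completed at step 25000
18:19:45:I1::WU88:Completed 37500 out of 1250000 steps (3%)
22:01:31:I1::WU89:Completed 5000000 out of 5000000 steps (100%)
22:01:31:I1::WU89:Checkpoint completed at step 5000000
22:01:37:I1::WU89:Core returned FINISHED_UNIT (100)
""".splitlines()

if __name__ == "__main__":
    for wu, s in summarize(SAMPLE).items():
        status = s.result or "no exit line, log just stops"
        print(f"{wu}: step {s.last_step}/{s.total_steps}, "
              f"last checkpoint {s.last_checkpoint}, {status}")
```

To run it on a real log, pass an open file instead of `SAMPLE`, e.g. `summarize(open("log.txt", encoding="utf-8", errors="replace"))`. The file name and location depend on the client install, so that path is only a placeholder.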
Also, if anyone knows a good way to test and diagnose the CUDA platform of one's GPU, any info would be appreciated.
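This is not a CUDA test as such, but one low-risk step is to log temperature, power draw and utilisation to disk while folding, so that the last readings before a freeze survive the crash. A rough Python sketch follows, assuming `nvidia-smi` (which ships with the NVIDIA driver) is on the PATH. The helper names and the `gpu_trace.csv` file name are hypothetical; the query fields are standard `nvidia-smi --query-gpu` properties.

```python
import csv
import os
import subprocess
import time

# Properties requested from nvidia-smi, in this order.
FIELDS = ["timestamp", "name", "driver_version",
          "temperature.gpu", "power.draw", "utilization.gpu"]


def parse_row(line):
    """Turn one CSV line of nvidia-smi output into a dict keyed by FIELDS."""
    values = [v.strip() for v in next(csv.reader([line]))]
    return dict(zip(FIELDS, values))


def query_gpus():
    """Ask nvidia-smi for one reading per GPU (assumes it is on the PATH)."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_row(line) for line in out.splitlines() if line.strip()]


def log_forever(path="gpu_trace.csv", interval=1.0):
    """Append readings to `path`, forcing each batch to disk before sleeping."""
    with open(path, "a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        while True:
            for row in query_gpus():
                writer.writerow(row)
            fh.flush()
            os.fsync(fh.fileno())  # so the last rows survive a hard freeze
            time.sleep(interval)


if __name__ == "__main__":
    log_forever()
```

Start it in a second terminal before resuming folding. After a crash, the tail of `gpu_trace.csv` shows what the card was doing in the final seconds, which helps separate a thermal or power problem from a compute fault.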
Re: Anyone having trouble with the newest NVIDIA Driver 546.65?
Posted: Wed Jan 24, 2024 2:31 pm
by HaloJones
I tried Heaven (ancient) which seemed to work. Tried to setup Timespy but I hate Steam with an overriding passion. I tried Furmark and the computer instantly crashed. On restart the card was no longer present in Device Manager. Thankfully I have a cpu with an IGP so am working off that until I get a replacement gpu.
Re: Anyone having trouble with the newest NVIDIA Driver 546.65?
Posted: Wed Jan 24, 2024 4:49 pm
by Beercules48
HaloJones wrote: ↑Wed Jan 24, 2024 2:31 pm
I tried Heaven (ancient) which seemed to work. Tried to setup Timespy but I hate Steam with an overriding passion. I tried Furmark and the computer instantly crashed. On restart the card was no longer present in Device Manager. Thankfully I have a cpu with an IGP so am working off that until I get a replacement gpu.
ah, that sucks. sorry to hear that!
well, now I am extremely hesitant to stress my GPU further, because everything except CUDA works fine, SO FAR.... thanks for the info nonetheless, I might have to bite the bullet and try that at some point to know for sure....
my CPU doesn't have an IGP, so I'd have to raid my B-rig for a GPU....
sad times.
Re: Anyone having trouble with the newest NVIDIA Driver 546.65?
Posted: Sun Jan 28, 2024 2:22 pm
by toTOW
I was about to suggest testing the GPU and/or PSU for stability after reading the first post ... HaloJones proved me right.
Levels of GPU hardware issues:
- NaNs detected on GPU
- GPU/driver resets
- system shutdowns/PSU triggering protections
- smoke
I had a 980 Ti that went through all steps ...
Re: Anyone having trouble with the newest NVIDIA Driver 546.65?
Posted: Sun Jan 28, 2024 10:05 pm
by Beercules48
well, I tried it again after the most recent NVIDIA driver update. to my surprise, it worked. I dunno what happened but I'm happy it works again and my GPU seems to be fine.
Code:
22:01:31:I1::WU89:Completed 5000000 out of 5000000 steps (100%)
22:01:31:I1::WU89:Average performance: 263.415 ns/day
22:01:31:I1::WU89:Checkpoint completed at step 5000000
22:01:36:I1::WU89:Saving result file ..\logfile_01.txt
22:01:36:I1::WU89:Saving result file checkpointIntegrator.xml.bz2
22:01:36:I1::WU89:Saving result file checkpointState.xml.bz2
22:01:36:I1::WU89:Saving result file positions.xtc
22:01:36:I1::WU89:Saving result file science.log
22:01:36:I1::WU89:Folding@home Core Shutdown: FINISHED_UNIT
22:01:37:I1::WU89:Core returned FINISHED_UNIT (100)
22:01:37:I1::Added new work unit: cpus:0 gpus:gpu:45:00:00
22:01:37:I1::WU89:Uploading WU results
....
22:02:18:I1::WU89:Credited
so I'm back in the fold