RTX 2080Ti clWaitForEvents error (driver issue? Boost 4.0?)


flarbear
Posts: 27
Joined: Fri Apr 03, 2020 7:45 am


Post by flarbear »

[Note that these problems were temporarily resolved (for a few days) by reverting to 441.87 but then the errors returned]
[Note also that underclocking helped a lot - basically just enough to cancel out the Boost 4.0 overclock but still running higher than "stock" speeds...]
[Finally got an RMA from nVidia and hoping to return it soon (and verify that the GPU fails in another rig first as well)]

RTX 2080 Ti Founders Edition, stock clocks and configuration, water cooled and never exceeds 55 degrees

I did a fair bit of verifying the cooling with various benchmarks and stress tests and then reinstalled windows from scratch to start using it from a clean slate.
Installed nVidia drivers 445.75 from the nVidia web site
Installed FaH client just yesterday from the FaH web site

My CPU runs WUs just fine when it can get them, but my GPU always gets an error on clWaitForEvents about 10% of the way into the processing. HWInfo verifies that the max GPU temp is 55 degrees which is well under its 85 degree limit.

I reinstalled nVidia 445.75 again just to be sure that a Windows Update hadn't downgraded any of its drivers and the same thing is happening. I have yet to see a WU complete on the GPU which is sad because when it gets one it appears to be a Coronavirus project that is worth a lot, but my configuration fails on it.

Here is the top of the log with my config and setup info:


*********************** Log Started 2020-04-03T07:07:35Z ***********************
07:07:35:************************* Folding@home Client *************************
07:07:35:        Website: https://foldingathome.org/
07:07:35:      Copyright: (c) 2009-2018 foldingathome.org
07:07:35:         Author: Joseph Coffland <[email protected]>
07:07:35:           Args: 
07:07:35:         Config: C:\Users\Flar\AppData\Roaming\FAHClient\config.xml
07:07:35:******************************** Build ********************************
07:07:35:        Version: 7.5.1
07:07:35:           Date: May 11 2018
07:07:35:           Time: 13:06:32
07:07:35:     Repository: Git
07:07:35:       Revision: 4705bf53c635f88b8fe85af7675557e15d491ff0
07:07:35:         Branch: master
07:07:35:       Compiler: Visual C++ 2008
07:07:35:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
07:07:35:       Platform: win32 10
07:07:35:           Bits: 32
07:07:35:           Mode: Release
07:07:35:******************************* System ********************************
07:07:35:            CPU: AMD Ryzen 9 3900X 12-Core Processor
07:07:35:         CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
07:07:35:           CPUs: 24
07:07:35:         Memory: 31.92GiB
07:07:35:    Free Memory: 29.28GiB
07:07:35:        Threads: WINDOWS_THREADS
07:07:35:     OS Version: 6.2
07:07:35:    Has Battery: false
07:07:35:     On Battery: false
07:07:35:     UTC Offset: -7
07:07:35:            PID: 11640
07:07:35:            CWD: C:\Users\Flar\AppData\Roaming\FAHClient
07:07:35:             OS: Windows 10 Enterprise
07:07:35:        OS Arch: AMD64
07:07:35:           GPUs: 1
07:07:35:          GPU 0: Bus:10 Slot:0 Func:0 NVIDIA:8 TU102 [GeForce RTX 2080 Ti Rev.
07:07:35:                 A] M 13448
07:07:35:  CUDA Device 0: Platform:0 Device:0 Bus:10 Slot:0 Compute:7.5 Driver:11.0
07:07:35:OpenCL Device 0: Platform:0 Device:0 Bus:10 Slot:0 Compute:1.2 Driver:445.75
07:07:35:  Win32 Service: false
07:07:35:***********************************************************************
07:07:35:<config>
07:07:35:  <!-- Network -->
07:07:35:  <proxy v=':8080'/>
07:07:35:
07:07:35:  <!-- User Information -->
07:07:35:  <passkey v='********************************'/>
07:07:35:  <team v='227867'/>
07:07:35:  <user v='flarbear'/>
07:07:35:
07:07:35:  <!-- Folding Slots -->
07:07:35:  <slot id='0' type='CPU'/>
07:07:35:  <slot id='1' type='GPU'>
07:07:35:    <paused v='true'/>
07:07:35:  </slot>
07:07:35:</config>
07:07:35:Trying to access database...
07:07:35:Successfully acquired database lock

And here is a section of the log where it got a GPU WU and then failed soon thereafter:


07:24:24:WU01:FS01:Connecting to 65.254.110.245:8080
07:24:24:WU01:FS01:Assigned to work server 140.163.4.241
07:24:24:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:TU102 [GeForce RTX 2080 Ti Rev. A] M 13448 from 140.163.4.241
07:24:24:WU01:FS01:Connecting to 140.163.4.241:8080
07:24:36:WU01:FS01:Downloading 7.93MiB
07:24:42:WU01:FS01:Download 97.79%
07:24:42:WU01:FS01:Download complete
07:24:42:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:11745 run:0 clone:1832 gen:35 core:0x22 unit:0x0000002f8ca304f15e67ef6d4783cbcb
07:24:42:WU01:FS01:Starting
07:24:42:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\Flar\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 01 -suffix 01 -version 705 -lifeline 11640 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
07:24:42:WU01:FS01:Started FahCore on PID 4752
07:24:42:WU01:FS01:Core PID:8548
07:24:42:WU01:FS01:FahCore 0x22 started
07:24:42:WU01:FS01:0x22:*********************** Log Started 2020-04-03T07:24:42Z ***********************
07:24:42:WU01:FS01:0x22:*************************** Core22 Folding@home Core ***************************
07:24:42:WU01:FS01:0x22:       Type: 0x22
07:24:42:WU01:FS01:0x22:       Core: Core22
07:24:42:WU01:FS01:0x22:    Website: https://foldingathome.org/
07:24:42:WU01:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
07:24:42:WU01:FS01:0x22:     Author: John Chodera <[email protected]> and Rafal Wiewiora
07:24:42:WU01:FS01:0x22:             <[email protected]>
07:24:42:WU01:FS01:0x22:       Args: -dir 01 -suffix 01 -version 705 -lifeline 4752 -checkpoint 15
07:24:42:WU01:FS01:0x22:             -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device
07:24:42:WU01:FS01:0x22:             0 -gpu 0
07:24:42:WU01:FS01:0x22:     Config: <none>
07:24:42:WU01:FS01:0x22:************************************ Build *************************************
07:24:42:WU01:FS01:0x22:    Version: 0.0.2
07:24:42:WU01:FS01:0x22:       Date: Dec 6 2019
07:24:42:WU01:FS01:0x22:       Time: 21:30:31
07:24:42:WU01:FS01:0x22: Repository: Git
07:24:42:WU01:FS01:0x22:   Revision: abeb39247cc72df5af0f63723edafadb23d5dfbe
07:24:42:WU01:FS01:0x22:     Branch: HEAD
07:24:42:WU01:FS01:0x22:   Compiler: Visual C++ 2008
07:24:42:WU01:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
07:24:42:WU01:FS01:0x22:   Platform: win32 10
07:24:42:WU01:FS01:0x22:       Bits: 64
07:24:42:WU01:FS01:0x22:       Mode: Release
07:24:42:WU01:FS01:0x22:************************************ System ************************************
07:24:42:WU01:FS01:0x22:        CPU: AMD Ryzen 9 3900X 12-Core Processor
07:24:42:WU01:FS01:0x22:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
07:24:42:WU01:FS01:0x22:       CPUs: 24
07:24:42:WU01:FS01:0x22:     Memory: 31.92GiB
07:24:42:WU01:FS01:0x22:Free Memory: 28.58GiB
07:24:42:WU01:FS01:0x22:    Threads: WINDOWS_THREADS
07:24:42:WU01:FS01:0x22: OS Version: 6.2
07:24:42:WU01:FS01:0x22:Has Battery: false
07:24:42:WU01:FS01:0x22: On Battery: false
07:24:42:WU01:FS01:0x22: UTC Offset: -7
07:24:42:WU01:FS01:0x22:        PID: 8548
07:24:42:WU01:FS01:0x22:        CWD: C:\Users\Flar\AppData\Roaming\FAHClient\work
07:24:42:WU01:FS01:0x22:         OS: Windows 10 Pro
07:24:42:WU01:FS01:0x22:    OS Arch: AMD64
07:24:42:WU01:FS01:0x22:********************************************************************************
07:24:42:WU01:FS01:0x22:Project: 11745 (Run 0, Clone 1832, Gen 35)
07:24:42:WU01:FS01:0x22:Unit: 0x0000002f8ca304f15e67ef6d4783cbcb
07:24:42:WU01:FS01:0x22:Reading tar file core.xml
07:24:42:WU01:FS01:0x22:Reading tar file integrator.xml
07:24:42:WU01:FS01:0x22:Reading tar file state.xml
07:24:43:WU01:FS01:0x22:Reading tar file system.xml
07:24:43:WU01:FS01:0x22:Digital signatures verified
07:24:43:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
07:24:43:WU01:FS01:0x22:Version 0.0.2
07:24:52:WU01:FS01:0x22:Completed 0 out of 1000000 steps (0%)
07:24:52:WU01:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
07:25:20:WU01:FS01:0x22:Completed 10000 out of 1000000 steps (1%)
07:25:47:WU01:FS01:0x22:Completed 20000 out of 1000000 steps (2%)
07:26:15:WU01:FS01:0x22:Completed 30000 out of 1000000 steps (3%)
07:26:43:WU01:FS01:0x22:Completed 40000 out of 1000000 steps (4%)
07:27:10:WU01:FS01:0x22:Completed 50000 out of 1000000 steps (5%)
07:27:40:WU01:FS01:0x22:Completed 60000 out of 1000000 steps (6%)
07:28:08:WU01:FS01:0x22:Completed 70000 out of 1000000 steps (7%)
07:28:36:WU01:FS01:0x22:Completed 80000 out of 1000000 steps (8%)
07:29:03:WU01:FS01:0x22:Completed 90000 out of 1000000 steps (9%)
07:29:31:WU01:FS01:0x22:Completed 100000 out of 1000000 steps (10%)
07:30:02:WU01:FS01:0x22:Completed 110000 out of 1000000 steps (11%)
07:30:11:WU01:FS01:0x22:ERROR:exception: clWaitForEvents
07:30:11:WU01:FS01:0x22:Saving result file ..\logfile_01.txt
07:30:11:WU01:FS01:0x22:Saving result file checkpointState.xml
07:30:13:WU01:FS01:0x22:Saving result file checkpt.crc
07:30:13:WU01:FS01:0x22:Saving result file positions.xtc
07:30:13:WU01:FS01:0x22:Saving result file science.log
07:30:13:WU01:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
07:30:13:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
07:30:13:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:11745 run:0 clone:1832 gen:35 core:0x22 unit:0x0000002f8ca304f15e67ef6d4783cbcb
07:30:13:WU01:FS01:Uploading 7.61MiB to 140.163.4.241
07:30:13:WU01:FS01:Connecting to 140.163.4.241:8080
07:30:14:WU02:FS01:Connecting to 65.254.110.245:8080
07:30:14:WARNING:WU02:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
07:30:14:WU02:FS01:Connecting to 18.218.241.186:80
07:30:14:WARNING:WU02:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
07:30:14:ERROR:WU02:FS01:Exception: Could not get an assignment
07:30:15:WU02:FS01:Connecting to 65.254.110.245:8080
07:30:15:WARNING:WU02:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
07:30:15:WU02:FS01:Connecting to 18.218.241.186:80
07:30:15:WU02:FS01:Assigned to work server 40.114.52.201
07:30:15:WU02:FS01:Requesting new work unit for slot 01: READY gpu:0:TU102 [GeForce RTX 2080 Ti Rev. A] M 13448 from 40.114.52.201
07:30:15:WU02:FS01:Connecting to 40.114.52.201:8080
07:30:19:WU01:FS01:Upload 30.39%
07:30:25:WU01:FS01:Upload 73.10%
07:30:31:WU01:FS01:Upload complete
07:30:31:WU01:FS01:Server responded WORK_ACK (400)
07:30:31:WU01:FS01:Cleaning up
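
For anyone comparing failures across many WUs, a minimal sketch of how a log like the one above could be summarized: it records the last progress line seen for each work unit and reports how far the unit got when the core logged an ERROR. The regular expressions are assumptions based only on the line layout shown in this log, and the function name `summarize_failures` is made up for illustration; this is not part of the FaH client.

```python
import re

# Core progress lines, e.g.
# "07:30:02:WU01:FS01:0x22:Completed 110000 out of 1000000 steps (11%)"
PROGRESS = re.compile(
    r"^\d\d:\d\d:\d\d:(WU\d+):(FS\d+):0x\w+:Completed (\d+) out of (\d+) steps"
)
# Core error lines, e.g.
# "07:30:11:WU01:FS01:0x22:ERROR:exception: clWaitForEvents"
ERROR = re.compile(r"^(\d\d:\d\d:\d\d):(WU\d+):(FS\d+):0x\w+:ERROR:(.*)$")


def summarize_failures(lines):
    """Return one dict per core ERROR line, with the last progress seen for that WU."""
    last_progress = {}  # (wu, slot) -> (steps_done, steps_total)
    failures = []
    for line in lines:
        line = line.strip()
        m = PROGRESS.match(line)
        if m:
            wu, slot, done, total = m.groups()
            last_progress[(wu, slot)] = (int(done), int(total))
            continue
        m = ERROR.match(line)
        if m:
            when, wu, slot, message = m.groups()
            done, total = last_progress.get((wu, slot), (0, 0))
            percent = 100.0 * done / total if total else 0.0
            failures.append({
                "time": when,
                "wu": wu,
                "slot": slot,
                "error": message.strip(),
                "percent": percent,
            })
    return failures


# Three lines copied from the log above.
sample = [
    "07:29:31:WU01:FS01:0x22:Completed 100000 out of 1000000 steps (10%)",
    "07:30:02:WU01:FS01:0x22:Completed 110000 out of 1000000 steps (11%)",
    "07:30:11:WU01:FS01:0x22:ERROR:exception: clWaitForEvents",
]
result = summarize_failures(sample)
for failure in result:
    print(failure)
```

Client-level errors such as "Could not get an assignment" are deliberately not matched, because they are not prefixed with the core ID (0x22) and do not indicate a failed computation.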
Last edited by flarbear on Sat Apr 18, 2020 4:00 am, edited 5 times in total.
Roger.Weihrauch
Posts: 3
Joined: Fri Apr 03, 2020 8:04 am
Hardware configuration: GPU: 2x RX580, 1x GTX1050Ti
CPU: AMD Threadripper 1900X
Board: Asus ROG Zenith Extreme
RAM: 128 GB Kingston Value
Location: Heerbrugg, Switzerland

Re: RTX 2080Ti always gets clWaitForEvents error on every WU

Post by Roger.Weihrauch »

Hello flarbear
To me, the source of the error you describe is this message/result of the WU:
07:30:13:WU01:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
Possible reasons for this:
1st) An incompletely downloaded WU (but the log shows: 07:24:42:WU01:FS01:Download complete)
2nd) HW/mainboard settings for PCIe/GPU (Did you overclock something other than the GPU, such as CPU or PCIe settings?)
3rd) The GPU driver, since the 'cl...' prefix relates to OpenCL, which FaH uses afaik. Are you using the latest one?
4th) I also tried MSI Afterburner for some 'tuning', but it was a mess: no more windows in Win10, dark screen after booting, ...
So I do not recommend any overclocking, including with additional tools that help you do it.

Regards,
Roger
If there is Help needed, one should provide it.
uyaem
Posts: 219
Joined: Sat Mar 21, 2020 7:35 pm
Location: Esslingen, Germany

Re: RTX 2080Ti always gets clWaitForEvents error on every WU

Post by uyaem »

Have you ever run Furmark/ stress test (or similar) to see if your card is working fine?
CPU: Ryzen 9 3900X (1x21 CPUs) ~ GPU: nVidia GeForce GTX 1660 Super (Asus)
flarbear
Posts: 27
Joined: Fri Apr 03, 2020 7:45 am

Re: RTX 2080Ti always gets clWaitForEvents error on every WU

Post by flarbear »

Thanks for the requests for clarification everyone! In order:

To Roger:
1) As you said, the download complete message is there
2) No settings for PCIe or GPU changed, all I did was to enable (AMD equivalent of) XMP for the memory to read the stock rating for the CPU RAM. I also initialized the fan curves for the cooling and the GPU temp is way below thermal throttling.
3) As I said above, I'm using nVidia 445.75 which was the latest as of when I reinstalled everything a few days ago. It is still the latest, and I reinstalled it again last night.
4) I didn't use MSI Afterburner or any overclocking software. I've played a few hours of gaming with absolutely no glitches.

(I have been reading about people who claimed there were memory problems with the 2080Ti and noting that many GPU memory benchmarking tools were noticing that the GPU memory stress tests would down-clock their memory by 200MHz - that's an underclock, not an overclock. I've thought about underclocking the RAM to see if the problems go away, but I haven't gotten there yet. Every one of those people had early 2080s with Micron memory, but my card was a more recent batch from this fall which has Samsung memory.)

To uyaem:
I ran Furmark for quite a while, Blender for its full run of more than half an hour, and Unigine Heaven for about 40 minutes, and the temps barely broke 70C (out of an 85C thermal max). FaH only runs the WUs for a short time: the longest one lasted about 10 minutes before it gave up, and the temps were under 55C. I've also owned this card for several months now and played games with it for quite some time with no issues. I only recently swapped its stock cooler for the water block and moved it to the new build, but I've been using it in the new build for about a month while I was doing the thermal testing and making sure all of my daily activities worked just fine before I wiped the drive and reinstalled Windows just a few days ago. I only installed FaH the day before yesterday, after I got it all set up for my daily activities.
flarbear
Posts: 27
Joined: Fri Apr 03, 2020 7:45 am

Re: RTX 2080Ti always gets clWaitForEvents error on every WU

Post by flarbear »

Some more investigations from late last night and this morning.

Reading up on the FaH web site, they recommend running memtestCL to verify your "overclocks" (mine isn't overclocked).

The first couple of times I ran it, it got an Out Of Resources error that made it quit early. This was right after I had seen a couple of WUs fail in FaH, but I had the GPU instance on Pause at the time I ran the memtestCL tests. I started to notice that my Windows menus were not responding, though, which made me suspect that my system had gotten into a bad state (for the record, I've never seen that problem with the Windows menu either when the card was installed in the previous system for months, or in the month of testing this new build, or in the week since I wiped and reinstalled Windows), so I rebooted and the system seemed fine again.

At that point the memtestCL ran just fine on short runs, but I started doing things like opening Chrome windows while it was running and oddly the test seemed to get errors when I was doing something with the GPU in another thread/process (note that the memtestCL was only using 128M out of an 11GB card so resources shouldn't have been an issue). At this point I noticed that the memtestCL tasks were taking about a second each to run, except if I moved the mouse then they completed in milliseconds. What? This was like in the old Windows XP days where you could wave your mouse to break wait states, but why would it be affecting memtestCL?

I also noticed that it could accumulate a number of errors in the "Random blocks" test that were not fatal. It looks like this is a long-standing bug in the test that is discussed in a large number of threads - the random test is broken somehow. I looked briefly at the code and saw that it seeds the random data sets in both the fill and the verify phase with the same seed so that it can repeat the randomness between the two phases. But it then parallelizes both tasks and has each separate test thread use a variant of the seed that depends on its "threadID". Hmm, this assumes that the parallelization of generating the data and of verifying the results happens identically, but I'm not sure it can assume that...? It sounds like the random test is buggy, so I am not bothered by the errors there. Also, I noticed that the errors often correlated with doing things with the GPU like moving windows around. Grab a window and drag and the "random" errors accumulated.
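
To make that concern concrete, here is a small CPU-only sketch. It is not memtestCL's code, and every name in it (`fill`, `verify`, `block_of_worker`) is made up for illustration. It assumes a test that seeds each block's random data from a base seed plus a worker ID, then regenerates the same stream to verify. If the worker-to-block assignment differs between the two phases, healthy memory still reports "errors".

```python
import random


def fill(buffer, block_size, base_seed):
    """Fill each block with pseudo-random bytes seeded by base_seed + block index."""
    for block in range(len(buffer) // block_size):
        rng = random.Random(base_seed + block)
        start = block * block_size
        for i in range(start, start + block_size):
            buffer[i] = rng.randrange(256)


def verify(buffer, block_size, base_seed, block_of_worker=None):
    """Regenerate the expected bytes and count mismatches.

    block_of_worker[w] is the block that worker w checks. The default is the
    identity mapping, i.e. the same assignment that fill() used.
    """
    n_blocks = len(buffer) // block_size
    mapping = block_of_worker or list(range(n_blocks))
    errors = 0
    for worker, block in enumerate(mapping):
        rng = random.Random(base_seed + worker)  # seed depends on the worker ID
        start = block * block_size
        for i in range(start, start + block_size):
            if buffer[i] != rng.randrange(256):
                errors += 1
    return errors


memory = bytearray(4 * 64)  # four blocks of 64 bytes, standing in for GPU RAM
fill(memory, 64, base_seed=1234)

same_assignment = verify(memory, 64, base_seed=1234)
swapped_assignment = verify(memory, 64, base_seed=1234, block_of_worker=[1, 0, 2, 3])
print(same_assignment, swapped_assignment)
```

With the same assignment the count is zero. With workers 0 and 1 swapped, the count is nonzero even though no byte in the buffer was corrupted. Whether memtestCL can actually hit this situation on a GPU is a guess, not something verified here.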

At this point, I'm wondering if there is something about the drivers that improperly protects the OpenCL library against multi-threaded access.

Is anyone else using nVidia 445.75 drivers on FaH without problems? These drivers were just released on March 23rd...
ipkh
Posts: 173
Joined: Thu Jul 16, 2015 2:03 pm

Re: RTX 2080Ti always gets clWaitForEvents error on every WU

Post by ipkh »

XMP is an overclock to memory and thus warrants testing.
Your testing is revealing abnormal behavior, so I recommend returning the memory to stock and retesting to eliminate one potential problem.
HaloJones
Posts: 906
Joined: Thu Jul 24, 2008 10:16 am

Re: RTX 2080Ti always gets clWaitForEvents error on every WU

Post by HaloJones »

Hard to be definitive but if it's doing this on multiple units, it does point to your end. You're water-cooled so it's highly unlikely to be temps. I'd look at the software end of things. Remove the FAH client and all associated directories.

Run DDU to de-install all the graphics drivers.

Download the latest NVidia drivers and re-install.
Reboot
Download MSI Afterburner and maybe set the power limit to 90% or so
Download FAH and install.
Delete the CPU slot and ensure you have a single GPU slot.
See if you can fold a GPU unit
single 1070

Narcil
Posts: 2
Joined: Fri Apr 03, 2020 6:29 pm
Hardware configuration: 6600K + 1080FE

Re: RTX 2080Ti always gets clWaitForEvents error on every WU

Post by Narcil »

Try the hotfix driver 445.78, released soon after 445.75. The .75 release was causing problems with older games because of OpenCL issues.
I'm folding with .78 without issues albeit on a 1080.
Last edited by Narcil on Fri Apr 03, 2020 7:14 pm, edited 3 times in total.
flarbear
Posts: 27
Joined: Fri Apr 03, 2020 7:45 am

Re: RTX 2080Ti always gets clWaitForEvents error on every WU

Post by flarbear »

To ipkh:

While XMP is technically an overclock, I did this on the first day when I built the system over a month ago and have never had any problems with any of the many CPU stress tests I have done to vet the system (prime, blender, cinebench, etc). Also, I only chose the base profile that sets the rated timings for the RAM I bought listed on the package and they are well within the timings that my Ryzen 3900x is capable of. These timings have been well tested prior to folding.

Also, XMP only affects CPU RAM, not the GPU. The FaH CPU WUs are running just fine. Only the GPU WUs are failing. This is a GPU problem and the GPU has no overclocks whatsoever, not on memory or processing - all stock.

To HaloJones:

I'll look into that. I think I might just downgrade the drivers to an earlier nVidia driver set to see how that affects things as a first step. Is there any tracking of which nVidia releases people are running FaH on? It would be nice if the FaH database would publish which OpenCL driver versions are returning correct results so we could know which version of the drivers to try out if we have a problem.

Meanwhile, after seeing all of the complaints about memtestCL giving errors randomly (on the "random" test for irony), and the fact that it is literally a decade since it was last updated, I found another "overclock tester" (OCCT v5.5.5) that seems to be more actively developed and has a GPU memtest and it is running just fine for >30 minutes now with no errors detected. I'll leave it running for an hour or two and see what it turns up. On the flip side, though, it is just a general memory test and doesn't necessarily do any OpenCL work so this is just telling me if the GPU RAM chips are running fine.

It occurs to me that while I may have been doing a bunch of gaming, it is not clear if the games are using OpenCL so I may not have tested OpenCL before running FaH. I did run Blender which should have had an OpenCL mode, but I will double-check on that. Are there standard OpenCL benchmarks or stress testers that people prefer?
flarbear
Posts: 27
Joined: Fri Apr 03, 2020 7:45 am

Re: RTX 2080Ti always gets clWaitForEvents error on every WU

Post by flarbear »

Thanks Narcil, I tried installing 445.78 over 445.75 and had to wait a bit for a GPU WU, but it also failed with a clWaitForEvents error as well. When I have time later this evening I'll try a clean install with DDU to see if that helps. The nVidia control panel listed all components as being from 445.78, though.

Meanwhile I also ran the luxmark OpenCL benchmarks and all returned valid images and then ran the stress test on that for over an hour with no errors. I'll look for more OpenCL benchmarks and stress tests later this evening.
flarbear
Posts: 27
Joined: Fri Apr 03, 2020 7:45 am

Re: RTX 2080Ti always gets clWaitForEvents error on every WU

Post by flarbear »

I just did a full uninstall of FaH and then did a full DDU in safe mode and reinstalled 445.78 without network connectivity. Reconnected network and redownloaded the FaH 7.5.1 client and installed it.

Now things are much worse. I get the clWaitForEvents error almost immediately on every single GPU WU. Before, it might do 10% of the work and waste 10 minutes of time and energy before that happened.

So, a clean install made it 100% worse...?
ipkh
Posts: 173
Joined: Thu Jul 16, 2015 2:03 pm

Re: RTX 2080Ti always gets clWaitForEvents error on every WU

Post by ipkh »

There is literally no test you can run that can prove stability in another use.
Something is obviously wrong with your system. Could be a degraded RAM module, CPU, CPU memory controller or something with the GPU. You will need to remove all variables to figure out what's broken.
Start by removing PBO and RAM XMP. Test and see if it still happens. That will point towards a cause.
pavelanni
Posts: 2
Joined: Fri Apr 03, 2020 11:54 pm

Re: RTX 2080Ti always gets clWaitForEvents error on every WU

Post by pavelanni »

I'm seeing _exactly_ the same set of errors. I'm running a Linux Mint 19.3 box with 440 drivers and a 1080Ti card. The only common thing is that both cards are Nvidia.
flarbear
Posts: 27
Joined: Fri Apr 03, 2020 7:45 am

Re: RTX 2080Ti always gets clWaitForEvents error on every WU

Post by flarbear »

To rule out the possibilities of a configuration issue, I just did a full BIOS reset of all parameters and set all fans and the pump to maximum. The memory is running at the Least Common Denominator speed of 2660MHz. Results:
  • The CPU is about 3 degrees cooler running a Core A7 job than it was previously.
  • The GPU still gets a clWaitForEvents error immediately on starting a new job.
For full disclosure of previous settings, PBO was always at its default setting which lets the processor manage all clocks and power distribution. XMP was set to use the standard (non-extreme) profile1 which is the base rating of the memory.

I agree that no single test can predict the outcome of any other test. But when many tests, each designed to stress the system in a different use case, all show no problems over repeated runs, that tends to reduce the probability of any other test showing an issue in the target system. It is true that it doesn't "prove" anything, but Occam's razor begins to suggest that there are other points of failure that are more likely.

The only thing that has made any difference so far is reinstalling the video drivers and the FaH client after a system cleanse, with no other parameters varied. That took the system from a state where all WUs would get the error after 5-10% completion to a new state where all WUs get the error exactly at 0%. It literally prints the "0 out of NNNNN" message and the GPU requirements message, and then immediately gets the error.

If a reinstall can have such a large effect it sounds like the software is having an interesting and probably unintended interaction.

I'd still like to get a read on which drivers are most likely to produce a completed WU...
flarbear
Posts: 27
Joined: Fri Apr 03, 2020 7:45 am

Re: RTX 2080Ti always gets clWaitForEvents error on every WU

Post by flarbear »

HaloJones wrote:Hard to be definitive but if it's doing this on multiple units, it does point to your end. You're water-cooled so it's highly unlikely to be temps. I'd look at the software end of things. Remove the FAH client and all associated directories.

Run DDU to de-install all the graphics drivers.

Download the latest NVidia drivers and re-install.
Reboot
Download MSI Afterburner and maybe set the power limit to 90% or so
Download FAH and install.
Delete the CPU slot and ensure you have a single GPU slot.
See if you can fold a GPU unit

Following up on this suggestion...

I've done the FaH uninstall and DDU/network-free reinstall of 445.78 and it made the GPU problem worse (as mentioned above - clWaitForEvents errors are now immediate).
I then deleted the CPU slot as you suggested and all I have now is a GPU slot and it is still having the same error.