RTX 2080Ti clWaitForEvents error (driver issue? Boost 4.0?)
Posted: Fri Apr 03, 2020 7:58 am
[Note: reverting to driver 441.87 resolved these problems for a few days, but then the errors returned.]
[Note also that underclocking helped a lot: just enough to cancel out the Boost 4.0 overclock, so the card still runs above "stock" speeds.]
[I finally got an RMA from nVidia and hope to return the card soon, after first verifying that the GPU also fails in another rig.]
RTX 2080 Ti Founders Edition at stock clocks and configuration, water cooled; it never exceeds 55 °C.
I spent a fair while verifying the cooling with various benchmarks and stress tests, then reinstalled Windows from scratch to start from a clean slate.
Installed nVidia driver 445.75 from the nVidia web site.
Installed the FaH client just yesterday from the FaH web site.
My CPU runs WUs just fine when it can get them, but my GPU always fails with a clWaitForEvents error about 10% of the way into processing. HWiNFO confirms that the maximum GPU temperature is 55 °C, well under its 85 °C limit.
I reinstalled nVidia 445.75 again, just to be sure that Windows Update hadn't downgraded any of its drivers, and the same thing is happening. I have yet to see a WU complete on the GPU, which is sad: when it does get one, it appears to be a high-value Coronavirus project, but my configuration fails on it.
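Since the maximum temperature alone does not explain the failures, it may help to log clocks and power draw next to temperature and see what the card was doing when a WU died. The following is only a rough sketch, assuming nvidia-smi (installed with the driver) is on the PATH; the helper names parse_sample and read_sample are made up for illustration and are not part of any FaH or nVidia tool.

```python
import subprocess

# Fields passed to nvidia-smi's --query-gpu option.
FIELDS = "timestamp,temperature.gpu,clocks.sm,clocks.mem,power.draw"


def parse_sample(csv_line):
    """Split one 'csv,noheader,nounits' line into a dict keyed by field name."""
    values = [v.strip() for v in csv_line.split(",")]
    names = FIELDS.split(",")
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} fields, got {len(values)}")
    return dict(zip(names, values))


def read_sample():
    """Run nvidia-smi once and return the parsed reading for the first GPU."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_sample(out.splitlines()[0])
```

Calling read_sample() every second or so and appending the result to a file would give a trace to line up against the FaH log. Note that the FaH log timestamps are UTC (the log reports a UTC offset of -7 for this machine), while the nvidia-smi timestamp appears to be local time, so the two need to be shifted before comparing.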
Here is the top of the log with my config and setup info, followed by a section of the log where it got a GPU WU and then failed soon thereafter:
Code:
*********************** Log Started 2020-04-03T07:07:35Z ***********************
07:07:35:************************* Folding@home Client *************************
07:07:35: Website: https://foldingathome.org/
07:07:35: Copyright: (c) 2009-2018 foldingathome.org
07:07:35: Author: Joseph Coffland <[email protected]>
07:07:35: Args:
07:07:35: Config: C:\Users\Flar\AppData\Roaming\FAHClient\config.xml
07:07:35:******************************** Build ********************************
07:07:35: Version: 7.5.1
07:07:35: Date: May 11 2018
07:07:35: Time: 13:06:32
07:07:35: Repository: Git
07:07:35: Revision: 4705bf53c635f88b8fe85af7675557e15d491ff0
07:07:35: Branch: master
07:07:35: Compiler: Visual C++ 2008
07:07:35: Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
07:07:35: Platform: win32 10
07:07:35: Bits: 32
07:07:35: Mode: Release
07:07:35:******************************* System ********************************
07:07:35: CPU: AMD Ryzen 9 3900X 12-Core Processor
07:07:35: CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
07:07:35: CPUs: 24
07:07:35: Memory: 31.92GiB
07:07:35: Free Memory: 29.28GiB
07:07:35: Threads: WINDOWS_THREADS
07:07:35: OS Version: 6.2
07:07:35: Has Battery: false
07:07:35: On Battery: false
07:07:35: UTC Offset: -7
07:07:35: PID: 11640
07:07:35: CWD: C:\Users\Flar\AppData\Roaming\FAHClient
07:07:35: OS: Windows 10 Enterprise
07:07:35: OS Arch: AMD64
07:07:35: GPUs: 1
07:07:35: GPU 0: Bus:10 Slot:0 Func:0 NVIDIA:8 TU102 [GeForce RTX 2080 Ti Rev.
07:07:35: A] M 13448
07:07:35: CUDA Device 0: Platform:0 Device:0 Bus:10 Slot:0 Compute:7.5 Driver:11.0
07:07:35:OpenCL Device 0: Platform:0 Device:0 Bus:10 Slot:0 Compute:1.2 Driver:445.75
07:07:35: Win32 Service: false
07:07:35:***********************************************************************
07:07:35:<config>
07:07:35: <!-- Network -->
07:07:35: <proxy v=':8080'/>
07:07:35:
07:07:35: <!-- User Information -->
07:07:35: <passkey v='********************************'/>
07:07:35: <team v='227867'/>
07:07:35: <user v='flarbear'/>
07:07:35:
07:07:35: <!-- Folding Slots -->
07:07:35: <slot id='0' type='CPU'/>
07:07:35: <slot id='1' type='GPU'>
07:07:35: <paused v='true'/>
07:07:35: </slot>
07:07:35:</config>
07:07:35:Trying to access database...
07:07:35:Successfully acquired database lock
Code:
07:24:24:WU01:FS01:Connecting to 65.254.110.245:8080
07:24:24:WU01:FS01:Assigned to work server 140.163.4.241
07:24:24:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:TU102 [GeForce RTX 2080 Ti Rev. A] M 13448 from 140.163.4.241
07:24:24:WU01:FS01:Connecting to 140.163.4.241:8080
07:24:36:WU01:FS01:Downloading 7.93MiB
07:24:42:WU01:FS01:Download 97.79%
07:24:42:WU01:FS01:Download complete
07:24:42:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:11745 run:0 clone:1832 gen:35 core:0x22 unit:0x0000002f8ca304f15e67ef6d4783cbcb
07:24:42:WU01:FS01:Starting
07:24:42:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\Flar\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 01 -suffix 01 -version 705 -lifeline 11640 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
07:24:42:WU01:FS01:Started FahCore on PID 4752
07:24:42:WU01:FS01:Core PID:8548
07:24:42:WU01:FS01:FahCore 0x22 started
07:24:42:WU01:FS01:0x22:*********************** Log Started 2020-04-03T07:24:42Z ***********************
07:24:42:WU01:FS01:0x22:*************************** Core22 Folding@home Core ***************************
07:24:42:WU01:FS01:0x22: Type: 0x22
07:24:42:WU01:FS01:0x22: Core: Core22
07:24:42:WU01:FS01:0x22: Website: https://foldingathome.org/
07:24:42:WU01:FS01:0x22: Copyright: (c) 2009-2018 foldingathome.org
07:24:42:WU01:FS01:0x22: Author: John Chodera <[email protected]> and Rafal Wiewiora
07:24:42:WU01:FS01:0x22: <[email protected]>
07:24:42:WU01:FS01:0x22: Args: -dir 01 -suffix 01 -version 705 -lifeline 4752 -checkpoint 15
07:24:42:WU01:FS01:0x22: -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device
07:24:42:WU01:FS01:0x22: 0 -gpu 0
07:24:42:WU01:FS01:0x22: Config: <none>
07:24:42:WU01:FS01:0x22:************************************ Build *************************************
07:24:42:WU01:FS01:0x22: Version: 0.0.2
07:24:42:WU01:FS01:0x22: Date: Dec 6 2019
07:24:42:WU01:FS01:0x22: Time: 21:30:31
07:24:42:WU01:FS01:0x22: Repository: Git
07:24:42:WU01:FS01:0x22: Revision: abeb39247cc72df5af0f63723edafadb23d5dfbe
07:24:42:WU01:FS01:0x22: Branch: HEAD
07:24:42:WU01:FS01:0x22: Compiler: Visual C++ 2008
07:24:42:WU01:FS01:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
07:24:42:WU01:FS01:0x22: Platform: win32 10
07:24:42:WU01:FS01:0x22: Bits: 64
07:24:42:WU01:FS01:0x22: Mode: Release
07:24:42:WU01:FS01:0x22:************************************ System ************************************
07:24:42:WU01:FS01:0x22: CPU: AMD Ryzen 9 3900X 12-Core Processor
07:24:42:WU01:FS01:0x22: CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
07:24:42:WU01:FS01:0x22: CPUs: 24
07:24:42:WU01:FS01:0x22: Memory: 31.92GiB
07:24:42:WU01:FS01:0x22:Free Memory: 28.58GiB
07:24:42:WU01:FS01:0x22: Threads: WINDOWS_THREADS
07:24:42:WU01:FS01:0x22: OS Version: 6.2
07:24:42:WU01:FS01:0x22:Has Battery: false
07:24:42:WU01:FS01:0x22: On Battery: false
07:24:42:WU01:FS01:0x22: UTC Offset: -7
07:24:42:WU01:FS01:0x22: PID: 8548
07:24:42:WU01:FS01:0x22: CWD: C:\Users\Flar\AppData\Roaming\FAHClient\work
07:24:42:WU01:FS01:0x22: OS: Windows 10 Pro
07:24:42:WU01:FS01:0x22: OS Arch: AMD64
07:24:42:WU01:FS01:0x22:********************************************************************************
07:24:42:WU01:FS01:0x22:Project: 11745 (Run 0, Clone 1832, Gen 35)
07:24:42:WU01:FS01:0x22:Unit: 0x0000002f8ca304f15e67ef6d4783cbcb
07:24:42:WU01:FS01:0x22:Reading tar file core.xml
07:24:42:WU01:FS01:0x22:Reading tar file integrator.xml
07:24:42:WU01:FS01:0x22:Reading tar file state.xml
07:24:43:WU01:FS01:0x22:Reading tar file system.xml
07:24:43:WU01:FS01:0x22:Digital signatures verified
07:24:43:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
07:24:43:WU01:FS01:0x22:Version 0.0.2
07:24:52:WU01:FS01:0x22:Completed 0 out of 1000000 steps (0%)
07:24:52:WU01:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
07:25:20:WU01:FS01:0x22:Completed 10000 out of 1000000 steps (1%)
07:25:47:WU01:FS01:0x22:Completed 20000 out of 1000000 steps (2%)
07:26:15:WU01:FS01:0x22:Completed 30000 out of 1000000 steps (3%)
07:26:43:WU01:FS01:0x22:Completed 40000 out of 1000000 steps (4%)
07:27:10:WU01:FS01:0x22:Completed 50000 out of 1000000 steps (5%)
07:27:40:WU01:FS01:0x22:Completed 60000 out of 1000000 steps (6%)
07:28:08:WU01:FS01:0x22:Completed 70000 out of 1000000 steps (7%)
07:28:36:WU01:FS01:0x22:Completed 80000 out of 1000000 steps (8%)
07:29:03:WU01:FS01:0x22:Completed 90000 out of 1000000 steps (9%)
07:29:31:WU01:FS01:0x22:Completed 100000 out of 1000000 steps (10%)
07:30:02:WU01:FS01:0x22:Completed 110000 out of 1000000 steps (11%)
07:30:11:WU01:FS01:0x22:ERROR:exception: clWaitForEvents
07:30:11:WU01:FS01:0x22:Saving result file ..\logfile_01.txt
07:30:11:WU01:FS01:0x22:Saving result file checkpointState.xml
07:30:13:WU01:FS01:0x22:Saving result file checkpt.crc
07:30:13:WU01:FS01:0x22:Saving result file positions.xtc
07:30:13:WU01:FS01:0x22:Saving result file science.log
07:30:13:WU01:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
07:30:13:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
07:30:13:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:11745 run:0 clone:1832 gen:35 core:0x22 unit:0x0000002f8ca304f15e67ef6d4783cbcb
07:30:13:WU01:FS01:Uploading 7.61MiB to 140.163.4.241
07:30:13:WU01:FS01:Connecting to 140.163.4.241:8080
07:30:14:WU02:FS01:Connecting to 65.254.110.245:8080
07:30:14:WARNING:WU02:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
07:30:14:WU02:FS01:Connecting to 18.218.241.186:80
07:30:14:WARNING:WU02:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
07:30:14:ERROR:WU02:FS01:Exception: Could not get an assignment
07:30:15:WU02:FS01:Connecting to 65.254.110.245:8080
07:30:15:WARNING:WU02:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
07:30:15:WU02:FS01:Connecting to 18.218.241.186:80
07:30:15:WU02:FS01:Assigned to work server 40.114.52.201
07:30:15:WU02:FS01:Requesting new work unit for slot 01: READY gpu:0:TU102 [GeForce RTX 2080 Ti Rev. A] M 13448 from 40.114.52.201
07:30:15:WU02:FS01:Connecting to 40.114.52.201:8080
07:30:19:WU01:FS01:Upload 30.39%
07:30:25:WU01:FS01:Upload 73.10%
07:30:31:WU01:FS01:Upload complete
07:30:31:WU01:FS01:Server responded WORK_ACK (400)
07:30:31:WU01:FS01:Cleaning up
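To check whether the failures really cluster around the same point in each WU, the log can be scanned for core errors and the last progress line seen before each one. This is a minimal sketch that assumes the line format shown in the log above; the function name find_failures is made up for illustration. It deliberately ignores client-level errors such as "Could not get an assignment", which do not carry a core prefix like 0x22.

```python
import re

# Core progress lines, e.g.
# "07:30:02:WU01:FS01:0x22:Completed 110000 out of 1000000 steps (11%)"
PROGRESS = re.compile(
    r"^(\d\d:\d\d:\d\d):(WU\d+):(FS\d+):0x[0-9a-f]+:Completed (\d+) out of (\d+) steps"
)
# Core error lines, e.g.
# "07:30:11:WU01:FS01:0x22:ERROR:exception: clWaitForEvents"
ERROR = re.compile(r"^(\d\d:\d\d:\d\d):(WU\d+):(FS\d+):0x[0-9a-f]+:ERROR:(.*)$")


def find_failures(lines):
    """Return one dict per core error, with the last progress seen for that WU."""
    last_progress = {}  # (wu, slot) -> (steps done, total steps)
    failures = []
    for line in lines:
        line = line.strip()
        m = PROGRESS.match(line)
        if m:
            _, wu, slot, done, total = m.groups()
            last_progress[(wu, slot)] = (int(done), int(total))
            continue
        m = ERROR.match(line)
        if m:
            when, wu, slot, message = m.groups()
            done, total = last_progress.get((wu, slot), (0, 0))
            failures.append({
                "time": when,
                "wu": wu,
                "slot": slot,
                "error": message.strip(),
                "steps_done": done,
                "steps_total": total,
            })
    return failures


# Three lines copied from the log above.
sample = [
    "07:29:31:WU01:FS01:0x22:Completed 100000 out of 1000000 steps (10%)",
    "07:30:02:WU01:FS01:0x22:Completed 110000 out of 1000000 steps (11%)",
    "07:30:11:WU01:FS01:0x22:ERROR:exception: clWaitForEvents",
]
for failure in find_failures(sample):
    print(failure)
```

Running this over a full log (for example the client's log.txt in the FAHClient data directory, assuming the default layout) would show whether every failure lands near the same step count or whether the failure point drifts, which could help tell a problem with a particular WU apart from a hardware or driver problem. In the section above, the error arrives 9 seconds after the 11% progress line.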