Problem with 1080 Ti BAD_WORK_UNIT or Bad GPU?

Posted: Thu Apr 02, 2020 7:28 pm
by Jertzuu
So, I have had this problem for almost 2 weeks. My GPU is unable to finish the projects assigned to it. I did manage to finish 2 or 3 projects successfully when I started folding 2 weeks ago.

The GPU is not overclocked at the moment; if anything, it is underclocked. I have tried everything I could find on the forum: reinstalled the client, reinstalled the graphics drivers directly from NVIDIA, changed the GPU core index from -1 to 1 and back to -1 again, lowered the GPU clocks, lowered and raised the power limit, and raised the core voltage.

I'm really starting to run out of options and knowledge about the problem. Is there anything else I could try to make it work?

The GPU is a Zotac AMP! Extreme GTX 1080 Ti if that is of any help.

Could this just be bad luck in the silicon lottery, a card that is simply not capable of folding? I have seen a few other instances on the forum, but I highly doubt it, since I was able to finish a few of my projects.

If there is any additional information needed, I am more than happy to try and provide it.

According to GPU-Z, I have OpenCL enabled.

Code:

18:33:43:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:11751 run:0 clone:1753 gen:5 core:0x22 unit:0x0000000b8ca304e75e6a8042cbeba995
18:33:43:WU00:FS01:Starting
18:33:43:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\jeret\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 705 -lifeline 19108 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
18:33:43:WU00:FS01:Started FahCore on PID 25732
18:33:43:WU00:FS01:Core PID:48232
18:33:43:WU00:FS01:FahCore 0x22 started
18:33:43:WU00:FS01:0x22:*********************** Log Started 2020-04-02T18:33:43Z ***********************
18:33:43:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
18:33:43:WU00:FS01:0x22:       Type: 0x22
18:33:43:WU00:FS01:0x22:       Core: Core22
18:33:43:WU00:FS01:0x22:    Website: https://foldingathome.org/
18:33:43:WU00:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
18:33:43:WU00:FS01:0x22:     Author: John Chodera <[email protected]> and Rafal Wiewiora
18:33:43:WU00:FS01:0x22:             <[email protected]>
18:33:43:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 705 -lifeline 25732 -checkpoint 15
18:33:43:WU00:FS01:0x22:             -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device
18:33:43:WU00:FS01:0x22:             0 -gpu 0
18:33:43:WU00:FS01:0x22:     Config: <none>
18:33:43:WU00:FS01:0x22:************************************ Build *************************************
18:33:43:WU00:FS01:0x22:    Version: 0.0.2
18:33:43:WU00:FS01:0x22:       Date: Dec 6 2019
18:33:43:WU00:FS01:0x22:       Time: 21:30:31
18:33:43:WU00:FS01:0x22: Repository: Git
18:33:43:WU00:FS01:0x22:   Revision: abeb39247cc72df5af0f63723edafadb23d5dfbe
18:33:43:WU00:FS01:0x22:     Branch: HEAD
18:33:43:WU00:FS01:0x22:   Compiler: Visual C++ 2008
18:33:43:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
18:33:43:WU00:FS01:0x22:   Platform: win32 10
18:33:43:WU00:FS01:0x22:       Bits: 64
18:33:43:WU00:FS01:0x22:       Mode: Release
18:33:43:WU00:FS01:0x22:************************************ System ************************************
18:33:43:WU00:FS01:0x22:        CPU: AMD Ryzen 5 2600 Six-Core Processor
18:33:43:WU00:FS01:0x22:     CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
18:33:43:WU00:FS01:0x22:       CPUs: 12
18:33:43:WU00:FS01:0x22:     Memory: 31.92GiB
18:33:43:WU00:FS01:0x22:Free Memory: 24.05GiB
18:33:43:WU00:FS01:0x22:    Threads: WINDOWS_THREADS
18:33:43:WU00:FS01:0x22: OS Version: 6.2
18:33:43:WU00:FS01:0x22:Has Battery: false
18:33:43:WU00:FS01:0x22: On Battery: false
18:33:43:WU00:FS01:0x22: UTC Offset: 3
18:33:43:WU00:FS01:0x22:        PID: 48232
18:33:43:WU00:FS01:0x22:        CWD: C:\Users\jeret\AppData\Roaming\FAHClient\work
18:33:43:WU00:FS01:0x22:         OS: Windows 10 Pro
18:33:43:WU00:FS01:0x22:    OS Arch: AMD64
18:33:43:WU00:FS01:0x22:********************************************************************************
18:33:43:WU00:FS01:0x22:Project: 11751 (Run 0, Clone 1753, Gen 5)
18:33:43:WU00:FS01:0x22:Unit: 0x0000000b8ca304e75e6a8042cbeba995
18:33:43:WU00:FS01:0x22:Reading tar file core.xml
18:33:43:WU00:FS01:0x22:Reading tar file integrator.xml
18:33:43:WU00:FS01:0x22:Reading tar file state.xml
18:33:44:WU00:FS01:0x22:Reading tar file system.xml
18:33:46:WU00:FS01:0x22:Digital signatures verified
18:33:46:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
18:33:46:WU00:FS01:0x22:Version 0.0.2
18:33:59:WU00:FS01:0x22:Completed 0 out of 1000000 steps (0%)
18:33:59:WU00:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
18:34:47:WU00:FS01:0x22:Completed 10000 out of 1000000 steps (1%)
18:35:34:WU00:FS01:0x22:Completed 20000 out of 1000000 steps (2%)
18:36:21:WU00:FS01:0x22:Completed 30000 out of 1000000 steps (3%)
18:37:08:WU00:FS01:0x22:Completed 40000 out of 1000000 steps (4%)
18:37:55:WU00:FS01:0x22:Completed 50000 out of 1000000 steps (5%)
18:38:47:WU00:FS01:0x22:Completed 60000 out of 1000000 steps (6%)
18:39:34:WU00:FS01:0x22:Completed 70000 out of 1000000 steps (7%)
18:40:21:WU00:FS01:0x22:Completed 80000 out of 1000000 steps (8%)
18:41:09:WU00:FS01:0x22:Completed 90000 out of 1000000 steps (9%)
18:41:56:WU00:FS01:0x22:Completed 100000 out of 1000000 steps (10%)
18:42:48:WU00:FS01:0x22:Completed 110000 out of 1000000 steps (11%)
18:43:35:WU00:FS01:0x22:Completed 120000 out of 1000000 steps (12%)
18:44:22:WU00:FS01:0x22:Completed 130000 out of 1000000 steps (13%)
18:45:10:WU00:FS01:0x22:Completed 140000 out of 1000000 steps (14%)
18:45:57:WU00:FS01:0x22:Completed 150000 out of 1000000 steps (15%)
18:46:49:WU00:FS01:0x22:Completed 160000 out of 1000000 steps (16%)
18:47:36:WU00:FS01:0x22:Completed 170000 out of 1000000 steps (17%)
18:48:23:WU00:FS01:0x22:Completed 180000 out of 1000000 steps (18%)
18:49:10:WU00:FS01:0x22:Completed 190000 out of 1000000 steps (19%)
18:49:30:WU00:FS01:0x22:ERROR:exception: clWaitForEvents
18:49:30:WU00:FS01:0x22:Saving result file ..\logfile_01.txt
18:49:30:WU00:FS01:0x22:Saving result file checkpointState.xml
18:49:33:WU00:FS01:0x22:Saving result file checkpt.crc
18:49:33:WU00:FS01:0x22:Saving result file positions.xtc
18:49:33:WU00:FS01:0x22:Saving result file science.log
18:49:34:WU00:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
18:49:34:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
18:49:34:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:11751 run:0 clone:1753 gen:5 core:0x22 unit:0x0000000b8ca304e75e6a8042cbeba995
18:49:34:WU00:FS01:Uploading 8.00MiB to 140.163.4.231

Re: Problem with 1080 Ti BAD_WORK_UNIT

Posted: Thu Apr 02, 2020 7:51 pm
by ipkh
Unfortunately, this sounds like a faulty GPU. I have never had a card that failed while folding and did not end up needing an RMA. Assuming you are not using any flags like advanced or beta, it would appear your GPU or system has a problem.
Do you have another GPU you can test in that system? This would help narrow down the problem.

Re: Problem with 1080 Ti BAD_WORK_UNIT

Posted: Thu Apr 02, 2020 8:15 pm
by Jertzuu
ipkh wrote: Unfortunately, this sounds like a faulty GPU. I have never had a card that failed while folding and did not end up needing an RMA. Assuming you are not using any flags like advanced or beta, it would appear your GPU or system has a problem.
Do you have another GPU you can test in that system? This would help narrow down the problem.
I have a few, yes, and I could try them tomorrow. But I doubt the issue is anywhere in the system other than the GPU itself.

I just removed all previous drivers with DDU, and I'll see if that helps.

Re: Problem with 1080 Ti BAD_WORK_UNIT

Posted: Fri Apr 03, 2020 12:00 am
by toTOW
Does the card run fine in FurMark at the manufacturer's default clocks?

Re: Problem with 1080 Ti BAD_WORK_UNIT

Posted: Fri Apr 03, 2020 6:42 am
by Jertzuu
toTOW wrote: Does the card run fine in FurMark at the manufacturer's default clocks?
I haven't tried FurMark, but the 3DMark stress test gives about 92% stability at stock clocks. With my current settings I get about 98.5%. Could it just be a bad card?

Re: Problem with 1080 Ti BAD_WORK_UNIT

Posted: Fri Apr 03, 2020 7:28 am
by Jertzuu
I just got the following in my log:

Code:

07:25:33:WU02:FS01:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:11764 run:0 clone:5162 gen:20 core:0x22 unit:0x0000002880fccb0a5e71130fc7c49beb
07:25:33:WU02:FS01:Starting
07:25:33:WU02:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\jeret\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 02 -suffix 01 -version 705 -lifeline 6036 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
07:25:33:WU02:FS01:Started FahCore on PID 3860
07:25:33:WU02:FS01:Core PID:5848
07:25:33:WU02:FS01:FahCore 0x22 started
07:25:34:WU02:FS01:0x22:*********************** Log Started 2020-04-03T07:25:33Z ***********************
07:25:34:WU02:FS01:0x22:*************************** Core22 Folding@home Core ***************************
07:25:34:WU02:FS01:0x22:       Type: 0x22
07:25:34:WU02:FS01:0x22:       Core: Core22
07:25:34:WU02:FS01:0x22:    Website: https://foldingathome.org/
07:25:34:WU02:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
07:25:34:WU02:FS01:0x22:     Author: John Chodera <[email protected]> and Rafal Wiewiora
07:25:34:WU02:FS01:0x22:             <[email protected]>
07:25:34:WU02:FS01:0x22:       Args: -dir 02 -suffix 01 -version 705 -lifeline 3860 -checkpoint 15
07:25:34:WU02:FS01:0x22:             -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device
07:25:34:WU02:FS01:0x22:             0 -gpu 0
07:25:34:WU02:FS01:0x22:     Config: <none>
07:25:34:WU02:FS01:0x22:************************************ Build *************************************
07:25:34:WU02:FS01:0x22:    Version: 0.0.2
07:25:34:WU02:FS01:0x22:       Date: Dec 6 2019
07:25:34:WU02:FS01:0x22:       Time: 21:30:31
07:25:34:WU02:FS01:0x22: Repository: Git
07:25:34:WU02:FS01:0x22:   Revision: abeb39247cc72df5af0f63723edafadb23d5dfbe
07:25:34:WU02:FS01:0x22:     Branch: HEAD
07:25:34:WU02:FS01:0x22:   Compiler: Visual C++ 2008
07:25:34:WU02:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
07:25:34:WU02:FS01:0x22:   Platform: win32 10
07:25:34:WU02:FS01:0x22:       Bits: 64
07:25:34:WU02:FS01:0x22:       Mode: Release
07:25:34:WU02:FS01:0x22:************************************ System ************************************
07:25:34:WU02:FS01:0x22:        CPU: AMD Ryzen 5 2600 Six-Core Processor
07:25:34:WU02:FS01:0x22:     CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
07:25:34:WU02:FS01:0x22:       CPUs: 12
07:25:34:WU02:FS01:0x22:     Memory: 31.92GiB
07:25:34:WU02:FS01:0x22:Free Memory: 27.68GiB
07:25:34:WU02:FS01:0x22:    Threads: WINDOWS_THREADS
07:25:34:WU02:FS01:0x22: OS Version: 6.2
07:25:34:WU02:FS01:0x22:Has Battery: false
07:25:34:WU02:FS01:0x22: On Battery: false
07:25:34:WU02:FS01:0x22: UTC Offset: 3
07:25:34:WU02:FS01:0x22:        PID: 5848
07:25:34:WU02:FS01:0x22:        CWD: C:\Users\jeret\AppData\Roaming\FAHClient\work
07:25:34:WU02:FS01:0x22:         OS: Windows 10 Pro
07:25:34:WU02:FS01:0x22:    OS Arch: AMD64
07:25:34:WU02:FS01:0x22:********************************************************************************
07:25:34:WU02:FS01:0x22:Project: 11764 (Run 0, Clone 5162, Gen 20)
07:25:34:WU02:FS01:0x22:Unit: 0x0000002880fccb0a5e71130fc7c49beb
07:25:34:WU02:FS01:0x22:Reading tar file core.xml
07:25:34:WU02:FS01:0x22:Reading tar file integrator.xml
07:25:34:WU02:FS01:0x22:Reading tar file state.xml
07:25:35:WU02:FS01:0x22:Reading tar file system.xml
07:25:36:WU02:FS01:0x22:Digital signatures verified
07:25:36:WU02:FS01:0x22:Folding@home GPU Core22 Folding@home Core
07:25:36:WU02:FS01:0x22:Version 0.0.2
07:26:03:WU02:FS01:0x22:Completed 0 out of 1000000 steps (0%)
07:26:03:WU02:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
07:26:55:WU02:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
07:26:55:WU02:FS01:0x22:Following exception occured: Particle coordinate is nan
07:27:18:WU02:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
07:27:18:WU02:FS01:0x22:Following exception occured: Particle coordinate is nan
07:27:42:WU02:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
07:27:42:WU02:FS01:0x22:Following exception occured: Particle coordinate is nan
07:27:42:WU02:FS01:0x22:ERROR:114: Max Retries Reached
07:27:42:WU02:FS01:0x22:Saving result file ..\logfile_01.txt
07:27:42:WU02:FS01:0x22:Saving result file badstate-0.xml
07:27:42:WU02:FS01:0x22:Saving result file badstate-1.xml
07:27:42:WU02:FS01:0x22:Saving result file badstate-2.xml
07:27:42:WU02:FS01:0x22:Saving result file checkpt.crc
07:27:42:WU02:FS01:0x22:Saving result file science.log
07:27:42:WU02:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
07:27:43:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
07:27:43:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:11764 run:0 clone:5162 gen:20 core:0x22 unit:0x0000002880fccb0a5e71130fc7c49beb
07:27:43:WU02:FS01:Uploading 59.12MiB to 128.252.203.10

Re: Problem with 1080 Ti BAD_WORK_UNIT

Posted: Fri Apr 03, 2020 12:26 pm
by ipkh
A failure that early in processing strikes me as either a genuinely bad set of WUs or a faulty GPU. A faulty GPU or system instability is much more likely, and I'd contact the GPU manufacturer for guidance, as you'll need to start there for RMA purposes anyway.

Re: Problem with 1080 Ti BAD_WORK_UNIT

Posted: Fri Apr 03, 2020 7:02 pm
by Jertzuu
ipkh wrote: A failure that early in processing strikes me as either a genuinely bad set of WUs or a faulty GPU. A faulty GPU or system instability is much more likely, and I'd contact the GPU manufacturer for guidance, as you'll need to start there for RMA purposes anyway.
A bad GPU is what I'm suspecting as well, but I do think it is rather weird that it started acting up after a few projects. I don't really believe it could be a set of bad WUs.

I need to look up the paperwork for this card, since I bought it second-hand.

Re: Problem with 1080 Ti BAD_WORK_UNIT or Bad GPU?

Posted: Sat Apr 04, 2020 12:00 am
by pavelanni
I have just joined the F@H community and started with a self-assembled box with a 1080i. I have seen the same error for the last three days (actually the whole time since I joined). A couple of times I saw the GPU become busy, and it even advanced a WU by a fraction of a percent, but then it dropped again. My system is Linux Mint 19.3 with CUDA 10.2 and the v.440 drivers installed. The software recognizes the card fine but can't start any WU. I don't think the GPU is bad; I just tested the CUDA libraries with TensorFlow and they worked fine.

Re: Problem with 1080 Ti BAD_WORK_UNIT or Bad GPU?

Posted: Sat Apr 04, 2020 12:40 am
by PantherX
Welcome to the F@H Forum pavelanni,

Can you please create a new thread here (viewforum.php?f=61) and, in your post, provide the log, making sure to include the top section, which has your system configuration (details in my signature)? Once we have that, we will have sufficient data to start troubleshooting :)

Re: Problem with 1080 Ti BAD_WORK_UNIT or Bad GPU?

Posted: Sat Apr 04, 2020 12:40 pm
by toTOW
Jertzuu> Would it be possible to test your GPU in another system? Or another GPU in your system?

Re: Problem with 1080 Ti BAD_WORK_UNIT or Bad GPU?

Posted: Sat Apr 04, 2020 4:48 pm
by Jertzuu
toTOW wrote: Jertzuu> Would it be possible to test your GPU in another system? Or another GPU in your system?
Yes, I do have another system to try it in. I'll see if I have time tomorrow to test it.

Re: Problem with 1080 Ti BAD_WORK_UNIT or Bad GPU?

Posted: Mon Apr 27, 2020 8:02 am
by Jertzuu
I did a bit of brainstorming with a friend, and we concluded that I should lower my CPU clocks to see whether that gets my GPU folding, since the overclock was crashing my CPU units. And if we're not mistaken, GPU folding needs a CPU thread to work. So far so good; it seems to have done the trick. I'll post updates to help other people with the same problem.

Edit: I'm still getting the following log entry about once per WU, but now the WU is able to continue:

Code:

07:17:45:WU00:FS00:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
07:17:45:WU00:FS00:0x22:Following exception occured: Particle coordinate is nan
The CPU is running at stock speeds, the GPU core clock is set as low as MSI Afterburner will allow, and I changed the CPU thread count to 9 in FAHControl.

GPU-Z is still reporting vRel, but as of now it seems to be folding nicely: about 2 hours per GPU WU and around 4 hours per CPU WU.
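
For anyone who wants to check how often these errors actually occur per WU, here is a minimal sketch of a log scanner. It assumes log lines have the same `HH:MM:SS:WUxx:FSxx:` prefix as the excerpts in this thread, and the marker strings are copied from those excerpts; the function and dictionary names are made up for illustration and are not part of FAHClient.

```python
import re
from collections import Counter

# Matches lines such as
#   "07:26:55:WU02:FS01:0x22:Bad State detected..."
#   "18:49:34:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)"
# Assumption: every relevant line carries a WUxx:FSxx prefix as seen in this thread.
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):(?:WARNING:|ERROR:)?"
    r"(?P<wu>WU\d+):(?P<slot>FS\d+):(?P<msg>.*)$"
)

# Substrings copied verbatim from the logs posted above (including the
# log's own spelling); the keys are hypothetical labels.
MARKERS = {
    "bad_state": "Bad State detected",
    "nan": "Particle coordinate is nan",
    "cl_exception": "ERROR:exception: clWaitForEvents",
    "bad_work_unit": "FahCore returned: BAD_WORK_UNIT",
}


def count_failures(lines):
    """Return {(wu, slot): Counter of marker labels} for the given log lines."""
    result = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a per-WU log line
        for label, needle in MARKERS.items():
            if needle in match.group("msg"):
                key = (match.group("wu"), match.group("slot"))
                result.setdefault(key, Counter())[label] += 1
    return result


if __name__ == "__main__":
    import sys

    # Usage: python scan_log.py path/to/log.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for (wu, slot), counts in sorted(count_failures(handle).items()):
            print(wu, slot, dict(counts))
```

One "bad_state" per WU with no "bad_work_unit" would match the "once per WU, but able to continue" behaviour described above, while three in a row followed by "bad_work_unit" would match the earlier failing log.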

Re: Problem with 1080 Ti BAD_WORK_UNIT or Bad GPU?

Posted: Sat May 02, 2020 8:57 am
by ipkh
You could try changing the CPU thread count to 8. The NaN is likely a WU error and not directly related to overclocking.

Re: Problem with 1080 Ti BAD_WORK_UNIT or Bad GPU?

Posted: Sat May 02, 2020 4:15 pm
by Joe_H
The NaN was in the GPU WU, and yes, there is a high correlation between a NaN occurring and a GPU overclock that is too aggressive, a voltage set too low, or overheating of some component(s) on the GPU.

Reducing the CPU thread count does provide an extra thread for GPU processing or general system overhead. With some of the larger GPU WUs that extra thread is useful in keeping the GPU processing at full speed.
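
The thread arithmetic discussed here can be sketched roughly as follows. This is only an illustration: the rule of reserving one CPU thread per GPU slot plus some spare threads is an assumption drawn from this thread, not an official formula, and `cpu_slot_threads` is a hypothetical helper name.

```python
def cpu_slot_threads(logical_cpus, gpu_slots, spare=0):
    """Threads left for the CPU slot after reserving one thread per GPU slot
    plus `spare` threads for general system overhead (assumed rule of thumb)."""
    threads = logical_cpus - gpu_slots - spare
    if threads < 1:
        raise ValueError("no threads left for a CPU slot")
    return threads


# The log above reports 12 logical CPUs and the system has one GPU slot.
print(cpu_slot_threads(12, 1, spare=2))  # the setting of 9 tried above
print(cpu_slot_threads(12, 1, spare=3))  # the suggested setting of 8
```

How many spare threads a given system needs is a matter of experimentation; the numbers above only reproduce the two settings mentioned in this thread.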