
Another BAD_WORK_UNIT (114 = 0x72) - why?

Posted: Thu Apr 30, 2020 2:05 pm
by kwthom
Potential stability issues?

This has happened a few times; I finally caught the log the last time it happened - after my computer crashed & rebooted:

Thoughts?

Thanks!

*********************** Log Started 2020-04-29T19:21:19Z ***********************
19:21:55:FS01:Unpaused
19:21:56:WU00:FS01:Starting
19:21:56:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\kwtho\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 706 -lifeline 5532 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
19:21:56:WU00:FS01:Started FahCore on PID 11208
19:21:56:WU00:FS01:Core PID:11232
19:21:56:WU00:FS01:FahCore 0x22 started
19:21:56:WU00:FS01:0x22:*********************** Log Started 2020-04-29T19:21:56Z ***********************
19:21:56:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
19:21:56:WU00:FS01:0x22:       Type: 0x22
19:21:56:WU00:FS01:0x22:       Core: Core22
19:21:56:WU00:FS01:0x22:    Website: https://foldingathome.org/
19:21:56:WU00:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
19:21:56:WU00:FS01:0x22:     Author: John Chodera <[email protected]> and Rafal Wiewiora
19:21:56:WU00:FS01:0x22:             <[email protected]>
19:21:56:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 706 -lifeline 11208 -checkpoint 15
19:21:56:WU00:FS01:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
19:21:56:WU00:FS01:0x22:     Config: <none>
19:21:56:WU00:FS01:0x22:************************************ Build *************************************
19:21:56:WU00:FS01:0x22:    Version: 0.0.2
19:21:56:WU00:FS01:0x22:       Date: Dec 6 2019
19:21:56:WU00:FS01:0x22:       Time: 21:30:31
19:21:56:WU00:FS01:0x22: Repository: Git
19:21:56:WU00:FS01:0x22:   Revision: abeb39247cc72df5af0f63723edafadb23d5dfbe
19:21:56:WU00:FS01:0x22:     Branch: HEAD
19:21:56:WU00:FS01:0x22:   Compiler: Visual C++ 2008
19:21:56:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
19:21:56:WU00:FS01:0x22:   Platform: win32 10
19:21:56:WU00:FS01:0x22:       Bits: 64
19:21:56:WU00:FS01:0x22:       Mode: Release
19:21:56:WU00:FS01:0x22:************************************ System ************************************
19:21:56:WU00:FS01:0x22:        CPU: Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
19:21:56:WU00:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 158 Stepping 10
19:21:56:WU00:FS01:0x22:       CPUs: 6
19:21:56:WU00:FS01:0x22:     Memory: 15.93GiB
19:21:56:WU00:FS01:0x22:Free Memory: 13.10GiB
19:21:56:WU00:FS01:0x22:    Threads: WINDOWS_THREADS
19:21:56:WU00:FS01:0x22: OS Version: 6.2
19:21:56:WU00:FS01:0x22:Has Battery: false
19:21:56:WU00:FS01:0x22: On Battery: false
19:21:56:WU00:FS01:0x22: UTC Offset: -7
19:21:56:WU00:FS01:0x22:        PID: 11232
19:21:56:WU00:FS01:0x22:        CWD: C:\Users\kwtho\AppData\Roaming\FAHClient\work
19:21:56:WU00:FS01:0x22:         OS: Windows 10 Pro
19:21:56:WU00:FS01:0x22:    OS Arch: AMD64
19:21:56:WU00:FS01:0x22:********************************************************************************
19:21:56:WU00:FS01:0x22:Project: 16435 (Run 2194, Clone 1, Gen 2)
19:21:56:WU00:FS01:0x22:Unit: 0x0000000303854c135e9a4ef8ab85c093
19:21:56:WU00:FS01:0x22:Digital signatures verified
19:21:56:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
19:21:56:WU00:FS01:0x22:Version 0.0.2
19:21:56:WU00:FS01:0x22:  Found a checkpoint file
19:22:04:WU00:FS01:0x22:ERROR:Guru Meditation #0.3153f6969d0b62 (7.7) '00/01/stepsDone'
19:22:04:WU00:FS01:0x22:WARNING:Unexpected exit() call
19:22:04:WU00:FS01:0x22:WARNING:Unexpected exit from science code
19:22:04:WU00:FS01:0x22:Saving result file ..\logfile_01.txt
19:22:04:WU00:FS01:0x22:Saving result file checkpointState.xml
19:22:04:WU00:FS01:0x22:Saving result file checkpt.crc
19:22:04:WU00:FS01:0x22:Saving result file positions.xtc
19:22:04:WU00:FS01:0x22:Saving result file science.log
19:22:04:WU00:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
19:22:05:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
19:22:05:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:16435 run:2194 clone:1 gen:2 core:0x22 unit:0x0000000303854c135e9a4ef8ab85c093
19:22:05:WU00:FS01:Uploading 57.13MiB to 3.133.76.19
19:22:05:WU00:FS01:Connecting to 3.133.76.19:8080
19:22:05:WU01:FS01:Connecting to 65.254.110.245:80
19:22:06:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:80': No WUs available for this configuration
19:22:06:WU01:FS01:Connecting to 18.218.241.186:80
19:22:07:WARNING:WU01:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
19:22:07:WU01:FS01:Connecting to 65.254.110.245:80
19:22:07:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:80': No WUs available for this configuration
19:22:07:WU01:FS01:Connecting to 18.218.241.186:80
19:22:07:WARNING:WU01:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
19:22:07:ERROR:WU01:FS01:Exception: Could not get an assignment
19:22:08:WU01:FS01:Connecting to 65.254.110.245:80
19:22:08:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:80': No WUs available for this configuration
19:22:08:WU01:FS01:Connecting to 18.218.241.186:80
19:22:08:WARNING:WU01:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
19:22:08:WU01:FS01:Connecting to 65.254.110.245:80
19:22:08:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:80': No WUs available for this configuration
19:22:08:WU01:FS01:Connecting to 18.218.241.186:80
19:22:09:WARNING:WU01:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
19:22:09:ERROR:WU01:FS01:Exception: Could not get an assignment
19:22:11:WU00:FS01:Upload 6.02%
19:23:29:WU00:FS01:Upload 98.24%
19:23:30:WU00:FS01:Upload complete
19:23:30:WU00:FS01:Server responded WORK_ACK (400)
19:23:30:WU00:FS01:Cleaning up
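The failure only surfaced after a crash and reboot, so it can help to scan the whole client log for failed units instead of trying to catch one live. Below is a minimal Python sketch, not an official Folding@home tool. The regular expression matches only the `FahCore returned:` line format visible in the log above, `find_core_failures` is a hypothetical helper, and treating `FINISHED_UNIT` as the status of a normally completed unit is an assumption. The sketch also checks that the decimal and hex forms of the code agree (114 = 7 × 16 + 2 = 0x72).

```python
import re

# Written against the line format in the log above, e.g.
# 19:22:05:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
RETURN_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):.*?(?P<wu>WU\d+):(?P<slot>FS\d+):"
    r"FahCore returned: (?P<status>\w+) \((?P<dec>\d+) = 0x(?P<hex>[0-9a-fA-F]+)\)"
)


def find_core_failures(lines):
    """Return (time, wu, slot, status, code) for every core exit other than FINISHED_UNIT."""
    failures = []
    for line in lines:
        m = RETURN_RE.search(line)
        if m is None:
            continue
        code = int(m.group("dec"))
        # The client prints the same code in decimal and hex; make sure they agree.
        if code != int(m.group("hex"), 16):
            raise ValueError(f"inconsistent exit code in line: {line!r}")
        # Assumption: FINISHED_UNIT is the status of a normally completed unit.
        if m.group("status") != "FINISHED_UNIT":
            failures.append((m.group("time"), m.group("wu"), m.group("slot"),
                             m.group("status"), code))
    return failures


if __name__ == "__main__":
    sample = [
        "19:22:04:WU00:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT",
        "19:22:05:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
    ]
    print(find_core_failures(sample))
    # [('19:22:05', 'WU00', 'FS01', 'BAD_WORK_UNIT', 114)]
```

Passing an open log file instead of a list also works, since iterating over a file object yields its lines.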

Re: Another BAD_WORK_UNIT (114 = 0x72) - why?

Posted: Thu Apr 30, 2020 2:15 pm
by Neil-B
Up Front: There are a number of issues currently in play with this project and those servers, which may or may not relate to the issue you are having. The team has been working on them for a number of days and has yet to resolve them.

However, the type of failure I am seeing (and I am not a GPU specialist) is similar to failures other people have had when running into one of a variety of GPU issues. Hopefully one of the GPU folders will step in and advise.

Could you repost your log, including the first couple of hundred lines? There may be some clues in the configuration settings displayed there. Since someone is likely to ask, I'll mention it now: is your GPU running at stock speeds, stock (factory) OC speeds, or bespoke OC speeds?

Re: Another BAD_WORK_UNIT (114 = 0x72) - why?

Posted: Thu Apr 30, 2020 3:09 pm
by kwthom
I've paused GPU processing for the last ~24 hours, so all it's got at the moment is CPU efforts.

I'll need to revisit in a few hours for the GPU details, but I've been under-clocking it for the last week, due to ambient heat issues (it's now hot in the desert southwest...)

Thanks!

Re: Another BAD_WORK_UNIT (114 = 0x72) - why?

Posted: Thu Apr 30, 2020 3:13 pm
by Neil-B
Can you send some of your heat over to the UK, please? … Cold, wet, miserable at the moment, so typical British weather I suppose … even with my server toasting my toes it feels sub-optimal in the office.

Hope someone manages to help and get this sorted for you.

Re: Another BAD_WORK_UNIT (114 = 0x72) - why?

Posted: Thu Apr 30, 2020 6:48 pm
by PantherX
Can you please post the first 100 lines of your log file so we can see what hardware the client has detected and how it's configured?

Also, have you configured an exception for the client files in your anti-virus/anti-malware/anti-spyware/anti-ransomware software?
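The first 100 lines PantherX asks for are simply the start of the client's log file. A minimal Python sketch for grabbing them, assuming the v7 client writes `log.txt` into its data directory (shown in the log below as `C:\Users\kwtho\AppData\Roaming\FAHClient`, i.e. `%APPDATA%\FAHClient`); the `head` helper is hypothetical and not part of FAHClient:

```python
import os
from itertools import islice
from pathlib import Path


def head(path, n=100):
    """Return the first n lines of a text file without reading the rest of it."""
    with open(path, encoding="utf-8", errors="replace") as f:
        # islice stops after n lines, so a large log is never loaded in full.
        return [line.rstrip("\n") for line in islice(f, n)]


if __name__ == "__main__":
    # Assumption: log.txt lives in the client's data directory, %APPDATA%\FAHClient
    # on a per-user Windows install like the one in this thread.
    log_path = Path(os.environ.get("APPDATA", ".")) / "FAHClient" / "log.txt"
    if log_path.exists():
        print("\n".join(head(log_path, 100)))
    else:
        print(f"No log found at {log_path}")
```

Reading lazily with `islice` keeps this cheap even when the log has grown over several days of folding.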

Re: Another BAD_WORK_UNIT (114 = 0x72) - why?

Posted: Thu Apr 30, 2020 9:08 pm
by kwthom
PantherX wrote: Can you please post the first 100 lines of your log file so we can see what hardware the client has detected and how it's configured?

*********************** Log Started 2020-04-30T21:02:59Z ***********************
21:02:59:****************************** FAHClient ******************************
21:02:59:        Version: 7.6.9
21:02:59:         Author: Joseph Coffland <[email protected]>
21:02:59:      Copyright: 2020 foldingathome.org
21:02:59:       Homepage: https://foldingathome.org/
21:02:59:           Date: Apr 17 2020
21:02:59:           Time: 11:13:06
21:02:59:       Revision: 398c2b17fa535e0cc6c9d10856b2154c32771646
21:02:59:         Branch: master
21:02:59:       Compiler: Visual C++ 2008
21:02:59:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
21:02:59:       Platform: win32 10
21:02:59:           Bits: 32
21:02:59:           Mode: Release
21:02:59:           Args: --open-web-control
21:02:59:         Config: C:\Users\kwtho\AppData\Roaming\FAHClient\config.xml
21:02:59:******************************** CBang ********************************
21:02:59:           Date: Apr 17 2020
21:02:59:           Time: 11:10:09
21:02:59:       Revision: 2fb0be7809c5e45287a122ca5fbc15b5ae859a3b
21:02:59:         Branch: master
21:02:59:       Compiler: Visual C++ 2008
21:02:59:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
21:02:59:       Platform: win32 10
21:02:59:           Bits: 32
21:02:59:           Mode: Release
21:02:59:******************************* System ********************************
21:02:59:            CPU: Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
21:02:59:         CPU ID: GenuineIntel Family 6 Model 158 Stepping 10
21:02:59:           CPUs: 6
21:02:59:         Memory: 15.93GiB
21:02:59:    Free Memory: 12.05GiB
21:02:59:        Threads: WINDOWS_THREADS
21:02:59:     OS Version: 6.2
21:02:59:    Has Battery: false
21:02:59:     On Battery: false
21:02:59:     UTC Offset: -7
21:02:59:            PID: 6296
21:02:59:            CWD: C:\Users\kwtho\AppData\Roaming\FAHClient
21:02:59:             OS: Windows 10 Enterprise
21:02:59:        OS Arch: AMD64
21:02:59:           GPUs: 1
21:02:59:          GPU 0: Bus:1 Slot:0 Func:0 AMD:5 Ellesmere XT [Radeon RX
21:02:59:                 470/480/570/580/590]
21:02:59:           CUDA: Not detected: Failed to open dynamic library 'nvcuda.dll': The
21:02:59:                 specified module could not be found.
21:02:59:
21:02:59:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:3004.8
21:02:59:  Win32 Service: false
21:02:59:******************************* libFAH ********************************
21:02:59:           Date: Apr 15 2020
21:02:59:           Time: 14:53:14
21:02:59:       Revision: 216968bc7025029c841ed6e36e81a03a316890d3
21:02:59:         Branch: master
21:02:59:       Compiler: Visual C++ 2008
21:02:59:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
21:02:59:       Platform: win32 10
21:02:59:           Bits: 32
21:02:59:           Mode: Release
21:02:59:***********************************************************************
21:02:59:<config>
21:02:59:  <!-- Folding Slot Configuration -->
21:02:59:  <cause v='HIGH_PRIORITY'/>
21:02:59:
21:02:59:  <!-- Network -->
21:02:59:  <proxy v=':8080'/>
21:02:59:
21:02:59:  <!-- Slot Control -->
21:02:59:  <pause-on-start v='true'/>
21:02:59:
21:02:59:  <!-- User Information -->
21:02:59:  <passkey v='*****'/>
21:02:59:  <team v='35780'/>
21:02:59:  <user v='kwthom'/>
21:02:59:
21:02:59:  <!-- Folding Slots -->
21:02:59:  <slot id='0' type='CPU'/>
21:02:59:  <slot id='1' type='GPU'>
21:02:59:    <paused v='true'/>
21:02:59:  </slot>
21:02:59:</config>
21:02:59:Trying to access database...
21:02:59:Successfully acquired database lock
21:02:59:Enabled folding slot 00: PAUSED cpu:4 (by user)
21:02:59:Enabled folding slot 01: PAUSED gpu:0:Ellesmere XT [Radeon RX 470/480/570/580/590] (by user)
21:03:00:3:127.0.0.1:New Web session

PantherX wrote: Also, have you configured an exception for the client files in your anti-virus/anti-malware/anti-spyware/anti-ransomware software?
I haven't needed to, since CPU WUs are running without issue.

If you'd be kind enough to point me to where I'd find the preferred settings for F@H, I'll confirm mine.

Re: Another BAD_WORK_UNIT (114 = 0x72) - why?

Posted: Fri May 01, 2020 4:57 am
by Joe_H
Generally we recommend excluding the F@h data directory from anti-virus scanning, since its random-looking binary data can trigger false positives. In your case that would be C:\Users\kwtho\AppData\Roaming\FAHClient. The exclusion keeps the scanner from opening those files and blocking the client from using them.

The Guru Meditation error in the first log usually indicates a problem opening a file, in this case one connected with the checkpoint. The file was either locked, corrupted, or missing. Another thing that can cause this is a shutdown happening while a checkpoint is being written. Windows sometimes does not give running applications enough time to finish exiting, so the data ends up not being written fully.
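The shutdown-during-checkpoint case Joe_H describes is the classic torn-write problem. The sketch below is only a generic illustration of how an application can guard against it, not how FahCore actually writes `checkpointState.xml` or `checkpt.crc`; the function names and the CRC-prefixed file layout are hypothetical. It writes to a temporary file, flushes it to disk, then swaps it into place with `os.replace`, and on load it treats a missing, unreadable, or checksum-mismatched file as "no usable checkpoint".

```python
import os
import struct
import tempfile
import zlib


def save_checkpoint(path, payload: bytes) -> None:
    """Write payload so the file at `path` is always the old or the complete new version."""
    # Hypothetical layout: 4-byte big-endian CRC32 of the payload, then the payload itself.
    blob = struct.pack(">I", zlib.crc32(payload)) + payload
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same volume as the target for os.replace to succeed.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(blob)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, path)  # swap into place in one rename (atomic on POSIX)
    except BaseException:
        # Interrupted or failed: drop the partial temp file, keep the old checkpoint.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path):
    """Return the payload, or None if the checkpoint is missing, unreadable, or corrupted."""
    try:
        with open(path, "rb") as f:
            blob = f.read()
    except OSError:  # missing file, or locked by another process such as a scanner
        return None
    if len(blob) < 4:
        return None
    (stored_crc,) = struct.unpack(">I", blob[:4])
    payload = blob[4:]
    return payload if zlib.crc32(payload) == stored_crc else None
```

If power is lost before `os.replace` runs, the previous checkpoint is still intact under its real name. Python's documentation guarantees the rename is atomic only where POSIX requires it, so on Windows this narrows the window rather than closing it completely.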

Re: Another BAD_WORK_UNIT (114 = 0x72) - why?

Posted: Fri May 01, 2020 7:46 pm
by kwthom
I *really* think it's a GPU card setting issue I'll need to resolve.

EDIT: Exclusion set as described above; I'll release the pause on my GPU when it cools down a bit today (it's 36°C / 97°F).

Re: Another BAD_WORK_UNIT (114 = 0x72) - why?

Posted: Sat May 02, 2020 3:01 pm
by kwthom
UPDATE: Let it run thru the evening, then paused at the end of the WU; zero issues. I did find this morning that my screen was set to turn off after 5 hours; changed to 'never'. Since I shut off the monitors with the front panel switches when not in use, no big change.

When it ran thru the evening, I actually had the log tab of F@H active, so there was always something refreshing on the screen. That, along with slightly undervolting the card, *may* be the solution on my end.

Card is holding ~68°C while under load at this early hour of the morning. Current WU has four more hours to complete - this will be another good test of my settings.