Fixed: Many failed work units for core 0x24

AnudderArcher
Posts: 2
Joined: Mon Mar 10, 2025 11:07 am

Fixed: Many failed work units for core 0x24

Post by AnudderArcher »

Good morning,

I installed 8.4.9 a couple days ago. I hadn't run FaH for a couple years: I had to uninstall 7.x (and delete the data directory) before installing 8.4.9.

I have configured the client to use the GPU (NVIDIA RTX 2060 Super) and 4 CPU cores. The OpenCL driver is 560.94 (OpenCL 3.0). The CUDA driver is 12.6, with compute capability 7.5.

OS is Windows 10.

Work units that run on the CPU seem to work, and work units for cores 0x22 and 0x23 seem to complete. However, any time a work unit for core 0x24 is attempted, it fails. Since the majority of these failed work units seem to be from the Alzheimer's project, I tried switching to Cancer research only, but I still get work units from the Alzheimer's project.

Here's the log from one instance:

Code:

11:01:52:I1:WU43:Received WU assignment FCO1bNDwW3fDcsJX-fReGlUVMSbL9pVTE9MGpIvzTh0
11:01:52:I1:WU43:Downloading WU
11:01:53:I1:WU43:Received WU P18238 R1146 C4 G50
11:01:54:I3:WU43:Started FahCore on PID 8448
11:01:54:I1:WU43:*********************** Log Started 2025-03-10T11:01:54Z ***********************
11:01:54:I1:WU43:*************************** Core24 Folding@home Core ***************************
11:01:54:I1:WU43: Core: Core24
11:01:54:I1:WU43: Type: 0x24
11:01:54:I1:WU43: Version: 8.1.4
11:01:54:I1:WU43: Author: Joseph Coffland <[email protected]>
11:01:54:I1:WU43: Copyright: 2022 foldingathome.org
11:01:54:I1:WU43: Homepage: https://foldingathome.org/
11:01:54:I1:WU43: Date: Jul 25 2024
11:01:54:I1:WU43: Time: 05:42:49
11:01:54:I1:WU43: Revision: cf9f0139862b8945a2091772770e4631aac37792
11:01:54:I1:WU43: Branch: HEAD
11:01:54:I1:WU43: Compiler: Visual C++
11:01:54:I1:WU43: Options: $( /TP $) /std:c++14 /nologo /EHa /wd4297 /wd4103 /O2
11:01:54:I1:WU43: /Zc:throwingNew /MT -DOPENMM_VERSION="\"8.1.1\"" /Ox /std:c++14 
11:01:54:I1:WU43: Platform: win32 10
11:01:54:I1:WU43: Bits: 64
11:01:54:I1:WU43: Mode: Release
11:01:54:I1:WU43:Maintainers: John Chodera <[email protected]> and Peter Eastman
11:01:54:I1:WU43: <[email protected]>
11:01:54:I1:WU43: Args: -dir FCO1bNDwW3fDcsJX-fReGlUVMSbL9pVTE9MGpIvzTh0 -suffix 01
11:01:54:I1:WU43: -version 8.4.9 -lifeline 5836 -gpu-uuid
11:01:54:I1:WU43: b2c85621-e885-ac83-2abe-aaa3b35e96a5 -gpu-platform cuda -gpu-vendor
11:01:54:I1:WU43: nvidia -opencl-platform 0 -opencl-device 0 -cuda-platform 0
11:01:54:I1:WU43: -cuda-device 0 -gpu 0
11:01:54:I1:WU43:************************************ libFAH ************************************
11:01:54:I1:WU43: Date: Jul 25 2024
11:01:54:I1:WU43: Time: 05:23:50
11:01:54:I1:WU43: Revision: c7d2824a47eb025fa8cda8968c7a5e971585d90c
11:01:54:I1:WU43: Branch: HEAD
11:01:54:I1:WU43: Compiler: Visual C++
11:01:54:I1:WU43: Options: $( /TP $) /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
11:01:54:I1:WU43: Platform: win32 10 
11:01:54:I1:WU43: Bits: 64
11:01:54:I1:WU43: Mode: Release
11:01:54:I1:WU43:************************************ CBang *************************************
11:01:54:I1:WU43: Version: 1.7.2
11:01:54:I1:WU43: Author: Joseph Coffland <[email protected]>
11:01:54:I1:WU43: Org: Cauldron Development LLC
11:01:54:I1:WU43: Copyright: Cauldron Development LLC, 2003-2024
11:01:54:I1:WU43: Homepage: https://cauldrondevelopment.com/
11:01:54:I1:WU43: License: LGPL-2.1-or-later
11:01:54:I1:WU43: Date: Jul 25 2024
11:01:54:I1:WU43: Time: 05:22:43
11:01:54:I1:WU43: Revision: f1cd4c791e8c40a35dcfeab3ab85d910949cc0cb
11:01:54:I1:WU43: Branch: HEAD
11:01:54:I1:WU43: Compiler: Visual C++
11:01:54:I1:WU43: Options: $( /TP $) /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
11:01:54:I1:WU43: Platform: win32 10
11:01:54:I1:WU43: Bits: 64
11:01:54:I1:WU43: Mode: Release 
11:01:54:I1:WU43:************************************ System ************************************
11:01:54:I1:WU43: CPU: Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
11:01:54:I1:WU43: CPU ID: GenuineIntel Family 6 Model 158 Stepping 10
11:01:54:I1:WU43: CPUs: 6
11:01:54:I1:WU43: Memory: 15.92GiB
11:01:54:I1:WU43:Free Memory: 12.34GiB
11:01:54:I1:WU43: OS Version: 10.0
11:01:54:I1:WU43:Has Battery: false
11:01:54:I1:WU43: On Battery: false
11:01:54:I1:WU43: Hostname: DESKTOP-KBJQ123
11:01:54:I1:WU43: UTC Offset: -4
11:01:54:I1:WU43: PID: 8448
11:01:54:I1:WU43: CWD: C:\ProgramData\FAHClient\work
11:01:54:I1:WU43: Exec: C:\ProgramData\FAHClient\cores\openmm-core-24\windows-10-64bit\release\fahcore-24-windows-10-64bit-release-8.1.4\FahCore_24.exe
11:01:54:I1:WU43:************************************ OpenMM ************************************
11:01:54:I1:WU43: Version: 8.1.1
11:01:54:I1:WU43:********************************************************************************
11:01:54:I1:WU43:Project: 18238 (Run 1146, Clone 4, Gen 50) 
11:01:54:I1:WU43:Reading tar file core.xml
11:01:54:I1:WU43:Reading tar file integrator.xml
11:01:54:I1:WU43:Reading tar file state.xml.bz2
11:01:54:I1:WU43:Reading tar file system.xml.bz2
11:01:54:I1:WU43:Digital signatures verified
11:01:54:I1:WU43:Folding@home GPU Core24 Folding@home Core
11:01:54:I1:WU43:Version 8.1.4
11:01:54:I1:WU43: Checkpoint write interval: 50000 steps (2%) [50 total]
11:01:54:I1:WU43: JSON viewer frame write interval: 25000 steps (1%) [100 total]
11:01:54:I1:WU43: XTC frame write interval: 10000 steps (0.4%) [250 total]
11:01:54:I1:WU43: TRR frame write interval: disabled
11:01:54:I1:WU43: Global context and integrator variables write interval: disabled
11:01:54:I1:WU43:There are 4 platforms available.
11:01:54:I1:WU43:Platform 0: Reference
11:01:54:I1:WU43:Platform 1: CPU
11:01:54:I1:WU43:Platform 2: OpenCL
11:01:54:I1:WU43: opencl-device 0 specified
11:01:54:I1:WU43:Platform 3: CUDA
11:01:54:I1:WU43: cuda-device 0 specified 
11:02:05:I1:WU43:Attempting to create CUDA context:
11:02:05:I1:WU43: Configuring platform CUDA
11:02:07:I1:WU43:ERROR:Win32: 0xc0000005: Access violation
11:02:07:I1:WU43:Saving result file ..\logfile_01.txt
11:02:07:I1:WU43:Saving result file science.log
11:02:07:I1:WU43:Saving result file state.xml.bz2
11:02:07:I1:WU43:Folding@home Core Shutdown: BAD_WORK_UNIT
11:02:08:E :WU43:Core returned BAD_WORK_UNIT (114)
11:02:08:I1:WU43:Uploading WU results
11:02:30:I1:WU43:Credited 

Side note: the log viewer doesn't let me copy the whole log in one go. If I press Ctrl-A or Shift-PageUp, the start of my selection resets to a line that is currently visible instead of staying at the bottom of the log.
Last edited by AnudderArcher on Tue Mar 11, 2025 12:28 pm, edited 1 time in total.
muziqaz
Posts: 1410
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London

Re: Many failed work units for core 0x24

Post by muziqaz »

You are running two FAH clients at once. Uninstall everything FAH-related, then install v8 again. Do not start it yourself; just restart the PC and open v8-4.foldingathome.org in your browser. Then log in, configure your GPU and CPU, and start folding.

Also, on Windows, pause any folding before you restart the PC; otherwise you will lose the WUs you are folding at the time of the reboot.
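
For anyone who wants to check for a leftover second client before reinstalling, here is a minimal Python sketch. It assumes the client shows up in the Windows process list as FAHClient.exe (an assumption; check the name on your own install) and counts matching rows in the CSV output of the standard `tasklist /FO CSV /NH` command. The function names are made up for illustration.

```python
import csv
import io
import subprocess

# Assumed executable name; verify it against your own installation.
CLIENT_IMAGE = "FAHClient.exe"


def count_client_processes(tasklist_csv: str, image: str = CLIENT_IMAGE) -> int:
    """Count rows of `tasklist /FO CSV /NH` output whose image name matches."""
    rows = csv.reader(io.StringIO(tasklist_csv))
    return sum(1 for row in rows if row and row[0].lower() == image.lower())


def running_clients() -> int:
    """Windows only: list processes with tasklist and count client instances."""
    result = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    )
    return count_client_processes(result.stdout)
```

On a Windows machine, `running_clients()` returning more than 1 would suggest that an old install is still running alongside the new one.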
FAH Omega tester
Peter_Hucker
Posts: 338
Joined: Wed Feb 16, 2022 1:18 am
Hardware configuration: Ryzen 9 3900XT: 24 cores, 128GB RAM, 1TB NVME, 4TB HDD, R9 Nano (Fiji) GPU.
Ryzen 9 3900X: 24 cores, 64GB RAM, 250GB NVME.
Xeon X5650 dual CPU server: 24 cores, 64GB RAM, 250GB NVME, R9 290(Hawaii) GPU.
Xeon X5650 dual CPU server: 24 cores, 64GB RAM, 250GB NVME.
I3-6100: 4 cores, 32GB RAM, 250GB NVME, 2 of R9 280X (Tahiti) GPUs.
5 other smaller computers.
Location: Scotland

Re: Many failed work units for core 0x24

Post by Peter_Hucker »

Surely FAH can save when you restart?!
Joe_H
Site Admin
Posts: 8073
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Studio M1 Max 32 GB smp6
Mac Hack i7-7700K 48 GB smp4
Location: W. MA

Re: Many failed work units for core 0x24

Post by Joe_H »

Peter_Hucker wrote: Mon Mar 10, 2025 10:09 pm Surely FAH can save when you restart?!
Windows ignores the settings within the folding client to wait for processes to exit cleanly. So often it will just kill the folding processes before the cores have exited, or before the files have been flushed from RAM and written to the drive.
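
To illustrate why a kill in the middle of a write costs work, here is a generic Python sketch of a crash-safe checkpoint write. It is not how the folding cores do it; the function name and file names are hypothetical. The idea is to write to a temporary file, flush and fsync it, then rename it over the old checkpoint, so a process killed part-way leaves the previous checkpoint untouched.

```python
import os
import tempfile


def write_checkpoint(path: str, data: bytes) -> None:
    """Replace `path` with `data` so a partial write never clobbers the old file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ckpt-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's buffer to the OS
            os.fsync(tmp.fileno())  # ask the OS to push its cache to the drive
        os.replace(tmp_path, path)  # swap in the new checkpoint in one step
    except BaseException:
        # Remove the temporary file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this pattern, only the work done since the last completed checkpoint is at risk when the process is killed.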
Peter_Hucker

Re: Many failed work units for core 0x24

Post by Peter_Hucker »

Joe_H wrote: Mon Mar 10, 2025 10:19 pm
Peter_Hucker wrote: Mon Mar 10, 2025 10:09 pm Surely FAH can save when you restart?!
Windows ignores the settings within the folding client to wait for processes to exit cleanly. So often it will just kill the folding processes before the cores have exited, or before the files have been flushed from RAM and written to the drive.
BOINC handles it OK. Its chief programmer might be able to help with how:

https://www.linkedin.com/in/aenbleidd/? ... bdomain=de
Joe_H

Re: Many failed work units for core 0x24

Post by Joe_H »

The F@h developer has included the recommended code for doing this. Why it is not being observed is on the list of things to look into further. You are welcome to look at the code on GitHub or have others look at it.
Peter_Hucker

Re: Many failed work units for core 0x24

Post by Peter_Hucker »

I'm no expert at programming. I only know Vitalii well enough to ask, and he's busy enough with his job and BOINC.
AnudderArcher

Re: Many failed work units for core 0x24

Post by AnudderArcher »

muziqaz wrote: Mon Mar 10, 2025 10:05 pm You are running two FAH clients at once. Uninstall everything FAH-related, then install v8 again. Do not start it yourself; just restart the PC and open v8-4.foldingathome.org in your browser. Then log in, configure your GPU and CPU, and start folding.

Also, on Windows, pause any folding before you restart the PC; otherwise you will lose the WUs you are folding at the time of the reboot.
Confirmed: this fixed the issue. The first WU I received after the uninstall, reboot, and reinstall was for core 0x24, and it completed successfully. Thank you!
appepi
Posts: 66
Joined: Wed Mar 18, 2020 2:55 pm
Hardware configuration: HP Z600 (5) HP Z800 (3) HP Z440 (3)
ASUS Turbo GTX 1060, 1070, 1080, RTX 2060 (3)
Dell GTX 1080
Location: Sydney Australia

Re: Fixed: Many failed work units for core 0x24

Post by appepi »

However, any time a work unit for core 0x24 is attempted, it fails. Since the majority of these failed Work Units seem to be from the Alzheimer's project, I tried switching to just Cancer research only, but I still get work units from the Alzheimer's project.
Now that the main problem is solved, you may find that your 2060 Super is unable to crunch Alzheimer's Project 18251 jobs as well as you might wish. LAR Systems currently shows that TU106 2060 Supers average about 0.9M PPD on Project 18251, which is less than half of what one would expect.

I am sure those of us with similar GPUs (especially the TU106 RTX 2060) would be interested to know how your 2060 Super's PPD on 18251 compares with other jobs. For example, the machine on which I am typing (Donor Z442, Rank #7066 after 4,086 WUs) is currently being roughly 75% wasted by Project 18251, because it is getting an estimated PPD of about 440K as against 2M PPD on normal work.
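
As a quick check on that share, here is a small Python sketch using only the two PPD figures quoted above. The function name is made up for illustration.

```python
def fraction_lost(observed_ppd: float, expected_ppd: float) -> float:
    """Share of the expected points-per-day output that is not being earned."""
    return 1.0 - observed_ppd / expected_ppd


# Figures from the post: about 440K PPD on Project 18251 versus about 2M PPD on normal work.
loss = fraction_lost(440_000, 2_000_000)
print(f"{loss:.0%} of expected PPD lost")  # prints "78% of expected PPD lost"
```

So the rough three-quarters figure holds: the card earns a bit under a quarter of its usual PPD on these jobs.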

For the debate on this topic so far see viewtopic.php?t=42221
Peter_Hucker

Re: Many failed work units for core 0x24

Post by Peter_Hucker »

Joe_H wrote: Mon Mar 10, 2025 10:19 pm
Peter_Hucker wrote: Mon Mar 10, 2025 10:09 pm Surely FAH can save when you restart?!
Windows ignores the settings within the folding client to wait for processes to exit cleanly. So often it will just kill the folding processes before the cores have exited, or before the files have been flushed from RAM and written to the drive.
Is this why I got a warning after rebooting a system? The folding monitoring webpage showed an orange box round the computer in question, and next to the cog symbol for the machine was an orange word warning. Hovering over it told me to check the logs. But the logs for the machine and every workunit in progress had no warnings when I checked warnings only. I wasn't going to look through hundreds of lines of info! All I could tell was it was starting the workunit (or a new workunit?) from the beginning on both GPUs.
Joe_H

Re: Fixed: Many failed work units for core 0x24

Post by Joe_H »

The error may be recorded at the end of the prior log, from just before the restart. By default the client retains the last 99 logs for v8, and 16 for v7, in a folder called "logs" within the F@h data directory.

I run under macOS, and I rarely see an improper shutdown of folding in my logs. But when it happens, there can be messages that the folding core failed to exit and was killed instead. GPU folding cores can take more than a few seconds to exit, depending on the WU size.
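
Rather than reading hundreds of lines by hand, the retained logs can be scanned with a short script. This is a rough Python sketch: the line layout (time, level, optional WU tag, message) is inferred from the log posted at the top of this thread, the "W" level for warnings is an assumption, and so is the .txt extension of the retained log files. The function names are hypothetical.

```python
import re
from pathlib import Path

# Line shape inferred from the excerpt in this thread:
#   HH:MM:SS:<level>:<optional WUnn>:<message>
LINE_RE = re.compile(r"^(\d\d:\d\d:\d\d):([A-Z][0-9 ]?):(?:(WU\d+):)?(.*)$")


def find_problems(lines):
    """Yield (time, work unit, message) for lines that look like errors or warnings."""
    for line in lines:
        m = LINE_RE.match(line.rstrip("\r\n"))
        if not m:
            continue
        time, level, wu, message = m.groups()
        # "E" appears in the excerpt; "W" for warnings is an assumption.
        # Core errors in the excerpt are logged at info level with an "ERROR:" prefix.
        if level[0] in "EW" or message.startswith("ERROR:"):
            yield time, wu or "-", message


def scan_log_dir(log_dir):
    """Scan every *.txt file in a logs folder (the extension is an assumption)."""
    for path in sorted(Path(log_dir).glob("*.txt")):
        with path.open(encoding="utf-8", errors="replace") as fh:
            for hit in find_problems(fh):
                yield (path.name, *hit)
```

Pointing `scan_log_dir` at the "logs" folder in the data directory should list each file name with any matching lines, including core errors that a warnings-only filter in the viewer would not show.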
Peter_Hucker

Re: Fixed: Many failed work units for core 0x24

Post by Peter_Hucker »

If you shut down your Mac, does folding exit gracefully? I guess Windows makes the programming more difficult. It's possible to do it though: Gridcoin, for example, puts up a "gridcoin did not yet exit safely" message for about 5 seconds while it saves something, and Windows waits unless you click "shutdown anyway".
Joe_H

Re: Fixed: Many failed work units for core 0x24

Post by Joe_H »

Most of the time it does exit gracefully. The client is running as a background service started and stopped by a system service. It is also CPU folding only, and those cores appear to exit faster.
muziqaz

Re: Many failed work units for core 0x24

Post by muziqaz »

Peter_Hucker wrote: Fri Mar 14, 2025 9:10 pm
Joe_H wrote: Mon Mar 10, 2025 10:19 pm
Peter_Hucker wrote: Mon Mar 10, 2025 10:09 pm Surely FAH can save when you restart?!
Windows ignores the settings within the folding client to wait for processes to exit cleanly. So often it will just kill the folding processes before the cores have exited, or before the files have been flushed from RAM and written to the drive.
Is this why I got a warning after rebooting a system? The folding monitoring webpage showed an orange box round the computer in question, and next to the cog symbol for the machine was an orange word warning. Hovering over it told me to check the logs. But the logs for the machine and every workunit in progress had no warnings when I checked warnings only. I wasn't going to look through hundreds of lines of info! All I could tell was it was starting the workunit (or a new workunit?) from the beginning on both GPUs.
On Windows you must pause any folding before restarting your computer. A fix for that is being worked on.
Other OSes have no issue with it.
Peter_Hucker

Re: Fixed: Many failed work units for core 0x24

Post by Peter_Hucker »

Only time I restart my Windows computer is if it's locked up (once every 6 months). It's then difficult to pause folding.

Those of you who haven't bypassed Microsoft's automatic update-and-reboot, which discards whatever you were working on, probably lose a lot of folding tasks. I've bypassed it ever since I lost an unsaved Word document. The autosave was discarded because apparently "I" had chosen to close and not save?!