16581 (127, 3, 20) Unexpected Dump + Log

Moderators: Site Moderators, FAHC Science Team

brrr
Posts: 1
Joined: Tue Apr 08, 2025 5:33 pm


Post by brrr »

Code:

03:18:47:I1:WU21:Requesting WU assignment for user Brrr team 169927
03:18:48:I1:WU21:Received WU assignment BUO8GxPzVnkZc9V2RR1ODBt0eSIf6pveBD7-8GNgC1I
03:18:48:I1:WU21:Downloading WU
03:18:57:I1:WU21:Received WU P16581 R127 C3 G20
03:18:57:I3:WU21:Started FahCore on PID 4120
03:18:57:I1:WU21:*********************** Log Started 2025-04-08T03:18:57Z ***********************
03:18:57:I1:WU21:*************************** Core24 Folding@home Core ***************************
03:18:57:I1:WU21: Core: Core24
03:18:57:I1:WU21: Type: 0x24
03:18:57:I1:WU21: Version: 8.1.4
03:18:57:I1:WU21: Author: Joseph Coffland <[email protected]>
03:18:57:I1:WU21: Copyright: 2022 foldingathome.org
03:18:57:I1:WU21: Homepage: https://foldingathome.org/
03:18:57:I1:WU21: Date: Jul 25 2024
03:18:57:I1:WU21: Time: 05:42:49
03:18:57:I1:WU21: Revision: cf9f0139862b8945a2091772770e4631aac37792
03:18:57:I1:WU21: Branch: HEAD
03:18:57:I1:WU21: Compiler: Visual C++
03:18:57:I1:WU21: Options: $( /TP $) /std:c++14 /nologo /EHa /wd4297 /wd4103 /O2
03:18:57:I1:WU21: /Zc:throwingNew /MT -DOPENMM_VERSION="\"8.1.1\"" /Ox /std:c++14
03:18:57:I1:WU21: Platform: win32 10
03:18:57:I1:WU21: Bits: 64
03:18:57:I1:WU21: Mode: Release
03:18:57:I1:WU21:Maintainers: John Chodera <[email protected]> and Peter Eastman
03:18:57:I1:WU21: <[email protected]>
03:18:57:I1:WU21: Args: -dir BUO8GxPzVnkZc9V2RR1ODBt0eSIf6pveBD7-8GNgC1I -suffix 01
03:18:57:I1:WU21: -version 8.4.9 -lifeline 7160 -gpu-uuid
03:18:57:I1:WU21: 31857b8c-790d-2885-8303-27d5dce74468 -gpu-platform cuda -gpu-vendor
03:18:57:I1:WU21: nvidia -opencl-platform 0 -opencl-device 0 -cuda-platform 0
03:18:57:I1:WU21: -cuda-device 0 -gpu 0
03:18:57:I1:WU21:************************************ libFAH ************************************
03:18:57:I1:WU21: Date: Jul 25 2024
03:18:57:I1:WU21: Time: 05:23:50
03:18:57:I1:WU21: Revision: c7d2824a47eb025fa8cda8968c7a5e971585d90c
03:18:57:I1:WU21: Branch: HEAD
03:18:57:I1:WU21: Compiler: Visual C++
03:18:57:I1:WU21: Options: $( /TP $) /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
03:18:57:I1:WU21: Platform: win32 10
03:18:57:I1:WU21: Bits: 64
03:18:57:I1:WU21: Mode: Release
03:18:57:I1:WU21:************************************ CBang *************************************
03:18:57:I1:WU21: Version: 1.7.2
03:18:57:I1:WU21: Author: Joseph Coffland <[email protected]>
03:18:57:I1:WU21: Org: Cauldron Development LLC
03:18:57:I1:WU21: Copyright: Cauldron Development LLC, 2003-2024
03:18:57:I1:WU21: Homepage: https://cauldrondevelopment.com/
03:18:57:I1:WU21: License: LGPL-2.1-or-later
03:18:57:I1:WU21: Date: Jul 25 2024
03:18:57:I1:WU21: Time: 05:22:43
03:18:57:I1:WU21: Revision: f1cd4c791e8c40a35dcfeab3ab85d910949cc0cb
03:18:57:I1:WU21: Branch: HEAD
03:18:57:I1:WU21: Compiler: Visual C++
03:18:57:I1:WU21: Options: $( /TP $) /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
03:18:57:I1:WU21: Platform: win32 10
03:18:57:I1:WU21: Bits: 64
03:18:57:I1:WU21: Mode: Release
03:18:57:I1:WU21:************************************ System ************************************
03:18:57:I1:WU21: CPU: AMD Ryzen 5 1600X Six-Core Processor
03:18:57:I1:WU21: CPU ID: AuthenticAMD Family 23 Model 1 Stepping 1
03:18:57:I1:WU21: CPUs: 12
03:18:57:I1:WU21: Memory: 31.91GiB
03:18:57:I1:WU21:Free Memory: 27.12GiB
03:18:57:I1:WU21: OS Version: 10.0
03:18:57:I1:WU21:Has Battery: false
03:18:57:I1:WU21: On Battery: false
03:18:57:I1:WU21: Hostname: Curiosity
03:18:57:I1:WU21: UTC Offset: 1
03:18:57:I1:WU21: PID: 4120
03:18:57:I1:WU21: CWD: C:\ProgramData\FAHClient\work
03:18:57:I1:WU21: Exec: C:\ProgramData\FAHClient\cores\openmm-core-24\windows-10-64bit\release\fahcore-24-windows-10-64bit-release-8.1.4\FahCore_24.exe
03:18:57:I1:WU21:************************************ OpenMM ************************************
03:18:57:I1:WU21: Version: 8.1.1
03:18:57:I1:WU21:********************************************************************************
03:18:57:I1:WU21:Project: 16581 (Run 127, Clone 3, Gen 20)
03:18:57:I1:WU21:Reading tar file core.xml
03:18:57:I1:WU21:Reading tar file integrator.xml
03:18:57:I1:WU21:Reading tar file state.xml
03:18:59:I1:WU21:Reading tar file system.xml
03:19:00:I1:WU21:Digital signatures verified
03:19:00:I1:WU21:Folding@home GPU Core24 Folding@home Core
03:19:00:I1:WU21:Version 8.1.4
03:19:00:I1:WU21: Checkpoint write interval: 50000 steps (2%) [50 total]
03:19:00:I1:WU21: JSON viewer frame write interval: 25000 steps (1%) [100 total]
03:19:00:I1:WU21: XTC frame write interval: 25000 steps (1%) [100 total]
03:19:00:I1:WU21: TRR frame write interval: disabled
03:19:00:I1:WU21: Global context and integrator variables write interval: disabled
03:19:00:I1:WU21:There are 4 platforms available.
03:19:00:I1:WU21:Platform 0: Reference
03:19:00:I1:WU21:Platform 1: CPU
03:19:00:I1:WU21:Platform 2: OpenCL
03:19:00:I1:WU21: opencl-device 0 specified
03:19:00:I1:WU21:Platform 3: CUDA
03:19:00:I1:WU21: cuda-device 0 specified
03:19:22:I1:WU21:Attempting to create CUDA context:
03:19:22:I1:WU21: Configuring platform CUDA
03:19:27:I1:WU21: Using CUDA on CUDA Platform and gpu 0
03:19:27:I1:WU21: GPU info: Platform: CUDA
03:19:27:I1:WU21: GPU info: PlatformIndex: 0
03:19:27:I1:WU21: GPU info: Device: NVIDIA GeForce GTX 1660
03:19:27:I1:WU21: GPU info: DeviceIndex: 0
03:19:27:I1:WU21: GPU info: Vendor: 0x10de
03:19:27:I1:WU21: GPU info: PCI: 10:00:00
03:19:27:I1:WU21: GPU info: Compute: 7.5
03:19:27:I1:WU21: GPU info: Driver: 12.8
03:19:27:I1:WU21: GPU info: GPU: true
03:19:27:I1:WU21:Completed 0 out of 2500000 steps (0%)
03:19:28:I1:WU21:Checkpoint completed at step 0
03:24:36:I1:WU21:Completed 25000 out of 2500000 steps (1%)
03:29:45:I1:WU21:Completed 50000 out of 2500000 steps (2%)
03:29:46:I1:WU21:Checkpoint completed at step 50000
03:35:02:I1:WU21:Completed 75000 out of 2500000 steps (3%)
03:40:22:I1:WU21:Completed 100000 out of 2500000 steps (4%)
03:40:24:I1:WU21:Checkpoint completed at step 100000

Code:

10:57:42:I1:WU21:Checkpoint completed at step 2150000
11:03:01:I1:WU21:Completed 2175000 out of 2500000 steps (87%)
11:08:19:I1:WU21:Completed 2200000 out of 2500000 steps (88%)
11:08:21:I1:WU21:Checkpoint completed at step 2200000
11:13:40:I1:WU21:Completed 2225000 out of 2500000 steps (89%)
11:17:23:I1:WU21:An exception occurred at step 2242504: Error invoking kernel: CUDA_ERROR_ILLEGAL_ADDRESS (700)
11:17:23:I1:WU21:ERROR:98: Attempting to restart from last good checkpoint by restarting core.
11:17:23:I1:WU21:Folding@home Core Shutdown: CORE_RESTART
11:17:26:E :WU21:Core exited with Windows unhandled exception code 0xc0000409. See https://bit.ly/2CXgWkZ for more information.
11:17:26:E :WU21:Core returned FAILED_1 (0)
11:17:26:E :WU21:Run did not produce any results. Dumping WU
11:17:26:I1:WU21:Sending dump report
11:17:27:I1:WU21:Dumped
muziqaz
Posts: 1534
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London

Re: 16581 (127, 3, 20) Unexpected Dump + Log

Post by muziqaz »

Dying GPU? Broken drivers?
The GPU is quite old now. The driver team no longer pays attention to it, and neither do the CUDA devs.
FAH Omega tester
toTOW
Site Moderator
Posts: 6421
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: 16581 (127, 3, 20) Unexpected Dump + Log

Post by toTOW »

CUDA_ERROR_ILLEGAL_ADDRESS (700) is usually a GPU or driver reset.

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
arisu
Posts: 252
Joined: Mon Feb 24, 2025 11:11 pm

Re: 16581 (127, 3, 20) Unexpected Dump + Log

Post by arisu »

toTOW wrote: Sat Apr 12, 2025 9:56 pm
CUDA_ERROR_ILLEGAL_ADDRESS (700) is usually a GPU or driver reset.

It looks like the first and second errors are different. The core DOES try to recover from the first one (the CUDA_ERROR_ILLEGAL_ADDRESS at step 2242504), but the second one (the 0xc0000409 exit right after the CORE_RESTART) makes it think that it has hit the same problem twice in a row. So this is most likely another dump that could have been recovered from, since it's not a discrepancy in the simulation (like the "particle position is NaN" errors). Maybe it should have waited longer before retrying? Or should it have prompted the user to reboot?
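
For anyone who wants to check their own logs for the same pattern (a CUDA error followed a few seconds later by a separate core crash), here is a rough Python sketch. The line layout is only inferred from the excerpt posted in this thread, not from any documented log format, and `find_failures` and `SAMPLE` are names made up for this example, so treat it as a starting point rather than a proper tool.

```python
import re

# Assumed line layout, inferred from the log in this thread:
#   HH:MM:SS:<level>:WU<n>:<message>   e.g. "11:17:26:E :WU21:Core exited ..."
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}):([A-Z][\d ]):WU(\d+):(.*)$")
CUDA_RE = re.compile(r"(CUDA_ERROR_[A-Z_]+) \((\d+)\)")
WIN_RE = re.compile(r"Windows unhandled exception code (0x[0-9a-fA-F]+)")


def find_failures(log_text):
    """Return (time, kind, detail) tuples for CUDA errors and core crashes."""
    events = []
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # skip anything that is not a WU log line
        time, _level, _wu, message = match.groups()
        cuda = CUDA_RE.search(message)
        if cuda:
            events.append((time, "cuda_error", f"{cuda.group(1)} ({cuda.group(2)})"))
        win = WIN_RE.search(message)
        if win:
            events.append((time, "core_crash", win.group(1)))
    return events


# Lines copied from the log posted at the top of this thread.
SAMPLE = """\
11:13:40:I1:WU21:Completed 2225000 out of 2500000 steps (89%)
11:17:23:I1:WU21:An exception occurred at step 2242504: Error invoking kernel: CUDA_ERROR_ILLEGAL_ADDRESS (700)
11:17:23:I1:WU21:ERROR:98: Attempting to restart from last good checkpoint by restarting core.
11:17:26:E :WU21:Core exited with Windows unhandled exception code 0xc0000409. See https://bit.ly/2CXgWkZ for more information.
"""

if __name__ == "__main__":
    for event in find_failures(SAMPLE):
        print(event)
```

On the sample lines this reports one `cuda_error` at 11:17:23 and one `core_crash` at 11:17:26, which is the "two different errors" pattern described above.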