Work units crashing on 2x 1070 Ti set-up
Posted: Thu Nov 19, 2020 6:03 pm
Hi,
One of the GPUs fails quite often. I have already swapped the motherboard (currently a Prime Z270-P), and all drivers are up to date. Any ideas?
Thanks much
17:27:43:WU00:FS02:0x22: Using CUDA and gpu 0
17:27:44:WU00:FS02:0x22:Completed 0 out of 2000000 steps (0%)
17:28:33:WU01:FS00:0x22:Completed 337500 out of 1250000 steps (27%)
17:29:01:WU00:FS02:0x22:An exception occurred at step 17067: Particle coordinate is nan
17:29:01:WU00:FS02:0x22:Max number of attempts to resume from last checkpoint (2) reached. Aborting.
17:29:01:WU00:FS02:0x22:ERROR:114: Max number of attempts to resume from last checkpoint reached.
17:29:01:WU00:FS02:0x22:Saving result file ..\logfile_01.txt
17:29:01:WU00:FS02:0x22:Saving result file science.log
17:29:01:WU00:FS02:0x22:Saving result file state.xml
17:29:06:WU00:FS02:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
17:29:06:WARNING:WU00:FS02:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
17:29:06:WU00:FS02:Sending unit results: id:00 state:SEND error:FAULTY project:14904 run:239 clone:5 gen:188 core:0x22 unit:0x0000010081d59d695f4ec9dfa5d32bbc
17:29:06:WU00:FS02:Uploading 9.53MiB to 129.213.157.105
17:29:06:WU00:FS02:Connecting to 129.213.157.105:8080
17:29:07:WU02:FS02:Connecting to assign1.foldingathome.org:80
17:29:07:WU02:FS02:Assigned to work server 18.188.125.154
17:29:07:WU02:FS02:Requesting new work unit for slot 02: gpu:1:0 GP104 [GeForce GTX 1070 Ti] 8186 from 18.188.125.154
17:29:07:WU02:FS02:Connecting to 18.188.125.154:8080
17:27:32:WU00:FS02:0x22:An exception occurred at step 28111: Particle coordinate is nan
17:27:32:WU00:FS02:0x22:ERROR:98: Attempting to restart from last good checkpoint by restarting core.
17:27:32:WU00:FS02:0x22:Folding@home Core Shutdown: CORE_RESTART
17:27:32:WARNING:WU00:FS02:FahCore returned: CORE_RESTART (98 = 0x62)
17:27:32:WU00:FS02:Starting
17:27:32:WU00:FS02:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\ProgramData\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.13/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 706 -lifeline 11520 -checkpoint 5 -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
17:27:32:WU00:FS02:Started FahCore on PID 7376
17:27:32:WU00:FS02:Core PID:5028
17:27:32:WU00:FS02:FahCore 0x22 started
17:27:33:WU00:FS02:0x22:*********************** Log Started 2020-11-19T17:27:32Z ***********************
17:27:33:WU00:FS02:0x22:*************************** Core22 Folding@home Core ***************************
17:27:33:WU00:FS02:0x22: Core: Core22
17:27:33:WU00:FS02:0x22: Type: 0x22
17:27:33:WU00:FS02:0x22: Version: 0.0.13
17:27:33:WU00:FS02:0x22: Author: Joseph Coffland <[email protected]>
17:27:33:WU00:FS02:0x22: Copyright: 2020 foldingathome.org
17:27:33:WU00:FS02:0x22: Homepage: https://foldingathome.org/
17:27:33:WU00:FS02:0x22: Date: Sep 19 2020
17:27:33:WU00:FS02:0x22: Time: 02:35:58
17:27:33:WU00:FS02:0x22: Revision: 571cf95de6de2c592c7c3ed48fcfb2e33e9ea7d3
17:27:33:WU00:FS02:0x22: Branch: core22-0.0.13
17:27:33:WU00:FS02:0x22: Compiler: Visual C++ 2015
17:27:33:WU00:FS02:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
17:27:33:WU00:FS02:0x22: -DOPENMM_GIT_HASH="\"189320d0\""
17:27:33:WU00:FS02:0x22: Platform: win32 10
17:27:33:WU00:FS02:0x22: Bits: 64
17:27:33:WU00:FS02:0x22: Mode: Release
17:27:33:WU00:FS02:0x22:Maintainers: John Chodera <[email protected]> and Peter Eastman
17:27:33:WU00:FS02:0x22: <[email protected]>
17:27:33:WU00:FS02:0x22: Args: -dir 00 -suffix 01 -version 706 -lifeline 7376 -checkpoint 5
17:27:33:WU00:FS02:0x22: -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor
17:27:33:WU00:FS02:0x22: nvidia -gpu 0 -gpu-usage 100
17:27:33:WU00:FS02:0x22:************************************ libFAH ************************************
17:27:33:WU00:FS02:0x22: Date: Sep 7 2020
17:27:33:WU00:FS02:0x22: Time: 19:09:56
17:27:33:WU00:FS02:0x22: Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
17:27:33:WU00:FS02:0x22: Branch: HEAD
17:27:33:WU00:FS02:0x22: Compiler: Visual C++ 2015
17:27:33:WU00:FS02:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
17:27:33:WU00:FS02:0x22: Platform: win32 10
17:27:33:WU00:FS02:0x22: Bits: 64
17:27:33:WU00:FS02:0x22: Mode: Release
17:27:33:WU00:FS02:0x22:************************************ CBang *************************************
17:27:33:WU00:FS02:0x22: Date: Sep 7 2020
17:27:33:WU00:FS02:0x22: Time: 19:08:30
17:27:33:WU00:FS02:0x22: Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
17:27:33:WU00:FS02:0x22: Branch: HEAD
17:27:33:WU00:FS02:0x22: Compiler: Visual C++ 2015
17:27:33:WU00:FS02:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
17:27:33:WU00:FS02:0x22: Platform: win32 10
17:27:33:WU00:FS02:0x22: Bits: 64
17:27:33:WU00:FS02:0x22: Mode: Release
17:27:33:WU00:FS02:0x22:************************************ System ************************************
17:27:33:WU00:FS02:0x22: CPU: Intel(R) Core(TM) i3-7100 CPU @ 3.90GHz
17:27:33:WU00:FS02:0x22: CPU ID: GenuineIntel Family 6 Model 158 Stepping 9
17:27:33:WU00:FS02:0x22: CPUs: 4
17:27:33:WU00:FS02:0x22: Memory: 3.95GiB
17:27:33:WU00:FS02:0x22:Free Memory: 1.40GiB
17:27:33:WU00:FS02:0x22: Threads: WINDOWS_THREADS
17:27:33:WU00:FS02:0x22: OS Version: 6.2
17:27:33:WU00:FS02:0x22:Has Battery: false
17:27:33:WU00:FS02:0x22: On Battery: false
17:27:33:WU00:FS02:0x22: UTC Offset: -6
17:27:33:WU00:FS02:0x22: PID: 5028
17:27:33:WU00:FS02:0x22: CWD: C:\ProgramData\FAHClient\work
17:27:33:WU00:FS02:0x22:************************************ OpenMM ************************************
17:27:33:WU00:FS02:0x22: Revision: 189320d0
17:27:33:WU00:FS02:0x22:********************************************************************************
17:27:33:WU00:FS02:0x22:Project: 14904 (Run 239, Clone 5, Gen 188)
17:27:33:WU00:FS02:0x22:Unit: 0x0000010081d59d695f4ec9dfa5d32bbc
17:27:33:WU00:FS02:0x22:Digital signatures verified
17:27:33:WU00:FS02:0x22:Folding@home GPU Core22 Folding@home Core
17:27:33:WU00:FS02:0x22:Version 0.0.13
17:27:33:WU00:FS02:0x22: Checkpoint write interval: 100000 steps (5%) [20 total]
17:27:33:WU00:FS02:0x22: JSON viewer frame write interval: 20000 steps (1%) [100 total]
17:27:33:WU00:FS02:0x22: XTC frame write interval: 50000 steps (2.5%) [40 total]
17:27:33:WU00:FS02:0x22: Global context and integrator variables write interval: disabled
17:27:33:WU00:FS02:0x22:There are 4 platforms available.
17:27:33:WU00:FS02:0x22:Platform 0: Reference
17:27:33:WU00:FS02:0x22:Platform 1: CPU
17:27:33:WU00:FS02:0x22:Platform 2: OpenCL
17:27:33:WU00:FS02:0x22: opencl-device 0 specified
17:27:33:WU00:FS02:0x22:Platform 3: CUDA
17:27:33:WU00:FS02:0x22: cuda-device 0 specified
17:27:40:WU00:FS02:0x22:Attempting to create CUDA context:
17:27:40:WU00:FS02:0x22: Configuring platform CUDA
17:27:43:WU00:FS02:0x22: Using CUDA and gpu 0
17:27:44:WU00:FS02:0x22:Completed 0 out of 2000000 steps (0%)
17:28:33:WU01:FS00:0x22:Completed 337500 out of 1250000 steps (27%)
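
To narrow down which slot is actually failing, the "FahCore returned" warnings in the log can be tallied per folding slot. The script below is only a rough sketch: the regular expression is an assumption based on the line format shown in the log above, and `tally_core_returns` is a hypothetical helper name, not part of FAHClient.

```python
import re
from collections import Counter

# Assumed line shape, copied from the log above:
#   HH:MM:SS:WARNING:WUnn:FSnn:FahCore returned: STATUS (code = 0x..)
RETURNED = re.compile(
    r"^\d{2}:\d{2}:\d{2}:WARNING:WU(?P<wu>\d+):FS(?P<slot>\d+):"
    r"FahCore returned: (?P<status>\w+)"
)


def tally_core_returns(lines):
    """Count (slot, status) pairs for every 'FahCore returned' warning."""
    counts = Counter()
    for line in lines:
        match = RETURNED.match(line.strip())
        if match:
            counts[(match.group("slot"), match.group("status"))] += 1
    return counts


# Three lines taken from the log above; only the two warnings should match.
sample = [
    "17:27:32:WARNING:WU00:FS02:FahCore returned: CORE_RESTART (98 = 0x62)",
    "17:28:33:WU01:FS00:0x22:Completed 337500 out of 1250000 steps (27%)",
    "17:29:06:WARNING:WU00:FS02:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
]

for (slot, status), n in sorted(tally_core_returns(sample).items()):
    print(f"slot FS{slot}: {status} x{n}")
```

Running the same function over the lines of a full log file would show whether the failures are confined to one slot (FS02 in the excerpt above) or spread across both cards.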