bruce wrote: I doubt it's a driver issue, but I can't rule it out.
All FAH projects periodically run a "sanity check" designed to terminate work units containing errors (the idea being that if the simulation is going to explode soon or be discarded anyway, it's better to abort the run before it wastes any more of your time). What "periodically" means varies by project, but 20% sounds like a plausible interval for the sanity check.
Since the project was completed by somebody else, that does suggest that your hardware made an error that was detected at 20%. If that's true, the first places to look are (A) drivers, (B) overclocking and (C) overheating.
Some errors are recoverable and are worth a retry. When FAH suspects that a retry is worth it, it will retry 3 times before giving up.
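The retry behaviour described above can be sketched roughly as follows. This is only an illustration, not FAHClient's actual code: `run_with_retries` and `run_core` are hypothetical names, and treating exit code 127 (0x7f) as the retryable "crashed" case is an assumption based on the log excerpts in this thread, where the client restarts the core after that code.

```python
from typing import Callable, Tuple

FINISHED_UNIT = 0x64  # 100, shown in the log as FINISHED_UNIT
CRASHED = 0x7F        # 127, shown in the log as UNKNOWN_ENUM (assumed retryable)


def run_with_retries(run_core: Callable[[], int], max_retries: int = 3) -> Tuple[int, int]:
    """Run the core once, then retry up to max_retries times while it keeps crashing.

    run_core is a hypothetical callable that launches the core and returns
    its exit code. Returns (final_exit_code, retries_used).
    """
    code = run_core()
    retries = 0
    while code == CRASHED and retries < max_retries:
        retries += 1
        code = run_core()
    return code, retries
```

With a core that crashes twice and then finishes, this returns `(FINISHED_UNIT, 2)`; if it keeps crashing, it gives up after the third retry and returns the crash code.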
Temps are awesome running core 21 units: the highest recorded temp on any core 21 WU is 52C, and the GPU reaches up to 68C on work units with other cores. So far, this one WU is the only one to fail with this GPU. There may be more failures showing on the Stanford side, but those were due to me testing the overclock on this card far past factory speed. Boost clock when folding core 21 work units is 1215MHz. Yes, this is technically overclocked vs. reference speed, but on all other cores this card runs at a 1390MHz - 1454MHz boost clock.
Seems like core 21 may still have issues... The core itself seems to have crashed twice, restarted, and finished the WU it was working on before each crash:
00:31:31:WARNING:WU01:FS01:FahCore returned an unknown error code which probably indicates that it crashed
00:31:31:WARNING:WU01:FS01:FahCore returned: UNKNOWN_ENUM (127 = 0x7f)
00:31:31:WU01:FS01:Starting
00:31:31:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/chaosdsm/Documents/Folding/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21.exe -dir 01 -suffix 01 -version 704 -lifeline 8528 -checkpoint 7 -gpu 0 -gpu-vendor nvidia
00:31:31:WU01:FS01:Started FahCore on PID 5420
00:31:31:WU01:FS01:Core PID:6564
00:31:31:WU01:FS01:FahCore 0x21 started
00:31:32:WU01:FS01:0x21:*********************** Log Started 2016-11-22T00:31:31Z ***********************
00:31:32:WU01:FS01:0x21:Project: 11708 (Run 0, Clone 194, Gen 8)
00:31:32:WU01:FS01:0x21:Unit: 0x000000118ca304e75814df28f1225fe3
00:31:32:WU01:FS01:0x21:CPU: 0x00000000000000000000000000000000
00:31:32:WU01:FS01:0x21:Machine: 1
00:31:32:WU01:FS01:0x21:Digital signatures verified
00:31:32:WU01:FS01:0x21:Folding@home GPU Core21 Folding@home Core
00:31:32:WU01:FS01:0x21:Version 0.0.17
00:31:32:WU01:FS01:0x21: Found a checkpoint file
00:31:40:WU01:FS01:0x21:Completed 6250000 out of 7500000 steps (83%)
00:31:40:WU01:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
00:34:29:WU01:FS01:0x21:Completed 6300000 out of 7500000 steps (84%)
00:38:39:WU01:FS01:0x21:Completed 6375000 out of 7500000 steps (85%)
00:42:50:WU01:FS01:0x21:Completed 6450000 out of 7500000 steps (86%)
00:47:02:WU01:FS01:0x21:Completed 6525000 out of 7500000 steps (87%)
00:51:13:WU01:FS01:0x21:Completed 6600000 out of 7500000 steps (88%)
00:55:23:WU01:FS01:0x21:Completed 6675000 out of 7500000 steps (89%)
00:59:31:WU01:FS01:0x21:Completed 6750000 out of 7500000 steps (90%)
01:03:32:WU01:FS01:0x21:Completed 6825000 out of 7500000 steps (91%)
01:07:36:WU01:FS01:0x21:Completed 6900000 out of 7500000 steps (92%)
01:11:41:WU01:FS01:0x21:Completed 6975000 out of 7500000 steps (93%)
01:15:45:WU01:FS01:0x21:Completed 7050000 out of 7500000 steps (94%)
01:19:46:WU01:FS01:0x21:Completed 7125000 out of 7500000 steps (95%)
01:23:50:WU01:FS01:0x21:Completed 7200000 out of 7500000 steps (96%)
01:27:54:WU01:FS01:0x21:Completed 7275000 out of 7500000 steps (97%)
01:31:56:WU01:FS01:0x21:Completed 7350000 out of 7500000 steps (98%)
01:35:57:WU01:FS01:0x21:Completed 7425000 out of 7500000 steps (99%)
01:40:00:WU01:FS01:0x21:Completed 7500000 out of 7500000 steps (100%)
01:40:01:WU00:FS01:Connecting to 171.67.108.45:80
01:40:01:WU00:FS01:Assigned to work server 171.67.108.159
01:40:01:WU00:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:GM204 [GeForce GTX 970] from 171.67.108.159
01:40:01:WU00:FS01:Connecting to 171.67.108.159:8080
01:40:03:WU01:FS01:0x21:Saving result file logfile_01.txt
01:40:03:WU01:FS01:0x21:Saving result file checkpointState.xml
01:40:03:WU00:FS01:Downloading 22.85MiB
01:40:03:WU01:FS01:0x21:Saving result file checkpt.crc
01:40:03:WU01:FS01:0x21:Saving result file log.txt
01:40:03:WU01:FS01:0x21:Saving result file positions.xtc
01:40:03:WU01:FS01:0x21:Folding@home Core Shutdown: FINISHED_UNIT
01:40:04:WU01:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
01:40:04:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:11708 run:0 clone:194 gen:8 core:0x21
22:01:53:WU01:FS01:0x21:Completed 4500000 out of 7500000 steps (60%)
22:02:22:WARNING:WU01:FS01:FahCore returned an unknown error code which probably indicates that it crashed
22:02:22:WARNING:WU01:FS01:FahCore returned: UNKNOWN_ENUM (127 = 0x7f)
22:02:22:WU01:FS01:Starting
22:02:22:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/chaosdsm/Documents/Folding/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21.exe -dir 01 -suffix 01 -version 704 -lifeline 8528 -checkpoint 7 -gpu 0 -gpu-vendor nvidia
22:02:22:WU01:FS01:Started FahCore on PID 9000
22:02:22:WU01:FS01:Core PID:2208
22:02:22:WU01:FS01:FahCore 0x21 started
22:02:23:WU01:FS01:0x21:*********************** Log Started 2016-11-22T22:02:23Z ***********************
22:02:23:WU01:FS01:0x21:Project: 11709 (Run 3, Clone 31, Gen 48)
22:02:23:WU01:FS01:0x21:Unit: 0x000000458ca304f357f594b8cb203d05
22:02:23:WU01:FS01:0x21:CPU: 0x00000000000000000000000000000000
22:02:23:WU01:FS01:0x21:Machine: 1
22:02:23:WU01:FS01:0x21:Digital signatures verified
22:02:23:WU01:FS01:0x21:Folding@home GPU Core21 Folding@home Core
22:02:23:WU01:FS01:0x21:Version 0.0.17
22:02:23:WU01:FS01:0x21: Found a checkpoint file
22:02:31:WU01:FS01:0x21:Completed 4500000 out of 7500000 steps (60%)
22:02:31:WU01:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
22:06:51:WU01:FS01:0x21:Completed 4575000 out of 7500000 steps (61%)
22:11:13:WU01:FS01:0x21:Completed 4650000 out of 7500000 steps (62%)
22:15:34:WU01:FS01:0x21:Completed 4725000 out of 7500000 steps (63%)
22:19:57:WU01:FS01:0x21:Completed 4800000 out of 7500000 steps (64%)
22:24:21:WU01:FS01:0x21:Completed 4875000 out of 7500000 steps (65%)
22:28:46:WU01:FS01:0x21:Completed 4950000 out of 7500000 steps (66%)
22:33:11:WU01:FS01:0x21:Completed 5025000 out of 7500000 steps (67%)
22:37:36:WU01:FS01:0x21:Completed 5100000 out of 7500000 steps (68%)
******************************* Date: 2016-11-22 *******************************
22:41:56:WU01:FS01:0x21:Completed 5175000 out of 7500000 steps (69%)
22:46:18:WU01:FS01:0x21:Completed 5250000 out of 7500000 steps (70%)
22:50:45:WU01:FS01:0x21:Completed 5325000 out of 7500000 steps (71%)
22:55:11:WU01:FS01:0x21:Completed 5400000 out of 7500000 steps (72%)
22:59:36:WU01:FS01:0x21:Completed 5475000 out of 7500000 steps (73%)
23:04:05:WU01:FS01:0x21:Completed 5550000 out of 7500000 steps (74%)
23:08:30:WU01:FS01:0x21:Completed 5625000 out of 7500000 steps (75%)
23:12:54:WU01:FS01:0x21:Completed 5700000 out of 7500000 steps (76%)
23:17:19:WU01:FS01:0x21:Completed 5775000 out of 7500000 steps (77%)
23:21:39:WU01:FS01:0x21:Completed 5850000 out of 7500000 steps (78%)
23:26:05:WU01:FS01:0x21:Completed 5925000 out of 7500000 steps (79%)
23:30:27:WU01:FS01:0x21:Completed 6000000 out of 7500000 steps (80%)
23:34:52:WU01:FS01:0x21:Completed 6075000 out of 7500000 steps (81%)
23:39:14:WU01:FS01:0x21:Completed 6150000 out of 7500000 steps (82%)
23:43:33:WU01:FS01:0x21:Completed 6225000 out of 7500000 steps (83%)
23:47:46:WU01:FS01:0x21:Completed 6300000 out of 7500000 steps (84%)
23:51:59:WU01:FS01:0x21:Completed 6375000 out of 7500000 steps (85%)
23:56:16:WU01:FS01:0x21:Completed 6450000 out of 7500000 steps (86%)
00:00:40:WU01:FS01:0x21:Completed 6525000 out of 7500000 steps (87%)
00:04:58:WU01:FS01:0x21:Completed 6600000 out of 7500000 steps (88%)
00:09:16:WU01:FS01:0x21:Completed 6675000 out of 7500000 steps (89%)
00:13:35:WU01:FS01:0x21:Completed 6750000 out of 7500000 steps (90%)
00:17:56:WU01:FS01:0x21:Completed 6825000 out of 7500000 steps (91%)
00:22:14:WU01:FS01:0x21:Completed 6900000 out of 7500000 steps (92%)
00:26:32:WU01:FS01:0x21:Completed 6975000 out of 7500000 steps (93%)
00:30:52:WU01:FS01:0x21:Completed 7050000 out of 7500000 steps (94%)
00:35:12:WU01:FS01:0x21:Completed 7125000 out of 7500000 steps (95%)
00:39:30:WU01:FS01:0x21:Completed 7200000 out of 7500000 steps (96%)
00:43:51:WU01:FS01:0x21:Completed 7275000 out of 7500000 steps (97%)
00:48:08:WU01:FS01:0x21:Completed 7350000 out of 7500000 steps (98%)
00:52:26:WU01:FS01:0x21:Completed 7425000 out of 7500000 steps (99%)
00:56:43:WU01:FS01:0x21:Completed 7500000 out of 7500000 steps (100%)
00:56:44:WU00:FS01:Connecting to 171.67.108.45:80
00:56:45:WU00:FS01:Assigned to work server 171.64.65.84
00:56:45:WU00:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:GM204 [GeForce GTX 970] from 171.64.65.84
00:56:45:WU00:FS01:Connecting to 171.64.65.84:8080
00:56:46:WU00:FS01:Downloading 2.99MiB
00:56:46:WU01:FS01:0x21:Saving result file logfile_01.txt
00:56:46:WU01:FS01:0x21:Saving result file checkpointState.xml
00:56:49:WU01:FS01:0x21:Saving result file checkpt.crc
00:56:49:WU01:FS01:0x21:Saving result file log.txt
00:56:49:WU01:FS01:0x21:Saving result file positions.xtc
00:56:51:WU01:FS01:0x21:Folding@home Core Shutdown: FINISHED_UNIT
00:56:52:WU00:FS01:Download 68.91%
00:56:52:WU01:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
00:56:52:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:11709 run:3 clone:31 gen:48 core:0x21
Also, core 21 only shows about 50-60% TDP utilization, versus 90-100% for other cores.
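For anyone who wants to check their own log.txt for the same pattern, here is a minimal sketch that pulls out the "FahCore returned" warnings and the time between progress lines. The regular expressions are written against the line format in the excerpts above only; `find_crashes` and `frame_times` are made-up helper names, and other client versions may format lines differently.

```python
import re

# Matches e.g. "00:31:31:WARNING:WU01:FS01:FahCore returned: UNKNOWN_ENUM (127 = 0x7f)"
CRASH_RE = re.compile(
    r"^(\d\d:\d\d:\d\d):WARNING:WU\d+:FS\d+:FahCore returned: (\w+) \((\d+) = 0x[0-9a-fA-F]+\)"
)
# Matches e.g. "00:34:29:WU01:FS01:0x21:Completed 6300000 out of 7500000 steps (84%)"
PROGRESS_RE = re.compile(
    r"^(\d\d):(\d\d):(\d\d):WU\d+:FS\d+:0x[0-9a-fA-F]+:Completed \d+ out of \d+ steps \(\d+%\)"
)


def find_crashes(lines):
    """Return (timestamp, code_name, code_number) for each warning exit code."""
    crashes = []
    for line in lines:
        m = CRASH_RE.match(line)
        if m:
            crashes.append((m.group(1), m.group(2), int(m.group(3))))
    return crashes


def frame_times(lines):
    """Return seconds between consecutive progress lines.

    Timestamps carry no date, so a rollover past midnight is handled by
    taking the difference modulo 24 hours.
    """
    stamps = []
    for line in lines:
        m = PROGRESS_RE.match(line)
        if m:
            h, mi, s = (int(g) for g in m.groups())
            stamps.append(h * 3600 + mi * 60 + s)
    return [(b - a) % 86400 for a, b in zip(stamps, stamps[1:])]
```

Reading the file with `open("log.txt").read().splitlines()` and passing the result to both functions should be enough. Note that the first progress line after a restart repeats the checkpoint percentage, so the gap across a crash is not a real frame time.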