Project 16435 and RX Vega 56/64
-
- Posts: 73
- Joined: Sat Mar 21, 2020 3:56 pm
Re: Project 16435 and RX Vega 56/64
I do (driver 20.4.2). Its release notes say it's more stable for folding, and it is, except for this project.
The WU recovered a second time, but I'm switching to Light for now because I need to use my PC.
Summary: the GPU bounces between 70 and 90% utilization, a 1450-1500 MHz core clock, and 71-73C.
Second crash, at an even lower clock limit. The system didn't even have time to warm up.
I'll just post most of the crash log here (sorry, mobile users):
11:46:16:WU03:FS01:Started FahCore on PID 15116
11:46:16:WU03:FS01:Core PID:14828
11:46:16:WU03:FS01:FahCore 0x22 started
11:46:17:WU02:FS00:Starting
11:46:17:WU02:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\crawd\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/avx/Core_a7.fah/FahCore_a7.exe -dir 02 -suffix 01 -version 706 -lifeline 8520 -checkpoint 15 -np 14
11:46:17:WU02:FS00:Started FahCore on PID 1204
11:46:17:WU02:FS00:Core PID:11516
11:46:17:WU02:FS00:FahCore 0xa7 started
11:46:17:WU03:FS01:0x22:*********************** Log Started 2020-05-08T11:46:17Z ***********************
11:46:17:WU03:FS01:0x22:*************************** Core22 Folding@home Core ***************************
11:46:17:WU03:FS01:0x22: Type: 0x22
11:46:17:WU03:FS01:0x22: Core: Core22
11:46:17:WU03:FS01:0x22: Website: https://foldingathome.org/
11:46:17:WU03:FS01:0x22: Copyright: (c) 2009-2018 foldingathome.org
11:46:17:WU03:FS01:0x22: Author: John Chodera <[email protected]> and Rafal Wiewiora
11:46:17:WU03:FS01:0x22: <[email protected]>
11:46:17:WU03:FS01:0x22: Args: -dir 03 -suffix 01 -version 706 -lifeline 15116 -checkpoint 15
11:46:17:WU03:FS01:0x22: -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
11:46:17:WU03:FS01:0x22: Config: <none>
11:46:17:WU03:FS01:0x22:************************************ Build *************************************
11:46:17:WU03:FS01:0x22: Version: 0.0.2
11:46:17:WU03:FS01:0x22: Date: Dec 6 2019
11:46:17:WU03:FS01:0x22: Time: 21:30:31
11:46:17:WU03:FS01:0x22: Repository: Git
11:46:17:WU03:FS01:0x22: Revision: abeb39247cc72df5af0f63723edafadb23d5dfbe
11:46:17:WU03:FS01:0x22: Branch: HEAD
11:46:17:WU03:FS01:0x22: Compiler: Visual C++ 2008
11:46:17:WU03:FS01:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
11:46:17:WU03:FS01:0x22: Platform: win32 10
11:46:17:WU03:FS01:0x22: Bits: 64
11:46:17:WU03:FS01:0x22: Mode: Release
11:46:17:WU03:FS01:0x22:************************************ System ************************************
11:46:17:WU03:FS01:0x22: CPU: AMD Ryzen 7 2700X Eight-Core Processor
11:46:17:WU03:FS01:0x22: CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
11:46:17:WU03:FS01:0x22: CPUs: 16
11:46:17:WU03:FS01:0x22: Memory: 31.95GiB
11:46:17:WU03:FS01:0x22:Free Memory: 24.17GiB
11:46:17:WU03:FS01:0x22: Threads: WINDOWS_THREADS
11:46:17:WU03:FS01:0x22: OS Version: 6.2
11:46:17:WU03:FS01:0x22:Has Battery: false
11:46:17:WU03:FS01:0x22: On Battery: false
11:46:17:WU03:FS01:0x22: UTC Offset: -4
11:46:17:WU03:FS01:0x22: PID: 14828
11:46:17:WU03:FS01:0x22: CWD: C:\Users\crawd\AppData\Roaming\FAHClient\work
11:46:17:WU03:FS01:0x22: OS: Windows 10 Home
11:46:17:WU03:FS01:0x22: OS Arch: AMD64
11:46:17:WU03:FS01:0x22:********************************************************************************
11:46:17:WU03:FS01:0x22:Project: 16435 (Run 420, Clone 1, Gen 12)
11:46:17:WU03:FS01:0x22:Unit: 0x0000001403854c135e9a4efb06870c6b
11:46:17:WU03:FS01:0x22:Digital signatures verified
11:46:17:WU03:FS01:0x22:Folding@home GPU Core22 Folding@home Core
11:46:17:WU03:FS01:0x22:Version 0.0.2
11:46:17:WU03:FS01:0x22: Found a checkpoint file
11:46:17:WU02:FS00:0xa7:*********************** Log Started 2020-05-08T11:46:17Z ***********************
11:46:17:WU02:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
11:46:17:WU02:FS00:0xa7: Type: 0xa7
11:46:17:WU02:FS00:0xa7: Core: Gromacs
11:46:17:WU02:FS00:0xa7: Args: -dir 02 -suffix 01 -version 706 -lifeline 1204 -checkpoint 15 -np
11:46:17:WU02:FS00:0xa7: 14
11:46:17:WU02:FS00:0xa7:************************************ CBang *************************************
11:46:17:WU02:FS00:0xa7: Date: Oct 26 2019
11:46:17:WU02:FS00:0xa7: Time: 01:38:25
11:46:17:WU02:FS00:0xa7: Revision: c46a1a011a24143739ac7218c5a435f66777f62f
11:46:17:WU02:FS00:0xa7: Branch: master
11:46:17:WU02:FS00:0xa7: Compiler: Visual C++ 2008
11:46:17:WU02:FS00:0xa7: Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
11:46:17:WU02:FS00:0xa7: Platform: win32 10
11:46:17:WU02:FS00:0xa7: Bits: 64
11:46:17:WU02:FS00:0xa7: Mode: Release
11:46:17:WU02:FS00:0xa7:************************************ System ************************************
11:46:17:WU02:FS00:0xa7: CPU: AMD Ryzen 7 2700X Eight-Core Processor
11:46:17:WU02:FS00:0xa7: CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
11:46:17:WU02:FS00:0xa7: CPUs: 16
11:46:17:WU02:FS00:0xa7: Memory: 31.95GiB
11:46:17:WU02:FS00:0xa7:Free Memory: 24.16GiB
11:46:17:WU02:FS00:0xa7: Threads: WINDOWS_THREADS
11:46:17:WU02:FS00:0xa7: OS Version: 6.2
11:46:17:WU02:FS00:0xa7:Has Battery: false
11:46:17:WU02:FS00:0xa7: On Battery: false
11:46:17:WU02:FS00:0xa7: UTC Offset: -4
11:46:17:WU02:FS00:0xa7: PID: 11516
11:46:17:WU02:FS00:0xa7: CWD: C:\Users\crawd\AppData\Roaming\FAHClient\work
11:46:17:WU02:FS00:0xa7:******************************** Build - libFAH ********************************
11:46:17:WU02:FS00:0xa7: Version: 0.0.18
11:46:17:WU02:FS00:0xa7: Author: Joseph Coffland <[email protected]>
11:46:17:WU02:FS00:0xa7: Copyright: 2019 foldingathome.org
11:46:17:WU02:FS00:0xa7: Homepage: https://foldingathome.org/
11:46:17:WU02:FS00:0xa7: Date: Oct 26 2019
11:46:17:WU02:FS00:0xa7: Time: 01:52:30
11:46:17:WU02:FS00:0xa7: Revision: c1e3513b1bc0c16013668f2173ee969e5995b38e
11:46:17:WU02:FS00:0xa7: Branch: master
11:46:17:WU02:FS00:0xa7: Compiler: Visual C++ 2008
11:46:17:WU02:FS00:0xa7: Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
11:46:17:WU02:FS00:0xa7: Platform: win32 10
11:46:17:WU02:FS00:0xa7: Bits: 64
11:46:17:WU02:FS00:0xa7: Mode: Release
11:46:17:WU02:FS00:0xa7:************************************ Build *************************************
11:46:17:WU02:FS00:0xa7: SIMD: avx_256
11:46:17:WU02:FS00:0xa7:********************************************************************************
11:46:17:WU02:FS00:0xa7:Project: 14235 (Run 587, Clone 1, Gen 17)
11:46:17:WU02:FS00:0xa7:Unit: 0x00000013cedfaa925ea375e71d1602bc
11:46:17:WU02:FS00:0xa7:Digital signatures verified
11:46:17:WU02:FS00:0xa7:Reducing thread count from 14 to 13 to avoid domain decomposition with large prime factor 7
11:46:17:WU02:FS00:0xa7:Reducing thread count from 13 to 12 to avoid domain decomposition by a prime number > 3
11:46:17:WU02:FS00:0xa7:Calling: mdrun -s frame17.tpr -o frame17.trr -x frame17.xtc -cpi state.cpt -cpt 15 -nt 12
11:46:17:WU02:FS00:0xa7:Steps: first=4250000 total=250000
11:46:19:WU02:FS00:0xa7:Completed 165881 out of 250000 steps (66%)
11:46:30:WU03:FS01:0x22:Completed 1610000 out of 5000000 steps (32%)
11:46:30:WU03:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
11:48:12:WU02:FS00:0xa7:Completed 167500 out of 250000 steps (67%)
11:48:47:WU03:FS01:0x22:Completed 1650000 out of 5000000 steps (33%)
11:51:07:WU02:FS00:0xa7:Completed 170000 out of 250000 steps (68%)
11:51:40:WU03:FS01:0x22:Completed 1700000 out of 5000000 steps (34%)
11:53:27:FS00:Paused
11:53:27:FS01:Paused
11:53:27:FS00:Shutting core down
11:53:27:FS01:Shutting core down
11:53:27:WU03:FS01:0x22:WARNING:Console control signal 1 on PID 14828
11:53:27:WU03:FS01:0x22:Exiting, please wait. . .
11:53:27:WU02:FS00:0xa7:WARNING:Console control signal 1 on PID 11516
11:53:27:WU02:FS00:0xa7:Exiting, please wait. . .
11:53:28:WU03:FS01:0x22:Folding@home Core Shutdown: INTERRUPTED
11:53:28:WU03:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
11:53:29:WU02:FS00:0xa7:Folding@home Core Shutdown: INTERRUPTED
11:53:30:WU02:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
11:54:00:FS00:Unpaused
11:54:00:FS01:Unpaused
11:54:00:WU03:FS01:Starting
11:54:00:WU03:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\crawd\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 03 -suffix 01 -version 706 -lifeline 8520 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
11:54:00:WU03:FS01:Started FahCore on PID 6964
11:54:01:WU03:FS01:Core PID:14472
11:54:01:WU03:FS01:FahCore 0x22 started
11:54:01:WU02:FS00:Starting
11:54:01:WU02:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\crawd\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/avx/Core_a7.fah/FahCore_a7.exe -dir 02 -suffix 01 -version 706 -lifeline 8520 -checkpoint 15 -np 14
11:54:01:WU02:FS00:Started FahCore on PID 14476
11:54:01:WU02:FS00:Core PID:9856
11:54:01:WU02:FS00:FahCore 0xa7 started
11:54:01:WU03:FS01:0x22:*********************** Log Started 2020-05-08T11:54:01Z ***********************
11:54:01:WU03:FS01:0x22:*************************** Core22 Folding@home Core ***************************
11:54:01:WU03:FS01:0x22: Type: 0x22
11:54:01:WU03:FS01:0x22: Core: Core22
11:54:01:WU03:FS01:0x22: Website: https://foldingathome.org/
11:54:01:WU03:FS01:0x22: Copyright: (c) 2009-2018 foldingathome.org
11:54:01:WU03:FS01:0x22: Author: John Chodera <[email protected]> and Rafal Wiewiora
11:54:01:WU03:FS01:0x22: <[email protected]>
11:54:01:WU03:FS01:0x22: Args: -dir 03 -suffix 01 -version 706 -lifeline 6964 -checkpoint 15
11:54:01:WU03:FS01:0x22: -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
11:54:01:WU03:FS01:0x22: Config: <none>
11:54:01:WU03:FS01:0x22:************************************ Build *************************************
11:54:01:WU03:FS01:0x22: Version: 0.0.2
11:54:01:WU03:FS01:0x22: Date: Dec 6 2019
11:54:01:WU03:FS01:0x22: Time: 21:30:31
11:54:01:WU03:FS01:0x22: Repository: Git
11:54:01:WU03:FS01:0x22: Revision: abeb39247cc72df5af0f63723edafadb23d5dfbe
11:54:01:WU03:FS01:0x22: Branch: HEAD
11:54:01:WU03:FS01:0x22: Compiler: Visual C++ 2008
11:54:01:WU03:FS01:0x22: Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
11:54:01:WU03:FS01:0x22: Platform: win32 10
11:54:01:WU03:FS01:0x22: Bits: 64
11:54:01:WU03:FS01:0x22: Mode: Release
11:54:01:WU03:FS01:0x22:************************************ System ************************************
11:54:01:WU03:FS01:0x22: CPU: AMD Ryzen 7 2700X Eight-Core Processor
11:54:01:WU03:FS01:0x22: CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
11:54:01:WU03:FS01:0x22: CPUs: 16
11:54:01:WU03:FS01:0x22: Memory: 31.95GiB
11:54:01:WU03:FS01:0x22:Free Memory: 24.05GiB
11:54:01:WU03:FS01:0x22: Threads: WINDOWS_THREADS
11:54:01:WU03:FS01:0x22: OS Version: 6.2
11:54:01:WU03:FS01:0x22:Has Battery: false
11:54:01:WU03:FS01:0x22: On Battery: false
11:54:01:WU03:FS01:0x22: UTC Offset: -4
11:54:01:WU03:FS01:0x22: PID: 14472
11:54:01:WU03:FS01:0x22: CWD: C:\Users\crawd\AppData\Roaming\FAHClient\work
11:54:01:WU03:FS01:0x22: OS: Windows 10 Home
11:54:01:WU03:FS01:0x22: OS Arch: AMD64
11:54:01:WU03:FS01:0x22:********************************************************************************
11:54:01:WU03:FS01:0x22:Project: 16435 (Run 420, Clone 1, Gen 12)
11:54:01:WU03:FS01:0x22:Unit: 0x0000001403854c135e9a4efb06870c6b
11:54:01:WU03:FS01:0x22:Digital signatures verified
11:54:01:WU03:FS01:0x22:Folding@home GPU Core22 Folding@home Core
11:54:01:WU03:FS01:0x22:Version 0.0.2
11:54:01:WU03:FS01:0x22: Found a checkpoint file
11:54:01:WU02:FS00:0xa7:*********************** Log Started 2020-05-08T11:54:01Z ***********************
11:54:01:WU02:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
11:54:01:WU02:FS00:0xa7: Type: 0xa7
11:54:01:WU02:FS00:0xa7: Core: Gromacs
11:54:01:WU02:FS00:0xa7: Args: -dir 02 -suffix 01 -version 706 -lifeline 14476 -checkpoint 15 -np
11:54:01:WU02:FS00:0xa7: 14
11:54:01:WU02:FS00:0xa7:************************************ CBang *************************************
11:54:01:WU02:FS00:0xa7: Date: Oct 26 2019
11:54:01:WU02:FS00:0xa7: Time: 01:38:25
11:54:01:WU02:FS00:0xa7: Revision: c46a1a011a24143739ac7218c5a435f66777f62f
11:54:01:WU02:FS00:0xa7: Branch: master
11:54:01:WU02:FS00:0xa7: Compiler: Visual C++ 2008
11:54:01:WU02:FS00:0xa7: Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
11:54:01:WU02:FS00:0xa7: Platform: win32 10
11:54:01:WU02:FS00:0xa7: Bits: 64
11:54:01:WU02:FS00:0xa7: Mode: Release
11:54:01:WU02:FS00:0xa7:************************************ System ************************************
11:54:01:WU02:FS00:0xa7: CPU: AMD Ryzen 7 2700X Eight-Core Processor
11:54:01:WU02:FS00:0xa7: CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
11:54:01:WU02:FS00:0xa7: CPUs: 16
11:54:01:WU02:FS00:0xa7: Memory: 31.95GiB
11:54:01:WU02:FS00:0xa7:Free Memory: 24.06GiB
11:54:01:WU02:FS00:0xa7: Threads: WINDOWS_THREADS
11:54:01:WU02:FS00:0xa7: OS Version: 6.2
11:54:01:WU02:FS00:0xa7:Has Battery: false
11:54:01:WU02:FS00:0xa7: On Battery: false
11:54:01:WU02:FS00:0xa7: UTC Offset: -4
11:54:01:WU02:FS00:0xa7: PID: 9856
11:54:01:WU02:FS00:0xa7: CWD: C:\Users\crawd\AppData\Roaming\FAHClient\work
11:54:01:WU02:FS00:0xa7:******************************** Build - libFAH ********************************
11:54:01:WU02:FS00:0xa7: Version: 0.0.18
11:54:01:WU02:FS00:0xa7: Author: Joseph Coffland <[email protected]>
11:54:01:WU02:FS00:0xa7: Copyright: 2019 foldingathome.org
11:54:01:WU02:FS00:0xa7: Homepage: https://foldingathome.org/
11:54:01:WU02:FS00:0xa7: Date: Oct 26 2019
11:54:01:WU02:FS00:0xa7: Time: 01:52:30
11:54:01:WU02:FS00:0xa7: Revision: c1e3513b1bc0c16013668f2173ee969e5995b38e
11:54:01:WU02:FS00:0xa7: Branch: master
11:54:01:WU02:FS00:0xa7: Compiler: Visual C++ 2008
11:54:01:WU02:FS00:0xa7: Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
11:54:01:WU02:FS00:0xa7: Platform: win32 10
11:54:01:WU02:FS00:0xa7: Bits: 64
11:54:01:WU02:FS00:0xa7: Mode: Release
11:54:01:WU02:FS00:0xa7:************************************ Build *************************************
11:54:01:WU02:FS00:0xa7: SIMD: avx_256
11:54:01:WU02:FS00:0xa7:********************************************************************************
11:54:01:WU02:FS00:0xa7:Project: 14235 (Run 587, Clone 1, Gen 17)
11:54:01:WU02:FS00:0xa7:Unit: 0x00000013cedfaa925ea375e71d1602bc
11:54:01:WU02:FS00:0xa7:Digital signatures verified
11:54:01:WU02:FS00:0xa7:Reducing thread count from 14 to 13 to avoid domain decomposition with large prime factor 7
11:54:01:WU02:FS00:0xa7:Reducing thread count from 13 to 12 to avoid domain decomposition by a prime number > 3
11:54:01:WU02:FS00:0xa7:Calling: mdrun -s frame17.tpr -o frame17.trr -x frame17.xtc -cpi state.cpt -cpt 15 -nt 12
11:54:01:WU02:FS00:0xa7:Steps: first=4250000 total=250000
11:54:03:WU02:FS00:0xa7:Completed 171992 out of 250000 steps (68%)
11:54:15:WU03:FS01:0x22:Completed 1730000 out of 5000000 steps (34%)
11:54:15:WU03:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
11:54:17:ERROR:Receive error: 10053: An established connection was aborted by the software in your host machine.
11:54:39:WU02:FS00:0xa7:Completed 172500 out of 250000 steps (69%)
11:55:22:WU03:FS01:0x22:Completed 1750000 out of 5000000 steps (35%)
11:57:34:WU02:FS00:0xa7:Completed 175000 out of 250000 steps (70%)
11:58:16:WU03:FS01:0x22:Completed 1800000 out of 5000000 steps (36%)
12:00:28:WU02:FS00:0xa7:Completed 177500 out of 250000 steps (71%)
12:01:09:WU03:FS01:0x22:Completed 1850000 out of 5000000 steps (37%)
12:03:21:WU02:FS00:0xa7:Completed 180000 out of 250000 steps (72%)
12:04:02:WU03:FS01:0x22:Completed 1900000 out of 5000000 steps (38%)
12:06:15:WU02:FS00:0xa7:Completed 182500 out of 250000 steps (73%)
12:06:54:WU03:FS01:0x22:Completed 1950000 out of 5000000 steps (39%)
12:09:11:WU02:FS00:0xa7:Completed 185000 out of 250000 steps (74%)
12:09:47:WU03:FS01:0x22:Completed 2000000 out of 5000000 steps (40%)
12:12:05:WU02:FS00:0xa7:Completed 187500 out of 250000 steps (75%)
12:12:40:WU03:FS01:0x22:Completed 2050000 out of 5000000 steps (41%)
12:14:57:WU02:FS00:0xa7:Completed 190000 out of 250000 steps (76%)
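The `Completed X out of Y steps` lines in a log like this are enough to estimate time per frame (TPF), the number the later replies use when talking about PPD. Below is a minimal Python sketch, assuming the FAHClient v7 line layout shown above; the regex and the `time_per_frame` helper are my own illustration, not part of any Folding@home tool. It only uses pairs of consecutive progress lines that sit exactly one frame apart, so the partial frame reported right after a pause/restart doesn't inflate the result.

```python
import re
from datetime import datetime, timedelta

# Matches FAHClient progress lines, e.g.
# "11:55:22:WU03:FS01:0x22:Completed 1750000 out of 5000000 steps (35%)"
PROGRESS = re.compile(
    r"^(\d{2}:\d{2}:\d{2}):(WU\d+):FS\d+:0x[0-9a-f]+:"
    r"Completed (\d+) out of (\d+) steps"
)


def time_per_frame(log_text: str, wu: str):
    """Seconds per frame (1% of total steps) for one work unit.

    Only consecutive progress lines that both sit on a frame boundary and
    are exactly one frame apart are used; anything spanning a restart or
    starting from a mid-frame checkpoint is skipped.
    """
    samples = []  # (timestamp, steps, total_steps) in log order
    for line in log_text.splitlines():
        m = PROGRESS.match(line.strip())
        if m and m.group(2) == wu:
            ts = datetime.strptime(m.group(1), "%H:%M:%S")
            samples.append((ts, int(m.group(3)), int(m.group(4))))

    tpf = []
    for (t0, s0, total), (t1, s1, _) in zip(samples, samples[1:]):
        frame = total // 100
        if s0 % frame == 0 and s1 - s0 == frame:
            dt = t1 - t0
            if dt < timedelta(0):  # the log crossed midnight
                dt += timedelta(days=1)
            tpf.append(dt.total_seconds())
    return tpf


# Lines copied from the log above (one WU02 line left in to show filtering).
sample = """\
11:48:47:WU03:FS01:0x22:Completed 1650000 out of 5000000 steps (33%)
11:51:07:WU02:FS00:0xa7:Completed 170000 out of 250000 steps (68%)
11:51:40:WU03:FS01:0x22:Completed 1700000 out of 5000000 steps (34%)
11:54:15:WU03:FS01:0x22:Completed 1730000 out of 5000000 steps (34%)
11:55:22:WU03:FS01:0x22:Completed 1750000 out of 5000000 steps (35%)
11:58:16:WU03:FS01:0x22:Completed 1800000 out of 5000000 steps (36%)
"""
print(time_per_frame(sample, "WU03"))  # [173.0, 174.0]
```

For the Core22 slot here (WU03, 5,000,000 steps, so one frame is 50,000 steps), every whole frame in the log takes about 172-174 s, both before and after the pause.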
=========================================================EDIT BELOW=================
I started the core again. Here are the numbers my GPU posts while crunching; it's reasonable to assume similar numbers at the time of this WU's recent (second) crash.
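| GPU UTIL (%) | GPU SCLK (MHz) | GPU MCLK (MHz) | GPU TEMP (C) | GPU PWR | GPU FAN | GPU VRAM UTIL | CPU UTIL | RAM UTIL |
|---|---|---|---|---|---|---|---|---|
| 65 | 1472 | 945 | 72 | 139 | 2316 | 621 | 80.06 | 8.94 |
| 69 | 1490 | 945 | 72 | 134 | 2403 | 621 | 80.5 | 8.94 |
| 72 | 1478 | 945 | 72 | 134 | 2397 | 621 | 79.84 | 8.94 |
| 74 | 1487 | 945 | 72 | 133 | 2402 | 621 | 78.73 | 8.93 |
| 78 | 1472 | 945 | 72 | 141 | 2401 | 621 | 78.91 | 8.93 |
| 37 | 1449 | 945 | 72 | 120 | 2398 | 621 | 78.89 | 8.93 |
| 66 | 1479 | 945 | 73 | 134 | 2365 | 621 | 78.89 | 8.93 |
| 67 | 1464 | 945 | 73 | 130 | 2366 | 621 | 79.55 | 8.93 |
| 62 | 1487 | 945 | 73 | 138 | 2365 | 621 | 79.8 | 8.92 |
| 73 | 1466 | 945 | 73 | 135 | 2366 | 621 | 78.51 | 8.92 |
| 68 | 1487 | 945 | 73 | 139 | 2365 | 621 | 79.25 | 8.92 |
| 46 | 1473 | 945 | 73 | 141 | 2365 | 621 | 78.67 | 8.92 |
| 68 | 1488 | 945 | 73 | 141 | 2364 | 621 | 80.28 | 8.93 |
| 0 | 118 | 945 | 70 | 13 | 2367 | 621 | 82.68 | 8.96 |
| 73 | 1081 | 945 | 70 | 132 | 2311 | 621 | 80.92 | 8.95 |
| 67 | 1471 | 945 | 71 | 135 | 2237 | 621 | 80.01 | 8.94 |
| 65 | 1480 | 945 | 72 | 136 | 2218 | 621 | 79.37 | 8.94 |
| 71 | 1486 | 945 | 72 | 143 | 2300 | 621 | 78.53 | 8.93 |
| 64 | 1475 | 945 | 72 | 138 | 2391 | 621 | 79.06 | 8.93 |
| 66 | 1466 | 945 | 72 | 131 | 2406 | 621 | 80.35 | 8.93 |
| 70 | 1478 | 945 | 72 | 137 | 2405 | 621 | 79.98 | 8.93 |
| 87 | 1462 | 945 | 72 | 137 | 2388 | 621 | 79.44 | 8.93 |
| 75 | 1476 | 945 | 72 | 139 | 2367 | 621 | 78.29 | 8.93 |
| 61 | 1461 | 945 | 73 | 131 | 2365 | 621 | 79 | 8.93 |
| 69 | 1482 | 945 | 73 | 143 | 2365 | 621 | 79.7 | 8.93 |
| 70 | 1475 | 945 | 73 | 134 | 2366 | 621 | 78.85 | 8.93 |
| 86 | 1486 | 945 | 73 | 145 | 2366 | 621 | 79.98 | 8.93 |
| 82 | 1471 | 945 | 73 | 135 | 2365 | 621 | 79.16 | 8.93 |
| 87 | 1483 | 945 | 73 | 146 | 2366 | 621 | 78.63 | 8.93 |
| 80 | 1476 | 945 | 73 | 132 | 2366 | 621 | 79.5 | 8.93 |
| 0 | 150 | 945 | 70 | 13 | 2368 | 621 | 83.15 | 8.94 |
| 72 | 898 | 945 | 71 | 139 | 2326 | 621 | 81.55 | 8.95 |
| 70 | 1474 | 945 | 71 | 137 | 2241 | 621 | 78.08 | 8.94 |
| 38 | 1469 | 945 | 72 | 127 | 2222 | 621 | 79.88 | 8.94 |
| 85 | 1473 | 945 | 72 | 137 | 2308 | 621 | 80 | 8.94 |
| 85 | 1458 | 945 | 72 | 130 | 2391 | 621 | 78.77 | 8.94 |
| 80 | 1481 | 945 | 72 | 134 | 2406 | 621 | 78.72 | 8.94 |
| 85 | 1448 | 945 | 72 | 134 | 2396 | 621 | 79.8 | 8.93 |
| 83 | 1483 | 945 | 72 | 143 | 2403 | 621 | 78.92 | 8.93 |
| 80 | 1457 | 945 | 72 | 130 | 2368 | 622 | 82.53 | 8.93 |
| 67 | 1478 | 945 | 73 | 137 | 2375 | 623 | 80.93 | 8.93 |
| 64 | 1472 | 945 | 73 | 137 | 2368 | 623 | 79.92 | 8.93 |
| 72 | 1486 | 945 | 73 | 138 | 2367 | 623 | 79.65 | 8.93 |
| 73 | 1465 | 945 | 74 | 137 | 2366 | 623 | 80.58 | 8.93 |
| 86 | 1475 | 945 | 73 | 142 | 2367 | 623 | 79.37 | 8.93 |
| 79 | 1473 | 945 | 73 | 138 | 2366 | 623 | 80.14 | 8.93 |
| 81 | 1482 | 945 | 74 | 139 | 2366 | 623 | 79.37 | 8.93 |
| 0 | 278 | 945 | 71 | 14 | 2368 | 623 | 82.23 | 8.94 |
| 69 | 287 | 945 | 70 | 83 | 2335 | 623 | 82.97 | 8.94 |
| 73 | 1450 | 945 | 71 | 135 | 2265 | 623 | 80.28 | 8.94 |
| 88 | 1483 | 945 | 72 | 146 | 2219 | 623 | 80.22 | 8.94 |
| 83 | 1479 | 945 | 72 | 142 | 2297 | 623 | 79.8 | 8.93 |
| 61 | 1481 | 945 | 72 | 137 | 2372 | 623 | 80.88 | 8.93 |
| 83 | 1489 | 945 | 73 | 142 | 2402 | 625 | 81.22 | 8.91 |
| 81 | 1455 | 945 | 72 | 135 | 2388 | 573 | 86.81 | 8.96 |
| 74 | 1461 | 945 | 72 | 124 | 2366 | 628 | 92.16 | 9.06 |
| 76 | 1470 | 945 | 73 | 134 | 2366 | 495 | 90.36 | 9.16 |
| 74 | 1465 | 945 | 72 | 130 | 2368 | 498 | 91.51 | 9.2 |
| 62 | 1449 | 945 | 72 | 137 | 2366 | 487 | 90.86 | 9.24 |
| 82 | 1475 | 945 | 72 | 145 | 2365 | 490 | 83.74 | 9.25 |
| 71 | 1454 | 945 | 72 | 126 | 2367 | 512 | 89 | 9.25 |
| 73 | 1457 | 945 | 72 | 130 | 2366 | 527 | 89.5 | 9.24 |
| 86 | 1490 | 945 | 72 | 145 | 2367 | 527 | 80.68 | 9.24 |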
Historically it has run at 83C during stress tests and is stable.
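To check a summary like "70-90% utilization, 1450-1500 MHz" against the raw samples, a small Python sketch can compute min/mean/max per column while skipping the samples where GPU utilization drops to 0. It uses only ten rows copied from the table above (not the full capture), and it assumes the column meanings from the header: utilization in %, SCLK in MHz, temperature in C. The `summarize` helper is hypothetical, written just for this illustration.

```python
from statistics import mean

# Ten rows copied from the table above: (GPU UTIL %, GPU SCLK MHz, GPU TEMP C)
ROWS = [
    (65, 1472, 72),
    (69, 1490, 72),
    (37, 1449, 72),
    (68, 1488, 73),
    (0, 118, 70),
    (73, 1081, 70),
    (87, 1462, 72),
    (86, 1486, 73),
    (0, 150, 70),
    (88, 1483, 72),
]


def summarize(rows, idle_util=0):
    """(min, mean, max) per column, ignoring samples at or below idle_util."""
    busy = [r for r in rows if r[0] > idle_util]
    columns = zip(*busy)  # transpose rows into per-column tuples
    return {
        name: (min(col), round(mean(col), 1), max(col))
        for name, col in zip(("util", "sclk", "temp"), columns)
    }


print(summarize(ROWS))
```

On these rows the busy samples span 37-88% utilization and 1081-1490 MHz SCLK; the 1081 MHz reading is the sample right after an idle dip, while the clock is still ramping back up.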
-
- Posts: 1099
- Joined: Sun Dec 16, 2007 6:22 pm
- Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, Radeon 7, 5700xt, 6900xt, RX 550 640SP
- Location: London
Re: Project 16435 and RX Vega 56/64
I see your system keeps crashing. You might need to run a couple of memtest86 sessions.
FAH Omega tester
-
- Posts: 534
- Joined: Fri Apr 03, 2020 2:22 pm
- Hardware configuration: ASRock X370M PRO4
Ryzen 2400G APU
16 GB DDR4-3200
MSI GTX 1660 Super Gaming X
Re: Project 16435 and RX Vega 56/64
Another RCG of this project completed early today without errors. Once again FAHControl showed about half the PPD that was actually awarded, but the awarded PPD was in line with what I expected.
Just a thought beyond the system checks, which I also agree with: a member recently posted about having many issues with a NUC he had. The most recent drivers didn't work, and reverting to an older driver got the system up and folding. IMO you could have a similar situation.
As for system and memory testing, I've found that the more brutal the better. I often run several benchmarks at one time to really generate the heat and dig for any potential issues.
Fold them if you get them!
-
- Posts: 73
- Joined: Sat Mar 21, 2020 3:56 pm
Re: Project 16435 and RX Vega 56/64
My system doesn't crash unless it's folding a 16435 GPU work unit; games, video editing, benchmarks, and stress tests all run fine. I did attempt the memtest, but after nearly two hours it was only 91% through its first pass (zero errors), and I needed to use my PC, so I cancelled it. That's one downside of 32 GB of RAM. Besides, folding isn't very RAM-intensive compared to 90% of what I successfully use this PC for, so I don't really understand the suggestion, but when I have time I'll finish testing my memory just to be thorough.
The calculated and credited PPD for this project are roughly in line with everything else. There are anomalies where I get mega-high (and very low) PPD calculations, but this project isn't one of them. I typically see 1.0M-1.2M; it drops a bit when the CPU is running a WU. Other projects push my GPU much harder; this one rarely goes above 90% GPU utilization and often stays below 78%.
The WU above crashed a third time earlier today and did not recover. All three crashes happened with clocks 45 MHz below spec and a GPU core temperature of 73C or less.
I put the settings back to default and it downloaded another 16435 WU, and this time it actually finished without a crash. Since then it has crunched a 14564 and an 11751 without issue, measuring 81-83C with peak clocks 50 MHz above spec to boot (I'm not telling it to overclock, I swear).
I'm not necessarily blaming the project; I get that other PC configurations are finishing it just fine. The point of this thread was to show a pattern and point out a potential incompatibility between it and the Vega 64 (and, by proxy, the Vega 56). I didn't think it would turn into this.
-
- Posts: 2040
- Joined: Sat Dec 01, 2012 3:43 pm
- Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441
Re: Project 16435 and RX Vega 56/64
BSOD THREAD_STUCK_DEVICE_DRIVER
BlueScreenView may show which driver caused the problem:
https://www.nirsoft.net/utils/blue_screen_view.html
Have you looked for a BIOS update?
Or try the Windows repair tools from the command line:
```
sfc /scannow
DISM /Online /Cleanup-Image /RestoreHealth
```
If other users also hit this issue, it would make sense to disable project 16435 for RX Vega 56/64.
-
- Posts: 1099
- Joined: Sun Dec 16, 2007 6:22 pm
- Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, Radeon 7, 5700xt, 6900xt, RX 550 640SP
- Location: London
Re: Project 16435 and RX Vega 56/64
The project ran fine on a Vega 64, which is essentially a Vega 56 with a few more shaders.
@Crawdaddy79, it is possible that this project's WUs reach areas of your system that are not stable.
-
- Posts: 73
- Joined: Sat Mar 21, 2020 3:56 pm
Re: Project 16435 and RX Vega 56/64
Is there data available that shows success rates of various system configurations per project? I would be very interested to see that.
Memory Test ran overnight:
![Image](http://www.crawspace.com/web3/mem_test.png)
-
- Posts: 1099
- Joined: Sun Dec 16, 2007 6:22 pm
- Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, Radeon 7, 5700xt, 6900xt, RX 550 640SP
- Location: London
Re: Project 16435 and RX Vega 56/64
The failure rate for this project is 0.74%, which is considered normal. There is no easy way to break that rate down by configuration, I'm afraid.
What is Event Viewer telling you? Have you tried foldy's suggestion?
Have you tried downclocking HBM memory?
-
- Posts: 73
- Joined: Sat Mar 21, 2020 3:56 pm
Re: Project 16435 and RX Vega 56/64
I hadn't adjusted the HBM memory since starting the spreadsheet. I knocked it down from 945 to 930 MHz and, yods be praised, my system crunched through two 16435 WUs in a row. But then it crashed on an 11741 WU, a project that had previously been 100% reliable.
I turned my case fans to max and installed Afterburner to monitor HBM temps, and I learned that Afterburner sucks for monitoring: it would report blips of 0C and 3600C for the HBM, ruining the min/max readouts. Assuming those readings are glitches, the HBM never got above 86C and the GPU never above 79C.
After uninstalling Afterburner and grabbing GPU-Z, I lowered the HBM from 930 to 925, then set the core clock to -3% and the power limit to -5%. It crashed one more time on a non-16435 WU, but recovered on boot-up.
I turned my case fans back down and re-applied those settings, and it's been folding strong for nearly 24 hours (with a break in the evening). It got two 16435 WUs: it sent one back as faulty at 7% completion and returned the other with NO_ERROR. With the case fans turned down, add four degrees to both the GPU and HBM temps (83C and 90C).
@foldy, those are all new things to me. I did run sfc /scannow, and it reported that it found corrupt files and repaired them. That didn't solve my crashing issue, but I was surprised there was an issue at all. As I get more time to tinker, I'll check out the other things. Thank you.
With this workaround, I think I'm done with this thread and my tracking sheet. I'm still perplexed as to why this one project gave me so many issues at default settings.
Thanks everyone for your time.
-
- Posts: 534
- Joined: Fri Apr 03, 2020 2:22 pm
- Hardware configuration: ASRock X370M PRO4
Ryzen 2400G APU
16 GB DDR4-3200
MSI GTX 1660 Super Gaming X
Re: Project 16435 and RX Vega 56/64
Just a follow-up, since this project has affected at least one other person here.
I've now done six 16435 WUs without error. But the PPD jumps all over, and it runs slowly on my onboard graphics: I'm slightly missing the timeout, and it's the only project that does this. Usually, even on this slower hardware, WUs finish in less than half the timeout. If nothing else, it's a good WU for really testing a system. I hope you nail down a solid fix, Crawdaddy79. I'm sure that at some point there will be others that give our specific cards a workout.
And I figure if I'm picking up more of them, they still need someone to do the work so they get the data.
-
- Posts: 73
- Joined: Sat Mar 21, 2020 3:56 pm
Re: Project 16435 and RX Vega 56/64
I think the PPD jumping all over the place for you is due to this project's 0.2% checkpoint frequency. If your hard drive is slow or busy at the time, it could cause long pauses in the process and throw off the calculation.
I think my issue has to do with the GPU hotspot not being considered by the board's auto-underclocking algorithm. I only learned about the hotspot sensor yesterday morning (it routinely reads 105C). Turning my airflow down, downclocking the memory, and lowering the power limit seem to do the trick. I had one BSOD in 48 hours, and even then both slots' WUs recovered and continued processing.
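The arithmetic behind this guess is easy to sketch. Taking the 0.2% figure at face value, a 5,000,000-step WU (the step count in the log earlier in this thread) gets a checkpoint every 10,000 steps, i.e. 500 checkpoints per WU. The Python sketch below also estimates how much wall time checkpoint stalls could cost; `stall_seconds` is a hypothetical per-checkpoint pause, not something measured in this thread, and 173 s per frame is the TPF read off that log.

```python
def checkpoint_overhead(total_steps, checkpoint_pct, tpf_seconds, stall_seconds):
    """Rough checkpoint cost for one WU.

    total_steps    -- steps in the WU (5,000,000 for the 16435 WU in this thread)
    checkpoint_pct -- checkpoint interval as a percentage of the WU (0.2 here)
    tpf_seconds    -- time per frame (1% of the WU) with no stalls
    stall_seconds  -- hypothetical GPU idle time per checkpoint write
    """
    steps_between = total_steps * checkpoint_pct / 100
    checkpoints = round(total_steps / steps_between)
    # A frame is 1% of the WU, so a 0.2% interval is 0.2 frames of compute.
    compute_between = tpf_seconds * checkpoint_pct
    lost_fraction = stall_seconds / (compute_between + stall_seconds)
    return checkpoints, compute_between, lost_fraction


checkpoints, every_s, lost = checkpoint_overhead(5_000_000, 0.2, 173, stall_seconds=2)
print(checkpoints, round(every_s, 1), f"{lost:.1%}")  # 500 34.6 5.5%
```

With a checkpoint roughly every 35 s of compute, even a couple of seconds of stall per write adds up; whether 2 s is realistic depends entirely on the drive.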
-
- Posts: 1099
- Joined: Sun Dec 16, 2007 6:22 pm
- Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, Radeon 7, 5700xt, 6900xt, RX 550 640SP
- Location: London
Re: Project 16435 and RX Vega 56/64
The PPD jumps are due to the WU demanding more than one CPU core. Pause all the CPU slots on that computer and your 16435 TPF will decrease. This has been observed even with top-of-the-line CPUs on Windows, mainly due to driver overhead on AMD GPUs.
Frequent checkpoints don't help either; even a basic SSD seems to help a lot compared to an old HDD.
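Part of why PPD looks jumpy is that the estimate isn't linear in TPF. As I understand the Folding@home points FAQ, credit with the Quick Return Bonus is `base × max(1, sqrt(k × deadline / elapsed))`, so once the bonus applies, PPD scales roughly as TPF^-1.5. Here is a Python sketch under that assumption; the base points, k factor and deadline below are made-up values, not project 16435's.

```python
import math


def estimated_ppd(tpf_seconds, base_points, k_factor, deadline_days):
    """PPD for a WU of 100 frames, assuming the Quick Return Bonus form
    credit = base * max(1, sqrt(k * deadline / elapsed)), times in days."""
    elapsed_days = 100 * tpf_seconds / 86400
    bonus = max(1.0, math.sqrt(k_factor * deadline_days / elapsed_days))
    return base_points * bonus / elapsed_days


# Hypothetical project parameters, chosen only for illustration.
params = dict(base_points=10_000, k_factor=0.75, deadline_days=2)
fast = estimated_ppd(173, **params)
slow = estimated_ppd(173 * 1.10, **params)  # TPF 10% worse, e.g. CPU contention
print(f"{slow / fast:.3f}")  # 0.867: a 10% slower TPF costs about 13% PPD
```

So a TPF that wobbles by 10% because a CPU slot is competing for cores shows up as a PPD swing of about 13%.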
-
- Posts: 73
- Joined: Sat Mar 21, 2020 3:56 pm
Re: Project 16435 and RX Vega 56/64
muziqaz wrote: "Failure rate for this project is 0.74%, which is considered normal."
Apologies for doubting this number, but I recently found apps.foldingathome.org/wu and have been going through my recent failed returns of this project.
Check these out:
https://apps.foldingathome.org/wu#proje ... ne=2&gen=3
https://apps.foldingathome.org/wu#proje ... e=1&gen=12
https://apps.foldingathome.org/wu#proje ... ne=2&gen=6
https://apps.foldingathome.org/wu#proje ... ne=2&gen=9
https://apps.foldingathome.org/wu#proje ... e=4&gen=17
https://apps.foldingathome.org/wu#proje ... e=0&gen=23
https://apps.foldingathome.org/wu#proje ... e=0&gen=32
Does 1 OK return and 4 failed returns of the same WU equal a success rate of 100%? Almost every WU I look at from this project has at least one failure, even ones that I return as OK.
-
- Posts: 1099
- Joined: Sun Dec 16, 2007 6:22 pm
- Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, Radeon 7, 5700xt, 6900xt, RX 550 640SP
- Location: London
Re: Project 16435 and RX Vega 56/64
I believe the failure rate counts every returned WU, even if it was eventually finished successfully. I might be wrong, though.
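Different definitions of "failure rate" can give very different numbers for the same data. Which one Folding@home uses for the 0.74% figure isn't settled in this thread, so this is only a Python sketch comparing three plausible definitions on a made-up project of 100 WUs: one WU has the 1 OK + 4 failed history described above, and the other 99 succeeded on the first try. The `WorkUnit` class and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WorkUnit:
    """Return history for one run/clone/gen (hypothetical data)."""
    ok_returns: int
    failed_returns: int


def per_return_failure_rate(wus):
    """Failed returns / all returns: every attempt counts."""
    failed = sum(w.failed_returns for w in wus)
    total = sum(w.ok_returns + w.failed_returns for w in wus)
    return failed / total


def per_wu_failure_rate(wus):
    """Fraction of WUs that never came back OK."""
    return sum(1 for w in wus if w.ok_returns == 0) / len(wus)


def wus_with_any_failure(wus):
    """Fraction of WUs with at least one failed return."""
    return sum(1 for w in wus if w.failed_returns > 0) / len(wus)


project = [WorkUnit(1, 4)] + [WorkUnit(1, 0) for _ in range(99)]
print(f"{per_return_failure_rate(project):.2%}")  # 3.85%
print(f"{per_wu_failure_rate(project):.2%}")      # 0.00%
print(f"{wus_with_any_failure(project):.2%}")     # 1.00%
```

Under the per-return definition the single troubled WU dominates; under the per-WU definition it disappears entirely, which is why "1 OK and 4 failed" can look like a 100% success for that WU.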