Project 16435 and RX Vega 56/64

If you think it might be a driver problem, see viewforum.php?f=79

Moderators: Site Moderators, FAHC Science Team

muziqaz
Posts: 1099
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London

Re: Project 16435 and RX Vega 56/64

Post by muziqaz »

Do you have the latest driver installed?
FAH Omega tester
Crawdaddy79
Posts: 73
Joined: Sat Mar 21, 2020 3:56 pm

Re: Project 16435 and RX Vega 56/64

Post by Crawdaddy79 »

I do (20.4.2). Release notes for it say it's more stable for folding. And it is, except for this project.

Second crash, at an even lower clock limit. The system didn't even have time to warm up.

I'll just post the majority of the crash log here (sorry mobile users):

11:46:16:WU03:FS01:Started FahCore on PID 15116
11:46:16:WU03:FS01:Core PID:14828
11:46:16:WU03:FS01:FahCore 0x22 started
11:46:17:WU02:FS00:Starting
11:46:17:WU02:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\crawd\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/avx/Core_a7.fah/FahCore_a7.exe -dir 02 -suffix 01 -version 706 -lifeline 8520 -checkpoint 15 -np 14
11:46:17:WU02:FS00:Started FahCore on PID 1204
11:46:17:WU02:FS00:Core PID:11516
11:46:17:WU02:FS00:FahCore 0xa7 started
11:46:17:WU03:FS01:0x22:*********************** Log Started 2020-05-08T11:46:17Z ***********************
11:46:17:WU03:FS01:0x22:*************************** Core22 Folding@home Core ***************************
11:46:17:WU03:FS01:0x22:       Type: 0x22
11:46:17:WU03:FS01:0x22:       Core: Core22
11:46:17:WU03:FS01:0x22:    Website: https://foldingathome.org/
11:46:17:WU03:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
11:46:17:WU03:FS01:0x22:     Author: John Chodera <[email protected]> and Rafal Wiewiora
11:46:17:WU03:FS01:0x22:             <[email protected]>
11:46:17:WU03:FS01:0x22:       Args: -dir 03 -suffix 01 -version 706 -lifeline 15116 -checkpoint 15
11:46:17:WU03:FS01:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
11:46:17:WU03:FS01:0x22:     Config: <none>
11:46:17:WU03:FS01:0x22:************************************ Build *************************************
11:46:17:WU03:FS01:0x22:    Version: 0.0.2
11:46:17:WU03:FS01:0x22:       Date: Dec 6 2019
11:46:17:WU03:FS01:0x22:       Time: 21:30:31
11:46:17:WU03:FS01:0x22: Repository: Git
11:46:17:WU03:FS01:0x22:   Revision: abeb39247cc72df5af0f63723edafadb23d5dfbe
11:46:17:WU03:FS01:0x22:     Branch: HEAD
11:46:17:WU03:FS01:0x22:   Compiler: Visual C++ 2008
11:46:17:WU03:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
11:46:17:WU03:FS01:0x22:   Platform: win32 10
11:46:17:WU03:FS01:0x22:       Bits: 64
11:46:17:WU03:FS01:0x22:       Mode: Release
11:46:17:WU03:FS01:0x22:************************************ System ************************************
11:46:17:WU03:FS01:0x22:        CPU: AMD Ryzen 7 2700X Eight-Core Processor
11:46:17:WU03:FS01:0x22:     CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
11:46:17:WU03:FS01:0x22:       CPUs: 16
11:46:17:WU03:FS01:0x22:     Memory: 31.95GiB
11:46:17:WU03:FS01:0x22:Free Memory: 24.17GiB
11:46:17:WU03:FS01:0x22:    Threads: WINDOWS_THREADS
11:46:17:WU03:FS01:0x22: OS Version: 6.2
11:46:17:WU03:FS01:0x22:Has Battery: false
11:46:17:WU03:FS01:0x22: On Battery: false
11:46:17:WU03:FS01:0x22: UTC Offset: -4
11:46:17:WU03:FS01:0x22:        PID: 14828
11:46:17:WU03:FS01:0x22:        CWD: C:\Users\crawd\AppData\Roaming\FAHClient\work
11:46:17:WU03:FS01:0x22:         OS: Windows 10 Home
11:46:17:WU03:FS01:0x22:    OS Arch: AMD64
11:46:17:WU03:FS01:0x22:********************************************************************************
11:46:17:WU03:FS01:0x22:Project: 16435 (Run 420, Clone 1, Gen 12)
11:46:17:WU03:FS01:0x22:Unit: 0x0000001403854c135e9a4efb06870c6b
11:46:17:WU03:FS01:0x22:Digital signatures verified
11:46:17:WU03:FS01:0x22:Folding@home GPU Core22 Folding@home Core
11:46:17:WU03:FS01:0x22:Version 0.0.2
11:46:17:WU03:FS01:0x22:  Found a checkpoint file
11:46:17:WU02:FS00:0xa7:*********************** Log Started 2020-05-08T11:46:17Z ***********************
11:46:17:WU02:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
11:46:17:WU02:FS00:0xa7:       Type: 0xa7
11:46:17:WU02:FS00:0xa7:       Core: Gromacs
11:46:17:WU02:FS00:0xa7:       Args: -dir 02 -suffix 01 -version 706 -lifeline 1204 -checkpoint 15 -np
11:46:17:WU02:FS00:0xa7:             14
11:46:17:WU02:FS00:0xa7:************************************ CBang *************************************
11:46:17:WU02:FS00:0xa7:       Date: Oct 26 2019
11:46:17:WU02:FS00:0xa7:       Time: 01:38:25
11:46:17:WU02:FS00:0xa7:   Revision: c46a1a011a24143739ac7218c5a435f66777f62f
11:46:17:WU02:FS00:0xa7:     Branch: master
11:46:17:WU02:FS00:0xa7:   Compiler: Visual C++ 2008
11:46:17:WU02:FS00:0xa7:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
11:46:17:WU02:FS00:0xa7:   Platform: win32 10
11:46:17:WU02:FS00:0xa7:       Bits: 64
11:46:17:WU02:FS00:0xa7:       Mode: Release
11:46:17:WU02:FS00:0xa7:************************************ System ************************************
11:46:17:WU02:FS00:0xa7:        CPU: AMD Ryzen 7 2700X Eight-Core Processor
11:46:17:WU02:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
11:46:17:WU02:FS00:0xa7:       CPUs: 16
11:46:17:WU02:FS00:0xa7:     Memory: 31.95GiB
11:46:17:WU02:FS00:0xa7:Free Memory: 24.16GiB
11:46:17:WU02:FS00:0xa7:    Threads: WINDOWS_THREADS
11:46:17:WU02:FS00:0xa7: OS Version: 6.2
11:46:17:WU02:FS00:0xa7:Has Battery: false
11:46:17:WU02:FS00:0xa7: On Battery: false
11:46:17:WU02:FS00:0xa7: UTC Offset: -4
11:46:17:WU02:FS00:0xa7:        PID: 11516
11:46:17:WU02:FS00:0xa7:        CWD: C:\Users\crawd\AppData\Roaming\FAHClient\work
11:46:17:WU02:FS00:0xa7:******************************** Build - libFAH ********************************
11:46:17:WU02:FS00:0xa7:    Version: 0.0.18
11:46:17:WU02:FS00:0xa7:     Author: Joseph Coffland <[email protected]>
11:46:17:WU02:FS00:0xa7:  Copyright: 2019 foldingathome.org
11:46:17:WU02:FS00:0xa7:   Homepage: https://foldingathome.org/
11:46:17:WU02:FS00:0xa7:       Date: Oct 26 2019
11:46:17:WU02:FS00:0xa7:       Time: 01:52:30
11:46:17:WU02:FS00:0xa7:   Revision: c1e3513b1bc0c16013668f2173ee969e5995b38e
11:46:17:WU02:FS00:0xa7:     Branch: master
11:46:17:WU02:FS00:0xa7:   Compiler: Visual C++ 2008
11:46:17:WU02:FS00:0xa7:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
11:46:17:WU02:FS00:0xa7:   Platform: win32 10
11:46:17:WU02:FS00:0xa7:       Bits: 64
11:46:17:WU02:FS00:0xa7:       Mode: Release
11:46:17:WU02:FS00:0xa7:************************************ Build *************************************
11:46:17:WU02:FS00:0xa7:       SIMD: avx_256
11:46:17:WU02:FS00:0xa7:********************************************************************************
11:46:17:WU02:FS00:0xa7:Project: 14235 (Run 587, Clone 1, Gen 17)
11:46:17:WU02:FS00:0xa7:Unit: 0x00000013cedfaa925ea375e71d1602bc
11:46:17:WU02:FS00:0xa7:Digital signatures verified
11:46:17:WU02:FS00:0xa7:Reducing thread count from 14 to 13 to avoid domain decomposition with large prime factor 7
11:46:17:WU02:FS00:0xa7:Reducing thread count from 13 to 12 to avoid domain decomposition by a prime number > 3
11:46:17:WU02:FS00:0xa7:Calling: mdrun -s frame17.tpr -o frame17.trr -x frame17.xtc -cpi state.cpt -cpt 15 -nt 12
11:46:17:WU02:FS00:0xa7:Steps: first=4250000 total=250000
11:46:19:WU02:FS00:0xa7:Completed 165881 out of 250000 steps (66%)
11:46:30:WU03:FS01:0x22:Completed 1610000 out of 5000000 steps (32%)
11:46:30:WU03:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
11:48:12:WU02:FS00:0xa7:Completed 167500 out of 250000 steps (67%)
11:48:47:WU03:FS01:0x22:Completed 1650000 out of 5000000 steps (33%)
11:51:07:WU02:FS00:0xa7:Completed 170000 out of 250000 steps (68%)
11:51:40:WU03:FS01:0x22:Completed 1700000 out of 5000000 steps (34%)
11:53:27:FS00:Paused
11:53:27:FS01:Paused
11:53:27:FS00:Shutting core down
11:53:27:FS01:Shutting core down
11:53:27:WU03:FS01:0x22:WARNING:Console control signal 1 on PID 14828
11:53:27:WU03:FS01:0x22:Exiting, please wait. . .
11:53:27:WU02:FS00:0xa7:WARNING:Console control signal 1 on PID 11516
11:53:27:WU02:FS00:0xa7:Exiting, please wait. . .
11:53:28:WU03:FS01:0x22:Folding@home Core Shutdown: INTERRUPTED
11:53:28:WU03:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
11:53:29:WU02:FS00:0xa7:Folding@home Core Shutdown: INTERRUPTED
11:53:30:WU02:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
11:54:00:FS00:Unpaused
11:54:00:FS01:Unpaused
11:54:00:WU03:FS01:Starting
11:54:00:WU03:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\crawd\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 03 -suffix 01 -version 706 -lifeline 8520 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
11:54:00:WU03:FS01:Started FahCore on PID 6964
11:54:01:WU03:FS01:Core PID:14472
11:54:01:WU03:FS01:FahCore 0x22 started
11:54:01:WU02:FS00:Starting
11:54:01:WU02:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\crawd\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/avx/Core_a7.fah/FahCore_a7.exe -dir 02 -suffix 01 -version 706 -lifeline 8520 -checkpoint 15 -np 14
11:54:01:WU02:FS00:Started FahCore on PID 14476
11:54:01:WU02:FS00:Core PID:9856
11:54:01:WU02:FS00:FahCore 0xa7 started
11:54:01:WU03:FS01:0x22:*********************** Log Started 2020-05-08T11:54:01Z ***********************
11:54:01:WU03:FS01:0x22:*************************** Core22 Folding@home Core ***************************
11:54:01:WU03:FS01:0x22:       Type: 0x22
11:54:01:WU03:FS01:0x22:       Core: Core22
11:54:01:WU03:FS01:0x22:    Website: https://foldingathome.org/
11:54:01:WU03:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
11:54:01:WU03:FS01:0x22:     Author: John Chodera <[email protected]> and Rafal Wiewiora
11:54:01:WU03:FS01:0x22:             <[email protected]>
11:54:01:WU03:FS01:0x22:       Args: -dir 03 -suffix 01 -version 706 -lifeline 6964 -checkpoint 15
11:54:01:WU03:FS01:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
11:54:01:WU03:FS01:0x22:     Config: <none>
11:54:01:WU03:FS01:0x22:************************************ Build *************************************
11:54:01:WU03:FS01:0x22:    Version: 0.0.2
11:54:01:WU03:FS01:0x22:       Date: Dec 6 2019
11:54:01:WU03:FS01:0x22:       Time: 21:30:31
11:54:01:WU03:FS01:0x22: Repository: Git
11:54:01:WU03:FS01:0x22:   Revision: abeb39247cc72df5af0f63723edafadb23d5dfbe
11:54:01:WU03:FS01:0x22:     Branch: HEAD
11:54:01:WU03:FS01:0x22:   Compiler: Visual C++ 2008
11:54:01:WU03:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
11:54:01:WU03:FS01:0x22:   Platform: win32 10
11:54:01:WU03:FS01:0x22:       Bits: 64
11:54:01:WU03:FS01:0x22:       Mode: Release
11:54:01:WU03:FS01:0x22:************************************ System ************************************
11:54:01:WU03:FS01:0x22:        CPU: AMD Ryzen 7 2700X Eight-Core Processor
11:54:01:WU03:FS01:0x22:     CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
11:54:01:WU03:FS01:0x22:       CPUs: 16
11:54:01:WU03:FS01:0x22:     Memory: 31.95GiB
11:54:01:WU03:FS01:0x22:Free Memory: 24.05GiB
11:54:01:WU03:FS01:0x22:    Threads: WINDOWS_THREADS
11:54:01:WU03:FS01:0x22: OS Version: 6.2
11:54:01:WU03:FS01:0x22:Has Battery: false
11:54:01:WU03:FS01:0x22: On Battery: false
11:54:01:WU03:FS01:0x22: UTC Offset: -4
11:54:01:WU03:FS01:0x22:        PID: 14472
11:54:01:WU03:FS01:0x22:        CWD: C:\Users\crawd\AppData\Roaming\FAHClient\work
11:54:01:WU03:FS01:0x22:         OS: Windows 10 Home
11:54:01:WU03:FS01:0x22:    OS Arch: AMD64
11:54:01:WU03:FS01:0x22:********************************************************************************
11:54:01:WU03:FS01:0x22:Project: 16435 (Run 420, Clone 1, Gen 12)
11:54:01:WU03:FS01:0x22:Unit: 0x0000001403854c135e9a4efb06870c6b
11:54:01:WU03:FS01:0x22:Digital signatures verified
11:54:01:WU03:FS01:0x22:Folding@home GPU Core22 Folding@home Core
11:54:01:WU03:FS01:0x22:Version 0.0.2
11:54:01:WU03:FS01:0x22:  Found a checkpoint file
11:54:01:WU02:FS00:0xa7:*********************** Log Started 2020-05-08T11:54:01Z ***********************
11:54:01:WU02:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
11:54:01:WU02:FS00:0xa7:       Type: 0xa7
11:54:01:WU02:FS00:0xa7:       Core: Gromacs
11:54:01:WU02:FS00:0xa7:       Args: -dir 02 -suffix 01 -version 706 -lifeline 14476 -checkpoint 15 -np
11:54:01:WU02:FS00:0xa7:             14
11:54:01:WU02:FS00:0xa7:************************************ CBang *************************************
11:54:01:WU02:FS00:0xa7:       Date: Oct 26 2019
11:54:01:WU02:FS00:0xa7:       Time: 01:38:25
11:54:01:WU02:FS00:0xa7:   Revision: c46a1a011a24143739ac7218c5a435f66777f62f
11:54:01:WU02:FS00:0xa7:     Branch: master
11:54:01:WU02:FS00:0xa7:   Compiler: Visual C++ 2008
11:54:01:WU02:FS00:0xa7:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
11:54:01:WU02:FS00:0xa7:   Platform: win32 10
11:54:01:WU02:FS00:0xa7:       Bits: 64
11:54:01:WU02:FS00:0xa7:       Mode: Release
11:54:01:WU02:FS00:0xa7:************************************ System ************************************
11:54:01:WU02:FS00:0xa7:        CPU: AMD Ryzen 7 2700X Eight-Core Processor
11:54:01:WU02:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
11:54:01:WU02:FS00:0xa7:       CPUs: 16
11:54:01:WU02:FS00:0xa7:     Memory: 31.95GiB
11:54:01:WU02:FS00:0xa7:Free Memory: 24.06GiB
11:54:01:WU02:FS00:0xa7:    Threads: WINDOWS_THREADS
11:54:01:WU02:FS00:0xa7: OS Version: 6.2
11:54:01:WU02:FS00:0xa7:Has Battery: false
11:54:01:WU02:FS00:0xa7: On Battery: false
11:54:01:WU02:FS00:0xa7: UTC Offset: -4
11:54:01:WU02:FS00:0xa7:        PID: 9856
11:54:01:WU02:FS00:0xa7:        CWD: C:\Users\crawd\AppData\Roaming\FAHClient\work
11:54:01:WU02:FS00:0xa7:******************************** Build - libFAH ********************************
11:54:01:WU02:FS00:0xa7:    Version: 0.0.18
11:54:01:WU02:FS00:0xa7:     Author: Joseph Coffland <[email protected]>
11:54:01:WU02:FS00:0xa7:  Copyright: 2019 foldingathome.org
11:54:01:WU02:FS00:0xa7:   Homepage: https://foldingathome.org/
11:54:01:WU02:FS00:0xa7:       Date: Oct 26 2019
11:54:01:WU02:FS00:0xa7:       Time: 01:52:30
11:54:01:WU02:FS00:0xa7:   Revision: c1e3513b1bc0c16013668f2173ee969e5995b38e
11:54:01:WU02:FS00:0xa7:     Branch: master
11:54:01:WU02:FS00:0xa7:   Compiler: Visual C++ 2008
11:54:01:WU02:FS00:0xa7:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
11:54:01:WU02:FS00:0xa7:   Platform: win32 10
11:54:01:WU02:FS00:0xa7:       Bits: 64
11:54:01:WU02:FS00:0xa7:       Mode: Release
11:54:01:WU02:FS00:0xa7:************************************ Build *************************************
11:54:01:WU02:FS00:0xa7:       SIMD: avx_256
11:54:01:WU02:FS00:0xa7:********************************************************************************
11:54:01:WU02:FS00:0xa7:Project: 14235 (Run 587, Clone 1, Gen 17)
11:54:01:WU02:FS00:0xa7:Unit: 0x00000013cedfaa925ea375e71d1602bc
11:54:01:WU02:FS00:0xa7:Digital signatures verified
11:54:01:WU02:FS00:0xa7:Reducing thread count from 14 to 13 to avoid domain decomposition with large prime factor 7
11:54:01:WU02:FS00:0xa7:Reducing thread count from 13 to 12 to avoid domain decomposition by a prime number > 3
11:54:01:WU02:FS00:0xa7:Calling: mdrun -s frame17.tpr -o frame17.trr -x frame17.xtc -cpi state.cpt -cpt 15 -nt 12
11:54:01:WU02:FS00:0xa7:Steps: first=4250000 total=250000
11:54:03:WU02:FS00:0xa7:Completed 171992 out of 250000 steps (68%)
11:54:15:WU03:FS01:0x22:Completed 1730000 out of 5000000 steps (34%)
11:54:15:WU03:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
11:54:17:ERROR:Receive error: 10053: An established connection was aborted by the software in your host machine.
11:54:39:WU02:FS00:0xa7:Completed 172500 out of 250000 steps (69%)
11:55:22:WU03:FS01:0x22:Completed 1750000 out of 5000000 steps (35%)
11:57:34:WU02:FS00:0xa7:Completed 175000 out of 250000 steps (70%)
11:58:16:WU03:FS01:0x22:Completed 1800000 out of 5000000 steps (36%)
12:00:28:WU02:FS00:0xa7:Completed 177500 out of 250000 steps (71%)
12:01:09:WU03:FS01:0x22:Completed 1850000 out of 5000000 steps (37%)
12:03:21:WU02:FS00:0xa7:Completed 180000 out of 250000 steps (72%)
12:04:02:WU03:FS01:0x22:Completed 1900000 out of 5000000 steps (38%)
12:06:15:WU02:FS00:0xa7:Completed 182500 out of 250000 steps (73%)
12:06:54:WU03:FS01:0x22:Completed 1950000 out of 5000000 steps (39%)
12:09:11:WU02:FS00:0xa7:Completed 185000 out of 250000 steps (74%)
12:09:47:WU03:FS01:0x22:Completed 2000000 out of 5000000 steps (40%)
12:12:05:WU02:FS00:0xa7:Completed 187500 out of 250000 steps (75%)
12:12:40:WU03:FS01:0x22:Completed 2050000 out of 5000000 steps (41%)
12:14:57:WU02:FS00:0xa7:Completed 190000 out of 250000 steps (76%)

The WU recovered a second time, but I'm switching to Light for now; I need to use my PC.

=========================================================EDIT BELOW=================

Started the core. Here are the numbers that my GPU is posting while crunching; it's very reasonable to assume similar numbers for the recent (2nd) crash of this WU.

GPU UTIL	GPU SCLK	GPU MCLK	GPU TEMP	GPU PWR	GPU FAN	GPU VRAM UTIL	CPU UTIL	RAM UTIL
65	1472	945	72	139	2316	621	80.06	8.94
69	1490	945	72	134	2403	621	80.5	8.94
72	1478	945	72	134	2397	621	79.84	8.94
74	1487	945	72	133	2402	621	78.73	8.93
78	1472	945	72	141	2401	621	78.91	8.93
37	1449	945	72	120	2398	621	78.89	8.93
66	1479	945	73	134	2365	621	78.89	8.93
67	1464	945	73	130	2366	621	79.55	8.93
62	1487	945	73	138	2365	621	79.8	8.92
73	1466	945	73	135	2366	621	78.51	8.92
68	1487	945	73	139	2365	621	79.25	8.92
46	1473	945	73	141	2365	621	78.67	8.92
68	1488	945	73	141	2364	621	80.28	8.93
0	118	945	70	13	2367	621	82.68	8.96
73	1081	945	70	132	2311	621	80.92	8.95
67	1471	945	71	135	2237	621	80.01	8.94
65	1480	945	72	136	2218	621	79.37	8.94
71	1486	945	72	143	2300	621	78.53	8.93
64	1475	945	72	138	2391	621	79.06	8.93
66	1466	945	72	131	2406	621	80.35	8.93
70	1478	945	72	137	2405	621	79.98	8.93
87	1462	945	72	137	2388	621	79.44	8.93
75	1476	945	72	139	2367	621	78.29	8.93
61	1461	945	73	131	2365	621	79	8.93
69	1482	945	73	143	2365	621	79.7	8.93
70	1475	945	73	134	2366	621	78.85	8.93
86	1486	945	73	145	2366	621	79.98	8.93
82	1471	945	73	135	2365	621	79.16	8.93
87	1483	945	73	146	2366	621	78.63	8.93
80	1476	945	73	132	2366	621	79.5	8.93
0	150	945	70	13	2368	621	83.15	8.94
72	898	945	71	139	2326	621	81.55	8.95
70	1474	945	71	137	2241	621	78.08	8.94
38	1469	945	72	127	2222	621	79.88	8.94
85	1473	945	72	137	2308	621	80	8.94
85	1458	945	72	130	2391	621	78.77	8.94
80	1481	945	72	134	2406	621	78.72	8.94
85	1448	945	72	134	2396	621	79.8	8.93
83	1483	945	72	143	2403	621	78.92	8.93
80	1457	945	72	130	2368	622	82.53	8.93
67	1478	945	73	137	2375	623	80.93	8.93
64	1472	945	73	137	2368	623	79.92	8.93
72	1486	945	73	138	2367	623	79.65	8.93
73	1465	945	74	137	2366	623	80.58	8.93
86	1475	945	73	142	2367	623	79.37	8.93
79	1473	945	73	138	2366	623	80.14	8.93
81	1482	945	74	139	2366	623	79.37	8.93
0	278	945	71	14	2368	623	82.23	8.94
69	287	945	70	83	2335	623	82.97	8.94
73	1450	945	71	135	2265	623	80.28	8.94
88	1483	945	72	146	2219	623	80.22	8.94
83	1479	945	72	142	2297	623	79.8	8.93
61	1481	945	72	137	2372	623	80.88	8.93
83	1489	945	73	142	2402	625	81.22	8.91
81	1455	945	72	135	2388	573	86.81	8.96
74	1461	945	72	124	2366	628	92.16	9.06
76	1470	945	73	134	2366	495	90.36	9.16
74	1465	945	72	130	2368	498	91.51	9.2
62	1449	945	72	137	2366	487	90.86	9.24
82	1475	945	72	145	2365	490	83.74	9.25
71	1454	945	72	126	2367	512	89	9.25
73	1457	945	72	130	2366	527	89.5	9.24
86	1490	945	72	145	2367	527	80.68	9.24

Summary: GPU utilization bounces between 70 and 90%, the core clock between 1450 and 1500 MHz, and the temperature between 71 and 73C.

Historically it has run at 83C during stress tests and remained stable.
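The summary above was read off the table by eye. As a sketch, assuming the table is saved as the tab-separated text shown (a few of its rows are copied in below), a short Python script can compute the same ranges while skipping the brief idle dips where GPU UTIL drops to 0 and the clock falls to a few hundred MHz. The column names come from the table header; the `active_only` filter is my own assumption about what counts as a folding sample.

```python
import csv
import io
from statistics import mean

# A few rows copied verbatim from the monitoring table above (tab-separated).
SAMPLE = "\n".join([
    "GPU UTIL\tGPU SCLK\tGPU MCLK\tGPU TEMP\tGPU PWR\tGPU FAN\tGPU VRAM UTIL\tCPU UTIL\tRAM UTIL",
    "65\t1472\t945\t72\t139\t2316\t621\t80.06\t8.94",
    "69\t1490\t945\t72\t134\t2403\t621\t80.5\t8.94",
    "37\t1449\t945\t72\t120\t2398\t621\t78.89\t8.93",
    "0\t118\t945\t70\t13\t2367\t621\t82.68\t8.96",
    "86\t1490\t945\t72\t145\t2367\t527\t80.68\t9.24",
])


def parse_samples(text):
    """Parse tab-separated monitoring output into a list of {column: float} dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [{name: float(value) for name, value in row.items()} for row in reader]


def column_stats(samples, column, active_only=True):
    """Return (min, max, mean) for one column.

    With active_only=True, samples where GPU UTIL is 0 (the short idle dips
    visible in the table) are skipped. This filter is an assumption.
    """
    values = [s[column] for s in samples if not active_only or s["GPU UTIL"] > 0]
    return min(values), max(values), mean(values)


if __name__ == "__main__":
    samples = parse_samples(SAMPLE)
    for col in ("GPU UTIL", "GPU SCLK", "GPU TEMP"):
        low, high, avg = column_stats(samples, col)
        print(f"{col}: {low:g} - {high:g} (mean {avg:.1f})")
```

Pasting the full table into `SAMPLE` (or reading it from a file) works the same way; with `active_only=False` the idle dips pull the clock minimum down to around 100 MHz, which is why they are filtered out by default.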
muziqaz
Posts: 1099
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London

Re: Project 16435 and RX Vega 56/64

Post by muziqaz »

I see your system keeps crashing. You might need to run a couple of memtest86 sessions.
FAH Omega tester
BobWilliams757
Posts: 534
Joined: Fri Apr 03, 2020 2:22 pm
Hardware configuration: ASRock X370M PRO4
Ryzen 2400G APU
16 GB DDR4-3200
MSI GTX 1660 Super Gaming X

Re: Project 16435 and RX Vega 56/64

Post by BobWilliams757 »

Had another RCG of this work unit complete earlier today without errors. Once again FAHControl showed about half the PPD that was actually awarded, but the PPD actually awarded was in line with what I expected.


Just a thought beyond the system checks, which I also agree with: recently a member posted about having many issues with some sort of NUC he had. The most recent drivers didn't work, and reverting to an older driver got the system up and folding. You could have a similar situation, IMO.

As for system and memory testing, I've found that the more brutal the better. I often run several benchmarks at one time to really generate the heat and dig for any potential issues.
Fold them if you get them!
Crawdaddy79
Posts: 73
Joined: Sat Mar 21, 2020 3:56 pm

Re: Project 16435 and RX Vega 56/64

Post by Crawdaddy79 »

My system doesn't crash unless it's folding a 16435 GPU work unit. That includes games, video editing, benchmarking, and stress tests. I did attempt the memtest, but after nearly two hours it was 91% through its first pass (zero errors) and I needed to use my PC, so I cancelled it. 32 GB of RAM does have at least that downside. Also, folding isn't very RAM-intensive compared to 90% of what I successfully use this PC for, so I don't really understand the suggestion, but when I have time I will finish testing my memory just to be thorough.

The calculated and credited PPD for this project is roughly in line with everything. There are anomalies where I get mega-high PPD calculations (and very low), but this project isn't one of them. 1.0M - 1.2M is what I typically see; it drops a bit when the CPU is running a WU. Other projects push my GPU much harder; this one rarely goes above 90% GPU utilization, many times staying below 78%.

The above WU crashed a third time earlier today and did not recover. All three crashes happened with clock speeds 45 MHz below spec and GPU core temperature at 73C or less.

I put the settings back to default and it downloaded another 16435 WU, but this time it actually finished it without a crash. Since then it has crunched a 14564 and an 11751 without issue, measuring 81 - 83C with peak clocks 50 MHz above spec to boot (I'm not telling it to overclock, I swear).

I'm not blaming the project necessarily - I get it that other PC configurations are finishing it just fine. The point of this thread was to show a pattern and point out a potential incompatibility between it and the Vega 64 (and by proxy, Vega 56). I didn't think it would turn into this.
foldy
Posts: 2040
Joined: Sat Dec 01, 2012 3:43 pm
Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441

Re: Project 16435 and RX Vega 56/64

Post by foldy »

BSOD THREAD_STUCK_IN_DEVICE_DRIVER
BlueScreenView may show which driver caused the problem:
https://www.nirsoft.net/utils/blue_screen_view.html

Have you looked for a BIOS update?

Or run the Windows repair tools from an elevated command prompt:

sfc /scannow
DISM /Online /Cleanup-Image /RestoreHealth

If other users also hit this issue, then it would make sense to disable project 16435 for RX Vega 56/64.
muziqaz
Posts: 1099
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London

Re: Project 16435 and RX Vega 56/64

Post by muziqaz »

The project ran fine on a Vega 64, which is no different from the 56 apart from a few more shaders.
@Crawdaddy79, it is possible that this project's WUs are reaching areas of your system that are not stable.
FAH Omega tester
Crawdaddy79
Posts: 73
Joined: Sat Mar 21, 2020 3:56 pm

Re: Project 16435 and RX Vega 56/64

Post by Crawdaddy79 »

Is there data available that shows success rates of various system configurations per project? I would be very interested to see that.

The memory test ran overnight (the results were posted as screenshots).
muziqaz
Posts: 1099
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London

Re: Project 16435 and RX Vega 56/64

Post by muziqaz »

The failure rate for this project is 0.74%, which is considered normal. There is no easy way to break that rate down by configuration, I'm afraid.
What is Event Viewer telling you? Have you tried foldy's suggestions?
Have you tried downclocking the HBM memory?
FAH Omega tester
Crawdaddy79
Posts: 73
Joined: Sat Mar 21, 2020 3:56 pm

Re: Project 16435 and RX Vega 56/64

Post by Crawdaddy79 »

I had not adjusted the HBM memory clock after starting the spreadsheet. I knocked it down from 945 to 930 and, yods be praised, my system crunched through two 16435 WUs in a row. But then it crashed on an 11741 WU, a project that had been 100% reliable until then.

I turned my case fans to max and installed Afterburner to monitor HBM temps and I learned that Afterburner sucks for monitoring. It would report blips of HBM temps of 0C and 3600C, ruining the low/high value holds. Assuming that those readings are glitches, HBM temps never got above 86C and GPU 79C.

After uninstalling Afterburner and grabbing GPU-Z, I adjusted it from 930 to 925, then set the clock to -3% and power limit to -5%. It crashed one more time on a non-16435 WU, but recovered on boot-up.

I turned my case fans back down, re-applied those settings, and it's been folding strong for nearly 24 hours (with a break in the evening). It got two 16435 WUs: one it sent back as faulty at 7% completion, the other it sent back with NO_ERROR. With the case fans turned down, add four degrees to both the GPU and HBM temps (83C and 90C).

@foldy - those are all new things to me. I did run sfc /scannow and it reported that it found corrupt files and repaired them. This didn't solve my crashing issue, but I was surprised to see that there was an issue at all. As I get more time to tinker, I'll check out the other things. Thank you.

With this workaround, I do think I'm otherwise done with this thread and my tracking sheet. I am perplexed as to why this one project gave me so many issues with default settings.

Thanks everyone for your time.
BobWilliams757
Posts: 534
Joined: Fri Apr 03, 2020 2:22 pm
Hardware configuration: ASRock X370M PRO4
Ryzen 2400G APU
16 GB DDR4-3200
MSI GTX 1660 Super Gaming X

Re: Project 16435 and RX Vega 56/64

Post by BobWilliams757 »

Just a follow up, since this WU has impacted at least one other person here.

I've now done six 16435 WUs without error. But the PPD jumps all over, and it does run slowly on my onboard graphics. I'm narrowly missing the timeout, and it's the only WU to do this; usually, even on this slower hardware, a WU is done in less than half the timeout. If nothing else, it's a good WU to really test a system. I hope you nail down a solid fix, Crawdaddy79. I'm sure that sometime in the future there will be others that give our specific cards a workout.

And I figure if I'm picking up more of them, they still need someone to do the work so they get the data.
Fold them if you get them!
Crawdaddy79
Posts: 73
Joined: Sat Mar 21, 2020 3:56 pm

Re: Project 16435 and RX Vega 56/64

Post by Crawdaddy79 »

I think the issue with the PPD jumping all over the place for you is the 0.2% checkpointing frequency for this project. If your hard drive is slow or busy at the time, it could cause long pauses in the process and mess up the calculation.
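To put the 0.2% figure in perspective, here is a rough sketch (taking that 0.2% checkpoint frequency at face value, and assuming Core22 progress lines look like the ones in the log posted earlier in this thread) that estimates seconds per 1% frame from the log timestamps and derives how often a checkpoint would be written. Using the 0x22 lines from 11:58:16 to 12:12:40, 5% of the 5,000,000-step WU took 864 s, i.e. about 173 s per frame, which would mean a checkpoint roughly every 35 s, or 500 per WU.

```python
import re
from datetime import datetime

# Matches Core22 progress lines such as
# "11:58:16:WU03:FS01:0x22:Completed 1800000 out of 5000000 steps (36%)".
CORE22_PROGRESS = re.compile(
    r"^(\d{2}:\d{2}:\d{2}):WU\d+:FS\d+:0x22:Completed (\d+) out of (\d+) steps"
)

# Excerpt copied from the log earlier in the thread.
LOG_EXCERPT = [
    "11:58:16:WU03:FS01:0x22:Completed 1800000 out of 5000000 steps (36%)",
    "12:00:28:WU02:FS00:0xa7:Completed 177500 out of 250000 steps (71%)",  # CPU slot, ignored
    "12:01:09:WU03:FS01:0x22:Completed 1850000 out of 5000000 steps (37%)",
    "12:04:02:WU03:FS01:0x22:Completed 1900000 out of 5000000 steps (38%)",
    "12:06:54:WU03:FS01:0x22:Completed 1950000 out of 5000000 steps (39%)",
    "12:09:47:WU03:FS01:0x22:Completed 2000000 out of 5000000 steps (40%)",
    "12:12:40:WU03:FS01:0x22:Completed 2050000 out of 5000000 steps (41%)",
]


def seconds_per_percent(lines):
    """Average wall-clock seconds per 1% of the WU, first to last Core22 progress line.

    Assumes all lines fall on the same day (no midnight wrap-around).
    """
    points = []
    for line in lines:
        m = CORE22_PROGRESS.match(line)
        if m:
            t = datetime.strptime(m.group(1), "%H:%M:%S")
            points.append((t, int(m.group(2)), int(m.group(3))))
    if len(points) < 2:
        raise ValueError("need at least two Core22 progress lines")
    (t0, steps0, total), (t1, steps1, _) = points[0], points[-1]
    elapsed = (t1 - t0).total_seconds()
    percent_done = (steps1 - steps0) * 100 / total
    return elapsed / percent_done


def checkpoint_interval(sec_per_percent, checkpoint_percent=0.2):
    """Seconds between checkpoints if one is written every `checkpoint_percent` of the WU."""
    return sec_per_percent * checkpoint_percent


def checkpoints_per_wu(checkpoint_percent=0.2):
    """How many checkpoints a full WU would write at that frequency."""
    return round(100 / checkpoint_percent)


if __name__ == "__main__":
    spp = seconds_per_percent(LOG_EXCERPT)
    print(f"~{spp:.1f} s per 1% frame, checkpoint every ~{checkpoint_interval(spp):.1f} s, "
          f"{checkpoints_per_wu()} checkpoints per WU")
```

On slower hardware the interval stretches proportionally, but the number of disk writes per WU stays the same, so a slow or busy drive gets hit hundreds of times per WU either way.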

I think my issue has to do with how the GPU hotspot is not evaluated in the board's auto-underclocking algorithm. I only learned about the hotspot sensor yesterday morning (it routinely measures 105C). Turning my airflow down, downclocking the memory, and lowering the power limit seems to do the trick. I had one BSOD in 48 hours, and even then both slots' WUs recovered and continued processing.
muziqaz
Posts: 1099
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London

Re: Project 16435 and RX Vega 56/64

Post by muziqaz »

The jumping PPD is due to the WU demanding more than one CPU core. Pause all the CPU slots on that computer and your 16435 TPF will decrease. This has been observed with top-of-the-line CPUs on Windows, and is mainly due to driver overhead on AMD GPUs.
Frequent checkpoints don't help either. Even a basic SSD seems to help a lot compared to an old HDD.
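To illustrate why a modest change in TPF moves the PPD estimate so much, here's a sketch of the quick-return-bonus estimate as it is commonly described in the Folding@home points FAQ (credit = base × max(1, sqrt(k × deadline / elapsed))). The base points, k factor and deadline below are made-up placeholders, not the real values for project 16435, and it assumes 100 frames per WU as in the logs above. With the bonus active, PPD scales roughly as TPF^-1.5, so halving the TPF gives about 2.8× the PPD.

```python
import math

SECONDS_PER_DAY = 86400


def estimated_ppd(tpf_seconds, base_points, k_factor, deadline_days, frames_per_wu=100):
    """Rough PPD estimate with a quick-return bonus.

    credit = base_points * max(1, sqrt(k_factor * deadline_days / elapsed_days)),
    where elapsed_days is the time to finish all frames at the given TPF.
    """
    elapsed_days = frames_per_wu * tpf_seconds / SECONDS_PER_DAY
    bonus = max(1.0, math.sqrt(k_factor * deadline_days / elapsed_days))
    credit = base_points * bonus
    return credit / elapsed_days


# Hypothetical project parameters, NOT the real values for project 16435.
BASE_POINTS = 10_000
K_FACTOR = 0.75
DEADLINE_DAYS = 2.0

if __name__ == "__main__":
    for tpf in (60, 90, 120):
        ppd = estimated_ppd(tpf, BASE_POINTS, K_FACTOR, DEADLINE_DAYS)
        print(f"TPF {tpf:>3} s -> ~{ppd:,.0f} PPD")
```

Without the bonus (k = 0) PPD would only be inversely proportional to TPF; the square-root bonus term is what makes short stalls from CPU contention or checkpoint writes show up as large swings in the estimate.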
FAH Omega tester
Crawdaddy79
Posts: 73
Joined: Sat Mar 21, 2020 3:56 pm

Re: Project 16435 and RX Vega 56/64

Post by Crawdaddy79 »

muziqaz wrote: Failure rate for this project is 0.74%, which is considered normal.
Apologies for doubting this number, but I recently found apps.foldingathome.org/wu and have been going through my recent failed returns of this project.

Check these out:
https://apps.foldingathome.org/wu#proje ... ne=2&gen=3
https://apps.foldingathome.org/wu#proje ... e=1&gen=12
https://apps.foldingathome.org/wu#proje ... ne=2&gen=6
https://apps.foldingathome.org/wu#proje ... ne=2&gen=9
https://apps.foldingathome.org/wu#proje ... e=4&gen=17
https://apps.foldingathome.org/wu#proje ... e=0&gen=23
https://apps.foldingathome.org/wu#proje ... e=0&gen=32

Does 1 OK return and 4 failed returns of the same WU equal a success rate of 100%? Almost every WU I look at from this project has at least one failure, even ones that I return as OK.
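The two ways of counting give very different numbers for the same data. Here's a sketch comparing a per-return failure rate with a per-WU one; the `Return` record and the WU labels are hypothetical, and how the 0.74% figure is actually computed on the server side isn't stated in this thread. For the 1 OK + 4 failed example, the per-return rate is 80% while the per-WU rate is 0%.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Return:
    wu: str   # hypothetical WU label, e.g. "run/clone/gen"
    ok: bool  # True if the WU was returned with NO_ERROR


def per_return_failure_rate(returns):
    """Fraction of individual returns that failed."""
    return sum(1 for r in returns if not r.ok) / len(returns)


def per_wu_failure_rate(returns):
    """Fraction of distinct WUs that never got a successful return."""
    completed = {}
    for r in returns:
        completed[r.wu] = completed.get(r.wu, False) or r.ok
    return sum(1 for done in completed.values() if not done) / len(completed)


if __name__ == "__main__":
    # One WU that failed four times before someone returned it OK.
    history = [Return("wu-A", False)] * 4 + [Return("wu-A", True)]
    print(f"per return: {per_return_failure_rate(history):.0%}")
    print(f"per WU:     {per_wu_failure_rate(history):.0%}")
```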
muziqaz
Posts: 1099
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London

Re: Project 16435 and RX Vega 56/64

Post by muziqaz »

I believe the failure rate counts every returned WU, even if it was eventually finished successfully. I might be wrong, though.
FAH Omega tester