issue with GPUs on multiple PCI buses

Posted: Sun Dec 19, 2021 5:30 pm
by atlr
On Windows F@H 7.6.21, Core22 seems to launch with parameters that make it run only on a GPU on the internal PCIe bus.

The host is a Windows 10 PC with an Nvidia GPU in an internal PCIe slot and discrete AMD GPUs connected via Thunderbolt interfaces.

The PCIe buses that represent the Thunderbolt interfaces show as detected in the log, but the parameters generated for Core22 don't seem to use this information.

Here is the log for the case where the only running job is on the AMD card in Folding Slot 2 (PCIe bus 124, device 0), but Core22 is actually using the Nvidia card, which is defined as Folding Slot 0 (bus 225, device 0). I confirmed with HWiNFO that the Nvidia GPU is doing the work.

Code:

*********************** Log Started 2021-12-19T17:00:26Z ***********************
17:00:26:******************************* libFAH ********************************
17:00:26:           Date: Oct 20 2020
17:00:26:           Time: 13:36:55
17:00:26:       Revision: 5ca109d295a6245e2a2f590b3d0085ad5e567aeb
17:00:26:         Branch: master
17:00:26:       Compiler: Visual C++ 2015
17:00:26:        Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
17:00:26:       Platform: win32 10
17:00:26:           Bits: 32
17:00:26:           Mode: Release
17:00:26:****************************** FAHClient ******************************
17:00:26:        Version: 7.6.21
17:00:26:         Author: Joseph Coffland <[email protected]>
17:00:26:      Copyright: 2020 foldingathome.org
17:00:26:       Homepage: https://foldingathome.org/
17:00:26:           Date: Oct 20 2020
17:00:26:           Time: 13:41:04
17:00:26:       Revision: 6efbf0e138e22d3963e6a291f78dcb9c6422a278
17:00:26:         Branch: master
17:00:26:       Compiler: Visual C++ 2015
17:00:26:        Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
17:00:26:       Platform: win32 10
17:00:26:           Bits: 32
17:00:26:           Mode: Release
17:00:26:           Args: --open-web-control
17:00:26:         Config: C:\ProgramData\FAHClient\config.xml
17:00:26:******************************** CBang ********************************
17:00:26:           Date: Oct 20 2020
17:00:26:           Time: 11:36:18
17:00:26:       Revision: 7e4ce85225d7eaeb775e87c31740181ca603de60
17:00:26:         Branch: master
17:00:26:       Compiler: Visual C++ 2015
17:00:26:        Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
17:00:26:       Platform: win32 10
17:00:26:           Bits: 32
17:00:26:           Mode: Release
17:00:26:******************************* System ********************************
17:00:26:            CPU: AMD Ryzen Threadripper PRO 3945WX 12-Cores
17:00:26:         CPU ID: AuthenticAMD Family 23 Model 49 Stepping 0
17:00:26:           CPUs: 24
17:00:26:         Memory: 31.84GiB
17:00:26:    Free Memory: 26.63GiB
17:00:26:        Threads: WINDOWS_THREADS
17:00:26:     OS Version: 6.2
17:00:26:    Has Battery: false
17:00:26:     On Battery: false
17:00:26:     UTC Offset: -5
17:00:26:            PID: 10640
17:00:26:            CWD: C:\ProgramData\FAHClient
17:00:26:  Win32 Service: false
17:00:26:             OS: Windows 10 Enterprise
17:00:26:        OS Arch: AMD64
17:00:26:           GPUs: 3
17:00:26:          GPU 0: Bus:72 Slot:0 Func:0 AMD:5 Vega 20 [Radeon VII] 13,284
17:00:26:          GPU 1: Bus:225 Slot:0 Func:0 NVIDIA:8 GA102 [GeForce RTX 3090]
17:00:26:          GPU 2: Bus:124 Slot:0 Func:0 AMD:5 Vega 20 [Radeon VII] 13,284
17:00:26:  CUDA Device 0: Platform:0 Device:0 Bus:225 Slot:0 Compute:8.6 Driver:11.5
17:00:26:OpenCL Device 0: Platform:0 Device:0 Bus:225 Slot:0 Compute:3.0 Driver:497.9
17:00:26:OpenCL Device 1: Platform:1 Device:0 Bus:124 Slot:0 Compute:1.2 Driver:3354.13
17:00:26:OpenCL Device 2: Platform:1 Device:1 Bus:72 Slot:0 Compute:1.2 Driver:3354.13
17:00:26:***********************************************************************
17:00:26:<config>
17:00:26:  <!-- Folding Slot Configuration -->
17:00:26:  <cause v='COVID_19'/>
17:00:26:
17:00:26:  <!-- Network -->
17:00:26:  <proxy v=':8080'/>
17:00:26:
17:00:26:  <!-- User Information -->
17:00:26:  <passkey v='*****'/>
17:00:26:  <team v='234771'/>
17:00:26:  <user v='atlr'/>
17:00:26:
17:00:26:  <!-- Folding Slots -->
17:00:26:  <slot id='1' type='GPU'>
17:00:26:    <paused v='true'/>
17:00:26:    <pci-bus v='72'/>
17:00:26:    <pci-slot v='0'/>
17:00:26:  </slot>
17:00:26:  <slot id='0' type='GPU'>
17:00:26:    <paused v='true'/>
17:00:26:    <pci-bus v='225'/>
17:00:26:    <pci-slot v='0'/>
17:00:26:  </slot>
17:00:26:  <slot id='2' type='GPU'>
17:00:26:    <paused v='true'/>
17:00:26:    <pci-bus v='124'/>
17:00:26:    <pci-slot v='0'/>
17:00:26:  </slot>
17:00:26:</config>
17:00:26:Trying to access database...
17:00:26:Successfully acquired database lock
17:00:26:FS01:Initialized folding slot 01: gpu:72:0 Vega 20 [Radeon VII] 13,284 
17:00:26:FS00:Initialized folding slot 00: gpu:225:0 GA102 [GeForce RTX 3090]
17:00:26:FS02:Initialized folding slot 02: gpu:124:0 Vega 20 [Radeon VII] 13,284 
17:01:27:Removing old file 'configs/config-20211219-162242.xml'
17:01:27:Saving configuration to config.xml
17:01:27:<config>
17:01:27:  <!-- Folding Slot Configuration -->
17:01:27:  <cause v='COVID_19'/>
17:01:27:
17:01:27:  <!-- Network -->
17:01:27:  <proxy v=':8080'/>
17:01:27:
17:01:27:  <!-- Slot Control -->
17:01:27:  <power v='FULL'/>
17:01:27:
17:01:27:  <!-- User Information -->
17:01:27:  <passkey v='*****'/>
17:01:27:  <team v='234771'/>
17:01:27:  <user v='atlr'/>
17:01:27:
17:01:27:  <!-- Folding Slots -->
17:01:27:  <slot id='1' type='GPU'>
17:01:27:    <paused v='true'/>
17:01:27:    <pci-bus v='72'/>
17:01:27:    <pci-slot v='0'/>
17:01:27:  </slot>
17:01:27:  <slot id='0' type='GPU'>
17:01:27:    <paused v='true'/>
17:01:27:    <pci-bus v='225'/>
17:01:27:    <pci-slot v='0'/>
17:01:27:  </slot>
17:01:27:  <slot id='2' type='GPU'>
17:01:27:    <paused v='true'/>
17:01:27:    <pci-bus v='124'/>
17:01:27:    <pci-slot v='0'/>
17:01:27:  </slot>
17:01:27:</config>
17:01:38:FS02:Unpaused
17:01:38:WU03:FS02:Starting
17:01:38:WU03:FS02:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\ProgramData\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.18/Core_22.fah/FahCore_22.exe -dir 03 -suffix 01 -version 706 -lifeline 10640 -checkpoint 15 -opencl-platform 1 -opencl-device 0 -gpu-vendor amd -gpu 0 -gpu-usage 100
17:01:38:WU03:FS02:Started FahCore on PID 5704
17:01:38:WU03:FS02:Core PID:12340
17:01:38:WU03:FS02:FahCore 0x22 started
17:01:39:WU03:FS02:0x22:*********************** Log Started 2021-12-19T17:01:38Z ***********************
17:01:39:WU03:FS02:0x22:*************************** Core22 Folding@home Core ***************************
17:01:39:WU03:FS02:0x22:       Core: Core22
17:01:39:WU03:FS02:0x22:       Type: 0x22
17:01:39:WU03:FS02:0x22:    Version: 0.0.18
17:01:39:WU03:FS02:0x22:     Author: Joseph Coffland <[email protected]>
17:01:39:WU03:FS02:0x22:  Copyright: 2020 foldingathome.org
17:01:39:WU03:FS02:0x22:   Homepage: https://foldingathome.org/
17:01:39:WU03:FS02:0x22:       Date: Sep 28 2021
17:01:39:WU03:FS02:0x22:       Time: 05:55:05
17:01:39:WU03:FS02:0x22:   Revision: cfe3d7d990e8f456e371f8ce63b5fcc6daab2103
17:01:39:WU03:FS02:0x22:     Branch: HEAD
17:01:39:WU03:FS02:0x22:   Compiler: Visual C++
17:01:39:WU03:FS02:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
17:01:39:WU03:FS02:0x22:             -DOPENMM_VERSION="\"7.6.0\""
17:01:39:WU03:FS02:0x22:   Platform: win32 10
17:01:39:WU03:FS02:0x22:       Bits: 64
17:01:39:WU03:FS02:0x22:       Mode: Release
17:01:39:WU03:FS02:0x22:Maintainers: John Chodera <[email protected]> and Peter Eastman
17:01:39:WU03:FS02:0x22:             <[email protected]>
17:01:39:WU03:FS02:0x22:       Args: -dir 03 -suffix 01 -version 706 -lifeline 5704 -checkpoint 15
17:01:39:WU03:FS02:0x22:             -opencl-platform 1 -opencl-device 0 -gpu-vendor amd -gpu 0
17:01:39:WU03:FS02:0x22:             -gpu-usage 100
17:01:39:WU03:FS02:0x22:************************************ libFAH ************************************
17:01:39:WU03:FS02:0x22:       Date: Sep 28 2021
17:01:39:WU03:FS02:0x22:       Time: 05:53:43
17:01:39:WU03:FS02:0x22:   Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
17:01:39:WU03:FS02:0x22:     Branch: HEAD
17:01:39:WU03:FS02:0x22:   Compiler: Visual C++
17:01:39:WU03:FS02:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
17:01:39:WU03:FS02:0x22:   Platform: win32 10
17:01:39:WU03:FS02:0x22:       Bits: 64
17:01:39:WU03:FS02:0x22:       Mode: Release
17:01:39:WU03:FS02:0x22:************************************ CBang *************************************
17:01:39:WU03:FS02:0x22:       Date: Sep 28 2021
17:01:39:WU03:FS02:0x22:       Time: 05:52:38
17:01:39:WU03:FS02:0x22:   Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
17:01:39:WU03:FS02:0x22:     Branch: HEAD
17:01:39:WU03:FS02:0x22:   Compiler: Visual C++
17:01:39:WU03:FS02:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
17:01:39:WU03:FS02:0x22:   Platform: win32 10
17:01:39:WU03:FS02:0x22:       Bits: 64
17:01:39:WU03:FS02:0x22:       Mode: Release
17:01:39:WU03:FS02:0x22:************************************ System ************************************
17:01:39:WU03:FS02:0x22:        CPU: AMD Ryzen Threadripper PRO 3945WX 12-Cores
17:01:39:WU03:FS02:0x22:     CPU ID: AuthenticAMD Family 23 Model 49 Stepping 0
17:01:39:WU03:FS02:0x22:       CPUs: 24
17:01:39:WU03:FS02:0x22:     Memory: 31.84GiB
17:01:39:WU03:FS02:0x22:Free Memory: 26.56GiB
17:01:39:WU03:FS02:0x22:    Threads: WINDOWS_THREADS
17:01:39:WU03:FS02:0x22: OS Version: 6.2
17:01:39:WU03:FS02:0x22:Has Battery: false
17:01:39:WU03:FS02:0x22: On Battery: false
17:01:39:WU03:FS02:0x22: UTC Offset: -5
17:01:39:WU03:FS02:0x22:        PID: 12340
17:01:39:WU03:FS02:0x22:        CWD: C:\ProgramData\FAHClient\work
17:01:39:WU03:FS02:0x22:************************************ OpenMM ************************************
17:01:39:WU03:FS02:0x22:    Version: 7.6.0
17:01:39:WU03:FS02:0x22:********************************************************************************
17:01:39:WU03:FS02:0x22:Project: 18201 (Run 2755, Clone 0, Gen 31)
17:01:39:WU03:FS02:0x22:Unit: 0x00000000000000000000000000000000
17:01:39:WU03:FS02:0x22:Digital signatures verified
17:01:39:WU03:FS02:0x22:Folding@home GPU Core22 Folding@home Core
17:01:39:WU03:FS02:0x22:Version 0.0.18
17:01:39:WU03:FS02:0x22:  Checkpoint write interval: 25000 steps (2%) [50 total]
17:01:39:WU03:FS02:0x22:  JSON viewer frame write interval: 12500 steps (1%) [100 total]
17:01:39:WU03:FS02:0x22:  XTC frame write interval: 20000 steps (1.6%) [62 total]
17:01:39:WU03:FS02:0x22:  Global context and integrator variables write interval: disabled
17:01:39:WU03:FS02:0x22:There are 3 platforms available.
17:01:39:WU03:FS02:0x22:Platform 0: Reference
17:01:39:WU03:FS02:0x22:Platform 1: CPU
17:01:39:WU03:FS02:0x22:Platform 2: OpenCL
17:01:39:WU03:FS02:0x22:  opencl-device 0 specified
17:01:56:WU03:FS02:0x22:Attempting to create OpenCL context:
17:01:56:WU03:FS02:0x22:  Configuring platform OpenCL
17:01:59:WU03:FS02:0x22:  Using OpenCL on platformId 1 and gpu 0
17:01:59:WU03:FS02:0x22:Completed 50000 out of 1250000 steps (4%)
17:02:28:Removing old file 'configs/config-20211219-162343.xml'
17:02:28:Saving configuration to config.xml
17:02:28:<config>
17:02:28:  <!-- Folding Slot Configuration -->
17:02:28:  <cause v='COVID_19'/>
17:02:28:
17:02:28:  <!-- Network -->
17:02:28:  <proxy v=':8080'/>
17:02:28:
17:02:28:  <!-- Slot Control -->
17:02:28:  <power v='FULL'/>
17:02:28:
17:02:28:  <!-- User Information -->
17:02:28:  <passkey v='*****'/>
17:02:28:  <team v='234771'/>
17:02:28:  <user v='atlr'/>
17:02:28:
17:02:28:  <!-- Folding Slots -->
17:02:28:  <slot id='1' type='GPU'>
17:02:28:    <paused v='true'/>
17:02:28:    <pci-bus v='72'/>
17:02:28:    <pci-slot v='0'/>
17:02:28:  </slot>
17:02:28:  <slot id='0' type='GPU'>
17:02:28:    <paused v='true'/>
17:02:28:    <pci-bus v='225'/>
17:02:28:    <pci-slot v='0'/>
17:02:28:  </slot>
17:02:28:  <slot id='2' type='GPU'>
17:02:28:    <pci-bus v='124'/>
17:02:28:    <pci-slot v='0'/>
17:02:28:  </slot>
17:02:28:</config>
17:03:06:WU03:FS02:0x22:Completed 62500 out of 1250000 steps (5%)
17:04:12:WU03:FS02:0x22:Completed 75000 out of 1250000 steps (6%)
17:04:13:WU03:FS02:0x22:Checkpoint completed at step 75000
17:05:19:WU03:FS02:0x22:Completed 87500 out of 1250000 steps (7%)

Re: issue with GPUs on multiple PCI buses

Posted: Sun Dec 19, 2021 6:32 pm
by toTOW
According to the core, the only WU shown in your log is running with OpenCL on platformId 1 and gpu 0 (and with GPU vendor AMD, according to the command line sent by the client).

Matching this against the client's detection at startup indicates that it is running on GPU 2, which is a Radeon VII ("OpenCL Device 1: Platform:1 Device:0 Bus:124 Slot:0 Compute:1.2 Driver:3354.13" matches "GPU 2: Bus:124 Slot:0 Func:0 AMD:5 Vega 20 [Radeon VII] 13,284").

That is Folding Slot 2 (FS02 in the log), which matches your configuration file:

Code:

17:02:28:  <slot id='2' type='GPU'>
17:02:28:    <pci-bus v='124'/>
17:02:28:    <pci-slot v='0'/>
17:02:28:  </slot>
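
The matching described above (slot PCI bus to detected OpenCL platform/device) can be sketched in Python. This is only a minimal sketch, assuming the detection lines always have the format seen in the startup log of this thread; the names `parse_opencl_devices` and `opencl_args_for_bus` are hypothetical helpers, not part of FAHClient.

```python
import re

# Detection lines copied from the client startup log in this thread.
LOG = """\
17:00:26:          GPU 0: Bus:72 Slot:0 Func:0 AMD:5 Vega 20 [Radeon VII] 13,284
17:00:26:          GPU 1: Bus:225 Slot:0 Func:0 NVIDIA:8 GA102 [GeForce RTX 3090]
17:00:26:          GPU 2: Bus:124 Slot:0 Func:0 AMD:5 Vega 20 [Radeon VII] 13,284
17:00:26:OpenCL Device 0: Platform:0 Device:0 Bus:225 Slot:0 Compute:3.0 Driver:497.9
17:00:26:OpenCL Device 1: Platform:1 Device:0 Bus:124 Slot:0 Compute:1.2 Driver:3354.13
17:00:26:OpenCL Device 2: Platform:1 Device:1 Bus:72 Slot:0 Compute:1.2 Driver:3354.13
"""

# Assumed line format: "OpenCL Device N: Platform:P Device:D Bus:B Slot:S ..."
OPENCL_RE = re.compile(
    r"OpenCL Device \d+: Platform:(\d+) Device:(\d+) Bus:(\d+) Slot:(\d+)"
)


def parse_opencl_devices(log_text):
    """Return {(bus, slot): (platform, device)} from 'OpenCL Device' lines."""
    devices = {}
    for match in OPENCL_RE.finditer(log_text):
        platform, device, bus, slot = map(int, match.groups())
        devices[(bus, slot)] = (platform, device)
    return devices


def opencl_args_for_bus(log_text, bus, slot=0):
    """Look up the OpenCL platform/device the client detected at a PCI bus/slot."""
    return parse_opencl_devices(log_text)[(bus, slot)]


if __name__ == "__main__":
    for bus in (72, 225, 124):
        platform, device = opencl_args_for_bus(LOG, bus)
        print(f"bus {bus}: -opencl-platform {platform} -opencl-device {device}")
```

For bus 124 the lookup gives platform 1, device 0, which agrees with the arguments on the FS02 launch line. Note that this only reflects what the client detected; it says nothing about how the core or the driver enumerates devices at run time.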

Re: issue with GPUs on multiple PCI buses

Posted: Sun Dec 19, 2021 7:02 pm
by atlr
Thank you for taking a look at this. I am trying to report that the status the F@H client displays does not match what is happening. I will record a video to better communicate what I observe. Stay tuned....

Re: issue with GPUs on multiple PCI buses

Posted: Sun Dec 19, 2021 7:39 pm
by atlr
Here is a video showing what I described in my first message.
https://youtu.be/dnJO0BcGpcU

Here is the F@H log written during the video recording.

Code:

*********************** Log Started 2021-12-19T19:17:02Z ***********************
19:17:02:******************************* libFAH ********************************
19:17:02:           Date: Oct 20 2020
19:17:02:           Time: 13:36:55
19:17:02:       Revision: 5ca109d295a6245e2a2f590b3d0085ad5e567aeb
19:17:02:         Branch: master
19:17:02:       Compiler: Visual C++ 2015
19:17:02:        Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
19:17:02:       Platform: win32 10
19:17:02:           Bits: 32
19:17:02:           Mode: Release
19:17:02:****************************** FAHClient ******************************
19:17:02:        Version: 7.6.21
19:17:02:         Author: Joseph Coffland <[email protected]>
19:17:02:      Copyright: 2020 foldingathome.org
19:17:02:       Homepage: https://foldingathome.org/
19:17:02:           Date: Oct 20 2020
19:17:02:           Time: 13:41:04
19:17:02:       Revision: 6efbf0e138e22d3963e6a291f78dcb9c6422a278
19:17:02:         Branch: master
19:17:02:       Compiler: Visual C++ 2015
19:17:02:        Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
19:17:02:       Platform: win32 10
19:17:02:           Bits: 32
19:17:02:           Mode: Release
19:17:02:           Args: --open-web-control
19:17:02:         Config: C:\ProgramData\FAHClient\config.xml
19:17:02:******************************** CBang ********************************
19:17:02:           Date: Oct 20 2020
19:17:02:           Time: 11:36:18
19:17:02:       Revision: 7e4ce85225d7eaeb775e87c31740181ca603de60
19:17:02:         Branch: master
19:17:02:       Compiler: Visual C++ 2015
19:17:02:        Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
19:17:02:       Platform: win32 10
19:17:02:           Bits: 32
19:17:02:           Mode: Release
19:17:02:******************************* System ********************************
19:17:02:            CPU: AMD Ryzen Threadripper PRO 3945WX 12-Cores
19:17:02:         CPU ID: AuthenticAMD Family 23 Model 49 Stepping 0
19:17:02:           CPUs: 24
19:17:02:         Memory: 31.84GiB
19:17:02:    Free Memory: 26.24GiB
19:17:02:        Threads: WINDOWS_THREADS
19:17:02:     OS Version: 6.2
19:17:02:    Has Battery: false
19:17:02:     On Battery: false
19:17:02:     UTC Offset: -5
19:17:02:            PID: 5688
19:17:02:            CWD: C:\ProgramData\FAHClient
19:17:02:  Win32 Service: false
19:17:02:             OS: Windows 10 Enterprise
19:17:02:        OS Arch: AMD64
19:17:02:           GPUs: 3
19:17:02:          GPU 0: Bus:72 Slot:0 Func:0 AMD:5 Vega 20 [Radeon VII] 13,284
19:17:02:          GPU 1: Bus:225 Slot:0 Func:0 NVIDIA:8 GA102 [GeForce RTX 3090]
19:17:02:          GPU 2: Bus:124 Slot:0 Func:0 AMD:5 Vega 20 [Radeon VII] 13,284
19:17:02:  CUDA Device 0: Platform:0 Device:0 Bus:225 Slot:0 Compute:8.6 Driver:11.5
19:17:02:OpenCL Device 0: Platform:0 Device:0 Bus:225 Slot:0 Compute:3.0 Driver:497.9
19:17:02:OpenCL Device 1: Platform:1 Device:0 Bus:124 Slot:0 Compute:1.2 Driver:3354.13
19:17:02:OpenCL Device 2: Platform:1 Device:1 Bus:72 Slot:0 Compute:1.2 Driver:3354.13
19:17:02:***********************************************************************
19:17:02:<config>
19:17:02:  <!-- Folding Slot Configuration -->
19:17:02:  <cause v='COVID_19'/>
19:17:02:
19:17:02:  <!-- Network -->
19:17:02:  <proxy v=':8080'/>
19:17:02:
19:17:02:  <!-- Slot Control -->
19:17:02:  <power v='FULL'/>
19:17:02:
19:17:02:  <!-- User Information -->
19:17:02:  <passkey v='*****'/>
19:17:02:  <team v='234771'/>
19:17:02:  <user v='atlr'/>
19:17:02:
19:17:02:  <!-- Folding Slots -->
19:17:02:  <slot id='0' type='GPU'>
19:17:02:    <paused v='true'/>
19:17:02:    <pci-bus v='72'/>
19:17:02:    <pci-slot v='0'/>
19:17:02:  </slot>
19:17:02:  <slot id='1' type='GPU'>
19:17:02:    <paused v='true'/>
19:17:02:    <pci-bus v='225'/>
19:17:02:    <pci-slot v='0'/>
19:17:02:  </slot>
19:17:02:  <slot id='2' type='GPU'>
19:17:02:    <paused v='true'/>
19:17:02:    <pci-bus v='124'/>
19:17:02:    <pci-slot v='0'/>
19:17:02:  </slot>
19:17:02:</config>
19:17:02:Trying to access database...
19:17:02:Successfully acquired database lock
19:17:02:FS00:Initialized folding slot 00: gpu:72:0 Vega 20 [Radeon VII] 13,284 
19:17:02:FS01:Initialized folding slot 01: gpu:225:0 GA102 [GeForce RTX 3090]
19:17:02:FS02:Initialized folding slot 02: gpu:124:0 Vega 20 [Radeon VII] 13,284 
19:17:03:5:127.0.0.1:New Web session
19:17:04:10:127.0.0.1:New Web session
19:23:11:FS02:Unpaused
19:23:11:WU02:FS02:Starting
19:23:11:WU02:FS02:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\ProgramData\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.18/Core_22.fah/FahCore_22.exe -dir 02 -suffix 01 -version 706 -lifeline 5688 -checkpoint 15 -opencl-platform 1 -opencl-device 0 -gpu-vendor amd -gpu 0 -gpu-usage 100
19:23:11:WU02:FS02:Started FahCore on PID 5676
19:23:11:WU02:FS02:Core PID:6584
19:23:11:WU02:FS02:FahCore 0x22 started
19:23:11:WU02:FS02:0x22:*********************** Log Started 2021-12-19T19:23:11Z ***********************
19:23:11:WU02:FS02:0x22:*************************** Core22 Folding@home Core ***************************
19:23:11:WU02:FS02:0x22:       Core: Core22
19:23:11:WU02:FS02:0x22:       Type: 0x22
19:23:11:WU02:FS02:0x22:    Version: 0.0.18
19:23:11:WU02:FS02:0x22:     Author: Joseph Coffland <[email protected]>
19:23:11:WU02:FS02:0x22:  Copyright: 2020 foldingathome.org
19:23:11:WU02:FS02:0x22:   Homepage: https://foldingathome.org/
19:23:11:WU02:FS02:0x22:       Date: Sep 28 2021
19:23:11:WU02:FS02:0x22:       Time: 05:55:05
19:23:11:WU02:FS02:0x22:   Revision: cfe3d7d990e8f456e371f8ce63b5fcc6daab2103
19:23:11:WU02:FS02:0x22:     Branch: HEAD
19:23:11:WU02:FS02:0x22:   Compiler: Visual C++
19:23:11:WU02:FS02:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
19:23:11:WU02:FS02:0x22:             -DOPENMM_VERSION="\"7.6.0\""
19:23:11:WU02:FS02:0x22:   Platform: win32 10
19:23:11:WU02:FS02:0x22:       Bits: 64
19:23:11:WU02:FS02:0x22:       Mode: Release
19:23:11:WU02:FS02:0x22:Maintainers: John Chodera <[email protected]> and Peter Eastman
19:23:11:WU02:FS02:0x22:             <[email protected]>
19:23:11:WU02:FS02:0x22:       Args: -dir 02 -suffix 01 -version 706 -lifeline 5676 -checkpoint 15
19:23:11:WU02:FS02:0x22:             -opencl-platform 1 -opencl-device 0 -gpu-vendor amd -gpu 0
19:23:11:WU02:FS02:0x22:             -gpu-usage 100
19:23:11:WU02:FS02:0x22:************************************ libFAH ************************************
19:23:11:WU02:FS02:0x22:       Date: Sep 28 2021
19:23:11:WU02:FS02:0x22:       Time: 05:53:43
19:23:11:WU02:FS02:0x22:   Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
19:23:11:WU02:FS02:0x22:     Branch: HEAD
19:23:11:WU02:FS02:0x22:   Compiler: Visual C++
19:23:11:WU02:FS02:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
19:23:11:WU02:FS02:0x22:   Platform: win32 10
19:23:11:WU02:FS02:0x22:       Bits: 64
19:23:11:WU02:FS02:0x22:       Mode: Release
19:23:11:WU02:FS02:0x22:************************************ CBang *************************************
19:23:11:WU02:FS02:0x22:       Date: Sep 28 2021
19:23:11:WU02:FS02:0x22:       Time: 05:52:38
19:23:11:WU02:FS02:0x22:   Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
19:23:11:WU02:FS02:0x22:     Branch: HEAD
19:23:11:WU02:FS02:0x22:   Compiler: Visual C++
19:23:11:WU02:FS02:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
19:23:11:WU02:FS02:0x22:   Platform: win32 10
19:23:11:WU02:FS02:0x22:       Bits: 64
19:23:11:WU02:FS02:0x22:       Mode: Release
19:23:11:WU02:FS02:0x22:************************************ System ************************************
19:23:11:WU02:FS02:0x22:        CPU: AMD Ryzen Threadripper PRO 3945WX 12-Cores
19:23:11:WU02:FS02:0x22:     CPU ID: AuthenticAMD Family 23 Model 49 Stepping 0
19:23:11:WU02:FS02:0x22:       CPUs: 24
19:23:11:WU02:FS02:0x22:     Memory: 31.84GiB
19:23:11:WU02:FS02:0x22:Free Memory: 26.14GiB
19:23:11:WU02:FS02:0x22:    Threads: WINDOWS_THREADS
19:23:11:WU02:FS02:0x22: OS Version: 6.2
19:23:11:WU02:FS02:0x22:Has Battery: false
19:23:11:WU02:FS02:0x22: On Battery: false
19:23:11:WU02:FS02:0x22: UTC Offset: -5
19:23:11:WU02:FS02:0x22:        PID: 6584
19:23:11:WU02:FS02:0x22:        CWD: C:\ProgramData\FAHClient\work
19:23:11:WU02:FS02:0x22:************************************ OpenMM ************************************
19:23:11:WU02:FS02:0x22:    Version: 7.6.0
19:23:11:WU02:FS02:0x22:********************************************************************************
19:23:11:WU02:FS02:0x22:Project: 18021 (Run 11, Clone 6, Gen 80)
19:23:11:WU02:FS02:0x22:Unit: 0x00000000000000000000000000000000
19:23:11:WU02:FS02:0x22:Digital signatures verified
19:23:11:WU02:FS02:0x22:Folding@home GPU Core22 Folding@home Core
19:23:11:WU02:FS02:0x22:Version 0.0.18
19:23:11:WU02:FS02:0x22:  Checkpoint write interval: 125000 steps (5%) [20 total]
19:23:11:WU02:FS02:0x22:  JSON viewer frame write interval: 25000 steps (1%) [100 total]
19:23:11:WU02:FS02:0x22:  XTC frame write interval: 250000 steps (10%) [10 total]
19:23:11:WU02:FS02:0x22:  Global context and integrator variables write interval: disabled
19:23:11:WU02:FS02:0x22:There are 3 platforms available.
19:23:11:WU02:FS02:0x22:Platform 0: Reference
19:23:11:WU02:FS02:0x22:Platform 1: CPU
19:23:11:WU02:FS02:0x22:Platform 2: OpenCL
19:23:11:WU02:FS02:0x22:  opencl-device 0 specified
19:23:19:WU02:FS02:0x22:Attempting to create OpenCL context:
19:23:19:WU02:FS02:0x22:  Configuring platform OpenCL
19:23:21:WU02:FS02:0x22:  Using OpenCL on platformId 1 and gpu 0
19:23:21:WU02:FS02:0x22:Completed 0 out of 2500000 steps (0%)
19:23:21:WU02:FS02:0x22:Checkpoint completed at step 0
19:24:09:Removing old file 'configs/config-20211219-173946.xml'
19:24:09:Saving configuration to config.xml
19:24:09:<config>
19:24:09:  <!-- Folding Slot Configuration -->
19:24:09:  <cause v='COVID_19'/>
19:24:09:
19:24:09:  <!-- Network -->
19:24:09:  <proxy v=':8080'/>
19:24:09:
19:24:09:  <!-- Slot Control -->
19:24:09:  <power v='FULL'/>
19:24:09:
19:24:09:  <!-- User Information -->
19:24:09:  <passkey v='*****'/>
19:24:09:  <team v='234771'/>
19:24:09:  <user v='atlr'/>
19:24:09:
19:24:09:  <!-- Folding Slots -->
19:24:09:  <slot id='0' type='GPU'>
19:24:09:    <paused v='true'/>
19:24:09:    <pci-bus v='72'/>
19:24:09:    <pci-slot v='0'/>
19:24:09:  </slot>
19:24:09:  <slot id='1' type='GPU'>
19:24:09:    <paused v='true'/>
19:24:09:    <pci-bus v='225'/>
19:24:09:    <pci-slot v='0'/>
19:24:09:  </slot>
19:24:09:  <slot id='2' type='GPU'>
19:24:09:    <pci-bus v='124'/>
19:24:09:    <pci-slot v='0'/>
19:24:09:  </slot>
19:24:09:</config>
19:24:25:WU02:FS02:0x22:Completed 25000 out of 2500000 steps (1%)
19:25:29:WU02:FS02:0x22:Completed 50000 out of 2500000 steps (2%)
19:26:31:FS02:Paused
19:26:31:FS02:Shutting core down
19:26:31:WU02:FS02:0x22:WARNING:Console control signal 1 on PID 6584
19:26:31:WU02:FS02:0x22:Exiting, please wait. . .
19:26:31:WU02:FS02:0x22:Folding@home Core Shutdown: INTERRUPTED
19:26:31:WU02:FS02:FahCore returned: INTERRUPTED (102 = 0x66)
19:27:12:Removing old file 'configs/config-20211219-174229.xml'
19:27:12:Saving configuration to config.xml
19:27:12:<config>
19:27:12:  <!-- Folding Slot Configuration -->
19:27:12:  <cause v='COVID_19'/>
19:27:12:
19:27:12:  <!-- Network -->
19:27:12:  <proxy v=':8080'/>
19:27:12:
19:27:12:  <!-- Slot Control -->
19:27:12:  <power v='FULL'/>
19:27:12:
19:27:12:  <!-- User Information -->
19:27:12:  <passkey v='*****'/>
19:27:12:  <team v='234771'/>
19:27:12:  <user v='atlr'/>
19:27:12:
19:27:12:  <!-- Folding Slots -->
19:27:12:  <slot id='0' type='GPU'>
19:27:12:    <paused v='true'/>
19:27:12:    <pci-bus v='72'/>
19:27:12:    <pci-slot v='0'/>
19:27:12:  </slot>
19:27:12:  <slot id='1' type='GPU'>
19:27:12:    <paused v='true'/>
19:27:12:    <pci-bus v='225'/>
19:27:12:    <pci-slot v='0'/>
19:27:12:  </slot>
19:27:12:  <slot id='2' type='GPU'>
19:27:12:    <paused v='true'/>
19:27:12:    <pci-bus v='124'/>
19:27:12:    <pci-slot v='0'/>
19:27:12:  </slot>
19:27:12:</config>

Re: issue with GPUs on multiple PCI buses

Posted: Sun Dec 19, 2021 8:09 pm
by toTOW
What happens if you start all the slots?

Anyway, mixing nVidia and AMD GPUs in the same system has always been a pain ... :(

Re: issue with GPUs on multiple PCI buses

Posted: Sun Dec 19, 2021 8:22 pm
by atlr
The first OpenCL AMD slot job starts on the Nvidia GPU.
The first CUDA Nvidia slot job starts on the Nvidia GPU.
These two contexts run successfully simultaneously on the Nvidia GPU.
The second OpenCL AMD slot job downloads a work unit, errors out trying to create an OpenCL context and then repeats this download/context creation error cycle.

Code:

20:20:23:WU00:FS00:Downloading 26.53MiB
20:20:28:WU02:FS02:0x22:Completed 600000 out of 2500000 steps (24%)
20:20:29:WU00:FS00:Download 19.55%
20:20:35:WU00:FS00:Download 38.40%
20:20:41:WU00:FS00:Download 57.96%
20:20:47:WU00:FS00:Download 76.57%
20:20:53:WU00:FS00:Download 93.06%
20:20:55:WU00:FS00:Download complete
20:20:55:WU00:FS00:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:18201 run:5836 clone:0 gen:31 core:0x22 unit:0x000000000000001f00004719000016cc
20:20:55:WU00:FS00:Starting
20:20:55:WU00:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\ProgramData\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.18/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 706 -lifeline 5688 -checkpoint 15 -opencl-platform 1 -opencl-device 1 -gpu-vendor amd -gpu 1 -gpu-usage 100
20:20:55:WU00:FS00:Started FahCore on PID 1072
20:20:55:WU00:FS00:Core PID:13720
20:20:55:WU00:FS00:FahCore 0x22 started
20:20:56:WU00:FS00:0x22:*********************** Log Started 2021-12-19T20:20:55Z ***********************
20:20:56:WU00:FS00:0x22:*************************** Core22 Folding@home Core ***************************
20:20:56:WU00:FS00:0x22:       Core: Core22
20:20:56:WU00:FS00:0x22:       Type: 0x22
20:20:56:WU00:FS00:0x22:    Version: 0.0.18
20:20:56:WU00:FS00:0x22:     Author: Joseph Coffland <[email protected]>
20:20:56:WU00:FS00:0x22:  Copyright: 2020 foldingathome.org
20:20:56:WU00:FS00:0x22:   Homepage: https://foldingathome.org/
20:20:56:WU00:FS00:0x22:       Date: Sep 28 2021
20:20:56:WU00:FS00:0x22:       Time: 05:55:05
20:20:56:WU00:FS00:0x22:   Revision: cfe3d7d990e8f456e371f8ce63b5fcc6daab2103
20:20:56:WU00:FS00:0x22:     Branch: HEAD
20:20:56:WU00:FS00:0x22:   Compiler: Visual C++
20:20:56:WU00:FS00:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
20:20:56:WU00:FS00:0x22:             -DOPENMM_VERSION="\"7.6.0\""
20:20:56:WU00:FS00:0x22:   Platform: win32 10
20:20:56:WU00:FS00:0x22:       Bits: 64
20:20:56:WU00:FS00:0x22:       Mode: Release
20:20:56:WU00:FS00:0x22:Maintainers: John Chodera <[email protected]> and Peter Eastman
20:20:56:WU00:FS00:0x22:             <[email protected]>
20:20:56:WU00:FS00:0x22:       Args: -dir 00 -suffix 01 -version 706 -lifeline 1072 -checkpoint 15
20:20:56:WU00:FS00:0x22:             -opencl-platform 1 -opencl-device 1 -gpu-vendor amd -gpu 1
20:20:56:WU00:FS00:0x22:             -gpu-usage 100
20:20:56:WU00:FS00:0x22:************************************ libFAH ************************************
20:20:56:WU00:FS00:0x22:       Date: Sep 28 2021
20:20:56:WU00:FS00:0x22:       Time: 05:53:43
20:20:56:WU00:FS00:0x22:   Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
20:20:56:WU00:FS00:0x22:     Branch: HEAD
20:20:56:WU00:FS00:0x22:   Compiler: Visual C++
20:20:56:WU00:FS00:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
20:20:56:WU00:FS00:0x22:   Platform: win32 10
20:20:56:WU00:FS00:0x22:       Bits: 64
20:20:56:WU00:FS00:0x22:       Mode: Release
20:20:56:WU00:FS00:0x22:************************************ CBang *************************************
20:20:56:WU00:FS00:0x22:       Date: Sep 28 2021
20:20:56:WU00:FS00:0x22:       Time: 05:52:38
20:20:56:WU00:FS00:0x22:   Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
20:20:56:WU00:FS00:0x22:     Branch: HEAD
20:20:56:WU00:FS00:0x22:   Compiler: Visual C++
20:20:56:WU00:FS00:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
20:20:56:WU00:FS00:0x22:   Platform: win32 10
20:20:56:WU00:FS00:0x22:       Bits: 64
20:20:56:WU00:FS00:0x22:       Mode: Release
20:20:56:WU00:FS00:0x22:************************************ System ************************************
20:20:56:WU00:FS00:0x22:        CPU: AMD Ryzen Threadripper PRO 3945WX 12-Cores
20:20:56:WU00:FS00:0x22:     CPU ID: AuthenticAMD Family 23 Model 49 Stepping 0
20:20:56:WU00:FS00:0x22:       CPUs: 24
20:20:56:WU00:FS00:0x22:     Memory: 31.84GiB
20:20:56:WU00:FS00:0x22:Free Memory: 24.88GiB
20:20:56:WU00:FS00:0x22:    Threads: WINDOWS_THREADS
20:20:56:WU00:FS00:0x22: OS Version: 6.2
20:20:56:WU00:FS00:0x22:Has Battery: false
20:20:56:WU00:FS00:0x22: On Battery: false
20:20:56:WU00:FS00:0x22: UTC Offset: -5
20:20:56:WU00:FS00:0x22:        PID: 13720
20:20:56:WU00:FS00:0x22:        CWD: C:\ProgramData\FAHClient\work
20:20:56:WU00:FS00:0x22:************************************ OpenMM ************************************
20:20:56:WU00:FS00:0x22:    Version: 7.6.0
20:20:56:WU00:FS00:0x22:********************************************************************************
20:20:56:WU00:FS00:0x22:Project: 18201 (Run 5836, Clone 0, Gen 31)
20:20:56:WU00:FS00:0x22:Unit: 0x00000000000000000000000000000000
20:20:56:WU00:FS00:0x22:Reading tar file core.xml
20:20:56:WU00:FS00:0x22:Reading tar file integrator.xml
20:20:56:WU00:FS00:0x22:Reading tar file state.xml
20:20:56:WU00:FS00:0x22:Reading tar file system.xml
20:20:56:WU00:FS00:0x22:Digital signatures verified
20:20:56:WU00:FS00:0x22:Folding@home GPU Core22 Folding@home Core
20:20:56:WU00:FS00:0x22:Version 0.0.18
20:20:56:WU00:FS00:0x22:  Checkpoint write interval: 25000 steps (2%) [50 total]
20:20:56:WU00:FS00:0x22:  JSON viewer frame write interval: 12500 steps (1%) [100 total]
20:20:56:WU00:FS00:0x22:  XTC frame write interval: 20000 steps (1.6%) [62 total]
20:20:56:WU00:FS00:0x22:  Global context and integrator variables write interval: disabled
20:20:56:WU00:FS00:0x22:There are 3 platforms available.
20:20:56:WU00:FS00:0x22:Platform 0: Reference
20:20:56:WU00:FS00:0x22:Platform 1: CPU
20:20:56:WU00:FS00:0x22:Platform 2: OpenCL
20:20:56:WU00:FS00:0x22:  opencl-device 1 specified
20:21:14:WU00:FS00:0x22:Attempting to create OpenCL context:
20:21:14:WU00:FS00:0x22:  Configuring platform OpenCL
20:21:15:WU00:FS00:0x22:Failed to create OpenCL context:
20:21:15:WU00:FS00:0x22:Illegal value for DeviceIndex: 1
20:21:15:WU00:FS00:0x22:ERROR:125: Failed to create a GPU-enabled OpenMM Context.
20:21:15:WU00:FS00:0x22:Saving result file ..\logfile_01.txt
20:21:15:WU00:FS00:0x22:Saving result file science.log
20:21:15:WU00:FS00:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
20:21:15:WARNING:WU00:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
20:21:15:WU00:FS00:Sending unit results: id:00 state:SEND error:FAULTY project:18201 run:5836 clone:0 gen:31 core:0x22 unit:0x000000000000001f00004719000016cc
20:21:15:WU00:FS00:Uploading 2.67KiB to 128.252.203.11
20:21:15:WU00:FS00:Connecting to 128.252.203.11:8080
20:21:15:WU00:FS00:Upload complete
20:21:15:WU00:FS00:Server responded WORK_ACK (400)
20:21:15:WU00:FS00:Cleaning up
20:21:15:WU03:FS00:Connecting to assign1.foldingathome.org:80
20:21:16:WU03:FS00:Assigned to work server 129.32.209.202
20:21:16:WU03:FS00:Requesting new work unit for slot 00: gpu:72:0 Vega 20 [Radeon VII] 13,284  from 129.32.209.202
20:21:16:WU03:FS00:Connecting to 129.32.209.202:8080
20:21:16:WU03:FS00:Downloading 27.73MiB
20:21:22:WU03:FS00:Download 17.13%
20:21:28:WU03:FS00:Download 34.93%
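
When several slots are running at once, it can help to group the core messages by folding slot to see which device index each slot requested and which slots failed. The following is a minimal sketch, assuming the `HH:MM:SS:WUnn:FSnn:0x22:` line prefix seen in the log above; the `summarize` function is a hypothetical helper, and the excerpt contains only a few lines copied from that log.

```python
import re
from collections import defaultdict

# A few lines excerpted verbatim from the log above.
LOG = """\
20:20:28:WU02:FS02:0x22:Completed 600000 out of 2500000 steps (24%)
20:20:56:WU00:FS00:0x22:  opencl-device 1 specified
20:21:15:WU00:FS00:0x22:Failed to create OpenCL context:
20:21:15:WU00:FS00:0x22:Illegal value for DeviceIndex: 1
20:21:15:WU00:FS00:0x22:ERROR:125: Failed to create a GPU-enabled OpenMM Context.
"""

# Assumed prefix of lines written by the core: time, work unit, slot, core type.
LINE_RE = re.compile(r"^\d\d:\d\d:\d\d:WU\d+:(FS\d+):0x22:(.*)$")
DEVICE_RE = re.compile(r"opencl-device (\d+) specified")


def summarize(log_text):
    """Return {slot: {"device", "errors", "progress"}} from core log lines."""
    summary = defaultdict(
        lambda: {"device": None, "errors": [], "progress": None}
    )
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # not a core message
        slot, message = match.group(1), match.group(2).strip()
        device = DEVICE_RE.match(message)
        if device:
            summary[slot]["device"] = int(device.group(1))
        elif message.startswith(("ERROR:", "Illegal value")):
            summary[slot]["errors"].append(message)
        elif message.startswith("Completed "):
            summary[slot]["progress"] = message  # keep the latest progress line
    return dict(summary)


if __name__ == "__main__":
    for slot, info in sorted(summarize(LOG).items()):
        print(slot, info)
```

On this excerpt, FS00 shows the requested device index 1 together with the two error messages, while FS02 shows only progress.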

Re: issue with GPUs on multiple PCI buses

Posted: Sun Dec 19, 2021 8:46 pm
by Neil-B
OK, so what is the chance that HWiNFO is misreporting, rather than FAH doing something odd? Try running just the AMD slot that HWiNFO reports as running on the Nvidia card, and check "manually" which GPU is getting warm rather than relying on HWiNFO. Also check the TPF on the Nvidia WU with just it running, then start the AMD slot that HWiNFO reports as running on the Nvidia card and see whether the TPF changes significantly. If it doesn't, then the issue is with HWiNFO.

Re: issue with GPUs on multiple PCI buses

Posted: Sun Dec 19, 2021 9:09 pm
by toTOW
Does the clinfo tool show the same output/device ordering as the client?

With such a strange setup, you may have to set everything manually in the config.xml file ... :(

Re: issue with GPUs on multiple PCI buses

Posted: Sun Dec 19, 2021 9:17 pm
by toTOW
Maybe something like this:

Code:

<!-- Folding Slots -->
<slot id='0' type='GPU'>
 <pci-bus v='72'/>
 <pci-slot v='0'/>
 <opencl-platform v='1'/>
 <opencl-device v='1'/>
</slot>
<slot id='1' type='GPU'>
 <pci-bus v='225'/>
 <pci-slot v='0'/>
 <cuda-device v='0'/>
 <opencl-platform v='0'/>
 <opencl-device v='0'/>
</slot>
<slot id='2' type='GPU'>
 <pci-bus v='124'/>
 <pci-slot v='0'/>
 <opencl-platform v='1'/>
 <opencl-device v='0'/>
</slot>

If it still has the same behaviour, you could try:

Code:

<!-- Folding Slots -->
<slot id='0' type='GPU'>
 <pci-bus v='72'/>
 <pci-slot v='0'/>
 <opencl-platform v='1'/>
 <opencl-device v='2'/>
</slot>
<slot id='1' type='GPU'>
 <pci-bus v='225'/>
 <pci-slot v='0'/>
 <cuda-device v='0'/>
 <opencl-platform v='0'/>
 <opencl-device v='0'/>
</slot>
<slot id='2' type='GPU'>
 <pci-bus v='124'/>
 <pci-slot v='0'/>
 <opencl-platform v='1'/>
 <opencl-device v='1'/>
</slot>
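
Before restarting the client with a hand-edited slot section, it may be worth checking that the XML is well-formed and that each slot carries the intended values. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`; the `slot_options` helper is hypothetical, and the sample text is a slot section in the style of the suggestion above, wrapped in a `<config>` root so it parses as a single document.

```python
import xml.etree.ElementTree as ET

# Hypothetical hand-edited slot section, wrapped in <config> as a single root.
CONFIG = """\
<config>
  <slot id='0' type='GPU'>
    <pci-bus v='72'/>
    <pci-slot v='0'/>
    <opencl-platform v='1'/>
    <opencl-device v='1'/>
  </slot>
  <slot id='1' type='GPU'>
    <pci-bus v='225'/>
    <pci-slot v='0'/>
    <cuda-device v='0'/>
    <opencl-platform v='0'/>
    <opencl-device v='0'/>
  </slot>
  <slot id='2' type='GPU'>
    <pci-bus v='124'/>
    <pci-slot v='0'/>
    <opencl-platform v='1'/>
    <opencl-device v='0'/>
  </slot>
</config>
"""


def slot_options(config_text):
    """Parse the config and return {slot id: {option name: value}}.

    Raises xml.etree.ElementTree.ParseError if the XML is not well-formed,
    for example when an option tag is missing its closing '/'.
    """
    root = ET.fromstring(config_text)
    return {
        slot.get("id"): {option.tag: option.get("v") for option in slot}
        for slot in root.iter("slot")
    }


if __name__ == "__main__":
    for slot_id, options in sorted(slot_options(CONFIG).items()):
        print(slot_id, options)
```

This only checks the syntax and echoes the values back; whether the client and core honour these per-slot options on a given setup still has to be confirmed from the log.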