BAD_WORK_UNIT (114 = 0x72) on AMD Radeon RX 5500 XT

Posted: Sat May 02, 2020 3:01 pm
by 4n0n
Hi everyone,

I have problems getting my GPU to fold.

After installing fahclient 7.6.9, the GPU slot seemed to fold fine. After a few days the GPU slot seems to be "broken": every time a new WU is downloaded, the GPU slot starts folding but immediately stops at 0% with "Error initializing context: clGetDeviceIDs (-1)" and the warning "BAD_WORK_UNIT (114 = 0x72)" in the log. After about 10 retries the slot shows "Failed" in fahcontrol.
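
For anyone decoding these two messages: the number in parentheses after clGetDeviceIDs is an OpenCL status code, and -1 is CL_DEVICE_NOT_FOUND in the Khronos cl.h header, i.e. the core was told there is no matching device. 114 and 0x72 are the same FahCore exit code written in decimal and in hex. A minimal shell sketch of that lookup follows. The function names cl_status_name and status_from_line are made up for illustration, and only a few status codes are listed.

```shell
#!/bin/sh
# Map a few OpenCL status codes to their names (values from the Khronos
# cl.h header). This is a deliberately short list, not the full table.
cl_status_name() {
    case "$1" in
        0)   echo "CL_SUCCESS" ;;
        -1)  echo "CL_DEVICE_NOT_FOUND" ;;
        -30) echo "CL_INVALID_VALUE" ;;
        *)   echo "unlisted status $1" ;;
    esac
}

# Extract the status code from a FahCore log line of the form
#   ... Error initializing context: clGetDeviceIDs (-1)
status_from_line() {
    printf '%s\n' "$1" |
        sed -n 's/.*clGetDeviceIDs (\(-\{0,1\}[0-9][0-9]*\)).*/\1/p'
}

line='14:37:57:WU01:FS01:0x22:ERROR:exception: Error initializing context: clGetDeviceIDs (-1)'
code=$(status_from_line "$line")
echo "clGetDeviceIDs returned $code: $(cl_status_name "$code")"

# The FahCore exit code appears in both bases: 114 decimal is 0x72 hex.
printf 'BAD_WORK_UNIT = %d = 0x%x\n' 114 114
```

So the error says the core could not see any OpenCL device at all, which is different from a work unit that is actually faulty.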

I am on Linux Mint v19.3, fahclient v7.6.9 with the newest Radeon drivers v20.10 from amd-dot-com/de/support/graphics/amd-radeon-5500-series/amd-radeon-rx-5500-series/amd-radeon-rx-5500-xt

Does anyone know what's going on here and how to fix the problem?

Any help is appreciated.

Some more details follow:

log:

06:10:46:****************************** FAHClient ******************************
06:10:46:Started thread 1 on PID 1344
06:10:46:        Version: 7.6.9
06:10:46:         Author: Joseph Coffland <[email protected]>
06:10:46:      Copyright: 2020 foldingathome.org
06:10:46:       Homepage: https://foldingathome.org/
06:10:46:           Date: Apr 17 2020
06:10:46:           Time: 18:11:26
06:10:46:       Revision: 398c2b17fa535e0cc6c9d10856b2154c32771646
06:10:46:         Branch: master
06:10:46:       Compiler: GNU 8.3.0
06:10:46:        Options: -std=c++11 -ffunction-sections -fdata-sections -O3
06:10:46:                 -funroll-loops -fno-pie
06:10:46:       Platform: linux2 4.19.0-5-amd64
06:10:46:           Bits: 64
06:10:46:           Mode: Release
06:10:46:           Args: --child /etc/fahclient/config.xml --run-as fahclient
06:10:46:                 --pid-file=/var/run/fahclient.pid --daemon
06:10:46:         Config: /etc/fahclient/config.xml
06:10:46:******************************** CBang ********************************
06:10:46:           Date: Apr 17 2020
06:10:46:           Time: 18:10:13
06:10:46:       Revision: 2fb0be7809c5e45287a122ca5fbc15b5ae859a3b
06:10:46:         Branch: master
06:10:46:       Compiler: GNU 8.3.0
06:10:46:        Options: -std=c++11 -ffunction-sections -fdata-sections -O3
06:10:46:                 -funroll-loops -fno-pie -fPIC
06:10:46:       Platform: linux2 4.19.0-5-amd64
06:10:46:           Bits: 64
06:10:46:           Mode: Release
06:10:46:******************************* System ********************************
06:10:46:            CPU: AMD Ryzen 5 3600 6-Core Processor
06:10:46:         CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
06:10:46:           CPUs: 12
06:10:46:         Memory: 31.37GiB
06:10:46:    Free Memory: 30.32GiB
06:10:46:        Threads: POSIX_THREADS
06:10:46:     OS Version: 5.6
06:10:46:    Has Battery: false
06:10:46:     On Battery: false
06:10:46:     UTC Offset: 2
06:10:46:            PID: 1344
06:10:46:            CWD: /var/lib/fahclient
06:10:46:             OS: Linux 5.6.6-050606-generic x86_64
06:10:46:        OS Arch: AMD64
06:10:46:           GPUs: 1
06:10:46:          GPU 0: Bus:40 Slot:0 Func:0 AMD:6 Navi 14 [Radeon RX 5500/5500M / Pro
06:10:46:                 5500M]
06:10:46:           CUDA: Not detected: Failed to open dynamic library 'libcuda.so':
06:10:46:                 libcuda.so: cannot open shared object file: No such file or
06:10:46:                 directory
06:10:46:OpenCL Device 0: Platform:0 Device:0 Bus:40 Slot:0 Compute:2.0 Driver:3075.10
06:10:46:******************************* libFAH ********************************
06:10:46:           Date: Apr 15 2020
06:10:46:           Time: 21:43:24
06:10:46:       Revision: 216968bc7025029c841ed6e36e81a03a316890d3
06:10:46:         Branch: master
06:10:46:       Compiler: GNU 8.3.0
06:10:46:        Options: -std=c++11 -ffunction-sections -fdata-sections -O3
06:10:46:                 -funroll-loops -fno-pie
06:10:46:       Platform: linux2 4.19.0-5-amd64
06:10:46:           Bits: 64
06:10:46:           Mode: Release
06:10:46:***********************************************************************
06:10:46:<config>
06:10:46:  <!-- Client Control -->
06:10:46:  <client-threads v='6'/>
06:10:46:  <cycle-rate v='4'/>
06:10:46:  <cycles v='-1'/>
06:10:46:  <disable-sleep-when-active v='true'/>
06:10:46:  <exit-when-done v='false'/>
06:10:46:  <fold-anon v='true'/>
06:10:46:  <idle-seconds v='300'/>
06:10:46:  <open-web-control v='false'/>
06:10:46:
06:10:46:  <!-- Configuration -->
06:10:46:  <config-rotate v='true'/>
06:10:46:  <config-rotate-dir v='configs'/>
06:10:46:  <config-rotate-max v='16'/>
06:10:46:
06:10:46:  <!-- Debugging -->
06:10:46:  <assignment-servers>
06:10:46:    assign1.foldingathome.org assign2.foldingathome.org assign3.foldingathome.org assign4.foldingathome.org 
06:10:46:  </assignment-servers>
06:10:46:  <auth-as v='true'/>
06:10:46:  <capture-directory v='capture'/>
06:10:46:  <capture-on-error v='false'/>
06:10:46:  <capture-packets v='false'/>
06:10:46:  <capture-requests v='false'/>
06:10:46:  <capture-responses v='false'/>
06:10:46:  <capture-sockets v='false'/>
06:10:46:  <debug-sockets v='false'/>
06:10:46:  <exception-locations v='true'/>
06:10:46:  <stack-traces v='false'/>
06:10:46:
06:10:46:  <!-- Error Handling -->
06:10:46:  <max-slot-errors v='10'/>
06:10:46:  <max-unit-errors v='5'/>
06:10:46:
06:10:46:  <!-- Folding Core -->
06:10:46:  <checkpoint v='15'/>
06:10:46:  <core-priority v='idle'/>
06:10:46:  <cpu-usage v='100'/>
06:10:46:  <gpu-usage v='100'/>
06:10:46:  <no-assembly v='false'/>
06:10:46:
06:10:46:  <!-- Folding Slot Configuration -->
06:10:46:  <cause v='COVID_19'/>
06:10:46:  <client-subtype v='LINUX'/>
06:10:46:  <client-type v='normal'/>
06:10:46:  <cpu-species v='X86_AMD'/>
06:10:46:  <cpu-type v='AMD64'/>
06:10:46:  <cpus v='-1'/>
06:10:46:  <disable-viz v='false'/>
06:10:46:  <gpu v='true'/>
06:10:46:  <max-packet-size v='normal'/>
06:10:46:  <os-species v='UNKNOWN'/>
06:10:46:  <os-type v='LINUX'/>
06:10:46:  <project-key v='0'/>
06:10:46:  <smp v='true'/>
06:10:46:
06:10:46:  <!-- GUI -->
06:10:46:  <gui-enabled v='true'/>
06:10:46:
06:10:46:  <!-- HTTP Server -->
06:10:46:  <allow v='127.0.0.1 192.168.10.0/24'/>
06:10:46:  <connection-timeout v='60'/>
06:10:46:  <deny v='0/0'/>
06:10:46:  <http-addresses v='0:7396'/>
06:10:46:  <https-addresses v=''/>
06:10:46:  <max-connect-time v='900'/>
06:10:46:  <max-connections v='800'/>
06:10:46:  <max-request-length v='52428800'/>
06:10:46:  <min-connect-time v='300'/>
06:10:46:
06:10:46:  <!-- Logging -->
06:10:46:  <log v='log.txt'/>
06:10:46:  <log-color v='true'/>
06:10:46:  <log-crlf v='false'/>
06:10:46:  <log-date v='false'/>
06:10:46:  <log-date-periodically v='21600'/>
06:10:46:  <log-domain v='false'/>
06:10:46:  <log-header v='true'/>
06:10:46:  <log-level v='true'/>
06:10:46:  <log-no-info-header v='true'/>
06:10:46:  <log-redirect v='false'/>
06:10:46:  <log-rotate v='true'/>
06:10:46:  <log-rotate-dir v='logs'/>
06:10:46:  <log-rotate-max v='16'/>
06:10:46:  <log-short-level v='false'/>
06:10:46:  <log-simple-domains v='true'/>
06:10:46:  <log-thread-id v='false'/>
06:10:46:  <log-thread-prefix v='true'/>
06:10:46:  <log-time v='true'/>
06:10:46:  <log-to-screen v='true'/>
06:10:46:  <log-truncate v='false'/>
06:10:46:  <verbosity v='5'/>
06:10:46:
06:10:46:  <!-- Process Control -->
06:10:46:  <child v='true'/>
06:10:46:  <daemon v='true'/>
06:10:46:  <fork v='false'/>
06:10:46:  <pid v='false'/>
06:10:46:  <pid-file v='/var/run/fahclient.pid'/>
06:10:46:  <respawn v='false'/>
06:10:46:  <service v='false'/>
06:10:46:
06:10:46:  <!-- Slot Control -->
06:10:46:  <idle v='false'/>
06:10:46:  <max-shutdown-wait v='60'/>
06:10:46:  <pause-on-battery v='true'/>
06:10:46:  <pause-on-start v='false'/>
06:10:46:  <paused v='false'/>
06:10:46:  <power v='medium'/>
06:10:46:
06:10:46:  <!-- Web Server -->
06:10:46:  <web-allow v='127.0.0.1'/>
06:10:46:  <web-deny v='0/0'/>
06:10:46:  <web-enable v='true'/>
06:10:46:
06:10:46:  <!-- Web Server Sessions -->
06:10:46:  <session-cookie v='sid'/>
06:10:46:  <session-lifetime v='86400'/>
06:10:46:  <session-timeout v='3600'/>
06:10:46:
06:10:46:  <!-- Work Unit Control -->
06:10:46:  <dump-after-deadline v='true'/>
06:10:46:  <max-queue v='16'/>
06:10:46:  <max-units v='0'/>
06:10:46:  <next-unit-percentage v='99'/>
06:10:46:  <stall-detection-enabled v='false'/>
06:10:46:  <stall-percent v='5'/>
06:10:46:  <stall-timeout v='1800'/>
06:10:46:
06:10:46:  <!-- Folding Slots -->
06:10:46:  <slot id='0' type='CPU'>
06:10:46:    <cpus v='12'/>
06:10:46:  </slot>
06:10:46:  <slot id='1' type='GPU'>
06:10:46:    <paused v='True'/>
06:10:46:  </slot>
06:10:46:</config>
06:10:46:Trying to access database...
06:10:46:Successfully acquired database lock
06:10:46:Enabled folding slot 00: READY cpu:12
06:10:46:Enabled folding slot 01: PAUSED gpu:0:Navi 14 [Radeon RX 5500/5500M / Pro 5500M] (by user)
[...]
14:37:43:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:14430 run:0 clone:886 gen:31 core:0x22 unit:0x0000002b0d5262775e8b4d597b7ac87c
14:37:43:WU01:FS01:Starting
14:37:43:WU01:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 1344 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
14:37:43:WU01:FS01:Started FahCore on PID 17757
14:37:43:WU01:FS01:Core PID:17761
14:37:43:WU01:FS01:FahCore 0x22 started
14:37:43:WU01:FS01:0x22:*********************** Log Started 2020-05-02T14:37:43Z ***********************
14:37:43:WU01:FS01:0x22:*************************** Core22 Folding@home Core ***************************
14:37:43:WU01:FS01:0x22:       Type: 0x22
14:37:43:WU01:FS01:0x22:       Core: Core22
14:37:43:WU01:FS01:0x22:    Website: https://foldingathome.org/
14:37:43:WU01:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
14:37:43:WU01:FS01:0x22:     Author: John Chodera <[email protected]> and Rafal Wiewiora
14:37:43:WU01:FS01:0x22:             <[email protected]>
14:37:43:WU01:FS01:0x22:       Args: -dir 01 -suffix 01 -version 706 -lifeline 17757 -checkpoint 15
14:37:43:WU01:FS01:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
14:37:43:WU01:FS01:0x22:     Config: <none>
14:37:43:WU01:FS01:0x22:************************************ Build *************************************
14:37:43:WU01:FS01:0x22:    Version: 0.0.5
14:37:43:WU01:FS01:0x22:       Date: Apr 22 2020
14:37:43:WU01:FS01:0x22:       Time: 03:57:11
14:37:43:WU01:FS01:0x22: Repository: Git
14:37:43:WU01:FS01:0x22:   Revision: 2d69202c898bd9bb3e093f51cd32bf411c2a0388
14:37:43:WU01:FS01:0x22:     Branch: HEAD
14:37:43:WU01:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
14:37:43:WU01:FS01:0x22:    Options: -std=c++11 -O3 -funroll-loops
14:37:43:WU01:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
14:37:43:WU01:FS01:0x22:       Bits: 64
14:37:43:WU01:FS01:0x22:       Mode: Release
14:37:43:WU01:FS01:0x22:************************************ System ************************************
14:37:43:WU01:FS01:0x22:        CPU: AMD Ryzen 5 3600 6-Core Processor
14:37:43:WU01:FS01:0x22:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
14:37:43:WU01:FS01:0x22:       CPUs: 12
14:37:43:WU01:FS01:0x22:     Memory: 31.37GiB
14:37:43:WU01:FS01:0x22:Free Memory: 26.09GiB
14:37:43:WU01:FS01:0x22:    Threads: POSIX_THREADS
14:37:43:WU01:FS01:0x22: OS Version: 5.6
14:37:43:WU01:FS01:0x22:Has Battery: false
14:37:43:WU01:FS01:0x22: On Battery: false
14:37:43:WU01:FS01:0x22: UTC Offset: 2
14:37:43:WU01:FS01:0x22:        PID: 17761
14:37:43:WU01:FS01:0x22:        CWD: /var/lib/fahclient/work
14:37:43:WU01:FS01:0x22:         OS: Linux 5.6.6-050606-generic x86_64
14:37:43:WU01:FS01:0x22:    OS Arch: AMD64
14:37:43:WU01:FS01:0x22:********************************************************************************
14:37:43:WU01:FS01:0x22:Project: 14430 (Run 0, Clone 886, Gen 31)
14:37:43:WU01:FS01:0x22:Unit: 0x0000002b0d5262775e8b4d597b7ac87c
14:37:43:WU01:FS01:0x22:Reading tar file core.xml
14:37:43:WU01:FS01:0x22:Reading tar file integrator.xml
14:37:43:WU01:FS01:0x22:Reading tar file state.xml
14:37:43:WU01:FS01:0x22:Reading tar file system.xml
14:37:44:WU01:FS01:0x22:Digital signatures verified
14:37:44:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
14:37:44:WU01:FS01:0x22:Version 0.0.5
14:37:57:WU01:FS01:0x22:ERROR:exception: Error initializing context: clGetDeviceIDs (-1)
14:37:57:WU01:FS01:0x22:Saving result file ../logfile_01.txt
14:37:57:WU01:FS01:0x22:Saving result file science.log
14:37:57:WU01:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
14:37:57:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
14:37:57:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:14430 run:0 clone:886 gen:31 core:0x22 unit:0x0000002b0d5262775e8b4d597b7ac87c
14:37:57:WU01:FS01:Uploading 7.50KiB to 13.82.98.119
lsb_release -a:

Distributor ID:	LinuxMint
Description:	Linux Mint 19.3 Tricia
Release:	19.3
Codename:	tricia
clinfo:

Number of platforms                               1
  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.1 AMD-APP (3075.10)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 
  Platform Host timer resolution                  1ns
  Platform Extensions function suffix             AMD

  Platform Name                                   AMD Accelerated Parallel Processing
Number of devices                                 1
  Device Name                                     gfx1012
  Device Vendor                                   Advanced Micro Devices, Inc.
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 2.0 AMD-APP (3075.10)
  Driver Version                                  3075.10 (PAL,LC)
  Device OpenCL C Version                         OpenCL C 2.0 
  Device Type                                     GPU
  Device Board Name (AMD)                         Radeon RX 5500 XT
  Device Topology (AMD)                           PCI-E, 28:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               11
  SIMD per compute unit (AMD)                     2
  SIMD width (AMD)                                32
  SIMD instruction width (AMD)                    1
  Max clock frequency                             1890MHz
  Graphics IP (AMD)                               10.12
  Device Partition                                (core)
    Max number of sub-devices                     11
    Supported partition types                     None
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x1024
  Max work group size                             256
  Preferred work group size (AMD)                 256
  Max work group size (AMD)                       1024
  Preferred work group size multiple              32
  Wavefront width (AMD)                           32
  Preferred / native vector sizes                 
    char                                                 4 / 4       
    short                                                2 / 2       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 1 / 1        (cl_khr_fp16)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             No
    Round to nearest                              No
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              8573157376 (7.984GiB)
  Global free memory (AMD)                        8306688 (7.922GiB)
  Global memory channels (AMD)                    4
  Global memory banks per channel (AMD)           4
  Global memory bank width (AMD)                  256 bytes
  Error Correction support                        No
  Max memory allocation                           7059013632 (6.574GiB)
  Unified memory for Host and Device              No
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   Yes
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       2048 bits (256 bytes)
  Preferred alignment for atomics                 
    SVM                                           0 bytes
    Global                                        0 bytes
    Local                                         0 bytes
  Max size for global variable                    6353112064 (5.917GiB)
  Preferred total size of global vars             8573157376 (7.984GiB)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        16384 (16KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            134217728 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   256 bytes
    Pitch alignment for 2D image buffers          256 pixels
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                64
    Max number of read/write image args           64
  Max number of pipe args                         16
  Max active pipe reservations                    16
  Max pipe packet size                            2764046336 (2.574GiB)
  Local memory type                               Local
  Local memory size                               65536 (64KiB)
  Local memory syze per CU (AMD)                  65536 (64KiB)
  Local memory banks (AMD)                        32
  Max number of constant args                     8
  Max constant buffer size                        7059013632 (6.574GiB)
  Preferred constant buffer size (AMD)            16384 (16KiB)
  Max size of kernel argument                     1024
  Queue properties (on host)                      
    Out-of-order execution                        No
    Profiling                                     Yes
  Queue properties (on device)                    
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Preferred size                                262144 (256KiB)
    Max size                                      8388608 (8MiB)
  Max queues on device                            1
  Max events on device                            1024
  Prefer user sync for interop                    Yes
  Number of P2P devices (AMD)                     0
  P2P devices (AMD)                               <printDeviceInfo:144: get number of CL_DEVICE_P2P_DEVICES_AMD : error -30>
  Profiling timer resolution                      1ns
  Profiling timer offset since Epoch (AMD)        1588399835813280593ns (Sat May  2 08:10:35 2020)
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Thread trace supported (AMD)                  Yes
    Number of async queues (AMD)                  4
    Max real-time compute queues (AMD)            0
    Max real-time compute units (AMD)             0
  printf() buffer size                            4194304 (4MiB)
  Built-in kernels                                
  Device Extensions                               cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_khr_gl_depth_images cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_amd_copy_buffer_p2p 

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              Success [AMD]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 AMD Accelerated Parallel Processing
    Device Name                                   gfx1012
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 AMD Accelerated Parallel Processing
    Device Name                                   gfx1012
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 AMD Accelerated Parallel Processing
    Device Name                                   gfx1012
FAHBench-cmd -d:

Device                    Type        Dev Plat  Platform version             Device version              
----------------------------------------------------------------------------------------------------
gfx1012                   OpenCL        0    0  OpenCL 2.1 AMD-APP (3075.10) OpenCL 2.0 AMD-APP (3075.10)
FAHBench-cmd --device-id=0 --platform=CPU --precision=single --workunit=dhfr --nan-check=0 --run-length=60: (which seems to work as expected)

FAHBench Simulation
-------------------
Plugin directory: "/usr/lib/openmm"
Work unit: dhfr
WU Name: Dihydrofolate reductase
WU Description: A common system for benchmarking molecular dynamics
System XML: /usr/share/fahbench/workunits/dhfr/system.xml
Integrator XML: /usr/share/fahbench/workunits/dhfr/integrator.xml
State XML: /usr/share/fahbench/workunits/dhfr/state.xml
Step chunk: 40
Device ID 0; Platform CPU
Run length: 60s

Loading plugins from plugin directory
Number of registered plugins: 3
Deserializing input files: system
Deserializing input files: state
Deserializing input files: integrator
Creating context (may take several minutes)
Checking accuracy against reference code
Creating reference context (may take several minutes)
Comparing forces and energy
Starting Benchmark
                                                                                
Benchmarking finished
Final score:    2.6677
Scaled score:   2.6677 (23558 atoms)
FAHBench-cmd --device-id=0 --platform-id=0 --platform=OpenCL --precision=single --workunit=dhfr --nan-check=0 --run-length=60: (seems NOT to work as expected)

FAHBench Simulation
-------------------
Plugin directory: "/usr/lib/openmm"
Work unit: dhfr
WU Name: Dihydrofolate reductase
WU Description: A common system for benchmarking molecular dynamics
System XML: /usr/share/fahbench/workunits/dhfr/system.xml
Integrator XML: /usr/share/fahbench/workunits/dhfr/integrator.xml
State XML: /usr/share/fahbench/workunits/dhfr/state.xml
Step chunk: 40
Device ID 0; Platform OpenCL; Platform ID 0
Run length: 60s

Loading plugins from plugin directory
Number of registered plugins: 3
Deserializing input files: system
Deserializing input files: state
Deserializing input files: integrator
Creating context (may take several minutes)
Checking accuracy against reference code
Creating reference context (may take several minutes)
Comparing forces and energy

Something went wrong:
Force RMSE error of 885677 with threshold of 5
Thanks in advance.

Best regards

Re: BAD_WORK_UNIT (114 = 0x72) on AMD Radeon RX 5500 XT

Posted: Sat May 02, 2020 6:42 pm
by PantherX
Welcome to the F@H Forum 4n0n,

I have read somewhere on the Forum that the latest AMD drivers have removed OpenCL support for some older GPU models that F@H still supports. The "fix" was to go back to a previous driver version. By any chance, was there a recent driver update that you can revert to check whether that resolves your problem?

Re: BAD_WORK_UNIT (114 = 0x72) on AMD Radeon RX 5500 XT

Posted: Sat May 02, 2020 8:05 pm
by Joe_H
Or a recent update broke a link to the necessary OpenCL libraries. The RX 5500 is recent enough that it should still have OpenCL support in AMD's drivers. However, I don't know whether the package downloaded from AMD includes that support or whether you also need to install a separate package with matching OpenCL support. You may also need to install the OpenCL-dev package.
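
If you want to check the "broken link" theory, one way is to look at what the OpenCL ICD loader can see. On Linux the loader conventionally reads vendor registrations from /etc/OpenCL/vendors/*.icd, each file naming a vendor library. A small sketch follows. list_icds is a made-up helper name, and the default directory is the conventional one, which your driver package may or may not use.

```shell
#!/bin/sh
# List OpenCL ICD registrations found in a vendors directory.
# Default: the conventional location read by the Khronos ICD loader.
list_icds() {
    dir="${1:-/etc/OpenCL/vendors}"
    found=0
    for f in "$dir"/*.icd; do
        [ -e "$f" ] || continue          # glob matched nothing
        found=$((found + 1))
        # The first line of an .icd file names the vendor library.
        printf '%s -> %s\n' "$f" "$(head -n 1 "$f")"
    done
    [ "$found" -gt 0 ] || echo "no .icd files in $dir"
}

list_icds

# Is an OpenCL loader library known to the dynamic linker cache?
ldconfig -p 2>/dev/null | grep -i 'libOpenCL' ||
    echo "libOpenCL not found in linker cache"
```

An empty list would point to a missing or broken driver install. If the ICD is listed and clinfo works for your own user, the libraries themselves are probably fine.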

Re: BAD_WORK_UNIT (114 = 0x72) on AMD Radeon RX 5500 XT

Posted: Sat May 02, 2020 9:00 pm
by PantherX
I found this topic which might potentially help you: viewtopic.php?f=81&t=33353

Re: BAD_WORK_UNIT (114 = 0x72) on AMD Radeon RX 5500 XT

Posted: Sun May 03, 2020 11:18 am
by 4n0n
PantherX wrote: I found this topic which might potentially help you: viewtopic.php?f=81&t=33353
Thanks for your reply, PantherX. This is indeed the solution to my problem. In the meantime I figured out the following:

1. The root cause is the way FAHClient spawns its child processes on Linux. As a consequence, they do not have sufficient privileges to access the resources (OpenCL) required for GPU processing (see https://github.com/FoldingAtHome/fah-issues/issues/1418 for details). So a real "solution" can only come from the devs fixing the bug in FAHClient.

2. There exist two conceptually different workarounds for the problem:

a) Change the user that FAHClient runs as from "fahclient" to "root", as noted here: viewtopic.php?f=74&t=31096#p303883
To do this, edit the file /etc/init.d/FAHClient and change the line

USER=fahclient

to

USER=root

Warning: this workaround might put your system at risk. Running jobs as "root" is normally a bad idea.

b) Create a systemd service for starting FAHClient.
This approach is described here: viewtopic.php?f=81&t=33353
As far as I know, this has no security impact on your system, so it is the method I prefer.
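
For reference, a rough sketch of what workaround b) can look like. It is not copied from the linked topic. It is a minimal unit assembled from the paths visible in the log above (/etc/fahclient/config.xml, /var/lib/fahclient), plus the assumption that the client binary is /usr/bin/FAHClient. The unit name fahclient.service, the Restart policy and the UNIT_FILE variable are also assumptions. The idea is that systemd starts the client directly as the fahclient user (User=), so the client no longer has to switch users itself via --run-as. The script only writes the file into the current directory so it can be reviewed first.

```shell
#!/bin/sh
# Sketch only: write a candidate systemd unit for FAHClient to a local file.
# Paths are taken from the log in this thread; /usr/bin/FAHClient is assumed.
unit_file="${UNIT_FILE:-fahclient.service}"

cat > "$unit_file" <<'EOF'
[Unit]
Description=Folding@home client (sketch, adjust before use)
After=network.target

[Service]
Type=simple
# systemd switches to this user itself, so no --run-as and no --daemon here
User=fahclient
WorkingDirectory=/var/lib/fahclient
ExecStart=/usr/bin/FAHClient --config /etc/fahclient/config.xml
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

echo "wrote $unit_file"
```

If it looks right, copying it to /etc/systemd/system/, running systemctl daemon-reload and then systemctl enable --now fahclient should start it. Presumably the /etc/init.d/FAHClient script has to be disabled first so that two clients do not run at the same time.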