
Debian 8 (Jessie) AMD R9 280 Issues

Posted: Mon Nov 21, 2016 9:43 pm
by picoutputcls
Hi there,

I downloaded FAH for the first time over the weekend. I was able to get things set up and running on my CPU fairly quickly but initially had some problems getting the client to see my GPU. After a bit of reading around I realised I needed to switch from using the Open Source AMD drivers to the official binaries.

I started by downloading the most recent driver from the AMD site.

But after running with this driver I've been getting the following errors in my logs:

Code:

Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
or

Code:

FahCore returned: BAD_WORK_UNIT (114 = 0x72)
My full log is as follows:

Code:

*********************** Log Started 2016-11-21T21:10:21Z ***********************
21:10:21:************************* Folding@home Client *************************
21:10:21:    Website: http://folding.stanford.edu/
21:10:21:  Copyright: (c) 2009-2014 Stanford University
21:10:21:     Author: Joseph Coffland <[email protected]>
21:10:21:       Args: --child --lifeline 1037 /etc/fahclient/config.xml --run-as
21:10:21:             fahclient --pid-file=/var/run/fahclient.pid --daemon
21:10:21:     Config: /etc/fahclient/config.xml
21:10:21:******************************** Build ********************************
21:10:21:    Version: 7.4.4
21:10:21:       Date: Mar 4 2014
21:10:21:       Time: 12:02:38
21:10:21:    SVN Rev: 4130
21:10:21:     Branch: fah/trunk/client
21:10:21:   Compiler: GNU 4.4.7
21:10:21:    Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
21:10:21:             -fno-unsafe-math-optimizations -msse2
21:10:21:   Platform: linux2 3.2.0-1-amd64
21:10:21:       Bits: 64
21:10:21:       Mode: Release
21:10:21:******************************* System ********************************
21:10:21:        CPU: AMD FX(tm)-8120 Eight-Core Processor
21:10:21:     CPU ID: AuthenticAMD Family 21 Model 1 Stepping 2
21:10:21:       CPUs: 8
21:10:21:     Memory: 15.65GiB
21:10:21:Free Memory: 15.40GiB
21:10:21:    Threads: POSIX_THREADS
21:10:21: OS Version: 3.16
21:10:21:Has Battery: false
21:10:21: On Battery: false
21:10:21: UTC Offset: 0
21:10:21:        PID: 1050
21:10:21:        CWD: /var/lib/fahclient
21:10:21:         OS: Linux 3.16.0-4-amd64 x86_64
21:10:21:    OS Arch: AMD64
21:10:21:       GPUs: 1
21:10:21:      GPU 0: ATI:5 Tahiti PRO [Radeon HD 7950]
21:10:21:       CUDA: Not detected
21:10:21:***********************************************************************
21:10:21:<config>
21:10:21:  <!-- Client Control -->
21:10:21:  <fold-anon v='true'/>
21:10:21:
21:10:21:  <!-- Folding Slot Configuration -->
21:10:21:  <cause v='ALZHEIMERS'/>
21:10:21:  <gpu v='false'/>
21:10:21:
21:10:21:  <!-- Network -->
21:10:21:  <proxy v=':8080'/>
21:10:21:
21:10:21:  <!-- Slot Control -->
21:10:21:  <pause-on-battery v='false'/>
21:10:21:  <power v='full'/>
21:10:21:
21:10:21:  <!-- User Information -->
21:10:21:  <passkey v='********************************'/>
21:10:21:  <team v='231300'/>
21:10:21:  <user v='PicoutputCls'/>
21:10:21:
21:10:21:  <!-- Folding Slots -->
21:10:21:  <slot id='0' type='CPU'>
21:10:21:    <cpus v='6'/>
21:10:21:  </slot>
21:10:21:  <slot id='2' type='CPU'>
21:10:21:    <cpus v='1'/>
21:10:21:  </slot>
21:10:21:  <slot id='1' type='GPU'/>
21:10:21:</config>
21:10:21:Switching to user fahclient
21:10:21:Trying to access database...
21:10:21:Successfully acquired database lock
21:10:21:Enabled folding slot 00: READY cpu:6
21:10:21:Enabled folding slot 02: READY cpu:1
21:10:21:Enabled folding slot 01: READY gpu:0:Tahiti PRO [Radeon HD 7950]
21:10:21:WU02:FS02:Starting
21:10:21:WU02:FS02:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 02 -suffix 01 -version 704 -lifeline 1050 -checkpoint 15
21:10:21:WU02:FS02:Started FahCore on PID 1072
21:10:21:WU02:FS02:Core PID:1080
21:10:21:WU02:FS02:FahCore 0xa4 started
21:10:22:WU00:FS01:Starting
21:10:22:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/ATI/R600/Core_21.fah/FahCore_21 -dir 00 -suffix 01 -version 704 -lifeline 1050 -checkpoint 15 -gpu 0 -gpu-vendor ati
21:10:22:WU00:FS01:Started FahCore on PID 1089
21:10:22:WU00:FS01:Core PID:1093
21:10:22:WU00:FS01:FahCore 0x21 started
21:10:22:WU01:FS00:Starting
21:10:22:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 704 -lifeline 1050 -checkpoint 15 -np 6
21:10:22:WU01:FS00:Started FahCore on PID 1094
21:10:22:WU01:FS00:Core PID:1098
21:10:22:WU01:FS00:FahCore 0xa4 started
21:10:22:WU02:FS02:0xa4:
21:10:22:WU02:FS02:0xa4:*------------------------------*
21:10:22:WU02:FS02:0xa4:Folding@Home Gromacs GB Core
21:10:22:WU02:FS02:0xa4:Version 2.27 (Dec. 15, 2010)
21:10:22:WU02:FS02:0xa4:
21:10:22:WU02:FS02:0xa4:Preparing to commence simulation
21:10:22:WU02:FS02:0xa4:- Looking at optimizations...
21:10:22:WU02:FS02:0xa4:- Files status OK
21:10:22:WU02:FS02:0xa4:- Expanded 824911 -> 1398040 (decompressed 169.4 percent)
21:10:22:WU02:FS02:0xa4:Called DecompressByteArray: compressed_data_size=824911 data_size=1398040, decompressed_data_size=1398040 diff=0
21:10:22:WU02:FS02:0xa4:- Digital signature verified
21:10:22:WU02:FS02:0xa4:
21:10:22:WU02:FS02:0xa4:Project: 9039 (Run 376, Clone 1, Gen 553)
21:10:22:WU02:FS02:0xa4:
21:10:22:WU02:FS02:0xa4:Assembly optimizations on if available.
21:10:22:WU02:FS02:0xa4:Entering M.D.
21:10:22:WU01:FS00:0xa4:
21:10:22:WU01:FS00:0xa4:*------------------------------*
21:10:22:WU01:FS00:0xa4:Folding@Home Gromacs GB Core
21:10:22:WU01:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
21:10:22:WU01:FS00:0xa4:
21:10:22:WU01:FS00:0xa4:Preparing to commence simulation
21:10:22:WU01:FS00:0xa4:- Looking at optimizations...
21:10:22:WU01:FS00:0xa4:- Files status OK
21:10:22:WU01:FS00:0xa4:- Expanded 826278 -> 1402440 (decompressed 169.7 percent)
21:10:22:WU01:FS00:0xa4:Called DecompressByteArray: compressed_data_size=826278 data_size=1402440, decompressed_data_size=1402440 diff=0
21:10:22:WU01:FS00:0xa4:- Digital signature verified
21:10:22:WU01:FS00:0xa4:
21:10:22:WU01:FS00:0xa4:Project: 9040 (Run 261, Clone 0, Gen 320)
21:10:22:WU01:FS00:0xa4:
21:10:22:WU01:FS00:0xa4:Assembly optimizations on if available.
21:10:22:WU01:FS00:0xa4:Entering M.D.
21:10:24:WU00:FS01:0x21:*********************** Log Started 2016-11-21T21:10:23Z ***********************
21:10:24:WU00:FS01:0x21:Project: 9191 (Run 0, Clone 43, Gen 89)
21:10:24:WU00:FS01:0x21:Unit: 0x00000085ab40415457cb2c937ef3e2bb
21:10:24:WU00:FS01:0x21:CPU: 0x00000000000000000000000000000000
21:10:24:WU00:FS01:0x21:Machine: 1
21:10:24:WU00:FS01:0x21:Digital signatures verified
21:10:24:WU00:FS01:0x21:Folding@home GPU Core21 Folding@home Core
21:10:24:WU00:FS01:0x21:Version 0.0.17
21:10:28:WU02:FS02:0xa4:Using Gromacs checkpoints
21:10:28:WU01:FS00:0xa4:Using Gromacs checkpoints
21:10:28:WU02:FS02:0xa4:Resuming from checkpoint
21:10:28:WU02:FS02:0xa4:Verified 02/wudata_01.log
21:10:28:WU02:FS02:0xa4:Verified 02/wudata_01.trr
21:10:28:WU02:FS02:0xa4:Verified 02/wudata_01.xtc
21:10:28:WU02:FS02:0xa4:Verified 02/wudata_01.edr
21:10:28:WU01:FS00:0xa4:Resuming from checkpoint
21:10:28:WU01:FS00:0xa4:Verified 01/wudata_01.log
21:10:28:WU01:FS00:0xa4:Verified 01/wudata_01.trr
21:10:28:WU01:FS00:0xa4:Verified 01/wudata_01.xtc
21:10:28:WU01:FS00:0xa4:Verified 01/wudata_01.edr
21:10:28:WU01:FS00:0xa4:Completed 187115 out of 250000 steps  (74%)
21:10:29:WU02:FS02:0xa4:Completed 236905 out of 250000 steps  (94%)
21:10:33:WU00:FS01:0x21:Completed 0 out of 2500000 steps (0%)
21:10:33:WU00:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
21:11:02:WU01:FS00:0xa4:Completed 187500 out of 250000 steps  (75%)
21:11:59:WU00:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
21:12:11:WU00:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
21:12:11:WU00:FS01:Starting
21:12:11:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/ATI/R600/Core_21.fah/FahCore_21 -dir 00 -suffix 01 -version 704 -lifeline 1050 -checkpoint 15 -gpu 0 -gpu-vendor ati
21:12:11:WU00:FS01:Started FahCore on PID 1916
21:12:11:WU00:FS01:Core PID:1920
21:12:11:WU00:FS01:FahCore 0x21 started
21:12:11:WU00:FS01:0x21:*********************** Log Started 2016-11-21T21:12:11Z ***********************
21:12:11:WU00:FS01:0x21:Project: 9191 (Run 0, Clone 43, Gen 89)
21:12:11:WU00:FS01:0x21:Unit: 0x00000085ab40415457cb2c937ef3e2bb
21:12:11:WU00:FS01:0x21:CPU: 0x00000000000000000000000000000000
21:12:11:WU00:FS01:0x21:Machine: 1
21:12:11:WU00:FS01:0x21:Digital signatures verified
21:12:11:WU00:FS01:0x21:Folding@home GPU Core21 Folding@home Core
21:12:11:WU00:FS01:0x21:Version 0.0.17
21:12:23:WU00:FS01:0x21:Completed 0 out of 2500000 steps (0%)
21:12:23:WU00:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
21:13:40:WU00:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
21:14:14:WU02:FS02:0xa4:Completed 237500 out of 250000 steps  (95%)
21:14:34:WU01:FS00:0xa4:Completed 190000 out of 250000 steps  (76%)
21:14:58:WU00:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
21:16:13:WU00:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
21:16:13:WU00:FS01:0x21:ERROR:Max Retries Reached
21:16:13:WU00:FS01:0x21:Saving result file logfile_01.txt
21:16:13:WU00:FS01:0x21:Saving result file badstate-0.xml
21:16:16:WU00:FS01:0x21:Saving result file badstate-1.xml
21:16:19:WU00:FS01:0x21:Saving result file badstate-2.xml
21:16:22:WU00:FS01:0x21:Saving result file log.txt
21:16:22:WU00:FS01:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
21:16:23:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
21:16:23:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:9191 run:0 clone:43 gen:89 core:0x21 unit:0x00000085ab40415457cb2c937ef3e2bb
21:16:23:WU00:FS01:Uploading 7.10KiB to 171.64.65.84
21:16:23:WU00:FS01:Connecting to 171.64.65.84:8080
21:16:23:WU00:FS01:Upload complete
21:16:23:WU03:FS01:Connecting to 171.67.108.45:80
21:16:23:WU00:FS01:Server responded WORK_ACK (400)
21:16:23:WU00:FS01:Cleaning up
21:16:24:WU03:FS01:Assigned to work server 140.163.4.243
21:16:24:WU03:FS01:Requesting new work unit for slot 01: READY gpu:0:Tahiti PRO [Radeon HD 7950] from 140.163.4.243
21:16:24:WU03:FS01:Connecting to 140.163.4.243:8080
21:16:24:WU03:FS01:Downloading 2.67MiB
21:16:26:WU03:FS01:Download complete
21:16:26:WU03:FS01:Received Unit: id:03 state:DOWNLOAD error:NO_ERROR project:11709 run:1 clone:172 gen:63 core:0x21 unit:0x0000005c8ca304f357ed33dbe433e274
21:16:26:WU03:FS01:Starting
21:16:26:WU03:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/ATI/R600/Core_21.fah/FahCore_21 -dir 03 -suffix 01 -version 704 -lifeline 1050 -checkpoint 15 -gpu 0 -gpu-vendor ati
21:16:26:WU03:FS01:Started FahCore on PID 2152
21:16:26:WU03:FS01:Core PID:2156
21:16:26:WU03:FS01:FahCore 0x21 started
21:16:27:WU03:FS01:0x21:*********************** Log Started 2016-11-21T21:16:26Z ***********************
21:16:27:WU03:FS01:0x21:Project: 11709 (Run 1, Clone 172, Gen 63)
21:16:27:WU03:FS01:0x21:Unit: 0x0000005c8ca304f357ed33dbe433e274
21:16:27:WU03:FS01:0x21:CPU: 0x00000000000000000000000000000000
21:16:27:WU03:FS01:0x21:Machine: 1
21:16:27:WU03:FS01:0x21:Reading tar file core.xml
21:16:27:WU03:FS01:0x21:Reading tar file system.xml
21:16:27:WU03:FS01:0x21:Reading tar file integrator.xml
21:16:27:WU03:FS01:0x21:Reading tar file state.xml
21:16:27:WU03:FS01:0x21:Digital signatures verified
21:16:27:WU03:FS01:0x21:Folding@home GPU Core21 Folding@home Core
21:16:27:WU03:FS01:0x21:Version 0.0.17
21:16:37:WU03:FS01:0x21:Completed 0 out of 7500000 steps (0%)
21:16:37:WU03:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
21:17:54:WU03:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
21:17:58:WU01:FS00:0xa4:Completed 192500 out of 250000 steps  (77%)
21:19:08:WU03:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
21:20:21:WU03:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
21:20:21:WU03:FS01:0x21:ERROR:Max Retries Reached
21:20:21:WU03:FS01:0x21:Saving result file logfile_01.txt
21:20:21:WU03:FS01:0x21:Saving result file badstate-0.xml
21:20:24:WU03:FS01:0x21:Saving result file badstate-1.xml
21:20:27:WU03:FS01:0x21:Saving result file badstate-2.xml
21:20:30:WU03:FS01:0x21:Saving result file log.txt
21:20:31:WU03:FS01:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
21:20:32:WARNING:WU03:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
21:20:32:WU03:FS01:Sending unit results: id:03 state:SEND error:FAULTY project:11709 run:1 clone:172 gen:63 core:0x21 unit:0x0000005c8ca304f357ed33dbe433e274
21:20:32:WU03:FS01:Uploading 6.25KiB to 140.163.4.243
21:20:32:WU03:FS01:Connecting to 140.163.4.243:8080
21:20:32:WU00:FS01:Connecting to 171.67.108.45:80
21:20:32:WU03:FS01:Upload complete
21:20:32:WU03:FS01:Server responded WORK_ACK (400)
21:20:32:WU03:FS01:Cleaning up
21:20:32:WU00:FS01:Assigned to work server 171.64.65.92
21:20:32:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:Tahiti PRO [Radeon HD 7950] from 171.64.65.92
21:20:32:WU00:FS01:Connecting to 171.64.65.92:8080
21:20:33:WU00:FS01:Downloading 2.52MiB
21:20:36:WU00:FS01:Download complete
21:20:36:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:9194 run:0 clone:72 gen:126 core:0x21 unit:0x000000a6ab40415c57cb2e2086193c9b
21:20:36:WU00:FS01:Starting
21:20:36:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/ATI/R600/Core_21.fah/FahCore_21 -dir 00 -suffix 01 -version 704 -lifeline 1050 -checkpoint 15 -gpu 0 -gpu-vendor ati
21:20:36:WU00:FS01:Started FahCore on PID 2486
21:20:36:WU00:FS01:Core PID:2490
21:20:36:WU00:FS01:FahCore 0x21 started
21:20:37:WU00:FS01:0x21:*********************** Log Started 2016-11-21T21:20:36Z ***********************
21:20:37:WU00:FS01:0x21:Project: 9194 (Run 0, Clone 72, Gen 126)
21:20:37:WU00:FS01:0x21:Unit: 0x000000a6ab40415c57cb2e2086193c9b
21:20:37:WU00:FS01:0x21:CPU: 0x00000000000000000000000000000000
21:20:37:WU00:FS01:0x21:Machine: 1
21:20:37:WU00:FS01:0x21:Reading tar file core.xml
21:20:37:WU00:FS01:0x21:Reading tar file system.xml
21:20:37:WU00:FS01:0x21:Reading tar file integrator.xml
21:20:37:WU00:FS01:0x21:Reading tar file state.xml
21:20:37:WU00:FS01:0x21:Digital signatures verified
21:20:37:WU00:FS01:0x21:Folding@home GPU Core21 Folding@home Core
21:20:37:WU00:FS01:0x21:Version 0.0.17
21:20:49:WU00:FS01:0x21:Completed 0 out of 2500000 steps (0%)
21:20:49:WU00:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
21:21:33:WU01:FS00:0xa4:Completed 195000 out of 250000 steps  (78%)
21:21:55:WU00:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
21:23:00:WU00:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
21:24:09:WU00:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
21:24:09:WU00:FS01:0x21:ERROR:Max Retries Reached
21:24:09:WU00:FS01:0x21:Saving result file logfile_01.txt
21:24:09:WU00:FS01:0x21:Saving result file badstate-0.xml
21:24:11:WU00:FS01:0x21:Saving result file badstate-1.xml
21:24:14:WU00:FS01:0x21:Saving result file badstate-2.xml
21:24:17:WU00:FS01:0x21:Saving result file log.txt
21:24:17:WU00:FS01:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
21:24:18:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
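
For anyone triaging a log like the one above, here is a small shell sketch that tallies the "Bad State detected" lines per work unit and slot, and counts the FahCore return codes. The function names are made up for illustration, and the log location in the usage note is an assumption based on the CWD line in the log.

```shell
# Count "Bad State detected" lines per work unit and slot.
# Lines look like HH:MM:SS:WUxx:FSxx:0x21:Bad State detected...,
# so with ':' as the separator fields 4 and 5 are the WU and the slot.
count_bad_states() {
    awk -F: '/Bad State detected/ { n[$4 ":" $5]++ }
             END { for (k in n) print k, n[k] }' "$1" | sort
}

# Count how often each FahCore return code appears
# (for example BAD_WORK_UNIT or INTERRUPTED).
count_core_returns() {
    sed -n 's/.*FahCore returned: \([A-Z_]*\).*/\1/p' "$1" |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

For example, `count_bad_states /var/lib/fahclient/log.txt` (path assumed, not confirmed by the post) prints one line per WU/slot pair followed by its count.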
The other behaviour I have noticed is that when I run

Code:

aticonfig --odgc
I see that the GPU is being used very little, if at all.

I read elsewhere online that it may be a driver incompatibility with FAHClient, so I have tried installing every release of the AMD Linux driver since V14.12 (anything earlier seems to have issues with my kernel version), as well as the binaries provided through the Debian repositories, all with the same behaviour.

I was wondering if anyone had any advice or is running FAH in Linux with an R9 280 and could give me some pointers to a known working configuration I could maybe try?

I realise the issue may be my card, but from gaming and the fact that I seem to be able to compile and run basic OpenCL code, I am inclined to believe that is most likely not the case. Does anyone know of any benchmarks which might help rule this out?

Thanks in advance!

Re: Debian 8 (Jessie) AMD R9 280 Issues

Posted: Tue Nov 22, 2016 6:30 pm
by DeeGee
You probably have the same permission problem that I had (and still have). I just run FAHClient as my normal user in a screen session and it works. But if I try to run FAHClient as a service under its own user account, it doesn't work.

For more info check this forum thread: viewtopic.php?f=96&t=28504.
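
Since the difference seems to come down to which account runs FAHClient, one thing that can be compared is the group membership of the two accounts. This is only a sketch: the function name is hypothetical, and a missing group is a hint about where to look, not a confirmed cause.

```shell
# List the groups that the reference user ($1) belongs to
# but the user being checked ($2) does not.
groups_missing_from() {
    have=$(id -nG "$2" 2>/dev/null | tr ' ' '\n')
    for g in $(id -nG "$1" 2>/dev/null); do
        printf '%s\n' "$have" | grep -qxF -- "$g" || printf '%s\n' "$g"
    done
}
```

Running `groups_missing_from my-linux-username fahclient` would list any groups your normal login has that the fahclient account lacks.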

Re: Debian 8 (Jessie) AMD R9 280 Issues

Posted: Tue Nov 22, 2016 7:45 pm
by picoutputcls
Thanks DeeGee! :D

I just stopped my fahclient service and edited the service configuration in

Code:

/etc/init.d/FAHClient
so that the line

Code:

USER=fahclient
instead read

Code:

USER=my-linux-username
and then ran

Code:

sudo systemctl daemon-reload
before starting the fahclient service once more.
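
In case it helps anyone repeating this, the edit to the init script can be sketched as a small shell function. The function name is hypothetical, and it assumes the script contains a single line starting with `USER=`, as mine did. It rewrites the file in place, so try it on a copy first.

```shell
# Rewrite the USER= line of an init script.
# $1 = path to the script, $2 = account the service should run as.
set_service_user() {
    tmp=$(mktemp) || return 1
    if sed "s/^USER=.*/USER=$2/" "$1" > "$tmp"; then
        # Write back with cat so the original file keeps its mode and owner.
        cat "$tmp" > "$1"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}
```

Called as root with `/etc/init.d/FAHClient` and your own user name, this makes the same change as the manual edit. The service still needs to be stopped beforehand, and the daemon reloaded and the service started again afterwards.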

Initially this did not work, but I managed to trace that down to a database write-permission issue in

Code:

/var/lib/fahclient
To resolve this I changed the ownership of that directory with

Code:

chown -R my-linux-username:my-linux-username /var/lib/fahclient
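
After changing ownership of the data directory, a quick way to confirm nothing was missed is to list any files still owned by someone else. This is a minimal sketch with a made-up function name. Empty output means everything under the directory belongs to the given user.

```shell
# Print every file or directory under $1 that is NOT owned by $2.
# $2 may be a user name or a numeric uid.
files_not_owned_by() {
    find "$1" ! -user "$2" -print
}
```

For example, `files_not_owned_by /var/lib/fahclient my-linux-username` should print nothing once the ownership change has been applied recursively.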

I can now see GPU load when I watch

Code:

aticonfig --odgc
and my logs no longer seem to contain the errors I was having before.

Hopefully this has fixed the issue, but if not I'll be sure to comment below.