Failing all GPU Work Units

Posted: Sat Dec 20, 2014 10:44 pm
by compdewd
Hi there,

I seem to be failing GPU Work Units on both of my graphics cards. My cards have not been folding for a week since I have been on vacation. I just got back home and started the cards back up and they are now getting BAD_WORK_UNIT errors on every unit that they receive from several different projects. Before I went on vacation, the cards were working completely normally and since then I have made no changes. There are no (manual) overclocks on either card.

I have restarted my computer, deleted and re-added the GPU slots, deleted FahCore_17 from /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_17.fah/, and uninstalled and reinstalled FAHClient, all with no success. Can anyone recommend any other steps? (I have paused the slots because I don't know whether all of these dumped units are hurting my Unit Return Ratio.)

System details:

*********************** Log Started 2014-12-20T22:20:35Z ***********************
22:20:35:************************* Folding@home Client *************************
22:20:35:    Website: http://folding.stanford.edu/
22:20:35:  Copyright: (c) 2009-2014 Stanford University
22:20:35:     Author: Joseph Coffland <[email protected]>
22:20:35:       Args: --child --lifeline 3805 /etc/fahclient/config.xml --run-as
22:20:35:             fahclient --pid-file=/var/run/fahclient.pid --daemon
22:20:35:     Config: /etc/fahclient/config.xml
22:20:35:******************************** Build ********************************
22:20:35:    Version: 7.4.4
22:20:35:       Date: Mar 4 2014
22:20:35:       Time: 12:02:38
22:20:35:    SVN Rev: 4130
22:20:35:     Branch: fah/trunk/client
22:20:35:   Compiler: GNU 4.4.7
22:20:35:    Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
22:20:35:             -fno-unsafe-math-optimizations -msse2
22:20:35:   Platform: linux2 3.2.0-1-amd64
22:20:35:       Bits: 64
22:20:35:       Mode: Release
22:20:35:******************************* System ********************************
22:20:35:        CPU: AMD FX(tm)-8120 Eight-Core Processor
22:20:35:     CPU ID: AuthenticAMD Family 21 Model 1 Stepping 2
22:20:35:       CPUs: 8
22:20:35:     Memory: 3.84GiB
22:20:35:Free Memory: 643.03MiB
22:20:35:    Threads: POSIX_THREADS
22:20:35: OS Version: 3.13
22:20:35:Has Battery: false
22:20:35: On Battery: false
22:20:35: UTC Offset: -5
22:20:35:        PID: 3807
22:20:35:        CWD: /var/lib/fahclient
22:20:35:         OS: Linux 3.13.0-24-generic x86_64
22:20:35:    OS Arch: AMD64
22:20:35:       GPUs: 2
22:20:35:      GPU 0: NVIDIA:2 GF104 [GeForce GTX 460]
22:20:35:      GPU 1: NVIDIA:3 GK106 [GeForce GTX 650 Ti]
22:20:35:       CUDA: Not detected
22:20:35:***********************************************************************

Relevant log information:

22:21:23:Adding folding slot 01: READY gpu:0:GF104 [GeForce GTX 460]
22:21:23:Saving configuration to /etc/fahclient/config.xml
22:21:23:<config>
22:21:23:  <!-- Client Control -->
22:21:23:  <fold-anon v='true'/>
22:21:23:
22:21:23:  <!-- Folding Slot Configuration -->
22:21:23:  <gpu v='false'/>
22:21:23:
22:21:23:  <!-- Network -->
22:21:23:  <proxy v=':8080'/>
22:21:23:
22:21:23:  <!-- Folding Slots -->
22:21:23:  <slot id='1' type='GPU'>
22:21:23:    <cuda-index v='1'/>
22:21:23:    <gpu-index v='0'/>
22:21:23:    <opencl-index v='1'/>
22:21:23:  </slot>
22:21:23:</config>
22:21:23:WARNING:WU00:Slot ID 0 no longer exists and there are no other matching slots, dumping
22:21:23:WU00:Sending unit results: id:00 state:SEND error:DUMPED project:9009 run:659 clone:1 gen:126 core:0xa4 unit:0x0000008d664f2de453868a0c32745789
22:21:23:WU00:Connecting to 171.64.65.124:8080
22:21:24:WU00:Server responded WORK_ACK (400)
22:21:24:WU00:Cleaning up
22:21:24:WU01:FS01:Connecting to 171.67.108.200:80
22:21:24:WU01:FS01:Assigned to work server 171.67.108.52
22:21:24:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:GF104 [GeForce GTX 460] from 171.67.108.52
22:21:24:WU01:FS01:Connecting to 171.67.108.52:8080
22:21:24:WU01:FS01:Downloading 1.52MiB
22:21:27:WU01:FS01:Download complete
22:21:27:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:9201 run:626 clone:3 gen:106 core:0x17 unit:0x000000a16652edc45399eea7e9ddf0c0
22:21:27:WU01:FS01:Downloading core from http://web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_17.fah
22:21:27:WU01:FS01:Connecting to web.stanford.edu:80
22:21:27:WU01:FS01:FahCore 17: Downloading 3.01MiB
22:21:31:WU01:FS01:FahCore 17: Download complete
22:21:31:WU01:FS01:Valid core signature
22:21:31:WU01:FS01:Unpacked 8.16MiB to cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17
22:21:31:WU01:FS01:Starting
22:21:31:WU01:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17 -dir 01 -suffix 01 -version 704 -lifeline 3807 -checkpoint 15 -gpu 1 -gpu-vendor nvidia
22:21:31:WU01:FS01:Started FahCore on PID 3850
22:21:31:WU01:FS01:Core PID:3854
22:21:31:WU01:FS01:FahCore 0x17 started
22:21:32:WU01:FS01:0x17:*********************** Log Started 2014-12-20T22:21:31Z ***********************
22:21:32:WU01:FS01:0x17:Project: 9201 (Run 626, Clone 3, Gen 106)
22:21:32:WU01:FS01:0x17:Unit: 0x000000a16652edc45399eea7e9ddf0c0
22:21:32:WU01:FS01:0x17:CPU: 0x00000000000000000000000000000000
22:21:32:WU01:FS01:0x17:Machine: 1
22:21:32:WU01:FS01:0x17:Reading tar file state.xml
22:21:32:WU01:FS01:0x17:Reading tar file system.xml
22:21:32:WU01:FS01:0x17:Reading tar file integrator.xml
22:21:32:WU01:FS01:0x17:Reading tar file core.xml
22:21:32:WU01:FS01:0x17:Digital signatures verified
22:21:32:WU01:FS01:0x17:ERROR:exception: Bad platformId size.
22:21:32:WU01:FS01:0x17:Saving result file logfile_01.txt
22:21:32:WU01:FS01:0x17:Saving result file log.txt
22:21:32:WU01:FS01:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
22:21:32:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
22:21:32:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:9201 run:626 clone:3 gen:106 core:0x17 unit:0x000000a16652edc45399eea7e9ddf0c0
22:21:32:WU01:FS01:Uploading 1.86KiB to 171.67.108.52
22:21:32:WU01:FS01:Connecting to 171.67.108.52:8080
22:21:32:WU00:FS01:Connecting to 171.67.108.200:80
22:21:32:WU01:FS01:Upload complete
22:21:32:WU01:FS01:Server responded WORK_ACK (400)
22:21:32:WU01:FS01:Cleaning up
22:21:33:WU00:FS01:Assigned to work server 171.67.108.52
22:21:33:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GF104 [GeForce GTX 460] from 171.67.108.52
22:21:33:WU00:FS01:Connecting to 171.67.108.52:8080
22:21:33:WU00:FS01:Downloading 1.53MiB
22:21:36:Saving configuration to /etc/fahclient/config.xml
22:21:36:<config>
22:21:36:  <!-- Client Control -->
22:21:36:  <fold-anon v='true'/>
22:21:36:
22:21:36:  <!-- Folding Slot Configuration -->
22:21:36:  <gpu v='false'/>
22:21:36:
22:21:36:  <!-- Network -->
22:21:36:  <proxy v=':8080'/>
22:21:36:
22:21:36:  <!-- Folding Slots -->
22:21:36:  <slot id='1' type='GPU'>
22:21:36:    <cuda-index v='1'/>
22:21:36:    <gpu-index v='0'/>
22:21:36:    <opencl-index v='1'/>
22:21:36:  </slot>
22:21:36:</config>
22:21:36:WU00:FS01:Download complete
22:21:36:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:9201 run:265 clone:2 gen:78 core:0x17 unit:0x000000816652edc45399e06f480e4522
22:21:36:WU00:FS01:Starting
22:21:36:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17 -dir 00 -suffix 01 -version 704 -lifeline 3807 -checkpoint 15 -gpu 1 -gpu-vendor nvidia
22:21:36:WU00:FS01:Started FahCore on PID 3857
22:21:36:WU00:FS01:Core PID:3861
22:21:36:WU00:FS01:FahCore 0x17 started
22:21:36:WU00:FS01:0x17:*********************** Log Started 2014-12-20T22:21:36Z ***********************
22:21:36:WU00:FS01:0x17:Project: 9201 (Run 265, Clone 2, Gen 78)
22:21:36:WU00:FS01:0x17:Unit: 0x000000816652edc45399e06f480e4522
22:21:36:WU00:FS01:0x17:CPU: 0x00000000000000000000000000000000
22:21:36:WU00:FS01:0x17:Machine: 1
22:21:36:WU00:FS01:0x17:Reading tar file state.xml
22:21:36:WU00:FS01:0x17:Reading tar file system.xml
22:21:36:WU00:FS01:0x17:Reading tar file integrator.xml
22:21:36:WU00:FS01:0x17:Reading tar file core.xml
22:21:36:WU00:FS01:0x17:Digital signatures verified
22:21:36:WU00:FS01:0x17:ERROR:exception: Bad platformId size.
22:21:36:WU00:FS01:0x17:Saving result file logfile_01.txt
22:21:36:WU00:FS01:0x17:Saving result file log.txt
22:21:36:WU00:FS01:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
22:21:37:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
22:21:37:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:9201 run:265 clone:2 gen:78 core:0x17 unit:0x000000816652edc45399e06f480e4522
22:21:37:WU00:FS01:Uploading 1.87KiB to 171.67.108.52
22:21:37:WU00:FS01:Connecting to 171.67.108.52:8080
22:21:37:WU00:FS01:Upload complete
22:21:37:WU00:FS01:Server responded WORK_ACK (400)
22:21:37:WU00:FS01:Cleaning up
22:21:37:WU01:FS01:Connecting to 171.67.108.200:80
22:21:38:WU01:FS01:Assigned to work server 171.67.108.52
22:21:38:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:GF104 [GeForce GTX 460] from 171.67.108.52
22:21:38:WU01:FS01:Connecting to 171.67.108.52:8080
22:21:38:WU01:FS01:Downloading 1.52MiB
22:21:40:WU01:FS01:Download complete
22:21:41:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:9201 run:466 clone:4 gen:76 core:0x17 unit:0x000000706652edc45399e861591d2356
22:21:41:WU01:FS01:Starting
22:21:41:WU01:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17 -dir 01 -suffix 01 -version 704 -lifeline 3807 -checkpoint 15 -gpu 1 -gpu-vendor nvidia
22:21:41:WU01:FS01:Started FahCore on PID 3864
22:21:41:WU01:FS01:Core PID:3868
22:21:41:WU01:FS01:FahCore 0x17 started
22:21:41:WU01:FS01:0x17:*********************** Log Started 2014-12-20T22:21:41Z ***********************
22:21:41:WU01:FS01:0x17:Project: 9201 (Run 466, Clone 4, Gen 76)
22:21:41:WU01:FS01:0x17:Unit: 0x000000706652edc45399e861591d2356
22:21:41:WU01:FS01:0x17:CPU: 0x00000000000000000000000000000000
22:21:41:WU01:FS01:0x17:Machine: 1
22:21:41:WU01:FS01:0x17:Reading tar file state.xml
22:21:41:WU01:FS01:0x17:Reading tar file system.xml
22:21:41:WU01:FS01:0x17:Reading tar file integrator.xml
22:21:41:WU01:FS01:0x17:Reading tar file core.xml
22:21:41:WU01:FS01:0x17:Digital signatures verified
22:21:41:WU01:FS01:0x17:ERROR:exception: Bad platformId size.
22:21:41:WU01:FS01:0x17:Saving result file logfile_01.txt
22:21:41:WU01:FS01:0x17:Saving result file log.txt
22:21:41:WU01:FS01:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
22:21:41:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
22:21:41:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:9201 run:466 clone:4 gen:76 core:0x17 unit:0x000000706652edc45399e861591d2356
22:21:41:WU01:FS01:Uploading 1.86KiB to 171.67.108.52
22:21:41:WU01:FS01:Connecting to 171.67.108.52:8080
22:21:42:WU01:FS01:Upload complete
22:21:42:WU01:FS01:Server responded WORK_ACK (400)
22:21:42:WU01:FS01:Cleaning up
22:21:42:WU00:FS01:Connecting to 171.67.108.200:80
22:21:42:WU00:FS01:Assigned to work server 140.163.4.231
22:21:42:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GF104 [GeForce GTX 460] from 140.163.4.231
22:21:42:WU00:FS01:Connecting to 140.163.4.231:8080
22:21:42:WU00:FS01:Downloading 4.83MiB
22:21:45:WU00:FS01:Download complete
22:21:45:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:13001 run:48 clone:7 gen:63 core:0x17 unit:0x0000007c538b3db753285d7aa08e7365
22:21:45:WU00:FS01:Starting
22:21:45:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17 -dir 00 -suffix 01 -version 704 -lifeline 3807 -checkpoint 15 -gpu 1 -gpu-vendor nvidia
22:21:45:WU00:FS01:Started FahCore on PID 3871
22:21:45:WU00:FS01:Core PID:3875
22:21:45:WU00:FS01:FahCore 0x17 started
22:21:45:WU00:FS01:0x17:*********************** Log Started 2014-12-20T22:21:45Z ***********************
22:21:45:WU00:FS01:0x17:Project: 13001 (Run 48, Clone 7, Gen 63)
22:21:45:WU00:FS01:0x17:Unit: 0x0000007c538b3db753285d7aa08e7365
22:21:45:WU00:FS01:0x17:CPU: 0x00000000000000000000000000000000
22:21:45:WU00:FS01:0x17:Machine: 1
22:21:45:WU00:FS01:0x17:Reading tar file state.xml
22:21:45:WU00:FS01:0x17:Reading tar file system.xml
22:21:46:WU00:FS01:0x17:Reading tar file integrator.xml
22:21:46:WU00:FS01:0x17:Reading tar file core.xml
22:21:46:WU00:FS01:0x17:Digital signatures verified
22:21:46:WU00:FS01:0x17:ERROR:exception: Bad platformId size.
22:21:46:WU00:FS01:0x17:Saving result file logfile_01.txt
22:21:46:WU00:FS01:0x17:Saving result file log.txt
22:21:46:WU00:FS01:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
22:21:46:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
22:21:46:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13001 run:48 clone:7 gen:63 core:0x17 unit:0x0000007c538b3db753285d7aa08e7365
22:21:46:WU00:FS01:Uploading 1.87KiB to 140.163.4.231
22:21:46:WU00:FS01:Connecting to 140.163.4.231:8080
22:21:46:WU00:FS01:Upload complete
22:21:46:WU00:FS01:Server responded WORK_ACK (400)
22:21:46:WU00:FS01:Cleaning up
22:21:46:WU01:FS01:Connecting to 171.67.108.200:80
22:21:47:WU01:FS01:Assigned to work server 140.163.4.231
22:21:47:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:GF104 [GeForce GTX 460] from 140.163.4.231
22:21:47:WU01:FS01:Connecting to 140.163.4.231:8080
22:21:47:FS01:Finishing
22:21:47:WU01:FS01:Downloading 4.84MiB
22:21:49:WU01:FS01:Download complete
22:21:49:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:13000 run:275 clone:1 gen:62 core:0x17 unit:0x0000006c538b3db7530fe97a0fbb95ed
22:21:49:WU01:FS01:Starting
22:21:49:WU01:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17 -dir 01 -suffix 01 -version 704 -lifeline 3807 -checkpoint 15 -gpu 1 -gpu-vendor nvidia
22:21:49:WU01:FS01:Started FahCore on PID 3878
22:21:49:WU01:FS01:Core PID:3882
22:21:49:WU01:FS01:FahCore 0x17 started
22:21:50:WU01:FS01:0x17:*********************** Log Started 2014-12-20T22:21:49Z ***********************
22:21:50:WU01:FS01:0x17:Project: 13000 (Run 275, Clone 1, Gen 62)
22:21:50:WU01:FS01:0x17:Unit: 0x0000006c538b3db7530fe97a0fbb95ed
22:21:50:WU01:FS01:0x17:CPU: 0x00000000000000000000000000000000
22:21:50:WU01:FS01:0x17:Machine: 1
22:21:50:WU01:FS01:0x17:Reading tar file state.xml
22:21:50:WU01:FS01:0x17:Reading tar file system.xml
22:21:51:WU01:FS01:0x17:Reading tar file integrator.xml
22:21:51:WU01:FS01:0x17:Reading tar file core.xml
22:21:51:WU01:FS01:0x17:Digital signatures verified
22:21:51:WU01:FS01:0x17:ERROR:exception: Bad platformId size.
22:21:51:WU01:FS01:0x17:Saving result file logfile_01.txt
22:21:51:WU01:FS01:0x17:Saving result file log.txt
22:21:51:WU01:FS01:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
22:21:51:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
22:21:51:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13000 run:275 clone:1 gen:62 core:0x17 unit:0x0000006c538b3db7530fe97a0fbb95ed
22:21:51:WU01:FS01:Uploading 1.86KiB to 140.163.4.231
22:21:51:WU01:FS01:Connecting to 140.163.4.231:8080
22:21:51:WU01:FS01:Upload complete
22:21:51:WU01:FS01:Server responded WORK_ACK (400)
22:21:51:WU01:FS01:Cleaning up
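
The log above repeats one pattern: the core reports "Bad platformId size." and the unit is returned as FAULTY, whichever project it came from. As a rough sketch, a short script can tally that pattern from a saved log. The regular expressions below are assumptions based only on the lines quoted in this post, and `summarize_failures` is a hypothetical helper name, not part of FAHClient.

```python
import re
from collections import Counter

# Patterns are based on the log lines quoted in this thread; other client
# versions may format these lines differently (assumption).
EXCEPTION_RE = re.compile(
    r"(WU\d+):(FS\d+):0x[0-9a-fA-F]+:ERROR:exception: (.+)$"
)
FAULTY_RE = re.compile(
    r"(WU\d+):(FS\d+):Sending unit results: .*error:FAULTY project:(\d+)"
)


def summarize_failures(log_text):
    """Count core exceptions and FAULTY work units per project in a log."""
    exceptions = Counter()
    faulty_projects = Counter()
    for line in log_text.splitlines():
        match = EXCEPTION_RE.search(line)
        if match:
            exceptions[match.group(3).strip()] += 1
            continue
        match = FAULTY_RE.search(line)
        if match:
            faulty_projects[int(match.group(3))] += 1
    return exceptions, faulty_projects


# Four lines copied from the log above, used as sample input.
SAMPLE = """\
22:21:32:WU01:FS01:0x17:ERROR:exception: Bad platformId size.
22:21:32:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:9201 run:626 clone:3 gen:106 core:0x17 unit:0x000000a16652edc45399eea7e9ddf0c0
22:21:46:WU00:FS01:0x17:ERROR:exception: Bad platformId size.
22:21:46:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13001 run:48 clone:7 gen:63 core:0x17 unit:0x0000007c538b3db753285d7aa08e7365
"""

if __name__ == "__main__":
    exceptions, faulty_projects = summarize_failures(SAMPLE)
    print("Exceptions:", dict(exceptions))
    print("Faulty units per project:", dict(faulty_projects))
```

If every failure carries the same exception across unrelated projects, as it does here, the cause is more likely on the host (driver, OpenCL, client setup) than in the work units themselves.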

Re: Failing all GPU Work Units

Posted: Sun Dec 21, 2014 1:04 am
by Joe_H
What version of the video drivers is installed for the GPUs? And did Windows run any updates in the intervening period, especially any updates that would have loaded a different video driver? The errors look similar to those from a system that is missing the OpenCL support that normally ships with the video drivers.
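
Since the log in the first post shows a Linux host, one quick way to follow up on the OpenCL suspicion is to look at which OpenCL vendor drivers are registered. This is a minimal sketch, assuming the conventional ICD loader directory `/etc/OpenCL/vendors`. `list_opencl_icds` is a hypothetical helper name, and a registered `.icd` file only shows that a driver is registered, not that it works with FahCore_17.

```python
from pathlib import Path


def list_opencl_icds(vendors_dir="/etc/OpenCL/vendors"):
    """Return {icd_file_name: library_named_inside} for each registered ICD.

    The default directory is the conventional location read by the OpenCL
    ICD loader on Linux; a distribution may place it elsewhere (assumption).
    """
    root = Path(vendors_dir)
    if not root.is_dir():
        return {}
    found = {}
    for icd_file in sorted(root.glob("*.icd")):
        # Each .icd file normally holds the vendor library name or path.
        found[icd_file.name] = icd_file.read_text().strip()
    return found


if __name__ == "__main__":
    icds = list_opencl_icds()
    if not icds:
        print("No OpenCL ICD files found; the OpenCL driver package may be missing.")
    for name, library in icds.items():
        print(f"{name}: {library}")
```

An empty result would be consistent with missing OpenCL support. A non-empty result still leaves open a version mismatch between the OpenCL library and the rest of the driver.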

Re: Failing all GPU Work Units

Posted: Sun Dec 21, 2014 5:06 am
by compdewd
Thanks for responding, Joe!

I am actually running Linux, but now that you mention it, I remember seeing an NVIDIA driver security update before I left last week. I installed the update and never thought twice about it. I have downgraded my driver back to its previous version and I am folding again!

Thanks a lot for your help!

P.S. If it is worth anything to anyone, the new driver update for my Linux system was for the packages "nvidia-opencl-icd-331", "nvidia-331", and "libcuda-331". All were updated to version "331.113-0ubuntu0.0.4", which is the version that was causing all of my WUs to crash. It may have just been the OpenCL package that was causing trouble, as Joe suspected, but I reverted all three packages back to version "331.38-0ubuntu7" because I didn't want to risk any incompatibility problems.
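
The post does not say how the packages were reverted. As one possible sketch, assuming an apt-based system where the older version is still available from the configured repositories, the commands could be assembled like this. The package names and version come from the post. `downgrade_commands` is a hypothetical helper, and the `apt-mark hold` step is an optional addition to keep a later upgrade from pulling in the newer version again.

```python
import shlex

# Package names and the known-good version are taken from the post above.
PACKAGES = ["nvidia-opencl-icd-331", "nvidia-331", "libcuda-331"]
GOOD_VERSION = "331.38-0ubuntu7"


def downgrade_commands(packages, version):
    """Build apt commands that install an exact version and then hold it."""
    pinned = [f"{name}={version}" for name in packages]
    install = ["sudo", "apt-get", "install"] + pinned
    # Assumption: holding the packages is desired; skip it to allow upgrades.
    hold = ["sudo", "apt-mark", "hold"] + list(packages)
    return [install, hold]


if __name__ == "__main__":
    # Dry run: print the commands for review instead of executing them.
    for command in downgrade_commands(PACKAGES, GOOD_VERSION):
        print(shlex.join(command))
```

The script only prints the commands, so they can be checked before anything is changed. A held package stays at its version until `apt-mark unhold` is run on it.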

Re: Failing all GPU Work Units

Posted: Sun Dec 21, 2014 11:05 am
by davidcoton
There seem to have been some build problems with parts of the nVidia driver set and certain Linux kernels. I also installed that driver update on Ubuntu 14.04 (with whatever kernel) and ended up having to re-install from scratch. However, I've now got 14.10 running with 331.113. Although I'm not 100% confident that everything built properly, it seems to work for FAH.