Failing all GPU Work Units

compdewd
Posts: 165
Joined: Sat Jun 09, 2012 6:56 am
Hardware configuration: [1] Debian 8 64-bit: EVGA NVIDIA GTX 650 Ti, MSI NVIDIA GTX 460, AMD FX-8120
[2] Windows 7 64-bit: MSI NVIDIA GTX 460, AMD Phenom II X4
Location: Cincinnati, Ohio, USA

Post by compdewd

Hi there,

I seem to be failing GPU Work Units on both of my graphics cards. The cards had not been folding for a week because I was on vacation. I just got back home and started them up again, and now they get BAD_WORK_UNIT errors on every unit they receive, across several different projects. Before I went on vacation the cards were working completely normally, and I have made no changes since then. There are no (manual) overclocks on either card.

I have restarted my computer, deleted and re-added the GPU slots, deleted FahCore_17 from /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_17.fah/, and uninstalled and reinstalled FAHClient, all with no success. Can anyone recommend any other steps? (I have paused the slots because I don't know whether all of these dumped units are hurting my Unit Return Ratio.)
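A quick way to see how many units are coming back as failures, and from which projects, is to tally the client's "Sending unit results" lines together with the core's "ERROR:exception:" messages. The following is a minimal Python sketch, assuming the v7 log line format shown in the logs below and assuming the client keeps its log at /var/lib/fahclient/log.txt (the CWD the log reports). The function names are hypothetical helpers, not part of FAHClient.

```python
import re
import sys
from collections import Counter

# Matches lines like (taken from the log below):
# "...Sending unit results: id:01 state:SEND error:FAULTY project:9201 run:626 ..."
RESULT_RE = re.compile(r"Sending unit results: .*?error:(\w+) project:(\d+)")


def tally_failures(lines):
    """Count returned results by (error, project), skipping NO_ERROR."""
    counts = Counter()
    for line in lines:
        m = RESULT_RE.search(line)
        if m and m.group(1) != "NO_ERROR":
            counts[(m.group(1), int(m.group(2)))] += 1
    return counts


def core_exceptions(lines):
    """Count core exception messages such as 'Bad platformId size.'"""
    return Counter(
        line.split("ERROR:exception:", 1)[1].strip()
        for line in lines
        if "ERROR:exception:" in line
    )


# Only reads a file when a log path is passed on the command line
# (assumed location: /var/lib/fahclient/log.txt).
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        log_lines = f.readlines()
    for (error, project), n in sorted(tally_failures(log_lines).items()):
        print(f"{error:8} project {project}: {n}")
    for message, n in core_exceptions(log_lines).items():
        print(f"core exception x{n}: {message}")
```

Run against the log from this post, every failure would show up as the same core exception, which points at the local setup rather than at bad units from one project.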

System details:

*********************** Log Started 2014-12-20T22:20:35Z ***********************
22:20:35:************************* Folding@home Client *************************
22:20:35:    Website: http://folding.stanford.edu/
22:20:35:  Copyright: (c) 2009-2014 Stanford University
22:20:35:     Author: Joseph Coffland <[email protected]>
22:20:35:       Args: --child --lifeline 3805 /etc/fahclient/config.xml --run-as
22:20:35:             fahclient --pid-file=/var/run/fahclient.pid --daemon
22:20:35:     Config: /etc/fahclient/config.xml
22:20:35:******************************** Build ********************************
22:20:35:    Version: 7.4.4
22:20:35:       Date: Mar 4 2014
22:20:35:       Time: 12:02:38
22:20:35:    SVN Rev: 4130
22:20:35:     Branch: fah/trunk/client
22:20:35:   Compiler: GNU 4.4.7
22:20:35:    Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
22:20:35:             -fno-unsafe-math-optimizations -msse2
22:20:35:   Platform: linux2 3.2.0-1-amd64
22:20:35:       Bits: 64
22:20:35:       Mode: Release
22:20:35:******************************* System ********************************
22:20:35:        CPU: AMD FX(tm)-8120 Eight-Core Processor
22:20:35:     CPU ID: AuthenticAMD Family 21 Model 1 Stepping 2
22:20:35:       CPUs: 8
22:20:35:     Memory: 3.84GiB
22:20:35:Free Memory: 643.03MiB
22:20:35:    Threads: POSIX_THREADS
22:20:35: OS Version: 3.13
22:20:35:Has Battery: false
22:20:35: On Battery: false
22:20:35: UTC Offset: -5
22:20:35:        PID: 3807
22:20:35:        CWD: /var/lib/fahclient
22:20:35:         OS: Linux 3.13.0-24-generic x86_64
22:20:35:    OS Arch: AMD64
22:20:35:       GPUs: 2
22:20:35:      GPU 0: NVIDIA:2 GF104 [GeForce GTX 460]
22:20:35:      GPU 1: NVIDIA:3 GK106 [GeForce GTX 650 Ti]
22:20:35:       CUDA: Not detected
22:20:35:***********************************************************************

Relevant log information:

22:21:23:Adding folding slot 01: READY gpu:0:GF104 [GeForce GTX 460]
22:21:23:Saving configuration to /etc/fahclient/config.xml
22:21:23:<config>
22:21:23:  <!-- Client Control -->
22:21:23:  <fold-anon v='true'/>
22:21:23:
22:21:23:  <!-- Folding Slot Configuration -->
22:21:23:  <gpu v='false'/>
22:21:23:
22:21:23:  <!-- Network -->
22:21:23:  <proxy v=':8080'/>
22:21:23:
22:21:23:  <!-- Folding Slots -->
22:21:23:  <slot id='1' type='GPU'>
22:21:23:    <cuda-index v='1'/>
22:21:23:    <gpu-index v='0'/>
22:21:23:    <opencl-index v='1'/>
22:21:23:  </slot>
22:21:23:</config>
22:21:23:WARNING:WU00:Slot ID 0 no longer exists and there are no other matching slots, dumping
22:21:23:WU00:Sending unit results: id:00 state:SEND error:DUMPED project:9009 run:659 clone:1 gen:126 core:0xa4 unit:0x0000008d664f2de453868a0c32745789
22:21:23:WU00:Connecting to 171.64.65.124:8080
22:21:24:WU00:Server responded WORK_ACK (400)
22:21:24:WU00:Cleaning up
22:21:24:WU01:FS01:Connecting to 171.67.108.200:80
22:21:24:WU01:FS01:Assigned to work server 171.67.108.52
22:21:24:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:GF104 [GeForce GTX 460] from 171.67.108.52
22:21:24:WU01:FS01:Connecting to 171.67.108.52:8080
22:21:24:WU01:FS01:Downloading 1.52MiB
22:21:27:WU01:FS01:Download complete
22:21:27:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:9201 run:626 clone:3 gen:106 core:0x17 unit:0x000000a16652edc45399eea7e9ddf0c0
22:21:27:WU01:FS01:Downloading core from http://web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_17.fah
22:21:27:WU01:FS01:Connecting to web.stanford.edu:80
22:21:27:WU01:FS01:FahCore 17: Downloading 3.01MiB
22:21:31:WU01:FS01:FahCore 17: Download complete
22:21:31:WU01:FS01:Valid core signature
22:21:31:WU01:FS01:Unpacked 8.16MiB to cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17
22:21:31:WU01:FS01:Starting
22:21:31:WU01:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17 -dir 01 -suffix 01 -version 704 -lifeline 3807 -checkpoint 15 -gpu 1 -gpu-vendor nvidia
22:21:31:WU01:FS01:Started FahCore on PID 3850
22:21:31:WU01:FS01:Core PID:3854
22:21:31:WU01:FS01:FahCore 0x17 started
22:21:32:WU01:FS01:0x17:*********************** Log Started 2014-12-20T22:21:31Z ***********************
22:21:32:WU01:FS01:0x17:Project: 9201 (Run 626, Clone 3, Gen 106)
22:21:32:WU01:FS01:0x17:Unit: 0x000000a16652edc45399eea7e9ddf0c0
22:21:32:WU01:FS01:0x17:CPU: 0x00000000000000000000000000000000
22:21:32:WU01:FS01:0x17:Machine: 1
22:21:32:WU01:FS01:0x17:Reading tar file state.xml
22:21:32:WU01:FS01:0x17:Reading tar file system.xml
22:21:32:WU01:FS01:0x17:Reading tar file integrator.xml
22:21:32:WU01:FS01:0x17:Reading tar file core.xml
22:21:32:WU01:FS01:0x17:Digital signatures verified
22:21:32:WU01:FS01:0x17:ERROR:exception: Bad platformId size.
22:21:32:WU01:FS01:0x17:Saving result file logfile_01.txt
22:21:32:WU01:FS01:0x17:Saving result file log.txt
22:21:32:WU01:FS01:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
22:21:32:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
22:21:32:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:9201 run:626 clone:3 gen:106 core:0x17 unit:0x000000a16652edc45399eea7e9ddf0c0
22:21:32:WU01:FS01:Uploading 1.86KiB to 171.67.108.52
22:21:32:WU01:FS01:Connecting to 171.67.108.52:8080
22:21:32:WU00:FS01:Connecting to 171.67.108.200:80
22:21:32:WU01:FS01:Upload complete
22:21:32:WU01:FS01:Server responded WORK_ACK (400)
22:21:32:WU01:FS01:Cleaning up
22:21:33:WU00:FS01:Assigned to work server 171.67.108.52
22:21:33:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GF104 [GeForce GTX 460] from 171.67.108.52
22:21:33:WU00:FS01:Connecting to 171.67.108.52:8080
22:21:33:WU00:FS01:Downloading 1.53MiB
22:21:36:Saving configuration to /etc/fahclient/config.xml
22:21:36:<config>
22:21:36:  <!-- Client Control -->
22:21:36:  <fold-anon v='true'/>
22:21:36:
22:21:36:  <!-- Folding Slot Configuration -->
22:21:36:  <gpu v='false'/>
22:21:36:
22:21:36:  <!-- Network -->
22:21:36:  <proxy v=':8080'/>
22:21:36:
22:21:36:  <!-- Folding Slots -->
22:21:36:  <slot id='1' type='GPU'>
22:21:36:    <cuda-index v='1'/>
22:21:36:    <gpu-index v='0'/>
22:21:36:    <opencl-index v='1'/>
22:21:36:  </slot>
22:21:36:</config>
22:21:36:WU00:FS01:Download complete
22:21:36:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:9201 run:265 clone:2 gen:78 core:0x17 unit:0x000000816652edc45399e06f480e4522
22:21:36:WU00:FS01:Starting
22:21:36:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17 -dir 00 -suffix 01 -version 704 -lifeline 3807 -checkpoint 15 -gpu 1 -gpu-vendor nvidia
22:21:36:WU00:FS01:Started FahCore on PID 3857
22:21:36:WU00:FS01:Core PID:3861
22:21:36:WU00:FS01:FahCore 0x17 started
22:21:36:WU00:FS01:0x17:*********************** Log Started 2014-12-20T22:21:36Z ***********************
22:21:36:WU00:FS01:0x17:Project: 9201 (Run 265, Clone 2, Gen 78)
22:21:36:WU00:FS01:0x17:Unit: 0x000000816652edc45399e06f480e4522
22:21:36:WU00:FS01:0x17:CPU: 0x00000000000000000000000000000000
22:21:36:WU00:FS01:0x17:Machine: 1
22:21:36:WU00:FS01:0x17:Reading tar file state.xml
22:21:36:WU00:FS01:0x17:Reading tar file system.xml
22:21:36:WU00:FS01:0x17:Reading tar file integrator.xml
22:21:36:WU00:FS01:0x17:Reading tar file core.xml
22:21:36:WU00:FS01:0x17:Digital signatures verified
22:21:36:WU00:FS01:0x17:ERROR:exception: Bad platformId size.
22:21:36:WU00:FS01:0x17:Saving result file logfile_01.txt
22:21:36:WU00:FS01:0x17:Saving result file log.txt
22:21:36:WU00:FS01:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
22:21:37:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
22:21:37:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:9201 run:265 clone:2 gen:78 core:0x17 unit:0x000000816652edc45399e06f480e4522
22:21:37:WU00:FS01:Uploading 1.87KiB to 171.67.108.52
22:21:37:WU00:FS01:Connecting to 171.67.108.52:8080
22:21:37:WU00:FS01:Upload complete
22:21:37:WU00:FS01:Server responded WORK_ACK (400)
22:21:37:WU00:FS01:Cleaning up
22:21:37:WU01:FS01:Connecting to 171.67.108.200:80
22:21:38:WU01:FS01:Assigned to work server 171.67.108.52
22:21:38:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:GF104 [GeForce GTX 460] from 171.67.108.52
22:21:38:WU01:FS01:Connecting to 171.67.108.52:8080
22:21:38:WU01:FS01:Downloading 1.52MiB
22:21:40:WU01:FS01:Download complete
22:21:41:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:9201 run:466 clone:4 gen:76 core:0x17 unit:0x000000706652edc45399e861591d2356
22:21:41:WU01:FS01:Starting
22:21:41:WU01:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17 -dir 01 -suffix 01 -version 704 -lifeline 3807 -checkpoint 15 -gpu 1 -gpu-vendor nvidia
22:21:41:WU01:FS01:Started FahCore on PID 3864
22:21:41:WU01:FS01:Core PID:3868
22:21:41:WU01:FS01:FahCore 0x17 started
22:21:41:WU01:FS01:0x17:*********************** Log Started 2014-12-20T22:21:41Z ***********************
22:21:41:WU01:FS01:0x17:Project: 9201 (Run 466, Clone 4, Gen 76)
22:21:41:WU01:FS01:0x17:Unit: 0x000000706652edc45399e861591d2356
22:21:41:WU01:FS01:0x17:CPU: 0x00000000000000000000000000000000
22:21:41:WU01:FS01:0x17:Machine: 1
22:21:41:WU01:FS01:0x17:Reading tar file state.xml
22:21:41:WU01:FS01:0x17:Reading tar file system.xml
22:21:41:WU01:FS01:0x17:Reading tar file integrator.xml
22:21:41:WU01:FS01:0x17:Reading tar file core.xml
22:21:41:WU01:FS01:0x17:Digital signatures verified
22:21:41:WU01:FS01:0x17:ERROR:exception: Bad platformId size.
22:21:41:WU01:FS01:0x17:Saving result file logfile_01.txt
22:21:41:WU01:FS01:0x17:Saving result file log.txt
22:21:41:WU01:FS01:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
22:21:41:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
22:21:41:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:9201 run:466 clone:4 gen:76 core:0x17 unit:0x000000706652edc45399e861591d2356
22:21:41:WU01:FS01:Uploading 1.86KiB to 171.67.108.52
22:21:41:WU01:FS01:Connecting to 171.67.108.52:8080
22:21:42:WU01:FS01:Upload complete
22:21:42:WU01:FS01:Server responded WORK_ACK (400)
22:21:42:WU01:FS01:Cleaning up
22:21:42:WU00:FS01:Connecting to 171.67.108.200:80
22:21:42:WU00:FS01:Assigned to work server 140.163.4.231
22:21:42:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GF104 [GeForce GTX 460] from 140.163.4.231
22:21:42:WU00:FS01:Connecting to 140.163.4.231:8080
22:21:42:WU00:FS01:Downloading 4.83MiB
22:21:45:WU00:FS01:Download complete
22:21:45:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:13001 run:48 clone:7 gen:63 core:0x17 unit:0x0000007c538b3db753285d7aa08e7365
22:21:45:WU00:FS01:Starting
22:21:45:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17 -dir 00 -suffix 01 -version 704 -lifeline 3807 -checkpoint 15 -gpu 1 -gpu-vendor nvidia
22:21:45:WU00:FS01:Started FahCore on PID 3871
22:21:45:WU00:FS01:Core PID:3875
22:21:45:WU00:FS01:FahCore 0x17 started
22:21:45:WU00:FS01:0x17:*********************** Log Started 2014-12-20T22:21:45Z ***********************
22:21:45:WU00:FS01:0x17:Project: 13001 (Run 48, Clone 7, Gen 63)
22:21:45:WU00:FS01:0x17:Unit: 0x0000007c538b3db753285d7aa08e7365
22:21:45:WU00:FS01:0x17:CPU: 0x00000000000000000000000000000000
22:21:45:WU00:FS01:0x17:Machine: 1
22:21:45:WU00:FS01:0x17:Reading tar file state.xml
22:21:45:WU00:FS01:0x17:Reading tar file system.xml
22:21:46:WU00:FS01:0x17:Reading tar file integrator.xml
22:21:46:WU00:FS01:0x17:Reading tar file core.xml
22:21:46:WU00:FS01:0x17:Digital signatures verified
22:21:46:WU00:FS01:0x17:ERROR:exception: Bad platformId size.
22:21:46:WU00:FS01:0x17:Saving result file logfile_01.txt
22:21:46:WU00:FS01:0x17:Saving result file log.txt
22:21:46:WU00:FS01:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
22:21:46:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
22:21:46:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13001 run:48 clone:7 gen:63 core:0x17 unit:0x0000007c538b3db753285d7aa08e7365
22:21:46:WU00:FS01:Uploading 1.87KiB to 140.163.4.231
22:21:46:WU00:FS01:Connecting to 140.163.4.231:8080
22:21:46:WU00:FS01:Upload complete
22:21:46:WU00:FS01:Server responded WORK_ACK (400)
22:21:46:WU00:FS01:Cleaning up
22:21:46:WU01:FS01:Connecting to 171.67.108.200:80
22:21:47:WU01:FS01:Assigned to work server 140.163.4.231
22:21:47:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:GF104 [GeForce GTX 460] from 140.163.4.231
22:21:47:WU01:FS01:Connecting to 140.163.4.231:8080
22:21:47:FS01:Finishing
22:21:47:WU01:FS01:Downloading 4.84MiB
22:21:49:WU01:FS01:Download complete
22:21:49:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:13000 run:275 clone:1 gen:62 core:0x17 unit:0x0000006c538b3db7530fe97a0fbb95ed
22:21:49:WU01:FS01:Starting
22:21:49:WU01:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17 -dir 01 -suffix 01 -version 704 -lifeline 3807 -checkpoint 15 -gpu 1 -gpu-vendor nvidia
22:21:49:WU01:FS01:Started FahCore on PID 3878
22:21:49:WU01:FS01:Core PID:3882
22:21:49:WU01:FS01:FahCore 0x17 started
22:21:50:WU01:FS01:0x17:*********************** Log Started 2014-12-20T22:21:49Z ***********************
22:21:50:WU01:FS01:0x17:Project: 13000 (Run 275, Clone 1, Gen 62)
22:21:50:WU01:FS01:0x17:Unit: 0x0000006c538b3db7530fe97a0fbb95ed
22:21:50:WU01:FS01:0x17:CPU: 0x00000000000000000000000000000000
22:21:50:WU01:FS01:0x17:Machine: 1
22:21:50:WU01:FS01:0x17:Reading tar file state.xml
22:21:50:WU01:FS01:0x17:Reading tar file system.xml
22:21:51:WU01:FS01:0x17:Reading tar file integrator.xml
22:21:51:WU01:FS01:0x17:Reading tar file core.xml
22:21:51:WU01:FS01:0x17:Digital signatures verified
22:21:51:WU01:FS01:0x17:ERROR:exception: Bad platformId size.
22:21:51:WU01:FS01:0x17:Saving result file logfile_01.txt
22:21:51:WU01:FS01:0x17:Saving result file log.txt
22:21:51:WU01:FS01:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
22:21:51:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
22:21:51:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13000 run:275 clone:1 gen:62 core:0x17 unit:0x0000006c538b3db7530fe97a0fbb95ed
22:21:51:WU01:FS01:Uploading 1.86KiB to 140.163.4.231
22:21:51:WU01:FS01:Connecting to 140.163.4.231:8080
22:21:51:WU01:FS01:Upload complete
22:21:51:WU01:FS01:Server responded WORK_ACK (400)
22:21:51:WU01:FS01:Cleaning up
Joe_H
Site Admin
Posts: 7937
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Failing all GPU Work Units

Post by Joe_H

What version of the video drivers is installed for the GPUs? And did Windows run any updates in the intervening period, especially any that would have loaded a different video driver? The errors look similar to those from a system that lacks the OpenCL support that normally comes with the video drivers.
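On Linux, one way to check for that OpenCL support is to look at what the Khronos ICD loader would find: it reads one `.icd` file per vendor from `/etc/OpenCL/vendors`, and each file names a driver library. The sketch below, in Python, lists those entries and tries to load each named library with `ctypes`. The helper names are hypothetical. A present, loadable ICD is only a first check; it does not prove that the driver's OpenCL implementation works with FahCore_17.

```python
import ctypes
import glob
import os

# Default directory the Khronos ICD loader scans on Linux.
VENDOR_DIR = "/etc/OpenCL/vendors"


def list_icds(vendor_dir=VENDOR_DIR):
    """Return {icd_file_name: library_name} for every registered OpenCL ICD."""
    icds = {}
    for path in sorted(glob.glob(os.path.join(vendor_dir, "*.icd"))):
        with open(path, encoding="utf-8") as f:
            icds[os.path.basename(path)] = f.read().strip()
    return icds


def library_loads(name):
    """True if the dynamic linker can load the library named in an ICD file."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False
```

For example, `for icd, lib in list_icds().items(): print(icd, lib, library_loads(lib))`. An empty result, or a library that fails to load, makes the OpenCL part of the driver install the first suspect.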

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
compdewd
Posts: 165
Joined: Sat Jun 09, 2012 6:56 am
Hardware configuration: [1] Debian 8 64-bit: EVGA NVIDIA GTX 650 Ti, MSI NVIDIA GTX 460, AMD FX-8120
[2] Windows 7 64-bit: MSI NVIDIA GTX 460, AMD Phenom II X4
Location: Cincinnati, Ohio, USA

Re: Failing all GPU Work Units

Post by compdewd

Thanks for responding, Joe!

I am actually running Linux, but now that you mention it, I remember seeing an NVIDIA driver security update before I left last week. I installed the update and never thought twice about it. I have downgraded the driver back to its previous version, and I am folding again!

Thanks a lot for your help!

P.S. If it is worth anything to anyone, the new driver update for my Linux system was for packages: "nvidia-opencl-icd-331", "nvidia-331", and "libcuda-331". All were updated to version "331.113-0ubuntu0.0.4" which is the version that was causing all of my WUs to crash. It may have just been the OpenCL package that was causing trouble as Joe suspected, but I reverted all three packages back to version "331.38-0ubuntu7" because I didn't want to risk experiencing any incompatibility problems.
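For anyone wanting to repeat this downgrade on an apt-based system, here is a sketch that prints the commands. It installs each package at an explicit version (`apt-get install pkg=version`) and then, as an optional extra step not mentioned above, uses `apt-mark hold` so the next round of updates does not pull 331.113 back in. The package names and versions are the ones reported in this post; on Ubuntu the CUDA library package may be named differently (for example `libcuda1-331`), so confirm the exact names with `dpkg -l | grep 331` first. The older version must also still be available from the configured repositories.

```python
import shlex  # shlex.join needs Python 3.8+

# Names and versions as reported in the post; verify them locally before use.
PACKAGES = ["nvidia-331", "nvidia-opencl-icd-331", "libcuda-331"]
KNOWN_GOOD = "331.38-0ubuntu7"


def downgrade_commands(packages, version):
    """Build apt commands that install a specific version, then hold it."""
    pinned = [f"{pkg}={version}" for pkg in packages]
    return [
        shlex.join(["sudo", "apt-get", "install", *pinned]),
        shlex.join(["sudo", "apt-mark", "hold", *packages]),
    ]


if __name__ == "__main__":
    # Only prints the commands; nothing is executed.
    for cmd in downgrade_commands(PACKAGES, KNOWN_GOOD):
        print(cmd)
```

Printing rather than executing keeps the script safe to read and adapt. Before upgrading the driver again, run `sudo apt-mark unhold` on the same packages.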
davidcoton
Posts: 1094
Joined: Wed Nov 05, 2008 3:19 pm
Location: Cambridge, UK

Re: Failing all GPU Work Units

Post by davidcoton

There seem to have been some build problems with parts of the NVIDIA driver set on certain Linux kernels. I also installed that driver update on Ubuntu 14.04 (with whatever kernel it had) and ended up having to reinstall from scratch. However, I now have 14.10 running with 331.113; although I'm not 100% confident that everything built properly, it seems to work for FAH.