13001 WU failure

Moderators: Site Moderators, FAHC Science Team

bfromcolo
Posts: 56
Joined: Fri Mar 01, 2013 1:12 am

13001 WU failure

Post by bfromcolo »

This system is running Mint 17 with a GTX 750 Ti and NVIDIA driver 343.22, all at stock clocks. It has been running 9201 WUs fine. This is the first 13001 I have seen, and it failed with:

15:49:07:WU00:FS01:0x17:ERROR:exception: Force RMSE error of 447.223 with threshold of 5

What does this error mean?



Code:

*********************** Log Started 2014-10-04T15:45:26Z ***********************
15:45:26:************************* Folding@home Client *************************
15:45:26:    Website: http://folding.stanford.edu/
15:45:26:  Copyright: (c) 2009-2014 Stanford University
15:45:26:     Author: Joseph Coffland <[email protected]>
15:45:26:       Args: --child --lifeline 2647 /etc/fahclient/config.xml --run-as
15:45:26:             fahclient --pid-file=/var/run/fahclient.pid --daemon
15:45:26:     Config: /etc/fahclient/config.xml
15:45:26:******************************** Build ********************************
15:45:26:    Version: 7.4.4
15:45:26:       Date: Mar 4 2014
15:45:26:       Time: 12:02:38
15:45:26:    SVN Rev: 4130
15:45:26:     Branch: fah/trunk/client
15:45:26:   Compiler: GNU 4.4.7
15:45:26:    Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
15:45:26:             -fno-unsafe-math-optimizations -msse2
15:45:26:   Platform: linux2 3.2.0-1-amd64
15:45:26:       Bits: 64
15:45:26:       Mode: Release
15:45:26:******************************* System ********************************
15:45:26:        CPU: AMD Phenom(tm) II X6 1045T Processor
15:45:26:     CPU ID: AuthenticAMD Family 16 Model 10 Stepping 0
15:45:26:       CPUs: 6
15:45:26:     Memory: 7.80GiB
15:45:26:Free Memory: 6.92GiB
15:45:26:    Threads: POSIX_THREADS
15:45:26: OS Version: 3.13
15:45:26:Has Battery: false
15:45:26: On Battery: false
15:45:26: UTC Offset: -6
15:45:26:        PID: 2649
15:45:26:        CWD: /var/lib/fahclient
15:45:26:         OS: Linux 3.13.0-24-generic x86_64
15:45:26:    OS Arch: AMD64
15:45:26:       GPUs: 1
15:45:26:      GPU 0: NVIDIA:4 GM107 [GeForce GTX 750 Ti]
15:45:26:       CUDA: 5.0
15:45:26:CUDA Driver: 6050
15:45:26:***********************************************************************
15:45:26:<config>
15:45:26:  <!-- Client Control -->
15:45:26:  <fold-anon v='true'/>
15:45:26:
15:45:26:  <!-- Network -->
15:45:26:  <proxy v=':8080'/>
15:45:26:
15:45:26:  <!-- Slot Control -->
15:45:26:  <power v='full'/>
15:45:26:
15:45:26:  <!-- User Information -->
15:45:26:  <passkey v='********************************'/>
15:45:26:  <team v='37726'/>
15:45:26:  <user v='bfromcolo'/>
15:45:26:
15:45:26:  <!-- Folding Slots -->
15:45:26:  <slot id='1' type='GPU'/>
15:45:26:</config>
15:45:26:Switching to user fahclient
15:45:26:Trying to access database...
15:45:27:Successfully acquired database lock
15:45:27:Enabled folding slot 01: READY gpu:0:GM107 [GeForce GTX 750 Ti]
15:45:27:WU00:FS01:Connecting to 171.67.108.201:80
15:45:28:WU00:FS01:Assigned to work server 140.163.4.231
15:45:28:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GM107 [GeForce GTX 750 Ti] from 140.163.4.231
15:45:28:WU00:FS01:Connecting to 140.163.4.231:8080
15:45:29:WU00:FS01:Downloading 4.84MiB
15:45:35:WU00:FS01:Download 71.05%
15:45:37:WU00:FS01:Download complete
15:45:37:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:13001 run:378 clone:1 gen:68 core:0x17 unit:0x00000096538b3db75328bad892c4b6cd
15:45:38:WU00:FS01:Starting
15:45:38:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17 -dir 00 -suffix 01 -version 704 -lifeline 2649 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
15:45:38:WU00:FS01:Started FahCore on PID 2667
15:45:38:WU00:FS01:Core PID:2671
15:45:38:WU00:FS01:FahCore 0x17 started
15:45:38:WU00:FS01:0x17:*********************** Log Started 2014-10-04T15:45:38Z ***********************
15:45:38:WU00:FS01:0x17:Project: 13001 (Run 378, Clone 1, Gen 68)
15:45:38:WU00:FS01:0x17:Unit: 0x00000096538b3db75328bad892c4b6cd
15:45:38:WU00:FS01:0x17:CPU: 0x00000000000000000000000000000000
15:45:38:WU00:FS01:0x17:Machine: 1
15:45:38:WU00:FS01:0x17:Reading tar file state.xml
15:45:39:WU00:FS01:0x17:Reading tar file system.xml
15:45:39:WU00:FS01:0x17:Reading tar file integrator.xml
15:45:39:WU00:FS01:0x17:Reading tar file core.xml
15:45:39:WU00:FS01:0x17:Digital signatures verified
15:49:07:WU00:FS01:0x17:ERROR:exception: Force RMSE error of 447.223 with threshold of 5
15:49:07:WU00:FS01:0x17:Saving result file logfile_01.txt
15:49:07:WU00:FS01:0x17:Saving result file badStateCheckpoint_57114166
15:49:08:WU00:FS01:0x17:Saving result file badStateForceGroup0_57114166Core.xml
15:49:11:WU00:FS01:0x17:Saving result file badStateForceGroup0_57114166Ref.xml
15:49:14:WU00:FS01:0x17:Saving result file badStateForceGroup1_57114166Core.xml
15:49:16:WU00:FS01:0x17:Saving result file badStateForceGroup1_57114166Ref.xml
15:49:19:WU00:FS01:0x17:Saving result file badStateForceGroup2_57114166Core.xml
15:49:21:WU00:FS01:0x17:Saving result file badStateForceGroup2_57114166Ref.xml
15:49:23:WU00:FS01:0x17:Saving result file log.txt
15:49:23:WU00:FS01:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
15:49:24:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
15:49:24:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13001 run:378 clone:1 gen:68 core:0x17 unit:0x00000096538b3db75328bad892c4b6cd
15:49:24:WU00:FS01:Uploading 24.64MiB to 140.163.4.231
15:49:24:WU00:FS01:Connecting to 140.163.4.231:8080
Mod edit: Please use Code tags instead of Quote tags around log files
Joe_H
Site Admin
Posts: 7937
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: 13001 WU failure

Post by Joe_H »

The error indicates that you may have received a bad WU. So far no one has completed this WU, though one person did get about 25% of the way through it.
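
On what the number itself measures: the message reads like a consistency check, where the forces computed by the core are compared against a reference calculation and the WU is rejected when the two disagree by more than a threshold. The result files named badStateForceGroup0_57114166Core.xml and badStateForceGroup0_57114166Ref.xml in the log above fit that reading. How FahCore_17 actually computes the value is not shown in this thread, so the sketch below is only an assumption about the general technique: a root-mean-square error over per-atom force vectors. The function names and the per-atom normalization are hypothetical and chosen for illustration.

```python
import math

# Threshold value copied from the log message; its units are not stated there.
THRESHOLD = 5.0


def force_rmse(core_forces, ref_forces):
    """Root-mean-square error between two lists of (fx, fy, fz) force vectors.

    Assumption: the error is averaged per atom. The real core may normalize
    differently.
    """
    if len(core_forces) != len(ref_forces) or not core_forces:
        raise ValueError("force lists must be non-empty and the same length")
    total = 0.0
    for (cx, cy, cz), (rx, ry, rz) in zip(core_forces, ref_forces):
        total += (cx - rx) ** 2 + (cy - ry) ** 2 + (cz - rz) ** 2
    return math.sqrt(total / len(core_forces))


def check_forces(core_forces, ref_forces, threshold=THRESHOLD):
    """Return the RMSE, or raise if it exceeds the threshold."""
    rmse = force_rmse(core_forces, ref_forces)
    if rmse > threshold:
        # Message format mirrors the log line quoted in this thread.
        raise RuntimeError(
            f"Force RMSE error of {rmse:g} with threshold of {threshold:g}"
        )
    return rmse
```

With values in the 450 range against a threshold of 5, as reported in this thread, the two sets of forces are far apart rather than marginally off.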

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Breach
Posts: 204
Joined: Sat Mar 09, 2013 8:07 pm
Location: Brussels, Belgium

Re: 13001 WU failure

Post by Breach »

I think this is a more general problem following some changes done today to the AS. You have a Maxwell like me and after the change we're being given Core 17 WUs which error out (or even crash the core) - see here:
viewtopic.php?f=18&t=26807&start=15

I don't know whether this is the case with all Core 17 WUs and Maxwells or just some projects. From what I understand it's an old problem which emerged again with the new AS and the recent changes. After failing all WUs I have received I stopped GPU folding for now (at least with Core 15 WUs we could do something ;-)
Windows 11 x64 / 5800X@5Ghz / 32GB DDR4 3800 CL14 / 4090 FE / Creative Titanium HD / Sennheiser 650 / PSU Corsair AX1200i
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 13001 WU failure

Post by bruce »

Breach wrote:I think this is a more general problem following some changes done today to the AS. You have a Maxwell like me and after the change we're being given Core 17 WUs which error out (or even crash the core) - see here:
viewtopic.php?f=18&t=26807&start=15

I don't know whether this is the case with all Core 17 WUs and Maxwells or just some projects. From what I understand it's an old problem which emerged again with the new AS and the recent changes. After failing all WUs I have received I stopped GPU folding for now (at least with Core 15 WUs we could do something ;-)
Maxwell GPUs are most definitely more reliable with the latest drivers than with older versions. I'm not sure if that's significant for FahCore_17, but it's worth considering.

While changes to the AS code have altered the assignment probabilities for specific projects, actual changes may not match with our perception of how particular projects behave.
Kjetil
Posts: 175
Joined: Sat Apr 14, 2012 5:56 pm
Location: Stavanger Norway

Re: 13001 WU failure

Post by Kjetil »

The latest Short Lived Branch version is 343.22, so he has the latest drivers for Linux. I have the same problems on Windows. Is it the AS and not the drivers?
Breach
Posts: 204
Joined: Sat Mar 09, 2013 8:07 pm
Location: Brussels, Belgium

Re: 13001 WU failure

Post by Breach »

bruce, right now all Core 17 WUs assigned to Maxwells seem to fail (with the latest drivers) - in my case about 10 out of 10. I posted here because I don't think this is an isolated incident.
Windows 11 x64 / 5800X@5Ghz / 32GB DDR4 3800 CL14 / 4090 FE / Creative Titanium HD / Sennheiser 650 / PSU Corsair AX1200i
bfromcolo
Posts: 56
Joined: Fri Mar 01, 2013 1:12 am

Re: 13001 WU failure

Post by bfromcolo »

My system runs 9201 fine, but overnight it stopped processing after 10 consecutive 13001 failures. Is there any flag that would make 9201 WUs more likely?

Code:

23:33:52:WU00:FS01:0x17:ERROR:exception: Force RMSE error of 454.735 with threshold of 5
23:34:09:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
23:38:00:WU01:FS01:0x17:ERROR:exception: Force RMSE error of 453.528 with threshold of 5
23:38:18:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
23:42:10:WU02:FS01:0x17:ERROR:exception: Force RMSE error of 446.944 with threshold of 5
23:42:28:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
23:46:29:WU00:FS01:0x17:ERROR:exception: Force RMSE error of 453.412 with threshold of 5
23:46:47:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
23:50:41:WU01:FS01:0x17:ERROR:exception: Force RMSE error of 451.321 with threshold of 5
23:50:59:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
23:54:50:WU02:FS01:0x17:ERROR:exception: Force RMSE error of 452.633 with threshold of 5
23:55:07:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
23:59:01:WU03:FS01:0x17:ERROR:exception: Force RMSE error of 455.484 with threshold of 5
23:59:17:WARNING:WU03:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
00:03:25:WU00:FS01:0x17:ERROR:exception: Force RMSE error of 456.956 with threshold of 5
00:03:42:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
00:07:31:WU01:FS01:0x17:ERROR:exception: Force RMSE error of 450.132 with threshold of 5
00:07:48:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
00:11:39:WU02:FS01:0x17:ERROR:exception: Force RMSE error of 452.811 with threshold of 5
00:11:56:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
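
For anyone who wants to pull these failures out of a longer log rather than eyeballing it, here is a minimal sketch. It assumes the line format matches the excerpts posted in this thread. The function name `rmse_failures` is made up for illustration and is not part of the FAH client.

```python
import re

# Matches lines such as:
# 23:33:52:WU00:FS01:0x17:ERROR:exception: Force RMSE error of 454.735 with threshold of 5
RMSE_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}):(WU\d+):(FS\d+):0x[0-9a-fA-F]+:ERROR:exception: "
    r"Force RMSE error of ([\d.]+) with threshold of ([\d.]+)"
)


def rmse_failures(log_text):
    """Return (time, wu, slot, rmse, threshold) for each Force RMSE error line."""
    failures = []
    for line in log_text.splitlines():
        m = RMSE_RE.match(line.strip())
        if m:
            time, wu, slot, rmse, threshold = m.groups()
            failures.append((time, wu, slot, float(rmse), float(threshold)))
    return failures


# Three lines copied from the log excerpt in this post.
sample = """\
23:33:52:WU00:FS01:0x17:ERROR:exception: Force RMSE error of 454.735 with threshold of 5
23:34:09:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
23:38:00:WU01:FS01:0x17:ERROR:exception: Force RMSE error of 453.528 with threshold of 5
"""

failures = rmse_failures(sample)
print(len(failures), [f[3] for f in failures])
```

In the excerpt above, the ten reported values all fall between 446.944 and 456.956, far above the threshold of 5.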
snapshot
Posts: 132
Joined: Thu Apr 09, 2009 7:25 pm
Location: Wiltshire, UK

Re: 13001 WU failure

Post by snapshot »

I've just had the same problem:

Code:

18:57:26:WU02:FS00:0x17:ERROR:exception: Force RMSE error of 455.059 with threshold of 5
18:57:26:WU02:FS00:0x17:Saving result file logfile_01.txt
18:57:26:WU02:FS00:0x17:Saving result file log.txt
18:57:26:WU02:FS00:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
18:57:26:WARNING:WU02:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
18:57:26:WU02:FS00:Sending unit results: id:02 state:SEND error:FAULTY project:13001 run:62 clone:3 gen:11 core:0x17 unit:0x0000001e538b3db753286153604b81f0
18:57:26:WU02:FS00:Uploading 2.30KiB to 140.163.4.231
18:57:26:WU02:FS00:Connecting to 140.163.4.231:8080
18:57:26:WU02:FS00:Upload complete
18:57:26:WU02:FS00:Server responded WORK_ACK (400)
18:57:26:WU02:FS00:Cleaning up
Nvidia drivers 340.52 under W7 Pro 64. I'll try the 344.11 drivers on my test box but I wasn't using them because they were so poor on 9201s.
Last edited by snapshot on Mon Oct 06, 2014 5:44 am, edited 1 time in total.
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona
Contact:

Re: 13001 WU failure

Post by 7im »

What version of fahcore?

On what kind of hardware? We need more info to help you.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
snapshot
Posts: 132
Joined: Thu Apr 09, 2009 7:25 pm
Location: Wiltshire, UK

Re: 13001 WU failure

Post by snapshot »

FAHcore is version 52. Hardware is i7-3770, 16GB RAM, GTX750ti.

Just had another one:

Code:

20:06:01:WU02:FS00:0x17:ERROR:exception: Force RMSE error of 450.68 with threshold of 5
20:06:01:WU02:FS00:0x17:Saving result file logfile_01.txt
20:06:01:WU02:FS00:0x17:Saving result file log.txt
20:06:01:WU02:FS00:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
20:06:01:WARNING:WU02:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
20:06:01:WU02:FS00:Sending unit results: id:02 state:SEND error:FAULTY project:13000 run:129 clone:0 gen:50 core:0x17 unit:0x00000066538b3db7530fc0694e857c15
20:06:01:WU02:FS00:Uploading 2.31KiB to 140.163.4.231
20:06:01:WU02:FS00:Connecting to 140.163.4.231:8080
20:06:02:WU02:FS00:Upload complete
20:06:02:WU02:FS00:Server responded WORK_ACK (400)
20:06:02:WU02:FS00:Cleaning up
This is a system that was 100% stable with 9201, 8108 and 762x WUs and has not had any hardware changes or any extra software installed other than MS updates as I've been away from home for the last four days.
This is preventing me from folding with my GPU and, if I can only use the CPU, then I'm just not going to bother.
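
Note that the failure just above is project 13000 rather than 13001, so it may be worth tallying faulty results per project instead of assuming a single project is involved. The sketch below counts FAULTY results by project number from the "Sending unit results" lines. It assumes the line format seen in the logs in this thread, and the helper name `faulty_by_project` is made up for illustration.

```python
import re
from collections import Counter

# Matches the fields in lines such as:
# ...:Sending unit results: id:02 state:SEND error:FAULTY project:13001 run:62 clone:3 gen:11 ...
SEND_RE = re.compile(
    r"Sending unit results: id:\d+ state:SEND error:(\w+) "
    r"project:(\d+) run:(\d+) clone:(\d+) gen:(\d+)"
)


def faulty_by_project(log_text):
    """Count results reported with error:FAULTY, keyed by project number."""
    counts = Counter()
    for m in SEND_RE.finditer(log_text):
        error, project = m.group(1), int(m.group(2))
        if error == "FAULTY":
            counts[project] += 1
    return counts


# The two "Sending unit results" lines from the logs posted above.
sample = """\
18:57:26:WU02:FS00:Sending unit results: id:02 state:SEND error:FAULTY project:13001 run:62 clone:3 gen:11 core:0x17 unit:0x0000001e538b3db753286153604b81f0
20:06:01:WU02:FS00:Sending unit results: id:02 state:SEND error:FAULTY project:13000 run:129 clone:0 gen:50 core:0x17 unit:0x00000066538b3db7530fc0694e857c15
"""

counts = faulty_by_project(sample)
for project, n in sorted(counts.items()):
    print(project, n)
```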
gwildperson
Posts: 450
Joined: Tue Dec 04, 2007 8:36 pm

Re: 13001 WU failure

Post by gwildperson »

snapshot wrote: Nvidia drivers 340.52 under W7 Pro 64. I'll try the 344.11 drivers on my test box but I wasn't using them because they were so poor on 9201s.
Why 344.11, when 344.16 was released 5 days later?
Kjetil
Posts: 175
Joined: Sat Apr 14, 2012 5:56 pm
Location: Stavanger Norway

Re: 13001 WU failure

Post by Kjetil »

344.16 is ONLY for the 970 and 980.
Razzaa
Posts: 2
Joined: Mon Oct 06, 2014 1:41 am

Re: 13001 WU failure

Post by Razzaa »

I am having the exact same issues. I have tried numerous things to fix it, but now my GPU won't fold at all.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 13001 WU failure

Post by bruce »

Razzaa wrote: I am having the exact same issues. I have tried numerous things to fix it, but now my GPU won't fold at all.
Please report which GPU you have and which drivers you are running.
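
Part of that information is already in the client log. The sketch below pulls the detected GPU lines and the FahCore version out of a pasted log, assuming the formats seen in the logs in this thread. The `summarize` helper is hypothetical. The NVIDIA display driver version (for example 343.22) does not appear in the logs posted here, so it still has to be reported separately.

```python
import re

# e.g. "15:45:26:      GPU 0: NVIDIA:4 GM107 [GeForce GTX 750 Ti]"
GPU_RE = re.compile(r"^\d{2}:\d{2}:\d{2}:\s*GPU (\d+): (.+)$")
# e.g. "14:18:21:WU01:FS01:0x17:Version 0.0.52"
CORE_VER_RE = re.compile(r":0x[0-9a-fA-F]+:Version (\S+)$")


def summarize(log_text):
    """Collect detected GPU descriptions and FahCore versions from a client log."""
    info = {"gpus": [], "core_versions": set()}
    for line in log_text.splitlines():
        line = line.rstrip()
        m = GPU_RE.match(line)
        if m:
            info["gpus"].append(m.group(2))
            continue
        m = CORE_VER_RE.search(line)
        if m:
            info["core_versions"].add(m.group(1))
    return info


# Lines copied from logs posted in this thread.
sample = """\
15:45:26:    Version: 7.4.4
15:45:26:       GPUs: 1
15:45:26:      GPU 0: NVIDIA:4 GM107 [GeForce GTX 750 Ti]
14:18:21:WU01:FS01:0x17:Folding@home GPU core17
14:18:21:WU01:FS01:0x17:Version 0.0.52
"""

info = summarize(sample)
print(info["gpus"], sorted(info["core_versions"]))
```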
Barryfla
Posts: 5
Joined: Sat Sep 27, 2014 6:10 pm

Re: 13001 WU failure

Post by Barryfla »

I am having the same problem that others have described. My GTX 750 Ti won't fold. Driver version 334.89, Windows 7, AMD FX-6350 six-core, 16 GB RAM.

14:18:11:WU01:FS01:Connecting to 171.67.108.201:80
14:18:12:WU01:FS01:Assigned to work server 140.163.4.231
14:18:12:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:GM107 [GeForce GTX 750 Ti] from 140.163.4.231
14:18:12:WU01:FS01:Connecting to 140.163.4.231:8080
14:18:12:WU01:FS01:Downloading 4.84MiB
14:18:17:WU01:FS01:Download complete
14:18:17:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:13001 run:48 clone:4 gen:34 core:0x17 unit:0x00000048538b3db753285d6453ddcf7a
14:18:17:WU01:FS01:Starting
14:18:17:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Barry/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe -dir 01 -suffix 01 -version 704 -lifeline 13732 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
14:18:17:WU01:FS01:Started FahCore on PID 12816
14:18:17:WU01:FS01:Core PID:5296
14:18:17:WU01:FS01:FahCore 0x17 started
14:18:18:WU01:FS01:0x17:*********************** Log Started 2014-10-06T14:18:18Z ***********************
14:18:18:WU01:FS01:0x17:Project: 13001 (Run 48, Clone 4, Gen 34)
14:18:18:WU01:FS01:0x17:Unit: 0x00000048538b3db753285d6453ddcf7a
14:18:18:WU01:FS01:0x17:CPU: 0x00000000000000000000000000000000
14:18:18:WU01:FS01:0x17:Machine: 1
14:18:18:WU01:FS01:0x17:Reading tar file state.xml
14:18:19:WU01:FS01:0x17:Reading tar file system.xml
14:18:20:WU01:FS01:0x17:Reading tar file integrator.xml
14:18:20:WU01:FS01:0x17:Reading tar file core.xml
14:18:20:WU01:FS01:0x17:Digital signatures verified
14:18:21:WU01:FS01:0x17:Folding@home GPU core17
14:18:21:WU01:FS01:0x17:Version 0.0.52
14:22:20:WU01:FS01:0x17:ERROR:exception: Force RMSE error of 455.674 with threshold of 5
14:22:20:WU01:FS01:0x17:Saving result file logfile_01.txt
14:22:20:WU01:FS01:0x17:Saving result file log.txt
14:22:20:WU01:FS01:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
14:22:21:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
14:22:21:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13001 run:48 clone:4 gen:34 core:0x17 unit:0x00000048538b3db753285d6453ddcf7a
Last edited by Barryfla on Tue Oct 07, 2014 12:07 am, edited 1 time in total.