
Trouble folding in linux

Posted: Mon Feb 23, 2015 5:35 am
by james888
I am trying to fold in Linux and am having trouble. I have folded in Linux before without issue, so I would love to get some help with this. The log is below. Hardware is an ASRock H97M-ITX/ac with a Pentium G3258 and an NVIDIA GTX 750 Ti.

I have googled BAD_WORK_UNIT (114 = 0x72) and it seems to be related to stability issues. I have run many stability tests in Windows on this same hardware and it was fine, and I can fold in Windows without any errors. I uninstalled and reinstalled the NVIDIA drivers and the error persists.

Code:

05:10:45:WU01:FS00:0x17:*********************** Log Started 2015-02-23T05:10:44Z ***********************
05:10:45:WU01:FS00:0x17:Project: 13000 (Run 672, Clone 9, Gen 76)
05:10:45:WU01:FS00:0x17:Unit: 0x0000008d538b3db75310598b20ce5ec0
05:10:45:WU01:FS00:0x17:CPU: 0x00000000000000000000000000000000
05:10:45:WU01:FS00:0x17:Machine: 0
05:10:45:WU01:FS00:0x17:Reading tar file state.xml
05:10:45:WU01:FS00:0x17:Reading tar file system.xml
05:10:45:WU01:FS00:0x17:Reading tar file integrator.xml
05:10:45:WU01:FS00:0x17:Reading tar file core.xml
05:10:45:WU01:FS00:0x17:Digital signatures verified
05:10:50:WU02:FS00:Upload 30.80%
05:10:56:WU02:FS00:Upload 45.31%
05:11:02:WU02:FS00:Upload 60.84%
05:11:08:WU02:FS00:Upload 76.11%
05:11:14:WU02:FS00:Upload 90.37%
05:11:19:WU02:FS00:Upload complete
05:11:19:WU02:FS00:Server responded WORK_ACK (400)
05:11:19:WU02:FS00:Cleaning up
05:13:08:WU01:FS00:0x17:ERROR:exception: Force RMSE error of 452.982 with threshold of 5
05:13:08:WU01:FS00:0x17:Saving result file logfile_01.txt
05:13:08:WU01:FS00:0x17:Saving result file badStateCheckpoint_517940078
05:13:08:WU01:FS00:0x17:Saving result file badStateForceGroup0_517940078Core.xml
05:13:10:WU01:FS00:0x17:Saving result file badStateForceGroup0_517940078Ref.xml
05:13:13:WU01:FS00:0x17:Saving result file badStateForceGroup1_517940078Core.xml
05:13:15:WU01:FS00:0x17:Saving result file badStateForceGroup1_517940078Ref.xml
05:13:17:WU01:FS00:0x17:Saving result file badStateForceGroup2_517940078Core.xml
05:13:19:WU01:FS00:0x17:Saving result file badStateForceGroup2_517940078Ref.xml
05:13:20:WU01:FS00:0x17:Saving result file log.txt
05:13:20:WU01:FS00:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
05:13:21:WARNING:WU01:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
05:13:21:WU01:FS00:Sending unit results: id:01 state:SEND error:FAULTY project:13000 run:672 clone:9 gen:76 core:0x17 unit:0x0000008d538b3db75310598b20ce5ec0
05:13:21:WU01:FS00:Uploading 24.54MiB to 140.163.4.231
05:13:21:WU01:FS00:Connecting to 140.163.4.231:8080
05:13:21:WU02:FS00:Connecting to 171.67.108.200:80
05:13:22:WU02:FS00:Assigned to work server 140.163.4.233
05:13:22:WU02:FS00:Requesting new work unit for slot 00: READY gpu:0:GM107 [GeForce GTX 750 Ti] from 140.163.4.233
05:13:22:WU02:FS00:Connecting to 140.163.4.233:8080
05:13:22:WU02:FS00:Downloading 4.93MiB
05:13:24:WU02:FS00:Download complete
05:13:24:WU02:FS00:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:10469 run:0 clone:157 gen:91 core:0x17 unit:0x00000091538b3db9538f3c536c518b66
05:13:24:WU02:FS00:Starting
05:13:24:WU02:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17 -dir 02 -suffix 01 -version 704 -lifeline 1270 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
05:13:24:WU02:FS00:Started FahCore on PID 29253
05:13:24:Started thread 23 on PID 1270
05:13:24:WU02:FS00:Core PID:29257
05:13:24:WU02:FS00:FahCore 0x17 started
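
For anyone reading a log like this one, the line that matters is the "Force RMSE error" exception: the core reports an error value and the threshold it was checked against. A minimal sketch for pulling those two numbers out of a log line, assuming the message format is exactly as it appears above (the helper name force_rmse is made up for illustration):

```python
import re

# Pattern taken from the exception line in the log above; other core
# versions may word this message differently.
RMSE_RE = re.compile(r"Force RMSE error of ([\d.]+) with threshold of ([\d.]+)")


def force_rmse(line):
    """Return (error, threshold) as floats, or None if the line does not match."""
    match = RMSE_RE.search(line)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


line = ("05:13:08:WU01:FS00:0x17:ERROR:exception: "
        "Force RMSE error of 452.982 with threshold of 5")
error, threshold = force_rmse(line)
# Shows how far past the threshold the reported error is.
print(error / threshold)
```

Here the reported error is roughly 90 times the threshold, so this is not a borderline failure.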

Re: Trouble folding in linux

Posted: Mon Feb 23, 2015 7:21 am
by bruce
BAD_WORK_UNIT can happen because of stability issues, but a percentage of WUs are actually bad. There's no dependable way to determine which it is except to retry the WU on somebody else's hardware.

In this case, Project: 13000 (Run 672, Clone 9, Gen 76) was attempted by you and one other person and both failed rather quickly, indicating that there's probably nothing wrong with your system. Is the next WU processing normally?

Re: Trouble folding in linux

Posted: Mon Feb 23, 2015 7:53 am
by rwh202
That looks like the same old Maxwell / driver issue - what version are you running?
The latest drivers (346.35) seem to have it fixed so Maxwell can now again fold non-9201 core-17 WUs on linux (13000/13001 are only being sent out to 750ti but I think the 970/980 can process them too)
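
A quick way to check an installed driver against the 346.35 mentioned above is to compare the dotted version numbers as tuples. This is only a sketch: the helper names are made up, and it assumes the version string is plain digits separated by dots. On many systems the installed version can be read with `nvidia-smi --query-gpu=driver_version --format=csv,noheader`, but here it is simply passed in as a string.

```python
def parse_version(text):
    """Turn a dotted driver version such as '346.35' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def at_least(installed, required="346.35"):
    """True if the installed driver version is the required one or newer."""
    return parse_version(installed) >= parse_version(required)


# Example inputs: the version named in this thread and an older 343 series driver.
print(at_least("346.35"))
print(at_least("343"))
```

Comparing tuples of integers avoids the trap of comparing the strings directly, where "346.9" would sort after "346.35".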

Re: Trouble folding in linux

Posted: Mon Feb 23, 2015 3:29 pm
by james888
bruce wrote:
"BAD_WORK_UNIT can happen because of stability issues, but a percentage of WUs are actually bad. There's no dependable way to determine which it is except to retry the WU on somebody else's hardware.

In this case, Project: 13000 (Run 672, Clone 9, Gen 76) was attempted by you and one other person and both failed rather quickly, indicating that there's probably nothing wrong with your system. Is the next WU processing normally?"

These errors have been going on for days. The client stops itself, but I keep telling it to try again. I cannot fold at all on Linux; it just gives me the error shown until the client gives up.

rwh202 wrote:
"That looks like the same old Maxwell / driver issue - what version are you running?
The latest drivers (346.35) seem to have it fixed so Maxwell can now again fold non-9201 core-17 WUs on linux (13000/13001 are only being sent out to 750ti but I think the 970/980 can process them too)"

I installed 346.35 from the xorg-edgers PPA. I was folding before on the NVIDIA 343 drivers, but those are not available anymore.

Re: Trouble folding in linux

Posted: Mon Feb 23, 2015 10:07 pm
by bruce
james888 wrote:
"These errors have been going on for days. The client stops itself, but I keep telling it to try again. I cannot fold at all on Linux; it just gives me the error shown until the client gives up."

At that time, I only checked one WU. Since it was completed with an error, it was issued to somebody else and you would have been assigned a different WU. Repeated failures on DIFFERENT WUs do indicate a problem with your system, whether that's bad drivers, excessive overclocking, or hardware that's actually defective.
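
To tell one bad WU from repeated failures on different WUs, you can count how many distinct project/run/clone/gen combinations the client has reported as FAULTY. A rough sketch, assuming the "Sending unit results" line looks like the one in the log posted above; the function names and the cutoff of 3 distinct WUs are arbitrary choices for illustration, not a Folding@home rule:

```python
import re
from collections import Counter

# Matches the client line that reports a faulty result, e.g.
# "... error:FAULTY project:13000 run:672 clone:9 gen:76 ..."
FAULTY_RE = re.compile(
    r"error:FAULTY project:(\d+) run:(\d+) clone:(\d+) gen:(\d+)"
)


def faulty_work_units(log_text):
    """Count FAULTY reports per (project, run, clone, gen) tuple."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FAULTY_RE.search(line)
        if match:
            counts[tuple(int(g) for g in match.groups())] += 1
    return counts


def looks_like_system_problem(counts, min_distinct=3):
    """Many DIFFERENT failed WUs point at the machine, not at the WUs."""
    # min_distinct is an illustrative threshold only.
    return len(counts) >= min_distinct


sample = ("05:13:21:WU01:FS00:Sending unit results: id:01 state:SEND "
          "error:FAULTY project:13000 run:672 clone:9 gen:76 core:0x17 "
          "unit:0x0000008d538b3db75310598b20ce5ec0")
counts = faulty_work_units(sample)
print(counts)
print(looks_like_system_problem(counts))
```

Run over several days of log text instead of the single sample line, this shows whether the same WU keeps failing or a different one fails each time.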