project:9101 run:435 clone:0 gen:65 core:0x17 unit:0x0000004
Posted: Mon May 05, 2014 12:00 pm
by Devlin85
03:02:50:WU03:FS00:0x17:Bad State detected... attempting to resume from last good checkpoint
03:02:50:WU03:FS00:0x17:Max number of retries reached. Aborting.
03:02:50:WU03:FS00:0x17:ERROR:exception: Max Retries Reached
03:02:50:WU03:FS00:0x17:Saving result file logfile_01.txt
03:02:50:WU03:FS00:0x17:Saving result file log.txt
03:02:50:WU03:FS00:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
03:02:51:WARNING:WU03:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
03:02:51:WU03:FS00:Sending unit results: id:03 state:SEND error:FAULTY project:9101 run:435 clone:0 gen:65 core:0x17 unit:0x000000430a3b1e81533f581756f6ca6f
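For anyone who wants to pull the project/run/clone/gen (PRCG) and return code out of lines like the ones above, here is a minimal Python sketch. The regular expressions and the `parse_failure` helper are assumptions inferred only from the log lines quoted in this post, not an official description of the FAHClient log format.

```python
import re

# Patterns inferred from the quoted log lines (an assumption, not a spec).
RETURN_RE = re.compile(r"FahCore returned: (\w+) \((\d+) = (0x[0-9a-fA-F]+)\)")
PRCG_RE = re.compile(r"error:(\w+) project:(\d+) run:(\d+) clone:(\d+) gen:(\d+)")


def parse_failure(lines):
    """Collect the core return status and the PRCG from client log lines."""
    info = {}
    for line in lines:
        m = RETURN_RE.search(line)
        if m:
            info["status"] = m.group(1)
            info["code"] = int(m.group(2))
            # The log prints the code in decimal and hex; check they agree.
            info["code_consistent"] = int(m.group(3), 16) == info["code"]
        m = PRCG_RE.search(line)
        if m:
            info["error"] = m.group(1)
            info["prcg"] = tuple(int(g) for g in m.groups()[1:])
    return info


sample = [
    "03:02:51:WARNING:WU03:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
    "03:02:51:WU03:FS00:Sending unit results: id:03 state:SEND error:FAULTY "
    "project:9101 run:435 clone:0 gen:65 core:0x17 "
    "unit:0x000000430a3b1e81533f581756f6ca6f",
]
result = parse_failure(sample)
print(result)
```

The PRCG tuple is what other donors and moderators need in order to look up whether the same WU completed elsewhere.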
Re: project:9101 run:435 clone:0 gen:65 core:0x17 unit:0x000
Posted: Mon May 05, 2014 1:31 pm
by PantherX
The WU was successfully completed by another donor so it isn't a bad one.
Re: project:9101 run:435 clone:0 gen:65 core:0x17 unit:0x000
Posted: Mon May 05, 2014 4:51 pm
by bruce
0x0000004 indicates that Windows discovered a memory error. I recommend running thorough diagnostics on main RAM, then reducing the RAM overclock if you don't find a bad stick.
Re: project:9101 run:435 clone:0 gen:65 core:0x17 unit:0x000
Posted: Mon May 05, 2014 5:04 pm
by Devlin85
It's 9101 again. Just got another one, and it actually locked up my computer afterwards. I'm going to run some memtests, but the 13000, 13001, and 9408 WUs run and never seem to fail. It's just these 9101s.
Project: 9101 (Run 790, Clone 0, Gen 65)
Unit: 0x000000440a3b1e81533f77b99203944a
CPU: 0x00000000000000000000000000000000
Machine: 0
Reading tar file state.xml
Reading tar file system.xml
Reading tar file integrator.xml
Reading tar file core.xml
Digital signatures verified
Folding@home GPU core17
Version 0.0.52
Completed 0 out of 2500000 steps (0%)
Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
Completed 25000 out of 2500000 steps (1%)
Completed 50000 out of 2500000 steps (2%)
Bad State detected... attempting to resume from last good checkpoint
Completed 25000 out of 2500000 steps (1%)
Completed 50000 out of 2500000 steps (2%)
Bad State detected... attempting to resume from last good checkpoint
Completed 25000 out of 2500000 steps (1%)
Completed 50000 out of 2500000 steps (2%)
Bad State detected... attempting to resume from last good checkpoint
Max number of retries reached. Aborting.
ERROR:exception: Max Retries Reached
Saving result file logfile_01.txt
Saving result file log.txt
Folding@home Core Shutdown: BAD_WORK_UNIT
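The log above shows the core resuming from a checkpoint three times and failing at the same point each time. A rough Python sketch for checking that pattern in a core log follows; the `summarize_bad_states` name and the matched strings are assumptions taken from the messages in this log, not from any documented format.

```python
import re

# Progress line format as it appears in the log above (an assumption).
STEP_RE = re.compile(r"Completed (\d+) out of (\d+) steps")


def summarize_bad_states(log_text):
    """Return (resume count, last step seen before each bad state, aborted?)."""
    resumes = 0
    aborted = False
    last_step = 0
    steps_before_failure = []
    for line in log_text.splitlines():
        m = STEP_RE.search(line)
        if m:
            last_step = int(m.group(1))
        elif "Bad State detected" in line:
            resumes += 1
            steps_before_failure.append(last_step)
        elif "Max number of retries reached" in line:
            aborted = True
    return resumes, steps_before_failure, aborted


sample = """\
Completed 0 out of 2500000 steps (0%)
Completed 25000 out of 2500000 steps (1%)
Completed 50000 out of 2500000 steps (2%)
Bad State detected... attempting to resume from last good checkpoint
Completed 25000 out of 2500000 steps (1%)
Completed 50000 out of 2500000 steps (2%)
Bad State detected... attempting to resume from last good checkpoint
Completed 25000 out of 2500000 steps (1%)
Completed 50000 out of 2500000 steps (2%)
Bad State detected... attempting to resume from last good checkpoint
Max number of retries reached. Aborting.
"""
summary = summarize_bad_states(sample)
print(summary)
```

If every bad state follows the same progress line, as it does here, the failure is reproducible at one point in the trajectory rather than random.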
Re: project:9101 run:435 clone:0 gen:65 core:0x17 unit:0x000
Posted: Mon May 05, 2014 5:16 pm
by Devlin85
Ran MemtestG80
Final error count after 100 iterations over 128 MiB of GPU memory: 0 errors
Also did at 1GB:
Final error count after 50 iterations over 1024 MiB of GPU memory: 0 errors
Re: project:9101 run:435 clone:0 gen:65 core:0x17 unit:0x000
Posted: Mon May 05, 2014 6:24 pm
by bruce
It's not a GPU memory error; it's a main RAM memory error detected by Windows in the CPU application FahCore_17, not in the GPU. (Windows doesn't manage GPU memory.) Run memtest86*
Re: project:9101 run:435 clone:0 gen:65 core:0x17 unit:0x000
Posted: Mon May 05, 2014 7:55 pm
by Devlin85
bruce wrote: It's not a GPU memory error; it's a main RAM memory error detected by Windows in the CPU application FahCore_17, not in the GPU. (Windows doesn't manage GPU memory.) Run memtest86*
Ran a couple of memtests. Oddly, it looks like the XMP profile (Standard, not OC) is to blame: it didn't produce any errors, it just flat out froze. I went in and set everything manually, and it passed all the tests without issue. Fingers crossed I'm back in business here. I noticed it was having issues and paused it, so it still has the same 9101 WU to process and I should find out soon.
Re: project:9101 run:435 clone:0 gen:65 core:0x17 unit:0x000
Posted: Mon May 05, 2014 8:02 pm
by Devlin85
Well, either it didn't like the fact that I paused it or it's still something else, but it got to the same spot, bailed on the WU, and requested a new one. No crash this time though. It went right back to processing the new WU.
19:46:44:WU01:FS00:0x17:Completed 0 out of 2500000 steps (0%)
19:46:44:WU01:FS00:0x17:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
19:47:11:Started thread 10 on PID 4412
19:48:31:WU00:FS02:0x17:Completed 0 out of 2000000 steps (0%)
19:48:31:WU00:FS02:0x17:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
19:55:52:WU01:FS00:0x17:Completed 25000 out of 2500000 steps (1%)
19:56:12:WU00:FS02:0x17:Completed 20000 out of 2000000 steps (1%)
19:57:22:WU01:FS00:0x17:ERROR:exception: The periodic box size has decreased to less than twice the nonbonded cutoff.
19:57:22:WU01:FS00:0x17:Saving result file logfile_01.txt
19:57:22:WU01:FS00:0x17:Saving result file log.txt
19:57:22:WU01:FS00:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
19:57:22:WARNING:WU01:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
19:57:22:WU01:FS00:Sending unit results: id:01 state:SEND error:FAULTY project:9101 run:790 clone:0 gen:65 core:0x17 unit:0x000000440a3b1e81533f77b99203944a
19:57:22:WU01:FS00:Uploading 2.88KiB to 171.64.65.93
19:57:22:WU01:FS00:Connecting to 171.64.65.93:8080
19:57:23:WU02:FS00:Connecting to assign-GPU.stanford.edu:80
19:57:23:WU01:FS00:Upload complete
19:57:23:WU01:FS00:Server responded WORK_ACK (400)
19:57:23:WU01:FS00:Cleaning up
19:57:23:WU02:FS00:News: Welcome to Folding@Home
19:57:23:WU02:FS00:Assigned to work server 171.64.65.93
19:57:23:WU02:FS00:Requesting new work unit for slot 00: READY gpu:0:GK104 [GeForce GTX 760] from 171.64.65.93
Re: project:9101 run:435 clone:0 gen:65 core:0x17 unit:0x000
Posted: Mon May 05, 2014 8:17 pm
by Devlin85
Just bombed another 9101, but this time and last time it gave an explicit error. This time: "ERROR:exception: First periodic box vector must be parallel to x." Last time: "ERROR:exception: The periodic box size has decreased to less than twice the nonbonded cutoff."
20:11:08:WU00:FS02:0x17:Completed 60000 out of 2000000 steps (3%)
20:11:53:WU02:FS00:0x17:ERROR:exception: First periodic box vector must be parallel to x.
20:11:53:WU02:FS00:0x17:Saving result file logfile_01.txt
20:11:53:WU02:FS00:0x17:Saving result file log.txt
20:11:53:WU02:FS00:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
20:11:54:WARNING:WU02:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
20:11:54:WU02:FS00:Sending unit results: id:02 state:SEND error:FAULTY project:9101 run:757 clone:0 gen:69 core:0x17 unit:0x0000004d0a3b1e81533f74c0698268cb
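To see at a glance which projects and runs are failing and with what exception, the two kinds of lines in these logs can be paired up by their WU and slot tags. This is a minimal Python sketch; the `failures` helper and its regular expressions are assumptions based only on the lines quoted in this thread.

```python
import re

# Line shapes inferred from the logs in this thread (an assumption).
ERR_RE = re.compile(r"(WU\d+):(FS\d+):0x[0-9a-fA-F]+:ERROR:exception: (.+)$")
SEND_RE = re.compile(
    r"(WU\d+):(FS\d+):Sending unit results: .*project:(\d+) run:(\d+)"
)


def failures(lines):
    """Pair each exception with the project/run reported for the same WU and slot."""
    pending = {}  # (WU tag, slot tag) -> most recent exception message
    out = []
    for line in lines:
        m = ERR_RE.search(line)
        if m:
            pending[(m.group(1), m.group(2))] = m.group(3).strip()
            continue
        m = SEND_RE.search(line)
        if m:
            key = (m.group(1), m.group(2))
            out.append((int(m.group(3)), int(m.group(4)), pending.pop(key, None)))
    return out


sample = [
    "19:57:22:WU01:FS00:0x17:ERROR:exception: The periodic box size has "
    "decreased to less than twice the nonbonded cutoff.",
    "19:57:22:WU01:FS00:Sending unit results: id:01 state:SEND error:FAULTY "
    "project:9101 run:790 clone:0 gen:65 core:0x17 "
    "unit:0x000000440a3b1e81533f77b99203944a",
    "20:11:53:WU02:FS00:0x17:ERROR:exception: First periodic box vector must "
    "be parallel to x.",
    "20:11:54:WU02:FS00:Sending unit results: id:02 state:SEND error:FAULTY "
    "project:9101 run:757 clone:0 gen:69 core:0x17 "
    "unit:0x0000004d0a3b1e81533f74c0698268cb",
]
report = failures(sample)
for project, run, message in report:
    print(project, run, message)
```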
Re: project:9101 run:435 clone:0 gen:65 core:0x17 unit:0x000
Posted: Mon May 05, 2014 9:03 pm
by PantherX
What GPU are you using and what driver version? Please do post the initial section of the log file so we can see the system configuration and F@H settings.
Re: project:9101 run:435 clone:0 gen:65 core:0x17 unit:0x000
Posted: Mon May 05, 2014 9:43 pm
by Devlin85
*********************** Log Started 2014-05-05T19:45:49Z ***********************
19:45:49:************************* Folding@home Client *************************
19:45:49: Website: http://folding.stanford.edu/
19:45:49: Copyright: (c) 2009-2013 Stanford University
19:45:49: Author: Joseph Coffland <[email protected]>
19:45:49: Args:
19:45:49: Config: C:/ProgramData/FAHClient/config.xml
19:45:49:******************************** Build ********************************
19:45:49: Version: 7.3.6
19:45:49: Date: Feb 18 2013
19:45:49: Time: 15:25:17
19:45:49: SVN Rev: 3923
19:45:49: Branch: fah/trunk/client
19:45:49: Compiler: Intel(R) C++ MSVC 1500 mode 1200
19:45:49: Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
19:45:49: /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT /Qmkl
19:45:49: Platform: win32 XP
19:45:49: Bits: 32
19:45:49: Mode: Release
19:45:49:******************************* System ********************************
19:45:49: CPU: Intel(R) Core(TM) i7-4820K CPU @ 3.70GHz
19:45:49: CPU ID: GenuineIntel Family 6 Model 62 Stepping 4
19:45:49: CPUs: 8
19:45:49: Memory: 15.94GiB
19:45:49: Free Memory: 14.34GiB
19:45:49: Threads: WINDOWS_THREADS
19:45:49: Has Battery: false
19:45:49: On Battery: false
19:45:49: UTC offset: -4
19:45:49: PID: 4412
19:45:49: CWD: C:/ProgramData/FAHClient
19:45:49: OS: Windows 8 Evolution 2014 64-Bit
19:45:49: OS Arch: AMD64
19:45:49: GPUs: 2
19:45:49: GPU 0: NVIDIA:3 GK104 [GeForce GTX 760]
19:45:49: GPU 1: NVIDIA:3 GK104 [GeForce GTX 760]
19:45:49: CUDA: 3.0
19:45:49: CUDA Driver: 5050
19:45:49:Win32 Service: false
19:45:49:***********************************************************************
19:45:49:<config>
19:45:49: <service-description v='Folding@home Client'/>
19:45:49: <service-restart v='true'/>
19:45:49: <service-restart-delay v='5000'/>
19:45:49:
19:45:49: <!-- Client Control -->
19:45:49: <client-threads v='4'/>
19:45:49: <cycle-rate v='4'/>
19:45:49: <cycles v='-1'/>
19:45:49: <data-directory v='.'/>
19:45:49: <disable-sleep-when-active v='true'/>
19:45:49: <exec-directory v='C:\Program Files (x86)\FAHClient'/>
19:45:49: <exit-when-done v='false'/>
19:45:49: <fold-anon v='false'/>
19:45:49: <open-web-control v='false'/>
19:45:49:
19:45:49: <!-- Configuration -->
19:45:49: <config-rotate v='true'/>
19:45:49: <config-rotate-dir v='configs'/>
19:45:49: <config-rotate-max v='16'/>
19:45:49:
19:45:49: <!-- Debugging -->
19:45:49: <assignment-servers>
19:45:49: assign3.stanford.edu:8080 assign4.stanford.edu:80
19:45:49: </assignment-servers>
19:45:49: <capture-directory v='capture'/>
19:45:49: <capture-on-error v='false'/>
19:45:49: <capture-packets v='false'/>
19:45:49: <capture-requests v='false'/>
19:45:49: <capture-responses v='false'/>
19:45:49: <capture-sockets v='false'/>
19:45:49: <debug-sockets v='false'/>
19:45:49: <exception-locations v='true'/>
19:45:49: <gpu-assignment-servers>
19:45:49: assign-GPU.stanford.edu:80 assign-GPU.stanford.edu:8080
19:45:49: </gpu-assignment-servers>
19:45:49: <stack-traces v='false'/>
19:45:49:
19:45:49: <!-- Error Handling -->
19:45:49: <max-slot-errors v='5'/>
19:45:49: <max-unit-errors v='5'/>
19:45:49:
19:45:49: <!-- Folding Core -->
19:45:49: <checkpoint v='30'/>
19:45:49: <core-dir v='cores'/>
19:45:49: <core-priority v='low'/>
19:45:49: <cpu-affinity v='false'/>
19:45:49: <cpu-usage v='100'/>
19:45:49: <gpu-usage v='100'/>
19:45:49: <no-assembly v='false'/>
19:45:49:
19:45:49: <!-- Folding Slot Configuration -->
19:45:49: <cause v='ANY'/>
19:45:49: <client-subtype v='STDCLI'/>
19:45:49: <client-type v='advanced'/>
19:45:49: <cpu-species v='X86_PENTIUM_II'/>
19:45:49: <cpu-type v='AMD64'/>
19:45:49: <cpus v='-1'/>
19:45:49: <cuda-index v='0'/>
19:45:49: <gpu v='true'/>
19:45:49: <max-packet-size v='normal'/>
19:45:49: <opencl-index v='0'/>
19:45:49: <os-species v='UNKNOWN'/>
19:45:49: <os-type v='WIN32'/>
19:45:49: <power v='full'/>
19:45:49: <project-key v='0'/>
19:45:49: <smp v='true'/>
19:45:49:
19:45:49: <!-- Process Control -->
19:45:49: <child v='false'/>
19:45:49: <daemon v='false'/>
19:45:49: <pid v='false'/>
19:45:49: <pid-file v='Folding@home Client.pid'/>
19:45:49: <respawn v='false'/>
19:45:49: <service v='false'/>
19:45:49:
19:45:49: <!-- Remote Command Server -->
19:45:49: <command-address v='0.0.0.0'/>
19:45:49: <command-allow-no-pass v='127.0.0.1'/>
19:45:49: <command-deny-no-pass v='0/0'/>
19:45:49: <command-port v='36330'/>
19:45:49:
19:45:49: <!-- Slot Control -->
19:45:49: <idle v='false'/>
19:45:49: <max-shutdown-wait v='60'/>
19:45:49: <pause-on-battery v='false'/>
19:45:49: <pause-on-start v='false'/>
19:45:49:
19:45:49: <!-- Web Server -->
19:45:49: <session-timeout v='3600'/>
19:45:49: <web-allow v='127.0.0.1'/>
19:45:49: <web-deny v='0/0'/>
19:45:49:
19:45:49: <!-- Work Unit Control -->
19:45:49: <dump-after-deadline v='true'/>
19:45:49: <max-queue v='16'/>
19:45:49: <max-units v='0'/>
19:45:49: <next-unit-percentage v='98'/>
19:45:49:
19:45:49: <!-- Folding Slots -->
19:45:49: <slot id='2' type='GPU'>
19:45:49: <client-type v='beta'/>
19:45:49: <gpu-index v='1'/>
19:45:49: </slot>
19:45:49: <slot id='0' type='GPU'/>
19:45:49:</config>
Mod edit: Added Code tags to log
Re: project:9101 run:435 clone:0 gen:65 core:0x17 unit:0x000
Posted: Mon May 05, 2014 10:11 pm
by Devlin85
The one having the issue is slot 0; the one set to handle beta is working fine. I've made both my slots beta now. It downloaded a new WU (still 9101) and is cruising through it right now, at 23%.
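The difference between the two slots can be read straight out of the config dump in the log. Below is a rough Python sketch that strips the timestamp prefixes and lists each slot's effective client-type. The `slot_client_types` helper is hypothetical, the excerpt is trimmed from the config posted above, and the rule that a slot without its own `client-type` falls back to the top-level value is an assumption about how the client applies options.

```python
import re
import xml.etree.ElementTree as ET

# Each config line in the log carries an HH:MM:SS: prefix that must go first.
PREFIX_RE = re.compile(r"^\d{2}:\d{2}:\d{2}:")


def slot_client_types(log_excerpt):
    """Map slot id -> client-type, falling back to the top-level setting."""
    xml_text = "\n".join(
        PREFIX_RE.sub("", line) for line in log_excerpt.splitlines()
    )
    root = ET.fromstring(xml_text)
    default = root.find("client-type")
    default_type = default.get("v") if default is not None else None
    result = {}
    for slot in root.findall("slot"):
        override = slot.find("client-type")
        # Assumed behaviour: a per-slot option overrides the global one.
        result[slot.get("id")] = (
            override.get("v") if override is not None else default_type
        )
    return result


excerpt = """\
19:45:49:<config>
19:45:49: <client-type v='advanced'/>
19:45:49: <slot id='2' type='GPU'>
19:45:49: <client-type v='beta'/>
19:45:49: <gpu-index v='1'/>
19:45:49: </slot>
19:45:49: <slot id='0' type='GPU'/>
19:45:49:</config>
"""
slots = slot_client_types(excerpt)
print(slots)
```

Under that assumption, slot 2 requests beta work while slot 0 uses the top-level advanced setting, which matches the observation that only slot 0 was failing.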
Re: project:9101 run:435 clone:0 gen:65 core:0x17 unit:0x000
Posted: Mon May 05, 2014 11:10 pm
by P5-133XL
You can choose to run closed beta and not be a member of the beta team, but note that all support of beta is done in the beta forums and specifically not in the general forums. You will need to be a beta team member to make any post in the beta forums. You can join the beta team, but you must also be willing to accept all the duties and responsibilities of a beta tester.