Re: Project: 5769 (Run 4, Clone 20, Gen 2247) BAD_WORK_UNIT
Posted: Sat Nov 03, 2012 10:08 pm
by Rolo
I seem to have done it all.
To not hijack this thread, I posted the particulars here: viewtopic.php?f=80&t=22874
To get back on topic, does anyone know if there is a log parser that can list all project runs and their outcomes? I'd like to get a list of all WUs I've run and their outcomes.
Re: Project: 5769 (Run 4, Clone 20, Gen 2247) BAD_WORK_UNIT
Posted: Sat Nov 03, 2012 11:07 pm
by Napoleon
Try FahWatch, viewtopic.php?f=14&t=20391&p=205051&hilit=fahwatch#p205051.
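For a quick look without installing anything, a short script can pull the project lines out of a client log. This is only a minimal sketch, not a replacement for FahWatch. It matches the "Project: N (Run R, Clone C, Gen G)" lines that the cores print, and it assumes the v7 client records each result on a line of the form "WUxx:FSxx:FahCore returned: SOME_RESULT". That result line format is an assumption here, so check it against your own log.txt before trusting the output. The function name summarize is made up for this example.

```python
import re

# Core banner line, e.g.
# 17:54:08:WU01:FS00:0x11:Project: 5769 (Run 13, Clone 297, Gen 20)
PROJECT_RE = re.compile(
    r"^\d{2}:\d{2}:\d{2}:(WU\d+):(FS\d+):0x\w+:"
    r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)"
)

# ASSUMPTION: the client logs the outcome as
# HH:MM:SS:WUxx:FSxx:FahCore returned: RESULT_NAME (...)
RESULT_RE = re.compile(
    r"^\d{2}:\d{2}:\d{2}:(WU\d+):(FS\d+):FahCore returned: (\w+)"
)


def summarize(lines):
    """Map (project, run, clone, gen) to the last outcome seen, or 'UNKNOWN'."""
    current = {}   # (WU id, slot id) -> PRCG tuple currently in that slot
    outcomes = {}  # PRCG tuple -> outcome string
    for line in lines:
        m = PROJECT_RE.match(line)
        if m:
            prcg = tuple(int(x) for x in m.groups()[2:])
            current[(m.group(1), m.group(2))] = prcg
            outcomes.setdefault(prcg, "UNKNOWN")
            continue
        m = RESULT_RE.match(line)
        if m:
            key = (m.group(1), m.group(2))
            if key in current:
                outcomes[current[key]] = m.group(3)
    return outcomes


if __name__ == "__main__":
    import sys
    # Usage: python summarize_log.py path/to/log.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for prcg, outcome in summarize(fh).items():
            print("Project: %d (Run %d, Clone %d, Gen %d): %s" % (prcg + (outcome,)))
```

A unit that is still running, or one whose result line was rotated out of the log, shows up as UNKNOWN.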
Re: Project: 5769 (Run 4, Clone 20, Gen 2247) BAD_WORK_UNIT
Posted: Fri Nov 30, 2012 6:01 pm
by aoeu
I have also experienced a problem with a 5769 WU (Run 13, Clone 297, Gen 20). It has sat at 0% for several days. I tried to 'finish' the unit, hoping that it would go away. It didn't. I don't think it will, either, since the client reports that the unit expired a week ago. I just updated FAH to 7.2.9, hoping that would clear it, and it's still hanging. How do I delete this unit?
Peace?
aoeu
Re: Project: 5769 (Run 4, Clone 20, Gen 2247) BAD_WORK_UNIT
Posted: Fri Nov 30, 2012 6:34 pm
by Joe_H
Do you happen to have a Fermi-based GPU? If so, this topic describes the problem and the fix. Last week non-Fermi WUs were assigned to Fermi clients and would not fold. If that is not your problem, please post the beginning of your log showing the system information, plus the part of the log showing the start of the WU being processed and any errors reported. Information on how to post from the log is here.
Re: Project: 5769 (Run 4, Clone 20, Gen 2247) BAD_WORK_UNIT
Posted: Fri Nov 30, 2012 6:48 pm
by aoeu
Does this help?
Code:
*********************** Log Started 2012-11-30T17:53:58Z ***********************
17:53:58:************************* Folding@home Client *************************
17:53:58: Website: http://folding.stanford.edu/
17:53:58: Copyright: (c) 2009-2012 Stanford University
17:53:58: Author: Joseph Coffland <[email protected]>
17:53:58: Args: --lifeline 4052 --command-port=36330
17:53:58: Config: C:/Users/aoeu/AppData/Roaming/FAHClient/config.xml
17:53:58:******************************** Build ********************************
17:53:58: Version: 7.2.9
17:53:58: Date: Oct 3 2012
17:53:58: Time: 18:05:48
17:53:58: SVN Rev: 3578
17:53:58: Branch: fah/trunk/client
17:53:58: Compiler: Intel(R) C++ MSVC 1500 mode 1200
17:53:58: Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
17:53:58: /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT /Qmkl
17:53:58: Platform: win32 XP
17:53:58: Bits: 32
17:53:58: Mode: Release
17:53:58:******************************* System ********************************
17:53:58: CPU: Intel(R) Core(TM) i5 CPU 750 @ 2.67GHz
17:53:58: CPU ID: GenuineIntel Family 6 Model 30 Stepping 5
17:53:58: CPUs: 4
17:53:58: Memory: 7.99GiB
17:53:58: Free Memory: 6.00GiB
17:53:58: Threads: WINDOWS_THREADS
17:53:58: On Battery: false
17:53:58: UTC offset: -5
17:53:58: PID: 33108
17:53:58: CWD: C:/Users/aoeu/AppData/Roaming/FAHClient
17:53:58: OS: Windows 7 Professional
17:53:58: OS Arch: AMD64
17:53:58: GPUs: 2
17:53:58: GPU 0: NVIDIA:2 GF116 [GeForce GTS 450]
17:53:58: GPU 1: NVIDIA:2 GF116 [GeForce GTS 450]
17:53:58: CUDA: 2.1
17:53:58: CUDA Driver: 5000
17:53:58:Win32 Service: false
17:53:58:***********************************************************************
17:53:58:<config>
17:53:58: <!-- Folding Slot Configuration -->
17:53:58: <gpu v='true'/>
17:53:58:
17:53:58: <!-- Network -->
17:53:58: <proxy v=':8080'/>
17:53:58:
17:53:58: <!-- User Information -->
17:53:58: <passkey v='********************************'/>
17:53:58: <team v='48083'/>
17:53:58: <user v='aoeu'/>
17:53:58:
17:53:58: <!-- Folding Slots -->
17:53:58: <slot id='0' type='GPU'/>
17:53:58: <slot id='1' type='GPU'/>
17:53:58: <slot id='2' type='SMP'/>
17:53:58:</config>
17:53:58:Connecting to assign-GPU.stanford.edu:80
17:53:58:Connecting to assign-GPU.stanford.edu:8080
17:53:58:Read GPUs.txt
17:53:58:Trying to access database...
17:53:58:Successfully acquired database lock
17:53:58:Enabled folding slot 00: READY gpu:0:"GF116 [GeForce GTS 450]"
17:53:58:Enabled folding slot 01: READY gpu:1:"GF116 [GeForce GTS 450]"
17:53:58:Enabled folding slot 02: READY smp:4
17:53:58:WU01:FS00:Starting
17:53:58:WU01:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/aoeu/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_11.fah/FahCore_11.exe -dir 01 -suffix 01 -version 702 -lifeline 33108 -checkpoint 15 -gpu 0
17:53:59:WU01:FS00:Started FahCore on PID 14296
17:53:59:WU01:FS00:Core PID:3632
17:53:59:WU01:FS00:FahCore 0x11 started
17:53:59:WU00:FS01:Starting
17:53:59:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/aoeu/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_15.fah/FahCore_15.exe -dir 00 -suffix 01 -version 702 -lifeline 33108 -checkpoint 15 -gpu 1
17:53:59:WU00:FS01:Started FahCore on PID 3908
17:53:59:WU00:FS01:Core PID:32828
17:53:59:WU00:FS01:FahCore 0x15 started
17:53:59:WU02:FS02:Starting
17:53:59:WU02:FS02:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/aoeu/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/Core_a4.fah/FahCore_a4.exe -dir 02 -suffix 01 -version 702 -lifeline 33108 -checkpoint 15 -np 4
17:53:59:WU02:FS02:Started FahCore on PID 33544
17:53:59:WU02:FS02:Core PID:33800
17:53:59:WU02:FS02:FahCore 0xa4 started
17:53:59:WU01:FS00:0x11:
17:53:59:WU01:FS00:0x11:*------------------------------*
17:53:59:WU01:FS00:0x11:Folding@Home GPU Core
17:53:59:WU01:FS00:0x11:Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
17:53:59:WU01:FS00:0x11:
17:53:59:WU01:FS00:0x11:Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
17:53:59:WU01:FS00:0x11:Build host: amoeba
17:53:59:WU01:FS00:0x11:Board Type: Nvidia
17:53:59:WU01:FS00:0x11:Core :
17:53:59:WU01:FS00:0x11:Preparing to commence simulation
17:53:59:WU01:FS00:0x11:- Ensuring status. Please wait.
17:53:59:WU00:FS01:0x15:
17:53:59:WU00:FS01:0x15:*------------------------------*
17:53:59:WU00:FS01:0x15:Folding@Home GPU Core
17:53:59:WU00:FS01:0x15:Version 2.25 (Wed May 9 17:03:01 EDT 2012)
17:53:59:WU00:FS01:0x15:Build host AmoebaRemote
17:53:59:WU00:FS01:0x15:Board Type NVIDIA/CUDA
17:53:59:WU00:FS01:0x15:Core 15
17:53:59:WU00:FS01:0x15:GPU device info vendor=0 device=0 name=NA match=0 deviceId=1
17:53:59:WU00:FS01:0x15:
17:53:59:WU00:FS01:0x15:Window's signal control handler registered.
17:53:59:WU00:FS01:0x15:Preparing to commence simulation
17:53:59:WU00:FS01:0x15:- Looking at optimizations...
17:53:59:WU00:FS01:0x15:- Files status OK
17:53:59:WU00:FS01:0x15:sizeof(CORE_PACKET_HDR) = 512 file=<>
17:53:59:WU00:FS01:0x15:- Expanded 60165 -> 264278 (decompressed 439.2 percent)
17:53:59:WU00:FS01:0x15:Called DecompressByteArray: compressed_data_size=60165 data_size=264278, decompressed_data_size=264278 diff=0
17:53:59:WU00:FS01:0x15:- Digital signature verified
17:53:59:WU00:FS01:0x15:
17:53:59:WU00:FS01:0x15:Project: 8054 (Run 0, Clone 3101, Gen 12)
17:53:59:WU00:FS01:0x15:
17:53:59:WU00:FS01:0x15:Assembly optimizations on if available.
17:53:59:WU00:FS01:0x15:Entering M.D.
17:53:59:WU02:FS02:0xa4:
17:53:59:WU02:FS02:0xa4:*------------------------------*
17:53:59:WU02:FS02:0xa4:Folding@Home Gromacs GB Core
17:53:59:WU02:FS02:0xa4:Version 2.27 (Dec. 15, 2010)
17:53:59:WU02:FS02:0xa4:
17:53:59:WU02:FS02:0xa4:Preparing to commence simulation
17:53:59:WU02:FS02:0xa4:- Looking at optimizations...
17:53:59:WU02:FS02:0xa4:- Files status OK
17:53:59:WU02:FS02:0xa4:- Expanded 2079568 -> 5386224 (decompressed 259.0 percent)
17:53:59:WU02:FS02:0xa4:Called DecompressByteArray: compressed_data_size=2079568 data_size=5386224, decompressed_data_size=5386224 diff=0
17:53:59:WU02:FS02:0xa4:- Digital signature verified
17:53:59:WU02:FS02:0xa4:
17:53:59:WU02:FS02:0xa4:Project: 7809 (Run 10, Clone 243, Gen 18)
17:53:59:WU02:FS02:0xa4:
17:53:59:WU02:FS02:0xa4:Assembly optimizations on if available.
17:53:59:WU02:FS02:0xa4:Entering M.D.
17:54:01:WU00:FS01:0x15:Will resume from checkpoint file 00/wudata_01.ckp
17:54:01:WU00:FS01:0x15:Tpr hash 00/wudata_01.tpr: 590112850 2274287033 1458076086 840345429 968028606
17:54:01:WU00:FS01:0x15:GPU device id=1
17:54:01:Server connection id=1 on 0.0.0.0:36330 from 127.0.0.1
17:54:01:WU00:FS01:0x15:Working on Good ROcking Metal Altar for Chronical Sinners
17:54:01:WU00:FS01:0x15:Client config unavailable.
17:54:01:WU00:FS01:0x15:Starting GUI Server
17:54:05:WU02:FS02:0xa4:Using Gromacs checkpoints
17:54:05:WU02:FS02:0xa4:Mapping NT from 4 to 4
17:54:06:WU02:FS02:0xa4:Resuming from checkpoint
17:54:06:WU02:FS02:0xa4:Verified 02/wudata_01.log
17:54:06:WU02:FS02:0xa4:Verified 02/wudata_01.trr
17:54:06:WU02:FS02:0xa4:Verified 02/wudata_01.xtc
17:54:06:WU02:FS02:0xa4:Verified 02/wudata_01.edr
17:54:06:WU02:FS02:0xa4:Completed 840570 out of 1500000 steps (56%)
17:54:08:WU01:FS00:0x11:- Looking at optimizations...
17:54:08:WU01:FS00:0x11:- Working with standard loops on this execution.
17:54:08:WU01:FS00:0x11:- Previous termination of core was improper.
17:54:08:WU01:FS00:0x11:- Going to use standard loops.
17:54:08:WU01:FS00:0x11:- Files status OK
17:54:08:WU01:FS00:0x11:- Expanded 45386 -> 251112 (decompressed 553.2 percent)
17:54:08:WU01:FS00:0x11:Called DecompressByteArray: compressed_data_size=45386 data_size=251112, decompressed_data_size=251112 diff=0
17:54:08:WU01:FS00:0x11:- Digital signature verified
17:54:08:WU01:FS00:0x11:
17:54:08:WU01:FS00:0x11:Project: 5769 (Run 13, Clone 297, Gen 20)
17:54:08:WU01:FS00:0x11:
17:54:08:WU01:FS00:0x11:Entering M.D.
17:54:14:WU01:FS00:0x11:Tpr hash 01/wudata_01.tpr: 1490663758 55188980 3700858087 2492274234 503400090
17:54:14:WU01:FS00:0x11:
17:54:14:WU01:FS00:0x11:Calling fah_main args: 14 usage=100
17:54:14:WU01:FS00:0x11:
17:55:07:WU00:FS01:0x15:Resuming from checkpoint
17:55:07:WU00:FS01:0x15:fcCheckPointResume: retreived and current tpr file hash:
17:55:07:WU00:FS01:0x15: 0 590112850 590112850
17:55:07:WU00:FS01:0x15: 1 2274287033 2274287033
17:55:07:WU00:FS01:0x15: 2 1458076086 1458076086
17:55:07:WU00:FS01:0x15: 3 840345429 840345429
17:55:07:WU00:FS01:0x15: 4 968028606 968028606
17:55:07:WU00:FS01:0x15:fcCheckPointResume: file hashes same.
17:55:07:WU00:FS01:0x15:fcCheckPointResume: state restored.
17:55:07:WU00:FS01:0x15:fcCheckPointResume: name 00/wudata_01.log Verified 00/wudata_01.log
17:55:07:WU00:FS01:0x15:fcCheckPointResume: name 00/wudata_01.trr Verified 00/wudata_01.trr
17:55:07:WU00:FS01:0x15:fcCheckPointResume: name 00/wudata_01.xtc Verified 00/wudata_01.xtc
17:55:07:WU00:FS01:0x15:fcCheckPointResume: name 00/wudata_01.edr Verified 00/wudata_01.edr
17:55:07:WU00:FS01:0x15:fcCheckPointResume: state restored 2
17:55:07:WU00:FS01:0x15:Resumed from checkpoint
17:55:07:WU00:FS01:0x15:Setting checkpoint frequency: 500000
17:55:07:WU00:FS01:0x15:Completed 16500001 out of 50000000 steps (33%).
17:55:08:WARNING:WU00:FS01:Detected clock skew (1 mins 09 secs), adjusting time estimates
18:06:48:WU00:FS01:0x15:Completed 17000000 out of 50000000 steps (34%).
18:11:08:WU02:FS02:0xa4:Completed 855000 out of 1500000 steps (57%)
18:18:27:WU00:FS01:0x15:Completed 17500000 out of 50000000 steps (35%).
18:28:33:WU02:FS02:0xa4:Completed 870000 out of 1500000 steps (58%)
18:30:08:WU00:FS01:0x15:Completed 18000000 out of 50000000 steps (36%).
18:41:47:WU00:FS01:0x15:Completed 18500000 out of 50000000 steps (37%).
18:45:51:WU02:FS02:0xa4:Completed 885000 out of 1500000 steps (59%)
Re: Project: 5769 (Run 4, Clone 20, Gen 2247) BAD_WORK_UNIT
Posted: Fri Nov 30, 2012 7:04 pm
by bollix47
Yes thanks, it does.
In FAHControl click on the Pause button and wait a few seconds until all slots pause.
Click on Start>All Programs>FAHClient>Data Directory>work
Right-click on 01 (the folder for WU01, the stuck 5769 unit, which your log shows running with -dir 01) and select Delete.
Click on the Fold button.
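If you would rather script the delete step than click through Explorer, here is a minimal sketch of the same thing. It assumes the data directory is the CWD shown in the log (C:/Users/aoeu/AppData/Roaming/FAHClient) and that the stuck unit lives in work/01. The function name remove_work_folder is made up for this example. Pause the client first, exactly as in the manual steps, and press Fold afterwards.

```python
import re
import shutil
from pathlib import Path


def remove_work_folder(data_dir, wu_dir):
    """Delete <data_dir>/work/<wu_dir>; return True if something was removed.

    wu_dir must be a two-digit folder name such as "01", which guards
    against deleting anything other than a single work unit folder.
    """
    if not re.fullmatch(r"\d{2}", wu_dir):
        raise ValueError("expected a two-digit work folder name, got %r" % wu_dir)
    target = Path(data_dir) / "work" / wu_dir
    if not target.is_dir():
        return False
    shutil.rmtree(target)
    return True


if __name__ == "__main__":
    # ASSUMPTION: path taken from the CWD line of the posted log;
    # adjust it to your own data directory.
    removed = remove_work_folder("C:/Users/aoeu/AppData/Roaming/FAHClient", "01")
    print("removed" if removed else "nothing to remove")
```

Only the one numbered folder is touched, so the other slots keep their checkpoints.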
Re: Project: 5769 (Run 4, Clone 20, Gen 2247) BAD_WORK_UNIT
Posted: Fri Nov 30, 2012 7:19 pm
by aoeu
Thank you.
If I haven't replied within an hour, consider that you were right.
I'm willing to take the points hit for being slow in asking about this.
Re: Project: 5769 (Run 4, Clone 20, Gen 2247) BAD_WORK_UNIT
Posted: Fri Nov 30, 2012 7:29 pm
by aoeu
I see a new WU. THX
Peace?
aoeu
Re: Project: 5769 (Run 4, Clone 20, Gen 2247) BAD_WORK_UNIT
Posted: Fri Nov 30, 2012 11:10 pm
by codysluder
The magic is not just that you got a new WU, but that you are running a different FahCore. The WUs that used FahCore_11 seem to have been the problem.
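The symptom is visible in the log posted earlier in the thread: WU01 on core 0x11 prints its Project line but never a "Completed ... steps" line, while the 0x15 and 0xa4 units report progress within minutes. A rough sketch that flags such silent units in a log follows. The function name find_silent_units is made up for this example, and a unit that has only just started will be flagged too, so treat the result as a hint rather than a diagnosis.

```python
import re

# e.g. 17:54:08:WU01:FS00:0x11:Project: 5769 (Run 13, Clone 297, Gen 20)
START_RE = re.compile(
    r"^\d{2}:\d{2}:\d{2}:(WU\d+):(FS\d+):0x\w+:"
    r"(Project: \d+ \(Run \d+, Clone \d+, Gen \d+\))"
)

# e.g. 18:06:48:WU00:FS01:0x15:Completed 17000000 out of 50000000 steps (34%).
PROGRESS_RE = re.compile(
    r"^\d{2}:\d{2}:\d{2}:(WU\d+):(FS\d+):0x\w+:Completed \d+ out of \d+ steps"
)


def find_silent_units(lines):
    """Return {(WU id, slot id): project text} for units with no progress line."""
    seen = {}
    progressed = set()
    for line in lines:
        m = START_RE.match(line)
        if m:
            seen[(m.group(1), m.group(2))] = m.group(3)
            continue
        m = PROGRESS_RE.match(line)
        if m:
            progressed.add((m.group(1), m.group(2)))
    return {key: text for key, text in seen.items() if key not in progressed}
```

Run against the log above, this would single out WU01 in slot FS00, the 5769 unit.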