Progress Stuck?

Posted: Tue Jul 23, 2013 11:27 am
by gebraset
I am folding on both GPU and CPU, albeit slower to keep my temperatures down. However, my display shows that both items have been stuck at 99.99% for a while. The log for one of the slots shows a different percentage, which hasn't updated for about 6 hours. The other log shows nothing except that it found a checkpoint.

I have already restarted the application as well as the program, but I'm not sure where to go from here. It's been taking a while to finish these two units, so I'm hoping that I don't have to scrap them and start over.

Any ideas? :?

[Image: screenshot of FAHClient]

Slot 00:

*********************** Log Started 2013-07-22T22:45:18Z ***********************
22:45:21:WU02:FS00:Starting
22:45:21:WU02:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Brentt/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/ATI/R600/Core_17.fah/FahCore_17.exe -dir 02 -suffix 01 -version 703 -lifeline 1548 -checkpoint 30 -gpu 0 -gpu-vendor ati
22:45:21:WU02:FS00:Started FahCore on PID 3524
22:45:21:WU02:FS00:Core PID:3576
22:45:21:WU02:FS00:FahCore 0x17 started
22:45:25:WU02:FS00:0x17:*********************** Log Started 2013-07-22T22:45:25Z ***********************
22:45:25:WU02:FS00:0x17:Project: 8900 (Run 53, Clone 1, Gen 55)
22:45:25:WU02:FS00:0x17:Unit: 0x00000052028c126651a63237402fbc59
22:45:25:WU02:FS00:0x17:CPU: 0x00000000000000000000000000000000
22:45:25:WU02:FS00:0x17:Machine: 0
22:45:25:WU02:FS00:0x17:Digital signatures verified
22:45:36:WU02:FS00:0x17:  Found a checkpoint file

Slot 01:

*********************** Log Started 2013-07-22T22:45:18Z ***********************
22:45:18:WU00:FS01:Starting
22:45:18:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Brentt/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/Core_a4.fah/FahCore_a4.exe -dir 00 -suffix 01 -version 703 -lifeline 1548 -checkpoint 30 -cpu 25 -np 1
22:45:19:WU00:FS01:Started FahCore on PID 2248
22:45:20:WU00:FS01:Core PID:3996
22:45:20:WU00:FS01:FahCore 0xa4 started
22:45:21:WU00:FS01:0xa4:
22:45:21:WU00:FS01:0xa4:*------------------------------*
22:45:21:WU00:FS01:0xa4:Folding@Home Gromacs GB Core
22:45:21:WU00:FS01:0xa4:Version 2.27 (Dec. 15, 2010)
22:45:21:WU00:FS01:0xa4:
22:45:21:WU00:FS01:0xa4:Preparing to commence simulation
22:45:21:WU00:FS01:0xa4:- Ensuring status. Please wait.
22:45:31:WU00:FS01:0xa4:- Looking at optimizations...
22:45:31:WU00:FS01:0xa4:- Working with standard loops on this execution.
22:45:31:WU00:FS01:0xa4:- Previous termination of core was improper.
22:45:31:WU00:FS01:0xa4:- Going to use standard loops.
22:45:31:WU00:FS01:0xa4:- Files status OK
22:45:31:WU00:FS01:0xa4:- Expanded 1117638 -> 3125912 (decompressed 279.6 percent)
22:45:31:WU00:FS01:0xa4:Called DecompressByteArray: compressed_data_size=1117638 data_size=3125912, decompressed_data_size=3125912 diff=0
22:45:31:WU00:FS01:0xa4:- Digital signature verified
22:45:31:WU00:FS01:0xa4:
22:45:31:WU00:FS01:0xa4:Project: 8703 (Run 507, Clone 0, Gen 5)
22:45:31:WU00:FS01:0xa4:
22:45:32:WU00:FS01:0xa4:Entering M.D.
22:45:38:WU00:FS01:0xa4:Using Gromacs checkpoints
22:45:39:WU00:FS01:0xa4:Mapping NT from 1 to 1 
22:45:46:WU00:FS01:0xa4:Resuming from checkpoint
22:45:47:WU00:FS01:0xa4:Verified 00/wudata_01.log
22:45:47:WU00:FS01:0xa4:Verified 00/wudata_01.trr
22:45:48:WU00:FS01:0xa4:Verified 00/wudata_01.xtc
22:45:48:WU00:FS01:0xa4:Verified 00/wudata_01.edr
22:45:54:WU00:FS01:0xa4:Completed 1255610 out of 1500000 steps  (83%)
01:56:46:WU00:FS01:0xa4:Completed 1260000 out of 1500000 steps  (84%)

Re: Progress Stuck?

Posted: Tue Jul 23, 2013 4:35 pm
by bruce
Does your computer enter a sleep or hibernate state while folding?

This has not been confirmed yet, but there are reports suggesting that if a computer hibernates/sleeps while processing a GPU assignment, the progress may be disrupted and the WU cannot be completed. Could this be what's happening to you?

Here's what we believe to be true: A sleep/hibernate will not disrupt a CPU-based WU. Pausing a GPU WU before hibernating/sleeping will allow processing to be resumed later.

Any additional information that you can provide will help us isolate this problem.

Re: Progress Stuck?

Posted: Tue Jul 23, 2013 7:16 pm
by gebraset
-Edit- Currently the CPU has backtracked to 85.70% and has updated itself in the log. The GPU is still stuck at 99.99%, and its log shows the same as what was posted previously. :(

I do not believe that the computer went into sleep or hibernation mode at all during these WUs. I set up this computer, as well as my wife's computer, to fold full time while plugged in. The folding does pause when the computer is taken off the charger, since that is an option within the program, but neither computer has gone to sleep.

They have restarted for updates, but I highly doubt this to be the issue.

Re: Progress Stuck?

Posted: Tue Jul 23, 2013 7:55 pm
by 7im
The 99% display on the GPU core is a cosmetic error. It will show 0% or 99% when it doesn't have an actual frame number to display. This is common at the start of a work unit, and at the end, or when the FAHClient is starting up and hasn't updated yet.

Also, please note there is a delay at the start of FAHCore_17 work units while the work unit environment is being set up. It can take anywhere from 2-7 minutes, depending on the speed of the CPU and what CPU resources are available. It takes longer if all the CPU cores are pegged while folding an SMP work unit.

Also, is this a normal frame time for your system? That seems like a really long time. Is there something else using up CPU resources, like a defrag or virus scan?

22:45:54:WU00:FS01:0xa4:Completed 1255610 out of 1500000 steps (83%)
01:56:46:WU00:FS01:0xa4:Completed 1260000 out of 1500000 steps (84%)

Re: Progress Stuck?

Posted: Wed Jul 24, 2013 12:54 am
by bruce
What's missing from this discussion is the configuration of your client. Please paste the first page of the log into your next post. I'm interested in the hardware that the client detects and the configuration of the slots, which are only partially described by the screenshot of FAHClient. Either open the log from the data directory and copy the part before the first WU starts, or, in the log panel, uncheck "Follow", click Refresh, and scroll all the way to the top.
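If copying by hand is awkward, the "part before the first WU starts" can also be cut out with a few lines of Python. This is only a rough sketch: the `log_header` name is made up for illustration, and the pattern for recognizing a WU line (`HH:MM:SS:WUnn:FSnn:`) is an assumption based on the log lines pasted earlier in this thread, not on any documented log format.

```python
import re

# Assumed shape of a work-unit line, taken from the logs in this thread,
# e.g. "22:45:21:WU02:FS00:Starting".
WU_LINE = re.compile(r"^\d\d:\d\d:\d\d:WU\d+:FS\d+:")


def log_header(lines):
    """Return every line that appears before the first work-unit line."""
    header = []
    for line in lines:
        if WU_LINE.match(line):
            break  # the first WU has started; the header is complete
        header.append(line.rstrip("\n"))
    return header


if __name__ == "__main__":
    sample = [
        "22:45:18:      Version: 7.3.6\n",
        "22:45:18:         CPUs: 4\n",
        "22:45:18:WU00:FS01:Starting\n",
        "22:45:21:WU02:FS00:Starting\n",
    ]
    print("\n".join(log_header(sample)))
```

To run it on a real log, pass an open file instead of the sample list, using whatever path your own data directory has.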

Re: Progress Stuck?

Posted: Wed Jul 24, 2013 1:00 am
by gebraset
It may be a cosmetic error, but the log still shows no progress on the GPU side of things. It has simply been running for almost 48 hours now with nothing in the log besides it ticking over to a new day. And yes, that frame time is pretty normal, as I am only folding on one core out of four, and at 25% usage. This is simply to keep temperatures down since I am also folding on the GPU.

*********************** Log Started 2013-07-22T22:45:18Z ***********************
22:45:18:************************* Folding@home Client *************************
22:45:18:      Website: http://folding.stanford.edu/
22:45:18:    Copyright: (c) 2009-2013 Stanford University
22:45:18:       Author: Joseph Coffland <[email protected]>
22:45:18:         Args: 
22:45:18:       Config: C:/Users/Brentt/AppData/Roaming/FAHClient/config.xml
22:45:18:******************************** Build ********************************
22:45:18:      Version: 7.3.6
22:45:18:         Date: Feb 18 2013
22:45:18:         Time: 15:25:17
22:45:18:      SVN Rev: 3923
22:45:18:       Branch: fah/trunk/client
22:45:18:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
22:45:18:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
22:45:18:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT /Qmkl
22:45:18:     Platform: win32 XP
22:45:18:         Bits: 32
22:45:18:         Mode: Release
22:45:18:******************************* System ********************************
22:45:18:          CPU: AMD A8-4500M APU with Radeon(tm) HD Graphics
22:45:18:       CPU ID: AuthenticAMD Family 21 Model 16 Stepping 1
22:45:18:         CPUs: 4
22:45:18:       Memory: 3.45GiB
22:45:18:  Free Memory: 2.36GiB
22:45:18:      Threads: WINDOWS_THREADS
22:45:18:  Has Battery: true
22:45:18:   On Battery: false
22:45:18:   UTC offset: -4
22:45:18:          PID: 1548
22:45:18:          CWD: C:/Users/Brentt/AppData/Roaming/FAHClient
22:45:18:           OS: Windows 8
22:45:18:      OS Arch: AMD64
22:45:18:         GPUs: 1
22:45:18:        GPU 0: ATI:5 Trinity [Radeon HD 7640G]
22:45:18:         CUDA: Not detected
22:45:18:Win32 Service: false
22:45:18:***********************************************************************
22:45:18:<config>
22:45:18:  <!-- Folding Core -->
22:45:18:  <checkpoint v='30'/>
22:45:18:  <cpu-usage v='25'/>
22:45:18:
22:45:18:  <!-- Folding Slot Configuration -->
22:45:18:  <power v='full'/>
22:45:18:
22:45:18:  <!-- Network -->
22:45:18:  <proxy v=':8080'/>
22:45:18:
22:45:18:  <!-- User Information -->
22:45:18:  <passkey v='********************************'/>
22:45:18:  <team v='12772'/>
22:45:18:  <user v='gebraset'/>
22:45:18:
22:45:18:  <!-- Folding Slots -->
22:45:18:  <slot id='0' type='GPU'>
22:45:18:    <client-type v='advanced'/>
22:45:18:  </slot>
22:45:18:  <slot id='1' type='CPU'>
22:45:18:    <cpus v='1'/>
22:45:18:  </slot>
22:45:18:</config>

Re: Progress Stuck?

Posted: Wed Jul 24, 2013 1:23 am
by 7im
The CPU usage at 25% might be bottlenecking the GPU processing. Set it to 100% for a couple hours the next time you try a core 17 WU.

Re: Progress Stuck?

Posted: Wed Jul 24, 2013 2:00 am
by gebraset
I set the CPU usage to 100%, and restarted my machine. It is back to stating that it found a checkpoint file, but that is all.

I'll give it until tomorrow morning or so I suppose, and if it still is stuck in the log, get rid of it and grab a new WU. Shame I miss out on 6000 points though.

Re: Progress Stuck?

Posted: Wed Jul 24, 2013 3:11 am
by Jesse_V
That "cosmetic issue" sounds like a bug to me.

Re: Progress Stuck?

Posted: Wed Jul 24, 2013 3:28 am
by P5-133XL
There is also a bug that seems to show up with too much overclock. The log will not update for a GPU slot; the %GPU usage will stay at 0% forever; and the slot will update in the advanced/web client until 99% and then stay there forever. Upon pausing and unpausing, or restarting the client, it will start at the last known checkpoint for the slot (typically 0%) and work fine. The higher the OC, the more often this will happen; the lower the OC, the less often.

Re: Progress Stuck?

Posted: Wed Jul 24, 2013 4:01 am
by bruce
7im wrote: Also, is this a normal frame time for your system? That seems like a really long time. Is there something else using up CPU resources, like a defrag or virus scan?

22:45:54:WU00:FS01:0xa4:Completed 1255610 out of 1500000 steps (83%)
01:56:46:WU00:FS01:0xa4:Completed 1260000 out of 1500000 steps (84%)

It should be noted that the CPU slot was set to 25% of one of the four CPUs (and perhaps not 24x7), which will very likely not make the deadline. With that fraction of your processing power, progress will necessarily be very, very slow. 0.3 frame in 3h11m works out to be about 44 days to complete that WU if you fold continuously at that rate. That's a pretty short time period to extrapolate from to a complete WU, so there's a huge uncertainty factor.

Project 8703 has a timeout of 26 days, at which time it will be assumed to be lost and will be reissued to someone else even though you will still be working on it.
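For anyone who wants to redo that extrapolation from their own log, here is a minimal Python sketch. It assumes the two "Completed ... steps" lines have the format shown above, that they are less than 24 hours apart (the log prints no date on each line), and that folding continues at a constant rate. The function names are made up for illustration. Using the exact step counts instead of the rounded 0.3 frame, it comes out at roughly 45 days, which is in line with the figure above and still well past the 26-day timeout.

```python
import re
from datetime import datetime, timedelta

# Assumed line format, copied from the log excerpts in this thread.
PROGRESS = re.compile(r"^(\d\d:\d\d:\d\d):.*Completed (\d+) out of (\d+) steps")


def parse_progress(line):
    """Return (time of day, steps completed, total steps) for one log line."""
    match = PROGRESS.match(line)
    if match is None:
        raise ValueError("not a progress line: " + line)
    stamp = datetime.strptime(match.group(1), "%H:%M:%S")
    return stamp, int(match.group(2)), int(match.group(3))


def estimate_total_days(first_line, second_line):
    """Extrapolate the time for a whole WU from two progress lines."""
    t1, done1, total = parse_progress(first_line)
    t2, done2, _ = parse_progress(second_line)
    elapsed = t2 - t1
    if elapsed <= timedelta(0):
        elapsed += timedelta(days=1)  # the log rolled past midnight
    steps_per_second = (done2 - done1) / elapsed.total_seconds()
    return total / steps_per_second / 86400  # 86400 seconds per day


if __name__ == "__main__":
    a = "22:45:54:WU00:FS01:0xa4:Completed 1255610 out of 1500000 steps (83%)"
    b = "01:56:46:WU00:FS01:0xa4:Completed 1260000 out of 1500000 steps (84%)"
    days = estimate_total_days(a, b)
    print(f"Estimated time for the whole WU: {days:.1f} days (timeout: 26 days)")
```

As noted above, two data points a few hours apart give only a rough estimate, so treat the result as an order of magnitude rather than a prediction.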

Re: Progress Stuck?

Posted: Wed Jul 24, 2013 4:39 am
by 7im
Jesse_V wrote: That "cosmetic issue" sounds like a bug to me.

Sounds to me like you already reported said issue. Although it usually has little impact because the symptom is so short lived most of the time.

Re: Progress Stuck?

Posted: Wed Jul 24, 2013 11:26 am
by gebraset
Well, setting it to 100% seemed to work! Sadly, bruce, you are right. It is too slow for me to actually complete the unit before it times out. Because of this, I suppose I will adjust my settings to get back on track for the next WU. Either way, changing the usage setting has resulted in a 4% change overnight, so it is helpful to know that it worked.

How would I go about deleting this WU so that I can retrieve a different one? I would rather not work on this WU until it times out, as that is simply a waste of electricity.

-Edit-

I will simply stop folding on the GPU in order to combat this issue, as well as temperature issues. Back to 100% for all cores on the CPU! Thank you all for the help on this somewhat daunting issue.

Re: Progress Stuck?

Posted: Wed Jul 24, 2013 4:00 pm
by bruce
FAH runs on a very tight schedule. The bonus points increase quite rapidly the quicker you return any WU, which indicates the project's strong desire for prompt returns of results. Also, when a WU expires, it gets no credit. These characteristics are quite unlike BOINC (at least for the projects I was familiar with a few years ago).

Some people do like to split their resources between FAH and BOINC. When they do, I recommend that instead of attempting to run both concurrently, you dedicate, say, 30 days to FAH and then 30 days to BOINC. That way FAH WUs are checked out and returned promptly. (You can use FAH's "Finish" function to complete the current assignment without downloading a new assignment.)