Re: Project: 2170 (Run 46, Clone 234, Gen 2) hung

Posted: Fri Jun 06, 2008 3:09 am
by bruce
anko1 wrote: Well, I was wondering if I already had results for this WU. It is the same one I was working on that hung after 100% at "Estimating time frame..." Should I qfix it, and if I do, should I continue to rerun the same project to see if it hangs again? Or should I just continue with the second run of this WU?
There's nothing to "qfix".

For qfix to help, there needs to be a file such as wuresults_03.dat that was "deleted" from the queue by an EUE when it was completed but is still present in \work. Either the WU hung before wuresults_* was created, or the file was actually deleted, not just removed from the queue.

Re: Project: 2170 (Run 46, Clone 234, Gen 2) hung

Posted: Fri Jun 06, 2008 3:40 am
by anko1
Oh, thanks. I thought that maybe I could qfix the wuresults_05 file. Is that for the redo of the WU?

Re: Project: 2170 (Run 46, Clone 234, Gen 2) hung

Posted: Fri Jun 06, 2008 4:22 am
by bruce
anko1 wrote: Oh, thanks. I thought that maybe I could qfix the wuresults_05 file. Is that for the redo of the WU?
Yes, qfix can repair any queue entry, but only under certain conditions.

When the status READY is seen in queueinfo, that means the WU in that slot still needs to be completed.

After a WU reaches 100%, the core shuts down and several of the files are combined into a single file called wuresults_*, ready to be uploaded. The other files are deleted, and the status is changed from READY to (I don't remember the next word).

With a typical EUE, the partial results are moved into wuresults_*, the queue is updated, and the upload proceeds.

With certain really bad EUEs, the WU is deleted rather than uploaded. In other conditions, something hangs and the client must be killed.

Though it's never supposed to happen, in those situations it's possible for the queue to say the entry is EMPTY even though the wuresults file was actually created. That's something qfix can repair.

It's also possible to have a wuresults file while the queue entry is not EMPTY. Qfix will not fix that condition, but you can: delete the WU (which changes READY to EMPTY), and then qfix can repair it.

Qfix can fix wuresults_05 provided that (A) the wuresults file has been stored by the FahCore, (B) the queue entry says it is EMPTY, and (C) the PRCG (Project, Run, Clone, Gen) contained in the wuresults file matches the PRCG that used to be in the corresponding queue entry.
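To make the three conditions concrete, here is a small decision function. It is a hypothetical Python model, not qfix's actual code: the `QueueEntry` and `ResultsFile` types, their fields, and the `delete_wu` helper are illustrative assumptions, and nothing here reads a real queue.dat or wuresults file.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# (project, run, clone, gen), e.g. (2170, 46, 234, 2)
PRCG = Tuple[int, int, int, int]


@dataclass
class QueueEntry:
    """Hypothetical view of one queue slot as reported by queueinfo."""
    status: str  # e.g. "READY" or "EMPTY"
    prcg: PRCG   # the PRCG last recorded in this slot


@dataclass
class ResultsFile:
    """Hypothetical view of a wuresults_NN.dat file left in the work folder."""
    slot: int    # the NN suffix, e.g. 5 for wuresults_05.dat (informational only)
    prcg: PRCG   # the PRCG stored inside the results file


def delete_wu(entry: QueueEntry) -> None:
    """Models deleting the WU: a READY slot becomes EMPTY."""
    if entry.status == "READY":
        entry.status = "EMPTY"


def qfix_can_repair(entry: QueueEntry, results: Optional[ResultsFile]) -> bool:
    # (A) the FahCore must actually have written the wuresults file
    if results is None:
        return False
    # (B) the queue entry must say EMPTY
    if entry.status != "EMPTY":
        return False
    # (C) the PRCG in the results must match the one the slot used to hold
    return results.prcg == entry.prcg


if __name__ == "__main__":
    prcg = (2170, 46, 234, 2)
    entry = QueueEntry(status="READY", prcg=prcg)
    results = ResultsFile(slot=5, prcg=prcg)
    print(qfix_can_repair(entry, results))  # False: slot is still READY
    delete_wu(entry)
    print(qfix_can_repair(entry, results))  # True: all three conditions hold
    print(qfix_can_repair(entry, None))     # False: no wuresults file was written
```

If the WU hung before wuresults_05 was written, condition (A) fails and deleting the WU does not help; deleting a READY entry only makes sense when the results file already exists.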

Re: Project: 2170 (Run 46, Clone 234, Gen 2) hung

Posted: Fri Jun 06, 2008 4:30 am
by anko1
Thank you so much for your very patient and detailed reply. I guess I'll just proceed with the unit.

Thanks again.

Re: Project: 2170 (Run 46, Clone 234, Gen 2) hung

Posted: Mon Jun 23, 2008 8:03 pm
by anko1
I reran the project using the -forceasm flag, and it hung again:

# Windows Graphical Edition ###################################################
###############################################################################

                       Folding@Home Client Version 5.03

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Folding@Home
Arguments: -local -verbosity 9 -forceasm 

Warning:
 By using the -forceasm flag, you are overriding
 safeguards in the program. If you did not intend to
 do this, please restart the program without -forceasm.
 If work units are not completing fully (and particularly
 if your machine is overclocked), then please discontinue
 use of the flag.

[05:12:24] - Ask before connecting: No
[05:12:24] - User name: anko1 (Team 47815)
[05:12:24] - User ID: 14991F842ED3B1A8
[05:12:24] - Machine ID: 3
[05:12:24] 
[05:12:24] Loaded queue successfully.
[05:12:24] Initialization complete
[05:12:24] + Benchmarking ...
[05:12:28] The benchmark result is 4640
[05:12:28] 
[05:12:28] + Processing work unit
[05:12:28] - Autosending finished units...
[05:12:28] Trying to send all finished work units
[05:12:28] + No unsent completed units remaining.
[05:12:28] - Autosend completed
[05:12:28] Core required: FahCore_82.exe
[05:12:28] Core found.
[05:12:28] Working on Unit 05 [May 30 05:12:28]
[05:12:28] + Working ...
[05:12:28] - Calling 'FahCore_82.exe -dir work/ -suffix 05 -checkpoint 15 -forceasm -verbose -lifeline 2988 -version 503'

[05:12:28] 
[05:12:28] *------------------------------*
[05:12:28] Folding@Home PMD Core
[05:12:28] Version 1.03 (September 7, 2005)
[05:12:28] 
[05:12:28] Preparing to commence simulation
[05:12:28] - Assembly optimizations manually forced on.
[05:12:28] - Not checking prior termination.
[05:12:29] - Expanded 92947 -> 599777 (decompressed 645.2 percent)
[05:12:29] 
[05:12:29] Project: 2170 (Run 46, Clone 234, Gen 2)
[05:12:29] 
[05:12:29] Assembly optimizations on if available.
[05:12:29] Entering M.D.
[05:12:40] Protein: p2170_lambda_obc_300K
[05:12:40] 
[05:12:40] Completed 3350 out of 500000 steps  (0)
[05:27:07] Writing local files
[05:27:07] Completed 5000 out of 500000 steps  (1)
[05:27:34] Writing checkpoint files
[05:42:46] Writing checkpoint files
[05:57:49] Writing checkpoint files
[06:11:16] Writing local files

{snip}

[22:06:33] Completed 500000 out of 500000 steps  (100)
[22:06:33] Writing checkpoint files
[22:06:37] + Writing 'sec_per_frame = 377.142853' to config
[22:06:37] + Working ...+ New frame time estimate; Working...
[22:06:47] + New frame time estimate; Working...
[22:06:52] + New frame time estimate; Working...
[22:06:57] + New frame time estimate; Working...
[22:07:02] + New frame time estimate; Working...
[22:07:07] + New frame time estimate; Working...
[22:07:12] + New frame time estimate; Working...
[22:07:17] + New frame time estimate; Working...
[22:07:22] + New frame time estimate; Working...
[22:07:27] + New frame time estimate; Working...
[22:07:32] + New frame time estimate; Working...
[22:07:33] 
[22:07:33] Finished Work Unit:
[22:07:33] Leaving Run
[22:07:36] - Writing 817008 bytes of core data to disk...
[22:07:36] Done: 816496 -> 216748 (compressed to 26.5 percent)
[22:07:36]   ... Done.
[22:07:36] - Shutting down core
[22:07:37] + New frame time estimate; Working...
[22:07:42] + New frame time estimate; Working...
[22:07:47] + New frame time estimate; Working...
[22:07:52] + New frame time estimate; Working...
[22:07:57] + New frame time estimate; Working...
[22:08:02] + New frame time estimate; Working...
[22:08:07] + New frame time estimate; Working...
[22:08:12] + New frame time estimate; Working...
[22:08:17] + New frame time estimate; Working...
[22:08:22] + New frame time estimate; Working...
[22:08:27] + New frame time estimate; Working...
[22:08:32] + New frame time estimate; Working...
[22:08:37] + New frame time estimate; Working...
[22:08:42] + New frame time estimate; Working...
[22:08:47] + New frame time estimate; Working...
[22:08:52] + New frame time estimate; Working...
[22:08:57] + New frame time estimate; Working...
[22:09:02] + New frame time estimate; Working...
[22:09:07] + New frame time estimate; Working...
[22:09:12] + New frame time estimate; Working...
[22:09:17] + New frame time estimate; Working...
[22:09:22] + New frame time estimate; Working...

And so on for about 2 1/2 days (I was out of town), until something shut down the computer (a Windows restart?). When I returned and started the graphical client again, it did the same thing: it started the WU from scratch. I'm planning (hoping) to stop the unit just short of 100%, restart it, and get it finished and sent. If that doesn't work, and unless there are other suggestions, I am going to delete the unit and move on.

Re: Project: 2170 (Run 46, Clone 234, Gen 2) hung

Posted: Wed Jun 25, 2008 8:25 pm
by anko1
I am happy [!!!!!] to report that shutting down at 99% and then restarting did the trick [finally !!!!!], and Project: 2170 (Run 46, Clone 234, Gen 2) has gone on its merry way.