Project: 2170 (Run 46, Clone 234, Gen 2) hung

anko1
Posts: 438
Joined: Mon Dec 03, 2007 1:31 am
Hardware configuration: Old Faithful CPU: Windows Graphical 5.03; Intel Pentium 4 Processor 540 (3.2GHz) HT; Windows XP
Big Red: Windows SMP Console 6.29; Windows GPU console 6.20r1; Intel Q9450 2.66G; ASUS P5Q 775 P45; [BFG 9800GTX+ old graphics card] NVidia GeForce 8800 GTX [as of 5/9/09]; Windows XP Pro SP3
Lenovo Think Pad: Windows 6.29 w/ SMP; Windows GPU Console 6.20r1 systray; Intel QX9300; NVIDIA Quadro FX-3700M; Windows XP Professional
Location: SF Peninsula

Post by anko1 »

Hi, all.

I was running 2170 (Run 46, Clone 234, Gen 2) on my graphical client (Windows XP) and it hung at "new time frame estimate working" for a really long time, so I quit the application using the icon. Well, the whole log disappeared and didn't go into FAHlog Prev (which is why some of my info is a little vague). Anyway, when I restarted the client, hoping that it would send, it kept trying to do the same work unit, but using standard loops because the prior termination was improper [but it wasn't!!! I swear I used the icon!!! ;-)]. I've just stopped the client. Since I have a WU results file for that project, I'm planning on trying qfix. Any other suggestions or words of wisdom? If it helps, here's the entry from the queue:

CURRENT QUEUE:
00 EMPTY
01 EMPTY
02 EMPTY
03 EMPTY
04 EMPTY
05 *ACTIVE "Folding@Home" (82) 171.65.103.160:8080 May 17 04:27 | July 22 04:27
06 EMPTY
07 EMPTY
08 EMPTY
09 EMPTY

Thanks very much,

Angela
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: Project: 2170 (Run 46, Clone 234, Gen 2) hung

Post by 7im »

You could add the -forceasm switch to the shortcut that launches the client and avoid the standard loops, even when the client behaves badly. Not sure what caused the original hang. You probably need to let it run, see if it happens again, and collect more info about what the computer and the client are doing when it hangs.

Post by anko1 »

So I should go ahead and rerun it? Should I use qfix to send the original work?
anandhanju
Posts: 522
Joined: Mon Dec 03, 2007 4:33 am
Location: Australia

Post by anandhanju »

If you're game for a little bit of experimentation, you could

a) Take a backup of your FAH directory.
b) Disable your internet connection temporarily on that system and start the client.
c) If it attempts to send the result, good. You can now enable the internet connection and send the result on its way.
d) If it starts working from a checkpoint, you can let it chug along while you re-enable the connection.
e) If it starts from 0%, you have nothing to lose. Try qfixing it. Restart the client and enable the connection. Whatever happens now is the only alternative.

Or you could wait until someone less sleepy than me thinks of a simple way to get around this :wink:
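
For step (a), a minimal sketch of the backup in Python, assuming the client has been stopped first and that the client directory is the launch directory shown in the log. The function name `backup_fah_dir` and the timestamped folder naming are illustrative assumptions, not part of any Folding@Home tool.

```python
import shutil
import time
from pathlib import Path


def backup_fah_dir(fah_dir, backup_root):
    """Copy the whole client directory into a new timestamped folder.

    Both arguments are paths chosen by the user (hypothetical here).
    Returns the path of the backup that was created.
    """
    src = Path(fah_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"client directory not found: {src}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}-backup-{stamp}"
    # copytree refuses to write into an existing folder, so an earlier
    # backup is never silently overwritten.
    shutil.copytree(src, dest)
    return dest
```

Something like `backup_fah_dir(r"C:\Folding@Home", r"C:\FAH-backups")` before trying steps (b) through (e) would let you restore queue.dat and the work folder if qfix makes things worse.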

Post by anko1 »

The client doesn't recognize that the unit is there. Any other suggestions? Thanks for taking the time to answer.

--- Opening Log file [May 20 14:01:38]

# Windows Graphical Edition ###################################################
###############################################################################

                       Folding@Home Client Version 5.03

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Folding@Home
Arguments: -local -verbosity 9 

[14:01:38] - Ask before connecting: No
[14:01:38] - User name: anko1 (Team 47815)
[14:01:38] - User ID: 14991F842ED3B1A8
[14:01:38] - Machine ID: 3
[14:01:38] 
[14:01:38] Loaded queue successfully.
[14:01:38] Initialization complete
[14:01:38] + Benchmarking ...
[14:01:41] The benchmark result is 4896
[14:01:41] 
[14:01:41] + Processing work unit
[14:01:41] - Autosending finished units...
[14:01:41] Trying to send all finished work units
[14:01:41] + No unsent completed units remaining.
[14:01:41] - Autosend completed
[14:01:41] Core required: FahCore_82.exe
[14:01:41] Core found.
[14:01:41] Working on Unit 05 [May 20 14:01:41]
[14:01:41] + Working ...
[14:01:41] - Calling 'FahCore_82.exe -dir work/ -suffix 05 -checkpoint 15 -verbose -lifeline 404 -version 503'

[14:01:41] 
[14:01:41] *------------------------------*
[14:01:41] Folding@Home PMD Core
[14:01:41] Version 1.03 (September 7, 2005)
[14:01:41] 
[14:01:41] Preparing to commence simulation
[14:01:41] - Ensuring status. Please wait.
[14:01:58] - Looking at optimizations...
[14:01:58] - Working with standard loops on this execution.
[14:01:58] - Previous termination of core was improper.
[14:01:58] - Files status OK
[14:01:59] - Expanded 92947 -> 599777 (decompressed 645.2 percent)
[14:01:59] 
[14:01:59] Project: 2170 (Run 46, Clone 234, Gen 2)
[14:01:59] 
[14:01:59] Entering M.D.
[14:02:06] Protein: p2170_lambda_obc_300K
[14:02:06] 
[14:02:06] Completed 0 out of 500000 steps  (0)
[14:05:22] Printing Queue Information
CURRENT QUEUE: 
00  EMPTY    
01  EMPTY    
02  EMPTY    
03  EMPTY    
04  EMPTY    
05 *ACTIVE    "Folding@Home" (82) 171.65.103.160:8080  May 17 04:27 | July 22 04:27
06  EMPTY    
07  EMPTY    
08  EMPTY    
09  EMPTY    
[14:10:06] ***** Got a SIGTERM signal (2)

Folding@Home Client Shutdown.

Post by anandhanju »

I'm out of ideas. If I were you, I'd just let it run and see if it hangs again.

Post by anko1 »

Thanks for trying!
codysluder
Posts: 1024
Joined: Sun Dec 02, 2007 12:43 pm

Post by codysluder »

anko1 wrote:The client doesn't recognize that the unit is there. Any other suggestions?

CURRENT QUEUE: 
00  EMPTY    
01  EMPTY    
02  EMPTY    
03  EMPTY    
04  EMPTY    
05 *ACTIVE    "Folding@Home" (82) 171.65.103.160:8080  May 17 04:27 | July 22 04:27
06  EMPTY    
07  EMPTY    
08  EMPTY    
09  EMPTY    

Look in the work folder for WURESULTS_04.dat or WURESULTS_03.dat. If either is there, stop the client and run qfix from the CLI.

Post by anko1 »

I have results from 05. Would that be the one I'm looking for?

Thanks for the help.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Post by bruce »

The queue says that 05 is active. If that WU isn't finished yet, you can expect there to be quite a few files with *_05* in their name. Normally, when a WU finishes, the important data is collected into a file called wuresults_0*.dat (which will be uploaded) and most of the other files are deleted. Then the status of the WU is changed from active to ready-to-upload.

When a WU has an error, this process may be disrupted. It's difficult to know whether wuresults_*.dat was created before the disruption or the disruption prevented it from being created. Qfix will look for wuresults* files and in some cases is able to correct for all or part of the disruption.

Since WU 05 is active, I assumed that the error you reported earlier was WU 04 or perhaps 03.
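
To make that check concrete, here is a rough Python sketch that scans the work folder for results files and reports any that belong to a slot other than the active one. It only mirrors the rule of thumb used in this thread (a results file for a slot the queue does not list as active is worth handing to qfix); it is not a description of what qfix itself does. The function names and the assumption that results files are named `wuresults_NN.dat` (matched case-insensitively) are mine.

```python
import re
from pathlib import Path

# Assumed naming: wuresults_NN.dat, where NN is the two-digit queue slot.
RESULT_RE = re.compile(r"^wuresults_(\d{2})\.dat$", re.IGNORECASE)


def find_result_slots(work_dir):
    """Return {slot_number: path} for every results file in work_dir."""
    found = {}
    for entry in Path(work_dir).iterdir():
        m = RESULT_RE.match(entry.name)
        if m and entry.is_file():
            found[int(m.group(1))] = entry
    return found


def qfix_candidates(work_dir, active_slot):
    """Slots that have a results file but are not the active slot."""
    return sorted(s for s in find_result_slots(work_dir) if s != active_slot)
```

With slot 05 active, `qfix_candidates("work", active_slot=5)` returning an empty list would mean there is no orphaned results file from slot 03 or 04 to recover.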

Post by anko1 »

Sorry, guess I wasn't clear. WU5 is the one that I stopped when it hung up at "new time frame...." Then when I restarted, hoping that it would finish up and send, it began the same WU from the start, so I have a results file for it (I presume generated on July 22). So what do you suggest? Qfix it and then run the unit to see if it hangs again? Or just let the unit proceed and replace whatever is currently in Results_05?

Post by bruce »

I'm confused.

Post a list of the contents of "work" including the date-time together with the queueinfo output taken at the same time.
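
If a plain directory listing is awkward to copy from Explorer, a small Python sketch like the one below produces the list with modification times and sizes. The function name `list_work_dir` and the output layout are assumptions for illustration; any listing that shows the date-time of each file would do.

```python
import time
from pathlib import Path


def list_work_dir(work_dir):
    """Return one 'date time  size  name' line per entry, sorted by name."""
    lines = []
    for entry in sorted(Path(work_dir).iterdir(), key=lambda p: p.name.lower()):
        st = entry.stat()
        # Local modification time, which is what matters when comparing
        # against the timestamps in the client log.
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(st.st_mtime))
        lines.append(f"{stamp}  {st.st_size:>10}  {entry.name}")
    return lines
```

Printing `"\n".join(list_work_dir(r"C:\Folding@Home\work"))` right after the client prints its queue information gives the two snapshots taken at the same moment.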

Post by anko1 »

Thanks for trying to help me resolve this, Bruce. Here are the contents of my work folder:
core82.sta
current.xyz
current.xyz_temp
logfile_02
logfile_03
logfile_05
logfile_05-2170restart [a file I saved]
wudata_05 [dat file]
wudata_05 [INC file]
wudata_05 [MSInfo document]
wudata_05.dyn
wudata_05.eng
wudata_05.inp
wudata_05.out
wudata_05.top
wudata_05.trj
wudata_05CP.arc
wuinfo_05
wuresults_05

and here's the current queue:

--- Opening Log file [June 5 05:28:58] 


# Windows Graphical Edition ###################################################
###############################################################################

                       Folding@Home Client Version 5.03

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Folding@Home
Arguments: -local -verbosity 9 -forceasm 

Warning:
 By using the -forceasm flag, you are overriding
 safeguards in the program. If you did not intend to
 do this, please restart the program without -forceasm.
 If work units are not completing fully (and particularly
 if your machine is overclocked), then please discontinue
 use of the flag.

[05:28:58] - Ask before connecting: No
[05:28:58] - User name: anko1 (Team 47815)
[05:28:58] - User ID: 14991F842ED3B1A8
[05:28:58] - Machine ID: 3
[05:28:58] 
[05:28:58] Loaded queue successfully.
[05:28:58] Initialization complete
[05:28:58] + Benchmarking ...
[05:29:02] The benchmark result is 4596
[05:29:02] 
[05:29:02] + Processing work unit
[05:29:02] - Autosending finished units...
[05:29:02] Trying to send all finished work units
[05:29:02] + No unsent completed units remaining.
[05:29:02] - Autosend completed
[05:29:02] Core required: FahCore_82.exe
[05:29:02] Core found.
[05:29:02] Working on Unit 05 [June 5 05:29:02]
[05:29:02] + Working ...
[05:29:02] - Calling 'FahCore_82.exe -dir work/ -suffix 05 -checkpoint 15 -forceasm -verbose -lifeline 2232 -version 503'

[05:29:02] 
[05:29:02] *------------------------------*
[05:29:02] Folding@Home PMD Core
[05:29:02] Version 1.03 (September 7, 2005)
[05:29:02] 
[05:29:02] Preparing to commence simulation
[05:29:02] - Assembly optimizations manually forced on.
[05:29:02] - Not checking prior termination.
[05:29:03] - Expanded 92947 -> 599777 (decompressed 645.2 percent)
[05:29:03] 
[05:29:03] Project: 2170 (Run 46, Clone 234, Gen 2)
[05:29:03] 
[05:29:03] Assembly optimizations on if available.
[05:29:03] Entering M.D.
[05:30:09] Protein: p2170_lambda_obc_300K
[05:30:09] 
[05:30:09] Completed 57000 out of 500000 steps  (11)
[05:37:00] Printing Queue Information
CURRENT QUEUE: 
00  EMPTY    
01  EMPTY    
02  EMPTY    
03  EMPTY    
04  EMPTY    
05 *ACTIVE    "Folding@Home" (82) 171.65.103.160:8080  May 17 04:27 | July 22 04:27
06  EMPTY    
07  EMPTY    
08  EMPTY    
09  EMPTY    

I see that the second date in the active line is a July date, and not an earlier date. I hadn't looked closely enough :oops: , and thought that was the date the unit first finished and hung.

Thanks again!!

Angela

Post by bruce »

Everything looks normal to me.
anko1 wrote:[05:29:02] + Processing work unit
[05:29:02] - Autosending finished units...
[05:29:02] Trying to send all finished work units
[05:29:02] + No unsent completed units remaining.
[05:29:02] - Autosend completed
[05:29:02] Core required: FahCore_82.exe
[05:29:02] Core found.
[05:29:02] Working on Unit 05 [June 5 05:29:02] <---- currently working on unit 05
[05:29:02] + Working ...
<snip>
[05:29:03] Project: 2170 (Run 46, Clone 234, Gen 2)
<snip>
[05:30:09] Completed 57000 out of 500000 steps (11) <--- and 11% has already been finished.

[05:37:00] Printing Queue Information
CURRENT QUEUE:
00 EMPTY
01 EMPTY
02 EMPTY
03 EMPTY
04 EMPTY
05 *ACTIVE "Folding@Home" (82) 171.65.103.160:8080 May 17 04:27 | July 22 04:27 <----Active WU downloaded 17May is due 22 July
06 EMPTY
07 EMPTY
08 EMPTY
09 EMPTY <---- . . . and nothing else is in queue except the one you're working on.
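
The same reading of the log can be sketched in Python. This is only an illustration that assumes the queue and progress lines look exactly like the ones pasted in this thread; the regular expressions and the function names `parse_queue_line` and `parse_progress` are my own and are not taken from the client or from qfix.

```python
import re

# Slot, optional '*' marker, status, then (for non-empty slots) the
# project name, core number, server, download date and due date.
QUEUE_RE = re.compile(
    r'^(?P<slot>\d{2})\s+(?P<active>\*)?(?P<status>[A-Z]+)'
    r'(?:\s+"(?P<name>[^"]*)"\s+\((?P<core>\d+)\)\s+(?P<server>\S+)'
    r'\s+(?P<issued>.+?)\s*\|\s*(?P<due>.+?))?\s*$'
)
PROGRESS_RE = re.compile(r"Completed (\d+) out of (\d+) steps")


def parse_queue_line(line):
    """Parse one queue line into a dict, or return None if it does not match."""
    m = QUEUE_RE.match(line.strip())
    if m is None:
        return None
    core = m.group("core")
    return {
        "slot": int(m.group("slot")),
        "active": m.group("active") is not None,
        "status": m.group("status"),
        "core": int(core) if core else None,
        "server": m.group("server"),
        "issued": m.group("issued"),  # first date: when the WU was downloaded
        "due": m.group("due"),        # second date: the deadline
    }


def parse_progress(line):
    """Return (steps_done, steps_total, whole_percent) from a 'Completed' line."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    done, total = int(m.group(1)), int(m.group(2))
    return done, total, done * 100 // total
```

For the lines above, the slot 05 entry parses with `issued` as "May 17 04:27" and `due` as "July 22 04:27", and 57000 out of 500000 steps works out to 11 percent, matching the number the core prints in parentheses.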

Post by anko1 »

Well, I was wondering if I had results already for this WU. It is the same one I was working on that hung up after 100% at "Estimating time frame..." Should I qfix it, and if I do, should I continue to rerun the same project to see if it hangs again? Or should I just continue with the second run of this WU?