Project: 2665 (Run 2, Clone 430, Gen 46) stopping after 98%

Moderators: Site Moderators, FAHC Science Team

matte.2
Posts: 9
Joined: Sat Dec 08, 2007 5:56 pm

Post by matte.2 »

Hi,

I found my SMP client hung on the WU named in the title.

Part of the log file from client startup:

Code: Select all

--- Opening Log file [October 23 03:18:40 UTC] 


# Windows SMP Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.22 SMP Beta2

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Program Files\fah_smp_1
Executable: C:\Program Files\fah_smp_1\[email protected]
Arguments: -smp -config -forceasm -verbosity 9 

Warning:
 By using the -forceasm flag, you are overriding
 safeguards in the program. If you did not intend to
 do this, please restart the program without -forceasm.
 If work units are not completing fully (and particularly
 if your machine is overclocked), then please discontinue
 use of the flag.

[03:18:40] - Ask before connecting: No
[03:18:40] - User name: Duffelcoat-minion-1480 (Team 53338)
[03:18:40] - User ID: 763D5F8B1A3ADB10
[03:18:40] - Machine ID: 1
[03:18:40] 
[03:18:40] Configuring Folding@Home...


[03:18:46] - Ask before connecting: No
[03:18:46] - User name: Duffelcoat-minion-1480 (Team 53338)
[03:18:46] - User ID: 763D5F8B1A3ADB10
[03:18:46] - Machine ID: 1
[03:18:46] 
[03:18:46] Loaded queue successfully.
[03:18:46] 
[03:18:46] - Autosending finished units... [October 23 03:18:46 UTC]
[03:18:46] + Processing work unit

Part of the log file from unit startup:

Code: Select all

[01:40:59] + Received work.
[01:40:59] Trying to send all finished work units
[01:40:59] + No unsent completed units remaining.
[01:40:59] + Closed connections
[01:40:59] 
[01:40:59] + Processing work unit
[01:40:59] Work type a1 not eligible for variable processors
[01:40:59] Core required: FahCore_a1.exe
[01:40:59] Core found.
[01:40:59] Using generic mpiexec calls
[01:40:59] Working on queue slot 06 [October 29 01:40:59 UTC]
[01:40:59] + Working ...
[01:40:59] - Calling 'mpiexec -np 4 -channel auto -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 06 -checkpoint 10 -forceasm -verbose -lifeline 4504 -version 622'

[01:40:59] 
[01:40:59] *------------------------------*
[01:40:59] Folding@Home Gromacs SMP Core
[01:40:59] Version 1.74 (March 10, 2007)
[01:40:59] 
[01:40:59] Preparing to commence simulation
[01:40:59] - Ensuring status. Please wait.
[01:41:16] - Assembly optimizations manually forced on.
[01:41:16] - Not checking prior termination.
[01:41:36] - Expanded 4823966 -> 24810145 (decompressed 514.3 percent)
[01:41:36] - Starting from initial work packet
[01:41:36] 
[01:41:36] Project: 2665 (Run 2, Clone 430, Gen 46)
[01:41:36] 
[01:41:40] Assembly optimizations on if available.
[01:41:40] Entering M.D.
[01:41:46] Rejecting checkpoint
[01:41:48] cosylations
[01:41:48] Writing local files
[01:41:49] 
[01:41:49] Writing local files
[01:42:01] Extra SSE boost OK.
[01:42:01] Writing local files
[01:42:02] Completed 0 out of 250000 steps  (0 percent)
[01:52:02] Timered checkpoint triggered.
[02:02:03] Timered checkpoint triggered.
[02:02:55] Writing local files
[02:02:56] Completed 2500 out of 250000 steps  (1 percent)
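
The progress lines above are enough to estimate how long one percent of this unit takes. The following is a minimal sketch, not part of the client: `seconds_per_percent` is a hypothetical helper name, and it assumes the excerpt spans less than 24 hours, since the log timestamps carry no date.

```python
import re
from datetime import datetime

# Matches lines such as "[02:02:56] Completed 2500 out of 250000 steps  (1 percent)"
PROGRESS = re.compile(r"\[(\d\d:\d\d:\d\d)\] Completed (\d+) out of (\d+) steps")


def seconds_per_percent(log_lines):
    """Average seconds per 1% between the first and last progress line."""
    points = []
    for line in log_lines:
        m = PROGRESS.search(line)
        if m:
            stamp = datetime.strptime(m.group(1), "%H:%M:%S")
            points.append((stamp, int(m.group(2)), int(m.group(3))))
    if len(points) < 2:
        return None  # not enough data to measure an interval
    (t0, s0, total), (t1, s1, _) = points[0], points[-1]
    elapsed = (t1 - t0).total_seconds()
    if elapsed < 0:
        elapsed += 24 * 3600  # assumption: at most one midnight rollover
    percent_done = 100.0 * (s1 - s0) / total
    if percent_done <= 0:
        return None
    return elapsed / percent_done


sample = [
    "[01:42:02] Completed 0 out of 250000 steps  (0 percent)",
    "[01:52:02] Timered checkpoint triggered.",
    "[02:02:56] Completed 2500 out of 250000 steps  (1 percent)",
]
print(seconds_per_percent(sample))
```

For the three lines shown, the first percent took 20 minutes 54 seconds (1254 seconds), which extrapolates to roughly 35 hours for the whole unit.
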

Part of the log file from when the unit crashed:

Code: Select all

[11:53:42] Completed 242500 out of 250000 steps  (97 percent)
[12:03:43] Timered checkpoint triggered.
[12:13:43] Timered checkpoint triggered.
[12:14:47] Writing local files
[12:14:47] Completed 245000 out of 250000 steps  (98 percent)
[12:24:48] Timered checkpoint triggered.
[12:30:23] Warning:  long 1-4 interactions
[12:30:23] Gromacs cannot continue further.
[12:30:23] Going to send back what have done.
[12:30:23] logfile size: 193520
[12:30:23] - Writing 194056 bytes of core data to disk...
[12:30:23]   ... Done.
[12:30:23] - Failed to delete work/wudata_06.sas
[12:30:23] - Failed to delete work/wudata_06.goe
[12:30:23] Warning:  check for stray files
[12:32:23] 
[12:32:23] Folding@home Core Shutdown: EARLY_UNIT_END
[12:32:23] 
[12:32:23] Folding@home Core Shutdown: EARLY_UNIT_END
[12:32:28] CoreStatus = 7B (123)
[12:32:28] Client-core communications error: ERROR 0x7b
[12:32:28] This is a sign of more serious problems, shutting down.
[15:18:56] - Autosending finished units... [October 30 15:18:56 UTC]
[15:18:56] Trying to send all finished work units
[15:18:56] + No unsent completed units remaining.
[15:18:56] - Autosend completed
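
A log excerpt like the one above can be summarised mechanically: the last progress line, the shutdown reason reported by the core, and the `CoreStatus` value. This is only a sketch under the assumption that the line formats are exactly those shown in this thread; `summarize_unit` is a hypothetical helper, not a client feature.

```python
import re

PROGRESS = re.compile(r"Completed (\d+) out of (\d+) steps")
SHUTDOWN = re.compile(r"Folding@home Core Shutdown: (\w+)")
# The client prints the status as hex followed by decimal, e.g. "7B (123)".
CORE_STATUS = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)")


def summarize_unit(log_lines):
    """Collect the last progress, shutdown reason and core status seen."""
    summary = {"steps": None, "total": None, "shutdown": None, "core_status": None}
    for line in log_lines:
        m = PROGRESS.search(line)
        if m:
            summary["steps"] = int(m.group(1))
            summary["total"] = int(m.group(2))
            continue
        m = SHUTDOWN.search(line)
        if m:
            summary["shutdown"] = m.group(1)
            continue
        m = CORE_STATUS.search(line)
        if m:
            # Parse the hex form; it should agree with the decimal in parentheses.
            summary["core_status"] = int(m.group(1), 16)
    return summary


sample = [
    "[12:14:47] Completed 245000 out of 250000 steps  (98 percent)",
    "[12:30:23] Gromacs cannot continue further.",
    "[12:32:23] Folding@home Core Shutdown: EARLY_UNIT_END",
    "[12:32:28] CoreStatus = 7B (123)",
]
print(summarize_unit(sample))
```

On the sample lines this reports 245000 of 250000 steps, shutdown reason `EARLY_UNIT_END`, and core status 123 (hex 7B, matching `ERROR 0x7b` in the log).
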

Following the sticky post on returning partial results, I downloaded this build of QFIX:

Code: Select all

- Windows/x86 : qfix.exe (10.00 KB) 
  Compiled with : i586-mingw32msvc-gcc -Wall -DSYSTYPE=1 -s -O2 -o qfix.exe qfix.c 
  Compiled on : Debian GNU/Linux 4.0 "Etch" with gcc version 3.4.5 (mingw special) 
  Modified : Sat Nov 17 14:09:56 2007 

When I ran it, I got:

Code: Select all

C:\Program Files\fah_smp_1>qfix
entry 7, status 0, address 171.64.65.64:8080
entry 8, status 0, address 171.64.65.64:8080
entry 9, status 0, address 171.64.65.64:8080
entry 0, status 0, address 171.64.65.64:8080
entry 1, status 0, address 171.64.65.64:8080
entry 2, status 0, address 171.64.65.64:8080
entry 3, status 0, address 171.64.65.63:8080
entry 4, status 0, address 171.64.65.64:8080
entry 5, status 0, address 171.64.65.64:8080
entry 6, status 1, address 171.64.65.64:8080
  Found results <work\wuresults_06.dat>: proj 2665, run 2, clone 430, gen 46
   -- queue entry: proj 2665, run 2, clone 430, gen 46
   -- queue entry isn't empty
File is OK
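
The qfix output above can be filtered to pick out the queue slot that differs from the rest. The thread does not say what the status values mean, so the sketch below only lists entries whose status is non-zero; in this output that is entry 6, the slot matching `wuresults_06.dat`. `nonzero_status_entries` is a hypothetical helper name.

```python
import re

# Matches lines such as "entry 6, status 1, address 171.64.65.64:8080"
ENTRY = re.compile(r"entry (\d+), status (\d+), address (\S+)")


def nonzero_status_entries(qfix_output):
    """Return (entry, status, address) for every entry with a non-zero status."""
    found = []
    for m in ENTRY.finditer(qfix_output):
        entry, status, address = int(m.group(1)), int(m.group(2)), m.group(3)
        if status != 0:
            found.append((entry, status, address))
    return found


sample = """\
entry 5, status 0, address 171.64.65.64:8080
entry 6, status 1, address 171.64.65.64:8080
  Found results <work\\wuresults_06.dat>: proj 2665, run 2, clone 430, gen 46
File is OK
"""
print(nonzero_status_entries(sample))
```
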

So I tried to send all:

Code: Select all

 Directory of C:\Program Files\fah_smp_1

28/07/2008  16:58           422.400 [email protected]
               1 File(s)        422.400 bytes
               0 Dir(s)  12.395.061.248 bytes free

C:\Program Files\fah_smp_1>[email protected] -send all

Note: Please read the license agreement ([email protected] -license). F
urther
use of this software requires that you have read and accepted this agreement.



--- Opening Log file [October 30 17:09:27 UTC]


# Windows CPU Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.22 SMP Beta2

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Program Files\fah_smp_1
Executable: [email protected]
Arguments: -send all

[17:09:27] - Ask before connecting: No
[17:09:27] - User name: Duffelcoat-minion-1480 (Team 53338)
[17:09:27] - User ID: 763D5F8B1A3ADB10
[17:09:27] - Machine ID: 1
[17:09:27]
[17:09:27] Loaded queue successfully.
[17:09:27] Attempting to return result(s) to server...

Folding@Home Client Shutdown.

As the "Folding@Home Client Shutdown." message appeared instantly, I then tried to send just unit 6:

Code: Select all


C:\Program Files\fah_smp_1>[email protected] -send #6

Note: Please read the license agreement ([email protected] -license). F
urther
use of this software requires that you have read and accepted this agreement.



--- Opening Log file [October 30 17:09:50 UTC]


# Windows CPU Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.22 SMP Beta2

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Program Files\fah_smp_1
Executable: [email protected]
Arguments: -send #6

[17:09:50] - Ask before connecting: No
[17:09:50] - User name: Duffelcoat-minion-1480 (Team 53338)
[17:09:50] - User ID: 763D5F8B1A3ADB10
[17:09:50] - Machine ID: 1
[17:09:50]
[17:09:50] Loaded queue successfully.
[17:09:50] Attempting to return result(s) to server...
[17:09:50] Project: 2665 (Run 2, Clone 562, Gen 60)
[17:09:50] - Failed to send unit 00 to server

Folding@Home Client Shutdown.

So now I'm stuck. After more than a day's work the WU is at 98%, and I'd hate to lose it.
Has anyone got any idea how I can return the almost-finished unit?

thanks
Marc

Edit: any tips on how to complete it are welcome as well.
toTOW
Site Moderator
Posts: 6359
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Project: 2665 (Run 2, Clone 430, Gen 46) stopping after 98%

Post by toTOW »

Update your client to 6.23

There are 3 other reports for partial credit, and someone was able to complete it.

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
matte.2
Posts: 9
Joined: Sat Dec 08, 2007 5:56 pm

Re: Project: 2665 (Run 2, Clone 430, Gen 46) stopping after 98%

Post by matte.2 »

Thx toTOW
toTOW wrote: Update your client to 6.23
I only learned of its existence when posting this problem. Will try it (but not tonight, as it's getting late).
toTOW wrote: There are 3 other reports for partial credit, and someone was able to complete it.
Why was the unit redistributed if someone has already completed it?

marc

Edit: ...unless of course it was completed after the deadline?
toTOW
Site Moderator
Posts: 6359
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Project: 2665 (Run 2, Clone 430, Gen 46) stopping after 98%

Post by toTOW »

I don't know... that person didn't miss the preferred deadline.

It might be a check, to get two successes to confirm that the results are valid.
matte.2
Posts: 9
Joined: Sat Dec 08, 2007 5:56 pm

Re: Project: 2665 (Run 2, Clone 430, Gen 46) stopping after 98%

Post by matte.2 »

Thx for the suggestion, toTOW.
I installed 6.23, and restarting gave me this:

Code: Select all



--- Opening Log file [October 31 16:32:36 UTC] 


# Windows SMP Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.23 Beta R1

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Program Files\fah_smp_1
Executable: C:\Program Files\fah_smp_1\[email protected]
Arguments: -smp -config -forceasm -verbosity 9 

Warning:
 By using the -forceasm flag, you are overriding
 safeguards in the program. If you did not intend to
 do this, please restart the program without -forceasm.
 If work units are not completing fully (and particularly
 if your machine is overclocked), then please discontinue
 use of the flag.

[16:32:36] - Ask before connecting: No
[16:32:36] - User name: Duffelcoat-minion-1480 (Team 53338)
[16:32:36] - User ID: 763D5F8B1A3ADB10
[16:32:36] - Machine ID: 1
[16:32:36] 
[16:32:36] Configuring Folding@Home...


[16:32:44] - Ask before connecting: No
[16:32:44] - User name: Duffelcoat-minion-1480 (Team 53338)
[16:32:44] - User ID: 763D5F8B1A3ADB10
[16:32:44] - Machine ID: 1
[16:32:44] 
[16:32:44] Loaded queue successfully.
[16:32:44] 
[16:32:44] - Autosending finished units... [October 31 16:32:44 UTC]
[16:32:44] + Processing work unit
[16:32:44] Trying to send all finished work units
[16:32:44] Work type a1 not eligible for variable processors
[16:32:44] + No unsent completed units remaining.
[16:32:44] Core required: FahCore_a1.exe
[16:32:44] - Autosend completed
[16:32:44] Core found.
[16:32:44] Using generic mpiexec calls
[16:32:44] Working on queue slot 06 [October 31 16:32:44 UTC]
[16:32:44] + Working ...
[16:32:44] - Calling 'mpiexec -np 4 -channel auto -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 06 -checkpoint 10 -forceasm -verbose -lifeline 5736 -version 623'

[16:32:44] 
[16:32:44] *------------------------------*
[16:32:44] Folding@Home Gromacs SMP Core
[16:32:44] Version 1.74 (March 10, 2007)
[16:32:44] 
[16:32:44] Preparing to commence simulation
[16:32:44] - Ensuring status. Please wait.
[16:33:01] - Assembly optimizations manually forced on.
[16:33:01] - Not checking prior termination.
[16:33:01] 
[16:33:01] Folding@home Core Shutdown: MISSING_WORK_FILES
[16:33:01] Finalizing output
[16:35:04] CoreStatus = 1 (1)
[16:35:04] Sending work to server
[16:35:04] Project: 2665 (Run 2, Clone 430, Gen 46)


[16:35:04] + Attempting to send results [October 31 16:35:04 UTC]
[16:35:04] - Reading file work/wuresults_06.dat from core
[16:35:04]   (Read 194056 bytes from disk)
[16:35:04] Connecting to http://171.64.65.64:8080/
[16:35:10] Posted data.
[16:35:10] Initial: 0000; - Uploaded at ~31 kB/s
[16:35:10] - Averaged speed for that direction ~39 kB/s
[16:35:10] + Results successfully sent
[16:35:10] Thank you for your contribution to Folding@Home.
[16:35:30] - Warning: Could not delete all work unit files (6): Core returned invalid code
[16:35:30] Trying to send all finished work units
[16:35:30] + No unsent completed units remaining.
[16:35:30] - Preparing to get new work unit...
[16:35:30] + Attempting to get work packet
[16:35:30] - Will indicate memory of 3069 MB
[16:35:30] - Detect CPU. Vendor: GenuineIntel, Family: 6, Model: 15, Stepping: 11
[16:35:30] - Connecting to assignment server
[16:35:30] Connecting to http://assign.stanford.edu:8080/
[16:35:31] Posted data.
[16:35:31] Initial: 40AB; - Successful: assigned to (171.64.65.64).
[16:35:31] + News From Folding@Home: Welcome to Folding@Home
[16:35:31] Loaded queue successfully.
[16:35:31] Connecting to http://171.64.65.64:8080/
[16:35:34] Posted data.
[16:35:34] Initial: 0000; - Receiving payload (expected size: 2439680)
[16:35:53] - Downloaded at ~125 kB/s
[16:35:53] - Averaged speed for that direction ~163 kB/s
[16:35:53] + Received work.
[16:35:53] Trying to send all finished work units
[16:35:53] + No unsent completed units remaining.
[16:35:53] + Closed connections
[16:35:58] 
[16:35:58] + Processing work unit
[16:35:58] Work type a1 not eligible for variable processors
[16:35:58] Core required: FahCore_a1.exe
[16:35:58] Core found.
[16:35:58] Using generic mpiexec calls
[16:35:58] Working on queue slot 07 [October 31 16:35:58 UTC]
[16:35:58] + Working ...
[16:35:58] - Calling 'mpiexec -np 4 -channel auto -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 07 -checkpoint 10 -forceasm -verbose -lifeline 5736 -version 623'

[16:35:58] 
[16:35:58] *------------------------------*
[16:35:58] Folding@Home Gromacs SMP Core
[16:35:58] Version 1.74 (March 10, 2007)
[16:35:58] 
[16:35:58] Preparing to commence simulation
[16:35:58] - Ensuring status. Please wait.
[16:36:15] - Assembly optimizations manually forced on.
[16:36:15] - Not checking prior termination.
[16:36:21] - Expanded 2439168 -> 12879713 (decompressed 528.0 percent)
[16:36:21] - Starting from initial work packet
[16:36:21] 
[16:36:21] Project: 2653 (Run 16, Clone 13, Gen 88)
[16:36:21] 
[16:36:22] Assembly optimizations on if available.
[16:36:22] Entering M.D.

I don't understand the "missing work files" bit, but apparently something was sent, and I'm curious to see the credit.
Thanks again,
Marc
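
What "something was sent" amounts to can be cross-checked from the logs in this thread: the core wrote 194056 bytes of result data when the unit ended early, and the 6.23 client later read 194056 bytes from `work/wuresults_06.dat` before uploading. The sketch below extracts both figures so they can be compared; `result_sizes` is a hypothetical helper, and it assumes the two message formats are exactly as they appear above.

```python
import re

WROTE = re.compile(r"Writing (\d+) bytes of core data to disk")
READ = re.compile(r"\(Read (\d+) bytes from disk\)")


def result_sizes(log_lines):
    """Return (sizes written by the core, sizes read back by the client)."""
    written, read_back = [], []
    for line in log_lines:
        m = WROTE.search(line)
        if m:
            written.append(int(m.group(1)))
        m = READ.search(line)
        if m:
            read_back.append(int(m.group(1)))
    return written, read_back


sample = [
    "[12:30:23] - Writing 194056 bytes of core data to disk...",
    "[16:35:04] - Reading file work/wuresults_06.dat from core",
    "[16:35:04]   (Read 194056 bytes from disk)",
]
written, read_back = result_sizes(sample)
print(written == read_back)
```

Matching sizes suggest that the partial result saved at 98% is what was uploaded, although byte counts alone cannot confirm that the contents are identical.
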
User name: Duffelcoat-minion-1480 (Team 53338)
User ID: 763D5F8B1A3ADB10