I started seeing this same issue yesterday on 2 machines. I restarted folding, and both began reprocessing the same unit from the beginning. The unit was not downloaded again; the client reprocessed the copy already on disk. Today those 2 machines were stuck in exactly the same way, and 2 more machines had joined them. I started over in a new directory, and the client downloaded version 2.04 of the core (I had been using 2.01). I am hoping the new units will process normally. All 4 machines have been running SMP folding for many months without incident.
Affected units:
Project: 2669 (Run 11, Clone 8, Gen 108)
Project: 2669 (Run 15, Clone 148, Gen 104)
Project: 2677 (Run 36, Clone 86, Gen 4)
Project: 2669 (Run 10, Clone 136, Gen 41)
Log from the restart:
--- Opening Log file [April 8 05:11:04]
# SMP Client ##################################################################
###############################################################################
Folding@Home Client Version 6.02
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: /home/dmearns/FAH
Executable: /home/dmearns/FAH/fah6
Arguments: -smp -verbosity 9 -forceasm
Warning:
By using the -forceasm flag, you are overriding
safeguards in the program. If you did not intend to
do this, please restart the program without -forceasm.
If work units are not completing fully (and particularly
if your machine is overclocked), then please discontinue
use of the flag.
[05:11:04] - Ask before connecting: No
[05:11:04] - User name: chiana (Team 13149)
[05:11:04] - User ID: 2B758B140504A7C3
[05:11:04] - Machine ID: 1
[05:11:04]
[05:11:04] Loaded queue successfully.
[05:11:04] - Autosending finished units...
[05:11:04] Trying to send all finished work units
[05:11:04] + No unsent completed units remaining.
[05:11:04] - Autosend completed
[05:11:04]
[05:11:04] + Processing work unit
[05:11:04] Core required: FahCore_a2.exe
[05:11:04] Core found.
[05:11:04] Working on Unit 00 [April 8 05:11:04]
[05:11:04] + Working ...
[05:11:04] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 00 -checkpoint 15 -forceasm -verbose -lifeline 31156 -version 602'
[05:11:05]
[05:11:05] *------------------------------*
[05:11:05] Folding@Home Gromacs SMP Core
[05:11:05] Version 2.01 (Wed Aug 13 13:11:25 PDT 2008)
[05:11:05]
[05:11:05] Preparing to commence simulation
[05:11:05] - Ensuring status. Please wait.
[05:11:14] - Assembly optimizations manually forced on.
[05:11:14] - Not checking prior termination.
[05:11:14] Need version 206
[05:11:14] Error: Work unit read from disk is invalid
[05:11:16] - Expanded 4838141 -> 23979009 (decompressed 495.6 percent)
[05:11:16] Called DecompressByteArray: compressed_data_size=4838141 data_size=23979009, decompressed_data_size=23979009 diff=0
[05:11:16] - Digital signature verified
[05:11:16]
[05:11:16] Project: 2669 (Run 10, Clone 136, Gen 41)
[05:11:16]
[05:11:16] Assembly optimizations on if available.
[05:11:16] Entering M.D.
[05:20:51] Completed 2509 out of 249999 steps (1%)
...
[20:57:38] Completed 247509 out of 249999 steps (99%)
[21:07:08] Completed 249999 out of 249999 steps (100%)
[21:08:10]
[21:08:10] Finished Work Unit:
[21:08:25] - Reading up to 17602080 from "work/wudata_00.trr": Read 17602080
[21:08:25] trr file hash check passed.
[21:08:25] - Reading up to 4414924 from "work/wudata_00.xtc": Read 4414924
[21:08:25] xtc file hash check passed.
[21:08:25] edr file hash check passed.
[21:08:25] logfile size: 179443
[21:08:25] Leaving Run
[21:08:25] - Writing 22423711 bytes of core data to disk...
[21:08:25] ... Done.
[21:08:25] - Shutting down core
[23:11:04] - Autosending finished units...
[23:11:04] Trying to send all finished work units
[23:11:04] + No unsent completed units remaining.
[23:11:04] - Autosend completed
[05:11:04] - Autosending finished units...
[05:11:04] Trying to send all finished work units
[05:11:04] + No unsent completed units remaining.
[05:11:04] - Autosend completed
[11:11:04] - Autosending finished units...
[11:11:04] Trying to send all finished work units
[11:11:04] + No unsent completed units remaining.
[11:11:04] - Autosend completed
[15:11:30] ***** Got a SIGTERM signal (15)
[15:11:30] Killing all core threads
Folding@Home Client Shutdown.
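For anyone checking several machines for the same symptom, here is a rough Python sketch that scans a client log for the lines that stand out above: the "Need version" message, the "Work unit read from disk is invalid" error, and autosend passes that find nothing to send after a unit has finished. The function name `scan_log`, the summary keys, and the `SAMPLE` list are made up for illustration; the matched strings are copied from the log above. It only reports what it sees and does not try to explain what the client means by those messages.

```python
import re

# Patterns copied from lines in the log above.
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
VERSION_RE = re.compile(r"Need version (\d+)")


def scan_log(lines):
    """Summarize suspicious lines in a client log (hypothetical helper)."""
    summary = {
        "project": None,             # (project, run, clone, gen) of the last unit seen
        "need_version": None,        # number from the "Need version" line, if any
        "invalid_wu": False,         # "Work unit read from disk is invalid" seen
        "finished": False,           # "Finished Work Unit" seen
        "empty_autosend_after_finish": 0,  # autosend passes with nothing to send
    }
    for line in lines:
        match = PROJECT_RE.search(line)
        if match:
            summary["project"] = tuple(int(g) for g in match.groups())
        match = VERSION_RE.search(line)
        if match:
            summary["need_version"] = int(match.group(1))
        if "Work unit read from disk is invalid" in line:
            summary["invalid_wu"] = True
        if "Finished Work Unit" in line:
            summary["finished"] = True
        elif summary["finished"] and "No unsent completed units remaining" in line:
            summary["empty_autosend_after_finish"] += 1
    return summary


# A few lines taken from the log above, used as sample input.
SAMPLE = [
    "[05:11:04] + No unsent completed units remaining.",
    "[05:11:14] Need version 206",
    "[05:11:14] Error: Work unit read from disk is invalid",
    "[05:11:16] Project: 2669 (Run 10, Clone 136, Gen 41)",
    "[21:08:10] Finished Work Unit:",
    "[23:11:04] + No unsent completed units remaining.",
    "[05:11:04] + No unsent completed units remaining.",
]

if __name__ == "__main__":
    for key, value in scan_log(SAMPLE).items():
        print(f"{key}: {value}")
```

To run it against a real log, pass an open file instead of `SAMPLE`, for example `scan_log(open("FAHlog.txt"))`, where the file name is whatever your client writes its log to.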