Note: While it might appear that I am posting an inordinate number of problem WUs, please be advised that I have 22 machines running the SMP client and that the failure rate is about 1 in every 30 to 40 WUs. I do not consider this to be unreasonable.
My concern is that if I go out of town for a week or so, I may return to a situation that I would rather not have to address.
Code:
[13:49:46] Core required: FahCore_a1.exe
[13:49:46] Core found.
[13:49:46] Working on Unit 02 [August 31 13:49:46]
[13:49:46] + Working ...
[13:49:46] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 02 -checkpoint 15 -forceasm -verbose -lifeline 16137 -version 602'
[13:49:47]
[13:49:47] *------------------------------*
[13:49:47] Folding@Home Gromacs SMP Core
[13:49:47] Version 1.74 (November 27, 2006)
[13:49:47]
[13:49:47] Preparing to commence simulation
[13:49:47] - Ensuring status. Please wait.
[13:50:04] - Assembly optimizations manually forced on.
[13:50:04] - Not checking prior termination.
[13:50:05] - Expanded 2422545 -> 12896633 (decompressed 532.3 percent)
[13:50:05] - Starting from initial work packet
[13:50:05]
[13:50:05] Project: 2605 (Run 11, Clone 561, Gen 84)
[13:50:05]
[13:50:05] Assembly optimizations on if available.
[13:50:05] Entering M.D.
[13:50:11] Rejecting checkpoint
[13:50:12] Protein: Protein in POPCExtra SSE boost OK.
[13:50:12]
[13:50:12] Extra SSE boost OK.
[13:50:12] Writing local files
[13:50:12] Completed 0 out of 500000 steps (0 percent)
[14:05:13] Timered checkpoint triggered.
[17:17:51] Completed 500000 out of 500000 steps (100 percent)
[17:17:52] Writing final coordinates.
[17:17:52] Past main M.D. loop
[17:17:52] Will end MPI now
[17:18:52]
[17:18:52] Finished Work Unit:
[17:18:52] - Reading up to 3723552 from "work/wudata_02.arc": Read 3723552
[17:18:52] - Reading up to 1780676 from "work/wudata_02.xtc": Read 1780676
[17:18:52] goefile size: 0
[17:18:52] logfile size: 21735
[17:18:52] Leaving Run
[17:18:54] - Writing 5530363 bytes of core data to disk...
[17:18:54] ... Done.
[17:18:55] - Shutting down core
[17:18:55]
[17:18:55] Folding@home Core Shutdown: FINISHED_UNIT
Note: the client hung at this point
[17:44:17] ***** Got an Activate signal (2)<----User generated
[17:44:17] Killing all core threads
Folding@Home Client Shutdown.
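For unattended machines, a hang like the one in the log above (the last timestamped entry is the core's FINISHED_UNIT shutdown line, followed by no client activity) could in principle be spotted by a small watchdog. The following is only a minimal sketch under stated assumptions: the function names, the 10-minute idle threshold, and the idea of treating "FINISHED_UNIT followed by silence" as a hang are my own illustration, not part of the client. It relies only on the `[HH:MM:SS]` timestamp format visible in the log.

```python
import re
from datetime import timedelta

# Matches log lines of the form "[HH:MM:SS] message" as seen in the log above.
LINE_RE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]\s?(.*)$")

# Text of the core shutdown line, copied from the log above.
FINISHED_MARKER = "Folding@home Core Shutdown: FINISHED_UNIT"


def last_timestamped_entry(log_text):
    """Return (seconds_since_midnight, message) for the last timestamped line, or None."""
    last = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            hours, minutes, seconds, message = match.groups()
            stamp = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
            last = (stamp, message.strip())
    return last


def looks_hung(log_text, now_seconds, max_idle=timedelta(minutes=10)):
    """Hypothetical heuristic: True if the log ends at FINISHED_UNIT and has been idle too long.

    now_seconds is the current time of day in seconds since midnight, in the
    same clock as the log. The threshold of 10 minutes is an assumption.
    """
    entry = last_timestamped_entry(log_text)
    if entry is None:
        return False
    stamp, message = entry
    if FINISHED_MARKER not in message:
        return False
    # The log carries no date, so handle a wrap past midnight with modulo.
    idle = (now_seconds - stamp) % 86400
    return idle > max_idle.total_seconds()
```

With the log above truncated at the FINISHED_UNIT line and the clock at 17:44:17, the idle time is a little over 25 minutes, so `looks_hung` would report a hang. What to do next (alerting, or restarting the client) is deliberately left out, since that depends on the setup.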
I WAS able to recover and send this WU at around 18:00 UTC. :)
(Steps taken: deleted slot two, ran qfix, used -send 2, and the upload was acknowledged by the work server.)
I would appreciate a "lookup" in a few hours to confirm that all went well.
Thank you in advance!