There seems to be a lot of this with project 2652. I know it taxes the system, but identical failure points suggest, to me at least, that it's another bad WU. If that's correct, why are there so many on 2652?
[14:08:08] Completed 550000 out of 1000000 steps (55 percent)
[14:23:49] Writing local files
[14:23:49] Completed 560000 out of 1000000 steps (56 percent)
[14:39:29] Writing local files
[14:39:29] Completed 570000 out of 1000000 steps (57 percent)
[14:51:10] Warning: long 1-4 interactions
[14:51:10] Gromacs cannot continue further.
[14:51:10] Going to send back what have done.
[14:51:10] logfile size: 353037
[14:51:10] - Writing 353573 bytes of core data to disk...
[14:51:11] ... Done.
[14:51:11] - Failed to delete work/wudata_06.arc
[14:51:11] No C.P. to delete.
[14:51:11] - Failed to delete work/wudata_06.dyn
[14:51:11] - Failed to delete work/wudata_06.chk
[14:51:11] - Failed to delete work/wudata_06.sas
[14:51:11] - Failed to delete work/wudata_06.goe
[14:51:11] - Failed to delete work/wudata_06.xvg
[14:51:11] Warning: check for stray files
[14:51:11]
[14:51:11] Folding@home Core Shutdown: EARLY_UNIT_END
[14:51:11]
[14:51:11] Folding@home Core Shutdown: EARLY_UNIT_END
[14:51:17] CoreStatus = 7B (123)
[14:51:17] Client-core communications error: ERROR 0x7b
[14:51:17] Deleting current work unit & continuing...
[14:53:21] - Preparing to get new work unit...
[14:53:21] + Attempting to get work packet
[14:53:21] - Connecting to assignment server
[14:53:22] - Successful: assigned to (171.64.65.64).
[14:53:22] + News From Folding@Home: Welcome to Folding@Home
[14:53:22] Loaded queue successfully.
[14:53:27] + Closed connections
[14:53:32]
[14:53:32] + Processing work unit
[14:53:32] Core required: FahCore_a1.exe
[14:53:32] Core found.
[14:53:32] Working on Unit 07 [January 1 14:53:32]
[14:53:32] + Working ...
[14:53:33]
[14:53:33] *------------------------------*
[14:53:33] Folding@Home Gromacs SMP Core
[14:53:33] Version 1.74 (March 10, 2007)
[14:53:33]
[14:53:33] Preparing to commence simulation
[14:53:33] - Ensuring status. Please wait.
[14:53:33] Created dyn
[14:53:33] - Files status OK
[14:53:33] this execution.
[14:53:33] - Files status OK
[14:53:34] mpressed 507.5 percent)
[14:53:34] - Starting from initial work packet
[14:53:34]
[14:53:34] Project: 2652 (Run 0, Clone 430, Gen 44)
[14:53:34]
[14:53:34] : 2652 (Run 0, Clone 430, Gen 44)
[14:53:34]
[14:53:34] ble.
[14:53:34] Entering M.D.
[14:53:51] al work pa- Starting from initial work packet
[14:53:51]
[14:53:51] Project: 2652 (Run 0, Clone 430, Gen 44)
[14:53:51]
[14:53:51] Entering M.D.
[14:53:58] rotein
[14:53:58] Writing local files
[14:53:58] cal files
[14:53:58] boost OK.
[14:53:58] Writing local files
[14:53:58] Completed 0 out of 1000000 steps (0 percent)
[15:09:39] Writing local files
[15:09:39] Completed 10000 out of 1000000 steps (1 percent)
[15:28:27] Writing local files
[15:28:27] Completed 20000 out of 1000000 steps (2 percent)
[15:49:30] Writing local files
bruce wrote: Can anyone explain why the WU fails at the same point for user A but repeatedly fails at a different point for User B?
A fails at same point, and B fails at same "other" point?
bruce wrote: Can anyone explain why the WU fails at the same point for user A but repeatedly fails at a different point for User B?
7im wrote: A fails at same point, and B fails at same "other" point?
On the same step: some crap out 3 times at step 0; another user I know of and I both hung 3 times on step 3; there is another report of step 5. Each fails consistently 3 times on the same step, but not always at the same step number.
Qeldroma wrote: And yes, Bruce, they're all 7Bs.
I'm really disturbed about this whole 0x7b situation. It appears to be a catch-all category with multiple causes and I don't know any good way to isolate them in a way that allows them to be fixed. Some are certainly issues that could be handled by the software; some are not. Virtually none of them are reproducible on different hardware.
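For anyone who wants to compare failure points across machines, here is a rough Python sketch that pulls the work unit identity, the last completed step, and the core status out of a client log. It is not an official tool: the function name `summarize_failures` is made up for illustration, and the patterns are based only on the log lines quoted in this thread, so other client or core versions may need different patterns.

```python
import re

# Patterns are assumptions taken from the log lines quoted in this thread;
# other client/core versions may format these lines differently.
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
STEP_RE = re.compile(r"Completed (\d+) out of (\d+) steps")
STATUS_RE = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)")


def summarize_failures(log_text):
    """Return one record per 'CoreStatus' line found in the log text.

    Each record holds the most recent project tuple (project, run, clone, gen)
    seen before the status line, the last completed step, and the status code.
    Values are None when the log excerpt does not contain them.
    """
    records = []
    project = None
    last_step = None
    for line in log_text.splitlines():
        m = PROJECT_RE.search(line)
        if m:
            project = tuple(int(g) for g in m.groups())
            continue
        m = STEP_RE.search(line)
        if m:
            last_step = int(m.group(1))
            continue
        m = STATUS_RE.search(line)
        if m:
            records.append({
                "project": project,
                "last_step": last_step,
                "status_hex": m.group(1).upper(),
                "status_dec": int(m.group(2)),
            })
            # The next work unit starts fresh after a status line.
            project = None
            last_step = None
    return records
```

Feeding it the text of each user's log and lining up the `last_step` values per (project, run, clone, gen) would show whether a given WU really dies at the same step on one machine but at a different step on another. Note that the hex and decimal forms in the log are the same number: 0x7B is 123.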