
Project: 3062 (Run 4, Clone 77, Gen 11) Bad WU?

Posted: Wed Apr 02, 2008 2:47 pm
by road-runner
I guess I am going to delete it; this is the 4th time it has failed at the same point. This is on native Ubuntu with a Quad that has been stable for months and months.


[23:58:52] Project: 3062 (Run 4, Clone 77, Gen 11)
[23:58:52] 
[23:58:52] Assembly optimizations on if available.
[23:58:52] Entering M.D.
[23:58:58] Rejecting checkpoint
[23:58:58] Protein: p3062_lambda5_99sbExtra SSE boost OK.
[23:58:58] 
[23:58:58] Extra SSE boost OK.
[23:58:58] Writing local files
[23:58:58] Completed 0 out of 5000000 steps  (0 percent)
[00:10:18] Writing local files
[00:10:18] Completed 50000 out of 5000000 steps  (1 percent)
[00:21:36] Writing local files
[00:21:36] Completed 100000 out of 5000000 steps  (2 percent)

snip

[03:00:15] Completed 800000 out of 5000000 steps  (16 percent)
[03:11:34] Writing local files
[03:11:34] Completed 850000 out of 5000000 steps  (17 percent)
[03:19:00] Warning:  long 1-4 interactions
[03:19:04] CoreStatus = 0 (0)
[03:19:04] Client-core communications error: ERROR 0x0
[03:19:04] Deleting current work unit & continuing...
[03:23:25] - Preparing to get new work unit...
[03:23:25] + Attempting to get work packet
[03:23:25] - Connecting to assignment server
[03:23:26] - Successful: assigned to (171.64.65.63).
[03:23:26] + News From Folding@Home: Welcome to Folding@Home
[03:23:26] Loaded queue successfully.
[03:23:28] + Closed connections
[03:23:33] 
[03:23:33] + Processing work unit
[03:23:33] Core required: FahCore_a1.exe
[03:23:33] Core found.
[03:23:33] Working on Unit 08 [April 2 03:23:33]
[03:23:33] + Working ...
[03:23:33] 
[03:23:33] *------------------------------*
[03:23:33] Folding@Home Gromacs SMP Core
[03:23:33] Version 1.74 (November 27, 2006)
[03:23:33] 
[03:23:33] Preparing to commence simulation
[03:23:33] - Ensuring status. Please wait.
[03:23:50] - Assembly optimizations manually forced on.
[03:23:50] - Not checking prior termination.
[03:23:50] - Expanded 608008 -> 3255645 (decompressed 535.4 percent)
[03:23:50] - Starting from initial work packet
[03:23:50] 
[03:23:50] Project: 3062 (Run 4, Clone 77, Gen 11)
[03:23:50] 
[03:23:50] Assembly optimizations on if available.
[03:23:50] Entering M.D.
[03:23:56] Rejecting checkpoint
[03:23:57] Protein: p3062_lambda5_99sbExtra SSE boost OK.
[03:23:57] 
[03:23:57] Extra SSE boost OK.
[03:23:57] Writing local files
[03:23:57] Completed 0 out of 5000000 steps  (0 percent)
[03:35:18] Writing local files
[03:35:18] Completed 50000 out of 5000000 steps  (1 percent)
[03:46:42] Writing local files
[03:46:42] Completed 100000 out of 5000000 steps  (2 percent)

snip

[06:25:58] Completed 800000 out of 5000000 steps  (16 percent)
[06:37:21] Writing local files
[06:37:21] Completed 850000 out of 5000000 steps  (17 percent)
[06:44:46] Warning:  long 1-4 interactions
[06:44:50] CoreStatus = 1 (1)
[06:44:50] Client-core communications error: ERROR 0x1
[06:44:50] Deleting current work unit & continuing...
[06:49:12] - Preparing to get new work unit...
[06:49:12] + Attempting to get work packet
[06:49:12] - Connecting to assignment server
[06:49:12] - Successful: assigned to (171.64.65.63).
[06:49:12] + News From Folding@Home: Welcome to Folding@Home
[06:49:12] Loaded queue successfully.
[06:49:14] + Closed connections
[06:49:19] 
[06:49:19] + Processing work unit
[06:49:19] Core required: FahCore_a1.exe
[06:49:19] Core found.
[06:49:19] Working on Unit 09 [April 2 06:49:19]
[06:49:19] + Working ...
[06:49:19] 
[06:49:19] *------------------------------*
[06:49:19] Folding@Home Gromacs SMP Core
[06:49:19] Version 1.74 (November 27, 2006)
[06:49:19] 
[06:49:19] Preparing to commence simulation
[06:49:19] - Ensuring status. Please wait.
[06:49:36] - Assembly optimizations manually forced on.
[06:49:36] - Not checking prior termination.
[06:49:36] - Expanded 608008 -> 3255645 (decompressed 535.4 percent)
[06:49:36] - Starting from initial work packet
[06:49:36] 
[06:49:36] Project: 3062 (Run 4, Clone 77, Gen 11)
[06:49:36] 
[06:49:36] Assembly optimizations on if available.
[06:49:36] Entering M.D.
[06:49:43] Protein: p3062_lambda5_99sb
[06:49:43] Writing local files
[06:49:43] Extra SSE boost OK.
[06:49:43] 
[06:49:43] Extra SSE boost OK.
[06:49:43] Writing local files
[06:49:43] Completed 0 out of 5000000 steps  (0 percent)
[07:01:02] Writing local files
[07:01:02] Completed 50000 out of 5000000 steps  (1 percent)
[07:12:22] Writing local files
[07:12:22] Completed 100000 out of 5000000 steps  (2 percent)

snip

[09:50:45] Completed 800000 out of 5000000 steps  (16 percent)
[10:02:07] Writing local files
[10:02:07] Completed 850000 out of 5000000 steps  (17 percent)
[10:09:32] Warning:  long 1-4 interactions
[10:09:36] CoreStatus = 1 (1)
[10:09:36] Client-core communications error: ERROR 0x1
[10:09:36] - Attempting to download new core...
[10:09:36] + Downloading new core: FahCore_a1.exe
[10:09:37] + 10240 bytes downloaded
[10:09:37] + 20480 bytes downloaded

snip

[10:09:39] + 1484800 bytes downloaded
[10:09:39] + 1490945 bytes downloaded
[10:09:39] Verifying core Core_a1.fah...
[10:09:39] Signature is VALID
[10:09:39] 
[10:09:39] Trying to unzip core FahCore_a1.exe
[10:09:39] Decompressed FahCore_a1.exe (3625104 bytes) successfully
[10:09:39] + Core successfully engaged
[10:09:39] Deleting current work unit & continuing...
[10:14:00] - Preparing to get new work unit...
[10:14:00] + Attempting to get work packet
[10:14:00] - Connecting to assignment server
[10:14:01] - Successful: assigned to (171.64.65.63).
[10:14:01] + News From Folding@Home: Welcome to Folding@Home
[10:14:01] Loaded queue successfully.
[10:14:03] + Closed connections
[10:14:08] 
[10:14:08] + Processing work unit
[10:14:08] Core required: FahCore_a1.exe
[10:14:08] Core found.
[10:14:08] Working on Unit 00 [April 2 10:14:08]
[10:14:08] + Working ...
[10:14:08] 
[10:14:08] *------------------------------*
[10:14:08] Folding@Home Gromacs SMP Core
[10:14:08] Version 1.74 (November 27, 2006)
[10:14:08] 
[10:14:08] Preparing to commence simulation
[10:14:08] - Ensuring status. Please wait.
[10:14:25] - Assembly optimizations manually forced on.
[10:14:25] - Not checking prior termination.
[10:14:25] - Expanded 608008 -> 3255645 (decompressed 535.4 percent)
[10:14:25] - Starting from initial work packet
[10:14:25] 
[10:14:25] Project: 3062 (Run 4, Clone 77, Gen 11)
[10:14:25] 
[10:14:25] Assembly optimizations on if available.
[10:14:25] Entering M.D.
[10:14:31] Rejecting checkpoint
[10:14:32] Protein: p3062_lambda5_99sbExtra SSE boost OK.
[10:14:32] 
[10:14:32] Extra SSE boost OK.
[10:14:32] Writing local files
[10:14:32] Completed 0 out of 5000000 steps  (0 percent)
[10:25:55] Writing local files
[10:25:55] Completed 50000 out of 5000000 steps  (1 percent)

snip

[13:16:36] Completed 800000 out of 5000000 steps  (16 percent)
[13:27:57] Writing local files
[13:27:57] Completed 850000 out of 5000000 steps  (17 percent)
[13:35:23] Warning:  long 1-4 interactions
[13:35:27] CoreStatus = 1 (1)
[13:35:27] Client-core communications error: ERROR 0x1
[13:35:27] Deleting current work unit & continuing...
[13:39:48] - Preparing to get new work unit...
[13:39:48] + Attempting to get work packet
[13:39:48] - Connecting to assignment server
[13:39:49] - Successful: assigned to (171.64.65.63).
[13:39:49] + News From Folding@Home: Welcome to Folding@Home
[13:39:49] Loaded queue successfully.
[13:39:51] + Closed connections
[13:39:56] 
[13:39:56] + Processing work unit
[13:39:56] Core required: FahCore_a1.exe
[13:39:56] Core found.
[13:39:56] Working on Unit 01 [April 2 13:39:56]
[13:39:56] + Working ...
[13:39:56] 
[13:39:56] *------------------------------*
[13:39:56] Folding@Home Gromacs SMP Core
[13:39:56] Version 1.74 (November 27, 2006)
[13:39:56] 
[13:39:56] Preparing to commence simulation
[13:39:56] - Ensuring status. Please wait.
[13:40:13] - Assembly optimizations manually forced on.
[13:40:13] - Not checking prior termination.
[13:40:13] - Expanded 608008 -> 3255645 (decompressed 535.4 percent)
[13:40:13] - Starting from initial work packet
[13:40:13] 
[13:40:13] Project: 3062 (Run 4, Clone 77, Gen 11)
[13:40:13] 
[13:40:13] Assembly optimizations on if available.
[13:40:13] Entering M.D.
[13:40:19] Protein: p3062_lambda5_99sb
[13:40:20] Writing local files
[13:40:20] Extra SSE boost OK.
[13:40:20] 
[13:40:20] Extra SSE boost OK.
[13:40:20] Writing local files
[13:40:20] Completed 0 out of 5000000 steps  (0 percent)
[13:51:40] Writing local files
[13:51:40] Completed 50000 out of 5000000 steps  (1 percent)
[14:03:01] Writing local files
[14:03:01] Completed 100000 out of 5000000 steps  (2 percent)
[14:14:23] Writing local files
[14:14:23] Completed 150000 out of 5000000 steps  (3 percent)
[14:25:44] Writing local files
[14:25:44] Completed 200000 out of 5000000 steps  (4 percent)
[14:37:04] Writing local files
[14:37:04] Completed 250000 out of 5000000 steps  (5 percent)

Re: Project: 3062 (Run 4, Clone 77, Gen 11) Bad WU?

Posted: Wed Apr 02, 2008 4:20 pm
by 7im
When it fails in the exact same place, a bad WU is the most likely answer. Nobody has returned this WU yet, so that matches our guess. Thanks for the report.
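For anyone who wants to check the "same place every time" pattern without reading the whole log by eye, here is a rough Python sketch. It assumes the log lines look like the ones pasted above (`Project: ...`, `Completed N out of M steps`, `Client-core communications error: ERROR ...`); the function names `find_failures` and `same_place` are made up for this example and are not part of the client.

```python
import re

# Patterns taken from the log lines quoted in this thread.
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
STEP_RE = re.compile(r"Completed (\d+) out of (\d+) steps")
ERROR_RE = re.compile(r"Client-core communications error: ERROR (\S+)")

# Short excerpt of the log above, used as sample input.
SAMPLE = """\
[23:58:52] Project: 3062 (Run 4, Clone 77, Gen 11)
[03:11:34] Completed 850000 out of 5000000 steps  (17 percent)
[03:19:04] CoreStatus = 0 (0)
[03:19:04] Client-core communications error: ERROR 0x0
[03:23:50] Project: 3062 (Run 4, Clone 77, Gen 11)
[06:37:21] Completed 850000 out of 5000000 steps  (17 percent)
[06:44:50] CoreStatus = 1 (1)
[06:44:50] Client-core communications error: ERROR 0x1
"""


def find_failures(log_text):
    """Return (project, last_completed_step, error_code) for each core error."""
    failures = []
    project = None
    last_step = None
    for line in log_text.splitlines():
        m = PROJECT_RE.search(line)
        if m:
            # A new attempt starts here, so forget the previous progress.
            project = tuple(int(g) for g in m.groups())
            last_step = None
            continue
        m = STEP_RE.search(line)
        if m:
            last_step = int(m.group(1))
            continue
        m = ERROR_RE.search(line)
        if m:
            failures.append((project, last_step, m.group(1)))
    return failures


def same_place(failures):
    """True if there are at least two failures, all on the same WU and step."""
    return len(failures) > 1 and len({(p, s) for p, s, _ in failures}) == 1


if __name__ == "__main__":
    for project, step, code in find_failures(SAMPLE):
        print(project, step, code)
    print("same place:", same_place(find_failures(SAMPLE)))
```

To run it against a real log, read the file into a string and pass it to `find_failures`. Repeated failures on one WU at one step count on stable hardware point at the WU rather than the machine, which is the reasoning in the reply above.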

Re: Project: 3062 (Run 4, Clone 77, Gen 11) Bad WU?

Posted: Fri Apr 11, 2008 7:26 pm
by DocJonz
One of my machines baulked at a 3062 (Run 4, Clone 77, Gen 12) WU last night, the same WU described in the post but with Gen +1. It fell over 3x at 8% before pulling down a new WU. It was running on a [email protected]/2GB RAM/Ubuntu 7.10.


[23:44:08] 
[23:44:08] *------------------------------*
[23:44:08] Folding@Home Gromacs SMP Core
[23:44:08] Version 1.74 (November 27, 2006)
[23:44:08] 
[23:44:08] Preparing to commence simulation
[23:44:08] - Ensuring status. Please wait.
[23:44:26] - Assembly optimizations manually forced on.
[23:44:26] - Not checking prior termination.
[23:44:26] - Expanded 283831 -> 1508541 (decompressed 531.4 percent)
[23:44:26] - Starting from initial work packet
[23:44:26] 
[23:44:26] Project: 3043 (Run 6, Clone 39, Gen 76)
[23:44:26] 
[23:44:26] Assembly optimizations on if available.
[23:44:26] Entering M.D.
[23:44:32] Protein: 9684 p3029_SProtein: 9684 p3029_SMP-emsv-03Extra SSE boost OK.
[23:44:32] 
[23:44:32] Extra SSE boost OK.
[23:44:32] Writing local files
[23:44:32] Completed 0 out of 10000000 steps  (0 percent)
[23:59:33] Timered checkpoint triggered.
[23:59:42] Writing local files
[23:59:42] Completed 100000 out of 10000000 steps  (1 percent)
[00:14:43] Timered checkpoint triggered.
[00:14:53] Writing local files
[00:14:53] Completed 200000 out of 10000000 steps  (2 percent)
[00:29:54] Timered checkpoint triggered.
[00:30:06] Writing local files
[00:30:06] Completed 300000 out of 10000000 steps  (3 percent)
[00:45:07] Timered checkpoint triggered.
[00:45:10] Writing local files
[00:45:10] Completed 400000 out of 10000000 steps  (4 percent)
[01:00:11] Timered checkpoint triggered.
[01:00:16] Writing local files
[01:00:16] Completed 500000 out of 10000000 steps  (5 percent)
[01:15:17] Timered checkpoint triggered.
[01:15:26] Writing local files
[01:15:26] Completed 600000 out of 10000000 steps  (6 percent)
[01:30:25] Writing local files
[01:30:25] Completed 700000 out of 10000000 steps  (7 percent)
[01:45:26] Timered checkpoint triggered.
[01:45:32] Writing local files
[01:45:32] Completed 800000 out of 10000000 steps  (8 percent)
[01:53:48] Warning:  long 1-4 interactions
[01:53:53] CoreStatus = 0 (0)
[01:53:53] Client-core communications error: ERROR 0x0