bad Work Units 6503 (3, 189, 41) and 6511 (0, 94, 11)
Project: 6503 (Run 3, Clone 189, Gen 41)
Client-core communications error: ERROR 0x0
and
Project: 6511 (Run 0, Clone 94, Gen 11)
Client-core communications error: ERROR 0x0
I've deleted 'em and restarted, but of course it keeps sending them to me to try again.
(two clients running on a dual-core processor)
Is it my imagination, or do there seem to be more bad WUs lately? I've run for several years without (AFAIK) previously encountering a bad WU.
Re: bad Work Units 6503 (3, 189, 41) and 6511 (0, 94, 11)
No data in the DB for Project: 6503 (Run 3, Clone 189, Gen 41) and Project: 6511 (Run 0, Clone 94, Gen 11) yet.
Re: bad Work Units 6503 (3, 189, 41) and 6511 (0, 94, 11)
The wiki says that ERROR 0x0 is an unknown error, so it's possible that the WU is bad or that you have some sort of hardware issue. Have you run diagnostics recently?
You didn't post FAHlog, but I'm guessing that the client deleted the WU rather than uploading a partial result. How far into the processing was it before it got the error? Please report what happens on the retry, too.
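For anyone who wants to answer the "how far into the processing" question without scrolling through a long FAHlog, here is a minimal Python sketch. It assumes the v6 console client log layout that appears in the logs posted in this thread; the function name `failure_points` and the short `SAMPLE` text are made up for illustration and are not part of the FAH client.

```python
import re

# Patterns assume the v6 console client log layout quoted in this thread.
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
PROGRESS_RE = re.compile(r"Completed (\d+) out of (\d+) steps\s+\((\d+)%\)")
ERROR_RE = re.compile(r"Client-core communications error: ERROR (0x[0-9a-fA-F]+)")


def failure_points(log_text):
    """Return one dict per core error: project tuple, last percent seen, error code."""
    failures = []
    project = None
    percent = None
    for line in log_text.splitlines():
        m = PROJECT_RE.search(line)
        if m:
            project = tuple(int(g) for g in m.groups())
            percent = None  # a new work unit starts, so progress starts over
            continue
        m = PROGRESS_RE.search(line)
        if m:
            percent = int(m.group(3))
            continue
        m = ERROR_RE.search(line)
        if m:
            failures.append(
                {"project": project, "percent": percent, "error": m.group(1)}
            )
    return failures


# Hypothetical excerpt in the same layout as the logs in this thread.
SAMPLE = """\
[12:58:49] Project: 6511 (Run 0, Clone 94, Gen 11)
[16:15:09] Completed 65000 out of 250000 steps (26%)
[16:20:01] CoreStatus = 0 (0)
[16:20:01] Client-core communications error: ERROR 0x0
"""

if __name__ == "__main__":
    for failure in failure_points(SAMPLE):
        print(failure)
```

Running it over the FAHlog of each client after a retry makes it easy to see whether the same Project/Run/Clone/Gen dies at the same percentage every time, which would point at the WU rather than at the machine.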
Re: bad Work Units 6503 (3, 189, 41) and 6511 (0, 94, 11)
toTow said:
No data in the DB for Project: 6503 (Run 3, Clone 189, Gen 41) and Project: 6511 (Run 0, Clone 94, Gen 11) yet.
I'm not sure what he means, exactly: does he mean that no one has successfully returned one of those WUs yet, or does he mean there aren't any such WUs yet? Or something else?
I haven't run any diagnostics lately, no. I can take it down and run memtest86+ for a while. But since everything else runs fine (and I usually have uptimes of 30-60 days whenever I get a kernel update causing me to reboot) I'd think it's probably not some hardware issue. But it could be: stranger things have happened.
Added code tags. ~sorto'
Here's one of the log files (CPU1):
Code:
--- Opening Log file [July 13 12:58:41]
# Linux Console Edition #######################################################
###############################################################################
Folding@Home Client Version 6.02
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: /home/folding/foldingathome/CPU1
Executable: /home/folding/foldingathome/CPU1/fah6
Arguments: -verbosity 9
[12:58:41] - Ask before connecting: No
[12:58:41] - User name: fredex (Team 48721)
[12:58:41] - User ID: 61F9905C1CB2ABB5
[12:58:41] - Machine ID: 1
[12:58:41]
[12:58:41] Loaded queue successfully.
[12:58:41] - Preparing to get new work unit...
[12:58:41] + Attempting to get work packet
[12:58:41] - Detect CPU. Vendor: AuthenticAMD, Family: 15, Model: 4, Stepping: 2
[12:58:41] - Connecting to assignment server
[12:58:41] Connecting to http://assign.stanford.edu:8080/
[12:58:41] - Autosending finished units...
[12:58:41] Trying to send all finished work units
[12:58:41] + No unsent completed units remaining.
[12:58:41] - Autosend completed
[12:58:42] Posted data.
[12:58:42] Initial: 40AB; - Successful: assigned to (171.64.65.62).
[12:58:42] + News From Folding@Home: Welcome to Folding@Home
[12:58:42] Loaded queue successfully.
[12:58:42] Connecting to http://171.64.65.62:8080/
[12:58:43] Posted data.
[12:58:43] Initial: 0000; - Receiving payload (expected size: 751267)
[12:58:49] - Downloaded at ~122 kB/s
[12:58:49] - Averaged speed for that direction ~315 kB/s
[12:58:49] + Received work.
[12:58:49] + Closed connections
[12:58:49]
[12:58:49] + Processing work unit
[12:58:49] Core required: FahCore_78.exe
[12:58:49] Core found.
[12:58:49] Working on Unit 04 [July 13 12:58:49]
[12:58:49] + Working ...
[12:58:49] - Calling './FahCore_78.exe -dir work/ -suffix 04 -checkpoint 15 -verbose -lifeline 6108 -version 602'
[12:58:49]
[12:58:49] *------------------------------*
[12:58:49] Folding@Home Gromacs Core
[12:58:49] Version 1.90 (March 8, 2006)
[12:58:49]
[12:58:49] Preparing to commence simulation
[12:58:49] - Looking at optimizations...
[12:58:49] - Created dyn
[12:58:49] - Files status OK
[12:58:49] - Expanded 750755 -> 3750157 (decompressed 499.5 percent)
[12:58:49] - Starting from initial work packet
[12:58:49]
[12:58:49] Project: 6511 (Run 0, Clone 94, Gen 11)
[12:58:49]
[12:58:49] Assembly optimizations on if available.
[12:58:49] Entering M.D.
[12:58:56] Protein: UBIQUITIN MODEL250 in water
[12:58:56]
[12:58:56] Writing local files
[12:58:56] Extra SSE boost OK.
[12:58:56] Writing local files
[12:58:56] Completed 0 out of 250000 steps (0%)
[13:06:29] Writing local files
[13:06:29] Completed 2500 out of 250000 steps (1%)
[13:14:03] Writing local files
[13:14:03] Completed 5000 out of 250000 steps (2%)
[13:21:36] Writing local files
[13:21:36] Completed 7500 out of 250000 steps (3%)
[13:29:08] Writing local files
[13:29:08] Completed 10000 out of 250000 steps (4%)
[13:36:39] Writing local files
[13:36:39] Completed 12500 out of 250000 steps (5%)
[13:44:12] Writing local files
[13:44:12] Completed 15000 out of 250000 steps (6%)
[13:51:45] Writing local files
[13:51:45] Completed 17500 out of 250000 steps (7%)
[13:59:15] Writing local files
[13:59:15] Completed 20000 out of 250000 steps (8%)
[14:06:51] Writing local files
[14:06:51] Completed 22500 out of 250000 steps (9%)
[14:14:24] Writing local files
[14:14:24] Completed 25000 out of 250000 steps (10%)
[14:21:56] Writing local files
[14:21:56] Completed 27500 out of 250000 steps (11%)
[14:29:28] Writing local files
[14:29:28] Completed 30000 out of 250000 steps (12%)
[14:37:00] Writing local files
[14:37:00] Completed 32500 out of 250000 steps (13%)
[14:44:33] Writing local files
[14:44:33] Completed 35000 out of 250000 steps (14%)
[14:52:05] Writing local files
[14:52:05] Completed 37500 out of 250000 steps (15%)
[14:59:39] Writing local files
[14:59:39] Completed 40000 out of 250000 steps (16%)
[15:07:14] Writing local files
[15:07:14] Completed 42500 out of 250000 steps (17%)
[15:14:46] Writing local files
[15:14:46] Completed 45000 out of 250000 steps (18%)
[15:22:18] Writing local files
[15:22:18] Completed 47500 out of 250000 steps (19%)
[15:29:50] Writing local files
[15:29:50] Completed 50000 out of 250000 steps (20%)
[15:37:22] Writing local files
[15:37:22] Completed 52500 out of 250000 steps (21%)
[15:44:56] Writing local files
[15:44:56] Completed 55000 out of 250000 steps (22%)
[15:52:28] Writing local files
[15:52:28] Completed 57500 out of 250000 steps (23%)
[16:00:01] Writing local files
[16:00:01] Completed 60000 out of 250000 steps (24%)
[16:07:35] Writing local files
[16:07:35] Completed 62500 out of 250000 steps (25%)
[16:15:09] Writing local files
[16:15:09] Completed 65000 out of 250000 steps (26%)
[16:20:01] CoreStatus = 0 (0)
[16:20:01] Client-core communications error: ERROR 0x0
[16:20:01] Deleting current work unit & continuing...
[16:20:18] Trying to send all finished work units
[16:20:18] + No unsent completed units remaining.
[16:20:18] - Preparing to get new work unit...
[16:20:18] + Attempting to get work packet
[16:20:18] - Connecting to assignment server
[16:20:18] Connecting to http://assign.stanford.edu:8080/
[16:20:19] Posted data.
[16:20:19] Initial: 40AB; - Successful: assigned to (171.64.65.111).
[16:20:19] + News From Folding@Home: Welcome to Folding@Home
[16:20:19] Loaded queue successfully.
[16:20:19] Connecting to http://171.64.65.111:8080/
[16:20:20] Posted data.
[16:20:20] Initial: 0000; - Receiving payload (expected size: 464525)
[16:20:22] - Downloaded at ~226 kB/s
[16:20:22] - Averaged speed for that direction ~298 kB/s
[16:20:22] + Received work.
[16:20:22] + Closed connections
[16:20:27]
[16:20:27] + Processing work unit
[16:20:27] Core required: FahCore_78.exe
[16:20:27] Core found.
[16:20:27] Working on Unit 05 [July 13 16:20:27]
[16:20:27] + Working ...
[16:20:27] - Calling './FahCore_78.exe -dir work/ -suffix 05 -checkpoint 15 -verbose -lifeline 6108 -version 602'
[16:20:27]
[16:20:27] *------------------------------*
[16:20:27] Folding@Home Gromacs Core
[16:20:27] Version 1.90 (March 8, 2006)
[16:20:27]
[16:20:27] Preparing to commence simulation
[16:20:27] - Looking at optimizations...
[16:20:27] - Created dyn
[16:20:27] - Files status OK
[16:20:27] - Expanded 464013 -> 2244013 (decompressed 483.6 percent)
[16:20:27] - Starting from initial work packet
[16:20:27]
[16:20:27] Project: 6316 (Run 43, Clone 1, Gen 72)
[16:20:27]
[16:20:27] Assembly optimizations on if available.
[16:20:27] Entering M.D.
[16:20:33] Protein: p6316_sh3_with_ALA_frags
[16:20:33]
[16:20:33] Writing local files
[16:20:33] Extra SSE boost OK.
[16:20:33] Writing local files
[16:20:33] Completed 0 out of 500000 steps (0%)
[16:30:00] Writing local files
[16:30:00] Completed 5000 out of 500000 steps (1%)
[16:39:26] Writing local files
[16:39:26] Completed 10000 out of 500000 steps (2%)
[16:48:52] Writing local files
[16:48:52] Completed 15000 out of 500000 steps (3%)
[16:58:18] Writing local files
[16:58:18] Completed 20000 out of 500000 steps (4%)
[17:07:45] Writing local files
[17:07:45] Completed 25000 out of 500000 steps (5%)
[17:17:12] Writing local files
[17:17:12] Completed 30000 out of 500000 steps (6%)
[17:26:40] Writing local files
[17:26:40] Completed 35000 out of 500000 steps (7%)
[17:36:07] Writing local files
[17:36:07] Completed 40000 out of 500000 steps (8%)
[17:45:33] Writing local files
[17:45:33] Completed 45000 out of 500000 steps (9%)
[17:54:59] Writing local files
[17:54:59] Completed 50000 out of 500000 steps (10%)
[18:04:26] Writing local files
[18:04:26] Completed 55000 out of 500000 steps (11%)
[18:13:54] Writing local files
[18:13:54] Completed 60000 out of 500000 steps (12%)
[18:23:21] Writing local files
[18:23:21] Completed 65000 out of 500000 steps (13%)
[18:32:46] Writing local files
[18:32:46] Completed 70000 out of 500000 steps (14%)
[18:42:12] Writing local files
[18:42:12] Completed 75000 out of 500000 steps (15%)
[18:51:36] Writing local files
[18:51:36] Completed 80000 out of 500000 steps (16%)
[18:58:41] - Autosending finished units...
[18:58:41] Trying to send all finished work units
[18:58:41] + No unsent completed units remaining.
[18:58:41] - Autosend completed
[19:01:01] Writing local files
[19:01:01] Completed 85000 out of 500000 steps (17%)
and here's the other one (CPU2):
Code:
--- Opening Log file [July 13 12:58:41]
# Linux Console Edition #######################################################
###############################################################################
Folding@Home Client Version 6.02
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: /home/folding/foldingathome/CPU2
Executable: /home/folding/foldingathome/CPU2/fah6
Arguments: -verbosity 9
[12:58:41] - Ask before connecting: No
[12:58:41] - User name: fredex (Team 48721)
[12:58:41] - User ID: 2DF7B59021CFC89F
[12:58:41] - Machine ID: 2
[12:58:41]
[12:58:41] Loaded queue successfully.
[12:58:41] - Preparing to get new work unit...
[12:58:41] + Attempting to get work packet
[12:58:41] - Detect CPU. Vendor: AuthenticAMD, Family: 15, Model: 4, Stepping: 2
[12:58:41] - Connecting to assignment server
[12:58:41] Connecting to http://assign.stanford.edu:8080/
[12:58:41] - Autosending finished units...
[12:58:41] Trying to send all finished work units
[12:58:41] + No unsent completed units remaining.
[12:58:41] - Autosend completed
[12:58:42] Posted data.
[12:58:42] Initial: 40AB; - Successful: assigned to (171.64.65.62).
[12:58:42] + News From Folding@Home: Welcome to Folding@Home
[12:58:42] Loaded queue successfully.
[12:58:42] Connecting to http://171.64.65.62:8080/
[12:58:43] Posted data.
[12:58:43] Initial: 0000; - Receiving payload (expected size: 515872)
[12:58:47] - Downloaded at ~125 kB/s
[12:58:47] - Averaged speed for that direction ~268 kB/s
[12:58:47] + Received work.
[12:58:47] + Closed connections
[12:58:47]
[12:58:47] + Processing work unit
[12:58:47] Core required: FahCore_78.exe
[12:58:47] Core found.
[12:58:47] Working on Unit 02 [July 13 12:58:47]
[12:58:47] + Working ...
[12:58:47] - Calling './FahCore_78.exe -dir work/ -suffix 02 -checkpoint 15 -verbose -lifeline 6122 -version 602'
[12:58:47]
[12:58:47] *------------------------------*
[12:58:47] Folding@Home Gromacs Core
[12:58:47] Version 1.90 (March 8, 2006)
[12:58:47]
[12:58:47] Preparing to commence simulation
[12:58:47] - Looking at optimizations...
[12:58:47] - Created dyn
[12:58:47] - Files status OK
[12:58:47] - Expanded 515360 -> 2531073 (decompressed 491.1 percent)
[12:58:47] - Starting from initial work packet
[12:58:47]
[12:58:47] Project: 6503 (Run 3, Clone 189, Gen 41)
[12:58:47]
[12:58:47] Assembly optimizations on if available.
[12:58:47] Entering M.D.
[12:58:53] Protein: TR462_B_4 in water
[12:58:53]
[12:58:53] Writing local files
[12:58:53] Extra SSE boost OK.
[12:58:53] Writing local files
[12:58:53] Completed 0 out of 250000 steps (0%)
[13:03:36] Writing local files
[13:03:36] Completed 2500 out of 250000 steps (1%)
[13:08:21] Writing local files
[13:08:21] Completed 5000 out of 250000 steps (2%)
[13:13:03] Writing local files
[13:13:03] Completed 7500 out of 250000 steps (3%)
[13:17:45] Writing local files
[13:17:45] Completed 10000 out of 250000 steps (4%)
[13:22:27] Writing local files
[13:22:27] Completed 12500 out of 250000 steps (5%)
[13:27:10] Writing local files
[13:27:10] Completed 15000 out of 250000 steps (6%)
[13:31:53] Writing local files
[13:31:53] Completed 17500 out of 250000 steps (7%)
[13:36:36] Writing local files
[13:36:36] Completed 20000 out of 250000 steps (8%)
[13:41:18] Writing local files
[13:41:18] Completed 22500 out of 250000 steps (9%)
[13:46:01] Writing local files
[13:46:01] Completed 25000 out of 250000 steps (10%)
[13:50:44] Writing local files
[13:50:44] Completed 27500 out of 250000 steps (11%)
[13:55:28] Writing local files
[13:55:28] Completed 30000 out of 250000 steps (12%)
[14:00:12] Writing local files
[14:00:12] Completed 32500 out of 250000 steps (13%)
[14:04:54] Writing local files
[14:04:54] Completed 35000 out of 250000 steps (14%)
[14:09:37] Writing local files
[14:09:37] Completed 37500 out of 250000 steps (15%)
[14:14:20] Writing local files
[14:14:20] Completed 40000 out of 250000 steps (16%)
[14:19:03] Writing local files
[14:19:03] Completed 42500 out of 250000 steps (17%)
[14:23:46] Writing local files
[14:23:46] Completed 45000 out of 250000 steps (18%)
[14:28:29] Writing local files
[14:28:29] Completed 47500 out of 250000 steps (19%)
[14:33:12] Writing local files
[14:33:12] Completed 50000 out of 250000 steps (20%)
[14:37:55] Writing local files
[14:37:55] Completed 52500 out of 250000 steps (21%)
[14:42:38] Writing local files
[14:42:38] Completed 55000 out of 250000 steps (22%)
[14:47:21] Writing local files
[14:47:21] Completed 57500 out of 250000 steps (23%)
[14:52:04] Writing local files
[14:52:04] Completed 60000 out of 250000 steps (24%)
[14:56:46] Writing local files
[14:56:46] Completed 62500 out of 250000 steps (25%)
[15:01:29] Writing local files
[15:01:29] Completed 65000 out of 250000 steps (26%)
[15:06:12] Writing local files
[15:06:12] Completed 67500 out of 250000 steps (27%)
[15:10:55] Writing local files
[15:10:55] Completed 70000 out of 250000 steps (28%)
[15:15:38] Writing local files
[15:15:38] Completed 72500 out of 250000 steps (29%)
[15:20:22] Writing local files
[15:20:22] Completed 75000 out of 250000 steps (30%)
[15:25:05] Writing local files
[15:25:05] Completed 77500 out of 250000 steps (31%)
[15:29:49] Writing local files
[15:29:49] Completed 80000 out of 250000 steps (32%)
[15:34:33] Writing local files
[15:34:33] Completed 82500 out of 250000 steps (33%)
[15:39:15] Writing local files
[15:39:15] Completed 85000 out of 250000 steps (34%)
[15:43:57] Writing local files
[15:43:58] Completed 87500 out of 250000 steps (35%)
[15:48:41] Writing local files
[15:48:41] Completed 90000 out of 250000 steps (36%)
[15:53:24] Writing local files
[15:53:24] Completed 92500 out of 250000 steps (37%)
[15:58:07] Writing local files
[15:58:07] Completed 95000 out of 250000 steps (38%)
[16:02:51] Writing local files
[16:02:51] Completed 97500 out of 250000 steps (39%)
[16:07:35] Writing local files
[16:07:35] Completed 100000 out of 250000 steps (40%)
[16:12:18] Writing local files
[16:12:18] Completed 102500 out of 250000 steps (41%)
[16:17:01] Writing local files
[16:17:01] Completed 105000 out of 250000 steps (42%)
[16:21:44] Writing local files
[16:21:44] Completed 107500 out of 250000 steps (43%)
[16:26:25] Writing local files
[16:26:25] Completed 110000 out of 250000 steps (44%)
[16:31:08] Writing local files
[16:31:08] Completed 112500 out of 250000 steps (45%)
[16:35:51] Writing local files
[16:35:51] Completed 115000 out of 250000 steps (46%)
[16:40:34] Writing local files
[16:40:34] Completed 117500 out of 250000 steps (47%)
[16:45:17] Writing local files
[16:45:17] Completed 120000 out of 250000 steps (48%)
[16:49:59] Writing local files
[16:49:59] Completed 122500 out of 250000 steps (49%)
[16:54:42] Writing local files
[16:54:42] Completed 125000 out of 250000 steps (50%)
[16:59:25] Writing local files
[16:59:25] Completed 127500 out of 250000 steps (51%)
[17:04:07] Writing local files
[17:04:07] Completed 130000 out of 250000 steps (52%)
[17:08:51] Writing local files
[17:08:51] Completed 132500 out of 250000 steps (53%)
[17:13:33] Writing local files
[17:13:33] Completed 135000 out of 250000 steps (54%)
[17:18:14] Writing local files
[17:18:14] Completed 137500 out of 250000 steps (55%)
[17:22:57] Writing local files
[17:22:57] Completed 140000 out of 250000 steps (56%)
[17:27:38] Writing local files
[17:27:38] Completed 142500 out of 250000 steps (57%)
[17:32:19] Writing local files
[17:32:20] Completed 145000 out of 250000 steps (58%)
[17:37:02] Writing local files
[17:37:02] Completed 147500 out of 250000 steps (59%)
[17:41:44] Writing local files
[17:41:44] Completed 150000 out of 250000 steps (60%)
[17:46:27] Writing local files
[17:46:27] Completed 152500 out of 250000 steps (61%)
[17:51:09] Writing local files
[17:51:09] Completed 155000 out of 250000 steps (62%)
[17:53:08] CoreStatus = 0 (0)
[17:53:08] Client-core communications error: ERROR 0x0
[17:53:08] Deleting current work unit & continuing...
[17:53:26] Trying to send all finished work units
[17:53:26] + No unsent completed units remaining.
[17:53:26] - Preparing to get new work unit...
[17:53:26] + Attempting to get work packet
[17:53:26] - Connecting to assignment server
[17:53:26] Connecting to http://assign.stanford.edu:8080/
[17:53:26] Posted data.
[17:53:26] Initial: 40AB; - Successful: assigned to (171.64.65.62).
[17:53:26] + News From Folding@Home: Welcome to Folding@Home
[17:53:26] Loaded queue successfully.
[17:53:26] Connecting to http://171.64.65.62:8080/
[17:53:27] Posted data.
[17:53:27] Initial: 0000; - Receiving payload (expected size: 515872)
[17:53:34] - Downloaded at ~71 kB/s
[17:53:34] - Averaged speed for that direction ~229 kB/s
[17:53:34] + Received work.
[17:53:34] + Closed connections
[17:53:39]
[17:53:39] + Processing work unit
[17:53:39] Core required: FahCore_78.exe
[17:53:39] Core found.
[17:53:39] Working on Unit 03 [July 13 17:53:39]
[17:53:39] + Working ...
[17:53:39] - Calling './FahCore_78.exe -dir work/ -suffix 03 -checkpoint 15 -verbose -lifeline 6122 -version 602'
[17:53:39]
[17:53:39] *------------------------------*
[17:53:39] Folding@Home Gromacs Core
[17:53:39] Version 1.90 (March 8, 2006)
[17:53:39]
[17:53:39] Preparing to commence simulation
[17:53:39] - Looking at optimizations...
[17:53:39] - Created dyn
[17:53:39] - Files status OK
[17:53:39] - Expanded 515360 -> 2531073 (decompressed 491.1 percent)
[17:53:39] - Starting from initial work packet
[17:53:39]
[17:53:39] Project: 6503 (Run 3, Clone 189, Gen 41)
[17:53:39]
[17:53:39] Assembly optimizations on if available.
[17:53:39] Entering M.D.
[17:53:45] Protein: TR462_B_4 in water
[17:53:45]
[17:53:45] Writing local files
[17:53:45] Extra SSE boost OK.
[17:53:45] Writing local files
[17:53:45] Completed 0 out of 250000 steps (0%)
[17:58:27] Writing local files
[17:58:28] Completed 2500 out of 250000 steps (1%)
[18:03:10] Writing local files
[18:03:10] Completed 5000 out of 250000 steps (2%)
[18:07:54] Writing local files
[18:07:54] Completed 7500 out of 250000 steps (3%)
[18:12:34] Writing local files
[18:12:34] Completed 10000 out of 250000 steps (4%)
[18:17:17] Writing local files
[18:17:17] Completed 12500 out of 250000 steps (5%)
[18:21:58] Writing local files
[18:21:58] Completed 15000 out of 250000 steps (6%)
[18:26:40] Writing local files
[18:26:40] Completed 17500 out of 250000 steps (7%)
[18:31:22] Writing local files
[18:31:22] Completed 20000 out of 250000 steps (8%)
[18:36:04] Writing local files
[18:36:04] Completed 22500 out of 250000 steps (9%)
[18:40:47] Writing local files
[18:40:47] Completed 25000 out of 250000 steps (10%)
[18:45:30] Writing local files
[18:45:30] Completed 27500 out of 250000 steps (11%)
[18:50:14] Writing local files
[18:50:14] Completed 30000 out of 250000 steps (12%)
[18:54:58] Writing local files
[18:54:58] Completed 32500 out of 250000 steps (13%)
[18:58:41] - Autosending finished units...
[18:58:41] Trying to send all finished work units
[18:58:41] + No unsent completed units remaining.
[18:58:41] - Autosend completed
[18:59:42] Writing local files
[18:59:42] Completed 35000 out of 250000 steps (14%)
[19:04:25] Writing local files
[19:04:25] Completed 37500 out of 250000 steps (15%)
Re: bad Work Units 6503 (3, 189, 41) and 6511 (0, 94, 11)
fredex wrote:
I'm not sure what he means, exactly: does he mean that no one has successfully returned one of those WUs yet ...
Yes, that is what he was saying, and also that there were no signs of anyone returning partial WUs either. If the WUs have been assigned to someone else and are not causing problems for them, it should take a little time to learn whether they are completed successfully.
Edit: I just checked again and there is still no data back on either of those work units.
Re: bad Work Units 6503 (3, 189, 41) and 6511 (0, 94, 11)
fredex wrote:
toTow said:
I haven't run any diagnostics lately, no. I can take it down and run memtest86+ for a while. But since everything else runs fine (and I usually have uptimes of 30-60 days whenever I get a kernel update causing me to reboot) I'd think it's probably not some hardware issue. But it could be: stranger things have happened.
Have a look in your /var/log/syslog around that time. I think you would find something like this:
Code:
Jul 17 08:07:14 carpet kernel: [3522967.506736] FahCore_78.exe[13126]: segfault at e629da00 ip 08087a2f sp bf3fe3dc error 5 in FahCore_78.exe[8048000+322000]
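To pick lines like the one above out of a large system log, a small Python sketch along these lines may help. It assumes the kernel message layout of that single quoted line (other kernels may word it differently), and the name `parse_segfault` is made up for illustration. The decoding of the `error` number uses the x86 page-fault error-code bits.

```python
import re

# Layout assumed from the single kernel line quoted above; other kernels may differ.
SEGFAULT_RE = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-fA-F]+) "
    r"ip (?P<ip>[0-9a-fA-F]+) sp (?P<sp>[0-9a-fA-F]+) error (?P<err>\d+)"
)


def parse_segfault(line):
    """Return the fields of a kernel segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"))
    return {
        "process": m.group("proc"),
        "pid": int(m.group("pid")),
        "address": int(m.group("addr"), 16),
        "ip": int(m.group("ip"), 16),
        # x86 page-fault error code bits
        "protection_fault": bool(err & 1),  # False means the page was not present
        "write": bool(err & 2),             # False means the access was a read
        "user_mode": bool(err & 4),
    }


# The line quoted in the post above, used here as sample input.
SAMPLE = (
    "Jul 17 08:07:14 carpet kernel: [3522967.506736] FahCore_78.exe[13126]: "
    "segfault at e629da00 ip 08087a2f sp bf3fe3dc error 5 "
    "in FahCore_78.exe[8048000+322000]"
)

if __name__ == "__main__":
    print(parse_segfault(SAMPLE))
```

For the quoted line, `error 5` has bits 0 and 2 set, so the sketch reports a user-mode read that hit a protection fault. Feed it the lines of whichever file your distribution logs kernel messages to, and compare the timestamps with the times of the ERROR 0x0 entries in FAHlog.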
--
I'm counting for science.
Points just make me sick.
Re: bad Work Units 6503 (3, 189, 41) and 6511 (0, 94, 11)
Hi fredex (team 48721),
Your WU (P6503 R3 C189 G41) was added to the stats database on 2010-07-17 03:06:59 for 75 points of credit.
Posting FAH's log:
How to provide enough info to get helpful support.
Re: bad Work Units 6503 (3, 189, 41) and 6511 (0, 94, 11)
Bruce:
Thanks for the info!
I continue to have what appears to be the same problem with several other WUs. So far I've not found any problems here.
I've taken the system down and run Memtest86+ through two full iterations (around an hour and a half), letting it run all its standard tests. Nothing.
I down-clocked the memory a little... When I built this system almost a year ago I bought RAM rated at 1066 DDR. For some reason the BIOS wanted to set it at 800, so I manually tweaked the BIOS settings to 1066 and ran Memtest86+ for several hours and it tested fine.
But I continue to have these failures in FAH. Various projects, various WUs, but all with the same 0x0 error.
Some WUs process successfully, many don't. It's frustrating.
"whynot" suggested I'd find segfaults listed in my "syslog" (I assume he means /var/log/messages) at the same time as these FAH failures occur, but I don't. I see nothing at all like he suggests.
Here's a log entry from one that failed just this evening. That same WU has failed several times in a row, at the same point (based on the % printouts) and with the same error code:
Code:
[17:58:43] Initial: 0000; - Receiving payload (expected size: 518483)
[17:58:44] - Downloaded at ~506 kB/s
[17:58:44] - Averaged speed for that direction ~424 kB/s
[17:58:44] + Received work.
[17:58:44] + Closed connections
[17:58:49]
[17:58:49] + Processing work unit
[17:58:49] Core required: FahCore_78.exe
[17:58:49] Core found.
[17:58:49] Working on Unit 09 [July 23 17:58:49]
[17:58:49] + Working ...
[17:58:49] - Calling './FahCore_78.exe -dir work/ -suffix 09 -checkpoint 15 -verbose -lifeline 6594 -version 602'
[17:58:49]
[17:58:49] *------------------------------*
[17:58:49] Folding@Home Gromacs Core
[17:58:49] Version 1.90 (March 8, 2006)
[17:58:49]
[17:58:49] Preparing to commence simulation
[17:58:49] - Looking at optimizations...
[17:58:49] - Created dyn
[17:58:49] - Files status OK
[17:58:50] - Expanded 517971 -> 2533901 (decompressed 489.1 percent)
[17:58:50] - Starting from initial work packet
[17:58:50]
[17:58:50] Project: 6513 (Run 6, Clone 43, Gen 18)
[17:58:50]
[17:58:50] Assembly optimizations on if available.
[17:58:50] Entering M.D.
[17:58:56] Protein: TR462_B_7 in water
[17:58:56]
[17:58:56] Writing local files
[17:58:56] Extra SSE boost OK.
[17:58:56] Writing local files
[17:58:56] Completed 0 out of 250000 steps (0%)
[18:03:38] Writing local files
[18:03:38] Completed 2500 out of 250000 steps (1%)
[18:08:21] Writing local files
[18:08:21] Completed 5000 out of 250000 steps (2%)
[18:13:04] Writing local files
[18:13:04] Completed 7500 out of 250000 steps (3%)
[18:17:47] Writing local files
[18:17:47] Completed 10000 out of 250000 steps (4%)
[18:22:29] Writing local files
[18:22:29] Completed 12500 out of 250000 steps (5%)
[18:27:12] Writing local files
[18:27:13] Completed 15000 out of 250000 steps (6%)
[18:31:55] Writing local files
[18:31:55] Completed 17500 out of 250000 steps (7%)
[18:36:38] Writing local files
[18:36:38] Completed 20000 out of 250000 steps (8%)
[18:41:21] Writing local files
[18:41:21] Completed 22500 out of 250000 steps (9%)
[18:46:04] Writing local files
[18:46:04] Completed 25000 out of 250000 steps (10%)
[18:50:47] Writing local files
[18:50:47] Completed 27500 out of 250000 steps (11%)
[18:55:31] Writing local files
[18:55:31] Completed 30000 out of 250000 steps (12%)
[19:00:14] Writing local files
[19:00:15] Completed 32500 out of 250000 steps (13%)
[19:04:57] Writing local files
[19:04:57] Completed 35000 out of 250000 steps (14%)
[19:09:41] Writing local files
[19:09:41] Completed 37500 out of 250000 steps (15%)
[19:14:24] Writing local files
[19:14:24] Completed 40000 out of 250000 steps (16%)
[19:19:06] Writing local files
[19:19:07] Completed 42500 out of 250000 steps (17%)
[19:23:50] Writing local files
[19:23:50] Completed 45000 out of 250000 steps (18%)
[19:28:33] Writing local files
[19:28:33] Completed 47500 out of 250000 steps (19%)
[19:33:15] Writing local files
[19:33:15] Completed 50000 out of 250000 steps (20%)
[19:37:58] Writing local files
[19:37:58] Completed 52500 out of 250000 steps (21%)
[19:42:40] Writing local files
[19:42:40] Completed 55000 out of 250000 steps (22%)
[19:47:22] Writing local files
[19:47:22] Completed 57500 out of 250000 steps (23%)
[19:52:05] Writing local files
[19:52:05] Completed 60000 out of 250000 steps (24%)
[19:56:48] Writing local files
[19:56:48] Completed 62500 out of 250000 steps (25%)
[20:01:31] Writing local files
[20:01:31] Completed 65000 out of 250000 steps (26%)
[20:06:14] Writing local files
[20:06:14] Completed 67500 out of 250000 steps (27%)
[20:10:56] Writing local files
[20:10:56] Completed 70000 out of 250000 steps (28%)
[20:15:39] Writing local files
[20:15:39] Completed 72500 out of 250000 steps (29%)
[20:20:22] Writing local files
[20:20:22] Completed 75000 out of 250000 steps (30%)
[20:25:05] Writing local files
[20:25:05] Completed 77500 out of 250000 steps (31%)
[20:29:48] Writing local files
[20:29:48] Completed 80000 out of 250000 steps (32%)
[20:34:30] Writing local files
[20:34:30] Completed 82500 out of 250000 steps (33%)
[20:39:13] Writing local files
[20:39:13] Completed 85000 out of 250000 steps (34%)
[20:43:55] Writing local files
[20:43:55] Completed 87500 out of 250000 steps (35%)
[20:48:38] Writing local files
[20:48:38] Completed 90000 out of 250000 steps (36%)
[20:50:45] - Autosending finished units...
[20:50:45] Trying to send all finished work units
[20:50:45] + No unsent completed units remaining.
[20:50:45] - Autosend completed
[20:53:23] Writing local files
[20:53:23] Completed 92500 out of 250000 steps (37%)
[20:58:05] Writing local files
[20:58:05] Completed 95000 out of 250000 steps (38%)
[21:02:48] Writing local files
[21:02:48] Completed 97500 out of 250000 steps (39%)
[21:07:31] Writing local files
[21:07:31] Completed 100000 out of 250000 steps (40%)
[21:12:14] Writing local files
[21:12:14] Completed 102500 out of 250000 steps (41%)
[21:16:57] Writing local files
[21:16:57] Completed 105000 out of 250000 steps (42%)
[21:21:40] Writing local files
[21:21:40] Completed 107500 out of 250000 steps (43%)
[21:26:22] Writing local files
[21:26:22] Completed 110000 out of 250000 steps (44%)
[21:31:06] Writing local files
[21:31:06] Completed 112500 out of 250000 steps (45%)
[21:35:48] Writing local files
[21:35:48] Completed 115000 out of 250000 steps (46%)
[21:40:31] Writing local files
[21:40:31] Completed 117500 out of 250000 steps (47%)
[21:43:32] CoreStatus = 0 (0)
[21:43:32] Client-core communications error: ERROR 0x0
[21:43:32] Deleting current work unit & continuing...
[21:43:50] Trying to send all finished work units
[21:43:50] + No unsent completed units remaining.
[21:43:50] - Preparing to get new work unit...
[21:43:50] + Attempting to get work packet
[21:43:50] - Connecting to assignment server
[21:43:50] Connecting to http://assign.stanford.edu:8080/
[21:43:50] Posted data.
[21:43:50] Initial: 40AB; - Successful: assigned to (171.64.65.111).
[21:43:50] + News From Folding@Home: Welcome to Folding@Home
[21:43:50] Loaded queue successfully.
[21:43:50] Connecting to http://171.64.65.111:8080/
[21:43:51] Posted data.
[21:43:51] Initial: 0000; - Receiving payload (expected size: 465035)
[21:43:52] - Downloaded at ~454 kB/s
[21:43:52] - Averaged speed for that direction ~430 kB/s
[21:43:52] + Received work.
[21:43:52] + Closed connections
I've got two FAH clients running (it's a dual core Phenom II) and both take the same kinds of errors on various projects. Here's another log entry from today from the OTHER client:
[19:21:05] Initial: 0000; - Receiving payload (expected size: 519161)
[19:21:06] - Downloaded at ~506 kB/s
[19:21:06] - Averaged speed for that direction ~432 kB/s
[19:21:06] + Received work.
[19:21:06] + Closed connections
[19:21:11]
[19:21:11] + Processing work unit
[19:21:11] Core required: FahCore_78.exe
[19:21:11] Core found.
[19:21:11] Working on Unit 06 [July 23 19:21:11]
[19:21:11] + Working ...
[19:21:11] - Calling './FahCore_78.exe -dir work/ -suffix 06 -checkpoint 15 -verbose -lifeline 6549 -version 602'
[19:21:11]
[19:21:11] *------------------------------*
[19:21:11] Folding@Home Gromacs Core
[19:21:11] Version 1.90 (March 8, 2006)
[19:21:11]
[19:21:11] Preparing to commence simulation
[19:21:11] - Looking at optimizations...
[19:21:11] - Created dyn
[19:21:11] - Files status OK
[19:21:12] - Expanded 518649 -> 2533093 (decompressed 488.4 percent)
[19:21:12] - Starting from initial work packet
[19:21:12]
[19:21:12] Project: 6503 (Run 17, Clone 92, Gen 56)
[19:21:12]
[19:21:12] Assembly optimizations on if available.
[19:21:12] Entering M.D.
[19:21:18] Protein: TR462_B_18 in water
[19:21:18]
[19:21:18] Writing local files
[19:21:18] Extra SSE boost OK.
[19:21:18] Writing local files
[19:21:18] Completed 0 out of 250000 steps (0%)
[19:26:00] Writing local files
[19:26:00] Completed 2500 out of 250000 steps (1%)
[19:30:43] Writing local files
[19:30:43] Completed 5000 out of 250000 steps (2%)
[19:35:26] Writing local files
[19:35:26] Completed 7500 out of 250000 steps (3%)
[19:40:08] Writing local files
[19:40:08] Completed 10000 out of 250000 steps (4%)
[19:44:51] Writing local files
[19:44:51] Completed 12500 out of 250000 steps (5%)
[19:49:34] Writing local files
[19:49:34] Completed 15000 out of 250000 steps (6%)
[19:54:21] Writing local files
[19:54:21] Completed 17500 out of 250000 steps (7%)
[19:59:02] Writing local files
[19:59:02] Completed 20000 out of 250000 steps (8%)
[20:03:45] Writing local files
[20:03:45] Completed 22500 out of 250000 steps (9%)
[20:08:27] Writing local files
[20:08:27] Completed 25000 out of 250000 steps (10%)
[20:13:09] Writing local files
[20:13:09] Completed 27500 out of 250000 steps (11%)
[20:17:52] Writing local files
[20:17:52] Completed 30000 out of 250000 steps (12%)
[20:22:33] Writing local files
[20:22:33] Completed 32500 out of 250000 steps (13%)
[20:27:15] Writing local files
[20:27:15] Completed 35000 out of 250000 steps (14%)
[20:31:57] Writing local files
[20:31:57] Completed 37500 out of 250000 steps (15%)
[20:36:40] Writing local files
[20:36:40] Completed 40000 out of 250000 steps (16%)
[20:41:22] Writing local files
[20:41:22] Completed 42500 out of 250000 steps (17%)
[20:46:05] Writing local files
[20:46:05] Completed 45000 out of 250000 steps (18%)
[20:50:45] - Autosending finished units...
[20:50:45] Trying to send all finished work units
[20:50:45] + No unsent completed units remaining.
[20:50:45] - Autosend completed
[20:50:48] Writing local files
[20:50:48] Completed 47500 out of 250000 steps (19%)
[20:55:30] Writing local files
[20:55:30] Completed 50000 out of 250000 steps (20%)
[21:00:12] Writing local files
[21:00:13] Completed 52500 out of 250000 steps (21%)
[21:04:55] Writing local files
[21:04:55] Completed 55000 out of 250000 steps (22%)
[21:09:37] Writing local files
[21:09:37] Completed 57500 out of 250000 steps (23%)
[21:14:19] Writing local files
[21:14:19] Completed 60000 out of 250000 steps (24%)
[21:19:02] Writing local files
[21:19:02] Completed 62500 out of 250000 steps (25%)
[21:23:44] Writing local files
[21:23:44] Completed 65000 out of 250000 steps (26%)
[21:28:26] Writing local files
[21:28:26] Completed 67500 out of 250000 steps (27%)
[21:33:09] Writing local files
[21:33:09] Completed 70000 out of 250000 steps (28%)
[21:37:52] Writing local files
[21:37:52] Completed 72500 out of 250000 steps (29%)
[21:42:35] Writing local files
[21:42:35] Completed 75000 out of 250000 steps (30%)
[21:47:18] Writing local files
[21:47:18] Completed 77500 out of 250000 steps (31%)
[21:52:03] Writing local files
[21:52:03] Completed 80000 out of 250000 steps (32%)
[21:56:46] Writing local files
[21:56:46] Completed 82500 out of 250000 steps (33%)
[22:01:28] Writing local files
[22:01:28] Completed 85000 out of 250000 steps (34%)
[22:06:11] Writing local files
[22:06:11] Completed 87500 out of 250000 steps (35%)
[22:10:54] Writing local files
[22:10:54] Completed 90000 out of 250000 steps (36%)
[22:15:38] Writing local files
[22:15:38] Completed 92500 out of 250000 steps (37%)
[22:20:22] Writing local files
[22:20:22] Completed 95000 out of 250000 steps (38%)
[22:25:06] Writing local files
[22:25:06] Completed 97500 out of 250000 steps (39%)
[22:29:49] Writing local files
[22:29:49] Completed 100000 out of 250000 steps (40%)
[22:34:32] Writing local files
[22:34:32] Completed 102500 out of 250000 steps (41%)
[22:39:15] Writing local files
[22:39:15] Completed 105000 out of 250000 steps (42%)
[22:43:58] Writing local files
[22:43:58] Completed 107500 out of 250000 steps (43%)
[22:48:41] Writing local files
[22:48:41] Completed 110000 out of 250000 steps (44%)
[22:53:26] Writing local files
[22:53:26] Completed 112500 out of 250000 steps (45%)
[22:58:09] Writing local files
[22:58:09] Completed 115000 out of 250000 steps (46%)
[23:02:52] Writing local files
[23:02:52] Completed 117500 out of 250000 steps (47%)
[23:07:35] Writing local files
[23:07:35] Completed 120000 out of 250000 steps (48%)
[23:12:18] Writing local files
[23:12:18] Completed 122500 out of 250000 steps (49%)
[23:17:01] Writing local files
[23:17:01] Completed 125000 out of 250000 steps (50%)
[23:21:44] Writing local files
[23:21:44] Completed 127500 out of 250000 steps (51%)
[23:26:27] Writing local files
[23:26:27] Completed 130000 out of 250000 steps (52%)
[23:31:11] Writing local files
[23:31:11] Completed 132500 out of 250000 steps (53%)
[23:35:54] Writing local files
[23:35:54] Completed 135000 out of 250000 steps (54%)
[23:40:37] Writing local files
[23:40:37] Completed 137500 out of 250000 steps (55%)
[23:45:21] Writing local files
[23:45:21] Completed 140000 out of 250000 steps (56%)
[23:50:04] Writing local files
[23:50:04] Completed 142500 out of 250000 steps (57%)
[23:54:46] Writing local files
[23:54:46] Completed 145000 out of 250000 steps (58%)
[23:58:11] CoreStatus = 0 (0)
[23:58:11] Client-core communications error: ERROR 0x0
[23:58:11] Deleting current work unit & continuing...
[23:58:28] Trying to send all finished work units
[23:58:28] + No unsent completed units remaining.
[23:58:28] - Preparing to get new work unit...
[23:58:28] + Attempting to get work packet
[23:58:28] - Connecting to assignment server
[23:58:28] Connecting to http://assign.stanford.edu:8080/
[23:58:29] Posted data.
[23:58:29] Initial: 40AB; - Successful: assigned to (171.64.65.62).
[23:58:29] + News From Folding@Home: Welcome to Folding@Home
[23:58:29] Loaded queue successfully.
[23:58:29] Connecting to http://171.64.65.62:8080/
[23:58:30] Posted data.
[23:58:30] Initial: 0000; - Receiving payload (expected size: 519161)
[23:58:32] - Downloaded at ~253 kB/s
[23:58:32] - Averaged speed for that direction ~396 kB/s
[23:58:32] + Received work.
[23:58:32] + Closed connections
Re: bad Work Units 6503 (3, 189, 41) and 6511 (0, 94, 11)
Nobody has uploaded a result from either Project: 6511 Run 0, Clone 94, Gen 11 or Project: 6513 (Run 6, Clone 43, Gen 18) yet.
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Posts: 289
- Joined: Sun Dec 02, 2007 4:31 am
- Location: Carrizo Plain National Monument, California
- Contact:
Re: bad Work Units 6503 (3, 189, 41) and 6511 (0, 94, 11)
"that same WU has failed several times in a row, at the same point (based on the % printouts) and with the same error code." sounds like a bad WU
"I've got two FAH clients running (it's a dual core Phenom II) and both take the same kinds of errors on various projects." sounds like a problem with your machine.
"I down-clocked the memory a little... when I built this system almost a year ago I bought RAM rated at 1066 DDR. for some reason the BIOS wanted to set it at 800, so I manually tweaked the BIOS settings to 1066 and ran Memtest86+ for several hours and it tested fine."
This sounds like the source of your problem. What are the details of your machine and have you tried swapping around the memory sticks, checked for a BIOS update, got some extra RAM to test the machine?
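To tell "same WU fails at the same point" apart from "different WUs fail at random points", it can help to pull just the failures out of FAHlog.txt. Below is a rough sketch, assuming the log lines look like the ones posted in this thread; the function name `failures` and the sample text are made up for illustration and are not part of the FAH client.

```python
import re

# Patterns modelled on the log lines posted in this thread (an assumption;
# other client versions may word these lines differently).
PROJECT = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
PROGRESS = re.compile(r"Completed (\d+) out of (\d+) steps")
ERROR = re.compile(r"Client-core communications error: (ERROR \S+)")


def failures(log_text):
    """Return one dict per core error: the WU, last percent logged, error code."""
    found = []
    wu = None       # most recent (project, run, clone, gen) seen, if any
    percent = None  # last completed percent logged for that WU
    for line in log_text.splitlines():
        m = PROJECT.search(line)
        if m:
            wu = tuple(int(g) for g in m.groups())
            percent = None
            continue
        m = PROGRESS.search(line)
        if m:
            percent = 100 * int(m.group(1)) // int(m.group(2))
            continue
        m = ERROR.search(line)
        if m:
            found.append({"wu": wu, "percent": percent, "error": m.group(1)})
            wu, percent = None, None  # the client deletes the WU after this
    return found


# Hypothetical sample built from lines quoted earlier in the thread.
sample = """\
[19:21:12] Project: 6503 (Run 17, Clone 92, Gen 56)
[23:54:46] Completed 145000 out of 250000 steps  (58%)
[23:58:11] CoreStatus = 0 (0)
[23:58:11] Client-core communications error: ERROR 0x0
"""

for failure in failures(sample):
    print(failure)
```

If the same (project, run, clone, gen) keeps showing up at the same percent, that points toward the WU; if the WUs and percentages are all over the place, that points toward the machine. If the log excerpt starts partway through a WU, the `wu` field will be `None` because no Project line was seen.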
Re: bad Work Units 6503 (3, 189, 41) and 6511 (0, 94, 11)
"This sounds like the source of your problem."
But you'll note that I recently put the memory back to what the MB thinks is its native speed, and it doesn't help.
"What are the details of your machine and have you tried swapping around the memory sticks, checked for a BIOS update, got some extra RAM to test the machine?"
Two 2-gig sticks. I could pull them one at a time, but haven't yet. No, no extra RAM that will fit this machine.
It's a Gigabyte MA770-UD3 (AM2+/AM3) with an AMD Phenom II X2 CPU. Latest BIOS as of the last time I checked the Gigabyte site. The RAM is G.Skill DDR2-1066 CL5-5-5-15 (F2-8500CL5D). 500W PC Power and Cooling PSU. 500 watts should be plenty; my Kill A Watt meter reports the entire machine draws only a couple hundred watts even when "cranking".
The failed WUs started only in the last 1-2 months.