bad Work Units 6503 (3, 189, 41) and 6511 (0, 94, 11)

fredex
Posts: 48
Joined: Thu Apr 01, 2010 1:17 am
Location: stoneham, ma, us

bad Work Units 6503 (3, 189, 41) and 6511 (0, 94, 11)

Post by fredex »

Project: 6503 (Run 3, Clone 189, Gen 41)
Client-core communications error: ERROR 0x0

and

Project: 6511 (Run 0, Clone 94, Gen 11)
Client-core communications error: ERROR 0x0

I've deleted 'em and restarted, but of course it keeps sending them to me to try again.

(two clients running on a dual-core processor)

Is it my imagination, or do there seem to be more bad WUs lately? I've run for several years without (AFAIK) previously encountering a bad WU.
toTOW
Site Moderator
Posts: 6359
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France
Contact:

Re: bad Work Units 6503 (3, 189, 41) and 6511 (0, 94, 11)

Post by toTOW »

No data in the DB for Project: 6503 (Run 3, Clone 189, Gen 41) and Project: 6511 (Run 0, Clone 94, Gen 11) yet.

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
codysluder
Posts: 1024
Joined: Sun Dec 02, 2007 12:43 pm

Re: bad Work Units 6503 (3, 189, 41) and 6511 (0, 94, 11)

Post by codysluder »

The wiki says that ERROR 0x0 is an unknown error, so it's possible that the WU is bad or that you have some sort of hardware issue. Have you run diagnostics recently?

You didn't post FAHlog, but I'm guessing that the client deleted the WU rather than uploading a partial result. How far into the processing was it before it got the error? Please report what happens on the retry, too.
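
A quick way to answer the "how far did it get" question from a FAHlog is to scan it for the error line and report the last progress percentage printed before it. The following is a minimal Python sketch, not an official tool: `find_failures` and the three regular expressions are illustrative names, and the patterns assume the log line formats quoted in this thread (client 6.02 with FahCore_78). Other client or core versions may print different text.

```python
import re

# Patterns assume the FAHlog line formats quoted in this thread.
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
PROGRESS_RE = re.compile(r"Completed (\d+) out of (\d+) steps\s+\((\d+)%\)")
ERROR_RE = re.compile(r"Client-core communications error: ERROR (0x[0-9a-fA-F]+)")


def find_failures(log_lines):
    """Return one dict per work unit that ended in a client-core error."""
    failures = []
    current = None        # (project, run, clone, gen) of the WU in progress
    last_percent = None   # last percentage printed for that WU, if any
    for line in log_lines:
        m = PROJECT_RE.search(line)
        if m:
            current = tuple(int(g) for g in m.groups())
            last_percent = None
            continue
        m = PROGRESS_RE.search(line)
        if m:
            last_percent = int(m.group(3))
            continue
        m = ERROR_RE.search(line)
        if m and current is not None:
            failures.append(
                {"prcg": current, "last_percent": last_percent, "error": m.group(1)}
            )
            current = None
    return failures


# A few lines in the same format as the logs posted in this thread.
sample = """\
[12:58:49] Project: 6511 (Run 0, Clone 94, Gen 11)
[16:15:09] Completed 65000 out of 250000 steps  (26%)
[16:20:01] CoreStatus = 0 (0)
[16:20:01] Client-core communications error: ERROR 0x0
""".splitlines()

for failure in find_failures(sample):
    print(failure)
```

To run it on a real log, pass the lines of the file instead of `sample`, for example `find_failures(open("FAHlog.txt"))`, where the file name is whatever your client writes.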
fredex
Posts: 48
Joined: Thu Apr 01, 2010 1:17 am
Location: stoneham, ma, us

Re: bad Work Units 6503 (3, 189, 41) and 6511 (0, 94, 11)

Post by fredex »

toTow said:
No data in the DB for Project: 6503 (Run 3, Clone 189, Gen 41) and Project: 6511 (Run 0, Clone 94, Gen 11) yet.
I'm not sure what he means, exactly: does he mean that no one has successfully returned one of those WUs yet, or does he mean there aren't any such WUs yet? or something else?

I haven't run any diagnostics lately, no. I can take it down and run memtest86+ for a while. But since everything else runs fine (I usually see uptimes of 30-60 days, ending only when a kernel update makes me reboot), I'd think it's probably not a hardware issue. But it could be: stranger things have happened.

Added code tags. ~sorto'
here's one of the log files (CPU1):

Code: Select all

--- Opening Log file [July 13 12:58:41] 


# Linux Console Edition #######################################################
###############################################################################

                       Folding@Home Client Version 6.02

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /home/folding/foldingathome/CPU1
Executable: /home/folding/foldingathome/CPU1/fah6
Arguments: -verbosity 9 

[12:58:41] - Ask before connecting: No
[12:58:41] - User name: fredex (Team 48721)
[12:58:41] - User ID: 61F9905C1CB2ABB5
[12:58:41] - Machine ID: 1
[12:58:41] 
[12:58:41] Loaded queue successfully.
[12:58:41] - Preparing to get new work unit...
[12:58:41] + Attempting to get work packet
[12:58:41] - Detect CPU. Vendor: AuthenticAMD, Family: 15, Model: 4, Stepping: 2
[12:58:41] - Connecting to assignment server
[12:58:41] Connecting to http://assign.stanford.edu:8080/
[12:58:41] - Autosending finished units...
[12:58:41] Trying to send all finished work units
[12:58:41] + No unsent completed units remaining.
[12:58:41] - Autosend completed
[12:58:42] Posted data.
[12:58:42] Initial: 40AB; - Successful: assigned to (171.64.65.62).
[12:58:42] + News From Folding@Home: Welcome to Folding@Home
[12:58:42] Loaded queue successfully.
[12:58:42] Connecting to http://171.64.65.62:8080/
[12:58:43] Posted data.
[12:58:43] Initial: 0000; - Receiving payload (expected size: 751267)
[12:58:49] - Downloaded at ~122 kB/s
[12:58:49] - Averaged speed for that direction ~315 kB/s
[12:58:49] + Received work.
[12:58:49] + Closed connections
[12:58:49] 
[12:58:49] + Processing work unit
[12:58:49] Core required: FahCore_78.exe
[12:58:49] Core found.
[12:58:49] Working on Unit 04 [July 13 12:58:49]
[12:58:49] + Working ...
[12:58:49] - Calling './FahCore_78.exe -dir work/ -suffix 04 -checkpoint 15 -verbose -lifeline 6108 -version 602'

[12:58:49] 
[12:58:49] *------------------------------*
[12:58:49] Folding@Home Gromacs Core
[12:58:49] Version 1.90 (March 8, 2006)
[12:58:49] 
[12:58:49] Preparing to commence simulation
[12:58:49] - Looking at optimizations...
[12:58:49] - Created dyn
[12:58:49] - Files status OK
[12:58:49] - Expanded 750755 -> 3750157 (decompressed 499.5 percent)
[12:58:49] - Starting from initial work packet
[12:58:49] 
[12:58:49] Project: 6511 (Run 0, Clone 94, Gen 11)
[12:58:49] 
[12:58:49] Assembly optimizations on if available.
[12:58:49] Entering M.D.
[12:58:56] Protein: UBIQUITIN MODEL250 in water
[12:58:56] 
[12:58:56] Writing local files
[12:58:56] Extra SSE boost OK.
[12:58:56] Writing local files
[12:58:56] Completed 0 out of 250000 steps  (0%)
[13:06:29] Writing local files
[13:06:29] Completed 2500 out of 250000 steps  (1%)
[13:14:03] Writing local files
[13:14:03] Completed 5000 out of 250000 steps  (2%)
[13:21:36] Writing local files
[13:21:36] Completed 7500 out of 250000 steps  (3%)
[13:29:08] Writing local files
[13:29:08] Completed 10000 out of 250000 steps  (4%)
[13:36:39] Writing local files
[13:36:39] Completed 12500 out of 250000 steps  (5%)
[13:44:12] Writing local files
[13:44:12] Completed 15000 out of 250000 steps  (6%)
[13:51:45] Writing local files
[13:51:45] Completed 17500 out of 250000 steps  (7%)
[13:59:15] Writing local files
[13:59:15] Completed 20000 out of 250000 steps  (8%)
[14:06:51] Writing local files
[14:06:51] Completed 22500 out of 250000 steps  (9%)
[14:14:24] Writing local files
[14:14:24] Completed 25000 out of 250000 steps  (10%)
[14:21:56] Writing local files
[14:21:56] Completed 27500 out of 250000 steps  (11%)
[14:29:28] Writing local files
[14:29:28] Completed 30000 out of 250000 steps  (12%)
[14:37:00] Writing local files
[14:37:00] Completed 32500 out of 250000 steps  (13%)
[14:44:33] Writing local files
[14:44:33] Completed 35000 out of 250000 steps  (14%)
[14:52:05] Writing local files
[14:52:05] Completed 37500 out of 250000 steps  (15%)
[14:59:39] Writing local files
[14:59:39] Completed 40000 out of 250000 steps  (16%)
[15:07:14] Writing local files
[15:07:14] Completed 42500 out of 250000 steps  (17%)
[15:14:46] Writing local files
[15:14:46] Completed 45000 out of 250000 steps  (18%)
[15:22:18] Writing local files
[15:22:18] Completed 47500 out of 250000 steps  (19%)
[15:29:50] Writing local files
[15:29:50] Completed 50000 out of 250000 steps  (20%)
[15:37:22] Writing local files
[15:37:22] Completed 52500 out of 250000 steps  (21%)
[15:44:56] Writing local files
[15:44:56] Completed 55000 out of 250000 steps  (22%)
[15:52:28] Writing local files
[15:52:28] Completed 57500 out of 250000 steps  (23%)
[16:00:01] Writing local files
[16:00:01] Completed 60000 out of 250000 steps  (24%)
[16:07:35] Writing local files
[16:07:35] Completed 62500 out of 250000 steps  (25%)
[16:15:09] Writing local files
[16:15:09] Completed 65000 out of 250000 steps  (26%)
[16:20:01] CoreStatus = 0 (0)
[16:20:01] Client-core communications error: ERROR 0x0
[16:20:01] Deleting current work unit & continuing...
[16:20:18] Trying to send all finished work units
[16:20:18] + No unsent completed units remaining.
[16:20:18] - Preparing to get new work unit...
[16:20:18] + Attempting to get work packet
[16:20:18] - Connecting to assignment server
[16:20:18] Connecting to http://assign.stanford.edu:8080/
[16:20:19] Posted data.
[16:20:19] Initial: 40AB; - Successful: assigned to (171.64.65.111).
[16:20:19] + News From Folding@Home: Welcome to Folding@Home
[16:20:19] Loaded queue successfully.
[16:20:19] Connecting to http://171.64.65.111:8080/
[16:20:20] Posted data.
[16:20:20] Initial: 0000; - Receiving payload (expected size: 464525)
[16:20:22] - Downloaded at ~226 kB/s
[16:20:22] - Averaged speed for that direction ~298 kB/s
[16:20:22] + Received work.
[16:20:22] + Closed connections
[16:20:27] 
[16:20:27] + Processing work unit
[16:20:27] Core required: FahCore_78.exe
[16:20:27] Core found.
[16:20:27] Working on Unit 05 [July 13 16:20:27]
[16:20:27] + Working ...
[16:20:27] - Calling './FahCore_78.exe -dir work/ -suffix 05 -checkpoint 15 -verbose -lifeline 6108 -version 602'

[16:20:27] 
[16:20:27] *------------------------------*
[16:20:27] Folding@Home Gromacs Core
[16:20:27] Version 1.90 (March 8, 2006)
[16:20:27] 
[16:20:27] Preparing to commence simulation
[16:20:27] - Looking at optimizations...
[16:20:27] - Created dyn
[16:20:27] - Files status OK
[16:20:27] - Expanded 464013 -> 2244013 (decompressed 483.6 percent)
[16:20:27] - Starting from initial work packet
[16:20:27] 
[16:20:27] Project: 6316 (Run 43, Clone 1, Gen 72)
[16:20:27] 
[16:20:27] Assembly optimizations on if available.
[16:20:27] Entering M.D.
[16:20:33] Protein: p6316_sh3_with_ALA_frags
[16:20:33] 
[16:20:33] Writing local files
[16:20:33] Extra SSE boost OK.
[16:20:33] Writing local files
[16:20:33] Completed 0 out of 500000 steps  (0%)
[16:30:00] Writing local files
[16:30:00] Completed 5000 out of 500000 steps  (1%)
[16:39:26] Writing local files
[16:39:26] Completed 10000 out of 500000 steps  (2%)
[16:48:52] Writing local files
[16:48:52] Completed 15000 out of 500000 steps  (3%)
[16:58:18] Writing local files
[16:58:18] Completed 20000 out of 500000 steps  (4%)
[17:07:45] Writing local files
[17:07:45] Completed 25000 out of 500000 steps  (5%)
[17:17:12] Writing local files
[17:17:12] Completed 30000 out of 500000 steps  (6%)
[17:26:40] Writing local files
[17:26:40] Completed 35000 out of 500000 steps  (7%)
[17:36:07] Writing local files
[17:36:07] Completed 40000 out of 500000 steps  (8%)
[17:45:33] Writing local files
[17:45:33] Completed 45000 out of 500000 steps  (9%)
[17:54:59] Writing local files
[17:54:59] Completed 50000 out of 500000 steps  (10%)
[18:04:26] Writing local files
[18:04:26] Completed 55000 out of 500000 steps  (11%)
[18:13:54] Writing local files
[18:13:54] Completed 60000 out of 500000 steps  (12%)
[18:23:21] Writing local files
[18:23:21] Completed 65000 out of 500000 steps  (13%)
[18:32:46] Writing local files
[18:32:46] Completed 70000 out of 500000 steps  (14%)
[18:42:12] Writing local files
[18:42:12] Completed 75000 out of 500000 steps  (15%)
[18:51:36] Writing local files
[18:51:36] Completed 80000 out of 500000 steps  (16%)
[18:58:41] - Autosending finished units...
[18:58:41] Trying to send all finished work units
[18:58:41] + No unsent completed units remaining.
[18:58:41] - Autosend completed
[19:01:01] Writing local files
[19:01:01] Completed 85000 out of 500000 steps  (17%)

and here's the other one (CPU2):

Code: Select all

--- Opening Log file [July 13 12:58:41] 


# Linux Console Edition #######################################################
###############################################################################

                       Folding@Home Client Version 6.02

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /home/folding/foldingathome/CPU2
Executable: /home/folding/foldingathome/CPU2/fah6
Arguments: -verbosity 9 

[12:58:41] - Ask before connecting: No
[12:58:41] - User name: fredex (Team 48721)
[12:58:41] - User ID: 2DF7B59021CFC89F
[12:58:41] - Machine ID: 2
[12:58:41] 
[12:58:41] Loaded queue successfully.
[12:58:41] - Preparing to get new work unit...
[12:58:41] + Attempting to get work packet
[12:58:41] - Detect CPU. Vendor: AuthenticAMD, Family: 15, Model: 4, Stepping: 2
[12:58:41] - Connecting to assignment server
[12:58:41] Connecting to http://assign.stanford.edu:8080/
[12:58:41] - Autosending finished units...
[12:58:41] Trying to send all finished work units
[12:58:41] + No unsent completed units remaining.
[12:58:41] - Autosend completed
[12:58:42] Posted data.
[12:58:42] Initial: 40AB; - Successful: assigned to (171.64.65.62).
[12:58:42] + News From Folding@Home: Welcome to Folding@Home
[12:58:42] Loaded queue successfully.
[12:58:42] Connecting to http://171.64.65.62:8080/
[12:58:43] Posted data.
[12:58:43] Initial: 0000; - Receiving payload (expected size: 515872)
[12:58:47] - Downloaded at ~125 kB/s
[12:58:47] - Averaged speed for that direction ~268 kB/s
[12:58:47] + Received work.
[12:58:47] + Closed connections
[12:58:47] 
[12:58:47] + Processing work unit
[12:58:47] Core required: FahCore_78.exe
[12:58:47] Core found.
[12:58:47] Working on Unit 02 [July 13 12:58:47]
[12:58:47] + Working ...
[12:58:47] - Calling './FahCore_78.exe -dir work/ -suffix 02 -checkpoint 15 -verbose -lifeline 6122 -version 602'

[12:58:47] 
[12:58:47] *------------------------------*
[12:58:47] Folding@Home Gromacs Core
[12:58:47] Version 1.90 (March 8, 2006)
[12:58:47] 
[12:58:47] Preparing to commence simulation
[12:58:47] - Looking at optimizations...
[12:58:47] - Created dyn
[12:58:47] - Files status OK
[12:58:47] - Expanded 515360 -> 2531073 (decompressed 491.1 percent)
[12:58:47] - Starting from initial work packet
[12:58:47] 
[12:58:47] Project: 6503 (Run 3, Clone 189, Gen 41)
[12:58:47] 
[12:58:47] Assembly optimizations on if available.
[12:58:47] Entering M.D.
[12:58:53] Protein: TR462_B_4 in water
[12:58:53] 
[12:58:53] Writing local files
[12:58:53] Extra SSE boost OK.
[12:58:53] Writing local files
[12:58:53] Completed 0 out of 250000 steps  (0%)
[13:03:36] Writing local files
[13:03:36] Completed 2500 out of 250000 steps  (1%)
[13:08:21] Writing local files
[13:08:21] Completed 5000 out of 250000 steps  (2%)
[13:13:03] Writing local files
[13:13:03] Completed 7500 out of 250000 steps  (3%)
[13:17:45] Writing local files
[13:17:45] Completed 10000 out of 250000 steps  (4%)
[13:22:27] Writing local files
[13:22:27] Completed 12500 out of 250000 steps  (5%)
[13:27:10] Writing local files
[13:27:10] Completed 15000 out of 250000 steps  (6%)
[13:31:53] Writing local files
[13:31:53] Completed 17500 out of 250000 steps  (7%)
[13:36:36] Writing local files
[13:36:36] Completed 20000 out of 250000 steps  (8%)
[13:41:18] Writing local files
[13:41:18] Completed 22500 out of 250000 steps  (9%)
[13:46:01] Writing local files
[13:46:01] Completed 25000 out of 250000 steps  (10%)
[13:50:44] Writing local files
[13:50:44] Completed 27500 out of 250000 steps  (11%)
[13:55:28] Writing local files
[13:55:28] Completed 30000 out of 250000 steps  (12%)
[14:00:12] Writing local files
[14:00:12] Completed 32500 out of 250000 steps  (13%)
[14:04:54] Writing local files
[14:04:54] Completed 35000 out of 250000 steps  (14%)
[14:09:37] Writing local files
[14:09:37] Completed 37500 out of 250000 steps  (15%)
[14:14:20] Writing local files
[14:14:20] Completed 40000 out of 250000 steps  (16%)
[14:19:03] Writing local files
[14:19:03] Completed 42500 out of 250000 steps  (17%)
[14:23:46] Writing local files
[14:23:46] Completed 45000 out of 250000 steps  (18%)
[14:28:29] Writing local files
[14:28:29] Completed 47500 out of 250000 steps  (19%)
[14:33:12] Writing local files
[14:33:12] Completed 50000 out of 250000 steps  (20%)
[14:37:55] Writing local files
[14:37:55] Completed 52500 out of 250000 steps  (21%)
[14:42:38] Writing local files
[14:42:38] Completed 55000 out of 250000 steps  (22%)
[14:47:21] Writing local files
[14:47:21] Completed 57500 out of 250000 steps  (23%)
[14:52:04] Writing local files
[14:52:04] Completed 60000 out of 250000 steps  (24%)
[14:56:46] Writing local files
[14:56:46] Completed 62500 out of 250000 steps  (25%)
[15:01:29] Writing local files
[15:01:29] Completed 65000 out of 250000 steps  (26%)
[15:06:12] Writing local files
[15:06:12] Completed 67500 out of 250000 steps  (27%)
[15:10:55] Writing local files
[15:10:55] Completed 70000 out of 250000 steps  (28%)
[15:15:38] Writing local files
[15:15:38] Completed 72500 out of 250000 steps  (29%)
[15:20:22] Writing local files
[15:20:22] Completed 75000 out of 250000 steps  (30%)
[15:25:05] Writing local files
[15:25:05] Completed 77500 out of 250000 steps  (31%)
[15:29:49] Writing local files
[15:29:49] Completed 80000 out of 250000 steps  (32%)
[15:34:33] Writing local files
[15:34:33] Completed 82500 out of 250000 steps  (33%)
[15:39:15] Writing local files
[15:39:15] Completed 85000 out of 250000 steps  (34%)
[15:43:57] Writing local files
[15:43:58] Completed 87500 out of 250000 steps  (35%)
[15:48:41] Writing local files
[15:48:41] Completed 90000 out of 250000 steps  (36%)
[15:53:24] Writing local files
[15:53:24] Completed 92500 out of 250000 steps  (37%)
[15:58:07] Writing local files
[15:58:07] Completed 95000 out of 250000 steps  (38%)
[16:02:51] Writing local files
[16:02:51] Completed 97500 out of 250000 steps  (39%)
[16:07:35] Writing local files
[16:07:35] Completed 100000 out of 250000 steps  (40%)
[16:12:18] Writing local files
[16:12:18] Completed 102500 out of 250000 steps  (41%)
[16:17:01] Writing local files
[16:17:01] Completed 105000 out of 250000 steps  (42%)
[16:21:44] Writing local files
[16:21:44] Completed 107500 out of 250000 steps  (43%)
[16:26:25] Writing local files
[16:26:25] Completed 110000 out of 250000 steps  (44%)
[16:31:08] Writing local files
[16:31:08] Completed 112500 out of 250000 steps  (45%)
[16:35:51] Writing local files
[16:35:51] Completed 115000 out of 250000 steps  (46%)
[16:40:34] Writing local files
[16:40:34] Completed 117500 out of 250000 steps  (47%)
[16:45:17] Writing local files
[16:45:17] Completed 120000 out of 250000 steps  (48%)
[16:49:59] Writing local files
[16:49:59] Completed 122500 out of 250000 steps  (49%)
[16:54:42] Writing local files
[16:54:42] Completed 125000 out of 250000 steps  (50%)
[16:59:25] Writing local files
[16:59:25] Completed 127500 out of 250000 steps  (51%)
[17:04:07] Writing local files
[17:04:07] Completed 130000 out of 250000 steps  (52%)
[17:08:51] Writing local files
[17:08:51] Completed 132500 out of 250000 steps  (53%)
[17:13:33] Writing local files
[17:13:33] Completed 135000 out of 250000 steps  (54%)
[17:18:14] Writing local files
[17:18:14] Completed 137500 out of 250000 steps  (55%)
[17:22:57] Writing local files
[17:22:57] Completed 140000 out of 250000 steps  (56%)
[17:27:38] Writing local files
[17:27:38] Completed 142500 out of 250000 steps  (57%)
[17:32:19] Writing local files
[17:32:20] Completed 145000 out of 250000 steps  (58%)
[17:37:02] Writing local files
[17:37:02] Completed 147500 out of 250000 steps  (59%)
[17:41:44] Writing local files
[17:41:44] Completed 150000 out of 250000 steps  (60%)
[17:46:27] Writing local files
[17:46:27] Completed 152500 out of 250000 steps  (61%)
[17:51:09] Writing local files
[17:51:09] Completed 155000 out of 250000 steps  (62%)
[17:53:08] CoreStatus = 0 (0)
[17:53:08] Client-core communications error: ERROR 0x0
[17:53:08] Deleting current work unit & continuing...
[17:53:26] Trying to send all finished work units
[17:53:26] + No unsent completed units remaining.
[17:53:26] - Preparing to get new work unit...
[17:53:26] + Attempting to get work packet
[17:53:26] - Connecting to assignment server
[17:53:26] Connecting to http://assign.stanford.edu:8080/
[17:53:26] Posted data.
[17:53:26] Initial: 40AB; - Successful: assigned to (171.64.65.62).
[17:53:26] + News From Folding@Home: Welcome to Folding@Home
[17:53:26] Loaded queue successfully.
[17:53:26] Connecting to http://171.64.65.62:8080/
[17:53:27] Posted data.
[17:53:27] Initial: 0000; - Receiving payload (expected size: 515872)
[17:53:34] - Downloaded at ~71 kB/s
[17:53:34] - Averaged speed for that direction ~229 kB/s
[17:53:34] + Received work.
[17:53:34] + Closed connections
[17:53:39] 
[17:53:39] + Processing work unit
[17:53:39] Core required: FahCore_78.exe
[17:53:39] Core found.
[17:53:39] Working on Unit 03 [July 13 17:53:39]
[17:53:39] + Working ...
[17:53:39] - Calling './FahCore_78.exe -dir work/ -suffix 03 -checkpoint 15 -verbose -lifeline 6122 -version 602'

[17:53:39] 
[17:53:39] *------------------------------*
[17:53:39] Folding@Home Gromacs Core
[17:53:39] Version 1.90 (March 8, 2006)
[17:53:39] 
[17:53:39] Preparing to commence simulation
[17:53:39] - Looking at optimizations...
[17:53:39] - Created dyn
[17:53:39] - Files status OK
[17:53:39] - Expanded 515360 -> 2531073 (decompressed 491.1 percent)
[17:53:39] - Starting from initial work packet
[17:53:39] 
[17:53:39] Project: 6503 (Run 3, Clone 189, Gen 41)
[17:53:39] 
[17:53:39] Assembly optimizations on if available.
[17:53:39] Entering M.D.
[17:53:45] Protein: TR462_B_4 in water
[17:53:45] 
[17:53:45] Writing local files
[17:53:45] Extra SSE boost OK.
[17:53:45] Writing local files
[17:53:45] Completed 0 out of 250000 steps  (0%)
[17:58:27] Writing local files
[17:58:28] Completed 2500 out of 250000 steps  (1%)
[18:03:10] Writing local files
[18:03:10] Completed 5000 out of 250000 steps  (2%)
[18:07:54] Writing local files
[18:07:54] Completed 7500 out of 250000 steps  (3%)
[18:12:34] Writing local files
[18:12:34] Completed 10000 out of 250000 steps  (4%)
[18:17:17] Writing local files
[18:17:17] Completed 12500 out of 250000 steps  (5%)
[18:21:58] Writing local files
[18:21:58] Completed 15000 out of 250000 steps  (6%)
[18:26:40] Writing local files
[18:26:40] Completed 17500 out of 250000 steps  (7%)
[18:31:22] Writing local files
[18:31:22] Completed 20000 out of 250000 steps  (8%)
[18:36:04] Writing local files
[18:36:04] Completed 22500 out of 250000 steps  (9%)
[18:40:47] Writing local files
[18:40:47] Completed 25000 out of 250000 steps  (10%)
[18:45:30] Writing local files
[18:45:30] Completed 27500 out of 250000 steps  (11%)
[18:50:14] Writing local files
[18:50:14] Completed 30000 out of 250000 steps  (12%)
[18:54:58] Writing local files
[18:54:58] Completed 32500 out of 250000 steps  (13%)
[18:58:41] - Autosending finished units...
[18:58:41] Trying to send all finished work units
[18:58:41] + No unsent completed units remaining.
[18:58:41] - Autosend completed
[18:59:42] Writing local files
[18:59:42] Completed 35000 out of 250000 steps  (14%)
[19:04:25] Writing local files
[19:04:25] Completed 37500 out of 250000 steps  (15%)
sortofageek
Site Admin
Posts: 3110
Joined: Fri Nov 30, 2007 8:06 pm
Location: Team Helix
Contact:

Re: bad Work Units 6503 (3, 189, 41) and 6511 (0, 94, 11)

Post by sortofageek »

I'm not sure what he means, exactly: does he mean that no one has successfully returned one of those WUs yet ...
Yes, that is what he was saying, and also that there were no signs of anyone returning partial WUs either. If the WUs have been assigned to someone else and are not causing problems for them, it will take a little time to learn whether they complete successfully.

Edit: I just checked again and there is still no data back on either of those work units.
whynot
Posts: 91
Joined: Wed Mar 26, 2008 9:02 pm
Location: Kyiv, Ukraine

Re: bad Work Units 6503 (3, 189, 41) and 6511 (0, 94, 11)

Post by whynot »

fredex wrote:
I haven't run any diagnostics lately, no. I can take it down and run memtest86+ for a while. But since everything else runs fine (and I usually have uptimes of 30-60 days whenever I get a kernel update causing me to reboot) I'd think it's probably not some hardware issue. but it could be: stranger things have happened.
Have a look in your /var/log/syslog around that time. I think you would find something like this:

Code: Select all

Jul 17 08:07:14 carpet kernel: [3522967.506736] FahCore_78.exe[13126]: segfault at e629da00 ip 08087a2f sp bf3fe3dc error 5 in FahCore_78.exe[8048000+322000]
--
I'm counting for science.
Points just make me sick.
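
For anyone who finds a kernel segfault line like the one quoted in the post above, the fields can be pulled apart mechanically. This is a rough Python sketch under stated assumptions: `parse_segfault` and `decode_error` are made-up helper names, the regular expression is written against that one quoted line, and the bit meanings are the low three bits of the x86 page-fault error code (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode). The error value is treated as hexadecimal, which is how the x86 kernel prints it.

```python
import re

# Written against the single syslog line quoted in this thread.
SEGFAULT_RE = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<image>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def decode_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "cause": "protection violation" if code & 1 else "page not present",
        "access": "write" if code & 2 else "read",
        "mode": "user" if code & 4 else "kernel",
    }


def parse_segfault(line):
    """Return the parsed fields of a kernel segfault line, or None."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    info = {
        "proc": m.group("proc"),
        "pid": int(m.group("pid")),
        "addr": int(m.group("addr"), 16),
        "ip": int(m.group("ip"), 16),
        "error": decode_error(int(m.group("err"), 16)),
    }
    if m.group("base"):
        # Offset of the faulting instruction inside the mapped executable.
        info["offset"] = info["ip"] - int(m.group("base"), 16)
    return info


sample_line = (
    "Jul 17 08:07:14 carpet kernel: [3522967.506736] FahCore_78.exe[13126]: "
    "segfault at e629da00 ip 08087a2f sp bf3fe3dc error 5 "
    "in FahCore_78.exe[8048000+322000]"
)
info = parse_segfault(sample_line)
print(info["proc"], info["pid"], hex(info["offset"]), info["error"])
```

If the same instruction offset shows up across failures on different WUs, that would point toward the core or its input rather than random memory corruption, though the sketch itself cannot tell those apart.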
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: bad Work Units 6503 (3, 189, 41) and 6511 (0, 94, 11)

Post by bruce »

Hi fredex (team 48721),
Your WU (P6503 R3 C189 G41) was added to the stats database on 2010-07-17 03:06:59 for 75 points of credit.
fredex
Posts: 48
Joined: Thu Apr 01, 2010 1:17 am
Location: stoneham, ma, us

Re: bad Work Units 6503 (3, 189, 41) and 6511 (0, 94, 11)

Post by fredex »

Bruce:

Thanks for the info!

I continue to have what appears to be the same problem with several other WUs. So far I've not found any problems here.

I've taken the system down and run Memtest86+ through two full iterations (around an hour and a half), letting it run all its standard tests. Nothing.

I down-clocked the memory a little... When I built this system almost a year ago I bought RAM rated at 1066 DDR. For some reason the BIOS wanted to set it at 800, so I manually tweaked the BIOS settings to 1066 and ran Memtest86+ for several hours, and it tested fine.

But I continue to have these failures in FAH: various projects, various WUs, but all with the same 0x0 error.

Some WUs process successfully, many don't. It's frustrating.

here's a log entry from one that failed just this evening:

Code: Select all

[17:58:43] Initial: 0000; - Receiving payload (expected size: 518483)
[17:58:44] - Downloaded at ~506 kB/s
[17:58:44] - Averaged speed for that direction ~424 kB/s
[17:58:44] + Received work.
[17:58:44] + Closed connections
[17:58:49] 
[17:58:49] + Processing work unit
[17:58:49] Core required: FahCore_78.exe
[17:58:49] Core found.
[17:58:49] Working on Unit 09 [July 23 17:58:49]
[17:58:49] + Working ...
[17:58:49] - Calling './FahCore_78.exe -dir work/ -suffix 09 -checkpoint 15 -verbose -lifeline 6594 -version 602'

[17:58:49] 
[17:58:49] *------------------------------*
[17:58:49] Folding@Home Gromacs Core
[17:58:49] Version 1.90 (March 8, 2006)
[17:58:49] 
[17:58:49] Preparing to commence simulation
[17:58:49] - Looking at optimizations...
[17:58:49] - Created dyn
[17:58:49] - Files status OK
[17:58:50] - Expanded 517971 -> 2533901 (decompressed 489.1 percent)
[17:58:50] - Starting from initial work packet
[17:58:50] 
[17:58:50] Project: 6513 (Run 6, Clone 43, Gen 18)
[17:58:50] 
[17:58:50] Assembly optimizations on if available.
[17:58:50] Entering M.D.
[17:58:56] Protein: TR462_B_7 in water
[17:58:56] 
[17:58:56] Writing local files
[17:58:56] Extra SSE boost OK.
[17:58:56] Writing local files
[17:58:56] Completed 0 out of 250000 steps  (0%)
[18:03:38] Writing local files
[18:03:38] Completed 2500 out of 250000 steps  (1%)
[18:08:21] Writing local files
[18:08:21] Completed 5000 out of 250000 steps  (2%)
[18:13:04] Writing local files
[18:13:04] Completed 7500 out of 250000 steps  (3%)
[18:17:47] Writing local files
[18:17:47] Completed 10000 out of 250000 steps  (4%)
[18:22:29] Writing local files
[18:22:29] Completed 12500 out of 250000 steps  (5%)
[18:27:12] Writing local files
[18:27:13] Completed 15000 out of 250000 steps  (6%)
[18:31:55] Writing local files
[18:31:55] Completed 17500 out of 250000 steps  (7%)
[18:36:38] Writing local files
[18:36:38] Completed 20000 out of 250000 steps  (8%)
[18:41:21] Writing local files
[18:41:21] Completed 22500 out of 250000 steps  (9%)
[18:46:04] Writing local files
[18:46:04] Completed 25000 out of 250000 steps  (10%)
[18:50:47] Writing local files
[18:50:47] Completed 27500 out of 250000 steps  (11%)
[18:55:31] Writing local files
[18:55:31] Completed 30000 out of 250000 steps  (12%)
[19:00:14] Writing local files
[19:00:15] Completed 32500 out of 250000 steps  (13%)
[19:04:57] Writing local files
[19:04:57] Completed 35000 out of 250000 steps  (14%)
[19:09:41] Writing local files
[19:09:41] Completed 37500 out of 250000 steps  (15%)
[19:14:24] Writing local files
[19:14:24] Completed 40000 out of 250000 steps  (16%)
[19:19:06] Writing local files
[19:19:07] Completed 42500 out of 250000 steps  (17%)
[19:23:50] Writing local files
[19:23:50] Completed 45000 out of 250000 steps  (18%)
[19:28:33] Writing local files
[19:28:33] Completed 47500 out of 250000 steps  (19%)
[19:33:15] Writing local files
[19:33:15] Completed 50000 out of 250000 steps  (20%)
[19:37:58] Writing local files
[19:37:58] Completed 52500 out of 250000 steps  (21%)
[19:42:40] Writing local files
[19:42:40] Completed 55000 out of 250000 steps  (22%)
[19:47:22] Writing local files
[19:47:22] Completed 57500 out of 250000 steps  (23%)
[19:52:05] Writing local files
[19:52:05] Completed 60000 out of 250000 steps  (24%)
[19:56:48] Writing local files
[19:56:48] Completed 62500 out of 250000 steps  (25%)
[20:01:31] Writing local files
[20:01:31] Completed 65000 out of 250000 steps  (26%)
[20:06:14] Writing local files
[20:06:14] Completed 67500 out of 250000 steps  (27%)
[20:10:56] Writing local files
[20:10:56] Completed 70000 out of 250000 steps  (28%)
[20:15:39] Writing local files
[20:15:39] Completed 72500 out of 250000 steps  (29%)
[20:20:22] Writing local files
[20:20:22] Completed 75000 out of 250000 steps  (30%)
[20:25:05] Writing local files
[20:25:05] Completed 77500 out of 250000 steps  (31%)
[20:29:48] Writing local files
[20:29:48] Completed 80000 out of 250000 steps  (32%)
[20:34:30] Writing local files
[20:34:30] Completed 82500 out of 250000 steps  (33%)
[20:39:13] Writing local files
[20:39:13] Completed 85000 out of 250000 steps  (34%)
[20:43:55] Writing local files
[20:43:55] Completed 87500 out of 250000 steps  (35%)
[20:48:38] Writing local files
[20:48:38] Completed 90000 out of 250000 steps  (36%)
[20:50:45] - Autosending finished units...
[20:50:45] Trying to send all finished work units
[20:50:45] + No unsent completed units remaining.
[20:50:45] - Autosend completed
[20:53:23] Writing local files
[20:53:23] Completed 92500 out of 250000 steps  (37%)
[20:58:05] Writing local files
[20:58:05] Completed 95000 out of 250000 steps  (38%)
[21:02:48] Writing local files
[21:02:48] Completed 97500 out of 250000 steps  (39%)
[21:07:31] Writing local files
[21:07:31] Completed 100000 out of 250000 steps  (40%)
[21:12:14] Writing local files
[21:12:14] Completed 102500 out of 250000 steps  (41%)
[21:16:57] Writing local files
[21:16:57] Completed 105000 out of 250000 steps  (42%)
[21:21:40] Writing local files
[21:21:40] Completed 107500 out of 250000 steps  (43%)
[21:26:22] Writing local files
[21:26:22] Completed 110000 out of 250000 steps  (44%)
[21:31:06] Writing local files
[21:31:06] Completed 112500 out of 250000 steps  (45%)
[21:35:48] Writing local files
[21:35:48] Completed 115000 out of 250000 steps  (46%)
[21:40:31] Writing local files
[21:40:31] Completed 117500 out of 250000 steps  (47%)
[21:43:32] CoreStatus = 0 (0)
[21:43:32] Client-core communications error: ERROR 0x0
[21:43:32] Deleting current work unit & continuing...
[21:43:50] Trying to send all finished work units
[21:43:50] + No unsent completed units remaining.
[21:43:50] - Preparing to get new work unit...
[21:43:50] + Attempting to get work packet
[21:43:50] - Connecting to assignment server
[21:43:50] Connecting to http://assign.stanford.edu:8080/
[21:43:50] Posted data.
[21:43:50] Initial: 40AB; - Successful: assigned to (171.64.65.111).
[21:43:50] + News From Folding@Home: Welcome to Folding@Home
[21:43:50] Loaded queue successfully.
[21:43:50] Connecting to http://171.64.65.111:8080/
[21:43:51] Posted data.
[21:43:51] Initial: 0000; - Receiving payload (expected size: 465035)
[21:43:52] - Downloaded at ~454 kB/s
[21:43:52] - Averaged speed for that direction ~430 kB/s
[21:43:52] + Received work.
[21:43:52] + Closed connections
That same WU has failed several times in a row, at the same point (based on the % printouts) and with the same error code.
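
The percentage printouts only resolve the failure point to the nearest 1%. A finer estimate can be interpolated from the timestamps, as in this small Python sketch. It assumes the step rate stays constant between progress lines and that the error line is logged close to when the core actually stopped, neither of which the log guarantees. The function names are illustrative; the numbers come from the Project 6513 log above.

```python
from datetime import datetime, timedelta


def seconds_between(start, end):
    """Seconds from start to end, both 'HH:MM:SS', allowing one midnight wrap."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    if delta < timedelta(0):
        delta += timedelta(days=1)
    return delta.total_seconds()


def estimate_failure_step(prev_frame, last_frame, failure_time):
    """Linearly extrapolate the step count reached when the error was logged.

    prev_frame and last_frame are (timestamp, completed_steps) pairs taken
    from the last two 'Completed N out of M steps' lines before the error.
    """
    (t_prev, step_prev), (t_last, step_last) = prev_frame, last_frame
    steps_per_second = (step_last - step_prev) / seconds_between(t_prev, t_last)
    return step_last + steps_per_second * seconds_between(t_last, failure_time)


# Last two progress lines and the error line from the Project 6513 log.
estimate = estimate_failure_step(
    ("21:35:48", 115000), ("21:40:31", 117500), "21:43:32"
)
print(f"estimated failure step: {estimate:.0f}")
```

Comparing this estimate across retries of the same WU gives a somewhat sharper check of "fails at the same point" than the percentages alone.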

I've got two FAH clients running (it's a dual-core Phenom II) and both hit the same kinds of errors on various projects. Here's another log entry from today from the OTHER client:

Code: Select all

[19:21:05] Initial: 0000; - Receiving payload (expected size: 519161)
[19:21:06] - Downloaded at ~506 kB/s
[19:21:06] - Averaged speed for that direction ~432 kB/s
[19:21:06] + Received work.
[19:21:06] + Closed connections
[19:21:11] 
[19:21:11] + Processing work unit
[19:21:11] Core required: FahCore_78.exe
[19:21:11] Core found.
[19:21:11] Working on Unit 06 [July 23 19:21:11]
[19:21:11] + Working ...
[19:21:11] - Calling './FahCore_78.exe -dir work/ -suffix 06 -checkpoint 15 -verbose -lifeline 6549 -version 602'

[19:21:11] 
[19:21:11] *------------------------------*
[19:21:11] Folding@Home Gromacs Core
[19:21:11] Version 1.90 (March 8, 2006)
[19:21:11] 
[19:21:11] Preparing to commence simulation
[19:21:11] - Looking at optimizations...
[19:21:11] - Created dyn
[19:21:11] - Files status OK
[19:21:12] - Expanded 518649 -> 2533093 (decompressed 488.4 percent)
[19:21:12] - Starting from initial work packet
[19:21:12] 
[19:21:12] Project: 6503 (Run 17, Clone 92, Gen 56)
[19:21:12] 
[19:21:12] Assembly optimizations on if available.
[19:21:12] Entering M.D.
[19:21:18] Protein: TR462_B_18 in water
[19:21:18] 
[19:21:18] Writing local files
[19:21:18] Extra SSE boost OK.
[19:21:18] Writing local files
[19:21:18] Completed 0 out of 250000 steps  (0%)
[19:26:00] Writing local files
[19:26:00] Completed 2500 out of 250000 steps  (1%)
[19:30:43] Writing local files
[19:30:43] Completed 5000 out of 250000 steps  (2%)
[19:35:26] Writing local files
[19:35:26] Completed 7500 out of 250000 steps  (3%)
[19:40:08] Writing local files
[19:40:08] Completed 10000 out of 250000 steps  (4%)
[19:44:51] Writing local files
[19:44:51] Completed 12500 out of 250000 steps  (5%)
[19:49:34] Writing local files
[19:49:34] Completed 15000 out of 250000 steps  (6%)
[19:54:21] Writing local files
[19:54:21] Completed 17500 out of 250000 steps  (7%)
[19:59:02] Writing local files
[19:59:02] Completed 20000 out of 250000 steps  (8%)
[20:03:45] Writing local files
[20:03:45] Completed 22500 out of 250000 steps  (9%)
[20:08:27] Writing local files
[20:08:27] Completed 25000 out of 250000 steps  (10%)
[20:13:09] Writing local files
[20:13:09] Completed 27500 out of 250000 steps  (11%)
[20:17:52] Writing local files
[20:17:52] Completed 30000 out of 250000 steps  (12%)
[20:22:33] Writing local files
[20:22:33] Completed 32500 out of 250000 steps  (13%)
[20:27:15] Writing local files
[20:27:15] Completed 35000 out of 250000 steps  (14%)
[20:31:57] Writing local files
[20:31:57] Completed 37500 out of 250000 steps  (15%)
[20:36:40] Writing local files
[20:36:40] Completed 40000 out of 250000 steps  (16%)
[20:41:22] Writing local files
[20:41:22] Completed 42500 out of 250000 steps  (17%)
[20:46:05] Writing local files
[20:46:05] Completed 45000 out of 250000 steps  (18%)
[20:50:45] - Autosending finished units...
[20:50:45] Trying to send all finished work units
[20:50:45] + No unsent completed units remaining.
[20:50:45] - Autosend completed
[20:50:48] Writing local files
[20:50:48] Completed 47500 out of 250000 steps  (19%)
[20:55:30] Writing local files
[20:55:30] Completed 50000 out of 250000 steps  (20%)
[21:00:12] Writing local files
[21:00:13] Completed 52500 out of 250000 steps  (21%)
[21:04:55] Writing local files
[21:04:55] Completed 55000 out of 250000 steps  (22%)
[21:09:37] Writing local files
[21:09:37] Completed 57500 out of 250000 steps  (23%)
[21:14:19] Writing local files
[21:14:19] Completed 60000 out of 250000 steps  (24%)
[21:19:02] Writing local files
[21:19:02] Completed 62500 out of 250000 steps  (25%)
[21:23:44] Writing local files
[21:23:44] Completed 65000 out of 250000 steps  (26%)
[21:28:26] Writing local files
[21:28:26] Completed 67500 out of 250000 steps  (27%)
[21:33:09] Writing local files
[21:33:09] Completed 70000 out of 250000 steps  (28%)
[21:37:52] Writing local files
[21:37:52] Completed 72500 out of 250000 steps  (29%)
[21:42:35] Writing local files
[21:42:35] Completed 75000 out of 250000 steps  (30%)
[21:47:18] Writing local files
[21:47:18] Completed 77500 out of 250000 steps  (31%)
[21:52:03] Writing local files
[21:52:03] Completed 80000 out of 250000 steps  (32%)
[21:56:46] Writing local files
[21:56:46] Completed 82500 out of 250000 steps  (33%)
[22:01:28] Writing local files
[22:01:28] Completed 85000 out of 250000 steps  (34%)
[22:06:11] Writing local files
[22:06:11] Completed 87500 out of 250000 steps  (35%)
[22:10:54] Writing local files
[22:10:54] Completed 90000 out of 250000 steps  (36%)
[22:15:38] Writing local files
[22:15:38] Completed 92500 out of 250000 steps  (37%)
[22:20:22] Writing local files
[22:20:22] Completed 95000 out of 250000 steps  (38%)
[22:25:06] Writing local files
[22:25:06] Completed 97500 out of 250000 steps  (39%)
[22:29:49] Writing local files
[22:29:49] Completed 100000 out of 250000 steps  (40%)
[22:34:32] Writing local files
[22:34:32] Completed 102500 out of 250000 steps  (41%)
[22:39:15] Writing local files
[22:39:15] Completed 105000 out of 250000 steps  (42%)
[22:43:58] Writing local files
[22:43:58] Completed 107500 out of 250000 steps  (43%)
[22:48:41] Writing local files
[22:48:41] Completed 110000 out of 250000 steps  (44%)
[22:53:26] Writing local files
[22:53:26] Completed 112500 out of 250000 steps  (45%)
[22:58:09] Writing local files
[22:58:09] Completed 115000 out of 250000 steps  (46%)
[23:02:52] Writing local files
[23:02:52] Completed 117500 out of 250000 steps  (47%)
[23:07:35] Writing local files
[23:07:35] Completed 120000 out of 250000 steps  (48%)
[23:12:18] Writing local files
[23:12:18] Completed 122500 out of 250000 steps  (49%)
[23:17:01] Writing local files
[23:17:01] Completed 125000 out of 250000 steps  (50%)
[23:21:44] Writing local files
[23:21:44] Completed 127500 out of 250000 steps  (51%)
[23:26:27] Writing local files
[23:26:27] Completed 130000 out of 250000 steps  (52%)
[23:31:11] Writing local files
[23:31:11] Completed 132500 out of 250000 steps  (53%)
[23:35:54] Writing local files
[23:35:54] Completed 135000 out of 250000 steps  (54%)
[23:40:37] Writing local files
[23:40:37] Completed 137500 out of 250000 steps  (55%)
[23:45:21] Writing local files
[23:45:21] Completed 140000 out of 250000 steps  (56%)
[23:50:04] Writing local files
[23:50:04] Completed 142500 out of 250000 steps  (57%)
[23:54:46] Writing local files
[23:54:46] Completed 145000 out of 250000 steps  (58%)
[23:58:11] CoreStatus = 0 (0)
[23:58:11] Client-core communications error: ERROR 0x0
[23:58:11] Deleting current work unit & continuing...
[23:58:28] Trying to send all finished work units
[23:58:28] + No unsent completed units remaining.
[23:58:28] - Preparing to get new work unit...
[23:58:28] + Attempting to get work packet
[23:58:28] - Connecting to assignment server
[23:58:28] Connecting to http://assign.stanford.edu:8080/
[23:58:29] Posted data.
[23:58:29] Initial: 40AB; - Successful: assigned to (171.64.65.62).
[23:58:29] + News From Folding@Home: Welcome to Folding@Home
[23:58:29] Loaded queue successfully.
[23:58:29] Connecting to http://171.64.65.62:8080/
[23:58:30] Posted data.
[23:58:30] Initial: 0000; - Receiving payload (expected size: 519161)
[23:58:32] - Downloaded at ~253 kB/s
[23:58:32] - Averaged speed for that direction ~396 kB/s
[23:58:32] + Received work.
[23:58:32] + Closed connections
"whynot" suggested I'd find segfaults listed in my "syslog" (I assume he means /var/log/messages) at the same time as these FAH failures occur, but I don't. I see nothing at all like he suggests.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: bad Work Units 6503 (3, 189, 41) and 6511 (0, 94, 11)

Post by bruce »

Nobody has uploaded a result from either Project: 6511 (Run 0, Clone 94, Gen 11) or Project: 6513 (Run 6, Clone 43, Gen 18) yet.
John_Weatherman
Posts: 289
Joined: Sun Dec 02, 2007 4:31 am
Location: Carrizo Plain National Monument, California

Re: bad Work Units 6503 (3, 189, 41) and 6511 (0, 94, 11)

Post by John_Weatherman »

"that same WU has failed several times in a row, at the same point (based on the % printouts) and with the same error code." sounds like a bad WU

"I've got two FAH clients running (it's a dual core Phenom II) and both take the same kinds of errors on various projects." sounds like a problem with your machine.

"I down-clocked the memory a little... when I built this system almost a year ago I bought RAM rated at 1066 DDR. for some reason the BIOS wanted to set it at 800, so I manually tweaked the BIOS settings to 1066 and ran Memtest86+ for several hours and it tested fine."
This sounds like the source of your problem. What are the details of your machine, and have you tried swapping around the memory sticks, checking for a BIOS update, or getting some extra RAM to test the machine?
fredex
Posts: 48
Joined: Thu Apr 01, 2010 1:17 am
Location: stoneham, ma, us

Re: bad Work Units 6503 (3, 189, 41) and 6511 (0, 94, 11)

Post by fredex »

"I down-clocked the memory a little... when I built this system almost a year ago I bought RAM rated at 1066 DDR. for some reason the BIOS wanted to set it at 800, so I manually tweaked the BIOS settings to 1066 and ran Memtest86+ for several hours and it tested fine."
This sounds like the source of your problem. What are the details of your machine, and have you tried swapping around the memory sticks, checking for a BIOS update, or getting some extra RAM to test the machine?
But you'll note that I recently put it back to what the MB thinks is its native speed, and it doesn't help.

Two 2-gig sticks. I could pull them one at a time, but haven't yet. No, no extra RAM that will fit this machine.

It's a Gigabyte MA770-UD3 (AM2+/AM3) with an AMD Phenom II X2 CPU, on the latest BIOS as of the last time I checked the Gigabyte site. The RAM is G.Skill DDR2-1066 CL5-5-5-15 (F2-8500CL5D). The power supply is a 500W PC Power and Cooling unit. 500 watts should be plenty: my Kill A Watt meter reports the entire machine draws only a couple hundred watts even when "cranking".

The failed WUs have started only in the last 1-2 months.