Project: 7507 (Run 0, Clone 160, Gen 1)
Posted: Sun Nov 20, 2011 8:52 pm
by Pick2
Instant Client-core communications error: ERROR 0x8b
Stable machine, stock clocks, Linux Debian Squeeze server
Code:
[20:45:15] + Results successfully sent
[20:45:15] Thank you for your contribution to Folding@Home.
[20:45:15] + Number of Units Completed: 880
[20:45:16] - Preparing to get new work unit...
[20:45:16] Cleaning up work directory
[20:45:16] + Attempting to get work packet
[20:45:16] Passkey found
[20:45:16] - Connecting to assignment server
[20:45:16] - Successful: assigned to (128.143.199.97).
[20:45:16] + News From Folding@Home: Welcome to Folding@Home
[20:45:16] Loaded queue successfully.
[20:45:20] + Closed connections
[20:45:20]
[20:45:20] + Processing work unit
[20:45:20] Core required: FahCore_a3.exe
[20:45:20] Core found.
[20:45:20] Working on queue slot 07 [November 20 20:45:20 UTC]
[20:45:20] + Working ...
[20:45:21]
[20:45:21] *------------------------------*
[20:45:21] Folding@Home Gromacs SMP Core
[20:45:21] Version 2.27 (Dec. 15, 2010)
[20:45:21]
[20:45:21] Preparing to commence simulation
[20:45:21] - Looking at optimizations...
[20:45:21] - Created dyn
[20:45:21] - Files status OK
[20:45:21] - Expanded 2166157 -> 3127236 (decompressed 144.3 percent)
[20:45:21] Called DecompressByteArray: compressed_data_size=2166157 data_size=3127236, decompressed_data_size=3127236 diff=0
[20:45:21] - Digital signature verified
[20:45:21]
[20:45:21] Project: 7507 (Run 0, Clone 160, Gen 1)
[20:45:21]
[20:45:21] Assembly optimizations on if available.
[20:45:21] Entering M.D.
[20:45:27] Mapping NT from 6 to 6
[20:45:27] Completed 0 out of 500000 steps (0%)
[20:45:28] CoreStatus = 8B (139)
[20:45:28] Client-core communications error: ERROR 0x8b
[20:45:28] Deleting current work unit & continuing...
[20:45:38] - Preparing to get new work unit...
[20:45:38] Cleaning up work directory
[20:45:38] + Attempting to get work packet
[20:45:38] Passkey found
[20:45:38] - Connecting to assignment server
[20:45:39] - Successful: assigned to (128.143.199.97).
[20:45:39] + News From Folding@Home: Welcome to Folding@Home
[20:45:39] Loaded queue successfully.
[20:45:42] + Closed connections
[20:45:47]
[20:45:47] + Processing work unit
[20:45:47] Core required: FahCore_a3.exe
[20:45:47] Core found.
[20:45:47] Working on queue slot 08 [November 20 20:45:47 UTC]
[20:45:47] + Working ...
[20:45:47]
[20:45:47] *------------------------------*
[20:45:47] Folding@Home Gromacs SMP Core
[20:45:47] Version 2.27 (Dec. 15, 2010)
[20:45:47]
[20:45:47] Preparing to commence simulation
[20:45:47] - Looking at optimizations...
[20:45:47] - Created dyn
[20:45:47] - Files status OK
[20:45:47] - Expanded 1253014 -> 2077012 (decompressed 165.7 percent)
[20:45:47] Called DecompressByteArray: compressed_data_size=1253014 data_size=2077012, decompressed_data_size=2077012 diff=0
[20:45:47] - Digital signature verified
[20:45:47]
[20:45:47] Project: 7501 (Run 0, Clone 146, Gen 102)
[20:45:47]
[20:45:47] Assembly optimizations on if available.
[20:45:47] Entering M.D.
[20:45:53] Mapping NT from 6 to 6
[20:45:53] Completed 0 out of 500000 steps (0%)
Re: Project: 7507 (Run 0, Clone 160, Gen 1)
Posted: Sun Nov 20, 2011 9:13 pm
by Jesse_V
Some quick research for you:
This page:
http://fahwiki.net/index.php/CoreStatus_codes says it's "Triggered by the OS, probably due to overclocking/overheating or a memory failure"
Also, see this page:
http://forum.overclock3d.net/index.php? ... -is-alive/
Looks like it might have something to do with your RAM, but that would be pretty unfortunate so hopefully someone more knowledgeable can make a better diagnosis.
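For what it's worth, here is a rough sketch of one way to read the 0x8b value. It assumes the client is passing along the core's exit status the way a POSIX shell would (128 plus the signal number); that is my assumption, not something the wiki page states. `describe_core_status` is just a name made up for this example.

```python
import signal

def describe_core_status(status: int) -> str:
    """Interpret a CoreStatus value using the POSIX shell convention 128 + N.

    Assumption: the client reports the core's exit status unchanged.
    """
    if status > 128:
        try:
            name = signal.Signals(status - 128).name
        except ValueError:
            return f"{status:#x}: no matching signal on this platform"
        return f"{status:#x} = 128 + {status - 128} ({name})"
    return f"{status:#x}: not in the 128 + N range"

print(describe_core_status(0x8B))
```

On a Linux box, signal 11 is SIGSEGV, which would fit the "triggered by the OS" description, but it does not say whether the hardware or the WU is to blame.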
Re: Project: 7507 (Run 0, Clone 160, Gen 1)
Posted: Sun Nov 20, 2011 10:43 pm
by Pick2
Thank you, Jesse_V
Please excuse my brief opening post, as I was rushed, but this was just to report a bad WU.
Re: Project: 7507 (Run 0, Clone 160, Gen 1)
Posted: Sun Nov 20, 2011 10:49 pm
by Jesse_V
Pick2 wrote:Thank you, Jesse_V
Please excuse my brief opening post, as I was rushed, but this was just to report a bad WU.
No problem. I think it's probably a bad WU, as from the last bit of your log it seems that the next one fired up just fine. Since that one uses the same core, your RAM might be fine.
Anyway, one of the mods can come by and check to see if anyone else can complete it. F@h also has a script in place to stop bad WUs from being passed on to too many donors, so I'm sure it'll be taken care of. Thanks for reporting it.
Re: Project: 7507 (Run 0, Clone 160, Gen 1)
Posted: Mon Nov 21, 2011 8:44 pm
by bruce
Is your folding name "Pick2"?
There is one report of an instant EUE in the database but it's not from "Pick2" (and it's from Windows) so this probably counts as the second report.
Re: Project: 7507 (Run 0, Clone 160, Gen 1)
Posted: Mon Nov 21, 2011 9:49 pm
by Pick2
@bruce
Folding under "Plutronics" this month, back to Pick2 next month
Re: Project: 7507 (Run 0, Clone 160, Gen 1)
Posted: Mon Nov 21, 2011 10:02 pm
by sortofageek
Pick2 wrote:@bruce
Folding under "Plutronics" this month, back to Pick2 next month
The one report to which Bruce referred is not for Plutronics or Pick2, so your report is the second report. Marked for followup and thanks for reporting it.
Re: Project: 7507 (Run 0, Clone 160, Gen 1)
Posted: Wed Jan 04, 2012 12:22 am
by Pick2
I just received this same bad WU again:
Project: 7507 (Run 0, Clone 160, Gen 1)
CoreStatus = 8B (139)
[22:37:44] Client-core communications error: ERROR 0x8b
[22:37:44]
Folding@Home will go to sleep for 1 day as there have been 5 consecutive Cores executed which failed to complete a work unit.
It tried to run 10 times and EUE'd right off the bat each time. This has been the only failure on this blade ever.
Folding under "Plutronics" team 1971
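If anyone wants to check from their own FAHlog whether one WU keeps coming back and failing, a small script along these lines might help. It is only a sketch: it relies on the "Project: ... (Run, Clone, Gen)" and "CoreStatus = ..." line formats seen in the logs in this thread, and it assumes that a CoreStatus of 0x64 means a normally finished unit (that value does not appear in this thread). `count_failures` is a made-up helper name.

```python
import re
from collections import Counter

WU_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
STATUS_RE = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \((-?\d+)\)")
SUCCESS = 0x64  # assumption: status reported for a normally finished WU

def count_failures(log_lines):
    """Count non-success CoreStatus lines per (project, run, clone, gen).

    Each CoreStatus line is attributed to the most recent Project line.
    """
    failures = Counter()
    current = None
    for line in log_lines:
        wu = WU_RE.search(line)
        if wu:
            current = tuple(int(g) for g in wu.groups())
            continue
        status = STATUS_RE.search(line)
        if status and current is not None:
            if int(status.group(1), 16) != SUCCESS:
                failures[current] += 1
    return failures

# Lines copied from the logs posted in this thread.
sample = [
    "[20:45:21] Project: 7507 (Run 0, Clone 160, Gen 1)",
    "[20:45:28] CoreStatus = 8B (139)",
    "[20:45:47] Project: 7501 (Run 0, Clone 146, Gen 102)",
    "[06:54:56] Project: 7507 (Run 0, Clone 160, Gen 1)",
    "[06:55:10] CoreStatus = C0000029 (-1073741783)",
]
for wu, n in count_failures(sample).items():
    print(wu, n)
```

A WU that shows up with a count of two or more in one log is a good candidate for a bad-WU report like this one.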
Re: Project: 7507 (Run 0, Clone 160, Gen 1)
Posted: Wed Jan 04, 2012 3:57 am
by PantherX
There are three failures in the WU Database so I have marked it as a bad WU:
The WU (P7507,R0,C160,G1) has been reported as a bad WU.
Thanks for your report.
Re: Project: 7507 (Run 0, Clone 160, Gen 1)
Posted: Wed Jan 04, 2012 7:47 am
by CBT
I seem to have a different problem with the same WU:
Code:
Launch directory: C:\Users\bogaartc\Documents\FAH6SMP
Executable: C:\Users\bogaartc\Documents\FAH6SMP\FAH6.exe
Arguments: -smp -verbosity 9 -smp -verbosity 9
[06:54:45] - Ask before connecting: No
[06:54:45] - User name: Bogaarts (Team 92)
[06:54:45] - User ID: 5D1560B702431C49
[06:54:45] - Machine ID: 1
[06:54:45]
[06:54:45] Loaded queue successfully.
[06:54:45]
[06:54:45] + Processing work unit
[06:54:45] Core required: FahCore_a3.exe
[06:54:45] Core found.
[06:54:45] - Autosending finished units... [January 4 06:54:45 UTC]
[06:54:45] Trying to send all finished work units
[06:54:45] + No unsent completed units remaining.
[06:54:45] - Autosend completed
[06:54:45] Working on queue slot 01 [January 4 06:54:45 UTC]
[06:54:45] + Working ...
[06:54:45] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 01 -np 4 -cpu 95 -checkpoint 30 -verbose -lifeline 268 -version 634'
[06:54:45]
[06:54:45] *------------------------------*
[06:54:45] Folding@Home Gromacs SMP Core
[06:54:45] Version 2.27 (Dec. 15, 2010)
[06:54:45]
[06:54:45] Preparing to commence simulation
[06:54:45] - Ensuring status. Please wait.
[06:54:55] - Looking at optimizations...
[06:54:55] - Working with standard loops on this execution.
[06:54:55] - Previous termination of core was improper.
[06:54:55] - Files status OK
[06:54:56] - Expanded 2166157 -> 3127236 (decompressed 144.3 percent)
[06:54:56] Called DecompressByteArray: compressed_data_size=2166157 data_size=3127236, decompressed_data_size=3127236 diff=0
[06:54:56] - Digital signature verified
[06:54:56]
[06:54:56] Project: 7507 (Run 0, Clone 160, Gen 1)
[06:54:56]
[06:54:56] Entering M.D.
[06:55:02] Mapping NT from 4 to 4
[06:55:03] Completed 0 out of 500000 steps (0%)
[06:55:10] CoreStatus = C0000029 (-1073741783)
[06:55:10] Client-core communications error: ERROR 0xc0000029
[06:55:10] Deleting current work unit & continuing...
[06:55:24] Trying to send all finished work units
[06:55:24] + No unsent completed units remaining.
[06:55:24] - Preparing to get new work unit...
[06:55:24] Cleaning up work directory
[06:55:24] + Attempting to get work packet
[06:55:24] Passkey found
[06:55:24] - Will indicate memory of 1024 MB
[06:55:24] - Detect CPU. Vendor: GenuineIntel, Family: 6, Model: 5, Stepping: 5
[06:55:24] - Connecting to assignment server
[06:55:24] Connecting to http://assign.stanford.edu:8080/
[06:55:25] Posted data.
[06:55:25] Initial: 8F80; - Successful: assigned to (128.143.199.97).
[06:55:25] + News From Folding@Home: Welcome to Folding@Home
[06:55:25] Loaded queue successfully.
[06:55:25] Sent data
[06:55:25] Connecting to http://128.143.199.97:8080/
[06:55:26] Posted data.
[06:55:26] Initial: 0000; - Receiving payload (expected size: 2166669)
[06:55:29] - Downloaded at ~705 kB/s
[06:55:29] - Averaged speed for that direction ~1048 kB/s
[06:55:29] + Received work.
[06:55:29] + Closed connections
[06:55:34]
[06:55:34] + Processing work unit
[06:55:34] Core required: FahCore_a3.exe
[06:55:34] Core found.
[06:55:34] Working on queue slot 02 [January 4 06:55:34 UTC]
[06:55:34] + Working ...
[06:55:34] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 02 -np 4 -cpu 95 -checkpoint 30 -verbose -lifeline 268 -version 634'
[06:55:34]
[06:55:34] *------------------------------*
[06:55:34] Folding@Home Gromacs SMP Core
[06:55:34] Version 2.27 (Dec. 15, 2010)
[06:55:34]
[06:55:34] Preparing to commence simulation
[06:55:34] - Looking at optimizations...
[06:55:34] - Created dyn
[06:55:34] - Files status OK
[06:55:35] - Expanded 2166157 -> 3127236 (decompressed 144.3 percent)
[06:55:35] Called DecompressByteArray: compressed_data_size=2166157 data_size=3127236, decompressed_data_size=3127236 diff=0
[06:55:35] - Digital signature verified
[06:55:35]
[06:55:35] Project: 7507 (Run 0, Clone 160, Gen 1)
[06:55:35]
[06:55:35] Assembly optimizations on if available.
[06:55:35] Entering M.D.
[06:55:41] Mapping NT from 4 to 4
[06:55:41] Completed 0 out of 500000 steps (0%)
[06:55:56] CoreStatus = C0000029 (-1073741783)
[06:55:56] Client-core communications error: ERROR 0xc0000029
[06:55:56] Deleting current work unit & continuing...
[06:56:10] Trying to send all finished work units
[06:56:10] + No unsent completed units remaining.
[06:56:10] - Preparing to get new work unit...
[06:56:10] Cleaning up work directory
[06:56:10] + Attempting to get work packet
[06:56:10] Passkey found
[06:56:10] - Will indicate memory of 1024 MB
[06:56:10] - Connecting to assignment server
[06:56:10] Connecting to http://assign.stanford.edu:8080/
[06:56:11] Posted data.
[06:56:11] Initial: 8F80; - Successful: assigned to (128.143.199.97).
[06:56:11] + News From Folding@Home: Welcome to Folding@Home
[06:56:12] Loaded queue successfully.
[06:56:12] Sent data
[06:56:12] Connecting to http://128.143.199.97:8080/
[06:56:13] Posted data.
[06:56:13] Initial: 0000; - Receiving payload (expected size: 2166669)
[06:56:15] - Downloaded at ~1057 kB/s
[06:56:15] - Averaged speed for that direction ~1050 kB/s
[06:56:15] + Received work.
[06:56:15] + Closed connections
[06:56:21]
[06:56:21] + Processing work unit
[06:56:21] Core required: FahCore_a3.exe
[06:56:21] Core found.
[06:56:21] Working on queue slot 03 [January 4 06:56:21 UTC]
[06:56:21] + Working ...
[06:56:21] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 03 -np 4 -cpu 95 -checkpoint 30 -verbose -lifeline 268 -version 634'
[06:56:21]
[06:56:21] *------------------------------*
[06:56:21] Folding@Home Gromacs SMP Core
[06:56:21] Version 2.27 (Dec. 15, 2010)
[06:56:21]
[06:56:21] Preparing to commence simulation
[06:56:21] - Looking at optimizations...
[06:56:21] - Created dyn
[06:56:21] - Files status OK
[06:56:21] - Expanded 2166157 -> 3127236 (decompressed 144.3 percent)
[06:56:21] Called DecompressByteArray: compressed_data_size=2166157 data_size=3127236, decompressed_data_size=3127236 diff=0
[06:56:21] - Digital signature verified
[06:56:21]
[06:56:21] Project: 7507 (Run 0, Clone 160, Gen 1)
[06:56:21]
[06:56:21] Assembly optimizations on if available.
[06:56:21] Entering M.D.
[06:56:27] Mapping NT from 4 to 4
[06:56:28] Completed 0 out of 500000 steps (0%)
[06:56:43] CoreStatus = C0000029 (-1073741783)
[06:56:43] Client-core communications error: ERROR 0xc0000029
[06:56:43] - Attempting to download new core...
[06:56:43] + Downloading new core: FahCore_a3.exe
[06:56:43] Downloading core (/~pande/Win32/x86/Core_a3.fah from www.stanford.edu)
[06:56:43] Initial: AFDE; + 10240 bytes downloaded
<snip>
[06:57:02] Initial: 6557; + 3028785 bytes downloaded
[06:57:02] Verifying core Core_a3.fah...
[06:57:02] Signature is VALID
[06:57:02]
[06:57:02] Trying to unzip core FahCore_a3.exe
[06:57:03] Decompressed FahCore_a3.exe (10057216 bytes) successfully
[06:57:08] + Core successfully engaged
[06:57:09] Deleting current work unit & continuing...
[06:57:23] Trying to send all finished work units
[06:57:23] + No unsent completed units remaining.
[06:57:23] - Preparing to get new work unit...
[06:57:23] Cleaning up work directory
[06:57:23] + Attempting to get work packet
[06:57:23] Passkey found
[06:57:23] - Will indicate memory of 1024 MB
[06:57:23] - Connecting to assignment server
[06:57:23] Connecting to http://assign.stanford.edu:8080/
[06:57:24] Posted data.
[06:57:24] Initial: 8F80; - Successful: assigned to (128.143.199.97).
[06:57:24] + News From Folding@Home: Welcome to Folding@Home
[06:57:24] Loaded queue successfully.
[06:57:24] Sent data
[06:57:24] Connecting to http://128.143.199.97:8080/
[06:57:25] Posted data.
[06:57:25] Initial: 0000; - Receiving payload (expected size: 2166669)
[06:57:27] - Downloaded at ~1057 kB/s
[06:57:27] - Averaged speed for that direction ~1051 kB/s
[06:57:27] + Received work.
[06:57:27] + Closed connections
[06:57:32]
[06:57:32] + Processing work unit
[06:57:32] Core required: FahCore_a3.exe
[06:57:32] Core found.
[06:57:32] Working on queue slot 04 [January 4 06:57:32 UTC]
[06:57:32] + Working ...
[06:57:32] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 04 -np 4 -cpu 95 -checkpoint 30 -verbose -lifeline 268 -version 634'
[06:57:32]
[06:57:32] *------------------------------*
[06:57:32] Folding@Home Gromacs SMP Core
[06:57:32] Version 2.27 (Dec. 15, 2010)
[06:57:32]
[06:57:32] Preparing to commence simulation
[06:57:32] - Looking at optimizations...
[06:57:32] - Created dyn
[06:57:32] - Files status OK
[06:57:33] - Expanded 2166157 -> 3127236 (decompressed 144.3 percent)
[06:57:33] Called DecompressByteArray: compressed_data_size=2166157 data_size=3127236, decompressed_data_size=3127236 diff=0
[06:57:33] - Digital signature verified
[06:57:33]
[06:57:33] Project: 7507 (Run 0, Clone 160, Gen 1)
[06:57:33]
[06:57:33] Assembly optimizations on if available.
[06:57:33] Entering M.D.
[06:57:39] Mapping NT from 4 to 4
[06:57:40] Completed 0 out of 500000 steps (0%)
[06:57:55] CoreStatus = C0000029 (-1073741783)
[06:57:55] Client-core communications error: ERROR 0xc0000029
[06:57:55] Deleting current work unit & continuing...
I keep receiving this same WU, even after removing the Work-folder, etc. as described here
http://foldingforum.org/viewtopic.php?f=19&t=16526.
Corné
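A note on the two numbers in the CoreStatus line: the value in parentheses is the same 32-bit hex value read as a signed integer. The sketch below shows the conversion. The severity lookup assumes the value follows the Windows NTSTATUS layout (top two bits are the severity), which seems plausible for a Windows client but is an assumption on my part. Both function names are made up for illustration.

```python
def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value

def ntstatus_severity(value: int) -> str:
    """Return the severity encoded in the top two bits (assumes NTSTATUS layout)."""
    return ("success", "informational", "warning", "error")[(value >> 30) & 0b11]

print(to_signed32(0xC0000029))        # -1073741783, matching the log
print(to_signed32(0x8B))              # 139, matching the Linux log
print(ntstatus_severity(0xC0000029))  # error
```

So 0x8b on Linux and 0xc0000029 on Windows are simply each OS's own way of reporting a crashed core; they do not have to mean two different problems with the WU.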
Re: Project: 7507 (Run 0, Clone 160, Gen 1)
Posted: Wed Jan 04, 2012 1:31 pm
by kasson
Thanks--looks like it's a bad WU. We re-generated the start file, but it may take the server labeling it as bad to get rid of these.
Re: Project: 7507 (Run 0, Clone 160, Gen 1)
Posted: Wed Jan 04, 2012 2:58 pm
by CBT
When would this take place? I would like to start my client again, but currently it still gets the same WU and still crashes on it.
Re: Project: 7507 (Run 0, Clone 160, Gen 1)
Posted: Wed Jan 04, 2012 11:19 pm
by bruce
Mods can label a WU as "bad", but when they do, it is only taken out of service at 8am Pacific Time the next morning.
By then the newly regenerated WU may already be in place, and I'm not sure whether the label would end up taking that one out of service too, so I don't think I should do it.
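For donors in other time zones, here is a quick way to work out when that 8am Pacific cutoff falls in UTC. This is only a sketch: it uses a fixed UTC-8 offset (Pacific Standard Time, so it is only right outside daylight saving), and the date is just an example.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset for Pacific Standard Time; does not handle daylight saving.
PST = timezone(timedelta(hours=-8), "PST")

def pacific_standard_to_utc(year, month, day, hour, minute=0):
    """Convert a PST wall-clock time to an aware UTC datetime."""
    local = datetime(year, month, day, hour, minute, tzinfo=PST)
    return local.astimezone(timezone.utc)

# Example date only: 8am PST on 5 January 2012.
print(pacific_standard_to_utc(2012, 1, 5, 8).isoformat())
```

In winter, 8am Pacific corresponds to 16:00 UTC on the same day.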
Re: Project: 7507 (Run 0, Clone 160, Gen 1)
Posted: Thu Jan 05, 2012 9:37 am
by CBT
Fyi,
8AM PST (i.e. 16:00 GMT) passed several hours ago, but I kept receiving the same WU.
After following
http://foldingforum.org/viewtopic.php?f ... 26#p164322, my computer is now working on a different WU.
Corné