Project: 6508 (Run 12, Clone 193, Gen 90)
Posted: Mon Feb 21, 2011 7:04 pm
by entraidelec
Hi,
This unit keeps failing at 47%. The early end happened after a "Client-core communications error: ERROR 0x0".
The first attempt was on Feb 9th and the retries continued until Feb 19th, on a headless zombie that had been running flawlessly until then.
After a manual stop and a change of username, the client downloaded and folded another unit without problems.
Regards.
Re: Project 6508 (Run 12, Clone 193, Gen 90)
Posted: Mon Feb 21, 2011 7:58 pm
by bruce
Welcome to foldingforum.org.
Without information from your FAHlog, it's difficult to tell what happened, but apparently the work was repeatedly deleted and restarted because I can't find any record of you uploading a partial result. The WU was reassigned and someone else has successfully completed it.
Which OS do you run? What version of the SMP FahCore do you have?
Re: Project 6508 (Run 12, Clone 193, Gen 90)
Posted: Tue Feb 22, 2011 9:44 pm
by entraidelec
Hi,
Sorry, I missed that useful information.
Here it is:
- Linux kernel 2.6.30.6-slitaz;
- FAH 6.02 console edition, flags: normal size and advanced;
- FahCore_78.
No partial results were uploaded; the client keeps deleting the current WU and then downloads the same one again.
It's a rather slow machine, which is why I didn't notice the problem earlier.
I changed the username to another one from our team and everything runs fine.
And here is a sample of the backed-up FAHlog:
[06:34:13] + News From Folding@Home: Welcome to Folding@Home
[06:34:13] Loaded queue successfully.
[06:34:25] + Closed connections
[06:34:30]
[06:34:30] + Processing work unit
[06:34:30] Core required: FahCore_78.exe
[06:34:30] Core found.
[06:34:30] Working on Unit 02 [February 18 06:34:30]
[06:34:30] + Working ...
[06:34:30]
[06:34:30] *------------------------------*
[06:34:30] Folding@Home Gromacs Core
[06:34:30] Version 1.90 (March 8, 2006)
[06:34:30]
[06:34:30] Preparing to commence simulation
[06:34:30] - Looking at optimizations...
[06:34:30] - Created dyn
[06:34:30] - Files status OK
[06:34:31] - Expanded 477478 -> 2356697 (decompressed 493.5 percent)
[06:34:31] - Starting from initial work packet
[06:34:31]
[06:34:31] Project: 6508 (Run 12, Clone 193, Gen 90)
[06:34:31]
[06:34:31] Assembly optimizations on if available.
[06:34:31] Entering M.D.
[06:34:38] Protein: proG_13 in water
[06:34:38]
[06:34:38] Writing local files
[06:34:38] Extra SSE boost OK.
[06:34:39] Writing local files
[06:34:39] Completed 0 out of 250000 steps (0%)
[07:06:38] Writing local files
[07:06:39] Completed 2500 out of 250000 steps (1%)
[07:38:38] Writing local files
[07:38:38] Completed 5000 out of 250000 steps (2%)
[08:10:39] Writing local files
[08:10:39] Completed 7500 out of 250000 steps (3%)
[08:42:42] Writing local files
[08:42:42] Completed 10000 out of 250000 steps (4%)
[09:14:42] Writing local files
[09:14:42] Completed 12500 out of 250000 steps (5%)
[09:46:41] Writing local files
[09:46:41] Completed 15000 out of 250000 steps (6%)
[10:18:43] Writing local files
[10:18:44] Completed 17500 out of 250000 steps (7%)
[10:50:45] Writing local files
[10:50:45] Completed 20000 out of 250000 steps (8%)
[11:22:48] Writing local files
[11:22:48] Completed 22500 out of 250000 steps (9%)
[11:54:51] Writing local files
[11:54:51] Completed 25000 out of 250000 steps (10%)
[12:26:54] Writing local files
[12:26:55] Completed 27500 out of 250000 steps (11%)
[12:58:56] Writing local files
[12:58:56] Completed 30000 out of 250000 steps (12%)
[13:30:57] Writing local files
[13:30:57] Completed 32500 out of 250000 steps (13%)
[14:02:57] Writing local files
[14:02:57] Completed 35000 out of 250000 steps (14%)
[14:34:56] Writing local files
[14:34:56] Completed 37500 out of 250000 steps (15%)
[15:06:54] Writing local files
[15:06:54] Completed 40000 out of 250000 steps (16%)
[15:38:54] Writing local files
[15:38:54] Completed 42500 out of 250000 steps (17%)
[16:10:53] Writing local files
[16:10:53] Completed 45000 out of 250000 steps (18%)
[16:42:54] Writing local files
[16:42:54] Completed 47500 out of 250000 steps (19%)
[17:14:53] Writing local files
[17:14:53] Completed 50000 out of 250000 steps (20%)
[17:46:54] Writing local files
[17:46:54] Completed 52500 out of 250000 steps (21%)
[18:18:56] Writing local files
[18:18:56] Completed 55000 out of 250000 steps (22%)
[18:50:59] Writing local files
[18:50:59] Completed 57500 out of 250000 steps (23%)
[19:23:01] Writing local files
[19:23:01] Completed 60000 out of 250000 steps (24%)
[19:55:03] Writing local files
[19:55:03] Completed 62500 out of 250000 steps (25%)
[20:27:05] Writing local files
[20:27:05] Completed 65000 out of 250000 steps (26%)
[20:59:06] Writing local files
[20:59:06] Completed 67500 out of 250000 steps (27%)
[21:31:07] Writing local files
[21:31:07] Completed 70000 out of 250000 steps (28%)
[22:03:08] Writing local files
[22:03:08] Completed 72500 out of 250000 steps (29%)
[22:35:09] Writing local files
[22:35:09] Completed 75000 out of 250000 steps (30%)
[23:07:10] Writing local files
[23:07:11] Completed 77500 out of 250000 steps (31%)
[23:39:10] Writing local files
[23:39:10] Completed 80000 out of 250000 steps (32%)
[00:11:09] Writing local files
[00:11:09] Completed 82500 out of 250000 steps (33%)
[00:43:11] Writing local files
[00:43:11] Completed 85000 out of 250000 steps (34%)
[01:15:09] Writing local files
[01:15:09] Completed 87500 out of 250000 steps (35%)
[01:47:09] Writing local files
[01:47:09] Completed 90000 out of 250000 steps (36%)
[02:19:08] Writing local files
[02:19:08] Completed 92500 out of 250000 steps (37%)
[02:51:06] Writing local files
[02:51:06] Completed 95000 out of 250000 steps (38%)
[03:23:05] Writing local files
[03:23:05] Completed 97500 out of 250000 steps (39%)
[03:55:03] Writing local files
[03:55:03] Completed 100000 out of 250000 steps (40%)
[04:27:00] Writing local files
[04:27:00] Completed 102500 out of 250000 steps (41%)
[04:59:00] Writing local files
[04:59:00] Completed 105000 out of 250000 steps (42%)
[05:30:59] Writing local files
[05:30:59] Completed 107500 out of 250000 steps (43%)
[06:02:59] Writing local files
[06:02:59] Completed 110000 out of 250000 steps (44%)
[06:34:59] Writing local files
[06:35:00] Completed 112500 out of 250000 steps (45%)
[07:07:00] Writing local files
[07:07:00] Completed 115000 out of 250000 steps (46%)
[07:39:02] Writing local files
[07:39:02] Completed 117500 out of 250000 steps (47%)
[07:52:13] CoreStatus = 0 (0)
[07:52:13] Client-core communications error: ERROR 0x0
[07:52:13] - Attempting to download new core...
[07:52:13] + Downloading new core: FahCore_78.exe
[07:52:17] + 10240 bytes downloaded
[07:52:17] + 20480 bytes downloaded
[07:52:18] + 30720 bytes downloaded
[07:52:18] + 40960 bytes downloaded
[07:52:18] + 51200 bytes downloaded
[07:52:18] + 61440 bytes downloaded
[07:52:19] + 71680 bytes downloaded
[07:52:19] + 81920 bytes downloaded
[07:52:19] + 92160 bytes downloaded
[07:52:20] + 102400 bytes downloaded
[07:52:20] + 112640 bytes downloaded
[07:52:20] + 122880 bytes downloaded
[07:52:21] + 133120 bytes downloaded
[07:52:21] + 143360 bytes downloaded
[07:52:21] + 153600 bytes downloaded
[07:52:22] + 163840 bytes downloaded
[07:52:22] + 174080 bytes downloaded
[07:52:22] + 184320 bytes downloaded
[07:52:23] + 194560 bytes downloaded
[07:52:23] + 204800 bytes downloaded
[07:52:23] + 215040 bytes downloaded
[07:52:23] + 225280 bytes downloaded
[07:52:24] + 235520 bytes downloaded
[07:52:24] + 245760 bytes downloaded
[07:52:24] + 256000 bytes downloaded
[07:52:24] + 266240 bytes downloaded
[07:52:24] + 276480 bytes downloaded
[07:52:24] + 286720 bytes downloaded
[07:52:24] + 296960 bytes downloaded
[07:52:25] + 307200 bytes downloaded
[07:52:25] + 317440 bytes downloaded
[07:52:25] + 327680 bytes downloaded
[07:52:25] + 337920 bytes downloaded
[07:52:25] + 348160 bytes downloaded
[07:52:25] + 358400 bytes downloaded
[07:52:25] + 368640 bytes downloaded
[07:52:26] + 378880 bytes downloaded
[07:52:26] + 389120 bytes downloaded
[07:52:26] + 399360 bytes downloaded
[07:52:26] + 409600 bytes downloaded
[07:52:26] + 419840 bytes downloaded
[07:52:26] + 430080 bytes downloaded
[07:52:27] + 440320 bytes downloaded
[07:52:27] + 450560 bytes downloaded
[07:52:27] + 460800 bytes downloaded
[07:52:28] + 471040 bytes downloaded
[07:52:28] + 481280 bytes downloaded
[07:52:28] + 491520 bytes downloaded
[07:52:28] + 501760 bytes downloaded
[07:52:29] + 512000 bytes downloaded
[07:52:29] + 522240 bytes downloaded
[07:52:29] + 532480 bytes downloaded
[07:52:29] + 542720 bytes downloaded
[07:52:29] + 552960 bytes downloaded
[07:52:29] + 563200 bytes downloaded
[07:52:30] + 573440 bytes downloaded
[07:52:30] + 583680 bytes downloaded
[07:52:30] + 593920 bytes downloaded
[07:52:30] + 604160 bytes downloaded
[07:52:31] + 614400 bytes downloaded
[07:52:31] + 624640 bytes downloaded
[07:52:31] + 634880 bytes downloaded
[07:52:31] + 645120 bytes downloaded
[07:52:31] + 655360 bytes downloaded
[07:52:32] + 665600 bytes downloaded
[07:52:32] + 675840 bytes downloaded
[07:52:32] + 686080 bytes downloaded
[07:52:32] + 696320 bytes downloaded
[07:52:32] + 706560 bytes downloaded
[07:52:33] + 716800 bytes downloaded
[07:52:33] + 727040 bytes downloaded
[07:52:33] + 737280 bytes downloaded
[07:52:34] + 747520 bytes downloaded
[07:52:34] + 757760 bytes downloaded
[07:52:34] + 768000 bytes downloaded
[07:52:35] + 778240 bytes downloaded
[07:52:35] + 788480 bytes downloaded
[07:52:35] + 798720 bytes downloaded
[07:52:36] + 808960 bytes downloaded
[07:52:36] + 819200 bytes downloaded
[07:52:36] + 829440 bytes downloaded
[07:52:36] + 839680 bytes downloaded
[07:52:36] + 849920 bytes downloaded
[07:52:36] + 860160 bytes downloaded
[07:52:36] + 870400 bytes downloaded
[07:52:37] + 880640 bytes downloaded
[07:52:37] + 890880 bytes downloaded
[07:52:37] + 901120 bytes downloaded
[07:52:37] + 911360 bytes downloaded
[07:52:37] + 921600 bytes downloaded
[07:52:37] + 931840 bytes downloaded
[07:52:38] + 942080 bytes downloaded
[07:52:38] + 952320 bytes downloaded
[07:52:38] + 962560 bytes downloaded
[07:52:38] + 972800 bytes downloaded
[07:52:38] + 983040 bytes downloaded
[07:52:39] + 993280 bytes downloaded
[07:52:39] + 1003520 bytes downloaded
[07:52:39] + 1013760 bytes downloaded
[07:52:39] + 1024000 bytes downloaded
[07:52:39] + 1034240 bytes downloaded
[07:52:39] + 1044480 bytes downloaded
[07:52:39] + 1054720 bytes downloaded
[07:52:39] + 1064960 bytes downloaded
[07:52:40] + 1075200 bytes downloaded
[07:52:40] + 1085440 bytes downloaded
[07:52:40] + 1095680 bytes downloaded
[07:52:40] + 1105920 bytes downloaded
[07:52:40] + 1116160 bytes downloaded
[07:52:40] + 1126400 bytes downloaded
[07:52:41] + 1134407 bytes downloaded
[07:52:41] Verifying core Core_78.fah...
[07:52:41] Signature is VALID
[07:52:41]
[07:52:41] Trying to unzip core FahCore_78.exe
[07:52:42] Decompressed FahCore_78.exe (3435296 bytes) successfully
[07:52:42] + Core successfully engaged
[07:52:43] Deleting current work unit & continuing...
[07:53:02] - Preparing to get new work unit...
[07:53:02] + Attempting to get work packet
[07:53:02] - Connecting to assignment server
[07:53:05] - Successful: assigned to (171.64.65.62).
[07:53:05] + News From Folding@Home: Welcome to Folding@Home
[07:53:05] Loaded queue successfully.
[07:53:14] + Closed connections
[07:53:19]
[07:53:19] + Processing work unit
[07:53:19] Core required: FahCore_78.exe
[07:53:19] Core found.
[07:53:19] Working on Unit 03 [February 19 07:53:19]
[07:53:19] + Working ...
[07:53:19]
[07:53:19] *------------------------------*
[07:53:19] Folding@Home Gromacs Core
[07:53:19] Version 1.90 (March 8, 2006)
[07:53:19]
[07:53:19] Preparing to commence simulation
[07:53:19] - Looking at optimizations...
[07:53:19] - Created dyn
[07:53:19] - Files status OK
[07:53:19] - Expanded 477478 -> 2356697 (decompressed 493.5 percent)
[07:53:20] - Starting from initial work packet
[07:53:20]
[07:53:20] Project: 6508 (Run 12, Clone 193, Gen 90)
[07:53:20]
[07:53:20] Assembly optimizations on if available.
[07:53:20] Entering M.D.
[07:53:26] Protein: proG_13 in water
[07:53:26]
[07:53:26] Writing local files
[07:53:27] Extra SSE boost OK.
[07:53:27] Writing local files
[07:53:27] Completed 0 out of 250000 steps (0%)
Many thanks for your interest.
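For anyone who wants to check a long FAHlog for this pattern without reading it line by line, here is a minimal Python sketch. It is not part of the FAH client; the function names `find_failures` and `seconds_per_percent` are made up for illustration, and the regular expressions are modeled only on the log lines quoted above, so other client versions may need different patterns. It also assumes that consecutive progress lines are less than 24 hours apart, because the timestamps carry no date.

```python
import re

# Patterns modeled on the log lines quoted in this thread (an assumption:
# other client versions may format these lines differently).
STEP_RE = re.compile(
    r"^\[(\d\d:\d\d:\d\d)\] Completed \d+ out of \d+ steps\s+\((\d+)%\)"
)
ERROR_RE = re.compile(
    r"^\[(\d\d:\d\d:\d\d)\] Client-core communications error: (.+)$"
)


def find_failures(lines):
    """Return (timestamp, error text, last percent seen) for each core error."""
    failures = []
    last_percent = None
    for line in lines:
        line = line.strip()
        step = STEP_RE.match(line)
        if step:
            last_percent = int(step.group(2))
            continue
        err = ERROR_RE.match(line)
        if err:
            failures.append((err.group(1), err.group(2), last_percent))
    return failures


def seconds_per_percent(lines):
    """Average seconds per 1% over the first attempt found in the log.

    Returns None when there are fewer than two progress lines or no progress.
    """
    points = []   # (elapsed seconds, percent)
    days = 0      # incremented each time the clock wraps past midnight
    prev_clock = None
    for line in lines:
        step = STEP_RE.match(line.strip())
        if not step:
            continue
        percent = int(step.group(2))
        if points and percent < points[-1][1]:
            break  # percent dropped: the unit restarted, so stop here
        hours, minutes, seconds = map(int, step.group(1).split(":"))
        clock = hours * 3600 + minutes * 60 + seconds
        if prev_clock is not None and clock < prev_clock:
            days += 1  # no date in the timestamp; a smaller value means a new day
        prev_clock = clock
        points.append((days * 86400 + clock, percent))
    if len(points) < 2 or points[-1][1] == points[0][1]:
        return None
    return (points[-1][0] - points[0][0]) / (points[-1][1] - points[0][1])
```

Fed the log sample above, `find_failures` should report the error at 07:52:13 with 47 as the last percentage reached, and `seconds_per_percent` works out to roughly 32 minutes per percent, which matches the spacing of the progress lines.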
Re: Project 6508 (Run 12, Clone 193, Gen 90)
Posted: Tue Feb 22, 2011 11:02 pm
by toTOW
Is your system overclocked?
Re: Project 6508 (Run 12, Clone 193, Gen 90)
Posted: Tue Feb 22, 2011 11:07 pm
by bruce
An error like that can be due to a problem with the WU but it's more likely an issue with the hardware.
FAH makes very heavy use of the system. If the computer is overclocked or the memory has occasional errors or it's an old system and it's overheating, FAH may find an instability that might not be noticed with other software.
Are you overclocking?
When was the last time you cleaned the dust out of the heatsink?
Re: Project 6508 (Run 12, Clone 193, Gen 90)
Posted: Wed Feb 23, 2011 7:02 am
by entraidelec
Hi,
It's a Celeron 800 pushed to 880 MHz; the full-load CPU temperature reaches 35°C. Maintenance cleaning was done in January.
Many other work units have completed without any problem since last September (when this machine was started up), and one more this Monday.
Maybe this unit has found the weak point, or maybe it's time for retirement. I will plug in a screen and test it with memtest86 to be sure.
Re: Project: 6508 (Run 12, Clone 193, Gen 90)
Posted: Wed Feb 23, 2011 9:14 am
by bruce
This same WU was reissued and someone else completed it without difficulty. When one machine can do it and another one fails, there is a very good chance that FAH has pushed a marginal machine just enough harder than any of the benchmarks you tried. In other words, it's an unstable machine.
Re: Project: 6508 (Run 12, Clone 193, Gen 90)
Posted: Wed Feb 23, 2011 12:21 pm
by entraidelec
OK, many thanks for this explanation. I'm glad to know that someone else completed the work.
Now I will have to investigate further to re-validate this penguin.
Cheers.
Re: Project: 6508 (Run 12, Clone 193, Gen 90)
Posted: Thu Feb 24, 2011 9:18 pm
by toTOW
bruce wrote:This same WU was reissued and someone else completed it without difficulty. When one machine can do it and another one fails, there is a very good chance that FAH has pushed a marginal machine just enough harder than any of the benchmarks you tried. In other words, it's an unstable machine.
That's not completely true for p65xx... I've noticed that they tend to be quite unstable on some machines but complete fine on others, without any specific pattern.