Project: 3043 (Run 6, Clone 39, Gen 75)

Moderators: Site Moderators, FAHC Science Team

ChasR
Posts: 402
Joined: Sun Dec 02, 2007 5:36 am
Location: Atlanta, GA

Project: 3043 (Run 6, Clone 39, Gen 75)

Post by ChasR »

Project: 3043 (Run 6, Clone 39, Gen 75) runs to 8%, reports long 1-4 interactions, and hangs the client. Stopping and restarting the client results in the WU restarting at 8% but quitting with a segmentation fault within a few minutes, whereupon the WU is deleted by the client. The process then repeats with identical results. Q6600 @ 3.4, native Linux SMP client (6.01beta2).

Client Hang:


[09:49:16] Project: 3043 (Run 6, Clone 39, Gen 75)
[09:49:16] 
[09:49:16] Assembly optimizations on if available.
[09:49:16] Entering M.D.
[09:49:22] Protein: 9684 p3029_SMP-emsv-03
[09:49:22] Writing local files
[09:49:22] Extra SSE boost OK.
[09:49:22] 
[09:49:22] Extra SSE boost OK.
[09:49:22] Writing local files
[09:49:22] Completed 0 out of 10000000 steps  (0 percent)
[10:00:12] Writing local files
[10:00:12] Completed 100000 out of 10000000 steps  (1 percent)
[10:08:01] Writing local files
[10:08:01] Completed 200000 out of 10000000 steps  (2 percent)
[10:12:45] - Autosending finished units...
[10:12:45] Trying to send all finished work units
[10:12:45] + No unsent completed units remaining.
[10:12:45] - Autosend completed
[10:17:43] Writing local files
[10:17:43] Completed 300000 out of 10000000 steps  (3 percent)
[10:28:37] Writing local files
[10:28:37] Completed 400000 out of 10000000 steps  (4 percent)
[10:39:29] Writing local files
[10:39:29] Completed 500000 out of 10000000 steps  (5 percent)
[10:50:20] Writing local files
[10:50:20] Completed 600000 out of 10000000 steps  (6 percent)
[11:01:08] Writing local files
[11:01:08] Completed 700000 out of 10000000 steps  (7 percent)
[11:11:57] Writing local files
[11:11:57] Completed 800000 out of 10000000 steps  (8 percent)
[11:20:52] Warning:  long 1-4 interactions
[16:12:45] - Autosending finished units...
[16:12:45] Trying to send all finished work units
[16:12:45] + No unsent completed units remaining.
[16:12:45] - Autosend completed
[22:12:45] - Autosending finished units...
[22:12:45] Trying to send all finished work units
[22:12:45] + No unsent completed units remaining.
[22:12:45] - Autosend completed
[04:12:45] - Autosending finished units...
On restart:


[16:28:15] Project: 3043 (Run 6, Clone 39, Gen 75)
[16:28:15] 
[16:28:15] Assembly optimizations on if available.
[16:28:15] Entering M.D.
[16:28:32]  on if available.
[16:28:32] Entering M.D.
[16:28:38] Protein: 9684 p3029_SMP-emsv-03
[16:28:38] Writing local files
[16:28:38] Completed 800000 out of 10000000 steps  (8 percent)
[16:28:38] Extra SSE boost OK.
[16:28:38] 00 steps  (8 percent)
[16:28:38] Extra SSE boost OK.
[16:30:30] CoreStatus = 1 (1)
[16:30:30] Client-core communications error: ERROR 0x1
[16:30:30] Deleting current work unit & continuing...
ChasR
Posts: 402
Joined: Sun Dec 02, 2007 5:36 am
Location: Atlanta, GA

Re: Project: 3043 (Run 6, Clone 39, Gen 75)

Post by ChasR »

I got this WU again over the weekend. It needs to be taken out of circulation. If assigned to an unmonitored machine, it may well run doing nothing but autosend until the machine is rebooted. As it was, I lost a day and a half with the client hung.
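
For anyone who wants to catch this kind of hang on an unattended box, here is a rough Python sketch that measures how long it has been since the log last showed a "Completed ... steps" line. It is only an illustration: the function name is made up, it assumes the `[HH:MM:SS]` line format shown in the logs in this thread, and because the log carries no dates it assumes consecutive lines are never more than 24 hours apart (the 6-hour autosend lines make that true here).

```python
import re

# Matches the "[HH:MM:SS] message" lines seen in the client log.
LINE_RE = re.compile(r"^\[(\d\d):(\d\d):(\d\d)\]\s?(.*)$")


def seconds_since_last_progress(log_lines):
    """Seconds between the last 'Completed ...' line and the last
    timestamped line, or None if no progress line was found.

    Assumption: consecutive log lines are less than 24 h apart, so a
    timestamp that goes backwards means the clock passed midnight.
    """
    elapsed = 0          # running clock, kept monotonic across midnight
    prev = None          # previous time of day in seconds
    last_progress = None
    for line in log_lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        h, mi, s, msg = m.groups()
        t = int(h) * 3600 + int(mi) * 60 + int(s)
        if prev is not None:
            elapsed += (t - prev) % 86400
        prev = t
        if msg.startswith("Completed "):
            last_progress = elapsed
    if last_progress is None:
        return None
    return elapsed - last_progress
```

Fed the tail of the hung log above, this reports about 17 hours without progress, which is far longer than the roughly 10 minutes per percent the WU was taking before the warning.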
Rolo71
Posts: 6
Joined: Mon Jan 28, 2008 8:34 am
Location: The Netherlands

WU 3043 error

Post by Rolo71 »

WU 3043 always fails at 64% on my computer.
This is the log:

[20:52:17] Completed 6400000 out of 10000000 steps (64 percent)
[21:01:21] Warning: long 1-4 interactions
[21:01:21] Gromacs cannot continue further.
[21:01:21] Going to send back what have done.
[21:01:21] logfile size: 169213
[21:01:21] - Writing 169749 bytes of core data to disk...
[21:01:21] ... Done.
[21:01:21] - Failed to delete work/wudata_07.chk
[21:01:22] - Failed to delete work/wudata_07.sas
[21:01:22] Warning: check for stray files
[21:03:22]
[21:03:22] Folding@home Core Shutdown: EARLY_UNIT_END
[21:03:22]
[21:03:22] Folding@home Core Shutdown: EARLY_UNIT_END
[21:03:25] CoreStatus = 7B (123)
[21:03:25] Client-core communications error: ERROR 0x7b
[21:03:25] Deleting current work unit & continuing...


I'm running a [email protected] with 2 SMP clients from different folders and with different machine IDs.
I'm also using the FAH affinity changer.
Last edited by Rolo71 on Thu Mar 20, 2008 1:58 pm, edited 1 time in total.
anandhanju
Posts: 522
Joined: Mon Dec 03, 2007 4:33 am
Location: Australia

Re: WU 3043 error

Post by anandhanju »

Please post the Run, Clone, and Gen numbers from the log in this format:


Project: 3043 (Run xx, Clone yy, Gen zz)
This will help one of the moderators look up the ones that EUEd on your system.
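
If the log is long, a small script can pull those numbers out. This is a minimal Python sketch, not anything official: it assumes the log prints the line exactly in the format above, and the function names are invented for illustration.

```python
import re

# Pattern for the "Project: P (Run R, Clone C, Gen G)" line in the client log.
PRCG_RE = re.compile(
    r"Project:\s*(\d+)\s*\(Run\s*(\d+),\s*Clone\s*(\d+),\s*Gen\s*(\d+)\)"
)


def find_prcg(log_text):
    """Return unique (project, run, clone, gen) tuples found in the log,
    in order of first appearance."""
    seen = []
    for m in PRCG_RE.finditer(log_text):
        prcg = tuple(int(g) for g in m.groups())
        if prcg not in seen:
            seen.append(prcg)
    return seen


def format_prcg(prcg):
    """Format a tuple back into the form requested for forum reports."""
    return "Project: {} (Run {}, Clone {}, Gen {})".format(*prcg)
```

Calling `find_prcg` on the text of the log and passing each result to `format_prcg` gives lines ready to paste into a post.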
Rolo71
Posts: 6
Joined: Mon Jan 28, 2008 8:34 am
Location: The Netherlands

Re: WU 3043 error

Post by Rolo71 »

It's Project: 3043 (Run 6, Clone 39, Gen 75)
Leoslocks
Posts: 120
Joined: Fri Jan 25, 2008 3:20 am
Hardware configuration: Q6600 | P35-DQ6 | Crucial 2 x 1 GB ram | VisionTek 3870
GPU2 Version 6.20| CPU three 6.20 Clients

Re: WU 3043 error

Post by Leoslocks »

I have completed two 3043 WUs on a stock Q6600. Have you completed WUs without the Affinity Changer?


[00:36:13] Project: 3043 (Run 0, Clone 82, Gen 6)
[00:36:13] 
[00:36:13] Entering M.D.
[00:36:19] 
[00:36:19] Writing local files
[00:36:19] Extra SSE boost OK.
[00:36:19] SMP-emsv-03Extra SSE boost OK.
[00:36:19] 
[00:36:19] Extra SSE boost OK.
[00:36:19] Writing local files
[00:36:19] Completed 0 out of 10000000 steps  (0 percent)
[00:45:40] Writing local files
[00:45:40] Completed 100000 out of 10000000 steps  (1 percent)
///////////////////////////////////////////////////////
[16:21:56] Completed 10000000 out of 10000000 steps  (100 percent)
[16:21:56] Writing final coordinates.
[16:21:56] Past main M.D. loop
[16:21:56] Will end MPI now
anandhanju
Posts: 522
Joined: Mon Dec 03, 2007 4:33 am
Location: Australia

Re: WU 3043 error

Post by anandhanju »

Rolo71 wrote:It's Project: 3043 (Run 6, Clone 39, Gen 75)
Please see this topic indicating similar reports with this WU (although at a different %). Can a mod please look this up and advise?
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: WU 3043 error

Post by 7im »

One record, obviously not completed fully...

Your WU (P3043 R6 C39 G75) was added to the stats database on 2007-12-22 08:45:14 for 172.67 points of credit.

Nothing more recent.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: WU 3043 error

Post by bruce »

anandhanju wrote:Please see this topic indicating similar reports with this WU (although at a different %). Can a mod please look this up and advise?
Sure, but I'm not sure that's too helpful. The WU has been returned only once, and that was from Linux for partial credit.
Your WU (P3043 R6 C39 G75) was added to the stats database on 2007-12-22 08:45:14 for 172.67 points of credit.

We don't have access to any numbers about how many times it has been assigned. The fact that Windows Beta SMP tends to delete most WUs after an error makes it very difficult to tell what's going on in situations like this.

I'll merge the two threads since they're about the same WU.

EDIT: Oops. 7im beat me to it.
Rolo71
Posts: 6
Joined: Mon Jan 28, 2008 8:34 am
Location: The Netherlands

Re: Project: 3043 (Run 6, Clone 39, Gen 75)

Post by Rolo71 »

The log above is from March 19.

This is the log from March 10:

[12:02:53] Working on Unit 06 [March 10 12:02:53]
[12:02:53] + Working ...
[12:02:53]
[12:02:53] *------------------------------*
[12:02:53] Folding@Home Gromacs SMP Core
[12:02:53] Version 1.74 (March 10, 2007)
[12:02:53]
[12:02:53] Preparing to commence simulation
[12:02:53] - Ensuring status. Please wait.
[12:02:54] - Couldn't send HTTP request to server
[12:02:54] + Could not connect to Work Server (results)
[12:02:54] (171.64.65.64:8080)
[12:02:54] - Error: Could not transmit unit 05 (completed March 10) to work server.


[12:02:54] + Attempting to send results
[12:03:10] - Looking at optimizations...
[12:03:10] - Working with standard loops on this execution.
[12:03:10] Examination of work files indicates 8 consecutive improper terminations of core.
[12:03:11] - Expanded 283027 -> 1508541 (decompressed 533.0 percent)
[12:03:11]
[12:03:11] Project: 3043 (Run 6, Clone 39, Gen 75)
[12:03:11]
[12:03:11] Entering M.D.
[12:03:14] - Couldn't send HTTP request to server
[12:03:14] + Could not connect to Work Server (results)
[12:03:14] (171.64.122.76:8080)
[12:03:14] Could not transmit unit 05 to Collection server; keeping in queue.
[12:03:20] Calling FAH init
[12:03:20] Read topology
[12:03:21] (Starting from checkpoint)
[12:03:22] SSE boost OK.
[12:03:22] t
[12:03:22] Protein: 9684 p3029_SMP-emsv-03
[12:03:22] Writing local files
[12:03:22] Extra SSE boost OK.
[12:03:22] Writing local files
[12:03:22] Completed 0 out of 10000000 steps (0 percent)
[12:20:30] Writing local files
[19:13:51] Writing local files
[19:13:51] Completed 3100000 out of 10000000 steps (31 percent)
[19:21:43] Gromacs cannot continue further.
[19:21:43] Going to send back what have done.
[19:21:43] logfile size: 106449
[19:21:43] - Writing 106985 bytes of core data to disk...
[19:21:43] ... Done.
[19:21:43] - Failed to delete work/wudata_06.xtc
[19:21:44] - Failed to delete work/wudata_06.bed
[19:21:44] - Failed to delete work/wudata_06.sas
[19:21:44] - Failed to delete work/wudata_06.goe
[19:21:44] Warning: check for stray files
[19:23:44]
[19:23:44] Folding@home Core Shutdown: EARLY_UNIT_END
[19:23:44]
[19:23:44] Folding@home Core Shutdown: EARLY_UNIT_END
[19:23:47] CoreStatus = 7B (123)
[19:23:47] Client-core communications error: ERROR 0x7b
[19:23:47] Deleting current work unit & continuing...
Last edited by Rolo71 on Thu Mar 20, 2008 5:36 pm, edited 3 times in total.
Rolo71
Posts: 6
Joined: Mon Jan 28, 2008 8:34 am
Location: The Netherlands

Re: Project: 3043 (Run 6, Clone 39, Gen 75)

Post by Rolo71 »

This is the log from March 13:

[09:48:34] Working on Unit 00 [March 13 09:48:34]
[09:48:34] + Working ...
[09:48:34]
[09:48:34] *------------------------------*
[09:48:34] Folding@Home Gromacs SMP Core
[09:48:34] Version 1.74 (March 10, 2007)
[09:48:34]
[09:48:34] Preparing to commence simulation
[09:48:34] - Ensuring status. Please wait.
[09:48:34] - Starting from initial work packet
[09:48:34]
[09:48:34] Project: 3043 (Run 6, Clone 39, Gen 75)
[09:48:34]
[09:48:34] Assembly optimizations on if available.
[09:48:34] Entering M.D.
[09:48:51] ial work packet
[09:48:52] rting from initial work packet
[09:48:52]
[09:48:52] Project: 3043 (Run 6, Clone 39, Gen 75)
[09:48:52]
[09:48:52] Entering M.D.
[09:48:58]
[09:48:58] Writing local files
[09:48:58] Extra SSE boost OK.
[09:48:58] SMP-emsv-03Extra SSE boost OK.
[09:48:58]
[09:48:58] Extra SSE boost OK.
[09:48:58] Writing local files
[09:48:58] Completed 0 out of 10000000 steps (0 percent)
[10:02:23] Writing local files
[00:26:41] Completed 6400000 out of 10000000 steps (64 percent)
[00:35:48] Warning: long 1-4 interactions
[00:35:48] Gromacs cannot continue further.
[00:35:48] Going to send back what have done.
[00:35:48] logfile size: 169213
[00:35:48] - Writing 169749 bytes of core data to disk...
[00:35:48] ... Done.
[00:35:48] - Failed to delete work/wudata_00.arc
[00:35:48] - Failed to delete work/wudata_00.xtc
[00:35:48] No C.P. to delete.
[00:35:48] - Failed to delete work/wudata_00.sas
[00:35:48] Warning: check for stray files
[00:37:49]
[00:37:49] Folding@home Core Shutdown: EARLY_UNIT_END
[00:37:49]
[00:37:49] Folding@home Core Shutdown: EARLY_UNIT_END
[00:37:53] CoreStatus = 7B (123)
[00:37:53] Client-core communications error: ERROR 0x7b
[00:37:53] Deleting current work unit & continuing...
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 3043 (Run 6, Clone 39, Gen 75)

Post by bruce »

So far it appears that every WU is getting ERROR 0x7b which seems to mean very little. All we really know is that Windows cancelled the folding process, with no explanation as to why.

You can TRY stopping the folding process not long before the "Warning: long 1-4 interactions" message. Make a backup. Then resume processing. For some unexplained reason, some people have been able to pass the error and complete WUs like that one after a stop/resume.
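
For the "make a backup" step, something like the following Python sketch would do. It is only an illustration under a few assumptions: the client has already been stopped, the whole client folder (including its `work` subfolder) is copied rather than individual files, and the function name and paths are hypothetical.

```python
import shutil
import time
from pathlib import Path


def backup_client_dir(client_dir, backup_root):
    """Copy the whole (stopped) client folder into a timestamped
    directory under backup_root and return the new path.

    Assumption: backup_root is outside client_dir, and the client is
    not running while the copy is made.
    """
    client_dir = Path(client_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{client_dir.name}-{stamp}"
    shutil.copytree(client_dir, dest)  # creates dest and any missing parents
    return dest


# Example with hypothetical paths:
#   backup_client_dir("C:/FAH/smp1", "D:/fah-backups")
```

If the WU fails again after resuming, the copy can be restored over the client folder (again with the client stopped) to retry from the same checkpoint.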
Rolo71
Posts: 6
Joined: Mon Jan 28, 2008 8:34 am
Location: The Netherlands

Re: Project: 3043 (Run 6, Clone 39, Gen 75)

Post by Rolo71 »

I finally managed to fold this WU completely.

Stopped the console at 60%, then restarted my computer and continued. It went straight to 100%.