Project: 3062 (Run 2, Clone 71, Gen 7)


314159
Posts: 232
Joined: Sun Dec 02, 2007 2:46 am
Location: http://www.teammacosx.org/

Post by 314159 »

Q6600, Linux Client, Stable Machine, Stock Clock


[23:36:42] *------------------------------*
[23:36:42] Folding@Home Gromacs SMP Core
[23:36:59] - Starting from initial work packet 
[23:36:59] Project: 3062 (Run 2, Clone 71, Gen 7)
[23:37:06] Protein: p3062_lambda5_99sbExtra SSE boost OK.
[23:37:06] Extra SSE boost OK.
[23:37:06] Writing local files
[23:37:06] Completed 0 out of 5000000 steps  (0 percent)

[08:57:18] Writing local files
[08:57:18] Completed 2800000 out of 5000000 steps  (56 percent)
[09:07:17] Warning:  long 1-4 interactions
[09:07:21] CoreStatus = 0 (0)
[09:07:21] Client-core communications error: ERROR 0x0
[09:07:21] Deleting current work unit & continuing...

[09:11:55] *------------------------------*
[09:11:55] Folding@Home Gromacs SMP Core
[09:11:55] Version 1.74 (November 27, 2006)
[09:12:12] Project: 3062 (Run 2, Clone 71, Gen 7)
[09:12:19] Protein: p3062_lambda5_99sbExtra SSE boost OK.
[09:12:19] Extra SSE boost OK.
[09:12:19] Completed 0 out of 5000000 steps  (0 percent)
[14:43:14] Completed 1650000 out of 5000000 steps  (33 percent)

I do not understand the "logic" of reassigning the same WU to a machine that has just hit a client-core communications error on it.
I also do not have time to babysit the second (or third) assignment of the same WU.

Last week, when I was unavailable, I noted that one SMP WU had been assigned three times consecutively after failing with ERROR 0x0 at the same point each time.
That machine, also a Q6600, then completed three other WUs successfully, after which the same offending WU was assigned yet again.

The error trapping in the SMP clients is quite substandard, and perhaps another approach to this issue would be appropriate until the code is improved.
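
For anyone who wants to spot this pattern without watching the client, here is a rough Python sketch that scans FAHlog.txt lines for a WU that is started again after an earlier attempt ended in a client-core communications error. It relies only on the line formats quoted in the excerpt above. The function names are made up for illustration, and it assumes each "Project: ..." line marks the start of a new attempt, which may not hold for every client version.

```python
import re

# Patterns taken from the log lines quoted in this thread (an assumption
# about the format, not an official specification of FAHlog.txt).
WU_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
STEP_RE = re.compile(r"Completed (\d+) out of (\d+) steps")
ERROR_MARKER = "Client-core communications error"


def summarize_attempts(log_lines):
    """Return one dict per attempt: WU id, last step logged, and failure flag."""
    attempts = []
    current = None
    for line in log_lines:
        wu_match = WU_RE.search(line)
        if wu_match:
            # Assumption: every Project line begins a new attempt.
            current = {
                "wu": tuple(int(g) for g in wu_match.groups()),
                "last_step": 0,
                "failed": False,
            }
            attempts.append(current)
            continue
        if current is None:
            continue
        step_match = STEP_RE.search(line)
        if step_match:
            current["last_step"] = int(step_match.group(1))
        elif ERROR_MARKER in line:
            current["failed"] = True
    return attempts


def repeated_after_failure(attempts):
    """Return the WU ids that were started again after an earlier failed attempt."""
    failed, repeated = set(), set()
    for attempt in attempts:
        if attempt["wu"] in failed:
            repeated.add(attempt["wu"])
        if attempt["failed"]:
            failed.add(attempt["wu"])
    return repeated


sample = [
    "[23:36:59] Project: 3062 (Run 2, Clone 71, Gen 7)",
    "[08:57:18] Completed 2800000 out of 5000000 steps  (56 percent)",
    "[09:07:17] Warning:  long 1-4 interactions",
    "[09:07:21] Client-core communications error: ERROR 0x0",
    "[09:12:12] Project: 3062 (Run 2, Clone 71, Gen 7)",
    "[14:43:14] Completed 1650000 out of 5000000 steps  (33 percent)",
]

attempts = summarize_attempts(sample)
repeated = repeated_after_failure(attempts)
print(repeated)
```

Pointing `summarize_attempts` at the lines of a real FAHlog.txt (for example via `open(path).readlines()`) would give the same kind of summary. Deciding what to do with a repeated WU is still up to the donor.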
John (from the central part of the Commonwealth of Virginia, U.S.A.)

A friendly visitor to what hopefully will remain a friendly Forum.
With thanks to all of the dedicated volunteers on the staff here!!
susato
Site Moderator
Posts: 511
Joined: Fri Nov 30, 2007 4:57 am
Location: Team MacOSX
Contact:

Re: Project: 3062 (Run 2, Clone 71, Gen 7)

Post by susato »

This particular work unit represents an exception to the rule that when a unit fails from simulation instability (e.g. long 1-4 interactions), it will fail again:

Project 3062, Run 2, Clone 71, Gen 7

Donator: 314159 Team: 1971
CPUId: 759EXXXXXXXXXXXXE8
Credit: 1732 Credit Time: 2008-03-18 20:19:15
Entered into logs at: 2008-03-18 20:00:04
WU assigned to donor at: 2008-03-18 01:09:58
Days taken to complete WU: 0.78
Error code: 0
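
As a quick sanity check on the record above, the "Days taken" figure appears to be the gap between the assignment time and the "Entered into logs" time, not the later credit time. That is an inference from the numbers shown here, not a documented rule of the stats system. A small Python sketch of the arithmetic, with a made-up helper name:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def days_between(start, end):
    """Elapsed time between two timestamps, expressed in days."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() / 86400


# Timestamps copied from the stats record above.
assigned = "2008-03-18 01:09:58"
entered = "2008-03-18 20:00:04"
credited = "2008-03-18 20:19:15"

days_to_log = days_between(assigned, entered)
days_to_credit = days_between(assigned, credited)

print(f"{days_to_log:.2f}")     # 0.78, matches "Days taken to complete WU"
print(f"{days_to_credit:.2f}")  # 0.80
```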

Hi 314159 (team 1971),
Your WU (P3062 R2 C71 G7) was added to the stats database on 2008-03-18 20:19:15 for 1732 points of credit.

This entry represents your second try at the work unit, starting at 3/18/08, 9:09 UTC as shown in your FAHlog.txt excerpt.
Congrats on the completion - unusual in my experience with failed units. Did you perhaps stop and restart it partway, to help it finish?
314159
Posts: 232
Joined: Sun Dec 02, 2007 2:46 am
Location: http://www.teammacosx.org/

Re: Project: 3062 (Run 2, Clone 71, Gen 7)

Post by 314159 »

Contrary to my statement that I did not have time to baby-sit this WU, I spent considerable time attempting to complete it - using every trick known to me. :)

It was my initial attempt at this particular project and I really wanted it to complete. When it did, I broke out the (non-alcoholic) champagne.

Suspecting a Core crash as the cause, I restarted the WU several times and backed it up at each attempt. (I did not have to go to the backups.)

My only concern is whether the results returned are scientifically valid. My conclusion is that they are.

BTW, on two occasions, I happened to be refreshing the Mac SMP client JUST at the time it had logged long 1-4 interactions (talk about luck - and a true story).
I immediately stopped the client and upon restart, it picked up at the previous checkpoint in each case and completed normally!!

Thank you very much for the log detail. (you didn't think that I deleted it, did you?) !!!! :D
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengeance (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicone fan gaskets and silicone mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona
Contact:

Re: Project: 3062 (Run 2, Clone 71, Gen 7)

Post by 7im »

314159 wrote: "My only concern is whether the results returned are scientifically valid. My conclusion is that they are."

Good conclusion. Stanford has security and data checks during upload. If you got points, the data was accepted.
Tell me and I forget. Teach me and I remember. Involve me and I learn.