Project: 2605 (Run 9, Clone 571, Gen 5)

Tigerbiten
Posts: 62
Joined: Sun Dec 02, 2007 6:02 am

Project: 2605 (Run 9, Clone 571, Gen 5)

Post by Tigerbiten »

This work unit would not run on my box.
It shut itself down right at the start. I deleted it 3 times to get another work unit.

Log files ................
First attempt

[13:41:34] + Processing work unit
[13:41:34] Core required: FahCore_a1.exe
[13:41:34] Core found.
[13:41:34] Working on Unit 03 [December 23 13:41:34]
[13:41:34] + Working ...
[13:41:34] 
[13:41:34] *------------------------------*
[13:41:34] Folding@Home Gromacs SMP Core
[13:41:34] Version 1.74 (November 27, 2006)
[13:41:34] 
[13:41:34] Preparing to commence simulation
[13:41:34] - Ensuring status. Please wait.
[13:41:35] - Starting from initial work packet
[13:41:35] 
[13:41:35] Project: 2605 (Run 9, Clone 571, Gen 5)
[13:41:35] 
[13:41:35] Assembly optimizations on if available.
[13:41:35] Entering M.D.
[13:41:52] 0 percent)
[13:41:52] - Starting from initial work packet
[13:41:52] 
[13:41:52] Project: 2605 (Run 9, Clone 571, Gen 5)
[13:41:52] 
[13:41:52] Entering M.D.
[13:41:59] Protein: ProteExtra SSE boost OK.
[13:41:59] ocal files
[13:41:59] Extra SSE boost OK.
[13:42:00] Finalizing output
[13:42:00] UPTED
[13:42:04] CoreStatus = 66 (102)
[13:42:04] + Shutdown requested by user. Exiting.
Folding@Home Client Shutdown.
Restarted ..............

[16:16:19] Loaded queue successfully.
[16:16:19] 
[16:16:19] + Processing work unit
[16:16:19] Core required: FahCore_a1.exe
[16:16:19] Core found.
[16:16:19] Working on Unit 03 [December 23 16:16:19]
[16:16:19] + Working ...
[16:16:19] 
[16:16:19] *------------------------------*
[16:16:19] Folding@Home Gromacs SMP Core
[16:16:19] Version 1.74 (November 27, 2006)
[16:16:19] 
[16:16:19] Preparing to commence simulation
[16:16:19] - Ensuring status. Please wait.
[16:16:19] 
[16:16:19] Project: 2605 (Run 9, Clone 571, Gen 5)
[16:16:19] 
[16:16:19] Assembly optimizations on if available.
[16:16:19] Entering M.D.
[16:16:36] 
[16:16:36] - Expanded 2435509 -> 12886013 (decompressed 529.0 percent)
[16:16:36] 
[16:16:36] Project: 2605 (Run 9, Clone 571, Gen 5)
[16:16:36] 
[16:16:37] Entering M.D.
[16:16:43] s
[16:16:43] Extra SSE boost OK.
[16:16:43] E boost OK.
[16:16:43] ocal files
[16:16:43] Extra SSE boost OK.
[16:16:44] Finalizing output
[16:16:44] nt)
[16:16:44] 
[16:16:44] Folding@home Core Shutdown: INTERRUPTED
[16:16:48] CoreStatus = 66 (102)
[16:16:48] + Shutdown requested by user. Exiting.
Folding@Home Client Shutdown.
The next 2 runs had identical results.

Luck ................ :D
toTOW
Site Moderator
Posts: 6349
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: p2605 (Run 9, Clone 571, Gen 5 )

Post by toTOW »

Hmm, the fahwiki doesn't help much :( : http://fahwiki.net/index.php/CoreStatus_codes#66
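
For anyone comparing failures across runs, here is a minimal Python sketch that pulls the `CoreStatus` lines out of a client log. It assumes only the line format visible in the logs posted in this thread (`[HH:MM:SS] CoreStatus = 66 (102)`); the function names and the sample text are illustrative, not part of the FAH client. The number in parentheses looks like the decimal form of the hex code (0x66 = 102), but the sketch does not rely on that.

```python
import re
from collections import Counter

# Matches lines such as "[13:42:04] CoreStatus = 66 (102)".
CORE_STATUS = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2})\] CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)"
)


def core_status_events(log_text):
    """Return (timestamp, hex_code, decimal_code) for each CoreStatus line."""
    events = []
    for line in log_text.splitlines():
        match = CORE_STATUS.match(line.strip())
        if match:
            timestamp, hex_code, decimal = match.groups()
            events.append((timestamp, hex_code.upper(), int(decimal)))
    return events


def summarize(log_text):
    """Count how often each CoreStatus hex code appears in the log."""
    return Counter(code for _, code, _ in core_status_events(log_text))


# Sample lines copied from the logs in this thread.
SAMPLE = """\
[13:42:00] Finalizing output
[13:42:04] CoreStatus = 66 (102)
[08:23:42] CoreStatus = 0 (0)
[08:23:42] Client-core communications error: ERROR 0x0
"""

if __name__ == "__main__":
    for timestamp, code, decimal in core_status_events(SAMPLE):
        print(timestamp, code, decimal)
```

Running `summarize` over a full log makes it easy to see whether a work unit always fails with the same code or alternates between several.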

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
preet.to
Posts: 19
Joined: Sun Dec 16, 2007 3:20 pm

Re: Project: 2605 (Run 9, Clone 571, Gen 5)

Post by preet.to »

I have had repeated issues with 2605. It fails to delete all the WU files in the work directory. Then on the next run it gets to 100% complete and fails. Nothing gets uploaded and I lose all the points. Since I run the Linux version with SMP, I keep getting reassigned this WU. I have managed to lose over 12,000 points now.

If I catch a WU early, I wipe out the queue, work folder and all other WU files and restart. That lets me get a couple of WUs completed before the cycle of errors starts up again.

Is there any hope here? Is anyone working on this problem or should I back off the Beta and continue there?

This problem has been vexing me for a couple of months.

Thanks
MoneyGuyBK
Posts: 179
Joined: Sun Dec 02, 2007 6:40 am
Location: Team_XPS ..... OC, S. Calif

Re: Project: 2605 (Run 9, Clone 571, Gen 5)

Post by MoneyGuyBK »

preet.to wrote:I have had repeated issues with 2605. It fails to delete all the WU files in the work directory. Then on the next run it gets to 100% complete and fails. Nothing gets uploaded and I lose all the points. Since I run the Linux version with SMP, I keep getting reassigned this WU. I have managed to lose over 12,000 points now.
If I catch a WU early, I wipe out the queue, work folder and all other WU files and restart. That lets me get a couple of WUs completed before the cycle of errors starts up again.
Is there any hope here? Is anyone working on this problem or should I back off the Beta and continue there?
This problem has been vexing me for a couple of months.
Thanks
That same issue has been with me for the last two months.
To date, I have lost over 17 WUs (almost 30K points) :roll:
I, too, would like to know whether there is an answer to these issues, what causes them, and whether there is a cure for a WU that completes to 100% without earning any points.

Hopefully we will have some luck with this in 2008 as we did not in 2007!!!

Peace
T.E.A.M. “Together Everyone Accomplishes Miracles!”
OC, S. California ... God Bless All
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 2605 (Run 9, Clone 571, Gen 5)

Post by bruce »

I don't know what's causing your problem, but I do have a guess. Check the permissions on the directory in which the fah client is running. It could be that when FAH creates the "work" subdirectory, it does not have ownership and full permissions on the directory and all of the files. The same may be true for the containing directory.
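
A rough way to run that check on Linux or macOS is sketched below in Python. It walks a directory tree and reports anything the current user does not own or cannot read and write. The function name and the `example.dat` file are hypothetical, and the demo uses a throwaway temporary directory; point it at the real client directory instead.

```python
import os
import stat
import tempfile


def permission_problems(directory):
    """List (path, reason) pairs for entries the current user cannot fully use."""
    problems = []
    # os.getuid is not available on Windows, so ownership is skipped there.
    uid = os.getuid() if hasattr(os, "getuid") else None
    for root, _dirs, files in os.walk(directory):
        for path in [root] + [os.path.join(root, name) for name in files]:
            info = os.stat(path)
            if uid is not None and info.st_uid != uid:
                problems.append((path, "not owned by current user"))
            needed = os.R_OK | os.W_OK
            if stat.S_ISDIR(info.st_mode):
                needed |= os.X_OK  # directories also need search permission
            if not os.access(path, needed):
                problems.append((path, "missing read/write access"))
    return problems


if __name__ == "__main__":
    # Demo on a temporary stand-in for the client directory.
    base = tempfile.mkdtemp()
    work = os.path.join(base, "work")
    os.mkdir(work)
    with open(os.path.join(work, "example.dat"), "w") as handle:
        handle.write("placeholder")
    for path, reason in permission_problems(base):
        print(path, "->", reason)
```

An empty result only rules out the simple ownership and access-bit cases; it says nothing about ACLs or about files created later by the core.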
preet.to
Posts: 19
Joined: Sun Dec 16, 2007 3:20 pm

Re: Project: 2605 (Run 9, Clone 571, Gen 5)

Post by preet.to »

Thanks for the idea. The permissions are fine.

You did give me an idea. What if the thread that is trying to delete did not wait for a thread trying to write? This would explain why a WU never completes. It is trying to collect all the data and a thread is still writing. This appears to be a timing issue with the SMP architecture.
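
To illustrate the kind of ordering I mean, here is a small Python sketch using plain threads. It is only an analogy: how the SMP core actually coordinates its workers is not shown anywhere in this thread, and the file name and function names are made up. The point is that the cleanup step waits for the writer to finish before it reads and deletes the file.

```python
import os
import tempfile
import threading
import time


def write_results(path, chunks, delay=0.01):
    """Simulated worker: appends chunks to a file, pausing between writes."""
    with open(path, "a") as handle:
        for chunk in chunks:
            handle.write(chunk)
            handle.flush()
            time.sleep(delay)


def collect_then_delete(path, writer):
    """Wait for the writer, then read the complete result and remove the file."""
    writer.join()  # synchronization point: nothing touches the file before this
    with open(path) as handle:
        data = handle.read()
    os.remove(path)
    return data


def run_demo():
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "result.tmp")  # hypothetical file name
    chunks = ["step-%d\n" % i for i in range(5)]
    writer = threading.Thread(target=write_results, args=(path, chunks))
    writer.start()
    data = collect_then_delete(path, writer)
    os.rmdir(workdir)
    return data, chunks


if __name__ == "__main__":
    data, chunks = run_demo()
    print("complete" if data == "".join(chunks) else "truncated")
```

Without the `join()` call, the cleanup could read a partial file or try to delete it while the writer is still appending to it, which would look a lot like leftover files and a result that never finalizes.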
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 2605 (Run 9, Clone 571, Gen 5)

Post by bruce »

preet.to wrote:Thanks for the idea. The permissions are fine.

You did give me an idea. What if the thread that is trying to delete did not wait for a thread trying to write? This would explain why a WU never completes. It is trying to collect all the data and a thread is still writing. This appears to be a timing issue with the SMP architecture.
I'll pass it on to the developers. With software this complex, a race condition is always possible.
kasson
Pande Group Member
Posts: 1459
Joined: Thu Nov 29, 2007 9:37 pm

Re: Project: 2605 (Run 9, Clone 571, Gen 5)

Post by kasson »

It's a good idea. We try to allow for such things, but the MPI timing is a bit tricky.
Tigerbiten
Posts: 62
Joined: Sun Dec 02, 2007 6:02 am

Re: Project: 2605 (Run 9, Clone 571, Gen 5)

Post by Tigerbiten »

Got it again and the same thing happened.
The work unit shuts the client down.

[20:42:10] Project: 2605 (Run 9, Clone 571, Gen 5)
[20:42:10] 
[20:42:10] Assembly optimizations on if available.
[20:42:10] Entering M.D.
[20:42:27] 0 percent)
[20:42:27] - Starting from initial work packet
[20:42:27] 
[20:42:27] Project: 2605 (Run 9, Clone 571, Gen 5)
[20:42:27] 
[20:42:27] Entering M.D.
[20:42:35] Protein: Protein in POPC
[20:42:35] Writing local files
[20:42:36] Extra SSE boost OK.
[20:42:37] e Shutdown: INTERRUPTED
[20:42:42] CoreStatus = 66 (102)
[20:42:42] + Shutdown requested by user. Exiting.
Going to try again after deleting this copy of the work unit.

Edit: Same thing happened on the next two tries.
Tigerbiten
Posts: 62
Joined: Sun Dec 02, 2007 6:02 am

Re: Project: 2605 (Run 9, Clone 571, Gen 5)

Post by Tigerbiten »

Got it for a third time.
Same thing happened.
Has anyone been able to fold this protein?

[02:05:24] Folding@Home Gromacs SMP Core
[02:05:24] Version 1.74 (November 27, 2006)
[02:05:24] 
[02:05:24] Preparing to commence simulation
[02:05:24] - Ensuring status. Please wait.
[02:05:25] - Starting from initial work packet
[02:05:25] 
[02:05:25] Project: 2605 (Run 9, Clone 571, Gen 5)
[02:05:25] 
[02:05:25] Assembly optimizations on if available.
[02:05:25] Entering M.D.
[02:05:42] 0 percent)
[02:05:42] - Starting from initial work packet
[02:05:42] 
[02:05:42] Project: 2605 (Run 9, Clone 571, Gen 5)
[02:05:42] 
[02:05:42] Entering M.D.
[02:05:50] Protein: Protein in POPC
[02:05:50] Writing local files
[02:05:52] Extra SSE boost OK.
[02:05:53] e Shutdown: INTERRUPTED
[02:05:57] CoreStatus = 66 (102)
[02:05:57] + Shutdown requested by user. Exiting.***** Got a SIGTERM signal (15)
[02:05:57] Killing all core threads

Folding@Home Client Shutdown.
Luck ........... :D
Flathead74
Posts: 266
Joined: Sun Dec 02, 2007 6:08 pm
Location: Central New York

Re: Project: 2605 (Run 9, Clone 571, Gen 5)

Post by Flathead74 »

One of my teammates has also tried to process this WU with the following results:

Immediate failure, each time.

Project: 2605 (Run 9, Clone 571, Gen 5)
[21:25:23] Completed 0 out of 500000 steps (0 percent)
[21:25:28] CoreStatus = 0 (0)
[21:25:28] Client-core communications error: ERROR 0x0
[21:25:28] - Attempting to download new core...

Sir_Loin - 1/4/2007
*3x and out. Downloaded new core.
E6550 @ 3360 / 2gb PC2-6400 1:1 / Ubuntu 7.1 on MSI G33M-FI
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 2605 (Run 9, Clone 571, Gen 5)

Post by bruce »

Nobody has returned Project: 2605 (Run 9, Clone 571, Gen 5) yet.
Tigerbiten
Posts: 62
Joined: Sun Dec 02, 2007 6:02 am

Re: Project: 2605 (Run 9, Clone 571, Gen 5)

Post by Tigerbiten »

Just got it again.
Same result ............. :(

First try ...............

[04:55:33] Project: 2605 (Run 9, Clone 571, Gen 5)
[04:55:33] 
[04:55:33] Assembly optimizations on if available.
[04:55:33] Entering M.D.
[04:55:50] 0 percent)
[04:55:50] - Starting from initial work packet
[04:55:50] 
[04:55:50] Project: 2605 (Run 9, Clone 571, Gen 5)
[04:55:50] 
[04:55:50] Entering M.D.
[04:55:58] Protein: Protein in POPC
[04:55:58] Writing local files
[04:55:59] Extra SSE boost OK.
[04:56:00] cal files
[04:56:00] Completed 0 out of 500000 steps  (0 percent)
[04:56:00] 
[04:56:00] Folding@home Core Shutdown: INTERRUPTED
[04:56:04] CoreStatus = 66 (102)
[04:56:04] + Shutdown requested by user. Exiting.***** Got a SIGTERM signal (15)
[04:56:04] Killing all core threads

Restarted ................

[08:23:27] Project: 2605 (Run 9, Clone 571, Gen 5)
[08:23:27] 
[08:23:28] Entering M.D.
[08:23:35] g local files
[08:23:36] in in POPC
[08:23:36] Writing local files
[08:23:37] Extra SSE boost OK.
[08:23:38] 00 steps  (0 percent)
[08:23:42] CoreStatus = 0 (0)
[08:23:42] Client-core communications error: ERROR 0x0
[08:23:42] Deleting current work unit & continuing...
Stopped the client and deleted the work by hand.
Restarted and got ................

[08:41:02] Project: 2605 (Run 9, Clone 571, Gen 5)
[08:41:02] 
[08:41:02] Assembly optimizations on if available.
[08:41:02] Entering M.D.
[08:41:19] 0 percent)
[08:41:19] cket
[08:41:19] 
[08:41:19] Project: 2605 (Run 9, Clone 571, Gen 5)
[08:41:19] 
[08:41:19] Entering M.D.
[08:41:20] one 571, Gen 5)
[08:41:20] 
[08:41:20] Entering M.D.
[08:41:27] Protein: Protein in POPC
[08:41:27] Writing local files
[08:41:28] Extra SSE boost OK.
[08:41:29] cal files
[08:41:29] Completed 0 out of 500000 steps  (0 percent)
[08:42:48] CoreStatus = 0 (0)
[08:42:48] Client-core communications error: ERROR 0x0
[08:42:48] Deleting current work unit & continuing...
Third time

[08:48:48] Project: 2605 (Run 9, Clone 571, Gen 5)
[08:48:48] 
[08:48:48] Assembly optimizations on if available.
[08:48:48] Entering M.D.
[08:50:26] 0 percent)
[08:50:26] - Starting from initial work packet
[08:50:26] 
[08:50:26] Project: 2605 (Run 9, Clone 571, Gen 5)
[08:50:26] 
[08:50:26] Entering M.D.
[08:50:34] Protein: Protein in POPC
[08:50:34] Writing local files
[08:50:35] Extra SSE boost OK.
[08:50:35] cal files
[08:50:36] Completed 0 out of 500000 steps  (0 percent)
[08:50:36] 
[08:50:36] Folding@home Core Shutdown: INTERRUPTED
[08:50:40] CoreStatus = 66 (102)
[08:50:40] + Shutdown requested by user. Exiting.
I've deleted it again.
I have a saved copy of the work folder and queue.dat if you want it.

Luck ......... :D
Flathead74
Posts: 266
Joined: Sun Dec 02, 2007 6:08 pm
Location: Central New York

Re: Project: 2605 (Run 9, Clone 571, Gen 5)

Post by Flathead74 »

This evidently bad WU is still being handed out.

Thanks for that.

[05:44:45] Project: 2605 (Run 9, Clone 571, Gen 5)

[05:45:11] Completed 0 out of 500000 steps (0 percent)
[05:45:11] Folding@home Core Shutdown: INTERRUPTED
[05:45:16] CoreStatus = 0 (0)
[05:45:16] Client-core communications error: ERROR 0x0
[05:45:16] Deleting current work unit & continuing...

*3x failure, manually deleted after third failure

3.0GHz Nocona Xeon x 2 / DH800 / 1GB PC3200 / Suse 10.1

A teammate has already been blessed with this WU twice, with the same exact mode of failure each time.
See post dated Jan 08, 2008, in this thread for reference.

Maybe it is time to pull this one from circulation?
klasseng
Posts: 125
Joined: Thu Dec 27, 2007 6:08 am
Hardware configuration: System #1, Quad GPU:
Motherboard: Asus Rampage IV Extreme
CPU: 6 Core Intel i7 (3930K)
GPU: 4 X NVIDIA GeForce GTS 450
OS: Windows 7 Home Premium, 64-bit
RAM: 16GB

System #2:
MacPro 2,1 (Early 2007)
Dual Quad-Core Intel Xeon 3GHz (X5365)
9GB Memory
OS: Mac OS X 10.7.5
GPU: N/A
Location: Canada

Re: Project: 2605 (Run 9, Clone 571, Gen 5)

Post by klasseng »

This one just died in similar fashion on my Mac Mini (2GHz Core2Duo Intel).
peace,
Grant