Project: 2605 (Run 9, Clone 571, Gen 5)
Posted: Sun Dec 23, 2007 4:33 pm
by Tigerbiten
This work unit would not run on my box.
It shut itself down right at the start. I deleted it 3 times to get another work unit.
Log files ................
First attempt
[13:41:34] + Processing work unit
[13:41:34] Core required: FahCore_a1.exe
[13:41:34] Core found.
[13:41:34] Working on Unit 03 [December 23 13:41:34]
[13:41:34] + Working ...
[13:41:34]
[13:41:34] *------------------------------*
[13:41:34] Folding@Home Gromacs SMP Core
[13:41:34] Version 1.74 (November 27, 2006)
[13:41:34]
[13:41:34] Preparing to commence simulation
[13:41:34] - Ensuring status. Please wait.
[13:41:35] - Starting from initial work packet
[13:41:35]
[13:41:35] Project: 2605 (Run 9, Clone 571, Gen 5)
[13:41:35]
[13:41:35] Assembly optimizations on if available.
[13:41:35] Entering M.D.
[13:41:52] 0 percent)
[13:41:52] - Starting from initial work packet
[13:41:52]
[13:41:52] Project: 2605 (Run 9, Clone 571, Gen 5)
[13:41:52]
[13:41:52] Entering M.D.
[13:41:59] Protein: ProteExtra SSE boost OK.
[13:41:59] ocal files
[13:41:59] Extra SSE boost OK.
[13:42:00] Finalizing output
[13:42:00] UPTED
[13:42:04] CoreStatus = 66 (102)
[13:42:04] + Shutdown requested by user. Exiting.
Folding@Home Client Shutdown.
Restarted ..............
[16:16:19] Loaded queue successfully.
[16:16:19]
[16:16:19] + Processing work unit
[16:16:19] Core required: FahCore_a1.exe
[16:16:19] Core found.
[16:16:19] Working on Unit 03 [December 23 16:16:19]
[16:16:19] + Working ...
[16:16:19]
[16:16:19] *------------------------------*
[16:16:19] Folding@Home Gromacs SMP Core
[16:16:19] Version 1.74 (November 27, 2006)
[16:16:19]
[16:16:19] Preparing to commence simulation
[16:16:19] - Ensuring status. Please wait.
[16:16:19]
[16:16:19] Project: 2605 (Run 9, Clone 571, Gen 5)
[16:16:19]
[16:16:19] Assembly optimizations on if available.
[16:16:19] Entering M.D.
[16:16:36]
[16:16:36] - Expanded 2435509 -> 12886013 (decompressed 529.0 percent)
[16:16:36]
[16:16:36] Project: 2605 (Run 9, Clone 571, Gen 5)
[16:16:36]
[16:16:37] Entering M.D.
[16:16:43] s
[16:16:43] Extra SSE boost OK.
[16:16:43] E boost OK.
[16:16:43] ocal files
[16:16:43] Extra SSE boost OK.
[16:16:44] Finalizing output
[16:16:44] nt)
[16:16:44]
[16:16:44] Folding@home Core Shutdown: INTERRUPTED
[16:16:48] CoreStatus = 66 (102)
[16:16:48] + Shutdown requested by user. Exiting.
Folding@Home Client Shutdown.
The next 2 runs had identical results.
Luck ................
Re: p2605 (Run 9, Clone 571, Gen 5 )
Posted: Sun Dec 23, 2007 11:03 pm
by toTOW
Re: Project: 2605 (Run 9, Clone 571, Gen 5)
Posted: Tue Jan 01, 2008 8:48 pm
by preet.to
I have had repeated issues with 2605. It fails to delete all the WU files in the work directory. Then on the next run it gets to 100% complete and fails. Nothing gets uploaded and I lose all the points. Since I run the Linux version with SMP, I keep getting reassigned this WU. I have managed to lose over 12,000 points now.
If I catch a WU early, I wipe out the queue, work folder and all other WU files and restart. That lets me get a couple of WUs complete before the cycle of errors starts up again.
Is there any hope here? Is anyone working on this problem or should I back off the Beta and continue there?
This problem has been vexing me for a couple of months.
Thanks
Re: Project: 2605 (Run 9, Clone 571, Gen 5)
Posted: Wed Jan 02, 2008 4:03 am
by MoneyGuyBK
preet.to wrote:I have had repeated issues with 2605. It fails to delete all the WU files in the work directory. Then on the next run it gets to 100% complete and fails. Nothing gets uploaded and I lose all the points. Since I run the Linux version with SMP, I keep getting reassigned this WU. I have managed to lose over 12,000 points now.
If I catch a WU early, I wipe out the queue, work folder and all other WU files and restart. That lets me get a couple of WUs complete before the cycle of errors starts up again.
Is there any hope here? Is anyone working on this problem or should I back off the Beta and continue there?
This problem has been vexing me for a couple of months.
Thanks
That same issue has been with me for the last two months.
To date, I have lost over 17 WUs (almost 30K points).
I, too, would like to know whether there is an answer to this issue and what causes it, and therefore a cure for a WU completing to 100% and not earning any points.
Hopefully we will have better luck with this in 2008 than we did in 2007!
Peace
Re: Project: 2605 (Run 9, Clone 571, Gen 5)
Posted: Wed Jan 02, 2008 12:26 pm
by bruce
I don't know what's causing your problem, but I do have a guess. Check the permissions on the directory in which the fah client is running. It could be that when FAH creates the "work" subdirectory, it does not have ownership and full permissions on the directory and all of the files. The same may be true for the containing directory.
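For anyone who wants to check this quickly, here is a rough Python sketch of the idea, not anything taken from the FAH client itself. It walks a directory tree and lists every entry the current user either does not own or cannot fully access. The function names `describe_access` and `find_problems` are made up for this example, and it assumes a POSIX system (Linux or macOS), since it relies on `os.geteuid`.

```python
import os
import stat


def describe_access(path):
    """Report ownership and access for one path, from the current user's point of view."""
    info = os.stat(path)
    # Directories need read, write and traverse (execute); plain files need read and write.
    needed = os.R_OK | os.W_OK
    if os.path.isdir(path):
        needed |= os.X_OK
    return {
        "path": path,
        "mode": stat.filemode(info.st_mode),
        "owned_by_me": info.st_uid == os.geteuid(),
        "full_access": os.access(path, needed),
    }


def find_problems(root):
    """Return reports for root and everything under it that is not owned or not fully accessible."""
    candidates = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):  # skip symlinks; they may point outside the tree
                candidates.append(full)
    problems = []
    for path in candidates:
        report = describe_access(path)
        if not (report["owned_by_me"] and report["full_access"]):
            problems.append(report)
    return problems
```

Calling `find_problems(".")` from the directory the client runs in should return an empty list if ownership and permissions are fine; anything it does return is worth a closer look with `ls -l`.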
Re: Project: 2605 (Run 9, Clone 571, Gen 5)
Posted: Wed Jan 02, 2008 1:00 pm
by preet.to
Thanks for the idea. The permissions are fine.
You did give me an idea. What if the thread that is trying to delete did not wait for a thread trying to write? This would explain why a WU never completes. It is trying to collect all the data and a thread is still writing. This appears to be a timing issue with the SMP architecture.
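To illustrate what I mean, here is a toy Python sketch of that kind of ordering problem and the usual fix. It is only an assumption about the general pattern; the real client and core are not written like this and their internals are not shown anywhere in this thread. All names (`write_results`, `collect_and_delete`, `results.txt`) are invented for the example. The point is that the cleanup side waits for the writer to finish before it collects and deletes anything.

```python
import os
import tempfile
import threading
import time


def write_results(path, chunks, delay=0.01):
    """Hypothetical writer: emits result chunks slowly, like a worker still flushing output."""
    with open(path, "w") as fh:
        for chunk in chunks:
            fh.write(chunk)
            fh.flush()
            time.sleep(delay)


def collect_and_delete(path, writer):
    """Wait for the writer, then read the complete file and remove it."""
    # The synchronization step: without this join, the read below could see
    # a partial file, or the delete could race with a write still in progress.
    writer.join()
    with open(path) as fh:
        data = fh.read()
    os.remove(path)
    return data


def run_demo():
    """Start a writer thread, collect its output safely, and clean up the temp directory."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "results.txt")
    chunks = ["frame %d\n" % i for i in range(5)]
    writer = threading.Thread(target=write_results, args=(path, chunks))
    writer.start()
    data = collect_and_delete(path, writer)
    os.rmdir(workdir)  # succeeds only if the directory is now empty
    return data, chunks
```

If the `writer.join()` line is removed, the collected data can come back short or the cleanup can fail, depending on timing, which is the sort of intermittent behavior I am seeing.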
Re: Project: 2605 (Run 9, Clone 571, Gen 5)
Posted: Wed Jan 02, 2008 1:03 pm
by bruce
preet.to wrote:Thanks for the idea. The permissions are fine.
You did give me an idea. What if the thread that is trying to delete did not wait for a thread trying to write? This would explain why a WU never completes. It is trying to collect all the data and a thread is still writing. This appears to be a timing issue with the SMP architecture.
I'll pass it on to the developers. With software this complex, a race condition is always possible.
Re: Project: 2605 (Run 9, Clone 571, Gen 5)
Posted: Wed Jan 02, 2008 4:55 pm
by kasson
It's a good idea. We try to allow for such things, but the MPI timing is a bit tricky.
Re: Project: 2605 (Run 9, Clone 571, Gen 5)
Posted: Sun Jan 06, 2008 11:14 pm
by Tigerbiten
Got it again and the same thing happened.
The work unit shuts the client down.
[20:42:10] Project: 2605 (Run 9, Clone 571, Gen 5)
[20:42:10]
[20:42:10] Assembly optimizations on if available.
[20:42:10] Entering M.D.
[20:42:27] 0 percent)
[20:42:27] - Starting from initial work packet
[20:42:27]
[20:42:27] Project: 2605 (Run 9, Clone 571, Gen 5)
[20:42:27]
[20:42:27] Entering M.D.
[20:42:35] Protein: Protein in POPC
[20:42:35] Writing local files
[20:42:36] Extra SSE boost OK.
[20:42:37] e Shutdown: INTERRUPTED
[20:42:42] CoreStatus = 66 (102)
[20:42:42] + Shutdown requested by user. Exiting.
Going to try again after deleting this copy of the work unit.
Edit: The same thing happened on the next two tries.
Re: Project: 2605 (Run 9, Clone 571, Gen 5)
Posted: Tue Jan 08, 2008 9:23 am
by Tigerbiten
Got it for a third time.
Same thing happened.
Has anyone been able to fold this protein?
[02:05:24] Folding@Home Gromacs SMP Core
[02:05:24] Version 1.74 (November 27, 2006)
[02:05:24]
[02:05:24] Preparing to commence simulation
[02:05:24] - Ensuring status. Please wait.
[02:05:25] - Starting from initial work packet
[02:05:25]
[02:05:25] Project: 2605 (Run 9, Clone 571, Gen 5)
[02:05:25]
[02:05:25] Assembly optimizations on if available.
[02:05:25] Entering M.D.
[02:05:42] 0 percent)
[02:05:42] - Starting from initial work packet
[02:05:42]
[02:05:42] Project: 2605 (Run 9, Clone 571, Gen 5)
[02:05:42]
[02:05:42] Entering M.D.
[02:05:50] Protein: Protein in POPC
[02:05:50] Writing local files
[02:05:52] Extra SSE boost OK.
[02:05:53] e Shutdown: INTERRUPTED
[02:05:57] CoreStatus = 66 (102)
[02:05:57] + Shutdown requested by user. Exiting.***** Got a SIGTERM signal (15)
[02:05:57] Killing all core threads
Folding@Home Client Shutdown.
Luck ...........
Re: Project: 2605 (Run 9, Clone 571, Gen 5)
Posted: Tue Jan 08, 2008 5:05 pm
by Flathead74
One of my teammates has also tried to process this WU with the following results:
Immediate failure, each time.
Project: 2605 (Run 9, Clone 571, Gen 5)
[21:25:23] Completed 0 out of 500000 steps (0 percent)
[21:25:28] CoreStatus = 0 (0)
[21:25:28] Client-core communications error: ERROR 0x0
[21:25:28] - Attempting to download new core...
Sir_Loin - 1/4/2007
*3x and out. Downloaded new core.
E6550 @ 3360 / 2gb PC2-6400 1:1 / Ubuntu 7.1 on MSI G33M-FI
Re: Project: 2605 (Run 9, Clone 571, Gen 5)
Posted: Wed Jan 09, 2008 8:38 am
by bruce
Nobody has returned Project: 2605 (Run 9, Clone 571, Gen 5) yet.
Re: Project: 2605 (Run 9, Clone 571, Gen 5)
Posted: Thu Feb 07, 2008 9:43 am
by Tigerbiten
Just got it again.
Same result .............
First try ...............
[04:55:33] Project: 2605 (Run 9, Clone 571, Gen 5)
[04:55:33]
[04:55:33] Assembly optimizations on if available.
[04:55:33] Entering M.D.
[04:55:50] 0 percent)
[04:55:50] - Starting from initial work packet
[04:55:50]
[04:55:50] Project: 2605 (Run 9, Clone 571, Gen 5)
[04:55:50]
[04:55:50] Entering M.D.
[04:55:58] Protein: Protein in POPC
[04:55:58] Writing local files
[04:55:59] Extra SSE boost OK.
[04:56:00] cal files
[04:56:00] Completed 0 out of 500000 steps (0 percent)
[04:56:00]
[04:56:00] Folding@home Core Shutdown: INTERRUPTED
[04:56:04] CoreStatus = 66 (102)
[04:56:04] + Shutdown requested by user. Exiting.***** Got a SIGTERM signal (15)
[04:56:04] Killing all core threads
Restarted ................
[08:23:27] Project: 2605 (Run 9, Clone 571, Gen 5)
[08:23:27]
[08:23:28] Entering M.D.
[08:23:35] g local files
[08:23:36] in in POPC
[08:23:36] Writing local files
[08:23:37] Extra SSE boost OK.
[08:23:38] 00 steps (0 percent)
[08:23:42] CoreStatus = 0 (0)
[08:23:42] Client-core communications error: ERROR 0x0
[08:23:42] Deleting current work unit & continuing...
Stopped the client and deleted the work by hand.
Restarted and got ................
[08:41:02] Project: 2605 (Run 9, Clone 571, Gen 5)
[08:41:02]
[08:41:02] Assembly optimizations on if available.
[08:41:02] Entering M.D.
[08:41:19] 0 percent)
[08:41:19] cket
[08:41:19]
[08:41:19] Project: 2605 (Run 9, Clone 571, Gen 5)
[08:41:19]
[08:41:19] Entering M.D.
[08:41:20] one 571, Gen 5)
[08:41:20]
[08:41:20] Entering M.D.
[08:41:27] Protein: Protein in POPC
[08:41:27] Writing local files
[08:41:28] Extra SSE boost OK.
[08:41:29] cal files
[08:41:29] Completed 0 out of 500000 steps (0 percent)
[08:42:48] CoreStatus = 0 (0)
[08:42:48] Client-core communications error: ERROR 0x0
[08:42:48] Deleting current work unit & continuing...
Third time
[08:48:48] Project: 2605 (Run 9, Clone 571, Gen 5)
[08:48:48]
[08:48:48] Assembly optimizations on if available.
[08:48:48] Entering M.D.
[08:50:26] 0 percent)
[08:50:26] - Starting from initial work packet
[08:50:26]
[08:50:26] Project: 2605 (Run 9, Clone 571, Gen 5)
[08:50:26]
[08:50:26] Entering M.D.
[08:50:34] Protein: Protein in POPC
[08:50:34] Writing local files
[08:50:35] Extra SSE boost OK.
[08:50:35] cal files
[08:50:36] Completed 0 out of 500000 steps (0 percent)
[08:50:36]
[08:50:36] Folding@home Core Shutdown: INTERRUPTED
[08:50:40] CoreStatus = 66 (102)
[08:50:40] + Shutdown requested by user. Exiting.
I've deleted it again.
I've a saved copy of the work folder and queue.dat if you want it.
Luck .........
Re: Project: 2605 (Run 9, Clone 571, Gen 5)
Posted: Fri Feb 15, 2008 6:28 am
by Flathead74
This evidently bad WU is still being handed out.
Thanks for that.
[05:44:45] Project: 2605 (Run 9, Clone 571, Gen 5)
[05:45:11] Completed 0 out of 500000 steps (0 percent)
[05:45:11] Folding@home Core Shutdown: INTERRUPTED
[05:45:16] CoreStatus = 0 (0)
[05:45:16] Client-core communications error: ERROR 0x0
[05:45:16] Deleting current work unit & continuing...
*3x failure, manually deleted after third failure
3.0GHz Nocona Xeon x 2 / DH800 / 1GB PC3200 / Suse 10.1
A teammate has already been blessed with this WU twice, with the exact same mode of failure each time.
See post dated Jan 08, 2008, in this thread for reference.
Maybe it is time to pull this one from circulation?
Re: Project: 2605 (Run 9, Clone 571, Gen 5)
Posted: Fri Feb 29, 2008 1:28 am
by klasseng
This one just died in similar fashion on my Mac Mini (2GHz Core2Duo Intel).