Re: Project 10085 Failed (48 core system)

Posted: Sun Dec 07, 2014 12:35 am
by AtwaterFS
I also got the same error and a FahCore crash on a vanilla 16-core system - Project 10085 (Run 2, Clone 31, Gen 7).

As someone else voiced - it's a constantly occurring problem that I don't have time to deal with repeatedly - I miss the past few months where I could set it and forget it.


Code:

*********************** Log Started 2014-12-03T00:06:25Z ***********************
00:06:25:************************* Folding@home Client *************************
00:06:25:      Website:
00:06:25:    Copyright: (c) 2009-2014 Stanford University
00:06:25:       Author: Joseph Coffland <[email protected]>
00:06:25:         Args: --open-web-control
00:06:25:       Config: C:/Users/Administrator/AppData/Roaming/FAHClient/config.xml
00:06:25:******************************** Build ********************************
00:06:25:      Version: 7.4.4
00:06:25:         Date: Mar 4 2014
00:06:25:         Time: 20:26:54
00:06:25:      SVN Rev: 4130
00:06:25:       Branch: fah/trunk/client
00:06:25:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
00:06:25:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
00:06:25:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT /Qmkl
00:06:25:     Platform: win32 XP
00:06:25:         Bits: 32
00:06:25:         Mode: Release
00:06:25:******************************* System ********************************
00:06:25:          CPU: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
00:06:25:       CPU ID: GenuineIntel Family 6 Model 44 Stepping 2
00:06:25:         CPUs: 16
00:06:25:       Memory: 7.99GiB
00:06:25:  Free Memory: 5.14GiB
00:06:25:      Threads: WINDOWS_THREADS
00:06:25:   OS Version: 6.1
00:06:25:  Has Battery: false
00:06:25:   On Battery: false
00:06:25:   UTC Offset: -5
00:06:25:          PID: 3436
00:06:25:          CWD: C:/Users/Administrator/AppData/Roaming/FAHClient
00:06:25:           OS: Windows Server 2008 R2 Standard
00:06:25:      OS Arch: AMD64
00:06:25:         GPUs: 0
00:06:25:         CUDA: Not detected
00:06:25:Win32 Service: false

06:19:53:WU00:FS00:0xa4:Folding@Home Gromacs GB Core
06:19:53:WU00:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
06:19:53:WU00:FS00:0xa4:Preparing to commence simulation
06:19:53:WU00:FS00:0xa4:- Looking at optimizations...
06:19:53:WU00:FS00:0xa4:- Created dyn
06:19:53:WU00:FS00:0xa4:- Files status OK
06:19:53:WU00:FS00:0xa4:- Expanded 54222 -> 201448 (decompressed 371.5 percent)
06:19:53:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=54222 data_size=201448, decompressed_data_size=201448 diff=0
06:19:53:WU00:FS00:0xa4:- Digital signature verified
06:19:53:WU00:FS00:0xa4:Project: 10085 (Run 2, Clone 31, Gen 7)
06:19:53:WU00:FS00:0xa4:Assembly optimizations on if available.
06:19:53:WU00:FS00:0xa4:Entering M.D.
06:19:59:WU01:FS00:Upload 72.75%
06:19:59:WU00:FS00:0xa4:Mapping NT from 15 to 15 
06:20:02:WU01:FS00:Upload complete
06:20:02:WU01:FS00:Server responded WORK_ACK (400)
06:20:02:WU01:FS00:Final credit estimate, 1887.00 points
06:20:02:WU01:FS00:Cleaning up
******************************* Date: 2014-12-07 *******************************
00:17:05:WARNING:WU00:FS00:FahCore returned an unknown error code which probably indicates that it crashed
00:17:05:WARNING:WU00:FS00:FahCore returned: UNKNOWN_ENUM (-1073741783 = 0xc0000029)
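
For anyone comparing return values across logs: the last warning prints the same 32-bit value twice, once as a signed decimal and once as unsigned hex. A minimal sketch of that conversion follows. The function names are made up for illustration and are not part of FAHClient. The severity check assumes the value is a Windows NTSTATUS code, where the top two bits encode severity and 3 means error.

```python
# Hypothetical helpers for illustration only; not part of FAHClient.

def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit integer as unsigned (two's complement)."""
    return code & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit integer as signed (two's complement)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def ntstatus_severity(code: int) -> int:
    """Top two bits of an NTSTATUS value: 0 success, 1 info, 2 warning, 3 error."""
    return to_unsigned32(code) >> 30


if __name__ == "__main__":
    returned = -1073741783  # decimal value from the log line above
    print(hex(to_unsigned32(returned)))   # hex form of the same value
    print(to_signed32(0xC0000029))        # back to the signed decimal form
    print(ntstatus_severity(returned))    # severity field, assuming NTSTATUS
```

This only shows that the two numbers in the log are one value in two notations; it says nothing about why the core crashed.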

Re: repeated failure with project 10090 (Run 98, Clone 23, G

Posted: Sun Dec 07, 2014 12:51 am
by bruce
I wonder if this is a very similar problem.

It seems that certain projects do not work well with large numbers of cores. Does that description fit your situation?

Re: Project 10085 Failed (48 core system)

Posted: Sun Dec 07, 2014 8:20 pm
by Grandpa_01
AtwaterFS wrote: Also got same error and fahcore crash on vanilla setup 16 core system - Project 10085 (Run 2, Clone 31, Gen 7).

As someone else voiced - it's a constantly occurring problem that I don't have time to repeatedly deal with - I miss the past few months where I could set it and forget it.

There is an option that will most likely take care of this problem (which, by the way, is not popular with those it does not affect). In the past, Linux V6 of F@H did not get assigned to the server that distributes the WUs in question here. I have not run any SMP on V6 since the recent upgrade of the server software, but prior to that, Linux V6 on 48+ core machines only got A3 WUs. I would imagine it is still the same, so there is still a set-it-and-forget-it option out there. ;)

Re: Project 10085 Failed (48 core system)

Posted: Sun Dec 07, 2014 10:56 pm
by VijayPande
We're on it.

Re: repeated failure with project 10090 (Run 98, Clone 23, G

Posted: Mon Dec 08, 2014 2:15 am
by Sailer
bruce wrote: I wonder if this is a very similar problem.

It seems that certain projects do not work well with large numbers of cores. Does that description fit your situation?
I'd guess that it is a very similar problem. I don't know which would be easier: to change the coding so that computers with a large number of cores could run these WUs, or to change the server setting so that it doesn't assign these WUs to computers with a large number of cores. Probably the second option, but that's only a guess.

Re: Project 10085 Failed (48 core system)

Posted: Mon Dec 08, 2014 12:21 pm
by Gooders
I too have a load of code from the 3rd, 4th, 5th and 6th of this month that I can add to this. I have saved a Notepad file of the log if it is needed.

Re: Project 10085 Failed (48 core system)

Posted: Mon Dec 08, 2014 5:07 pm
by 007quick
Just as an update: I have since split my folding slot into 4 (24-core, 12-core, 8-core, 4-core) and have not received any 1008* WUs, so I don't yet know whether they will fold or not.

Re: Project 10085 Failed (48 core system)

Posted: Mon Dec 08, 2014 6:30 pm
by Joe_H
That may or may not be due to your settings. The server for this project appears to be out of WUs and is just accepting returns of WUs that were already assigned.

Re: Project 10085 Failed (48 core system)

Posted: Mon Dec 08, 2014 8:01 pm
by 007quick
Ah... Good to know!

Re: repeated failure with project 10090 (Run 98, Clone 23, G

Posted: Tue Dec 09, 2014 11:15 pm
by Joe_H
I have heard from PG and the settings on this server have been adjusted so it should no longer assign WU's to systems with a large number of cores. If you see this problem reoccur, please let us know.