Project: 2684 (Run 6, Clone 1, Gen 0)

Posted: Sun May 30, 2010 2:35 am
by Stewart1
I am using the Unix folding appliance in VMware. It keeps downloading this work unit and failing.

First try:

Code:

[02:14:35] + Attempting to get work packet
[02:14:35] Passkey found
[02:14:35] - Connecting to assignment server
[02:14:37] - Successful: assigned to (171.67.108.22).
[02:14:37] + News From Folding@Home: Welcome to Folding@Home
[02:14:37] Loaded queue successfully.
[02:19:37] + Closed connections
[02:19:42] 
[02:19:42] + Processing work unit
[02:19:42] Core required: FahCore_a3.exe
[02:19:42] Core found.
[02:19:42] Working on queue slot 03 [May 30 02:19:42 UTC]
[02:19:42] + Working ...
[02:19:42] 
[02:19:42] *------------------------------*
[02:19:42] Folding@Home Gromacs SMP Core
[02:19:42] Version 2.21 (May 10, 2010)
[02:19:42] 
[02:19:42] Preparing to commence simulation
[02:19:42] - Looking at optimizations...
[02:19:42] - Created dyn
[02:19:42] - Files status OK
[02:19:44] - Expanded 20077630 -> 30791309 (decompressed 153.3 percent)
[02:19:44] Called DecompressByteArray: compressed_data_size=20077630 data_size=30791309, decompressed_data_size=30791309 diff=0
[02:19:45] - Digital signature verified
[02:19:45] 
[02:19:45] Project: 2684 (Run 6, Clone 1, Gen 0)
[02:19:45] 
[02:19:45] Assembly optimizations on if available.
[02:19:45] Entering M.D.
[02:20:06] Completed 0 out of 250000 steps  (0%)
[02:20:09] mdrun returned 255
[02:20:09] Going to send back what have done -- stepsTotalG=250000
[02:20:09] Work fraction=0.0000 steps=250000.
[02:20:13] logfile size=12708 infoLength=12708 edr=25 trr=1
[02:20:13] logfile size: 12708 info=12708 bed=25 hdr=1
[02:20:13] - Writing 13246 bytes of core data to disk...
[02:20:14]   ... Done.
[02:20:51] 
[02:20:51] Folding@home Core Shutdown: EARLY_UNIT_END
[02:20:52] CoreStatus = 72 (114)
[02:20:52] Sending work to server
[02:20:52] Project: 2684 (Run 6, Clone 1, Gen 0)

The second try gave me an error message in the console which, for some reason, was not copied to the log file. It said:

t = 0.000 ps: Water molecule starting at atom 570285 can not be settled. Check for bad contacts and/or reduce the timestep.

Re: Project: 2684 (Run 6, Clone 1, Gen 0)

Posted: Sun May 30, 2010 3:08 am
by Wrish
Got the same work unit. Immediate EUE; the log error was "Client-core communications error" (8B). I was waiting a few days to see if anyone else completed it... though I didn't see my client upload anything. Our errors are similar yet different. Console log:

Code:

[13:08:53] + Processing work unit
[13:08:53] Core required: FahCore_a3.exe
[13:08:53] Core found.
[13:08:53] Working on queue slot 05 [May 29 13:08:53 UTC]
[13:08:53] + Working ...
[13:08:53] - Calling './FahCore_a3.exe -dir work/ -nice 19 -suffix 05 -np 8 -priority 96 -checkpoint 30 -forceasm -verbose -lifeline 8313 -version 629'

[13:08:53] 
[13:08:53] *------------------------------*
[13:08:53] Folding@Home Gromacs SMP Core
[13:08:53] Version 2.21 (May 10, 2010)
[13:08:53] 
[13:08:53] Preparing to commence simulation
[13:08:53] - Assembly optimizations manually forced on.
[13:08:53] - Not checking prior termination.
[13:08:55] - Expanded 20077630 -> 30791309 (decompressed 153.3 percent)
[13:08:55] Called DecompressByteArray: compressed_data_size=20077630 data_size=30791309, decompressed_data_size=30791309 diff=0
[13:08:55] - Digital signature verified
[13:08:55] 
[13:08:55] Project: 2684 (Run 6, Clone 1, Gen 0)
[13:08:55] 
[13:08:55] Assembly optimizations on if available.
[13:08:55] Entering M.D.
Starting 8 threads
NNODES=8, MYRANK=2, HOSTNAME=thread #2
NNODES=8, MYRANK=3, HOSTNAME=thread #3
NNODES=8, MYRANK=4, HOSTNAME=thread #4
NNODES=8, MYRANK=5, HOSTNAME=thread #5
NNODES=8, MYRANK=6, HOSTNAME=thread #6
NNODES=8, MYRANK=7, HOSTNAME=thread #7
NNODES=8, MYRANK=1, HOSTNAME=thread #1
NNODES=8, MYRANK=0, HOSTNAME=thread #0
Reading file work/wudata_05.tpr, VERSION 4.0.99_development_20090605 (single precision)
Making 1D domain decomposition 8 x 1 x 1
starting mdrun 'SINGLE VESICLE in water'
250000 steps,   1000.0 ps.
[13:09:08] Completed 0 out of 250000 steps  (0%)

t = 0.004 ps: Water molecule starting at atom 449631 can not be settled.
Check for bad contacts and/or reduce the timestep.

t = 0.004 ps: Water molecule starting at atom 142431 can not be settled.
Check for bad contacts and/or reduce the timestep.

Step 2  Warning: pressure scaling more than 1%, mu: 1.01253 1.01253 1.01253

Step 2  Warning: pressure scaling more than 1%, mu: 1.01253 1.01253 1.01253

Step 2  Warning: pressure scaling more than 1%, mu: 1.01253 1.01253 1.01253

Step 2  Warning: pressure scaling more than 1%, mu: 1.01253 1.01253 1.01253

Step 2  Warning: pressure scaling more than 1%, mu: 1.01253 1.01253 1.01253

Step 2  Warning: pressure scaling more than 1%, mu: 1.01253 1.01253 1.01253

Step 2  Warning: pressure scaling more than 1%, mu: 1.01253 1.01253 1.01253

Step 2  Warning: pressure scaling more than 1%, mu: 1.01253 1.01253 1.01253

t = 0.008 ps: Water molecule starting at atom 1058712 can not be settled.
Check for bad contacts and/or reduce the timestep.

t = 0.008 ps: Water molecule starting at atom 877182 can not be settled.
Check for bad contacts and/or reduce the timestep.

t = 0.008 ps: Water molecule starting at atom 1019217 can not be settled.
Check for bad contacts and/or reduce the timestep.
Segmentation fault
[13:09:11] CoreStatus = 8B (139)
[13:09:11] Client-core communications error: ERROR 0x8b
[13:09:11] Deleting current work unit & continuing...
System: i7 @ 4GHz, native Ubuntu 8.04, 38 A2 bigadvs completed before this.
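
For anyone comparing the two logs: each CoreStatus line shows the same number twice, first in hex and then in decimal (72 is 114, 8B is 139). The sketch below decodes that, and it also guesses at a signal name on the assumption that the client reports the core's exit status the way a Unix shell does (128 + signal number). That assumption is not confirmed anywhere in this thread, although it would fit the "Segmentation fault" line printed just before the 8B status. The helper name `decode_core_status` is made up for illustration.

```python
import signal


def decode_core_status(hex_status: str) -> dict:
    """Decode a CoreStatus hex string such as '8B' (hypothetical helper)."""
    value = int(hex_status, 16)
    info = {"hex": hex_status.upper(), "decimal": value, "signal": None}
    # Assumption: values above 128 follow the shell convention
    # "128 + signal number". The client may not actually work this way.
    if value > 128:
        try:
            info["signal"] = signal.Signals(value - 128).name
        except ValueError:
            pass  # not a known signal number on this platform
    return info


print(decode_core_status("8B"))  # status from the second log
print(decode_core_status("72"))  # status from the first log
```

Under that assumption, 139 - 128 = 11, which is SIGSEGV on Linux. The 72 (114) status from the first log is below 128, so the sketch reports no signal for it.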

Re: Project: 2684 (Run 6, Clone 1, Gen 0)

Posted: Sun May 30, 2010 5:48 am
by Stewart1
I actually had this unit fail multiple times for different reasons.

Re: Project: 2684 (Run 6, Clone 1, Gen 0)

Posted: Sun May 30, 2010 8:47 am
by toTOW
I found 14 reports of instant EUE for this WU ... but someone has been able to complete it.

Re: Project: 2684 (Run 6, Clone 1, Gen 0)

Posted: Sun May 30, 2010 12:47 pm
by bruce
Is it failing on overclocked machines? Maybe it's time for a new stability test tool.

Re: Project: 2684 (Run 6, Clone 1, Gen 0)

Posted: Sun May 30, 2010 5:28 pm
by Grandpa_01
bruce wrote: Is it failing on overclocked machines? Maybe it's time for a new stability test tool.
Or maybe it is time for Stanford to realize there might be a slight bug in their client or WUs and fix it. :mrgreen: That would be better than the dumping method. :ewink:

Re: Project: 2684 (Run 6, Clone 1, Gen 0)

Posted: Sun May 30, 2010 6:58 pm
by BigJohnFAH
toTOW wrote: ... but someone has been able to complete it.
Then why did I get it? :mrgreen:
Currently at 18% on this one. I guess this will test my OC.

Re: Project: 2684 (Run 6, Clone 1, Gen 0)

Posted: Mon May 31, 2010 5:19 pm
by Wrish
Update: Despite claims that someone completed 2684-6-1-0, I was reassigned this unit yet again today, twice in a row. On my second attempt, it EUE'ed in 3 seconds like before and uploaded 13251 bytes to the server. On the attempt immediately following - my third - my CPU began folding it successfully (past 2% now). All I can think of is that the third attempt did not immediately follow an A2 bigadv folding session like the ones before. Or, considering the different errors each time, maybe I rolled a compatible number on my random number generator... assuming the workload is not deterministic.