Project: 2684 (Run 6, Clone 1, Gen 0)

Moderators: Site Moderators, FAHC Science Team

Stewart1
Posts: 16
Joined: Sat Jan 02, 2010 9:57 pm

Project: 2684 (Run 6, Clone 1, Gen 0)

Post by Stewart1 »

I am using the Unix folding appliance in VMware. It keeps downloading this work unit and failing.

First try:


[02:14:35] + Attempting to get work packet
[02:14:35] Passkey found
[02:14:35] - Connecting to assignment server
[02:14:37] - Successful: assigned to (171.67.108.22).
[02:14:37] + News From Folding@Home: Welcome to Folding@Home
[02:14:37] Loaded queue successfully.
[02:19:37] + Closed connections
[02:19:42] 
[02:19:42] + Processing work unit
[02:19:42] Core required: FahCore_a3.exe
[02:19:42] Core found.
[02:19:42] Working on queue slot 03 [May 30 02:19:42 UTC]
[02:19:42] + Working ...
[02:19:42] 
[02:19:42] *------------------------------*
[02:19:42] Folding@Home Gromacs SMP Core
[02:19:42] Version 2.21 (May 10, 2010)
[02:19:42] 
[02:19:42] Preparing to commence simulation
[02:19:42] - Looking at optimizations...
[02:19:42] - Created dyn
[02:19:42] - Files status OK
[02:19:44] - Expanded 20077630 -> 30791309 (decompressed 153.3 percent)
[02:19:44] Called DecompressByteArray: compressed_data_size=20077630 data_size=30791309, decompressed_data_size=30791309 diff=0
[02:19:45] - Digital signature verified
[02:19:45] 
[02:19:45] Project: 2684 (Run 6, Clone 1, Gen 0)
[02:19:45] 
[02:19:45] Assembly optimizations on if available.
[02:19:45] Entering M.D.
[02:20:06] Completed 0 out of 250000 steps  (0%)
[02:20:09] mdrun returned 255
[02:20:09] Going to send back what have done -- stepsTotalG=250000
[02:20:09] Work fraction=0.0000 steps=250000.
[02:20:13] logfile size=12708 infoLength=12708 edr=25 trr=1
[02:20:13] logfile size: 12708 info=12708 bed=25 hdr=1
[02:20:13] - Writing 13246 bytes of core data to disk...
[02:20:14]   ... Done.
[02:20:51] 
[02:20:51] Folding@home Core Shutdown: EARLY_UNIT_END
[02:20:52] CoreStatus = 72 (114)
[02:20:52] Sending work to server
[02:20:52] Project: 2684 (Run 6, Clone 1, Gen 0)
Second try gave me an error message in the console which for some reason was not copied to the log file. It said:

t = 0.000 ps: Water molecule starting at atom 570285 can not be settled. Check for bad contacts and/or reduce the timestep.
Wrish
Posts: 74
Joined: Thu Jan 28, 2010 5:09 am

Re: Project: 2684 (Run 6, Clone 1, Gen 0)

Post by Wrish »

Got the same work unit. Immediate EUE, log error was "Client-core communications error" (8B). Was waiting a few days to see if anyone else completed it... though I didn't see my client upload anything. Our errors are similar yet different. Console log:


[13:08:53] + Processing work unit
[13:08:53] Core required: FahCore_a3.exe
[13:08:53] Core found.
[13:08:53] Working on queue slot 05 [May 29 13:08:53 UTC]
[13:08:53] + Working ...
[13:08:53] - Calling './FahCore_a3.exe -dir work/ -nice 19 -suffix 05 -np 8 -priority 96 -checkpoint 30 -forceasm -verbose -lifeline 8313 -version 629'

[13:08:53] 
[13:08:53] *------------------------------*
[13:08:53] Folding@Home Gromacs SMP Core
[13:08:53] Version 2.21 (May 10, 2010)
[13:08:53] 
[13:08:53] Preparing to commence simulation
[13:08:53] - Assembly optimizations manually forced on.
[13:08:53] - Not checking prior termination.
[13:08:55] - Expanded 20077630 -> 30791309 (decompressed 153.3 percent)
[13:08:55] Called DecompressByteArray: compressed_data_size=20077630 data_size=30791309, decompressed_data_size=30791309 diff=0
[13:08:55] - Digital signature verified
[13:08:55] 
[13:08:55] Project: 2684 (Run 6, Clone 1, Gen 0)
[13:08:55] 
[13:08:55] Assembly optimizations on if available.
[13:08:55] Entering M.D.
Starting 8 threads
NNODES=8, MYRANK=2, HOSTNAME=thread #2
NNODES=8, MYRANK=3, HOSTNAME=thread #3
NNODES=8, MYRANK=4, HOSTNAME=thread #4
NNODES=8, MYRANK=5, HOSTNAME=thread #5
NNODES=8, MYRANK=6, HOSTNAME=thread #6
NNODES=8, MYRANK=7, HOSTNAME=thread #7
NNODES=8, MYRANK=1, HOSTNAME=thread #1
NNODES=8, MYRANK=0, HOSTNAME=thread #0
Reading file work/wudata_05.tpr, VERSION 4.0.99_development_20090605 (single precision)
Making 1D domain decomposition 8 x 1 x 1
starting mdrun 'SINGLE VESICLE in water'
250000 steps,   1000.0 ps.
[13:09:08] Completed 0 out of 250000 steps  (0%)

t = 0.004 ps: Water molecule starting at atom 449631 can not be settled.
Check for bad contacts and/or reduce the timestep.

t = 0.004 ps: Water molecule starting at atom 142431 can not be settled.
Check for bad contacts and/or reduce the timestep.

Step 2  Warning: pressure scaling more than 1%, mu: 1.01253 1.01253 1.01253

Step 2  Warning: pressure scaling more than 1%, mu: 1.01253 1.01253 1.01253

Step 2  Warning: pressure scaling more than 1%, mu: 1.01253 1.01253 1.01253

Step 2  Warning: pressure scaling more than 1%, mu: 1.01253 1.01253 1.01253

Step 2  Warning: pressure scaling more than 1%, mu: 1.01253 1.01253 1.01253

Step 2  Warning: pressure scaling more than 1%, mu: 1.01253 1.01253 1.01253

Step 2  Warning: pressure scaling more than 1%, mu: 1.01253 1.01253 1.01253

Step 2  Warning: pressure scaling more than 1%, mu: 1.01253 1.01253 1.01253

t = 0.008 ps: Water molecule starting at atom 1058712 can not be settled.
Check for bad contacts and/or reduce the timestep.

t = 0.008 ps: Water molecule starting at atom 877182 can not be settled.
Check for bad contacts and/or reduce the timestep.

t = 0.008 ps: Water molecule starting at atom 1019217 can not be settled.
Check for bad contacts and/or reduce the timestep.
Segmentation fault
[13:09:11] CoreStatus = 8B (139)
[13:09:11] Client-core communications error: ERROR 0x8b
[13:09:11] Deleting current work unit & continuing...
System: i7 @ 4GHz, native Ubuntu 8.04, 38 A2 bigadvs completed before this.
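
For anyone comparing the two status codes in the logs above (72 and 8B), here is a rough sketch of how they can be decoded. It assumes the client reports the core's exit status in the usual Unix shell style, where a value above 128 means "killed by signal (value - 128)". That is my assumption, not something the client documents here, but it is consistent with the "Segmentation fault" line printed just before 8B. The function name and the small signal table (Linux numbering) are made up for illustration.

```python
# Hypothetical helper: decode a CoreStatus hex code as printed in the client log.
# Assumption: values above 128 follow the shell convention 128 + signal number.
LINUX_SIGNALS = {6: "SIGABRT", 9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM"}

def decode_core_status(hex_code: str) -> str:
    value = int(hex_code, 16)  # the log prints the code in hex, then decimal in parentheses
    if value > 128:
        signum = value - 128
        name = LINUX_SIGNALS.get(signum, "unknown signal")
        return f"0x{value:02X} = {value} = 128 + {signum} ({name})"
    return f"0x{value:02X} = {value}"

print(decode_core_status("8B"))  # the code from the second log
print(decode_core_status("72"))  # the code logged with EARLY_UNIT_END in the first log
```

Under that assumption, 8B is a segfault of the core itself, while 72 is the core exiting on its own and reporting an early unit end.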
Stewart1
Posts: 16
Joined: Sat Jan 02, 2010 9:57 pm

Re: Project: 2684 (Run 6, Clone 1, Gen 0)

Post by Stewart1 »

I actually had this unit fail multiple times for different reasons.
toTOW
Site Moderator
Posts: 6359
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Project: 2684 (Run 6, Clone 1, Gen 0)

Post by toTOW »

I found 14 reports of instant EUE for this WU ... but someone has been able to complete it.

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 2684 (Run 6, Clone 1, Gen 0)

Post by bruce »

Is it failing on overclocked machines? Maybe it's time for a new stability test tool.
Grandpa_01
Posts: 1122
Joined: Wed Mar 04, 2009 7:36 am
Hardware configuration: 3 - Supermicro H8QGi-F AMD MC 6174=144 cores 2.5Ghz, 96GB G.Skill DDR3 1333Mhz Ubuntu 10.10
2 - Asus P6X58D-E i7 980X 4.4Ghz 6GB DDR3 2000 A-Data 64GB SSD Ubuntu 10.10
1 - Asus Rampage Gene III i7 970 4.3Ghz DDR3 2000 2-500GB Seagate 7200.11 0-Raid Ubuntu 10.10
1 - Asus G73JH Laptop i7 740QM 1.86Ghz ATI 5870M

Re: Project: 2684 (Run 6, Clone 1, Gen 0)

Post by Grandpa_01 »

bruce wrote:Is it failing on overclocked machines? Maybe it's time for a new stability test tool.
Or maybe it is time for Stanford to realize there might be a slight bug in their client or WUs and fix it. :mrgreen: That would be better than the dumping method. :ewink:
2 - SM H8QGi-F AMD 6xxx=112 cores @ 3.2 & 3.9Ghz
5 - SM X9QRI-f+ Intel 4650 = 320 cores @ 3.15Ghz
2 - I7 980X 4.4Ghz 2-GTX680
1 - 2700k 4.4Ghz GTX680
Total = 464 cores folding
BigJohnFAH
Posts: 2
Joined: Thu May 28, 2009 6:35 pm

Re: Project: 2684 (Run 6, Clone 1, Gen 0)

Post by BigJohnFAH »

toTOW wrote:... but someone has been able to complete it.
Then why did I get it ? :mrgreen:
Currently at 18% on this one. I guess this will test my OC.
Wrish
Posts: 74
Joined: Thu Jan 28, 2010 5:09 am

Re: Project: 2684 (Run 6, Clone 1, Gen 0)

Post by Wrish »

Update: Despite claims that someone completed 2684-6-1-0, I was reassigned this unit yet again today, twice in a row. On my second attempt, it EUE'ed in 3 seconds like before and uploaded 13251 bytes to the server. On the attempt immediately following - my third - my CPU began folding it successfully (past 2% now). All I can think of is that the third attempt did not immediately follow an A2 bigadv folding session like the ones before. Or, considering the different errors each time, maybe I rolled a compatible number on my random number generator... assuming the workload is not deterministic.
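
One rough way to test the determinism guess is to compare which atoms fail to settle on each attempt. Below is a minimal sketch that pulls the (time, atom) pairs out of the console text and intersects them, using only lines already quoted in this thread. The function name `settle_failures` is made up for illustration, and the regex only covers the message format seen above.

```python
import re

# Matches lines like:
# "t = 0.004 ps: Water molecule starting at atom 449631 can not be settled."
SETTLE_RE = re.compile(
    r"t = ([\d.]+) ps: Water molecule starting at atom (\d+) can not be settled"
)

def settle_failures(log_text):
    """Return the set of (time_ps, atom_index) pairs reported in a console log."""
    return {(float(t), int(atom)) for t, atom in SETTLE_RE.findall(log_text)}

# Console lines quoted earlier in the thread (two different machines).
attempt_a = "t = 0.000 ps: Water molecule starting at atom 570285 can not be settled."
attempt_b = """
t = 0.004 ps: Water molecule starting at atom 449631 can not be settled.
t = 0.004 ps: Water molecule starting at atom 142431 can not be settled.
t = 0.008 ps: Water molecule starting at atom 1058712 can not be settled.
"""

atoms_a = {atom for _, atom in settle_failures(attempt_a)}
atoms_b = {atom for _, atom in settle_failures(attempt_b)}
shared = atoms_a & atoms_b  # atoms that failed in both attempts

print("atoms failing in both attempts:", sorted(shared))
```

If the same atoms showed up every time, that would point at a bad starting structure. Different atoms on each attempt would fit the "different errors each time" picture, though a handful of lines is far too little to prove anything either way.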