Project: 2669 (Run 17, Clone 49, Gen 20) - starts at 330%

Moderators: Site Moderators, FAHC Science Team

dmearns
Posts: 11
Joined: Tue Dec 04, 2007 5:29 pm
Hardware configuration: penguin Core 2 Duo 1.86 GHz Linux 2.6.24 (Ubuntu 8.04)
thorn AMD64 X2 4200+ Linux 2.6.19 (Fedora 6)
zhaan AMD64 X2 4200+ Windows MCE
inara Core 2 Duo 1.86 GHz Linux 2.6.20 (Ubuntu 7.04)
dmearns Sempron 3000+ Linux 2.6.14 (Fedora 4)
river Pentium D 3.0 GHz Linux 2.6.15 (Ubuntu 6.06)
pilsner Core 2 Duo 1.86 GHz Windows MCE
starbuck Core 2 Duo 2.84 GHz Linux 2.6.22 (Ubuntu 7.10)
carter AMD64 X2 4200+ Linux 2.6.22 (Ubuntu 7.10)
corona Athlon XP-3000+ Linux 2.6.15 (Ubuntu 6.06)
zoe AMD64 3800+ Windows XP
whiskey Pentium 4 2.4 GHz Linux 2.4.25 (Debian)
kaylee Athlon XP-2800+ Linux 2.6.24 (Ubuntu 8.04)
mycroft AMD64 3200+ Windows XP
Location: Columbia MD USA

Project: 2669 (Run 17, Clone 49, Gen 20) - starts at 330%

Post by dmearns »

Now this is weird:

Code:

[15:07:28]
[15:07:28] + Processing work unit
[15:07:28] Core required: FahCore_a2.exe
[15:07:28] Core found.
[15:07:28] Working on Unit 07 [November 7 15:07:28]
[15:07:28] + Working ...
[15:07:28] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 07 -checkpoint 15 -forceasm -verbose -lifeline 5521 -version 602'

[15:07:28]
[15:07:28] *------------------------------*
[15:07:28] Folding@Home Gromacs SMP Core
[15:07:28] Version 2.01 (Wed Aug 13 13:11:25 PDT 2008)
[15:07:28]
[15:07:28] Preparing to commence simulation
[15:07:28] - Ensuring status. Please wait.
[15:07:38] - Assembly optimizations manually forced on.
[15:07:38] - Not checking prior termination.
[15:07:41] - Expanded 4836721 -> 23985345 (decompressed 495.9 percent)
[15:07:41] Called DecompressByteArray: compressed_data_size=4836721 data_size=23985345, decompressed_data_size=23985345 diff=0
[15:07:41] - Digital signature verified
[15:07:41]
[15:07:41] Project: 2669 (Run 17, Clone 49, Gen 20)
[15:07:41]
[15:07:41] Assembly optimizations on if available.
[15:07:41] Entering M.D.
[15:07:47] Will resume from checkpoint file
[15:07:50] Resuming from checkpoint
[15:07:51] fcSaveRestoreState: I/O failed dir=0, var=0000000003870650, varsize=573564
[15:07:51] fcSaveRestoreState: I/O failed dir=0, var=0000000003A1E4A0, varsize=573564
[15:07:51] Verified work/wudata_07.log
[15:07:51] Verified work/wudata_07.trr
[15:07:51] Verified work/wudata_07.xtc
[15:07:51] Verified work/wudata_07.edr
[15:07:51] Completed 825012 out of 250000 steps  (330%)
[15:33:08] Completed 827502 out of 250000 steps  (331%)
[15:58:31] Completed 830002 out of 250000 steps  (332%)
[16:23:46] Completed 832502 out of 250000 steps  (333%)
[16:49:07] Completed 835002 out of 250000 steps  (334%)
[17:00:03] - Autosending finished units...
[17:00:03] Trying to send all finished work units
[17:00:03] + No unsent completed units remaining.
[17:00:03] - Autosend completed
[17:14:33] Completed 837502 out of 250000 steps  (335%)
[17:39:50] Completed 840002 out of 250000 steps  (336%)
[18:05:09] Completed 842502 out of 250000 steps  (337%)
Is this just a reporting problem or is this WU messed up?

- Dave
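To spot this symptom automatically, here is a minimal Python sketch that scans client log text for progress lines where the completed step count is larger than the unit's total. It assumes the exact line format shown in the log above. The names `check_progress` and `PROGRESS_RE` are hypothetical and are not part of any Folding@home tool.

```python
import re

# Matches client log lines such as:
# [15:07:51] Completed 825012 out of 250000 steps  (330%)
PROGRESS_RE = re.compile(
    r"\[(\d\d:\d\d:\d\d)\] Completed (\d+) out of (\d+) steps\s+\((\d+)%\)"
)


def check_progress(log_text):
    """Return (time, done, total, percent) for every line whose step count exceeds the total."""
    suspicious = []
    for match in PROGRESS_RE.finditer(log_text):
        stamp = match.group(1)
        done, total, percent = (int(g) for g in match.group(2, 3, 4))
        if done > total:
            suspicious.append((stamp, done, total, percent))
    return suspicious


if __name__ == "__main__":
    sample = (
        "[15:07:51] Completed 825012 out of 250000 steps  (330%)\n"
        "[15:33:08] Completed 827502 out of 250000 steps  (331%)\n"
    )
    for stamp, done, total, percent in check_progress(sample):
        print(f"{stamp}: {done}/{total} steps ({percent}%) is past the end of the WU")
```

A non-empty result only shows that the reported counter is past the end of the unit; on its own it cannot tell a cosmetic reporting bug from a damaged checkpoint.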
parkut
Posts: 363
Joined: Tue Feb 12, 2008 7:33 am
Hardware configuration: Running exclusively Linux headless blades. All are dedicated crunching machines.
Location: SE Michigan, USA

Re: Project: 2669 (Run 17, Clone 49, Gen 20) - starts at 330%

Post by parkut »

what does the unitinfo.txt file say? if you have access to QD-tools, what does that reveal?
dmearns

Re: Project: 2669 (Run 17, Clone 49, Gen 20) - starts at 330%

Post by dmearns »

parkut wrote:what does the unitinfo.txt file say? if you have access to QD-tools, what does that reveal?

Code:

Current Work Unit
-----------------
Name: Gromacs
Tag: P2669R17C49G20
Download time: November 7 15:07:28
Due time: November 10 15:07:28
Progress: 352%  [|||||||||||||||||||||||||||||||||||]

Code:

 Index 7: folding now 27.3 X min speed; 352% complete
  server: 171.64.65.56:8080; project: 2669
  Folding: run 17, clone 49, generation 20; benchmark 0; misc: 500, 200
  issue: Fri Nov  7 10:07:20 2008; begin: Fri Nov  7 10:07:28 2008
  expect: Fri Nov  7 12:45:57 2008; due: Mon Nov 10 10:07:28 2008 (3 days)
  core URL: http://www.stanford.edu/~pande/Linux/x86Core_a2.fah (V2.01)
  CPU: 1,0 x86; OS: 4,0 Linux
  assignment info (le): Fri Nov  7 10:07:19 2008; BBFE10E0
  CS: 171.67.108.25; P limit: 524286976
  user: river; team: 13149; ID: EA7DA9587F20D637; mach ID: 1
  work/wudata_07.dat file size: 4837233; WU type: Folding@Home
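The same check can be run against unitinfo.txt. The following is a minimal Python sketch that reads the `Tag`, `Due time` and `Progress` fields in the layout shown above and flags a unit that is over 100% or past its deadline. Because unitinfo.txt omits the year, the caller has to supply one. The times appear to be UTC, since qd reports the same due time as 10:07 local time in Maryland, so `now` should be a naive UTC datetime; that is an inference, not documented behaviour. `parse_unitinfo` and `check_unit` are hypothetical names.

```python
from datetime import datetime, timezone


def parse_unitinfo(text, year):
    """Parse the 'Key: value' lines of a unitinfo.txt-style file.

    The file shows dates without a year, so the caller supplies one.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return {
        "tag": fields["Tag"],
        # "352%  [|||...]" -> 352
        "progress": int(fields["Progress"].split("%")[0]),
        "due": datetime.strptime(f"{fields['Due time']} {year}", "%B %d %H:%M:%S %Y"),
    }


def check_unit(info, now):
    """List anything suggesting the unit needs a manual look."""
    problems = []
    if info["progress"] > 100:
        problems.append(f"progress {info['progress']}% is over 100%")
    if now > info["due"]:
        problems.append(f"deadline {info['due']:%B %d %H:%M} has passed")
    return problems


if __name__ == "__main__":
    sample = (
        "Current Work Unit\n-----------------\nName: Gromacs\n"
        "Tag: P2669R17C49G20\nDownload time: November 7 15:07:28\n"
        "Due time: November 10 15:07:28\nProgress: 352%  [|||||]\n"
    )
    info = parse_unitinfo(sample, 2008)
    now_utc = datetime.now(timezone.utc).replace(tzinfo=None)
    for problem in check_unit(info, now_utc):
        print(info["tag"], problem)
```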
parkut

Re: Project: 2669 (Run 17, Clone 49, Gen 20) - starts at 330%

Post by parkut »

I'm going to guess it's a cosmetic issue. Keep an eye on it: at 25 minutes 23 seconds per frame, it would be done in about 39 or 40 hours...
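The frame time used for an estimate like this can be taken straight from the log timestamps. Below is a minimal Python sketch that averages the gaps between consecutive "Completed" lines and multiplies by the number of frames left. It assumes HH:MM:SS stamps in order and treats a negative gap as a wrap past midnight. In the log above each frame is 2500 steps (1%), so a 250000-step unit is 100 frames; how many frames are really left on a unit with a corrupted counter is up to you. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def average_frame_time(stamps):
    """Mean gap between consecutive HH:MM:SS progress timestamps.

    A negative gap is assumed to be a wrap past midnight.
    """
    times = [datetime.strptime(s, "%H:%M:%S") for s in stamps]
    gaps = []
    for earlier, later in zip(times, times[1:]):
        gap = later - earlier
        if gap < timedelta(0):
            gap += timedelta(days=1)
        gaps.append(gap)
    return sum(gaps, timedelta(0)) / len(gaps)


def time_remaining(frame_time, frames_left):
    """Naive ETA: every remaining frame is assumed to take the average frame time."""
    return frame_time * frames_left


if __name__ == "__main__":
    # Timestamps of the first progress lines in the log above.
    stamps = ["15:07:51", "15:33:08", "15:58:31", "16:23:46", "16:49:07"]
    per_frame = average_frame_time(stamps)
    print("average frame time:", per_frame)
    print("ETA for 10 more frames:", time_remaining(per_frame, 10))
```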
dmearns

Re: Project: 2669 (Run 17, Clone 49, Gen 20) - starts at 330%

Post by dmearns »

Well it looks like it was a real problem after all:

Code:

[12:44:58] Completed 1277502 out of 250000 steps  (511%)
[13:04:31] Completed 1280002 out of 250000 steps  (512%)
[13:24:07] Completed 1282502 out of 250000 steps  (513%)
[13:43:38] Completed 1285002 out of 250000 steps  (514%)
[14:03:11] Completed 1287502 out of 250000 steps  (515%)
[14:22:42] Completed 1290002 out of 250000 steps  (516%)
[14:42:13] Completed 1292502 out of 250000 steps  (517%)
[15:02:03] Completed 1295002 out of 250000 steps  (518%)
[15:27:22] Completed 1297502 out of 250000 steps  (519%)
[15:27:22] Unit 7's deadline (November 10 15:07) has passed.
[15:27:22] Going to interrupt core and move on to next unit...
[15:27:23] CoreStatus = 0 (0)
[15:27:23] Client-core communications error: ERROR 0x0
[15:27:23] Deleting current work unit & continuing...
[15:27:38] - Warning: Could not delete all work unit files (7): Core file absent
[15:27:38] Trying to send all finished work units
[15:27:38] + No unsent completed units remaining.
[15:27:38] - Preparing to get new work unit...
[15:27:38] + Attempting to get work packet
And it failed to kill the core processes, so I had 2 sets going until I killed the old ones manually.

- Dave
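Leftover core processes like these can be listed before killing them by hand. Here is a minimal, Linux-only Python sketch that reads `/proc/<pid>/cmdline` and reports processes whose argv[0] is `FahCore_a2.exe`, the core name from the log above. Only argv[0] is checked, so the `mpiexec` launcher, which merely passes the core path as an argument, is not reported. It deliberately does not kill anything; check the PIDs with `ps` first. `is_core_cmdline` and `find_core_pids` are hypothetical names.

```python
import os


def is_core_cmdline(raw, name="FahCore_a2.exe"):
    """True if a NUL-separated /proc/<pid>/cmdline has the named core binary as argv[0]."""
    argv0 = raw.split(b"\x00")[0].decode(errors="replace")
    return os.path.basename(argv0) == name


def find_core_pids(name="FahCore_a2.exe", proc_root="/proc"):
    """List PIDs of running core processes (Linux /proc layout). Does not kill anything."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/cpuinfo and so on
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # the process exited or its cmdline is not readable
        if is_core_cmdline(raw, name):
            pids.append(int(entry))
    return sorted(pids)


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        print("FahCore_a2.exe PIDs:", find_core_pids())
    else:
        print("no /proc on this system; this sketch is Linux-only")
```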
verdeva
Posts: 30
Joined: Mon Dec 03, 2007 1:40 pm
Location: Seattle, WA

Re: Project: 2669 (Run 17, Clone 49, Gen 20) - starts at 330%

Post by verdeva »

The same thing just happened to me on a 2669, except it started at 198% and is now at 325%.

Based on what I read here, I'm going to delete this WU.

Project: 2669 (Run 7, Clone 165, Gen 18)
kasson
Pande Group Member
Posts: 1459
Joined: Thu Nov 29, 2007 9:37 pm

Re: Project: 2669 (Run 17, Clone 49, Gen 20) - starts at 330%

Post by kasson »

It looks like the problem was with the checkpoint files--try clearing your checkpoint files and restarting. (It will restart the WU from the beginning.)
dmearns

Re: Project: 2669 (Run 17, Clone 49, Gen 20) - starts at 330%

Post by dmearns »

kasson wrote:It looks like the problem was with the checkpoint files--try clearing your checkpoint files and restarting. (It will restart the WU from the beginning.)
Thanks. Would the checkpoint files be state.cpt and state_prev.cpt?

- Dave