Project: 2683 (Run 5, Clone 5, Gen 14)

Moderators: Site Moderators, FAHC Science Team

Karamiekos
Posts: 33
Joined: Tue Jul 15, 2008 12:27 am

FAH Core Interrupted Project 2683(Run 5, Clone 5, Gen 14)

Post by Karamiekos »

I don't know what's going on here. I finished one work unit and downloaded this one, but it can't even start.
I tried deleting the core and downloading a new one, just to make sure it wasn't corrupted, but that didn't help.


Note: Please read the license agreement (fah6 -license). Further 
use of this software requires that you have read and accepted this agreement.

Using local directory for work files
16 cores detected


--- Opening Log file [January 20 07:35:41 UTC] 


# Linux SMP Console Edition ###################################################
###############################################################################

                       Folding@Home Client Version 6.24R3

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /media/fah/fah
Executable: ./fah6
Arguments: -bigadv -smp 15 -verbosity 9 -local 

[07:35:41] - Ask before connecting: No
[07:35:41] - User name: Karamiekos (Team 36837)
[07:35:41] - User ID: 23E128773C713161
[07:35:41] - Machine ID: 1
[07:35:41] 
[07:35:41] Loaded queue successfully.
[07:35:41] 
[07:35:41] - Autosending finished units... [January 20 07:35:41 UTC]
[07:35:41] + Processing work unit
[07:35:41] Trying to send all finished work units
[07:35:41] Core required: FahCore_a2.exe
[07:35:41] + No unsent completed units remaining.
[07:35:41] Core found.
[07:35:41] - Autosend completed
[07:35:41] Working on queue slot 06 [January 20 07:35:41 UTC]
[07:35:41] + Working ...
[07:35:41] - Calling './mpiexec -np 15 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -nice 19 -suffix 06 -priority 96 -checkpoint 22 -verbose -lifeline 6102 -version 624'

[07:35:42] 
[07:35:42] *------------------------------*
[07:35:42] Folding@Home Gromacs SMP Core
[07:35:42] Version 2.10 (Sun Aug 30 03:43:28 CEST 2009)
[07:35:42] 
[07:35:42] Preparing to commence simulation
[07:35:42] - Ensuring status. Please wait.
[07:35:47] Called DecompressByteArray: compressed_data_size=30234593 data_size=159270593, decompressed_data_size=159270593 diff=0
[07:35:49] - Digital signature verified
[07:35:49] 
[07:35:49] Project: 2683 (Run 5, Clone 5, Gen 14)
[07:35:49] 
[07:35:49] Assembly optimizations on if available.
[07:35:49] Entering M.D.
[07:35:59]  (Run 5, Clone 5, Gen 14)
[07:35:59] 
[07:36:00] Entering M.D.
NNODES=15, MYRANK=1, HOSTNAME=new-host-2
NNODES=15, MYRANK=6, HOSTNAME=new-host-2
NNODES=15, MYRANK=7, HOSTNAME=new-host-2
NNODES=15, MYRANK=8, HOSTNAME=new-host-2
NNODES=15, MYRANK=12, HOSTNAME=new-host-2
NNODES=15, MYRANK=13, HOSTNAME=new-host-2
NNODES=15, MYRANK=14, HOSTNAME=new-host-2
NNODES=15, MYRANK=3, HOSTNAME=new-host-2
NNODES=15, MYRANK=4, HOSTNAME=new-host-2
NNODES=15, MYRANK=10, HOSTNAME=new-host-2
NNODES=15, MYRANK=2, HOSTNAME=new-host-2
NNODES=15, MYRANK=11, HOSTNAME=new-host-2
NNODES=15, MYRANK=5, HOSTNAME=new-host-2
NNODES=15, MYRANK=9, HOSTNAME=new-host-2
NNODES=15, MYRANK=0, HOSTNAME=new-host-2
NODEID=0 argc=20
Reading file work/wudata_06.tpr, VERSION 3.3.99_development_20070618 (single precision)
NODEID=2 argc=20
NODEID=3 argc=20
NODEID=4 argc=20
NODEID=6 argc=20
NODEID=7 argc=20
NODEID=8 argc=20
NODEID=10 argc=20
NODEID=12 argc=20
NODEID=13 argc=20
NODEID=14 argc=20
NODEID=5 argc=20
NODEID=11 argc=20
NODEID=9 argc=20
NODEID=1 argc=20
Note: tpx file_version 48, software version 68

Will use 10 particle-particle and 5 PME only nodes
This is a guess, check the performance at the end of the log file

NOTE: The tpr file used for this simulation is in an old format, for less memory usage and possibly more performance create a new tpr file with an up to date version of grompp

Making 2D domain decomposition 5 x 1 x 2
starting mdrun 'SINGLE VESICLE in water'
3750000 steps,  15000.0 ps (continuing from step 3500000,  14000.0 ps).
[07:36:24] Completed 0 out of 250000 steps  (0%)

t = 14000.001 ps: Water molecule starting at atom 149430 can not be settled.
Check for bad contacts and/or reduce the timestep.
[07:36:26] 
[07:36:26] Folding@home Core Shutdown: INTERRUPTED
application called MPI_Abort(MPI_COMM_WORLD, 102) - process 0
[0]0:Return code = 102
[0]1:Return code = 0, signaled with Segmentation fault
[0]2:Return code = 0, signaled with Segmentation fault
[0]3:Return code = 0, signaled with Quit
[0]4:Return code = 0, signaled with Quit
[0]5:Return code = 0, signaled with Quit
[0]6:Return code = 0, signaled with Quit
[0]7:Return code = 0, signaled with Quit
[0]8:Return code = 0, signaled with Quit
[0]9:Return code = 0, signaled with Quit
[0]10:Return code = 0, signaled with Quit
[0]11:Return code = 0, signaled with Quit
[0]12:Return code = 0, signaled with Quit
[0]13:Return code = 0, signaled with Quit
[0]14:Return code = 0, signaled with Quit
[07:36:41] CoreStatus = 66 (102)
[07:36:41] + Shutdown requested by user. Exiting.***** Got a SIGTERM signal (15)
[07:36:41] Killing all core threads

Folding@Home Client Shutdown.
Last edited by Karamiekos on Wed Jan 20, 2010 8:08 am, edited 2 times in total.
Zakk Wylde, "Then you start firing back some cocktails."
Rigs
Phenom II 965 With 2 4850s Running BOINC
Quad 8356s Running BOINC
Karamiekos
Posts: 33
Joined: Tue Jul 15, 2008 12:27 am

Re: FAH Core Interrupted

Post by Karamiekos »

I tried to delete it, but I kept getting the exact same work unit back, always with the same result. I have had zero problems up to now, so I think there might be something wrong with this particular one. I finally got a different work unit after 2-3 attempts and it is working fine.

Just a heads up: it would be good to have the next person who gets it confirm whether they see the same problem.
rickoic
Posts: 320
Joined: Sat May 23, 2009 4:49 pm
Hardware configuration: eVga x299 DARK 2070 Super, eVGA 2080, eVga 1070, eVga 2080 Super
MSI x399 eVga 2080, eVga 1070, eVga 1070, GT970
Location: Mississippi near Memphis, Tn

Re: FAH Core Interrupted Project 2683(Run 5, Clone 5, Gen 14)

Post by rickoic »

The line:
t = 14000.001 ps: Water molecule starting at atom 149430 can not be settled.

indicates that the beginning parameters are so far out of bounds that folding cannot be accomplished.
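
For anyone who wants to check their own logs for the same message, here is a minimal Python sketch that pulls the simulation time and atom number out of each "can not be settled" line. The regular expression is written against the exact wording in the logs posted in this thread; the function name `find_settle_errors` and the sample text are only illustrative and are not part of the FAH client.

```python
import re

# Matches the Gromacs message as it appears in the logs in this thread, e.g.
# "t = 14000.001 ps: Water molecule starting at atom 149430 can not be settled."
SETTLE_RE = re.compile(
    r"t\s*=\s*(?P<time_ps>\d+(?:\.\d+)?)\s*ps:\s*"
    r"Water molecule starting at atom (?P<atom>\d+) can not be settled"
)


def find_settle_errors(log_text):
    """Return a list of (time_ps, atom_index) tuples found in log_text."""
    return [
        (float(m.group("time_ps")), int(m.group("atom")))
        for m in SETTLE_RE.finditer(log_text)
    ]


# Sample text copied from a log excerpt in this thread.
sample = """\
[16:47:55] Completed 0 out of 250000 steps  (0%)

t = 14000.001 ps: Water molecule starting at atom 859944 can not be settled.
Check for bad contacts and/or reduce the timestep.

t = 14000.001 ps: Water molecule starting at atom 597471 can not be settled.
Check for bad contacts and/or reduce the timestep.
"""

for time_ps, atom in find_settle_errors(sample):
    print(f"settle failure at t = {time_ps} ps, starting atom {atom}")
```

If different machines report the failure at the same simulation time on the very first step, that points at the work unit rather than at any one computer.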

You need to post this in the forum for problems with a specific work unit so that the Pande Group can remove the work unit.

Until then you may have to remove -bigadv and fold one of the 1920-point WUs to get your PC back up and running.
Erase this one, remove -bigadv, and download a WU. Then stop it and put -bigadv back in if you want, or fold that way for a day or so.

That is probably the only way you're going to be able to keep folding without receiving this bad WU again.

Fold on

Rick
I'm folding because Dec 2005 I had radical prostate surgery.
Lost brother to spinal cancer, brother-in-law to prostate cancer.
Several 1st cousins lost and a few who have survived.
Karamiekos
Posts: 33
Joined: Tue Jul 15, 2008 12:27 am

Re: FAH Core Interrupted Project 2683(Run 5, Clone 5, Gen 14)

Post by Karamiekos »

I did get it up and running on another unit with no problems. I am definitely leaning towards a work unit problem. I posted here due to the unique nature of the -bigadv project, but if the mods see fit I hope they move the thread wherever it needs to be.

Thank you for the input though, Rick. If they don't seem to notice this thread I will try again to contact them and make sure they are aware. I know they are busy.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: FAH Core Interrupted Project 2683(Run 5, Clone 5, Gen 14)

Post by bruce »

There's certainly a possibility of a bad WU but has this particular client completed other BigWUs successfully? One possible cause for the error is insufficient (virtual?) memory.
Karamiekos
Posts: 33
Joined: Tue Jul 15, 2008 12:27 am

Re: FAH Core Interrupted Project 2683(Run 5, Clone 5, Gen 14)

Post by Karamiekos »

This client has been running big work units fine for about a month now. The machine has 8 GB of RAM and an 11 GB swap; it usually uses less than 6 GB of RAM and doesn't touch the swap.
I would be really interested to see what happens if someone else gets it.
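
To rule out the memory question on a Linux box, a rough Python sketch like the one below reads /proc/meminfo and reports how much RAM and swap are still free. It assumes a Linux kernel that exposes /proc/meminfo; the helper names `parse_meminfo` and `headroom_gib` are made up for this example, and how much headroom a bigadv unit actually needs is not something this thread states.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: number}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def headroom_gib(meminfo):
    """Return (available RAM, free swap) in GiB."""
    kib_per_gib = 1024 * 1024
    # MemAvailable only exists on newer kernels; fall back to MemFree otherwise.
    ram = meminfo.get("MemAvailable", meminfo.get("MemFree", 0))
    swap = meminfo.get("SwapFree", 0)
    return ram / kib_per_gib, swap / kib_per_gib


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as fh:
            ram_gib, swap_gib = headroom_gib(parse_meminfo(fh.read()))
        print(f"available RAM: {ram_gib:.1f} GiB, free swap: {swap_gib:.1f} GiB")
    except FileNotFoundError:
        print("/proc/meminfo not found (not a Linux system?)")
```

Running it while the core is starting up shows whether the machine is anywhere near exhausting RAM or swap at the moment the error appears.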
k1wi
Posts: 909
Joined: Tue Sep 22, 2009 10:48 pm

Project: 2683 (Run 5, Clone 5, Gen 14)

Post by k1wi »


[16:47:19] *------------------------------*
[16:47:19] Folding@Home Gromacs SMP Core
[16:47:19] Version 2.10 (Sun Aug 30 03:43:28 CEST 2009)
[16:47:19] 
[16:47:19] Preparing to commence simulation
[16:47:19] - Ensuring status. Please wait.
[16:47:19] Files status OK
[16:47:22] - Expanded 30234593 -> 159270593 (decompressed 100.6 percent)
[16:47:22] Called DecompressByteArray: compressed_data_size=30234593 data_size=159270593, decompressed_data_size=159270593 diff=0
[16:47:23] - Digital signature verified
[16:47:23] 
[16:47:23] Project: 2683 (Run 5, Clone 5, Gen 14)
[16:47:23] 
[16:47:23] Assembly optimizations on if available.
[16:47:23] Entering M.D.
[16:47:34]  (Run 5, Clone 5, Gen 14)
[16:47:34] 
[16:47:35] Entering M.D.
NNODES=8, MYRANK=0, HOSTNAME=FAH
NODEID=0 argc=20
NNODES=8, MYRANK=1, HOSTNAME=FAH
NODEID=1 argc=20
NNODES=8, MYRANK=2, HOSTNAME=FAH
NODEID=2 argc=20
NNODES=8, MYRANK=3, HOSTNAME=FAH
NODEID=3 argc=20
NNODES=8, MYRANK=4, HOSTNAME=FAH
NODEID=4 argc=20
NNODES=8, MYRANK=5, HOSTNAME=FAH
NODEID=5 argc=20
NNODES=8, MYRANK=6, HOSTNAME=FAH
NODEID=6 argc=20
NNODES=8, MYRANK=7, HOSTNAME=FAH
NODEID=7 argc=20
Reading file work/wudata_02.tpr, VERSION 3.3.99_development_20070618 (single precision)
Note: tpx file_version 48, software version 68

NOTE: The tpr file used for this simulation is in an old format, for less memory usage and possibly more performance create a new tpr file with an up to date version of grompp

Making 1D domain decomposition 8 x 1 x 1
starting mdrun 'SINGLE VESICLE in water'
3750000 steps,  15000.0 ps (continuing from step 3500000,  14000.0 ps).
[16:47:55] Completed 0 out of 250000 steps  (0%)

t = 14000.001 ps: Water molecule starting at atom 859944 can not be settled.
Check for bad contacts and/or reduce the timestep.

t = 14000.001 ps: Water molecule starting at atom 597471 can not be settled.
Check for bad contacts and/or reduce the timestep.
[16:47:57] 
[16:47:57] Folding@home Core Shutdown: INTERRUPTED
application called MPI_Abort(MPI_COMM_WORLD, 102) - process 0
[0]0:Return code = 102
[0]1:Return code = 0, signaled with Segmentation fault
[0]2:Return code = 0, signaled with Quit
[0]3:Return code = 0, signaled with Quit
[0]4:Return code = 0, signaled with Quit
[0]5:Return code = 0, signaled with Quit
[0]6:Return code = 0, signaled with Quit
[0]7:Return code = 0, signaled with Quit
[16:48:05] CoreStatus = 66 (102)
[16:48:05] + Shutdown requested by user. Exiting.***** Got a SIGTERM signal (15)

Folding@Home Client Shutdown.

I'm just about to go and look at how to get rid of this work unit, as the client isn't starting a new one, so my computer is sitting idle at the moment.

Will this hurt my passkey ratio, i.e. affect whether or not I earn -bigadv bonuses?
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 2683 (Run 5, Clone 5, Gen 14)

Post by bruce »

The completion ratio required for bonuses is 80%, so unless you've had other failures, this will not affect your bonus.

I'll report this as a bad WU.
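
As a rough illustration of the 80% figure above, here is a small Python sketch of the arithmetic. It assumes the ratio is simply successfully returned units divided by all assigned units, and that exactly 80% still qualifies; the thread does not spell out the servers' exact formula, so both are assumptions, and the function names and unit counts are hypothetical.

```python
BONUS_THRESHOLD = 0.80  # the 80% figure quoted in this thread


def completion_ratio(completed, failed):
    """Fraction of assigned work units that were returned successfully."""
    assigned = completed + failed
    if assigned == 0:
        raise ValueError("no work units assigned yet")
    return completed / assigned


def keeps_bonus(completed, failed, threshold=BONUS_THRESHOLD):
    """True if the ratio is at or above the threshold (inclusive by assumption)."""
    return completion_ratio(completed, failed) >= threshold


# Hypothetical history: 20 units returned, 1 lost to a bad WU.
ratio = completion_ratio(20, 1)
print(f"ratio = {ratio:.1%}, bonus kept: {keeps_bonus(20, 1)}")
```

Under these assumptions a single failed unit only matters for someone with very few completed units or several earlier failures.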
tear
Posts: 254
Joined: Sun Dec 02, 2007 4:08 am
Hardware configuration: None
Location: Rocky Mountains

Re: Project: 2683 (Run 5, Clone 5, Gen 14)

Post by tear »

FYI, it failed for me the same way on the 20th; I found it in the log just now.
One man's ceiling is another man's floor.