NaN detected OR Bad WU? Project: 2605 (12, 272, 75)

Moderators: Site Moderators, FAHC Science Team

MoneyGuyBK
Posts: 179
Joined: Sun Dec 02, 2007 6:40 am
Location: Team_XPS ..... OC, S. Calif

NaN detected OR Bad WU? Project: 2605 (12, 272, 75)

Post by MoneyGuyBK »

ERROR "Quit 101 - NaN detected: (ener[20])" OR Problem WU: Project: 2605 (Run 12, Clone 272, Gen 75)

This is the first time I've seen this error code, so I named the subject of this post after it.
If I should have named it after the particular WU instead, please say so.

So, would anyone care to tell me what this error code means? Or was this just a bad or problem WU?
TIA

Peace

Code:

[12:24:58] Completed 500000 out of 500000 steps  (100 percent)
[12:24:58] Writing final coordinates.
[12:24:58] Past main M.D. loop
[12:24:59] Will end MPI now
[12:25:58] 
[12:25:58] Finished Work Unit:
[12:25:58] - Reading up to 3723552 from "work/wudata_06.arc": Read 3723552
[12:25:58] - Reading up to 1779224 from "work/wudata_06.xtc": Read 1779224
[12:25:59] goefile size: 0
[12:25:59] logfile size: 16916
[12:25:59] Leaving Run
[12:26:03] - Writing 5524092 bytes of core data to disk...
[12:26:03]   ... Done.
[12:26:04] - Shutting down core
[12:26:04] 
[12:26:04] Folding@home Core Shutdown: FINISHED_UNIT
[12:26:10] CoreStatus = 64 (100)
[12:26:10] Unit 6 finished with 78 percent of time to deadline remaining.
[12:26:10] Updated performance fraction: 0.739575
[12:26:10] Sending work to server


[12:26:10] + Attempting to send results
[12:26:10] - Reading file work/wuresults_06.dat from core
[12:26:10]   (Read 5524092 bytes from disk)
[12:26:10] Connecting to http://171.64.65.56:8080/
[12:26:36] Posted data.
[12:26:36] Initial: 0000; - Uploaded at ~199 kB/s
[12:26:37] - Averaged speed for that direction ~135 kB/s
[12:26:37] + Results successfully sent
[12:26:37] Thank you for your contribution to Folding@Home.
[12:26:37] + Number of Units Completed: 17

[12:30:42] - Warning: Could not delete all work unit files (6): Core returned invalid code
[12:30:42] Trying to send all finished work units
[12:30:42] + No unsent completed units remaining.
[12:30:42] - Preparing to get new work unit...
[12:30:42] + Attempting to get work packet
[12:30:42] - Will indicate memory of 512 MB
[12:30:42] - Connecting to assignment server
[12:30:42] Connecting to http://assign.stanford.edu:8080/
[12:30:42] Posted data.
[12:30:42] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[12:30:42] + News From Folding@Home: Welcome to Folding@Home
[12:30:42] Loaded queue successfully.
[12:30:42] Connecting to http://171.64.65.56:8080/
[12:30:45] Posted data.
[12:30:45] Initial: 0000; - Receiving payload (expected size: 2419460)
[12:30:46] - Downloaded at ~2362 kB/s
[12:30:46] - Averaged speed for that direction ~1788 kB/s
[12:30:46] + Received work.
[12:30:46] Trying to send all finished work units
[12:30:46] + No unsent completed units remaining.
[12:30:46] + Closed connections
[12:30:46] 
[12:30:46] + Processing work unit
[12:30:46] Core required: FahCore_a1.exe
[12:30:46] Core found.
[12:30:46] Working on Unit 07 [July 20 12:30:46]
[12:30:46] + Working ...
[12:30:46] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 07 -priority 96 -checkpoint 30 -verbose -lifeline 5563 -version 602'

[12:30:46] 
[12:30:46] *------------------------------*
[12:30:46] Folding@Home Gromacs SMP Core
[12:30:46] Version 1.74 (November 27, 2006)
[12:30:46] 
[12:30:46] Preparing to commence simulation
[12:30:46] - Ensuring status. Please wait.
[12:30:47] - Starting from initial work packet
[12:30:47] 
[12:30:47] Project: 2605 (Run 12, Clone 272, Gen 75)
[12:30:47] 
[12:30:47] Assembly optimizations on if available.
[12:30:47] Entering M.D.
[12:31:04] percent)
[12:31:04] - Starting from initial work packet
[12:31:04] 
[12:31:04] Project: 2605 (Run 12, Clone 272, Gen 75)
[12:31:04] 
[12:31:05] Entering M.D.
[12:31:12] g local files
[12:31:12] in in POPC
[12:31:12] Writing local files
[12:31:13] Extra SSE boost OK.
[12:43:54] es
[12:43:54] Completed 5000 out of 500000 steps  (1 percent)
[12:56:33] Writing local files
[12:56:33] Completed 10000 out of 500000 steps  (2 percent)
[13:09:12] Writing local files
[13:09:12] Completed 15000 out of 500000 steps  (3 percent)
[13:21:50] Writing local files
[13:21:50] Completed 20000 out of 500000 steps  (4 percent)
[13:23:07] - Autosending finished units...
[13:23:07] Trying to send all finished work units
[13:23:07] + No unsent completed units remaining.
[13:23:07] - Autosend completed
[13:34:27] Writing local files
[13:34:27] Completed 25000 out of 500000 steps  (5 percent)
[13:47:03] Writing local files
[13:47:03] Completed 30000 out of 500000 steps  (6 percent)
[13:59:40] Writing local files
[13:59:40] Completed 35000 out of 500000 steps  (7 percent)
[14:12:15] Writing local files
[14:12:15] Completed 40000 out of 500000 steps  (8 percent)
[14:24:51] Writing local files
[14:24:51] Completed 45000 out of 500000 steps  (9 percent)
[14:37:30] Writing local files
[14:37:30] Completed 50000 out of 500000 steps  (10 percent)
[14:50:05] Writing local files
[14:50:05] Completed 55000 out of 500000 steps  (11 percent)
[15:02:41] Writing local files
[15:02:41] Completed 60000 out of 500000 steps  (12 percent)
[15:15:19] Writing local files
[15:15:19] Completed 65000 out of 500000 steps  (13 percent)
[15:27:56] Writing local files
[15:27:56] Completed 70000 out of 500000 steps  (14 percent)
[15:40:35] Writing local files
[15:40:35] Completed 75000 out of 500000 steps  (15 percent)
[15:45:07] Quit 101 - NaN detected: (ener[20])
[15:45:07] 
[15:45:07] Simulation instability has been encountered. The run has entered a
[15:45:07]   state from which no further progress can be made.
[15:45:07] This may be the correct result of the simulation, however if you
[15:45:07]   often see other project units terminating early like this
[15:45:07]   too, you may wish to check the stability of your computer (issues
[15:45:07]   such as high temperature, overclocking, etc.).
[15:45:07] Going to send back what have done.
[15:45:07] logfile size: 8599
[15:45:07] - Writing 9149 bytes of core data to disk...
[15:45:07]   ... Done.
[15:45:07] 
[15:45:07] Folding@home Core Shutdown: EARLY_UNIT_END
[15:45:12] CoreStatus = 72 (114)
[15:45:12] Sending work to server


[15:45:12] + Attempting to send results
[15:45:12] - Reading file work/wuresults_07.dat from core
[15:45:12]   (Read 9149 bytes from disk)
[15:45:12] Connecting to http://171.64.65.56:8080/
[15:45:12] Posted data.
[15:45:13] Initial: 0000; - Uploaded at ~9 kB/s
[15:45:13] - Averaged speed for that direction ~110 kB/s
[15:45:13] + Results successfully sent
[15:45:13] Thank you for your contribution to Folding@Home.
[15:49:17] - Warning: Could not delete all work unit files (7): Core returned invalid code
[15:49:17] Trying to send all finished work units
[15:49:17] + No unsent completed units remaining.
[15:49:17] - Preparing to get new work unit...
[15:49:17] + Attempting to get work packet
[15:49:17] - Will indicate memory of 512 MB
[15:49:17] - Connecting to assignment server
[15:49:17] Connecting to http://assign.stanford.edu:8080/
[15:49:17] Posted data.
[15:49:17] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[15:49:17] + News From Folding@Home: Welcome to Folding@Home
[15:49:17] Loaded queue successfully.
[15:49:17] Connecting to http://171.64.65.56:8080/
[15:49:20] Posted data.
[15:49:20] Initial: 0000; - Receiving payload (expected size: 2449057)
[15:49:21] - Downloaded at ~2391 kB/s
[15:49:21] - Averaged speed for that direction ~1908 kB/s
[15:49:21] + Received work.
[15:49:21] Trying to send all finished work units
[15:49:21] + No unsent completed units remaining.
[15:49:21] + Closed connections
[15:49:26] 
[15:49:26] + Processing work unit
[15:49:26] Core required: FahCore_a1.exe
[15:49:26] Core found.
[15:49:26] Working on Unit 08 [July 20 15:49:26]
[15:49:26] + Working ...
[15:49:26] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 08 -priority 96 -checkpoint 30 -verbose -lifeline 5563 -version 602'

[15:49:26] 
[15:49:26] *------------------------------*
[15:49:26] Folding@Home Gromacs SMP Core
[15:49:26] Version 1.74 (November 27, 2006)
[15:49:26] 
[15:49:26] Preparing to commence simulation
[15:49:26] - Ensuring status. Please wait.
[15:49:26] - Starting from initial work packet
[15:49:26] 
[15:49:26] Project: 2605 (Run 13, Clone 408, Gen 74)
[15:49:26] 
[15:49:27] Assembly optimizations on if available.
[15:49:27] Entering M.D.
[15:49:43] percent)
[15:49:44] - Starting from initial work packet
[15:49:44] 
[15:49:44] Project: 2605 (Run 13, Clone 408, Gen 74)
[15:49:44] 
[15:49:44] Entering M.D.
[15:49:51] g local files
[15:49:51] in in POPC
[15:49:51] Writing local files
[15:49:52] Extra SSE boost OK.
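The core's own message says to check your hardware only if you *often* see units end early like this. Below is a minimal sketch for tallying that from the client log. It assumes the line formats shown in the log above ("Project: 2605 (Run 12, Clone 272, Gen 75)", "NaN detected: (ener[20])", "Core Shutdown: EARLY_UNIT_END"). The regular expressions and the `summarize` helper are hypothetical, written against this excerpt only, and may need adjusting for other client or core versions.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Patterns written against the log lines in this thread (assumption: other
# client/core versions may format these lines differently).
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
NAN_RE = re.compile(r"NaN detected: \((\w+\[\d+\])\)")
SHUTDOWN_RE = re.compile(r"Core Shutdown: (\w+)")

# (project, shutdown status, NaN term or None)
Unit = Tuple[Optional[str], str, Optional[str]]


def summarize(lines: Iterable[str]) -> List[Unit]:
    """Return one (project, shutdown status, NaN term) tuple per finished unit."""
    units: List[Unit] = []
    project: Optional[str] = None
    nan_term: Optional[str] = None
    for line in lines:
        if m := PROJECT_RE.search(line):
            project = "P{} R{} C{} G{}".format(*m.groups())
        elif m := NAN_RE.search(line):
            nan_term = m.group(1)
        elif m := SHUTDOWN_RE.search(line):
            units.append((project, m.group(1), nan_term))
            project, nan_term = None, None  # reset for the next unit
    return units


sample = """\
[12:30:47] Project: 2605 (Run 12, Clone 272, Gen 75)
[15:45:07] Quit 101 - NaN detected: (ener[20])
[15:45:07] Folding@home Core Shutdown: EARLY_UNIT_END
""".splitlines()

units = summarize(sample)
for unit in units:
    print(unit)  # ('P2605 R12 C272 G75', 'EARLY_UNIT_END', 'ener[20]')

statuses = Counter(status for _, status, _ in units)
print(statuses)
```

Run against a full log, a single `EARLY_UNIT_END` among many `FINISHED_UNIT` entries is more consistent with one unlucky WU than with an unstable machine.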
T.E.A.M. “Together Everyone Accomplishes Miracles!”
OC, S. California ... God Bless All
toTOW
Site Moderator
Posts: 6395
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: NaN detected OR Bad WU? Project: 2605 (12, 272, 75)

Post by toTOW »

Hi MoneyGuyBK (team 80856),
Your WU (P2605 R12 C272 G75) was added to the stats database on 2008-07-20 10:36:50 for 270.32 points of credit.

You're the only one who submitted this WU at the moment ...

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
MoneyGuyBK
Posts: 179
Joined: Sun Dec 02, 2007 6:40 am
Location: Team_XPS ..... OC, S. Calif

Re: NaN detected OR Bad WU? Project: 2605 (12, 272, 75)

Post by MoneyGuyBK »

Thanks for checking, toTOW. So I received partial credit.
If you or any mod wouldn't mind, so that I'll know in the future:

* What is ERROR "Quit 101 - NaN detected: (ener[20])"?
Is this an EUE (Early Unit End) error, or is it something new? Please educate me ;)



Peace
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: NaN detected OR Bad WU? Project: 2605 (12, 272, 75)

Post by bruce »

NaN means "Not a Number". Many people commonly call it "infinity", but it's really the result of dividing zero by zero or of other undefined math (such as infinity minus infinity) for which no accurate answer can be provided.
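The difference between infinity and NaN is easy to reproduce with ordinary IEEE 754 double-precision arithmetic. This sketch uses Python purely for illustration. One caveat: Python raises `ZeroDivisionError` for `x / 0.0` instead of returning infinity, so the infinity here comes from overflow instead.

```python
import math

# Overflowing past the largest double gives +infinity, not NaN.
overflow = 1e308 * 10.0
# inf - inf and 0 * inf have no meaningful value, so IEEE 754 returns NaN.
undefined = overflow - overflow
also_nan = 0.0 * math.inf

print(overflow, undefined, also_nan)                # inf nan nan
print(math.isinf(overflow), math.isnan(undefined))  # True True

# NaN is the only value that does not compare equal to itself.
print(undefined == undefined)                       # False

# Once a NaN appears, every sum that touches it is NaN too,
# so one bad term is enough to spoil a whole energy total.
total = sum([1.5, -2.0, undefined, 40.0])
print(total)                                        # nan
```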

This can happen because of preconditions in the WU (e.g., if two atoms collide or decide to fly off into space), or it can happen because of hardware errors, including overclocking, overheating, or things like that.
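To make "the run stops as soon as a NaN shows up" concrete, here is a hypothetical per-step check of the kind a simulation might run on its array of energy terms. The function names and the `ener` list are made up for illustration and are not the Gromacs core's actual code. Only the message format imitates the "Quit 101" log line above.

```python
import math
from typing import Optional, Sequence


def first_bad_term(ener: Sequence[float]) -> Optional[int]:
    """Return the index of the first NaN or infinite energy term, or None."""
    for i, value in enumerate(ener):
        if not math.isfinite(value):
            return i
    return None


def check_step(ener: Sequence[float]) -> str:
    """Hypothetical sanity check; returns 'ok' or an error message."""
    bad = first_bad_term(ener)
    if bad is None:
        return "ok"
    kind = "NaN" if math.isnan(ener[bad]) else "Inf"
    return f"{kind} detected: (ener[{bad}])"


energies = [0.0] * 25
print(check_step(energies))         # ok
energies[20] = math.inf - math.inf  # simulate one term blowing up
print(check_step(energies))         # NaN detected: (ener[20])
```

A check like this cannot tell *why* a term went bad. A genuinely unstable starting structure and a bit flipped by an overheated CPU both end up as the same NaN, which is why the core's message only suggests checking hardware if it happens often.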
MoneyGuyBK
Posts: 179
Joined: Sun Dec 02, 2007 6:40 am
Location: Team_XPS ..... OC, S. Calif

Re: NaN detected OR Bad WU? Project: 2605 (12, 272, 75)

Post by MoneyGuyBK »

bruce wrote: NaN means "Not a Number". Many people commonly call it "infinity", but it's really the result of dividing zero by zero or of other undefined math (such as infinity minus infinity) for which no accurate answer can be provided.

This can happen because of preconditions in the WU (e.g., if two atoms collide or decide to fly off into space), or it can happen because of hardware errors, including overclocking, overheating, or things like that.
Thanks for the explanation, my dear neighbor. Now I know I learned something today :mrgreen:



Peace