Project: 2665 (Run 0, Clone 628, Gen 42)

Posted: Wed Sep 03, 2008 12:06 pm
by ChrisDTC
Broke three times at the same place. Other WUs are fine on this machine, so I don't think it's instability.

Code:

[23:38:27] 
[23:38:27] + Processing work unit
[23:38:27] Work type a1 not eligible for variable processors
[23:38:27] Core required: FahCore_a1.exe
[23:38:27] Core found.
[23:38:27] Working on queue slot 05 [September 2 23:38:27 UTC]
[23:38:27] + Working ...
[23:38:27] - Calling 'mpiexec -np 4 -channel shm -env MPICH_USE_SMP_OPTIMIZATIONS 1 -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 05 -checkpoint 3 -service -verbose -lifeline 3276 -version 622'

[23:38:30] 
[23:38:30] *------------------------------*
[23:38:30] Folding@Home Gromacs SMP Core
[23:38:30] Version 1.76 (February 23, 2008)
[23:38:30] 
[23:38:30] Preparing to commence simulation
[23:38:30] - Ensuring status. Please wait- Created dyn
[23:38:30] - Files status OK
[23:38:36] - Expanded 4737438 -> 24426905 (decompressed 515.6 percent)
[23:38:36] - Starting from initial work packet
[23:38:36] 
[23:38:36] Project: 2665 (Run 0, Clone 628, Gen 42)
[23:38:36] 
[23:38:37] Assembly optimizations on if available.
[23:38:37] Entering M.D.
[23:38:59]  percent)
[23:38:59] - Starting from initial work packet
[23:38:59] 
[23:38:59] Project: 2665 (Run 0, Clone 628, Gen 42)
[23:38:59] 
[23:39:01] Entering M.D.
[23:39:11] Rejecting checkpoint
[23:39:13] Protein: HGG in water
[23:39:13] Writing local files
[23:39:20] Extra SSE boost OK.
[23:39:21] Writing local files
[23:39:21] Completed 0 out of 250000 steps  (0 percent)
[23:42:20] Timered checkpoint triggered.
[23:43:05] - Autosending finished units... [September 2 23:43:05 UTC]
[23:43:05] Trying to send all finished work units
[23:43:05] + No unsent completed units remaining.
[23:43:05] - Autosend completed
[23:45:21] Timered checkpoint triggered.
[23:48:21] Timered checkpoint triggered.
[23:51:20] Timered checkpoint triggered.
[23:54:21] Timered checkpoint triggered.
[23:54:33] Writing local files
[23:54:34] Completed 2500 out of 250000 steps  (1 percent)
[23:57:34] Timered checkpoint triggered.
[00:00:35] Timered checkpoint triggered.
[00:03:36] Timered checkpoint triggered.
[00:06:37] Timered checkpoint triggered.
[00:09:38] Timered checkpoint triggered.
[00:10:24] Writing local files
[00:10:25] Completed 5000 out of 250000 steps  (2 percent)
[00:13:25] Timered checkpoint triggered.
[00:16:26] Timered checkpoint triggered.
[00:19:27] Timered checkpoint triggered.
[00:22:28] Timered checkpoint triggered.
[00:25:01] Writing local files
[00:25:01] Completed 7500 out of 250000 steps  (3 percent)
[00:28:02] Timered checkpoint triggered.
[00:30:44] Warning:  long 1-4 interactions
[00:30:45] Quit 101 - NaN detected: (ener[0])
[00:30:45] 
[00:30:45] Simulation instability has been encountered. The run has entered a
[00:30:45]   state from which no further progress can be made.
[00:30:45] This may be the correct result of the simulation, however if you
[00:30:45]   often see other project units terminating early like this
[00:30:45]   too, you may wish to check the stability of your computer (issues
[00:30:45]   such as high temperature, overclocking, etc.).
[00:30:45] Going to send back what have done.
[00:30:45] logfile size: 15854
[00:30:45] - Writing 16403 bytes of core data to disk...
[00:30:45]   ... Done.
[00:30:45] No C.P. to delete.
[00:30:45] - Failed to delete work/wudata_05.dyn
[00:30:45] - Failed to delete work/wudata_05.chk
[00:30:45] - Failed to delete work/wudata_05.pdo
[00:30:45] - Failed to delete work/wudata_05.xvg
[00:30:45] Warning:  check for stray files
[00:32:45] 
[00:32:45] Folding@home Core Shutdown: EARLY_UNIT_END
[00:32:45] 
[00:32:45] Folding@home Core Shutdown: EARLY_UNIT_END
[00:32:50] CoreStatus = 63 (99)
[00:32:50] + Error starting Folding@Home core.
[00:32:55] 
[00:32:55] + Processing work unit
[00:32:55] Work type a1 not eligible for variable processors
[00:32:55] Core required: FahCore_a1.exe
[00:32:55] Core found.
[00:32:55] Working on queue slot 05 [September 3 00:32:55 UTC]
[00:32:55] + Working ...
[00:32:55] - Calling 'mpiexec -np 4 -channel shm -env MPICH_USE_SMP_OPTIMIZATIONS 1 -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 05 -checkpoint 3 -service -verbose -lifeline 3276 -version 622'

[00:32:58] 
[00:32:58] *------------------------------*
[00:32:58] Folding@Home Gromacs SMP Core
[00:32:58] Version 1.76 (February 23, 2008)
[00:32:58] 
[00:32:58] Preparing to commence simulation
[00:32:58] - Ensuring status. Please wait.
[00:32:58] Created dyn
[00:32:58] - Files status OK
[00:32:58] 
[00:32:58] Folding@home Core Shutdown: MISSING_WORK_FILES
[00:32:58] Finalizing output
[00:33:15] ation of core was improper.
[00:33:15] - Going to use standard loops.
[00:33:15] - Files status OK
[00:35:15] 
[00:35:15] Folding@home Core Shutdown: MISSING_WORK_FILES
[00:35:15] Finalizing output
[00:35:19] CoreStatus = 1 (1)
[00:35:19] Client-core communications error: ERROR 0x1
[00:35:19] Deleting current work unit & continuing...
[00:37:43] - Warning: Could not delete all work unit files (5): Core returned invalid code
[00:37:43] Trying to send all finished work units
[00:37:43] + No unsent completed units remaining.
[00:37:43] - Preparing to get new work unit...
[00:37:43] + Attempting to get work packet
[00:37:43] - Will indicate memory of 3581 MB
[00:37:43] - Connecting to assignment server
[00:37:43] Connecting to http://assign.stanford.edu:8080/
[00:37:43] Posted data.
[00:37:43] Initial: 40AB; - Successful: assigned to (171.64.65.64).
[00:37:43] + News From Folding@Home: Welcome to Folding@Home
[00:37:44] Loaded queue successfully.
[00:37:44] Connecting to http://171.64.65.64:8080/
[00:37:51] Posted data.
[00:37:51] Initial: 0000; - Receiving payload (expected size: 4737950)
[00:37:59] - Downloaded at ~578 kB/s
[00:37:59] - Averaged speed for that direction ~558 kB/s
[00:37:59] + Received work.
[00:37:59] + Closed connections
[00:38:04] 
[00:38:04] + Processing work unit
[00:38:04] Work type a1 not eligible for variable processors
[00:38:04] Core required: FahCore_a1.exe
[00:38:04] Core found.
[00:38:04] Working on queue slot 06 [September 3 00:38:04 UTC]
[00:38:04] + Working ...
[00:38:04] - Calling 'mpiexec -np 4 -channel shm -env MPICH_USE_SMP_OPTIMIZATIONS 1 -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 06 -checkpoint 3 -service -verbose -lifeline 3276 -version 622'

[00:38:07] 
[00:38:07] *------------------------------*
[00:38:07] Folding@Home Gromacs SMP Core
[00:38:07] Version 1.76 (February 23, 2008)
[00:38:07] 
[00:38:07] Preparing to commence simulation
[00:38:07] - Ensuring status. Please wait.
[00:38:24] - Looking at optimizations...
[00:38:24] - Working with standard loops on this execution.
[00:38:24] - Previous termination of core was improper.
[00:38:24] - Going to use standard loops.
[00:38:24] - Files status OK
[00:38:40] - Expanded 4737438 -> 24426905 (decompressed 515.6 percent)
[00:38:40] ne 628, Gen 42)
[00:38:40] 
[00:38:40] nitial work packet
[00:38:40] 
[00:38:40] Project: 2665 (Run 0, Clone 628, Gen 42)
[00:38:40] 
[00:38:41] Entering M.D.
[00:38:53] Protein: HGG in water
[00:38:55] Writing local files
[00:38:55] Extra SSE boost OK.
[00:39:07] 0000 steps  (0 percent)
[00:42:08] Timered checkpoint triggered.
[00:45:09] Timered checkpoint triggered.
[00:48:10] Timered checkpoint triggered.
[00:51:11] Timered checkpoint triggered.
[00:53:57] Writing local files
[00:53:57] Completed 2500 out of 250000 steps  (1 percent)
[00:56:58] Timered checkpoint triggered.
[00:59:59] Timered checkpoint triggered.
[01:03:00] Timered checkpoint triggered.
[01:06:01] Timered checkpoint triggered.
[01:08:32] Writing local files
[01:08:32] Completed 5000 out of 250000 steps  (2 percent)
[01:11:33] Timered checkpoint triggered.
[01:14:34] Timered checkpoint triggered.
[01:17:35] Timered checkpoint triggered.
[01:20:37] Timered checkpoint triggered.
[01:23:07] Writing local files
[01:23:07] Completed 7500 out of 250000 steps  (3 percent)
[01:26:07] Timered checkpoint triggered.
[01:28:52] Warning:  long 1-4 interactions
[01:28:52] Quit 101 - NaN detected: (ener[20])
[01:28:52] 
[01:28:52] Simulation instability has been encountered. The run has entered a
[01:28:52]   state from which no further progress can be made.
[01:28:52] This may be the correct result of the simulation, however if you
[01:28:52]   often see other project units terminating early like this
[01:28:52]   too, you may wish to check the stability of your computer (issues
[01:28:52]   such as high temperature, overclocking, etc.).
[01:28:52] Going to send back what have done.
[01:28:52] logfile size: 15854
[01:28:52] - Writing 16404 bytes of core data to disk...
[01:28:52]   ... Done.
[01:28:52] - Failed to delete work/wudata_06.sas
[01:28:52] - Failed to delete work/wudata_06.xvg
[01:28:52] Warning:  check for stray files
[01:30:52] 
[01:30:52] Folding@home Core Shutdown: EARLY_UNIT_END
[01:30:52] 
[01:30:52] Folding@home Core Shutdown: EARLY_UNIT_END
[01:30:57] CoreStatus = 63 (99)
[01:30:57] + Error starting Folding@Home core.
[01:30:57] - Attempting to download new core...
[01:30:57] + Downloading new core: FahCore_a1.exe
[01:30:57] Downloading core (/~pande/Win32/x86_Deino/Core_a1.fah from www.stanford.edu)
[01:30:59] Initial: AFDE; + 10240 bytes downloaded
[01:30:59] Initial: 10F0; + 20480 bytes downloaded
[01:30:59] Initial: DB70; + 30720 bytes downloaded
[01:30:59] Initial: 865E; + 40960 bytes downloaded
[01:30:59] Initial: 8F87; + 51200 bytes downloaded
[01:30:59] Initial: C48B; + 61440 bytes downloaded
[01:30:59] Initial: 92B3; + 71680 bytes downloaded
[01:30:59] Initial: C102; + 81920 bytes downloaded
[01:30:59] Initial: 1996; + 92160 bytes downloaded
[01:30:59] Initial: BFFE; + 102400 bytes downloaded
[01:30:59] Initial: 1810; + 112640 bytes downloaded
[01:30:59] Initial: 0626; + 122880 bytes downloaded
[01:30:59] Initial: 7B53; + 133120 bytes downloaded
[01:30:59] Initial: 0441; + 143360 bytes downloaded
[01:30:59] Initial: FECE; + 153600 bytes downloaded
[01:30:59] Initial: D346; + 163840 bytes downloaded
[01:30:59] Initial: 2DE8; + 174080 bytes downloaded
[01:30:59] Initial: B3F0; + 184320 bytes downloaded
[01:30:59] Initial: 2881; + 194560 bytes downloaded
[01:30:59] Initial: 9507; + 204800 bytes downloaded
[01:30:59] Initial: 1BAF; + 215040 bytes downloaded
[01:30:59] Initial: 717C; + 225280 bytes downloaded
[01:30:59] Initial: 23FD; + 235520 bytes downloaded
[01:30:59] Initial: 915F; + 245760 bytes downloaded
[01:30:59] Initial: CE52; + 256000 bytes downloaded
[01:30:59] Initial: ED88; + 266240 bytes downloaded
[01:30:59] Initial: 2579; + 276480 bytes downloaded
[01:30:59] Initial: 3396; + 286720 bytes downloaded
[01:30:59] Initial: 410C; + 296960 bytes downloaded
[01:30:59] Initial: 56D1; + 307200 bytes downloaded
[01:30:59] Initial: 1EBD; + 317440 bytes downloaded
[01:30:59] Initial: 6AD9; + 327680 bytes downloaded
[01:30:59] Initial: F931; + 337920 bytes downloaded
[01:30:59] Initial: 1C40; + 348160 bytes downloaded
[01:30:59] Initial: C4AE; + 358400 bytes downloaded
[01:30:59] Initial: 57E4; + 368640 bytes downloaded
[01:30:59] Initial: 1843; + 378880 bytes downloaded
[01:30:59] Initial: B0C0; + 389120 bytes downloaded
[01:30:59] Initial: AAAA; + 399360 bytes downloaded
[01:30:59] Initial: D737; + 409600 bytes downloaded
[01:30:59] Initial: 762A; + 419840 bytes downloaded
[01:30:59] Initial: 8685; + 430080 bytes downloaded
[01:30:59] Initial: 25B1; + 440320 bytes downloaded
[01:30:59] Initial: 44F1; + 450560 bytes downloaded
[01:30:59] Initial: EF81; + 460800 bytes downloaded
[01:30:59] Initial: 900E; + 471040 bytes downloaded
[01:30:59] Initial: 906E; + 481280 bytes downloaded
[01:30:59] Initial: D59F; + 491520 bytes downloaded
[01:30:59] Initial: 2406; + 501760 bytes downloaded
[01:30:59] Initial: 9777; + 512000 bytes downloaded
[01:30:59] Initial: 7783; + 522240 bytes downloaded
[01:30:59] Initial: AEC5; + 532480 bytes downloaded
[01:30:59] Initial: B8A1; + 542720 bytes downloaded
[01:30:59] Initial: D50E; + 552960 bytes downloaded
[01:30:59] Initial: BDEE; + 563200 bytes downloaded
[01:30:59] Initial: E433; + 573440 bytes downloaded
[01:30:59] Initial: 667A; + 583680 bytes downloaded
[01:30:59] Initial: C413; + 593920 bytes downloaded
[01:30:59] Initial: DB64; + 604160 bytes downloaded
[01:30:59] Initial: 313C; + 614400 bytes downloaded
[01:30:59] Initial: 4B8A; + 624640 bytes downloaded
[01:30:59] Initial: 1B3A; + 634880 bytes downloaded
[01:30:59] Initial: E39B; + 645120 bytes downloaded
[01:30:59] Initial: F9FD; + 655360 bytes downloaded
[01:30:59] Initial: BFF6; + 665600 bytes downloaded
[01:30:59] Initial: 0552; + 675840 bytes downloaded
[01:30:59] Initial: 14A7; + 686080 bytes downloaded
[01:30:59] Initial: 99A6; + 696320 bytes downloaded
[01:30:59] Initial: 06B2; + 706560 bytes downloaded
[01:30:59] Initial: 445D; + 716800 bytes downloaded
[01:30:59] Initial: 62C1; + 727040 bytes downloaded
[01:30:59] Initial: 0E27; + 737280 bytes downloaded
[01:30:59] Initial: EF9A; + 747520 bytes downloaded
[01:30:59] Initial: C105; + 757760 bytes downloaded
[01:30:59] Initial: 46D3; + 768000 bytes downloaded
[01:30:59] Initial: 33C7; + 778240 bytes downloaded
[01:30:59] Initial: 7E92; + 788480 bytes downloaded
[01:30:59] Initial: 24B3; + 795847 bytes downloaded
[01:30:59] Verifying core Core_a1.fah...
[01:30:59] Signature is VALID
[01:30:59] 
[01:30:59] Trying to unzip core FahCore_a1.exe
[01:30:59] Decompressed FahCore_a1.exe (2117632 bytes) successfully
[01:31:04] + Core successfully engaged
[01:31:09] 
[01:31:09] + Processing work unit
[01:31:09] Work type a1 not eligible for variable processors
[01:31:09] Core required: FahCore_a1.exe
[01:31:09] Core found.
[01:31:09] Working on queue slot 06 [September 3 01:31:09 UTC]
[01:31:09] + Working ...
[01:31:09] - Calling 'mpiexec -np 4 -channel shm -env MPICH_USE_SMP_OPTIMIZATIONS 1 -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 06 -checkpoint 3 -service -verbose -lifeline 3276 -version 622'

[01:31:13] 
[01:31:13] *------------------------------*
[01:31:13] Folding@Home Gromacs SMP Core
[01:31:13] Version 1.76 (February 23, 2008)
[01:31:13] 
[01:31:13] Preparing to commence simulation
[01:31:13] - Looking at optimizations...
[01:31:13] - Created dyn
[01:31:13] - Files status OK
[01:31:13] 
[01:31:13] Folding@home Core Shutdown: MISSING_WORK_FILES
[01:31:13] Finalizing output
[01:33:16] CoreStatus = 1 (1)
[01:33:16] Client-core communications error: ERROR 0x1
[01:33:16] Deleting current work unit & continuing...
[01:35:40] - Warning: Could not delete all work unit files (6): Core returned invalid code
[01:35:40] Trying to send all finished work units
[01:35:40] + No unsent completed units remaining.
[01:35:40] - Preparing to get new work unit...
[01:35:40] + Attempting to get work packet
[01:35:40] - Will indicate memory of 3581 MB
[01:35:40] - Connecting to assignment server
[01:35:40] Connecting to http://assign.stanford.edu:8080/
[01:35:40] Posted data.
[01:35:40] Initial: 40AB; - Successful: assigned to (171.64.65.64).
[01:35:40] + News From Folding@Home: Welcome to Folding@Home
[01:35:40] Loaded queue successfully.
[01:35:40] Connecting to http://171.64.65.64:8080/
[01:35:49] Posted data.
[01:35:49] Initial: 0000; - Receiving payload (expected size: 4737950)
[01:35:57] - Downloaded at ~578 kB/s
[01:35:57] - Averaged speed for that direction ~562 kB/s
[01:35:57] + Received work.
[01:35:57] + Closed connections
[01:36:02] 
[01:36:02] + Processing work unit
[01:36:02] Work type a1 not eligible for variable processors
[01:36:02] Core required: FahCore_a1.exe
[01:36:02] Core found.
[01:36:02] Working on queue slot 07 [September 3 01:36:02 UTC]
[01:36:02] + Working ...
[01:36:02] - Calling 'mpiexec -np 4 -channel shm -env MPICH_USE_SMP_OPTIMIZATIONS 1 -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 07 -checkpoint 3 -service -verbose -lifeline 3276 -version 622'

[01:36:05] 
[01:36:05] *------------------------------*
[01:36:05] Folding@Home Gromacs SMP Core
[01:36:05] Version 1.76 (February 23, 2008)
[01:36:05] 
[01:36:05] Preparing to commence simulation
[01:36:05] - Ensuring status. Please wait.
[01:36:22] - Looking at optimizations...
[01:36:22] - Working with standard loops on this execution.
[01:36:22] - Previous termination of core was improper.
[01:36:22] - Going to use standard loops.
[01:36:22] - Files status OK
[01:36:37] Starting from initial work packet
[01:36:37] 
[01:36:37] Project: 2665 (Run 0, Clone 628, Gen 42)
[01:36:37] 
[01:36:37] 65 (Run 0, Clone 628, Gen 42)
[01:36:37] 
[01:36:38] 65 (Run 0, Clone 628, Gen 42)
[01:36:38] 
[01:36:39] Entering M.D.
[01:36:51] 
[01:36:51] cal files
[01:36:51] G in water
[01:36:51] Writing local files
[01:36:52] Extra SSE boost OK.
[01:36:59] 0000 steps  (0 percent)
[01:40:00] Timered checkpoint triggered.
[01:43:00] Timered checkpoint triggered.
[01:46:00] Timered checkpoint triggered.
[01:49:00] Timered checkpoint triggered.
[01:51:26] Writing local files
[01:51:26] Completed 2500 out of 250000 steps  (1 percent)
[01:54:27] Timered checkpoint triggered.
[01:57:27] Timered checkpoint triggered.
[02:00:27] Timered checkpoint triggered.
[02:03:27] Timered checkpoint triggered.
[02:05:48] Writing local files
[02:05:48] Completed 5000 out of 250000 steps  (2 percent)
[02:08:48] Timered checkpoint triggered.
[02:11:48] Timered checkpoint triggered.
[02:14:48] Timered checkpoint triggered.
[02:17:48] Timered checkpoint triggered.
[02:20:10] Writing local files
[02:20:10] Completed 7500 out of 250000 steps  (3 percent)
[02:23:11] Timered checkpoint triggered.
[02:25:50] Warning:  long 1-4 interactions
[02:25:51] Quit 101 - NaN detected: (ener[0])
[02:25:51] 
[02:25:51] Simulation instability has been encountered. The run has entered a
[02:25:51]   state from which no further progress can be made.
[02:25:51] This may be the correct result of the simulation, however if you
[02:25:51]   often see other project units terminating early like this
[02:25:51]   too, you may wish to check the stability of your computer (issues
[02:25:51]   such as high temperature, overclocking, etc.).
[02:25:51] Going to send back what have done.
[02:25:51] logfile size: 15854
[02:25:51] - Writing 16403 bytes of core data to disk...
[02:25:51]   ... Done.
[02:25:51] - Failed to delete work/wudata_07.chk
[02:25:51] - Failed to delete work/wudata_07.pdo
[02:25:51] Warning:  check for stray files
[02:27:52] 
[02:27:52] Folding@home Core Shutdown: EARLY_UNIT_END
[02:27:52] 
[02:27:52] Folding@home Core Shutdown: EARLY_UNIT_END
[02:27:54] CoreStatus = 63 (99)
[02:27:54] + Error starting Folding@Home core.
[02:27:59] 
[02:27:59] + Processing work unit
[02:27:59] Work type a1 not eligible for variable processors
[02:27:59] Core required: FahCore_a1.exe
[02:27:59] Core found.
[02:27:59] Working on queue slot 07 [September 3 02:27:59 UTC]
[02:27:59] + Working ...
[02:27:59] - Calling 'mpiexec -np 4 -channel shm -env MPICH_USE_SMP_OPTIMIZATIONS 1 -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 07 -checkpoint 3 -service -verbose -lifeline 3276 -version 622'

[02:28:02] 
[02:28:02] *------------------------------*
[02:28:02] Folding@Home Gromacs SMP Core
[02:28:02] Version 1.76 (February 23, 2008)
[02:28:02] 
[02:28:02] Preparing to commence simulation
[02:28:02] - Ensuring status. Please wait.
[02:28:03] Created dyn
[02:28:03] - Files status OK
[02:28:03] 
[02:28:03] Folding@home Core Shutdown: MISSING_WORK_FILES
[02:28:03] Finalizing output
[02:28:20] ation of core was improper.
[02:28:20] - Going to use standard loops.
[02:28:20] - Files status OK
[02:30:19] SSING_WORK_FILES
[02:30:19] Finalizing output
[02:30:20] G_WORK_FILES
[02:30:20] Finalizing output
[02:30:24] CoreStatus = 1 (1)
[02:30:24] Client-core communications error: ERROR 0x1
[02:30:24] - Attempting to download new core...
[02:30:24] + Downloading new core: FahCore_a1.exe
[02:30:24] Downloading core (/~pande/Win32/x86_Deino/Core_a1.fah from www.stanford.edu)
[02:30:26] Initial: AFDE; + 10240 bytes downloaded
[02:30:26] Initial: 10F0; + 20480 bytes downloaded
[02:30:26] Initial: DB70; + 30720 bytes downloaded
[02:30:26] Initial: 865E; + 40960 bytes downloaded
[02:30:26] Initial: 8F87; + 51200 bytes downloaded
[02:30:26] Initial: C48B; + 61440 bytes downloaded
[02:30:26] Initial: 92B3; + 71680 bytes downloaded
[02:30:26] Initial: C102; + 81920 bytes downloaded
[02:30:26] Initial: 1996; + 92160 bytes downloaded
[02:30:26] Initial: BFFE; + 102400 bytes downloaded
[02:30:26] Initial: 1810; + 112640 bytes downloaded
[02:30:26] Initial: 0626; + 122880 bytes downloaded
[02:30:26] Initial: 7B53; + 133120 bytes downloaded
[02:30:26] Initial: 0441; + 143360 bytes downloaded
[02:30:26] Initial: FECE; + 153600 bytes downloaded
[02:30:26] Initial: D346; + 163840 bytes downloaded
[02:30:26] Initial: 2DE8; + 174080 bytes downloaded
[02:30:26] Initial: B3F0; + 184320 bytes downloaded
[02:30:26] Initial: 2881; + 194560 bytes downloaded
[02:30:26] Initial: 9507; + 204800 bytes downloaded
[02:30:26] Initial: 1BAF; + 215040 bytes downloaded
[02:30:26] Initial: 717C; + 225280 bytes downloaded
[02:30:26] Initial: 23FD; + 235520 bytes downloaded
[02:30:26] Initial: 915F; + 245760 bytes downloaded
[02:30:26] Initial: CE52; + 256000 bytes downloaded
[02:30:26] Initial: ED88; + 266240 bytes downloaded
[02:30:26] Initial: 2579; + 276480 bytes downloaded
[02:30:26] Initial: 3396; + 286720 bytes downloaded
[02:30:26] Initial: 410C; + 296960 bytes downloaded
[02:30:26] Initial: 56D1; + 307200 bytes downloaded
[02:30:26] Initial: 1EBD; + 317440 bytes downloaded
[02:30:26] Initial: 6AD9; + 327680 bytes downloaded
[02:30:26] Initial: F931; + 337920 bytes downloaded
[02:30:26] Initial: 1C40; + 348160 bytes downloaded
[02:30:26] Initial: C4AE; + 358400 bytes downloaded
[02:30:26] Initial: 57E4; + 368640 bytes downloaded
[02:30:26] Initial: 1843; + 378880 bytes downloaded
[02:30:26] Initial: B0C0; + 389120 bytes downloaded
[02:30:26] Initial: AAAA; + 399360 bytes downloaded
[02:30:26] Initial: D737; + 409600 bytes downloaded
[02:30:26] Initial: 762A; + 419840 bytes downloaded
[02:30:26] Initial: 8685; + 430080 bytes downloaded
[02:30:26] Initial: 25B1; + 440320 bytes downloaded
[02:30:26] Initial: 44F1; + 450560 bytes downloaded
[02:30:26] Initial: EF81; + 460800 bytes downloaded
[02:30:26] Initial: 900E; + 471040 bytes downloaded
[02:30:26] Initial: 906E; + 481280 bytes downloaded
[02:30:26] Initial: D59F; + 491520 bytes downloaded
[02:30:26] Initial: 2406; + 501760 bytes downloaded
[02:30:26] Initial: 9777; + 512000 bytes downloaded
[02:30:26] Initial: 7783; + 522240 bytes downloaded
[02:30:26] Initial: AEC5; + 532480 bytes downloaded
[02:30:26] Initial: B8A1; + 542720 bytes downloaded
[02:30:26] Initial: D50E; + 552960 bytes downloaded
[02:30:26] Initial: BDEE; + 563200 bytes downloaded
[02:30:26] Initial: E433; + 573440 bytes downloaded
[02:30:26] Initial: 667A; + 583680 bytes downloaded
[02:30:26] Initial: C413; + 593920 bytes downloaded
[02:30:26] Initial: DB64; + 604160 bytes downloaded
[02:30:26] Initial: 313C; + 614400 bytes downloaded
[02:30:26] Initial: 4B8A; + 624640 bytes downloaded
[02:30:26] Initial: 1B3A; + 634880 bytes downloaded
[02:30:26] Initial: E39B; + 645120 bytes downloaded
[02:30:26] Initial: F9FD; + 655360 bytes downloaded
[02:30:26] Initial: BFF6; + 665600 bytes downloaded
[02:30:26] Initial: 0552; + 675840 bytes downloaded
[02:30:26] Initial: 14A7; + 686080 bytes downloaded
[02:30:26] Initial: 99A6; + 696320 bytes downloaded
[02:30:26] Initial: 06B2; + 706560 bytes downloaded
[02:30:26] Initial: 445D; + 716800 bytes downloaded
[02:30:26] Initial: 62C1; + 727040 bytes downloaded
[02:30:26] Initial: 0E27; + 737280 bytes downloaded
[02:30:26] Initial: EF9A; + 747520 bytes downloaded
[02:30:26] Initial: C105; + 757760 bytes downloaded
[02:30:26] Initial: 46D3; + 768000 bytes downloaded
[02:30:26] Initial: 33C7; + 778240 bytes downloaded
[02:30:26] Initial: 7E92; + 788480 bytes downloaded
[02:30:26] Initial: 24B3; + 795847 bytes downloaded
[02:30:26] Verifying core Core_a1.fah...
[02:30:26] Signature is VALID
[02:30:26] 
[02:30:26] Trying to unzip core FahCore_a1.exe
[02:30:27] Decompressed FahCore_a1.exe (2117632 bytes) successfully
[02:30:32] + Core successfully engaged
[02:30:32] Deleting current work unit & continuing...
[02:32:56] - Warning: Could not delete all work unit files (7): Core returned invalid code
[02:32:56] Trying to send all finished work units
[02:32:56] + No unsent completed units remaining.
[02:32:56] - Preparing to get new work unit...
[02:32:56] + Attempting to get work packet
[02:32:56] - Will indicate memory of 3581 MB
[02:32:56] - Connecting to assignment server
[02:32:56] Connecting to http://assign.stanford.edu:8080/
[02:32:56] Posted data.
[02:32:56] Initial: 40AB; - Successful: assigned to (171.64.65.64).
[02:32:56] + News From Folding@Home: Welcome to Folding@Home
[02:32:56] Loaded queue successfully.
[02:32:56] Connecting to http://171.64.65.64:8080/
[02:32:57] Posted data.
[02:32:57] Initial: 0000; - Error: Bad packet type from server, expected work assignment
[02:32:57] - Attempt #1  to get work failed, and no other work to do.
Waiting before retry.
[02:33:12] + Attempting to get work packet
[02:33:12] - Will indicate memory of 3581 MB
[02:33:12] - Connecting to assignment server
[02:33:12] Connecting to http://assign.stanford.edu:8080/
[02:33:12] Posted data.
[02:33:12] Initial: 40AB; - Successful: assigned to (171.64.65.64).
[02:33:12] + News From Folding@Home: Welcome to Folding@Home
[02:33:12] Loaded queue successfully.
[02:33:12] Connecting to http://171.64.65.64:8080/
[02:33:20] Posted data.
[02:33:20] Initial: 0000; - Receiving payload (expected size: 4645414)
[02:33:28] - Downloaded at ~567 kB/s
[02:33:28] - Averaged speed for that direction ~563 kB/s
[02:33:28] + Received work.
[02:33:28] + Closed connections
[02:33:33] 
[02:33:33] + Processing work unit
[02:33:33] Work type a1 not eligible for variable processors
[02:33:33] Core required: FahCore_a1.exe
[02:33:33] Core found.
[02:33:33] Working on queue slot 08 [September 3 02:33:33 UTC]
[02:33:33] + Working ...
[02:33:33] - Calling 'mpiexec -np 4 -channel shm -env MPICH_USE_SMP_OPTIMIZATIONS 1 -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 08 -checkpoint 3 -service -verbose -lifeline 3276 -version 622'

[02:33:36] 

Re: Project: 2665 (Run 0, Clone 628, Gen 42)

Posted: Wed Sep 03, 2008 3:09 pm
by toTOW
Probably a bad WU ... I see 5 reports for partial credit in the DB :(

Re: Project: 2665 (Run 0, Clone 628, Gen 42)

Posted: Mon Oct 06, 2008 3:05 am
by Zagen30
I got this particular WU today, and it fails at the exact same place as TC:

Code:

[01:12:15] Project: 2665 (Run 0, Clone 628, Gen 42)
[01:12:15] 
[01:12:31] Entering M.D.
[01:12:41] Calling FAH init
[01:12:44] ater
[01:12:44] Writing local files
[01:12:44] rom checkpoint)
[01:12:44] Read checkpoint
[01:12:44] Protein: HGG in water
[01:12:44] Writing local files
[01:12:55] Extra SSE boost OK.
[01:12:56] Writing local files
[01:12:56] Completed 0 out of 250000 steps  (0 percent)
[01:44:21] Writing local files
[01:44:22] Completed 2500 out of 250000 steps  (1 percent)
[02:13:09] Writing local files
[02:13:10] Completed 5000 out of 250000 steps  (2 percent)
[02:42:01] Writing local files
[02:42:02] Completed 7500 out of 250000 steps  (3 percent)
[02:53:25] Warning:  long 1-4 interactions
[02:53:27] , overclocking, etc.).
[02:53:27] Going to send back what have done.
[02:53:27] logfile size: 25276
[02:53:27] - Writing 25826 bytes of core data to disk...
[02:53:27]   ... Done.
[02:53:27] - Failed to delete work/wudata_00.arc
[02:53:27] Warning:  check for stray files
[02:53:27] 0.xtc
[02:53:27] - Failed to delete work/wudata_00.bed
[02:53:27] - Failed to delete work/wudata_00.sas
[02:53:27] - Failed to delete work/wudata_00.goe
[02:53:27] Warning:  check for stray files
[02:53:27]  high temperature, overclocking, etc.).
[02:53:27] Going to send back what have done.
[02:53:27] logfile size: 25276
[02:53:27] - Writing 25826 bytes of core data to disk...
[02:53:27]   ... Done.
[02:53:27] - Failed to delete work/wudata_00.arc
[02:53:27] Warning:  check for stray files
[02:55:27] 
[02:55:27] Folding@home Core Shutdown: EARLY_UNIT_END
[02:55:27] 
[02:55:27] Folding@home Core Shutdown: EARLY_UNIT_END
[02:55:30] CoreStatus = 7B (123)
[02:55:30] Client-core communications error: ERROR 0x7b
[02:55:30] This is a sign of more serious problems, shutting down.
I'm on my third go-around with it now, and hopefully after it EUEs at 3% again I'll get a new WU. Perhaps this one should be removed from the servers...
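
For anyone who wants to check whether their own log shows the same failure point, here is a minimal sketch that scans client log lines and reports, for each early unit end, which work unit it was and the last "Completed N" step logged before it. The function and type names (`find_early_ends`, `EarlyEnd`) are made up for illustration, and the patterns are based only on the log lines quoted in this thread, so other client or core versions may word things differently.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Patterns taken from the log excerpts in this thread (an assumption:
# other client/core versions may format these lines differently).
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
STEPS_RE = re.compile(r"Completed (\d+) out of (\d+) steps")
EUE_MARKER = "Core Shutdown: EARLY_UNIT_END"


class EarlyEnd(NamedTuple):
    work_unit: str             # "project/run/clone/gen", e.g. "2665/0/628/42"
    last_step: Optional[int]   # last "Completed N" seen before the failure


def find_early_ends(lines: Iterable[str]) -> List[EarlyEnd]:
    """Return one entry per run that ended with EARLY_UNIT_END."""
    results: List[EarlyEnd] = []
    work_unit: Optional[str] = None
    last_step: Optional[int] = None
    for line in lines:
        match = PROJECT_RE.search(line)
        if match:
            # A Project line marks the start of a run; forget older progress.
            work_unit = "/".join(match.groups())
            last_step = None
            continue
        match = STEPS_RE.search(line)
        if match:
            last_step = int(match.group(1))
            continue
        if EUE_MARKER in line and work_unit is not None:
            results.append(EarlyEnd(work_unit, last_step))
            # The shutdown line is often printed twice; resetting the state
            # keeps the duplicate from being counted as a second failure.
            work_unit, last_step = None, None
    return results
```

Feeding it the lines of a log file (for example `find_early_ends(open(path, errors="replace"))`, where `path` points at your own log) should list each EUE together with the step count reached. Garbled lines, like the truncated ones above, are simply skipped.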

Re: Project: 2665 (Run 0, Clone 628, Gen 42)

Posted: Mon Oct 06, 2008 9:11 am
by toTOW
You can use qfix to send the partial result and avoid processing the same WU multiple times (once the server gets your partial results, it won't assign this WU to you again).

Re: Project: 2665 (Run 0, Clone 628, Gen 42)

Posted: Mon Oct 06, 2008 3:46 pm
by Zagen30
Not sure how much use qfix was, but it sent the results back after the third EUE.

Re: Project: 2665 (Run 0, Clone 628, Gen 42)

Posted: Tue Oct 07, 2008 1:19 am
by kittle
Similar error here.
I just deleted the work folder, queue.dat and unitinfo.txt, and got a new WU.

Code:

Launch directory: C:\Program Files (x86)\Folding@Home Windows SMP Client V1.01
Executable: [email protected]
Arguments: -local -smp

[15:55:41] - Ask before connecting: No
[15:55:41] - User name: kittle (Team 31574)
[15:55:41] - User ID: 26FE9FB465D0D80D
[15:55:41] - Machine ID: 2
[15:55:41]
[15:55:42] Loaded queue successfully.
[15:55:42] - Preparing to get new work unit...
[15:55:42] + Attempting to get work packet
[15:55:42] - Connecting to assignment server
[15:55:42] - Successful: assigned to (171.64.65.64).
[15:55:42] + News From Folding@Home: Welcome to Folding@Home
[15:55:42] Loaded queue successfully.
[15:55:56] + Closed connections
[15:55:56]
[15:55:56] + Processing work unit
[15:55:56] Work type a1 not eligible for variable processors
[15:55:56] Core required: FahCore_a1.exe
[15:55:56] Core found.
[15:55:56] Using generic mpiexec calls
[15:55:56] Working on queue slot 02 [October 6 15:55:56 UTC]
[15:55:56] + Working ...
[15:55:57]
[15:55:57] *------------------------------*
[15:55:57] Folding@Home Gromacs SMP Core
[15:55:57] Version 1.74 (March 10, 2007)
[15:55:57]
[15:55:57] Preparing to commence simulation
[15:55:57] - Looking at optimizations...
[15:55:57] - Created dyn
[15:55:57] - Files status OK
[15:56:18] - Expanded 4737438 -> 24426905 (decompressed 515.6 percent)
[15:56:18] - Starting from initial work packet
[15:56:18]
[15:56:18] Project: 2665 (Run 0, Clone 628, Gen 42)
[15:56:18]
[15:56:19] Assembly optimizations on if available.
[15:56:19] Entering M.D.
[15:56:27] Rejecting checkpoint
[15:56:30]
[15:56:30] Writing local files
[15:56:30]
[15:56:30] Writing local files
[15:56:42] Extra SSE boost OK.
[15:56:43] Writing local files
[15:56:43] Completed 0 out of 250000 steps  (0 percent)
[16:08:59] Warning:  long 1-4 interactions
[16:09:00] ich no further progress can be made.
[16:09:00] This may be the correct result of the simulation, however if you
[16:09:00]   often see other project units terminating early like this
[16:09:00]   too, you may wish to check the stability of your computer (issues
[16:09:00]   such as high temperature, overclocking, etc.).
[16:09:00] Going to send back what have done.
[16:09:00] logfile size: 9883
[16:09:00] - Writing 10432 bytes of core data to disk...
[16:09:00]   ... Done.
[16:09:00] - Failed to delete work/wudata_02.arc
[16:09:00] - Failed to delete work/wudata_02.xtc
[16:09:00] No C.P. to delete.
[16:09:00] Warning:  check for stray files
[16:09:00] 2.dyn
[16:09:00] - Failed to delete work/wudata_02.bed
[16:09:00] - Fa- Failed to delete work/wudata_02.- FaWarning:  check for stray files
[16:09:00] e
[16:09:00] Warning:  check for stray files
[16:09:00]
[16:09:00] Folding@home Core Shutdown: EARLY_UNIT_END
[16:09:00] Finalizing output
[16:11:03] CoreStatus = 7B (123)
[16:11:03] Client-core communications error: ERROR 0x7b
[16:11:03] This is a sign of more serious problems, shutting down.
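
The manual cleanup described in this post (removing the work folder, the queue file and unitinfo.txt so the client fetches a fresh WU) could be scripted roughly as below. This is only a sketch under assumptions: the helper name `reset_client_state` is hypothetical, the default entry names assume the queue file is called `queue.dat`, and it defaults to a dry run that only lists what would be removed. It presumably should only be run with the client stopped, and deleting the work folder discards any partial result instead of sending it back.

```python
import shutil
from pathlib import Path
from typing import List, Sequence, Union

# Entries named in the post; "queue.dat" is an assumed spelling of the
# queue file, so adjust the tuple if your client directory differs.
STATE_ENTRIES = ("work", "queue.dat", "unitinfo.txt")


def reset_client_state(
    client_dir: Union[str, Path],
    entries: Sequence[str] = STATE_ENTRIES,
    dry_run: bool = True,
) -> List[Path]:
    """List (and, if dry_run is False, delete) the client's work state."""
    root = Path(client_dir)
    # Only consider entries that actually exist in the client directory.
    targets = [root / name for name in entries if (root / name).exists()]
    if not dry_run:
        for path in targets:
            if path.is_dir():
                shutil.rmtree(path)   # the work folder and its contents
            else:
                path.unlink()         # single files such as unitinfo.txt
    return targets
```

Calling `reset_client_state(client_dir)` first and checking the returned paths before repeating the call with `dry_run=False` keeps the destructive step explicit.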

Re: Project: 2665 (Run 0, Clone 628, Gen 42)

Posted: Tue Oct 07, 2008 7:55 am
by 7up1n3
Reports of failed 2665 WUs are popping up in our forums as well (anecdotal as I haven't run into this WU myself yet).