P11020 immediate EUE
Posted: Sat Mar 05, 2011 4:58 am
Started getting 11020 assignments last night, every one of them EUEd within a few seconds of starting:
Eventually got a 6023 and that ran OK with the same core:
Log file shows that it "ate" a few dozen units:
No good ideas on where to start troubleshooting. This box has been folding everything else with no problem for over a year (i7 860), just finished a bigadv WU.
Question for the experts: does the "number of threads" (-smp n) affect different WUs in different ways? I've run quite a few WUs over the last few weeks with "-smp 7" - could P11020 not "like" this? I don't have a unit to test with of course, since they got self-deleted.
Edit: Found this thread, but apparently "-smp 7" worked well for everyone, at least back then:
viewtopic.php?f=58&t=14423&start=75#p165234
Code:
[02:53:04] + Attempting to get work packet
[02:53:04] Passkey found
[02:53:04] - Will indicate memory of 4087 MB
[02:53:04] - Connecting to assignment server
[02:53:04] Connecting to http://assign.stanford.edu:8080/
[02:53:04] Posted data.
[02:53:04] Initial: 40AB; - Successful: assigned to (171.64.65.55).
[02:53:04] + News From Folding@Home: Welcome to Folding@Home
[02:53:05] Loaded queue successfully.
[02:53:05] Sent data
[02:53:05] Connecting to http://171.64.65.55:8080/
[02:53:05] Posted data.
[02:53:05] Initial: 0000; - Receiving payload (expected size: 659772)
[02:53:06] - Downloaded at ~644 kB/s
[02:53:06] - Averaged speed for that direction ~723 kB/s
[02:53:06] + Received work.
[02:53:06] Trying to send all finished work units
[02:53:06] + No unsent completed units remaining.
[02:53:06] + Closed connections
[02:53:11]
[02:53:11] + Processing work unit
[02:53:11] Core required: FahCore_a3.exe
[02:53:11] Core found.
[02:53:11] Working on queue slot 01 [March 5 02:53:11 UTC]
[02:53:11] + Working ...
[02:53:11] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 01 -np 7 -checkpoint 15 -verbose -lifeline 380 -version 634'
[02:53:11]
[02:53:11] *------------------------------*
[02:53:11] Folding@Home Gromacs SMP Core
[02:53:11] Version 2.27 (Dec. 15, 2010)
[02:53:11]
[02:53:11] Preparing to commence simulation
[02:53:11] - Looking at optimizations...
[02:53:11] - Created dyn
[02:53:11] - Files status OK
[02:53:11] - Expanded 659260 -> 1092080 (decompressed 165.6 percent)
[02:53:11] Called DecompressByteArray: compressed_data_size=659260 data_size=1092080, decompressed_data_size=1092080 diff=0
[02:53:11] - Digital signature verified
[02:53:11]
[02:53:11] Project: 11020 (Run 0, Clone 85, Gen 1)
[02:53:11]
[02:53:11] Assembly optimizations on if available.
[02:53:11] Entering M.D.
[02:53:17] Mapping NT from 7 to 7
[02:53:17] mdrun returned 255
[02:53:17] Going to send back what have done -- stepsTotalG=1000000
[02:53:17] Work fraction=0.0000 steps=1000000.
[02:53:21] logfile size=0 infoLength=0 edr=0 trr=25
[02:53:21] logfile size: 0 info=0 bed=0 hdr=25
[02:53:21] - Writing 643 bytes of core data to disk...
[02:53:21] Done: 131 -> 151 (compressed to 115.2 percent)
[02:53:21] ... Done.
[02:53:21]
[02:53:21] Folding@home Core Shutdown: EARLY_UNIT_END
[02:53:25] CoreStatus = 72 (114)
[02:53:25] Sending work to server
[02:53:25] Project: 11020 (Run 0, Clone 85, Gen 1)
[02:53:25] + Attempting to send results [March 5 02:53:25 UTC]
[02:53:25] - Reading file work/wuresults_01.dat from core
[02:53:25] (Read 663 bytes from disk)
[02:53:25] Connecting to http://171.64.65.55:8080/
[02:53:25] Posted data.
[02:53:25] Initial: 0000; Conversation time very short, giving reduced weight in bandwidth avg
[02:53:25] - Uploaded at ~3 kB/s
[02:53:25] - Averaged speed for that direction ~6 kB/s
[02:53:25] + Results successfully sent
[02:53:25] Thank you for your contribution to Folding@Home.
[02:53:29] Trying to send all finished work units
[02:53:29] + No unsent completed units remaining.
[02:53:29] - Preparing to get new work unit...
[02:53:29] Cleaning up work directory
Code:
[02:53:29] + Attempting to get work packet
[02:53:29] Passkey found
[02:53:29] - Will indicate memory of 4087 MB
[02:53:29] - Connecting to assignment server
[02:53:29] Connecting to http://assign.stanford.edu:8080/
[02:53:29] Posted data.
[02:53:29] Initial: 40AB; - Successful: assigned to (171.64.65.54).
[02:53:29] + News From Folding@Home: Welcome to Folding@Home
[02:53:30] Loaded queue successfully.
[02:53:30] Sent data
[02:53:30] Connecting to http://171.64.65.54:8080/
[02:53:30] Posted data.
[02:53:30] Initial: 0000; - Receiving payload (expected size: 1767336)
[02:53:32] - Downloaded at ~862 kB/s
[02:53:32] - Averaged speed for that direction ~751 kB/s
[02:53:32] + Received work.
[02:53:32] Trying to send all finished work units
[02:53:32] + No unsent completed units remaining.
[02:53:32] + Closed connections
[02:53:37]
[02:53:37] + Processing work unit
[02:53:37] Core required: FahCore_a3.exe
[02:53:37] Core found.
[02:53:37] Working on queue slot 02 [March 5 02:53:37 UTC]
[02:53:37] + Working ...
[02:53:37] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 02 -np 7 -checkpoint 15 -verbose -lifeline 380 -version 634'
[02:53:37]
[02:53:37] *------------------------------*
[02:53:37] Folding@Home Gromacs SMP Core
[02:53:37] Version 2.27 (Dec. 15, 2010)
[02:53:37]
[02:53:37] Preparing to commence simulation
[02:53:37] - Looking at optimizations...
[02:53:37] - Created dyn
[02:53:37] - Files status OK
[02:53:37] - Expanded 1766824 -> 1967109 (decompressed 111.3 percent)
[02:53:37] Called DecompressByteArray: compressed_data_size=1766824 data_size=1967109, decompressed_data_size=1967109 diff=0
[02:53:37] - Digital signature verified
[02:53:37]
[02:53:37] Project: 6023 (Run 0, Clone 6, Gen 479)
[02:53:37]
[02:53:37] Assembly optimizations on if available.
[02:53:37] Entering M.D.
[02:53:43] Mapping NT from 7 to 7
[02:53:43] Completed 0 out of 500000 steps (0%)
[02:57:47] Completed 5000 out of 500000 steps (1%)
[03:01:51] Completed 10000 out of 500000 steps (2%)
[03:05:55] Completed 15000 out of 500000 steps (3%)
Code:
[19:44:54] Project: 11020 (Run 0, Clone 27, Gen 0)
[19:45:04] Folding@home Core Shutdown: EARLY_UNIT_END
[19:45:08] Project: 11020 (Run 0, Clone 27, Gen 0)
[19:45:18] Project: 11020 (Run 0, Clone 49, Gen 0)
[19:45:29] Folding@home Core Shutdown: EARLY_UNIT_END
[19:45:33] Project: 11020 (Run 0, Clone 49, Gen 0)
[02:41:39] Project: 11020 (Run 0, Clone 154, Gen 0)
[02:41:49] Folding@home Core Shutdown: EARLY_UNIT_END
[02:41:53] Project: 11020 (Run 0, Clone 154, Gen 0)
[02:42:04] Project: 11020 (Run 0, Clone 155, Gen 0)
[02:42:14] Folding@home Core Shutdown: EARLY_UNIT_END
[02:42:18] Project: 11020 (Run 0, Clone 155, Gen 0)
[02:42:29] Project: 11020 (Run 0, Clone 151, Gen 0)
[02:42:39] Folding@home Core Shutdown: EARLY_UNIT_END
[02:42:43] Project: 11020 (Run 0, Clone 151, Gen 0)
[02:42:53] Project: 11020 (Run 0, Clone 152, Gen 0)
[02:43:04] Folding@home Core Shutdown: EARLY_UNIT_END
[02:43:07] Project: 11020 (Run 0, Clone 152, Gen 0)
[02:43:18] Project: 11020 (Run 0, Clone 157, Gen 0)
[02:43:29] Folding@home Core Shutdown: EARLY_UNIT_END
[02:43:32] Project: 11020 (Run 0, Clone 157, Gen 0)
[02:43:43] Project: 11020 (Run 0, Clone 158, Gen 0)
[02:43:53] Folding@home Core Shutdown: EARLY_UNIT_END
[02:43:57] Project: 11020 (Run 0, Clone 158, Gen 0)
[02:44:09] Project: 11020 (Run 0, Clone 156, Gen 0)
[02:44:19] Folding@home Core Shutdown: EARLY_UNIT_END
[02:44:23] Project: 11020 (Run 0, Clone 156, Gen 0)
[02:44:34] Project: 11020 (Run 0, Clone 146, Gen 0)
[02:44:44] Folding@home Core Shutdown: EARLY_UNIT_END
[02:44:48] Project: 11020 (Run 0, Clone 146, Gen 0)
[02:44:58] Project: 11020 (Run 0, Clone 147, Gen 0)
[02:45:09] Folding@home Core Shutdown: EARLY_UNIT_END
[02:45:12] Project: 11020 (Run 0, Clone 147, Gen 0)
[02:45:23] Project: 11020 (Run 0, Clone 144, Gen 0)
[02:45:33] Folding@home Core Shutdown: EARLY_UNIT_END
[02:45:37] Project: 11020 (Run 0, Clone 144, Gen 0)
[02:45:47] Project: 11020 (Run 0, Clone 145, Gen 0)
[02:45:58] Folding@home Core Shutdown: EARLY_UNIT_END
[02:46:01] Project: 11020 (Run 0, Clone 145, Gen 0)
[02:46:12] Project: 11020 (Run 0, Clone 159, Gen 0)
[02:46:22] Folding@home Core Shutdown: EARLY_UNIT_END
[02:46:26] Project: 11020 (Run 0, Clone 159, Gen 0)
[02:46:36] Project: 11020 (Run 0, Clone 143, Gen 0)
[02:46:47] Folding@home Core Shutdown: EARLY_UNIT_END
[02:46:51] Project: 11020 (Run 0, Clone 143, Gen 0)
[02:47:01] Project: 11020 (Run 0, Clone 160, Gen 0)
[02:47:12] Folding@home Core Shutdown: EARLY_UNIT_END
[02:47:15] Project: 11020 (Run 0, Clone 160, Gen 0)
[02:47:26] Project: 11020 (Run 0, Clone 352, Gen 1)
[02:47:36] Folding@home Core Shutdown: EARLY_UNIT_END
[02:47:40] Project: 11020 (Run 0, Clone 352, Gen 1)
[02:47:51] Project: 11020 (Run 0, Clone 161, Gen 0)
[02:48:01] Folding@home Core Shutdown: EARLY_UNIT_END
[02:48:05] Project: 11020 (Run 0, Clone 161, Gen 0)
[02:48:15] Project: 11020 (Run 0, Clone 162, Gen 0)
[02:48:26] Folding@home Core Shutdown: EARLY_UNIT_END
[02:48:29] Project: 11020 (Run 0, Clone 162, Gen 0)
[02:48:40] Project: 11020 (Run 0, Clone 163, Gen 0)
[02:48:50] Folding@home Core Shutdown: EARLY_UNIT_END
[02:48:54] Project: 11020 (Run 0, Clone 163, Gen 0)
[02:49:04] Project: 11020 (Run 0, Clone 407, Gen 1)
[02:49:15] Folding@home Core Shutdown: EARLY_UNIT_END
[02:49:19] Project: 11020 (Run 0, Clone 407, Gen 1)
[02:49:29] Project: 11020 (Run 0, Clone 164, Gen 0)
[02:49:40] Folding@home Core Shutdown: EARLY_UNIT_END
[02:49:43] Project: 11020 (Run 0, Clone 164, Gen 0)
[02:49:54] Project: 11020 (Run 0, Clone 165, Gen 0)
[02:50:04] Folding@home Core Shutdown: EARLY_UNIT_END
[02:50:08] Project: 11020 (Run 0, Clone 165, Gen 0)
[02:50:18] Project: 11020 (Run 0, Clone 167, Gen 0)
[02:50:29] Folding@home Core Shutdown: EARLY_UNIT_END
[02:50:32] Project: 11020 (Run 0, Clone 167, Gen 0)
[02:50:43] Project: 11020 (Run 0, Clone 166, Gen 0)
[02:50:53] Folding@home Core Shutdown: EARLY_UNIT_END
[02:50:57] Project: 11020 (Run 0, Clone 166, Gen 0)
[02:51:07] Project: 11020 (Run 0, Clone 168, Gen 0)
[02:51:18] Folding@home Core Shutdown: EARLY_UNIT_END
[02:51:21] Project: 11020 (Run 0, Clone 168, Gen 0)
[02:51:32] Project: 11020 (Run 0, Clone 170, Gen 0)
[02:51:42] Folding@home Core Shutdown: EARLY_UNIT_END
[02:51:46] Project: 11020 (Run 0, Clone 170, Gen 0)
[02:51:56] Project: 11020 (Run 0, Clone 171, Gen 0)
[02:52:07] Folding@home Core Shutdown: EARLY_UNIT_END
[02:52:11] Project: 11020 (Run 0, Clone 171, Gen 0)
[02:52:21] Project: 11020 (Run 0, Clone 172, Gen 1)
[02:52:32] Folding@home Core Shutdown: EARLY_UNIT_END
[02:52:35] Project: 11020 (Run 0, Clone 172, Gen 1)
[02:52:46] Project: 11020 (Run 0, Clone 173, Gen 0)
[02:52:56] Folding@home Core Shutdown: EARLY_UNIT_END
[02:53:00] Project: 11020 (Run 0, Clone 173, Gen 0)
[02:53:11] Project: 11020 (Run 0, Clone 85, Gen 1)
[02:53:21] Folding@home Core Shutdown: EARLY_UNIT_END
[02:53:25] Project: 11020 (Run 0, Clone 85, Gen 1)
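For anyone who wants to tally this sort of thing without grepping by hand, here is a rough sketch that counts EARLY_UNIT_END shutdowns per project. It assumes the log lines look like the excerpts in this post ("Project: N (Run r, Clone c, Gen g)" followed later by the "Core Shutdown: EARLY_UNIT_END" line); the function name `count_eues` and the sample data are made up for illustration and are not part of the client.

```python
import re
from collections import Counter

# Line shapes assumed from the log excerpts in this post.
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
EUE_MARKER = "Folding@home Core Shutdown: EARLY_UNIT_END"


def count_eues(lines):
    """Count EARLY_UNIT_END shutdowns, keyed by the most recently seen project number."""
    counts = Counter()
    current_project = None
    for line in lines:
        match = PROJECT_RE.search(line)
        if match:
            # Remember which project the following core messages belong to.
            current_project = int(match.group(1))
        elif EUE_MARKER in line and current_project is not None:
            counts[current_project] += 1
    return counts


# Small hand-made sample in the same shape as the log above (illustrative only).
sample_log = [
    "[02:52:46] Project: 11020 (Run 0, Clone 173, Gen 0)",
    "[02:52:56] Folding@home Core Shutdown: EARLY_UNIT_END",
    "[02:53:00] Project: 11020 (Run 0, Clone 173, Gen 0)",
    "[02:53:11] Project: 11020 (Run 0, Clone 85, Gen 1)",
    "[02:53:21] Folding@home Core Shutdown: EARLY_UNIT_END",
    "[02:53:25] Project: 11020 (Run 0, Clone 85, Gen 1)",
    "[02:53:37] Project: 6023 (Run 0, Clone 6, Gen 479)",
    "[02:53:43] Completed 0 out of 500000 steps  (0%)",
]

print(dict(count_eues(sample_log)))
```

To run it against a real log, pass an open file instead of the sample list, e.g. `count_eues(open("FAHlog.txt"))`; the file name is an assumption, so adjust it to wherever your client writes its log.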