P11020 immediate EUE

Posted: Sat Mar 05, 2011 4:58 am
by vladh4x0r
Started getting P11020 assignments last night; every one of them EUE'd within a few seconds of starting:

[02:53:04] + Attempting to get work packet
[02:53:04] Passkey found
[02:53:04] - Will indicate memory of 4087 MB
[02:53:04] - Connecting to assignment server
[02:53:04] Connecting to http://assign.stanford.edu:8080/
[02:53:04] Posted data.
[02:53:04] Initial: 40AB; - Successful: assigned to (171.64.65.55).
[02:53:04] + News From Folding@Home: Welcome to Folding@Home
[02:53:05] Loaded queue successfully.
[02:53:05] Sent data
[02:53:05] Connecting to http://171.64.65.55:8080/
[02:53:05] Posted data.
[02:53:05] Initial: 0000; - Receiving payload (expected size: 659772)
[02:53:06] - Downloaded at ~644 kB/s
[02:53:06] - Averaged speed for that direction ~723 kB/s
[02:53:06] + Received work.
[02:53:06] Trying to send all finished work units
[02:53:06] + No unsent completed units remaining.
[02:53:06] + Closed connections
[02:53:11] 
[02:53:11] + Processing work unit
[02:53:11] Core required: FahCore_a3.exe
[02:53:11] Core found.
[02:53:11] Working on queue slot 01 [March 5 02:53:11 UTC]
[02:53:11] + Working ...
[02:53:11] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 01 -np 7 -checkpoint 15 -verbose -lifeline 380 -version 634'

[02:53:11] 
[02:53:11] *------------------------------*
[02:53:11] Folding@Home Gromacs SMP Core
[02:53:11] Version 2.27 (Dec. 15, 2010)
[02:53:11] 
[02:53:11] Preparing to commence simulation
[02:53:11] - Looking at optimizations...
[02:53:11] - Created dyn
[02:53:11] - Files status OK
[02:53:11] - Expanded 659260 -> 1092080 (decompressed 165.6 percent)
[02:53:11] Called DecompressByteArray: compressed_data_size=659260 data_size=1092080, decompressed_data_size=1092080 diff=0
[02:53:11] - Digital signature verified
[02:53:11] 
[02:53:11] Project: 11020 (Run 0, Clone 85, Gen 1)
[02:53:11] 
[02:53:11] Assembly optimizations on if available.
[02:53:11] Entering M.D.
[02:53:17] Mapping NT from 7 to 7 
[02:53:17] mdrun returned 255
[02:53:17] Going to send back what have done -- stepsTotalG=1000000
[02:53:17] Work fraction=0.0000 steps=1000000.
[02:53:21] logfile size=0 infoLength=0 edr=0 trr=25
[02:53:21] logfile size: 0 info=0 bed=0 hdr=25
[02:53:21] - Writing 643 bytes of core data to disk...
[02:53:21] Done: 131 -> 151 (compressed to 115.2 percent)
[02:53:21]   ... Done.
[02:53:21] 
[02:53:21] Folding@home Core Shutdown: EARLY_UNIT_END
[02:53:25] CoreStatus = 72 (114)
[02:53:25] Sending work to server
[02:53:25] Project: 11020 (Run 0, Clone 85, Gen 1)


[02:53:25] + Attempting to send results [March 5 02:53:25 UTC]
[02:53:25] - Reading file work/wuresults_01.dat from core
[02:53:25]   (Read 663 bytes from disk)
[02:53:25] Connecting to http://171.64.65.55:8080/
[02:53:25] Posted data.
[02:53:25] Initial: 0000; Conversation time very short, giving reduced weight in bandwidth avg
[02:53:25] - Uploaded at ~3 kB/s
[02:53:25] - Averaged speed for that direction ~6 kB/s
[02:53:25] + Results successfully sent
[02:53:25] Thank you for your contribution to Folding@Home.
[02:53:29] Trying to send all finished work units
[02:53:29] + No unsent completed units remaining.
[02:53:29] - Preparing to get new work unit...
[02:53:29] Cleaning up work directory
Eventually got a 6023 and that ran OK with the same core:

[02:53:29] + Attempting to get work packet
[02:53:29] Passkey found
[02:53:29] - Will indicate memory of 4087 MB
[02:53:29] - Connecting to assignment server
[02:53:29] Connecting to http://assign.stanford.edu:8080/
[02:53:29] Posted data.
[02:53:29] Initial: 40AB; - Successful: assigned to (171.64.65.54).
[02:53:29] + News From Folding@Home: Welcome to Folding@Home
[02:53:30] Loaded queue successfully.
[02:53:30] Sent data
[02:53:30] Connecting to http://171.64.65.54:8080/
[02:53:30] Posted data.
[02:53:30] Initial: 0000; - Receiving payload (expected size: 1767336)
[02:53:32] - Downloaded at ~862 kB/s
[02:53:32] - Averaged speed for that direction ~751 kB/s
[02:53:32] + Received work.
[02:53:32] Trying to send all finished work units
[02:53:32] + No unsent completed units remaining.
[02:53:32] + Closed connections
[02:53:37] 
[02:53:37] + Processing work unit
[02:53:37] Core required: FahCore_a3.exe
[02:53:37] Core found.
[02:53:37] Working on queue slot 02 [March 5 02:53:37 UTC]
[02:53:37] + Working ...
[02:53:37] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 02 -np 7 -checkpoint 15 -verbose -lifeline 380 -version 634'

[02:53:37] 
[02:53:37] *------------------------------*
[02:53:37] Folding@Home Gromacs SMP Core
[02:53:37] Version 2.27 (Dec. 15, 2010)
[02:53:37] 
[02:53:37] Preparing to commence simulation
[02:53:37] - Looking at optimizations...
[02:53:37] - Created dyn
[02:53:37] - Files status OK
[02:53:37] - Expanded 1766824 -> 1967109 (decompressed 111.3 percent)
[02:53:37] Called DecompressByteArray: compressed_data_size=1766824 data_size=1967109, decompressed_data_size=1967109 diff=0
[02:53:37] - Digital signature verified
[02:53:37] 
[02:53:37] Project: 6023 (Run 0, Clone 6, Gen 479)
[02:53:37] 
[02:53:37] Assembly optimizations on if available.
[02:53:37] Entering M.D.
[02:53:43] Mapping NT from 7 to 7 
[02:53:43] Completed 0 out of 500000 steps  (0%)
[02:57:47] Completed 5000 out of 500000 steps  (1%)
[03:01:51] Completed 10000 out of 500000 steps  (2%)
[03:05:55] Completed 15000 out of 500000 steps  (3%)
Log file shows that it "ate" a few dozen units:

[19:44:54] Project: 11020 (Run 0, Clone 27, Gen 0)
[19:45:04] Folding@home Core Shutdown: EARLY_UNIT_END
[19:45:08] Project: 11020 (Run 0, Clone 27, Gen 0)
[19:45:18] Project: 11020 (Run 0, Clone 49, Gen 0)
[19:45:29] Folding@home Core Shutdown: EARLY_UNIT_END
[19:45:33] Project: 11020 (Run 0, Clone 49, Gen 0)
[02:41:39] Project: 11020 (Run 0, Clone 154, Gen 0)
[02:41:49] Folding@home Core Shutdown: EARLY_UNIT_END
[02:41:53] Project: 11020 (Run 0, Clone 154, Gen 0)
[02:42:04] Project: 11020 (Run 0, Clone 155, Gen 0)
[02:42:14] Folding@home Core Shutdown: EARLY_UNIT_END
[02:42:18] Project: 11020 (Run 0, Clone 155, Gen 0)
[02:42:29] Project: 11020 (Run 0, Clone 151, Gen 0)
[02:42:39] Folding@home Core Shutdown: EARLY_UNIT_END
[02:42:43] Project: 11020 (Run 0, Clone 151, Gen 0)
[02:42:53] Project: 11020 (Run 0, Clone 152, Gen 0)
[02:43:04] Folding@home Core Shutdown: EARLY_UNIT_END
[02:43:07] Project: 11020 (Run 0, Clone 152, Gen 0)
[02:43:18] Project: 11020 (Run 0, Clone 157, Gen 0)
[02:43:29] Folding@home Core Shutdown: EARLY_UNIT_END
[02:43:32] Project: 11020 (Run 0, Clone 157, Gen 0)
[02:43:43] Project: 11020 (Run 0, Clone 158, Gen 0)
[02:43:53] Folding@home Core Shutdown: EARLY_UNIT_END
[02:43:57] Project: 11020 (Run 0, Clone 158, Gen 0)
[02:44:09] Project: 11020 (Run 0, Clone 156, Gen 0)
[02:44:19] Folding@home Core Shutdown: EARLY_UNIT_END
[02:44:23] Project: 11020 (Run 0, Clone 156, Gen 0)
[02:44:34] Project: 11020 (Run 0, Clone 146, Gen 0)
[02:44:44] Folding@home Core Shutdown: EARLY_UNIT_END
[02:44:48] Project: 11020 (Run 0, Clone 146, Gen 0)
[02:44:58] Project: 11020 (Run 0, Clone 147, Gen 0)
[02:45:09] Folding@home Core Shutdown: EARLY_UNIT_END
[02:45:12] Project: 11020 (Run 0, Clone 147, Gen 0)
[02:45:23] Project: 11020 (Run 0, Clone 144, Gen 0)
[02:45:33] Folding@home Core Shutdown: EARLY_UNIT_END
[02:45:37] Project: 11020 (Run 0, Clone 144, Gen 0)
[02:45:47] Project: 11020 (Run 0, Clone 145, Gen 0)
[02:45:58] Folding@home Core Shutdown: EARLY_UNIT_END
[02:46:01] Project: 11020 (Run 0, Clone 145, Gen 0)
[02:46:12] Project: 11020 (Run 0, Clone 159, Gen 0)
[02:46:22] Folding@home Core Shutdown: EARLY_UNIT_END
[02:46:26] Project: 11020 (Run 0, Clone 159, Gen 0)
[02:46:36] Project: 11020 (Run 0, Clone 143, Gen 0)
[02:46:47] Folding@home Core Shutdown: EARLY_UNIT_END
[02:46:51] Project: 11020 (Run 0, Clone 143, Gen 0)
[02:47:01] Project: 11020 (Run 0, Clone 160, Gen 0)
[02:47:12] Folding@home Core Shutdown: EARLY_UNIT_END
[02:47:15] Project: 11020 (Run 0, Clone 160, Gen 0)
[02:47:26] Project: 11020 (Run 0, Clone 352, Gen 1)
[02:47:36] Folding@home Core Shutdown: EARLY_UNIT_END
[02:47:40] Project: 11020 (Run 0, Clone 352, Gen 1)
[02:47:51] Project: 11020 (Run 0, Clone 161, Gen 0)
[02:48:01] Folding@home Core Shutdown: EARLY_UNIT_END
[02:48:05] Project: 11020 (Run 0, Clone 161, Gen 0)
[02:48:15] Project: 11020 (Run 0, Clone 162, Gen 0)
[02:48:26] Folding@home Core Shutdown: EARLY_UNIT_END
[02:48:29] Project: 11020 (Run 0, Clone 162, Gen 0)
[02:48:40] Project: 11020 (Run 0, Clone 163, Gen 0)
[02:48:50] Folding@home Core Shutdown: EARLY_UNIT_END
[02:48:54] Project: 11020 (Run 0, Clone 163, Gen 0)
[02:49:04] Project: 11020 (Run 0, Clone 407, Gen 1)
[02:49:15] Folding@home Core Shutdown: EARLY_UNIT_END
[02:49:19] Project: 11020 (Run 0, Clone 407, Gen 1)
[02:49:29] Project: 11020 (Run 0, Clone 164, Gen 0)
[02:49:40] Folding@home Core Shutdown: EARLY_UNIT_END
[02:49:43] Project: 11020 (Run 0, Clone 164, Gen 0)
[02:49:54] Project: 11020 (Run 0, Clone 165, Gen 0)
[02:50:04] Folding@home Core Shutdown: EARLY_UNIT_END
[02:50:08] Project: 11020 (Run 0, Clone 165, Gen 0)
[02:50:18] Project: 11020 (Run 0, Clone 167, Gen 0)
[02:50:29] Folding@home Core Shutdown: EARLY_UNIT_END
[02:50:32] Project: 11020 (Run 0, Clone 167, Gen 0)
[02:50:43] Project: 11020 (Run 0, Clone 166, Gen 0)
[02:50:53] Folding@home Core Shutdown: EARLY_UNIT_END
[02:50:57] Project: 11020 (Run 0, Clone 166, Gen 0)
[02:51:07] Project: 11020 (Run 0, Clone 168, Gen 0)
[02:51:18] Folding@home Core Shutdown: EARLY_UNIT_END
[02:51:21] Project: 11020 (Run 0, Clone 168, Gen 0)
[02:51:32] Project: 11020 (Run 0, Clone 170, Gen 0)
[02:51:42] Folding@home Core Shutdown: EARLY_UNIT_END
[02:51:46] Project: 11020 (Run 0, Clone 170, Gen 0)
[02:51:56] Project: 11020 (Run 0, Clone 171, Gen 0)
[02:52:07] Folding@home Core Shutdown: EARLY_UNIT_END
[02:52:11] Project: 11020 (Run 0, Clone 171, Gen 0)
[02:52:21] Project: 11020 (Run 0, Clone 172, Gen 1)
[02:52:32] Folding@home Core Shutdown: EARLY_UNIT_END
[02:52:35] Project: 11020 (Run 0, Clone 172, Gen 1)
[02:52:46] Project: 11020 (Run 0, Clone 173, Gen 0)
[02:52:56] Folding@home Core Shutdown: EARLY_UNIT_END
[02:53:00] Project: 11020 (Run 0, Clone 173, Gen 0)
[02:53:11] Project: 11020 (Run 0, Clone 85, Gen 1)
[02:53:21] Folding@home Core Shutdown: EARLY_UNIT_END
[02:53:25] Project: 11020 (Run 0, Clone 85, Gen 1)
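A tally like the one above can be pulled out of a client log with a short script. This is only a sketch: it assumes the log lines keep the exact "Project: N (Run r, Clone c, Gen g)" and "Core Shutdown: EARLY_UNIT_END" wording seen in these excerpts, and the `count_eues` helper name is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as "[02:53:11] Project: 11020 (Run 0, Clone 85, Gen 1)"
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
EUE_MARKER = "Core Shutdown: EARLY_UNIT_END"


def count_eues(lines):
    """Return a Counter mapping project number -> number of EUE shutdowns."""
    counts = Counter()
    current_project = None
    for line in lines:
        match = PROJECT_RE.search(line)
        if match:
            # Remember which project the following core messages belong to.
            current_project = int(match.group(1))
        elif EUE_MARKER in line and current_project is not None:
            counts[current_project] += 1
    return counts


# A few lines copied from the excerpts in this post.
sample = [
    "[02:52:46] Project: 11020 (Run 0, Clone 173, Gen 0)",
    "[02:52:56] Folding@home Core Shutdown: EARLY_UNIT_END",
    "[02:53:00] Project: 11020 (Run 0, Clone 173, Gen 0)",
    "[02:53:11] Project: 11020 (Run 0, Clone 85, Gen 1)",
    "[02:53:21] Folding@home Core Shutdown: EARLY_UNIT_END",
    "[02:53:25] Project: 11020 (Run 0, Clone 85, Gen 1)",
    "[02:53:37] Project: 6023 (Run 0, Clone 6, Gen 479)",
]

for project, eues in count_eues(sample).items():
    print(f"P{project}: {eues} EUE(s)")
```

To run it against a real log, pass an open file instead of `sample`, for example `count_eues(open("FAHlog.txt"))`; the file name here is an assumption about where the client writes its log.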
No good ideas on where to start troubleshooting. This box has been folding everything else with no problem for over a year (i7 860), just finished a bigadv WU.

Question for the experts: does the "number of threads" (-smp n) affect different WUs in different ways? I've run quite a few WUs over the last few weeks with "-smp 7" - could P11020 not "like" this? I don't have a unit to test with of course, since they got self-deleted.

Edit: Found this thread, but apparently "-smp 7" worked well for everyone, at least back then:

viewtopic.php?f=58&t=14423&start=75#p165234

Re: P11020 immediate EUE

Posted: Sat Mar 05, 2011 5:42 am
by Slash_2CPU
Update your client. I think that may be a new a5 core unit, and your client needs an update to run that new core.

Re: P11020 immediate EUE

Posted: Sat Mar 05, 2011 5:55 am
by vladh4x0r
Thanks Slash - I'm already running the 6.34 client, and finished one bigadv unit with it using the new A5 core. When P11020 first got assigned, it downloaded the new 2.27 A3 core, which is successfully running P6023 now.

Re: P11020 immediate EUE

Posted: Sat Mar 05, 2011 7:25 am
by Jeannie
You're running with -smp 7. I had the same problem with this project 11020. It can't handle -smp 7 - you have to use -smp 8 or -smp 6.

Re: P11020 immediate EUE

Posted: Sat Mar 05, 2011 2:41 pm
by vladh4x0r
Thanks Jeannie - looks like I'll be optimizing this box for bigadv with -smp 7 then. I run two GPU clients on dual GTX 460 on it as well, so -smp 8 is much slower, and -smp 6 would likely lose another round of performance.

Re: P11020 immediate EUE

Posted: Sat Mar 05, 2011 10:56 pm
by Arnette
Yeah, I am having the exact same problem. We shouldn't have to change from -smp 7 for this to work....

Does anyone from Stanford have input on this?

Re: P11020 immediate EUE

Posted: Sun Mar 06, 2011 1:14 am
by dvanatta
Hi,

You shouldn't be getting these if you run with -smp 7; it's a bug. We're looking into it. Temporarily, we've disabled using more than 6 cores for this project, but we'll re-enable 8 once we figure out what's going on.

-Dan

Re: P11020 immediate EUE

Posted: Sun Mar 06, 2011 1:43 pm
by toTOW
Dan> for assignments, the client doesn't take the -smp X into account. It uses the number of detected cores (which is printed at client startup).

So if 8 cores have been detected, it will report 8 cores to the AS, whether it's started with plain -smp or with a specific value ...
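The mismatch described above can be shown with a toy model. It is not the real assignment server logic: the `Client` class, the `is_assigned` rule and the per-project core cap are all hypothetical, and the only behaviour taken from this thread is that the reported core count ignores the -smp value.

```python
from dataclasses import dataclass


@dataclass
class Client:
    detected_cores: int  # core count the client prints at startup
    smp_threads: int     # value passed with -smp, used when launching the core


def reported_cores(client: Client) -> int:
    # Per the post above: the -smp value is not taken into account here.
    return client.detected_cores


def is_assigned(client: Client, project_max_cores: int) -> bool:
    # Hypothetical server-side rule: only the reported core count is checked.
    return reported_cores(client) <= project_max_cores


# An 8-thread box started with -smp 7, like the one in the first post.
box = Client(detected_cores=8, smp_threads=7)

for cap in (6, 8):
    print(f"cap={cap}: assigned={is_assigned(box, cap)}, "
          f"core would run with {box.smp_threads} threads")
```

Under these assumptions, a cap of 6 keeps the project away from this box entirely, while a cap of 8 hands it out even though the core then runs with 7 threads, which the server never sees.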

Re: P11020 immediate EUE

Posted: Mon Mar 07, 2011 10:03 pm
by dvanatta
toTOW,

Interesting. If that's the case, I don't think there's anything I can do but restrict this to 6 cores. I've also contacted the people that work on the server code directly, so hopefully this will get resolved at some point.

-Dan