Early Unit End on 128.143.48.226

Moderators: Site Moderators, FAHC Science Team

pompeyrodney
Posts: 37
Joined: Fri Dec 14, 2007 3:53 pm
Hardware configuration: 1 Dell server running Quad Xeon 2.4 Deino, SMP client. 2 Vista GPU client and 2 6.23 BetaR1. 8 other XP clients running 6.23.
Location: Portsmouth England

Early Unit End on 128.143.48.226

Post by pompeyrodney »

I have a 6.23 Windows client that hits an Early Unit End (EUE) over and over again. It downloads a unit, works on it for a few seconds, and then fails. It did this about five times before I killed it. Is there a way of knowing whether the work units are faulty? This machine is not overclocked at all, by the way. Thanks.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Early Unit End on 128.143.48.226

Post by bruce »

There is no dependable way to know if a WU is faulty. The general guideline is that if you have an occasional WU that fails (no matter how many times it is reassigned), ignore it. If you have a variety of WUs that fail, it's probably your hardware.

Re: Early Unit End on 128.143.48.226

Post by pompeyrodney »

My problem is that I cannot just ignore it, because it is constantly being re-downloaded. I am using a mobile broadband connection, so the quantity of data counts toward my total. Here are the relevant parts of my log:

[11:36:07] Preparing to commence simulation
[11:36:07] - Files status OK
[11:36:08] - Expanded 245273 -> 653388 (decompressed 266.3 percent)
[11:36:08] 
[11:36:08] Project: 3861 (Run 147, Clone 3, Gen 6)
[11:36:08] 
[11:36:08] Assembly optimizations on if available.
[11:36:08] Entering M.D.
[11:36:14] Gromacs cannot continue further.
[11:36:14] Going to send back what have done.
[11:36:14] logfile size: 0
[11:36:14] Warning: Core could not open logfile.
[11:36:14] - Writing 536 bytes of core data to disk...
[11:36:14] Done: 24 -> 69 (compressed to 287.5 percent)
[11:36:14]   ... Done.
[11:36:14] 
[11:36:14] Folding@home Core Shutdown: EARLY_UNIT_END
[11:36:17] CoreStatus = 72 (114)
[11:36:17] Sending work to server
[11:36:17] Project: 3861 (Run 147, Clone 3, Gen 6)


[11:36:17] + Attempting to send results [June 18 11:36:17 UTC]
[11:36:17] - Reading file work/wuresults_05.dat from core
[11:36:17]   (Read 581 bytes from disk)
[11:36:17] Connecting to http://128.143.48.226:8080/
[11:36:18] Posted data.
[11:36:18] Initial: 0000; - Uploaded at ~1 kB/s
[11:36:18] - Averaged speed for that direction ~1 kB/s
[11:36:18] + Results successfully sent
[11:36:18] Thank you for your contribution to Folding@Home.
[11:36:22] + Attempting to get work packet
[11:36:22] - Will indicate memory of 383 MB
[11:36:22] - Connecting to assignment server
[11:36:22] Connecting to http://assign.stanford.edu:8080/
[11:36:24] Posted data.
[11:36:24] Initial: 8F80; - Successful: assigned to (128.143.48.226).
[11:36:24] + News From Folding@Home: Welcome to Folding@Home
[11:36:24] Loaded queue successfully.
[11:36:24] Connecting to http://128.143.48.226:8080/
[11:36:25] Posted data.
[11:36:31] Initial: 0000; - Receiving payload (expected size: 245785)
[11:36:31] Conversation time very short, giving reduced weight in bandwidth avg
[11:36:31] - Downloaded at ~480 kB/s
[11:36:31] - Averaged speed for that direction ~227 kB/s
[11:36:31] + Received work.
[11:36:31] Trying to send all finished work units
[11:36:31] + No unsent completed units remaining.
[11:36:31] + Closed connections
[11:36:36] 
[11:36:36] + Processing work unit
[11:36:36] Core required: FahCore_7c.exe
[11:36:36] Core found.
[11:36:36] Working on queue slot 06 [June 18 11:36:36 UTC]
[11:36:36] + Working ...
[11:36:36] - Calling '.\FahCore_7c.exe -dir work/ -suffix 06 -checkpoint 15 -verbose -lifeline 3184 -version 623'
[11:36:37] *------------------------------*
[11:36:37] Folding@Home Double Gromacs Core C
[11:36:37] Version 1.00 (Thu Apr 24 19:12:09 PDT 2008)
[11:36:37] 
[11:36:37] Preparing to commence simulation
[11:36:37] - Files status OK
[11:36:37] - Expanded 245273 -> 653388 (decompressed 266.3 percent)
[11:36:37] 
[11:36:37] Project: 3861 (Run 147, Clone 3, Gen 6)
[11:36:37] 
[11:36:37] Assembly optimizations on if available.
[11:36:37] Entering M.D.
[11:36:43] Gromacs cannot continue further.
[11:36:43] Going to send back what have done.
[11:36:43] logfile size: 0
[11:36:43] Warning: Core could not open logfile.
[11:36:43] - Writing 536 bytes of core data to disk...
[11:36:43] Done: 24 -> 69 (compressed to 287.5 percent)
[11:36:43]   ... Done.
[11:36:43] 
[11:36:43] Folding@home Core Shutdown: EARLY_UNIT_END
[11:36:47] CoreStatus = 72 (114)
[11:36:47] Sending work to server
[11:36:47] Project: 3861 (Run 147, Clone 3, Gen 6)
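
One way to apply the guideline from the earlier reply to a log like this one is to tally EUEs per work unit. The following is only a minimal sketch: the two patterns are inferred from the excerpt above and may not match other client versions, and `count_eue_by_wu` is a hypothetical helper, not part of the Folding@home client.

```python
import re
from collections import Counter

# Patterns inferred from the log excerpt above (assumption: other
# client versions may format these lines differently).
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
EUE_MARKER = "Core Shutdown: EARLY_UNIT_END"


def count_eue_by_wu(log_lines):
    """Map each (project, run, clone, gen) to its number of EUE shutdowns."""
    counts = Counter()
    current_wu = None
    for line in log_lines:
        match = PROJECT_RE.search(line)
        if match:
            # Remember the most recently announced work unit.
            current_wu = tuple(int(g) for g in match.groups())
        elif EUE_MARKER in line and current_wu is not None:
            counts[current_wu] += 1
    return counts


# A few lines copied from the log above.
sample = [
    "[11:36:08] Project: 3861 (Run 147, Clone 3, Gen 6)",
    "[11:36:14] Folding@home Core Shutdown: EARLY_UNIT_END",
    "[11:36:17] Project: 3861 (Run 147, Clone 3, Gen 6)",
    "[11:36:37] Project: 3861 (Run 147, Clone 3, Gen 6)",
    "[11:36:43] Folding@home Core Shutdown: EARLY_UNIT_END",
]

eue_counts = count_eue_by_wu(sample)
for wu, n in eue_counts.items():
    print("Project {} (Run {}, Clone {}, Gen {}): {} EUE(s)".format(*wu, n))
```

If a single work unit accounts for every EUE, as it does here, that is consistent with one bad WU being reassigned. Many different work units failing would point more toward the hardware.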

Re: Early Unit End on 128.143.48.226

Post by bruce »

pompeyrodney wrote: My problem is that I cannot just ignore it, because it is constantly being re-downloaded. I am using a mobile broadband connection, so the quantity of data counts toward my total.
For people on metered connections (including mobile connections and such), the quickest way to get rid of repeated downloads of the same defective WU is to change your MachineID. Do not use this method if you have previous results that still need to be uploaded, though. It's essentially a variation on sneakernetting, except that only one computer is involved.

Re: Early Unit End on 128.143.48.226

Post by pompeyrodney »

Thanks, Bruce. I will give that a try.
Mactin
Posts: 222
Joined: Sun Dec 02, 2007 1:08 pm
Location: Côte-des-Neiges, Montréal, Québec

Re: Early Unit End on 128.143.48.226

Post by Mactin »

I've had a few EUEs at 0% with gen 0 p3864 WUs:
p3864, r383, c19, g0 on June 11th
p3864, r385, c19, g0 on June 11th
p3864, r388, c17, g0 on June 27th (today)
p3864, r398, c4, g0 on Feb 24th
Notice the range of Run numbers, all nicely close together.
This might be a coincidence, but then I have not had any other p3864 WUs within this range of Runs.
Could it be a bad run of Runs?
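
To make the clustering concrete, here is a minimal sketch that groups the four failures listed above by project and reports the lowest Run, the highest Run, and the number of failures. The helper name `run_ranges` is hypothetical, and the result only summarizes the list. It cannot show whether the Runs are bad without knowing how many WUs from the same range completed normally.

```python
from collections import defaultdict

# (project, run, clone, gen) for the four EUEs listed above.
failures = [
    (3864, 383, 19, 0),
    (3864, 385, 19, 0),
    (3864, 388, 17, 0),
    (3864, 398, 4, 0),
]


def run_ranges(wus):
    """Return {project: (lowest run, highest run, failure count)}."""
    runs = defaultdict(list)
    for project, run, _clone, _gen in wus:
        runs[project].append(run)
    return {p: (min(r), max(r), len(r)) for p, r in runs.items()}


ranges = run_ranges(failures)
for project, (low, high, n) in ranges.items():
    print("p{}: {} EUEs between r{} and r{}".format(project, n, low, high))
```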