171.64.65.54 Failed to send results to work server

Posted: Sat Dec 31, 2011 4:32 pm
by SpockLogic
FAHControl is having trouble sending in results of the last unit.

16:14:25:Unit 00:Completed 1317730 out of 1500000 steps  (87%)
16:15:22:WARNING: WorkServer connection failed on port 8080 trying 80
16:15:22:Connecting to 171.64.65.54:80
16:15:43:Unit 00:Completed 1320000 out of 1500000 steps  (88%)
16:16:37:WARNING: Exception: Failed to send results to work server: Failed to connect to 171.64.65.54:80: Operation timed out
16:16:37:Trying to send results to collection server
16:16:37:Unit 01: Uploading 19.47MiB to 171.67.108.25
16:16:37:Connecting to 171.67.108.25:8080
16:16:37:WARNING: WorkServer connection failed on port 8080 trying 80
16:16:37:Connecting to 171.67.108.25:80
16:16:37:ERROR: Exception: Failed to connect to 171.67.108.25:80: Connection refused
16:16:37:Sending unit results: id:01 state:SEND error:OK project:6026 run:0 clone:44 gen:214 core:0xa3 unit:0x4b6ce1b14efe4e2400d6002c0000178a
16:16:37:Unit 01: Uploading 19.47MiB to 171.64.65.54
16:16:37:Connecting to 171.64.65.54:8080
16:17:52:WARNING: WorkServer connection failed on port 8080 trying 80
16:17:52:Connecting to 171.64.65.54:80
16:19:07:WARNING: Exception: Failed to send results to work server: Failed to connect to 171.64.65.54:80: Operation timed out
16:19:07:Trying to send results to collection server
16:19:07:Unit 01: Uploading 19.47MiB to 171.67.108.25
16:19:07:Connecting to 171.67.108.25:8080
16:19:07:WARNING: WorkServer connection failed on port 8080 trying 80
16:19:07:Connecting to 171.67.108.25:80
16:19:07:ERROR: Exception: Failed to connect to 171.67.108.25:80: Connection refused
16:19:08:Sending unit results: id:01 state:SEND error:OK project:6026 run:0 clone:44 gen:214 core:0xa3 unit:0x4b6ce1b14efe4e2400d6002c0000178a
16:19:08:Unit 01: Uploading 19.47MiB to 171.64.65.54
16:19:08:Connecting to 171.64.65.54:8080
16:20:22:WARNING: WorkServer connection failed on port 8080 trying 80
16:20:22:Connecting to 171.64.65.54:80
16:21:37:WARNING: Exception: Failed to send results to work server: Failed to connect to 171.64.65.54:80: Operation timed out
16:21:37:Trying to send results to collection server
16:21:37:Unit 01: Uploading 19.47MiB to 171.67.108.25
16:21:37:Connecting to 171.67.108.25:8080
16:21:37:WARNING: WorkServer connection failed on port 8080 trying 80
16:21:37:Connecting to 171.67.108.25:80
16:21:37:ERROR: Exception: Failed to connect to 171.67.108.25:80: Connection refused
16:21:37:Sending unit results: id:01 state:SEND error:OK project:6026 run:0 clone:44 gen:214 core:0xa3 unit:0x4b6ce1b14efe4e2400d6002c0000178a
16:21:37:Unit 01: Uploading 19.47MiB to 171.64.65.54
16:21:37:Connecting to 171.64.65.54:8080
16:22:52:WARNING: WorkServer connection failed on port 8080 trying 80
16:22:52:Connecting to 171.64.65.54:80
16:24:07:WARNING: Exception: Failed to send results to work server: Failed to connect to 171.64.65.54:80: Operation timed out
16:24:07:Trying to send results to collection server
16:24:07:Unit 01: Uploading 19.47MiB to 171.67.108.25
16:24:07:Connecting to 171.67.108.25:8080
16:24:08:WARNING: WorkServer connection failed on port 8080 trying 80
16:24:08:Connecting to 171.67.108.25:80
16:24:08:ERROR: Exception: Failed to connect to 171.67.108.25:80: Connection refused
16:24:15:Sending unit results: id:01 state:SEND error:OK project:6026 run:0 clone:44 gen:214 core:0xa3 unit:0x4b6ce1b14efe4e2400d6002c0000178a
16:24:15:Unit 01: Uploading 19.47MiB to 171.64.65.54
16:24:15:Connecting to 171.64.65.54:8080
16:25:05:Unit 00:Completed 1335000 out of 1500000 steps  (89%)
16:25:30:WARNING: WorkServer connection failed on port 8080 trying 80
16:25:30:Connecting to 171.64.65.54:80
16:26:45:WARNING: Exception: Failed to send results to work server: Failed to connect to 171.64.65.54:80: Operation timed out
16:26:45:Trying to send results to collection server
16:26:45:Unit 01: Uploading 19.47MiB to 171.67.108.25
16:26:45:Connecting to 171.67.108.25:8080
16:26:45:WARNING: WorkServer connection failed on port 8080 trying 80
16:26:45:Connecting to 171.67.108.25:80
16:26:45:ERROR: Exception: Failed to connect to 171.67.108.25:80: Connection refused
16:28:29:Sending unit results: id:01 state:SEND error:OK project:6026 run:0 clone:44 gen:214 core:0xa3 unit:0x4b6ce1b14efe4e2400d6002c0000178a
16:28:29:Unit 01: Uploading 19.47MiB to 171.64.65.54
16:28:29:Connecting to 171.64.65.54:8080
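
The log above mixes two different failure modes: the work server (171.64.65.54) times out, while the collection server (171.67.108.25) refuses the connection outright. A timeout usually suggests the host is down or unreachable, and a refusal suggests the host is up but nothing is listening on that port, though neither reading is certain from the client side. A minimal sketch for tallying the two, assuming the FAHControl log format shown above (the function name `summarize_failures` and the regular expression are illustrative, not part of any FAH tool):

```python
import re
from collections import Counter

# Matches failure lines such as
# "... Failed to connect to 171.64.65.54:80: Operation timed out"
FAILURE = re.compile(r"Failed to connect to (\d+\.\d+\.\d+\.\d+):(\d+): (.+)$")


def summarize_failures(log_text):
    """Count connection failures per (host, port, reason)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FAILURE.search(line)
        if match:
            host, port, reason = match.groups()
            counts[(host, int(port), reason.strip())] += 1
    return counts


# Three lines copied from the log in this post.
sample = """\
16:16:37:WARNING: Exception: Failed to send results to work server: Failed to connect to 171.64.65.54:80: Operation timed out
16:16:37:ERROR: Exception: Failed to connect to 171.67.108.25:80: Connection refused
16:19:07:WARNING: Exception: Failed to send results to work server: Failed to connect to 171.64.65.54:80: Operation timed out
"""

for (host, port, reason), n in sorted(summarize_failures(sample).items()):
    print(f"{host}:{port} {reason} x{n}")
```

Run against a full log file, this gives a quick view of which server fails and how, which is the useful detail to include when reporting a server problem.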

Re: 171.64.65.54 Failed to send results to work server

Posted: Sat Dec 31, 2011 5:05 pm
by Pick2
I'm having the same problem with the version 6.29r1 client: it is not sending the finished WU and not receiving a new one.
Server Stats shows 171.64.65.54 as "Down".

Re: 171.64.65.54 Failed to send results to work server

Posted: Sat Dec 31, 2011 5:34 pm
by Foxbat
Hopefully someone will see this before their New Year's Eve party and bring the server back. Internet-wise, everything looks cool until Traceroute gets inside of Stanford's campus:

traceroute to 171.64.65.54 (171.64.65.54), 64 hops max, 52 byte packets
 1  aaa.bbb.ccc.ddd (aaa.bbb.ccc.ddd)  2.853 ms  2.429 ms  4.246 ms
 2  adsl-70-224-ccc-ddd.dsl.sbndin.ameritech.net (70.224.ccc.ddd)  41.779 ms  44.212 ms  44.232 ms
 3  dist2-vlan50.sbndin.sbcglobal.net (65.43.5.227)  43.436 ms  46.977 ms  45.701 ms
 4  151.164.101.32 (151.164.101.32)  44.289 ms  44.383 ms  45.953 ms
 5  cgcil03jt.ip.att.net (12.122.84.53)  45.951 ms  46.764 ms  45.917 ms
 6  192.205.37.174 (192.205.37.174)  45.963 ms
    192.205.37.178 (192.205.37.178)  46.870 ms  46.884 ms
 7  te0-4-0-1.ccr22.ord01.atlas.cogentco.com (154.54.6.209)  47.919 ms
    te0-2-0-1.ccr22.ord01.atlas.cogentco.com (154.54.29.21)  47.091 ms
    te0-4-0-1.ccr21.ord01.atlas.cogentco.com (154.54.25.65)  47.016 ms
 8  te0-1-0-3.ccr21.mci01.atlas.cogentco.com (154.54.25.81)  59.249 ms
    te0-4-0-5.ccr22.mci01.atlas.cogentco.com (154.54.45.149)  59.508 ms
    te0-4-0-5.ccr21.mci01.atlas.cogentco.com (154.54.45.145)  60.792 ms
 9  te0-0-0-2.ccr22.sfo01.atlas.cogentco.com (154.54.30.65)  98.724 ms
    te0-2-0-6.ccr21.sfo01.atlas.cogentco.com (154.54.45.62)  98.851 ms
    te0-2-0-6.ccr22.sfo01.atlas.cogentco.com (154.54.45.70)  99.043 ms
10  te3-5.ccr02.sjc04.atlas.cogentco.com (154.54.5.110)  98.682 ms
    te3-2.ccr02.sjc04.atlas.cogentco.com (154.54.7.174)  97.070 ms
    te3-5.ccr02.sjc04.atlas.cogentco.com (154.54.5.110)  98.919 ms
11  stanford_university2.demarc.cogentco.com (66.250.7.138)  99.194 ms  98.924 ms  97.893 ms
12  boundarya-rtr.stanford.edu (68.65.168.33)  99.761 ms  98.769 ms  97.901 ms
13  bbrb-rtr-b.stanford.edu (171.66.255.129)  97.935 ms  98.755 ms  98.161 ms
14  yoza-rtr-b.stanford.edu (171.66.255.144)  97.928 ms  99.013 ms  97.911 ms
15  * * *
16  * * *
^C
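
The trace above answers at every hop up to a Stanford campus router and then goes silent. A small sketch that pulls the last responding hop out of traceroute output is shown below. It assumes the BSD/OS X output layout seen in this post, and `last_responding_hop` is a hypothetical helper name. Silent hops are consistent with the target being down, but they can also come from routers that simply do not answer probes, so this is a hint rather than proof.

```python
import re

# A hop line starts with the hop number followed by whitespace.
# Continuation lines (extra routers for the same hop) and the header do not.
HOP = re.compile(r"^\s*(\d+)\s+(.*)$")


def last_responding_hop(traceroute_output):
    """Return (hop_number, first_router_name) of the last hop that answered, or None."""
    last = None
    for line in traceroute_output.splitlines():
        match = HOP.match(line)
        if not match:
            continue  # header, continuation line, or ^C
        hop, rest = int(match.group(1)), match.group(2).strip()
        # A hop that printed only "*" got no reply to any probe.
        if rest.replace("*", "").strip():
            last = (hop, rest.split()[0])
    return last


# Tail of the trace from this post.
sample = """\
traceroute to 171.64.65.54 (171.64.65.54), 64 hops max, 52 byte packets
13  bbrb-rtr-b.stanford.edu (171.66.255.129)  97.935 ms  98.755 ms  98.161 ms
14  yoza-rtr-b.stanford.edu (171.66.255.144)  97.928 ms  99.013 ms  97.911 ms
15  * * *
16  * * *
"""

print(last_responding_hop(sample))
```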

Re: 171.64.65.54 Failed to send results to work server

Posted: Sat Dec 31, 2011 6:09 pm
by TomJohnson
Same problem with all of my computers.

Happy New Year to All !!!

Re: 171.64.65.54 Failed to send results to work server

Posted: Sat Dec 31, 2011 6:23 pm
by iBozz
Same for me with a Gromacs SMP2 unit on a quad-core i7, stuck since 0515 GMT/UTC this morning and still awaiting a new unit. I also have a G4 with a p3044 (Human Hpin1 mutant 4) unit that is expected to complete around 0338 GMT/UTC tomorrow, 1 January.

[05:15:37] Completed 500000 out of 500000 steps  (100%)
[05:15:38] DynamicWrapper: Finished Work Unit: sleep=10000
[05:15:48] 
[05:15:48] Finished Work Unit:
[05:15:48] - Reading up to 20449968 from "work/wudata_01.trr": Read 20449968
[05:15:48] trr file hash check passed.
[05:15:48] edr file hash check passed.
[05:15:48] logfile size: 61301
[05:15:48] Leaving Run
[05:15:52] - Writing 20545545 bytes of core data to disk...
[05:15:53]   ... Done.
[05:15:54] - Shutting down core
[05:15:54] 
[05:15:54] Folding@home Core Shutdown: FINISHED_UNIT
[05:15:54] CoreStatus = 64 (100)
[05:15:54] Unit 1 finished with 90 percent of time to deadline remaining.
[05:15:54] Updated performance fraction: 0.908555
[05:15:54] Sending work to server
[05:15:54] Project: 6080 (Run 0, Clone 161, Gen 185)


[05:15:54] + Attempting to send results [December 31 05:15:54 UTC]
[05:15:54] - Reading file work/wuresults_01.dat from core
[05:15:54]   (Read 20545545 bytes from disk)
[05:15:54] Connecting to http://171.64.65.54:8080/
[05:17:09] - Couldn't send HTTP request to server
[05:17:09] + Could not connect to Work Server (results)
[05:17:09]     (171.64.65.54:8080)
[05:17:09] + Retrying using alternative port
[05:17:09] Connecting to http://171.64.65.54:80/
[05:18:24] - Couldn't send HTTP request to server
[05:18:24] + Could not connect to Work Server (results)
[05:18:24]     (171.64.65.54:80)
[05:18:24] - Error: Could not transmit unit 01 (completed December 31) to work server.
[05:18:24] - 1 failed uploads of this unit.
[05:18:24]   Keeping unit 01 in queue.
[05:18:24] Trying to send all finished work units
[05:18:24] Project: 6080 (Run 0, Clone 161, Gen 185)


[05:18:24] + Attempting to send results [December 31 05:18:24 UTC]
[05:18:24] - Reading file work/wuresults_01.dat from core
[05:18:24]   (Read 20545545 bytes from disk)
[05:18:24] Connecting to http://171.64.65.54:8080/
[05:19:39] - Couldn't send HTTP request to server
[05:19:39] + Could not connect to Work Server (results)
[05:19:39]     (171.64.65.54:8080)
[05:19:39] + Retrying using alternative port
[05:19:39] Connecting to http://171.64.65.54:80/
[05:20:54] - Couldn't send HTTP request to server
[05:20:54] + Could not connect to Work Server (results)
[05:20:54]     (171.64.65.54:80)
[05:20:54] - Error: Could not transmit unit 01 (completed December 31) to work server.
[05:20:54] - 2 failed uploads of this unit.


[05:20:54] + Attempting to send results [December 31 05:20:54 UTC]
[05:20:54] - Reading file work/wuresults_01.dat from core
[05:20:54]   (Read 20545545 bytes from disk)
[05:20:54] Connecting to http://171.67.108.25:8080/
[05:20:54] - Couldn't send HTTP request to server
[05:20:54] + Could not connect to Work Server (results)
[05:20:54]     (171.67.108.25:8080)
[05:20:54] + Retrying using alternative port
[05:20:54] Connecting to http://171.67.108.25:80/
[05:20:54] - Couldn't send HTTP request to server
[05:20:54] + Could not connect to Work Server (results)
[05:20:54]     (171.67.108.25:80)
[05:20:54]   Could not transmit unit 01 to Collection server; keeping in queue.
[05:20:54] + Sent 0 of 1 completed units to the server
[05:20:54] - Preparing to get new work unit...
[05:20:54] Cleaning up work directory
[05:20:55] + Attempting to get work packet
[05:20:55] Passkey found
[05:20:55] - Will indicate memory of 4096 MB
[05:20:55] - Connecting to assignment server
[05:20:55] Connecting to http://assign.stanford.edu:8080/
[05:20:56] Posted data.
[05:20:56] Initial: 0000; + No appropriate work server was available; will try again in a bit.
[05:20:56] + Couldn't get work instructions.
[05:20:56] - Attempt #1  to get work failed, and no other work to do.
Waiting before retry.
[05:21:15] + Attempting to get work packet
[05:21:15] Passkey found
[05:21:15] - Will indicate memory of 4096 MB
[05:21:15] - Connecting to assignment server
[05:21:15] Connecting to http://assign.stanford.edu:8080/
[05:21:16] Posted data.
[05:21:16] Initial: 0000; + No appropriate work server was available; will try again in a bit.
[05:21:16] + Couldn't get work instructions.
[05:21:16] - Attempt #2  to get work failed, and no other work to do.
Waiting before retry.
Happy New Year, one and all! :D

Re: 171.64.65.54 Failed to send results to work server

Posted: Sat Dec 31, 2011 7:07 pm
by Joe_H
iBozz wrote: Same for me with a Gromacs SMP2 unit on a quad-core i7, stuck since 0515 GMT/UTC this morning and still awaiting a new unit. I also have a G4 with a p3044 (Human Hpin1 mutant 4) unit that is expected to complete around 0338 GMT/UTC tomorrow, 1 January.

Happy New Year, one and all! :D
Your 3044 on a G4 should be fine; it goes to a different server that is up and accepting results. I turned in one WU to it earlier today and have another scheduled to go in tonight. Hopefully they get the 171.64.65.54 server back up for you soon, as it is the only one available for OS X Intel folders running 6.29.

Re: 171.64.65.54 Failed to send results to work server

Posted: Sat Dec 31, 2011 9:14 pm
by kasson
This machine is currently down. I sent email to the Stanford sysadmins, and we'll do our best to get it up and running as soon as we can.

Re: 171.64.65.54 Failed to send results to work server

Posted: Sat Dec 31, 2011 11:50 pm
by Pick2
Thank you, and have a happy New Year!

Edit: I see it went from "Down" to "Reject"... one more drop kick will do it! :lol:

Re: 171.64.65.54 Failed to send results to work server

Posted: Sun Jan 01, 2012 6:32 am
by Ravage7779
This server must have a rather funky looking case for all the times it has been kicked over the years...

Re: 171.64.65.54 Failed to send results to work server

Posted: Sun Jan 01, 2012 3:23 pm
by Foxbat
So, do we lose whatever Bonus(es) we would have received for turning in a WU under the time limit because this server isn't available?

-------- Queue Dump of Unit at ~/Library/FAH-SMP-Term1 --------
qd released 29 July 2011 (fr 086); qd info 30 August 2011 (update-qd.pl)
qd executed Sun Jan 01 10:22:19 EST 2012 (Sun Jan 01 15:22:19 UTC 2012)
Queue version 6.00

Index 6: ready for upload 521.00 pts (40.346 pt/hr, 967.97 ppd) 7.81 X min speed
   bonus pts: 2516.57 (194.814 pt/hr, 4675.55 ppd); bonus factor: 4.83; kfactor: 2.99
   server: 171.64.65.54:8080; project: 6080
   Folding: run 0, clone 169, generation 202; benchmark 0; misc: 500, 629, 12 (le)
   issue: Fri Dec 30 21:17:27 2011; begin: Fri Dec 30 21:17:43 2011
   end: Sat Dec 31 10:12:31 2011; due: Wed Jan  4 02:05:42 2012 (4 days)
   preferred: Mon Jan  2 09:17:43 2012 (2 days)
   core URL: http://www.stanford.edu/~pande/OSX/x86/Core_a3.fah (V2.22)
   core number: 0xa3; core name: GRO-A3
   CPU: 1,0 x86; OS: 3,0 OSX
   smp cores: 4; cores to use: 4
   tag: P6080R0C169G202
   flops: 1063499229 (1063.499229 megaflops)
   memory: 8192 MB
   client type: 3 Advmethods
   assignment info (le): Fri Dec 30 21:15:43 2011; BDC82A06
   CS: 171.67.108.25; upload failures: 5; P limit: 524286976
   user: Foxbat; team: 55236; ID: B082247170277610; mach ID: 1
   work/wudata_06.dat file size: 1799007; WU type: Folding@Home
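
For a rough idea of how much an upload delay could cost, the sketch below assumes the commonly cited quick-return bonus formula, factor = max(1, sqrt(k * deadline / elapsed)), with both intervals measured from the issue time. That formula, and the assumption that elapsed time keeps running until the server actually receives the result, are not confirmed anywhere in this thread. The base points, kfactor and timestamps are taken from the queue dump above and treated as being in one time zone; the 24-hour delay is purely hypothetical.

```python
from datetime import datetime
from math import sqrt


def bonus_factor(k, issued, due, returned):
    """Assumed QRB formula: max(1, sqrt(k * deadline_length / elapsed_time))."""
    deadline = (due - issued).total_seconds()
    elapsed = (returned - issued).total_seconds()
    return max(1.0, sqrt(k * deadline / elapsed))


# Values from the queue dump (index 6, project 6080).
issued = datetime(2011, 12, 30, 21, 17, 27)
due = datetime(2012, 1, 4, 2, 5, 42)
finished = datetime(2011, 12, 31, 10, 12, 31)
base_points, k = 521.0, 2.99

# Factor if the result had been accepted the moment the unit finished.
at_finish = bonus_factor(k, issued, due, finished)

# Hypothetical: the result only reaches the server 24 hours after finishing.
delayed = bonus_factor(k, issued, due, datetime(2012, 1, 1, 10, 12, 31))

print(f"at finish: factor {at_finish:.2f}, about {base_points * at_finish:.0f} points")
print(f"24 h late: factor {delayed:.2f}, about {base_points * delayed:.0f} points")
```

With the dump's numbers the first factor rounds to the 4.83 that qd reports, which suggests the assumed formula is at least close to what qd uses. Under that assumption the bonus shrinks the longer the upload is delayed, but never drops below the base points as long as the factor is floored at 1.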

Re: 171.64.65.54 Failed to send results to work server

Posted: Sun Jan 01, 2012 4:27 pm
by kasson
We had to restore a couple files from our regular backups, but everything should be up and running now.

Re: 171.64.65.54 Failed to send results to work server

Posted: Mon Jan 02, 2012 3:46 am
by Foxbat
kasson wrote: We had to restore a couple files from our regular backups, but everything should be up and running now.
Ugh. Been there, done that. Thanks to the Sysadmin(s) who had to come in over the New Year's weekend to fix this!