Stanford Network Issue {Resolved}

Posted: Sun Dec 11, 2011 6:51 am
by PantherX
Please note that several F@H servers are down due to technical reasons. For details, please read this -> http://folding.typepad.com/news/2011/12 ... ord-1.html

Re: Stanford Network Issue

Posted: Sun Dec 11, 2011 11:03 am
by tear
Stats haven't seen updates for 6+ hours too.

Re: Stanford Network Issue

Posted: Sun Dec 11, 2011 2:39 pm
by PantherX
Here is an update:
UPDATE 4:30am Pacific Time: Chilled water came back on line at 11am, but several of our servers are still down. Our sysadmins will work to get them back up, but it may not be until Monday, depending on their availability on Sunday.

Re: Stanford Network Issue

Posted: Sun Dec 11, 2011 2:58 pm
by Ordinant
Sunday 11Dec11 is turning out to be a good day to do some long-postponed maintenance on several of my folding machines.

Re: Stanford Network Issue

Posted: Sun Dec 11, 2011 6:12 pm
by mattozan
I do hope work done in the last 12 hours was received and will count. My rig did continue receiving, crunching and submitting WUs all night. I assume that means that enough infrastructure was still running to maintain client communication and queue completed work.

Re: Stanford Network Issue

Posted: Sun Dec 11, 2011 6:52 pm
by Joe_H
Well, from what serverstats is showing, it looks like some of the machines are back up. As for whether work was received, that depends on which work server it came from. For instance, I have a Project 6026 WU on one machine that finished just after this problem started yesterday. It is waiting to upload, and should do so later today since its work server is back up. It could not go to a collection server because the one for this project is not functioning. Some other projects do have a functioning collection server to fall back on if the work server is not available. For other people it will all depend on their specific workload, but eventually most if not all of the work should get collected and credited.

As for continuing work, once the Project 60nn server went offline, my machines started downloading from a server elsewhere at Stanford, getting Project 8001 WUs and being able to return them. But I will have to wait for the stats servers to come back to see the points awarded. For others, if their configuration was only eligible for WUs from the affected servers, then they just stopped folding for the night when they needed a new WU.
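
The upload behaviour described in this post (try the work server first, fall back to a collection server if the project has a working one, otherwise hold the result until later) could be sketched roughly as below. This is only an illustration of the idea, not the client's actual code; `upload_result`, `try_send`, `make_sender` and the host names are hypothetical.

```python
from typing import Callable, Optional, Set


def upload_result(
    work_server: str,
    collection_server: Optional[str],
    try_send: Callable[[str], bool],
) -> str:
    """Return the host that accepted the result, or 'queued' if none did."""
    # First choice: the work server that issued the WU.
    if try_send(work_server):
        return work_server
    # Fallback: the project's collection server, if it has one that works.
    if collection_server is not None and try_send(collection_server):
        return collection_server
    # Nothing reachable: keep the result locally and retry later.
    return "queued"


def make_sender(reachable: Set[str]) -> Callable[[str], bool]:
    """Build a fake sender that only 'succeeds' for hosts in `reachable`."""
    def try_send(host: str) -> bool:
        return host in reachable
    return try_send


# Project with no working collection server while its work server is down.
print(upload_result("work-server", None, make_sender(set())))
# Project whose collection server is still up.
print(upload_result("work-server", "collection-server",
                    make_sender({"collection-server"})))
```

The first call models the Project 6026 case above, where the finished WU simply waits; the second models projects that do have a functioning collection server.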

Re: Stanford Network Issue

Posted: Sun Dec 11, 2011 9:11 pm
by Napoleon
It seems I haven't received credit for these three classic WUs (server 171.67.108.53):

Napoleon, team _Solo_ (191980)
project:6892 run:262 clone:8 gen:64
project:6892 run:710 clone:7 gen:45
project:6892 run:158 clone:9 gen:44

Code:

*snip*: Date 2011-12-11
07:38:12:Sending unit results: id:05 state:SEND error:OK project:6892 run:262 clone:8 gen:64 core:0x78 unit:0x000000486652edc54e25cc4a185a1f21
07:38:12:Unit 05: Uploading 1.01MiB to 171.67.108.53
07:38:12:Connecting to 171.67.108.53:8080
07:38:22:Unit 05: Upload complete
07:38:22:Server responded WORK_ACK (400)
07:38:22:Final credit estimate, 136.00 points
07:38:22:Cleaning up Unit 05

*snip*: Date 2011-12-11
09:33:27:Sending unit results: id:06 state:SEND error:OK project:6892 run:710 clone:7 gen:45 core:0x78 unit:0x000000366652edc54e25d4d8ce114bb2
09:33:27:Unit 06: Uploading 1.01MiB to 171.67.108.53
09:33:27:Connecting to 171.67.108.53:8080
09:33:36:Unit 06: Upload complete
09:33:36:Server responded WORK_ACK (400)
09:33:36:Final credit estimate, 136.00 points
09:33:36:Cleaning up Unit 06

*snip*: Date 2011-12-11
13:11:20:Sending unit results: id:01 state:SEND error:OK project:6892 run:158 clone:9 gen:44 core:0x78 unit:0x0000002e6652edc54e25ca53ffa1dfc7
13:11:20:Unit 01: Uploading 995.87KiB to 171.67.108.53
13:11:20:Connecting to 171.67.108.53:8080
13:11:29:Unit 01: Upload complete
13:11:29:Server responded WORK_ACK (400)
13:11:29:Final credit estimate, 136.00 points
13:11:29:Cleaning up Unit 01

For comparison, a GPU WU did receive credit (server 171.64.65.105):

Zotac430, team _Solo_ (191980)
project:7622 run:281 clone:0 gen:2

Code:

*snip*: Date 2011-12-11
00:21:11:Sending unit results: id:00 state:SEND error:OK project:7622 run:281 clone:0 gen:2 core:0x15 unit:0x00000002664f2dd14edd583af4a6ea08
00:21:11:Unit 00: Uploading 804.87KiB to 171.64.65.105
00:21:11:Connecting to 171.64.65.105:8080
00:21:22:Unit 00: Upload complete
00:21:22:Server responded WORK_ACK (400)
00:21:22:Final credit estimate, 5187.00 points
00:21:22:Cleaning up Unit 00
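
For anyone who wants to pull the same details out of a longer log before reporting missing credit, here is a minimal Python sketch. It assumes log lines shaped like the excerpts in this post; the regular expressions and the `parse_uploads` name are my own and are not part of the client. Note that WORK_ACK only shows the server acknowledged the upload, which, as this thread shows, is separate from the stats page crediting it.

```python
import re

# Patterns written against the log lines quoted in this post (an assumption,
# not an official log format specification).
SEND_RE = re.compile(
    r"Sending unit results: id:\d+ .*?"
    r"project:(\d+) run:(\d+) clone:(\d+) gen:(\d+)"
)
UPLOAD_RE = re.compile(r"Uploading \S+ to ([\d.]+)")
ACK_RE = re.compile(r"Server responded (\w+) \((\d+)\)")
CREDIT_RE = re.compile(r"Final credit estimate, ([\d.]+) points")


def parse_uploads(log_text):
    """Return one dict per 'Sending unit results' entry found in the log."""
    records = []
    current = None
    for line in log_text.splitlines():
        m = SEND_RE.search(line)
        if m:
            # A new upload starts here; later lines fill in the details.
            current = {
                "prcg": tuple(int(x) for x in m.groups()),
                "server": None,
                "response": None,
                "estimate": None,
            }
            records.append(current)
            continue
        if current is None:
            continue
        m = UPLOAD_RE.search(line)
        if m:
            current["server"] = m.group(1)
            continue
        m = ACK_RE.search(line)
        if m:
            current["response"] = m.group(1)
            continue
        m = CREDIT_RE.search(line)
        if m:
            current["estimate"] = float(m.group(1))
    return records


SAMPLE_LOG = """\
07:38:12:Sending unit results: id:05 state:SEND error:OK project:6892 run:262 clone:8 gen:64 core:0x78 unit:0x000000486652edc54e25cc4a185a1f21
07:38:12:Unit 05: Uploading 1.01MiB to 171.67.108.53
07:38:12:Connecting to 171.67.108.53:8080
07:38:22:Unit 05: Upload complete
07:38:22:Server responded WORK_ACK (400)
07:38:22:Final credit estimate, 136.00 points
"""

for rec in parse_uploads(SAMPLE_LOG):
    print(rec)
```

The project, run, clone and gen values it collects are the same ones listed at the top of this post.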

Re: Stanford Network Issue

Posted: Sun Dec 11, 2011 9:56 pm
by mattozan
I haven't seen any credit since prior to the server trouble. But my GPUs continue to receive, crunch and submit work. Here's the edited log of one of my GPUs.

You can see where it ran into trouble trying to upload a completed WU at 02:30 to 171.67.108.26

But then it uploaded OK with the next completed WU three hrs later. After that things have seemed to go OK.

But my "Date of last work unit" is still stuck on "2011-12-10 16:02:18"

Code:

[23:40:47] + Attempting to send results [December 10 23:40:47 UTC]
[23:40:47] Gpu type=3 species=21.
[23:40:48] + Results successfully sent
[23:40:48] Thank you for your contribution to Folding@Home.
[23:40:48] + Number of Units Completed: 99
[23:40:52] - Preparing to get new work unit...
[23:40:52] Cleaning up work directory
[23:40:52] + Attempting to get work packet
[23:40:52] Passkey found
[23:40:52] Gpu type=3 species=21.
[23:40:52] - Connecting to assignment server
[23:40:52] - Successful: assigned to (171.64.65.64).


[02:30:19] + Attempting to send results [December 11 02:30:19 UTC]
[02:30:19] Gpu type=3 species=21.
[02:30:20] - Couldn't send HTTP request to server
[02:30:20] + Could not connect to Work Server (results)
[02:30:20]     (171.67.108.26:8080)
[02:30:20] + Retrying using alternative port
[02:30:21] - Couldn't send HTTP request to server
[02:30:21] + Could not connect to Work Server (results)
[02:30:21]     (171.67.108.26:80)
[02:30:21]   Could not transmit unit 00 to Collection server; keeping in queue.
[02:30:21] - Preparing to get new work unit...
[02:30:21] Cleaning up work directory
[02:30:21] + Attempting to get work packet
[02:30:21] Passkey found
[02:30:21] Gpu type=3 species=21.
[02:30:21] - Connecting to assignment server
[02:30:21] - Successful: assigned to (171.67.108.32).


[04:32:22] + Attempting to send results [December 11 04:32:22 UTC]
[04:32:22] Gpu type=3 species=21.
[04:32:23] + Results successfully sent
[04:32:23] Thank you for your contribution to Folding@Home.
[04:32:23] + Number of Units Completed: 100
[04:32:27] Project: 6800 (Run 17638, Clone 0, Gen 938)
[04:32:27] - Read packet limit of 540015616... Set to 524286976.


[04:32:27] + Attempting to send results [December 11 04:32:27 UTC]
[04:32:27] Gpu type=3 species=21.
[04:32:28] + Results successfully sent
[04:32:28] Thank you for your contribution to Folding@Home.
[04:32:28] + Number of Units Completed: 101
[04:32:28] - Preparing to get new work unit...
[04:32:28] Cleaning up work directory
[04:32:28] + Attempting to get work packet
[04:32:28] Passkey found
[04:32:28] Gpu type=3 species=21.
[04:32:28] - Connecting to assignment server
[04:32:28] - Successful: assigned to (171.64.65.64).



[07:20:10] + Attempting to send results [December 11 07:20:10 UTC]
[07:20:10] Gpu type=3 species=21.
[07:20:10] + Results successfully sent
[07:20:10] Thank you for your contribution to Folding@Home.
[07:20:10] + Number of Units Completed: 102
[07:20:14] - Preparing to get new work unit...
[07:20:14] Cleaning up work directory
[07:20:14] + Attempting to get work packet
[07:20:14] Passkey found
[07:20:14] Gpu type=3 species=21.
[07:20:14] - Connecting to assignment server
[07:20:14] - Successful: assigned to (171.67.108.54).



[10:07:53] + Attempting to send results [December 11 10:07:53 UTC]
[10:07:53] Gpu type=3 species=21.
[10:07:53] + Results successfully sent
[10:07:53] Thank you for your contribution to Folding@Home.
[10:07:53] + Number of Units Completed: 103
[10:07:57] - Preparing to get new work unit...
[10:07:57] Cleaning up work directory
[10:07:57] + Attempting to get work packet
[10:07:57] Passkey found
[10:07:57] Gpu type=3 species=21.
[10:07:57] - Connecting to assignment server
[10:07:57] - Successful: assigned to (171.67.108.54).



[12:55:24] + Attempting to send results [December 11 12:55:24 UTC]
[12:55:24] Gpu type=3 species=21.
[12:55:25] + Results successfully sent
[12:55:25] Thank you for your contribution to Folding@Home.
[12:55:25] + Number of Units Completed: 104
[12:55:29] - Preparing to get new work unit...
[12:55:29] Cleaning up work directory
[12:55:29] + Attempting to get work packet
[12:55:29] Passkey found
[12:55:29] Gpu type=3 species=21.
[12:55:29] - Connecting to assignment server
[12:55:29] - Successful: assigned to (171.64.65.64).



[15:43:00] + Attempting to send results [December 11 15:43:00 UTC]
[15:43:00] Gpu type=3 species=21.
[15:43:00] + Results successfully sent
[15:43:00] Thank you for your contribution to Folding@Home.
[15:43:00] + Number of Units Completed: 105
[15:43:04] - Preparing to get new work unit...
[15:43:04] Cleaning up work directory
[15:43:04] + Attempting to get work packet
[15:43:04] Passkey found
[15:43:04] Gpu type=3 species=21.
[15:43:04] - Connecting to assignment server
[15:43:04] - Successful: assigned to (171.67.108.54).



[18:30:33] + Attempting to send results [December 11 18:30:33 UTC]
[18:30:33] Gpu type=3 species=21.
[18:30:33] + Results successfully sent
[18:30:33] Thank you for your contribution to Folding@Home.
[18:30:33] + Number of Units Completed: 106
[18:30:38] - Preparing to get new work unit...
[18:30:38] Cleaning up work directory
[18:30:38] + Attempting to get work packet
[18:30:38] Passkey found
[18:30:38] Gpu type=3 species=21.
[18:30:38] - Connecting to assignment server
[18:30:38] - Successful: assigned to (171.64.65.64).



[21:18:56] + Attempting to send results [December 11 21:18:56 UTC]
[21:18:56] Gpu type=3 species=21.
[21:19:05] + Results successfully sent
[21:19:05] Thank you for your contribution to Folding@Home.
[21:19:05] + Number of Units Completed: 107
[21:19:09] - Preparing to get new work unit...
[21:19:09] Cleaning up work directory
[21:19:09] + Attempting to get work packet
[21:19:09] Passkey found
[21:19:09] Gpu type=3 species=21.
[21:19:09] - Connecting to assignment server
[21:19:09] - Successful: assigned to (171.67.108.54).
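
A quick way to scan a long log like this for uploads that did not go through is a small script along the following lines. It is a sketch that assumes the message wording shown in the log above; `summarize_sends` and the status labels are hypothetical names, not anything the client provides.

```python
import re

# Matches the start of each upload attempt and captures its UTC timestamp.
ATTEMPT_RE = re.compile(r"\+ Attempting to send results \[(.+?) UTC\]")


def summarize_sends(log_text):
    """Return (timestamp, status) for each send attempt in the log."""
    attempts = []
    for line in log_text.splitlines():
        m = ATTEMPT_RE.search(line)
        if m:
            # Status stays 'unknown' until a later line settles it.
            attempts.append([m.group(1), "unknown"])
        elif attempts and "Results successfully sent" in line:
            attempts[-1][1] = "sent"
        elif attempts and "keeping in queue" in line:
            attempts[-1][1] = "queued"
    return [tuple(a) for a in attempts]


SAMPLE_LOG = """\
[02:30:19] + Attempting to send results [December 11 02:30:19 UTC]
[02:30:20] - Couldn't send HTTP request to server
[02:30:20] + Could not connect to Work Server (results)
[02:30:21]   Could not transmit unit 00 to Collection server; keeping in queue.
[04:32:22] + Attempting to send results [December 11 04:32:22 UTC]
[04:32:23] + Results successfully sent
"""

for stamp, status in summarize_sends(SAMPLE_LOG):
    print(stamp, status)
```

On the sample lines this reports the 02:30 attempt as queued and the 04:32 attempt as sent, matching what the log above shows.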

Re: Stanford Network Issue

Posted: Mon Dec 12, 2011 5:34 am
by PantherX
Another update:
UPDATE 11:30am Pacific time: Our sysadmins have been in the office getting machines back on line. We're almost there, although it looks like there are a few machines which have issues resulting from the outage.

Re: Stanford Network Issue

Posted: Mon Dec 12, 2011 6:02 am
by joe53
You know, I'm sanguine about personal WUs and PPD not being credited.

As long as I know the work is being done, and is useful.

My client says the work continues to be successfully sent, but Stanford's site says nothing has been received for about the past 36 hours. I hope these 5 or 6 work units have not been irretrievably wasted.

Re: Stanford Network Issue

Posted: Mon Dec 12, 2011 6:14 am
by mattozan
joe53 wrote:You know, I'm sanguine about personal WUs and PPD not being credited.

As long as I know the work is being done, and is useful.

My client says the work continues to be successfully sent, but Stanford's site says nothing has been received for about the past 36 hours. I hope these 5 or 6 work units have not been irretrievably wasted.

Yeah, I'm of the same mind. Points aren't the real thing. But I hope the lack of points updates for me doesn't also mean that the science data is somehow going to /dev/null.

"Date of last work unit 2011-12-10 16:02:18"

Re: Stanford Network Issue

Posted: Mon Dec 12, 2011 6:22 am
by Jesse_V
joe53 wrote:You know, I'm sanguine about personal WUs and PPD not being credited.

As long as I know the work is being done, and is useful.

My client says the work continues to be successfully sent, but Stanford's site says nothing has been received for about the past 36 hours. I hope these 5 or 6 work units have not been irretrievably wasted.

I doubt it. Your WUs are probably doing fine. The servers that update the stats must be having problems. Here's how I'm guessing things are set up: you upload your WUs to server A (I'm calling the whole set of work servers A). Then server B periodically checks server A and updates the stats, which server C retrieves when you check online. You go online and get your stats from server C. Now, I'm not sure if B and C are the same (they could be, since you can't see personal stats during updates), but if this configuration is correct then server B is down, so the info doesn't go all the way through. If B and C are indeed the same, then it could have one of those "issues resulting from the outage".
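
To make that guess concrete, here is a toy Python model of the A, B, C arrangement. It only illustrates the guess above and says nothing about how Stanford's servers are really built; every class and method name is made up.

```python
class WorkServers:
    """'Server A': accepts uploaded results and keeps them."""

    def __init__(self):
        self.received = []

    def upload(self, donor, points):
        self.received.append((donor, points))


class StatsUpdater:
    """'Server B': periodically folds new results into per-donor totals."""

    def __init__(self, source):
        self.source = source
        self.online = True
        self.processed = 0  # how many results have already been counted
        self.totals = {}

    def run_update(self):
        if not self.online:
            return  # B is down: nothing is counted, but nothing is lost
        for donor, points in self.source.received[self.processed:]:
            self.totals[donor] = self.totals.get(donor, 0) + points
        self.processed = len(self.source.received)


class StatsPage:
    """'Server C': what you see when you look up your stats online."""

    def __init__(self, updater):
        self.updater = updater

    def lookup(self, donor):
        return self.updater.totals.get(donor, 0)


a = WorkServers()
b = StatsUpdater(a)
c = StatsPage(b)

b.online = False           # the outage takes B down
a.upload("donor", 136)     # the upload to A still succeeds
b.run_update()
print(c.lookup("donor"))   # stats page still shows the old total

b.online = True            # B comes back and catches up
b.run_update()
print(c.lookup("donor"))
```

In this model the result sits safely on A the whole time; the stats page just lags until B runs again.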

Re: Stanford Network Issue

Posted: Mon Dec 12, 2011 10:45 am
by Dinkydau
Yes, it gives the impression of doing work for nothing.

Re: Stanford Network Issue

Posted: Mon Dec 12, 2011 11:20 am
by War
I have chilled water, send the servers to me
War
Stabellsveg 9a
7021 Trondheim
Norway

I also have a 70/10 Internet connection; it's crap, but it works.

Re: Stanford Network Issue

Posted: Mon Dec 12, 2011 3:30 pm
by kromberg
Any news or status on the server outage? This has been one of the longer ones I have seen.