171.67.108.21

Posted: Thu Jun 07, 2012 1:00 am
by a_fool
Looks like this server has been having trouble for a few hours. CPU load looks high compared to normal (as seen in the server stats log).

Code:

22:45:42:WARNING:WU01:FS02:WorkServer connection failed on port 8080 trying 80
22:46:04:WARNING:WU01:FS02:Exception: Failed to send results to work server: Transfer failed
22:46:05:WARNING:WU01:FS02:WorkServer connection failed on port 8080 trying 80
22:46:06:ERROR:WU01:FS02:Exception: Failed to connect to 171.67.108.26:80: No connection could be made because the target machine actively refused it.
22:48:43:WARNING:WU01:FS02:Exception: Failed to send results to work server: Transfer failed
22:48:44:WARNING:WU01:FS02:WorkServer connection failed on port 8080 trying 80
22:48:46:ERROR:WU01:FS02:Exception: Failed to connect to 171.67.108.26:80: No connection could be made because the target machine actively refused it.
22:49:07:WARNING:WU01:FS02:WorkServer connection failed on port 8080 trying 80
22:49:28:WARNING:WU01:FS02:Exception: Failed to send results to work server: Failed to connect to 171.67.108.21:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
22:49:29:WARNING:WU01:FS02:WorkServer connection failed on port 8080 trying 80
22:49:31:ERROR:WU01:FS02:Exception: Failed to connect to 171.67.108.26:80: No connection could be made because the target machine actively refused it.
22:50:44:WARNING:WU01:FS02:WorkServer connection failed on port 8080 trying 80
22:51:05:WARNING:WU01:FS02:Exception: Failed to send results to work server: Failed to connect to 171.67.108.21:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
22:51:07:WARNING:WU01:FS02:WorkServer connection failed on port 8080 trying 80
22:51:08:ERROR:WU01:FS02:Exception: Failed to connect to 171.67.108.26:80: No connection could be made because the target machine actively refused it.
22:53:21:WARNING:WU01:FS02:WorkServer connection failed on port 8080 trying 80
22:55:58:WARNING:WU01:FS02:Exception: Failed to send results to work server: Transfer failed
22:55:59:WARNING:WU01:FS02:WorkServer connection failed on port 8080 trying 80
22:56:01:ERROR:WU01:FS02:Exception: Failed to connect to 171.67.108.26:80: No connection could be made because the target machine actively refused it.
22:57:15:ERROR:WU03:FS02:Exception: 10002: Received short response, expected 512 bytes, got 0
22:57:16:WARNING:WU01:FS02:WorkServer connection failed on port 8080 trying 80
22:57:16:WARNING:WU03:FS02:WorkServer connection failed on port 8080 trying 80
22:57:17:WARNING:WU01:FS02:Exception: Failed to send results to work server: Failed to connect to 171.67.108.21:80: No connection could be made because the target machine actively refused it.
22:57:18:ERROR:WU03:FS02:Exception: Failed to connect to 171.67.108.21:80: No connection could be made because the target machine actively refused it.
22:57:19:WARNING:WU01:FS02:WorkServer connection failed on port 8080 trying 80
22:57:20:ERROR:WU01:FS02:Exception: Failed to connect to 171.67.108.26:80: No connection could be made because the target machine actively refused it.
22:58:17:WARNING:WU03:FS02:WorkServer connection failed on port 8080 trying 80
22:58:18:ERROR:WU03:FS02:Exception: Failed to connect to 171.67.108.21:80: No connection could be made because the target machine actively refused it.
22:59:54:WARNING:WU03:FS02:WorkServer connection failed on port 8080 trying 80
22:59:55:ERROR:WU03:FS02:Exception: Failed to connect to 171.67.108.21:80: No connection could be made because the target machine actively refused it.
23:04:07:WARNING:WU01:FS02:WorkServer connection failed on port 8080 trying 80
23:04:09:WARNING:WU01:FS02:Exception: Failed to send results to work server: Failed to connect to 171.67.108.21:80: No connection could be made because the target machine actively refused it.
23:04:10:WARNING:WU01:FS02:WorkServer connection failed on port 8080 trying 80
23:04:11:ERROR:WU01:FS02:Exception: Failed to connect to 171.67.108.26:80: No connection could be made because the target machine actively refused it.
23:15:13:WARNING:WU01:FS02:WorkServer connection failed on port 8080 trying 80
23:15:14:WARNING:WU01:FS02:Exception: Failed to send results to work server: Failed to connect to 171.67.108.21:80: No connection could be made because the target machine actively refused it.
23:15:15:WARNING:WU01:FS02:WorkServer connection failed on port 8080 trying 80
23:15:17:ERROR:WU01:FS02:Exception: Failed to connect to 171.67.108.26:80: No connection could be made because the target machine actively refused it.
23:33:10:WARNING:WU01:FS02:WorkServer connection failed on port 8080 trying 80
23:33:11:WARNING:WU01:FS02:Exception: Failed to send results to work server: Failed to connect to 171.67.108.21:80: No connection could be made because the target machine actively refused it.
23:33:13:WARNING:WU01:FS02:WorkServer connection failed on port 8080 trying 80
23:33:14:ERROR:WU01:FS02:Exception: Failed to connect to 171.67.108.26:80: No connection could be made because the target machine actively refused it.
00:02:12:WARNING:WU01:FS02:WorkServer connection failed on port 8080 trying 80
00:02:13:WARNING:WU01:FS02:Exception: Failed to send results to work server: Failed to connect to 171.67.108.21:80: No connection could be made because the target machine actively refused it.
00:02:14:WARNING:WU01:FS02:WorkServer connection failed on port 8080 trying 80
00:02:16:ERROR:WU01:FS02:Exception: Failed to connect to 171.67.108.26:80: No connection could be made because the target machine actively refused it.
00:49:10:WARNING:WU01:FS02:WorkServer connection failed on port 8080 trying 80
00:49:12:WARNING:WU01:FS02:Exception: Failed to send results to work server: Failed to connect to 171.67.108.21:80: No connection could be made because the target machine actively refused it.
00:49:13:WARNING:WU01:FS02:WorkServer connection failed on port 8080 trying 80
00:49:15:ERROR:WU01:FS02:Exception: Failed to connect to 171.67.108.26:80: No connection could be made because the target machine actively refused it.
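
For anyone wanting to summarize a log like the one above, here is a minimal Python sketch that counts failed connection attempts per server and port. It relies only on the "Failed to connect to <ip>:<port>" wording visible in these log lines; the function name `count_failed_connects` and the `sample` list are made up for illustration and are not part of the client.

```python
import re
from collections import Counter

# Matches the "Failed to connect to <ip>:<port>" fragment seen in the log above.
FAILED_CONNECT = re.compile(r"Failed to connect to (\d{1,3}(?:\.\d{1,3}){3}):(\d+)")


def count_failed_connects(lines):
    """Return a Counter mapping (ip, port) to the number of failed connects."""
    counts = Counter()
    for line in lines:
        match = FAILED_CONNECT.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


# A few lines copied from the log above (hypothetical variable name).
sample = [
    "22:45:42:WARNING:WU01:FS02:WorkServer connection failed on port 8080 trying 80",
    "22:46:06:ERROR:WU01:FS02:Exception: Failed to connect to 171.67.108.26:80: No connection could be made because the target machine actively refused it.",
    "22:49:28:WARNING:WU01:FS02:Exception: Failed to send results to work server: Failed to connect to 171.67.108.21:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.",
    "22:57:18:ERROR:WU03:FS02:Exception: Failed to connect to 171.67.108.21:80: No connection could be made because the target machine actively refused it.",
]

for (ip, port), n in sorted(count_failed_connects(sample).items()):
    print(f"{ip}:{port} failed {n} time(s)")
```

Run over a full log file, this makes it easy to see at a glance whether failures are concentrated on one server.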

Re: 171.67.108.21

Posted: Thu Jun 07, 2012 1:34 am
by ThunderRd
I can confirm the same here. Two of my GPU clients are in the same boat. The log is here; you can see that this one failed to autosend first to .108.21 and then to the problematic .108.26. After that, the running WU (previously at 99%) finished and uploaded successfully to .108.11. The client then tried again to upload the already finished unit in the queue to .108.21 and .108.26, with no success. This seems to indicate that the problem lies with the server rather than the client.

I think the big question is, why is 108.21, a work server, still pointing at 108.26, a known 'down' collection server, on failure? We have known that 108.26 isn't working for quite a while now. There are several non-working CSs, and it seems a bit bizarre to still have WSs redirecting to them at this point.

Code:

[01:19:10] Completed 98%
[01:20:29] Completed 99%
[01:21:13] - Autosending finished units... [June 7 01:21:13 UTC]
[01:21:13] Trying to send all finished work units
[01:21:13] Project: 10504 (Run 272, Clone 4, Gen 43)
[01:21:13] - Read packet limit of 540015616... Set to 524286976.


[01:21:13] + Attempting to send results [June 7 01:21:13 UTC]
[01:21:13] - Reading file work/wuresults_01.dat from core
[01:21:13]   (Read 130608 bytes from disk)
[01:21:13] Gpu type=2 species=11.
[01:21:13] Connecting to http://171.67.108.21:8080/
[01:21:15] - Couldn't send HTTP request to server
[01:21:15] + Could not connect to Work Server (results)
[01:21:15]     (171.67.108.21:8080)
[01:21:15] + Retrying using alternative port
[01:21:15] Connecting to http://171.67.108.21:80/
[01:21:17] - Couldn't send HTTP request to server
[01:21:17] + Could not connect to Work Server (results)
[01:21:17]     (171.67.108.21:80)
[01:21:17] - Error: Could not transmit unit 01 (completed June 6) to work
.
[01:21:17] - 4 failed uploads of this unit.
[01:21:17] - Read packet limit of 540015616... Set to 524286976.


[01:21:17] + Attempting to send results [June 7 01:21:17 UTC]
[01:21:17] - Reading file work/wuresults_01.dat from core
[01:21:17]   (Read 130608 bytes from disk)
[01:21:17] Gpu type=2 species=11.
[01:21:17] Connecting to http://171.67.108.26:8080/
[01:21:18] - Couldn't send HTTP request to server
[01:21:18] + Could not connect to Work Server (results)
[01:21:18]     (171.67.108.26:8080)
[01:21:18] + Retrying using alternative port
[01:21:18] Connecting to http://171.67.108.26:80/
[01:21:20] - Couldn't send HTTP request to server
[01:21:20] + Could not connect to Work Server (results)
[01:21:20]     (171.67.108.26:80)
[01:21:20]   Could not transmit unit 01 to Collection server; keeping in q
[01:21:20] + Sent 0 of 1 completed units to the server
[01:21:20] - Autosend completed
[01:21:50] Completed 100%
[01:21:50] Successful run
[01:21:50] DynamicWrapper: Finished Work Unit: sleep=10000
[01:22:00] Reserved 75892 bytes for xtc file; Cosm status=0
[01:22:00] Allocated 75892 bytes for xtc file
[01:22:00] - Reading up to 75892 from "work/wudata_02.xtc": Read 75892
[01:22:00] Read 75892 bytes from xtc file; available packet space=78635457
[01:22:00] xtc file hash check passed.
[01:22:00] Reserved 15168 15168 786354572 bytes for arc file=<work/wudata_
> Cosm status=0
[01:22:00] Allocated 15168 bytes for arc file
[01:22:00] - Reading up to 15168 from "work/wudata_02.trr": Read 15168
[01:22:00] Read 15168 bytes from arc file; available packet space=78633940
[01:22:00] trr file hash check passed.
[01:22:00] Allocated 560 bytes for edr file
[01:22:00] Read bedfile
[01:22:00] edr file hash check passed.
[01:22:00] Allocated 33303 bytes for logfile
[01:22:00] Read logfile
[01:22:00] GuardedRun: success in DynamicWrapper
[01:22:00] GuardedRun: done
[01:22:00] Run: GuardedRun completed.
[01:22:01] + Opened results file
[01:22:01] - Writing 125435 bytes of core data to disk...
[01:22:01] Done: 124923 -> 99387 (compressed to 79.5 percent)
[01:22:01]   ... Done.
[01:22:01] DeleteFrameFiles: successfully deleted file=work/wudata_02.ckp
[01:22:01] Shutting down core
[01:22:01]
[01:22:01] Folding@home Core Shutdown: FINISHED_UNIT
[01:22:05] CoreStatus = 64 (100)
[01:22:05] Unit 2 finished with 97 percent of time to deadline remaining.
[01:22:05] Updated performance fraction: 0.980147
[01:22:05] Sending work to server
[01:22:05] Project: 5766 (Run 0, Clone 292, Gen 2180)
[01:22:05] - Read packet limit of 540015616... Set to 524286976.


[01:22:05] + Attempting to send results [June 7 01:22:05 UTC]
[01:22:05] - Reading file work/wuresults_02.dat from core
[01:22:05]   (Read 99899 bytes from disk)
[01:22:05] Gpu type=2 species=11.
[01:22:05] Connecting to http://171.67.108.11:8080/
[01:22:12] Posted data.
[01:22:13] Initial: 0000; - Uploaded at ~12 kB/s
[01:22:13] - Averaged speed for that direction ~23 kB/s
[01:22:13] + Results successfully sent
[01:22:13] Thank you for your contribution to Folding@Home.
[01:22:13] + Number of Units Completed: 4810

[01:22:17] Trying to send all finished work units
[01:22:17] Project: 10504 (Run 272, Clone 4, Gen 43)
[01:22:17] - Read packet limit of 540015616... Set to 524286976.


[01:22:17] + Attempting to send results [June 7 01:22:17 UTC]
[01:22:17] - Reading file work/wuresults_01.dat from core
[01:22:17]   (Read 130608 bytes from disk)
[01:22:17] Gpu type=2 species=11.
[01:22:17] Connecting to http://171.67.108.21:8080/
[01:22:18] - Couldn't send HTTP request to server
[01:22:18] + Could not connect to Work Server (results)
[01:22:18]     (171.67.108.21:8080)
[01:22:18] + Retrying using alternative port
[01:22:18] Connecting to http://171.67.108.21:80/
[01:22:20] - Couldn't send HTTP request to server
[01:22:20] + Could not connect to Work Server (results)
[01:22:20]     (171.67.108.21:80)
[01:22:20] - Error: Could not transmit unit 01 (completed June 6) to work
.
[01:22:20] - 5 failed uploads of this unit.
[01:22:20] - Read packet limit of 540015616... Set to 524286976.


[01:22:20] + Attempting to send results [June 7 01:22:20 UTC]
[01:22:20] - Reading file work/wuresults_01.dat from core
[01:22:20]   (Read 130608 bytes from disk)
[01:22:20] Gpu type=2 species=11.
[01:22:20] Connecting to http://171.67.108.26:8080/
[01:22:22] - Couldn't send HTTP request to server
[01:22:22] + Could not connect to Work Server (results)
[01:22:22]     (171.67.108.26:8080)
[01:22:22] + Retrying using alternative port
[01:22:22] Connecting to http://171.67.108.26:80/
[01:22:23] - Couldn't send HTTP request to server
[01:22:23] + Could not connect to Work Server (results)
[01:22:23]     (171.67.108.26:80)
[01:22:23]   Could not transmit unit 01 to Collection server; keeping in q
[01:22:23] + Sent 0 of 1 completed units to the server
[01:22:23] - Preparing to get new work unit...
[01:22:23] Cleaning up work directory
[01:22:23] + Attempting to get work packet
[01:22:23] Passkey found
[01:22:23] - Will indicate memory of 1023 MB
[01:22:23] Gpu type=2 species=11.
[01:22:23] - Connecting to assignment server
[01:22:23] Connecting to http://assign-GPU.stanford.edu:8080/
[01:22:27] Posted data.
[01:22:27] Initial: 43AB; - Successful: assigned to (171.67.108.11).
[01:22:27] + News From Folding@Home: Welcome to Folding@Home
[01:22:27] Loaded queue successfully.
[01:22:27] Gpu type=2 species=11.
[01:22:27] Sent data
[01:22:27] Connecting to http://171.67.108.11:8080/
[01:22:28] Posted data.
[01:22:28] Initial: 0000; - Receiving payload (expected size: 45918)
[01:22:29] - Downloaded at ~44 kB/s
[01:22:29] - Averaged speed for that direction ~55 kB/s
[01:22:29] + Received work.
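
The order of attempts visible in the log above (work server on port 8080, then port 80, then the collection server on the same two ports) can be sketched in Python as follows. This is only an illustration of the sequence seen in the log, not the client's actual code; `upload_targets`, `first_reachable` and `tcp_can_connect` are hypothetical names, and a plain TCP connect is assumed as the reachability check.

```python
import socket
from typing import Callable, Iterable, List, Optional, Tuple

Endpoint = Tuple[str, int]


def upload_targets(work_server: str,
                   collection_server: Optional[str],
                   ports: Iterable[int] = (8080, 80)) -> List[Endpoint]:
    """List endpoints in the order the log shows them being tried."""
    ports = tuple(ports)
    targets = [(work_server, port) for port in ports]
    if collection_server:
        targets += [(collection_server, port) for port in ports]
    return targets


def tcp_can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Assumed check: True if a TCP connection can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_reachable(targets: Iterable[Endpoint],
                    can_connect: Callable[[str, int], bool]) -> Optional[Endpoint]:
    """Return the first endpoint that accepts a connection, or None."""
    for host, port in targets:
        if can_connect(host, port):
            return (host, port)
    return None


targets = upload_targets("171.67.108.21", "171.67.108.26")
print(targets)
```

Passing `tcp_can_connect` as the second argument to `first_reachable` would probe the real servers; when all four endpoints refuse, the result is `None`, which corresponds to the "keeping in q" case in the log.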

Re: 171.67.108.21

Posted: Thu Jun 07, 2012 4:10 am
by sticks435
I'll add to the list. Got a WU that has been trying to send for about 5 hours now.

Code:

23:17:23:WU00:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
23:17:23:WU00:FS00:Sending unit results: id:00 state:SEND error:OK project:10502 run:121 clone:0 gen:237 core:0x11 unit:0x0000026b6652eda54b6f3a18000036de
23:17:23:WU00:FS00:Uploading 127.48KiB to 171.67.108.21
23:17:23:WU00:FS00:Connecting to 171.67.108.21:8080
23:17:24:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
23:17:24:WU00:FS00:Connecting to 171.67.108.21:80
23:17:26:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.67.108.21:80: No connection could be made because the target machine actively refused it.
23:17:26:WU00:FS00:Trying to send results to collection server
23:17:26:WU00:FS00:Uploading 127.48KiB to 171.67.108.26
23:17:26:WU00:FS00:Connecting to 171.67.108.26:8080
23:17:27:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
23:17:27:WU00:FS00:Connecting to 171.67.108.26:80
23:17:28:ERROR:WU00:FS00:Exception: Failed to connect to 171.67.108.26:80: No connection could be made because the target machine actively refused it.
23:17:29:WU00:FS00:Sending unit results: id:00 state:SEND error:OK project:10502 run:121 clone:0 gen:237 core:0x11 unit:0x0000026b6652eda54b6f3a18000036de
23:17:29:WU00:FS00:Uploading 127.48KiB to 171.67.108.21
23:17:29:WU00:FS00:Connecting to 171.67.108.21:8080
23:17:30:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
23:17:30:WU00:FS00:Connecting to 171.67.108.21:80
23:17:31:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.67.108.21:80: No connection could be made because the target machine actively refused it.
23:17:31:WU00:FS00:Trying to send results to collection server
23:17:31:WU00:FS00:Uploading 127.48KiB to 171.67.108.26
23:17:31:WU00:FS00:Connecting to 171.67.108.26:8080
23:17:33:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
23:17:33:WU00:FS00:Connecting to 171.67.108.26:80
23:17:34:ERROR:WU00:FS00:Exception: Failed to connect to 171.67.108.26:80: No connection could be made because the target machine actively refused it.
23:18:29:WU00:FS00:Sending unit results: id:00 state:SEND error:OK project:10502 run:121 clone:0 gen:237 core:0x11 unit:0x0000026b6652eda54b6f3a18000036de
23:18:29:WU00:FS00:Uploading 127.48KiB to 171.67.108.21
23:18:29:WU00:FS00:Connecting to 171.67.108.21:8080
23:18:30:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
23:18:30:WU00:FS00:Connecting to 171.67.108.21:80
23:18:32:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.67.108.21:80: No connection could be made because the target machine actively refused it.
23:18:32:WU00:FS00:Trying to send results to collection server
23:18:32:WU00:FS00:Uploading 127.48KiB to 171.67.108.26
23:18:32:WU00:FS00:Connecting to 171.67.108.26:8080
23:18:33:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
23:18:33:WU00:FS00:Connecting to 171.67.108.26:80
23:18:34:ERROR:WU00:FS00:Exception: Failed to connect to 171.67.108.26:80: No connection could be made because the target machine actively refused it.
23:20:06:WU00:FS00:Sending unit results: id:00 state:SEND error:OK project:10502 run:121 clone:0 gen:237 core:0x11 unit:0x0000026b6652eda54b6f3a18000036de
23:20:06:WU00:FS00:Uploading 127.48KiB to 171.67.108.21
23:20:06:WU00:FS00:Connecting to 171.67.108.21:8080
23:20:07:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
23:20:07:WU00:FS00:Connecting to 171.67.108.21:80
23:20:09:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.67.108.21:80: No connection could be made because the target machine actively refused it.
23:20:09:WU00:FS00:Trying to send results to collection server
23:20:09:WU00:FS00:Uploading 127.48KiB to 171.67.108.26
23:20:09:WU00:FS00:Connecting to 171.67.108.26:8080
23:20:10:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
23:20:10:WU00:FS00:Connecting to 171.67.108.26:80
23:20:12:ERROR:WU00:FS00:Exception: Failed to connect to 171.67.108.26:80: No connection could be made because the target machine actively refused it.
23:22:43:WU00:FS00:Sending unit results: id:00 state:SEND error:OK project:10502 run:121 clone:0 gen:237 core:0x11 unit:0x0000026b6652eda54b6f3a18000036de
23:22:43:WU00:FS00:Uploading 127.48KiB to 171.67.108.21
23:22:43:WU00:FS00:Connecting to 171.67.108.21:8080
23:22:45:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
23:22:45:WU00:FS00:Connecting to 171.67.108.21:80
23:22:46:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.67.108.21:80: No connection could be made because the target machine actively refused it.
23:22:46:WU00:FS00:Trying to send results to collection server
23:22:46:WU00:FS00:Uploading 127.48KiB to 171.67.108.26
23:22:46:WU00:FS00:Connecting to 171.67.108.26:8080
23:22:47:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
23:22:47:WU00:FS00:Connecting to 171.67.108.26:80
23:22:49:ERROR:WU00:FS00:Exception: Failed to connect to 171.67.108.26:80: No connection could be made because the target machine actively refused it.
23:26:58:WU00:FS00:Sending unit results: id:00 state:SEND error:OK project:10502 run:121 clone:0 gen:237 core:0x11 unit:0x0000026b6652eda54b6f3a18000036de
23:26:58:WU00:FS00:Uploading 127.48KiB to 171.67.108.21
23:26:58:WU00:FS00:Connecting to 171.67.108.21:8080
23:26:59:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
23:26:59:WU00:FS00:Connecting to 171.67.108.21:80
23:27:00:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.67.108.21:80: No connection could be made because the target machine actively refused it.
23:27:00:WU00:FS00:Trying to send results to collection server
23:27:00:WU00:FS00:Uploading 127.48KiB to 171.67.108.26
23:27:00:WU00:FS00:Connecting to 171.67.108.26:8080
23:27:02:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
23:27:02:WU00:FS00:Connecting to 171.67.108.26:80
23:27:03:ERROR:WU00:FS00:Exception: Failed to connect to 171.67.108.26:80: No connection could be made because the target machine actively refused it.
23:33:49:WU00:FS00:Sending unit results: id:00 state:SEND error:OK project:10502 run:121 clone:0 gen:237 core:0x11 unit:0x0000026b6652eda54b6f3a18000036de
23:33:49:WU00:FS00:Uploading 127.48KiB to 171.67.108.21
23:33:49:WU00:FS00:Connecting to 171.67.108.21:8080
23:33:50:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
23:33:50:WU00:FS00:Connecting to 171.67.108.21:80
23:33:52:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.67.108.21:80: No connection could be made because the target machine actively refused it.
23:33:52:WU00:FS00:Trying to send results to collection server
23:33:52:WU00:FS00:Uploading 127.48KiB to 171.67.108.26
23:33:52:WU00:FS00:Connecting to 171.67.108.26:8080
23:33:53:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
23:33:53:WU00:FS00:Connecting to 171.67.108.26:80
23:33:55:ERROR:WU00:FS00:Exception: Failed to connect to 171.67.108.26:80: No connection could be made because the target machine actively refused it.
23:44:54:WU00:FS00:Sending unit results: id:00 state:SEND error:OK project:10502 run:121 clone:0 gen:237 core:0x11 unit:0x0000026b6652eda54b6f3a18000036de
23:44:54:WU00:FS00:Uploading 127.48KiB to 171.67.108.21
23:44:54:WU00:FS00:Connecting to 171.67.108.21:8080
23:44:56:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
23:44:56:WU00:FS00:Connecting to 171.67.108.21:80
23:44:57:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.67.108.21:80: No connection could be made because the target machine actively refused it.
23:44:57:WU00:FS00:Trying to send results to collection server
23:44:57:WU00:FS00:Uploading 127.48KiB to 171.67.108.26
23:44:57:WU00:FS00:Connecting to 171.67.108.26:8080
23:44:58:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
23:44:58:WU00:FS00:Connecting to 171.67.108.26:80
23:45:00:ERROR:WU00:FS00:Exception: Failed to connect to 171.67.108.26:80: No connection could be made because the target machine actively refused it.
00:02:51:WU00:FS00:Sending unit results: id:00 state:SEND error:OK project:10502 run:121 clone:0 gen:237 core:0x11 unit:0x0000026b6652eda54b6f3a18000036de
00:02:51:WU00:FS00:Uploading 127.48KiB to 171.67.108.21
00:02:51:WU00:FS00:Connecting to 171.67.108.21:8080
00:02:53:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
00:02:53:WU00:FS00:Connecting to 171.67.108.21:80
00:02:54:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.67.108.21:80: No connection could be made because the target machine actively refused it.
00:02:54:WU00:FS00:Trying to send results to collection server
00:02:54:WU00:FS00:Uploading 127.48KiB to 171.67.108.26
00:02:54:WU00:FS00:Connecting to 171.67.108.26:8080
00:02:55:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
00:02:55:WU00:FS00:Connecting to 171.67.108.26:80
00:02:57:ERROR:WU00:FS00:Exception: Failed to connect to 171.67.108.26:80: No connection could be made because the target machine actively refused it.
00:31:53:WU00:FS00:Sending unit results: id:00 state:SEND error:OK project:10502 run:121 clone:0 gen:237 core:0x11 unit:0x0000026b6652eda54b6f3a18000036de
00:31:53:WU00:FS00:Uploading 127.48KiB to 171.67.108.21
00:31:53:WU00:FS00:Connecting to 171.67.108.21:8080
00:31:55:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
00:31:55:WU00:FS00:Connecting to 171.67.108.21:80
00:31:56:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.67.108.21:80: No connection could be made because the target machine actively refused it.
00:31:56:WU00:FS00:Trying to send results to collection server
00:31:56:WU00:FS00:Uploading 127.48KiB to 171.67.108.26
00:31:56:WU00:FS00:Connecting to 171.67.108.26:8080
00:31:58:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
00:31:58:WU00:FS00:Connecting to 171.67.108.26:80
00:31:59:ERROR:WU00:FS00:Exception: Failed to connect to 171.67.108.26:80: No connection could be made because the target machine actively refused it.
01:18:52:WU00:FS00:Sending unit results: id:00 state:SEND error:OK project:10502 run:121 clone:0 gen:237 core:0x11 unit:0x0000026b6652eda54b6f3a18000036de
01:18:52:WU00:FS00:Uploading 127.48KiB to 171.67.108.21
01:18:52:WU00:FS00:Connecting to 171.67.108.21:8080
01:18:54:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
01:18:54:WU00:FS00:Connecting to 171.67.108.21:80
01:18:55:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.67.108.21:80: No connection could be made because the target machine actively refused it.
01:18:55:WU00:FS00:Trying to send results to collection server
01:18:55:WU00:FS00:Uploading 127.48KiB to 171.67.108.26
01:18:55:WU00:FS00:Connecting to 171.67.108.26:8080
01:18:56:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
01:18:56:WU00:FS00:Connecting to 171.67.108.26:80
01:18:58:ERROR:WU00:FS00:Exception: Failed to connect to 171.67.108.26:80: No connection could be made because the target machine actively refused it.
******************************** Date: 07/06/12 ********************************
02:34:53:WU00:FS00:Sending unit results: id:00 state:SEND error:OK project:10502 run:121 clone:0 gen:237 core:0x11 unit:0x0000026b6652eda54b6f3a18000036de
02:34:53:WU00:FS00:Uploading 127.48KiB to 171.67.108.21
02:34:53:WU00:FS00:Connecting to 171.67.108.21:8080
02:34:55:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
02:34:55:WU00:FS00:Connecting to 171.67.108.21:80
02:34:56:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.67.108.21:80: No connection could be made because the target machine actively refused it.
02:34:56:WU00:FS00:Trying to send results to collection server
02:34:56:WU00:FS00:Uploading 127.48KiB to 171.67.108.26
02:34:56:WU00:FS00:Connecting to 171.67.108.26:8080
02:34:57:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
02:34:57:WU00:FS00:Connecting to 171.67.108.26:80
02:34:59:ERROR:WU00:FS00:Exception: Failed to connect to 171.67.108.26:80: No connection could be made because the target machine actively refused it.
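
The retries in the log above are not evenly spaced. As a rough sketch, the Python below takes the timestamps of the "Sending unit results" lines from this log and computes the gap between consecutive attempts, allowing for the rollover past midnight. The names `to_seconds`, `retry_intervals` and `SEND_TIMES` are made up for illustration, and the result is only an observation about this one log, not a statement about how the client schedules retries internally.

```python
def to_seconds(hms):
    """Convert an HH:MM:SS string to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def retry_intervals(stamps):
    """Return gaps in seconds between consecutive timestamps.

    A timestamp smaller than the previous one is treated as a new day.
    """
    intervals = []
    day_offset = 0
    previous = None
    for stamp in stamps:
        current = to_seconds(stamp) + day_offset
        if previous is not None and current < previous:
            day_offset += 86400
            current += 86400
        if previous is not None:
            intervals.append(current - previous)
        previous = current
    return intervals


# Timestamps of the "Sending unit results" lines in the log above.
SEND_TIMES = [
    "23:17:23", "23:17:29", "23:18:29", "23:20:06", "23:22:43", "23:26:58",
    "23:33:49", "23:44:54", "00:02:51", "00:31:53", "01:18:52", "02:34:53",
]

intervals = retry_intervals(SEND_TIMES)
ratios = [later / earlier for earlier, later in zip(intervals[1:], intervals[2:])]
print(intervals)
print([round(r, 2) for r in ratios])
```

For this log the gaps come out as 6, 60, 97, 157, 255, 411, 665, 1077, 1742, 2819 and 4561 seconds, so after the first quick retry each wait is roughly 1.6 times the previous one. A stuck WU is therefore retried less and less often the longer the server stays down.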

Re: 171.67.108.21

Posted: Thu Jun 07, 2012 6:13 am
by bruce
ThunderRd wrote: I can confirm the same here. Two of my GPU clients are in the same boat. The log is here; you can see that this one failed to autosend first to .108.21 and then to the problematic .108.26. After that, the running WU (previously at 99%) finished and uploaded successfully to .108.11. The client then tried again to upload the already finished unit in the queue to .108.21 and .108.26, with no success. This seems to indicate that the problem lies with the server rather than the client.

I think the big question is, why is 108.21, a work server, still pointing at 108.26, a known 'down' collection server, on failure? We have known that 108.26 isn't working for quite a while now. There are several non-working CSs, and it seems a bit bizarre to still have WSs redirecting to them at this point.
Did you check serverstat? That's what it says to do in the "Do this first" topic at the top of this forum. If you know that a known "down" collection server is being referenced, you should also know that serverstat.html says that 171.67.108.21 is rejecting connections. Until the server administrator fixes that server, you won't be able to upload to it.

Why are the WUs configured to point to the collection server 171.67.108.26? That's an entirely different question. The work server was configured to distribute specific projects, and those same projects are still working toward completion. Each trajectory has completed over 2000 sequential Gens but the protein hasn't finished folding yet. FAH doesn't stop a project until it's finished. Moreover, they don't shut down a server to upgrade the OS and the FAH software, for the same reason; it's hard at work and with very rare exceptions it doesn't need to be rebooted. The collection server was configured to be 171.67.108.26 when the server was originally outfitted with those projects. (It runs for many months without ever referencing a Collection Server, though it happens to be down now.)

The GPU projects do not receive early-return bonuses, so having a WU sit on your client for a while shouldn't really be cause for concern.

Re: 171.67.108.21

Posted: Thu Jun 07, 2012 1:38 pm
by ThunderRd
Did you check serverstat?

Yes, of course I did. That's how I knew it wasn't my machine. I made that doubly clear, by referencing my log, to avoid people posting to "do this, do that". [You know this happens :)]
Why are the WUs configured to point to the collection server 171.67.108.26? That's an entirely different question.

Exactly.
The collection server was configured to be 171.67.108.26 when the server was originally outfitted with those projects. (It runs for many months without ever referencing a Collection Server, though it happens to be down now.)
This was helpful and clears things up a bit for me regarding how long and how often it may happen.
The GPU projects do not receive early-return bonuses, so having a WU sit on your client for a while shouldn't really be cause for concern.
I wasn't :) I have seen far bigger problems than this over the years I have been folding. :) And a few points one way or the other won't ruffle my feathers. :)

Bruce, after re-reading my post I see that it may have been interpreted as being a bit flippant, while it wasn't really intended to be so. I may be wrong, but I sense that you feel it was: be assured, I have the utmost respect for both you and the project. Actually, you are one of the top forum mods I have had the pleasure of dealing with, ever. And I don't think I am alone in that opinion.

Re: 171.67.108.21

Posted: Fri Jun 08, 2012 4:46 pm
by void4ever
I'll throw my hat into the ring, now that it's been over 24 hours. I am also unable to send my WUs to this server. Its CPU on the stats page has been sitting at 14.08 since yesterday, and it's still in reject mode. Is there any word on this server?

Void4ever

Re: 171.67.108.21

Posted: Fri Jun 08, 2012 7:49 pm
by mflanaga
void4ever wrote: I'll throw my hat into the ring, now that it's been over 24 hours. I am also unable to send my WUs to this server. Its CPU on the stats page has been sitting at 14.08 since yesterday, and it's still in reject mode. Is there any word on this server?

Void4ever
Same question here. I have stopped GPU folding until the issue is resolved. I can download WUs all day, but when they complete they won't upload.

Re: 171.67.108.21

Posted: Sat Jun 09, 2012 3:12 am
by ThunderRd
Yes, the server remains in reject mode, but there isn't really a reason to stop folding. Like Bruce said, the GPU WUs have a fairly long shelf life, so as long as the server gets a kick in the next few days, all of your finished WUs in the queue will upload.

Actually, since my first post around 48 hours ago, I haven't gotten any more WUs attached to .108.21, so all of my units are uploading except for the lone WU that needs that server.

Re: 171.67.108.21

Posted: Sat Jun 09, 2012 8:17 am
by Jesse_V
It actively refused one of my GPU WUs:

Code:

*********************** Log Started 2012-06-09T08:16:04Z ***********************
08:16:04:************************* Folding@home Client *************************
08:16:04:      Website: http://folding.stanford.edu/
08:16:04:    Copyright: (c) 2009-2012 Stanford University
08:16:04:       Author: Joseph Coffland <[email protected]>
08:16:04:         Args: --lifeline 9224 --command-port=36330
08:16:04:       Config: C:/Users/Jesse/AppData/Roaming/FAHClient/config.xml
08:16:04:******************************** Build ********************************
08:16:04:      Version: 7.1.52
08:16:04:         Date: Mar 20 2012
08:16:04:         Time: 19:37:42
08:16:04:      SVN Rev: 3515
08:16:04:       Branch: fah/trunk/client
08:16:04:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
08:16:04:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
08:16:04:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT
08:16:04:     Platform: win32 XP
08:16:04:         Bits: 32
08:16:04:         Mode: Release
08:16:04:******************************* System ********************************
08:16:04:          CPU: Intel(R) Core(TM)2 Quad CPU Q9000 @ 2.00GHz
08:16:04:       CPU ID: GenuineIntel Family 6 Model 23 Stepping 10
08:16:04:         CPUs: 4
08:16:04:       Memory: 4.00GiB
08:16:04:  Free Memory: 1.73GiB
08:16:04:      Threads: WINDOWS_THREADS
08:16:04:   On Battery: false
08:16:04:   UTC offset: -8
08:16:04:          PID: 8588
08:16:04:          CWD: C:/Users/Jesse/AppData/Roaming/FAHClient
08:16:04:           OS: Windows 7 Home Premium
08:16:04:      OS Arch: AMD64
08:16:04:         GPUs: 1
08:16:04:        GPU 0: NVIDIA:1 GT216 [GeForce GT 240M]
08:16:04:         CUDA: 1.2
08:16:04:  CUDA Driver: 2020
08:16:04:Win32 Service: false
08:16:04:***********************************************************************
08:16:04:<config>
08:16:04:  <!-- Folding Slot Configuration -->
08:16:04:  <gpu v='true'/>
08:16:04:
08:16:04:  <!-- Network -->
08:16:04:  <proxy v=':8080'/>
08:16:04:
08:16:04:  <!-- User Information -->
08:16:04:  <passkey v='********************************'/>
08:16:04:  <team v='195965'/>
08:16:04:  <user v='Jesse_Victors'/>
08:16:04:
08:16:04:  <!-- Folding Slots -->
08:16:04:  <slot id='0' type='GPU'>
08:16:04:    <client-type v='beta'/>
08:16:04:  </slot>
08:16:04:  <slot id='1' type='SMP'>
08:16:04:    <client-type v='beta'/>
08:16:04:    <cpus v='-1'/>
08:16:04:  </slot>
08:16:04:</config>
08:16:04:Trying to access database...
08:16:04:Successfully acquired database lock
08:16:04:Enabled folding slot 00: READY gpu:0:"GT216 [GeForce GT 240M]"
08:16:04:Enabled folding slot 01: READY smp:4
08:16:04:WU00:FS00:Sending unit results: id:00 state:SEND error:OK project:10505 run:124 clone:26 gen:5 core:0x11 unit:0x0000001d6652eda54b844ec20000308b
08:16:04:WU00:FS00:Uploading 67.90KiB to 171.67.108.21
08:16:04:WU03:FS01:Starting
08:16:04:WU00:FS00:Connecting to 171.67.108.21:8080
08:16:04:WU03:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Jesse/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/beta/Core_a4.fah/FahCore_a4.exe -dir 03 -suffix 01 -version 701 -lifeline 8588 -checkpoint 15 -np 4
08:16:04:WU03:FS01:Started FahCore on PID 7800
08:16:04:WU03:FS01:Core PID:11112
08:16:05:WU03:FS01:FahCore 0xa4 started
08:16:05:WU02:FS00:Starting
08:16:05:WU02:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Jesse/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/NVIDIA/G80/Core_11.fah/FahCore_11.exe -dir 02 -suffix 01 -version 701 -lifeline 8588 -checkpoint 15 -gpu 0
08:16:05:WU02:FS00:Started FahCore on PID 5520
08:16:05:WU02:FS00:Core PID:10176
08:16:05:WU02:FS00:FahCore 0x11 started
08:16:05:WU03:FS01:0xa4:
08:16:05:WU03:FS01:0xa4:*------------------------------*
08:16:05:WU03:FS01:0xa4:Folding@Home Gromacs GB Core
08:16:05:WU03:FS01:0xa4:Version 2.27 (Dec. 15, 2010)
08:16:05:WU03:FS01:0xa4:
08:16:05:WU03:FS01:0xa4:Preparing to commence simulation
08:16:05:WU03:FS01:0xa4:- Looking at optimizations...
08:16:05:WU03:FS01:0xa4:- Files status OK
08:16:05:WU03:FS01:0xa4:- Expanded 885492 -> 2025756 (decompressed 228.7 percent)
08:16:05:WU03:FS01:0xa4:Called DecompressByteArray: compressed_data_size=885492 data_size=2025756, decompressed_data_size=2025756 diff=0
08:16:05:WU03:FS01:0xa4:- Digital signature verified
08:16:05:WU03:FS01:0xa4:
08:16:05:WU03:FS01:0xa4:Project: 8013 (Run 14, Clone 33, Gen 56)
08:16:05:WU03:FS01:0xa4:
08:16:05:WU03:FS01:0xa4:Assembly optimizations on if available.
08:16:05:WU03:FS01:0xa4:Entering M.D.
08:16:05:WU02:FS00:0x11:
08:16:05:WU02:FS00:0x11:*------------------------------*
08:16:05:WU02:FS00:0x11:Folding@Home GPU Core
08:16:05:WU02:FS00:0x11:Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
08:16:05:WU02:FS00:0x11:
08:16:05:WU02:FS00:0x11:Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
08:16:05:WU02:FS00:0x11:Build host: amoeba
08:16:05:WU02:FS00:0x11:Board Type: Nvidia
08:16:05:WU02:FS00:0x11:Core      : 
08:16:05:WU02:FS00:0x11:Preparing to commence simulation
08:16:05:WU02:FS00:0x11:- Looking at optimizations...
08:16:05:WU02:FS00:0x11:- Files status OK
08:16:05:WU02:FS00:0x11:- Expanded 46732 -> 252912 (decompressed 541.1 percent)
08:16:05:WU02:FS00:0x11:Called DecompressByteArray: compressed_data_size=46732 data_size=252912, decompressed_data_size=252912 diff=0
08:16:05:WU02:FS00:0x11:- Digital signature verified
08:16:05:WU02:FS00:0x11:
08:16:05:WU02:FS00:0x11:Project: 5766 (Run 14, Clone 221, Gen 151)
08:16:05:WU02:FS00:0x11:
08:16:05:WU02:FS00:0x11:Assembly optimizations on if available.
08:16:05:WU02:FS00:0x11:Entering M.D.
08:16:06:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
08:16:06:WU00:FS00:Connecting to 171.67.108.21:80
08:16:06:Server connection id=1 on 0.0.0.0:36330 from 127.0.0.1
08:16:07:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.67.108.21:80: No connection could be made because the target machine actively refused it.
08:16:07:WU00:FS00:Trying to send results to collection server
08:16:07:WU00:FS00:Uploading 67.90KiB to 171.67.108.26
08:16:07:WU00:FS00:Connecting to 171.67.108.26:8080
08:16:08:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
08:16:08:WU00:FS00:Connecting to 171.67.108.26:80
08:16:10:ERROR:WU00:FS00:Exception: Failed to connect to 171.67.108.26:80: No connection could be made because the target machine actively refused it.
08:16:10:WU00:FS00:Sending unit results: id:00 state:SEND error:OK project:10505 run:124 clone:26 gen:5 core:0x11 unit:0x0000001d6652eda54b844ec20000308b
08:16:10:WU00:FS00:Uploading 67.90KiB to 171.67.108.21
08:16:10:WU00:FS00:Connecting to 171.67.108.21:8080
08:16:11:WU03:FS01:0xa4:Using Gromacs checkpoints
08:16:11:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
08:16:11:WU00:FS00:Connecting to 171.67.108.21:80
08:16:11:WU02:FS00:0x11:Will resume from checkpoint file
08:16:11:WU02:FS00:0x11:Tpr hash 02/wudata_01.tpr:  269433776 1377902909 4066144479 2670740556 436594937
08:16:11:WU02:FS00:0x11:
08:16:11:WU02:FS00:0x11:Calling fah_main args: 14 usage=100
08:16:11:WU02:FS00:0x11:
08:16:11:WU03:FS01:0xa4:Mapping NT from 4 to 4 
08:16:11:WU02:FS00:0x11:Working on Protein
08:16:12:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.67.108.21:80: No connection could be made because the target machine actively refused it.
08:16:12:WU00:FS00:Trying to send results to collection server
08:16:12:WU00:FS00:Uploading 67.90KiB to 171.67.108.26
08:16:12:WU00:FS00:Connecting to 171.67.108.26:8080
08:16:12:WU02:FS00:0x11:Client config unavailable.
08:16:13:WU02:FS00:0x11:Resuming from checkpoint
08:16:13:WU02:FS00:0x11:fcCheckPointResume: retreived and current tpr file hash:
08:16:13:WU02:FS00:0x11:   0    269433776    269433776
08:16:13:WU02:FS00:0x11:   1   1377902909   1377902909
08:16:13:WU02:FS00:0x11:   2   4066144479   4066144479
08:16:13:WU02:FS00:0x11:   3   2670740556   2670740556
08:16:13:WU02:FS00:0x11:   4    436594937    436594937
08:16:13:WU02:FS00:0x11:fcCheckPointResume: file hashes same.
08:16:13:WU02:FS00:0x11:fcCheckPointResume: state restored.
08:16:13:WU02:FS00:0x11:Verified 02/wudata_01.log
08:16:13:WU03:FS01:0xa4:Resuming from checkpoint
08:16:13:WU02:FS00:0x11:Verified 02/wudata_01.edr
08:16:13:WU03:FS01:0xa4:Verified 03/wudata_01.log
08:16:13:WU02:FS00:0x11:Starting GUI Server
08:16:13:WU02:FS00:0x11:Verified 02/wudata_01.xtc
08:16:13:WU02:FS00:0x11:Completed 51%
08:16:13:WU03:FS01:0xa4:Verified 03/wudata_01.trr
08:16:13:WU03:FS01:0xa4:Verified 03/wudata_01.xtc
08:16:13:WU03:FS01:0xa4:Verified 03/wudata_01.edr
08:16:13:WU03:FS01:0xa4:Completed 238220 out of 250000 steps  (95%)
08:16:14:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
08:16:14:WU00:FS00:Connecting to 171.67.108.26:80
08:16:15:ERROR:WU00:FS00:Exception: Failed to connect to 171.67.108.26:80: No connection could be made because the target machine actively refused it.
08:17:10:WU00:FS00:Sending unit results: id:00 state:SEND error:OK project:10505 run:124 clone:26 gen:5 core:0x11 unit:0x0000001d6652eda54b844ec20000308b
08:17:10:WU00:FS00:Uploading 67.90KiB to 171.67.108.21
08:17:10:WU00:FS00:Connecting to 171.67.108.21:8080
08:17:11:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
08:17:11:WU00:FS00:Connecting to 171.67.108.21:80
08:17:12:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.67.108.21:80: No connection could be made because the target machine actively refused it.
08:17:12:WU00:FS00:Trying to send results to collection server
08:17:12:WU00:FS00:Uploading 67.90KiB to 171.67.108.26
08:17:12:WU00:FS00:Connecting to 171.67.108.26:8080
08:17:14:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
08:17:14:WU00:FS00:Connecting to 171.67.108.26:80
08:17:15:ERROR:WU00:FS00:Exception: Failed to connect to 171.67.108.26:80: No connection could be made because the target machine actively refused it.

Re: 171.67.108.21

Posted: Sat Jun 09, 2012 1:43 pm
by Joe_H
Jesse_V wrote:It actively refused one of my GPU WUs:
That is the definition of a server being in Reject status.

Code: Select all

06:30:00 PDT 2012	171.67.108.21	vsp07b	vvoelz	GPU	full	Reject	14.08	0	0	3	26815	7531	
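
For anyone who wants to check from their own machine whether a server is refusing connections or simply not answering, here is a minimal Python sketch. It only uses the standard `socket` module. The function names `probe` and `probe_ports` are made up for this example and are not part of the FAH client. It tells "actively refused" apart from a timeout, which are the two failure messages that show up in the logs in this thread; it says nothing about why the server is refusing.

```python
import socket

def probe(host, port, timeout=5.0):
    """Classify one TCP connection attempt.

    Returns 'open' if the connection was accepted, 'refused' if the
    target actively refused it, 'timeout' if nothing answered in time,
    and 'error' for any other network failure.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"

def probe_ports(host, ports=(8080, 80), timeout=5.0):
    """Probe several ports on one host, in order.

    The default ports are the two the client log shows being tried.
    """
    return {port: probe(host, port, timeout) for port in ports}
```

Calling `probe_ports("171.67.108.21")` would return a dictionary with one result per port. A server that is up again should show `open` on at least one of them.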

Re: 171.67.108.21

Posted: Sun Jun 10, 2012 11:46 pm
by SomeStones
Looks like it is working. I had a couple of WUs backed up and they were both sent successfully.

Thanks to whoever fixed it - on Sunday.

Re: 171.67.108.21

Posted: Mon Jun 11, 2012 2:21 am
by Jesse_V
SomeStones wrote:Looks like it is working. I had a couple of WUs backed up and they were both sent successfully.

Thanks to whoever fixed it - on Sunday.
Indeed. It worked for me too. :D

Re: 171.67.108.21

Posted: Mon Jun 11, 2012 3:04 am
by bruce
The boss tends to work every day of the week.

Re: 171.67.108.21

Posted: Mon Jun 11, 2012 4:25 am
by compdewd
The boss is good lol

I have a general server question. Why is a CPU load of something seemingly small like 14% so bad for a server?

Re: 171.67.108.21

Posted: Mon Jun 11, 2012 4:36 am
by Joe_H
The load number is not a percentage; it is the average number of active processes. I assume they are showing the 1-minute number; the usual reports on Linux/Unix servers also include 5- and 15-minute averages. Whether a given load number is high depends on how many processors are available: 14 on a single CPU is very high, while on a 16-CPU server it would be fine.
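
To put numbers on that, here is a small Python sketch that divides a load average by the CPU count. The function names and the threshold of 1.0 per CPU are my own assumptions for illustration (a common rule of thumb, not anything the stats page defines), and I don't know how many CPUs that server actually has.

```python
import os

def load_per_cpu(load, cpus):
    """Return the load average divided by the number of CPUs."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return load / cpus

def describe(load, cpus, busy_threshold=1.0):
    """Label a load as 'overloaded' or 'ok'.

    busy_threshold is an assumed rule of thumb: more than one active
    process per CPU means work is queuing up.
    """
    return "overloaded" if load_per_cpu(load, cpus) > busy_threshold else "ok"

if __name__ == "__main__":
    # The 14.08 figure from the stats page, on 1 CPU versus 16 CPUs.
    for cpus in (1, 16):
        print(cpus, round(load_per_cpu(14.08, cpus), 2), describe(14.08, cpus))

    # On Linux/Unix, os.getloadavg() returns the 1, 5 and 15 minute averages.
    try:
        print(os.getloadavg())
    except (AttributeError, OSError):
        print("load average not available on this platform")
```

With these assumptions, 14.08 on one CPU works out to about 14 processes competing for a single processor, while on 16 CPUs it is 0.88 per CPU, which is under the threshold.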