140.163.4.235 and 140.163.4.241

Posted: Thu Oct 08, 2015 9:47 am
by billford
I've had several instances of difficult uploads over the last few days:

08:52:25:WU01:FS01:0x18:Completed 4950000 out of 5000000 steps (99%)
08:56:09:WU01:FS01:0x18:Completed 5000000 out of 5000000 steps (100%)
08:56:15:WU01:FS01:0x18:Saving result file logfile_01.txt
08:56:15:WU01:FS01:0x18:Saving result file checkpointState.xml
08:56:16:WU01:FS01:0x18:Saving result file checkpt.crc
08:56:16:WU01:FS01:0x18:Saving result file log.txt
08:56:16:WU01:FS01:0x18:Saving result file positions.xtc
08:56:17:WU01:FS01:0x18:Folding@home Core Shutdown: FINISHED_UNIT
08:56:18:WU01:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
08:56:18:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:10488 run:2 clone:25 gen:107 core:0x18 unit:0x00000097538b3dbb54af0b6cc50634ae
08:56:18:WU01:FS01:Uploading 12.60MiB to 140.163.4.235
08:56:18:WU01:FS01:Connecting to 140.163.4.235:8080
08:59:18:WU01:FS01:Upload 21.83%
08:59:18:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed
08:59:18:WU01:FS01:Trying to send results to collection server
08:59:18:WU01:FS01:Uploading 12.60MiB to 140.163.4.241
08:59:18:WU01:FS01:Connecting to 140.163.4.241:8080
.
.
09:02:23:WU01:FS01:Upload 18.86%
09:02:23:ERROR:WU01:FS01:Exception: Transfer failed
09:02:23:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:10488 run:2 clone:25 gen:107 core:0x18 unit:0x00000097538b3dbb54af0b6cc50634ae
09:02:23:WU01:FS01:Uploading 12.60MiB to 140.163.4.235
09:02:23:WU01:FS01:Connecting to 140.163.4.235:8080
.
.
09:04:53:WU01:FS01:Upload 32.75%
09:05:18:WU01:FS01:Upload 33.74%
09:05:18:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed
09:05:18:WU01:FS01:Trying to send results to collection server
09:05:18:WU01:FS01:Uploading 12.60MiB to 140.163.4.241
09:05:18:WU01:FS01:Connecting to 140.163.4.241:8080
09:05:24:WU01:FS01:Upload 55.08%
09:05:30:WU01:FS01:Upload 98.25%
09:05:30:WU01:FS01:Upload complete
09:05:30:WU01:FS01:Server responded WORK_ACK (400)
09:05:30:WU01:FS01:Final credit estimate, 108247.00 points
09:05:30:WU01:FS01:Cleaning up
This one wasn't too bad; it only took a few minutes, but on one occasion it took over an hour to get the WU uploaded. Same WS and CS in all cases, and no problems with other WSs.
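For anyone wanting to put a number on "a few minutes", here is a rough Python sketch that reads a client log like the one above and reports the time from the first "Sending unit results" line to "Upload complete", plus how many "Transfer failed" lines appeared in between. The function name `upload_summary` is made up for illustration, and the only assumption about the log format is the HH:MM:SS prefix visible in the lines above.

```python
import re
from datetime import datetime, timedelta

# Matches the HH:MM:SS prefix seen at the start of each client log line.
STAMP = re.compile(r"^(\d{2}:\d{2}:\d{2}):")

def upload_summary(log_text):
    """Return (elapsed, failures) for one work unit's upload.

    elapsed  - time from the first 'Sending unit results' line to the
               'Upload complete' line, or None if either is missing
    failures - number of lines containing 'Transfer failed'
    """
    start = end = None
    failures = 0
    for raw in log_text.splitlines():
        line = raw.strip()
        m = STAMP.match(line)
        if not m:
            continue  # skips the "." placeholder lines in the paste
        t = datetime.strptime(m.group(1), "%H:%M:%S")
        if "Sending unit results" in line and start is None:
            start = t
        elif "Transfer failed" in line:
            failures += 1
        elif "Upload complete" in line:
            end = t
    if start is None or end is None:
        return None, failures
    elapsed = end - start
    if elapsed < timedelta(0):
        elapsed += timedelta(days=1)  # log ran past midnight
    return elapsed, failures
```

Fed the log above, this should come out at a little over nine minutes (08:56:18 to 09:05:30) with three failed transfers before the one that went through.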

As luck would have it, it's been the only client getting work from that server, so I can't completely rule out an oddity on that machine, though it has been rebooted a few times whilst updating GPU drivers.

Anyone else seeing it?

Re: 140.163.4.235 and 140.163.4.241

Posted: Thu Oct 08, 2015 9:23 pm
by JohnChodera
billford, any chance you are on a static IP address, or know what IP address you were using at the time of the upload?

We'll try to trace what was going on there.

John

Re: 140.163.4.235 and 140.163.4.241

Posted: Thu Oct 08, 2015 9:30 pm
by billford
I'm on a static IP; PM sent.

Re: 140.163.4.235 and 140.163.4.241

Posted: Fri Oct 09, 2015 8:17 am
by billford
One of the other clients got a WU from 140.163.4.235 overnight and returned it without any problem.

Only time will tell if this is useful evidence of anything :?

Re: 140.163.4.235 and 140.163.4.241

Posted: Fri Oct 09, 2015 11:19 am
by billford
Same machine, different (virtual?) WS (140.163.4.245), same CS, same problem again:

10:36:32:WU01:FS01:0x21:Completed 4950000 out of 5000000 steps (99%)
10:38:29:WU01:FS01:0x21:Completed 5000000 out of 5000000 steps (100%)
10:38:32:WU01:FS01:0x21:Saving result file logfile_01.txt
10:38:32:WU01:FS01:0x21:Saving result file checkpointState.xml
10:38:34:WU01:FS01:0x21:Saving result file checkpt.crc
10:38:34:WU01:FS01:0x21:Saving result file log.txt
10:38:34:WU01:FS01:0x21:Saving result file positions.xtc
10:38:35:WU01:FS01:0x21:Folding@home Core Shutdown: FINISHED_UNIT
10:38:35:WU01:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
10:38:35:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:10495 run:19 clone:11 gen:19 core:0x21 unit:0x0000001b8ca304f555e5df59e1a11b14
10:38:35:WU01:FS01:Uploading 8.76MiB to 140.163.4.245
10:38:35:WU01:FS01:Connecting to 140.163.4.245:8080
10:41:07:WU01:FS01:Upload 24.26%
10:41:36:WU01:FS01:Upload 27.83%
10:41:36:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed
10:41:36:WU01:FS01:Trying to send results to collection server
10:41:36:WU01:FS01:Uploading 8.76MiB to 140.163.4.241
10:41:36:WU01:FS01:Connecting to 140.163.4.241:8080
10:45:11:WU01:FS01:Upload 24.98%
10:46:08:WU01:FS01:Upload 25.69%
10:46:09:ERROR:WU01:FS01:Exception: Transfer failed
10:46:09:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:10495 run:19 clone:11 gen:19 core:0x21 unit:0x0000001b8ca304f555e5df59e1a11b14
10:46:09:WU01:FS01:Uploading 8.76MiB to 140.163.4.245
10:46:09:WU01:FS01:Connecting to 140.163.4.245:8080
10:48:42:WU01:FS01:Upload 54.95%
10:49:11:WU01:FS01:Upload 55.67%
10:49:11:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed
10:49:11:WU01:FS01:Trying to send results to collection server
10:49:11:WU01:FS01:Uploading 8.76MiB to 140.163.4.241
10:49:11:WU01:FS01:Connecting to 140.163.4.241:8080
10:52:54:WU01:FS01:Upload 15.70%
10:53:42:WU01:FS01:Upload 17.84%
10:53:42:ERROR:WU01:FS01:Exception: Transfer failed
10:53:42:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:10495 run:19 clone:11 gen:19 core:0x21 unit:0x0000001b8ca304f555e5df59e1a11b14
10:53:42:WU01:FS01:Uploading 8.76MiB to 140.163.4.245
10:53:42:WU01:FS01:Connecting to 140.163.4.245:8080
10:56:15:WU01:FS01:Upload 22.84%
10:56:44:WU01:FS01:Upload 27.83%
10:56:44:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed
10:56:44:WU01:FS01:Trying to send results to collection server
10:56:44:WU01:FS01:Uploading 8.76MiB to 140.163.4.241
10:56:44:WU01:FS01:Connecting to 140.163.4.241:8080
11:01:14:WU01:FS01:Upload 29.26%
11:01:14:ERROR:WU01:FS01:Exception: Transfer failed
11:01:14:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:10495 run:19 clone:11 gen:19 core:0x21 unit:0x0000001b8ca304f555e5df59e1a11b14
11:01:14:WU01:FS01:Uploading 8.76MiB to 140.163.4.245
11:01:14:WU01:FS01:Connecting to 140.163.4.245:8080
11:03:34:WU01:FS01:Upload 13.56%
11:03:50:WU01:FS01:Upload 15.70%
11:03:50:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed
11:03:50:WU01:FS01:Trying to send results to collection server
11:03:50:WU01:FS01:Uploading 8.76MiB to 140.163.4.241
11:03:50:WU01:FS01:Connecting to 140.163.4.241:8080
11:06:35:WU01:FS01:Upload 28.55%
11:06:50:WU01:FS01:Upload 32.11%
11:06:50:ERROR:WU01:FS01:Exception: Transfer failed
11:06:51:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:10495 run:19 clone:11 gen:19 core:0x21 unit:0x0000001b8ca304f555e5df59e1a11b14
11:06:51:WU01:FS01:Uploading 8.76MiB to 140.163.4.245
11:06:51:WU01:FS01:Connecting to 140.163.4.245:8080
11:09:21:WU01:FS01:Upload 27.12%
11:09:47:WU01:FS01:Upload 31.40%
11:09:47:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed
11:09:47:WU01:FS01:Trying to send results to collection server
11:09:48:WU01:FS01:Uploading 8.76MiB to 140.163.4.241
11:09:48:WU01:FS01:Connecting to 140.163.4.241:8080
11:12:06:WU01:FS01:Upload 19.98%
11:12:20:WU01:FS01:Upload 22.84%
11:12:20:ERROR:WU01:FS01:Exception: Transfer failed
11:12:20:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:10495 run:19 clone:11 gen:19 core:0x21 unit:0x0000001b8ca304f555e5df59e1a11b14
11:12:20:WU01:FS01:Uploading 8.76MiB to 140.163.4.245
11:12:20:WU01:FS01:Connecting to 140.163.4.245:8080
11:12:32:WU01:FS01:Upload complete
11:12:32:WU01:FS01:Server responded WORK_ACK (400)
11:12:32:WU01:FS01:Final credit estimate, 54082.00 points
11:12:32:WU01:FS01:Cleaning up
If it's a problem on my machine (Linux Mint 17.2, not liking IPs starting with 140.163.4) then I wouldn't even know where to start looking.

For information, I have tried:

Watching "Network history" in the system monitor: it roughly agrees with the client log. Some fraction is uploaded OK, then it just stops until the connection times out.

Trying to connect to one of those IPs with Chromium from a different machine (a Mac): it either just sits there saying "Connecting" or errors out with "Chromium's connection attempt to 140.163.4.245 was rejected. The website may be down, or your network may not be properly configured.". Other WSs (e.g. 171.64.65.58) work fine.
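The browser check can also be done from a script, which makes it easier to repeat against several servers and both ports the client uses (8080 and 80, per the logs in this thread). This is only a sketch: `probe` is a hypothetical helper name, and a successful connect says nothing about whether a large upload would later stall partway through.

```python
import socket

def probe(host, port, timeout=10.0):
    """Attempt a plain TCP connect and classify what happened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"      # the far end actively rejected the connection
    except socket.timeout:
        return "timed out"    # nothing came back within `timeout` seconds
    except OSError as exc:
        return f"error: {exc}"  # unreachable network, DNS failure, etc.

if __name__ == "__main__":
    # Addresses taken from the posts above.
    for host in ("140.163.4.245", "140.163.4.241", "171.64.65.58"):
        for port in (8080, 80):
            print(host, port, probe(host, port))
```

A "refused" result would match the Chromium "was rejected" message, while "timed out" would match it just sitting at "Connecting".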

:?

Re: 140.163.4.235 and 140.163.4.241

Posted: Fri Oct 09, 2015 11:33 am
by billford
Couple of very different routings it would seem:

iMac-3:~ billford$ traceroute 140.163.4.245
traceroute to 140.163.4.245 (140.163.4.245), 64 hops max, 52 byte packets
 1  www.asusnetwork.net (192.168.1.1)  0.378 ms  0.262 ms  0.225 ms
 2  telehouse-gw4-lo1.idnet.net (212.69.63.98)  8.003 ms  7.571 ms  7.749 ms
 3  212.69.63.78 (212.69.63.78)  7.911 ms  7.423 ms  7.941 ms
 4  telehouse-gw1-fa0-0.402.idnet.net (212.69.63.73)  7.962 ms  7.703 ms  7.587 ms
 5  ge-2-25.r02.londen03.uk.bb.gin.ntt.net (213.130.48.161)  9.192 ms  8.363 ms  7.953 ms
 6  ae-4.r22.londen03.uk.bb.gin.ntt.net (129.250.5.24)  8.126 ms  7.729 ms  7.862 ms
 7  ae-4.r22.nycmny01.us.bb.gin.ntt.net (129.250.3.126)  83.998 ms  84.296 ms  88.303 ms
 8  ae-2.r05.nycmny01.us.bb.gin.ntt.net (129.250.4.173)  87.799 ms  94.942 ms  86.564 ms
 9  xe-0-7-0-35.r05.nycmny01.us.ce.gin.ntt.net (129.250.193.158)  76.991 ms  86.042 ms  77.491 ms
10  tengig0-1-0-0.agr03.nwrk01-nj.us.windstream.net (40.128.248.10)  80.179 ms
    tengig0-1-0-1.agr03.nwrk01-nj.us.windstream.net (40.128.248.12)  76.724 ms
    tengig0-1-0-0.agr03.nwrk01-nj.us.windstream.net (40.128.248.10)  79.857 ms
11  67.151.24.149 (67.151.24.149)  77.863 ms  91.471 ms  77.966 ms
12  74.8.57.6 (74.8.57.6)  86.273 ms  78.050 ms  77.264 ms
13  * * *

iMac-3:~ billford$ traceroute 171.64.65.93
traceroute to 171.64.65.93 (171.64.65.93), 64 hops max, 52 byte packets
 1  www.asusnetwork.net (192.168.1.1)  0.409 ms  0.412 ms  0.242 ms
 2  telehouse-gw4-lo1.idnet.net (212.69.63.98)  7.940 ms  7.715 ms  7.466 ms
 3  212.69.63.78 (212.69.63.78)  7.781 ms  7.722 ms  7.887 ms
 4  10gigabitethernet5-1.core1.lon1.he.net (5.57.80.128)  7.720 ms  7.857 ms  8.549 ms
 5  10ge2-9.core1.lon2.he.net (72.52.92.222)  21.780 ms  7.947 ms  19.437 ms
 6  100ge1-1.core1.nyc4.he.net (72.52.92.166)  73.592 ms  75.074 ms  75.053 ms
 7  100ge15-1.core1.ash1.he.net (184.105.223.165)  79.110 ms  87.651 ms
    100ge7-2.core1.chi1.he.net (184.105.223.161)  90.262 ms
 8  10ge11-4.core1.pao1.he.net (184.105.222.173)  151.563 ms
    10ge9-2.core1.pao1.he.net (184.105.213.177)  139.211 ms  147.765 ms
 9  stanford-university.10gigabitethernet1-4.core1.pao1.he.net (216.218.209.118)  177.332 ms  141.343 ms  138.847 ms
10  csmx-west-rtr-vl8.sunet (171.64.255.214)  151.652 ms  148.278 ms  149.913 ms
11  vspg14d.stanford.edu (171.64.65.93)  142.620 ms !Z  142.902 ms !Z  143.520 ms !Z
iMac-3:~ billford$ 
Any help?

Re: 140.163.4.235 and 140.163.4.241

Posted: Fri Oct 09, 2015 12:05 pm
by toTOW
The difference in routes is expected: the 140.163.4.245 servers are on the east coast (Chodera Lab is at MSKCC in New York: https://www.mskcc.org/) while 171.64.65.93 is on the west coast (at Stanford University).

Re: 140.163.4.235 and 140.163.4.241

Posted: Fri Oct 09, 2015 12:13 pm
by billford
I guessed it would be something like that, but I didn't know where the server was physically located.

It introduces another variable to the problem, though why it should affect my uploads and not my downloads is beyond me.

Re: 140.163.4.235 and 140.163.4.241

Posted: Fri Oct 09, 2015 4:58 pm
by 7im
Asymmetric connection speeds?

Re: 140.163.4.235 and 140.163.4.241

Posted: Fri Oct 09, 2015 5:00 pm
by bruce
140.163.4.24x don't show on serverstat, while 140.163.4.23x do. Do you suppose they're in different buildings?

Re: 140.163.4.235 and 140.163.4.241

Posted: Fri Oct 09, 2015 6:21 pm
by billford
7im wrote: Asymmetric connection speeds?
Not any that would cause an 8MB upload to take half an hour... my download speed is ~65Mbps, upload is ~16Mbps. Can't speak for New York's speeds :shock:
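A quick back-of-envelope check of that point, as a sketch: it takes the 8.76MiB result size from the log and the ~16Mbps upload figure quoted here, and ignores protocol overhead, latency and TCP ramp-up. The function names are made up for illustration.

```python
MIB = 1024 * 1024  # bytes in one MiB

def transfer_seconds(size_mib, link_mbps):
    """Idealised transfer time: payload bits divided by the link rate."""
    bits = size_mib * MIB * 8
    return bits / (link_mbps * 1_000_000)

def effective_mbps(size_mib, seconds):
    """Average rate implied by moving size_mib in the given time."""
    return size_mib * MIB * 8 / seconds / 1_000_000

if __name__ == "__main__":
    print(f"{transfer_seconds(8.76, 16):.1f} s at 16 Mbps")
    print(f"{effective_mbps(8.76, 30 * 60):.3f} Mbps averaged over half an hour")
```

That works out to roughly 4.6 seconds at the full upload rate, against an average of about 0.04 Mbps if the whole thing takes half an hour. The half hour includes the failed attempts and retries, so it is not a measure of any single sustained transfer, but the gap is far too large to blame on an asymmetric line.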

Re: 140.163.4.235 and 140.163.4.241

Posted: Fri Oct 09, 2015 6:32 pm
by billford
bruce wrote: 140.163.4.24x don't show on serverstat, while 140.163.4.23x do. Do you suppose they're in different buildings?
Possibly, but that's not a question I can answer :ewink:

I've just been watching another one do the same thing; it got there after about half an hour.

Re: 140.163.4.235 and 140.163.4.241

Posted: Wed Oct 14, 2015 3:48 am
by cinetrope
I started receiving this error warning upon completion of 10495 (34,6,33)???

*********************** Log Started 2015-10-14T02:51:24Z ***********************
02:51:26:WARNING:WU03:FS01:WorkServer connection failed on port 8080 trying 80
02:51:28:WARNING:WU03:FS01:Exception: Failed to send results to work server: Failed to connect to 140.163.4.245:80: No connection could be made because the target machine actively refused it.
02:51:29:WARNING:WU03:FS01:WorkServer connection failed on port 8080 trying 80
02:51:31:ERROR:WU03:FS01:Exception: Failed to connect to 140.163.4.241:80: No connection could be made because the target machine actively refused it.
02:51:33:WARNING:WU03:FS01:WorkServer connection failed on port 8080 trying 80
02:51:35:WARNING:WU03:FS01:Exception: Failed to send results to work server: Failed to connect to 140.163.4.245:80: No connection could be made because the target machine actively refused it.
02:51:36:WARNING:WU03:FS01:WorkServer connection failed on port 8080 trying 80
02:51:38:ERROR:WU03:FS01:Exception: Failed to connect to 140.163.4.241:80: No connection could be made because the target machine actively refused it.
02:52:33:WARNING:WU03:FS01:WorkServer connection failed on port 8080 trying 80
02:52:35:WARNING:WU03:FS01:Exception: Failed to send results to work server: Failed to connect to 140.163.4.245:80: No connection could be made because the target machine actively refused it.
02:52:37:WARNING:WU03:FS01:WorkServer connection failed on port 8080 trying 80
02:52:39:ERROR:WU03:FS01:Exception: Failed to connect to 140.163.4.241:80: No connection could be made because the target machine actively refused it.
02:54:10:WARNING:WU03:FS01:WorkServer connection failed on port 8080 trying 80
02:54:12:WARNING:WU03:FS01:Exception: Failed to send results to work server: Failed to connect to 140.163.4.245:80: No connection could be made because the target machine actively refused it.
02:54:14:WARNING:WU03:FS01:WorkServer connection failed on port 8080 trying 80
02:54:15:ERROR:WU03:FS01:Exception: Failed to connect to 140.163.4.241:80: No connection could be made because the target machine actively refused it.
02:56:48:WARNING:WU03:FS01:WorkServer connection failed on port 8080 trying 80
02:56:49:WARNING:WU03:FS01:Exception: Failed to send results to work server: Failed to connect to 140.163.4.245:80: No connection could be made because the target machine actively refused it.
02:56:51:WARNING:WU03:FS01:WorkServer connection failed on port 8080 trying 80
02:56:53:ERROR:WU03:FS01:Exception: Failed to connect to 140.163.4.241:80: No connection could be made because the target machine actively refused it.
03:01:02:WARNING:WU03:FS01:WorkServer connection failed on port 8080 trying 80
03:01:04:WARNING:WU03:FS01:Exception: Failed to send results to work server: Failed to connect to 140.163.4.245:80: No connection could be made because the target machine actively refused it.
03:01:05:WARNING:WU03:FS01:WorkServer connection failed on port 8080 trying 80
03:01:07:ERROR:WU03:FS01:Exception: Failed to connect to 140.163.4.241:80: No connection could be made because the target machine actively refused it.
03:07:53:WARNING:WU03:FS01:WorkServer connection failed on port 8080 trying 80
03:07:55:WARNING:WU03:FS01:Exception: Failed to send results to work server: Failed to connect to 140.163.4.245:80: No connection could be made because the target machine actively refused it.
03:07:56:WARNING:WU03:FS01:WorkServer connection failed on port 8080 trying 80
03:07:58:ERROR:WU03:FS01:Exception: Failed to connect to 140.163.4.241:80: No connection could be made because the target machine actively refused it.
03:18:59:WARNING:WU03:FS01:WorkServer connection failed on port 8080 trying 80
03:19:00:WARNING:WU03:FS01:Exception: Failed to send results to work server: Failed to connect to 140.163.4.245:80: No connection could be made because the target machine actively refused it.
03:19:02:WARNING:WU03:FS01:WorkServer connection failed on port 8080 trying 80
03:19:04:ERROR:WU03:FS01:Exception: Failed to connect to 140.163.4.241:80: No connection could be made because the target machine actively refused it.
03:36:55:WARNING:WU03:FS01:WorkServer connection failed on port 8080 trying 80
03:36:57:WARNING:WU03:FS01:Exception: Failed to send results to work server: Failed to connect to 140.163.4.245:80: No connection could be made because the target machine actively refused it.
03:36:59:WARNING:WU03:FS01:WorkServer connection failed on port 8080 trying 80
03:37:00:ERROR:WU03:FS01:Exception: Failed to connect to 140.163.4.241:80: No connection could be made because the target machine actively refused it.
Mod edit: Added Code tags around log file
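The timestamps in this log suggest the client waits longer between each retry. A small sketch to pull the gaps out of a pasted log, assuming each line starts with HH:MM:SS as above; `retry_gaps` is a made-up name, and nothing here says how the client actually schedules its retries.

```python
from datetime import datetime, timedelta

def retry_gaps(log_text, marker="Failed to send results to work server"):
    """Seconds between successive log lines that contain `marker`."""
    times = []
    for raw in log_text.splitlines():
        line = raw.strip()
        if marker in line:
            # The first 8 characters are the HH:MM:SS stamp.
            times.append(datetime.strptime(line[:8], "%H:%M:%S"))
    gaps = []
    for earlier, later in zip(times, times[1:]):
        delta = later - earlier
        if delta < timedelta(0):
            delta += timedelta(days=1)  # log ran past midnight
        gaps.append(int(delta.total_seconds()))
    return gaps
```

On the log above, the gaps between work server attempts grow from a few seconds at the start to roughly 18 minutes by the last pair (03:19:00 to 03:36:57), so a unit stuck like this gets retried less and less often while the server stays down.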

Re: 140.163.4.235 and 140.163.4.241

Posted: Wed Oct 14, 2015 4:03 am
by ArVee
ERROR
The requested URL could not be retrieved

The following error was encountered while trying to retrieve the URL: http://140.163.4.241/

Connection to 140.163.4.241 failed.

The system returned: (111) Connection refused

The remote host or network may be down. Please try the request again.

Generated Wed, 14 Oct 2015 03:58:06 GMT

This is now about 50 minutes old and I've just had my second machine start doing the same thing trying to get work to the same server.

Re: 140.163.4.235 and 140.163.4.241

Posted: Wed Oct 14, 2015 6:03 am
by sco01
05:47:19:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:10469 run:0 clone:458 gen:170 core:0x17 unit:0x00000126538b3db9538f449f8fcd0752
05:47:20:WU01:FS00:Uploading 15.88MiB to 140.163.4.233
05:47:20:WU01:FS00:Connecting to 140.163.4.233:8080
05:47:20:WU02:FS01:0xa4:Completed 142500 out of 250000 steps (57%)
05:47:22:WARNING:WU01:FS00:WorkServer connection failed on port 8080 trying 80
05:47:22:WU01:FS00:Connecting to 140.163.4.233:80
05:47:27:WARNING:WU01:FS00:WorkServer connection failed on port 8080 trying 80
05:47:27:WU01:FS00:Connecting to 140.163.4.241:80