
Work fails to download more WU after finishing uploading.

Posted: Fri Aug 07, 2020 11:40 am
by Sandman192
The problem goes away after every restart of my computer.

I have not seen this problem on older versions of F@H; I'm running F@H v7.6.13.
It hasn't downloaded anything for 2 days, with no work running. Sometimes it's for CPU work, sometimes for GPU work. This has happened on 3 of my computers, one of which I've stopped using altogether.
There are over a hundred of these in a row, all saying the same thing: "10053: An established connection was aborted by the software in your host machine."

Code:

10:03:23:ERROR:Receive error: 10053: An established connection was aborted by the software in your host machine.
10:03:48:ERROR:Receive error: 10053: An established connection was aborted by the software in your host machine.
10:04:13:ERROR:Receive error: 10053: An established connection was aborted by the software in your host machine.
10:04:15:WU01:FS01:0x22:Watchdog shutdown failed, hard shutdown triggered
10:04:38:WARNING:WU01:FS01:FahCore returned an unknown error code which probably indicates that it crashed
10:04:38:WARNING:WU01:FS01:FahCore returned: WU_STALLED (127 = 0x7f)

Code:

********************************************************************************
10:04:40:WU01:FS01:0x22:Project: 16918 (Run 112, Clone 12, Gen 13)
10:04:40:WU01:FS01:0x22:Unit: 0x000000160002894c5f17618a4e2d8fe9
10:04:40:WU01:FS01:0x22:Digital signatures verified
10:04:40:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
10:04:40:WU01:FS01:0x22:Version 0.0.11
10:04:40:WU01:FS01:0x22:  Checkpoint write interval: 100000 steps (2%) [50 total]
10:04:40:WU01:FS01:0x22:  JSON viewer frame write interval: 50000 steps (1%) [100 total]
10:04:40:WU01:FS01:0x22:  XTC frame write interval: 250000 steps (5%) [20 total]
10:04:40:WU01:FS01:0x22:  Global context and integrator variables write interval: disabled
10:05:15:ERROR:Send error: 10060: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
10:14:38:WU00:FS00:0xa7:Completed 195000 out of 250000 steps (78%)
10:29:07:WU00:FS00:0xa7:Completed 197500 out of 250000 steps (79%)
10:44:03:WU00:FS00:0xa7:Completed 200000 out of 250000 steps (80%)
10:58:37:WU00:FS00:0xa7:Completed 202500 out of 250000 steps (81%)
11:06:23:ERROR:Receive error: 10053: An established connection was aborted by the software in your host machine.
11:06:50:ERROR:Receive error: 10053: An established connection was aborted by the software in your host machine.
11:07:49:ERROR:Send error: 10060: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
11:13:33:WU00:FS00:0xa7:Completed 205000 out of 250000 steps (82%)
11:27:47:WU00:FS00:0xa7:Completed 207500 out of 250000 steps (83%)
11:41:51:WU00:FS00:0xa7:Completed 210000 out of 250000 steps (84%)
11:55:56:WU00:FS00:0xa7:Completed 212500 out of 250000 steps (85%)
12:10:08:WU00:FS00:0xa7:Completed 215000 out of 250000 steps (86%)
12:24:21:WU00:FS00:0xa7:Completed 217500 out of 250000 steps (87%)
12:38:26:WU00:FS00:0xa7:Completed 220000 out of 250000 steps (88%)
12:52:38:WU00:FS00:0xa7:Completed 222500 out of 250000 steps (89%)
13:07:43:WU00:FS00:0xa7:Completed 225000 out of 250000 steps (90%)
13:23:11:WU00:FS00:0xa7:Completed 227500 out of 250000 steps (91%)
13:37:28:WU00:FS00:0xa7:Completed 230000 out of 250000 steps (92%)
******************************* Date: 2020-08-05 *******************************
13:51:38:WU00:FS00:0xa7:Completed 232500 out of 250000 steps (93%)
14:05:47:WU00:FS00:0xa7:Completed 235000 out of 250000 steps (94%)
14:19:55:WU00:FS00:0xa7:Completed 237500 out of 250000 steps (95%)
14:34:10:WU00:FS00:0xa7:Completed 240000 out of 250000 steps (96%)
14:48:23:WU00:FS00:0xa7:Completed 242500 out of 250000 steps (97%)
15:02:42:WU00:FS00:0xa7:Completed 245000 out of 250000 steps (98%)
15:17:00:WU00:FS00:0xa7:Completed 247500 out of 250000 steps (99%)
15:17:00:WU02:FS00:Connecting to assign1.foldingathome.org:80
15:17:01:WU02:FS00:Assigned to work server 150.136.14.110
15:17:01:WU02:FS00:Requesting new work unit for slot 00: RUNNING cpu:3 from 150.136.14.110
15:17:01:WU02:FS00:Connecting to 150.136.14.110:8080
15:17:01:WU02:FS00:Downloading 2.34MiB
15:31:22:WU00:FS00:0xa7:Completed 250000 out of 250000 steps (100%)
15:31:29:WU00:FS00:0xa7:Saving result file ..\logfile_01.txt
15:31:29:WU00:FS00:0xa7:Saving result file dhdl.xvg
15:31:29:WU00:FS00:0xa7:Saving result file frame318.trr
15:31:29:WU00:FS00:0xa7:Saving result file md.log
15:31:29:WU00:FS00:0xa7:Saving result file pullf.xvg
15:31:29:WU00:FS00:0xa7:Saving result file pullx.xvg
15:31:29:WU00:FS00:0xa7:Saving result file science.log
15:31:29:WU00:FS00:0xa7:Saving result file traj_comp.xtc
15:31:29:WU00:FS00:0xa7:Folding@home Core Shutdown: FINISHED_UNIT
15:31:30:WU00:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
15:31:30:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:14379 run:2253 clone:1 gen:318 core:0xa7 unit:0x00000164455e42075e932f852c3167a7
15:31:30:WU00:FS00:Uploading 6.47MiB to 69.94.66.7
15:31:30:WU00:FS00:Connecting to 69.94.66.7:8080
15:31:36:WU00:FS00:Upload 5.79%
15:31:42:WU00:FS00:Upload 11.59%
15:31:49:WU00:FS00:Upload 17.38%
15:31:58:WU00:FS00:Upload 23.17%
15:32:04:WU00:FS00:Upload 27.04%
15:32:10:WU00:FS00:Upload 32.83%
15:32:16:WU00:FS00:Upload 37.66%
15:32:22:WU00:FS00:Upload 41.52%
15:32:30:WU00:FS00:Upload 44.42%
15:32:36:WU00:FS00:Upload 47.31%
15:32:42:WU00:FS00:Upload 53.11%
15:32:48:WU00:FS00:Upload 57.94%
15:32:54:WU00:FS00:Upload 63.73%
15:33:00:WU00:FS00:Upload 69.52%
15:33:06:WU00:FS00:Upload 75.32%
15:33:12:WU00:FS00:Upload 81.11%
15:33:18:WU00:FS00:Upload 87.87%
15:33:24:WU00:FS00:Upload 94.63%
15:33:30:WU00:FS00:Upload complete
15:33:30:WU00:FS00:Server responded WORK_ACK (400)
15:33:30:WU00:FS00:Final credit estimate, 1252.00 points
15:33:30:WU00:FS00:Cleaning up
******************************* Date: 2020-08-07 *******************************
10:36:57:ERROR:Send error: 10054: An existing connection was forcibly closed by the remote host.

Re: Work fails to download more WU after finishing uploading

Posted: Fri Aug 07, 2020 1:46 pm
by Joe_H
The 10053 and 10054 network error messages are local and not connected with downloading a new WU. From prior experience, those report the local network connections to Web Control and FAHViewer being closed.
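For reference, the three Winsock codes that appear in this thread have standard symbolic names: 10053 is WSAECONNABORTED, 10054 is WSAECONNRESET and 10060 is WSAETIMEDOUT. As a rough illustration only, the Python sketch below tallies those errors from log text so you can see which kind dominates. The regex assumes the `HH:MM:SS:ERROR:Send/Receive error: NNNNN:` line format shown in the logs above, and `tally_socket_errors` is a hypothetical helper, not part of FAHClient. The error name alone does not tell you which connection (Web Control, FAHViewer or a work server) was affected.

```python
import re
from collections import Counter

# Standard Windows Sockets error codes seen in the logs in this thread.
WINSOCK_ERRORS = {
    10053: "WSAECONNABORTED",  # connection aborted by software on this host
    10054: "WSAECONNRESET",    # connection reset (forcibly closed) by the peer
    10060: "WSAETIMEDOUT",     # connection attempt timed out
}

# Assumed format, e.g. "10:03:23:ERROR:Receive error: 10053: An established ..."
ERROR_LINE = re.compile(r"^\d{2}:\d{2}:\d{2}:ERROR:(Send|Receive) error: (\d+):")


def tally_socket_errors(log_text: str) -> Counter:
    """Count (direction, symbolic name) pairs for socket errors in log text."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        m = ERROR_LINE.match(line)
        if m:
            code = int(m.group(2))
            # Unknown codes are kept as their numeric string.
            counts[(m.group(1), WINSOCK_ERRORS.get(code, str(code)))] += 1
    return counts
```

Non-error lines (progress, upload, download messages) are simply skipped, so the whole log can be passed in at once.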

Things to check would be changes to the firewall and anti-malware settings, especially if updates have been applied recently, by Windows Update for instance.

Re: Work fails to download more WU after finishing uploading

Posted: Fri Aug 07, 2020 3:37 pm
by Sandman192
As I said, this never happened with older versions of F@H. And the problem always fixes itself after every reboot.
I have no anti-malware software, and the firewall and antivirus are the ones built into Windows. If they were the cause, restarting my computer would not allow more downloading.
If it were caused by a Windows update, you would be having it too on your Windows machine. Again, it started when I updated to the newest version of F@H.

Re: Work fails to download more WU after finishing uploading

Posted: Fri Aug 07, 2020 4:48 pm
by Joe_H
This has happened to others in the past, both for older versions and the current version of the F@h client. You have also only provided a bare minimum of log information, not a single instance of a WU request failing.

As for updating, you may have to re-identify the FAHClient executable as an exception; that executable would have changed when you updated. The Windows antivirus counts as anti-malware, and so does the Windows firewall. From what other Windows users report, communication done by FAHClient needs to be allowed in the "Private" zone.

So, post the first 100-200 lines of your current log to show the system, hardware and client setup. And post a section showing an actual WU request failing and we can look at it further.

Re: Work fails to download more WU after finishing uploading

Posted: Fri Aug 07, 2020 6:59 pm
by Neil-B
@Joe_H ... Fairly sure the OP's 2nd code window shows this ... if you look at the end, it looks like it might be one of those "established connection but nothing happening" errors ... It connects and "starts" the download, after which nothing happens.

15:17:01:WU02:FS00:Connecting to 150.136.14.110:8080
15:17:01:WU02:FS00:Downloading 2.34MiB

@Sandman192 ... I think what may be happening is that the download connection is hanging for some reason and this is not being spotted/cleared by the client. If I am right and you are using Windows, you don't need to restart the client; you can just drop the established connection. Next time this happens, try using TCPView to drop the hanging connection. Fairly sure someone posted a Linux equivalent tool a while back.
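To spot the situation described above without reading the log by eye, a small script can look for work units whose most recent log line is the "Downloading" message. This is a minimal Python sketch, assuming the `HH:MM:SS:WUnn:FSnn:message` line format shown in the logs in this thread; `find_stalled_downloads` is a hypothetical helper, not part of FAHClient. It only flags the case seen here (a download that started and never logged anything afterwards), and since the log timestamps carry no date, you still have to check that the reported time is well in the past.

```python
import re

# Assumed FAHClient line format, e.g. "15:17:01:WU02:FS00:Downloading 2.34MiB"
WU_LINE = re.compile(r"^(\d{2}:\d{2}:\d{2}):(WU\d+):(FS\d+):(.*)$")


def find_stalled_downloads(log_text: str) -> dict:
    """Return {"WUnn:FSnn": timestamp} for work units whose most recent log
    line is a 'Downloading' message, i.e. a download that started and then
    logged nothing further. Heuristic only."""
    last_event = {}
    for line in log_text.splitlines():
        m = WU_LINE.match(line)
        if m:
            time, wu, slot, message = m.groups()
            last_event[f"{wu}:{slot}"] = (time, message)  # later lines overwrite earlier ones
    return {
        key: time
        for key, (time, message) in last_event.items()
        if message.startswith("Downloading")
    }
```

Run against the OP's second log excerpt, this would single out WU02 in slot FS00, matching the two lines quoted above.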

Re: Work fails to download more WU after finishing uploading

Posted: Sun Aug 09, 2020 4:36 am
by Mxyzptlk
I have seen the same exact issue on two of my computers.

Re: Work fails to download more WU after finishing uploading

Posted: Sun Aug 09, 2020 5:44 am
by bruce
What is the reported value for CWD near the top of FAH's log? Are you using the shortcut provided at install time?
Please see below about posting FAH's log.

Re: Work fails to download more WU after finishing uploading

Posted: Mon Aug 10, 2020 7:33 pm
by Sandman192
Joe_H wrote: And post a section showing an actual WU request failing and we can look at it further.
I have. It's in the first 3 lines of the first quote and in the last line of the second quote.
Neil-B wrote: @Joe_H ... Fairly sure the 2nd code window of OP shows this ...
Thank you for understanding.

I'm using WiFi. I found out that, for some reason, my "Gaming" router, which is supposed to be good at giving strong signal connections over WiFi, seems to suck even with a strong signal (the computer is 20 ft or so from the router). I'm glad I also have my modem/router; I connected to that instead and it's working fine now.

As for restarting my computer (which restarts my WiFi): that only lasts long enough to download new work, and it has trouble again after a day even though it's still connected and still has a good signal.
Could it be F@H not liking certain WiFi routers??? That seems very weird and odd, like the first hardware-to-software bug ever.
Mxyzptlk wrote: I have seen the same exact issue on two of my computers.
Are you using WiFi? And if so, what router are you using?
Mine's a Netgear XR500 and it's up to date (V2.3.2.56).

Re: Work fails to download more WU after finishing uploading

Posted: Mon Aug 10, 2020 8:49 pm
by Joe_H
Sandman192 wrote: Could it be F@H not liking certain WiFi routers??
Not so much certain WiFi routers as not liking network connections that are not reliable and stable. If the connection drops packets, especially ACK packets, the connection can stall with each side waiting on the other. The code to detect that kind of stalled connection has become better over the last few versions, but it does not always catch the condition and do a retry.
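The stalled-connection problem described above can be illustrated with a generic idle-timeout-and-retry pattern. This is not FAHClient's actual code (its detection logic isn't shown in this thread); it's a minimal Python sketch assuming a plain TCP socket, where `recv_exact`, `download_with_retry`, the 30-second idle timeout and the 3-attempt limit are all hypothetical choices. The key idea is that the timeout applies to each `recv()` call, so a transfer that is slow but still moving is left alone, while one where nothing arrives for too long is abandoned and retried on a fresh connection.

```python
import socket
from typing import Callable


def recv_exact(sock: socket.socket, size: int, idle_timeout: float) -> bytes:
    """Read exactly `size` bytes, giving up if the peer goes silent.

    settimeout() applies to each recv() call individually, so this is an
    idle timeout: a slow transfer that keeps delivering bytes is not killed.
    """
    sock.settimeout(idle_timeout)
    chunks = []
    received = 0
    while received < size:
        chunk = sock.recv(min(65536, size - received))  # raises socket.timeout on a stall
        if not chunk:
            raise ConnectionError("peer closed the connection before sending everything")
        chunks.append(chunk)
        received += len(chunk)
    return b"".join(chunks)


def download_with_retry(
    open_connection: Callable[[], socket.socket],
    size: int,
    idle_timeout: float = 30.0,  # hypothetical value
    attempts: int = 3,           # hypothetical value
) -> bytes:
    """Restart the whole transfer on a fresh connection when it stalls or drops."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        sock = open_connection()
        try:
            return recv_exact(sock, size, idle_timeout)
        except (socket.timeout, ConnectionError) as exc:
            # Covers stalls (timeout) and aborted/reset connections; on Windows,
            # 10053/10054 surface as ConnectionAbortedError/ConnectionResetError.
            last_error = exc
        finally:
            sock.close()
    raise last_error
```

The trade-off is in choosing `idle_timeout`: too short and a briefly congested WiFi link triggers needless reconnects, too long and a dead connection sits there for a long time before being retried.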

With a "gaming" router, its default network settings may prioritize sending and receiving packets used by games. Finding out whether that comes at the expense of the TCP packets carrying the HTTP data might take digging into the documentation or hooking up a network analyzer.