Re: unable to get WU from 171.64.65.124 and 171.64.65.100

Posted: Tue Mar 24, 2015 10:03 pm
by TwistedKestrel
The way the downloads start but get choked off immediately makes me wonder whether the server(s) are sitting behind a misbehaving IDS (intrusion detection system) or something similar.

@Grendel: Having to kill FAHClient.exe is a secondary problem; Joe_H acknowledged earlier in the thread that it's a known issue with the client that sometimes occurs when a download is interrupted.

Posted: Tue Mar 24, 2015 10:42 pm
by Grendel
Found what you were referring to:
Joe_H wrote: There also is a known issue with the network code in the folding client, it can sometimes fail to retry a download or upload that fails for some reason. If this happens the client will just sit there and never retry the connection. It is more commonly seen when there are network problems, and there are some slight improvements in the current version 7.4.4 client. When this happens the only way to get the download or upload to resume is restarting the FAHClient process whether by rebooting the system or manually stopping and restarting that process.
Well, it's not just stopping and restarting: pause/resume does nothing, and while quitting the client removes the taskbar icon, it leaves a zombie FAHClient.exe process that blocks restarting the client. Given the very slow download time for the WUs, my (not very) educated guess is that the server terminated sending the WU "uncleanly". That could explain it if the client uses the TCP/IP API with an indefinite timeout somewhere. This is all very hypothetical, and everything seems back to normal at the moment. My concern is that the clients don't recover from this on their own, which requires massive user intervention. That makes it a bit more than a secondary problem IMHO; the client should recover after not receiving a packet from the server for, say, 10 min :) (Running 7.4.4 on all three machines, BTW.)
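Grendel's guess is that a socket read somewhere blocks with no timeout. As a minimal sketch of the alternative he suggests (give up when nothing arrives for a while), here is a hypothetical Python helper, `read_with_inactivity_timeout`, that reads a fixed-size payload and treats silence longer than `idle_timeout` seconds as an error instead of waiting forever. It is not FAHClient's code and makes no claim about how the client is written; it only illustrates an idle timeout, which here applies to each `recv()` call rather than to the whole transfer, so a slow but steady download is never cut off.

```python
import socket


def read_with_inactivity_timeout(sock: socket.socket, expected: int, idle_timeout: float) -> bytes:
    """Read exactly `expected` bytes from `sock` (hypothetical helper).

    Raises TimeoutError if no data arrives for `idle_timeout` seconds,
    and ConnectionError if the peer closes the connection early.
    """
    # settimeout() bounds each blocking recv() call, i.e. it limits the gap
    # between packets, not the total duration of the download.
    sock.settimeout(idle_timeout)
    chunks = []
    received = 0
    while received < expected:
        try:
            chunk = sock.recv(min(65536, expected - received))
        except socket.timeout:
            # The server went silent mid-transfer: report it instead of hanging.
            raise TimeoutError(
                f"no data for {idle_timeout}s after {received} of {expected} bytes"
            ) from None
        if not chunk:
            # An empty read means the peer closed the connection.
            raise ConnectionError(f"peer closed after {received} of {expected} bytes")
        chunks.append(chunk)
        received += len(chunk)
    return b"".join(chunks)
```

With `idle_timeout=600` this would correspond to the 10 minute figure suggested above; the caller can then close the socket and start the transfer again.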

Posted: Tue Mar 24, 2015 10:49 pm
by bruce
Grendel wrote: What I find interesting is that the client apparently doesn't let go of the TCP connection even after two days (!); that may be worth looking into.
This is a known bug which can easily be reproduced. Suppose that while an upload or download is in progress, there is a temporary interruption in your connection. Whether your ISP does that for you :roll: or you cause it intentionally (for example by unplugging your modem), the upload or download will of course stop, but after the connection is restored the client will never recover unless you restart FAHClient.

Posted: Wed Mar 25, 2015 12:09 am
by Joe_H
A pause and resume from FAHControl is not enough; the FAHClient process itself needs to be restarted. I can't tell why you are seeing a zombie process, and I have not run into that problem personally.

As for detecting a failed download or upload: from what I have seen personally and in a number of folders' log files posted here, most of the time the client does detect the lack of progress and retries on its own. Why it doesn't always do so has not been determined yet. As Bruce mentioned, it is fairly reproducible; if it happened almost every time there was an interruption, the bug might have been tracked down by now.
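Joe_H describes the client usually noticing the lack of progress and retrying by itself. One common way to structure that kind of recovery is a retry loop with capped exponential backoff, sketched here in Python under the assumption that a stalled or dropped transfer surfaces as `TimeoutError` or `ConnectionError` (for example from an idle timeout like the one Grendel suggested). `retry_transfer` and its parameters are hypothetical names for illustration; this is not the folding client's retry logic.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_transfer(
    transfer: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `transfer` until it succeeds, backing off after each stall or dropped connection.

    Other exceptions propagate immediately; after `max_attempts` failures the
    last TimeoutError/ConnectionError is re-raised to the caller.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return transfer()
        except (TimeoutError, ConnectionError) as exc:
            if attempt == max_attempts:
                raise
            # Double the wait after each failure (1 s, 2 s, 4 s, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            print(f"attempt {attempt} failed ({exc}); retrying in {delay:g}s")
            sleep(delay)
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable only so the backoff can be exercised without actually waiting; in normal use the default `time.sleep` applies.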

Posted: Wed Mar 25, 2015 5:31 pm
by Grendel
Thanks for the info, guys! I'll keep a close eye on the machines for a while.

Posted: Fri Mar 27, 2015 2:35 pm
by SvekY_007
Work server vspg14e.stanford.edu (IP 171.64.65.124) has the wrong system time. I just wanted to let you know that its clock is out of sync.
Some people have had problems getting assignments from this server, as reported in this thread. It hasn't happened to me, but maybe this is the cause?

Yesterday (Mar 26, 2015) I downloaded and uploaded a WU from Project 9008 (Run 328, Clone 3, Gen 170) without problems and got the estimated points. PPD was normal, above 2000 points.

Symptoms:
FAHControl shows an Assigned time of 2015-03-26T08:58:41Z, yet the log shows

08:44:59:WU00:FS00:Connecting to assign3.stanford.edu:8080
08:45:00:WU00:FS00:Assigned to work server 171.64.65.124
That's a difference of 0:13:42.
This matches the "Diff Time" (in seconds) shown on the server status page.

Yesterday, Mar 26, at one point the server status page showed that servers vspg14b, vspg14c, vspg14d and vspg14e had a Diff Time of about -822 (822 s = 13 min 42 s).
The log for 171.64.65.124 shows the drift growing by about 5 seconds per day: Diff Time went from -748 on Mar 12 to -822 on Mar 26, so it is getting worse.
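The arithmetic in this report can be checked with a few lines of Python. The sketch assumes, as the post implies, that both timestamps are UTC and that the client's own clock is correct; the two Diff Time values are the readings quoted above, not additional data.

```python
from datetime import date, datetime

# Timestamps from the post, both treated as UTC: the client log's connect time
# and the "Assigned time" that FAHControl reports for the WU.
client_connect = datetime.fromisoformat("2015-03-26T08:44:59")
server_assigned = datetime.fromisoformat("2015-03-26T08:58:41")
offset = server_assigned - client_connect
print(offset, offset.total_seconds())  # 0:13:42 822.0

# Diff Time readings (seconds) from the server status page, keyed by date.
readings = {date(2015, 3, 12): -748, date(2015, 3, 26): -822}
(d0, v0), (d1, v1) = sorted(readings.items())
days = (d1 - d0).days
drift_per_day = (v1 - v0) / days  # change in Diff Time per day
print(days, round(drift_per_day, 2))  # 14 -5.29
```

A change of 74 s over 14 days works out to roughly 5.3 s per day, which agrees with the "about 5 seconds per day" figure.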

:?: I don't know whether this would cause problems... :? And couldn't Stanford simply sync the time anyway, for the heck of it?
Hope this report helps. :-)

Posted: Fri Mar 27, 2015 6:00 pm
by Joe_H
As you have noted, the Diff Time being off has not affected your system getting a WU. It does not appear to be related at all to the occasional reports of a failure to get a WU assigned, downloaded or uploaded with this server. The number of such reports is very small considering the approximately 50,000 WUs a day being downloaded from and uploaded to this server.

As to why this server has this time differential, I don't know. A PG member would have to answer that.