V7.6.9 stopped asking for GPU work, needed reboot
Posted: Fri May 01, 2020 4:16 pm
by G3WGV
Curious thing happened overnight (UK time). I have two systems, both folding with CPU and GPU slots. When I got up this morning and checked out the overnight results I found that the CPUs had been working away but neither GPU had received any new WUs. I didn't think too much about this, assuming that it was just a shortage of GPU work.
Eventually the next attempt time got to a few hours. I did the usual Pause, Fold and was surprised to see that the Next Attempt info just showed a blank, not even Unknown. I checked the log and could see the pause/fold commands but no attempt to request a WU.
After ten minutes or so and still no WU request, I closed and restarted the client: no change.
Checked Internet connections: no problem.
Eventually, I decided to reboot one of the machines. Voila! a WU almost immediately. I rebooted the other PC and, likewise, a WU within a minute or two.
Seems something died (at OS level?) or got itself out of sync, such that only a PC reboot would fix it.
It was a new one for me! Has anyone else experienced this?
System details:
AMD Ryzen 3950X, GeForce RTX 2070 Super, Windows 10
Intel i7-4770K, GeForce GTX 1650, Windows 7
John
Re: V7.6.9 stopped asking for GPU work, needed reboot
Posted: Fri May 01, 2020 4:21 pm
by bruce
Please post the applicable log segments, including the PAUSE/UNPAUSE as well as the top couple of pages.
Re: V7.6.9 stopped asking for GPU work, needed reboot
Posted: Fri May 01, 2020 4:26 pm
by G3WGV
They're gone, Bruce, as I rebooted both PCs. If it happens again I'll grab the logs before hitting the boot button. It's not a big deal, was just curious as it's the first time I've had any problem with V7.6.9. And it was odd that both PCs had the same problem despite totally different CPU architecture, O/S and GPU. Not something to spend time analysing!
Re: V7.6.9 stopped asking for GPU work, needed reboot
Posted: Fri May 01, 2020 4:45 pm
by Neil-B
The last 16 logs are stored by default in a sub directory:
Type %AppData%\FAHClient in Windows Explorer and hit Enter
A window will open which will contain F@H files and folders. Depending on your need, focus on:
A) log file -> This is the most recent log file
B) logs folder -> This contains the previous 16 log files
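For anyone who would rather collect those files from a script, here is a minimal Python sketch of the same lookup. It assumes the most recent log is named `log.txt` and that the archive folder is named `logs`; the function name `fah_log_files` is just made up for this example, so adjust as needed for your install.

```python
import os
from pathlib import Path


def fah_log_files(base_dir=None):
    """Return the current log first, then archived logs, newest first."""
    if base_dir is None:
        # %AppData%\FAHClient as described above; APPDATA is only set on Windows.
        base_dir = Path(os.environ.get("APPDATA", "")) / "FAHClient"
    base = Path(base_dir)

    found = []
    current = base / "log.txt"  # assumed name of the most recent log file
    if current.is_file():
        found.append(current)

    archive = base / "logs"  # assumed name of the folder with previous logs
    if archive.is_dir():
        older = [p for p in archive.iterdir() if p.is_file()]
        # Sort by modification time so the most recent archived log comes first.
        older.sort(key=lambda p: p.stat().st_mtime, reverse=True)
        found.extend(older)
    return found


if __name__ == "__main__":
    for path in fah_log_files():
        print(path)
```

Run it before rebooting and copy the listed files somewhere safe, so the relevant log segments can still be posted afterwards.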
Re: V7.6.9 stopped asking for GPU work, needed reboot
Posted: Fri May 01, 2020 6:57 pm
by PantherX
I suspect that you encountered this issue:
https://github.com/FoldingAtHome/fah-issues/issues/983 This can be verified if you check your log file and notice that, during a download, the percentage just stops; the only way to recover is to restart the client.
Re: V7.6.9 stopped asking for GPU work, needed reboot
Posted: Fri May 01, 2020 7:07 pm
by G3WGV
Thanks PantherX, that does look very similar. Now I have examined the logs (thanks Neil-B) I can see that in each case a download had commenced and then stopped part way through... and that was the end of that slot. It happened on both systems at about the same time, which also supports the issue 983 narrative. The only oddity is that I seemed to have to reboot, not just restart the F@H client. It's not really a problem, especially now that I am aware of it. Thanks everyone.
Re: V7.6.9 stopped asking for GPU work, needed reboot
Posted: Fri May 01, 2020 7:14 pm
by HaloJones
If your logs show you were hitting 155.247.166.220 then yes, it's bug 983 and a specifically fscked server that is misbehaving at the moment.
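A quick way to check this without reading every log by eye is a plain text search for that address. The sketch below makes no assumption about the log line format; it only looks for the IP string quoted above. The function name `lines_mentioning` and the example file path are hypothetical.

```python
from pathlib import Path

SUSPECT_SERVER = "155.247.166.220"  # address quoted in the post above


def lines_mentioning(paths, needle=SUSPECT_SERVER):
    """Return (file, line number, text) for every line containing needle."""
    hits = []
    for path in paths:
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), number, line.strip()))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python find_server.py <log file> [<log file> ...]
    for file_name, number, line in lines_mentioning(sys.argv[1:]):
        print(f"{file_name}:{number}: {line}")
```

If the address shows up right before the point where a slot went quiet, that at least matches what has been described in this thread.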
Re: V7.6.9 stopped asking for GPU work, needed reboot
Posted: Fri May 01, 2020 7:24 pm
by G3WGV
Yes, that's the one, HaloJones, both machines within a few minutes of one another. Glad we know what the problem is now. Many thanks.
Re: V7.6.9 stopped asking for GPU work, needed reboot
Posted: Fri May 01, 2020 7:26 pm
by HaloJones
G3WGV wrote: Yes, that's the one, HaloJones, both machines within a few minutes of one another. Glad we know what the problem is now. Many thanks.
We may know what the problem is, but I don't have any way to inform the people with access to the server who can actually do something about it.
Re: V7.6.9 stopped asking for GPU work, needed reboot
Posted: Fri May 01, 2020 7:49 pm
by G3WGV
That's an interesting observation. I must confess I don't really have any knowledge of the server infrastructure but I imagine it is highly distributed and therefore many different sysops. It seems rather odd that there is no way to communicate with the people that run them!
Re: V7.6.9 stopped asking for GPU work, needed reboot
Posted: Fri May 01, 2020 7:53 pm
by HaloJones
G3WGV wrote:That's an interesting observation. I must confess I don't really have any knowledge of the server infrastructure but I imagine it is highly distributed and therefore many different sysops. It seems rather odd that there is no way to communicate with the people that run them!
Yes, it's frustrating. Users of systems are often the first to notice problems. I have deep experience running customer-facing ecommerce systems, and I had access to the backends and also to client simulators that alerted the moment there was any slowdown in response times.
Here we can report the problems, but there's no guarantee that the report will be seen any time soon.
I'd kill simply for the reboot button. From what I see, the FAH servers need an automated function to reboot them regularly.
Re: V7.6.9 stopped asking for GPU work, needed reboot
Posted: Fri May 01, 2020 8:07 pm
by Joe_H
HaloJones wrote:we may know what the problem is but I don't have any way to inform the people with access to the server who can actually do something about it.
Posting here is what is available, and we do notify them. However, it can take a while to identify a specific cause and correct it to fix the problem. What it comes down to is that you can see a problem, but identifying whether it is the server itself, the network it is on, or something else entirely is a different matter.
As for rebooting, the server uptime currently is 3 hours, and about 3.5 TB of space left.
Re: V7.6.9 stopped asking for GPU work, needed reboot
Posted: Fri May 01, 2020 8:13 pm
by HaloJones
Good to hear it may be being looked at, but what's the good of an assignment server if it isn't easy to pause the directing of traffic to a server that's misbehaving?
Especially when the server in question causes a fault in the client that has been known of for over seven years?
Re: V7.6.9 stopped asking for GPU work, needed reboot
Posted: Fri May 01, 2020 8:26 pm
by Joe_H
How do you determine "misbehaving"? They may see a 99% successful download rate, and without the report would not know there is an issue other than someone disconnecting during a download.
Re: V7.6.9 stopped asking for GPU work, needed reboot
Posted: Fri May 01, 2020 8:30 pm
by PantherX
HaloJones wrote:...but what's the good of an assignment server if it isn't easy to pause the directing of traffic to a server that's misbehaving?..
I understand your POV, but this isn't like a toggle that you flip on/off in an OS/application that has a team of developers and a full customer experience team supporting it. The code is all manually written, and it takes time for a single developer to do that with limited resources. Maybe the problem isn't with the AS; it may be with the WS not telling the AS that it is on a time-out. Debugging these issues is a challenge when things work fine under normal load.
HaloJones wrote:...Especially when the server in question causes a fault in the client that has been known of for over seven years?
You might be surprised as to how many times this was "fixed" but then an edge case appeared which had to be resolved again. With each "fix" things got better, and then I think the 80/20 rule was applied: in most cases things are fine, and when they aren't, restarting the client resolves the issue, so development can focus on other things that provide more scientific value to the researchers.