V7.6.9 stopped asking for GPU work, needed reboot
Moderators: Site Moderators, FAHC Science Team
-
- Posts: 61
- Joined: Sun Mar 22, 2020 10:52 pm
- Hardware configuration: A mishmash of systems little and large, ranging from an Udoo X86 Ultra up to a new beast completed 2020-03-25 with a Ryzen 9 3950X CPU & RTX 2070 GPU. F@H seems to like the 2070!
- Location: Near Penrith, Cumbria, UK
V7.6.9 stopped asking for GPU work, needed reboot
Curious thing happened overnight (UK time). I have two systems, both folding with CPU and GPU slots. When I got up this morning and checked out the overnight results I found that the CPUs had been working away but neither GPU had received any new WUs. I didn't think too much about this, assuming that it was just a shortage of GPU work.
Eventually the next attempt time got to a few hours. I did the usual Pause, Fold and was surprised to see that the Next Attempt info just showed a blank, not even Unknown. I checked the log and could see the pause/fold commands but no attempt to request a WU.
After ten minutes or so and still no WU request, I closed and restarted the client: no change.
Checked Internet connections: no problem.
Eventually, I decided to reboot one of the machines. Voila! a WU almost immediately. I rebooted the other PC and, likewise, a WU within a minute or two.
Seems something died (at OS level?) or got itself out of sync, such that only a PC reboot would fix it.
It was a new one for me! Has anyone else experienced this?
System details:
AMD Ryzen 3950X, GeForce RTX 2070 Super, Windows 10
Intel i7-4770K, GeForce GTX 1650, Windows 7
John
Radio Amateur, light aircraft owner/pilot, computer nerd, mountaineer, organist (sort of) and, now, folder.
Re: V7.6.9 stopped asking for GPU work, needed reboot
Please post the applicable log segments, including the PAUSE/UNPAUSE as well as the top couple of pages.
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Posts: 61
- Joined: Sun Mar 22, 2020 10:52 pm
- Hardware configuration: A mishmash of systems little and large, ranging from an Udoo X86 Ultra up to a new beast completed 2020-03-25 with a Ryzen 9 3950X CPU & RTX 2070 GPU. F@H seems to like the 2070!
- Location: Near Penrith, Cumbria, UK
Re: V7.6.9 stopped asking for GPU work, needed reboot
They're gone, Bruce, as I rebooted both PCs. If it happens again I'll grab the logs before hitting the boot button. It's not a big deal, was just curious as it's the first time I've had any problem with V7.6.9. And it was odd that both PCs had the same problem despite totally different CPU architecture, O/S and GPU. Not something to spend time analysing!
Radio Amateur, light aircraft owner/pilot, computer nerd, mountaineer, organist (sort of) and, now, folder.
-
- Posts: 1996
- Joined: Sun Mar 22, 2020 5:52 pm
- Hardware configuration: 1: 2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
- Location: UK
Re: V7.6.9 stopped asking for GPU work, needed reboot
The last 16 logs are stored by default in a sub directory:
Type %AppData%\FAHClient in Windows Explorer and hit Enter
A window will open which will contain F@H files and folders. Depending on your need, focus on:
A) log file -> This is the most recent log file
B) logs folder -> This contains the previous 16 log files
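If you would rather grab the files from a script before rebooting, here is a minimal Python sketch of the same lookup. The names `log.txt` and `logs` are assumptions about how the "log file" and "logs folder" above appear on disk, and `fah_log_paths` is just a hypothetical helper name, so adjust them to whatever you actually see in the folder.

```python
import os
from pathlib import Path


def fah_log_paths(data_dir):
    """Return the current log first, then archived logs, newest first."""
    data_dir = Path(data_dir)
    current = data_dir / "log.txt"   # assumed file name of the most recent log
    archive = data_dir / "logs"      # assumed folder name for the previous logs
    paths = [current] if current.is_file() else []
    if archive.is_dir():
        # Sort archived logs by modification time, most recent first.
        older = sorted(archive.glob("*.txt"),
                       key=lambda p: p.stat().st_mtime,
                       reverse=True)
        paths.extend(older)
    return paths


if __name__ == "__main__":
    appdata = os.environ.get("APPDATA")  # only set on Windows
    if appdata:
        for path in fah_log_paths(Path(appdata) / "FAHClient"):
            print(path)
```

Copying the printed files somewhere safe before a reboot keeps the evidence even if the client rotates its logs on restart.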
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070
(Green/Bold = Active)
-
- Site Moderator
- Posts: 6986
- Joined: Wed Dec 23, 2009 9:33 am
- Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB
Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
- Location: Land Of The Long White Cloud
Re: V7.6.9 stopped asking for GPU work, needed reboot
I suspect that you encountered this issue: https://github.com/FoldingAtHome/fah-issues/issues/983 You can verify this by checking your log file: during a download, the percentage simply stops advancing, and the only way to recover is to restart the client.
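For anyone who wants to check a log for this pattern without reading it by eye, a rough Python sketch follows. It assumes log lines shaped like `HH:MM:SS:WU01:FS01:Download 45.67%` and `HH:MM:SS:WU01:FS01:Download complete`; that shape is an assumption, not something confirmed in this thread, and `stalled_downloads` is a hypothetical helper name. It reports any work unit whose last download line is a percentage with no completion line after it.

```python
import re

# Assumed line shapes (not confirmed by this thread):
#   12:35:04:WU01:FS01:Download 12.34%
#   12:35:16:WU01:FS01:Download complete
PROGRESS = re.compile(
    r"^(\d\d:\d\d:\d\d):(WU\d+):(FS\d+):Download (\d+(?:\.\d+)?)%")
COMPLETE = re.compile(
    r"^\d\d:\d\d:\d\d:(WU\d+):(FS\d+):Download complete")


def stalled_downloads(lines):
    """Map (WU, slot) to (time, percent) for downloads never seen to complete."""
    pending = {}
    for line in lines:
        line = line.strip()
        match = PROGRESS.match(line)
        if match:
            # Remember the most recent progress report for this WU and slot.
            key = (match.group(2), match.group(3))
            pending[key] = (match.group(1), float(match.group(4)))
            continue
        match = COMPLETE.match(line)
        if match:
            # The download finished, so it is no longer a candidate.
            pending.pop((match.group(1), match.group(2)), None)
    return pending
```

Note that a download still in progress when the log was captured will also be listed, so compare the reported time against the end of the log before concluding that it stalled.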
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time
Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
-
- Posts: 61
- Joined: Sun Mar 22, 2020 10:52 pm
- Hardware configuration: A mishmash of systems little and large, ranging from an Udoo X86 Ultra up to a new beast completed 2020-03-25 with a Ryzen 9 3950X CPU & RTX 2070 GPU. F@H seems to like the 2070!
- Location: Near Penrith, Cumbria, UK
Re: V7.6.9 stopped asking for GPU work, needed reboot
Thanks PantherX, that does look very similar. Now I have examined the logs (thanks Neil-B) I can see that in each case a download had commenced and then stopped part way through... and that was the end of that slot. It happened on both systems at about the same time, which also supports the issue 983 narrative. The only oddity is that I seemed to have to reboot, not just restart the F@H client. It's not really a problem, especially now that I am aware of it. Thanks everyone.
Radio Amateur, light aircraft owner/pilot, computer nerd, mountaineer, organist (sort of) and, now, folder.
Re: V7.6.9 stopped asking for GPU work, needed reboot
If your logs show you were hitting 155.247.166.220 then yes, it's bug 983 and a specifically fscked server that is misbehaving at the moment.
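A quick way to check is to search the log for that address. Here is a small Python sketch, with `lines_mentioning` as a hypothetical helper name; it only looks for the address as a whole token and makes no other assumption about the log format.

```python
import re


def lines_mentioning(lines, address="155.247.166.220"):
    """Return (line_number, text) pairs for lines containing the address."""
    # Reject matches that are part of a longer number, e.g. 1155.247.166.220.
    pattern = re.compile(r"(?<![\d.])" + re.escape(address) + r"(?!\d)")
    return [(number, line.rstrip("\n"))
            for number, line in enumerate(lines, start=1)
            if pattern.search(line)]
```

Passing an open log file works as well as a list, since the function only iterates over lines, for example `lines_mentioning(open(path, errors="replace"))`.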
single 1070
-
- Posts: 61
- Joined: Sun Mar 22, 2020 10:52 pm
- Hardware configuration: A mishmash of systems little and large, ranging from an Udoo X86 Ultra up to a new beast completed 2020-03-25 with a Ryzen 9 3950X CPU & RTX 2070 GPU. F@H seems to like the 2070!
- Location: Near Penrith, Cumbria, UK
Re: V7.6.9 stopped asking for GPU work, needed reboot
Yes, that's the one, HaloJones, both machines within a few minutes of one-another. Glad we know what the problem is now. Many thanks.
Radio Amateur, light aircraft owner/pilot, computer nerd, mountaineer, organist (sort of) and, now, folder.
Re: V7.6.9 stopped asking for GPU work, needed reboot
G3WGV wrote: Yes, that's the one, HaloJones, both machines within a few minutes of one-another. Glad we know what the problem is now. Many thanks.
we may know what the problem is but I don't have any way to inform the people with access to the server who can actually do something about it.
single 1070
-
- Posts: 61
- Joined: Sun Mar 22, 2020 10:52 pm
- Hardware configuration: A mishmash of systems little and large, ranging from an Udoo X86 Ultra up to a new beast completed 2020-03-25 with a Ryzen 9 3950X CPU & RTX 2070 GPU. F@H seems to like the 2070!
- Location: Near Penrith, Cumbria, UK
Re: V7.6.9 stopped asking for GPU work, needed reboot
That's an interesting observation. I must confess I don't really have any knowledge of the server infrastructure but I imagine it is highly distributed and therefore many different sysops. It seems rather odd that there is no way to communicate with the people that run them!
Radio Amateur, light aircraft owner/pilot, computer nerd, mountaineer, organist (sort of) and, now, folder.
Re: V7.6.9 stopped asking for GPU work, needed reboot
G3WGV wrote: That's an interesting observation. I must confess I don't really have any knowledge of the server infrastructure but I imagine it is highly distributed and therefore many different sysops. It seems rather odd that there is no way to communicate with the people that run them!
yes, it's frustrating. Users of systems often are the first to notice problems. I have deep experience running customer-facing ecommerce systems and I had access to the backends and also client simulators that alerted the moment there was any slowdown in the response times.
here we can report the problems but there's no guarantee that report will be seen any time soon.
I'd kill simply for the reboot button. from what I see the FAH servers need an automated regular reboot function to cycle them regularly.
single 1070
-
- Site Admin
- Posts: 7951
- Joined: Tue Apr 21, 2009 4:41 pm
- Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
- Location: W. MA
Re: V7.6.9 stopped asking for GPU work, needed reboot
HaloJones wrote: we may know what the problem is but I don't have any way to inform the people with access to the server who can actually do something about it.
Posting here is what is available, and we do notify them. However it can take a while to identify a specific cause and correct that to fix the problem. What it comes down to is that you can see a problem, but identifying whether it is the server itself, the network it is on, or something else entirely is a different matter.
As for rebooting, the server's uptime is currently 3 hours, and it has about 3.5 TB of space left.
iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Re: V7.6.9 stopped asking for GPU work, needed reboot
good to hear it may be being looked at but what's the good of an assignment server if it isn't easy to pause the directing of traffic to a server that's misbehaving?
Especially when the server in question causes a fault in the client that has been known of for over seven years?
single 1070
-
- Site Admin
- Posts: 7951
- Joined: Tue Apr 21, 2009 4:41 pm
- Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
- Location: W. MA
Re: V7.6.9 stopped asking for GPU work, needed reboot
How do you determine "misbehaving"? They may see a 99% successful download rate, and without the report would not know there is an issue other than someone disconnecting during a download.
iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
-
- Site Moderator
- Posts: 6986
- Joined: Wed Dec 23, 2009 9:33 am
- Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB
Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
- Location: Land Of The Long White Cloud
Re: V7.6.9 stopped asking for GPU work, needed reboot
HaloJones wrote: ...but what's the good of an assignment server if it isn't easy to pause the directing of traffic to a server that's misbehaving?..
I understand your POV but this isn't like a toggle that you flip on/off in an OS/Application that has a team of developers and full customer experience team supporting it. The code is all manually written and it takes time for a single Developer to do that with limited resources. Maybe the problem isn't with AS, it is with WS not telling the AS that it is on a time-out. Debugging these issues under normal conditions is a challenge when things just work fine under normal load.
HaloJones wrote: ...Especially when the server in question causes a fault in the client that has been known of for over seven years?
You might be surprised how many times this was "fixed", only for an edge case to appear which had to be resolved again. With each "fix" things got better, and then I think the 80/20 rule was applied: in most cases things are fine, and when they aren't, restarting the client resolves the issue, so development can focus on other things that provide more scientific value to the researchers.
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time
Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues