
Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Posted: Sat May 27, 2017 10:18 pm
by Aurum
We should be able to specify a Failover work server list.

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Posted: Sun May 28, 2017 9:46 am
by Aurum
Adam, you can try using a firewall rule to block the work server. Foldy posted one for Windows and another for Linux. Sometimes toggling Pause to Fold for the idle GPU in Advanced Control gets a WU downloaded. It's real hit or miss either way.

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Posted: Sun May 28, 2017 1:45 pm
by Sn0wy23
Have had this issue for a day or so now :roll:

Thanks to Foldy's IP block in Windows I am back running both machines.

I had tried rebooting, pausing and restarting, clearing the cache, etc., so the only fix is to direct the client away from the offending IP :twisted: until it is back running again. Cheers Foldy!

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Posted: Sun May 28, 2017 2:20 pm
by rwh202
Is the firewall block really confirmed to work?
I've applied the rule in Linux Mint and still get assigned to the offending server 9 out of 10 times. The only solution I've found is continual pausing and un-pausing to reset the throttling delay in requesting assignments, but that gets tedious on 10 rigs that need new WUs every hour or so.
How hard is it to pull the plug on the server and stop assignments to it?

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Posted: Sun May 28, 2017 2:30 pm
by Aurum
How hard is it???
Must be excruciating because it's been days.

I agree, the IP blocking does not seem to work. The AS assigns me to 171.67.108.105 every single time. Occasionally it reassigns me to another WS after a while.
Rig by rig, as they go idle, I'm moving them to crunch another project.

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Posted: Sun May 28, 2017 2:35 pm
by Aurum
Server status does not make sense. I just caught a WU by getting reassigned to 171.67.108.159 but it says WUs Avail = 0 and WUs To Go = 0.
http://fah-web.stanford.edu/pybeta/serverstat.html

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Posted: Sun May 28, 2017 2:48 pm
by Joe_H
Aurum wrote: Server status does not make sense. I just caught a WU by getting reassigned to 171.67.108.159 but it says WUs Avail = 0 and WUs To Go = 0.
http://fah-web.stanford.edu/pybeta/serverstat.html
Those fields usually do have zeros on currently active servers. Changes in the work server code since the fields were defined for serverstat have made those fields useless for telling how many WU's a particular WS has available. It has been that way for years. About the only lines for WS's that have numbers showing in those fields are for ones that are inactive.

One column that still holds meaningful information about the numbers of WU's is WUs Rcv. That shows the number of WU's collected since the last update sent to the stats server. The script that collects the logs to update the stats runs once an hour, so the number in the WUs Rcv column will climb as WU's are returned until the stats are collected.

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Posted: Sun May 28, 2017 2:58 pm
by Aurum
Then please rewrite Troubleshooting Server Connectivity Issues (Do This First) and stop telling us to do useless things.

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Posted: Sun May 28, 2017 3:10 pm
by Joe_H
Where in that topic does it tell you to check those columns? The columns are only mentioned as having information that might be informative to an expert user.

The actual troubleshooting steps do not include any checking of information in those columns. It does mention checking to see if a particular server is up, and how to do so.

As for rewriting those topics, they are on a long list of material that needs to be updated.

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Posted: Sun May 28, 2017 3:25 pm
by SombraGuerrero
I haven't experimented with the firewall solution in a Linux environment, but I can say from a Windows perspective that it is doing what I would expect. It's blocking the work server which does eventually force the logic to pick a different one. The real reason it doesn't appear to be/actually isn't particularly effective is that there's really no way to circumvent the behavior of the assignment servers picking offending work servers. I think it's probably that logic, not the work server logic, that would need to change to make the unhappy path stuff more fluid, and I imagine you'd have to change the whole pool, so I think it might be a bigger effort than it may seem.

Looking back on previous threads in this forum, I have to cut the people who maintain these servers slack. They're no different than any other type of server, really. They can fail for any of the same reasons that any server or computer can. We're all very passionate about folding, and that's awesome, but let's not forget, we're dealing with people at the end of the day -- people who work for an academic institution that can at times have a tremendous amount of red tape around getting operational things done.

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Posted: Sun May 28, 2017 3:31 pm
by SteveWillis
rwh202 wrote: Is the firewall block really confirmed to work?
I've applied the rule in Linux Mint and still get assigned to the offending server 9 out of 10 times. The only solution I've found is continual pausing and un-pausing to reset the throttling delay in requesting assignments, but that gets tedious on 10 rigs that need new WUs every hour or so.
How hard is it to pull the plug on the server and stop assignments to it?
Oh it works all right.
First, are you sure the firewall is enabled? By default it is not. I went back and added that to my earlier post, but to check:
sudo ufw status
and to enable:
sudo ufw enable

Also be aware that it will first try to assign to 105 but you'll get a connection error then it will go on to 102. Sometimes it has to go through this cycle several times before you get an assignment. It is going to take longer than what you are used to.

Here is a script I wrote to automatically pause and unpause when the client appears to be hung up. It loops every 15 minutes. I modified it this morning and it hasn't needed to do its thing yet, so it is not thoroughly tested. Use at your own risk.

Code:

#!/bin/bash
# Pause and unpause slots 0-3 when the latest connection-related log entry
# shows a refused connection or an assignment attempt. Checks every 15 minutes.
cd /var/lib/fahclient || exit 1

while true
do
    # Take the most recent connection-related log line and test whether it
    # mentions "refused" or "assign".
    grep -Ei "Connected|assign|refused|Upload|Download" log.txt | tail -1 | grep -E "refused|assign"
    results=$?
    echo "$(date)    results = $results"
    if [ "$results" -eq 0 ]
    then
        echo "PAUSED *******  $(date) "
        # Pause every slot through the client's command port (36330).
        for slot in 0 1 2 3
        do
            echo -e "pause $slot\nquit" | nc localhost 36330 &> /dev/null
        done
        sleep 10
        # Unpause the slots so each one requests a new assignment.
        for slot in 0 1 2 3
        do
            echo -e "unpause $slot\nquit" | nc localhost 36330 &> /dev/null
        done
    fi
    sleep 900
done



Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Posted: Sun May 28, 2017 4:00 pm
by rwh202
SteveWillis wrote:
rwh202 wrote: Is the firewall block really confirmed to work?
I've applied the rule in Linux Mint and still get assigned to the offending server 9 out of 10 times. The only solution I've found is continual pausing and un-pausing to reset the throttling delay in requesting assignments, but that gets tedious on 10 rigs that need new WUs every hour or so.
How hard is it to pull the plug on the server and stop assignments to it?
Oh it works all right.
First, are you sure the firewall is enabled? By default it is not. I went back and added that to my earlier post, but to check:
sudo ufw status
and to enable:
sudo ufw enable

Also be aware that it will first try to assign to 105 but you'll get a connection error then it will go on to 102. Sometimes it has to go through this cycle several times before you get an assignment. It is going to take longer than what you are used to.
Yeah, I enabled the firewall, but I see the same behaviour regardless of whether I block the offending server in the firewall or not. It still fails and goes through the loop of usually getting reassigned to the same one, failing again, and so on.

Thanks for the pause / unpause script though. I'll brush up on my grep and see how I can run it on a per slot basis and just pause the stalled one.
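
A per-slot variant of the check might look roughly like the sketch below. It assumes that each relevant log line carries a slot tag such as `:FS01:`, which is how FAHClient v7 logs usually look but is not confirmed anywhere in this thread. The function names `slot_is_stalled` and `kick_slot` are made up for illustration, and none of this has been run against a live client. The command port (36330) and the pause/unpause commands are taken from the script above.

```shell
#!/bin/sh
# Sketch only: per-slot version of the "last relevant log line" check.
# Assumption: each relevant log line contains a slot tag like ":FS01:".

# slot_is_stalled LOGFILE SLOT
# Returns 0 when the most recent connection-related line for SLOT
# mentions "refused" or "assign", and non-zero otherwise.
slot_is_stalled() {
    logfile=$1
    slot=$2
    tag=$(printf ':FS%02d:' "$slot")
    grep -F "$tag" "$logfile" \
        | grep -Ei "Connected|assign|refused|Upload|Download" \
        | tail -1 \
        | grep -Eq "refused|assign"
}

# kick_slot SLOT
# Pause, wait, and unpause a single slot through the client's command port.
kick_slot() {
    slot=$1
    printf 'pause %s\nquit\n' "$slot" | nc localhost 36330 >/dev/null 2>&1
    sleep 10
    printf 'unpause %s\nquit\n' "$slot" | nc localhost 36330 >/dev/null 2>&1
}
```

Inside the 15-minute loop, something like `slot_is_stalled /var/lib/fahclient/log.txt 1 && kick_slot 1` would then restart only slot 1 and leave the working slots alone.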

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Posted: Mon May 29, 2017 2:40 am
by SteveWillis
Mine has been running all day without missing a beat and the script hasn't triggered the pause/unpause even once. I'm going to show you my firewall settings. I messed around with them some and maybe it will be some help.

Code:

Status: active

To                         Action      From
--                         ------      ----
Anywhere                   REJECT      171.67.108.105            
Anywhere                   ALLOW       171.67.108.102            

Anywhere                   REJECT OUT  171.67.108.105            
Anywhere                   ALLOW OUT   171.67.108.102            
171.67.108.102             ALLOW OUT   Anywhere                  
171.67.108.105             REJECT OUT  Anywhere                  
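
For anyone trying to reproduce rules like these, the sketch below prints ufw commands that should create the incoming rules and the outgoing "to" rules for one blocked and one allowed work server. It assumes standard `ufw` rule syntax, and the helper name `ufw_rules_for` is made up. It does not cover every line of the listing above. The commands are only echoed, so they can be reviewed and then run by hand.

```shell
#!/bin/sh
# Sketch only: print (do not run) ufw commands for one blocked and one
# allowed work server. The helper name is hypothetical.

# ufw_rules_for BLOCKED_IP ALLOWED_IP
ufw_rules_for() {
    blocked=$1
    allowed=$2
    # Incoming traffic from each server.
    echo "sudo ufw reject from $blocked"
    echo "sudo ufw allow from $allowed"
    # Outgoing traffic to each server.
    echo "sudo ufw reject out to $blocked"
    echo "sudo ufw allow out to $allowed"
}

ufw_rules_for 171.67.108.105 171.67.108.102
```

After running the printed commands, `sudo ufw status` should show whether the rules took effect and whether the firewall is active at all.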

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Posted: Mon May 29, 2017 12:36 pm
by boristsybin
Still no comments from the support team?

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Posted: Mon May 29, 2017 1:58 pm
by Joe_H
I have heard back that it is being looked into, but I have nothing further to post. The first reports came in on a Friday evening and were reported to PG on Saturday morning. This is a relatively major holiday weekend, so limited staff would be available to work on this.