155.247.166.220 downloads stalled


NormalDiffusion
Posts: 124
Joined: Sat Apr 18, 2020 1:50 pm

Re: 155.247.166.220 downloads stalled

Post by NormalDiffusion »

Still problems with this server. Downloads get stuck anywhere between 0 and 100%...
Celso Azevedo
Posts: 7
Joined: Wed Dec 18, 2013 9:56 pm

Re: 155.247.166.220 downloads stalled

Post by Celso Azevedo »

bruce wrote:155.247.166.220 has been the target of cyber attacks on and off for the past several days. (I have to wonder why somebody has decided to do that.)
Knowing that servers have problems from time to time, shouldn't the client handle server-side issues better? I mean, this didn't start with covid-19, we've been dealing with similar issues for years.
Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: 155.247.166.220 downloads stalled

Post by Neil-B »

Over the years various approaches have been tried, such as slowing down retry attempts and adding collection servers (CS), but a WU basically has to return to the server it came from. Short of rearchitecting the whole process (which may be an option), there isn't much more that can be done. Server issues in the past have (afaik) been rarer, partly because throughput was lower, but also potentially because the infrastructure was more stable (as in not changing much). Over the last few months there have been (again afaik) many more issues, and this has understandably proved challenging for many to cope with. The simple fact is that during a period of rapid, unforeseen (in the big scheme of things) expansion of infrastructure on an unprecedented scale, with significant changes to structures and types of projects, change has been bumpy to say the least.

You are absolutely right, it didn't start with covid-19, but it has become more common and people are more sensitive to it.

In the case of this server, iirc there has possibly been an issue with some form of DDoS attack. Until that has been resolved and any backlog queue cleared, there may well be a variety of delays/issues.
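To illustrate the "slowing down retry attempts" idea, here is a minimal Python sketch of exponential backoff with optional jitter. It is not FAHClient's actual retry logic; the function name `backoff_delays` and every default value are assumptions chosen for the example.

```python
import random


def backoff_delays(base=1.0, factor=2.0, cap=300.0, attempts=6, jitter=0.0, rng=None):
    """Yield the wait (in seconds) before each retry attempt.

    All names and defaults here are hypothetical, picked for illustration.
    """
    rng = rng or random.Random()
    delay = base
    for _ in range(attempts):
        # Optional jitter spreads clients out so they do not retry in lockstep.
        spread = delay * jitter
        yield min(cap, delay + rng.uniform(-spread, spread))
        # Grow the delay geometrically, but never beyond the cap.
        delay = min(cap, delay * factor)


for wait in backoff_delays(attempts=5):
    print(f"retry after {wait:.0f}s")
```

The cap keeps a long outage from pushing the wait out indefinitely, so a client would still notice fairly soon once the server is back.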
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

(Green/Bold = Active)
Celso Azevedo
Posts: 7
Joined: Wed Dec 18, 2013 9:56 pm

Re: 155.247.166.220 downloads stalled

Post by Celso Azevedo »

Independently of how things work server side, scaling problems, attacks, etc, the F@H client is terrible at handling these issues.

This server is being ddos'ed... ok, that's bad, but why does the client stall? Why do I need to reboot the machine to get it to work again?

All servers were having issues 2 or 3 months ago. Understandable as lots of people joined the project, but the client would stay there doing nothing for hours or days even when servers were working again. We had to manually do something (removing/adding slots, rebooting, etc) for it to start again.

Even if you have the resources and capacity, servers will have issues. Hardware failures, network problems, software updates gone wrong, DDoS, etc. Clients need to handle these issues without crashing or stalling.

I've been folding 24/7 since 2016 with a dedicated machine and mostly without issues. Now it needs constant babysitting, and the computer is inside a garden shed, so temperature and humidity become a problem when it's not running. I have no other choice but to stop folding and do something else with the hardware.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 155.247.166.220 downloads stalled

Post by bruce »

Celso Azevedo wrote:Independently of how things work server side, scaling problems, attacks, etc, the F@H client is terrible at handling these issues.

This server is being ddos'ed... ok, that's bad, but why does the client stall? Why do I need to reboot the machine to get it to work again?

All servers were having issues 2 or 3 months ago. Understandable as lots of people joined the project, but the client would stay there doing nothing for hours or days even when servers were working again. We had to manually do something (removing/adding slots, rebooting, etc) for it to start again.

Even if you have the resources and capacity, servers will have issues. Hardware failures, network problems, software updates gone wrong, DDoS, etc. Clients need to handle these issues without crashing or stalling.

I've been folding 24/7 since 2016 with a dedicated machine and mostly without issues. Now it needs constant babysitting, and the computer is inside a garden shed, so temperature and humidity become a problem when it's not running. I have no other choice but to stop folding and do something else with the hardware.
You're frustrated, just like the rest of us. Nobody will dispute any of your claims except the last one. You do have another choice.

Let me point out that FAH is almost exclusively supported by volunteers. We don't sell any product, nor do we create patentable drugs that can be sold for a profit. The results that are generated by volunteers like yourself with the current software are put in the public domain so other scientists can use them without paying a royalty.

We simply ask you to volunteer your services to FAH. If you are not satisfied with the level of support that's being provided, you're welcome to help. Visit https://github.com/FoldingAtHome/fah-issues/issues/983 and volunteer to propose a fix for FAHClient's code.
HaloJones
Posts: 906
Joined: Thu Jul 24, 2008 10:16 am

Re: 155.247.166.220 downloads stalled

Post by HaloJones »

Every user has the option to block this one IP address. The client is quite happy if it fails to connect at all: it takes a few failures over a few minutes, but it then goes back to the assignment server and talks to a different server to download units.

Even the Windows Firewall allows "Custom rules" to block all traffic to a specific IP address. I block it on my ISP router.
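The fallback behaviour described above can be sketched roughly as follows. This is an illustration only, not FAHClient's code: `assign` and `download` are hypothetical callables standing in for the assignment server request and the work unit download, and the failure limits are made-up values.

```python
def fetch_work(assign, download, max_failures=3, max_servers=5):
    """Try servers in turn until one delivers a work unit.

    assign(failed)   -> a server not in `failed`, or None        (hypothetical)
    download(server) -> the work unit, or raises ConnectionError (hypothetical)
    """
    failed = set()
    for _ in range(max_servers):
        server = assign(failed)
        if server is None:
            return None  # nothing left to try
        for _ in range(max_failures):
            try:
                return server, download(server)
            except ConnectionError:
                pass  # count the failure and try the same server again
        failed.add(server)  # give up on this server and ask for another
    return None
```

A blocked server fails every attempt, so after `max_failures` tries the loop moves on to whatever `assign` offers next.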

Google is your friend
single 1070

Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: 155.247.166.220 downloads stalled

Post by Neil-B »

Celso Azevedo wrote:I've been folding 24/7 since 2016 with a dedicated machine and mostly without issues.
My point exactly ... It is a recent (in the big scheme of things) problem that is front and center in everyone's awareness at the moment, and it will no doubt improve.

Lessons are being learnt, and future cores/clients may well have various adjustments to try to avoid some of these issues, but these types of changes take time. Obviously some of the issues that caused folding kit to idle (not enough WUs) would be very difficult to work around: if there isn't any work, it is hard to hand it out. The server-side comms/loading issues have improved significantly; quite a few of these seemed to be due to the rapid stand-up of infrastructure and the overloading of existing infrastructure caused by rapid growth in folder resources. There are still some issues around client/server connections being lost or stalled, and this may be harder to resolve, but from comments I have seen elsewhere the team are aware of this and will no doubt be trying to find a resolution.

Frustration is understandable, but as has been mentioned you do have many choices tbh. For most people I tend to say "be patient", as honestly these things do tend to be resolved in time. However, with your humidity issues you might be best advised to take one of those "choices", be it blocking an IP (which would only work for the current variant of the server issues) or finding another distributed project that meets your needs. This isn't me dismissing your concerns; rather, I'm trying to look at the whole picture pragmatically, recognising that the changes you are after probably won't happen overnight and that you have a problem that needs a more immediate solution.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 155.247.166.220 downloads stalled

Post by bruce »

If you block a connection to a server that's being used as a collection server for some work server, you're not helping yourself. You have to go back to the associated work server and prevent yourself from receiving an assignment from that work server. There's no point in getting an assignment from WS1 if it can't receive the result, so your client then tries to send it to CS2, but you've blocked that connection.
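To make that dependency concrete, here is a small Python sketch. The server names and the mapping are invented for the example (the real WS-to-CS pairings are not listed in this thread); it only shows the reasoning that blocking a collection server means you should also avoid the work servers that rely on it.

```python
def work_servers_to_avoid(collection_servers, blocked):
    """Return unblocked work servers whose collection server is blocked.

    collection_servers maps each work server to the collection servers it
    uses as a fallback for uploads (hypothetical data for illustration).
    """
    return sorted(
        ws
        for ws, cs_list in collection_servers.items()
        if ws not in blocked and any(cs in blocked for cs in cs_list)
    )


# Hypothetical layout: WS1 falls back to CS2, WS3 falls back to CS4.
layout = {"WS1": ["CS2"], "WS3": ["CS4"]}
print(work_servers_to_avoid(layout, blocked={"CS2"}))  # ['WS1']
```

Any work server this returns can still hand you an assignment, but if it later cannot accept the result, the fallback upload path is one you have blocked yourself.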
Celso Azevedo
Posts: 7
Joined: Wed Dec 18, 2013 9:56 pm

Re: 155.247.166.220 downloads stalled

Post by Celso Azevedo »

I understand that you are all trying to help when you suggest blocking the IP, but you need to understand that most users don't touch their firewall. I know how to do it, but blocking IPs falls into the "babysitting" I mentioned (and I lack the time to do it) and doesn't address the main issue: the client.

While it's true that these issues happen more often now because of the influx of new users, DDoS and server issues aren't a new thing. This forum even has a section for server issues. Shouldn't a client handle these issues gracefully? Can you imagine your OS or browser stalling just because it tried to download something but the server was down or took too long to download?
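As a rough idea of what "handling it gracefully" could look like, here is a Python sketch of a download loop with a stall watchdog. It is not based on FAHClient's code: `read_chunk`, `StalledDownload` and the 60-second limit are all assumptions made up for the example.

```python
import time


class StalledDownload(Exception):
    """Raised when no data has arrived for longer than the allowed time."""


def download(read_chunk, total, stall_timeout=60.0, clock=time.monotonic):
    """Collect `total` bytes, giving up if progress stops for too long.

    read_chunk() is a hypothetical callable that waits briefly for data and
    returns b"" when none has arrived yet.
    """
    received = bytearray()
    last_progress = clock()
    while len(received) < total:
        chunk = read_chunk()
        if chunk:
            received += chunk
            last_progress = clock()  # progress made, reset the watchdog
        elif clock() - last_progress > stall_timeout:
            raise StalledDownload(
                f"no data for {stall_timeout}s at {len(received)}/{total} bytes"
            )
    return bytes(received)
```

The caller can catch `StalledDownload`, drop the connection and try again or try elsewhere, instead of sitting at some percentage forever.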

I'm not frustrated because servers are down, because there are no WUs or because a WU can't be uploaded because disks are full. I understand why it's happening and understand how hard it is to improve things when you don't have the manpower and other resources to improve the situation. The frustration comes from the fact that the client stops working because 1 server is having issues.

If someone that works on the client reads this, I'm sorry for being harsh, but the project asks people to fold and then provides a client that stalls if a single server is DDoSed. It just drives people away.

Bruce, I would fix the 2013(!) GitHub issue you linked to, but I don't know how to code. I try to help by folding.