Collection Server 140.163.4.200:8080
Moderators: Site Moderators, FAHC Science Team
-
- Posts: 1996
- Joined: Sun Mar 22, 2020 5:52 pm
- Hardware configuration: 1: 2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
- Location: UK
Re: Collection Server 140.163.4.200:8080
If you are happy rebooting then hopefully the fault will prove to be ephemeral and at some point you will find you no longer need to ... If it becomes too much of a pain then looking into the K Monitoring settings would be the next step ... and if you want to try that anyways then hey why not ... Curiosity may be terminal for furry feline friends but it is what makes life interesting imho !!
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070
(Green/Bold = Active)
Re: Collection Server 140.163.4.200:8080
No, I'm not happy with the rebooting "solution"! It's barely a bandaid, or a troubleshooting step at best! I just checked and I have 5 of the troublesome WUs stuck and am working on another one this morning. After some coffee kicks in I'll take a shot at deleting one of the WU folders and see how that works. Then, just for "science", I'll try turning off Network Monitoring. For that I'm going to pause Folding a few minutes before the WU completes, because you have to reboot for the change to Network Monitoring to take effect; that way I won't have to leave my network open while waiting to get one of the WUs and for it to complete. And I don't consider leaving my network open a solution either; I just want to see if it works. I might not be able to do it with the WU in progress but I'm pretty sure I'll catch one today!
-
- Posts: 1996
- Joined: Sun Mar 22, 2020 5:52 pm
- Hardware configuration: 1: 2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
- Location: UK
Re: Collection Server 140.163.4.200:8080
If it is the K network monitoring then you can report it to K support, as it would be something their software is doing that is interfering with the comms ... I understand this is frustrating - sorry if my previous post came across as flippant, as it wasn't meant to ... make sure you pause just after a checkpoint has been saved - the logs will show this - that way you shouldn't lose any work.
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070
(Green/Bold = Active)
Re: Collection Server 140.163.4.200:8080
NO! I did NOT consider your comment flippant in any way! You've been nothing but professional and very helpful! What I don't agree with is FAH's considering rebooting to be a "solution", but I do understand why they are OK with it. I do consider your suggestion (and help finding the darn folder) to just nuke the WUx folder my best available option given the circumstances, as it worked perfectly! Once I deleted a WU's folder and paused and unpaused the FAH app, the WU had been banished!
As usual, like an idiot, I hadn't ever even considered just Pausing "K" until now!
I just paused it and let several of the stuck WUs run their full gauntlet of upload from 0 to 99% uploaded and they failed. Also tried pausing the FAH app and let some try again from 0 and they still failed. I think that could just mean once a WU fails it will always fail until removed or the PC is rebooted, possibly regardless of "K".
Since just pausing "K" is so much easier than stopping Network Monitoring I'll give that a try on the next evil WU before it starts its first upload. Would you agree that would be a definitive test to rule out or confirm "K" as the culprit and I could jump over to "K" support if the WU completes correctly?
As always, many thanks for all your help and educating me!
-
- Posts: 1996
- Joined: Sun Mar 22, 2020 5:52 pm
- Hardware configuration: 1: 2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
- Location: UK
Re: Collection Server 140.163.4.200:8080
I actually meant pausing the fah folding slot before turning off K - but if it is possible to pause K (iirc Network Monitoring isn't paused when the K AV is paused/switched off) then that is a good first check ... tbh past folders iirc turned it off - but what is possible and the specific settings may vary between versions.
If either pausing or turning off the K Monitoring stops the comms issue then K support would be the way to go - but how much help they might be able to be will have to be seen ... Good Luck with your testing.
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070
(Green/Bold = Active)
-
- Posts: 1996
- Joined: Sun Mar 22, 2020 5:52 pm
- Hardware configuration: 1: 2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
- Location: UK
Re: Collection Server 140.163.4.200:8080
Craig,
It has just crossed my mind - you may have already had this thought and checked - you might even have mentioned it in this thread (but I haven't reread it all) - but does K. log when it blocks/alters connections?
I ask as I just checked my McAfee (very different beast I know but K. may do the same) and as part of their "Hey we are brilliant and have done lots of stuff protecting you" approach they have a security history log that shows all the stuff it has blocked - by Time, Date, and IP address ... Now if K. has the same somewhere and you know what time/date a wu failed to get the correct final upload messages there might be a remote chance of you spotting it in the K security log.
If it does have a log and you can see the blocking going on then that is fairly clear evidence.
If it does have a log but there is nothing obvious, that unfortunately doesn't put K Monitoring in the clear (I wish it did), as K. does have the ability to "inject" stuff into comms and it does this (from reading up a bit) based on a variety of techniques (iirc I saw 124 mentioned somewhere). Therefore, whilst it may not be blocking, it may be altering the comms from the server in a way that the FAHClient on your machine can't read.
If it doesn't have a log system then ignore my thoughts.
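If K. does turn out to have such a log, the comparison described above could be sketched roughly like this. This is only an illustration: it assumes you have already copied the blocked-connection entries out by hand as (time, IP) pairs, it makes no claim about what a K. log actually looks like, and the function name and sample times are made up.

```python
from datetime import datetime, timedelta

# IP of the collection server, taken from the thread title.
COLLECTION_SERVER_IP = "140.163.4.200"


def find_matching_blocks(failure_times, blocked_entries,
                         window_minutes=5, ip=COLLECTION_SERVER_IP):
    """Return (failure_time, blocked_time) pairs where a blocked connection
    to `ip` was logged within `window_minutes` of a failed upload.

    `blocked_entries` is a list of (datetime, ip_string) tuples that you
    have transcribed yourself; the helper is hypothetical, not part of
    FAH or any antivirus product.
    """
    window = timedelta(minutes=window_minutes)
    matches = []
    for failed_at in failure_times:
        for blocked_at, blocked_ip in blocked_entries:
            if blocked_ip == ip and abs(blocked_at - failed_at) <= window:
                matches.append((failed_at, blocked_at))
    return matches


# Made-up sample data, purely to show the shape of the inputs.
failures = [datetime(2021, 1, 5, 9, 30)]
blocked = [
    (datetime(2021, 1, 5, 9, 32), "140.163.4.200"),  # close in time, right IP
    (datetime(2021, 1, 5, 14, 0), "140.163.4.200"),  # right IP, hours later
    (datetime(2021, 1, 5, 9, 31), "10.0.0.1"),       # close in time, other IP
]
print(find_matching_blocks(failures, blocked))
```

A match would only be a hint rather than proof, and an empty result would not clear K Monitoring, for the reason given above about altered rather than blocked comms.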
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070
(Green/Bold = Active)
Re: Collection Server 140.163.4.200:8080
Sorry for my long absence, I had to just put this out of my mind for a while.
I had removed the folder and everything seemed to be working perfectly. However, as soon as it completed a WU and tried to start a new one that wanted to use a folder with the same name as the one I deleted, the computer crashed hard with the message “FahCore_22.exe has stopped working” and I could do nothing except power down the PC. When it came back up the monitor display was totally scrambled, and the keyboard and mouse seemed dead. I left it off for a couple of hours, tried again, and it seemed to be OK and was folding, so I just ignored it for a few days. When I finally checked there were a bunch of stuck WUs, and even some new Projects had started failing. I said screw it and just let it keep going. I finally decided to just try running Linux, disconnecting all other drives and using my Guest Network, as I have no idea about viruses or intrusion on Linux. My plan was to do it early this morning, LOL, but it seems like FAH's servers are down: both of my PCs have WUs failing to upload, the FAH app shows they haven’t been credited as completed, and I can’t even ping the FAH servers. So I guess I picked the wrong day, but at least nothing on my end is causing the problem this time.
I know this stuff happens and am not worried. Hopefully, they will get everything up and running again and then I’ll give the Linux thing a try!
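For what it's worth, a failed ping on its own doesn't prove a server is down, since ping (ICMP) can be filtered while ordinary TCP connections still work. A quick way to check the actual port might look like the sketch below. The address and port are simply the ones from this thread's title; the function name is made up, and a successful connect only shows the port is reachable, not that uploads will go through.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) can be opened
    within `timeout` seconds, False otherwise."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Collection server address and port as given in the thread title.
    host, port = "140.163.4.200", 8080
    state = "reachable" if can_connect(host, port) else "not reachable"
    print(f"{host}:{port} is {state} over TCP")
```

Running it from both PCs, and again with K. paused, would show whether the result changes with the machine or with K.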