140.163.4.200
Moderators: Site Moderators, FAHC Science Team
-
- Pande Group Member
- Posts: 467
- Joined: Fri Feb 22, 2013 9:59 pm
Re: 140.163.4.200
Folks: The new work server (pllwskifah1.mskcc.org) ended up in a weird state that was not receiving WUs even though the WS appeared to be running normally. We've restarted it, and it's now receiving the backlog of results.
Please let us know if you notice this happening again! We'll also try to keep a close eye on it and try to figure out what went wrong here.
Apologies for this---it might be the new big NFS storage we mounted on the WS to attempt to avoid out-of-space issues.
~ John Chodera // MSKCC
-
- Posts: 320
- Joined: Sat May 23, 2009 4:49 pm
- Hardware configuration: eVga x299 DARK 2070 Super, eVGA 2080, eVga 1070, eVga 2080 Super
MSI x399 eVga 2080, eVga 1070, eVga 1070, GT970 - Location: Mississippi near Memphis, Tn
Re: 140.163.4.200
My backlog is slowly disappearing. I had 7 and now it's down to 3, so progress is being made. Thanks a lot for the fix.
I'm folding because Dec 2005 I had radical prostate surgery.
Lost brother to spinal cancer, brother-in-law to prostate cancer.
Several 1st cousins lost and a few who have survived.
Re: 140.163.4.200
JohnChodera wrote:Please let us know if you notice this happening again! We'll also try to keep a close eye on it and try to figure out what went wrong here.
~ John Chodera // MSKCC

Can we keep it at zero weight through the weekend unless someone is going to actively keep an eye on it? I'd rather not have my GPUs idled for two days if possible (the science must compute!).
-
- Posts: 320
- Joined: Sat May 23, 2009 4:49 pm
- Hardware configuration: eVga x299 DARK 2070 Super, eVGA 2080, eVga 1070, eVga 2080 Super
MSI x399 eVga 2080, eVga 1070, eVga 1070, GT970 - Location: Mississippi near Memphis, Tn
Re: 140.163.4.200
Spoke too soon. This just happened a few minutes ago.
Edit: this problem resolved itself a few minutes later. Just slow.
Code: Select all
15:40:05:WU04:FS01:Connecting to assign1.foldingathome.org:80
15:40:05:WU04:FS01:Assigned to work server 140.163.4.200
15:40:05:WU04:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:TU104 [GeForce RTX 2070 SUPER] from 140.163.4.200
15:40:05:WU04:FS01:Connecting to 140.163.4.200:8080
15:40:26:WARNING:WU04:FS01:WorkServer connection failed on port 8080 trying 80
15:40:26:WU04:FS01:Connecting to 140.163.4.200:80
15:40:48:ERROR:WU04:FS01:Exception: Failed to connect to 140.163.4.200:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
15:40:48:WU04:FS01:Connecting to assign1.foldingathome.org:80
15:40:48:WU04:FS01:Assigned to work server 140.163.4.200
15:40:48:WU04:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:TU104 [GeForce RTX 2070 SUPER] from 140.163.4.200
15:40:48:WU04:FS01:Connecting to 140.163.4.200:8080
15:41:09:WARNING:WU04:FS01:WorkServer connection failed on port 8080 trying 80
15:41:09:WU04:FS01:Connecting to 140.163.4.200:80
15:41:31:ERROR:WU04:FS01:Exception: Failed to connect to 140.163.4.200:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
15:41:47:WU02:FS01:0x22:Completed 1000000 out of 1000000 steps (100%)
15:41:47:WU02:FS01:0x22:Average performance: 83.8835 ns/day
15:41:48:WU04:FS01:Connecting to assign1.foldingathome.org:80
15:41:48:WU04:FS01:Assigned to work server 140.163.4.200
15:41:48:WU04:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:TU104 [GeForce RTX 2070 SUPER] from 140.163.4.200
15:41:48:WU04:FS01:Connecting to 140.163.4.200:8080
15:41:54:WU02:FS01:0x22:Saving result file ..\logfile_01.txt
15:41:54:WU02:FS01:0x22:Saving result file checkpointState.xml.bz2
15:41:55:WU02:FS01:0x22:Saving result file globals.csv
15:41:55:WU02:FS01:0x22:Saving result file positions.xtc
15:41:55:WU02:FS01:0x22:Saving result file science.log
15:41:55:WU02:FS01:0x22:Folding@home Core Shutdown: FINISHED_UNIT
15:41:56:WU02:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
15:41:56:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:13426 run:1456 clone:20 gen:4 core:0x22 unit:0x0000000812bc7d9a5f57207fe28d1881
15:41:56:WU02:FS01:Uploading 5.70MiB to 18.188.125.154
15:41:56:WU02:FS01:Connecting to 18.188.125.154:8080
15:42:02:WU02:FS01:Upload 55.94%
15:42:07:WU02:FS01:Upload complete
15:42:07:WU02:FS01:Server responded WORK_ACK (400)
15:42:07:WU02:FS01:Final credit estimate, 176071.00 points
15:42:07:WU02:FS01:Cleaning up
15:42:09:WARNING:WU04:FS01:WorkServer connection failed on port 8080 trying 80
15:42:09:WU04:FS01:Connecting to 140.163.4.200:80
15:42:31:ERROR:WU04:FS01:Exception: Failed to connect to 140.163.4.200:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
15:43:25:WU04:FS01:Connecting to assign1.foldingathome.org:80
15:43:26:WU04:FS01:Assigned to work server 140.163.4.200
15:43:26:WU04:FS01:Requesting new work unit for slot 01: READY gpu:0:TU104 [GeForce RTX 2070 SUPER] from 140.163.4.200
15:43:26:WU04:FS01:Connecting to 140.163.4.200:8080
15:43:47:WARNING:WU04:FS01:WorkServer connection failed on port 8080 trying 80
15:43:47:WU04:FS01:Connecting to 140.163.4.200:80
15:44:08:ERROR:WU04:FS01:Exception: Failed to connect to 140.163.4.200:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
15:46:02:WU04:FS01:Connecting to assign1.foldingathome.org:80
15:46:02:WU04:FS01:Assigned to work server 140.163.4.200
15:46:03:WU04:FS01:Requesting new work unit for slot 01: READY gpu:0:TU104 [GeForce RTX 2070 SUPER] from 140.163.4.200
15:46:03:WU04:FS01:Connecting to 140.163.4.200:8080
15:46:24:WARNING:WU04:FS01:WorkServer connection failed on port 8080 trying 80
15:46:24:WU04:FS01:Connecting to 140.163.4.200:80
15:46:45:ERROR:WU04:FS01:Exception: Failed to connect to 140.163.4.200:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
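The pattern in that log — try port 8080, fall back to port 80, then go back to the assignment server when both time out — can be sketched as follows. This is a simplified illustration of the observed behavior, not the actual client code; `try_work_server` is a hypothetical helper name:

```python
import socket

def try_work_server(host, ports=(8080, 80), timeout=20):
    """Attempt a TCP connection to the work server on each port in turn,
    mirroring the client's 8080 -> 80 fallback seen in the log above."""
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return port  # connected: the server is reachable on this port
        except OSError:
            continue  # timed out or refused; try the next port
    return None  # both ports failed; the client re-contacts assign1
```

When this returns None, the client logs the "Failed to connect" exception and asks assign1.foldingathome.org for a (possibly different) work server again.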
I'm folding because Dec 2005 I had radical prostate surgery.
Lost brother to spinal cancer, brother-in-law to prostate cancer.
Several 1st cousins lost and a few who have survived.
-
- Pande Group Member
- Posts: 467
- Joined: Fri Feb 22, 2013 9:59 pm
Re: 140.163.4.200
Looks like the server ended up not accepting 80/8080 again. We're going to keep it on weight 0 for a while to monitor.
~ John Chodera // MSKCC
Re: 140.163.4.200
JohnChodera wrote:Looks like the server ended up not accepting 80/8080 again. We're going to keep it on weight 0 for a while to monitor.
~ John Chodera // MSKCC

I have two WUs from it right now:
13436 (22, 5, 2)
13433 (63, 0, 2) completed successfully with no retries 157.664 ns/day
I'll report back in when they finish if they upload or not.
Re: 140.163.4.200
My two work units have since been uploaded. Thanks for fixing this.
Re: 140.163.4.200
project:13436 run:22 clone:5 gen:2 core:0x22 did upload... but it took forever; something is seriously messed up with that server.
-
- Pande Group Member
- Posts: 467
- Joined: Fri Feb 22, 2013 9:59 pm
Re: 140.163.4.200
Update: it looks like the issue is with an underperforming NFS mount. We're investigating.
Thanks for your patience!
~ John Chodera // MSKCC
Re: 140.163.4.200
I'm noticing this connection is super slow and keeps timing out.
Re: 140.163.4.200
Can anyone even ping this server?
Pinging 140.163.4.200 with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.
-
- Site Admin
- Posts: 7937
- Joined: Tue Apr 21, 2009 4:41 pm
- Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2 - Location: W. MA
Re: 140.163.4.200
The server is behind the MSKCC firewall, which blocks pings. If you want to check whether the server is up, just enter the IP address into a browser window.
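Since ICMP is filtered, an HTTP request is the reliable liveness check. A small sketch of the "enter the IP in a browser" advice, under the assumption that any HTTP response at all counts as "up" (`http_alive` is a hypothetical helper, not part of the FAH client):

```python
from http.client import HTTPConnection, HTTPException

def http_alive(host, port=80, timeout=10):
    """Browser-style check: any HTTP response (even an error page) means
    the web server process answered; a filtered ping proves nothing."""
    conn = HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        conn.getresponse()
        return True
    except (OSError, HTTPException):
        return False  # timed out, refused, or not speaking HTTP
    finally:
        conn.close()
```

For a work server you would probe port 8080 first, then 80, matching the ports the client itself tries.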
iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
-
- Posts: 21
- Joined: Tue May 30, 2017 4:55 am
Re: 140.163.4.200
Going to vent here a bit. The collection server (140.163.4.210) tied to this work server has been barely functional for half of December and is still 90% dead today.
I've got no less than 20 completed work units, some days old with 100+ retries, still waiting for the damned server to fix itself.
Can't admins at least set up some sort of redirect?! If 30% of my daily output is just going to be flushed down the drain anyway, then I might as well be running Nicehash...
-
- Posts: 1996
- Joined: Sun Mar 22, 2020 5:52 pm
- Hardware configuration: 1: 2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21 - Location: UK
Re: 140.163.4.200
Still happens a bit, but it's better than April to June last year. It's worth posting here, as the core team can get the message to the people who look after each impacted server. Over weekends and holidays, issues can be more noticeable, and some of the servers are in different timezones where getting responses can be trickier.
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070
(Green/Bold = Active)
-
- Site Moderator
- Posts: 6986
- Joined: Wed Dec 23, 2009 9:33 am
- Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB
Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400 - Location: Land Of The Long White Cloud
Re: 140.163.4.200
FYI, the CS 140.163.4.210 has an uptime of about 1 hour, so it was recently rebooted. I am aware that work is being done on it to improve certain aspects.
BTW, redirection will not work with the current setup. The WU will try to reach either the WS or the CS (if one is defined), and those addresses are determined when the WU is downloaded by the client. There's no way to dynamically update that information on the WU end.
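The point about redirection can be illustrated with a small sketch. This is a conceptual model only, not the actual client data structures; `WorkUnit` and `upload_targets` are hypothetical names. The key idea is that the return addresses are baked into the WU at assignment time:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class WorkUnit:
    """Return addresses are fixed when the WU is assigned; frozen=True
    models the fact that they cannot be changed after download."""
    project: int
    work_server: str
    collection_server: Optional[str] = None

def upload_targets(wu: WorkUnit) -> List[str]:
    """The client tries the WS first, then the CS if one was defined;
    there is no mechanism to inject a replacement address afterwards."""
    return [h for h in (wu.work_server, wu.collection_server) if h]

wu = WorkUnit(project=13426, work_server="140.163.4.200",
              collection_server="140.163.4.210")
```

Because the instance is effectively immutable, a server-side "redirect" would never reach WUs already sitting on clients; only newly assigned WUs could carry a different address.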
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time
Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues