WU's not being sent to work server 3.*
Moderators: Site Moderators, FAHC Science Team
-
- Posts: 1996
- Joined: Sun Mar 22, 2020 5:52 pm
- Hardware configuration: 1: 2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21 - Location: UK
Re: WU's not being sent to work server 3.*
Whilst I know how you must feel, and uploading straight away or before the Timeout is obviously preferable, the WU is only down the drain once it reaches its Expiration.
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070
(Green/Bold = Active)
-
- Posts: 62
- Joined: Mon Apr 13, 2020 11:47 am
Re: WU's not being sent to work server
Neil-B wrote: but if you think it helps then feel free to state your opinions just as others might feel free to state contradictory ones

I think a lot of the frustration on the donor side could have been mitigated by putting simple controls in the client a long time ago. A whitelist/blacklist feature for project numbers would alleviate much of that frustration, and I think it should be allowed. If you consider how much a research team would have to pay for AWS or Azure resources to accomplish the same tasks that the donors let them accomplish for free, it should be a donor's right to omit problem projects that are wasting their hardware resources and electricity.
More often than not when thumbing through the forums, I just see donors getting pushback for complaints and being told they can quit FAH if they don't like it. It's akin to giving a homeless man $100, watching him spend it all on liquor, and then being told "hey buddy, if ya don't like it, don't give me any money." I suppose that's his right, but it's still rather tasteless.
-
- Posts: 5
- Joined: Sun Apr 12, 2020 3:44 pm
Re: WU's not being sent to work server 3.*
Same problem here
19:52:35:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:16435 run:2891 clone:0 gen:0 core:0x22 unit:0x0000000203854c135e9a4ef77d34b1df
19:52:35:WU00:FS01:Uploading 133.16MiB to 3.133.76.19
19:52:35:WU00:FS01:Connecting to 3.133.76.19:8080
19:52:36:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
19:52:36:WU00:FS01:Trying to send results to collection server
19:52:36:WU00:FS01:Uploading 133.16MiB to 3.21.157.11
19:52:36:WU00:FS01:Connecting to 3.21.157.11:8080
19:52:37:ERROR:WU00:FS01:Exception: Transfer failed
13 attempts already...
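The pattern in this log repeats on every attempt: try the work server first, fall back to the collection server, and if both fail, wait and retry later. A toy sketch of that fallback order (my own illustration of the behaviour visible in the log, not the actual FAHClient code — server list and function names are assumptions):

```python
# Toy sketch of the fallback order visible in the log above:
# work server first, collection server second, then give up and
# let the client retry later. Illustrative only, not FAHClient source.
WORK_SERVER = "3.133.76.19"
COLLECTION_SERVER = "3.21.157.11"

def send_results(upload):
    """Try each server in order; return the one that accepted, else None."""
    for server in (WORK_SERVER, COLLECTION_SERVER):
        try:
            upload(server)      # simulated network transfer
            return server
        except ConnectionError:
            continue            # "Trying to send results to collection server"
    return None                 # both failed: the client backs off and retries

# Reproduce the situation in the log: both servers reject the transfer.
def failing_upload(server):
    raise ConnectionError(f"Transfer to {server} failed")

print(send_results(failing_upload))  # → None, i.e. another failed attempt
```

Each pass through this loop with both servers down produces exactly one "Transfer failed" pair like the one above, which is why the attempt counter keeps climbing.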
Re: WU's not being sent to work server 3.*
Allowing the donors to decide which units to do leads to cherry-picking and impacts the science. This project isn't run for the sake of the donors and for getting points. When it started there were no points, only units and the only statistics were how many units had been done. Many of the changes in the points system have been to try to prevent the donors from having to do anything other than accept work and do work. Donate the hardware available and let them fold.
Until the start of this CV-19 work, the number of donors was in pretty good balance with the amount of server hardware and work. It worked most of the time, and the only real recurring problem was the stats server constantly stopping.
Now, with a new project, a twenty-fold ramp-up in donors, and huge publicity from Nvidia, Intel and a bunch of influential YouTubers, the project is getting constant repeat questions on a forum where there are no staffers, only other donors who try to answer and help.
Has this project been a victim of its own sudden success? Yes, of course it has. But arguing with 20/20 hindsight that it should have done x or y, without stopping to ask why it is the way it is, is perhaps not overly helpful.
It's been going for over a decade, and the method of allocating work automatically via the priorities of the scientists has always worked fine. The new CV-19 units were put at the top of the priority list as soon as they were ready to be worked on, and everyone would have got that work, but oh no, the new donors complained that they couldn't specify that they would only do CV-19. Is there something wrong with curing cancer while waiting for the CV-19 work? Is there something wrong with being allocated the work that the researchers need doing?
We get enough questions about simply adding a GPU slot without having to deal with users asking which units they should blacklist and which ones get the best points. Who are you suggesting will maintain this white/black list? Are you volunteering?
single 1070
Re: WU's not being sent to work server 3.*
I understand this has just been corrected.
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Posts: 6
- Joined: Fri Apr 10, 2020 5:52 am
Re: WU's not being sent to work server 3.*
bruce wrote: I understand this has just been corrected.

Yes, looks like it. Both my stuck work units over 130MiB did finally upload and receive points, although it took many tries before a server connected. I can only imagine there must be a considerable backlog.
Thanks for getting it all sorted out.
Re: WU's not being sent to work server 3.*
ChrisD5710 wrote: Maybe You should consider supporting work servers with less storage?

The science itself necessitates a large amount of storage space for the data. Fragmenting the infrastructure into even more servers that would then require even more attention than before is a troublesome way to approach the issue. Those types of servers aren't meant to be hosted by the average user; they require a degree of understanding of both hardware and networking, and a level of financial resources that typically comes at the institution level.
Re: WU's not being sent to work server 3.*
I've been failing to upload a large result over the last several days as well. No other issues with any of my slots over the past week:
Waiting on: Send Results
Attempts: 25
Assigned: 2020-04-28T21:46:14Z
Expiration: 2020-05-05T21:46:14Z
Bonus: 0
3.133.76.19 was restarted 30 minutes ago.
3.21.157.11 was restarted 3 hours ago.
We will see....
https://apps.foldingathome.org/serverstats
01:12:32:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:16435 run:1479 clone:4 gen:0 core:0x22 unit:0x0000000003854c135e9a4ef9e4ff1a84
01:12:32:WU02:FS01:Uploading 141.53MiB to 3.133.76.19
01:12:32:WU02:FS01:Connecting to 3.133.76.19:8080
01:14:42:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
01:14:42:WU02:FS01:Connecting to 3.133.76.19:80
01:16:53:WARNING:WU02:FS01:Exception: Failed to send results to work server: Failed to connect to 3.133.76.19:80: Connection timed out
01:16:53:WU02:FS01:Trying to send results to collection server
01:16:53:WU02:FS01:Uploading 141.53MiB to 3.21.157.11
01:16:53:WU02:FS01:Connecting to 3.21.157.11:8080
01:19:05:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
01:19:05:WU02:FS01:Connecting to 3.21.157.11:80
01:21:16:ERROR:WU02:FS01:Exception: Failed to connect to 3.21.157.11:80: Connection timed out
******************************* Date: 2020-05-01 *******************************
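The 8080 → 80 dance in the log above is a plain connect-with-fallback. A minimal sketch of that behaviour (my own illustration, not FAHClient source — the real client additionally handles proxies and its own retry scheduling):

```python
import socket

def connect_with_fallback(host, ports=(8080, 80), timeout=5.0):
    """Return a socket to the first port that accepts the connection,
    or None if every port fails -- mirroring the log's
    'WorkServer connection failed on port 8080 trying 80' behaviour."""
    for port in ports:
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError:
            continue  # refused or timed out: fall through to the next port
    return None
```

When both ports time out, as happens twice in the log, the client moves on to the collection server and eventually schedules another attempt.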
-
- Posts: 522
- Joined: Mon Dec 03, 2007 4:33 am
- Location: Australia
Re: WU's not being sent to work server 3.*
@lazyacevw, Can you try stopping your client entirely and restarting the program to ensure it retries again on startup? Please post an update if it fails.
Re: WU's not being sent to work server 3.*
anandhanju wrote: @lazyacevw, Can you try stopping your client entirely and restarting the program to ensure it retries again on startup? Please post an update if it fails.

Thanks! I restarted the computer yesterday but it didn't work. I went to try a service restart, but as I was about to do so, the upload cleared right before my eyes! Must've taken 30 or so attempts.
01:12:32:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:16435 run:1479 clone:4 gen:0 core:0x22 unit:0x0000000003854c135e9a4ef9e4ff1a84
01:12:32:WU02:FS01:Uploading 141.53MiB to 3.133.76.19
01:12:32:WU02:FS01:Connecting to 3.133.76.19:8080
01:14:42:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
01:14:42:WU02:FS01:Connecting to 3.133.76.19:80
01:16:53:WARNING:WU02:FS01:Exception: Failed to send results to work server: Failed to connect to 3.133.76.19:80: Connection timed out
01:16:53:WU02:FS01:Trying to send results to collection server
01:16:53:WU02:FS01:Uploading 141.53MiB to 3.21.157.11
01:16:53:WU02:FS01:Connecting to 3.21.157.11:8080
01:19:05:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
01:19:05:WU02:FS01:Connecting to 3.21.157.11:80
01:21:16:ERROR:WU02:FS01:Exception: Failed to connect to 3.21.157.11:80: Connection timed out
...
04:31:33:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:16435 run:1479 clone:4 gen:0 core:0x22 unit:0x0000000003854c135e9a4ef9e4ff1a84
04:31:33:WU02:FS01:Uploading 141.53MiB to 3.133.76.19
04:31:33:WU02:FS01:Connecting to 3.133.76.19:8080
04:33:42:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
04:33:42:WU02:FS01:Connecting to 3.133.76.19:80
04:33:46:WU02:FS01:Upload 0.04%
04:35:33:WU02:FS01:Upload 0.13%
04:35:39:WU02:FS01:Upload 0.97%
...
04:45:27:WU02:FS01:Upload 97.59%
04:45:33:WU02:FS01:Upload 98.61%
04:45:39:WU02:FS01:Upload 99.67%
04:45:43:WU02:FS01:Upload complete
04:45:43:WU02:FS01:Server responded WORK_ACK (400)
04:45:43:WU02:FS01:Final credit estimate, 43291.00 points
04:45:43:WU02:FS01:Cleaning up
Re: WU's not being sent to work server 3.*
The WU for which I started this topic eventually got uploaded within the deadline. I'm not sure whether it was simply overload or some other server issue, but either way, patience helps.
-
- Posts: 1996
- Joined: Sun Mar 22, 2020 5:52 pm
- Hardware configuration: 1: 2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21 - Location: UK
Re: WU's not being sent to work server 3.*
From a post in another thread - and reading between the lines - the cause of the issues was identified and a solution put in place.
Really glad they got the server accepting WUs before yours expired
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070
(Green/Bold = Active)
-
- Posts: 704
- Joined: Tue Dec 04, 2007 6:56 am
- Hardware configuration: Ryzen 7 5700G, 22.40.46 VGA driver; 32GB G-Skill Trident DDR4-3200; Samsung 860EVO 1TB Boot SSD; VelociRaptor 1TB; MSI GTX 1050ti, 551.23 studio driver; BeQuiet FM 550 PSU; Lian Li PC-9F; Win11Pro-64, F@H 8.3.5.
[Suspended] Ryzen 7 3700X, MSI X570MPG, 32GB G-Skill Trident Z DDR4-3600; Corsair MP600 M.2 PCIe Gen4 Boot, Samsung 840EVO-250 SSDs; VelociRaptor 1TB, Raptor 150; MSI GTX 1050ti, 526.98 driver; Kingwin Stryker 500 PSU; Lian Li PC-K7B. Win10Pro-64, F@H 8.3.5. - Location: @Home
- Contact:
Re: WU's not being sent to work server 3.*
Finally uploading after restarting the client this morning. Very slow start, though, then accepting at ~4-5Mbps:
14:29:01:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
14:29:01:WU02:FS01:Connecting to 3.133.76.19:80
14:29:22:WARNING:WU02:FS01:Exception: Failed to send results to work server: Failed to connect to 3.133.76.19:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
14:29:22:WU02:FS01:Trying to send results to collection server
14:29:22:WU02:FS01:Uploading 140.41MiB to 3.21.157.11
14:29:22:WU02:FS01:Connecting to 3.21.157.11:8080
14:29:32:WU02:FS01:Upload 0.09%
14:33:08:WU02:FS01:Upload 0.13%
14:33:14:WU02:FS01:Upload 2.23%
...
14:38:08:WU02:FS01:Upload 97.57%
14:38:14:WU02:FS01:Upload 99.98%
14:38:16:WU02:FS01:Upload complete
14:38:16:WU02:FS01:Server responded WORK_ACK (400)
14:38:16:WU02:FS01:Final credit estimate, 51387.00 points
14:38:16:WU02:FS01:Cleaning up
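A quick sanity check on the "~4-5Mbps" figure: per the timestamps above, essentially all of the 140.41 MiB payload went out between 14:33:08 (0.13% done) and 14:38:16 (complete), which lands right around that estimate:

```python
# Back-of-the-envelope throughput from the log timestamps above.
from datetime import datetime

size_mib = 140.41          # reported upload size
done_at_start = 0.0013     # 0.13% already sent at 14:33:08

t0 = datetime.strptime("14:33:08", "%H:%M:%S")
t1 = datetime.strptime("14:38:16", "%H:%M:%S")
elapsed = (t1 - t0).total_seconds()   # 308 seconds

bits = size_mib * (1 - done_at_start) * 1024 * 1024 * 8
rate_mbps = bits / elapsed / 1e6

print(f"{rate_mbps:.1f} Mbit/s")      # ≈ 3.8 Mbit/s
```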
Ryzen 7 5700G, 22.40.46 VGA driver; MSI GTX 1050ti, 551.23 studio driver
Ryzen 7 3700X; MSI GTX 1050ti, 551.23 studio driver [Suspended]
3.133.76.19 (aws1.foldingathome.org)
This server's uptime is less than 40 minutes and it is already having problems again.
3.133.76.19 aws1.foldingathome.org WS 9.6.8 joseph 10,764.00/hr 0 0 Yes Assign 18,116 7,896 OPENMM_22, GRO_A7 6.72TiB 37 minutes 2020-05-03T04:09:54Z
04:07:02:WU01:FS01:Uploading 78.05MiB to 3.133.76.19
04:07:02:WU01:FS01:Connecting to 3.133.76.19:8080
04:08:42:WARNING:WU00:FS01:WorkServer connection failed on port 8080 trying 80
04:08:42:WU00:FS01:Connecting to 3.133.76.19:80
04:09:09:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
04:09:09:WU01:FS01:Connecting to 3.133.76.19:80
04:10:49:ERROR:WU00:FS01:Exception: Failed to connect to 3.133.76.19:80: Connection timed out
04:10:49:WU00:FS01:Connecting to 65.254.110.245:80
04:10:50:WU00:FS01:Assigned to work server 128.252.203.10
04:10:50:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GP107GL [Quadro P1000] from 128.252.203.10
04:10:50:WU00:FS01:Connecting to 128.252.203.10:8080
04:11:16:WARNING:WU01:FS01:Exception: Failed to send results to work server: Failed to connect to 3.133.76.19:80: Connection timed out
04:11:16:WU01:FS01:Trying to send results to collection server
-
- Posts: 2040
- Joined: Sat Dec 01, 2012 3:43 pm
- Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4GHz
Nvidia gtx 1080ti driver 441
Re: 3.133.76.19 (aws1.foldingathome.org)
Stats say it is online but I also cannot connect to http://aws1.foldingathome.org/