Re: Send Errors - 155.247.164.213 & .214

Posted: Tue Mar 17, 2020 6:18 pm
by masterofthepenkins
I have a similar error based on logs and number of retries (hasn't succeeded in 30+ hours).

However, my WU is for project 11753, but the work server and collection server are .213 and .214 respectively.

Re: Send Errors - 155.247.164.213 & .214

Posted: Tue Mar 17, 2020 7:09 pm
by qoo
Same here... Send errors for 3 days now.

Code:

19:06:25:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:11758 run:0 clone:2440 gen:0 core:0x22 unit:0x000000019bf7a4d55e6d77167d9507bd
19:06:25:WU01:FS01:Uploading 55.24MiB to 155.247.164.213
19:06:25:WU01:FS01:Connecting to 155.247.164.213:8080
19:06:26:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed
19:06:26:WU01:FS01:Trying to send results to collection server
19:06:26:WU01:FS01:Uploading 55.24MiB to 155.247.164.214
19:06:26:WU01:FS01:Connecting to 155.247.164.214:8080
19:06:26:ERROR:WU01:FS01:Exception: Transfer failed

Re: Send Errors - 155.247.164.213 & .214

Posted: Tue Mar 17, 2020 7:14 pm
by RedDeckWins
I'm hitting the same issue for the same servers - 213/214. The client was able to upload results to other servers.

Re: Send Errors - 155.247.164.213 & .214

Posted: Tue Mar 17, 2020 8:19 pm
by DolphinsCry
First, many thanks for the massive effort that has been put into providing more WUs and server capacity.

But same here: I finished a WU for 11758. I got this WU at 2020-03-17T14:57:31Z, and there have been many send retries like this:

Code:

19:43:18:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11758 run:0 clone:1831 gen:0 core:0x22 unit:0x000000039bf7a4d55e6d77149ebd3ca9
19:43:18:WU00:FS01:Uploading 55.24MiB to 155.247.164.213
19:43:18:WU00:FS01:Connecting to 155.247.164.213:8080
19:43:18:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
19:43:18:WU00:FS01:Trying to send results to collection server
19:43:18:WU00:FS01:Uploading 55.24MiB to 155.247.164.214
19:43:18:WU00:FS01:Connecting to 155.247.164.214:8080
19:43:18:ERROR:WU00:FS01:Exception: Transfer failed

Then, less than two minutes later, a different WU finishes and is sent to the same server without a problem:

Code:

19:44:54:WU02:FS00:0xa7:Completed 250000 out of 250000 steps (100%)
19:44:55:WU02:FS00:0xa7:Saving result file ..\logfile_01.txt
19:44:55:WU02:FS00:0xa7:Saving result file frame5.trr
19:44:55:WU02:FS00:0xa7:Saving result file md.log
19:44:55:WU02:FS00:0xa7:Saving result file science.log
19:44:55:WU02:FS00:0xa7:Saving result file traj_comp.xtc
19:44:55:WU02:FS00:0xa7:Folding@home Core Shutdown: FINISHED_UNIT
19:44:56:WU02:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
19:44:56:WU02:FS00:Sending unit results: id:02 state:SEND error:NO_ERROR project:14328 run:3 clone:3980 gen:5 core:0xa7 unit:0x000000069bf7a4d65e6d1121784e7dfc
19:44:56:WU02:FS00:Uploading 4.96MiB to 155.247.164.214
19:44:56:WU02:FS00:Connecting to 155.247.164.214:8080
19:44:58:WU02:FS00:Upload complete
19:44:58:WU02:FS00:Server responded WORK_ACK (400)
19:44:58:WU02:FS00:Final credit estimate, 3238.00 points
19:44:58:WU02:FS00:Cleaning up
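
For anyone who wants to compare failing and successful uploads across a longer log, here is a minimal Python sketch that tallies upload outcomes per project and server. The regular expressions are inferred only from the log excerpts in this thread, not from any official log format description, and the function names (`summarize_uploads`, `_record`) are hypothetical.

```python
import re
from collections import defaultdict

# Patterns inferred from the log lines quoted in this thread (an assumption,
# not an official FAHClient log specification).
SEND_RE = re.compile(r"^\d\d:\d\d:\d\d:(WU\d+):FS\d+:Sending unit results:.*?project:(\d+)")
CONNECT_RE = re.compile(r"^\d\d:\d\d:\d\d:(WU\d+):FS\d+:Connecting to ([\d.]+):\d+")
FAIL_RE = re.compile(r"^\d\d:\d\d:\d\d:(?:WARNING|ERROR):(WU\d+):FS\d+:Exception:.*Transfer failed")
OK_RE = re.compile(r"^\d\d:\d\d:\d\d:(WU\d+):FS\d+:Upload complete")


def _record(counts, project, server, wu, outcome):
    """Attribute an outcome to the server this WU slot most recently connected to."""
    host = server.pop(wu, None)
    if host is not None:
        counts[(project.get(wu, "unknown"), host)][outcome] += 1


def summarize_uploads(lines):
    """Return {(project, server): {"ok": n, "failed": n}} for the given log lines."""
    project = {}  # WU slot -> project number from the last "Sending unit results" line
    server = {}   # WU slot -> server from the last "Connecting to" line
    counts = defaultdict(lambda: {"ok": 0, "failed": 0})
    for raw in lines:
        line = raw.strip()
        if m := SEND_RE.match(line):
            project[m.group(1)] = m.group(2)
        elif m := CONNECT_RE.match(line):
            server[m.group(1)] = m.group(2)
        elif m := FAIL_RE.match(line):
            _record(counts, project, server, m.group(1), "failed")
        elif m := OK_RE.match(line):
            _record(counts, project, server, m.group(1), "ok")
    return dict(counts)


# Example use (the file path is an assumption; point it at your own client log):
# with open("log.txt", encoding="utf-8", errors="replace") as fh:
#     for key, tally in summarize_uploads(fh).items():
#         print(key, tally)
```

Run against the two excerpts above, a tally like this makes the pattern visible at a glance: project 11758 fails on both .213 and .214, while project 14328 uploads to .214 without trouble.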

Re: Send Errors - 155.247.164.213 & .214

Posted: Tue Mar 17, 2020 8:51 pm
by timkroeger
Just chipping in that I have the same problems with 11758; everything else works fine, and I've successfully uploaded and downloaded other GPU WUs.
Assigned: 2020-03-16T11:34:00Z
Timeout: 2020-03-17T11:34:00Z

Code:

20:22:57:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11758 run:0 clone:1923 gen:0 core:0x22 unit:0x000000029bf7a4d55e6d771456cb16f4
20:22:57:WU00:FS01:Uploading 55.24MiB to 155.247.164.213
20:22:57:WU00:FS01:Connecting to 155.247.164.213:8080
20:22:58:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
20:22:58:WU00:FS01:Trying to send results to collection server
20:22:58:WU00:FS01:Uploading 55.24MiB to 155.247.164.214
20:22:58:WU00:FS01:Connecting to 155.247.164.214:8080
20:22:58:ERROR:WU00:FS01:Exception: Transfer failed

Fold on!
Tim

Re: Send Errors - 155.247.164.213 & .214

Posted: Tue Mar 17, 2020 9:17 pm
by Joe_H
The person managing this project, and some others on this server, has been notified and is looking into getting this fixed.

Re: Send Errors - 155.247.164.213 & .214

Posted: Wed Mar 18, 2020 3:29 am
by tparikka
I am also having issues submitting WU11758 to both .213 and .214. Same logs as others here.

System:
CPU: Intel Core i5-7600K CPU @ 3.80GHz
CPU ID: GenuineIntel Family 6 Model 158 Stepping 9
CPUs: 4
Memory: 15.93 GiB
Free Memory: 6.68 GiB
Threads: WINDOWS_THREADS
OS Version: 6.2
Has Battery: false
On Battery: false
UTC Offset: -5
PID: 6708
CWD: C:\Users\%USER%\AppData\Roaming\FAHClient
OS: Windows 10 Enterprise
OS Arch: AMD64
GPUs: 1
GPU0: Bus: 1 Slot: 0 NVIDIA: 8 TU104 [GeForce RTX 2070 Super] 8218
CUDA Device 0: Platform: 0 Device: 0 Bus: 1 Slot: 0 Compute: 7.5 Driver: 10.2
OpenCL Device 0: Platform: 0 Device: 0 Bus: 1 Slot: 0 Compute: 1.2 Driver: 442.59
Win32 Service: false

Re: Send Errors - 155.247.164.213 & .214

Posted: Wed Mar 18, 2020 6:40 am
by octatone
Same here for PRCG 11758:

06:31:50:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:11758 run:0 clone:777 gen:0 core:0x22 unit:0x000000039bf7a4d55e6d7710389d680a
06:31:50:WU01:FS01:Uploading 55.24MiB to 155.247.164.213
06:31:50:WU01:FS01:Connecting to 155.247.164.213:8080
06:31:53:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed
06:31:53:WU01:FS01:Trying to send results to collection server
06:31:53:WU01:FS01:Uploading 55.24MiB to 155.247.164.214
06:31:53:WU01:FS01:Connecting to 155.247.164.214:8080
06:31:54:ERROR:WU01:FS01:Exception: Transfer failed

Re: Send Errors - 155.247.164.213 & .214

Posted: Wed Mar 18, 2020 9:49 am
by Craig2.0
I've got the same issue with the same servers, except I've been trying to upload the same work unit for 4 days now. At one point the upload started and got to 0.27% before the transfer failed again. I would very much appreciate a client config command to switch to a different server.

Re: Send Errors - 155.247.164.213 & .214

Posted: Wed Mar 18, 2020 11:53 am
by aka_daryl
Just a quick heads-up that I'm experiencing the same issue with WU11758.

Re: Send Errors - 155.247.164.213 & .214

Posted: Wed Mar 18, 2020 12:38 pm
by JoranZeno
Same issue here: unable to upload Project 11758.

Re: Send Errors - 155.247.164.213 & .214

Posted: Wed Mar 18, 2020 12:58 pm
by Qwarkman
I have 2 WUs that have been trying to send for a while now: one for more than two days and another for almost 12 hours. Both are 11758.

Re: Send Errors - 155.247.164.213 & .214

Posted: Wed Mar 18, 2020 1:11 pm
by Klutz
According to the server stats page (https://apps.foldingathome.org/serverstats), 155.247.164.214 is up & running, but when you hover over the "Has CS" column, it shows that it can't connect to several work servers, 155.247.164.213 among them.

Is there any remedy for this, except resetting on the server side? Could I do this locally by editing my Hosts file to point to a different set of servers?
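
To check from your own machine whether a server's upload port accepts connections at all, a rough Python sketch follows. It only tests that a TCP connection can be opened. The logs in this thread show "Transfer failed" right after "Connecting to", so a successful connect would not by itself mean an upload will go through. The function name `can_connect` is hypothetical; the addresses and port 8080 are taken from the logs above.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example use with the servers from this thread:
# for host in ("155.247.164.213", "155.247.164.214"):
#     print(host, can_connect(host, 8080))
```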

Re: Send Errors - 155.247.164.213 & .214

Posted: Wed Mar 18, 2020 11:10 pm
by sswilson
Is there a point where it would make sense to just delete the work file and carry on?

My "stuck" WU is 11758 (0, 2135, 0) and it has 155.247.164.214 as the collection server.

I'll leave it for now, but since some folks have reported that .214 is collecting certain WUs, I'm wondering if there isn't a small group of WUs that have been orphaned and will more than likely never be accepted.

Re: Send Errors - 155.247.164.213 & .214

Posted: Wed Mar 18, 2020 11:58 pm
by davidcoton
AFAICT it is just because of the vastly increased workload on the servers. The servers' owner is aware and investigating, alongside trying to bring more servers online.
Servers are bound to certain projects, largely because the project needs to get back to the right geographical location. This makes it undesirable and currently not possible for users to select alternatives -- the other servers would not know how to handle your work unit.
Since progressing a project requires units to be returned so that the next one can be generated, it is not in anyone's interests for either the server or the client to "lose" a work unit.