Unable to upload results
Posted: Wed Mar 18, 2020 11:28 am
by Rofox
Hi, one of my machines has a "Send" job that is stuck.
From the logs I can see that it's unable to upload the results:
11:23:15:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:11758 run:0 clone:3034 gen:0 core:0x22 unit:0x000000029bf7a4d55e6d7717c47431bf
11:23:15:WU01:FS01:Uploading 55.26MiB to 155.247.164.213
11:23:15:WU01:FS01:Connecting to 155.247.164.213:8080
11:23:15:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed
11:23:15:WU01:FS01:Trying to send results to collection server
11:23:15:WU01:FS01:Uploading 55.26MiB to 155.247.164.214
11:23:15:WU01:FS01:Connecting to 155.247.164.214:8080
11:23:15:ERROR:WU01:FS01:Exception: Transfer failed
Is there anything I can do to fix that? Restarting the client doesn't help.
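Rofox's log shows the order in which the client tried to return the unit: the work server (155.247.164.213) first, then the collection server (155.247.164.214), with both attempts failing within the same second. A minimal sketch of that fallback, assuming an `upload(server)` callback that raises on failure; `send_results` and `TransferFailed` are hypothetical names, not FAHClient internals:

```python
from typing import Callable, Sequence


class TransferFailed(Exception):
    """Hypothetical stand-in for the client's "Transfer failed" error."""


def send_results(servers: Sequence[str], upload: Callable[[str], None]) -> str:
    """Try each server in order and return the first one that accepts the upload.

    The order mirrors the log: work server first, then collection server.
    Raises TransferFailed if every server rejects the upload.
    """
    for server in servers:
        try:
            upload(server)
            return server
        except TransferFailed:
            continue  # log: "Trying to send results to collection server"
    raise TransferFailed("Transfer failed")


def server_down(server: str) -> None:
    # Simulates both servers rejecting the 55.26MiB upload.
    raise TransferFailed(f"Transfer failed: {server}")


try:
    send_results(["155.247.164.213:8080", "155.247.164.214:8080"], server_down)
except TransferFailed as err:
    print("ERROR:", err)  # ERROR: Transfer failed
```

If both servers are refusing uploads, a restart only repeats the same two attempts, which is consistent with Rofox's note that restarting did not help.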
Re: Unable to upload results
Posted: Wed Mar 18, 2020 3:34 pm
by BIG_RED
I have a similar issue, except that the client does start sending the file; I'm not sure whether the upload completed. If someone can tell me whether project:11741 run:0 clone:5988 gen:3 has been turned in, I will delete it. It has been 10+ hours, and other WUs have been uploading to the server fine. This one just stopped, and there is nothing else in the log for WU01.
Code:
05:07:20:WU01:FS01:0x22:Completed 990000 out of 1000000 steps (99%)
05:07:20:WU00:FS01:Connecting to 65.254.110.245:8080
05:07:20:WARNING:WU00:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
05:07:20:WU00:FS01:Connecting to 18.218.241.186:80
05:07:20:WARNING:WU00:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
05:07:20:ERROR:WU00:FS01:Exception: Could not get an assignment
05:07:21:WU00:FS01:Connecting to 65.254.110.245:8080
05:07:21:WARNING:WU00:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
05:07:21:WU00:FS01:Connecting to 18.218.241.186:80
05:07:21:WARNING:WU00:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
05:07:21:ERROR:WU00:FS01:Exception: Could not get an assignment
05:08:21:WU00:FS01:Connecting to 65.254.110.245:8080
05:08:21:WARNING:WU00:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
05:08:21:WU00:FS01:Connecting to 18.218.241.186:80
05:08:21:WARNING:WU00:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
05:08:21:ERROR:WU00:FS01:Exception: Could not get an assignment
05:08:58:WU01:FS01:0x22:Completed 1000000 out of 1000000 steps (100%)
05:09:03:WU01:FS01:0x22:Saving result file ..\logfile_01.txt
05:09:03:WU01:FS01:0x22:Saving result file checkpointState.xml
05:09:07:WU01:FS01:0x22:Saving result file checkpt.crc
05:09:07:WU01:FS01:0x22:Saving result file positions.xtc
05:09:08:WU01:FS01:0x22:Saving result file science.log
05:09:08:WU01:FS01:0x22:Folding@home Core Shutdown: FINISHED_UNIT
05:09:09:WU01:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
05:09:09:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:11741 run:0 clone:5988 gen:3 core:0x22 unit:0x000000068ca304f15e6bc595369789ba
05:09:09:WU01:FS01:Uploading 21.92MiB to 140.163.4.241
05:09:09:WU01:FS01:Connecting to 140.163.4.241:8080
05:09:28:WU01:FS01:Upload 0.86%
05:09:35:WU01:FS01:Upload 13.68%
05:09:41:WU01:FS01:Upload 45.90%
05:09:47:WU01:FS01:Upload 82.11%
05:09:58:WU00:FS01:Connecting to 65.254.110.245:8080
05:09:58:WARNING:WU00:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
05:09:58:WU00:FS01:Connecting to 18.218.241.186:80
05:09:58:WARNING:WU00:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
05:09:58:ERROR:WU00:FS01:Exception: Could not get an assignment
05:12:35:WU00:FS01:Connecting to 65.254.110.245:8080
05:12:35:WARNING:WU00:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
05:12:35:WU00:FS01:Connecting to 18.218.241.186:80
05:12:35:WARNING:WU00:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
05:12:35:ERROR:WU00:FS01:Exception: Could not get an assignment
05:16:49:WU00:FS01:Connecting to 65.254.110.245:8080
05:16:50:WARNING:WU00:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
05:16:50:WU00:FS01:Connecting to 18.218.241.186:80
05:16:50:WARNING:WU00:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
05:16:50:ERROR:WU00:FS01:Exception: Could not get an assignment
05:23:41:WU00:FS01:Connecting to 65.254.110.245:8080
05:23:41:WARNING:WU00:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
05:23:41:WU00:FS01:Connecting to 18.218.241.186:80
05:23:41:WARNING:WU00:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
05:23:41:ERROR:WU00:FS01:Exception: Could not get an assignment
05:34:46:WU00:FS01:Connecting to 65.254.110.245:8080
05:34:46:WU00:FS01:Assigned to work server 128.252.203.10
05:34:46:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GM200 [GeForce GTX 980 Ti] 5632 from 128.252.203.10
05:34:46:WU00:FS01:Connecting to 128.252.203.10:8080
05:35:07:WARNING:WU00:FS01:WorkServer connection failed on port 8080 trying 80
05:35:07:WU00:FS01:Connecting to 128.252.203.10:80
05:35:28:ERROR:WU00:FS01:Exception: Failed to connect to 128.252.203.10:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
05:52:43:WU00:FS01:Connecting to 65.254.110.245:8080
05:52:43:WARNING:WU00:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
05:52:43:WU00:FS01:Connecting to 18.218.241.186:80
05:52:43:WARNING:WU00:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
05:52:43:ERROR:WU00:FS01:Exception: Could not get an assignment
******************************* Date: 2020-03-18 *******************************
06:21:45:WU00:FS01:Connecting to 65.254.110.245:8080
06:21:45:WARNING:WU00:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
06:21:45:WU00:FS01:Connecting to 18.218.241.186:80
06:21:45:WU00:FS01:Assigned to work server 140.163.4.231
06:21:45:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GM200 [GeForce GTX 980 Ti] 5632 from 140.163.4.231
06:21:45:WU00:FS01:Connecting to 140.163.4.231:8080
06:23:18:WU00:FS01:Downloading 11.98MiB
06:23:26:WU00:FS01:Download 18.78%
06:23:34:WU00:FS01:Download 37.57%
06:23:40:WU00:FS01:Download 51.66%
06:23:47:WU00:FS01:Download 68.35%
06:23:53:WU00:FS01:Download 79.31%
06:23:59:WU00:FS01:Download 97.05%
06:24:00:WU00:FS01:Download complete
06:24:00:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:11747 run:0 clone:65 gen:2 core:0x22 unit:0x0000000a8ca304e75e6a7fc5f694159f
06:24:00:WU00:FS01:Starting
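Only the server can say for certain whether 11741 (0, 5988, 3) was received, but the log does show how far the upload got. A minimal sketch that scans a log for the `Sending unit results` and `Upload NN%` lines quoted above and reports the last progress value per WU slot; `last_upload_progress` is a hypothetical helper, and its patterns cover only the line formats seen in this thread:

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# Patterns based on the log lines quoted in this thread.
SENDING = re.compile(r"^(\d\d:\d\d:\d\d):(WU\d+):FS\d+:Sending unit results: .*?"
                     r"project:(\d+) run:(\d+) clone:(\d+) gen:(\d+)")
PROGRESS = re.compile(r"^(\d\d:\d\d:\d\d):(WU\d+):FS\d+:Upload ([\d.]+)%")


def last_upload_progress(lines: Iterable[str]) -> Dict[str, Tuple[str, str, Optional[float]]]:
    """Map each WU slot to (project (run, clone, gen), time of last line, last percent)."""
    state: Dict[str, Tuple[str, str, Optional[float]]] = {}
    for line in lines:
        if m := SENDING.match(line):
            time, slot, p, r, c, g = m.groups()
            state[slot] = (f"{p} ({r}, {c}, {g})", time, None)  # upload started
        elif m := PROGRESS.match(line):
            time, slot, pct = m.groups()
            if slot in state:
                unit, _, _ = state[slot]
                state[slot] = (unit, time, float(pct))  # newest progress wins
    return state


sample = [
    "05:09:09:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR "
    "project:11741 run:0 clone:5988 gen:3 core:0x22 unit:0x000000068ca304f15e6bc595369789ba",
    "05:09:09:WU01:FS01:Uploading 21.92MiB to 140.163.4.241",
    "05:09:28:WU01:FS01:Upload 0.86%",
    "05:09:47:WU01:FS01:Upload 82.11%",
]
print(last_upload_progress(sample))
```

For the log above this reports WU01 at 82.11% as of 05:09:47, with no later line for that slot, which matches "this one just stopped".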
Re: Unable to upload results
Posted: Wed Mar 18, 2020 3:44 pm
by Jesse_V
Rofox wrote:Hi, one of my machines has a "Send" job that stuck.
From the logs I can see that it's unable to upload the results:
Is there anything I can do to fix that? Restarting client doesn't help
BIG_RED wrote:I have a similar issue but it starts to send the file not sure if it is complete or not. If someone can tell me if project:11741 run:0 clone:5988 gen:3 has been turned in I will delete it. It has been 10+ hours and other WU have been uploading fine to the server. This one just stopped and nothing else in the log for WU01
Both of these issues are related to the very rapid expansion of F@h's user base. The servers are slammed, and work units are being consumed faster than they can be generated. The research teams are rapidly spinning up new servers and getting more research projects into the queue, including COVID-19 projects. I'd recommend leaving everything running, since these problems should sort themselves out in the next day or two. viewtopic.php?f=61&t=32424
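Leaving the client running helps because it keeps retrying on its own. BIG_RED's log in this thread shows WU00's assignment requests (the `Connecting to 65.254.110.245:8080` lines) arriving at growing intervals until a unit is finally downloaded at 06:24:00. A short script to measure those gaps from the timestamps; the roughly 1.6x growth it shows is an observation from this one log, not a documented client setting:

```python
from datetime import datetime

# Timestamps of WU00's "Connecting to 65.254.110.245:8080" lines in BIG_RED's log.
ATTEMPTS = ["05:07:20", "05:07:21", "05:08:21", "05:09:58", "05:12:35",
            "05:16:49", "05:23:41", "05:34:46", "05:52:43", "06:21:45"]


def gaps_seconds(stamps):
    """Seconds between consecutive HH:MM:SS timestamps (same day assumed)."""
    times = [datetime.strptime(s, "%H:%M:%S") for s in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


gaps = gaps_seconds(ATTEMPTS)
# Skip the initial 1-second retry and compare each later gap with the one before it.
ratios = [round(b / a, 2) for a, b in zip(gaps[1:], gaps[2:])]
print(gaps)    # [1, 60, 97, 157, 254, 412, 665, 1077, 1742]
print(ratios)  # [1.62, 1.62, 1.62, 1.62, 1.61, 1.62, 1.62]
```

Growing delays like these keep a large number of idle clients from hammering an overloaded assignment server, at the cost of a client possibly waiting half an hour or more before noticing that work is available again.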
Re: Unable to upload results
Posted: Wed Mar 18, 2020 6:07 pm
by BIG_RED
Jesse_V wrote:
Both of these issues are related to the very rapid expansion of F@h's userbase. The servers are slammed and workunits being consumed faster than they can be generated. The research teams are rapidly spinning up new servers and getting more research projects into the queue, including COVID-19. I'd recommend leaving everything running as they should sort themselves out in the next day or two. viewtopic.php?f=61&t=32424
Both issues are upload-based, not download-based, so the rate at which WUs are created should not be a factor. I don't mind deleting my WU; I just don't want the science to stall because my computer isn't sending results back for others to build upon. It would be better if 11741 (0, 5988, 3) could be marked as a failed (late) upload and reissued, so that work continues on that line, rather than waiting for the timeout (which still has 4 more hours to run).
Maybe this is an edge case that needs new code: once a WU uploads successfully, the client should look for and retry any earlier failed or incomplete uploads.
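BIG_RED's suggestion can be sketched as a small retry queue. This is a hypothetical illustration of the idea, not how FAHClient actually schedules uploads; `UploadQueue`, the boolean `upload` callback, and the `next-wu` label are assumptions:

```python
from collections import deque
from typing import Callable, Deque, List


class UploadQueue:
    """Hypothetical sketch of the proposal; not FAHClient code.

    Results that fail to upload are parked. Whenever any upload succeeds,
    the server is evidently reachable, so every parked result is retried once.
    """

    def __init__(self, upload: Callable[[str], bool]):
        self.upload = upload  # assumed interface: returns True on success
        self.pending: Deque[str] = deque()
        self.sent: List[str] = []

    def submit(self, wu: str) -> None:
        if self.upload(wu):
            self.sent.append(wu)
            self._retry_pending()
        else:
            self.pending.append(wu)

    def _retry_pending(self) -> None:
        # One pass over everything parked before this success.
        for _ in range(len(self.pending)):
            wu = self.pending.popleft()
            if self.upload(wu):
                self.sent.append(wu)
            else:
                self.pending.append(wu)  # still failing; keep it for next time


server_up = False


def fake_upload(wu: str) -> bool:
    # Simulated server: accepts uploads only while server_up is True.
    return server_up


q = UploadQueue(fake_upload)
q.submit("11741 (0, 5988, 3)")  # server unreachable: parked
server_up = True
q.submit("next-wu")             # succeeds, then retries the parked unit
print(q.sent)                   # ['next-wu', '11741 (0, 5988, 3)']
```

The design treats one successful upload as evidence that the server is reachable again. A real client would also need a retry limit or backoff so that a unit the server keeps rejecting is not retried forever.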
Re: Unable to upload results
Posted: Wed Mar 18, 2020 6:09 pm
by Harmin
More people doing WUs means more people sending those WUs back, so it comes down to the exact same issue: overloaded servers. I am getting the same errors, and they resolved themselves after a while.