18201 Upload repeating fails at 57.27% - (60 seconds)

Moderators: Site Moderators, FAHC Science Team

Joe_H
Site Admin
Posts: 7937
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: 18201 Upload repeating fails at 57.27% - (60 seconds)

Post by Joe_H »

The max-packet setting defaults in the client to "normal", or 25 MB. The size used to be much smaller, I think 5 MB, when dial-up was more common. It is still supposed to be used by researchers setting up projects, as not everyone is on fast internet or has unlimited data. Uploading a 100 MB file over DSL at a max of 768 Kbps can take a while, for example.
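
For a rough sense of scale on that DSL example, here is a minimal Python sketch. It assumes decimal megabytes, a sustained 768 kbit/s uplink and no protocol overhead; `upload_seconds` is a name made up for illustration, not anything in the client.

```python
def upload_seconds(size_mb: float, link_kbps: float) -> float:
    """Ideal transfer time in seconds, ignoring overhead and retries."""
    bits = size_mb * 1_000_000 * 8       # decimal megabytes -> bits
    return bits / (link_kbps * 1_000)    # kilobits per second -> bits per second


t = upload_seconds(100, 768)
print(f"{t:.0f} s, about {t / 60:.1f} minutes")
```

Real uploads will be slower than this ideal figure, since the estimate leaves out protocol overhead and any other traffic on the line.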

The setting should have no effect on an upload; it is only used at assignment time to determine which projects are selected. There is a different server-side setting for the maximum upload file size the server will accept. This may come into play when some WUs from a project are not accepted but others are. Depending on the settings for what is included in a WU upload, WUs with many restarts or errors may result in a larger upload file.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Gary480six
Posts: 93
Joined: Mon Jan 21, 2008 6:42 pm

Re: 18201 Upload repeating fails at 57.27% - (60 seconds)

Post by Gary480six »

Joe,

Maybe there is something to that. I can confirm that the OP and my first post were both trying to upload a file that was 27.50MB, and that my current stuck work is also 27.50MB. But I have no idea if that is the 'correct' size for the finished work, or if there was some issue on my end that made the finished work larger than it should be.
And why only on some P18201 work units?
Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: 18201 Upload repeating fails at 57.27% - (60 seconds)

Post by Neil-B »

I've been uploading 27.5MiB results (which appears to be the normal size) for 18201 without issue, so this is not a problem with upload size.
Last edited by Neil-B on Wed Nov 03, 2021 6:55 pm, edited 1 time in total.
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

(Green/Bold = Active)
Joe_H
Site Admin
Posts: 7937
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: 18201 Upload repeating fails at 57.27% - (60 seconds)

Post by Joe_H »

My money would be on either a server-side problem or some issue on the network between you and the server. These can be hard to diagnose. In some cases you need access to the network itself, with a network data analyzer in the circuit, to see which packets are actually being sent and received compared to what should be there.
Gary480six
Posts: 93
Joined: Mon Jan 21, 2008 6:42 pm

Re: 18201 Upload repeating fails at 57.27% - (60 seconds)

Post by Gary480six »

HEY!

My stuck work was successfully uploaded yesterday! Mind you, it was after 4 days, so all I will get is the base points, but it's still better than zero. Did someone find something on the server side? As far as I know, nothing changed on my end.
Oops... not true. While I was looking around in the advanced settings, I unchecked the box for 'pause Folding when on battery power'. This is not a laptop, but it still seemed odd that the option was checked.

Still.... thanks to someone, somewhere!
Joe_H
Site Admin
Posts: 7937
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: 18201 Upload repeating fails at 57.27% - (60 seconds)

Post by Joe_H »

That option for pausing on battery is the default, and can also work for pausing folding on a desktop connected to a UPS with monitoring over USB. That part can depend on the OS and how the desktop is configured. I do know it has worked for my desktop Mac in the past.
jjmiller
Scientist
Posts: 139
Joined: Fri Apr 09, 2021 4:43 pm

Re: 18201 Upload repeating fails at 57.27% - (60 seconds)

Post by jjmiller »

Very curious and great to hear! We are able to do a TCP, but missed the window for the failed returns this time. If this happens again I think it may be worthwhile to try and push the TCP a bit more to see if we can actually capture what's causing the failed transfer. This will be especially useful if we have exact times where returns were attempted.
jjmiller
Scientist
Posts: 139
Joined: Fri Apr 09, 2021 4:43 pm

Re: 18201 Upload repeating fails at 57.27% - (60 seconds)

Post by jjmiller »

Max-packet used to be a setting that could be used for folks working on dial-up modems but I don't think it is widely used any more (and may not be supported?).

No, but good thinking. There are several WUs that have been reported stuck but have not been quick failed.

Right? That's the thing I keep struggling with. In each case it seems to be highly transient: some specific setups seem prone to stuck WUs, but they're able to complete WUs normally at other times... I'm also following up with our sys admin to see if they have any thoughts.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 18201 Upload repeating fails at 57.27% - (60 seconds)

Post by bruce »

FAH was designed to use the same protocols as http:// (probably to avoid blocked connections on win4 or win5). I'm going to guess that using TCP/IP would fix this problem but introduce a lot of new problems, making it expensive to build and test.

The original version was intentionally made to look like a browser -- probably to avoid folks having to fix firewall issues, but AV folks decided to pre-load "approved" browsers and block everything else.
Gary480six
Posts: 93
Joined: Mon Jan 21, 2008 6:42 pm

Re: 18201 Upload repeating fails at 57.27% - (60 seconds)

Post by Gary480six »

jjmiller wrote:Very curious and great to hear! We are able to do a TCP, but missed the window for the failed returns this time. If this happens again I think it may be worthwhile to try and push the TCP a bit more to see if we can actually capture what's causing the failed transfer. This will be especially useful if we have exact times where returns were attempted.
Yes - I'm back. Different computer... same issue.

Code: Select all

12:42:08:WU01:FS01:Uploading 27.49MiB to 128.252.203.11
12:42:08:WU01:FS01:Connecting to 128.252.203.11:8080
12:42:14:WU01:FS01:Upload 4.09%
12:42:20:WU01:FS01:Upload 9.10%
12:42:37:WU00:FS01:0x22:Completed 4650000 out of 5000000 steps (93%)
12:42:42:WU01:FS01:Upload 12.05%
12:42:42:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed
12:42:42:WU01:FS01:Trying to send results to collection server
12:42:42:WU01:FS01:Uploading 27.49MiB to 128.252.203.14
12:42:42:WU01:FS01:Connecting to 128.252.203.14:8080
12:42:48:WU01:FS01:Upload 4.55%
12:42:54:WU01:FS01:Upload 9.55%
12:43:16:WU01:FS01:Upload 12.05%
12:43:16:ERROR:WU01:FS01:Exception: Transfer failed
12:44:39:WU00:FS01:0x22:Completed 4700000 out of 5000000 steps (94%)
12:44:39:WU00:FS01:0x22:Checkpoint completed at step 4700000
12:44:46:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:18201 run:22177 clone:0 gen:20 core:0x22 unit:0x000000000000001400004719000056a1
12:44:46:WU01:FS01:Uploading 27.49MiB to 128.252.203.11
12:44:46:WU01:FS01:Connecting to 128.252.203.11:8080
12:44:52:WU01:FS01:Upload 5.91%
12:44:58:WU01:FS01:Upload 11.82%
12:45:17:WU01:FS01:Upload 12.05%
12:45:17:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed
12:45:17:WU01:FS01:Trying to send results to collection server
12:45:17:WU01:FS01:Uploading 27.49MiB to 128.252.203.14
12:45:17:WU01:FS01:Connecting to 128.252.203.14:8080
12:45:23:WU01:FS01:Upload 4.78%
12:45:29:WU01:FS01:Upload 9.32%
12:45:51:WU01:FS01:Upload 12.05%
12:45:51:ERROR:WU01:FS01:Exception: Transfer failed
12:46:40:WU00:FS01:0x22:Completed 4750000 out of 5000000 steps (95%)
12:48:42:WU00:FS01:0x22:Completed 4800000 out of 5000000 steps (96%)
12:48:42:WU00:FS01:0x22:Checkpoint completed at step 4800000
12:49:00:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:18201 run:22177 clone:0 gen:20 core:0x22 unit:0x000000000000001400004719000056a1
12:49:00:WU01:FS01:Uploading 27.49MiB to 128.252.203.11
12:49:00:WU01:FS01:Connecting to 128.252.203.11:8080
12:49:06:WU01:FS01:Upload 4.55%
12:49:12:WU01:FS01:Upload 9.78%
12:49:33:WU01:FS01:Upload 12.05%
12:49:33:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed
12:49:33:WU01:FS01:Trying to send results to collection server
12:49:33:WU01:FS01:Uploading 27.49MiB to 128.252.203.14
12:49:33:WU01:FS01:Connecting to 128.252.203.14:8080
12:49:39:WU01:FS01:Upload 4.32%
12:49:45:WU01:FS01:Upload 9.32%
12:50:07:WU01:FS01:Upload 12.05%
12:50:07:ERROR:WU01:FS01:Exception: Transfer failed
12:50:44:WU00:FS01:0x22:Completed 4850000 out of 5000000 steps (97%)
12:52:45:WU00:FS01:0x22:Completed 4900000 out of 5000000 steps (98%)
12:52:45:WU00:FS01:0x22:Checkpoint completed at step 4900000
12:54:46:WU00:FS01:0x22:Completed 4950000 out of 5000000 steps (99%)
12:54:47:WU02:FS01:Connecting to assign1.foldingathome.org:80
12:54:47:WU02:FS01:Assigned to work server 66.170.111.50
12:54:47:WU02:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:GP104 [GeForce GTX 1070] 6463 from 66.170.111.50
12:54:47:WU02:FS01:Connecting to 66.170.111.50:8080
12:55:11:WU02:FS01:Downloading 48.20MiB
12:55:17:WU02:FS01:Download 21.65%
12:55:23:WU02:FS01:Download 45.64%
12:55:29:WU02:FS01:Download 68.85%
12:55:35:WU02:FS01:Download 92.84%
12:55:36:WU02:FS01:Download complete
12:55:36:WU02:FS01:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:16609 run:198 clone:2 gen:89 core:0x22 unit:0x0000000200000059000040e1000000c6
12:55:51:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:18201 run:22177 clone:0 gen:20 core:0x22 unit:0x000000000000001400004719000056a1
12:55:51:WU01:FS01:Uploading 27.49MiB to 128.252.203.11
12:55:51:WU01:FS01:Connecting to 128.252.203.11:8080
12:55:57:WU01:FS01:Upload 4.32%
12:56:03:WU01:FS01:Upload 9.32%
12:56:25:WU01:FS01:Upload 12.05%
12:56:25:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed
12:56:25:WU01:FS01:Trying to send results to collection server
12:56:25:WU01:FS01:Uploading 27.49MiB to 128.252.203.14
12:56:25:WU01:FS01:Connecting to 128.252.203.14:8080
12:56:31:WU01:FS01:Upload 4.55%
12:56:37:WU01:FS01:Upload 9.55%
12:56:48:WU00:FS01:0x22:Completed 5000000 out of 5000000 steps (100%)
12:56:48:WU00:FS01:0x22:Average performance: 177.778 ns/day
12:56:48:WU00:FS01:0x22:Checkpoint completed at step 5000000
12:56:50:WU00:FS01:0x22:Saving result file ..\logfile_01.txt
12:56:50:WU00:FS01:0x22:Saving result file checkpointIntegrator.xml
12:56:50:WU00:FS01:0x22:Saving result file checkpointState.xml
12:56:50:WU00:FS01:0x22:Saving result file positions.xtc
12:56:50:WU00:FS01:0x22:Saving result file science.log
12:56:50:WU00:FS01:0x22:Folding@home Core Shutdown: FINISHED_UNIT
12:56:51:WU00:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
12:56:51:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:18432 run:24 clone:21 gen:1 core:0x22 unit:0x00000015000000010000480000000018
12:56:51:WU00:FS01:Uploading 14.35MiB to 129.32.209.202
12:56:51:WU00:FS01:Connecting to 129.32.209.202:8080
12:56:51:WU02:FS01:Starting
12:56:51:WU02:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "C:\Users\Compaq 64\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.18/Core_22.fah/FahCore_22.exe" -dir 02 -suffix 01 -version 706 -lifeline 3212 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
12:56:51:WU02:FS01:Started FahCore on PID 1240
12:56:51:WU02:FS01:Core PID:1284
12:56:51:WU02:FS01:FahCore 0x22 started
12:56:51:WU02:FS01:0x22:*********************** Log Started 2021-11-24T12:56:51Z ***********************
12:56:51:WU02:FS01:0x22:*************************** Core22 Folding@home Core ***************************
12:56:51:WU02:FS01:0x22:       Core: Core22
12:56:51:WU02:FS01:0x22:       Type: 0x22
12:56:51:WU02:FS01:0x22:    Version: 0.0.18
12:56:51:WU02:FS01:0x22:     Author: Joseph Coffland <[email protected]>
12:56:51:WU02:FS01:0x22:  Copyright: 2020 foldingathome.org
12:56:51:WU02:FS01:0x22:   Homepage: https://foldingathome.org/
12:56:51:WU02:FS01:0x22:       Date: Sep 28 2021
12:56:51:WU02:FS01:0x22:       Time: 05:55:05
12:56:51:WU02:FS01:0x22:   Revision: cfe3d7d990e8f456e371f8ce63b5fcc6daab2103
12:56:51:WU02:FS01:0x22:     Branch: HEAD
12:56:51:WU02:FS01:0x22:   Compiler: Visual C++
12:56:51:WU02:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
12:56:51:WU02:FS01:0x22:             -DOPENMM_VERSION="\"7.6.0\""
12:56:51:WU02:FS01:0x22:   Platform: win32 10
12:56:51:WU02:FS01:0x22:       Bits: 64
12:56:51:WU02:FS01:0x22:       Mode: Release
12:56:51:WU02:FS01:0x22:Maintainers: John Chodera <[email protected]> and Peter Eastman
12:56:51:WU02:FS01:0x22:             <[email protected]>
12:56:51:WU02:FS01:0x22:       Args: -dir 02 -suffix 01 -version 706 -lifeline 1240 -checkpoint 15
12:56:51:WU02:FS01:0x22:             -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device
12:56:51:WU02:FS01:0x22:             0 -gpu 0
12:56:51:WU02:FS01:0x22:************************************ libFAH ************************************
12:56:51:WU02:FS01:0x22:       Date: Sep 28 2021
12:56:51:WU02:FS01:0x22:       Time: 05:53:43
12:56:51:WU02:FS01:0x22:   Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
12:56:51:WU02:FS01:0x22:     Branch: HEAD
12:56:51:WU02:FS01:0x22:   Compiler: Visual C++
12:56:51:WU02:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
12:56:51:WU02:FS01:0x22:   Platform: win32 10
12:56:51:WU02:FS01:0x22:       Bits: 64
12:56:51:WU02:FS01:0x22:       Mode: Release
12:56:51:WU02:FS01:0x22:************************************ CBang *************************************
12:56:51:WU02:FS01:0x22:       Date: Sep 28 2021
12:56:51:WU02:FS01:0x22:       Time: 05:52:38
12:56:51:WU02:FS01:0x22:   Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
12:56:51:WU02:FS01:0x22:     Branch: HEAD
12:56:51:WU02:FS01:0x22:   Compiler: Visual C++
12:56:51:WU02:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
12:56:51:WU02:FS01:0x22:   Platform: win32 10
12:56:51:WU02:FS01:0x22:       Bits: 64
12:56:51:WU02:FS01:0x22:       Mode: Release
12:56:51:WU02:FS01:0x22:************************************ System ************************************
12:56:51:WU02:FS01:0x22:        CPU: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
12:56:51:WU02:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 42 Stepping 7
12:56:51:WU02:FS01:0x22:       CPUs: 8
12:56:51:WU02:FS01:0x22:     Memory: 7.98GiB
12:56:51:WU02:FS01:0x22:Free Memory: 6.13GiB
12:56:51:WU02:FS01:0x22:    Threads: WINDOWS_THREADS
12:56:51:WU02:FS01:0x22: OS Version: 6.1
12:56:51:WU02:FS01:0x22:Has Battery: false
12:56:51:WU02:FS01:0x22: On Battery: false
12:56:51:WU02:FS01:0x22: UTC Offset: -5
12:56:51:WU02:FS01:0x22:        PID: 1284
12:56:51:WU02:FS01:0x22:        CWD: C:\Users\Compaq 64\AppData\Roaming\FAHClient\work
12:56:51:WU02:FS01:0x22:************************************ OpenMM ************************************
12:56:51:WU02:FS01:0x22:    Version: 7.6.0
12:56:51:WU02:FS01:0x22:********************************************************************************
12:56:51:WU02:FS01:0x22:Project: 16609 (Run 198, Clone 2, Gen 89)
12:56:51:WU02:FS01:0x22:Unit: 0x00000000000000000000000000000000
12:56:51:WU02:FS01:0x22:Reading tar file core.xml
12:56:51:WU02:FS01:0x22:Reading tar file integrator.xml
12:56:51:WU02:FS01:0x22:Reading tar file state.xml
12:56:53:WU02:FS01:0x22:Reading tar file system.xml
12:56:55:WU02:FS01:0x22:Digital signatures verified
12:56:55:WU02:FS01:0x22:Folding@home GPU Core22 Folding@home Core
12:56:55:WU02:FS01:0x22:Version 0.0.18
12:56:55:WU02:FS01:0x22:  Checkpoint write interval: 62500 steps (5%) [20 total]
12:56:55:WU02:FS01:0x22:  JSON viewer frame write interval: 12500 steps (1%) [100 total]
12:56:55:WU02:FS01:0x22:  XTC frame write interval: 10000 steps (0.8%) [125 total]
12:56:55:WU02:FS01:0x22:  Global context and integrator variables write interval: disabled
12:56:55:WU02:FS01:0x22:There are 4 platforms available.
12:56:55:WU02:FS01:0x22:Platform 0: Reference
12:56:55:WU02:FS01:0x22:Platform 1: CPU
12:56:55:WU02:FS01:0x22:Platform 2: OpenCL
12:56:55:WU02:FS01:0x22:  opencl-device 0 specified
12:56:55:WU02:FS01:0x22:Platform 3: CUDA
12:56:55:WU02:FS01:0x22:  cuda-device 0 specified
12:56:57:WU00:FS01:Upload 21.35%
12:56:58:WU01:FS01:Upload 12.05%
12:56:58:ERROR:WU01:FS01:Exception: Transfer failed
12:57:03:WU00:FS01:Upload 42.69%
12:57:09:WU00:FS01:Upload 64.47%
12:57:15:WU00:FS01:Upload 86.69%
12:57:18:WU00:FS01:Upload complete
12:57:18:WU00:FS01:Server responded WORK_ACK (400)
12:57:18:WU00:FS01:Final credit estimate, 176468.00 points
12:57:18:WU00:FS01:Cleaning up
12:57:28:WU02:FS01:0x22:Attempting to create CUDA context:
12:57:28:WU02:FS01:0x22:  Configuring platform CUDA
12:57:38:WU02:FS01:0x22:  Using CUDA and gpu 0
12:57:39:WU02:FS01:0x22:Completed 0 out of 1250000 steps (0%)
12:57:41:WU02:FS01:0x22:Checkpoint completed at step 0
That snip of my log file has time stamps for East Coast USA ("exact times when returns were attempted").

And, as you can see, even as that GPU work unit was stuck - a different work unit was returned. Same computer, same network, same Microsoft Security Essentials.
(I also just unchecked that 'Pause On battery' box on that PC - no joy this time) I guess it was a fluke?
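
For anyone wanting to pull the same facts out of a long log, here is a minimal Python sketch that groups the upload lines into attempts and reports where each one stopped. The regular expressions are written against the log lines pasted above and assume that format holds; `summarize_uploads` and the dictionary keys are made-up names, not part of FAHClient.

```python
import re

# Patterns for the client log lines shown above (format assumed from this thread).
START = re.compile(r"^(\d\d:\d\d:\d\d):(WU\d+):FS\d+:Uploading ([\d.]+)MiB to (\S+)")
PROGRESS = re.compile(r"^\d\d:\d\d:\d\d:(WU\d+):FS\d+:Upload ([\d.]+)%")
FAILED = re.compile(r"^(\d\d:\d\d:\d\d):(?:WARNING|ERROR):(WU\d+):FS\d+:Exception:.*Transfer failed")
DONE = re.compile(r"^(\d\d:\d\d:\d\d):(WU\d+):FS\d+:Upload complete")


def summarize_uploads(lines):
    """Return one dict per upload attempt, in the order the attempts ended."""
    in_flight = {}  # WU id -> attempt that has started but not yet ended
    finished = []
    for line in lines:
        if m := START.match(line):
            in_flight[m[2]] = {
                "wu": m[2], "server": m[4], "size_mib": float(m[3]),
                "start": m[1], "last_pct": 0.0,
            }
        elif m := PROGRESS.match(line):
            if m[1] in in_flight:
                in_flight[m[1]]["last_pct"] = float(m[2])
        elif (m := FAILED.match(line)) or (m := DONE.match(line)):
            attempt = in_flight.pop(m[2], None)
            if attempt is not None:
                attempt["end"] = m[1]
                attempt["outcome"] = "ok" if "Upload complete" in line else "failed"
                finished.append(attempt)
    return finished


# A few lines copied from the log above, including an unrelated WU00 line.
sample = """\
12:42:08:WU01:FS01:Uploading 27.49MiB to 128.252.203.11
12:42:08:WU01:FS01:Connecting to 128.252.203.11:8080
12:42:14:WU01:FS01:Upload 4.09%
12:42:37:WU00:FS01:0x22:Completed 4650000 out of 5000000 steps (93%)
12:42:42:WU01:FS01:Upload 12.05%
12:42:42:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed
12:42:42:WU01:FS01:Uploading 27.49MiB to 128.252.203.14
12:43:16:WU01:FS01:Upload 12.05%
12:43:16:ERROR:WU01:FS01:Exception: Transfer failed
12:56:51:WU00:FS01:Uploading 14.35MiB to 129.32.209.202
12:57:15:WU00:FS01:Upload 86.69%
12:57:18:WU00:FS01:Upload complete
""".splitlines()

for a in summarize_uploads(sample):
    stalled_mib = a["size_mib"] * a["last_pct"] / 100
    print(a["start"], a["wu"], a["server"], a["outcome"],
          f"last {a['last_pct']}% (~{stalled_mib:.2f} MiB)")
```

Run over the full log above, this kind of summary makes the pattern easy to see: every failed attempt for WU01 stops at the same 12.05%, on both the work server and the collection server, while WU00 goes through to a different server.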
Joe_H
Site Admin
Posts: 7937
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: 18201 Upload repeating fails at 57.27% - (60 seconds)

Post by Joe_H »

Gary480six wrote:That snip of my log file has time stamps for East Coast USA ("exact times when returns were attempted").
Time stamps in a F@h log file are always in UTC, not local time. It simplifies comparing times between clients and servers that can be in almost any time zone.
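
To line a log entry up with local wall-clock time, the conversion can be sketched in Python as below. The date is taken from the "Log Started 2021-11-24T12:56:51Z" banner in the pasted log and the offset from its "UTC Offset: -5" line; treating the earlier upload lines as the same day is an assumption, and `log_time_to_local` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def log_time_to_local(log_date: str, log_time: str, utc_offset_hours: int) -> datetime:
    """Interpret a log timestamp as UTC and shift it to a fixed local offset."""
    utc = datetime.strptime(f"{log_date} {log_time}", "%Y-%m-%d %H:%M:%S")
    utc = utc.replace(tzinfo=timezone.utc)
    return utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))


# First upload attempt in the log above: 12:42:08 UTC.
local = log_time_to_local("2021-11-24", "12:42:08", -5)
print(local.isoformat())  # 2021-11-24T07:42:08-05:00
```

So the failed attempt logged at 12:42:08 happened at 07:42:08 on the US East Coast, which is the time to report when matching against server-side captures.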
jnv11
Posts: 31
Joined: Wed Sep 02, 2020 5:49 am
Hardware configuration: CPU: Intel® Xeon® W-2295 Processor
GPU: Nvidia Titan RTX
OS: Windows 10 Pro
Motherboard: Asus WS C422 SAGE/10G
RAM: 4x16GB Crucial DDR4-2933 RDIMMs
Location: Morrisville, NC, USA

Re: 18201 Upload repeating fails at 57.27% - (60 seconds)

Post by jnv11 »

I can confirm that I have the same trouble with the same project.

Code: Select all

*********************** Log Started 2021-11-27T18:50:05Z ***********************
18:50:06:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:18201 run:48216 clone:0 gen:21 core:0x22 unit:0x0000000000000015000047190000bc58
18:50:06:WU00:FS01:Uploading 27.49MiB to 128.252.203.11
18:50:06:WU00:FS01:Connecting to 128.252.203.11:8080
18:50:25:WU00:FS01:Upload 0.91%
18:50:32:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
18:50:32:WU00:FS01:Trying to send results to collection server
18:50:32:WU00:FS01:Uploading 27.49MiB to 128.252.203.14
18:50:32:WU00:FS01:Connecting to 128.252.203.14:8080
18:50:52:WU00:FS01:Upload 0.91%
18:50:52:ERROR:WU00:FS01:Exception: Transfer failed
18:50:52:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:18201 run:48216 clone:0 gen:21 core:0x22 unit:0x0000000000000015000047190000bc58
18:50:52:WU00:FS01:Uploading 27.49MiB to 128.252.203.11
18:50:52:WU00:FS01:Connecting to 128.252.203.11:8080
18:51:24:WU00:FS01:Upload 0.68%
18:51:24:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
18:51:24:WU00:FS01:Trying to send results to collection server
18:51:24:WU00:FS01:Uploading 27.49MiB to 128.252.203.14
18:51:24:WU00:FS01:Connecting to 128.252.203.14:8080
18:51:44:WU00:FS01:Upload 0.91%
18:51:44:ERROR:WU00:FS01:Exception: Transfer failed
18:51:52:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:18201 run:48216 clone:0 gen:21 core:0x22 unit:0x0000000000000015000047190000bc58
18:51:52:WU00:FS01:Uploading 27.49MiB to 128.252.203.11
18:51:52:WU00:FS01:Connecting to 128.252.203.11:8080
18:52:27:WU00:FS01:Upload 0.68%
18:52:27:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
18:52:27:WU00:FS01:Trying to send results to collection server
18:52:27:WU00:FS01:Uploading 27.49MiB to 128.252.203.14
18:52:27:WU00:FS01:Connecting to 128.252.203.14:8080
18:52:47:WU00:FS01:Upload 0.91%
18:52:47:ERROR:WU00:FS01:Exception: Transfer failed
18:53:29:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:18201 run:48216 clone:0 gen:21 core:0x22 unit:0x0000000000000015000047190000bc58
18:53:29:WU00:FS01:Uploading 27.49MiB to 128.252.203.11
18:53:29:WU00:FS01:Connecting to 128.252.203.11:8080
18:53:59:WU00:FS01:Upload 0.68%
18:53:59:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
18:53:59:WU00:FS01:Trying to send results to collection server
18:53:59:WU00:FS01:Uploading 27.49MiB to 128.252.203.14
18:53:59:WU00:FS01:Connecting to 128.252.203.14:8080
18:54:19:WU00:FS01:Upload 0.91%
18:54:19:ERROR:WU00:FS01:Exception: Transfer failed
18:56:06:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:18201 run:48216 clone:0 gen:21 core:0x22 unit:0x0000000000000015000047190000bc58
18:56:06:WU00:FS01:Uploading 27.49MiB to 128.252.203.11
18:56:06:WU00:FS01:Connecting to 128.252.203.11:8080
18:56:26:WU00:FS01:Upload 0.91%
18:56:26:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
18:56:26:WU00:FS01:Trying to send results to collection server
18:56:26:WU00:FS01:Uploading 27.49MiB to 128.252.203.14
18:56:26:WU00:FS01:Connecting to 128.252.203.14:8080
18:56:46:WU00:FS01:Upload 0.68%
18:56:46:ERROR:WU00:FS01:Exception: Transfer failed
19:00:21:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:18201 run:48216 clone:0 gen:21 core:0x22 unit:0x0000000000000015000047190000bc58
19:00:21:WU00:FS01:Uploading 27.49MiB to 128.252.203.11
19:00:21:WU00:FS01:Connecting to 128.252.203.11:8080
19:00:41:WU00:FS01:Upload 0.91%
19:00:41:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
19:00:41:WU00:FS01:Trying to send results to collection server
19:00:41:WU00:FS01:Uploading 27.49MiB to 128.252.203.14
19:00:41:WU00:FS01:Connecting to 128.252.203.14:8080
19:01:01:WU00:FS01:Upload 0.68%
19:01:01:ERROR:WU00:FS01:Exception: Transfer failed
19:07:12:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:18201 run:48216 clone:0 gen:21 core:0x22 unit:0x0000000000000015000047190000bc58
19:07:12:WU00:FS01:Uploading 27.49MiB to 128.252.203.11
19:07:12:WU00:FS01:Connecting to 128.252.203.11:8080
19:07:32:WU00:FS01:Upload 0.91%
19:07:32:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
19:07:32:WU00:FS01:Trying to send results to collection server
19:07:32:WU00:FS01:Uploading 27.49MiB to 128.252.203.14
19:07:32:WU00:FS01:Connecting to 128.252.203.14:8080
19:07:52:WU00:FS01:Upload 0.91%
19:07:52:ERROR:WU00:FS01:Exception: Transfer failed
19:18:18:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:18201 run:48216 clone:0 gen:21 core:0x22 unit:0x0000000000000015000047190000bc58
19:18:18:WU00:FS01:Uploading 27.49MiB to 128.252.203.11
19:18:18:WU00:FS01:Connecting to 128.252.203.11:8080
19:18:45:WU00:FS01:Upload 0.68%
19:18:45:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
19:18:45:WU00:FS01:Trying to send results to collection server
19:18:45:WU00:FS01:Uploading 27.49MiB to 128.252.203.14
19:18:45:WU00:FS01:Connecting to 128.252.203.14:8080
19:19:05:WU00:FS01:Upload 0.91%
19:19:05:ERROR:WU00:FS01:Exception: Transfer failed
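
The spacing of the retries in that log can be checked with a short Python sketch. It only looks at the "Sending unit results" lines, assumes all of them fall on the same day, and uses a made-up helper name (`retry_gaps`); the sample lines are shortened copies of the ones above, keeping only the time stamp and the start of the message.

```python
from datetime import datetime


def retry_gaps(lines):
    """Seconds between successive 'Sending unit results' lines (same day assumed)."""
    starts = [
        datetime.strptime(line[:8], "%H:%M:%S")
        for line in lines
        if ":Sending unit results:" in line
    ]
    return [int((b - a).total_seconds()) for a, b in zip(starts, starts[1:])]


# Time stamps of the eight send attempts in the log above.
times = ["18:50:06", "18:50:52", "18:51:52", "18:53:29",
         "18:56:06", "19:00:21", "19:07:12", "19:18:18"]
sample = [f"{t}:WU00:FS01:Sending unit results: id:00 state:SEND" for t in times]

print(retry_gaps(sample))  # [46, 60, 97, 157, 255, 411, 666]
```

After the first two, each gap is roughly 1.6 times the previous one, which looks like an increasing retry delay. That is only an observation from this one log, not a statement about how the client is documented to behave, and each gap also includes the time the failed attempt itself took.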
jnv11
Posts: 31
Joined: Wed Sep 02, 2020 5:49 am
Hardware configuration: CPU: Intel® Xeon® W-2295 Processor
GPU: Nvidia Titan RTX
OS: Windows 10 Pro
Motherboard: Asus WS C422 SAGE/10G
RAM: 4x16GB Crucial DDR4-2933 RDIMMs
Location: Morrisville, NC, USA

Re: 18201 Upload repeating fails at 57.27% - (60 seconds)

Post by jnv11 »

jnv11 wrote:I can confirm that I have the same trouble with the same project.

My work unit finally uploaded.
Gary480six
Posts: 93
Joined: Mon Jan 21, 2008 6:42 pm

Re: 18201 Upload repeating fails at 57.27% - (60 seconds)

Post by Gary480six »

Mine uploaded too, about two days after it got stuck.

No changes made to the PC or to the network. Failed to upload, failed to upload, then success.

And while that one work unit was stuck, other work units, including other P18201 work units, were downloaded, finished, and uploaded without error.
jjmiller
Scientist
Posts: 139
Joined: Fri Apr 09, 2021 4:43 pm

Re: 18201 Upload repeating fails at 57.27% - (60 seconds)

Post by jjmiller »

Thanks - we captured some packet data on WUs from another user, which I'm hoping we'll be able to look at once everyone's back from the holiday. As far as I'm aware we haven't changed anything on our end either, but I'll ask around tomorrow.

Has anyone had this issue on other projects, or has it just been with 18201s? If it has only been 18201s, are they the predominant WUs you see folding?