Multiple WU's Fail downld/upld to 155.247.166.*

Moderators: Site Moderators, FAHC Science Team

rewron
Posts: 12
Joined: Fri Nov 04, 2011 8:25 pm

Multiple WU's Fail downld/upld to 155.247.166.*

Post by rewron »

I now have three WUs that repeatedly try, and fail, to upload. They share the work/collection servers 155.247.166.219, 155.247.166.220, and 128.252.203.4. I can still receive and upload other WUs while these remain uncollected.

Log file is attached. Suggestions would be appreciated.

Thanks.

Code:

*********************** Log Started 2019-09-24T15:27:30Z ***********************
15:27:30:************************* Folding@home Client *************************
15:27:30:      Website: http://folding.stanford.edu/
15:27:30:    Copyright: (c) 2009-2014 Stanford University
15:27:30:       Author: Joseph Coffland <[email protected]>
15:27:30:         Args: --open-web-control
15:27:30:       Config: C:/Users/Ron/AppData/Roaming/FAHClient/config.xml
15:27:30:******************************** Build ********************************
15:27:30:      Version: 7.4.4
15:27:30:         Date: Mar 4 2014
15:27:30:         Time: 20:26:54
15:27:30:      SVN Rev: 4130
15:27:30:       Branch: fah/trunk/client
15:27:30:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
15:27:30:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
15:27:30:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT /Qmkl
15:27:30:     Platform: win32 XP
15:27:30:         Bits: 32
15:27:30:         Mode: Release
15:27:30:******************************* System ********************************
15:27:30:          CPU: Intel(R) Core(TM) i3-3227U CPU @ 1.90GHz
15:27:30:       CPU ID: GenuineIntel Family 6 Model 58 Stepping 9
15:27:30:         CPUs: 4
15:27:30:       Memory: 3.89GiB
15:27:30:  Free Memory: 3.19GiB
15:27:30:      Threads: WINDOWS_THREADS
15:27:30:   OS Version: 6.1
15:27:30:  Has Battery: false
15:27:30:   On Battery: false
15:27:30:   UTC Offset: -4
15:27:30:          PID: 2384
15:27:30:          CWD: C:/Users/Ron/AppData/Roaming/FAHClient
15:27:30:           OS: Windows 7 Home Premium
15:27:30:      OS Arch: AMD64
15:27:30:         GPUs: 0
15:27:30:         CUDA: Not detected
15:27:30:Win32 Service: false
15:27:30:***********************************************************************
15:27:30:<config>
15:27:30:  <!-- Network -->
15:27:30:  <proxy v=':8080'/>
15:27:30:
15:27:30:  <!-- Slot Control -->
15:27:30:  <power v='FULL'/>
15:27:30:
15:27:30:  <!-- User Information -->
15:27:30:  <passkey v='********************************'/>
15:27:30:  <team v='4'/>
15:27:30:  <user v='rewron'/>
15:27:30:
15:27:30:  <!-- Folding Slots -->
15:27:30:  <slot id='0' type='CPU'>
15:27:30:    <paused v='true'/>
15:27:30:  </slot>
15:27:30:</config>
15:27:30:Trying to access database...
15:27:30:Successfully acquired database lock
15:27:30:Enabled folding slot 00: PAUSED cpu:4 (by user)
15:27:32:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:14175 run:0 clone:389 gen:4 core:0xa7 unit:0x000000050002894b5d65700e9d4ea3b4
15:27:35:WU01:FS00:Uploading 378.91MiB to 155.247.166.219
15:27:35:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:14189 run:6 clone:134 gen:2 core:0xa7 unit:0x000000040002894b5d543dafa435313a
15:27:35:WU01:FS00:Connecting to 155.247.166.219:8080
15:27:35:WU00:FS00:Uploading 172.91MiB to 155.247.166.219
15:27:35:WU00:FS00:Connecting to 155.247.166.219:8080
15:27:35:WU02:FS00:Sending unit results: id:02 state:SEND error:NO_ERROR project:14189 run:1 clone:275 gen:2 core:0xa7 unit:0x000000040002894b5d77e3f99e5d8b7b
15:27:36:WU02:FS00:Uploading 159.71MiB to 155.247.166.219
15:27:36:WU02:FS00:Connecting to 155.247.166.219:8080
15:27:36:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed
15:27:36:WU00:FS00:Trying to send results to collection server
15:27:37:WU00:FS00:Uploading 172.91MiB to 155.247.166.220
15:27:37:WARNING:WU01:FS00:Exception: Failed to send results to work server: Transfer failed
15:27:37:WU01:FS00:Trying to send results to collection server
15:27:37:WU00:FS00:Connecting to 155.247.166.220:8080
15:27:37:WU01:FS00:Uploading 378.91MiB to 155.247.166.220
15:27:37:WU01:FS00:Connecting to 155.247.166.220:8080
15:27:38:WARNING:WU02:FS00:Exception: Failed to send results to work server: Transfer failed
15:27:38:WU02:FS00:Trying to send results to collection server
15:27:38:WU02:FS00:Uploading 159.71MiB to 128.252.203.4
15:27:38:WU02:FS00:Connecting to 128.252.203.4:8080
15:27:38:ERROR:WU00:FS00:Exception: Transfer failed
15:27:39:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:14189 run:6 clone:134 gen:2 core:0xa7 unit:0x000000040002894b5d543dafa435313a
15:27:39:WU00:FS00:Uploading 172.91MiB to 155.247.166.219
15:27:39:ERROR:WU01:FS00:Exception: Transfer failed
15:27:39:WU00:FS00:Connecting to 155.247.166.219:8080
15:27:39:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:14175 run:0 clone:389 gen:4 core:0xa7 unit:0x000000050002894b5d65700e9d4ea3b4
15:27:39:WU01:FS00:Uploading 378.91MiB to 155.247.166.219
15:27:39:WU01:FS00:Connecting to 155.247.166.219:8080
15:27:40:ERROR:WU02:FS00:Exception: Transfer failed
15:27:40:WU02:FS00:Sending unit results: id:02 state:SEND error:NO_ERROR project:14189 run:1 clone:275 gen:2 core:0xa7 unit:0x000000040002894b5d77e3f99e5d8b7b
15:27:40:WU02:FS00:Uploading 159.71MiB to 155.247.166.219
15:27:40:WU02:FS00:Connecting to 155.247.166.219:8080
15:27:40:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed
15:27:40:WU00:FS00:Trying to send results to collection server
15:27:41:WU00:FS00:Uploading 172.91MiB to 155.247.166.220
15:27:41:WU00:FS00:Connecting to 155.247.166.220:8080
15:27:41:WARNING:WU01:FS00:Exception: Failed to send results to work server: Transfer failed
15:27:41:WU01:FS00:Trying to send results to collection server
15:27:41:WU01:FS00:Uploading 378.91MiB to 155.247.166.220
15:27:41:WU01:FS00:Connecting to 155.247.166.220:8080
15:27:42:WARNING:WU02:FS00:Exception: Failed to send results to work server: Transfer failed
15:27:42:WU02:FS00:Trying to send results to collection server
15:27:42:WU02:FS00:Uploading 159.71MiB to 128.252.203.4
15:27:42:WU02:FS00:Connecting to 128.252.203.4:8080
15:27:42:ERROR:WU00:FS00:Exception: Transfer failed
15:27:43:ERROR:WU01:FS00:Exception: Transfer failed
15:27:43:ERROR:WU02:FS00:Exception: Transfer failed
15:28:39:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:14189 run:6 clone:134 gen:2 core:0xa7 unit:0x000000040002894b5d543dafa435313a
15:28:39:WU00:FS00:Uploading 172.91MiB to 155.247.166.219
15:28:39:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:14175 run:0 clone:389 gen:4 core:0xa7 unit:0x000000050002894b5d65700e9d4ea3b4
15:28:39:WU00:FS00:Connecting to 155.247.166.219:8080
15:28:40:WU01:FS00:Uploading 378.91MiB to 155.247.166.219
15:28:40:WU02:FS00:Sending unit results: id:02 state:SEND error:NO_ERROR project:14189 run:1 clone:275 gen:2 core:0xa7 unit:0x000000040002894b5d77e3f99e5d8b7b
15:28:40:WU01:FS00:Connecting to 155.247.166.219:8080
15:28:41:WU02:FS00:Uploading 159.71MiB to 155.247.166.219
15:28:41:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed
15:28:41:WU00:FS00:Trying to send results to collection server
15:28:41:WU02:FS00:Connecting to 155.247.166.219:8080
15:28:41:WU00:FS00:Uploading 172.91MiB to 155.247.166.220
15:28:41:WU00:FS00:Connecting to 155.247.166.220:8080
15:28:42:WARNING:WU01:FS00:Exception: Failed to send results to work server: Transfer failed
15:28:42:WU01:FS00:Trying to send results to collection server
15:28:42:WU01:FS00:Uploading 378.91MiB to 155.247.166.220
15:28:42:WU01:FS00:Connecting to 155.247.166.220:8080
15:28:42:WARNING:WU02:FS00:Exception: Failed to send results to work server: Transfer failed
15:28:42:WU02:FS00:Trying to send results to collection server
15:28:43:WU02:FS00:Uploading 159.71MiB to 128.252.203.4
15:28:43:ERROR:WU00:FS00:Exception: Transfer failed
15:28:43:WU02:FS00:Connecting to 128.252.203.4:8080
15:28:43:ERROR:WU01:FS00:Exception: Transfer failed
15:28:44:ERROR:WU02:FS00:Exception: Transfer failed
15:28:47:FS00:Unpaused
15:28:47:WU03:FS00:Starting
15:28:47:WU03:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Ron/AppData/Roaming/FAHClient/cores/cores.foldingathome.org/Win32/x86/Core_a7.fah/FahCore_a7.exe -dir 03 -suffix 01 -version 704 -lifeline 2384 -checkpoint 15 -np 4
15:28:47:WU03:FS00:Started FahCore on PID 2936
15:28:48:WU03:FS00:Core PID:2948
15:28:48:WU03:FS00:FahCore 0xa7 started
15:28:52:WU03:FS00:0xa7:*********************** Log Started 2019-09-24T15:28:51Z ***********************
15:28:52:WU03:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
15:28:52:WU03:FS00:0xa7:       Type: 0xa7
15:28:52:WU03:FS00:0xa7:       Core: Gromacs
15:28:52:WU03:FS00:0xa7:    Website: https://foldingathome.org/
15:28:52:WU03:FS00:0xa7:  Copyright: (c) 2009-2018 foldingathome.org
15:28:52:WU03:FS00:0xa7:     Author: Joseph Coffland <[email protected]>
15:28:52:WU03:FS00:0xa7:       Args: -dir 03 -suffix 01 -version 704 -lifeline 2936 -checkpoint 15 -np 4
15:28:52:WU03:FS00:0xa7:     Config: <none>
15:28:52:WU03:FS00:0xa7:************************************ Build *************************************
15:28:52:WU03:FS00:0xa7:    Version: 0.0.17
15:28:52:WU03:FS00:0xa7:       Date: Apr 25 2018
15:28:52:WU03:FS00:0xa7:       Time: 11:02:26
15:28:52:WU03:FS00:0xa7: Repository: Git
15:28:52:WU03:FS00:0xa7:   Revision: fd11abfb405c921e66db1226933e9dd2d18d2acc
15:28:52:WU03:FS00:0xa7:     Branch: master
15:28:52:WU03:FS00:0xa7:   Compiler: Visual C++ 2008
15:28:52:WU03:FS00:0xa7:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
15:28:52:WU03:FS00:0xa7:   Platform: win32 10
15:28:52:WU03:FS00:0xa7:       Bits: 32
15:28:52:WU03:FS00:0xa7:       Mode: Release
15:28:52:WU03:FS00:0xa7:       SIMD: sse2
15:28:52:WU03:FS00:0xa7:************************************ System ************************************
15:28:52:WU03:FS00:0xa7:        CPU: Unknown
15:28:52:WU03:FS00:0xa7:     CPU ID: 
15:28:52:WU03:FS00:0xa7:       CPUs: 4
15:28:52:WU03:FS00:0xa7:     Memory: 3.89GiB
15:28:52:WU03:FS00:0xa7:Free Memory: 3.12GiB
15:28:52:WU03:FS00:0xa7:    Threads: WINDOWS_THREADS
15:28:52:WU03:FS00:0xa7: OS Version: 6.1
15:28:52:WU03:FS00:0xa7:Has Battery: false
15:28:52:WU03:FS00:0xa7: On Battery: false
15:28:52:WU03:FS00:0xa7: UTC Offset: -4
15:28:52:WU03:FS00:0xa7:        PID: 2948
15:28:52:WU03:FS00:0xa7:        CWD: C:\Users\Ron\AppData\Roaming\FAHClient\work
15:28:52:WU03:FS00:0xa7:         OS: Windows 7 Home Premium
15:28:52:WU03:FS00:0xa7:    OS Arch: AMD64
15:28:52:WU03:FS00:0xa7:********************************************************************************
15:28:52:WU03:FS00:0xa7:Project: 13823 (Run 270, Clone 0, Gen 98)
15:28:52:WU03:FS00:0xa7:Unit: 0x0000008080fccb095c8ff668f76f5418
15:28:52:WU03:FS00:0xa7:Digital signatures verified
15:28:52:WU03:FS00:0xa7:Calling: mdrun -s frame98.tpr -o frame98.trr -x frame98.xtc -cpi state.cpt -cpt 15 -nt 4
15:29:14:WU03:FS00:0xa7:Steps: first=12250000 total=125000
15:29:25:WU03:FS00:0xa7:Completed 53312 out of 125000 steps (42%)
15:29:32:Removing old file 'configs/config-20190705-171545.xml'
15:29:32:Saving configuration to config.xml
15:29:32:<config>
15:29:32:  <!-- Network -->
15:29:32:  <proxy v=':8080'/>
15:29:32:
15:29:32:  <!-- Slot Control -->
15:29:32:  <power v='FULL'/>
15:29:32:
15:29:32:  <!-- User Information -->
15:29:32:  <passkey v='********************************'/>
15:29:32:  <team v='4'/>
15:29:32:  <user v='rewron'/>
15:29:32:
15:29:32:  <!-- Folding Slots -->
15:29:32:  <slot id='0' type='CPU'/>
15:29:32:</config>
15:29:39:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:14189 run:6 clone:134 gen:2 core:0xa7 unit:0x000000040002894b5d543dafa435313a
15:29:39:WU00:FS00:Uploading 172.91MiB to 155.247.166.219
15:29:39:WU00:FS00:Connecting to 155.247.166.219:8080
15:29:39:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed
15:29:39:WU00:FS00:Trying to send results to collection server
15:29:40:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:14175 run:0 clone:389 gen:4 core:0xa7 unit:0x000000050002894b5d65700e9d4ea3b4
15:29:40:WU00:FS00:Uploading 172.91MiB to 155.247.166.220
15:29:40:WU00:FS00:Connecting to 155.247.166.220:8080
15:29:40:WU01:FS00:Uploading 378.91MiB to 155.247.166.219
15:29:40:WU01:FS00:Connecting to 155.247.166.219:8080
15:29:40:ERROR:WU00:FS00:Exception: Transfer failed
15:29:40:WU02:FS00:Sending unit results: id:02 state:SEND error:NO_ERROR project:14189 run:1 clone:275 gen:2 core:0xa7 unit:0x000000040002894b5d77e3f99e5d8b7b
15:29:40:WU02:FS00:Uploading 159.71MiB to 155.247.166.219
15:29:40:WU02:FS00:Connecting to 155.247.166.219:8080
15:29:41:WARNING:WU01:FS00:Exception: Failed to send results to work server: Transfer failed
15:29:41:WU01:FS00:Trying to send results to collection server
15:29:41:WU01:FS00:Uploading 378.91MiB to 155.247.166.220
15:29:41:WU01:FS00:Connecting to 155.247.166.220:8080
15:29:42:WARNING:WU02:FS00:Exception: Failed to send results to work server: Transfer failed
15:29:42:WU02:FS00:Trying to send results to collection server
15:29:42:WU02:FS00:Uploading 159.71MiB to 128.252.203.4
15:29:42:WU02:FS00:Connecting to 128.252.203.4:8080
15:29:42:ERROR:WU01:FS00:Exception: Transfer failed
15:30:01:WU02:FS00:Upload 0.04%
15:30:01:ERROR:WU02:FS00:Exception: Transfer failed
15:31:16:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:14189 run:6 clone:134 gen:2 core:0xa7 unit:0x000000040002894b5d543dafa435313a
15:31:16:WU00:FS00:Uploading 172.91MiB to 155.247.166.219
15:31:16:WU00:FS00:Connecting to 155.247.166.219:8080
15:31:17:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed
15:31:17:WU00:FS00:Trying to send results to collection server
15:31:17:WU00:FS00:Uploading 172.91MiB to 155.247.166.220
15:31:17:WU00:FS00:Connecting to 155.247.166.220:8080
15:31:17:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:14175 run:0 clone:389 gen:4 core:0xa7 unit:0x000000050002894b5d65700e9d4ea3b4
15:31:17:WU01:FS00:Uploading 378.91MiB to 155.247.166.219
15:31:17:WU01:FS00:Connecting to 155.247.166.219:8080
15:31:17:WU02:FS00:Sending unit results: id:02 state:SEND error:NO_ERROR project:14189 run:1 clone:275 gen:2 core:0xa7 unit:0x000000040002894b5d77e3f99e5d8b7b
15:31:18:WU02:FS00:Uploading 159.71MiB to 155.247.166.219
15:31:18:WU02:FS00:Connecting to 155.247.166.219:8080
15:31:18:WARNING:WU01:FS00:Exception: Failed to send results to work server: Transfer failed
15:31:18:WU01:FS00:Trying to send results to collection server
15:31:18:WU01:FS00:Uploading 378.91MiB to 155.247.166.220
15:31:18:WU01:FS00:Connecting to 155.247.166.220:8080
15:31:18:ERROR:WU00:FS00:Exception: Transfer failed
15:31:19:WARNING:WU02:FS00:Exception: Failed to send results to work server: Transfer failed
15:31:19:WU02:FS00:Trying to send results to collection server
15:31:19:WU02:FS00:Uploading 159.71MiB to 128.252.203.4
15:31:19:WU02:FS00:Connecting to 128.252.203.4:8080
15:31:19:ERROR:WU01:FS00:Exception: Transfer failed
15:31:38:WU02:FS00:Upload 0.04%
15:31:38:ERROR:WU02:FS00:Exception: Transfer failed
15:32:52:WU03:FS00:0xa7:Completed 53750 out of 125000 steps (43%)
15:33:53:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:14189 run:6 clone:134 gen:2 core:0xa7 unit:0x000000040002894b5d543dafa435313a
15:33:54:WU00:FS00:Uploading 172.91MiB to 155.247.166.219
15:33:54:WU00:FS00:Connecting to 155.247.166.219:8080
15:33:54:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed
15:33:54:WU00:FS00:Trying to send results to collection server
15:33:54:WU00:FS00:Uploading 172.91MiB to 155.247.166.220
15:33:54:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:14175 run:0 clone:389 gen:4 core:0xa7 unit:0x000000050002894b5d65700e9d4ea3b4
15:33:54:WU00:FS00:Connecting to 155.247.166.220:8080
15:33:55:WU01:FS00:Uploading 378.91MiB to 155.247.166.219
15:33:55:WU01:FS00:Connecting to 155.247.166.219:8080
15:33:55:WU02:FS00:Sending unit results: id:02 state:SEND error:NO_ERROR project:14189 run:1 clone:275 gen:2 core:0xa7 unit:0x000000040002894b5d77e3f99e5d8b7b
15:33:55:WU02:FS00:Uploading 159.71MiB to 155.247.166.219
15:33:55:WU02:FS00:Connecting to 155.247.166.219:8080
15:33:55:WARNING:WU01:FS00:Exception: Failed to send results to work server: Transfer failed
15:33:55:WU01:FS00:Trying to send results to collection server
15:33:55:WU01:FS00:Uploading 378.91MiB to 155.247.166.220
15:33:55:ERROR:WU00:FS00:Exception: Transfer failed
15:33:55:WU01:FS00:Connecting to 155.247.166.220:8080
15:33:56:WARNING:WU02:FS00:Exception: Failed to send results to work server: Transfer failed
15:33:56:WU02:FS00:Trying to send results to collection server
15:33:56:WU02:FS00:Uploading 159.71MiB to 128.252.203.4
15:33:56:WU02:FS00:Connecting to 128.252.203.4:8080
15:33:57:ERROR:WU01:FS00:Exception: Transfer failed
15:33:58:ERROR:WU02:FS00:Exception: Transfer failed
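To see at a glance which server each failure belongs to, the interleaved attempts in a log like the one above can be tallied per WU and server. The following is a minimal Python sketch, assuming the FAHClient 7.4.4 line format shown above (`HH:MM:SS:WUnn:FSnn:...`); the regular expressions and the `tally_failures` name are illustrative and not part of any Folding@home tool.

```python
import re
from collections import Counter

# "15:27:35:WU01:FS00:Uploading 378.91MiB to 155.247.166.219"
UPLOAD_RE = re.compile(r"^\d\d:\d\d:\d\d:(WU\d+):FS\d+:Uploading [\d.]+MiB to ([\d.]+)$")
# Both the WARNING (work server) and ERROR (collection server) failure lines
FAIL_RE = re.compile(r"^\d\d:\d\d:\d\d:(?:WARNING|ERROR):(WU\d+):FS\d+:Exception: .*Transfer failed$")


def tally_failures(lines):
    """Count failed upload attempts per (WU, server) pair.

    Assumes each failure line refers to the most recent 'Uploading ... to <ip>'
    line for the same WU, which is how the attempts are interleaved above.
    """
    current_server = {}  # WU -> server of its latest upload attempt
    failures = Counter()
    for line in lines:
        line = line.strip()
        m = UPLOAD_RE.match(line)
        if m:
            current_server[m.group(1)] = m.group(2)
            continue
        m = FAIL_RE.match(line)
        if m and m.group(1) in current_server:
            failures[(m.group(1), current_server[m.group(1)])] += 1
    return failures
```

Run over the full log, the counts show whether the failures are spread across all three servers or concentrated on one.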
Catalina588
Posts: 41
Joined: Thu Oct 09, 2008 8:59 pm

New Work Units Fail to Complete Download

Post by Catalina588 »

I've had two Linux and two Windows 10 clients hang on different days in the last week while downloading a new work unit. According to the log (at normal verbosity), the download just stops at some point past 50% completion. It never completes, so the client slot sits ready, waiting for work.

That got my attention. All my machines have two or more GPUs, so it's not likely to be an obvious comms problem. Assume the FAH client and OS are up to date on all of them. So far, it's a small percentage of the 125-145 units a day my farm completes.

I noticed that recent work units are approaching 70MB in size, far larger than I recall.

Is it possible that a more robust download failure recovery routine is needed given the many more packets that need to be received and acknowledged?
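One common way to make large transfers more tolerant of dropped connections is to resume from the last byte received instead of starting over. The thread does not describe how the Folding@home client actually transfers WUs, so the following Python is only a sketch of the general technique: `fetch_chunk(offset)` is a hypothetical callable standing in for whatever request returns data starting at `offset`.

```python
from typing import Callable


def resumable_download(fetch_chunk: Callable[[int], bytes], total_size: int,
                       max_retries: int = 5) -> bytes:
    """Fetch total_size bytes, resuming at the current offset after each failure.

    fetch_chunk(offset) is a hypothetical callable: it returns the next bytes
    starting at offset, or raises ConnectionError if the connection drops.
    """
    received = bytearray()
    failures = 0  # consecutive failed attempts without progress
    while len(received) < total_size:
        try:
            chunk = fetch_chunk(len(received))
            if not chunk:
                # No data before the end of the file is treated as a stall.
                raise ConnectionError("transfer stalled")
        except ConnectionError:
            failures += 1
            if failures > max_retries:
                raise  # give up and re-raise the last ConnectionError
            continue  # retry from the same offset; nothing received is discarded
        received.extend(chunk)
        failures = 0  # progress was made, so reset the retry budget
    return bytes(received[:total_size])
```

A real implementation would also wait between retries (for example with exponential backoff) and would need the server to support resuming at an offset.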
MeeLee
Posts: 1339
Joined: Tue Feb 19, 2019 10:16 pm

Re: New Work Units Fail to Complete Download

Post by MeeLee »

On Linux I see the same issue: either a download or an upload fails.
Most of the time (for me) this happens over Wi-Fi, when multiple cards are downloading/uploading WUs at the same time; one of them gets stalled.
Did you try pausing/unpausing, or were you forced to remove and re-add the slot without a restart? (I know you can also restart the service, but that's not recommended if you don't want to pause the other WUs.)
JohnJohn
Posts: 14
Joined: Fri Jun 14, 2019 1:06 am

Re: Multiple WU's Fail to Upload

Post by JohnJohn »

+1 I'm also currently looking at failed downloads from 155.247.166.219...
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Multiple WU's Fail to Upload

Post by bruce »

When were these WUs assigned to your machine? How long have these WUs been on your system?

All WUs have a deadline, and normally a WU that expires is deleted by the client. If, for some reason, the client was unable to delete them at that time, they will never upload. In fact, the first two of those three WUs were completed some time ago.

My current theory is that these are copies that have expired and simply are no longer accepted -- but I'd need more information from your logs to confirm that suspicion.

project:14189 run:6 clone:134 gen:2 was completed 2019-09-11 11:45:21.
project:14175 run:0 clone:389 gen:4 was completed 2019-09-10 02:15:18

project:14189 run:1 clone:275 gen:2 is a bit strange because I can find no record of it. In fact, the preceding WU, project:14189 run:1 clone:275 gen:1, was returned 2019-09-21 07:15:16 so gen:2 would have been issued soon after that and it apparently has not been returned, nor has it expired. I'll have to dig deeper.
Joe_H
Site Admin
Posts: 7937
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Multiple WU's Fail to Upload

Post by Joe_H »

Besides what Bruce has mentioned, I would suggest upgrading to the current version of the folding client, 7.5.1. It does have some improvements in the network connection code over the 7.4.4 version you are running.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
JohnJohn
Posts: 14
Joined: Fri Jun 14, 2019 1:06 am

Re: Multiple WU's Fail to Upload

Post by JohnJohn »

Here is my client log:

Code:

*********************** Log Started 2019-09-25T01:06:59Z ***********************
01:06:59:************************* Folding@home Client *************************
01:06:59:        Website: https://foldingathome.org/
01:06:59:      Copyright: (c) 2009-2018 foldingathome.org
01:06:59:         Author: Joseph Coffland <[email protected]>
01:06:59:           Args: 
01:06:59:         Config: C:\Users\******\AppData\Roaming\FAHClient\config.xml
01:06:59:******************************** Build ********************************
01:06:59:        Version: 7.5.1
01:06:59:           Date: May 11 2018
01:06:59:           Time: 13:06:32
01:06:59:     Repository: Git
01:06:59:       Revision: 4705bf53c635f88b8fe85af7675557e15d491ff0
01:06:59:         Branch: master
01:06:59:       Compiler: Visual C++ 2008
01:06:59:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
01:06:59:       Platform: win32 10
01:06:59:           Bits: 32
01:06:59:           Mode: Release
01:06:59:******************************* System ********************************
01:06:59:            CPU: AMD Ryzen 7 3700X 8-Core Processor
01:06:59:         CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
01:06:59:           CPUs: 16
01:06:59:         Memory: 15.92GiB
01:06:59:    Free Memory: 13.57GiB
01:06:59:        Threads: WINDOWS_THREADS
01:06:59:     OS Version: 6.2
01:06:59:    Has Battery: false
01:06:59:     On Battery: false
01:06:59:     UTC Offset: -5
01:06:59:            PID: 11992
01:06:59:            CWD: C:\Users\******\AppData\Roaming\FAHClient
01:06:59:             OS: Windows 10 Home
01:06:59:        OS Arch: AMD64
01:06:59:           GPUs: 1
01:06:59:          GPU 0: Bus:8 Slot:0 Func:0 NVIDIA:7 TU116 [GeForce GTX 1660 Ti]
01:06:59:  CUDA Device 0: Platform:0 Device:0 Bus:8 Slot:0 Compute:7.5 Driver:10.1
01:06:59:OpenCL Device 0: Platform:0 Device:0 Bus:8 Slot:0 Compute:1.2 Driver:436.15
01:06:59:  Win32 Service: false
01:06:59:***********************************************************************
01:06:59:<config>
01:06:59:  <!-- Network -->
01:06:59:  <proxy v=':8080'/>
01:06:59:
01:06:59:  <!-- Slot Control -->
01:06:59:  <power v='full'/>
01:06:59:
01:06:59:  <!-- User Information -->
01:06:59:  <passkey v='********************************'/>
01:06:59:  <team v='******'/>
01:06:59:  <user v='******'/>
01:06:59:
01:06:59:  <!-- Folding Slots -->
01:06:59:  <slot id='1' type='GPU'>
01:06:59:    <next-unit-percentage v='100'/>
01:06:59:    <paused v='true'/>
01:06:59:  </slot>
01:06:59:  <slot id='2' type='CPU'>
01:06:59:    <cpus v='15'/>
01:06:59:    <next-unit-percentage v='100'/>
01:06:59:    <paused v='true'/>
01:06:59:  </slot>
01:06:59:</config>
01:06:59:Trying to access database...
01:06:59:Successfully acquired database lock
01:06:59:Enabled folding slot 01: PAUSED gpu:0:TU116 [GeForce GTX 1660 Ti] (by user)
01:06:59:Enabled folding slot 02: PAUSED cpu:15 (by user)
01:07:25:FS01:Unpaused
01:07:25:WU01:FS01:Starting
01:07:25:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\******\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/Win32/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21.exe -dir 01 -suffix 01 -version 705 -lifeline 11992 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
01:07:25:WU01:FS01:Started FahCore on PID 15488
01:07:25:WU01:FS01:Core PID:4680
01:07:25:WU01:FS01:FahCore 0x21 started
01:07:25:WU01:FS01:0x21:*********************** Log Started 2019-09-25T01:07:25Z ***********************
01:07:25:WU01:FS01:0x21:Project: 14180 (Run 4, Clone 356, Gen 60)
01:07:25:WU01:FS01:0x21:Unit: 0x0000004d0002894c5d3b54d735134e09
01:07:25:WU01:FS01:0x21:CPU: 0x00000000000000000000000000000000
01:07:25:WU01:FS01:0x21:Machine: 1
01:07:25:WU01:FS01:0x21:Digital signatures verified
01:07:25:WU01:FS01:0x21:Folding@home GPU Core21 Folding@home Core
01:07:25:WU01:FS01:0x21:Version 0.0.20
01:07:25:WU01:FS01:0x21:  Found a checkpoint file
01:07:28:FS02:Unpaused
01:07:28:WU00:FS02:Connecting to 65.254.110.245:8080
01:07:28:WU00:FS02:Assigned to work server 155.247.166.219
01:07:28:WU00:FS02:Requesting new work unit for slot 02: READY cpu:15 from 155.247.166.219
01:07:28:WU00:FS02:Connecting to 155.247.166.219:8080
01:07:28:WU00:FS02:Downloading 6.16MiB
01:07:29:WU01:FS01:0x21:Completed 7750000 out of 12500000 steps (62%)
01:07:29:WU01:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
01:07:36:WU00:FS02:Download 3.05%
01:07:42:WU00:FS02:Download 8.12%
01:07:48:WU00:FS02:Download 11.17%
01:07:55:WU00:FS02:Download 15.23%
01:08:00:Removing old file 'configs/config-20190914-065351.xml'
01:08:00:Saving configuration to config.xml
01:08:00:<config>
01:08:00:  <!-- Network -->
01:08:00:  <proxy v=':8080'/>
01:08:00:
01:08:00:  <!-- Slot Control -->
01:08:00:  <power v='full'/>
01:08:00:
01:08:00:  <!-- User Information -->
01:08:00:  <passkey v='********************************'/>
01:08:00:  <team v='******'/>
01:08:00:  <user v='******'/>
01:08:00:
01:08:00:  <!-- Folding Slots -->
01:08:00:  <slot id='1' type='GPU'>
01:08:00:    <next-unit-percentage v='100'/>
01:08:00:  </slot>
01:08:00:  <slot id='2' type='CPU'>
01:08:00:    <cpus v='15'/>
01:08:00:    <next-unit-percentage v='100'/>
01:08:00:  </slot>
01:08:00:</config>
01:08:03:WU00:FS02:Download 18.27%
01:08:09:WU00:FS02:Download 22.34%
01:08:18:WU00:FS02:Download 25.38%
01:08:30:WU00:FS02:Download 27.41%
01:08:38:WU00:FS02:Download 29.44%
01:08:48:WU00:FS02:Download 31.47%
01:09:33:WU00:FS02:Download 33.50%
01:10:57:WU01:FS01:0x21:Completed 7875000 out of 12500000 steps (63%)
01:14:14:WU01:FS01:0x21:Completed 8000000 out of 12500000 steps (64%)
01:17:32:WU01:FS01:0x21:Completed 8125000 out of 12500000 steps (65%)
01:20:49:WU01:FS01:0x21:Completed 8250000 out of 12500000 steps (66%)
01:24:06:WU01:FS01:0x21:Completed 8375000 out of 12500000 steps (67%)
01:27:23:WU01:FS01:0x21:Completed 8500000 out of 12500000 steps (68%)
01:30:41:WU01:FS01:0x21:Completed 8625000 out of 12500000 steps (69%)
01:33:59:WU01:FS01:0x21:Completed 8750000 out of 12500000 steps (70%)
01:37:18:WU01:FS01:0x21:Completed 8875000 out of 12500000 steps (71%)
01:40:33:WU01:FS01:0x21:Completed 9000000 out of 12500000 steps (72%)
01:43:48:WU01:FS01:0x21:Completed 9125000 out of 12500000 steps (73%)
01:47:03:WU01:FS01:0x21:Completed 9250000 out of 12500000 steps (74%)
01:50:19:WU01:FS01:0x21:Completed 9375000 out of 12500000 steps (75%)
01:53:35:WU01:FS01:0x21:Completed 9500000 out of 12500000 steps (76%)
01:56:53:WU01:FS01:0x21:Completed 9625000 out of 12500000 steps (77%)
02:00:09:WU01:FS01:0x21:Completed 9750000 out of 12500000 steps (78%)
02:03:25:WU01:FS01:0x21:Completed 9875000 out of 12500000 steps (79%)
02:06:41:WU01:FS01:0x21:Completed 10000000 out of 12500000 steps (80%)
02:09:56:WU01:FS01:0x21:Completed 10125000 out of 12500000 steps (81%)
02:13:13:WU01:FS01:0x21:Completed 10250000 out of 12500000 steps (82%)
02:16:32:WU01:FS01:0x21:Completed 10375000 out of 12500000 steps (83%)
02:19:47:WU01:FS01:0x21:Completed 10500000 out of 12500000 steps (84%)
02:23:03:WU01:FS01:0x21:Completed 10625000 out of 12500000 steps (85%)
02:26:19:WU01:FS01:0x21:Completed 10750000 out of 12500000 steps (86%)
02:29:37:WU01:FS01:0x21:Completed 10875000 out of 12500000 steps (87%)
02:32:57:WU01:FS01:0x21:Completed 11000000 out of 12500000 steps (88%)
02:36:17:WU01:FS01:0x21:Completed 11125000 out of 12500000 steps (89%)
02:39:36:WU01:FS01:0x21:Completed 11250000 out of 12500000 steps (90%)
02:42:54:WU01:FS01:0x21:Completed 11375000 out of 12500000 steps (91%)
02:46:12:WU01:FS01:0x21:Completed 11500000 out of 12500000 steps (92%)
02:49:31:WU01:FS01:0x21:Completed 11625000 out of 12500000 steps (93%)
02:52:50:WU01:FS01:0x21:Completed 11750000 out of 12500000 steps (94%)
02:56:09:WU01:FS01:0x21:Completed 11875000 out of 12500000 steps (95%)
02:59:26:WU01:FS01:0x21:Completed 12000000 out of 12500000 steps (96%)
03:02:43:WU01:FS01:0x21:Completed 12125000 out of 12500000 steps (97%)
03:06:01:WU01:FS01:0x21:Completed 12250000 out of 12500000 steps (98%)
03:09:19:WU01:FS01:0x21:Completed 12375000 out of 12500000 steps (99%)
03:12:35:WU01:FS01:0x21:Completed 12500000 out of 12500000 steps (100%)
03:12:36:WU02:FS01:Connecting to 65.254.110.245:8080
03:12:36:WU01:FS01:0x21:Saving result file logfile_01.txt
03:12:36:WU01:FS01:0x21:Saving result file checkpointState.xml
03:12:36:WU01:FS01:0x21:Saving result file checkpt.crc
03:12:36:WU01:FS01:0x21:Saving result file log.txt
03:12:36:WU01:FS01:0x21:Saving result file positions.xtc
03:12:36:WU02:FS01:Assigned to work server 128.252.203.10
03:12:36:WU02:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:TU116 [GeForce GTX 1660 Ti] from 128.252.203.10
03:12:36:WU02:FS01:Connecting to 128.252.203.10:8080
03:12:36:WU01:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
03:12:36:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:14180 run:4 clone:356 gen:60 core:0x21 unit:0x0000004d0002894c5d3b54d735134e09
03:12:36:WU01:FS01:Uploading 14.87MiB to 155.247.166.220
03:12:36:WU01:FS01:Connecting to 155.247.166.220:8080
03:12:37:WU02:FS01:Downloading 69.56MiB
03:12:40:WU02:FS01:Download complete
03:12:40:WU02:FS01:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:14249 run:1488 clone:0 gen:0 core:0x21 unit:0x0000000080fccb0a5d6ed217bace6ac7
03:12:40:WU02:FS01:Starting
03:12:40:WU02:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\******\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/Win32/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21.exe -dir 02 -suffix 01 -version 705 -lifeline 11992 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
03:12:40:WU02:FS01:Started FahCore on PID 14924
03:12:40:WU02:FS01:Core PID:5212
03:12:40:WU02:FS01:FahCore 0x21 started
03:12:41:WU02:FS01:0x21:*********************** Log Started 2019-09-25T03:12:40Z ***********************
03:12:41:WU02:FS01:0x21:Project: 14249 (Run 1488, Clone 0, Gen 0)
03:12:41:WU02:FS01:0x21:Unit: 0x0000000080fccb0a5d6ed217bace6ac7
03:12:41:WU02:FS01:0x21:CPU: 0x00000000000000000000000000000000
03:12:41:WU02:FS01:0x21:Machine: 1
03:12:41:WU02:FS01:0x21:Reading tar file core.xml
03:12:41:WU02:FS01:0x21:Reading tar file integrator.xml
03:12:41:WU02:FS01:0x21:Reading tar file state.xml
03:12:41:WU02:FS01:0x21:Reading tar file system.xml
03:12:42:WU01:FS01:Upload 15.55%
03:12:42:WU02:FS01:0x21:Digital signatures verified
03:12:42:WU02:FS01:0x21:Folding@home GPU Core21 Folding@home Core
03:12:42:WU02:FS01:0x21:Version 0.0.20
03:12:48:WU01:FS01:Upload 68.93%
03:12:56:WU02:FS01:0x21:Completed 0 out of 1000000 steps (0%)
03:12:57:WU02:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
03:12:57:WU01:FS01:Upload 98.35%
03:13:00:WU01:FS01:Upload complete
03:13:00:WU01:FS01:Server responded WORK_ACK (400)
03:13:00:WU01:FS01:Final credit estimate, 162064.00 points
03:13:00:WU01:FS01:Cleaning up
03:14:37:WU02:FS01:0x21:Completed 10000 out of 1000000 steps (1%)
03:16:18:WU02:FS01:0x21:Completed 20000 out of 1000000 steps (2%)
03:18:04:WU02:FS01:0x21:Completed 30000 out of 1000000 steps (3%)
03:19:45:WU02:FS01:0x21:Completed 40000 out of 1000000 steps (4%)
03:21:29:WU02:FS01:0x21:Completed 50000 out of 1000000 steps (5%)
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Multiple WU's Fail to Upload

Post by bruce »

It looks like it's working.
Joe_H
Site Admin
Posts: 7937
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Multiple WU's Fail to Upload

Post by Joe_H »

Code:

01:08:38:WU00:FS02:Download 29.44%
01:08:48:WU00:FS02:Download 31.47%
01:09:33:WU00:FS02:Download 33.50%
01:10:57:WU01:FS01:0x21:Completed 7875000 out of 12500000 steps (63%)
01:14:14:WU01:FS01:0x21:Completed 8000000 out of 12500000 steps (64%)
01:17:32:WU01:FS01:0x21:Completed 8125000 out of 12500000 steps (65%)
Okay, I see where the download of a WU for your CPU slot, FS02, stalled and never completed. If the client doesn't detect the stall, the usual fix is to pause the slots and then restart the FAHClient process. In my experience, 7.5.1 often detects such a stall, but it can take 15-30 minutes before it tries again.
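A stall like this one can also be spotted mechanically: take the last progress line for each WU and flag it if nothing has changed for a while. A minimal Python sketch, assuming the 7.5.1 log format shown above and that all timestamps fall on the same day (the log records only `HH:MM:SS`); `find_stalls` is a hypothetical helper, and the 15-minute threshold is an arbitrary choice loosely based on the 15-30 minutes mentioned above.

```python
import re
from datetime import datetime, timedelta

# "01:09:33:WU00:FS02:Download 33.50%" or "03:12:42:WU01:FS01:Upload 15.55%"
PROGRESS_RE = re.compile(r"^(\d\d:\d\d:\d\d):(WU\d+):FS\d+:(Download|Upload) ([\d.]+)%$")
# "03:12:40:WU02:FS01:Download complete"
DONE_RE = re.compile(r"^\d\d:\d\d:\d\d:(WU\d+):FS\d+:(Download|Upload) complete$")


def find_stalls(lines, now, threshold=timedelta(minutes=15)):
    """Return {wu: (kind, percent, minutes_idle)} for transfers idle >= threshold.

    Assumes every timestamp is on the same calendar day as `now`.
    """
    last = {}  # WU -> (kind, percent, time of last progress line)
    for line in lines:
        line = line.strip()
        m = PROGRESS_RE.match(line)
        if m:
            t = datetime.strptime(m.group(1), "%H:%M:%S").time()
            last[m.group(2)] = (m.group(3), float(m.group(4)),
                                datetime.combine(now.date(), t))
            continue
        m = DONE_RE.match(line)
        if m:
            last.pop(m.group(1), None)  # finished transfers cannot be stalled
    stalls = {}
    for wu, (kind, pct, ts) in last.items():
        idle = now - ts
        if idle >= threshold:
            stalls[wu] = (kind, pct, idle.total_seconds() / 60)
    return stalls
```

Fed the excerpt above with `now` set to 01:30:00, this flags WU00 as a download stuck at 33.5% for about 20 minutes; core progress lines such as "Completed ... steps" are ignored.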
JohnJohn
Posts: 14
Joined: Fri Jun 14, 2019 1:06 am

Re: Multiple WU's Fail to Upload

Post by JohnJohn »

Yes, rebooting is the usual workaround, but four reboots over 12+ hours have not fixed it. My CPU slot is still idle. Over the last week, uploads to this server have also been slow: a data set of approx. 15 MB is taking 30+ minutes to upload.
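For scale, that upload rate can be turned into a transfer time for a typical WU. A small Python sketch, assuming the approx. 15 figure means megabytes (10^6 bytes) and a constant rate; 69.56MiB is the size of the GPU WU downloaded in the log above, and `minutes_to_transfer` is just an illustrative helper.

```python
MIB = 1024 * 1024  # bytes per MiB, the unit the client log uses
MB = 1000 * 1000   # bytes per MB, assumed unit of the reported figure


def minutes_to_transfer(size_bytes, rate_bytes_per_min):
    """Time in minutes to move size_bytes at a constant rate."""
    return size_bytes / rate_bytes_per_min


rate = 15 * MB / 30                    # approx. 15 MB in 30 minutes
kbit_per_s = rate * 8 / 60 / 1000      # the same rate in kbit/s
gpu_wu_min = minutes_to_transfer(69.56 * MIB, rate)  # 69.56MiB WU from the log
print(f"{kbit_per_s:.0f} kbit/s; a 69.56MiB WU would take about {gpu_wu_min:.0f} minutes")
```

This prints about 67 kbit/s and about 146 minutes, which is far slower than the few seconds the 69.56MiB download took in the log above.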
HaloJones
Posts: 906
Joined: Thu Jul 24, 2008 10:16 am

Re: New Work Units Fail to Complete Download

Post by HaloJones »

Me too.
Single 1070.

rewron
Posts: 12
Joined: Fri Nov 04, 2011 8:25 pm

Re: Multiple WU's Fail to Upload

Post by rewron »

bruce wrote:When were these WUs assigned to your machine? How long have these WUs been on your system?
Thanks for the response.

Project 14175 (0, 389, 4) was assigned 2019/8/30 and expires 2019/9/29.

Project 14189 (6, 134, 2) was assigned 2019/9/10 and expires 2019/9/30.

Project 14189 (1, 275, 2) was assigned 2019/9/21 T07:06:05Z and expires 2019/10/11 T07:06:05Z.
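Bruce's theory is that these WUs have expired, so it is worth checking the dates above against the time of the failing uploads. A minimal Python sketch using only the dates reported in this post; times were given only for 14189 (1, 275, 2), so midnight UTC is assumed for the other two, and `deadline_report` is a hypothetical helper.

```python
from datetime import datetime, timezone

# Assigned/expiry dates as reported above; midnight UTC assumed where no time was given.
WUS = {
    "14175 (0, 389, 4)": ("2019-08-30T00:00:00Z", "2019-09-29T00:00:00Z"),
    "14189 (6, 134, 2)": ("2019-09-10T00:00:00Z", "2019-09-30T00:00:00Z"),
    "14189 (1, 275, 2)": ("2019-09-21T07:06:05Z", "2019-10-11T07:06:05Z"),
}


def parse(ts):
    """Parse an ISO-8601 UTC timestamp such as 2019-09-24T15:27:30Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def deadline_report(now):
    """Return, per WU, its window length in days, whether it expired, and days left."""
    report = {}
    for wu, (assigned, expires) in WUS.items():
        a, e = parse(assigned), parse(expires)
        report[wu] = {
            "window_days": (e - a).days,
            "expired": now >= e,
            "days_left": (e - now).total_seconds() / 86400,
        }
    return report


for wu, info in deadline_report(parse("2019-09-24T15:27:30Z")).items():
    print(wu, info)
```

With `now` set to the start of the log posted earlier (2019-09-24T15:27:30Z), none of the three had reached the expiry date reported above at that time.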
bollix47
Posts: 2958
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: Multiple WU's Fail to Upload

Post by bollix47 »

To anyone having problems with uploading and/or downloading and restarting/rebooting has not helped please reboot your router (if you're using one) or your modem. I noticed after a stuck download that my speeds were drastically reduced (75/10 to 13/2) and turning off my router for ~30 seconds brought all my speeds back to 'normal'. I've notified the 'owner' of vav3 & vav4 (155.247.166.219/155.247.166.220) that he should also reboot his equipment. It appears that whenever a download/upload gets 'stuck' the router/modem suffers some temporary 'confusion' and rebooting clears that up.
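Before or after rebooting the router, one way to separate "the server's port is unreachable" from "connections open but transfers die" is to try a plain TCP connection to the server's port (8080 in the logs above). A minimal Python sketch using only the standard library; `port_reachable` is a hypothetical helper name, and a successful connection says nothing about whether a large transfer will finish.

```python
import socket


def port_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    This only shows that the port accepts connections; it cannot tell
    whether a large upload or download will complete.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable network, DNS failure, ...
        return False
```

For example, `port_reachable("155.247.166.219", 8080)` checks the work server from the logs above.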
rewron
Posts: 12
Joined: Fri Nov 04, 2011 8:25 pm

Re: Multiple WU's Fail to Upload

Post by rewron »

Joe_H wrote:Besides what Bruce has mentioned, I would suggest upgrading to the current version of the folding client, 7.5.1.
I plan to do that, once these WU's expire or are determined to be not salvageable. Meanwhile, I have temporarily halted new folding on this machine.
Catalina588
Posts: 41
Joined: Thu Oct 09, 2008 8:59 pm

Re: New Work Units Fail to Complete Download

Post by Catalina588 »

Overnight, five of seven client machines stalled. Stopping and starting the Linux clients did not fix the stall; a restart did.

The problem occurs on wireless (rebooted) and wired (router rebooted).

If I had a server log to examine at Stanford, it would be at 155.247.166.220, which seems to be awfully slow at downloads (2.5MB/min. to a 200Mbps router).

A Speedtest to San Jose (from Savannah) indicates 14 Mbps down and 10 Mbps up, which is a problem. To Jacksonville: 16 up, 10 down. Next stop, Comcast.
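To compare the observed download rate with the link speeds, the figures need a common unit. A small Python sketch, assuming "2.5MB/min" means 2.5 x 10^6 bytes per minute; the helper names are illustrative.

```python
def mb_per_min_to_mbit_s(mb_per_min):
    """Convert MB/min (1 MB = 10**6 bytes, an assumption) to Mbit/s."""
    return mb_per_min * 8 / 60


def utilization_percent(rate_mbit_s, capacity_mbit_s):
    """Share of a link's capacity that a transfer is using, in percent."""
    return 100 * rate_mbit_s / capacity_mbit_s


rate = mb_per_min_to_mbit_s(2.5)  # observed download rate from the post above
for label, capacity in [("200 Mbps router", 200), ("14 Mbps speed test", 14)]:
    print(f"{label}: {rate:.2f} Mbit/s is {utilization_percent(rate, capacity):.1f}% of capacity")
```

The download runs at about 0.33 Mbit/s, roughly 0.2% of the router's 200 Mbps and about 2.4% of even the 14 Mbps speed-test result, so the bottleneck is well below what the local link can carry.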

Everything is running OK at the moment.