
171.67.108.11 work server issue

Posted: Sun Jun 02, 2013 3:01 am
by dscreen
I'm not having any luck uploading a WU that came from this work server. The client fell back to the collection server 171.67.108.25, and 13 upload attempts have failed so far.

02:27:27:WU00:FS00:Uploading 6.75KiB to 171.67.108.11
02:27:27:WU00:FS00:Connecting to 171.67.108.11:8080
02:27:28:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to read stream
02:27:28:WU00:FS00:Trying to send results to collection server
02:27:28:WU00:FS00:Uploading 6.75KiB to 171.67.108.25
02:27:28:WU00:FS00:Connecting to 171.67.108.25:8080
02:27:49:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
02:27:49:WU00:FS00:Connecting to 171.67.108.25:80
02:28:10:ERROR:WU00:FS00:Exception: Failed to connect to 171.67.108.25:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
02:35:39:FS01:Shutting core down
02:35:48:WU01:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
02:35:48:WU01:FS01:Starting
02:35:48:WARNING:WU01:FS01:Changed SMP threads from 4 to 7 this can cause some work units to fail
02:35:48:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Don/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/Core_a3.fah/FahCore_a3.exe -dir 01 -suffix 01 -version 703 -lifeline 3356 -checkpoint 15 -np 7
02:35:48:WU01:FS01:Started FahCore on PID 7476
02:35:48:WU01:FS01:Core PID:9024
02:35:48:WU01:FS01:FahCore 0xa3 started
02:35:49:WU01:FS01:0xa3:
02:35:49:WU01:FS01:0xa3:*------------------------------*
02:35:49:WU01:FS01:0xa3:Folding@Home Gromacs SMP Core
02:35:49:WU01:FS01:0xa3:Version 2.27 (Dec. 15, 2010)
02:35:49:WU01:FS01:0xa3:
02:35:49:WU01:FS01:0xa3:Preparing to commence simulation
02:35:49:WU01:FS01:0xa3:- Looking at optimizations...
02:35:49:WU01:FS01:0xa3:- Files status OK
02:35:49:WU01:FS01:0xa3:- Expanded 1930208 -> 2862100 (decompressed 148.2 percent)
02:35:49:WU01:FS01:0xa3:Called DecompressByteArray: compressed_data_size=1930208 data_size=2862100, decompressed_data_size=2862100 diff=0
02:35:49:WU01:FS01:0xa3:- Digital signature verified
02:35:49:WU01:FS01:0xa3:
02:35:49:WU01:FS01:0xa3:Project: 7506 (Run 0, Clone 37, Gen 446)
02:35:49:WU01:FS01:0xa3:
02:35:49:WU01:FS01:0xa3:Assembly optimizations on if available.
02:35:49:WU01:FS01:0xa3:Entering M.D.
02:35:55:WU01:FS01:0xa3:Using Gromacs checkpoints
02:35:55:WU01:FS01:0xa3:Mapping NT from 7 to 7 
02:35:55:WU01:FS01:0xa3:Resuming from checkpoint
02:35:55:WU01:FS01:0xa3:Verified 01/wudata_01.log
02:35:55:WU01:FS01:0xa3:Verified 01/wudata_01.trr
02:35:55:WU01:FS01:0xa3:Verified 01/wudata_01.xtc
02:35:55:WU01:FS01:0xa3:Verified 01/wudata_01.edr
02:35:56:WU01:FS01:0xa3:Completed 465960 out of 500000 steps  (93%)
02:42:04:WU01:FS01:0xa3:Completed 470000 out of 500000 steps  (94%)
02:49:40:WU01:FS01:0xa3:Completed 475000 out of 500000 steps  (95%)
02:57:13:WU01:FS01:0xa3:Completed 480000 out of 500000 steps  (96%)

Re: 171.67.108.11 work server issue

Posted: Sun Jun 02, 2013 4:02 am
by Joe_H
According to the Server Status page, that WS is up and accepting WUs returned by others. As for the CS, that is a known issue. It is one of several listed in the "Do This First" topic as not being available.

Has anything changed on your system in the way of security software or firewall? The same needs to be checked for your internet connection. Can you connect to the IP address of the WS using your browser? A successful connection will show either a blank page or a page with "OK" on it. You need to check both port 80 and port 8080.
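The browser check described above can also be scripted. The following is a minimal Python sketch, not an official Folding@home tool: it sends a plain HTTP GET to each port and records either the page body or the error. It assumes, as described above, that a reachable work server answers with a blank page or an "OK"; the function name `probe_work_server` is hypothetical.

```python
import http.client
import urllib.request


def probe_work_server(host, ports=(8080, 80), timeout=10.0):
    """GET http://host:port/ for each port; return {port: page body or error}.

    Assumption (from the forum advice, not a documented API): a reachable
    work server answers with a blank page or a page containing "OK".
    """
    results = {}
    for port in ports:
        url = f"http://{host}:{port}/"
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                body = resp.read().decode("utf-8", errors="replace").strip()
                results[port] = body or "(blank page)"
        except (OSError, http.client.HTTPException) as exc:
            # URLError, timeouts and refused connections are all OSError subclasses.
            results[port] = f"FAILED: {exc}"
    return results
```

For example, `probe_work_server("171.67.108.11")` returns a dictionary keyed by port; an `"OK"` or `"(blank page)"` value for both 8080 and 80 corresponds to a successful browser check, while a `FAILED: ...` value shows which port could not be reached.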

Re: 171.67.108.11 work server issue

Posted: Sun Jun 02, 2013 3:33 pm
by dscreen
I did read the "Do This First" topic. :-) Nothing has changed on my connection. I do get the OK page from both 171.67.108.11:8080 and 171.67.108.11:80. The WU send function has fallen back to the bad collection server at 171.67.108.25, which I realize is a known issue. How do I make the WU send function go back to the original work server at 171.67.108.11? Or should I just delete the stuck WU? All of my other work units have been processed fine.

Re: 171.67.108.11 work server issue

Posted: Sun Jun 02, 2013 3:52 pm
by bollix47
There was a short period a couple of hours ago when the status for the .11 server was not normal. It appears to be okay now.

Try pausing the slots and restarting your client. It should send immediately or you could just wait until the client retries.

Re: 171.67.108.11 work server issue

Posted: Sun Jun 02, 2013 5:35 pm
by dscreen
I paused both active WUs and restarted my client. It did retry sending the stuck WU right away, but it failed again with this log entry:

17:33:07:WU00:FS00:Sending unit results: id:00 state:SEND error:DUMPED project:5767 run:10 clone:83 gen:3391 core:0x11 unit:0x4a150d1f51a78ea00d3f0053000a1687
17:33:07:WU00:FS00:Uploading 6.75KiB to 171.67.108.11
17:33:07:WU00:FS00:Connecting to 171.67.108.11:8080
17:33:07:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to read stream
17:33:07:WU00:FS00:Trying to send results to collection server
17:33:07:WU00:FS00:Uploading 6.75KiB to 171.67.108.25
17:33:07:WU00:FS00:Connecting to 171.67.108.25:8080
17:33:28:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
17:33:28:WU00:FS00:Connecting to 171.67.108.25:80
17:33:50:ERROR:WU00:FS00:Exception: Failed to connect to 171.67.108.25:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
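The log above follows a regular `HH:MM:SS:[WARNING:|ERROR:][WUnn:][FSnn:]message` layout (inferred from these excerpts, not from any FAHClient specification). A minimal Python sketch that lists each connection attempt for one WU together with the first warning or error logged after it makes the fallback chain easy to see: work server on 8080, then collection server on 8080, then collection server on 80. The function name `connection_attempts` is hypothetical.

```python
import re

# Layout inferred from the excerpts in this thread, not from FAHClient docs:
# HH:MM:SS:[WARNING:|ERROR:][WUnn:][FSnn:]message
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):"
    r"(?:(?P<level>WARNING|ERROR):)?"
    r"(?:(?P<wu>WU\d+):)?"
    r"(?:(?P<slot>FS\d+):)?"
    r"(?P<msg>.*)$"
)
CONNECT_RE = re.compile(r"Connecting to (?P<host>[\w.\-]+):(?P<port>\d+)")


def connection_attempts(log_text, wu="WU00"):
    """List each 'Connecting to host:port' for one WU with the first warning/error after it."""
    attempts = []
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m or m["wu"] != wu:
            continue  # skip unparsed lines and lines belonging to other WUs
        conn = CONNECT_RE.search(m["msg"])
        if conn:
            attempts.append({"time": m["time"], "host": conn["host"],
                             "port": int(conn["port"]), "outcome": None})
        elif m["level"] and attempts and attempts[-1]["outcome"] is None:
            # Attach the first WARNING/ERROR line to the most recent attempt.
            attempts[-1]["outcome"] = f'{m["level"]}: {m["msg"]}'
    return attempts
```

In the excerpt above, the work-server attempt fails within the same second with "Failed to read stream", while each collection-server attempt takes roughly 21 to 22 seconds before failing (17:33:07, 17:33:28, 17:33:50). That timing is consistent with the collection server not answering at all, whereas the work server did respond but the upload itself failed.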

Re: 171.67.108.11 work server issue

Posted: Sun Jun 02, 2013 5:43 pm
by PantherX
I just tried it and got "OK" on the webpage (171.67.108.11:8080). It is possible that there is an issue somewhere between you and Stanford.

Here is my traceroute, which is successful:

Microsoft Windows [Version 6.2.9200]
(c) 2012 Microsoft Corporation. All rights reserved.

C:\Users\PantherX>tracert 171.67.108.11

Tracing route to vsp07v.Stanford.EDU [171.67.108.11]
over a maximum of 30 hops:

  1    <1 ms    <1 ms     1 ms  REDACTED
  2    23 ms    28 ms    23 ms  REDACTED
  3    22 ms    22 ms    22 ms  REDACTED
  4    23 ms    21 ms    22 ms  84-235-111-5.igw.com.sa [84.235.111.5]
  5    23 ms    22 ms    22 ms  84.235.122.36
  6    35 ms    48 ms    37 ms  84-235-122-249.igw.com.sa [84.235.122.249]
  7    38 ms    51 ms    49 ms  84-235-120-33.igw.com.sa [84.235.120.33]
  8   206 ms   202 ms   200 ms  sl-gw31-nyc-11-0-0.sprintlink.net [144.232.234.181]
  9   202 ms   197 ms   200 ms  sl-crs2-nyc-0-2-0-0.sprintlink.net [144.232.13.35]
 10   189 ms   188 ms   244 ms  sl-gw50-nyc-.sprintlink.net [144.232.1.42]
 11   210 ms   200 ms   199 ms  e6-4.ar9.NYC1.gblx.net [64.208.110.33]
 12   195 ms   193 ms   263 ms  ae7.scr3.NYC1.gblx.net [67.16.142.49]
 13   261 ms   262 ms   258 ms  te4-3-10G.ar3.SJC2.gblx.net [67.17.105.34]
 14   254 ms   266 ms   258 ms  Hurrican-Electric-LLC.Port-channel100.ar3.SJC2.gblx.net [64.214.174.246]
 15   270 ms   255 ms   268 ms  10gigabitethernet5-2.core1.pao1.he.net [72.52.92.69]
 16   256 ms   268 ms   259 ms  stanford-university.10gigabitethernet1-4.core1.pao1.he.net [216.218.209.118]
 17   263 ms   266 ms   266 ms  boundarya-rtr.Stanford.EDU [68.65.168.33]
 18     *        *        *     Request timed out.
 19   258 ms   261 ms   259 ms  vsp07v.Stanford.EDU [171.67.108.11]

Trace complete.
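Reading a trace like this mostly comes down to two questions: where does the latency jump, and does a hop that timed out sit before a hop that still answers? A hop that shows only `*` but is followed by a responding hop usually means that router does not reply to traceroute probes, not that the path is broken. A minimal Python sketch for the Windows `tracert` layout shown above (parsing rules inferred from this output; `<1 ms` is treated as 1 ms, and each hop is summarized by the median of its probes; `parse_tracert` and `summarize` are hypothetical names):

```python
import re
import statistics

# Windows tracert hop line: hop number, three probe results, then the host.
HOP_RE = re.compile(r"^\s*(\d+)\s+((?:(?:<?\d+ ms|\*)\s+){3})(.*)$")
PROBE_RE = re.compile(r"<?(\d+) ms|\*")


def parse_tracert(text):
    """Return [(hop, [rtt_ms or None, ...], host)]; '<1 ms' becomes 1."""
    hops = []
    for line in text.splitlines():
        m = HOP_RE.match(line)
        if m:
            rtts = [int(p.group(1)) if p.group(1) else None
                    for p in PROBE_RE.finditer(m.group(2))]
            hops.append((int(m.group(1)), rtts, m.group(3).strip()))
    return hops


def summarize(hops):
    """Return ((jump_ms, from_hop, to_hop) for the largest median-RTT jump, silent hops)."""
    answered = [(n, statistics.median([r for r in rtts if r is not None]))
                for n, rtts, _ in hops if any(r is not None for r in rtts)]
    jumps = [(b_ms - a_ms, a_n, b_n)
             for (a_n, a_ms), (b_n, b_ms) in zip(answered, answered[1:])]
    last = answered[-1][0] if answered else 0
    # All-'*' hops before the last answering hop: most likely a router that
    # ignores probes, since traffic clearly gets past it.
    silent = [n for n, rtts, _ in hops if all(r is None for r in rtts) and n < last]
    return (max(jumps) if jumps else None), silent
```

Applied to the full trace above, the largest jump is between hops 7 and 8 (median 49 ms to 202 ms), where the route leaves the igw.com.sa network for sprintlink.net in New York, consistent with an intercontinental link. Hop 18 is the only silent hop, and hop 19, the work server itself, answers.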

Re: 171.67.108.11 work server issue

Posted: Sun Jun 02, 2013 6:24 pm
by dscreen
It seems the problem is with the WU on my computer, not with the connection to the server. I ended up deleting the WU. :-(

Re: 171.67.108.11 work server issue

Posted: Sun Jun 02, 2013 6:35 pm
by PantherX
What is interesting is that your system had already "dumped" the WU, as this line shows. If you can find the log section where that happened, it would help to know why it was dumped and what the error was:

17:33:07:WU00:FS00:Sending unit results: id:00 state:SEND error:DUMPED project:5767 run:10 clone:83 gen:3391 core:0x11 unit:0x4a150d1f51a78ea00d3f0053000a1687
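The results line packs the unit's identity and status into space-separated `key:value` pairs. A minimal Python sketch that splits them, assuming that layout holds (it is inferred from this line, not from documentation), makes the `error:DUMPED` field easy to pick out alongside the project, run, clone and gen numbers. The function name `parse_result_fields` is hypothetical.

```python
def parse_result_fields(line):
    """Split 'Sending unit results: k:v k:v ...' into a dict of strings."""
    _, found, tail = line.partition("Sending unit results:")
    if not found:
        return {}
    fields = {}
    for token in tail.split():
        key, sep, value = token.partition(":")  # split on the first colon only
        if sep:
            fields[key] = value
    return fields


sample = ("17:33:07:WU00:FS00:Sending unit results: id:00 state:SEND error:DUMPED "
          "project:5767 run:10 clone:83 gen:3391 core:0x11 "
          "unit:0x4a150d1f51a78ea00d3f0053000a1687")
fields = parse_result_fields(sample)
print(fields["error"], fields["project"], fields["run"], fields["clone"], fields["gen"])
# prints: DUMPED 5767 10 83 3391
```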

Re: 171.67.108.11 work server issue

Posted: Sun Jun 02, 2013 9:12 pm
by dscreen
I found the log entries for the "dumped" issue. It looks like work files associated with the WU were missing on my laptop. Thanks for the help on this problem.

21:06:20:WU00:FS00:0x11:Folding@home Core Shutdown: MISSING_WORK_FILES
21:06:20:WARNING:WU00:FS00:FahCore returned: MISSING_WORK_FILES (116 = 0x74)
21:06:20:WARNING:WU00:FS00:Fatal error, dumping
21:06:20:WU00:FS00:Sending unit results: id:00 state:SEND error:DUMPED project:5767 run:10 clone:83 gen:3391 core:0x11 unit:0x4a150d1f51a78ea00d3f0053000a1687
21:06:20:WU00:FS00:Uploading 6.75KiB to 171.67.108.11
21:06:20:WU00:FS00:Connecting to 171.67.108.11:8080
21:06:21:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to read stream
21:06:21:WU02:FS00:Connecting to assign-GPU.stanford.edu:80
21:06:21:WU00:FS00:Trying to send results to collection server
21:06:21:WU00:FS00:Uploading 6.75KiB to 171.67.108.25
21:06:21:WU00:FS00:Connecting to 171.67.108.25:8080
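Finding where a unit was dumped, as PantherX suggested, can be automated. A minimal Python sketch, assuming the same `HH:MM:SS:[WARNING:|ERROR:][WUnn:][FSnn:]message` layout seen in these excerpts (inferred, not documented): for each WU that logs `Fatal error, dumping`, it reports the last `FahCore returned` status logged for that WU before the dump. The function name `dump_causes` is hypothetical.

```python
import re

# Assumed layout (inferred from this thread): HH:MM:SS:[WARNING:|ERROR:][WUnn:][FSnn:]message
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):(?:(?:WARNING|ERROR):)?"
    r"(?:(?P<wu>WU\d+):)?(?:FS\d+:)?(?P<msg>.*)$"
)
RETURN_RE = re.compile(
    r"FahCore returned: (?P<name>\w+) \((?P<dec>\d+) = 0x(?P<hex>[0-9a-fA-F]+)\)"
)


def dump_causes(log_text):
    """Map each dumped WU to (time, status name, code) of its last FahCore return before the dump."""
    last_status, causes = {}, {}
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m or not m["wu"]:
            continue  # skip lines that carry no WU id
        ret = RETURN_RE.search(m["msg"])
        if ret:
            code = int(ret["dec"])
            # The log prints each code twice, e.g. "116 = 0x74"; they should agree.
            if code != int(ret["hex"], 16):
                raise ValueError(f"inconsistent exit code in: {raw}")
            last_status[m["wu"]] = (m["time"], ret["name"], code)
        elif "Fatal error, dumping" in m["msg"] and m["wu"] in last_status:
            causes[m["wu"]] = last_status[m["wu"]]
    return causes
```

On the excerpt above this reports `MISSING_WORK_FILES` (116) for WU00 at 21:06:20. WU ids such as WU00 may be reused for later units, so in a long log the result only reflects the most recent dump for each id.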