Re: GPU server status 171.67.108.21, 171.64.65.71, 171.67.108.26
Posted: Thu Feb 18, 2010 9:56 pm
More than 25 unsuccessful attempts to upload a WU to 171.67.108.26.
Heiko
Community driven support forum for Folding@home
https://foldingforum.org/
What is Error 503?
Error 503: Service Unavailable means the server took too long to answer and the connection timed out.
noorman wrote: You get a Status 503 when the server is too busy ...
Which also means that there are lots and lots of us with clients trying to upload all these WUs that we've been having trouble with, and the number of server connections has exceeded whatever the server can handle at one time.
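As a rough illustration of the "server too busy, try again later" idea, here is a minimal Python sketch that treats the two status codes seen in this thread (408 Request Timeout and 503 Service Unavailable) as transient and spaces out the retries. This is not how the Folding@home client actually schedules its retries; the function names, the 60-second base delay and the one-hour cap are assumptions made up for the example.

```python
from __future__ import annotations

# Hypothetical helpers for illustration; not taken from the Folding@home client.
RETRYABLE_STATUSES = {408, 503}  # Request Timeout, Service Unavailable


def is_retryable(status: int) -> bool:
    """Return True for status codes that usually point to a transient problem."""
    return status in RETRYABLE_STATUSES


def backoff_delays(attempts: int, base: float = 60.0, cap: float = 3600.0) -> list[float]:
    """Exponential backoff in seconds: base, 2*base, 4*base, ... limited to `cap`."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]


if __name__ == "__main__":
    for status in (200, 408, 503):
        verdict = "transient, retry later" if is_retryable(status) else "not a transient error"
        print(status, verdict)
    print(backoff_delays(8))
```

Spacing the retries further apart each time is one common way to avoid piling more connections onto a server that is already overloaded.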
My single GPU system seems to be happily uploading and downloading WUs.
Posted: Fri Feb 19, 2010 6:33 pm
Joe has made some good progress in tracking down the problem. He's found the bug that was recently introduced into the WS code that caused this problem and is now testing the fix to roll out to the NV GPU WS's.
He has also suggested a short-term workaround which should allow many of the WUs that have been sitting in the queue to be sent back. We instituted that workaround this morning and are watching to see if it helps the situation.
Code:
[13:48:08] + Attempting to send results [February 19 13:48:08 UTC]
[13:48:12] + Results successfully sent
[13:48:12] Thank you for your contribution to Folding@Home.
[13:48:12] + Number of Units Completed: 986
[15:50:42] + Attempting to send results [February 19 15:50:42 UTC]
[15:50:48] + Results successfully sent
[15:50:48] Thank you for your contribution to Folding@Home.
[15:50:48] + Number of Units Completed: 987
[17:53:03] Project: 10105 (Run 128, Clone 9, Gen 10)
[17:53:03] - Read packet limit of 540015616... Set to 524286976.
[17:53:03] + Attempting to send results [February 19 17:53:03 UTC]
[17:53:06] - Server does not have record of this unit. Will try again later.
[17:53:06] - Error: Could not transmit unit 00 (completed February 19) to work server.
[17:53:06] Keeping unit 00 in queue.
[18:05:18] Project: 10105 (Run 128, Clone 9, Gen 10)
[18:05:18] - Read packet limit of 540015616... Set to 524286976.
[18:05:18] + Attempting to send results [February 19 18:05:18 UTC]
[18:05:21] - Couldn't send HTTP request to server
[18:05:21] + Could not connect to Work Server (results)
[18:05:21] (171.64.65.71:8080)
[18:05:21] + Retrying using alternative port
[18:05:23] - Couldn't send HTTP request to server
[18:05:23] + Could not connect to Work Server (results)
[18:05:23] (171.64.65.71:80)
[18:05:23] - Error: Could not transmit unit 00 (completed February 19) to work server.
[18:05:23] - Read packet limit of 540015616... Set to 524286976.
[18:05:23] + Attempting to send results [February 19 18:05:23 UTC]
[18:05:27] - Server does not have record of this unit. Will try again later.
[18:05:27] Could not transmit unit 00 to Collection server; keeping in queue.
Code:
[18:20:23] Completed 100%
[18:20:23] Successful run
[18:20:23] DynamicWrapper: Finished Work Unit: sleep=10000
[18:20:33] Reserved 102184 bytes for xtc file; Cosm status=0
[18:20:33] Allocated 102184 bytes for xtc file
[18:20:33] - Reading up to 102184 from "work/wudata_01.xtc": Read 102184
[18:20:33] Read 102184 bytes from xtc file; available packet space=786328280
[18:20:33] xtc file hash check passed.
[18:20:33] Reserved 30216 30216 786328280 bytes for arc file=<work/wudata_01.trr> Cosm status=0
[18:20:33] Allocated 30216 bytes for arc file
[18:20:33] - Reading up to 30216 from "work/wudata_01.trr": Read 30216
[18:20:33] Read 30216 bytes from arc file; available packet space=786298064
[18:20:33] trr file hash check passed.
[18:20:33] Allocated 560 bytes for edr file
[18:20:33] Read bedfile
[18:20:33] edr file hash check passed.
[18:20:33] Logfile not read.
[18:20:33] GuardedRun: success in DynamicWrapper
[18:20:33] GuardedRun: done
[18:20:33] Run: GuardedRun completed.
[18:20:37] + Opened results file
[18:20:37] - Writing 133472 bytes of core data to disk...
[18:20:37] Done: 132960 -> 132539 (compressed to 99.6 percent)
[18:20:37] ... Done.
[18:20:37] DeleteFrameFiles: successfully deleted file=work/wudata_01.ckp
[18:20:37] Shutting down core
[18:20:37]
[18:20:37] Folding@home Core Shutdown: FINISHED_UNIT
[18:20:39] CoreStatus = 64 (100)
[18:20:39] Sending work to server
[18:20:39] Project: 10104 (Run 88, Clone 5, Gen 31)
[18:20:39] - Read packet limit of 540015616... Set to 524286976.
[18:20:39] + Attempting to send results [February 19 18:20:39 UTC]
[18:20:42] - Server does not have record of this unit. Will try again later.
[18:20:42] - Error: Could not transmit unit 01 (completed February 19) to work server.
[18:20:42] Keeping unit 01 in queue.
[18:20:42] Project: 10104 (Run 88, Clone 5, Gen 31)
[18:20:42] - Read packet limit of 540015616... Set to 524286976.
[18:20:42] + Attempting to send results [February 19 18:20:42 UTC]
[18:20:46] - Server does not have record of this unit. Will try again later.
[18:20:46] - Error: Could not transmit unit 01 (completed February 19) to work server.
[18:20:46] - Read packet limit of 540015616... Set to 524286976.
[18:20:46] + Attempting to send results [February 19 18:20:46 UTC]
[18:29:02] - Couldn't send HTTP request to server
[18:29:02] + Could not connect to Work Server (results)
[18:29:02] (171.67.108.26:8080)
[18:29:02] + Retrying using alternative port
[18:29:02] - Couldn't send HTTP request to server
[18:29:02] + Could not connect to Work Server (results)
[18:29:02] (171.67.108.26:80)
[18:29:02] Could not transmit unit 01 to Collection server; keeping in queue.
[18:29:02] - Preparing to get new work unit...
[18:29:02] + Attempting to get work packet
[18:29:02] - Connecting to assignment server
[18:29:04] + Could not connect to Assignment Server
[18:29:06] + Could not connect to Assignment Server 2
[18:29:06] + Couldn't get work instructions.
[18:29:06] - Attempt #1 to get work failed, and no other work to do.
Waiting before retry.
[18:29:18] + Attempting to get work packet
[18:29:18] - Connecting to assignment server
[18:29:23] - Successful: assigned to (171.64.65.20).
[18:29:23] + News From Folding@Home: Welcome to Folding@Home
[18:29:23] Loaded queue successfully.
[18:29:25] Project: 10104 (Run 88, Clone 5, Gen 31)
[18:29:25] - Read packet limit of 540015616... Set to 524286976.
[18:29:25] + Attempting to send results [February 19 18:29:25 UTC]
[18:29:27] - Couldn't send HTTP request to server
[18:29:27] + Could not connect to Work Server (results)
[18:29:27] (171.64.65.71:8080)
[18:29:27] + Retrying using alternative port
[18:29:28] - Couldn't send HTTP request to server
[18:29:28] + Could not connect to Work Server (results)
[18:29:28] (171.64.65.71:80)
[18:29:28] - Error: Could not transmit unit 01 (completed February 19) to work server.
[18:29:28] - Read packet limit of 540015616... Set to 524286976.
[18:29:28] + Attempting to send results [February 19 18:29:28 UTC]
[18:30:48] - Server does not have record of this unit. Will try again later.
[18:30:48] Could not transmit unit 01 to Collection server; keeping in queue.
[18:30:48] + Closed connections
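For anyone wading through logs like the one above, a small Python sketch can tally which work server addresses the client failed to reach. It assumes the layout seen in these logs, where a "Could not connect to Work Server" line is followed directly by a line holding only the address in parentheses; the function name and the sample lines (copied from the log above) are just for illustration.

```python
import re
from collections import Counter

# Matches address-only lines such as "[18:29:02] (171.67.108.26:8080)".
ENDPOINT_RE = re.compile(
    r"^\[\d{2}:\d{2}:\d{2}\]\s+\((\d{1,3}(?:\.\d{1,3}){3}):(\d+)\)\s*$"
)
CONNECT_FAIL = "Could not connect to Work Server"


def failed_endpoints(lines):
    """Count (ip, port) pairs that directly follow a connect-failure line."""
    counts = Counter()
    pending = False  # True only right after a connect-failure line
    for line in lines:
        if CONNECT_FAIL in line:
            pending = True
            continue
        match = ENDPOINT_RE.match(line.strip())
        if pending and match:
            counts[(match.group(1), int(match.group(2)))] += 1
        pending = False
    return counts


SAMPLE = """\
[18:29:02] - Couldn't send HTTP request to server
[18:29:02] + Could not connect to Work Server (results)
[18:29:02] (171.67.108.26:8080)
[18:29:02] + Retrying using alternative port
[18:29:02] - Couldn't send HTTP request to server
[18:29:02] + Could not connect to Work Server (results)
[18:29:02] (171.67.108.26:80)
[18:29:23] - Successful: assigned to (171.64.65.20).
""".splitlines()

if __name__ == "__main__":
    for (ip, port), n in failed_endpoints(SAMPLE).items():
        print(f"{ip}:{port} failed {n} time(s)")
```

The assignment server line is deliberately not counted, since its address is embedded in a longer message rather than standing alone after a failure.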
Code:
--- Opening Log file [February 19 17:46:08 UTC]
# Windows GPU Console Edition #################################################
###############################################################################
Folding@Home Client Version 6.23
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: C:\Program Files\Folding@home\Folding@home-gpu
[17:46:08] - Ask before connecting: No
[17:46:08] - User name: PantherX (Team 69411)
[17:46:08] - User ID: 90A20066F80D03B
[17:46:08] - Machine ID: 2
[17:46:08]
[17:46:08] Loaded queue successfully.
[17:46:08] Deleting incompletely fetched item (4) from queue position #1
[17:46:08] Initialization complete
[17:46:08] - Preparing to get new work unit...
[17:46:08] + Attempting to get work packet
[17:46:08] Project: 10105 (Run 403, Clone 9, Gen 8)
[17:46:08] - Read packet limit of 540015616... Set to 524286976.
[17:46:08] + Attempting to send results [February 19 17:46:08 UTC]
[17:46:08] - Connecting to assignment server
[17:46:20] - Successful: assigned to (171.64.65.20).
[17:46:20] + News From Folding@Home: Welcome to Folding@Home
[17:46:20] Loaded queue successfully.
[17:46:26] - Couldn't send HTTP request to server
[17:46:26] (Got status 408)
[17:46:26] + Could not connect to Work Server (results)
[17:46:26] (171.64.65.71:8080)
[17:46:26] + Retrying using alternative port
[17:46:33] - Couldn't send HTTP request to server
[17:46:33] (Got status 503)
[17:46:33] + Could not connect to Work Server (results)
[17:46:33] (171.64.65.71:80)
[17:46:33] - Error: Could not transmit unit 00 (completed February 18) to work server.
[17:46:33] - Read packet limit of 540015616... Set to 524286976.
[17:46:33] + Attempting to send results [February 19 17:46:33 UTC]
[17:46:39] + Closed connections
[17:46:39]
[17:46:39] + Processing work unit
[17:46:40] Core required: FahCore_14.exe
[17:46:40] Core found.
[17:46:40] Working on queue slot 01 [February 19 17:46:40 UTC]
[17:46:40] + Working ...
[17:46:40]
[17:46:40] *------------------------------*
[17:46:40] Folding@Home GPU Core - Beta
[17:46:40] Version 1.26 (Wed Oct 14 13:09:26 PDT 2009)
[17:46:40]
[17:46:40] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[17:46:40] Build host: vspm46
[17:46:40] Board Type: Nvidia
[17:46:40] Core :
[17:46:40] Preparing to commence simulation
[17:46:40] - Looking at optimizations...
[17:46:40] - Created dyn
[17:46:40] - Files status OK
[17:46:40] - Expanded 70227 -> 360060 (decompressed 512.7 percent)
[17:46:40] Called DecompressByteArray: compressed_data_size=70227 data_size=360060, decompressed_data_size=360060 diff=0
[17:46:40] - Digital signature verified
[17:46:40]
[17:46:40] Project: 5910 (Run 6, Clone 99, Gen 7)
[17:46:40]
[17:46:40] Assembly optimizations on if available.
[17:46:40] Entering M.D.
[17:46:46] Tpr hash work/wudata_01.tpr: 1891766512 2325080062 3182481768 1519290788 839412416
[17:46:46] Working on Protein
[17:46:47] Client config found, loading data.
[17:46:47] Starting GUI Server
[17:46:56] - Server does not have record of this unit. Will try again later.
[17:46:56] Could not transmit unit 00 to Collection server; keeping in queue.
[17:46:56] Project: 10105 (Run 403, Clone 9, Gen 8)
[17:46:56] - Read packet limit of 540015616... Set to 524286976.
[17:46:56] + Attempting to send results [February 19 17:46:56 UTC]
[17:46:59] - Couldn't send HTTP request to server
[17:46:59] (Got status 503)
[17:46:59] + Could not connect to Work Server (results)
[17:46:59] (171.64.65.71:8080)
[17:46:59] + Retrying using alternative port
[17:47:01] - Couldn't send HTTP request to server
[17:47:01] (Got status 503)
[17:47:01] + Could not connect to Work Server (results)
[17:47:01] (171.64.65.71:80)
[17:47:01] - Error: Could not transmit unit 00 (completed February 18) to work server.
[17:47:01] - Read packet limit of 540015616... Set to 524286976.
[17:47:01] + Attempting to send results [February 19 17:47:01 UTC]
[17:47:26] - Server does not have record of this unit. Will try again later.
[17:47:26] Could not transmit unit 00 to Collection server; keeping in queue.
[17:48:07] Completed 1%
[17:50:46] Completed 2%
[17:53:34] Completed 3%
[17:55:41] Completed 4%
[17:57:43] Completed 5%
Panther-X wrote: Well, yesterday it was fine, but later today I had a problem uploading the WU. I hope they can fix this glitch soon.
Code:
--- Opening Log file [February 19 19:59:54 UTC]
# Windows GPU Console Edition #################################################
###############################################################################
Folding@Home Client Version 6.23
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: C:\Program Files\Folding@home\Folding@home-gpu
[19:59:54] - Ask before connecting: No
[19:59:54] - User name: PantherX (Team 69411)
[19:59:54] - User ID:
[19:59:54] - Machine ID: 2
[19:59:54]
[19:59:54] Loaded queue successfully.
[19:59:54] Initialization complete
[19:59:54]
[19:59:54] + Processing work unit
[19:59:54] Core required: FahCore_14.exe
[19:59:54] Core found.
[19:59:54] Working on queue slot 01 [February 19 19:59:54 UTC]
[19:59:54] + Working ...
[19:59:54] Project: 10105 (Run 403, Clone 9, Gen 8)
[19:59:54] - Read packet limit of 540015616... Set to 524286976.
[19:59:54] + Attempting to send results [February 19 19:59:54 UTC]
[19:59:54]
[19:59:54] *------------------------------*
[19:59:54] Folding@Home GPU Core - Beta
[19:59:54] Version 1.26 (Wed Oct 14 13:09:26 PDT 2009)
[19:59:54]
[19:59:54] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[19:59:54] Build host: vspm46
[19:59:54] Board Type: Nvidia
[19:59:54] Core :
[19:59:54] Preparing to commence simulation
[19:59:54] - Looking at optimizations...
[19:59:54] - Files status OK
[19:59:54] - Expanded 70227 -> 360060 (decompressed 512.7 percent)
[19:59:54] Called DecompressByteArray: compressed_data_size=70227 data_size=360060, decompressed_data_size=360060 diff=0
[19:59:54] - Digital signature verified
[19:59:54]
[19:59:54] Project: 5910 (Run 6, Clone 99, Gen 7)
[19:59:54]
[19:59:54] Assembly optimizations on if available.
[19:59:54] Entering M.D.
[20:00:00] Will resume from checkpoint file
[20:00:00] Tpr hash work/wudata_01.tpr: 1891766512 2325080062 3182481768 1519290788 839412416
[20:00:01] Working on Protein
[20:00:02] Client config found, loading data.
[20:00:02] Resuming from checkpoint
[20:00:02] fcCheckPointResume: retrieved and current tpr file hash:
[20:00:02] 0 1891766512 1891766512
[20:00:02] 1 2325080062 2325080062
[20:00:02] 2 3182481768 3182481768
[20:00:02] 3 1519290788 1519290788
[20:00:02] 4 839412416 839412416
[20:00:02] Verified work/wudata_01.log
[20:00:02] Verified work/wudata_01.edr
[20:00:02] Verified work/wudata_01.xtc
[20:00:02] Completed 64%
[20:00:02] Starting GUI Server
[20:00:04] - Server does not have record of this unit. Will try again later.
[20:00:04] - Error: Could not transmit unit 00 (completed February 18) to work server.
[20:00:04] - Read packet limit of 540015616... Set to 524286976.
[20:00:04] + Attempting to send results [February 19 20:00:04 UTC]
[20:00:14] - Server does not have record of this unit. Will try again later.
[20:00:14] Could not transmit unit 00 to Collection server; keeping in queue.
[20:00:14] Project: 10105 (Run 403, Clone 9, Gen 8)
[20:00:14] - Read packet limit of 540015616... Set to 524286976.
[20:00:14] + Attempting to send results [February 19 20:00:14 UTC]
[20:00:24] - Server does not have record of this unit. Will try again later.
[20:00:24] - Error: Could not transmit unit 00 (completed February 18) to work server.
[20:00:24] - Read packet limit of 540015616... Set to 524286976.
[20:00:24] + Attempting to send results [February 19 20:00:24 UTC]
[20:00:33] - Server does not have record of this unit. Will try again later.
[20:00:33] Could not transmit unit 00 to Collection server; keeping in queue.
[20:00:33] + Working...
[20:01:24] Completed 65%
[20:03:27] Completed 66%
I am not seeing any progress on this front. If they have rolled out the fix, it is not working yet. I just had two more clients hang, with the now infamous:
Code:
[20:06:48]
[20:06:48] Folding@home Core Shutdown: FINISHED_UNIT
[20:06:51] CoreStatus = 64 (100)
[20:06:51] Sending work to server
[20:06:51] Project: 3469 (Run 7, Clone 195, Gen 2)
[20:06:51] + Attempting to send results [February 19 20:06:51 UTC]
[20:06:53] - Server does not have record of this unit. Will try again later.
[20:06:53] - Error: Could not transmit unit 02 (completed February 19) to work server.
[20:06:53] Keeping unit 02 in queue.
[20:06:53] Project: 10105 (Run 44, Clone 0, Gen 19)
[20:06:53] + Attempting to send results [February 19 20:06:53 UTC]
[20:06:53] - Server does not have record of this unit. Will try again later.
[20:06:53] - Error: Could not transmit unit 01 (completed February 19) to work server.
[20:06:53] + Attempting to send results [February 19 20:06:53 UTC]
[20:06:54] - Server does not have record of this unit. Will try again later.
[20:06:54] Could not transmit unit 01 to Collection server; keeping in queue.
[20:06:54] Project: 3469 (Run 7, Clone 195, Gen 2)
[20:06:54] + Attempting to send results [February 19 20:06:54 UTC]
[20:06:54] - Server does not have record of this unit. Will try again later.
[20:06:54] - Error: Could not transmit unit 02 (completed February 19) to work server.
[20:06:54] + Attempting to send results [February 19 20:06:54 UTC]
[20:06:55] - Server does not have record of this unit. Will try again later.
[20:06:55] Could not transmit unit 02 to Collection server; keeping in queue.
[20:06:55] - Preparing to get new work unit...
[20:06:55] + Attempting to get work packet
[20:06:55] - Connecting to assignment server
[20:06:56] - Successful: assigned to (171.67.108.11).
[20:06:56] + News From Folding@Home: Welcome to Folding@Home
[20:06:56] Loaded queue successfully.
[20:06:56] Project: 10105 (Run 44, Clone 0, Gen 19)
[20:06:56] + Attempting to send results [February 19 20:06:56 UTC]
[20:06:57] - Server does not have record of this unit. Will try again later.
[20:06:57] - Error: Could not transmit unit 01 (completed February 19) to work server.
[20:06:57] + Attempting to send results [February 19 20:06:57 UTC]
[20:06:58] - Server does not have record of this unit. Will try again later.
[20:06:58] Could not transmit unit 01 to Collection server; keeping in queue.
[20:06:58] Project: 3469 (Run 7, Clone 195, Gen 2)
[20:06:58] + Attempting to send results [February 19 20:06:58 UTC]
[20:06:58] - Server does not have record of this unit. Will try again later.
[20:06:58] - Error: Could not transmit unit 02 (completed February 19) to work server.
[20:06:58] + Attempting to send results [February 19 20:06:58 UTC]
[20:06:58] - Server does not have record of this unit. Will try again later.
[20:06:58] Could not transmit unit 02 to Collection server; keeping in queue.
[20:06:58] + Closed connections
- Server does not have record of this unit. Will try again later.
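To see which WUs keep getting that message, something like the following Python sketch could count the rejections per Project (Run, Clone, Gen). It attributes each "Server does not have record of this unit" line to the most recent Project line before it, which is an assumption: it holds for the upload sequence shown here, but can misattribute when the core prints its own Project line in between. The function name and sample are for illustration only; the sample lines are copied from the log above.

```python
import re
from collections import Counter

PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
NO_RECORD = "Server does not have record of this unit"


def rejections_per_unit(lines):
    """Count 'no record' rejections keyed by (project, run, clone, gen)."""
    counts = Counter()
    current = None  # most recent Project line seen so far
    for line in lines:
        match = PROJECT_RE.search(line)
        if match:
            current = tuple(int(g) for g in match.groups())
        elif NO_RECORD in line and current is not None:
            counts[current] += 1
    return counts


SAMPLE = """\
[20:06:51] Project: 3469 (Run 7, Clone 195, Gen 2)
[20:06:51] + Attempting to send results [February 19 20:06:51 UTC]
[20:06:53] - Server does not have record of this unit. Will try again later.
[20:06:53] - Error: Could not transmit unit 02 (completed February 19) to work server.
[20:06:53] Keeping unit 02 in queue.
[20:06:53] Project: 10105 (Run 44, Clone 0, Gen 19)
[20:06:53] + Attempting to send results [February 19 20:06:53 UTC]
[20:06:53] - Server does not have record of this unit. Will try again later.
[20:06:53] - Error: Could not transmit unit 01 (completed February 19) to work server.
[20:06:53] + Attempting to send results [February 19 20:06:53 UTC]
[20:06:54] - Server does not have record of this unit. Will try again later.
[20:06:54] Could not transmit unit 01 to Collection server; keeping in queue.
""".splitlines()

if __name__ == "__main__":
    for (project, run, clone, gen), n in rejections_per_unit(SAMPLE).items():
        print(f"Project {project} (Run {run}, Clone {clone}, Gen {gen}): {n} rejection(s)")
```

Posting the Project/Run/Clone/Gen numbers along with the counts makes it easier for the server admins to look up the affected units.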