Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Thu Feb 18, 2010 9:56 pm
by heikosch
More than 25 unsuccessful attempts to upload a WU to 171.67.108.26. :-(

Heiko

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Thu Feb 18, 2010 10:14 pm
by noorman
.

You get a Status 503 when the server is too busy ...
What is Error 503?

Error 503 (Service Unavailable) means the server is temporarily unable to handle the request, usually because it is overloaded, so the client should try again later.
.

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Fri Feb 19, 2010 1:13 am
by bruce
noorman wrote:You get a Status 503 when the server is too busy ...
What is Error 503?
Error 503 (Service Unavailable) means the server is temporarily unable to handle the request, usually because it is overloaded, so the client should try again later.
Which also means that there are lots and lots of us with clients that are trying to upload all these WUs that we've been having trouble with and the number of server connections has exceeded whatever it can handle at one time.
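One common way for clients to ease this kind of overload is to space out their retries with exponential backoff plus random jitter, so that many clients do not all reconnect at the same moment. The following is a minimal sketch in Python, not the Folding@home client's actual retry logic; the name `backoff_delays` and its parameters are made up for illustration.

```python
import random


def backoff_delays(attempts, base=1.0, cap=60.0, rng=random.random):
    """Yield one wait time (in seconds) per retry attempt.

    Hypothetical helper: the ceiling doubles with each attempt up to
    `cap`, and the actual delay is a random fraction of that ceiling
    (often called "full jitter").
    """
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng() * ceiling


if __name__ == "__main__":
    for number, delay in enumerate(backoff_delays(5), start=1):
        print(f"retry {number}: wait {delay:.2f} s")
```

The random factor is the important part: without it, every client that failed at the same time would also retry at the same time.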

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Fri Feb 19, 2010 8:21 am
by HaloJones
So why does the client just stop? Shouldn't it time out and get some more work?

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Fri Feb 19, 2010 8:56 am
by noorman
.

In the early days of F@H the Client would have downloaded another WU and gone on to Fold.

I don't know how the new GPU2 Client handles this.
I wouldn't expect any difference, so I don't know why it is holding up ...

What are the error messages like? Can you post any log entries?


.
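The behaviour described above (keep the unsent WU, then carry on with new work) can be sketched roughly as follows. This is an illustration only, not the GPU2 client's code; `finish_unit`, `pending` and `try_upload` are hypothetical names.

```python
from collections import deque


def finish_unit(unit, pending, try_upload):
    """Try to upload `unit` and anything already waiting in `pending`.

    Units whose upload fails stay in `pending` for a later retry, so the
    caller can go on to fetch new work instead of stopping.
    Returns the list of units that were uploaded this time.
    """
    pending.append(unit)
    sent = []
    # Visit each queued unit exactly once per call.
    for _ in range(len(pending)):
        candidate = pending.popleft()
        if try_upload(candidate):
            sent.append(candidate)
        else:
            pending.append(candidate)  # keep in queue, try again later
    return sent
```

With this shape, a failed upload costs one attempt per finished unit and never blocks the next work unit.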

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Fri Feb 19, 2010 9:03 am
by Teddy
Personally, I think the server situation is getting worse, not better: can't send work, can't get work.
Yet some servers are behaving normally.

Is F@H becoming a victim of its own success?

Time will tell.

Just my 2c worth.

Teddy

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Fri Feb 19, 2010 9:19 am
by noorman
.

For now, this is a problem of one or two bugs in new software; they happen every day in other code ...
But I don't want to go into those; they are not F@H related :D


.

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Fri Feb 19, 2010 6:55 pm
by derrickmcc
Posted By Vijay Pande:
Fri Feb 19, 2010 6:33 pm

Joe has made some good progress in tracking down the problem. He's found the bug that was recently introduced into the WS code that caused this problem and is now testing the fix to roll out to the NV GPU WS's.

He has also suggested a short term workaround which should allow many of the WUs that have been sitting in the queue to be sent back. We've instituted that fix this morning and are looking to see if that helps the situation.

My single GPU system seems to be happily uploading and downloading WUs.

My 4 GPU system was ok earlier today, but is now having trouble:

Code: Select all

[13:48:08] + Attempting to send results [February 19 13:48:08 UTC]
[13:48:12] + Results successfully sent
[13:48:12] Thank you for your contribution to Folding@Home.
[13:48:12] + Number of Units Completed: 986

[15:50:42] + Attempting to send results [February 19 15:50:42 UTC]
[15:50:48] + Results successfully sent
[15:50:48] Thank you for your contribution to Folding@Home.
[15:50:48] + Number of Units Completed: 987

[17:53:03] Project: 10105 (Run 128, Clone 9, Gen 10)
[17:53:03] - Read packet limit of 540015616... Set to 524286976.
[17:53:03] + Attempting to send results [February 19 17:53:03 UTC]
[17:53:06] - Server does not have record of this unit. Will try again later.
[17:53:06] - Error: Could not transmit unit 00 (completed February 19) to work server.
[17:53:06]   Keeping unit 00 in queue.
[18:05:18] Project: 10105 (Run 128, Clone 9, Gen 10)
[18:05:18] - Read packet limit of 540015616... Set to 524286976.

[18:05:18] + Attempting to send results [February 19 18:05:18 UTC]
[18:05:21] - Couldn't send HTTP request to server
[18:05:21] + Could not connect to Work Server (results)
[18:05:21]     (171.64.65.71:8080)
[18:05:21] + Retrying using alternative port
[18:05:23] - Couldn't send HTTP request to server
[18:05:23] + Could not connect to Work Server (results)
[18:05:23]     (171.64.65.71:80)
[18:05:23] - Error: Could not transmit unit 00 (completed February 19) to work server.
[18:05:23] - Read packet limit of 540015616... Set to 524286976.
[18:05:23] + Attempting to send results [February 19 18:05:23 UTC]
[18:05:27] - Server does not have record of this unit. Will try again later.
[18:05:27]   Could not transmit unit 00 to Collection server; keeping in queue.
Similar pattern on the other 3 GPUs, so I don't think we are out of the woods just yet. :(

Last WU completed (on GPU 3):

Code: Select all

[18:20:23] Completed 100%
[18:20:23] Successful run
[18:20:23] DynamicWrapper: Finished Work Unit: sleep=10000
[18:20:33] Reserved 102184 bytes for xtc file; Cosm status=0
[18:20:33] Allocated 102184 bytes for xtc file
[18:20:33] - Reading up to 102184 from "work/wudata_01.xtc": Read 102184
[18:20:33] Read 102184 bytes from xtc file; available packet space=786328280
[18:20:33] xtc file hash check passed.
[18:20:33] Reserved 30216 30216 786328280 bytes for arc file=<work/wudata_01.trr> Cosm status=0
[18:20:33] Allocated 30216 bytes for arc file
[18:20:33] - Reading up to 30216 from "work/wudata_01.trr": Read 30216
[18:20:33] Read 30216 bytes from arc file; available packet space=786298064
[18:20:33] trr file hash check passed.
[18:20:33] Allocated 560 bytes for edr file
[18:20:33] Read bedfile
[18:20:33] edr file hash check passed.
[18:20:33] Logfile not read.
[18:20:33] GuardedRun: success in DynamicWrapper
[18:20:33] GuardedRun: done
[18:20:33] Run: GuardedRun completed.
[18:20:37] + Opened results file
[18:20:37] - Writing 133472 bytes of core data to disk...
[18:20:37] Done: 132960 -> 132539 (compressed to 99.6 percent)
[18:20:37]   ... Done.
[18:20:37] DeleteFrameFiles: successfully deleted file=work/wudata_01.ckp
[18:20:37] Shutting down core 
[18:20:37] 
[18:20:37] Folding@home Core Shutdown: FINISHED_UNIT
[18:20:39] CoreStatus = 64 (100)
[18:20:39] Sending work to server
[18:20:39] Project: 10104 (Run 88, Clone 5, Gen 31)
[18:20:39] - Read packet limit of 540015616... Set to 524286976.

[18:20:39] + Attempting to send results [February 19 18:20:39 UTC]
[18:20:42] - Server does not have record of this unit. Will try again later.
[18:20:42] - Error: Could not transmit unit 01 (completed February 19) to work server.
[18:20:42]   Keeping unit 01 in queue.
[18:20:42] Project: 10104 (Run 88, Clone 5, Gen 31)
[18:20:42] - Read packet limit of 540015616... Set to 524286976.


[18:20:42] + Attempting to send results [February 19 18:20:42 UTC]
[18:20:46] - Server does not have record of this unit. Will try again later.
[18:20:46] - Error: Could not transmit unit 01 (completed February 19) to work server.
[18:20:46] - Read packet limit of 540015616... Set to 524286976.

[18:20:46] + Attempting to send results [February 19 18:20:46 UTC]
[18:29:02] - Couldn't send HTTP request to server
[18:29:02] + Could not connect to Work Server (results)
[18:29:02]     (171.67.108.26:8080)
[18:29:02] + Retrying using alternative port
[18:29:02] - Couldn't send HTTP request to server
[18:29:02] + Could not connect to Work Server (results)
[18:29:02]     (171.67.108.26:80)
[18:29:02]   Could not transmit unit 01 to Collection server; keeping in queue.
[18:29:02] - Preparing to get new work unit...
[18:29:02] + Attempting to get work packet
[18:29:02] - Connecting to assignment server
[18:29:04] + Could not connect to Assignment Server
[18:29:06] + Could not connect to Assignment Server 2
[18:29:06] + Couldn't get work instructions.
[18:29:06] - Attempt #1  to get work failed, and no other work to do.
Waiting before retry.
[18:29:18] + Attempting to get work packet
[18:29:18] - Connecting to assignment server
[18:29:23] - Successful: assigned to (171.64.65.20).
[18:29:23] + News From Folding@Home: Welcome to Folding@Home
[18:29:23] Loaded queue successfully.
[18:29:25] Project: 10104 (Run 88, Clone 5, Gen 31)
[18:29:25] - Read packet limit of 540015616... Set to 524286976.

[18:29:25] + Attempting to send results [February 19 18:29:25 UTC]
[18:29:27] - Couldn't send HTTP request to server
[18:29:27] + Could not connect to Work Server (results)
[18:29:27]     (171.64.65.71:8080)
[18:29:27] + Retrying using alternative port
[18:29:28] - Couldn't send HTTP request to server
[18:29:28] + Could not connect to Work Server (results)
[18:29:28]     (171.64.65.71:80)
[18:29:28] - Error: Could not transmit unit 01 (completed February 19) to work server.
[18:29:28] - Read packet limit of 540015616... Set to 524286976.

[18:29:28] + Attempting to send results [February 19 18:29:28 UTC]
[18:30:48] - Server does not have record of this unit. Will try again later.
[18:30:48]   Could not transmit unit 01 to Collection server; keeping in queue.
[18:30:48] + Closed connections
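For anyone comparing several clients, a short script can tally how the upload attempts in a log like the ones above turned out. This is a rough sketch: the message fragments are copied from the log excerpts in this thread, while `summarize`, `PATTERNS` and the labels are names invented here.

```python
import re
from collections import Counter

# Message fragments as they appear in the client logs quoted in this thread.
PATTERNS = {
    "sent": "Results successfully sent",
    "no_record": "Server does not have record of this unit",
    "no_connection": "Couldn't send HTTP request to server",
}

# A log line starts with a [HH:MM:SS] timestamp followed by the message.
LINE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\s*(.*)$")


def summarize(log_text):
    """Count upload outcomes in a client log; lines without a timestamp are skipped."""
    counts = Counter()
    for raw in log_text.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue
        message = match.group(2)
        for label, fragment in PATTERNS.items():
            if fragment in message:
                counts[label] += 1
    return counts
```

Calling `summarize(open("FAHlog.txt").read())` would give the counts for one client, assuming the log file is named `FAHlog.txt` in the client's directory.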

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Fri Feb 19, 2010 7:10 pm
by PantherX
Well, yesterday it was fine, but later today I had problems uploading the WU. Hope they can fix this glitch soon.

Code: Select all

--- Opening Log file [February 19 17:46:08 UTC] 


# Windows GPU Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.23

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Program Files\Folding@home\Folding@home-gpu


[17:46:08] - Ask before connecting: No
[17:46:08] - User name: PantherX (Team 69411)
[17:46:08] - User ID: 90A20066F80D03B
[17:46:08] - Machine ID: 2
[17:46:08] 
[17:46:08] Loaded queue successfully.
[17:46:08] Deleting incompletely fetched item (4) from queue position #1
[17:46:08] Initialization complete
[17:46:08] - Preparing to get new work unit...
[17:46:08] + Attempting to get work packet
[17:46:08] Project: 10105 (Run 403, Clone 9, Gen 8)
[17:46:08] - Read packet limit of 540015616... Set to 524286976.


[17:46:08] + Attempting to send results [February 19 17:46:08 UTC]
[17:46:08] - Connecting to assignment server
[17:46:20] - Successful: assigned to (171.64.65.20).
[17:46:20] + News From Folding@Home: Welcome to Folding@Home
[17:46:20] Loaded queue successfully.
[17:46:26] - Couldn't send HTTP request to server
[17:46:26]   (Got status 408)
[17:46:26] + Could not connect to Work Server (results)
[17:46:26]     (171.64.65.71:8080)
[17:46:26] + Retrying using alternative port
[17:46:33] - Couldn't send HTTP request to server
[17:46:33]   (Got status 503)
[17:46:33] + Could not connect to Work Server (results)
[17:46:33]     (171.64.65.71:80)
[17:46:33] - Error: Could not transmit unit 00 (completed February 18) to work server.
[17:46:33] - Read packet limit of 540015616... Set to 524286976.


[17:46:33] + Attempting to send results [February 19 17:46:33 UTC]
[17:46:39] + Closed connections
[17:46:39] 
[17:46:39] + Processing work unit
[17:46:40] Core required: FahCore_14.exe
[17:46:40] Core found.
[17:46:40] Working on queue slot 01 [February 19 17:46:40 UTC]
[17:46:40] + Working ...
[17:46:40] 
[17:46:40] *------------------------------*
[17:46:40] Folding@Home GPU Core - Beta
[17:46:40] Version 1.26 (Wed Oct 14 13:09:26 PDT 2009)
[17:46:40] 
[17:46:40] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[17:46:40] Build host: vspm46
[17:46:40] Board Type: Nvidia
[17:46:40] Core      : 
[17:46:40] Preparing to commence simulation
[17:46:40] - Looking at optimizations...
[17:46:40] - Created dyn
[17:46:40] - Files status OK
[17:46:40] - Expanded 70227 -> 360060 (decompressed 512.7 percent)
[17:46:40] Called DecompressByteArray: compressed_data_size=70227 data_size=360060, decompressed_data_size=360060 diff=0
[17:46:40] - Digital signature verified
[17:46:40] 
[17:46:40] Project: 5910 (Run 6, Clone 99, Gen 7)
[17:46:40] 
[17:46:40] Assembly optimizations on if available.
[17:46:40] Entering M.D.
[17:46:46] Tpr hash work/wudata_01.tpr:  1891766512 2325080062 3182481768 1519290788 839412416
[17:46:46] Working on Protein
[17:46:47] Client config found, loading data.
[17:46:47] Starting GUI Server
[17:46:56] - Server does not have record of this unit. Will try again later.
[17:46:56]   Could not transmit unit 00 to Collection server; keeping in queue.
[17:46:56] Project: 10105 (Run 403, Clone 9, Gen 8)
[17:46:56] - Read packet limit of 540015616... Set to 524286976.


[17:46:56] + Attempting to send results [February 19 17:46:56 UTC]
[17:46:59] - Couldn't send HTTP request to server
[17:46:59]   (Got status 503)
[17:46:59] + Could not connect to Work Server (results)
[17:46:59]     (171.64.65.71:8080)
[17:46:59] + Retrying using alternative port
[17:47:01] - Couldn't send HTTP request to server
[17:47:01]   (Got status 503)
[17:47:01] + Could not connect to Work Server (results)
[17:47:01]     (171.64.65.71:80)
[17:47:01] - Error: Could not transmit unit 00 (completed February 18) to work server.
[17:47:01] - Read packet limit of 540015616... Set to 524286976.


[17:47:01] + Attempting to send results [February 19 17:47:01 UTC]
[17:47:26] - Server does not have record of this unit. Will try again later.
[17:47:26]   Could not transmit unit 00 to Collection server; keeping in queue.
[17:48:07] Completed 1%
[17:50:46] Completed 2%
[17:53:34] Completed 3%
[17:55:41] Completed 4%
[17:57:43] Completed 5%

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Fri Feb 19, 2010 7:34 pm
by noorman

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Fri Feb 19, 2010 7:46 pm
by noorman
Panther-X wrote: Well, yesterday it was fine, but later today I had problems uploading the WU. Hope they can fix this glitch soon.

.


'503' means the server is too busy to respond in a timely fashion, so no connection is possible. It's like "try again later" ...


.
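The two status codes in the log above (408 and 503) both point to a temporary condition. A small sketch of how a script might tell "try again later" codes from permanent ones, using Python's standard `http.HTTPStatus`; the `TRANSIENT` set is an illustrative choice, not a list taken from the F@H client.

```python
from http import HTTPStatus

# Statuses that usually signal a temporary condition worth retrying.
# Assumption for illustration: only the two codes seen in the log above.
TRANSIENT = {
    HTTPStatus.REQUEST_TIMEOUT,      # 408
    HTTPStatus.SERVICE_UNAVAILABLE,  # 503
}


def should_retry(status_code):
    """Return True if the status code is one of the transient ones."""
    try:
        return HTTPStatus(status_code) in TRANSIENT
    except ValueError:  # not a status code known to the standard library
        return False


def describe(status_code):
    """Return the code with its standard reason phrase."""
    try:
        status = HTTPStatus(status_code)
    except ValueError:
        return f"{status_code} (unknown status)"
    return f"{status.value} {status.phrase}"
```

For example, `describe(503)` gives the standard phrase for the code, and `should_retry(404)` is false because a missing resource will not fix itself by waiting.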

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Fri Feb 19, 2010 8:04 pm
by PantherX
Thanks noorman, I will retry and hope that it works fine.

EDIT - restarted the client and was greeted with this (never had this in my GPU client before)

Code: Select all

--- Opening Log file [February 19 19:59:54 UTC] 


# Windows GPU Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.23

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Program Files\Folding@home\Folding@home-gpu


[19:59:54] - Ask before connecting: No
[19:59:54] - User name: PantherX (Team 69411)
[19:59:54] - User ID: 
[19:59:54] - Machine ID: 2
[19:59:54] 
[19:59:54] Loaded queue successfully.
[19:59:54] Initialization complete
[19:59:54] 
[19:59:54] + Processing work unit
[19:59:54] Core required: FahCore_14.exe
[19:59:54] Core found.
[19:59:54] Working on queue slot 01 [February 19 19:59:54 UTC]
[19:59:54] + Working ...
[19:59:54] Project: 10105 (Run 403, Clone 9, Gen 8)
[19:59:54] - Read packet limit of 540015616... Set to 524286976.


[19:59:54] + Attempting to send results [February 19 19:59:54 UTC]
[19:59:54] 
[19:59:54] *------------------------------*
[19:59:54] Folding@Home GPU Core - Beta
[19:59:54] Version 1.26 (Wed Oct 14 13:09:26 PDT 2009)
[19:59:54] 
[19:59:54] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[19:59:54] Build host: vspm46
[19:59:54] Board Type: Nvidia
[19:59:54] Core      : 
[19:59:54] Preparing to commence simulation
[19:59:54] - Looking at optimizations...
[19:59:54] - Files status OK
[19:59:54] - Expanded 70227 -> 360060 (decompressed 512.7 percent)
[19:59:54] Called DecompressByteArray: compressed_data_size=70227 data_size=360060, decompressed_data_size=360060 diff=0
[19:59:54] - Digital signature verified
[19:59:54] 
[19:59:54] Project: 5910 (Run 6, Clone 99, Gen 7)
[19:59:54] 
[19:59:54] Assembly optimizations on if available.
[19:59:54] Entering M.D.
[20:00:00] Will resume from checkpoint file
[20:00:00] Tpr hash work/wudata_01.tpr:  1891766512 2325080062 3182481768 1519290788 839412416
[20:00:01] Working on Protein
[20:00:02] Client config found, loading data.
[20:00:02] Resuming from checkpoint
[20:00:02] fcCheckPointResume: retrieved and current tpr file hash:
[20:00:02]    0   1891766512   1891766512
[20:00:02]    1   2325080062   2325080062
[20:00:02]    2   3182481768   3182481768
[20:00:02]    3   1519290788   1519290788
[20:00:02]    4    839412416    839412416
[20:00:02] Verified work/wudata_01.log
[20:00:02] Verified work/wudata_01.edr
[20:00:02] Verified work/wudata_01.xtc
[20:00:02] Completed 64%
[20:00:02] Starting GUI Server
[20:00:04] - Server does not have record of this unit. Will try again later.
[20:00:04] - Error: Could not transmit unit 00 (completed February 18) to work server.
[20:00:04] - Read packet limit of 540015616... Set to 524286976.


[20:00:04] + Attempting to send results [February 19 20:00:04 UTC]
[20:00:14] - Server does not have record of this unit. Will try again later.
[20:00:14]   Could not transmit unit 00 to Collection server; keeping in queue.
[20:00:14] Project: 10105 (Run 403, Clone 9, Gen 8)
[20:00:14] - Read packet limit of 540015616... Set to 524286976.


[20:00:14] + Attempting to send results [February 19 20:00:14 UTC]
[20:00:24] - Server does not have record of this unit. Will try again later.
[20:00:24] - Error: Could not transmit unit 00 (completed February 18) to work server.
[20:00:24] - Read packet limit of 540015616... Set to 524286976.


[20:00:24] + Attempting to send results [February 19 20:00:24 UTC]
[20:00:33] - Server does not have record of this unit. Will try again later.
[20:00:33]   Could not transmit unit 00 to Collection server; keeping in queue.
[20:00:33] + Working...
[20:01:24] Completed 65%
[20:03:27] Completed 66%

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Fri Feb 19, 2010 8:05 pm
by MichaelO
noorman wrote:.

MORE Official news: http://foldingforum.org/viewtopic.php?f=24&t=13474


.
I am not seeing any progress on this front. If they have rolled out the fix, it is not working yet. I just had two more clients hang, with the now infamous:

"Server has no record of this WU" message.

And I have also witnessed that these clients will subsequently hang when trying to resend the WU on subsequent retries. My only success in then restarting these clients has been to delete the queue and lose the work. This is becoming incredibly and increasingly frustrating. I have only 6 GPU clients but I am considering quitting GPU folding altogether if this situation does not improve shortly. It's a waste of my time and the electricity to keep the cards running and to have to constantly babysit them.

If a situation like this happened in a corporate environment someone would have already lost their job. The network instability and what appears to be a lack of any quality assurance on software changes is appalling and the current situation is the worst I have seen it in the 3 years that I have been folding.

Apologies for the rant, but it's just been sitting here simmering and it finally boiled over.

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Fri Feb 19, 2010 8:15 pm
by MichaelO
Two more just hung with the same message:

Code: Select all

[20:06:48] 
[20:06:48] Folding@home Core Shutdown: FINISHED_UNIT
[20:06:51] CoreStatus = 64 (100)
[20:06:51] Sending work to server
[20:06:51] Project: 3469 (Run 7, Clone 195, Gen 2)


[20:06:51] + Attempting to send results [February 19 20:06:51 UTC]
[20:06:53] - Server does not have record of this unit. Will try again later.
[20:06:53] - Error: Could not transmit unit 02 (completed February 19) to work server.
[20:06:53]   Keeping unit 02 in queue.
[20:06:53] Project: 10105 (Run 44, Clone 0, Gen 19)


[20:06:53] + Attempting to send results [February 19 20:06:53 UTC]
[20:06:53] - Server does not have record of this unit. Will try again later.
[20:06:53] - Error: Could not transmit unit 01 (completed February 19) to work server.


[20:06:53] + Attempting to send results [February 19 20:06:53 UTC]
[20:06:54] - Server does not have record of this unit. Will try again later.
[20:06:54]   Could not transmit unit 01 to Collection server; keeping in queue.
[20:06:54] Project: 3469 (Run 7, Clone 195, Gen 2)


[20:06:54] + Attempting to send results [February 19 20:06:54 UTC]
[20:06:54] - Server does not have record of this unit. Will try again later.
[20:06:54] - Error: Could not transmit unit 02 (completed February 19) to work server.


[20:06:54] + Attempting to send results [February 19 20:06:54 UTC]
[20:06:55] - Server does not have record of this unit. Will try again later.
[20:06:55]   Could not transmit unit 02 to Collection server; keeping in queue.
[20:06:55] - Preparing to get new work unit...
[20:06:55] + Attempting to get work packet
[20:06:55] - Connecting to assignment server
[20:06:56] - Successful: assigned to (171.67.108.11).
[20:06:56] + News From Folding@Home: Welcome to Folding@Home
[20:06:56] Loaded queue successfully.
[20:06:56] Project: 10105 (Run 44, Clone 0, Gen 19)


[20:06:56] + Attempting to send results [February 19 20:06:56 UTC]
[20:06:57] - Server does not have record of this unit. Will try again later.
[20:06:57] - Error: Could not transmit unit 01 (completed February 19) to work server.


[20:06:57] + Attempting to send results [February 19 20:06:57 UTC]
[20:06:58] - Server does not have record of this unit. Will try again later.
[20:06:58]   Could not transmit unit 01 to Collection server; keeping in queue.
[20:06:58] Project: 3469 (Run 7, Clone 195, Gen 2)


[20:06:58] + Attempting to send results [February 19 20:06:58 UTC]
[20:06:58] - Server does not have record of this unit. Will try again later.
[20:06:58] - Error: Could not transmit unit 02 (completed February 19) to work server.


[20:06:58] + Attempting to send results [February 19 20:06:58 UTC]
[20:06:58] - Server does not have record of this unit. Will try again later.
[20:06:58]   Could not transmit unit 02 to Collection server; keeping in queue.
[20:06:58] + Closed connections

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Fri Feb 19, 2010 8:18 pm
by noorman
.
- Server does not have record of this unit. Will try again later.
.

This has been reported before; I've passed this on to the Pande Group because it sometimes used to happen in the past too, but I had not seen it in years.

The Collection server's list of outgoing WUs is incomplete (or incorrect), so it has no record of the unit and doesn't accept it.


.
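As a rough picture of that explanation: if the server keeps a record of the units it handed out and checks each returned result against that record, a missing entry leads to a rejection. The sketch below is purely hypothetical and is not the Pande Group's server code; `UnitId` and `accept_result` are invented names, and the returned strings simply mirror the client log messages.

```python
from typing import NamedTuple


class UnitId(NamedTuple):
    """Identifies a work unit the way the client log prints it."""
    project: int
    run: int
    clone: int
    gen: int


def accept_result(assigned, unit):
    """Accept a result only if `unit` is in the set of assigned units."""
    if unit not in assigned:
        return "Server does not have record of this unit."
    assigned.remove(unit)  # each assigned unit can be returned once
    return "Results successfully sent"
```

If the assigned set is incomplete, a perfectly valid result gets the "no record" answer, which matches what the logs in this thread show.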