Hardware configuration:
Main/Daily Workstation and Primary Folder - EVGA x58 mobo with 12GB RAM; Intel Core i7 920, stock clock, on water; Adaptec 3805 SAS controller; 2 x Fujitsu 15K RPM 147GB SAS in RAID0; 3 x WD Caviar Black 7200 RPM 500GB SATA in RAID0; 2 x EVGA 9800GT 600/1727/900; SuperSpeed RAM Drive - 2GB for temp/cache/capture stuff; Vista 64; Cooler Master HAF case
Secondary Folder - EVGA 750i with 4GB RAM; E7400 CPU; 1 x WD500 SATA; EVGA GTX 260; Windows 7
[19:39:13] + Attempting to send results [June 17 19:39:13 UTC]
[19:39:14] - Couldn't send HTTP request to server
[19:39:14] + Could not connect to Work Server (results)
[19:39:14] (171.67.108.11:8080)
[19:39:14] + Retrying using alternative port
[19:39:15] - Couldn't send HTTP request to server
[19:39:15] + Could not connect to Work Server (results)
[19:39:15] (171.67.108.11:80)
[19:39:15] - Error: Could not transmit unit 08 (completed June 17) to work server.
[19:39:15] Keeping unit 08 in queue.
[19:39:15] Project: 5755 (Run 7, Clone 285, Gen 139)
[19:39:15] + Attempting to send results [June 17 19:39:15 UTC]
[19:39:16] - Couldn't send HTTP request to server
[19:39:16] + Could not connect to Work Server (results)
[19:39:16] (171.67.108.11:8080)
[19:39:16] + Retrying using alternative port
[19:39:17] - Couldn't send HTTP request to server
[19:39:17] + Could not connect to Work Server (results)
[19:39:17] (171.67.108.11:80)
[19:39:17] - Error: Could not transmit unit 08 (completed June 17) to work server.
[19:39:17] + Attempting to send results [June 17 19:39:17 UTC]
[19:39:18] - Couldn't send HTTP request to server
[19:39:18] (Got status 503)
[19:39:18] + Could not connect to Work Server (results)
[19:39:18] (171.67.108.25:8080)
[19:39:18] + Retrying using alternative port
[19:39:18] - Couldn't send HTTP request to server
[19:39:18] (Got status 503)
[19:39:18] + Could not connect to Work Server (results)
[19:39:18] (171.67.108.25:80)
[19:39:18] Could not transmit unit 08 to Collection server; keeping in queue.
[19:39:18] - Preparing to get new work unit...
[19:39:18] + Attempting to get work packet
[19:39:18] - Connecting to assignment server
[19:39:18] - Successful: assigned to (171.64.122.70).
[19:39:18] + News From Folding@Home: Welcome to Folding@Home
[19:39:18] Loaded queue successfully.
[19:39:18] Project: 5755 (Run 7, Clone 285, Gen 139)
[19:39:18] + Attempting to send results [June 17 19:39:18 UTC]
[19:39:19] - Couldn't send HTTP request to server
[19:39:19] + Could not connect to Work Server (results)
[19:39:19] (171.67.108.11:8080)
[19:39:19] + Retrying using alternative port
[19:39:20] - Couldn't send HTTP request to server
[19:39:20] + Could not connect to Work Server (results)
[19:39:20] (171.67.108.11:80)
[19:39:20] - Error: Could not transmit unit 08 (completed June 17) to work server.
[19:39:20] + Attempting to send results [June 17 19:39:20 UTC]
[19:39:20] - Couldn't send HTTP request to server
[19:39:20] (Got status 503)
[19:39:20] + Could not connect to Work Server (results)
[19:39:20] (171.67.108.25:8080)
[19:39:20] + Retrying using alternative port
[19:39:20] - Couldn't send HTTP request to server
[19:39:20] (Got status 503)
[19:39:20] + Could not connect to Work Server (results)
[19:39:20] (171.67.108.25:80)
[19:39:20] Could not transmit unit 08 to Collection server; keeping in queue.
[19:39:20] + Closed connections
[19:39:20]
[19:39:20] + Processing work unit
[19:39:20] Core required: FahCore_14.exe
[19:39:20] Core found.
[19:39:20] Working on queue slot 09 [June 17 19:39:20 UTC]
[19:39:20] + Working ...
[23:03:27] Project: 5911 (Run 4, Clone 497, Gen 5)
[23:03:27]
[23:03:27] Assembly optimizations on if available.
[23:03:27] Entering M.D.
[23:03:28] - Couldn't send HTTP request to server
[23:03:28] + Could not connect to Work Server (results)
[23:03:28] (171.67.108.11:8080)
[23:03:28] + Retrying using alternative port
[23:03:29] - Couldn't send HTTP request to server
[23:03:29] + Could not connect to Work Server (results)
[23:03:29] (171.67.108.11:80)
[23:03:29] - Error: Could not transmit unit 01 (completed June 17) to work server.
[23:03:29] - Read packet limit of 540015616... Set to 524286976.
[23:03:29] + Attempting to send results [June 17 23:03:29 UTC]
[23:03:29] - Couldn't send HTTP request to server
[23:03:29] (Got status 503)
[23:03:29] + Could not connect to Work Server (results)
[23:03:29] (171.67.108.25:8080)
[23:03:29] + Retrying using alternative port
[23:03:30] - Couldn't send HTTP request to server
[23:03:30] (Got status 503)
[23:03:30] + Could not connect to Work Server (results)
[23:03:30] (171.67.108.25:80)
[23:03:30] Could not transmit unit 01 to Collection server; keeping in queue.
[23:03:33] Will resume from checkpoint file
[23:03:33] Tpr hash work/wudata_03.tpr: 3617899651 3777321053 2975202315 2002439982 2774397151
[23:03:34] Working on Protein
I have 4 GPU units that I am not able to send back to the above server. The server status page had it in Reject for a while, so I don't know if there is a backlog and that is the reason I can't return units?
Teddy
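For anyone comparing logs, here is a rough sketch of how the failed uploads in a log like the one above could be tallied per server and port. It assumes the log keeps the same line shapes as in the excerpt (a `(Got status NNN)` line, then an `(ip:port)` line per failed attempt); the function name `summarize_failures` and the regular expressions are my own and not part of the F@H client.

```python
import re
from collections import Counter

# Patterns are assumptions based on the log excerpt in this thread.
ENDPOINT = re.compile(r"\((\d{1,3}(?:\.\d{1,3}){3}):(\d+)\)")
STATUS = re.compile(r"\(Got status (\d+)\)")

# A few lines copied from the log above, used as sample input.
SAMPLE = """\
[19:39:14] - Couldn't send HTTP request to server
[19:39:14] + Could not connect to Work Server (results)
[19:39:14] (171.67.108.11:8080)
[19:39:14] + Retrying using alternative port
[19:39:15] - Couldn't send HTTP request to server
[19:39:15] + Could not connect to Work Server (results)
[19:39:15] (171.67.108.11:80)
[19:39:18] - Couldn't send HTTP request to server
[19:39:18] (Got status 503)
[19:39:18] + Could not connect to Work Server (results)
[19:39:18] (171.67.108.25:8080)
"""

def summarize_failures(log_text):
    """Count failed attempts keyed by (ip, port, http_status or None)."""
    counts = Counter()
    status = None
    for line in log_text.splitlines():
        if "Couldn't send HTTP request to server" in line:
            status = None  # a new failed attempt starts here
            continue
        match = STATUS.search(line)
        if match:
            status = int(match.group(1))  # server answered, but with an error
            continue
        match = ENDPOINT.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)), status)] += 1
            status = None
    return counts

for (ip, port, status), n in sorted(summarize_failures(SAMPLE).items(), key=str):
    print(f"{ip}:{port} status={status} failures={n}")
```

On the sample lines, 171.67.108.11 fails with no HTTP status at all, while 171.67.108.25 answers with a 503. That may mean the first server is not accepting connections and the second is reachable but refusing the upload, though that is only a reading of the log, not a confirmed diagnosis.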
Should be able to see this by checking the netload.
Looked at server status and 171.67.108.11 seems to be up.
Tried to go to the server via a web browser at 171.67.108.11:80 and 171.67.108.11:8080 and got "page cannot be displayed".
Launch directory: C:\GPU0
Executable: C:\GPU0\FAH6_0.exe
Arguments: -local -gpu 0 -forcegpu nvidia_g80 -verbosity 9
[22:35:07] - Ask before connecting: No
[22:35:07] - User name: Dale_Rose (Team 36362)
[22:35:07] - User ID: 2474FE11069308B3
[22:35:07] - Machine ID: 2
[22:35:07]
[22:35:08] Loaded queue successfully.
[22:35:08] - Preparing to get new work unit...
[22:35:08] - Autosending finished units... [August 18 22:35:08 UTC]
[22:35:08] + Attempting to get work packet
[22:35:08] Trying to send all finished work units
[22:35:08] - Will indicate memory of 4095 MB
[22:35:08] + No unsent completed units remaining.
[22:35:08] - Detect CPU.[22:35:08] - Autosend completed
Vendor: GenuineIntel, Family: 6, Model: 7, Stepping: 6
[22:35:08] - Connecting to assignment server
[22:35:08] Connecting to http://assign-GPU.stanford.edu:8080/
[22:35:09] - Couldn't send HTTP request to server
[22:35:09] + Could not connect to Assignment Server
[22:35:09] Connecting to http://assign-GPU.stanford.edu:80/
[22:35:09] Posted data.
[22:35:09] Initial: 43AB; - Successful: assigned to (171.67.108.11).
[22:35:09] + News From Folding@Home: Welcome to Folding@Home
[22:35:09] Loaded queue successfully.
[22:35:09] Connecting to http://171.67.108.11:80/
[22:35:11] - Couldn't send HTTP request to server
[22:35:11] + Could not connect to Work Server
[22:35:11] - Attempt #1 to get work failed, and no other work to do.
Waiting before retry.
[22:35:19] + Attempting to get work packet
[22:35:19] - Will indicate memory of 4095 MB
[22:35:19] - Connecting to assignment server
[22:35:19] Connecting to http://assign-GPU.stanford.edu:8080/
[22:35:20] - Couldn't send HTTP request to server
[22:35:20] + Could not connect to Assignment Server
[22:35:20] Connecting to http://assign-GPU.stanford.edu:80/
[22:35:20] Posted data.
[22:35:20] Initial: 43AB; - Successful: assigned to (171.67.108.11).
[22:35:20] + News From Folding@Home: Welcome to Folding@Home
[22:35:20] Loaded queue successfully.
[22:35:20] Connecting to http://171.67.108.11:80/
[22:35:22] - Couldn't send HTTP request to server
[22:35:22] + Could not connect to Work Server
[22:35:22] - Attempt #2 to get work failed, and no other work to do.
Waiting before retry.
Now the GPUs are on 10 to 16 retries.
Help would be appreciated.
That is weird, I can connect using 8080 but not 80 here!
The server log shows no problem (CPU load is 6.5, but there are higher ones reported), netload isn't too high, and there are enough WUs.
Edit: well, I confirmed it 3 times, waiting a minute or so in between. Port 80 is down, 8080 is working.
Woke up this morning and the entire herd had work . . . 9 hours later, 1/3 of the herd is down again, unable to send and not receiving new work.
Is there any way to remove the 2 servers from the line-up so that I don't have idle cows? I paid an electric bill of over $500 last month, and I have a hard time justifying to wifey giving electricity away for idle cows.
Worst-case scenario is that I shut my F@H herd down and support a different project.
If you can't get stability, I can't and won't support your project. This isn't a new problem; it's been an off-and-on problem since June.
What is that "DL" column on the server status page? I've seen that when it is low and shown with a yellow background, WUs do not upload. Is it possible that this column is the problem?
Can the people who are not able to get work also provide the results from the browser tests (both ports)? I'm not saying it's a big influence, but it was strange being able to connect on 8080 but not on 80 last time. Even though my few clients don't have issues getting work, it might still be relevant to the issue at hand.
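If a browser test is awkward to repeat, a small script can do roughly the same check on both ports. This is only a sketch: `probe` and `probe_ports` are names I made up, the 5-second timeout is arbitrary, and it only tests whether a TCP connection can be opened, which is not quite what the browser or the client does.

```python
import socket

def probe(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False

def probe_ports(host, ports=(8080, 80), timeout=5.0):
    """Probe each port in turn and return {port: reachable}."""
    return {port: probe(host, port, timeout) for port in ports}
```

Calling `probe_ports("171.67.108.11")` returns a dictionary such as `{8080: True, 80: False}` if only port 8080 accepts connections. Keep in mind that an open port does not prove the server will accept work: a server that answers with status 503 would still show as reachable here.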