
171.64.65.60

Posted: Wed Feb 03, 2010 8:30 pm
by TSG
It's so disappointing: all 8 of my clients have been unable to upload their WU results for 13 days or more. WTH?

Until they're successfully uploaded, I refuse to continue folding on my machines.

#1

Code: Select all


--- Opening Log file [February 3 19:26:15 UTC] 


# Windows CPU Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.23

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Folding\Client1
Executable: C:\Folding\Client1\[email protected]
Arguments: -local -verbosity 9 -send all 

[19:26:15] - Ask before connecting: Yes
[19:26:15] - User name: Triyana_Suryakusumah_G (Team 1630)
[19:26:15] - User ID not found locally
[19:26:15] + Requesting User ID from server
[19:26:15] - Getting ID from AS: 
[19:26:15] - Presenting message box asking to network.
[19:26:16] Connecting to http://assign.stanford.edu:8080/
[19:26:18] Posted data.
[19:26:18] Initial: 9625; - Received User ID = 2596EEC862609660
[19:26:18] - Machine ID: 1
[19:26:18] 
[19:26:19] Loaded queue successfully.
[19:26:19] Attempting to return result(s) to server...
[19:26:19] Trying to send all finished work units
[19:26:19] Project: 6318 (Run 2259, Clone 39, Gen 0)


[19:26:19] + Attempting to send results [February 3 19:26:19 UTC]
[19:26:19] - Reading file work/wuresults_03.dat from core
[19:26:19]   (Read 6913276 bytes from disk)
[19:26:19] Connecting to http://171.64.65.60:8080/
[19:30:52] - Couldn't send HTTP request to server
[19:30:52] + Could not connect to Work Server (results)
[19:30:52]     (171.64.65.60:8080)
[19:30:52] + Retrying using alternative port
[19:30:52] Connecting to http://171.64.65.60:80/
[19:30:54] - Couldn't send HTTP request to server
[19:30:54] + Could not connect to Work Server (results)
[19:30:54]     (171.64.65.60:80)
[19:30:54] - Error: Could not transmit unit 03 (completed January 23) to work server.
[19:30:54] - 3 failed uploads of this unit.


[19:30:54] + Attempting to send results [February 3 19:30:54 UTC]
[19:30:54] - Reading file work/wuresults_03.dat from core
[19:30:54]   (Read 6913276 bytes from disk)
[19:30:54] Connecting to http://171.67.108.26:8080/
[19:33:02] ***** Got a SIGTERM signal (2)
[19:33:02] Killing all core threads

Folding@Home Client Shutdown.

Code: Select all

Current Work Unit
-----------------
Name: Great Red Oystrich Makes All Chemists Sane in water
Tag: -
Download time: January 20 16:54:21
Due time: March 13 16:54:21
Progress: 100%  [||||||||||]
--------------

#8

Code: Select all

--- Opening Log file [February 3 19:33:15 UTC] 


# Windows CPU Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.23

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Folding\Client8
Executable: C:\Folding\Client8\[email protected]
Arguments: -local -verbosity 9 -send all 

[19:33:15] - Ask before connecting: Yes
[19:33:15] - User name: Triyana_Suryakusumah_G (Team 1630)
[19:33:15] - User ID: 2596EEC862609660
[19:33:15] - Machine ID: 8
[19:33:15] 
[19:33:15] Loaded queue successfully.
[19:33:15] Deleting incompletely fetched item (4) from queue position #5
[19:33:15] - Warning: Could not delete all work unit files (5): Core file absent
[19:33:15] Attempting to return result(s) to server...
[19:33:15] Trying to send all finished work units
[19:33:15] Project: 6318 (Run 3055, Clone 39, Gen 0)


[19:33:15] + Attempting to send results [February 3 19:33:15 UTC]
[19:33:15] - Reading file work/wuresults_04.dat from core
[19:33:16]   (Read 7032651 bytes from disk)
[19:33:16] - Presenting message box asking to network.
[19:33:17] Connecting to http://171.64.65.60:8080/
[19:37:48] - Couldn't send HTTP request to server
[19:37:48] + Could not connect to Work Server (results)
[19:37:48]     (171.64.65.60:8080)
[19:37:48] + Retrying using alternative port
[19:37:48] Connecting to http://171.64.65.60:80/
[19:37:49] - Couldn't send HTTP request to server
[19:37:49] + Could not connect to Work Server (results)
[19:37:49]     (171.64.65.60:80)
[19:37:49] - Error: Could not transmit unit 04 (completed January 22) to work server.
[19:37:49] - 11 failed uploads of this unit.


[19:37:50] + Attempting to send results [February 3 19:37:50 UTC]
[19:37:50] - Reading file work/wuresults_04.dat from core
[19:37:50]   (Read 7032651 bytes from disk)
[19:37:50] Connecting to http://171.67.108.26:8080/
[19:42:23] Posted data.
[19:42:23] Initial: 0000; - Uploaded at ~25 kB/s
[19:42:23] - Averaged speed for that direction ~11 kB/s
[19:42:23] - Server does not have record of this unit. Will try again later.
[19:42:23]   Could not transmit unit 04 to Collection server; keeping in queue.
[19:42:23] + Sent 0 of 1 completed units to the server
[19:42:23] - Failed to send all units to server
[19:42:23] ***** Got a SIGTERM signal (2)
[19:42:23] Killing all core threads

Folding@Home Client Shutdown.
>> [19:42:23] - Server does not have record of this unit. Will try again later.

WTH ?????

Code: Select all

Current Work Unit
-----------------
Name: Great Red Oystrich Makes All Chemists Sane in water
Tag: -
Download time: January 20 17:28:07
Due time: March 13 17:28:07
Progress: 100%  [||||||||||]

Re: 171.64.65.60

Posted: Wed Feb 03, 2010 9:54 pm
by bruce
Were the WUs downloaded on the same machine and with the same MachineID as they're being uploaded from?

Please search through FAHlog.txt or FAHlog-Prev.txt and find where the WU reached 100%. What happened during the first few upload attempts?
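
For anyone who would rather not scroll through a long log by hand, here is a minimal sketch of that search in Python. It assumes the log lines look like the excerpts posted in this thread (a "Completed N out of N steps  (100%)" line, followed later by the upload messages); the function names `upload_history` and `scan_logs` are made up for this example and are not part of the client.

```python
import re
from pathlib import Path

# Assumption: line formats are taken from the log excerpts in this thread.
DONE = re.compile(r"Completed \d+ out of \d+ steps\s+\(100%\)")
UPLOAD = re.compile(
    r"Attempting to send results"
    r"|Could not transmit"
    r"|Server does not have record"
    r"|Couldn't send HTTP request"
)

def upload_history(lines, max_events=10):
    """Group each 100% line with the upload-related lines that follow it."""
    groups = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if DONE.search(line):
            # A WU finished here: start a new group.
            groups.append([line])
        elif groups and UPLOAD.search(line) and len(groups[-1]) <= max_events:
            # Keep only the first few upload events after completion.
            groups[-1].append(line)
    return groups

def scan_logs(names=("FAHlog-Prev.txt", "FAHlog.txt")):
    """Print the completion line and early upload attempts from each log found."""
    for name in names:
        path = Path(name)
        if not path.exists():
            continue
        text = path.read_text(errors="replace")
        for group in upload_history(text.splitlines()):
            print(f"--- {name} ---")
            print("\n".join(group))
```

Calling `scan_logs()` from the client's folder should print each point where a WU reached 100%, followed by the first upload messages after it, which is the part worth posting here.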

Re: 171.64.65.60

Posted: Fri Feb 05, 2010 11:28 pm
by Teddy
Same problem here. I have 2 completed work units that need to go back to 65.60, and the CS does not want to know about them either. There has been no change to my configuration.

Code: Select all

[20:05:15] Completed 170000 out of 500000 steps  (34%)
[20:07:54] - Autosending finished units... [February 5 20:07:54 UTC]
[20:07:54] Trying to send all finished work units
[20:07:54] Project: 6318 (Run 3813, Clone 37, Gen 1)
[20:07:54] - Read packet limit of 540015616... Set to 524286976.


[20:07:54] + Attempting to send results [February 5 20:07:54 UTC]
[20:07:54] - Reading file work/wuresults_03.dat from core
[20:07:54]   (Read 6870570 bytes from disk)
[20:07:54] Connecting to http://171.64.65.60:8080/
[20:09:16] Timered checkpoint triggered.
[20:11:43] Posted data.
[20:11:43] Initial: 0000; - Uploaded at ~29 kB/s
[20:11:43] - Averaged speed for that direction ~25 kB/s
[20:11:43] - Server does not have record of this unit. Will try again later.
[20:11:43] - Error: Could not transmit unit 03 (completed February 4) to work server.
[20:11:43] - 7 failed uploads of this unit.
[20:11:43] - Read packet limit of 540015616... Set to 524286976.


[20:11:43] + Attempting to send results [February 5 20:11:43 UTC]
[20:11:43] - Reading file work/wuresults_03.dat from core
[20:11:43]   (Read 6870570 bytes from disk)
[20:11:43] Connecting to http://171.67.108.26:8080/
[20:13:16] Timered checkpoint triggered.
[20:15:48] Posted data.
[20:15:48] Initial: 0000; - Uploaded at ~27 kB/s
[20:15:48] - Averaged speed for that direction ~25 kB/s
[20:15:48] - Server does not have record of this unit. Will try again later.
[20:15:48]   Could not transmit unit 03 to Collection server; keeping in queue.
[20:15:48] Project: 6318 (Run 3187, Clone 40, Gen 1)
[20:15:48] - Read packet limit of 540015616... Set to 524286976.


[20:15:48] + Attempting to send results [February 5 20:15:48 UTC]
[20:15:48] - Reading file work/wuresults_04.dat from core
[20:15:48]   (Read 6875847 bytes from disk)
[20:15:48] Connecting to http://171.64.65.60:8080/
[20:16:07] Writing local files
[20:16:07] Completed 175000 out of 500000 steps  (35%)
[20:20:03] Posted data.
[20:20:03] Initial: 0000; - Uploaded at ~26 kB/s
[20:20:03] - Averaged speed for that direction ~25 kB/s
[20:20:03] - Server does not have record of this unit. Will try again later.
[20:20:03] - Error: Could not transmit unit 04 (completed February 5) to work server.
[20:20:03] - 2 failed uploads of this unit.
[20:20:03] - Read packet limit of 540015616... Set to 524286976.


[20:20:03] + Attempting to send results [February 5 20:20:03 UTC]
[20:20:03] - Reading file work/wuresults_04.dat from core
[20:20:03]   (Read 6875847 bytes from disk)
[20:20:03] Connecting to http://171.67.108.26:8080/
[20:20:07] Timered checkpoint triggered.
[20:24:07] Timered checkpoint triggered.
[20:24:13] Posted data.
[20:24:13] Initial: 0000; - Uploaded at ~26 kB/s
[20:24:13] - Averaged speed for that direction ~26 kB/s
[20:24:13] - Server does not have record of this unit. Will try again later.
[20:24:13]   Could not transmit unit 04 to Collection server; keeping in queue.
[20:24:13] + Sent 0 of 2 completed units to the server
[20:24:13] - Autosend completed
That server always looks heavily overloaded!
classic vspg10a - accept Accepting 10.33 415 2

Teddy

Re: 171.64.65.60

Posted: Sat Feb 06, 2010 12:26 am
by Teddy
Well, somebody must be listening, coz those old units have now been returned. Thank you for your help!

Cheers Teddy

Re: 171.64.65.60

Posted: Sat Feb 06, 2010 2:21 am
by bruce
The server has been on-line accepting uploads for a number of short periods today. I'm glad it has successfully uploaded your work.

Re: 171.64.65.60

Posted: Sat Feb 06, 2010 4:31 pm
by AgrFan
See Vijay's update --> viewtopic.php?f=19&t=13010&start=60#p129884

Re: 171.64.65.60

Posted: Mon Feb 08, 2010 12:13 pm
by TSG
Okay bruce, I'm trying again to send the WU results. Thanks for the info anyway.
bruce wrote:Were the WUs downloaded on the same machine and with the same MachineID as they're being uploaded from?
They're not downloaded and uploaded from the same machine, but the MachineID for each client remains the same. I do the upload for each client manually: I have (paranoidly) isolated all my folding machines from the internet, so when needed I carry the finished WU(s) to another machine that has internet access and upload/download the WU(s) from there. If this is a problem, let me know.

Re: 171.64.65.60

Posted: Mon Feb 08, 2010 12:26 pm
by TSG
One client has successfully sent its result, so the other clients should have no problem sending theirs.

I think the problem with this server has been solved. Thanks to you guys up there!

Re: 171.64.65.60

Posted: Mon Feb 08, 2010 3:33 pm
by VijayPande
Vince and Joe have been working hard, both putting in lots of hours on the weekends, to push this through. There are problems that only arose in the v5 server code under heavy load, but now that we've put it through its paces, I'm hoping that the bulk of the shakeout is done.

There are many upsides to the new code. The biggest is the ability to handle a lot more clients (from the client perspective, this means a better ability to send back WUs), and from our perspective, it's more maintainable and makes it easier to implement new features (= new science).

Re: 171.64.65.60

Posted: Mon Feb 08, 2010 9:11 pm
by toTOW
Btw, I still have a WU that won't send to this server:

Code: Select all

[19:09:05] Project: 6318 (Run 537, Clone 3, Gen 1)
[19:09:05] - Read packet limit of 540015616... Set to 524286976.


[19:09:05] + Attempting to send results [February 8 19:09:05 UTC]
[19:09:05] - Reading file work/wuresults_06.dat from core
[19:09:05]   (Read 6935639 bytes from disk)
[19:09:05] Connecting to http://171.64.65.60:8080/
[19:10:15] - Couldn't send HTTP request to server
[19:10:15] + Could not connect to Work Server (results)
[19:10:15]     (171.64.65.60:8080)
[19:10:15] + Retrying using alternative port
[19:10:15] Connecting to http://171.64.65.60:80/
[19:10:29] - Couldn't send HTTP request to server
[19:10:29] + Could not connect to Work Server (results)
[19:10:29]     (171.64.65.60:80)
[19:10:29] - Error: Could not transmit unit 06 (completed February 5) to work server.
[19:10:29] - 15 failed uploads of this unit.
[19:10:29] - Read packet limit of 540015616... Set to 524286976.


[19:10:29] + Attempting to send results [February 8 19:10:29 UTC]
[19:10:29] - Reading file work/wuresults_06.dat from core
[19:10:29]   (Read 6935639 bytes from disk)
[19:10:29] Connecting to http://171.67.108.26:8080/
[19:51:36] - Couldn't send HTTP request to server
[19:51:36] + Could not connect to Work Server (results)
[19:51:36]     (171.67.108.26:8080)
[19:51:36] + Retrying using alternative port
[19:51:36] Connecting to http://171.67.108.26:80/
[19:51:37] - Couldn't send HTTP request to server
[19:51:37]   (Got status 503)
[19:51:37] + Could not connect to Work Server (results)
[19:51:37]     (171.67.108.26:80)
[19:51:37]   Could not transmit unit 06 to Collection server; keeping in queue.
[19:51:37] + Sent 0 of 1 completed units to the server
[19:51:37] - Autosend completed
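
A quick way to tell "the server is unreachable" apart from "the server answered but rejected the unit" is to try opening a plain TCP connection to the same addresses and ports the client uses. The sketch below is only an illustration: `can_connect` and `check_servers` are hypothetical helper names, and the addresses are simply the ones that appear in the logs in this thread. An open port only shows that something is listening; it does not mean the upload will be accepted.

```python
import socket

def can_connect(host, port, timeout=10.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False

def check_servers(targets, timeout=10.0):
    """Map 'host:port' -> True/False for each (host, port) pair given."""
    return {f"{host}:{port}": can_connect(host, port, timeout)
            for host, port in targets}

# Example call (addresses and ports copied from the logs above):
# check_servers([("171.64.65.60", 8080), ("171.64.65.60", 80),
#                ("171.67.108.26", 8080), ("171.67.108.26", 80)])
```

If every entry comes back False from a machine that can reach other sites, the problem is most likely on the server side rather than in the local configuration.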

Re: 171.64.65.60

Posted: Mon Feb 08, 2010 10:53 pm
by Pette Broad
At one stage I had 37 units waiting to upload. Over the following days units were sent fairly frequently, but more accumulated; eventually I reached a low of 8 units. Yesterday this went up to 11, and a few hours ago back up to 13. I'm confident that they'll all get uploaded at some stage, and I think the deadlines are still a while away. :)

Well, 24 hours later and the number of unsent units is creeping up again, now up to 16.

EDIT: 12 hours down the road and I'm up to 22 unsent units. Why, at least a week after this was first reported, is this server still issuing units that it can't accept back? Wouldn't it be better to fix the server before issuing new units? I'm trying to be patient, but I'm using up a lot of my limited bandwidth with all the "server does not have record" messages; some of the units are on their 40th attempt or more!!


Pete
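
To keep track of how many attempts each stuck unit has made without reading the whole log, the counts can be pulled out with a few lines of Python. This is a rough sketch that assumes the log uses the same two-line pattern seen in this thread ("Could not transmit unit NN (completed DATE)" followed by "N failed uploads of this unit"); `failed_uploads` is a made-up name for the example.

```python
import re

# Assumption: patterns copied from the log excerpts posted in this thread.
UNIT = re.compile(r"Could not transmit unit (\d+) \(completed ([^)]+)\)")
FAILS = re.compile(r"- (\d+) failed uploads of this unit")

def failed_uploads(lines):
    """Map unit number -> (completion date, latest failed-upload count)."""
    latest = {}
    current = None
    for line in lines:
        m = UNIT.search(line)
        if m:
            # Remember which unit the next "failed uploads" line refers to.
            current = (m.group(1), m.group(2))
            continue
        m = FAILS.search(line)
        if m and current:
            unit, completed = current
            latest[unit] = (completed, int(m.group(1)))
            current = None
    return latest
```

Passing it the lines of FAHlog.txt gives one entry per queued unit, with the most recent failure count the client reported for it.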

Re: 171.64.65.60

Posted: Fri Feb 12, 2010 7:05 pm
by HagaWaga
You are correct, it is very frustrating… I have 16 separate folders processing 16 separate WUs just because of the unreliability of the servers. Right now 8 or 9 are idle because they have results waiting to be sent (and they are trying hard), and I refuse to let them process more until the results HAVE been sent.
A long weekend is coming; the last 4 WUs will finish sometime on 13.01.2010, and the rest of the time will be LOST, at least for this project.
I activated a prime search so the CPU does not get bored…
Maybe I should download a new WU, see which server it comes from, then delete it and download again until it comes from a reliable one? Kind of dumb…

Shame on support. You should be glad people want to ‘waste’ their resources to help mankind, and you should do your part to accept finished work, or FIND a way to accept results by e-mail or similar so that results do not go to the recycle bin.

How do you guys get those boxes below post showing rank, points, etc.?

Re: 171.64.65.60

Posted: Fri Feb 12, 2010 11:16 pm
by k1wi
HagaWaga wrote:How do you guys get those boxes below post showing rank, points, etc.?
You get them from a third-party site and add them to your signature.

Re: 171.64.65.60

Posted: Sat Feb 13, 2010 9:46 am
by CBT
I would like to report that I seem to have the same issue here:

Code: Select all

[08:31:16] - Autosending finished units... [February 13 08:31:16 UTC]
[08:31:16] Trying to send all finished work units
[08:31:16] Project: 6318 (Run 3846, Clone 85, Gen 0)


[08:31:16] + Attempting to send results [February 13 08:31:16 UTC]
[08:31:16] - Reading file work/wuresults_05.dat from core
[08:31:16]   (Read 6976705 bytes from disk)
[08:31:16] Connecting to http://171.64.65.60:8080/
[08:31:48] - Couldn't send HTTP request to server
[08:31:48] + Could not connect to Work Server (results)
[08:31:48]     (171.64.65.60:8080)
[08:31:48] + Retrying using alternative port
[08:31:48] Connecting to http://171.64.65.60:80/
[08:31:49] - Couldn't send HTTP request to server
[08:31:49] + Could not connect to Work Server (results)
[08:31:49]     (171.64.65.60:80)
[08:31:49] - Error: Could not transmit unit 05 (completed February 2) to work server.
[08:31:49] - 319 failed uploads of this unit.


[08:31:49] + Attempting to send results [February 13 08:31:49 UTC]
[08:31:49] - Reading file work/wuresults_05.dat from core
[08:31:49]   (Read 6976705 bytes from disk)
[08:31:49] Connecting to http://171.67.108.26:8080/
[08:31:56] Timered checkpoint triggered.
[08:32:21] Posted data.
[08:32:21] Initial: 0000; - Uploaded at ~212 kB/s
[08:32:21] - Averaged speed for that direction ~113 kB/s
[08:32:21] - Server does not have record of this unit. Will try again later.
[08:32:21]   Could not transmit unit 05 to Collection server; keeping in queue.
[08:32:21] + Sent 0 of 1 completed units to the server
[08:32:21] - Autosend completed
The machine has been trying to send this WU for days now, 11 days to be precise. This computer finishes its WUs within a few days, so there should be enough time left to get this WU back before the deadline. However, there doesn't seem to be anything I can do to make it upload to its home server.

Can anyone help?

Corné

Re: 171.64.65.60

Posted: Sat Feb 13, 2010 5:21 pm
by CBT
Still not working. Can anyone help?

Corné