
Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Fri Feb 19, 2010 8:21 pm
by Tobit
Project: 5781 (Run 10, Clone 80, Gen 4) - Both the WS and the CS have no record of the unit.


Slot 01  Done
Project: 5781 (Run 10, Clone 80, Gen 4), Core: 11
Work server: 171.67.108.21:8080
Collection server: 171.67.108.26
Download date: February 19 08:07:36
Finished date: February 19 12:28:11
Failed uploads: 11


Launch directory: C:\fah\gpu1
Executable: [email protected]
Arguments: -send all -verbosity 9 

[20:13:56] - Ask before connecting: No
[20:13:56] - User name: Tobit (Team 33)
[20:13:56] - User ID: 1FA10FEE5F260BA4
[20:13:56] - Machine ID: 3
[20:13:56] 
[20:13:56] Loaded queue successfully.
[20:13:56] Attempting to return result(s) to server...
[20:13:56] Trying to send all finished work units
[20:13:56] Project: 5781 (Run 10, Clone 80, Gen 4)
[20:13:56] - Read packet limit of 540015616... Set to 524286976.

[20:13:56] + Attempting to send results [February 19 20:13:56 UTC]
[20:13:56] - Reading file work/wuresults_01.dat from core
[20:13:56]   (Read 168458 bytes from disk)
[20:13:56] Connecting to http://171.67.108.21:8080/
[20:13:58] Posted data.
[20:13:58] Initial: 0000; - Uploaded at ~82 kB/s
[20:13:58] - Averaged speed for that direction ~90 kB/s
[20:13:58] - Server does not have record of this unit. Will try again later.
[20:13:58] - Error: Could not transmit unit 01 (completed February 19) to work server.
[20:13:58] - 11 failed uploads of this unit.
[20:13:58] - Read packet limit of 540015616... Set to 524286976.

[20:13:58] + Attempting to send results [February 19 20:13:58 UTC]
[20:13:58] - Reading file work/wuresults_01.dat from core
[20:13:58]   (Read 168458 bytes from disk)
[20:13:58] Connecting to http://171.67.108.26:8080/
[20:13:59] Posted data.
[20:13:59] Initial: 0000; - Uploaded at ~165 kB/s
[20:13:59] - Averaged speed for that direction ~105 kB/s
[20:13:59] - Server does not have record of this unit. Will try again later.
[20:13:59]   Could not transmit unit 01 to Collection server; keeping in queue.
[20:13:59] + Sent 0 of 1 completed units to the server
[20:13:59] - Failed to send all units to server
[20:13:59] ***** Got a SIGTERM signal (2)
[20:13:59] Killing all core threads

Folding@Home Client Shutdown.


Posted: Fri Feb 19, 2010 8:23 pm
by MichaelO
noorman wrote:.
- Server does not have record of this unit. Will try again later.
.

This has been reported before. I've passed it on to the Pande Group, because it sometimes happened in the past too, although I hadn't seen it in years.

It's a problem with the list of outgoing WUs: the Collection server doesn't fully know that list (or has it wrong), so it has no record of the unit and doesn't accept it.


.
Noorman,

I understand that. My issue is that this has been happening for a full week now. I am not the first to report this in this thread, which I think was only 4 or 5 pages long when I made my first post.
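noorman's explanation above boils down to a membership check: a server accepts a returned result only if the unit is on its list of outgoing WUs, so an incomplete list makes a perfectly good result look unknown. The toy model below is a hypothetical sketch for illustration only, not the actual Folding@home server code; `UnitId`, `ToyServer` and `try_upload` are invented names. The WS-then-CS order follows Tobit's log, where the client connects to 171.67.108.21:8080 and then to 171.67.108.26:8080.

```python
from typing import NamedTuple, Set


class UnitId(NamedTuple):
    """Identifies a work unit the way the client log does (hypothetical type)."""
    project: int
    run: int
    clone: int
    gen: int


class ToyServer:
    """Hypothetical server that only knows about units on its outgoing list."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.outgoing: Set[UnitId] = set()  # units this server believes it handed out

    def assign(self, unit: UnitId) -> None:
        self.outgoing.add(unit)

    def accept_result(self, unit: UnitId) -> bool:
        # If the outgoing list is incomplete, the unit is unknown and the
        # upload is rejected, even though the result itself is fine.
        if unit not in self.outgoing:
            print(f"{self.name}: Server does not have record of this unit.")
            return False
        self.outgoing.discard(unit)  # result accepted, unit is no longer pending
        return True


def try_upload(unit: UnitId, work_server: ToyServer, collection_server: ToyServer) -> bool:
    # Try the work server first; fall back to the collection server only if it rejects.
    return work_server.accept_result(unit) or collection_server.accept_result(unit)
```

With neither server holding a record, `try_upload(UnitId(5781, 10, 80, 4), ws, cs)` returns `False`, which matches the "Sent 0 of 1 completed units" outcome in Tobit's log; once the Collection server's list contains the unit, the same call returns `True`.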


Posted: Fri Feb 19, 2010 8:29 pm
by noorman
MichaelO wrote:
noorman wrote:.

MORE Official news: http://foldingforum.org/viewtopic.php?f=24&t=13474


.
I am not seeing any progress on this front. If they have rolled out the fix, it is not working yet. I just had two more clients hang, with the now infamous:

"Server has no record of this WU" message.

I have also seen these clients hang again on later attempts to resend the WU. The only way I have managed to restart them is to delete the queue and lose the work. This is becoming incredibly frustrating. I have only 6 GPU clients, but I am considering quitting GPU folding altogether if this situation does not improve shortly. It's a waste of my time and electricity to keep the cards running and to have to constantly babysit them.

If a situation like this happened in a corporate environment, someone would have already lost their job. The network instability and what appears to be a lack of any quality assurance on software changes are appalling, and the current situation is the worst I have seen in the 3 years I have been folding.

Apologies for the rant, but it's been sitting here simmering and it finally boiled over.
.


It's the first major problem in years, alright (apart from some power issues from time to time).

It's related to a bug in the server software; as I said before, bugs happen in any program longer than a page (or two).
The programmer in charge seems to have found the cause; now they need a solution, preferably a final fix instead of a plaster ...

In the past this would not have had the same repercussions, because the volume of work was much, much smaller and Results came in less frequently.
So this bug creates a major problem because of that difference.
I too hope for a quick fix now, and for some compensation for those who lost work or still have Results in their queue.


.


Posted: Fri Feb 19, 2010 8:43 pm
by PantherX
Hope they find and fix that bug soon. I'm not having any client hang-ups or unusual problems; my client keeps folding and only has problems when uploading. I think MichaelO might be having some other issue, since his clients hang.

PS-
Is there a thread where I can post about a similar WU upload problem with the CPU uniprocessor client (Project: 6318 (Run 2665, Clone 0, Gen 1))?

Thanks for your help


Posted: Fri Feb 19, 2010 8:49 pm
by MichaelO
Panther-X wrote:Hope they find and fix that bug soon. I'm not having any client hang-ups or unusual problems; my client keeps folding and only has problems when uploading. I think MichaelO might be having some other issue, since his clients hang.
I'm not sure what my issues would be, given that these same 6 clients were working fine before last weekend, and my 7 SMP clients are all still working. My network is fine and is monitored all day long, since I work at home. There are also the previous 17 pages of people reporting similar problems, as well as other members of our team. This is NOT an isolated problem.


Posted: Fri Feb 19, 2010 8:59 pm
by noorman
Panther-X wrote:Hope they find and fix that bug soon. I'm not having any client hang-ups or unusual problems; my client keeps folding and only has problems when uploading. I think MichaelO might be having some other issue, since his clients hang.

PS-
Is there a thread where I can post about a similar WU upload problem with the CPU uniprocessor client (Project: 6318 (Run 2665, Clone 0, Gen 1))?

Thanks for your help
.

Check the client log to see which server it tried to upload to, then look for (or start) a thread about that specific server in "Issues with a specific server" (which is this sub-forum).

.
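To follow noorman's advice without reading through the whole log, a short script can list which server each upload attempt went to and whether it came back with "Server does not have record of this unit." This is a sketch that assumes the log line formats quoted in this thread; `upload_attempts` is a hypothetical helper name. Note that the "Connecting to http://..." lines appear in Tobit's log (run with `-verbosity 9`) but not in hondo's excerpt, so at lower verbosity the server addresses may be missing.

```python
import re

# Patterns assumed from the client log excerpts posted in this thread.
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
CONNECT_RE = re.compile(r"Connecting to http://([\d.]+:\d+)/")
NO_RECORD = "Server does not have record of this unit."


def upload_attempts(log_text):
    """Return one (project, server, no_record) tuple per connection in the log.

    project is (project, run, clone, gen) from the most recent "Project:" line,
    server is "ip:port", and no_record is True if that server replied that it
    has no record of the unit.
    """
    attempts = []
    project = None
    for line in log_text.splitlines():
        m = PROJECT_RE.search(line)
        if m:
            project = tuple(int(g) for g in m.groups())
            continue
        m = CONNECT_RE.search(line)
        if m:
            attempts.append([project, m.group(1), False])
        elif NO_RECORD in line and attempts:
            # Attribute the rejection to the most recent connection.
            attempts[-1][2] = True
    return [tuple(a) for a in attempts]
```

For example, calling `upload_attempts` on the text of Tobit's log (read from whatever log file your client writes in its launch directory; the file name varies by client) would report both 171.67.108.21:8080 and 171.67.108.26:8080 as having no record of Project 5781 (Run 10, Clone 80, Gen 4).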


Posted: Fri Feb 19, 2010 9:00 pm
by PantherX
Well, I guess I'm one of the lucky ones for now, but I really hope this gets sorted out soon.
I was thinking that Dr. Vijay Pande should post a short update on his blog (for easier RSS feed access), linking to the thread he started, as this clearly isn't a small problem.


Posted: Fri Feb 19, 2010 9:05 pm
by cdb
I'm starting to get a bit fed up with this. It's gone from a total meltdown last weekend, to getting better, to slowly going downhill again.
Some of my WUs (not all, just a few) complete but don't send; the client then keeps trying to send them and won't download new work until I quit and restart it. So if I leave my PC alone, it ends up sitting there doing nothing but using electricity trying to upload.
The fix for sending WUs back, if it was installed this morning, isn't working at my end.

It might be the first major problem in years, but every time I folded on my PS3 for a couple of weeks or more at a time, there seemed to be problems with the servers, and now that I'm trying to fold 24/7 the system goes into meltdown.

An update on the actual news page would be nice, rather than a thread you have to track down in the forum.


Posted: Fri Feb 19, 2010 9:07 pm
by noorman
Panther-X wrote:Well, I guess I'm one of the lucky ones for now, but I really hope this gets sorted out soon.
I was thinking that Dr. Vijay Pande should post a short update on his blog (for easier RSS feed access), linking to the thread he started, as this clearly isn't a small problem.
.


He recently posted again here: http://foldingforum.org/viewtopic.php?f=24&t=13474.

I guess he's very busy right now trying to fix the problems before writing reports on that work.
It clearly isn't a simple bug to find, and once it is found, a good solution is needed too, which then needs testing before it is rolled out to all affected servers (in this case, the WSs).

He hasn't even been on Twitter today, and he posts there more often than on his blog.


.


Posted: Fri Feb 19, 2010 9:11 pm
by PantherX
noorman, could you give me his Twitter URL so that I can follow F@H more closely?
Thanks for your help


Posted: Fri Feb 19, 2010 9:25 pm
by *hondo*
Hello there folks :) Could someone on this forum please let me know whether this WU has reached the collection server?
[18:28:01] Project: 10105 (Run 26, Clone 9, Gen 19)

If it has, I may have stumbled on a rough-and-ready solution.


[18:28:01] + Attempting to send results [February 19 18:28:01 UTC]
[18:28:05] - Server does not have record of this unit. Will try again later.
[18:28:05] - Error: Could not transmit unit 06 (completed February 19) to work server.
[18:28:05] Keeping unit 06 in queue.
[18:28:05] Project: 10105 (Run 26, Clone 9, Gen 19)


[18:28:05] + Attempting to send results [February 19 18:28:05 UTC]
[18:28:08] - Server does not have record of this unit. Will try again later.
[18:28:08] - Error: Could not transmit unit 06 (completed February 19) to work server.



[18:28:08] + Attempting to send results [February 19 18:28:08 UTC]
[18:30:40] - Server does not have record of this unit. Will try again later.
[18:30:40] Could not transmit unit 06 to Collection server; keeping in queue.
[18:30:40] - Preparing to get new work unit...


Posted: Fri Feb 19, 2010 9:26 pm
by noorman
Panther-X wrote:noorman, could you give me his Twitter URL so that I can follow F@H more closely?
Thanks for your help
.

This is the one I found: http://twitter.com/vijaypande.

.


Posted: Fri Feb 19, 2010 9:29 pm
by derrickmcc
OK, I PM-ed Vijay and he says:
There was no code change but rather a settings change that makes WU accepts more flexible.
And I know one swallow doesn't make a summer, but when my last WU on GPU3 completed, the client uploaded the results not just for that WU but also for the previous one in the queue:


[21:12:09] Folding@home Core Shutdown: FINISHED_UNIT
[21:12:13] CoreStatus = 64 (100)
[21:12:13] Sending work to server
[21:12:13] Project: 5771 (Run 0, Clone 266, Gen 1032)
[21:12:13] - Read packet limit of 540015616... Set to 524286976.

[21:12:13] + Attempting to send results [February 19 21:12:13 UTC]
[21:12:16] + Results successfully sent
[21:12:16] Thank you for your contribution to Folding@Home.
[21:12:16] + Number of Units Completed: 1015

[21:12:20] Project: 10104 (Run 88, Clone 5, Gen 31)
[21:12:20] - Read packet limit of 540015616... Set to 524286976.

[21:12:20] + Attempting to send results [February 19 21:12:20 UTC]
[21:12:24] + Results successfully sent
[21:12:24] Thank you for your contribution to Folding@Home.
[21:12:24] + Number of Units Completed: 1016
But now 2 swallows (GPU 4) ...


[21:18:04] Folding@home Core Shutdown: FINISHED_UNIT
[21:18:08] CoreStatus = 64 (100)
[21:18:08] Sending work to server
[21:18:08] Project: 5767 (Run 3, Clone 96, Gen 1001)
[21:18:08] - Read packet limit of 540015616... Set to 524286976.

[21:18:08] + Attempting to send results [February 19 21:18:08 UTC]
[21:18:12] + Results successfully sent
[21:18:12] Thank you for your contribution to Folding@Home.
[21:18:12] + Number of Units Completed: 991

[21:18:16] Project: 10104 (Run 48, Clone 6, Gen 34)
[21:18:16] - Read packet limit of 540015616... Set to 524286976.

[21:18:16] + Attempting to send results [February 19 21:18:16 UTC]
[21:18:23] + Results successfully sent
[21:18:23] Thank you for your contribution to Folding@Home.
[21:18:23] + Number of Units Completed: 992

[21:18:23] Project: 5783 (Run 6, Clone 69, Gen 49)
[21:18:23] - Read packet limit of 540015616... Set to 524286976.

[21:18:23] + Attempting to send results [February 19 21:18:23 UTC]
[21:18:32] + Results successfully sent
[21:18:32] Thank you for your contribution to Folding@Home.
[21:18:32] + Number of Units Completed: 993
Things might be looking up. :D


Posted: Fri Feb 19, 2010 9:37 pm
by VijayPande
We're constantly trying new fixes for this. My hope is that we'll get this fixed this weekend. All the donors are understandably upset by this and Joe and I have been working long hours to fix it, so all of us would like to get this behind us!


Posted: Fri Feb 19, 2010 9:47 pm
by Teddy
Well, when it's fixed you can have my time and resources again; until then, you can only use my resources in a limited fashion.

Edit: Oh, and if it is any consolation, the GPU machines I have left running seem to be behaving themselves OK.

Regards Teddy