Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26
Posted: Mon Feb 15, 2010 5:32 pm
by seanego
Tobit wrote:
Nathan_P wrote: Yes, I'd like to know as well. Are we going to have to refold all those WUs, or is there a way to force the upload? I have about a dozen that the server says it has already received.
Unfortunately, there is nothing left to force. When the client receives the message that the server has already received the work unit, the slot in queue.dat the work was assigned to is "emptied". Some of us still have some wuresults.dat files; however, this problem went on for so long that many of mine were overwritten several times with newer work. The clients have only so many slots, and once a slot is cleared, there is no way to send any lingering work files back to Stanford.
And what about the "Server does not have record of this unit" problem? Do these WUs have any chance of being uploaded?
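Tobit's description above can be illustrated with a short sketch (a hypothetical helper, not part of the actual client): the v6 client keeps one wuresults_XX.dat file per queue slot, and once a slot is "emptied" and reused, the old file is overwritten, so only the files still on disk are recoverable.

```python
import re

def leftover_result_slots(filenames):
    """Return the queue-slot numbers for any wuresults_XX.dat files present."""
    pattern = re.compile(r"wuresults_(\d{2})\.dat$")
    slots = []
    for name in filenames:
        m = pattern.search(name)
        if m:
            slots.append(int(m.group(1)))
    return sorted(slots)

# Example: two result files survive alongside other work files.
print(leftover_result_slots(["wudata_04.tpr", "wuresults_01.dat", "wuresults_03.dat"]))
# [1, 3]
```

Any slot number not in that list has already been reused, which is why there is nothing left to force-upload for it.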
Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26
Posted: Mon Feb 15, 2010 5:36 pm
by bollix47
VijayPande wrote:Note that, as far as we can tell so far, this is only an issue for people with multiple GPUs in the same box. If you're seeing it in some other case, please let us know.
Not true. I have 3 computers with only 1 GPU each, and they all have result files that did not upload due to "Server has already received unit".
Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26
Posted: Mon Feb 15, 2010 5:36 pm
by PantherX
After around 50 attempts and a couple of restarts, I finally got a WU (P10105) from 171.64.65.71, so I hope this is the last bug in the system.
Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26
Posted: Mon Feb 15, 2010 5:40 pm
by Nathan_P
VijayPande wrote:
Nathan_P wrote:
chriskwarren wrote: Thanks Dr. Pande. Can you confirm that the "Server has already received unit" problem means that our WUs were accepted by the server and not wasted? From our end it looks like the server rejects our work, and our WU gets wasted.
Yes, I'd like to know as well. Are we going to have to refold all those WUs, or is there a way to force the upload? I have about a dozen that the server says it has already received.
It depends on the nature of the WS bug that's causing this, but I'm worried that these won't go back. I've escalated this bug to the highest level on our bug tracker and Joe's on it. I'll post more when we know more.
Note that, as far as we can tell so far, this is only an issue for people with multiple GPUs in the same box. If you're seeing it in some other case, please let us know.
It happened on my single-GPU box as well.
Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26
Posted: Mon Feb 15, 2010 5:41 pm
by DrSpalding
bollix47 wrote:
VijayPande wrote: Note that, as far as we can tell so far, this is only an issue for people with multiple GPUs in the same box. If you're seeing it in some other case, please let us know.
Not true. I have 3 computers with only 1 GPU each, and they all have result files that did not upload due to "Server has already received unit".
Ditto here. I have single GPUs in two machines, and both have these WUs in limbo. The result files are still there, and queue.dat still has a record of them, marked as "finished". I have saved the originals (logs, queue.dat, work/*), cleared out the directory, and restarted the GPU clients. They are both working now and hopefully will have no issues uploading later on.
I have 12 WUs in this state that I would like to see credited if possible. If you need any examples of what happened, I would be happy to send logs, queue.dat, work files, etc. for you to help diagnose and fix it.
Thanks,
Dan
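Dan's recovery procedure, save the originals before clearing and restarting, can be sketched as follows. This is a minimal illustration assuming a typical v6 client layout; the file names and paths are illustrative, not an official tool.

```python
import shutil
import time
from pathlib import Path

def backup_client_state(client_dir, backup_root):
    """Copy queue.dat, the log, and the work/ folder into a timestamped backup."""
    client_dir = Path(client_dir)
    dest = Path(backup_root) / time.strftime("fah-backup-%Y%m%d-%H%M%S")
    dest.mkdir(parents=True)
    # FAHlog.txt is the usual v6 log name; adjust for your install.
    for item in ("queue.dat", "FAHlog.txt", "work"):
        src = client_dir / item
        if src.is_dir():
            shutil.copytree(src, dest / item)
        elif src.is_file():
            shutil.copy2(src, dest / item)
    return dest
```

Keeping the originals this way means the stranded results can still be examined, or resubmitted, if a server-side fix arrives later.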
Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26
Posted: Mon Feb 15, 2010 5:44 pm
by ONE-OF-THREE
bollix47 wrote:
VijayPande wrote: Note that, as far as we can tell so far, this is only an issue for people with multiple GPUs in the same box. If you're seeing it in some other case, please let us know.
Not true. I have 3 computers with only 1 GPU each, and they all have result files that did not upload due to "Server has already received unit".
Similar situation for me as well: I only have one computer folding, with just one GPU (an Nvidia GTX 260), and it had the same "Server has already received unit" problem.
Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26
Posted: Mon Feb 15, 2010 5:52 pm
by tobor
VijayPande wrote:
Nathan_P wrote:
chriskwarren wrote: Thanks Dr. Pande. Can you confirm that the "Server has already received unit" problem means that our WUs were accepted by the server and not wasted? From our end it looks like the server rejects our work, and our WU gets wasted.
Note that, as far as we can tell so far, this is only an issue for people with multiple GPUs in the same box. If you're seeing it in some other case, please let us know.
Please say that is not the case... That's probably about 90% of the peeps on here...
Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26
Posted: Mon Feb 15, 2010 5:55 pm
by TheWolf
VijayPande wrote:
Nathan_P wrote:
chriskwarren wrote: Thanks Dr. Pande. Can you confirm that the "Server has already received unit" problem means that our WUs were accepted by the server and not wasted? From our end it looks like the server rejects our work, and our WU gets wasted.
Yes, I'd like to know as well. Are we going to have to refold all those WUs, or is there a way to force the upload? I have about a dozen that the server says it has already received.
It depends on the nature of the WS bug that's causing this, but I'm worried that these won't go back. I've escalated this bug to the highest level on our bug tracker and Joe's on it. I'll post more when we know more.
Note that, as far as we can tell so far, this is only an issue for people with multiple GPUs in the same box. If you're seeing it in some other case, please let us know.
I was seeing this on single-GPU rigs as well as multi-GPU rigs, so it's not just multiple GPUs in the same box having these problems.
Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26
Posted: Mon Feb 15, 2010 6:00 pm
by ikerekes
tobor wrote:
VijayPande wrote: Note that, as far as we can tell so far, this is only an issue for people with multiple GPUs in the same box. If you're seeing it in some other case, please let us know.
I have 7 GPUs, none of them on the multi-GPU client: 3 Windows clients and 4 Linux Wine clients.
All of them had the same problems (and are still having them):
Code:
# Windows GPU Console Edition #################################################
###############################################################################
Folding@Home Client Version 6.23
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: C:\Documents and Settings\Ivan\Application Data\Folding@home-gpu
[16:11:41] - Ask before connecting: No
[16:11:41] - User name: ikerekes (Team 50619)
[16:11:41] - User ID: 3AC8B048259843DC
[16:11:41] - Machine ID: 2
[16:11:41]
[16:11:42] Loaded queue successfully.
[16:11:42] Initialization complete
[16:11:42] - Preparing to get new work unit...
[16:11:42] + Attempting to get work packet
[16:11:42] Project: 3470 (Run 10, Clone 62, Gen 0)
[16:11:42] - Read packet limit of 540015616... Set to 524286976.
[16:11:42] + Attempting to send results [February 15 16:11:42 UTC]
[16:11:42] - Connecting to assignment server
[16:11:42] - Successful: assigned to (171.64.65.71).
[16:11:42] + News From Folding@Home: Welcome to Folding@Home
[16:11:42] Loaded queue successfully.
[16:11:43] - Couldn't send HTTP request to server
[16:11:43] + Could not connect to Work Server (results)
[16:11:43] (171.67.108.21:8080)
[16:11:43] + Retrying using alternative port
[16:11:43] + Closed connections
[16:11:43]
[16:11:43] + Processing work unit
[16:11:43] Core required: FahCore_11.exe
[16:11:43] Core found.
[16:11:43] Working on queue slot 04 [February 15 16:11:43 UTC]
[16:11:43] + Working ...
[16:11:43]
[16:11:43] *------------------------------*
[16:11:43] Folding@Home GPU Core
[16:11:43] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[16:11:43]
[16:11:43] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[16:11:43] Build host: amoeba
[16:11:43] Board Type: Nvidia
[16:11:43] Core :
[16:11:43] Preparing to commence simulation
[16:11:43] - Looking at optimizations...
[16:11:43] DeleteFrameFiles: successfully deleted file=work/wudata_04.ckp
[16:11:43] - Created dyn
[16:11:43] - Files status OK
[16:11:44] - Expanded 88632 -> 447307 (decompressed 504.6 percent)
[16:11:44] Called DecompressByteArray: compressed_data_size=88632 data_size=447307, decompressed_data_size=447307 diff=0
[16:11:44] - Digital signature verified
[16:11:44]
[16:11:44] Project: 10105 (Run 409, Clone 2, Gen 3)
[16:11:44]
[16:11:44] Assembly optimizations on if available.
[16:11:44] Entering M.D.
[16:11:50] Tpr hash work/wudata_04.tpr: 1527447982 4044551611 2386089724 1503699569 3186043621
[16:11:50]
[16:11:50] Calling fah_main args: 14 usage=100
[16:11:50]
[16:11:50] Working on p10105_lambda_370K
[16:11:51] Client config found, loading data.
[16:11:52] Starting GUI Server
[16:12:02] - Couldn't send HTTP request to server
[16:12:02] + Could not connect to Work Server (results)
[16:12:02] (171.67.108.21:80)
[16:12:02] - Error: Could not transmit unit 01 (completed February 13) to work server.
[16:12:02] - Read packet limit of 540015616... Set to 524286976.
[16:12:02] + Attempting to send results [February 15 16:12:02 UTC]
[16:13:41] Completed 1%
[16:15:31] Completed 2%
[16:17:20] Completed 3%
[16:19:10] Completed 4%
[16:20:59] Completed 5%
[16:22:49] Completed 6%
[16:24:38] Completed 7%
[16:26:28] Completed 8%
[16:28:17] Completed 9%
[16:30:07] Completed 10%
[16:31:56] Completed 11%
[16:33:28] + Could not connect to Work Server (results)
[16:33:28] (171.67.108.26:8080)
[16:33:28] + Retrying using alternative port
[16:33:28] - Couldn't send HTTP request to server
[16:33:28] (Got status 503)
[16:33:28] + Could not connect to Work Server (results)
[16:33:28] (171.67.108.26:80)
[16:33:28] Could not transmit unit 01 to Collection server; keeping in queue.
[16:33:28] Project: 10102 (Run 363, Clone 0, Gen 9)
[16:33:28] - Read packet limit of 540015616... Set to 524286976.
[16:33:28] + Attempting to send results [February 15 16:33:28 UTC]
[16:33:31] - Couldn't send HTTP request to server
[16:33:31] + Could not connect to Work Server (results)
[16:33:31] (171.64.65.71:8080)
[16:33:31] + Retrying using alternative port
[16:33:34] - Couldn't send HTTP request to server
[16:33:34] + Could not connect to Work Server (results)
[16:33:34] (171.64.65.71:80)
[16:33:34] - Error: Could not transmit unit 02 (completed February 15) to work server.
[16:33:34] - Read packet limit of 540015616... Set to 524286976.
[16:33:34] + Attempting to send results [February 15 16:33:34 UTC]
[16:33:45] Completed 12%
[16:34:05] - Server does not have record of this unit. Will try again later.
[16:34:05] Could not transmit unit 02 to Collection server; keeping in queue.
[16:34:05] Project: 10105 (Run 109, Clone 6, Gen 2)
[16:34:05] - Read packet limit of 540015616... Set to 524286976.
[16:34:05] + Attempting to send results [February 15 16:34:05 UTC]
[16:34:07] - Couldn't send HTTP request to server
[16:34:07] + Could not connect to Work Server (results)
[16:34:07] (171.64.65.71:8080)
[16:34:07] + Retrying using alternative port
[16:34:10] - Couldn't send HTTP request to server
[16:34:10] + Could not connect to Work Server (results)
[16:34:10] (171.64.65.71:80)
[16:34:10] - Error: Could not transmit unit 03 (completed February 13) to work server.
[16:34:10] - Read packet limit of 540015616... Set to 524286976.
[16:34:10] + Attempting to send results [February 15 16:34:10 UTC]
[16:34:12] - Server does not have record of this unit. Will try again later.
[16:34:12] Could not transmit unit 03 to Collection server; keeping in queue.
[16:35:35] Completed 13%
[16:37:24] Completed 14%
[16:39:14] Completed 15%
[16:41:03] Completed 16%
[16:42:52] Completed 17%
[16:44:42] Completed 18%
[16:46:31] Completed 19%
[16:48:21] Completed 20%
[16:50:10] Completed 21%
[16:51:59] Completed 22%
[16:53:49] Completed 23%
[16:55:37] Completed 24%
[16:57:25] Completed 25%
[16:59:14] Completed 26%
[17:01:02] Completed 27%
[17:02:51] Completed 28%
[17:04:39] Completed 29%
[17:06:27] Completed 30%
[17:08:16] Completed 31%
[17:10:04] Completed 32%
[17:11:53] Completed 33%
[17:13:41] Completed 34%
[17:15:30] Completed 35%
[17:17:18] Completed 36%
[17:19:06] Completed 37%
[17:20:55] Completed 38%
[17:22:43] Completed 39%
[17:24:32] Completed 40%
[17:26:20] Completed 41%
[17:28:08] Completed 42%
[17:29:57] Completed 43%
[17:31:45] Completed 44%
[17:33:34] Completed 45%
[17:35:22] Completed 46%
[17:37:11] Completed 47%
[17:38:59] Completed 48%
[17:40:47] Completed 49%
[17:42:36] Completed 50%
[17:44:24] Completed 51%
[17:46:13] Completed 52%
[17:48:03] Completed 53%
[17:49:54] Completed 54%
[17:51:44] Completed 55%
[17:53:34] Completed 56%
[17:55:25] Completed 57%
[17:57:15] Completed 58%
[17:59:05] Completed 59%
[18:00:55] Completed 60%
[18:02:46] Completed 61%
[18:04:38] Completed 62%
[18:06:29] Completed 63%
[18:08:20] Completed 64%
[18:10:11] Completed 65%
[18:12:02] Completed 66%
[18:13:54] Completed 67%
[18:15:45] Completed 68%
[18:17:36] Completed 69%
[18:19:27] Completed 70%
[18:21:18] Completed 71%
[18:23:10] Completed 72%
[18:25:01] Completed 73%
[18:26:52] Completed 74%
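The log above shows the order the v6 client tries its upload targets: the work server on port 8080, then the fallback port 80, then the same two ports on the collection server, and finally "keeping in queue" for a later retry. A minimal sketch of that ordering (the send function and return values are illustrative, not the client's actual networking code):

```python
def try_upload(send, work_server, collection_server, ports=(8080, 80)):
    """Attempt each server/port pair in order; return the first that works."""
    for host in (work_server, collection_server):
        for port in ports:
            if send(host, port):
                return (host, port)
    return None  # all attempts failed; the unit stays in the queue

# Example with a stub transport that only accepts the collection server on 80.
ok = lambda host, port: (host, port) == ("171.67.108.26", 80)
print(try_upload(ok, "171.64.65.71", "171.67.108.26"))
# ('171.67.108.26', 80)
```

This is why the log interleaves "Retrying using alternative port" and "Could not transmit unit ... to Collection server; keeping in queue" messages: each failed pair falls through to the next, and only after all four does the unit go back to waiting.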
Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26
Posted: Mon Feb 15, 2010 6:07 pm
by lambdapro
Ditto. I have a single GTX260 with the same problem.
David
Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26
Posted: Mon Feb 15, 2010 6:32 pm
by PantherX
Just checked my log and found a couple of "Server has already received unit" messages, and I have a single 9600 GT. Hope it can be fixed soon.
Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26
Posted: Mon Feb 15, 2010 6:42 pm
by SnW
Thanks for looking into this.
A man can whine, but must be thankful as well.
Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26
Posted: Mon Feb 15, 2010 6:45 pm
by Flathead74
VijayPande wrote:Note that, as far as we can tell so far, this is only an issue for people with multiple GPUs in the same box. If you're seeing it in some other case, please let us know.
I have ten (10) WUs on a single GPU system that fall into this category.
I have the Fahlogs, Work folder and queue.dat file.
Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26
Posted: Mon Feb 15, 2010 7:17 pm
by CBT
Works for me now.
At 15:33 UTC it picked up a new WU.
Corné
Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26
Posted: Mon Feb 15, 2010 7:18 pm
by goben_2003
VijayPande wrote:
Note that, as far as we can tell so far, this is only an issue for people with multiple GPUs in the same box. If you're seeing it in some other case, please let us know.
I also have four results that hit "Server has already received unit". This is on my single-GPU 9800 GT.
Note: I can see on kakao stats that I received points for earlier ones, and it was just the last 4 before I couldn't get any more units, so I'm guessing the earlier ones made it. We'll see if they send when the new unit I just got is sent in.
Edit: The points didn't show up, so I'm going to guess those four didn't make it. I'm not worried about the points, just the science. I let the client process new units, since someone else would have to do the unit anyway, and it overwrote the data.