
Project 6318: Collection server misconfigured?

Posted: Fri Jan 22, 2010 4:43 pm
by markg735
Hello all,

I have been folding for many years on my idle servers. I recently got a project #6318 WU and after successful completion it seems the collection server is unwilling to accept the results:

Code:

[02:28:28] + Attempting to get work packet
[02:28:28] - Connecting to assignment server
[02:28:29] - Successful: assigned to (171.64.65.60).
[02:28:29] + News From Folding@Home: Welcome to Folding@Home
[02:28:29] Loaded queue successfully.
[02:28:41] + Closed connections
[02:28:41]
[02:28:41] + Processing work unit
[02:28:41] Core required: FahCore_78.exe
[02:28:41] Core found.
[02:28:41] Working on Unit 05 [January 21 02:28:41]
[02:28:41] + Working ...
...
[23:16:58] CoreStatus = 64 (100)
[23:16:58] Sending work to server
[23:16:58] - Error: Length of work/wuresults_05.dat (6873627) exceeds packet limit set (5241856)
[23:16:58] - Error: Could not transmit unit 05 (completed January 21) to work server.
[23:16:58]   Keeping unit 05 in queue.
[23:16:58] - Error: Length of work/wuresults_05.dat (6873627) exceeds packet limit set (5241856)
[23:16:58] - Error: Could not transmit unit 05 (completed January 21) to work server.
[23:16:58] - Error: Length of work/wuresults_05.dat (6873627) exceeds packet limit set (5241856)
[23:16:58]   Could not transmit unit 05 to Collection server; keeping in queue.
This machine has folded successfully with many cores for many years. The system is running a 64-bit kernel (Linux). Any clues how to fix this?
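
For anyone scanning their own FAHlog.txt for the same symptom, here is a minimal sketch that pulls the two numbers out of the error line and reports how far over the limit the result file is. The line format is taken from the log above; the function name is made up for illustration. Note that 5241856 is exactly 5 MiB minus 1 KiB, though whether the client derives its limit that way is an assumption.

```python
import re

# Pattern for the client's "exceeds packet limit" line (format copied from the log above).
LIMIT_RE = re.compile(
    r"Error: Length of (?P<path>\S+) \((?P<size>\d+)\) "
    r"exceeds packet limit set \((?P<limit>\d+)\)"
)

def parse_limit_error(line):
    """Return (path, size, limit, excess) for a matching log line, else None."""
    m = LIMIT_RE.search(line)
    if m is None:
        return None
    size, limit = int(m["size"]), int(m["limit"])
    return m["path"], size, limit, size - limit

line = ("[23:16:58] - Error: Length of work/wuresults_05.dat (6873627) "
        "exceeds packet limit set (5241856)")
path, size, limit, excess = parse_limit_error(line)
print(f"{path}: {size} bytes is {excess} bytes over the {limit}-byte limit")
```

In this case the result file is roughly 1.6 MB over the limit, so no amount of retrying will get it through with the current setting.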

Re: Project 6318: Collection server misconfigured?

Posted: Fri Jan 22, 2010 4:54 pm
by anandhanju
Hello markg735, welcome to the forum!

Which client version are you running? You may need to enable big WUs by reconfiguring the client, as the results for these WUs exceed the "normal" setting.

For this particular WU, running qfix should fix it so that you can upload the result.

Re: Project 6318: Collection server misconfigured?

Posted: Fri Jan 22, 2010 5:05 pm
by bruce
To avoid this problem in the future, one of three things needs to happen: (1) you run the client with -configonly and change the WU size setting from Small or Normal to Big (to accept larger WUs); (2) the owner of the project changes the project configuration so that it is no longer sent to clients configured the way yours currently is; or (3) the owner of the project reduces the amount of data the project produces so that the uploads are smaller.

(I've notified the project owner.)

You'll still need to run qfix to be able to upload this result.

Re: Project 6318: Collection server misconfigured?

Posted: Fri Jan 22, 2010 5:07 pm
by markg735
anandhanju wrote:Hello markg735, welcome to the forum!
Hey!
anandhanju wrote:Which client version are you running?

Code:

 Folding@Home Client Version 5.04beta
anandhanju wrote:You may need to enable big WUs by reconfiguring the client, as the results for these WUs exceed the "normal" setting.
Can you tell me what option and section this falls under in client.cfg? I don't see anything that looks like a limit on returned results size in there.

I start the client with -advmethods -- I was under the (probably wrong) impression that this option also enables big WUs.

Also, I take it qfix is not part of the client -- where can I get the qfix binary?

Re: Project 6318: Collection server misconfigured?

Posted: Fri Jan 22, 2010 5:26 pm
by anandhanju
markg735 wrote:Can you tell me what option and section this falls under in client.cfg? I don't see anything that looks like a limit on returned results size in there.
Run your client with the -configonly option to enter into the configuration mode. E.g., ./fah504 -configonly

You don't need to change any option other than the Big WU setting, so hit the return key to retain the existing values until you reach the question "Allow receipt of work assignments and return of work results greater than 5MB in size (such work units may have large memory demands) (no/yes) [no]?". Key in yes to this and proceed to the end of the configuration steps. More details can be found in this Wiki article: http://fahwiki.net/index.php/How_do_I_r ... 28v5.04.29

After making this change, run Qfix which can be found at http://linuxminded.xs4all.nl/?target=so ... s.plc#qfix (Linux/x86 : qfix (9.91 KB)) to fix the current queue entry.

After these two steps, when you restart the client like you normally do, it should attempt to send the result and _shouldn't_ show the same error message. Do post here if you encounter any issues.

-advmethods does not imply Big WUs. It means that you could receive projects that are in the final stages of testing and those which have a slightly higher chance of failing.

Re: Project 6318: Collection server misconfigured?

Posted: Fri Jan 22, 2010 7:33 pm
by VijayPande
Thanks for the heads up. We're looking into this.

Re: Project 6318: Collection server misconfigured?

Posted: Fri Jan 22, 2010 7:54 pm
by markg735
anandhanju wrote:Run your client with the -configonly option to enter into the configuration mode. E.g., ./fah504 -configonly
Did that. It added bigpackets=yes to the settings section in client.cfg. I downloaded qfix and ran it. See below for the results.
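
To double-check the setting without stepping through -configonly again, something like the following sketch can read it back. It assumes client.cfg is an INI-style file with a [settings] section and a bigpackets key, as described above; the function name and the sample text are illustrative and not copied from a real client.cfg.

```python
import configparser

def bigpackets_enabled(cfg_text):
    """Return True if the [settings] section contains bigpackets=yes."""
    # interpolation=None so that any '%' in other values is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    value = parser.get("settings", "bigpackets", fallback="no")
    return value.strip().lower() == "yes"

# Illustrative content only; a real client.cfg holds more keys than this.
sample = "[settings]\nbigpackets=yes\n"
print(bigpackets_enabled(sample))
```

To check a real file, pass in the result of open("client.cfg").read() from the folding directory.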

Incidentally, whoever runs the site that qfix is on has the MIME type misconfigured. It came back as text/plain.

I restarted my client and the same error appears. The limit remains at 5241856 bytes.

Code:

folding@floyd:~$ ./qfix
entry 7, status 0, address 171.67.108.13:8080
entry 8, status 0, address 171.67.108.13:8080
entry 9, status 0, address 171.67.108.13:8080
entry 0, status 0, address 171.67.108.13:8080
entry 1, status 0, address 171.67.108.13:8080
entry 2, status 0, address 171.67.108.13:8080
entry 3, status 0, address 171.67.108.13:8080
entry 4, status 0, address 171.67.108.13:8080
entry 5, status 2, address 171.64.65.60:8080
  Found results <work/wuresults_05.dat>: proj 31056, run 0, clone 53763, gen 19266
   -- queue entry: proj 6318, run 4154, clone 42, gen 0
   -- doesn't match queue entry
entry 6, status 1, address 171.64.65.60:8080
File is OK

Re: Project 6318: Collection server misconfigured?

Posted: Fri Jan 22, 2010 8:12 pm
by toTOW
This is definitely an invalid PRCG : proj 31056, run 0, clone 53763, gen 19266 :(

Something is broken somewhere in the client I guess ... might be a good idea to upgrade to v6 if you're not running on an old OS ...

Re: Project 6318: Collection server misconfigured?

Posted: Fri Jan 22, 2010 8:32 pm
by bruce
Whether you can upgrade or not, please run -queueinfo and post the output that it puts in FAHlog.txt

Re: Project 6318: Collection server misconfigured?

Posted: Fri Jan 22, 2010 8:37 pm
by smoking2000
markg735 wrote:Incidentally, whoever runs the site that qfix is on has the MIME type misconfigured. It came back as text/plain.
I happen to run that site, and you can thank Apache's MIME-type autodetection for that header :)

Any decent client will handle the mime type discrepancy properly.

Code:

folding@floyd:~$ ./qfix
[...]
entry 5, status 2, address 171.64.65.60:8080
  Found results <work/wuresults_05.dat>: proj 31056, run 0, clone 53763, gen 19266
   -- queue entry: proj 6318, run 4154, clone 42, gen 0
   -- doesn't match queue entry
entry 6, status 1, address 171.64.65.60:8080
File is OK
I think that p6318 is served by the new v5 work servers, which behave differently from the old servers we're used to.

The PRCG qfix finds in the wuresults_05.dat is not what's expected based on what's stored in the queue.dat, so qfix refuses to modify the queue.dat to fix any possible issues.

Can you try the following updated qfix binary that I've prepared to hopefully deal with this change?
http://linuxminded.nl/tmp/qfix
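
The mismatch described above can also be picked out of the qfix output mechanically. The following is only a sketch that parses the text qfix prints (line format copied from the output quoted in this thread); it is not how qfix itself reads queue.dat or the wuresults file, and the function name is made up for illustration.

```python
import re

PRCG_RE = re.compile(r"proj (\d+), run (\d+), clone (\d+), gen (\d+)")

def find_mismatches(qfix_output):
    """Yield (results_file, found_prcg, queue_prcg) where the two PRCGs differ."""
    results_file = found = None
    for line in qfix_output.splitlines():
        m = PRCG_RE.search(line)
        if m is None:
            continue
        prcg = tuple(int(x) for x in m.groups())
        if "Found results" in line:
            # The results file name is printed between angle brackets.
            results_file = line.split("<", 1)[1].split(">", 1)[0]
            found = prcg
        elif "queue entry:" in line and found is not None:
            if found != prcg:
                yield results_file, found, prcg
            found = None

sample = """\
entry 5, status 2, address 171.64.65.60:8080
  Found results <work/wuresults_05.dat>: proj 31056, run 0, clone 53763, gen 19266
   -- queue entry: proj 6318, run 4154, clone 42, gen 0
   -- doesn't match queue entry
entry 6, status 1, address 171.64.65.60:8080
"""
for name, found, expected in find_mismatches(sample):
    print(f"{name}: results say {found}, queue says {expected}")
```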

Re: Project 6318: Collection server misconfigured?

Posted: Fri Jan 22, 2010 8:54 pm
by markg735
Bruce,

Here is the output of -queueinfo. This machine is my SVN server. It is lightly loaded, so I run FAH on it while it's idle. But I do have to say I am not so keen on changing things on a machine storing 15 years' worth of my coding.

Code:

# Linux Console Edition #######################################################
###############################################################################

                       Folding@Home Client Version 5.04beta

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /home/folding
Executable: ./FAH504-Linux.exe
Arguments: -queueinfo

[20:47:46] - Ask before connecting: No
[20:47:46] - User name: Anonymous (Team 0)
[20:47:46] - User ID: 1CF431DE0CD11586
[20:47:46] - Machine ID: 1
[20:47:46]
[20:47:46] Loaded queue successfully.
[20:47:46] Printing Queue Information
CURRENT QUEUE:
00  EMPTY
01  EMPTY
02  EMPTY
03  EMPTY
04  EMPTY
05  DONE      "Folding@Home" (78) 171.64.65.60:8080  January 21 02:28->January 21 23:16:58
06  DONE      "Folding@Home" (78) 171.64.65.60:8080  January 21 23:17->January 22 20:22:56
07 *READY     "Folding@Home" (78) 171.64.65.60:8080  January 22 20:23 | March 15 20:23
08  EMPTY
09  EMPTY

Folding@Home Client Shutdown.
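
As a rough aid for reading listings like the one above, this sketch picks out the queue slots marked DONE. It assumes that DONE means a finished unit still sitting in the queue, which matches slot 05 here being the stuck result; the line layout is taken from the output above and the function name is illustrative.

```python
import re

# Two-digit slot number, optional '*' marking the current slot, then the status word.
QUEUE_RE = re.compile(r"^(?P<slot>\d{2})\s+(?P<current>\*?)(?P<status>[A-Z]+)")

def done_slots(queueinfo_text):
    """Return the slot numbers whose status is DONE."""
    slots = []
    for line in queueinfo_text.splitlines():
        m = QUEUE_RE.match(line.strip())
        if m and m["status"] == "DONE":
            slots.append(int(m["slot"]))
    return slots

sample = """\
04  EMPTY
05  DONE      "Folding@Home" (78) 171.64.65.60:8080  January 21 02:28->January 21 23:16:58
06  DONE      "Folding@Home" (78) 171.64.65.60:8080  January 21 23:17->January 22 20:22:56
07 *READY     "Folding@Home" (78) 171.64.65.60:8080  January 22 20:23 | March 15 20:23
"""
print(done_slots(sample))
```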

Re: Project 6318: Collection server misconfigured?

Posted: Fri Jan 22, 2010 8:59 pm
by bruce
I'm seeing a trend here. Several people are reporting problems and they all seem to be running v5.0x. I'm not sure what to recommend right now but several people are looking at the issue.

Re: Project 6318: Collection server misconfigured?

Posted: Fri Jan 22, 2010 9:00 pm
by markg735
Smoking2000,
smoking2000 wrote:I happen to run that site, and you can thank Apache's MIME-type autodetection for that header :)
Any decent client will handle the mime type discrepancy properly.
I guess that means Lynx isn't a decent client? FWIW, I also wrote my own HTTP client (and server). It ignores a wrong header for precisely this reason, but technically, if something comes back as text/*, a client has the right to normalize the line endings, which wouldn't be good for a binary file.
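
To illustrate the line-ending point, here is a minimal sketch of what such normalization would do to binary data. The payload is a made-up byte sequence, not the qfix binary, and the normalization shown (CRLF and lone CR to LF) is just one plausible behaviour a client could apply to text/* content.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert CRLF and lone CR to LF, as a client might do for text/* content."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

# Hypothetical binary payload: the 0x0D and 0x0A bytes here are data, not line endings.
binary = bytes([0x7F, 0x45, 0x4C, 0x46, 0x0D, 0x0A, 0x00, 0x0D, 0x01])
mangled = normalize_newlines(binary)

print(len(binary), len(mangled), binary == mangled)
```

The file comes out one byte shorter with another byte changed, which is enough to break an executable.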

And I think your new code still isn't getting the offset right. The project numbers still don't look right.

Code:

entry 8, status 0, address 171.67.108.13:8080
entry 9, status 0, address 171.67.108.13:8080
entry 0, status 0, address 171.67.108.13:8080
entry 1, status 0, address 171.67.108.13:8080
entry 2, status 0, address 171.67.108.13:8080
entry 3, status 0, address 171.67.108.13:8080
entry 4, status 0, address 171.67.108.13:8080
entry 5, status 2, address 171.64.65.60:8080
  Found results <work/wuresults_05.dat>: proj 20601, run 0, clone 978, gen 16971
   -- queue entry: proj 6318, run 4154, clone 42, gen 0
   -- doesn't match queue entry
entry 6, status 2, address 171.64.65.60:8080
  Found results <work/wuresults_06.dat>: proj 24716, run 0, clone 25991, gen 16971
   -- queue entry: proj 6318, run 2338, clone 50, gen 0
   -- doesn't match queue entry
entry 7, status 1, address 171.64.65.60:8080
File is OK

Re: Project 6318: Collection server misconfigured?

Posted: Fri Jan 22, 2010 9:05 pm
by smoking2000
Then I'm afraid that the wuresults format has changed too much for qfix to be able to support it. I don't have the time to reverse engineer that too.

Re: Project 6318: Collection server misconfigured?

Posted: Fri Jan 22, 2010 9:13 pm
by markg735
Would it help anybody if I tarballed up my folding directory for someone at Stanford to check out?