Re: Project 6318: Collection server misconfigured?
Posted: Fri Jan 22, 2010 9:22 pm
by bruce
markg735 wrote:Would it help anybody if I tarballed up my folding directory for someone at Stanford to check out?
I would most definitely create the tarball, but I wouldn't expect Stanford to ask for it -- depending on what they find just by checking this project (see the post from "VijayPande" above) -- and even if they can use it, it probably won't be something they want quickly.
Re: Project 6318: Collection server misconfigured?
Posted: Fri Jan 22, 2010 10:12 pm
by Tobit
One of my teammates is experiencing the same thing with another 6318 and, of course, he is running a 5.0x client.
Re: Project 6318: Collection server misconfigured?
Posted: Sat Jan 23, 2010 1:50 am
by AgrFan
I had this problem running the v5 client also. It looks like v5 clients may have pulled 6318 units incorrectly during the server outage. I was running 1 v5 client and 5 v6 clients when the server outage occurred. The v6 clients always pulled 63xx units and the v5 clients always pulled 44xx/46xx units. It seemed strange at the time to get a 6318 unit with the v5 client. The v5 client has since been upgraded to v6. This is the first time I've had an issue running older client versions.
Could it be possible 6318 requires the v6 client and min_ver for this server is set incorrectly to v5?
171.64.65.60 does not show 'min_ver' on the serverstat page so there's no way to check this.
http://fah-web.stanford.edu/localinfo/c ... assic.html
Re: Project 6318: Collection server misconfigured?
Posted: Wed Jan 27, 2010 9:32 am
by Dave
I have been folding for one year now with no problems, until Project 6318 (Run 408, Clone 53, Gen 0) running Core 78. My system also has Cores 81, 82, and a0. My configuration has always been set for big files (box on bottom of Connection tab is checked to allow files >10MB), as this project requires. I also use the -advmethods additional parameter. I am using the current client (version 6.23, built November 26, 2008), so I'm pretty sure this is NOT a version problem. Slightly off the subject, but it would be extremely helpful if there were a checkbox configuration setting to allow an automatic check for an updated version at startup, plus a menu item to do a manual update check when you right-click on the systray icon. Hint: There's room on the Connection tab.
As you may note, my current project is Project 6318 (Run 3098, Clone 65, Gen 0), which seems to be running fine as well. I am now concerned about whether it will upload when it finishes. According to FAHmon, the queue for slot 07 uses Work server 171.64.65.60:8080, and Collection Server 171.67.108.26.
I've set my Windows Firewall to allow Folding@Home to communicate through IP addresses 171.64.65.60 and 171.67.108.26. Unfortunately, I cannot narrow the scope any further than the IP address, so I can't add the port numbers it uses.
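One way to check whether a firewall rule like this is actually letting traffic through is to attempt a plain TCP connection to each server and port from the same machine. The following is a minimal Python sketch and is not part of the Folding@Home client. The helper names `can_connect` and `check_all` are made up for illustration, and the port list (8080, with 80 as the alternative) is an assumption based on the ports the client reports trying.

```python
import socket

# Work server and collection server addresses from the FAHmon queue info.
# The ports are an assumption: 8080 first, then 80 as the alternative port.
ENDPOINTS = [
    ("171.64.65.60", 8080),
    ("171.64.65.60", 80),
    ("171.67.108.26", 8080),
    ("171.67.108.26", 80),
]


def can_connect(host, port, timeout=5.0):
    """Return True if a plain TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def check_all(endpoints, timeout=5.0):
    """Map each 'host:port' string to True (reachable) or False."""
    return {
        f"{host}:{port}": can_connect(host, port, timeout)
        for host, port in endpoints
    }
```

Calling `check_all(ENDPOINTS)` returns one True/False entry per address. A successful connection only shows that the port is reachable through the firewall; the server can still decline the upload at the HTTP level.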
So far, there have already been 10 failed upload attempts on the completed work unit.
Here is my logfile:
--- Opening Log file [January 26 18:16:34 UTC]
# Windows CPU Systray Edition #################################################
###############################################################################
Folding@Home Client Version 6.23
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: C:\Users\Dave\AppData\Roaming\Folding@home-x86
Arguments: -advmethods
[18:16:34] - Ask before connecting: No
[18:16:34] - User name: Dave_Haber (Team 125669)
[18:16:34] - User ID: 12B4BDB531AD6DA4
[18:16:34] - Machine ID: 1
[18:16:34]
[18:16:34] Loaded queue successfully.
[18:16:34] Initialization complete
[18:16:34]
[18:16:34] + Processing work unit
[18:16:34] Core required: FahCore_78.exe
[18:16:34] Core found.
[18:16:35] Project: 6318 (Run 408, Clone 53, Gen 0)
[18:16:35] - Read packet limit of 540015616... Set to 524286976.
[18:16:35] + Attempting to send results [January 26 18:16:35 UTC]
[18:16:35] Working on queue slot 08 [January 26 18:16:35 UTC]
[18:16:35] + Working ...
[18:16:35] - Couldn't send HTTP request to server
[18:16:35] + Could not connect to Work Server (results)
[18:16:35] (171.64.65.60:8080)
[18:16:35] + Retrying using alternative port
[18:16:35]
[18:16:35] *------------------------------*
[18:16:35] Folding@Home Gromacs Core
[18:16:35] Version 1.90 (March 8, 2006)
[18:16:35]
[18:16:35] Preparing to commence simulation
[18:16:35] - Looking at optimizations...
[18:16:35] - Files status OK
[18:16:36] - Expanded 390548 -> 2244040 (decompressed 574.5 percent)
[18:16:36] - Couldn't send HTTP request to server
[18:16:36] + Could not connect to Work Server (results)
[18:16:36] (171.64.65.60:80)
[18:16:36] - Error: Could not transmit unit 07 (completed January 25) to work server.
[18:16:36] - Read packet limit of 540015616... Set to 524286976.
[18:16:36] + Attempting to send results [January 26 18:16:36 UTC]
[18:16:36] - Couldn't send HTTP request to server
[18:16:36] + Could not connect to Work Server (results)
[18:16:36] (171.67.108.26:8080)
[18:16:36] + Retrying using alternative port
[18:16:43]
[18:16:43] Project: 6318 (Run 3098, Clone 65, Gen 0)
[18:16:43]
[18:16:43] Assembly optimizations on if available.
[18:16:43] Entering M.D.
[18:17:03] (Starting from checkpoint)
[18:17:03] Protein: Great Red Oystrich Makes All Chemists Sane in water
[18:17:03]
[18:17:03] Writing local files
[18:17:03] Completed 122418 out of 500000 steps (24%)
[18:17:03] Extra SSE boost OK.
[18:35:37] Writing local files
[18:35:37] Completed 125000 out of 500000 steps (25%)
[19:09:18] Writing local files
[19:09:18] Completed 130000 out of 500000 steps (26%)
[19:45:56] Writing local files
[19:45:56] Completed 135000 out of 500000 steps (27%)
[20:21:21] Writing local files
[20:21:21] Completed 140000 out of 500000 steps (28%)
[20:58:15] Writing local files
[20:58:15] Completed 145000 out of 500000 steps (29%)
[21:34:03] Writing local files
[21:34:03] Completed 150000 out of 500000 steps (30%)
[22:11:40] Writing local files
[22:11:40] Completed 155000 out of 500000 steps (31%)
[22:46:28] Writing local files
[22:46:28] Completed 160000 out of 500000 steps (32%)
[23:19:43] Writing local files
[23:19:43] Completed 165000 out of 500000 steps (33%)
[23:58:27] Writing local files
[23:58:27] Completed 170000 out of 500000 steps (34%)
[00:35:05] + Could not connect to Work Server (results)
[00:35:05] (171.67.108.26:80)
[00:35:05] Could not transmit unit 07 to Collection server; keeping in queue.
[00:51:35] Writing local files
[00:51:35] Completed 175000 out of 500000 steps (35%)
[01:26:41] Writing local files
[01:26:42] Completed 180000 out of 500000 steps (36%)
[02:02:45] Writing local files
[02:02:45] Completed 185000 out of 500000 steps (37%)
[02:38:54] Writing local files
[02:38:54] Completed 190000 out of 500000 steps (38%)
[03:14:49] Writing local files
[03:14:49] Completed 195000 out of 500000 steps (39%)
[03:50:45] Writing local files
[03:50:45] Completed 200000 out of 500000 steps (40%)
[04:26:41] Writing local files
[04:26:41] Completed 205000 out of 500000 steps (41%)
[05:02:00] Writing local files
[05:02:01] Completed 210000 out of 500000 steps (42%)
[05:36:10] Writing local files
[05:36:10] Completed 215000 out of 500000 steps (43%)
[06:16:50] Writing local files
[06:16:50] Completed 220000 out of 500000 steps (44%)
[06:35:03] Project: 6318 (Run 408, Clone 53, Gen 0)
[06:35:03] - Read packet limit of 540015616... Set to 524286976.
[06:35:04] + Attempting to send results [January 27 06:35:04 UTC]
[06:35:04] - Couldn't send HTTP request to server
[06:35:04] + Could not connect to Work Server (results)
[06:35:04] (171.64.65.60:8080)
[06:35:04] + Retrying using alternative port
[06:35:06] - Couldn't send HTTP request to server
[06:35:06] + Could not connect to Work Server (results)
[06:35:06] (171.64.65.60:80)
[06:35:06] - Error: Could not transmit unit 07 (completed January 25) to work server.
[06:35:06] - Read packet limit of 540015616... Set to 524286976.
[06:35:06] + Attempting to send results [January 27 06:35:06 UTC]
[06:35:06] - Couldn't send HTTP request to server
[06:35:06] + Could not connect to Work Server (results)
[06:35:06] (171.67.108.26:8080)
[06:35:06] + Retrying using alternative port
[06:35:07] - Couldn't send HTTP request to server
[06:35:07] (Got status 503)
[06:35:07] + Could not connect to Work Server (results)
[06:35:07] (171.67.108.26:80)
[06:35:07] Could not transmit unit 07 to Collection server; keeping in queue.
[07:34:01] Writing local files
[07:34:01] Completed 225000 out of 500000 steps (45%)
[08:53:22] Writing local files
[08:53:22] Completed 230000 out of 500000 steps (46%)
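A log like the one above is easier to read once the failed upload attempts are tallied per server and port. The sketch below is a hypothetical helper, not a Folding@Home tool: `summarize_upload_failures` is an invented name, and it assumes the address line always follows the "Could not connect to Work Server (results)" line, which is how this log is laid out.

```python
import re
from collections import Counter

# An address in parentheses, e.g. "(171.64.65.60:8080)".
ADDRESS = re.compile(r"\((\d{1,3}(?:\.\d{1,3}){3}:\d+)\)")
# An HTTP status reported by the client, e.g. "(Got status 503)".
STATUS = re.compile(r"\(Got status (\d+)\)")


def summarize_upload_failures(log_text):
    """Count failed result uploads per server:port and per HTTP status."""
    failures = Counter()
    statuses = Counter()
    awaiting_address = False
    for line in log_text.splitlines():
        status = STATUS.search(line)
        if status:
            statuses[status.group(1)] += 1
        if "Could not connect to Work Server (results)" in line:
            # The next line carrying an address names the server that failed.
            awaiting_address = True
            continue
        if awaiting_address:
            address = ADDRESS.search(line)
            if address:
                failures[address.group(1)] += 1
                awaiting_address = False
    return failures, statuses


# Short excerpt in the same layout as the client log.
SAMPLE = """\
[06:35:04] - Couldn't send HTTP request to server
[06:35:04] + Could not connect to Work Server (results)
[06:35:04] (171.64.65.60:8080)
[06:35:04] + Retrying using alternative port
[06:35:07] - Couldn't send HTTP request to server
[06:35:07] (Got status 503)
[06:35:07] + Could not connect to Work Server (results)
[06:35:07] (171.67.108.26:80)
"""

failures, statuses = summarize_upload_failures(SAMPLE)
for endpoint, count in sorted(failures.items()):
    print(endpoint, count)
```

Keeping the HTTP statuses in a separate counter is deliberate: a 503 means the server answered and reported itself unavailable, which is a different situation from a connection that never opened.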
Re: Project 6318: Collection server misconfigured?
Posted: Thu Jan 28, 2010 1:25 am
by bruce
I suggest that you download the 3rd party application "qfix" into that directory and run it from the text window with FAH stopped. It looks like you've received a WU which is trying to upload a result larger than was expected for your configuration setting.
Was the Connection tab box always checked to allow files >10MB or did you change that recently?
Re: Project 6318: Collection server misconfigured?
Posted: Thu Jan 28, 2010 4:20 am
by AgrFan
A teammate was able to resolve this problem by downloading the v6.x client and overlaying his v5.x client executable with it. Use the -sendall switch to upload the completed unit after overlaying the client executable. Make sure to upgrade to the v6.x client when you're finished.
Re: Project 6318: Collection server misconfigured?
Posted: Fri Jan 29, 2010 2:31 am
by matheusber
I also have that annoying message:
$ ./qfix.velho
entry 5, status 0, address 0.0.0.0
Found results <work/wuresults_05.dat>: proj 2681, run 1, clone 3, gen 76
-- queue entry: proj 0, run 0, clone 0, gen 0
-- doesn't match queue entry
entry 6, status 0, address 0.0.0.0
entry 7, status 0, address 0.0.0.0
entry 8, status 0, address 0.0.0.0
entry 9, status 0, address 0.0.0.0
entry 0, status 0, address 0.0.0.0
entry 1, status 0, address 171.67.108.22:8080
entry 2, status 0, address 171.67.108.22:8080
entry 3, status 0, address 171.67.108.22:8080
entry 4, status 1, address 171.67.108.22:8080
File is OK
I tried to get http://linuxminded.nl/tmp/qfix but got a 404.
Is there any hope?
To the dev of qfix: thanks, it saved me a lot of WUs.
matheus
Re: Project 6318: Collection server misconfigured?
Posted: Fri Jan 29, 2010 3:48 am
by ChelseaOilman
Re: Project 6318: Collection server misconfigured?
Posted: Fri Jan 29, 2010 4:20 am
by matheusber
thanks, but I tried that already and no good
i was trying to get a newer version of qfix, the old can help.
thanks for your time,
matheus
Re: Project 6318: Collection server misconfigured?
Posted: Fri Jan 29, 2010 7:54 am
by Dave
bruce wrote:I suggest that you download the 3rd party application "qfix" into that directory and run it from the text window with FAH stopped. It looks like you've received a WU which is trying to upload a result larger than was expected for your configuration setting.
Was the Connection tab box always checked to allow files >10MB or did you change that recently?
Bruce, thanks for the qfix suggestion, but it didn't seem to work.
After I restarted F@H, the same messages came up again. My configuration setting has always been for files >10MB, and I use the -advmethods additional parameter. I tried adding that -sendall additional parameter, but my current client version 6.23 apparently doesn't recognize it as a valid entry. I just don't know what else to do at this point. Any other possible ideas?
Re: Project 6318: Collection server misconfigured?
Posted: Fri Jan 29, 2010 4:51 pm
by ChelseaOilman
Dave wrote:I tried adding that -sendall additional parameter, but my current client version 6.23 apparently doesn't recognize it as a valid entry.
That's because there should be a space between send and all. ---> -send all
What does qd show as the status of that WU?
Re: Project 6318: Collection server misconfigured?
Posted: Sat Jan 30, 2010 5:24 am
by AgrFan
Dave wrote:I just don't know what else to do at this point. Any other possible ideas?
Are you running the v5.x client? If so, download the v6.x client and overlay fah5.exe in the working directory with fah6.exe. Start the updated client with the -sendall switch to upload any completed units. Make sure to do a clean install of the v6.x client when finished.
Re: Project 6318: Collection server misconfigured?
Posted: Sat Jan 30, 2010 6:28 am
by bruce
. . . unless you're running an old version of Windows that REQUIRES the lower version. In any case, back up what you have before you try this method.
Re: Project 6318: Collection server misconfigured?
Posted: Sat Jan 30, 2010 7:38 am
by Dave
I run the current client (version 6.23) on a relatively new system (Windows Vista Home Basic, 2.0GB RAM, 2.0GHz single-core CPU, 139GB hard drive), so I know that I should be able to run any work unit in my current configuration. As I mentioned previously, I have checked the configuration box to permit results >10MB to be transmitted, and I have also included the additional parameter -advmethods. I have had no problems with this configuration for over a year, until now. I now have TWO slots (7 and 8) tied up with results from project 6318 that will not transmit.
I have attempted to try everything that has been suggested so far. The qfix program did not seem to have any effect on the queue slots. Every time I tried to add the -send all additional parameter, the client died. I tried -advmethods -send all, then I tried -send all -advmethods, then I tried just -send all by itself, but each time, the client stopped working and I had to restart it. This version of the client just doesn't seem to allow -send all to be used as an additional parameter in the configuration file. Am I correct in this assumption?
Perhaps I'm just missing something patently obvious to the Folding veterans, but I sure am open to some guidance. I monitor Folding@Home using FAHmon, so I copied and pasted my current client session. Here is my logfile:
--- Opening Log file [January 30 06:53:55 UTC]
# Windows CPU Systray Edition #################################################
###############################################################################
Folding@Home Client Version 6.23
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: C:\Users\Dave\AppData\Roaming\Folding@home-x86
Arguments: -advmethods
[06:53:55] - Ask before connecting: No
[06:53:55] - User name: Dave_Haber (Team 125669)
[06:53:55] - User ID: 12B4BDB531AD6DA4
[06:53:55] - Machine ID: 1
[06:53:55]
[06:53:55] Loaded queue successfully.
[06:53:55] Initialization complete
[06:53:55]
[06:53:55] + Processing work unit
[06:53:55] Core required: FahCore_78.exe
[06:53:55] Core found.
[06:53:55] Working on queue slot 09 [January 30 06:53:55 UTC]
[06:53:55] + Working ...
[06:53:55] Project: 6318 (Run 408, Clone 53, Gen 0)
[06:53:55] - Read packet limit of 540015616... Set to 524286976.
[06:53:55] + Attempting to send results [January 30 06:53:55 UTC]
[06:53:55] - Couldn't send HTTP request to server
[06:53:55] + Could not connect to Work Server (results)
[06:53:55] (171.64.65.60:8080)
[06:53:55] + Retrying using alternative port
[06:53:57] - Couldn't send HTTP request to server
[06:53:57] + Could not connect to Work Server (results)
[06:53:57] (171.64.65.60:80)
[06:53:57] - Error: Could not transmit unit 07 (completed January 25) to work server.
[06:53:57] - Read packet limit of 540015616... Set to 524286976.
[06:53:57] + Attempting to send results [January 30 06:53:57 UTC]
[06:53:57] - Couldn't send HTTP request to server
[06:53:57] + Could not connect to Work Server (results)
[06:53:57] (171.67.108.26:8080)
[06:53:57] + Retrying using alternative port
[06:53:57]
[06:53:57] *------------------------------*
[06:53:57] Folding@Home Gromacs Core
[06:53:57] Version 1.90 (March 8, 2006)
[06:53:57]
[06:53:57] Preparing to commence simulation
[06:53:57] - Looking at optimizations...
[06:53:57] - Files status OK
[06:53:58] - Couldn't send HTTP request to server
[06:53:58] (Got status 503)
[06:53:58] + Could not connect to Work Server (results)
[06:53:58] (171.67.108.26:80)
[06:53:58] Could not transmit unit 07 to Collection server; keeping in queue.
[06:53:58] Project: 6318 (Run 3098, Clone 65, Gen 0)
[06:53:58] - Read packet limit of 540015616... Set to 524286976.
[06:53:58] + Attempting to send results [January 30 06:53:58 UTC]
[06:53:58] - Couldn't send HTTP request to server
[06:53:58] + Could not connect to Work Server (results)
[06:53:58] (171.64.65.60:8080)
[06:53:58] + Retrying using alternative port
[06:54:02] - Couldn't send HTTP request to server
[06:54:02] + Could not connect to Work Server (results)
[06:54:02] (171.64.65.60:80)
[06:54:02] - Error: Could not transmit unit 08 (completed January 30) to work server.
[06:54:02] - Read packet limit of 540015616... Set to 524286976.
[06:54:02] + Attempting to send results [January 30 06:54:02 UTC]
[06:54:02] - Couldn't send HTTP request to server
[06:54:02] + Could not connect to Work Server (results)
[06:54:02] (171.67.108.26:8080)
[06:54:02] + Retrying using alternative port
[06:54:03] - Couldn't send HTTP request to server
[06:54:03] (Got status 503)
[06:54:03] + Could not connect to Work Server (results)
[06:54:03] (171.67.108.26:80)
[06:54:03] Could not transmit unit 08 to Collection server; keeping in queue.
[06:54:03] + Working...
[06:54:30] Printing Queue Information
Current Queue:
Slot 00 Empty/Deleted
Project: 2493 (Run 80, Clone 7, Gen 4), Core: 78
Work server: 171.65.103.160:80
Collection server: 171.67.108.17
Download date: November 26 04:50:54
Finished date: December 4 00:36:13
Slot 01 Empty/Deleted
Project: 2494 (Run 91, Clone 32, Gen 0), Core: 78
Work server: 171.65.103.160:80
Collection server: 171.67.108.17
Download date: December 4 00:37:09
Finished date: December 9 09:21:10
Slot 02 Empty/Deleted
Project: 2494 (Run 141, Clone 7, Gen 0), Core: 78
Work server: 171.65.103.160:80
Collection server: 171.67.108.17
Download date: December 9 09:22:02
Finished date: December 13 16:57:58
Slot 03 Empty/Deleted
Project: 2494 (Run 176, Clone 20, Gen 0), Core: 78
Work server: 171.65.103.160:80
Collection server: 171.67.108.17
Download date: December 13 16:58:49
Finished date: December 21 06:36:07
Slot 04 Empty/Deleted
Project: 2494 (Run 236, Clone 39, Gen 0), Core: 78
Work server: 171.65.103.160:80
Collection server: 171.67.108.17
Download date: December 21 06:37:00
Finished date: December 31 11:36:25
Slot 05 Empty/Deleted
Project: 2494 (Run 202, Clone 5, Gen 0), Core: 78
Work server: 171.65.103.160:80
Collection server: 171.67.108.17
Download date: December 31 11:37:33
Finished date: January 11 19:33:04
Slot 06 Empty/Deleted
Project: 2494 (Run 44, Clone 4, Gen 1), Core: 78
Work server: 171.65.103.160:80
Collection server: 171.67.108.17
Download date: January 11 19:33:56
Finished date: January 21 21:42:22
Slot 07 Done
Project: 6318 (Run 408, Clone 53, Gen 0), Core: 78
Work server: 171.64.65.60:8080
Collection server: 171.67.108.26
Download date: January 22 12:03:15
Finished date: January 25 07:15:08
Failed uploads: 24
Slot 08 Done
Project: 6318 (Run 3098, Clone 65, Gen 0), Core: 78
Work server: 171.64.65.60:8080
Collection server: 171.67.108.26
Download date: January 25 07:31:45
Finished date: January 30 01:09:13
Failed uploads: 6
Slot 09 *Ready
Project: 2495 (Run 72, Clone 0, Gen 0), Core: 78
Work server: 171.65.103.160:80
Collection server: 171.67.108.17
Download date: January 30 02:02:41
Deadline date: May 3 02:02:41
PF: 0.911775 based on last 4 slot(s)
[06:54:42] - Expanded 2199512 -> 15082113 (decompressed 685.7 percent)
[06:54:45] - Starting from initial work packet
[06:54:45]
[06:54:45] Project: 2495 (Run 72, Clone 0, Gen 0)
[06:54:45]
[06:54:52] Assembly optimizations on if available.
[06:54:52] Entering M.D.
[06:55:10] Protein: system
[06:55:10]
[06:55:11] Writing local files
[06:55:36] Extra SSE boost OK.
[06:55:45] Writing local files
[06:55:46] Completed 0 out of 250000 steps (0%)
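The "Printing Queue Information" dump in the log above can be reduced to just the slots that are finished but still waiting to upload. This is a rough Python sketch under the assumption that each slot starts with a "Slot NN status" line followed by "Key: value" lines, as in this log; `parse_queue` and `stuck_results` are hypothetical names, not part of the client, qd, or qfix.

```python
import re

# "Slot 07 Done", "Slot 09 *Ready", "Slot 00 Empty/Deleted"
SLOT = re.compile(r"^\s*Slot (\d+)\s+(\S+)")
# "Work server: 171.64.65.60:8080", "Failed uploads: 24"
FIELD = re.compile(r"^\s*([A-Za-z ]+):\s*(.+)$")


def parse_queue(dump):
    """Turn a queue dump into a list of dicts, one per slot."""
    slots = []
    for line in dump.splitlines():
        slot = SLOT.match(line)
        if slot:
            slots.append({"slot": int(slot.group(1)), "status": slot.group(2)})
            continue
        field = FIELD.match(line)
        if field and slots:
            # Attach the field to the most recently seen slot.
            slots[-1][field.group(1).strip()] = field.group(2).strip()
    return slots


def stuck_results(slots):
    """Slots marked 'Done' that have at least one failed upload recorded."""
    return [
        s for s in slots
        if s["status"] == "Done" and int(s.get("Failed uploads", 0)) > 0
    ]


# Excerpt in the same layout as the queue dump.
SAMPLE = """\
Current Queue:
Slot 06 Empty/Deleted
Project: 2494 (Run 44, Clone 4, Gen 1), Core: 78
Work server: 171.65.103.160:80
Slot 07 Done
Project: 6318 (Run 408, Clone 53, Gen 0), Core: 78
Work server: 171.64.65.60:8080
Collection server: 171.67.108.26
Failed uploads: 24
Slot 09 *Ready
Project: 2495 (Run 72, Clone 0, Gen 0), Core: 78
"""

for s in stuck_results(parse_queue(SAMPLE)):
    print(s["slot"], s["Project"], s["Work server"], s["Failed uploads"])
```

Values are kept as strings exactly as they appear in the dump, so nothing is lost if a field uses a format the sketch did not anticipate.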
Re: Project 6318: Collection server misconfigured?
Posted: Sat Jan 30, 2010 9:24 pm
by 7im
The -send command is redundant. By default, the fah client attempts to "send all" EVERY time the client is started. There is no need to use that switch to send all completed work units. Stopping and restarting the client does the same thing. Sorry, can't help much with the rest.