[01:29:22] - Autosending finished units... [April 1 01:29:22 UTC]
[01:29:22] Trying to send all finished work units
[01:29:22] Project: 10005 (Run 4930, Clone 0, Gen 9)
[01:29:22] - Read packet limit of 540015616... Set to 524286976.
[01:29:22] + Attempting to send results [April 1 01:29:22 UTC]
[01:29:22] - Reading file work/wuresults_05.dat from core
[01:29:23] (Read 1247883 bytes from disk)
[01:29:23] Connecting to http://129.74.85.48:8080/
[01:30:29] Posted data.
[01:30:29] Initial: 0000; - Uploaded at ~18 kB/s
[01:30:29] - Averaged speed for that direction ~26 kB/s
[01:30:29] - Server does not have record of this unit. Will try again later.
[01:30:29] - Error: Could not transmit unit 05 (completed March 30) to work server.
[01:30:29] - 18 failed uploads of this unit.
[01:30:29] - Read packet limit of 540015616... Set to 524286976.
[01:30:29] + Attempting to send results [April 1 01:30:29 UTC]
[01:30:29] - Reading file work/wuresults_05.dat from core
[01:30:29] (Read 1247883 bytes from disk)
[01:30:29] Connecting to http://129.74.85.49:8080/
[01:31:42] Posted data.
[01:31:42] Initial: 0000; - Uploaded at ~16 kB/s
[01:31:42] - Averaged speed for that direction ~24 kB/s
[01:31:42] - Server does not have record of this unit. Will try again later.
[01:31:42] Could not transmit unit 05 to Collection server; keeping in queue.
[01:31:42] + Sent 0 of 1 completed units to the server
[01:31:42] - Autosend completed
I notice that p10005 has been taken off of psummary -- could this be why the server doesn't want the WU?
Note that this server (.48) is the actual work server and not the collection server (which is 129.74.85.49).
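For anyone who wants to check which server each attempt went to and how it ended, a log excerpt like the one in this post can be summarised with a short script. This is only a minimal sketch: the function name `summarise_attempts` and the outcome labels are made up for illustration, and the matching relies purely on the message wording visible in this thread, not on any documented log format.

```python
import re

# Matches lines such as "[01:29:23] Connecting to http://129.74.85.48:8080/"
CONNECT_RE = re.compile(r"Connecting to http://([\d.]+):(\d+)/")

# Outcome phrases copied from the log excerpts in this thread.
# The short labels on the right are hypothetical.
OUTCOMES = {
    "Server does not have record of this unit": "no record",
    "Results successfully sent": "sent",
    "Server reports problem with unit": "server problem",
}

def summarise_attempts(log_text):
    """Return one (host, port, outcome) tuple per connection attempt."""
    attempts = []
    for line in log_text.splitlines():
        match = CONNECT_RE.search(line)
        if match:
            # New attempt; the outcome is filled in by a later line.
            attempts.append([match.group(1), int(match.group(2)), "unknown"])
            continue
        for phrase, label in OUTCOMES.items():
            if phrase in line and attempts and attempts[-1][2] == "unknown":
                attempts[-1][2] = label
    return [tuple(a) for a in attempts]

SAMPLE = """\
[01:29:23] Connecting to http://129.74.85.48:8080/
[01:30:29] Posted data.
[01:30:29] - Server does not have record of this unit. Will try again later.
[01:30:29] Connecting to http://129.74.85.49:8080/
[01:31:42] Posted data.
[01:31:42] - Server does not have record of this unit. Will try again later.
"""

for host, port, outcome in summarise_attempts(SAMPLE):
    print(f"{host}:{port} -> {outcome}")
```

Pairing each "Connecting to" line with the next outcome line keeps the sketch independent of timestamps, which matters because several log lines share the same second.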
Well, it's better for a Collection Server to tell you that it can't upload your WU than to accept it and then discard it. The primary server does have a problem with a NETLOAD that is out of sight. I'll notify the server owner, but it's 03:30 there, so everybody is sleeping.
bruce wrote:The primary server does have a problem with a NETLOAD that is out of sight. I'll notify the server owner, but it's 03:30 there, so everybody is sleeping.
There isn't enough classic work available so as soon as .48 tells the AS it has work to assign, it gets swamped.
Earlier this week, gbowman stopped assignments on 171.67.108.13 because of NETLOAD. That appears to have stabilized and in the meantime has created 12000+ work units. Handing those out would reduce the load on .48 and provide work to quite a number of idling clients.
bruce wrote:Well, it's better for a Collection Server to tell you that it can't upload your WU than to accept it and then discard it. The primary server does have a problem with a NETLOAD that is out of sight. I'll notify the server owner, but it's 03:30 there, so everybody is sleeping.
We're aware of the problem and working to fix it.
Thanks
Your efforts will be too late for me — my client has magically stopped trying to send my WU and reports that no unsent units remain even though the logs show that it was never sent and the files remain in my Work folder.
When my current WU is finished in a few minutes and successfully submitted, I'm shutting down my client and giving it a lo-o-o-o-o-o-ng rest to give the people at Stanford a chance to get their act together.
Unless, of course, somebody can suggest something to minimize the chances of having completed WUs rejected because of errors or no records...
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengeance (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760W Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicone fan gaskets and silicone mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
I'd like to see that log. WUs don't magically stop trying to send unless something happens.
Please also post the Project, Run, Clone, and Gen number for that work unit so someone can look into the problem, check server logs, that sort of thing. It's hard to suggest anything without more information.
7im wrote:I'd like to see that log. WUs don't magically stop trying to send unless something happens.
Please also post the Project, Run, Clone, and Gen number for that work unit so someone can look into the problem, check server logs, that sort of thing. It's hard to suggest anything without more information.
One thing that's certain to happen to WUs that don't upload is that eventually they'll expire and be deleted by the client. That may not be what happened in your case, but 7im's request for the FAHlog is a reasonable one.
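If digging the Project, Run, Clone, and Gen numbers out of a long FAHlog by hand is tedious, something like the following could do it. Treat it as a rough sketch: `extract_prcg` is a hypothetical helper name, and the pattern assumes the "Project: N (Run N, Clone N, Gen N)" wording that the client prints in the logs posted in this thread.

```python
import re

# Pattern based on log lines such as:
#   [01:29:22] Project: 10005 (Run 4930, Clone 0, Gen 9)
PRCG_RE = re.compile(
    r"Project:\s*(\d+)\s*\(Run\s*(\d+),\s*Clone\s*(\d+),\s*Gen\s*(\d+)\)"
)

def extract_prcg(log_text):
    """Return unique (project, run, clone, gen) tuples in order of first appearance."""
    seen = []
    for match in PRCG_RE.finditer(log_text):
        prcg = tuple(int(group) for group in match.groups())
        if prcg not in seen:
            seen.append(prcg)
    return seen

SAMPLE = """\
[01:29:22] Project: 10005 (Run 4930, Clone 0, Gen 9)
[21:23:08] Project: 10008 (Run 1519, Clone 0, Gen 11)
[21:23:08] Project: 6313 (Run 716, Clone 9, Gen 24)
[03:24:28] Project: 10008 (Run 1519, Clone 0, Gen 11)
"""

for project, run, clone, gen in extract_prcg(SAMPLE):
    print(f"Project {project}, Run {run}, Clone {clone}, Gen {gen}")
```

Duplicates are dropped because the same work unit shows up once per upload attempt, and only the distinct units are worth posting.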
[21:23:08] - Machine ID: 1
[21:23:08]
[21:23:08] Loaded queue successfully.
[21:23:08] Initialization complete
[21:23:08]
[21:23:08] + Processing work unit
[21:23:08] Project: 10008 (Run 1519, Clone 0, Gen 11)
[21:23:08] - Read packet limit of 540015616... Set to 524286976.
[21:23:08] + Attempting to send results [April 1 21:23:08 UTC]
[21:23:08] Core required: FahCore_78.exe
[21:23:08] Core found.
[21:23:08] Working on queue slot 06 [April 1 21:23:08 UTC]
[21:23:08] + Working ...
[21:23:08]
[21:23:08] *------------------------------*
[21:23:08] Folding@Home Gromacs Core
[21:23:08] Version 1.90 (March 8, 2006)
[21:23:08]
[21:23:08] Preparing to commence simulation
[21:23:08] - Looking at optimizations...
[21:23:08] - Files status OK
[21:23:08] - Expanded 463366 -> 2244013 (decompressed 484.2 percent)
[21:23:08]
[21:23:08] Project: 6313 (Run 716, Clone 9, Gen 24)
[21:23:08]
[21:23:08] Assembly optimizations on if available.
[21:23:08] Entering M.D.
[21:23:09] - Couldn't send HTTP request to server
[21:23:09] + Could not connect to Work Server (results)
[21:23:09] (129.74.85.48:8080)
[21:23:09] + Retrying using alternative port
[21:23:28] (Starting from checkpoint)
[21:23:28] Protein: p6313_sh3_with_ALA_frags
[21:23:28]
[21:23:28] Writing local files
[21:23:28] Completed 75000 out of 500000 steps (15%)
[21:23:28] Extra SSE boost OK.
[21:23:30] - Couldn't send HTTP request to server
[21:23:30] + Could not connect to Work Server (results)
[21:23:30] (129.74.85.48:80)
[21:23:30] - Error: Could not transmit unit 05 (completed April 1) to work server.
[21:23:30] - Read packet limit of 540015616... Set to 524286976.
[21:23:30] + Attempting to send results [April 1 21:23:30 UTC]
[21:24:27] - Server does not have record of this unit. Will try again later.
[21:24:27] Could not transmit unit 05 to Collection server; keeping in queue.
[21:33:04] Writing local files
[21:33:04] Completed 80000 out of 500000 steps (16%)
[21:42:38] Writing local files
............
.........
[03:12:17] Completed 255000 out of 500000 steps (51%)
[03:21:56] Writing local files
[03:21:56] Completed 260000 out of 500000 steps (52%)
[03:24:28] Project: 10008 (Run 1519, Clone 0, Gen 11)
[03:24:28] - Read packet limit of 540015616... Set to 524286976.
[03:24:28] + Attempting to send results [April 2 03:24:28 UTC]
[03:27:02] - Server reports problem with unit.
[03:31:34] Writing local files
[03:31:34] Completed 265000 out of 500000 steps (53%)
..........
......
[09:27:40] Printing Queue Information
Current Queue:
Slot 07 Empty/Deleted
Project: 1771 (Run 3, Clone 86, Gen 2), Core: a0
Work server: 134.139.127.31:8080
Collection server: 134.139.127.34
Download date: March 25 09:08:35
Finished date: March 27 12:15:13
Slot 08 Empty/Deleted
Project: 6316 (Run 269, Clone 8, Gen 33), Core: 78
Work server: 171.64.65.111:8080
Collection server: 171.67.108.17
Download date: March 27 12:16:27
Finished date: March 28 03:37:02
Slot 09 Empty/Deleted
Project: 2613 (Run 37, Clone 5, Gen 138), Core: 78
Work server: 171.64.65.65:8080
Collection server: 171.67.108.25
Download date: March 28 03:38:23
Finished date: March 29 04:07:52
Slot 00 Empty/Deleted
Project: 6316 (Run 360, Clone 5, Gen 37), Core: 78
Work server: 171.64.65.111:8080
Collection server: 171.67.108.17
Download date: March 29 07:41:09
Finished date: March 29 23:56:10
Slot 01 Empty/Deleted
Project: 10004 (Run 3508, Clone 0, Gen 5), Core: b4
Work server: 129.74.85.48:8080
Collection server: 129.74.85.49
Download date: March 29 23:57:16
Finished date: March 30 08:00:03
Failed uploads: 1
Slot 02 Empty/Deleted
Project: 6314 (Run 113, Clone 14, Gen 42), Core: 78
Work server: 171.64.65.111:8080
Collection server: 171.67.108.17
Download date: March 30 08:24:32
Finished date: March 31 00:42:42
Slot 03 Empty/Deleted
Project: 6316 (Run 175, Clone 10, Gen 42), Core: 78
Work server: 171.64.65.111:8080
Collection server: 171.67.108.17
Download date: March 31 00:43:49
Finished date: March 31 16:51:18
Slot 04 Empty/Deleted
Project: 6316 (Run 234, Clone 11, Gen 42), Core: 78
Work server: 171.64.65.111:8080
Collection server: 171.67.108.17
Download date: March 31 16:53:26
Finished date: April 1 09:20:44
******************************************************************
Slot 05 Empty/Deleted
Project: 10008 (Run 1519, Clone 0, Gen 11), Core: b4
Work server: 129.74.85.48:8080
Collection server: 129.74.85.16
Download date: April 1 09:21:47
Finished date: April 1 18:51:47
*****************************************************************
Slot 06 *Ready
Project: 6313 (Run 716, Clone 9, Gen 24), Core: 78
Work server: 171.64.65.111:8080
Collection server: 171.67.108.17
Download date: April 1 18:54:26
Deadline date: May 23 18:54:26
PF: 0.982998 based on last 4 slot(s)
[edit]As I noted in my OP it was Project: 10008 (Run 2455, Clone 0, Gen 12).[/edit]
This is the second WU in a row (plus at least one other since November) that has been lost due to this ambiguous "problem", which elsewhere in this Forum has been attributed to either my client or my computer.
You can't possibly know what has happened to my client or computer, and I can't even begin to guess what might have changed.
Is it time to uninstall and then download/install all new software?
[07:37:19] - Autosending finished units... [April 2 07:37:19 UTC]
[07:37:19] Trying to send all finished work units
[07:37:19] Project: 10005 (Run 4930, Clone 0, Gen 9)
[07:37:19] - Read packet limit of 540015616... Set to 524286976.
[07:37:19] + Attempting to send results [April 2 07:37:19 UTC]
[07:37:19] - Reading file work/wuresults_05.dat from core
[07:37:19] (Read 1247883 bytes from disk)
[07:37:19] Connecting to http://129.74.85.48:8080/
[07:40:15] Posted data.
[07:40:15] Initial: 0000; - Uploaded at ~6 kB/s
[07:40:15] - Averaged speed for that direction ~35 kB/s
[07:40:15] + Results successfully sent
[07:40:15] Thank you for your contribution to Folding@Home.
[07:40:15] + Number of Units Completed: 569
[07:40:16] + Sent 1 of 1 completed units to the server
[07:40:16] - Autosend completed
tacoma43 wrote:I was able to successfully upload a Project: 4604 WU yesterday so, temporarily at least, my time and CPUs aren't being wasted.
I think both of the previous WUs that failed to upload were Project: 10008s, all of which suggests the problem is at the Stanford end.
Someone could most likely tell the cause of the failure if you would post your FAHlog, including the portions that show how F@H is being started, especially if you are using the -verbosity 9 flag.
The portion that you did post could be caused by the relevant server being temporarily off-line.
[09:15:51] - Server reports problem with unit.
The client will automatically attempt to upload any completed work units every six hours.
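To get a feel for when the next automatic attempts should show up in the log, the six-hour interval can be projected forward from the last attempt. This is a back-of-the-envelope sketch, not the client's actual scheduling logic: it assumes a fixed interval with no jitter, `next_attempts` is a made-up name, and the year is arbitrary because the log timestamps do not include one.

```python
from datetime import datetime, timedelta

# Assumed fixed interval, taken from the "every six hours" statement.
RETRY_INTERVAL = timedelta(hours=6)

def next_attempts(last_attempt, count=4):
    """Project the next `count` upload attempts after `last_attempt`."""
    return [last_attempt + RETRY_INTERVAL * i for i in range(1, count + 1)]

# Last attempt seen in the posted log: April 1 21:23:30 UTC.
# The year 2000 is a placeholder; the log does not record the year.
last = datetime(2000, 4, 1, 21, 23, 30)

for when in next_attempts(last):
    print(when.strftime("%B %d %H:%M:%S UTC"))
```

The posted log shows the next attempt at 03:24:28, roughly six hours after 21:23:30, so the real timing drifts a little from this idealised schedule.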