GPU server status 171.67.108.21, 171.64.65.71, 171.67.108.26

Moderators: Site Moderators, FAHC Science Team

weedacres
Posts: 138
Joined: Mon Dec 24, 2007 11:18 pm
Hardware configuration: UserNames: weedacres_gpu ...
Location: Eastern Washington

What do we do with all of the unsent workunits?

Post by weedacres »

Now that the GPU servers seem to be running again, I'd like to send in the unsent workunits that have accumulated over the last few days.

I currently have about 150 wuresults_xx.dat files that are not being detected by autosend and will not send with -send.

For example:

Code:

C:\GPU0>gpu -send 00

Note: Please read the license agreement (gpu -license). Further
use of this software requires that you have read and accepted this agreement.



--- Opening Log file [February 15 18:58:14 UTC]


# Windows GPU Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.20

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\GPU0
Executable: gpu
Arguments: -send 00 -verbosity 9

[18:58:14] - Ask before connecting: No
[18:58:14] - User name: weedacres_gpu (Team 52523)
[18:58:14] - User ID: 3E34899A69C37D0A
[18:58:14] - Machine ID: 1
[18:58:14]
[18:58:15] Loaded queue successfully.
[18:58:15] Attempting to return result(s) to server...
[18:58:15] Project: 10501 (Run 356, Clone 0, Gen 0)
[18:58:15] - Warning: Asked to send unfinished unit to server
[18:58:15] - Failed to send unit 00 to server
[18:58:15] ***** Got a SIGTERM signal (2)
[18:58:15] Killing all core threads

Folding@Home Client Shutdown.

C:\GPU0>
Here's the log for this particular wu:

Code:

[09:53:22] Folding@Home GPU Core
[09:53:22] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[09:53:22] 
[09:53:22] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[09:53:22] Build host: amoeba
[09:53:22] Board Type: Nvidia
[09:53:22] Core      : 
[09:53:22] Preparing to commence simulation
[09:53:22] - Looking at optimizations...
[09:53:22] DeleteFrameFiles: successfully deleted file=work/wudata_00.ckp
[09:53:22] - Created dyn
[09:53:22] - Files status OK
[09:53:22] - Expanded 59223 -> 336763 (decompressed 568.6 percent)
[09:53:22] Called DecompressByteArray: compressed_data_size=59223 data_size=336763, decompressed_data_size=336763 diff=0
[09:53:22] - Digital signature verified
[09:53:22] 
[09:53:22] Project: 10501 (Run 356, Clone 0, Gen 0)
[09:53:22] 
[09:53:22] Assembly optimizations on if available.
[09:53:22] Entering M.D.
[09:53:28] Tpr hash work/wudata_00.tpr:  86323995 1473313338 4156891406 2337521679 3596678503
[09:53:28] 
[09:53:28] Calling fah_main args: 14 usage=90
[09:53:28] 
[09:53:28] Working on Protein
[09:53:29] Client config found, loading data.
[09:53:29] Starting GUI Server
[09:55:15] Completed 1%
[09:57:02] Completed 2%
[09:58:48] Completed 3%
[10:00:34] Completed 4%
[10:02:20] Completed 5%
[10:04:06] Completed 6%
[10:05:52] Completed 7%
[10:07:39] Completed 8%
[10:09:25] Completed 9%
[10:11:11] Completed 10%
[10:12:57] Completed 11%
[10:14:44] Completed 12%
[10:16:30] Completed 13%
[10:18:16] Completed 14%
[10:20:02] Completed 15%
[10:21:49] Completed 16%
[10:23:35] Completed 17%
[10:25:21] Completed 18%
[10:27:07] Completed 19%
[10:28:53] Completed 20%
[10:30:39] Completed 21%
[10:32:26] Completed 22%
[10:34:12] Completed 23%
[10:35:58] Completed 24%
[10:37:44] Completed 25%
[10:39:30] Completed 26%
[10:41:16] Completed 27%
[10:43:03] Completed 28%
[10:44:49] Completed 29%
[10:46:35] Completed 30%
[10:48:21] Completed 31%
[10:50:07] Completed 32%
[10:51:53] Completed 33%
[10:53:39] Completed 34%
[10:55:25] Completed 35%
[10:57:11] Completed 36%
[10:58:57] Completed 37%
[11:00:43] Completed 38%
[11:02:30] Completed 39%
[11:04:16] Completed 40%
[11:06:02] Completed 41%
[11:07:48] Completed 42%
[11:09:35] Completed 43%
[11:11:21] Completed 44%
[11:13:07] Completed 45%
[11:14:53] Completed 46%
[11:16:39] Completed 47%
[11:18:25] Completed 48%
[11:20:11] Completed 49%
[11:21:23] - Autosending finished units... [February 15 11:21:23 UTC]
[11:21:23] Trying to send all finished work units
[11:21:23] + No unsent completed units remaining.
[11:21:23] - Autosend completed
[11:21:57] Completed 50%
[11:23:43] Completed 51%
[11:25:30] Completed 52%
[11:27:16] Completed 53%
[11:29:02] Completed 54%
[11:30:48] Completed 55%
[11:32:34] Completed 56%
[11:34:20] Completed 57%
[11:36:06] Completed 58%
[11:37:53] Completed 59%
[11:39:39] Completed 60%
[11:41:25] Completed 61%
[11:43:11] Completed 62%
[11:44:57] Completed 63%
[11:46:43] Completed 64%
[11:48:29] Completed 65%
[11:50:15] Completed 66%
[11:52:02] Completed 67%
[11:53:48] Completed 68%
[11:55:34] Completed 69%
[11:57:20] Completed 70%
[11:59:06] Completed 71%
[12:00:52] Completed 72%
[12:02:38] Completed 73%
[12:04:24] Completed 74%
[12:06:10] Completed 75%
[12:07:56] Completed 76%
[12:09:42] Completed 77%
[12:11:29] Completed 78%
[12:13:15] Completed 79%
[12:15:01] Completed 80%
[12:16:47] Completed 81%
[12:18:33] Completed 82%
[12:20:19] Completed 83%
[12:22:05] Completed 84%
[12:23:52] Completed 85%
[12:25:38] Completed 86%
[12:27:24] Completed 87%
[12:29:10] Completed 88%
[12:30:56] Completed 89%
[12:32:42] Completed 90%
[12:34:28] Completed 91%
[12:36:14] Completed 92%
[12:38:01] Completed 93%
[12:39:47] Completed 94%
[12:41:33] Completed 95%
[12:43:19] Completed 96%
[12:45:05] Completed 97%
[12:46:51] Completed 98%
[12:48:37] Completed 99%
[12:50:23] Completed 100%
[12:50:23] Successful run
[12:50:23] DynamicWrapper: Finished Work Unit: sleep=10000
[12:50:33] Reserved 109372 bytes for xtc file; Cosm status=0
[12:50:33] Allocated 109372 bytes for xtc file
[12:50:33] - Reading up to 109372 from "work/wudata_00.xtc": Read 109372
[12:50:33] Read 109372 bytes from xtc file; available packet space=786321092
[12:50:33] xtc file hash check passed.
[12:50:33] Reserved 21912 21912 786321092 bytes for arc file=<work/wudata_00.trr> Cosm status=0
[12:50:33] Allocated 21912 bytes for arc file
[12:50:33] - Reading up to 21912 from "work/wudata_00.trr": Read 21912
[12:50:33] Read 21912 bytes from arc file; available packet space=786299180
[12:50:33] trr file hash check passed.
[12:50:33] Allocated 560 bytes for edr file
[12:50:33] Read bedfile
[12:50:33] edr file hash check passed.
[12:50:33] Logfile not read.
[12:50:33] GuardedRun: success in DynamicWrapper
[12:50:33] GuardedRun: done
[12:50:33] Run: GuardedRun completed.
[12:50:37] + Opened results file
[12:50:37] - Writing 132356 bytes of core data to disk...
[12:50:38] Done: 131844 -> 130780 (compressed to 99.1 percent)
[12:50:38]   ... Done.
[12:50:38] DeleteFrameFiles: successfully deleted file=work/wudata_00.ckp
[12:50:38] Shutting down core 
[12:50:38] 
[12:50:38] Folding@home Core Shutdown: FINISHED_UNIT
[12:50:40] CoreStatus = 64 (100)
[12:50:40] Unit 0 finished with 99 percent of time to deadline remaining.
[12:50:40] Updated performance fraction: 0.989071
[12:50:40] Sending work to server
[12:50:40] Project: 10501 (Run 356, Clone 0, Gen 0)


[12:50:40] + Attempting to send results [February 15 12:50:40 UTC]
[12:50:40] - Reading file work/wuresults_00.dat from core
[12:50:40]   (Read 131292 bytes from disk)
[12:50:40] Connecting to http://171.67.108.21:8080/
[12:50:43] - Couldn't send HTTP request to server
[12:50:43] + Could not connect to Work Server (results)
[12:50:43]     (171.67.108.21:8080)
[12:50:43] + Retrying using alternative port
[12:50:43] Connecting to http://171.67.108.21:80/
[12:51:04] - Couldn't send HTTP request to server
[12:51:04] + Could not connect to Work Server (results)
[12:51:04]     (171.67.108.21:80)
[12:51:04] - Error: Could not transmit unit 00 (completed February 15) to work server.
[12:51:04] - 1 failed uploads of this unit.
[12:51:04]   Keeping unit 00 in queue.
[12:51:04] Trying to send all finished work units
[12:51:04] Project: 10501 (Run 356, Clone 0, Gen 0)


[12:51:04] + Attempting to send results [February 15 12:51:04 UTC]
[12:51:04] - Reading file work/wuresults_00.dat from core
[12:51:04]   (Read 131292 bytes from disk)
[12:51:04] Connecting to http://171.67.108.21:8080/
[12:51:07] Posted data.
[12:51:07] Initial: 0000; - Uploaded at ~43 kB/s
[12:51:07] - Averaged speed for that direction ~28 kB/s
[12:51:07] - Server has already received unit.
[12:51:07] + Sent 0 of 1 completed units to the server
[12:51:07] - Preparing to get new work unit...
[12:51:07] + Attempting to get work packet
[12:51:07] - Will indicate memory of 3070 MB
[12:51:07] - Connecting to assignment server
[12:51:07] Connecting to http://assign-GPU.stanford.edu:8080/
[12:51:07] Posted data.
[12:51:07] Initial: 43AB; - Successful: assigned to (171.67.108.21).
[12:51:07] + News From Folding@Home: Welcome to Folding@Home
[12:51:07] Loaded queue successfully.
[12:51:07] Connecting to http://171.67.108.21:8080/
[12:51:08] Posted data.
[12:51:08] Initial: 0000; - Receiving payload (expected size: 59636)
[12:51:08] Conversation time very short, giving reduced weight in bandwidth avg
[12:51:08] - Downloaded at ~116 kB/s
[12:51:08] - Averaged speed for that direction ~48 kB/s
[12:51:08] + Received work.
[12:51:08] Trying to send all finished work units
[12:51:08] + No unsent completed units remaining.
[12:51:08] + Closed connections
[12:51:08] 
[12:51:08] + Processing work unit
[12:51:08] Core required: FahCore_11.exe
[12:51:08] Core found.
[12:51:08] Working on queue slot 01 [February 15 12:51:08 UTC]
[12:51:08] + Working ...
[12:51:08] - Calling '.\FahCore_11.exe -dir work/ -suffix 01 -priority 96 -checkpoint 15 -verbose -lifeline 63484 -version 620'

[12:51:08] 
[12:51:08] *------------------------------*
[12:51:08] Folding@Home GPU Core
[12:51:08] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[12:51:08] 
[12:51:08] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[12:51:08] Build host: amoeba
[12:51:08] Board Type: Nvidia
[12:51:08] Core      : 
[12:51:08] Preparing to commence simulation
[12:51:08] - Looking at optimizations...
[12:51:08] DeleteFrameFiles: successfully deleted file=work/wudata_01.ckp
[12:51:08] - Created dyn
[12:51:08] - Files status OK
[12:51:08] - Expanded 59124 -> 336763 (decompressed 569.5 percent)
[12:51:08] Called DecompressByteArray: compressed_data_size=59124 data_size=336763, decompressed_data_size=336763 diff=0
[12:51:08] - Digital signature verified
As you can see, the work unit completed successfully but could not be sent because of this weekend's server problems.
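In case anyone else wants to tally up what is stuck before trying anything, here is a minimal Python sketch that simply lists the wuresults_xx.dat files in each client's work folder with their size and last-modified time. The directory list is an assumption based on my C:\GPU0 layout, so adjust it to your own setup; the script only reads the files and changes nothing.

Code:

# Minimal sketch: inventory unsent wuresults_*.dat files across GPU client folders.
# CLIENT_DIRS is an assumption (edit it to match your installs); nothing is modified.
import glob
import os
import time

CLIENT_DIRS = [r"C:\GPU0", r"C:\GPU1"]

for client in CLIENT_DIRS:
    pattern = os.path.join(client, "work", "wuresults_*.dat")
    for path in sorted(glob.glob(pattern)):
        size_kb = os.path.getsize(path) // 1024
        stamp = time.strftime("%Y-%m-%d %H:%M", time.localtime(os.path.getmtime(path)))
        print(f"{path}  {size_kb} KB  last modified {stamp}")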
Horvat
Posts: 43
Joined: Thu Aug 06, 2009 4:07 am
Hardware configuration: Rig1: Asus Z8PE-D12X/Dual Xeon X5675 3.06 Ghz
Rig2: Asus Z8NA-D6/Dual Xeon E5620 2.4 Ghz
Rig3: Asus Z8NA-D6C/Dual Xeon X5670 2.93 Ghz
Rig4: Asus Z8NA-D6C/Dual Xeon E5649 2.53 Ghz

Re: GPU server status 171.67.108.21, 171.64.65.71, 171.67.108.26

Post by Horvat »

I also have two computers, each with one GPU client installed, that were affected by the bug. I am not at home right now, so I do not know whether they have restarted yet. I left them running instead of turning them off yesterday when the servers went down.
Horvat
Posts: 43
Joined: Thu Aug 06, 2009 4:07 am
Hardware configuration: Rig1: Asus Z8PE-D12X/Dual Xeon X5675 3.06 Ghz
Rig2: Asus Z8NA-D6/Dual Xeon E5620 2.4 Ghz
Rig3: Asus Z8NA-D6C/Dual Xeon X5670 2.93 Ghz
Rig4: Asus Z8NA-D6C/Dual Xeon E5649 2.53 Ghz

Re: GPU server status 171.67.108.21, 171.64.65.71, 171.67.108.26

Post by Horvat »

I am still at work; however, I just checked the EOC website and it is showing completed WUs at the 12 pm checkpoint, so I am assuming the GPU clients have started processing WUs again.
toTOW
Site Moderator
Posts: 6359
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: What do we do with all of the unsent workunits?

Post by toTOW »

Did you try qfix to requeue the results and see if they go through?

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
lambdapro
Posts: 16
Joined: Tue Dec 29, 2009 6:20 pm

Re: What do we do with all of the unsent workunits?

Post by lambdapro »

What is qfix?
weedacres
Posts: 138
Joined: Mon Dec 24, 2007 11:18 pm
Hardware configuration: UserNames: weedacres_gpu ...
Location: Eastern Washington

Re: What do we do with all of the unsent workunits?

Post by weedacres »

toTOW wrote: Did you try qfix to requeue the results and see if they go through?
I didn't know that qfix worked on gpu clients. I'll give it a try.
Thanks
weedacres
Posts: 138
Joined: Mon Dec 24, 2007 11:18 pm
Hardware configuration: UserNames: weedacres_gpu ...
Location: Eastern Washington

Re: What do we do with all of the unsent workunits?

Post by weedacres »

I ran qfix 00 and qfix 03, trying out two of the workunits. There were no messages and it went back to the prompt. I tried -send 00 and -send 03 and keep getting the

Code:

 - Warning: Asked to send unfinished unit to server
message.
weedacres
Posts: 138
Joined: Mon Dec 24, 2007 11:18 pm
Hardware configuration: UserNames: weedacres_gpu ...
Location: Eastern Washington

Re: What do we do with all of the unsent workunits?

Post by weedacres »

weedacres wrote: I ran qfix 00 and qfix 03, trying out two of the workunits. There were no messages and it went back to the prompt. I tried -send 00 and -send 03 and keep getting the

Code:

 - Warning: Asked to send unfinished unit to server
message.
If I try -send all, it comes back with

Code:

 + No unsent completed units remaining.
which I know is not the case.
Tobit
Posts: 342
Joined: Thu Apr 17, 2008 2:35 pm
Location: Manchester, NH USA

Re: GPU server status 171.67.108.21, 171.64.65.71, 171.67.108.26

Post by Tobit »

Vijay, there are still some issues going on with 108.21. I can receive work but haven't been able to upload results to 108.21. As a last resort, the client tries to connect to the 108.26 CS, which is currently in FAIL mode. However, instead of timing out after X minutes, the client just sits there trying to connect. I don't have any network issues here on my end, as my uniprocessor and SMP2 clients are working fine and I am able to send/receive from other GPU work servers.

Code:

Launch directory: C:\fah\gpu1
Executable: [email protected]
Arguments: -send all -verbosity 9 

[20:30:31] - Ask before connecting: No
[20:30:31] - User name: Tobit (Team 33)
[20:30:31] - User ID: ***************
[20:30:31] - Machine ID: 3
[20:30:31] 
[20:30:31] Loaded queue successfully.
[20:30:31] Attempting to return result(s) to server...
[20:30:31] Trying to send all finished work units
[20:30:31] Project: 5781 (Run 10, Clone 80, Gen 4)
[20:30:31] - Read packet limit of 540015616... Set to 524286976.

[20:30:31] + Attempting to send results [February 15 20:30:31 UTC]
[20:30:31] - Reading file work/wuresults_01.dat from core
[20:30:31]   (Read 168832 bytes from disk)
[20:30:31] Connecting to http://171.67.108.21:8080/
[20:30:32] - Couldn't send HTTP request to server
[20:30:32] + Could not connect to Work Server (results)
[20:30:32]     (171.67.108.21:8080)
[20:30:32] + Retrying using alternative port
[20:30:32] Connecting to http://171.67.108.21:80/
[20:30:53] - Couldn't send HTTP request to server
[20:30:53] + Could not connect to Work Server (results)
[20:30:53]     (171.67.108.21:80)
[20:30:53] - Error: Could not transmit unit 01 (completed February 14) to work server.
[20:30:53] - 5 failed uploads of this unit.
[20:30:53] - Read packet limit of 540015616... Set to 524286976.

[20:30:53] + Attempting to send results [February 15 20:30:53 UTC]
[20:30:53] - Reading file work/wuresults_01.dat from core
[20:30:53]   (Read 168832 bytes from disk)
[20:30:53] Connecting to http://171.67.108.26:8080/
[20:35:39] ***** Got a SIGTERM signal (2)
[20:35:39] Killing all core threads

Folding@Home Client Shutdown.
checka
Posts: 10
Joined: Mon Feb 18, 2008 3:23 am

Re: GPU server status 171.67.108.21, 171.64.65.71, 171.67.108.26

Post by checka »

I was the one who kicked off the other thread about the server having no record of the work units on Thursday. That started on a machine with only one graphics card, a 9800GT. The CPU client continued to receive and complete work assignments, but I had unaccepted work units going back to Jan 30, and it is still showing four units back to Feb 8 as not recognized as legitimate by the server. I don't run ECC, so I was suspicious that a stray error had crept into my machine. So unless I have a bug in my computer, I don't think the problem is isolated to multi-GPU setups. After my card completed the first new unit, it sent that result back, then started trying to send the previously rejected units and froze. I have restarted folding and the new project is up to 11%, but it is trying to send rejected units 00, 01, 04 and 07 to the server again.
Tobit
Posts: 342
Joined: Thu Apr 17, 2008 2:35 pm
Location: Manchester, NH USA

Re: GPU server status 171.67.108.21, 171.64.65.71, 171.67.108.26

Post by Tobit »

Regarding the log I just posted above, I can now upload to 108.21, but it is reporting that the server has no record of the work unit. Afterwards the client still sits there, never timing out, trying to connect to the 108.26 CS.

Edit: my problem isn't so much that it will not upload; I can wait that out. However, hanging there, never timing out, while it tries to connect to the CS is frustrating.
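In the meantime, a plain TCP connect with a timeout is enough to tell whether the CS will accept a connection at all, without letting the client hang for minutes. This is only a rough probe using the host and ports from the log above; it says nothing about whether an upload would actually succeed.

Code:

# Rough reachability probe: try a TCP connection to the collection server with a
# timeout so it fails fast instead of hanging the way the client does.
import socket

HOST = "171.67.108.26"      # collection server from the log above
PORTS = (8080, 80)          # ports the client normally tries
TIMEOUT_SECONDS = 10

for port in PORTS:
    try:
        with socket.create_connection((HOST, port), timeout=TIMEOUT_SECONDS):
            print(f"{HOST}:{port} accepted a TCP connection")
    except OSError as exc:
        print(f"{HOST}:{port} unreachable: {exc}")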
Pette Broad
Posts: 128
Joined: Mon Dec 03, 2007 9:38 pm
Hardware configuration: CPU folding on only one machine a laptop

GPU Hardware..
3 x 460
1 X 260
4 X 250

+ 1 X 9800GT (3 days a week)
Location: Chester U.K

Re: GPU server status 171.67.108.21, 171.64.65.71, 171.67.108.26

Post by Pette Broad »

Still problems. At least the unit hadn't been received. Let's see what happens next.

Pete

Code:

[21:07:05] Folding@home Core Shutdown: FINISHED_UNIT
[21:07:08] CoreStatus = 64 (100)
[21:07:08] Sending work to server
[21:07:08] Project: 10501 (Run 120, Clone 0, Gen 0)


[21:07:08] + Attempting to send results [February 15 21:07:08 UTC]
[21:07:13] - Server does not have record of this unit. Will try again later.
[21:07:13] - Error: Could not transmit unit 01 (completed February 15) to work server.
[21:07:13]   Keeping unit 01 in queue.
[21:07:13] Project: 10501 (Run 120, Clone 0, Gen 0)


[21:07:13] + Attempting to send results [February 15 21:07:13 UTC]
[21:07:19] - Server does not have record of this unit. Will try again later.
[21:07:19] - Error: Could not transmit unit 01 (completed February 15) to work server.


[21:07:19] + Attempting to send results [February 15 21:07:19 UTC]
[21:07:28] - Server does not have record of this unit. Will try again later.
[21:07:28]   Could not transmit unit 01 to Collection server; keeping in queue.
[21:07:28] - Preparing to get new work unit...
[21:07:28] + Attempting to get work packet
[21:07:28] - Connecting to assignment server
[21:07:29] - Successful: assigned to (171.67.108.21).
[21:07:29] + News From Folding@Home: Welcome to Folding@Home
[21:07:29] Loaded queue successfully.
[21:07:35] Project: 10501 (Run 120, Clone 0, Gen 0)


[21:07:35] + Attempting to send results [February 15 21:07:35 UTC]
[21:07:40] - Server does not have record of this unit. Will try again later.
[21:07:40] - Error: Could not transmit unit 01 (completed February 15) to work server.


[21:07:40] + Attempting to send results [February 15 21:07:40 UTC]
[21:07:47] - Server does not have record of this unit. Will try again later.
[21:07:47]   Could not transmit unit 01 to Collection server; keeping in queue.
[21:07:47] + Closed connections
[21:07:47] 
[21:07:47] + Processing work unit
[21:07:47] Core required: FahCore_11.exe
[21:07:47] Core found.
[21:07:47] Working on queue slot 02 [February 15 21:07:47 UTC]
[21:07:47] + Working ...
[21:07:47] 
[21:07:47] *------------------------------*
[21:07:47] Folding@Home GPU Core
[21:07:47] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[21:07:47] 
[21:07:47] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[21:07:47] Build host: amoeba
[21:07:47] Board Type: Nvidia
[21:07:47] Core      : 
[21:07:47] Preparing to commence simulation
[21:07:47] - Looking at optimizations...
[21:07:47] DeleteFrameFiles: successfully deleted file=work/wudata_02.ckp
[21:07:47] - Created dyn
[21:07:47] - Files status OK
[21:07:48] - Expanded 64952 -> 344387 (decompressed 530.2 percent)
[21:07:48] Called DecompressByteArray: compressed_data_size=64952 data_size=344387, decompressed_data_size=344387 diff=0
[21:07:48] - Digital signature verified
[21:07:48] 
[21:07:48] Project: 5781 (Run 12, Clone 173, Gen 1)
[21:07:48] 
[21:07:48] Assembly optimizations on if available.
[21:07:48] Entering M.D.
[21:07:54] Tpr hash work/wudata_02.tpr:  2878465394 2743608853 2532790532 790850555 768806336
[21:07:54] 
[21:07:54] Calling fah_main args: 14 usage=100
[21:07:54] 
[21:07:55] Working on Great Red Owns Many ACres of Sand
[21:07:58] Client config found, loading data.
[21:07:59] Starting GUI Server
[21:09:54] Completed 1%
[21:11:50] Completed 2%
Anglik666
Posts: 33
Joined: Tue Dec 16, 2008 9:35 pm

Re: What do we do with all of the unsent workunits?

Post by Anglik666 »

You should copy the queue.dat that corresponds with the results files as well.
If you don't have it, try qgen (http://linuxminded.xs4all.nl/mirror/www ... /qgen.html) to generate the queue.dat files; otherwise it won't go through.
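Before experimenting with qfix or qgen it is also worth snapshotting queue.dat together with the whole work folder, so nothing is lost if a requeue goes wrong. A minimal Python sketch, assuming the client directory is C:\GPU0 (adjust the path) and that the client has been shut down first:

Code:

# Minimal backup sketch: copy queue.dat and the work/ folder into a timestamped
# snapshot before touching them with qfix or qgen. Stop the client first so the
# files are not being written. CLIENT_DIR is an assumption; edit it.
import os
import shutil
import time

CLIENT_DIR = r"C:\GPU0"
stamp = time.strftime("%Y%m%d-%H%M%S")
backup_dir = os.path.join(CLIENT_DIR, f"backup-{stamp}")

os.makedirs(backup_dir)
shutil.copy2(os.path.join(CLIENT_DIR, "queue.dat"), backup_dir)
shutil.copytree(os.path.join(CLIENT_DIR, "work"), os.path.join(backup_dir, "work"))
print(f"Snapshot written to {backup_dir}")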
PC1: PII 940, 790GX-P, 4x2GB, HD 6870
PC2: P 9750, 790GX-P, 4x1GB, GTS 250
PC3: 2x O 2427, SM H8DII+-F, 4x2GB ECC
PC4: 2x X e5420, SM X8DLT-I, 6x1GB ECC
bollix47
Posts: 2958
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: What do we do with all of the unsent workunits?

Post by bollix47 »

Neither qfix nor qgen works for this particular problem. qgen has not been updated for the newer queue.dat file format and as a result messes up info like Project/Run/Clone/Gen.

I tried both with no success.
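One rough way to see whether a requeue has scrambled the Project/Run/Clone/Gen info is to pull the PRCG lines the client wrote to its log while the units were running and compare them with what it reports afterwards. A Python sketch along those lines, assuming the v6 console client's log is FAHlog.txt in the client directory (adjust the path):

Code:

# Rough sketch: collect the Project/Run/Clone/Gen lines from the client log so they
# can be compared with what the client reports after qfix/qgen. LOG_PATH is an
# assumption; point it at your own client folder.
import re

LOG_PATH = r"C:\GPU0\FAHlog.txt"
PRCG = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")

seen = set()
with open(LOG_PATH, errors="replace") as log:
    for line in log:
        match = PRCG.search(line)
        if match:
            seen.add(tuple(int(x) for x in match.groups()))

for project, run, clone, gen in sorted(seen):
    print(f"Project {project} (Run {run}, Clone {clone}, Gen {gen})")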
Last edited by bollix47 on Wed Feb 17, 2010 8:37 pm, edited 1 time in total.
Nathan_P
Posts: 1164
Joined: Wed Apr 01, 2009 9:22 pm
Hardware configuration: Asus Z8NA D6C, 2 [email protected] Ghz, , 12gb Ram, GTX 980ti, AX650 PSU, win 10 (daily use)

Asus Z87 WS, Xeon E3-1230L v3, 8gb ram, KFA GTX 1080, EVGA 750ti , AX760 PSU, Mint 18.2 OS

Not currently folding
Asus Z9PE- D8 WS, 2 [email protected] Ghz, 16Gb 1.35v Ram, Ubuntu (Fold only)
Asus Z9PA, 2 Ivy 12 core, 16gb Ram, H folding appliance (fold only)
Location: Jersey, Channel islands

Re: GPU server status 171.67.108.21, 171.64.65.71, 171.67.108.26

Post by Nathan_P »

Tobit wrote: Vijay, there are still some issues going on with 108.21. I can receive work but haven't been able to upload results to 108.21. As a last resort, the client tries to connect to the 108.26 CS, which is currently in FAIL mode. However, instead of timing out after X minutes, the client just sits there trying to connect. I don't have any network issues here on my end, as my uniprocessor and SMP2 clients are working fine and I am able to send/receive from other GPU work servers.

Code:

Launch directory: C:\fah\gpu1
Executable: [email protected]
Arguments: -send all -verbosity 9 

[20:30:31] - Ask before connecting: No
[20:30:31] - User name: Tobit (Team 33)
[20:30:31] - User ID: ***************
[20:30:31] - Machine ID: 3
[20:30:31] 
[20:30:31] Loaded queue successfully.
[20:30:31] Attempting to return result(s) to server...
[20:30:31] Trying to send all finished work units
[20:30:31] Project: 5781 (Run 10, Clone 80, Gen 4)
[20:30:31] - Read packet limit of 540015616... Set to 524286976.

[20:30:31] + Attempting to send results [February 15 20:30:31 UTC]
[20:30:31] - Reading file work/wuresults_01.dat from core
[20:30:31]   (Read 168832 bytes from disk)
[20:30:31] Connecting to http://171.67.108.21:8080/
[20:30:32] - Couldn't send HTTP request to server
[20:30:32] + Could not connect to Work Server (results)
[20:30:32]     (171.67.108.21:8080)
[20:30:32] + Retrying using alternative port
[20:30:32] Connecting to http://171.67.108.21:80/
[20:30:53] - Couldn't send HTTP request to server
[20:30:53] + Could not connect to Work Server (results)
[20:30:53]     (171.67.108.21:80)
[20:30:53] - Error: Could not transmit unit 01 (completed February 14) to work server.
[20:30:53] - 5 failed uploads of this unit.
[20:30:53] - Read packet limit of 540015616... Set to 524286976.

[20:30:53] + Attempting to send results [February 15 20:30:53 UTC]
[20:30:53] - Reading file work/wuresults_01.dat from core
[20:30:53]   (Read 168832 bytes from disk)
[20:30:53] Connecting to http://171.67.108.26:8080/
[20:35:39] ***** Got a SIGTERM signal (2)
[20:35:39] Killing all core threads

Folding@Home Client Shutdown.
I've got the same problem on one of my clients; unfortunately I can't post the log as IE keeps throwing a strop. It's downloaded a new unit but keeps trying to send to the 108.26 CS.