GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26


noprob
Posts: 31
Joined: Sun Mar 09, 2008 2:48 am
Hardware configuration: borgs
Location: mountains of West Virginia U.S.of A.

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Post by noprob »

Looks like there's no need to post a log file, as this seems to be an ongoing issue.
Server in question = 171.64.65.71
On the plus side, this has only happened on one of my four (4) GPU clients.
The question now is whether to let it keep seeking another WU or to turn off the GPU client? :(

*Edit
Received a WU :D (decided to leave the GPU client running)

Code: Select all

[02:03:07] - Attempt #1  to get work failed, and no other work to do.
Waiting before retry.

[02:15:02] - Attempt #8  to get work failed, and no other work to do.
Waiting before retry.

[02:25:46] + Attempting to get work packet
[02:25:46] - Will indicate memory of 1023 MB
[02:25:46] - Connecting to assignment server
[02:25:46] Connecting to http://assign-GPU.stanford.edu:8080/
[02:25:47] Posted data.
[02:25:47] Initial: 40AB; - Successful: assigned to (171.64.65.20).
[02:25:47] + News From Folding@Home: Welcome to Folding@Home
[02:25:47] Loaded queue successfully.
[02:25:47] Connecting to http://171.64.65.20:8080/
[02:25:48] Posted data.
[02:25:48] Initial: 0000; - Receiving payload (expected size: 70724)
[02:25:48] Conversation time very short, giving reduced weight in bandwidth avg
[02:25:48] - Downloaded at ~138 kB/s
[02:25:48] - Averaged speed for that direction ~63 kB/s
[02:25:48] + Received work.
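
The pattern above is the client's normal retry loop: each failed attempt is followed by a longer wait before it polls the assignment server again. A minimal sketch of that behavior in Python (the delay values are placeholders, not the client's real schedule):

Code: Select all

import time

# Placeholder delays only; the client's real schedule is internal.
RETRY_DELAYS_S = [60, 120, 240, 480, 600, 640, 640, 640]

def get_work_with_retry(request_work):
    """Poll the assignment server, waiting longer after each failure,
    mirroring the 'Attempt #N ... Waiting before retry.' lines above."""
    for attempt, delay in enumerate(RETRY_DELAYS_S, start=1):
        work = request_work()
        if work is not None:
            return work                      # '+ Received work.'
        print(f"- Attempt #{attempt} to get work failed, "
              "and no other work to do.\nWaiting before retry.")
        time.sleep(delay)
    return None  # still nothing; the client keeps cycling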
Keep up the good work, thx Joe!
DrSpalding
Posts: 136
Joined: Wed May 27, 2009 4:48 pm
Hardware configuration: Dell Studio 425 MTS-Core i7-920 c0 stock
evga SLI 3x o/c Core i7-920 d0 @ 3.9GHz + nVidia GTX275
Dell 5150 + nVidia 9800GT

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Post by DrSpalding »

Is there any status on the WUs that are still in the state of "could not transmit" but then "server has already received", such as this one, the first I saw happen, on 13 Feb 2010 @ 23:06:03 PST:

Code: Select all

[07:06:01] Folding@home Core Shutdown: FINISHED_UNIT
[07:06:03] CoreStatus = 64 (100)
[07:06:03] Sending work to server
[07:06:03] Project: 3470 (Run 16, Clone 51, Gen 1)
[07:06:03] - Read packet limit of 540015616... Set to 524286976.


[07:06:03] + Attempting to send results [February 14 07:06:03 UTC]
[07:06:04] - Couldn't send HTTP request to server
[07:06:04] + Could not connect to Work Server (results)
[07:06:04]     (171.67.108.21:8080)
[07:06:04] + Retrying using alternative port
[07:06:25] - Couldn't send HTTP request to server
[07:06:25] + Could not connect to Work Server (results)
[07:06:25]     (171.67.108.21:80)
[07:06:25] - Error: Could not transmit unit 02 (completed February 14) to work server.
[07:06:25]   Keeping unit 02 in queue.
[07:06:25] Project: 3470 (Run 16, Clone 51, Gen 1)
[07:06:25] - Read packet limit of 540015616... Set to 524286976.


[07:06:25] + Attempting to send results [February 14 07:06:25 UTC]
[07:06:25] - Server has already received unit.
[07:06:25] - Preparing to get new work unit...
[07:06:25] + Attempting to get work packet
[07:06:25] - Connecting to assignment server
[07:06:26] - Successful: assigned to (171.67.108.21).
[07:06:26] + News From Folding@Home: Welcome to Folding@Home
[07:06:26] Loaded queue successfully.
[07:06:26] + Closed connections
I have 10 such WUs from Feb 14-15 that will expire at some point soon. By my reasoning, they fall into one of these categories:

1. Not uploaded, and the server is broken such that the WUs will simply expire and be reassigned.
2. Not uploaded, and the server is broken such that the WUs are marked as done but are not actually done, leading to a loss of science until it is discovered.
3. Uploaded, and my client did not get notified properly, and I did not receive the credit for them.
4. Uploaded, and my client did not get notified properly, and I received the credit for them.

I think #4 is the least likely as I really didn't see any extra ~3000-4000 points hit my stats. I am still thinking that #2 is the most probable.

Is there any chance that these WUs can be uploaded properly or should I just forget about it and delete my archive of the two GPU clients I have with the 10 WUs?

Here is the full list of project WUs that I have in this state:

Code: Select all

P3470, r16, c51,  g1
P3470, r14, c112, g2
P5781, r22, c982, g3
P5781, r28, c199, g3
P5781, r35, c554, g3
P5781, r6,  c539, g4
P5781, r13, c935, g4
P5781, r15, c764, g3
P5781, r22, c303, g3
P5781, r33, c15,  g3
P5781, r9,  c612, g4
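
For anyone who wants to pull the same list out of their own log, here is a hypothetical helper (not part of the FAH client) that lists every WU whose send attempt was answered with "Server has already received unit.":

Code: Select all

import re

# Hypothetical helper, not a FAH tool: scan a FAHlog for units whose
# upload attempt was answered with "Server has already received unit."
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")

def stuck_units(log_path):
    stuck, last_wu = [], None
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            m = PROJECT_RE.search(line)
            if m:
                last_wu = tuple(int(x) for x in m.groups())
            elif "Server has already received unit." in line and last_wu:
                stuck.append(last_wu)
    return sorted(set(stuck))

for p, r, c, g in stuck_units("FAHlog.txt"):
    print(f"P{p}, r{r}, c{c}, g{g}")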
Last edited by DrSpalding on Tue Feb 23, 2010 8:27 pm, edited 1 time in total.
Not a real doctor, I just play one on the 'net!
ikerekes
Posts: 94
Joined: Thu Nov 13, 2008 4:18 pm
Hardware configuration: q6600 @ 3.3Ghz windows xp-sp3 one SMP2 (2.15 core) + 1 9800GT native GPU2
Athlon x2 6000+ @ 3.0Ghz ubuntu 8.04 smp + asus 9600GSO gpu2 in wine wrapper
5600X2 @ 3.19Ghz ubuntu 8.04 smp + asus 9600GSO gpu2 in wine wrapper
E5200 @ 3.7Ghz ubuntu 8.04 smp2 + asus 9600GT silent gpu2 in wine wrapper
E5200 @ 3.65Ghz ubuntu 8.04 smp2 + asus 9600GSO gpu2 in wine wrapper
E6550 vmware ubuntu 8.4.1
q8400 @ 3.3Ghz windows xp-sp3 one SMP2 (2.15 core) + 1 9800GT native GPU2
Athlon II 620 @ 2.6 Ghz windows xp-sp3 one SMP2 (2.15 core) + 1 9800GT native GPU2
Location: Calgary, Canada

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Post by ikerekes »

I was able to send my completed WU after 45 minutes of idling, but from the logs it is not clear where the WU ended up: on the original server (171.67.108.21) or on the CS (171.67.108.26).
Needless to say, 45 minutes in the life of this card is ages.

Code: Select all

[17:45:33] Folding@home Core Shutdown: FINISHED_UNIT
[17:45:35] CoreStatus = 64 (100)
[17:45:35] Sending work to server
[17:45:35] Project: 5784 (Run 8, Clone 73, Gen 59)
[17:45:35] - Read packet limit of 540015616... Set to 524286976.


[17:45:35] + Attempting to send results [February 23 17:45:35 UTC]
[17:45:40] - Server does not have record of this unit. Will try again later.
[17:45:40] - Error: Could not transmit unit 05 (completed February 23) to work server.
[17:45:40]   Keeping unit 05 in queue.
[17:45:40] Project: 5784 (Run 8, Clone 73, Gen 59)
[17:45:40] - Read packet limit of 540015616... Set to 524286976.


[17:45:40] + Attempting to send results [February 23 17:45:40 UTC]
[17:45:42] - Server does not have record of this unit. Will try again later.
[17:45:42] - Error: Could not transmit unit 05 (completed February 23) to work server.
[17:45:42] - Read packet limit of 540015616... Set to 524286976.


[17:45:42] + Attempting to send results [February 23 17:45:42 UTC]
[18:35:52] + Could not connect to Work Server (results)
[18:35:52]     (171.67.108.26:8080)
[18:35:52] + Retrying using alternative port
[18:35:52] - Couldn't send HTTP request to server
[18:35:52] + Could not connect to Work Server (results)
[18:35:52]     (171.67.108.26:80)
[18:35:52]   Could not transmit unit 05 to Collection server; keeping in queue.
[18:35:52] - Preparing to get new work unit...
[18:35:52] + Attempting to get work packet
[18:35:52] - Connecting to assignment server
[18:35:52] - Successful: assigned to (171.64.65.20).
[18:35:52] + News From Folding@Home: Welcome to Folding@Home
[18:35:52] Loaded queue successfully.
[18:35:53] Project: 5784 (Run 8, Clone 73, Gen 59)
[18:35:53] - Read packet limit of 540015616... Set to 524286976.


[18:35:53] + Attempting to send results [February 23 18:35:53 UTC]
[18:35:56] + Results successfully sent
[18:35:56] Thank you for your contribution to Folding@Home.
[18:35:56] + Number of Units Completed: 344

[18:35:56] + Closed connections
[18:35:56] 
[18:35:56] + Processing work unit
[18:35:56] Core required: FahCore_14.exe
[18:35:56] Core found.
[18:35:56] Working on queue slot 06 [February 23 18:35:56 UTC]
[18:35:56] + Working ...
[18:35:56] 
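
What the log suggests (a sketch, not the actual client code) is a fixed fallback order for results: the work server first, then the collection server, each on port 8080 and then port 80, and if everything fails the WU just stays in the queue for a later cycle:

Code: Select all

import socket

def can_reach(host, port, timeout=20.0):
    """Crude stand-in for the client's HTTP POST of the results."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def send_results(work_server, collection_server):
    # Order inferred from the logs: WS:8080, WS:80, then CS:8080, CS:80.
    for host in (work_server, collection_server):
        for port in (8080, 80):
            if can_reach(host, port):
                return f"sent via {host}:{port}"
    return "kept in queue"  # retried on a later cycle

print(send_results("171.67.108.21", "171.67.108.26"))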
noorman
Posts: 270
Joined: Sun Dec 02, 2007 2:26 pm
Hardware configuration: Folders: Intel C2D E6550 @ 3.150 GHz + GPU XFX 9800GTX+ @ 765 MHZ w. WinXP-GPU
AMD A2X64 3800+ @ stock + GPU XFX 9800GTX+ @ 775 MHZ w. WinXP-GPU
Main rig: an old Athlon Barton 2500+ @2.25 GHz & 2* 512 MB RAM Apacer, Radeon 9800Pro, WinXP SP3+
Location: Belgium, near the International Sea-Port of Antwerp

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Post by noorman »

ikerekes wrote:I was able to send my completed WU after 45 minutes of idling, but from the logs it is not clear where the WU ended up: on the original server (171.67.108.21) or on the CS (171.67.108.26).
Needless to say, 45 minutes in the life of this card is ages.

(full log snipped; see ikerekes' post just above)


Since 171.64.65.20 is the only server to which a connection was successfully made, IMO the results were sent to that server (too).

171.67.108.26 is still very heavily loaded on its network ...

- stopped Linux SMP w. HT on [email protected] GHz
....................................
Folded since 10-06-04 till 09-2010
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Post by bruce »

DrSpalding wrote:Is there any status on the WUs that are still in the state of "could not transmit" but then "server has already received", such as this one, the first I saw happen, on 13 Feb 2010 @ 23:06:03 PST:

(rest of the quoted post snipped; see DrSpalding's post above)
I checked on a few of these. I don't see that there's a single consistent answer.

I cannot find a record of P5781, r9, c612, g4 being credited to you, but it was assigned to someone else on 2010-02-08 19:53:24 PST and completed by them. I'd say it's a #1, though the Mod database was reactivated recently, so there are other reasons why I might not see a record of your WU. Would a reassignment at [03:53] UTC be consistent with either a normal expiration from the original assignment date-time or with some other event in your FAHlog?

Project: 5781, Run 33, Clone 15, Gen 3 / Run 22, Clone 303, Gen 3 / Run 22, Clone 982, Gen 3 - no data back from queries. Could be #1 or #2, depending on how many assignment/timeout cycles it takes to get a result uploaded -- or whether my DB is just incomplete.

Project: 3470, Run 16, Clone 51, Gen 1 reassigned and completed a number of times.
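
One way to run that consistency check, as a sketch: subtract an assumed project timeout from the reassignment time and see whether the implied original assignment lines up with anything in the FAHlog. The 3-day timeout below is a placeholder; actual deadlines are set per project.

Code: Select all

from datetime import datetime, timedelta

# From the post above: the WU was reassigned 2010-02-08 19:53:24 PST.
reassigned = datetime(2010, 2, 8, 19, 53, 24)
# Placeholder timeout; real per-project deadlines differ.
timeout = timedelta(days=3)

implied_assignment = reassigned - timeout
print("consistent if the FAHlog shows the WU assigned near",
      implied_assignment, "PST")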
DrSpalding
Posts: 136
Joined: Wed May 27, 2009 4:48 pm
Hardware configuration: Dell Studio 425 MTS-Core i7-920 c0 stock
evga SLI 3x o/c Core i7-920 d0 @ 3.9GHz + nVidia GTX275
Dell 5150 + nVidia 9800GT

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Post by DrSpalding »

bruce wrote:
(bruce's analysis, quoted in full above, snipped)
Sorry, my log files scrolled out of existence. FAHlog-Prev.txt starts on:
--- Opening Log file [February 13 18:25:26 UTC]

so I have no idea about any event like a reassignment or the like.

I guess I shouldn't worry about these units--they will likely get reassigned anyway, if they are not already done, and the trouble to get them uploaded properly with my wuresults_XX.dat files is probably not worth it. I will keep the files for a while longer but if I don't hear anything back from you or Pande Group about it, I'll just consider them toast and move on from here.

Thanks,

Dan
Not a real doctor, I just play one on the 'net!
VijayPande
Pande Group Member
Posts: 2058
Joined: Fri Nov 30, 2007 6:25 am
Location: Stanford

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Post by VijayPande »

Here's an update. The WS's look to be in pretty good shape, but the CS code still has known issues and bad behavior that Joe is addressing. For what it's worth, it looks like all of this did expose several problems in the code which Joe has now fixed or is fixing, so I think this has hardened it considerably. I hope Joe will have a CS fix shortly (day or two), but it's too early to guarantee an ETA since he's still making sure he understands the failure mode completely.
Prof. Vijay Pande, PhD
Departments of Chemistry, Structural Biology, and Computer Science
Chair, Biophysics
Director, Folding@home Distributed Computing Project
Stanford University
ikerekes
Posts: 94
Joined: Thu Nov 13, 2008 4:18 pm
Hardware configuration: q6600 @ 3.3Ghz windows xp-sp3 one SMP2 (2.15 core) + 1 9800GT native GPU2
Athlon x2 6000+ @ 3.0Ghz ubuntu 8.04 smp + asus 9600GSO gpu2 in wine wrapper
5600X2 @ 3.19Ghz ubuntu 8.04 smp + asus 9600GSO gpu2 in wine wrapper
E5200 @ 3.7Ghz ubuntu 8.04 smp2 + asus 9600GT silent gpu2 in wine wrapper
E5200 @ 3.65Ghz ubuntu 8.04 smp2 + asus 9600GSO gpu2 in wine wrapper
E6550 vmware ubuntu 8.4.1
q8400 @ 3.3Ghz windows xp-sp3 one SMP2 (2.15 core) + 1 9800GT native GPU2
Athlon II 620 @ 2.6 Ghz windows xp-sp3 one SMP2 (2.15 core) + 1 9800GT native GPU2
Location: Calgary, Canada

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Post by ikerekes »

VijayPande wrote: (update snipped; quoted in full above)
Thank you for the update Prof. Pande.

I have 7 GPU clients running, and today only two clients hung for 45 minutes, both on 171.67.108.21 (two hours after the WS issued the work unit, it didn't have a record of it; one of the logs is just 3 posts ahead of this one).
I wouldn't call that pretty good shape, but it's definitely better than it was 10 days ago :P

Here's hoping the last wrinkles get ironed out so we can all get back to contributing to the science.
lambdapro
Posts: 16
Joined: Tue Dec 29, 2009 6:20 pm

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Post by lambdapro »

My nvidias have been running better of late, but I still get stuck now and then. Here is something on .71 now.

Code: Select all

Waiting before retry.
[02:54:54] + Attempting to get work packet
[02:54:54] - Connecting to assignment server
[02:54:54] - Successful: assigned to (171.64.65.71).
[02:54:54] + News From Folding@Home: Welcome to Folding@Home
[02:54:55] Loaded queue successfully.
[02:54:55] - Couldn't send HTTP request to server
[02:54:55] + Could not connect to Work Server
[02:54:55] - Attempt #8 to get work failed, and no other work to do.
Waiting before retry.
[03:05:38] + Attempting to get work packet
[03:05:38] - Connecting to assignment server
[03:05:39] - Successful: assigned to (171.64.65.71).
[03:05:39] + News From Folding@Home: Welcome to Folding@Home
[03:05:39] Loaded queue successfully.
[03:05:39] - Couldn't send HTTP request to server
[03:05:39] + Could not connect to Work Server
[03:05:39] - Attempt #9 to get work failed, and no other work to do.
Waiting before retry.

I'm shutting down for a while. I'll check back every month to see if it gets straightened out.

David
SnW
Posts: 12
Joined: Sun Feb 14, 2010 4:21 pm

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Post by SnW »

Just wanted to thank the guys for fixing this, much appreciated :D
VijayPande
Pande Group Member
Posts: 2058
Joined: Fri Nov 30, 2007 6:25 am
Location: Stanford

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Post by VijayPande »

BTW, we've taken down the 171.67.108.26 vsp09a CS until we can get that code fixed. Right now, it looks like it isn't helping, but rather hurting clients (delays them but doesn't take back their WU's). Joe is working on it and will put it back on line when it's working.
Prof. Vijay Pande, PhD
Departments of Chemistry, Structural Biology, and Computer Science
Chair, Biophysics
Director, Folding@home Distributed Computing Project
Stanford University
VijayPande
Pande Group Member
Posts: 2058
Joined: Fri Nov 30, 2007 6:25 am
Location: Stanford

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Post by VijayPande »

ikerekes wrote:
(ikerekes' reply, quoted in full above, snipped)
Thanks, this sounds like progress. It sounds like you're not having problems with the WS but only the CS (.21 is a CS). This and other reports made me decide to take down the CS until we can get it working. Since it's not helping and only slowing down clients, I think we're better off this way until Joe fixes the CS. Moreover, the new WS code talks actively to the CS, so CS problems hurt the WS. Taking down the CS should help the WS's.

The upshot is that (hopefully) the WS's are in reasonable shape. I guess we'll see if that's true over the next day or so.
Prof. Vijay Pande, PhD
Departments of Chemistry, Structural Biology, and Computer Science
Chair, Biophysics
Director, Folding@home Distributed Computing Project
Stanford University
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Post by PantherX »

Thanks, Dr. Pande, for these updates. My GPU client is working much better than before, but I noticed something weird in the F@H log: on the first few attempts to upload the completed WU it says "server has no record...", yet later it successfully uploads the WU. What does this mean?

Thanks

Code: Select all

[20:26:06] Completed 90%
[20:29:21] Completed 91%
[20:32:28] Completed 92%
[20:35:29] Completed 93%
[20:38:29] Completed 94%
[20:41:32] Completed 95%
[20:44:40] Completed 96%
[20:47:40] Completed 97%
[20:50:42] Completed 98%
[20:53:53] Completed 99%
[20:57:02] Completed 100%
[20:57:02] Successful run
[20:57:02] DynamicWrapper: Finished Work Unit: sleep=10000
[20:57:12] Reserved 146032 bytes for xtc file; Cosm status=0
[20:57:12] Allocated 146032 bytes for xtc file
[20:57:12] - Reading up to 146032 from "work/wudata_06.xtc": Read 146032
[20:57:12] Read 146032 bytes from xtc file; available packet space=786284432
[20:57:12] xtc file hash check passed.
[20:57:12] Reserved 22272 22272 786284432 bytes for arc file=<work/wudata_06.trr> Cosm status=0
[20:57:12] Allocated 22272 bytes for arc file
[20:57:12] - Reading up to 22272 from "work/wudata_06.trr": Read 22272
[20:57:12] Read 22272 bytes from arc file; available packet space=786262160
[20:57:12] trr file hash check passed.
[20:57:12] Allocated 560 bytes for edr file
[20:57:12] Read bedfile
[20:57:12] edr file hash check passed.
[20:57:12] Logfile not read.
[20:57:12] GuardedRun: success in DynamicWrapper
[20:57:12] GuardedRun: done
[20:57:12] Run: GuardedRun completed.
[20:57:13] + Opened results file
[20:57:13] - Writing 169376 bytes of core data to disk...
[20:57:13] Done: 168864 -> 167392 (compressed to 99.1 percent)
[20:57:13]   ... Done.
[20:57:13] DeleteFrameFiles: successfully deleted file=work/wudata_06.ckp
[20:57:13] Shutting down core 
[20:57:13] 
[20:57:13] Folding@home Core Shutdown: FINISHED_UNIT
[20:57:18] CoreStatus = 64 (100)
[20:57:18] Sending work to server
[20:57:18] Project: 5782 (Run 1, Clone 97, Gen 29)
[20:57:18] - Read packet limit of 540015616... Set to 524286976.


[20:57:18] + Attempting to send results [February 23 20:57:18 UTC]
[20:59:52] - Server does not have record of this unit. Will try again later.
[20:59:52] - Error: Could not transmit unit 06 (completed February 23) to work server.
[20:59:52]   Keeping unit 06 in queue.
[20:59:52] Project: 5782 (Run 1, Clone 97, Gen 29)
[20:59:52] - Read packet limit of 540015616... Set to 524286976.


[20:59:52] + Attempting to send results [February 23 20:59:52 UTC]
[21:02:42] - Server does not have record of this unit. Will try again later.
[21:02:42] - Error: Could not transmit unit 06 (completed February 23) to work server.
[21:02:42] - Read packet limit of 540015616... Set to 524286976.


[21:02:42] + Attempting to send results [February 23 21:02:42 UTC]
[21:15:58] + Could not connect to Work Server (results)
[21:15:58]     (171.67.108.26:8080)
[21:15:58] + Retrying using alternative port
[21:15:59] - Couldn't send HTTP request to server
[21:15:59] + Could not connect to Work Server (results)
[21:15:59]     (171.67.108.26:80)
[21:15:59]   Could not transmit unit 06 to Collection server; keeping in queue.
[21:15:59] - Preparing to get new work unit...
[21:15:59] + Attempting to get work packet
[21:15:59] - Connecting to assignment server
[21:16:11] + Could not connect to Assignment Server
[21:16:13] - Successful: assigned to (171.67.108.11).
[21:16:13] + News From Folding@Home: Welcome to Folding@Home
[21:16:13] Loaded queue successfully.
[21:16:16] Project: 5782 (Run 1, Clone 97, Gen 29)
[21:16:16] - Read packet limit of 540015616... Set to 524286976.


[21:16:16] + Attempting to send results [February 23 21:16:16 UTC]
[21:16:45] + Results successfully sent
[21:16:45] Thank you for your contribution to Folding@Home.
[21:16:45] + Number of Units Completed: 432

[21:16:45] + Closed connections
[21:16:45] 
[21:16:45] + Processing work unit
[21:16:45] Core required: FahCore_11.exe
[21:16:45] Core found.
[21:16:45] Working on queue slot 07 [February 23 21:16:45 UTC]
[21:16:45] + Working ...
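
Side note on the numbers in that log: the upload payload appears to be just the three result files the core packed, which you can check from the sizes it printed:

Code: Select all

# Sizes printed in the log above
xtc, arc, edr = 146032, 22272, 560
total = xtc + arc + edr
print(total)           # 168864, matching "Done: 168864 -> 167392"
print(167392 / total)  # ~0.991, the "compressed to 99.1 percent"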
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Post by PantherX »

DrSpalding wrote: 2. Not uploaded, and the server is broken such that the WUs are marked as done but are not actually done, leading to a loss of science until it is discovered.
Can somebody from the Pande Group tell us whether the above statement is true and, if it is, whether they are working on a solution for it?
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
Nathan_P
Posts: 1164
Joined: Wed Apr 01, 2009 9:22 pm
Hardware configuration: Asus Z8NA D6C, 2 [email protected] Ghz, , 12gb Ram, GTX 980ti, AX650 PSU, win 10 (daily use)

Asus Z87 WS, Xeon E3-1230L v3, 8gb ram, KFA GTX 1080, EVGA 750ti , AX760 PSU, Mint 18.2 OS

Not currently folding
Asus Z9PE- D8 WS, 2 [email protected] Ghz, 16Gb 1.35v Ram, Ubuntu (Fold only)
Asus Z9PA, 2 Ivy 12 core, 16gb Ram, H folding appliance (fold only)
Location: Jersey, Channel islands

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Post by Nathan_P »

VijayPande wrote:Here's an update. The WS's look to be in pretty good shape, but the CS code still has known issues and bad behavior that Joe is addressing. For what it's worth, it looks like all of this did expose several problems in the code which Joe has now fixed or is fixing, so I think this has hardened it considerably. I hope Joe will have a CS fix shortly (day or two), but it's too early to guarantee an ETA since he's still making sure he understands the failure mode completely.
Vijay, thanks for the update and all the hard work that is going into the fix for these problems. :D