171.64.65.64 overloaded

Moderators: Site Moderators, FAHC Science Team

jclu52
Posts: 19
Joined: Sat May 09, 2009 1:16 am

Re: 171.64.65.64 overloaded

Post by jclu52 »

Oh, well, it's rejecting again.

Code:

Sat May 21 08:45:10 PDT 2011	171.64.65.64	GPU	vspg2v	lin5	full	Reject	2.25	0	0	2	17883	3101	5	0	113254	113254	113254	-	-	10	-	0	0	-	-	 1	171.64.122.86
171.67.108.25
-	0	 0	W;	100	6.119	-	49	64	-	-	; , 3	F	8080G	-	-	-	-	0	lin5	1	vspg2v	
Sat May 21 09:20:10 PDT 2011	171.64.65.64	GPU	vspg2v	lin5	full	Reject	2.16	0	0	2	17883	3101	5	0	113254	113254	113254	-	-	-	-	0	0	-	-	 1	171.64.122.86
171.67.108.25
-	0	 0	W;	100	6.119	-	49	64	-	-	; , 3	F	8080G	-	-	-	-	0	lin5	1	vspg2v	
Sat May 21 09:55:10 PDT 2011	171.64.65.64	GPU	vspg2v	lin5	full	Reject	2.31	0	0	2	17883	3100	4	0	113254	113254	113254	-	-	-	-	0	0	-	-	 1	171.64.122.86
171.67.108.25
-	0	 0	W;	100	6.119	-	49	64	-	-	; , 3	F	8080G	-	-	-	-	0	lin5	1	vspg2v
Xavier Zepherious
Posts: 140
Joined: Fri Jan 21, 2011 8:02 am

Re: 171.64.65.64 overloaded

Post by Xavier Zepherious »

It seems I can get a new WU if I shut down the client and restart it, but it hangs again on resending once the new unit completes.
So I have to monitor the client manually, shutting it down and restarting it after each completed unit.

I don't want to kill the completed WU (the one it's failing to send),
but in order to leave the client unmonitored I'd have to remove or kill it.

PS: Will someone respond to the users posting on this issue? We would like to be kept up to date on what is going on.
icspotz
Posts: 2
Joined: Fri Apr 22, 2011 5:38 am

Re: Project 10720: Unable to send results

Post by icspotz »

I have a similar problem with a different project, #6801, with 16 failed upload attempts.

Fah log file:

Code:

[15:31:34] Completed 49499999 out of 50000000 steps (99%).
[15:33:33] Completed 49999999 out of 50000000 steps (100%).
[15:33:34] Finished fah_main
[15:33:34]
[15:33:34] Successful run
[15:33:34] DynamicWrapper: Finished Work Unit: sleep=10000
[15:33:43] Reserved 2471344 bytes for xtc file; Cosm status=0
[15:33:43] Allocated 2471344 bytes for xtc file
[15:33:43] - Reading up to 2471344 from "work/wudata_01.xtc": Read 2471344
[15:33:43] Read 2471344 bytes from xtc file; available packet space=783959120
[15:33:43] xtc file hash check passed.
[15:33:43] Reserved 76680 76680 783959120 bytes for arc file=<work/wudata_01.trr> Cosm status=0
[15:33:43] Allocated 76680 bytes for arc file
[15:33:43] - Reading up to 76680 from "work/wudata_01.trr": Read 76680
[15:33:43] Read 76680 bytes from arc file; available packet space=783882440
[15:33:43] trr file hash check passed.
[15:33:43] Allocated 544 bytes for edr file
[15:33:43] Read bedfile
[15:33:43] edr file hash check passed.
[15:33:43] Allocated 120324 bytes for logfile
[15:33:43] Read logfile
[15:33:43] GuardedRun: success in DynamicWrapper
[15:33:43] GuardedRun: done
[15:33:43] Run: GuardedRun completed.
[15:33:45] + Opened results file
[15:33:45] - Writing 2669404 bytes of core data to disk...
[15:33:46] Done: 2668892 -> 2511208 (compressed to 94.0 percent)
[15:33:46] ... Done.
[15:33:46] DeleteFrameFiles: successfully deleted file=work/wudata_01.ckp
[15:33:47] Shutting down core
[15:33:47]
[15:33:47] Folding@home Core Shutdown: FINISHED_UNIT
[15:33:51] CoreStatus = 64 (100)
[15:33:51] Unit 1 finished with 97 percent of time to deadline remaining.
[15:33:51] Updated performance fraction: 0.971493
[15:33:51] Sending work to server
[15:33:51] Project: 6801 (Run 8739, Clone 1, Gen 13)
[15:33:51] - Read packet limit of 540015616... Set to 524286976.


[15:33:51] + Attempting to send results [May 21 15:33:51 UTC]
[15:33:51] - Reading file work/wuresults_01.dat from core
[15:33:51] (Read 2511720 bytes from disk)
[15:33:51] Gpu type=3 species=30.
[15:33:51] Connecting to http://171.64.65.64:8080/
[15:33:52] - Couldn't send HTTP request to server
[15:33:52] + Could not connect to Work Server (results)
[15:33:52] (171.64.65.64:8080)
[15:33:52] + Retrying using alternative port
[15:33:52] Connecting to http://171.64.65.64:80/
[15:33:53] - Couldn't send HTTP request to server
[15:33:53] + Could not connect to Work Server (results)
[15:33:53] (171.64.65.64:80)
[15:33:53] - Error: Could not transmit unit 01 (completed May 21) to work server.
[15:33:53] - 1 failed uploads of this unit.
[15:33:53] Keeping unit 01 in queue.
[15:33:53] Trying to send all finished work units
[15:33:53] Project: 6801 (Run 8739, Clone 1, Gen 13)
[15:33:53] - Read packet limit of 540015616... Set to 524286976.
Mod Edit: Added Code Tags - PantherX
Last edited by icspotz on Sat May 21, 2011 7:01 pm, edited 1 time in total.
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona
Contact:

Re: 171.64.65.64 overloaded

Post by 7im »

While the IT department does work weekends, the PR department doesn't. We may not see updates posted until Monday morning, although I would expect the server to be working again before then, if possible. ;)
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
jclu52
Posts: 19
Joined: Sat May 09, 2009 1:16 am

Re: 171.64.65.64 overloaded

Post by jclu52 »

Xavier Zepherious wrote:PS: Will someone respond to the users posting on this issue? We would like to be kept up to date on what is going on.
The Server Status Page

Server log for 171.64.65.64
Xavier Zepherious wrote:seems I can get new WU if I shutdown the client and re-start it - but it hangs again on resending when it completes
so..I have to monitor manually the client - shutdown/restarting it after each complete unit

I don't want to kill the completed WU (which it's failing on - can't send)
but in order to leave it unmonitored I have to remove it or kill it
Like you said, you don't have to kill the completed WU. Just shut down and restart the GPU client / SMP client. The WU will be kept in the queue to be submitted later. Like 7im said:
7im wrote:While the IT department does work weekends, the PR department doesn't. We may not see updates posted until Monday morning, although I would expect the server to be working again before then, if possible. ;)
We just have to wait a bit. :D
yslin
Pande Group Member
Posts: 196
Joined: Tue Sep 22, 2009 9:11 pm

Re: 171.64.65.64 overloaded

Post by yslin »

Hi,

I've been working on this server, but it might take more time to fix. Sorry for the inconvenience!


yslin
GreyWhiskers
Posts: 660
Joined: Mon Oct 25, 2010 5:57 am
Hardware configuration: a) Main unit
Sandybridge in HAF922 w/200 mm side fan
--i7 [email protected] GHz
--ASUS P8P67 DeluxeB3
--4GB ADATA 1600 RAM
--750W Corsair PS
--2Seagate Hyb 750&500 GB--WD Caviar Black 1TB
--EVGA 660GTX-Ti FTW - Signature 2 GPU@ 1241 Boost
--MSI GTX560Ti @900MHz
--Win7Home64; FAH V7.3.2; 327.23 drivers

b) 2004 HP a475c desktop, 1 core Pent 4 [email protected] GHz; Mem 2GB;HDD 160 GB;Zotac GT430PCI@900 MHz
WinXP SP3-32 FAH v7.3.6 301.42 drivers - GPU slot only

c) 2005 Toshiba M45-S551 laptop w/2 GB mem, 160GB HDD;Pent M 740 CPU @ 1.73 GHz
WinXP SP3-32 FAH v7.3.6 [Receiving Core A4 work units]
d) 2011 lappy-15.6"-1920x1080;i7-2860QM,2.5;IC Diamond Thermal Compound;GTX 560M 1,536MB u/c@700;16GB-1333MHz RAM;HDD:500GBHyb w/ 4GB SSD;Win7HomePrem64;320.18 drivers FAH 7.4.2ß
Location: Saratoga, California USA

Re: 171.64.65.64 overloaded

Post by GreyWhiskers »

jclu52 wrote: Like you said, you don't have to kill the completed WU. Just shutdown then start the GPU client / SMP client. The WU will be kept in the queue to be submitted.
However, comma, if you are running v6, there is a round-robin or circular queue 10 WUs long where results are kept. In my own case, I would process a p6801 WU from 171.64.65.64 every 2.2 hours, giving ~22 hours before the circular queue got back to an item.

I'm observing that since WUs aren't available from 171.64.65.64 for the Fermi, the assignment server is sending my GPU client to 171.67.108.32, which is serving up a series of 109xx and 112xx WUs. My GPU processes these in about 1.2 hours, giving the v6 circular queue about 12 hours to wrap around.

When 171.64.65.64 went down, I was processing a p6801 in queue slot 9 that couldn't be uploaded. Every time the client completes a new WU, it also tries to get rid of the old p6801. I'm now processing p109xx or p112xx from queue slot 2, so there are still 7 hours or so until the circular queue wraps around and this WU is overwritten (I think). The bad news, if 171.64.65.64 isn't accepting uploads by then, is that Stanford may lose the results from my computations, and I may lose the 1,348 points for the p6801 WU. The good news for my particular GTX 560 Ti card is that the smaller WUs seem to be more productive on my hardware (~19k PPD vs ~14,378 PPD).

v7 does better than v6 in that it will keep a pending WU upload indefinitely until it can upload. The reports that I and many others have made about stuck v7 uploads were related to uploading partial results from EUEs, not full results from successful runs.
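That "keep retrying until it can upload" behavior can be sketched as a simple backoff loop. This is a hypothetical illustration only; the function names and backoff policy here are invented and are not the actual v7 client code:

```python
import time

def send_until_accepted(upload, wait=1, max_wait=3600, sleep=time.sleep):
    """Retry upload() until it returns True; back off between tries,
    but never drop the pending work unit."""
    while not upload():
        sleep(wait)                     # wait before the next attempt
        wait = min(wait * 2, max_wait)  # double the interval, capped

# Simulate a work server that rejects the first three attempts, then accepts.
attempts = []
def fake_upload():
    attempts.append(1)
    return len(attempts) > 3

send_until_accepted(fake_upload, sleep=lambda s: None)  # skip real sleeps
print(len(attempts))  # 4: three failures, then success
```

The key point matching GreyWhiskers' description is that the loop has no attempt limit, so a finished result is held until the server finally accepts it.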
Last edited by GreyWhiskers on Sat May 21, 2011 11:39 pm, edited 1 time in total.
jclu52
Posts: 19
Joined: Sat May 09, 2009 1:16 am

Re: 171.64.65.64 overloaded

Post by jclu52 »

GreyWhiskers wrote:However, comma, if you are running v6, there is a round-robin or circular queue 10 WUs long where results are kept. In my own case, I would process a p6801 WU from 171.64.65.64 every 2.2 hours, giving ~22 hours before the circular queue got back to an item.
Thanks for explaining it in such great detail. It really helps to understand how it works and how to deal with problems when they happen. :wink:

I have not spent as much time learning about F@H as I wanted. In fact, that's one thing about F@H I am having trouble with. There is so much information, but I don't know what to look for to get a better grasp of F@H. When using tools like HFM.NET and FahSpy, I am not sure what I am looking at when viewing the logs, the benchmarks, etc. All I've been able to do is install the GPU systray client and the SMP client, run them (in service mode), and configure them properly.

I am interested in learning more about F@H but can't seem to locate a centralized / authoritative source of information. :(
l67swap
Posts: 4
Joined: Fri Nov 05, 2010 9:32 pm

Re: 171.64.65.64 overloaded

Post by l67swap »

So I guess we just keep watching the server status page to see when the server will be live again?
VijayPande
Pande Group Member
Posts: 2058
Joined: Fri Nov 30, 2007 6:25 am
Location: Stanford

Re: 171.64.65.64 overloaded

Post by VijayPande »

It's still having problems, so we're doing a hard reboot. The machine will likely fsck for a while. We'll give you an update when we know more.
Prof. Vijay Pande, PhD
Departments of Chemistry, Structural Biology, and Computer Science
Chair, Biophysics
Director, Folding@home Distributed Computing Project
Stanford University
ChrisM101
Posts: 12
Joined: Tue Mar 22, 2011 4:06 pm

Re: Project 10720: Unable to send results

Post by ChrisM101 »

Same issue here today... 6801 not uploading has me backed up and not earning points.

Code:

--- Opening Log file [May 21 23:46:49 UTC] 


# Windows GPU Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.30r1

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: D:\Downloads\FAH_GPU_Tracker_V2\FAH GPU Tracker V2\GPU0
Executable: D:\Downloads\FAH_GPU_Tracker_V2\FAH GPU Tracker V2\FAH_GPU3.exe
Arguments: -oneunit -forcegpu nvidia_fermi -advmethods -verbosity 9 -gpu 0 

[23:46:49] - Ask before connecting: No
[23:46:49] - User name: ChrisM101 (Team 111065)
[23:46:49] - User ID: 21BE43D7336DE836
[23:46:49] - Machine ID: 3
[23:46:49] 
[23:46:49] Gpu type=3 species=30.
[23:46:49] Loaded queue successfully.
[23:46:49] - Preparing to get new work unit...
[23:46:49] Cleaning up work directory
[23:46:49] - Autosending finished units... [May 21 23:46:49 UTC]
[23:46:49] Trying to send all finished work units
[23:46:49] Project: 6801 (Run 2181, Clone 4, Gen 14)
[23:46:49] - Read packet limit of 540015616... [23:46:49] + Attempting to get work packet
Set to 524286976.
[23:46:49] Passkey found
[23:46:49] - Will indicate memory of 6135 MB


[23:46:49] Gpu type=3 species=30.
[23:46:49] + Attempting to send results [May 21 23:46:49 UTC]
[23:46:49] - Detect CPU.[23:46:49] - Reading file work/wuresults_01.dat from core
 Vendor: GenuineIntel, Family: 6, Model: 10, Stepping: 5
[23:46:49] - Connecting to assignment server
[23:46:49] Connecting to http://assign-GPU.stanford.edu:8080/
[23:46:49]   (Read 2509828 bytes from disk)
[23:46:49] Gpu type=3 species=30.
[23:46:49] Connecting to http://171.64.65.64:8080/
[23:46:50] Posted data.
[23:46:50] Initial: 43AB; - Successful: assigned to (171.67.108.32).
[23:46:50] + News From Folding@Home: Welcome to Folding@Home
[23:46:50] Loaded queue successfully.
[23:46:50] Gpu type=3 species=30.
[23:46:50] Sent data
[23:46:50] Connecting to http://171.67.108.32:8080/
[23:46:50] Posted data.
[23:46:50] Initial: 0000; - Receiving payload (expected size: 20648)
[23:46:50] Conversation time very short, giving reduced weight in bandwidth avg
[23:46:50] - Downloaded at ~40 kB/s
[23:46:50] - Averaged speed for that direction ~41 kB/s
[23:46:50] + Received work.
[23:46:50] + Closed connections
[23:46:50] 
[23:46:50] + Processing work unit
[23:46:50] Core required: FahCore_15.exe
[23:46:50] Core found.
[23:46:50] Working on queue slot 03 [May 21 23:46:50 UTC]
[23:46:50] + Working ...
[23:46:50] - Calling '.\FahCore_15.exe -dir work/ -suffix 03 -nice 19 -priority 96 -nocpulock -checkpoint 3 -verbose -lifeline 5992 -version 630'

[23:46:50] 
[23:46:50] *------------------------------*
[23:46:50] Folding@Home GPU Core
[23:46:50] Version 2.15 (Tue Nov 16 09:05:18 PST 2010)
[23:46:50] 
[23:46:50] Build host: SimbiosNvdWin7
[23:46:50] Board Type: NVIDIA/CUDA
[23:46:50] Core      : x=15
[23:46:50]  Window's signal control handler registered.
[23:46:50] Preparing to commence simulation
[23:46:50] - Looking at optimizations...
[23:46:50] DeleteFrameFiles: successfully deleted file=work/wudata_03.ckp
[23:46:51] - Couldn't send HTTP request to server
[23:46:51] + Could not connect to Work Server (results)
[23:46:51]     (171.64.65.64:8080)
[23:46:51] + Retrying using alternative port
[23:46:51] Connecting to http://171.64.65.64:80/
[23:46:51] - Created dyn
[23:46:51] - Files status OK
[23:46:51] sizeof(CORE_PACKET_HDR) = 512 file=<>
[23:46:51] - Expanded 20136 -> 77539 (decompressed 385.0 percent)
[23:46:51] Called DecompressByteArray: compressed_data_size=20136 data_size=77539, decompressed_data_size=77539 diff=0
[23:46:51] - Digital signature verified
[23:46:51] 
[23:46:51] Project: 10950 (Run 0, Clone 68, Gen 18)
[23:46:51] 
[23:46:51] Assembly optimizations on if available.
[23:46:51] Entering M.D.
[23:46:52] - Couldn't send HTTP request to server
[23:46:52] + Could not connect to Work Server (results)
[23:46:52]     (171.64.65.64:80)
[23:46:52] - Error: Could not transmit unit 01 (completed May 21) to work server.
[23:46:52] - 21 failed uploads of this unit.
[23:46:52] - Read packet limit of 540015616... Set to 524286976.
k1wi
Posts: 909
Joined: Tue Sep 22, 2009 10:48 pm

Re: Project 10720: Unable to send results

Post by k1wi »

ChrisM101 and icspotz - please refer to this thread:

http://foldingforum.org/viewtopic.php?f=18&t=18681

Edit by Mod: Posts moved to the correct topic.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 171.64.65.64 overloaded

Post by bruce »

GreyWhiskers wrote:However, comma, if you are running v6, there is a round-robin or circular queue 10 WUs long where results are kept. In my own case, I would process a p6801 WU from 171.64.65.64 every 2.2 hours, giving ~22 hours before the circular queue got back to an item.
However, comma, ;) that doesn't matter. The queue will get full if you have a total of 10 WUs either processing or waiting to upload. The V6 client is perfectly happy looping through the 9 open positions when one WU is stuck uploading. It will skip a queue position that happens to be in use and does not reassign those positions unless you accumulate 10 WUs that are all waiting to upload (and that's not going to happen).
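bruce's skip-in-use behavior can be sketched roughly like this. This is a hypothetical illustration in Python, not the actual v6 client source; the slot bookkeeping is invented for the example:

```python
# Sketch of a 10-slot circular work queue that skips occupied slots,
# as bruce describes: one WU stuck uploading holds its slot, and the
# client happily round-robins through the remaining 9 positions.
QUEUE_SIZE = 10

class WorkQueue:
    def __init__(self):
        self.occupied = [False] * QUEUE_SIZE
        self.next_slot = 0

    def acquire_slot(self):
        """Return the next free slot, skipping slots still in use."""
        for i in range(QUEUE_SIZE):
            slot = (self.next_slot + i) % QUEUE_SIZE
            if not self.occupied[slot]:
                self.occupied[slot] = True
                self.next_slot = (slot + 1) % QUEUE_SIZE
                return slot
        return None  # all 10 slots waiting to upload: queue truly full

    def release_slot(self, slot):
        """Free a slot once its results have been uploaded."""
        self.occupied[slot] = False

q = WorkQueue()
stuck = q.acquire_slot()        # WU that can't be sent holds slot 0
for _ in range(15):             # keep folding new WUs meanwhile
    s = q.acquire_slot()
    q.release_slot(s)           # these upload fine and free their slots
print(stuck, q.occupied[stuck]) # 0 True -- never skipped over or reused
```

The stuck slot is only ever reclaimed by an explicit release, so results waiting to upload are not overwritten no matter how many times the queue wraps, which is why the overwrite worry above doesn't apply.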
GreyWhiskers
Posts: 660
Joined: Mon Oct 25, 2010 5:57 am
Location: Saratoga, California USA

Re: 171.64.65.64 overloaded

Post by GreyWhiskers »

Thanks for the update. I wasn't aware that "used" queue positions were skipped. :oops: :oops:
jclu52
Posts: 19
Joined: Sat May 09, 2009 1:16 am

Re: 171.64.65.64 overloaded

Post by jclu52 »

bruce wrote:However, comma, ;) that doesn't matter. The queue will get full if you have a total of 10 WUs either processing or waiting to upload. The V6 client is perfectly happy looping through the 9 open positions when one WU is stuck uploading. It will skip a queue position that happens to be in use and does not reassign those positions unless you accumulate 10 WUs that are all waiting to upload (and that's not going to happen).
Great info!! Thanks, bruce! :biggrin: