128.252.203.10 Losing Work Units?

Moderators: Site Moderators, FAHC Science Team

rusty
Posts: 17
Joined: Sun Mar 15, 2020 9:00 pm

128.252.203.10 Losing Work Units?

Post by rusty »

Hello,

I observed the following a few hours ago and thought I would report it.

Log for WU [Project:11759 (Run 0, Clone 5086, Gen 42)]
(Upload Successful. Final credit estimate, 132696.00 points)

14:36:48:WU00:FS01:0x22:Completed 1000000 out of 1000000 steps (100%)
14:36:50:WU00:FS01:0x22:Saving result file ../logfile_01.txt
14:36:50:WU00:FS01:0x22:Saving result file checkpointState.xml
14:36:50:WU00:FS01:0x22:Saving result file checkpt.crc
14:36:50:WU00:FS01:0x22:Saving result file positions.xtc
14:36:50:WU00:FS01:0x22:Saving result file science.log
14:36:50:WU00:FS01:0x22:Folding@home Core Shutdown: FINISHED_UNIT
14:36:51:WU00:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
14:36:51:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11759 run:0 clone:5086 gen:42 core:0x22 unit:0x0000004680fccb0a5e6e859a0f61666c
14:36:51:WU00:FS01:Uploading 50.00MiB to 128.252.203.10
14:36:51:WU00:FS01:Connecting to 128.252.203.10:8080
14:37:17:WU00:FS01:Upload 0.75%
14:37:23:WU00:FS01:Upload 17.75%
14:37:29:WU00:FS01:Upload 46.75%
14:37:35:WU00:FS01:Upload 90.00%
14:37:36:WU00:FS01:Upload complete
14:37:36:WU00:FS01:Server responded WORK_ACK (400)
14:37:36:WU00:FS01:Final credit estimate, 132696.00 points
14:37:36:WU00:FS01:Cleaning up

Checking the WU Status Page (link):
"Not found."

I waited about 3 hours before posting this report so that whatever cron job (or similar process) updates the WU Status had time to run, but alas...

This particular machine has returned other WUs since 11759 (0, 5086, 42), and those show up as having been returned (for example: 11764 (0, 3122, 33)).

Hopefully returned results are not getting "lost".
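
For anyone who wants to pull the same information out of a longer log, here is a minimal sketch that lists the WUs whose upload was followed by a WORK_ACK. The function name and the regular expressions are illustrative assumptions written against the log lines quoted above; they are not part of the F@H client. A WORK_ACK only shows that the server acknowledged the upload, not that the stats side has caught up.

```python
import re

# Matches client log lines such as "14:36:51:WU00:FS01:Upload complete".
LINE_RE = re.compile(r"^\d{2}:\d{2}:\d{2}:(WU\d+:FS\d+):(.*)$")
# Pulls the project/run/clone/gen numbers out of a "Sending unit results" line.
SEND_RE = re.compile(r"project:(\d+) run:(\d+) clone:(\d+) gen:(\d+)")


def acknowledged_units(log_text):
    """Return (project, run, clone, gen) tuples whose upload got a WORK_ACK."""
    pending = {}  # "WUxx:FSxx" -> PRCG tuple still waiting for an acknowledgement
    acked = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a WU/slot log line
        slot, message = m.groups()
        if message.startswith("Sending unit results:"):
            sent = SEND_RE.search(message)
            if sent:
                pending[slot] = tuple(int(n) for n in sent.groups())
        elif message.startswith("Server responded WORK_ACK") and slot in pending:
            acked.append(pending.pop(slot))
    return acked
```

Fed the log excerpt above, this should report the single unit 11759 (0, 5086, 42) as acknowledged.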
davidcoton
Posts: 1094
Joined: Wed Nov 05, 2008 3:19 pm
Location: Cambridge, UK

Re: 128.252.203.10 Losing Work Units?

Post by davidcoton »

Experience suggests credit is almost never lost, but frequently delayed. The stats server is under pressure (like the whole F@H system), but is not a high priority compared to the servers that affect the science.
rusty
Posts: 17
Joined: Sun Mar 15, 2020 9:00 pm

Re: 128.252.203.10 Losing Work Units?

Post by rusty »

Thanks. I'm not really concerned about the stats/credit -- just the loss of the work.

Even if the "points" aren't updated, I would think that the WU status tool would still be correct @ https://apps.foldingathome.org/wu (unless there is a "real" problem, of course). Otherwise, the tool is a useless novelty, which I would assume is not the case.
davidcoton
Posts: 1094
Joined: Wed Nov 05, 2008 3:19 pm
Location: Cambridge, UK

Re: 128.252.203.10 Losing Work Units?

Post by davidcoton »

AFAICT the status tool uses the stats database to get its information.
rusty
Posts: 17
Joined: Sun Mar 15, 2020 9:00 pm

Re: 128.252.203.10 Losing Work Units?

Post by rusty »

Interesting, I would have expected WU status queries to go one level deeper than the statistics database; otherwise, I wouldn't have bothered posting.

In any case, I'll keep an eye on this and bump the thread if this WU still doesn't register in the next week or so.
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: 128.252.203.10 Losing Work Units?

Post by PantherX »

This is a simplified version of how you get points:

Finished WU uploaded to WS/CS -> WS/CS verifies WU and calculates points -> Stats Server gets all the data from WS/CS and processes it -> Stats are updated

Thus, if you have successfully uploaded the WU and the server acknowledged it, I am pretty certain that you will eventually get the points. Do note that the collection of additional points is generally done manually and is rather intensive, so I would expect it to be addressed once the supply of WUs reliably meets demand. I can't remember an instance where any points were lost.
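
The simplified flow above can be written down as an ordered set of stages. This is only a sketch of that description; the stage names and the helper function are made up for illustration and do not correspond to anything in the F@H servers.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Stages of the simplified points pipeline, in order (names are hypothetical)."""
    UPLOADED = 1         # finished WU uploaded to the WS/CS
    VERIFIED = 2         # WS/CS verified the WU and calculated points
    STATS_PROCESSED = 3  # stats server collected and processed the data
    STATS_UPDATED = 4    # stats are updated


def remaining_stages(current):
    """Return the stages a WU still has to pass through after `current`."""
    return [stage for stage in Stage if stage > current]
```

A WU whose upload has just been acknowledged sits at `Stage.UPLOADED`, so three stages remain before it shows up in the stats.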
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
rusty
Posts: 17
Joined: Sun Mar 15, 2020 9:00 pm

Re: 128.252.203.10 Losing Work Units?

Post by rusty »

Okay, I have to admit, I am getting pretty frustrated here.

I am not posting to inquire about my point total or the credit for completing this WU.

Like I mentioned before, I do not care about the points.

I was posting because it appeared that the collection server did not properly record the receipt of the WU. I simply wanted it to be on the admin staff's radar if this was the case.

The confusion here stems from the fact that I believed I could use the WU Status tool to check the true status of a WU. Now that I understand that that is, in fact, not the case, I believe we are square here. Thanks for your help.

Again, if the WU doesn't show as being received by the WU Status tool in a week or so, I will bump the thread.
uyaem
Posts: 219
Joined: Sat Mar 21, 2020 7:35 pm
Location: Esslingen, Germany

Re: 128.252.203.10 Losing Work Units?

Post by uyaem »

Fair enough, just wanna say I'm in the same boat at the moment.
I completed the following today, none of which are available on the ../wu app.
It has been the case before and just requires some patience; the delay varies a LOT. :)

FS00:0xa7:Project: 16411 (Run 705, Clone 0, Gen 4)
FS01:0x22:Project: 13879 (Run 0, Clone 68, Gen 20)
FS00:0xa7:Project: 14592 (Run 705, Clone 2, Gen 15)
FS00:0xa7:Project: 16405 (Run 0, Clone 700, Gen 16)
FS00:0xa7:Project: 16418 (Run 0, Clone 951, Gen 39)
FS00:0xa7:Project: 14614 (Run 239, Clone 0, Gen 2)
FS01:0x22:Project: 14541 (Run 0, Clone 1724, Gen 10)
CPU: Ryzen 9 3900X (1x21 CPUs) ~ GPU: nVidia GeForce GTX 1660 Super (Asus)
uyaem
Posts: 219
Joined: Sat Mar 21, 2020 7:35 pm
Location: Esslingen, Germany

Re: 128.252.203.10 Losing Work Units?

Post by uyaem »

Please check again with the WU status tool; all mine have been processed successfully now.
CPU: Ryzen 9 3900X (1x21 CPUs) ~ GPU: nVidia GeForce GTX 1660 Super (Asus)
Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon [email protected], 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon [email protected], 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: [email protected], 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: 128.252.203.10 Losing Work Units?

Post by Neil-B »

They have caught up a bit (circa 1.1m WUs): https://apps.foldingathome.org/credit-log
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

(Green/Bold = Active)
rusty
Posts: 17
Joined: Sun Mar 15, 2020 9:00 pm

Re: 128.252.203.10 Losing Work Units?

Post by rusty »

Thanks everyone. Fortunately, it looks like the situation was the same for me. The original WU in question is now showing as received, so no worries.

I learned an important lesson here about the WU Status tool: it's not necessarily up-to-date and shouldn't be used to try and identify potential problems.

Neil-B:
The link you posted to the Credit Log app is a nice resource to have (oddly, not listed at https://apps.foldingathome.org). Thanks!
davidcoton
Posts: 1094
Joined: Wed Nov 05, 2008 3:19 pm
Location: Cambridge, UK

Re: 128.252.203.10 Losing Work Units?

Post by davidcoton »

The summary is this: if you get credit or see the WU in the WU status tool, then it has been received.
Otherwise, if the upload in your log looks correct, the science has almost certainly been received but there is a delay in the link to stats. The stats server will usually be a lower priority than anything on the science path.
Otherwise, the client will retry the upload until it succeeds or the WU expires.
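
The three cases above amount to a small decision rule. Here is a minimal sketch of it; the function and parameter names are hypothetical, and the returned strings simply paraphrase the summary.

```python
def assess_wu(credited, in_status_tool, upload_acknowledged):
    """Classify a WU following the three cases in the summary above."""
    if credited or in_status_tool:
        # Credit or a WU status entry means the WU has been received.
        return "received"
    if upload_acknowledged:
        # The log shows a clean upload, so the science is almost certainly
        # in and only the link to stats is lagging.
        return "almost certainly received, stats delayed"
    # No clean upload in the log: the client keeps retrying.
    return "not uploaded yet, client retries until success or expiry"
```

The WU that started this thread matched the second case at first: the log showed a clean upload and acknowledgement, but the status tool had nothing yet.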

The system is designed to be robust -- loss of science is in no-one's interest. That implies that the vast majority of WUs will be uploaded before they expire. Even with the recent step-change in the work throughput, very little work has been lost (sorry I don't have figures, but certainly nothing that worries the F@H team). There were delays in work allocation and return while servers were overloaded; that should mainly be solved now (several powerful new servers are online).

Once a WU is uploaded, loss of points credit is also extremely rare, though credit may be delayed -- work on the stats system has been carried out, and if more is necessary it will be done, but at lower priority than the science systems.