New Servers Idle?
-
- Posts: 59
- Joined: Tue Apr 07, 2020 8:53 pm
Re: New Servers Idle?
Yes, noticed that - and a very prominent dip exactly before that. Another good indication that this is when the stats system finally processes the data. And where no abrupt spikes happen, the data are a pretty good representation of what's going on, I believe. The daily repeating "waveform" is also pretty interesting: a big group of clients seems to have a strong dependence on the time of day.
Re: New Servers Idle?
Not everybody is a FAH "gearhead". Remember, FAH runs on home computers, and it's moderately common for home computers to be turned off when people are sleeping or at work or ...
Yes, there are a lot of FAH Clients that run 24x7, but not 100%. That also means that there are certain times during a 24hr cycle when you can expect to get a new assignment.
In the early days of FAH, the daily cycles were probably more pronounced with a steeper rise and a more gradual descent (which can be seen today) because folks had to connect their modem to upload whatever was completed during the night.
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Posts: 1996
- Joined: Sun Mar 22, 2020 5:52 pm
- Hardware configuration: 1: 2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
- Location: UK
Re: New Servers Idle?
Given that FAH is a "global phenomenon", I am slightly surprised at how pronounced the cycle is, but I guess it shows where most folders are concentrated. And the dips may well not be as low as they could be, due not just to 24/7 folders but also to the effect of the global folding community?
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070
(Green/Bold = Active)
-
- Posts: 52
- Joined: Sat Mar 28, 2020 1:22 am
Re: New Servers Idle?
Could it be based on the researchers? Every morning, they come in, do something to upload more work, which populates a bunch of clients. The fast clients run out of work quickly, and then the totals decrease over time until the next batch upload?
-
- Posts: 59
- Joined: Tue Apr 07, 2020 8:53 pm
Re: New Servers Idle?
I wasn't surprised by the 24-hour cycles, I just find them interesting. They are probably mostly client/user-dependent (which regions dominate, which folders shut down their computers at night, or who only lets the computer fold at night because it's doing something else during the day), but perhaps there is a WU availability dependence as well, as suggested by Endgame124. I will try to correlate this later today.
-
- Site Moderator
- Posts: 6986
- Joined: Wed Dec 23, 2009 9:33 am
- Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB
Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400 - Location: Land Of The Long White Cloud
Re: New Servers Idle?
Endgame124 wrote: Could it be based on the researchers? Every morning, they come in, do something to upload more work, which populates a bunch of clients. The fast clients run out of work quickly, and then the totals decrease over time until the next batch upload?
Generally speaking, researchers are involved when they are generating new Projects, or additional simulations of an existing Project if they need more data. Once the initial "batch" of WUs is done, the researchers are out of the loop, i.e. the Servers hand out the WUs and, once the completed ones arrive, automatically generate the next sequence and assign it.
Once there's enough statistical data, researchers can analyze it further and may tweak the settings (more sample collection or spinning up a new series of Projects) depending on the data.
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time
Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
-
- Posts: 59
- Joined: Tue Apr 07, 2020 8:53 pm
Re: New Servers Idle?
This is a first ugly plot of my "experimental composite stats", combining key aspects of the server stats and credit logs:
It visualizes some interesting trends, including a "sneak peek" into my suspicion that the global F@H "supercomputer" is "only" about 60-80% as efficient as it (maybe) could be given the current number of clients. Interestingly, this no longer seems to be due to WU availability or WS speed.
Plot Walk-through:
The 3 solid lines show available WUs as reported by server-stats, split into GPU, CPU and total.
For the last 10+ days, there were always 300-500k WU available, cool!
And always over 200k CPU WUs, which is important because most of Covid-19 is CPU as of now.
The dashed purple line with the daily fluctuations is the hours/hours as reported by credit-log,
with a little forward-looking smoothing to deal with the short-term spikes due to delayed processing.
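A minimal sketch of what "forward-looking smoothing" can look like, assuming a plain moving average over each point and the points that follow it. The function name and the window size are illustrative assumptions, not the exact method used for the plot:

```python
def forward_smooth(values, window=3):
    """Average each point with the points that FOLLOW it.

    A delayed processing spike at index i is spread back over the
    earlier points i-window+1 .. i, which is where the work was
    actually done. Near the end of the series the window shrinks.
    """
    smoothed = []
    for i in range(len(values)):
        chunk = values[i:i + window]  # current point plus look-ahead
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical series: the stats system logged nothing for two
# intervals and then processed everything at once.
print(forward_smooth([0, 0, 9, 0, 0], window=3))
```

The window should roughly match how long the stats system typically lags; too wide a window would also flatten the real daily cycle.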
The cool thing here:
The trendline is going up; currently there are on average 500,000 hrs/hrs logged,
just slightly above what the WSs report as available. In other words:
F@H always has WUs ready for the next 30-60 minutes.
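The arithmetic behind that estimate can be sketched as follows. It only works under a loud assumption that is not taken from the server data: that an average WU corresponds to roughly one logged hour of work. The function and parameter names are hypothetical:

```python
def buffer_minutes(available_wus, logged_hours_per_hour, hours_per_wu=1.0):
    """Rough time until the available WUs would be used up.

    ASSUMPTION: each WU stands for `hours_per_wu` logged hours
    (1.0 is a placeholder, not a measured value).
    """
    return available_wus * hours_per_wu * 60 / logged_hours_per_hour


# Range of available WUs from the plot, at about 500,000 hrs/hrs.
low = buffer_minutes(300_000, 500_000)
high = buffer_minutes(500_000, 500_000)
print(low, high)
```

With a different average WU length the buffer scales linearly, so the 30-60 minute figure should be read as an order of magnitude rather than a measurement.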
The dotted lines towards the bottom are the assign rates (actual reported ones, not what the servers are configured for),
and in blue the Credits logged/hr, which is kind of an assign rate, just at the other end of the "pipeline" ("collect rate"?)
WARNING! The rest of the post goes into nitty-gritty details!
And this is where things become interesting:
The green dotted line is the assign rate as reported by AS1 and AS2; I assume that is the number of assignments they hand out.
The purple line is the sum of all the individual assign rates reported by the WSs. Far less than what the two ASs hand out!
The blue line ("collect rate") tracks the purple line ("WS assign rate") very well, a strong indication that almost no WUs are lost by the clients.
Big Kudos to all folders for being so reliable!!
(The JSON objects, which server-stats builds its numbers from, include more detail than visualized in the end. That's how I was able to derive some of the above stats)
Now my quest continues....
Hunting down why the AS assign rate and the WS cumulative assign rates are so different.
BTW, I think you can see that difference in real-time on the server-stats page: In the first table,
the cumulative assign rate for CPU & GPU are listed, but they are always a lot smaller than the TOTALS right beneath it.
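The comparison itself is simple; here is a sketch, assuming the per-server rates have already been pulled out of the JSON into plain lists. The numbers below are made-up placeholders, not real server-stats values:

```python
def assign_rate_gap(as_rates, ws_rates):
    """Compare what the AS report handing out with what the WS report assigning.

    Returns (absolute gap, fraction of AS assignments that show up at the WS).
    """
    as_total = sum(as_rates)
    ws_total = sum(ws_rates)
    return as_total - ws_total, ws_total / as_total


# Hypothetical example: two AS and three WS.
gap, fraction = assign_rate_gap([10.0, 30.0], [5.0, 7.5, 7.5])
print(gap, fraction)
```

A fraction well below 1.0 is the difference described above; a fraction near 1.0 would mean AS and WS agree.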
Different hypotheses/possible reasons so far:
#1)
There could be other things the AS counts as assignments; for example, the JSON file includes things like:
"id_rate":3.156667,
"assign_rate":14.06,
"no_assign_rate":132.873333,
"blacklist_rate":2.49,
And "assign_rate" is what is reported. Is it the total, so that one would subtract "id_rate" and "blacklist_rate" to derive the net assign rate actually handed off to the WS?
#2) After clients receive an assignment from the AS, they fail to ask the WS. Unlikely that this happens on such a large scale.
UNLESS some "rogue folders" use such techniques to "cherry pick" WUs (or WSs) they like best.
#3) After clients receive an assignment from the AS, the WS says "no WUs available"
I definitely have evidence in my log files that this happens, but I need more statistics to see if that happens often enough to explain the difference.
If that were the case, F@H should be able to tweak the "indirect interaction" between AS and WS relatively easily.
In a next step, I will analyze my own client logs to see how their data correlate with the above plot, especially building statistics for the reported reasons for not getting WUs (and how much of the time my clients are idle because of that).
Hypothesis #4):
Perhaps the 24-hour "oscillating" behavior is not only a function of when the folding computers are online or available for folding, but also of "WU request storms" during certain times of the day, which in turn kick a significant number of clients into a waiting-for-WUs state for more hours than necessary.
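For hypothesis #1, the subtraction can be sketched like this, using the four values quoted from the JSON. Whether "assign_rate" really includes "id_rate" and "blacklist_rate" is exactly the open question, so the formula is an assumption and not the documented meaning of these fields:

```python
import json

# Fragment from the AS JSON quoted above, wrapped in braces so it parses.
sample = """{
  "id_rate": 3.156667,
  "assign_rate": 14.06,
  "no_assign_rate": 132.873333,
  "blacklist_rate": 2.49
}"""


def net_assign_rate(stats):
    """ASSUMED formula: total assign rate minus ID and blacklist responses."""
    return stats["assign_rate"] - stats["id_rate"] - stats["blacklist_rate"]


stats = json.loads(sample)
net = net_assign_rate(stats)
print(round(net, 6))
```

If the net value computed this way lands close to the summed WS assign rates, hypothesis #1 would explain the difference; if not, #2 to #4 remain.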
I am looking for any insight or feedback!