New Servers Idle?
Moderators: Site Moderators, FAHC Science Team
New Servers Idle?
Looks like FAH is ramping up big time: https://apps.foldingathome.org/serverstats
If I read that correctly, once the 8 new servers listed there (2 of them from AWS) reach full capacity, FAH will be about twice as powerful as it is now.
But can the Assignment Servers handle that? And why are they all idle?
-
- Posts: 59
- Joined: Tue Apr 07, 2020 8:53 pm
Re: New Servers Idle?
According to the Fire Side Dev Chat last Tuesday, the Assignment Servers are not the bottleneck, at least not hardware-wise. I suspect some algorithmic inefficiencies under certain conditions, but that doesn't apply right now because:
My best guess is that - yet again - there is not enough useful work available, i.e. scientists can only prepare so many projects, and that limits how many useful WUs the WS can create and distribute.
Not surprising given the Easter weekend.
-
- Posts: 2522
- Joined: Mon Feb 16, 2009 4:12 am
- Location: Greenwood MS USA
Re: New Servers Idle?
[I am not part of the Folding@home team, but I wrote and implemented applications for 40 years]
If I build a new highway, there will be a time when you can see a highway but no cars are driving on it yet, as the construction is not finished and safe.
I bet the new servers are also still being integrated into the F@H system. If they rush it, then 'minor' parts of the system won't work (points won't record, bonuses won't occur, etc.) even though the science results are recorded.
If the volunteers did not care that they were not credited with their points, you could rush the new servers into working sooner. But we want the system to work perfectly, so it will take some time.
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
Re: New Servers Idle?
Then maybe adding a code that people could put in their expert section would help:
client-type: rush_and_to_hell_with_the_points
-
- Posts: 1996
- Joined: Sun Mar 22, 2020 5:52 pm
- Hardware configuration: 1: 2x Xeon [email protected], 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon [email protected], 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: [email protected], 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21 - Location: UK
Re: New Servers Idle?
hmmm … client-type: ftp … (if you will pardon the expletive)
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070
(Green/Bold = Active)
-
- Posts: 59
- Joined: Tue Apr 07, 2020 8:53 pm
Re: New Servers Idle?
My understanding (but I am also not affiliated with the folding@home team) is that the WS software stack is under active development, and that there are some complexities with the accounting etc, but rolling out & configuring that is not the big thing that holds back those WS and scaling up the number of WS.
It really seems to come down to generating work that is useful from a scientific standpoint. Often, the scientific groups first have to make sense of many TBs of generated data (to put it simply), and only after looking at and understanding those results do they have an idea of which type of molecule under which conditions makes sense next.
If you are "babysitting" your folding machines anyway and don't mind f@h hanging up sometimes (and maybe very rarely crashing your machine), you can sign up for late stage beta work units by setting your client(s) to also accept "advanced" work units: https://foldingathome.org/support/faq/i ... ion-guide/
Some folders report that their utilization is a lot better after that, and you still get the points if the WUs complete (though not so much if your machine hangs for a day on a WU which can never complete). So not quite "FTP", but you might care a little less (while possibly making even more points at times when more beta work is available).
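For what it's worth, here is a minimal Python sketch of flipping that setting by script rather than through the GUI. It assumes the v7 client keeps its options in a config.xml where each option is an element with a `v` attribute (for example `<client-type v='advanced'/>`); the function name and the sample config string are made up for illustration, so check your own config.xml before relying on it.

```python
import xml.etree.ElementTree as ET


def set_client_type(config_xml: str, client_type: str = "advanced") -> str:
    """Return a copy of the config with <client-type v='...'/> set.

    Assumption: options are direct children of the root element and
    carry their value in a 'v' attribute.
    """
    root = ET.fromstring(config_xml)
    node = root.find("client-type")
    if node is None:
        # Option not present yet: add it under the root element.
        node = ET.SubElement(root, "client-type")
    node.set("v", client_type)
    return ET.tostring(root, encoding="unicode")


# Hypothetical minimal config, not copied from a real installation.
example = "<config><user v='Anonymous'/></config>"
updated = set_client_type(example)
print(updated)
```

Stop the client before editing its config file and keep a backup; the sketch only works on a string and does not touch any file on disk.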
Re: New Servers Idle?
That sounds good, but doesn't really add up. Since the virus outbreak, most FAH resources are concentrated on that particular research. Now, if the scientists can't exploit those covid-19 projects as fast as FAH can compute them, it should still be possible to let other researchers use the system. It's not as if we have run out of diseases, or of people wanting to study, treat, and cure them. The computing power is available now, so it should be used now.
It's problematic also at a deeper level: FAH is pioneering a huge experiment that could decide the rise of modern distributed computing. It demonstrates that this computing model is more powerful than "conventional" supercomputers. And I often have the feeling that the incentives of the past (the "accounting") are preventing it from reaching its full potential. There should be a way to opt out of this for the benefit of science (here = more computing power). We would then really see whether the model is sustainable.
-
- Posts: 2522
- Joined: Mon Feb 16, 2009 4:12 am
- Location: Greenwood MS USA
Re: New Servers Idle?
Once you take away the accounting, how will you know? The accounting tells how many users there are, how many WUs got done, all the 'sustainable' bits.
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
Re: New Servers Idle?
What is important for deciding whether the system is powerful and sustainable enough is there: https://stats.foldingathome.org/os
No mention of points...
-
- Posts: 1996
- Joined: Sun Mar 22, 2020 5:52 pm
- Hardware configuration: 1: 2x Xeon [email protected], 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon [email protected], 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: [email protected], 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21 - Location: UK
Re: New Servers Idle?
The FAH model of distributed computing has been "pioneering" for some 20 years, delivering to its scientists much more compute resource than they would otherwise have had - so I guess it has proved its sustainability already, but whether it can scale to the potential compute resource available today, and whether that compute resource is itself sustainable or simply "short-termism", has yet to be seen … the limit on the potential of the FAH model is the funding required to deliver the infrastructure and the science services - it has managed to date, but if more regular, secure funding comes out of the current focus, then those limits should be much higher … Other distributed projects have gone different routes (possibly funding driven), ranging from specialist software running on specific platforms and specifically tasked (a closed-shop FAH, as it were) to generic platform approaches where projects of any sort can be run (BOINC, IBM, etc.) - they may have a better funding basis and be able to deliver more sustainable growth.
"Conventional" supercomputers for the most part are (unlike a decade or so ago) a very closely connected, high-bandwidth, low-latency distributed compute platform (without much distributedness) … They are actually much better for some types of science and calculations than the distributed platforms, and as such are mostly used for purposes that require their performance characteristics - "It is not all about the FLOPs" is fair to say, or even "It is not how much power you have got but how you use it" … They are also fiercely expensive to purchase and support … FAH is probably even more expensive, but its model minimises costs to the scientists and effectively crowdsources the costs outwards, by having people donate parts of their electricity bills, PC hardware/software budget, network charges - total all this up and it will be even more scary than the total compute power !! - oh yes, and support, guess that is "paid for" by the myriad volunteers who each try, to the best of their ability, to help the next folder who encounters problems they have already seen/dealt with - no call centre costs, or salaries for First Line and Deep Technical support, or 24hr call-out charges
If there is capacity for non Covid-19 WUs and there are scientists (not in lockdown, isolation or looking after families) to generate them then the current system doesn't stop/withhold them - Covid-19 is only given priority … We may not be out of diseases, but we may well be short on people with the skills and knowledge of molecular modelling to create the projects and get them running?
I am all for a system not based on points (although they do provide a form of performance metric that allows comparisons and troubleshooting), but given the brouhaha over points these last few weeks, I fear that without them these forums would be too quiet a place?
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070
(Green/Bold = Active)
Re: New Servers Idle?
All very good points.
But the facts remain: nowadays, the largest computing power is the one that is distributed, that is, spread among regular people and companies around the world. All those billions of CPUs and GPUs that sit idle most of the year make up by far the most powerful computing resource in the world. Sure, a distributed system cannot handle all computing tasks, at least not yet. But that huge capacity is a fantastic opportunity and I think we should bank on that. 20 years back, it would have been delusional; today, it would be a sin not to exploit it. And sure, it is expensive, for users, too. It needs a dose of idealism, or something. But if we cannot muster it, do we really have a right to health, for example, as an "advanced" civilization?
If FAH offered a way to bypass the "accounting" on a voluntary basis, we could see if people and companies are ready for the challenge. As is, the present situation looks a bit like what v00d00 says here: viewtopic.php?f=16&t=33575&p=325210&hilit=stability#p325210 - a mere hype that will disappear as soon as the coronavirus is mastered. Besides, the accounting could also be externalized, with the log generated in each client serving as the basis for a more general "hype system".
For it to work, the most important thing, I think, is that volunteers who give their computing power, their time, their know-how and goodwill just for the sake of it aren't bugged by cryptic problems. If you want to help, OK, just do it. It works very well on this forum, and this is not because of the hype. It can work.
-
- Posts: 59
- Joined: Tue Apr 07, 2020 8:53 pm
Re: New Servers Idle?
I can add that F@H is by no means a "modern" distributed computing system; it's more one of the early pioneers. In fact, it carries a lot of 10- and 20-year-old baggage that they are trying to improve, and they are working on getting teams together now, since a lot of developers are volunteering. But it will take time, weeks at least, to make significant changes to clients & WS so that newer software approaches and overall system tweaks allow the - very suddenly multiplied - resources to be used efficiently.
And other food for thought: F@H is not at all a general-purpose supercomputer. In fact, it can only perform some very specific calculations around protein chemistry - so even if there were other types of calculations which could benefit from a similar distributed computing structure, it would take a while to adapt F@H. But there are plenty of other distributed computing projects out there, BOINC-based ones for example, which also support scientific research on Covid-19; F@H is just the largest and got decent news coverage, thus the spike in interest.
Also, I am not aware that Covid is really prioritized - a lot of the "classic" projects are still providing work units:
https://apps.foldingathome.org/psummary
and again, science is hard - coming up with meaningful tasks for F@H to compute and making sense of the results just takes some time.
Another thought: Energy per calculation performed is more and more the metric to be concerned about. So yes, there is a lot of hardware deployed in homes & companies, but if it is not computing, it's only using a mere fraction of the power it does use when it computes at full throttle.
And I definitely agree with the OP that many don't fold for the points, although it is fun and also gives you one (but by far not the only) metric to optimize the system - from client to the overall F@H behemoth. And while the OS stats are impressive and give some FLOP numbers, I am looking forward to many more meaningful stats describing the overall F@H performance. For example, on a daily basis:
* how much computational work was actually completed by how many clients (the "hours clients spent computing" divided by "wall time" mentioned by Joseph Coffland is a good start)
* how many clients were actually idle because they didn't get a work unit assigned (estimate of additional resources available, but idle)
* how many WU are actually ready-to-go and waiting to be computed (efficiency of assignment & distribution process if compared with above "idle clients/resources" stats)
* and - much harder to do - how energy efficient is the overall system (could use averages per CPU/GPU type & speed from databases etc)
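To make the wish list above a bit more concrete, here is a rough Python sketch of how such daily numbers could be derived. Everything in it is an assumption for illustration: the `DailySnapshot` fields, the `summarize` helper and the sample figures are invented, and F@H does not (as far as I know) publish data in this shape.

```python
from dataclasses import dataclass


@dataclass
class DailySnapshot:
    """Hypothetical one-day totals across all clients."""
    compute_hours: float   # hours clients actually spent computing
    wall_hours: float      # hours clients were online and willing to fold
    idle_clients: int      # clients that asked for work and got none
    total_clients: int
    ready_wus: int         # work units waiting on the work servers
    wus_completed: int
    energy_kwh: float      # estimated from per-CPU/GPU averages


def summarize(s: DailySnapshot) -> dict:
    """Derive the four ratios described in the list above."""
    return {
        # "hours clients spent computing" divided by "wall time"
        "utilization": s.compute_hours / s.wall_hours,
        # share of clients left without an assignment
        "idle_fraction": s.idle_clients / s.total_clients,
        # ready work per idle client; above 1 hints at an assignment problem,
        # below 1 hints at a shortage of work
        "ready_wus_per_idle_client": (
            s.ready_wus / s.idle_clients if s.idle_clients else float("inf")
        ),
        # crude energy efficiency figure
        "kwh_per_wu": s.energy_kwh / s.wus_completed,
    }


# Made-up numbers, only to show the arithmetic.
snap = DailySnapshot(
    compute_hours=600.0, wall_hours=800.0,
    idle_clients=25, total_clients=100,
    ready_wus=50, wus_completed=240, energy_kwh=120.0,
)
stats = summarize(snap)
print(stats)
```

Comparing `ready_wus_per_idle_client` over time is the interesting part: it would separate "the servers cannot hand out work fast enough" from "there is simply no work to hand out".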
Re: New Servers Idle?
If there is something FAH has proved these last weeks, it is that they just have to ask. They asked for donors. Boom, a 20-fold (!) increase. They could not handle it, so they asked for more servers. Boom, the necessary servers are now there (albeit still idle). If they now lack scientists who can use their resources, maybe they should just ask, too.
There should be a vision of what is possible in the present (computing) world, and from there, well, just ask. For the folding of proteins, it does work. It can work for many other useful kinds of computing work. Let's make it such a success story that it will be self-evident afterwards. Let's ask for more scientists.
-
- Posts: 59
- Joined: Tue Apr 07, 2020 8:53 pm
Re: New Servers Idle?
ajm wrote: But the facts remain: nowadays, the largest computing power is the one that is distributed, that is, spread among regular people and companies around the world. All those billions of CPUs and GPUs that sit idle most of the year compose by far the most powerful computing ressource in the world. Sure, a distributed system cannot handle all computing tasks, at least not yet. But that huge capacity is a fantastic opportunity and I think we should bank on that. 20 years back, it would have been delusional; today, it would be a sin not to exploit it. And sure, it is expensive, for users, too. It needs a dose of idealism, or something. But if we cannot muster it, do we really have a right to health, for example, as an "advanced" civilization?
What is "advanced", and what is a "civilization"?
Many challenges to overcome: after this pandemic, the next one could be much worse - not to mention other, at least partly man-made, problems like antibiotic resistance and the long tail of climate change... at least we have a decent chance to avert a "global killer" asteroid in the meantime, so that is some advancement over the dinosaurs
Back to distributed computing & power efficiency:
Right now, we talk a lot about not wasting the heat when producing other types of power (combustion engines, electricity), but all of the energy we use for computing ultimately becomes heat, and most of it is just wasted - or even worse, needs more energy to be transported away (air conditioning to cool the data center - or your home if you live in a warm area).
I "envision" that in the not-too-distant future, most things needing heat (heating your home, your food, heat for industrial processes) will generate that heat by performing useful calculations. If that is mostly done when the heat is needed, these types of distributed computing systems are truly energy efficient.
Kind of a triangle to minimize waste in: Electricity - Heat - Computation
For example, I am in the process of changing my server cooling arrangements at home so that the warm air actually heats rooms which would otherwise be heated by natural gas (as companies and data centers are starting to do as well). And in summer, I hope to be able to "cool" my personal servers with warm outside air just enough that they don't take too much damage, and blow the even warmer air right out again. I will throttle CPU performance in summer if the servers are at risk OR if my battery-backed photovoltaic system does not provide enough energy - I don't see the point of using (mostly dirty) electricity from the utility in summer for this type of distributed computing (unless you know your system has state-of-the-art energy efficiency for the task at hand; looking forward to far more information on that front!)
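The summer rule boils down to a simple OR of two conditions. A minimal Python sketch, with the caveat that the function name, the parameters and the 80 °C default are placeholders I picked for illustration, not measured limits for any particular hardware:

```python
def should_throttle(cpu_temp_c: float,
                    pv_available_w: float,
                    server_draw_w: float,
                    max_temp_c: float = 80.0) -> bool:
    """Throttle if the server is at thermal risk OR solar cannot cover it.

    max_temp_c is an assumed safety limit, not a vendor specification.
    """
    overheating = cpu_temp_c >= max_temp_c
    underpowered = pv_available_w < server_draw_w
    return overheating or underpowered


# Hot server, plenty of sun: throttle because of temperature.
print(should_throttle(cpu_temp_c=85.0, pv_available_w=500.0, server_draw_w=300.0))
# Cool server, not enough sun: throttle because of power.
print(should_throttle(cpu_temp_c=60.0, pv_available_w=200.0, server_draw_w=300.0))
# Cool server, enough sun: fold at full speed.
print(should_throttle(cpu_temp_c=60.0, pv_available_w=500.0, server_draw_w=300.0))
```

In practice the two inputs would come from a temperature sensor and the inverter or battery controller, and some hysteresis would be needed so the machine does not flip between states every few seconds.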
Re: New Servers Idle?
A civilization is a constructive collaboration, I'd say. And it is as advanced as this kind of collaboration is the norm.
By "challenge", I just meant coming to understand that the large-scale collaboration implied by distributed computing is worth it for its own sake, and not, say, for riding a wave of hype. Beyond the struggle against the virus and other problems, and beyond the technical aspect, there is this collaboration, this working together.
And, yes, managing heat is one of the main challenges of... life itself, actually.