[Feature request] Average temp on slots.
Moderators: Site Moderators, FAHC Science Team
-
- Posts: 32
- Joined: Fri Mar 06, 2020 5:20 pm
[Feature request] Average temp on slots.
Is it possible to have the average temp per slot in the work unit list? It's a good way to check if the PC is working properly in case a fan dies on a component.
My room is always Hot.
-
- Site Moderator
- Posts: 1140
- Joined: Sat Dec 08, 2007 1:33 am
- Location: San Francisco, CA
- Contact:
Re: [Feature request] Average temp on slots.
There is a related feature request
https://github.com/FoldingAtHome/fah-cl ... /issues/11
-
- Site Moderator
- Posts: 1140
- Joined: Sat Dec 08, 2007 1:33 am
- Location: San Francisco, CA
- Contact:
Re: [Feature request] Average temp on slots.
The client currently doesn’t have any code to collect temperatures.
Code to do so will need to be open source GPL3-compatible and C/C++ or linkable by such.
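As a rough illustration of what collecting temperatures involves on just one platform: Linux exposes thermal zones under /sys/class/thermal, and each zone's temp file reports millidegrees Celsius. The sketch below is in Python for brevity, so it is not something the client could link as-is. The function names are made up for this example, and GPUs and other operating systems would each need their own code path.

```python
from pathlib import Path


def parse_millidegrees(raw: str) -> float:
    """Convert a sysfs reading such as '45000' to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_thermal_zones(base: str = "/sys/class/thermal") -> dict:
    """Return {zone_name: temp_C} for every readable thermal zone under base."""
    temps = {}
    for temp_file in sorted(Path(base).glob("thermal_zone*/temp")):
        try:
            temps[temp_file.parent.name] = parse_millidegrees(temp_file.read_text())
        except (OSError, ValueError):
            # Some zones are unreadable or report garbage; skip them.
            continue
    return temps
```

Calling read_thermal_zones() on a Linux box returns whatever zones the kernel exposes; on a system without that directory it simply returns an empty dict.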
Re: [Feature request] Average temp on slots.
Not sure how "average temp per slot" could work - you'd have one temperature reading per GPU, and a single temperature reading per CPU socket (for all its cores, together), right?
That is indeed good data to keep your eye on, but I'd suggest you find a widget for your system that shows this somewhere appropriate (maybe in the system tray). You'd want this not only for folding, but also for gaming, compiling, or anything else computationally expensive. On the other hand, if it's the CPU or GPU fan that dies, I guess the system will shut down quicker than you can look at any widget
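One way to read the request, sketched under assumptions: sample each device periodically and keep a running mean per slot, where a slot simply inherits the reading of the chip it runs on. The class name, the device labels (cpu0, gpu0) and the slot-to-device mapping are all hypothetical; nothing like this exists in the client.

```python
from collections import defaultdict


class SlotTempAverager:
    """Running mean of temperature samples, keyed by slot id."""

    def __init__(self):
        self._sum = defaultdict(float)
        self._count = defaultdict(int)

    def add_sample(self, device_temps, slot_devices):
        """Record one sampling tick.

        device_temps: {device_label: temp_C}, one reading per physical chip
        slot_devices: {slot_id: device_label}, which chip each slot runs on
        """
        for slot, device in slot_devices.items():
            if device in device_temps:
                self._sum[slot] += device_temps[device]
                self._count[slot] += 1

    def average(self, slot):
        """Mean temperature seen so far for this slot, or None if no samples."""
        count = self._count.get(slot, 0)
        if count == 0:
            return None
        return self._sum[slot] / count
```

With this mapping, two CPU slots that share a socket end up with identical averages, because there is only one reading to feed them both.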
-
- Site Moderator
- Posts: 6359
- Joined: Sun Dec 02, 2007 10:38 am
- Location: Bordeaux, France
- Contact:
Re: [Feature request] Average temp on slots.
It is very unlikely to be integrated into the client ... there are too many possibilities and only one developer ...
And many 3rd party tools already do this.
-
- Posts: 308
- Joined: Wed Feb 16, 2022 1:18 am
Re: [Feature request] Average temp on slots.
Each slot is usually on one physical chip. If you for some reason ran two 12-core slots on a 24-core CPU, you'd just get two slots with identical temperatures. The only problem would be a slot which used a CPU and a GPU, or two GPUs, which hasn't been invented yet, although planned?
victor_pp wrote: ↑Sat Apr 22, 2023 12:38 pm That is indeed good data to keep your eye on, but I'd suggest you find a widget for your system that shows this somewhere appropriate (maybe in the system tray). You'd want this not only for folding, but also for gaming, compiling, or anything else computationally expensive. On the other hand, if it's the CPU or GPU fan that dies, I guess the system will shut down quicker than you can look at any widget
I use MSI Afterburner. These graphs are brilliant: I can see how hard it's working (% utilisation), the temperature, the RAM usage, etc. This is the one for my main computer, which is overly complicated as it monitors internet usage for itself and the garage (which is where the other computers are). That was for LHC on BOINC, which uses a LOT of internet, and was causing things to throttle waiting for new data.
There's also a program I use called TThrottle, designed for BOINC, but you can add other programs like Folding. It will slow down programs you define when the temperature is above a user-defined limit. I use that on a laptop which has a nasty habit of being a dust collector for the entire house. The fins are so fine I have to take it apart to brush the dust out. I blame the parrots.
-
- Posts: 32
- Joined: Fri Mar 06, 2020 5:20 pm
Re: [Feature request] Average temp on slots.
victor_pp wrote: ↑Sat Apr 22, 2023 12:38 pm Not sure how "average temp per slot" could work - you'd have one temperature reading per GPU, and a single temperature reading per CPU socket (for all its cores, together), right?
That is indeed good data to keep your eye on, but I'd suggest you find a widget for your system that shows this somewhere appropriate (maybe in the system tray). You'd want this not only for folding, but also for gaming, compiling, or anything else computationally expensive. On the other hand, if it's the CPU or GPU fan that dies, I guess the system will shut down quicker than you can look at any widget
It's mainly for monitoring remote clients. I've had a PC with a dead CPU water pump, and all it did was thermal throttle the system. It never shut off; all it did was run reallly sloooowwww. I was hoping for something first party, which would be best in my case since I am too small for larger, more specialized monitoring programs and too large for the v7 implementation.
-
- Posts: 308
- Joined: Wed Feb 16, 2022 1:18 am
Re: [Feature request] Average temp on slots.
I do have remote clients, 7 of them. They all have the MSI Afterburner graph running on them, and I access them with a remote desktop session once a day to check on them. I can easily spot anything funny going on like throttling.
-
- Posts: 32
- Joined: Fri Mar 06, 2020 5:20 pm
Re: [Feature request] Average temp on slots.
Peter_Hucker wrote: ↑Sat Apr 22, 2023 8:43 pm I do have remote clients, 7 of them. They all have the MSI Afterburner graph running on them, I access them with a remote desktop session once a day to check on them. I can easily spot anything funny going on like throttling.
This shouldn't be the solution though. F@H really should take a page from the crypto programs and actually display useful basic information to the user when something goes wrong. You already have a system reporting information from the slots; adding the bare minimum for basic troubleshooting can go a long way. Doesn't the GPU/CPU driver have an API that could pull the current temp?
I'm not asking for an MSI Afterburner replacement, since that is what 3rd party devs are for. I'm just asking for a flag of "hey, I'm hot. maybe you should pause my work and look at this."
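To show how small such a flag could be, here is a minimal sketch. It assumes per-slot temperatures are already available from somewhere, which the client does not provide today; the function names and the limit passed in are placeholders, not an existing F@H setting.

```python
def hot_slots(slot_temps, limit_c):
    """Return the ids of slots whose current temperature exceeds limit_c, sorted."""
    return sorted(slot for slot, temp in slot_temps.items() if temp > limit_c)


def status_line(slot_temps, limit_c):
    """One-line summary suitable for a status column or a log entry."""
    hot = hot_slots(slot_temps, limit_c)
    if not hot:
        return "OK"
    names = ", ".join(str(slot) for slot in hot)
    return f"HOT: slot(s) {names} above {limit_c:g} C"
```

For example, status_line({0: 65.0, 1: 91.5}, 85.0) reports slot 1 as hot. What limit makes sense is left to the user, since it depends on the hardware.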
Re: [Feature request] Average temp on slots.
"F@H really should take a page from the crypto programs and actually display useful basic information to the user when something goes wrong."
Is it that easy to define what is "wrong"? Absolute temperatures may be very relative - what's normal for a PC with a good cooling system might be terribly hot for a laptop.
zexmaxwell wrote: ↑Sat Apr 22, 2023 8:26 pm Its mainly for monitoring remote clients. I've had a PC with a dead CPU water pump and all it did was thermal throttle the system. it never shut off, all it did was run reallly sloooowwww. I was hoping that something first party would be best in my case since I am too small for larger more specialized monitor programs and too large for v7 implementation.
Not all monitoring systems are enterprise-class complex. For remote clients under Linux, I can recommend https://www.monitorix.org/ . Maybe your distribution already has it in its repositories, it is very lightweight, you can set it up in a few minutes (one config file, optionally a separate nginx config), and it gives you all relevant information on one web site.
-
- Site Moderator
- Posts: 6359
- Joined: Sun Dec 02, 2007 10:38 am
- Location: Bordeaux, France
- Contact:
Re: [Feature request] Average temp on slots.
Everyone uses PPD to monitor their clients ... if it gets abnormally low, then something is wrong with the system. It is the user's responsibility to investigate what's wrong.
FAH doesn't have the manpower to maintain a hardware monitoring application that would work on all hardware and keep up with every update (drivers, new hardware, ...). There are 3rd party tools that do this far better (look at how many updates a tool like GPU-Z gets).
I never look at hardware monitoring unless I see abnormally low PPD or get unusual compute errors ... but I always monitor my clients and regularly look at the summary: http://fahmon.fleucorp.fr/ (generated by HFM, also a 3rd party tool).
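The "abnormally low PPD" check can be sketched as a comparison against a machine's own recent history. This is only an illustration: the function name and the 0.5 cut-off are arbitrary assumptions, and since PPD swings with the project being run, the threshold is deliberately loose.

```python
from statistics import median


def ppd_looks_low(history, current, fraction=0.5):
    """True if current PPD is below fraction * median of recent PPD values.

    history: recent PPD readings for the same machine (any iterable of numbers)
    fraction: how far below the median counts as abnormal (assumed, tune to taste)
    """
    history = list(history)
    if not history:
        # Nothing to compare against yet, so do not raise an alarm.
        return False
    return current < fraction * median(history)
```

A machine that usually reports around 1,000,000 PPD and suddenly drops to 400,000 would be flagged; a drop to 800,000 would not.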
-
- Site Moderator
- Posts: 1140
- Joined: Sat Dec 08, 2007 1:33 am
- Location: San Francisco, CA
- Contact:
Re: [Feature request] Average temp on slots.
Unfortunately, we need to wait for HFM to support v8.
The PPD estimate in v8 is not accurate yet and is not displayed in the WU table.
-
- Posts: 308
- Joined: Wed Feb 16, 2022 1:18 am
Re: [Feature request] Average temp on slots.
PPD is not accurate, as it varies per project. To make sure you're getting the best out of your machines, you need to monitor temperature and usage. The other day I found a laptop's temperature was bouncing off about 87C; it was thermally throttling and only working two-thirds as hard as it should. I took it apart and cleaned fine dust out of the heatsink which hadn't come out by blowing, and now it's working harder again.
And it is easy to define what's wrong, all chips are happy up to 80C. Above that there are problems with throttling or crashing.
-
- Site Moderator
- Posts: 1140
- Joined: Sat Dec 08, 2007 1:33 am
- Location: San Francisco, CA
- Contact:
Re: [Feature request] Average temp on slots.
Yes, but that is not the problem with v8.
There is a ticket for inaccurate PPD.
When available, the estimate seems to always be an integer multiple of 86400 (the number of seconds in a day).
So, not helpful for detecting problems.
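Purely as an illustration of how a pattern like that can arise, and not a claim about what the v8 code actually does: if a points-per-second rate is truncated to an integer before being scaled to a day, every result is a whole multiple of 86400. Both functions below are hypothetical.

```python
SECONDS_PER_DAY = 86400


def ppd_truncated(points, wu_seconds):
    """Integer-divide first, then scale: the result is always a multiple of 86400."""
    return (points // wu_seconds) * SECONDS_PER_DAY


def ppd_scaled_first(points, wu_seconds):
    """Scale to a day first, then divide: the fractional rate is preserved."""
    return points * SECONDS_PER_DAY / wu_seconds
```

For a 150000-point work unit taking 40000 seconds, the first version gives 259200 (3 x 86400) while the second gives 324000, so the truncated figure is off by a fifth and moves only in steps of 86400.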
-
- Posts: 308
- Joined: Wed Feb 16, 2022 1:18 am
Re: [Feature request] Average temp on slots.
A ticket? I didn't say there was a bug. It's not meant to be a reproducible amount. A perfectly functioning GPU might make twice as much one day as the next depending on what projects it's given.