[Feature request] Average temp on slots.

Moderators: Site Moderators, FAHC Science Team

zexmaxwell
Posts: 32
Joined: Fri Mar 06, 2020 5:20 pm

[Feature request] Average temp on slots.

Post by zexmaxwell »

Is it possible to show the average temperature per slot in the work unit list? It's a good way to check that the PC is working properly, for example if a fan dies on a component.

My room is always Hot.
calxalot
Site Moderator
Posts: 1117
Joined: Sat Dec 08, 2007 1:33 am
Location: San Francisco, CA

Re: [Feature request] Average temp on slots.

Post by calxalot »

The client currently doesn’t have any code to collect temperatures.
Any code to do so would need to be open source, GPL3-compatible, and written in C/C++ or linkable from it.
victor_pp
Posts: 7
Joined: Sat May 02, 2020 10:02 pm

Re: [Feature request] Average temp on slots.

Post by victor_pp »

Not sure how "average temp per slot" could work - you'd have one temperature reading per GPU, and a single temperature reading per CPU socket (for all its cores, together), right?

That is indeed good data to keep your eye on, but I'd suggest you find a widget for your system that shows this somewhere appropriate (maybe in the system tray). You'd want this not only for folding, but also for gaming, compiling, or anything else computationally expensive. On the other hand, if it's the CPU or GPU fan that dies, I guess the system will shut down quicker than you can look at any widget :)
toTOW
Site Moderator
Posts: 6359
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: [Feature request] Average temp on slots.

Post by toTOW »

This is very unlikely to be integrated into the client: there are too many hardware possibilities and only one developer.

And many 3rd party tools already do this.

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
Peter_Hucker
Posts: 308
Joined: Wed Feb 16, 2022 1:18 am

Re: [Feature request] Average temp on slots.

Post by Peter_Hucker »

victor_pp wrote: Sat Apr 22, 2023 12:38 pm Not sure how "average temp per slot" could work - you'd have one temperature reading per GPU, and a single temperature reading per CPU socket (for all its cores, together), right?
Each slot is usually on one physical chip. If for some reason you ran two 12-core slots on a 24-core CPU, you'd just get two slots with identical temperatures. The only problem would be a slot that used both a CPU and a GPU, or two GPUs, which doesn't exist yet (although it may be planned?).
victor_pp wrote: Sat Apr 22, 2023 12:38 pm That is indeed good data to keep your eye on, but I'd suggest you find a widget for your system that shows this somewhere appropriate (maybe in the system tray). You'd want this not only for folding, but also for gaming, compiling, or anything else computationally expensive. On the other hand, if it's the CPU or GPU fan that dies, I guess the system will shut down quicker than you can look at any widget :)
I use MSI Afterburner. Its graphs are brilliant: I can see how hard the hardware is working (% utilisation), the temperature, the RAM usage, etc. This is the one for my main computer, which is overly complicated because it also monitors internet usage for itself and the garage (which is where the other computers are). That was for LHC on BOINC, which uses a LOT of internet and was causing things to throttle while waiting for new data.

Image

There's also a program I use called TThrottle. It was designed for BOINC, but you can add other programs like Folding, and it will slow down the programs you define when the temperature is above a user-defined limit. I use that on a laptop which has a nasty habit of being a dust collector for the entire house. The fins are so fine I have to take it apart to brush the dust out. I blame the parrots.
zexmaxwell
Posts: 32
Joined: Fri Mar 06, 2020 5:20 pm

Re: [Feature request] Average temp on slots.

Post by zexmaxwell »

victor_pp wrote: Sat Apr 22, 2023 12:38 pm Not sure how "average temp per slot" could work - you'd have one temperature reading per GPU, and a single temperature reading per CPU socket (for all its cores, together), right?

That is indeed good data to keep your eye on, but I'd suggest you find a widget for your system that shows this somewhere appropriate (maybe in the system tray). You'd want this not only for folding, but also for gaming, compiling, or anything else computationally expensive. On the other hand, if it's the CPU or GPU fan that dies, I guess the system will shut down quicker than you can look at any widget :)
It's mainly for monitoring remote clients. I've had a PC with a dead CPU water pump, and all it did was thermal throttle the system. It never shut off; it just ran reallly sloooowwww. I was hoping that something first party would be best in my case, since I am too small for the larger, more specialized monitoring programs and too large for the v7 implementation.
My room is always Hot.
Peter_Hucker
Posts: 308
Joined: Wed Feb 16, 2022 1:18 am

Re: [Feature request] Average temp on slots.

Post by Peter_Hucker »

I do have remote clients, 7 of them. They all have the MSI Afterburner graph running on them, and I access them with a remote desktop session once a day to check on them. I can easily spot anything funny going on, like throttling.
zexmaxwell
Posts: 32
Joined: Fri Mar 06, 2020 5:20 pm

Re: [Feature request] Average temp on slots.

Post by zexmaxwell »

Peter_Hucker wrote: Sat Apr 22, 2023 8:43 pm I do have remote clients, 7 of them. They all have the MSI Afterburner graph running on them, I access them with a remote desktop session once a day to check on them. I can easily spot anything funny going on like throttling.
This shouldn't be the solution though. F@H really should take a page from the crypto programs and actually display useful basic information to the user when something goes wrong. You already have a system reporting information from the slots; adding the bare minimum for basic troubleshooting can go a long way. Doesn't the GPU/CPU driver have an API that could pull the current temp?

I'm not asking for an MSI Afterburner replacement, since that is what 3rd-party devs are for. I'm just asking for a flag that says, "Hey, I'm hot. Maybe you should pause my work and look at this."
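As a rough illustration of the kind of flag being asked for, here is a minimal Python sketch. It is not client code and does not reflect how the F@H client works: the `SlotReading` type, the slot labels, and the limits are all hypothetical, and the readings are assumed to come from whatever sensor source is available. The limit is set per device, since a sensible value for a desktop GPU may not suit a laptop CPU.

```python
from dataclasses import dataclass


@dataclass
class SlotReading:
    slot: str       # hypothetical slot label, e.g. "cpu:0" or "gpu:0"
    temp_c: float   # current temperature in degrees Celsius
    limit_c: float  # user-chosen limit for this particular device


def hot_slots(readings):
    """Return the labels of slots whose temperature is at or above their limit."""
    return [r.slot for r in readings if r.temp_c >= r.limit_c]


# Made-up readings, purely for illustration.
readings = [
    SlotReading("cpu:0", 71.0, 85.0),
    SlotReading("gpu:0", 88.5, 83.0),
]
flagged = hot_slots(readings)
for slot in flagged:
    print(f"{slot}: hot, consider pausing this slot")
```

The check itself is trivial; the hard part, as discussed in this thread, is collecting the readings reliably across hardware and drivers.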
My room is always Hot.
victor_pp
Posts: 7
Joined: Sat May 02, 2020 10:02 pm

Re: [Feature request] Average temp on slots.

Post by victor_pp »

F@H really should take a page from the crypto programs and actually display useful basic information to the user when something goes wrong.
Is it that easy to define what is "wrong"? Absolute temperatures may be very relative - what's normal for a PC with a good cooling system might be terribly hot for a laptop.
zexmaxwell wrote: Sat Apr 22, 2023 8:26 pm Its mainly for monitoring remote clients. I've had a PC with a dead CPU water pump and all it did was thermal throttle the system. it never shut off, all it did was run reallly sloooowwww. I was hoping that something first party would be best in my case since I am too small for larger more specialized monitor programs and too large for v7 implementation.
Not all monitoring systems are enterprise-class complex. For remote clients under Linux, I can recommend https://www.monitorix.org/ . Your distribution may already have it in its repositories. It is very lightweight, you can set it up in a few minutes (one config file, optionally a separate nginx config), and it gives you all relevant information on one web page.
toTOW
Site Moderator
Posts: 6359
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: [Feature request] Average temp on slots.

Post by toTOW »

Everyone uses PPD to monitor their clients: if it gets abnormally low, then something is wrong with the system. It is the user's responsibility to investigate what's wrong.

FAH doesn't have the manpower to maintain a hardware monitoring application that would work on all hardware and keep up with every update (drivers, new hardware, ...). There are 3rd-party tools that do this far better (look at how many updates a tool like GPU-Z gets).

I never look at hardware monitoring unless I see abnormally low PPD or get unusual compute errors, but I always monitor my clients and regularly look at the summary: http://fahmon.fleucorp.fr/ (generated by HFM, also a 3rd-party tool).
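The "abnormally low PPD" check can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name, the sample numbers, and the 0.6 ratio are made up, and it is not how HFM or the client computes anything. Because PPD naturally differs from project to project, the comparison uses the median of recent values and a fairly loose ratio.

```python
from statistics import median


def ppd_looks_low(history, current, ratio=0.6):
    """Flag `current` PPD if it is below `ratio` times the median of `history`."""
    if not history:
        return False  # nothing to compare against yet
    return current < ratio * median(history)


# Made-up PPD values for one slot over recent work units.
history = [1_200_000, 1_150_000, 1_300_000, 1_250_000]
normal = ppd_looks_low(history, 1_180_000)
suspicious = ppd_looks_low(history, 500_000)
print(normal, suspicious)
```

A flag like this only says that something may be wrong; finding the cause (temperature, throttling, driver issues) still needs a look at the machine.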

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
calxalot
Site Moderator
Posts: 1117
Joined: Sat Dec 08, 2007 1:33 am
Location: San Francisco, CA

Re: [Feature request] Average temp on slots.

Post by calxalot »

Unfortunately, we need to wait for HFM to support v8.

The PPD estimate in v8 is not accurate yet and is not displayed in the WU table.
Peter_Hucker
Posts: 308
Joined: Wed Feb 16, 2022 1:18 am

Re: [Feature request] Average temp on slots.

Post by Peter_Hucker »

PPD is not accurate, as it varies per project. To make sure you're getting the best out of your machines, you need to monitor temperature and usage. The other day I found a laptop's temperature was bouncing off about 87C. It was thermally throttling and only working two-thirds as hard as it should. I took it apart and cleaned out fine dust from the heatsink that hadn't come out by blowing, and now it's working harder again.

And it is easy to define what's wrong: all chips are happy up to 80C. Above that there are problems with throttling or crashing.
calxalot
Site Moderator
Posts: 1117
Joined: Sat Dec 08, 2007 1:33 am
Location: San Francisco, CA

Re: [Feature request] Average temp on slots.

Post by calxalot »

Peter_Hucker wrote: Sun Apr 23, 2023 10:15 pm PPD is not accurate, as it varies per project.
Yes, but that is not the problem with v8.
There is a ticket for inaccurate PPD.
When available, it seems to always be an integer multiple of 86400.

So, not helpful for detecting problems.
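For context, 86400 is the number of seconds in a day (24 × 60 × 60), which is the factor that turns a points-per-second rate into points per day. The Python sketch below shows one way a value could end up as an exact multiple of 86400: truncating the rate to whole points per second before scaling. This is purely a guess for illustration, not a statement about the v8 code; the function names and numbers are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def ppd(points, seconds):
    """Points per day, given the points earned over `seconds` of runtime."""
    return points * SECONDS_PER_DAY / seconds


def ppd_truncated(points, seconds):
    """Hypothetical variant: rate truncated to whole points per second first."""
    return (points // seconds) * SECONDS_PER_DAY


# Made-up work unit: 150,000 points earned in 7,000 seconds.
points, seconds = 150_000, 7_000
exact = ppd(points, seconds)
coarse = ppd_truncated(points, seconds)
print(round(exact), coarse, coarse % SECONDS_PER_DAY)
```

An estimate that can only move in steps of 86400 is too coarse to reveal a modest slowdown such as thermal throttling.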
Peter_Hucker
Posts: 308
Joined: Wed Feb 16, 2022 1:18 am

Re: [Feature request] Average temp on slots.

Post by Peter_Hucker »

A ticket? I didn't say there was a bug. It's not meant to be a reproducible amount. A perfectly functioning GPU might make twice as much one day as the next, depending on what projects it's given.