
Core 17 and temp control

Posted: Fri Nov 08, 2013 9:16 pm
by ChristianVirtual
First of all: congratulations to the team on releasing core 17; a truly remarkable milestone, as it enables more computing power for the project. Very well done.

(http://folding.stanford.edu/home/change ... -full-fah/)


Now, in the release notes I read about temp control, and I wonder if there is also a way to actually read the current GPU temperature via the remote interface. I would really love to have that as a data point in slot info, so it can be presented in all kinds of front ends. As the core already reads the value and has some cool-down control built in, I think the effort to add the actual temp would be marginal, but it would offer valuable information to the end user.

Re: Core 17 and temp control

Posted: Fri Nov 08, 2013 11:25 pm
by 7im
I see it as progress, but some might say just the opposite. With core 11, 15, and 16 going end of life, there are lots of pre-Fermi GPUs that will eventually go idle, or on to other projects. A lot of older AMDs already have. Not sure how many of those are still around, but it's a counterweight to balance against the core 17 announcement.

Start with: nvidia-smi.exe

But all the OC tools for GPUs already display the temp in their tool tray, and have built-in alerts, fan ramping, etc. So for fah, it's a duplication of effort, and a bell/whistle not really needed. But if you're into building such things, check out NV's tools. Note the announcement about temps is for NV only. Not sure how AMD does it. You'd have to add code for each.

IMO, the temp code in fah for NV is a last line of defense, not a front line tool. I want my fans ramping up way before I consider shutting down the GPU via FAH. I also want downclocking before idling the GPU for a minimum of 15 minutes. And it only works in single GPU systems at this time, so us dual GPU users, who have more heat concerns, can't use it.

It's one more wrench in an already well stocked tool bucket. ;) Some may find it useful, others not.
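
To make the ordering described above concrete (fans first, then downclocking, with pausing the GPU as the last resort), here is a minimal Python sketch. The function name `thermal_action` and all three threshold values are placeholders invented for illustration; they are not FAH settings and not recommendations for any particular card.

```python
def thermal_action(temp_c, fan_ramp_c=65, downclock_c=75, pause_c=85):
    """Pick the mildest response that matches the current GPU temperature.

    All thresholds are hypothetical placeholders; tune them per card.
    """
    if temp_c >= pause_c:
        return "pause"       # last line of defense: stop folding on this GPU
    if temp_c >= downclock_c:
        return "downclock"   # reduce clocks before giving up on the work unit
    if temp_c >= fan_ramp_c:
        return "ramp_fans"   # first response: more airflow
    return "ok"
```

The checks run from the hottest tier downward, so a reading only reaches "pause" once the milder steps would no longer be enough.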

Re: Core 17 and temp control

Posted: Sat Nov 09, 2013 12:10 am
by ChristianVirtual
Sure, I use nvidia-smi on the console with a refresh loop every other second, and also via my Zabbix monitoring tool for graphing over the day. Works OK.
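
For anyone who wants the same readout in script form, here is a minimal Python sketch of that kind of polling loop. It assumes a driver whose nvidia-smi supports the `--query-gpu` option; the function names (`parse_gpu_temps`, `read_gpu_temps`, `watch`) are made up for illustration and are not part of FAH or of any NVIDIA tool.

```python
import subprocess
import time

# Ask nvidia-smi for one "index, temperature" line per GPU, without header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,temperature.gpu",
    "--format=csv,noheader,nounits",
]


def parse_gpu_temps(csv_text):
    """Turn lines like '0, 61' into a list of (gpu_index, temp_c) tuples."""
    readings = []
    for line in csv_text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 2:
            continue
        try:
            readings.append((int(parts[0]), int(parts[1])))
        except ValueError:
            # Skip GPUs that report no numeric temperature.
            continue
    return readings


def read_gpu_temps():
    """Run nvidia-smi once and return the parsed readings."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_temps(result.stdout)


def watch(interval_s=2.0):
    """Print all GPU temperatures every interval_s seconds until interrupted."""
    while True:
        for index, temp_c in read_gpu_temps():
            print(f"GPU {index}: {temp_c} C")
        time.sleep(interval_s)
```

Calling `watch()` gives the refresh-every-other-second view; `read_gpu_temps()` on its own is the piece a monitoring agent would call on its own schedule.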

But as the core has the value anyway, exposing it would streamline generalized monitoring tools, allowing a quick glance at temps whenever checking on folding progress. The effort to put it into the slot info should be fairly small, and adding a temp graph over the TPF bars on my iPad would be interesting.
I would not go so far as to control the fan via the remote interface; that could cause trouble or damage if used wrongly, so leave that to the OC tools for the specialists/geeks.

I didn't catch the single-GPU part of the news! Too bad. I'm even a triple-GPU folder; talk about heat. But it's getting cold outside, so happy folding :mrgreen:

Re: Core 17 and temp control

Posted: Sat Nov 09, 2013 12:34 am
by 7im
Even a small effort is a waste when it's a duplicated effort. Nice to have, not must have. And we have a lot of must have bugs to fix first.

It also has a very limited application value.
Only 1 GPU.
Must be NV GPU.
Doesn't work in all 3 OS types.

Tools they put in to the GUI should work for all types of GPUs, for multiple GPUs, and for all OS types. Until they can do that, it's a waste of time anyway. Someone should add this as a feature request ticket, but it's a long list.

Re: Core 17 and temp control

Posted: Sat Nov 09, 2013 3:45 am
by art_l_j_PlanetAMD64
7im wrote: Even a small effort is a waste when it's a duplicated effort. Nice to have, not must have. And we have a lot of must have bugs to fix first.

It also has a very limited application value.
Only 1 GPU.
Must be NV GPU.
Doesn't work in all 3 OS types.

Tools they put in to the GUI should work for all types of GPUs, for multiple GPUs, and for all OS types. Until they can do that, it's a waste of time anyway. Someone should add this as a feature request ticket, but it's a long list.

Very well said!

I don't really understand the 'want' (it's not a 'need') to have so much data being monitored remotely, especially for the GPU core temperatures.

For the 'temperature control' of the 46 GPUs in "The Farm", I rely on 3 things:
  1. Each GPU's own built-in (by NVIDIA and/or the GPU maker) 'clock frequency control', which does the job for me. You can see "The Farm" at this link.
  2. EVGA Precision X (which works with all makes of NVIDIA GPUs), where I set each GPU's 'Fan Speed' control to 'Manual', and then set the Fan Speed to get the GPU temps I want (65C maximum). This usually ends up with a Manual Fan Speed of anywhere from 80% to 95%. In my experience, over many years with NVIDIA GPU types from 9500 GTs to GTX Titans, I have found that the 'Auto' fan speed control is universally poor, regardless of the 'make' of NVIDIA GPU.
  3. Good 'internal' and 'external' airflow control is essential, as is described at this link.

Re: Core 17 and temp control

Posted: Sat Nov 09, 2013 5:01 am
by ChristianVirtual
On the other hand, this temp control function has already made it into the core ... working for NV cards only, and even then for single-GPU setups only. :?: The work is 90% done. I'm just asking to provide this collected value to the outside world.

And as for monitoring of temps ... there are less professional donors and setups out there. I would be one of those.
I use manual fan control on Linux and try to keep my three GPUs under 70C, but I have less control over ambient temps during the day. So it would still be great (yes, a "want") to get access to a data point that is already collected.