ASUS R904 G34
Moderators: Site Moderators, FAHC Science Team
-
- Posts: 2
- Joined: Thu Oct 14, 2021 4:40 am
ASUS R904 G34
Hi,
I obtained the server named above (4 processors, 64 cores in total, 128 GB RAM), which was part of a former supercomputer made up of 535 of these machines (287 TFLOPS peak).
The operating system is Ubuntu 20.04.3 LTS with all patches applied.
The installed F@H software is "fahclient_7.6.21_amd64.deb".
I can't post images here, but I can tell you that all processors are running at well above 80-85% load, which is consistent with the noise of the fans and the heat the machine gives off.
What puzzles me is that, even with all 64 cores reported as "in use" by FAHControl, this machine still takes 4 hours to complete a work unit...
Are there any hints for speeding things up, other than getting a faster machine? My idea was that with 64 cores it would take an hour or so to process one work unit. Or is this a wrong assumption on my part?
My 4-core desktop PC processes a work unit in about 5 hours.
Many thanks for any useful hints or advice, from Cologne, Germany
-
- Posts: 2522
- Joined: Mon Feb 16, 2009 4:12 am
- Location: Greenwood MS USA
Re: ASUS R904 G34
Welcome to Folding@Home!
F@H sizes the Work Unit based on the number of CPUs devoted to folding, so while both machines may take the same amount of time per WU, the one with more CPUs should be earning more Points Per Day, as it is working on more challenging proteins.
CPUs of different ages have different capabilities, so older CPUs may be slower per core.
On Windows, you would need tricks to use more than 32 CPUs; Linux should be fine.
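To make the wall-clock comparison concrete, a rough core-hours calculation using the figures from the first post (4 cores for about 5 hours versus 64 cores for about 4 hours) shows how much more CPU time the server spends per WU. This is a back-of-the-envelope sketch: it assumes every core is busy for the whole WU, and it cannot separate "larger WU" from "slower cores".

```python
# Rough comparison of compute spent per work unit, using the numbers from this thread.
# Assumption: both machines keep all listed cores busy for the entire WU.


def core_hours(cores: int, hours: float) -> float:
    """Total core-hours spent on one work unit."""
    return cores * hours


desktop = core_hours(cores=4, hours=5.0)   # 4-core desktop, ~5 h per WU
server = core_hours(cores=64, hours=4.0)   # 64-core R904 G34, ~4 h per WU

print(f"desktop: {desktop:.0f} core-hours per WU")
print(f"server:  {server:.0f} core-hours per WU")
print(f"ratio:   {server / desktop:.1f}x")
```

With these figures the server spends 256 core-hours per WU against the desktop's 20, roughly 13 times as much, so a similar wall-clock time does not mean a similar amount of work.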
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
Re: ASUS R904 G34
If you happened to get one of the monster CPU jobs, 4 hours isn't bad at all. I've had jobs that took around 36 hours on 14 threads (AMD 5800X), and the next one may be done within an hour. Jobs from different projects vary in size, depending on the number of atoms and the number of steps.
Ryzen 9800X3D / RTX 4090 / Windows 11
Ryzen 5600X / RTX 3070 Ti / Ubuntu 22.04
Ryzen 5600 / RTX 3060 Ti / Windows 11
-
- Posts: 65
- Joined: Sat May 09, 2020 2:13 pm
- Hardware configuration: Intel Xeon E3/E5, various generations from Westmere to Skylake. AMD Radeon RX5x00 and nVidia RTX 2080 Super.
- Location: Boston
Re: ASUS R904 G34
Are the CPUs Intel or AMD? 4 sockets, 64 cores: is that 32 cores with 64 threads, or 64 physical cores?
If AMD, it could be Interlagos (2011) or Abu Dhabi (2012).
If Intel, it would have to be the Haswell generation of Xeon E7 v3 (2015) or more recent.
The AMD cores of that era (prior to Zen) were weaker; Intel Haswell cores should be halfway decent. What was your PPD? FaH does seem to assign big jobs to high-core-count systems.
-
- Posts: 2
- Joined: Thu Oct 14, 2021 4:40 am
Re: ASUS R904 G34
Hi, thanks for the quick reply and all the detailed information.
There are 4 AMD processors in the machine with 16 cores each. They are of the "Interlagos" type.
You mentioned that FaH seems to have problems assigning jobs to high-core-count systems. Ubuntu's system management reports 64 processors, all at 80% load or higher, and this should be accurate, since the fans speed up considerably as soon as FaH starts automatically after system boot.
So despite the number of cores, per-core performance seems to be the issue, and my expectations were slightly off. I am testing some BIOS settings; perhaps I can squeeze some more performance out of the system. If anyone has a clue how to speed things up, comments are very welcome!
Regards
-
- Posts: 1996
- Joined: Sun Mar 22, 2020 5:52 pm
- Hardware configuration: 1: 2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
- Location: UK
Re: ASUS R904 G34
Some projects make really good use of large thread counts; others are far less scalable, and you will not get optimal throughput/PPD from them. You need to run a fair few projects to get a feel for your highs and lows in throughput/PPD.
Make sure you are monitoring temperatures and CPU boost speeds. It is perfectly possible to have all threads/cores running at maximum while thermals are reducing the clock rates by a significant amount; halving the core/thread count can cool the system off and raise clock speeds, giving little if any drop in throughput/PPD.
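On Linux, one quick way to see whether clocks drop under load is to read the per-CPU "cpu MHz" values from /proc/cpuinfo while folding and compare them with an idle reading. This is a minimal sketch, assuming an x86 Linux system whose /proc/cpuinfo exposes those lines; the helper names are hypothetical, and it is no substitute for reading actual temperatures (e.g. with lm-sensors).

```python
# Sketch: summarize per-CPU clock speeds on Linux to spot throttling while folding.
# Assumption: x86 Linux, where /proc/cpuinfo has one "cpu MHz" line per logical CPU.
from __future__ import annotations

from statistics import mean


def parse_cpu_mhz(cpuinfo_text: str) -> list[float]:
    """Return the 'cpu MHz' value of every logical CPU found in cpuinfo text."""
    speeds: list[float] = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cpu MHz":
            speeds.append(float(value))
    return speeds


def summarize(speeds: list[float]) -> str:
    """One-line min/mean/max summary, e.g. to compare idle vs. full-load readings."""
    if not speeds:
        return "no 'cpu MHz' lines found"
    return (f"{len(speeds)} CPUs, min {min(speeds):.0f} MHz, "
            f"mean {mean(speeds):.0f} MHz, max {max(speeds):.0f} MHz")


def read_local_speeds(path: str = "/proc/cpuinfo") -> list[float]:
    """Read the current clock of each logical CPU on the local machine."""
    with open(path) as f:
        return parse_cpu_mhz(f.read())
```

Running `print(summarize(read_local_speeds()))` once at idle and once in the middle of a WU gives a rough picture; a minimum that sits far below the CPU's rated base clock under load would be worth investigating alongside temperatures.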
Server-grade kit tends to be loud, and if it isn't configured properly it can be louder still. With Intel kit, checking the FRU/SDR is important, as otherwise the server may not actually know what configuration it is and may not be managing itself properly (including clocks/thermals). I guess AMD kit has something similar that needs to be configured for the server to run optimally. With some servers it is not just a case of BIOS settings, as they have their own management suite as well.
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070
(Green/Bold = Active)
Re: ASUS R904 G34
I'm wondering if making one CPU slot for each processor (4x 16 threads) might be a good idea, in case inter-CPU communication has to go through slow RAM or the hypervisor is moving threads between CPUs. One 64-thread slot should in theory be better, but with four 16-core CPUs instead of one 64-core Threadripper/Xeon (with a shared fast cache), I'm not so sure that running one slot is the optimal configuration.
If this is indeed the problem, the high CPU load might consist mainly of actively waiting on RAM/the bus for data from a thread on a different CPU, rather than actual processing.
Edit: Or perhaps some other kind of NUMA-related CPU affinity can be set at the OS level.
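As a sketch of what OS-level NUMA affinity could look like on Linux: the kernel lists each NUMA node's CPUs in sysfs, and Python's `os.sched_setaffinity` can restrict a thread to one node's CPUs. The sysfs path is the standard Linux layout, but the helper names are hypothetical. Note that `sched_setaffinity` applies to the given thread ID only: threads that already exist keep their old mask, and a process started later (such as the core for the next WU) only inherits the mask if the thread that launches it was pinned.

```python
# Sketch: map NUMA nodes to CPU IDs on Linux and optionally pin one thread to a node.
# Assumptions: the sysfs layout /sys/devices/system/node/node<N>/cpulist, and
# os.sched_setaffinity (Linux-only). Function names are hypothetical helpers.
from __future__ import annotations

import glob
import os
import re


def parse_cpulist(text: str) -> list[int]:
    """Expand a kernel cpulist such as '0-15,32-47' into individual CPU IDs."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_nodes(root: str = "/sys/devices/system/node") -> dict[int, list[int]]:
    """Return {node_id: [cpu ids]} as reported by sysfs (empty if root is missing)."""
    nodes: dict[int, list[int]] = {}
    for path in glob.glob(os.path.join(root, "node[0-9]*")):
        match = re.search(r"node(\d+)$", path)
        if match is None:
            continue
        with open(os.path.join(path, "cpulist")) as f:
            nodes[int(match.group(1))] = parse_cpulist(f.read())
    return dict(sorted(nodes.items()))


def pin_thread_to_node(tid: int, node_id: int) -> None:
    """Restrict one thread (tid 0 = the calling thread) to the CPUs of one NUMA node.

    Threads that already exist keep their affinity; threads and processes created
    afterwards by the pinned thread inherit the new mask.
    """
    os.sched_setaffinity(tid, numa_nodes()[node_id])
```

On a four-socket G34 board, `numa_nodes()` would be expected to report one entry per NUMA node (Interlagos packages may expose two nodes each). The command-line tools `numactl` (`--cpunodebind`, `--membind`) and `taskset` do the same job for a command you launch yourself.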
Online: GTX 1660 Super + occasional CPU folding in the cold.
Offline: Radeon HD 7770, GTX 1050 Ti 4G OC, RX580
Re: ASUS R904 G34
gunnarre wrote: I'm wondering if making one CPU slot for each of the processors might be a good idea (4x 16 threads) [...]
Exactly what I was going to suggest.
The main problem with assigning 1 WU to all cores is inter-core activity. Data written to the cache of a core on one CPU now has to travel over the significantly slower inter-socket link (HyperTransport on these Opterons) to be read by a thread on another CPU.
This is extremely inefficient.
Hence the suggestion to set up 4 CPU slots in the client, one for each physical CPU.
Also, leave about 1 thread per CPU free for background processing, unless the machine does nothing but fold. Even then, 15 threads per CPU (per WU) are plenty, and PPD will not be much lower than with 16 threads.
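The four-slot layout described above could be written into FAHClient 7's config.xml. The sketch below only generates the slot entries; the slot syntax (`<slot id='N' type='CPU'>` containing `<cpus v='N'/>`) is assumed to be the 7.x format and should be checked against your own config.xml before use. Going by the experiences reported in this thread, the client does not reliably keep a slot on one socket, so this sets thread counts only.

```python
# Sketch: generate FAHClient 7.x CPU slot entries, one slot per physical CPU.
# Assumptions: the config.xml slot syntax below (<slot id='N' type='CPU'> containing
# <cpus v='N'/>) and the 4 x 15 split suggested above; verify against your own config.xml.
# This only sets thread counts; it does not bind a slot to a particular socket.


def cpu_slots_xml(sockets: int, cores_per_socket: int, reserve_per_socket: int = 1) -> str:
    """Return one CPU slot per socket, leaving `reserve_per_socket` threads free on each."""
    threads = cores_per_socket - reserve_per_socket
    if threads < 1:
        raise ValueError("the reserve leaves no threads to fold with")
    lines = []
    for slot_id in range(sockets):
        lines.append(f"<slot id='{slot_id}' type='CPU'>")
        lines.append(f"  <cpus v='{threads}'/>")
        lines.append("</slot>")
    return "\n".join(lines)


# Four 16-core Opterons, one thread per socket left for the OS.
print(cpu_slots_xml(sockets=4, cores_per_socket=16))
```

The printed entries would go inside the `<config>` element in place of the existing single CPU slot.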
-
- Posts: 19
- Joined: Sat May 14, 2011 11:50 pm
Re: ASUS R904 G34
I have 2 Opteron 6276 systems. I used to run Ubuntu on them, but I now run Windows 10 Pro for Workstations.
I have pictures of them over on Overclockers.com; if you can post your pictures to a forum, you can then link them here.
Re: ASUS R904 G34
Have you tried the suggestion to make more CPU slots?
Re: ASUS R904 G34
I have actually tried this. It isn't sustainable in Windows at the moment: multiple CPU slots can end up on the same NUMA node on a Threadripper. I've also tried setting affinity manually, only to find the work back on node0 with the next WU.
Currently this only really works on Linux, with each CPU slot configured for threads/4 - 2 and only 2 slots running at a time, for the fastest result.
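The rule of thumb above works out as follows for a given thread count. This is just the quoted arithmetic from the poster's Threadripper/Linux setup, not a general recommendation, and it assumes "threads" means the total logical CPUs the OS reports; `slot_threads` is a hypothetical helper name.

```python
# Sketch of the slot sizing rule quoted above: threads/4 - 2 per CPU slot, 2 slots active.
# Assumption: "threads" means the total logical CPUs the OS reports (64 on this server).


def slot_threads(total_threads: int) -> int:
    """Threads per CPU slot using the threads/4 - 2 rule of thumb."""
    per_slot = total_threads // 4 - 2
    if per_slot < 1:
        raise ValueError("too few threads for this rule")
    return per_slot


total = 64
per_slot = slot_threads(total)
active_slots = 2
print(f"{per_slot} threads per slot, {active_slots * per_slot} of {total} threads folding")
```

For this 64-thread server that gives 14 threads per slot, with 28 of the 64 threads folding while 2 slots are active.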
-
- Posts: 2522
- Joined: Mon Feb 16, 2009 4:12 am
- Location: Greenwood MS USA
Re: ASUS R904 G34
https://www.amd.com/en/product/1546
https://www.cpu-world.com/CPUs/Bulldoze ... TGGGU.html
This may be your CPU.
I am guessing AVX is the fastest floating point math it knows.
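To confirm which SIMD extensions the installed CPUs actually advertise, the "flags" line of /proc/cpuinfo can be checked. A minimal sketch, assuming x86 Linux (the function name is a hypothetical helper); for an Interlagos Opteron one would expect avx, fma4 and xop to be present and avx2 to be absent.

```python
# Sketch: report which SIMD extensions the CPU advertises in /proc/cpuinfo "flags".
# Assumption: x86 Linux flag names; Interlagos-era Opterons should show avx, fma4
# and xop, but not avx2.
from __future__ import annotations

SIMD_FLAGS = ("sse2", "sse4_1", "sse4_2", "avx", "fma", "fma4", "xop", "avx2", "avx512f")


def simd_support(cpuinfo_text: str) -> dict[str, bool]:
    """Report which of SIMD_FLAGS appear on the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            present = set(value.split())
            return {flag: flag in present for flag in SIMD_FLAGS}
    return {flag: False for flag in SIMD_FLAGS}
```

Passing the contents of /proc/cpuinfo to `simd_support` and printing the result shows at a glance whether anything newer than AVX/FMA4 is available.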