13400 assigned to GPUs that are way too slow


MaartenBaert
Posts: 4
Joined: Sat Apr 25, 2020 1:04 pm

13400 assigned to GPUs that are way too slow

Post by MaartenBaert

I'm running F@H on 3 machines with the following GPUs:

- Nvidia NVS 310 (very slow)
- Nvidia GTX 660
- Nvidia GTX 1060

Today all three GPUs were assigned project 13400 with a 2-day deadline. However, the GTX 660 needs ~2.4 days to complete this WU and the NVS 310 needs ~26.8 days! The GTX 1060 needs ~16.3 hours, so that one's fine.

Are these WUs supposed to be that slow? If so, they probably shouldn't be assigned to weaker graphics cards ...
Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD RAID, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5, 32GB DDR4, NVMe, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: 13400 assigned to GPUs that are way too slow

Post by Neil-B

Caveat: time estimates in the early stages (before 5%) of a new project on a slot can be fairly anomalous.

The NVS may be getting a tad slow even for smaller WUs. Its specs look to be 48 shaders and only OpenCL 1.1 (but with double-precision FP), which, going by recent threads, might put it fairly close to retirement.

The other two are fairly old generation but still doing well, considering that 13400 (iirc) uses the latest GPU core and (again iirc) parts of OpenMM not used previously, so it doesn't surprise me that it pushes the cards a bit. You are right that the 660 probably shouldn't be sent this type of WU, but I'm not sure how much granularity of control the assignment server (AS) has over this (hopefully enough). I'm guessing folders with the latest GPUs are loving them.

What OS is each of the GPUs running under? It might be relevant to something I'm sure I read earlier today. Found it: viewtopic.php?f=19&t=34745&p=329658#p329658. This is obviously a very quickly changing situation, so that post may be outdated. Also see viewtopic.php?f=19&t=34745&p=329702#p329737, which explains that even with the issues they are having, it is still helping the science.

It may be that one of the team can walk you through the best way to "dump" the two WUs on the slower cards so that they get flagged for "immediate" reassignment.
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

MaartenBaert
Posts: 4
Joined: Sat Apr 25, 2020 1:04 pm

Re: 13400 assigned to GPUs that are way too slow

Post by MaartenBaert

The GTX 660 is running on Arch Linux, kernel version 5.6.6, Nvidia driver version 440.82, CPU is Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz.
The NVS 310 is running on CentOS 7, kernel 3.10.0, Nvidia driver version 390.116, CPU is Intel(R) Xeon(R) CPU E3-1271 v3 @ 3.60GHz.
The GTX 1060 is running on CentOS 7, kernel 3.10.0, Nvidia driver version 440.64, CPU is Intel(R) Xeon(R) CPU E3-1270 v6 @ 3.80GHz.

I should probably just disable the NVS 310; it has even less processing power than the CPU.

Edit: Would it be feasible to transfer the unfinished WU from the GTX 660 to the GTX 1060? Is it just a matter of copying the files or will that break things?
Joe_H
Site Admin
Posts: 7929
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: 13400 assigned to GPUs that are way too slow

Post by Joe_H

It is more than just copying the files. A WU and the other necessary files can be moved to a similar enough machine, but it gets complicated if you are already processing WUs on that machine. It is not something I can recommend.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Nuitari
Posts: 78
Joined: Sun Jun 09, 2019 4:03 am
Hardware configuration: 1x Nvidia 1050ti
1x Nvidia 1660Super
1x Nvidia GTX 660
1x Nvidia 1060 3gb
1x AMD rx570
2x AMD rx560
1x AMD Ryzen 7 PRO 1700
1x AMD Ryzen 7 3700X
1x AMD Phenom II
1x AMD A8-9600
1x Intel i5-4590S

Re: 13400 assigned to GPUs that are way too slow

Post by Nuitari

Got 13400 (42, 21, 4) assigned to a Radeon Baffin XT RX 560 (not Ellesmere), and the TPF is 24 minutes 28 seconds.
It's likely to complete midway between the timeout and the expiration.
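For anyone wanting to check this kind of estimate: a TPF (time per frame) converts to a total run time by multiplying by the number of frames. This is a minimal sketch assuming 100 frames per WU (one per 1% of progress, which matches the "Completed N out of M steps (X%)" log lines posted later in this thread), a constant TPF, a slot running 24/7, and no download/upload time. The function names are illustrative only, and the timeout/deadline values in the usage are placeholders, not the actual 13400 settings; read the real ones from your client's WU details.

```python
from datetime import timedelta

# Assumption: the WU is split into 100 frames (1% of progress each).
FRAMES_PER_WU = 100


def estimate_wu_time(tpf: timedelta, frames: int = FRAMES_PER_WU) -> timedelta:
    """Total run time implied by a constant time per frame (TPF)."""
    return tpf * frames


def classify(tpf: timedelta, timeout: timedelta, deadline: timedelta) -> str:
    """Where the projected finish falls relative to the timeout and final deadline."""
    total = estimate_wu_time(tpf)
    if total <= timeout:
        return "before timeout"
    if total <= deadline:
        return "after timeout, before final deadline"
    return "past final deadline"


if __name__ == "__main__":
    tpf = timedelta(minutes=24, seconds=28)
    # Placeholder values, NOT the real 13400 timeout/deadline.
    timeout, deadline = timedelta(days=1), timedelta(days=2)
    print(estimate_wu_time(tpf))  # 1 day, 16:46:40
    print(classify(tpf, timeout, deadline))
```

With the 24 min 28 s TPF above, this works out to about 1.7 days, which would land between the timeout and the deadline if those placeholder values were the real ones.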
lazyacevw
Posts: 35
Joined: Tue Mar 17, 2020 8:12 pm

Re: 13400 assigned to GPUs that are way too slow

Post by lazyacevw

Do you have your 310 or 660 set to client-type advanced? If so, you might want to remove that setting. I'm not sure, but it looks like 13400 is a beta task. My 1080 Ti is at 4 min 60 sec per fold, so about 6 or 7 hours.

https://stats.foldingathome.org/project?p=13400
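If you want to check whether a client-type is set without digging through the GUI, you can scan config.xml (on Linux packages this is typically /etc/fahclient/config.xml, but the location varies by OS and install). This is a minimal sketch, assuming the FAHClient v7 layout where `<client-type v='...'/>` sits either directly under `<config>` or inside a `<slot>` element; the function and script names are hypothetical, not part of any F@H tool.

```python
import sys
import xml.etree.ElementTree as ET


def find_client_type(config_xml: str) -> dict:
    """Collect client-type settings, both global and per slot.

    Assumes the FAHClient v7 layout: <client-type v='...'/> either directly
    under <config> or nested inside a <slot> element.
    """
    root = ET.fromstring(config_xml)
    found = {}
    # Global setting(s) directly under <config>.
    for elem in root.findall("client-type"):
        found["global"] = elem.get("v")
    # Per-slot overrides.
    for slot in root.findall("slot"):
        elem = slot.find("client-type")
        if elem is not None:
            found["slot " + str(slot.get("id"))] = elem.get("v")
    return found


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_client_type.py /path/to/config.xml
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as f:
            settings = find_client_type(f.read())
        print(settings or "no client-type override found")
```

An empty result means no client-type override was found in the file, so the client would be on its default setting.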
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 13400 assigned to GPUs that are way too slow

Post by bruce

Yes, the settings for project 13400 are being adjusted. It is a very large project and probably should be restricted to hardware faster than any of yours, except the GTX 1060, which should be able to handle it.
lazyacevw
Posts: 35
Joined: Tue Mar 17, 2020 8:12 pm

Re: 13400 assigned to GPUs that are way too slow

Post by lazyacevw

I like the larger/longer projects. Keeps the GPUs gainfully employed and reduces the number of connection requests to the work and collection servers.
MaartenBaert
Posts: 4
Joined: Sat Apr 25, 2020 1:04 pm

Re: 13400 assigned to GPUs that are way too slow

Post by MaartenBaert

All my clients are using default settings. The GTX 1060 has indeed completed the WU without issues.
lazyacevw
Posts: 35
Joined: Tue Mar 17, 2020 8:12 pm

Re: 13400 assigned to GPUs that are way too slow

Post by lazyacevw

If you are running stock settings, I guess the devs need to figure out a way to blacklist less powerful GPUs to avoid waste. All of my WUs so far today have been 7-hour tasks on my 1080 Tis. Not blacklisting those GPUs, or not making these beta tasks, will just cause the WUs to time out on less powerful GPUs.
JohnChodera
Pande Group Member
Posts: 467
Joined: Fri Feb 22, 2013 9:59 pm

Re: 13400 assigned to GPUs that are way too slow

Post by JohnChodera

Apologies for the extremely short deadline/timeout for 13400. This is a brand-new type of workload for us: relative binding free energy calculations using a new nonequilibrium integrator that exploits features just rolled out in core22 0.0.5.
We're still learning how to improve things, and there will be some hiccups in the first few projects (like 13400).
I've changed 13400 to collect-only, and we'll be making modifications to future iterations of this workload.
We collected a ton of useful data in this first trial that we'll use to make improvements.
Thanks again for bearing with us!

~ John Chodera // MSKCC
Nuitari
Posts: 78
Joined: Sun Jun 09, 2019 4:03 am
Hardware configuration: 1x Nvidia 1050ti
1x Nvidia 1660Super
1x Nvidia GTX 660
1x Nvidia 1060 3gb
1x AMD rx570
2x AMD rx560
1x AMD Ryzen 7 PRO 1700
1x AMD Ryzen 7 3700X
1x AMD Phenom II
1x AMD A8-9600
1x Intel i5-4590S

Re: 13400 assigned to GPUs that are way too slow

Post by Nuitari

The 1050 Ti is just a tad too slow to finish within the timeout, at about 1.2 days.
Theonlycure
Posts: 6
Joined: Mon Mar 23, 2020 11:53 pm

Re: 13400 assigned to GPUs that are way too slow

Post by Theonlycure

I have an RTX 2080 Ti, and WU 13400 is way too slow on it as well. I usually plow through work units and get near-max credit. This one, however, is a slug: only 45% finished, with an estimated 9 hours 42 minutes left. This would not bother me except that the points don't reflect how much time and electricity I am expending. Estimated credit: 317635. Very sad.
HaloJones
Posts: 906
Joined: Thu Jul 24, 2008 10:16 am

Re: 13400 assigned to GPUs that are way too slow

Post by HaloJones

Theonlycure wrote: I have an RTX 2080 Ti, and WU 13400 is way too slow on it as well. I usually plow through work units and get near-max credit. This one, however, is a slug: only 45% finished, with an estimated 9 hours 42 minutes left. This would not bother me except that the points don't reflect how much time and electricity I am expending. Estimated credit: 317635. Very sad.
Can you provide a little detail?

What OS?
What "client-type" do you have set? Advanced? Beta?
single 1070

Basti
Posts: 1
Joined: Mon Apr 27, 2020 10:56 am

Re: 13400 assigned to GPUs that are way too slow

Post by Basti

Ran into this today, too.
I did not change anything in the config.

Code:

# Project
13400

# os-release
PRETTY_NAME="Debian GNU/Linux 10 (buster)"

# CPU
model name      : Intel(R) Xeon(R) CPU E5-2430 v2 @ 2.50GHz

# GPU
08:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Polaris11] (rev ff) (prog-if 00 [VGA controller])
        Subsystem: Sapphire Technology Limited Baffin [Radeon RX 550 640SP / RX 560/560X] (Radeon RX 550 640SP)

# uptime
 13:02:42 up 15 days, 19:33,  1 user,  load average: 0,11, 0,08, 0,09

Code:

# log
06:42:10:WU01:FS01:0x22:Completed 1860000 out of 2000000 steps (93%)
07:13:58:WU01:FS01:0x22:Completed 1880000 out of 2000000 steps (94%)
07:45:44:WU01:FS01:0x22:Completed 1900000 out of 2000000 steps (95%)
07:45:53:WARNING:WU01:FS01:Past final deadline 2020-04-27T07:45:52Z, dumping
07:45:53:WU01:FS01:Shutting core down
07:45:53:WU01:FS01:0x22:Caught signal SIGINT(2) on PID 1642
07:45:53:WU01:FS01:0x22:Exiting, please wait. . .
07:45:53:WU01:FS01:0x22:Folding@home Core Shutdown: INTERRUPTED
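The timestamps on the progress lines in a log like this one can be used to measure the actual TPF. This is a minimal sketch, assuming the FAHClient v7 line format shown above (HH:MM:SS, WU and FS slot tags, core id, then "Completed N out of M steps"); the function names are illustrative, not part of any F@H tool, and it assumes consecutive progress lines are one frame apart.

```python
import re
from datetime import timedelta
from typing import List

# Matches progress lines such as
# "06:42:10:WU01:FS01:0x22:Completed 1860000 out of 2000000 steps (93%)".
PROGRESS = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}):WU\d+:FS\d+:0x[0-9a-fA-F]+:Completed \d+ out of \d+ steps"
)


def frame_times(log_text: str) -> List[timedelta]:
    """Time between consecutive progress lines, i.e. the observed TPF per frame."""
    stamps = []
    for line in log_text.splitlines():
        m = PROGRESS.match(line.strip())
        if m:
            h, mi, s = (int(g) for g in m.groups())
            stamps.append(timedelta(hours=h, minutes=mi, seconds=s))
    deltas = []
    for prev, cur in zip(stamps, stamps[1:]):
        d = cur - prev
        if d < timedelta(0):  # HH:MM:SS stamps wrap around at midnight
            d += timedelta(days=1)
        deltas.append(d)
    return deltas


def average_tpf(log_text: str) -> timedelta:
    """Mean TPF over all consecutive progress-line pairs."""
    deltas = frame_times(log_text)
    if not deltas:
        raise ValueError("need at least two progress lines")
    return sum(deltas, timedelta(0)) / len(deltas)
```

On the three progress lines above, this gives about 31 min 47 s per frame, or roughly 53 hours for 100 frames. That fits with this WU still being at 95% when the final deadline passed.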

Code:

# config.xml
<config>
  <!-- Client Control -->
  <fold-anon v='true'/>

  <!-- HTTP Server -->
  <allow v='10.20.30.20'/>

  <!-- Network -->
  <proxy v=':8080'/>

  <!-- Remote Command Server -->
  <password v='*'/>

  <!-- Slot Control -->
  <power v='full'/>

  <!-- User Information -->
  <passkey v='*'/>
  <team v='*'/>
  <user v='*'/>

  <!-- Folding Slots -->
  <slot id='0' type='CPU'/>
  <slot id='1' type='GPU'/>
</config>