18XXX series work units assigned to slow GPU's/iGPU's

Moderators: Site Moderators, FAHC Science Team

BobWilliams757
Posts: 520
Joined: Fri Apr 03, 2020 2:22 pm
Hardware configuration: ASRock X370M PRO4
Ryzen 2400G APU
16 GB DDR4-3200
MSI GTX 1660 Super Gaming X

18XXX series work units assigned to slow GPU's/iGPU's

Post by BobWilliams757 »

I wanted to report this, and I have seen others mention it lately. I can't be certain it applies to every work unit in this series, but the work units from projects 18003-18107 all appear to share the same deadlines and cause.

Most of these take 30+ hours on my system, and thus don't make the timeout. In the case of 18010, I've only received one, and it wouldn't even have met the two-day expiration. For that reason it was dumped.

I've also noticed that a number of these have been reassigned well before the timeout is reached. I had never noticed that before, mostly because my system used to meet the timeouts.

I have no idea if my iGPU is somehow classified in a species with very wide capabilities, but even if so, I have to think that others are in the same boat. If I'm folding 24/7 and can't meet the timeouts, it's just slowing things down for the science.

I realize that there are a lot of wide and fast GPUs coming into play lately, but this thing hasn't had any issues meeting deadlines for the past 16 months or so. In this case I tend to think that it was simply assigned to a group of cards that can't handle these work units in time.


It's a 2400G with Vega 11 graphics. If anyone needs any logs or anything just let me know.
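
For anyone who wants to check their own card, the deadline arithmetic can be sketched roughly as follows. This is only an illustration under stated assumptions: progress is assumed to be reported in 100 frames of 1% each, the 18-minute time per frame (TPF) is a made-up value chosen to match a 30-hour run, and the one-day timeout is hypothetical (only the two-day expiration is mentioned in this post). Substitute the real values for the project in question.

```python
from datetime import timedelta


def projected_runtime(tpf: timedelta, frames: int = 100) -> timedelta:
    """Project the total run time of a work unit from its time per frame."""
    return tpf * frames


def makes_deadline(tpf: timedelta, deadline: timedelta, frames: int = 100) -> bool:
    """Return True if the projected run time fits inside the deadline."""
    return projected_runtime(tpf, frames) <= deadline


# Hypothetical numbers for illustration only.
tpf = timedelta(minutes=18)     # assumed TPF, equivalent to a 30-hour work unit
timeout = timedelta(days=1)     # assumed timeout, not taken from the project
expiration = timedelta(days=2)  # two-day expiration mentioned in the post

runtime = projected_runtime(tpf)
print("Projected run time (hours):", runtime.total_seconds() / 3600)
print("Makes timeout:", makes_deadline(tpf, timeout))
print("Makes expiration:", makes_deadline(tpf, expiration))
```

With these assumed numbers the unit would miss the timeout but still finish before the expiration, which is the situation where the work may end up being done twice.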
Fold them if you get them!
psaam0001
Posts: 378
Joined: Mon May 18, 2020 2:02 am
Location: Ruckersville, Virginia, USA

Re: 18XXX series work units assigned to slow GPU's/iGPU's

Post by psaam0001 »

I have been getting them on my GT 1030s.

Here is my recent log covering the start of an example WU from my Fedora desktop:

Code:

09:54:19:WU00:FS01:Connecting to assign1.foldingathome.org:80
09:54:20:WU00:FS01:Assigned to work server 34.72.228.44
09:54:20:WU00:FS01:Requesting new work unit for slot 01: gpu:3:0 GP108 [GeForce GT 1030] 1127 from 34.72.228.44
09:54:20:WU00:FS01:Connecting to 34.72.228.44:8080
09:54:20:WU00:FS01:Downloading 5.54MiB
09:54:21:WU00:FS01:Download complete
09:54:21:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:18018 run:2 clone:18 gen:66 core:0x22 unit:0x00000012000000420000466200000002
09:54:26:WU01:FS01:0x22:Saving result file ../logfile_01.txt
09:54:26:WU01:FS01:0x22:Saving result file checkpointIntegrator.xml.bz2
09:54:26:WU01:FS01:0x22:Saving result file checkpointState.xml.bz2
09:54:26:WU01:FS01:0x22:Saving result file positions.xtc
09:54:26:WU01:FS01:0x22:Saving result file science.log
09:54:26:WU01:FS01:0x22:Folding@home Core Shutdown: FINISHED_UNIT
09:54:27:WU01:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
09:54:27:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:16474 run:1 clone:102 gen:3 core:0x22 unit:0x00000066000000030000405a00000001
09:54:27:WU01:FS01:Uploading 13.46MiB to 140.163.4.200
09:54:27:WU01:FS01:Connecting to 140.163.4.200:8080
09:54:27:WU00:FS01:Starting
09:54:27:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit/22-0.0.13/Core_22.fah/FahCore_22 -dir 00 -suffix 01 -version 706 -lifeline 1738 -checkpoint 30 -opencl-platform 0 -opencl-device 1 -cuda-device 1 -gpu-vendor nvidia -gpu 1 -gpu-usage 100
09:54:27:WU00:FS01:Started FahCore on PID 51754
09:54:27:WU00:FS01:Core PID:51758
09:54:27:WU00:FS01:FahCore 0x22 started
09:54:28:WU00:FS01:0x22:*********************** Log Started 2021-08-31T09:54:27Z ***********************
09:54:28:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
09:54:28:WU00:FS01:0x22:       Core: Core22
09:54:28:WU00:FS01:0x22:       Type: 0x22
09:54:28:WU00:FS01:0x22:    Version: 0.0.13
09:54:28:WU00:FS01:0x22:     Author: Joseph Coffland <[email protected]>
09:54:28:WU00:FS01:0x22:  Copyright: 2020 foldingathome.org
09:54:28:WU00:FS01:0x22:   Homepage: https://foldingathome.org/
09:54:28:WU00:FS01:0x22:       Date: Sep 19 2020
09:54:28:WU00:FS01:0x22:       Time: 01:10:35
09:54:28:WU00:FS01:0x22:   Revision: 571cf95de6de2c592c7c3ed48fcfb2e33e9ea7d3
09:54:28:WU00:FS01:0x22:     Branch: core22-0.0.13
09:54:28:WU00:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
09:54:28:WU00:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
09:54:28:WU00:FS01:0x22:             -funroll-loops -DOPENMM_GIT_HASH="\"189320d0\""
09:54:28:WU00:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
09:54:28:WU00:FS01:0x22:       Bits: 64
09:54:28:WU00:FS01:0x22:       Mode: Release
09:54:28:WU00:FS01:0x22:Maintainers: John Chodera <[email protected]> and Peter Eastman
09:54:28:WU00:FS01:0x22:             <[email protected]>
09:54:28:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 706 -lifeline 51754 -checkpoint 30
09:54:28:WU00:FS01:0x22:             -opencl-platform 0 -opencl-device 1 -cuda-device 1 -gpu-vendor
09:54:28:WU00:FS01:0x22:             nvidia -gpu 1 -gpu-usage 100
09:54:28:WU00:FS01:0x22:************************************ libFAH ************************************
09:54:28:WU00:FS01:0x22:       Date: Sep 15 2020
09:54:28:WU00:FS01:0x22:       Time: 05:14:43
09:54:28:WU00:FS01:0x22:   Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
09:54:28:WU00:FS01:0x22:     Branch: HEAD
09:54:28:WU00:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
09:54:28:WU00:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
09:54:28:WU00:FS01:0x22:             -funroll-loops
09:54:28:WU00:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
09:54:28:WU00:FS01:0x22:       Bits: 64
09:54:28:WU00:FS01:0x22:       Mode: Release
09:54:28:WU00:FS01:0x22:************************************ CBang *************************************
09:54:28:WU00:FS01:0x22:       Date: Sep 15 2020
09:54:28:WU00:FS01:0x22:       Time: 05:11:04
09:54:28:WU00:FS01:0x22:   Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
09:54:28:WU00:FS01:0x22:     Branch: HEAD
09:54:28:WU00:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
09:54:28:WU00:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
09:54:28:WU00:FS01:0x22:             -funroll-loops -fPIC
09:54:28:WU00:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
09:54:28:WU00:FS01:0x22:       Bits: 64
09:54:28:WU00:FS01:0x22:       Mode: Release
09:54:28:WU00:FS01:0x22:************************************ System ************************************
09:54:28:WU00:FS01:0x22:        CPU: AMD Ryzen 9 3950X 16-Core Processor
09:54:28:WU00:FS01:0x22:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
09:54:28:WU00:FS01:0x22:       CPUs: 32
09:54:28:WU00:FS01:0x22:     Memory: 62.72GiB
09:54:28:WU00:FS01:0x22:Free Memory: 55.31GiB
09:54:28:WU00:FS01:0x22:    Threads: POSIX_THREADS
09:54:28:WU00:FS01:0x22: OS Version: 5.13
09:54:28:WU00:FS01:0x22:Has Battery: false
09:54:28:WU00:FS01:0x22: On Battery: false
09:54:28:WU00:FS01:0x22: UTC Offset: -4
09:54:28:WU00:FS01:0x22:        PID: 51758
09:54:28:WU00:FS01:0x22:        CWD: /var/lib/fahclient/work
09:54:28:WU00:FS01:0x22:************************************ OpenMM ************************************
09:54:28:WU00:FS01:0x22:   Revision: 189320d0
09:54:28:WU00:FS01:0x22:********************************************************************************
09:54:28:WU00:FS01:0x22:Project: 18018 (Run 2, Clone 18, Gen 66)
09:54:28:WU00:FS01:0x22:Unit: 0x00000000000000000000000000000000
09:54:28:WU00:FS01:0x22:Reading tar file core.xml
09:54:28:WU00:FS01:0x22:Reading tar file integrator.xml.bz2
09:54:28:WU00:FS01:0x22:Reading tar file state.xml.bz2
09:54:28:WU00:FS01:0x22:Reading tar file system.xml.bz2
09:54:28:WU00:FS01:0x22:Digital signatures verified
09:54:28:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
09:54:28:WU00:FS01:0x22:Version 0.0.13
09:54:28:WU00:FS01:0x22:  Checkpoint write interval: 250000 steps (5%) [20 total]
09:54:28:WU00:FS01:0x22:  JSON viewer frame write interval: 50000 steps (1%) [100 total]
09:54:28:WU00:FS01:0x22:  XTC frame write interval: 250000 steps (5%) [20 total]
09:54:28:WU00:FS01:0x22:  Global context and integrator variables write interval: disabled
09:54:28:WU00:FS01:0x22:There are 4 platforms available.
09:54:28:WU00:FS01:0x22:Platform 0: Reference
09:54:28:WU00:FS01:0x22:Platform 1: CPU
09:54:28:WU00:FS01:0x22:Platform 2: OpenCL
09:54:28:WU00:FS01:0x22:  opencl-device 1 specified
09:54:28:WU00:FS01:0x22:Platform 3: CUDA
09:54:28:WU00:FS01:0x22:  cuda-device 1 specified
09:54:31:WU00:FS01:0x22:Attempting to create CUDA context:
09:54:31:WU00:FS01:0x22:  Configuring platform CUDA
09:54:33:WU01:FS01:Upload complete
09:54:33:WU01:FS01:Server responded WORK_ACK (400)
09:54:33:WU01:FS01:Final credit estimate, 87398.00 points
09:54:33:WU01:FS01:Cleaning up
09:54:36:WU00:FS01:0x22:  Using CUDA and gpu 1
09:54:36:WU00:FS01:0x22:Completed 0 out of 5000000 steps (0%)
09:54:36:WU00:FS01:0x22:Checkpoint completed at step 0 
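
When comparing notes, it can help to pull the project, run, clone and gen out of a log like the one above instead of reading it by eye. The following is a minimal sketch, assuming the "Received Unit" line keeps the `project:… run:… clone:… gen:…` layout shown in this log; the function name and the series range (taken from the first post) are illustrative, not part of the FAH client.

```python
import re

# Matches the PRCG fields as they appear on the client's "Received Unit" line.
PRCG_RE = re.compile(r"project:(\d+) run:(\d+) clone:(\d+) gen:(\d+)")

# Project range reported in the first post (18003-18107, inclusive).
SERIES = range(18003, 18108)


def find_received_units(log_lines):
    """Return (project, run, clone, gen) for every 'Received Unit' line."""
    units = []
    for line in log_lines:
        if "Received Unit" not in line:
            continue  # skip uploads, core output, and everything else
        match = PRCG_RE.search(line)
        if match:
            units.append(tuple(int(field) for field in match.groups()))
    return units


# Two lines copied from the log above: one download and one upload.
sample = [
    "09:54:21:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR "
    "project:18018 run:2 clone:18 gen:66 core:0x22 "
    "unit:0x00000012000000420000466200000002",
    "09:54:27:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR "
    "project:16474 run:1 clone:102 gen:3 core:0x22 "
    "unit:0x00000066000000030000405a00000001",
]

units = find_received_units(sample)
in_series = [unit for unit in units if unit[0] in SERIES]
print(units)
print(in_series)
```

Only the download line is picked up, so the finished 16474 unit being uploaded in the same log does not get mixed in with the newly assigned 18018 unit.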
Paul
toTOW
Site Moderator
Posts: 6359
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: 18XXX series work units assigned to slow GPU's/iGPU's

Post by toTOW »

With AMD GPUs, the problem is that the categories are basically supported/unsupported ... there's not much room for classification based on performance ...

Paul, does your GT 1030 make the deadlines? (We have better classification on NVIDIA GPUs, and it might be possible to exclude it from these projects.)

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
psaam0001
Posts: 378
Joined: Mon May 18, 2020 2:02 am
Location: Ruckersville, Virginia, USA

Re: 18XXX series work units assigned to slow GPU's/iGPU's

Post by psaam0001 »

It needs roughly an extra half day to a full day, depending on which project comes from the 34.72.228.44 server. That is my best estimate; others with GT 1030s may have different experiences, as I'm not overclocking my system.

Paul
BobWilliams757
Posts: 520
Joined: Fri Apr 03, 2020 2:22 pm
Hardware configuration: ASRock X370M PRO4
Ryzen 2400G APU
16 GB DDR4-3200
MSI GTX 1660 Super Gaming X

Re: 18XXX series work units assigned to slow GPU's/iGPU's

Post by BobWilliams757 »

toTOW wrote: With AMD GPUs, the problem is that the categories are basically supported/unsupported ... there's not much room for classification based on performance ...

Paul, does your GT 1030 make the deadlines? (We have better classification on NVIDIA GPUs, and it might be possible to exclude it from these projects.)

Thanks for the response, toTOW. I didn't realize AMD gear was lumped together across such a wide spread of performance.

Unless you've forwarded this information and some exclusions have taken place, I've just gotten lucky and received a couple of work units outside this series lately. It's nice knowing that I'm doing work that will meet the timeout and won't have to be done twice.
Fold them if you get them!
psaam0001
Posts: 378
Joined: Mon May 18, 2020 2:02 am
Location: Ruckersville, Virginia, USA

Re: 18XXX series work units assigned to slow GPU's/iGPU's

Post by psaam0001 »

psaam0001 wrote: It needs roughly an extra half day to a full day, depending on which project comes from the 34.72.228.44 server. That is my best estimate; others with GT 1030s may have different experiences, as I'm not overclocking my system.

@toTOW: The slowest card I have that will complete anything from the previously listed server (in that group of projects) is my GTX 1050 Ti. I have not gotten another unit from it for the iGPU in my Ryzen 3 within the last couple of days, so I'll let you know what happens ASAP.

Paul
BobWilliams757
Posts: 520
Joined: Fri Apr 03, 2020 2:22 pm
Hardware configuration: ASRock X370M PRO4
Ryzen 2400G APU
16 GB DDR4-3200
MSI GTX 1660 Super Gaming X

Re: 18XXX series work units assigned to slow GPU's/iGPU's

Post by BobWilliams757 »

psaam0001 wrote:
psaam0001 wrote: It needs roughly an extra half day to a full day, depending on which project comes from the 34.72.228.44 server. That is my best estimate; others with GT 1030s may have different experiences, as I'm not overclocking my system.
@toTOW: The slowest card I have that will complete anything from the previously listed server (in that group of projects) is my GTX 1050 Ti. I have not gotten another unit from it for the iGPU in my Ryzen 3 within the last couple of days, so I'll let you know what happens ASAP.

Paul

I was getting nothing but WUs in that series, but I haven't received any since I made the original post. Either they changed something or I'm just hitting the jackpot. Either way, I'm back to work units where I meet the timeouts.
Fold them if you get them!
psaam0001
Posts: 378
Joined: Mon May 18, 2020 2:02 am
Location: Ruckersville, Virginia, USA

Re: 18XXX series work units assigned to slow GPU's/iGPU's

Post by psaam0001 »

The Ryzen 3 3200G's Vega 8 iGPU is doing fine on one of the 18018 WUs as I speak.

Paul
BobWilliams757
Posts: 520
Joined: Fri Apr 03, 2020 2:22 pm
Hardware configuration: ASRock X370M PRO4
Ryzen 2400G APU
16 GB DDR4-3200
MSI GTX 1660 Super Gaming X

Re: 18XXX series work units assigned to slow GPU's/iGPU's

Post by BobWilliams757 »

psaam0001 wrote: The Ryzen 3 3200G's Vega 8 iGPU is doing fine on one of the 18018 WUs as I speak.

Paul

I guess I just got lucky for a few days, as I'm chipping away at an 18018 WU as we speak. :mrgreen:
Fold them if you get them!
psaam0001
Posts: 378
Joined: Mon May 18, 2020 2:02 am
Location: Ruckersville, Virginia, USA

Re: 18XXX series work units assigned to slow GPU's/iGPU's

Post by psaam0001 »

And BTW: I have figured out how to adjust the performance/fan speed controls in Fedora... at least for the cards I use that have them (and where they actually work).

Paul
BobWilliams757
Posts: 520
Joined: Fri Apr 03, 2020 2:22 pm
Hardware configuration: ASRock X370M PRO4
Ryzen 2400G APU
16 GB DDR4-3200
MSI GTX 1660 Super Gaming X

Re: 18XXX series work units assigned to slow GPU's/iGPU's

Post by BobWilliams757 »

Bah! I had a string of luck that kept me away from these WUs for quite a while. I was getting stuff that had longer timeouts but was slow as far as PPD goes... but at least I knew the work wasn't being done twice.

But I just picked up an 18023. I'm going to let it run, but I'm sure that before I finish it, it will have been picked up and completed by a faster machine.


Oh well, it was good while it lasted!
Fold them if you get them!
psaam0001
Posts: 378
Joined: Mon May 18, 2020 2:02 am
Location: Ruckersville, Virginia, USA

Re: 18XXX series work units assigned to slow GPU's/iGPU's

Post by psaam0001 »

BobWilliams757 wrote: Bah! I had a string of luck that kept me away from these WUs for quite a while. I was getting stuff that had longer timeouts but was slow as far as PPD goes... but at least I knew the work wasn't being done twice.

But I just picked up an 18023. I'm going to let it run, but I'm sure that before I finish it, it will have been picked up and completed by a faster machine.

Oh well, it was good while it lasted!

I have pulled my AMD iGPU from FAH (as of 11/5/2021) until the driver performance issues are resolved to the satisfaction of the FAH community members who have had problems with the latest versions. I was running okay with the driver version that Windows 10 Home's update installs when none is present, but the system that iGPU is in is also used by my dad, who has a habit of leaving unused browser windows open (he's visually impaired, so he considers that practice a convenience).

Strangely enough, I'm pushing near 55K a day on my GT 1030 since the last updates for Windows 10 were installed. That will vary of course.

Paul
BobWilliams757
Posts: 520
Joined: Fri Apr 03, 2020 2:22 pm
Hardware configuration: ASRock X370M PRO4
Ryzen 2400G APU
16 GB DDR4-3200
MSI GTX 1660 Super Gaming X

Re: 18XXX series work units assigned to slow GPU's/iGPU's

Post by BobWilliams757 »

I might have to pull mine as well until things sort themselves out. It doesn't look like there are a lot of WUs in this series left to be done, so waiting it out might be the best option. It's the only series where I've had issues meeting the timeouts thus far.

As for drivers, I haven't had any issues on my system. PPD variations seem about normal, but no big hits that I can see.



I have noticed, however, that work units in this series have often been reassigned before the timeout. I just had one complete that was assigned to me before the previous user had reached the timeout period. Although it turned out that the other user's WU failed, it was assigned to me about 17 hours after it was assigned to the first folder.
Fold them if you get them!
gunnarre
Posts: 559
Joined: Sun May 24, 2020 7:23 pm
Location: Norway

Re: 18XXX series work units assigned to slow GPU's/iGPU's

Post by gunnarre »

A failed WU will be re-assigned to new persons without waiting for timeout.
Online: GTX 1660 Super + occasional CPU folding in the cold.
Offline: Radeon HD 7770, GTX 1050 Ti 4G OC, RX580
BobWilliams757
Posts: 520
Joined: Fri Apr 03, 2020 2:22 pm
Hardware configuration: ASRock X370M PRO4
Ryzen 2400G APU
16 GB DDR4-3200
MSI GTX 1660 Super Gaming X

Re: 18XXX series work units assigned to slow GPU's/iGPU's

Post by BobWilliams757 »

gunnarre wrote: A failed WU will be re-assigned to new persons without waiting for timeout.

Understood, but the failed work unit hadn't been returned yet. In this instance, the WU was picked up by a third person when I didn't meet the timeout, and then a fourth person as well.

User            Team  CPUID             Credit   Assigned             Returned             Credited             Days   Code
Anonymous       0     B7D6E95E485DC3E8  0        2021-11-05 07:14:30  2021-11-07 01:26:28  2021-11-07 01:22:09  1.755  Failed
BobWilliams757  0     18AD785E828F2FA4  95,000   2021-11-06 00:26:12  2021-11-07 05:46:20  2021-11-07 05:38:45  1.217  Ok
Zenop           0     8541765E00105F18  229,235  2021-11-07 00:36:37  2021-11-07 06:56:32  2021-11-07 06:47:38  0.258  Ok
Anonymous       0     D1E2F45F08AA1448  95,000   2021-11-07 01:22:45  2021-11-08 03:46:26  2021-11-08 03:38:00  1.094  Ok

The fourth folder picked it up only about an hour after the third. And since I picked it up about 6 hours before the first timeout, either way they are being reassigned early.
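
The gaps between assignments can be worked out directly from the Assigned column of the table above. The snippet below is a small sketch of that arithmetic; the timestamps are copied from the table, while the labels "(first)" and "(fourth)" and the function name are only there for illustration.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (user, assigned) pairs copied from the table, in assignment order.
assignments = [
    ("Anonymous (first)", "2021-11-05 07:14:30"),
    ("BobWilliams757", "2021-11-06 00:26:12"),
    ("Zenop", "2021-11-07 00:36:37"),
    ("Anonymous (fourth)", "2021-11-07 01:22:45"),
]


def assignment_gaps(rows):
    """Return (user, gap) pairs, where gap is the time since the previous assignment."""
    times = [(user, datetime.strptime(stamp, FMT)) for user, stamp in rows]
    return [
        (user, assigned - previous)
        for (_, previous), (user, assigned) in zip(times, times[1:])
    ]


gaps = assignment_gaps(assignments)
for user, gap in gaps:
    print(f"{user}: assigned {gap} after the previous folder")
```

With the table's values this comes to roughly 17.2 hours before the second assignment, 24.2 hours before the third, and 46 minutes before the fourth.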


I'm just going to wait until this series of WUs is no longer being assigned, but it seems that something isn't working correctly unless the policy on assignments has been changed.
Fold them if you get them!