Failed GPU slot daily :(
Moderators: Site Moderators, FAHC Science Team
Re: Failed GPU slot daily :(
I edited my earlier post so it contains new information.
Your GPU is clearly having trouble with that WU. I would not leave it running. Let's start by simply pausing your GPU. Go to FAHControl and in the middle of the initial screen, you'll see a small chart called "Folding Slots" and another called "Work Queue". In the upper chart, you'll see two green "Running" words, one called cpu and one called gpu. Right-click on the green gpu slot Status flag and select Pause.
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Posts: 33
- Joined: Sat Aug 22, 2020 4:37 pm
Re: Failed GPU slot daily :(
Thanks bruce.
bruce wrote: Yes, and it's a good sign. FAH is running two WUs that run independently of each other, one on your CPU and one on your GPU. The output logs are intermixed so you have to learn to read the combined output or use the filtering function that's built into FAHControl....
So far, what's weird is, I'm still noticing that the PRCG is changing from 13422 (2975, 69, 2) to PRCG 16918 (4, 50, 39), then right back to PRCG 13422 (2975, 69, 2) again, and repeat. Is that a sign of anything?
FAHCore 0xa7 is running WU00 on slot FS00 using the avx-256 hardware feature. Independently, FAHCore 0x22 is running (or trying to) WU01 using slot FS01.
Code:
20:18:34:WU00:FS00:0xa7: SIMD: avx_256
20:18:34:WU00:FS00:0xa7:Project: 14824 (Run 1225, Clone 3, Gen 52)
...
20:18:37:WU01:FS01:0x22:Project: 13422 (Run 3163, Clone 20, Gen 0)
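For anyone who would rather untangle the intermixed log outside FAHControl, here is a minimal sketch in Python. It assumes every relevant line carries the `HH:MM:SS:WUnn:FSnn:0xNN:` prefix seen in the excerpt above; `split_by_slot` is a hypothetical helper name, not something FAH provides.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the log excerpt in this thread:
#   20:18:34:WU00:FS00:0xa7: SIMD: avx_256
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):"
    r"(?P<wu>WU\d+):(?P<slot>FS\d+):(?P<core>0x[0-9a-fA-F]+):\s*(?P<msg>.*)$"
)


def split_by_slot(lines):
    """Group (time, message) pairs by folding slot, e.g. 'FS00' and 'FS01'.

    Hypothetical helper for illustration; lines without a WU/FS prefix
    are skipped.
    """
    by_slot = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            by_slot[match.group("slot")].append(
                (match.group("time"), match.group("msg"))
            )
    return dict(by_slot)


if __name__ == "__main__":
    sample = [
        "20:18:34:WU00:FS00:0xa7: SIMD: avx_256",
        "20:18:34:WU00:FS00:0xa7:Project: 14824 (Run 1225, Clone 3, Gen 52)",
        "20:18:37:WU01:FS01:0x22:Project: 13422 (Run 3163, Clone 20, Gen 0)",
    ]
    for slot, entries in sorted(split_by_slot(sample).items()):
        print(slot)
        for stamp, msg in entries:
            print("  ", stamp, msg)
```

With the three sample lines, the CPU messages end up under FS00 and the GPU message under FS01, which is the same separation the FAHControl filter gives you.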
Re: Failed GPU slot daily :(
I don't have a good explanation as to why the Hawaii [Radeon R9 200/300 Series] is having trouble with Project 13422 (2975, 69, 2) but it is ... and it looks like it's looping. That's not good so I suggested the Pause. The log should continue to show (only) what's happening to the CPU assignment and it will be a lot easier to follow.
Then we'll figure out what to do with the GPU.
Your GPU is running Core: Core22 Version: 0.0.11. It has been going through a process of bug fixing and I think you've found a new one. The developer is on the east coast, so I don't think we can contact him at 01:00 EST.
That leaves us 2 choices. 1) tell you how to dump the WU and hope that it's replaced with something your GPU can process or 2) wait until morning in NYC and let him recommend a way to figure out what's going on.
If you choose 2, pausing is actually better than leaving it running (wasting GPU power and gathering useless repeated messages).
https://apps.foldingathome.org/cpu shows that your AMD GPU has completed several other WUs from Project 13422 but this one is somehow different.
-
- Posts: 33
- Joined: Sat Aug 22, 2020 4:37 pm
Re: Failed GPU slot daily :(
bruce, I just can't shake this one! I hope I found something! Because before this, I was downing 13,000-point WUs in an hour with this bad puppy. If it were you, how would you go about it?
-
- Posts: 33
- Joined: Sat Aug 22, 2020 4:37 pm
Re: Failed GPU slot daily :(
It's up to you, I don't mind taking a 1am dump! If that fails we can unleash the devs.
(Funny, I was actually reading up about dumping a WU, but I definitely need a walkthrough there.)
Re: Failed GPU slot daily :(
You have to choose. Personally, I'd rather help fix a bug than complete more WUs, but both are important. Others don't always have the same preferences that I do.
-
- Posts: 33
- Joined: Sat Aug 22, 2020 4:37 pm
Re: Failed GPU slot daily :(
I'd rather wait for the dev, my friend. Thank you for your help.
Re: Failed GPU slot daily :(
I'll send him an email. If you change your mind, this should allow you to dump it. ... or if Dev takes Sunday off.
FAH's data files are at C:\Users\Crimson\AppData\Roaming\FAHClient
The work files are in \work\0n where n is the queue position. (01, in your case).
I'd make a backup of 01 somewhere. With the WU paused, I think you can delete enough of the contents of 01 to force it to abort itself, if that's your choice. There's no guarantee that the same thing won't happen to another WU.
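If you'd rather script the backup step than drag folders around in Explorer, something like this Python sketch should do it. It only copies the folder and deletes nothing. `backup_work_dir` and the backup location are my own assumptions for illustration; only the FAHClient data path and the `work\01` layout come from this thread. Pause the slot first.

```python
import shutil
import time
from pathlib import Path


def backup_work_dir(fah_data_dir, queue_id, backup_root):
    """Copy <fah_data_dir>/work/<queue_id> into a timestamped folder.

    Hypothetical helper, not part of FAH. Returns the path of the copy.
    """
    src = Path(fah_data_dir) / "work" / queue_id
    if not src.is_dir():
        raise FileNotFoundError(f"no work folder at {src}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"work-{queue_id}-{stamp}"
    # copytree raises if dest already exists, so an earlier backup
    # is never silently overwritten.
    shutil.copytree(src, dest)
    return dest
```

For the setup in this thread the call would look like `backup_work_dir(r"C:\Users\Crimson\AppData\Roaming\FAHClient", "01", Path.home() / "fah-backup")`, where `fah-backup` is just a placeholder folder name.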
-
- Posts: 33
- Joined: Sat Aug 22, 2020 4:37 pm
Re: Failed GPU slot daily :(
This is definitely Mr. Mcbuggy buggerton's bug house going on here.
Re: Failed GPU slot daily :(
I take it from your handle that you're a dedicated Red-box (AMD) fan. There are a number of unexplained AMD bugs that Green-box (nV) fans don't encounter. All the red-box fans will thank you for your dedication if we can fix this one.
-
- Posts: 33
- Joined: Sat Aug 22, 2020 4:37 pm
Re: Failed GPU slot daily :(
Not necessarily. I'm not a fan of either; at the time, the R9 390 was the better buy for me and for gaming, as opposed to the GTX 970. (I never thought I'd fold on my gaming rig.)
I don't see this ever happening on the two 660s I have folding right now. I just got my first Nvidia cards this year and I'm loving both! I have to say, up until this, it was kicking butt with the 390! Easy 200k 24 avg. But I'm not so sure about AMD folding now... sheesh. I do wanna at least get it going again. She's a beaut of a card... she's worth it.
hint, hint, Roll Tide.
Re: Failed GPU slot daily :(
Oh, that Crimson
-
- Posts: 33
- Joined: Sat Aug 22, 2020 4:37 pm
Re: Failed GPU slot daily :(
Thanks for your help, bruce. I did as you said with the 01 folder: backed it up and deleted it. As I fired up FAH, I watched 01 reappear in Windows File Explorer as it should, but it's still on PRCG 13422.
Maybe too soon to tell as of now. We shall see!
-
- Posts: 1996
- Joined: Sun Mar 22, 2020 5:52 pm
- Hardware configuration: 1: 2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
- Location: UK
Re: Failed GPU slot daily :(
Project 13422 is the P part of PRCG; R is run, C is clone and G is generation. A PRCG identifies a specific WU within a Project, so hopefully you have actually got a new PRCG within Project 13422.
Regarding the buggy PRCG: was the couple of days' pause (iirc you mentioned it) during the folding of that WU? And did you pause the slot and exit the client before shutting down? I am simply wondering if the initial failure of the WU may have been linked to some corruption caused at that point. For a safe shutdown it can be worth pausing slots (there are threads discussing the best time to do this relative to checkpoints), quitting the client, then waiting a bit to ensure everything is saved properly before shutting down the system. This seems to minimise the chance of issues in that respect.
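To check whether a freshly assigned WU really is a new PRCG rather than the same one coming back, a small Python sketch like the one below can pull the four numbers out of a log line and compare them. It assumes the `Project: N (Run N, Clone N, Gen N)` wording shown in the log excerpt earlier in this thread; `PRCG` and `parse_prcg` are hypothetical names, not anything FAH provides.

```python
import re
from typing import NamedTuple, Optional


class PRCG(NamedTuple):
    project: int
    run: int
    clone: int
    gen: int


# Assumed wording, as in: "Project: 13422 (Run 3163, Clone 20, Gen 0)"
PRCG_RE = re.compile(
    r"Project:\s*(\d+)\s*\(Run\s+(\d+),\s*Clone\s+(\d+),\s*Gen\s+(\d+)\)"
)


def parse_prcg(line: str) -> Optional[PRCG]:
    """Return the PRCG found in a log line, or None if there is none."""
    match = PRCG_RE.search(line)
    if match is None:
        return None
    return PRCG(*(int(part) for part in match.groups()))


if __name__ == "__main__":
    stuck = PRCG(13422, 2975, 69, 2)  # the WU that kept looping
    line = "20:18:37:WU01:FS01:0x22:Project: 13422 (Run 3163, Clone 20, Gen 0)"
    current = parse_prcg(line)
    print("same project:", current.project == stuck.project)
    print("same WU:", current == stuck)
```

Two WUs are the same only when all four numbers match, so a matching project number on its own does not mean the troublesome WU has come back.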
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070
(Green/Bold = Active)