Having more GPU issues; ways to test configurations?
Posted: Fri Apr 10, 2020 9:34 pm
by RiaSkies
I've been continuing to try to get my GPU to fold successfully, but I am just not having any luck with stability. The CPU is folding fine without any issue, but the GPU continues to spit out errors. Usually it's a 'particle coordinate is NaN' error, though occasionally I get 'clCreateCommandQueue (-6)' errors. Sometimes the (F@H) core will retry from a saved state, sometimes it just gives up and shuts the core down. From what I have seen, this often indicates some sort of OC issue. However, even after using software to downclock the GPU and VRAM to reference base speed, turning up the fan curve to be very aggressive, slightly increasing the voltage for extra stability, and lowering the power target to near the minimum allowed (-45% or so), I am still unable to fold successfully.
I'm currently checking system memory, and there have been no issues with CPU folding or with my old GPU, so I am not sure what the source of the instability is. This has happened on so many WU's that the issue is definitely with this graphics card.
I'd like to continue to tweak settings, but I don't want to test running my GPU on live WU's until I can ensure that I have some baseline level of stability, in order to ensure that the science isn't being hindered by potentially faulty hardware. So I'd like to know if there is any way to get old, already run, known-to-be-good dummy WU's to run in order to verify stability as I continue to try to tweak settings (or decide if the card is just faulty and needs replacing).
Re: Having more GPU issues; ways to test configurations?
Posted: Fri Apr 10, 2020 10:07 pm
by toTOW
Re: Having more GPU issues; ways to test configurations?
Posted: Fri Apr 10, 2020 10:09 pm
by RiaSkies
Will this work on a 5600 xt? I understand that Core 21 doesn't work with Navi cards for architectural reasons.
Re: Having more GPU issues; ways to test configurations?
Posted: Fri Apr 10, 2020 10:20 pm
by toTOW
Good point. The current FAHBench version is still based on the latest Core 21 revisions.
Re: Having more GPU issues; ways to test configurations?
Posted: Sat Apr 11, 2020 12:19 pm
by Sven
What power supply are you using?
Not only the wattage, but the brand is also important.
Re: Having more GPU issues; ways to test configurations?
Posted: Sat Apr 11, 2020 1:25 pm
by RiaSkies
500 W EVGA power supply.
Given that the CPU folds fine and hasn't had any issue, it could be an issue with the 6+2 pin connector, but my old GTX 970 using 2x 6 pin connectors was folding fine as well. So I do not believe the PSU is the likely culprit.
Re: Having more GPU issues; ways to test configurations?
Posted: Sat Apr 11, 2020 2:10 pm
by PantherX
EVGA is a good brand, what's the efficiency of it? What's your CPU model? Do you have additional fans, multiple HDDs, etc.? In theory, 500 Watts should suffice as long as it isn't damaged. How old is it?
Re: Having more GPU issues; ways to test configurations?
Posted: Sat Apr 11, 2020 6:58 pm
by RiaSkies
PantherX wrote: EVGA is a good brand, what's the efficiency of it? What's your CPU model? Do you have additional fans, multiple HDDs, etc.? In theory, 500 Watts should suffice as long as it isn't damaged. How old is it?
80+ (White), Ryzen 1700 @ 3.2 GHz, one CPU fan & one case fan, 2 HDD's & 1 M.2 SATA SSD, 1.5 yrs
Re: Having more GPU issues; ways to test configurations?
Posted: Sat Apr 11, 2020 7:22 pm
by HaloJones
Try turning off the CPU folding slot and see if it can cope. If it still can't, you've probably ruled out the PSU. If it can, it strongly suggests the PSU may not be enough.
Re: Having more GPU issues; ways to test configurations?
Posted: Sat Apr 11, 2020 7:31 pm
by jrweiss
A good 500W PSU should have plenty of power, and that one has a single 40A 12V rail.
However, you may want to check the voltage stability under high load. Some GPUs are very sensitive to voltage fluctuations, and a PSU that just meets the +/- 5% (0.6V) spec may not be good enough. See if HWMonitor (from CPUID) shows any fluctuations on the 12V rail over time. Mine shows +/- 0.8% (0.096V)...
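For anyone wanting to run the same arithmetic on their own readings, here is a minimal Python sketch. It assumes you have copied a handful of 12V readings out of a monitoring tool by hand; the function names and the sample values are made up for illustration and are not part of HWMonitor or any other tool.

```python
def rail_deviation(samples, nominal=12.0):
    """Return the worst-case deviation from nominal as (volts, percent)."""
    worst = max(abs(v - nominal) for v in samples)
    return worst, 100.0 * worst / nominal


def within_tolerance(samples, nominal=12.0, tolerance_pct=5.0):
    """True if every sample stays inside +/- tolerance_pct of nominal."""
    _, pct = rail_deviation(samples, nominal)
    return pct <= tolerance_pct


def rail_capacity_watts(amps, volts=12.0):
    """Continuous power a rail can deliver: P = I * V."""
    return amps * volts


if __name__ == "__main__":
    # Hypothetical readings, typed in by hand from a monitoring tool.
    readings = [12.096, 12.05, 11.904, 12.0]
    worst_v, worst_pct = rail_deviation(readings)
    print(f"worst deviation: {worst_v:.3f} V ({worst_pct:.2f}%)")
    print("inside +/-5%:", within_tolerance(readings))
    print("single 40A rail:", rail_capacity_watts(40), "W")
```

Keep in mind that software sensors sample slowly, so a clean result here does not rule out very short dips under load.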
Re: Having more GPU issues; ways to test configurations?
Posted: Sun Apr 12, 2020 10:57 pm
by RiaSkies
Now that there's a new FAHBench on Core 22, I was able to do more GPU testing; still getting NaN errors even with stock configuration, downclocked to factory specs, and with extra fan speed, all while no CPU core is running.
Re: Having more GPU issues; ways to test configurations?
Posted: Thu Apr 23, 2020 3:26 pm
by Sven
The original 80+ power supplies (without bronze, gold, platinum, etc.) are mostly ATX V2.3 or lower and aren't ready for newer graphics cards, because of the strong power fluctuations the new chips require.
The continuous load wouldn't be a problem, but the short spikes can lead to short voltage drops. Those can lead to crashes of the FAH cores.
I would test with a modern power supply from a quality vendor.
Re: Having more GPU issues; ways to test configurations?
Posted: Thu Apr 23, 2020 6:40 pm
by jrweiss
BTW, I just read the ATX12V v2.0 PSU Design Guide, and it allows +/-10% voltage variation on a second 12V rail at peak loading. It also allows 200mV Noise/Ripple on 12V2, but only 120mV on 12V1.
So, you may also have to look at your PSU wiring diagram and ensure your GPU is fully powered by +12V1DC if you have 2 rails. Just another reason to look at manufacturers' specs and trustworthy reviews before buying...
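To show why the rail matters, here is a rough Python sketch that checks one measurement against the per-rail limits quoted above. The limits are the figures from this post, not re-checked against the design guide, and the 5% tolerance for 12V1 is an assumption borrowed from the earlier +/-5% remark. `RailLimit`, `LIMITS` and `check_rail` are names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RailLimit:
    tolerance_pct: float  # allowed deviation from nominal, in percent
    ripple_mv: float      # allowed noise/ripple, in millivolts


# Limits as quoted in the post; the 12V1 tolerance is an assumption.
LIMITS = {
    "12V1": RailLimit(tolerance_pct=5.0, ripple_mv=120.0),
    "12V2": RailLimit(tolerance_pct=10.0, ripple_mv=200.0),
}


def check_rail(rail, measured_v, ripple_mv, nominal=12.0):
    """Return a list of problems for one measurement (empty list = OK)."""
    limit = LIMITS[rail]
    deviation_pct = 100.0 * abs(measured_v - nominal) / nominal
    problems = []
    if deviation_pct > limit.tolerance_pct:
        problems.append(
            f"{rail}: {deviation_pct:.1f}% off nominal "
            f"(limit {limit.tolerance_pct}%)"
        )
    if ripple_mv > limit.ripple_mv:
        problems.append(
            f"{rail}: ripple {ripple_mv} mV (limit {limit.ripple_mv} mV)"
        )
    return problems


if __name__ == "__main__":
    # Hypothetical measurement: 11.0 V with 150 mV ripple at peak load.
    for rail in ("12V1", "12V2"):
        print(rail, check_rail(rail, 11.0, 150.0) or "OK")
```

The same hypothetical reading passes on 12V2 and fails on 12V1, which is the reason to check which rail actually feeds the GPU.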