muziqaz wrote: If one person reports a faulty WU, we question that person's hardware; if two or more return the same WU as faulty, we start questioning the WU.
Maybe a bit off topic, but this WU is already in generation 50. How can it be faulty? Is my understanding of the whole process still lacking?
AMD GPU Error sortShortList on some projects
Moderators: Site Moderators, FAHC Science Team
Re: AMD GPU Error sortShortList on some projects
Jan wrote: Client-type advanced will not fix any errors. It will simply make your client look for advanced WUs (which are WUs that just made it out of beta testing) in addition to "normal" WUs, and that's it. Afaik. muziqaz might have a point, as this WU has been returned 2 or 3 times as faulty. Have you had other WUs on your GPUs so far/since then?
Well, in my case my hope was that using the 'advanced' setting would result in fewer errors, because it would download more of the newer and better-coded applications, as indicated by Neil-B earlier in this thread, but in my case it did not help.
I checked the logs today from my computers running the GTX1070 and RTX2080 and found no errors on work units running on those cards.
I will not run any more folding on my Radeon cards for a while; I will check back in the forum in a few weeks to see if the problem is still there.
-
- Posts: 1996
- Joined: Sun Mar 22, 2020 5:52 pm
- Hardware configuration: 1: 2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
- Location: UK
Re: AMD GPU Error sortShortList on some projects
Simplex0 wrote: … as indicated by Neil-B earlier in this thread but in my case it did not help.
.. tbh, I wasn't recommending it as a solution, simply trying to explain to a previous poster one reason why they might be seeing their issue when folding FAH but not when folding ADV … Apologies if it came across that I was promoting this as a solution.
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070
(Green/Bold = Active)
-
- Posts: 946
- Joined: Sun Dec 16, 2007 6:22 pm
- Hardware configuration: 7950x3D, 5950x, 5800x3D, 3900x
7900xtx, Radeon 7, 5700xt, 6900xt, RX 550 640SP
- Location: London
Re: AMD GPU Error sortShortList on some projects
muziqaz wrote: If one person reports a faulty WU, we question that person's hardware; if two or more return the same WU as faulty, we start questioning the WU.
Jan wrote: Maybe a bit off topic, but this WU is already in generation 50. How can it be faulty? Is my understanding of the whole process still lacking?
The project is in gen 50; a WU is a single Work Unit that you download to process.
That particular WU is most likely faulty, but that does not mean the whole project, or even the whole generation, is faulty as well.
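As a rough illustration of the rule of thumb quoted above (one donor reporting a faulty WU points at the donor's hardware, two or more donors reporting the same WU points at the WU), here is a minimal Python sketch. It is not how the FAH servers or the science team actually decide anything; the function name `triage`, the `(wu_id, donor_id)` record shape, and the two labels are all made up for the example.

```python
from collections import defaultdict


def triage(reports):
    """Apply the "one donor vs. two or more donors" rule of thumb.

    reports: iterable of (wu_id, donor_id) pairs, one per faulty return.
             Both identifiers are hypothetical; any hashable values work.
    Returns a dict mapping each wu_id to "suspect hardware" or "suspect WU".
    """
    donors_by_wu = defaultdict(set)
    for wu_id, donor_id in reports:
        # A set means repeated failures from the same donor count only once.
        donors_by_wu[wu_id].add(donor_id)

    return {
        wu_id: "suspect WU" if len(donors) >= 2 else "suspect hardware"
        for wu_id, donors in donors_by_wu.items()
    }


# Made-up example data: wu-a failed for two different donors, wu-b for one.
example = [("wu-a", "donor-1"), ("wu-a", "donor-2"), ("wu-b", "donor-1")]
print(triage(example))
```

Counting distinct donors rather than raw failure reports is a choice made for this sketch: a single unstable card can fail the same WU repeatedly, and that should not by itself make the WU look bad.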
Simplex0 wrote: I will not run any more folding on my Radeon cards for a while; I will check back in the forum in a few weeks to see if the problem is still there.
It is your choice, of course, but in my opinion this solution is too drastic. The type of error you are encountering is not very frequent and is independent of the type of GPU the folder is running. Currently these random failed WUs are acceptable and known. The devs are working on better handling of these errors, though.
FAH Omega tester
Re: AMD GPU Error sortShortList on some projects
muziqaz wrote: The project is in gen 50; a WU is a single Work Unit that you download to process. That particular WU is most likely faulty, but that does not mean the whole project, or even the whole generation, is faulty as well.
Sure. I just didn't think the new WUs after so many generations could still (or rather: newly) be faulty. I probably don't understand the generating process of these WUs well enough. And now I'm done derailing this thread.
Re: AMD GPU Error sortShortList on some projects
Simplex0 wrote: … as indicated by Neil-B earlier in this thread but in my case it did not help.
Neil-B wrote: .. tbh, I wasn't recommending it as a solution, simply trying to explain to a previous poster one reason why they might be seeing their issue when folding FAH but not when folding ADV … Apologies if it came across that I was promoting this as a solution.
No problem. The fact is that sam6861 observed a reduction in errors after using this setting, so it was worth trying.
It is usually like that: you observe a change and come up with an assumption about WHY it happened.
That can eventually turn out to be wrong, but it was still a plausible explanation at the time.
Re: AMD GPU Error sortShortList on some projects
muziqaz wrote: If one person reports a faulty WU, we question that person's hardware; if two or more return the same WU as faulty, we start questioning the WU.
muziqaz wrote: It is your choice, of course, but in my opinion this solution is too drastic. The type of error you are encountering is not very frequent and is independent of the type of GPU the folder is running. Currently these random failed WUs are acceptable and known. The devs are working on better handling of these errors, though.
Fact is that this type of error was very frequent on my R9 290 cards and close to nonexistent on my Nvidia cards. I have observed a lot of these errors on my AMD cards lately and none on my Nvidia cards.
I am wondering whether this type of work unit is sent more frequently to AMD cards specifically. I will try to dig a little deeper next week.
For now I can say that in the log files covering 15 days on my computer running Nvidia cards I have 0 cases of Bad state detected, BAD WORK UNIT(114=0x72).
On the computer with AMD R9 290 cards, the log files covering 9 days show 8 work units which resulted in Bad state detected, BAD WORK UNIT(114=0x72).
Thank you all for your support
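For anyone who wants to make the same comparison on their own machines, below is a small Python sketch of the counting described above. It only looks for the substring "Bad state detected" as quoted in this post; the exact wording and layout of real FAH log lines, the log directory, and the `*.txt` file pattern are assumptions to adjust locally. The two rates at the end use the figures reported above (0 failures in 15 days, 8 failures in 9 days).

```python
from pathlib import Path

# Substring quoted in the post; change it to match your own log's exact wording.
MARKER = "Bad state detected"


def count_marker(lines, marker=MARKER):
    """Count the lines that contain the marker substring."""
    return sum(1 for line in lines if marker in line)


def count_in_logs(log_dir, pattern="*.txt", marker=MARKER):
    """Sum marker hits over all files in log_dir matching pattern.

    The directory layout and file pattern are assumptions for this sketch.
    """
    total = 0
    for path in sorted(Path(log_dir).glob(pattern)):
        with path.open(errors="replace") as fh:
            total += count_marker(fh, marker)
    return total


def failures_per_day(failures, days):
    """Normalise a failure count by the number of days the logs cover."""
    return failures / days


# Figures reported in the post above.
nvidia_rate = failures_per_day(0, 15)
amd_rate = failures_per_day(8, 9)
print(f"NVIDIA: {nvidia_rate:.2f}/day, AMD R9 290: {amd_rate:.2f}/day")
```

Failures per day is only a rough measure. The post does not say how many work units each machine completed, so a per-WU failure rate cannot be worked out from these numbers alone.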
Re: AMD GPU Error sortShortList on some projects
Simplex0 wrote: Fact is that this type of error was very frequent on my R9 290 cards and close to nonexistent on my Nvidia cards. I have observed a lot of these errors on my AMD cards lately and none on my Nvidia cards. I am wondering whether this type of work unit is sent more frequently to AMD cards specifically. I will try to dig a little deeper next week. For now I can say that in the log files covering 15 days on my computer running Nvidia cards I have 0 cases of Bad state detected, BAD WORK UNIT(114=0x72). On the computer with AMD R9 290 cards, the log files covering 9 days show 8 work units which resulted in Bad state detected, BAD WORK UNIT(114=0x72). Thank you all for your support
So maybe it is time to clean the fans of the card, and maybe reduce the clocks. The 290 is a VERY old card; it's possible that the VRMs are on their last legs.
FAH Omega tester
Re: AMD GPU Error sortShortList on some projects
Simplex0 wrote: Fact is that this type of error was very frequent on my R9 290 cards and close to nonexistent on my Nvidia cards. I have observed a lot of these errors on my AMD cards lately and none on my Nvidia cards. I am wondering whether this type of work unit is sent more frequently to AMD cards specifically. I will try to dig a little deeper next week. For now I can say that in the log files covering 15 days on my computer running Nvidia cards I have 0 cases of Bad state detected, BAD WORK UNIT(114=0x72). On the computer with AMD R9 290 cards, the log files covering 9 days show 8 work units which resulted in Bad state detected, BAD WORK UNIT(114=0x72). Thank you all for your support
muziqaz wrote: So maybe it is time to clean the fans of the card, and maybe reduce the clocks. The 290 is a VERY old card; it's possible that the VRMs are on their last legs.
The computer is fully water cooled with a custom loop, and the temperature of the GPU and VRM on my graphics cards stays under 65 °C at all times.
You are right that these are indeed very old graphics cards, and that seems to be the problem: after reducing the GPU clock to 80% on all cards, everything works just fine now.
Thank you for your support, muziqaz.
Re: AMD GPU Error sortShortList on some projects
The errors with the keyword "sortShortList" are unique to AMD GPUs and simply do not occur on NVIDIA hardware.
"Bad state detected, BAD WORK UNIT(114=0x72)" covers that case as well as several other possibilities across both brands of GPUs. If you eliminate the sortShortList errors, are the Bad State errors about the same?
Posting FAH's log:
How to provide enough info to get helpful support.
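On the question above of whether the Bad State errors look about the same once the sortShortList ones are taken out, here is one possible way to split the counts, as a Python sketch. It rests on a guess about the log layout that the thread does not confirm: that a "sortShortList" message, when present, appears within a few lines before the "Bad state detected" line it belongs to. The `window` size, the function name, and the sample lines are all invented for illustration and are not real FAH output.

```python
def split_bad_state(lines, window=20,
                    bad="Bad state detected", amd="sortShortList"):
    """Split bad-state lines into (with_sortshortlist, without) counts.

    A bad-state line is counted as "with" if the amd marker shows up on that
    line or within `window` lines before it. The look-back direction and the
    window size are assumptions about the log layout, not documented facts.
    """
    lines = list(lines)
    with_amd = 0
    without = 0
    for i, line in enumerate(lines):
        if bad not in line:
            continue
        context = lines[max(0, i - window): i + 1]
        if any(amd in c for c in context):
            with_amd += 1
        else:
            without += 1
    return with_amd, without


# Invented sample lines, only to show the mechanics.
sample = [
    "step 1",
    "error in sortShortList kernel",
    "Bad state detected",
    "step 2",
    "Bad state detected",
]
print(split_bad_state(sample, window=2))
```

Running this over the AMD and NVIDIA logs separately and comparing the second number of each pair would be one way to answer the question; how trustworthy the result is depends entirely on whether the look-back assumption holds for your logs.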
-
- Pande Group Member
- Posts: 467
- Joined: Fri Feb 22, 2013 9:59 pm
Re: AMD GPU Error sortShortList on some projects
Just a quick note here: We've fixed this issue in OpenMM:
https://github.com/openmm/openmm/pull/2631
We're just working on backporting the fix into core22.
Thanks for your patience!
~ John Chodera // MSKCC