@ Dr.G
I read the thread. Thanks for the link.
@ PS3EdOlkkola
Thanks for the detailed post; it makes perfect sense. These machines have been through some lengthy troubleshooting battles before, so here's a quick list of what was checked when I ran into the same issues with Core 17 WU's, which a driver update eventually took care of.
The components in each machine are identical. Same manufacturer as well, so I mean literally identical.
1) underclocked the GPU's.
2) underclocked the RAM.
3) disabled CPU folding.
4) swapped GPU's from machine to machine.
5) changed GPU slots in the machines.
6) tried different power plugs on the Corsair PSU's to rule out bad plugs. The PSU's in all three machines are Corsair AX1200i (single rail).
7) changed the BIOS so PCIE2 is used instead of PCIE3.
8) updated motherboard BIOS to latest version.
9) ran stress tests on the RAM (no errors found).
10) swapped memory from machine to machine.
11) reinstalled Windows.
12) after a fresh Windows install went straight to 14.4 (at the time). I just did a clean Windows install a couple days ago and went to 14.7RC3 and then to 14.9.
13) tried over-volting the GPU's and motherboard.
After all these things every machine behaved exactly the same. Nothing made any difference at all and the memory tests showed nothing wrong. I think I even ran some CPU stress tests. I really don't want to run through all that again. At the time, since no real solution could be found, I pulled GPU's. When 14.7RC3 was released I had a little free time and on a whim decided to check whether the machines could handle more GPU's, and suddenly they could. Core 17's ran without incident for days. They wouldn't run 10 minutes in a multi-GPU system prior to the 14.7RC3 release though. I need to be careful when I say they wouldn't run for X minutes, because sometimes they would run for an hour or more and then die five times in a row immediately. I guess I should say 10 minutes on average. The machines are behaving the same way with the Core 16's now.
So I'm not sure what about Core 16 would be different enough to cause the same issues that wouldn't have been found in the first round of tests. I could disable CPU testing on a machine to see if it makes any difference with the surrounding components just to be sure that isn't it.
Regarding airflow, each motherboard is in a Corsair Carbide Series Air 540 ATX Cube Case with both sides off. They get a lot of airflow.
The frustration level is high because there are always going to be issues with various cores on some hardware and at the moment we have no way to deal with them. If I had my way, I would dump my GPU's and buy some GTX 970's, but the wife has informed me in no uncertain terms that if I do that I'll lose a body part I'm rather fond of.
![Very Happy :D](./images/smilies/icon_biggrin.gif)
So I'm going to have to make do with what I have.