Re: Project 5801 issues. [Should be Offline]
Posted: Wed Oct 29, 2008 1:05 am
And this is understandable, Mr. Pande. Thanks for your feedback.
Community driven support forum for Folding@home
https://foldingforum.org/
theo343 wrote:
  And implement that you always reQA a Project on the latest forced core, before you distribute the project. Record what core the Project was QAed on so you know if you have to reQA it before release.
Precisely... nothing revolutionary... even if just a couple WUs were run, this problem would have been evident and halted before it ever became a problem.
VijayPande wrote:
  We keep an eye on the forum, but the first post was just a few hours ago. Due to staff having other responsibilities, our response will typically be on the hours time scale not minutes time scales for issues like this. I wish it could be faster, but that's what we're staffed to do at the moment.
Mr. Pande has the patience of a saint.
VijayPande wrote:
  PS In case you're curious:
  MoneyGuyBK wrote:
    I am surprised that:
    1) F@H released this WU in such a bad state
  This was beta tested before (this was a project # change due to a move onto a new server -- which was done to try to keep work around while the CS servers were down).
    However, more stumped that:
    2) F@H has not chimed in here officially after 7 Pages of comments
  We keep an eye on the forum, but the first post was just a few hours ago. Due to staff having other responsibilities, our response will typically be on the hours time scale not minutes time scales for issues like this. I wish it could be faster, but that's what we're staffed to do at the moment.
VijayPande wrote:
  PS In case you're curious:
  MoneyGuyBK wrote:
    I am surprised that:
    1) F@H released this WU in such a bad state
  This was beta tested before (this was a project # change due to a move onto a new server -- which was done to try to keep work around while the CS servers were down).
Two words: regression testing
VijayPande wrote:
  Sorry about the really nasty problem on this one. It was definitely strange since these WU's were QA'd before. I think this may be an issue where they were QA'd on an earlier core and 1.15 is causing issues.
Well ... I think you missed at least one of the QA steps ...
toTOW wrote:
  VijayPande wrote:
    Sorry about the really nasty problem on this one. It was definitely strange since these WU's were QA'd before. I think this may be an issue where they were QA'd on an earlier core and 1.15 is causing issues.
  Well ... I think you missed at least one of the QA steps ...
  p5800 was fully tested through the whole QA process ... but not the p5801
5801 was just a copy of another project, which did go all the way through QA. Nevertheless, I will have a talk with the responsible parties about this.
shatteredsilicon wrote:
  Two words: regression testing
  This makes it all the more shocking just how broken the nVidia core 1.15 is.
1.15 passed all of the regression testing on machines at Stanford and NVIDIA and then passed FAH beta testing. There's not much more we can do than that before releasing it. Keep in mind that we now know that for many people (some boards), 1.15 is perfectly fine and stable, whereas for others, it doesn't work at all. If that's the case, my guess is that this is a CUDA or hardware issue. If the code in 1.15 were really broken, it would not work on any hardware, which is definitely not the case. We're working with NVIDIA on this one. The first step is to get the problem reproducible in their labs.
VijayPande wrote:
  The bottom line here is that it is becoming clear that what works on some CUDA hardware platforms does not universally work on all. We have since gotten a few of the boards that cause problems and have included them in our recent testing.
So does this mean CUDA isn't compatible with all hardware which is supposed to be compatible with it, or does it mean the clients' implementation of CUDA isn't compatible with all hardware? Or is it too soon to tell? I would hope it's the last option, as in the first case I'm afraid you don't have the same expedience in getting it sorted.
MtM wrote:
  VijayPande wrote:
    The bottom line here is that it is becoming clear that what works on some CUDA hardware platforms does not universally work on all. We have since gotten a few of the boards that cause problems and have included them in our recent testing.
  So does this mean CUDA isn't compatible with all hardware which is supposed to be compatible with it, or does it mean the clients' implementation of CUDA isn't compatible with all hardware? Or is it too soon to tell? I would hope it's the last option, as in the first case I'm afraid you don't have the same expedience in getting it sorted.
Technically, if the same code works on certain cards but not on others, we can look at the driver or hardware level. However, the core is partly responsible for this as well, so finding out what's wrong is two-sided work (NVIDIA with the CUDA code and PG with the core). This is what makes debugging this issue very hard.