Seriously PG, WU's 5765-5767!!!

Mr. Scary
Posts: 35
Joined: Fri Jul 04, 2008 7:13 pm
Hardware configuration: The Rock-XPS630i/[email protected]/OCZ Plat 1066 2x2gb/(2)8800GT AlphaDogs/(2)10krpm Raptors Raid0&(2)WD HDD 500gb
Nekid-CoolerMaster CM690/MSI-P6N Diamond/[email protected]/Samsung 2x2gb 800mhz/(2)9800GT's/TX2 HSF/Corsair TX850W PSU/(2)WD 160gb HDD's Raid 0, (1)500gb WD & (1)1.5tb WD HDD
Bare Nekid-MSI-P6N Diamond/[email protected]/OCZ ReaperX1000mhz 2x2gb/(2)8800GT's/Corsair TX750W PSU/160gb WD HDD
Buck Nekid-Xion XON-303/Dell'd 650iMobo/[email protected]/Naya 4x1gb 667mhz/(2)8800GT's SLI/750W PSU/320gb WD HDD

Post by Mr. Scary »

You guys really need to nix the 5765, 5766, and 5767 WUs; they are ridiculous.
I'm not trying to cherry-pick here or anything, I just want to further the science! Send me the slow ones, the low pointers, whatever...
BUT...
I have (2) 8800GT's folding 24/7. I can fold any and everything under the sun that you send to me.
At stock speeds.
At OC speeds, up to 1750mhz on the shaders.
Literally they don't run hot(relatively speaking) or even hiccup.

Throw me one of those 3 WUs and it's 'UNSTABLE_MACHINE' over and over till it EUEs and pauses for 24 hours. I don't get those on any of the other 3 folders I have; why, I don't know. But as sure as we're all here on Mother Earth, when I get one, it's done!
What's really bad is if I get one in the middle of the night, or after I'm done babysitting the machine for the night: it'll sit there and suck electricity all night and do nothing till I catch it in the morning.
Even though I work out of my home office, I don't have the time to babysit ONE rig all day!
I can't believe the servers, with the current technology and expertise you guys have, can't see this and stop sending those to that machine.

I've tried:
-not OC'ing the GPUs
-uninstalling folding
-reinstalling folding
-227 different driver installs/uninstalls/DriverCleaner/installs, etc. (yes, 227 is a bit of a stretch, but after half a dozen or more it seems like that many)
-even a fresh OS install (what a pain, just to find out if it's the machine!)


It's NOT my setup.


Please pull these things, or at least stop sending them to me! Or shoot me some guidance/options on how to handle these. The machine seems like it's pausing more than it's folding now!
I mean, use the equipment, even till it burns up if you want, i got no problem with that!
But using it 20% of the time and pausing it the other 80%, just isn't cost effective!

My family has Alzheimer's and dementia; that's why I fold!
The WU i fold today just might save my own life 20 years from now, LOL!!!!

Help a brotha out here!!!


Frustrated to no end in AZ,

Mr. $cary
patonb
Posts: 348
Joined: Thu Oct 23, 2008 2:42 am
Hardware configuration: WooHoo= SR-2 -- L5639 @ ?? -- Evga 560ti FPB -- 12Gig Corsair XMS3 -- Corsair 1050hx -- Blackhawk Ultra

Foldie = @3.2Ghz -- Noctua NH-U12 -- BFG GTX 260-216 -- 6Gig OCZ Gold -- x58a-ud3r -- 6Gig OCZ Gold -- hx520

Re: Seriously PG, WU's 5765-5767!!!

Post by patonb »

Funny, I love those, even with my 8800GT. I get over 6k PPD.
WooHoo = L5639 @ 3.3Ghz Evga SR-2 6x2gb Corsair XMS3 CM 212+ Corsair 1050hx Blackhawk Ultra EVGA 560ti

Foldie = i7 950@ 4.0Ghz x58a-ud3r 216-216 @ 850/2000 3x2gb OCZ Gold NH-u12 Heatsink Corsair hx520 Antec 900
Mr. Scary
Posts: 35
Joined: Fri Jul 04, 2008 7:13 pm

Re: Seriously PG, WU's 5765-5767!!!

Post by Mr. Scary »

HOLY COW! Serious?
what driver are you using?
I see you have a 680i mobo, my p6n diamond is 680i as well:)
can you point me in the right direction as to how/what your settings/setup is?

That's why i love this place, i figured if someone replied like yours, then maybe there's some hope!!!

thanks for the reply patonb!

M$

PS (edit): I see the 88 is on your Asus pq5-n; is that a 680i chipset as well?
kiore
Posts: 921
Joined: Fri Jan 16, 2009 5:45 pm
Location: USA

Re: Seriously PG, WU's 5765-5767!!!

Post by kiore »

Are you sure those are the ones you are having problems with? I too am surprised; my 9800GTs just eat them up. I don't disbelieve you, since different cards seem to like different units, but these are usually pretty stable in my experience.
i7 7800x RTX 3070 OS= win10. AMD 3700x RTX 2080ti OS= win10 .

Team page: https://www.rationalskepticism.org/viewtopic.php?t=616
Mr. Scary
Posts: 35
Joined: Fri Jul 04, 2008 7:13 pm

Re: Seriously PG, WU's 5765-5767!!!

Post by Mr. Scary »

@kiore: thank you as well for the response; sounds to me like I MUST be missing something here.
But yeah, those are the ones that give me UNSTABLE_MACHINE over and over till it EUEs!
I can run the 1010x's, 660x's, and just about every other WU that comes down the pipe, but as soon as I get one of those 3 it's game over :(

M$
patonb
Posts: 348
Joined: Thu Oct 23, 2008 2:42 am

Re: Seriously PG, WU's 5765-5767!!!

Post by patonb »

Hehe, thanks for pointing that out... wrong board. It's actually a p5n-e, and it is the 650i chipset.
Out of curiosity... do you EUE at the start, and get the same WU back over and over?

I get that every so often: 1 WU fails and I get the 24-hour pause.

What psu do you have? and who makes your 8800?

Drivers are 195.62 on XP Home btw.
Mr. Scary
Posts: 35
Joined: Fri Jul 04, 2008 7:13 pm

Re: Seriously PG, WU's 5765-5767!!!

Post by Mr. Scary »

No prob with the mobo, that's definitely a hehehe!
And yes, I EUE at the start, when it's calling up the arguments. It looks like this:
[15:30:09] Calling fah_main args: 14 usage=100
[15:30:09]
[15:30:11] Working on Protein
[15:30:11] mdrun_gpu returned
[15:30:11] Self-test failure
[15:30:11]
[15:30:11] Folding@home Core Shutdown: UNSTABLE_MACHINE
[15:30:13] CoreStatus = 7A (122)
[15:30:13] Sending work to server
[15:30:13] Project: 5767 (Run 10, Clone 222, Gen 1528)



etc. etc. etc... then the EUE limit and the 24-hour pause.

It happens, like I said, on 5765, 5766, and 5767.

The rest of them don't do it! And, LOL!! I'm looking at my HFM.NET on my other screen and, lo and behold, I finished my 10105 and presto, here's a 5765 (Run 13, Clone 64, Gen 1688) and yep, you guessed it...
EUE limit exceeded. Pausing 24 hours!

So off I go: shut down the client, delete everything in the folder, start up again, and hope it grabs a different one. Voilà, it does, and I'm off and running again, till the next one :( The other 88 pulled a 5771 and is crunching at a nice 5754 PPD! Go figure!

M$
Wrish
Posts: 74
Joined: Thu Jan 28, 2010 5:09 am

Re: Seriously PG, WU's 5765-5767!!!

Post by Wrish »

Is that the only rig with 2 video cards on one motherboard? Might swap the second GPU to another machine and shuffle slots, if that's the case. Seen many instant EUE's on multi-GPU systems on not-quite-compatible motherboards.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Seriously PG, WU's 5765-5767!!!

Post by bruce »

Mr. Scary wrote:[15:30:11] Working on Protein
[15:30:11] mdrun_gpu returned
[15:30:11] Self-test failure
I've never seen an explanation for the message "Self-test failure" but it sure sounds like the output from a GPU diagnostic test that checks the hardware before the WU actually starts processing.

You've dismissed all WUs from three projects without giving examples that we can check. The only two WUs that are explicitly mentioned in your report are Project: 5767 (Run 10, Clone 222, Gen 1528) and Project: 5765 (Run 13, Clone 64, Gen 1688).

I checked on them to see if they happened to be bad WUs. One client reported a problem with (P5767 R10 C222 G1528), and it was successfully completed by several other clients. None of the reports are listed as being from Mr. Scary, though that may not be your FAH name. I don't have any way to determine if that one client was an 8800GT or what drivers it was using.

There's only one report from (P5765 R13 C64 G1688)
WU (P5765 R13 C64 G1688) was added to the stats database on 2010-03-09 13:04:51 for 353 points of credit.
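
For anyone who wants to report exact WUs, here is a minimal Python sketch that pulls the Project/Run/Clone/Gen numbers out of log text shaped like the excerpt quoted in this thread. It assumes the "Project:" line follows the UNSTABLE_MACHINE shutdown line, as it does in that excerpt; the function name and the regular expression are illustrative and are not part of the FAH client.

```python
import re

# Matches lines such as "[15:30:13] Project: 5767 (Run 10, Clone 222, Gen 1528)".
PROJECT_RE = re.compile(
    r"Project:\s*(\d+)\s*\(Run\s*(\d+),\s*Clone\s*(\d+),\s*Gen\s*(\d+)\)"
)


def failed_work_units(log_text):
    """Return (project, run, clone, gen) for each UNSTABLE_MACHINE shutdown.

    Assumption: the Project line that identifies the failed WU appears
    after the shutdown line, as in the excerpt posted in this thread.
    """
    failures = []
    pending = False  # True once a shutdown has been seen but not yet identified
    for line in log_text.splitlines():
        if "UNSTABLE_MACHINE" in line:
            pending = True
        match = PROJECT_RE.search(line)
        if match and pending:
            failures.append(tuple(int(g) for g in match.groups()))
            pending = False
    return failures


# Log lines copied from the post earlier in this thread.
SAMPLE_LOG = """\
[15:30:11] mdrun_gpu returned
[15:30:11] Self-test failure
[15:30:11] Folding@home Core Shutdown: UNSTABLE_MACHINE
[15:30:13] CoreStatus = 7A (122)
[15:30:13] Sending work to server
[15:30:13] Project: 5767 (Run 10, Clone 222, Gen 1528)
"""

if __name__ == "__main__":
    for wu in failed_work_units(SAMPLE_LOG):
        print("P%d R%d C%d G%d" % wu)
```

Posting the identifiers in that short P/R/C/G form makes it easier for someone with database access to look each one up.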
Mr. Scary
Posts: 35
Joined: Fri Jul 04, 2008 7:13 pm

Re: Seriously PG, WU's 5765-5767!!!

Post by Mr. Scary »

@Wrish: I don't have a mobo to try it on right at the moment; might have to work on that. But thanks for the response and suggestion :)
@patonb: I believe it's a 650W.
@bruce: thank you as well for the response! :)
The self-test failure is a puzzle to me as well. I think your GPU diagnostic test guess is probably as good or close as any!!!
Next, I haven't in the past actually paid much attention to the "( )" portion of the projects; that's probably my bad. If it was running and sending in the science, just call me happy as a clam!
That being said, let me know exactly what I need to post, send, or look for and I'm all over it like white on rice!
It's killin' me having this thing sit here and suck electricity AND MY $$$, and no one's gettin' diddly for it!!!!! GRRRRRRRRRRRR!!!!!
I have the following rigs running whatever they get with no probs.
1. p6n diamond mobo, with qx6800, oc'd 3.3 and x2 9800GX2's----4 gpu clients 1 smp
2. 780i ftw in a xps 630i, QX6850, oc'd 3.67, x2 8800gt xfx alpha dog's, liquid cooled only folding part time no probs, 2 gpu clients 1 smp
3. 650i mobo, q6600 @ 3.2, x2 gtx260's 2gpu clients 1 smp
4. (problem child) p6n diamond, e8200 @ 3.0ghz, x2 8800GT's (vanilla NVIDIA GeForce).

Here's the sitch: the two Alpha Dogs and the two current 8800GTs are all going into a new custom liquid-cooled build! So there'll be 4 of them in there, all liquid cooled, along with a quad of some sort, liquid cooled as well.

These cards were originally paired up with both of those p6n mobos, which at one time had a total of 7 8800GTs running between the two of them (including the two we're talking about). Got a darn pic somewhere, I'll find it, LOL!!!
that's what's puzzling me.

Anyway, again, as i'm typing, i look up at hfm, and you guessed it! :(
So, here's what's NOT running now. and, ummmm, doin' the pausing 24hrs thing!

gpu 1-p5765(r13,c64,g1688)
gpu 2-p5765(r13,c0,g1836)

does that help at all?

again, tell me what i need to tell you.


thanks all for the replies and such so far!!!
gotta be an end in sight!

m$
toTOW
Site Moderator
Posts: 6349
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Seriously PG, WU's 5765-5767!!!

Post by toTOW »

bruce wrote:
Mr. Scary wrote:[15:30:11] Working on Protein
[15:30:11] mdrun_gpu returned
[15:30:11] Self-test failure
I've never seen an explanation for the message "Self-test failure" but it sure sounds like the output from a GPU diagnostic test that checks the hardware before the WU actually starts processing.
When I started getting random Self-test failures, it was the beginning of the end of the card ... then it got worse, and the next step was UM errors (NaNs on GPU) at random points in the WUs, with some going perfectly fine, and finally, it wouldn't fold anything, although it didn't show any errors in other tests (MemtestG80, 3D stress tests ...). I sent the card back to the manufacturer, who confirmed it was defective (but I wish they had said what was wrong with the card ...) and sent me a replacement.

Mr. Scary > I guess a MemtestG80 test doesn't show any issue on your board ?

A WU should be attempted 6 times (5 in a row, then a pause for 24H, plus 1 attempt on client restart). Do you get the exact same failure on all attempts, or do you get a mix of Self-test failures and NaNs detected on GPU at random progress?
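
The retry rule described above can be written down as a toy model. This Python sketch only restates the numbers given in this post (5 failures in a row, a 24-hour pause, then 1 more attempt on client restart); the function and constant names are made up for illustration and this is not the client's actual code.

```python
# Numbers taken from the description in this post; names are hypothetical.
FAILURES_BEFORE_PAUSE = 5  # "5 in a row" before the 24-hour pause
MAX_ATTEMPTS = 6           # plus "1 attempt on client restart"


def status_after(failed_attempts):
    """Describe where a WU stands after a number of consecutive failures."""
    if failed_attempts < 0:
        raise ValueError("failed_attempts must be 0 or greater")
    if failed_attempts < FAILURES_BEFORE_PAUSE:
        return "retrying"
    if failed_attempts < MAX_ATTEMPTS:
        return "paused 24h"
    return "all attempts used"


if __name__ == "__main__":
    for n in range(7):
        print(n, status_after(n))
```

Under this model, a machine that restarts the client during the pause uses up the sixth attempt on the same WU.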

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
Mr. Scary
Posts: 35
Joined: Fri Jul 04, 2008 7:13 pm

Re: Seriously PG, WU's 5765-5767!!!

Post by Mr. Scary »

thanks toTOW for your reply, especially as late as it is!
I think you and I went down this road before, but, it's all good. at this point i am truly willing to do whatever it takes to fix this, barring any further budget crunches.(i'm working on a liquid folder as we speak, and having a hard time passing costs thru the boss/wife! :( )))

I will gladly run another MemtestG80 on the GPU in question, in hopes of not only solving the problem and providing quality science, but also giving a solution to anyone having this problem in the future.
I went back 5-6 pages and found multiple 576x problems posted. This is by no means finger pointing!!!! AND, I appreciate bruce researching as he did.

Please stay tuned for results as i appreciate any and all input/suggestions,,,,as these gpu's are slated for a much bigger/anticipated folding build i'm working on. Link to follow!

It truly is all for science and I want all responders to know this. :)

Results to follow...

M$
toTOW
Site Moderator
Posts: 6349
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Seriously PG, WU's 5765-5767!!!

Post by toTOW »

It's 11:00am here in France ... it's not late :mrgreen:

I'm just asking questions that usually help ... but if you confirm that you already ran all the tests without error, we're stuck and I have no more ideas :( That's why I gave you the example of my card (which pissed me off for weeks before I decided to get it replaced), which I was unable to make fail in other applications :(

These kinds of failures are a real nightmare to diagnose, unfortunately ... I prefer when the boards die with obvious symptoms :mrgreen:
Mr. Scary
Posts: 35
Joined: Fri Jul 04, 2008 7:13 pm

Re: Seriously PG, WU's 5765-5767!!!

Post by Mr. Scary »

i'm all over dying with obvious symptoms!! give me a puff of smoke or something!!!
it's 3:14am here in Arizona, USA!! WEEEEEEEEEEEEEEEEEEEE!!!
I'm a night owl, what can i say! hell, i'm up babysitting the damn folder, LMAO!!!
loaded the test, pasted the whatever over there to get it to run.
ran it, 50 iter's no errors.

i'm lost with being able to test the other card though! help..??..??..??...

I double-click the exe in the memtest folder and all I get is the "do you want to transmit" dealio. Do I type the [space] --gpu 1 for the second card there???

any guidance would be helpful. and again, the first AND second go around didn't produce any errors.

M$
Mr. Scary
Posts: 35
Joined: Fri Jul 04, 2008 7:13 pm

Re: Seriously PG, WU's 5765-5767!!!

Post by Mr. Scary »

When the DOS window comes up, how do I tell it to do something other than run the standard/stock test?
i.e., the command line?
No matter what I type at the 'do you want to transmit' line, it starts the test!
Sorry for the noob scene, but what am I missing?
i guess i didn't do this before as i don't remember this monkey $#*T! LOL!!!

m$
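
On the command-line question: options such as the "--gpu 1" mentioned above are read when a program is launched, so they normally have to go on the command that starts it rather than be typed at a prompt the program shows later. A minimal Python sketch of building such a command follows. The executable name MemtestG80.exe and the idea that GPUs are numbered from 0 (so the second card would be 1) are assumptions; only "--gpu 1" comes from this thread.

```python
def memtest_command(gpu_index, exe="MemtestG80.exe"):
    """Build the argument list that would launch the tester on one GPU.

    Assumptions: the executable name, and that GPU indices count from 0.
    """
    if gpu_index < 0:
        raise ValueError("gpu_index must be 0 or greater")
    return [exe, "--gpu", str(gpu_index)]


if __name__ == "__main__":
    # Show the command line for the second card without running anything.
    print(" ".join(memtest_command(1)))
```

The same line can simply be typed into a command prompt opened in the folder that holds the executable; from Python, the list could be handed to `subprocess.run` to launch it.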