18027 Bad Work Unit

Moderators: Site Moderators, FAHC Science Team

v00d00
Posts: 390
Joined: Sun Dec 02, 2007 4:53 am
Hardware configuration: FX8320e (6 cores enabled) @ stock,
- 16GB DDR3,
- Zotac GTX 1050Ti @ Stock.
- Gigabyte GTX 970 @ Stock
Debian 9.

Running GPU since it came out, CPU since client version 3.
Folding since Folding began (~2000) and ran Genome@Home for a while too.
Ran Seti@Home prior to that.
Location: UK

Re: 18027 Bad Work Unit

Post by v00d00 »

Workunits fail. That's part of being in beta, and that's why running beta is opt-in. The chance of things going awry is generally quite high with beta workunits. If you want more stable workunits, remove the beta or advanced flag from your client. There is no shame attached to it. Most people who do beta just accept that things can go wrong. The more workunits you successfully complete, the lower the likelihood of losing QRB, so you could just opt out of beta for a while and build up a large number of completed workunits to lessen the chance of hitting the 80% cut-off point. 20% of 1000 is a considerably bigger buffer than 20% of 100.
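The buffer arithmetic above can be sketched as follows. This is a minimal sketch, assuming the cut-off is a simple ratio of completed to (completed + failed) workunits checked against 80%, as described in the post; the function names are hypothetical and not part of any F@h client or API. Integer math is used so the boundary case is not affected by floating-point rounding.

```python
def max_failures(completed: int, threshold_pct: int = 80) -> int:
    """Hypothetical helper: largest number of failed WUs that keeps
    completed / (completed + failed) at or above threshold_pct percent."""
    if not 0 < threshold_pct <= 100:
        raise ValueError("threshold_pct must be in (0, 100]")
    # completed * 100 >= threshold_pct * (completed + failed)
    # => failed <= completed * (100 - threshold_pct) / threshold_pct
    return completed * (100 - threshold_pct) // threshold_pct


def failures_left(completed: int, failed: int, threshold_pct: int = 80) -> int:
    """Hypothetical helper: how many more failures can be absorbed before
    dropping below the cut-off (negative means it is already below)."""
    return max_failures(completed, threshold_pct) - failed


if __name__ == "__main__":
    for n in (100, 1000):
        print(f"{n} clean completions can absorb {max_failures(n)} failures")
```

Under this assumption, 1000 clean completions can absorb 250 failures (250 being 20% of the resulting 1250 total), while 100 clean completions can absorb only 25.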

I have removed beta a few times in the last couple of months due to unstable workunits from certain projects. I then fold regular workunits for a couple of days and then go back on to beta. For whatever reason, some of these workunits hate my RTX 2080 Ti. I have lost a handful now on these 182xx units, but have also folded many more without issue, so I am leaning more towards an issue with the projects; if it were hardware related, wouldn't they all fail?
Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: 18027 Bad Work Unit

Post by Neil-B »

I believe the project in question may actually have been released to full FAH, hence folders' concerns ... and I believe representations were made to the researcher about this.

I have run some 750+ p18201s and some 50+ p18202s and not had a single failure on my RTX3070 Win 11 setup, so for me they have been stable ... Different projects work GPUs in different ways, and even within projects there can be slight variations in GPU workload ... p18201s, being decently large (from an atom-count perspective), utilise my RTX3070 to the max (unlike some smaller atom-count WUs) and push it to max power usage (running 2025MHz clocks), so it might be that they push your GPU right to the borderline of stability, with the occasional one pushing it just too far? ... Have the errors been NAN ones?
Last edited by Neil-B on Thu Dec 30, 2021 11:02 pm, edited 2 times in total.
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

Joe_H
Site Admin
Posts: 7937
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: 18027 Bad Work Unit

Post by Joe_H »

Yes, Project 18027 was released by the researcher to full F@h with no beta testing, little internal testing, and with no advance notice. There have been messages sent to that researcher.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
psaam0001
Posts: 378
Joined: Mon May 18, 2020 2:02 am
Location: Ruckersville, Virginia, USA

Re: 18027 Bad Work Unit

Post by psaam0001 »

Like WT*.... Seems like someone was in a little bit of a hurry to get going on this, but forgot to institute the proper testing protocols (which may make the process of getting valid information from the data more difficult).

FWIW: I'm running what gets assigned to my system.

Paul
v00d00

Re: 18027 Bad Work Unit

Post by v00d00 »

Oh ok, ignore my post.

This has happened before and it sucks. Probably a researcher who either doesn't know that new workunits are supposed to be tested by the beta team for a week or two to make sure these mishaps don't happen, or someone who doesn't care about following protocol. Either way, someone from the mod team should shoot the problem up the food chain to whoever owns the project and let them know it needs tweaking. With any luck the project will be pulled from public release and pushed back into beta for a bit until the problem is ironed out.
psaam0001 wrote: FWIW: I'm running what gets assigned to my system.
Same here. Not really bothered what I fold, as long as I fold.
Neil-B wrote: ... Have the errors been NAN ones?
My errors were a mixture of NaN and CUDA/random weird error codes, plus Core_18 has dumped a few times for no apparent reason. I've done all the usual checks on my 2080: cleaned out the cooling system, checked the RM850 is giving correct voltages, etc., and it doesn't appear to be the card, PSU or OS. I even tried underclocking the card by 100MHz to see if that would help, but Core_18 still dumped. So I reset it back to stock and just started keeping an eye on it for when it dies. I do wish Windows would kill and restart it automatically instead of telling me it died.