WU reassigned before timeout or after "Ok" completion?


Hopfgeist
Posts: 70
Joined: Thu Jul 09, 2020 12:07 pm
Hardware configuration: Dell T420, 2x Xeon E5-2470 v2, NetBSD 10, SunFire X2270 M2, 2x Xeon X5675, NetBSD 9; various other Linux/NetBSD PCs, Macs and virtual servers.
Location: Germany


Post by Hopfgeist »

I sometimes look up the stats of individual work units I have done, and occasionally discover that they had previously been assigned to someone who returned "Faulty 2" (sounds almost like "42" :lol: ), or, vice versa, that my client had returned a fault or had let a WU run past the timeout, so it got reassigned. I fully understand that, so please don't explain the normal procedure to me.

But take a look at this work unit: it has been reassigned three times, each time within roughly an hour of the previous assignment, long before any timeout.

What makes this even weirder is that the unit got reassigned shortly after the "Credited" time (sometimes less than a minute later), which is when the unit was actually uploaded to the work server, but before the "Returned" time, which I figure is when a server one rung above the download/upload servers looks at it, updates the database, and creates the follow-up work unit (next "Gen").

How and why does this happen? It seems like a big waste of computing power if this happens regularly (I found at least two instances just searching for a minute through random permutations of RCG parameters for Project 14717).
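For anyone who wants to scan their own WU histories for this pattern, here is a minimal Python sketch. It assumes you have copied the assignment history of one Run/Clone/Gen from the stats page by hand; the `Assignment` record, its fields, and the one-day timeout are hypothetical illustrations, not the real stats format or the project's actual timeout.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Assignment:
    assigned: datetime     # when this copy of the WU was handed out
    status: Optional[str]  # e.g. "Ok" or "Faulty 2"; None if never returned


def suspicious_reassignments(history: List[Assignment], timeout: timedelta) -> List[int]:
    """Return indices i where the reassignment to i+1 looks premature.

    Assumes `history` is sorted by assignment time and belongs to one
    Run/Clone/Gen. The timeout value is a hypothetical parameter.
    """
    flagged = []
    for i in range(len(history) - 1):
        prev, nxt = history[i], history[i + 1]
        if prev.status == "Ok":
            # A successful return should advance the Gen, not reissue the same WU.
            flagged.append(i)
        elif prev.status is None and nxt.assigned < prev.assigned + timeout:
            # Reissued while the previous client still had time left.
            flagged.append(i)
        # A faulty return followed by a reassignment is the normal case.
    return flagged


if __name__ == "__main__":
    t0 = datetime(2020, 10, 1, 12, 0)  # illustrative timestamps only
    history = [
        Assignment(t0, "Ok"),
        Assignment(t0 + timedelta(hours=1), "Ok"),
        Assignment(t0 + timedelta(hours=2), "Faulty 2"),
        Assignment(t0 + timedelta(hours=3), None),
    ]
    print(suspicious_reassignments(history, timedelta(days=1)))
```

Faulty returns are deliberately not flagged, since reassigning those is the normal procedure.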


Cheers,
HG
Dell PowerEdge T420: 2x Xeon E5-2470 v2
Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: WU reassigned before timeout or after "Ok" completion?

Post by Neil-B »

So, first thing: the Credited and Returned columns are labeled the wrong way round. This is a known issue.

As to the rapid reissue of the same WU: on occasion there has been a bug in the work server's (WS) generation scripts where they fail to increment the Gen. Reports such as yours allow these to be spotted (if the researcher hasn't already spotted it and sorted out the server). In the past this has happened when projects have been moved from one server to another (a fairly rare occurrence), which sometimes breaks the scripts, and I believe this may have happened to some projects as a result of the recent issues with various servers being down.
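To illustrate the failure mode (not the actual FAH work-server code, which isn't described here), a minimal Python sketch assuming a hypothetical server that tracks each Run/Clone trajectory. `WorkUnit`, `Trajectory`, and their methods are made-up names, and Run 0 / Clone 0 / Gen 5 are illustrative values.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class WorkUnit:
    project: int
    run: int
    clone: int
    gen: int


class Trajectory:
    """One Run/Clone trajectory on a hypothetical work server."""

    def __init__(self, first: WorkUnit):
        self.current = first  # the WU that should be handed out next
        self.completed = set()

    def on_result(self, wu: WorkUnit) -> WorkUnit:
        """Record a successful result and queue the next generation."""
        if wu in self.completed:
            raise ValueError(f"duplicate result for {wu}")
        self.completed.add(wu)
        # The step a broken script skips: advance to Gen + 1.
        self.current = replace(wu, gen=wu.gen + 1)
        return self.current

    def next_assignment(self) -> WorkUnit:
        """Hand out the current WU, refusing to reissue a completed Gen."""
        if self.current in self.completed:
            # Symptom from the thread: the same Gen would go out again.
            raise RuntimeError(f"{self.current} already completed; Gen not incremented")
        return self.current
```

With the guard in `next_assignment`, a script that fails to advance the Gen produces an error on the server instead of silently sending finished work back out to clients.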

Hopefully a message will be forwarded to the researcher concerned to check the scripts on the server.
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

(Green/Bold = Active)
Hopfgeist
Posts: 70
Joined: Thu Jul 09, 2020 12:07 pm
Hardware configuration: Dell T420, 2x Xeon E5-2470 v2, NetBSD 10, SunFire X2270 M2, 2x Xeon X5675, NetBSD 9; various other Linux/NetBSD PCs, Macs and virtual servers.
Location: Germany

Re: WU reassigned before timeout or after "Ok" completion?

Post by Hopfgeist »

Thanks, Neil-B. I thought it might have to do with a slight hiccup when shuffling projects between servers in the middle of a run.

I don't mind how the columns are labelled; I just noticed that the "Credited" time coincided exactly with the time my client reported a successful upload. It makes sense to award points for that exact time, because any subsequent server-side processing is out of the client's control and should not be penalised when calculating the bonus. So if that is how it works, I consider the "Credited" column labelled correctly, though "Returned" would certainly be correct too.

Maybe "Returned" should be labelled "Processed" or something, but I don't really know enough about the internals to say for sure what that time represents.

I'm not really in it for the points anyway (although I like to keep track), but to do my little part in supporting science and fighting disease.

Is there another place (besides this forum) to report such occurrences?

Cheers,
HG
Dell PowerEdge T420: 2x Xeon E5-2470 v2
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: WU reassigned before timeout or after "Ok" completion?

Post by bruce »

Hopfgeist wrote: I thought it might have to do with a slight hiccup when shuffling projects between servers in the middle of a run.
That would be my guess.

If a project is on server A, which is either running out of disk space or seeing too much traffic, changes need to be made. Moving it to server B would seem to be simple (it's not), but there are already a number of WUs being processed by you folks which are pre-programmed to be returned to server A. The project owner needs to capture those WUs that arrive at A and manually transfer them to B.
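The capture-and-transfer step can be pictured with a minimal Python sketch. It assumes a hypothetical table mapping each project to the server that now hosts it; `route_uploads`, the `project_home` table, and project number 12345 are made-up illustrations, not how the FAH servers actually implement this.

```python
from typing import Dict, List, Tuple


def route_uploads(
    uploads: List[Tuple[int, str]],
    project_home: Dict[int, str],
    this_server: str,
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Split results that arrived at this_server into local work and work to forward.

    `uploads` holds (project, wu_id) pairs; `project_home` is a hypothetical
    table of where each project currently lives.
    """
    local: List[str] = []
    forward: List[Tuple[str, str]] = []
    for project, wu_id in uploads:
        # Projects missing from the table are assumed to still live here.
        home = project_home.get(project, this_server)
        if home == this_server:
            local.append(wu_id)
        else:
            # Clients still return this WU to the old server, so pass it on.
            forward.append((home, wu_id))
    return local, forward
```

For example, with project 14717 moved to "B" and hypothetical project 12345 still on "A", server A would keep the 12345 result and forward the 14717 one to B. Any result that slips through this step unforwarded is a candidate for exactly the kind of needless reissue reported above.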