
Another duplicate: project:17403 run:0 clone:1930 gen:3

Posted: Mon Oct 19, 2020 10:24 pm
by midhart90
Just got sent another duplicate on 17403:

Project:17403 run:0 clone:1930 gen:3

https://apps.foldingathome.org/wu#proje ... 1930&gen=3

Looks like I'm the sixth to have received this (though two of the prior users returned it as "Faulty 2").

Re: Another duplicate: project:17403 run:0 clone:1930 gen:3

Posted: Tue Oct 20, 2020 4:40 am
by PantherX
Since 2 of them were reported as Faulty, that would generate 4 additional copies in total, 2 for each report.

Re: Another duplicate: project:17403 run:0 clone:1930 gen:3

Posted: Tue Oct 20, 2020 2:27 pm
by midhart90
Ah, so each time a WU is returned as Faulty, it automatically triggers two copies to be sent out as a verification process of sorts, even if there's already a successful completion on file? Is this expected behavior?

Re: Another duplicate: project:17403 run:0 clone:1930 gen:3

Posted: Wed Oct 21, 2020 4:04 am
by PantherX
AFAIK, it's the expected behavior, and the limit is 5: once a WU has 5 Faulty returns, the WS stops issuing any additional copies of it. The value of 5 is the default and may be changed by researchers depending on their needs, but it is rarely changed.

Re: Another duplicate: project:17403 run:0 clone:1930 gen:3

Posted: Wed Oct 21, 2020 7:31 am
by bruce
Why is this the expected behavior?
Shouldn't we recommend it be changed?

Re: Another duplicate: project:17403 run:0 clone:1930 gen:3

Posted: Wed Oct 21, 2020 4:12 pm
by midhart90
Last night, a little after halfway through this WU (I believe it was at about 54% or so), the client went berserk, immediately advancing to 99.99% with an ETA of "Unknown" and showing something like 4.2 billion for the estimated PPD. There was nothing in the logs about it, it's as though it just stopped recording immediately before this happened. I ended up killing it once it became clear that it was not going any further (had I restarted it from scratch, it wouldn't have finished in time anyway). The client downloaded and started a new WU, and it appears to be working fine now.

Re: Another duplicate: project:17403 run:0 clone:1930 gen:3

Posted: Wed Oct 21, 2020 6:27 pm
by Knish
Questioning this "expected behavior": https://apps.foldingathome.org/wu#proje ... e=9&gen=51
Time will tell if we see the same thing.

Re: Another duplicate: project:17403 run:0 clone:1930 gen:3

Posted: Thu Oct 22, 2020 8:26 am
by bruce
"Berserk" is not the expected behavior, of course, but we don't really have enough information to diagnose what actually happened.

Re: Another duplicate: project:17403 run:0 clone:1930 gen:3

Posted: Fri Oct 23, 2020 8:18 pm
by PantherX
bruce wrote:Why is this the expected behavior?
Shouldn't we recommend it be changed?
I guess it is expected behavior because it is a legacy decision.

There are two options to think about:
1) Prevention of Duplicated WUs:
If a WU returns a Faulty result, send out 1 additional copy. This is capped at 5 by default (unless changed in the Project).

2) Fastest Science Advancement (Current behavior):
If a WU returns a Faulty result, send out 2 additional copies. Stop assigning the WU once it has 5 Faulty results; this cap is 5 by default (unless changed in the Project).

Personally, I would like both options to be implemented, with #1 as the default and #2 available for certain time-sensitive Projects (Moonshot, etc.).
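For anyone trying to picture the difference, here is a rough simulation of the two options. This is a hypothetical model, not the actual Work Server code: the names ReissuePolicy, WorkUnit, report_faulty and simulate are made up for illustration. It assumes, as described in this thread, that copies go out on every Faulty return (even if a successful result is already on file) and that the cap counts Faulty returns, with 5 as the per-project default.

```python
from dataclasses import dataclass
from enum import Enum


class ReissuePolicy(Enum):
    PREVENT_DUPLICATES = "option 1"   # one extra copy per Faulty return
    FASTEST_SCIENCE = "option 2"      # two extra copies per Faulty return (current behavior)


# Hypothetical mapping from policy to copies sent out per Faulty return.
COPIES_PER_FAULTY = {
    ReissuePolicy.PREVENT_DUPLICATES: 1,
    ReissuePolicy.FASTEST_SCIENCE: 2,
}


@dataclass
class WorkUnit:
    policy: ReissuePolicy
    max_faulty: int = 5      # assumed default cap, adjustable per project
    faulty_returns: int = 0
    copies_issued: int = 1   # counts the original assignment

    def report_faulty(self) -> int:
        """Record one Faulty return; return how many new copies are sent out."""
        self.faulty_returns += 1
        if self.faulty_returns >= self.max_faulty:
            return 0  # cap reached: stop issuing copies of this WU
        extra = COPIES_PER_FAULTY[self.policy]
        self.copies_issued += extra
        return extra


def simulate(policy: ReissuePolicy, faulty_reports: int, max_faulty: int = 5) -> int:
    """Total copies issued (original included) after a number of Faulty returns."""
    wu = WorkUnit(policy, max_faulty)
    for _ in range(faulty_reports):
        wu.report_faulty()
    return wu.copies_issued


if __name__ == "__main__":
    for policy in ReissuePolicy:
        print(policy.value, simulate(policy, 2), simulate(policy, 5))
```

With two Faulty reports, the current behavior sends out 4 extra copies (the case in the first reply), while option #1 would send out only 2. Once the 5th Faulty return comes in, neither policy issues more copies.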

Re: Another duplicate: project:17403 run:0 clone:1930 gen:3

Posted: Fri Oct 23, 2020 8:19 pm
by PantherX
midhart90 wrote:...the client went berserk, immediately advancing to 99.99% with an ETA of "Unknown" and showing something like 4.2 billion for the estimated PPD. There was nothing in the logs about it, it's as though it just stopped recording immediately before this happened...
When it happens again, please capture screenshots and file an issue (using the template) with as much information as possible: https://github.com/FoldingAtHome/fah-issues/issues