Proposal: Allow to skip or reject work units
Posted: Tue Mar 10, 2020 4:11 am
by Goetz
Scenario: I noticed that work unit 11737 restarts from the beginning (0%) whenever I stop and restart it. It's the CORE22 test project, which would require 23h10m to complete on my computer using the GPU (NVIDIA:5 GM108 [MX110]). I use a laptop that sometimes has to run on battery only, and that stops folding. Therefore such a work unit can never be finished on my computer.
Proposal: I propose to add a "skip" or "reject" command (besides "fold", "pause" and "finish") in order to be able to give a work unit back to the community before the time expires within which the computer has to complete the project.
Re: Proposal: Allow to skip or reject work units
Posted: Tue Mar 10, 2020 4:27 am
by bruce
All projects should restart from the most recent checkpoint. If you try to restart a project before it reaches the first checkpoint, then it's going to restart from 0%. Checkpoints for CPU projects are based on a specific time interval which you can adjust.
The checkpoint interval for GPU projects (like 11737) is defined by the project owner based on the number of scientific samples needed per WU. That information is not as easy to obtain as it should be, and an enhancement request has been submitted which might be included in a future code revision. I find that information in two places. It's printed in the science log while the WU is being initialized. It can also be figured out if you peek at the work directory for that project.
Note: If you're on Windows and you shut down Windows without first doing a PAUSE, you're likely to corrupt the most recent checkpoint for the active WU.
Re: Proposal: Allow to skip or reject work units
Posted: Tue Mar 10, 2020 7:55 am
by JimboPalmer
If you cannot complete Core_22 Work Units because you are on a laptop, you can delete the GPU slot. Then you will only get Core_a7 WUs for your CPU.
You can manage a CPU WU to better cope with pauses.
(If you only occasionally do not want GPU WUs, you can pause just that one slot: just right-click on it in the Advanced Control program.)
Re: Proposal: Allow to skip or reject work units
Posted: Tue Mar 10, 2020 11:42 am
by HaloJones
Allowing users to skip/reject units will simply encourage cherry-picking.
Re: Proposal: Allow to skip or reject work units
Posted: Wed Apr 01, 2020 1:38 pm
by scolphoy
Not sure if it's available in all versions, but at least the Linux FAHClient has the --dump (wu|all) option, which will dump the work unit and inform the servers.
I looked into this just a moment ago, when I saw that my server had spent five days working on a WU and done about 40%, with the timeout later today, expiry in 3 days, and an ETA still about 7 days away. I knew it would never make it on time and the client would dump it anyway in a few days once the expiry time had passed. My options, as I saw them, were:
- Not touch it, have it waste electricity for a few more days, then dump the expired WU and get new work.
- Shut down the box for a couple of days, save the energy, then dump the expired WU and get new work.
- Dump the WU now and get new work right away.
The third option seemed the most efficient overall, so I decided to go with that. I got a new WU, and the ETA for it shows just a few hours, so it seems something was wrong with the old WU.
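The decision above comes down to simple arithmetic. Here is a minimal sketch in Python, assuming progress stays roughly linear (the client's own ETA may be computed differently). The function names and parameters are hypothetical and not part of FAHClient:

```python
def days_to_finish(days_elapsed: float, fraction_done: float) -> float:
    """Estimate the remaining days, assuming progress stays linear."""
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    rate = fraction_done / days_elapsed  # fraction of the WU completed per day
    return (1 - fraction_done) / rate


def will_finish(days_elapsed: float, fraction_done: float, days_to_expiry: float) -> bool:
    """True if the linear estimate fits within the time left before expiry."""
    return days_to_finish(days_elapsed, fraction_done) <= days_to_expiry


# Numbers from this post: 5 days spent, about 40% done, expiry in 3 days.
remaining = days_to_finish(days_elapsed=5, fraction_done=0.40)
print(f"about {remaining:.1f} days remaining")
print("finishes before expiry:", will_finish(5, 0.40, days_to_expiry=3))
```

With these numbers the estimate is about 7.5 days, in line with the roughly 7-day ETA the client showed and well past the 3 days left before expiry.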
In case someone on the team wants to look into the old WU, here's a log for when I dumped it:
12:57:48:WARNING:Dumping WU00 per user request
12:57:48:WU00:FS00:Sending unit results: id:00 state:SEND error:DUMPED project:13821 run:237 clone:1 gen:83 core:0xa7 unit:0x0000006080fccb095c883992cea0aa06
12:57:49:WU00:FS00:Connecting to 155.247.166.219:8080
12:57:50:WU00:FS00:Server responded WORK_QUIT (404)
12:57:50:WARNING:WU00:FS00:Server did not like results, dumping
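When reporting a WU like this one, the project, run, clone and gen numbers in the second log line are what identify it. Here is a minimal sketch in Python for pulling them out of such a line. The function name is hypothetical, and the pattern is assumed only from the single log line above, not from any documented log format:

```python
import re

# Matches the "project:... run:... clone:... gen:..." part of the log line above.
UNIT_RE = re.compile(
    r"project:(?P<project>\d+) run:(?P<run>\d+) clone:(?P<clone>\d+) gen:(?P<gen>\d+)"
)


def parse_prcg(line: str):
    """Return (project, run, clone, gen) as ints, or None if the line does not match."""
    m = UNIT_RE.search(line)
    if m is None:
        return None
    return tuple(int(m.group(k)) for k in ("project", "run", "clone", "gen"))


line = (
    "12:57:48:WU00:FS00:Sending unit results: id:00 state:SEND error:DUMPED "
    "project:13821 run:237 clone:1 gen:83 core:0xa7 "
    "unit:0x0000006080fccb095c883992cea0aa06"
)
print(parse_prcg(line))  # (13821, 237, 1, 83)
```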
I recommend making dumping a WU simple, so that other people with a hopeless WU could also put that time and energy into better use. That way you learn sooner if you need to send that WU to someone else and the science can continue faster. I don't believe cherry picking would actually become a real issue.
Re: Proposal: Allow to skip or reject work units
Posted: Wed Apr 01, 2020 7:12 pm
by Joe_H
scolphoy wrote:I recommend making dumping a WU simple, so that other people with a hopeless WU could also put that time and energy into better use. That way you learn sooner if you need to send that WU to someone else and the science can continue faster. I don't believe cherry picking would actually become a real issue.
For reasons already given here and elsewhere, that will not happen. Removal of a problem WU is already fairly easy. As for "cherry picking" becoming an issue, it has at times in the past. This project has been running for 20 years, and this has been seen more than a few times. Some cases even caused problems that affected other folders' ability to contribute.
As for the WU that was taking too long, you did not provide enough log data to tell, but it might have been one of a small batch that got created with several times the normal number of steps for that project. If you had noticed and posted earlier, we would have checked the log information to see if that was the problem, and given directions on how to get rid of the WU.
Re: Proposal: Allow to skip or reject work units
Posted: Thu Apr 02, 2020 12:14 am
by scolphoy
Ok. Cherry picking volunteer computing tasks seems very pointless and vain, but then again, people are known to do a lot of pointless stuff. We don't have to go deeper on this here.
If you say that this has happened before, I believe you.
Re: Proposal: Allow to skip or reject work units
Posted: Thu Jul 02, 2020 9:16 am
by susanreads
I'm running one CPU slot on my laptop, and I've got a WU that won't finish by deadline (project 16805). I ran it all last night with nothing else running, TPF is just over an hour, timeout is on Saturday and deadline at 23:57Z on Sunday. Is there a way to reject it so that the server assigns it to someone else without waiting till Saturday?
I could just Stop Folding and save the electricity until it expires, but I'd rather someone with a faster machine could get on with it and I could get one that I might finish before timeout (I've completed 44, all before deadline until now, and the majority before timeout, so my slow machine isn't completely useless).
Joe_H says "Removal of a problem WU is already fairly easy" but I don't know how to do it, and I can't find anything useful in the foldingathome FAQ.
Re: Proposal: Allow to skip or reject work units
Posted: Thu Jul 02, 2020 9:32 am
by ajm
To dump the WU, pause the slot (or the whole client, since you have only that one slot), then delete the folder containing the WU in %AppData%\FAHClient\work on Windows, /var/lib/fahclient/work on Linux, or /Library/Application Support/FAHClient/work on macOS. Since you have only one slot, you can also delete the whole work folder. Then restart the slot or FAH, and your client will download another WU.
For pausing and restarting, you use Advanced Control (aka FAHControl).
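To find the right folder before deleting anything, something like the following Python sketch may help. It only lists the WU folders and deletes nothing. The function names are hypothetical, the paths are the ones given in this post (assuming the Linux path is the absolute /var/lib/fahclient/work), and a custom install may keep its data elsewhere:

```python
import os
import sys
from pathlib import Path


def work_dir(platform: str = sys.platform) -> Path:
    """Return the FAHClient work folder for a platform, using the paths from this post."""
    if platform.startswith("win"):
        # %AppData%\FAHClient\work
        return Path(os.environ.get("APPDATA", "")) / "FAHClient" / "work"
    if platform == "darwin":
        return Path("/Library/Application Support/FAHClient/work")
    # Assumption: any other platform is treated as Linux.
    return Path("/var/lib/fahclient/work")


def list_work_units(root: Path) -> list[str]:
    """List WU subfolders; return an empty list if the folder is missing or unreadable."""
    try:
        return sorted(p.name for p in root.iterdir() if p.is_dir())
    except OSError:
        return []


root = work_dir()
print(root, list_work_units(root))
```

Pause the slot first, as described above, before removing any of the listed folders by hand.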
Re: Proposal: Allow to skip or reject work units
Posted: Thu Jul 02, 2020 3:11 pm
by susanreads
Thanks ajm, that seems to have worked. There wasn't even much of a delay as it downloaded a new WU and now it's folding something of a reasonable size again.