Only 60% of WUs are Credited
Posted: Wed Mar 05, 2025 5:08 am
by markhl
I have been running FAH since 2022 on a Dell desktop bought in 2018. In January, I started running version 8.4.9. FAH uses two CPUs and a GPU. I shut the PC down each night and sometimes pause FAH to reduce surges in CPU fan noise. Is this setup OK for FAH?
I just checked my Work Units at https://v8-4.foldingathome.org/wus.
In the last 33 days, my system has attempted 161 WUs.
Only 94 WUs (about 60%) reached 100% completion and were Credited!
43 WUs were lost to Shutting Down at an average of 50% progress.
22 WUs were Dumped at an average of 30% progress; most of these WUs were then Credited on other people's systems so they could have been Credited on my system.
One WU Failed; it has now Failed 278 times on other people's systems!
So, FAH did not complete more than one-third of all WUs assigned to my machine. Issues seem to affect CPU and GPU WUs equally. That is not a great use of my compute. If that is also affecting other volunteers, it could be a problem. Other people might want to check their Work Units.
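As a quick check of the arithmetic in this post, here is a minimal sketch that recomputes the shares from the counts given above. The variable and function names are illustrative only and do not come from the FAH site.

```python
# Counts reported in this post for the last 33 days.
attempted = 161
credited = 94
lost_at_shutdown = 43
dumped = 22
failed = 1


def share(part, total):
    """Return part/total as a percentage, rounded to one decimal place."""
    return round(100 * part / total, 1)


credited_pct = share(credited, attempted)
not_credited = attempted - credited
not_credited_pct = share(not_credited, attempted)

# The categories listed in the post add up to one fewer than the total
# attempted, so one WU is not covered by the breakdown.
accounted = credited + lost_at_shutdown + dumped + failed

print(f"credited:     {credited_pct}%")
print(f"not credited: {not_credited_pct}%")
print(f"accounted for: {accounted} of {attempted}")
```

The result is consistent with "about 60%" credited and "more than one-third" not completed.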
I have seen the good advice to pause FAH and wait a minute before shutting down Windows, and then to remember to resume FAH after I reboot or start the PC. I will try to do so. But it is easy to forget and should not be necessary. Does FAH reliably resume from the last checkpoint?
I have run many volunteer computing projects that do not lose their WUs after reboot. For example, my BOINC projects only lose a WU every few months. Ideas welcome, thanks!
Re: Only 60% of WUs are Credited
Posted: Wed Mar 05, 2025 5:17 am
by calxalot
Another workaround is to set folding to Finish after starting it. Then it might already be paused if you forget to manually pause before a shutdown.
Re: Only 60% of WUs are Credited
Posted: Wed Mar 05, 2025 5:35 am
by Joe_H
On Windows, pause folding a minute or two before shutting down your PC. Windows is supposed to wait for the folding process to exit, but it often does not wait long enough. This is a known issue with Windows. There is code in the setup of the client to have Windows wait, but from many reports Windows is ignoring it.
Personally, my experience with Windows is that this is a long-running bug. I have dealt with the results of Windows not waiting for processes to exit for over 20 years. It does this for regular shutdowns, and also for the shutdowns and reboots that are part of software updates.
Re: Only 60% of WUs are Credited
Posted: Wed Mar 05, 2025 5:55 am
by calxalot
I think logout will also do it.
Code in the client is not sufficient.
The dev maybe thinks that Windows kills the cores before the client can stop them normally, then the client assumes the cores crashed and dumps the work.
Re: Only 60% of WUs are Credited
Posted: Wed Mar 05, 2025 8:20 am
by arisu
calxalot wrote: ↑Wed Mar 05, 2025 5:55 am
I think logout will also do it.
Code in the client is not sufficient.
The dev maybe thinks that Windows kills the cores before the client can stop them normally, then the client assumes the cores crashed and dumps the work.
That probably is it. The current code is pretty liberal about dumping the core for many exit reasons and does not always make optimal decisions:
https://github.com/FoldingAtHome/fah-cl ... ExitCode.h. It would probably be CLIENT_DIED, which is an overloaded status:
Code:
* DEFAULT - v322-v600: DUMP and ERROR.
* v623: If SMP then DUMP and ERROR else EXIT.
* CLIENT_DIED, BAD_WORK_CHECKSUM, MALLOC_ERROR, UNKNOWN_ERROR
Non-SMP clients just exit without dumping the WU for some reason.
It can be triggered by mistake, for example (on Linux at least) system-wide resource limits can send SIGXCPU and SIGKILL to the core, but not the client. But one can't just make CLIENT_DIED more relaxed because that is probably the same error that would happen if the core has a bug that causes it to receive SIGSEGV (just a guess). Windows probably has something similar. I don't think the client distinguishes different types of exits caused by signals (or their Windows equivalents). It treats it as an unknown error internally.
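To illustrate the distinction discussed here, the following is a minimal sketch of how a supervising process could, in principle, tell a signal-caused exit apart from an ordinary non-zero exit code. It is not the FAH client's code, and the helper names `classify_exit` and `run_child` are hypothetical. It relies on the documented POSIX behavior of Python's `subprocess` module, where a child killed by signal N is reported with return code -N; Windows reports termination differently.

```python
import signal
import subprocess
import sys


def classify_exit(returncode):
    """Describe how a child process ended, based on its return code."""
    if returncode == 0:
        return "clean exit"
    if returncode < 0:
        # POSIX only: subprocess reports death by signal N as -N.
        name = signal.Signals(-returncode).name
        return f"killed by {name}"
    return f"exited with code {returncode}"


def run_child(code):
    """Run a short Python snippet as a child process; return its return code."""
    return subprocess.run([sys.executable, "-c", code]).returncode


if __name__ == "__main__":
    # A child that exits on its own with an error code.
    print(classify_exit(run_child("import sys; sys.exit(3)")))
    # A child that is killed from outside its own control (POSIX only).
    print(classify_exit(run_child(
        "import os, signal; os.kill(os.getpid(), signal.SIGKILL)")))
```

With this kind of information a supervisor could treat SIGKILL or SIGXCPU (likely an external kill) differently from SIGSEGV (likely a bug in the child), although whether that is safe for the real client is a separate question.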
Re: Only 60% of WUs are Credited
Posted: Wed Mar 05, 2025 8:38 am
by calxalot
There is an unreleased commit that changes something about the terminate order.
I don’t know if anyone else has tested it.
https://github.com/FoldingAtHome/fah-cl ... 429c9444cc
Re: Only 60% of WUs are Credited
Posted: Wed Mar 05, 2025 2:58 pm
by muziqaz
Ideally, we would need fahclient to probe the Works folder and fold everything that is in there before downloading new WUs. If an existing WU has expired, dump it; if not, continue folding.
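The proposed behavior can be sketched roughly as follows. This is only an illustration of the idea under stated assumptions: the `LocalWU` record, its `deadline` field, and both function names are hypothetical, and nothing here reflects how fahclient actually stores or tracks work.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class LocalWU:
    """Hypothetical record for a WU found on disk at startup."""
    wu_id: str
    deadline: datetime  # assumed: the time after which the WU is expired


def triage_local_wus(local_wus, now):
    """Split WUs found on disk into those to resume and those to dump."""
    resume, dump = [], []
    for wu in local_wus:
        if wu.deadline <= now:
            dump.append(wu.wu_id)    # expired: dump it
        else:
            resume.append(wu.wu_id)  # still valid: continue folding
    return resume, dump


def next_action(local_wus, now):
    """Prefer resuming existing work; download only if nothing is left."""
    resume, _ = triage_local_wus(local_wus, now)
    if resume:
        return ("resume", resume[0])
    return ("download", None)


now = datetime(2025, 3, 5, tzinfo=timezone.utc)
found = [
    LocalWU("wu-a", now + timedelta(days=1)),   # not yet expired
    LocalWU("wu-b", now - timedelta(hours=1)),  # already expired
]
print(triage_local_wus(found, now))
print(next_action(found, now))
```

The key design point is the ordering: new work is requested only after everything already on disk has been either resumed or dumped.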
Re: Only 60% of WUs are Credited
Posted: Wed Mar 05, 2025 8:02 pm
by calxalot
Sounds like a great enhancement request.
Re: Only 60% of WUs are Credited
Posted: Wed Mar 05, 2025 8:08 pm
by muziqaz
calxalot wrote: ↑Wed Mar 05, 2025 8:02 pm
Sounds like a great enhancement request.
I think I asked Joe for this, ever since Windows started forgetting WUs upon restart. The Works folder would have many forgotten WUs still sitting there. Later that developed into killing WUs.
Re: Only 60% of WUs are Credited
Posted: Wed Mar 05, 2025 8:19 pm
by calxalot
Don’t ask him. Just create a ticket.
Re: Only 60% of WUs are Credited
Posted: Wed Mar 05, 2025 8:59 pm
by muziqaz
calxalot wrote: ↑Wed Mar 05, 2025 8:19 pm
Don’t ask him. Just create a ticket.
I think I did
Re: Only 60% of WUs are Credited
Posted: Mon Mar 17, 2025 2:25 am
by markhl
Thanks for the discussion! I will continue to Pause and then to wait a minute before shutdown. I see fewer lost WUs when I do that. How many other users or devices does this issue affect? What percentage of all FAH WUs are being lost at shutdown?
Re: Only 60% of WUs are Credited
Posted: Mon Mar 17, 2025 5:54 am
by muziqaz
Everyone on Windows who does not pause before rebooting loses WUs.
Re: Only 60% of WUs are Credited
Posted: Tue Mar 18, 2025 3:32 am
by arisu
muziqaz wrote: ↑Mon Mar 17, 2025 5:54 am
Everyone on Windows who does not pause before rebooting loses WUs.
That seems like an extremely serious problem for the project. What percentage of Windows folders are using the v8 client?
Re: Only 60% of WUs are Credited
Posted: Tue Mar 18, 2025 4:32 am
by Joe_H
This stats page, https://stats.foldingathome.org/os, gives numbers for current folders and the OS used. It doesn't include Intel GPU stats, and any CPU folding on Raspberry Pi and similar systems is probably included under Linux. But Windows and Linux are almost even, at around 50% for Windows and 45% for Linux. I do not know why there are separate Windows and Win64 categories.
It would take someone scanning the server logs to get an idea of which clients are using v7 versus v8. I haven't heard of that being done recently; the last time I heard of it being done was years ago. Some were still using very early versions of v7.