Only 60% of WUs are Credited
Moderators: Site Moderators, FAHC Science Team
Only 60% of WUs are Credited
I have been running FAH since 2022 on a Dell desktop bought in 2018. In January, I started running version 8.4.9. FAH uses two CPUs and a GPU. I shut the PC down each night and sometimes pause FAH to reduce surges in CPU fan noise. Is this setup OK for FAH?
I just checked my Work Units at https://v8-4.foldingathome.org/wus.
In the last 33 days, my system has attempted 161 WUs.
Only 94 WUs (about 60%) reached 100% completion and were Credited!
43 WUs were lost to Shutting Down at an average of 50% progress.
22 WUs were Dumped at an average of 30% progress; most of these WUs were then Credited on other people's systems so they could have been Credited on my system.
One WU Failed; it has now Failed 278 times on other people's systems!
So, FAH did not complete more than one-third of all WUs assigned to my machine. Issues seem to affect CPU and GPU WUs equally. That is not a great use of my compute. If that is also affecting other volunteers, it could be a problem. Other people might want to check their Work Units.
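For anyone who wants to run the same check on their own Work Units page, the tally above can be reproduced in a few lines of Python. This is only a sketch using the counts quoted in this post; the variable and function names are made up for illustration. Note that the four outcome counts sum to 160, so one of the 161 attempted WUs does not appear in the breakdown.

```python
# Counts as quoted in the post (33-day window); names are illustrative only.
attempted = 161
outcomes = {"Credited": 94, "Shutting Down": 43, "Dumped": 22, "Failed": 1}


def share(count, total):
    """Return count as a percentage of total, rounded to one decimal."""
    return round(100 * count / total, 1)


shares = {name: share(n, attempted) for name, n in outcomes.items()}
not_credited = attempted - outcomes["Credited"]
unaccounted = attempted - sum(outcomes.values())

for name, pct in shares.items():
    print(f"{name:>13}: {outcomes[name]:3d} WUs ({pct}%)")
print(f"Not credited: {not_credited} WUs ({share(not_credited, attempted)}%)")
print(f"Not in the breakdown: {unaccounted} WU(s)")
```

With these inputs, about 58% of WUs were Credited and about 42% were not, which matches "more than one-third" in the text.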
I have seen the good advice to pause FAH and wait a minute before shutting down Windows. Then remember to resume FAH after I reboot or start the PC. I will try to do so. But it is easy to forget and should not be necessary. Does FAH reliably resume from the last checkpoint?
I have run many volunteer computing projects that do not lose their WUs after reboot. For example, my BOINC projects only lose a WU every few months. Ideas welcome, thanks!
-
- Site Moderator
- Posts: 1438
- Joined: Sat Dec 08, 2007 1:33 am
- Location: San Francisco, CA
Re: Only 60% of WUs are Credited
Another workaround is to set folding to Finish after starting it. Then it might already be paused if you forget to manually pause before a shutdown.
-
- Site Admin
- Posts: 8087
- Joined: Tue Apr 21, 2009 4:41 pm
- Hardware configuration: Mac Studio M1 Max 32 GB smp6
Mac Hack i7-7700K 48 GB smp4
- Location: W. MA
Re: Only 60% of WUs are Credited
On Windows, pause folding a minute or two before shutting down your PC. Windows is supposed to wait for the folding process to exit, but it often does not wait long enough. This is a known issue with Windows. The client's setup includes code that asks Windows to wait, but from many reports Windows ignores it.
In my experience this is a long-running Windows bug. I have dealt with the results of Windows not waiting for processes to exit for over 20 years. It happens on regular shutdowns and also on the shutdowns and reboots that are part of software updates.
-
- Site Moderator
- Posts: 1438
- Joined: Sat Dec 08, 2007 1:33 am
- Location: San Francisco, CA
Re: Only 60% of WUs are Credited
I think logout will also do it.
Code in the client is not sufficient.
The dev maybe thinks that Windows kills the cores before the client can stop them normally, so the client then assumes the cores crashed and dumps the work.
Re: Only 60% of WUs are Credited
That probably is it. The current code is pretty liberal about dumping the WU for many core exit reasons and does not always make optimal decisions: https://github.com/FoldingAtHome/fah-cl ... ExitCode.h. It would probably be CLIENT_DIED, which is an overloaded status:
Code:
* DEFAULT - v322-v600: DUMP and ERROR.
* v623: If SMP then DUMP and ERROR else EXIT.
* CLIENT_DIED, BAD_WORK_CHECKSUM, MALLOC_ERROR, UNKNOWN_ERROR
It can be triggered by mistake, for example (on Linux at least) system-wide resource limits can send SIGXCPU and SIGKILL to the core, but not the client. But one can't just make CLIENT_DIED more relaxed because that is probably the same error that would happen if the core has a bug that causes it to receive SIGSEGV (just a guess). Windows probably has something similar. I don't think the client distinguishes different types of exits caused by signals (or their Windows equivalents). It treats it as an unknown error internally.
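To illustrate the distinction being discussed, here is a minimal Python sketch of how a parent process on POSIX could tell "killed from outside" apart from "crashed on its own". It is not the fah-client's code and the policy names are hypothetical. It relies only on the documented behaviour that `subprocess` reports a child ended by signal N as return code -N (POSIX only). Which signals belong in which group is an assumption made for the example.

```python
import signal


def _signals(*names):
    """Collect the named signals that exist on this platform."""
    return {getattr(signal, n) for n in names if hasattr(signal, n)}


# Assumed grouping, for illustration only.
EXTERNAL_KILL = _signals("SIGTERM", "SIGKILL", "SIGXCPU", "SIGHUP")
LIKELY_CRASH = _signals("SIGSEGV", "SIGBUS", "SIGFPE", "SIGILL", "SIGABRT")


def classify_exit(returncode):
    """Map a subprocess return code to a hypothetical handling policy."""
    if returncode == 0:
        return "finished"
    if returncode < 0:  # POSIX: child was terminated by signal -returncode
        sig = -returncode
        if sig in EXTERNAL_KILL:
            return "killed-externally"  # candidate for resume from checkpoint
        if sig in LIKELY_CRASH:
            return "crashed"  # candidate for dump and error report
        return "unknown-signal"
    return "error-exit"  # the core chose its own nonzero exit code
```

A client using something like this could keep the WU and retry from the last checkpoint for "killed-externally", while still dumping on "crashed". Whether a Windows equivalent exists with the same level of detail is a separate question.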
-
- Site Moderator
- Posts: 1438
- Joined: Sat Dec 08, 2007 1:33 am
- Location: San Francisco, CA
Re: Only 60% of WUs are Credited
There is an unreleased commit that changes something about the terminate order.
I don’t know if anyone else has tested it.
https://github.com/FoldingAtHome/fah-cl ... 429c9444cc
-
- Posts: 1531
- Joined: Sun Dec 16, 2007 6:22 pm
- Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX 550 640SP
- Location: London
Re: Only 60% of WUs are Credited
Ideally, fahclient would probe the Works folder and fold everything in it before downloading new WUs. If an existing WU has expired, dump it; if not, continue folding.
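The idea could be sketched roughly as below. This is not how fahclient works today and it is not a proposal for its actual code. The folder layout is invented for the example: one subdirectory per WU, each holding a hypothetical `wu.json` with an ISO 8601 `deadline` field. The function name `plan_startup` is likewise made up.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def plan_startup(work_dir, now):
    """Decide what to do with leftover WUs found in work_dir.

    Assumes (hypothetically) one subdirectory per WU containing wu.json
    with a timezone-aware ISO 8601 "deadline".
    """
    plan = {"resume": [], "dump": []}
    for wu_dir in sorted(p for p in Path(work_dir).iterdir() if p.is_dir()):
        meta_file = wu_dir / "wu.json"
        if not meta_file.is_file():
            continue  # not a WU directory we recognise; leave it alone
        meta = json.loads(meta_file.read_text())
        deadline = datetime.fromisoformat(meta["deadline"])
        # Expired WUs are dumped; anything still in time is resumed.
        key = "dump" if deadline <= now else "resume"
        plan[key].append(wu_dir.name)
    return plan


def should_download_new_work(plan):
    """Only ask for new WUs once nothing is left to resume."""
    return not plan["resume"]
```

Called at startup as `plan_startup(work_dir, datetime.now(timezone.utc))`, this would resume leftover WUs first and request new work only when none remain.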
-
- Site Moderator
- Posts: 1438
- Joined: Sat Dec 08, 2007 1:33 am
- Location: San Francisco, CA
Re: Only 60% of WUs are Credited
Sounds like a great enhancement request.
-
- Posts: 1531
- Joined: Sun Dec 16, 2007 6:22 pm
- Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX 550 640SP
- Location: London
Re: Only 60% of WUs are Credited
I think I have been asking Joe for this ever since Windows started forgetting WUs upon restart. The Works folder would have many forgotten WUs still sitting there. Later that developed into killing WUs.
-
- Site Moderator
- Posts: 1438
- Joined: Sat Dec 08, 2007 1:33 am
- Location: San Francisco, CA
Re: Only 60% of WUs are Credited
Don’t ask him. Just create a ticket.
Re: Only 60% of WUs are Credited
Thanks for the discussion! I will continue to Pause and then to wait a minute before shutdown. I see fewer lost WUs when I do that. How many other users or devices does this issue affect? What percentage of all FAH WUs are being lost at shutdown?
-
- Posts: 1531
- Joined: Sun Dec 16, 2007 6:22 pm
- Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX 550 640SP
- Location: London
Re: Only 60% of WUs are Credited
Everyone on Windows who does not pause before a reboot loses WUs.
-
- Site Admin
- Posts: 8087
- Joined: Tue Apr 21, 2009 4:41 pm
- Hardware configuration: Mac Studio M1 Max 32 GB smp6
Mac Hack i7-7700K 48 GB smp4
- Location: W. MA
Re: Only 60% of WUs are Credited
This stats page - https://stats.foldingathome.org/os - gives numbers for current folders and the OS used. It doesn't include Intel GPU stats, and any CPU folding on Raspberry Pi and similar systems is probably included under Linux. But Windows and Linux are almost even, at around 50% for Windows and 45% for Linux. I do not know why there are separate Windows and Win64 categories.
It would take someone scanning the server logs to get an idea of which clients are using v7 versus v8. I haven't heard of that being done recently; the last time I heard of it being done was years ago. Some were still using very early versions of v7.