project 16944 WU 48,36,48 takes too long, how to stop
-
- Posts: 16
- Joined: Mon Apr 06, 2020 9:14 am
project 16944 WU 48,36,48 takes too long, how to stop
hello,
Yesterday I updated to 7.6.21.
Now I get very long WUs.
Overnight I received WU 48,36,48 from project 16944, which needs another 2.3 days to finish, but the timeout comes a few hours before that.
I cannot finish it. That is sad for the scientists and for me. Is there any way to stop it so that somebody with more computing power can run that WU?
I am wasting my CPU time here.
Besides that, since 7.6.21 and the long WUs, I get only about 30% of the points I got last year with the older version.
I have an i7 NUC, which is suited to short WUs only. I am surprised that I get WUs with runtimes of more than 2 days.
All of last year it ran without problems and WUs took at most half a day, but the newer ones take 2-3 days.
BR
Re: project 16944 WU 48,36,48 takes too long, how to stop
It would be nice if you could post roughly the first 200 lines of your FAH log.
The actual folding cores are the same even if your client is newer, so that shouldn't have changed just because you upgraded the client. It's possible that your config file was messed up by the upgrade so that your number of CPU threads isn't ideal. It's also possible that the project has work units that are too big. You could check whether you have set the "max-packet-size" advanced option in the client and perhaps set it to "small". That should give you smaller WU files.
If that doesn't help, I'd suggest backing up the config.xml files and reinstalling the client from scratch, since it used to work for you. Don't go back to an older version of the client, since older versions have some security issues.
PS: Have you checked for dust buildup in your NUC?
Online: GTX 1660 Super + occasional CPU folding in the cold.
Offline: Radeon HD 7770, GTX 1050 Ti 4G OC, RX580
-
- Posts: 16
- Joined: Mon Apr 06, 2020 9:14 am
Re: project 16944 WU 48,36,48 takes too long, how to stop
Thank you very much!
The NUC is fine and works like always.
The settings are correct with 8 cores.
I do not know why I got such a big WU.
After a restart the estimate was about 3 more days.
So I decided on a new install.
-> I uninstalled everything and did a fresh install of the new version.
Now everything seems to work fine again.
I get the same points as in the past again and a normal WU time of around 6h.
Thanks again.
Re: project 16944 WU 48,36,48 takes too long, how to stop
You're welcome. Try setting "max-packet-size" to "small" in the advanced configuration to avoid a recurrence of the problem. Sometimes a researcher makes a mistake and releases a "big" work unit as a normal one, but if your NUC is struggling to meet the timeout and expiry of normal work units, then setting it to "small" might help avoid that.
-
- Posts: 16
- Joined: Mon Apr 06, 2020 9:14 am
Re: project 16944 WU 48,36,48 takes too long, how to stop
I am just surprised that with the old version I never got WUs this big.
Actually, I cannot find max-packet-size in FAHControl.
Do I enter it under Expert - Extra client options?
Thanks for your help!
-
- Site Admin
- Posts: 7937
- Joined: Tue Apr 21, 2009 4:41 pm
- Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
- Location: W. MA
Re: project 16944 WU 48,36,48 takes too long, how to stop
Project 16944 is not a particularly large system, 23,400 atoms. I don't have any records in my logs to do a comparison, so I cannot tell whether its WUs are long in processing time due to their number of steps.
What you could have been seeing is your system processing a WU from this project for the first time. When that happens the client has no record to compare against, so the estimates will at first be based on the timeout. After 2-3% has been processed, the estimates become more accurate. The next time you get a WU from that project the estimates will be closer to accurate from the start.
The same issue with estimates occurs right after a restart. How accurate/inaccurate will depend on exactly where the restart is done.
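As a rough illustration of why early estimates are unreliable, here is a minimal Python sketch of a linear extrapolation from the progress so far. This shows the general idea only and is not the client's actual estimator; the function name and the 2% cutoff are hypothetical (the cutoff loosely follows the 2-3% figure mentioned above).

```python
def estimate_remaining_hours(elapsed_hours: float, fraction_done: float,
                             min_fraction: float = 0.02):
    """Linearly extrapolate the remaining runtime from progress so far.

    Returns None while too little of the WU is done for the
    extrapolation to mean much (hypothetical 2% cutoff).
    """
    if not 0.0 <= fraction_done <= 1.0:
        raise ValueError("fraction_done must be between 0 and 1")
    if fraction_done < min_fraction:
        return None  # not enough progress yet to extrapolate
    total_hours = elapsed_hours / fraction_done
    return total_hours - elapsed_hours


# Example: 6 hours elapsed at 25% progress
print(estimate_remaining_hours(6.0, 0.25))  # 18.0
```

With very little progress, a small error in the elapsed time or the reported percentage is multiplied many times over, which would be one reason an estimate right after a first start or a restart can be far off.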
Beyond that, your total points per day might be down if your NUC was not getting assignments continuously. There have been intermittent shortages of WUs for various configurations over the past few months. Also check that your passkey was entered properly.
Finally, one note about configuration. You don't mention which i7 you have in your NUC, but I have found on my systems that I get almost the same points running my i7s configured to use just the physical cores and not all of the CPU threads available through HT. The i7 runs a bit cooler and clocks up higher. I especially see this on my laptop with a 2 core/4 thread i7. So you may want to try it both ways and see which works best for you.
iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
-
- Posts: 16
- Joined: Mon Apr 06, 2020 9:14 am
Re: project 16944 WU 48,36,48 takes too long, how to stop
12:06:08: CPU: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
12:06:08: CPU ID: GenuineIntel Family 6 Model 142 Stepping 10
12:06:08: CPUs: 8
I can tell that the estimate was accurate, because the WU had already been running for some hours; judging by the progress in that time, the estimate was okay.
So I think the WU was bigger than the ones I had been receiving during the last 12 months.
But here I also do not understand the client.
Why does the software continue a job that cannot reach the goal in time? It should skip the WU, assign it to somebody else, and download a new one.
My system runs 24 hours a day and, apart from some days at the beginning of last year, I nearly always got WUs, so this cannot be the reason for the low points.
Anyway, now everything seems to run again like a charm.
-
- Site Admin
- Posts: 7937
- Joined: Tue Apr 21, 2009 4:41 pm
- Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
- Location: W. MA
Re: project 16944 WU 48,36,48 takes too long, how to stop
The client will run a WU until it reaches the final deadline, which for Project 16944 is 4.2 days. You get bonus points up until the Timeout figure of 2.4 days and just the base points after the timeout. After the final deadline the WU will be dumped. The client does not dump it before then, since slow processing may have a temporary cause.
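To make the timeout/deadline distinction concrete, here is a small Python sketch that classifies a WU by how long it takes to finish. The function name and the result labels are hypothetical; the defaults are the Project 16944 figures quoted above, and treating the limits as inclusive is an assumption.

```python
def credit_outcome(days_to_finish: float,
                   timeout_days: float = 2.4,
                   deadline_days: float = 4.2) -> str:
    """Classify a WU by its total time to completion.

    "bonus"  -> finished by the timeout (base points plus bonus)
    "base"   -> finished after the timeout but by the final deadline
    "dumped" -> would run past the final deadline
    """
    if days_to_finish <= timeout_days:
        return "bonus"
    if days_to_finish <= deadline_days:
        return "base"
    return "dumped"


for days in (2.0, 3.0, 5.0):
    print(days, credit_outcome(days))
```

So a WU that misses the timeout by a few hours is not wasted: under these figures it would still be returned for base points as long as it finishes within 4.2 days.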
The client does not do any benchmarking, and there is currently no mechanism in place to assign WUs based on how long they will take to process on a particular system. There are older enhancement requests for the client to do this, but I have no idea when or if they will be implemented. The 'max-packet-size' parameter previously mentioned is a holdover from the early days when some people were still connecting via modem. It relates to the size of the files uploaded, not the length of runtime or the simulation size. It is still of some use for those on slower connections such as DSL, but not much use for figuring out how long a WU will take.
Without the logs anything else is conjecture. Possibly your NUC's i7 was running in a lower power mode to stay within cooling limits, perhaps something else.
-
- Posts: 16
- Joined: Mon Apr 06, 2020 9:14 am
Re: project 16944 WU 48,36,48 takes too long, how to stop
I understand, thank you very much for the explanation.
Re: project 16944 WU 48,36,48 takes too long, how to stop
The concept of a "big" WU is actually several independent concepts.
* The time to process an assignment does depend on the complexity of the protein (atom count), and that can alter the packet size, but that's not really an effective way to manage it.
* The time to process an assignment also depends on the number of steps.
* (etc)
The number of atoms depends on the protein, so it's not going to change.
It's relatively easy for the project owner to alter the total simulated time (which changes the number of steps and the number of points per WU) but it means altering the basic project.
Project 16944 seems to be missing from the list of active projects. I have no explanation.
Posting FAH's log:
How to provide enough info to get helpful support.
Re: project 16944 WU 48,36,48 takes too long, how to stop
strombergFs wrote:
Thank you very much!
I get again same points like in the past and a normal WU time of around 6h.
Thanks again.
I'm not sure 6h is "normal." Obviously it's going to depend on your hardware, but in the distant past, WUs often were designed to run for several days.
Anyway, apparently that WU had a problem which you have solved by the reinstall. We'll let the project owner know (and maybe he has suspended the project to readjust something).
-
- Posts: 521
- Joined: Fri Apr 03, 2020 2:22 pm
- Hardware configuration: ASRock X370M PRO4
Ryzen 2400G APU
16 GB DDR4-3200
MSI GTX 1660 Super Gaming X
Re: project 16944 WU 48,36,48 takes too long, how to stop
I've been getting some longer than usual CPU WUs as well... in this case a couple of these WUs took 15-17 hours on my Ryzen 2400G. The PPD returns seemed fairly in line with recent ones, so I didn't pay it much attention.
Most CPU WUs aren't nearly this long... but there do seem to have been some larger and longer running ones lately.
Fold them if you get them!