Re: F@H pausing itself?

Posted: Thu Feb 06, 2025 6:19 pm
by muziqaz
It could be RAM or the CPU memory controller, but most likely it is on the GPU side: the GPU itself, the GPU memory controller, VRAM, or the GPU VRM.
If it were Linux, we could easily blame the FAHcores, but on Windows things are relatively stable on that front.

Re: F@H pausing itself?

Posted: Wed Feb 26, 2025 10:00 pm
by vica153
I upped the GPU voltage by 6 mV to 963 mV @ 1801 MHz and it's been stable for ~50 WUs over the last few weeks. So apparently my perfectly stable GPU wasn't as perfectly stable as I had thought. Interesting that F@H seems to be more sensitive than any other usage.

Re: F@H pausing itself?

Posted: Wed Feb 26, 2025 10:02 pm
by muziqaz
It is not sensitive; it is just properly loading your hardware.

Re: F@H pausing itself?

Posted: Thu Mar 06, 2025 6:52 am
by arisu
Mild instability when playing video games will cause artifacts that you probably won't notice. Mild instability when folding can cause mistakes in the simulation that can make it converge to an impossible or unrealistic state that will be caught by sanity checks (in this case the position of a particle has become NaN). Folding doesn't make a system less stable, but it will catch small instabilities that other usages will not.

Code:

05:04:10:I1:WU145:An exception occurred at step 17067: Particle coordinate is NaN. For more information, see https://github.com/openmm/openmm/wiki/Frequently-Asked-Questions#nan
05:04:10:I1:WU145:Max number of attempts to resume from last checkpoint (2) reached. Aborting.

I think this means that the incorrect calculation happened before the last checkpoint and the bad simulation state was saved to the checkpoint. When it tried to resume the checkpoint with the bad data, it converged into a state where a particle's position was NaN (an invalid floating-point number). It retried twice and reached that state each time, so the core concluded that the checkpoint itself had bad data (which was probably true).
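
To make that reasoning concrete, here is a minimal toy sketch in Python. It is not the FAHcore or OpenMM code. The update rule, `run_from`, `resume_from_checkpoint`, and the checkpoint layout are all made up for illustration, and only the retry limit of 2 comes from the log above. It shows why retrying from a corrupted checkpoint is futile: the run is deterministic, so every attempt hits the same NaN.

```python
import math

MAX_RESUME_ATTEMPTS = 2  # the "(2)" from the log; everything else here is hypothetical


class NaNCoordinateError(Exception):
    """Raised by the toy sanity check when a coordinate is NaN."""


def run_from(checkpoint, n_steps):
    """Advance a made-up 1-D 'simulation' deterministically from a checkpoint."""
    x = checkpoint["x"]
    for step in range(checkpoint["step"], checkpoint["step"] + n_steps):
        # Toy update rule: stays bounded for small x, but a corrupted (huge)
        # value overflows to inf, and inf - inf then yields NaN.
        x = x * x - x
        if math.isnan(x):  # sanity check, analogous to "Particle coordinate is NaN"
            raise NaNCoordinateError(f"coordinate is NaN at step {step}")
    return x


def resume_from_checkpoint(checkpoint, n_steps):
    """Retry from the same checkpoint; return (succeeded, attempts_used)."""
    for attempt in range(1, MAX_RESUME_ATTEMPTS + 1):
        try:
            run_from(checkpoint, n_steps)
            return True, attempt
        except NaNCoordinateError:
            continue  # same checkpoint, same inputs, so the same failure follows
    return False, MAX_RESUME_ATTEMPTS  # give up: the checkpoint itself is suspect


good = {"step": 0, "x": 0.5}    # sane saved state
bad = {"step": 0, "x": 1e200}   # state corrupted before it was checkpointed

print(resume_from_checkpoint(good, 50))
print(resume_from_checkpoint(bad, 50))
```

A one-off glitch after the checkpoint would likely succeed on the first retry, while bad data inside the checkpoint fails identically every time, which is presumably why the core stops after a fixed number of attempts.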