Client suspended work for another task?

Posted: Fri Mar 27, 2020 6:52 am
by mcr42
Hi all,

I have had a client running for a few days now.
All has worked well so far, but for a few days now I have had an unfinished task (14 minutes to go) that the client stopped working on in favor of a new work unit.
As far as I remember, that was before the Timeout was reached, but I'm not certain.


Update:

The log reads:

******************************* Date: 2020-03-26 *******************************
11:57:05:WU02:FS01:0x22:Completed 1920000 out of 2000000 steps (96%)
12:11:40:WU02:FS01:0x22:Completed 1940000 out of 2000000 steps (97%)
12:26:10:WU02:FS01:0x22:Completed 1960000 out of 2000000 steps (98%)
12:41:02:WU02:FS01:0x22:Completed 1980000 out of 2000000 steps (99%)
12:41:02:WU01:FS01:Connecting to 65.254.110.245:8080
12:41:03:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
...
12:42:04:WU01:FS01:Connecting to 18.218.241.186:80
12:42:04:WU01:FS01:Assigned to work server 140.163.4.231
12:42:04:WU01:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:GK106 [GeForce GTX 765M] from 140.163.4.231
12:42:04:WU01:FS01:Connecting to 140.163.4.231:8080
12:42:25:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
12:42:25:WU01:FS01:Connecting to 140.163.4.231:80
12:44:46:WU01:FS01:Downloading 7.85MiB
12:44:52:WU01:FS01:Download 30.26%
12:44:58:WU01:FS01:Download 66.10%
12:45:03:WU01:FS01:Download complete
12:45:03:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:11750 run:0 clone:1142 gen:10 core:0x22 unit:0x000000198ca304e75e6a801f85923962

12:54:10:WARNING:WU02:FS01:FahCore returned an unknown error code which probably indicates that it crashed
12:54:10:WARNING:WU02:FS01:FahCore returned: WU_STALLED (127 = 0x7f)

12:54:10:WU01:FS01:Starting
12:54:10:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\mcr\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 01 -suffix 01 -version 705 -lifeline 2152 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
12:54:10:WU01:FS01:Started FahCore on PID 8692
12:54:10:WU01:FS01:Core PID:6080
12:54:10:WU01:FS01:FahCore 0x22 started
12:54:11:WU01:FS01:0x22:*********************** Log Started 2020-03-26T12:54:10Z ***********************
Funny thing is, the client seems to have asked for more work before finishing the job, and the core crashed after that.
It then started working on the new task, so a BSOD is not the cause. (Thus, I'm not updating the title as proposed.)

However, meanwhile the client finished the newer task, and seems to have returned to the old one.
Update: Yep, confirmed, it finished the old task and published the results.

Re: Client suspended work for another task?

Posted: Fri Mar 27, 2020 7:10 am
by bruce
Could this be due to a blue screen reboot? Sure, it might be. I'd go to Log and click Warnings and Errors.

You need to fix whatever is causing the BSOD crashes. Is your system getting too hot? Is it overclocked? Is it time to blow the dust out of your cooling air path? Or is it simply time to upgrade your machine?

Re: Client suspended work for another task?

Posted: Fri Mar 27, 2020 7:21 am
by bruce
Your title isn't very descriptive. I'd change it to something like: BSOD Corrupted a WorkUnit. :shock:

Re: Client suspended work for another task?

Posted: Fri Mar 27, 2020 9:30 am
by mcr42
No no, the BSOD was just an isolated occurrence. I have been running this laptop for years without a BSOD (except for the BT/Wifi driver crashing on Suspend/Resume, which I have not triggered since I started FAH).
FAH has been running on it for more than a week now, without any BSOD or other problems. So the BSOD is not the problem.
The Warnings and Errors view does show a lot of download errors, but the core crashed only 2 or 3 times.
I have now reduced Folding power from medium to light and will watch closely.

The question remains: Should the client fetch new work when there is an unfinished task (that might time out meanwhile)?

Re: Client suspended work for another task?

Posted: Fri Mar 27, 2020 9:50 am
by Neil-B
IIRC the default setting is that a new WU is requested/downloaded when progress reaches 99%. This allows minimal overlap but has the WU ready for when the previous one finishes, which keeps the CPU/GPU nicely loaded with only a slight/short dip on the meters. For my kit that usually leaves a window of less than 60 secs for a crash to occur; on slower kit it may be a few minutes or so (look at your TPF figure). I'm not sure if it is possible or even advisable to change the setting to wait until 100%, but I have seen it changed the other way (to, say, 90%) to cover slow download speeds, so it might be doable.

As to your question: I never have stability issues or crashes. Having a WU ready and waiting means my CPUs don't cool off between WUs, which minimises wear and tear through heat cycling, and one TPF (1%) is easily long enough for me to download the next WU while minimising any risk of failure once the new WU has been received. So I am happy to say yes, it should happen. From your perspective I can see why you might question that.
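To put a number on that window, here is a rough sketch of how the TPF could be read off the progress lines in the log posted above. The regular expression and the helper names (`frame_times`, `average_tpf`) are my own assumptions for illustration, not anything shipped with the client, and the sketch ignores logs that cross midnight.

```python
import re
from datetime import datetime, timedelta

# Progress lines copied from the log earlier in this thread.
LOG = """\
11:57:05:WU02:FS01:0x22:Completed 1920000 out of 2000000 steps (96%)
12:11:40:WU02:FS01:0x22:Completed 1940000 out of 2000000 steps (97%)
12:26:10:WU02:FS01:0x22:Completed 1960000 out of 2000000 steps (98%)
12:41:02:WU02:FS01:0x22:Completed 1980000 out of 2000000 steps (99%)
"""

# Assumed line layout: HH:MM:SS:WUxx:FSxx:0xNN:Completed N out of M steps (P%)
PROGRESS = re.compile(
    r"^(\d\d:\d\d:\d\d):WU\d+:FS\d+:0x[0-9a-fA-F]+:"
    r"Completed \d+ out of \d+ steps \((\d+)%\)"
)


def frame_times(log_text):
    """Return (percent, timestamp) pairs for every progress line found."""
    frames = []
    for line in log_text.splitlines():
        match = PROGRESS.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%H:%M:%S")
            frames.append((int(match.group(2)), stamp))
    return frames


def average_tpf(frames):
    """Average time per 1% frame between the first and last progress line.

    Assumes all lines fall on the same day (no midnight rollover).
    """
    (first_pct, first_time), (last_pct, last_time) = frames[0], frames[-1]
    return (last_time - first_time) / (last_pct - first_pct)


frames = frame_times(LOG)
tpf = average_tpf(frames)
print(tpf)  # 0:14:39
```

With a TPF of roughly 14 to 15 minutes, the fetch in that log (request at 12:41:02, download complete at 12:45:03) fits well inside the final 1%, but it also means the old WU was exposed for about a quarter of an hour after the new one was requested.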

Re: Client suspended work for another task?

Posted: Fri Mar 27, 2020 10:14 am
by mcr42
Thanks, that's a good explanation, I didn't think of that.
I didn't intend to touch any of the controls (didn't know there were any); I was just curious why this might have happened.
Thanks.

Accepted answer, Problem solved, Thread can be closed.

Re: Client suspended work for another task?

Posted: Sat Mar 28, 2020 12:56 am
by Joe_H
Yes, that is a good explanation. The option for download defaults to 99%, and can be set between 90-100%. I would have to look up the exact name as I have not used it recently. Being on DSL, I had set it to 100% because an upload overlapping with a download became a problem when the WU download sizes increased.