Core killed after laptop hibernation

Posted: Thu Sep 05, 2024 6:19 am
by nrwahl
Hi, I've run into a recurring problem.

If I manually pause folding, or if I remove the power cable before closing my laptop, I can resume my current work unit when I reconnect the power cable later.

If I close my laptop before removing the power cable, I lose all progress on the work unit. The core is killed and I have to fetch a new unit the next time I start folding.

I'm including logs below. I closed my laptop around 01:40 and opened it back up on battery power (while connected to public WiFi) around 02:12. I probably reconnected the power cable at home and clicked Fold at either 04:42 or 04:55.

Code:

01:38:41:I1:WU38:Completed 177500 out of 250000 steps (71%)
02:12:25:W :WU38:Detected clock skew (19 mins 2 secs), I/O delay, laptop hibernation, other slowdown or clock change noted, adjusting time estimates
02:12:25:I1:Account websocket closed: PROTOCOL msg=Failed to read header start
02:12:25:W :CON78:DNS lookup failed for api.foldingathome.org
02:12:25:E :OUT78:Failed response: CONNECT
02:12:26:I1:WU38:Caught signal SIGINT(2)
02:12:26:I1:WU38:Exiting, please wait. . .
02:12:40:I1:OUT80:> GET https://api.foldingathome.org/machine/r-6-CGLCnbOGW8fEkfH0qpwMSeONseoY7hMV5OBKwYM HTTP/1.1
02:12:41:I1:OUT80:< HTTP/1.1 200 HTTP_OK
02:12:41:I1:OUT3:> GET wss://node1.foldingathome.org/ws/client HTTP/1.1
02:12:42:I1:OUT3:< HTTP/1.1 101 HTTP_SWITCHING_PROTOCOLS
02:12:42:I1:Logging into node account
02:13:26:W :Core did not shutdown gracefully, killing process
02:13:27:E :WU38:Core was killed
02:13:27:W :WU38:Core returned FAILED_1 (0)
02:13:27:E :WU38:Run did not produce any results. Dumping WU
02:13:27:I1:Default:Added new work unit: cpus:0 gpus:gpu:01:00:00
02:13:27:I1:WU38:Sending dump report
02:13:27:E :WU39:Exception: WU does not have an ID
02:13:27:I1:OUT81:> POST https://highland3.seas.upenn.edu/api/results HTTP/1.1
02:13:27:I1:OUT81:< HTTP/1.1 200 HTTP_OK
02:13:27:I1:WU38:Dumped
02:13:31:I1:Default:Added new work unit: cpus:0 gpus:gpu:01:00:00
02:13:31:E :WU40:Exception: WU does not have an ID
02:13:39:I1:Default:Added new work unit: cpus:0 gpus:gpu:01:00:00
02:13:39:E :WU41:Exception: WU does not have an ID
02:13:55:I1:Default:Added new work unit: cpus:0 gpus:gpu:01:00:00
02:13:55:E :WU42:Exception: WU does not have an ID
02:14:27:I1:Default:Added new work unit: cpus:0 gpus:gpu:01:00:00
02:14:27:E :WU43:Exception: WU does not have an ID
02:15:31:I1:Default:Added new work unit: cpus:0 gpus:gpu:01:00:00
02:15:31:E :WU44:Exception: WU does not have an ID
02:17:39:I1:Default:Added new work unit: cpus:0 gpus:gpu:01:00:00
02:17:39:E :WU45:Exception: WU does not have an ID
02:21:55:I1:Default:Added new work unit: cpus:0 gpus:gpu:01:00:00
02:21:55:E :WU46:Exception: WU does not have an ID
02:44:18:I1:Account websocket closed: PROTOCOL msg=Failed to read header start
02:44:18:W :CON82:DNS lookup failed for api.foldingathome.org
02:44:18:E :OUT82:Failed response: CONNECT
02:44:33:I1:OUT84:> GET https://api.foldingathome.org/machine/r-6-CGLCnbOGW8fEkfH0qpwMSeONseoY7hMV5OBKwYM HTTP/1.1
02:44:34:I1:OUT84:< HTTP/1.1 200 HTTP_OK
02:44:34:I1:OUT3:> GET wss://node1.foldingathome.org/ws/client HTTP/1.1
02:44:35:I1:OUT3:< HTTP/1.1 101 HTTP_SWITCHING_PROTOCOLS
02:44:35:I1:Logging into node account
02:51:55:I1:Default:Added new work unit: cpus:0 gpus:gpu:01:00:00
02:51:55:E :WU47:Exception: WU does not have an ID
03:03:36:I1:Account websocket closed: PROTOCOL msg=Failed to read header start
03:03:36:W :CON85:DNS lookup failed for api.foldingathome.org
03:03:36:E :OUT85:Failed response: CONNECT
03:03:51:I1:OUT87:> GET https://api.foldingathome.org/machine/r-6-CGLCnbOGW8fEkfH0qpwMSeONseoY7hMV5OBKwYM HTTP/1.1
03:03:52:I1:OUT87:< HTTP/1.1 200 HTTP_OK
03:03:52:I1:OUT3:> GET wss://node1.foldingathome.org/ws/client HTTP/1.1
03:03:53:I1:OUT3:< HTTP/1.1 101 HTTP_SWITCHING_PROTOCOLS
03:03:53:I1:Logging into node account
03:13:01:I1:Default:Added new work unit: cpus:0 gpus:gpu:01:00:00
03:13:01:E :WU48:Exception: WU does not have an ID
03:30:05:I1:Default:Added new work unit: cpus:0 gpus:gpu:01:00:00
03:30:05:E :WU49:Exception: WU does not have an ID
03:47:09:I1:Default:Added new work unit: cpus:0 gpus:gpu:01:00:00
03:47:09:E :WU50:Exception: WU does not have an ID
04:04:13:I1:Default:Added new work unit: cpus:0 gpus:gpu:01:00:00
04:04:13:E :WU51:Exception: WU does not have an ID
04:21:17:I1:Default:Added new work unit: cpus:0 gpus:gpu:01:00:00
04:21:17:E :WU52:Exception: WU does not have an ID
04:42:00:I1:Account websocket closed: PROTOCOL msg=Failed to read header start
04:42:00:W :CON88:DNS lookup failed for api.foldingathome.org
04:42:00:E :OUT88:Failed response: CONNECT
04:42:16:I1:OUT90:> GET https://api.foldingathome.org/machine/r-6-CGLCnbOGW8fEkfH0qpwMSeONseoY7hMV5OBKwYM HTTP/1.1
04:42:17:I1:OUT90:< HTTP/1.1 200 HTTP_OK
04:42:17:I1:OUT3:> GET wss://node1.foldingathome.org/ws/client HTTP/1.1
04:42:17:I1:OUT3:< HTTP/1.1 101 HTTP_SWITCHING_PROTOCOLS
04:42:17:I1:Logging into node account
04:55:18:I1:Default:Added new work unit: cpus:0 gpus:gpu:01:00:00
04:55:18:I1:WU53:Requesting WU assignment for user nrwahl team 11812
04:55:18:I1:OUT91:> POST https://assign6.foldingathome.org/api/assign HTTP/1.1
04:55:18:I1:OUT91:< HTTP/1.1 200 HTTP_OK
04:55:18:I1:WU53:Received WU assignment pD5m0G_lpyRjNVYZKX8cueb4GdrkWDtqlZ_K8QD7uIM
04:55:18:I1:WU53:Downloading WU
04:55:19:I1:OUT92:> POST https://huangfolding1.chem.wisc.edu/api/assign HTTP/1.1
04:55:24:I1:WU53:DOWNLOAD 10% 2.56MiB of 24.76MiB
04:55:25:I1:WU53:DOWNLOAD 38% 9.37MiB of 24.76MiB
04:55:26:I1:WU53:DOWNLOAD 82% 20.22MiB of 24.76MiB
04:55:27:I1:OUT92:< HTTP/1.1 200 HTTP_OK
04:55:27:I1:WU53:Received WU P16771 R21 C11 G44

Is there a way to avoid this, or would this be considered a bug? I'm relatively new to folding, and I'm accustomed to closing my laptop before unplugging it, so it will be difficult to break that habit.

Re: Core killed after laptop hibernation

Posted: Thu Sep 05, 2024 6:48 am
by calxalot
It sounds like the GPU WU is paused on battery power and resumed when you plug back in.

If your laptop sleeps before going on battery, the WU is not paused and the GPU core can't handle the sleep cycle, which typically resets the GPU and crashes the core.

In the past, with v7, the GPU core would instead get stuck and the estimated progress would incorrectly go to 99.99%.

It is a known deficiency of the GPU cores that they can't handle sleep or hibernation.
This is also why, by default, the client tries to disable sleep while folding.
For now, you need to pause or switch to battery before closing the laptop.
You should also deselect On Battery in the settings.
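
If you want to check whether a dumped WU lines up with a sleep cycle, a rough Python sketch like the one below could scan a saved copy of the log for a "Detected clock skew" warning followed shortly by "Core was killed". The function names and the 5-minute window are assumptions made for illustration; they are not part of the FAH client, and the matching relies only on the message text shown in the log above.

```python
import re
from datetime import timedelta

# Log lines start with HH:MM:SS followed by a colon, e.g. "02:13:27:E :WU38:..."
TIME_RE = re.compile(r"^(\d{2}):(\d{2}):(\d{2}):")


def parse_time(line):
    """Return the line's timestamp as a timedelta since midnight, or None."""
    m = TIME_RE.match(line)
    if not m:
        return None
    hours, minutes, seconds = map(int, m.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def find_sleep_kills(lines, window=timedelta(minutes=5)):
    """Pair each clock-skew warning with a core kill that follows within `window`.

    Hypothetical helper: the window length is a guess, and logs that cross
    midnight are not handled.
    """
    hits = []
    last_skew = None  # (timestamp, line) of the most recent skew warning
    for line in lines:
        t = parse_time(line)
        if t is None:
            continue
        if "Detected clock skew" in line:
            last_skew = (t, line)
        elif "Core was killed" in line and last_skew is not None:
            if timedelta(0) <= t - last_skew[0] <= window:
                hits.append((last_skew[1], line))
            last_skew = None
    return hits
```

Passing it the lines of the log posted above (for example, `find_sleep_kills(open("log.txt"))`, where `log.txt` is a placeholder file name) should pair the 02:12:25 skew warning with the 02:13:27 kill.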

Re: Core killed after laptop hibernation

Posted: Thu Sep 05, 2024 10:41 pm
by nrwahl
Thanks! That's a bummer to hear but I'm glad to have a definitive answer so quickly.

Re: Core killed after laptop hibernation

Posted: Sun Sep 08, 2024 2:12 pm
by toTOW
Yes, FAH doesn't like NVIDIA Optimus or similar technologies, or power-saving features in general. They are the opposite of what FAH is.