repeated INTERRUPTED after 100%- 17322 (0, 993,21)

Moderators: Site Moderators, FAHC Science Team

Knish
Posts: 222
Joined: Tue Mar 17, 2020 5:20 am

Post by Knish »

This is the GCP cloud Nvidia T4 again, but now neither pausing/unpausing nor rebooting clears the 17322. I saw someone else had a fault with this WU; could it be the WU? If my next WU completes and uploads OK, I may have to just dump the 17322.

Code:

*********************** Log Started 2020-12-24T14:24:39Z ***************
14:24:39:Trying to access database...
14:24:40:Successfully acquired database lock
14:24:40:Read GPUs.txt
14:24:43:Enabled folding slot 01: READY gpu:0:TU104GL [Tesla T4] 8141
14:24:43:****************************** FAHClient ***********************
14:24:43:        Version: 7.6.13
14:24:43:         Author: Joseph Coffland <[email protected]>
14:24:43:      Copyright: 2020 foldingathome.org
14:24:43:       Homepage: https://foldingathome.org/
14:24:43:           Date: Apr 28 2020
14:24:43:           Time: 04:20:16
14:24:43:       Revision: 5a652817f46116b6e135503af97f18e094414e3b
14:24:43:         Branch: master
14:24:43:       Compiler: GNU 8.3.0
14:24:43:        Options: -std=c++11 -ffunction-sections -fdata-sections -O3
14:24:43:                 -funroll-loops -fno-pie
14:24:43:       Platform: linux2 4.19.0-5-amd64
14:24:43:           Bits: 64
14:24:43:           Mode: Release
14:24:43:           Args: --child /etc/fahclient/config.xml --run-as fahclient
14:24:43:                 --pid-file=/var/run/fahclient.pid --daemon
14:24:43:         Config: /etc/fahclient/config.xml
14:24:43:******************************** CBang ***************************
14:24:43:           Date: Apr 25 2020
14:24:43:           Time: 00:07:53
14:24:43:       Revision: ea081a3b3b0f4a37c4d0440b4f1bc184197c7797
14:24:43:         Branch: master
14:24:43:       Compiler: GNU 8.3.0
14:24:43:        Options: -std=c++11 -ffunction-sections -fdata-sections -O3
14:24:43:                 -funroll-loops -fno-pie -fPIC
14:24:43:       Platform: linux2 4.19.0-5-amd64
14:24:43:           Bits: 64
14:24:43:           Mode: Release
14:24:43:******************************* System *************************
14:24:43:            CPU: Intel(R) Xeon(R) CPU @ 2.30GHz
14:24:43:         CPU ID: GenuineIntel Family 6 Model 63 Stepping 0
14:24:43:           CPUs: 1
14:24:43:         Memory: 1.70GiB
14:24:43:    Free Memory: 1.45GiB
14:24:43:        Threads: POSIX_THREADS
14:24:43:     OS Version: 4.19
14:24:43:    Has Battery: false
14:24:43:     On Battery: false
14:24:43:     UTC Offset: 0
14:24:43:            PID: 444
14:24:43:            CWD: /var/lib/fahclient
14:24:43:             OS: Linux 4.19.0-13-cloud-amd64 x86_64
14:24:43:        OS Arch: AMD64
14:24:43:           GPUs: 1
14:24:43:          GPU 0: Bus:0 Slot:4 Func:0 NVIDIA:6 TU104GL [Tesla T4] 8141
14:24:43:  CUDA Device 0: Platform:0 Device:0 Bus:0 Slot:4 Compute:7.5 Driver:11.0
14:24:43:OpenCL Device 0: Platform:0 Device:0 Bus:0 Slot:4 Compute:1.2 Driver:450.51
14:24:43:******************************* libFAH **************************
14:24:43:           Date: Apr 15 2020
14:24:43:           Time: 21:43:24
14:24:43:       Revision: 216968bc7025029c841ed6e36e81a03a316890d3
14:24:43:         Branch: master
14:24:43:       Compiler: GNU 8.3.0
14:24:43:        Options: -std=c++11 -ffunction-sections -fdata-sections -O3
14:24:43:                 -funroll-loops -fno-pie
14:24:43:       Platform: linux2 4.19.0-5-amd64
14:24:43:           Bits: 64
14:24:43:           Mode: Release
14:24:43:****************************************************************
14:24:43:<config>
14:24:43:  <!-- Client Control -->
14:24:43:  <fold-anon v='true'/>

14:24:43:  <!-- User Information -->

14:24:43:  <!-- Folding Slots -->
14:24:43:  <slot id='1' type='GPU'/>
14:24:43:</config>
14:24:43:WU01:FS01:Starting
14:24:43:WU01:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit/22-0.0.13/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 444 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
14:24:43:WU01:FS01:Started FahCore on PID 551
14:24:43:WU01:FS01:Core PID:555
14:24:43:WU01:FS01:FahCore 0x22 started
14:24:44:WU01:FS01:0x22:*********************** Log Started 2020-12-24T14:24:43Z ***********
14:24:44:WU01:FS01:0x22:*************************** Core22 Folding@home Core ************
14:24:44:WU01:FS01:0x22:       Core: Core22
14:24:44:WU01:FS01:0x22:       Type: 0x22
14:24:44:WU01:FS01:0x22:    Version: 0.0.13
14:24:44:WU01:FS01:0x22:     Author: Joseph Coffland <[email protected]>
14:24:44:WU01:FS01:0x22:  Copyright: 2020 foldingathome.org
14:24:44:WU01:FS01:0x22:   Homepage: https://foldingathome.org/
14:24:44:WU01:FS01:0x22:       Date: Sep 19 2020
14:24:44:WU01:FS01:0x22:       Time: 01:10:35
14:24:44:WU01:FS01:0x22:   Revision: 571cf95de6de2c592c7c3ed48fcfb2e33e9ea7d3
14:24:44:WU01:FS01:0x22:     Branch: core22-0.0.13
14:24:44:WU01:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
14:24:44:WU01:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
14:24:44:WU01:FS01:0x22:             -funroll-loops -DOPENMM_GIT_HASH="\"189320d0\""
14:24:44:WU01:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
14:24:44:WU01:FS01:0x22:       Bits: 64
14:24:44:WU01:FS01:0x22:       Mode: Release
14:24:44:WU01:FS01:0x22:Maintainers: John Chodera <[email protected]> and Peter Eastman
14:24:44:WU01:FS01:0x22:             <[email protected]>
14:24:44:WU01:FS01:0x22:       Args: -dir 01 -suffix 01 -version 706 -lifeline 551 -checkpoint 15
14:24:44:WU01:FS01:0x22:             -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device
14:24:44:WU01:FS01:0x22:             0 -gpu 0
14:24:44:WU01:FS01:0x22:************************************ libFAH *******************
14:24:44:WU01:FS01:0x22:       Date: Sep 15 2020
14:24:44:WU01:FS01:0x22:       Time: 05:14:43
14:24:44:WU01:FS01:0x22:   Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
14:24:44:WU01:FS01:0x22:     Branch: HEAD
14:24:44:WU01:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
14:24:44:WU01:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
14:24:44:WU01:FS01:0x22:             -funroll-loops
14:24:44:WU01:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
14:24:44:WU01:FS01:0x22:       Bits: 64
14:24:44:WU01:FS01:0x22:       Mode: Release
14:24:44:WU01:FS01:0x22:************************************ CBang *******************
14:24:44:WU01:FS01:0x22:       Date: Sep 15 2020
14:24:44:WU01:FS01:0x22:       Time: 05:11:04
14:24:44:WU01:FS01:0x22:   Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
14:24:44:WU01:FS01:0x22:     Branch: HEAD
14:24:44:WU01:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
14:24:44:WU01:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
14:24:44:WU01:FS01:0x22:             -funroll-loops -fPIC
14:24:44:WU01:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
14:24:44:WU01:FS01:0x22:       Bits: 64
14:24:44:WU01:FS01:0x22:       Mode: Release
14:24:44:WU01:FS01:0x22:************************************ System ********************
14:24:44:WU01:FS01:0x22:        CPU: Intel(R) Xeon(R) CPU @ 2.30GHz
14:24:44:WU01:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 63 Stepping 0
14:24:44:WU01:FS01:0x22:       CPUs: 1
14:24:44:WU01:FS01:0x22:     Memory: 1.70GiB
14:24:44:WU01:FS01:0x22:Free Memory: 1.31GiB
14:24:44:WU01:FS01:0x22:    Threads: POSIX_THREADS
14:24:44:WU01:FS01:0x22: OS Version: 4.19
14:24:44:WU01:FS01:0x22:Has Battery: false
14:24:44:WU01:FS01:0x22: On Battery: false
14:24:44:WU01:FS01:0x22: UTC Offset: 0
14:24:44:WU01:FS01:0x22:        PID: 555
14:24:44:WU01:FS01:0x22:        CWD: /var/lib/fahclient/work
14:24:44:WU01:FS01:0x22:************************************ OpenMM *********************
14:24:44:WU01:FS01:0x22:   Revision: 189320d0
14:24:44:WU01:FS01:0x22:****************************************************************
14:24:44:WU01:FS01:0x22:Project: 17322 (Run 0, Clone 993, Gen 21)
14:24:44:WU01:FS01:0x22:Unit: 0x00000000000000000000000000000000
14:24:44:WU01:FS01:0x22:Digital signatures verified
14:24:44:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
14:24:44:WU01:FS01:0x22:Version 0.0.13
14:24:44:WU01:FS01:0x22:  Checkpoint write interval: 15000 steps (2%) [50 total]
14:24:44:WU01:FS01:0x22:  JSON viewer frame write interval: 7500 steps (1%) [100 total]
14:24:44:WU01:FS01:0x22:  XTC frame write interval: 250000 steps (33%) [3 total]
14:24:44:WU01:FS01:0x22:  Global context and integrator variables write interval: disabled
14:24:45:WU01:FS01:0x22:There are 4 platforms available.
14:24:45:WU01:FS01:0x22:Platform 0: Reference
14:24:45:WU01:FS01:0x22:Platform 1: CPU
14:24:45:WU01:FS01:0x22:Platform 2: OpenCL
14:24:45:WU01:FS01:0x22:  opencl-device 0 specified
14:24:45:WU01:FS01:0x22:Platform 3: CUDA
14:24:45:WU01:FS01:0x22:  cuda-device 0 specified
14:25:04:WU01:FS01:0x22:Attempting to create CUDA context:
14:25:05:WU01:FS01:0x22:  Configuring platform CUDA
14:25:23:WU01:FS01:0x22:  Using CUDA and gpu 0
14:25:24:WU01:FS01:0x22:Completed 750000 out of 750000 steps (100%)
14:25:29:WU01:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
14:25:29:WU00:FS01:Starting
14:25:29:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit/22-0.0.13/Core_22.fah/FahCore_22 -dir 00 -suffix 01 -version 706 -lifeline 444 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
14:25:29:WU00:FS01:Started FahCore on PID 567
14:25:29:WU00:FS01:Core PID:571
14:25:29:WU00:FS01:FahCore 0x22 started
14:25:30:WU00:FS01:0x22:*********************** Log Started 2020-12-24T14:25:30Z *********
14:25:30:WU00:FS01:0x22:*************************** Core22 Folding@home Core *************
14:25:30:WU00:FS01:0x22:       Core: Core22
14:25:30:WU00:FS01:0x22:       Type: 0x22
14:25:30:WU00:FS01:0x22:    Version: 0.0.13
14:25:30:WU00:FS01:0x22:     Author: Joseph Coffland <[email protected]>
14:25:30:WU00:FS01:0x22:  Copyright: 2020 foldingathome.org
14:25:30:WU00:FS01:0x22:   Homepage: https://foldingathome.org/
14:25:30:WU00:FS01:0x22:       Date: Sep 19 2020
14:25:30:WU00:FS01:0x22:       Time: 01:10:35
14:25:30:WU00:FS01:0x22:   Revision: 571cf95de6de2c592c7c3ed48fcfb2e33e9ea7d3
14:25:30:WU00:FS01:0x22:     Branch: core22-0.0.13
14:25:30:WU00:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
14:25:30:WU00:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
14:25:30:WU00:FS01:0x22:             -funroll-loops -DOPENMM_GIT_HASH="\"189320d0\""
14:25:30:WU00:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
14:25:30:WU00:FS01:0x22:       Bits: 64
14:25:30:WU00:FS01:0x22:       Mode: Release
14:25:30:WU00:FS01:0x22:Maintainers: John Chodera <[email protected]> and Peter Eastman
14:25:30:WU00:FS01:0x22:             <[email protected]>
14:25:30:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 706 -lifeline 567 -checkpoint 15
14:25:30:WU00:FS01:0x22:             -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device
14:25:30:WU00:FS01:0x22:             0 -gpu 0
14:25:30:WU00:FS01:0x22:************************************ libFAH ******************
14:25:30:WU00:FS01:0x22:       Date: Sep 15 2020
14:25:30:WU00:FS01:0x22:       Time: 05:14:43
14:25:30:WU00:FS01:0x22:   Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
14:25:30:WU00:FS01:0x22:     Branch: HEAD
14:25:30:WU00:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
14:25:30:WU00:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
14:25:30:WU00:FS01:0x22:             -funroll-loops
14:25:30:WU00:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
14:25:30:WU00:FS01:0x22:       Bits: 64
14:25:30:WU00:FS01:0x22:       Mode: Release
14:25:30:WU00:FS01:0x22:************************************ CBang **********************
14:25:30:WU00:FS01:0x22:       Date: Sep 15 2020
14:25:30:WU00:FS01:0x22:       Time: 05:11:04
14:25:30:WU00:FS01:0x22:   Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
14:25:30:WU00:FS01:0x22:     Branch: HEAD
14:25:30:WU00:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
14:25:30:WU00:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
14:25:30:WU00:FS01:0x22:             -funroll-loops -fPIC
14:25:30:WU00:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
14:25:30:WU00:FS01:0x22:       Bits: 64
14:25:30:WU00:FS01:0x22:       Mode: Release
14:25:30:WU00:FS01:0x22:************************************ System ********************
14:25:30:WU00:FS01:0x22:        CPU: Intel(R) Xeon(R) CPU @ 2.30GHz
14:25:30:WU00:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 63 Stepping 0
14:25:30:WU00:FS01:0x22:       CPUs: 1
14:25:30:WU00:FS01:0x22:     Memory: 1.70GiB
14:25:30:WU00:FS01:0x22:Free Memory: 1.42GiB
14:25:30:WU00:FS01:0x22:    Threads: POSIX_THREADS
14:25:30:WU00:FS01:0x22: OS Version: 4.19
14:25:30:WU00:FS01:0x22:Has Battery: false
14:25:30:WU00:FS01:0x22: On Battery: false
14:25:30:WU00:FS01:0x22: UTC Offset: 0
14:25:30:WU00:FS01:0x22:        PID: 571
14:25:30:WU00:FS01:0x22:        CWD: /var/lib/fahclient/work
14:25:30:WU00:FS01:0x22:************************************ OpenMM *********************
14:25:30:WU00:FS01:0x22:   Revision: 189320d0
14:25:30:WU00:FS01:0x22:******************************************************************
14:25:30:WU00:FS01:0x22:Project: 17424 (Run 0, Clone 1149, Gen 17)
14:25:30:WU00:FS01:0x22:Unit: 0x00000000000000000000000000000000
14:25:30:WU00:FS01:0x22:Digital signatures verified
14:25:30:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
14:25:30:WU00:FS01:0x22:Version 0.0.13
14:25:30:WU00:FS01:0x22:  Checkpoint write interval: 25000 steps (2%) [50 total]
14:25:30:WU00:FS01:0x22:  JSON viewer frame write interval: 12500 steps (1%) [100 total]
14:25:30:WU00:FS01:0x22:  XTC frame write interval: 10000 steps (0.8%) [125 total]
14:25:30:WU00:FS01:0x22:  Global context and integrator variables write interval: disabled
14:25:32:WU00:FS01:0x22:There are 4 platforms available.
14:25:32:WU00:FS01:0x22:Platform 0: Reference
14:25:32:WU00:FS01:0x22:Platform 1: CPU
14:25:32:WU00:FS01:0x22:Platform 2: OpenCL
14:25:32:WU00:FS01:0x22:  opencl-device 0 specified
14:25:32:WU00:FS01:0x22:Platform 3: CUDA
14:25:32:WU00:FS01:0x22:  cuda-device 0 specified
14:25:40:WU00:FS01:0x22:Attempting to create CUDA context:
14:25:40:WU00:FS01:0x22:  Configuring platform CUDA
14:25:51:WU00:FS01:0x22:  Using CUDA and gpu 0
14:25:52:WU00:FS01:0x22:Completed 200000 out of 1250000 steps (16%)
14:27:19:WU00:FS01:0x22:Completed 212500 out of 1250000 steps (17%)
14:28:06:FS01:Paused
14:28:06:FS01:Shutting core down
14:28:06:WU00:FS01:0x22:Caught signal SIGINT(2) on PID 571
14:28:06:WU00:FS01:0x22:Exiting, please wait. . .
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: repeated INTERRUPTED after 100%- 17322 (0, 993,21)

Post by PantherX »

It seems that you might have ended up with two WUs (each with progress) on a single slot. See what happens once the current WU finishes; I think the slot will then see that you have another WU, package it, and send it.
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
Knish
Posts: 222
Joined: Tue Mar 17, 2020 5:20 am

Re: repeated INTERRUPTED after 100%- 17322 (0, 993,21)

Post by Knish »

It is now looping endlessly between 100% and INTERRUPTED. I think either this GPU or this WU is hosed, but the GPU can still complete other WUs.
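
A minimal sketch of how the 100%-then-INTERRUPTED loop could be counted from a saved log, assuming the line formats match the FAHClient 7.6.13 log quoted earlier in this thread. The function name and the regular expressions are illustrative and not part of any FAH tooling.

```python
import re

# Matches progress lines such as
# "14:25:24:WU01:FS01:0x22:Completed 750000 out of 750000 steps (100%)"
COMPLETED = re.compile(
    r"^\d\d:\d\d:\d\d:(WU\d+):FS\d+:0x\w+:Completed (\d+) out of (\d+) steps"
)
# Matches exit lines such as
# "14:25:29:WU01:FS01:FahCore returned: INTERRUPTED (102 = 0x66)"
RETURNED = re.compile(r"^\d\d:\d\d:\d\d:(WU\d+):FS\d+:FahCore returned: (\w+)")


def interrupted_at_full_progress(lines):
    """Count, per WU id, how often the core exited INTERRUPTED
    right after reporting that all steps were completed."""
    at_full = set()  # WU ids whose latest progress line had done == total
    counts = {}
    for line in lines:
        m = COMPLETED.match(line)
        if m:
            wu, done, total = m.group(1), int(m.group(2)), int(m.group(3))
            if done == total:
                at_full.add(wu)
            else:
                at_full.discard(wu)
            continue
        m = RETURNED.match(line)
        if m:
            wu, status = m.groups()
            if status == "INTERRUPTED" and wu in at_full:
                counts[wu] = counts.get(wu, 0) + 1
            at_full.discard(wu)
    return counts


# Three lines copied from the log in the first post.
sample = [
    "14:25:24:WU01:FS01:0x22:Completed 750000 out of 750000 steps (100%)",
    "14:25:29:WU01:FS01:FahCore returned: INTERRUPTED (102 = 0x66)",
    "14:27:19:WU00:FS01:0x22:Completed 212500 out of 1250000 steps (17%)",
]
print(interrupted_at_full_progress(sample))
```

A count that keeps growing for the same WU id across restarts would match the looping described here. A count of one could just as well be a single unlucky pause.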
Knish
Posts: 222
Joined: Tue Mar 17, 2020 5:20 am

Re: repeated INTERRUPTED after 100%- 17322 (0, 993,21)

Post by Knish »

...wait, maybe it's something weird with GCP itself. I had spun up a new single-GPU instance, which ran all day until processing suddenly dropped in the monitoring, with a corresponding halt in FAH logging. Everything was on, so it wasn't preempted, but there was no logging or calculating going on. I rebooted and P14911 continued where it left off. Examining the logs reveals a mysterious gap between 0804Z, the last entry at 22% progress, and 0842Z, where it starts back up from 20%.
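
A rough sketch of how such a silent stretch could be located in a log, assuming every client line starts with an `HH:MM:SS:` timestamp as in the log quoted in the first post. The function name and the 10-minute default threshold are arbitrary choices for illustration, and the two sample lines are made up to mirror the 0804Z and 0842Z times mentioned above.

```python
import re
from datetime import timedelta

# FAHClient log lines carry only a time of day, e.g. "14:25:24:WU01:..."
STAMP = re.compile(r"^(\d\d):(\d\d):(\d\d):")


def find_gaps(lines, threshold=timedelta(minutes=10)):
    """Return (previous_time, next_time, gap) for each pair of consecutive
    timestamped lines that are at least `threshold` apart."""
    gaps = []
    prev = None
    for line in lines:
        m = STAMP.match(line)
        if not m:
            continue  # banner or continuation lines without a timestamp
        h, mi, s = map(int, m.groups())
        now = timedelta(hours=h, minutes=mi, seconds=s)
        if prev is not None:
            delta = now - prev
            if delta < timedelta(0):
                # Assumption: the log rolled over midnight exactly once.
                delta += timedelta(days=1)
            if delta >= threshold:
                gaps.append((prev, now, delta))
        prev = now
    return gaps


# Made-up lines for illustration only, not copied from a real log.
sample = [
    "08:04:00:WU00:FS01:0x22:Completed 275000 out of 1250000 steps (22%)",
    "08:42:00:WU00:FS01:Starting",
]
for before, after, gap in find_gaps(sample):
    print(f"no log output between {before} and {after} ({gap})")
```

A deliberately paused slot produces the same kind of gap, so a hit only says where to look, not why logging stopped.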

... and as I type, remote monitoring of this GPU in my FAHControl dropped off like the other one, saying "Updating". But I got another clue: my SSH session in PuTTY was still barely connected and things were really sluggish, with many seconds of delay when typing. I tried to open the last log again in an editor and got this error:

Code:

-bash: fork: Cannot allocate memory
and then I lost the SSH connection.

The VM is still running, and the CPU that feeds the GPU has dropped to 0% usage.

Maybe Santa wants to machine learn our predictive behavior so he doesn't have to keep lists by hand anymore.

Merry Christmas, Happy holidays, and here's to a happy new year
Knish
Posts: 222
Joined: Tue Mar 17, 2020 5:20 am

Re: repeated INTERRUPTED after 100%- 17322 (0, 993,21)

Post by Knish »

Coincidence? I noticed after the fact that around 1100Z WS 140.**.200 restarted, and my 17322 then uploaded! BUT I had also increased the RAM for that VM from 1.75 GB to 2 GB.

So I switched to the other VM, which only had 1.5 GB of RAM, and it started hitting the same issues with 14911. I increased that to 2 GB of RAM as well, and it then completed with no more problems.

Running top on the 1.5 GB system, I frequently saw 70 MiB of RAM still free, and saw it get as low as 50, so I thought I was still OK.
After increasing to 2 GB, though, I sometimes see it dip from 100 to 70 MiB free, so I'm guessing this was all a RAM issue? It's a shame I forgot to note the buffer/cache size (if that was important).
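
On the buffer/cache question: the "free" figure counts only completely unused RAM, while the kernel's `MemAvailable` field in `/proc/meminfo` estimates how much could be handed to a new process without swapping, including reclaimable cache. A small sketch for logging both side by side follows. The function names are made up for illustration, the sample numbers are invented, and the buffer/cache sum is only an approximation of what top displays.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value in KiB}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def summarize(info):
    """Report free, available, and (approximate) buffer/cache memory in MiB."""
    return {
        "free_mib": info.get("MemFree", 0) // 1024,
        "available_mib": info.get("MemAvailable", 0) // 1024,
        # Approximation: Buffers + Cached; top may count a little more.
        "buff_cache_mib": (info.get("Buffers", 0) + info.get("Cached", 0)) // 1024,
    }


# Invented numbers in the real file's format, for illustration only.
sample = """MemTotal:        1783000 kB
MemFree:           71680 kB
MemAvailable:     409600 kB
Buffers:           20480 kB
Cached:           307200 kB
"""
print(summarize(parse_meminfo(sample)))
```

On the VM itself, the text would come from reading `/proc/meminfo`. A low `free_mib` alongside a comfortable `available_mib` is normal. The worrying case is when both are low at the same time.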

So I guess I stumbled upon a minimum for folding with a GPU on Linux. After nearly 2 billion points in 8 months using (among others) a T4 with 1.75 GB of RAM, it looks like you'll see fewer issues with 2 GB.
Joe_H
Site Admin
Posts: 7926
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: repeated INTERRUPTED after 100%- 17322 (0, 993,21)

Post by Joe_H »

The increased memory usage probably comes from the size of the system being simulated in Project 17322. At over 430,000 atoms it is currently the largest by atom count being distributed.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am

Re: repeated INTERRUPTED after 100%- 17322 (0, 993,21)

Post by PantherX »

This is a tricky one as I am not sure what component (FAHClient/FahCore) should throw/write an error for being unable to package up the WU to send it. Nonetheless, I have asked around so let's see what happens :)
Knish
Posts: 222
Joined: Tue Mar 17, 2020 5:20 am

Re: repeated INTERRUPTED after 100%- 17322 (0, 993,21)

Post by Knish »

Final bit of trivia: now that I'm running with 2 GB of RAM, I stared at top as P17319 finished. It started at 177 MiB free, and as it went through all the "saving result file..." steps I saw it drop to 67 MiB free, so that seems like a perfect explanation for my logs from 17322 earlier.
toTOW
Site Moderator
Posts: 6349
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: repeated INTERRUPTED after 100%- 17322 (0, 993,21)

Post by toTOW »

This is what happens when you don't allocate enough memory to the VM and don't use swap ...

I wouldn't try to fold with less than 4 GB of RAM ...

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
Gnomuz
Posts: 31
Joined: Sat Nov 21, 2020 5:07 pm

Re: repeated INTERRUPTED after 100%- 17322 (0, 993,21)

Post by Gnomuz »

I started a GCP instance two days ago, following the guide by Knish, and it works great. But I also got two failed units on project 17322, with exactly the same symptoms as described by Knish, even though I created the instance with 2 GB of RAM. I've just edited the instance to 3 GB of RAM and will check the next time I get a WU from this project.
The RAM costs are reported as 0.03 or 0.04 € per day, so that shouldn't have a big impact on the overall cost of the VM!

Nvidia RTX 3060 Ti & GTX 1660 Super - AMD Ryzen 7 5800X - MSI MEG X570 Unify - 16 GB RAM - Ubuntu 20.04.2 LTS - Nvidia drivers 460.56
Knish
Posts: 222
Joined: Tue Mar 17, 2020 5:20 am

Re: repeated INTERRUPTED after 100%- 17322 (0, 993,21)

Post by Knish »

Oh, thanks for this update!