
WU not starting

Posted: Tue Oct 08, 2019 8:55 am
by Penfold
I haven't seen this before. The WS is 155.247.166.219 and the CS is 155.247.166.220.
The status is stuck at 'ready'. Once in a while 'running' appears and shows some progress details, but it's there too fleetingly for me to catch what it says and then returns to 'ready' with 0.00% progress.

Here's the last part of the log file:

Code:

08:51:43:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/Linux/AMD64/AVX/Core_a7.fah/FahCore_a7 -dir 01 -suffix 01 -version 705 -lifeline 1418 -checkpoint 15 -np 8
08:51:43:WU01:FS00:Started FahCore on PID 26774
08:51:43:WU01:FS00:Core PID:26778
08:51:43:WU01:FS00:FahCore 0xa7 started
08:51:43:WU01:FS00:0xa7:*********************** Log Started 2019-10-08T08:51:43Z ***********************
08:51:43:WU01:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
08:51:43:WU01:FS00:0xa7:       Type: 0xa7
08:51:43:WU01:FS00:0xa7:       Core: Gromacs
08:51:43:WU01:FS00:0xa7:    Website: https://foldingathome.org/
08:51:43:WU01:FS00:0xa7:  Copyright: (c) 2009-2018 foldingathome.org
08:51:43:WU01:FS00:0xa7:     Author: Joseph Coffland <[email protected]>
08:51:43:WU01:FS00:0xa7:       Args: -dir 01 -suffix 01 -version 705 -lifeline 26774 -checkpoint 15 -np
08:51:43:WU01:FS00:0xa7:             8
08:51:43:WU01:FS00:0xa7:     Config: <none>
08:51:43:WU01:FS00:0xa7:************************************ Build *************************************
08:51:43:WU01:FS00:0xa7:    Version: 0.0.17
08:51:43:WU01:FS00:0xa7:       Date: Apr 27 2018
08:51:43:WU01:FS00:0xa7:       Time: 19:09:21
08:51:43:WU01:FS00:0xa7: Repository: Git
08:51:43:WU01:FS00:0xa7:   Revision: 21359963583d09ec2063ef946399441c4df4ccd7
08:51:43:WU01:FS00:0xa7:     Branch: master
08:51:43:WU01:FS00:0xa7:   Compiler: GNU 6.3.0 20170516
08:51:43:WU01:FS00:0xa7:    Options: -std=gnu++98 -O3 -funroll-loops
08:51:43:WU01:FS00:0xa7:   Platform: linux2 4.14.0-3-amd64
08:51:43:WU01:FS00:0xa7:       Bits: 64
08:51:43:WU01:FS00:0xa7:       Mode: Release
08:51:43:WU01:FS00:0xa7:       SIMD: avx_256
08:51:43:WU01:FS00:0xa7:************************************ System ************************************
08:51:43:WU01:FS00:0xa7:        CPU: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
08:51:43:WU01:FS00:0xa7:     CPU ID: GenuineIntel Family 6 Model 42 Stepping 7
08:51:43:WU01:FS00:0xa7:       CPUs: 8
08:51:43:WU01:FS00:0xa7:     Memory: 3.77GiB
08:51:43:WU01:FS00:0xa7:Free Memory: 827.39MiB
08:51:43:WU01:FS00:0xa7:    Threads: POSIX_THREADS
08:51:43:WU01:FS00:0xa7: OS Version: 4.15
08:51:43:WU01:FS00:0xa7:Has Battery: false
08:51:43:WU01:FS00:0xa7: On Battery: false
08:51:43:WU01:FS00:0xa7: UTC Offset: 1
08:51:43:WU01:FS00:0xa7:        PID: 26778
08:51:43:WU01:FS00:0xa7:        CWD: /var/lib/fahclient/work
08:51:43:WU01:FS00:0xa7:         OS: Linux 4.15.0-65-generic x86_64
08:51:43:WU01:FS00:0xa7:    OS Arch: AMD64
08:51:43:WU01:FS00:0xa7:********************************************************************************
08:51:43:WU01:FS00:0xa7:Project: 14088 (Run 107, Clone 5, Gen 0)
08:51:43:WU01:FS00:0xa7:Unit: 0x000000020002894b5d92491cd4c5bd55
08:51:43:WU01:FS00:0xa7:Digital signatures verified
08:51:43:WU01:FS00:0xa7:Calling: mdrun -s frame0.tpr -o frame0.trr -cpt 15 -nt 8
08:51:43:WU01:FS00:0xa7:Steps: first=0 total=1250000
08:51:44:WU01:FS00:0xa7:Completed 1 out of 1250000 steps (0%)
08:51:44:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
It seems to have been stuck like this for the past 50 minutes or so. What can/should I do?

EDIT: I see from EOC stats that 50 minutes is incorrect. It looks to have been stuck since yesterday sometime - daily production = 0.
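
For anyone wanting to confirm this kind of restart loop from a terminal rather than by watching FAHControl, here is a minimal sketch that counts the INTERRUPTED exits recorded in a log. The function name count_interrupted is made up for illustration, and the default path /var/lib/fahclient/log.txt is an assumption about a typical Linux install; pass your own log path if it differs.

```shell
# Count how many times a FahCore exited as INTERRUPTED in a log file.
# $1: path to the log; the default is an assumed location, adjust as needed.
count_interrupted() {
    grep -c 'FahCore returned: INTERRUPTED' "${1:-/var/lib/fahclient/log.txt}"
}
```

A count that keeps climbing while progress stays at 0% would point to the same loop as in the log above.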

Re: WU not starting

Posted: Tue Oct 08, 2019 1:41 pm
by Penfold
It seems I need to follow this advice, from another thread (thanks to parkut):
Pause the slot, go to the work folder inside your FAH data folder, and delete the subfolder with the same number as your work queue number. Then resume the slot.
Where do I find the data folder? I run FAHClient on my Ubuntu machine, but use FAHControl on my iMac to check in now and then. Will I find the data folder somewhere on my iMac, or will I need to go search on the Ubuntu machine? In either case I don't know where I'm going.

Re: WU not starting

Posted: Tue Oct 08, 2019 1:52 pm
by Joe_H
I was about to post the link to this topic - viewtopic.php?f=19&t=31899. Something has gone wrong with the setup for Project 14088 and it has been taken off assignment.

The WU data will be on the Ubuntu machine, in the same location as the log and other files. See this post on finding the log file on Linux for posting - viewtopic.php?p=261083&f=24#p261083, Method 2. Navigate to /var/lib/fahclient and then into the work folder in that directory. Then you can remove the subfolder for the WU that matches WU01 as shown in your log.
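
The mapping from the log tag to the folder can be sketched in shell roughly as follows. The helper names wu_folder and remove_wu_folder are hypothetical, and the sketch assumes the subfolder is named with the bare two-digit number (the log above shows -dir 01 with a working directory of /var/lib/fahclient/work). Pause the slot before removing anything.

```shell
# Print the work subfolder for a queue tag as it appears in the log
# (e.g. WU01 -> /var/lib/fahclient/work/01).
# $1: queue tag from the log, $2: work directory (defaults to the path in this thread).
wu_folder() {
    id=${1#WU}    # strip the leading "WU" to get the bare number
    printf '%s/%s\n' "${2:-/var/lib/fahclient/work}" "$id"
}

# Remove that folder only if it exists, and report what was removed.
remove_wu_folder() {
    dir=$(wu_folder "$1" "${2:-}")
    [ -d "$dir" ] || { echo "no such folder: $dir" >&2; return 1; }
    rm -r -- "$dir" && echo "removed $dir"
}
```

On a real install the work folder usually needs elevated rights, and shell functions are not visible to sudo, so in practice you would probably print the path with wu_folder WU01 first, check it, and then run sudo rm -r on that path by hand.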

Re: WU not starting

Posted: Tue Oct 08, 2019 5:19 pm
by Penfold
Well, that was a workout! Got there though. Thanks Joe.

I found it incredibly difficult to get access to /var/lib/fahclient. If I recall correctly, at one time in Ubuntu you could actually see the system folders and their subfolders in the GUI, as in Mac OS X. Not now, it seems. Please correct me if I'm wrong. (I have very, very little knowledge of and proficiency in Ubuntu.)

Does

Code:

 xdg-open /var/lib/fahclient
sound familiar?

I hope I remember all of this if it happens again. This is the first time I've had to do this with a dodgy WU.

QUESTION: the Folding Slots and the Work Queue both have the same ID, namely 00. Is that OK? With the stalled WU they were 00 and 01 respectively.

Re: WU not starting

Posted: Tue Oct 08, 2019 6:48 pm
by HaloJones
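Next time, just open a terminal window:

cd /var/lib/fahclient/work
ls -l
(see what the folders are called)
sudo rm -r folder_name

You will then be asked for your password (on Ubuntu, sudo wants your own user password rather than a separate root one). The -r is needed because rm will not remove a folder without it.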

Re: WU not starting

Posted: Tue Oct 08, 2019 7:04 pm
by Penfold
Thanks, HaloJones.

Not being savvy in respect of these things, can you tell me - is it 'ell' 'ess' space hyphen 'ell' or 'ell' 'ess' space hyphen 'one' ?

Re: WU not starting

Posted: Tue Oct 08, 2019 7:08 pm
by Joe_H
Penfold wrote:Thanks, HaloJones.

Not being savvy in respect of these things, can you tell me - is it 'ell' 'ess' space hyphen 'ell' or 'ell' 'ess' space hyphen 'one' ?
It would be the first, 'ell' 'ess' space hyphen 'ell'

Re: WU not starting

Posted: Tue Oct 08, 2019 7:10 pm
by Penfold
Thanks again, Joe.

All is well now …

Re: WU not starting

Posted: Wed Oct 09, 2019 12:46 am
by bruce
Penfold wrote:QUESTION: the Folding Slots and the Work Queue both have the same ID, namely 00. Is that OK? With the stalled WU they were 00 and 01 respectively.
Slots are numbered (starting from 0) in the order they were created, which is often CPU, GPU1, ...

WUs are numbered in the order received (also starting from 0), but a new WU will be assigned the lowest number that isn't already in use by another WU. In other words, the slot and the WU might have the same number or might not.
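
The "lowest number not already in use" rule can be illustrated with a small shell sketch. This is only a model of the rule as described here, not the client's actual code, and the function name next_wu_number is invented for the example. It takes the queue numbers currently in use as arguments.

```shell
# Print the lowest two-digit queue number that is not among the arguments.
# Arguments: the queue numbers currently in use, e.g. 00 01 03.
next_wu_number() {
    n=0
    while :; do
        candidate=$(printf '%02d' "$n")
        case " $* " in
            *" $candidate "*) n=$((n + 1)) ;;                 # taken, try the next one
            *) printf '%s\n' "$candidate"; return 0 ;;        # free, use it
        esac
    done
}
```

For example, next_wu_number 01 gives 00, which matches a new WU landing in 00 while a stalled one still occupies 01.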

Look at FAHControl. You'll find each WU has both numbers listed ... and if you select a WU from the central panel, the corresponding entry for the other number is also highlighted. See also the numbers in the right-hand panel.

... or ...
look at the log. Each line of the log begins with something like
05:31:00:WU01:FS00:0xa7:(text)
05:31:15:WU00:FS01:0x21:(text)
05:31:32:WU03:FS02:0x21:(text)

The WU with the queue number 01 is running in FAH Slot 00 using FAHCore_a7.
The WU with the queue number 00 is running in slot 01 using FAHCore_21.
The WU with the queue number 03 is running in slot 02 also using FAHCore_21.

The WU with queue number 02 is now no longer present, having just recently uploaded.
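
The same reading of the log prefix can be done mechanically. The sketch below pulls the WU number, slot number and core type out of lines shaped like the three examples above; the function name wu_slot_pairs is made up, and it only matches lines that carry the 0x core tag after the slot.

```shell
# List the distinct (WU, slot, core) combinations seen in a log.
# Reads the files given as arguments, or standard input if none are given.
wu_slot_pairs() {
    sed -n 's/^[0-9:]*:WU\([0-9][0-9]*\):FS\([0-9][0-9]*\):0x\([0-9a-fA-F]*\):.*/WU\1 FS\2 FahCore_\3/p' "$@" | sort -u
}
```

Fed the three example lines, it prints one line per WU, pairing each queue number with its slot and core.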

Re: WU not starting

Posted: Wed Oct 09, 2019 8:45 am
by Penfold
Thanks for that explanation, Bruce. I only have a CPU - no GPUs for me. I notice this morning that the new WU has Work Queue ID 01.

This all blew up quite unexpectedly. I do little other than have a look at progress via FAHControl on my iMac now and then, and check the Ubuntu machine to install any OS updates.

That's two days' worth of WU crunching lost, so my 'now and thens' should be daily henceforth.

Re: WU not starting

Posted: Wed Oct 09, 2019 5:24 pm
by bruce
With a CPU, you'll generally have one queue (probably 00) and most of the time, you should have 1 WU, though its identifier will vary. During the download of a new WU and the upload of a completed WU, you will have 2 or 3 WUs briefly. Otherwise, since you'll only have one, it's obvious which file to delete.

Re: WU not starting

Posted: Thu Oct 10, 2019 8:58 am
by Penfold
Yes, my setup is so simple it could be called primitive. Thanks for all the info, Bruce.

Re: WU not starting

Posted: Fri Oct 11, 2019 3:09 pm
by Hagerstrom
On my Windows 10 machine I was unable to delete the work queue folder after pausing. The system said that the folder/file was in use.

I restarted my computer and was able to delete the folder before relaunching FAHClient. My app doesn't automatically start up after a re-boot. It has to Run as Administrator. Maybe there's a fix but I just have to remember to manually relaunch FAHClient after a computer re-boot.

Anyway. Folder deleted, FAHClient relaunched. All is good.

Re: WU not starting

Posted: Fri Oct 11, 2019 5:33 pm
by bruce
FAHClient shouldn't need to run as administrator unless you installed it while logged on as an administrator. It is best to install Windows FAH as the user who is going to log on to run it. (During the install, it may or may not ask you for the Admin password.)