Very short final deadline? 48 h?

Moderators: Site Moderators, FAHC Science Team

sejtam
Posts: 11
Joined: Mon Apr 13, 2020 11:15 am

Very short final deadline? 48 h?

Post by sejtam »

I just had a WU killed by FAHcontrol because it exceeded the final deadline. That deadline
apparently was very short.


******************************* Date: 2020-10-22 *******************************
23:22:10:WU02:FS01:0xa7:Completed 177500 out of 250000 steps (71%)
23:32:59:WU02:FS01:0xa7:Completed 180000 out of 250000 steps (72%)
03:56:34:WU01:FS01:Connecting to assign2.foldingathome.org:80
03:56:35:WU01:FS01:Assigned to work server 66.170.111.50
03:56:35:WU01:FS01:Requesting new work unit for slot 01: RUNNING cpu:1 from 66.170.111.50
03:56:35:WU01:FS01:Connecting to 66.170.111.50:8080
03:56:35:WU01:FS01:Downloading 7.36MiB
03:56:41:WU01:FS01:Download 55.17%
03:56:44:WU01:FS01:Download complete
03:56:44:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:17408 run:0 clone:668 gen:85 core:0xa7 unit:0x0000006742aa6f325f61846850e1c007
04:02:07:WU02:FS01:0xa7:Completed 250000 out of 250000 steps (100%)
04:02:12:WU02:FS01:0xa7:Saving result file ../logfile_01.txt
04:02:12:WU01:FS01:Running FahCore: /usr/local/bin/FAHCoreWrapper "/Library/Application Support/FAHClient/cores/cores.foldingathome.org/osx/64bit-sse2/a7-0.0.19/Core_a7.fah/FahCore_a7" -dir 01 -suffix 01 -version 706 -lifeline 75 -checkpoint 30 -np 1
04:02:12:WU01:FS01:Started FahCore on PID 8409
04:02:13:WU01:FS01:Core PID:8410
04:02:13:WU01:FS01:FahCore 0xa7 started
04:02:13:WU01:FS01:0xa7:*********************** Log Started 2020-10-23T04:02:13Z ***********************
04:02:13:WU01:FS01:0xa7:************************** Gromacs Folding@home Core ***************************
(and so on, until)
23:47:32:WU01:FS01:0xa7:Completed 103750 out of 125000 steps (83%)
******************************* Date: 2020-10-25 *******************************
00:24:16:WU01:FS01:0xa7:Completed 105000 out of 125000 steps (84%)
01:00:10:WU01:FS01:0xa7:Completed 106250 out of 125000 steps (85%)
01:38:11:WU01:FS01:0xa7:Completed 107500 out of 125000 steps (86%)
02:14:46:WU01:FS01:0xa7:Completed 108750 out of 125000 steps (87%)
02:53:04:WU01:FS01:0xa7:Completed 110000 out of 125000 steps (88%)
03:27:19:WU01:FS01:0xa7:Completed 111250 out of 125000 steps (89%)
03:52:53:WARNING:WU01:FS01:Past final deadline 2020-10-25T03:52:52Z, dumping
03:52:53:WU01:FS01:Shutting core down
03:52:53:WU01:FS01:0xa7:Caught signal SIGINT(2) on PID 8410
03:52:53:WU01:FS01:0xa7:Exiting, please wait. . .
03:52:53:WU00:FS01:Connecting to assign1.foldingathome.org:80
So that workload had a final deadline of just 48 hours after assignment?

I can't find any mention of the timeouts or deadlines in the logs.
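For anyone who wants to check the window after the fact, it can be pulled out of the log itself. The following is only a rough sketch, assuming the log layout quoted above: `Date:` separator lines, `HH:MM:SS:` prefixes in UTC (the core's own "Log Started ...Z" line suggests this), and a day rollover whenever the time of day jumps backwards. The function name `deadline_windows` and the regular expressions are made up for illustration and are not part of the client.

```python
import re
from datetime import date, datetime, time, timedelta, timezone

# Patterns modelled on the log lines quoted in this thread (an assumption,
# not an official description of the log format).
DATE_HEADER = re.compile(r"^\*+ Date: (\d{4})-(\d{2})-(\d{2}) \*+$")
LOG_LINE = re.compile(r"^(\d{2}):(\d{2}):(\d{2}):(.*)$")
RECEIVED = re.compile(r"^WU(\d+):FS\d+:Received Unit:")
DUMPED = re.compile(
    r"WU(\d+):FS\d+:Past final deadline "
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z, dumping"
)


def deadline_windows(lines):
    """Return (wu, received, deadline, window) for every dumped WU.

    Lines before the first Date header are skipped because their date
    is unknown.
    """
    current_date = None
    previous = None
    received = {}
    windows = []
    for raw in lines:
        line = raw.strip()
        header = DATE_HEADER.match(line)
        if header:
            current_date = date(*map(int, header.groups()))
            previous = None
            continue
        entry = LOG_LINE.match(line)
        if entry is None or current_date is None:
            continue
        stamp = time(int(entry[1]), int(entry[2]), int(entry[3]))
        # A time of day earlier than the previous line means midnight passed.
        if previous is not None and stamp < previous:
            current_date += timedelta(days=1)
        previous = stamp
        when = datetime.combine(current_date, stamp, tzinfo=timezone.utc)
        message = entry[4]
        got = RECEIVED.match(message)
        if got:
            received[got[1]] = when
            continue
        dumped = DUMPED.search(message)
        if dumped and dumped[1] in received:
            deadline = datetime.strptime(
                dumped[2], "%Y-%m-%dT%H:%M:%S"
            ).replace(tzinfo=timezone.utc)
            start = received[dumped[1]]
            windows.append((dumped[1], start, deadline, deadline - start))
    return windows


SAMPLE = [
    "******************************* Date: 2020-10-22 *******************************",
    "23:32:59:WU02:FS01:0xa7:Completed 180000 out of 250000 steps (72%)",
    "03:56:44:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:17408 run:0 clone:668 gen:85 core:0xa7 unit:0x0000006742aa6f325f61846850e1c007",
    "******************************* Date: 2020-10-25 *******************************",
    "03:52:53:WARNING:WU01:FS01:Past final deadline 2020-10-25T03:52:52Z, dumping",
]

for wu, start, deadline, window in deadline_windows(SAMPLE):
    print(f"WU{wu}: received {start}, deadline {deadline}, window {window}")
```

On the lines quoted above this reports a window of 1 day, 23:56:08 between download and final deadline, i.e. just under 48 hours.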
Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: Very short final deadline? 48 h?

Post by Neil-B »

Timeouts and expiration deadlines are indicated in both the web and advanced controls .. the project summary webpage linked from the top of the forum also indicates these .. 1-day timeouts and 2-day expirations have been common for COVID projects
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

(Green/Bold = Active)
sejtam
Posts: 11
Joined: Mon Apr 13, 2020 11:15 am

Re: Very short final deadline? 48 h?

Post by sejtam »

But they are not logged, so after the fact it is impossible to see what was actually indicated as timeout and deadline... unless you run into the deadline. *Then* it logs the final deadline at the point it stops processing, by which time I have wasted almost 2 days of processing for nothing. Why is my client (macOS, no GPU) even getting WUs with deadlines that are almost impossible to meet?
ajm
Posts: 750
Joined: Sat Mar 21, 2020 5:22 am
Location: Lucerne, Switzerland

Re: Very short final deadline? 48 h?

Post by ajm »

The Timeout and Deadline values are constant for a given project. You can see them here: https://apps.foldingathome.org/psummary
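Since the values are fixed per project, the cut-off times for a given WU are simple date arithmetic once the assignment time is known. A minimal sketch, assuming the 1-day timeout and 2-day expiration mentioned earlier in the thread as common for COVID projects (the actual values for project 17408 should be taken from the project summary page); `wu_cutoffs` is a hypothetical helper name:

```python
from datetime import datetime, timedelta, timezone


def wu_cutoffs(assigned, timeout_days, deadline_days):
    """Return (timeout, final_deadline) for a WU assigned at `assigned`."""
    return (
        assigned + timedelta(days=timeout_days),
        assigned + timedelta(days=deadline_days),
    )


# Assignment time taken from the "Assigned to work server" line in the first post.
assigned = datetime(2020, 10, 23, 3, 56, 35, tzinfo=timezone.utc)

# Assumed project values: 1-day timeout, 2-day final deadline.
timeout, deadline = wu_cutoffs(assigned, timeout_days=1, deadline_days=2)
print(timeout.isoformat())   # 2020-10-24T03:56:35+00:00
print(deadline.isoformat())  # 2020-10-25T03:56:35+00:00
```

The deadline the client actually logged was 2020-10-25T03:52:52Z, a few minutes earlier than this estimate, so the reference time used for the real deadline is evidently not exactly the client's own assignment line.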
Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Location: UK

Re: Very short final deadline? 48 h?

Post by Neil-B »

I am guessing that you may not be running 24/7 ... or be running fairly old kit ... or possibly be running a limited core cpu slot ... the WUs might struggle to complete under those circumstances ... the web and advanced controls are the tools intended for monitoring progress of WUs ... the log is simply a record of what has happened ... FaH WUs will normally complete within Timeout on relatively recent kit folding 24/7.
sejtam
Posts: 11
Joined: Mon Apr 13, 2020 11:15 am

Re: Very short final deadline? 48 h?

Post by sejtam »

I am running 24/7. Yes, it is limited (201- Mac mini). I am running 2 slots with one core each (running one slot with 2 cores only uses about half of each CPU, which also seems like a waste). It would be nice if the client could request only WUs that fit its limited performance.

And I understand that the web controls are there for monitoring, but once the WU has been cancelled/deleted you cannot find the timeout and expiry anymore, and I surely don't watch this all the time. Just logging these values on assignment would help later to figure out what happened.
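The progress lines in the log also give a rough way to tell, while a WU is still running, whether it can make its deadline. A minimal sketch, assuming two "Completed N out of M steps" lines from the same WU logged on the same day and a roughly constant step rate; `time_remaining` and the pattern are made up for illustration:

```python
import re
from datetime import datetime, timedelta

# Pattern modelled on the progress lines quoted in the first post.
PROGRESS = re.compile(
    r"^(\d{2}:\d{2}:\d{2}):WU\d+:FS\d+:0x\w+:"
    r"Completed (\d+) out of (\d+) steps"
)


def time_remaining(earlier_line, later_line):
    """Estimate the remaining run time from two progress lines.

    Both lines must belong to the same WU and the same calendar day.
    """
    first = PROGRESS.match(earlier_line)
    second = PROGRESS.match(later_line)
    if first is None or second is None:
        raise ValueError("not a progress line")
    t0 = datetime.strptime(first[1], "%H:%M:%S")
    t1 = datetime.strptime(second[1], "%H:%M:%S")
    elapsed = (t1 - t0).total_seconds()
    steps_done = int(second[2]) - int(first[2])
    if elapsed <= 0 or steps_done <= 0:
        raise ValueError("lines must show forward progress")
    steps_left = int(second[3]) - int(second[2])
    # Linear extrapolation: remaining steps at the observed step rate.
    return timedelta(seconds=round(steps_left * elapsed / steps_done))


eta = time_remaining(
    "02:53:04:WU01:FS01:0xa7:Completed 110000 out of 125000 steps (88%)",
    "03:27:19:WU01:FS01:0xa7:Completed 111250 out of 125000 steps (89%)",
)
print(eta)  # 6:16:45
```

By this estimate the WU still needed a little over six hours at 03:27, while the final deadline was less than half an hour away, so on one core it would have taken roughly 54 hours in total.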
Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Location: UK

Re: Very short final deadline? 48 h?

Post by Neil-B »

1 slot with 2 cores is the way to go and should use both cores 100% .. a single core on a Mac mini will struggle to meet expirations tbh ... You can always check the timeout and expiration of past WUs, as the top-level project details are posted in the links previously provided and the deadlines are the same for all WUs in each project.

It is possible (due to an old bug) for the first WU after a slot is set up to use only a single core (which might be why you have seen 50%), but after that it should run fine on 2 cores (and pretty much twice as quickly) - you may even find the WUs get in close to Timeout ... Let one of the slots finish its WU, delete that slot once the WU has completed, then increase the cores on the remaining slot to 2, and you should get what you want ... If you increase the slot to 2 mid-WU, the change only takes effect at the next WU, not mid-WU - but it won't harm the running WU to do so.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Very short final deadline? 48 h?

Post by bruce »

If, in response to the old bug mentioned above, you decided to reconfigure the Mini to run two independent slots, that was a bad idea. Before a new WU is downloaded, reconfigure your system for one slot using two threads (CPUs). If the slot is set for 1 CPU when a WU is downloaded, you cannot reconfigure that WU to use both CPUs.