
Deadlines way too short on 8001, 8004, 8011 [Not]

Posted: Tue Jan 17, 2012 11:36 pm
by Hyperlife
Since these WUs were rebenchmarked, the preferred and final deadlines are now less than 24 hours. I have a uniprocessor client whose 8001 WU, running 24/7 on a Pentium 4 2.8GHz rig, was just terminated at 85% for going past the final deadline.

Here's the complete log showing the termination:


06:20:00:WU01:FS00:Connecting to assign3.stanford.edu:8080
06:20:00:WU01:FS00:News: Welcome to Folding@Home
06:20:00:WU01:FS00:Assigned to work server 171.67.108.58
06:20:00:WU01:FS00:Requesting new work unit for slot 00: RUNNING uniprocessor from 171.67.108.58
06:20:00:WU01:FS00:Connecting to 171.67.108.58:8080
06:20:01:WU01:FS00:Downloading 532.03KiB
06:20:02:WU01:FS00:Download complete
06:20:02:WU01:FS00:Received Unit: id:01 state:DOWNLOAD error:OK project:8001 run:18 clone:62 gen:22 core:0xa4 unit:0x000000246652edca4eded8dc117bd15f
06:20:15:WU01:FS00:Starting
06:20:15:WU01:FS00:Running FahCore: "C:\Program Files\FAHClient/FAHCoreWrapper.exe" "C:/Documents and Settings/xxx/Application Data/FAHClient/cores/www.stanford.edu/~pande/Win32/x86/beta/Core_a4.fah/FahCore_a4.exe" -dir 01 -suffix 01 -version 701 -checkpoint 15 -service -forceasm
06:20:15:WU01:FS00:Started FahCore on PID 4308
06:20:15:WU01:FS00:Core PID:4708
06:20:15:WU01:FS00:FahCore 0xa4 started
06:20:15:WU01:FS00:0xa4:
06:20:15:WU01:FS00:0xa4:*------------------------------*
06:20:15:WU01:FS00:0xa4:Folding@Home Gromacs GB Core
06:20:15:WU01:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
06:20:15:WU01:FS00:0xa4:
06:20:15:WU01:FS00:0xa4:Preparing to commence simulation
06:20:15:WU01:FS00:0xa4:- Assembly optimizations manually forced on.
06:20:15:WU01:FS00:0xa4:- Not checking prior termination.
06:20:15:WU01:FS00:0xa4:- Expanded 544290 -> 1305312 (decompressed 239.8 percent)
06:20:15:WU01:FS00:0xa4:Called DecompressByteArray: compressed_data_size=544290 data_size=1305312, decompressed_data_size=1305312 diff=0
06:20:15:WU01:FS00:0xa4:- Digital signature verified
06:20:15:WU01:FS00:0xa4:
06:20:15:WU01:FS00:0xa4:Project: 8001 (Run 18, Clone 62, Gen 22)
06:20:15:WU01:FS00:0xa4:
06:20:15:WU01:FS00:0xa4:Assembly optimizations on if available.
06:20:15:WU01:FS00:0xa4:Entering M.D.
06:20:21:WU01:FS00:0xa4:Mapping NT from 1 to 1 
06:20:22:WU01:FS00:0xa4:Completed 0 out of 250000 steps  (0%)
06:31:21:WU01:FS00:0xa4:Completed 2500 out of 250000 steps  (1%)
06:42:21:WU01:FS00:0xa4:Completed 5000 out of 250000 steps  (2%)
06:53:20:WU01:FS00:0xa4:Completed 7500 out of 250000 steps  (3%)
07:04:20:WU01:FS00:0xa4:Completed 10000 out of 250000 steps  (4%)
07:15:22:WU01:FS00:0xa4:Completed 12500 out of 250000 steps  (5%)
07:26:22:WU01:FS00:0xa4:Completed 15000 out of 250000 steps  (6%)
07:37:22:WU01:FS00:0xa4:Completed 17500 out of 250000 steps  (7%)
07:48:22:WU01:FS00:0xa4:Completed 20000 out of 250000 steps  (8%)
07:59:22:WU01:FS00:0xa4:Completed 22500 out of 250000 steps  (9%)
08:10:22:WU01:FS00:0xa4:Completed 25000 out of 250000 steps  (10%)
08:21:23:WU01:FS00:0xa4:Completed 27500 out of 250000 steps  (11%)
08:32:22:WU01:FS00:0xa4:Completed 30000 out of 250000 steps  (12%)
08:43:22:WU01:FS00:0xa4:Completed 32500 out of 250000 steps  (13%)
08:54:21:WU01:FS00:0xa4:Completed 35000 out of 250000 steps  (14%)
09:05:21:WU01:FS00:0xa4:Completed 37500 out of 250000 steps  (15%)
09:16:22:WU01:FS00:0xa4:Completed 40000 out of 250000 steps  (16%)
09:27:24:WU01:FS00:0xa4:Completed 42500 out of 250000 steps  (17%)
09:38:24:WU01:FS00:0xa4:Completed 45000 out of 250000 steps  (18%)
09:49:24:WU01:FS00:0xa4:Completed 47500 out of 250000 steps  (19%)
10:00:25:WU01:FS00:0xa4:Completed 50000 out of 250000 steps  (20%)
10:11:26:WU01:FS00:0xa4:Completed 52500 out of 250000 steps  (21%)
10:22:27:WU01:FS00:0xa4:Completed 55000 out of 250000 steps  (22%)
10:33:26:WU01:FS00:0xa4:Completed 57500 out of 250000 steps  (23%)
10:44:27:WU01:FS00:0xa4:Completed 60000 out of 250000 steps  (24%)
10:55:27:WU01:FS00:0xa4:Completed 62500 out of 250000 steps  (25%)
11:06:28:WU01:FS00:0xa4:Completed 65000 out of 250000 steps  (26%)
11:17:29:WU01:FS00:0xa4:Completed 67500 out of 250000 steps  (27%)
11:28:29:WU01:FS00:0xa4:Completed 70000 out of 250000 steps  (28%)
11:39:30:WU01:FS00:0xa4:Completed 72500 out of 250000 steps  (29%)
11:50:30:WU01:FS00:0xa4:Completed 75000 out of 250000 steps  (30%)
12:01:31:WU01:FS00:0xa4:Completed 77500 out of 250000 steps  (31%)
12:12:30:WU01:FS00:0xa4:Completed 80000 out of 250000 steps  (32%)
12:23:31:WU01:FS00:0xa4:Completed 82500 out of 250000 steps  (33%)
12:34:30:WU01:FS00:0xa4:Completed 85000 out of 250000 steps  (34%)
12:45:29:WU01:FS00:0xa4:Completed 87500 out of 250000 steps  (35%)
12:56:27:WU01:FS00:0xa4:Completed 90000 out of 250000 steps  (36%)
13:07:28:WU01:FS00:0xa4:Completed 92500 out of 250000 steps  (37%)
13:18:28:WU01:FS00:0xa4:Completed 95000 out of 250000 steps  (38%)
13:29:28:WU01:FS00:0xa4:Completed 97500 out of 250000 steps  (39%)
13:40:27:WU01:FS00:0xa4:Completed 100000 out of 250000 steps  (40%)
13:51:25:WU01:FS00:0xa4:Completed 102500 out of 250000 steps  (41%)
14:02:25:WU01:FS00:0xa4:Completed 105000 out of 250000 steps  (42%)
14:13:25:WU01:FS00:0xa4:Completed 107500 out of 250000 steps  (43%)
14:24:25:WU01:FS00:0xa4:Completed 110000 out of 250000 steps  (44%)
14:35:26:WU01:FS00:0xa4:Completed 112500 out of 250000 steps  (45%)
14:46:24:WU01:FS00:0xa4:Completed 115000 out of 250000 steps  (46%)
14:57:23:WU01:FS00:0xa4:Completed 117500 out of 250000 steps  (47%)
15:08:22:WU01:FS00:0xa4:Completed 120000 out of 250000 steps  (48%)
15:19:21:WU01:FS00:0xa4:Completed 122500 out of 250000 steps  (49%)
15:30:22:WU01:FS00:0xa4:Completed 125000 out of 250000 steps  (50%)
15:41:23:WU01:FS00:0xa4:Completed 127500 out of 250000 steps  (51%)
15:52:20:WU01:FS00:0xa4:Completed 130000 out of 250000 steps  (52%)
16:03:17:WU01:FS00:0xa4:Completed 132500 out of 250000 steps  (53%)
16:14:14:WU01:FS00:0xa4:Completed 135000 out of 250000 steps  (54%)
16:25:12:WU01:FS00:0xa4:Completed 137500 out of 250000 steps  (55%)
16:36:10:WU01:FS00:0xa4:Completed 140000 out of 250000 steps  (56%)
16:47:08:WU01:FS00:0xa4:Completed 142500 out of 250000 steps  (57%)
16:58:05:WU01:FS00:0xa4:Completed 145000 out of 250000 steps  (58%)
17:10:23:WU01:FS00:0xa4:Completed 147500 out of 250000 steps  (59%)
17:23:09:WU01:FS00:0xa4:Completed 150000 out of 250000 steps  (60%)
17:36:37:WU01:FS00:0xa4:Completed 152500 out of 250000 steps  (61%)
17:49:44:WU01:FS00:0xa4:Completed 155000 out of 250000 steps  (62%)
18:02:04:WU01:FS00:0xa4:Completed 157500 out of 250000 steps  (63%)
18:13:54:WU01:FS00:0xa4:Completed 160000 out of 250000 steps  (64%)
18:25:19:WU01:FS00:0xa4:Completed 162500 out of 250000 steps  (65%)
18:37:39:WU01:FS00:0xa4:Completed 165000 out of 250000 steps  (66%)
18:50:14:WU01:FS00:0xa4:Completed 167500 out of 250000 steps  (67%)
19:03:17:WU01:FS00:0xa4:Completed 170000 out of 250000 steps  (68%)
19:15:17:WU01:FS00:0xa4:Completed 172500 out of 250000 steps  (69%)
19:33:04:WU01:FS00:0xa4:Completed 175000 out of 250000 steps  (70%)
19:46:51:WU01:FS00:0xa4:Completed 177500 out of 250000 steps  (71%)
19:57:51:WU01:FS00:0xa4:Completed 180000 out of 250000 steps  (72%)
20:08:54:WU01:FS00:0xa4:Completed 182500 out of 250000 steps  (73%)
20:20:10:WU01:FS00:0xa4:Completed 185000 out of 250000 steps  (74%)
20:31:15:WU01:FS00:0xa4:Completed 187500 out of 250000 steps  (75%)
20:44:06:WU01:FS00:0xa4:Completed 190000 out of 250000 steps  (76%)
20:56:32:WU01:FS00:0xa4:Completed 192500 out of 250000 steps  (77%)
21:08:30:WU01:FS00:0xa4:Completed 195000 out of 250000 steps  (78%)
21:20:58:WU01:FS00:0xa4:Completed 197500 out of 250000 steps  (79%)
21:33:03:WU01:FS00:0xa4:Completed 200000 out of 250000 steps  (80%)
21:49:41:WU01:FS00:0xa4:Completed 202500 out of 250000 steps  (81%)
22:06:35:WU01:FS00:0xa4:Completed 205000 out of 250000 steps  (82%)
22:22:50:WU01:FS00:0xa4:Completed 207500 out of 250000 steps  (83%)
22:34:11:WU01:FS00:0xa4:Completed 210000 out of 250000 steps  (84%)
22:46:32:WU01:FS00:0xa4:Completed 212500 out of 250000 steps  (85%)
22:53:33:WARNING:WU01:FS00:Past final deadline 2012-01-17T22:53:32, dumping
22:53:33:WU01:FS00:Shutting core down
22:53:33:WARNING:WU01:FS00:FahCore not accepting gentle shutdown, killing
22:53:33:WARNING:WU01:FS00:Killing WU01
22:53:33:WU01:FS00:Cleaning up
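For anyone who wants to check a log like this one against its deadline, here is a minimal Python sketch. It averages the time per frame (TPF) between the first and last progress lines, projects the full run time, and compares it with the window from assignment (06:20:00) to the final deadline (22:53:32) shown in the log. The regular expression is an assumption based only on the progress lines visible above, not a documented FAHClient format, and `parse_progress`, `average_tpf` and `hms` are hypothetical helper names.

```python
import re

# Assumed pattern, inferred from the progress lines in the log above, e.g.
# "06:31:21:WU01:FS00:0xa4:Completed 2500 out of 250000 steps  (1%)"
PROGRESS = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}):WU\d+:FS\d+:0x[0-9a-f]+:"
    r"Completed (\d+) out of (\d+) steps"
)


def parse_progress(log_text):
    """Return (seconds, steps_done, total_steps) for every progress line.

    Log timestamps are HH:MM:SS only, so a timestamp earlier than the
    previous one is treated as a midnight rollover.
    """
    points, offset, prev = [], 0, None
    for line in log_text.splitlines():
        m = PROGRESS.match(line.strip())
        if not m:
            continue
        h, mi, s, done, total = (int(g) for g in m.groups())
        t = h * 3600 + mi * 60 + s + offset
        if prev is not None and t < prev:
            offset += 86400
            t += 86400
        prev = t
        points.append((t, done, total))
    return points


def average_tpf(points):
    """Mean seconds per frame, where one frame is 1% of the total steps."""
    (t0, s0, total), (t1, s1, _) = points[0], points[-1]
    frames = (s1 - s0) / (total / 100)
    return (t1 - t0) / frames


def hms(text):
    """Convert an HH:MM:SS string to seconds since midnight."""
    h, m, s = (int(x) for x in text.split(":"))
    return h * 3600 + m * 60 + s


# A few progress lines copied from the log above.
LOG = """\
06:20:22:WU01:FS00:0xa4:Completed 0 out of 250000 steps  (0%)
06:31:21:WU01:FS00:0xa4:Completed 2500 out of 250000 steps  (1%)
22:34:11:WU01:FS00:0xa4:Completed 210000 out of 250000 steps  (84%)
22:46:32:WU01:FS00:0xa4:Completed 212500 out of 250000 steps  (85%)
"""

points = parse_progress(LOG)
tpf = average_tpf(points)                   # seconds per frame
projected = 100 * tpf                       # whole WU, in seconds
# Assignment and final deadline both fall on 2012-01-17 in this log.
window = hms("22:53:32") - hms("06:20:00")

print(f"average TPF: {tpf / 60:.1f} min")
print(f"projected run time: {projected / 3600:.1f} h")
print(f"final-deadline window: {window / 3600:.1f} h")
```

On these lines the average TPF comes out at about 11.6 minutes (the early frames ran at roughly 11 minutes, later ones slower), so the projected run time of about 19.3 hours exceeds the roughly 16.6-hour window, which matches the termination at 85%.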

Re: Deadlines way too short on 8001, 8004, 8011

Posted: Wed Jan 18, 2012 1:16 am
by diwakar
I have changed the deadlines, timeout and k-factor after discussion with other Pande group members. It should be ok now.

Re: Deadlines way too short on 8001, 8004, 8011

Posted: Wed Jan 18, 2012 2:44 am
by Hyperlife
Thanks for the quick update.

Final deadlines of 3.54 and 4.62 days for WUs that can be folded on uniprocessor clients still seem a bit short, especially for those who don't fold 24/7.

Re: Deadlines way too short on 8001, 8004, 8011

Posted: Wed Jan 18, 2012 3:15 am
by diwakar
The first thing we wanted to do here was to have consistent settings for all the SMP projects. I had similar deadlines before we changed the settings, so it should work, but we can reconsider extending the deadlines if there are more reports of WUs being terminated due to short deadlines.

Re: Deadlines way too short on 8001, 8004, 8011

Posted: Wed Jan 18, 2012 3:45 am
by Hyperlife
Those are reasonable deadlines for SMP projects. However, with A4 WUs that can be folded with either SMP or uniprocessor slots, the final deadline calculation shouldn't ignore the fact that classic clients need more generous deadlines.

Re: Deadlines way too short on 8001, 8004, 8011

Posted: Wed Jan 18, 2012 4:10 am
by diwakar
I understand that it could be an issue. However, there is already a significant difference between the deadlines for multi-core-only and multi-core+uniprocessor WUs. For example, the deadline for 8001 WUs would be 0.68 days for multi-core only, while for the uniprocessor case it is 3.54 days. We can certainly look into extending the deadline, and it is something I will keep in mind when starting any new projects. Thanks for your quick feedback and for reporting issues with the 80xx WUs.

Re: Deadlines way too short on 8001, 8004, 8011

Posted: Wed Jan 18, 2012 5:19 am
by brityank
diwakar - Is there some reason that these are no longer getting Bonus Points? I've missed at least three over the past few days. I see there's another thread on P8004 with the same complaint. Thanks.

Re: Deadlines way too short on 8001, 8004, 8011

Posted: Wed Jan 18, 2012 5:29 am
by diwakar
brityank wrote:diwakar - Is there some reason that these are no longer getting Bonus Points? I've missed at least three over the past few days. I see there's another thread on P8004 with the same complaint. Thanks.
According to my logs, everyone is getting the bonus points for these WUs. We have readjusted the bonus factors to be in line with other SMP projects, so the bonus is now smaller than before.

Re: Deadlines way too short on 8001, 8004, 8011

Posted: Wed Jan 18, 2012 8:00 am
by 7im
These are small work units that fold very quickly, so a 3.5-day deadline is in line with other a4 work units. Only their size (duration) is different: they are shorter.

Re: Deadlines way too short on 8001, 8004, 8011

Posted: Wed Jan 18, 2012 8:38 am
by Jonazz
diwakar wrote:I understand that it could be an issue. However, there is already a significant difference between the deadlines for multi-core-only and multi-core+uniprocessor WUs. For example, the deadline for 8001 WUs would be 0.68 days for multi-core only, while for the uniprocessor case it is 3.54 days. We can certainly look into extending the deadline, and it is something I will keep in mind when starting any new projects. Thanks for your quick feedback and for reporting issues with the 80xx WUs.
You have to take into account that many unicore folders run older hardware and do not fold 24/7. Unless you specifically state that you don't need this older hardware, unicore WUs should keep their long deadlines.

Re: Deadlines way too short on 8001, 8004, 8011

Posted: Wed Jan 18, 2012 5:39 pm
by 7im
As I said above, they have kept a longer deadline.

A WU that takes 1 day to fold gets a 10-day deadline. A WU that only takes a few hours to fold only needs a 1- or 2-day deadline. This one has a 3.5-day deadline. Is that not long enough?

Re: Deadlines way too short on 8001, 8004, 8011

Posted: Wed Jan 18, 2012 5:52 pm
by Hyperlife
7im wrote:A WU that takes 1 day to fold gets a 10-day deadline. A WU that only takes a few hours to fold only needs a 1- or 2-day deadline. This one has a 3.5-day deadline. Is that not long enough?
If we use your 10x ratio, then no.

On my Pentium 4 2.8GHz, the TPF on p8001 is 11 minutes, which means the 100 frames take around 18.3 hours to finish. Ten times 18.3 hours is about 7.6 days, which is more than double the current 3.5-day final deadline.
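The arithmetic in this post can be checked with a few lines of Python. The 11-minute TPF and the 3.54-day final deadline come from this thread; the 100-frame count assumes one frame per 1% of progress, as in the log in the first post. The last figure, the minimum number of hours per day a part-time P4 would need to fold to finish before the final deadline, is an added illustration and ignores download and upload time.

```python
# Figures from this thread: 11-minute TPF on a Pentium 4 2.8GHz, p8001 split
# into 100 frames (1% each, as in the log above), 3.54-day final deadline.
TPF_MIN = 11
FRAMES = 100
FINAL_DEADLINE_DAYS = 3.54

runtime_h = TPF_MIN * FRAMES / 60      # hours of 24/7 folding for one WU
ten_x_days = 10 * runtime_h / 24       # 7im's illustrative 10x ratio, in days
# Lower bound on daily folding needed to beat the final deadline
# (illustrative; ignores transfer time and any slowdown from other load).
min_hours_per_day = runtime_h / FINAL_DEADLINE_DAYS

print(f"run time: {runtime_h:.1f} h")
print(f"10x run time: {ten_x_days:.1f} days")
print(f"ratio to final deadline: {ten_x_days / FINAL_DEADLINE_DAYS:.2f}")
print(f"minimum folding per day: {min_hours_per_day:.1f} h")
```

This gives about 18.3 hours of run time, about 7.6 days for the 10x figure (roughly 2.16 times the final deadline), and a floor of about 5.2 hours of folding per day.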

Re: Deadlines way too short on 8001, 8004, 8011

Posted: Wed Jan 18, 2012 6:13 pm
by bruce
In the old FAQ, FAH wrote:How do you set the deadlines for the work units?

Each work unit is benchmarked on a dedicated 2.8 GHz Pentium 4 machine with SSE2 disabled. For most work units (although there may be exceptions, described in the next paragraph), we apply this equation:

timeout = 20 * daysPerWU + 2
deadline = max(30 * daysPerWU + 2, 10)
timeout = 20 * (18 hours / 24) + 2 = 17 days
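The old FAQ rule quoted above is simple enough to express as a function, which makes it easy to plug in other benchmark times. This is a sketch of the quoted equations only; `old_faq_deadlines` is a hypothetical name, and reading "timeout" as the preferred deadline and "deadline" as the final deadline is an assumption based on how the terms are used elsewhere in this thread.

```python
def old_faq_deadlines(days_per_wu):
    """Apply the old FAQ rule quoted above.

    days_per_wu: benchmark run time on the reference 2.8 GHz Pentium 4
    (SSE2 disabled), in days. Returns (timeout, deadline) in days.
    """
    timeout = 20 * days_per_wu + 2
    deadline = max(30 * days_per_wu + 2, 10)
    return timeout, deadline


# About 18 hours on the reference P4, as reported for p8001 in this thread.
timeout, deadline = old_faq_deadlines(18 / 24)
print(f"timeout: {timeout:g} days, deadline: {deadline:g} days")
```

With 18 hours per WU this reproduces the 17-day timeout above and gives a 24.5-day deadline, since 30 * 0.75 + 2 = 24.5 exceeds the 10-day floor.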

That policy was established when a P4 was a pretty common machine. They're pretty rare now. If we assume there will always be part-time donors, but most of them have faster machines, it's not unreasonable for the Pande Group to establish a new policy that tightens up the deadlines somewhat. IMHO, 17 days is too long and 3.5 days is too short, but either way, it requires a policy change.

... and let's not use a guess like 10X unless we call it a proposed policy change.

Re: Deadlines way too short on 8001, 8004, 8011

Posted: Wed Jan 18, 2012 6:14 pm
by 7im
Hyperlife wrote:
7im wrote:A WU that takes 1 day to fold gets a 10-day deadline. A WU that only takes a few hours to fold only needs a 1- or 2-day deadline. This one has a 3.5-day deadline. Is that not long enough?
If we use your 10x ratio, then no.

On my Pentium 4 2.8GHz, the TPF on p8001 is 11 minutes, which means the 100 frames take around 18.3 hours to finish. Ten times 18.3 hours is about 7.6 days, which is more than double the current 3.5-day final deadline.
Use whatever ratio you want... it was simply a demonstrative example as to relative size (length). I did not quote the actual ratio. Maybe someone other than me should actually look that up... ;)

Edit: Looks like Bruce beat me to it. :lol:

Re: Deadlines way too short on 8001, 8004, 8011

Posted: Wed Jan 18, 2012 6:27 pm
by Hyperlife
And the deadline using the actual ratio, as Bruce demonstrated, is much more than 3.5 days. So again, to answer your question: no, that is not enough time.

Until a new policy is implemented, the current formula, with a P4 2.8GHz as the benchmark machine, should be used to calculate the deadlines of all uniprocessor WUs.