Unusually low PPD on project 16927?

Moderators: Site Moderators, FAHC Science Team

Hopfgeist
Posts: 70
Joined: Thu Jul 09, 2020 12:07 pm
Hardware configuration: Dell T420, 2x Xeon E5-2470 v2, NetBSD 10, SunFire X2270 M2, 2x Xeon X5675, NetBSD 9; various other Linux/NetBSD PCs, Macs and virtual servers.
Location: Germany

Unusually low PPD on project 16927?

Post by Hopfgeist »

Hi there,

normally I get some variation in PPD from different projects, and that is expected, because not all CPUs are identical, but on my main CPU folding system (dual Xeon X5675) I almost always get between 85,000 and 115,000 PPD.

However, Project 16927 consistently gives just around 45,000 PPD, which is quite a big outlier.

Are there other people experiencing this, or is the benchmark for this project known to be skewed?

I'm not too worried about it, just curious, because one other machine (single Xeon E5-1428L v2), which is roughly half as fast on all other benchmarks, is working on Project 17216 and is achieving 75,000 PPD compared to its normal 45,000 to 50,000.

Just curious,
HG.
Dell PowerEdge T420: 2x Xeon E5-2470 v2
Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: Unusually low PPD on project 16927?

Post by Neil-B »

Have you checked your logs? .. not sure if there is a thread limit on that project .. the log will show if the as/client is running a lower thread count wu .. this can happen if the project has a thread limit and there aren't any WUs that will fully use your kit.

There are some variable ppd projects around at the moment which are peaking high or low depending on the kit in play.
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

(Green/Bold = Active)
Hopfgeist
Posts: 70
Joined: Thu Jul 09, 2020 12:07 pm
Hardware configuration: Dell T420, 2x Xeon E5-2470 v2, NetBSD 10, SunFire X2270 M2, 2x Xeon X5675, NetBSD 9; various other Linux/NetBSD PCs, Macs and virtual servers.
Location: Germany

Re: Unusually low PPD on project 16927?

Post by Hopfgeist »

Neil-B wrote:Have you checked your logs? .. not sure if there is a thread limit on that project .. the log will show if the as/client is running a lower thread count wu .. this can happen if the project has a thread limit and there aren't any WUs that will fully use your kit.

There are some variable ppd projects around at the moment which are peaking high or low depending on the kit in play.
Thanks for the reply. Yes, I checked, and it was running on all 24 threads.

04:42:14:WU01:FS00:0xa7:Project: 16927 (Run 24, Clone 226, Gen 4)
04:42:14:WU01:FS00:0xa7:Unit: 0x00000000000000000000000000000000
04:42:14:WU01:FS00:0xa7:Reading tar file core.xml
04:42:14:WU01:FS00:0xa7:Reading tar file frame4.tpr
04:42:14:WU01:FS00:0xa7:Digital signatures verified
04:42:14:WU01:FS00:0xa7:Calling: mdrun -s frame4.tpr -o frame4.trr -cpt 15 -nt 24
04:42:15:WU01:FS00:0xa7:Steps: first=2000000 total=500000
04:42:17:WU01:FS00:0xa7:Completed 1 out of 500000 steps (0%)
04:44:23:WU01:FS00:0xa7:Completed 5000 out of 500000 steps (1%)
04:46:29:WU01:FS00:0xa7:Completed 10000 out of 500000 steps (2%)
So it was started with "-nt 24", and CPU monitoring tools confirm that it actually runs 24 threads.
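
For anyone who wants to compare numbers, the time per frame (TPF) can be read off the progress lines in the excerpt above. This is only a minimal Python sketch, assuming the log format shown here; the function name `frame_times` and the regular expression are illustrative and not part of FAHClient.

```python
import re
from datetime import datetime

# Progress lines copied from the log excerpt above.
LOG = """\
04:42:17:WU01:FS00:0xa7:Completed 1 out of 500000 steps (0%)
04:44:23:WU01:FS00:0xa7:Completed 5000 out of 500000 steps (1%)
04:46:29:WU01:FS00:0xa7:Completed 10000 out of 500000 steps (2%)
"""

# Assumed line layout: HH:MM:SS:WUxx:FSxx:0xNN:Completed N out of M steps (P%)
PROGRESS = re.compile(
    r"^(\d\d:\d\d:\d\d):WU\d+:FS\d+:0x[0-9a-f]+:"
    r"Completed \d+ out of \d+ steps \((\d+)%\)"
)


def frame_times(log_text):
    """Return seconds per 1% frame between consecutive progress lines."""
    stamps = []
    for line in log_text.splitlines():
        match = PROGRESS.match(line)
        if match:
            when = datetime.strptime(match.group(1), "%H:%M:%S")
            stamps.append((int(match.group(2)), when))

    result = []
    for (p0, t0), (p1, t1) in zip(stamps, stamps[1:]):
        if p1 == p0:
            continue  # no progress between these two lines
        seconds = (t1 - t0).total_seconds()
        if seconds < 0:
            seconds += 24 * 3600  # log timestamps wrap at midnight
        result.append(seconds / (p1 - p0))
    return result


tpf = frame_times(LOG)
print("seconds per frame:", tpf)
print("estimated hours per WU:", sum(tpf) / len(tpf) * 100 / 3600)
```

On the three lines above this works out to 126 seconds per 1% frame, so roughly 3.5 hours for the whole WU on 24 threads.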

I have never seen a WU assigned to me that used fewer than the advertised number of cores. I have, however, frequently seen a message that no WUs were available for my configuration, which is to be expected occasionally.

(As noted before in another post, I get almost identical PPD whether running on all 24 CPU threads, or just one folding thread per physical CPU core, and this WU was no exception when I stopped it and restarted with reduced thread count.)

The specific work unit was Project: 16927 (Run 24, Clone 226, Gen 4), which has since been successfully uploaded, but not yet credited.

As I said, not to worry too much, just making sure nothing is broken, or my machine isn't somehow acting weird after a small kernel upgrade.


Cheers,
HG.
Dell PowerEdge T420: 2x Xeon E5-2470 v2
Maddog
Posts: 15
Joined: Wed Sep 30, 2020 2:06 pm

Re: Unusually low PPD on project 16927?

Post by Maddog »

"Are there other people experiencing this, or is the benchmark for this project known to be skewed?"

Yes, had a couple of those; my 8-thread Intel CPU needs over 4 hours to complete these work units. PPD is under half my normal average of 60,000.

Just checked the one that finished and uploaded earlier this morning (14, 684, 4): not found.
DrBB1
Posts: 136
Joined: Wed Mar 26, 2008 12:30 am
Location: SE PA

Re: Unusually low PPD on project 16927?

Post by DrBB1 »

I just came to the forum to check on this very issue. I am running a 10-year-old PC, where I usually earn about 8000-9000 PPD on my WUs. This WU (project:16927 run:7 clone:956 gen:3) earned about 2500 PPD and took over a day to complete (total points: 2793).
========
DrBB1
Joe_H
Site Admin
Posts: 7938
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Unusually low PPD on project 16927?

Post by Joe_H »

This should be fixed now for WUs assigned today and onwards. A setting was changed a couple days ago as part of the project being moved to a new server and has been corrected.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Unusually low PPD on project 16927?

Post by bruce »

Hopfgeist wrote: (As noted before in another post, I get almost identical PPD whether running on all 24 CPU threads, or just one folding thread per physical CPU core, and this WU was no exception when I stopped it and restarted with reduced thread count.)
This is not surprising. Modern CPUs typically are designed so that two IPUs share the resources of one FPU. This reduces the cost of the chip and makes use of the fact that for "normal" computer use, the FPU is under-utilized. FAH isn't "normal" since it depends mostly on the throughput of the FPU.

A second factor: You may also be experiencing thermal limiting. If your typical clock rate is below the rated boost speed, you MIGHT get more throughput by upgrading your cooling subsystem. Processors often are designed to accommodate brief speed excursions above the average clock rate as long as the temperature rise is brief enough. Then they can advertise a speed that's above what can actually be achieved long-term.
DrBB1
Posts: 136
Joined: Wed Mar 26, 2008 12:30 am
Location: SE PA

Re: Unusually low PPD on project 16927?

Post by DrBB1 »

Joe_H wrote:This should be fixed now for WUs assigned today and onwards. A setting was changed a couple days ago as part of the project being moved to a new server and has been corrected.
Got another one 5 hours ago. Currently running at estimated 2805 PPD, about one-third of what I normally earn. Problem is not fixed. [Project: 16927 (Run 17, Clone 52, Gen 48)]

UPDATE: About halfway through this WU, the time per frame was cut in half; estimated PPD is now back to a reasonable (though still below average) 7000+ PPD. After 16 hours, it's still only 85% finished, but progressing.
========
DrBB1
psaam0001
Posts: 378
Joined: Mon May 18, 2020 2:02 am
Location: Ruckersville, Virginia, USA

Re: Unusually low PPD on project 16927?

Post by psaam0001 »

I also had a few work units cross my path, with an unusually short completion due by time (less than others that I have had before).

They are:

16927 (19, 311, 3)
16927 (29, 358, 4)
16927 (28, 444, 4)
16927 (20, 324, 3)
16927 (30, 333, 3)

My intent is to let them expire, as I did get a different WU from this project that was more consistent with the previous completion due by time frames.

Paul
Hopfgeist
Posts: 70
Joined: Thu Jul 09, 2020 12:07 pm
Hardware configuration: Dell T420, 2x Xeon E5-2470 v2, NetBSD 10, SunFire X2270 M2, 2x Xeon X5675, NetBSD 9; various other Linux/NetBSD PCs, Macs and virtual servers.
Location: Germany

Re: Unusually low PPD on project 16927?

Post by Hopfgeist »

psaam0001 wrote:I also had a few work units cross my path, with an unusually short completion due by time (less than others that I have had before).

They are:

16927 (19, 311, 3)
16927 (29, 358, 4)
16927 (28, 444, 4)
16927 (20, 324, 3)
16927 (30, 333, 3)

My intent is to let them expire, as I did get a different WU from this project that was more consistent with the previous completion due by time frames.

Paul
You mean these work units have a shorter due-by time than other work units from the same project? That would be highly unusual, and indicative of an error.

At least as currently listed on the summary page, project 16927 has a reasonable timeout (2 days) and an unusually long deadline (20 days). The latter is presumably because it is not a disease-related project, and thus not considered urgent or otherwise high-priority.

Otherwise, if your machine can handle it within that timeframe, I strongly recommend letting your machine work on them. Work units within one project/run/clone combination are sequential, and the next "gen" work unit depends on the previous one. Letting them expire will block progress on that chain until they expire.

Cheers,
HG.
Dell PowerEdge T420: 2x Xeon E5-2470 v2
Joe_H
Site Admin
Posts: 7938
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Unusually low PPD on project 16927?

Post by Joe_H »

For a short time the 16927 WUs were getting a timeout of 3 days and a final deadline of 5. That happened to be the settings for some other projects being relocated to the new servers. The change in final deadline resulted in lower PPD and final bonus credit. The project dates back a while to when final deadlines for lower priority projects were longer, and was benchmarked as such. To get comparable credit the project would need to be benchmarked again with a different deadline.
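
As a rough illustration of why a shorter final deadline lowers credit, here is a sketch that assumes the quick-return bonus has the form final = base * max(1, sqrt(k * deadline / elapsed)), as described in the Folding@home points FAQ. That formula is an assumption brought in for illustration, not something stated in this thread, and the base points, k factor and elapsed time below are made-up values, not the real settings for project 16927.

```python
import math


def final_points(base_points, k, deadline_days, elapsed_days):
    """Assumed quick-return bonus: base * max(1, sqrt(k * deadline / elapsed))."""
    bonus = math.sqrt(k * deadline_days / elapsed_days)
    return base_points * max(1.0, bonus)


# Hypothetical values, chosen only so that the bonus is active in both cases.
BASE = 1000.0   # base credit of the WU (made up)
K = 0.75        # bonus factor of the project (made up)
ELAPSED = 0.2   # days between assignment and return (made up)

long_deadline = final_points(BASE, K, 20, ELAPSED)   # original 20-day deadline
short_deadline = final_points(BASE, K, 5, ELAPSED)   # temporary 5-day deadline
ratio = short_deadline / long_deadline

print(f"20-day deadline: {long_deadline:.0f} points")
print(f" 5-day deadline: {short_deadline:.0f} points")
print(f"ratio: {ratio:.2f}")
```

If that assumption holds, going from a 20-day to a 5-day deadline scales credit by sqrt(5/20) = 0.5 whenever the bonus applies, regardless of base points or k. That is in line with the roughly halved PPD reported earlier in the thread.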

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
psaam0001
Posts: 378
Joined: Mon May 18, 2020 2:02 am
Location: Ruckersville, Virginia, USA

Re: Unusually low PPD on project 16927?

Post by psaam0001 »

The majority of the work units I had been getting had an expiration time of 20 days from when I received them to start folding. However, the ones I mentioned had a much shorter time to complete.

It's not that I was complaining about the points; I was trying to see if I had just caught the last batch of WUs that were sent out before the server change.

Paul
zotric
Posts: 13
Joined: Sat Nov 14, 2020 5:57 pm
Hardware configuration: HOMEBREW
Processor 12th Gen Intel(R) Core(TM) i9-12900K 3.19 GHz
Installed RAM 32.0 GB (31.8 GB usable)
RTX 3080 Ti

LEGION T7 34IMZ5
Processor Intel(R) Core(TM) i7-10700K CPU @ 3.80GHz 3.79 GHz
Installed RAM 16.0 GB (15.8 GB usable)
RTX 3070
Location: London, UK

Re: Unusually low PPD on project 16927?

Post by zotric »

Update following my original report (below)
I've stopped using the CPU now anyway so I'm leaving this here just in case it's useful.

Summary (following the original report in which the thread count seemed to be ignored for project 16927 (7, 1513, 27)):
1. When the PC was rebooted the number of cores was found to be running per the thread setting in FAHControl and performance was OK.
2. Errors continued to be reported in the log.
3. Project 16927 was eventually abandoned by the system (because of the error count?)

Detail:
Following my original report, below, I restarted the PC and the core usage went up correctly to the setting in FAHControl (-1).
A shock because all 28 threads started and the temperature hit 90 degrees!
I don't think it has behaved this way before - I thought it responded straight away when the number of threads was changed in FAHControl.
Turned down the thread count to 6 and restarted again - not wanting the CPU to melt.
Then I found that the WU for project 16927 (Unspecified, Temple University) had been abandoned and a new one started (17423 - Myosins, Washington University in St. Louis).
Logging had stopped.
WU for project 17423 completed, apparently successfully.

Logs show error 0x40010004, the thread count being set back to 3 followed by more errors - I think I set it to -1 but the system seems to have set it back to 3.
Then there is an error which is repeated several times.
This is mixed up with the CUDA Core x22 starting or restarting, which seems unrelated. I have not seen any errors with the GPU.

15:27:12:WARNING:WU00:FS00:FahCore crashed with Windows unhandled exception code 0x40010004, searching for this code online may provide more information
15:27:12:WARNING:WU00:FS00:FahCore returned: UNKNOWN_ENUM (1073807364 = 0x40010004)
15:27:12:WARNING:WU01:FS01:FahCore crashed with Windows unhandled exception code 0x40010004, searching for this code online may provide more information
15:27:12:WARNING:WU01:FS01:FahCore returned: UNKNOWN_ENUM (1073807364 = 0x40010004)
15:27:12:WU00:FS00:Starting
15:27:12:WARNING:WU00:FS00:AS lowered CPUs from 27 to 3
15:27:12:WU00:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\david\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/win/64bit-avx-256/a7-0.0.19/Core_a7.fah/FahCore_a7.exe -dir 00 -suffix 01 -version 706 -lifeline 4116 -checkpoint 6 -np 3
15:27:12:WU00:FS00:Started FahCore on PID 46188
15:27:12:WU00:FS00:FahCore 0xa7 started
15:27:13:WARNING:WU00:FS00:FahCore returned an unknown error code which probably indicates that it crashed
15:27:13:WARNING:WU00:FS00:FahCore returned: UNKNOWN_ENUM (-1073741205 = 0xc000026b)
15:27:13:WU01:FS01:Starting
15:27:13:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\david\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.13/Core_22.fah/FahCore_22.exe -dir 01 -suffix 01 -version 706 -lifeline 4116 -checkpoint 6 -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
15:27:13:WU01:FS01:Started FahCore on PID 49052
15:27:13:WU01:FS01:FahCore 0x22 started
15:27:13:WARNING:WU01:FS01:FahCore returned an unknown error code which probably indicates that it crashed
15:27:13:WARNING:WU01:FS01:FahCore returned: UNKNOWN_ENUM (-1073741205 = 0xc000026b)

Later the log says FahCore (presumably 0xa7) crashed several times with the same errors.
Logging seems to have stopped after that time, so there is no record of the WU for project 17423, which started later and completed before I removed the CPU entry from FAHControl.
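
In case it helps anyone going through a similar log, the two things to look for are the "AS lowered CPUs" warnings and the FahCore return codes. The log prints each code as a signed decimal next to its hex form; masking to 32 bits shows they are the same value. A minimal Python sketch, assuming the line formats shown above (the `scan` helper and both patterns are illustrative only, not part of FAHClient):

```python
import re

# Three lines copied from the log excerpt above.
LOG = """\
15:27:12:WARNING:WU00:FS00:FahCore returned: UNKNOWN_ENUM (1073807364 = 0x40010004)
15:27:12:WARNING:WU00:FS00:AS lowered CPUs from 27 to 3
15:27:13:WARNING:WU00:FS00:FahCore returned: UNKNOWN_ENUM (-1073741205 = 0xc000026b)
"""

LOWERED = re.compile(r"AS lowered CPUs from (\d+) to (\d+)")
RETURNED = re.compile(r"FahCore returned: \w+ \((-?\d+) = 0x[0-9a-fA-F]+\)")


def as_unsigned32(value):
    """Reinterpret a signed 32-bit integer as unsigned."""
    return value & 0xFFFFFFFF


def scan(log_text):
    """Collect (requested, granted) CPU counts and return codes as hex strings."""
    lowered, codes = [], []
    for line in log_text.splitlines():
        match = LOWERED.search(line)
        if match:
            lowered.append((int(match.group(1)), int(match.group(2))))
        match = RETURNED.search(line)
        if match:
            codes.append(f"0x{as_unsigned32(int(match.group(1))):08x}")
    return lowered, codes


lowered, codes = scan(LOG)
print("thread count lowered:", lowered)
print("return codes:", codes)
```

The hex values recomputed from the decimal numbers match the ones the client printed, so the negative number is just the signed view of 0xc000026b.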

Original:
Still seeing unusually low PPD for project 16927.
Summary for Work Unit(16927 (30, 1175, 33)): low core count used, low PPD per core, 20 days is a surprisingly long time given to complete.
1. Part of the cause of the low PPD is that this work unit is only using four cores, at 100%, out of 14 on a 10940X processor.
I know of no way to find out how fast this particular WU would run on the GPU for comparison.
2. The PPD per core is also low - less than half what I would expect per core for a COVID-19 unit running on the same 0xA7 FahCore.
3. The above unit is allowing 20 days to complete per the previous post. Does this seem high given that the actual ETA was about 4 or 5 hours?
Last edited by zotric on Sat Feb 13, 2021 9:13 pm, edited 1 time in total.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Unusually low PPD on project 16927?

Post by bruce »

Yes, WUs from p169xx are highly variable but that doesn't seem to be your real problem.

I'm not getting p16927 so I have to base my comments on the logs you have posted.

Where are you looking to see "the advertised number of cores"?

Are you adjusting the number of assigned threads manually? If so, when do you do so? Look back through your logs and determine how many cores were configured for the slot that initiated FAHClient's download of that project.

A slot that's configured for 4 CPUs will download WUs that cannot use more than 4 threads. For any slot that's going to download a new WU soon, increase the number allocated to some realistic number that might actually be available.
Joe_H
Site Admin
Posts: 7938
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Unusually low PPD on project 16927?

Post by Joe_H »

All I can comment on is that the PPD I have been getting on WUs from this project has been within the normal range for the systems I have. YMMV, but this applies to all projects, depending on hardware and other configuration differences.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3