Unless I'm mistaken, distributed.net
OGR is ALU-only. This particular math puzzle seems to have some interesting practical applications:
OGRs have many applications including sensor placements for X-ray crystallography and radio astronomy. Golomb rulers can also play a significant role in combinatorics, coding theory and communications, and Dr. Golomb was one of the first to analyze them for use in these areas.
X-ray crystallography is used in protein studies. Who knows, OGR might even benefit FAH indirectly. According to Wikipedia, one of the coordinators of distributed.net in its current form is a certain
Adam L. Beberg. Hmm, why does the name
Beberg sound vaguely familiar...
For starters, I was surprised to see that running 4 OGR crunchers instead of just 2 gave me over a 1.5x throughput boost. Then again, ALUs are much simpler than FPUs and HyperThreading duplicates
some resources, so I presume the ALU side of my 2C/4T chip gets closer to a true quad core than the FPU side does. Consider AMD Bulldozer, for example: its ALU side is a true octocore, but it only has 4 FPUs... Without further ado, let's have FAH and OGR duke it out on my 2C/4T CPU.
OGR only:
- 28 Mnodes/s == 36 ms/Mnode (2 crunchers, 50% CPU)
- 43 Mnodes/s == 23 ms/Mnode (4 crunchers, 100% CPU)
2x P6892 uniprocessor CPU WUs only:
- 27min TPF (50% uni)
2x P6892 + 2x OGR:
- 29min TPF + 20 Mnodes/s == 50 ms/Mnode (50% uni + 48% OGR, 100% total)
2x P6892, P5770 and P7630:
- 29min TPF (50% uni + 0.5% GPU2 + 3.5% GPU3)
2x P6892, P5770 and P7630 + 2x OGR:
- 34min TPF + 17 Mnodes/s == 59 ms/Mnode (50% uni + 0.5% GPU2 + 3.5% GPU3 + 43% OGR, 100% total)
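The ms/Mnode figures are just the reciprocal of the aggregate node rate; here's a quick Python sketch that sanity-checks the numbers above:

    # ms per Mnode is simply the reciprocal of the aggregate Mnodes/s.
    def ms_per_mnode(mnodes_per_s):
        return 1000.0 / mnodes_per_s

    print(ms_per_mnode(28))  # ~36 ms/Mnode (2 OGR crunchers)
    print(ms_per_mnode(43))  # ~23 ms/Mnode (4 OGR crunchers)
    print(ms_per_mnode(20))  # ~50 ms/Mnode (2x uni + 2x OGR)
    print(ms_per_mnode(17))  # ~59 ms/Mnode (everything at once)
    print(43 / 28)           # ~1.54x -- the "over 1.5x" boost from doubling the crunchers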
I use Process Lasso to tweak priorities and affinities; a rough scripted equivalent is sketched after this list. Here are some further details and observations:
- OGR is Low priority and running on cores 0 and 3 along with GPU cores, ensuring access to both physical ALUs
- Uniprocessor slots are Above Normal priority and running on cores 1 and 2, ensuring access to both physical FPUs as well as minimal preemption from normal processes
- GPU2 slot is High priority and running on core 0, preempting just about everything on it
- GPU3 slot is High priority and running on core 3, likewise preempting just about everything on it
- GPU folding performance remains unaffected in all cases, no surprises there
- GPU folding also requires some kernel time because it accesses the GPU hardware through the drivers
- CPU kernel time produced by GPU folding seems to stick to cores 0 and 3 according to Task Manager graphs
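For anyone who would rather script this than click through Process Lasso, here is a minimal sketch of the same rules using Python's psutil on Windows. The process-name substrings are placeholders of my own invention (your actual dnetc and FahCore executable names will differ), so treat it as an illustration rather than a drop-in config:

    import psutil

    # Placeholder name substrings -- replace with your actual dnetc / FahCore executables.
    rules = [
        ("dnetc",        psutil.IDLE_PRIORITY_CLASS,         [0, 3]),  # OGR crunchers: Low priority, cores 0 and 3
        ("fahcore_uni",  psutil.ABOVE_NORMAL_PRIORITY_CLASS, [1, 2]),  # uniprocessor slots: cores 1 and 2
        ("fahcore_gpu2", psutil.HIGH_PRIORITY_CLASS,         [0]),     # GPU2 slot: core 0
        ("fahcore_gpu3", psutil.HIGH_PRIORITY_CLASS,         [3]),     # GPU3 slot: core 3
    ]

    for proc in psutil.process_iter(["name"]):
        name = (proc.info["name"] or "").lower()
        for substring, priority, cores in rules:
            if substring in name:
                proc.nice(priority)       # set the Windows priority class
                proc.cpu_affinity(cores)  # pin to the listed logical cores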
HT is providing decent concurrency in the 2x P6892 + 2x OGR case. The CPU is fully utilized, and P6892 frame times increase by only about (29-27) / 27 * 100% ==
7.4%. Since FAH is my priority charity, it's nice to see the uniprocessor slots going strong while OGR takes the bigger hit in milliseconds per Mnode: (50-36) / 36 * 100% ==
39%. I don't quite understand why uniprocessor frame times increase from 29min to 34min when I run everything concurrently. Maybe Task Manager doesn't show every little detail after all, and with 2x uni + 2x GPU + 2x OGR there are bound to be frequent scheduling clashes on cores 0 and 3. OGR's 50 ==> 59 ms/Mnode increase is easily explained by the CPU overhead from GPU folding, though.
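For reference, all the percentages here and in the conclusion below are plain relative increases, computed the same way throughout:

    # Relative increase, as used for the TPF and ms/Mnode comparisons.
    def pct_increase(before, after):
        return (after - before) / before * 100

    print(pct_increase(27, 29))  # ~7.4%  uni TPF: 2x P6892 alone vs. + 2x OGR
    print(pct_increase(36, 50))  # ~39%   OGR ms/Mnode in the same comparison
    print(pct_increase(29, 34))  # ~17%   uni TPF once the GPU slots join in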
Conclusion: I'm going to stick with 2x uni + 2x GPU + 2x OGR. The roughly (34-29) / 29 * 100% ==
17% increase in uniprocessor TPF from adding OGR to the mix isn't that bad. Uniprocessor WUs have long deadlines anyway, and for some reason I never get any A4 WUs, so it's not like I'm losing any QRBs either.