INT 8 and 16 for precise calculations?

Post by **MeeLee** » Tue Sep 22, 2020 4:33 pm

Would it, in theory, be possible to use INT8 and INT16 shaders (like RT cores and others) to calculate FAH projects with equal precision, either by looping the data 2 to 4 times through that shader or by using multiple shaders to do the work of a full FP32 (CUDA) core?

And if so, could it be used to enable GPUs for certain workloads, or even to speed up GPU workloads?

Post by **Joe_H** » Tue Sep 22, 2020 5:16 pm

In theory, you can do all kinds of calculations on integer registers in place of floating point. In practice, it takes many more cycles to do what can be done on floating point registers, and it adds more complexity to the code and to debugging, so it is rarely used these days. It is something that might have been done 30+ years ago, when floating point often was not supported in hardware.

I see absolutely no benefit for F@h in that approach.
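To make the extra work concrete, here is a minimal Python sketch of multiplying two FP32 values using only integer operations on their bit patterns. It is simplified by assumption: positive, normal inputs only (no zeros, subnormals, infinities, NaNs, overflow or underflow), and the result is truncated instead of rounded to nearest-even, so it can differ from hardware by one unit in the last place. The helper names `f32_bits`, `bits_f32` and `soft_mul_f32` are hypothetical, made up for this illustration.

```python
import struct

def f32_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def bits_f32(b: int) -> float:
    """Interpret a 32-bit integer as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", b))[0]

def soft_mul_f32(a: float, b: float) -> float:
    """Multiply two positive, normal FP32 values with integer operations only.

    Sketch assumptions: no special values, no overflow/underflow, and the
    significand is truncated rather than rounded to nearest-even.
    """
    ba, bb = f32_bits(a), f32_bits(b)
    # Unpack the 8-bit biased exponents.
    ea, eb = (ba >> 23) & 0xFF, (bb >> 23) & 0xFF
    # Restore the implicit leading 1 to get 24-bit significands.
    ma, mb = (ba & 0x7FFFFF) | 0x800000, (bb & 0x7FFFFF) | 0x800000
    prod = ma * mb          # 47- or 48-bit integer product
    exp = ea + eb - 127     # add exponents and remove one bias
    if prod & (1 << 47):    # significand product in [2, 4): normalize by one extra bit
        mant = prod >> 24
        exp += 1
    else:                   # significand product in [1, 2)
        mant = prod >> 23
    # Repack with sign bit 0 (positive inputs only), dropping the implicit 1.
    return bits_f32((exp << 23) | (mant & 0x7FFFFF))

print(soft_mul_f32(1.5, 2.0))
```

Even this stripped-down version needs unpacking, an integer multiply, a normalization branch and repacking for what a GPU does as a single FP32 multiply instruction. Python's arbitrary-precision integers also hide the fact that the 48-bit significand product would itself have to be split up on 8- or 16-bit units.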

Post by **JimboPalmer** » Tue Sep 22, 2020 6:43 pm

FP32 has a 24-bit mantissa (23 stored bits plus an implicit leading 1), so both INT16 and INT8 are less precise than FP32. If you use 2 INT16 or 3 INT8 registers and a whole lot of slow code, you could achieve the same precision that is built into the hardware.

So, for a WU that runs 4 times slower, you could write that code. I find that I write more correct code when I take advantage of the computer's built-in abilities (which is why I wrote in PL/SQL for the last decade I was a programmer).

https://en.wikipedia.org/wiki/PL/SQL

You mention using INT code to fill in for missing FP capabilities, which is an issue on old, slow GPUs. Slowing old, slow GPUs down even further does not seem ideal.
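JimboPalmer's register count can be illustrated with a short sketch, assuming schoolbook (long) multiplication: take the 24-bit significand of an FP32 value (23 stored bits plus the implicit leading 1), split it into 8-bit or 16-bit limbs, and multiply using only narrow partial products. `significand24`, `to_limbs` and `limb_multiply` are hypothetical helpers written for this example, not part of any F@h code.

```python
import struct

def significand24(x: float) -> int:
    """Return the 24-bit significand of x rounded to FP32 (assumes a normal value)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return (bits & 0x7FFFFF) | 0x800000  # 23 stored bits + implicit leading 1

def to_limbs(value: int, limb_bits: int, count: int) -> list[int]:
    """Split a non-negative integer into `count` little-endian limbs of `limb_bits` bits."""
    mask = (1 << limb_bits) - 1
    return [(value >> (limb_bits * i)) & mask for i in range(count)]

def limb_multiply(a: int, b: int, limb_bits: int, count: int) -> tuple[int, int]:
    """Schoolbook multiply built only from limb_bits x limb_bits partial products.

    Returns (product, number_of_narrow_multiplies).
    """
    la, lb = to_limbs(a, limb_bits, count), to_limbs(b, limb_bits, count)
    result, muls = 0, 0
    for i, x in enumerate(la):
        for j, y in enumerate(lb):
            partial = x * y  # each partial product fits in 2 * limb_bits bits
            muls += 1
            # Shift the partial product into position and accumulate it.
            result += partial << (limb_bits * (i + j))
    return result, muls

ma, mb = significand24(3.14159), significand24(2.71828)
for limb_bits, count in ((8, 3), (16, 2)):  # 3 x INT8 or 2 x INT16 per operand
    product, muls = limb_multiply(ma, mb, limb_bits, count)
    assert product == ma * mb  # matches the full-width integer product
    print(f"{count} x {limb_bits}-bit limbs: {muls} narrow multiplies")
```

Three 8-bit limbs need 9 narrow multiplies and two 16-bit limbs need 4, and that is before counting the shifts, additions and carry handling that Python's big integers perform here for free.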

Post by **MeeLee** » Wed Sep 23, 2020 12:12 am

But despite the slowdown, look at a 3090: it can do up to ~36 TFLOPS of FP32, 142 TFLOPS of FP16, and/or 285 Tensor TFLOPS.
That's 36 TFLOPS at 32 bit,
potentially ~70 TFLOPS at 16 bit,
and/or the same ~70 TFLOPS at 8 bit.

I'm not sure whether those ray tracing cores can be used in addition to the regular cores.
If optimized, it seems like they could outdo the 32-bit cores!
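A back-of-envelope sketch of the trade-off raised here, under loudly stated assumptions: the 36 and 285 figures are taken from the post; the 285 "Tensor TFLOPS" are treated as if they were 285 trillion independent INT8 multiplies per second (tensor cores actually operate on small matrix tiles); and one emulated FP32 multiply is costed at 9 INT8 multiplies (a 24-bit significand split into three 8-bit pieces and multiplied schoolbook-style). Only multiplies are counted, so the result is an optimistic ceiling. `emulated_ceiling` is a hypothetical helper.

```python
def emulated_ceiling(narrow_peak: float, narrow_muls_per_fp32_mul: int) -> float:
    """Optimistic upper bound on emulated FP32 throughput.

    Counts only the narrow multiplies; real emulation code also needs adds,
    carry propagation, exponent handling and normalization, so it would be slower.
    """
    return narrow_peak / narrow_muls_per_fp32_mul

NATIVE_FP32_TFLOPS = 36.0   # figure quoted in the post
INT8_TENSOR_PEAK = 285.0    # quoted "Tensor tflops"; assumed to mean trillions of INT8 ops/s
INT8_MULS_PER_FP32_MUL = 9  # assumption: 3 x 3 schoolbook limbs for a 24-bit significand

ceiling = emulated_ceiling(INT8_TENSOR_PEAK, INT8_MULS_PER_FP32_MUL)
print(f"emulated FP32 ceiling: {ceiling:.1f} T/s vs native {NATIVE_FP32_TFLOPS:.1f} TFLOPS")
```

Even this optimistic ceiling (about 31.7) comes out below the native 36 TFLOPS before adds, carries, exponent handling and normalization are paid for. Vendor peak figures also commonly count a multiply-add as two operations, which would lower it further.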

Post by **PantherX** » Wed Sep 23, 2020 7:50 am

I personally think that rather than "going backwards", maybe we should think "forwards": instead of using those Tensor cores for FP32 processing, can we use them for something new and exciting, like AI, ML, or unique algorithms? We are already using FP32 for folding, so why not see how the "additional" hardware can be used to supplement F@H? I have no idea if F@H can even do those things, but dreams are free.

Post by **bruce** » Wed Sep 23, 2020 5:22 pm

MeeLee wrote:
> But despite the slowdown, look at a 3090: it can do up to ~36 TFLOPS of FP32, 142 TFLOPS of FP16, and/or 285 Tensor TFLOPS.
> That's 36 TFLOPS at 32 bit,
> potentially ~70 TFLOPS at 16 bit,
> and/or the same ~70 TFLOPS at 8 bit.
>
> I'm not sure whether those ray tracing cores can be used in addition to the regular cores.
> If optimized, it seems like they could outdo the 32-bit cores!

Yes, in theory it could work, but it would be EXPENSIVE in programming and debugging time, plus there is the issue of validating some entirely new code. It's easier to wait for the next generation of hardware, which will add units that enhance FP performance.

I think it's better to use the tensor cores for problems that will benefit from tensor mathematics. (I'm sure there are FAH scientists already considering such things.) In the meantime, you can temporarily add another GPU that can produce 36 TFLOPS at 32 bit plus ?? TFLOPS at 64 bit, and donate it to some needy FAH donor whenever you do your next upgrade.

Folding Forum
