Would it in theory be possible to use INT8 and INT16 shaders (like RT cores and others) to calculate FAH projects with equal precision, either by looping the data 2 to 4 times through that shader or by using multiple shaders to do the work of a full FP32 (CUDA) core?
And if so, could that either enable GPUs for certain workloads or even speed up GPU workloads?
INT 8 and 16 for precise calculations?
-
- Site Admin
- Posts: 7939
- Joined: Tue Apr 21, 2009 4:41 pm
- Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
- Location: W. MA
Re: INT 8 and 16 for precise calculations?
In theory you can do all kinds of calculations on integer registers in place of floating point. In practice it takes many more cycles than the same work on floating-point registers, and it adds more complexity to the code and the debugging, so it is rarely done these days. It is something that might have been done 30+ years ago, when floating point was often not supported in hardware.
I see absolutely no benefit for F@h in that approach.
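To make "calculations on integer registers in place of floating point" concrete, here is a minimal fixed-point sketch in Python. The Q16.16 layout and the helper names (`to_fixed`, `fixed_mul`, `to_float`) are assumptions chosen for illustration; nothing here comes from the F@h code base.

```python
# Q16.16 fixed point: a value is stored as an integer scaled by 2**16.
# Layout and names are illustrative assumptions, not F@h code.
FRAC_BITS = 16
ONE = 1 << FRAC_BITS


def to_fixed(x: float) -> int:
    """Convert a float to Q16.16, rounding to the nearest step."""
    return int(round(x * ONE))


def to_float(q: int) -> float:
    """Convert a Q16.16 integer back to a float."""
    return q / ONE


def fixed_mul(a: int, b: int) -> int:
    """Multiply two Q16.16 values using only integer operations.

    The raw product carries 32 fractional bits, so half a step is
    added before shifting back down to 16 fractional bits.
    """
    return (a * b + (ONE >> 1)) >> FRAC_BITS


product = fixed_mul(to_fixed(1.5), to_fixed(2.25))
print(to_float(product))   # 3.375
print(to_fixed(1e-6))      # 0: the value is below the smallest step, 2**-16
```

The second print shows one cost of this approach: a fixed scale cannot cover the wide range of magnitudes that floating point handles automatically, so the programmer has to manage scaling by hand.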
iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
-
- Posts: 2522
- Joined: Mon Feb 16, 2009 4:12 am
- Location: Greenwood MS USA
Re: INT 8 and 16 for precise calculations?
FP32 has a 24-bit mantissa (23 stored bits plus an implicit leading 1), so both INT16 and INT8 are less precise than FP32. If you use two INT16 or three INT8 registers, and a whole lot of slow code, you could achieve the same precision as is built into the CPU.
So for a 4 times slower WU, you could write that code. I find that I write more correct code when I take advantage of the computer's built-in abilities (which is why I wrote in PL/SQL for the last decade I was a programmer).
https://en.wikipedia.org/wiki/PL/SQL
You mention using INT code to make up for missing FP support, which is a limitation of old, slow GPUs. Slowing old, slow GPUs even further does not seem ideal.
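The "three INT8 registers" idea can be sketched as follows. This is a minimal illustration, assuming normal, finite FP32 inputs and results (no zeros, subnormals, infinities, or NaNs); the function names are hypothetical. It rebuilds one FP32 multiply from 8-bit by 8-bit integer products and shows how much extra work a single operation needs.

```python
import struct


def f32_bits(x: float) -> int:
    """Bit pattern of x after rounding it to IEEE 754 binary32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b: int) -> float:
    """Float value of a binary32 bit pattern."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def mul24_with_int8_limbs(a: int, b: int):
    """Multiply two 24-bit integers using only 8-bit x 8-bit products.

    Returns (product, number_of_8_bit_multiplies).
    """
    a_limbs = [(a >> (8 * i)) & 0xFF for i in range(3)]
    b_limbs = [(b >> (8 * i)) & 0xFF for i in range(3)]
    product, count = 0, 0
    for i, ai in enumerate(a_limbs):
        for j, bj in enumerate(b_limbs):
            product += (ai * bj) << (8 * (i + j))  # schoolbook partial product
            count += 1
    return product, count


def soft_fmul(x: float, y: float) -> float:
    """FP32 multiply from integer operations (normal values only)."""
    bx, by = f32_bits(x), f32_bits(y)
    sign = (bx ^ by) & 0x80000000
    exp = ((bx >> 23) & 0xFF) + ((by >> 23) & 0xFF) - 127
    mx = (bx & 0x7FFFFF) | 0x800000  # restore the implicit leading 1
    my = (by & 0x7FFFFF) | 0x800000
    prod, _ = mul24_with_int8_limbs(mx, my)  # 47 or 48 significant bits
    shift = 23
    if prod & (1 << 47):  # product is in [2, 4): renormalize
        shift, exp = 24, exp + 1
    mant = prod >> shift
    rem = prod & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if rem > half or (rem == half and (mant & 1)):  # round to nearest, ties to even
        mant += 1
        if mant == 1 << 24:  # rounding carried out of the mantissa
            mant, exp = mant >> 1, exp + 1
    return bits_f32(sign | (exp << 23) | (mant & 0x7FFFFF))


print(soft_fmul(1.5, 2.25))                          # 3.375
print(mul24_with_int8_limbs(0xFFFFFF, 0xFFFFFF)[1])  # 9
```

Nine 8-bit multiplies, plus the shifts, adds, normalization, and rounding, stand in for one hardware FP32 multiply, which is where the "whole lot of slow code" comes from.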
Last edited by JimboPalmer on Wed Sep 23, 2020 2:28 am, edited 1 time in total.
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
Re: INT 8 and 16 for precise calculations?
But despite the slowdown, looking at a 3090, it can calculate up to ~36 TFLOPS of FP32, 142 TFLOPS of FP16, and/or 285 Tensor TFLOPS.
That's 36 TFLOPS at 32 bit,
potentially ~70 TFLOPS at 16 bit,
and/or the same ~70 TFLOPS at 8 bit.
I'm not sure whether those ray tracing cores can be added to the regular cores.
If optimized, it seems like they could outdo the 32-bit cores!
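One way to read the "~70" figures above is as the raw throughput divided by the number of passes needed per FP32 result. The pass counts below (2 for FP16, 4 for the Tensor figure) are an assumption about how those figures were derived, and the function name is hypothetical. The sketch ignores all carry, normalization, and rounding overhead.

```python
def effective_fp32_tflops(raw_tflops: float, passes_per_fp32_op: int) -> float:
    """Idealized FP32-equivalent rate if each result costs several passes."""
    return raw_tflops / passes_per_fp32_op


fp32_native = 36.0                                        # figure quoted in the post
fp16_two_passes = effective_fp32_tflops(142.0, 2)         # assumed 2 passes
tensor_four_passes = effective_fp32_tflops(285.0, 4)      # assumed 4 passes
fp16_four_times_slower = effective_fp32_tflops(142.0, 4)  # 4x slowdown estimated earlier in the thread

print(fp16_two_passes)         # 71.0
print(tensor_four_passes)      # 71.25
print(fp16_four_times_slower)  # 35.5
```

Under the 4x slowdown estimated earlier in the thread, the FP16 path lands at 35.5, about level with the native 36, so any advantage depends heavily on how few passes the emulation really needs.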
-
- Site Moderator
- Posts: 6986
- Joined: Wed Dec 23, 2009 9:33 am
- Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB
Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
- Location: Land Of The Long White Cloud
Re: INT 8 and 16 for precise calculations?
I personally think that rather than "going backwards", maybe we should think "forwards": instead of using those Tensor cores for FP32 processing, can we use them for something new and exciting, like AI, ML, or unique algorithms? We are already using FP32 for folding, so why not see how the "additional" hardware could supplement F@H? I have no idea if F@H can even do those things, but dreams are free.
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time
Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
Re: INT 8 and 16 for precise calculations?
MeeLee wrote:
But despite the slowdown, looking at a 3090, it can calculate up to ~36 TFLOPS of FP32, 142 TFLOPS of FP16, and/or 285 Tensor TFLOPS.
That's 36 TFLOPS at 32 bit,
potentially ~70 TFLOPS at 16 bit,
and/or the same ~70 TFLOPS at 8 bit.
I'm not sure whether those ray tracing cores can be added to the regular cores.
If optimized, it seems like they could outdo the 32-bit cores!
Yes, in theory, it could work, but it would be EXPENSIVE in programming time and debugging time, plus there are the issues of validating some entirely new code. It's easier to wait for the next generation of hardware that adds units to enhance FP performance.
I think it's better to use the tensor cores for problems that will benefit from tensor mathematics. (I'm sure that there are FAH scientists already considering such things.) In the meantime, you can temporarily add another GPU that can produce 36 TFLOPS at 32 bit plus ?? TFLOPS at 64 bit, and donate it to some needy FAH donor whenever you do your next upgrade.
Posting FAH's log:
How to provide enough info to get helpful support.