The likely reason is that AMD changed the ratio of FP32 to FP64 throughput per WGP/CU with its recent RDNA3 GPUs. FP64 throughput was halved to 1:32 of FP32 (or 1:64 when counting dual-issue), while FP32 throughput per clock cycle has potentially been doubled from 64 FP32/CU to 128 FP32/CU. With dual-issue counted, that is basically the same FP64:FP32 ratio as Ampere/Lovelace. RDNA3 should perform significantly better with OpenMM in single or mixed precision mode.
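
To make the arithmetic behind those ratios explicit, here is a rough sketch in Python. It assumes the previous generation ran FP64 at 1:16 of FP32 with 64 FP32 lanes per CU, which is only inferred from "halved to 1:32" above. The function and variable names are made up for illustration.

```python
from fractions import Fraction

def fp64_per_cu_per_clock(fp32_per_cu, fp64_to_fp32):
    """FP64 operations per CU per clock, given FP32 lanes per CU and the FP64:FP32 ratio."""
    return fp32_per_cu * fp64_to_fp32

# Assumption: the pre-RDNA3 ratio was 1:16 (inferred from "halved to 1:32").
previous_fp64 = fp64_per_cu_per_clock(64, Fraction(1, 16))

# RDNA3 without dual-issue: still 64 FP32/CU, but FP64 at 1:32.
rdna3_fp64 = fp64_per_cu_per_clock(64, Fraction(1, 32))

# With dual-issue, FP32 may reach 128/CU while FP64 stays the same,
# so the effective FP64:FP32 ratio drops further.
rdna3_dual_issue_ratio = rdna3_fp64 / 128

print("previous FP64/CU/clock:", previous_fp64)
print("RDNA3 FP64/CU/clock:", rdna3_fp64)
print("RDNA3 FP64:FP32 with dual-issue:", rdna3_dual_issue_ratio)
```

Under these assumptions the FP64 rate per CU halves while the best-case FP32 rate doubles, which is why double precision looks comparatively weak and single or mixed precision should look much better.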
