
[SYCL][NVPTX] Enable approximate div/sqrt with -ffast-math #15553

Merged (1 commit) · Oct 10, 2024

Conversation

@frasercrmck commented Sep 30, 2024

The generation of approximate div/sqrt in the NVPTX backend is driven by the "unsafe-fp-math" function attribute. Presumably, when the optimization was first added, there was no way of getting at this information from ISel, or perhaps there was no suitable instruction-level representation to begin with.

Even today, the afn fast-math flag is appropriate for relaxing sqrt to an approximate version; some targets apply the same reasoning to fdiv, but it's not clear that is a valid reading of the language reference manual.

The problem with using the function attribute is that, when inlining, it must be set on both the caller and the callee; otherwise it is wiped from the caller.

Since CUDA's devicelib bytecode library has hundreds of functions with unsafe-fp-math explicitly disabled, inlining those functions into SYCL kernels prevents the backend from generating approximate operations, not just inside the devicelib function but across the entire kernel.
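To make the interaction concrete, here is a minimal LLVM IR sketch of the situation (the function names and bodies are invented for illustration; the real devicelib functions differ):

```llvm
; Kernel translation unit built with -ffast-math: the attribute is set.
define float @kernel(float %x, float %y) #0 {
entry:
  %r = call float @devicelib_helper(float %x, float %y)
  ret float %r
}

; Hypothetical devicelib function: "unsafe-fp-math" explicitly disabled.
define float @devicelib_helper(float %a, float %b) #1 {
entry:
  %d = fdiv float %a, %b
  ret float %d
}

attributes #0 = { "unsafe-fp-math"="true" }
attributes #1 = { "unsafe-fp-math"="false" }
```

When @devicelib_helper is inlined into @kernel, the inliner reconciles the mismatched string attributes by ANDing them, so @kernel no longer carries "unsafe-fp-math"="true" and the backend stops emitting approximate div/sqrt for every operation in the kernel.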

This might explain why some performance reports we've received suggest that inlining certain maths functions can make things worse even when the CUDA compiler does the same thing (e.g., #14358, though this needs to be verified).

Presumably for this reason, the NVPTX backend has two codegen options that override the function attribute and always generate approximate div/sqrt instructions. This patch therefore explicitly sets these options when compiling SYCL for NVPTX GPUs. It does not do so for regular C/C++ or CUDA code, to limit the wider impact on existing code.
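For reference, the two backend options in question are presumably the NVPTX cl::opt knobs nvptx-prec-divf32 and nvptx-prec-sqrtf32 (my reading of the backend; they are not named in this description). A rough sketch of forcing the same behaviour by hand when lowering IR with llc:

```sh
# Sketch only: force approximate f32 division and square root in the NVPTX
# backend regardless of the "unsafe-fp-math" function attribute.
# Option names/values as found in the upstream backend; double-check against
# the patch itself.
#   nvptx-prec-divf32:  0 = div.approx, 1 = div.full, 2 = IEEE-compliant div (default)
#   nvptx-prec-sqrtf32: 0 = sqrt.approx, 1 = sqrt.rn (default)
llc -mtriple=nvptx64-nvidia-cuda -mcpu=sm_60 \
    -nvptx-prec-divf32=0 -nvptx-prec-sqrtf32=0 \
    kernel.ll -o kernel.ptx
```

The patch wires the equivalent settings into the SYCL-for-NVPTX device compilation automatically, rather than requiring users to pass backend options by hand.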

@frasercrmck commented Sep 30, 2024

I note that the AMDGPU backend treats the fast-math afn flag as allowing approximate calculations on fdiv. The LangRef says:

> Allow substitution of approximate calculations for functions (sin, log, sqrt, etc).
> See floating-point intrinsic definitions for places where this can apply to LLVM's intrinsic math functions.

so my reading was that it's not intended for fdiv, as fdiv is not "a function". It's not entirely clear either way, so I can see AMD's reading too.
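For concreteness, a small IR sketch of the two placements being discussed (nothing here is from the patch itself):

```llvm
; afn on the sqrt intrinsic call: clearly "a function" in the LangRef's
; wording, so substituting an approximate sqrt seems uncontroversial.
define float @sqrt_afn(float %x) {
entry:
  %r = call afn float @llvm.sqrt.f32(float %x)
  ret float %r
}

; afn on a plain fdiv: AMDGPU reads this as permission for an approximate
; divide, but fdiv is an instruction rather than a function, which is the
; ambiguity noted above.
define float @div_afn(float %a, float %b) {
entry:
  %r = fdiv afn float %a, %b
  ret float %r
}

declare float @llvm.sqrt.f32(float)
```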

@frasercrmck

Ping @intel/dpcpp-clang-driver-reviewers

@mdtoguchi left a comment

The driver changes look OK to me. Beyond just checking that the proper options are being passed to the device compilation, are there any codegen tests to verify the behavior?

@frasercrmck

> The driver changes look OK to me. Beyond just checking that the proper options are being passed to the device compilation, are there any codegen tests to verify the behavior?

Thanks for the review!

Just to check - do you mean IR codegen or NVPTX codegen? There wouldn't be anything meaningful to check in LLVM IR because it doesn't affect the IR at all. The NVPTX codegen is tested elsewhere like here (div) and here (sqrt).

So in short I think it's already covered?

@mdtoguchi

> Just to check - do you mean IR codegen or NVPTX codegen?

Yes, NVPTX codegen. Thanks!

@frasercrmck

@intel/llvm-gatekeepers This is ready to merge, cheers

@sommerlukas merged commit 65898a3 into intel:sycl on Oct 10, 2024
13 checks passed
@frasercrmck deleted the sycl-nvptx-fast-math branch on October 10, 2024 at 13:45