[SYCL][NVPTX] Enable approximate div/sqrt with -ffast-math (#15553)
The generation of approximate div/sqrt in the NVPTX backend is driven by
the "unsafe-fp-math" function attribute. Presumably, when the
optimization was first added, there was no way of accessing this
information from ISel, or there was no suitable instruction-level
representation to begin with.
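
To illustrate the mechanism, the sketch below models how a backend of this kind might pick an f32 division strategy: an explicitly passed codegen option takes precedence, and otherwise the per-function attribute decides. The level numbering (0 = div.approx, 1 = div.full, 2 = IEEE-compliant div.rn) follows the description of the NVPTX `--nvptx-prec-divf32` option; `FunctionInfo` and `divF32Level` are hypothetical names, and this is a simplified model, not LLVM's actual implementation.

```cpp
#include <cassert>
#include <optional>

// Hypothetical, simplified model (NOT LLVM's actual code) of how an
// NVPTX-style backend could choose an f32 division strategy.
// Levels follow the meaning of --nvptx-prec-divf32:
//   0 = div.approx, 1 = div.full, 2 = IEEE-compliant div.rn
struct FunctionInfo {
  // Models the "unsafe-fp-math"="true" function attribute.
  bool unsafeFPMath;
};

// `explicitLevel` models a `-mllvm --nvptx-prec-divf32=N` passed on the
// command line; std::nullopt means the option was not given.
int divF32Level(const FunctionInfo &fn, std::optional<int> explicitLevel) {
  if (explicitLevel) {
    assert(*explicitLevel >= 0 && *explicitLevel <= 2 && "invalid level");
    return *explicitLevel; // an explicit codegen option wins outright
  }
  // Otherwise, fall back to the per-function attribute.
  return fn.unsafeFPMath ? 0 : 2;
}
```

For example, `divF32Level(FunctionInfo{false}, 0)` returns 0: in this model a function without the attribute still gets `div.approx` once the option is passed explicitly, which is the override this patch relies on.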

Even today, the `afn` fast-math flag is appropriate for relaxing sqrt to
an approximate version, but while some targets apply the same reasoning
to fdiv, it is not clear that this is a valid reading of the LLVM
Language Reference Manual.
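
To make "approximate" concrete: the PTX ISA describes `div.approx.f32` as computing `a * (1/b)`. The following sketch assumes a correctly rounded reciprocal (hardware uses an approximate one, which may introduce further error) and shows that even then the reciprocal-multiply form can land one ulp away from the correctly rounded quotient. It also assumes strict single-precision evaluation (e.g. SSE on x86-64, no `-ffast-math`). The helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a float's bits so results can be compared exactly.
std::uint32_t bitsOf(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

// Correctly rounded IEEE-754 single-precision division.
float preciseDiv(float a, float b) { return a / b; }

// Reciprocal-then-multiply, the form the PTX ISA describes for
// div.approx.f32. Assumption: the reciprocal here is correctly rounded;
// a hardware approximate reciprocal may introduce further error.
float reciprocalDiv(float a, float b) { return a * (1.0f / b); }

// Number of representable floats between two positive, finite values.
// For such values the IEEE-754 bit patterns are ordered like the values.
std::uint32_t ulpsApart(float x, float y) {
  assert(x > 0.0f && y > 0.0f && std::isfinite(x) && std::isfinite(y));
  std::uint32_t bx = bitsOf(x), by = bitsOf(y);
  return bx > by ? bx - by : by - bx;
}
```

Under those assumptions, `preciseDiv(10.0f, 3.0f)` has bit pattern `0x40555555` while `reciprocalDiv(10.0f, 3.0f)` gives `0x40555556`, one ulp apart.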

The problem with using the function attribute is that, when inlining, it
must be set on *both* the caller and the callee; otherwise it is cleared
on the caller.

Since CUDA's devicelib bytecode library has hundreds of functions with
unsafe-fp-math explicitly disabled, inlining any of those functions into
a SYCL kernel disables the backend's ability to generate approximate
div/sqrt, not just inside the devicelib function but across the entire
kernel.
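
The inlining interaction described in the two paragraphs above can be modeled in a few lines. This sketch assumes an AND-style merge (the caller keeps `"unsafe-fp-math"="true"` only if the callee also has it), matching the behaviour described here; the `Attrs` map, `mergeOnInline` and `inlineAll` are hypothetical stand-ins, not LLVM APIs.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified model of a function's string attributes
// (e.g. "unsafe-fp-math" -> "true"). Not LLVM's actual data structures.
using Attrs = std::map<std::string, std::string>;

bool isTrue(const Attrs &attrs, const std::string &key) {
  auto it = attrs.find(key);
  return it != attrs.end() && it->second == "true";
}

// AND-style merge: after inlining, the caller keeps
// "unsafe-fp-math"="true" only if the callee had it too.
void mergeOnInline(Attrs &caller, const Attrs &callee) {
  const std::string key = "unsafe-fp-math";
  if (isTrue(caller, key) && !isTrue(callee, key))
    caller[key] = "false";
  // Invariant: the caller can only stay "true" if the callee was "true".
  assert(!isTrue(caller, key) || isTrue(callee, key));
}

// Inline each callee into the kernel in turn; a single callee without
// the attribute is enough to clear it for the whole kernel.
Attrs inlineAll(Attrs kernel, const std::vector<Attrs> &callees) {
  for (const Attrs &callee : callees)
    mergeOnInline(kernel, callee);
  return kernel;
}
```

With a kernel marked `"true"`, inlining a helper that is also `"true"` keeps the attribute, but adding one devicelib-style callee marked `"false"` clears it for the entire kernel, which is the situation the NVPTX codegen options sidestep.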

This might explain why some performance reports we've received suggest
that inlining certain maths functions can make things worse even when
the CUDA compiler does the same thing (e.g., #14358, though this needs
to be verified).

For this reason, presumably, the NVPTX backend has two codegen options
that override the function attribute and always generate approximate
div/sqrt instructions. This patch thus explicitly sets these options
when compiling SYCL for NVPTX GPUs with -ffast-math or
-funsafe-math-optimizations. It does not do so for regular C/C++ or CUDA
code, to limit the wider impact on existing code.
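
The driver-side decision can be sketched as a standalone function. It models the last-occurrence-wins semantics of the driver's `hasFlag(Pos, Neg, Default)` helper over a plain list of argument strings (an assumption for illustration; the real helper works on parsed option IDs), and emits the same `-mllvm` options as the patch when either `-ffast-math` or `-funsafe-math-optimizations` is in effect. `syclNvptxFastMathArgs` is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for the driver's ArgList::hasFlag(Pos, Neg, Default):
// the last occurrence of either spelling wins; otherwise use the default.
bool hasFlag(const std::vector<std::string> &args, const std::string &pos,
             const std::string &neg, bool dflt) {
  for (auto it = args.rbegin(); it != args.rend(); ++it) {
    if (*it == pos)
      return true;
    if (*it == neg)
      return false;
  }
  return dflt;
}

// Hypothetical model of the logic added for SYCL device compilation:
// either -ffast-math or -funsafe-math-optimizations forces approximate
// f32 div/sqrt in the NVPTX backend.
std::vector<std::string>
syclNvptxFastMathArgs(const std::vector<std::string> &args) {
  bool fastRelaxedMath = hasFlag(args, "-ffast-math", "-fno-fast-math", false);
  bool unsafeMathOpt = hasFlag(args, "-funsafe-math-optimizations",
                               "-fno-unsafe-math-optimizations", false);
  std::vector<std::string> cc1Args;
  if (fastRelaxedMath || unsafeMathOpt)
    cc1Args = {"-mllvm", "--nvptx-prec-divf32=0", "-mllvm",
               "--nvptx-prec-sqrtf32=0"};
  // Each backend option travels as an "-mllvm" / value pair.
  assert(cc1Args.size() % 2 == 0);
  return cc1Args;
}
```

For instance, `-ffast-math -fno-fast-math` yields no extra arguments because the negative spelling comes last, whereas `-funsafe-math-optimizations` alone yields `-mllvm --nvptx-prec-divf32=0 -mllvm --nvptx-prec-sqrtf32=0`.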
frasercrmck authored Oct 10, 2024
1 parent 7f59dea commit 65898a3
Showing 2 changed files with 27 additions and 0 deletions.
9 changes: 9 additions & 0 deletions clang/lib/Driver/ToolChains/Cuda.cpp
@@ -946,6 +946,15 @@ void CudaToolChain::addClangTargetOptions(

    if (DriverArgs.hasArg(options::OPT_fsycl_fp32_prec_sqrt))
      CC1Args.push_back("-fcuda-prec-sqrt");

    bool FastRelaxedMath = DriverArgs.hasFlag(
        options::OPT_ffast_math, options::OPT_fno_fast_math, false);
    bool UnsafeMathOpt =
        DriverArgs.hasFlag(options::OPT_funsafe_math_optimizations,
                           options::OPT_fno_unsafe_math_optimizations, false);
    if (FastRelaxedMath || UnsafeMathOpt)
      CC1Args.append({"-mllvm", "--nvptx-prec-divf32=0", "-mllvm",
                      "--nvptx-prec-sqrtf32=0"});
  } else {
    CC1Args.append(
        {"-fcuda-is-device", "-mllvm", "-enable-memcpyopt-without-libcalls"});
18 changes: 18 additions & 0 deletions clang/test/Driver/sycl-nvptx-fast-math.cpp
@@ -0,0 +1,18 @@
// REQUIRES: nvptx-registered-target

// RUN: %clang -### -nocudalib \
// RUN: -fsycl -fsycl-targets=nvptx64-nvidia-cuda %s 2>&1 \
// RUN: | FileCheck --check-prefix=CHECK-DEFAULT %s

// RUN: %clang -### -nocudalib \
// RUN: -fsycl -fsycl-targets=nvptx64-nvidia-cuda -ffast-math %s 2>&1 \
// RUN: | FileCheck --check-prefix=CHECK-FAST %s

// RUN: %clang -### -nocudalib \
// RUN: -fsycl -fsycl-targets=nvptx64-nvidia-cuda -funsafe-math-optimizations %s 2>&1 \
// RUN: | FileCheck --check-prefix=CHECK-FAST %s

// CHECK-FAST: "-mllvm" "--nvptx-prec-divf32=0" "-mllvm" "--nvptx-prec-sqrtf32=0"

// CHECK-DEFAULT-NOT: "--nvptx-prec-divf32=0"
// CHECK-DEFAULT-NOT: "--nvptx-prec-sqrtf32=0"
