[FMV][AArch64] Don't optimize backward compatible features in resolver. #90928

ilinpv · 2024-05-03T01:48:56Z

Some AArch64 features, such as Branch Target Identification (BTI) or the Memory Tagging Extension (MTE), are backward compatible with targets that lack support for them. As a result, a binary compiled with MTE enabled, for example, may be executed on both MTE and non-MTE hardware, and we still need to detect at runtime whether MTE is available in order to choose the appropriate function version.
So we cannot optimize the function multiversioning resolver by removing the checks for these features when they are enabled for the target at compile time.
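
To make the scenario concrete, here is a minimal standalone sketch of the pruning rule, written outside of clang. The names `runtimeChecks`, `selectsVersion`, `alwaysChecked` and `FeatureSet` are hypothetical and exist only for illustration; the feature list is the one hard-coded in this patch.

```cpp
#include <set>
#include <string>
#include <vector>

using FeatureSet = std::set<std::string>;

// Features this patch always checks at run time (list taken from the diff).
bool alwaysChecked(const std::string &feature) {
  return feature == "bti" || feature == "memtag" || feature == "memtag2" ||
         feature == "memtag3" || feature == "dgh";
}

// Returns the features the resolver still has to test at run time for one
// function version. With keepBackwardCompatible == false this models the old
// behaviour: every feature already enabled for the target is dropped.
std::vector<std::string> runtimeChecks(const std::vector<std::string> &required,
                                       const FeatureSet &enabledForTarget,
                                       bool keepBackwardCompatible) {
  std::vector<std::string> checks;
  for (const std::string &feature : required) {
    const bool enabled = enabledForTarget.count(feature) != 0;
    if (!enabled || (keepBackwardCompatible && alwaysChecked(feature)))
      checks.push_back(feature);
  }
  return checks;
}

// A version is selected when every remaining check passes on the hardware.
// An empty check list means the version is selected unconditionally.
bool selectsVersion(const std::vector<std::string> &checks,
                    const FeatureSet &hardware) {
  for (const std::string &feature : checks)
    if (hardware.count(feature) == 0)
      return false;
  return true;
}
```

With `memtag2` enabled for the target and hardware that lacks it, the old rule leaves no checks, so the `memtag2` version is picked unconditionally; with the new rule the check stays and the resolver falls through to the next version.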


llvmbot · 2024-05-03T01:49:29Z

@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: Pavel Iliin (ilinpv)

Changes

Some AArch64 features, such as Branch Target Identification (BTI) or the Memory Tagging Extension (MTE), are backward compatible with targets that lack support for them. As a result, a binary compiled with MTE enabled, for example, may be executed on both MTE and non-MTE hardware, and we still need to detect at runtime whether MTE is available in order to choose the appropriate function version.
So we cannot optimize the function multiversioning resolver by removing the checks for these features when they are enabled for the target at compile time.

Full diff: https://github.com/llvm/llvm-project/pull/90928.diff

2 Files Affected:

(modified) clang/lib/CodeGen/CodeGenFunction.cpp (+8-2)
(modified) clang/test/CodeGen/attr-target-clones-aarch64.c (+27-2)

diff --git a/clang/lib/CodeGen/CodeGenFunction.cpp b/clang/lib/CodeGen/CodeGenFunction.cpp
index 87766a758311d5..39943ed2a415e8 100644
--- a/clang/lib/CodeGen/CodeGenFunction.cpp
+++ b/clang/lib/CodeGen/CodeGenFunction.cpp
@@ -2759,8 +2759,14 @@ llvm::Value *CodeGenFunction::FormAArch64ResolverCondition(
     const MultiVersionResolverOption &RO) {
   llvm::SmallVector<StringRef, 8> CondFeatures;
   for (const StringRef &Feature : RO.Conditions.Features) {
-    // Form condition for features which are not yet enabled in target
-    if (!getContext().getTargetInfo().hasFeature(Feature))
+    // Optimize the Function Multi Versioning resolver by creating conditions
+    // only for features that are not enabled in the target. The exception is
+    // for features whose extension instructions are executed as NOP on targets
+    // without extension support.
+    if (!getContext().getTargetInfo().hasFeature(Feature) ||
+        Feature.equals("bti") || Feature.equals("memtag") ||
+        Feature.equals("memtag2") || Feature.equals("memtag3") ||
+        Feature.equals("dgh"))
       CondFeatures.push_back(Feature);
   }
   if (!CondFeatures.empty()) {
diff --git a/clang/test/CodeGen/attr-target-clones-aarch64.c b/clang/test/CodeGen/attr-target-clones-aarch64.c
index f75d8a69ebf02f..603d067864b45a 100644
--- a/clang/test/CodeGen/attr-target-clones-aarch64.c
+++ b/clang/test/CodeGen/attr-target-clones-aarch64.c
@@ -526,8 +526,8 @@ inline int __attribute__((target_clones("fp16", "sve2-bitperm+fcma", "default"))
 // CHECK-MTE-BTI-NEXT:  resolver_entry:
 // CHECK-MTE-BTI-NEXT:    call void @__init_cpu_features_resolver()
 // CHECK-MTE-BTI-NEXT:    [[TMP0:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
-// CHECK-MTE-BTI-NEXT:    [[TMP1:%.*]] = and i64 [[TMP0]], 4096
-// CHECK-MTE-BTI-NEXT:    [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 4096
+// CHECK-MTE-BTI-NEXT:    [[TMP1:%.*]] = and i64 [[TMP0]], 17592186048512
+// CHECK-MTE-BTI-NEXT:    [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 17592186048512
 // CHECK-MTE-BTI-NEXT:    [[TMP3:%.*]] = and i1 true, [[TMP2]]
 // CHECK-MTE-BTI-NEXT:    br i1 [[TMP3]], label [[RESOLVER_RETURN:%.*]], label [[RESOLVER_ELSE:%.*]]
 // CHECK-MTE-BTI:       resolver_return:
@@ -604,7 +604,24 @@ inline int __attribute__((target_clones("fp16", "sve2-bitperm+fcma", "default"))
 //
 // CHECK-MTE-BTI-LABEL: @ftc_dup3.resolver(
 // CHECK-MTE-BTI-NEXT:  resolver_entry:
+// CHECK-MTE-BTI-NEXT:    call void @__init_cpu_features_resolver()
+// CHECK-MTE-BTI-NEXT:    [[TMP0:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-MTE-BTI-NEXT:    [[TMP1:%.*]] = and i64 [[TMP0]], 1125899906842624
+// CHECK-MTE-BTI-NEXT:    [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 1125899906842624
+// CHECK-MTE-BTI-NEXT:    [[TMP3:%.*]] = and i1 true, [[TMP2]]
+// CHECK-MTE-BTI-NEXT:    br i1 [[TMP3]], label [[RESOLVER_RETURN:%.*]], label [[RESOLVER_ELSE:%.*]]
+// CHECK-MTE-BTI:       resolver_return:
 // CHECK-MTE-BTI-NEXT:    ret ptr @ftc_dup3._Mbti
+// CHECK-MTE-BTI:       resolver_else:
+// CHECK-MTE-BTI-NEXT:    [[TMP4:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-MTE-BTI-NEXT:    [[TMP5:%.*]] = and i64 [[TMP4]], 17592186044416
+// CHECK-MTE-BTI-NEXT:    [[TMP6:%.*]] = icmp eq i64 [[TMP5]], 17592186044416
+// CHECK-MTE-BTI-NEXT:    [[TMP7:%.*]] = and i1 true, [[TMP6]]
+// CHECK-MTE-BTI-NEXT:    br i1 [[TMP7]], label [[RESOLVER_RETURN1:%.*]], label [[RESOLVER_ELSE2:%.*]]
+// CHECK-MTE-BTI:       resolver_return1:
+// CHECK-MTE-BTI-NEXT:    ret ptr @ftc_dup3._Mmemtag2
+// CHECK-MTE-BTI:       resolver_else2:
+// CHECK-MTE-BTI-NEXT:    ret ptr @ftc_dup3.default
 //
 //
 // CHECK-MTE-BTI: Function Attrs: noinline nounwind optnone
@@ -712,7 +729,15 @@ inline int __attribute__((target_clones("fp16", "sve2-bitperm+fcma", "default"))
 // CHECK-MTE-BTI:       resolver_return:
 // CHECK-MTE-BTI-NEXT:    ret ptr @ftc_inline3._MsbMsve
 // CHECK-MTE-BTI:       resolver_else:
+// CHECK-MTE-BTI-NEXT:    [[TMP4:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-MTE-BTI-NEXT:    [[TMP5:%.*]] = and i64 [[TMP4]], 1125899906842624
+// CHECK-MTE-BTI-NEXT:    [[TMP6:%.*]] = icmp eq i64 [[TMP5]], 1125899906842624
+// CHECK-MTE-BTI-NEXT:    [[TMP7:%.*]] = and i1 true, [[TMP6]]
+// CHECK-MTE-BTI-NEXT:    br i1 [[TMP7]], label [[RESOLVER_RETURN1:%.*]], label [[RESOLVER_ELSE2:%.*]]
+// CHECK-MTE-BTI:       resolver_return1:
 // CHECK-MTE-BTI-NEXT:    ret ptr @ftc_inline3._Mbti
+// CHECK-MTE-BTI:       resolver_else2:
+// CHECK-MTE-BTI-NEXT:    ret ptr @ftc_inline3.default
 //
 //
 // CHECK-MTE-BTI: Function Attrs: noinline nounwind optnone
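
The mask constants in the updated CHECK lines are easier to read as bit positions. The sketch below is a small helper for decoding them; `setBits` and `allFeaturesPresent` are hypothetical names, and the only facts taken from the test are the constants themselves and which version each one guards (bit 44 guards `@ftc_dup3._Mmemtag2`, bit 50 guards `@ftc_dup3._Mbti`).

```cpp
#include <cstdint>
#include <vector>

// Returns the positions of the bits set in a feature mask, lowest first.
std::vector<int> setBits(std::uint64_t mask) {
  std::vector<int> bits;
  for (int i = 0; i < 64; ++i)
    if (mask & (std::uint64_t{1} << i))
      bits.push_back(i);
  return bits;
}

// Mirrors the `and` + `icmp eq` pair in the resolver IR: the version is
// eligible only if every bit of the mask is set in the feature word.
bool allFeaturesPresent(std::uint64_t cpuFeatures, std::uint64_t mask) {
  return (cpuFeatures & mask) == mask;
}
```

Decoded this way, the first hunk changes the mask from 4096 (bit 12) to 17592186048512 (bits 12 and 44), which means the resolver now tests one more feature bit, the same bit 44 that guards the `memtag2` version elsewhere in the test.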

DanielKristofKiss

LGTM

efriedma-quic · 2024-05-03T18:07:45Z

Burying this check in the middle of CodeGen seems error-prone: when we add new features in the future, someone is going to have to dig up this code. Can we put this in one of the target feature tables, like in AArch64TargetParser.h?

If I'm understanding correctly, the issue here isn't really that the feature is "backwards-compatible"; the issue is that it's a feature the OS frequently disables, so even if you have compatible hardware, it might not actually be enabled. Other features have similar characteristics, like SVE, but we generally assume the OS will enable SVE on CPUs where it's available. And the user can explicitly override this assumption if they want to.

It might make sense to allow the user to explicitly specify that we can assume the OS support is enabled.
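
One way to picture the table-driven alternative is sketched below. This is not the layout of `AArch64TargetParser.h`; the `FeatureInfo` struct, the `alwaysCheckAtRuntime` flag, the `assumeOsSupport` parameter and the table contents are all assumptions made for illustration. The flagged entries are the five features hard-coded by this patch.

```cpp
#include <string>

// Hypothetical per-feature record; the real target feature tables differ.
struct FeatureInfo {
  const char *name;
  bool alwaysCheckAtRuntime; // assumption: set for features that may be
                             // unavailable even when enabled for the target
};

// Illustrative table. "sve" and "fp16" stand in for ordinary features.
static const FeatureInfo kFeatureTable[] = {
    {"bti", true},     {"memtag", true}, {"memtag2", true},
    {"memtag3", true}, {"dgh", true},    {"sve", false},
    {"fp16", false},
};

// Looks up the flag; unknown features are treated as ordinary ones.
bool alwaysCheckAtRuntime(const std::string &name) {
  for (const FeatureInfo &info : kFeatureTable)
    if (name == info.name)
      return info.alwaysCheckAtRuntime;
  return false;
}

// Decides whether the resolver needs a run-time test for this feature.
// assumeOsSupport models a user override that says "treat the feature as
// available whenever it is enabled for the target".
bool needsRuntimeCheck(const std::string &name, bool enabledForTarget,
                       bool assumeOsSupport) {
  if (!enabledForTarget)
    return true;
  return alwaysCheckAtRuntime(name) && !assumeOsSupport;
}
```

Adding a new feature with the same property then means adding one table row rather than editing the condition in CodeGen.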

jroelofs · 2024-05-03T18:15:08Z

I am mildly opposed to this on principle: we should be able to optimize for any feature present in the supplied -mcpu=, and accommodating features like this that have a "graceful" fallback sets a bad precedent. If someone wants to run code on a machine, they should set mcpu/march/whatever to the minimal spec and use FMV to opt-in to additional features. This patch bends that rule, and makes FMV an opt-out mechanism, but only for these specific features.

lenary · 2024-05-03T23:09:11Z

Is this check even right for MTE? FEAT_MTE uses encodings that are undefined (rather than nop-compatible) in the base architecture, even though those encodings are not doing tag checking until you enable at least FEAT_MTE2 - so I cannot execute e.g. IRG on a base armv8.0a architecture.

tmatheson-arm · 2024-05-04T00:03:54Z

I agree with the other comments, and also I think changes like this should not go in this fast and with so few eyes on them.

ilinpv · 2024-05-07T11:17:33Z

Apologies for the quick merge, and thanks for the comments. I agree with all of them. I would prefer to keep the patch and provide fixes on top of it. Let me know if you want it reverted.

When generating the body of the ifunc resolver, clang skips runtime checks for features that are implied from the command line. We bend this rule for certain features (memtag, bti, dgh), but this happens quite arbitrarily in my opinion. The reasoning is that some features are in the HINT instruction space, meaning they operate as NOPs if the hardware does not support them. Still the user wants to detect their presence with runtime checks. See llvm#90928 for details. I think we should always perform runtime checks regardless of the feature and then try to statically resolve calls whenever a function is compiled with a sufficiently high set of architecture features (so including target/target_version/target_clones attributes, and command line options). This is what GCC does. We have an open PR in LLVM GlobalOpt since it was suggested not to perform such codegen optimizations in clang anyway. See llvm#87939.
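
The static-resolution idea described above can be sketched as follows. This is a simplified, conservative model and not the GlobalOpt implementation: `Version`, `isSubset` and `resolveStatically` are hypothetical names, versions are assumed to be listed highest priority first, and the caller's feature set is assumed to be genuinely guaranteed at run time.

```cpp
#include <set>
#include <string>
#include <vector>

using FeatureSet = std::set<std::string>;

// One callee version and the features its resolver condition requires.
struct Version {
  std::string name;
  FeatureSet required;
};

// True when every element of `needed` is also in `available`.
bool isSubset(const FeatureSet &needed, const FeatureSet &available) {
  for (const std::string &feature : needed)
    if (available.count(feature) == 0)
      return false;
  return true;
}

// Returns the version a call can be bound to at compile time, or nullptr if
// the resolver has to decide at run time. Only the highest-priority version
// is considered: if the caller already guarantees its features, the resolver
// would always pick it, so calling it directly is equivalent.
const Version *resolveStatically(const std::vector<Version> &versions,
                                 const FeatureSet &callerFeatures) {
  if (versions.empty())
    return nullptr;
  const Version &top = versions.front();
  return isSubset(top.required, callerFeatures) ? &top : nullptr;
}
```

The resolver itself keeps all of its run-time checks in this model; only individual call sites whose callers guarantee enough features bypass it.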


ilinpv requested review from jroelofs, DanielKristofKiss and labrinea May 3, 2024 01:48

llvmbot added the clang and clang:codegen labels May 3, 2024

DanielKristofKiss approved these changes May 3, 2024


ilinpv merged commit 8042022 into llvm:main May 3, 2024
7 checks passed

ilinpv mentioned this pull request May 7, 2024

[AArch64] move extension information into tablgen #90987

Merged

labrinea mentioned this pull request Jul 18, 2024

[FMV][AArch64] Do not optimize away runtime checks for implied features #99522

Merged
