Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FMV][AArch64] Don't optimize backward compatible features in resolver. #90928

Merged
merged 1 commit into from
May 3, 2024

Conversation

ilinpv
Copy link
Contributor

@ilinpv ilinpv commented May 3, 2024

For arch64 features, such as Branch Target Identification or MTE (Memory Tagging Extension), compatible with targets that lack their support we may encounter scenarios where a binary compiled with MTE for example is executed on both MTE and non-MTE hardware and we still need to detect at runtime whether the MTE feature is available to choose the appropriate function version.
So, we cannot optimize the function multi versioning resolver by removing checks for these features enabled for the target during compilation.

For arch64 features, such as Branch Target Identification or MTE (Memory
Tagging Extension), compatible with targets that lack their support we
may encounter scenarios where a binary compiled with MTE for example is
executed on both MTE and non-MTE hardware and we still need to detect at
runtime whether the MTE feature is available to choose the appropriate
function version.
So, we cannot optimize the function multi versioning resolver by removing
checks for these features enabled for the target during compilation.
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen labels May 3, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented May 3, 2024

@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: Pavel Iliin (ilinpv)

Changes

For arch64 features, such as Branch Target Identification or MTE (Memory Tagging Extension), compatible with targets that lack their support we may encounter scenarios where a binary compiled with MTE for example is executed on both MTE and non-MTE hardware and we still need to detect at runtime whether the MTE feature is available to choose the appropriate function version.
So, we cannot optimize the function multi versioning resolver by removing checks for these features enabled for the target during compilation.


Full diff: https://github.com/llvm/llvm-project/pull/90928.diff

2 Files Affected:

  • (modified) clang/lib/CodeGen/CodeGenFunction.cpp (+8-2)
  • (modified) clang/test/CodeGen/attr-target-clones-aarch64.c (+27-2)
diff --git a/clang/lib/CodeGen/CodeGenFunction.cpp b/clang/lib/CodeGen/CodeGenFunction.cpp
index 87766a758311d5..39943ed2a415e8 100644
--- a/clang/lib/CodeGen/CodeGenFunction.cpp
+++ b/clang/lib/CodeGen/CodeGenFunction.cpp
@@ -2759,8 +2759,14 @@ llvm::Value *CodeGenFunction::FormAArch64ResolverCondition(
     const MultiVersionResolverOption &RO) {
   llvm::SmallVector<StringRef, 8> CondFeatures;
   for (const StringRef &Feature : RO.Conditions.Features) {
-    // Form condition for features which are not yet enabled in target
-    if (!getContext().getTargetInfo().hasFeature(Feature))
+    // Optimize the Function Multi Versioning resolver by creating conditions
+    // only for features that are not enabled in the target. The exception is
+    // for features whose extension instructions are executed as NOP on targets
+    // without extension support.
+    if (!getContext().getTargetInfo().hasFeature(Feature) ||
+        Feature.equals("bti") || Feature.equals("memtag") ||
+        Feature.equals("memtag2") || Feature.equals("memtag3") ||
+        Feature.equals("dgh"))
       CondFeatures.push_back(Feature);
   }
   if (!CondFeatures.empty()) {
diff --git a/clang/test/CodeGen/attr-target-clones-aarch64.c b/clang/test/CodeGen/attr-target-clones-aarch64.c
index f75d8a69ebf02f..603d067864b45a 100644
--- a/clang/test/CodeGen/attr-target-clones-aarch64.c
+++ b/clang/test/CodeGen/attr-target-clones-aarch64.c
@@ -526,8 +526,8 @@ inline int __attribute__((target_clones("fp16", "sve2-bitperm+fcma", "default"))
 // CHECK-MTE-BTI-NEXT:  resolver_entry:
 // CHECK-MTE-BTI-NEXT:    call void @__init_cpu_features_resolver()
 // CHECK-MTE-BTI-NEXT:    [[TMP0:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
-// CHECK-MTE-BTI-NEXT:    [[TMP1:%.*]] = and i64 [[TMP0]], 4096
-// CHECK-MTE-BTI-NEXT:    [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 4096
+// CHECK-MTE-BTI-NEXT:    [[TMP1:%.*]] = and i64 [[TMP0]], 17592186048512
+// CHECK-MTE-BTI-NEXT:    [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 17592186048512
 // CHECK-MTE-BTI-NEXT:    [[TMP3:%.*]] = and i1 true, [[TMP2]]
 // CHECK-MTE-BTI-NEXT:    br i1 [[TMP3]], label [[RESOLVER_RETURN:%.*]], label [[RESOLVER_ELSE:%.*]]
 // CHECK-MTE-BTI:       resolver_return:
@@ -604,7 +604,24 @@ inline int __attribute__((target_clones("fp16", "sve2-bitperm+fcma", "default"))
 //
 // CHECK-MTE-BTI-LABEL: @ftc_dup3.resolver(
 // CHECK-MTE-BTI-NEXT:  resolver_entry:
+// CHECK-MTE-BTI-NEXT:    call void @__init_cpu_features_resolver()
+// CHECK-MTE-BTI-NEXT:    [[TMP0:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-MTE-BTI-NEXT:    [[TMP1:%.*]] = and i64 [[TMP0]], 1125899906842624
+// CHECK-MTE-BTI-NEXT:    [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 1125899906842624
+// CHECK-MTE-BTI-NEXT:    [[TMP3:%.*]] = and i1 true, [[TMP2]]
+// CHECK-MTE-BTI-NEXT:    br i1 [[TMP3]], label [[RESOLVER_RETURN:%.*]], label [[RESOLVER_ELSE:%.*]]
+// CHECK-MTE-BTI:       resolver_return:
 // CHECK-MTE-BTI-NEXT:    ret ptr @ftc_dup3._Mbti
+// CHECK-MTE-BTI:       resolver_else:
+// CHECK-MTE-BTI-NEXT:    [[TMP4:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-MTE-BTI-NEXT:    [[TMP5:%.*]] = and i64 [[TMP4]], 17592186044416
+// CHECK-MTE-BTI-NEXT:    [[TMP6:%.*]] = icmp eq i64 [[TMP5]], 17592186044416
+// CHECK-MTE-BTI-NEXT:    [[TMP7:%.*]] = and i1 true, [[TMP6]]
+// CHECK-MTE-BTI-NEXT:    br i1 [[TMP7]], label [[RESOLVER_RETURN1:%.*]], label [[RESOLVER_ELSE2:%.*]]
+// CHECK-MTE-BTI:       resolver_return1:
+// CHECK-MTE-BTI-NEXT:    ret ptr @ftc_dup3._Mmemtag2
+// CHECK-MTE-BTI:       resolver_else2:
+// CHECK-MTE-BTI-NEXT:    ret ptr @ftc_dup3.default
 //
 //
 // CHECK-MTE-BTI: Function Attrs: noinline nounwind optnone
@@ -712,7 +729,15 @@ inline int __attribute__((target_clones("fp16", "sve2-bitperm+fcma", "default"))
 // CHECK-MTE-BTI:       resolver_return:
 // CHECK-MTE-BTI-NEXT:    ret ptr @ftc_inline3._MsbMsve
 // CHECK-MTE-BTI:       resolver_else:
+// CHECK-MTE-BTI-NEXT:    [[TMP4:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-MTE-BTI-NEXT:    [[TMP5:%.*]] = and i64 [[TMP4]], 1125899906842624
+// CHECK-MTE-BTI-NEXT:    [[TMP6:%.*]] = icmp eq i64 [[TMP5]], 1125899906842624
+// CHECK-MTE-BTI-NEXT:    [[TMP7:%.*]] = and i1 true, [[TMP6]]
+// CHECK-MTE-BTI-NEXT:    br i1 [[TMP7]], label [[RESOLVER_RETURN1:%.*]], label [[RESOLVER_ELSE2:%.*]]
+// CHECK-MTE-BTI:       resolver_return1:
 // CHECK-MTE-BTI-NEXT:    ret ptr @ftc_inline3._Mbti
+// CHECK-MTE-BTI:       resolver_else2:
+// CHECK-MTE-BTI-NEXT:    ret ptr @ftc_inline3.default
 //
 //
 // CHECK-MTE-BTI: Function Attrs: noinline nounwind optnone

Copy link
Member

@DanielKristofKiss DanielKristofKiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ilinpv ilinpv merged commit 8042022 into llvm:main May 3, 2024
7 checks passed
@efriedma-quic
Copy link
Collaborator

efriedma-quic commented May 3, 2024

Burying this check in the middle of CodeGen seems error-prone: when we add new features in the future, someone is going to have to dig up this code. Can we put this in one of the target feature tables, like in AArch64TargetParser.h?

If I'm understanding correctly, the issue here isn't really that the feature is "backwards-compatible"; the issue is that it's a feature the OS frequently disables, so even if you have compatible hardware, it might not actually be enabled. Other features have similar characteristics, like SVE, but we generally assume the OS will enable SVE on CPUs where it's available. And the user can explicitly override this assumption if they want to.

It might make sense to allow the user to explicitly specify that we can assume the OS support is enabled.

@jroelofs
Copy link
Contributor

jroelofs commented May 3, 2024

I am mildly opposed to this on principle: we should be able to optimize for any feature present in the supplied -mcpu=, and accommodating features like this that have a "graceful" fallback sets a bad precedent. If someone wants to run code on a machine, they should set mcpu/march/whatever to the minimal spec and use FMV to opt-in to additional features. This patch bends that rule, and makes FMV an opt-out mechanism, but only for these specific features.

@lenary
Copy link
Member

lenary commented May 3, 2024

Is this check even right for MTE? FEAT_MTE uses encodings that are undefined (rather than nop-compatible) in the base architecture, even though those encodings are not doing tag checking until you enable at least FEAT_MTE2 - so I cannot execute e.g. IRG on a base armv8.0a architecture.

@tmatheson-arm
Copy link
Contributor

I agree with the other comments, and also I think changes like this should not go in this fast and with so few eyes on them.

@ilinpv
Copy link
Contributor Author

ilinpv commented May 7, 2024

Apologies for quick merge and thanks for comments. I agree with all of them. I would prefer to keep the patch and provide fixes on top of it. Let me know if you want it reverted.

labrinea added a commit to labrinea/llvm-project that referenced this pull request Jul 18, 2024
When generating the body of the ifunc resolver, clang skips runtime
checks for features that are implied from the command line. We bend
this rule for certain features (memtag, bti, dgh), but this happens
quite arbitrarily in my opinion. The reasoning is that some features
are in the HINT instruction space, meaning they operate as NOPs if
the hardware does not support them. Still the user wants to detect
their presence with runtime checks. See llvm#90928 for details.

I think we should always perform runtime checks regardless of the
feature and then try to statically resolve calls whenever a function
is compiled with a sufficiently high set of architecture features
(so including target/target_version/target_clones attributes, and
command line options). This is what GCC does. We have an open PR in
LLVM GlobalOpt since it was suggested not to perform such codegen
optimizations in clang anyway. See llvm#87939.
labrinea added a commit that referenced this pull request Jul 19, 2024
…es (#99522)

When generating the body of the ifunc resolver, clang skips runtime
checks for features that are implied from the command line. We bend this
rule for certain features (memtag, bti, dgh), but this happens quite
arbitrarily in my opinion. The reasoning is that some features are in
the HINT instruction space, meaning they operate as NOPs if the hardware
does not support them. Still the user wants to detect their presence
with runtime checks. See #90928 for details.

I think we should always perform runtime checks regardless of the
feature and then try to statically resolve calls whenever a function is
compiled with a sufficiently high set of architecture features (so
including target/target_version/target_clones attributes, and command
line options). This is what GCC does. We have an open PR in LLVM
GlobalOpt since it was suggested not to perform such codegen
optimizations in clang anyway. See #87939.
sgundapa pushed a commit to sgundapa/upstream_effort that referenced this pull request Jul 23, 2024
…es (llvm#99522)

When generating the body of the ifunc resolver, clang skips runtime
checks for features that are implied from the command line. We bend this
rule for certain features (memtag, bti, dgh), but this happens quite
arbitrarily in my opinion. The reasoning is that some features are in
the HINT instruction space, meaning they operate as NOPs if the hardware
does not support them. Still the user wants to detect their presence
with runtime checks. See llvm#90928 for details.

I think we should always perform runtime checks regardless of the
feature and then try to statically resolve calls whenever a function is
compiled with a sufficiently high set of architecture features (so
including target/target_version/target_clones attributes, and command
line options). This is what GCC does. We have an open PR in LLVM
GlobalOpt since it was suggested not to perform such codegen
optimizations in clang anyway. See llvm#87939.
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
…es (#99522)

Summary:
When generating the body of the ifunc resolver, clang skips runtime
checks for features that are implied from the command line. We bend this
rule for certain features (memtag, bti, dgh), but this happens quite
arbitrarily in my opinion. The reasoning is that some features are in
the HINT instruction space, meaning they operate as NOPs if the hardware
does not support them. Still the user wants to detect their presence
with runtime checks. See #90928 for details.

I think we should always perform runtime checks regardless of the
feature and then try to statically resolve calls whenever a function is
compiled with a sufficiently high set of architecture features (so
including target/target_version/target_clones attributes, and command
line options). This is what GCC does. We have an open PR in LLVM
GlobalOpt since it was suggested not to perform such codegen
optimizations in clang anyway. See #87939.

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D60251293
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants