Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add vcgez, vcgtz, vclez, vcltz neon instructions #1069

Merged
merged 4 commits into from
Mar 10, 2021

Conversation

SparrowLii
Copy link
Member

All are automatically generated single-parameter comparison instructions. In order to be consistent with the implementation in Clang, some changes have been made to stdarch-gen.

@rust-highfive
Copy link

r? @Amanieu

(rust-highfive has picked a reviewer for you, use r? to override)

multi_fn = fixed, c:in_t
multi_fn = fixed_2, d:in_t
multi_fn = simd_shr, e:, a, transmute(c)
multi_fn = simd_xor, transmute(e), transmute(d)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just use simd_ge here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to be consistent with Clang's implementation. The following is the test I did in https://godbolt.org/:

#include <arm_neon.h>
int test() {
  return (int) vcgez_s32;
}        

And the Output:

define dso_local i32 @test() local_unnamed_addr #0 {
    ret i32 ptrtoint (<2 x i32> (<2 x i32>)* @vcgez_s32 to i32)
}

define internal <2 x i32> @vcgez_s32(<2 x i32> %0) #1 {
    %2 = ashr <2 x i32> %0, <i32 31, i32 31>
    %3 = xor <2 x i32> %2, <i32 -1, i32 -1>
    ret <2 x i32> %3
}

attributes #0 = { norecurse nounwind readnone uwtable willreturn "frame-pointer"="non-leaf" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+neon" }
attributes #1 = { alwaysinline norecurse nounwind readnone uwtable willreturn "frame-pointer"="non-leaf" "min-legal-vector-width"="64" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+neon" }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you compile with -O0 you will see that Clang actually emits an icmp sge. LLVM optimizations are then turning this into a shift + xor.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The url of godbolt is from here: #148

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That usually works, but in this particular case it gives a different result because the IR is not the one generated by Clang directly: it is the IR after LLVM has run optimization passes that expand the icmp eq into shift and xor.

You can use simd_ge in Rust and it will produce the same IR as Clang.

Copy link
Member Author

@SparrowLii SparrowLii Mar 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umm.. That's right. If we use an implementation consistent with -O0, can we ensure that LLVM achieves the same optimization? If so, we should indeed use simd_ge IMO

[Edit] OK, got it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We run the same LLVM passes as Clang (mostly) so rustc will also transform simd_ge into a shift + xor.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explanation!

/// Compare signed less than zero
name = vcltz
multi_fn = fixed, b:in_t
multi_fn = simd_shr, c:in_t, a, transmute(b)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And simd_lt here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, the following is my test in https://godbolt.org/:

#include <arm_neon.h>
int test() {
return (int) vcltz_s32;
}                                     

And the Output:

define dso_local i32 @test() local_unnamed_addr #0 {
    ret i32 ptrtoint (<2 x i32> (<2 x i32>)* @vcltz_s32 to i32)
}

define internal <2 x i32> @vcltz_s32(<2 x i32> %0) #1 {
    %2 = ashr <2 x i32> %0, <i32 31, i32 31>
    ret <2 x i32> %2
}

attributes #0 = { norecurse nounwind readnone uwtable willreturn "frame-pointer"="non-leaf" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+neon" }
attributes #1 = { alwaysinline norecurse nounwind readnone uwtable willreturn "frame-pointer"="non-leaf" "min-legal-vector-width"="64" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+neon" }

@Amanieu
Copy link
Member

Amanieu commented Mar 10, 2021

Can you add ARM versions of these functions?

@SparrowLii
Copy link
Member Author

SparrowLii commented Mar 10, 2021

Can you add ARM versions of these functions?

It seems that these instructions are unique to aarch64 and only accept signed parameters. I can't compile the version of arm on godbolt either.
https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics?search=vcltz

@Amanieu Amanieu merged commit fc199fe into rust-lang:master Mar 10, 2021
@Amanieu
Copy link
Member

Amanieu commented Mar 10, 2021

You are right.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants