Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop generating allocas & memcmp for simple short array equality #85828

Merged
merged 9 commits into from
Jul 9, 2021

Conversation

scottmcm
Copy link
Member

Example:

pub fn demo(x: [u16; 6], y: [u16; 6]) -> bool { x == y }

Before:

define zeroext i1 @_ZN10playground4demo17h48537f7eac23948fE(i96 %0, i96 %1) unnamed_addr #0 {
start:
  %y = alloca [6 x i16], align 8
  %x = alloca [6 x i16], align 8
  %.0..sroa_cast = bitcast [6 x i16]* %x to i96*
  store i96 %0, i96* %.0..sroa_cast, align 8
  %.0..sroa_cast3 = bitcast [6 x i16]* %y to i96*
  store i96 %1, i96* %.0..sroa_cast3, align 8
  %_11.i.i.i = bitcast [6 x i16]* %x to i8*
  %_14.i.i.i = bitcast [6 x i16]* %y to i8*
  %bcmp.i.i.i = call i32 @bcmp(i8* nonnull dereferenceable(12) %_11.i.i.i, i8* nonnull dereferenceable(12) %_14.i.i.i, i64 12) #2, !alias.scope !2
  %2 = icmp eq i32 %bcmp.i.i.i, 0
  ret i1 %2
}
playground::demo: # @playground::demo
	sub	rsp, 32
	mov	qword ptr [rsp], rdi
	mov	dword ptr [rsp + 8], esi
	mov	qword ptr [rsp + 16], rdx
	mov	dword ptr [rsp + 24], ecx
	xor	rdi, rdx
	xor	esi, ecx
	or	rsi, rdi
	sete	al
	add	rsp, 32
	ret

After:

define zeroext i1 @_ZN4mini4demo17h7a8994aaa314c981E(i96 %0, i96 %1) unnamed_addr #0 {
start:
  %2 = icmp eq i96 %0, %1
  ret i1 %2
}
_ZN4mini4demo17h7a8994aaa314c981E:
	xor	rcx, r8
	xor	edx, r9d
	or	rdx, rcx
	sete	al
	ret

@rust-highfive
Copy link
Collaborator

Some changes occured to the CTFE / Miri engine

cc @rust-lang/miri

@rust-highfive
Copy link
Collaborator

r? @dtolnay

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 30, 2021
@scottmcm
Copy link
Member Author

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 30, 2021
@bors
Copy link
Contributor

bors commented May 30, 2021

⌛ Trying commit a6546252f35553da72f213514eb1ea3f26c67b0a with merge 4fc737828e7a2f978c30875c76c93b2f0e5d6562...

@bors
Copy link
Contributor

bors commented May 30, 2021

☀️ Try build successful - checks-actions
Build commit: 4fc737828e7a2f978c30875c76c93b2f0e5d6562 (4fc737828e7a2f978c30875c76c93b2f0e5d6562)

@rust-timer
Copy link
Collaborator

Queued 4fc737828e7a2f978c30875c76c93b2f0e5d6562 with parent bff138d, future comparison URL.

@oli-obk
Copy link
Contributor

oli-obk commented May 30, 2021

For future archeologists it may be helpful to create a separate commit for the move of the eq logic to the eq module and base the logic changes on top of that

let rhs = self.read_scalar(rhs)?.check_init()?;
let lhs_bytes = self.memory.read_bytes(lhs, layout.size)?;
let rhs_bytes = self.memory.read_bytes(rhs, layout.size)?;
Ok(Scalar::Int((lhs_bytes == rhs_bytes).into()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use Scalar::from_bool.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Way better! Thanks.

Copy link
Member

@RalfJung RalfJung May 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, this will raise errors when there are uninit bytes or pointers anywhere in this memory. For CTFE that makes sense, for Miri-the-tool we might want to properly support comparing pointers...
Nothing we have to resolve now, just pointing this out.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (4fc737828e7a2f978c30875c76c93b2f0e5d6562): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 30, 2021
@@ -367,6 +367,14 @@ pub fn check_intrinsic_type(tcx: TyCtxt<'_>, it: &hir::ForeignItem<'_>) {

sym::nontemporal_store => (1, vec![tcx.mk_mut_ptr(param(0)), param(0)], tcx.mk_unit()),

sym::raw_eq => {
let param_count = if intrinsic_name == sym::raw_eq { 2 } else { 1 };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is matching on the intrinsic name already?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops! Good catch. (Remnant of a previous attempt which included a second intrinsic.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still there, did you forget to push?

Copy link
Member Author

@scottmcm scottmcm Jul 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scottmcm
Copy link
Member Author

scottmcm commented May 31, 2021

Added the implementation for cranelift.

I didn't bother doing anything too fancy, but it does trigger for the IPv6 case ([u16; 8]):

@0009                               v9 = load.i128 notrap v8
@0009                               v10 = load.i128 notrap v7
@0009                               v11 = icmp eq v9, v10
@0009                               v12 = bint.i8 v11

Otherwise it lib_calls out to memcmp still (here for the [u16; 6] case):

@0009                               v9 = iconst.i64 12
@0009                               v10 = call fn2(v8, v7, v9)
@0009                               v16 = iconst.i32 0
@0009                               v11 = icmp eq v10, v16
@0009                               v12 = bint.i8 v11

Added another nice codegen test example.

pub fn array_eq_zero(x: [u16; 8]) -> bool { x == [0; 8] }

Before:

  %x = alloca i128, align 8
  store i128 %0, i128* %x, align 8
  %_11.i.i.i = bitcast i128* %x to i8*
  %bcmp.i.i.i = call i32 @bcmp(i8* nonnull dereferenceable(16) %_11.i.i.i, i8* nonnull dereferenceable(16) getelementptr inbounds (<{ [16 x i8] }>, <{ [16 x i8] }>* @alloc2, i64 0, i32 0, i64 0), i64 16) #2, !alias.scope !2
  %1 = icmp eq i32 %bcmp.i.i.i, 0
  ret i1 %1
	sub	rsp, 16
	mov	qword ptr [rsp + 8], rsi
	mov	qword ptr [rsp], rdi
	or	rdi, rsi
	sete	al
	add	rsp, 16
	ret

After:

  %1 = icmp eq i128 %0, 0
  ret i1 %1
	or	rcx, rdx
	sete	al
	ret

@Mark-Simulacrum
Copy link
Member

Did some investigation into the regressions in the perf run here - https://rust-lang.zulipchat.com/#narrow/stream/247081-t-compiler.2Fperformance/topic/.2385828

Copy link
Member

@bjorn3 bjorn3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cg_clif changes LGTM

@scottmcm
Copy link
Member Author

scottmcm commented Jun 3, 2021

Changing the threshold means it's probably worth re-running perf

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@@ -712,6 +713,10 @@ impl CodegenCx<'b, 'tcx> {
ifn!("llvm.assume", fn(i1) -> void);
ifn!("llvm.prefetch", fn(i8p, t_i32, t_i32, t_i32) -> void);

// This isn't an "LLVM intrinsic", but LLVM's optimization passes
// recognize it like one and we assume it exists in `core::slice::cmp`
ifn!("memcmp", fn(i8p, i8p, t_isize) -> t_i32);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The memcmp in slice::cmp has a FIXME for the return type, should that go here too?

@rust-log-analyzer

This comment has been minimized.

@scottmcm
Copy link
Member Author

scottmcm commented Jul 8, 2021

@oli-obk I've re-based and blessed this.

@oli-obk
Copy link
Contributor

oli-obk commented Jul 9, 2021

@bors r+

@bors
Copy link
Contributor

bors commented Jul 9, 2021

📌 Commit d064494 has been approved by oli-obk

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 9, 2021
@bors
Copy link
Contributor

bors commented Jul 9, 2021

⌛ Testing commit d064494 with merge ee86f96...

@bors
Copy link
Contributor

bors commented Jul 9, 2021

☀️ Test successful - checks-actions
Approved by: oli-obk
Pushing ee86f96 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jul 9, 2021
@bors bors merged commit ee86f96 into rust-lang:master Jul 9, 2021
@rustbot rustbot added this to the 1.55.0 milestone Jul 9, 2021
@scottmcm scottmcm deleted the raw-eq branch August 29, 2021 07:03
bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 17, 2021
…t-slices, r=dtolnay

Do array-slice equality via array equality, rather than always via slices

~~Draft because it needs a rebase after rust-lang#91766 eventually gets through bors.~~

This enables the optimizations from rust-lang#85828 to be used for array-to-slice comparisons too, not just array-to-array.

For example, <https://play.rust-lang.org/?version=nightly&mode=release&edition=2021&gist=5f9ba69b3d5825a782f897c830d3a6aa>
```rust
pub fn demo(x: &[u8], y: [u8; 4]) -> bool {
    *x == y
}
```
Currently writes the array to stack for no reason:
```nasm
	sub	rsp, 4
	mov	dword ptr [rsp], edx
	cmp	rsi, 4
	jne	.LBB0_1
	mov	eax, dword ptr [rdi]
	cmp	eax, dword ptr [rsp]
	sete	al
	add	rsp, 4
	ret

.LBB0_1:
	xor	eax, eax
	add	rsp, 4
	ret
```
Whereas with the change in this PR it just compares it directly:
```nasm
	cmp	rsi, 4
	jne	.LBB1_1
	cmp	dword ptr [rdi], edx
	sete	al
	ret

.LBB1_1:
	xor	eax, eax
	ret
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.