Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Take advantage of known-valid-align in layout.rs #99136

Merged
merged 2 commits into from
Jul 13, 2022

Conversation

CAD97
Copy link
Contributor

@CAD97 CAD97 commented Jul 10, 2022

An attempt to improve perf by @nnethercote's approach suggested in #99117

@rustbot rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Jul 10, 2022
@rustbot

This comment was marked as resolved.

@rust-highfive
Copy link
Collaborator

r? @Mark-Simulacrum

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 10, 2022
@CAD97
Copy link
Contributor Author

CAD97 commented Jul 11, 2022

Whoops, forgot an #[inline]

if size > isize::MAX as usize - (align.as_nonzero().get() - 1) {
return Err(LayoutError);
}
// SAFTEY: as above, this check is sufficient.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pondering: does this need to be duplicated?

Could from_size_align become something like

pub const fn from_size_align(size: usize, align: usize) -> Result<Self, LayoutError> {
    let align = ValidAlign::try_from(align).map_err(|_| LayoutError)?;
    from_size_valid_align(size, align)
}

to avoid the repetition?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be correct, but for the time being I'm erring on the side of making small changes so there's a chance of saying whether a change is positive or not.

@scottmcm
Copy link
Member

Let's ask perf, as the comment says!

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 11, 2022
@bors
Copy link
Contributor

bors commented Jul 11, 2022

⌛ Trying commit 079d3eb with merge bb4f8945315c007e9dddba9b61c64f37eeb201c1...

@bors
Copy link
Contributor

bors commented Jul 11, 2022

☀️ Try build successful - checks-actions
Build commit: bb4f8945315c007e9dddba9b61c64f37eeb201c1 (bb4f8945315c007e9dddba9b61c64f37eeb201c1)

@rust-timer
Copy link
Collaborator

Queued bb4f8945315c007e9dddba9b61c64f37eeb201c1 with parent e1b348f, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (bb4f8945315c007e9dddba9b61c64f37eeb201c1): comparison url.

Instruction count

  • Primary benchmarks: no relevant changes found
  • Secondary benchmarks: 🎉 relevant improvements found
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
N/A N/A 0
Improvements 🎉
(primary)
N/A N/A 0
Improvements 🎉
(secondary)
-1.7% -1.9% 4
All 😿🎉 (primary) N/A N/A 0

Max RSS (memory usage)

Results
  • Primary benchmarks: mixed results
  • Secondary benchmarks: 😿 relevant regressions found
mean1 max count2
Regressions 😿
(primary)
2.4% 2.4% 1
Regressions 😿
(secondary)
2.2% 2.4% 2
Improvements 🎉
(primary)
-4.7% -4.7% 1
Improvements 🎉
(secondary)
N/A N/A 0
All 😿🎉 (primary) -1.1% -4.7% 2

Cycles

Results
  • Primary benchmarks: mixed results
  • Secondary benchmarks: 😿 relevant regression found
mean1 max count2
Regressions 😿
(primary)
3.4% 3.4% 1
Regressions 😿
(secondary)
4.4% 4.4% 1
Improvements 🎉
(primary)
-3.4% -3.4% 1
Improvements 🎉
(secondary)
N/A N/A 0
All 😿🎉 (primary) -0.0% -3.4% 2

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

Footnotes

  1. the arithmetic mean of the percent change 2 3

  2. number of relevant changes 2 3

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 11, 2022
@CAD97
Copy link
Contributor Author

CAD97 commented Jul 11, 2022

Not surprised, but this shows as an improvement on ctfe stress tests and neutral everywhere else. That makes it potentially worth it to merge.

@scottmcm
Copy link
Member

@CAD97 If this had recovered a bunch of the primary regressions from 95295 I'd have just merged it immediately, but since it's a more nuanced win, would you mind trying the deduplicated version? I'd rather not have to repeat the code if we didn't have to, and the deduplicated version should still save CTFE work by not having to check the align.

@erikdesjardins
Copy link
Contributor

The CTFE perf win is probably spurious, it's been flaky lately.

For example, the cachegrind diff of ctfe-stress-5 check full from this PR looks like:

 -247,721,975  ???:<rustc_middle::ty::subst::SubstFolder as rustc_middle::ty::fold::TypeFolder>::fold_ty
  -44,860,875  ???:<rustc_middle::ty::subst::SubstFolder as rustc_middle::ty::fold::FallibleTypeFolder>::try_fold_ty
  -28,327,968  ???:<rustc_middle::ty::subst::SubstFolder as rustc_middle::ty::fold::TypeFolder>::fold_const
  -25,453,103  ???:<rustc_const_eval::interpret::eval_context::InterpCx<rustc_const_eval::const_eval::machine::CompileTimeInterpreter>>::run
   14,162,786  ???:<rustc_middle::ty::Ty as rustc_middle::ty::fold::TypeFoldable>::try_fold_with::<rustc_middle::ty::subst::SubstFolder>
   10,906,079  ???:<rustc_middle::ty::instance::Instance>::resolve_opt_const_arg
   -2,359,073  ???:<rustc_middle::ty::normalize_erasing_regions::TryNormalizeAfterErasingRegionsFolder as rustc_middle::ty::fold::FallibleTypeFolder>::try_fold_mir_const
   -1,477,147  ???:<rustc_middle::ty::Ty>::fn_sig
   -1,468,108  ???:<rustc_const_eval::interpret::eval_context::InterpCx<rustc_const_eval::const_eval::machine::CompileTimeInterpreter>>::eval_fn_call
      724,860  ???:<rustc_middle::ty::Ty as rustc_middle::ty::fold::TypeSuperFoldable>::super_fold_with::<rustc_middle::ty::subst::SubstFolder>
      366,988  ???:<rustc_const_eval::const_eval::machine::CompileTimeInterpreter as rustc_const_eval::interpret::machine::Machine>::find_mir_or_eval_fn

And the cachegrind diff of ctfe-stress-5 check full from #97841 (comment), a PR that shouldn't have impacted CTFE at all (since it just inlines a non-const Windows-only function), is pretty close to the opposite of that:

247,716,419  ???:<rustc_middle::ty::subst::SubstFolder as rustc_middle::ty::fold::TypeFolder>::fold_ty
 44,860,867  ???:<rustc_middle::ty::subst::SubstFolder as rustc_middle::ty::fold::FallibleTypeFolder>::try_fold_ty
 28,327,968  ???:<rustc_middle::ty::subst::SubstFolder as rustc_middle::ty::fold::TypeFolder>::fold_const
-14,162,786  ???:<rustc_middle::ty::Ty as rustc_middle::ty::fold::TypeFoldable>::try_fold_with::<rustc_middle::ty::subst::SubstFolder>
-10,895,378  ???:<rustc_middle::ty::instance::Instance>::resolve_opt_const_arg
 10,609,061  ???:<rustc_const_eval::interpret::eval_context::InterpCx<rustc_const_eval::const_eval::machine::CompileTimeInterpreter>>::run
  2,215,876  ???:<rustc_middle::ty::Ty>::fn_sig
  1,703,968  ???:<rustc_middle::ty::normalize_erasing_regions::TryNormalizeAfterErasingRegionsFolder as rustc_middle::ty::fold::FallibleTypeFolder>::try_fold_mir_const
  1,176,734  ???:<rustc_infer::infer::InferCtxt>::commit_unconditionally::<rustc_middle::traits::ImplSourceUserDefinedData<rustc_infer::traits::Obligation<rustc_middle::ty::Predicate>>, <rustc_trait_selection::traits::select::SelectionContext>::confirm_impl_candidate::{closure
  1,100,996  ???:<rustc_const_eval::interpret::eval_context::InterpCx<rustc_const_eval::const_eval::machine::CompileTimeInterpreter>>::eval_place
   -817,906  ???:<rustc_trait_selection::traits::select::SelectionContext>::rematch_impl
   -721,179  ???:<rustc_middle::ty::Ty as rustc_middle::ty::fold::TypeSuperFoldable>::super_fold_with::<rustc_middle::ty::subst::SubstFolder>
   -688,130  ???:<rustc_const_eval::interpret::eval_context::InterpCx<rustc_const_eval::const_eval::machine::CompileTimeInterpreter>>::operand_projection
   -367,016  ???:<rustc_const_eval::interpret::eval_context::InterpCx<rustc_const_eval::const_eval::machine::CompileTimeInterpreter>>::mir_const_to_op
    365,443  ???:<core::iter::adapters::GenericShunt<core::iter::adapters::flatten::FlatMap<core::iter::adapters::map::Map<alloc::vec::into_iter::IntoIter<rustc_middle::traits::select::SelectionCandidate>, <rustc_trait_selection::traits::select::SelectionContext>::candidate_from_obligation_no_cache::{closure
   -325,479  ???:<rustc_trait_selection::traits::select::SelectionContext>::vtable_impl

@rust-log-analyzer

This comment has been minimized.

@CAD97 CAD97 force-pushed the layout-faster branch 2 times, most recently from 1c37e0e to 2d84703 Compare July 11, 2022 21:49
@CAD97
Copy link
Contributor Author

CAD97 commented Jul 11, 2022

(sigill was me messing up and inserting a wrong assert_unsafe_precondition! whoops)

@bors
Copy link
Contributor

bors commented Jul 12, 2022

☀️ Try build successful - checks-actions
Build commit: 95ff54ef0b0a45df48b6f4150236ce2faa6c4ae4 (95ff54ef0b0a45df48b6f4150236ce2faa6c4ae4)

@rust-timer
Copy link
Collaborator

Queued 95ff54ef0b0a45df48b6f4150236ce2faa6c4ae4 with parent 38b7215, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (95ff54ef0b0a45df48b6f4150236ce2faa6c4ae4): comparison url.

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results
  • Primary benchmarks: 😿 relevant regression found
  • Secondary benchmarks: mixed results
mean1 max count2
Regressions 😿
(primary)
0.5% 0.5% 1
Regressions 😿
(secondary)
2.6% 3.2% 2
Improvements 🎉
(primary)
N/A N/A 0
Improvements 🎉
(secondary)
-4.0% -4.0% 1
All 😿🎉 (primary) 0.5% 0.5% 1

Cycles

Results
  • Primary benchmarks: no relevant changes found
  • Secondary benchmarks: 🎉 relevant improvements found
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
N/A N/A 0
Improvements 🎉
(primary)
N/A N/A 0
Improvements 🎉
(secondary)
-5.4% -8.7% 2
All 😿🎉 (primary) N/A N/A 0

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

Footnotes

  1. the arithmetic mean of the percent change 2

  2. number of relevant changes 2

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 12, 2022
@CAD97
Copy link
Contributor Author

CAD97 commented Jul 12, 2022

So this is pretty clearly perf-nuetral. To that end, it seems better to just use the safe constructor everywhere, but I'm going to hold back from making that call.

At a minimum, having the smaller codepath might make it easier to get other perf gains, but honestly who knows. Intuition doesn't work in this file.

@scottmcm
Copy link
Member

I think it absolutely makes sense to do this -- I might not have wanted to do it for a new unsafe constructor or a bunch of repetitive code, but because the type system means we can skip the check naturally, I think it's worth reducing the amount of work these things need to do at runtime even if it doesn't show up in rustc's perf.

@bors r+ rollup=iffy (perf came back neutral, but it's still a file that can be highly impactful)

@bors
Copy link
Contributor

bors commented Jul 12, 2022

📌 Commit 1169490 has been approved by scottmcm

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 12, 2022
@bors
Copy link
Contributor

bors commented Jul 13, 2022

⌛ Testing commit 1169490 with merge 87588a2...

@bors
Copy link
Contributor

bors commented Jul 13, 2022

☀️ Test successful - checks-actions
Approved by: scottmcm
Pushing 87588a2 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jul 13, 2022
@bors bors merged commit 87588a2 into rust-lang:master Jul 13, 2022
@rustbot rustbot added this to the 1.64.0 milestone Jul 13, 2022
@CAD97 CAD97 deleted the layout-faster branch July 14, 2022 00:00
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (87588a2): comparison url.

Instruction count

  • Primary benchmarks: no relevant changes found
  • Secondary benchmarks: 🎉 relevant improvement found
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
N/A N/A 0
Improvements 🎉
(primary)
N/A N/A 0
Improvements 🎉
(secondary)
-0.3% -0.3% 1
All 😿🎉 (primary) N/A N/A 0

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

Results
  • Primary benchmarks: 🎉 relevant improvements found
  • Secondary benchmarks: mixed results
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
3.7% 3.7% 1
Improvements 🎉
(primary)
-2.8% -3.4% 2
Improvements 🎉
(secondary)
-3.3% -3.3% 1
All 😿🎉 (primary) -2.8% -3.4% 2

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

Footnotes

  1. the arithmetic mean of the percent change 2

  2. number of relevant changes 2

bors added a commit to rust-lang-ci/rust that referenced this pull request Aug 10, 2022
…oshtriplett

Reoptimize layout array

This way it's one check instead of two, so hopefully (cc rust-lang#99117) it'll be simpler for rustc perf too 🤞

Quick demonstration:
```rust
pub fn demo(n: usize) -> Option<Layout> {
    Layout::array::<i32>(n).ok()
}
```

Nightly: <https://play.rust-lang.org/?version=nightly&mode=release&edition=2021&gist=e97bf33508aa03f38968101cdeb5322d>
```nasm
	mov	rax, rdi
	mov	ecx, 4
	mul	rcx
	seto	cl
	movabs	rdx, 9223372036854775805
	xor	esi, esi
	cmp	rax, rdx
	setb	sil
	shl	rsi, 2
	xor	edx, edx
	test	cl, cl
	cmove	rdx, rsi
	ret
```

This PR (note no `mul`, in addition to being much shorter):
```nasm
	xor	edx, edx
	lea	rax, [4*rcx]
	shr	rcx, 61
	sete	dl
	shl	rdx, 2
	ret
```

This is built atop `@CAD97` 's rust-lang#99136; the new changes are cb8aba66ef6a0e17f08a0574e4820653e31b45a0.

I added a bunch more tests for `Layout::from_size_align` and `Layout::array` too.
bors pushed a commit to rust-lang-ci/rust that referenced this pull request Sep 19, 2022
bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 19, 2022
[stable] Prepare 1.64.0 release

This PR prepares the 1.64.0 stable release builds.

In addition to bumping the channel and including the latest release notes changes, this PR also backports the following PRs:

*  rust-lang#100852
*  rust-lang#101366
*  rust-lang#101468
*  rust-lang#101922

This PR also reverts the following PRs, as decided in rust-lang#101899 (comment):

* rust-lang#95295
* rust-lang#99136 (followup to the previous PR)

r? `@ghost`
cc `@rust-lang/release`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants