Take advantage of known-valid-align in layout.rs #99136

CAD97 · 2022-07-10T23:42:42Z

An attempt to improve perf by @nnethercote's approach suggested in #99117

rust-highfive · 2022-07-10T23:42:46Z

r? @Mark-Simulacrum

(rust-highfive has picked a reviewer for you, use r? to override)

CAD97 · 2022-07-11T00:35:47Z

Whoops, forgot an #[inline]

library/core/src/alloc/layout.rs

library/core/src/mem/valid_align.rs

scottmcm · 2022-07-11T02:23:54Z

library/core/src/alloc/layout.rs

+        if size > isize::MAX as usize - (align.as_nonzero().get() - 1) {
+            return Err(LayoutError);
+        }
+        // SAFTEY: as above, this check is sufficient.


pondering: does this need to be duplicated?

Could from_size_align become something like

pub const fn from_size_align(size: usize, align: usize) -> Result<Self, LayoutError> { let align = ValidAlign::try_from(align).map_err(|_| LayoutError)?; from_size_valid_align(size, align) }

to avoid the repetition?

That would be correct, but for the time being I'm erring on the side of making small changes so there's a chance of saying whether a change is positive or not.

scottmcm · 2022-07-11T02:24:20Z

Let's ask perf, as the comment says!

@bors try @rust-timer queue

rust-timer · 2022-07-11T02:24:22Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2022-07-11T02:24:33Z

⌛ Trying commit 079d3eb with merge bb4f8945315c007e9dddba9b61c64f37eeb201c1...

bors · 2022-07-11T04:14:57Z

☀️ Try build successful - checks-actions
Build commit: bb4f8945315c007e9dddba9b61c64f37eeb201c1 (bb4f8945315c007e9dddba9b61c64f37eeb201c1)

rust-timer · 2022-07-11T04:14:59Z

Queued bb4f8945315c007e9dddba9b61c64f37eeb201c1 with parent e1b348f, future comparison URL.

rust-timer · 2022-07-11T07:35:48Z

Finished benchmarking commit (bb4f8945315c007e9dddba9b61c64f37eeb201c1): comparison url.

Instruction count

Primary benchmarks: no relevant changes found
Secondary benchmarks: 🎉 relevant improvements found

	mean¹	max	count²
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	N/A	N/A	0
Improvements 🎉 (primary)	N/A	N/A	0
Improvements 🎉 (secondary)	-1.7%	-1.9%	4
All 😿🎉 (primary)	N/A	N/A	0

Max RSS (memory usage)

Results

Primary benchmarks: mixed results
Secondary benchmarks: 😿 relevant regressions found

	mean¹	max	count²
Regressions 😿 (primary)	2.4%	2.4%	1
Regressions 😿 (secondary)	2.2%	2.4%	2
Improvements 🎉 (primary)	-4.7%	-4.7%	1
Improvements 🎉 (secondary)	N/A	N/A	0
All 😿🎉 (primary)	-1.1%	-4.7%	2

Cycles

Results

Primary benchmarks: mixed results
Secondary benchmarks: 😿 relevant regression found

	mean¹	max	count²
Regressions 😿 (primary)	3.4%	3.4%	1
Regressions 😿 (secondary)	4.4%	4.4%	1
Improvements 🎉 (primary)	-3.4%	-3.4%	1
Improvements 🎉 (secondary)	N/A	N/A	0
All 😿🎉 (primary)	-0.0%	-3.4%	2

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

the arithmetic mean of the percent change ↩ ↩² ↩³
number of relevant changes ↩ ↩² ↩³

CAD97 · 2022-07-11T08:08:22Z

Not surprised, but this shows as an improvement on ctfe stress tests and neutral everywhere else. That makes it potentially worth it to merge.

scottmcm · 2022-07-11T18:55:34Z

@CAD97 If this had recovered a bunch of the primary regressions from 95295 I'd have just merged it immediately, but since it's a more nuanced win, would you mind trying the deduplicated version? I'd rather not have to repeat the code if we didn't have to, and the deduplicated version should still save CTFE work by not having to check the align.

library/core/src/alloc/layout.rs

erikdesjardins · 2022-07-11T21:34:14Z

The CTFE perf win is probably spurious, it's been flaky lately.

For example, the cachegrind diff of ctfe-stress-5 check full from this PR looks like:

 -247,721,975  ???:<rustc_middle::ty::subst::SubstFolder as rustc_middle::ty::fold::TypeFolder>::fold_ty
  -44,860,875  ???:<rustc_middle::ty::subst::SubstFolder as rustc_middle::ty::fold::FallibleTypeFolder>::try_fold_ty
  -28,327,968  ???:<rustc_middle::ty::subst::SubstFolder as rustc_middle::ty::fold::TypeFolder>::fold_const
  -25,453,103  ???:<rustc_const_eval::interpret::eval_context::InterpCx<rustc_const_eval::const_eval::machine::CompileTimeInterpreter>>::run
   14,162,786  ???:<rustc_middle::ty::Ty as rustc_middle::ty::fold::TypeFoldable>::try_fold_with::<rustc_middle::ty::subst::SubstFolder>
   10,906,079  ???:<rustc_middle::ty::instance::Instance>::resolve_opt_const_arg
   -2,359,073  ???:<rustc_middle::ty::normalize_erasing_regions::TryNormalizeAfterErasingRegionsFolder as rustc_middle::ty::fold::FallibleTypeFolder>::try_fold_mir_const
   -1,477,147  ???:<rustc_middle::ty::Ty>::fn_sig
   -1,468,108  ???:<rustc_const_eval::interpret::eval_context::InterpCx<rustc_const_eval::const_eval::machine::CompileTimeInterpreter>>::eval_fn_call
      724,860  ???:<rustc_middle::ty::Ty as rustc_middle::ty::fold::TypeSuperFoldable>::super_fold_with::<rustc_middle::ty::subst::SubstFolder>
      366,988  ???:<rustc_const_eval::const_eval::machine::CompileTimeInterpreter as rustc_const_eval::interpret::machine::Machine>::find_mir_or_eval_fn

And the cachegrind diff of ctfe-stress-5 check full from #97841 (comment), a PR that shouldn't have impacted CTFE at all (since it just inlines a non-const Windows-only function), is pretty close to the opposite of that:

247,716,419  ???:<rustc_middle::ty::subst::SubstFolder as rustc_middle::ty::fold::TypeFolder>::fold_ty
 44,860,867  ???:<rustc_middle::ty::subst::SubstFolder as rustc_middle::ty::fold::FallibleTypeFolder>::try_fold_ty
 28,327,968  ???:<rustc_middle::ty::subst::SubstFolder as rustc_middle::ty::fold::TypeFolder>::fold_const
-14,162,786  ???:<rustc_middle::ty::Ty as rustc_middle::ty::fold::TypeFoldable>::try_fold_with::<rustc_middle::ty::subst::SubstFolder>
-10,895,378  ???:<rustc_middle::ty::instance::Instance>::resolve_opt_const_arg
 10,609,061  ???:<rustc_const_eval::interpret::eval_context::InterpCx<rustc_const_eval::const_eval::machine::CompileTimeInterpreter>>::run
  2,215,876  ???:<rustc_middle::ty::Ty>::fn_sig
  1,703,968  ???:<rustc_middle::ty::normalize_erasing_regions::TryNormalizeAfterErasingRegionsFolder as rustc_middle::ty::fold::FallibleTypeFolder>::try_fold_mir_const
  1,176,734  ???:<rustc_infer::infer::InferCtxt>::commit_unconditionally::<rustc_middle::traits::ImplSourceUserDefinedData<rustc_infer::traits::Obligation<rustc_middle::ty::Predicate>>, <rustc_trait_selection::traits::select::SelectionContext>::confirm_impl_candidate::{closure
  1,100,996  ???:<rustc_const_eval::interpret::eval_context::InterpCx<rustc_const_eval::const_eval::machine::CompileTimeInterpreter>>::eval_place
   -817,906  ???:<rustc_trait_selection::traits::select::SelectionContext>::rematch_impl
   -721,179  ???:<rustc_middle::ty::Ty as rustc_middle::ty::fold::TypeSuperFoldable>::super_fold_with::<rustc_middle::ty::subst::SubstFolder>
   -688,130  ???:<rustc_const_eval::interpret::eval_context::InterpCx<rustc_const_eval::const_eval::machine::CompileTimeInterpreter>>::operand_projection
   -367,016  ???:<rustc_const_eval::interpret::eval_context::InterpCx<rustc_const_eval::const_eval::machine::CompileTimeInterpreter>>::mir_const_to_op
    365,443  ???:<core::iter::adapters::GenericShunt<core::iter::adapters::flatten::FlatMap<core::iter::adapters::map::Map<alloc::vec::into_iter::IntoIter<rustc_middle::traits::select::SelectionCandidate>, <rustc_trait_selection::traits::select::SelectionContext>::candidate_from_obligation_no_cache::{closure
   -325,479  ???:<rustc_trait_selection::traits::select::SelectionContext>::vtable_impl

CAD97 · 2022-07-11T21:49:13Z

(sigill was me messing up and inserting a wrong assert_unsafe_precondition! whoops)

bors · 2022-07-12T00:33:47Z

☀️ Try build successful - checks-actions
Build commit: 95ff54ef0b0a45df48b6f4150236ce2faa6c4ae4 (95ff54ef0b0a45df48b6f4150236ce2faa6c4ae4)

rust-timer · 2022-07-12T00:33:49Z

Queued 95ff54ef0b0a45df48b6f4150236ce2faa6c4ae4 with parent 38b7215, future comparison URL.

rust-timer · 2022-07-12T04:10:32Z

Finished benchmarking commit (95ff54ef0b0a45df48b6f4150236ce2faa6c4ae4): comparison url.

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

Primary benchmarks: 😿 relevant regression found
Secondary benchmarks: mixed results

	mean¹	max	count²
Regressions 😿 (primary)	0.5%	0.5%	1
Regressions 😿 (secondary)	2.6%	3.2%	2
Improvements 🎉 (primary)	N/A	N/A	0
Improvements 🎉 (secondary)	-4.0%	-4.0%	1
All 😿🎉 (primary)	0.5%	0.5%	1

Cycles

Results

Primary benchmarks: no relevant changes found
Secondary benchmarks: 🎉 relevant improvements found

	mean¹	max	count²
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	N/A	N/A	0
Improvements 🎉 (primary)	N/A	N/A	0
Improvements 🎉 (secondary)	-5.4%	-8.7%	2
All 😿🎉 (primary)	N/A	N/A	0

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

the arithmetic mean of the percent change ↩ ↩²
number of relevant changes ↩ ↩²

CAD97 · 2022-07-12T04:46:33Z

So this is pretty clearly perf-nuetral. To that end, it seems better to just use the safe constructor everywhere, but I'm going to hold back from making that call.

At a minimum, having the smaller codepath might make it easier to get other perf gains, but honestly who knows. Intuition doesn't work in this file.

scottmcm · 2022-07-12T06:31:42Z

I think it absolutely makes sense to do this -- I might not have wanted to do it for a new unsafe constructor or a bunch of repetitive code, but because the type system means we can skip the check naturally, I think it's worth reducing the amount of work these things need to do at runtime even if it doesn't show up in rustc's perf.

@bors r+ rollup=iffy (perf came back neutral, but it's still a file that can be highly impactful)

bors · 2022-07-12T06:31:44Z

📌 Commit 1169490 has been approved by scottmcm

It is now in the queue for this repository.

bors · 2022-07-13T21:01:24Z

⌛ Testing commit 1169490 with merge 87588a2...

bors · 2022-07-13T23:41:59Z

☀️ Test successful - checks-actions
Approved by: scottmcm
Pushing 87588a2 to master...

rust-timer · 2022-07-14T00:58:19Z

Finished benchmarking commit (87588a2): comparison url.

Instruction count

Primary benchmarks: no relevant changes found
Secondary benchmarks: 🎉 relevant improvement found

	mean¹	max	count²
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	N/A	N/A	0
Improvements 🎉 (primary)	N/A	N/A	0
Improvements 🎉 (secondary)	-0.3%	-0.3%	1
All 😿🎉 (primary)	N/A	N/A	0

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

Results

Primary benchmarks: 🎉 relevant improvements found
Secondary benchmarks: mixed results

	mean¹	max	count²
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	3.7%	3.7%	1
Improvements 🎉 (primary)	-2.8%	-3.4%	2
Improvements 🎉 (secondary)	-3.3%	-3.3%	1
All 😿🎉 (primary)	-2.8%	-3.4%	2

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

the arithmetic mean of the percent change ↩ ↩²
number of relevant changes ↩ ↩²

…oshtriplett Reoptimize layout array This way it's one check instead of two, so hopefully (cc rust-lang#99117) it'll be simpler for rustc perf too 🤞 Quick demonstration: ```rust pub fn demo(n: usize) -> Option<Layout> { Layout::array::<i32>(n).ok() } ``` Nightly: <https://play.rust-lang.org/?version=nightly&mode=release&edition=2021&gist=e97bf33508aa03f38968101cdeb5322d> ```nasm mov rax, rdi mov ecx, 4 mul rcx seto cl movabs rdx, 9223372036854775805 xor esi, esi cmp rax, rdx setb sil shl rsi, 2 xor edx, edx test cl, cl cmove rdx, rsi ret ``` This PR (note no `mul`, in addition to being much shorter): ```nasm xor edx, edx lea rax, [4*rcx] shr rcx, 61 sete dl shl rdx, 2 ret ``` This is built atop `@CAD97` 's rust-lang#99136; the new changes are cb8aba66ef6a0e17f08a0574e4820653e31b45a0. I added a bunch more tests for `Layout::from_size_align` and `Layout::array` too.

This reverts commit 87588a2, reversing changes made to c80dde4.

[stable] Prepare 1.64.0 release This PR prepares the 1.64.0 stable release builds. In addition to bumping the channel and including the latest release notes changes, this PR also backports the following PRs: * rust-lang#100852 * rust-lang#101366 * rust-lang#101468 * rust-lang#101922 This PR also reverts the following PRs, as decided in rust-lang#101899 (comment): * rust-lang#95295 * rust-lang#99136 (followup to the previous PR) r? `@ghost` cc `@rust-lang/release`

rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Jul 10, 2022

This comment was marked as resolved.

Sign in to view

rust-highfive assigned Mark-Simulacrum Jul 10, 2022

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 10, 2022

Take advantage of known-valid-align in layout.rs

079d3eb

CAD97 force-pushed the layout-faster branch from e44eb57 to 079d3eb Compare July 11, 2022 00:35

scottmcm reviewed Jul 11, 2022

View reviewed changes

library/core/src/alloc/layout.rs Show resolved Hide resolved

scottmcm reviewed Jul 11, 2022

View reviewed changes

library/core/src/mem/valid_align.rs Show resolved Hide resolved

scottmcm reviewed Jul 11, 2022

View reviewed changes

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 11, 2022

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 11, 2022

lqd reviewed Jul 11, 2022

View reviewed changes

library/core/src/alloc/layout.rs Outdated Show resolved Hide resolved

This comment has been minimized.

Sign in to view

CAD97 force-pushed the layout-faster branch 2 times, most recently from 1c37e0e to 2d84703 Compare July 11, 2022 21:49

CAD97 force-pushed the layout-faster branch from 2d84703 to 24ad0f2 Compare July 11, 2022 21:54

Remove duplication of layout size check

1169490

lqd mentioned this pull request Jul 12, 2022

Utilize PGO for windows x64 rustc dist builds #96978

Merged

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 12, 2022

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 12, 2022

scottmcm mentioned this pull request Jul 12, 2022

Reoptimize layout array #99174

Merged

bors added the merged-by-bors This PR was explicitly merged by bors. label Jul 13, 2022

bors merged commit 87588a2 into rust-lang:master Jul 13, 2022

rustbot added this to the 1.64.0 milestone Jul 13, 2022

bors mentioned this pull request Jul 13, 2022

Make functions part of alloc_layout_extra (#55724) const. #99016

Closed

CAD97 deleted the layout-faster branch July 14, 2022 00:00

This was referenced Sep 19, 2022

[stable] Prepare 1.64.0 release #102018

Merged

#95295 causes unsoundness in multiple existing crates #101899

Closed

bors pushed a commit to rust-lang-ci/rust that referenced this pull request Sep 19, 2022

Revert "Auto merge of rust-lang#99136 - CAD97:layout-faster, r=scottmcm"

1d0597c

This reverts commit 87588a2, reversing changes made to c80dde4.

saethlin mentioned this pull request Oct 21, 2022

Even nicer errors from assert_unsafe_precondition #103035

Merged

jyn514 mentioned this pull request Apr 15, 2023

Perf: investigate manual inlining in layout.rs #99117

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Take advantage of known-valid-align in layout.rs #99136

Take advantage of known-valid-align in layout.rs #99136

CAD97 commented Jul 10, 2022

This comment was marked as resolved.

rust-highfive commented Jul 10, 2022

CAD97 commented Jul 11, 2022

scottmcm Jul 11, 2022

CAD97 Jul 11, 2022

scottmcm commented Jul 11, 2022

rust-timer commented Jul 11, 2022

bors commented Jul 11, 2022

bors commented Jul 11, 2022

rust-timer commented Jul 11, 2022

rust-timer commented Jul 11, 2022

CAD97 commented Jul 11, 2022

scottmcm commented Jul 11, 2022

erikdesjardins commented Jul 11, 2022

This comment has been minimized.

CAD97 commented Jul 11, 2022

bors commented Jul 12, 2022

rust-timer commented Jul 12, 2022

rust-timer commented Jul 12, 2022

CAD97 commented Jul 12, 2022

scottmcm commented Jul 12, 2022

bors commented Jul 12, 2022

bors commented Jul 13, 2022

bors commented Jul 13, 2022

rust-timer commented Jul 14, 2022

Take advantage of known-valid-align in layout.rs #99136

Take advantage of known-valid-align in layout.rs #99136

Conversation

CAD97 commented Jul 10, 2022

This comment was marked as resolved.

rust-highfive commented Jul 10, 2022

CAD97 commented Jul 11, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

scottmcm commented Jul 11, 2022

rust-timer commented Jul 11, 2022

bors commented Jul 11, 2022

bors commented Jul 11, 2022

rust-timer commented Jul 11, 2022

rust-timer commented Jul 11, 2022

Footnotes

CAD97 commented Jul 11, 2022

scottmcm commented Jul 11, 2022

erikdesjardins commented Jul 11, 2022

This comment has been minimized.

CAD97 commented Jul 11, 2022

bors commented Jul 12, 2022

rust-timer commented Jul 12, 2022

rust-timer commented Jul 12, 2022

Footnotes

CAD97 commented Jul 12, 2022

scottmcm commented Jul 12, 2022

bors commented Jul 12, 2022

bors commented Jul 13, 2022

bors commented Jul 13, 2022

rust-timer commented Jul 14, 2022

Footnotes