Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a faster allocation size check in slice::from_raw_parts #103287

Merged
merged 1 commit into from
Oct 26, 2022

Conversation

saethlin
Copy link
Member

I've been perusing through the codegen changes that result from turning on the standard library debug assertions. The previous check in here uses saturating arithmetic, which in my experience sometimes makes LLVM just fail to optimize things around the saturating operation.

Here is a demo of the codegen difference: https://godbolt.org/z/WMEqrjajW
Before:

example::len_check_old:
        mov     rax, rdi
        mov     ecx, 3
        mul     rcx
        setno   cl
        test    rax, rax
        setns   al
        and     al, cl
        ret

example::len_check_old:
        mov     rax, rdi
        mov     ecx, 8
        mul     rcx
        setno   cl
        test    rax, rax
        setns   al
        and     al, cl
        ret

After:

example::len_check_new:
        movabs  rax, 3074457345618258603
        cmp     rdi, rax
        setb    al
        ret

example::len_check_new:
        shr     rdi, 60
        sete    al
        ret

Running rustc-perf locally, this looks like up to a 4.5% improvement when debug-assertions-std = true.

Thanks @LegionMammal978 (I think that's you?) for turning my idea into a much cleaner implementation.

r? @thomcc

@rustbot rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Oct 20, 2022
@rustbot
Copy link
Collaborator

rustbot commented Oct 20, 2022

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 20, 2022
@thomcc
Copy link
Member

thomcc commented Oct 20, 2022

Thanks! Looks good.

This is only used in debug builds (of the stdlib) so no need for a perf run or restricting rollup (but no need to force it either, since maybe it could impact compile times for other unknown reasons).

@bors r+

@bors
Copy link
Contributor

bors commented Oct 20, 2022

📌 Commit cfcb0a2 has been approved by thomcc

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 20, 2022
Dylan-DPC added a commit to Dylan-DPC/rust that referenced this pull request Oct 25, 2022
Use a faster allocation size check in slice::from_raw_parts

I've been perusing through the codegen changes that result from turning on the standard library debug assertions. The previous check in here uses saturating arithmetic, which in my experience sometimes makes LLVM just fail to optimize things around the saturating operation.

Here is a demo of the codegen difference: https://godbolt.org/z/WMEqrjajW
Before:
```asm
example::len_check_old:
        mov     rax, rdi
        mov     ecx, 3
        mul     rcx
        setno   cl
        test    rax, rax
        setns   al
        and     al, cl
        ret

example::len_check_old:
        mov     rax, rdi
        mov     ecx, 8
        mul     rcx
        setno   cl
        test    rax, rax
        setns   al
        and     al, cl
        ret
```
After:
```asm
example::len_check_new:
        movabs  rax, 3074457345618258603
        cmp     rdi, rax
        setb    al
        ret

example::len_check_new:
        shr     rdi, 60
        sete    al
        ret
```

Running rustc-perf locally, this looks like up to a 4.5% improvement when `debug-assertions-std = true`.

Thanks `@LegionMammal978` (I think that's you?) for turning my idea into a much cleaner implementation.

r? `@thomcc`
Dylan-DPC added a commit to Dylan-DPC/rust that referenced this pull request Oct 25, 2022
Use a faster allocation size check in slice::from_raw_parts

I've been perusing through the codegen changes that result from turning on the standard library debug assertions. The previous check in here uses saturating arithmetic, which in my experience sometimes makes LLVM just fail to optimize things around the saturating operation.

Here is a demo of the codegen difference: https://godbolt.org/z/WMEqrjajW
Before:
```asm
example::len_check_old:
        mov     rax, rdi
        mov     ecx, 3
        mul     rcx
        setno   cl
        test    rax, rax
        setns   al
        and     al, cl
        ret

example::len_check_old:
        mov     rax, rdi
        mov     ecx, 8
        mul     rcx
        setno   cl
        test    rax, rax
        setns   al
        and     al, cl
        ret
```
After:
```asm
example::len_check_new:
        movabs  rax, 3074457345618258603
        cmp     rdi, rax
        setb    al
        ret

example::len_check_new:
        shr     rdi, 60
        sete    al
        ret
```

Running rustc-perf locally, this looks like up to a 4.5% improvement when `debug-assertions-std = true`.

Thanks ``@LegionMammal978`` (I think that's you?) for turning my idea into a much cleaner implementation.

r? ``@thomcc``
bors added a commit to rust-lang-ci/rust that referenced this pull request Oct 26, 2022
Rollup of 10 pull requests

Successful merges:

 - rust-lang#102951 (suggest type annotation for local statement initialed by ref expression)
 - rust-lang#103209 (Diagnostic derives: allow specifying multiple alternative suggestions)
 - rust-lang#103287 (Use a faster allocation size check in slice::from_raw_parts)
 - rust-lang#103416 (Name the `impl Trait` in region bound suggestions)
 - rust-lang#103430 (Workaround unstable stmt_expr_attributes for method receiver expressions)
 - rust-lang#103444 (Remove extra type error after missing semicolon error)
 - rust-lang#103520 (rustc_middle: Rearrange resolver outputs structures slightly)
 - rust-lang#103533 (Use &self instead of &mut self for cast methods)
 - rust-lang#103536 (Remove `rustc_driver::set_sigpipe_handler()`)
 - rust-lang#103542 (Pinning tests for some `macro_rules!` errors discussed in the lang meeting)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 8ed3a80 into rust-lang:master Oct 26, 2022
@rustbot rustbot added this to the 1.66.0 milestone Oct 26, 2022
@saethlin saethlin deleted the faster-len-check branch March 15, 2023 00:34
pietroalbini added a commit to ferrocene/sphinx-shared-resources that referenced this pull request Mar 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants