Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Between opt-level=1 and opt-level=2, LLVM deoptimizes the output of ptr::addr, if it ptr::addr is #[inline] #103285

Closed
saethlin opened this issue Oct 20, 2022 · 3 comments · Fixed by #103299
Assignees
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. P-medium Medium priority regression-from-stable-to-beta Performance or correctness regression from stable to beta. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@saethlin
Copy link
Member

godbolt demo: https://godbolt.org/z/51o3hYjvv

This code:

#[inline]
fn inline_addr<T>(ptr: *const T) -> usize {
    unsafe {
        core::mem::transmute(ptr)
    }
}

pub fn inline_is_nonoverlapping(src: *const u8, dst: *const u8) -> usize {
    let src_usize = inline_addr(src);
    let dst_usize = inline_addr(dst);
    if src_usize > dst_usize { let _ = src_usize - dst_usize; }
    src_usize
}

At opt-level=2 compiles to this:

example::inline_is_nonoverlapping:
        cmp     rdi, rsi
        jbe     .LBB0_3
        jb      .LBB0_2
.LBB0_3:
        mov     rax, rdi
        ret
.LBB0_2:
        push    rax
        lea     rdi, [rip + str.0]
        lea     rdx, [rip + .L__unnamed_1]
        mov     esi, 33
        call    qword ptr [rip + core::panicking::panic@GOTPCREL]
        ud2

Decreasing to opt-level=1 or removing the #[inline] on the transmute wrapper function causes the panic path to go away.

As far as I can tell, this doesn't happen on stable but it happens on beta and nightly. This code pattern is used in the standard library here:

pub(crate) fn is_nonoverlapping<T>(src: *const T, dst: *const T, count: usize) -> bool {
let src_usize = src.addr();
let dst_usize = dst.addr();
let size = mem::size_of::<T>().checked_mul(count).unwrap();
let diff = if src_usize > dst_usize { src_usize - dst_usize } else { dst_usize - src_usize };

@rustbot label +A-LLVM +regression-from-stable-to-beta

@rustbot rustbot added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. regression-from-stable-to-beta Performance or correctness regression from stable to beta. I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Oct 20, 2022
@nikic
Copy link
Contributor

nikic commented Oct 20, 2022

Works fine with LLVM 16: https://llvm.godbolt.org/z/67WPE19xa Presumably due to llvm/llvm-project@926e731.

Assigning to myself to check back on this after the next LLVM update.

@nikic
Copy link
Contributor

nikic commented Oct 20, 2022

Eh, we might as well fix this on the rustc side: #103299

@apiraino apiraino added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Oct 20, 2022
@apiraino
Copy link
Contributor

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-medium

@rustbot rustbot added P-medium Medium priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Oct 20, 2022
bors added a commit to rust-lang-ci/rust that referenced this issue Oct 30, 2022
Don't use usub.with.overflow intrinsic

The canonical form of a usub.with.overflow check in LLVM are separate sub + icmp instructions, rather than a usub.with.overflow intrinsic. Using usub.with.overflow will generally result in worse optimization potential.

The backend will attempt to form usub.with.overflow when it comes to actual instruction selection. This is not fully reliable, but I believe this is a better tradeoff than using the intrinsic in IR.

Fixes rust-lang#103285.
@bors bors closed this as completed in 7833012 Oct 30, 2022
Aaron1011 pushed a commit to Aaron1011/rust that referenced this issue Jan 6, 2023
Don't use usub.with.overflow intrinsic

The canonical form of a usub.with.overflow check in LLVM are separate sub + icmp instructions, rather than a usub.with.overflow intrinsic. Using usub.with.overflow will generally result in worse optimization potential.

The backend will attempt to form usub.with.overflow when it comes to actual instruction selection. This is not fully reliable, but I believe this is a better tradeoff than using the intrinsic in IR.

Fixes rust-lang#103285.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. P-medium Medium priority regression-from-stable-to-beta Performance or correctness regression from stable to beta. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants