"died due to signal 11" in atomic::static_init libcore test on Android #49775

SimonSapin · 2018-04-08T05:47:32Z

The same test has failed on Travis in multiple PRs, including at least:

I can reproduce on some branches (but not master) by running RUST_TEST_THREADS=1 src/ci/docker/run.sh arm-android. The end of the output looks like:

test atomic::int_nand ... ok
test atomic::int_or ... ok
test atomic::int_xor ... ok
test atomic::static_init ... died due to signal 11
error: test failed, to rerun pass '--test coretests'


command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "test" "--target" "arm-linux-androideabi" "-j" "8" "--release" "--locked" "--features" "panic-unwind jemalloc backtrace" "--manifest-path" "/checkout/src/libstd/Cargo.toml" "-p" "core" "--"
expected success, got: exit code: 3


failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test --target arm-linux-androideabi

Which means the test process dies with “SIGSEGV Invalid memory reference” while executing this test:

static S_FALSE: AtomicBool = AtomicBool::new(false);
static S_TRUE: AtomicBool = AtomicBool::new(true);
static S_INT: AtomicIsize  = AtomicIsize::new(0);
static S_UINT: AtomicUsize = AtomicUsize::new(0);

#[test]
fn static_init() {
    assert!(!S_FALSE.load(SeqCst));
    assert!(S_TRUE.load(SeqCst));
    assert!(S_INT.load(SeqCst) == 0);
    assert!(S_UINT.load(SeqCst) == 0);
}

I haven’t figured out how to use gdb inside the Android emulator inside Docker to debug this further.

CC @alexcrichton, @rust-lang/release

The text was updated successfully, but these errors were encountered:

kennytm · 2018-04-08T07:53:39Z

Assume the error is first discovered in #49675 (based on 74abffe). The final PRs involved in that rollup are:

Rollup merge of remove erroneous error message when checking impl trait params #48709 - tinaun:issue48703, r=nikomatsakis
Rollup merge of Stabilize attributes on generic parameters #48851 - petrochenkov:genparattr, r=nikomatsakis
Rollup merge of AST: Give spans to all identifiers #49154 - petrochenkov:spident, r=eddyb
Rollup merge of Take the original extra-filename passed to a crate into account when resolving it as a dependency #49253 - chmanchester:probing_fix, r=alexcrichton
Rollup merge of RFC 2008 non-exhaustive enums/structs: Finishing Touches #49345 - davidtwco:issue-44109, r=nikomatsakis
Rollup merge of Small typeck refactorings #49419 - leodasvacas:small-typeck-refactorings, r=nik…
Rollup merge of Flush executables to disk after linkage #49432 - nabijaczleweli:master, r=michaelwoerister
Rollup merge of std: Minimize size of panicking on wasm #49488 - alexcrichton:small-wasm-panic, r=sfackler
Rollup merge of Add more vec![... ; n] optimizations #49496 - glandium:master, r=sfackler
Rollup merge of add a dist builder to build rust-std components for the THUMB targets #49563 - japaric:std-thumb, r=alexcrichton

It was reported that the error still appears locally, so one of the above would be the cause.

From #49669 (comment), we knew the cause should be already contained by d919eb671b5c6d3c68221831a8d1186f492ea4d4 (i.e. 48fa6f9 on master). 74abffe...48fa6f9 includes the following PRs:

*   48fa6f9631 Auto merge of #49696 - alexcrichton:rollup, r=alexcrichton
|\  
| * cd615e9863 Rollup merge of #49705 - alexcrichton:less-manifest-docs, r=kennytm
| * 4d239ab14e Rollup merge of #49697 - kennytm:name-every-builder, r=aturon
| * 83669ecc1f Rollup merge of #49621 - Nemo157:impl-unpin-for-pin, r=withoutboats
| * 71bf15c6e8 Rollup merge of #49686 - memoryleak47:typo, r=alexcrichton
| * e6947ecf4d Rollup merge of #49597 - alexcrichton:proc-macro-v2, r=petrochenkov
| * 72ac3ebf68 Rollup merge of #49497 - scalexm:hrtb, r=nikomatsakis
| * 46492ffabd Rollup merge of #49350 - abonander:macros-in-extern, r=petrochenkov
| * b0bd9a771e Rollup merge of #49045 - Zoxc:tls, r=michaelwoerister
* | 7222241e7c Auto merge of #49045 - Zoxc:tls, r=michaelwoerister
|/  
*   56714acc5e Auto merge of #49684 - kennytm:rollup, r=kennytm
|\  
| * f4511e2437 Rollup merge of #49674 - alexcrichton:no-incremental-rustc, r=michaelwoerister
| * d05009bfa1 Rollup merge of #49667 - Manishearth:preview-features, r=nikomatsakis
| * 84f6440d79 Rollup merge of #49654 - davidtwco:issue-29893, r=alexcrichton
| * b146e33518 Rollup merge of #49563 - japaric:std-thumb, r=alexcrichton
| * 23689cc8e9 Rollup merge of #49496 - glandium:master, r=sfackler
| * 19c69082f5 Rollup merge of #49432 - nabijaczleweli:master, r=michaelwoerister
| * a70f844012 Rollup merge of #49345 - davidtwco:issue-44109, r=nikomatsakis
| * 46d0befb8e Rollup merge of #49253 - chmanchester:probing_fix, r=alexcrichton
| * 920249abdd Rollup merge of #48658 - llogiq:no-more-cas, r=kennytm
* | 01d0be9925 Auto merge of #48851 - petrochenkov:genparattr, r=nikomatsakis
|/  
* 4bf76d6745 Auto merge of #48709 - tinaun:issue48703, r=nikomatsakis
* 3b1fa867f2 Auto merge of #49587 - Bobo1239:master, r=nrc

PRs in both lists are:

remove erroneous error message when checking impl trait params #48709, Stabilize attributes on generic parameters #48851, Take the original extra-filename passed to a crate into account when resolving it as a dependency #49253, RFC 2008 non-exhaustive enums/structs: Finishing Touches #49345, Flush executables to disk after linkage #49432, Add more vec![... ; n] optimizations #49496, add a dist builder to build rust-std components for the THUMB targets #49563

alexcrichton · 2018-04-09T13:00:11Z

Yes I've been able to reproduce this locally and it seems to be unrelated to changes that we're landing and otherwise just related to how libcore or libstd itself is optimized. I've attempted to debug this to get an actual cause but to no avail so far.

alexcrichton · 2018-04-09T13:00:55Z

I'm pretty certain though that this isn't spurious in the sense that @bors: retry would do much, it seems deterministic based on changes in libstd/libcore. The changes, however, don't feel at all related to the bug still...

alexcrichton · 2018-04-09T18:38:57Z

Ok I think I've found the cause of this. I've got no idea how this ever worked before! (as is par for the course on these kinds of bugs)

So a program like this:

use std::sync::atomic::*;
use std::sync::atomic::Ordering::SeqCst;

static FOO: AtomicIsize  = AtomicIsize::new(0);

fn main() {
    FOO.load(SeqCst);
}

generates roughly this IR:

@FOO = internal unnamed_addr global <{ [4 x i8] }> zeroinitializer, align 4

define void @main() {
  %a = load atomic i32, i32* bitcast (<{ [4 x i8] }>* @FOO to i32*) seq_cst, align 4
  ret void
}

Now the crucial thing here is that we don't actually modify @FOO in LLVM, so when run through optimizations it determines:

@FOO = internal unnamed_addr constant <{ [4 x i8] }> zeroinitializer, align 4

Which sounds reasonable! As a result, our constant was promoted from BSS to read-only data. We can see this:

$ nm -a without-optimizations | grep INT
00064124 b _ZN3lib5S_INT17hb5c694457a35af85E
$ nm -a with-optimizations | grep INT
00058600 r _ZN3lib5S_INT17hb5c694457a35af85E

Here the b means bss and r means rodata.

Now the tricky part comes later during the actual codegen. Here if you disassemble the function we get:

00008554 <main>:
    8554:       e92d4800        push    {fp, lr}
    8558:       e59f000c        ldr     r0, [pc, #12]   ; 856c <main+0x18>
    855c:       e3a01000        mov     r1, #0
    8560:       e3a02000        mov     r2, #0
    8564:       eb0002ef        bl      9128 <__sync_val_compare_and_swap_4>
    8568:       e8bd8800        pop     {fp, pc}
    856c:       0000aadc        .word   0x0000aadc

So it turns out we're not using atomic instructions at all, but rather we're using the __sync_val_compare_and_swap_4 intrinsic function. On Android this is loaded from libgcc, and the definition bottoms out in the __kuser_cmpxchg function provided by the kernel. This kernel helper presumably does syscall stuff if necessary, but on the emulator the kernel apparently discovered that the hardware supports atomic instructions so it's using atomic instructions! Notably gdb says that the faulting instruction is:

(gdb) disas $pc-12,$pc+12
Dump of assembler code from 0xffff0fc0 to 0xffff0fd4:
   0xffff0fc0:  ldrex   r3, [r2]
   0xffff0fc4:  subs    r3, r3, r0
=> 0xffff0fc8:  strexeq r3, r1, [r2]
   0xffff0fcc:  teqeq   r3, #1
   0xffff0fd0:  beq     0xffff0fc0
End of assembler dump.

Turns out r2 is the same as the address to our static which is in rodata. Now that we're in rodata this ends up failing with a fault because we can't actually modify rodata.

So it looks like tl;dr; LLVM will aggressively flag unmodified internal globals as constant (yay!) which allocates them in rodata instead of bss. In this case, however, the intrinsic operation that we lowered down to ended up actually doing a write which LLVM didn't know about making that an invalid optimization.

Given that we don't actually have many instances of atomics without modifying them I'm just gonna fix the test and call it a day...

alexcrichton · 2018-04-09T18:42:03Z

As to how this ever worked I'm sort of msytified. I'm just gonna assume that all rodata doesn't actually end up being read-only for various reasons in the loader and we got lucky before where our constants here just happened to wind up in a read/write section of memory

alexcrichton · 2018-04-09T18:42:21Z

I've posted a workaround at #49811

cuviper · 2018-04-09T18:43:42Z

In this case, however, the intrinsic operation that we lowered down to ended up actually doing a write which LLVM didn't know about making that an invalid optimization.

This sounds like something that still needs to be fixed!

BatmanAoD · 2018-04-09T18:54:56Z

On Android this is loaded from libgcc, and the definition bottoms out in the __kuser_cmpxchg function provided by the kernel. This kernel helper presumably does syscall stuff if necessary, but on the emulator the kernel apparently discovered that the hardware supports atomic instructions so it's using atomic instructions!

Since we have no control over Android's libgcc, is it possible that the implementation on an actual Android device might be updated in the future to use real atomic instructions, thus triggering the bug on that device as well as in the emulator?

See rust-lang#49775 for some more information but it looks like this is working around an LLVM bug for the time being. Closes rust-lang#49775

alexcrichton · 2018-04-09T20:10:18Z

@cuviper indeed! I've opened https://bugs.llvm.org/show_bug.cgi?id=37061

alexcrichton · 2018-04-09T20:15:25Z

@BatmanAoD hm I'm not sure I quite follow? It looks like our vanilla arm-linux-androideabi is targeting such an old version of ARM/Android that it requires usage of intrinsics for atomic operations. The implementation of the intrinsics are behaving differently than what LLVM assumes, aka loading from an atomic is actually doing a cmpxchg to learn the value. The actual write happens deep within the intrinsic inside of the kernel-provided __kuser_cmpxchg function, and I'm sort of just assuming that the kernel is doing runtime detection to see if atomics are actually available and filling in an appropriate implementaiton. If a kernel implementation of the cmpxchg intrinsic were used I'm not sure if it'd also deliver a fault.

If the code is recompiled with support for atomic instructions then I don't think any fault will happen as LLVM's assumption that the static is never modified will actually be true (as it's generating the atomic instructions).

Does that answer your question?

std: Be sure to modify atomics in tests See #49775 for some more information but it looks like this is working around an LLVM bug for the time being. Closes #49775

BatmanAoD · 2018-04-10T00:47:36Z

@alexcrichton Yes; I misunderstood or misread and somehow got the impression that libgcc was responsible for making the determination of whether to invoke the strexeq instruction.

SimonSapin added O-android Operating system: Android A-spurious Area: Spurious failures in builds (spuriously == for no apparent reason) T-release Relevant to the release subteam, which will review and decide on the PR/issue. labels Apr 8, 2018

SimonSapin mentioned this issue Apr 8, 2018

Merge the std_unicode crate into the core crate #49698

Merged

kennytm mentioned this issue Apr 8, 2018

std: Minimize size of panicking on wasm #49488

Merged

alexcrichton mentioned this issue Apr 9, 2018

Add GlobalAlloc trait + tweaks for initial stabilization #49669

Merged

alexcrichton mentioned this issue Apr 9, 2018

std: Be sure to modify atomics in tests #49811

Merged

alexcrichton added a commit to alexcrichton/rust that referenced this issue Apr 9, 2018

std: Be sure to modify atomics in tests

69c3830

See rust-lang#49775 for some more information but it looks like this is working around an LLVM bug for the time being. Closes rust-lang#49775

bors added a commit that referenced this issue Apr 9, 2018

Auto merge of #49811 - alexcrichton:fix-android, r=SimonSapin

880fbd7

std: Be sure to modify atomics in tests See #49775 for some more information but it looks like this is working around an LLVM bug for the time being. Closes #49775

bors closed this as completed in #49811 Apr 10, 2018

Mark-Simulacrum mentioned this issue Apr 24, 2018

Add repeat method on slice #48999

Merged

kennytm mentioned this issue Nov 11, 2018

"died due to signal 11" in collectionstests on arm-android #55861

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

"died due to signal 11" in atomic::static_init libcore test on Android #49775

"died due to signal 11" in atomic::static_init libcore test on Android #49775

SimonSapin commented Apr 8, 2018

kennytm commented Apr 8, 2018

alexcrichton commented Apr 9, 2018

alexcrichton commented Apr 9, 2018

alexcrichton commented Apr 9, 2018

alexcrichton commented Apr 9, 2018

alexcrichton commented Apr 9, 2018

cuviper commented Apr 9, 2018

BatmanAoD commented Apr 9, 2018

alexcrichton commented Apr 9, 2018

alexcrichton commented Apr 9, 2018

BatmanAoD commented Apr 10, 2018

"died due to signal 11" in atomic::static_init libcore test on Android #49775

"died due to signal 11" in atomic::static_init libcore test on Android #49775

Comments

SimonSapin commented Apr 8, 2018

kennytm commented Apr 8, 2018

alexcrichton commented Apr 9, 2018

alexcrichton commented Apr 9, 2018

alexcrichton commented Apr 9, 2018

alexcrichton commented Apr 9, 2018

alexcrichton commented Apr 9, 2018

cuviper commented Apr 9, 2018

BatmanAoD commented Apr 9, 2018

alexcrichton commented Apr 9, 2018

alexcrichton commented Apr 9, 2018

BatmanAoD commented Apr 10, 2018