NVPTX: non-inlined functions can't be used cross crate #38787

japaric · 2017-01-02T23:41:36Z

Or all functions should be "inlined" for the nvptx targets

STR

$ cargo new --lib lib

$ edit lib/src/lib.rs && cat $_

#![no_std]

pub fn foo() -> i32 {
    42
}

$ cargo new --lib kernel && cd $_

$ edit src/lib.rs && cat $_

#![no_std]

extern crate lib;

fn bar() -> i32 {
    lib::foo()
}

$ edit Cargo.toml && tail -n5 $_

[dependencies]
lib = { path = "../lib" }

[profile.dev]
debug = false  # cf. rust-lang/rust#38785

$ edit nvptx64-nvidia-cuda.json && cat $_

{
  "arch": "nvptx64",
  "cpu": "sm_20",
  "data-layout": "e-i64:64-v16:16-v32:32-n16:32:64",
  "llvm-target": "nvptx64-nvidia-cuda",
  "max-atomic-width": 0,
  "os": "cuda",
  "panic-strategy": "abort",
  "target-endian": "little",
  "target-pointer-width": "64"
}

$ cargo install xargo --vers 0.3.0 || true

$ xargo rustc --target nvptx64-nvidia-cuda -- --emit=asm

$ cat $(find target/nvptx64-nvidia-cuda/debug -name '*.s')
.version 3.2
.target sm_20
.address_size 64

.extern .func  (.param .b32 func_retval0) _ZN3lib3foo17h3feefc42e145764bE
()
;

.func  (.param .b32 func_retval0) _ZN6kernel3bar17h7b762ababaef9f36E()
{
        .reg .s32       %r<2>;

        bra.uni         LBB0_1;
LBB0_1:
        { // callseq 0
        .reg .b32 temp_param_reg;
        .param .b32 retval0;
        call.uni (retval0),
        _ZN3lib3foo17h3feefc42e145764bE,
        (
        );
        ld.param.b32    %r1, [retval0+0];
        } // callseq 0
        bra.uni         LBB0_2;
LBB0_2:
        st.param.b32    [func_retval0+0], %r1;
        ret;
}

Note that foo appears as undefined (.extern .func) in the PTX module.

Workaround

Mark lib::foo as #[inline]. Then you'll get the translation of foo in the final PTX module:

.version 3.2
.target sm_20
.address_size 64


.func  (.param .b32 func_retval0) _ZN3lib3foo17h3feefc42e145764bE()
{
        .reg .s32       %r<2>;

        bra.uni         LBB0_1;
LBB0_1:
        mov.u32         %r1, 42;
        st.param.b32    [func_retval0+0], %r1;
        ret;
}

.func  (.param .b32 func_retval0) _ZN6kernel3bar17h7b762ababaef9f36E()
{
        .reg .s32       %r<2>;

        bra.uni         LBB1_1;
LBB1_1:
        { // callseq 0
        .reg .b32 temp_param_reg;
        .param .b32 retval0;
        call.uni (retval0),
        _ZN3lib3foo17h3feefc42e145764bE,
        (
        );
        ld.param.b32    %r1, [retval0+0];
        } // callseq 0
        bra.uni         LBB1_2;
LBB1_2:
        st.param.b32    [func_retval0+0], %r1;
        ret;
}
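
For reference, a minimal sketch of the workaround applied to lib/src/lib.rs from the STR. The crate-level #![no_std] attribute is assumed to stay at the top of the file; it is left out of the snippet so the snippet also builds for a host target.

```rust
// lib/src/lib.rs, with the workaround applied.
// (In the STR this file also starts with `#![no_std]`.)

// `#[inline]` makes the body of `foo` available to downstream crates, so
// `kernel` emits its own copy of the function into its PTX module instead
// of an unresolved `.extern .func` declaration.
#[inline]
pub fn foo() -> i32 {
    42
}
```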


michaelwoerister · 2017-01-03T14:44:57Z

This is another problem that would be solved by MIR-only rlibs.

japaric · 2017-01-03T15:20:09Z

@michaelwoerister Indeed! Is there an issue tracking progress on that? Or do you know what the status is? (Is it blocked by anything?)

michaelwoerister · 2017-01-03T16:51:54Z

I don't think we have a tracking issue for that. We might need an RFC for it too.
It is kind of blocked on incremental compilation, since compile times for some kinds of projects (with lots of machine code in upstream crates) would probably regress. Incremental compilation would take care of that in a quite natural way.

denzp · 2017-06-05T10:20:25Z

Looks like we need to link the libs, but for me --emit asm isn't the right way to do it. I would expect it to generate assembly for the current crate only.

Could it be an option to introduce a new linker that behaves like llvm-link + llc? Then we can use obj_is_bitcode = true and feed the bitcode to llvm-link. After that, we call llc to produce PTX assembly.

Of course, we don't have to call these two commands from the host environment; we could just call LLVM through FFI.

japaric · 2017-06-06T01:18:21Z

@denzp Sounds like a plan to me. It also sounds like this can be prototyped out of tree without modifying the compiler. If it works then it sounds like we should land the llvm-link + llc approach in rustc because using external commands could fail if the external llvm-link / llc doesn't match rustc's LLVM version.
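
The llvm-link + llc flow discussed above could be prototyped out of tree roughly as follows. This is only a sketch under stated assumptions: the helper names `llvm_link_cmd` and `llc_cmd` and the file names are made up for illustration, both tools are assumed to be on PATH and to match rustc's LLVM version, and the bitcode inputs are assumed to have been produced separately (for example with --emit=llvm-bc).

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: builds the `llvm-link` invocation that merges the
/// bitcode of the kernel crate and its dependencies into a single module.
fn llvm_link_cmd(inputs: &[&Path], merged: &Path) -> Command {
    let mut cmd = Command::new("llvm-link");
    cmd.args(inputs).arg("-o").arg(merged);
    cmd
}

/// Hypothetical helper: builds the `llc` invocation that lowers the merged
/// module to PTX assembly. `sm_20` mirrors the "cpu" field of the target
/// spec in the STR.
fn llc_cmd(merged: &Path, ptx: &Path) -> Command {
    let mut cmd = Command::new("llc");
    cmd.arg("-mcpu=sm_20").arg(merged).arg("-o").arg(ptx);
    cmd
}
```

Calling `.status()` on the two commands in turn would run the pipeline. Because the merged module contains the bodies of both crates, a call such as lib::foo would no longer need to be left as an `.extern .func` declaration.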

NVPTX target specification

This change adds a built-in `nvptx64-nvidia-cuda` GPGPU no-std target specification and basic PTX assembly smoke tests.

The approach taken here and the target spec are based on `ptx-linker`, a project started about 1.5 years ago. Key feature: bitcode object files are linked with LTO into the final module on the linker's side.

Prior to this change, the linker used the `ld` linker flavor, but I think having a special CLI convention is more reliable.

Questions about further progress on a reliable CUDA workflow with Rust:

1. Is it possible to create a `codegen-asm` test suite to verify end-to-end integration with the LLVM backend?
2. What is the better way to organise no-std `compile-fail` tests: add `#![no_std]` where possible and mark the others with an `ignore-nvptx` directive, or introduce a `compile-fail-no-std` test suite?
3. Can `ptx-linker` eventually be integrated the way `rls` or `clippy` are? Hopefully, this would allow linking statically against the LLVM used in Rust and getting rid of the [current hacky solution](https://github.com/denzp/rustc-llvm-proxy).
4. Am I missing some methods from `rustc_codegen_ssa::back::linker::Linker` that could be useful for bitcode-only linking?

Currently, there are no major public CUDA projects written in Rust that I'm aware of, but I expect that having a built-in target will create a solid foundation for further experiments and awesome crates.

Related to #38789
Fixes #38787
Fixes #38786

japaric added the O-NVPTX label (Target: the NVPTX LLVM backend for running rust on GPUs, https://llvm.org/docs/NVPTXUsage.html) Jan 2, 2017

japaric mentioned this issue Jan 2, 2017

NVPTX backend metabug #38789

Open

hanna-kruppe mentioned this issue Jan 3, 2017

Migrate wasm target to LLVM wasm backend #38804

Closed

michaelwoerister mentioned this issue Jan 8, 2017

Tracking issue for MIR-only RLIBs #38913

Closed

denzp mentioned this issue Jun 21, 2017

LLVM assertion when using i128 with the NVPTX #38824

Closed

Mark-Simulacrum added the C-bug label (Category: This is a bug.) Jul 26, 2017

denzp mentioned this issue Jan 27, 2019

NVPTX target specification #57937

Merged

bors closed this as completed in #57937 Feb 2, 2019
