NVPTX: non-inlined functions can't be used cross crate #38787

Closed
japaric opened this issue Jan 2, 2017 · 5 comments
Labels
C-bug Category: This is a bug. O-NVPTX Target: the NVPTX LLVM backend for running rust on GPUs, https://llvm.org/docs/NVPTXUsage.html

Comments

@japaric
Member

japaric commented Jan 2, 2017

Alternatively, all functions should be treated as "inlined" for the nvptx targets.

STR

$ cargo new --lib lib

$ edit lib/src/lib.rs && cat $_
#![no_std]

pub fn foo() -> i32 {
    42
}
$ cargo new --lib kernel && cd $_

$ edit src/lib.rs && cat $_
#![no_std]

extern crate lib;

fn bar() -> i32 {
    lib::foo()
}
$ edit Cargo.toml && tail -n5 $_
[dependencies]
lib = { path = "../lib" }

[profile.dev]
debug = false  # cf. rust-lang/rust#38785
$ edit nvptx64-nvidia-cuda.json && cat $_
{
  "arch": "nvptx64",
  "cpu": "sm_20",
  "data-layout": "e-i64:64-v16:16-v32:32-n16:32:64",
  "llvm-target": "nvptx64-nvidia-cuda",
  "max-atomic-width": 0,
  "os": "cuda",
  "panic-strategy": "abort",
  "target-endian": "little",
  "target-pointer-width": "64"
}
$ cargo install xargo --vers 0.3.0 || true

$ xargo rustc --target nvptx64-nvidia-cuda -- --emit=asm

$ cat $(find target/nvptx64-nvidia-cuda/debug -name '*.s')
.version 3.2
.target sm_20
.address_size 64

.extern .func  (.param .b32 func_retval0) _ZN3lib3foo17h3feefc42e145764bE
()
;

.func  (.param .b32 func_retval0) _ZN6kernel3bar17h7b762ababaef9f36E()
{
        .reg .s32       %r<2>;

        bra.uni         LBB0_1;
LBB0_1:
        { // callseq 0
        .reg .b32 temp_param_reg;
        .param .b32 retval0;
        call.uni (retval0),
        _ZN3lib3foo17h3feefc42e145764bE,
        (
        );
        ld.param.b32    %r1, [retval0+0];
        } // callseq 0
        bra.uni         LBB0_2;
LBB0_2:
        st.param.b32    [func_retval0+0], %r1;
        ret;
}

Note that foo appears as undefined (.extern .func) in the PTX module.

Workaround

Mark lib::foo as #[inline]. Then you'll get the translation of foo in the final PTX module:

.version 3.2
.target sm_20
.address_size 64


.func  (.param .b32 func_retval0) _ZN3lib3foo17h3feefc42e145764bE()
{
        .reg .s32       %r<2>;

        bra.uni         LBB0_1;
LBB0_1:
        mov.u32         %r1, 42;
        st.param.b32    [func_retval0+0], %r1;
        ret;
}

.func  (.param .b32 func_retval0) _ZN6kernel3bar17h7b762ababaef9f36E()
{
        .reg .s32       %r<2>;

        bra.uni         LBB1_1;
LBB1_1:
        { // callseq 0
        .reg .b32 temp_param_reg;
        .param .b32 retval0;
        call.uni (retval0),
        _ZN3lib3foo17h3feefc42e145764bE,
        (
        );
        ld.param.b32    %r1, [retval0+0];
        } // callseq 0
        bra.uni         LBB1_2;
LBB1_2:
        st.param.b32    [func_retval0+0], %r1;
        ret;
}
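
For reference, a minimal sketch of the workaround at the source level. Both functions are shown in one file so that the snippet compiles on its own (an assumption for illustration only); in the STR above, `foo` lives in the `lib` crate, `bar` lives in the `kernel` crate, and both crates are `#![no_std]`.

```rust
// Stand-in for `lib::foo`. `#[inline]` makes the function body available to
// downstream crates, so rustc can generate code for it in the calling crate.
#[inline]
pub fn foo() -> i32 {
    42
}

// Stand-in for `kernel::bar`, which calls into the upstream crate.
pub fn bar() -> i32 {
    foo()
}
```

With the attribute in place, `foo` is translated as part of the `kernel` crate, which is why it shows up as a `.func` definition rather than an `.extern .func` declaration in the PTX listing above.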
@japaric japaric added the O-NVPTX Target: the NVPTX LLVM backend for running rust on GPUs, https://llvm.org/docs/NVPTXUsage.html label Jan 2, 2017
@michaelwoerister
Member

This is another problem that would be solved by MIR-only rlibs.

@japaric
Member Author

japaric commented Jan 3, 2017

@michaelwoerister Indeed! Is there an issue tracking progress on that? Or do you know what the status is? (Is it blocked by anything?)

@michaelwoerister
Member

I don't think we have a tracking issue for that. We might need an RFC for it too.
It is kind of blocked on incremental compilation, since compile times for some kinds of projects (with lots of machine code in upstream crates) would probably regress. Incremental compilation would take care of that in a quite natural way.

@denzp
Contributor

denzp commented Jun 5, 2017

Looks like we need to link the libs, but for me --emit asm isn't the right way to do it. I would expect it to generate assembly for the current crate only.

Could it be an option to introduce a new linker that behaves like llvm-link + llc? Then we could use obj_is_bitcode = true and feed the bitcode to llvm-link. After that, we would call llc to produce PTX assembly.

Sure, we don't really have to call these 2 commands (from the host environment), we could just call LLVM through ffi.
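
A rough sketch of the two-step pipeline described above, driving the external tools from Rust. The helper names and the file names used with them (`lib.bc`, `kernel.bc`, `merged.bc`, `kernel.ptx`) are hypothetical, and the sketch assumes `llvm-link` and `llc` are on the `PATH`. The functions only build the commands; nothing is executed until `.status()` is called on the result.

```rust
use std::path::Path;
use std::process::Command;

/// Builds the `llvm-link` invocation that merges several bitcode files
/// into a single bitcode module.
pub fn llvm_link_command(inputs: &[&Path], merged: &Path) -> Command {
    let mut cmd = Command::new("llvm-link");
    cmd.args(inputs).arg("-o").arg(merged);
    cmd
}

/// Builds the `llc` invocation that lowers the merged module to PTX.
/// The triple and CPU mirror the target spec from the issue description.
pub fn llc_command(merged: &Path, ptx: &Path) -> Command {
    let mut cmd = Command::new("llc");
    cmd.arg("-mtriple=nvptx64-nvidia-cuda")
        .arg("-mcpu=sm_20")
        .arg(merged)
        .arg("-o")
        .arg(ptx);
    cmd
}
```

Running `llvm_link_command(..).status()` followed by `llc_command(..).status()` would perform the link and the lowering. Going through LLVM via FFI instead would avoid depending on external binaries altogether.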

@japaric
Member Author

japaric commented Jun 6, 2017

@denzp Sounds like a plan to me. It also sounds like this can be prototyped out of tree without modifying the compiler. If it works then it sounds like we should land the llvm-link + llc approach in rustc because using external commands could fail if the external llvm-link / llc doesn't match rustc's LLVM version.

@Mark-Simulacrum Mark-Simulacrum added the C-bug Category: This is a bug. label Jul 26, 2017
bors added a commit that referenced this issue Feb 1, 2019
NVPTX target specification

This change adds a built-in `nvptx64-nvidia-cuda` GPGPU no-std target specification and basic PTX assembly smoke tests.

The approach taken here and the target spec are based on `ptx-linker`, a project started about 1.5 years ago. Key feature: bitcode object files are linked with LTO into the final module on the linker's side.

Prior to this change, the linker used the `ld` linker flavor, but I think having a special CLI convention is more reliable.

Questions about further progress on reliable CUDA workflow with Rust:
1. Is it possible to create a `codegen-asm` test suite to verify end-to-end integration with the LLVM backend?
1. What would be the better way to organise no-std `compile-fail` tests: add `#![no_std]` where possible and mark the others with an `ignore-nvptx` directive, or introduce a `compile-fail-no-std` test suite?
1. Can `ptx-linker` eventually be integrated the way `rls` or `clippy` are? Hopefully, this would allow it to link statically against the LLVM used by Rust and get rid of the [current hacky solution](https://github.com/denzp/rustc-llvm-proxy).
1. Am I missing some methods from `rustc_codegen_ssa::back::linker::Linker` that could be useful for bitcode-only linking?

Currently, there are no major public CUDA projects written in Rust that I'm aware of, but I expect that having a built-in target will create a solid foundation for further experiments and awesome crates.

Related to #38789
Fixes #38787
Fixes #38786