Port libstd to "no operating system" target #37133

Closed
wants to merge 1 commit into from

jethrogb
Contributor

@jethrogb jethrogb commented Oct 12, 2016

This is an attempt to "port" libstd to run in unconventional environments. Unconventional here means: no networking, no filesystem, no environment and in general no system calls. Think OS kernel but different: you do want allocation and collections, you do want floating point numbers, you do want standard Rust abstractions such as io::Write, etc. Using no_std is really inconvenient in these situations because a lot of std is missing. Examples of such environments are emscripten¹ and Intel SGX.

This PR adds a target_os = "none" configuration. The approach I've taken is to base everything off the unix base, removing the dependency on libc and returning errors from all functions that used libc. I've borrowed some ideas from the Linux and emscripten implementations here and there.

In some places the existing unix code needed only minor modifications and #[cfg()] additions; in these cases I've modified the files inline. In other cases, it made more sense to reimplement the sys API in a new file. See libstd/sys/none/mod.rs.

In general these areas have been modified:

  • fs: mostly just returns io::Error. UNIX-compatible Path/OsString handling is still supported.
  • net: mostly just returns io::Error. Addressing structures are still supported.
  • process: mostly just returns io::Error.
  • env: mostly just returns io::Error.
  • thread: mostly just returns io::Error (note that std::thread::spawn calls unwrap internally...).
  • stdio: uses "fake stdio", as is used on Windows when there is no console.
  • sync: replaces all primitives with single-threaded primitives without any locking. If you do somehow end up in a multi-threaded environment, this means things are totally and utterly racy and unsafe.
  • time: SystemTime is like the Linux version and you can do time math, but calling now will panic.
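
To make the "mostly just returns io::Error" pattern concrete, here is a minimal sketch of what such stubs could look like. The names unsupported, FileHandle, open_file and current_dir are hypothetical and are not taken from the PR's libstd/sys/none/mod.rs; the point is only that each OS-backed entry point still type-checks but fails at run time.

```rust
use std::io;

/// Hypothetical helper: every OS-backed operation funnels through this.
fn unsupported<T>() -> io::Result<T> {
    Err(io::Error::new(
        io::ErrorKind::Other,
        "operation not supported: target has no operating system",
    ))
}

/// Placeholder handle type; callers never obtain one because `open_file` always fails.
pub struct FileHandle(());

/// Stand-in for a filesystem entry point: same shape as a real one, always an error.
pub fn open_file(_path: &str) -> io::Result<FileHandle> {
    unsupported()
}

/// Stand-in for an environment query.
pub fn current_dir() -> io::Result<String> {
    unsupported()
}
```

A caller that already handles io::Result keeps compiling unchanged and simply takes its error path on such a target.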

I'm aware that the current state of the PR is nowhere near merging. I just want to announce my plans and solicit feedback. I already know that the following things need work:

  • tests. I have not tried to run any tests yet. It's not even clear how to run tests as there is no generic way to run binaries compiled for this target. Even if there were, I imagine many tests are going to fail. What would be a good approach to handling this?
  • formatting. I'm pretty sure that I'm not adhering to proper style everywhere
  • copyright headers for new files. These will be added later
  • librustc_back target. For now you need to use the JSON target file.

Rather, I'm looking for technical feedback.

You can build this branch like so:

RUST_TARGET_PATH=/somewhere cargo rustc --lib --release --target x86_64-unknown-none-gnu --manifest-path src/rustc/std_shim/Cargo.toml

You can then link the deps directory to the appropriate lib/rustlib in your rust installation to test building with this target. Here's what you need to supply to make it link:

  • an allocator crate
  • rlibc or similar
  • an implementation for extern fn getrandom(buf: *mut u8, len: usize);

I made a sample application that takes this std and adds some Linux system calls around it, and it works.

¹: If I understand the state of the emscripten port correctly based on discussions with @brson, if you're trying to use functionality that doesn't exist on the platform you'll just get linker errors. That's of course not very ergonomic.

r? @brson

@rust-highfive
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @brson (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@est31
Member

est31 commented Oct 13, 2016

Why not remove the modules (or at least everything but the traits), instead of returning errors? And why not add what remains to libcore, or some library that sits between libcore (which is the absolute minimum) and libstd?

@jethrogb
Contributor Author

jethrogb commented Oct 13, 2016

Why not remove the modules (or at least everything but the traits), instead of returning errors? And why not add what remains to libcore, or some library that sits between libcore (which is the absolute minimum) and libstd?

So that existing crates may work with no or minimal modifications.

@jethrogb jethrogb changed the title from Port libstd to "no operating system" to Port libstd to "no operating system" target on Oct 13, 2016
@est31
Member

est31 commented Oct 13, 2016

I agree that it makes compiling crates for those platforms easier, but making sure they actually work is made harder. You can't rely on the compiler anymore to tell you whether you used something unsupported; you need to find that out by testing, and testing is harder than running the compiler.

It also makes it harder for library authors to "port" their libraries to that target. In the ideal case, one could do it the way you made your PR: you compile your library for the target and put all features that communicate over the network or similar behind a cfg flag, or choose different behaviour, depending on the situation. If you only had the version where everything returns io::Error, you'd need to actually test your program, or search for unsupported calls manually.

If your crate is architected properly, the code that uses file I/O outside of the traits should be isolated from the other functionality anyway.
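
A minimal sketch of the cfg-gating approach described above, assuming a crate that mixes pure parsing with networking. The function names are made up for illustration; target_os = "none" is the configuration value this PR proposes. Code that calls connect on the OS-less target then fails at compile time instead of at run time.

```rust
/// Always available: pure parsing, no OS services involved.
pub fn parse_port(s: &str) -> Option<u16> {
    s.trim().parse().ok()
}

/// Only compiled where an operating system provides networking.
/// On `target_os = "none"` this function does not exist at all.
#[cfg(not(target_os = "none"))]
pub fn connect(addr: &str) -> std::io::Result<std::net::TcpStream> {
    std::net::TcpStream::connect(addr)
}
```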

Note that I don't suggest that the sync primitives should be removed, their shims are probably fine as is, and maybe in the future there could be an option where you can control whether you are in a single threaded environment or not, similar to panic=abort, to remove overhead from single threaded programs that use libraries written for multithreaded contexts.

However, everything that panics or returns an error should be removed.

@ghost

ghost commented Oct 13, 2016

sync replace all primitives with single-threaded primitives without any locking. If you do somehow end up in a multi-threaded environment, this means things are totally and utterly racy and unsafe.

doesn't this mean you have a mutex which doesn't deadlock?

let x = Mutex::new(5);
let mut y = x.lock();
let mut z = x.lock(); //2 mutable pointers

from what i can tell this will panic in debug mode but not in release mode unless i misunderstand debug_assert

@jethrogb
Contributor Author

from what i can tell this will panic in debug mode but not in release mode unless i misunderstand debug_assert

I suppose I should turn these and others in Mutex/RwLock into actual asserts. Note that panicking in this case is allowed behavior.
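
For illustration, a single-threaded lock with an "actual assert" could be sketched as follows. This is not the PR's implementation; SingleThreadedMutex and Guard are hypothetical names, and the sketch assumes a flag checked with assert! so that a second lock panics in release builds as well.

```rust
use std::cell::{Cell, UnsafeCell};
use std::ops::{Deref, DerefMut};

/// Hypothetical single-threaded stand-in for a mutex.
/// `Cell` and `UnsafeCell` make it `!Sync`, so it cannot be shared across threads.
pub struct SingleThreadedMutex<T> {
    locked: Cell<bool>,
    value: UnsafeCell<T>,
}

/// Grants access to the value; releases the lock when dropped.
pub struct Guard<'a, T> {
    mutex: &'a SingleThreadedMutex<T>,
}

impl<T> SingleThreadedMutex<T> {
    pub fn new(value: T) -> Self {
        SingleThreadedMutex {
            locked: Cell::new(false),
            value: UnsafeCell::new(value),
        }
    }

    pub fn lock(&self) -> Guard<'_, T> {
        // assert!, unlike debug_assert!, is also checked in release builds.
        assert!(!self.locked.get(), "mutex locked twice on a single thread");
        self.locked.set(true);
        Guard { mutex: self }
    }
}

impl<'a, T> Deref for Guard<'a, T> {
    type Target = T;
    fn deref(&self) -> &T {
        // Sound only because `lock` guarantees at most one live guard.
        unsafe { &*self.mutex.value.get() }
    }
}

impl<'a, T> DerefMut for Guard<'a, T> {
    fn deref_mut(&mut self) -> &mut T {
        // Sound only because `lock` guarantees at most one live guard.
        unsafe { &mut *self.mutex.value.get() }
    }
}

impl<'a, T> Drop for Guard<'a, T> {
    fn drop(&mut self) {
        self.mutex.locked.set(false);
    }
}
```

With this shape, the double-lock snippet quoted earlier panics on the second lock call instead of handing out two mutable references.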

@Ericson2314
Contributor

Yeah what we need to do is break std up into something more reasonable, keeping std as a shim for a) backwards compat b) nice interface with yet-to-be-designed scenarios. I am also wary of introducing dynamic failure (return Err(..), unimplemented!(), etc), even as a temporary measure--it will just cause problems down the road.

I'm currently hoping for a "bottom up approach" which would be something like:

  1. Implement allocator RFC (hopefully including associated error type, and optional global singleton allocator!)
  2. Port collections to allocators
  3. Deal with std, possibly make a libnet, libfs, and reintroducing libsync.

@jethrogb
Contributor Author

Regarding the use of unimplemented!(), I'm fairly certain I used these only in places that will never get called. For example, it's impossible to get a FileAttr instance (which is Metadata.inner) because all functions that return Metadata actually return a Result<Metadata> and they will never return Ok.
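
One way to let the compiler check the "will never get called" claim, rather than relying on unimplemented!(), is an uninhabited type. The sketch below is an alternative illustration and not what the PR does; Void, this FileAttr definition and stat are hypothetical stand-ins.

```rust
use std::io;

/// Uninhabited: no value of this type can ever be constructed.
enum Void {}

/// Because it wraps `Void`, a `FileAttr` can never exist either.
pub struct FileAttr(Void);

impl FileAttr {
    pub fn size(&self) -> u64 {
        // There are no variants to match, so this body is statically unreachable.
        match self.0 {}
    }
}

/// The only way to obtain a `FileAttr`, and it always fails.
pub fn stat(_path: &str) -> io::Result<FileAttr> {
    Err(io::Error::new(
        io::ErrorKind::Other,
        "no filesystem on this target",
    ))
}
```

Returning Ok from stat would require constructing a Void, which is impossible, so the unreachability is enforced by the type system.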

@alexcrichton
Member

Thanks for the PR @jethrogb!

I haven't had a chance to read it too much in detail yet, but my initial reaction is along the lines of @est31's reaction above. In some sense I'm not sure how this helps "port" existing code to a new platform as it seems like it'd be very difficult to stop calling "always fallible APIs". In some sense this is the purpose of libcore, a library which you can use freely without ever worrying about whether it's portable or not. (e.g. it works across all OS environments).

Put another way, what does this standard library actually do? If all I/O is stubbed out, it seems like the only thing you really get is libcore plus collections/pointers that abort on OOM, right? We should most certainly have a library for collections/pointers, but I don't necessarily think we'd need a whole new "return an error everywhere" standard library. Similarly the action of porting a library to a "no operating system" target would then entail compiling with #![no_std] or with only this extra collections crate.

Finally, there's been musings of ports like this in the past with the concept of "scenarios" which allow the standard library to be robustly ported to various places. That way code is tagged with the "scenario" that it expects to operate within, and you get a compile-time guarantee that you, say, don't use the filesystem and/or network. You could then imagine linker shenanigans to ensure that everything actually compiles in the end.

Unfortunately not a lot of movement has been happening on that recently, but it does seem very highly related to this PR itself.

@jethrogb
Contributor Author

jethrogb commented Oct 17, 2016

what does this standard library actually do?

Two main things:

  1. It makes available all standard Rust APIs and abstractions, many of which are not available in core. Some of them are available when using nightly features/crates, but others are not. Box, collections (all of them), I/O traits, I/O cursor, Error trait, C strings, path name handling, IP addresses, deriving serialization traits (not supported on core even with a ported rustc-serialize).
  2. It supports using existing crates. Having to maintain core ports of any crate you're using is a lot of effort. Especially since upstream crates are reluctant to accept PRs making their crates work on core. Yes, if you're using a crate that heavily depends on filesystem/networking it's obviously not going to work. But if you're using a crate that only uses File::create to write out some log file, you can now use this without any problems. Or if you're not using the networking functionality but just the parsing functionality (for example I'm using hyper this way right now).

@est31
Member

est31 commented Oct 17, 2016

I do agree with @jethrogb that libcore right now is too minimal for projects to really like it, and it's not just about collections.

I/O traits are really useful. Most rust crates to read/write in some file format are using the Read and Write traits, and it would make things easier to have the traits in webassembly scenarios. Webassembly is completely secluded from the world, afaik it can communicate with the javascript world, and through that you could download files from the net or prompt the user to upload a file and then pass those files on to some crate you use via std::io::Cursor (which sadly isn't in libcore either).
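
As a small illustration of why the traits matter without any OS, here is a sketch of a parser written against Read and fed from memory through std::io::Cursor. The function names are invented for this example.

```rust
use std::io::{self, Cursor, Read};

/// Works with any byte source: a file, a socket, or an in-memory buffer.
pub fn count_lines<R: Read>(mut input: R) -> io::Result<usize> {
    let mut text = String::new();
    input.read_to_string(&mut text)?;
    Ok(text.lines().count())
}

/// No filesystem needed: bytes obtained elsewhere are wrapped in a Cursor.
pub fn count_lines_in_memory(bytes: &[u8]) -> io::Result<usize> {
    count_lines(Cursor::new(bytes))
}
```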

But if you're using a crate that only uses File::create to write out some log file, you can now use this without any problems.

A library crate shouldn't create log files without you asking for it. And if it has a separate logging functionality the user has to opt in to, you can surround it with a cfg flag in upstream and it will continue to work.

@jethrogb
Contributor Author

jethrogb commented Oct 17, 2016

A library crate shouldn't create log files without you asking for it. And if it has a separate logging functionality the user has to opt in to, you can surround it with a cfg flag in upstream and it will continue to work.

A library crate needn't actually create log files for it to have a dependency on File without which it won't compile.

@Ericson2314
Contributor

Yeah these are valid points, the solution is to break up std.

@jethrogb
Contributor Author

Yeah these are valid points, the solution is to break up std.

I can see how that would make more of std available. Could you explain how that helps with using existing crates?

@Ericson2314
Contributor

Ericson2314 commented Oct 17, 2016

@jethrogb existing crates downstream of std? Those need to be refactored. I think the pain of refactoring is good here---it makes downstream more conscious of exactly what sort of minimal system they depend on.

@briansmith
Contributor

@jethrogb existing crates downstream of std? Those need to be refactored. I think the pain of refactoring is good here---it makes downstream more conscious of exactly what sort of minimal system they depend on.

I agree such refactoring is good. I think, though, it is a matter of time-to-market. How many months would it take to get all the necessary changes made? AFAICT it could easily be a year or more. I think it makes sense to have a workaround in place until that happens.

@jethrogb
Contributor Author

I agree with @briansmith. In addition, as I was saying before, it's not just a matter of refactoring those crates, but also of convincing those crate's maintainers that they should accept your refactoring.

@Ericson2314
Contributor

Ericson2314 commented Oct 17, 2016

I should link #27701; @briansmith and I are veterans of that endless thread. I'd like to think the "scenarios" @alexcrichton mentions are the successor of the "subset std" idea of that thread. When they land, I think they will provide a "lighter" refactor path to the extent they overlap with the crates behind the facade.

On the other hand, you all mention time-to-market, but I am in no rush and conversely think it's really important we get this right, or at least right before stabilization. Yes it's hard to refactor all those crates but conversely the shape of the no-std ecosystem will be determined by how good a job we do. Post stabilization, any warts or inflexibilities will be impossible to root out (see libc's mess of an interface).

@Ericson2314
Contributor

@jethrogb also I think the lynchpin here for building out the crates behind the facade is jethrogb/rust-core_io#3. (Trace the few issues I've transitively linked to it lately, for example). Getting those associated types is IMO the easiest way to allow for a core::io, and one can hope that for Rust 2.0, std's Read and Write can be built off the core traits too.
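
As a rough illustration of the associated-type idea (a sketch only; the CoreRead trait below is hypothetical and is not the API proposed in jethrogb/rust-core_io#3), a core-level read trait could leave its error type to the implementor, so that an in-memory reader needs no io::Error at all:

```rust
use core::convert::Infallible;

/// Hypothetical core-level counterpart of std::io::Read.
pub trait CoreRead {
    /// Each implementor picks its own error type.
    type Error;

    /// Reads up to `buf.len()` bytes and returns how many were read.
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, Self::Error>;
}

/// Reading from a byte slice cannot fail, so the error type is uninhabited.
impl<'a> CoreRead for &'a [u8] {
    type Error = Infallible;

    fn read(&mut self, buf: &mut [u8]) -> Result<usize, Infallible> {
        let n = buf.len().min(self.len());
        let (head, tail) = self.split_at(n);
        buf[..n].copy_from_slice(head);
        // Advance the slice past the bytes that were consumed.
        *self = tail;
        Ok(n)
    }
}
```

An OS-backed implementor could then set Error to an io::Error-like type, which is one possible reading of how std's traits might be layered on top.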

@petrochenkov
Contributor

Finally, there's been musings of ports like this in the past with the concept of "scenarios" which allow the standard library to be robustly ported to various places.

Unfortunately not a lot of movement has been happening on that recently, but it does seem very highly related to this PR itself.

Huh, this seems to always happen with important topics.
First the progress is blocked on someone from the core devs doing the design work, then months go by, nothing happens, and problems that needed to be resolved yesterday stay unresolved.
Can the progress be achieved without waiting for @aturon completing his other more prioritized work? Can these "scenarios" be designed independently by, e.g. @jethrogb, @briansmith and @Ericson2314 starting from today and not from some unclear point in the future?

@aturon
Member

aturon commented Oct 17, 2016

@petrochenkov I don't think @alexcrichton was implying that this PR should be blocked on the "scenarios" idea; I think he was just pointing at something relevant.

FWIW, the libs team is taking up working out the "scenarios" design right now -- it's up for discussion today, and notes will be posted to internals. Anyone's welcome to chime in/take leadership on the existing internals thread, of course, or to work in the direction of an RFC.

I know that it can be opaque what's being worked on at any given point in time, vs what's up for grabs. We're continually trying to improve our processes (e.g. with rfcbot), and one of the changes we're making starting this cycle is for more "proactive" discussions in the subteam meetings -- as mentioned above, these should result in internals threads which give the broader community a chance to join in as well. We're also talking about doing more to make visible what people are working on. If you have other ideas about improvements, I'd love to hear them.

In the meantime, please understand that Rust is a huge project, and it's incredibly difficult to keep it all organized, to stay on top of things in addition to doing productive work. If you feel like something's stalled or being ignored, never hesitate to ping!

@aturon aturon added the T-libs-api (relevant to the library API team, which will review and decide on the PR/issue) and I-nominated labels on Oct 17, 2016
@aturon
Member

aturon commented Oct 17, 2016

Nominating for libs team discussion on the overall direction/idea.

@aturon
Member

aturon commented Oct 17, 2016

cc @rust-lang/libs

@brson
Contributor

brson commented Oct 17, 2016

I think this is a worthwhile port. I personally want Rust to be usable in every context, particularly ones without libc, and I think subsetting the standard library is a reasonable thing to do (the emscripten port could use it).

We have quite a bit of refactoring work ahead of us before we can support this type of port though. With this, the recent talk of merging Redox std, and the haiku port, our current platform abstraction in std is at its maintainable limit.

I have some ideas for how to get to a better place, where ports like this can be self-contained and not impose huge maintenance burden, and have started at a proof of concept. Unfortunately, as with so many things, the amount of time I can dedicate to it is small. I've discussed my intentions previously in these threads:

I will plan to make a more detailed writeup soon.

@Ericson2314
Contributor

Ericson2314 commented Oct 17, 2016

@petrochenkov For me the key is having both scenarios and the facade available. Basically where possible the facade should be leveraged because it's simpler and harder to mess up, but for a few things scenarios are needed. Crucially, I think the facade is more appropriate for exotic platforms, and scenarios more for minute differences between variants of the standard big 3 platforms.

So it's great that scenarios is finally happening, but I don't want to block the no-std ecosystem on it.

@aturon
Member

aturon commented Oct 17, 2016

We discussed both this PR and the "scenarios" ideas in the libs team meeting today. @alexcrichton will be writing up the scenarios ideas in a new internals thread, which we'll link to here once it's ready.

On the whole, the libs team concurs with what many have said on this thread: we think there's a reasonable goal here, but would prefer to fit it into the existing facade story, rather than approach it as a new "workaround" target. More specifically:

  • The core aim, of providing all the parts of std that do not rely on OS services, seems reasonable but is more naturally provided as part of the std facade. In particular, the layers between core and std could use a refactoring, and it is quite plausible to provide the precise functionality proposed here in such a facade crate.
  • Providing error-returning implementations, rather than compile-time errors, for missing functionality is suboptimal (at best a temporary workaround, as @briansmith said).
  • The refactoring needed to move over to a facade crate, in practice, is extremely minimal -- the interior crates should provide the exact same module hierarchy, meaning the refactor is just a matter of replacing std with a different module name. And we've also already committed to the facade story in general; a PR is not the place to overturn those plans.

@brson is going to write up some more detailed thoughts on how the facade could evolve to handle this and other use-cases, trying to give specific steps that an enterprising contributor could take to make progress. In the meantime, using core + collections is a reasonable starting point for experimentation.

@jethrogb
Contributor Author

jethrogb commented Oct 17, 2016

In the meantime, using core + collections is a reasonable starting point for experimentation.

Could you elaborate on what further experimentation you'd like to see here? I've been doing exactly this for the better part of this year and porting anything to work in that environment is extremely painful (e.g. redis-rs/redis-rs@master...jethrogb:core), which is what prompted this PR.

@aturon
Member

aturon commented Oct 17, 2016

@jethrogb That's the kind of experimentation we had in mind, yes. The point was that we'd like to head toward facade usage for this, rather than adding a new platform, and the current facade is a starting point that works today.

Can you elaborate on what makes it so painful to port? Is it primarily the stuff that's in std but not collections?

@jethrogb
Contributor Author

jethrogb commented Oct 17, 2016

Is it primarily the stuff that's in std but not collections?

Yes, and there's no prelude, and everything is at a different path (think alloc::boxed::Box, collections::string::String). This comment contains a list with everything I know of that's missing (and I intend to update that list as I learn more).

Can you elaborate on what makes it so painful to port?

And another thing (I feel like I keep repeating myself) is that the current situation is so bad (cfgs everywhere, optional dependencies with weird feature selection) that changes like those for redis-rs above are unlikely to ever be accepted by any sane upstream maintainer.

@aturon
Member

aturon commented Oct 17, 2016

@jethrogb

The prelude of course can be addressed by having a prelude module that you glob import. The path differences should be addressed by changing the facade crates, just like we did when cleaning up libcore (cc @alexcrichton). The goal is for the crates to effectively be drop-in replacements for their fragment of std.

More broadly, something like what you're doing with the core_* crates is basically where we'd like to go with the facade.
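
The glob-imported prelude mentioned above might look roughly like the sketch below. It assumes current Rust, where Box, String and Vec all live in the alloc crate (the separate collections crate discussed in this thread has since been folded into it). The module name facade_prelude and the function boxed_names are hypothetical.

```rust
// In a #![no_std] crate, Box, String and Vec are not in scope by default.
extern crate alloc;

/// Hypothetical prelude for a facade crate, meant to be glob-imported.
pub mod facade_prelude {
    pub use alloc::boxed::Box;
    pub use alloc::string::String;
    pub use alloc::vec::Vec;
}

// One glob import restores the familiar short names.
use facade_prelude::*;

/// Uses the short names exactly as std-based code would.
pub fn boxed_names(names: &[&str]) -> Box<Vec<String>> {
    let mut out = Vec::new();
    for name in names {
        out.push(String::from(*name));
    }
    Box::new(out)
}
```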

@Ericson2314
Contributor

Ericson2314 commented Oct 17, 2016

Hmm, I feel like once the existence of alloc and collections is stable (which I don't see being too far off), having a bunch of facades will be less useful, but this is an unimportant nit.

@aturon
Member

aturon commented Oct 17, 2016

@Ericson2314 I'm not assuming that alloc and collections would remain as separate crates -- the crates between core and std need to be re-rationalized.

@brson should be writing up his more specific thoughts on this soon.

@alexcrichton
Member

@jethrogb

I'd like to echo @aturon's sentiment that the mid-facade crates are in no way stable or finalized. I added them when I first made the facade on a whim, and they've basically never changed since then. They could use some serious reorganizing if we plan to seriously use them. Currently, however, they were never seriously intended for the reuse that we're seeing, hence the pains! And also as @aturon mentioned, @brson will comment more here.

And another thing (I feel like I keep repeating myself) is that the current situation is so bad (cfgs everywhere, optional dependencies with weird feature selection) that changes like those for redis-rs above are unlikely to ever be accepted by any sane upstream maintainer.

This is something I'd like to dig into, but perhaps not on this thread. I think it's bad if we end up in a situation with a jungle of #[cfg] annotations as well, but I feel that a PR like this is a hammer to solve the symptom and not the actual problem. The real problem is that the platform you care about does implement most of the standard library, not all. In my opinion this needs to be expressed in a static fashion, not a runtime one. Whether that's through more facade crates, reorganized facade crates, or the scenarios idea I'm about to write up doesn't matter too much in my opinion.

I personally feel that the sentiment that we should shim std stems from the belief that most crates will "just work" if most of std returns an error. I also personally believe that this isn't the case because crates which use std feel the freedom to expand over time. Put another way, if they happen to work with a particular subset today there's no guarantee they will continue to do so as changes are accepted over time.

I'd also like to head off concerns about upstreaming this kind of support. If an upstream maintainer does not want to support a platform, then that's a choice that we can't really get around. If, however, the upstream maintainer would like to support a platform, then it's our job to ensure that this support is as ergonomic as possible and easily available.

@alexcrichton
Member

Ok, I've typed up my ideas about scenarios. Feedback is certainly welcome!

@jethrogb
Contributor Author

jethrogb commented Oct 20, 2016

Still eagerly waiting for @brson's write-up!

In the meantime, reading @HybridEidolon's blog post gave me a half-baked idea. It does however depend on a future version of rust-lang/rfcs#1133 which allows replacing std crates.

std should basically be a facade only. Each of fs, net, stdio, process+env, thread+sync, time should be in its own two crates. The two crates are "std_X" and "std_X_sys" (e.g. "std_net" contains IpAddr and "std_net_sys" contains TcpStream). std consists solely of re-exports, merging the facade crates where appropriate.

  1. If I want to build a library specifically not using particular features, I can just use only the std_X crates I want. Maybe we can have some language feature that automatically turns use std::net into an import of the std_net crate and a binding of that crate under that name.
  2. If I want to replace a crate with a different implementation (non-standard system library, always-error shim, etc.) I can just say [replace] std_net_sys = { ... } in my Cargo.toml and all it needs to do is provide std::net's stable interface.

An unresolved question in this split model is how to deal with io::Error which I think is the only thing that's shared between all system-dependent crates.

@Ericson2314
Contributor

Ericson2314 commented Oct 21, 2016

To be clear, #1133 as written (and my implementation) allows replacing stdlib crates. What the impl doesn't allow is replacing other crates with stdlib crates, but this is a temporary restriction until other dep types (i.e. dev-deps, build-deps) are sorted out (not sure it even made it into the RFC).

An unresolved question in this split model is how to deal with io::Error which I think is the only thing that's shared between all system-dependent crates.

I'm telling ya, the associated error solves all problems here :)

@aturon
Member

aturon commented Oct 31, 2016

@jethrogb Note: we discussed this a bit in the libs meeting -- @brson is much of the way through writing up his thoughts on the platform abstraction layer, should be on its way soon.

@brson
Contributor

brson commented Nov 2, 2016

Here are my thoughts on the path forward for making std more portable.

I suggest reading it all, but the thrust is that platform-specific code should be isolated into platform-specific crates. So in the end, a port like this might be entirely relegated to a pal_error crate, where everything platform-specific errors, or maybe a pal_custom_sgx crate if not everything is an error. I think if we were to consider such a port in the short-term, before we achieve such a refactoring, we might want to see the entire port reside within sys/erroring (the "erroring" port), though even then, whether such a port belongs in tree I suspect would be subject to much debate.

@ticki
Contributor

ticki commented Nov 10, 2016

OMG! Yes yes yes!

I've written ralloc, and part of the reason it took so long is because literally everything from the standard library needed complete reimplementation, because I cannot depend on libstd in memory allocators, for obvious reasons.

This would reduce the code base by maybe 10%. Not ideal, but certainly a big deal.

@alexcrichton
Member

@brson it seems that this PR isn't quite what you had in mind in terms of the grand vision, do you think we should close this and keep iterating or do you feel like this'd be a good intermediate step?

(I sort of feel the former, personally)

@alexcrichton
Member

Ok, I'm going to close this in favor of the discussion at https://internals.rust-lang.org/t/refactoring-std-for-ultimate-portability/4301 (clearing out the queue)
