Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a way of obtaining numeric file descriptors. #15643

Closed
wants to merge 2 commits into from

Conversation

errordeveloper
Copy link
Contributor

This is to improve the interoperability with external C libraries.
One should be able to leverage most Rust's standard run-time functions
for openning and closing files with all the error handling etc, while
only making C library calls when it's very neccessary.

@errordeveloper
Copy link
Contributor Author

it is still missing Windows support... and unit tests. Would be great to hear some feedback before I proceed with more work on this.

@errordeveloper
Copy link
Contributor Author

My main question regarding the unit tests is whether I should add it to every implementation of this trait except from the trait definition, right?

@errordeveloper
Copy link
Contributor Author

Regarding Windows, I have no idea, so please do feel free to make some suggestions... Does it even have FDs in the same sense?

@alexcrichton
Copy link
Member

I'm a little wary to do this because of, sadly, windows. I don't quite grok how windows deals with HANDLE values vs file descriptors (there is some correspondence at least). Dealing with a file descriptor manually is also largely not portable, but it is often quite useful!

I would want to understand better how this works with windows before moving forward, personally.

@errordeveloper
Copy link
Contributor Author

I would want to understand better how this works with windows before moving forward, personally.

@alexcrichton understood.

Dealing with a file descriptor manually is also largely not portable, but it is often quite useful!

I have briefly though of doing something along the lines of:

fn with_c_fd(&self, close_after: bool, block: |libc::c_int| -> IoResult) { match block(self.fd()) { ... } }

Although this seems rather useful and makes sense along the side of with_c_str, but in some cases all you really want is to pass that file descriptor along and take the responsibility of closing it yourself. The prosed change certainly makes a lot of sense as the first step.

To me, the most disapointing bit was finding out that there is fs_from_raw_fd...

@errordeveloper
Copy link
Contributor Author

@alexcrichton by looking at libnative/io/file_win32.rs, it doesn't seem like there is a huge difference...

There is only an additional method:

    pub fn fd(&self) -> fd_t { self.inner.fd }

    pub fn handle(&self) -> libc::HANDLE {
        unsafe { libc::get_osfhandle(self.fd()) as libc::HANDLE }
    }

So I guess I could provide get_handle in addition to get_fd in the windows version.

We need to check with libuv also...

@alexcrichton
Copy link
Member

I'm not sure whether a file descriptor is a first-class thing in windows or what relation it has to HANDLE other than there's likely a one-to-one mapping. Most windows apis deal with a HANDLE instead of a file descriptor, so this change would definitely be favoring unix over windows.

From your example above, however, this is why I've been hesitant to add it in the past. Many methods on these objects won't work if you start manually fiddling with flags on the file descriptor itself.

@retep998
Copy link
Member

Perhaps get_fd should be unsafe to warn you that working with file descriptors is an unsafe thing?

@errordeveloper
Copy link
Contributor Author

I have considered this and it's in the doc comment. There is not much you
can do with it apart from passing to a C function, which you will have to
wrap into an unsafe block anyway.
The FD is not a thing that you would have to keep secure anyway, a user can
get all FD to each of the processes running with their UID without a lot of
effort, if they had to be secure they wouldn't be numeric anyway. This is
not the right place to discuss security anyway, why did I even get started?
As far as Rust is concerned, it does not intend to make the result
application secure. It can only make it safer. Although, I think insecurity
is a side effect of unsafe languages...
On 13 Jul 2014 18:30, "Peter Atashian" notifications@github.com wrote:

Perhaps get_fd should be unsafe to warn you that working with file
descriptors is an unsafe thing?

Reply to this email directly or view it on GitHub
#15643 (comment).

@alexcrichton
Copy link
Member

This is not necessarily a security bug, it has the possibility of becoming a memory safety bug. Functions in libnative (and possibly libuv as well) may be designed assuming complete control of a file descriptor. If this control is broken, the broken assumptions may lead to memory unsafety.

I don't think we have any examples of this today, but that doesn't mean that it's impossible to have.

@errordeveloper
Copy link
Contributor Author

OK, sure. I'm not completely opposing making it an unsafe function. And,
perhaps with suggested with_c_fd solution, it may be made somewhat safer
with respect to other functions... I would also propose adding a lock
attribute around this and may be also add a guard for ensuring that only
read-only access is allowed, perhaps. For example, get_fd could fail if
it holds write access. Although it would be less useful that way... I'm not
sure whether one can force to reopen the same FD though.

So best solution would probably be to just simply call those functions
unsafe.
On 13 Jul 2014 19:37, "Alex Crichton" notifications@github.com wrote:

This is not necessarily a security bug, it has the possibility of becoming
a memory safety bug. Functions in libnative (and possibly libuv as well)
may be designed assuming complete control of a file descriptor. If this
control is broken, the broken assumptions may lead to memory unsafety.

I don't think we have any examples of this today, but that doesn't mean
that it's impossible to have.

Reply to this email directly or view it on GitHub
#15643 (comment).

@andrew-d
Copy link
Contributor

I'm not sure whether a file descriptor is a first-class thing in windows or what relation it has to HANDLE other than there's likely a one-to-one mapping. Most windows apis deal with a HANDLE instead of a file descriptor, so this change would definitely be favoring unix over windows.

For what it's worth, a HANDLE on Windows is essentially the equivalent of the unix fd - it's what's used by the operating system, stored internally in the handle table of each process, etc.. fds on Windows, on the other hand, are only used by the C runtime library, and I believe there are cases in which a CRT fd doesn't have a backing HANDLE, though some quick googling doesn't provide any information about this.

@errordeveloper
Copy link
Contributor Author

So my Stack Overflow post resulted in a solution that will work for what I am doing right now.

@errordeveloper
Copy link
Contributor Author

I do still insist on this going forward.

@alexcrichton you commented earlier:

This is not necessarily a security bug, it has the possibility of becoming a memory safety bug. Functions in libnative (and possibly libuv as well) may be designed assuming complete control of a file descriptor. If this control is broken, the broken assumptions may lead to memory unsafety.

So it does occur to me now, based on the solution posted on Stack Overflow, that libnative (and most likelly libuv too) will allow one to call file.fd(), it's just that std::io::fs::File tries to hide it away for some reason.

Perhaps fd() and my get_fd() should be made unsafe for the reasons that we discussed? Although, personally I do not see those reasons being particularly strong. As I said, if one will be passing this to a C function call, they will need to explicitly say that it's going to be unsafe. In fact, I do think that operation of actually getting the numeric value of the file descriptor is 100% safe as such. It's what you do with it after.

In fact, one might actually simply use this to could how many files they have opened so far. As far as standard run time goes, it could actually attempt to protect form file descriptor leaks by preventing someone from opening the same file over and over again with the same mode... It would just tell you, that this had already been open and here is the original handle.

@alexcrichton
Copy link
Member

The librustuv and libnative interfaces are likely to always be experimental and not intended for general consumption. They provide their own access to file descriptors out of necessity for the library itself, I wouldn't recommend relying on the native/rustuv interfaces for extracting file descriptors.

If we were to add this function, I believe that it should be unsafe as we don't know for sure that all backing support is indeed memory safe if a file descriptor is fiddled with. Additionally, it should generally be avoided and not be necessary.

Also, from what @andrew-d said about file descriptors on windows, this would be tying rust permanently to their C runtime library. I'm not entirely sure if that's desirable or not?

@errordeveloper
Copy link
Contributor Author

Another use case for this would when one wants to use std::os::MemoryMap, which takes a file descriptor and does so on windows also. If you look at the source code it calls libc::get_osfhandle(fd) to get the handle...

Anyhow, right now I need to use native::io::file::open, if I want to use std::os::MemoryMap. Is this okay or I should ideally use std::io::fs::File?

extern crate std;
extern crate native;
extern crate debug;

use native::io::file::open;
use std::rt::rtio::{Open, Read};
use std::os::{MemoryMap, MapReadable, MapWritable, MapFd};

fn main() {
  let path = "test.bin";
  let file =
    match open(&path.to_c_str(), Open, Read) {
      Err(_) => { fail!("Something is terribly wrong!"); },
      Ok(f)  => { f },
    };
  let fmap = 
    match MemoryMap::new(4096u, [MapReadable, MapWritable, MapFd(file.fd())]) {
      Err(_) => { fail!("Something is terribly wrong!"); },
      Ok(f)  => { f },
    };
  println!("{:?}", fmap);
}

@alexcrichton
Copy link
Member

Ok, after stewing for a bit I think I'd like to move forward with this. @errordeveloper, could you add the following pieces to the pr?

  • Add get_fd to all the other I/O primitives in std::io (where applicable). The ones that I can think of are:
    • {Tcp, Unix}{Stream, Acceptor, Listener}
    • UdpSocket,
    • File
    • PipeStream
    • Std{Reader, Writer}
  • All methods can basically point to one other method to look for documentation
  • There should be stern documentation about the dangers of using raw file descriptors, I can help with the exact wording
  • Windows networking objects should be commented that they are returning a SOCKET, not a file descriptor
  • Comments should indicate that file descriptors on windows are C runtime artifacts, not HANDLE values.

And some unresolved questions I also have:

  • Should get_fd be unsafe? (I would vote yes)
  • Should get_fd return an int instead of a c_int? This is largely due to SOCKET where it is a c_uint and this also may just vary per-platform. I would tend to lean towards yes to not expose C types, but I don't feel super strongly about this.
  • Is get_fd the best name? The sockets on windows definitely aren't file descriptors...
  • Should get_fd be a trait method? It's probably too much abstraction for now, so I would vote no. The upside is that you have to explicitly import something to look at the file descriptor.

Does that sounds ok? We can deal with the questions in time, and I'm around to help out with any questions you have, feel free to reach out on IRC! (I'm acrichto)

@huonw
Copy link
Member

huonw commented Jul 15, 2014

Is get_fd the best name? The sockets on windows definitely aren't file descriptors...

get_os_identifier? (I guess people will likely be looking for "file descriptor" or "fd" when searching for it, so this may not be the most discoverable name.)

@chris-morgan
Copy link
Member

I do not like this, Sam-I-Am.

Supposing I wish to use a different runtime that doesn’t use that sort of descriptor at all—a bare metal case might do it, or perhaps an emscripten-specific runtime for running in the browser (I can easily imagine that one having absolutely nothing to do with such magic numbers). I do not believe it is at all reasonable to expect to be able to have such a descriptor, at the language level at least.

Still, for pragmatism’s sake, it must evidently be exposed somewhere. Given that, I would say very strongly that such functions should be marked unsafe, and probably unstable also. I would think also that having it on a separate trait would be a good thing—could the trait even live in the libc crate to make it absolutely clear that it’s That Sort of Thing?

@chris-morgan
Copy link
Member

A benefit of using a trait for this is that you could then use it as a generic bound, e.g. for exposing a safe interface over ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count).

@errordeveloper
Copy link
Contributor Author

I cannot disagree with you, Chris. I do however believe that current
standard run-time contains plenty of bits that wouldn't work on bare metal.
My original diff contains only changes for filesystem access abstractions,
and I would argue whether filesystems should exist on certain kinds of
embedded systems. In regards to sockets, nearly all of embedded stacks
numeric descriptors that are quite similar in nature, those that are
different in respect to this are, in fact, different in many other ways.
On 18 Jul 2014 08:45, "Chris Morgan" notifications@github.com wrote:

I do not like this, Sam-I-Am.

Supposing I wish to use a different runtime that doesn't use that sort of
descriptor at all--a bare metal case might do it, or perhaps an
emscripten-specific runtime for running in the browser (I can easily
imagine that one having absolutely nothing to do with such magic
numbers). I do not believe it is at all reasonable to expect to be able to
have such a descriptor, at the language level at least.

Still, for pragmatism's sake, it must evidently be exposed somewhere.
Given that, I would say very strongly that such functions should be marked
unsafe, and probably unstable also. I would think also that having it on a
separate trait would be a good thing--could the trait even live in the libc
crate to make it absolutely clear that it's That Sort of Thing?

Reply to this email directly or view it on GitHub
#15643 (comment).

@huonw
Copy link
Member

huonw commented Jul 18, 2014

I would think also that having it on a separate trait would be a good thing

A trait is useless for distiguishing things that only have sensible implementation under certain runtimes, the std::io interfaces are completely opaque to the underlying runtime, meaning an implementation for File would apply to a File objects in any runtimes, exactly the same as a freestanding method.

This is to improve the interoperability with external C libraries.
One should be able to leverage most Rust's standard run-time functions
for openning and closing files with all the error handling etc, while
only making C library calls when it's very neccessary.
@errordeveloper
Copy link
Contributor Author

@huonw what do you say: add get_fd() to std::io::File or keep it how it is right now, requiring one to use native::io::file::open?

@errordeveloper
Copy link
Contributor Author

I have just notice new rust-lang/rfcs#185... If that's to be implemented, I'd hold on with the get_fd() story.

@tbu-
Copy link
Contributor

tbu- commented Aug 13, 2014

I don't see why getting the file descriptor should be unsafe – getting a raw pointer from a & pointer isn't unsafe either, it's only that actually using it is unsafe.

As per the Rust design documents, reading private values from objects isn't considered unsafe if I remember correctly.

@l0kod
Copy link
Contributor

l0kod commented Aug 18, 2014

First, about the get_fd method, I suggest to use something like the FileDesc struct (with close_on_drop set to true) instead of the c_int result (how should be typed to fd_t anyway).

Second, it would be safer to enclose the file descriptor use into a closure. What's unsafe about file descriptor is their lifetime: they can be closed by another piece of code, refer to a different file (i.e. close then open), or passed to an execve (Command close all file descriptor, except 0 to 2, before execve anyway, but some unsafe code could execve too).

I don't think we need a close_after parameter, we just can make the file descriptor object live until the C binding close the raw file descriptor.
If we really need a persistent file descriptor we can dup it.
For MemoryMap-like struct, the Rust libraries should take care of file descriptor transfert/safety. Maybe MapFd should be created from a RtioPipe?

We could use a with_fd method inspired from the with_c_str but for the RtioPipe trait:

fn with_fd<T>(&self, f: |Option<&fd_t>| -> T) -> T;

I used Option hack because it's maybe not possible to get fd_t from any RtioPipe (libuv, special architecture?).

The problem is that we get error: cannot call a generic method through an object
By the way, is there plan to support generic method call through trait?

@l0kod
Copy link
Contributor

l0kod commented Aug 18, 2014

First, about the get_fd method, I suggest to use something like the FileDesc struct (with close_on_drop set to true) instead of the c_int result (how should be typed to fd_t anyway).

Not the right choice, prefer to return &fd_t to let the RtioPipe handle the file descriptor closing, but it sound unsafe anyway because the RtioPipe can read/write/lseek/close the same file descriptor.
The with_fd method definitely seems to be a better choice.

To properly handle file descriptor lifetime in bindings, the right thing to do is to wrap the bindings in a Rust object who handle the file descriptor closing.

@tbu-
Copy link
Contributor

tbu- commented Aug 19, 2014

Second, it would be safer to enclose the file descriptor use into a closure. What's unsafe about file descriptor is their lifetime: they can be closed by another piece of code, refer to a different file (i.e. close then open), or passed to an execve (Command close all file descriptor, except 0 to 2, before execve anyway, but some unsafe code could execve too).

I don't see how reading the file descriptor is unsafe in any way – only using them to construct other things should be unsafe. Compare

&3 as *const int

(which is safe) to

unsafe { *(&3 as *const int) }

(which is unsafe). Getting the raw value of a pointer is not unsafe at all, while actually using it is. This is how it should be handled with file descriptors IMO, getting them out of a structure is no problem at al – only using them to construct other things should be unsafe.

@tbu-
Copy link
Contributor

tbu- commented Aug 26, 2014

What are file descriptors supposed to be on Windows? I only know of HANDLEs (and would have assumed that Rust will return these as FileDesc on Windows.)

@thestinger
Copy link
Contributor

It has numeric file descriptors in the C runtime, which can be converted to and from a HANDLE. It would be best to just use HANDLE on Windows and have a different type on different platforms. It could trigger a portability lint or whatever.

@thestinger
Copy link
Contributor

Anyway, it would be nice to expose a proper FileDesc type with the reference counting ownership semantics of POSIX file descriptors and Windows HANDLEs.

pub struct FileDesc {
    fd: c_int
}

impl FileDesc {
    pub unsafe fn from_handle(fd: c_int) -> FileDesc { ... }
    pub fn as_handle(&self) -> c_int { ... }
    pub fn into_handle(self) -> c_int { ... }
    pub fn dup(&self) -> Result<FileDesc, Error> { ... }
    pub fn read(&self, buffer: &mut [u8]) -> IoResult<uint> { ... }
    pub fn write(&self, buffer: &[u8]) -> IoResult<uint> { ... }
    ...
}

An important thing to keep in mind is that all files should be atomically O_CLOEXEC by default, so using dup, dup2, etc. rather than fcntl is wrong.

@l0kod
Copy link
Contributor

l0kod commented Aug 26, 2014

fd: c_inc

Preferable to use a dedicated fd_t type instead of c_int.

impl FileDesc {

pub fn read(&self, buffer: &mut [u8]) -> IoResult<uint> { ... }

Like I said previously, we should probably take care of file descriptor type (e.g. regular file, socket, directory) and their capabilities: it should not be possible to attempt a read on a directory file descriptor.

To avoid this, FileDesc should be a trait and an enum. Something like this:

trait BasicFileDesc {
    pub unsafe fn from_handle(fd: fd_t) -> Self {}
    // …
    pub fn dup(&self) -> IoResult<Self> {fcntl()} // also called by clone()?
}
impl BasicFileDesc for RegularFd;
// same for all other file descriptor types…

trait DirFileDesc {
    pub fn openat() -> IoResult<FileDesc> // enum defined below
    pub fn getdents() -> IoResult<DirEntry> // or readdir-like
    // …
}
impl DirFileDesc for DirectoryFd;

trait RwFileDesc {
    pub fn read() -> IoResult<uint> { ... }
    // …
}
impl RwFileDesc for RegularFd;
impl RwFileDesc for CharDevFd;
// …

trait DevFileDesc {
    fn ioctl() -> IoResult<()>
    // …
}
impl DevFileDesc for CharDevFd;

impl CharDevFd {
    fn isatty() -> bool {}
}

enum FileDesc {
    Regular(RegularFd),
    Directory(DirectoryFd), // implement openat, linkat…
    CharDev(CharDevFd), // implement ioctl, isatty…
    BlockDev(BlockDevFd), // implement ioctl…
    Pipe(PipeFd),
    Symlink(SymlinkFd),
    Socket(SocketFd),
    Unknown(UnknownFd), // seems to be possible: bad thing for static typing but could implement all traits
}

E.g. the native::io::file::open should then return the previously defined IoResult<FileDesc>.

An important thing to keep in mind is that all files should be atomically O_CLOEXEC by default, so using dup, dup2, etc. rather than fcntl is wrong.

Right, dup3 can also do it but is a bit more newer.

@thestinger
Copy link
Contributor

@l0kod: F_DUPFD_CLOEXEC for fcntl is part of POSIX, and equivalent to dup with CLOEXEC. I don't think there's a standard equivalent for dup2, but yes, dup3 can be used on Linux.

@thestinger
Copy link
Contributor

If you do this with an enum, you're responsible for including every kind of file descriptor / handle. On Linux, that includes signalfd, eventfd, timerfd, epoll and more. It's not unsafe to call read on a file descriptor that's not readable, and a low-level type shouldn't be adding checks that aren't necessary for safety. There can be higher level abstractions, but that doesn't preclude having useful low-level functionality that it's built on.

@tbu-
Copy link
Contributor

tbu- commented Aug 26, 2014

Indeed, the whole point we wanted to achieve with this was compatiblity with low-level libraries in other languages – so wrapping it up as such an enum would have the wrong effect in my opinion.

@l0kod
Copy link
Contributor

l0kod commented Aug 26, 2014

OK, I missed some file descriptor types ;) but It's probably why fstat can return an unknown type.
All 7 types (plus the unknown) are defined by POSIX.

@l0kod
Copy link
Contributor

l0kod commented Aug 26, 2014

wrapping it up as such an enum would have the wrong effect in my opinion.

Doesn't it one of the goal of static typing: to avoid confusion between different things?

I don't see why it would be a problem for binding with low-level library.

@tbu-
Copy link
Contributor

tbu- commented Aug 26, 2014

Because C doesn't understand these enums, it only "understands" the raw FileDesc type.

@l0kod
Copy link
Contributor

l0kod commented Aug 26, 2014

Yes, C understand the raw fd_t but the FFI goal is to wrap unsafe functions in safe ones. High level typing is not a problem.

@thestinger
Copy link
Contributor

By adding abstraction and typing restrictions, it's no longer a low-level library and can no longer express everything that you can in C. Exposing low-level functionality does not preclude having high-level abstractions built on top of them. It shouldn't go beyond what is necessary to achieve safety unless it's not making any use cases more difficult.

@l0kod
Copy link
Contributor

l0kod commented Aug 26, 2014

By adding abstraction and typing restrictions, it's no longer a low-level library and can no longer express everything that you can in C.

Yes, that's the goal of proper wrapping, to avoid errors :)

I don't see the difference of use between a single FileDesc and multiple ones with traits for low-level binding. The enum thing is not a problem if you use it for return value and take a trait for function argument.

The idea is that bindings who return file descriptors (e.g. open(2)) can tighten file descriptor type with fstat. Functions taking file descriptor as argument should use traits. Moreover, I don't see any problem if UnknownFd implement all traits. Am I missing something?

@petrochenkov
Copy link
Contributor

Plus to thestinger, I feel the primary goal is a possibility to use an operating system API in its entirety.
And after that anybody can add, refine and discuss whatever abstractions and wrappers he wants.

@l0kod
Copy link
Contributor

l0kod commented Aug 28, 2014

Good news! libuv now support file descriptors: joyent/libuv#1435.

@retep998
Copy link
Member

Of particular note is that libuv provides HANDLE on Windows while Rust provides c_int file descriptors instead.

@alexcrichton
Copy link
Member

Closing due to inactivity. There are many changes planned to the I/O libraries, especially libnative. While these changes settle down over the coming weeks, something like this will definitely be included. I would recommend holding off to see how the new design of libnative works out.

The plan is to have high-level primitives in libstd which provide access to lower-level primitives in libnative which in turn provide access to the raw file descriptor/handle objects.

@errordeveloper
Copy link
Contributor Author

@alexcrichton I am definitely looking forward to see those changes. Is there a tracking issues for it?

@errordeveloper errordeveloper deleted the get_fd branch September 9, 2014 13:11
@alexcrichton
Copy link
Member

I'd follow rust-lang/rfcs#219 which, although closed, will be updated with the next course of action.

@aturon
Copy link
Member

aturon commented Sep 9, 2014

See (and please comment on!) this new RFC.

@l0kod
Copy link
Contributor

l0kod commented Nov 9, 2014

@l0kod
Copy link
Contributor

l0kod commented Nov 22, 2014

New PR: #19169

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.