
OSv Linux ABI Compatibility

WALDEMAR KOZACZUK edited this page Nov 3, 2019 · 14 revisions

OSv mostly implements Linux's ABI. This means that most unmodified executable code compiled for Linux can be run in OSv. There are only a few areas where OSv is known to be imperfectly compatible with Linux. This document lists them.

Executable formats

Currently, OSv can run the following "as is": executables in relocatable shared object (".so") form, standard Linux position-independent executables ("PIE"), and position-dependent executables, that is, non-relocatable dynamically linked executables (for details see this). Support for running statically linked executables will be harder.

In most cases, there should be no need to recompile an app from the source in order to make it run on OSv.

Processes

OSv supports only a single process. Therefore, fork(), vfork() and clone() are not supported (their use in an executable will cause a crash because of a missing symbol).

Moreover, in OSv there is no isolation between the single process and the kernel: we do not track which memory and which resources (threads, mutexes, etc.) belong to the process and which to the kernel. Replacing only the process while keeping the kernel, as exec() would require, is therefore not supported, and neither is any exec() variant: execl(), execlp(), execle(), execv(), execvp(), execvpe(), and execve(). Instead, if you want to run another executable, you can load it as a new shared object (with dlopen() or equivalent) and run its main() (see also osv::run(3o)), then attempt to free the old shared object's resources and unload it. NOTE: Some of the exec functions have since been implemented, so this section must be updated.
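The dlopen()-based approach described above can be sketched roughly as follows. This is a minimal illustration, not OSv's own implementation: run_shared_object_main is a hypothetical helper name, and the sketch assumes the target object exports a conventional main(int, char **) symbol. Whether a particular object can be loaded this way depends on the platform's dynamic linker.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Signature of a conventional C main(). */
typedef int (*main_fn)(int, char **);

/*
 * Hypothetical helper (not an OSv API): load the shared object at `path`,
 * look up its main(), call it, and unload the object again.
 * Returns main()'s return value, or -1 if the object could not be loaded
 * or does not export a "main" symbol.
 */
int run_shared_object_main(const char *path, int argc, char **argv)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    /* Look up the entry point by name. */
    main_fn entry = (main_fn)dlsym(handle, "main");
    if (!entry) {
        fprintf(stderr, "no main() in %s\n", path);
        dlclose(handle);
        return -1;
    }

    int rc = entry(argc, argv);

    /* Unload the object; resources it leaked are not reclaimed by this. */
    dlclose(handle);
    return rc;
}
```

A caller would invoke it as run_shared_object_main("/path/to/app.so", argc, argv). Note that a return value of -1 is ambiguous in this sketch, since the loaded main() could itself return -1; a real helper would likely report load errors separately.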

Users

OSv is designed to support a single application on a VM, and with only a single application, trying to isolate different users is pointless. Therefore, OSv only supports a single user, with uid=0 and gid=0. Trying to set a different user will fail. Permission bits on files are ignored (TODO: only the owner/group/other difference, or also the writable bit?).

Signals

In Linux, when a signal is sent to a process with kill(), it is delivered to one of this process's threads which hasn't masked this signal - preferably to the main thread but if it masked the signal, then one of the other threads is chosen.

In contrast, in OSv signals are delivered in a new thread, not in one of the process's existing threads. The signal handler does not preempt an existing thread and cannot take it over (as single-threaded applications sometimes did using longjmp()). We do support, however, signal delivery interrupting a long-running system call (such as sleep() or a blocking read() from the network) in a thread.

It is hoped that this difference will not matter to most actual uses of kill() (and related functions, such as alarm()) in cloud applications. The reason why signal handling was implemented differently in OSv is system call reentrancy and interruption. If a signal handler were to run in an existing thread, we would need to handle the case where the handler runs in the middle of a system call and itself calls a system call, which might not be reentrant. It is possible to track calls to "system calls" (calls from a shared object to the main program) and avoid running a signal handler while a system call is in progress, running it when the call returns. But then we have the problem of system call interruption: if the thread is sleeping in a "system call" like sleep(), poll(), or read(), or internally in an OSv function like mutex_wait(), condvar_wait(), or msleep(), the signal handler can be delayed indefinitely. This is why in Unix the signal first interrupts the system call, which needs to unravel its stack and return EINTR. All of this is doable, but would require extensive changes to the code and make it slower and uglier, just to support an archaic Unix API, kill().
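One handler style that does not depend on which thread runs the handler is to have it only set a flag that the application polls. The sketch below illustrates that pattern with standard POSIX and C11 calls. The names install_flag_handler and signal_was_seen are hypothetical, and the claim that this pattern behaves the same on OSv as on Linux is an assumption based on the description above, not something tested here.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdatomic.h>
#include <string.h>

/* Set by the handler, read by the application. An atomic is used because
 * the handler may run in a different thread than the reader. */
static atomic_int got_signal;

/* The handler only records that the signal arrived; it does not use
 * longjmp() or assume anything about the thread it runs in. */
static void on_signal(int signo)
{
    (void)signo;
    atomic_store(&got_signal, 1);
}

/* Hypothetical helper: install on_signal() for `signo`.
 * Returns 0 on success, -1 on failure (as sigaction() does). */
int install_flag_handler(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Hypothetical helper: has the handler run at least once? */
int signal_was_seen(void)
{
    return atomic_load(&got_signal) != 0;
}
```

An application would call install_flag_handler(SIGUSR1) at startup and check signal_was_seen() from its main loop. Code that relies on the handler interrupting a specific thread would need a different design.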

Special files

Most Linux /proc/* files are not yet supported on OSv. /dev/urandom and /dev/random are available. Both of them are implemented on top of the FreeBSD CSPRNG, which uses the Yarrow algorithm. NOTE: most of the entropy comes from hardware sources, e.g. virtio-rng; interrupt timing is used as a software source of entropy.
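Since /dev/urandom is available, an application can read random bytes from it with ordinary file I/O, as it would on Linux. A minimal sketch follows; read_urandom is a hypothetical helper name, not part of OSv.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: fill `buf` with `len` bytes read from /dev/urandom.
 * Returns 0 on success, -1 if the device could not be opened or the read
 * came up short.
 */
int read_urandom(unsigned char *buf, size_t len)
{
    FILE *f = fopen("/dev/urandom", "rb");
    if (!f) {
        return -1;
    }

    size_t got = fread(buf, 1, len, f);
    fclose(f);

    return got == len ? 0 : -1;
}
```

For example, declaring unsigned char key[32] and calling read_urandom(key, sizeof key) fills the buffer with 32 random bytes when it returns 0.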
