Refactored nanobind so that it works with Py_LIMITED_API #37

Merged
merged 1 commit into from
May 26, 2022
39 changes: 28 additions & 11 deletions README.md
Expand Up @@ -142,16 +142,21 @@ long-standing performance issues in _pybind11_:
pointer chasing compared to _pybind11_). The per-instance overhead for
wrapping a C++ type into a Python object shrinks by 2.3x. (_pybind11_: 56
bytes, _nanobind_: 24 bytes.)

- C++ function binding information is now co-located with the Python function
object (less pointer chasing).

- C++ type binding information is now co-located with the Python type object
(less pointer chasing, fewer hashtable lookups).

- _nanobind_ internally replaces `std::unordered_map` with a more efficient
hash table ([tsl::robin_map](https://github.com/Tessil/robin-map), which is
included as a git submodule).

- function calls from/to Python are realized using [PEP 590 vector
calls](https://www.python.org/dev/peps/pep-0590), which gives a nice speed
boost. The main function dispatch loop no longer allocates heap memory.

- _pybind11_ was designed as a header-only library, which is generally a good
thing because it simplifies the compilation workflow. However, one major
downside of this is that a large amount of redundant code has to be compiled
Expand All @@ -160,15 +165,18 @@ long-standing performance issues in _pybind11_:
support library (`libnanobind`) and links it against the binding code to
avoid redundant compilation. When using the CMake `nanobind_add_module()`
function, this all happens transparently.

- `#include <pybind11/pybind11.h>` pulls in a large portion of the STL (about
2.1 MiB of headers with Clang and libc++). _nanobind_ minimizes STL usage to
avoid this problem. Type casters even for basic types like `std::string`
require an explicit opt-in by including an extra header file (e.g. `#include
<nanobind/stl/string.h>`).

- _pybind11_ is dependent on *link time optimization* (LTO) to produce
reasonably-sized bindings, which makes linking a build time bottleneck. With
_nanobind_'s split into a precompiled core library and minimal
metatemplating, LTO is no longer important.

- _nanobind_ maintains efficient internal data structures for lifetime
management (needed for `nb::keep_alive`, `nb::rv_policy::reference_internal`,
the `std::shared_ptr` interface, etc.). With these changes, it is no longer
Expand All @@ -180,6 +188,18 @@ long-standing performance issues in _pybind11_:
Besides performance improvements, _nanobind_ includes several quality-of-life
improvements for developers:

- _nanobind_ has [greatly
improved](https://github.com/wjakob/nanobind/blob/master/docs/tensor.md)
support for exchanging CPU/GPU/TPU/.. tensor data structures with modern
array programming frameworks.

- _nanobind_ can target Python's [stable ABI
interface](https://docs.python.org/3/c-api/stable.html) starting with Python
3.12. This means that extension modules will eventually be compatible with
any future version of Python without having to compile separate binaries per
version. That vision is still far out, however: it will require Python 3.12+
to be widely deployed.

- When the Python interpreter shuts down, _nanobind_ reports instance, type,
and function leaks related to bindings, which is useful for tracking down
reference counting issues.
Expand All @@ -195,11 +215,6 @@ improvements for developers:
- _nanobind_ docstrings have improved out-of-the-box compatibility with tools
like [Sphinx](https://www.sphinx-doc.org/en/master/).

- _nanobind_ has [greatly
improved](https://github.com/wjakob/nanobind/blob/master/docs/tensor.md)
support for exchanging tensor data structures with modern array programming
frameworks.

### Dependencies

_nanobind_ depends on recent versions of everything:
Expand Down Expand Up @@ -419,24 +434,26 @@ changes are detailed below.


- **Supplemental type data**: _nanobind_ can store supplemental data along
with registered types. This information is co-located with the Python type
object. One example use of this fairly advanced feature is libraries that
register large numbers of different types (e.g. flavors of tensors). A
single generically implemented function can then query this supplemental
information to handle each type slightly differently.
with registered types. One example use of this fairly advanced feature is
libraries that register large numbers of different types (e.g. flavors of
tensors). A single generically implemented function can then query this
supplemental information to handle each type slightly differently.

```cpp
struct Supplement {
... // should be a POD (plain old data) type
};

// Register a new type Test, and reserve space for sizeof(Supplement)
nb::class_<Test> cls(m, "Test", nb::supplement<Supplement>());
nb::class_<Test> cls(m, "Test", nb::supplement<Supplement>(), nb::is_final());

/// Mutable reference to 'Supplement' portion in Python type object
Supplement &supplement = nb::type_supplement<Supplement>(cls);
```

The supplement is not propagated to subclasses created within Python.
Such types should therefore be created with `nb::is_final()`.

- **Low-level interface**: _nanobind_ exposes a low-level interface to
provide fine-grained control over the sequence of steps that instantiates a
Python object wrapping a C++ instance. Like the above point, this is useful
Expand Down
51 changes: 38 additions & 13 deletions cmake/nanobind-config.cmake
Expand Up @@ -80,15 +80,15 @@ function (nanobuild_build_library TARGET_NAME TARGET_TYPE)
${NB_DIR}/include/nanobind/stl/vector.h
${NB_DIR}/include/nanobind/stl/list.h

${NB_DIR}/src/internals.h
${NB_DIR}/src/buffer.h
${NB_DIR}/src/internals.cpp
${NB_DIR}/src/common.cpp
${NB_DIR}/src/tensor.cpp
${NB_DIR}/src/nb_internals.h
${NB_DIR}/src/nb_internals.cpp
${NB_DIR}/src/nb_func.cpp
${NB_DIR}/src/nb_type.cpp
${NB_DIR}/src/nb_enum.cpp
${NB_DIR}/src/common.cpp
${NB_DIR}/src/error.cpp
${NB_DIR}/src/tensor.cpp
${NB_DIR}/src/trampoline.cpp
${NB_DIR}/src/implicit.cpp
)
Expand Down Expand Up @@ -161,8 +161,12 @@ function(nanobind_disable_stack_protector name)
endfunction()

function(nanobind_extension name)
set_target_properties(${name} PROPERTIES
PREFIX "" SUFFIX "${NB_SUFFIX}")
set_target_properties(${name} PROPERTIES PREFIX "" SUFFIX "${NB_SUFFIX}")
endfunction()

function(nanobind_extension_abi3 name)
get_filename_component(ext "${NB_SUFFIX}" LAST_EXT)
set_target_properties(${name} PROPERTIES PREFIX "" SUFFIX ".abi3${ext}")
endfunction()

function (nanobind_cpp17 name)
Expand All @@ -187,23 +191,44 @@ function (nanobind_headers name)
endfunction()

function(nanobind_add_module name)
cmake_parse_arguments(PARSE_ARGV 1 ARG "NOMINSIZE;NOSTRIP;NB_STATIC;NB_SHARED;PROTECT_STACK;LTO" "" "")
cmake_parse_arguments(PARSE_ARGV 1 ARG "NOMINSIZE;STABLE_ABI;NOSTRIP;NB_STATIC;NB_SHARED;PROTECT_STACK;LTO" "" "")

Python_add_library(${name} MODULE ${ARG_UNPARSED_ARGUMENTS})

nanobind_cpp17(${name})
nanobind_extension(${name})
nanobind_msvc(${name})
nanobind_headers(${name})

if (ARG_NB_STATIC)
nanobuild_build_library(nanobind-static STATIC)
target_link_libraries(${name} PRIVATE nanobind-static)
# Limited API interface only supported in Python >= 3.12
if ((Python_VERSION_MAJOR EQUAL 3) AND (Python_VERSION_MINOR LESS 12))
set(ARG_STABLE_ABI OFF)
endif()
Comment on lines +202 to +205
Contributor:

Why is this set to 3.12? The documentation mentions:

> Python 3.2 introduced the Limited API, a subset of Python’s C API.

so I would have assumed that 3.2 is sufficient.

Owner (author):

nanobind depends on a few features that I specifically worked on adding to the stable ABI (search for my name here: https://docs.python.org/3.12/whatsnew/3.12.html). That means they are only usable in stable ABI builds targeting 3.12+ (so not very useful just yet, but good to have for the future).
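
To make the 3.12 threshold from this thread concrete, here is a minimal sketch of how a CPython-style version constant packs the major and minor version into a hex value (major in the top byte, minor in the next). The helper names `version_hex`, `hex_major`, `hex_minor`, and `supports_stable_abi` are hypothetical and are not part of nanobind or CPython; only the bit layout follows the `PY_VERSION_HEX` convention.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not nanobind or CPython API). They only mirror the
// byte layout used by CPython version constants: 0xMMmmpp.. where MM is the
// major version, mm the minor version, and pp the micro version.
constexpr uint32_t version_hex(uint32_t major, uint32_t minor, uint32_t micro = 0) {
    return (major << 24) | (minor << 16) | (micro << 8);
}

constexpr uint32_t hex_major(uint32_t hex) { return (hex >> 24) & 0xFF; }
constexpr uint32_t hex_minor(uint32_t hex) { return (hex >> 16) & 0xFF; }

// Assumption for illustration: stable ABI builds are only attempted when the
// interpreter is at least 3.12, matching the threshold discussed in the thread.
constexpr bool supports_stable_abi(uint32_t major, uint32_t minor) {
    return version_hex(major, minor) >= version_hex(3, 12);
}

static_assert(version_hex(3, 12) == 0x030C0000, "3.12 encodes as 0x030C0000");
```

Under these assumptions, an interpreter at 3.11 would fall back to a regular (version-specific) build, while 3.12 and later would qualify.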


if (ARG_STABLE_ABI)
if (ARG_NB_STATIC)
nanobuild_build_library(nanobind-static-abi3 STATIC)
set(libname nanobind-static-abi3)
else()
nanobuild_build_library(nanobind-abi3 SHARED)
set(libname nanobind-abi3)
endif()

target_compile_definitions(${libname} PUBLIC -DPy_LIMITED_API=0x030C0000)
nanobind_extension_abi3(${name})
else()
nanobuild_build_library(nanobind SHARED)
target_link_libraries(${name} PRIVATE nanobind)
if (ARG_NB_STATIC)
nanobuild_build_library(nanobind-static STATIC)
set(libname nanobind-static)
else()
nanobuild_build_library(nanobind SHARED)
set(libname nanobind)
endif()

nanobind_extension(${name})
endif()

target_link_libraries(${name} PRIVATE ${libname})

if (NOT ARG_PROTECT_STACK)
nanobind_disable_stack_protector(${name})
endif()
Expand Down
7 changes: 7 additions & 0 deletions docs/cmake.md
Expand Up @@ -59,6 +59,13 @@ it performs the following steps to produce efficient bindings.
- It appends the library suffix (e.g., `.cpython-39-darwin.so`) based on
information provided by CMake's `FindPython` module.

- When requested via the optional `STABLE_ABI` parameter, and when your
version of Python is sufficiently recent (3.12+), the implementation
will build a [stable ABI](https://docs.python.org/3/c-api/stable.html)
extension module with a different suffix (e.g., `.abi3.so`). This comes at a
performance cost since _nanobind_ can no longer access the internals of
various data structures directly.

- It statically or dynamically links against `libnanobind` depending on the
value of the `NB_SHARED` parameter of the CMake project. Note that
`NB_SHARED` is not an input of the `nanobind_add_module()` function. Rather,
Expand Down
8 changes: 4 additions & 4 deletions include/nanobind/nb_accessor.h
Expand Up @@ -120,13 +120,13 @@ struct num_item_list {
using key_type = Py_ssize_t;

NB_INLINE static void get(PyObject *obj, Py_ssize_t index, PyObject **cache) {
*cache = PyList_GET_ITEM(obj, index);
*cache = NB_LIST_GET_ITEM(obj, index);
}

NB_INLINE static void set(PyObject *obj, Py_ssize_t index, PyObject *v) {
PyObject *old = PyList_GET_ITEM(obj, index);
PyObject *old = NB_LIST_GET_ITEM(obj, index);
Py_INCREF(v);
PyList_SET_ITEM(obj, index, v);
NB_LIST_SET_ITEM(obj, index, v);
Py_DECREF(old);
}
};
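
The `set()` accessor above takes a reference to the new value before releasing the old one. A minimal, self-contained sketch of why that ordering is safe even when the same object is stored back into its own slot is shown below. It uses a hypothetical `mock_object` with a plain integer reference count rather than CPython's `PyObject`, so the names `mock_object`, `incref`, `decref`, and `set_item` are illustrative assumptions only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mock of a reference-counted object; not CPython's PyObject.
struct mock_object {
    int refcount = 1;
    bool destroyed = false;
};

inline void incref(mock_object *o) { ++o->refcount; }

inline void decref(mock_object *o) {
    // A real runtime would deallocate here; the mock only records the event.
    if (--o->refcount == 0)
        o->destroyed = true;
}

// Each slot owns exactly one reference to the element it stores.
inline void set_item(std::vector<mock_object *> &slots, std::size_t index,
                     mock_object *v) {
    mock_object *old = slots[index];
    incref(v);         // acquire the new reference first
    slots[index] = v;  // the slot takes ownership of that reference
    decref(old);       // release the previous occupant last
}
```

If the decrement happened first and `v` were the slot's current occupant with a single reference, the object would be destroyed before it could be stored again; incrementing first avoids that.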
Expand All @@ -136,7 +136,7 @@ struct num_item_tuple {
using key_type = Py_ssize_t;

NB_INLINE static void get(PyObject *obj, Py_ssize_t index, PyObject **cache) {
*cache = PyTuple_GET_ITEM(obj, index);
*cache = NB_TUPLE_GET_ITEM(obj, index);
}

template <typename...Ts> static void set(Ts...) {
Expand Down
10 changes: 6 additions & 4 deletions include/nanobind/nb_attr.h
Expand Up @@ -51,14 +51,16 @@ struct is_method {};
struct is_implicit {};
struct is_operator {};
struct is_arithmetic {};
struct is_final { };
struct is_enum {
bool is_signed;
};

template <size_t /* Nurse */, size_t /* Patient */> struct keep_alive {};
template <typename T> struct supplement {};
struct type_callback {
type_callback(void (*value)(PyTypeObject *) noexcept) : value(value) {}
void (*value)(PyTypeObject *) noexcept;
type_callback(void (*value)(PyType_Slot **) noexcept) : value(value) {}
void (*value)(PyType_Slot **) noexcept;
};
struct raw_doc {
const char *value;
Expand Down Expand Up @@ -94,7 +96,7 @@ enum class func_flags : uint32_t {
is_implicit = (1 << 12),
/// Is this function an arithmetic operator?
is_operator = (1 << 13),
/// When the function is GCed, do we need to call func_data::free?
/// When the function is GCed, do we need to call func_data_prelim::free?
has_free = (1 << 14),
/// Should the func_new() call return a new reference?
return_ref = (1 << 15),
Expand All @@ -110,7 +112,7 @@ struct arg_data {
bool none;
};

template <size_t Size> struct func_data {
template <size_t Size> struct func_data_prelim {
// A small amount of space to capture data used by the function/closure
void *capture[3];

Expand Down
15 changes: 7 additions & 8 deletions include/nanobind/nb_call.h
Expand Up @@ -31,15 +31,14 @@ template <typename T>
NB_INLINE void call_analyze(size_t &nargs, size_t &nkwargs, const T &value) {
using D = std::decay_t<T>;

if constexpr (std::is_same_v<D, arg_v>) {
if constexpr (std::is_same_v<D, arg_v>)
nkwargs++;
} else if constexpr (std::is_same_v<D, args_proxy>) {
else if constexpr (std::is_same_v<D, args_proxy>)
nargs += len(value);
} else if constexpr (std::is_same_v<D, kwargs_proxy>) {
else if constexpr (std::is_same_v<D, kwargs_proxy>)
nkwargs += len(value);
} else {
else
nargs += 1;
}

(void) nargs; (void) nkwargs; (void) value;
}
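
The counting pass in `call_analyze` can be tried in isolation. The sketch below reproduces the same `if constexpr` dispatch with hypothetical stand-in types (`keyword_arg`, `positional_pack`, `keyword_pack`) in place of nanobind's `arg_v`, `args_proxy`, and `kwargs_proxy`, and adds an assumed `count_arguments` helper that folds over a parameter pack. None of these names come from nanobind itself.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for nanobind's argument wrapper types.
struct keyword_arg { const char *name; int value; };
struct positional_pack { std::vector<int> items; };
struct keyword_pack { std::vector<keyword_arg> items; };

// Classify one argument at compile time and update the running totals.
template <typename T>
void analyze(std::size_t &nargs, std::size_t &nkwargs, const T &value) {
    using D = std::decay_t<T>;

    if constexpr (std::is_same_v<D, keyword_arg>)
        nkwargs++;
    else if constexpr (std::is_same_v<D, positional_pack>)
        nargs += value.items.size();
    else if constexpr (std::is_same_v<D, keyword_pack>)
        nkwargs += value.items.size();
    else
        nargs += 1;

    // Silence unused-parameter warnings in branches that ignore an input.
    (void) nargs; (void) nkwargs; (void) value;
}

struct counts { std::size_t nargs, nkwargs; };

// Assumed helper: run the analysis over every argument via a fold expression.
template <typename... Ts>
counts count_arguments(const Ts &...values) {
    counts c{0, 0};
    (analyze(c.nargs, c.nkwargs, values), ...);
    return c;
}
```

Because every branch is resolved at compile time, only the matching statement is instantiated for each argument type, which is why the unbraced form in the diff behaves identically to the braced one.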
Expand All @@ -53,7 +52,7 @@ NB_INLINE void call_init(PyObject **args, PyObject *kwnames, size_t &nargs,

if constexpr (std::is_same_v<D, arg_v>) {
args[kwargs_offset + nkwargs] = value.value.release().ptr();
PyTuple_SET_ITEM(kwnames, nkwargs++,
NB_TUPLE_SET_ITEM(kwnames, nkwargs++,
PyUnicode_InternFromString(value.name));
} else if constexpr (std::is_same_v<D, args_proxy>) {
for (size_t i = 0, l = len(value); i < l; ++i)
Expand All @@ -65,7 +64,7 @@ NB_INLINE void call_init(PyObject **args, PyObject *kwnames, size_t &nargs,
while (PyDict_Next(value.ptr(), &pos, &key, &entry)) {
Py_INCREF(key); Py_INCREF(entry);
args[kwargs_offset + nkwargs] = entry;
PyTuple_SET_ITEM(kwnames, nkwargs++, key);
NB_TUPLE_SET_ITEM(kwnames, nkwargs++, key);
}
} else {
args[nargs++] =
Expand All @@ -88,7 +87,7 @@ NB_INLINE void call_init(PyObject **args, PyObject *kwnames, size_t &nargs,
args[0] = nullptr; \
args_p = args + 1; \
} \
nargs |= PY_VECTORCALL_ARGUMENTS_OFFSET; \
nargs |= NB_VECTORCALL_ARGUMENTS_OFFSET; \
return steal(obj_vectorcall(base, args_p, nargs, kwnames, method_call))

template <typename Derived>
Expand Down
2 changes: 1 addition & 1 deletion include/nanobind/nb_cast.h
Expand Up @@ -318,7 +318,7 @@ tuple make_tuple(Args &&...args) {
size_t nargs = 0;
PyObject *o = result.ptr();

(PyTuple_SET_ITEM(o, nargs++,
(NB_TUPLE_SET_ITEM(o, nargs++,
detail::make_caster<Args>::from_cpp(
(detail::forward_t<Args>) args,
detail::infer_policy<Args>(policy), nullptr).ptr()),
Expand Down
28 changes: 20 additions & 8 deletions include/nanobind/nb_class.h
Expand Up @@ -57,14 +57,19 @@ enum class type_flags : uint32_t {
is_arithmetic = (1 << 15),

/// This type registers a custom type callback
has_type_callback = (1 << 16)
has_type_callback = (1 << 16),

/// This type does not permit subclassing from Python
is_final = (1 << 17),

/// This type carries supplemental data (see nb::supplement)
has_supplement = (1 << 18)
};

struct type_data {
uint32_t size : 24;
uint32_t size;
uint32_t align : 8;
uint32_t flags : 20;
uint32_t supplement : 12;
uint32_t flags : 24;
const char *name;
const char *doc;
PyObject *scope;
Expand All @@ -77,10 +82,11 @@ struct type_data {
void (*move)(void *, void *) noexcept;
const std::type_info **implicit;
bool (**implicit_py)(PyTypeObject *, PyObject *, cleanup_list *) noexcept;
void (*type_callback)(PyTypeObject *) noexcept;
void (*type_callback)(PyType_Slot **) noexcept;
void *supplement;
};

static_assert(sizeof(type_data) == 8 + sizeof(void *) * 13);
static_assert(sizeof(type_data) == 8 + sizeof(void *) * 14);

NB_INLINE void type_extra_apply(type_data &t, const handle &h) {
t.flags |= (uint32_t) type_flags::has_base_py;
Expand All @@ -104,14 +110,20 @@ NB_INLINE void type_extra_apply(type_data &t, is_enum e) {
t.flags |= (uint32_t) type_flags::is_unsigned_enum;
}

NB_INLINE void type_extra_apply(type_data &t, is_final) {
t.flags |= (uint32_t) type_flags::is_final;
}

NB_INLINE void type_extra_apply(type_data &t, is_arithmetic) {
t.flags |= (uint32_t) type_flags::is_arithmetic;
}

template <typename T>
NB_INLINE void type_extra_apply(type_data &t, supplement<T>) {
static_assert(sizeof(T) <= 0xFF, "Supplement is too big!");
t.supplement += sizeof(T);
static_assert(std::is_trivially_default_constructible_v<T>,
"The supplement type must be a POD (plain old data) type");
t.flags |= (uint32_t) type_flags::has_supplement;
t.supplement = (void *) malloc(sizeof(T));
}

template <typename... Args> struct init {
Expand Down
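
The `type_extra_apply` overloads in this file follow a tag-dispatch pattern: each annotation is an empty struct, and overload resolution picks the function that sets the matching flag bit. The following is a minimal sketch of that pattern under simplified assumptions. The names `type_flag`, `record`, `apply`, and `make_record` are hypothetical, and the record keeps only the `align : 8` / `flags : 24` bit-field pair from the diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, reduced flag set; bit positions copied from the diff.
enum class type_flag : uint32_t {
    is_arithmetic = (1u << 15),
    is_final      = (1u << 17)
};

// Reduced stand-in for type_data: 8 + 24 bits share one 32-bit word.
struct record {
    uint32_t align : 8;
    uint32_t flags : 24;
};

// On common ABIs (GCC, Clang, MSVC) the two bit-fields pack into 4 bytes.
static_assert(sizeof(record) == 4, "bit-fields are expected to share one word");

// Empty tag types used purely for overload selection.
struct is_final {};
struct is_arithmetic {};

inline void apply(record &r, is_final) {
    r.flags |= (uint32_t) type_flag::is_final;
}

inline void apply(record &r, is_arithmetic) {
    r.flags |= (uint32_t) type_flag::is_arithmetic;
}

// Assumed helper: apply every annotation in order via a fold expression.
template <typename... Extra>
record make_record(uint32_t align, const Extra &...extra) {
    record r{};
    r.align = align;
    (apply(r, extra), ...);
    return r;
}
```

Widening `flags` from 20 to 24 bits, as the diff does, is what leaves room for bit 17 (`is_final`) and bit 18 (`has_supplement`) in the same word.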