wjakob · wjakob · May 26, 2022 · May 19, 2022 · hmenke · May 31, 2023
diff --git a/README.md b/README.md
@@ -142,16 +142,21 @@ long-standing performance issues in _pybind11_:
   pointer chasing compared to _pybind11_). The per-instance overhead for
   wrapping a C++ type into a Python object shrinks by 2.3x. (_pybind11_: 56
   bytes, _nanobind_: 24 bytes.)
+
 - C++ function binding information is now co-located with the Python function
   object (less pointer chasing).
+
 - C++ type binding information is now co-located with the Python type object
   (less pointer chasing, fewer hashtable lookups).
+
 - _nanobind_ internally replaces `std::unordered_map` with a more efficient
   hash table ([tsl::robin_map](https://github.com/Tessil/robin-map), which is
   included as a git submodule).
+
 - function calls from/to Python are realized using [PEP 590 vector
   calls](https://www.python.org/dev/peps/pep-0590), which gives a nice speed
   boost. The main function dispatch loop no longer allocates heap memory.
+
 - _pybind11_ was designed as a header-only library, which is generally a good
   thing because it simplifies the compilation workflow. However, one major
   downside of this is that a large amount of redundant code has to be compiled
@@ -160,15 +165,18 @@ long-standing performance issues in _pybind11_:
   support library (`libnanobind`) and links it against the binding code to
   avoid redundant compilation. When using the CMake `nanobind_add_module()`
   function, this all happens transparently.
+
 - `#include <pybind11/pybind11.h>` pulls in a large portion of the STL (about
   2.1 MiB of headers with Clang and libc++). _nanobind_ minimizes STL usage to
   avoid this problem. Type casters even for basic types like `std::string`
   require an explicit opt-in by including an extra header file (e.g. `#include
   <nanobind/stl/string.h>`).
+
 - _pybind11_ is dependent on *link time optimization* (LTO) to produce
   reasonably-sized bindings, which makes linking a build time bottleneck. With
   _nanobind_'s split into a precompiled core library and minimal
   metatemplating, LTO is no longer important.
+
 - _nanobind_ maintains efficient internal data structures for lifetime
   management (needed for `nb::keep_alive`, `nb::rv_policy::reference_internal`,
   the `std::shared_ptr` interface, etc.). With these changes, it is no longer
@@ -180,6 +188,18 @@ long-standing performance issues in _pybind11_:
 Besides performance improvements, _nanobind_ includes several quality-of-life
 improvements for developers:
 
+- _nanobind_ has [greatly
+  improved](https://github.com/wjakob/nanobind/blob/master/docs/tensor.md)
+  support for exchanging CPU/GPU/TPU/.. tensor data structures with modern
+  array programming frameworks.
+
+- _nanobind_ can target Python's [stable ABI
+  interface](https://docs.python.org/3/c-api/stable.html) starting with Python
+  3.12. This means that extension modules will eventually be compatible with
+  any future version of Python without having to compile separate binaries per
+  version. That vision is still far out, however: it will require Python 3.12+
+  to be widely deployed.
+
 - When the python interpreter shuts down, _nanobind_ reports instance, type,
   and function leaks related to bindings, which is useful for tracking down
   reference counting issues.
@@ -195,11 +215,6 @@ improvements for developers:
 - _nanobind_ docstrings have improved out-of-the-box compatibility with tools
   like [Sphinx](https://www.sphinx-doc.org/en/master/).
 
-- _nanobind_ has [greatly
-  improved](https://github.com/wjakob/nanobind/blob/master/docs/tensor.md)
-  support for exchanging tensor data structures with modern array programming
-  frameworks.
-
 ### Dependencies
 
 _nanobind_ depends on recent versions of everything:
@@ -419,24 +434,26 @@ changes are detailed below.
 
 
   - **Supplemental type data**: _nanobind_ can store supplemental data along
-    with registered types. This information is co-located with the Python type
-    object. An example use of this fairly advanced feature are libraries that
-    register large numbers of different types (e.g. flavors of tensors). A
-    single generically implemented function can then query this supplemental
-    information to handle each type slightly differently.
+    with registered types. An example use of this fairly advanced feature are
+    libraries that register large numbers of different types (e.g. flavors of
+    tensors). A single generically implemented function can then query this
+    supplemental information to handle each type slightly differently.
 
     ```cpp
     struct Supplement {
         ... // should be a POD (plain old data) type
     };
 
     // Register a new type Test, and reserve space for sizeof(Supplement)
-    nb::class_<Test> cls(m, "Test", nb::supplement<Supplement>())
+    nb::class_<Test> cls(m, "Test", nb::supplement<Supplement>(), nb::is_final());
 
     /// Mutable reference to 'Supplement' portion in Python type object
     Supplement &supplement = nb::type_supplement<Supplement>(cls);
     ```
 
+    The supplement is not propagated to subclasses created within Python.
+    Such types should therefore be created with `nb::is_final()`.
+
   - **Low-level interface**: _nanobind_ exposes a low-level interface to
     provide fine-grained control over the sequence of steps that instantiates a
     Python object wrapping a C++ instance. Like the above point, this is useful

diff --git a/cmake/nanobind-config.cmake b/cmake/nanobind-config.cmake
@@ -80,15 +80,15 @@ function (nanobuild_build_library TARGET_NAME TARGET_TYPE)
     ${NB_DIR}/include/nanobind/stl/vector.h
     ${NB_DIR}/include/nanobind/stl/list.h
 
-    ${NB_DIR}/src/internals.h
     ${NB_DIR}/src/buffer.h
-    ${NB_DIR}/src/internals.cpp
-    ${NB_DIR}/src/common.cpp
-    ${NB_DIR}/src/tensor.cpp
+    ${NB_DIR}/src/nb_internals.h
+    ${NB_DIR}/src/nb_internals.cpp
     ${NB_DIR}/src/nb_func.cpp
     ${NB_DIR}/src/nb_type.cpp
     ${NB_DIR}/src/nb_enum.cpp
+    ${NB_DIR}/src/common.cpp
     ${NB_DIR}/src/error.cpp
+    ${NB_DIR}/src/tensor.cpp
     ${NB_DIR}/src/trampoline.cpp
     ${NB_DIR}/src/implicit.cpp
   )
@@ -161,8 +161,12 @@ function(nanobind_disable_stack_protector name)
 endfunction()
 
 function(nanobind_extension name)
-  set_target_properties(${name} PROPERTIES
-    PREFIX "" SUFFIX "${NB_SUFFIX}")
+  set_target_properties(${name} PROPERTIES PREFIX "" SUFFIX "${NB_SUFFIX}")
+endfunction()
+
+function(nanobind_extension_abi3 name)
+  get_filename_component(ext "${NB_SUFFIX}" LAST_EXT)
+  set_target_properties(${name} PROPERTIES PREFIX "" SUFFIX ".abi3${ext}")
 endfunction()
 
 function (nanobind_cpp17 name)
@@ -187,23 +191,44 @@ function (nanobind_headers name)
 endfunction()
 
 function(nanobind_add_module name)
-  cmake_parse_arguments(PARSE_ARGV 1 ARG "NOMINSIZE;NOSTRIP;NB_STATIC;NB_SHARED;PROTECT_STACK;LTO" "" "")
+  cmake_parse_arguments(PARSE_ARGV 1 ARG "NOMINSIZE;STABLE_ABI;NOSTRIP;NB_STATIC;NB_SHARED;PROTECT_STACK;LTO" "" "")
 
   Python_add_library(${name} MODULE ${ARG_UNPARSED_ARGUMENTS})
 
   nanobind_cpp17(${name})
-  nanobind_extension(${name})
   nanobind_msvc(${name})
   nanobind_headers(${name})
 
-  if (ARG_NB_STATIC)
-    nanobuild_build_library(nanobind-static STATIC)
-    target_link_libraries(${name} PRIVATE nanobind-static)
+  # Limited API interface only supported in Python >= 3.12
+  if ((Python_VERSION_MAJOR EQUAL 3) AND (Python_VERSION_MINOR LESS 12))
+    set(ARG_STABLE_ABI OFF)
+  endif()
+
+  if (ARG_STABLE_ABI)
+    if (ARG_NB_STATIC)
+      nanobuild_build_library(nanobind-static-abi3 STATIC)
+      set(libname nanobind-static-abi3)
+    else()
+      nanobuild_build_library(nanobind-abi3 SHARED)
+      set(libname nanobind-abi3)
+    endif()
+
+    target_compile_definitions(${libname} PUBLIC -DPy_LIMITED_API=0x030C0000)
+    nanobind_extension_abi3(${name})
   else()
-    nanobuild_build_library(nanobind SHARED)
-    target_link_libraries(${name} PRIVATE nanobind)
+    if (ARG_NB_STATIC)
+      nanobuild_build_library(nanobind-static STATIC)
+      set(libname nanobind-static)
+    else()
+      nanobuild_build_library(nanobind SHARED)
+      set(libname nanobind)
+    endif()
+
+    nanobind_extension(${name})
   endif()
 
+  target_link_libraries(${name} PRIVATE ${libname})
+
   if (NOT ARG_PROTECT_STACK)
     nanobind_disable_stack_protector(${name})
   endif()

diff --git a/docs/cmake.md b/docs/cmake.md
@@ -59,6 +59,13 @@ it performs the following steps to produce efficient bindings.
 - It appends the library suffix (e.g., `.cpython-39-darwin.so`) based on
   information provided by CMake's `FindPython` module.
 
+- When requested via the optional `STABLE_ABI` parameter, and when your
+  version of Python is sufficiently recent (3.12+), the implementation
+  will build a [stable ABI](https://docs.python.org/3/c-api/stable.html)
+  extension module with a different suffix (e.g., `.abi3.so`). This comes at a
+  performance cost since _nanobind_ can no longer access the internals of
+  various data structures directly.
+
 - It statically or dynamically links against `libnanobind` depending on the
   value of the `NB_SHARED` parameter of the CMake project. Note that
   `NB_SHARED` is not an input of the `nanobind_add_module()` function. Rather,

diff --git a/include/nanobind/nb_accessor.h b/include/nanobind/nb_accessor.h
@@ -120,13 +120,13 @@ struct num_item_list {
     using key_type = Py_ssize_t;
 
     NB_INLINE static void get(PyObject *obj, Py_ssize_t index, PyObject **cache) {
-        *cache = PyList_GET_ITEM(obj, index);
+        *cache = NB_LIST_GET_ITEM(obj, index);
     }
 
     NB_INLINE static void set(PyObject *obj, Py_ssize_t index, PyObject *v) {
-        PyObject *old = PyList_GET_ITEM(obj, index);
+        PyObject *old = NB_LIST_GET_ITEM(obj, index);
         Py_INCREF(v);
-        PyList_SET_ITEM(obj, index, v);
+        NB_LIST_SET_ITEM(obj, index, v);
         Py_DECREF(old);
     }
 };
@@ -136,7 +136,7 @@ struct num_item_tuple {
     using key_type = Py_ssize_t;
 
     NB_INLINE static void get(PyObject *obj, Py_ssize_t index, PyObject **cache) {
-        *cache = PyTuple_GET_ITEM(obj, index);
+        *cache = NB_TUPLE_GET_ITEM(obj, index);
     }
 
     template <typename...Ts> static void set(Ts...) {

diff --git a/include/nanobind/nb_attr.h b/include/nanobind/nb_attr.h
@@ -51,14 +51,16 @@ struct is_method {};
 struct is_implicit {};
 struct is_operator {};
 struct is_arithmetic {};
+struct is_final { };
 struct is_enum {
     bool is_signed;
 };
+
 template <size_t /* Nurse */, size_t /* Patient */> struct keep_alive {};
 template <typename T> struct supplement {};
 struct type_callback {
-    type_callback(void (*value)(PyTypeObject *) noexcept) : value(value) {}
-    void (*value)(PyTypeObject *) noexcept;
+    type_callback(void (*value)(PyType_Slot **) noexcept) : value(value) {}
+    void (*value)(PyType_Slot **) noexcept;
 };
 struct raw_doc {
     const char *value;
@@ -94,7 +96,7 @@ enum class func_flags : uint32_t {
     is_implicit = (1 << 12),
     /// Is this function an arithmetic operator?
     is_operator = (1 << 13),
-    /// When the function is GCed, do we need to call func_data::free?
+    /// When the function is GCed, do we need to call func_data_prelim::free?
     has_free = (1 << 14),
     /// Should the func_new() call return a new reference?
     return_ref = (1 << 15),
@@ -110,7 +112,7 @@ struct arg_data {
     bool none;
 };
 
-template <size_t Size> struct func_data {
+template <size_t Size> struct func_data_prelim {
     // A small amount of space to capture data used by the function/closure
     void *capture[3];
 

diff --git a/include/nanobind/nb_call.h b/include/nanobind/nb_call.h
@@ -31,15 +31,14 @@ template <typename T>
 NB_INLINE void call_analyze(size_t &nargs, size_t &nkwargs, const T &value) {
     using D = std::decay_t<T>;
 
-    if constexpr (std::is_same_v<D, arg_v>) {
+    if constexpr (std::is_same_v<D, arg_v>)
         nkwargs++;
-    } else if constexpr (std::is_same_v<D, args_proxy>) {
+    else if constexpr (std::is_same_v<D, args_proxy>)
         nargs += len(value);
-    } else if constexpr (std::is_same_v<D, kwargs_proxy>) {
+    else if constexpr (std::is_same_v<D, kwargs_proxy>)
         nkwargs += len(value);
-    } else {
+    else
         nargs += 1;
-    }
 
     (void) nargs; (void) nkwargs; (void) value;
 }
@@ -53,7 +52,7 @@ NB_INLINE void call_init(PyObject **args, PyObject *kwnames, size_t &nargs,
 
     if constexpr (std::is_same_v<D, arg_v>) {
         args[kwargs_offset + nkwargs] = value.value.release().ptr();
-        PyTuple_SET_ITEM(kwnames, nkwargs++,
+        NB_TUPLE_SET_ITEM(kwnames, nkwargs++,
                          PyUnicode_InternFromString(value.name));
     } else if constexpr (std::is_same_v<D, args_proxy>) {
         for (size_t i = 0, l = len(value); i < l; ++i)
@@ -65,7 +64,7 @@ NB_INLINE void call_init(PyObject **args, PyObject *kwnames, size_t &nargs,
         while (PyDict_Next(value.ptr(), &pos, &key, &entry)) {
             Py_INCREF(key); Py_INCREF(entry);
             args[kwargs_offset + nkwargs] = entry;
-            PyTuple_SET_ITEM(kwnames, nkwargs++, key);
+            NB_TUPLE_SET_ITEM(kwnames, nkwargs++, key);
         }
     } else {
         args[nargs++] =
@@ -88,7 +87,7 @@ NB_INLINE void call_init(PyObject **args, PyObject *kwnames, size_t &nargs,
         args[0] = nullptr;                                                     \
         args_p = args + 1;                                                     \
     }                                                                          \
-    nargs |= PY_VECTORCALL_ARGUMENTS_OFFSET;                                   \
+    nargs |= NB_VECTORCALL_ARGUMENTS_OFFSET;                                   \
     return steal(obj_vectorcall(base, args_p, nargs, kwnames, method_call))
 
 template <typename Derived>

diff --git a/include/nanobind/nb_cast.h b/include/nanobind/nb_cast.h
@@ -318,7 +318,7 @@ tuple make_tuple(Args &&...args) {
     size_t nargs = 0;
     PyObject *o = result.ptr();
 
-    (PyTuple_SET_ITEM(o, nargs++,
+    (NB_TUPLE_SET_ITEM(o, nargs++,
                       detail::make_caster<Args>::from_cpp(
                           (detail::forward_t<Args>) args,
                           detail::infer_policy<Args>(policy), nullptr).ptr()),

diff --git a/include/nanobind/nb_class.h b/include/nanobind/nb_class.h
@@ -57,14 +57,19 @@ enum class type_flags : uint32_t {
     is_arithmetic            = (1 << 15),
 
     /// This type registers a type callback (see nb::type_callback)
-    has_type_callback        = (1 << 16)
+    has_type_callback        = (1 << 16),
+
+    /// This type does not permit subclassing from Python
+    is_final                 = (1 << 17),
+
+    /// This type has supplemental data attached (see nb::supplement)
+    has_supplement           = (1 << 18)
 };
 
 struct type_data {
-    uint32_t size : 24;
+    uint32_t size;
     uint32_t align : 8;
-    uint32_t flags : 20;
-    uint32_t supplement : 12;
+    uint32_t flags : 24;
     const char *name;
     const char *doc;
     PyObject *scope;
@@ -77,10 +82,11 @@ struct type_data {
     void (*move)(void *, void *) noexcept;
     const std::type_info **implicit;
     bool (**implicit_py)(PyTypeObject *, PyObject *, cleanup_list *) noexcept;
-    void (*type_callback)(PyTypeObject *) noexcept;
+    void (*type_callback)(PyType_Slot **) noexcept;
+    void *supplement;
 };
 
-static_assert(sizeof(type_data) == 8 + sizeof(void *) * 13);
+static_assert(sizeof(type_data) == 8 + sizeof(void *) * 14);
 
 NB_INLINE void type_extra_apply(type_data &t, const handle &h) {
     t.flags |= (uint32_t) type_flags::has_base_py;
@@ -104,14 +110,20 @@ NB_INLINE void type_extra_apply(type_data &t, is_enum e) {
         t.flags |= (uint32_t) type_flags::is_unsigned_enum;
 }
 
+NB_INLINE void type_extra_apply(type_data &t, is_final) {
+    t.flags |= (uint32_t) type_flags::is_final;
+}
+
 NB_INLINE void type_extra_apply(type_data &t, is_arithmetic) {
     t.flags |= (uint32_t) type_flags::is_arithmetic;
 }
 
 template <typename T>
 NB_INLINE void type_extra_apply(type_data &t, supplement<T>) {
-    static_assert(sizeof(T) <= 0xFF, "Supplement is too big!");
-    t.supplement += sizeof(T);
+    static_assert(std::is_trivially_default_constructible_v<T>,
+                  "The supplement type must be a POD (plain old data) type");
+    t.flags |= (uint32_t) type_flags::has_supplement;
+    t.supplement = (void *) malloc(sizeof(T));
 }
 
 template <typename... Args> struct init {