
Points from the Bad C API document #28

Open
encukou opened this issue Oct 12, 2023 · 4 comments
Comments

@encukou
Contributor

encukou commented Oct 12, 2023

@vstinner, your bad C-API document has these notes that you might want to expand into issues here:

@vstinner
Contributor

> No public C functions if it can’t be done in Python

That was a request from PyPy devs. By the way, the specific case of Py_buffer was fixed in Python 3.12 by PEP 688 – Making the buffer protocol accessible in Python.

> PyObject** must not be exposed: PyObject** PySequence_Fast_ITEMS(ob) has to go.

I propose adding a PyResource API:

But I'm not convinced by this approach myself. A more promising approach may be to support the PyObject* type in the Py_buffer protocol. Apparently, some projects have unofficial support for that.

@pitrou

pitrou commented Oct 14, 2023

I think this could be solved with a "sequence unpacking API" that would return an array of owned refs:

/*
 * Try to unpack `nitems` items from `seq`, starting at position `start`, into `items`.
 *
 * Returns the number of items actually unpacked:
 *  - less than `nitems` if the sequence is too short;
 *  - 0 if `start` is past the end of the sequence;
 *  - -1 on error.
 *
 * The unpacked items are all owned references and must be released
 * with `PySequence_ClearUnpacked` when done.
 */
Py_ssize_t PySequence_Unpack(PyObject* seq, Py_ssize_t start, Py_ssize_t nitems, PyObject** items);
void PySequence_ClearUnpacked(PyObject** items, Py_ssize_t start, Py_ssize_t nitems);

Example usage:

// Iterate over the items in sequence `seq`, in 128-element batches
#define BATCH_SIZE 128
PyObject *items[BATCH_SIZE];
Py_ssize_t start = 0, nitems;
while ((nitems = PySequence_Unpack(seq, start, BATCH_SIZE, items)) > 0) {
  // Do something with the `nitems` objects in `items`...
  ...
  // Clear owned refs
  PySequence_ClearUnpacked(items, start, nitems);
  // Move to next batch
  start += nitems;
}
if (nitems < 0) {
  // PySequence_Unpack failed, handle error
  ...
}

Batching solves the performance problem of calling PySequence_GetItem one item at a time, without compromising on error handling and safety.

@pitrou

pitrou commented Oct 14, 2023

> A more promising approach may be to support PyObject* type in the Py_buffer protocol. Apparently, some projects have an unofficial support for that.

This is already part of PEP 3118 (see the O typecode for Python objects), and has been supported by NumPy for a long time.

But I don't think the buffer protocol is adequate for quick sequence access over potentially short sequences; the use cases are different. Also, this would require tuple and list to implement the buffer protocol, which is potentially disruptive.

@vstinner
Contributor

Yeah, batching seems preferable to my PyResource API, which was based on "create a single view, use the view, delete the view", knowing that creating the view could require copying the whole array and creating new strong references, which would then all have their reference counts decremented when the view is deleted.
