
Points from the Bad C API document #28

Open
encukou opened this issue Oct 12, 2023 · 4 comments
Comments

@encukou
Contributor

encukou commented Oct 12, 2023

@vstinner, your bad C-API document has these notes that you might want to expand into issues here:

@vstinner
Contributor

> No public C functions if it can’t be done in Python

That was a request from PyPy devs. By the way, the specific case of Py_buffer was fixed in Python 3.12 by PEP 688 – Making the buffer protocol accessible in Python.

> PyObject** must not be exposed: PyObject** PySequence_Fast_ITEMS(ob) has to go.

I propose adding a PyResource API:

But I'm not convinced by this approach myself. A more promising approach may be to support the PyObject* type in the Py_buffer protocol. Apparently, some projects have unofficial support for that.

@pitrou

pitrou commented Oct 14, 2023

I think this could be solved with a "sequence unpacking API" that would return an array of owned refs:

/*
 * Try to unpack `nitems` items from `seq`, starting at position `start`, into `items`.
 *
 * Returns the number of items actually unpacked:
 *  - less than `nitems` if the sequence is too short;
 *  - 0 if `start` is past the end of the sequence;
 *  - -1 on error.
 *
 * The unpacked items are all owned references and must be released
 * with `PySequence_ClearUnpacked` when done.
 */
Py_ssize_t PySequence_Unpack(PyObject* seq, Py_ssize_t start, Py_ssize_t nitems, PyObject** items);
void PySequence_ClearUnpacked(PyObject** items, Py_ssize_t start, Py_ssize_t nitems);

Example usage:

// Iterate over the items in sequence `seq`, in 128-element batches
#define BATCH_SIZE 128
PyObject *items[BATCH_SIZE];
Py_ssize_t start = 0, nitems;
while ((nitems = PySequence_Unpack(seq, start, BATCH_SIZE, items)) > 0) {
  // Do something with the `nitems` objects in `items`...
  ...
  // Clear owned refs
  PySequence_ClearUnpacked(items, start, nitems);
  // Move to next batch
  start += nitems;
}
if (nitems < 0) {
  // PySequence_Unpack failed, handle error
  ...
}

Batching solves the performance problem of calling PySequence_GetItem one item at a time, without compromising on error handling and safety.

@pitrou

pitrou commented Oct 14, 2023

> A more promising approach may be to support PyObject* type in the Py_buffer protocol. Apparently, some projects have an unofficial support for that.

This is already part of PEP 3118 (see the O typecode for Python objects), and has been supported by NumPy for a long time.

But I don't think the buffer protocol is adequate for quick sequence access over potentially short sequences; the use cases are different. Also, this would require tuple and list to implement the buffer protocol, which is potentially disruptive.

@vstinner
Contributor

Yeah, batching seems preferable to my PyResource API, which was based on "create a single view, use the view, delete the view", knowing that creating the view could require copying the whole array and creating new strong references, which would then all have their reference counts decremented when the view is deleted.
