Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyObject_GetBuffer(PyBUF_FORMAT) breaks on a memoryview #123609

Open
philthompson10 opened this issue Sep 2, 2024 · 16 comments
Open

PyObject_GetBuffer(PyBUF_FORMAT) breaks on a memoryview #123609

philthompson10 opened this issue Sep 2, 2024 · 16 comments
Labels
docs Documentation in the Doc dir topic-C-API

Comments

@philthompson10
Copy link

philthompson10 commented Sep 2, 2024

Bug report

Bug description:

Calling PyObject_GetBuffer() with the PyBUF_FORMAT flag on a memoryview object that wraps a bytes object fails with the exception:

BufferError: memoryview: cannot cast to unsigned bytes if the format flag is present

Calling it directly on the same bytes object does not fail.

The test is in memory_getbuf() in memoryobject.c. The test is failing because format is not NULL. format is actually "B" which is the equivalent of NULL. I think the test should be changed to allow a format value of "B".

An alternative might be to change PyBuffer_FillInfo() to always set format to NULL.

CPython versions tested on:

3.12, 3.13

Operating systems tested on:

Linux, macOS

Linked PRs

@philthompson10 philthompson10 added the type-bug An unexpected behavior, bug, or error label Sep 2, 2024
@ZeroIntensity
Copy link
Member

The docs state that PyBUF_SIMPLE | PyBUF_FORMAT (equivalent to PyBUF_FORMAT) is disallowed, because if you request a standalone PyBUF_FORMAT, you already know that the format is B ahead of time. Is there a specific problem this is causing somewhere that can't be refactored?

However, it looks like we do support standalone PyBUF_FORMAT on bytearray and bytes. For example, the following does not raise a BufferError:

import _testbuffer

_testbuffer.ndarray(b"no good!", getbuf=_testbuffer.PyBUF_FORMAT)

Maybe it's worth trying to explicitly disallow those?

(@encukou, this needs topic-C-API.)

@philthompson10
Copy link
Author

Why would I know the format ahead of time? What I am trying to do is to introspect the detail of the buffer protocol implemented by an object (I've already called PyObject_CheckBuffer()).

@ZeroIntensity
Copy link
Member

Why would I know the format ahead of time?

If the requested buffer was PyBUF_SIMPLE | PyBUF_FORMAT, then by definition, the format is B. FWIW, I'm not totally sure why the check was explicitly added only for memoryview, that's what I want to fix. There are two ideas here:

  • Add the same check for bytes and bytearray, or possibly PyBuffer_FillInfo (I guess that would technically be a breaking change, but this wasn't allowed anyway).
  • Remove the check altogether for consistency, and document support for PyBUF_SIMPLE | PyBUF_FORMAT. Though, I'm not convinced that's particularly useful. (If you could provide an example where explicitly knowing that the format is B instead of NULL, that would be great!)

@philthompson10
Copy link
Author

Two points...

  1. It seems I am misunderstanding what the purpose of PyObject_GetBuffer() is. How do I determine what format an object that implements the buffer protocol supports?

  2. You state that by definition, the format is B, but that's not true. By definition, the format is B or NULL. The test should check for both values.

@ZeroIntensity
Copy link
Member

  1. How do I determine what format an object that implements the buffer protocol supports?

You do that with PyBUF_FORMAT, but it's redundant for PyBUF_SIMPLE, because that will always be a B format anyway.

2. By definition, the format is B or NULL

Oh, I think you misinterpreted what I meant by that. I meant, by requesting PyBUF_SIMPLE | PyBUF_FORMAT, you will always get a format of B. If you request PyBUF_SIMPLE, then you're right, then the format field will be NULL -- but that implies B anyway. Why explicitly set it at runtime?

I'm guessing you want to do something like the following:

static PyObject *
print_buffer_format(PyObject *Py_UNUSED(self), PyObject *obj)
{
    PyObject *obj;
    // technically, you should be using y* to get the buffer, but I'm just making an example
    if (!PyArg_ParseTuple("O", &obj))
        return NULL;

    Py_buffer buf;
    if (PyObject_GetBuffer(obj, &buf, PyBUF_FORMAT) < 0)
        return NULL;

    printf("Format string: %s\n", buf.format);
    PyBuffer_Release(&buf);

    Py_RETURN_NONE;
}

Again, using format in the above is redundant. You could refactor this to:

if (PyObject_GetBuffer(obj, &buf, PyBUF_SIMPLE) < 0)
    return NULL;

puts("Format string: B");
PyBuffer_Release(&buf);

@philthompson10
Copy link
Author

I've now read the PEP and have a much better understanding of what PyObject_GetBuffer() is for. It makes it very clear that I should be using PyBUF_SIMPLE for my use case.

It seems that there is no mechanism for making general queries about the buffer capabilities of an object. The nearest you can get is to make repeated calls to PyObject_GetBuffer() with different flags to see which succeed or fail. That's not a problem for me at the moment.

I guess my problem was that I misunderstood the docs. This statement from the PEP really helped : The exporter should always fill in all elements of the buffer structure, as did the explanation of PyBUF_SIMPLE. It would also help if the docs stated that PyBUF_FORMAT (and PyBUF_WRITABLE?) should not be used in isolation.

@ZeroIntensity
Copy link
Member

It would also help if the docs stated that PyBUF_FORMAT (and PyBUF_WRITABLE?) should not be used in isolation.

It does, just not directly. They say not to use PyBUF_SIMPLE | PyBUF_FORMAT, but since PyBUF_SIMPLE is defined as zero, that's equivalent to a standalone PyBUF_FORMAT.

I think, for consistency, we should also disallow this in PyBuffer_FillInfo, not just memoryview.

@ZeroIntensity
Copy link
Member

@encukou, I think this should get closed. Disallowing PyBUF_SIMPLE | PyBUF_FORMAT will probably break code somewhere, and I don't think it's worth the hassle to try to remove the check from memoryview.

@encukou
Copy link
Member

encukou commented Sep 6, 2024

I think a little note like this could make things clearer:

Since PyBUF_SIMPLE is defined as 0, PyBUF_FORMAT cannot be used as a stand-alone flag.

@encukou encukou added docs Documentation in the Doc dir and removed type-bug An unexpected behavior, bug, or error labels Sep 6, 2024
@ZeroIntensity
Copy link
Member

Sure, I'll submit a PR. Would it be worth mentioning that bytes and bytearray support it anyway?

@ZeroIntensity
Copy link
Member

I've created #123778 to fix the docs. @philthompson10, does this look clearer to you?

@encukou
Copy link
Member

encukou commented Sep 6, 2024

Would it be worth mentioning that bytes and bytearray support it anyway?

No. CPython doesn't check that you're not playing by the rules, but that doesn't mean you should do it. (Even though we don't want to add the check and break people who didn't study the docs carefully.)

Please keep the message short, that way more people will read it :)

@ZeroIntensity
Copy link
Member

CPython doesn't check that you're not playing by the rules, but that doesn't mean you should do it.

Yeah, I mentioned this in the PR by using bytes and memoryview as an example. Is that OK?

@encukou
Copy link
Member

encukou commented Sep 6, 2024

User-facing docs shouldn't describe the implementation (especially if it's slightly buggy), but how users should use it.
It's OK to avoid mentioning that a check is missing.

@philthompson10
Copy link
Author

@ZeroIntensity sorry, no, the docs change doesn't really make things clearer and (strictly speaking) isn't true. As the value of PyBUF_SIMPLE is 0 then it can be used with PyBUF_FORMAT as it makes no difference. I agree with @encukou that the docs should describe the API from the point of view of the user of the API rather than the developer of the API (ie. I don't care what the actual value of PyBUF_SIMPLE is). The PEP is more user focused.

@encukou
Copy link
Member

encukou commented Sep 9, 2024

PyBUF_FORMAT cannot be used alone. On some objects it'll work, but you shouldn't do it.

miss-islington pushed a commit to miss-islington/cpython that referenced this issue Sep 10, 2024
…-123778)

(cherry picked from commit 962304a)

Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Sep 10, 2024
…-123778)

(cherry picked from commit 962304a)

Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
Yhg1s pushed a commit that referenced this issue Sep 24, 2024
…) (#123903)

gh-123609: Clarify usage of standalone `PyBUF_FORMAT` (GH-123778)
(cherry picked from commit 962304a)

Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Documentation in the Doc dir topic-C-API
Projects
None yet
Development

No branches or pull requests

3 participants