Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid C-specific Types #10

Open
encukou opened this issue Oct 9, 2023 · 28 comments
Open

Avoid C-specific Types #10

encukou opened this issue Oct 9, 2023 · 28 comments
Labels
guideline To be included in guidelines PEP

Comments

@encukou
Copy link
Contributor

encukou commented Oct 9, 2023

In new API, let's avoid types whose size/layout depends on the C compiler.

  • Bad: long, bitfields, enums
  • OK:
    • size_t, intptr_t, which depend directly on the platform
    • int for small ranges (as replacement for enum) -- here practicality beats purity
  • Good: int32_t etc.

In the existing API, we might maybe want to e.g. replace long by int32_t, and declare an ABI break on the few platforms where sizeof(long) != sizeof(int32_t).

@encukou encukou changed the title Avoid C-only Types Avoid C-specific Types Oct 9, 2023
@encukou encukou added the guideline To be included in guidelines PEP label Oct 11, 2023
@pitrou
Copy link

pitrou commented Oct 14, 2023

In the existing API, we might maybe want to e.g. replace long by int32_t, and declare an ABI break on the few platforms where sizeof(long) != sizeof(int32_t).

I'm not sure why this would be desirable. Do you have an example of an API where you would want to break the ABI?

Also, "the few platforms" are very popular. On Linux x86-64, long is 64 bits...

@encukou
Copy link
Contributor Author

encukou commented Oct 17, 2023

Right. I've removed that half-baked idea from the proposal.

@vstinner
Copy link
Contributor

See also capi-workgroup/problems#27

@vstinner
Copy link
Contributor

Good: int32_t etc.

In C, it's "natural" and convenient to use char, short, int, long and long long. For example, there is no PyLong_FromUint32() or PyLong_AsUint32() function.

@encukou
Copy link
Contributor Author

encukou commented Jun 12, 2024

In C, it's "natural" and convenient to use char, short, int, long and long long.

I'd say it's traditional. The explicitly sized types aren't really harder to use.
The problem with this tradition is that it encourages non-portable programs. For example, if we use an int for flags or “enums”, we should only use 16 bits (the minimum size of a C int), but common platforms allow more without warning.

And with C's conversion rules, PyLong_FromLong and PyLong_AsLong are quite comfortable to use for int32_t and smaller.

@davidhewitt
Copy link

For what it's worth, from the Rust side the default types are all fixed-size e.g. i32, u8, f64. We do have c_int and friends available for important compatibility reasons too. So preferring fixed-size types makes Rust interop nicer, though not game changing.

And with C's conversion rules, PyLong_FromLong and PyLong_AsLong are quite comfortable to use for int32_t and smaller.

Isn't there a risk of silent truncation lurking for users who don't exercise a bit of care here? If I understand correctly, on platforms with long as 64 bits then PyLong_AsLong will only raise OverflowError on values outside of that range, so casting the result of that to int32_t requires users to repeat another range check and potentially construct their own OverflowError.

At least, that's what we have to do in PyO3 when we extract an i32 from a Python int.

@vstinner
Copy link
Contributor

I propose adding PyLong functions to convert to/from <stdint.h> integers: Add PyLong_FromInt64() and PyLong_AsInt64().

@cavokz
Copy link

cavokz commented Jun 12, 2024

I propose adding PyLong functions to convert to/from <stdint.h> integers: Add PyLong_FromInt64() and PyLong_AsInt64().

This is what we've done also in Pygolo.

Initially we were just wrapping PyLong_FromLong and PyLong_AsLong but when we added support for Windows, where long is 32 bit also on 64 bits CPUs, we dropped them (the project is young and we can still fix the API) in favor of sized integers.

We did an extra mile so that non-sized Go ints are converted with the proper PyLong_{As,From}Long or PyLong_{As,From}LongLong call depending on the system. See here and here.

In summary, we could paper over the C flexibility and give Go users what they actually need; we test our assumptions.

@encukou
Copy link
Contributor Author

encukou commented Jun 12, 2024

I propose adding PyLong functions to convert to/from <stdint.h> integers

IMO, we should only add those functions if they're significantly faster than PyLong_{From,To}NativeBytes.
If they're not, I'd actually prefer looking into* macro wrappers for NativeBytes, like PyLong_FROM_VAR(int32_t, my_value) for C, because other languages have better ways to do this (like template functions, trait impls).
Such a mechanism, if it's possible to make it, would allow conversions for any integral type -- size_t etc. etc.

  • (but I haven't looked into that yet, don't know if it's possible)

@pitrou
Copy link

pitrou commented Jun 12, 2024

Yuck. Shouldn't we try to move away from macros and define actual functions instead (possibly inline)?

Performance doesn't have anything to do with it. It's about convenience.

@encukou
Copy link
Contributor Author

encukou commented Jun 12, 2024

The macros should definitely be backed by a function that does as much as possible, but sizeof(TYPE) and IS_SIGNED(TYPE) need to be in a macro.

@pitrou
Copy link

pitrou commented Jun 12, 2024

The macros should definitely be backed by a function that does as much as possible, but sizeof(TYPE) and IS_SIGNED(TYPE) need to be in a macro.

Can you explain why they need to be in a macro?

I would suggest the converse: the various functions can be implemented using a common macro, but there should be a dedicated API function for each C type of choice (mostly int32_t, int64_t, uint32_t and uint64_t, as those are the most common IME).

@encukou
Copy link
Contributor Author

encukou commented Jun 12, 2024

Because you can only cover a fixed number of most common types. And if we add API that's parameterized on the type (and calls PyLong_{From,To}NativeBytes), so we can cover stuff like POSIX typedefs or user enums, then we won't need specialized PyLong_FromInt32.

@pitrou
Copy link

pitrou commented Jun 12, 2024

Well, this issue is really about fixed-width integer types such as int32_t, not assorted typedefs from POSIX or Windows.

@encukou
Copy link
Contributor Author

encukou commented Jun 12, 2024

This issue is about avoiding types with platform-specific widths (at least for values that aren't inherently platform-specific). And now the issue is also about adding API that makes a particular alternative more usable.

Perhaps we should use, for example, a typedef int32_t PyUnicode_Format_t instead of int32_t in Victor's PyUnicode_Export API. Any language except pre-C23 C will want to use a distinct enum type here anyway.

@pitrou
Copy link

pitrou commented Jun 12, 2024

(note: deleted comment that was based on a misunderstanding of your message; sorry!)

@pitrou
Copy link

pitrou commented Jun 12, 2024

That said:

This issue is about avoiding types with platform-specific widths (at least for values that aren't inherently platform-specific). And now the issue is also about adding API that makes a particular alternative more usable.

I'm not sure how that is disagreeing with or countering anything I said above.

@encukou
Copy link
Contributor Author

encukou commented Jun 12, 2024

If it's OK to discuss a proposal for PyLong_FromInt64 here, please allow me to say why I think adding PyLong_FromInt64 is a bad idea we can do better than PyLong_FromInt64.

@pitrou
Copy link

pitrou commented Jun 12, 2024

Right, I just disagree that making PyLong_FromInt64 a macro is a better idea than making it a C API function. Is it what you suggest or am I misunderstanding?

@pitrou
Copy link

pitrou commented Jun 12, 2024

Note that you can generate actual C API functions using a macro if you want. Here's a quick sketch:

/* in .h file */
#define PY_MAKE_INT_CONVERSION_FUNCS(API_NAME, C_TYPE) \
  PyAPI_FUNC(PyObject *) PyLong_From##API_NAME(C_TYPE); \
  PyAPI_FUNC(C_TYPE) PyLong_As##API_NAME(PyObject *); \

PY_MAKE_INT_CONVERSION_FUNCS(Int32, int32_t);
PY_MAKE_INT_CONVERSION_FUNCS(Int64, int64_t);
PY_MAKE_INT_CONVERSION_FUNCS(UInt32, uint32_t);
PY_MAKE_INT_CONVERSION_FUNCS(UInt64, uint64_t);

#undef PY_MAKE_INT_CONVERSION_FUNCS

/* in .c file */
#define PY_MAKE_INT_CONVERSION_FUNCS(API_NAME, C_TYPE) \
  PyObject *PyLong_From##API_NAME(C_TYPE) { ... actual implementation ... } \
  C_TYPE PyLong_As##API_NAME(PyObject *) { ... actual implementation ... } \

PY_MAKE_INT_CONVERSION_FUNCS(Int32, int32_t);
PY_MAKE_INT_CONVERSION_FUNCS(Int64, int64_t);
PY_MAKE_INT_CONVERSION_FUNCS(UInt32, uint32_t);
PY_MAKE_INT_CONVERSION_FUNCS(UInt64, uint64_t);

#undef PY_MAKE_INT_CONVERSION_FUNCS

@encukou
Copy link
Contributor Author

encukou commented Jun 12, 2024

I am not suggesting to make PyLong_FromInt64 a macro.
I'm suggesting a single macro like PyLong_FROM(int64_t, value) that would cover all integral types. It would call PyLong_FromNativeBytes(&value, 64/8, -1).

I want the size to be an argument; not to add functions for all possible values of that argument.

@vstinner
Copy link
Contributor

I'm suggesting a single macro like PyLong_FROM(int64_t, value) that would cover all integral types.

To call it, you would have to write PyLong_FROM(int64_t, value)? It looks uncommon and less convenient than having a regular function: PyLong_FromInt64(value). I'm trying to make the usage of <stdint.h> types more convenient and more natural. I don't think that PyLong_FROM(int64_t, value) would be a good fit here.

@zooba
Copy link

zooba commented Jun 12, 2024

Unfortunately, value needs to be an lvalue (which at least means we can evaluate it twice in a macro and use sizeof), or else we need to assign it storage by calling a real function. And once we assign the value ourselves, the original size doesn't matter, for better or worse.

But the hope of FromNativeBytes is certainly to replace the implementations of the existing converters (in both directions), as well as any we implement in the future. Providing bodies for FromIntXX could be done with a macro. . But it can't select the right arguments for an rvalue by itself, and the user gets a better experience with an argument type mismatch on a function (or they can use FromNativeBytes directly).

@encukou
Copy link
Contributor Author

encukou commented Jun 13, 2024

Actually, the PyLong_From functions aren't the hard part. All of the proposed functions can be replaced by PyLong_FromLongLong/PyLong_FromUnsignedLongLong, which are guaranteed to handle anything up to int64. C will convert smaller values easily.
It won't work with __int128 of course, but, neither do the proposed functions.

I'm still interested in a generic solution that would cover all integral types; we can't cover them all. And even if we shoot the idea down, we should shoot down the best version of it.
That solution can be C-specific -- most other languages (including C++) should have enough support for generic types to make it easy to find the right arguments to NativeBytes. I don't think they need 16 new functions.

Unfortunately, value needs to be an lvalue (which at least means we can evaluate it twice in a macro and use sizeof)

Unless we use C99:

#define PyLong_FROM(TYPE, VALUE) (                                            \
        (((TYPE)-1) < (TYPE)0)                                                \
        ? PyLong_FromNativeBytes(&(TYPE){VALUE}, sizeof(TYPE), -1)            \
        : PyLong_FromUnsignedNativeBytes(&(TYPE){VALUE}, sizeof(TYPE), -1)    \
    )

It still needs the type for signedness; and it'd be good to add a check that sizeof(VALUE) == sizeof(TYPE).

How bad is to use a C99 designated initializer? Perhaps not that bad. If it's in a macro, it won't be a syntax error if you don't use it. It'll only hurt Python-3.14-only projects on a pre-C99 (or C++) compiler.

But, I guess that's all moot -- it's not that much of an improvement over using FromNativeBytes directly, and you can use PyLong_FromLongLong anyway.


The PyLong_TO side is more complicated, though:

  • unsigned values need unwieldy flags
  • overflow needs to be checked
  • in some use cases it's correct to use the REJECT_NEGATIVE flag, but it's easy to forget

But with a static inline helper function, the macro would be very similar to a PyLong_FROM.

click for strawman code
static inline int
_PyLong_To_impl(PyObject *pylong, void *buffer,
                Py_ssize_t n_bytes, int flags,
                Py_ssize_t type_size,
                const char *typename)
{
    if (n_bytes != type_size) {
        PyErr_Format(PyExc_SystemError,
                     "PyLong_TO: size of argument does not match %s",
                     typename);
    }
    int result = PyLong_AsNativeBytes(pylong, buffer, n_bytes, flags);
    if (result < 0) {
        return result;
    }
    if (result > n_bytes) {
        PyErr_Format(PyExc_ValueError,
                     "python integer too large to convert to %s",
                     typename);
        return -1;
    }
    return result;
}

#define PyLong_TO(TYPE, DEST, N) (                                            \
        (((TYPE)-1) < (TYPE)0)                                                \
        ? _PyLong_To_impl((PyObject*)(N), &(DEST), sizeof(DEST), -1,          \
            sizeof(TYPE), #TYPE)                                             \
        : _PyLong_To_impl((PyObject*)(N), &(DEST), sizeof(DEST),              \
            Py_ASNATIVEBYTES_NATIVE_ENDIAN                                    \
            | Py_ASNATIVEBYTES_UNSIGNED_BUFFER,                               \
            sizeof(TYPE), #TYPE)                                             \
    )

I kinda wish AsNativeBytes would do more of that itself, to better live up to its promise to make converter implementations easy.

@vstinner
Copy link
Contributor

PyLong_FROM(int32_t, 123) doesn't work:

int func(int *ptr)
{
    return (*ptr == 3);
}

int main()
{
    return func(&(int)3);
}

Output:

$ gcc x.c -o x
x.c: In function 'main':
x.c:8:17: error: lvalue required as unary '&' operand
    8 |     return func(&(int)3);
      |                 ^

We used multiple "magic" macros like that in the past, and it created multiple problems. It took years to fix macros and/or move away from macros. PEP 670 gives examples of macro issues. I would prefer to not introduce a new "magic" macro.

I don't see advantages of this macro for the users of the API. It looks more complicated to use rather than just calling a regular function.

@zooba
Copy link

zooba commented Jun 17, 2024

I kinda wish AsNativeBytes would do more of that itself, to better live up to its promise to make converter implementations easy.

The only bit it doesn't do is set the exception when the value exceeds the provided buffer. And that does make it easier, because callers are more likely to try a different approach than to simply bubble up our exception, so they're saved from a PyErr_Clear (and also encouraged to provide a more specific error message if they do need to bail out).

We used multiple "magic" macros like that in the past, and it created multiple problems. It took years to fix macros and/or move away from macros. PEP 670 gives examples of macro issues. I would prefer to not introduce a new "magic" macro.

Agreed. Using macros in our .c files to generate the implementations is fine, but I think any new conversions ought to be real functions.

@vstinner
Copy link
Contributor

My PR python/cpython#120390 is now ready for review. It adds 8 functions:

  • PyLong_FromInt32()
  • PyLong_FromUInt32()
  • PyLong_FromInt64()
  • PyLong_FromUInt64()
  • PyLong_ToInt32()
  • PyLong_ToUInt32()
  • PyLong_ToInt64()
  • PyLong_ToUInt64()

@vstinner
Copy link
Contributor

I created capi-workgroup/decisions#32 "Add PyLong_FromInt64() and PyLong_ToInt64()" in the C API WG Decisions project.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
guideline To be included in guidelines PEP
Projects
None yet
Development

No branches or pull requests

6 participants