
Clean up the Python C API for consistency, completeness and usefulness #71

Closed
malemburg opened this issue Aug 7, 2023 · 18 comments

@malemburg

There have been long discussions on Discourse about how to go about dealing with C API changes, deprecations, private vs. public and ABI stability promises.

I think it's time to sit down and have a workgroup discuss and design the way forward. My hope is that this repo can turn into that workgroup, but we'd need some form of governance, regular meetings and deliverables for this to work out.

In any case, looking at the C API as it stands now, I find that (or perhaps I'm looking in the wrong places):

  • there are many useful APIs which are marked private, but would be of benefit for extension writers; most of these APIs do not expose significant CPython internals which would hinder future developments
  • we have no good C APIs for bulk operations (e.g. encode this list of 100k strings) or bulk methods in general
  • the often-heard suggestion to access CPython details via Python attribute lookups and function/method calls instead of direct C APIs is not a universal answer: it works if you only need the information or action a few times, but fails badly in tight loops; we need direct C APIs for frequently needed details
  • more recently, the code base was changed to only allow upfront configuration of the interpreter(s); while this makes setting up interpreters easier, we lost the ability to enable or change certain settings while the interpreter is running; we will need APIs to re-enable such runtime changes, e.g. to enable/disable optimization, the verbose flag or the dont-write-bytecode flag, to name a few
  • the C API should regain consistency by e.g. making the Unicode implementation encoder API public again (these used to be public in Python 2, albeit with a different signature)
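To make the bulk-encoding and repeated-lookup points concrete, here is a minimal Python-level sketch (an illustration only, not the C API itself). `codecs.encode()` resolves the codec from its name on every call, much like calling `PyCodec_Encode()` once per item, while `codecs.lookup()` lets the caller resolve the codec once and reuse its encoder across the whole loop. The helper names `encode_per_call` and `encode_with_cached_codec` are hypothetical.

```python
import codecs


def encode_per_call(strings, encoding="utf-8"):
    # codecs.encode() resolves the codec from its name on every call
    # (a registry lookup each time, even though the registry caches results).
    return [codecs.encode(s, encoding) for s in strings]


def encode_with_cached_codec(strings, encoding="utf-8"):
    # Resolve the codec once; CodecInfo.encode returns (bytes, length_consumed).
    encoder = codecs.lookup(encoding).encode
    return [encoder(s)[0] for s in strings]


if __name__ == "__main__":
    data = ["spam", "eggs", "h\u00e9llo"]
    print(encode_with_cached_codec(data))  # prints [b'spam', b'eggs', b'h\xc3\xa9llo']
```

Both helpers produce the same result; the second only avoids repeating the name-based lookup, which is the kind of saving a direct C-level encoder API would make available to extension code.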

I'm not a fan of redesigns, but instead would like to see the C API evolve over time and if this takes longer, that's perfectly fine.

There aren't all that many places that really expose internals in a way which hinders future development, e.g. APIs targeting the VM. Those can be handled with different strategies, while clearly stating our goals and the promises we make.

For most other APIs, it's important to make the C API enjoyable for C programmers (again), providing a rich API with good tooling to tap into Python objects in an efficient way. This is what attracted me to Python back in 1994, when I was mostly a C programmer, and I would like Python to regain this elegance on the C side of things.

@malemburg
Author

Just to clarify: I'm willing to help with all this, if people would like to have me in such a workgroup.

@encukou
Contributor

encukou commented Aug 7, 2023

The next steps seem to be answering these, respectively:

  • Which useful APIs are marked private?
  • What would such bulk API look like?
  • Which are the often needed details?
  • If the settings can be changed at runtime, IMO there should be working Python API to do that (e.g. set something in sys). When that's working, we can see if a C function is needed -- most likely this won't be needed in a tight loop so a PySys_SetObject might suffice.
  • Which functions should be made public?
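As a rough illustration of the point that runtime settings should first have a working Python API, `sys.dont_write_bytecode` is one setting that can already be changed while the interpreter runs, whereas the values in `sys.flags` (such as `sys.flags.optimize` or `sys.flags.verbose`) are read-only snapshots of the startup configuration. The helpers below are hypothetical; from C, the same `sys` attribute could presumably be set with `PySys_SetObject()`, but that is an assumption rather than something confirmed in this thread.

```python
import sys


def set_dont_write_bytecode(enabled):
    """Toggle writing of .pyc files for later imports; return the old value."""
    previous = sys.dont_write_bytecode
    # A plain sys attribute that importlib's source loader checks
    # before writing bytecode for a newly imported module.
    sys.dont_write_bytecode = bool(enabled)
    return previous


def try_set_optimize(level):
    """Attempt to change the optimization level at runtime."""
    try:
        sys.flags.optimize = level  # sys.flags is a read-only struct sequence
    except (AttributeError, TypeError) as exc:
        return f"not changeable at runtime: {exc}"
    return "changed"
```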

Do you have some concrete (partial) answers already? Or do we need to do research and, say, establish some criteria for what to fix/include?

@malemburg
Author

I could go through the API and answer your questions, but isn't the point of a workgroup to do such things in a more collaborative and interactive way, after we have established how we want to tackle such problems (e.g. splitting up the work in chunks, one ticket per API, group of APIs, design pattern, etc.)?

Some examples for the above:

  • useful APIs: the writer APIs for str and bytes (#70, "Writer APIs for str and bytes should be public") and the Unicode encoder APIs
  • bulk API: a PyObject_MethodCaller() API similar to e.g. operator.methodcaller() (ideally using the vectorcall protocol if possible)
  • often needed details: e.g. calling the same codec over and over again and having to use PyCodec_Encode() for this, which has to look up the codec for every single encoding operation
  • changing runtime settings: agreed, this is not needed in tight loops, but it should at least be possible from C with a simple call, rather than having to resort to calling a function in a support .py file for this
  • public APIs: see above; the encoder APIs are a good example, but there are more
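The proposed `PyObject_MethodCaller()` does not exist today. As a Python-level point of reference, `operator.methodcaller()` binds a method name and its arguments once and can then be applied to many objects. The sketch below uses it for a bulk call; note that it still looks the method up on each object, so it only approximates the kind of bulk C API suggested above. `call_method_on_all` is a hypothetical helper.

```python
from operator import methodcaller


def call_method_on_all(items, name, *args, **kwargs):
    """Call item.name(*args, **kwargs) for every item and collect the results."""
    # The method name and arguments are bound once, outside the loop ...
    caller = methodcaller(name, *args, **kwargs)
    # ... but each call still resolves `name` on the individual object.
    return [caller(item) for item in items]


if __name__ == "__main__":
    print(call_method_on_all(["spam", "h\u00e9llo"], "encode", "utf-8"))
    # prints [b'spam', b'h\xc3\xa9llo']
    print(call_method_on_all(["a,b", "c"], "split", ","))
    # prints [['a', 'b'], ['c']]
```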

@steve-s
Contributor

steve-s commented Aug 8, 2023

> This is what attracted me to Python back in 1994, when I was mostly a C programmer, and I would like Python to regain this elegance on the C side of things.

  • Does Python want to be a glue language that orchestrates pieces of C code that do the real work? This matters, because a glue language needs to expose lots of things to make everything possible and fast in C, but that goes against making the Python language itself fast and keeping its evolution simple (even adding new language features can be a problem when implementation details have leaked; it's not just about performance). noGIL is a good example: why noGIL if people could already write parallel algorithms in C and use Python just to drive them?
  • I can see how this was true in the 90s, but is it still? How big is the share of native C extensions among PyPI packages? People who do web development, for example, do not touch many native extensions. There are some native extensions that are widely used (numpy, scipy, pandas, tensorflow, ...), but people who use them do not care how they are implemented and want to use them from Python. And if CPython's performance is not good enough for them, I would think they tend to reach for Cython or nanobind, not the raw C API... and if CPython's performance were not a concern, I think we would have a lot fewer native extensions. People like and want to write Python, I believe, not C.

@iritkatriel
Member

I think this issue is too broad to be useful for our current effort, which is to identify (1) specific problems with the C API that need to be resolved in a new one and (2) specific strengths that should be preserved.

You mention several items in the first category. Could you create an issue for each of them? Try to avoid discussing solutions at this stage.

@gvanrossum

@malemburg: I think your list does not lend itself to a uniform "API design committee" approach.

  • Private APIs that should be public: a committee could decide it's good to put work into this, but we'd still need practitioners to come up with specific APIs that they think should be made public, and why. I don't think it's fair to ask the committee to come up with the list of APIs to promote.
  • Bulk operations probably need one or two dedicated community members to come up with a design in the form of a PEP, with use cases taken from practice, and so on.
  • Direct C APIs, again, require practitioners to argue for promotion of specific Python APIs that would be useful as C APIs, and why (performance in tight loops, or avoiding common mistakes in calling Python code from C, or convenience for commonly needed APIs, or what have you).
  • Interpreter configuration: this sounds like something that should be taken into account when designing the API for creating (sub)interpreters. There's ongoing work in PEP 554, so maybe this could be included as a requirement there.
  • Making the Unicode encoder API public (again): This might require understanding why it was made private (or internal?). Was it an oversight or was there a specific reason? If there was a reason, we should establish that the reason for making it public is stronger. This is something where the committee should have the last word, but not until after it's been debated properly.

@steve-s: I'm not sure I buy the glue language dichotomy, especially the no-GIL example. IIUC, no-GIL can be essential for people who use Python as a glue language that wraps a large C library that is fundamentally multi-threaded. Callbacks from C to Python currently have to deal with the GIL in a very awkward way. So the existence of parallel algorithms written in C actually makes no-GIL relevant.

@steve-s
Contributor

steve-s commented Aug 30, 2023

> @steve-s: I'm not sure I buy the glue language dichotomy, especially the no-GIL example. IIUC, no-GIL can be essential for people who use Python as a glue language that wraps a large C library that is fundamentally multi-threaded. Callbacks from C to Python currently have to deal with the GIL in a very awkward way. So the existence of parallel algorithms written in C actually makes no-GIL relevant.

Maybe no-GIL wasn't a good example, because it can be useful for the "glue language" use case in the way you mention, which makes it confusing. Another example: faster CPython. A highly optimized interpreter benefits from internal details being hidden and flexible, while if you think of Python as a "glue language", speeding up the interpreter is not very useful (all the heavy computations happen in native code) and hiding implementation details is bad because you want raw access for maximal performance of the native part.

Of course, it's not one or the other, but I also believe that there's no free lunch. One cannot have a fast C API with direct access to internal data structures and, at the same time, a modern, advanced VM. For example, no-GIL could have used a tracing GC instead of biased reference counting if reference counting were not exposed.

@malemburg
Author

> @gvanrossum: I think your list does not lend itself to a uniform "API design committee" approach.

From your and Irit's reply I read that I probably wasn't clear enough in my original description. I'm going to try again:

I was not at the PyCon US meeting where the C API workgroup was apparently discussed, so perhaps I'm missing some context, but from looking through this repo, I could not find any clue as to how this workgroup will operate, who its members are and what its design goals should be.

I think it's a good idea to have such a workgroup, but for it to be useful we will need some clear mission or perhaps vision of what the workgroup should strive for. Simply collecting lots and lots of often very specific requests is not going to help with this, since it's not clear how those requests should be handled, whether they fit the common goal the workgroup has and how and where the workgroup will discuss and perhaps vote on the various requests.

The tickets in this "problems" repo clearly show that there are lots of different views on where the C API should be heading, from the "make it as complete and useful as possible for C extension writers" all the way to "we should only have a high level minimal C API".

While it's good to collect ideas, I believe it's necessary to give people an idea of the general plan first, before they spend many hours digging through the existing APIs, suggesting which private APIs should become public, or investing a lot of time into proposals for new design approaches (everything from "let's do a Python C API 2.0 from scratch", via "let's replace pointers with handles", to "how about adding a level of indirection similar to the Windows COM design for better separation of implementation vs. API").

To answer @iritkatriel: Yes, I can open new tickets for the examples I've given (and probably will), but that was not really what I was after. IMO, we need to have more structure first and a clearer vision of what we're after, before making more specific suggestions. E.g. if the vision turns out to be "we should only have a high level minimal C API", those new tickets would likely mostly be pointless.

Hopefully, I've made my motivation a little clearer this time 🙂

Aside: The point about the config API was triggered by my implementation of eGenix PyRun, where I replace a lot of the C code that runs at Python startup with an implementation written in Python (similar to the importlib effort). Since the code still needs to change those config variables after the interpreter has started running, I'm currently faced with a problem and will probably have to abandon the approach. OTOH, this may actually be something we'd want to have in CPython going forward, to untangle the startup procedures we currently have in C. I'll add this context to a separate ticket for discussion.

@iritkatriel
Member

The point of this repo at the moment is not to discuss the future of the C API but rather to reach a shared understanding of its current state. See the draft of the document describing this shared understanding at: https://github.com/capi-workgroup/problems/blob/main/capi_problems.rst

@gvanrossum

@malemburg

> I was not at the PyCon US meeting where the C API workgroup was apparently discussed, so perhaps I'm missing some context, but from looking through this repo, I could not find any clue as to how this workgroup will operate, who its members are and what its design goals should be.

It was the language summit. See https://pyfound.blogspot.com/2023/05/the-python-language-summit-2023-three.html.

There is currently no membership, we plan to have a meeting in Brno with everyone who's at the sprint to discuss further steps, who's going to do what, etc. I hope you are planning to attend, as it is not that far for you?

@gvanrossum

Looking back at the subject of this issue ("Clean up the Python C API for consistency, completeness and usefulness") I think this issue is too general to be of use for the C API working group. I'll close it. We know that we want all those things already.

@malemburg
Author

> The point of this repo at the moment is not to discuss the future of the C API but rather to reach a shared understanding of its current state. See the draft of the document describing this shared understanding at: https://github.com/capi-workgroup/problems/blob/main/capi_problems.rst

Thanks for the pointer, @iritkatriel. I've read the document and now have a better idea of what you're after.

I still don't fully understand what you consider a "problem", since that's essentially framed by the goals we're after, but I guess you want everyone to simply put forward whatever they consider a "problem" based on their own goals, which is fine.

> It was the language summit. See https://pyfound.blogspot.com/2023/05/the-python-language-summit-2023-three.html.

> There is currently no membership, we plan to have a meeting in Brno with everyone who's at the sprint to discuss further steps, who's going to do what, etc. I hope you are planning to attend, as it is not that far for you?

Thanks for the link to the summit page, @gvanrossum. I read that as well and found the reasoning there to be very similar to discussions I see on Discourse or at conferences: there's no clear direction, and that's the main obstacle we currently have in determining where things should or should not be heading.

I won't be attending the sprint at Brno, but perhaps there's an option to listen in on the conversations you are having on Discord, if I can make those meetings (virtually).

> Looking back at the subject of this issue ("Clean up the Python C API for consistency, completeness and usefulness") I think this issue is too general to be of use for the C API working group. I'll close it. We know that we want all those things already.

I'll take that as documentation of a first goal we have in setting the perspective for a clearer C API vision 🙂 .

@gvanrossum

I'm sorry, I doubt that there will be an option to participate in the sprint discussions online; the logistics of that are too complicated, and we may have many smaller "hallway" discussions that wouldn't be captured anyway. But we will be sure to produce a document (or documents) with a conclusion and next steps.

You sound critical of our process. We're doing the best we can, first gathering input (this tracker), then summarizing it (Irit's doc), then having a focused discussion about direction and next steps (Brno). I think that's the best we can do.

@malemburg
Author

> You sound critical of our process. We're doing the best we can, first gathering input (this tracker), then summarizing it (Irit's doc), then having a focused discussion about direction and next steps (Brno). I think that's the best we can do.

I'm not really critical of the process you have chosen. I am very used to working in the context of PSF, EPS and other workgroups, where these things are always documented in the early stages of their creation, hence me wondering what the goals are and how to participate 🙂.

What I am missing is better communication of the effort, an open invitation to Python extension writers to participate and a short introduction of the goals and how the group envisions coming up with a strategy. Perhaps I should open a ticket for this and follow up there.

@gvanrossum

> What I am missing is better communication of the effort, an open invitation to Python extension writers to participate and a short introduction of the goals and how the group envisions coming up with a strategy. Perhaps I should open a ticket for this and follow up there.

Maybe you can send a PR with an update to the README for this repo?

@iritkatriel
Member

We deliberately made it invitation-only initially and did not announce it widely, so we could make some progress with little noise. The next stage should be to solicit wider feedback.

@malemburg
Author

> Maybe you can send a PR with an update to the README for this repo?

I'll see what I can come up with 🙂

> We deliberately made it invitation-only initially and did not announce it widely, so we could make some progress with little noise. The next stage should be to solicit wider feedback.

Great 👍

@malemburg
Author


> Maybe you can send a PR with an update to the README for this repo?

> I'll see what I can come up with 🙂

Done. Please see #72

5 participants