Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-1635741 - port unicodedata to multi-phase init #22145

Closed

Conversation

koubaa
Copy link
Contributor

@koubaa koubaa commented Sep 8, 2020

Second attempt, this time I'm slightly less ignorant about CPython core and argument clinic.

"Every problem in software engineering can be solved by adding a layer of indirection". I hope this is true about porting unicodedata to multi-phase init.

The challenge with this module is that the methods are shared between the module, class and in some cases a capsule API. Each of those cases will have its own way to get the module state (which is optional in the case of the capsule API).

I created internal functions like unicodedata_UCD_normalize_internal which is passed in the module state. The class methods pass it in by getting it from the defining class, the module methods get it their own way.

https://bugs.python.org/issue1635741

Modules/unicodedata.c Outdated Show resolved Hide resolved
@koubaa
Copy link
Contributor Author

koubaa commented Sep 15, 2020

I am taking the approach of putting the module in the capsule struct. This requires the client code to pass in the module (and will need some note in the changelog)

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer with a first short PR to prepare the work to just pass a state to UCD_Check(), without all other changes.

This PR is too big, I cannot review it. GitHub even hides the diff by default because it is too large :-D

@vstinner
Copy link
Member

I created https://bugs.python.org/issue41798 "meta-issue" to discuss the usage of the PyCapsule C API with multi-phase init API (PEP 489).

@vstinner
Copy link
Member

I am taking the approach of putting the module in the capsule struct. This requires the client code to pass in the module (and will need some note in the changelog)

That sounds like a good plan, to pass the module through a state.

@koubaa
Copy link
Contributor Author

koubaa commented Sep 17, 2020

I would prefer with a first short PR to prepare the work to just pass a state to UCD_Check(), without all other changes.

This PR is too big, I cannot review it. GitHub even hides the diff by default because it is too large :-D

@vstinner Since the module state (which contains the capsule and a type) is new in this PR, it would be a struct with sizeof of 0. I can put an int in it just as a placeholder but I don't know if its a good idea to merge that in.

Most of the changes are breaking the functions (which are shared between the module and the type) into different entry points for either - so that I can get the module state from both flavors in the way that works for them. I propose to make the first stage just about this indirection, without switching to any heap types. What do you think?

@vstinner
Copy link
Member

@vstinner Since the module state (which contains the capsule and a type) is new in this PR, it would be a struct with sizeof of 0. I can put an int in it just as a placeholder but I don't know if its a good idea to merge that in.

You can try to work in multiple steps (multiple PRs):

  • Add a ucd_type parameter to UCD_Check() macro
  • Convert UCD_Type static type to a heap type without adding a module state (right now, it's not possible to create more than one instance of the unicodedata module)
  • Add a module state

For the module state, it may be more flexible to pass ucnhash_CAPI to getname() and getcode() this API, rather than passing ucnhash_CAPI->module as the first parameter.

@koubaa
Copy link
Contributor Author

koubaa commented Sep 18, 2020

@vstinner step 1 is equivalent to Py_IS_TYPE, but sure I can do that first.

Are there any examples of heap types without module state that I can refer to?

@koubaa
Copy link
Contributor Author

koubaa commented Sep 18, 2020

For the module state, it may be more flexible to pass ucnhash_CAPI to getname() and getcode() this API, rather than passing ucnhash_CAPI->module as the first parameter.

@vstinner This is definitely pythonic 👍

@vstinner
Copy link
Member

This PR is outdated, unicodedata got many changes in the meanwhile. I proposed one approach to convert unicodedata to multi-phase init in https://bugs.python.org/issue42157 I close this PR. Once PR #22990 will be merged, I will propose a PR to finally convert the module to multi-phase init. Sorry for the misunderstanding, but this extension module is way more complex than other extensions, and I didn't spot all corner cases at the first review. See the bpo for the list of all issues and my proposal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants