Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-31333: Re-implement ABCMeta in C #5273

Merged
merged 103 commits into from
Feb 18, 2018
Merged

Conversation

ilevkivskyi
Copy link
Member

@ilevkivskyi ilevkivskyi commented Jan 23, 2018

This implementation is fully functional but there are three remaining issues/questions that I hope we can resolve quickly:

  • One test still fails. This is because six.with_metaclass (used by pip) does some fragile "magic". With this PR it fails with TypeError: type.__new__(metaclass) is not safe, use ABCMeta.__new__(). I see two options here: (1) Try to somehow special-case ABCMeta to fix this; (2) Do nothing here, but instead contact six and pip maintainers so that hey can fix this.
  • What to do with private caches/registry? There are three options: (1) Make them fully internal (i.e. only visible in C code); (2) Expose them, but make them read-only, so that one can do C._abc_cache.clear(), but not C._abc_cache = "Surprise!"; (3) Expose them, and make them writable. I currently go with option (2), which seems to be a reasonable compromise.
  • How to organize the caches? In this version I made both caches and registry just normal sets of weak references without callbacks. This is very simple in terms of implementation and fast, with a minor downside that dead references will stay in caches. I am open to alternative ideas here (note the main issue is that references should not be removed during iteration over the sets).

I didn't do any careful benchmarking yet, but this this seems to give a decent speed-up for Python start-up time and to several ABC-related tests. For example, on my machine Python startup is 10% faster.

@methane @serhiy-storchaka I will really appreciate your help/advise/review here.

@ned-deily I know beta1 is very close, but I would like this to get in. I already discussed with @gvanrossum and he is OK with this getting into beta1.

https://bugs.python.org/issue31333

@ilevkivskyi
Copy link
Member Author

(I will add a news item later.)

@1st1
Copy link
Member

1st1 commented Jan 23, 2018

  • One test still fails. This is because six.with_metaclass (used by pip) does some fragile "magic". With this PR it fails with TypeError: type.__new__(metaclass) is not safe, use ABCMeta.__new__(). I see two options here: (1) Try to somehow special-case ABCMeta to fix this; (2) Do nothing here, but instead contact six and pip maintainers so that hey can fix this.

What exact magic does six use? This can actually be a serious issue and cause many breakages.

  • What to do with private caches/registry? There are three options: (1) Make them fully internal (i.e. only visible in C code); (2) Expose them, but make them read-only, so that one can do C._abc_cache.clear(), but not C._abc_cache = "Surprise!"; (3) Expose them, and make them writable. I currently go with option (2), which seems to be a reasonable compromise.

I'd go with (1). There're 0 reasons to use those private caches/registries directly.

I had a similar problem problem in asyncio in 3.6, when I debated whether I want to expose private Task and Future attributes or not. Turns out that we'll be hiding some of them in 3.7 because it's impossible to optimize/refactor code otherwise.

  • How to organize the caches? In this version I made both caches and registry just normal sets of weak references without callbacks. This is very simple in terms of implementation and fast, with a minor downside that dead references will stay in caches. I am open to alternative ideas here (note the main issue is that references should not be removed during iteration over the sets).

Sounds like "minor downside that dead references will stay in caches" is a backwards-incompatible change. Some ORMs (like the one we've developed at MagicStack) create virtual DB Model classes on the fly. Some of them can be inherited from ABCMeta. So I guess with this PR we'd be risking to have a memory leak in 3.7.


BTW, have you considered implementing caches and instance/subclass hooks in C, but implementing the actual ABCMeta class in pure Python? That way ABCMeta methods would have nice C-accelerated versions, but the 'six' problem should go away.

@1st1
Copy link
Member

1st1 commented Jan 23, 2018

How to organize the caches? In this version I made both caches and registry just normal sets of weak references without callbacks.

We can also use WeakSet and WeakKeyDictionary btw.

Modules/_abc.c Outdated
Stage 1: direct abstract methods.
(It is safe to assume everything is fine since type.__new__ succeeded.) */
ns = PyTuple_GET_ITEM(args, 2);
items = PyMapping_Items(ns); /* TODO: Fast path for exact dicts with PyDict_Next */
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be null if ns is a subclass of dict that overwrites the items method to raise an error or return a non-iterable

Modules/_abc.c Outdated
items = PyMapping_Items(ns); /* TODO: Fast path for exact dicts with PyDict_Next */
for (pos = 0; pos < PySequence_Size(items); pos++) {
item = PySequence_GetItem(items, pos);
key = PyTuple_GetItem(item, 0);
Copy link

@pppery pppery Jan 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

item isn't necessarily a tuple, for example in

class BadItems(dict):
    def items(self):
         return (87,)
AbcMeta.__new__("name", (), BadItems())

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Furthermore, item could be a empty tuple or 1-tuple, meaning that key and/or value could be null, which again needs to be checked for.

Modules/_abc.c Outdated
if (!(iter = PyObject_GetIter(base_abstracts))) {
goto error;
}
while ((key = PyIter_Next(iter))) {
Copy link

@pppery pppery Jan 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing check for iterators that raise an error (such as when __abstractmethods__ is overwritten with a custom object).

Modules/_abc.c Outdated
}
Py_DECREF(iter);
}
if (_PyObject_SetAttrId((PyObject *)result, &PyId___abstractmethods__, abstracts) < 0) {
Copy link

@pppery pppery Jan 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this cause a change in behavior; __abstractmethods__ used to be a frozenset, but would now be a regular mutable set

@methane
Copy link
Member

methane commented Jan 23, 2018

We can also use WeakSet and WeakKeyDictionary btw.

They're implemented in Python and it's one of major reason ABC is slow.

Modules/_abc.c Outdated
abcmeta_new, /* tp_new */
0, /* tp_free */
0, /* tp_is_gc */
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use designated initializer for new type.

@ilevkivskyi
Copy link
Member Author

@1st1

What exact magic does six use?

six uses this code:

def with_metaclass(meta, *bases):
    # ...
    class metaclass(type):

        def __new__(cls, name, this_bases, d):
            return meta(name, bases, d)

        @classmethod
        def __prepare__(cls, name, this_bases):
            return meta.__prepare__(name, bases)
return type.__new__(metaclass, 'temporary_class', (), {})

The problem is that it doesn't work with C-level metaclasses.

BTW, have you considered implementing caches and instance/subclass hooks in C, but implementing the actual ABCMeta class in pure Python? That way ABCMeta methods would have nice C-accelerated versions, but the 'six' problem should go away.

This actually sounds like a reasonable solution.

I'd go with (1). There're 0 reasons to use those private caches/registries directly.
I had a similar problem problem in asyncio in 3.6, when I debated whether I want to expose private Task and Future attributes or not. Turns out that we'll be hiding some of them in 3.7 because it's impossible to optimize/refactor code otherwise.

OK, I can hide them (we then just need to update code in refleak.py that explicitly reads them).
How do you propose to have the hidden C-level attributes for Python level class? Just have a single huge dictionary, so that it will work like this:

# pseudo-code, will be in C
_the_registry: Dict[WeakRef[type], Set[WeakRef[type]]] = {}
...
def _abc_register(cls, subcls):
    _registry = _the_registry[ref(cls)]
    _registry.add(ref(subcls))
    return subcls

or is there a better way?

Sounds like "minor downside that dead references will stay in caches" is a backwards-incompatible change...

Yes, I have a TODO about limiting cache growth in code, but if we are going to hide private cache attributes, then it is easy, we can just register callbacks since caches are never iterated in this code.

We can also use WeakSet and WeakKeyDictionary btw.

As @methane mentioned they are implemented in Python and also slow and overly general. As I said above, if we are going to hide the attributes, then we don't need this for caches. We only iterate over the registry and for it I can just use callbscks with a "commit queue" and an iteration guard (this is actually the idea behind WeakSet, but we can make it much simpler since we are doing very limited set of operations with the registry).

@ilevkivskyi
Copy link
Member Author

@pppery Thanks for a review! Will fix this.

@ilevkivskyi ilevkivskyi reopened this Feb 17, 2018
@ilevkivskyi
Copy link
Member Author

@serhiy-storchaka I just noticed you self-requested a review on this PR. Do you still want to review this before I merge? Or are you fine taking a look at it later and if necessary making a separate PR?

@serhiy-storchaka
Copy link
Member

I'm making a quick review right now.

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reviewed the C code. It mostly LGTM, but I added several comments.

Modules/_abc.c Outdated
PyObject *_abc_registry;
PyObject *_abc_cache; /* Normal set of weak references. */
PyObject *_abc_negative_cache; /* Normal set of weak references. */
PyObject *_abc_negative_cache_version;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use just a plain 64-bit integer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, good point, I was worried about some ORMs that register lots of classes, but I just calculated that it will take million years to reach the maximum value even if 1000 classes are registered every second.

Modules/_abc.c Outdated
static int
_in_weak_set(PyObject *set, PyObject *obj)
{
if (set == NULL || PySet_Size(set) == 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use PySet_GET_SIZE()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, these attributes are not accessible from Python code so that we know that they always refer to sets.

if (wr == NULL) {
return -1;
}
destroy_cb = PyCFunction_NewEx(&_destroy_def, wr, NULL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check destroy_cb == NULL?

Modules/_abc.c Outdated
self: object
/

Internal ABC helper for class set-up. Should be never used outside abc module
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed period at the end.

Modules/_abc.c Outdated
int is_abstract = _PyObject_IsAbstract(value);
Py_DECREF(value);
if (is_abstract < 0 ||
(is_abstract && PySet_Add(abstracts, key) < 0)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP 7 requires { on new line in case of multiline condition.

Modules/_abc.c Outdated

/* 6. Check if it's a subclass of a subclass (recursive). */
subclasses = PyObject_CallMethod(self, "__subclasses__", NULL);
if(!PyList_Check(subclasses)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed space after if.

Modules/_abc.c Outdated
goto end;
}
for (pos = 0; pos < PyList_GET_SIZE(subclasses); pos++) {
int r = PyObject_IsSubclass(subclass, PyList_GET_ITEM(subclasses, pos));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyList_GET_ITEM(subclasses, pos) is a borrowed reference while subclasses is mutable and can has external references. It is safe to temporary increment the refcount.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, but the list itself holds strong references to all its elements, and will not be destroyed since we hold a strong reference to the list. Anyway, if you think it is important I can add an INCREF.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyObject_IsSubclass() can execute arbitrary Python code. This code can modify the list.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but note that this can happen only if a subclass overrides __subclasses__, because object.__subclasses__ returns a new list on each call.

Modules/_abc.c Outdated
return 0;
}
// Weakref callback may remove entry from set.
// Se we take snapshot of registry first.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"So"?

Modules/_abc.c Outdated
}
// Weakref callback may remove entry from set.
// Se we take snapshot of registry first.
PyObject **copy = PyMem_Malloc(sizeof(PyObject*) * registry_size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be simpler to use PySequence_List(impl->_abc_registry).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but this is a very hot code, so we decided to make an optimisation here (anyway we need a copy for avery short time).

Modules/_abc.c Outdated

The token is an opaque object (supporting equality testing) identifying the
current version of the ABC cache for virtual subclasses. The token changes
with every call to ``register()`` on any ABC.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double bacticks look not needed.

@ilevkivskyi
Copy link
Member Author

@serhiy-storchaka Thanks for review! I will make the changes now.

@ilevkivskyi ilevkivskyi reopened this Feb 17, 2018
@ilevkivskyi
Copy link
Member Author

@serhiy-storchaka I implemented the latest comments, are you happy with the PR now?

@pppery
Copy link

pppery commented Feb 17, 2018

It seems like after this change the _weakrefset hack can be removed (moving the code back directly into the weakref module)

@ilevkivskyi
Copy link
Member Author

It seems like after this change the _weakrefset hack can be removed (moving the code back directly into the weakref module).

The key word here is after.

static unsigned long long abc_invalidation_counter = 0;

/* This object stores internal state for ABCs.
Note that we can use normal sets for caches,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is out of date

since they are never iterated over. */
typedef struct {
PyObject_HEAD
PyObject *_abc_registry;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of registry, cache, and negative cache are normal sets of weak references; there should be comments stating that for either all of none of them

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is worth mentioning in What's New in the optimizations section.

Modules/_abc.c Outdated
@@ -508,6 +513,9 @@ _abc__abc_instancecheck_impl(PyObject *module, PyObject *self,
}

subclass = _PyObject_GetAttrId(instance, &PyId___class__);
if (subclass == NULL) {
return NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaked impl.

Modules/_abc.c Outdated
Py_DECREF(negative_cache);
return NULL;
}
PyObject *res = PyTuple_Pack(4, registry, cache, negative_cache, cache_version);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be written simpler as:

PyObject *res = Py_BuildValue("NNNK",
                              PySet_New(impl->_abc_registry),
                              PySet_New(impl->_abc_cache),
                              PySet_New(impl->_abc_negative_cache),
                              impl->_abc_negative_cache_version);

Modules/_abc.c Outdated
abc_invalidation_counter, Py_LT);
assert(r >= 0); // Both should be PyLong
if (r > 0) {
if (impl->_abc_negative_cache_version < abc_invalidation_counter) {
/* Invalidate the negative cache. */
if (impl->_abc_negative_cache != NULL &&
PySet_Clear(impl->_abc_negative_cache) < 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP 7 requires { on a separate line in this case.

@ilevkivskyi ilevkivskyi reopened this Feb 18, 2018
@ilevkivskyi ilevkivskyi merged commit 03e3c34 into python:master Feb 18, 2018
@bedevere-bot
Copy link

@ilevkivskyi: Please replace # with GH- in the commit message next time. Thanks!

ilevkivskyi added a commit to ilevkivskyi/cpython that referenced this pull request Feb 18, 2018
This adds C versions of methods used by ABCMeta that
improve performance of various ABC operations.
@1st1
Copy link
Member

1st1 commented Feb 18, 2018

Thank you @ilevkivskyi and @methane!

@ilevkivskyi
Copy link
Member Author

Thank you for good reviews @1st1!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants