Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-103193: Micro-optimise helper functions for inspect.getattr_static #103195

Merged

Conversation

AlexWaygood
Copy link
Member

@AlexWaygood AlexWaygood commented Apr 2, 2023

Micro-optimising inspect._static_getmro and inspect._shadowed_dict leads to a significant improvement in the time taken to call isinstance() on a runtime-checkable protocol. (This is a useful benchmark, as it's a real-world use of inspect.getattr_static in a tight loop, that's found in the stdlib.)

Benchmark results on a0305c5:

Time taken for objects with a property: 3.34
Time taken for objects with a classvar: 3.08
Time taken for objects with an instance var: 12.15
Time taken for objects with no var: 15.38
Time taken for nominal subclass instances: 25.13
Time taken for registered subclass instances: 21.65

Benchmark results with this PR:

Time taken for objects with a property: 2.82
Time taken for objects with a classvar: 2.85
Time taken for objects with an instance var: 8.62
Time taken for objects with no var: 14.01
Time taken for nominal subclass instances: 21.80
Time taken for registered subclass instances: 20.23
Benchmark script:
import time
from typing import Protocol, runtime_checkable

@runtime_checkable
class HasX(Protocol):
    x: int

class Foo:
    @property
    def x(self) -> int:
        return 42

class Bar:
    x = 42

class Baz:
    def __init__(self):
        self.x = 42

class Egg: ...

class Nominal(HasX):
    def __init__(self):
        self.x = 42

class Registered: ...

HasX.register(Registered)

num_instances = 500_000
foos = [Foo() for _ in range(num_instances)]
bars = [Bar() for _ in range(num_instances)]
bazzes = [Baz() for _ in range(num_instances)]
basket = [Egg() for _ in range(num_instances)]
nominals = [Nominal() for _ in range(num_instances)]
registereds = [Registered() for _ in range(num_instances)]


def bench(objs, title):
    start_time = time.perf_counter()
    for obj in objs:
        isinstance(obj, HasX)
    elapsed = time.perf_counter() - start_time
    print(f"{title}: {elapsed:.2f}")


bench(foos, "Time taken for objects with a property")
bench(bars, "Time taken for objects with a classvar")
bench(bazzes, "Time taken for objects with an instance var")
bench(basket, "Time taken for objects with no var")
bench(nominals, "Time taken for nominal subclass instances")
bench(registereds, "Time taken for registered subclass instances")

@AlexWaygood AlexWaygood added type-feature A feature request or enhancement performance Performance or resource usage stdlib Python modules in the Lib dir 3.12 bugs and security fixes labels Apr 2, 2023
@AlexWaygood AlexWaygood requested a review from carljm April 2, 2023 17:05
@AlexWaygood
Copy link
Member Author

I can find one significant bug report open regarding inspect.getattr_static:

Having studied the bug report, I don't think this patch impacts the bug (or any possible solutions to the bug) either way.

@Eclips4
Copy link
Member

Eclips4 commented Apr 3, 2023

Hello, Alex!
Just out of curiosity, why do you use a x.__dict__ instead of vars(x), it's related to a performance issues?

@AlexWaygood
Copy link
Member Author

Just out of curiosity, why do you use a x.__dict__ instead of vars(x), it's related to a performance issues?

There's not really any difference between the two; it's just a matter of taste. Some people consider vars(x) to be "more Pythonic" than accessing the __dict__ variable directly, but I don't really have a strong preference either way.

In this case, the main reason I'm accessing it via .__dict__ is just because that's what the existing code already does, and I don't see any reason to change the style here. It's also arguably more explicit in this case -- but again, that's just a subjective opinion about code readability, really.

Copy link
Member

@sobolevn sobolevn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several things I can think of to speed up getattr_static.

  1. Change _check_instance to be
def _check_instance(obj, attr):
    try:
        instance_dict = object.__getattribute__(obj, "__dict__")
    except AttributeError:
        return _sentinel
    return dict.get(instance_dict, attr, _sentinel)
  1. Here type is called twice:
if (_check_class(type(klass_result), '__get__') is not _sentinel and
            _check_class(type(klass_result), '__set__') is not _sentinel):

Ideally we can also call _check_class once with several attributes 🤔
Because in this case it will allow us to decrease the amount of _static_mro calls.

I think that a variable might be a bit faster

@AlexWaygood
Copy link
Member Author

There are several things I can think of to speed up getattr_static.

Those sound great! My instinct is to tackle them in a separate PR, so that each change can be evaluated and benchmarked independently. Would you like to submit a PR?

@sobolevn
Copy link
Member

sobolevn commented Apr 3, 2023

Yes, I will do it tomorrow 👍

@picnixz
Copy link
Contributor

picnixz commented Apr 3, 2023

Would it be ok to have _dunder_dict_descriptor_get = type.__dict__["__dict__"].__get__ instead of _dunder_dict_descriptor = type.__dict__["__dict__"] in order to minimize the number of accesses to __get__ when iterating over the classes ?

@AlexWaygood
Copy link
Member Author

Would it be ok to have _dunder_dict_descriptor_get = type.__dict__["__dict__"].__get__ instead of _dunder_dict_descriptor = type.__dict__["__dict__"] in order to minimize the number of accesses to __get__ when iterating over the classes ?

I considered it, but felt like it would make it significantly less readable (_dunder_dict_descriptor already isn't a great name, and _dunder_dict_descriptor_get feels even worse). Happy to reconsider if others agree that we should go that way, though.

@picnixz
Copy link
Contributor

picnixz commented Apr 3, 2023

_dunder_dict_descriptor already isn't a great name, and _dunder_dict_descriptor_get feels even worse

IMO, since we are already micro-optimizing an internal helper of an internal function, we may not necessarily need to be exact so perhaps we can name it _dunder_dict_of ?

@AlexWaygood
Copy link
Member Author

AlexWaygood commented Apr 3, 2023

IMO, since we are already micro-optimizing an internal helper of an internal function, we may not necessarily need to be exact so perhaps we can name it _dunder_dict_of ?

That's a decent name. Or maybe _get_dunder_dict_of_class.

I'll make the change — thanks!

@AlexWaygood
Copy link
Member Author

AlexWaygood commented Apr 3, 2023

I pushed the change and updated the benchmark results in the PR description (it's maybe a teeny tiny bit faster than it was, but not by much)

@picnixz
Copy link
Contributor

picnixz commented Apr 3, 2023

it's maybe a teeny tiny bit faster than it was, but not by much

I think the impact will be clearer when we have a complicated inheritance diagram (e.g., large projects with abstract interfaces and/or mixins here and there).

Copy link
Member

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice!

@AlexWaygood AlexWaygood merged commit 264c00a into python:main Apr 5, 2023
@AlexWaygood AlexWaygood deleted the microoptimise-getattr_static-helpers branch April 5, 2023 07:27
@AlexWaygood
Copy link
Member Author

There are several things I can think of to speed up getattr_static.

  1. Change _check_instance to be
def _check_instance(obj, attr):
    try:
        instance_dict = object.__getattribute__(obj, "__dict__")
    except AttributeError:
        return _sentinel
    return dict.get(instance_dict, attr, _sentinel)
  1. Here type is called twice:
if (_check_class(type(klass_result), '__get__') is not _sentinel and
            _check_class(type(klass_result), '__set__') is not _sentinel):

Ideally we can also call _check_class once with several attributes 🤔 Because in this case it will allow us to decrease the amount of _static_mro calls.

I think that a variable might be a bit faster

I tried these out locally, but unfortunately I can't measure any speedup from them :/

It's possible I'm not using the right benchmark to show a speedup, however -- happy to be proven wrong if you can get a speedup somewhere!

gaogaotiantian pushed a commit to gaogaotiantian/cpython that referenced this pull request Apr 8, 2023
warsaw pushed a commit to warsaw/cpython that referenced this pull request Apr 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 bugs and security fixes performance Performance or resource usage stdlib Python modules in the Lib dir type-feature A feature request or enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants