Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

method_cache interacts badly with subclasses #23

Open
jaraco opened this issue Aug 13, 2023 · 2 comments
Open

method_cache interacts badly with subclasses #23

jaraco opened this issue Aug 13, 2023 · 2 comments

Comments

@jaraco
Copy link
Owner

jaraco commented Aug 13, 2023

The implementation of method_cache, which sets the cached value on the instance, means that if another class attempts to override a cached method on the superclass, after the first time it's called, only the behavior in the superclass will be called.

This condition was encountered in irc.strings.IRCFoldedCase(jaraco.text.FoldedCase), where jaraco.text.FoldedCase.casefold is cached.

I'm not sure there's anything to do here except maybe warn of the risk and provide guidance on how to navigate the concern.

@bswck
Copy link
Contributor

bswck commented Aug 14, 2023

A question here: what exactly behavior is expected in this case? I will try to guess, basing on my intuition, what should be expected in specific scenarios:

Scenario 1. The developer did not decorate the subclass's method with @method_cache.

class Superclass:
    @method_cache
    def method(self):
        """Some heavy computations"""

class Subclass(Superclass):
    def method(self):
        ret = super().method()
        # some operations on ret...
        return ret

subclass_instance = Subclass()

Intuitive behavior: the subclass's method is not cached and operates on cached superclass's method.
Current behavior: as mentioned; the second and further calls to subclass_instance.method() result in only Superclass.method(subclass_instance) behavior being executed due to attribute overwrite.

Scenario 2. The developer decorated the subclass's method with @method_cache.

class Superclass:
    @method_cache
    def method(self):
        """Some heavy computations"""

class Subclass(Superclass):
    @method_cache
    def method(self):
        ret = super().method()
        # some operations on ret...
        return ret

subclass_instance = Subclass()

Intuitive behavior: the subclass's method is cached and operates on cached superclass's method.
Other methods in the subclass accessing super().method will access the cached superclass's method. self.method for them is cached as well.
Current behavior: method_cache()→wrapper() overwrites method's name on the instance before calling it. This results in superclass's cached method overlapping the subclass's cached method. The second and further calls to subclass_instance.method() result in only Superclass.method(subclass_instance) behavior being executed due to attribute overwrite. If we change the implementation a bit, from this:

def wrapper(self: object, *args: object, **kwargs: object) -> object:
# it's the first call, replace the method with a cached, bound method
bound_method: CallableT = types.MethodType( # type: ignore[assignment]
method, self
)
cached_method = cache_wrapper(bound_method)
setattr(self, method.__name__, cached_method)
return cached_method(*args, **kwargs)

to this:

def wrapper(self, *args, **kwargs):
    # it's the first call, replace the method with a cached, bound method
    bound_method = types.MethodType(method, self)
    cached_method = cache_wrapper(bound_method)
    result = cached_method(*args, **kwargs)
    setattr(self, method.__name__, cached_method)
    return result

(and apply consequent patches to special method wrapper)
the subclass's method is at its place, cached, and not throttled by its super-implementation.
This change doesn't, however, fix the current behavior for scenario 1 and implement the intuitive behavior, because super().method isn't cached anymore.

Retrieval scenario.

The developer might want to access the uncached version of the superclass's cached method after its first call. Wouldn't it be quite convenient if a subclass could access the superclass's wrapped method? Maybe from the wrapper's attribute? (wrapper.func, wrapper.wrapped or wrapper.uncached seem all reasonable). This would look like this:

class Superclass:
    @method_cache
    def method(self):
        """Some heavy computations"""

class Subclass(Superclass):
    def some_other_method(self):
        bound_uncached_superclass_method = types.MethodType(self.method.uncached, self)
        ...

When I look at all this, hoping that my points are correct, my thinking is that method_cache might be better as a descriptor class, similarly to functools.cached_property, which essentially uses instance's __dict__ to store cached property values. A descriptor implementation would separate instance's vars from actual getattr results, leaving some space for the underlying magic to accomplish all goals listed out in this post.

Scenario 1: on the first method_cache.__get__() call, already bind the function (passed in during decoration) to a method and store in instance's __dict__. Return cached method.

Scenario 2: as above; super().method will invoke superclass method_cache descriptor's __get__(). And that's where we could encounter a problem, because the instance's __dict__ are used by both the superclass and the subclass. Thus, the only possible idea here is to store cached methods locally per every method_cache descriptor object across the MRO of the instance. Like a huge dictionary of instances to their remote, one-purpose __dict__s.
weakref.ref() or id() can be used to identify instances, preferably the first one. The only flaw is that the instances using @method_cache must be weakly referencable. If they aren't, id()instance mapping would be required (instead of weakref.WeakKeyDictionary) to 1. have keys that differentiate every instance, 2. hold strong references to instances we store IDs of, because once they are no longer strongly referenced by any object, those IDs may point to something else. This way every descriptor will be able to easily identify the proper cached method to return. The only huge flaw to the id()-based approach is probable reference cycles and, consequently, memory leaks... That needs some thought.

Retrieval scenario: when the developer wants to access the uncached version of previously @method_cache-d function, it might be made even more convenient by binding to method type in method_cache.__get__().

Please let me know what you think about this hacky idea!

@jaraco
Copy link
Owner Author

jaraco commented Aug 14, 2023

Thanks for the feedback. I've skimmed over it but haven't the time right now to digest it. Regardless, I'm grateful for the assistance and look forward to reading the proposal in earnest!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants