Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #441 do not touch __deepcopy__ memo object #442

Merged
merged 1 commit into from
Feb 6, 2024
Merged

Conversation

robnagler
Copy link
Member

No description provided.

@robnagler robnagler requested a review from rorour February 5, 2024 17:32
Copy link
Contributor

@rorour rorour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good but I don't understand the purpose of rv = self.copy() rather than rv = self.__class__(). I'm not sure how you feel about dictionary comprehensions but we could even get rid of the temp var

    def __deepcopy__(self, memo):                                                                         |
        return PKDict(                                                                                    |
            {copy.deepcopy(k, memo): copy.deepcopy(v, memo) for (k, v) in self.items()}                   |
        )

@robnagler
Copy link
Member Author

This is an excellent question.

I don't understand the purpose of rv = self.copy() rather than rv = self.class(). I'm not sure how you feel about dictionary comprehensions but we could even get rid of the temp var

    def __deepcopy__(self, memo):                                                                         |
        return PKDict(                                                                                    |
            {copy.deepcopy(k, memo): copy.deepcopy(v, memo) for (k, v) in self.items()}                   |
        )

Technically we shouldn't need deepcopy, but python does the wrong thing with subclasses of dict:

def _deepcopy_dict(x, memo, deepcopy=deepcopy):
    y = {}

There's a subtlety here when implementing deepcopy. The type has to stay the same and the constructor may not match. Consider this naive implementation:

rv = self.__class__()

This requires any subclass of PKDict to accept an empty constructor, which many subclasses don't. I was just dealing with this for sirepo.quest.Attr, which always accepts a qcall, and I wanted to clone the object. self.__class__() crashed with incorrect args.

The design of PKDict.deepcopy relies on PKDict.copy, which overrides dict.copy to fix the problem above, and also allows subclasses to override copy or better __copy__ to get around the construction problem of the initial object. Any nested deepcopy calls will always do the right thing, that is, construct an object of the class using copy so you can have a tree of different classes all of which derive from PKDict without anything breaking. Furthermore, any subclasses can change the behavior of copy, e.g. to throw out items they don't want copied on a deepcopy. (I'm having to deal with this now on Attr, btw.)

The use of a comprehension could be done with update or pkupdate. This constructs a new object ({}), however. If the dict is large (and there are nested comprehensions, there could be a lot of memory used up. The way it is written, it creates an object (rv) which will be no bigger than the ultimate return value. By using items, we don't allocate any more space than is absolutely necessary to make the copy.

When designing low level APIs like PKDict, which are used all over the system, it's important to consider these performance issues.

If you feel like it, you can try to document this in the implementation. I think it is a good exercise to understand code like this, and documenting is the best way to understand code.

@robnagler robnagler merged commit 28efa7b into master Feb 6, 2024
2 checks passed
@robnagler robnagler deleted the 441-memo branch February 6, 2024 22:09
@rorour
Copy link
Contributor

rorour commented Feb 6, 2024

Technically we shouldn't need deepcopy, but python does the wrong thing with subclasses of dict:

Wow.

Makes sense, thank you for the explanation. I will add some documentation (#443)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants