-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(memh5): implement .copy() method #193
Conversation
|
caput/memh5.py
Outdated
self[key][:], order="A" | ||
), # ...so lets ensure that deep copies are actually made. | ||
) | ||
if hasattr(self, "index_map"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 - you should never have to explicitly copy index maps, since they are just an abstraction of an internal set of groups and datasets.
2 - MemGroups don't have index maps.
caput/memh5.py
Outdated
@@ -1774,6 +1796,41 @@ def save( | |||
**kwargs | |||
) | |||
|
|||
def copy(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function probably shouldn't reimplement crawling through the tree, since that has already been implemented for the underlying storage (MemGroup.copy() or deep_group_copy() on an hdf5 group). You will just want to call that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be fair I mostly said no to doing that as I don't want anything to unexpectedly break by changing the behaviour of deep_group_copy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a shared
argument which is a list of paths to datasets which should remain shared? That way I can use it to entirely replace the version in draco.core.containers.ContainerBase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do kind of agree with @kiyo-masui here too. This seems a little confusing in here. Either it should live in MemDiskGroup
and blindly copy the tree (which should work fine), or it should live in BasicCont
where it can exploit the restricted structure of that class (i.e. non-recursive, index_maps etc). At the moment, it does a bit of a hybrid, i.e. it recursively copies everything, but with a special treatment for index_maps which aren't even a concept in MemDiskGroup
What you're doing kind of works, but as a bit of style advice generally a method should only call functionality it knows is there. In here you've protected the calls with a conditional, but having to do this is a good indicator, you probably should structure things differently:
- Move the method down in the class hierarchy (i.e. into the subclass)
- Move functionality into a hook method
- Move the functionality you're calling up in the class hierearchy
- Use an overload (you can call the base class method, and then do your own work)
caput/memh5.py
Outdated
Could be generalized to MemGroups whose datasets are not all numpy arrays. | ||
In particular, if the data structure is hierarchical (datasets more than one layer deep), this will complain and fail. | ||
""" | ||
for key in new.keys(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that new
isn't defined here I don't see how this would work, and so you could probably do with another test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @leungcalvin this is looking like a good start. Thanks for this. I've added a few review comments. Ping me here or on slack if you have any comments about them.
caput/memh5.py
Outdated
_copy(f[key], g_key) | ||
else: | ||
try: | ||
g.create_dataset( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure this won't work correctly with distributed datasets:
- It doesn't pass on the distributed keywords
- It doesn't attempt to pass the array via an MPIArray which would be expected for creating a distributed dataset (this may or may not work)
- This may also strip any chunking or compression params that were set on the source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you show me where to look to figure out how to write this?
caput/memh5.py
Outdated
@@ -1774,6 +1796,41 @@ def save( | |||
**kwargs | |||
) | |||
|
|||
def copy(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a shared
argument which is a list of paths to datasets which should remain shared? That way I can use it to entirely replace the version in draco.core.containers.ContainerBase
caput/memh5.py
Outdated
@@ -1774,6 +1796,41 @@ def save( | |||
**kwargs | |||
) | |||
|
|||
def copy(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do kind of agree with @kiyo-masui here too. This seems a little confusing in here. Either it should live in MemDiskGroup
and blindly copy the tree (which should work fine), or it should live in BasicCont
where it can exploit the restricted structure of that class (i.e. non-recursive, index_maps etc). At the moment, it does a bit of a hybrid, i.e. it recursively copies everything, but with a special treatment for index_maps which aren't even a concept in MemDiskGroup
What you're doing kind of works, but as a bit of style advice generally a method should only call functionality it knows is there. In here you've protected the calls with a conditional, but having to do this is a good indicator, you probably should structure things differently:
- Move the method down in the class hierarchy (i.e. into the subclass)
- Move functionality into a hook method
- Move the functionality you're calling up in the class hierearchy
- Use an overload (you can call the base class method, and then do your own work)
Apologies for the last commit, I was careless with a |
Do either of you know what is the source of the |
I do not know where the error is from. However, |
It's apparently a Sphinx bug: sphinx-doc/sphinx#9817 which is fixed in Sphinx 4.3, but we've pinned Sphinx to version 3.4.2. |
@zkader and I would like to push this soon. Is the sphinx bug not an issue for the main branch of |
I'm going to close this as this functionality was added in #236, based on feedback from this PR |
Implements a
.copy()
method for memh5 objects that makes a deep copy of datasets. Currently does not support nested groups, or non-numpy arrays as datasets.