Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide .copy methods for MemDataset and MemDiskGroup #236

Merged
merged 7 commits into from
Mar 16, 2023

Conversation

ljgray
Copy link
Contributor

@ljgray ljgray commented Mar 1, 2023

This PR provides .copy methods for MemDataset and MemDiskGroup classes.

MemDataset.copy can optionally do a deep or shallow copy (deep is default) and set the memory layout of the underlying data (if it is a numpy array). By default, this is the same layout as the dataset being copied.

MemDiskGroup.copy deep copies all datasets by default and provides a shared argument. This is implemented via the deep_group_copy method, which is extended to optionally deep copy datasets (except those that are shared) and copy distributed arrays only in the memory -> memory case. I made considerations based on discussion in #190.

This PR also provides a a tool to check two objects which may contain numpy arrays for equality (ex: a list of arrays, dict with array arguments, etc...). For example, deep copying a list which contains a numpy array would result in an error when comparing the dicts, as the numpy arrays would not be the same object and would fall back on elementwise comparison. Since list.__eq__ won't call all() on the numpy array after elementwise comparison, the comparison would fail. The same thing happens if the outer container is a dict.

Finally, the PR adds a check for matching dtype when comparing two MemDatasets. I'm undecided if this is a good idea or not.

NOTE: I've done a fair bit of testing and I think this is tentatively ready for review. I do still want to try to break it a bit more thoroughly. I've tagged a few possibly relevant people for review but it's fine if this waits for awhile

@ljgray ljgray changed the title Dataset_id handling (1/2): MemDataset method to partially copy a dataset Dataset_id Handling (1/2): MemDataset method to partially copy a dataset Mar 1, 2023
@ljgray ljgray force-pushed the ljg/dataset-from-dataset branch 8 times, most recently from 190b975 to 2761eba Compare March 1, 2023 23:33
@ljgray ljgray changed the title Dataset_id Handling (1/2): MemDataset method to partially copy a dataset MemDataset method to partially copy a dataset Mar 2, 2023
@ljgray ljgray marked this pull request as draft March 2, 2023 23:23
@ljgray ljgray changed the title MemDataset method to partially copy a dataset WIP - MemDataset method to partially copy a dataset Mar 8, 2023
@ljgray ljgray force-pushed the ljg/dataset-from-dataset branch 8 times, most recently from 572cddc to 42045a0 Compare March 14, 2023 22:44
caput/memh5.py Show resolved Hide resolved
caput/memh5.py Show resolved Hide resolved
@ljgray ljgray force-pushed the ljg/dataset-from-dataset branch 3 times, most recently from a1a5d00 to 642d3f0 Compare March 14, 2023 23:48
Provides deep_group_copy with arguments to actually deep copy datasets,
with an option to shallow copy 'shared' datasets. Also, allow
distributed datasets to be copied only in the memory -> memory case.
@ljgray ljgray changed the title WIP - MemDataset method to partially copy a dataset WIP - provide .copy methods for MemDataset and MemDiskGroup Mar 14, 2023
@ljgray ljgray changed the title WIP - provide .copy methods for MemDataset and MemDiskGroup Provide .copy methods for MemDataset and MemDiskGroup Mar 15, 2023
@ljgray ljgray marked this pull request as ready for review March 15, 2023 00:08
Copy link
Contributor

@kiyo-masui kiyo-masui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've given this a full read, and it looks good to me. I'll leave final approval to @jrs65 since he is much more up-to-date on this code than I am at this point.

Copy link
Contributor

@jrs65 jrs65 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really nice. A small query about the object comparison stuff, but otherwise great!

caput/tools.py Outdated Show resolved Hide resolved
Specifically, this lets us compare objects such as lists and dicts which
contain numpy ndarrays (or subclasses) which are equal elementwise but
are not actually the same object (such as after a deep copy).
@ljgray ljgray merged commit a3787b9 into master Mar 16, 2023
@ljgray ljgray deleted the ljg/dataset-from-dataset branch March 16, 2023 22:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants