-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provide .copy
methods for MemDataset
and MemDiskGroup
#236
Conversation
190b975
to
2761eba
Compare
2761eba
to
d3c7a30
Compare
572cddc
to
42045a0
Compare
a1a5d00
to
642d3f0
Compare
Provides deep_group_copy with arguments to actually deep copy datasets, with an option to shallow copy 'shared' datasets. Also, allow distributed datasets to be copied only in the memory -> memory case.
642d3f0
to
42ec637
Compare
.copy
methods for MemDataset
and MemDiskGroup
.copy
methods for MemDataset
and MemDiskGroup
.copy
methods for MemDataset
and MemDiskGroup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've given this a full read, and it looks good to me. I'll leave final approval to @jrs65 since he is much more up-to-date on this code than I am at this point.
42ec637
to
5c9eef2
Compare
5c9eef2
to
0a7a897
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really nice. A small query about the object comparison stuff, but otherwise great!
Specifically, this lets us compare objects such as lists and dicts which contain numpy ndarrays (or subclasses) which are equal elementwise but are not actually the same object (such as after a deep copy).
0a7a897
to
5d693a5
Compare
5d693a5
to
860004a
Compare
This PR provides
.copy
methods forMemDataset
andMemDiskGroup
classes.MemDataset.copy
can optionally do a deep or shallow copy (deep is default) and set the memory layout of the underlying data (if it is a numpy array). By default, this is the same layout as the dataset being copied.MemDiskGroup.copy
deep copies all datasets by default and provides ashared
argument. This is implemented via thedeep_group_copy
method, which is extended to optionally deep copy datasets (except those that are shared) and copy distributed arrays only in the memory -> memory case. I made considerations based on discussion in #190.This PR also provides a a tool to check two objects which may contain numpy arrays for equality (ex: a list of arrays, dict with array arguments, etc...). For example, deep copying a list which contains a numpy array would result in an error when comparing the dicts, as the numpy arrays would not be the same object and would fall back on elementwise comparison. Since
list.__eq__
won't callall()
on the numpy array after elementwise comparison, the comparison would fail. The same thing happens if the outer container is a dict.Finally, the PR adds a check for matching
dtype
when comparing twoMemDatasets
. I'm undecided if this is a good idea or not.NOTE: I've done a fair bit of testing and I think this is tentatively ready for review. I do still want to try to break it a bit more thoroughly. I've tagged a few possibly relevant people for review but it's fine if this waits for awhile