-
Notifications
You must be signed in to change notification settings - Fork 648
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added experimental pickling support to Universe #2140
Conversation
package/MDAnalysis/core/universe.py
Outdated
raise NotImplementedError | ||
@classmethod | ||
def _unpickle_U(cls, top, traj, anchor): | ||
u = cls(top, anchor_name=anchor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the anchor_name=
here makes sure that AtomGroups that get unpickled afterwards recognise this deserialised object as the correct parent for them
package/MDAnalysis/core/universe.py
Outdated
def __setstate__(self, state): | ||
raise NotImplementedError | ||
def __reduce__(self): | ||
return (self._unpickle_U, (self._topology, self.trajectory.filename, self.anchor_name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the entire Topology
side of things seems to pickle cleanly without any drama, so we can just do this. To serialise/deserialise the Reader
side of things we will always want a unique file handle, so passing the filename around isn't too awful.
There's probably lots of state in Reader
which isn't currently respected (auxs, current frame off the top of my head)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to be documented what does not work yet.
From the code above it seems that Aux are included but if not, then pickle should fail on anything that we cannot pickle at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out a Reader
just knows how to serialise itself (including file handles)
Codecov Report
@@ Coverage Diff @@
## develop #2140 +/- ##
===========================================
- Coverage 90.71% 89.41% -1.3%
===========================================
Files 15 158 +143
Lines 1982 19646 +17664
Branches 0 2761 +2761
===========================================
+ Hits 1798 17567 +15769
- Misses 184 1476 +1292
- Partials 0 603 +603
Continue to review full report at Codecov.
|
poke @VOD555 if this works then it makes pmda a hell of a lot simpler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
package/MDAnalysis/core/universe.py
Outdated
def __setstate__(self, state): | ||
raise NotImplementedError | ||
def __reduce__(self): | ||
return (self._unpickle_U, (self._topology, self.trajectory.filename, self.anchor_name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to be documented what does not work yet.
From the code above it seems that Aux are included but if not, then pickle should fail on anything that we cannot pickle at the moment.
@@ -283,10 +283,14 @@ def test_load_multiple_args(self): | |||
assert_equal(len(u.atoms), 3341, "Loading universe failed somehow") | |||
assert_equal(u.trajectory.n_frames, 2 * ref.trajectory.n_frames) | |||
|
|||
def test_pickle_raises_NotImplementedError(self): | |||
def test_pickle(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should also test that
- aux
- transformations
are properly pickled.
@orbeckst I'm happy to push ahead and finish this once I get some confirmation it works for pmda too, else it's just a useless fix |
0596149
to
ba1a138
Compare
@@ -738,7 +738,7 @@ def _gen_anchor_hash(self): | |||
return self._anchor_uuid | |||
except AttributeError: | |||
# store this so we can later recall it if needed | |||
self._anchor_uuid = uuid.uuid4() | |||
self._anchor_uuid = str(uuid.uuid4()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uuid's didn't seem to survive the serialisation well, but if I just take a string it's all good (and still uuidesque)
"""Special method used by __reduce__ to deserialise a Universe""" | ||
# top is a Topology object at this point, but Universe can handle that | ||
u = cls(top) | ||
u.anchor_name = anchor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the magic that lets future atomgroups find their 'home' universe
Ok looks like multiprocessing works, except for ascii format trajectories. I think this is close to working and I'd like to put it in (for a project I have brewing) so @MDAnalysis/coredevs try and break this (while I fix up ascii formats) |
@VOD555 can you have a look at this work in the context of PMDA? |
These are surprisingly few changes. |
As I said in the review "More testing and documentation" (I think you also need CHANGELOG etc). It should be clearly documented what works and what doesn't. If not everything works then label it as "experimental". And it would be nice to fail with meaningful exceptions – if possible. |
@@ -353,6 +353,16 @@ def __init__(self, filenames, skip=1, dt=None, continuous=False, **kwargs): | |||
self.ts = None | |||
self.rewind() | |||
|
|||
def __getstate__(self): | |||
state = self.__dict__.copy() | |||
state.pop('_ChainReader__chained_trajectories_iter', None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I can't really reproduce the state of this generator cleanly (or at least in a simple way). Tbh I've never used chainreader or really touched the code, so maybe I should just slap a 'NotImplementedError' on this and let someone who understands this fix it in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what you're doing is ok — like all the others, it resets to frame 0 on unpickling.
Once pickle support is upgraded to also restore the frame index then moving to the appropriate frame should just use the usual trajectory[frame]
and then let ChainReader take care of how to do it.
If you somehow managed to pickle ChainReader.readers
in
mdanalysis/package/MDAnalysis/coordinates/chain.py
Lines 265 to 266 in 6d6b862
self.readers = [core.reader(filename, **kwargs) | |
for filename in filenames] |
readers
and rebuilds them from ChainReader.filenames
(although we should actually store the original filenames
because they can be a list of tuples — that's an addition to ChainReader itself).
|
||
del self._pickle_fn | ||
|
||
class _BAsciiPickle(_AsciiPickle): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is kinda ugly, but to unpickle the file handle you need to remember the mode it was opened in. This isn't currently possible afaik
I did some hacking, all readers now pickle, to the best of my knowledge..... Lots of ugly hacks everywhere though |
(XYZ,), | ||
]) | ||
def u(request): | ||
if len(request.param) == 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just
def u(request):
return mda.Universe(*request.param)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then that would also work for ChainReader shenanigans
u(TOPOL, TRJ1, TRJ2)
u(TOPOL, [TRJ1, TRJ2])
u(TOPOL, [(TRJ1, "pdb"), (TRJ2, "xtc"), TRJ3])
@yuxuanzhuang as discussed on the GSoC call, put up your updated PR from your fork and then we can close this one as superseded. (Mention PR #2140 in the text of your PR so that they link nicely to each other.) |
Oops – sorry, you just did it 16 minutes ago – my page hadn't been refreshed. I am closing this PR now (superseded by PR #2704) |
* Fixes #2878 * basic approach: composition instead of inheritance for pickling Universe (which was tested in PR #2704 (which was derived from PR #2140)) * Changes made in this Pull Request: - add new classes and pickle_open function to picklable_file_io.py - add new picklable `BufferedReader`, `FileIO`, and `TextIOWrapper` classes. - implement `__getstate__`, `__setstate__` to `Universe` and `BaseReader` - fix DCD, XDR pickle issue - modified gsd and ncdf to be picklabel - modified ChainReader to be picklabel - ensure chemfiles is picklable * add tests (MultiFrameReader will test for serializability) * update CHANGELOG * update docs Note: This merge squashed 120 commits. See PR #2723 for the full history with 420 comments.
* Fixes MDAnalysis#2878 * basic approach: composition instead of inheritance for pickling Universe (which was tested in PR MDAnalysis#2704 (which was derived from PR MDAnalysis#2140)) * Changes made in this Pull Request: - add new classes and pickle_open function to picklable_file_io.py - add new picklable `BufferedReader`, `FileIO`, and `TextIOWrapper` classes. - implement `__getstate__`, `__setstate__` to `Universe` and `BaseReader` - fix DCD, XDR pickle issue - modified gsd and ncdf to be picklabel - modified ChainReader to be picklabel - ensure chemfiles is picklable * add tests (MultiFrameReader will test for serializability) * update CHANGELOG * update docs Note: This merge squashed 120 commits. See PR MDAnalysis#2723 for the full history with 420 comments.
This adds a basic form of pickling for a Universe, everything seems to work in the snippet below
WRT what sort of schedulers this should work for, I think this is only safe for distributed or multiprocessing, but not thread (as you can't share a Universe and play with frames)
@VOD555 I think this means you can send AtomGroups directly (no more remembering
.ix
tricks), and you don't reread the topology file (ie faster) because that half of the Universe is pickled.