Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added experimental pickling support to Universe #2140

Closed
wants to merge 8 commits into from

Conversation

richardjgowers
Copy link
Member

@richardjgowers richardjgowers commented Nov 15, 2018

This adds a basic form of pickling for a Universe, everything seems to work in the snippet below

import dask
from dask import distributed

import MDAnalysis as mda
from MDAnalysisTests.datafiles import GRO, XTC

c = distributed.Client()

u = mda.Universe(GRO, XTC)

ag = u.atoms[:5]

def get_pos(u, ag, frame):    
    u.trajectory[frame]
    
    return ag.center_of_mass()

futures = []
for i in range(10):
    futures.append(dask.delayed(get_pos)(u, ag, i))

L = c.compute(dask.delayed(futures))

ref = [get_pos(u, ag, i) for i in range(10)]

WRT what sort of schedulers this should work for, I think this is only safe for distributed or multiprocessing, but not thread (as you can't share a Universe and play with frames)

@VOD555 I think this means you can send AtomGroups directly (no more remembering .ix tricks), and you don't reread the topology file (ie faster) because that half of the Universe is pickled.

raise NotImplementedError
@classmethod
def _unpickle_U(cls, top, traj, anchor):
u = cls(top, anchor_name=anchor)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the anchor_name= here makes sure that AtomGroups that get unpickled afterwards recognise this deserialised object as the correct parent for them

def __setstate__(self, state):
raise NotImplementedError
def __reduce__(self):
return (self._unpickle_U, (self._topology, self.trajectory.filename, self.anchor_name))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the entire Topology side of things seems to pickle cleanly without any drama, so we can just do this. To serialise/deserialise the Reader side of things we will always want a unique file handle, so passing the filename around isn't too awful.

There's probably lots of state in Reader which isn't currently respected (auxs, current frame off the top of my head)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to be documented what does not work yet.

From the code above it seems that Aux are included but if not, then pickle should fail on anything that we cannot pickle at the moment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out a Reader just knows how to serialise itself (including file handles)

@codecov
Copy link

codecov bot commented Nov 21, 2018

Codecov Report

Merging #2140 into develop will decrease coverage by 1.29%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2140      +/-   ##
===========================================
- Coverage    90.71%   89.41%    -1.3%     
===========================================
  Files           15      158     +143     
  Lines         1982    19646   +17664     
  Branches         0     2761    +2761     
===========================================
+ Hits          1798    17567   +15769     
- Misses         184     1476    +1292     
- Partials         0      603     +603
Impacted Files Coverage Δ
package/MDAnalysis/core/universe.py 95.05% <100%> (ø)
package/MDAnalysis/coordinates/base.py 93.32% <100%> (ø)
package/MDAnalysis/coordinates/GRO.py 93.87% <0%> (ø)
package/MDAnalysis/analysis/helanal.py 85.19% <0%> (ø)
package/MDAnalysis/topology/MinimalParser.py 100% <0%> (ø)
package/MDAnalysis/visualization/__init__.py 100% <0%> (ø)
package/MDAnalysis/core/qcprot.py 100% <0%> (ø)
...e/MDAnalysis/analysis/encore/clustering/cluster.py 96.22% <0%> (ø)
package/MDAnalysis/coordinates/DCD.py 97.16% <0%> (ø)
... and 136 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ea05c5a...f1a7d5a. Read the comment docs.

@richardjgowers
Copy link
Member Author

poke @VOD555 if this works then it makes pmda a hell of a lot simpler

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very encouraging!! Addresses/supercedes #643 (and possibly #878 ?)

More testing & documentation.

Will also eventually need tests for #1981 #1940

def __setstate__(self, state):
raise NotImplementedError
def __reduce__(self):
return (self._unpickle_U, (self._topology, self.trajectory.filename, self.anchor_name))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to be documented what does not work yet.

From the code above it seems that Aux are included but if not, then pickle should fail on anything that we cannot pickle at the moment.

package/MDAnalysis/coordinates/base.py Outdated Show resolved Hide resolved
@@ -283,10 +283,14 @@ def test_load_multiple_args(self):
assert_equal(len(u.atoms), 3341, "Loading universe failed somehow")
assert_equal(u.trajectory.n_frames, 2 * ref.trajectory.n_frames)

def test_pickle_raises_NotImplementedError(self):
def test_pickle(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should also test that

  • aux
  • transformations

are properly pickled.

@richardjgowers
Copy link
Member Author

@orbeckst I'm happy to push ahead and finish this once I get some confirmation it works for pmda too, else it's just a useless fix

@@ -738,7 +738,7 @@ def _gen_anchor_hash(self):
return self._anchor_uuid
except AttributeError:
# store this so we can later recall it if needed
self._anchor_uuid = uuid.uuid4()
self._anchor_uuid = str(uuid.uuid4())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uuid's didn't seem to survive the serialisation well, but if I just take a string it's all good (and still uuidesque)

"""Special method used by __reduce__ to deserialise a Universe"""
# top is a Topology object at this point, but Universe can handle that
u = cls(top)
u.anchor_name = anchor
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the magic that lets future atomgroups find their 'home' universe

@richardjgowers
Copy link
Member Author

Ok looks like multiprocessing works, except for ascii format trajectories. I think this is close to working and I'd like to put it in (for a project I have brewing) so @MDAnalysis/coredevs try and break this (while I fix up ascii formats)

@orbeckst
Copy link
Member

@VOD555 can you have a look at this work in the context of PMDA?

@orbeckst
Copy link
Member

These are surprisingly few changes.

@orbeckst
Copy link
Member

As I said in the review "More testing and documentation" (I think you also need CHANGELOG etc). It should be clearly documented what works and what doesn't. If not everything works then label it as "experimental".

And it would be nice to fail with meaningful exceptions – if possible.

@@ -353,6 +353,16 @@ def __init__(self, filenames, skip=1, dt=None, continuous=False, **kwargs):
self.ts = None
self.rewind()

def __getstate__(self):
state = self.__dict__.copy()
state.pop('_ChainReader__chained_trajectories_iter', None)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I can't really reproduce the state of this generator cleanly (or at least in a simple way). Tbh I've never used chainreader or really touched the code, so maybe I should just slap a 'NotImplementedError' on this and let someone who understands this fix it in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what you're doing is ok — like all the others, it resets to frame 0 on unpickling.

Once pickle support is upgraded to also restore the frame index then moving to the appropriate frame should just use the usual trajectory[frame] and then let ChainReader take care of how to do it.

If you somehow managed to pickle ChainReader.readers in

self.readers = [core.reader(filename, **kwargs)
for filename in filenames]
then it should work. Otherwise, we need a special getstate/setstate pair that blanks readers and rebuilds them from ChainReader.filenames (although we should actually store the original filenames because they can be a list of tuples — that's an addition to ChainReader itself).


del self._pickle_fn

class _BAsciiPickle(_AsciiPickle):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kinda ugly, but to unpickle the file handle you need to remember the mode it was opened in. This isn't currently possible afaik

@richardjgowers
Copy link
Member Author

I did some hacking, all readers now pickle, to the best of my knowledge..... Lots of ugly hacks everywhere though

(XYZ,),
])
def u(request):
if len(request.param) == 1:
Copy link
Member

@orbeckst orbeckst Feb 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just

def u(request):
   return mda.Universe(*request.param)

?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then that would also work for ChainReader shenanigans

u(TOPOL, TRJ1, TRJ2)
u(TOPOL, [TRJ1, TRJ2])
u(TOPOL, [(TRJ1, "pdb"), (TRJ2, "xtc"), TRJ3])

@orbeckst
Copy link
Member

@yuxuanzhuang as discussed on the GSoC call, put up your updated PR from your fork and then we can close this one as superseded. (Mention PR #2140 in the text of your PR so that they link nicely to each other.)

@orbeckst
Copy link
Member

Oops – sorry, you just did it 16 minutes ago – my page hadn't been refreshed.

I am closing this PR now (superseded by PR #2704)

@orbeckst orbeckst closed this May 29, 2020
orbeckst pushed a commit that referenced this pull request Aug 8, 2020
* Fixes #2878
* basic approach: composition instead of inheritance for pickling Universe (which was
   tested in PR #2704 (which was derived from PR #2140))
* Changes made in this Pull Request:
    - add new classes and pickle_open function to picklable_file_io.py
    - add new picklable `BufferedReader`, `FileIO`, and `TextIOWrapper` classes.
    - implement `__getstate__`, `__setstate__` to `Universe` and `BaseReader`
    - fix DCD, XDR pickle issue
    - modified gsd and ncdf to be picklabel
    - modified ChainReader to be picklabel
    - ensure chemfiles is picklable
* add tests (MultiFrameReader will test for serializability)
* update CHANGELOG
* update docs

Note: This merge squashed 120 commits. See PR #2723 for the full history with 420 comments.
PicoCentauri pushed a commit to PicoCentauri/mdanalysis that referenced this pull request Mar 30, 2021
* Fixes MDAnalysis#2878
* basic approach: composition instead of inheritance for pickling Universe (which was
   tested in PR MDAnalysis#2704 (which was derived from PR MDAnalysis#2140))
* Changes made in this Pull Request:
    - add new classes and pickle_open function to picklable_file_io.py
    - add new picklable `BufferedReader`, `FileIO`, and `TextIOWrapper` classes.
    - implement `__getstate__`, `__setstate__` to `Universe` and `BaseReader`
    - fix DCD, XDR pickle issue
    - modified gsd and ncdf to be picklabel
    - modified ChainReader to be picklabel
    - ensure chemfiles is picklable
* add tests (MultiFrameReader will test for serializability)
* update CHANGELOG
* update docs

Note: This merge squashed 120 commits. See PR MDAnalysis#2723 for the full history with 420 comments.
@RMeli RMeli deleted the serialise branch December 23, 2023 21:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants