Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backport fix for #2878 (pickle DCD and XDR readers at correct frame) #2908

Closed
wants to merge 6 commits into from

Conversation

orbeckst
Copy link
Member

Fixes #2878 (backport)

Changes made in this Pull Request:

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

yuxuanzhuang and others added 2 commits August 12, 2020 17:47
- fix issue #2878
- add tests to the base tests (but xfail for any format that does not
  implement pickling yet)
@orbeckst
Copy link
Member Author

@yuxuanzhuang can you please have a look? The test test_pickle_next_ts_reader fails for DCD and XTC/TRR and I don't understand why. Am I missing another change somewhere?

pytest --disable-pytest-warnings testsuite/MDAnalysisTests/coordinates/test_{dcd,xdr}.py

platform darwin -- Python 3.7.6, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: ~/mdanalysis/testsuite, inifile: setup.cfg
plugins: hypothesis-5.19.0
collected 288 items

testsuite/MDAnalysisTests/coordinates/test_dcd.py ......................................F..... [ 15%]
.......................................................................                        [ 39%]
testsuite/MDAnalysisTests/coordinates/test_xdr.py ............................................ [ 55%]
.................................................F............................................ [ 87%]
....F..............................                                                            [100%]

============================================== FAILURES ==============================================
______________________________ TestDCDReader.test_pickle_next_ts_reader ______________________________

self = <MDAnalysisTests.coordinates.test_dcd.TestDCDReader object at 0x118ec4bd0>
reader = <DCDReader ~/mdanalysis/testsuite/MDAnalysisTests/data/coordinates/test.dcd with 5 frames of 5 atoms>

    def test_pickle_next_ts_reader(self, reader):
        try:
            reader_p = pickle.loads(pickle.dumps(reader))
        except (TypeError, ValueError):
            pytest.xfail("Pickling not yet implemented for format {}".format(
                reader.format))
        assert_equal(next(reader), next(reader_p),
>                    "Next timestep is changed after pickling")
E       AssertionError:
E       Items are not equal: Next timestep is changed after pickling
E        ACTUAL: < Timestep 1 with unit cell dimensions [82.1 83.2 84.3 75.1 80.1 85.1] >
E        DESIRED: < Timestep 1 with unit cell dimensions [81.1 82.2 83.3 75.  80.  85. ] >

testsuite/MDAnalysisTests/coordinates/base.py:513: AssertionError
_____________________________ TestXTCReader_2.test_pickle_next_ts_reader _____________________________

self = <MDAnalysisTests.coordinates.test_xdr.TestXTCReader_2 object at 0x10c9f1290>
reader = <XTCReader ~/mdanalysis/testsuite/MDAnalysisTests/data/coordinates/test.xtc with 5 frames of 5 atoms>

    def test_pickle_next_ts_reader(self, reader):
        try:
            reader_p = pickle.loads(pickle.dumps(reader))
        except (TypeError, ValueError):
            pytest.xfail("Pickling not yet implemented for format {}".format(
                reader.format))
        assert_equal(next(reader), next(reader_p),
>                    "Next timestep is changed after pickling")
E       AssertionError:
E       Items are not equal: Next timestep is changed after pickling
E        ACTUAL: < Timestep 1 with unit cell dimensions [82.1 83.2 84.3 75.1 80.1 85.1] >
E        DESIRED: < Timestep 1 with unit cell dimensions [81.1      82.200005 83.30001  75.       80.       85.      ] >

testsuite/MDAnalysisTests/coordinates/base.py:513: AssertionError
_____________________________ TestTRRReader_2.test_pickle_next_ts_reader _____________________________

self = <MDAnalysisTests.coordinates.test_xdr.TestTRRReader_2 object at 0x11b4c9b10>
reader = <TRRReader ~/mdanalysis/testsuite/MDAnalysisTests/data/coordinates/test.trr with 5 frames of 5 atoms>

    def test_pickle_next_ts_reader(self, reader):
        try:
            reader_p = pickle.loads(pickle.dumps(reader))
        except (TypeError, ValueError):
            pytest.xfail("Pickling not yet implemented for format {}".format(
                reader.format))
        assert_equal(next(reader), next(reader_p),
>                    "Next timestep is changed after pickling")
E       AssertionError:
E       Items are not equal: Next timestep is changed after pickling
E        ACTUAL: < Timestep 1 with unit cell dimensions [82.1 83.2 84.3 75.1 80.1 85.1] >
E        DESIRED: < Timestep 1 with unit cell dimensions [81.1      82.200005 83.30001  75.       80.       85.      ] >

testsuite/MDAnalysisTests/coordinates/base.py:513: AssertionError
====================================== short test summary info =======================================
FAILED testsuite/MDAnalysisTests/coordinates/test_dcd.py::TestDCDReader::test_pickle_next_ts_reader
FAILED testsuite/MDAnalysisTests/coordinates/test_xdr.py::TestXTCReader_2::test_pickle_next_ts_reader
FAILED testsuite/MDAnalysisTests/coordinates/test_xdr.py::TestTRRReader_2::test_pickle_next_ts_reader
============================ 3 failed, 285 passed, 957 warnings in 23.42s ============================

@orbeckst orbeckst mentioned this pull request Aug 13, 2020
4 tasks
@yuxuanzhuang
Copy link
Contributor

I think the __setstate__ for ProtoReader also has to be included to make next() right:

    def __setstate__(self, state):
        self.__dict__ = state
        self[self.ts.frame]

@orbeckst
Copy link
Member Author

So if I understand correctly then I should not include the tests that check that a Reader can be pickled because that's really the new feature in 2.0.0. (For instance, I also didn't backport any __set/getstate__ code in base.Timestep.) I should just test that the low-level DCD and XTC/TRR file objects can pickle themselves correctly.

@yuxuanzhuang
Copy link
Contributor

I think so, maybe something similar to what was tested here ad48254#diff-7674f3848f653d737235c4f81a267c11R86

Besides, maybe it's a bit tricky to get to the last frame with only DCD/XDR file because you cannot do dcd.seek(len(dcd)) but:

    dcd.seek(len(dcd) - 1)
    _ = dcd.read()  #  read next frame 
    dump = pickle.dumps(dcd)
    new_dcd = pickle.loads(dump)

    assert dcd.fname == new_dcd.fname
    assert dcd.tell() == new_dcd.tell()

These tests all FAIL at the moment.
all pickle tests FAIL at the moment
@pep8speaks
Copy link

pep8speaks commented Aug 13, 2020

Hello @orbeckst! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 87:1: E302 expected 2 blank lines, found 1
Line 89:34: E261 at least two spaces before inline comment
Line 96:1: E302 expected 2 blank lines, found 1
Line 102:1: E302 expected 2 blank lines, found 1
Line 115:1: E302 expected 2 blank lines, found 1
Line 120:1: E265 block comment should start with '# '
Line 121:1: E302 expected 2 blank lines, found 1

Line 164:5: E265 block comment should start with '# '

Comment last updated at 2020-08-13 19:22:53 UTC

@orbeckst
Copy link
Member Author

@yuxuanzhuang I added additional testing for DCDFile pickling and XTC/TRRFile pickling and I find that even though reader.tell() agrees after pickling, the contents of the frames do not.

Run

pytest --disable-pytest-warnings testsuite/MDAnalysisTests/formats/test_lib{dcd,mdaxdr}.py

to see the failing tests.

Furthermore, if I pickle a DCDFile before seeking then I cannot unpickle it (test test_pickle_immediately).

Did you test the low-level pickling of the file classes or only the Readers? Any insights?

(Also ping @kain88-de just in case he wants to take a look.)

@yuxuanzhuang
Copy link
Contributor

yuxuanzhuang commented Aug 13, 2020

Sorry I am a bit confused, isn't the only error here is that AttributeError: 'TRRFrame' object has no attribute 'prec' (my test is the same as in Travis https://travis-ci.com/github/MDAnalysis/mdanalysis/jobs/371817789)

Edit: Now I get the errors after I rerun build_ext. I will have a look.

@orbeckst
Copy link
Member Author

oops, the prec is my error – but I never saw it because my tests failed earlier. fixed.

Copy link
Member Author

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments on code for additional rationale/explanations.

@@ -265,7 +265,8 @@ cdef class DCDFile:
return

current_frame = state[1]
self.seek(current_frame)
self.seek(current_frame - 1)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these lines make the low-level tests fail?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah...it messed up with the position of the file.

The fix is ditching current changes, and changing line 410 to:

        if frame >= self.n_frames + 1:

Besides, I don't think current tests cover for the "true" last frame i.e.:

        traj = DCDReader(DCD)
        traj[-1]  #  move to last frame
        dcd = traj._file
        dcd.tell() == len(dcd)  #  instead of len(dcd) - 1

Comment on lines +309 to +310
self.seek(current_frame - 1)
self.current_frame = current_frame
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these lines make the low-level tests fail?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current tests can pass if these changes are ditched, but still the last frame cannot be easily pickled as seek here is sort of different from that in DCD files.

Comment on lines +93 to +94
assert_almost_equal(frame.xyz, new_frame.xyz)
assert_almost_equal(frame.unitcell, new_frame.unitcell)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added tests to check the contents of the current frame


assert dcd.fname == new_dcd.fname
assert dcd.tell() == new_dcd.tell()
def test_pickle(dcd):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make the generic pickle test start somewhere in the middle of the trajectory


def test_pickle_last(dcd):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

specific test for last frame (this was the original test_pickle test)

Comment on lines +121 to +124
def test_pickle_immediately(dcd):
# do not seek before pickling: this seems to leave the DCDFile
# object in weird state: is this supposed to work?
new_dcd = pickle.loads(pickle.dumps(dcd))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pickle always fails if we have never done a seek() before attempting to pickle.

______________________________________ test_pickle_immediately _______________________________________

dcd = <MDAnalysis.lib.formats.libdcd.DCDFile object at 0x12a1e5890>

    def test_pickle_immediately(dcd):
        # do not seek before pickling: this seems to leave the DCDFile
        # object in weird state: is this supposed to work?
>       new_dcd = pickle.loads(pickle.dumps(dcd))

testsuite/MDAnalysisTests/formats/test_libdcd.py:124:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
MDAnalysis/lib/formats/libdcd.pyx:268: in MDAnalysis.lib.formats.libdcd.DCDFile.__setstate__
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   OSError: DCD seek failed with DCD error=Normal EOF

MDAnalysis/lib/formats/libdcd.pyx:422: OSError

I find this behavior surprising (as a user) but maybe I don't understand if this is expected. But in that case we should raise an appropriate exception.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks, then that's a bug.

Copy link
Member Author

@orbeckst orbeckst Aug 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except that it is only a bug in this PR... because it works with 1.0.0 and current develop

In [1]: import MDAnalysis as mda
In [2]: mda.__version__
Out[2]: '1.0.0'
In [3]: import MDAnalysis as mda; from MDAnalysis.tests import datafiles as data

For dcd

In [13]: dcd = mda.lib.formats.libdcd.DCDFile(data.COORDINATES_DCD)
In [14]: pickle.dumps(dcd)
Out[14]: b'\x80\x03cMDAnalysis.lib.formats.libdcd\nDCDFile\nq\x00Xg\x00\x00\x00~/anaconda3/envs/mda3/lib/python3.6/site-packages/MDAnalysisTests/data/coordinates/test.dcdq\x01X\x01\x00\x00\x00rq\x02\x86q\x03Rq\x04K\x01K\x00\x86q\x05b.'
In [15]: pickle.loads(pickle.dumps(dcd))
Out[15]: <MDAnalysis.lib.formats.libdcd.DCDFile at 0x118d75678>

and also works for trr

In [8]: trr = mda.lib.formats.libmdaxdr.TRRFile(data.COORDINATES_TRR)
In [9]: pickle.dumps(trr)
Out[9]: b'\x80\x03cMDAnalysis.lib.formats.libmdaxdr\nTRRFile\nq\x00Xg\x00\x00\x00~/anaconda3/envs/mda3/lib/python3.6/site-packages/MDAnalysisTests/data/coordinates/test.trrq\x01X\x01\x00\x00\x00rq\x02\x86q\x03Rq\x04(K\x01K\x00NK\x00tq\x05b.'
In [10]: pickle.loads(pickle.dumps(trr))
Out[10]: <MDAnalysis.lib.formats.libmdaxdr.TRRFile at 0x118c0d438>

EDIT: added loading and fixed DCD example

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach I took here is flawed (so it also shouldn't work in current develop--when pickle.loading).

Copy link
Member Author

@orbeckst orbeckst Aug 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I need to check what commit I used for this:

In [13]: dcd = mda.lib.formats.libdcd.DCDFile(data.COORDINATES_DCD)

In [14]: pickle.dumps(dcd)
Out[14]: b'\x80\x03cMDAnalysis.lib.formats.libdcd\nDCDFile\nq\x00Xj\x00\x00\x00~/anaconda3/envs/mda3dev/lib/python3.7/site-packages/MDAnalysisTests/data/coordinates/test.dcdq\x01X\x01\x00\x00\x00rq\x02\x86q\x03Rq\x04K\x01K\x00\x86q\x05b.'

In [15]: mda.__version__
Out[15]: '2.0.0-dev0'

In [16]: pickle.loads(pickle.dumps(dcd))
Out[16]: <MDAnalysis.lib.formats.libdcd.DCDFile at 0x1268ffb90>

Copy link
Contributor

@yuxuanzhuang yuxuanzhuang Aug 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oddly enough, data.DCD (which is used in the testsuite) fails at pickling while data.COORDINATES_DCD fails at reading:

>>> dcd = mda.lib.formats.libdcd.DCDFile(data.COORDINATES_DCD)
>>> read_p = pickle.loads(pickle.dumps(dcd))
>>> read_p.read()
OSError: Reading DCD header failed: format of DCD file is wrong

Comment on lines +139 to +143
assert_equal(old_reader.offsets, new_reader.offsets)
assert_almost_equal(frame.x, new_frame.x)
assert_almost_equal(frame.box, new_frame.box)
assert frame.step == new_frame.step
assert_almost_equal(frame.time, new_frame.time)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added tests for contents of the frame

Comment on lines +133 to +134
frame = old_reader.read()
new_frame = new_reader.read()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

read() or next(reader) is the only way to get the current frame

@orbeckst
Copy link
Member Author

orbeckst commented Aug 13, 2020

Some representative errors from

pytest --disable-pytest-warnings testsuite/MDAnalysisTests/formats/test_lib{dcd,mdaxdr}.py

are pasted here below. The full output is attached as pytest.txt

DCD

  • can't pickle without an initial seek
  • restoring pickle ends up with the wrong frame data (but supposedly the correct frame number)
=================================== FAILURES ===================================
_________________________________ test_pickle __________________________________

dcd = <MDAnalysis.lib.formats.libdcd.DCDFile object at 0x12f5a04d0>

    def test_pickle(dcd):
        mid = len(dcd) // 2
        dcd.seek(mid)
        new_dcd = pickle.loads(pickle.dumps(dcd))
>       _assert_compare_readers(dcd, new_dcd)

testsuite/MDAnalysisTests/formats/test_libdcd.py:100: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

old_reader = <MDAnalysis.lib.formats.libdcd.DCDFile object at 0x12f5a04d0>
new_reader = <MDAnalysis.lib.formats.libdcd.DCDFile object at 0x12f5a0590>

    def _assert_compare_readers(old_reader, new_reader):
        frame = old_reader.read()     # same as next(old_reader)
        new_frame = new_reader.read() # same as next(new_reader)
    
        assert old_reader.fname == new_reader.fname
        assert old_reader.tell() == new_reader.tell()
>       assert_almost_equal(frame.xyz, new_frame.xyz)
E       AssertionError: 
E       Arrays are not almost equal to 7 decimals
E       
E       Mismatched elements: 10023 / 10023 (100%)
E       Max absolute difference: 2.6039443
E       Max relative difference: 174.84053
E        x: array([[14.55957  ,  6.7656193, -7.6886163],
E              [15.369944 ,  6.5904756, -7.066784 ],
E              [14.758957 ,  7.5787416, -8.288996 ],...
E        y: array([[14.533502 ,  7.0310464, -7.2787514],
E              [15.250823 ,  6.667893 , -6.528267 ],
E              [14.731286 ,  7.9598083, -7.628377 ],...

testsuite/MDAnalysisTests/formats/test_libdcd.py:93: AssertionError

___________________________ test_pickle_immediately ____________________________

dcd = <MDAnalysis.lib.formats.libdcd.DCDFile object at 0x12f5a0890>

    def test_pickle_immediately(dcd):
        # do not seek before pickling: this seems to leave the DCDFile
        # object in weird state: is this supposed to work?
>       new_dcd = pickle.loads(pickle.dumps(dcd))

testsuite/MDAnalysisTests/formats/test_libdcd.py:124: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
MDAnalysis/lib/formats/libdcd.pyx:268: in MDAnalysis.lib.formats.libdcd.DCDFile.__setstate__
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   OSError: DCD seek failed with DCD error=Normal EOF

MDAnalysis/lib/formats/libdcd.pyx:422: OSError

xdr (XTC/TRR)

  • can't pickle without an initial seek
  • restoring pickle ends up with the wrong frame data (but supposedly the correct frame number)
_ TestCommonAPI.test_pickle[~/mdanalysis/testsuite/MDAnalysisTests/data/xtc_test_only_10_frame_10_atoms.xtc-XTCFile] _

self = <MDAnalysisTests.formats.test_libmdaxdr.TestCommonAPI object at 0x12fbcd8d0>
reader = <MDAnalysis.lib.formats.libmdaxdr.XTCFile object at 0x12fba4150>

    def test_pickle(self, reader):
        mid = len(reader) // 2
        reader.seek(mid)
        new_reader = pickle.loads(pickle.dumps(reader))
>       self._assert_compare_readers(reader, new_reader)

testsuite/MDAnalysisTests/formats/test_libmdaxdr.py:149: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

old_reader = <MDAnalysis.lib.formats.libmdaxdr.XTCFile object at 0x12fba4150>
new_reader = <MDAnalysis.lib.formats.libmdaxdr.XTCFile object at 0x12fba4250>

    @staticmethod
    def _assert_compare_readers(old_reader, new_reader):
        frame = old_reader.read()
        new_frame = new_reader.read()
    
        assert old_reader.fname == new_reader.fname
        assert old_reader.tell() == new_reader.tell()
    
        assert_equal(old_reader.offsets, new_reader.offsets)
>       assert_almost_equal(frame.x, new_frame.x)
E       AssertionError: 
E       Arrays are not almost equal to 7 decimals
E       
E       Mismatched elements: 30 / 30 (100%)
E       Max absolute difference: 1.
E       Max relative difference: 0.25
E        x: array([[5., 5., 5.],
E              [5., 5., 5.],
E              [5., 5., 5.],...
E        y: array([[4., 4., 4.],
E              [4., 4., 4.],
E              [4., 4., 4.],...

_ TestCommonAPI.test_pickle_immediately[~/mdanalysis/testsuite/MDAnalysisTests/data/xtc_test_only_10_frame_10_atoms.xtc-XTCFile] _

self = <MDAnalysisTests.formats.test_libmdaxdr.TestCommonAPI object at 0x1280159d0>
reader = <MDAnalysis.lib.formats.libmdaxdr.XTCFile object at 0x127fdd9d0>

    def test_pickle_immediately(self, reader):
        # do not seek before pickling: this seems to leave the XDRFile
        # object in weird state: is this supposed to work?
>       new_reader = pickle.loads(pickle.dumps(reader))

testsuite/MDAnalysisTests/formats/test_libmdaxdr.py:168: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
MDAnalysis/lib/formats/libmdaxdr.pyx:309: in MDAnalysis.lib.formats.libmdaxdr._XDRFile.__setstate__
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   OSError: Can't seek to negative frame

MDAnalysis/lib/formats/libmdaxdr.pyx:332: OSError

@orbeckst orbeckst marked this pull request as draft August 13, 2020 19:26
@@ -265,7 +265,8 @@ cdef class DCDFile:
return

current_frame = state[1]
self.seek(current_frame)
self.seek(current_frame - 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.seek(current_frame - 1)
self.seek(current_frame)

@@ -265,7 +265,8 @@ cdef class DCDFile:
return

current_frame = state[1]
self.seek(current_frame)
self.seek(current_frame - 1)
self.current_frame = current_frame
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.current_frame = current_frame

@@ -265,7 +265,8 @@ cdef class DCDFile:
return

current_frame = state[1]
self.seek(current_frame)
self.seek(current_frame - 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah...it messed up with the position of the file.

The fix is ditching current changes, and changing line 410 to:

        if frame >= self.n_frames + 1:

Besides, I don't think current tests cover for the "true" last frame i.e.:

        traj = DCDReader(DCD)
        traj[-1]  #  move to last frame
        dcd = traj._file
        dcd.tell() == len(dcd)  #  instead of len(dcd) - 1

Comment on lines +309 to +310
self.seek(current_frame - 1)
self.current_frame = current_frame
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current tests can pass if these changes are ditched, but still the last frame cannot be easily pickled as seek here is sort of different from that in DCD files.

@orbeckst
Copy link
Member Author

I am trying to understand how we use frames here. The basic problem seems to be that tell() reports on the next frame when a frame has been read.

One can get to the last frame in a trajectory with the DCDFile and xdrFile readers.

DCD

>>> import MDAnalysis as mda; from MDAnalysis.tests import datafiles as data
>>> reader = mda.lib.formats.libdcd.DCDFile(data.COORDINATES_DCD)

Behavior of tell

Are we using 0- or 1 based frame numbers? The docs for DCDFile.seek say 0 based and trying to seek to len(reader) fails appropriately:

>>> reader.seek(len(reader))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "MDAnalysis/lib/formats/libdcd.pyx", line 411, in MDAnalysis.lib.formats.libdcd.DCDFile.seek
EOFError: Trying to seek over max number of frames

However, DCDFile.tell "current frame (0-based)" will always give the frame + 1:

>>> reader.seek(0)
>>> frames = [reader.tell() for frame in reader]
>>> frames
[1, 2, 3, 4, 5]

Maybe this is because at the time tell() is called, the frame has already been read and we are now at the next frame. But it is a bit odd if one wanted to do something like a time series

>>> centers = [(reader.tell(), frame.xyz.mean(axis=0)) for frame in reader]
>>> centers
[(1, array([6., 7., 8.], dtype=float32)), (2, array([12., 14., 16.], dtype=float32)), (3, array([24., 28., 32.], dtype=float32)), (4, array([48., 56., 64.], dtype=float32)), (5, array([ 96., 112., 128.], dtype=float32))]

where we are now associating the wrong frame index with each observable.

If we want to go back to a specific frame (here, frame 2) and we naively use tell after we read the frame (assuming that tell gives us the "current frame", i.e., the one we have been working with) then we will seek to the next frame instead:

>>> reader.seek(2)
>>> frame_2 = reader.read()
>>> t_2 = reader.tell()      # this is NOT frame 2 but frame 3 already
>>> t_2
3
>>> reader.seek(t_2)
>>> reader.tell()
3
>>> reader.seek(t_2 - 1)  # need to remember to go one back...
>>> reader.tell()
2

last frame

One can get to the last frame

>>> reader.seek(len(reader)-1)
>>> reader.read()
DCDFrame(xyz=array([[  0.,  16.,  32.],
       [ 48.,  64.,  80.],
       [ 96., 112., 128.],
       [144., 160., 176.],
       [192., 208., 224.]], dtype=float32), unitcell=array([85.09999847, 85.40000153, 86.19999695, 80.40000153, 75.40000153,
       87.30000305]))
>>>

The above is really the last frame, as seen here:

>>> for frames in reader:
...    print(frames.xyz)
...
[[ 0.  1.  2.]
 [ 3.  4.  5.]
 [ 6.  7.  8.]
 [ 9. 10. 11.]
 [12. 13. 14.]]
[[ 0.  2.  4.]
 [ 6.  8. 10.]
 [12. 14. 16.]
 [18. 20. 22.]
 [24. 26. 28.]]
[[ 0.  4.  8.]
 [12. 16. 20.]
 [24. 28. 32.]
 [36. 40. 44.]
 [48. 52. 56.]]
[[  0.   8.  16.]
 [ 24.  32.  40.]
 [ 48.  56.  64.]
 [ 72.  80.  88.]
 [ 96. 104. 112.]]
[[  0.  16.  32.]
 [ 48.  64.  80.]
 [ 96. 112. 128.]
 [144. 160. 176.]
 [192. 208. 224.]]
>>>

XDR

>>> import MDAnalysis as mda; from MDAnalysis.tests import datafiles as data
>>> reader = mda.lib.formats.libmdaxdr.TRRFile(data.COORDINATES_TRR)

tell

Same as above: tell provides the next frame

>>> reader.seek(0)
>>> frames = [reader.tell() for frame in reader]
>>> frames
[1, 2, 3, 4, 5]

and

>>> reader.seek(2)
>>> frame_2 = reader.read()
>>> reader.seek(2)
>>> frame_2 = reader.read()
>>> t_2 = reader.tell()      # this is NOT frame 2 but frame 3 already
>>> t_2
3
>>> reader.seek(t_2)
>>> reader.tell()
3
>>> frame_new = reader.read()
>>> frame_new.step
3
>>> frame.step
2
>>> reader.seek(t_2 - 1)
>>> frame_correct = reader.read()
>>> frame_correct.step
2

last frame

You can seek to the last frame in the trajectory by using frame_index = len(reader)-1.

>>> reader.seek(len(reader)-1)
>>> reader.read()
TRRFrame(x=array([[ 0.      ,  1.6     ,  3.2     ],
       [ 4.8     ,  6.4     ,  8.      ],
       [ 9.6     , 11.2     , 12.8     ],
       [14.400001, 16.      , 17.6     ],
       [19.2     , 20.800001, 22.4     ]], dtype=float32), v=array([[0.        , 0.16000001, 0.32000002],
       [0.48000002, 0.64000005, 0.8       ],
       [0.96000004, 1.12      , 1.2800001 ],
       [1.4399999 , 1.6       , 1.7600001 ],
       [1.9200001 , 2.08      , 2.24      ]], dtype=float32), f=array([[ 0.       ,  1.5999999,  3.1999998],
       [ 4.7999997,  6.3999996,  8.       ],
       [ 9.599999 , 11.2      , 12.799999 ],
       [14.400001 , 16.       , 17.6      ],
       [19.199999 , 20.8      , 22.4      ]], dtype=float32), box=array([[8.51      , 0.        , 0.        ],
       [0.69131476, 8.592234  , 0.        ],
       [1.4558911 , 2.0905383 , 8.350026  ]], dtype=float32), step=4, time=4.0, lmbda=1.0, hasx=True, hasv=True, hasf=True)

as demonstrated by the fact that we just got the last frame.

The TRR has the steps explicitly numbered so I know that we got the last frame:

>>> steps = [frame.step for frame in reader]
>>> steps
[0, 1, 2, 3, 4]

@yuxuanzhuang
Copy link
Contributor

yuxuanzhuang commented Aug 14, 2020

By "last frame", I mean the state of the file when trajectory is in its last frame. i.e.

>>> reader.seek(len(reader)-1)
>>> reader.read()
reader.tell() == len(reader)

which cannot be pickled before.

I think it makes sense that read() reads the next frame (similar to TextIOWrapper.readline()). I guess the problem is what we should expect from self.current_frame. Given that we cannot read the current frame context (also true for e.g. TextIOWrapper), maybe we should call it next_frame?

Or does it make more sense to make it init to -1, then we always get the 0-based current frame with tell

>>> reader.seek(0)
>>> frames = [reader.tell() for frame in reader]
>>> frames
[0, 1, 2, 3, 4]

@yuxuanzhuang
Copy link
Contributor

Maybe it's worth adding these inclusive tests to develop branch with a new PR? Also maybe we could explore a bit more on how to actually fix the "last frame" pickling issue.

@yuxuanzhuang yuxuanzhuang mentioned this pull request Aug 15, 2020
4 tasks
orbeckst pushed a commit that referenced this pull request Aug 17, 2020
- Fixes #2878
- Changes made in this Pull Request:
    - add more comprehensive tests for pickling low-level dcd and xdr files
    - fix a bug that was introduced in #2723 (wrong seeking in DCD and XDR, note: this bug was only in develop, never in released code, see discussion in PR #2908)
- update CHANGELOG
- maybe backport in #2908
@orbeckst
Copy link
Member Author

Closing; would need to port the changes/fixes from PR #2911.

@orbeckst orbeckst closed this Aug 25, 2020
PicoCentauri pushed a commit to PicoCentauri/mdanalysis that referenced this pull request Mar 30, 2021
- Fixes MDAnalysis#2878
- Changes made in this Pull Request:
    - add more comprehensive tests for pickling low-level dcd and xdr files
    - fix a bug that was introduced in MDAnalysis#2723 (wrong seeking in DCD and XDR, note: this bug was only in develop, never in released code, see discussion in PR MDAnalysis#2908)
- update CHANGELOG
- maybe backport in MDAnalysis#2908
@RMeli RMeli deleted the backport-fix-2878 branch December 23, 2023 21:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants