Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DL_POLY HISTORY may contain cells even for non-periodic simulations #3312

Merged
merged 14 commits into from
May 22, 2021

Conversation

fiszczyp
Copy link
Contributor

@fiszczyp fiszczyp commented May 20, 2021

Fixes #3314

DL_POLY HISTORY files can still have cell, even if the boundary conditions are switched off (imcom=0).

Changes made in this Pull Request:

  • Created a new attribute _has_cell (akin to _has_velocity, etc.) to use as a flag instead of not imcom == 0 for DL_POLY trajectory HistoryReader.
  • Changed one if not all(...) to if not np.all(...) as I think it is deprecated and was throwing errors.
  • Not so relevant for the DL_POLY topology HistoryParser, but also removed the not imcom == 0 check there and instead just checked once the line length corresponds to a correct atom entry to skip all the cell parameters.

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@fiszczyp fiszczyp changed the title Fs dlpoly history DL_POLY HISTORY may contain cells even for non-periodic simulations May 20, 2021
@codecov
Copy link

codecov bot commented May 20, 2021

Codecov Report

Merging #3312 (abe62d7) into develop (6b4eea0) will decrease coverage by 4.04%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #3312      +/-   ##
===========================================
- Coverage    93.65%   89.60%   -4.05%     
===========================================
  Files          176      167       -9     
  Lines        22841    21417    -1424     
  Branches      3197        0    -3197     
===========================================
- Hits         21391    19191    -2200     
- Misses        1399     2226     +827     
+ Partials        51        0      -51     
Impacted Files Coverage Δ
package/MDAnalysis/coordinates/DLPoly.py 98.83% <100.00%> (+0.02%) ⬆️
package/MDAnalysis/topology/DLPolyParser.py 96.55% <100.00%> (-0.04%) ⬇️
package/MDAnalysis/converters/RDKitParser.py 14.75% <0.00%> (-81.97%) ⬇️
package/MDAnalysis/converters/RDKit.py 16.85% <0.00%> (-81.23%) ⬇️
package/MDAnalysis/analysis/hole2/hole.py 14.47% <0.00%> (-60.00%) ⬇️
package/MDAnalysis/converters/OpenMM.py 38.23% <0.00%> (-58.83%) ⬇️
package/MDAnalysis/converters/OpenMMParser.py 42.85% <0.00%> (-57.15%) ⬇️
package/MDAnalysis/analysis/hole2/utils.py 25.78% <0.00%> (-50.32%) ⬇️
package/MDAnalysis/analysis/psa.py 79.14% <0.00%> (-10.14%) ⬇️
package/MDAnalysis/topology/guessers.py 93.02% <0.00%> (-6.21%) ⬇️
... and 18 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6b4eea0...abe62d7. Read the comment docs.

@fiszczyp fiszczyp marked this pull request as ready for review May 20, 2021 15:58
Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @fiszczyp thanks for the fix. To include this we'll also need a test written, you might be able to alter one of the existing test files for this.

@pep8speaks
Copy link

pep8speaks commented May 20, 2021

Hello @fiszczyp! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 149:32: E261 at least two spaces before inline comment
Line 505:80: E501 line too long (90 > 79 characters)

Comment last updated at 2021-05-22 15:55:18 UTC

Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the test, couple of comments

Comment on lines +137 to +138
while not len(line.split()) == 5:
line = inf.readline()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this has the ability to infinitely loop if a (non HISTORY) file never has 5 tokens on a line. I think once we reach end-of-file readline() is going to return an empty string (i.e. not even a newline char), can we add a check for that and raise an error if we reach the end of file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It now raises EOFError, thanks!

Comment on lines +230 to 233
# Second line is traj_key, imcom, n_atoms, n_frames, n_records
offsets = []

with open(self.filename, 'r') as f:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for self._n_frames was to cache the number of frames so it's only calculated once. This needs to be added back in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, a question about the expected behaviour if a wrong HISTORY file is passed. DL_POLY HISTORY files have the n_frames added to the header line on top, so from the test files DLP_HISTORY_minimal:

DL_POLY: Minimal example of HISTORY
         0         0       3                    3                 2606

it can be parsed as:
trajectory key: 0 - only coordinates saved
imcom: 0 - non periodic
n_atoms: 3 - three atoms
n_frames: 3 - three frames
n_records: 2606 - number of records (typically lines, wrong here).

The HistoryReader() takes n_atoms from there. I also made it take n_frames from there, but it might be that someone just truncated the trajectory file and the information might be wrong. Should we instead just go through the file updating the number of frames as it reads? That is:

n_frames = len(offsets).

I think it might be more bullet proof, as I can totally imagine when someone would remove some last/first 50 frames without updating a header, but not touch the number of atoms at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re caching, I don't think the HistoryReader() was doing that but it makes sense to add, thanks!

@richardjgowers
Copy link
Member

thanks @fiszczyp can you also add yourself to AUTHORS and we can merge this

@richardjgowers richardjgowers added this to the 2.0 milestone May 22, 2021
@fiszczyp
Copy link
Contributor Author

thanks @fiszczyp can you also add yourself to AUTHORS and we can merge this

Done, thanks!

@richardjgowers richardjgowers merged commit 3b7755b into MDAnalysis:develop May 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DL_POLY HISTORY may contain cells even for non-periodic simulations
4 participants