Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DL_POLY HISTORY may contain cells even for non-periodic simulations #3312

Merged
merged 14 commits into from
May 22, 2021
22 changes: 12 additions & 10 deletions package/CHANGELOG
Original file line number Diff line number Diff line change
Expand Up @@ -18,11 +18,13 @@ The rules for this file:
calcraven,xiki-tempula, mieczyslaw, manuel.nuno.melo, PicoCentauri,
hanatok, rmeli, aditya-kamath, tirkarthi, LeonardoBarneschi, hejamu,
biogen98, orioncohen, z3y50n, hp115, ojeda-e, thadanipaarth, HenryKobin,
1ut, sulays, ahy3nz
1ut, sulays, ahy3nz, fiszczyp

* 2.0.0

Fixes
* DL_POLY HISTORY file may contain unit cell dimensions even if there are
no periodic boundary conditions (imcom=0) (Issue #3314, PR #3312).
* New RDKitConverter cache system to fix issue #2958 (PR #2942)
* Fixed OpenMM converter documentation not showing up in the Converters
section (Issue #3262, PR #2882)
Expand Down Expand Up @@ -186,7 +188,7 @@ Changes
* Adds a `force` parameter to the RDKitConverter to allow conversion of
molecules with no hydrogen atom, and a `set_converter_cache_size` function
to change how many items are retained in the cache (PR #2942)
* `hydrogenbonds.WaterBridgeAnalysis.generate_table()` now returns the table; as previously,
* `hydrogenbonds.WaterBridgeAnalysis.generate_table()` now returns the table; as previously,
it also generates the `table` attribute but the attribute will be removed in 3.0.0.
* `hydrogenbonds.WaterBridgeAnalysis` now stores data using the
`hydrogenbonds.WaterBridgeAnalysis.results` attribute
Expand All @@ -198,7 +200,7 @@ Changes
passes keyword arguments to the underlying converter. It can also be used
as `convert_to.lowercase_pkg_name()` for tab-completion (PR #2882)
* `analysis.polymer.PersistenceLength` class now stores `lb`,
`lp` and `fit` using the `analysis.base.Results` class
`lp` and `fit` using the `analysis.base.Results` class
(Issues #3289, #3291)
* `analysis.hole2.HoleAnalysis` now stores ``sphpdbs``, ``outfiles``
and ``profiles`` in the `analysis.base.Results` class (Issues #3261, #3269)
Expand Down Expand Up @@ -282,13 +284,13 @@ Deprecations
* The `hydrogenbonds.WaterBridgeAnalysis.network` attribute is now deprecated
in favour of `hydrogenbonds.WaterBridgeAnalysis.results.network`. It will be
removed in 3.0.0 (Issue #3261, Issue #3270)
* The `analysis.Contacts.timeseries` attribute is now deprecated in favour of
* The `analysis.Contacts.timeseries` attribute is now deprecated in favour of
`analysis.Contacts.results.timeseries`. It will be removed in 3.0.0
(Issue #3261)
* In 3.0.0 the ParmEd classes will only be accessible from the
`MDAnalysis.converters` module.
* The `analysis.polymer.PersistenceLength.lb`,
`analysis.polymer.PersistenceLength.lp` and
`analysis.polymer.PersistenceLength.lp` and
`analysis.polymer.PersistenceLength.fit` attributes are now deprecated in
favour of `analysis.polymer.PersistenceLength.results.lb`,
`analysis.polymer.PersistenceLength.results.lp` and
Expand Down Expand Up @@ -371,15 +373,15 @@ Deprecations
replaced with hydrogenbonds.WaterBridgeAnalysis (#3111)
* TPRParser indexing resids from 0 by default is deprecated.
From 2.0 TPRParser will index resids from 1 by default.


Enhancements
* Added `get_connections` method to get bonds, angles, dihedrals, etc.
with or without atoms outside the group (Issues #1264, #2821, PR #3160)
* Added `tpr_resid_from_one` keyword to select whether TPRParser
indexes resids from 0 or 1 (Issue #2364, PR #3153)


01/17/21 richardjgowers, IAlibay, orbeckst, tylerjereddy, jbarnoud,
yuxuanzhuang, lilyminium, VOD555, p-j-smith, bieniekmateusz,
calcraven, ianmkenney, rcrehuet, manuel.nuno.melo, hanatok
Expand Down Expand Up @@ -465,7 +467,7 @@ Fixes
* n_components correctly selects PCA components (Issue #2623)
* Use internal residue indices instead of resids for similarity and connectivity
selections (Issues #2669 and #2672, PR #2673)
* Checks for cryo-em 1 A^3 default CRYST1 record,
* Checks for cryo-em 1 A^3 default CRYST1 record,
disabling setting of box dimensions if found (Issue #2599)
* mdamath.angles now returns np.pi instead of -np.pi in cases of
lower bound roundoff errors. (Issue #2632, PR #2634)
Expand Down Expand Up @@ -527,7 +529,7 @@ Fixes
Enhancements
* Added support for FHI-AIMS input files (PR #2705)
* vastly improved support for 32-bit Windows (PR #2696)
* Added methods to compute the root-mean-square-inner-product of subspaces
* Added methods to compute the root-mean-square-inner-product of subspaces
and the cumulative overlap of a vector in a subspace for PCA (PR #2613)
* Added .frames and .times arrays to AnalysisBase (Issue #2661)
* Added elements attribute to PDBParser (Issue #2553, #2647)
Expand Down Expand Up @@ -574,7 +576,7 @@ Enhancements

Changes
* Unused `MDAnalysis.lib.mdamath._angle` has been removed (Issue #2650)
* Refactored dihedral selections and Ramachandran.__init__ to speed up
* Refactored dihedral selections and Ramachandran.__init__ to speed up
dihedral selections for faster tests (Issue #2671, PR #2706)
* Removes the deprecated `t0`, `tf`, and `dtmax` from
:class:Waterdynamics.SurvivalProbability. Instead the `start`, `stop` and
Expand Down
42 changes: 20 additions & 22 deletions package/MDAnalysis/coordinates/DLPoly.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# -*- Mode: python; tab-width: 4; indent-tabs-mode:nil; coding:utf-8 -*-
# vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4
# vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4
#
# MDAnalysis --- https://www.mdanalysis.org
# Copyright (c) 2006-2017 The MDAnalysis Development Team and contributors
Expand Down Expand Up @@ -152,10 +152,19 @@ def __init__(self, filename, **kwargs):
# "private" file handle
self._file = util.anyopen(self.filename, 'r')
self.title = self._file.readline().strip()
self._levcfg, self._imcon, self.n_atoms = np.int64(self._file.readline().split()[:3])
header = np.int64(self._file.readline().split())
self._levcfg, self._imcon, self.n_atoms, _, _ = header
self._has_vels = True if self._levcfg > 0 else False
self._has_forces = True if self._levcfg == 2 else False

rwnd = self._file.tell()
self._file.readline()
if (len(self._file.readline().split())) == 3:
self._has_cell = True
else:
self._has_cell = False
self._file.seek(rwnd)

self.ts = self._Timestep(self.n_atoms,
velocities=self._has_vels,
forces=self._has_forces,
Expand All @@ -169,7 +178,8 @@ def _read_next_timestep(self, ts=None):
line = self._file.readline() # timestep line
if not line.startswith('timestep'):
raise IOError
if not self._imcon == 0:

if self._has_cell:
ts._unitcell[0] = self._file.readline().split()
ts._unitcell[1] = self._file.readline().split()
ts._unitcell[2] = self._file.readline().split()
Expand All @@ -193,11 +203,12 @@ def _read_next_timestep(self, ts=None):
ts._velocities[i] = self._file.readline().split()
if self._has_forces:
ts._forces[i] = self._file.readline().split()
i += 1

if ids:
ids = np.array(ids)
# if ids aren't strictly sequential
if not all(ids == (np.arange(self.n_atoms) + 1)):
if not np.all(ids == (np.arange(self.n_atoms) + 1)):
order = np.argsort(ids)
ts._pos[:] = ts._pos[order]
if self._has_vels:
Expand All @@ -216,31 +227,17 @@ def _read_frame(self, frame):

@property
def n_frames(self):
try:
return self._n_frames
except AttributeError:
self._n_frames = self._read_n_frames()
return self._n_frames

def _read_n_frames(self):
"""Read the number of frames, and the offset for each frame

offset[i] - returns the offset in bytes to seek into the file to be
just before the frame starts
"""
offsets = self._offsets = []
# Second line is traj_key, imcom, n_atoms, n_frames, n_records
offsets = []

with open(self.filename, 'r') as f:
Comment on lines +233 to 236
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for self._n_frames was to cache the number of frames so it's only calculated once. This needs to be added back in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, a question about the expected behaviour if a wrong HISTORY file is passed. DL_POLY HISTORY files have the n_frames added to the header line on top, so from the test files DLP_HISTORY_minimal:

DL_POLY: Minimal example of HISTORY
         0         0       3                    3                 2606

it can be parsed as:
trajectory key: 0 - only coordinates saved
imcom: 0 - non periodic
n_atoms: 3 - three atoms
n_frames: 3 - three frames
n_records: 2606 - number of records (typically lines, wrong here).

The HistoryReader() takes n_atoms from there. I also made it take n_frames from there, but it might be that someone just truncated the trajectory file and the information might be wrong. Should we instead just go through the file updating the number of frames as it reads? That is:

n_frames = len(offsets).

I think it might be more bullet proof, as I can totally imagine when someone would remove some last/first 50 frames without updating a header, but not touch the number of atoms at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re caching, I don't think the HistoryReader() was doing that but it makes sense to add, thanks!

n_frames = 0

f.readline()
f.readline()
n_frames = int(f.readline().split()[3])
position = f.tell()
line = f.readline()
while line.startswith('timestep'):
offsets.append(position)
n_frames += 1
if not self._imcon == 0: # box info
if self._has_cell:
f.readline()
f.readline()
f.readline()
Expand All @@ -254,6 +251,7 @@ def _read_n_frames(self):
position = f.tell()
line = f.readline()

self._offsets = offsets
return n_frames

def _reopen(self):
Expand Down
10 changes: 4 additions & 6 deletions package/MDAnalysis/topology/DLPolyParser.py
Original file line number Diff line number Diff line change
Expand Up @@ -129,16 +129,14 @@ def parse(self, **kwargs):
with openany(self.filename) as inf:
inf.readline()
levcfg, imcon, megatm = np.int64(inf.readline().split()[:3])
inf.readline()
if not imcon == 0:
inf.readline()
inf.readline()
inf.readline()

names = []
ids = []

line = inf.readline()
while not len(line.split()) == 5:
line = inf.readline()
Comment on lines +137 to +138
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this has the ability to infinitely loop if a (non HISTORY) file never has 5 tokens on a line. I think once we reach end-of-file readline() is going to return an empty string (i.e. not even a newline char), can we add a check for that and raise an error if we reach the end of file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It now raises EOFError, thanks!


while line and not line.startswith('timestep'):
name = line[:8].strip()
names.append(name)
Expand Down Expand Up @@ -169,7 +167,7 @@ def parse(self, **kwargs):

atomtypes = guessers.guess_types(names)
masses = guessers.guess_masses(atomtypes)

attrs = [
Atomnames(names),
Atomids(ids),
Expand Down