Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend lammpsdump to accept arbitrary columns #3608

Merged
merged 31 commits into from
Mar 4, 2024
Merged
Show file tree
Hide file tree
Changes from 30 commits
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
89c7363
Extended mdanalysis to accept other attributes as well.
Feb 7, 2022
99b7875
Able to parse arbitrary columns now.
Apr 6, 2022
36e9bb6
Tried to fix most of the pep8 problems.
May 6, 2022
3406e5b
First try at testing the additional column part.
May 6, 2022
87ed9ce
Testing multi read columns as well.
May 13, 2022
efca264
Fix the no additional columns case.
May 20, 2022
2f02395
Implemented requested changes to docs.
Jun 28, 2022
c8f61db
Implemented the requested changes to the tests.
Jun 28, 2022
140b062
Incorporated the PEP8 comments.
Jun 28, 2022
0365404
Third round of PEP...
Jun 28, 2022
138d569
PEP8 ...
Jun 28, 2022
5ab661a
Authors and changelog.
Jun 28, 2022
a785cec
Variable renaming issue.
Jun 29, 2022
ade5647
Hopefully fixed documentation.
Jun 29, 2022
603116a
Sphinx
Jun 29, 2022
bc9296c
Hopefully fixed the tests
Nov 8, 2022
d6d5426
Implement input from UGM23
hejamu Sep 29, 2023
27cb7be
refine tests
hejamu Sep 29, 2023
1e8557b
Small typo
pstaerk Feb 26, 2024
07d0c18
Removed comment
pstaerk Feb 26, 2024
5f36ff4
Addressed hmacdope's comments regarding issue link
Feb 26, 2024
e45e764
Addressed hmacdope's comments regarding file paths.
Feb 26, 2024
546dd43
Added warning if keys are not in lammpsdump file.
Feb 26, 2024
adec794
Tested the formatting error of additional_columns.
Feb 26, 2024
57b895d
Make changed lines comply with pep8
hejamu Feb 27, 2024
d054946
80 is indeed longer than 79...
hejamu Feb 27, 2024
bb568b3
Merge remote-tracking branch 'origin/develop' into extend_lammpsdump
hejamu Feb 27, 2024
9f14a8a
Fix test
hejamu Feb 27, 2024
bcb044f
Fix test
hejamu Feb 27, 2024
8ef649c
Don't format `datafiles.py`
hejamu Feb 27, 2024
bac1f4c
Added test of warning.
Feb 29, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions package/AUTHORS
Original file line number Diff line number Diff line change
Expand Up @@ -235,6 +235,7 @@ Chronological list of authors
- Jenna M. Swarthout Goddard
2024
- Aditya Keshari
- Philipp Stärk

External code
-------------
Expand Down
3 changes: 2 additions & 1 deletion package/CHANGELOG
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ The rules for this file:

-------------------------------------------------------------------------------
??/??/?? IAlibay, HeetVekariya, marinegor, lilyminium, RMeli,
ljwoods2, aditya292002
ljwoods2, aditya292002, pstaerk

* 2.8.0

Expand All @@ -30,6 +30,7 @@ Fixes
* Fix deploy action to use the correct version of the pypi upload action.

Enhancements
* Added parsing of arbitrary columns of the LAMMPS dump parser. (Issue #3504)
* Documented the r0 attribute in the `Contacts` class and added the
`n_initial_contacts` attribute, with documentation. (Issue #2604, PR #4415)

Expand Down
102 changes: 94 additions & 8 deletions package/MDAnalysis/coordinates/LAMMPS.py
Original file line number Diff line number Diff line change
Expand Up @@ -135,6 +135,7 @@
from ..topology.LAMMPSParser import DATAParser
from ..exceptions import NoDataError
from . import base
import warnings

btype_sections = {'bond':'Bonds', 'angle':'Angles',
'dihedral':'Dihedrals', 'improper':'Impropers'}
Expand Down Expand Up @@ -458,12 +459,50 @@
"""Reads the default `LAMMPS dump format
<https://docs.lammps.org/dump.html>`__

Supports coordinates in various LAMMPS coordinate conventions and both
orthogonal and triclinic simulation box dimensions (for more details see
`documentation <https://docs.lammps.org/Howto_triclinic.html>`__). In
either case, MDAnalysis will always use ``(*A*, *B*, *C*, *alpha*, *beta*,
*gamma*)`` to represent the unit cell. Lengths *A*, *B*, *C* are in the
MDAnalysis length unit (Å), and angles are in degrees.
Supports coordinates in the LAMMPS "unscaled" (x,y,z), "scaled" (xs,ys,zs),
"unwrapped" (xu,yu,zu) and "scaled_unwrapped" (xsu,ysu,zsu) coordinate
conventions (see https://docs.lammps.org/dump.html for more details).
If `lammps_coordinate_convention='auto'` (default),
one will be guessed. Guessing checks whether the coordinates fit each
convention in the order "unscaled", "scaled", "unwrapped",
"scaled_unwrapped" and whichever set of coordinates is detected first will
be used. If coordinates are given in the scaled coordinate convention
(xs,ys,zs) or scaled unwrapped coordinate convention (xsu,ysu,zsu) they
will automatically be converted from their scaled/fractional representation
to their real values.

Supports both orthogonal and triclinic simulation box dimensions (for more
details see https://docs.lammps.org/Howto_triclinic.html). In either case,
MDAnalysis will always use ``(*A*, *B*, *C*, *alpha*, *beta*, *gamma*)``
to represent the unit cell. Lengths *A*, *B*, *C* are in the MDAnalysis
length unit (Å), and angles are in degrees.

By using the keyword `additional_columns`, you can specify arbitrary data
to be read. The keyword expects a list of the names of the columns or
`True` to read all additional columns. The results are saved to
:attr:`Timestep.data`. For example, if your LAMMPS dump looks like this

.. code-block::

ITEM: ATOMS id x y z q l
1 2.84 8.17 -25 0.00258855 1.1
2 7.1 8.17 -25 6.91952e-05 1.2

Then you may parse the additional columns `q` and `l` via:

.. code-block:: python

u = mda.Universe('structure.data', 'traj.lammpsdump',
additional_columns=['q', 'l'])

The additional data is then available for each time step via:

.. code-block:: python

for ts in u.trajectory:
charges = ts.data['q'] # Access additional data, sorted by the id
ls = ts.data['l']
...

Parameters
----------
Expand Down Expand Up @@ -497,6 +536,9 @@
**kwargs
Other keyword arguments used in :class:`~MDAnalysis.coordinates.base.ReaderBase`

.. versionchanged:: 2.7.0
Reading of arbitrary, additional columns is now supported.
(Issue #3608)
.. versionchanged:: 2.4.0
Now imports velocities and forces, translates the box to the origin,
and optionally unwraps trajectories with image flags upon loading.
Expand All @@ -510,18 +552,23 @@
format = 'LAMMPSDUMP'
_conventions = ["auto", "unscaled", "scaled", "unwrapped",
"scaled_unwrapped"]

_coordtype_column_names = {
"unscaled": ["x", "y", "z"],
"scaled": ["xs", "ys", "zs"],
"unwrapped": ["xu", "yu", "zu"],
"scaled_unwrapped": ["xsu", "ysu", "zsu"]
}

_parsable_columns = ["id", "vx", "vy", "vz", "fx", "fy", "fz"]
for key in _coordtype_column_names.keys():
_parsable_columns += _coordtype_column_names[key]

@store_init_arguments
def __init__(self, filename,
def __init__(self, filename,
lammps_coordinate_convention="auto",
unwrap_images=False,
**kwargs):
additional_columns=None, **kwargs):
super(DumpReader, self).__init__(filename, **kwargs)

root, ext = os.path.splitext(self.filename)
Expand All @@ -536,6 +583,16 @@

self._unwrap = unwrap_images

if (util.iterable(additional_columns)
or additional_columns is None
or additional_columns is True):
self._additional_columns = additional_columns
hmacdope marked this conversation as resolved.
Show resolved Hide resolved
else:
raise ValueError(f"additional_columns={additional_columns} "
hmacdope marked this conversation as resolved.
Show resolved Hide resolved
"is not a valid option. Please provide an "
"iterable containing the additional"
"column headers.")

self._cache = {}

self._reopen()
Expand Down Expand Up @@ -681,6 +738,25 @@
coord_cols.extend(image_cols)

ids = "id" in attr_to_col_ix

# Create the data arrays for additional attributes which will be saved
# under ts.data
if self._additional_columns is True:
# Parse every column that is not already parsed
# elsewhere (total \ parsable)
additional_keys = set(attrs).difference(self._parsable_columns)
elif self._additional_columns:
if not all([key in attrs for key in self._additional_columns]):
warnings.warn("Some of the additional columns are not present "

Check warning on line 750 in package/MDAnalysis/coordinates/LAMMPS.py

View check run for this annotation

Codecov / codecov/patch

package/MDAnalysis/coordinates/LAMMPS.py#L750

Added line #L750 was not covered by tests
"in the file, they will be ignored")
additional_keys = \
[key for key in self._additional_columns if key in attrs]
hmacdope marked this conversation as resolved.
Show resolved Hide resolved
else:
additional_keys = []
for key in additional_keys:
ts.data[key] = np.empty(self.n_atoms)

# Parse all the atoms
for i in range(self.n_atoms):
fields = f.readline().split()
if ids:
Expand All @@ -701,12 +777,22 @@
if self._has_forces:
ts.forces[i] = [fields[dim] for dim in force_cols]

# Collect additional cols
for attribute_key in additional_keys:
ts.data[attribute_key][i] = \
fields[attr_to_col_ix[attribute_key]]

order = np.argsort(indices)
ts.positions = ts.positions[order]
if self._has_vels:
ts.velocities = ts.velocities[order]
if self._has_forces:
ts.forces = ts.forces[order]

# Also need to sort the additional keys
for attribute_key in additional_keys:
ts.data[attribute_key] = ts.data[attribute_key][order]

if (self.lammps_coordinate_convention.startswith("scaled")):
# if coordinates are given in scaled format, undo that
ts.positions = distances.transform_StoR(ts.positions,
Expand Down
13 changes: 10 additions & 3 deletions testsuite/MDAnalysisTests/coordinates/reference.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,9 +25,10 @@
from MDAnalysisTests import datafiles
from MDAnalysisTests.datafiles import (PDB_small, PDB, LAMMPSdata,
LAMMPSdata2, LAMMPSdcd2,
LAMMPSdata_mini, PSF_TRICLINIC,
DCD_TRICLINIC, PSF_NAMD_TRICLINIC,
DCD_NAMD_TRICLINIC)
LAMMPSdata_mini,
LAMMPSdata_additional_columns,
PSF_TRICLINIC, DCD_TRICLINIC,
PSF_NAMD_TRICLINIC, DCD_NAMD_TRICLINIC)


class RefAdKSmall(object):
Expand Down Expand Up @@ -227,3 +228,9 @@ class RefLAMMPSDataMini(object):
dtype=np.float32)
dimensions = np.array([60., 50., 30., 90., 90., 90.], dtype=np.float32)


class RefLAMMPSDataAdditionalColumns(object):
q = np.array([2.58855e-03, 6.91952e-05, 1.05548e-02, 4.20319e-03,
9.19172e-03, 4.79777e-03, 6.36864e-04, 5.87125e-03,
-2.18125e-03, 6.88910e-03])
p = np.array(5 * [1.1, 1.2])
57 changes: 55 additions & 2 deletions testsuite/MDAnalysisTests/coordinates/test_lammps.py
Original file line number Diff line number Diff line change
Expand Up @@ -29,16 +29,18 @@
import MDAnalysis as mda
from MDAnalysis import NoDataError

from numpy.testing import (assert_equal, assert_allclose, assert_allclose)
from numpy.testing import (assert_equal, assert_allclose)

from MDAnalysisTests import make_Universe
from MDAnalysisTests.coordinates.reference import (
RefLAMMPSData, RefLAMMPSDataMini, RefLAMMPSDataDCD,
RefLAMMPSDataAdditionalColumns
)
from MDAnalysisTests.datafiles import (
LAMMPScnt, LAMMPShyd, LAMMPSdata, LAMMPSdata_mini, LAMMPSdata_triclinic,
LAMMPSDUMP, LAMMPSDUMP_allcoords, LAMMPSDUMP_nocoords, LAMMPSDUMP_triclinic,
LAMMPSDUMP_image_vf, LAMMPS_image_vf
LAMMPSDUMP_image_vf, LAMMPS_image_vf, LAMMPSdata_additional_columns,
LAMMPSDUMP_additional_columns
)


Expand Down Expand Up @@ -511,6 +513,38 @@ def u(self, tmpdir, request):
yield mda.Universe(f, format='LAMMPSDUMP',
lammps_coordinate_convention="auto")

@pytest.fixture()
def u_additional_columns_true(self):
f = LAMMPSDUMP_additional_columns
top = LAMMPSdata_additional_columns
return mda.Universe(top, f, format='LAMMPSDUMP',
lammps_coordinate_convention="auto",
additional_columns=True)

@pytest.fixture()
def u_additional_columns_single(self):
f = LAMMPSDUMP_additional_columns
top = LAMMPSdata_additional_columns
return mda.Universe(top, f, format='LAMMPSDUMP',
lammps_coordinate_convention="auto",
additional_columns=['q'])

@pytest.fixture()
def u_additional_columns_multiple(self):
f = LAMMPSDUMP_additional_columns
top = LAMMPSdata_additional_columns
return mda.Universe(top, f, format='LAMMPSDUMP',
lammps_coordinate_convention="auto",
additional_columns=['q', 'p'])

@pytest.fixture()
def u_additional_columns_wrong_format(self):
f = LAMMPSDUMP_additional_columns
top = LAMMPSdata_additional_columns
return mda.Universe(top, f, format='LAMMPSDUMP',
lammps_coordinate_convention="auto",
additional_columns='q')

@pytest.fixture()
def reference_positions(self):
# manually copied from traj file
Expand Down Expand Up @@ -592,6 +626,25 @@ def test_atom_reordering(self, u, reference_positions):
assert_allclose(atom1.position, atom1_pos-bmin, atol=1e-5)
assert_allclose(atom13.position, atom13_pos-bmin, atol=1e-5)

@pytest.mark.parametrize("system, fields", [
('u_additional_columns_true', ['q', 'p']),
('u_additional_columns_single', ['q']),
('u_additional_columns_multiple', ['q', 'p']),
])
def test_additional_columns(self, system, fields, request):
u = request.getfixturevalue(system)
hmacdope marked this conversation as resolved.
Show resolved Hide resolved
for field in fields:
data = u.trajectory[0].data[field]
assert_allclose(data,
getattr(RefLAMMPSDataAdditionalColumns, field))

@pytest.mark.parametrize("system", [
('u_additional_columns_wrong_format'),
])
def test_wrong_format_additional_colums(self, system, request):
with pytest.raises(ValueError,
match="Please provide an iterable containing"):
request.getfixturevalue(system)

@pytest.mark.parametrize("convention",
["unscaled", "unwrapped", "scaled_unwrapped"])
Expand Down
29 changes: 29 additions & 0 deletions testsuite/MDAnalysisTests/data/lammps/additional_columns.data
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
LAMMPS data file via write_data, version 24 Mar 2022, timestep = 500

10 atoms
1 atom types

0 42.6 xlo xhi
0 44.2712 ylo yhi
-25.1 25.1 zlo zhi

Masses

1 12.011

Pair Coeffs # lj/cut/coul/long/omp

1 0.0663 3.5812

Atoms # full

1 2 1 -0.00706800004577013 2.84 8.17 -25 0 0 0
2 2 1 0.004078816788554217 7.1 8.17 -25 0 0 0
3 2 1 -0.005824512619752745 2.13 6.94 -25 0 0 0
4 2 1 0.002812345167059992 6.39 6.94 -25 0 0 0
5 2 1 -0.004070019151543417 2.84 5.71 -25 0 0 0
6 2 1 0.004796116641855679 7.1 5.71 -25 0 0 0
7 2 1 -0.003217742434809291 2.13 4.48 -25 0 0 0
8 2 1 0.0008273956785370801 6.39 4.48 -25 0 0 0
9 2 1 -0.0003942558636157474 2.84 3.25 -25 0 0 0
10 2 1 0.001288716009147968 7.1 3.25 -25 0 0 0
19 changes: 19 additions & 0 deletions testsuite/MDAnalysisTests/data/lammps/additional_columns.lammpstrj
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
10
ITEM: BOX BOUNDS pp pp ff
0.0000000000000000e+00 4.2600000000000001e+01
0.0000000000000000e+00 4.4271200000000000e+01
-2.5100000000000001e+01 2.5100000000000001e+01
ITEM: ATOMS id x y z q p
1 2.84 8.17 -25 0.00258855 1.1
2 7.1 8.17 -25 6.91952e-05 1.2
3 2.13 6.94 -25 0.0105548 1.1
4 6.39 6.94 -25 0.00420319 1.2
5 2.84 5.71 -25 0.00919172 1.1
6 7.1 5.71 -25 0.00479777 1.2
7 2.13 4.48 -25 0.000636864 1.1
8 6.39 4.48 -25 0.00587125 1.2
9 2.84 3.25 -25 -0.00218125 1.1
10 7.1 3.25 -25 0.0068891 1.2
4 changes: 4 additions & 0 deletions testsuite/MDAnalysisTests/datafiles.py
Original file line number Diff line number Diff line change
Expand Up @@ -156,6 +156,7 @@
"LAMMPSdata_deletedatoms", # with deleted atoms
"LAMMPSdata_triclinic", # lammpsdata file to test triclinic dimension parsing, albite with most atoms deleted
"LAMMPSdata_PairIJ", # lammps datafile with a PairIJ Coeffs section
"LAMMPSdata_additional_columns", # structure for the additional column lammpstrj
"LAMMPSDUMP",
"LAMMPSDUMP_long", # lammpsdump file with a few zeros sprinkled in the first column first frame
"LAMMPSDUMP_allcoords", # lammpsdump file with all coordinate conventions (x,xs,xu,xsu) present, from LAMMPS rdf example
Expand All @@ -166,6 +167,7 @@
"LAMMPSDUMP_chain1", # Lammps dump file with chain reader
"LAMMPSDUMP_chain2", # Lammps dump file with chain reader
"LAMMPS_chain", # Lammps data file with chain reader
"LAMMPSDUMP_additional_columns", # lammpsdump file with additional data (an additional charge column)
"unordered_res", # pdb file with resids non sequential
"GMS_ASYMOPT", # GAMESS C1 optimization
"GMS_SYMOPT", # GAMESS D4h optimization
Expand Down Expand Up @@ -552,6 +554,8 @@
LAMMPSDUMP_chain2 = (_data_ref / "lammps/chain_dump_2.lammpstrj").as_posix()
LAMMPS_chain = (_data_ref / "lammps/chain_initial.data").as_posix()
LAMMPSdata_many_bonds = (_data_ref / "lammps/a_lot_of_bond_types.data").as_posix()
LAMMPSdata_additional_columns = (_data_ref / "lammps/additional_columns.data").as_posix()
LAMMPSDUMP_additional_columns = (_data_ref / "lammps/additional_columns.lammpstrj").as_posix()

unordered_res = (_data_ref / "unordered_res.pdb").as_posix()

Expand Down
Loading