Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend lammpsdump to accept arbitrary columns #3608

Merged
merged 31 commits into from
Mar 4, 2024

Conversation

pstaerk
Copy link
Contributor

@pstaerk pstaerk commented Apr 6, 2022

Fixes #

Changes made in this Pull Request:

  • This updates the lammpsdump Parser to be able to parse arbitrary data columns such as charge etc.

This handles issues #3504 and addresses PR #3448.

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@pep8speaks
Copy link

pep8speaks commented Apr 6, 2022

Hello @pstaerk! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 159:80: E501 line too long (85 > 79 characters)
Line 170:80: E501 line too long (106 > 79 characters)
Line 557:80: E501 line too long (89 > 79 characters)
Line 558:80: E501 line too long (94 > 79 characters)

Comment last updated at 2024-02-29 13:52:10 UTC

@codecov
Copy link

codecov bot commented Apr 6, 2022

Codecov Report

Merging #3608 (d5e0310) into develop (95791dd) will increase coverage by 1.07%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           develop    #3608      +/-   ##
===========================================
+ Coverage    93.04%   94.12%   +1.07%     
===========================================
  Files          172      190      +18     
  Lines        22731    24675    +1944     
  Branches      3308     3327      +19     
===========================================
+ Hits         21150    23225    +2075     
- Misses        1028     1388     +360     
+ Partials       553       62     -491     
Impacted Files Coverage Δ
package/MDAnalysis/coordinates/LAMMPS.py 96.29% <100.00%> (+6.61%) ⬆️
package/MDAnalysis/lib/NeighborSearch.py 96.42% <0.00%> (-3.58%) ⬇️
package/MDAnalysis/topology/tpr/obj.py 96.96% <0.00%> (-3.04%) ⬇️
...onality_reduction/DimensionalityReductionMethod.py 97.05% <0.00%> (-2.95%) ⬇️
...sis/analysis/encore/clustering/ClusteringMethod.py 95.45% <0.00%> (-1.43%) ⬇️
package/MDAnalysis/converters/RDKitParser.py 96.21% <0.00%> (-0.57%) ⬇️
package/MDAnalysis/topology/guessers.py 99.21% <0.00%> (-0.02%) ⬇️
package/MDAnalysis/units.py 100.00% <0.00%> (ø)
package/MDAnalysis/lib/_cutil.pyx 100.00% <0.00%> (ø)
package/MDAnalysis/lib/mdamath.py 100.00% <0.00%> (ø)
... and 122 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 95791dd...d5e0310. Read the comment docs.

@codecov
Copy link

codecov bot commented Apr 6, 2022

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.38%. Comparing base (2c1aa4b) to head (bac1f4c).

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3608      +/-   ##
===========================================
- Coverage    93.38%   93.38%   -0.01%     
===========================================
  Files          171      183      +12     
  Lines        21744    22843    +1099     
  Branches      4014     4023       +9     
===========================================
+ Hits         20305    21331    +1026     
- Misses         952     1025      +73     
  Partials       487      487              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@hmacdope hmacdope self-assigned this Apr 6, 2022
Copy link
Member

@hmacdope hmacdope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really good start, see my comments. Main thing is I would add an __init__ kwarg and force people to specify what additional columns they want to save memory.

You will also need to add tests in the testsuite, feel free to add an additional small file if you need.

package/MDAnalysis/coordinates/LAMMPS.py Outdated Show resolved Hide resolved
package/MDAnalysis/coordinates/LAMMPS.py Outdated Show resolved Hide resolved
package/MDAnalysis/coordinates/LAMMPS.py Outdated Show resolved Hide resolved
package/MDAnalysis/coordinates/LAMMPS.py Outdated Show resolved Hide resolved
@hmacdope
Copy link
Member

hmacdope commented Apr 7, 2022

Also @pstaerk just keep an eye on PEP8

@aliehlen
Copy link

aliehlen commented May 2, 2022

Hello! I have been following this set of issues and PRs that is looking to update the lammps dump readers. I really like this general solution to reading in arbitrary data, and I appreciate that it can auto-detect the columns in the dump file, so that all of them can be read in if needed. Just wanted to drop in and say thanks for your work on this :)

@hmacdope
Copy link
Member

@pstaerk just ping me when you want a review.

@pstaerk pstaerk requested a review from hmacdope May 17, 2022 08:10
Copy link
Member

@hmacdope hmacdope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the great work @pstaerk,

A few things to change but its looking good overall. See my comments for details.
Additionally,

  • You will need to address the PEP8 issues and formatting
  • You will need to add some docs.
  • You will need a CHANGELOG entry
  • Don't forget to add yourself to AUTHORS also if you are not already on there.

:)

package/MDAnalysis/coordinates/LAMMPS.py Outdated Show resolved Hide resolved
package/MDAnalysis/coordinates/LAMMPS.py Outdated Show resolved Hide resolved
package/MDAnalysis/coordinates/LAMMPS.py Outdated Show resolved Hide resolved
testsuite/MDAnalysisTests/coordinates/test_lammps.py Outdated Show resolved Hide resolved
testsuite/MDAnalysisTests/coordinates/test_lammps.py Outdated Show resolved Hide resolved
testsuite/MDAnalysisTests/coordinates/test_lammps.py Outdated Show resolved Hide resolved
testsuite/MDAnalysisTests/coordinates/test_lammps.py Outdated Show resolved Hide resolved
testsuite/MDAnalysisTests/coordinates/test_lammps.py Outdated Show resolved Hide resolved
package/MDAnalysis/coordinates/LAMMPS.py Outdated Show resolved Hide resolved
@pstaerk
Copy link
Contributor Author

pstaerk commented Nov 8, 2022

@hmacdope I hope that with this, I finally have all the things done that are required for the PR :).

Copy link
Member

@hmacdope hmacdope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! Few queries and changes suggested.

Would you also be able to fix conflicts? There were changes made to the parser in #3844 and you will need to work in with those. :)

Could you please also introduce yourself on the mailing list as merging this PR will make you part of the MDAnalysis community. :)

package/MDAnalysis/coordinates/LAMMPS.py Outdated Show resolved Hide resolved
@@ -490,7 +525,7 @@ class DumpReader(base.ReaderBase):

@store_init_arguments
def __init__(self, filename, lammps_coordinate_convention="auto",
**kwargs):
additional_columns=False, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say None is more idiomatic here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I concur.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should also be solved with @hejamu work

# Create the data arrays for additional attributes which will be saved
# under ts.data
additional_keys = []
if len(attrs) > 3:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this check for >3?

@@ -425,6 +427,7 @@ def u(self, tmpdir, request):
# no conversion needed
f = LAMMPSDUMP
else:
# Select if one wants to use the additional column format
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what this comment is here for?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you address this?

def u_additional_columns(self):
f = LAMMPSDUMP_additional_columns
top = LAMMPSdata_additional_columns
yield (mda.Universe(top, f, format='LAMMPSDUMP',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just return the tuple?

testsuite/MDAnalysisTests/coordinates/test_lammps.py Outdated Show resolved Hide resolved
package/AUTHORS Outdated Show resolved Hide resolved
Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution. I haven't been able to do a full review but have a few comments/questions.

package/CHANGELOG Outdated Show resolved Hide resolved
package/MDAnalysis/coordinates/LAMMPS.py Outdated Show resolved Hide resolved
name of the data column. For instance, if you have time-dependent charges
saved in a LAMMPS dump such as

.. code-block:: python
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use python formatting for this block, just use something generic

package/MDAnalysis/coordinates/LAMMPS.py Outdated Show resolved Hide resolved
additional_columns=['q', 'l'])

The additional data is then available for each time step via
(as the value of the `data` dictionary, sorted by the ids of the atoms).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use reST role for the data attr

package/MDAnalysis/coordinates/LAMMPS.py Outdated Show resolved Hide resolved
@@ -490,7 +525,7 @@ class DumpReader(base.ReaderBase):

@store_init_arguments
def __init__(self, filename, lammps_coordinate_convention="auto",
**kwargs):
additional_columns=False, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I concur.

package/MDAnalysis/coordinates/LAMMPS.py Show resolved Hide resolved
package/MDAnalysis/coordinates/LAMMPS.py Outdated Show resolved Hide resolved
@github-actions
Copy link

github-actions bot commented Sep 29, 2023

Linter Bot Results:

Hi @pstaerk! Thanks for making this PR. We linted your code and found the following:

Some issues were found with the formatting of your code.

Code Location Outcome
main package ⚠️ Possible failure
testsuite ⚠️ Possible failure

Please have a look at the darker-main-code and darker-test-code steps here for more details: https://github.com/MDAnalysis/mdanalysis/actions/runs/8097120276/job/22127421709


Please note: The black linter is purely informational, you can safely ignore these outcomes if there are no flake8 failures!

@hejamu
Copy link
Contributor

hejamu commented Sep 29, 2023

@orbeckst I implemented the changed discussed offline. None is now the default for additional_columns and True parses all columns that are not parsable already.

@hmacdope maybe you can take another look also.

Docs still need tuning.

@hejamu
Copy link
Contributor

hejamu commented Jan 30, 2024

Pinging @hmacdope @orbeckst :)

@hmacdope
Copy link
Member

Sorry for being so slow @pstaerk, i'll have a look ASAP

@orbeckst
Copy link
Member

orbeckst commented Jan 30, 2024 via email

Copy link
Member

@hmacdope hmacdope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fantastic work @pstaerk, will be a big quality of life improvement for LAMMPS users, given how much they use arbitrary columnar data. Sorry it has taken me so long to get to. Just a few improvements to make and we should be able to go ahead.

package/MDAnalysis/coordinates/LAMMPS.py Outdated Show resolved Hide resolved
@@ -425,6 +427,7 @@ def u(self, tmpdir, request):
# no conversion needed
f = LAMMPSDUMP
else:
# Select if one wants to use the additional column format
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you address this?

package/MDAnalysis/coordinates/LAMMPS.py Show resolved Hide resolved
package/MDAnalysis/coordinates/LAMMPS.py Show resolved Hide resolved
package/MDAnalysis/coordinates/LAMMPS.py Outdated Show resolved Hide resolved
package/MDAnalysis/coordinates/LAMMPS.py Outdated Show resolved Hide resolved
package/MDAnalysis/coordinates/LAMMPS.py Show resolved Hide resolved
@pstaerk
Copy link
Contributor Author

pstaerk commented Feb 26, 2024

Ok, I hope to have addressed all the requested points @hmacdope :) .

Copy link
Member

@hmacdope hmacdope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two tiny nitpicks on formatting, but other than that this is good to go. Amazing amazing work @pstaerk, and @hejamu this will be such a fantastic boost for LAMMPS users.

@@ -156,6 +156,8 @@
"LAMMPSdata_deletedatoms", # with deleted atoms
"LAMMPSdata_triclinic", # lammpsdata file to test triclinic dimension parsing, albite with most atoms deleted
"LAMMPSdata_PairIJ", # lammps datafile with a PairIJ Coeffs section
# structure for the additional column lammpstrj
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments after data

@@ -166,6 +168,8 @@
"LAMMPSDUMP_chain1", # Lammps dump file with chain reader
"LAMMPSDUMP_chain2", # Lammps dump file with chain reader
"LAMMPS_chain", # Lammps data file with chain reader
# lammpsdump file with additional data (an additional charge column)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments after data.

testsuite/MDAnalysisTests/datafiles.py Outdated Show resolved Hide resolved
Copy link
Member

@hmacdope hmacdope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work all! I am happy for this to go ahead. 🥇

@hmacdope
Copy link
Member

Kicking CI

@hmacdope
Copy link
Member

hmacdope commented Mar 1, 2024

CI is not cooperating, trying again.

@hmacdope hmacdope merged commit 3c83b8f into MDAnalysis:develop Mar 4, 2024
23 of 24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants