Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow LammpsDumpParser Topology reader more flexibility in coordinate column data. #3449

Open
hmacdope opened this issue Oct 31, 2021 · 21 comments

Comments

@hmacdope
Copy link
Member

Is your feature request related to a problem?

The LAMMPS DumpParser topology creator is somewhat inflexible in the column format it expects.
Now that we support other coordinate conventions #3360 and velocities #3448 (WIP), we should relax the requirement for an

id, type, x, y, z 

format and make it somewhat more flexible.

Describe the solution you'd like

Allow the parser more flexibility in available column data.

Describe alternatives you've considered

Additional context

@orbeckst
Copy link
Member

@hmacdope I un-assigned you from this issue, otherwise GSoC Contributors will wonder if they can pick the issue to work on.

@hmacdope
Copy link
Member Author

Okay all good 👍

@Riyabelle25
Copy link

Riyabelle25 commented Mar 28, 2022

Hello, I'm RIya Elizabeth John, an Outreachy applicant. Could you provide some more context about the LAMMPS DumpParser topology creator (perhaps it's location in the codebase), and the feature request?

@hmacdope
Copy link
Member Author

The LAMMPs parser is in this file. Currently it expects dump files produced by LAMMPs to have a very specific format, when the file format is actually very flexible and can have many different columns in different orders. The job would be to add a flexible parser to fix this issue.

@Riyabelle25
Copy link

Apologies for the late response, I had been travelling on our class trip.
Adding a parser would mean adding a separate class for the same in a separate file right?
Or should I look into adding a specific method to handle this usecase?

@hmacdope
Copy link
Member Author

hmacdope commented Apr 3, 2022

@Riyabelle25 sorry should have been clearer, you just need to change the logic in the existing class :)

@sayantan-bhattacharyya
Copy link

I want to contribute.

@Davichet-e
Copy link

Davichet-e commented Apr 14, 2022

Hey! I'm interested in contributing to this issue, I would have one question, when talking about: "we should relax the requirement for an id, type, x, y, z format and make it somewhat more flexible." You mean deleting totally/some fields of these lines?:

reqd_attrs = ['id', 'type', 'x', 'y', 'z']
missing_attrs = [attr for attr in reqd_attrs if attr not in style_dict]

or what exactly is meant by: "more flexible". Thank you so much in advance!

@aliehlen
Copy link

Hi! I am an MDAnalysis user and have just started using lammps. I would absolutely use this functionality, especially if it were general (for example, I have been needing to hack around to read forces using MDAnalysis). I don't think I could contribute to changing the parser, but let me know if I can help out in any other way (providing example dump files with different properties, testing things out).

@orbeckst
Copy link
Member

@aliehlen any kind of help is appreciated. If you are a user of LAMMPS then please say how MDAnalysis could work better for you. Issues should focus on one specific thing. You can open new issues or starting a discussion on the developer mailing list. Examples of input files (maybe a snippet in a comment for discussion) and links to format specifications that you know to be relevant are useful. For example, I know very little about LAMMPS (so I also can't say much about #3449 (comment) ) but I can judge an approach if someone clearly explains

  1. what the current situation is (what doesn't work — example)
  2. what they want to achieve (eg an example of what it should look like)
  3. how they want to solve the problem

The last point is not necessary to get a discussion started. Initially, it's more important to understand what the problem is and what a solution should provide.

@hmacdope
Copy link
Member Author

Hey! I'm interested in contributing to this issue, I would have one question, when talking about: "we should relax the requirement for an id, type, x, y, z format and make it somewhat more flexible." You mean deleting totally/some fields of these lines?:

reqd_attrs = ['id', 'type', 'x', 'y', 'z']
missing_attrs = [attr for attr in reqd_attrs if attr not in style_dict]

or what exactly is meant by: "more flexible". Thank you so much in advance!

It means being able to parse data under more columns than "Id type ..." and also have them possibly out of order. 😀

@amirhs1
Copy link

amirhs1 commented Apr 16, 2022

Hey! I wanna work on this issue, but I got confused, so I decided to ask a few questions before moving forward:

  • The DATAParser class is limit in reading different styles Atoms section; it only read full and atomic styles. My idea is to modify this class so it can read all different styles as outlined in a table in the Atoms section in read_data documentation. Is this improvement relevant to this issue?
  • Is the goal to improve the LammpsDumpParser class so it can work with a variety of columns? If yes, I am thinking of defining a subset of valid attributes as defined in dump documentation? My rationale to select a subset not the whole set of attributes is to compatible with MDAnalysis; for instance, I am not sure whether attributes like proc (the number of processors) or mux (the orientation of dipole moment of an atom along x axis) can be used in MDAnalysis or not. What are your thoughts?

@hmacdope
Copy link
Member Author

@amirhs1 for the DataParser, you can provide the atom_style keyword to match the variety of columns and the LAMMPs output style you chose.

The aim of this issue is to do something similar for the DumpParser. Have a read of the source code in https://github.com/MDAnalysis/mdanalysis/blob/develop/package/MDAnalysis/topology/LAMMPSParser.py which already has a lot of the functionality you will need.

@amirhs1
Copy link

amirhs1 commented Apr 17, 2022

I created a branch called "issue3449-LammpsDumpParser" from "develop" branch.

@hmacdope
Copy link
Member Author

hmacdope commented Apr 18, 2022 via email

@fortune-max
Copy link

Hello @hmacdope, I'm Fortune from the Outreachy program and am interested in contributing to this project. I think I understand the issue but I have just one question about the intended atom_style attribute for the LammpsDumpParser class.

According to the LAMMPS docs for the dump command, these are the possible attributes for atom_style

possible attributes = id, mol, proc, procp1, type, element, mass,
                      x, y, z, xs, ys, zs, xu, yu, zu,
                      xsu, ysu, zsu, ix, iy, iz,
                      vx, vy, vz, fx, fy, fz,
                      q, mux, muy, muz, mu,
                      radius, diameter, omegax, omegay, omegaz,
                      angmomx, angmomy, angmomz, tqx, tqy, tqz,
                      c_ID, c_ID[I], f_ID, f_ID[I], v_name,
                      i_name, d_name, i2_name[I], d2_name[I]

Should all be parsed and passed to their respective TopologyAttrs subclass? The DataParser class seem to clip theirs to id type resid charge x y z

@aliehlen
Copy link

aliehlen commented May 3, 2022

@aliehlen any kind of help is appreciated. If you are a user of LAMMPS then please say how MDAnalysis could work better for you. Issues should focus on one specific thing. You can open new issues or starting a discussion on the developer mailing list. Examples of input files (maybe a snippet in a comment for discussion) and links to format specifications that you know to be relevant are useful. For example, I know very little about LAMMPS (so I also can't say much about #3449 (comment) ) but I can judge an approach if someone clearly explains

1. what the current situation is (what doesn't work — example)

2. what they want to achieve (eg an example of what it _should_ look like)

3. how they want to solve the problem

The last point is not necessary to get a discussion started. Initially, it's more important to understand what the problem is and what a solution should provide.

Sorry for the delay (I was away). I would open an issue about this but it seems that there are a few open already (this one, issue #3504 ) and a few PRs to deal with them (PR #3448 , #3608 , #3647 ). So I'd just like to voice support for this one and #3504 :).

As is discussed here, lammps dump files are very flexible and customizable and it would be great to be able to read in other properties. Even though I'm not an extremely experienced user of lammps or MDAnalysis, I've already run into situations in which I want to read in properties from a dump file that can't currently be read in (I've attached an example of a dump file that has a simple custom style).

Reading through these comments, I think PR #3647 has allowed for the topology parser to accept different atom_styles. I want to point out that it also (seemly) is common to have custom dump formats, so it would be great if those can be handled by the topology parser, as well (or, at least, they shouldn't break the parser). PR #3608 deals with arbitrary columns for reading in coordinate data in a really general way---I'm not sure any of that could apply to the topology parsing, though.

Anyway, I don't know if there's a good way to help with these two specific PRs, but I can test out different files and configurations here if needed!

dump.zip

@yickysan
Copy link

Hello, I want to ask a question concerning what you meant on making the lammpsDumpParser more flexible. Does it mean changing the parser to accept different kind of atom styles such as body, bond and charge and not just the atomic style currently used .

@vdjeudjam
Copy link

vdjeudjam commented Mar 27, 2024

Hi @hmacdope, I received the invitation to submit a PR for the purpose of applying to gsoc'24. I'm interested in contributing to this issue.

Having gone through the discussions above, I understand that I need to allow more flexibility to user in the passing o their dup file. Please is this definition ok? If so I'll:

  1. Extend the atom_style argument: Instead of just a string, allow it to be a dictionary.
  2. Dictionary Structure:
  • Keys: Integers representing column indices (starting from 0).
  • Values: Strings representing the desired attribute names (e.g., "id", "type", "x", "y", "z").
  1. Update _interpret_atom_style function:
  • Modify the function to accept the dictionary as input.
  • Use the dictionary to create a mapping between column indices and attribute names.
  • This mapping can then be used during parsing to interpret data correctly.

If this approach is fine with you, please assign it to me

@vdjeudjam
Copy link

Hi @hmacdope, I received the invitation to submit a PR for the purpose of applying to gsoc'24. I'm interested in contributing to this issue.

Having gone through the discussions above, I understand that I need to allow more flexibility to user in the passing o their dup file. Please is this definition ok? If so I'll:

  1. Extend the atom_style argument: Instead of just a string, allow it to be a dictionary.
  2. Dictionary Structure:
  • Keys: Integers representing column indices (starting from 0).
  • Values: Strings representing the desired attribute names (e.g., "id", "type", "x", "y", "z").
  1. Update _interpret_atom_style function:
  • Modify the function to accept the dictionary as input.
  • Use the dictionary to create a mapping between column indices and attribute names.
  • This mapping can then be used during parsing to interpret data correctly.

If this approach is fine with you, please assign it to me

Hi @hmacdope @orbeckst any feedback on this suggestion?

@orbeckst
Copy link
Member

Hello @vdjeudjam we normally don't assign issues for GSOC or other programs. Just submit a PR that references the issue. We will review the first one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants