Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidate TopologyAttrs with consistent meanings #2362

Open
lilyminium opened this issue Oct 10, 2019 · 6 comments
Open

Consolidate TopologyAttrs with consistent meanings #2362

lilyminium opened this issue Oct 10, 2019 · 6 comments

Comments

@lilyminium
Copy link
Member

lilyminium commented Oct 10, 2019

TopologyAttrs are a bit of a mess right now, and the documentation is not up-to-date. This is a list of some inconsistencies I found.

atomiccharge

The GAMESS parser uses the TopologyAttr atomiccharge for the atomic number.

>>> from MDAnalysis.tests.datafiles import *
>>> gms = mda.Universe(GMS_ASYMOPT)
>>> gms.atoms
<AtomGroup with 6 atoms>
>>> g = gms.atoms[0]
>>> g.atomiccharge
8
>>> g.name
'O'
>>> g.type
'O'
>>> g.element
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/lily/anaconda3/envs/mdanalysis/lib/python3.7/site-packages/MDAnalysis/core/groups.py", line 3566, in __getattr__
    cls=self.__class__.__name__, attr=attr))
AttributeError: Atom has no attribute element

This is technically correct, but I think most people would expect the atomiccharge to refer to a partial atomic charge. A more informational name might be atomicnumber.

atomnum

The Desmond DMS parser uses the TopologyAttr atomnum. The Desmond User Guide (section 17.1.1) asserts that 'anum' refers to the atomic number of a particle, but this seems implausible:

>>> from MDAnalysis.tests.datafiles import DMS
>>> dms = mda.Universe(DMS)
>>> dms.atoms.names
array(['N', 'HT1', 'HT2', ..., 'C', 'OT1', 'OT2'], dtype=object)
>>> dms.atoms.atomnums
array([0, 0, 0, ..., 0, 0, 0], dtype=int32)

element vs type

A non-exhaustive search indicates that only the prmtop TOPParser implements the element TopologyAttr.

>>> prm = mda.Universe(PRM)
>>> prm.atoms[:6].names
array(['N', 'H1', 'H2', 'H3', 'CA', 'HA'], dtype=object)
>>> prm.atoms[:6].types
array(['N3', 'H', 'H', 'H', 'CT', 'HP'], dtype=object)
>>> prm.atoms[:6].elements
array(['N', 'H', 'H', 'H', 'C', 'H'], dtype=object)

I looked at formats like GAMESS and XYZ that as far as I know explicitly state element, as well as formats like PDB that either read an element column or guess it from the name. type alternately refers to elements:

>>> pdb = mda.Universe(PDB)
>>> pdb.atoms.types
array(['N', 'H', 'H', ..., 'NA', 'NA', 'NA'], dtype=object)
>>> pdb.atoms.names
array(['N', 'H1', 'H2', ..., 'NA', 'NA', 'NA'], dtype=object)

or to force field / Autodock atom types:

>>> pdbqt = mda.Universe(PDBQT_input)
>>> pdbqt.atoms.types
array(['N', 'HD', 'HD', ..., 'C', 'C', 'OA'], dtype=object)
>>> pdbqt.atoms.names
array(['N', 'HN1', 'HN2', ..., 'CA', 'C', 'O'], dtype=object)
>>> pdbqt.atoms.masses
array([14.007,  0.   ,  0.   , ..., 12.011, 12.011,  0.   ])
>>> pdbqt.atoms.elements
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/lily/anaconda3/envs/mdanalysis/lib/python3.7/site-packages/MDAnalysis/core/groups.py", line 2278, in __getattr__
    cls=self.__class__.__name__, attr=attr))
AttributeError: AtomGroup has no attribute elements

My opinion is also that the 0 mass for the Hs and O should have raised a warning.

resid vs resnum

I can't see a difference between resnum and resid.

>>> crd = mda.Universe(CRD)
>>> crd.atoms.resnums
array([  1,   1,   1, ..., 214, 214, 214])
>>> crd.atoms.resids
array([  1,   1,   1, ..., 214, 214, 214])
>>> tpr = mda.Universe(TPR)
>>> tpr.atoms.resnums
array([    0,     0,     0, ..., 11299, 11300, 11301])
>>> tpr.atoms.resids
array([    0,     0,     0, ..., 11299, 11300, 11301])

GROMACS indexes from 1 in user-readable files, as well.

chainIDs

The Desmond parser appears to be the only one to implement the chainID TopologyAttr.

>>> dms.atoms.chainIDs
array(['X', 'X', 'X', ..., 'X', 'X', 'X'], dtype=object)

Comments in the code suggest that all other parsers turn chains into segments.

tempfactors vs bfactors

These parsers have tempfactors but not bfactors:

  • PDB
  • PDBx
  • PDBQT
  • CRD

MMTFParser has bfactors but not tempfactors.

>>> mmtf.atoms[:6].bfactors
array([ 9.48, 10.88, 10.88, 10.88, 10.88,  9.48])

TPRParser has neither bfactors nor tempfactors, despite the documentation thinking it does.

This impacts on selection language. Selecting same bfactor as *selection* only works for MMTFs because selection language doesn't recognise the tempfactor attribute.

ids

I don't understand the meaning of atom IDs for file formats like XYZ. TPR atom IDs also start from 0, despite GROMACS indexing from 1.

>>> tpr = mda.Universe(TPR)
>>> tpr.atoms.ids
array([    0,     1,     2, ..., 47678, 47679, 47680])

masses

Failing to guess masses gives masses of 0. This is done atom-by-atom, so users may only check the end of an array like the one below, not realising that they do not have appropriate masses elsewhere. This seems like a bad idea.

>>> mol2.atoms.masses
array([ 0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,
        0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,
        0.   ,  0.   ,  0.   , 18.998,  0.   , 35.45 ,  0.   ,  0.   ,
        0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  1.008,  1.008,
        1.008,  1.008,  1.008,  1.008,  1.008,  1.008,  1.008,  1.008,
        1.008,  1.008,  1.008,  1.008,  1.008,  1.008,  1.008,  1.008,
        1.008])

resnames

The GSDParser has numbers as resnames. Should this be the case?

>>> gsd = mda.Universe(GSD)
>>> gsd.atoms.resnames
array([0, 1, 2, ..., 647, 647, 647], dtype=object)
>>> type(gsd.atoms[0].resname)
<class 'int'>
@orbeckst
Copy link
Member

sigh

Thanks for documenting this. It is really helpful that you're taking a global view.

@lilyminium
Copy link
Member Author

>>> gsd = mda.Universe(GSD)
>>> gsd.atoms.resnames
array([0, 1, 2, ..., 647, 647, 647], dtype=object)
>>> gsd.select_atoms("resname 0")
<AtomGroup with 0 atoms>
>>> gsd.select_atoms("resname 2")
<AtomGroup with 0 atoms>

^ A consequence of integer resnames.

@richardjgowers
Copy link
Member

richardjgowers commented Oct 27, 2019 via email

@richardjgowers
Copy link
Member

This would be cool to address in 3.0 as it "breaks" things, but it needs fixing (ping #3800)

@orbeckst
Copy link
Member

Consolidating attributes looks like a project to be put on the 3.0 roadmap as it is related to

  • Improve interoperability with chemoinformatics and simulation software.
  • Improve guessers to enable context-dependent chemical information to be easily inferred.

With PR #3753 almost done, we will have a new guesser infrastructure. We should then clean-up our attributes.

@orbeckst
Copy link
Member

And by "putting on the roadmap" I mean "We need to figure out how we will make it actually happen." — something for the next dev meeting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants