Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make guessers more consistent and transparent #2348

Open
orbeckst opened this issue Sep 10, 2019 · 7 comments · May be fixed by #3753
Open

make guessers more consistent and transparent #2348

orbeckst opened this issue Sep 10, 2019 · 7 comments · May be fixed by #3753

Comments

@orbeckst
Copy link
Member

Expected behavior

  • Code should only guess when authorized by the user (see Print warning when a property is guessed  #2222 ).
  • If property cannot be guessed (atom types, bonds, masses) then a proper error message or warning should alert the user and suggest how to enable guessing.
  • Users should be able to override guesses.
  • Guessers should use the best available approach.

Actual behavior

Existing discussions

Code to reproduce the behavior

Example for bad mass guessing from atom_types

import MDAnalysis as mda
from MDAnalysis.tests import datafiles
import numpy as np

u = mda.Universe(datafiles.PDBQT_input)

not_guessed = u.atoms[u.atoms.masses == 0]
print("Missing masses: {}/{}".format(not_guessed.n_atoms, u.atoms.n_atoms))
print("Types without masses:", set(not_guessed.types))

# guessing by name finds all masses
masses_from_name = np.array([mda.topology.guessers.guess_atom_mass(name) for name in not_guessed.names])
print("Guessed masses from missing with name: {}/{}".format(sum(masses_from_name != 0), not_guessed.n_atoms))

# even guessing again from the types improves
masses_from_types_again = np.array([mda.topology.guessers.guess_atom_mass(atype) for atype in not_guessed.types])
print("Guessed masses (again) from missing with type: {}/{}".format(sum(masses_from_types_again != 0), not_guessed.n_atoms))
print("Types without masses:", set(not_guessed[masses_from_types_again == 0].types))

gives

~/anaconda3/envs/mda3/lib/python3.6/site-packages/MDAnalysis/topology/guessers.py:80: UserWarning: Failed to guess the mass for the following atom types: A
  warnings.warn("Failed to guess the mass for the following atom types: {}".format(atom_type))
~/anaconda3/envs/mda3/lib/python3.6/site-packages/MDAnalysis/topology/guessers.py:80: UserWarning: Failed to guess the mass for the following atom types: HD
  warnings.warn("Failed to guess the mass for the following atom types: {}".format(atom_type))
~/anaconda3/envs/mda3/lib/python3.6/site-packages/MDAnalysis/topology/guessers.py:80: UserWarning: Failed to guess the mass for the following atom types: OA
  warnings.warn("Failed to guess the mass for the following atom types: {}".format(atom_type))
Missing masses: 726/1805
Types without masses: {'HD', 'OA', 'A'}
Guessed masses from missing with name: 726/726
Guessed masses (again) from missing with type: 580/726
Types without masses: {'A'}

I actually don't understand why on parsing the topology, the types HD, OA, and A cannot be used for guessing the mass but later HD and OA work and only A fails:

u = mda.Universe(datafiles.PDBQT_input)
ng = np.array([mda.topology.guessers.guess_atom_mass(atype) for atype in u.atoms.types]) == 0
print("No masses for types ", set(u.atoms[ng].types))

Again, warnings as above, but then No masses for types {'A'} only.

Currently version of MDAnalysis

0.20.1

@RMeli
Copy link
Member

RMeli commented Sep 11, 2019

Related discussions:

@orbeckst
Copy link
Member Author

See also #2349

@richardjgowers
Copy link
Member

See also:

In [70]: mda.topology.guessers.guess_atom_element('Zn')                                                                                     
Out[70]: 'N'

-_-

@richardjgowers
Copy link
Member

Also my solution which maybe could make it into mda

from periodictable import elements
from fuzzywuzzy import process


_ELEMS = {elem.symbol for elem in elements
          if not elem.symbol == 'n'}
_RESULT_CACHE = {}

def to_element(thing):
    try:
        return _RESULT_CACHE[thing]
    except KeyError:
        stripped_thing = thing.strip('1234567890')
        if stripped_thing in _ELEMS:
            ret = stripped_thing
        else:
            ret, score = process.extractOne(thing, _ELEMS)
            # if score < something: throw a tantrum                                                                                         
        _RESULT_CACHE[thing] = ret
        return ret

@RMeli
Copy link
Member

RMeli commented Jan 17, 2020

To add on this issue, I think the name of guess_masses is misleading. In fact this function does not call guess_atom_mass but get_atom_mass. get_masses could be a better name?

@AnshulAngaria
Copy link

@orbeckst , I want to work on this issue. Any guidance?

@orbeckst
Copy link
Member Author

orbeckst commented Mar 8, 2020

This is a complicated issue. Start by reading everything that is linked in this issue and try to get an idea of what the problem is. If you want to work on it, don’t try to solve everything, start with a partial solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants