concatenating ResidueGroups #532

tylerjereddy · 2015-11-11T15:01:27Z

In a topology building context with two matching ResidueGroup selections, when you attempt to concatenate the ResidueGroups and write them to file, strange things happen (see below). This is probably related to the atom indices matching between the two groups despite changes to the resids. If you do this naively, you will restore the changed coordinates in the output file even though the concatenated group shows different coordinate contents (which also seem to be wrong in this example!).

Although one can simply state that MDAnalysis.Merge() should be used in these cases, is it really sensible for this kind of behaviour to happen silently? If building topology in this manner is really discouraged, maybe some kind of warning would be useful when concatenating groups with matching atom indices? Also, the docs seem to suggest that indices is a read only property, so I'm guessing a quick manual setting, even for an experienced user, might be tricky? Maybe one could go down to the atom.index level directly though.

import MDAnalysis
from MDAnalysis.tests.datafiles import GRO
import numpy as np

u = MDAnalysis.Universe(GRO)
u2 = MDAnalysis.Universe(GRO)

residues_1 = u.select_atoms('all').residues[:10]
residues_2 = u2.select_atoms('all').residues[:10]

residues_1.atoms
#<AtomGroup with 157 atoms>

residues_2.atoms
#<AtomGroup with 157 atoms>

residues_1.resids, residues_2.resids
#(array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10]),
#array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10]))

residues_2[0].set_positions((np.zeros(3)))
residues_2[0].set_resids(999)
residues_2[0].id = 999
residues_2[0]
#<Residue MET, 999>
residues_2[0].coordinates()
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]], dtype=float32)

combined_residues = residues_1 + residues_2
combined_residues
#<AtomGroup with 314 atoms>
combined_residues.residues.resids
#array([  1,   2,   3,   4,   5,   6,   7,   8,   9,  10, 999,   2,   3,
#         4,   5,   6,   7,   8,   9,  10])
combined_residues.residues[10].coordinates() #what's going on with the last few rows here?
array([[  0.        ,   0.        ,   0.        ],
       [  0.        ,   0.        ,   0.        ],
       [  0.        ,   0.        ,   0.        ],
       [  0.        ,   0.        ,   0.        ],
       [  0.        ,   0.        ,   0.        ],
       [  0.        ,   0.        ,   0.        ],
       [  0.        ,   0.        ,   0.        ],
       [  0.        ,   0.        ,   0.        ],
       [  0.        ,   0.        ,   0.        ],
       [  0.        ,   0.        ,   0.        ],
       [  0.        ,   0.        ,   0.        ],
       [  0.        ,   0.        ,   0.        ],
       [  0.        ,   0.        ,   0.        ],
       [  0.        ,   0.        ,   0.        ],
       [  0.        ,   0.        ,   0.        ],
       [  0.        ,   0.        ,   0.        ],
       [  0.        ,   0.        ,   0.        ],
       [  0.        ,   0.        ,   0.        ],
       [  0.        ,   0.        ,   0.        ],
       [ 26.60000038,  64.09999847,  18.71999931],
       [ 27.29000092,  64.45000458,  18.15000153],
       [ 26.28000069,  63.33000183,  18.26000023],
       [ 26.64999962,  64.05000305,  18.59000015]], dtype=float32)

combined_residues.residues[10]
#<Residue MET, 999>

combined_residues.atoms.write('combined.gro')

!grep 999 combined.gro #neither the full set of zeros nor the weird zero + other coords above survive!

  999MET      N  158   5.202   4.356   3.155
  999MET     H1  159   5.119   4.411   3.172
  999MET     H2  160   5.155   4.283   3.104
  999MET     H3  161   5.247   4.318   3.237
  999MET     CA  162   5.306   4.421   3.075
  999MET     HA  163   5.383   4.347   3.054
  999MET     CB  164   5.257   4.474   2.941
  999MET    HB1  165   5.189   4.404   2.893
  999MET    HB2  166   5.202   4.564   2.966
  999MET     CG  167   5.371   4.511   2.845
  999MET    HG1  168   5.347   4.566   2.754
  999MET    HG2  169   5.439   4.573   2.903
  999MET     SD  170   5.464   4.368   2.784
  999MET     CE  171   5.335   4.312   2.670
  999MET    HE1  172   5.315   4.378   2.586
  999MET    HE2  173   5.359   4.210   2.638
  999MET    HE3  174   5.242   4.291   2.723
  999MET      C  175   5.374   4.532   3.155
  999MET      O  176   5.306   4.623   3.201

!head combined.gro #restored coords are from the same residue in first group
    1MET      N    1   5.202   4.356   3.155
    1MET     H1    2   5.119   4.411   3.172
    1MET     H2    3   5.155   4.283   3.104
    1MET     H3    4   5.247   4.318   3.237
    1MET     CA    5   5.306   4.421   3.075
    1MET     HA    6   5.383   4.347   3.054
    1MET     CB    7   5.257   4.474   2.941
    1MET    HB1    8   5.189   4.404   2.893

The text was updated successfully, but these errors were encountered:

richardjgowers · 2015-11-11T20:04:56Z

So I think a lot of what is going on is that there's rarely checks that two MDA objects are from the same Universe when things are used (AtomGroups added, in functions etc)

With the coordinates() example, AtomGroups will pull the Universe off the first atom, then grab all the coordinates based off the indices.

u = self.atoms[0].universe
indices = self.atoms.indices
positions = u.trajectory.ts.positions[indices]
return positions

To make this work properly, you'd need to instead do a list comprehension over the atoms, so each atom pulled the coordinate from its original Universe np.array([atom.positions for atom in self]), but that'd be horribly slow.

A good fix for this would be to add a line to AtomGroup.__add__ like if not ag1.universe is ag2.universe to stop these weird AtomGroups getting made in the first place.

orbeckst · 2015-11-12T22:56:07Z

Can we add a quick fix (along the lines of what @richardjgowers suggests) to at least prevent weird things from happening silently?

It would be worthwhile to think through what the requirements are to make system building work seamlessly (such as a checklist of features and uses that "should just work") and then we could think how the underlying data representations have to be structured to make that work. My suspicion is that this will be tangled up with #363 .

richardjgowers · 2015-11-12T23:04:24Z

#363 is going to make things like this worse, as it moves from list comprehensions of attributes (asking each Atom what his name is), to slicing one array of names.

I think the ultimate solution is to Merge first, and then work from there, and never (be able to) use mixed Universe objects.

Prevents addition of AtomGroups from different Universes

richardjgowers · 2015-11-15T21:57:15Z

So I've added the check for adding AtomGroups from different Universes. This should now not work and make lots of noise.

I think the rest of this is wontfix and/or use Merge so I'm gonna close this.

aphagstrom · 2019-07-09T20:21:05Z

What has set_positions method been changed to? This doesn't work anymore

orbeckst · 2019-07-09T20:52:55Z

@aphagstrom for general questions please use the mailing list https://groups.google.com/forum/#!forum/mdnalysis-discussion . We're happy to answer questions there.

We reserve the issue tracker for bugs and enhancements.

Thanks!

tylerjereddy added usability topology-building labels Nov 11, 2015

richardjgowers self-assigned this Nov 13, 2015

richardjgowers added a commit that referenced this issue Nov 13, 2015

Tests for Issue #532

51b109e

richardjgowers added a commit that referenced this issue Nov 13, 2015

Added fix for Issue #532

b984aa9

Prevents addition of AtomGroups from different Universes

richardjgowers added the wontfix label Nov 15, 2015

richardjgowers closed this as completed Nov 15, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

concatenating ResidueGroups #532

concatenating ResidueGroups #532

tylerjereddy commented Nov 11, 2015

richardjgowers commented Nov 11, 2015

orbeckst commented Nov 12, 2015

richardjgowers commented Nov 12, 2015

richardjgowers commented Nov 15, 2015

aphagstrom commented Jul 9, 2019

orbeckst commented Jul 9, 2019

concatenating ResidueGroups #532

concatenating ResidueGroups #532

Comments

tylerjereddy commented Nov 11, 2015

richardjgowers commented Nov 11, 2015

orbeckst commented Nov 12, 2015

richardjgowers commented Nov 12, 2015

richardjgowers commented Nov 15, 2015

aphagstrom commented Jul 9, 2019

orbeckst commented Jul 9, 2019