Skip to content

Commit

Permalink
Merge pull request #2812 from VOD555/fix_rdf_density
Browse files Browse the repository at this point in the history
* Fixes #2811
* enable hitherto undocumented kwarg density for InterRDF_s (but changed 
   meaning of True/False):
   If you used density=False since 0.19.0 to calculate a single particle density then you 
   now have to use density=True. The default remains the same: calculate the RDF and not
   the density.
* Fix normalization for InterRDF_s(..., density=True)
* add test to compare InterRDF and InterRDF_s
* add description for density option
* update CHANGELOG
  • Loading branch information
orbeckst authored Jul 12, 2020
2 parents 3c97ae0 + 9046bf5 commit 8704511
Show file tree
Hide file tree
Showing 3 changed files with 150 additions and 43 deletions.
11 changes: 7 additions & 4 deletions package/CHANGELOG
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ The rules for this file:

------------------------------------------------------------------------------
??/??/?? tylerjereddy, richardjgowers, IAlibay, hmacdope, orbeckst, cbouy,
lilyminium, daveminh, jbarnoud, yuxuanzhuang
lilyminium, daveminh, jbarnoud, yuxuanzhuang, VOD555

* 2.0.0

Expand All @@ -26,11 +26,14 @@ Fixes
* TOPParser no longer guesses elements when missing atomic number records
(Issues #2449, #2651)
* Testsuite does not any more matplotlib.use('agg') (#2191)
* In ChainReader, read_frame does not trigger change of iterating position.
* In ChainReader, read_frame does not trigger change of iterating position.
(Issue #2723, PR #2815)
* TRZReader now checks `n_atoms` on reading. (Issue #2817, PR #2820)
* TRZWriter now writes `n_atoms` to the file. (Issue #2817, PR #2820)

* rdf.InterRDF_s density keyword documented and now gives correct results for
density=True; the keyword was available since 0.19.0 but with incorrect
semantics and not documented and did not produce correct results (Issue
#2811, PR #2812)

Enhancements
* Added the RDKitParser which creates a `core.topology.Topology` object from
Expand All @@ -46,7 +49,7 @@ Enhancements
Changes
* deprecated NumPy type aliases have been replaced with their actual types
(see upstream NumPy PR 14882), and our pytest configuration will raise an
error for any violation when testing with development NumPy builds
error for any violation when testing with development NumPy builds
(`1.20.0`)
* Changes development status from Beta to Mature (Issue #2773)
* Removes deprecated MDAnalysis.analysis.hole, please use
Expand Down
158 changes: 125 additions & 33 deletions package/MDAnalysis/analysis/rdf.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,8 @@
`pair distribution functions`_ (`radial distribution functions`_ or "RDF").
The RDF :math:`g_{ab}(r)` between types of particles :math:`a` and :math:`b` is
.. _equation-gab:
.. math::
g_{ab}(r) = (N_{a} N_{b})^{-1} \sum_{i=1}^{N_a} \sum_{j=1}^{N_b}
Expand All @@ -46,6 +48,8 @@
and the average number of :math:`b` particles within radius :math:`r`
.. _equation-countab:
.. math::
N_{ab}(r) = \rho G_{ab}(r)
Expand All @@ -55,6 +59,15 @@
the first solvation shell :math:`N(r_1)` where :math:`r_1` is the position of
the first minimum in :math:`g(r)`.
In :class:`InterRDF_s`, we provide an option `density`. When `density` is
``False``, it will return the RDF :math:`g_{ab}(r)`; when `density` is
``True``, it will return the density of particle :math:`b` in a shell at
distance :math:`r` around a :math:`a` particle, which is
.. _equation-nab:
.. math::
n_{ab}(r) = \rho g_{ab}(r)
.. _`pair distribution functions`:
https://en.wikipedia.org/wiki/Pair_distribution_function
Expand All @@ -68,7 +81,7 @@
:class:`InterRDF` is a tool to calculate average radial distribution functions
between two groups of atoms. Suppose we have two AtomGroups ``A`` and
``B``. ``A`` contains atom ``A1``, ``A2``, and ``B`` contains ``B1``,
``B2``. Give ``A`` and ``B`` to class:`InterRDF`, the output will be the
``B2``. Given ``A`` and ``B`` to :class:`InterRDF`, the output will be the
average of RDFs between ``A1`` and ``B1``, ``A1`` and ``B2``, ``A2`` and
``B1``, ``A2`` and ``B2``. A typical application is to calculate the RDF of
solvent with itself or with another solute.
Expand All @@ -77,28 +90,82 @@
:members:
:inherited-members:
.. attribute:: bins
:class:`numpy.ndarray` of the centers of the `nbins` histogram
bins.
.. attribute:: edges
:class:`numpy.ndarray` of the `nbins + 1` edges of the histogram
bins.
.. attribute:: rdf
:class:`numpy.ndarray` of the :ref:`radial distribution
function<equation-gab>` values for the :attr:`bins`.
.. attribute:: count
:class:`numpy.ndarray` representing the radial histogram, i.e.,
the raw counts, for all :attr:`bins`.
Site-specific radial distribution function
------------------------------------------
:class:`InterRDF_s` calculates site-specific radial distribution
functions. Instead of two groups of atoms it takes as input a list of pairs of
AtomGroup, ``[[A, B], [C, D], ...]``. Give the same ``A`` and ``B`` to
:class:`InterRDF_s`, the output will be a list of RDFs between ``A1`` and
``B1``, ``A1`` and ``B2``, ``A2`` and ``B1``, ``A2`` and ``B2`` (and similarly
for ``C`` and ``D``). These site-specific radial distribution functions are
typically calculated if one is interested in the solvation shells of a ligand
in a binding site or the solvation of specific residues in a protein. A common
use case is to choose ``A`` and ``C`` to be AtomGroups that only contain a
single atom and ``W`` all solvent molecules: ``InterRDF_s(u, [[A, W], [B,
W]])`` will then produce the RDF of solvent around atom ``A[0]`` and around
atom ``B[0]``.
AtomGroup, ``[[A, B], [C, D], ...]``. Given the same ``A`` and ``B`` to
:class:`InterRDF_s`, the output will be a list of individual RDFs between
``A1`` and ``B1``, ``A1`` and ``B2``, ``A2`` and ``B1``, ``A2`` and ``B2`` (and
similarly for ``C`` and ``D``). These site-specific radial distribution
functions are typically calculated if one is interested in the solvation shells
of a ligand in a binding site or the solvation of specific residues in a
protein.
.. autoclass:: InterRDF_s
:members:
:inherited-members:
.. attribute:: bins
:class:`numpy.ndarray` of the centers of the `nbins` histogram
bins; all individual site-specific RDFs have the same bins.
.. attribute:: edges
:class:`numpy.ndarray` of the `nbins + 1` edges of the histogram
bins; all individual site-specific RDFs have the same bins.
.. attribute:: rdf
:class:`list` of the site-specific :ref:`radial distribution
functions<equation-gab>` or :ref:`density
functions<equation-nab>` for the :attr:`bins`. The list contains
``len(ags)`` entries. Each entry for the ``i``-th pair ``[A, B]
= ags[i]`` in `ags` is a :class:`numpy.ndarray` with shape
``(len(A), len(B))``, i.e., a stack of RDFs. For example,
``rdf[i][0, 2]`` is the RDF between atoms ``A[0]`` and ``B[2]``.
.. attribute:: count
:class:`list` of the site-specific radial histograms, i.e., the
raw counts, for all :attr:`bins`. The data have the same
structure as :attr:`rdf` except that the arrays contain the raw
counts.
.. attribute:: cdf
:class:`list` of the site-specific :ref:`cumulative
counts<equation-countab>`, for all :attr:`bins`. The data have the same
structure as :attr:`rdf` except that the arrays contain the cumulative
counts.
This attribute only exists after :meth:`get_cdf` has been run.
.. Not Implemented yet:
.. - Structure factor?
Expand All @@ -113,9 +180,17 @@


class InterRDF(AnalysisBase):
"""Intermolecular pair distribution function
r"""Intermolecular pair distribution function
The :ref:`radial distribution function<equation-gab>` is calculated by
histogramming distances between all particles in `g1` and `g2` while taking
periodic boundary conditions into account via the minimum image
convention.
The `exclusion_block` keyword may be used to exclude a set of distances
from the calculations.
InterRDF(g1, g2, nbins=75, range=(0.0, 15.0))
Results are available in the attributes :attr:`rdf` and :attr:`count`.
Arguments
---------
Expand Down Expand Up @@ -230,7 +305,7 @@ def _conclude(self):


class InterRDF_s(AnalysisBase):
"""Site-specific intermolecular pair distribution function
r"""Site-specific intermolecular pair distribution function
Arguments
---------
Expand All @@ -243,6 +318,18 @@ class InterRDF_s(AnalysisBase):
Number of bins in the histogram [75]
range : tuple or list (optional)
The size of the RDF [0.0, 15.0]
density : bool (optional)
``False``: calculate :math:`g_{ab}(r)`; ``True``: calculate
the true :ref:`single particle density<equation-nab>`
:math:`n_{ab}(r)`. The default is ``False``.
.. versionadded:: 1.0.1
This keyword was available since 0.19.0 but was not
documented. Furthermore, it had the opposite
meaning. Since 1.0.1 it is officially supported as
documented.
Example
-------
Expand All @@ -265,19 +352,20 @@ class InterRDF_s(AnalysisBase):
Results are available through the :attr:`bins` and :attr:`rdf` attributes::
plt.plot(rdf.bins, rdf.rdf[0][0][0])
plt.plot(rdf.bins, rdf.rdf[0][0, 0])
(Which plots the rdf between the first atom in ``s1`` and the first atom in
``s2``)
To generate the *cumulative distribution function* (cdf), use the
To generate the *cumulative distribution function* (cdf) in the sense of
"particles within radius :math:`r`", i.e., :math:`N_{ab}(r)`, use the
:meth:`~InterRDF_s.get_cdf` method ::
cdf = rdf.get_cdf()
Results are available through the :attr:'cdf' attribute::
Results are available through the :attr:`cdf` attribute::
plt.plot(rdf.bins, rdf.cdf[0][0][0])
plt.plot(rdf.bins, rdf.cdf[0][0, 0])
(Which plots the cdf between the first atom in ``s1`` and the first atom in
``s2``)
Expand All @@ -291,7 +379,7 @@ class InterRDF_s(AnalysisBase):
"""
def __init__(self, u, ags,
nbins=75, range=(0.0, 15.0), density=True, **kwargs):
nbins=75, range=(0.0, 15.0), density=False, **kwargs):
super(InterRDF_s, self).__init__(u.universe.trajectory, **kwargs)

# List of pairs of AtomGroups
Expand Down Expand Up @@ -325,8 +413,8 @@ def _single_frame(self):
box=self.u.dimensions)

for j, (idx1, idx2) in enumerate(pairs):
self.count[i][idx1, idx2, :] += np.histogram(dist[j],
**self.rdf_settings)[0]
self.count[i][idx1, idx2, :] += np.histogram(dist[j],
**self.rdf_settings)[0]

self.volume += self._ts.volume

Expand All @@ -342,31 +430,35 @@ def _conclude(self):

for i, (ag1, ag2) in enumerate(self.ags):
# Number of each selection
nA = len(ag1)
nB = len(ag2)
N = nA * nB
indices.append([ag1.indices, ag2.indices])

# Average number density
box_vol = self.volume / self.n_frames
density = N / box_vol
density = 1 / box_vol

if self._density:
rdf.append(self.count[i] / (density * vol * self.n_frames))
else:
rdf.append(self.count[i] / (vol * self.n_frames))
else:
rdf.append(self.count[i] / (density * vol * self.n_frames))

self.rdf = rdf
self.indices = indices

def get_cdf(self):
"""Calculate the cumulative distribution functions (CDF) for all sites.
Note that this is the actual count within a given radius, i.e.,
:math:`N(r)`.
r"""Calculate the cumulative counts for all sites.
This is the :ref:`cumulative count<equation-countab>` within a given
radius, i.e., :math:`N_{ab}(r)`.
The result is returned and also stored in the attribute
:attr:`cdf`.
Returns
-------
cdf : list
list of arrays with the same structure as :attr:`rdf`
cdf : list
list of arrays with the same structure as :attr:`rdf`
"""
# Calculate cumulative distribution function
# Empty list to restore CDF
Expand Down
24 changes: 18 additions & 6 deletions testsuite/MDAnalysisTests/analysis/test_rdf_s.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@
from numpy.testing import assert_almost_equal

import MDAnalysis as mda
from MDAnalysis.analysis.rdf import InterRDF_s
from MDAnalysis.analysis.rdf import InterRDF_s, InterRDF

from MDAnalysisTests.datafiles import GRO_MEMPROT, XTC_MEMPROT

Expand All @@ -38,16 +38,19 @@ def u():
@pytest.fixture(scope='module')
def sels(u):
s1 = u.select_atoms('name ZND and resid 289')
s2 = u.select_atoms('(name OD1 or name OD2) and resid 51 and sphzone 5.0 (resid 289)')
s2 = u.select_atoms(
'(name OD1 or name OD2) and resid 51 and sphzone 5.0 (resid 289)')
s3 = u.select_atoms('name ZND and (resid 291 or resid 292)')
s4 = u.select_atoms('(name OD1 or name OD2) and sphzone 5.0 (resid 291)')
ags = [[s1, s2], [s3, s4]]
return ags


@pytest.fixture(scope='module')
def rdf(u, sels):
return InterRDF_s(u, sels).run()


def test_nbins(u, sels):
rdf = InterRDF_s(u, sels, nbins=412).run()

Expand Down Expand Up @@ -90,15 +93,24 @@ def test_double_run(rdf):
assert len(rdf.count[0][0][1][rdf.count[0][0][1] == 5]) == 1
assert len(rdf.count[1][1][0][rdf.count[1][1][0] == 3]) == 1


def test_cdf(rdf):
rdf.get_cdf()
assert rdf.cdf[0][0][0][-1] == rdf.count[0][0][0].sum()/rdf.n_frames


@pytest.mark.parametrize("density, value", [
(True, 13275.775440503656),
(False, 0.021915460340071267)])

(None, 26551.55088100731), # default, like False (no kwarg, see below)
(False, 26551.55088100731),
(True, 0.021915460340071267)])
def test_density(u, sels, density, value):
rdf = InterRDF_s(u, sels, density=density).run()
kwargs = {'density': density} if density is not None else {}
rdf = InterRDF_s(u, sels, **kwargs).run()
assert_almost_equal(max(rdf.rdf[0][0][0]), value)
if not density:
s1 = u.select_atoms('name ZND and resid 289')
s2 = u.select_atoms(
'name OD1 and resid 51 and sphzone 5.0 (resid 289)')
rdf_ref = InterRDF(s1, s2).run()
assert_almost_equal(rdf_ref.rdf, rdf.rdf[0][0][0])

0 comments on commit 8704511

Please sign in to comment.