Merge pull request #2812 from VOD555/fix_rdf_density

* Fixes #2811 * enable hitherto undocumented kwarg density for InterRDF_s (but changed meaning of True/False): If you used density=False since 0.19.0 to calculate a single particle density then you now have to use density=True. The default remains the same: calculate the RDF and not the density. * Fix normalization for InterRDF_s(..., density=True) * add test to compare InterRDF and InterRDF_s * add description for density option * update CHANGELOG
MDAnalysis · Jul 12, 2020 · 8704511 · 8704511
2 parents 3c97ae0 + 9046bf5
commit 8704511
Show file tree

Hide file tree

Showing 3 changed files with 150 additions and 43 deletions.
diff --git a/package/CHANGELOG b/package/CHANGELOG
@@ -14,7 +14,7 @@ The rules for this file:
 
 ------------------------------------------------------------------------------
 ??/??/?? tylerjereddy, richardjgowers, IAlibay, hmacdope, orbeckst, cbouy,
-         lilyminium, daveminh, jbarnoud, yuxuanzhuang
+         lilyminium, daveminh, jbarnoud, yuxuanzhuang, VOD555
 
   * 2.0.0
 
@@ -26,11 +26,14 @@ Fixes
   * TOPParser no longer guesses elements when missing atomic number records
     (Issues #2449, #2651)
   * Testsuite does not any more matplotlib.use('agg') (#2191)
-  * In ChainReader, read_frame does not trigger change of iterating position. 
+  * In ChainReader, read_frame does not trigger change of iterating position.
     (Issue #2723, PR #2815)
   * TRZReader now checks `n_atoms` on reading. (Issue #2817, PR #2820)
   * TRZWriter now writes `n_atoms` to the file. (Issue #2817, PR #2820)
-
+  * rdf.InterRDF_s density keyword documented and now gives correct results for
+    density=True; the keyword was available since 0.19.0 but with incorrect
+    semantics and not documented and did not produce correct results (Issue
+    #2811, PR #2812)
 
 Enhancements
   * Added the RDKitParser which creates a `core.topology.Topology` object from
@@ -46,7 +49,7 @@ Enhancements
 Changes
   * deprecated NumPy type aliases have been replaced with their actual types
     (see upstream NumPy PR 14882), and our pytest configuration will raise an
-    error for any violation when testing with development NumPy builds 
+    error for any violation when testing with development NumPy builds
     (`1.20.0`)
   * Changes development status from Beta to Mature (Issue #2773)
   * Removes deprecated MDAnalysis.analysis.hole, please use

diff --git a/package/MDAnalysis/analysis/rdf.py b/package/MDAnalysis/analysis/rdf.py
@@ -28,6 +28,8 @@
 `pair distribution functions`_ (`radial distribution functions`_ or "RDF").
 The RDF :math:`g_{ab}(r)` between types of particles :math:`a` and :math:`b` is
 
+.. _equation-gab:
+
 .. math::
 
    g_{ab}(r) = (N_{a} N_{b})^{-1} \sum_{i=1}^{N_a} \sum_{j=1}^{N_b}
@@ -46,6 +48,8 @@
 
 and the average number of :math:`b` particles within radius :math:`r`
 
+.. _equation-countab:
+
 .. math::
 
    N_{ab}(r) = \rho G_{ab}(r)
@@ -55,6 +59,15 @@
 the first solvation shell :math:`N(r_1)` where :math:`r_1` is the position of
 the first minimum in :math:`g(r)`.
 
+In :class:`InterRDF_s`, we provide an option `density`. When `density` is
+``False``, it will return the RDF :math:`g_{ab}(r)`; when `density` is
+``True``, it will return the density of particle :math:`b` in a shell at
+distance :math:`r` around a :math:`a` particle, which is
+
+.. _equation-nab:
+
+.. math::
+   n_{ab}(r) = \rho g_{ab}(r)
 
 .. _`pair distribution functions`:
    https://en.wikipedia.org/wiki/Pair_distribution_function
@@ -68,7 +81,7 @@
 :class:`InterRDF` is a tool to calculate average radial distribution functions
 between two groups of atoms. Suppose we have two AtomGroups ``A`` and
 ``B``. ``A`` contains atom ``A1``, ``A2``, and ``B`` contains ``B1``,
-``B2``. Give ``A`` and ``B`` to class:`InterRDF`, the output will be the
+``B2``. Given ``A`` and ``B`` to :class:`InterRDF`, the output will be the
 average of RDFs between ``A1`` and ``B1``, ``A1`` and ``B2``, ``A2`` and
 ``B1``, ``A2`` and ``B2``. A typical application is to calculate the RDF of
 solvent with itself or with another solute.
@@ -77,28 +90,82 @@
    :members:
    :inherited-members:
 
+   .. attribute:: bins
+
+      :class:`numpy.ndarray` of the centers of the `nbins` histogram
+      bins.
+
+   .. attribute:: edges
+
+      :class:`numpy.ndarray` of the `nbins + 1` edges of the histogram
+      bins.
+
+   .. attribute:: rdf
+
+      :class:`numpy.ndarray` of the :ref:`radial distribution
+      function<equation-gab>` values for the :attr:`bins`.
+
+   .. attribute:: count
+
+      :class:`numpy.ndarray` representing the radial histogram, i.e.,
+      the raw counts, for all :attr:`bins`.
+
 
 Site-specific radial distribution function
 ------------------------------------------
 
 :class:`InterRDF_s` calculates site-specific radial distribution
 functions. Instead of two groups of atoms it takes as input a list of pairs of
-AtomGroup, ``[[A, B], [C, D], ...]``. Give the same ``A`` and ``B`` to
-:class:`InterRDF_s`, the output will be a list of RDFs between ``A1`` and
-``B1``, ``A1`` and ``B2``, ``A2`` and ``B1``, ``A2`` and ``B2`` (and similarly
-for ``C`` and ``D``). These site-specific radial distribution functions are
-typically calculated if one is interested in the solvation shells of a ligand
-in a binding site or the solvation of specific residues in a protein. A common
-use case is to choose ``A`` and ``C`` to be AtomGroups that only contain a
-single atom and ``W`` all solvent molecules: ``InterRDF_s(u, [[A, W], [B,
-W]])`` will then produce the RDF of solvent around atom ``A[0]`` and around
-atom ``B[0]``.
-
+AtomGroup, ``[[A, B], [C, D], ...]``. Given the same ``A`` and ``B`` to
+:class:`InterRDF_s`, the output will be a list of individual RDFs between
+``A1`` and ``B1``, ``A1`` and ``B2``, ``A2`` and ``B1``, ``A2`` and ``B2`` (and
+similarly for ``C`` and ``D``). These site-specific radial distribution
+functions are typically calculated if one is interested in the solvation shells
+of a ligand in a binding site or the solvation of specific residues in a
+protein.
 
 .. autoclass:: InterRDF_s
    :members:
    :inherited-members:
 
+   .. attribute:: bins
+
+      :class:`numpy.ndarray` of the centers of the `nbins` histogram
+      bins; all individual site-specific RDFs have the same bins.
+
+   .. attribute:: edges
+
+      :class:`numpy.ndarray` of the `nbins + 1` edges of the histogram
+      bins; all individual site-specific RDFs have the same bins.
+
+   .. attribute:: rdf
+
+      :class:`list` of the site-specific :ref:`radial distribution
+      functions<equation-gab>` or :ref:`density
+      functions<equation-nab>` for the :attr:`bins`. The list contains
+      ``len(ags)`` entries. Each entry for the ``i``-th pair ``[A, B]
+      = ags[i]`` in `ags` is a :class:`numpy.ndarray` with shape
+      ``(len(A), len(B))``, i.e., a stack of RDFs. For example,
+      ``rdf[i][0, 2]`` is the RDF between atoms ``A[0]`` and ``B[2]``.
+
+   .. attribute:: count
+
+      :class:`list` of the site-specific radial histograms, i.e., the
+      raw counts, for all :attr:`bins`. The data have the same
+      structure as :attr:`rdf` except that the arrays contain the raw
+      counts.
+
+   .. attribute:: cdf
+
+      :class:`list` of the site-specific :ref:`cumulative
+      counts<equation-countab>`, for all :attr:`bins`. The data have the same
+      structure as :attr:`rdf` except that the arrays contain the cumulative
+      counts.
+
+      This attribute only exists after :meth:`get_cdf` has been run.
+
+
+
 
 .. Not Implemented yet:
 .. - Structure factor?
@@ -113,9 +180,17 @@
 
 
 class InterRDF(AnalysisBase):
-    """Intermolecular pair distribution function
+    r"""Intermolecular pair distribution function
+
+    The :ref:`radial distribution function<equation-gab>` is calculated by
+    histogramming distances between all particles in `g1` and `g2` while taking
+    periodic boundary conditions into account via the minimum image
+    convention.
+
+    The `exclusion_block` keyword may be used to exclude a set of distances
+    from the calculations.
 
-    InterRDF(g1, g2, nbins=75, range=(0.0, 15.0))
+    Results are available in the attributes :attr:`rdf` and :attr:`count`.
 
     Arguments
     ---------
@@ -230,7 +305,7 @@ def _conclude(self):
 
 
 class InterRDF_s(AnalysisBase):
-    """Site-specific intermolecular pair distribution function
+    r"""Site-specific intermolecular pair distribution function
 
     Arguments
     ---------
@@ -243,6 +318,18 @@ class InterRDF_s(AnalysisBase):
           Number of bins in the histogram [75]
     range : tuple or list (optional)
           The size of the RDF [0.0, 15.0]
+    density : bool (optional)
+          ``False``: calculate :math:`g_{ab}(r)`; ``True``: calculate
+          the true :ref:`single particle density<equation-nab>`
+          :math:`n_{ab}(r)`. The default is ``False``.
+
+          .. versionadded:: 1.0.1
+
+             This keyword was available since 0.19.0 but was not
+             documented. Furthermore, it had the opposite
+             meaning. Since 1.0.1 it is officially supported as
+             documented.
+
 
     Example
     -------
@@ -265,19 +352,20 @@ class InterRDF_s(AnalysisBase):
 
     Results are available through the :attr:`bins` and :attr:`rdf` attributes::
 
-      plt.plot(rdf.bins, rdf.rdf[0][0][0])
+      plt.plot(rdf.bins, rdf.rdf[0][0, 0])
 
     (Which plots the rdf between the first atom in ``s1`` and the first atom in
     ``s2``)
 
-    To generate the *cumulative distribution function* (cdf), use the
+    To generate the *cumulative distribution function* (cdf) in the sense of
+    "particles within radius :math:`r`", i.e., :math:`N_{ab}(r)`, use the
     :meth:`~InterRDF_s.get_cdf` method ::
 
       cdf = rdf.get_cdf()
 
-    Results are available through the :attr:'cdf' attribute::
+    Results are available through the :attr:`cdf` attribute::
 
-      plt.plot(rdf.bins, rdf.cdf[0][0][0])
+      plt.plot(rdf.bins, rdf.cdf[0][0, 0])
 
     (Which plots the cdf between the first atom in ``s1`` and the first atom in
     ``s2``)
@@ -291,7 +379,7 @@ class InterRDF_s(AnalysisBase):
 
     """
     def __init__(self, u, ags,
-                 nbins=75, range=(0.0, 15.0), density=True, **kwargs):
+                 nbins=75, range=(0.0, 15.0), density=False, **kwargs):
         super(InterRDF_s, self).__init__(u.universe.trajectory, **kwargs)
 
         # List of pairs of AtomGroups
@@ -325,8 +413,8 @@ def _single_frame(self):
                                                     box=self.u.dimensions)
 
             for j, (idx1, idx2) in enumerate(pairs):
-                self.count[i][idx1, idx2, :] += np.histogram(dist[j], 
-                                                             **self.rdf_settings)[0]            
+                self.count[i][idx1, idx2, :] += np.histogram(dist[j],
+                                                             **self.rdf_settings)[0]
 
         self.volume += self._ts.volume
 
@@ -342,31 +430,35 @@ def _conclude(self):
 
         for i, (ag1, ag2) in enumerate(self.ags):
             # Number of each selection
-            nA = len(ag1)
-            nB = len(ag2)
-            N = nA * nB
             indices.append([ag1.indices, ag2.indices])
 
             # Average number density
             box_vol = self.volume / self.n_frames
-            density = N / box_vol
+            density = 1 / box_vol
 
             if self._density:
-                rdf.append(self.count[i] / (density * vol * self.n_frames))
-            else:
                 rdf.append(self.count[i] / (vol * self.n_frames))
+            else:
+                rdf.append(self.count[i] / (density * vol * self.n_frames))
 
         self.rdf = rdf
         self.indices = indices
 
     def get_cdf(self):
-        """Calculate the cumulative distribution functions (CDF) for all sites.
-        Note that this is the actual count within a given radius, i.e.,
-        :math:`N(r)`.
+        r"""Calculate the cumulative counts for all sites.
+
+        This is the :ref:`cumulative count<equation-countab>` within a given
+        radius, i.e., :math:`N_{ab}(r)`.
+
+        The result is returned and also stored in the attribute
+        :attr:`cdf`.
+
+
         Returns
         -------
-              cdf : list
-                      list of arrays with the same structure as :attr:`rdf`
+        cdf : list
+              list of arrays with the same structure as :attr:`rdf`
+
         """
         # Calculate cumulative distribution function
         # Empty list to restore CDF

diff --git a/testsuite/MDAnalysisTests/analysis/test_rdf_s.py b/testsuite/MDAnalysisTests/analysis/test_rdf_s.py
@@ -25,7 +25,7 @@
 from numpy.testing import assert_almost_equal
 
 import MDAnalysis as mda
-from MDAnalysis.analysis.rdf import InterRDF_s
+from MDAnalysis.analysis.rdf import InterRDF_s, InterRDF
 
 from MDAnalysisTests.datafiles import GRO_MEMPROT, XTC_MEMPROT
 
@@ -38,16 +38,19 @@ def u():
 @pytest.fixture(scope='module')
 def sels(u):
     s1 = u.select_atoms('name ZND and resid 289')
-    s2 = u.select_atoms('(name OD1 or name OD2) and resid 51 and sphzone 5.0 (resid 289)')
+    s2 = u.select_atoms(
+         '(name OD1 or name OD2) and resid 51 and sphzone 5.0 (resid 289)')
     s3 = u.select_atoms('name ZND and (resid 291 or resid 292)')
     s4 = u.select_atoms('(name OD1 or name OD2) and sphzone 5.0 (resid 291)')
     ags = [[s1, s2], [s3, s4]]
     return ags
 
+
 @pytest.fixture(scope='module')
 def rdf(u, sels):
     return InterRDF_s(u, sels).run()
 
+
 def test_nbins(u, sels):
     rdf = InterRDF_s(u, sels, nbins=412).run()
 
@@ -90,15 +93,24 @@ def test_double_run(rdf):
     assert len(rdf.count[0][0][1][rdf.count[0][0][1] == 5]) == 1
     assert len(rdf.count[1][1][0][rdf.count[1][1][0] == 3]) == 1
 
+
 def test_cdf(rdf):
     rdf.get_cdf()
     assert rdf.cdf[0][0][0][-1] == rdf.count[0][0][0].sum()/rdf.n_frames
 
 
 @pytest.mark.parametrize("density, value", [
-    (True, 13275.775440503656),
-    (False, 0.021915460340071267)])
-
+    (None, 26551.55088100731),    # default, like False (no kwarg, see below)
+    (False, 26551.55088100731),
+    (True, 0.021915460340071267)])
 def test_density(u, sels, density, value):
-    rdf = InterRDF_s(u, sels, density=density).run()
+    kwargs = {'density': density} if density is not None else {}
+    rdf = InterRDF_s(u, sels, **kwargs).run()
     assert_almost_equal(max(rdf.rdf[0][0][0]), value)
+    if not density:
+        s1 = u.select_atoms('name ZND and resid 289')
+        s2 = u.select_atoms(
+                'name OD1 and resid 51 and sphzone 5.0 (resid 289)')
+        rdf_ref = InterRDF(s1, s2).run()
+        assert_almost_equal(rdf_ref.rdf, rdf.rdf[0][0][0])
+