Skip to content

Commit

Permalink
Extend GIST to mixtures and non-water solvents (#961)
Browse files Browse the repository at this point in the history
* Update GIST description with new command line options.

* in GIST, remove excludeions option

Removes
* the "excludeions" option
* the Gist::includeIons_ variable
* the Gist::A_idxs_ array, which tracked the solute+solvent atoms, and
  simply contained all atom indices unless excludeions was specified.

* In GIST, add a function to create datasets, change griddim_ to int.

* in GIST, refactor creation of the O_idxs array etc.

* Two new methods Action_GIST::setSolventProperties and
  Action_GIST::checkSolventProperties

* Unfinished implementation of GIST with non-water solvents.

* Removes checks for water is a solvent, but there is still a warning,
  and it is checked that all solvents are the same molecule.
* Molecules are currently binned by their center of mass (COM), but it
  is planned to add a "com" or "nocom" option to let the user choose.
* The user can choose 3 atoms that form a rigid subunit of the molecule
  for the quaternion construction
* There are some further refactorings: datasets are not cast to
  DataSet_GridFlt or ...Dbl, instead the polymorphic DataSet_3D is kept.
  I changed this after debugging for a while because I had accidently
  changed a dataset from double to float. The polymorphic option is more
  robust, but the downside is that DataSet_3D.operator[] is not
  writable, so we need to use the SetGrid method.

Currenty unfinished
* density output (g_... columns) for non-water solvents!!!
* order calculation (not planned for non-water solvents)
* neighbor calculation (not sure if I will implement this, it probably
  conflicts with the "com" option

* Make GIST datasets dynamic to allow other elements than O and H.

* Remove individual dataset pointers from Action_GIST class.
* Instead, manage an std::map to call them by their name. For fast
  access (e.g., within DoAction), the pointer is re-used.
* Dynamically add g_H, g_H, g_C etc. datasets for each element in the
  solvent.
* Add a struct SolventInfo (instance name solventInfo_) that tracks the
  elements in the solvent, and a method "getDensityDatasets" that
  returns a vector of pointers to the respective density datasets.
  solventInfo_ is filled by the analyzeSolventElements method during
  Action Setup.
* Add a class DataFilePrinter (private to Action_GIST), that prints
  whitespace separated values to a CpptrajFile. This replaces the long
  format string in Action_GIST::Print().

* Further GIST updates and refactorings

* implement some dataset math that will replace the CalcAvgVoxelEnergy
  and CalcAvgVoxelEnergy_PME functions
* fix a bug when parsing rigidAtomIndices
* implement a nocom option to recover the old GIST behavior, and a
  calcMolCenter method that computes the position of a molecule in the
  appropriate manner (COM or first "rigid" atom)

* In GIST, redo some bad refactorings of the prior commits

* make individual pointers for Esw_, Eww_ etc. datasets, instead of the
  std::map.
* Refactor some of the dataset math in the Print method

* Fix GIST output

* remove solute positions from Esw in PME-GIST.
* divide Eww by two when printing the total Eww on the grid.
* Add -a flag to DoTest because energy output is off in the last printed
  digit
* Fix GPU code after the last refactoring

* In GIST, track molecular densities and set rigidatomindices

rigidAtomIndices_ specifies which atoms of the solvent define a rigid
subunit. This is needed for the entropy calc, and also (if "nocom" is
specified) to define the molecule's position. Each number is an offset
from the molecule's start index (i.e., 0 means the first atom of the
moleucle). The first number is the central atom, and the second and
third number are other atoms that are bound to it. It can be set in 2
ways:
* The user specifies rigidatoms. In that case, the atom names are
  matched to set the indices (offset from molecule start)
* If the user doesn't specify, the solvent atom with the most bonds is
  searched. If it has less than 2 bonds to heavy atoms, hydrogens are
  also included. The central atom is the one with the most bonds. The
  two others are chosen from the atoms that are bonded to it.

Molecular densities are tracked in separate datasets. They are defined
relative to the solvent bulk density, and are computed by binning the
solvent centers of each specified solvent species (either the COM or the
first of "rigidAtomIndices_"). The user has to choose which molecules
are tracked by specifying "solventmols", followed by a comma-separated
list of molecule names. No selection masks are allowed, firstly because
this would mess with the comma-separated list, and also because we
anyway assume that we select whole molecules. For each specified
molecule, a dataset is generated and written out as dx file, e.g.,
gist-g-mol-WAT.dx

* In GIST: Consistent energy between PME and CPU, update tests

* add entropy tests for Gist4 test and Gist2 test
* run PME tests on GPU builds as well.
* change test save files to output v4, but don't change any other
  numbers.
* Use DoTest ... -a to allow for minor differences.

* Parallel entropy and RAM improvement for GIST.

* Parallel entropy calculations (dTSorient and dTSsix+dTStrans) for
  GIST, using OpenMP. The parallelization is very easy because most of
  the calculation can be done independently for each voxel, with only
  read access the voxel_xyz_ and voxel_Q_ arrays. The parallelization is
  implemented using the loop over all voxels, and there is a critical
  section when writing the data to arrays and incrementing the count of
  finished voxels and the count of water molecules. In a test system of
  10000 frames of benzene in water, the results are identical to the
  previous version.
* If skipS_ is specified, don't fill the voxel_xyz_ and voxel_Q_ arrays,
  since they are only needed for the entropy. Those are arrays of
  arrays. The outer array is still allocated, but each voxel array now
  stays empty with skipS_. For large GIST calculations (say, 200x200x200
  voxels and >10000 frames), those arrays need large amounts of memory
  (~8GB in this example, assuming mostly water in the grid)
  This allows for an efficient strategy for very large GIST calculations:
  do the energy once in total (for GPU or PME implementations, this is
  more efficient than splitting the grid) and do the entropy calculation
  for small sub-grids, or on a cluster with high RAM (now also in
  parallel).

* In GIST, fix progress bar for entropy, use "nocom" in GIST tests.

* In GIST, fix timing of CUDA energy being double-counted

* In GIST, rename N_waters_ to N_solvent_ and update tests.

* Undo an accidental change (extra echo command) in MasterTest.sh

* Update a format string in GIST

* Add .save files for GIST entropy tests

* Tests for GIST entropy were added previously, but the save files were
  not yet committed, leading to failing tests.

* Fix a segmentation fault in GIST.

* Avoid the arccos call in GIST orientational entropy.

* Add a GIST test for the entropy of a multidimensional normal distribution.

* In GIST, increase test tolerance to 1.1E-4

* Some numbers are stored with 1E-4 precision, and they are sometimes
  different in the last digit.

* Split GIST energy into solvent molecules, not yet correct!

* The Eww energy does not currently add up to the same value as before
  when ions are present

* Fix double counting in GistCudaCalc.cu

In the previous commit, the energy calculation was changed to fix the
double counting after the energy calculation in all cases. Therefore,
the CUDA kernel also needs to be updated.

* Add GIST tests for salt-water mixture

* add a topology and 10-frame trajectory of benzene in water.
* Add tests with and without PME. The reference energies are computed
  using the energy command. Accuracy is currently +-0.6 kcal/mol using
  PME and +-4 using nearest image convention.
* Fix a bug where GIST was crashing with the skipE and pme options. Now,
  just print 0 as energy in this case, as expected with skipE.

* Add regression tests for GIST energy with ions

* Check for libpme in GIST tests

* Add test tolerance to the Test_GIST_Order tests

* Fix GIST test runner.

* In GIST, streamline neighbor calculation with energy

* Minor GIST updates

* fix neighbor calculation on GPU
* add comments for previously undocumented functions
* add a check for linear solvent molecules

* Update documentation for GIST

* Add references to GIST, add tolerance to new GIST test.

* Also, removes the [excludeions] option from the GIST documentation.

* Increase version number to 6.7.0

* Add .save files for GIST test with benzene as a solvent

* Fix a print formatting string in GIST

* more exact quaternion distance in GIST

* add functions to compute the distance between almost normalized
  quaternions, such as those cast from float to double.
* update unitTests with new functions
  • Loading branch information
fwaibl authored Jun 7, 2022
1 parent 901809c commit c7526d7
Show file tree
Hide file tree
Showing 46 changed files with 60,651 additions and 1,535 deletions.
272 changes: 237 additions & 35 deletions doc/cpptraj.lyx
Original file line number Diff line number Diff line change
Expand Up @@ -25801,10 +25801,6 @@ rdval>]
[noimage] Disable distance imaging in energy calculation.
\end_layout

\begin_layout Description
[excludeions] If specified, exclude any ions from the calculation.
\end_layout

\begin_layout Description

\series bold
Expand Down Expand Up @@ -25988,6 +25984,68 @@ suffix>] Suffix for main GIST info file name.
the calculation time (default 1).
\end_layout

\begin_layout Description
[solute
\begin_inset space ~
\end_inset

<mask>] Selection mask for the solute.
All other molecules will be solvent.
If this is omitted, the standard solute/solvent assignment will be used.
\end_layout

\begin_layout Description
[solventmols
\begin_inset space ~
\end_inset

<MOLS>] Comma-separated list of names of solvent molecules.
Energies will be computed per solvent molecule.
For the entropy, only the main solvent (the first one) will be used.
Use, e.g.,
\series bold
solventmols WAT,NA,CL
\series default
for a GIST calculation including ions.
This needs to be specified if there is more than one solvent species.
\end_layout

\begin_layout Description
[nocom] Do not use the center of mass to define the molecular position.
Instead, use the first atom in
\series bold
rigidatoms
\series default
.
Use this flag to restore the behavior of old GIST runs.
\end_layout

\begin_layout Description
[rigidatoms
\begin_inset space ~
\end_inset

<CENTRAL>
\begin_inset space ~
\end_inset

<SUBST1>
\begin_inset space ~
\end_inset

<SUBST2>] Specifies how to define the molecular orientation for the entropy.
By default, a simple heuristic will be used.
This works for water, but not for all solvents.
The atoms should be representative of the molecular orientation and should
not be collinear.
Note that the central atom goes first.
For water, the default is equivalent to
\series bold
rigidatoms O H1 H2
\series default
, corresponding to H1-O-H2 as the rigid substructure.
\end_layout

\begin_layout Description
[nopme] Do not use particle mesh Ewald for the non-bonded calculation (default).
\end_layout
Expand Down Expand Up @@ -26105,7 +26163,11 @@ DataSet Aspects:
\end_layout

\begin_layout Description
[dTSorient] First order orientational entropy density .
[dTSorient] First order orientational entropy density.
\end_layout

\begin_layout Description
[dTSsix] First order six-dimensional entropy density.
\end_layout

\begin_layout Description
Expand Down Expand Up @@ -26145,6 +26207,90 @@ DataSet Aspects:
[U_PME] (pme only) Mean solute energy on the GIST grid.
\end_layout

\begin_layout Standard
DataSets if the main solvent is not water:
\end_layout

\begin_layout Description
[gELEM] For every element
\series bold
ELEM
\series default
in the main solvent, the atomic density relative to rho0 (e.g.,
\series bold
gC
\series default
and
\series bold
gH
\series default
for benzene).
\end_layout

\begin_layout Standard
DataSets if there are multiple solvents:
\end_layout

\begin_layout Description
[g_mol_NAME] for every solvent species
\series bold
NAME
\series default
(e.g.,
\series bold
g_mol_WAT
\series default
and
\series bold
g_mol_NA
\series default
if
\series bold
solventmols WAT,NA
\series default
was specified).
\end_layout

\begin_layout Description
[Esw_mol_NAME] for every solvent species
\series bold
NAME
\series default
(e.g.,
\series bold
Esw_mol_WAT
\series default
and
\series bold
Esw_mol_NA
\series default
if
\series bold
solventmols WAT,NA
\series default
was specified).
\end_layout

\begin_layout Description
[Eww_mol_NAME] for every solvent species
\series bold
NAME
\series default
(e.g.,
\series bold
Eww_mol_WAT
\series default
and
\series bold
Eww_mol_NA
\series default
if
\series bold
solventmols WAT,NA
\series default
was specified).
\end_layout

\end_deeper
\begin_layout Standard
Grid Inhomogeneous Solvation Theory
Expand Down Expand Up @@ -26519,6 +26665,37 @@ g_H

\begin_layout Itemize

\series bold
g_ELEM
\series default
- (if the main solvent is not water) Density of every further element
\series bold
ELEM
\series default
in the main solvent.
Scaled such that the expectation value in pure solvent is unity.
\end_layout

\begin_layout Itemize

\series bold
g_mol_NAME
\series default
- (if there is more than one solvent) Density of every solvent species

\series bold
NAME
\series default
specified in solventmols.
Scaled by
\series bold
rho0
\series default
.
\end_layout

\begin_layout Itemize

\series bold
dTStrans-dens
\series default
Expand Down Expand Up @@ -26625,8 +26802,60 @@ unting, and not referenced to the corresponding bulk value of this quantity
\end_inset

).
Again, both Lennard-Jones and electrostatic interactions are computed without
any cutoff, within the minimum image convention.
Unless PME is used, both Lennard-Jones and electrostatic interactions are
computed without any cutoff, within the minimum image convention.
\end_layout

\begin_layout Itemize

\series bold
Esw_mol_NAME-dens
\series default
and
\series bold
Esw_mol_NAME-norm
\series default
- (if there are multiple solvent species) Mean solute-solvent energy per
molecule of species
\series bold
NAME
\series default
, for each solvent specified in
\series bold
solventmols
\series default
.
Follows the same conventions as
\series bold
Esw
\series default
.
\end_layout

\begin_layout Itemize

\series bold
Eww_mol_NAME-dens
\series default
and
\series bold
Eww_mol_NAME-norm
\series default
- (if there are multiple solvent species) Mean solvent-solvent energy per
molecule of species
\series bold
NAME
\series default
, for each solvent specified in
\series bold
solventmols
\series default
.
Follows the same conventions as
\series bold
Eww
\series default
.
\end_layout

\begin_layout Itemize
Expand Down Expand Up @@ -27541,34 +27770,7 @@ d with a region
can add up quickly.
It can be advisable to omit all voxels above a certain distance to the
solute to obtain more stable results.
When using PME, the energy cannot be decomposed into Eww and Esw contributions.
In that case, the energy of hydration is
\begin_inset Formula $\Delta E_{hyd}=E_{solute}-E_{solute}^{self}+\sum^{voxels}E_{PME}^{dens}\times V_{vox}$
\end_inset

, where the solute energy
\begin_inset Formula $E_{solute}$
\end_inset

(containing solute-solute and solute-solvent interactions) can be found
in the GIST
\series bold
info
\series default
file, and the solute self-energy
\begin_inset Formula $E_{solute}^{self}$
\end_inset

can be obtained e.g.
using the
\begin_inset Quotes eld
\end_inset

energy
\begin_inset Quotes erd
\end_inset

command .

\end_layout

\begin_layout Subsection
Expand Down
Loading

0 comments on commit c7526d7

Please sign in to comment.