Extend GIST to mixtures and non-water solvents (#961)

* Update GIST description with new command line options. * in GIST, remove excludeions option Removes * the "excludeions" option * the Gist::includeIons_ variable * the Gist::A_idxs_ array, which tracked the solute+solvent atoms, and simply contained all atom indices unless excludeions was specified. * In GIST, add a function to create datasets, change griddim_ to int. * in GIST, refactor creation of the O_idxs array etc. * Two new methods Action_GIST::setSolventProperties and Action_GIST::checkSolventProperties * Unfinished implementation of GIST with non-water solvents. * Removes checks for water is a solvent, but there is still a warning, and it is checked that all solvents are the same molecule. * Molecules are currently binned by their center of mass (COM), but it is planned to add a "com" or "nocom" option to let the user choose. * The user can choose 3 atoms that form a rigid subunit of the molecule for the quaternion construction * There are some further refactorings: datasets are not cast to DataSet_GridFlt or ...Dbl, instead the polymorphic DataSet_3D is kept. I changed this after debugging for a while because I had accidently changed a dataset from double to float. The polymorphic option is more robust, but the downside is that DataSet_3D.operator[] is not writable, so we need to use the SetGrid method. Currenty unfinished * density output (g_... columns) for non-water solvents!!! * order calculation (not planned for non-water solvents) * neighbor calculation (not sure if I will implement this, it probably conflicts with the "com" option * Make GIST datasets dynamic to allow other elements than O and H. * Remove individual dataset pointers from Action_GIST class. * Instead, manage an std::map to call them by their name. For fast access (e.g., within DoAction), the pointer is re-used. * Dynamically add g_H, g_H, g_C etc. datasets for each element in the solvent. * Add a struct SolventInfo (instance name solventInfo_) that tracks the elements in the solvent, and a method "getDensityDatasets" that returns a vector of pointers to the respective density datasets. solventInfo_ is filled by the analyzeSolventElements method during Action Setup. * Add a class DataFilePrinter (private to Action_GIST), that prints whitespace separated values to a CpptrajFile. This replaces the long format string in Action_GIST::Print(). * Further GIST updates and refactorings * implement some dataset math that will replace the CalcAvgVoxelEnergy and CalcAvgVoxelEnergy_PME functions * fix a bug when parsing rigidAtomIndices * implement a nocom option to recover the old GIST behavior, and a calcMolCenter method that computes the position of a molecule in the appropriate manner (COM or first "rigid" atom) * In GIST, redo some bad refactorings of the prior commits * make individual pointers for Esw_, Eww_ etc. datasets, instead of the std::map. * Refactor some of the dataset math in the Print method * Fix GIST output * remove solute positions from Esw in PME-GIST. * divide Eww by two when printing the total Eww on the grid. * Add -a flag to DoTest because energy output is off in the last printed digit * Fix GPU code after the last refactoring * In GIST, track molecular densities and set rigidatomindices rigidAtomIndices_ specifies which atoms of the solvent define a rigid subunit. This is needed for the entropy calc, and also (if "nocom" is specified) to define the molecule's position. Each number is an offset from the molecule's start index (i.e., 0 means the first atom of the moleucle). The first number is the central atom, and the second and third number are other atoms that are bound to it. It can be set in 2 ways: * The user specifies rigidatoms. In that case, the atom names are matched to set the indices (offset from molecule start) * If the user doesn't specify, the solvent atom with the most bonds is searched. If it has less than 2 bonds to heavy atoms, hydrogens are also included. The central atom is the one with the most bonds. The two others are chosen from the atoms that are bonded to it. Molecular densities are tracked in separate datasets. They are defined relative to the solvent bulk density, and are computed by binning the solvent centers of each specified solvent species (either the COM or the first of "rigidAtomIndices_"). The user has to choose which molecules are tracked by specifying "solventmols", followed by a comma-separated list of molecule names. No selection masks are allowed, firstly because this would mess with the comma-separated list, and also because we anyway assume that we select whole molecules. For each specified molecule, a dataset is generated and written out as dx file, e.g., gist-g-mol-WAT.dx * In GIST: Consistent energy between PME and CPU, update tests * add entropy tests for Gist4 test and Gist2 test * run PME tests on GPU builds as well. * change test save files to output v4, but don't change any other numbers. * Use DoTest ... -a to allow for minor differences. * Parallel entropy and RAM improvement for GIST. * Parallel entropy calculations (dTSorient and dTSsix+dTStrans) for GIST, using OpenMP. The parallelization is very easy because most of the calculation can be done independently for each voxel, with only read access the voxel_xyz_ and voxel_Q_ arrays. The parallelization is implemented using the loop over all voxels, and there is a critical section when writing the data to arrays and incrementing the count of finished voxels and the count of water molecules. In a test system of 10000 frames of benzene in water, the results are identical to the previous version. * If skipS_ is specified, don't fill the voxel_xyz_ and voxel_Q_ arrays, since they are only needed for the entropy. Those are arrays of arrays. The outer array is still allocated, but each voxel array now stays empty with skipS_. For large GIST calculations (say, 200x200x200 voxels and >10000 frames), those arrays need large amounts of memory (~8GB in this example, assuming mostly water in the grid) This allows for an efficient strategy for very large GIST calculations: do the energy once in total (for GPU or PME implementations, this is more efficient than splitting the grid) and do the entropy calculation for small sub-grids, or on a cluster with high RAM (now also in parallel). * In GIST, fix progress bar for entropy, use "nocom" in GIST tests. * In GIST, fix timing of CUDA energy being double-counted * In GIST, rename N_waters_ to N_solvent_ and update tests. * Undo an accidental change (extra echo command) in MasterTest.sh * Update a format string in GIST * Add .save files for GIST entropy tests * Tests for GIST entropy were added previously, but the save files were not yet committed, leading to failing tests. * Fix a segmentation fault in GIST. * Avoid the arccos call in GIST orientational entropy. * Add a GIST test for the entropy of a multidimensional normal distribution. * In GIST, increase test tolerance to 1.1E-4 * Some numbers are stored with 1E-4 precision, and they are sometimes different in the last digit. * Split GIST energy into solvent molecules, not yet correct! * The Eww energy does not currently add up to the same value as before when ions are present * Fix double counting in GistCudaCalc.cu In the previous commit, the energy calculation was changed to fix the double counting after the energy calculation in all cases. Therefore, the CUDA kernel also needs to be updated. * Add GIST tests for salt-water mixture * add a topology and 10-frame trajectory of benzene in water. * Add tests with and without PME. The reference energies are computed using the energy command. Accuracy is currently +-0.6 kcal/mol using PME and +-4 using nearest image convention. * Fix a bug where GIST was crashing with the skipE and pme options. Now, just print 0 as energy in this case, as expected with skipE. * Add regression tests for GIST energy with ions * Check for libpme in GIST tests * Add test tolerance to the Test_GIST_Order tests * Fix GIST test runner. * In GIST, streamline neighbor calculation with energy * Minor GIST updates * fix neighbor calculation on GPU * add comments for previously undocumented functions * add a check for linear solvent molecules * Update documentation for GIST * Add references to GIST, add tolerance to new GIST test. * Also, removes the [excludeions] option from the GIST documentation. * Increase version number to 6.7.0 * Add .save files for GIST test with benzene as a solvent * Fix a print formatting string in GIST * more exact quaternion distance in GIST * add functions to compute the distance between almost normalized quaternions, such as those cast from float to double. * update unitTests with new functions
Amber-MD · Jun 7, 2022 · c7526d7 · c7526d7
1 parent 901809c
commit c7526d7
Show file tree

Hide file tree

Showing 46 changed files with 60,651 additions and 1,535 deletions.
diff --git a/doc/cpptraj.lyx b/doc/cpptraj.lyx
@@ -25801,10 +25801,6 @@ rdval>]
 [noimage] Disable distance imaging in energy calculation.
 \end_layout
 
-\begin_layout Description
-[excludeions] If specified, exclude any ions from the calculation.
-\end_layout
-
 \begin_layout Description
 
 \series bold
@@ -25988,6 +25984,68 @@ suffix>] Suffix for main GIST info file name.
  the calculation time (default 1).
 \end_layout
 
+\begin_layout Description
+[solute
+\begin_inset space ~
+\end_inset
+
+<mask>] Selection mask for the solute.
+ All other molecules will be solvent.
+ If this is omitted, the standard solute/solvent assignment will be used.
+\end_layout
+
+\begin_layout Description
+[solventmols
+\begin_inset space ~
+\end_inset
+
+<MOLS>] Comma-separated list of names of solvent molecules.
+ Energies will be computed per solvent molecule.
+ For the entropy, only the main solvent (the first one) will be used.
+ Use, e.g., 
+\series bold
+solventmols WAT,NA,CL
+\series default
+ for a GIST calculation including ions.
+ This needs to be specified if there is more than one solvent species.
+\end_layout
+
+\begin_layout Description
+[nocom] Do not use the center of mass to define the molecular position.
+ Instead, use the first atom in 
+\series bold
+rigidatoms
+\series default
+.
+ Use this flag to restore the behavior of old GIST runs.
+\end_layout
+
+\begin_layout Description
+[rigidatoms
+\begin_inset space ~
+\end_inset
+
+<CENTRAL>
+\begin_inset space ~
+\end_inset
+
+<SUBST1>
+\begin_inset space ~
+\end_inset
+
+<SUBST2>] Specifies how to define the molecular orientation for the entropy.
+ By default, a simple heuristic will be used.
+ This works for water, but not for all solvents.
+ The atoms should be representative of the molecular orientation and should
+ not be collinear.
+ Note that the central atom goes first.
+ For water, the default is equivalent to 
+\series bold
+rigidatoms O H1 H2
+\series default
+, corresponding to H1-O-H2 as the rigid substructure.
+\end_layout
+
 \begin_layout Description
 [nopme] Do not use particle mesh Ewald for the non-bonded calculation (default).
 \end_layout
@@ -26105,7 +26163,11 @@ DataSet Aspects:
 \end_layout
 
 \begin_layout Description
-[dTSorient] First order orientational entropy density .
+[dTSorient] First order orientational entropy density.
+\end_layout
+
+\begin_layout Description
+[dTSsix] First order six-dimensional entropy density.
 \end_layout
 
 \begin_layout Description
@@ -26145,6 +26207,90 @@ DataSet Aspects:
 [U_PME] (pme only) Mean solute energy on the GIST grid.
 \end_layout
 
+\begin_layout Standard
+DataSets if the main solvent is not water:
+\end_layout
+
+\begin_layout Description
+[gELEM] For every element 
+\series bold
+ELEM
+\series default
+ in the main solvent, the atomic density relative to rho0 (e.g., 
+\series bold
+gC
+\series default
+ and 
+\series bold
+gH
+\series default
+ for benzene).
+\end_layout
+
+\begin_layout Standard
+DataSets if there are multiple solvents:
+\end_layout
+
+\begin_layout Description
+[g_mol_NAME] for every solvent species 
+\series bold
+NAME
+\series default
+ (e.g., 
+\series bold
+g_mol_WAT
+\series default
+ and 
+\series bold
+g_mol_NA
+\series default
+ if 
+\series bold
+solventmols WAT,NA
+\series default
+ was specified).
+\end_layout
+
+\begin_layout Description
+[Esw_mol_NAME] for every solvent species 
+\series bold
+NAME
+\series default
+ (e.g., 
+\series bold
+Esw_mol_WAT
+\series default
+ and 
+\series bold
+Esw_mol_NA
+\series default
+ if 
+\series bold
+solventmols WAT,NA
+\series default
+ was specified).
+\end_layout
+
+\begin_layout Description
+[Eww_mol_NAME] for every solvent species 
+\series bold
+NAME
+\series default
+ (e.g., 
+\series bold
+Eww_mol_WAT
+\series default
+ and 
+\series bold
+Eww_mol_NA
+\series default
+ if 
+\series bold
+solventmols WAT,NA
+\series default
+ was specified).
+\end_layout
+
 \end_deeper
 \begin_layout Standard
 Grid Inhomogeneous Solvation Theory 
@@ -26519,6 +26665,37 @@ g_H
 
 \begin_layout Itemize
 
+\series bold
+g_ELEM
+\series default
+ - (if the main solvent is not water) Density of every further element 
+\series bold
+ELEM
+\series default
+ in the main solvent.
+ Scaled such that the expectation value in pure solvent is unity.
+\end_layout
+
+\begin_layout Itemize
+
+\series bold
+g_mol_NAME
+\series default
+ - (if there is more than one solvent) Density of every solvent species
+
+\series bold
+NAME
+\series default
+ specified in solventmols.
+ Scaled by 
+\series bold
+rho0
+\series default
+.
+\end_layout
+
+\begin_layout Itemize
+
 \series bold
 dTStrans-dens
 \series default
@@ -26625,8 +26802,60 @@ unting, and not referenced to the corresponding bulk value of this quantity
 \end_inset
 
 ).
- Again, both Lennard-Jones and electrostatic interactions are computed without
- any cutoff, within the minimum image convention.
+ Unless PME is used, both Lennard-Jones and electrostatic interactions are
+ computed without any cutoff, within the minimum image convention.
+\end_layout
+
+\begin_layout Itemize
+
+\series bold
+Esw_mol_NAME-dens
+\series default
+ and 
+\series bold
+Esw_mol_NAME-norm
+\series default
+ - (if there are multiple solvent species) Mean solute-solvent energy per
+ molecule of species 
+\series bold
+NAME
+\series default
+, for each solvent specified in 
+\series bold
+solventmols
+\series default
+.
+ Follows the same conventions as 
+\series bold
+Esw
+\series default
+.
+\end_layout
+
+\begin_layout Itemize
+
+\series bold
+Eww_mol_NAME-dens
+\series default
+ and 
+\series bold
+Eww_mol_NAME-norm
+\series default
+ - (if there are multiple solvent species) Mean solvent-solvent energy per
+ molecule of species 
+\series bold
+NAME
+\series default
+, for each solvent specified in 
+\series bold
+solventmols
+\series default
+.
+ Follows the same conventions as 
+\series bold
+Eww
+\series default
+.
 \end_layout
 
 \begin_layout Itemize
@@ -27541,34 +27770,7 @@ d with a region
  can add up quickly.
  It can be advisable to omit all voxels above a certain distance to the
  solute to obtain more stable results.
- When using PME, the energy cannot be decomposed into Eww and Esw contributions.
- In that case, the energy of hydration is 
-\begin_inset Formula $\Delta E_{hyd}=E_{solute}-E_{solute}^{self}+\sum^{voxels}E_{PME}^{dens}\times V_{vox}$
-\end_inset
-
-, where the solute energy 
-\begin_inset Formula $E_{solute}$
-\end_inset
-
- (containing solute-solute and solute-solvent interactions) can be found
- in the GIST 
-\series bold
-info
-\series default
- file, and the solute self-energy 
-\begin_inset Formula $E_{solute}^{self}$
-\end_inset
-
- can be obtained e.g.
- using the 
-\begin_inset Quotes eld
-\end_inset
-
-energy
-\begin_inset Quotes erd
-\end_inset
-
- command .
+
 \end_layout
 
 \begin_layout Subsection