-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert RadiallySymmetricRestraint Forces to CV Forces #365
Conversation
These tests pass locally.
@@ -15,6 +15,9 @@ New features | |||
control global variables (`#363 <https://github.com/choderalab/openmmtools/pull/363>`_). | |||
- Allow restraint force classes to be controlled by a parameter other than ``lambda_restraints``. This will enable | |||
multi-restraints simulations (`#363 <https://github.com/choderalab/openmmtools/pull/363>`_). | |||
- ``RadiallySymmetricRestraintForce`` and all subclasses are now a ``CustomCVForce`` subclass wrapping their |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should warn that this may be an API-breaking change in the changelog!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did?
RadiallySymmetricRestraintForce
and all subclasses are now a CustomCVForce
subclass wrapping their respective CustomBondForce
and CustomCentroidBondForce
objects. This breaks backwards compatibility, but enables tracking the restraint distances through the new SamplerSate
collective variable features.
Emphasis mine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just meant that there might be a "Warnings" section rather than burying this in the "Features" (since it may break your program, which may not be a "feature").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point. I'll make this more visible
openmmtools/forces.py
Outdated
parameter_names, parameter_values = zip(*restraint_parameters.items()) | ||
# Create the distance force used for the CV force wraps | ||
distance_force = self._create_distance_force() | ||
self.addCollectiveVariable("Distance", distance_force) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should our naming convention be for these CVs?
- Do we want to capitalize them (
Distance
) or make them lowercase (distance
)? - Be more explicit in the names (e.g.
receptor_ligand_distance
) or use math symbols (r
,theta1
, etc.)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the capital "Distance" because it clearly indicates this is a unique variable the user must think about. I don't want to use r
or distance(g1,g2)
here because this parent class is agnostic of the underlying CV implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am open to being convinced otherwise!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we thought through what names we will use for CVs for all the other restraints?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now the only other restraint classes I have are the Boresch and RMSD ones.
The RMSD one's have the CV RMSD
Boresch has CVs Distance
, AngleA
, AngleB
, TorsionA
/B
/`C.
There is overlap there, yes, but the way I implemented the SamplerState
CV checking is supports multiple forces having overlapping CV Names. I see the overlap here. I'm open to new name ideas, its very easy to change at this point since I have not implemented any of the analysis to take advantage of this conversion yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the reference implementation of the Boresch restraints are in YANK at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @jchodera here. I would use a more "long winded" name such as restraint_distance
for this variables because we've been using underscore conventions for both force global variables (e.g. lambda_sterics
) and netcdf variables (e.g., positions
, state
), and also because using only distance
increases the chance of having multiple unrelated forces using the same CV variable name.
Fix Python 2.7 ABC reference by just removing it. Add a check to ensure the distance variable is in the energy_string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good in general, but I just realized that my previous modifications for _controlling_parameter_name
(and _restraints_parameters
here) may not work in the YANK analysis as they are currently implemented.
I've left also a couple of comments about the naming conventions below, but those are minor points.
openmmtools/forces.py
Outdated
parameter_names, parameter_values = zip(*restraint_parameters.items()) | ||
# Create the distance force used for the CV force wraps | ||
distance_force = self._create_distance_force() | ||
self.addCollectiveVariable("Distance", distance_force) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @jchodera here. I would use a more "long winded" name such as restraint_distance
for this variables because we've been using underscore conventions for both force global variables (e.g. lambda_sterics
) and netcdf variables (e.g., positions
, state
), and also because using only distance
increases the chance of having multiple unrelated forces using the same CV variable name.
openmmtools/forces.py
Outdated
# Augment args with the energy string | ||
args = (energy_string,) + args | ||
|
||
# Construct | ||
super(RadiallySymmetricRestraintForce, self).__init__(*args, **kwargs) | ||
self._controlling_parameter_name = controlling_parameter_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized we can't use this because at the analysis stage we can't restore Python attributes from the serialization. We'll have to infer them from the force object (for example by taking the first name before ' * '
) and return it in the property.
openmmtools/forces.py
Outdated
restraint_parameters = [(self.getPerBondParameterName(parameter_idx), parameter_value) | ||
for parameter_idx, parameter_value in enumerate(parameter_values)] | ||
return collections.OrderedDict(restraint_parameters) | ||
return collections.OrderedDict(self._restraint_parameters) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here like _controlling_parameter_name
. We can't rely on Python attributes as they will be lost with serialization. We can restore only properties so we'll have to read them from the force somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we did not rely on the OpenMM.XmlSerializer
alone to serialize and de-serialize the force? Could we add a __getstate__
and __setstate__
for the base force objects and use the OpenMMTools serialize
and deserialize
functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doubt it is possible unfortunately because I think the serialization of the System
is performed at the C/C++ level and it automatically serializes the forces in the system in XML format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is what I was worried about and have asked on the [OpenMM](openmm/openmm#2136] issues if its possible to extend, long shot question, but simple enough answer if "no" I would think. The other option would be to infer it from the energy_function
as you said then?
I'm trying to think of a way to intercept the OpenMM XMLserializer to augment it with additional information, but I am worried that will only lead to confusing namespace overwrites.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other option would be to infer it from the
energy_function
as you said then?
Yes, for _controlling_parameter_name
, although the controlling_parameter_name + ' * (' + energy_function + ')'
is added in the subclass right now, so we may need to push that line up one level in the hierarchy to keep things clean. I believe that line is shared in all subclasses anyway.
but I am worried that will only lead to confusing namespace overwrites
I agree. I'm also worried about the fact that a serialized system with this method won't be able to be read as a normal OpenMM System
with XmlSerializer.deserializeSystem()
without our code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whops! Sorry I was overthinking it. We'll just have to return self.getGlobalParameterName(0)
inside the controlling_parameter_name
property. Maybe also placing an assert
statement on __init__()
to check that at that point there are no other global parameters to make sure the index is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andrrizzi on what we talked about, there actually is a restraint_parameters
property which already exists in the released version. Without some way to generically parse it, I don't know how to preserve this property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you're right, I forgot about that! Let's just push the implementation of that property in the subclasses then since they know which one to retrieve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I don't think anything actually uses that property in OpenMMTools or YANK
openmmtools/forces.py
Outdated
@@ -256,8 +266,14 @@ class RadiallySymmetricRestraintForce(utils.RestorableOpenMMObject): | |||
The indices of the second group of atoms to restrain. | |||
controlling_parameter_name : str | |||
The name of the global parameter controlling the energy function. | |||
energy_string : str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we maybe call this energy_function
for consistency with the other classes?
openmmtools/forces.py
Outdated
force : OpenMM.Force | ||
The force measuring distance to be used as the CV | ||
""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know that creating a function without at least pass
was valid syntax!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neither did I. That was an oversight on my part. I think having a pass
here is better anyways so its clear that this is an empty function and the end of the function.
Expanded test to ensure deserialization correctly restores Python accessed properties Changed name of the CV Distance from `Distance` to `symmetric_restraint_distance` Removed the `restraint_parameters` property as it was not called by any test or implementation here or in YANK. It also would have been a pain to restore correctly. Added helper function for the regex search of energy strings for implementations to follow. Installed NetCDF4 from pip on windows due to odd bug in Conda's NetCDF4 on windows having DLL errors for... reasons?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've overhauled the implementation to address concerns from @jchodera and discussion with @andrrizzi
-
Forces' Python Properties are now inferred from the energy string through regex
-
Expanded test to ensure deserialization correctly restores Python accessed properties
-
Changed name of the CV Distance from
Distance
tosymmetric_restraint_distance
(This name could be changed) -
Removed the
restraint_parameters
property as it was not called by any test or implementation here or in YANK. It also would have been a pain to restore correctly. -
Added helper function for the regex search of energy strings for implementations to follow.
-
Installed NetCDF4 from pip on windows due to odd bug in Conda's NetCDF4 on windows having DLL errors for... reasons?
restrained_atom_indices1 : iterable of int | ||
The indices of the first group of atoms to restrain. | ||
restrained_atom_indices2 : iterable of int | ||
The indices of the second group of atoms to restrain. | ||
controlling_parameter_name : str | ||
The name of the global parameter controlling the energy function. | ||
energy_function : str | ||
Energy Expression for the CustomCVForce object. Use the variable | ||
"symmetric_restraint_distance" to reference the internal distance force CV. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've chosen this name: symmetric_restraint_distance
. What do you think @andrrizzi?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me!
# Should catch int, float, +-, sci. notation | ||
re_search = re.compile(r';{} = ([-+\d\.eE]+);'.format(parameter)) | ||
try: | ||
value = re_search.search(energy_function).group(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a corner case here which can cause ambiguity, even though it is valid. Assume the user wrote a function:
4*Const*r^2;
Const = 5.4;
Const = 32.1;
This is grossly oversimplified, but illustrates the problem. The user defines this twice , but the search will only return the first entry. I could change it to a findall
and accept only the first entry, OR I could use the findall
to error trap this case of re-defined parameter, but trapping it on parameter fetch seems like a silly place to trap it when the error came much sooner.
I could trap it in the __init__
, but it just kind of seems like overkill because the user wrote a sloppy custom expression. I'm not sure if that is something we should try to catch at all? Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this case is unlikely enough that we could just mention it in the docs and leave it as undefined behavior of the function, but if we want to catch it, I'd probably go for an assert
in __init__
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Thanks!
restrained_atom_indices1 : iterable of int | ||
The indices of the first group of atoms to restrain. | ||
restrained_atom_indices2 : iterable of int | ||
The indices of the second group of atoms to restrain. | ||
controlling_parameter_name : str | ||
The name of the global parameter controlling the energy function. | ||
energy_function : str | ||
Energy Expression for the CustomCVForce object. Use the variable | ||
"symmetric_restraint_distance" to reference the internal distance force CV. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me!
openmmtools/forces.py
Outdated
@@ -21,6 +21,7 @@ | |||
import logging | |||
import math | |||
import re | |||
import yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we using this module here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, no. I had that in as part of other serialization attempts. I'll fix it.
# Should catch int, float, +-, sci. notation | ||
re_search = re.compile(r';{} = ([-+\d\.eE]+);'.format(parameter)) | ||
try: | ||
value = re_search.search(energy_function).group(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this case is unlikely enough that we could just mention it in the docs and leave it as undefined behavior of the function, but if we want to catch it, I'd probably go for an assert
in __init__
.
Try again to fix windows netcdf. Pinning libnetcdf to the last good version the netcdf4 lib built against on CondaForge
The windows tests are failing because netcdf is having DLL issues, even when I pin |
This is my implementation of the CV Force conversion of the
RadiallySymmetricRestraintForce
class and subclasses. The call to the parent class is slightly different, the parent class is aCustomCVForce
, and subclasses are no longerCustomBond-
orCustomCentroidBondForce
s.These tests pass locally but the implementation should still be reviewed.
While waiting for a review, I will work on the YANK side of things to take advantage of these changes.