Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert RadiallySymmetricRestraint Forces to CV Forces #365

Merged
merged 8 commits into from
Aug 2, 2018

Conversation

Lnaden
Copy link
Contributor

@Lnaden Lnaden commented Jul 31, 2018

This is my implementation of the CV Force conversion of the RadiallySymmetricRestraintForce class and subclasses. The call to the parent class is slightly different, the parent class is a CustomCVForce, and subclasses are no longer CustomBond- or CustomCentroidBondForces.

These tests pass locally but the implementation should still be reviewed.

While waiting for a review, I will work on the YANK side of things to take advantage of these changes.

@Lnaden Lnaden requested a review from andrrizzi July 31, 2018 16:56
@@ -15,6 +15,9 @@ New features
control global variables (`#363 <https://github.com/choderalab/openmmtools/pull/363>`_).
- Allow restraint force classes to be controlled by a parameter other than ``lambda_restraints``. This will enable
multi-restraints simulations (`#363 <https://github.com/choderalab/openmmtools/pull/363>`_).
- ``RadiallySymmetricRestraintForce`` and all subclasses are now a ``CustomCVForce`` subclass wrapping their
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should warn that this may be an API-breaking change in the changelog!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did?

RadiallySymmetricRestraintForce and all subclasses are now a CustomCVForce subclass wrapping their respective CustomBondForce and CustomCentroidBondForce objects. This breaks backwards compatibility, but enables tracking the restraint distances through the new SamplerSate collective variable features.

Emphasis mine

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just meant that there might be a "Warnings" section rather than burying this in the "Features" (since it may break your program, which may not be a "feature").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. I'll make this more visible

parameter_names, parameter_values = zip(*restraint_parameters.items())
# Create the distance force used for the CV force wraps
distance_force = self._create_distance_force()
self.addCollectiveVariable("Distance", distance_force)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should our naming convention be for these CVs?

  • Do we want to capitalize them (Distance) or make them lowercase (distance)?
  • Be more explicit in the names (e.g. receptor_ligand_distance) or use math symbols (r, theta1, etc.)?

Copy link
Contributor Author

@Lnaden Lnaden Jul 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the capital "Distance" because it clearly indicates this is a unique variable the user must think about. I don't want to use r or distance(g1,g2) here because this parent class is agnostic of the underlying CV implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am open to being convinced otherwise!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we thought through what names we will use for CVs for all the other restraints?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now the only other restraint classes I have are the Boresch and RMSD ones.

The RMSD one's have the CV RMSD
Boresch has CVs Distance, AngleA, AngleB, TorsionA/B/`C.

There is overlap there, yes, but the way I implemented the SamplerState CV checking is supports multiple forces having overlapping CV Names. I see the overlap here. I'm open to new name ideas, its very easy to change at this point since I have not implemented any of the analysis to take advantage of this conversion yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the reference implementation of the Boresch restraints are in YANK at the moment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @jchodera here. I would use a more "long winded" name such as restraint_distance for this variables because we've been using underscore conventions for both force global variables (e.g. lambda_sterics) and netcdf variables (e.g., positions, state), and also because using only distance increases the chance of having multiple unrelated forces using the same CV variable name.

Fix Python 2.7 ABC reference by just removing it.
Add a check to ensure the distance variable is in the energy_string
Copy link
Contributor

@andrrizzi andrrizzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general, but I just realized that my previous modifications for _controlling_parameter_name (and _restraints_parameters here) may not work in the YANK analysis as they are currently implemented.

I've left also a couple of comments about the naming conventions below, but those are minor points.

parameter_names, parameter_values = zip(*restraint_parameters.items())
# Create the distance force used for the CV force wraps
distance_force = self._create_distance_force()
self.addCollectiveVariable("Distance", distance_force)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @jchodera here. I would use a more "long winded" name such as restraint_distance for this variables because we've been using underscore conventions for both force global variables (e.g. lambda_sterics) and netcdf variables (e.g., positions, state), and also because using only distance increases the chance of having multiple unrelated forces using the same CV variable name.

# Augment args with the energy string
args = (energy_string,) + args

# Construct
super(RadiallySymmetricRestraintForce, self).__init__(*args, **kwargs)
self._controlling_parameter_name = controlling_parameter_name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized we can't use this because at the analysis stage we can't restore Python attributes from the serialization. We'll have to infer them from the force object (for example by taking the first name before ' * ') and return it in the property.

restraint_parameters = [(self.getPerBondParameterName(parameter_idx), parameter_value)
for parameter_idx, parameter_value in enumerate(parameter_values)]
return collections.OrderedDict(restraint_parameters)
return collections.OrderedDict(self._restraint_parameters)
Copy link
Contributor

@andrrizzi andrrizzi Jul 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here like _controlling_parameter_name. We can't rely on Python attributes as they will be lost with serialization. We can restore only properties so we'll have to read them from the force somehow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we did not rely on the OpenMM.XmlSerializer alone to serialize and de-serialize the force? Could we add a __getstate__ and __setstate__ for the base force objects and use the OpenMMTools serialize and deserialize functions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt it is possible unfortunately because I think the serialization of the System is performed at the C/C++ level and it automatically serializes the forces in the system in XML format.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is what I was worried about and have asked on the [OpenMM](openmm/openmm#2136] issues if its possible to extend, long shot question, but simple enough answer if "no" I would think. The other option would be to infer it from the energy_function as you said then?

I'm trying to think of a way to intercept the OpenMM XMLserializer to augment it with additional information, but I am worried that will only lead to confusing namespace overwrites.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other option would be to infer it from the energy_function as you said then?

Yes, for _controlling_parameter_name, although the controlling_parameter_name + ' * (' + energy_function + ')' is added in the subclass right now, so we may need to push that line up one level in the hierarchy to keep things clean. I believe that line is shared in all subclasses anyway.

but I am worried that will only lead to confusing namespace overwrites

I agree. I'm also worried about the fact that a serialized system with this method won't be able to be read as a normal OpenMM System with XmlSerializer.deserializeSystem() without our code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whops! Sorry I was overthinking it. We'll just have to return self.getGlobalParameterName(0) inside the controlling_parameter_name property. Maybe also placing an assert statement on __init__() to check that at that point there are no other global parameters to make sure the index is correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrrizzi on what we talked about, there actually is a restraint_parameters property which already exists in the released version. Without some way to generically parse it, I don't know how to preserve this property.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you're right, I forgot about that! Let's just push the implementation of that property in the subclasses then since they know which one to retrieve.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I don't think anything actually uses that property in OpenMMTools or YANK

@@ -256,8 +266,14 @@ class RadiallySymmetricRestraintForce(utils.RestorableOpenMMObject):
The indices of the second group of atoms to restrain.
controlling_parameter_name : str
The name of the global parameter controlling the energy function.
energy_string : str
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we maybe call this energy_function for consistency with the other classes?

force : OpenMM.Force
The force measuring distance to be used as the CV
"""

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know that creating a function without at least pass was valid syntax!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neither did I. That was an oversight on my part. I think having a pass here is better anyways so its clear that this is an empty function and the end of the function.

Expanded test to ensure deserialization correctly restores Python accessed properties

Changed name of the CV Distance from `Distance` to `symmetric_restraint_distance`

Removed the `restraint_parameters` property as it was not called by any test or implementation here or in YANK. It also would have been a pain to restore correctly.

Added helper function for the regex search of energy strings for implementations to follow.

Installed NetCDF4 from pip on windows due to odd bug in Conda's NetCDF4 on windows having DLL errors for... reasons?
Copy link
Contributor Author

@Lnaden Lnaden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've overhauled the implementation to address concerns from @jchodera and discussion with @andrrizzi

  • Forces' Python Properties are now inferred from the energy string through regex

  • Expanded test to ensure deserialization correctly restores Python accessed properties

  • Changed name of the CV Distance from Distance to symmetric_restraint_distance (This name could be changed)

  • Removed the restraint_parameters property as it was not called by any test or implementation here or in YANK. It also would have been a pain to restore correctly.

  • Added helper function for the regex search of energy strings for implementations to follow.

  • Installed NetCDF4 from pip on windows due to odd bug in Conda's NetCDF4 on windows having DLL errors for... reasons?

restrained_atom_indices1 : iterable of int
The indices of the first group of atoms to restrain.
restrained_atom_indices2 : iterable of int
The indices of the second group of atoms to restrain.
controlling_parameter_name : str
The name of the global parameter controlling the energy function.
energy_function : str
Energy Expression for the CustomCVForce object. Use the variable
"symmetric_restraint_distance" to reference the internal distance force CV.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've chosen this name: symmetric_restraint_distance. What do you think @andrrizzi?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me!

# Should catch int, float, +-, sci. notation
re_search = re.compile(r';{} = ([-+\d\.eE]+);'.format(parameter))
try:
value = re_search.search(energy_function).group(1)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a corner case here which can cause ambiguity, even though it is valid. Assume the user wrote a function:

4*Const*r^2;
Const = 5.4;
Const = 32.1;

This is grossly oversimplified, but illustrates the problem. The user defines this twice , but the search will only return the first entry. I could change it to a findall and accept only the first entry, OR I could use the findall to error trap this case of re-defined parameter, but trapping it on parameter fetch seems like a silly place to trap it when the error came much sooner.

I could trap it in the __init__, but it just kind of seems like overkill because the user wrote a sloppy custom expression. I'm not sure if that is something we should try to catch at all? Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this case is unlikely enough that we could just mention it in the docs and leave it as undefined behavior of the function, but if we want to catch it, I'd probably go for an assert in __init__.

Copy link
Contributor

@andrrizzi andrrizzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Thanks!

restrained_atom_indices1 : iterable of int
The indices of the first group of atoms to restrain.
restrained_atom_indices2 : iterable of int
The indices of the second group of atoms to restrain.
controlling_parameter_name : str
The name of the global parameter controlling the energy function.
energy_function : str
Energy Expression for the CustomCVForce object. Use the variable
"symmetric_restraint_distance" to reference the internal distance force CV.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me!

@@ -21,6 +21,7 @@
import logging
import math
import re
import yaml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we using this module here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, no. I had that in as part of other serialization attempts. I'll fix it.

# Should catch int, float, +-, sci. notation
re_search = re.compile(r';{} = ([-+\d\.eE]+);'.format(parameter))
try:
value = re_search.search(energy_function).group(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this case is unlikely enough that we could just mention it in the docs and leave it as undefined behavior of the function, but if we want to catch it, I'd probably go for an assert in __init__.

Try again to fix windows netcdf. Pinning libnetcdf to the last good version the netcdf4 lib built against on CondaForge
@Lnaden Lnaden merged commit 24841a0 into multi-restraints Aug 2, 2018
@Lnaden Lnaden deleted the cv-restraints branch August 2, 2018 11:43
@Lnaden
Copy link
Contributor Author

Lnaden commented Aug 2, 2018

The windows tests are failing because netcdf is having DLL issues, even when I pin libnetcdf to the exact version the last stable conda-forge build version on AppVeyor, it still fails. Since that does not bear on this PR, I'm merging to keep moving on more important things and we will have to figure that out later

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants