Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rdkit interoperability #2468

Closed
richardjgowers opened this issue Jan 24, 2020 · 16 comments
Closed

rdkit interoperability #2468

richardjgowers opened this issue Jan 24, 2020 · 16 comments

Comments

@richardjgowers
Copy link
Member

This is mostly an idea for a GSOC project. There's a bunch of cool stuff in rdkit which isn't even close to being in MDA (and vice versa) so rather than reinvent wheels, it would be cool to do:

import MDAnalysis as mda
from MDAnalysis import convert_to

u = mda.Universe('bleh')

rdkit_moi = convert_to('rdkit', u)

and

from rdkit import Chem
from MDAnalysis import convert_to

rdkit_mol = Chem.MolFromSmiles('CCOC')

u = convert_to('mda', rdkit_mol)

Which would expand upon the converters idea that @lilyminium has got rolling with parmed.

The data structures are going to be very different, and rdkit is quite picky about what it lets you load, but it would be cool to get something going.

@RMeli
Copy link
Member

RMeli commented Jan 24, 2020

This would be very useful indeed! The following blog post might be relevant for further discussion: Why the RDKit isn't available on PyPi.

@richardjgowers
Copy link
Member Author

Yeah rdkit via conda is the only way to stay sane tbh. So this project might end up in a side package/non default submodule which is optionally installed and does some monkey patching to _CONVERTERS.

@IAlibay
Copy link
Member

IAlibay commented Jan 24, 2020

At least on our end, rdkit integration would be very much appreciated. It would definitely make tools that depend on both packages like lintools (which we still have aims to revive properly) a lot easier.

@orbeckst
Copy link
Member

I like the idea of expanding on converters – working towards API interoperability instead of files. It's the future.

We might have to review our policy on package dependencies in the core. Maybe it's ok for the majority of users to get a well-define failure if they try to do something with a specialized reader/converter, especially if they are told what to do if they want to install the package. I am thinking along mocking missing optional packages in such a way that only users who want to use exactly the functionality get a failure.

Perhaps the policy can be changed as to "packages that are used in a single convertor or reader can be optional". Or we make converters a submodule with its own policy (but it would be convenient to also be able to include readers/parsers with exotic dependencies).

@orbeckst
Copy link
Member

Do you know anyone from the RDKit community who might want to co-mentor for GSoC? That would be extremely valuable.

@RMeli
Copy link
Member

RMeli commented Jan 24, 2020

RDKit usually takes part to GSoC under the OpenChemistry organization. You can have a look at the RDKit Project Ideas for past mentors. They have a Slack channel and @greglandrum is on it. Maybe worth a try?

@richardjgowers
Copy link
Member Author

richardjgowers commented Jan 24, 2020 via email

@greglandrum
Copy link

This would be cool! I'd be happy to try and help with answering RDKit-related questions.
It's hard to commit to co-mentoring though, until we see what projects actually end up being funded I won't know how busy I am.

@haroldgrosjean
Copy link

haroldgrosjean commented Jul 9, 2020

Hello,

I am investigating protein-ligand interaction with MD simulations and the idea of combining RDkit with MDAnalysis is extremely exciting.

Currently, I am using ODDT to do that (https://oddt.readthedocs.io/en/latest/rst/oddt.html#module-oddt.interactions). The problem with this approach is that ODDT requires PDB files (which are then transformed into RDkit/pybel objects) meaning that one has to convert each frame into a PDB before moving into processing which is slow and bad practice. The idea behind it is that you provide the receptor and the ligand and it returns a list of descriptors such as donor atom, acceptor atom, atom types, etc.

One could feed the trajectory into the new RDKit wrapper and use ODTT in the loop to get the data. This implies that the wrapper must be able to take in the protein atoms and transform them appropriately into an RDKit object. Is that something that is envisaged because it would be of great use to a lot of people? Such functionality would also allow us to hardcode more complex rules and study/ measure less popular/ frequent interactions that are not captured by ODDT such as anion-pi interactions but are also important in protein-ligand binding. It would also combine extremely well with Native contacts analysis https://www.mdanalysis.org/docs/documentation_pages/analysis/contacts.html.

Many thanks,

Harold

@cbouy cbouy mentioned this issue Jul 31, 2020
4 tasks
@orbeckst orbeckst added the GSoC GSoC project label Aug 7, 2020
orbeckst pushed a commit that referenced this issue Aug 22, 2020
* Overview of work done in PR

  This PR adds the `RDKitConverter` class which converts MDAnalysis
  `Universe`/`Atomgroup` objects to `RDKit rdchem.Mol` objects.

* Limitations
  - Bonds and elements must be present (the former will be inferred if not
    present).
  - This converter mainly aims at supporting cases where explicit hydrogens
    are present.

* Extra implementation details
  - Bond order and formal charges are inferred via atomic valencies & the
    number of unpaired electrons (see `_infer_bo_and_charges`).
  - In part due to the influence of atom read order on inferring, bond
    conjugation and functional groups are standardized using SMARTS
    reactions (see `_standardize_patterns`).

*  Other changes

   Also includes some PEP8 changes and some cleaning up of the tests for the
   `RDKITReader`.

* References
  For more information on the development process for this PR, see:
  1) https://cbouy.github.io/blog/2020/07/01/rdkit-converter
  2) https://cbouy.github.io/blog/2020/07/22/rdkit-converter-part2
IAlibay added a commit that referenced this issue Aug 23, 2020
Towards #2468 (RDKit interoperability GSOC project)

PR #2916 -> Originally PR #2775

##Overview of work done in PR

This PR adds the RDKitConverter class which converts MDAnalysis
Universe/Atomgroup objects to RDKit rdchem.Mol objects.

##Limitations

  - Bonds and elements must be present (the former will be inferred if not
     present).
  - This converter mainly aims at supporting cases where explicit hydrogens
     are present.

##Extra implementation details

  - Bond order and formal charges are inferred via atomic valencies & the
     number of unpaired electrons (see _infer_bo_and_charges).
   - In part due to the influence of atom read order on inferring, bond
      conjugation and functional groups are standardized using SMARTS
      reactions (see _standardize_patterns).

##Other changes

Also includes some PEP8 changes and some cleaning up of the tests for the
RDKITReader.

##References

For more information on the development process for this PR, see:
https://cbouy.github.io/blog/2020/07/01/rdkit-converter
https://cbouy.github.io/blog/2020/07/22/rdkit-converter-part2
IAlibay pushed a commit that referenced this issue Aug 29, 2020
Towards #2468 

## Work done in this PR
Adds a new SMARTS based selection which uses the RDKit converter.
@j-wags
Copy link

j-wags commented Sep 3, 2020

Thanks for doing this work! This is really exciting from Open Force Field's perspective. We've also been struggling to perceive bond orders from elements+connectivity (and to know when we can or can't safely guess), so the careful writeups and test cases in this project have been particularly cool to see :-)

@orbeckst
Copy link
Member

orbeckst commented Sep 3, 2020

@haroldgrosjean please have a look at https://www.mdanalysis.org/2020/08/29/gsoc-report-cbouy/#demo — it sounds to me that this will cover your use case. Note that not everything is working yet because not all of @cbouy 's PRs are merged yet but it will come.

@cbouy might be able to say more — the MDA/RDKit project has really made big leaps since you posted your comment in July (sorry for the long silence).

EDIT: Also have a look at @cbouy 's blog, especially https://cedric.bouysset.net/blog/2020/08/07/rdkit-interoperability

@orbeckst
Copy link
Member

orbeckst commented Nov 12, 2020 via email

@cbouy
Copy link
Member

cbouy commented Nov 12, 2020

I was going to answer but I'll wait for the mailing list thread so that more people can find the answer 😃 And yes, it's possible with some tweaks to your code example.

@cbouy
Copy link
Member

cbouy commented Nov 12, 2020

I'm not authorized to answer the discussion :(
@orbeckst Should I post my answer here and you repost it on the thread, or can you authorize me to reply on the mailing list ?

@orbeckst
Copy link
Member

orbeckst commented Nov 13, 2020

Normally, you need to subscribe to the mailing list and when you reply the first time, an admin will remove the hold on your subscription (to make sure you aren't a spammer). However, I added you directly to https://groups.google.com/g/mdnalysis-discussion with your b...@gmail address. Please try again.

PicoCentauri pushed a commit to PicoCentauri/mdanalysis that referenced this issue Mar 30, 2021
* Overview of work done in PR

  This PR adds the `RDKitConverter` class which converts MDAnalysis
  `Universe`/`Atomgroup` objects to `RDKit rdchem.Mol` objects.

* Limitations
  - Bonds and elements must be present (the former will be inferred if not
    present).
  - This converter mainly aims at supporting cases where explicit hydrogens
    are present.

* Extra implementation details
  - Bond order and formal charges are inferred via atomic valencies & the
    number of unpaired electrons (see `_infer_bo_and_charges`).
  - In part due to the influence of atom read order on inferring, bond
    conjugation and functional groups are standardized using SMARTS
    reactions (see `_standardize_patterns`).

*  Other changes

   Also includes some PEP8 changes and some cleaning up of the tests for the
   `RDKITReader`.

* References
  For more information on the development process for this PR, see:
  1) https://cbouy.github.io/blog/2020/07/01/rdkit-converter
  2) https://cbouy.github.io/blog/2020/07/22/rdkit-converter-part2
PicoCentauri pushed a commit to PicoCentauri/mdanalysis that referenced this issue Mar 30, 2021
Towards MDAnalysis#2468 

## Work done in this PR
Adds a new SMARTS based selection which uses the RDKit converter.
IAlibay pushed a commit that referenced this issue Apr 23, 2021
Towards #2468 

## Work done in this PR
* Add aromaticity and Gasteiger charges guessers which work via the RDKIT converter.
lilyminium pushed a commit to lilyminium/mdanalysis that referenced this issue Apr 28, 2021
Towards MDAnalysis#2468 

## Work done in this PR
* Add aromaticity and Gasteiger charges guessers which work via the RDKIT converter.
jbarnoud added a commit that referenced this issue May 3, 2021
* added get_connections

* modified tests for atoms.bonds/angles/dihedrals etc

* modified parsers and things to use get_connections or bonds

* updated CHANGELOG

* pep8

* undo half of PR 3160

* add intra stuff

* Update package/MDAnalysis/core/groups.py

Co-authored-by: Jonathan Barnoud <jonathan@barnoud.net>

* tighten up base class checking

* update docstring

* suppres warnings

* Use absolute file paths in ITPParser (#3108)

Fixes #3037

Co-authored-by: Lily Wang <31115101+lilyminium@users.noreply.github.com>

* Adds aromaticity and Gasteiger charges guessers (#2926)

Towards #2468 

## Work done in this PR
* Add aromaticity and Gasteiger charges guessers which work via the RDKIT converter.

* BLD: handle gcc on MacOS (#3234)

Fixes #3109 

## Work done in this PR
* gracefully handle the case where `gcc` toolchain
in use on MacOS has been built from source using `clang`
by `spack` (so it really is `gcc` in use, not `clang`)

## Notes
* we could try to add regression testing, but a few problems:
- `using_clang()` is inside `setup.py`, which probably
can't be safely imported because it has unguarded statements/
code blocks that run right away
- testing build issues is typically tricky with mocking, etc.
(though in this case, probably just need to move `using_clang()`
somewhere else and then test it against a variety of compiler
metadata strings

* Remove ParmEd Timestep writing "support" (#3240)

Fixes #3031

* Adding py3.9 to gh actions CI matrix (#3245)

* Fixes #2974
* Python 3.9 officially supported
* Add  Python 3.9 to testing matrix
* Adds macOS CI entry, formalises 3.9 support

* fix changelog

* special metaclass

* move function down

* tidy code

Co-authored-by: Jonathan Barnoud <jonathan@barnoud.net>
Co-authored-by: Aditya Kamath <48089312+aditya-kamath@users.noreply.github.com>
Co-authored-by: Cédric Bouysset <bouysset.cedric@gmail.com>
Co-authored-by: Tyler Reddy <tyler.je.reddy@gmail.com>
Co-authored-by: Irfan Alibay <IAlibay@users.noreply.github.com>
@Jay-sanjay
Copy link

is the issue still open

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants