adding scripts from 2017 mof subset paper NO_JIRA

ccdc-opensource · Jul 2, 2024 · a972fa0
1 parent 74a5289
commit a972fa0
Showing 5 changed files with 325 additions and 34 deletions.
diff --git a/scripts/ReadMe.md b/scripts/ReadMe.md
@@ -1,45 +1,58 @@
-## Contents
 
-This folder contains scripts submitted by users or CCDC scientists for anyone to use freely. 
+# Contents
 
-### Hydrogen bond propensity
-- Writes a `.docx report` of a hydrogen bond propensity calculation for any given `.mol2`/refcode.
+## Concat Mol2
 
-### Multi-component hydrogen bond propensity 
-- Performs a multi-component HBP calculation for a given library of co-formers. 
+- Concatenates mol2 files present in working directory to a single `.mol2` file.
 
-### Packing similarity dendrogram
-- Construct a dendrogram for an input set of structures based on packing-similarity analysis.
+## Create CASTEP Input
 
-### GOLD-multi
-- Use the CSD Docking API and the multiprocessing module to parallelize GOLD docking.
+- Creates input files (`.cell` and `.param`) for a given compound through Mercury.
+
+## Create GAUSSIAN Input
+
+- Create GAUSSIAN input file (`.gjf`) for a given CSD refcode or `.mol2` file.
+
+## Find Binding Conformation
 
-### Find Binding Conformation
 - Generates idealized conformers for ligands and evaluates their RMSD to the conformation in the PDB.
 
-### Concat Mol2
-- Concatenates mol2 files present in working directory to a single `.mol2` file. 
+## GOLD-multi
 
-### Create CASTEP Input
-- Creates input files (`.cell` and `.param`) files for a given compound through Mercury.
+- Use the CSD Docking API and the multiprocessing module to parallelize GOLD docking.
 
-### Create GAUSSIAN Input
-- Create GAUSSIAN input file (`.gjf`) for a given CSD refcode or `.mol2` file.
+## Hydrogen bond propensity
+
+- Writes a `.docx` report of a hydrogen bond propensity calculation for any given `.mol2`/refcode.
+
+## MOF subset 2017 Chem Mater publication
+
+- Two scripts that were supplementary information in the publication "Development of a Cambridge Structural Database Subset:
+ A Collection of Metal–Organic Frameworks for Past, Present, and Future" DOI: <https://doi.org/10.1021/acs.chemmater.7b00441>
+
+## Multi-component hydrogen bond propensity
+
+- Performs a multi-component HBP calculation for a given library of co-formers.
+
+## Packing similarity dendrogram
+
+- Construct a dendrogram for an input set of structures based on packing-similarity analysis.
+
+## Particle Rugosity
 
-### Particle Rugosity
 - Calculates the simulated BFDH particle rugosity weighted by facet area.
 
-## Tips 
-A section for top tips in using the repository and GitHub. 
-### Searching tips:
+## Tips
+
+A section for top tips in using the repository and GitHub.
+
+### Searching tips
 
 The search bar in GitHub allows you to search for keywords mentioned in any file throughout the repository (in the main branch).
 
 It is also possible to filter which file type you are interested in.
 
-For example: 
-"hydrogen bond" 
+For example:
+"hydrogen bond"
 
 <img src="../assets/search.gif" width="500px">
-
-
diff --git a/...pts/mof_solvent_removal_2017_chem_mater_publication/Command_prompt_MOF_solvent_removal.py b/...pts/mof_solvent_removal_2017_chem_mater_publication/Command_prompt_MOF_solvent_removal.py
@@ -0,0 +1,119 @@
+#
+# This script can be used for any purpose without limitation subject to the
+# conditions at http://www.ccdc.cam.ac.uk/Community/Pages/Licences/v2.aspx
+#
+# This permission notice and the following statement of attribution must be
+# included in all copies or substantial portions of this script.
+#
+# 2016-12-15: created by S. B. Wiggin, the Cambridge Crystallographic Data Centre
+# 2024-07-02: minor update to include using ccdc utilities to find the solvent file
+
+"""
+Script to identify and remove bound solvent molecules from a MOF structure.
+
+Solvents are identified using a defined list.
+Output in CIF format includes only framework component with all monodentate solvent removed.
+"""
+#######################################################################
+
+import os
+import glob
+import argparse
+
+from ccdc import io
+from ccdc import utilities
+
+#######################################################################
+
+arg_handler = argparse.ArgumentParser(description=__doc__)
+arg_handler.add_argument(
+    'input_file',
+    help='CSD .gcd file from which to read MOF structures'
+)
+arg_handler.add_argument(
+    '-o', '--output-directory',
+    help='Directory into which to write stripped structures'
+)
+arg_handler.add_argument(
+    '-m', '--monodentate', default=False, action='store_true',
+    help='Whether or not to strip all unidentate (or monodentate) ligands from the structure'
+)
+arg_handler.add_argument(
+    '-s', '--solvent-file',
+    help='Location of solvent file'
+)
+
+args = arg_handler.parse_args()
+if not args.output_directory:
+    args.output_directory = os.path.dirname(os.path.abspath(args.input_file))
+
+# Define the solvent smiles patterns
+if not args.solvent_file:
+    args.solvent_file = utilities.Resources().get_ccdc_solvents_dir()
+
+if os.path.isdir(args.solvent_file):
+    solvent_smiles = [
+        io.MoleculeReader(f)[0].smiles
+        for f in glob.glob(os.path.join(args.solvent_file, '*.mol2'))
+    ]
+else:
+    solvent_smiles = [m.smiles for m in io.MoleculeReader(args.solvent_file)]
+
+
+#######################################################################
+
+
+def is_multidentate(c, mol):
+    """
+    Check for components bonded to metals more than once.
+    If monodentate is not specified in the arguments, skip this test.
+    """
+    if not args.monodentate:
+        return True
+    got_one = False
+    for a in c.atoms:
+        orig_a = mol.atom(a.label)
+        if any(x.is_metal for b in orig_a.bonds for x in b.atoms):
+            if got_one:
+                return True
+            got_one = True
+    return False
+
+
+def is_solvent(c):
+    """Check if this component is a solvent."""
+    return c.smiles == 'O' or c.smiles in solvent_smiles
+
+
+def has_metal(c):
+    """Check if this component has any metals."""
+    return any(a.is_metal for a in c.atoms)
+
+
+# Iterate over entries
+try:
+    for entry in io.EntryReader(args.input_file):
+        if entry.has_3d_structure:
+            # Ensure labels are unique
+            mol = entry.molecule
+            mol.normalise_labels()
+            # Use a copy
+            clone = mol.copy()
+            # Remove all bonds containing a metal atom
+            clone.remove_bonds(b for b in clone.bonds if any(a.is_metal for a in b.atoms))
+            # Work out which components to remove
+            to_remove = [
+                c
+                for c in clone.components
+                if not has_metal(c) and (not is_multidentate(c, mol) or is_solvent(c))
+            ]
+            # Remove the atoms of selected components
+            mol.remove_atoms(
+                mol.atom(a.label) for c in to_remove for a in c.atoms
+            )
+            # Write the CIF
+            entry.crystal.molecule = mol
+            with io.CrystalWriter('%s/%s_stripped.cif' % (args.output_directory, entry.identifier)) as writer:
+                writer.write(entry.crystal)
+except RuntimeError:
+    print('File format not recognised')
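
Review note: the selection logic above (cut every metal bond, split what remains into components, then count how many atoms of each component were bonded to a metal) can be tried without a CSD licence. The following is a minimal sketch in plain Python, not the script's implementation. The `METALS` set, the `classify_components` helper and the toy atom labels are all assumptions made for illustration, and the solvent SMILES check is left out.

```python
from collections import defaultdict

# Illustrative subset of metals only; the real script relies on the
# CSD Python API's atom.is_metal instead of a hard-coded set.
METALS = {"Zn", "Cu", "Co", "Ni", "Fe", "Zr"}


def classify_components(atoms, bonds):
    """Return [(sorted atom labels, donor atom count)] for each non-metal component.

    atoms: dict mapping a unique atom label to its element symbol.
    bonds: iterable of (label, label) pairs.
    """
    def is_metal(label):
        return atoms[label] in METALS

    adjacency = defaultdict(set)
    donors = set()  # non-metal atoms bonded directly to a metal
    for a, b in bonds:
        if is_metal(a) or is_metal(b):
            # Metal bonds are cut, but remember which atoms coordinated
            donors.update(x for x in (a, b) if not is_metal(x))
        else:
            adjacency[a].add(b)
            adjacency[b].add(a)

    components, seen = [], set()
    for start in atoms:
        if start in seen or is_metal(start):
            continue
        # Depth-first walk over the remaining (non-metal) bonds
        component, stack = set(), [start]
        while stack:
            current = stack.pop()
            if current not in component:
                component.add(current)
                stack.extend(adjacency[current] - component)
        seen |= component
        components.append((sorted(component), len(component & donors)))
    return components


# Toy fragment: one Zn with a bound water and a chelating carboxylate-like group
atoms = {"Zn1": "Zn", "O1": "O", "H1": "H", "H2": "H", "C1": "C", "O2": "O", "O3": "O"}
bonds = [("Zn1", "O1"), ("O1", "H1"), ("O1", "H2"),
         ("Zn1", "O2"), ("Zn1", "O3"), ("C1", "O2"), ("C1", "O3")]

result = classify_components(atoms, bonds)
# Components with fewer than two donor atoms are what the -m option would strip
to_remove = [labels for labels, donor_count in result if donor_count < 2]
print(to_remove)
```

As in `is_multidentate`, the count is over distinct donor atoms, so a ligand that bridges two metals through a single atom is still treated as monodentate.
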
diff --git a/scripts/mof_solvent_removal_2017_chem_mater_publication/Mercury_MOF_solvent_removal.py b/scripts/mof_solvent_removal_2017_chem_mater_publication/Mercury_MOF_solvent_removal.py
@@ -0,0 +1,98 @@
+#
+# This script can be used for any purpose without limitation subject to the
+# conditions at http://www.ccdc.cam.ac.uk/Community/Pages/Licences/v2.aspx
+#
+# This permission notice and the following statement of attribution must be
+# included in all copies or substantial portions of this script.
+#
+# 2016-12-15: created by S. B. Wiggin, the Cambridge Crystallographic Data Centre
+# 2024-07-02: minor update to include using ccdc utilities to find the solvent file
+
+"""
+Script to identify and remove bound solvent molecules from a MOF structure.
+
+Solvents are identified using a defined list.
+Output in CIF format includes only framework component with all monodentate solvent removed.
+"""
+#######################################################################
+
+import os
+import glob
+
+from ccdc import io
+from ccdc import utilities
+from mercury_interface import MercuryInterface
+
+#######################################################################
+
+helper = MercuryInterface()
+solvent_smiles = []
+
+# Define the solvent smiles patterns
+solvent_file = utilities.Resources().get_ccdc_solvents_dir()
+
+if os.path.isdir(solvent_file):
+    solvent_smiles = [
+        io.MoleculeReader(f)[0].smiles
+        for f in glob.glob(os.path.join(solvent_file, '*.mol2'))
+    ]
+
+else:
+    html_file = helper.output_html_file
+    f = open(html_file, "w")
+    f.write('<br>')
+    f.write('Sorry, unable to locate solvent files in the CCDC directory')
+    f.write('<br>')
+    f.close()
+# a user-defined solvent directory could be added here instead
+
+#######################################################################
+
+
+def is_solvent(c):
+    """Check if this component is a solvent."""
+    return c.smiles == 'O' or c.smiles in solvent_smiles
+
+
+def has_metal(c):
+    """Check if this component has any metals."""
+    return any(a.is_metal for a in c.atoms)
+
+
+entry = helper.current_entry
+if entry.has_3d_structure:
+    # Ensure labels are unique
+    mol = entry.molecule
+    mol.normalise_labels()
+    # Use a copy
+    clone = mol.copy()
+    # Remove all bonds containing a metal atom
+    clone.remove_bonds(b for b in clone.bonds if any(a.is_metal for a in b.atoms))
+    # Work out which components to remove
+    to_remove = [
+        c
+        for c in clone.components
+        if not has_metal(c) and is_solvent(c)
+        ]
+    # Remove the atoms of selected components
+    mol.remove_atoms(
+        mol.atom(a.label) for c in to_remove for a in c.atoms
+    )
+    # Write the CIF
+    entry.crystal.molecule = mol
+    with (io.CrystalWriter('%s/%s_stripped.cif' % (helper.options['working_directory_path'], entry.identifier)) as
+          writer):
+        writer.write(entry.crystal)
+    html_file = helper.output_html_file
+    f = open(html_file, "w")
+    f.write('<br>')
+    f.write('Cif file containing MOF framework without monodentate solvent written to your output directory')
+    f.write('<br>')
+    f.close()
+else:
+    html_file = helper.output_html_file
+    f = open(html_file, "w")
+    f.write('<br>')
+    f.write('Sorry, this script will only work for CSD entries containing atomic coordinates')
+    f.write('<br>')
+    f.close()
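
Review note: the Mercury script writes its HTML report in three near-identical open/write/close blocks. One possible tidy-up, shown here only as a sketch, is a small helper that uses a context manager. The name `write_html_message` is hypothetical, and the temporary path stands in for `helper.output_html_file`, which is only available inside Mercury.

```python
import os
import tempfile


def write_html_message(html_path, message):
    """Write one message, wrapped in <br> tags, to the HTML report file."""
    # The context manager closes the file even if a write fails
    with open(html_path, "w") as handle:
        handle.write("<br>")
        handle.write(message)
        handle.write("<br>")


# Stand-in for helper.output_html_file, which only exists inside Mercury
report_path = os.path.join(tempfile.mkdtemp(), "report.html")
write_html_message(report_path, "Sorry, unable to locate solvent files in the CCDC directory")
```

The file contents are the same as those produced by the explicit `f.write` calls in the script.
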
diff --git a/scripts/mof_solvent_removal_2017_chem_mater_publication/ReadMe.md b/scripts/mof_solvent_removal_2017_chem_mater_publication/ReadMe.md
@@ -0,0 +1,56 @@
+# MOF solvent removal
+
+## Summary
+
+Scripts included in the supporting information of the article "Development of a Cambridge Structural Database Subset:
+A Collection of Metal–Organic Frameworks for Past, Present, and Future", Peyman Z. Moghadam, Aurelia Li,
+Seth B. Wiggin, Andi Tao, Andrew G. P. Maloney, Peter A. Wood, Suzanna C. Ward, and David Fairen-Jimenez
+*Chem. Mater.* **2017**, 29, 7, 2618–2625, DOI: <https://doi.org/10.1021/acs.chemmater.7b00441>
+
+The two scripts are essentially equivalent. One is designed to be run through the Mercury CSD Python API menu to
+remove solvent from a single structure present in the visualiser; the other runs from the command line
+and takes a list of CSD entries (a `.gcd` file) to run through the solvent removal process in bulk.
+
+## Requirements
+
+Tested with CSD Python API 3.9.18
+
+## Licensing Requirements
+
+CSD-Core
+
+## Instructions on running
+
+For the script Mercury_MOF_solvent_removal.py:
+
+- In Mercury, pick **CSD Python API** in the top-level menu, then **Options…** in the resulting pull-down menu.
+- The Mercury Scripting Configuration control window will be displayed; from the *Additional Mercury Script Locations*
+section, use the **Add Location** button to navigate to the folder containing the script.
+- The script can then be run directly from the CSD Python API menu; it operates on the structure
+shown in the visualiser.
+
+For the script Command_prompt_MOF_solvent_removal.py:
+
+```cmd
+python Command_prompt_MOF_solvent_removal.py <search_results>.gcd
+```
+
+```cmd
+positional arguments:
+  input_file            CSD .gcd file from which to read MOF structures
+
+optional arguments:
+  -h, --help            show this help message and exit
+  -o OUTPUT_DIRECTORY, --output-directory OUTPUT_DIRECTORY
+                        Directory into which to write stripped structures
+  -m, --monodentate
+                        Whether or not to strip all unidentate (or monodentate) ligands from the structure
+  -s SOLVENT_FILE, --solvent-file SOLVENT_FILE
+                        Location of solvent file
+```
+
+## Author
+
+*S. B. Wiggin* (2016)
+
+> For feedback or to report any issues please contact [support@ccdc.cam.ac.uk](mailto:support@ccdc.cam.ac.uk)