Challenge #2 – COMPUTATIONAL METHODS

Here is a list of all computational methods used for hit identification in CACHE Challenge #2. Click on the Description for more details. Some participants preferred not to release their publications to stay anonymous at this time.

Description

Method name

Commercial software

Free software

Our goal with this competition is to evaluate our released tools, namely gnina (https://github.com/gnina) and pharmit (http://pharmit.csb.pitt.edu).

gnina+pharmit

https://sourceforge.net/projects/pharmit/ https://github.com/gnina/gnina

Selection of 10,000 compounds: compounds available for purchase (in stock) will be obtained from the ZINC database, and Morgan fingerprints will be computed for them using RDKit with BAMBU (https://pypi.org/project/bambu-qsar/). The number of outliers and the dimensionality of the dataset will be reduced using Principal Component Analysis (PCA), preserving 95% of the variance, followed by the UMAP algorithm, reducing to two dimensions.

RFL-Bambu

AutoDock Vina 1.1.2, AutoDock Tools, Primordia, RDKit, BINANA, PaDEL Descriptors, ZINC Database, PDB Database, AlphaFold EBI Database, GROMACS, PyMOL, VMD, Python and Biopython
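The PCA step in the RFL-Bambu entry above keeps enough components to preserve 95% of the variance. A minimal sketch of that selection rule follows, assuming the per-component variances (eigenvalues) have already been computed from the fingerprint matrix. The function name and the toy eigenvalues are illustrative and not taken from the participants' workflow.

```python
def components_for_variance(eigenvalues, target=0.95):
    """Return the smallest number of leading components whose cumulative
    explained-variance ratio reaches `target`."""
    if not eigenvalues:
        raise ValueError("eigenvalues must be non-empty")
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return k
    return len(ordered)


# Hypothetical variances of five principal components.
toy_eigenvalues = [60, 30, 6, 3, 1]
n_kept = components_for_variance(toy_eigenvalues)
print(n_kept)
```

In practice a library routine would normally do this. For example, scikit-learn's PCA accepts a fractional `n_components` such as 0.95 for the same purpose.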

Our hit identification workflow combines physics-based cheminformatics methods with novel machine learning algorithms. First, we employ fragment-based virtual screening with significant speed-ups from our novel pharmacophore matching algorithm. Second, we enrich the pool of potential hits with de novo generated drug-like candidates. These candidates are then ranked and refined using sequential binding affinity estimation techniques of increasing accuracy.

Hit Stream

Molecular Operating Environment (MOE)
Glide (Schrödinger)
Q-Chem
Gaussian

AutoDock Vina
Protein-Ligand ANT System (PLANTS)
GROMACS
Dock 3.7 (Kuntz Group UCSF)
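The Hit Stream entry above ranks candidates with sequential affinity estimates of increasing accuracy. One way to picture such a funnel is sketched below. The scoring functions, compound names, scores and keep fractions are all hypothetical placeholders, and lower scores are assumed to be better, as with docking energies.

```python
def tiered_ranking(candidates, scorers, keep_fractions):
    """Apply scorers from cheapest to most accurate, keeping only the
    best-scoring fraction of candidates after each tier."""
    if len(scorers) != len(keep_fractions):
        raise ValueError("need one keep fraction per scorer")
    survivors = list(candidates)
    for scorer, fraction in zip(scorers, keep_fractions):
        ranked = sorted(survivors, key=scorer)  # lower score = better
        n_keep = max(1, int(len(ranked) * fraction))
        survivors = ranked[:n_keep]
    return survivors


# Hypothetical (fast estimate, slower estimate) scores per compound.
toy_scores = {
    "cpd_a": (-7.0, -9.1),
    "cpd_b": (-6.5, -9.8),
    "cpd_c": (-5.0, -10.0),
    "cpd_d": (-8.0, -7.5),
}


def fast_score(name):
    return toy_scores[name][0]


def slow_score(name):
    return toy_scores[name][1]


finalists = tiered_ranking(toy_scores, [fast_score, slow_score], [0.5, 0.5])
print(finalists)
```

The expensive scorer only sees compounds that survived the cheap tier. The cost is that a compound the cheap scorer misjudges (here `cpd_c`) is lost before the more accurate estimate is ever computed.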

Using our expertise in medicinal chemistry, structural biology, cheminformatics, machine learning (ML) and structure-based drug design (SBDD), we will generate hits for the RNA-binding site of the SARS-CoV-2 NSP13 helicase.

Hybrid

In-house

GROMACS and AMBER (if required).

The hit identification and drug discovery strategy consists of high-throughput docking to identify modulators of the NSP13 helicase of SARS-CoV-2.

Hybrid: High-throughput docking coupled with reevaluation of top hits & docked poses

Schrödinger's Drug Discovery Suite, BioSolveIT SeeSAR, MolSoft ICM.

None.

Abstract

Ultra-Large Scale Virtual Screening & Docking

Schrödinger SMD Suite, GRID, FLAP, BioGPS

CmDock, PyMOL, Q, R, Python, RDKit, KNIME, ProBiS

Our approach consists of two general steps, each of which has some flexibility.

One-shot Batch Bayesopt

None

Python, AutoDock Vina, PyTorch, GPyTorch, BoTorch

Our proposed pipeline consists of four steps. As a preliminary step, because four similar PDB structures of the protein are available for this CACHE challenge, we will run unrestrained MD simulations for all four PDB structures and compare the resulting Boltzmann distributions. If no major differences are found, we will limit further steps to PDB 5RLZ.

CMOD Design

Gaussian

OpenMM, OpenForceField, GROMACS, MDAnalysis, AmberTools, AutoDock Vina, LeDock, PLANTS, internally developed machine learning models (MILCDock)
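The CMOD Design entry above compares the distributions sampled by MD runs started from different PDB structures. The entry does not say how the comparison is made. As one possible illustration, the sketch below histograms a single observable from two trajectories and computes their Jensen-Shannon divergence. The choice of metric, the observable and the toy values are all assumptions.

```python
import math


def histogram(values, lo, hi, n_bins):
    """Normalised histogram of `values` over [lo, hi] with equal-width bins."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v <= hi:
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the histogram range")
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical distributions,
    1 for distributions with no overlap."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical pocket-width samples (in angstroms) from two trajectories.
structure_a = [4.1, 4.3, 4.2, 5.0, 4.4]
structure_b = [4.0, 4.2, 4.3, 5.1, 4.5]
p = histogram(structure_a, 3.0, 6.0, 6)
q = histogram(structure_b, 3.0, 6.0, 6)
print(js_divergence(p, q))
```

A value close to 0 would support restricting later steps to a single structure. What counts as a "major difference" remains a judgment call that this sketch does not settle.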

Our team of Computational Chemists and Machine Learning experts is part of a Science CRO that has continuous impact on the drug discovery community by collaborating with big pharma and incubating biotechs. In drug discovery projects, we prioritize compounds based on our million-scale in-house compound database, which includes structures, bioactivities, and PhysChem data.

Hydration site analysis guided virtual screening campaign

Molsoft ICM-Pro

CCG MOE

AMBER

KNIME server

Python + libraries (OpenMM, RDKit, pandas, matplotlib, numpy)

PyRod

The project will begin with a structure-based analysis of the RNA binding cavity of NSP13, based on the crystal structure 7KRN, using molecular dynamics simulations together with in-house program PyRod [1,2] to sample interaction points in the binding pocket. Briefly, PyRod traces water molecules in protein binding cavities and generates dynamic maps describing the interaction patterns of the water molecules with respect to the protein.

Dynamic 3D Pharmacophores

InteLigand - LigandScout

CCG - MOE

Schrodinger - Desmond

CCDC - GOLD

OpenEye - Szybki

PyRod

De novo hit identification will be pursued using a fragment growing/linking approach, followed by free energy calculations (if time permits). Designed compounds will be used as targets in a similarity screen of the Enamine REAL Database catalog, or synthesized directly in house.

FEgrow

N/A

FEgrow: https://github.com/cole-group/FEgrow
gnina: https://github.com/gnina/gnina
DeLinker: https://github.com/oxpig/DeLinker
RDKit: https://github.com/rdkit/rdkit
SOMD: https://github.com/michellab/Sire

We present an end-to-end lead optimization system for discovery based on an AI-gym environment called "Reinforcement Learning for Molecular Modeling" (RLMM). RLMM automates running fully customizable molecular dynamics simulations inside an agent-based molecular design protocol. RLMM is fully autonomous: from a single starting ligand, protein structure, and configuration file, RLMM cycles through designs for lead optimization informed by physics-based simulations.

RLMM

OpenEye

RDKit, OpenMM, AMBER20

We will identify the most conserved residues of the NSP13 RNA-binding tunnel where there are co-crystallized fragments (PDB: 5RML, 5RMM, 5RLZ and 5RLH) by performing multiple sequence alignment (MSA) with the Kalign algorithm on approximately 200,000 SARS-CoV-2 NSP13 sequences from the NCBI. We will determine the amino acids close to these fragments that can form interactions with the predicted hit molecules.

Tiered screening incorporating molecular shape, pharmacophore features, docking and clustering

Molecular Operating Environment (MOE) by the Chemical Computing Group

Kalign
In-house MoPBS pharmacophore generation software
In-house VS streamlining software DataPype
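The tiered screening entry above starts by finding conserved residues in a multiple sequence alignment. A minimal sketch of a per-column conservation score follows, assuming the alignment has already been produced (for example by Kalign) and loaded as equal-length strings. The scoring rule, the 0.9 threshold and the toy sequences are illustrative choices, not the participants' method.

```python
from collections import Counter


def column_conservation(aligned_sequences):
    """For each alignment column, return the fraction of sequences that carry
    the most common residue. Gaps ('-') count against conservation."""
    if not aligned_sequences:
        raise ValueError("alignment is empty")
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for column in zip(*aligned_sequences):
        residues = [r for r in column if r != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


def conserved_positions(aligned_sequences, threshold=0.9):
    """Zero-based column indices whose conservation meets the threshold."""
    scores = column_conservation(aligned_sequences)
    return [i for i, s in enumerate(scores) if s >= threshold]


# Toy alignment; real input would hold roughly 200,000 NSP13 sequences.
toy_alignment = ["MKTA", "MKSA", "MK-A", "MRTA"]
print(column_conservation(toy_alignment))
print(conserved_positions(toy_alignment))
```

Column indices refer to the alignment, so they would still need to be mapped back to residue numbers in the crystal structure before being compared with the fragment positions.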

The in-stock 3D molecules from the ZINC20 database or Mcule Purchasable molecules will be subjected to common filters after duplicates are removed, and conformers will then be generated.

Evolutionary chemical binding similarity

BIOVIA Discovery Studio Client

RDKit, AutoDock Vina, AutoDock, graphDelta, GROMACS, gmx_MMPBSA

We propose to apply a massive library screening workflow which exhaustively screens the 4.5 billion compound Enamine REAL database using a deep-learning-based Drug Target Interaction (DTI) prediction engine to identify molecules likely to bind to the RNA binding site of NSP13 helicase of SARS-CoV-2.

Massive Library Screening using Structurally-Augmented Drug-Target Interaction (DTI) prediction models

MatchMaker (Cyclica Inc.)

Python-based ML stack (PyTorch, scikit-learn)
BioPython computational biology toolkit
RDKit computational chemistry toolkit
Various structural biology tools for structural analysis and visualization, including P2Rank, NGL viewer, AutoDock Vina.
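Exhaustively scoring a multi-billion-compound library, as in the DTI entry above, rules out holding every prediction in memory. A minimal sketch of keeping only the best-scoring compounds from a stream is shown below. The `predict` callable stands in for a DTI model and is hypothetical, as are the compound names and scores.

```python
import heapq


def top_k_stream(compounds, predict, k):
    """Return the k highest-scoring (score, compound) pairs from an iterable,
    holding only about k items in memory at any time."""
    scored = ((predict(c), c) for c in compounds)  # lazy, one item at a time
    return heapq.nlargest(k, scored)


# Hypothetical predicted binding probabilities.
toy_predictions = {"cpd_a": 0.2, "cpd_b": 0.9, "cpd_c": 0.5, "cpd_d": 0.7}

best = top_k_stream(iter(toy_predictions), toy_predictions.get, k=2)
print(best)
```

Because the input is consumed lazily, the same function would work with a generator that reads compounds from disk in chunks.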

For the hit identification we rely on the combination of different methods used and developed in our group. The workflow follows the visual inspection of all structures together with quality assessment in order to choose the most suitable virtual screening workflow.

Expansion of the co-crystallized fragments is planned to be done in parallel from two main approaches:

Artificial Intelligence-driven Optimization of Ligands (AIOLI)

Pipeline Pilot (BioVia)
Spark, Ignite (Cresset)
Glide, ABFEP (Schrodinger)
FlexX, Ftrees, SpaceLight (BioSolveIT)

I have developed a genetic algorithm (GA) that can search Enamine Real Space and will use it to find molecules with good docking scores to the target.

Synthon-GA

Glide, SmallWorld, FTrees

Synthon-GA
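The Synthon-GA entry above describes a genetic algorithm that searches a synthon-based space for molecules with good docking scores. A minimal sketch of the general idea follows, assuming a two-component reaction in which a molecule is a pair of synthon indices. The scoring function is a toy stand-in for docking, and every name and parameter here is hypothetical rather than taken from the actual implementation.

```python
import random


def run_synthon_ga(n_a, n_b, score, pop_size=20, generations=30,
                   mutation_rate=0.2, seed=0):
    """Search pairs (a, b) of synthon indices for a low score.
    Lower scores are treated as better, as with docking energies."""
    if pop_size < 4:
        raise ValueError("pop_size must be at least 4")
    rng = random.Random(seed)
    population = [(rng.randrange(n_a), rng.randrange(n_b))
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score)
        parents = population[: pop_size // 2]  # elitism: best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            child = [mother[0], father[1]]  # crossover: recombine synthons
            if rng.random() < mutation_rate:
                slot = rng.randrange(2)  # mutation: swap in a random synthon
                child[slot] = rng.randrange((n_a, n_b)[slot])
            children.append(tuple(child))
        population = parents + children
    return min(population, key=score)


def toy_docking_score(pair):
    """Hypothetical score with its optimum at synthon pair (7, 3)."""
    return abs(pair[0] - 7) + abs(pair[1] - 3)


best_pair = run_synthon_ga(50, 40, toy_docking_score)
print(best_pair, toy_docking_score(best_pair))
```

The point of the design is that only the molecules visited by the population are ever scored, so the full product of the two synthon lists is never enumerated.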

Our approach combines the expertise of Kozakov Lab at Stony Brook and Tropsha Lab at UNC. Our workflow uses several complementary modules for identification of high-affinity hits for a given protein target with a known 3D structure. Identification of binding site hot-spot information together with conventional structure-based virtual screening methods are enabling components of our hit selection approach.

Frag2Hits

Glide, to additionally verify nominated hits

Autodock, FTMap, LigTBM, ReLeaSE

We developed a multi-scale and multi-task neural network to learn binding poses and binding affinities between compounds and proteins. The model takes geometric graph representations of compounds and proteins as input. The compound is processed by a physics-driven graph neural network that integrates geometry and momentum information into the topological structure, while the protein is processed by a multi-scale graph neural network that connects surface to structure and sequence.

Multi-scale Drug-Protein Interaction prediction (M-DPI)

Python, Torch, RDKit, Biopython, P2Rank

FRASE-bot is a computational platform enabling de novo construction of small-molecule ligands directly in the binding pocket of a target protein. It makes use of machine learning to distill 3D information relevant to the protein of interest from thousands of 3D protein-ligand complexes in the Protein Data Bank (PDB) and respective structure-activity relationships (SAR).

FRASE-bot

Schrodinger, Pipeline Pilot

RDKit, Keras/TensorFlow

Our team will recommend compounds predicted to bind to the RNA-binding site of the SARS-CoV-2 helicase NSP13 and with potential for subsequent medicinal chemistry optimization. To this end, we will first filter a set of commercially available molecules (including those suggested in the CACHE guidelines) to reduce potential safety liabilities and undesired chemical reactivity and maximize lead-likeness. This step will also considerably reduce the chemical space that needs to be considered.

SCORCH screening pipeline

StarDrop

Autodock, PSOVina2, GWOVina, RF-Score-VS v2, SCORCH, Osiris DataWarrior, PDB2PQR, OpenBabel, RDKit

Modular synthon-based approach - V-SYNTHES was published in Nature 601, 452–459 (2022). It first identifies the best scaffold–synthon combinations as seeds suitable for further growth, and then iteratively elaborates these seeds to select complete molecules with the best docking scores.

V-SYNTHES

ICM-Pro is provided by MolSoft.

RDKit, KNIME
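The V-SYNTHES entry above describes a two-stage idea: score partially built molecules first, then elaborate only the best seeds into complete molecules. The sketch below illustrates that general pattern for a two-position reaction. It is not the published V-SYNTHES implementation. The additive toy score stands in for docking, `None` stands in for a capped (unfilled) position, and all synthon names are hypothetical.

```python
def seed_and_grow(synthons_r1, synthons_r2, score, n_seeds=2, n_hits=3):
    """Two-stage search: rank half-built seeds, then fully enumerate only
    the best seeds. Lower scores are treated as better."""
    # Stage 1: seeds carry one real synthon and one capped position (None).
    seeds = [(a, None) for a in synthons_r1] + [(None, b) for b in synthons_r2]
    best_seeds = sorted(seeds, key=score)[:n_seeds]

    # Stage 2: fill the capped position of each selected seed.
    complete = set()
    for a, b in best_seeds:
        if b is None:
            complete.update((a, other) for other in synthons_r2)
        else:
            complete.update((other, b) for other in synthons_r1)
    return sorted(complete, key=score)[:n_hits]


# Hypothetical per-synthon score contributions.
toy_contrib = {"A1": -3.0, "A2": -1.0, "A3": -2.0,
               "B1": -0.5, "B2": -4.0, "B3": -1.5}


def toy_score(molecule):
    """Additive toy score; capped positions contribute nothing."""
    return sum(toy_contrib[s] for s in molecule if s is not None)


hits = seed_and_grow(["A1", "A2", "A3"], ["B1", "B2", "B3"], toy_score)
print(hits)
```

With an additive score the seeds predict the best complete molecules exactly. A real docking score is not additive, so seed selection acts as a heuristic filter and not as a guarantee.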

Foldit is a crowd-sourced molecular biology game. For this challenge, Foldit players will use the graphical small-molecule design tools to manually add atoms, bonds and fragments to a starting ligand within the binding pocket (derived from the crystal structures with starting fragments) to optimize the designed ligand for binding to the protein pocket.

Drugit

Foldit/Rosetta/RDKit/ZINC API/BCL/OpenBabel

We will identify hit compounds using a Structural Systems Pharmacology (SSP) scheme (1-2). The core of this scheme is the function-site interaction fingerprint (Fs-IFP) approach (3), which we use to explore structural insights into binding sites across the whole structural proteome. Additionally, the SSP scheme combines MD simulations, free energy calculations, and machine learning models.

Function-site interaction fingerprint method

IChem (from Dr. Rognan group)
AutoDock Vina (docking)
ACEMD (MD simulations)