Using our expertise in medicinal chemistry, structural biology, cheminformatics, machine learning (ML), and structure-based drug design (SBDD), we will generate hits for the RNA-binding site of the SARS-CoV-2 NSP13 helicase.
Researchers at Diamond Light Source Ltd., the Structural Genomics Consortium, the University of Toronto, and the University of Johannesburg published a study [1] noting that the NSP13 RNA-binding pocket is not only highly conserved across multiple coronaviruses but also highly druggable, making it an excellent candidate for the development of antiviral therapeutics. We will compile a list of the available NSP13 protein structures from the Protein Data Bank (PDB), with (e.g. 5RLH, 5RMA, 5RML, 5RLZ, 5RMM) or without (e.g. 7KRO, 6XEZ, 5RL9, 5RLR, 5RLJ, 5RLI, 5RM2, 5RM7, 5RLW) ligands bound to the RNA-binding site. We will shortlist those amenable to virtual screening (VS), i.e. those with sufficient resolution, high completeness, and low B-factors. The apo protein structures will help us assess protein flexibility upon ligand binding and may point to key interactions. We will further analyze these structures, noting the ligand chemotypes, variations in amino acid side-chain conformations, and the presence of conserved water molecules. If the available experimental structures provide insufficient information on water molecules, we will perform molecular dynamics (MD) simulations of NSP13 to identify regions with high water occupancy.
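The structure-quality triage described above can be sketched as follows. This is a minimal illustration, not our platform's code: it assumes legacy PDB-format text as input, reads the resolution from the REMARK 2 record, and averages the B-factor column of the ATOM records. The function names `resolution` and `mean_b_factor` are hypothetical, and any acceptance thresholds would still need to be chosen per project.

```python
def resolution(pdb_text):
    """Return the resolution in angstroms from the REMARK 2 record, or None."""
    for line in pdb_text.splitlines():
        if line.startswith("REMARK   2 RESOLUTION."):
            tokens = line.split()
            try:
                return float(tokens[3])
            except (IndexError, ValueError):
                # e.g. "NOT APPLICABLE." for non-crystallographic entries
                return None
    return None


def mean_b_factor(pdb_text, chain=None):
    """Mean isotropic B-factor over ATOM records, optionally for one chain.

    Only ATOM records are read, so HETATM entries (ligands, waters)
    are skipped. Column positions follow the fixed-width PDB format:
    chain ID in column 22, temperature factor in columns 61-66.
    """
    values = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        if chain is not None and line[21] != chain:
            continue
        values.append(float(line[60:66]))
    if not values:
        raise ValueError("no matching ATOM records")
    return sum(values) / len(values)
```

Restricting the average to one chain, or to residues lining the RNA-binding site, would give a more local picture of how well ordered the pocket is than a whole-structure mean.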
If we find more than one protein structure to be suitable for our virtual screening campaign, we will carry multiple structures forward into our benchmarking and sanity checks. We will assess how well our docking program reproduces poses from relevant PDB structures, and modify our approach as required. Using our rigid, semi-flexible, and fully flexible docking approaches, we will evaluate which protein conformation(s) perform best in self- and cross-docking studies. Once the sanity checks and benchmarking are complete, we will have our model(s) for prospective use.
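Pose reproduction in self- and cross-docking is commonly summarized by the heavy-atom RMSD between the docked and crystallographic ligand poses. The sketch below is an illustration under stated assumptions: both poses are in the same receptor frame, atoms are listed in the same order, and no symmetry correction is applied. The 2.0 Å success cutoff is a widely used convention that we assume here, not a value fixed by this proposal, and the names `pose_rmsd` and `success_rate` are hypothetical.

```python
import math


def pose_rmsd(predicted, reference):
    """RMSD between two poses given as equal-length lists of (x, y, z)."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("poses must be non-empty and of equal length")
    squared = sum(
        (p - r) ** 2
        for atom_p, atom_r in zip(predicted, reference)
        for p, r in zip(atom_p, atom_r)
    )
    return math.sqrt(squared / len(predicted))


def success_rate(rmsds, cutoff=2.0):
    """Fraction of docking runs whose RMSD is at or below the cutoff."""
    if not rmsds:
        raise ValueError("no RMSD values supplied")
    return sum(r <= cutoff for r in rmsds) / len(rmsds)
```

Computing `success_rate` separately for each receptor conformation and each docking mode (rigid, semi-flexible, fully flexible) would give a simple table for comparing them.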
Using our in-house drug discovery and cheminformatics platform (peer-reviewed, proprietary code), we will identify a suitable subset of compounds from the Enamine REAL database using filters that follow medicinal chemistry standards and the CACHE white paper guidelines. To ensure structural diversity, we will cluster this set using ECFP4 fingerprints. Our approach is rooted in docking this filtered set of small molecules to the chosen protein model(s) using our state-of-the-art docking program, which considers protein flexibility, displaceable water molecules, and protein-ligand complementarity inside the active site. We are confident in the predicted poses (especially after our retrospective analysis), and we envision several avenues for scoring and selecting compounds for testing.
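To illustrate the fingerprint-based diversity step, the sketch below clusters compounds by Tanimoto similarity using a simple leader algorithm. It is not our platform's implementation: the leader algorithm is swapped in for brevity, each fingerprint is assumed to be a set of on-bit indices (as an ECFP4 generator in a cheminformatics toolkit would produce), and the similarity threshold of 0.4 is a placeholder.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def leader_cluster(fingerprints, threshold=0.4):
    """Assign each compound to the first cluster leader it resembles.

    Returns a list of (leader_index, member_indices) tuples. A compound
    that is not similar enough to any existing leader starts a new cluster.
    """
    clusters = []
    for i, fp in enumerate(fingerprints):
        for leader, members in clusters:
            if tanimoto(fp, fingerprints[leader]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((i, [i]))
    return clusters
```

Taking one or a few representatives per cluster would then yield a structurally diverse subset for docking. The result depends on input order, so sorting compounds beforehand (for example by a property of interest) makes the choice of leaders reproducible.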
CACHE Challenge #2 will enable us to test multiple approaches and hypotheses simultaneously against a second target, following the similar multi-pronged approach we used during CACHE Challenge #1:
1. We will pick the top-scoring molecules as ranked by our docking scoring function.
2. We will use a machine learning (ML) algorithm based on Graph Neural Networks (GNN) to predict the docking scores of molecules (an approach proposed in recent literature). In this second approach, we will consider two to three orders of magnitude more molecules and then prioritize the high-ranking compounds for our comparatively resource-intensive docking algorithm.
3. We will employ a Quantum Mechanics-Based Scoring Function (QMSF) on molecules not picked by the first two approaches, ranking them by their calculated relative free energy of binding, which is more accurate than a standard docking score.
4. Finally, our team will participate in a “hit-picking party”, wherein we will visualize the predicted poses and make a human-based selection following discussion and critique.
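The triage step of the ML-based approach above, in which predicted scores decide which compounds proceed to full docking, can be sketched as below. This is a hedged illustration only: `predicted_scores` stands in for the output of a trained surrogate model, the function name `triage` is hypothetical, and we assume that lower (more negative) scores indicate better predicted binding.

```python
import heapq


def triage(predicted_scores, n_dock):
    """Return the IDs of the n_dock compounds with the lowest predicted score.

    predicted_scores maps compound ID -> surrogate-predicted docking score.
    Assumes lower scores are better; flip the sign first if the scoring
    function in use follows the opposite convention.
    """
    best = heapq.nsmallest(n_dock, predicted_scores.items(), key=lambda kv: kv[1])
    return [compound_id for compound_id, _ in best]
```

Because `heapq.nsmallest` keeps only `n_dock` candidates in memory at a time, the same pattern extends to very large libraries when the scores are supplied as an iterator rather than a dictionary.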
For the hit SAR stage, this workflow will change as follows: we will search for analogues of the hits in the filtered set using the 2D analogue search module available on our platform. We will then repeat the steps outlined above with the new, focused library.
Ultimately, we aim to establish a list of pros and cons for incorporating ML into physics-based SBDD approaches. After each approach, we will assess the essential interactions with NSP13, based on available structures and the literature, as well as the overall fit. In total, 100 top-ranking compounds from the multiple approaches will be selected for testing, distributed as evenly as possible across methods. "Computational negative controls" may also be selected to support our hypotheses. In line with the SGC/CACHE principles, we will document our research progress and publicize it for all to follow and reference; we are taking a research-centered focus with this opportunity. We hope that sharing our findings will help guide future efforts in SBDD.
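One way to distribute the selected compounds evenly across methods, as described above, is a round-robin pass over each method's ranked list that skips compounds already chosen by another method. The sketch below is an assumption about how such a selection could be organized, not a description of our final procedure; the function name `select_evenly` and the method labels are hypothetical.

```python
def select_evenly(ranked_lists, total=100):
    """Pick up to `total` unique compounds, cycling through the methods.

    ranked_lists maps method name -> list of compound IDs, best first.
    Returns (compound_id, method) pairs in the order they were picked.
    """
    selected = []
    seen = set()
    positions = {method: 0 for method in ranked_lists}
    while len(selected) < total:
        progressed = False
        for method, ranking in ranked_lists.items():
            if len(selected) >= total:
                break
            i = positions[method]
            # Skip compounds that another method has already contributed.
            while i < len(ranking) and ranking[i] in seen:
                i += 1
            if i < len(ranking):
                seen.add(ranking[i])
                selected.append((ranking[i], method))
                i += 1
                progressed = True
            positions[method] = i
        if not progressed:
            break  # every ranked list is exhausted
    return selected
```

Recording the contributing method alongside each compound keeps the later comparison of hit rates per method straightforward.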
[1] Newman, J.A., Douangamath, A., Yadzani, S. et al. Structure, mechanism and crystallographic fragment screening of the SARS-CoV-2 NSP13 helicase. Nat Commun 12, 4848 (2021). https://doi.org/10.1038/s41467-021-25166-6