Hit Identification

Method type (check all that apply)
Deep learning, High-throughput docking, Physics-based, Hybrid of the above
Hybrid of the above

Description of your approach (min 200 and max 800 words)

Our hit identification technique involves three stages.

The first stage aggregates a list of initial compounds. We use our in-house database of in-stock and on-demand compounds from various vendors (Mcule, Enamine, etc.) and aggregators (ZINC) (see citations). We also use a database of synthetically accessible compounds (SAVI) generated by computationally applying known synthetic reaction pathways. Together these sources yield over a billion compounds, each either in stock and available for purchase or reachable by a known reaction pathway from in-stock building blocks. We generally limit ourselves to in-stock compounds but will include compounds that require simple reaction pathways if they score extraordinarily well.

The second stage simulates the protein target and identifies a binding site. Using DeepDriveMD, we simulate the available structures for microseconds and use an anharmonic conformational analysis-enabled autoencoder to sample the state space, producing a series of static conformations to dock against. We also use the transition information to elucidate a binding site. The result is an ensemble of protein structures.

In the third stage, we run a state-of-the-art commercial docking protocol on HPC systems using our scalable workflow environment. After docking the initial seed set of in-stock, orderable compounds, we train a deep learning model to act as a surrogate that is 50,000x faster than docking. With this surrogate, we screen the remaining billion compounds from a make-on-demand database such as Enamine REAL or SAVI. A short list is then chosen from these two lists (the docked in-stock compounds and the surrogate-scored make-on-demand compounds) by sampling from clusters of compounds with high surrogate scores and of in-stock compounds with high-quality docked poses.
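The cluster-based shortlist selection described above can be sketched as follows. This is a minimal illustration, not our production code: the `Scored` record, the `shortlist` function, the score cutoff, and the "best N per cluster" rule are all hypothetical, and it assumes that cluster labels have already been computed and that lower (more negative) scores are better.

```python
from collections import defaultdict
from typing import NamedTuple


class Scored(NamedTuple):
    compound_id: str  # hypothetical identifier
    cluster: int      # precomputed cluster label (assumption)
    score: float      # docking or surrogate score; lower is better (assumption)


def shortlist(compounds, score_cutoff, per_cluster):
    """Keep compounds scoring at or below the cutoff, then take the
    best `per_cluster` compounds from each remaining cluster."""
    by_cluster = defaultdict(list)
    for c in compounds:
        if c.score <= score_cutoff:
            by_cluster[c.cluster].append(c)

    picks = []
    for cluster in sorted(by_cluster):
        # Sort ascending so the most favorable scores come first.
        best = sorted(by_cluster[cluster], key=lambda c: c.score)
        picks.extend(best[:per_cluster])
    return picks


if __name__ == "__main__":
    library = [
        Scored("cmpd-a", 0, -9.5),
        Scored("cmpd-b", 0, -10.1),
        Scored("cmpd-c", 1, -8.2),
        Scored("cmpd-d", 1, -7.0),
    ]
    for pick in shortlist(library, score_cutoff=-8.0, per_cluster=1):
        print(pick.compound_id, pick.cluster, pick.score)
```

Taking a fixed number of compounds per cluster, rather than the global top N, keeps the shortlist chemically diverse even when one cluster dominates the best scores.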
We then dock the compounds scored by the deep learning model across the ensemble of structures to verify the model's predictions. Lastly, compounds from each cluster are resimulated and run through DeepDriveMD to determine whether any of them cause a significant change in protein dynamics or show decoy-like behavior (poor free energy scores, or the ligand drifting away from the binding site). We use this information to select compounds that induce notable changes in the protein state space, which indicates that interaction is likely.

Method Name
Ensemble-Based Docking

Commercial software packages used
OpenEye Toolkit

Free software packages used
OpenMM, RDKit, PyTorch

Relevant publications of previous uses by your group of this software/method

Computational Docking
- High Throughput Virtual Screening and Validation of a SARS-CoV-2 Main Protease Non-Covalent Inhibitor. Journal of Chemical Information and Modeling, 2022.
- IMPECCABLE: Integrated Modeling PipelinE for COVID Cure by Assessing Better LEads. 50th International Conference on Parallel Processing (ICPP '21).
- Scalable HPC and AI Infrastructure for COVID-19 Therapeutics. Platform for Advanced Scientific Computing (PASC '21).
- Pandemic Drugs at Pandemic Speed: Accelerating COVID-19 Drug Discovery with Hybrid Machine Learning- and Physics-based Simulations on High Performance Computers. Interface Focus, 2021.

HPC Screening of Compounds
- Targeting SARS-CoV-2 with AI- and HPC-enabled lead generation: a first data release. PASC '21.

Deep Drive MD (State space sampling)
- AI-driven multiscale simulations illuminate mechanisms of SARS-CoV-2 spike dynamics. The International Journal of High-Performance Computing Applications, Gordon Bell Special Prize for HPC-Based COVID-19 Research '20.
- Stream-AI-MD: Streaming AI-driven Adaptive Molecular Simulations for Heterogeneous Computing Platforms. Platform for Advanced Scientific Computing (PASC '21).