Advances and critical assessment of machine learning techniques for prediction of docking scores
Three machine learning (ML) approaches (TensorFlow, XGBoost, and SchNetPack) are used to predict inhibitory potential, expressed as a docking score, towards SARS-CoV-2. The ML training and test sets are drawn from the ZINC15 compound database. The proposed ML models are evaluated on their prediction accuracy, screening potential, and error estimation. Prediction errors are analyzed with respect to compound size, charge, and docking score, and possible improvements to the ML predictions are discussed.
Abstract
Here we present three distinct machine learning (ML) approaches (TensorFlow, XGBoost, and SchNetPack) for docking score prediction. AutoDock Vina is used to evaluate the inhibitory potential of the ZINC15 in-vivo and in-vitro-only sets towards the SARS-CoV-2 main protease. The in-vivo set (59 884 compounds) is used for ML training (max. 80%), validation (5%), and testing (15%). The in-vitro-only set (174 014 compounds) is used to evaluate the predictive capability of the trained ML models. Contributions to the prediction error are analyzed with respect to each compound's charge, number of atoms, and expected inhibitory potential (docking score). Methods for estimating the prediction error for new compounds are considered but, after critical assessment, rejected. Weighting the ML input with respect to the desired property (i.e., a low docking score) proves to be a promising option for improving ML performance. The proposed models significantly reduce the number of compounds of interest that need to be investigated.
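To make the data partitioning and the input weighting concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation. The function names `split_indices` and `low_score_weights`, the random seed, the exponential form of the weights, and the parameter `beta` are all illustrative assumptions; the abstract states only the split proportions and that the input is weighted towards low docking scores.

```python
import math
import random


def split_indices(n_compounds, train_fraction=0.80, val_fraction=0.05,
                  test_fraction=0.15, seed=0):
    """Shuffle compound indices and cut them into train/validation/test parts.

    train_fraction may be set below 0.80 to mimic smaller training sets;
    the indices left over in that case are simply not used.
    """
    if train_fraction + val_fraction + test_fraction > 1.0 + 1e-12:
        raise ValueError("fractions must not sum to more than 1")
    indices = list(range(n_compounds))
    random.Random(seed).shuffle(indices)  # hypothetical seed, for reproducibility
    n_test = round(test_fraction * n_compounds)
    n_val = round(val_fraction * n_compounds)
    # Training takes at most what is left once validation and test are reserved.
    n_train = min(round(train_fraction * n_compounds),
                  n_compounds - n_val - n_test)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:n_test + n_val + n_train]
    return train, val, test


def low_score_weights(docking_scores, beta=0.5):
    """Assumed weighting scheme: exponentially favor low (better) docking scores.

    The best-scoring compound gets the largest weight; the weights are then
    rescaled so that their mean is 1.
    """
    best = min(docking_scores)
    raw = [math.exp(-beta * (score - best)) for score in docking_scores]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


if __name__ == "__main__":
    train, val, test = split_indices(59884)
    print(len(train), len(val), len(test))
    print(low_score_weights([-9.0, -7.0, -5.0]))
```

Weights of this kind could, for example, be passed as per-sample weights when fitting a regressor; whether the study applied its weighting in this way is not stated in the abstract.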