Prediction of HIV-1 protease resistance using genotypic, phenotypic, and molecular information with artificial neural networks

Tunc H., Doğan B., Kiraz B. N. D., Sarı M., Durdagi S., Kotil S.

PeerJ, vol.11, 2023 (SCI-Expanded) identifier identifier identifier

  • Publication Type: Article / Article
  • Volume: 11
  • Publication Date: 2023
  • Doi Number: 10.7717/peerj.14987
  • Journal Name: PeerJ
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, BIOSIS, CAB Abstracts, EMBASE, Veterinary Science Database, Directory of Open Access Journals
  • Keywords: Machine learning, Artificial neural networks, HIV, AIDS, Drug resistance, Protease inhibitors
  • Yıldız Technical University Affiliated: No


Drug resistance is a primary barrier to effective treatments of HIV/AIDS. Calculating quantitative relations between genotype and phenotype observations for each inhibitor with cell-based assays requires time and money-consuming experiments. Machine learning models are good options for tackling these problems by generalizing the available data with suitable linear or nonlinear mappings. The main aim of this study is to construct drug isolate fold (DIF) change-based artificial neural network (ANN) models for estimating the resistance potential of molecules inhibiting the HIV-1 protease (PR) enzyme. Throughout the study, seven of eight protease inhibitors (PIs) have been included in the training set and the remaining ones in the test set. We have obtained 11,803 genotype-phenotype data points for eight PIs from Stanford HIV drug resistance database. Using the leave-one-out (LVO) procedure, eight ANN models have been produced to measure the learning capacity of models from the descriptors of the inhibitors. Mean R2 value of eight ANN models for unseen inhibitors is 0.716, and the 95% confidence interval (CI) is [0.592–0.840]. Predicting the fold change resistance for hundreds of isolates allowed a robust comparison of drug pairs. These eight models have predicted the drug resistance tendencies of each inhibitor pair with the mean 2D correlation coefficient of 0.933 and 95% CI [0.930–0.938]. A classification problem has been created to predict the ordered relationship of the PIs, and the mean accuracy, sensitivity, specificity, and Matthews correlation coefficient (MCC) values are calculated as 0.954, 0.791, 0.791, and 0.688, respectively. Furthermore, we have created an external test dataset consisting of 51 unique known HIV-1 PR inhibitors and 87 genotype-phenotype relations. Our developed ANN model has accuracy and area under the curve (AUC) values of 0.749 and 0.818 to predict the ordered relationships of molecules on the same strain for the external dataset. The currently derived ANN models can accurately predict the drug resistance tendencies of PI pairs. This observation could help test new inhibitors with various isolates.