A benchmark of RNA-seq data normalization methods for transcriptome mapping on human genome-scale metabolic networks


LÜLECİ H. B., Uzuner D., Cesur M. F., Ilgun A., Duz E., Abdik E., ...

NPJ SYSTEMS BIOLOGY AND APPLICATIONS, vol.10, no.1, 2024 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 10 Issue: 1
  • Publication Date: 2024
  • DOI: 10.1038/s41540-024-00448-z
  • Journal Name: NPJ SYSTEMS BIOLOGY AND APPLICATIONS
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Affiliated with Yıldız Technical University: No

Abstract

Genome-scale metabolic models (GEMs) cover the entire list of metabolic genes in an organism and their associated reactions, in a manner that is not specific to any tissue or condition. RNA-seq provides crucial information to make GEMs condition-specific. Integrative Metabolic Analysis Tool (iMAT) and Integrative Network Inference for Tissues (INIT) are the two most popular algorithms to create condition-specific GEMs from human transcriptome data. The normalization method chosen for raw RNA-seq count data affects the content of the models produced by these algorithms and their predictive accuracy. However, a benchmark of RNA-seq normalization methods on the performance of the iMAT and INIT algorithms is missing in the literature. Covariates such as age and gender in a dataset are another important factor, as they can affect the predictive power of the analysis. In this study, we aimed to compare five different RNA-seq data normalization methods (TPM, FPKM, TMM, GeTMM, and RLE) and covariate-adjusted versions of the normalized data by mapping them on a human GEM using the iMAT and INIT algorithms to generate personalized metabolic models. We used RNA-seq data for Alzheimer's disease (AD) and lung adenocarcinoma (LUAD) patients. The results demonstrated that RNA-seq data normalized by the between-sample methods (RLE, TMM, or GeTMM) enabled the production of condition-specific metabolic models with considerably lower variability in the number of active reactions than the within-sample normalization methods (FPKM, TPM). Using these models, we could more accurately capture the disease-associated genes (average accuracy of ~0.80 for AD and ~0.67 for LUAD) for the RLE, TMM, and GeTMM normalization methods. An increase in accuracy was observed for all the methods when covariate adjustment was applied. We found a similar accuracy trend when we compared the metabolites of perturbed reactions to metabolome data for AD.
Together, our benchmark study shows that the between-sample RNA-seq normalization methods reduce false positive predictions at the expense of missing some true positive genes when mapped on GEMs.
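
The distinction the abstract draws between within-sample methods (TPM, FPKM) and between-sample methods (RLE, TMM, GeTMM) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names and toy counts are hypothetical, RLE is written as the generic median-of-ratios size-factor estimate, and TMM and GeTMM are omitted because they need the trimming logic normally provided by a dedicated package.

```python
import math
from statistics import median


def tpm(counts, lengths_bp):
    """Within-sample: transcripts per million for ONE sample."""
    rates = [c / (length / 1e3) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


def fpkm(counts, lengths_bp):
    """Within-sample: fragments per kilobase per million for ONE sample."""
    per_million = sum(counts) / 1e6
    return [c / (length / 1e3) / per_million
            for c, length in zip(counts, lengths_bp)]


def rle_size_factors(samples):
    """Between-sample: median-of-ratios size factors, one per sample.

    `samples` is a list of per-sample count lists in the same gene order.
    """
    n_genes = len(samples[0])
    # Use only genes with a nonzero count in every sample (log is undefined at 0).
    kept = [g for g in range(n_genes) if all(s[g] > 0 for s in samples)]
    # Log of the geometric mean of each kept gene across samples.
    log_geo_mean = {
        g: sum(math.log(s[g]) for s in samples) / len(samples) for g in kept
    }
    return [
        math.exp(median(math.log(s[g]) - log_geo_mean[g] for g in kept))
        for s in samples
    ]


# Hypothetical toy data: sample_b is sample_a sequenced twice as deeply.
toy_lengths = [1000, 2000, 1000, 4000]
sample_a = [10, 20, 30, 40]
sample_b = [20, 40, 60, 80]

factors = rle_size_factors([sample_a, sample_b])
rle_a = [c / factors[0] for c in sample_a]
rle_b = [c / factors[1] for c in sample_b]
```

TPM and FPKM rescale each sample using only its own library size and gene lengths, whereas the RLE size factors are estimated jointly from all samples. In the toy data the two samples differ only in sequencing depth, so dividing by the size factors makes their RLE-normalized counts coincide.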