Exploring the Effectiveness of Dimensionality Reduction Methods for High-Dimensional Turbofan Engine Sensor Data


GÜNEŞ M. Ş.

Applied Sciences (Switzerland), vol.16, no.10, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 16 Issue: 10
  • Publication Date: 2026
  • DOI: 10.3390/app16104610
  • Journal Name: Applied Sciences (Switzerland)
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Applied Science & Technology Source, Compendex, INSPEC, Directory of Open Access Journals
  • Keywords: dimensionality reduction, PCA, RUL, t-SNE, UMAP
  • Affiliated with Yıldız Technical University: Yes

Abstract

This study presents a systematic comparison of three dimensionality reduction methods, namely Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP), applied to multivariate turbofan engine sensor data from the NASA C-MAPSS benchmark. The analysis was conducted across three subsets of increasing complexity: FD001 (single operating condition, single fault mode), FD002 (six operating conditions, single fault mode), and FD004 (six operating conditions, two fault modes), comprising 20,631, 53,759, and 61,249 observations, respectively. For the multi-condition subsets, within-condition z-score normalization was applied to prevent inter-condition offsets from masking the degradation signal. Fourteen informative sensor variables were retained after near-constant sensors were excluded. Embedding quality was assessed using four complementary metrics: silhouette score (with bootstrap 95% confidence intervals), trustworthiness, continuity, and PCA reconstruction RMSE. A downstream remaining useful life (RUL) prediction task and a hyperparameter sensitivity analysis were also conducted. PCA achieved the best silhouette scores on FD001 (0.4608; 95% CI = [0.447, 0.475]) and FD002, and its RUL predictive performance was similar to that of a 14-dimensional baseline model, which supports the use of PCA as an interpretable tool for global data analysis. t-SNE achieved the highest trustworthiness and continuity of the three methods on every subset, indicating the best preservation of local neighborhood relationships. UMAP had the best silhouette score on FD004 (0.4818; 95% CI = [0.463, 0.495]); its confidence interval did not overlap with that of either PCA or t-SNE, indicating a statistically significant difference from both methods under multi-fault conditions.
The PCA ranking was consistent across the range of hyperparameter combinations tested (n = 36). The results provide a quantitative, generalizable framework for dimensionality reduction method selection in prognostic health management applications.
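
The within-condition z-score normalization mentioned in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the function name `zscore_within_condition`, the input layout, the use of the population standard deviation, and the toy readings are all assumptions made for illustration. The idea is that each sensor is standardized separately inside each operating condition, so that baseline offsets between conditions do not dominate the signal.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def zscore_within_condition(rows, conditions, eps=1e-12):
    """Standardize each sensor column separately inside each operating condition.

    rows:       list of equal-length lists of sensor readings, one per observation
    conditions: list of hashable condition labels, one per observation
    (Both inputs are hypothetical; the paper does not specify a data layout.)
    """
    # Group observation indices by operating-condition label.
    groups = defaultdict(list)
    for idx, cond in enumerate(conditions):
        groups[cond].append(idx)

    n_features = len(rows[0])
    out = [[0.0] * n_features for _ in rows]
    for idxs in groups.values():
        for j in range(n_features):
            col = [rows[i][j] for i in idxs]
            mu = fmean(col)
            # Assumption: population standard deviation; a sample estimate would also work.
            sigma = pstdev(col)
            for i in idxs:
                # A sensor that is constant within a condition maps to 0 instead of dividing by ~0.
                out[i][j] = (rows[i][j] - mu) / sigma if sigma > eps else 0.0
    return out


# Toy data (made up): two conditions whose sensor baselines differ by a large offset.
readings = [[100.0], [102.0], [104.0], [500.0], [502.0], [504.0]]
labels = ["A", "A", "A", "B", "B", "B"]
normalized = zscore_within_condition(readings, labels)
```

In this toy case the two conditions follow the same trend around different baselines, so after normalization their values coincide and the offset between conditions no longer separates them.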