EUROPEAN PHYSICAL JOURNAL C, vol. 85, no. 12, 2025 (SCI-Expanded, Scopus)
In this study, the potential of quantum machine learning (QML) techniques based on trainable quantum circuits was explored for vector boson identification at the Large Hadron Collider (LHC). Specifically, the Compact Muon Solenoid (CMS) experiment dataset was employed to reconstruct the Z boson through the muon-antimuon ($\mu^{+}\mu^{-}$) decay channel using variational quantum circuits (VQC). To examine the effect of data structure on QML performance, various preprocessing strategies were applied, including different train/test splits, feature selection, dimensionality reduction, and class balancing techniques. The dataset was evaluated under two train/test configurations, namely a balanced split (70:30) and an imbalanced split (80:20), in order to examine the effect of class distribution on QML outcomes. Feature selection based on Random Forest (RF) was used to extract the most informative variables, while principal component analysis (PCA) was utilized to reduce input dimensionality and optimize qubit usage. To mitigate class imbalance, resampling techniques such as the Synthetic Minority Over-sampling Technique (SMOTE), SMOTE combined with edited nearest neighbors (SMOTEENN), and SMOTE with Tomek Links (SMOTETomek) were implemented. A comparative evaluation using stratified cross-validation was conducted to assess model performance and generalization ability across different configurations. The findings indicated that integrating PCA with resampling methods substantially improves the generalization capacity of the VQC model, especially in imbalanced settings. 
Among all configurations, SMOTE and SMOTEENN delivered the highest classification performance, boosting sensitivity to the minority class and enhancing model stability. These results highlight the significance of data structure, feature reduction, and resampling in classical-quantum (CQ) data processing for high-energy physics applications.
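
The abstract names SMOTE as one of the resampling techniques but does not describe how it was implemented. As a rough illustration only, the core idea of SMOTE (creating synthetic minority-class points by interpolating between a minority sample and one of its nearest minority-class neighbours) can be sketched in plain Python. The function name `smote_oversample`, the parameter `k`, and the toy two-feature data are assumptions made for this sketch; they are not taken from the paper and are not CMS data.

```python
import math
import random


def smote_oversample(minority, n_synthetic, k=3, seed=0):
    """Hypothetical helper: create n_synthetic SMOTE-style points by
    interpolating between minority samples and their nearest neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy two-feature data (illustrative values only, not CMS data).
majority = [(0.1 * i, 0.05 * i) for i in range(20)]
minority = [(3.0, 3.0), (3.2, 2.9), (2.8, 3.1), (3.1, 3.3)]

# Generate enough synthetic points to match the majority-class count.
new_points = smote_oversample(minority, n_synthetic=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # prints: 20 20
```

Because every synthetic point lies on a segment between two real minority samples, the oversampled class stays inside the region the minority class already occupies. SMOTEENN and SMOTETomek add a cleaning step after this interpolation, which the sketch does not cover.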