IEEE Access, vol. 14, pp. 14777-14793, 2026 (SCI-Expanded, Scopus)
This study presents a novel empirical methodology to characterize and compare the ranking environments of two major information retrieval systems, Google and Bing. We analyze technical and content attributes of 14,465 Search Engine Results Page (SERP) items collected for 500 queries in a homogeneous commercial discount domain, and examine observable associations between resource attributes and ranking outcomes. The dataset includes Lighthouse performance metrics and advanced content features, such as Sentence-BERT-based semantic similarity. Using K-Means clustering, we identify five resource profiles representing emergent optimization archetypes. The analysis shows that content-related factors have a higher aggregate importance than technical factors for both systems (Google: 70.1%, Bing: 61.8%). Specifically, Random Forest feature importance analysis indicates that content volume is a dominant predictor for Bing, whereas for Google, semantic relevance signals outweigh pure keyword targeting. We further contextualize these findings within an "Authority–Optimization Trade-off" framework, suggesting that Google’s negative associations for certain on-page optimization signals likely reflect a ranking function that weights latent domain authority more heavily than explicit on-page compliance. These findings highlight how modern learning-to-rank systems may differentially weight explicit content features and latent authority signals when balancing relevance, diversity, and quality.
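As a rough illustration of how an aggregate importance share such as the content versus technical split above might be computed, the sketch below sums per-feature importances by group and normalises them. The feature names, the group mapping, and the numeric values are hypothetical placeholders, not the study's features or results; in practice the per-feature values could come, for example, from a fitted random forest's `feature_importances_` attribute in scikit-learn.

```python
from collections import defaultdict

# Hypothetical feature -> group mapping; the study's actual feature list
# is not given in the abstract.
FEATURE_GROUPS = {
    "semantic_similarity": "content",
    "word_count": "content",
    "keyword_in_title": "content",
    "lighthouse_performance": "technical",
    "largest_contentful_paint": "technical",
}


def aggregate_importance(importances, groups):
    """Sum per-feature importances by group and normalise to shares summing to 1."""
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importances must sum to a positive value")
    shares = defaultdict(float)
    for feature, value in importances.items():
        shares[groups[feature]] += value / total
    return dict(shares)


# Illustrative placeholder values, NOT results from the study.
example_importances = {
    "semantic_similarity": 0.30,
    "word_count": 0.25,
    "keyword_in_title": 0.10,
    "lighthouse_performance": 0.20,
    "largest_contentful_paint": 0.15,
}

group_shares = aggregate_importance(example_importances, FEATURE_GROUPS)
for group, share in sorted(group_shares.items()):
    print(f"{group}: {share:.1%}")
```

Running the same aggregation separately on a model fitted per search engine would give one content share and one technical share for each system, which is the form in which the percentages above are reported.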