From graph theory to chemoinformatics: modified bond-based indices and a hypothesis-driven multi-task QSAR/QSPR benchmark


Creative Commons License

Altairi A., Alhaj Z., Alsharafi M., ZEREN Y.

Scientific Reports, vol. 16, no. 1, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 16, Issue: 1
  • Publication Date: 2026
  • DOI: 10.1038/s41598-026-40969-7
  • Journal: Scientific Reports
  • Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, BIOSIS, Chemical Abstracts Core, MEDLINE, Directory of Open Access Journals
  • Keywords: Antibacterial, Drug discovery, Modified bond-based indices, Molecular graph theory, Physicochemical properties, QSAR, QSPR
  • Affiliated with Yıldız Teknik Üniversitesi: Yes

Abstract

Graph-theoretic degree-based descriptors play a central role in chemoinformatics and QSPR/QSAR modelling, yet most classical indices either focus purely on vertex degrees or treat bond contributions in a purely multiplicative way. In this work we introduce and systematically study a new family of modified bond-based indices in which each edge is weighted by a local bond factor in the denominator, coupled with a vertex kernel in the numerator. This construction yields modified versions of the first and second Zagreb indices, the Forgotten and Yemen indices, several connectivity-type descriptors (product, sum, Nirmala, ABC, CAB, GA, harmonic, and misbalance prodeg), and Sombor- and Dharwad-type bond indices. We first present a unified edge-partition representation for any symmetric kernel, expressing each modified index as a finite sum over degree classes. This framework allows us to derive closed-form expressions for all sixteen modified bond-based indices on a broad collection of benchmark families: paths, cycles, complete graphs, complete bipartite graphs, stars, friendship graphs, wheels, book graphs, Dutch windmill graphs, and hypercubes. The resulting tables reveal clear asymptotic growth patterns and show which structures are extremal for the modified descriptors. Moreover, we obtain sharp degree-extreme bounds for a representative subset of the indices in terms of the order, the size m, and the minimum and maximum degrees, with equality characterizing regular graphs. The proposed modified bond-based indices thus provide a flexible and analytically tractable family of descriptors that couple vertex and bond information in a novel way, and they are well suited as structured features for modern chemoinformatics and graph-based machine-learning models on molecular graphs.
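The edge-partition idea can be sketched in Python. The abstract does not state the exact bond factor, so this sketch uses a hypothetical placeholder b(uv) = d_u + d_v - 1 (the edge degree plus one), chosen only because it depends on the end-vertex degrees and is never zero. The names `bond_factor`, `degree_classes`, `modified_index_direct` and `modified_index_partition` are illustrative, not taken from the paper. Because both the symmetric kernel and the placeholder factor depend only on the degree pair (i, j), grouping edges into degree classes gives the same value as summing edge by edge.

```python
from collections import Counter
from math import sqrt
from typing import Callable, Dict, List, Tuple

Edge = Tuple[int, int]
Kernel = Callable[[int, int], float]


def degrees(n: int, edges: List[Edge]) -> List[int]:
    """Vertex degrees of a simple undirected graph on vertices 0..n-1."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def bond_factor(du: int, dv: int) -> int:
    """HYPOTHETICAL bond factor b(uv) = d_u + d_v - 1 (edge degree plus one).

    A placeholder only; it is NOT claimed to be the factor defined in the paper.
    """
    return du + dv - 1


def degree_classes(n: int, edges: List[Edge]) -> Counter:
    """Edge partition: m[(i, j)] = number of edges joining a degree-i and a degree-j vertex, i <= j."""
    deg = degrees(n, edges)
    return Counter(tuple(sorted((deg[u], deg[v]))) for u, v in edges)


def modified_index_direct(n: int, edges: List[Edge], kernel: Kernel) -> float:
    """Edge-by-edge sum of kernel(d_u, d_v) / b(uv)."""
    deg = degrees(n, edges)
    return sum(kernel(deg[u], deg[v]) / bond_factor(deg[u], deg[v]) for u, v in edges)


def modified_index_partition(n: int, edges: List[Edge], kernel: Kernel) -> float:
    """Same value as a finite sum over degree classes: sum_{i<=j} m_ij * kernel(i, j) / b(i, j)."""
    return sum(m * kernel(i, j) / bond_factor(i, j)
               for (i, j), m in degree_classes(n, edges).items())


# Symmetric vertex kernels (numerators) of a few classical edge-sum indices.
KERNELS: Dict[str, Kernel] = {
    "M1": lambda i, j: i + j,                # first Zagreb
    "M2": lambda i, j: i * j,                # second Zagreb
    "F": lambda i, j: i * i + j * j,         # forgotten
    "SO": lambda i, j: sqrt(i * i + j * j),  # Sombor
}


def path_edges(n: int) -> List[Edge]:
    """Edges of the path P_n."""
    return [(k, k + 1) for k in range(n - 1)]


def cycle_edges(n: int) -> List[Edge]:
    """Edges of the cycle C_n (n >= 3)."""
    return [(k, (k + 1) % n) for k in range(n)]


if __name__ == "__main__":
    for name, kernel in KERNELS.items():
        print(name, modified_index_partition(5, path_edges(5), kernel))
```

For P_n with n >= 4 the partition has m_12 = 2 and m_22 = n - 3, so under the placeholder factor the modified first Zagreb variant reduces to 3 + 4(n - 3)/3, which is 17/3 for P_5. In a k-regular graph with m edges every edge lies in the single class (k, k), so any such index collapses to m * kernel(k, k) / b(k, k).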
Finally, to demonstrate predictive utility in a hypothesis-driven setting, we benchmark these descriptors within a large multi-task QSAR/QSPR pipeline on 3,219 ChEMBL antibacterial molecules across ten continuous properties, using a heterogeneous model zoo under three descriptor scenarios. The combined-descriptors scenario achieves the best overall generalisation (Macro Test ; Global zRMSE), improving upon the physicochemical-descriptors scenario (Macro Test ; Global zRMSE).
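The abstract names a macro test score and a global zRMSE without defining them. A minimal sketch, assuming that the macro score is the unweighted mean of per-task test R^2 and that the global zRMSE is the RMSE of residuals scaled by each task's training-set standard deviation, pooled over every observed (molecule, task) test label, could look like this. The function names and the sparse-label layout are hypothetical; the paper's exact definitions may differ.

```python
from math import sqrt
from typing import Dict, List, Tuple

# test[task] = list of (y_true, y_pred) pairs on the test split.
# Molecules without a measured label for a task are simply absent (sparse multi-task data).
TaskData = Dict[str, List[Tuple[float, float]]]


def r2(pairs: List[Tuple[float, float]]) -> float:
    """Coefficient of determination for one task."""
    ys = [y for y, _ in pairs]
    mean = sum(ys) / len(ys)
    ss_tot = sum((y - mean) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    ss_res = sum((y - p) ** 2 for y, p in pairs)
    return 1.0 - ss_res / ss_tot


def macro_r2(test: TaskData) -> float:
    """ASSUMED 'macro' score: unweighted mean of per-task test R^2."""
    return sum(r2(pairs) for pairs in test.values()) / len(test)


def global_zrmse(test: TaskData, train_std: Dict[str, float]) -> float:
    """ASSUMED 'global zRMSE': RMSE of residuals scaled by each task's training-set
    standard deviation, pooled over every observed (molecule, task) test label."""
    scaled_sq = [((y - p) / train_std[task]) ** 2
                 for task, pairs in test.items() for y, p in pairs]
    return sqrt(sum(scaled_sq) / len(scaled_sq))
```

Scaling by the training standard deviation puts properties with very different units on a common scale before pooling; the task mean cancels in the residual, so only the standard deviation matters. Pooling weights each task by its number of test labels, whereas the macro average weights every task equally, so the two numbers can rank scenarios differently.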