Anjum S., Kousar S., Kausar N., Aydin N., Olanrewaju O. A., Mncwango B.
DECISION MAKING: APPLICATIONS IN MANAGEMENT AND ENGINEERING, vol.7, no.2, pp.1-14, 2024 (Scopus)
Abstract
Investigating algebraic structures in a non-conventional framework equips mathematics for practical applications in theoretical biology and computer science. One such algebraic structure is the multigroup, whose underlying set is a multiset. The genome is the entire set of DNA instructions found within a cell; it contains all the information an individual needs to develop and function. DNA and RNA are the hereditary materials that play a vital role in the metabolism of living things, especially in protein synthesis. In genomic databases, DNA sequences are stored as strings or text, whereas machine learning algorithms operate on numerical input. DNA sequence strings must therefore be transformed into numerical values. This article is organized as follows. First, we prove that the standard genetic code is a multigroup and that the genome architecture of a whole population can be interpreted as a sum of multisets. Next, we describe how a numerical representation of DNA bases relates to its algebraic representation. We then employ Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), and Bidirectional Long Short-Term Memory (BiLSTM) networks to identify changes between DNA sequences. Experimental results show that a GRU using multiset-based numerical values for DNA bases achieves 98% accuracy on the test data. This novel technique can aid the detection of mutations in complex diseases.
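The abstract does not state the specific multiset-based numerical values or the group structure the authors use for the genetic code, so the following Python sketch only illustrates the underlying notions under stated assumptions. It shows (1) the sum of multisets, where each element's multiplicity is the sum of its multiplicities in the operands, applied to a toy "population" of sequences; (2) a check of the usual multigroup conditions, C(xy) >= min(C(x), C(y)) and C(x^-1) >= C(x), for a multiset over the cyclic group Z_n, chosen here purely as an example group; and (3) a conversion of a DNA string into a list of numbers that a recurrent model such as a GRU could consume. The names `multiset_sum`, `is_multigroup_zn`, `encode`, and `BASE_VALUES`, and the mapping A=1, C=2, G=3, T=4, are hypothetical and are not the paper's method.

```python
from collections import Counter

# Hypothetical base-to-number mapping for illustration only; the paper's
# multiset-based values for DNA bases are not given in the abstract.
BASE_VALUES = {"A": 1, "C": 2, "G": 3, "T": 4}


def multiset_sum(*multisets: Counter) -> Counter:
    """Sum of multisets: each element's multiplicity is the sum of its multiplicities."""
    total = Counter()
    for m in multisets:
        total.update(m)  # Counter.update adds counts instead of replacing them
    return total


def is_multigroup_zn(counts: dict[int, int], n: int) -> bool:
    """Check the multigroup conditions for a multiset over the cyclic group Z_n.

    The group operation is addition mod n, so the conditions are
    C(x + y) >= min(C(x), C(y)) and C(-x) >= C(x) for all x, y in Z_n.
    Elements absent from `counts` have multiplicity 0.
    """
    c = [counts.get(x, 0) for x in range(n)]
    for x in range(n):
        if c[(-x) % n] < c[x]:  # inverse condition
            return False
        for y in range(n):
            if c[(x + y) % n] < min(c[x], c[y]):  # closure condition
                return False
    return True


def encode(sequence: str, mapping: dict[str, int] = BASE_VALUES) -> list[int]:
    """Convert a DNA string into a list of numbers using the (hypothetical) mapping."""
    return [mapping[base] for base in sequence.upper()]


if __name__ == "__main__":
    # Toy population: each genome is reduced to the multiset of its bases.
    genomes = ["ACGT", "AACG", "TTGA"]
    population = multiset_sum(*(Counter(g) for g in genomes))
    print(dict(population))  # {'A': 4, 'C': 2, 'G': 3, 'T': 3}

    # The identity 0 has the largest multiplicity, so this multiset over Z_4 qualifies.
    print(is_multigroup_zn({0: 3, 1: 1, 2: 1, 3: 1}, 4))  # True
    # Here 1 + 1 = 0 in Z_2 but C(0) = 1 < min(C(1), C(1)) = 2, so it fails.
    print(is_multigroup_zn({0: 1, 1: 2}, 2))  # False

    print(encode("acgt"))  # [1, 2, 3, 4]
```

In practice the encoded sequences would be padded to a common length and fed to a recurrent network; that modelling step is outside this sketch.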