Application Identification via Network Traffic Classification

YAMANSAVASCILAR B., GÜVENSAN M. A., YAVUZ A. G., Karsligil M. E.

International Conference on Computing, Networking and Communications (ICNC), California, United States, 26 - 29 January 2017, pp. 843-848, (Full-Text Paper)

Publication Type: Conference Paper / Full-Text Paper
DOI: 10.1109/iccnc.2017.7876241
City of Publication: California
Country of Publication: United States
Page Numbers: pp. 843-848
Keywords: Network Traffic Classification, Application-based, Machine Learning
Affiliated with Yıldız Technical University: Yes

Abstract

Recent developments in Internet technology have increased the importance of network traffic classification. In this study, we used machine-learning methods to identify applications through network traffic classification. In contrast to existing studies, which classify traffic into categories such as FTP and Instant Messaging, we tried to identify popular end-user applications such as Facebook, Twitter, and Skype individually. Our motivation is that identifying individual applications is highly important for network security, QoS enforcement, and trend analysis. For our tests, we used the UNB ISCX Network Traffic dataset and our internal dataset, consisting of 14 and 13 well-known applications, respectively. In our experiments, we evaluated four classification algorithms: J48, Random Forest, k-NN, and Bayes Net. With the complete set of 111 features, k-NN with k = 1 gave the best result for the ISCX dataset, with an accuracy of 93.94%, and Random Forest gave the best result for the internal dataset, with an accuracy of 90.87%. During the study, the initial feature set was reduced to two sets of 12 features, one specific to each dataset, without compromising accuracy. Moreover, we observed a 2% increase in the success rate for the internal dataset. We believe that identifying individual applications with machine-learning methods is a viable solution, and we are currently investigating a two-tier approach to make it more resilient to in-category confusion.
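
As an illustration of the k-NN setting mentioned in the abstract (k = 1), the following is a minimal sketch of nearest-neighbor classification and accuracy computation. It is not the authors' implementation: the two features (mean packet size, mean inter-arrival time), the application labels, and the sample values are all hypothetical, and the actual study used up to 111 features per flow.

```python
import math

def nearest_neighbor_predict(train_X, train_y, x):
    """Return the label of the training vector closest to x (k-NN with k = 1)."""
    best_index = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[best_index]

def accuracy(train_X, train_y, test_X, test_y):
    """Fraction of test vectors whose predicted label matches the true label."""
    correct = sum(
        nearest_neighbor_predict(train_X, train_y, x) == y
        for x, y in zip(test_X, test_y)
    )
    return correct / len(test_y)

# Hypothetical toy flows: (mean packet size in bytes, mean inter-arrival time in ms).
# Labels are placeholders, not applications from either dataset in the paper.
train_X = [(1200.0, 5.0), (1150.0, 6.0), (200.0, 20.0), (180.0, 22.0)]
train_y = ["video_app", "video_app", "chat_app", "chat_app"]
test_X = [(1180.0, 5.5), (190.0, 21.0)]
test_y = ["video_app", "chat_app"]

acc = accuracy(train_X, train_y, test_X, test_y)
print(f"accuracy: {acc:.2%}")
```

In this sketch the features are used unscaled; with real traffic features of very different ranges, some form of normalization would usually be applied before computing distances.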