The Importance of Representative Network Data on Classification Models for the Detection of Specific Network Attacks  
Author Maryam M. Najafabadi


Co-Author(s) Taghi M. Khoshgoftaar; Clifford Kemp


Abstract The growing number of attacks on computer networks makes security a very important topic in networking and communications. In recent years, data mining and machine learning methods have been used for network intrusion detection. Machine learning models are usually built on labeled data, but producing network data labeled as normal or attack is a time-consuming and labor-intensive task. Methods such as simulating network traffic or using honeypots to produce labeled data share a problem: they do not include real background network data that adequately represents the different network scenarios (file transfer, interactive connection, etc.) seen in real network traffic; in other words, the data is not representative. Most machine learning-based work in network intrusion detection has been applied to such datasets without considering that the data might not be representative. In this paper, we investigate the impact of not using representative network data when building machine learning-based intrusion detection models for a specific type of attack. We collected network flows (IPFIX) from a real operational network. Our network operators labeled the real brute force attacks in the collected data. We build four different classification models for the detection of SSH brute force attacks on poorly representative data. Our experiments show that even though the cross-validated results on the inadequately representative data are strong, the models built on such data perform very poorly when they are tested on representative network data. We recommend that practitioners take into account the domain knowledge of the specific intrusion detection task under study to provide representative network data for building machine learning-based intrusion detection models.


Keywords Representative Data; Intrusion Detection; Machine Learning; Brute Force Attack; Network Flow Analysis
Article #: 2159
Proceedings of the 21st ISSAT International Conference on Reliability and Quality in Design
August 6-8, 2015 - Philadelphia, Pennsylvania, U.S.A.