Analyzing and Classifying Network Attacks Using Machine Learning on the NSL-KDD Dataset
Keywords:
Machine Learning, OneR, J48 Decision Tree, Naïve Bayes, NSL-KDD, Information Gain, Network-Based Intrusion Detection Systems
Abstract
With the aim of building an efficient network-based Intrusion Detection System, this paper presents a thorough study of a benchmark dataset, NSL-KDD. The novelty of this work lies in using Machine Learning to determine the minimal number of features needed to classify each individual attack, as well as each attack category, in the NSL-KDD dataset. No previous analysis has been carried out at the individual attack level. Features are selected using Information Gain, and classification is then performed with three machine learning algorithms: J48 Decision Tree, Naïve Bayes, and a less commonly used classifier, OneR. The most important features for classifying each individual attack and each attack category, as ranked by Information Gain, are presented together with the classification results. The J48, Naïve Bayes, and OneR classifiers achieved high classification accuracies, mostly above 99%. The number of attributes required to reach a true positive rate (recall) of 100%, or very close to 100%, is also reported.
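The two less familiar ingredients named in the abstract, ranking features by Information Gain and the single-attribute OneR rule learner, can be illustrated with a minimal Python sketch. It assumes purely categorical features and uses a tiny hypothetical set of records: the attribute names protocol_type, flag, and service come from NSL-KDD, but the rows and labels below are invented for illustration. Numeric NSL-KDD attributes would need to be discretized first, and this is not the authors' implementation or toolkit.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """IG(Class; Feature) = H(Class) - H(Class | Feature) for a discrete feature."""
    total = len(labels)
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, labels, feature_names):
    """Return (feature name, IG) pairs sorted by descending information gain."""
    scores = []
    for i, name in enumerate(feature_names):
        column = [row[i] for row in rows]
        scores.append((name, information_gain(column, labels)))
    return sorted(scores, key=lambda item: item[1], reverse=True)


def one_r(rows, labels, feature_names):
    """OneR: for each feature, map every value to its majority class, then keep
    the single feature whose rule makes the fewest errors on the training data."""
    best = None
    for i, name in enumerate(feature_names):
        by_value = defaultdict(Counter)
        for row, label in zip(rows, labels):
            by_value[row[i]][label] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(sum(c.values()) - c[rule[v]] for v, c in by_value.items())
        if best is None or errors < best[2]:
            best = (name, rule, errors)
    return best  # (feature name, value -> class rule, training errors)


# Hypothetical toy records (protocol_type, flag, service); NOT real NSL-KDD rows.
FEATURES = ["protocol_type", "flag", "service"]
TOY_ROWS = [
    ("tcp", "SF", "http"),
    ("tcp", "S0", "private"),
    ("tcp", "S0", "private"),
    ("icmp", "SF", "ecr_i"),
    ("icmp", "SF", "ecr_i"),
    ("tcp", "SF", "http"),
]
TOY_LABELS = ["normal", "neptune", "neptune", "smurf", "smurf", "normal"]

if __name__ == "__main__":
    for name, ig in rank_features(TOY_ROWS, TOY_LABELS, FEATURES):
        print(f"{name}: IG = {ig:.3f} bits")
    feature, rule, errors = one_r(TOY_ROWS, TOY_LABELS, FEATURES)
    print(f"OneR picks {feature} with rule {rule} ({errors} training errors)")
```

On this toy data, service alone separates the three classes, so it receives the highest Information Gain and is also the attribute OneR selects. This mirrors, on a very small scale, the idea of finding how few attributes suffice to classify a given attack.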