Ensemble-Based Network Anomaly Detection Using RFE and Information Gain for Optimized Feature Selection
Abstract
Intrusion Detection Systems (IDSs) play a significant role in mitigating evolving cyber threats. However, current machine-learning-based IDSs often suffer from high false positive rates and suboptimal feature selection, which lower the detection rate. This paper proposes an ensemble IDS architecture that uses Recursive Feature Elimination (RFE) and Information Gain (IG) for feature selection, aiming to enhance anomaly detection performance and reduce computational cost. We begin with a preprocessing pipeline that includes data cleaning, one-hot encoding of categorical features, and normalization to scale the features. The most discriminative attributes are then selected to minimize redundancy, and the selected feature subset is used to train a set of ensemble classifiers: Random Forest, XGBoost, Extra Trees, and a weighted Voting Classifier. Extensive experiments on the CIC-IDS2017 dataset show that the proposed ensemble approach achieves 97.5% accuracy, 97.2% precision, 97.8% recall, and a 97.5% F1-score. Its recall, and hence its robustness, improves on the two baseline classifiers, the standalone Random Forest (recall: 96.5%) and XGBoost (recall: 97.3%). We also conducted an ablation study that confirms the effectiveness of RFE and Information Gain by comparing settings with and without feature selection. These findings indicate that the proposed IDS architecture can be implemented feasibly and scalably for real-time network anomaly detection. Future work could investigate adaptive feature selection and deployment in a streaming setting to improve resistance to novel attacks.
DOI: https://doi.org/10.31449/inf.v49i10.8387
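The pipeline described in the abstract (scaling, RFE and IG feature selection, then a weighted voting ensemble) can be illustrated with a minimal scikit-learn sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: synthetic data from `make_classification` stands in for CIC-IDS2017 (so one-hot encoding is skipped), `GradientBoostingClassifier` is swapped in for XGBoost to avoid an extra dependency, IG is approximated with `mutual_info_classif`, the two selectors are chained (an IG filter followed by RFE), and the feature counts and voting weights are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.feature_selection import RFE, mutual_info_classif
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler


def select_features(X, y, n_ig=12, n_rfe=8, seed=0):
    """Chain an information-gain filter with RFE (assumed combination).

    Returns the sorted column indices of the retained features.
    """
    # Step 1: rank features by mutual information with the label
    # (used here as the information-gain estimate) and keep the top n_ig.
    ig_scores = mutual_info_classif(X, y, random_state=seed)
    ig_idx = np.argsort(ig_scores)[::-1][:n_ig]

    # Step 2: run RFE with a Random Forest on the IG-filtered columns.
    rfe = RFE(
        RandomForestClassifier(n_estimators=50, random_state=seed),
        n_features_to_select=n_rfe,
    )
    rfe.fit(X[:, ig_idx], y)
    return np.sort(ig_idx[rfe.support_])


def build_ensemble(seed=0):
    """Soft-voting ensemble; the weights are hypothetical, not from the paper."""
    return VotingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=100, random_state=seed)),
            # Stand-in for XGBoost in this sketch.
            ("gb", GradientBoostingClassifier(n_estimators=50, random_state=seed)),
            ("et", ExtraTreesClassifier(n_estimators=100, random_state=seed)),
        ],
        voting="soft",
        weights=[2, 2, 1],
    )


# Synthetic binary data: label 1 plays the role of "attack" traffic.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=6, n_redundant=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Fit the scaler on the training split only to avoid leaking test statistics.
scaler = MinMaxScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Select features on the training split, then train and score the ensemble.
selected = select_features(X_train_s, y_train)
ensemble = build_ensemble().fit(X_train_s[:, selected], y_train)
predictions = ensemble.predict(X_test_s[:, selected])
recall = recall_score(y_test, predictions)
print(f"selected features: {selected.tolist()}, recall: {recall:.3f}")
```

Selecting features on the training split alone keeps the held-out recall an honest estimate. An ablation like the one the abstract mentions could be approximated by training the same ensemble on all columns and comparing the scores.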
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.







