The Effect of Topic Modelling on Prediction of Criticality Levels of Software Vulnerabilities

Prarna Mehta, Shubhangi Aggarwal, Abhishek Tandon


Software is now an indispensable part of daily life, so keeping a check on exploitable vulnerabilities has become a vital function of a software firm. The motivation of this paper is to gain a better understanding of vulnerabilities and to create a tool that helps industry practitioners identify critical vulnerabilities that could be detrimental to a firm’s assets. In this article, 1999 vulnerabilities related to Google Chrome were analysed to understand the behaviour of vulnerabilities. Trends and patterns identified using a topic modelling technique led to the extraction of topics. The extracted topics were then used as inputs to 10 classifiers to predict the criticality of a vulnerability. The resulting performance was also compared with that of the classifiers without topic modelling. A 10-fold cross-validation was conducted on the suggested prediction model.
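
The abstract does not say how the validation folds were built. As a minimal sketch, assuming a plain shuffled split without stratification, the following shows how 1999 records could be partitioned so that each record is held out exactly once across 10 folds. The function name `k_fold_indices` and the fixed seed are illustrative assumptions, not details taken from the paper.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation.

    Hypothetical helper: a shuffled, non-stratified split is assumed here.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# 1999 is the number of Google Chrome vulnerabilities stated in the abstract.
folds = list(k_fold_indices(1999, k=10))
fold_sizes = [len(test_idx) for _, test_idx in folds]
```

In each round a classifier would be trained on `train_idx` and scored on `test_idx`, and the ten scores averaged. Running the same folds once with topic features and once without would give the two sets of results a common basis for comparison.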


Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.