Keyphrase Extraction Model: A New Design and Application on Tourism Information

Hien Ngo Le Huy, Hoang Ho Minh, Tien Nguyen Van, Hieu Nguyen Van

Abstract


Keyphrase extraction has recently become a foundation for digital library applications, especially for semantic information retrieval. In this context, this paper formulates a keyphrase extraction model based on Natural Language Processing and applies it to information extraction and search in the tourism domain. The proposed process collects data from tourism sources such as Tripadvisor.com, Agoda.com, and vietnam-guide.com. The raw data is then analyzed, pre-processed, and labeled with keyphrases before being fed to a pretrained BERT model and a Bidirectional Long Short-Term Memory network with a Conditional Random Field (Bi-LSTM-CRF), which together solve the keyphrase extraction task. Furthermore, the model integrates Elasticsearch to improve the performance and speed of looking up information about tourism destinations. The extracted keyphrases achieve high accuracy and can be applied to extraction problems and textual content summarization.
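The keyphrase labeling step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a BIO tagging scheme, which the abstract does not specify; the tag names `B-KP`, `I-KP`, `O`, the helper functions, and the sample sentence are all hypothetical and are not taken from the paper. The sketch only shows how annotated keyphrases could be turned into per-token labels for a sequence tagger such as a Bi-LSTM-CRF, and how predicted labels could be decoded back into phrases.

```python
from typing import List


def bio_tags(tokens: List[str], keyphrases: List[List[str]]) -> List[str]:
    """Assign B-KP / I-KP / O tags to tokens (assumed tag names)."""
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    # Match longer phrases first so a short phrase cannot split a longer one.
    for phrase in sorted(keyphrases, key=len, reverse=True):
        target = [w.lower() for w in phrase]
        n = len(target)
        if n == 0:
            continue
        for i in range(len(tokens) - n + 1):
            span_is_free = all(t == "O" for t in tags[i:i + n])
            if lowered[i:i + n] == target and span_is_free:
                tags[i] = "B-KP"
                for j in range(i + 1, i + n):
                    tags[j] = "I-KP"
    return tags


def decode_keyphrases(tokens: List[str], tags: List[str]) -> List[str]:
    """Rebuild keyphrase strings from a BIO tag sequence."""
    phrases: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B-KP":
            if current:
                phrases.append(" ".join(current))
            current = [token]
        elif tag == "I-KP" and current:
            current.append(token)
        else:
            # "O", or a stray I-KP with no open phrase, closes any open phrase.
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


# Made-up example sentence, for illustration only.
tokens = "Hoi An Ancient Town is famous for its lantern festival".split()
gold = [["Hoi", "An", "Ancient", "Town"], ["lantern", "festival"]]
tags = bio_tags(tokens, gold)
phrases = decode_keyphrases(tokens, tags)
```

In a setup like the one the abstract describes, tags of this kind would serve as training targets for the tagger, and the decoding step would turn its predictions into phrases that could then be indexed for search.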


DOI: https://doi.org/10.31449/inf.v45i4.3493

This work is licensed under a Creative Commons Attribution 3.0 License.