Research Article
Open access
Published on 30 May 2023
Zhou, W. (2023). Research on information extraction technology applied for knowledge graphs. Applied and Computational Engineering, 4, 26-31.

Research on information extraction technology applied for knowledge graphs

Wei Zhou *,1
  • 1 Renmin University of China, No. 59 Zhongguancun Street, Haidian District, Beijing, 100872, P.R. China

* Author to whom correspondence should be addressed.

https://doi.org/10.54254/2755-2721/4/20230340

Abstract

Information extraction is a core task in natural language processing and underpins the construction of question answering systems and knowledge graphs. With the development of deep learning, a growing number of new techniques are being applied to it. This paper first introduces information extraction and its main tasks, then traces the development history of the field, and then reviews how different types of information extraction (entity extraction, relationship extraction and attribute extraction) are applied in knowledge graph construction. Finally, it discusses open problems and research directions for information extraction techniques.

Keywords

Knowledge Graph, Information Extraction, Entity Extraction, Relationship Extraction

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

Disclaimer/Publisher's Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

About volume

Volume title: Proceedings of the 3rd International Conference on Signal Processing and Machine Learning

Conference website: http://www.confspml.org
ISBN: 978-1-915371-55-3 (Print) / 978-1-915371-56-0 (Online)
Conference date: 25 February 2023
Editor: Omer Burak Istanbullu
Series: Applied and Computational Engineering
Volume number: Vol. 4
ISSN: 2755-2721 (Print) / 2755-273X (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish in this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series' published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open access policy for details).