Extracting Named Entity Using Entity Labeling in Geological Text Using Deep Learning Approach

Qinjun Qiu; Miao Tian; Zhong Xie; Yongjian Tan; Kai Ma; Qingfang Wang; Shengyong Pan; Liufeng Tao

doi:10.1007/s12583-022-1789-8

Volume 34 Issue 5

Oct 2023

Journal of Earth Science, 2023, 34(5): 1406–1417.

Qinjun Qiu, Miao Tian, Zhong Xie, Yongjian Tan, Kai Ma, Qingfang Wang, Shengyong Pan, Liufeng Tao. Extracting Named Entity Using Entity Labeling in Geological Text Using Deep Learning Approach. Journal of Earth Science, 2023, 34(5): 1406-1417. doi: 10.1007/s12583-022-1789-8

Qinjun Qiu^{1, 2, 3}, Miao Tian^{4, 5}, Zhong Xie^{2, 3}, Yongjian Tan^{4, 5}, Kai Ma^{4, 5}, Qingfang Wang^{6}, Shengyong Pan^{7}, Liufeng Tao^{2, 3}

1. State Key Laboratory of Geo-Information Engineering and Key Laboratory of Surveying and Mapping Science and Geospatial Information Technology of MNR, Chinese Academy of Surveying and Mapping, Beijing 100036, China
2. Key Laboratory of Spatial-Temporal Big Data Analysis and Application of Natural Resources in Megacities, MNR, Shanghai 20063, China
3. Key Laboratory of Geological Survey and Evaluation of Ministry of Education, China University of Geosciences, Wuhan 430074, China
4. Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering, China Three Gorges University, Yichang 443002, China
5. College of Computer and Information Technology, China Three Gorges University, Yichang 443002, China
6. Chengdu Geological Environment Monitoring Station, Chengdu 610036, China
7. Wuhan Zondy Cyber Science & Technology Co., Ltd., Wuhan 430074, China


Corresponding author: Kai Ma, makai@ctgu.edu.cn
Received Date: 03 Mar 2022
Accepted Date: 14 Nov 2022

Available Online: 14 Oct 2023

Issue Publish Date: 30 Oct 2023

Abstract

Artificial intelligence (AI) is key to mining and enhancing the value of big data, and knowledge graphs are one of the important cornerstones of AI, serving as the core foundation for integrating statistical and physical representations. Named entity recognition (NER) is a fundamental research task for building knowledge graphs and must be supported by a high-quality corpus; at present, high-quality NER corpora are lacking in the field of geology, especially in Chinese. In this paper, based on the conceptual structure of a geological ontology and an analysis of the characteristics of geological texts, a classification system of geological named entity types is designed with the guidance and participation of geological experts, a corresponding annotation specification is formulated, an annotation tool is developed, and the first NER corpus for the geological domain is annotated from real geological reports. A total of 698 512 words and 23 345 entities were annotated. The paper also explores the feasibility of a model pre-annotation strategy and presents a statistical analysis of the distribution of technology and terminology categories across genres, as well as of the consistency of the corpus annotation. Based on this corpus, an A Lite BERT (ALBERT)-Bidirectional Long Short-Term Memory (BiLSTM)-Conditional Random Fields (CRF) model and an ALBERT-BiLSTM model are selected for experiments. The results show that the two models reach F1-scores of 0.75 and 0.65, respectively, providing a corpus basis and technical support for information extraction in the field of geology.
Keywords:
- ontology
- geological reports
- named entity recognition
- geological corpus construction
- semi-automated annotation platforms
- deep learning
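The abstract reports model performance as F1-scores and mentions analyzing annotation consistency, but it does not spell out how either quantity is computed. The sketch below is a minimal illustration, assuming a BIO tagging scheme and strict entity-level scoring (a predicted entity counts only if both its boundaries and its type exactly match the gold entity). Both are common conventions for Chinese NER but are assumptions here, not details confirmed by the paper. The helper names `bio_to_spans` and `entity_prf` and the tag types `ROCK` and `AGE` are hypothetical placeholders, not the paper's code or entity taxonomy.

```python
from typing import List, Set, Tuple

# (start, end_exclusive, entity_type) over character/token positions
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect typed entity spans from one BIO tag sequence.

    Lenient convention (an assumption): an I- tag that does not continue an
    open entity of the same type starts a new entity.
    """
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == etype
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if prefix == "B" or (prefix == "I" and not continues):
            start, etype = i, label
    return spans


def entity_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Strict entity-level precision, recall and F1 (exact span and type match)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
        tp += len(g & p)  # same boundaries and same type
        fp += len(p - g)  # predicted but not in gold
        fn += len(g - p)  # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical labels: ROCK and AGE are placeholders, not the paper's taxonomy.
    gold = [["B-ROCK", "I-ROCK", "O", "B-AGE", "I-AGE"]]
    pred = [["B-ROCK", "I-ROCK", "O", "B-AGE", "O"]]  # AGE span truncated
    print(entity_prf(gold, pred))  # (0.5, 0.5, 0.5)
```

Swapping the two arguments only exchanges precision and recall, so F1 is symmetric. For that reason, the same function can also compare two annotators' labels for the same text, which is one common way to report inter-annotator agreement for NER when a chance-corrected statistic such as kappa is awkward to define over spans.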

Conflict of Interest
The authors declare that they have no conflict of interest.

The article contains 9 figures and 7 tables.
