Volume 34 Issue 5
Oct 2023
Qinjun Qiu, Miao Tian, Zhong Xie, Yongjian Tan, Kai Ma, Qingfang Wang, Shengyong Pan, Liufeng Tao. Extracting Named Entity Using Entity Labeling in Geological Text Using Deep Learning Approach. Journal of Earth Science, 2023, 34(5): 1406-1417. doi: 10.1007/s12583-022-1789-8

Extracting Named Entity Using Entity Labeling in Geological Text Using Deep Learning Approach

doi: 10.1007/s12583-022-1789-8
  • Corresponding author: Kai Ma, makai@ctgu.edu.cn
  • Received Date: 03 Mar 2022
  • Accepted Date: 14 Nov 2022
  • Available Online: 14 Oct 2023
  • Issue Publish Date: 30 Oct 2023
  • Artificial intelligence (AI) is key to mining and enhancing the value of big data, and the knowledge graph, one of the cornerstones of AI, is a core foundation for integrating statistical and physical representations. Named entity recognition (NER) is a fundamental task in building knowledge graphs and depends on a high-quality corpus, yet such corpora are currently lacking in the field of geology, especially for Chinese. In this paper, based on the conceptual structure of a geological ontology and an analysis of the characteristics of geological texts, a classification system of geological named entity types is designed with the guidance and participation of geological experts, a corresponding annotation specification is formulated, an annotation tool is developed, and the first named entity recognition corpus for the geological domain is annotated from real geological reports. In total, 698 512 words were annotated, containing 23 345 entities. The paper also explores the feasibility of a model pre-annotation strategy and presents a statistical analysis of the distribution of technical and term categories across genres and of the consistency of corpus annotation. On this corpus, experiments are run with two models, A Lite Bidirectional Encoder Representations from Transformers (ALBERT)-Bidirectional Long Short-Term Memory (BiLSTM)-Conditional Random Fields (CRF) and ALBERT-BiLSTM, whose F1-scores reach 0.75 and 0.65, respectively, providing a corpus basis and technical support for information extraction in the field of geology.
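
    The F1-scores quoted in the abstract can be illustrated with a minimal sketch of entity-level scoring over BIO-tagged sequences. This is an assumption about the evaluation setup, not the authors' code: the abstract does not say whether matching is exact-span or token-level, and the tag names ROCK and STRAT, like the function names, are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sequence."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        # A tag continues the open span only if it is I-<same type>.
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.add((label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Precision, recall and F1, counting a prediction as correct only
    when both its type and its exact boundaries match a gold entity."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = extract_spans(g_tags), extract_spans(p_tags)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example: two gold entities, one of them predicted.
gold = [["B-ROCK", "I-ROCK", "O", "B-STRAT"]]
pred = [["B-ROCK", "I-ROCK", "O", "O"]]
precision, recall, f1 = entity_f1(gold, pred)
```

    Exact-span matching is the stricter choice: a prediction that finds the right entity but misses one boundary token counts as both a false positive and a false negative, so scores are typically lower than token-level accuracy on the same output.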

     

  • Conflict of Interest
    The authors declare that they have no conflict of interest.