Volume 35 Issue 3
Jun 2024
Hang He, Chao Ma, Shan Ye, Wenqiang Tang, Yuxuan Zhou, Zhen Yu, Jiaxin Yi, Li Hou, Mingcai Hou. Low Resource Chinese Geological Text Named Entity Recognition Based on Prompt Learning. Journal of Earth Science, 2024, 35(3): 1035-1043. doi: 10.1007/s12583-023-1944-8

Low Resource Chinese Geological Text Named Entity Recognition Based on Prompt Learning

doi: 10.1007/s12583-023-1944-8
  • Corresponding author: Chao Ma, machao@cdut.edu.cn
  • Received Date: 14 Jul 2023
  • Accepted Date: 13 Sep 2023
  • Issue Publish Date: 30 Jun 2024
  • Geological reports are a major output of geological investigation and scientific research, and they contain rich data and textual information. With the rapid development of science and technology, a large number of such reports have accumulated in the field of geology. However, mainstream geoscience databases used for geological information mining neglect many less popular topics and non-English-speaking regions, which makes it harder for some researchers to extract the information they need from these texts. Natural Language Processing (NLP) is well suited to processing large amounts of textual data. The objective of this paper is to identify geological named entities in Chinese geological texts using NLP techniques. We propose the RoBERTa-Prompt-Tuning-NER method, which builds on Prompt Learning and requires only a small amount of annotated data to train strong models for recognizing geological named entities in low-resource dataset configurations. The RoBERTa layer captures context-based information and longer-distance dependencies through dynamic word vectors. We conducted experiments on a purpose-built Geological Named Entity Recognition (GNER) dataset. The results show that the proposed model achieves an F1 score of 80.64%, the highest in a comparison with four baseline algorithms, demonstrating the reliability and robustness of the model for Named Entity Recognition in geological texts.
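
The abstract reports its headline result as an F1 score. As a rough illustration of how such a score is commonly computed for named entity recognition, the sketch below implements exact-match, entity-level precision, recall, and F1 over BIO tag sequences. This is an assumption about the evaluation style (CoNLL-like span matching); the abstract does not state the paper's exact protocol, and the labels `ROCK` and `STRAT` as well as the function names are hypothetical.

```python
from typing import List, Set, Tuple

Entity = Tuple[int, int, str]  # (start, end, label), end is exclusive


def extract_entities(tags: List[str]) -> Set[Entity]:
    """Collect (start, end, label) spans from one BIO tag sequence."""
    entities: Set[Entity] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues and label is not None:
            entities.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return entities


def entity_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall, and F1 for one sequence."""
    gold_set, pred_set = extract_entities(gold), extract_entities(pred)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels: ROCK (rock type) and STRAT (stratigraphic unit).
gold = ["B-ROCK", "I-ROCK", "O", "B-STRAT", "I-STRAT", "O"]
pred = ["B-ROCK", "I-ROCK", "O", "B-STRAT", "O", "O"]
precision, recall, f1 = entity_f1(gold, pred)
```

Here the predicted `STRAT` span is too short, so under exact matching it counts as both a false positive and a false negative; only the `ROCK` span is a true positive.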

     

  • Conflict of Interest
    The authors declare that they have no conflict of interest.


    Figures(4)  / Tables(5)
