Ontology-Based BERT Model for Automated Information Extraction from Geological Hazard Reports

Kai Ma; Miao Tian; Yongjian Tan; Qinjun Qiu; Zhong Xie; Rong Huang

doi:10.1007/s12583-022-1724-z

Volume 34 Issue 5

Oct 2023

Kai Ma, Miao Tian, Yongjian Tan, Qinjun Qiu, Zhong Xie, Rong Huang. Ontology-Based BERT Model for Automated Information Extraction from Geological Hazard Reports. Journal of Earth Science, 2023, 34(5): 1390-1405. doi: 10.1007/s12583-022-1724-z

Kai Ma^{1, 2, 3},
Miao Tian^{1, 2, 3},
Yongjian Tan^{1, 2, 3},
Qinjun Qiu^{4, 5, 6},
Zhong Xie^{4, 5},
Rong Huang^{7}

1. Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering, China Three Gorges University, Yichang 443002, China
2. College of Computer and Information Technology, China Three Gorges University, Yichang 443002, China
3. Hubei Engineering Technology Research Center for Farmland Environment Monitoring, China Three Gorges University, Yichang 443002, China
4. School of Computer Science, China University of Geosciences, Wuhan 430074, China
5. Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources, Shenzhen 518034, China
6. Beijing Key Laboratory of Urban Spatial Information Engineering, Beijing 100045, China
7. College of Economics and Management, China Three Gorges University, Yichang 443002, China

Corresponding author: Rong Huang, huangrong@ctgu.edu.cn
Received Date: 25 Dec 2021
Accepted Date: 22 Aug 2022

Available Online: 14 Oct 2023

Issue Publish Date: 30 Oct 2023

Abstract

Geological knowledge can support knowledge discovery, knowledge inference and mineralization prediction from geological big data. Entity identification and relationship extraction from geological descriptive text are the key steps in constructing knowledge graphs. Given the lack of publicly available annotated datasets in the geology domain, this paper describes the construction of a geological entity dataset, defines the entity types and interconceptual relationships using the geological entity concept system, and completes the construction of the geological corpus. To address the shortcomings of existing language models (such as Word2vec and GloVe), which cannot handle polysemous words and fuse context poorly, we propose a geological named entity recognition and relationship extraction model built on the Bidirectional Encoder Representations from Transformers (BERT) pretrained language model. To represent the text features effectively, we construct a BERT-bidirectional gated recurrent unit (BiGRU)-conditional random field (CRF) architecture to extract named entities and a BERT-BiGRU-Attention architecture to extract entity relations. The results show that the F1-score of the BERT-BiGRU-CRF named entity recognition model is 0.91 and that of the BERT-BiGRU-Attention relationship extraction model is 0.84, both significant improvements over classic language models (e.g., Word2vec and Embeddings from Language Models (ELMo)).
Keywords: ontology, BERT model, named entity recognition, relation extraction, knowledge graph
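
The F1-scores quoted in the abstract combine precision and recall into one number. The abstract does not say how matches are counted, so the following is only a minimal sketch that assumes entity-level exact-match scoring over BIO tags, a common convention for named entity recognition. The function names and the tag labels `HAZ` and `LOC` are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence (e.g. B-HAZ, I-HAZ, O)."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing sentinel "O" flushes an entity that ends at the last token.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1); a span counts only on an exact match."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = bio_to_spans(g), bio_to_spans(p)
        tp += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical labels: HAZ = hazard entity, LOC = location entity.
gold = [["B-HAZ", "I-HAZ", "O", "B-LOC"]]
pred = [["B-HAZ", "I-HAZ", "O", "O"]]
precision, recall, f1 = entity_f1(gold, pred)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

In this toy case the prediction recovers one of two gold entities and makes no spurious predictions, so precision is 1.0, recall is 0.5, and F1 is their harmonic mean, 2/3.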

Conflict of Interest
The authors declare that they have no conflict of interest.


