Citation: Qinjun Qiu, Zhen Huang, Dexin Xu, Kai Ma, Liufeng Tao, Run Wang, Jianguo Chen, Zhong Xie, Yongsheng Pan. Integrating NLP and Ontology Matching into a Unified System for Automated Information Extraction from Geological Hazard Reports. Journal of Earth Science, 2023, 34(5): 1433-1446. doi: 10.1007/s12583-022-1716-z
Much detailed data on past geological hazard events is buried in geological hazard reports and has not been fully utilized. Advances in geographic information retrieval and temporal information retrieval offer opportunities to analyse this wealth of data, mine the spatiotemporal evolution of geological disasters, and improve risk decision making. This study presents a framework that combines NLP and ontology matching to automatically recognize semantic and spatiotemporal information in geological hazard reports. The framework extracts information from unstructured geological disaster reports through named entity recognition, ontology matching and gazetteer matching, identifying and annotating the relevant elements so that users can quickly obtain key information and grasp the general content of a report. In addition, we visualize the experimental results and analyse the resulting visualizations. Semantic information related to the dynamics of geohazard events is extracted and retrieved from both natural and human perspectives to provide information on the progress of events.
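The matching steps named above can be illustrated with a minimal sketch. It is not the authors' implementation: the gazetteer entries, ontology concepts, place names, coordinates and the `annotate` function are all hypothetical, and simple dictionary lookup plus a regular expression stand in for the trained named entity recognition model described in the study.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical toy resources, for illustration only. The place names and
# coordinates are invented and the concept paths are not from the paper.
GAZETTEER = {"Northvale": (30.0, 100.0), "Eastbrook": (31.0, 101.0)}
ONTOLOGY = {
    "landslide": "GeologicalHazard/Landslide",
    "debris flow": "GeologicalHazard/DebrisFlow",
}
# Assumption: dates appear in ISO form (YYYY-MM-DD). Real reports need a
# proper temporal tagger; this pattern is only a stand-in.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


@dataclass(frozen=True)
class Annotation:
    kind: str      # "place", "hazard" or "time"
    text: str      # matched surface form
    start: int     # character offset in the report
    value: object  # coordinates, ontology concept, or the date string


def annotate(report: str) -> List[Annotation]:
    """Annotate a report with place, hazard and time elements."""
    found = []
    # Gazetteer matching: exact, case-sensitive lookup of known place names.
    for name, coords in GAZETTEER.items():
        for m in re.finditer(re.escape(name), report):
            found.append(Annotation("place", m.group(), m.start(), coords))
    # Ontology matching: case-insensitive lookup of hazard terms.
    for term, concept in ONTOLOGY.items():
        for m in re.finditer(re.escape(term), report, flags=re.IGNORECASE):
            found.append(Annotation("hazard", m.group(), m.start(), concept))
    # Temporal expressions: regular-expression matching.
    for m in DATE_PATTERN.finditer(report):
        found.append(Annotation("time", m.group(), m.start(), m.group()))
    # Return annotations in reading order.
    return sorted(found, key=lambda a: a.start)


report = ("On 2010-08-07 a debris flow struck Eastbrook; "
          "a landslide followed near Northvale.")
for item in annotate(report):
    print(item.kind, item.text, item.value)
```

Sorting by character offset keeps the annotations in reading order, which makes it straightforward to pair each hazard mention with the nearest time and place when building an event record.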