Citation: Shaochun Dong, Yukun Shi, Yizao Ran, Haijun Wu, Yiying Deng, Junxuan Fan, Xinyu Dai. Biological Classification System Knowledge Graph and Semi-automatic Construction of Its Invertebrate Fossil Branches. Journal of Earth Science, 2024, 35(6): 2119-2128. doi: 10.1007/s12583-023-1941-y
Biological classification is the foundation of biology and paleontology, as it arranges all organisms in a hierarchy that humans can easily follow and understand. It is further used to reconstruct the evolution of life. A biological classification system (BCS) that includes all established fossil taxa would be useful for uncovering the history of life, but building one is challenging. Since fossil taxa were originally recorded in various published books and articles written in natural language, the primary step is to organize all the taxon information in a form that a computer system can interpret. A knowledge graph (KG) is a formalized framework for describing semantic knowledge that represents and retrieves knowledge in a machine-understandable way, and it therefore provides a suitable method to represent the BCS. In this paper, a model of the BCS KG comprising an ontology layer and a fact layer is presented. To put it into practice, the ontology layer of the invertebrate fossil branches was developed manually, while the fact layer was constructed automatically by extracting information from 46 volumes of the Treatise on Invertebrate Paleontology series with the help of natural language processing technology. As a result, 27 348 taxa nodes spanning fourteen taxonomic ranks were extracted with high accuracy and high efficiency, and the invertebrate fossil branches of the BCS KG were thus established. This study demonstrates that a properly designed KG model and its automatic construction with the help of natural language processing are reliable and efficient.
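The two-layer idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class `TaxonGraph`, its methods, and the shortened rank list are hypothetical names chosen for illustration, and the three brachiopod taxa are familiar examples rather than data taken from the paper. The ontology layer is reduced to an ordered list of ranks, and the fact layer to taxon nodes linked by parent relations.

```python
from dataclasses import dataclass, field

# Ontology layer (assumed form): an ordered list of rank names.
# The paper reports fourteen ranks; only five are listed here for illustration.
RANKS = ["Phylum", "Class", "Order", "Family", "Genus"]


@dataclass
class TaxonGraph:
    """Hypothetical fact layer: taxon nodes plus child -> parent edges."""
    rank_of: dict = field(default_factory=dict)    # taxon name -> rank
    parent_of: dict = field(default_factory=dict)  # child taxon -> parent taxon

    def add_taxon(self, name, rank, parent=None):
        if rank not in RANKS:
            raise ValueError(f"unknown rank: {rank}")
        if parent is not None:
            if parent not in self.rank_of:
                raise KeyError(f"parent not in graph: {parent}")
            # The ontology constrains the facts: a parent must hold a higher rank.
            if RANKS.index(self.rank_of[parent]) >= RANKS.index(rank):
                raise ValueError(f"{parent} cannot be the parent of a {rank}")
            self.parent_of[name] = parent
        self.rank_of[name] = rank

    def lineage(self, name):
        """Return the chain of taxa from the highest ancestor down to `name`."""
        chain = [name]
        while chain[-1] in self.parent_of:
            chain.append(self.parent_of[chain[-1]])
        return list(reversed(chain))


graph = TaxonGraph()
graph.add_taxon("Brachiopoda", "Phylum")
graph.add_taxon("Rhynchonellata", "Class", parent="Brachiopoda")
graph.add_taxon("Terebratulida", "Order", parent="Rhynchonellata")
print(graph.lineage("Terebratulida"))
# ['Brachiopoda', 'Rhynchonellata', 'Terebratulida']
```

Keeping the rank order separate from the taxon records means that facts extracted from text can be checked against the ontology before they enter the graph, which is one plausible reason to build the ontology layer by hand and the fact layer automatically.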