Microfossils play a crucial role in biostratigraphy and palaeoenvironmental reconstruction, because the first appearance datum (FAD) and last appearance datum (LAD) of specific microfossils enable precise stratigraphic correlation and age determination. However, traditional identification methods are time-intensive and depend heavily on expert knowledge. To overcome these limitations, we propose MicroViT, a dual-path deep learning (DL) model that integrates convolutional neural networks (CNNs) and vision transformers (ViTs) to automate the identification of Cenozoic ostracods (Microlimnocythere, Cyprideis, Qaidamocythere, Hemicyprinotus, Qaibeigouia, Austrocypris, and Candoniella) from the Qaidam Basin. MicroViT achieves an accuracy of 95.34%, demonstrating superior performance across all classification metrics. Furthermore, we used Gradient-weighted Class Activation Mapping (Grad-CAM) to visualize the model's decision-making process, revealing that DL models focus on morphological features such as reticulation and honeycomb-like spots. We also investigated the potential for extending this approach to other microfossil groups, such as charophytes and sporopollen, as well as to more diverse ostracod populations. These results highlight the significant potential of deep learning techniques for rapid and accurate microfossil classification, with promising applications in micropalaeontology and stratigraphic studies.
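
As an illustration of the Grad-CAM step mentioned above, the following minimal sketch computes a heat map from precomputed feature-map activations and class-score gradients. It is not the MicroViT implementation: the function name `grad_cam`, the nested-list data layout, and the toy values are assumptions made for this example only, and in practice the activations and gradients would come from a chosen convolutional layer of the trained network.

```python
# Hypothetical helper: Grad-CAM heat map from precomputed activations and gradients.
# activations[k][i][j]: feature map k of the chosen convolutional layer.
# gradients[k][i][j]: gradient of the target class score w.r.t. that activation.

def grad_cam(activations, gradients):
    n_rows = len(activations[0])
    n_cols = len(activations[0][0])
    # Channel weights: global-average-pool the gradients over spatial positions.
    weights = [
        sum(sum(row) for row in grad_map) / (n_rows * n_cols)
        for grad_map in gradients
    ]
    # Weighted sum of feature maps, then ReLU to keep positive evidence only.
    cam = [
        [
            max(0.0, sum(w * act[i][j] for w, act in zip(weights, activations)))
            for j in range(n_cols)
        ]
        for i in range(n_rows)
    ]
    # Scale to [0, 1] for display; an all-zero map is returned unchanged.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


# Toy input: two 2x2 feature maps with made-up values (not data from the study).
activations = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 3.0], [1.0, 0.0]],
]
gradients = [
    [[0.5, 0.5], [0.5, 0.5]],      # channel weight 0.5
    [[-1.0, -1.0], [-1.0, -1.0]],  # channel weight -1.0
]
heatmap = grad_cam(activations, gradients)
```

High values in `heatmap` mark the regions that most increased the target class score; upsampled and overlaid on a specimen image, such a map is what would highlight features like reticulation.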