Video-CT MAE: Self-supervised Video-CT Domain Adaptation for Vertebral Fracture Diagnosis

Bueß L, Stollenga M, Schinz D, Wiestler B, Kirschke J, Maier A, Navab N, Keicher M (2024)

Publication Language: English

Publication Type: Conference contribution

Publication year: 2024

Series: Proceedings of Machine Learning Research

Book Volume: 250

Page Range: 151-167

Conference Proceedings Title: Proceedings of MIDL 2024

Event location: Paris

URI: https://proceedings.mlr.press/v250/

Open Access Link: https://openreview.net/pdf?id=shuwpLaOJP

Abstract

Early and accurate diagnosis of vertebral body anomalies is crucial for effectively treating spinal disorders, but the manual interpretation of CT scans can be time-consuming and error-prone. While deep learning has shown promise in automating vertebral fracture detection, improving the interpretability of existing methods is crucial for building trust and ensuring reliable clinical application. Vision Transformers (ViTs) offer inherent interpretability through attention visualizations but are limited in their application to 3D medical images due to reliance on 2D image pretraining. To address this challenge, we propose a novel approach combining the benefits of transfer learning from video-pretrained models and domain adaptation with self-supervised pretraining on a task-specific but unlabeled dataset. Compared to naive transfer learning from Video MAE, our method shows improved downstream task performance by 8.3 in F1 and a training speedup of factor 2. This closes the gap between videos and medical images, allowing a ViT to learn relevant anatomical features while adapting to the task domain. We demonstrate that our framework enables ViTs to effectively detect vertebral fractures in a low data regime, outperforming CNN-based state-of-the-art methods while providing inherent interpretability. Our task adaptation approach and dataset not only improve the performance of our proposed method but also enhance existing self-supervised pretraining approaches, highlighting the benefits of task-specific self-supervised pretraining for domain adaptation.

Authors with CRIS profile

Lukas Bueß, Lehrstuhl für Informatik 5 (Mustererkennung)

Andreas Maier, Lehrstuhl für Informatik 5 (Mustererkennung)

Involved external institutions

Technische Universität München (TUM)

Germany (DE)

How to cite

APA:

Bueß, L., Stollenga, M., Schinz, D., Wiestler, B., Kirschke, J., Maier, A., ... Keicher, M. (2024). Video-CT MAE: Self-supervised Video-CT Domain Adaptation for Vertebral Fracture Diagnosis. In Ninon Burgos, Caroline Petitjean, Maria Vakalopoulou, Stergios Christodoulidis, Pierrick Coupe, Hervé Delingette, Carole Lartizien, Diana Mateus (Eds.), Proceedings of MIDL 2024 (pp. 151-167). Paris, FR.

MLA:

Bueß, Lukas, et al. "Video-CT MAE: Self-supervised Video-CT Domain Adaptation for Vertebral Fracture Diagnosis." Proceedings of the International Conference on Medical Imaging with Deep Learning (MIDL) 2024, Paris. Ed. Ninon Burgos, Caroline Petitjean, Maria Vakalopoulou, Stergios Christodoulidis, Pierrick Coupe, Hervé Delingette, Carole Lartizien, Diana Mateus, 2024. 151-167.
