Towards an Exhaustive Evaluation of Vision-Language Foundation Models

Salin E, Ayache S, Favre B (2023)

Publication Type: Conference contribution

Publication year: 2023

Publisher: Institute of Electrical and Electronics Engineers Inc.

Pages Range: 339-352

Conference Proceedings Title: Proceedings - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023

Event location: Paris

ISBN: 9798350307443

DOI: 10.1109/ICCVW60793.2023.00041

Abstract

Vision-language foundation models have improved considerably in performance over the last few years. However, there is still a lack of comprehensive evaluation methods able to clearly explain their performance. We argue that a more systematic approach to foundation model evaluation would benefit their use in real-world applications. In particular, we think that these models should be evaluated on a broad range of precise capabilities, in order to bring awareness to the breadth of their scope and their potential weaknesses. To that end, we propose a methodology to build a taxonomy of multimodal capabilities for vision-language foundation models. The proposed taxonomy is intended as a first step towards an exhaustive evaluation of vision-language foundation models.

Authors with CRIS profile

Emmanuelle Salin

Involved external institutions

Aix-Marseille University / Aix-Marseille Université

France (FR)

How to cite

APA:

Salin, E., Ayache, S., & Favre, B. (2023). Towards an Exhaustive Evaluation of Vision-Language Foundation Models. In Proceedings - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023 (pp. 339-352). Paris, FR: Institute of Electrical and Electronics Engineers Inc.

MLA:

Salin, Emmanuelle, Stéphane Ayache, and Benoit Favre. "Towards an Exhaustive Evaluation of Vision-Language Foundation Models." Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023, Paris: Institute of Electrical and Electronics Engineers Inc., 2023. 339-352.
