| Benchmark | Previous SoTA | Claude-3.5 Sonnet | GPT-4o | Qwen2-VL-72B | Qwen2-VL-7B | Qwen2-VL-2B |
|---|---|---|---|---|---|---|
| MMMU (val) (Yue et al., 2023) | 66.1 (X.AI, 2024b) | 68.3 | 69.1 | 64.5 | 54.1 | 41.1 |
| DocVQA (test) (Mathew et al., 2021) | 94.1 (Chen et al., 2024c) | 95.2 | 92.8 | 96.5 | 94.5 | 90.1 |
| InfoVQA (test) (Mathew et al., 2021) | 82.0 (Chen et al., 2024c) | - | - | 84.5 | 76.5 | 65.5 |
| AI2D (Kembhavi et al., 2016) | 87.6 (Chen et al., 2024c) | 80.2 (94.7) | 84.6 (94.2) | 88.1 | 83.0 | 74.7 |
| ChartQA (test) (Masry et al., 2022) | 88.4 (Chen et al., 2024c) | 90.8 | 85.7 | 88.3 | 83.0 | 73.5 |
| TextVQA (val) (Singh et al., 2019) | 84.4 (Chen et al., 2024c) | - | - | 85.5 | 84.3 | 79.7 |
| OCRBench (Liu et al., 2023e) | 852 (Yao et al., 2024) | 788 | 736 | 877 | 866 | 809 |
| MTVQA (Tang et al., 2024) | 23.2 (Team et al., 2023) | 25.7 | 27.8 | 30.9 | 25.6 | 18.1 |
| VCR (en easy) (Zhang et al., 2024c) | 84.7 (Chen et al., 2024c) | 63.9 | 91.6 | 91.9 | 89.7 | 81.5 |
| VCR (zh easy) (Zhang et al., 2024c) | 22.1 (Chen et al., 2024c) | 1.0 | 14.9 | 65.4 | 59.9 | 46.2 |
| RealWorldQA (X.AI, 2024a) | 72.2 (Chen et al., 2024c) | 60.1 | 75.4 | 77.8 | 70.1 | 62.9 |
| MME (sum) (Fu et al., 2023) | 2414.7 (Chen et al., 2024c) | 1920.0 | 2328.7 | 2482.7 | 2326.8 | 1872.0 |
| MMBench-EN (test) (Liu et al., 2023d) | 86.5 (Chen et al., 2024c) | 79.7 | 83.4 | 86.5 | 83.0 | 74.9 |
| MMBench-CN (test) (Liu et al., 2023d) | 86.3 (Chen et al., 2024c) | 80.7 | 82.1 | 86.6 | 80.5 | 73.5 |
| MMBench-V1.1 (test) (Liu et al., 2023d) | 85.5 (Chen et al., 2024c) | 78.5 | 82.2 | 85.9 | 80.7 | 72.2 |
| MMT-Bench (test) (Ying et al., 2024) | 63.4 (Chen et al., 2024b) | - | 65.5 | 71.7 | 63.7 | 54.5 |
| MMStar (Chen et al., 2024a) | 67.1 (Chen et al., 2024c) | 62.2 | 63.9 | 68.3 | 60.7 | 48.0 |
| MMVet (GPT-4-Turbo) (Yu et al., 2024) | 67.5 (OpenAI, 2023) | 66.0 | 69.1 | 74.0 | 62.0 | 49.5 |
| HallBench (avg) (Guan et al., 2023) | 55.2 (Chen et al., 2024c) | 49.9 | 55.0 | 58.1 | 50.6 | 41.7 |
| MathVista (testmini) (Lu et al., 2024a) | 69.0 (X.AI, 2024b) | 67.7 | 63.8 | 70.5 | 58.2 | 43.0 |
| MathVision (Wang et al., 2024) | 30.3 (OpenAI, 2023) | - | 30.4 | 25.9 | 16.3 | 12.4 |
| MMMU-Pro (Yue et al., 2024) | 46.9 (Team et al., 2023) | 51.5 | 51.9 | 46.2 | 43.5 | 37.6 |