Introduction
CodeQwen1.5 is the code-specific version of Qwen1.5. It is a decoder-only, transformer-based language model pretrained on a large amount of code data.
- Strong code generation capabilities and competitive performance across a series of benchmarks;
- Support for long-context understanding and generation, with a context length of 64K tokens;
- Support for 92 programming languages;
- Excellent performance on tasks such as text-to-SQL and bug fixing.

Detailed performance results and a full introduction are presented in this 📑 blog.
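For reference, below is a minimal sketch of plain left-to-right code completion with the base model via Hugging Face Transformers. The Hub id `Qwen/CodeQwen1.5-7B`, the prompt, and the generation settings are illustrative assumptions, not the official recipe (see Quick Start below).

```python
# Minimal sketch: code completion with the base model (assumed Hub id below).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/CodeQwen1.5-7B"  # assumed Hugging Face Hub id of the base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native dtype
    device_map="auto",    # place weights on the available device(s)
)

# Ask the model to continue a Python function stub.
prompt = "def quicksort(arr):\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```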
Performance
EvalPlus (HumanEval, MBPP)

| Model | Size | HumanEval 0-shot | HumanEval+ 0-shot | MBPP 0-shot | MBPP+ 0-shot | MBPP 3-shot |
|---|---|---|---|---|---|---|
| Base Model | | | | | | |
| CodeLlama-Base | 7B | 33.5 | 25.6 | 52.1 | 41.6 | 38.6 |
| StarCoder2 | 7B | 35.4 | 29.9 | 54.4 | 45.6 | 51.0 |
| DeepSeek-Coder-Base | 6.7B | 47.6 | 39.6 | 70.2 | 56.6 | 60.6 |
| CodeQwen1.5 | 7B | 51.8 | 45.7 | 72.2 | 60.2 | 61.8 |
| Chat Model | | | | | | |
| GPT-3.5-Turbo | - | 76.8 | 70.7 | 82.5 | 69.7 | 70.8 |
| GPT-4-Turbo (Nov 2023) | - | 85.4 | 81.7 | 83.5 | 70.7 | 80.0 |
| DeepSeek-Coder-Instruct | 6.7B | 73.8 | 70.1 | 73.2 | 63.4 | 65.4 |
| CodeQwen1.5-Chat | 7B | 83.5 | 78.7 | 77.7 | 67.2 | 70.6 |
LiveCodeBench

| Model | Size | Code Generation (All Time) Pass@1 | Code Generation (2023/9/1 ~ 2024/4/1) Pass@1 |
|---|---|---|---|
| Base Model | | | |
| CodeLlama-Base | 7B | 6.5 | 7.6 |
| StarCoder2 | 7B | 11.3 | 12.7 |
| DeepSeek-Coder-Base | 6.7B | 19.1 | 13.7 |
| CodeQwen1.5 | 7B | 21.8 | 19.3 |
| Chat Model | | | |
| CodeLlama-Instruct | 7B | 10.6 | 12.4 |
| DeepSeek-Coder-Instruct | 6.7B | 21.6 | 19.2 |
| CodeQwen1.5-Chat | 7B | 25.0 | 23.2 |
MultiPL-E

| Model | Size | Python | C++ | Java | PHP | TS | C# | Bash | JS | Avg |
|---|---|---|---|---|---|---|---|---|---|---|
| Base Model | | | | | | | | | | |
| CodeLlama-Base | 7B | 31.7 | 29.8 | 34.2 | 23.6 | 36.5 | 36.7 | 12.0 | 29.2 | 29.2 |
| StarCoder2-Base | 7B | 35.3 | 40.9 | 37.3 | 29.2 | 37.7 | 40.5 | 9.4 | 36.0 | 33.3 |
| DeepSeek-Coder-Base | 6.7B | 49.4 | 50.3 | 43.0 | 38.5 | 49.7 | 50.0 | 28.5 | 48.4 | 44.7 |
| CodeQwen1.5 | 7B | 52.4 | 52.2 | 42.4 | 46.6 | 52.2 | 55.7 | 36.7 | 49.7 | 48.5 |
| Chat Model | | | | | | | | | | |
| GPT-3.5-Turbo | - | 76.2 | 63.4 | 69.2 | 60.9 | 69.1 | 70.8 | 42.4 | 67.1 | 64.9 |
| GPT-4 | - | 84.1 | 76.4 | 81.6 | 77.2 | 77.4 | 79.1 | 58.2 | 78.0 | 76.5 |
| DeepSeek-Coder-Instruct | 6.7B | 78.6 | 63.4 | 68.4 | 68.9 | 67.2 | 72.8 | 36.7 | 72.7 | 66.1 |
| CodeQwen1.5-Chat | 7B | 83.2 | 71.2 | 70.1 | 73.5 | 75.4 | 75.9 | 41.1 | 78.2 | 71.1 |
Text-to-SQL

| Model | Size | Spider Execution Accuracy (Dev Set) | Bird Execution Accuracy (Dev Set) |
|---|---|---|---|
| GPT-3.5-Turbo | - | 70.1 | 37.2 |
| GPT-4 | - | 85.3 | 50.7 |
| CodeLlama-Instruct | 7B | 59.5 | 22.4 |
| DeepSeek-Coder-Instruct | 6.7B | 70.1 | 39.4 |
| CodeQwen1.5-Chat | 7B | 77.9 | 42.0 |
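To make the text-to-SQL use case concrete, here is a rough sketch of how one might prompt the chat model to translate a natural-language question into SQL. The Hub id `Qwen/CodeQwen1.5-7B-Chat` is an assumption, and the schema and question are made up for illustration; they are not taken from the Spider or Bird dev sets.

```python
# Rough sketch: prompting the chat model for text-to-SQL (assumed Hub id below).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/CodeQwen1.5-7B-Chat"  # assumed Hugging Face Hub id of the chat model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Illustrative schema and question (not from the Spider/Bird benchmarks).
question = (
    "Given the table employees(id, name, department, salary), "
    "write a SQL query that returns the average salary per department."
)
messages = [{"role": "user", "content": question}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, i.e. the SQL answer.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```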
Quick Start