Open-O1: Matching the Capabilities of OpenAI's O1 Model Through Open Source

GitHubStore · WeChat Official Account · 2024-10-15 08:37

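Project Introduction

Our Open O1 aims to match the powerful capabilities of the proprietary OpenAI O1 model, giving the community an advanced open-source alternative. We developed our model by curating a set of O1-style thought data and then using it to train LLaMA and Qwen models. This training approach gives smaller models enhanced long-reasoning and problem-solving capabilities.

As the Open O1 project progresses, we will continue to push the boundaries of what is possible with large language models. Our vision is to create a model that not only achieves O1-like performance but also leads in test-time scaling, making advanced AI capabilities accessible to all. Through community-driven development and a commitment to ethical practices, Open O1 will be a cornerstone of AI progress, ensuring that the future of technology is open and benefits everyone.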

Demo

🍓 How many letters 'r' are in the word "strawberry"?

Input
How many letters 'r' in the word 'strawberry'?

Thought
Alright, I need to determine how many letters 'r' are in the word "strawberry." Let's start by looking at the word step by step. The word "strawberry" has several letters, and I'll list them out: s, t, r, a, w, b, e, r, r, y. Now, focusing on the letter 'r,' I can count each occurrence. 
First, there's an 'r' in the third position after the 's.' Then, there's another 'r' in the eighth position before the 'y.' Additionally, there's an 'r' in the ninth position, making it three 'r's in total. To be thorough, I'll double-check by counting again: 1st letter is 's,' 2nd is 't,' 3rd is 'r'—count, 4th is 'a,' 5th is 'w,' 6th is 'b,' 7th is 'e,' 8th is 'r,' 9th is 'r,' 10th is 'r,' and finally, 11th is 'y.' Yes, that confirms it—the letter 'r' appears three times.
But wait, could there be any other interpretations? Maybe considering uppercase versus lowercase, but since both 'r's in "strawberry" are lowercase, it doesn't change the count. Also, thinking about the pronunciation, even though 'r' might have a different sound, the number of 'r's remains the same visually. Another angle: if the word were spelled differently or had diacritics, but "strawberry" is straightforward in its spelling.
Is there a possibility of multiple meanings or contexts affecting the count? For example, in a different language or script, but assuming we're sticking to standard English spelling, the count should remain consistent. Also, considering the position of 'r's in the word, they are not repeated consecutively, which might affect certain linguistic analyses, but for the purpose of counting, their positions don't impact the total number.
In summary, after carefully examining each letter in "strawberry" and recounting to ensure accuracy, I'm confident that the letter 'r' appears three times in the word.

Output
There are three letters 'r' in the word 'strawberry'.
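
The count in this demo can be checked mechanically. The following is a minimal Python sketch for illustration only; it is not part of the Open O1 project, and the helper name `letter_positions` is hypothetical. It lists the 1-based positions at which a letter occurs in a word, using a case-sensitive comparison.

```python
def letter_positions(word: str, letter: str) -> list[int]:
    """Return the 1-based positions at which `letter` occurs in `word`.

    The comparison is case-sensitive; `letter_positions` is a
    hypothetical helper written for this illustration.
    """
    return [i for i, ch in enumerate(word, start=1) if ch == letter]


positions = letter_positions("strawberry", "r")
print(positions)       # [3, 8, 9]
print(len(positions))  # 3
```

The word has ten letters, and 'r' falls at positions 3, 8, and 9, which agrees with the final answer of three.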

Contents

Section	Description
💻 Model Deployments & Chat Templates	Instructions and examples for deploying models and using chat templates effectively.
✍ Example Demonstrations	Showcase of various use cases and practical demonstrations of the model's capabilities.
💯 System Performance	Analysis of system performance metrics, benchmarking results, and optimization strategies.
🎋 Training Details	An overview of the training process for Open O1, including datasets used, training methodologies, and any relevant hyperparameters.
❓ FAQ	Answers to frequently asked questions.
⚠️ Limitations	A discussion of the limitations of the models involved, including known issues, performance constraints, and areas for future improvement.

💻 Model Deployments & 💬 Chat Templates

💻 Model Deployments

To get started with Open O1 quickly, we provide the following steps.

Model Downloads

OpenO1-LLama-8B-v0.1
OpenO1-V1-Qwen-7B

Deployment

Go to the open-o1 Hugging Face repository to deploy the model.

💬 Chat Templates

The Open O1 chat template follows that of LLaMA 3.1. Details are available in chat-templates.

dialog = [
    {"role": "user", "content": "What's the weather like today?"},
    {"role": "assistant", "content": "It's sunny and warm, around 75°F."}
]

After processing with the chat template, the converted result is as follows:

<|begin_of_text|><|start_header_id|>user<|end_header_id|>What's the weather like today?<|eot_id|><|start_header_id|>assistant<|end_header_id|>It's sunny and warm, around 75°F.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
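
The conversion above can be sketched in a few lines of Python. This is a hand-written illustration of the string layout shown, not the project's code or the tokenizer's actual template; the names `render_dialog`, `header_sep`, and `add_generation_prompt` are hypothetical. One assumption to flag: the upstream LLaMA 3.1 prompt format places two newlines after each header, which the flattened rendering above does not show, so the separator is left as a parameter and defaults to an empty string to reproduce the string as printed here.

```python
def render_dialog(dialog, header_sep="", add_generation_prompt=True):
    """Render a list of {"role", "content"} messages as one prompt string.

    Hypothetical helper for illustration. `header_sep` is the text placed
    after each header; "" reproduces the rendering shown in this article,
    while "\n\n" follows the upstream LLaMA 3.1 layout (an assumption
    not confirmed by the article).
    """
    parts = ["<|begin_of_text|>"]
    for message in dialog:
        parts.append(f"<|start_header_id|>{message['role']}<|end_header_id|>")
        parts.append(header_sep)
        parts.append(message["content"])
        parts.append("<|eot_id|>")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|start_header_id|>assistant<|end_header_id|>")
        parts.append(header_sep)
    return "".join(parts)


dialog = [
    {"role": "user", "content": "What's the weather like today?"},
    {"role": "assistant", "content": "It's sunny and warm, around 75°F."},
]
print(render_dialog(dialog))
```

In practice the tokenizer's own chat template (for example, `tokenizer.apply_chat_template` in Hugging Face Transformers) is the safer choice, since it carries the exact special tokens and whitespace the model was trained with.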

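System Performance

The table below compares llama3.1-8b-instruct and our model across several benchmarks. The evaluations were run in a zero-shot setting, meaning the models were tested without task-specific fine-tuning, which highlights their ability to generalize across tasks. The benchmarks assess various aspects of reasoning, knowledge, and understanding in different domains, showing how each model handles complex tasks without prior exposure or task-specific training. Our model consistently demonstrates competitive or superior performance, with gains in the key areas of reasoning, mathematical understanding, and general AI capabilities.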

Model	GSM8K	MATH	MMLU	Hellaswag	ARC-C	BBH
llama3.1-8b-instruct	84.00	47.42	67.95	68.43	83.87	53.64
Ours(OpenO1-llama-8B-v0.1)	85.82	52.88	70.45	67.77	86.52	58.43
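
To make the comparison easier to read, the per-benchmark differences can be computed directly from the table. The snippet below is a small Python sketch that uses only the scores listed above; the variable names `scores` and `deltas` are our own for this illustration.

```python
# Scores copied from the table: (llama3.1-8b-instruct, OpenO1-llama-8B-v0.1).
scores = {
    "GSM8K": (84.00, 85.82),
    "MATH": (47.42, 52.88),
    "MMLU": (67.95, 70.45),
    "Hellaswag": (68.43, 67.77),
    "ARC-C": (83.87, 86.52),
    "BBH": (53.64, 58.43),
}

# Difference in points (ours minus baseline), rounded to two decimals.
deltas = {name: round(ours - base, 2) for name, (base, ours) in scores.items()}

for name, delta in deltas.items():
    print(f"{name:10s} {delta:+.2f}")
```

On these figures, OpenO1-llama-8B-v0.1 leads on five of the six benchmarks, with the largest margin on MATH (+5.46) and a small deficit on Hellaswag (-0.66).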

GSM8K: Our model scores 85.82, outperforming llama3.1-8b-instruct (84.00) and showing better reasoning on math word problems.
MATH