Column: GitHubStore

Sharing interesting open-source projects

Smart real-time API tool

GitHubStore · WeChat Official Account · 2024-10-10 09:09

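Project overview

An OpenAI Realtime API console with Firecrawl integration. It lets you interact with and inspect the API in real time, works in the browser and Node.js, and supports audio management.

OpenAI Realtime Console + Firecrawl is intended as an inspector and interactive API reference for the OpenAI Realtime API, with Firecrawl integrated for web data. It comes with two utility libraries: openai/openai-realtime-api-beta, which acts as a reference client for the browser and Node.js, and /src/lib/wavtools, which allows simple audio management in the browser.

Starting the console

This is a React project created with create-react-app and bundled with Webpack. Install it by extracting the contents of the package and running: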

$ npm i

Start your server with:

$ npm start

It should be available at localhost:3000.

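Using the console

The console requires an OpenAI API key (user key or project key) that has access to the Realtime API. You will be prompted to enter it on startup. It is saved in localStorage and can be changed at any time from the UI.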
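The prompt-once-then-persist behaviour described above can be sketched as follows. This is a minimal illustration, not the console's actual code: the storage key name and the helper functions `loadApiKey`, `resetApiKey`, and `createMemoryStorage` are all hypothetical, and the in-memory storage exists only so the sketch also runs outside a browser.

```javascript
// Hypothetical storage key; the console's real key name is not given in this article.
const API_KEY_STORAGE_KEY = 'example::openai_api_key';

// Returns the saved key, or asks for one via `promptFn` and saves it.
// `storage` is any object with getItem/setItem (e.g. window.localStorage).
function loadApiKey(storage, promptFn) {
  const saved = storage.getItem(API_KEY_STORAGE_KEY);
  if (saved) return saved;
  const entered = (promptFn() || '').trim();
  if (entered) storage.setItem(API_KEY_STORAGE_KEY, entered);
  return entered;
}

// Overwrites the saved key, e.g. from a "change key" control in the UI.
function resetApiKey(storage, newKey) {
  storage.setItem(API_KEY_STORAGE_KEY, newKey.trim());
}

// In-memory stand-in for localStorage so the sketch also runs in Node.js.
function createMemoryStorage() {
  const map = new Map();
  return {
    getItem: (k) => (map.has(k) ? map.get(k) : null),
    setItem: (k, v) => { map.set(k, String(v)); },
  };
}
```

In a browser, `loadApiKey(window.localStorage, () => window.prompt('OpenAI API key'))` would be one way to wire it up. Keep in mind that anything in localStorage is readable by scripts running on the same origin.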

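To start a session, you need to connect, which requires microphone access. You can then choose between manual (push-to-talk) and vad (voice activity detection) conversation modes, and switch between them at any time.

Two functions are enabled:

get_weather: Ask for the weather anywhere and the model will do its best to pinpoint the location, show it on a map, and fetch the weather for it. Note that it has no location access, and the coordinates are "guessed" from the model's training data, so accuracy may not be perfect.
set_memory: You can ask the model to remember information for you, and it will store it in a JSON blob on the left.

You can freely interrupt the model at any time in push-to-talk or VAD mode.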

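Using a relay server

If you would like to build a more robust implementation and use the reference client with your own server, a Node.js relay server is included.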

$ npm run relay

It will start automatically on localhost:8081.

You will need to create a .env file with the following configuration:

OPENAI_API_KEY=YOUR_API_KEY
REACT_APP_LOCAL_RELAY_SERVER_URL=http://localhost:8081

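You will need to restart both the React app and the relay server for the .env changes to take effect. The local server URL is loaded via ConsolePage.tsx. To stop using the relay server at any time, simply delete the environment variable or set it to an empty string.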

/**
 * Running a local relay server will allow you to hide your API key
 * and run custom logic on the server
 *
 * Set the local relay server address to:
 * REACT_APP_LOCAL_RELAY_SERVER_URL=http://localhost:8081
 *
 * This will also require you to set OPENAI_API_KEY= in a `.env` file
 * You can run it with `npm run relay`, in parallel with `npm start`
 */
const LOCAL_RELAY_SERVER_URL: string =
  process.env.REACT_APP_LOCAL_RELAY_SERVER_URL || '';
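The rule that an unset or empty variable turns the relay off can be made explicit with a small helper. The sketch below is illustrative only: `resolveRelayUrl` and `describeConnection` are hypothetical names, and the returned `mode` labels are assumptions made for this example rather than anything the console defines.

```javascript
// Returns the relay URL to connect through, or null when the relay is disabled.
// `env` is a plain object such as process.env.
function resolveRelayUrl(env) {
  const url = (env.REACT_APP_LOCAL_RELAY_SERVER_URL || '').trim();
  return url === '' ? null : url;
}

// Describes how a client would connect for a given environment (illustrative only).
function describeConnection(env) {
  const relayUrl = resolveRelayUrl(env);
  return relayUrl
    ? { mode: 'relay', url: relayUrl }
    : { mode: 'direct', url: null };
}
```

Treating a missing variable and an empty string the same way matches the text above: deleting the variable or blanking it both fall back to a direct connection.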

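The server is only a simple message relay, but it can be extended to:

Hide API credentials if you want to ship an app that can be tried online
Handle certain calls you want to keep secret (e.g. instructions) directly on the server
Restrict the types of events the client can receive and send

You will have to implement these features yourself.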
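As one possible starting point for the third extension, a relay could check each incoming client message against an allowlist before forwarding it. The sketch below is an assumption about how such a filter might look, not part of the shipped relay: `filterClientEvent` and the chosen allowlist are hypothetical, and wiring it into the relay's message handler is left out.

```javascript
// Event types the browser may send upstream (an illustrative allowlist).
// session.update is deliberately absent so instructions stay server-side.
const ALLOWED_CLIENT_EVENTS = new Set([
  'input_audio_buffer.append',
  'input_audio_buffer.commit',
  'response.create',
  'response.cancel',
]);

// Parses a raw JSON message and returns the event if allowed, otherwise null.
function filterClientEvent(rawMessage, allowed = ALLOWED_CLIENT_EVENTS) {
  let event;
  try {
    event = JSON.parse(rawMessage);
  } catch {
    return null; // not valid JSON: drop it
  }
  if (!event || typeof event.type !== 'string') return null;
  return allowed.has(event.type) ? event : null;
}
```

A relay would forward the returned event when it is non-null and silently drop (or log) everything else.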

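Realtime API reference client

The latest reference client and documentation are available on GitHub at openai/openai-realtime-api-beta.

You can use this client yourself in any React (front-end) or Node.js project. For full documentation, refer to the GitHub repository, but you can use the guide here as a primer to get started.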

import { RealtimeClient } from '/src/lib/realtime-api-beta/index.js';

const client = new RealtimeClient({ apiKey: process.env.OPENAI_API_KEY });

// Can set parameters ahead of connecting
client.updateSession({ instructions: 'You are a great, upbeat friend.' });
client.updateSession({ voice: 'alloy' });
client.updateSession({ turn_detection: 'server_vad' });
client.updateSession({ input_audio_transcription: { model: 'whisper-1' } });

// Set up event handling
client.on('conversation.updated', ({ item, delta }) => {
  const items = client.conversation.getItems(); // can use this to render all items
  /* includes all changes to conversations, delta may be populated */
});

// Connect to Realtime API
await client.connect();

// Send an item and trigger a generation
client.sendUserMessageContent([{ type: 'text', text: `How are you?` }]);

Sending streaming audio

To send streaming audio, use the .appendInputAudio() method. If you are in turn_detection: 'disabled' mode, you then need to call .createResponse() to tell the model to respond.

// Send user audio, must be Int16Array or ArrayBuffer
// Default audio format is pcm16 with sample rate of 24,000 Hz
// This populates 1s of noise in 0.1s chunks
for (let i = 0; i < 10; i++) {
  const data = new Int16Array(2400);
  for (let n = 0; n < 2400; n++) {
    const value = Math.floor((Math.random() * 2 - 1) * 0x8000);
    data[n] = value;
  }
  client.appendInputAudio(data);
}
// Pending audio is committed and model is asked to generate
client.createResponse();
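Real input usually arrives as floating-point samples rather than ready-made Int16 data, so a conversion and chunking step is needed before calling .appendInputAudio(). The helpers below are a sketch under that assumption; `floatToPcm16` and `chunkPcm16` are hypothetical names and are not part of the reference client or wavtools.

```javascript
const SAMPLE_RATE = 24000; // Hz, the default pcm16 rate mentioned above

// Converts float samples in [-1, 1] to 16-bit signed PCM, clamping out-of-range values.
function floatToPcm16(float32) {
  const out = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    const s = Math.max(-1, Math.min(1, float32[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

// Splits PCM samples into chunks of `chunkMs` milliseconds (the last chunk may be shorter).
function chunkPcm16(pcm, chunkMs = 100, sampleRate = SAMPLE_RATE) {
  const size = Math.round((sampleRate * chunkMs) / 1000);
  const chunks = [];
  for (let start = 0; start < pcm.length; start += size) {
    chunks.push(pcm.subarray(start, start + size));
  }
  return chunks;
}
```

Each chunk returned by `chunkPcm16` is an Int16Array view, so it could be passed to `client.appendInputAudio(chunk)` in a loop like the one above.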

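Adding and using tools

Working with tools is easy. Just call .addTool() and pass a callback as the second parameter. The callback is executed with the tool's parameters, and the result is automatically sent back to the model.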

// We can add tools as well, with callbacks specified
client.addTool(
  {
    name: 'get_weather',
    description:
      'Retrieves the weather for a given lat, lng coordinate pair. Specify a label for the location.',
    parameters: {
      type: 'object',
      properties: {
        lat: {
          type: 'number',
          description: 'Latitude',
        },
        lng: {
          type: 'number',
          description: 'Longitude',
        },
        location: {
          type: 'string',
          description: 'Name of the location',
        },
      },
      required: ['lat', 'lng', 'location'],
    },
  },
  async ({ lat, lng, location }) => {
    const result = await fetch(
      `https://api.open-meteo.com/v1/forecast?latitude=${lat}&longitude=${lng}&current=temperature_2m,wind_speed_10m`
    );
    const json = await result.json();
    return json;
  }
);
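To make the "definition plus callback" pattern concrete, here is a tiny stand-alone registry that mimics the flow: look up the tool by name, parse the JSON arguments, run the callback, and serialize the result. It is a hypothetical sketch of the idea, not the reference client's implementation; the `ToolRegistry` class and its error format are assumptions.

```javascript
// A tiny stand-in for the "definition + callback" pattern; not the library's code.
class ToolRegistry {
  constructor() {
    this.tools = new Map();
  }

  // Registers a tool definition together with the callback that implements it.
  addTool(definition, handler) {
    if (this.tools.has(definition.name)) {
      throw new Error(`Tool already added: ${definition.name}`);
    }
    this.tools.set(definition.name, { definition, handler });
  }

  // `argsJson` is the JSON string of arguments produced by the model.
  // Always resolves to a JSON string, so failures can be reported back too.
  async call(name, argsJson) {
    const tool = this.tools.get(name);
    if (!tool) return JSON.stringify({ error: `Unknown tool: ${name}` });
    try {
      const result = await tool.handler(JSON.parse(argsJson));
      return JSON.stringify(result);
    } catch (err) {
      return JSON.stringify({ error: String(err.message || err) });
    }
  }
}
```

Returning errors as data rather than throwing is a design choice made for this sketch: it lets the model see that a call failed and respond accordingly.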

Interrupting the model

You may want to interrupt the model manually, especially in turn_detection: 'disabled' mode. To do this, use:

// id is the id of the item currently being generated
// sampleCount is the number of audio samples that have been heard by the listener
client.cancelResponse(id, sampleCount);

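This method causes the model to stop generating immediately, and it also truncates the item being played by removing all audio after sampleCount and clearing the text response. By using it, you can interrupt the model and prevent it from "remembering" anything it has generated beyond the point the user has actually heard.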
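The sampleCount value has to come from the playback side. One simple way to estimate it, assuming playback started at a known time and runs at the default 24,000 Hz, is sketched below. `samplesHeard` and `truncateAudio` are hypothetical helpers written for this illustration, not functions of the reference client.

```javascript
// Number of samples the listener has heard after `elapsedMs` of playback.
function samplesHeard(elapsedMs, sampleRate = 24000) {
  return Math.max(0, Math.floor((elapsedMs / 1000) * sampleRate));
}

// Returns a copy containing only the audio the listener actually heard.
function truncateAudio(pcm, sampleCount) {
  return pcm.slice(0, Math.min(sampleCount, pcm.length));
}
```

For example, if the user interrupts half a second into a response, `samplesHeard(500)` gives the count to pass as the second argument of `client.cancelResponse(id, sampleCount)`.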

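Reference client events

RealtimeClient has five main client events for application control flow. Note that this is only an overview of using the client; the full Realtime API event specification is considerably larger. If you need more control, check out the GitHub repository: openai/openai-realtime-api-beta.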

// errors like connection failures
client.on('error', (event) => {
  // do thing
});

// in VAD mode, the user starts speaking
// we can use this to stop audio playback of a previous response if necessary
client.on('conversation.interrupted', () => {
  /* do something */
});

// includes all changes to conversations
// delta may be populated
client.on('conversation.updated', ({ item, delta }) => {
  // get all items, e.g. if you need to update a chat window
  const items = client.conversation.getItems();
  switch (item.type) {
    case 'message':
      // system, user, or assistant message (item.role)
      break;
    case 'function_call':
      // always a function call from the model
      break;
    case 'function_call_output':
      // always a response from the user / application
      break;
  }
  if (delta) {
    // Only one of the following will be populated for any given event
    // delta.audio = Int16Array, audio added
    // delta.transcript = string, transcript added
    // delta.arguments = string, function arguments added
  }
});

// only triggered after item added to conversation
client.on('conversation.item.appended', ({ item }) => {
  /* item status can be 'in_progress' or 'completed' */
});

// only triggered after item completed in conversation
// will always be triggered after conversation.item.appended
client.on('conversation.item.completed', ({ item }) => {
  /* item status will always be 'completed' */
});
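Because each conversation.updated event carries at most one kind of delta, an application typically folds the deltas into some running state per item. The accumulator below is a minimal sketch of that idea based only on the three delta fields listed in the comments above; `createItemState` and `applyDelta` are hypothetical names.

```javascript
// Creates an empty accumulator for one conversation item.
function createItemState() {
  return { audioChunks: [], transcript: '', arguments: '' };
}

// Folds one `delta` from a conversation.updated event into the accumulator.
function applyDelta(state, delta) {
  if (!delta) return state; // some updates carry no delta at all
  if (delta.audio) state.audioChunks.push(delta.audio);
  if (delta.transcript) state.transcript += delta.transcript;
  if (delta.arguments) state.arguments += delta.arguments;
  return state;
}
```

Inside the handler, one accumulator per item could be kept in a Map and updated with `applyDelta(state, delta)` on every event.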

Wavtools

Wavtools makes it easy to manage PCM16 audio streams in the browser, covering both recording and playback.