Column: GitHubStore

Sharing interesting open-source projects

A fully local AI voice chat tool

GitHubStore · WeChat official account · 2024-08-06 16:22


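Project overview

A fast, fully local AI voice chat tool that uses WebSockets for low-latency voice interaction and supports multiple speech recognition and speech synthesis backends.

On a 7900-class AMD RDNA3 card, voice-to-voice latency is in the 1-second range with: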

Whisper large-v2 (Q5)
Llama 3 8B (Q4_K_M)
tts_models/en/vctk/vits (Coqui TTS default VITS models)

On a 4090, using Faster Whisper with faster-distil-whisper-large-v2, latency can be cut to as low as 300 ms.

Installation

These installation instructions are for Ubuntu LTS and assume you have already set up ROCm or CUDA.
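
Since everything below assumes a working ROCm or CUDA setup, a quick check before building can save time. This is only a minimal sketch: it assumes the vendor tools rocminfo (ROCm) and nvidia-smi (CUDA) are on PATH when the stack is installed, and it does not validate drivers or the GPU itself.

```shell
# Report which GPU stack appears to be installed: "rocm", "cuda", or "none".
# Only checks that the vendor CLI tools are on PATH; drivers are not validated.
detect_gpu_stack() {
    if command -v rocminfo >/dev/null 2>&1; then
        echo "rocm"
    elif command -v nvidia-smi >/dev/null 2>&1; then
        echo "cuda"
    else
        echo "none"
    fi
}

stack=$(detect_gpu_stack)
echo "Detected GPU stack: $stack"
```

If this prints "none", the build steps for whisper.cpp and llama.cpp below will most likely not produce a GPU-accelerated binary.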

I recommend using conda or (my preference) mamba for environment management. It will make your life easier.

System prerequisites

sudo apt update
# Not strictly required but the helpers we use
sudo apt install byobu curl wget
# Audio processing
sudo apt install espeak-ng ffmpeg libopus0 libopus-dev
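
To confirm the packages above landed, one option is to look for their commands on PATH. The sketch below is an illustration, not part of the project: missing_tools is a hypothetical helper, and it assumes each package installs a command of the same name. libopus0 and libopus-dev are libraries without a command, so they are not covered.

```shell
# Print every command from the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Command names assumed to match the apt package names above.
missing=$(missing_tools byobu curl wget espeak-ng ffmpeg)
if [ -n "$missing" ]; then
    # Unquoted on purpose so the names are joined on one line.
    echo "Missing tools:" $missing
else
    echo "All helper tools found."
fi
```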

Checkout code

# Create env
mamba create -y -n voicechat2 python=3.11

# Setup
mamba activate voicechat2
git clone https://github.com/lhl/voicechat2
cd voicechat2
pip install -r requirements.txt
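
A common slip at this step is running pip install outside the new environment. As a small guard, the sketch below checks the CONDA_DEFAULT_ENV variable, which conda and mamba set on activation; in_env is a hypothetical helper name introduced here for illustration.

```shell
# Succeed only when the named conda/mamba environment is the active one.
# CONDA_DEFAULT_ENV is set by conda and mamba when an environment is activated.
in_env() {
    [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

if in_env voicechat2; then
    echo "voicechat2 environment is active"
else
    echo "activate it first: mamba activate voicechat2"
fi
```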

whisper.cpp

# Build whisper.cpp
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
# AMD version
# -DGGML_HIP_UMA=ON to work with APUs (but hurts dGPU perf)
GGML_HIPBLAS=1 make -j
# Nvidia version
GGML_CUDA=1 make -j

# Get model - large-v2 is 3094 MB
bash ./models/download-ggml-model.sh large-v2
# Quantized version - large-v2-q5_0 is 1080MB
# bash ./models/download-ggml-model.sh large-v2-q5_0

# If you're going to go to the next instruction
cd ..
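
Only one of the two make invocations above applies to a given machine. If you script the build, the choice might be expressed as below. This is a sketch: build_flag is a hypothetical helper, the backend names "rocm" and "cuda" are labels chosen here, and the command is printed as a dry run rather than executed.

```shell
# Map a backend label to the make variable used in the build commands above.
build_flag() {
    case "$1" in
        rocm) echo "GGML_HIPBLAS=1" ;;
        cuda) echo "GGML_CUDA=1" ;;
        *)    echo "unknown backend: $1" >&2; return 1 ;;
    esac
}

# Dry run: print the command instead of running it.
backend="cuda"   # assumed value for illustration; use "rocm" on AMD
echo "$(build_flag "$backend") make -j"
```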

llama.cpp

# Build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
# AMD version
make GGML_HIPBLAS=1 -j
# Nvidia version
make GGML_CUDA=1 -j

# Grab your preferred GGUF model
wget https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf

# If you're going to go to the next instruction
cd ..
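
A failed or redirected download can leave an HTML error page under the .gguf filename. GGUF files begin with the four ASCII bytes "GGUF", so a cheap sanity check is possible before loading the model. The sketch below uses a hypothetical helper, is_gguf, and assumes the model was saved in the current directory; it checks only the magic bytes, not that the file is complete.

```shell
# Succeed when the file starts with the 4-byte ASCII magic "GGUF".
is_gguf() {
    [ "$(head -c 4 "$1" 2>/dev/null)" = "GGUF" ]
}

model="Meta-Llama-3-8B-Instruct-Q4_K_M.gguf"
if is_gguf "$model"; then
    echo "$model looks like a GGUF file"
else
    echo "$model is missing or not a GGUF file"
fi
```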

StyleTTS2

git clone https://github.com/yl4579/StyleTTS2.git
cd StyleTTS2
pip install -r requirements.txt
pip install phonemizer

# Download the LJSpeech Model
# https://huggingface.co/yl4579/StyleTTS2-LJSpeech/tree/main
# https://huggingface.co/yl4579/StyleTTS2-LibriTTS/tree/main
pip install huggingface_hub
huggingface-cli download --local-dir . yl4579/StyleTTS2-LJSpeech

Some extra convenience scripts:

run-voicechat2.sh - on your GPU machine, tries to launch all servers in separate byobu sessions
remote-tunnel.sh - connect your GPU machine to a jump machine
local-tunnel.sh - connect to the GPU machine via a jump machine
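
The article does not show what the two tunnel scripts contain. One plausible way to get the same effect is a pair of SSH port forwards, sketched below as a dry run that only prints the commands. Everything specific here is an assumption for illustration: the jump host name, the port number, and the function names are hypothetical and not taken from the project.

```shell
# Hypothetical values; neither the host nor the port comes from the article.
JUMP_HOST="user@jump.example.com"
PORT=8000

# On the GPU machine: expose its local PORT on the jump machine (reverse forward).
remote_tunnel_cmd() {
    echo "ssh -N -R ${PORT}:localhost:${PORT} ${JUMP_HOST}"
}

# On the client: map local PORT to the same port on the jump machine.
local_tunnel_cmd() {
    echo "ssh -N -L ${PORT}:localhost:${PORT} ${JUMP_HOST}"
}

# Dry run: print the commands rather than opening connections.
remote_tunnel_cmd
local_tunnel_cmd
```

With both forwards running, a browser on the client pointed at localhost on the chosen port would reach the server on the GPU machine through the jump machine.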

Other AI voice chat projects

webrtc-ai-voice-chat

The demo shows a fair amount of latency (about 10 seconds), and this project is not that close to what voicechat2 does, since it uses WebRTC rather than WebSockets (HF Transformers, Ollama)

https://github.com/lalanikarim/webrtc-ai-voice-chat
Apache 2.0

june

A console-based local client (HF Transformers, Ollama, Coqui TTS, PortAudio)

https://github.com/mezbaul-h/june
MIT

GlaDOS

A very responsive console-based local client app that also supports VAD and interruption, plus a really clever hook! (whisper.cpp, llama.cpp, piper, espeak)

https://github.com/dnhkng/GlaDOS
MIT

local-talking-llm

Another console-based local client, more of a proof of concept, but with a blog write-up.

https://github.com/vndee/local-talking-llm
https://blog.duy.dev/build-your-own-voice-assistant-and-run-it-locally/
MIT
