Installing Voicebox: A Local AI Voice Studio Guide
A developer-oriented guide to Voicebox — from macOS/Windows installation to voice cloning, and how to make your AI agent speak via MCP.
Introduction
If you're looking for a capable, fully private voice tool, Voicebox is one of the strongest options in the open-source community right now. It is more than a text-to-speech (TTS) tool: it's a complete local voice studio. You can clone a voice from a short sample, use system-wide dictation, and even give your AI agent a voice of its own, all running locally on your machine with no cloud subscription and no privacy trade-offs.
[Figure: Voicebox interface demo]
Installation Guide: Which File Should I Download?
The Voicebox GitHub Releases page lists many files with different suffixes. Pick the one that matches your machine:
1. macOS
| Hardware | Recommended File | Notes |
|---|---|---|
| Apple Silicon (M1/M2/M3) | Voicebox_0.5.0_aarch64.dmg | Best performance, supports MLX hardware acceleration |
| Intel | Voicebox_0.5.0_x64.dmg | For older MacBooks or iMacs |
- Install tips: After downloading, open the `.dmg` and drag Voicebox into your Applications folder. If you see a "cannot verify developer" warning on first launch, go to System Settings > Privacy & Security and click Open Anyway.
2. Windows
| Hardware | Recommended File | Notes |
|---|---|---|
| General use (recommended) | Voicebox_0.5.0_x64-setup.exe | Standard installer with setup wizard |
| Enterprise / automated deployment | Voicebox_0.5.0_x64_en-US.msi | Microsoft standard installer format |
- Install tips: Run the `.exe` file. If Microsoft Defender SmartScreen shows a "Windows protected your PC" warning, click More info and then Run anyway. After launch, Voicebox auto-detects your GPU (NVIDIA/AMD) and downloads the matching compute modules.
Note: Files ending in `.sig` or `.zip.sig` are digital signatures used to verify package integrity; most users don't need to download them.
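The tables above boil down to a lookup on operating system and CPU architecture. The helper below is a hypothetical sketch (the name `voicebox_asset` is invented for illustration) that assumes the 0.5.0 file names listed in the tables; for a newer release, change `VOICEBOX_VERSION` and confirm the names on the Releases page. It never returns a `.sig` file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map OS + CPU architecture to the recommended
# Voicebox release asset. File names follow the 0.5.0 tables above;
# verify them against the Releases page for other versions.
VOICEBOX_VERSION="0.5.0"

voicebox_asset() {
  local os="$1" arch="$2"
  case "$os" in
    Darwin)
      case "$arch" in
        arm64|aarch64) echo "Voicebox_${VOICEBOX_VERSION}_aarch64.dmg" ;;  # Apple Silicon
        x86_64)        echo "Voicebox_${VOICEBOX_VERSION}_x64.dmg" ;;      # Intel Mac
        *) echo "unsupported macOS architecture: $arch" >&2; return 1 ;;
      esac ;;
    MINGW*|MSYS*|CYGWIN*)
      # Git Bash / MSYS2 / Cygwin on Windows. Only x64 builds are listed;
      # the .exe installer is the general-use choice (.msi is for managed deployment).
      echo "Voicebox_${VOICEBOX_VERSION}_x64-setup.exe" ;;
    *)
      echo "no Voicebox build covered by this guide for: $os" >&2; return 1 ;;
  esac
}

# Usage (run on the machine you are installing on):
#   voicebox_asset "$(uname -s)" "$(uname -m)"
```

Keeping the OS and architecture as arguments, rather than calling `uname` inside the function, makes the mapping easy to check for a machine other than the one you're on.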
Core Features
Step 1: Create a Voice Profile (Voice Cloning)
- Go to the Profiles tab and click "Create New Profile".
- Upload an audio file: prepare a 10-30 second reference clip of clear speech with no background noise.
- Choose an engine:
  - For high-quality cloning, select `Qwen3-TTS`.
  - For speed, select `Kokoro`.
- Click "Create". You can now generate speech with this voice.
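Before uploading, you can check that a reference clip falls inside the recommended 10-30 second window. This sketch assumes FFmpeg's `ffprobe` is installed (it is not part of Voicebox); `clip_duration_ok` and `check_reference_clip` are hypothetical helper names. It checks length only, not background noise.

```shell
#!/usr/bin/env bash
# Succeeds if a duration in seconds (fractional allowed) lies within 10-30 s,
# the reference-clip length recommended above.
clip_duration_ok() {
  awk -v d="$1" 'BEGIN { exit !(d >= 10 && d <= 30) }'
}

# Reads a file's duration with ffprobe (from FFmpeg) and applies the check.
check_reference_clip() {
  local file="$1" duration
  duration=$(ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$file") || return 1
  if clip_duration_ok "$duration"; then
    echo "OK: $file is ${duration}s"
  else
    echo "Warning: $file is ${duration}s; aim for 10-30 s" >&2
    return 1
  fi
}

# Usage: check_reference_clip my_voice.wav
```

The range check is split out from the `ffprobe` call so it can be reused with durations from any other tool.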
Step 2: Global Dictation
This is one of Voicebox's most practical features — it lets you dictate text into any app:
- Go to Settings > Dictation and set a hotkey (the default is usually `Caps Lock`, or you can choose a custom combination).
- To dictate, click into any text field (Slack, VS Code, etc.), hold the hotkey, and start speaking.
- When you release the key, Voicebox transcribes your speech with Whisper and pastes the text automatically.
Step 3: Make Your AI Agent Speak (MCP Setup)
If you use Claude Code or Cursor, you can connect Voicebox to it over the Model Context Protocol (MCP):
Claude Code setup:
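```shell
# Register the local Voicebox MCP endpoint with Claude Code over HTTP.
# The server name and URL are positional; there is no --url flag.
claude mcp add --transport http voicebox http://127.0.0.1:17493/mcp \
  --header "X-Voicebox-Client-Id: claude-code"
```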
Once configured, your agent can call the voicebox.speak tool and talk back to you using your cloned voice.
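Under the hood, MCP messages are JSON-RPC 2.0, and an agent invokes a tool with a `tools/call` request. The sketch below only prints the shape of such a request for `voicebox.speak`. The argument name `text` is an assumption (the server's `tools/list` response gives the real input schema), `speak_request` is a hypothetical helper, and a real client such as Claude Code performs MCP's `initialize` handshake before sending anything like this, so a bare POST of this body is not expected to work on its own.

```shell
#!/usr/bin/env bash
# Prints a JSON-RPC 2.0 "tools/call" request for the voicebox.speak tool.
# ASSUMPTION: the tool takes a "text" argument; check tools/list for the
# real schema. No JSON escaping is done, so avoid double quotes and
# backslashes in the text.
speak_request() {
  local id="$1" text="$2"
  printf '{"jsonrpc":"2.0","id":%s,"method":"tools/call","params":{"name":"voicebox.speak","arguments":{"text":"%s"}}}\n' \
    "$id" "$text"
}

# Usage: speak_request 1 "Build finished"
# An MCP client sends messages like this to http://127.0.0.1:17493/mcp
# after initializing a session.
```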
Step 4: Story Editor
- Open the Stories tab and create a new project.
- The multi-track timeline lets you drag and drop audio clips.
- Multi-character dialogue is supported — useful for podcast clips or audio tours.
Troubleshooting
- Generation too slow: Make sure GPU acceleration is enabled in Settings.
- Model download fails: Check your network connection, or set the `VOICEBOX_MODELS_DIR` environment variable to a model directory of your choice.
- Global dictation won't paste: On macOS, verify that Voicebox is enabled under System Settings > Privacy & Security > Accessibility.
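If you set `VOICEBOX_MODELS_DIR`, make sure the directory exists and is writable before launching Voicebox. The sketch below is illustrative: `prepare_models_dir` is a hypothetical helper and the path is only an example, and it assumes Voicebox reads the variable from its process environment. An exported variable only reaches programs started from that same shell; apps launched from the Dock, Finder, or Start menu will not see it unless it is set at the user or system level.

```shell
#!/usr/bin/env bash
# Creates the model directory if needed, confirms it is writable, and
# exports VOICEBOX_MODELS_DIR for programs started from this shell.
prepare_models_dir() {
  local dir="$1"
  mkdir -p "$dir" || return 1
  if [ ! -w "$dir" ]; then
    echo "not writable: $dir" >&2
    return 1
  fi
  export VOICEBOX_MODELS_DIR="$dir"
  echo "VOICEBOX_MODELS_DIR=$VOICEBOX_MODELS_DIR"
}

# Usage (example path; pick a disk with enough free space for the models):
#   prepare_models_dir "$HOME/voicebox-models"
```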