Installing Voicebox: A Local AI Voice Studio Guide

Introduction

If you want a capable, fully private voice tool, Voicebox is a strong open-source option. It is more than a text-to-speech (TTS) tool: it is a complete local voice studio. You can clone a voice from a short reference clip, enable system-wide dictation, and give your AI agent a voice of its own, all running locally on your machine with no cloud subscription or privacy trade-offs.

UI Demo

Voicebox UI and operation interface demo

Installation Guide: Which File Should I Download?

When you visit the Voicebox GitHub Releases page, you'll see a lot of files with different suffixes. Pick the one that matches your machine:

Voicebox download section on the GitHub Releases page

1. macOS

Hardware	Recommended File	Notes
Apple Silicon (M1/M2/M3)	`Voicebox_0.5.0_aarch64.dmg`	Best performance, supports MLX hardware acceleration
Intel	`Voicebox_0.5.0_x64.dmg`	For older MacBooks or iMacs

Install tips: After downloading, open the .dmg and drag Voicebox into your Applications folder. If you see a "cannot verify developer" warning on first launch, go to System Settings > Privacy & Security and click Open Anyway.

2. Windows

Use case	Recommended File	Notes
General use (recommended)	`Voicebox_0.5.0_x64-setup.exe`	Standard installer with setup wizard
Enterprise / automated deployment	`Voicebox_0.5.0_x64_en-US.msi`	Microsoft standard installer format

Install tips: Run the .exe file. If Microsoft Defender SmartScreen shows a "Windows protected your PC" warning, click More info and select Run anyway. After launch, Voicebox will auto-detect your GPU (NVIDIA/AMD) and download the corresponding compute modules.

Note: Files ending in .sig or .zip.sig are digital signatures used to verify package integrity — most users don't need to download them.
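The file choices in the two tables above can be sketched as a small shell function. This is only an illustration: `voicebox_asset` is a hypothetical helper, not part of Voicebox, and it assumes the 0.5.0 file names shown in this guide and the usual `uname` output (`Darwin` with `arm64` or `x86_64` on macOS, `MINGW*`/`MSYS*` under Git Bash on Windows).

```shell
# Hypothetical helper: map an OS and CPU architecture to the installer
# names listed in this guide. Defaults to the current machine via uname.
voicebox_asset() {
  local os="${1:-$(uname -s)}"
  local arch="${2:-$(uname -m)}"
  local version="${3:-0.5.0}"   # version taken from the tables above

  case "$os" in
    Darwin)
      case "$arch" in
        arm64|aarch64) echo "Voicebox_${version}_aarch64.dmg" ;;
        x86_64)        echo "Voicebox_${version}_x64.dmg" ;;
        *) echo "unsupported macOS architecture: $arch" >&2; return 1 ;;
      esac
      ;;
    MINGW*|MSYS*|CYGWIN*|Windows_NT)
      # The guide lists only x64 Windows builds; the .exe is the general choice.
      echo "Voicebox_${version}_x64-setup.exe"
      ;;
    *)
      echo "no installer listed in this guide for: $os" >&2
      return 1
      ;;
  esac
}
```

Calling `voicebox_asset` with no arguments inspects the current machine; passing values explicitly, for example `voicebox_asset Darwin x86_64`, shows what another machine would need.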

Core Features

Step 1: Create a Voice Profile (Voice Cloning)

Go to the Profiles tab and click "Create New Profile".
Upload an audio file: Prepare a 10-30 second reference clip — clear audio with no background noise.
Choose an engine:
- For high-quality cloning: select Qwen3-TTS.
- For speed: select Kokoro.
Click "Create". You can now generate speech with this voice.
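Before uploading, it can help to confirm that the reference clip really falls in the 10-30 second range. The sketch below assumes FFmpeg's `ffprobe` is installed; the function names are hypothetical and none of this is part of Voicebox itself.

```shell
# True (exit 0) when a duration in seconds, possibly fractional,
# falls inside the 10-30 second range recommended above.
clip_length_ok() {
  awk -v d="$1" 'BEGIN { exit !(d >= 10 && d <= 30) }'
}

# Print a file's duration in seconds using ffprobe (assumes FFmpeg is installed).
clip_duration() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Report whether a reference clip is a suitable length.
check_clip() {
  local secs
  secs="$(clip_duration "$1")" || return 2
  if clip_length_ok "$secs"; then
    echo "ok: ${secs}s"
  else
    echo "outside 10-30 s: ${secs}s" >&2
    return 1
  fi
}
```

Usage would look like `check_clip my-voice.wav`, where the file name is a placeholder. The check covers length only; audio clarity and background noise still need a listen.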

Step 2: Global Dictation

This is one of Voicebox's most practical features — it lets you dictate text into any app:

Go to Settings > Dictation and set a hotkey (the default is usually Caps Lock; you can also assign a custom key combination).
To use it: In any text field (Slack, VS Code, etc.), hold the hotkey and start speaking.
Done: Release the key — Voicebox transcribes your speech via Whisper and pastes the text automatically.

Step 3: Make Your AI Agent Speak (MCP Setup)

If you use Claude Code or Cursor, you can connect Voicebox via the Model Context Protocol (MCP):

Claude Code setup:

bash

claude mcp add --transport http voicebox http://127.0.0.1:17493/mcp \
  --header "X-Voicebox-Client-Id: claude-code"

Once configured, your agent can call the voicebox.speak tool and talk back to you using your cloned voice.
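If the agent cannot see the tool, one way to check that the local endpoint is reachable is to send it an MCP `initialize` request by hand. The sketch below makes several assumptions: that Voicebox is running, that its endpoint accepts a standard JSON-RPC `initialize` call over HTTP as described in the MCP specification, and that the protocol version shown is one it accepts. `VOICEBOX_MCP_URL` and both function names are hypothetical, introduced only for this example.

```shell
# Hypothetical variable; defaults to the endpoint used in the setup command.
VOICEBOX_MCP_URL="${VOICEBOX_MCP_URL:-http://127.0.0.1:17493/mcp}"

# Build the JSON-RPC body for an MCP "initialize" request.
mcp_initialize_payload() {
  printf '%s' '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"voicebox-check","version":"0.0.1"}}}'
}

# POST the request to the endpoint. Any JSON or event-stream reply suggests
# the server is listening; a connection error suggests Voicebox is not running.
voicebox_mcp_ping() {
  curl --silent --show-error --max-time 5 \
    -X POST "$VOICEBOX_MCP_URL" \
    -H "Content-Type: application/json" \
    -H "Accept: application/json, text/event-stream" \
    -H "X-Voicebox-Client-Id: claude-code" \
    --data "$(mcp_initialize_payload)"
}
```

Run `voicebox_mcp_ping` while Voicebox is open. On the Claude Code side, `claude mcp list` shows whether the server was registered.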

Step 4: Story Editor

Open the Stories tab and create a new project.
The multi-track timeline lets you drag and drop audio clips.
Multi-character dialogue is supported — useful for podcast clips or audio tours.

Troubleshooting

Generation too slow: Make sure GPU acceleration is enabled in Settings.
Model download fails: Check your network connection, or manually set the VOICEBOX_MODELS_DIR environment variable to a writable directory for model files.
Global dictation won't paste: On macOS, verify that Voicebox has Accessibility permissions enabled.
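For the model download item, setting the variable from a shell could look like the sketch below. It assumes that Voicebox reads `VOICEBOX_MODELS_DIR` at startup as the directory for model files, which is inferred from the variable's name; `use_models_dir` and the default path are hypothetical.

```shell
# Hypothetical helper: create a models directory, check that it is writable,
# and export VOICEBOX_MODELS_DIR for processes started from this shell.
# Assumption: Voicebox treats this variable as its model storage location.
use_models_dir() {
  local dir="${1:-$HOME/voicebox-models}"   # hypothetical default location
  mkdir -p "$dir" || return 1
  if [ ! -w "$dir" ]; then
    echo "not writable: $dir" >&2
    return 1
  fi
  export VOICEBOX_MODELS_DIR="$dir"
  echo "VOICEBOX_MODELS_DIR=$VOICEBOX_MODELS_DIR"
}
```

An exported variable is visible only to programs started from the same shell session, so an app launched from the Dock or Start menu may not see it; in that case the variable needs to be set at the user or system level instead.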
