Installing Voicebox: A Local AI Voice Studio Guide

A developer-oriented guide to Voicebox — from macOS/Windows installation to voice cloning, and how to make your AI agent speak via MCP.

Introduction

If you're looking for a capable, fully private voice tool, Voicebox is one of the strongest options in the open-source community right now. It's not just a text-to-speech (TTS) tool — it's a complete local voice studio. You can clone any voice, enable system-wide dictation, and even give your AI agent a voice of its own — all running locally on your machine, with no cloud subscriptions or privacy trade-offs.


UI Demo

[Screenshot: the Voicebox interface in operation]


Installation Guide: Which File Should I Download?

When you visit the Voicebox GitHub Releases page, you'll see a lot of files with different suffixes. Pick the one that matches your machine:

1. macOS

| Hardware | Recommended File | Notes |
| --- | --- | --- |
| Apple Silicon (M1/M2/M3) | Voicebox_0.5.0_aarch64.dmg | Best performance, supports MLX hardware acceleration |
| Intel | Voicebox_0.5.0_x64.dmg | For older MacBooks or iMacs |

  • Install tips: After downloading, open the .dmg and drag Voicebox into your Applications folder. If you see a "cannot verify developer" warning on first launch, go to System Settings > Privacy & Security and click Open Anyway.

2. Windows

| Hardware | Recommended File | Notes |
| --- | --- | --- |
| General use (recommended) | Voicebox_0.5.0_x64-setup.exe | Standard installer with setup wizard |
| Enterprise / automated deployment | Voicebox_0.5.0_x64_en-US.msi | Microsoft standard installer format |

  • Install tips: Run the .exe file. If Microsoft Defender SmartScreen shows a "Windows protected your PC" warning, click More info and select Run anyway. After launch, Voicebox auto-detects your GPU (NVIDIA/AMD) and downloads the corresponding compute modules.

Note: Files ending in .sig or .zip.sig are digital signatures used to verify package integrity — most users don't need to download them.
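
The choice above can be sketched as a small shell function. This is only an illustration: the file names and version 0.5.0 come from the tables, the function name pick_voicebox_asset is made up for this example, and the Windows branch assumes a Git Bash or MSYS shell where uname reports a MINGW/MSYS/CYGWIN prefix.

```bash
#!/usr/bin/env bash
# Sketch: map an OS name and CPU architecture to the installer listed above.
# pick_voicebox_asset is a hypothetical helper, not part of Voicebox.
pick_voicebox_asset() {
  local os="$1" arch="$2" version="${3:-0.5.0}"
  case "$os" in
    Darwin)
      case "$arch" in
        arm64|aarch64) echo "Voicebox_${version}_aarch64.dmg" ;;
        x86_64)        echo "Voicebox_${version}_x64.dmg" ;;
        *) echo "unsupported macOS architecture: $arch" >&2; return 1 ;;
      esac
      ;;
    MINGW*|MSYS*|CYGWIN*)
      # The setup wizard is the recommended choice for general use.
      echo "Voicebox_${version}_x64-setup.exe"
      ;;
    *)
      echo "no installer listed in this guide for: $os" >&2
      return 1
      ;;
  esac
}

# Usage on the current machine:
#   pick_voicebox_asset "$(uname -s)" "$(uname -m)"
```

On an Apple Silicon Mac, uname -m reports arm64, which the function maps to the aarch64 build.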


Core Features

Step 1: Create a Voice Profile (Voice Cloning)

  1. Go to the Profiles tab and click "Create New Profile".
  2. Upload an audio file: Prepare a 10-30 second reference clip — clear audio with no background noise.
  3. Choose an engine:
    • For high-quality cloning: select Qwen3-TTS.
    • For speed: select Kokoro.
  4. Click "Create". You can now generate speech with this voice.
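
Before uploading, it can help to confirm that the reference clip really falls in the 10 to 30 second range from step 2. The sketch below assumes FFmpeg's ffprobe is installed; it is not part of Voicebox, and the helper names are made up for this example.

```bash
#!/usr/bin/env bash
# Sketch: check that a reference clip is 10-30 seconds long.
# Assumes ffprobe (shipped with FFmpeg) is on PATH; helper names are hypothetical.

# Succeeds if the given duration in seconds (may be fractional) is within 10-30.
clip_duration_ok() {
  awk -v d="$1" 'BEGIN { exit !(d >= 10 && d <= 30) }'
}

# Prints the duration of an audio file in seconds.
clip_duration() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Combines both helpers and reports the result.
check_reference_clip() {
  local file="$1" seconds
  seconds="$(clip_duration "$file")" || return 2
  if clip_duration_ok "$seconds"; then
    echo "OK: ${seconds}s"
  else
    echo "Out of range: ${seconds}s (expected 10-30 s)" >&2
    return 1
  fi
}

# Usage:
#   check_reference_clip my_voice.wav
```

This only checks length. Background noise still has to be judged by ear.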

Step 2: Global Dictation

This is one of Voicebox's most practical features — it lets you dictate text into any app:

  1. Go to Settings > Dictation and set a hotkey (the default is usually Caps Lock, and you can assign a custom combination instead).
  2. To use it: In any text field (Slack, VS Code, etc.), hold the hotkey and start speaking.
  3. Done: Release the key — Voicebox transcribes your speech via Whisper and pastes the text automatically.

Step 3: Make Your AI Agent Speak (MCP Setup)

If you use Claude Code or Cursor, you can connect Voicebox via the Model Context Protocol (MCP):

Claude Code setup:

```bash
# Claude Code takes the server name and URL as positional arguments.
claude mcp add --transport http voicebox http://127.0.0.1:17493/mcp \
  --header "X-Voicebox-Client-Id: claude-code"
```

Once configured, your agent can call the voicebox.speak tool and talk back to you using your cloned voice.
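
If the agent cannot reach the tool, a quick probe can confirm that something is listening on the endpoint. The sketch below sends a JSON-RPC initialize request in the shape the MCP specification defines for HTTP transports. It assumes Voicebox is running, that it accepts this request at the URL used in the setup command, and that any client ID is accepted; the value "probe" and the function names are made up for this example.

```bash
#!/usr/bin/env bash
# Sketch: probe the local Voicebox MCP endpoint with an "initialize" request.
# The URL comes from the setup command; the client ID "probe" is hypothetical.
MCP_URL="http://127.0.0.1:17493/mcp"

# Prints a JSON-RPC initialize request as described in the MCP specification.
mcp_initialize_payload() {
  cat <<'JSON'
{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"probe","version":"0.0.1"}}}
JSON
}

# POSTs the request. A JSON or event-stream reply suggests the server is up;
# a connection error suggests Voicebox is not running or uses another port.
probe_voicebox_mcp() {
  curl --silent --show-error --max-time 5 \
    -H "Content-Type: application/json" \
    -H "Accept: application/json, text/event-stream" \
    -H "X-Voicebox-Client-Id: probe" \
    --data "$(mcp_initialize_payload)" \
    "$MCP_URL"
}

# Usage:
#   probe_voicebox_mcp
```

Running claude mcp list afterwards shows whether Claude Code has the server registered.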

Step 4: Story Editor

  1. Open the Stories tab and create a new project.
  2. The multi-track timeline lets you drag and drop audio clips.
  3. Multi-character dialogue is supported — useful for podcast clips or audio tours.

Troubleshooting

  • Generation too slow: Make sure GPU acceleration is enabled in Settings.
  • Model download fails: Check your network connection, or manually set the VOICEBOX_MODELS_DIR environment variable.
  • Global dictation won't paste: On macOS, verify that Voicebox has Accessibility permissions enabled.
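
For the model download case, one way to set the variable is sketched below. It assumes VOICEBOX_MODELS_DIR should point to a writable directory where models are stored, which the variable name suggests but this guide does not confirm. The helper name and the example path are made up.

```bash
#!/usr/bin/env bash
# Sketch: create a models directory and point VOICEBOX_MODELS_DIR at it.
# Assumption: Voicebox reads this variable as the model storage location.
prepare_models_dir() {
  local dir="$1"
  mkdir -p "$dir" || return 1
  if [ ! -w "$dir" ]; then
    echo "not writable: $dir" >&2
    return 1
  fi
  export VOICEBOX_MODELS_DIR="$dir"
  echo "VOICEBOX_MODELS_DIR=$VOICEBOX_MODELS_DIR"
}

# Usage (hypothetical path):
#   prepare_models_dir "$HOME/voicebox-models"
```

The export only affects the current shell and programs started from it, so an app launched from the Dock or Start menu may not see the value.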
