Voicebox

Free, open-source, local-first voice cloning studio — a powerful ElevenLabs alternative that runs entirely on your own machine with no subscriptions, no cloud, and no data sent externally.

Categories: AI Voice Generation & Voiceovers, Self Hosted, Text-to-Speech & Accessibility

Tags: Linux, macOS, Open Source, Self Hosted, Windows

What is Voicebox?

Voicebox is a free, open-source, local-first AI voice cloning studio built as a direct alternative to subscription services like ElevenLabs. It runs entirely on your machine (Windows, macOS, or Linux) with no cloud dependency, no monthly fees, and no audio data leaving your device. Within a few weeks of launch, the project attracted over 11,000 GitHub stars, which suggests strong community demand for a privacy-respecting TTS solution.

Key Features

Voice Cloning from 3 seconds of audio — upload or record a short sample to create a reusable voice profile
5 TTS Engines — Qwen3-TTS (0.6B / 1.7B), LuxTTS, Chatterbox Multilingual, Chatterbox Turbo, and TADA (1B / 3B)
23 Languages — including Arabic, Chinese, Danish, English, Finnish, French, German, Greek, Hebrew, Hindi, Japanese, Korean, Malay, Norwegian, Polish, Spanish, Swahili, Swedish, Turkish and more
Multi-voice Timeline Editor — compose podcasts, dialogues, and audiobooks with multiple speakers on a visual timeline
Paralinguistic Tags — add [laugh], [sigh], [gasp] and other expressive cues via Chatterbox Turbo
Post-processing Effects — pitch shift, reverb, delay, chorus, compression, and filters
Unlimited Length — auto-chunking with crossfade for long-form scripts, articles, and chapters
In-app Recorder + Whisper Transcription — record audio and transcribe it automatically
REST API — integrate Voicebox into your own applications or workflows
MCP Integration — a single voicebox.speak tool call lets any MCP-aware AI agent (Claude Code, Cursor, Cline) speak to you in a cloned voice
GPU Acceleration — CUDA support for compatible GPUs; CPU-only mode available at reduced speed
Apple Silicon Support: MLX backend uses the GPU and unified memory of M-series Macs for fast generation
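
The "Unlimited Length" feature above describes auto-chunking with crossfade. The following is a minimal sketch of that general technique, not Voicebox's actual implementation: the function names, the 200-character limit, and the linear fade curve are all assumptions chosen for illustration. Text is split at sentence boundaries into chunks, each chunk would be synthesized separately, and the resulting audio segments are joined with a short overlap so the seams are less audible.

```python
import re

def split_into_chunks(text, max_chars=200):
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars is kept whole as its own chunk.
    (Hypothetical helper; the limit is an illustrative default.)
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

def crossfade(a, b, overlap):
    """Join two sample lists, linearly blending the last `overlap`
    samples of `a` with the first `overlap` samples of `b`."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return list(a) + list(b)
    head, tail = a[:-overlap], a[-overlap:]
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of b rises across the overlap
        blended.append(tail[i] * (1 - w) + b[i] * w)
    return list(head) + blended + list(b[overlap:])

def join_chunks(audio_chunks, overlap):
    """Crossfade a sequence of audio segments into one continuous signal."""
    out = []
    for chunk in audio_chunks:
        out = crossfade(out, chunk, overlap) if out else list(chunk)
    return out

chunks = split_into_chunks("One. Two! Three? Four.", max_chars=10)
# Stand-in "audio": constant-valued segments instead of real TTS output.
joined = join_chunks([[1.0] * 4, [0.0] * 4, [1.0] * 4], overlap=2)
```

Each crossfade shortens the joined signal by the overlap length, so a real pipeline would typically generate a little extra audio at chunk edges or pick an overlap of only a few milliseconds.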

TTS Engine Comparison

Engine	Languages	Strengths
Qwen3-TTS (0.6B / 1.7B)	10	High-quality multilingual cloning, delivery instructions ("speak slowly", "whisper")
LuxTTS	English	Lightweight (~1 GB VRAM), 48 kHz output, 150× realtime on CPU
Chatterbox Multilingual	23	Broadest language coverage
Chatterbox Turbo	English	Fast 350M-parameter model with paralinguistic emotion/sound tags
TADA (1B / 3B)	10	HumeAI speech-language model; 700+ seconds of coherent audio, text-acoustic dual alignment

Who Is It For?

Voicebox suits content creators, podcasters, developers, accessibility advocates, and anyone who needs realistic TTS or voice cloning without paying for a SaaS subscription. Because everything runs locally, it is also ideal for sensitive use cases where privacy matters — narrating private documents, building voice interfaces for local AI agents, or producing voiceovers offline.
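
For the local AI agent use case, it may help to see roughly what a call to the voicebox.speak MCP tool looks like on the wire. MCP tool invocations are JSON-RPC 2.0 requests with the method tools/call, carrying the tool name and an arguments object. In the sketch below, only the tool name comes from the feature list; the argument names text and voice are assumptions for illustration, and the real schema is whatever the Voicebox MCP server advertises in response to tools/list.

```python
import json

def build_speak_request(text, request_id=1, **extra_arguments):
    """Build an MCP tools/call request for the voicebox.speak tool.

    The "text" argument name and any extra arguments are hypothetical;
    check the server's tools/list response for the actual input schema.
    """
    arguments = {"text": text, **extra_arguments}
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "voicebox.speak", "arguments": arguments},
    }

# "voice" is an assumed parameter naming a saved voice profile.
request = build_speak_request("Build finished.", voice="narrator")
payload = json.dumps(request)
print(payload)
```

In practice an MCP-aware client such as Claude Code or Cursor constructs and sends this message itself, so the sketch is mainly useful for understanding or debugging the integration.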

Getting Started

Download the desktop app directly from voicebox.sh — no account, no email required. Full documentation and model management guides are available at docs.voicebox.sh. The source code is available on GitHub.