Voicebox

Free, open-source, local-first voice cloning studio: an ElevenLabs alternative that runs entirely on your own machine, with no subscriptions, no cloud, and no data sent externally.

Tags: Linux, macOS, Windows, Open Source, Self Hosted

What is Voicebox?

Voicebox is a free, open-source, local-first AI voice cloning studio built as a direct alternative to subscription services like ElevenLabs. It runs entirely on your machine (Windows, macOS, or Linux) with no cloud dependency, no monthly fees, and no audio data leaving your device. Within a few weeks of launch, the project had attracted over 11,000 GitHub stars, reflecting strong community demand for a privacy-respecting TTS solution.

Key Features

  • Voice Cloning from 3 seconds of audio: upload or record a short sample to create a reusable voice profile
  • 5 TTS Engines: Qwen3-TTS (0.6B / 1.7B), LuxTTS, Chatterbox Multilingual, Chatterbox Turbo, and TADA (1B / 3B)
  • 23 Languages: including Arabic, Chinese, Danish, English, Finnish, French, German, Greek, Hebrew, Hindi, Japanese, Korean, Malay, Norwegian, Polish, Spanish, Swahili, Swedish, Turkish, and more
  • Multi-voice Timeline Editor: compose podcasts, dialogues, and audiobooks with multiple speakers on a visual timeline
  • Paralinguistic Tags: add [laugh], [sigh], [gasp], and other expressive cues via Chatterbox Turbo
  • Post-processing Effects: pitch shift, reverb, delay, chorus, compression, and filters
  • Unlimited Length: long-form scripts, articles, and chapters are split into chunks automatically and the generated audio is joined with crossfades
  • In-app Recorder + Whisper Transcription: record audio and transcribe it automatically
  • REST API: integrate Voicebox into your own applications or workflows
  • MCP Integration: a single voicebox.speak tool call lets any MCP-aware AI agent (Claude Code, Cursor, Cline) speak to you in a cloned voice
  • GPU Acceleration: CUDA support for compatible NVIDIA GPUs; a CPU-only mode is available at reduced speed
  • Apple Silicon Support: MLX backend for fast, GPU-accelerated generation on M-series Macs
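
The "Unlimited Length" feature above relies on auto-chunking with crossfade. As a rough illustration of the general technique (not Voicebox's actual implementation), the Python sketch below splits text at sentence boundaries and blends the overlapping samples of two audio segments linearly. The function names, the character limit, and the linear fade are all assumptions made for this example.

```python
import re


def split_into_chunks(text, max_chars=200):
    """Greedily pack whole sentences into chunks of roughly max_chars.

    A single sentence longer than max_chars becomes its own chunk.
    (Hypothetical helper for illustration only.)
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


def crossfade(a, b, overlap):
    """Join two lists of samples, linearly blending the last `overlap`
    samples of `a` with the first `overlap` samples of `b`."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return a + b
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of `b` rises across the overlap
        blended.append(a[len(a) - overlap + i] * (1 - w) + b[i] * w)
    return a[:len(a) - overlap] + blended + b[overlap:]
```

In a pipeline like this, each chunk would be synthesized separately and the resulting segments folded together with `crossfade`, so chunk boundaries do not produce audible clicks.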

TTS Engine Comparison

| Engine | Languages | Strengths |
|---|---|---|
| Qwen3-TTS (0.6B / 1.7B) | 10 | High-quality multilingual cloning; accepts delivery instructions ("speak slowly", "whisper") |
| LuxTTS | English | Lightweight (~1 GB VRAM), 48 kHz output, 150× realtime on CPU |
| Chatterbox Multilingual | 23 | Broadest language coverage |
| Chatterbox Turbo | English | Fast 350M-parameter model with paralinguistic emotion/sound tags |
| TADA (1B / 3B) | 10 | HumeAI speech-language model; 700+ seconds of coherent audio, text-acoustic dual alignment |
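
To put a figure like "150× realtime" in perspective: a realtime factor of N means N seconds of audio are generated per second of compute. The short Python sketch below works through that arithmetic. The function name is made up for this example, and the 150× value is the listing's claim for LuxTTS; actual speed depends on your hardware.

```python
def synthesis_seconds(audio_seconds, realtime_factor):
    """Estimated compute time needed to generate `audio_seconds` of speech."""
    return audio_seconds / realtime_factor


# One minute of speech at the claimed 150x realtime factor.
print(synthesis_seconds(60, 150))    # 0.4

# A one-hour audiobook chapter at the same factor.
print(synthesis_seconds(3600, 150))  # 24.0
```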

Who Is It For?

Voicebox suits content creators, podcasters, developers, accessibility advocates, and anyone who needs realistic TTS or voice cloning without paying for a SaaS subscription. Because everything runs locally, it is also ideal for sensitive use cases where privacy matters — narrating private documents, building voice interfaces for local AI agents, or producing voiceovers offline.

Getting Started

Download the desktop app directly from voicebox.sh — no account, no email required. Full documentation and model management guides are available at docs.voicebox.sh. The source code is available on GitHub.
