DramaBox TTS Pinokio: Free Open-Source Expressive Voice Cloning You Can Run Locally

https://github.com/PierrunoYT/DramaBox-TTS-Pinokio

If you've been searching for a genuinely expressive, free, and locally-runnable text-to-speech solution, DramaBox TTS Pinokio might be the most compelling option to arrive in 2026. Built by community contributor PierrunoYT, this Pinokio integration packages Resemble AI's groundbreaking DramaBox model into a one-click installer — no command-line experience required, no subscriptions, and no cloud dependency once it's installed.

This article covers everything you need to know: what DramaBox is, what makes it different from every other TTS tool on the market, how the Pinokio installer works, hardware requirements, practical use cases, and how it compares to paid alternatives.


What Is DramaBox?

DramaBox is an open-source expressive text-to-speech model developed by Resemble AI and built on the LTX-2.3 architecture. Unlike conventional TTS systems that convert raw text into robotic or flat-sounding speech, DramaBox is designed to interpret intent — the emotional arc, pacing, pauses, and paralinguistic texture of a vocal performance.

Resemble AI describes it as "closing the gap between synthesis and performance." That's not marketing hyperbole. DramaBox is designed to take a prompt written like a screenplay and produce audio that sounds like it was directed, not just generated.

The Resemble AI family of models has exceeded 10 million downloads on Hugging Face, and DramaBox represents the most capable and expressive entry in that lineup.


What Makes DramaBox Different?

Most free TTS tools fall into one of two camps: they either produce flat, robotic speech, or they offer decent quality but zero control over delivery. DramaBox sits in a completely different category. Here's what sets it apart:

1. Prompt-Driven Vocal Performances

DramaBox uses a unique prompting format borrowed from stage direction writing. You don't just feed it text — you write a scene:

  • Dialogue goes inside double quotes. This is what gets spoken.
  • Stage directions go outside the quotes. These cues — sighs, pauses, a cracking voice, hesitation — are acted out by the model but never spoken aloud.

For example, you might write:

Her voice trembles with exhaustion. "I don't know how much longer I can keep doing this." She takes a slow, shaky breath.

The model reads the unquoted directions and performs them in the audio instead of speaking them. That is a marked change from TTS systems that simply read out whatever text they are given.
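To make the quoting rule concrete, here is a minimal sketch that separates a prompt into the parts that would be spoken and the parts that would be performed. It is not DramaBox's own parser. The function name `split_prompt` is made up for illustration, and the sketch assumes straight double quotes only.

```python
import re

# Assumption: dialogue is wrapped in straight double quotes, as in the example above.
QUOTED = re.compile(r'"([^"]*)"')


def split_prompt(prompt: str) -> tuple[list[str], list[str]]:
    """Return (spoken_lines, stage_directions) for a screenplay-style prompt."""
    # re.split with one capturing group alternates: outside, inside, outside, ...
    parts = QUOTED.split(prompt)
    directions = [p.strip() for p in parts[0::2] if p.strip()]
    spoken = [p.strip() for p in parts[1::2]]
    return spoken, directions


if __name__ == "__main__":
    example = (
        'Her voice trembles with exhaustion. '
        '"I don\'t know how much longer I can keep doing this." '
        'She takes a slow, shaky breath.'
    )
    spoken, directions = split_prompt(example)
    print("Spoken:", spoken)
    print("Directed:", directions)
```

Running a draft through a helper like this is a quick way to confirm that every cue sits on the side of the quotes you intended.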

2. Prompt-Defined Speaker Identity

If you don't have a reference audio clip, DramaBox can invent a voice based purely on a text description. You can specify:

  • Age and gender
  • Accent and regional dialect
  • Emotional register (anxious, warm, authoritative, playful)
  • Vocal texture and speaking style

This opens up creative possibilities that no other free TTS tool offers. Want a middle-aged British woman with a dry sense of humour? A young enthusiastic American podcaster? Describe it, and DramaBox will synthesise it from scratch.
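If you generate many voices, it can help to assemble the speaker description from named fields rather than retyping it. The sketch below is only a convenience for building prompt text. `SpeakerSpec` and `build_prompt` are hypothetical names, and the sentence template is one guess at a readable description, not a format DramaBox requires.

```python
from dataclasses import dataclass


@dataclass
class SpeakerSpec:
    age: str           # e.g. "middle-aged"
    gender: str        # e.g. "woman"
    accent: str        # e.g. "British"
    register: str      # e.g. "dry and understated"
    texture: str = ""  # optional, e.g. "slightly husky"

    def describe(self) -> str:
        """Render the fields as a one-sentence stage direction."""
        article = "An" if self.age.lower().startswith(tuple("aeiou")) else "A"
        text = f"{article} {self.age} {self.accent} {self.gender}, {self.register}"
        if self.texture:
            text += f", with a {self.texture} voice"
        return text + "."


def build_prompt(speaker: SpeakerSpec, line: str) -> str:
    """Place the description outside the quotes and the dialogue inside them."""
    return f'{speaker.describe()} "{line}"'


if __name__ == "__main__":
    host = SpeakerSpec("middle-aged", "woman", "British", "dry and understated")
    print(build_prompt(host, "Well, that went about as well as expected."))
```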

3. Zero-Shot Voice Cloning

For those who want to replicate a specific voice, DramaBox supports zero-shot voice cloning from just 10 seconds of reference audio. The model extracts the timbre — the unique tonal fingerprint of a voice — and applies it to your prompt's emotional performance. The voice clone preserves the character of the original speaker while still following the emotional arc you've written.

This combination of cloning fidelity and expressive control is something previously only available in expensive enterprise-grade APIs.

4. Studio-Quality Audio Output

DramaBox generates 48 kHz stereo audio. 48 kHz is the standard sample rate for professional video and broadcast audio, and the outputs are clean enough to use directly in:

  • Podcast intros and outros
  • Video narration
  • Game character dialogue
  • Audiobook production
  • Marketing and explainer videos
  • Interactive fiction

5. PerTh Watermarking

Every output from DramaBox is watermarked using PerTh, Resemble AI's audio watermarking technology. The watermark is imperceptible to human listeners and remains detectable after MP3/AAC encoding with close to 100% detection accuracy. It can be disabled for debugging purposes, but for production use it ensures that AI-generated audio is always traceable back to its source.


What Is Pinokio?

Pinokio is a free, open-source application that acts as a one-click installer and launcher for AI tools. Think of it as an app store for locally run AI models: it handles Python environments, dependency installation, model downloads, and configuration automatically.

Without Pinokio, running DramaBox locally would require manual installation of Python, specific package versions, CUDA drivers, model weights from Hugging Face, and correct configuration of the inference pipeline. With Pinokio, it's a single click.

Pinokio is available for Windows, Linux, and macOS (Apple Silicon).


DramaBox TTS Pinokio: The PierrunoYT Integration

The DramaBox-TTS-Pinokio repository by PierrunoYT is a community-maintained Pinokio integration that packages everything needed to run DramaBox locally. It's listed as a Featured app on Pinokio's discovery platform and is actively maintained, with version 5.0 current as of May 2026.

PierrunoYT is a prolific contributor to the Pinokio ecosystem, having also packaged Kokoro TTS, Chatterbox TTS, OrpheusTTS, VyvoTTS LFM2, KittenTTS, GLM-TTS, and Higgs Audio V2 for easy local installation. Their DramaBox integration has quickly become one of the most-checked-in apps on the platform.

GPU Platform Support

The integration supports:

  • NVIDIA GPUs (primary — CUDA-based)
  • AMD GPUs (ROCm support)
  • Apple Silicon (Mac M-series chips)

Hardware Requirements

Running DramaBox locally requires a capable machine. Here's what you need to know:

Spec                          | Requirement
------------------------------|---------------------------------------------
GPU VRAM (warm inference)     | ~24 GB VRAM (e.g. RTX 3090, RTX 4090, H100)
GPU VRAM (cold/Gemma loading) | ~8 GB VRAM minimum
Generation speed (H100)       | ~2.5 seconds per output
Generation speed (cold start) | ~30 seconds per output
Storage                       | ~23.5 GB for full model installation
OS                            | Windows, Linux, macOS (Apple Silicon)

The 24 GB VRAM requirement for warm inference is the most demanding spec. Users with 12–16 GB VRAM cards can still run DramaBox but should expect slower generation times and may need to operate in cold inference mode. The Pinokio community is actively testing quantised and optimised versions as the model matures.
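The thresholds in the table can be turned into a quick pre-flight check. The sketch below is illustrative only. The mode labels and function names are made up for this example, the cut-offs are the approximate figures quoted above, and the warm timing is the H100 number, so slower cards will take longer.

```python
# Approximate figures from the requirements table (assumed cut-offs, not hard limits).
WARM_VRAM_GB = 24.0
COLD_VRAM_GB = 8.0
SECONDS_PER_CLIP = {"warm": 2.5, "cold": 30.0}


def pick_mode(vram_gb: float) -> str:
    """Map available VRAM to a rough inference mode label."""
    if vram_gb >= WARM_VRAM_GB:
        return "warm"
    if vram_gb >= COLD_VRAM_GB:
        return "cold"
    return "unsupported"


def estimate_batch_seconds(n_clips: int, mode: str) -> float:
    """Estimate total generation time for a batch of clips in the given mode."""
    if mode not in SECONDS_PER_CLIP:
        raise ValueError(f"no timing estimate for mode {mode!r}")
    return n_clips * SECONDS_PER_CLIP[mode]


if __name__ == "__main__":
    for vram in (6, 16, 24):
        print(f"{vram} GB VRAM -> {pick_mode(vram)}")
    print("100 clips, cold:", estimate_batch_seconds(100, "cold"), "seconds")
```

For a 100-line script, the gap between the two modes is roughly four minutes versus fifty, which is worth knowing before committing to a long batch on a smaller card.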


How to Install DramaBox via Pinokio

Installing DramaBox through the Pinokio integration is straightforward:

  1. Download and install Pinokio from pinokio.co — choose your OS (Windows, Linux, or macOS).
  2. Open Pinokio and navigate to the Discover tab.
  3. Search for "DramaBox" — the PierrunoYT integration will appear as a Featured app.
  4. Click Download on the DramaBox TTS Pinokio card.
  5. Pinokio will automatically download all dependencies, Python environments, and model weights (~23.5 GB total).
  6. Once complete, click Launch — a browser-based UI will open where you can start generating speech.

Alternatively, you can install directly from the GitHub repository URL by pasting https://github.com/PierrunoYT/DramaBox-TTS-Pinokio into Pinokio's install-from-URL field.
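Because the download is large, it is worth confirming free disk space before starting. This is a small sketch using Python's standard library. The function name `has_room` is hypothetical, and it assumes the ~23.5 GB figure is in decimal gigabytes, so leave some headroom in practice.

```python
import shutil
from typing import Optional

REQUIRED_GB = 23.5  # approximate install size quoted in this article


def has_room(path: str = ".", required_gb: float = REQUIRED_GB,
             free_bytes: Optional[int] = None) -> bool:
    """Return True if the drive holding `path` has at least `required_gb` free.

    `free_bytes` can be passed directly to skip the disk query, e.g. for testing.
    """
    if free_bytes is None:
        free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1e9


if __name__ == "__main__":
    # Point `path` at the drive where Pinokio stores its apps.
    print("Enough space for DramaBox:", has_room("."))
```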


How to Use DramaBox

Once launched, the DramaBox interface is browser-based and straightforward to use:

Step 1: Write Your Prompt

Write your scene using the screenplay format. Put spoken dialogue inside double quotes. Add stage directions (emotions, actions, paralinguistic cues) outside the quotes. Avoid placing words like "sigh", "gasp", or "cough" inside quotation marks — the model will literally speak the word rather than perform the action.

Good prompt example:

An elderly professor, thoughtful and measured. "The problem with certainty," he paused, tapping the desk slowly, "is that it leaves no room for discovery."

What not to do:

"The problem with certainty, sigh, is that it leaves no room for discovery."
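A simple check can catch this mistake before you generate. The sketch below flags cue words that appear inside double quotes. The word list and the name `find_spoken_cues` are assumptions for illustration, and a character who is genuinely meant to say "sigh" would be a false positive.

```python
import re

# Illustrative list of paralinguistic cues; extend it to suit your scripts.
CUE_WORDS = {"sigh", "sighs", "gasp", "gasps", "cough", "coughs"}


def find_spoken_cues(prompt: str) -> list[str]:
    """Return cue words found inside straight double quotes, in order."""
    hits = []
    for quoted in re.findall(r'"([^"]*)"', prompt):
        for word in re.findall(r"[A-Za-z']+", quoted):
            if word.lower() in CUE_WORDS:
                hits.append(word.lower())
    return hits


if __name__ == "__main__":
    bad = '"The problem with certainty, sigh, is that it leaves no room for discovery."'
    print(find_spoken_cues(bad))
```

An empty result means no listed cue word would be read aloud. Anything returned should be moved outside the quotes and rewritten as a direction.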

Step 2: Optionally Provide Reference Audio

Upload 10 or more seconds of clean reference audio if you want to clone a specific voice. The audio should be clear, with minimal background noise. DramaBox will extract the timbre and apply it to your prompt's performance.

If you skip this step, DramaBox will generate a voice based on the speaker description in your prompt.
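If your reference clip is a WAV file, you can confirm it meets the 10-second guideline before uploading. This sketch uses only Python's standard `wave` module, so it assumes an uncompressed PCM WAV. The function names are made up for this example, and it does not measure noise or clarity.

```python
import wave

MIN_REFERENCE_SECONDS = 10.0  # guideline from this article


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def is_long_enough(path: str, minimum: float = MIN_REFERENCE_SECONDS) -> bool:
    """Check whether a reference clip is at least `minimum` seconds long."""
    return wav_duration_seconds(path) >= minimum


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        clip = sys.argv[1]
        print(f"{clip}: {wav_duration_seconds(clip):.1f} s, ok={is_long_enough(clip)}")
```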

Step 3: Generate

Hit generate and wait for the output. On hardware with sufficient VRAM in warm mode, you'll have 48 kHz stereo audio within a few seconds. Download the file and use it in your project.


Practical Use Cases

DramaBox TTS Pinokio has obvious applications across a wide range of creative and professional workflows:

  • Content creators — Produce voiceovers for YouTube videos, shorts, and social media without recording studios or voice actors.
  • Game developers — Generate character dialogue with real emotional range for indie games and interactive fiction, and prototype quickly before hiring voice talent.
  • Podcast producers — Create hosts, co-hosts, or character voices for narrative podcasts.
  • Audiobook authors — Produce full narration with different voices for each character.
  • Marketers and advertisers — Generate voiceover demos for ad copy and explainer videos at no cost.
  • Automation engineers — Integrate DramaBox into local workflows using n8n, Make, or custom API calls for automated audio production pipelines.
  • Educators and e-learning creators — Narrate course content in multiple voices and emotional registers to improve engagement.

How Does It Compare to Paid Alternatives?

DramaBox competes directly with commercial TTS products that charge per character or per hour of audio. Here's how it stacks up:

Feature                 | DramaBox (Local)       | ElevenLabs  | PlayHT         | Murf AI
------------------------|------------------------|-------------|----------------|-------------
Price                   | Free                   | From $5/mo  | From $31.20/mo | From $29/mo
Voice Cloning           | Yes (10s audio)        | Yes         | Yes            | Limited
Emotional Control       | Full (prompt-directed) | Partial     | Limited        | Limited
Output Quality          | 48 kHz stereo          | 44.1 kHz    | 44.1 kHz       | 44.1 kHz
Local/Offline           | Yes                    | No          | No             | No
Open Source             | Yes                    | No          | No             | No
Prompt-Defined Identity | Yes                    | No          | No             | No
Privacy                 | Full (local)           | Cloud-based | Cloud-based    | Cloud-based

For users who need production volume or don't have a capable GPU, cloud-based services make sense. But for power users, developers, and content creators with the right hardware, DramaBox running locally is simply unbeatable on value.
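To put the price row in perspective, the arithmetic below totals the entry-level subscription prices over a given number of months. It is a rough sketch. The prices are the "from" figures in the comparison table, the function name is hypothetical, and the calculation ignores usage caps on paid tiers as well as hardware and electricity costs for running locally.

```python
# Entry-level monthly prices in US cents, taken from the comparison table.
MONTHLY_PRICE_CENTS = {
    "DramaBox (Local)": 0,
    "ElevenLabs": 500,
    "PlayHT": 3120,
    "Murf AI": 2900,
}


def cost_over_months(months: int) -> dict[str, float]:
    """Return the total subscription cost in dollars for each option."""
    # Multiply in integer cents first to avoid floating-point drift.
    return {
        name: cents * months / 100
        for name, cents in MONTHLY_PRICE_CENTS.items()
    }


if __name__ == "__main__":
    for name, dollars in cost_over_months(12).items():
        print(f"{name:<18} ${dollars:,.2f} per year")
```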


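Where to Get It

  • GitHub repository: https://github.com/PierrunoYT/DramaBox-TTS-Pinokio
  • Pinokio: download from pinokio.co, then search for "DramaBox" in the Discover tab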


Final Verdict

DramaBox TTS Pinokio is one of the most significant free AI audio tools released in 2026. The combination of prompt-directed emotional performance, zero-shot voice cloning from just 10 seconds of audio, synthetic voice invention from text descriptions, and studio-grade 48 kHz stereo output is genuinely unprecedented in the open-source space.

The hardware requirements are real — you'll want at least 12 GB VRAM and ideally 24 GB for fast warm inference — but for users who already run local AI workloads, this installs alongside existing tools without friction.

PierrunoYT's Pinokio integration removes the usual barriers to getting started. There's no Python environment to configure, no dependency conflicts to debug, and no API key to manage. You install Pinokio, click once, and when the download finishes you have a world-class expressive TTS system running entirely on your own hardware.

For anyone serious about AI audio production — whether for content creation, game development, automation, or simply exploration — DramaBox TTS Pinokio is an essential addition to your local AI toolkit.
