AI voice synthesis that runs entirely in your browser. No account, no server, no data leaving your device.
No sign-up required · Zero server uploads · Works offline after first load · 100% free forever
Built to respect your privacy
Every word you type stays on your device. Scribeus runs the AI model directly in your browser — no audio ever reaches a server.
⚡
Instant playback
Audio starts playing as soon as each sentence is synthesized, so you never wait for the full text before you hear anything.
🎙
4 natural voices
Bella and Jessica for female narration, Liam and Adam for male — all powered by the Kokoro-82M model.
🔒
Zero-upload privacy
The model runs in a Web Worker. Your text never leaves your device — not even in encrypted form.
📶
Works offline
After the first load, the 155 MB model is cached in your browser, so you can generate speech on a plane with no Wi-Fi.
🎚
Speed control
Adjust speech rate from 0.5× to 2× for presentations, audiobooks, or fast-paced content review.
💾
WAV download
Export as a lossless 24 kHz WAV file — ready for video editors, podcasts, or presentations.
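For the curious, here is a rough sketch of how raw samples could be wrapped into a 24 kHz WAV file in the browser. It is an illustration, not Scribeus's actual exporter: the function name `encodeWav` is hypothetical, and the 16-bit mono PCM layout is an assumption, since the page only states "lossless 24 kHz WAV".

```typescript
// Hypothetical helper: wraps mono Float32 samples in [-1, 1] into a WAV file.
// Assumption: 16-bit PCM, one channel. Only the 24 kHz rate comes from the page.
function encodeWav(samples: Float32Array, sampleRate = 24000): ArrayBuffer {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize); // 44-byte canonical header
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // format tag 1 = uncompressed PCM
  view.setUint16(22, 1, true); // channel count
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // bytes per second
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp, then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    const scaled = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
    view.setInt16(44 + i * bytesPerSample, scaled, true);
  }
  return buffer;
}
```

In a browser, the returned buffer could be passed to `new Blob([buffer], { type: "audio/wav" })` and offered as a download link.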
Frequently asked questions
Everything you need to know about Scribeus.
Each generation accepts up to 5,000 characters. Long texts are split automatically at sentence boundaries and stitched into a single audio file.
Your text stays private. The Kokoro model runs entirely inside your browser in a Web Worker, and no text, audio, or metadata is transmitted to any server.
An internet connection is needed only for the first visit, to download the AI model (~155 MB). After that, the model is cached and Scribeus works fully offline.
Scribeus uses Kokoro-82M v1.0, an open-source neural TTS model with 82 million parameters. We use the q4f16 quantized ONNX variant, which is optimized for browser inference without sacrificing audio quality.
Scribeus is completely free: no subscription, no account, no usage limits. It is part of the RuntimeHub suite of free, private, browser-based tools, supported by non-intrusive ads.
Generation speed depends on your device. A short sentence typically generates in 2–5 seconds. Longer passages are processed in chunks, so you hear the first chunk while the rest are being generated.
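The chunked generation described in these answers can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code Scribeus ships: `splitIntoChunks`, `concatAudio`, `synthesizeStreaming`, the `synthesize` callback, and the 300-character chunk size are all hypothetical. The sentence splitter is deliberately naive: it breaks only where `.`, `!`, or `?` is followed by whitespace or the end of the text, and a single sentence longer than the limit is kept whole.

```typescript
// Hypothetical stand-in for the model call: text in, mono samples out.
type Synthesize = (text: string) => Promise<Float32Array>;

// Group sentences into chunks of at most maxChars characters where possible.
function splitIntoChunks(text: string, maxChars = 300): string[] {
  // A sentence ends at ., !, or ? followed by whitespace or end of text.
  const sentences = text.match(/\S[\s\S]*?(?:[.!?]+(?=\s|$)|$)/g) ?? [];
  const chunks: string[] = [];
  let current = "";
  for (const raw of sentences) {
    const sentence = raw.trim();
    if (sentence.length === 0) continue;
    if (current.length > 0 && current.length + 1 + sentence.length > maxChars) {
      chunks.push(current); // chunk is full, start a new one
      current = sentence;
    } else {
      current = current.length > 0 ? `${current} ${sentence}` : sentence;
    }
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

// Stitch the per-chunk audio into one continuous buffer.
function concatAudio(parts: Float32Array[]): Float32Array {
  const total = parts.reduce((n, p) => n + p.length, 0);
  const out = new Float32Array(total);
  let offset = 0;
  for (const p of parts) {
    out.set(p, offset);
    offset += p.length;
  }
  return out;
}

// Synthesize chunk by chunk, handing each result to onChunk as soon as it is
// ready (for example, to queue it for playback), then return the full audio.
async function synthesizeStreaming(
  text: string,
  synthesize: Synthesize,
  onChunk: (audio: Float32Array, index: number) => void,
  maxChars = 300,
): Promise<Float32Array> {
  const chunks = splitIntoChunks(text, maxChars);
  const parts: Float32Array[] = [];
  for (let i = 0; i < chunks.length; i++) {
    const audio = await synthesize(chunks[i]);
    onChunk(audio, i);
    parts.push(audio);
  }
  return concatAudio(parts);
}
```

Because `onChunk` fires after every chunk, playback of the first chunk can begin while later chunks are still being generated, and the stitched result is available for export once the loop finishes.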