# kitten-tts-web-demo **Repository Path**: web/kitten-tts-web-demo ## Basic Information - **Project Name**: kitten-tts-web-demo - **Description**: No description available - **Primary Language**: Unknown - **License**: Not specified - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2025-08-20 - **Last Updated**: 2025-08-20 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # 😻 Kitten TTS Web Demo A web-based demo of the **Kitten TTS Nano** model - a lightweight 15M parameter text-to-speech model running entirely in your browser using ONNX Runtime and transformers.js! [Try the demo here](https://clowerweb.github.io/kitten-tts-web-demo/). ## ✨ Features - 🎤 **8 Different Voices** - Male and female expression voices - ⚡ **Adjustable Speed** - From 0.5x (slow) to 2.0x (fast) - 🎵 **Multiple Sample Rates** - 8kHz to 48kHz for different quality levels - 🌐 **100% Browser-Based** - No server required, runs locally - 📱 **Real-time Generation** - Fast inference using WebAssembly - 🚀 **WebGPU Support** - Experimental WebGPU acceleration (with WASM fallback) ## 🚀 Quick Start 1. **Clone the repository:** ```bash git clone https://github.com/clowerweb/kitten-tts-web-demo cd kitten-tts-web-demo ``` 2. **Install dependencies:** ```bash npm install ``` 3. **Start the development server:** ```bash npm run dev ``` 4. **Open your browser** and navigate to `http://localhost:5173` 5. **Type some text and generate speech!** 🎉 ## 📋 Requirements - Node.js 16+ - Modern browser with WebAssembly support - ~50MB disk space for model files ## 🏗️ How It Works This demo replicates the Kitten TTS pipeline in JavaScript. ## 🎛️ Controls - **Voice Selection** - Choose from 8 different voice embeddings - **Speed Control** - Adjust speech rate (0.5x - 2.0x) - **Sample Rate** - Select audio quality (16kHz - 48kHz) - **WebGPU Toggle** - Enable experimental GPU acceleration ## 📦 Model Information This demo uses the **Kitten TTS Nano v0.1** model: - **Size:** ~24MB ONNX model (quantized) - **Parameters:** 15 million - **Quality:** High-quality speech synthesis - **Speed:** ~2-3x Real-time generation in browser **Original Model:** - 📁 **GitHub:** [KittenML/KittenTTS](https://github.com/KittenML/KittenTTS) - 🤗 **Hugging Face:** [KittenML/kitten-tts-nano-0.1](https://huggingface.co/KittenML/kitten-tts-nano-0.1) ## 🛠️ Technical Stack - **Frontend:** Vue 3 + Vite - **ML Runtime:** ONNX Runtime Web (WebGPU + WASM) - **Phonemization:** phonemizer.js (espeak backend) - **Audio Processing:** Web Audio API - **Text Processing:** Custom text cleaner with smart chunking - **Model Format:** ONNX + JSON voice embeddings ## 📁 Project Structure ``` ├── index.html # Main HTML entry point ├── src/ │ ├── App.vue # Main Vue application │ ├── main.js # Application entry point │ ├── components/ # Vue components │ │ ├── AudioChunk.vue # Audio playback component │ │ ├── SampleRateSelector.vue │ │ ├── SpeedControl.vue │ │ ├── TextStatistics.vue │ │ ├── ThemeToggle.vue │ │ ├── VoiceSelector.vue │ │ └── WebGPUToggle.vue # GPU acceleration toggle │ ├── lib/ │ │ └── kitten-tts.js # Core TTS implementation │ ├── utils/ │ │ ├── model-cache.js # Model caching utilities │ │ ├── text-cleaner.js # Text processing & chunking │ │ └── utils.js # General utilities │ └── workers/ │ └── tts-worker.js # Web Worker for TTS ├── public/ │ ├── onnx-runtime/ # ONNX Runtime WASM files │ └── tts-model/ # Model files │ ├── model_quantized.onnx │ ├── tokenizer.json │ └── voices.json # Voice embeddings ├── package.json # Dependencies └── vite.config.js # Vite configuration ``` ## 🤝 Contributing Contributions are welcome! Feel free to: - Report bugs or issues - Suggest new features - Submit pull requests - Improve documentation ## 📄 License This project is licensed under the **Apache License 2.0** - see the [LICENSE](LICENSE) file for details. The Kitten TTS model is also licensed under Apache 2.0 by [KittenML](https://github.com/KittenML). ## 🙏 Acknowledgments - **KittenML Team** for creating the amazing Kitten TTS model - **Xenova** for transformers.js and ONNX Runtime Web integration - **espeak** for phonemization support ## 🐛 Troubleshooting **Model not loading?** - Check browser console for CORS or network errors **Audio not playing?** - Try different browsers (Chrome/Firefox recommended) - Check if audio autoplay is blocked - Verify audio permissions **Poor audio quality?** - Try different voices - Adjust sample rate settings - Use shorter text inputs for better quality **WebGPU not working?** - This is an experimental feature and is known not to work in some browser/GPU setups. We are looking for contributors to help with better WebGPU support. --- Made with ❤️ using the Kitten TTS Nano model. Meow! 😻