# opentts

**Repository Path**: htqs_admin/opentts

## Basic Information

- **Project Name**: opentts
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-03-03
- **Last Updated**: 2025-03-03

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Open Text to Speech Server

Unifies access to multiple open source text to speech systems and voices for [many languages](#running).

Supports a [subset of SSML](#ssml) that can use multiple voices, text to speech systems, and languages!

``` xml
The 1st thing to remember is that 27 languages are supported in Open TTS as of 10/13/2021 at 3pm.
The current voice can be changed, even to a different text to speech system!
Breaks are possible between sentences.
One language is never enough
Eine Sprache ist niemals genug
言語を一つは決して足りない
Lugha moja haitoshi
```

See the [full SSML example](https://github.com/synesthesiam/opentts/blob/master/etc/ssml_example.xml) (use `synesthesiam/opentts:all` Docker image with all voices included)

[Listen to voice samples](https://synesthesiam.github.io/opentts/)

![Web interface screenshot](img/screenshot.png "Screenshot")

## Voices

* [Larynx](https://github.com/rhasspy/larynx)
    * English (27), German (7), French (3), Spanish (2), Dutch (4), Russian (3), Swedish (1), Italian (2), Swahili (1)
    * Model types available: [GlowTTS](https://github.com/rhasspy/glow-tts-train)
    * Vocoders available: [HiFi-Gan](https://github.com/rhasspy/hifi-gan-train) (3 levels of quality)
    * Patched embedded version of Larynx 1.0
* [Glow-Speak](https://github.com/rhasspy/glow-speak)
    * English (2), German (1), French (1), Spanish (1), Dutch (1), Russian (1), Swedish (1), Italian (1), Swahili (1), Greek (1), Finnish (1), Hungarian (1), Korean (1)
    * Model types available: [GlowTTS](https://github.com/rhasspy/glow-tts-train)
    * Vocoders available: [HiFi-Gan](https://github.com/rhasspy/hifi-gan-train) (3 levels of quality)
* [Coqui-TTS](https://github.com/coqui-ai/TTS)
    * English (110), Japanese (1), Chinese (1)
    * Patched embedded version of Coqui-TTS 0.3.1
* [nanoTTS](https://github.com/gmn/nanotts)
    * English (2), German (1), French (1), Italian (1), Spanish (1)
* [MaryTTS](http://mary.dfki.de)
    * English (7), German (3), French (4), Italian (1), Russian (1), Swedish (1), Telugu (1), Turkish (1)
    * Includes [embedded MaryTTS](https://github.com/synesthesiam/marytts-txt2wav)
* [flite](http://www.festvox.org/flite)
    * English (19), Hindi (1), Bengali (1), Gujarati (3), Kannada (1), Marathi (2), Punjabi (1), Tamil (1), Telugu (3)
* [Festival](http://www.cstr.ed.ac.uk/projects/festival/)
    * English (9), Spanish (1), Catalan (1), Czech (4), Russian (1), Finnish (2), Marathi (1), Telugu (1), Hindi (1), Italian (2), Arabic (2)
    * Spanish/Catalan/Finnish use [ISO-8859-15 encoding](https://en.wikipedia.org/wiki/ISO/IEC_8859-15)
    * Czech uses [ISO-8859-2 encoding](https://en.wikipedia.org/wiki/ISO/IEC_8859-2)
    * Russian is [transliterated](https://pypi.org/project/transliterate/) from Cyrillic to Latin script automatically
    * Arabic uses UTF-8 and is diacritized with [mishkal](https://github.com/linuxscout/mishkal)
* [eSpeak](http://espeak.sourceforge.net)
    * Supports a huge number of languages/locales, but sounds robotic

## Running

Basic OpenTTS server:

```bash
$ docker run -it -p 5500:5500 synesthesiam/opentts:<LANGUAGE>
```

where `<LANGUAGE>` is one of:

* all (All languages)
* ar (Arabic)
* bn (Bengali)
* ca (Catalan)
* cs (Czech)
* de (German)
* el (Greek)
* en (English)
* es (Spanish)
* fi (Finnish)
* fr (French)
* gu (Gujarati)
* hi (Hindi)
* hu (Hungarian)
* it (Italian)
* ja (Japanese)
* kn (Kannada)
* ko (Korean)
* mr (Marathi)
* nl (Dutch)
* pa (Punjabi)
* ru (Russian)
* sv (Swedish)
* sw (Swahili)
* ta (Tamil)
* te (Telugu)
* tr (Turkish)
* zh (Chinese)

Visit http://localhost:5500

For HTTP API test page, visit http://localhost:5500/openapi/

Exclude eSpeak (robotic voices):

```bash
$ docker run -it -p 5500:5500 synesthesiam/opentts:<LANGUAGE> --no-espeak
```

### WAV Cache

You can have the OpenTTS server cache WAV files with `--cache`:

```bash
$ docker run -it -p 5500:5500 synesthesiam/opentts:<LANGUAGE> --cache
```

This will store WAV files in a temporary directory (inside the Docker container). A specific directory can also be used:

```bash
$ docker run -it -v /path/to/cache:/cache -p 5500:5500 synesthesiam/opentts:<LANGUAGE> --cache /cache
```

## HTTP API Endpoints

See [swagger.yaml](swagger.yaml)

* `GET /api/tts`
    * `?voice` - voice in the form `tts:voice` (e.g., `espeak:en`)
    * `?text` - text to speak
    * `?cache` - disable WAV cache with `false`
    * Returns `audio/wav` bytes
* `GET /api/voices`
    * Returns JSON object
    * Keys are voice ids in the form `tts:voice`
    * Values are objects with:
        * `id` - voice identifier for TTS system (string)
        * `name` - friendly name of voice (string)
        * `gender` - M or F (string)
        * `language` - 2-character language code (e.g., "en")
        * `locale` - lower-case locale code (e.g., "en-gb")
        * `tts_name` - name of text to speech system
    * Filter voices using query parameters:
        * `?tts_name` - only text to speech system(s)
        * `?language` - only language(s)
        * `?locale` - only locale(s)
        * `?gender` - only gender(s)
* `GET /api/languages`
    * Returns JSON list of supported languages
    * Filter languages using query parameters:
        * `?tts_name` - only text to speech system(s)

## SSML

A subset of [SSML](https://www.w3.org/TR/speech-synthesis11/) is supported:

* `<speak>` - wrap around SSML text
    * `lang` - set language for document
* `<s>` - sentence (disables automatic sentence breaking)
    * `lang` - set language for sentence
* `<w>` / `<token>` - word (disables automatic tokenization)
* `<voice>` - set voice of inner text
    * `voice` - name or language of voice
        * Name format is `tts:voice` (e.g., "glow-speak:en-us_mary_ann") or `tts:voice#speaker_id` (e.g., "coqui-tts:en_vctk#p228")
        * If one of the supported languages, a preferred voice is used (override with `--preferred-voice`)
* `<say-as>` - force interpretation of inner text
    * `interpret-as` one of "spell-out", "date", "number", "time", or "currency"
    * `format` - way to format text depending on `interpret-as`
        * number - one of "cardinal", "ordinal", "digits", "year"
        * date - string with "d" (cardinal day), "o" (ordinal day), "m" (month), or "y" (year)
* `<break>` - pause for given amount of time
    * time - seconds ("123s") or milliseconds ("123ms")
* `<sub>` - substitute `alias` for inner text

## MaryTTS Compatible Endpoint

Use OpenTTS as a drop-in replacement for [MaryTTS](http://mary.dfki.de/).

The voice format is `tts:voice`. Visit the OpenTTS web UI and copy/paste the "voice id" of your favorite voice here.

You may need to change the port in your `docker run` command to `-p 59125:5500` for compatibility with existing software.

### Larynx Voice Quality

On the Raspberry Pi, you may need to lower the quality of [Larynx](https://github.com/rhasspy/larynx) voices to get reasonable response times.

This is done by appending the quality level to the end of your voice:

```yaml
tts:
  - platform: marytts
    voice: "larynx:harvard;low"
```

Available quality levels are `high` (the default), `medium`, and `low`.

Note that this only applies to Larynx and Glow-Speak voices.
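The voice strings above are plain text, so they are easy to assemble in a script. The following is a minimal sketch, not part of OpenTTS: the helper names `with_quality` and `tts_url` are hypothetical, and the base URL assumes the `docker run -p 5500:5500` setup from the Running section. It only uses the `?voice`, `?text`, and `?cache` parameters documented for `GET /api/tts`; whether that endpoint also accepts the `;quality` suffix is not stated in this README, so the two helpers are kept separate.

```python
from urllib.parse import urlencode

# Quality levels listed in this README for Larynx and Glow-Speak voices.
QUALITY_LEVELS = ("high", "medium", "low")


def with_quality(voice: str, quality: str) -> str:
    """Append a quality level to a voice id, e.g. 'larynx:harvard;low'."""
    if quality not in QUALITY_LEVELS:
        raise ValueError(f"unknown quality level: {quality!r}")
    return f"{voice};{quality}"


def tts_url(base_url: str, voice: str, text: str, cache: bool = True) -> str:
    """Build a GET /api/tts URL from the documented query parameters."""
    params = {"voice": voice, "text": text}
    if not cache:
        # The README documents `?cache=false` as the way to skip the WAV cache.
        params["cache"] = "false"
    return f"{base_url.rstrip('/')}/api/tts?{urlencode(params)}"


if __name__ == "__main__":
    print(with_quality("larynx:harvard", "low"))
    print(tts_url("http://localhost:5500", "espeak:en", "Hello world"))
```

The resulting URL can be opened in a browser or passed to any HTTP client; the server responds with `audio/wav` bytes.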
### Speaker ID

For multi-speaker models (currently just `coqui-tts:en_vctk`), you can append a speaker name or id to your voice:

```yaml
tts:
  - platform: marytts
    voice: "coqui-tts:en_vctk#p228"
```

You can get the available speaker names from `/api/voices` or provide a 0-based index instead:

```yaml
tts:
  - platform: marytts
    voice: "coqui-tts:en_vctk#42"
```

## Default Larynx Settings

Default settings for [Larynx](https://github.com/rhasspy/larynx) can be provided on the command-line:

* `--larynx-quality` - vocoder quality ("high", "medium", or "low", default: "high")
* `--larynx-noise-scale` - voice volatility (0-1, default: 0.667)
* `--larynx-length-scale` - voice speed (< 1 is faster, default: 1.0)

---

## Building From Source

OpenTTS uses [Docker buildx](https://docs.docker.com/buildx/working-with-buildx/) to build multi-platform images based on [Debian bullseye](https://www.debian.org/releases/bullseye/).

Before building, make sure to download the voices you want to the `voices` directory. Each TTS system that uses external voices has a sub-directory with instructions on how to download voices.

If you only plan to build an image for your current platform, you should be able to run:

``` sh
make <LANGUAGE>
```

from the root of the cloned repository, where `<LANGUAGE>` is one of the [supported languages](#running). If it builds successfully, you can run it with:

``` sh
make <LANGUAGE>-run
```

For example, the English image can be built and run with:

``` sh
make en
make en-run
```

Under the hood, this does two things:

1. Runs the `configure` script with `--languages <LANGUAGE>`
2. Runs `docker buildx build` with the appropriate arguments

You can also run the `configure` script manually; see `./configure --help` for more options.
This script generates the following files (used by the build process):

* build_packages - Debian packages installed with `apt-get` during the build only
* packages - Debian packages installed with `apt-get` for runtime
* python_packages - Python packages installed with `pip`
* .dockerignore - Files that docker will ignore during building ("!" inverts)
* .dockerargs - Command-line arguments passed to `docker buildx build`

### Multi-Platform images

To build an image for a different platform, you need to initialize a docker buildx builder:

``` sh
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
docker buildx create --config /etc/docker/buildx.conf --use --name mybuilder
docker buildx use mybuilder
docker buildx inspect --bootstrap
```

**NOTE:** For some reason, you have to do these steps *each time you reboot*. If you see errors like "Error while loading /usr/sbin/dpkg-split: No such file or directory", run `docker buildx rm mybuilder` and re-run the steps above.

When you run `make`, specify the platform(s) you want to build for:

``` sh
DOCKER_PLATFORMS='--platform linux/amd64,linux/arm64,linux/arm/v7' make <LANGUAGE>
```

You may place pre-compiled Python wheels in the `download` directory. They will be used during the installation of Python packages.
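
Once a freshly built image is running, one way to check which voices it actually contains is the `GET /api/voices` response described under HTTP API Endpoints. The sketch below is illustrative only: `select_voices` is a hypothetical helper, and `SAMPLE` is made-up data shaped like the documented response (keys in the form `tts:voice`, values with `id`, `name`, `gender`, `language`, `locale`, and `tts_name`), not real server output.

```python
from typing import Dict, Optional

Voice = Dict[str, str]

# Made-up entries that only mimic the documented /api/voices structure.
SAMPLE: Dict[str, Voice] = {
    "espeak:en": {"id": "en", "name": "en", "gender": "M",
                  "language": "en", "locale": "en-gb", "tts_name": "espeak"},
    "espeak:de": {"id": "de", "name": "de", "gender": "M",
                  "language": "de", "locale": "de-de", "tts_name": "espeak"},
    "flite:example_voice": {"id": "example_voice", "name": "example_voice",
                            "gender": "F", "language": "en",
                            "locale": "en-us", "tts_name": "flite"},
}


def select_voices(
    voices: Dict[str, Voice],
    language: Optional[str] = None,
    tts_name: Optional[str] = None,
) -> Dict[str, Voice]:
    """Keep only voices matching the given language and/or TTS system."""
    selected = {}
    for key, info in voices.items():
        if language is not None and info.get("language") != language:
            continue
        if tts_name is not None and info.get("tts_name") != tts_name:
            continue
        selected[key] = info
    return selected


if __name__ == "__main__":
    for key in sorted(select_voices(SAMPLE, language="en")):
        print(key)
```

The same filtering is available server-side through the `?language` and `?tts_name` query parameters; doing it client-side is simply convenient when the full voice list has already been fetched.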