# opentts
**Repository Path**: htqs_admin/opentts
## Basic Information
- **Project Name**: opentts
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-03-03
- **Last Updated**: 2025-03-03
## Categories & Tags
**Categories**: Uncategorized

**Tags**: None
## README
# Open Text to Speech Server
Unifies access to multiple open source text to speech systems and voices for [many languages](#running).
Supports a [subset of SSML](#ssml) that can use multiple voices, text to speech systems, and languages!
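``` xml
<speak>
  The 1st thing to remember is that 27 languages are supported in Open TTS as of 10/13/2021 at 3pm.
  <voice name="glow-speak:en-us_mary_ann">
    <s>The current voice can be changed, even to a different text to speech system!</s>
  </voice>
  <voice name="coqui-tts:en_vctk#p228">
    <s>Breaks are possible</s>
    <break time="0.5s" />
    <s>between sentences.</s>
  </voice>
  <s lang="en">One language is never enough</s>
  <s lang="de">Eine Sprache ist niemals genug</s>
  <s lang="ja">言語を一つは決して足りない</s>
  <s lang="sw">Lugha moja haitoshi</s>
</speak>
```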
See the [full SSML example](https://github.com/synesthesiam/opentts/blob/master/etc/ssml_example.xml) (use the `synesthesiam/opentts:all` Docker image, which includes all voices).
[Listen to voice samples](https://synesthesiam.github.io/opentts/)

## Voices
* [Larynx](https://github.com/rhasspy/larynx)
    * English (27), German (7), French (3), Spanish (2), Dutch (4), Russian (3), Swedish (1), Italian (2), Swahili (1)
    * Model types available: [GlowTTS](https://github.com/rhasspy/glow-tts-train)
    * Vocoders available: [HiFi-Gan](https://github.com/rhasspy/hifi-gan-train) (3 levels of quality)
    * Patched embedded version of Larynx 1.0
* [Glow-Speak](https://github.com/rhasspy/glow-speak)
    * English (2), German (1), French (1), Spanish (1), Dutch (1), Russian (1), Swedish (1), Italian (1), Swahili (1), Greek (1), Finnish (1), Hungarian (1), Korean (1)
    * Model types available: [GlowTTS](https://github.com/rhasspy/glow-tts-train)
    * Vocoders available: [HiFi-Gan](https://github.com/rhasspy/hifi-gan-train) (3 levels of quality)
* [Coqui-TTS](https://github.com/coqui-ai/TTS)
    * English (110), Japanese (1), Chinese (1)
    * Patched embedded version of Coqui-TTS 0.3.1
* [nanoTTS](https://github.com/gmn/nanotts)
    * English (2), German (1), French (1), Italian (1), Spanish (1)
* [MaryTTS](http://mary.dfki.de)
    * English (7), German (3), French (4), Italian (1), Russian (1), Swedish (1), Telugu (1), Turkish (1)
    * Includes [embedded MaryTTS](https://github.com/synesthesiam/marytts-txt2wav)
* [flite](http://www.festvox.org/flite)
    * English (19), Hindi (1), Bengali (1), Gujarati (3), Kannada (1), Marathi (2), Punjabi (1), Tamil (1), Telugu (3)
* [Festival](http://www.cstr.ed.ac.uk/projects/festival/)
    * English (9), Spanish (1), Catalan (1), Czech (4), Russian (1), Finnish (2), Marathi (1), Telugu (1), Hindi (1), Italian (2), Arabic (2)
    * Spanish/Catalan/Finnish use [ISO-8859-15 encoding](https://en.wikipedia.org/wiki/ISO/IEC_8859-15)
    * Czech uses [ISO-8859-2 encoding](https://en.wikipedia.org/wiki/ISO/IEC_8859-2)
    * Russian is [transliterated](https://pypi.org/project/transliterate/) from Cyrillic to Latin script automatically
    * Arabic uses UTF-8 and is diacritized with [mishkal](https://github.com/linuxscout/mishkal)
* [eSpeak](http://espeak.sourceforge.net)
    * Supports a huge number of languages/locales, but sounds robotic
## Running
Basic OpenTTS server:
```bash
$ docker run -it -p 5500:5500 synesthesiam/opentts:<LANGUAGE>
```
where `<LANGUAGE>` is one of:
* all (All languages)
* ar (Arabic)
* bn (Bengali)
* ca (Catalan)
* cs (Czech)
* de (German)
* el (Greek)
* en (English)
* es (Spanish)
* fi (Finnish)
* fr (French)
* gu (Gujarati)
* hi (Hindi)
* hu (Hungarian)
* it (Italian)
* ja (Japanese)
* kn (Kannada)
* ko (Korean)
* mr (Marathi)
* nl (Dutch)
* pa (Punjabi)
* ru (Russian)
* sv (Swedish)
* sw (Swahili)
* ta (Tamil)
* te (Telugu)
* tr (Turkish)
* zh (Chinese)

Visit http://localhost:5500

For the HTTP API test page, visit http://localhost:5500/openapi/

Exclude eSpeak (robotic voices):
```bash
$ docker run -it -p 5500:5500 synesthesiam/opentts:<LANGUAGE> --no-espeak
```
### WAV Cache
You can have the OpenTTS server cache WAV files with `--cache`:
```bash
$ docker run -it -p 5500:5500 synesthesiam/opentts:<LANGUAGE> --cache
```
This will store WAV files in a temporary directory (inside the Docker container). A specific directory can also be used:
```bash
$ docker run -it -v /path/to/cache:/cache -p 5500:5500 synesthesiam/opentts:<LANGUAGE> --cache /cache
```
## HTTP API Endpoints
See [swagger.yaml](swagger.yaml)
* `GET /api/tts`
    * `?voice` - voice in the form `tts:voice` (e.g., `espeak:en`)
    * `?text` - text to speak
    * `?cache` - disable WAV cache with `false`
    * Returns `audio/wav` bytes
* `GET /api/voices`
    * Returns a JSON object
    * Keys are voice ids in the form `tts:voice`
    * Values are objects with:
        * `id` - voice identifier for TTS system (string)
        * `name` - friendly name of voice (string)
        * `gender` - M or F (string)
        * `language` - 2-character language code (e.g., "en")
        * `locale` - lower-case locale code (e.g., "en-gb")
        * `tts_name` - name of text to speech system
    * Filter voices using query parameters:
        * `?tts_name` - only text to speech system(s)
        * `?language` - only language(s)
        * `?locale` - only locale(s)
        * `?gender` - only gender(s)
* `GET /api/languages`
    * Returns a JSON list of supported languages
    * Filter languages using query parameters:
        * `?tts_name` - only text to speech system(s)
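
As a minimal sketch of calling these endpoints from a shell, the bash helpers below build a `GET /api/tts` URL and download the resulting WAV with `curl`. They assume the server from [Running](#running) is listening on `localhost:5500`; the helper names and the `OPENTTS_URL` override are hypothetical and not part of OpenTTS.

```bash
#!/usr/bin/env bash
# Hypothetical client helpers for GET /api/tts (names are illustrative only).
# OPENTTS_URL is an assumed override; it defaults to the port used above.
OPENTTS_URL="${OPENTTS_URL:-http://localhost:5500}"

# Percent-encode a string for a query string, keeping only RFC 3986
# unreserved characters. Iterates byte by byte in the C locale.
urlencode() {
  local LC_ALL=C
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [A-Za-z0-9._~-]) out+=$c ;;
      *) printf -v c '%%%02X' "'$c"; out+=$c ;;
    esac
  done
  printf '%s\n' "$out"
}

# Build a /api/tts URL from TEXT and VOICE.
# An optional third argument (e.g. "false") is sent as ?cache=...
opentts_tts_url() {
  local url
  url="$OPENTTS_URL/api/tts?voice=$(urlencode "$2")&text=$(urlencode "$1")"
  if [[ -n ${3:-} ]]; then
    url+="&cache=$3"
  fi
  printf '%s\n' "$url"
}

# Save the audio/wav response to OUT_FILE (requires a running server).
opentts_say() {
  curl --silent --show-error --fail --output "$3" "$(opentts_tts_url "$1" "$2")"
}
```

For example, `opentts_say 'Hello world' espeak:en hello.wav` should save the spoken text to `hello.wav`, and passing `false` as the third argument to `opentts_tts_url` adds `cache=false` to bypass the WAV cache.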
## SSML
A subset of [SSML](https://www.w3.org/TR/speech-synthesis11/) is supported:
* `<speak>` - wrap around SSML text
    * `lang` - set language for document
* `<s>` - sentence (disables automatic sentence breaking)
    * `lang` - set language for sentence
* `<w>` / `<token>` - word (disables automatic tokenization)
* `<voice name="...">` - set voice of inner text
    * `name` - name or language of voice
        * Name format is `tts:voice` (e.g., "glow-speak:en-us_mary_ann") or `tts:voice#speaker_id` (e.g., "coqui-tts:en_vctk#p228")
        * If the value is one of the supported languages, a preferred voice for that language is used (override with `--preferred-voice <lang> <voice>`)
* `<say-as interpret-as="">` - force interpretation of inner text
    * `interpret-as` - one of "spell-out", "date", "number", "time", or "currency"
    * `format` - way to format text depending on `interpret-as`
        * number - one of "cardinal", "ordinal", "digits", "year"
        * date - string with "d" (cardinal day), "o" (ordinal day), "m" (month), or "y" (year)
* `<break time="">` - pause for the given amount of time
    * `time` - seconds ("123s") or milliseconds ("123ms")
* `<sub alias="">` - substitute `alias` for inner text
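
Any text placed inside these tags must be XML-escaped, or characters such as `&` and `<` will make the document invalid. The bash sketch below builds fragments from the tags above; `xml_escape`, `ssml_sentence`, and `ssml_speak` are hypothetical helper names, and how the finished document is submitted to the server is not covered here.

```bash
#!/usr/bin/env bash
# Hypothetical SSML helpers; only the tag names come from the list above.

# Escape the five XML special characters. "&" is replaced first so the
# entities added afterwards are not escaped twice. The backslash in "\&"
# keeps the replacement literal even when bash's patsub_replacement is on.
xml_escape() {
  local s="$1"
  s=${s//&/\&amp;}
  s=${s//\</\&lt;}
  s=${s//\>/\&gt;}
  s=${s//\"/\&quot;}
  s=${s//\'/\&apos;}
  printf '%s\n' "$s"
}

# Wrap one escaped sentence in <s>, with an optional lang attribute.
# The lang code itself is assumed to be a plain code such as "en" or "de".
ssml_sentence() {
  local lang="$1" text
  text=$(xml_escape "$2")
  if [[ -n $lang ]]; then
    printf '<s lang="%s">%s</s>\n' "$lang" "$text"
  else
    printf '<s>%s</s>\n' "$text"
  fi
}

# Concatenate already-built SSML fragments inside a <speak> root element.
ssml_speak() {
  printf '<speak>'
  printf '%s' "$@"
  printf '</speak>\n'
}
```

For example, `ssml_speak "$(ssml_sentence en 'Fish & Chips')" '<break time="500ms"/>'` yields `<speak><s lang="en">Fish &amp; Chips</s><break time="500ms"/></speak>`.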
## MaryTTS Compatible Endpoint
Use OpenTTS as a drop-in replacement for [MaryTTS](http://mary.dfki.de/).

The voice format is `<TTS_SYSTEM>:<VOICE_NAME>`. Visit the OpenTTS web UI and copy the "voice id" of your favorite voice into your MaryTTS client's configuration.

You may need to change the port in your `docker run` command to `-p 59125:5500` (59125 is MaryTTS's default port) for compatibility with existing software.
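
A bash sketch for scripting against this endpoint: `split_voice_id` checks that a copied voice id has the `<TTS_SYSTEM>:<VOICE_NAME>` shape, and `marytts_say` requests audio. Both names are hypothetical, and the request assumes OpenTTS mirrors MaryTTS's own `/process` route with `INPUT_TEXT` and `VOICE` query parameters; check [swagger.yaml](swagger.yaml) before relying on it. The default URL assumes the `-p 59125:5500` mapping above.

```bash
#!/usr/bin/env bash
# Split a voice id of the form <TTS_SYSTEM>:<VOICE_NAME> into two lines.
# Only the first ":" separates the parts, so "coqui-tts:en_vctk#p228" works too.
split_voice_id() {
  local id="$1"
  if [[ $id != *:* || -z ${id%%:*} || -z ${id#*:} ]]; then
    echo "expected <TTS_SYSTEM>:<VOICE_NAME>, got '$id'" >&2
    return 1
  fi
  printf '%s\n%s\n' "${id%%:*}" "${id#*:}"
}

# Assumed MaryTTS-style request (route and parameter names are an assumption).
# MARYTTS_URL is a hypothetical override. Writes the response to OUT_FILE.
marytts_say() {
  split_voice_id "$2" >/dev/null || return 1
  curl --silent --show-error --fail --get \
    --data-urlencode "INPUT_TEXT=$1" \
    --data-urlencode "VOICE=$2" \
    --output "$3" \
    "${MARYTTS_URL:-http://localhost:59125}/process"
}
```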
### Larynx Voice Quality
On the Raspberry Pi, you may need to lower the quality of [Larynx](https://github.com/rhasspy/larynx) voices to get reasonable response times.
This is done by appending the quality level to the end of your voice:
```yaml
tts:
  - platform: marytts
    voice: larynx:harvard;low
```
Available quality levels are `high` (the default), `medium`, and `low`.
Note that this only applies to Larynx and Glow-Speak voices.
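
When generating this configuration from a script, a small bash helper can append the suffix and reject levels or systems the text above does not mention. `with_quality` is a hypothetical name; it checks only the `larynx:` and `glow-speak:` prefixes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append ";<quality>" to a Larynx or Glow-Speak voice id.
with_quality() {
  local voice="$1" quality="$2"
  case $quality in
    high|medium|low) ;;
    *) echo "unknown quality '$quality' (use high, medium, or low)" >&2; return 1 ;;
  esac
  case $voice in
    larynx:*|glow-speak:*) ;;
    *) echo "quality levels only apply to Larynx and Glow-Speak voices" >&2; return 1 ;;
  esac
  printf '%s;%s\n' "$voice" "$quality"
}
```

For example, `with_quality larynx:harvard low` prints `larynx:harvard;low`, the value used in the YAML above.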
### Speaker ID
For multi-speaker models (currently just `coqui-tts:en_vctk`), you can append a speaker name or id to your voice:
```yaml
tts:
  - platform: marytts
    voice: coqui-tts:en_vctk#p228
```
You can get the available speaker names from `/api/voices` or provide a 0-based index instead:
```yaml
tts:
  - platform: marytts
    voice: coqui-tts:en_vctk#42
```
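
A matching bash sketch for the speaker suffix. `with_speaker` is a hypothetical helper: it only rejects empty speakers and stray `#` characters, and it does not check whether the speaker exists (the list from `/api/voices` is needed for that).

```bash
#!/usr/bin/env bash
# Hypothetical helper: append "#<speaker>" to a multi-speaker voice id.
# The speaker may be a name (e.g. p228) or a 0-based index (e.g. 42).
with_speaker() {
  local voice="$1" speaker="$2"
  # Reject an empty speaker, or a "#" that would make the suffix ambiguous.
  if [[ -z $speaker || $speaker == *'#'* || $voice == *'#'* ]]; then
    echo "invalid voice/speaker: '$voice' '$speaker'" >&2
    return 1
  fi
  printf '%s#%s\n' "$voice" "$speaker"
}
```

For example, `with_speaker coqui-tts:en_vctk 42` prints `coqui-tts:en_vctk#42`.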
## Default Larynx Settings
Default settings for [Larynx](https://github.com/rhasspy/larynx) can be provided on the command-line:
* `--larynx-quality` - vocoder quality ("high", "medium", or "low", default: "high")
* `--larynx-noise-scale` - voice volatility (0-1, default: 0.667)
* `--larynx-length-scale` - voice speed (< 1 is faster, default: 1.0)
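
These flags are passed after the image name, like `--no-espeak` above. The bash sketch below assembles them with basic range checks; `larynx_default_args` is a hypothetical helper, and the values in its usage comment are arbitrary examples.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the Larynx default flags one per line.
# Arguments: quality, noise scale, length scale (defaults as documented above).
larynx_default_args() {
  local quality="${1:-high}" noise="${2:-0.667}" length="${3:-1.0}"
  case $quality in
    high|medium|low) ;;
    *) echo "quality must be high, medium, or low" >&2; return 1 ;;
  esac
  # Basic range checks; awk does the decimal comparisons bash arithmetic cannot.
  # Inputs are assumed to be numbers.
  if ! awk -v n="$noise" -v l="$length" 'BEGIN { exit !(n >= 0 && n <= 1 && l > 0) }'; then
    echo "noise scale must be in 0-1 and length scale must be positive" >&2
    return 1
  fi
  printf '%s\n' --larynx-quality "$quality" \
    --larynx-noise-scale "$noise" \
    --larynx-length-scale "$length"
}

# Example (requires Docker):
#   mapfile -t args < <(larynx_default_args medium 0.667 0.9)
#   docker run -it -p 5500:5500 synesthesiam/opentts:en "${args[@]}"
```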
---
## Building From Source
OpenTTS uses [Docker buildx](https://docs.docker.com/buildx/working-with-buildx/) to build multi-platform images based on [Debian bullseye](https://www.debian.org/releases/bullseye/).
Before building, make sure to download the voices you want into the `voices` directory. Each TTS system that uses external voices has a sub-directory with instructions for downloading its voices.
If you only plan to build an image for your current platform, you should be able to run:
``` sh
make <LANGUAGE>
```
from the root of the cloned repository, where `<LANGUAGE>` is one of the [supported languages](#running). If it builds successfully, you can run it with:
``` sh
make <LANGUAGE>-run
```
For example, the English image can be built and run with:
``` sh
make en
make en-run
```
Under the hood, this does two things:
1. Runs the `configure` script with `--languages <LANGUAGE>`
2. Runs `docker buildx build` with the appropriate arguments

You can run the `configure` script manually (see `./configure --help` for more options). This script generates the following files, which are used by the build process:
* `build_packages` - Debian packages installed with `apt-get` during the build only
* `packages` - Debian packages installed with `apt-get` for runtime
* `python_packages` - Python packages installed with `pip`
* `.dockerignore` - files that Docker ignores during the build ("!" inverts a pattern)
* `.dockerargs` - command-line arguments passed to `docker buildx build`
### Multi-Platform images
To build an image for a different platform, you need to initialize a docker buildx builder:
``` sh
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
docker buildx create --config /etc/docker/buildx.conf --use --name mybuilder
docker buildx use mybuilder
docker buildx inspect --bootstrap
```
**NOTE:** For some reason, you have to do these steps *each time you reboot*. If you see errors like "Error while loading /usr/sbin/dpkg-split: No such file or directory", run `docker buildx rm mybuilder` and re-run the steps above.

When you run `make`, specify the platform(s) you want to build for:
``` sh
DOCKER_PLATFORMS='--platform linux/amd64,linux/arm64,linux/arm/v7' make <LANGUAGE>
```
You may place pre-compiled Python wheels in the `download` directory. They will be used during the installation of Python packages.