# text-frontend-tts

**Repository Path**: xbnpyk/text-frontend-tts

## Basic Information

- **Project Name**: text-frontend-tts
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-03-10
- **Last Updated**: 2021-03-10

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

Multilingual text processing API for cleaning, IPA phonemization, tokenization, and translation of text into sequences of character IDs, for easy stacking with neural Text-to-Speech (TTS) models.

## 1 Installation

**Supported OS**: Unix only

The package provides a simple installation:

* Clone the repo: `git clone https://github.com/ivanvovk/text-frontend-tts.git`
* Enter the root directory: `cd text-frontend-tts`
* Run `sh install.sh`. The script will:
  * Install all necessary Python dependencies
  * Initialize the `phonemizer` submodule
  * Download and install the G2P backends `espeak-ng`, `festival` and `mbrola`, which `phonemizer` needs in order to work
  * Install `phonemizer` as a Python package
  * Install `text_frontend` as a Python package

## 2 Usage

The API is designed for preprocessing the text inputs of neural TTS systems, i.e. converting text into a sequence of character embedding IDs. The package supports both grapheme and phoneme text representations.
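The two representations split text into different units before IDs are assigned. The sketch below is a rough illustration only: `grapheme_units` and `phoneme_units` are hypothetical helpers, not part of `text_frontend`. It assumes, based on the example outputs in this README, that phonemized strings separate units with `_` and attach the IPA stress mark `ˈ` to the following vowel; that `with_stress=False` simply strips such marks is also an assumption.

```python
from typing import List

# Hypothetical helpers for illustration; not the text_frontend implementation.
# IPA primary (U+02C8) and secondary (U+02CC) stress marks.
_STRIP_STRESS = str.maketrans("", "", "\u02c8\u02cc")


def grapheme_units(text: str) -> List[str]:
    """Grapheme representation: every character is its own unit."""
    return list(text)


def phoneme_units(phonemized: str, with_stress: bool = True) -> List[str]:
    """Phoneme representation: units are '_'-separated, so multi-character
    phones such as 'tʃ' or 'aɪ' stay whole. With with_stress=False the
    stress marks are removed (assumed behaviour) and any unit left empty
    is dropped."""
    units = phonemized.split("_")
    if not with_stress:
        units = [u.translate(_STRIP_STRESS) for u in units]
        units = [u for u in units if u]
    return units


print(grapheme_units("mister"))
print(phoneme_units("m_ˈɪ_s_t_ɚ"))
print(phoneme_units("m_ˈɪ_s_t_ɚ", with_stress=False))
print(phoneme_units("tʃ_ˈɛ_k"))
```

Treating a stressed vowel such as `ˈɪ` as a separate unit from `ɪ` is one plausible reason a phoneme vocabulary is larger than a grapheme one, but the exact symbol inventory is defined by the package itself.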
(Note: grapheme processing does not support word stress, whereas phoneme processing does.)

### Code examples

Import:

```python
from text_frontend import TextFrontend
```

Initialization:

```python
# Encodes phoneme inputs (without stress marks)
tf = TextFrontend(text_cleaners=['basic_cleaners'], use_phonemes=True, n_jobs=1, with_stress=False)
```

To get the number of supported characters, i.e. how many embeddings to initialize in your TTS neural network (note: the current API supports only the IPA phoneme scheme):

```python
tf = TextFrontend(use_phonemes=False)  # if using graphemes for encoding
print(tf.nchars)
# Output: 119

tf = TextFrontend(use_phonemes=True)  # if using phonemes for encoding
print(tf.nchars)
# Output: 236
```

Text encoding:

```python
# Encodes grapheme inputs
tf = TextFrontend(text_cleaners=['english_cleaners'], use_phonemes=False)
text = "Mr. User, this is test sentence to check the performance of phonemizer and text-to-sequence encoding."

print(tf.graphemes_to_phonemes(text, lang='en-us'))  # G2P conversion is still available
# Output: "m_ˈɪ_s_t_ɚ_._ _j_ˈuː_z_ɚ_,_ _ð_ɪ_s_ _ɪ_z_ _t_ˈɛ_s_t_ _s_ˈɛ_n_t_ə_n_s_ _t_ə_ _tʃ_ˈɛ_k_ _ð_ə_ _p_ɚ_f_ˈoːɹ_m_ə_n_s_ _ʌ_v_ _f_ˈoʊ_n_m_aɪ_z_ɚ_ _æ_n_d_ _t_ˈɛ_k_s_t_-_ _t_ə_-_ _s_ˈiː_k_w_ə_n_s_ _ɛ_ŋ_k_ˈoʊ_d_ɪ_ŋ_."

sequence = tf.text_to_sequence(text, lang='en-us')
print(sequence)
# Output: [36, 32, 42, 43, 28, 41, 2, 44, 42, 28, 41, 5, 2, 43, 31, 32, 42, 2, 32, 42, 2, 43, 28, 42, 43, 2, 42, 28, 37, 43, 28, 37, 26, 28, 2, 43, 38, 2, 26, 31, 28, 26, 34, 2, 43, 31, 28, 2, 39, 28, 41, 29, 38, 41, 36, 24, 37, 26, 28, 2, 38, 29, 2, 39, 31, 38, 37, 28, 36, 32, 49, 28, 41, 2, 24, 37, 27, 2, 43, 28, 47, 43, 6, 43, 38, 6, 42, 28, 40, 44, 28, 37, 26, 28, 2, 28, 37, 26, 38, 27, 32, 37, 30, 7, 1]

print(tf.sequence_to_text(sequence))  # however, the encoding corresponds only to the grapheme representation
# Output: "mister user, this is test sentence to check the performance of phonemizer and text-to-sequence encoding."
```

```python
# Encodes phoneme inputs (with stress marks)
tf = TextFrontend(text_cleaners=['english_cleaners'], use_phonemes=True, with_stress=True)
text = "Mr. User, this is test sentence to check the performance of phonemizer and text-to-sequence encoding."

print(tf.graphemes_to_phonemes(text, lang='en-us'))
# Output: "m_ˈɪ_s_t_ɚ_._ _j_ˈuː_z_ɚ_,_ _ð_ɪ_s_ _ɪ_z_ _t_ˈɛ_s_t_ _s_ˈɛ_n_t_ə_n_s_ _t_ə_ _tʃ_ˈɛ_k_ _ð_ə_ _p_ɚ_f_ˈoːɹ_m_ə_n_s_ _ʌ_v_ _f_ˈoʊ_n_m_aɪ_z_ɚ_ _æ_n_d_ _t_ˈɛ_k_s_t_-_ _t_ə_-_ _s_ˈiː_k_w_ə_n_s_ _ɛ_ŋ_k_ˈoʊ_d_ɪ_ŋ_."

sequence = tf.text_to_sequence(text, lang='en-us')
print(sequence)
# Output: [153, 45, 42, 225, 89, 135, 127, 122, 137, 89, 5, 135, 76, 159, 42, 135, 159, 137, 135, 225, 87, 42, 225, 135, 42, 87, 165, 225, 77, 165, 42, 135, 225, 77, 135, 55, 87, 160, 135, 76, 77, 135, 147, 89, 38, 83, 153, 77, 165, 42, 135, 104, 139, 135, 38, 123, 165, 153, 217, 137, 89, 135, 133, 165, 151, 135, 225, 87, 160, 42, 225, 6, 135, 225, 77, 6, 135, 42, 141, 160, 35, 77, 165, 42, 135, 158, 40, 160, 123, 151, 159, 40, 7, 1]

print(tf.sequence_to_text(sequence))  # the encoding corresponds to the phoneme representation
# Output: "m_ˈɪ_s_t_ɚ_ _j_ˈuː_z_ɚ_,_ _ð_ɪ_s_ _ɪ_z_ _t_ˈɛ_s_t_ _s_ˈɛ_n_t_ə_n_s_ _t_ə_ _tʃ_ˈɛ_k_ _ð_ə_ _p_ɚ_f_ˈoːɹ_m_ə_n_s_ _ʌ_v_ _f_ˈoʊ_n_m_aɪ_z_ɚ_ _æ_n_d_ _t_ˈɛ_k_s_t_-_ _t_ə_-_ _s_ˈiː_k_w_ə_n_s_ _ɛ_ŋ_k_ˈoʊ_d_ɪ_ŋ_."
```

Just cleaning the text:

```python
from text_frontend import clean_text

text = "Mr. User, this is test sentence to check the performance of text cleaning. It costs $0."
print(clean_text(text, ['english_cleaners']))
# Output: "mister user, this is test sentence to check the performance of text cleaning. it costs zero dollars."
```

For more details, see the docstrings of the individual functions.
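To make the cleaning step more concrete, here is a toy sketch of the kind of normalization that the `english_cleaners` example output suggests: lowercasing, expanding `Mr.` to `mister`, and spelling out `$0` as `zero dollars`. Everything in it (`toy_english_cleaner`, `_expand_dollars`, and the abbreviation and number tables) is hypothetical and is not the package's implementation; use `clean_text` for real inputs.

```python
import re

# Hypothetical, heavily reduced tables for illustration only; a real English
# cleaner covers far more abbreviations and arbitrary numbers.
_ABBREVIATIONS = {"mr.": "mister", "mrs.": "misses", "dr.": "doctor"}
_NUMBER_WORDS = ["zero", "one", "two", "three", "four",
                 "five", "six", "seven", "eight", "nine"]


def _expand_dollars(match: re.Match) -> str:
    """Turn '$<n>' into words; this toy only knows single digits."""
    amount = int(match.group(1))
    if amount >= len(_NUMBER_WORDS):
        return match.group(0)  # out of range for the toy table: leave unchanged
    unit = "dollar" if amount == 1 else "dollars"
    return f"{_NUMBER_WORDS[amount]} {unit}"


def toy_english_cleaner(text: str) -> str:
    """Lowercase, expand a few abbreviations and small dollar amounts,
    then collapse whitespace. Cents such as '$1.50' are not handled."""
    text = text.lower()
    for abbr, expansion in _ABBREVIATIONS.items():
        # \b keeps 'mr.' from matching inside longer words
        text = re.sub(r"\b" + re.escape(abbr), expansion, text)
    text = re.sub(r"\$(\d+)", _expand_dollars, text)
    return re.sub(r"\s+", " ", text).strip()


print(toy_english_cleaner(
    "Mr. User, this is test sentence to check the performance of text cleaning. It costs $0."
))
# mister user, this is test sentence to check the performance of text cleaning. it costs zero dollars.
```

For this one sentence the toy happens to reproduce the `clean_text` output shown above; on other inputs it will diverge quickly, which is why a dedicated cleaner module is worth having.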