# dv3_world

**Repository Path**: xbnpyk/dv3_world

## Basic Information

- **Project Name**: dv3_world
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-09-14
- **Last Updated**: 2021-09-14

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Deep voice 3 with World vocoder

This repository extends r9y9's DV3 implementation by supporting the WORLD vocoder in its converter module. It provides WORLD feature extraction and wav synthesis using the Merlin toolkit. Other databases might be supported later, but for now it assumes the VCTK database. It also assumes end-to-end (E2E) TTS, which means the case that uses G2P is not considered. Thus the whole process, including wav trimming, does not require HTSLabel.

### Pretrained model

The following is a pretrained DV3 model including encoder, decoder and converter. The converter part has been trained to generate WORLD parameters.

* pretrained model: https://www.dropbox.com/s/ubp1ez38vo03u2d/checkpoint_step004510000.pth?dl=0
* hparam configuration used to train the pretrained model above: https://www.dropbox.com/s/a8zu89zfsp74r47/deepvoice3_vctk.json?dl=0

### Audio Samples

I am still in the middle of training, but for those of you who would like samples, I provide audio files.

* LibriTTS speaker 100: https://soundcloud.com/x7uo0xxkerik/sets/libri_dv3_world_speaker100

## 1. Download VCTK dataset or LibriTTS

* https://homepages.inf.ed.ac.uk/jyamagis/page3/page58/page58.html
* http://www.openslr.org/60/
* LibriTTS gives better clarity than VCTK.

## 2. Install Merlin

WORLD vocoder features are extracted, and wav is synthesized from them, using Merlin, so you need to clone Merlin from https://github.com/CSTR-Edinburgh/merlin. Do not compile the Merlin tools yet: the WORLD tool must be modified before compilation.

## 3. Merlin code modification

You need to modify some parts of Merlin. For example, WORLD has a hop size of 5 ms hardcoded, while this project needs a hop size of 256 timesteps. For this purpose I have made a 'merlin' directory under this project. Copy the files from it into your Merlin directory, replacing the existing files. Then compile the tools as written in the Merlin guideline and install Merlin as described in https://github.com/CSTR-Edinburgh/merlin#installation.

**If you have already installed Merlin, you still have to replace the files with the ones I provide and compile again.**

```
bash tools/compile_tools.sh
```

## 4. DV3 preprocess

As in the original DV3 code, you need to run **preprocess.py**. In addition to extracting mel and linear spectrograms, it trims each wav and stores the trimmed wav. This is required because WORLD features are extracted directly from the stored wav files, so the trimmed versions must exist on disk.

```
python preprocess.py vctk {VCTK-Corpus directory} {mel,linear spectrogram saving directory} --preset=presets/deepvoice3_vctk.json
```

For example,

```
python preprocess.py vctk /home/administrator/Music/VCTK-Corpus ./preprocess_output_trim --preset=presets/deepvoice3_vctk.json
```

As a result, mel and linear spectrograms are saved at **./preprocess_output_trim** and trimmed wav files are saved at **{VCTK-Corpus directory}/wav_trim_22050**. This is the directory Merlin reads from when extracting WORLD features.

## 5. Extract WORLD features with Merlin

When I say 'merlin', I mean the Merlin that you cloned from its original repository. Do not confuse it with 'dv3_world/merlin'. At this point, you no longer need 'dv3_world/merlin', provided you have copied all the files from it into your 'merlin'.

### 5.1. Run setup

Create the global configuration and modify the sampling rate.

```
cd merlin/egs/build_your_own_voice/s1
bash 01_setup.sh s1
vim conf/global_settings.cfg
```

Then change the sampling rate as follows, to match the recommended deepvoice3 setting configured in deepvoice3_vctk.json.

```
SamplingFreq=22050
```

### 5.2 Extract acoustic features

Once you have followed the directions above, you should have the **03_vctk_prepare_acoustic_features.sh** that I provided in your synthesis directory and **wav_trim_22050** under your VCTK root directory.

```
cd merlin/egs/build_your_own_voice/s1
mkdir -p database/feats
bash 03_vctk_prepare_acoustic_features.sh {your VCTK-corpus directory}/wav_trim_22050 database/feats
```

As a result, you have WORLD features and SPTK features at database/feats. The original Merlin code deletes sp, bapd and f0 because they are only intermediate files, but I kept them for debugging purposes.

### 5.3 Create file_id_list.scp

I have provided **dv3_world/merlin/egs/build_your_own_voice/vctk/gen_file_id_list.sh**. At this point, that file should already have been copied into your Merlin work directory. The script generates the file list used to extract WORLD features.

```
cd merlin/egs/build_your_own_voice/s1
bash gen_file_id_list.sh
```

### 5.4 Prepare conf files

```
cd merlin/egs/build_your_own_voice/s1
bash 04_prepare_conf_files.sh conf/global_settings.cfg
```

As a result, conf/acoustic_s1.conf and conf/test_synth_s1.conf will be created. You need to modify them a bit.

* conf/acoustic_s1.conf

```
# sub-processes
NORMLAB : False
MAKECMP : True
NORMCMP : True
TRAINDNN : False
DNNGEN : False
GENWAV : False
CALMCD : False
```

### 5.5 Create and normalize cmp

cmp is the file extension for the concatenated WORLD features. Each frame consists of 187 dimensions: dims 0 to 179 hold the static, delta and delta-delta of mgc, dims 180 to 182 the static, delta and delta-delta of lf0, dim 183 the vuv flag, and dims 184 to 186 the static, delta and delta-delta of bap.
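To make the 187-dimension layout concrete, here is a minimal sketch that loads a cmp file and splits each frame into its streams, which can be handy for inspecting the files once they have been generated. It assumes that a cmp file is a headerless sequence of 32-bit floats in native byte order; that is an assumption about the file format, not something stated in this README. The names `load_cmp` and `split_frame` are hypothetical helpers, not part of Merlin or DV3.

```python
from array import array

# Stream layout taken from the description above: name -> (start, end-exclusive).
CMP_DIM = 187
CMP_STREAMS = {
    "mgc": (0, 180),    # static + delta + delta-delta of mgc
    "lf0": (180, 183),  # static + delta + delta-delta of lf0
    "vuv": (183, 184),  # voiced/unvoiced flag
    "bap": (184, 187),  # static + delta + delta-delta of bap
}


def load_cmp(path, dim=CMP_DIM):
    """Read a cmp file into a list of frames, each a list of `dim` floats.

    Assumption: the file is raw float32 data with no header.
    """
    values = array("f")
    with open(path, "rb") as f:
        values.frombytes(f.read())
    if len(values) % dim != 0:
        raise ValueError(f"{path}: {len(values)} values is not a multiple of {dim}")
    return [values[i:i + dim].tolist() for i in range(0, len(values), dim)]


def split_frame(frame):
    """Split one 187-dim frame into its mgc / lf0 / vuv / bap parts."""
    return {name: frame[start:end] for name, (start, end) in CMP_STREAMS.items()}
```

For example, `split_frame(load_cmp("p225_001.cmp")[0])["vuv"]` would return the vuv value of the first frame. If the frame count check fails, the float32 assumption or the dimension is probably wrong for your files.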
```
cd merlin
python src/run_merlin.py egs/build_your_own_voice/s1/conf/acoustic_s1.conf
```

### 5.6 Copy-and-synthesis to verify your extraction

#### 5.6.1 Modify test_synth_s1.conf before generating wav.

```
framelength: 1024
fw_alpha: 0.65

# sub-processes
NORMLAB : False
MAKECMP: False
NORMCMP: False
TRAINDNN: False
DNNGEN : False
GENWAV : True
CALMCD: False
```

#### 5.6.2 Copy features to exp directory

```
cd merlin/egs/build_your_own_voice/s1/experiments/s1/acoustic_model/data
# The destination is merlin/egs/build_your_own_voice/s1/experiments/s1/test_synthesis/wav
cp */p225_001.* ../../test_synthesis/wav
```

#### 5.6.3. Write the filename without extension to 'test_id_list.scp'

```
vim merlin/egs/build_your_own_voice/s1/experiments/s1/test_synthesis/test_id_list.scp
```

and write the following, without the extension.

```
p225_001
```

#### 5.6.4. Synthesize waveform

```
cd merlin
python src/run_merlin.py egs/build_your_own_voice/s1/conf/test_synth_s1.conf
```

## 6. Train DV3 with WORLD feature for converter

```
cd dv3_world
python train.py --data-root=./preprocess_output_trim --cmp-root={your_merlin_dir}/egs/build_your_own_voice/s1/experiments/s1/acoustic_model/inter_module/nn_norm_mgc_lf0_vuv_bap_187 --preset=presets/deepvoice3_vctk.json --checkpoint-dir=./190709
```

The only added option here is **--cmp-root**, which points to the directory of extracted WORLD features.

## 7. Synthesize WORLD features from DV3

```
cd dv3_world
python synthesis.py {path_to_check_point} text_list.txt gen --preset=presets/deepvoice3_vctk.json --speaker_id=1
```

The arguments for synthesis are exactly the same as in the original deepvoice3. The only difference is that this step generates cmp (WORLD features), not wav directly. The cmp is turned into wav in the next step using Merlin.

## 8. Generate wav from cmp using Merlin

### 8.1 Move the cmp into merlin directory

```
cp dv3_world/gen/*.cmp {merlin_directory}/egs/build_your_own_voice/s1/experiments/s1/test_synthesis/wav
```

### 8.2 Modify test_id_list of Merlin

```
cd {merlin_directory}/egs/build_your_own_voice/s1/experiments/s1/test_synthesis
vim test_id_list.scp
```

Type the names of the cmp files without extension, for example:

```
0_checkpoint_step000723825
1_checkpoint_step000723825
```

### 8.3. Configure test_synth_s1.conf

This configuration is different from the one used for copy-synthesis.

```
# sub-processes
NORMLAB : False
MAKECMP: False
NORMCMP: False
TRAINDNN: False
DNNGEN : True
GENWAV : True
CALMCD: False
```

### 8.4. Run merlin

```
cd {merlin_directory}
python src/run_merlin.py {merlin_directory}/egs/build_your_own_voice/s1/conf/test_synth_s1.conf
```

Now you have wav files at **{merlin_directory}/egs/build_your_own_voice/s1/experiments/s1/test_synthesis/wav**
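
When many cmp files are generated, typing their names into test_id_list.scp by hand (step 8.2) gets tedious. The sketch below writes the list from the cmp files already copied in step 8.1, so it would be run before step 8.4. `write_test_id_list` is a hypothetical helper, not part of Merlin or this repository, and it assumes every `*.cmp` file in the directory should be synthesized.

```python
from pathlib import Path


def write_test_id_list(wav_dir, scp_path):
    """Write one file id per line (cmp filename without extension).

    wav_dir:  directory holding the copied *.cmp files (step 8.1)
    scp_path: path of the test_id_list.scp file to (over)write
    Returns the list of ids that were written.
    """
    ids = sorted(p.stem for p in Path(wav_dir).glob("*.cmp"))
    Path(scp_path).write_text("".join(f"{file_id}\n" for file_id in ids))
    return ids
```

Usage, with the directory layout from section 8 (replace `{merlin_directory}` with your own path): call `write_test_id_list(base + "/wav", base + "/test_id_list.scp")` where `base` is `{merlin_directory}/egs/build_your_own_voice/s1/experiments/s1/test_synthesis`.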