### I released a new implementation: [kan-bayashi/ParallelWaveGAN](https://github.com/kan-bayashi/ParallelWaveGAN). Please enjoy your hacking!

# PYTORCH-WAVENET-VOCODER

[![Build Status](https://travis-ci.org/kan-bayashi/PytorchWaveNetVocoder.svg?branch=master)](https://travis-ci.org/kan-bayashi/PytorchWaveNetVocoder)

This repository provides a WaveNet vocoder implementation with PyTorch.

![](https://kan-bayashi.github.io/WaveNetVocoderSamples/images/overview.bmp)

You can now try the demo recipe in Google Colab!

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/kan-bayashi/INTERSPEECH19_TUTORIAL/blob/master/notebooks/wavenet_vocoder/wavenet_vocoder.ipynb)

## Key features

- Kaldi-like recipes that make the results easy to reproduce
- Multi-GPU training / decoding
- WORLD features or mel-spectrograms as auxiliary features (a conceptual extraction sketch follows the Setup section below)
- Recipes for three public databases:
  - [CMU Arctic database](http://www.festvox.org/cmu_arctic/): `egs/arctic`
  - [LJ Speech database](https://keithito.com/LJ-Speech-Dataset/): `egs/ljspeech`
  - [M-AILABS speech database](http://www.m-ailabs.bayern/en/the-mailabs-speech-dataset/): `egs/m-ailabs-speech`

## Requirements

- python 3.6+
- virtualenv
- cuda 9.0+
- cudnn 7.1+
- nccl 2.0+ (for multi-GPU use)

We recommend using a GPU with more than 10 GB of memory.

## Setup

### A. Make virtualenv

```bash
$ git clone https://github.com/kan-bayashi/PytorchWaveNetVocoder.git
$ cd PytorchWaveNetVocoder/tools
$ make
```

### B. Install with pip

```bash
$ git clone https://github.com/kan-bayashi/PytorchWaveNetVocoder.git
$ cd PytorchWaveNetVocoder
# pytorch 1.0.1 is recommended because it is the only version we tested
$ pip install torch==1.0.1 torchvision==0.2.2
$ pip install -e .
# make a dummy activate file to suppress a warning in the recipe
$ mkdir -p tools/venv/bin && touch tools/venv/bin/activate
```
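After installation, you can quickly confirm that the expected PyTorch version and CUDA support are available. This is just an illustrative check, not a script shipped with the repository:

```python
# Minimal environment check (illustrative; not part of this repository).
import torch

print(torch.__version__)          # expect 1.0.1, the only tested version
print(torch.cuda.is_available())  # should be True with cuda 9.0+ / cudnn 7.1+
if torch.cuda.is_available():
    print(torch.cuda.device_count())  # >1 enables multi-GPU use via nccl
```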
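As noted under Key features, the recipes condition the WaveNet on WORLD features or mel-spectrograms. The sketch below shows the general idea of mel-spectrogram extraction; it is a conceptual illustration only, and librosa as well as every parameter value (FFT size, hop length, number of mel bins) are assumptions for the example, not the recipes' actual feature-extraction settings.

```python
# Conceptual sketch of mel-spectrogram auxiliary feature extraction.
# librosa and all parameter values here are illustrative assumptions,
# not the feature-extraction settings used by the recipes.
import librosa
import numpy as np

wav, sr = librosa.load("sample.wav", sr=16000)  # hypothetical input file
melspc = librosa.feature.melspectrogram(
    y=wav,
    sr=sr,
    n_fft=1024,     # analysis window size (assumed)
    hop_length=80,  # 5 ms frame shift at 16 kHz (assumed)
    n_mels=80,      # number of mel bins (assumed)
)
logmelspc = np.log10(np.maximum(melspc, 1e-10))  # log compression
print(logmelspc.shape)  # (n_mels, n_frames); each frame conditions many samples
```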
## How-to-run

```bash
$ cd egs/arctic/sd
$ ./run.sh
```

See [egs/README.md](egs/README.md) for more details on the recipes.

## Results

You can listen to samples at [kan-bayashi/WaveNetVocoderSamples](https://kan-bayashi.github.io/WaveNetVocoderSamples/).

Below are the subjective evaluation results from the `arctic` recipe.

**Comparison between model types**

![](https://kan-bayashi.github.io/WaveNetVocoderSamples/images/mos.bmp)

**Effect of the amount of training data**

![](https://kan-bayashi.github.io/WaveNetVocoderSamples/images/mos_num_train.bmp)

If you want to listen to more samples, please access our Google Drive [here](https://drive.google.com/drive/folders/1zC1WDiMu4SOdc7UeOayoEe_79PdnPBu6?usp=sharing).

Here is the list of samples:

- `arctic_raw_16k`: original audio from the arctic database
- `arctic_sd_16k_world`: SD model with WORLD aux feats + noise shaping with WORLD mcep
- `arctic_si-open_16k_world`: SI-open model with WORLD aux feats + noise shaping with WORLD mcep
- `arctic_si-close_16k_world`: SI-close model with WORLD aux feats + noise shaping with WORLD mcep
- `arctic_si-close_16k_melspc`: SI-close model with mel-spectrogram aux feats
- `arctic_si-close_16k_melspc_ns`: SI-close model with mel-spectrogram aux feats + noise shaping with STFT mcep
- `ljspeech_raw_22.05k`: original audio from the ljspeech database
- `ljspeech_sd_22.05k_world`: SD model with WORLD aux feats + noise shaping with WORLD mcep
- `ljspeech_sd_22.05k_melspc`: SD model with mel-spectrogram aux feats
- `ljspeech_sd_22.05k_melspc_ns`: SD model with mel-spectrogram aux feats + noise shaping with STFT mcep
- `m-ailabs_raw_16k`: original audio from the m-ailabs speech database
- `m-ailabs_sd_16k_melspc`: SD model with mel-spectrogram aux feats

## References

Please cite the following articles.

```
@inproceedings{tamamori2017speaker,
  title={Speaker-dependent WaveNet vocoder},
  author={Tamamori, Akira and Hayashi, Tomoki and Kobayashi, Kazuhiro and Takeda, Kazuya and Toda, Tomoki},
  booktitle={Proceedings of Interspeech},
  pages={1118--1122},
  year={2017}
}

@inproceedings{hayashi2017multi,
  title={An Investigation of Multi-Speaker Training for WaveNet Vocoder},
  author={Hayashi, Tomoki and Tamamori, Akira and Kobayashi, Kazuhiro and Takeda, Kazuya and Toda, Tomoki},
  booktitle={Proc. ASRU 2017},
  year={2017}
}

@article{hayashi2018sp,
  title={複数話者WaveNetボコーダに関する調査},
  author={林知樹 and 小林和弘 and 玉森聡 and 武田一哉 and 戸田智基},
  journal={電子情報通信学会技術研究報告},
  year={2018}
}
```

## Author

Tomoki Hayashi @ Nagoya University

E-mail: hayashi.tomoki@g.sp.m.is.nagoya-u.ac.jp