# Deep_Speaker-speaker_recognition_system

**Repository Path**: wangmingMY/Deep_Speaker-speaker_recognition_system

## Basic Information

- **Project Name**: Deep_Speaker-speaker_recognition_system
- **Description**: Keras implementation of "Deep Speaker: an End-to-End Neural Speaker Embedding System" (speaker recognition)
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-07-07
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Deep Speaker: speaker recognition system

Data Set: [LibriSpeech](http://www.openslr.org/12/)

Reference paper: [Deep Speaker: an End-to-End Neural Speaker Embedding System](https://arxiv.org/pdf/1705.02304.pdf)

Reference code: https://github.com/philipperemy/deep-speaker (Thanks to Philippe Rémy)

This code was trained on the LibriSpeech train-clean set and tested on the LibriSpeech test-clean set. With the CNN model, it reaches an EER of about 5% on LibriSpeech.

## About the Code

`train.py`
This is the main file. It contains the training, evaluation, and model-saving functions.

`models.py`
The neural networks used in the experiment. This file contains three models: the CNN model (same as the paper's CNN), the GRU model (same as the paper's GRU), and the simple_cnn model. simple_cnn performs similarly to the original CNN model, but reduces the number of trainable parameters from 24M to 7M.

`select_batch.py`
Selects the optimal batch to feed to the network. This is one of the core components of this experiment.

`triplet_loss.py`
Computes the triplet loss used for network training. The implementation follows the paper.

`test_model.py`
Evaluates (tests) the model in terms of EER...

`eval_matrics.py`
Calculates the equal error rate (EER), F-measure, accuracy, and other metrics.

`pretraining.py`
Pre-trains the network with a softmax classification loss.
`pre_process.py`
Loads the utterances, filters out silence, extracts filterbank (fbank) features, and saves the features in .npy format.

## Experimental Results

This code was trained on the LibriSpeech train-clean set and tested on the LibriSpeech test-clean set. With the CNN model, it reaches an EER of about 5% on LibriSpeech.
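The roles of `select_batch.py` and `triplet_loss.py` can be illustrated with a small NumPy sketch. It assumes the cosine-similarity form of the triplet loss described in the Deep Speaker paper, `L = sum(max(0, s_an - s_ap + alpha))`, with an assumed margin `alpha=0.1`, and pairs it with a simple hardest-negative search. The helpers `l2_normalize`, `triplet_loss`, and `hardest_negatives` are hypothetical names for illustration. This is not the repository's code, and `select_batch.py` may build batches differently.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit L2 norm so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def triplet_loss(anchor, positive, negative, alpha: float = 0.1) -> float:
    """Cosine-similarity triplet loss: sum over triplets of max(0, s_an - s_ap + alpha).

    Hypothetical helper; alpha=0.1 is an assumed margin, not read from this repository.
    Each argument is an (n_triplets, embedding_dim) array.
    """
    a, p, n = (l2_normalize(np.asarray(x, dtype=float)) for x in (anchor, positive, negative))
    s_ap = np.sum(a * p, axis=1)  # anchor-positive cosine similarity per triplet
    s_an = np.sum(a * n, axis=1)  # anchor-negative cosine similarity per triplet
    return float(np.sum(np.maximum(s_an - s_ap + alpha, 0.0)))


def hardest_negatives(embeddings, speaker_ids) -> np.ndarray:
    """For each embedding, return the index of the most similar embedding of a different speaker.

    Hypothetical helper; assumes every speaker in the batch has at least one
    other speaker present, otherwise the row is all -inf and argmax returns 0.
    """
    e = l2_normalize(np.asarray(embeddings, dtype=float))
    ids = np.asarray(speaker_ids)
    sim = e @ e.T                            # pairwise cosine similarities
    same = ids[:, None] == ids[None, :]      # True where two rows share a speaker (incl. self)
    sim = np.where(same, -np.inf, sim)       # never pick a same-speaker row as a negative
    return np.argmax(sim, axis=1)
```

Choosing the most similar different-speaker embedding as the negative keeps the loss informative, since easy triplets (negative already far from the anchor) contribute zero. With `alpha=0.1`, a triplet whose anchor is equally similar to its positive and negative contributes exactly 0.1.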
## More Details

For more details, please read [deep_speaker_report.pdf](deep_speaker_report.pdf) (English) or [deep_speaker实验报告.pdf](deep_speaker实验报告.pdf) (Chinese).

## Simple Use

1. Prepare data. Sample data is provided in `audio/LibriSpeechSamples/`; alternatively, download the full [LibriSpeech](http://www.openslr.org/12/) dataset or prepare your own data.
2. Preprocessing. Extract features and preprocess the data: `python pre_process.py`.
3. Training. To train the model with triplet loss: `python train.py`. To pretrain with softmax loss first: `python pretraining.py`, then `python train.py`. Note: whether or not you pretrain, set the `PRE_TRAIN` flag in `constants.py` to `True` or `False` accordingly.
4. Evaluation. Evaluate the model in terms of EER: `python test_model.py`. Note: `train.py` also evaluates the model during training.
5. Plot loss curve. Plot the loss curve and the EER curve with `utils.py`:

```python
import constants as c
from utils import plot_loss

loss_file = c.CHECKPOINT_FOLDER + '/losses.txt'  # path to the loss file
plot_loss(loss_file)
```
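The evaluation step reports the equal error rate (EER): the operating point where the false-acceptance rate (FAR) equals the false-rejection rate (FRR). A minimal threshold-sweep sketch follows, assuming higher scores mean "same speaker", labels use 1 for same-speaker trials and 0 otherwise, and both classes are present. `compute_eer` is a hypothetical helper, not the API of `eval_matrics.py` or `test_model.py`.

```python
import numpy as np


def compute_eer(scores, labels):
    """Return (eer, threshold) from similarity scores and binary labels (1 = same speaker).

    Hypothetical helper. Every distinct score is tried as a threshold, and a trial
    is accepted when score >= threshold. Because the rates are discrete, FAR and FRR
    may never be exactly equal, so the EER is taken as their average at the
    threshold where they are closest.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_pos = np.sum(labels == 1)  # same-speaker trials
    n_neg = np.sum(labels == 0)  # different-speaker trials
    best_gap, eer, best_thr = np.inf, 1.0, None
    for thr in np.unique(scores):
        accept = scores >= thr
        far = np.sum(accept & (labels == 0)) / n_neg   # impostors wrongly accepted
        frr = np.sum(~accept & (labels == 1)) / n_pos  # genuine trials wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer, best_thr = gap, (far + frr) / 2, thr
    return float(eer), float(best_thr)
```

For example, `compute_eer([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])` gives `(0.5, 0.6)`: at threshold 0.6 one of two impostors is accepted and one of two genuine trials is rejected. Sweeping only the observed scores keeps the sketch exact and simple; for very large trial lists an ROC-based computation is usually faster.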