# EEND_PyTorch

A PyTorch implementation of [End-to-End Neural Diarization](https://ieeexplore.ieee.org/document/9003959). This repo is largely based on the original Chainer implementation [EEND](https://github.com/hitachi-speech/EEND) by [Hitachi, Ltd.](https://github.com/hitachi-speech), which holds the copyright. This repo only covers training and inference; for data preparation, please refer to the [original authors' repo](https://github.com/hitachi-speech/EEND/blob/master/egs/callhome/v1/run_prepare_shared.sh).

## Note

Only the Transformer model with PIT loss is implemented here (see the PIT loss sketch below), and only the main pipeline is guaranteed to be correct. Some secondary features (such as `save_attn_weight`, the BLSTM model, the deep clustering loss, etc.) are either not implemented or not implemented correctly. The original Chainer code actually reserves a PyTorch interface, so I may consider making a merge request once this code is well polished.

## Run

1. Prepare your Kaldi-style data and modify `run.sh` according to your own directories.
2. Check the configuration file. The default `conf/large/train.yaml` uses a 4-layer Transformer with 100k Noam warmup steps (see the schedule sketch below), which differs from the configuration in their ASRU 2019 paper. This configuration comes from [their paper submitted to TASLP](https://arxiv.org/abs/2003.02966), as the larger model yields better performance.
3. Run `./run.sh`.

## Pretrained Models

Pretrained models are offered here. `model_simu.th` is trained on simulated data (beta=2), and `model_callhome.th` is adapted on CALLHOME data. Both are 4-layer Transformer models trained with `conf/large/train.yaml`.

## Results

Our training data lacks Switchboard Phase 1, so the results can be slightly worse.

| Type | Transformer Layers | Noam Warmup Steps | DER on simu | DER on CALLHOME |
|:-:|:-:|:-:|:-:|:-:|
| [Chainer (ASRU 2019)](https://ieeexplore.ieee.org/document/9003959) | 2 | 25k | 7.36 | 12.50 |
| [Chainer (TASLP)](https://arxiv.org/pdf/2003.02966.pdf) | 4 | 100k | 4.56 | 9.54 |
| Chainer (run on our data) | 2 | 25k | 9.78 | 14.85 |
| PyTorch (epoch 50 on simu) | 2 | 25k | 10.14 | 15.72 |
| PyTorch | 4 | 100k | 6.76 | 11.21 |
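## Sketch: Permutation-Invariant Training (PIT) Loss

The PIT loss mentioned in the Note section picks, for each recording, the speaker permutation that minimizes the frame-level binary cross-entropy between predictions and labels. Below is a minimal, self-contained sketch of that idea in PyTorch; the function name and tensor shapes are illustrative assumptions, not the exact code in this repo.

```python
import itertools

import torch
import torch.nn.functional as F


def pit_bce_loss(pred, label):
    """Permutation-invariant BCE loss for diarization (illustrative sketch).

    pred:  (T, S) raw logits from the model, S = number of speakers
    label: (T, S) float tensor of 0/1 speaker-activity targets
    Returns the minimum loss over all speaker permutations and the best permutation.
    """
    n_speakers = label.shape[-1]
    perms = list(itertools.permutations(range(n_speakers)))
    losses = []
    for perm in perms:
        # Permute the label columns and compute frame-level BCE against the logits
        permuted = label[:, list(perm)]
        losses.append(
            F.binary_cross_entropy_with_logits(pred, permuted, reduction="mean")
        )
    losses = torch.stack(losses)
    min_loss, min_idx = torch.min(losses, dim=0)
    return min_loss, perms[min_idx.item()]
```

Enumerating all permutations is cheap here because the number of speakers per recording is small (two in the CALLHOME setup), so the factorial cost is not a concern.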
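## Sketch: Noam Learning-Rate Schedule

The `conf/large/train.yaml` configuration referenced in the Run section uses the Noam learning-rate schedule with 100k warmup steps. For reference, here is a small sketch built on PyTorch's `LambdaLR`; `d_model=256` is an assumed attention dimension and the class itself is an illustration, not this repo's implementation.

```python
import torch


class NoamLR(torch.optim.lr_scheduler.LambdaLR):
    """Noam schedule: lr(step) = base_lr * d_model^-0.5 * min(step^-0.5, step * warmup^-1.5).

    Set the optimizer's base lr to 1.0 (or to an overall scale factor) so the
    lambda below fully determines the learning rate.
    """

    def __init__(self, optimizer, d_model=256, warmup_steps=100000):
        def lr_lambda(step):
            step = max(step, 1)  # guard against step 0
            return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

        super().__init__(optimizer, lr_lambda)


# Usage sketch: call scheduler.step() once per training step (not per epoch).
# optimizer = torch.optim.Adam(model.parameters(), lr=1.0, betas=(0.9, 0.98), eps=1e-9)
# scheduler = NoamLR(optimizer, d_model=256, warmup_steps=100000)
```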
## Citation

Cite their great papers!

```
@inproceedings{fujita2019endtoend2,
  title={End-to-End Neural Speaker Diarization with Permutation-Free Objectives},
  author={Fujita, Yusuke and Kanda, Naoyuki and Horiguchi, Shota and Nagamatsu, Kenji and Watanabe, Shinji},
  booktitle={INTERSPEECH},
  pages={4300--4304},
  year={2019},
}
```

```
@inproceedings{fujita2019endtoend,
  title={End-to-End Neural Speaker Diarization with Self-Attention},
  author={Fujita, Yusuke and Kanda, Naoyuki and Horiguchi, Shota and Xue, Yawen and Nagamatsu, Kenji and Watanabe, Shinji},
  booktitle={IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
  pages={296--303},
  year={2019},
}
```

```
@article{fujita2020endtoend,
  title={End-to-End Neural Diarization: Reformulating Speaker Diarization as Simple Multi-label Classification},
  author={Fujita, Yusuke and Watanabe, Shinji and Horiguchi, Shota and Xue, Yawen and Nagamatsu, Kenji},
  journal={arXiv preprint arXiv:2003.02966},
  year={2020},
}
```