# TF-Codec
**Repository Path**: mirrors_microsoft/TF-Codec
## Basic Information
- **Project Name**: TF-Codec
- **Description**: Latent-Domain Predictive Neural Speech Coding
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-08-12
- **Last Updated**: 2025-08-25
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# TF-Codec: Latent-Domain Predictive Neural Speech Coding
Official implementation of the non-predictive version of the paper [Latent-Domain Predictive Neural Speech Coding](https://arxiv.org/abs/2207.08363).
## Prerequisites
- Python 3.10 with conda ([get Conda](https://www.anaconda.com/))
- CUDA 12.5 (other versions may also work; make sure your CUDA version matches your PyTorch build)
- PyTorch 2.5 (we have tested PyTorch 2.5; other versions may also work)
- Environment
```bash
conda create -n $YOUR_PY_ENV_NAME python=3.10
conda activate $YOUR_PY_ENV_NAME
pip install -r requirements.txt
```
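After installing, you can sanity-check that the PyTorch/CUDA versions listed above are actually what the environment resolved to. This is a quick check we suggest, not a script from the repo; it degrades gracefully if torch is not yet installed:

```python
# Report the installed PyTorch version and its bundled CUDA version,
# or a fallback message if torch is not installed.
import importlib.util

def torch_report():
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch
    cuda = torch.version.cuda or "cpu-only build"
    return f"torch {torch.__version__}, CUDA {cuda}"

print(torch_report())
```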
## Pretrained models
Download [our pretrained models](https://1drv.ms/f/c/5fdaec1d5376d89a/EnLDJvj4S7JBscPWEKcEhcABZijlncQoX6K_Kdajt2IAQg?e=eVOMTi) and put them into the ./checkpoints folder. Both the generator and discriminator weights are saved in the pretrained checkpoint.
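Since the checkpoint bundles both generator and discriminator weights, inference typically needs only the generator subset. The sketch below shows one common way to filter a flat state dict by key prefix; the prefix names (`generator.`, `discriminator.`) are assumptions for illustration, so inspect the actual checkpoint keys (e.g. after `torch.load(path)`) to find the repo's real layout:

```python
# Sketch: select one submodule's weights from a combined state dict.
# Prefix names are hypothetical; check the real checkpoint keys first.

def select_submodule(state_dict, prefix):
    """Return entries whose keys start with `prefix.`, prefix stripped."""
    prefix_dot = prefix + "."
    return {
        key[len(prefix_dot):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix_dot)
    }

# Toy example with a flat state dict standing in for the loaded checkpoint:
ckpt = {
    "generator.enc.weight": 1,
    "generator.dec.weight": 2,
    "discriminator.conv.weight": 3,
}
gen = select_submodule(ckpt, "generator")  # only generator entries remain
```

The same helper works for the discriminator prefix if you need to resume adversarial finetuning.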
## Training
Put your training and validation data (Multilingual_train.mdb and Multilingual_val.mdb, in LMDB format) in the ./training_data folder.
Stage-1 without adversarial training:
```bash
python multiprocess_caller.py --nproc_per_node=4 --nnodes=1 --num_workers=2 --train_data_dir=training_data/Multilingual_train.mdb --val_data_dir=training_data/Multilingual_val.mdb --train_dir=job_tfcodec_stage1 --config=configs/tfcodec_config_train_stage1.yaml
```
Stage-2 finetuning from stage-1 checkpoints (./checkpoints/model_stage1.ckpt) with adversarial training:
```bash
python multiprocess_caller.py --nproc_per_node=4 --nnodes=1 --num_workers=2 --train_data_dir=training_data/Multilingual_train.mdb --val_data_dir=training_data/Multilingual_val.mdb --train_dir=job_tfcodec_stage2 --config=configs/tfcodec_config_train_stage2.yaml --checkpoint_path=checkpoints/model_stage1.ckpt
```
## Testing
Example command for testing a pretrained model:
```bash
python inf.py --audio_path= --model_path=checkpoints/tfcodec_path/tfcodec_6k_514000.ckpt --config_path=configs/tfcodec_config_6k.yaml --output_path=
```