# LargeBatchCTR

**Repository Path**: ByteDance/LargeBatchCTR

## Basic Information

- **Project Name**: LargeBatchCTR
- **Description**: Large batch training of CTR models based on DeepCTR with CowClip.
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2022-08-26
- **Last Updated**: 2026-01-02

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Large Batch Training for CTR Prediction (CowClip)

LargeBatchCTR aims to train CTR prediction models with large batch (~128k). The framework is based on [DeepCTR](https://github.com/shenweichen/DeepCTR). You can run the code on a V100 GPU to feel the fast training speed.

Adaptive Column-wise Clipping (CowClip) method from paper "CowClip: Reducing CTR Prediction Model Training
Time from 12 hours to 10 minutes on 1 GPU" is implemented in this repo.

## Get Started

First, download dataset to the data folder. Use `data_utils.py` to preprocess the data for training.

```sh
python data_utils.py --dataset criteo_kaggle --split rand
```

Then, use `train.py` to train the network.

```sh
# Criteo (baseline)
CUDA_VISIBLE_DEVICES=0 python train.py --dataset criteo_kaggle --model DeepFM
# Avazu (baseline)
CUDA_VISIBLE_DEVICES=0 python train.py --dataset avazu --model DeepFM
```

For large batch training with CowClip, do as follows:

```sh
# Criteo (8K)
CUDA_VISIBLE_DEVICES=0 python train.py --dataset criteo_kaggle --model DeepFM --lr_embed 1e-4 --warmup 1 --init_stddev 1e-2 --clip 1 --bound 1e-5 --bs 8192 --l2 8e-05 --lr 22.6274e-4
# Criteo (128K)
CUDA_VISIBLE_DEVICES=0 python train.py --dataset criteo_kaggle --model DeepFM --lr_embed 1e-4 --warmup 1 --init_stddev 1e-2 --clip 1 --bound 1e-5 --bs 131072 --l2 128e-05 --lr 90.5096e-4
# Avazu (64K)
CUDA_VISIBLE_DEVICES=0 python train.py --dataset avazu --model DeepFM --lr_embed 1e-4 --warmup 1 --init_stddev 1e-2 --clip 1 --bound 1e-4 --bs 65536 --l2 64e-05 --lr 8e-4
```

## CowClip Quick Look

![CowClip Algorithm Quick Look](./assets/cowclip.png)

## Dataset List

- [Criteo Kaggle](https://labs.criteo.com/2014/02/kaggle-display-advertising-challenge-dataset): download `train.txt` in `data/criteo_kaggle/`
- [Avazu](https://www.kaggle.com/c/avazu-ctr-prediction): download `train` in `data/avazu/`

## Hyperparameters

The meaning of hyperparameters in the command line is as follows:

| params        | name                                        |
| ------------- | ------------------------------------------- |
| --bs          | batch size                                  |
| --lr_embed    | learning rate for the embedding layer       |
| --lr          | learning rate for the dense weights         |
| --l2          | L2-regularization weight λ                  |
| --clip        | CowClip coefficient r                       |
| --bound       | CowClip bound ζ                             |
| --warmup      | number of epochs to warmup on dense weights |
| --init_stddev | initialization weight standard deviation    |

The hyperparameters neet to be scaled are listed as follows. For Criteo dataset:

| bs   | lr       | l2     |   ζ   | DeepFM AUC(%) | Time(min) |
| :--- | :------- | :----- | :---: | :-----------: | :-------: |
| 1K   | 8e-4     | 1e-5   | 1e-5  |     80.86     |    768    |
| 2K   | 11.31e-4 | 2e-5   | 1e-5  |     80.93     |    390    |
| 4K   | 16e-4    | 4e-5   | 1e-5  |     80.97     |    204    |
| 8K   | 22.62e-4 | 8e-5   | 1e-5  |     80.97     |    102    |
| 16K  | 32e-4    | 16e-5  | 1e-5  |     80.94     |    48     |
| 32K  | 45.25e-4 | 32e-5  | 1e-5  |     80.95     |    27     |
| 64K  | 64e-4    | 64e-5  | 1e-5  |     80.96     |    15     |
| 128K | 90.50e-4 | 128e-5 | 1e-5  |     80.90     |     9     |

For Avazu dataset:

| bs   | lr      | l2    |   ζ   | DeepFM AUC(%) | Time(min) |
| :--- | :------ | :---- | :---: | :-----------: | :-------: |
| 1K   | 1e-4    | 1e-5  | 1e-3  |     78.83     |    210    |
| 2K   | 1.41e-4 | 2e-5  | 1e-3  |     78.82     |    108    |
| 4K   | 2e-4    | 4e-5  | 1e-4  |     78.90     |    54     |
| 8K   | 2.83e-4 | 8e-5  | 1e-4  |     79.06     |    30     |
| 16K  | 4e-4    | 16e-5 | 1e-4  |     79.01     |    17     |
| 32K  | 5.66e-4 | 32e-5 | 1e-4  |     78.82     |    10     |
| 64K  | 8e-4    | 64e-5 | 1e-4  |     78.82     |    6.7    |
| 128K | 16e-4   | 96e-5 | 1e-4  |     78.80     |    4.8    |

## Model List

|        Model         | Paper                                                                                                                                              |
| :------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------- |
|     Wide & Deep      | [DLRS 2016][Wide & Deep Learning for Recommender Systems](https://arxiv.org/pdf/1606.07792.pdf)                                                    |
|        DeepFM        | [IJCAI 2017][DeepFM: A Factorization-Machine based Neural Network for CTR Prediction](http://www.ijcai.org/proceedings/2017/0239.pdf)              |
| Deep & Cross Network | [ADKDD 2017][Deep & Cross Network for Ad Click Predictions](https://arxiv.org/abs/1708.05123)                                                      |
|        DCN V2        | [arxiv 2020][DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems](https://arxiv.org/abs/2008.13535) |

## Requirements

Tensorflow 2.4.0  
Tensorflow-Addons

```sh
pip install -r requirements.txt
```

## Citation

```bibtex
@article{zheng2022cowclip,
  title={{CowClip}: Reducing {CTR} Prediction Model Training Time from 12 hours to 10 minutes on 1 {GPU}},
  author={Zangwei Zheng, Pengtai Xu, Xuan Zou, Da Tang, Zhen Li, Chenguang Xi, Peng Wu, Leqi Zou, Yijie Zhu, Ming Chen, Xiangzhuo Ding, Fuzhao Xue, Ziheng Qing, Youlong Cheng, Yang You},
  journal={arXiv},
  volume={abs/2204.06240},
  year={2022}
}
```