# Adapter-BERT

## Introduction

This repository contains a version of BERT that can be trained using adapters. Our ICML 2019 paper contains a full description of this technique: [Parameter-Efficient Transfer Learning for NLP](http://proceedings.mlr.press/v97/houlsby19a.html).

Adapters allow one to train a model on new tasks while adjusting only a few parameters per task. This technique yields compact models that share many parameters across tasks, whilst performing comparably to fine-tuning the entire model independently for every task. (A minimal sketch of the adapter module appears at the end of this README.)

The code here is forked from the [original BERT repo](https://github.com/google-research/bert). It provides our version of BERT with adapters, and the capability to train it on the [GLUE tasks](https://gluebenchmark.com/). For additional details on BERT, and support for additional tasks, see the original repo.

## Tuning BERT with Adapters

The following command provides an example of tuning with adapters on GLUE. Fine-tuning may be run on a GPU with at least 12GB of RAM, or on a Cloud TPU. The same constraints apply as for full fine-tuning of BERT. For additional details, and instructions on downloading a pre-trained checkpoint and the GLUE tasks, see [https://github.com/google-research/bert](https://github.com/google-research/bert).

```shell
export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
export GLUE_DIR=/path/to/glue

python run_classifier.py \
  --task_name=MRPC \
  --do_train=true \
  --do_eval=true \
  --data_dir=$GLUE_DIR/MRPC \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=32 \
  --learning_rate=3e-4 \
  --num_train_epochs=5.0 \
  --output_dir=/tmp/adapter_bert_mrpc/
```

You should see an output like this:

```
***** Eval results *****
  eval_accuracy = 0.85784316
  eval_loss = 0.48347527
  global_step = 573
  loss = 0.48347527
```

This means that the Dev set accuracy was 85.78%. Small sets like MRPC have a high variance in Dev set accuracy, even when starting from the same pre-training checkpoint, so results may deviate from this figure by up to 2%.

## Citation

Please use the following citation for this work:

```
@inproceedings{houlsby2019parameter,
  title     = {Parameter-Efficient Transfer Learning for {NLP}},
  author    = {Houlsby, Neil and Giurgiu, Andrei and Jastrzebski, Stanislaw and Morrone, Bruna and De Laroussilhe, Quentin and Gesmundo, Andrea and Attariyan, Mona and Gelly, Sylvain},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  year      = {2019},
}
```

The paper is also available on [arXiv](https://arxiv.org/abs/1902.00751).

## Disclaimer

This is not an official Google product.

## Contact information

For personal communication, please contact Neil Houlsby (neilhoulsby@google.com).
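
## Adapter module sketch

For readers who want a concrete picture of the technique before reading the code, below is a minimal NumPy sketch of the bottleneck adapter described in the paper: a feed-forward down-projection, a nonlinearity, an up-projection, and a residual (skip) connection, initialized near zero so each adapter starts close to the identity. This is an illustration, not the repository's TensorFlow implementation; the names `AdapterLayer` and `bottleneck_size` are chosen here for clarity and do not come from this codebase.

```python
import numpy as np

def gelu(x):
    # Gaussian Error Linear Unit, the nonlinearity used in BERT (tanh approximation).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

class AdapterLayer:
    """Bottleneck adapter (Houlsby et al., 2019): down-project, nonlinearity,
    up-project, plus a skip connection.

    Weights are initialized near zero so the adapter is close to an identity
    function at the start of training; for a new task, only the adapter
    weights (plus layer norms and the task head) are updated.
    """

    def __init__(self, hidden_size, bottleneck_size, init_scale=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        self.w_down = rng.normal(0.0, init_scale, (hidden_size, bottleneck_size))
        self.b_down = np.zeros(bottleneck_size)
        self.w_up = rng.normal(0.0, init_scale, (bottleneck_size, hidden_size))
        self.b_up = np.zeros(hidden_size)

    def __call__(self, x):
        # x: [batch, seq_len, hidden_size] activations from a transformer sublayer.
        h = gelu(x @ self.w_down + self.b_down)   # project down to the bottleneck
        return x + (h @ self.w_up + self.b_up)    # project back up, add residual

# Example: BERT-base hidden size 768 with a bottleneck of 64.
adapter = AdapterLayer(hidden_size=768, bottleneck_size=64)
out = adapter(np.zeros((2, 128, 768)))
print(out.shape)  # (2, 128, 768)
```

In the paper, one such adapter is inserted after each of the two sublayers (self-attention and feed-forward) of every transformer layer. Each adapter adds roughly `2 * hidden_size * bottleneck_size` parameters, which is why the per-task overhead is small compared with fine-tuning the full model.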