# deepcluster

**Repository Path**: facebookresearch/deepcluster

## Basic Information

- **Project Name**: deepcluster
- **Description**: Deep Clustering for Unsupervised Learning of Visual Features
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2023-07-23
- **Last Updated**: 2023-07-23

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Deep Clustering for Unsupervised Learning of Visual Features

## News
We release [paper](https://arxiv.org/abs/2006.09882) and [code](https://github.com/facebookresearch/swav) for SwAV, our new self-supervised method.
SwAV pushes self-supervised learning to only 1.2% away from supervised learning on ImageNet with a ResNet-50!
It combines online clustering with a multi-crop data augmentation.

We also present DeepCluster-v2, which is an improved version of DeepCluster (ResNet-50, better data augmentation, cosine learning rate schedule, MLP projection head, use of centroids, ...).
Check out [DeepCluster-v2 code](https://github.com/facebookresearch/swav/blob/master/main_deepclusterv2.py).

## DeepCluster
This code implements the unsupervised training of convolutional neural networks, or convnets, as described in the paper [Deep Clustering for Unsupervised Learning of Visual Features](https://arxiv.org/abs/1807.05520).

Moreover, we provide the evaluation protocol codes we used in the paper:
* Pascal VOC classification
* Linear classification on activations
* Instance-level image retrieval

Finally, this code also includes a visualisation module that allows to assess visually the quality of the learned features.

## Requirements

- a Python installation version 2.7
- the SciPy and scikit-learn packages
- a PyTorch install version 0.1.8 ([pytorch.org](http://pytorch.org))
- CUDA 8.0
- a Faiss install ([Faiss](https://github.com/facebookresearch/faiss))
- The ImageNet dataset (which can be automatically downloaded by recent version of [torchvision](https://pytorch.org/docs/stable/torchvision/datasets.html#imagenet))

## Pre-trained models
We provide pre-trained models with AlexNet and VGG-16 architectures, available for download.
* The models in Caffe format expect BGR inputs that range in [0, 255]. You do not need to subtract the per-color-channel mean image since the preprocessing of the data is already included in our released models.
* The models in PyTorch format expect RGB inputs that range in [0, 1]. You should preprocessed your data before passing them to the released models by normalizing them: ```mean_rgb = [0.485, 0.456, 0.406]```; ```std_rgb = [0.229, 0.224, 0.225] ```
Note that in all our released models, sobel filters are computed within the models as two convolutional layers (greyscale + sobel filters).

You can download all variants by running 
```
$ ./download_model.sh
```
This will fetch the models into `${HOME}/deepcluster_models` by default.
You can change that path in the environment variable.
Direct download links are provided here:
* [AlexNet-PyTorch](https://dl.fbaipublicfiles.com/deepcluster/alexnet/checkpoint.pth.tar)
* [AlexNet-prototxt](https://dl.fbaipublicfiles.com/deepcluster/alexnet/model.prototxt) + [AlexNet-caffemodel](https://dl.fbaipublicfiles.com/deepcluster/alexnet/model.caffemodel)
* [VGG16-PyTorch](https://dl.fbaipublicfiles.com/deepcluster/vgg16/checkpoint.pth.tar)
* [VGG16-prototxt](https://dl.fbaipublicfiles.com/deepcluster/vgg16/model.prototxt) + [VGG16-caffemodel](https://dl.fbaipublicfiles.com/deepcluster/vgg16/model.caffemodel)

We also provide the last epoch cluster assignments for these models. After downloading, open the file with Python 2:
```
import pickle
with open("./alexnet_cluster_assignment.pickle", "rb") as f:
    b = pickle.load(f)
```
If you're a Python 3 user, specify ```encoding='latin1'``` in the load fonction.
Each file is a list of (image path, cluster_index) tuples.
* [AlexNet-clusters](https://dl.fbaipublicfiles.com/deepcluster/alexnet/alexnet_cluster_assignment.pickle)
* [VGG16-clusters](https://dl.fbaipublicfiles.com/deepcluster/vgg16/vgg16_cluster_assignment.pickle)

Finally, we release the features extracted with DeepCluster model for ImageNet dataset.
These features are in dimension 4096 and correspond to a forward on the model up to the penultimate convolutional layer (just before last ReLU).
In you plan to cluster the features, don't forget to normalize and reduce/whiten them.
* [AlexNet-imnetfeatures](https://dl.fbaipublicfiles.com/deepcluster/alexnet/alexnet_features.pkl)
* [VGG16-imnetfeatures](https://dl.fbaipublicfiles.com/deepcluster/vgg16/vgg16_features.pkl)

## Running the unsupervised training

Unsupervised training can be launched by running:
```
$ ./main.sh
```
Please provide the path to the data folder:
```
DIR=/datasets01/imagenet_full_size/061417/train
```
To train an AlexNet network, specify `ARCH=alexnet` whereas to train a VGG-16 convnet use `ARCH=vgg16`.

You can also specify where you want to save the clustering logs and checkpoints using:
```
EXP=exp
```

During training, models are saved every other n iterations (set using the `--checkpoints` flag), and can be found in for instance in `${EXP}/checkpoints/checkpoint_0.pth.tar`.
A log of the assignments in the clusters at each epoch can be found in the pickle file `${EXP}/clusters`.


Full documentation of the unsupervised training code `main.py`:
```
usage: main.py [-h] [--arch ARCH] [--sobel] [--clustering {Kmeans,PIC}]
               [--nmb_cluster NMB_CLUSTER] [--lr LR] [--wd WD]
               [--reassign REASSIGN] [--workers WORKERS] [--epochs EPOCHS]
               [--start_epoch START_EPOCH] [--batch BATCH]
               [--momentum MOMENTUM] [--resume PATH]
               [--checkpoints CHECKPOINTS] [--seed SEED] [--exp EXP]
               [--verbose]
               DIR

PyTorch Implementation of DeepCluster

positional arguments:
  DIR                   path to dataset

optional arguments:
  -h, --help            show this help message and exit
  --arch ARCH, -a ARCH  CNN architecture (default: alexnet)
  --sobel               Sobel filtering
  --clustering {Kmeans,PIC}
                        clustering algorithm (default: Kmeans)
  --nmb_cluster NMB_CLUSTER, --k NMB_CLUSTER
                        number of cluster for k-means (default: 10000)
  --lr LR               learning rate (default: 0.05)
  --wd WD               weight decay pow (default: -5)
  --reassign REASSIGN   how many epochs of training between two consecutive
                        reassignments of clusters (default: 1)
  --workers WORKERS     number of data loading workers (default: 4)
  --epochs EPOCHS       number of total epochs to run (default: 200)
  --start_epoch START_EPOCH
                        manual epoch number (useful on restarts) (default: 0)
  --batch BATCH         mini-batch size (default: 256)
  --momentum MOMENTUM   momentum (default: 0.9)
  --resume PATH         path to checkpoint (default: None)
  --checkpoints CHECKPOINTS
                        how many iterations between two checkpoints (default:
                        25000)
  --seed SEED           random seed (default: 31)
  --exp EXP             path to exp folder
  --verbose             chatty
```


## Evaluation protocols

### Pascal VOC

To run the classification task with fine-tuning launch:
```
./eval_voc_classif_all.sh
```
and with no finetuning:
```
./eval_voc_classif_fc6_8.sh
```

Both these scripts download [this code](https://github.com/philkr/voc-classification).
You need to download the [VOC 2007 dataset](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/). Then, specify in both `./eval_voc_classif_all.sh` and `./eval_voc_classif_fc6_8.sh` scripts the path `CAFFE` to point to the caffe branch, and `VOC` to point to the Pascal VOC directory.
Indicate in `PROTO` and `MODEL` respectively the path to the prototxt file of the model and the path to the model weights of the model to evaluate.
The flag `--train-from` allows to indicate the separation between the frozen and to-train layers.

We implemented [voc classification](https://github.com/facebookresearch/deepcluster/blob/master/eval_voc_classif.py) with PyTorch.

Erratum: When training the MLP only (fc6-8), the parameters of scaling of the batch-norm layers in the whole network are trained. 
With freezing these parameters we get 70.4 mAP.

### Linear classification on activations

You can run these transfer tasks using:
```
$ ./eval_linear.sh
```

You need to specify the path to the supervised data (ImageNet or Places):
```
DATA=/datasets01/imagenet_full_size/061417/
```
the path of your model:
```
MODEL=/private/home/mathilde/deepcluster/checkpoint.pth.tar
```
and on top of which convolutional layer to train the classifier:
```
CONV=3
```

You can specify where you want to save the output of this experiment (checkpoints and best models) with
```
EXP=exp
```

Full documentation for this task:
```
usage: eval_linear.py [-h] [--data DATA] [--model MODEL] [--conv {1,2,3,4,5}]
                      [--tencrops] [--exp EXP] [--workers WORKERS]
                      [--epochs EPOCHS] [--batch_size BATCH_SIZE] [--lr LR]
                      [--momentum MOMENTUM] [--weight_decay WEIGHT_DECAY]
                      [--seed SEED] [--verbose]

Train linear classifier on top of frozen convolutional layers of an AlexNet.

optional arguments:
  -h, --help            show this help message and exit
  --data DATA           path to dataset
  --model MODEL         path to model
  --conv {1,2,3,4,5}    on top of which convolutional layer train logistic
                        regression
  --tencrops            validation accuracy averaged over 10 crops
  --exp EXP             exp folder
  --workers WORKERS     number of data loading workers (default: 4)
  --epochs EPOCHS       number of total epochs to run (default: 90)
  --batch_size BATCH_SIZE
                        mini-batch size (default: 256)
  --lr LR               learning rate
  --momentum MOMENTUM   momentum (default: 0.9)
  --weight_decay WEIGHT_DECAY, --wd WEIGHT_DECAY
                        weight decay pow (default: -4)
  --seed SEED           random seed
  --verbose             chatty
```

### Instance-level image retrieval

You can run the instance-level image retrieval transfer task using:
```
./eval_retrieval.sh
```

## Visualisation

We provide two standard visualisation methods presented in our paper.

### Filter visualisation with gradient ascent

First, it is posible to learn an input image that maximizes the activation of a given filter. We follow the process
described by [Yosinki et al.](https://arxiv.org/abs/1506.06579) with a cross entropy function between the target
filter and the other filters in the same layer.
From the visu folder you can run
```
./gradient_ascent.sh
```
You will need to specify the model path ```MODEL```, the architecture of your model ```ARCH```, the path of the folder in which you want to save the synthetic images ```EXP``` and the convolutional layer to consider ```CONV```.

Full documentation:
```
usage: gradient_ascent.py [-h] [--model MODEL] [--arch {alexnet,vgg16}]
                          [--conv CONV] [--exp EXP] [--lr LR] [--wd WD]
                          [--sig SIG] [--step STEP] [--niter NITER]
                          [--idim IDIM]

Gradient ascent visualisation

optional arguments:
  -h, --help            show this help message and exit
  --model MODEL         Model
  --arch {alexnet,vgg16}
                        arch
  --conv CONV           convolutional layer
  --exp EXP             path to res
  --lr LR               learning rate (default: 3)
  --wd WD               weight decay (default: 10^-5)
  --sig SIG             gaussian blur (default: 0.3)
  --step STEP           number of iter between gaussian blurs (default: 5)
  --niter NITER         total number of iterations (default: 1000)
  --idim IDIM           size of input image (default: 224)
```
I recommand you play with the hyper-parameters to find a regime where the visualisations are good.
For example with the pre-trained deepcluster AlexNet, for conv1 using a learning rate of 3 and 30.000 iterations works well.
For conv5, using a learning rate of 30 and 3.000 iterations gives nice images with the other parameters set to their default values.

### Top 9 maximally activated images in a dataset

Finally, we provide code to retrieve images in a dataset that maximally activate a given filter in the convnet.
From the visu folder, after having changed the fields ```MODEL```, ```EXP```, ```CONV``` and ```DATA```, run
```
./activ-retrieval.sh
```

## DeeperCluster

We have proposed another unsupervised feature learning paper at ICCV 2019.
We have shown that unsupervised learning can be used to pre-train convnets, leading to a boost in performance on ImageNet classification.
We achieve that by scaling DeepCluster to 96M images and mixing it with RotNet self-supervision.
Check out the [paper](https://arxiv.org/abs/1905.01278) and [code](https://github.com/facebookresearch/DeeperCluster).

## License

You may find out more about the license [here](https://github.com/facebookresearch/deepcluster/blob/master/LICENSE).

## Reference

If you use this code, please cite the following paper:

Mathilde Caron, Piotr Bojanowski, Armand Joulin, and Matthijs Douze. "Deep Clustering for Unsupervised Learning of Visual Features." Proc. ECCV (2018).

```
@InProceedings{caron2018deep,
  title={Deep Clustering for Unsupervised Learning of Visual Features},
  author={Caron, Mathilde and Bojanowski, Piotr and Joulin, Armand and Douze, Matthijs},
  booktitle={European Conference on Computer Vision},
  year={2018},
}
```