# CMC

Official implementation:

- CMC: Contrastive Multiview Coding ([Paper](http://arxiv.org/abs/1906.05849))

Unofficial implementation:

- MoCo: Momentum Contrast for Unsupervised Visual Representation Learning ([Paper](https://arxiv.org/abs/1911.05722))
- InsDis: Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination ([Paper](https://arxiv.org/abs/1805.01978))

## Citation

If you find this repo useful for your research, please consider citing the paper:

```
@article{tian2019contrastive,
  title={Contrastive Multiview Coding},
  author={Tian, Yonglong and Krishnan, Dilip and Isola, Phillip},
  journal={arXiv preprint arXiv:1906.05849},
  year={2019}
}
```

## Contrastive Multiview Coding

This repo covers the implementation of CMC (as well as Momentum Contrast and Instance Discrimination), which learns representations from multiview data in a self-supervised way. By "multiview" we mean multi-sensory data, multi-modal data, or literally data from multiple viewpoints; it is flexible what counts as a "view":

"Contrastive Multiview Coding" [Paper](http://arxiv.org/abs/1906.05849), [Project Page](http://hobbitlong.github.io/CMC/).

![Teaser Image](http://hobbitlong.github.io/CMC/CMC_files/teaser.jpg)

## Highlights

**(1) Representation quality as a function of the number of contrasted views.**

We found that the more views we train with, the better the representation of each single view.

**(2) Contrastive objective vs. predictive objective.**

We compare the contrastive objective to cross-view prediction, finding an advantage for the contrastive approach.

**(3) Unsupervised vs. supervised.**

Several ResNets trained with our **unsupervised** CMC objective surpass a **supervised** AlexNet on ImageNet classification (e.g., 68.4% vs. 59.3%). For the first time on ImageNet classification, unsupervised methods surpass the classic supervised AlexNet proposed in 2012 (CPC++ and AMDIM achieved this milestone concurrently).

## Updates

Aug 20, 2019 - ResNets on ImageNet have been added.

Nov 26, 2019 - New results updated. Implementations of **MoCo** and **InsDis** added.

Jan 18, 2020 - Weights of **InsDis** and **MoCo** added.

May 22, 2020 - ImageNet-100 list uploaded, see [`imagenet100.txt`](imagenet100.txt).

## Installation

This repo was tested with Ubuntu 16.04.5 LTS, Python 3.5, PyTorch 0.4.0, and CUDA 9.0, but it should be runnable with any recent PyTorch version (>=0.4.0).

**Note:** It seems to us that training with PyTorch versions >= 1.0 yields slightly worse results. If you observe a similar discrepancy and figure out the cause, please report it, as we are trying to fix it as well.

## Training AlexNet/ResNets with CMC on ImageNet

**Note:** For AlexNet, we split the network across the channel dimension and use each half to encode L and ab, respectively. For ResNets, we use a standard ResNet model to encode each view.

NCE flags (a sketch of how they interact follows this list):
- `--nce_k`: number of negatives to contrast against each positive. Default: 4096
- `--nce_m`: momentum for dynamically updating the memory. Default: 0.5
- `--nce_t`: temperature that modulates the score distribution. Default: 0.07 for ImageNet, 0.1 for STL-10
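For intuition, here is a minimal, self-contained sketch of how these three flags typically interact in memory-bank contrastive learning. This is not the repo's actual code; the function name, the plain-tensor memory bank, and the use of the softmax cross-entropy ("Softmax-CE") variant instead of the default NCE loss are illustrative assumptions:

```
import torch
import torch.nn.functional as F

def contrastive_step(feat, idx, memory, nce_k=4096, nce_t=0.07, nce_m=0.5):
    """Illustrative memory-bank contrastive step (not the repo's code).

    feat:   (B, D) L2-normalized features from the encoder
    idx:    (B,)   dataset indices of the samples in this batch
    memory: (N, D) L2-normalized memory bank, one slot per dataset sample
    """
    B, _ = feat.shape
    N = memory.size(0)

    # For each sample, gather its own (positive) slot plus nce_k
    # randomly sampled negative slots from the memory bank.
    neg_idx = torch.randint(0, N, (B, nce_k), device=feat.device)
    pos = memory[idx].unsqueeze(1)             # (B, 1, D)
    neg = memory[neg_idx]                      # (B, nce_k, D)
    candidates = torch.cat([pos, neg], dim=1)  # (B, 1 + nce_k, D)

    # Temperature-scaled similarities; slot 0 holds the positive, so the
    # target label is 0 for every sample ("Softmax-CE" variant).
    logits = torch.bmm(candidates, feat.unsqueeze(2)).squeeze(2) / nce_t
    labels = torch.zeros(B, dtype=torch.long, device=feat.device)
    loss = F.cross_entropy(logits, labels)

    # Momentum update of the memory bank; no gradient flows into it.
    with torch.no_grad():
        updated = nce_m * memory[idx] + (1.0 - nce_m) * feat
        memory[idx] = F.normalize(updated, dim=1)

    return loss
```

A larger `--nce_k` makes the softmax over candidates harder (more negatives per positive), while `--nce_m` controls how slowly each memory slot drifts toward the latest feature of its sample.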
Path flags:
- `--data_folder`: specify the ImageNet data folder.
- `--model_path`: specify the path to save the model.
- `--tb_path`: specify where to save tensorboard monitoring events.

Model flag:
- `--model`: specify which model to use, including *alexnet*, *resnets18*, *resnets50*, and *resnets101*

An example command line for training CMC (default: `AlexNet` on a single GPU):

```
CUDA_VISIBLE_DEVICES=0 python train_CMC.py --batch_size 256 --num_workers 36 \
 --data_folder /path/to/data --model_path /path/to/save --tb_path /path/to/tensorboard
```

Training CMC with ResNets requires at least 4 GPUs; the command for `resnet50v1` looks like:

```
CUDA_VISIBLE_DEVICES=0,1,2,3 python train_CMC.py --model resnet50v1 --batch_size 128 --num_workers 24 \
 --data_folder path/to/data \
 --model_path path/to/save \
 --tb_path path/to/tensorboard
```

To enable mixed precision training, simply append the flag `--amp`. Note, however, that this is likely to harm downstream classification; measured on the ImageNet100 subset, the gap is about 0.5-1%.

By default, the training scripts use L and ab as the two views to contrast. You can switch to `YCbCr` by specifying `--view YCbCr`, which yields better results (about 0.5-1%). If you want to use other color spaces as different views, follow the line [here](https://github.com/HobbitLong/CMC/blob/master/train_CMC.py#L146); other color transfer functions are already available in `dataset.py`.

## Training Linear Classifier

Path flags:
- `--data_folder`: specify the ImageNet data folder. Should be the same as above.
- `--save_path`: specify the path to save the linear classifier.
- `--tb_path`: specify where to save tensorboard events monitoring linear classifier training.

The model flag `--model` works as above and should be specified.

Specify the checkpoint that you want to evaluate with the `--model_path` flag; this path should point directly to the `.pth` file.

This repo provides 3 ways to train the linear classifier: *single GPU*, *data parallel*, and *distributed data parallel*. (A minimal sketch of the underlying frozen-backbone protocol appears after the Pretrained Models notes below.)

An example command line for evaluating, say, `./models/alexnet.pth`, looks like:

```
CUDA_VISIBLE_DEVICES=0 python LinearProbing.py --dataset imagenet \
 --data_folder /path/to/data \
 --save_path /path/to/save \
 --tb_path /path/to/tensorboard \
 --model_path ./models/alexnet.pth \
 --model alexnet --learning_rate 0.1 --layer 5
```

**Note:** When training linear classifiers on top of ResNets, it is important to use a large learning rate, e.g., 30~50. Specifically, change `--learning_rate 0.1 --layer 5` to `--learning_rate 30 --layer 6` for `resnet50v1` and `resnet50v2`, and to `--learning_rate 50 --layer 6` for `resnet50v3`.

## Pretrained Models

Pretrained weights can be found in [Dropbox](https://www.dropbox.com/sh/5k4t77mt4011gyr/AABkBvKm2bGNNut0m6bLMK84a?dl=0).

Note:
- CMC weights are trained with the `NCE` loss, the `Lab` color space, `4096` negatives, and the `amp` option. Switching to the `softmax-ce` loss, `YCbCr`, and `65536` negatives, and turning off the `amp` option, is likely to improve the results.
- `CMC_resnet50v2.pth` and `CMC_resnet50v3.pth` are trained with FastAutoAugment, which improves downstream accuracy by 0.8~1%. I will update the weights without FastAutoAugment once they are available.
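As referenced in the Training Linear Classifier section above, here is a minimal sketch of the frozen-backbone linear evaluation protocol. The stand-in encoder, the feature dimension, and the `probe_step` helper are illustrative assumptions; the repo's actual logic lives in `LinearProbing.py`:

```
import torch
import torch.nn as nn

# Stand-in encoder producing (B, 64) features; in practice the pretrained
# backbone is restored from the .pth checkpoint passed via --model_path.
encoder = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

# Freeze the backbone: only the linear classifier receives gradients.
encoder.eval()
for p in encoder.parameters():
    p.requires_grad = False

classifier = nn.Linear(64, 1000)  # 1000 ImageNet classes
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.1, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def probe_step(images, labels):
    with torch.no_grad():          # features stay fixed
        feats = encoder(images)
    logits = classifier(feats)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Since only the single linear layer is trained, the large learning rates recommended above for ResNet features are safe here in a way they would not be for full fine-tuning.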
InsDis and MoCo are trained using the same hyperparameters as in MoCo (`epochs=200, lr=0.03, lr_decay_epochs=120,160, weight_decay=1e-4`), but with only 4 GPUs:

| | Arch | #Params(M) | Loss | #Negative | Accuracy(%) | Delta(%) |
|----------|:----:|:---:|:---:|:---:|:---:|:---:|
| InsDis | ResNet50 | 24 | NCE | 4096 | 56.5 | - |
| InsDis | ResNet50 | 24 | Softmax-CE | 4096 | 57.1 | +0.6 |
| InsDis | ResNet50 | 24 | Softmax-CE | 16384 | 58.5 | +1.4 |
| MoCo | ResNet50 | 24 | Softmax-CE | 16384 | 59.4 | +0.9 |

## Momentum Contrast and Instance Discrimination

I have implemented and tested MoCo and InsDis on an ImageNet100 subset (but the code allows one to train on full ImageNet simply by setting the flag `--dataset imagenet`).

The pre-training stage:

- For InsDis:
```
CUDA_VISIBLE_DEVICES=0,1,2,3 python train_moco_ins.py \
 --batch_size 128 --num_workers 24 --nce_k 16384 --softmax
```
- For MoCo (the `--moco` flag enables a momentum-updated key encoder; a sketch of that update appears at the end of this README):
```
CUDA_VISIBLE_DEVICES=0,1,2,3 python train_moco_ins.py \
 --batch_size 128 --num_workers 24 --nce_k 16384 --softmax --moco
```

The linear evaluation stage:

- For both InsDis and MoCo (`lr=10` is better than 30 on this subset; for full ImageNet, please switch to 30):
```
CUDA_VISIBLE_DEVICES=0 python eval_moco_ins.py --model resnet50 \
 --model_path /path/to/model --num_workers 24 --learning_rate 10
```

The comparison of `CMC` (using YCbCr), `MoCo`, and `InsDis` on my ImageNet100 subset is tabulated below:

| | Arch | #Params(M) | Loss | #Negative | Accuracy(%) |
|----------|:----:|:---:|:---:|:---:|:---:|
| InsDis | ResNet50 | 24 | NCE | 16384 | -- |
| InsDis | ResNet50 | 24 | Softmax-CE | 16384 | 69.1 |
| MoCo | ResNet50 | 24 | NCE | 16384 | -- |
| MoCo | ResNet50 | 24 | Softmax-CE | 16384 | 73.4 |
| CMC | 2xResNet50half | 12 | NCE | 4096 | -- |
| CMC | 2xResNet50half | 12 | Softmax-CE | 4096 | 75.8 |

For any questions, please contact Yonglong Tian (yonglong@mit.edu).

## Acknowledgements

Part of this code is inspired by Zhirong Wu's unsupervised learning algorithm [lemniscate](https://github.com/zhirongw/lemniscate.pytorch).
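As referenced in the MoCo pre-training command above, here is a minimal sketch of the momentum (exponential moving average) encoder update that distinguishes MoCo from InsDis. The stand-in encoder, the `momentum_update` helper, and the 0.999 momentum value are illustrative assumptions, not taken from this repo's `train_moco_ins.py`:

```
import copy
import torch
import torch.nn as nn

# Stand-in query encoder; MoCo keeps a second "key" encoder that is never
# updated by gradients, only by an exponential moving average (EMA).
encoder_q = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 64))
encoder_k = copy.deepcopy(encoder_q)
for p in encoder_k.parameters():
    p.requires_grad = False  # keys come from the EMA copy only

@torch.no_grad()
def momentum_update(m=0.999):
    # theta_k <- m * theta_k + (1 - m) * theta_q, run after each
    # optimizer step on encoder_q.
    for pq, pk in zip(encoder_q.parameters(), encoder_k.parameters()):
        pk.data.mul_(m).add_(pq.data, alpha=1.0 - m)
```

Because the key encoder drifts slowly, the negatives accumulated in MoCo's queue stay consistent with one another, which is what allows the much larger `--nce_k` (e.g., 16384) to help.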