
# COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection

Jinqi Xiao<sup>1,2</sup> · Shen Sang<sup>1</sup> · Tiancheng Zhi<sup>1</sup> · Jing Liu<sup>1</sup> · Qing Yan<sup>1</sup> · Yuqian Zhang<sup>2</sup> · Linjie Luo<sup>1</sup> · Bo Yuan<sup>2</sup>

<sup>1</sup>ByteDance Inc. · <sup>2</sup>Rutgers University

[Paper](https://arxiv.org/abs/2412.00071) · Project Page

CVPR 2025

**COAP** (COrrelation-Aware Gradient Projection) is a memory-efficient training method that reduces computational overhead without sacrificing performance. Tested on vision, language, and multimodal tasks, COAP delivers faster training and better results than existing approaches—making it an ideal choice for scaling large models efficiently.

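At a high level, low-rank projection methods of this kind keep the optimizer's moment statistics in a small rank-r subspace of each gradient matrix instead of at full size, which is where the memory savings come from. The sketch below illustrates that general idea for a single 2D weight matrix with plain Adam-style moments. It is only a schematic with hypothetical names (`low_rank_adam_step`), not COAP's actual algorithm, whose correlation-aware construction and low-overhead update of the projection matrix are described in the paper.

```python
import torch

# Schematic of generic low-rank gradient projection (NOT COAP's algorithm).
# Moments live in a rank-r subspace, so they cost O(m*r) memory instead of
# O(m*n) for an m x n weight matrix.

def low_rank_adam_step(weight, grad, state, rank=4, lr=1e-3,
                       betas=(0.9, 0.999), eps=1e-8):
    m, n = grad.shape
    if "proj" not in state:
        # Projection: top-r right singular vectors of the current gradient.
        _, _, Vh = torch.linalg.svd(grad, full_matrices=False)
        state["proj"] = Vh[:rank].T                  # (n, r)
        state["exp_avg"] = torch.zeros(m, rank)      # first moment, low-rank
        state["exp_avg_sq"] = torch.zeros(m, rank)   # second moment, low-rank
        state["step"] = 0

    P = state["proj"]
    g = grad @ P                                     # project gradient to (m, r)

    state["step"] += 1
    b1, b2 = betas
    state["exp_avg"].mul_(b1).add_(g, alpha=1 - b1)
    state["exp_avg_sq"].mul_(b2).addcmul_(g, g, value=1 - b2)

    bias1 = 1 - b1 ** state["step"]
    bias2 = 1 - b2 ** state["step"]
    update = (state["exp_avg"] / bias1) / ((state["exp_avg_sq"] / bias2).sqrt() + eps)

    # Project the update back to the full space and apply it.
    weight -= lr * (update @ P.T)


# Toy usage on a random weight matrix with stand-in gradients.
w = torch.randn(64, 32)
state = {}
for _ in range(3):
    g = torch.randn(64, 32)
    low_rank_adam_step(w, g, state)
```

In practice the projection must be refreshed periodically as training progresses, which is what COAP's `update_interval` and `reproject_factor` parameters control; recomputing it via a full SVD, as in this sketch, is the kind of overhead COAP aims to reduce.
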
*Figure: Comparison between COAP and other low-rank-based methods. The X-axis shows additional training time (lower is better). The Y-axis shows the change in quantitative metrics (e.g., FID, PPL) relative to the original optimizer (e.g., Adam, Adafactor), with higher values indicating better performance.*


*Figure: Profiling of GPU memory usage.*


## Installation

```bash
pip install -e .
```

## Usage

### Examples

We provide the three examples from our main paper (DDPM, ControlNet-SDXL, and LLaMA) for reproducibility. Please refer to [examples](./examples/README.md) for more results.

- [DDPM](examples/ddpm#readme)
- [ControlNet-SDXL](examples/controlnet_sdxl#readme)
- [LLaMA-1B and LLaMA-7B on the C4 dataset](examples/llama#readme)

### How to use COAP in your code

The main parameters for COAP are:

- `optimizer`: The optimizer provided by COAP, one of `coap_adamw`, `coap_adamw8bit`, `coap_adafactor`, or `coap_adafactor8bit`.
- `rank`: The rank of the projected matrix.
- `rank_ratio_matrix`: The compression ratio of 2D weight matrices (overrides the `rank` parameter).
- `rank_ratio_cnn`: The compression ratio of the 4D weight matrices of CNN layers.
- `update_interval`: The number of steps between updates of the projection matrix.
- `reproject_factor`: The re-projection factor.

```python
from torch.optim import AdamW

from coap_torch import CoapAdamW, CoapAdafactor

# AdamW (baseline)
optimizer = AdamW(model.parameters(), lr=learning_rate)

# CoapAdamW
optimizer = CoapAdamW(
    params=model.parameters(),
    lr=learning_rate,
    rank_ratio_matrix=2,
    rank_ratio_cnn=2,
    update_interval=32,
    reproject_factor=5,
)

# CoapAdafactor
optimizer = CoapAdafactor(
    params=model.parameters(),
    lr=learning_rate,
    rank_ratio_matrix=2,
    rank_ratio_cnn=2,
    update_interval=32,
    reproject_factor=5,
)
```

Please refer to the [DDPM](./examples/ddpm/train_unconditional.py#L564) and [ControlNet-SDXL](./examples/controlnet_sdxl/train_controlnet_sdxl.py#L1165) examples for basic usage. A more advanced use case can be found in the [LLaMA example](./examples/llama/torchrun_main.py#L301).

## BibTeX

If you find [COAP](https://arxiv.org/abs/2412.00071) useful for your research and applications, please cite it using this BibTeX:

```bibtex
@misc{xiao2025coapmemoryefficienttrainingcorrelationaware,
      title={COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection},
      author={Jinqi Xiao and Shen Sang and Tiancheng Zhi and Jing Liu and Qing Yan and Yuqian Zhang and Linjie Luo and Bo Yuan},
      year={2025},
      eprint={2412.00071},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2412.00071},
}
```

## License

Apache 2.0 License. See [LICENSE](./LICENSE) for details.