# AutoThink
**Repository Path**: ScienceOne-AI/AutoThink
## Basic Information
- **Project Name**: AutoThink
- **Description**: AutoThink is a reinforcement learning framework designed to equip R1-style language models with adaptive reasoning capabilities. Instead of always thinking or never thinking, the model learns when to engage in explicit reasoning, balancing performance and efficiency.
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-08-26
- **Last Updated**: 2025-08-26
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# AutoThink: Adaptive Reasoning in R1-Style Models
[Codebase](https://github.com/ScienceOne-AI/AutoThink) | [Hugging Face](https://huggingface.co/collections/SONGJUNTU/autothink-682624e1466651b08055b479) | [Paper](https://arxiv.org/abs/2505.10832) | [WeChat Chinese Version](https://mp.weixin.qq.com/s/qcGrNjIqU1cLSg_31wijJg)
**AutoThink** is a reinforcement learning framework designed to equip R1-style language models with **adaptive reasoning** capabilities. Instead of always thinking or never thinking, the model learns **when** to engage in explicit reasoning, balancing performance and efficiency.
This repository implements **AutoThink**, as described in our paper:
> *Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL*

---
## News
- ***[2025/05/28]*** Our work was featured on the **QbitAI** WeChat public account: [Chinese Version](https://mp.weixin.qq.com/s/qcGrNjIqU1cLSg_31wijJg)
- ***[2025/05/27]*** We apply *AutoThink* to the SOTA 7B model [Skywork-OR1-Math-7B](https://huggingface.co/Skywork/Skywork-OR1-Math-7B). *AutoThink* reduces reasoning token usage by **56%** with **less than 2% accuracy degradation**. We also updated the paper to fix minor issues and released the corresponding trained model.
- ***[2025/05/16]*** We release the [Code](https://github.com/ScienceOne-AI/AutoThink), [Models](https://huggingface.co/collections/SONGJUNTU/autothink-682624e1466651b08055b479), and [Paper](https://arxiv.org/abs/2505.10832) for *AutoThink*.
## Features
- **Minimal Prompting** with an ellipsis (`\n...\n`) to activate stochastic thinking.
- **Multi-stage RL** to stabilize, reinforce, and prune reasoning behavior.
- Integrated with the [`verl`](https://github.com/volcengine/verl) framework.
- Benchmarked on five mathematical reasoning datasets: MATH, Minerva, Olympiad, AIME24, and AMC23.

---
## Installation
First, clone the official [DeepScaleR](https://github.com/agentica-project/rllm) repository and follow its setup instructions.
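A minimal clone step might look like the following; the target directory name `deepscaler` is an assumption chosen to match the copy commands below, and the DeepScaleR code is hosted in the `agentica-project/rllm` repository:

```shell
# Clone DeepScaleR (hosted in the agentica-project/rllm repo) into a
# directory named "deepscaler" so the paths in the next step line up.
git clone https://github.com/agentica-project/rllm.git deepscaler
```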
Then, **replace the following three folders** in the original repo with ours:
```bash
cp -r code-release/verl deepscaler/
cp -r code-release/scripts deepscaler/
cp -r code-release/deepscaler deepscaler/
```
Install the environment:
```bash
# Recommend Python 3.10.
cd deepscaler
pip install -e ./verl
pip install -e .
```
The raw training data is located in `deepscaler/data/[train|test]`, along with preprocessing scripts. To convert the raw data into Parquet files for training, run:
```bash
# Output parquet files in data/*.parquet.
python scripts/data/deepscaler_dataset.py
```
---
## Different Prompt Strategies
You can control the model's reasoning behavior by modifying the `chat_template` field in `tokenizer_config.json`. Update the value with one of the following:
- **Standard Prompt** (default for Distill-R1, no changes needed):
```json
"<|Assistant|>\n"
```
- **No-Thinking Prompt** (forces minimal reasoning):
```json
"<|Assistant|>\nOkay, I think I have finished thinking.\n\n\n"
```
- **Ellipsis Prompt** (adaptive reasoning mode):
```json
"<|Assistant|>\n...\n"
```
These prompts elicit different reasoning behaviors.
Before AutoThink training, replace the default `chat_template` with the **Ellipsis Prompt**, and keep the inference prompt consistent with the one used during training.
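As a sketch of how this swap could be automated: the file name `tokenizer_config.json` and the `chat_template` key follow the Hugging Face tokenizer convention, but the helper below is hypothetical and the suffix strings are taken from the prompt variants above; a real model's template contains more than this suffix.

```python
import json
import os
import tempfile

# Assistant-turn suffixes from the prompt strategies above (assumed to
# appear verbatim inside the model's chat_template).
STANDARD_SUFFIX = "<|Assistant|>\n"
ELLIPSIS_SUFFIX = "<|Assistant|>\n...\n"

def patch_chat_template(config_path: str) -> None:
    """Swap the standard assistant suffix for the ellipsis variant in place."""
    with open(config_path) as f:
        cfg = json.load(f)
    template = cfg.get("chat_template", "")
    if ELLIPSIS_SUFFIX in template:  # already patched; keep the call idempotent
        return
    cfg["chat_template"] = template.replace(STANDARD_SUFFIX, ELLIPSIS_SUFFIX)
    with open(config_path, "w") as f:
        json.dump(cfg, f, ensure_ascii=False, indent=2)

# Demo on a throwaway config file (not a real model's tokenizer_config.json):
demo_path = os.path.join(tempfile.mkdtemp(), "tokenizer_config.json")
with open(demo_path, "w") as f:
    json.dump({"chat_template": "<|User|>{{ content }}<|Assistant|>\n"}, f)
patch_chat_template(demo_path)
with open(demo_path) as f:
    patched = json.load(f)["chat_template"]
```

Running the patch twice leaves the template unchanged, which is why the helper checks for the ellipsis suffix before rewriting.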
## Training
AutoThink training proceeds in **three stages** with different reward designs:
```bash
# Stage 1: Stabilize dual-mode reasoning
bash scripts/train_stage1.sh
# Stage 2: Reinforce accurate behavior
bash scripts/train_stage2.sh
# Stage 3: Prune redundant reasoning
bash scripts/train_stage3.sh
```
Make sure to configure your model paths and data in `scripts/train_*.sh`.
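The stage descriptions above can be summarized as a toy reward-shaping function. This is an illustrative sketch only: the function name, penalty value, and length coefficient are made up and do not reproduce the paper's actual reward designs, which live in the training scripts.

```python
# Hypothetical sketch of stage-dependent reward shaping for AutoThink;
# all constants here are illustrative, not the paper's values.
def autothink_reward(correct: bool, num_reasoning_tokens: int, stage: int) -> float:
    base = 1.0 if correct else 0.0
    if stage == 1:
        # Stage 1: stabilize dual-mode reasoning -- a plain accuracy reward
        # so neither the thinking nor the no-thinking mode collapses early.
        return base
    if stage == 2:
        # Stage 2: reinforce accurate behavior -- penalize wrong answers
        # (the -0.5 penalty is made up for illustration).
        return base if correct else -0.5
    # Stage 3: prune redundant reasoning -- charge a small per-token cost so
    # explicit thinking survives only where it improves accuracy.
    return base - 1e-4 * num_reasoning_tokens
```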
---
## Evaluation
After training, evaluate the model using:
```bash
bash scripts/eval/eval_model_1.5b.sh
```
---
## Results
AutoThink achieves favorable efficiency-accuracy trade-offs and exhibits two inference modes: a **thinking** mode with explicit reasoning and a **no-thinking** mode that answers directly. On Skywork-OR1-Math-7B, AutoThink reduces reasoning token usage by **56%** with less than **2%** accuracy degradation.
---
## Citation
```bibtex
@article{tu2025learning,
  title={Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL},
  author={Tu, Songjun and Lin, Jiahao and Zhang, Qichao and Tian, Xiangyu and Li, Linjing and Lan, Xiangyuan and Zhao, Dongbin},
  journal={arXiv preprint arXiv:2505.10832},
  year={2025}
}
```
---
## Acknowledgements
We build on the following open-source projects and thank them for their contributions to the LLM reasoning open-source community:
- [verl](https://github.com/volcengine/verl)
- [DeepScaleR](https://github.com/agentica-project/rllm)
- [ThinkPrune](https://github.com/UCSB-NLP-Chang/ThinkPrune)