# AutoThink **Repository Path**: ScienceOne-AI/AutoThink ## Basic Information - **Project Name**: AutoThink - **Description**: AutoThink is a reinforcement learning framework designed to equip R1-style language models with adaptive reasoning capabilities. Instead of always thinking or never thinking, the model learns when to engage in explicit reasoning, balancing performance and efficiency. - **Primary Language**: Python - **License**: Not specified - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2025-08-26 - **Last Updated**: 2025-08-26 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # ๐Ÿง  AutoThink: Adaptive Reasoning in R1-Style Models

๐Ÿ”— Codebase   | ๐Ÿค— Hugging Face   |   ๐Ÿ“‘ Paper   |   ๐Ÿ“– WeChat Chinese Version  

**AutoThink** is a reinforcement learning framework designed to equip R1-style language models with **adaptive reasoning** capabilities. Instead of always thinking or never thinking, the model learns **when** to engage in explicit reasoning, balancing performance and efficiency. This repository implements **AutoThink**, as described in our paper: > *Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL* ![framework1](./assets/1.png) --- ## ๐Ÿ“ฐ News - ***[2025/05/28]*** Our work was featured on the **QbitAI** WeChat public account: ๐Ÿ“– [Chinese Version](https://mp.weixin.qq.com/s/qcGrNjIqU1cLSg_31wijJg) - ***[2025/05/27]*** We apply *AutoThink* to the SOTA 7B model [Skywork-OR1-Math-7B](https://huggingface.co/Skywork/Skywork-OR1-Math-7B). *AutoThink* reduces reasoning token usage by **56%** with **less than 2% accuracy degradation**. We also updated the paper to fix minor issues and released the corresponding trained model. - ***[2025/05/16]*** We release the [Code](https://github.com/ScienceOne-AI/AutoThink), [Models](https://huggingface.co/collections/SONGJUNTU/autothink-682624e1466651b08055b479), and [Paper](https://arxiv.org/abs/2505.10832) for *AutoThink*. ## ๐Ÿš€ Features - ๐Ÿงฉ **Minimal Prompting** with ellipsis (`\n...\n`) to activate stochastic thinking. - ๐ŸŽฏ **Multi-stage RL** to stabilize, reinforce, and prune reasoning behavior. - โš™๏ธ Integrated with the [`verl`](https://github.com/volcengine/verl) framework. - ๐Ÿ“Š Benchmarked on five mathematical reasoning datasets: MATH, Minerva, Olympiad, AIME24, AMC23. ![framework2](./assets/2.png) --- ## ๐Ÿ“ฆ Installation Please clone the official [DeepScaleR](https://github.com/agentica-project/rllm) repository and follow its setup instructions: Then, **replace the following three folders** in the original repo with ours: ```bash cp -r code-release/verl deepscaler/ cp -r code-release/scripts deepscaler/ cp -r code-release/deepscaler deepscaler/ ``` Install the environment: ```bash # Recommend Python 3.10. cd deepscaler pip install -e ./verl pip install -e . ``` The raw training data is located in `deepscaler/data/[train|test]`, along with preprocessing scripts. To convert the raw data into Parquet files for training, run: ```bash # Output parquet files in data/*.parquet. python scripts/data/deepscaler_dataset.py ``` --- ## ๐Ÿ’ก Different Prompt Strategies You can control the model's reasoning behavior by modifying the `chat_template` field in `tokenizer_config.json`. Update the value with one of the following: - **Standard Prompt** (default for Distill-R1, no changes needed): ```json "<|Assistant|>\n" ``` - **No-Thinking Prompt** (forces minimal reasoning): ```json "<|Assistant|>\nOkay, I think I have finished thinking.\n\n\n" ``` - **Ellipsis Prompt** (adaptive reasoning mode): ```json "<|Assistant|>\n...\n" ``` These prompts enable different reasoning behaviors. Before AutoThink training, please replace the default `chat_template` with **Ellipsis Prompt** and keep the inference prompt consistent. ## ๐Ÿ‹๏ธ Training AutoThink training proceeds in **three stages** with different reward designs: ```bash # Stage 1: Stabilize dual-mode reasoning bash scripts/train_stage1.sh # Stage 2: Reinforce accurate behavior bash scripts/train_stage2.sh # Stage 3: Prune redundant reasoning bash scripts/train_stage3.sh ``` Make sure to configure your model paths and data in `scripts/train_*.sh`. --- ## ๐Ÿ“ˆ Evaluation After training, evaluate the model using: ```bash bash scripts/eval/eval_model_1.5b.sh ``` --- ## ๐Ÿ“Š Results AutoThink achievesefficiencyโ€“accuracy trade-offs, and exhibits two inference modes: ![results](./assets/3.png) ![results](./assets/5.png) ![modes](./assets/4.png) --- ## ๐Ÿ“„ Citation ```bibtex @article{tu2025learning, title={Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL}, author={Tu, Songjun and Lin, Jiahao and Zhang, Qichao and Tian, Xiangyu and Li, Linjing and Lan, Xiangyuan and Zhao, Dongbin}, journal={arXiv preprint arXiv:2505.10832}, year={2025} } ``` --- ## ๐Ÿ” Acknowledgements We build and reference on the following open source trunks, and thank the following sources for their contributions to the LLM-Reasoning open source community: - [verl](https://github.com/volcengine/verl) - [DeepScaleR](https://github.com/agentica-project/rllm) - [ThinkPrune](https://github.com/UCSB-NLP-Chang/ThinkPrune)