# TinyR1-32B-Preview

**Repository Path**: mirrors_Qihoo360/TinyR1-32B-Preview

## Basic Information

- **Project Name**: TinyR1-32B-Preview
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-04-09
- **Last Updated**: 2025-04-30

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README


<div id="top" align="center">

TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation
-----------------------------

<h4> |<a href="https://arxiv.org/abs/2503.04872"> 📑 Paper </a> |
<a href="https://huggingface.co/qihoo360/TinyR1-32B-Preview"> 🤗 Hugging Face </a> |
<a href="https://news.pku.edu.cn/xwzh/c82f18fbd0be407f805b8ecc1ace3bfb.htm"> 🌐 Blog </a> |
</h4>



_TinyR1 Team_

</div>


## Introduction
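We introduce our first-generation reasoning model, Tiny-R1-32B-Preview. It outperforms the 70B model DeepSeek-R1-Distill-Llama-70B on math and coding, is on par with it on science (65.0 vs. 65.2 on GPQA-Diamond), and nearly matches the full DeepSeek-R1 model in math (78.1 vs. 79.8 on AIME 2024).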

We applied supervised fine-tuning (SFT) to DeepSeek-R1-Distill-Qwen-32B in three target domains (mathematics, code, and science) using the [360-LLaMA-Factory](https://github.com/Qihoo360/360-LLaMA-Factory/) training framework, producing one domain-specific model per domain. Questions from open-source datasets served as seeds, and the responses for the math, coding, and science tasks were generated by DeepSeek-R1. We then used the Mergekit tool from the Arcee team to merge these branch models into Tiny-R1-32B-Preview, which performs well across all three domains. For more technical details, please refer to our technical report. <a href="https://arxiv.org/abs/2503.04872"><b>Paper Link</b>👁️</a>
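
To make the branch-merge idea concrete, here is a toy sketch of merging branch models that were fine-tuned from the same base. It averages each branch's parameter delta from the base (its "task vector") and adds that average back onto the base weights. This plain task-arithmetic averaging is only an illustration and is not necessarily the merge method used for Tiny-R1-32B-Preview (see the paper for the actual procedure); the helper `merge_task_vectors` and the toy weights are hypothetical.

```python
def merge_task_vectors(base, branches, scale=1.0):
    """Merge branch models that share a base model (toy task arithmetic).

    base:     dict mapping parameter name -> list of floats (base weights)
    branches: list of dicts with the same keys and lengths (fine-tuned weights)
    scale:    multiplier applied to the averaged task vector
    """
    merged = {}
    for name, base_w in base.items():
        merged_w = []
        for i, b in enumerate(base_w):
            # Task-vector entry: how far each branch moved this weight from the base
            deltas = [branch[name][i] - b for branch in branches]
            # Add the scaled average delta back onto the base weight
            merged_w.append(b + scale * sum(deltas) / len(deltas))
        merged[name] = merged_w
    return merged


# Hypothetical toy "models" with a single two-element parameter
base = {"w": [1.0, 1.0]}
math_model = {"w": [2.0, 1.0]}
code_model = {"w": [1.0, 3.0]}
science_model = {"w": [1.0, 1.0]}

merged = merge_task_vectors(base, [math_model, code_model, science_model])
print(merged["w"])
```

Note that uniform averaging scales each branch's change down by the number of branches, which more selective merge methods try to avoid.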


## Evaluation 
| Model                         | Math (AIME 2024) | Coding (LiveCodeBench) | Science (GPQA-Diamond) |
| ----------------------------- | ---------------- | ---------------------- | ---------------------- |
| DeepSeek-R1-Distill-Qwen-32B  | 72.6             | 57.2                   | 62.1                   |
| DeepSeek-R1-Distill-Llama-70B | 70.0             | 57.5                   | 65.2                   |
| DeepSeek-R1                   | 79.8             | 65.9                   | 71.5                   |
| Tiny-R1-32B-Preview (Ours)    | 78.1             | 61.6                   | 65.0                   |

All scores are reported as pass@1.
For AIME 2024 we sample 16 responses per problem, and for GPQA-Diamond 4 responses per question; in both cases we report the accuracy averaged over all samples, which gives a more stable estimate than a single sample.
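
Averaging over k samples amounts to estimating pass@1 as the mean correctness over all samples of every problem. A minimal sketch, assuming a hypothetical `results` list that holds one list of per-sample correctness flags for each problem:

```python
def mean_pass_at_1(results):
    """Estimate pass@1 (in percent) from k sampled responses per problem.

    results: list with one entry per problem; each entry is a list of
             booleans, one per sampled response (True = correct).
    """
    # Fraction of correct samples for each problem
    per_problem = [sum(samples) / len(samples) for samples in results]
    # Average over problems and convert to a percentage
    return 100.0 * sum(per_problem) / len(per_problem)


# Two problems with 4 samples each (as for GPQA-Diamond): 3/4 and 1/4 correct
results = [[True, True, True, False], [False, True, False, False]]
print(mean_pass_at_1(results))  # 50.0
```

With the same number of samples for every problem, this equals the overall fraction of correct samples.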


We merged the three branch models, each trained separately on one domain, into a single model. The table below compares the merged model with each branch model on its own domain:

| Model                              | Math (AIME 2024) | Coding (LiveCodeBench) | Science (GPQA-Diamond) |
| ---------------------------------- | ---------------- | ---------------------- | ---------------------- |
| Math-Model                         | 73.1             | -                      | -                      |
| Code-Model                         | -                | 63.4                   | -                      |
| Science-Model                      | -                | -                      | 64.5                   |
| Merged-Model (Tiny-R1-32B-Preview) | 78.1             | 61.6                   | 65.0                   |

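
Read per domain, the merged model gains on math and science relative to the corresponding branch model and gives up a little on coding. The arithmetic, using only the scores from the merge comparison table:

```python
# (branch-model score, merged-model score) per domain, from the table above
scores = {
    "Math (AIME 2024)": (73.1, 78.1),
    "Coding (LiveCodeBench)": (63.4, 61.6),
    "Science (GPQA-Diamond)": (64.5, 65.0),
}

# Difference merged minus branch, rounded to one decimal like the table
deltas = {domain: round(merged - branch, 1) for domain, (branch, merged) in scores.items()}

for domain, delta in deltas.items():
    print(f"{domain}: {delta:+.1f} points")
```
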
## Getting Started

### Branch Training
For multi-node training, first fill in `train/hostfile`. Single-node training does not need this step.

> **Note**  
>  
> - **About `hostfile`**:  
>   Each line in the `hostfile` specifies a node, formatted as `<hostname> slots=<num_slots>`, where `<hostname>` is the name of the node and `<num_slots>` is the number of GPUs available on that node. Here is an example:  
>  
>   ```plaintext
>   worker-0 slots=8  
>   worker-1 slots=8  
>   ```  
>  
>   For more details, please refer to the [DeepSpeed official documentation](https://www.deepspeed.ai/getting-started/).
>
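
To sanity-check a hostfile before launching, the format described above can be parsed in a few lines. The helper `parse_hostfile` below is hypothetical and not part of this repository; it only understands the `<hostname> slots=<num_slots>` form and, as its own convention, skips blank lines and `#` comments.

```python
def parse_hostfile(text):
    """Parse hostfile lines of the form '<hostname> slots=<n>' into {hostname: n}."""
    hosts = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        hostname, slots_field = line.split()
        key, value = slots_field.split("=")
        if key != "slots":
            raise ValueError(f"unexpected field in hostfile line: {raw!r}")
        hosts[hostname] = int(value)
    return hosts


hostfile = """worker-0 slots=8
worker-1 slots=8
"""
hosts = parse_hostfile(hostfile)
print(hosts, "total GPUs:", sum(hosts.values()))
```
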

#### Installation

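To install the required dependencies, run the following (the command uses the Tsinghua PyPI mirror; drop the `-i` and `--trusted-host` options to install from the default index):
```bash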
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple/ --trusted-host pypi.tuna.tsinghua.edu.cn
```
#### Math Model SFT

Hint: In each command below, replace the value of `BASE_MODEL` with the actual path to the base model, e.g., `/model/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B`.

```bash
BASE_MODEL="/path/to/base-model/"  # local path to DeepSeek-R1-Distill-Qwen-32B
bash train/run.sh \
  --model "$BASE_MODEL" \
  --data-id-path "data/open-r1-math-default-0223.json" \
  --output-dir "model_output/branch-math-model" \
  --model-max-length 16384 \
  --learning-rate 1e-5 \
  --lr-scheduler-type constant_with_warmup \
  --num-train-epochs 5 \
  --save-steps 200 \
  --gradient-accumulation-steps 3 \
  --template qwen \
  --packing_type "packing" 
```
#### Science Model SFT
```bash
BASE_MODEL="/path/to/base-model/"  # local path to DeepSeek-R1-Distill-Qwen-32B
bash train/run.sh \
  --model "$BASE_MODEL" \
  --data-id-path "data/OpenThoughts-science-with-wrong5k-r1,s1_science_3k-r1,s1_1k-r1" \
  --output-dir "model_output/branch-science-model" \
  --model-max-length 16384 \
  --learning-rate 1e-5 \
  --lr-scheduler-type cosine \
  --num-train-epochs 5 \
  --save-steps 200 \
  --gradient-accumulation-steps 1 \
  --packing_type "neatpacking" \
  --template qwen
```
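
The math run sets `--packing_type "packing"`, while the science and code runs use `"neatpacking"`. As we understand the underlying LLaMA-Factory options, both pack several short samples into one sequence of up to `--model-max-length` tokens, and neat packing additionally masks attention so that tokens cannot see other samples packed into the same sequence. The sketch below illustrates the two ingredients, a greedy first-fit packer and a block-diagonal causal mask; the function names are hypothetical and this is not the framework's implementation.

```python
def pack_first_fit(lengths, max_len):
    """Greedily assign samples (by token length) to bins of at most max_len tokens."""
    bins = []  # each bin is a list of sample indices
    used = []  # tokens already used in each bin
    for idx, n in enumerate(lengths):
        if n > max_len:
            raise ValueError(f"sample {idx} ({n} tokens) exceeds max_len")
        for b in range(len(bins)):
            if used[b] + n <= max_len:
                bins[b].append(idx)
                used[b] += n
                break
        else:
            # No existing bin has room: open a new one
            bins.append([idx])
            used.append(n)
    return bins


def neat_packing_mask(sample_lengths):
    """Causal mask for one packed sequence; tokens attend only within their own sample."""
    owner = [s for s, n in enumerate(sample_lengths) for _ in range(n)]
    total = len(owner)
    return [[j <= i and owner[j] == owner[i] for j in range(total)] for i in range(total)]


bins = pack_first_fit([6, 3, 5, 2], max_len=8)
print(bins)  # [[0, 3], [1, 2]]
print(neat_packing_mask([2, 1]))  # [[True, False, False], [True, True, False], [False, False, True]]
```

With plain packing the mask would typically be an ordinary causal mask over the whole sequence, so the last row above would be all `True`.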
#### Code Model SFT
```bash
BASE_MODEL="/path/to/base-model/"  # local path to DeepSeek-R1-Distill-Qwen-32B
bash train/run.sh \
  --model "$BASE_MODEL" \
  --data-id-path "data/openthoughts-16kseq-0218.json" \
  --output-dir "model_output/branch-code-model" \
  --model-max-length 16384 \
  --learning-rate 1e-5 \
  --lr-scheduler-type constant_with_warmup \
  --num-train-epochs 15 \
  --save-steps 200 \
  --gradient-accumulation-steps 3 \
  --packing_type "neatpacking" \
  --template qwen
```
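
The three runs use two learning-rate schedules with a peak rate of `1e-5`: `constant_with_warmup` (math and code) and `cosine` (science). The commands do not show the warmup length or the total number of steps, so the sketch below uses hypothetical values for both; it follows the usual Hugging Face definitions (linear warmup, then either a flat rate or a half-cosine decay to zero).

```python
import math


def constant_with_warmup(step, peak_lr, warmup_steps):
    """Linear warmup to peak_lr, then hold it constant."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr


def cosine_with_warmup(step, peak_lr, warmup_steps, total_steps):
    """Linear warmup to peak_lr, then half-cosine decay to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


peak_lr = 1e-5
warmup_steps = 100  # hypothetical: not specified in the commands above
total_steps = 1000  # hypothetical: depends on dataset size, epochs, and batch size
for step in (0, 50, 100, 550, 1000):
    print(step,
          constant_with_warmup(step, peak_lr, warmup_steps),
          cosine_with_warmup(step, peak_lr, warmup_steps, total_steps))
```
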

### Merge

#### Installation

To reproduce the merged [qihoo360/TinyR1-32B-Preview](https://huggingface.co/qihoo360/TinyR1-32B-Preview) model, first install the copy of Mergekit included in this repository:

```bash
git clone https://github.com/TinyR1-32B-Preview.git
cd TinyR1-32B-Preview/mergekit/
pip install -e .
```
If you encounter the error:

```text
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
```

you can resolve it by following these steps:

Update the package list and install the virtual environment package:

```bash
apt-get update -y
apt-get install python3-venv -y
```

Create and activate a virtual environment:

```bash
python3.10 -m venv eval
source eval/bin/activate
```

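After activating the virtual environment, reinstall the required packages (for example, rerun `pip install -e .` in the `mergekit/` directory). Isolating the environment from the global packages prevents these dependency conflicts.

Once the environment is ready, run the merge script:

```bash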
sh sh/tinyr1_merge.sh  [/path/to/math-model]  [/path/to/science-model]  [/path/to/code-model]  [/path/to/output-model-dir]
```
The following parameters are mandatory:

- `[/path/to/math-model]`: the path to the math domain model that has been fine-tuned via SFT.

- `[/path/to/science-model]`: the path to the science domain model that has been fine-tuned via SFT.

- `[/path/to/code-model]`: the path to the code domain model that has been fine-tuned via SFT.

- `[/path/to/output-model-dir]`: the directory where the merged model will be saved.
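
Because the script takes four positional paths, it is easy to pass them in the wrong order. A minimal sketch of a hypothetical Python wrapper (not part of this repository) that checks the three input directories exist and builds the command in the order the script expects:

```python
import os


def build_merge_command(math_model, science_model, code_model, output_dir,
                        script="sh/tinyr1_merge.sh"):
    """Return the merge command with the paths in the order: math, science, code, output."""
    for name, path in (("math", math_model), ("science", science_model), ("code", code_model)):
        if not os.path.isdir(path):
            raise FileNotFoundError(f"{name} model directory not found: {path}")
    return ["sh", script, math_model, science_model, code_model, output_dir]
```

For example, assuming the final weights were saved to the `--output-dir` values from the SFT commands above, `subprocess.run(build_merge_command("model_output/branch-math-model", "model_output/branch-science-model", "model_output/branch-code-model", "model_output/merged-model"), check=True)` would run the merge; the output directory name here is hypothetical.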


## Evaluation Details


We evaluate the resulting models on three categories of benchmarks: **Math Reasoning**, **Code Reasoning**, and **Scientific Reasoning**.

- Math Reasoning
  - AIME24
  - AIME25
- Scientific Reasoning
  - GPQA-Diamond
- Code Reasoning
  - LiveCodeBench (2408-2502)


### Math Reasoning

The evaluation code is modified from [Qwen2.5-Math](https://github.com/QwenLM/Qwen2.5-Math). We set the temperature to 0.6, top-p to 0.95, and max_tokens to 32768. An example for reproducing our results is provided in [math_evaluation](./math_evaluation).

The system prompt for evaluation is set to:

```text
Please reason step by step, and put your final answer within \boxed{}.
```
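
Because the prompt asks for the final answer inside `\boxed{}`, scoring starts by pulling that answer out of the response. A minimal sketch of that step with a hypothetical helper name; the Qwen2.5-Math evaluation code uses its own, more thorough parsing and answer comparison.

```python
def extract_last_boxed(text):
    """Return the content of the last \\boxed{...} in text, or None if absent.

    Braces are matched by depth counting, so nested LaTeX such as
    \\boxed{\\frac{1}{2}} is handled.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    content_start = start + len(marker)
    depth = 1
    for i in range(content_start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[content_start:i]
    return None  # unbalanced braces: no complete answer


answer = extract_last_boxed(r"So the result is \boxed{\frac{1}{2}}.")
print(answer)  # \frac{1}{2}
```
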


### Scientific Reasoning

The evaluation code is modified from [FuseO1-Preview](https://github.com/fanqiwan/FuseAI/tree/main/FuseO1-Preview). We set the temperature to 0.6 and max_tokens to 32768. An example for reproducing our results is provided in [science_evaluation](./science_evaluation).

The system prompt for evaluation is set to:

```text
You are a helpful and harmless assistant. You should think step-by-step.
```
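
GPQA-Diamond is multiple choice, so scoring reduces to extracting the chosen letter and comparing it with the gold answer. The answer format below (a final line such as `Answer: C`) is an assumption for illustration; the actual question template and parser live in the evaluation code.

```python
import re

# Assumed answer format: the response states its choice as "Answer: C" or "Answer: (C)"
ANSWER_PATTERN = re.compile(r"Answer:\s*\(?([A-D])\)?")


def extract_choice(response):
    """Return the last stated choice letter (A-D), or None if none is found."""
    matches = ANSWER_PATTERN.findall(response)
    return matches[-1] if matches else None


def accuracy(responses, gold):
    """Percent of responses whose extracted letter matches the gold letter."""
    correct = sum(extract_choice(r) == g for r, g in zip(responses, gold))
    return 100.0 * correct / len(gold)


responses = [
    "...so the reaction is endothermic.\nAnswer: (B)",
    "Answer: A ... wait, rechecking.\nAnswer: D",
]
print(accuracy(responses, ["B", "C"]))  # 50.0
```

Taking the last match means a model that revises its answer is scored on its final choice. With 4 samples per question, computing this accuracy over all sampled responses gives the averaged pass@1 described in the results section.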


### Code Reasoning

The evaluation code is modified from [FuseO1-Preview](https://github.com/fanqiwan/FuseAI/tree/main/FuseO1-Preview). We set the temperature to 0.6, top-p to 0.95, and max_tokens to 32768. An example for reproducing our results is provided in [code_lcb_evaluation](./code_lcb_evaluation).

The system prompt for evaluation is set to:

```text
A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>.
```
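
This prompt asks the model to wrap its reasoning in `<think>` tags and its final answer in `<answer>` tags. A minimal sketch of splitting a response along those tags, with a hypothetical helper name; it also tolerates a missing opening `<think>` tag (for example, if the chat template already emitted it as part of the prompt).

```python
import re

ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def split_reasoning(response):
    """Split a response into (reasoning, answer); a part is None if its tags are missing.

    If the opening <think> tag is absent, everything before </think> is treated
    as reasoning.
    """
    reasoning = None
    end = response.find("</think>")
    if end != -1:
        start = response.find("<think>", 0, end)
        begin = start + len("<think>") if start != -1 else 0
        reasoning = response[begin:end].strip()

    match = ANSWER_RE.search(response)
    answer = match.group(1).strip() if match else None
    return reasoning, answer


reasoning, answer = split_reasoning(
    "<think> Use a hash map for O(n) lookups. </think> <answer> def two_sum(nums, t): ... </answer>"
)
print(reasoning)  # Use a hash map for O(n) lookups.
print(answer)     # def two_sum(nums, t): ...
```
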


## Quickstart

```python
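from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "qihoo360/TinyR1-32B-Preview"

# Load the weights in their native dtype and spread them across available GPUs
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Raw string so LaTeX backslashes (\boxed, \frac, \int) are kept literally
prompt = r"Please reason step by step, and put your final answer within \boxed{}. Solve the integral: \[I = \int \frac{x^2}{(x+1)^3} \,dx\]"
messages = [
    {"role": "user", "content": prompt}
]
# Render the chat template and append the assistant-turn prefix
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Sampling settings match the evaluation setup (temperature 0.6, top-p 0.95);
# reasoning traces can be long, so raise max_new_tokens if the output is cut off
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=4000,
    do_sample=True,
    temperature=0.6,
    top_p=0.95
)
# Drop the prompt tokens so only the newly generated text is decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(response)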
```

## Citation
```bibtex
@misc{tinyr1proj,
      title={TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation}, 
      author={TinyR1 Team},
      year={2025},
      url={https://arxiv.org/abs/2503.04872}, 
}
```