# TinyR1-32B-Preview

**Repository Path**: mirrors_Qihoo360/TinyR1-32B-Preview

## Basic Information

- **Project Name**: TinyR1-32B-Preview
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-04-09
- **Last Updated**: 2025-04-30

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation
-----------------------------

| 📑 Paper | 🤗 Hugging Face | 🌐 Blog |

_TinyR1 Team_

## Introduction

We introduce our first-generation reasoning model, Tiny-R1-32B-Preview, which outperforms the 70B model Deepseek-R1-Distill-Llama-70B on math and coding and nearly matches the full R1 model in math (78.1 vs. 79.8 on AIME 2024).

We applied supervised fine-tuning (SFT) to Deepseek-R1-Distill-Qwen-32B across three target domains (Mathematics, Code, and Science) using the [360-LLaMA-Factory](https://github.com/Qihoo360/360-LLaMA-Factory/) training framework, producing three domain-specific models. Questions from open-source data served as seeds, and the responses for the mathematics, coding, and science tasks were generated by R1. Building on this, we used the Mergekit tool from the Arcee team to combine the three models into Tiny-R1-32B-Preview, which demonstrates strong overall performance.

For more technical details, please refer to our [technical report](https://arxiv.org/abs/2503.04872).

## Evaluation

| Model                           | Math (AIME 2024)    | Coding (LiveCodeBench)  | Science (GPQA-Diamond) |
| ------------------------------- | ------------------- | ----------------------- | ---------------------- |
| Deepseek-R1-Distill-Qwen-32B    | 72.6                | 57.2                    | 62.1                   |
| Deepseek-R1-Distill-Llama-70B   | 70.0                | 57.5                    | 65.2                   |
| Deepseek-R1                     | 79.8                | 65.9                    | 71.5                   |
| Tiny-R1-32B-Preview (Ours)      | 78.1                | 61.6                    | 65.0                   |

All scores are reported as pass@1. For AIME 2024 we sample 16 responses per question, and for GPQA-Diamond we sample 4; in both cases we report the accuracy averaged over all samples for a more stable evaluation.

We merged the three separately trained domain models into a single model. The comparison results are below.

| Model                              | Math (AIME 2024)    | Coding (LiveCodeBench)  | Science (GPQA-Diamond) |
| ---------------------------------- | ------------------- | ----------------------- | ---------------------- |
| Math-Model                         | 73.1                | -                       | -                      |
| Code-Model                         | -                   | 63.4                    | -                      |
| Science-Model                      | -                   | -                       | 64.5                   |
| Merged-Model (Tiny-R1-32B-Preview) | 78.1                | 61.6                    | 65.0                   |

## Getting Started

### Branch Train

For multi-node training, first fill in the `train/hostfile` file. This step is not required for single-node training.

> **Note**
>
> - **About `hostfile`**:
> Each line in the `hostfile` specifies one node, formatted as `<hostname> slots=<num_gpus>`, where `<hostname>` is the name of the node and `<num_gpus>` is the number of GPUs available on that node. Here is an example:
>
> ```plaintext
> worker-0 slots=8
> worker-1 slots=8
> ```
>
> For more details, please refer to the [DeepSpeed official documentation](https://www.deepspeed.ai/getting-started/).
>

#### Installation

To install the required dependencies, run:

```
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple/ --trusted-host pypi.tuna.tsinghua.edu.cn
```

#### Math Model SFT

Hint: Replace `BASE_MODEL` with the actual path to the base model, e.g., "/model/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B".

```bash
BASE_MODEL="/path/to/base-model/"
bash train/run.sh \
    --model "$BASE_MODEL" \
    --data-id-path "data/open-r1-math-default-0223.json" \
    --output-dir "model_output/branch-math-model" \
    --model-max-length 16384 \
    --learning-rate 1e-5 \
    --lr-scheduler-type constant_with_warmup \
    --num-train-epochs 5 \
    --save-steps 200 \
    --gradient-accumulation-steps 3 \
    --template qwen \
    --packing_type "packing"
```

#### Science Model SFT

```bash
BASE_MODEL="/path/to/base-model/"
bash train/run.sh \
    --model "$BASE_MODEL" \
    --data-id-path "data/OpenThoughts-science-with-wrong5k-r1,s1_science_3k-r1,s1_1k-r1" \
    --output-dir "model_output/branch-science-model" \
    --model-max-length 16384 \
    --learning-rate 1e-5 \
    --lr-scheduler-type cosine \
    --num-train-epochs 5 \
    --save-steps 200 \
    --gradient-accumulation-steps 1 \
    --packing_type "neatpacking" \
    --template qwen
```

#### Code Model SFT

```bash
BASE_MODEL="/path/to/base-model/"
bash train/run.sh \
    --model "$BASE_MODEL" \
    --data-id-path "data/openthoughts-16kseq-0218.json" \
    --output-dir "model_output/branch-code-model" \
    --model-max-length 16384 \
    --learning-rate 1e-5 \
    --lr-scheduler-type constant_with_warmup \
    --num-train-epochs 15 \
    --save-steps 200 \
    --gradient-accumulation-steps 3 \
    --packing_type "neatpacking" \
    --template qwen
```

### Merge

#### Installation

To reproduce the merged [qihoo360/TinyR1-32B-Preview](https://huggingface.co/qihoo360/TinyR1-32B-Preview) model, first install the bundled Mergekit:

```bash
git clone https://github.com/Qihoo360/TinyR1-32B-Preview.git
cd TinyR1-32B-Preview/mergekit/
pip install -e .
```

If you encounter the error:

```bash
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
```

you can resolve it with the following steps.

Update the package list and install the virtual environment package:

```bash
apt-get update -y
apt-get install python3-venv -y
```

Create and activate a virtual environment:

```bash
python3.10 -m venv eval
source eval/bin/activate
```

After activating the virtual environment, reinstall the required packages. This isolates your Python environment from the global packages and thereby prevents dependency conflicts.

Then run the merge script:

```
sh sh/tinyr1_merge.sh [/path/to/math-model] [/path/to/science-model] [/path/to/code-model] [/path/to/output-model-dir]
```

The following parameters are mandatory:

- `[/path/to/math-model]`: the path to the math domain model that has been fine-tuned via SFT.
- `[/path/to/science-model]`: the path to the science domain model that has been fine-tuned via SFT.
- `[/path/to/code-model]`: the path to the code domain model that has been fine-tuned via SFT.
- `[/path/to/output-model-dir]`: the path where the merged model will be saved.

## Evaluation

We test the resulting models on three kinds of benchmarks: **Math Reasoning**, **Code Reasoning**, and **Scientific Reasoning**.

Math Reasoning
- AIME24
- AIME25

Scientific Reasoning
- GPQA-Diamond

Code Reasoning
- LiveCodeBench (2408-2502)

### Math Reasoning

The evaluation code is modified from [Qwen2.5-Math](https://github.com/QwenLM/Qwen2.5-Math). In our evaluation, we set the temperature to 0.6, the top-p to 0.95, and max_tokens to 32768. We provide an example for reproducing our results in [math_evaluation](./math_evaluation).

The system prompt for evaluation is set to:

```sh
Please reason step by step, and put your final answer within \\boxed{{}}.
```

### Scientific Reasoning

The evaluation code is modified from [FuseO1-Preview](https://github.com/fanqiwan/FuseAI/tree/main/FuseO1-Preview). In our evaluation, we set the temperature to 0.6 and max_tokens to 32768.

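The averaged pass@1 metric used for the multi-sample benchmarks (16 samples for AIME 2024, 4 for GPQA-Diamond) can be illustrated with a minimal sketch. The function name `averaged_pass_at_1` and the toy data are hypothetical and are not taken from the evaluation scripts in this repository; the sketch assumes every question has the same number of samples, so that averaging per question and then across questions equals the overall fraction of correct samples.

```python
from statistics import mean


def averaged_pass_at_1(correct: list[list[bool]]) -> float:
    """Return pass@1 (in percent) averaged over all sampled responses.

    correct[i][j] is True if the j-th sampled response to question i
    was judged correct. (Hypothetical helper, for illustration only.)
    """
    # Accuracy of each question across its sampled responses.
    per_question = [mean(1.0 if c else 0.0 for c in samples) for samples in correct]
    # Average over questions and convert to a percentage.
    return 100.0 * mean(per_question)


# Toy data: 2 questions with 4 sampled responses each.
results = [
    [True, True, False, True],
    [False, False, True, False],
]
print(f"{averaged_pass_at_1(results):.1f}")
```

Averaging over several samples reduces the variance that a single sampled response would introduce at temperature 0.6, which matters on small benchmarks.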
We provide an example for reproducing our results in [science_evaluation](./science_evaluation).

The system prompt for evaluation is set to:

```sh
You are a helpful and harmless assistant. You should think step-by-step.
```

### Code Reasoning

The evaluation code is modified from [FuseO1-Preview](https://github.com/fanqiwan/FuseAI/tree/main/FuseO1-Preview). In our evaluation, we set the temperature to 0.6, the top-p to 0.95, and max_tokens to 32768. We provide an example for reproducing our results in [code_lcb_evaluation](./code_lcb_evaluation).

The system prompt for evaluation is set to:

```sh
A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>.
```

## Quickstart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "qihoo360/TinyR1-32B-Preview"

# Load the model and tokenizer; dtype and device placement are chosen automatically.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Raw strings keep the LaTeX backslashes intact
# (in a normal string, "\f" in "\frac" would be read as a form feed).
prompt = (
    r"Please reason step by step, and put your final answer within \boxed{}. "
    r"Solve the integral: \[I = \int \frac{x^2}{(x+1)^3} \,dx\]"
)
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=4000
)
# Strip the prompt tokens so that only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(response)
```

## Citation

```
@misc{tinyr1proj,
      title={TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation},
      author={TinyR1 Team},
      year={2025},
      url={https://arxiv.org/abs/2503.04872},
}
```