# NuminaMath-1.5-RL-Verifiable

**Repository Path**: hf-datasets/NuminaMath-1.5-RL-Verifiable

## Basic Information

- **Project Name**: NuminaMath-1.5-RL-Verifiable
- **Description**: Mirror of https://huggingface.co/datasets/nlile/NuminaMath-1.5-RL-Verifiable
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-07-15
- **Last Updated**: 2025-07-15

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

---
license: apache-2.0
task_categories:
- text-generation
- question-answering
language:
- en
tags:
- math
- post-training
- RL
- verifiable
- reasoning
pretty_name: NuminaMath 1.5 RL Verifiable
dataset_info:
  features:
  - name: problem
    dtype: string
  - name: solution
    dtype: string
  - name: answer
    dtype: string
  - name: problem_type
    dtype: string
  - name: question_type
    dtype: string
  - name: problem_is_valid
    dtype: string
  - name: solution_is_valid
    dtype: string
  - name: source
    dtype: string
  - name: synthetic
    dtype: bool
  splits:
  - name: train
    num_bytes: 188626432
    num_examples: 131063
  download_size: 85718743
  dataset_size: 188626432
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# Dataset Card for NuminaMath-1.5-RL-Verifiable

## Dataset Description

- **Homepage:** https://huggingface.co/datasets/nlile/NuminaMath-1.5-RL-Verifiable
- **Repository:** [NuminaMath-1.5-RL-Verifiable](https://huggingface.co/datasets/nlile/NuminaMath-1.5-RL-Verifiable)
- **Based on:** [NuminaMath-1.5](https://huggingface.co/datasets/AI-MO/NuminaMath-1.5)

### Dataset Summary

NuminaMath-1.5-RL-Verifiable is a curated subset of the NuminaMath-1.5 dataset, specifically filtered to support reinforcement learning applications requiring verifiable outcomes.
This collection consists of 131,063 math word problems from the original dataset that meet strict filtering criteria: all problems have definitive numerical answers, validated problem statements and solutions, and come from high-quality, non-synthetic sources. The filtering process removes multiple-choice questions, proofs, problems without clear numerical answers, and all synthetic content, while preserving the diversity of mathematical domains in the original collection.

### Filtering Methodology

The dataset was created by applying the following filters to the original NuminaMath-1.5 dataset:

- **Removed question types**: Multiple-choice questions and proofs
- **Answer validation**: Retained only problems with non-empty, numerical answers (answers of 'proof' or 'notfound' were excluded)
- **Source selection**: Excluded potentially lower-quality sources (cn_k12, orca_math, synthetic_math, metamath)
- **Quality filters**: Retained only problems with validated problem statements and solutions
- **Authenticity**: Excluded all synthetic problems

These filtering steps reduced the original dataset from 896,215 problems to 131,063 problems (approximately 14.6% of the original dataset), all with verifiable outcomes.

## Dataset Structure

### Data Instances

Each instance in the dataset contains:

- A math word problem statement
- A Chain of Thought (CoT) solution
- A definitive numerical answer
- Problem metadata including math domain type

### Data Fields

- `problem`: Text description of the mathematical problem
- `solution`: Step-by-step Chain of Thought (CoT) solution
- `answer`: Definitive numerical result
- `problem_type`: Mathematical domain (Algebra, Geometry, Number Theory, etc.)
- `question_type`: Always "math-word-problem" in this filtered dataset
- `source`: Origin of the problem (olympiads, cn_contest, aops_forum, etc.)
- `problem_is_valid`: Always "Yes" in this filtered dataset
- `solution_is_valid`: Always "Yes" in this filtered dataset
- `synthetic`: Always false in this filtered dataset

### Dataset Statistics

#### Distribution by Source

| Source | Problem Count |
|--------|---------------|
| olympiads | 92,487 |
| cn_contest | 15,828 |
| aops_forum | 15,092 |
| amc_aime | 4,893 |
| inequalities | 1,145 |
| olympiads_ref | 1,001 |
| number_theory | 617 |
| **Total** | **131,063** |

#### Distribution by Problem Type

| Problem Type | Problem Count | Percentage |
|--------------|---------------|------------|
| Algebra | 42,972 | 32.79% |
| Geometry | 31,405 | 23.96% |
| Number Theory | 22,071 | 16.84% |
| Combinatorics | 17,144 | 13.08% |
| Logic and Puzzles | 7,250 | 5.53% |
| Calculus | 4,954 | 3.78% |
| Inequalities | 4,000 | 3.05% |
| Other | 1,267 | 0.97% |

#### Detailed Breakdown by Problem Type and Source

**Algebra**

- olympiads: 31,752
- cn_contest: 6,776
- amc_aime: 1,886
- aops_forum: 1,684
- inequalities: 531
- olympiads_ref: 265
- number_theory: 78

**Geometry**

- olympiads: 22,091
- cn_contest: 4,377
- aops_forum: 3,316
- amc_aime: 1,454
- olympiads_ref: 99
- inequalities: 60
- number_theory: 8

**Number Theory**

- olympiads: 14,848
- aops_forum: 3,614
- cn_contest: 1,916
- amc_aime: 744
- number_theory: 489
- olympiads_ref: 329
- inequalities: 131

**Combinatorics**

- olympiads: 11,219
- aops_forum: 3,176
- cn_contest: 1,724
- amc_aime: 612
- olympiads_ref: 266
- inequalities: 125
- number_theory: 22

**Logic and Puzzles**

- olympiads: 5,677
- aops_forum: 1,197
- cn_contest: 212
- amc_aime: 136
- inequalities: 16
- number_theory: 7
- olympiads_ref: 5

**Calculus**

- olympiads: 3,894
- aops_forum: 907
- cn_contest: 139
- inequalities: 8
- amc_aime: 4
- olympiads_ref: 1
- number_theory: 1

**Inequalities**

- olympiads: 2,292
- aops_forum: 717
- cn_contest: 657
- inequalities: 273
- olympiads_ref: 34
- amc_aime: 25
- number_theory: 2

**Other**

- olympiads: 714
- aops_forum: 481
- amc_aime: 32
- cn_contest: 27
- number_theory: 10
- olympiads_ref: 2
- inequalities: 1
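
The per-type and per-source counts above could be recomputed with a sketch like the following. It is a minimal illustration, not the dataset author's script: the helper names `breakdown` and `totals_by` and the four sample rows are made up for this example, and only the `problem_type` and `source` field names come from this card.

```python
from collections import Counter


def breakdown(rows):
    """Count rows per (problem_type, source) pair."""
    return Counter((row["problem_type"], row["source"]) for row in rows)


def totals_by(counts, index):
    """Collapse pair counts onto one axis: index 0 = problem_type, 1 = source."""
    totals = Counter()
    for key, n in counts.items():
        totals[key[index]] += n
    return totals


# Hand-made rows standing in for the real dataset (illustrative only).
sample_rows = [
    {"problem_type": "Algebra", "source": "olympiads"},
    {"problem_type": "Algebra", "source": "cn_contest"},
    {"problem_type": "Geometry", "source": "olympiads"},
    {"problem_type": "Algebra", "source": "olympiads"},
]

counts = breakdown(sample_rows)
by_type = totals_by(counts, 0)
by_source = totals_by(counts, 1)

# Print the nested view: one block per problem type, sources sorted by count.
for problem_type, type_total in by_type.most_common():
    print(f"{problem_type}: {type_total}")
    per_source = {s: n for (t, s), n in counts.items() if t == problem_type}
    for source, n in sorted(per_source.items(), key=lambda kv: -kv[1]):
        print(f"  {source}: {n}")
```

On the real data, the same helpers should accept the rows returned by `datasets.load_dataset("nlile/NuminaMath-1.5-RL-Verifiable", split="train")`, assuming the Hugging Face `datasets` library is installed. That call is left out of the block so the sketch runs offline.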

#### Original NuminaMath-1.5 Source Breakdown

| source | problems | question_type:proof | question_type:mcq | question_type:word |
|:---------------|-----------:|----------------------:|--------------------:|---------------------:|
| olympiads | 197084 | 62970 | 13529 | 117845 |
| olympiads_ref | 3638 | 2246 | nan | 1392 |
| amc_aime | 5872 | 208 | 4374 | 963 |
| aops_forum | 67841 | 24532 | 5924 | 33486 |
| cn_contest | 29944 | 8663 | 5602 | 15649 |
| inequalities | 7314 | 5780 | 49 | 1478 |
| number_theory | 4043 | 2591 | 15 | 1239 |
| cn_k12 | 268819 | 3966 | 115800 | 149010 |
| orca_math | 151934 | 1 | 17 | 151916 |
| synthetic_math | 148712 | 41 | 1057 | 147612 |
| metamath | 11014 | nan | 82 | 10932 |
| Total | 896215 | 110998 | 146449 | 631522 |

## Additional Information

### Licensing Information

The dataset follows the licensing of the original NuminaMath-1.5 dataset and is available under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).

### Citation Information

```
@misc{nlile2025numinamath15rlverifiable,
  author = {nlile},
  title = {NuminaMath-1.5-RL-Verifiable},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Dataset Repository},
  howpublished = {\url{https://huggingface.co/datasets/nlile/NuminaMath-1.5-RL-Verifiable}}
}

@misc{numina_math_datasets,
  author = {Jia LI and Edward Beeching and Lewis Tunstall and Ben Lipkin and Roman Soletskyi and Shengyi Costa Huang and Kashif Rasul and Longhui Yu and Albert Jiang and Ziju Shen and Zihan Qin and Bin Dong and Li Zhou and Yann Fleureau and Guillaume Lample and Stanislas Polu},
  title = {NuminaMath},
  year = {2024},
  publisher = {Numina},
  journal = {Hugging Face repository},
  howpublished = {\url{https://github.com/project-numina/aimo-progress-prize/blob/main/report/numina_dataset.pdf}}
}
```
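
The rules listed under Filtering Methodology can be read as a single row-level predicate. The sketch below is an illustration under stated assumptions rather than the pipeline actually used to build the dataset: `passes_filters`, `EXCLUDED_SOURCES`, and `EXCLUDED_ANSWERS` are hypothetical names, the sample rows are invented, and the card does not say how "numerical" answers were detected, so that check is omitted.

```python
# Sources the card lists as excluded.
EXCLUDED_SOURCES = {"cn_k12", "orca_math", "synthetic_math", "metamath"}
# Placeholder answer values the card lists as excluded.
EXCLUDED_ANSWERS = {"proof", "notfound"}


def passes_filters(row):
    """Return True if a row satisfies the filters described in this card.

    Assumption: field values match those documented under Data Fields
    ("math-word-problem", "Yes", and a boolean `synthetic` flag).
    """
    answer = (row.get("answer") or "").strip()
    return (
        row.get("question_type") == "math-word-problem"
        and answer != ""
        and answer not in EXCLUDED_ANSWERS
        and row.get("source") not in EXCLUDED_SOURCES
        and row.get("problem_is_valid") == "Yes"
        and row.get("solution_is_valid") == "Yes"
        and row.get("synthetic") is False
    )


# Invented rows: only the first one should survive.
base = {
    "question_type": "math-word-problem",
    "answer": "42",
    "source": "olympiads",
    "problem_is_valid": "Yes",
    "solution_is_valid": "Yes",
    "synthetic": False,
}
sample_rows = [
    base,
    {**base, "question_type": "proof", "answer": "proof"},
    {**base, "source": "cn_k12"},
    {**base, "answer": "notfound"},
    {**base, "synthetic": True},
    {**base, "solution_is_valid": "No"},
]

kept = [row for row in sample_rows if passes_filters(row)]
print(f"kept {len(kept)} of {len(sample_rows)} rows")
```

Keeping every rule in one predicate makes it easy to apply the same check to a plain list, as here, or to pass it to a filtering method of whatever data-loading library is in use.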