# NuminaMath-1.5-RL-Verifiable

**Repository Path**: hf-datasets/NuminaMath-1.5-RL-Verifiable

## Basic Information

- **Project Name**: NuminaMath-1.5-RL-Verifiable
- **Description**: Mirror of https://huggingface.co/datasets/nlile/NuminaMath-1.5-RL-Verifiable
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-07-15
- **Last Updated**: 2025-07-15

## README

---
license: apache-2.0
task_categories:
- text-generation
- question-answering
language:
- en
tags:
- math
- post-training
- RL
- verifiable
- reasoning
pretty_name: NuminaMath 1.5 RL Verifiable
dataset_info:
  features:
  - name: problem
    dtype: string
  - name: solution
    dtype: string
  - name: answer
    dtype: string
  - name: problem_type
    dtype: string
  - name: question_type
    dtype: string
  - name: problem_is_valid
    dtype: string
  - name: solution_is_valid
    dtype: string
  - name: source
    dtype: string
  - name: synthetic
    dtype: bool
  splits:
  - name: train
    num_bytes: 188626432
    num_examples: 131063
  download_size: 85718743
  dataset_size: 188626432
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# Dataset Card for NuminaMath-1.5-RL-Verifiable

## Dataset Description

- **Homepage:** https://huggingface.co/datasets/nlile/NuminaMath-1.5-RL-Verifiable
- **Repository:** [NuminaMath-1.5-RL-Verifiable](https://huggingface.co/datasets/nlile/NuminaMath-1.5-RL-Verifiable)
- **Based on:** [NuminaMath-1.5](https://huggingface.co/datasets/AI-MO/NuminaMath-1.5)

### Dataset Summary

NuminaMath-1.5-RL-Verifiable is a curated subset of the NuminaMath-1.5 dataset, specifically filtered to support reinforcement learning applications requiring verifiable outcomes. This collection consists of 131,063 math word problems from the original dataset that meet strict filtering criteria: all problems have definitive numerical answers, validated problem statements and solutions, and come from high-quality, non-synthetic sources.

The filtering process removes multiple-choice questions, proofs, problems without clear numerical answers, and all synthetic content, while preserving the rich diversity of mathematical domains from the original collection.

### Filtering Methodology

The dataset was created by applying the following filters to the original NuminaMath-1.5 dataset:

- **Removed question types**: Multiple-choice questions and proofs
- **Answer validation**: Retained only problems with non-empty numerical answers; rows whose answer is `proof` or `notfound` were excluded
- **Source selection**: Excluded potentially lower-quality sources (cn_k12, orca_math, synthetic_math, metamath)
- **Quality filters**: Retained only problems with validated problem statements and solutions
- **Authenticity**: Excluded all synthetic problems

These filtering steps reduced the original dataset from 896,215 problems to 131,063 problems (approximately 14.6% of the original dataset), all with verifiable outcomes.
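The filters listed above can be sketched as a single row-level predicate. This is a minimal illustration, not the author's actual script: the function name `keep_record`, the case-insensitive answer comparison, and the sample rows are assumptions, and the card does not say how "numerical" was checked, so that test is left out here.

```python
# Sources the card lists as excluded.
EXCLUDED_SOURCES = {"cn_k12", "orca_math", "synthetic_math", "metamath"}
# Placeholder answers the card says were dropped.
EXCLUDED_ANSWERS = {"proof", "notfound"}


def keep_record(record: dict) -> bool:
    """Return True if a row passes the filters described in the card.

    Hypothetical helper: the field names come from the card, but the exact
    comparison logic is an assumption.
    """
    answer = (record.get("answer") or "").strip()
    return (
        record.get("question_type") == "math-word-problem"
        and answer != ""
        and answer.lower() not in EXCLUDED_ANSWERS  # assumed case-insensitive
        and record.get("source") not in EXCLUDED_SOURCES
        and record.get("problem_is_valid") == "Yes"
        and record.get("solution_is_valid") == "Yes"
        and record.get("synthetic") is False
    )


# Made-up rows, for illustration only.
rows = [
    {"question_type": "math-word-problem", "answer": "12", "source": "olympiads",
     "problem_is_valid": "Yes", "solution_is_valid": "Yes", "synthetic": False},
    {"question_type": "proof", "answer": "proof", "source": "olympiads",
     "problem_is_valid": "Yes", "solution_is_valid": "Yes", "synthetic": False},
    {"question_type": "math-word-problem", "answer": "7", "source": "cn_k12",
     "problem_is_valid": "Yes", "solution_is_valid": "Yes", "synthetic": False},
]
kept = [r for r in rows if keep_record(r)]
print(len(kept))  # 1

# Retained share quoted in the card: 131,063 of 896,215 problems.
retained_share = 131_063 / 896_215
print(f"{retained_share:.1%}")  # 14.6%
```

With the Hugging Face `datasets` library, a predicate of this shape could be passed to `Dataset.filter`; whether the published subset was produced that way is not stated in the card.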

## Dataset Structure

### Data Instances

Each instance in the dataset contains:
- A math word problem statement
- A Chain of Thought (CoT) solution
- A definitive numerical answer
- Problem metadata including math domain type

### Data Fields

- `problem`: Text description of the mathematical problem
- `solution`: Step-by-step Chain of Thought (CoT) solution
- `answer`: Definitive numerical result
- `problem_type`: Mathematical domain (Algebra, Geometry, Number Theory, etc.)
- `question_type`: Always "math-word-problem" in this filtered dataset
- `source`: Origin of the problem (olympiads, cn_contest, aops_forum, etc.)
- `problem_is_valid`: Always "Yes" in this filtered dataset
- `solution_is_valid`: Always "Yes" in this filtered dataset
- `synthetic`: Always false in this filtered dataset
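Because every row carries a definitive `answer`, that field can act as the ground truth for a rule-based reward in RL training. The sketch below is one possible scheme and not part of the dataset: it assumes the model writes its final answer inside `\boxed{...}`, and the helpers `extract_boxed`, `normalize`, and `reward` are hypothetical names. Exact string matching is deliberately crude; a real setup would likely need symbolic or numeric equivalence checks.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    # Walk forward, tracking brace depth so nested LaTeX groups survive.
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # unbalanced braces


def normalize(ans: str) -> str:
    """Drop outer whitespace, surrounding dollar signs, and inner spaces."""
    return ans.strip().strip("$").replace(" ", "")


def reward(completion: str, reference_answer: str) -> float:
    """Return 1.0 if the boxed answer matches the reference string, else 0.0."""
    predicted = extract_boxed(completion)
    if predicted is None:
        return 0.0
    return 1.0 if normalize(predicted) == normalize(reference_answer) else 0.0


print(reward("The total is \\boxed{42}.", "42"))        # 1.0
print(reward("I think the answer is 42.", "42"))        # 0.0
print(extract_boxed("So x = \\boxed{\\frac{1}{2}}"))    # \frac{1}{2}
```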

### Dataset Statistics

#### Distribution by Source

| Source | Problem Count |
|--------|---------------|
| olympiads | 92,487 |
| cn_contest | 15,828 |
| aops_forum | 15,092 |
| amc_aime | 4,893 |
| inequalities | 1,145 |
| olympiads_ref | 1,001 |
| number_theory | 617 |
| **Total** | **131,063** |

#### Distribution by Problem Type

| Problem Type | Problem Count | Percentage |
|--------------|---------------|------------|
| Algebra | 42,972 | 32.79% |
| Geometry | 31,405 | 23.96% |
| Number Theory | 22,071 | 16.84% |
| Combinatorics | 17,144 | 13.08% |
| Logic and Puzzles | 7,250 | 5.53% |
| Calculus | 4,954 | 3.78% |
| Inequalities | 4,000 | 3.05% |
| Other | 1,267 | 0.97% |
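The percentages in the table follow directly from the counts. The short check below recomputes them from the figures above; the variable names are illustrative only.

```python
# Problem counts copied from the table above.
counts = {
    "Algebra": 42_972,
    "Geometry": 31_405,
    "Number Theory": 22_071,
    "Combinatorics": 17_144,
    "Logic and Puzzles": 7_250,
    "Calculus": 4_954,
    "Inequalities": 4_000,
    "Other": 1_267,
}

total = sum(counts.values())
percentages = {name: 100 * n / total for name, n in counts.items()}

for name, pct in percentages.items():
    print(f"{name}: {pct:.2f}%")
print(total)  # 131063
```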

#### Detailed Breakdown by Problem Type and Source

<details>
<summary>Click to expand detailed breakdown</summary>

**Algebra**
- olympiads: 31,752
- cn_contest: 6,776
- amc_aime: 1,886
- aops_forum: 1,684
- inequalities: 531
- olympiads_ref: 265
- number_theory: 78

**Geometry**
- olympiads: 22,091
- cn_contest: 4,377
- aops_forum: 3,316
- amc_aime: 1,454
- olympiads_ref: 99
- inequalities: 60
- number_theory: 8

**Number Theory**
- olympiads: 14,848
- aops_forum: 3,614
- cn_contest: 1,916
- amc_aime: 744
- number_theory: 489
- olympiads_ref: 329
- inequalities: 131

**Combinatorics**
- olympiads: 11,219
- aops_forum: 3,176
- cn_contest: 1,724
- amc_aime: 612
- olympiads_ref: 266
- inequalities: 125
- number_theory: 22

**Logic and Puzzles**
- olympiads: 5,677
- aops_forum: 1,197
- cn_contest: 212
- amc_aime: 136
- inequalities: 16
- number_theory: 7
- olympiads_ref: 5

**Calculus**
- olympiads: 3,894
- aops_forum: 907
- cn_contest: 139
- inequalities: 8
- amc_aime: 4
- olympiads_ref: 1
- number_theory: 1

**Inequalities**
- olympiads: 2,292
- aops_forum: 717
- cn_contest: 657
- inequalities: 273
- olympiads_ref: 34
- amc_aime: 25
- number_theory: 2

**Other**
- olympiads: 714
- aops_forum: 481
- amc_aime: 32
- cn_contest: 27
- number_theory: 10
- olympiads_ref: 2
- inequalities: 1

</details>

#### Original NuminaMath-1.5 Source Breakdown

| Source | Problems | Proof | Multiple-choice | Word problem |
|:---------------|-----------:|----------------------:|--------------------:|---------------------:|
| olympiads | 197,084 | 62,970 | 13,529 | 117,845 |
| olympiads_ref | 3,638 | 2,246 | 0 | 1,392 |
| amc_aime | 5,872 | 208 | 4,374 | 963 |
| aops_forum | 67,841 | 24,532 | 5,924 | 33,486 |
| cn_contest | 29,944 | 8,663 | 5,602 | 15,649 |
| inequalities | 7,314 | 5,780 | 49 | 1,478 |
| number_theory | 4,043 | 2,591 | 15 | 1,239 |
| cn_k12 | 268,819 | 3,966 | 115,800 | 149,010 |
| orca_math | 151,934 | 1 | 17 | 151,916 |
| synthetic_math | 148,712 | 41 | 1,057 | 147,612 |
| metamath | 11,014 | 0 | 82 | 10,932 |
| **Total** | **896,215** | **110,998** | **146,449** | **631,522** |


## Additional Information

### Licensing Information

The dataset follows the licensing of the original NuminaMath-1.5 dataset and is available under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).

### Citation Information

```
@misc{nlile2025numinamath15rlverifiable,
  author = {nlile},
  title = {NuminaMath-1.5-RL-Verifiable},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Dataset Repository},
  howpublished = {\url{https://huggingface.co/datasets/nlile/NuminaMath-1.5-RL-Verifiable}}
}

@misc{numina_math_datasets,
  author = {Jia LI and Edward Beeching and Lewis Tunstall and Ben Lipkin and Roman Soletskyi and Shengyi Costa Huang and Kashif Rasul and Longhui Yu and Albert Jiang and Ziju Shen and Zihan Qin and Bin Dong and Li Zhou and Yann Fleureau and Guillaume Lample and Stanislas Polu},
  title = {NuminaMath},
  year = {2024},
  publisher = {Numina},
  journal = {Hugging Face repository},
  howpublished = {\url{https://github.com/project-numina/aimo-progress-prize/blob/main/report/numina_dataset.pdf}}
}
```