# pile-dm_mathematics

**Repository Path**: hf-datasets/pile-dm_mathematics

## Basic Information

- **Project Name**: pile-dm_mathematics
- **Description**: Mirror of https://huggingface.co/datasets/timaeus/pile-dm_mathematics
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-08-27
- **Last Updated**: 2025-08-27

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

---
dataset_info:
  features:
  - name: text
    dtype: string
  - name: meta
    struct:
    - name: pile_set_name
      dtype: string
  splits:
  - name: train
    num_bytes: 821512129
    num_examples: 100000
  download_size: 414996906
  dataset_size: 821512129
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

## Dataset Creation Process

These subsets were created by streaming over the rows from [`monology/pile-uncopyrighted`](https://huggingface.co/datasets/monology/pile-uncopyrighted) and filtering by the `meta` column. Each subset is generally limited to the first 100,000 qualifying rows encountered.

## Citations

If you use this dataset, please cite the original Pile papers:

```bibtex
@article{gao2020pile,
  title={The Pile: An 800GB dataset of diverse text for language modeling},
  author={Gao, Leo and Biderman, Stella and Black, Sid and Golding, Laurence and Hoppe, Travis and Foster, Charles and Phang, Jason and He, Horace and Thite, Anish and Nabeshima, Noa and others},
  journal={arXiv preprint arXiv:2101.00027},
  year={2020}
}

@article{biderman2022datasheet,
  title={Datasheet for the pile},
  author={Biderman, Stella and Bicheno, Kieran and Gao, Leo},
  journal={arXiv preprint arXiv:2201.07311},
  year={2022}
}
```
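
The creation process described under "Dataset Creation Process" (stream the source rows, filter on the `meta` column, keep the first 100,000 matches) can be sketched roughly as follows. This is not the script that produced the dataset, only a minimal illustration. The function names `filter_subset` and `stream_source` are hypothetical, and the label `"DM Mathematics"` is an assumed value of `meta.pile_set_name` for this subset.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator

Row = Dict[str, Any]


def filter_subset(
    rows: Iterable[Row],
    pile_set_name: str,
    limit: int = 100_000,
) -> Iterator[Row]:
    """Lazily yield the first `limit` rows whose meta.pile_set_name matches.

    Hypothetical helper: it mirrors the description in this README, not
    the original creation script.
    """
    matching = (
        row
        for row in rows
        # Tolerate rows where `meta` is missing or None.
        if (row.get("meta") or {}).get("pile_set_name") == pile_set_name
    )
    # islice stops consuming the stream once `limit` rows have been found.
    return islice(matching, limit)


def stream_source() -> Iterable[Row]:
    """Open the source dataset in streaming mode.

    Needs the `datasets` package and network access; the import is kept
    local so the filter above can be used without it.
    """
    from datasets import load_dataset

    return load_dataset(
        "monology/pile-uncopyrighted", split="train", streaming=True
    )


# Small in-memory demonstration with made-up rows.
sample_rows = [
    {"text": "What is 2 + 2?", "meta": {"pile_set_name": "DM Mathematics"}},
    {"text": "def f(): pass", "meta": {"pile_set_name": "Github"}},
    {"text": "Solve 3*x = 9.", "meta": {"pile_set_name": "DM Mathematics"}},
]
demo = list(filter_subset(sample_rows, "DM Mathematics", limit=2))
```

To run the same filter against the real source, pass `stream_source()` as `rows`. Because both the stream and the filter are lazy, nothing is downloaded beyond the rows needed to find the first `limit` matches.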