# ml-space-benchmark **Repository Path**: mirrors_apple/ml-space-benchmark ## Basic Information - **Project Name**: ml-space-benchmark - **Description**: Code and data for "Does Spatial Cognition Emerge in Frontier Models?" - **Primary Language**: Unknown - **License**: Not specified - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2025-04-20 - **Last Updated**: 2026-03-21 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # SPACE benchmark This software project and dataset accompanies the research paper: **[Does Spatial Cognition Emerge in Frontier Models?](https://arxiv.org/pdf/2410.06468)**, *Santhosh Kumar Ramakrishnan, Erik Wijmans, Philipp Krähenbühl, Vladlen Koltun*. Published in ICLR 2025 ![](assets/space_figure.png) We present SPACE, a benchmark that systematically evaluates spatial cognition in frontier models. Our benchmark builds on decades of research in cognitive science. It evaluates large-scale mapping abilities that are brought to bear when an organism traverses physical environments, smaller-scale reasoning about object shapes and layouts, and cognitive infrastructure such as spatial attention and memory. For many tasks, we instantiate parallel presentations via text and images, allowing us to benchmark both large language models and large multimodal models. Results suggest that contemporary frontier models fall short of the spatial intelligence of animals, performing near chance level on a number of classic tests of animal cognition. ## Installation instructions 1. Install mamba following instructions from [Miniforge](https://github.com/conda-forge/miniforge). Create a mamba environment. ``` mamba create -n space-benchmark python=3.9 cmake=3.14.0 -y mamba activate space-benchmark ``` 2. Clone this repo and install requirements. ``` pip install -r requirements.txt ``` 3. Install habitat for large-scale cognition navigation experiments. ``` mamba install habitat-sim=0.3.0 headless -c conda-forge -c aihabitat ``` 4. Generate a security token from huggingface and set the environment variables. ``` export HF_TOKEN= export HF_HUB_ENABLE_HF_TRANSFER=1 ``` 5. Set OpenAI and Anthropic API keys for evaluating GPT and Claude models. ``` export OPENAI_API_KEY= export ANTHROPIC_API_KEY= ``` ## Downloading SPACE dataset The SPACE dataset is available [here](https://ml-site.cdn-apple.com/datasets/space/space.tar.gz). Download it to `/data/SPACE_data_release`. ``` mkdir /data cd /data wget https://ml-site.cdn-apple.com/datasets/space/space.tar.gz tar -xvzf space.tar.gz rm space.tar.gz ``` ## Evaluating models ### SPACE QA tasks **Command:** ``` python -m space.evaluate_qas \ --model_name _qa \ --data_path data/SPACE_data_release//qas.json \ --save_dir experiments/ ``` **QA task names:** `CBTT_text`, `CBTT_vision`, `DirectionEstimationBEVImage`, `DirectionEstimationBEVText`, `DirectionEstimationEgo`, `DistanceEstimationBEVImage`, `DistanceEstimationBEVText`, `DistanceEstimationEgo`, `JLO_text`, `JLO_vision`, `MPFB_text`, `MPFB_vision`, `MRT_text`, `MRT_vision`, `MapSketchingBEVImage`, `MapSketchingBEVText`, `MapSketchingEgo`, `PTT_text`, `PTT_vision`, `SAdd_text`, `SAdd_vision`, `SAtt_text`, `SAtt_vision`, `WLT_vision`
**Models supported for multimodal presentation:** `claude35sonnet`, `gpt4o`, `gpt4v`, `phi35vision`, `pixtral12b`
**Models supported for text-only presentation:** `claude35sonnet`, `gpt4o`, `gpt4v`, `llama3_8b`, `llama3_70b`, `mixtral8x7b`, `mixtral8x22b`, `mistral123b`, `yi15_9b`, `yi15_34b`
### SPACE navigation tasks This evaluation requires habitat-sim to be installed.
**Command for egocentric navigation:** ``` python -m space.evaluate_egonav \ --model_name _egonav \ --envs_dir data/SPACE_data_release/3D_scenes/ \ --save_dir experiments/RouteRetracingEgo \ --walkthrough_key shortestpath \ --max_steps 250 ``` **Command for discrete map image navigation:** ``` python -m space.evaluate_dmnav \ --model_name _dminav \ --envs_dir data/SPACE_data_release/2D_scenes \ --walkthroughs_dir data/SPACE_data_release/BEV_image_walkthroughs/ \ --obs_type image \ --save_dir experiments/RouteRetracingDiscreteMapImage \ --walkthrough_key shortestpath ``` **Command for discrete map text navigation:** ``` python -m space.evaluate_dmnav \ --model_name _dmtnav \ --envs_dir data/SPACE_data_release/2D_scenes \ --walkthroughs_dir data/SPACE_data_release/BEV_text_walkthroughs/ \ --obs_type text \ --save_dir experiments/RouteRetracingDiscreteMapText \ --walkthrough_key shortestpath ``` **Notes:** * These commands evaluate on route retracing. Set `--walkthrough_key walkthrough` for evaluating on shortcut discovery.
* Models supported for multimodal presentation: `claude35sonnet`, `gpt4o`, `gpt4v`
* Models supported for text-only presentation: `claude35sonnet`, `gpt4o`, `gpt4v`, `llama3_8b`, `llama3_70b`, `mixtral8x7b`, `mixtral8x22b`, `mistral123b`, `yi15_9b`, `yi15_34b` ### SPACE CSWM task **Command for multimodal presentation:** ``` python -m space.evaluate_cswm \ --model_name _cswm_vision \ --envs_dir data/SPACE_data_release/CSWM_vision/ \ --save_dir experiments/CSWM_vision \ --game_mode vision ``` **Command for text-only presentation:** ``` python -m space.evaluate_cswm \ --model_name _cswm_text \ --envs_dir data/SPACE_data_release/CSWM_text/ \ --save_dir experiments/CSWM_text \ --game_mode text ``` **Models supported for multimodal presentation:** `claude35sonnet`, `gpt4o`, `gpt4v`
**Models supported for text-only presentation:** `claude35sonnet`, `gpt4o`, `gpt4v`, `llama3_8b`, `llama3_70b`, `mixtral8x7b`, `mixtral8x22b`, `mistral123b`, `yi15_9b`, `yi15_34b` ### SPACE maze completion task **Command for multimodal presentation:** ``` python -m space.evaluate_mct \ --model_name _mct_vision \ --envs_dir data/SPACE_data_release/MCT_vision/envs \ --save_dir experiments/MCT_vision ``` **Command for text-only presentation:** ``` python -m space.evaluate_mct \ --model_name _mct_text \ --envs_dir data/SPACE_data_release/MCT_text/envs \ --save_dir experiments/MCT_text ``` **Models supported for multimodal presentation:** `claude35sonnet`, `gpt4o`, `gpt4v`
**Models supported for text-only presentation:** `claude35sonnet`, `gpt4o`, `gpt4v`, `llama3_8b`, `llama3_70b`, `mixtral8x7b`, `mixtral8x22b`, `mistral123b`, `yi15_9b`, `yi15_34b` ## Citation ``` @inproceedings{ramakrishnan2025space, title={Does Spatial Cognition Emerge in Frontier Models?}, author={Ramakrishnan, Santhosh Kumar and Wijmans, Erik and Kraehenbuehl, Philipp and Koltun, Vladlen}, booktitle={International Conference on Learning Representations}, year={2025}, url={https://openreview.net/forum?id=WK6K1FMEQ1} } ``` ## License This project's code is released under the Apple Sample Code License (see [LICENSE](LICENSE)). This project's data is released under the CC-BY-NC-ND license (see [LICENSE_DATA](LICENSE_DATA)). ## Acknowledgements Our codebase is built using opensource contributions, please see [Acknowledgements](ACKNOWLEDGEMENTS.md) for more details. Please check the paper for a complete list of references and datasets used in this work.