# unsup-queries-data **Repository Path**: mirrors_deepmind/unsup-queries-data ## Basic Information - **Project Name**: unsup-queries-data - **Description**: Unsupervised Data Generated for GeoQuery and SAIL Datasets - **Primary Language**: Unknown - **License**: GPL-2.0 - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2021-02-27 - **Last Updated**: 2025-09-14 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # Unsupervised Data Generated for GeoQuery and SAIL Datasets This repository contains the generated unsupervised data for GeoQuery and SAIL semantic parsing tasks. For a detailed description of this see below and in the paper [Semantic Parsing with Semi-Supervised Sequential Autoencoders](https://arxiv.org/abs/1609.09315), Kočiský et al., EMNLP 2016. Please cite the paper if you use this corpus in your work. ### Bibtex ``` @inproceedings{emnlp16_kocisky, author = {Tom\'a\v s Ko\v cisk\'y and G\'abor Melis and Edward Grefenstette and Chris Dyer and Wang Ling and Phil Blunsom and Karl Moritz Hermann}, title = {Semantic Parsing with Semi-Supervised Sequential Autoencoders}, url = {https://arxiv.org/abs/1609.09315}, booktitle = "Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP)", year = "2016", } ``` ### GeoQuery Data For GeoQuery, we fit a 3-gram Kneser-Ney model to the queries in the training set and sample about 7 million queries from it. We ensure that the sampled queries are different from the training queries, but do not enforce validity. ### SAIL Data For this task we hand designed a maze by trying to approximately replicate the key statistics of the existing mazes (corridor lengths, number and types of intersections, object distribution). #### Path navigation action sequences To approximate the distribution of instruction sequences that correspond to utterances in the SAIL dataset, we selected randomly a starting point, an ending point, and a starting and ending orientation in the hand-designed maze. We then found the shortest action sequence between the two points. The sequence was then segmented into roughly utterance-sized instructions using heuristic rules.