# Text-to-Image Generation Grounded by Fine-Grained User Attention

This repository contains the paired word-tag data required to train a word-to-label sequence tagger (as described in our [paper](https://arxiv.org/abs/2011.03775)). In addition, the generated images from the proposed TReCS model are also provided for the LN-COCO and LN-OpenImages validation sets.

## Abstract

[Localized Narratives](https://google.github.io/localized-narratives/) is a dataset with detailed natural language descriptions of images paired with mouse traces that provide a sparse, fine-grained visual grounding for phrases. We propose TReCS, a sequential model that exploits this grounding to generate images. TReCS uses descriptions to retrieve segmentation masks and predict object labels aligned with mouse traces. These alignments are used to select and position masks to generate a fully covered segmentation canvas; the final image is produced by a segmentation-to-image generator using this canvas. This multi-step, retrieval-based approach outperforms existing direct text-to-image generation models on both automatic metrics and human evaluations: overall, its generated images are more photo-realistic and better match descriptions.

## Word-Tag Training Data

We release the training data used to train the sequence tagger in TReCS. We generate this data automatically from [COCO-Stuff segmentation data](https://github.com/nightrome/cocostuff) and Localized Narratives mouse traces. The full preprocessing steps are described in Section 2.1 of our paper.

The data is available in the `sequence_labels` folder of this repository. We provide the data for the MS-COCO split as several paired .txt files. The lines in each set of files correspond to each other: for example, line 1 of `train_coco.ids.txt` is the ID for the first sentence in `train_coco.words.txt`, and its tags are on the first line of `train_coco.tags.txt`. A small loading sketch follows the file list below.

* `train_coco.ids.txt`: A text file where each line is a unique example ID in the format `{image_id}_{annotator_id}`, where `image_id` and `annotator_id` are values taken from the corresponding Localized Narratives example.
* `train_coco.words.txt`: A text file where each line is the formatted caption for the corresponding Localized Narratives example. Tokens are space separated.
* `train_coco.tags.txt`: A text file where each line contains the processed label for each word in the caption. Labels are space separated.
* `val_coco.ids.txt`: IDs for the validation data in the MS-COCO split of Localized Narratives.
* `val_coco.words.txt`: Formatted captions for the validation data in the MS-COCO split of Localized Narratives.
* `val_coco.tags.txt`: Labels for each word in the caption for the validation data in the MS-COCO split of Localized Narratives.
* `val_oi.ids.txt`: IDs for the validation data in the Open Images split of Localized Narratives.
* `val_oi.words.txt`: Formatted captions for the validation data in the Open Images split of Localized Narratives.
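For reference, here is a minimal Python sketch (not part of the release) for reading one split of these parallel files into aligned records. The `load_split` helper name is ours, and splitting the example ID on its final underscore is an assumption about the `{image_id}_{annotator_id}` format.

```python
import os


def load_split(data_dir, split):
    """Load one split (e.g. "train_coco" or "val_coco") as aligned records.

    Assumes the three parallel files live directly inside `data_dir`,
    e.g. the `sequence_labels` folder of this repository.
    """
    def read_lines(suffix):
        path = os.path.join(data_dir, f"{split}.{suffix}.txt")
        with open(path, encoding="utf-8") as f:
            return [line.rstrip("\n") for line in f]

    ids = read_lines("ids")
    words = read_lines("words")
    tags = read_lines("tags")
    # The files are line-aligned: line i of each file describes the same example.
    assert len(ids) == len(words) == len(tags), "parallel files must have the same length"

    examples = []
    for example_id, sentence, labels in zip(ids, words, tags):
        tokens = sentence.split(" ")
        word_tags = labels.split(" ")
        assert len(tokens) == len(word_tags), f"token/label mismatch for {example_id}"
        # NOTE: splitting on the final underscore is an assumption about the ID format.
        image_id, annotator_id = example_id.rsplit("_", 1)
        examples.append({
            "image_id": image_id,
            "annotator_id": annotator_id,
            "tokens": tokens,
            "tags": word_tags,
        })
    return examples


# Example usage (path is an assumption about your local checkout):
# examples = load_split("sequence_labels", "train_coco")
```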
## TReCS Generated Images

Our generated images for the MS-COCO and Open Images validation sets can be downloaded [here](http://storage.googleapis.com/gresearch/trecs-image-generation/trecs_images.tar). We generate an image with the TReCS model for each item in these Localized Narratives validation sets. Images are named in the format `{image_id}_{annotator_id}.png`, where `image_id` and `annotator_id` are values taken from the Localized Narratives examples.

## Citation

If you find this work useful, please consider citing:

```
@inproceedings{koh2020text,
  title={Text-to-Image Generation Grounded by Fine-Grained User Attention},
  author={Koh, Jing Yu and Baldridge, Jason and Lee, Honglak and Yang, Yinfei},
  booktitle={Winter Conference on Applications of Computer Vision (WACV)},
  year={2021}
}
```

We also suggest citing the paper for the [Localized Narratives](https://google.github.io/localized-narratives/) dataset. The Localized Narratives dataset is licensed under CC-BY 4.0.

## Disclaimer

Not an official Google product.