

# InkSight: Offline-to-Online Handwriting Conversion by Teaching Vision-Language Models to Read and Write

Blagoj Mitrevski†, Arina Rak†, Julian Schnitzler†, Chengkun Li†, Andrii Maksai‡, Jesse Berent, Claudiu Musat

† First authors (random order)   |   ‡ Corresponding author: amaksai@google.com


---


## Overview

InkSight is an offline-to-online handwriting conversion system that transforms photos of handwritten text into digital ink through a Vision Transformer (ViT) and mT5 encoder-decoder architecture. By combining reading and writing priors in a multi-task training framework, our models process handwritten content without requiring specialized equipment, handling diverse writing styles and backgrounds. The system supports both word-level and full-page conversion, enabling practical digitization of physical notes into searchable, editable digital formats.

In this repository we provide the model weights of Small-p, dataset, and example inference code.

**Key capabilities:**

- Offline-to-online handwriting conversion from photos
- Multi-language support with robust background handling
- Word-level and full-page text processing
- Vector-based digital ink output for editing and search
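
To make "vector-based digital ink" concrete, here is a minimal sketch of one common way to hold ink in memory: a list of strokes, each an ordered list of (x, y) points, rendered to an SVG path. The `Stroke` class and both helper functions are hypothetical names for illustration only; they are not the format produced by the InkSight models or used in the dataset.

```python
from dataclasses import dataclass

# Illustrative type alias: one sampled pen position in image coordinates.
Point = tuple[float, float]


@dataclass
class Stroke:
    """One pen-down to pen-up trace, as an ordered list of (x, y) points."""

    points: list[Point]


def stroke_to_svg_path(stroke: Stroke) -> str:
    """Build SVG path data: move to the first point, then draw line segments."""
    if not stroke.points:
        return ""
    (x0, y0), *rest = stroke.points
    commands = [f"M {x0:g} {y0:g}"]
    commands += [f"L {x:g} {y:g}" for x, y in rest]
    return " ".join(commands)


def ink_to_svg(strokes: list[Stroke], width: int, height: int) -> str:
    """Wrap every non-empty stroke in a <path> element inside one <svg> document."""
    paths = "".join(
        f'<path d="{stroke_to_svg_path(s)}" fill="none" stroke="black"/>'
        for s in strokes
        if s.points
    )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">{paths}</svg>'
    )
```

Because each stroke stays a sequence of coordinates rather than pixels, the result can be scaled, restyled, or edited stroke by stroke, which is the practical difference between digital ink and a scanned image.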

*Figure: InkSight system architecture (animated version).*

## Latest Updates

- **June 2025**: Paper accepted to **[TMLR (Transactions on Machine Learning Research)](https://openreview.net/forum?id=pSyUfV5BqA)**
- **October 2024**: Model weights and dataset released on [Hugging Face](https://huggingface.co/Derendering/InkSight-Small-p)
- **October 2024**: Featured on [Google Research Blog](https://research.google/blog/a-return-to-hand-written-notes-by-learning-to-read-write/)
- **February 2024**: Interactive [demo](https://huggingface.co/spaces/Derendering/Model-Output-Playground) launched

## Quick Start

### Online Demo

Try InkSight on Hugging Face Space: [**Interactive Demo**](https://huggingface.co/spaces/Derendering/Model-Output-Playground)

### Jupyter Notebook

Explore our [**example notebook**](https://githubtocolab.com/google-research/inksight/blob/main/colab.ipynb) with step-by-step inference examples.

### Dataset

Access our comprehensive dataset: [**InkSight Dataset on Hugging Face**](https://huggingface.co/datasets/Derendering/InkSight-Derenderings)

## Installation

### Using uv (Recommended)

[uv](https://docs.astral.sh/uv/) is a fast Python package and project manager that provides excellent dependency resolution and virtual environment management.

```bash
# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and set up the project
git clone https://github.com/google-research/inksight.git
cd inksight
uv sync
```

### Using Conda

```bash
git clone https://github.com/google-research/inksight.git
cd inksight
conda env create -f environment.yml
conda activate inksight
```

> **Important**: Use TensorFlow 2.15.0-2.17.0. Later versions may cause unexpected behavior.

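
To catch a mismatched TensorFlow install early, a small check along these lines may help. It is only a sketch: the function names are hypothetical, the range is read literally as 2.15.0 through 2.17.0 inclusive (so a later patch release such as 2.17.1 is rejected), and the lookup assumes the package is installed under the distribution name `tensorflow`.

```python
import re
from importlib import metadata

# Supported range taken from the note above, treated as inclusive on both ends.
SUPPORTED_MIN = (2, 15, 0)
SUPPORTED_MAX = (2, 17, 0)


def parse_version(version: str) -> tuple[int, ...]:
    """Extract the leading MAJOR.MINOR.PATCH numbers, ignoring suffixes like '-rc1'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_supported_tf_version(version: str) -> bool:
    """Return True if the version falls inside the supported range."""
    return SUPPORTED_MIN <= parse_version(version) <= SUPPORTED_MAX


def check_installed_tensorflow() -> None:
    """Look up the installed TensorFlow version without importing the library."""
    try:
        installed = metadata.version("tensorflow")
    except metadata.PackageNotFoundError:
        print("TensorFlow is not installed in this environment.")
        return
    status = "supported" if is_supported_tf_version(installed) else "outside the supported range"
    print(f"TensorFlow {installed} is {status}.")
```

Calling `check_installed_tensorflow()` after `uv sync` or `conda activate inksight` reports whether the environment matches the range stated above.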
## Local Playground Setup

For development or custom inference, run the Gradio playground locally:

```bash
git clone https://huggingface.co/spaces/Derendering/Model-Output-Playground
cd Model-Output-Playground
pip install -r requirements.txt
python app.py
```

## Resources

### 📊 Dataset

- [**InkSight Dataset**](https://huggingface.co/datasets/Derendering/InkSight-Derenderings) - Comprehensive collection of model outputs and expert traces
- [**Dataset Documentation**](docs/dataset.md) - Detailed dataset description, format specifications, and usage guidelines

### 🤖 Models

- [**Small-p model (CPU/GPU)**](https://huggingface.co/Derendering/InkSight-Small-p) - Optimized for standard inference
- [**Small-p model (TPU)**](https://storage.googleapis.com/derendering_model/small-p-tpu.zip) - TPU-optimized version

### 💻 Code Examples

- [**Inference Notebook**](colab.ipynb) - Word and page-level inference examples
- [**Sample Outputs**](figures/) - Visual examples of model results

The inference code demonstrates both word-level and full-page text processing using open-source alternatives to commercial OCR APIs, including support for [docTR](https://github.com/mindee/doctr) and [Tesseract OCR](https://github.com/tesseract-ocr/tesseract).

## License and Citation

### License

This code is released under the [Apache 2.0 License](https://github.com/google-research/google-research/blob/master/LICENSE).

### Citation

If you use InkSight in your research, please cite our paper:

```bibtex
@article{
mitrevski2025inksight,
title={InkSight: Offline-to-Online Handwriting Conversion by Teaching Vision-Language Models to Read and Write},
author={Blagoj Mitrevski and Arina Rak and Julian Schnitzler and Chengkun Li and Andrii Maksai and Jesse Berent and Claudiu Cristian Musat},
journal={Transactions on Machine Learning Research},
issn={2835-8856},
year={2025},
url={https://openreview.net/forum?id=pSyUfV5BqA},
note={}
}
```

### Additional Resources

- [**Project Page**](https://charlieleee.github.io/publication/inksight/) - Comprehensive project overview with examples and technical details
- [**Google Research Blog**](https://research.google/blog/a-return-to-hand-written-notes-by-learning-to-read-write/) - Featured article explaining the research

---

*This is not an officially supported Google product.*