# DAViD: Data-efficient and Accurate Vision Models from Synthetic Data

This repo accompanies the ICCV 2025 paper [DAViD: Data-efficient and Accurate Vision Models from Synthetic Data](https://microsoft.github.io/DAViD) and contains instructions for downloading and using the SynthHuman dataset and the models described in the paper.

## 📊 The SynthHuman Dataset

*Example face, full-body and upper-body data.*

The SynthHuman dataset contains approximately 300,000 images of synthetic humans with ground-truth annotations for foreground alpha masks, absolute depth, surface normals and camera intrinsics. There are approximately 100,000 images for each of three camera scenarios: face, upper body and full body. The data is generated using the latest version of our synthetic data generation pipeline, which has been used to create a number of datasets: [Face Synthetics](https://microsoft.github.io/FaceSynthetics/), [SimpleEgo](https://aka.ms/SimpleEgo) and [SynthMoCap](https://aka.ms/SynthMoCap).

Ground-truth annotations are per-pixel, with perfect accuracy thanks to the graphics-based rendering pipeline:

*Face and full-body ground-truth examples.*

### Data Format

The dataset contains 298,008 samples. The first 98,040 samples feature the face, the next 99,976 samples feature the full body and the final 99,992 samples feature the upper body. Each sample is made up of:

- `rgb_0000000.png` - RGB image
- `alpha_0000000.png` - foreground alpha mask
- `depth_0000000.exr` - absolute z-depth image in cm
- `normal_0000000.exr` - surface normal image (XYZ)
- `cam_0000000.txt` - camera intrinsics (see below)

The camera text file contains the standard 3×3 intrinsic matrix:

```
f_x 0.0 c_x
0.0 f_y c_y
0.0 0.0 1.0
```

where `f_x` and `f_y` are in pixel units. It can be easily loaded with `np.loadtxt(path_to_camera_txt)`.

### Downloading the Dataset

The dataset is split into 60 zip files to make downloading easier. Each zip file contains 5,000 samples and has a maximum size of 8.75 GB. The total download size is approximately 330 GB.

To download the dataset simply run `download_data.py TARGET_DIRECTORY [--single-sample] [--single-chunk]`, which will download the zips and extract them into the target folder. You can optionally download a single sample or a single chunk to quickly take a look at the data.

### Loading the Dataset

You can visualize samples from the dataset using `visualize_data.py SYNTHHUMAN_DIRECTORY [--start-idx N]`. This script shows examples of how to load the image files correctly and display the data.
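For reference, here is a minimal loading sketch based on the file layout described above. It assumes all files of a sample sit in one directory after extraction and that OpenCV was built with OpenEXR support (hence the environment variable); the exact channel layouts are best checked against `visualize_data.py`, which remains the authoritative loader.

```python
# Minimal loading sketch -- assumes an OpenCV build with OpenEXR support.
import os

os.environ["OPENCV_IO_ENABLE_OPENEXR"] = "1"  # must be set before cv2 is imported

import cv2
import numpy as np


def load_sample(dataset_dir: str, idx: int):
    """Load one SynthHuman sample by index, following the file layout above."""
    stem = f"{idx:07d}"
    rgb = cv2.cvtColor(
        cv2.imread(os.path.join(dataset_dir, f"rgb_{stem}.png")), cv2.COLOR_BGR2RGB
    )
    # Foreground alpha mask; IMREAD_UNCHANGED preserves the stored bit depth.
    alpha = cv2.imread(os.path.join(dataset_dir, f"alpha_{stem}.png"), cv2.IMREAD_UNCHANGED)
    # Absolute z-depth in cm, stored as a floating-point EXR image.
    depth = cv2.imread(os.path.join(dataset_dir, f"depth_{stem}.exr"), cv2.IMREAD_UNCHANGED)
    # Surface normals stored as XYZ channels in a floating-point EXR image.
    normal = cv2.imread(os.path.join(dataset_dir, f"normal_{stem}.exr"), cv2.IMREAD_UNCHANGED)
    # 3x3 intrinsic matrix with f_x, f_y, c_x, c_y in pixel units.
    intrinsics = np.loadtxt(os.path.join(dataset_dir, f"cam_{stem}.txt"))
    return rgb, alpha, depth, normal, intrinsics


rgb, alpha, depth, normal, K = load_sample("path/to/SynthHuman", 0)
print(rgb.shape, depth.dtype, K)
```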
### Dataset License

The SynthHuman dataset is licensed under the [CDLA-2.0](./LICENSE-CDLA-2.0.txt). The download and visualization scripts are licensed under the [MIT License](./LICENSE-MIT.txt).

## 🔓 Released Models

We release models for the following tasks:

| Task                         | Version | ONNX Model | Model Card |
|------------------------------|---------|------------|------------|
| Soft Foreground Segmentation | Base    | Download   | Model Card |
|                              | Large   | Download   |            |
| Relative Depth Estimation    | Base    | Download   | Model Card |
|                              | Large   | Download   |            |
| Surface Normal Estimation    | Base    | Download   | Model Card |
|                              | Large   | Download   |            |
| Multi-Task Model             | Large   | Download   | Model Card |
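Before wiring a downloaded model into the demo below, a quick way to sanity-check it is to load it with `onnxruntime` and inspect its inputs and outputs. This is only a sketch: the file name, the guessed input resolution and the dummy input are assumptions, and `demo.py` together with the model cards remain the reference for correct preprocessing.

```python
# Quick smoke test for a downloaded ONNX model; file name, input size and
# preprocessing are assumptions -- see demo.py and the model cards.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("models/foreground.onnx", providers=["CPUExecutionProvider"])

# Inspect the expected input/output tensors before running anything.
for tensor in session.get_inputs():
    print("input:", tensor.name, tensor.shape, tensor.type)
for tensor in session.get_outputs():
    print("output:", tensor.name, tensor.shape, tensor.type)

# Run on a dummy image-shaped tensor (replace with real, correctly preprocessed pixels).
inp = session.get_inputs()[0]
shape = [d if isinstance(d, int) else 512 for d in inp.shape]  # guess any dynamic dims
dummy = np.random.rand(*shape).astype(np.float32)
outputs = session.run(None, {inp.name: dummy})
print([o.shape for o in outputs])
```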
## 🚀 Run the Demo

The demo supports running:

- Relative depth estimation
- Soft foreground segmentation
- Surface normal estimation

To install the requirements for running the demo:

```bash
pip install -r requirement.txt
```

You can run either:

1. The multi-task model, which performs all tasks simultaneously:

   ```bash
   python demo.py \
       --image path/to/input.jpg \
       --multitask-model models/multitask.onnx
   ```

2. Or the individual models:

   ```bash
   python demo.py \
       --image path/to/input.jpg \
       --depth-model models/depth.onnx \
       --foreground-model models/foreground.onnx \
       --normal-model models/normal.onnx
   ```

🧠 **Notes:**

- The script expects ONNX models. Ensure the model paths are correct.
- If both the multi-task and individual models are provided, results from both will be shown and compared.
- Foreground masks are used for improved visualization of depth and normals.

Here is an example output image after running the demo:

![Example results](img/demo_result.png)

### Model License

DAViD models and runtime code are licensed under the [MIT License](./LICENSE-MIT.txt).

## 📖 Citation

If you use the SynthHuman dataset or any of the DAViD models in your research, please cite the following:

```bibtex
@misc{saleh2025david,
      title={{DAViD}: Data-efficient and Accurate Vision Models from Synthetic Data},
      author={Fatemeh Saleh and Sadegh Aliakbarian and Charlie Hewitt and Lohit Petikam and Xiao-Xian and Antonio Criminisi and Thomas J. Cashman and Tadas Baltrušaitis},
      year={2025},
      eprint={2507.15365},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2507.15365},
}
```