# DAViD
**Repository Path**: mirrors_microsoft/DAViD
## Basic Information
- **Project Name**: DAViD
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-07-23
- **Last Updated**: 2025-10-04
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# DAViD: Data-efficient and Accurate Vision Models from Synthetic Data
This repository accompanies the ICCV 2025 paper [DAViD: Data-efficient and Accurate Vision Models from Synthetic Data](https://microsoft.github.io/DAViD). It contains instructions for downloading and using the SynthHuman dataset and the models described in the paper.
## 📊 The SynthHuman Dataset
The SynthHuman dataset contains approximately 300,000 images of synthetic humans with ground-truth annotations for foreground alpha masks, absolute depth, surface normals and camera intrinsics. There are approximately 100,000 images for each of three camera scenarios: face, upper-body and full-body. The data is generated using the latest version of our synthetic data generation pipeline, which has been used to create a number of datasets: [Face Synthetics](https://microsoft.github.io/FaceSynthetics/), [SimpleEgo](https://aka.ms/SimpleEgo) and [SynthMoCap](https://aka.ms/SynthMoCap). Because the images come from a graphics-based rendering pipeline, the ground-truth annotations are per-pixel and perfectly accurate.
### Data Format
The dataset contains 298,008 samples.
The first 98,040 samples feature the face, the next 99,976 samples feature the full body and the final 99,992 samples feature the upper body.
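The split above can be turned into a small lookup from sample index to camera scenario. This is only a sketch: it assumes sample indices are zero-based and contiguous (the example file names start at `0000000`), and the `scenario_for_index` helper and the scenario labels are hypothetical names, not part of the released scripts.

```python
# Sample counts per camera scenario, in the order given in the text above.
SCENARIO_COUNTS = [("face", 98040), ("full_body", 99976), ("upper_body", 99992)]


def scenario_for_index(idx: int) -> str:
    """Return the camera scenario for a sample index (assumed zero-based)."""
    if idx < 0:
        raise ValueError(f"sample index must be non-negative, got {idx}")
    start = 0
    for name, count in SCENARIO_COUNTS:
        if idx < start + count:
            return name
        start += count
    raise ValueError(f"sample index {idx} is outside the dataset ({start} samples)")
```

If the released data turns out to be indexed differently, only `SCENARIO_COUNTS` and the zero-based assumption need to change.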
Each sample is made up of:
- `rgb_0000000.png` - RGB image
- `alpha_0000000.png` - foreground alpha mask
- `depth_0000000.exr` - absolute z-depth image in cm
- `normal_0000000.exr` - surface normal image (XYZ)
- `cam_0000000.txt` - camera intrinsics (see below)
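The naming scheme above suggests that the five files of a sample can be located from its index alone. The sketch below assumes that all files sit directly in one directory and that the index is zero-padded to seven digits, as in the example names; `sample_paths` is a hypothetical helper, not a function from this repository.

```python
from pathlib import Path


def sample_paths(root, idx: int) -> dict:
    """Build the expected file paths for one sample (assumed flat directory layout)."""
    root = Path(root)
    stem = f"{idx:07d}"  # seven-digit zero padding, as in rgb_0000000.png
    return {
        "rgb": root / f"rgb_{stem}.png",
        "alpha": root / f"alpha_{stem}.png",
        "depth": root / f"depth_{stem}.exr",
        "normal": root / f"normal_{stem}.exr",
        "cam": root / f"cam_{stem}.txt",
    }
```

For instance, `sample_paths("SynthHuman", 42)["depth"]` points at `depth_0000042.exr` inside the `SynthHuman` directory.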
The camera text file includes the standard intrinsic matrix:
```
f_x 0.0 c_x
0.0 f_y c_y
0.0 0.0 1.0
```
Here `f_x` and `f_y` are the focal lengths in pixel units.
The matrix can be loaded with `np.loadtxt(path_to_camera_txt)`.
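As an illustration of how the intrinsics and the z-depth image fit together, the sketch below back-projects a depth map into camera-space 3D points using the standard pinhole model. It assumes that depth is measured along the optical axis (as "z-depth" suggests), that integer pixel coordinates `(u, v)` index columns and rows directly, and that `c_x` and `c_y` are in pixels. The `backproject` function is hypothetical, and the demo uses synthetic values rather than real dataset files.

```python
import numpy as np


def backproject(depth_cm: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Convert an (H, W) z-depth map in cm into an (H, W, 3) array of XYZ points in cm."""
    h, w = depth_cm.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # u runs along columns (x direction), v along rows (y direction).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth_cm
    y = (v - cy) / fy * depth_cm
    return np.stack([x, y, depth_cm], axis=-1)


# Synthetic demo: in practice K would come from np.loadtxt(path_to_camera_txt).
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 1.0],
              [0.0, 0.0, 1.0]])
depth = np.full((2, 4), 200.0)  # a flat surface 200 cm from the camera
points = backproject(depth, K)
```

The pixel at the principal point maps to a point on the optical axis, which is a quick sanity check for the pixel convention in use.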
### Downloading the Dataset
The dataset is split into 60 zip files to make downloading easier.
Each zip file contains up to 5000 samples and has a maximum size of 8.75GB.
The total download size is approximately 330GB.
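A quick arithmetic check shows that these figures are consistent with the sample count given earlier. The snippet uses only numbers stated in this README. How samples are distributed across individual zip files is not specified here, so it computes only the number of files and the average file size.

```python
import math

TOTAL_SAMPLES = 298008     # from the Data Format section
SAMPLES_PER_ZIP = 5000     # per-zip sample count stated above
TOTAL_DOWNLOAD_GB = 330    # approximate total download size

# Fewest zip files that can hold every sample at 5000 samples per file.
num_zips = math.ceil(TOTAL_SAMPLES / SAMPLES_PER_ZIP)

# Average size per zip file, well under the stated 8.75GB maximum.
avg_zip_gb = TOTAL_DOWNLOAD_GB / num_zips

print(f"{num_zips} zip files, about {avg_zip_gb:.1f} GB each on average")
```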
To download the dataset, run `download_data.py TARGET_DIRECTORY [--single-sample] [--single-chunk]`, which downloads the zip files and extracts them into the target folder.
You can optionally download a single sample or a single chunk to quickly take a look at the data.
### Loading the Dataset
You can visualize samples from the dataset using `visualize_data.py SYNTHHUMAN_DIRECTORY [--start-idx N]`.
This script shows examples of how to load the image files correctly and display the data.
### Dataset License
The SynthHuman dataset is licensed under the [CDLA-2.0](./LICENSE-CDLA-2.0.txt).
The download and visualization scripts are licensed under the [MIT License](./LICENSE-MIT.txt).
## 🔓 Released Models
We release models for the following tasks:
| Task | Version | ONNX Model | Model Card |
|---|---|---|---|
| Soft Foreground Segmentation | Base | Download | Model Card |
| | Large | Download | |
| Relative Depth Estimation | Base | Download | Model Card |
| | Large | Download | |
| Surface Normal Estimation | Base | Download | Model Card |
| | Large | Download | |
| Multi-Task Model | Large | Download | Model Card |