# CoReNet

![](doc/teaser_image_efe6cd70d5f1a2636c39834fcff598d1fa4525f8d73b115d5eb2f2210527f047_v1.jpg)
![](doc/teaser_image_efe6cd70d5f1a2636c39834fcff598d1fa4525f8d73b115d5eb2f2210527f047_v2.jpg)
![](doc/teaser_image_efe6cd70d5f1a2636c39834fcff598d1fa4525f8d73b115d5eb2f2210527f047_v3.jpg)

CoReNet is a technique for joint multi-object 3D reconstruction from a single
RGB image. It produces coherent reconstructions, where all objects live in a
single consistent 3D coordinate frame relative to the camera, and they do not
intersect in 3D. You can find more information in the following paper:
[CoReNet: Coherent 3D scene reconstruction from a single RGB image](https://arxiv.org/abs/2004.12989).

This repository contains source code, dataset pointers, and instructions for
reproducing the results in the paper. If you find our code, data, or the paper
useful, please consider citing

```
@InProceedings{popov20eccv,
  title="CoReNet: Coherent 3D Scene Reconstruction from a Single RGB Image",
  author="Popov, Stefan and Bauszat, Pablo and Ferrari, Vittorio",
  booktitle="Computer Vision -- ECCV 2020",
  year="2020",
  doi="10.1007/978-3-030-58536-5_22"
}
```

#### Table of Contents
* [Installation](#installation)
* [Datasets](#datasets)
* [Models from the paper](#models-from-the-paper)
* [Training and evaluating a new model](#training-and-evaluating-a-new-model)
* [Further details](#further-details)

## Installation

The code in this repository has been verified to work on Ubuntu 18.04 with the
following dependencies:

```bash
# General APT packages
sudo apt install \
  python3-pip python3-virtualenv python python3.8-dev g++-8 \
  ninja-build git libboost-container-dev unzip

# NVIDIA related packages
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 /"
sudo apt install \
  nvidia-driver-455 nvidia-utils-455 `#driver, CUDA+GL libraries, utils` \
  cuda-runtime-10-1 cuda-toolkit-10-2 libcudnn7 `# CUDA and cuDNN`
```

To install CoReNet, you need to clone the code from GitHub and create a Python
virtual environment:

```bash
# Clone CoReNet
mkdir -p ~/prj/corenet
cd ~/prj/corenet
git clone https://github.com/google-research/corenet.git .

# Set up a Python virtual environment
python3.8 -m virtualenv --python=/usr/bin/python3.8 venv_38
. venv_38/bin/activate
pip install -r requirements.txt
```

All instructions below assume that CoReNet lives in `~/prj/corenet`, that this
is the current working directory, and that the virtual environment is
activated.

You can also run CoReNet using the supplied docker file:
`~/prj/corenet/Dockerfile`.
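Before moving on to the datasets, you may want to confirm that the environment
is functional. The snippet below is only an optional sanity check and is not
part of the original setup instructions; it assumes that `requirements.txt`
installs PyTorch (which the training code in this repository uses) and that the
NVIDIA driver installed above is loaded.

```python
# Optional sanity check (illustrative): verify that PyTorch imports inside the
# virtual environment and that it can see at least one CUDA device.
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available: ", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU 0:          ", torch.cuda.get_device_name(0))
```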
## Datasets

The CoReNet paper introduced several datasets with synthetic scenes. To
reproduce the experiments in the paper, you need to download them using:

```bash
cd ~/prj/corenet
mkdir -p ~/prj/corenet/data/raw
for n in single pairs triplets; do
  for s in train val test; do
    wget "https://storage.googleapis.com/gresearch/corenet/${n}.${s}.tar" \
      -O "data/raw/${n}.${s}.tar"
    tar -xvf "data/raw/${n}.${s}.tar" -C data/
  done
done
```

For each scene, these datasets provide the object placements, a good viewpoint,
and two images rendered from that viewpoint with varying degrees of realism. To
obtain the actual object geometry, you need to download `ShapeNetCore.v2.zip`
from [ShapeNet](https://www.shapenet.org/)'s original site, unpack it, and
convert the 3D meshes to CoReNet's binary format:

```bash
echo "Please download ShapeNetCore.v2.zip from ShapeNet's original site and "
echo "place it in ~/prj/corenet/data/raw/ before running the commands below"

cd ~/prj/corenet
unzip data/raw/ShapeNetCore.v2.zip -d data/raw/
PYTHONPATH=src python -m preprocess_shapenet \
  --shapenet_root=data/raw/ShapeNetCore.v2 \
  --output_root=data/shapenet_meshes
```

## Models from the paper

To help reproduce the results from the CoReNet paper, we offer 5 pre-trained
models from it (`h5`, `h7`, `m7`, `m9`, and `y1`; details below and in the
paper). You can download and unpack these using:

```bash
cd ~/prj/corenet
wget https://storage.googleapis.com/gresearch/corenet/paper_tf_models.tgz \
  -O data/raw/paper_tf_models.tgz
tar xzvf data/raw/paper_tf_models.tgz -C data/
```

You can evaluate the downloaded models against their respective test sets
using:

```bash
MODEL=h7  # Set to one of: h5, h7, m7, m9, y1

cd ~/prj/corenet
ulimit -n 4096
OMP_NUM_THREADS=2 CUDA_HOME=/usr/local/cuda-10.2 PYTHONPATH=src \
  TF_CPP_MIN_LOG_LEVEL=1 PATH="${PATH}:${CUDA_HOME}/bin" \
  FILL_VOXELS_CUDA_FLAGS=-ccbin=/usr/bin/gcc-8 \
  python -m dist_launch --nproc_per_node=1 \
  tf_model_eval --config_path=configs/paper_tf_models/${MODEL}.json5
```

To run on multiple GPUs in parallel, set `--nproc_per_node` to the number of
desired GPUs. You can use `CUDA_VISIBLE_DEVICES` to control exactly which GPUs
to use. `CUDA_HOME`, `PATH`, and `FILL_VOXELS_CUDA_FLAGS` control the
just-in-time compiler for the voxelization operation. Upon completion,
quantitative results will be stored in
`~/prj/corenet/output/paper_tf_models/${MODEL}/voxel_metrics.csv` (a short
sketch for inspecting this file is given at the end of this section).
Qualitative results will be available in
`~/prj/corenet/output/paper_tf_models/${MODEL}/` in the form of PNG files.

This table summarizes the model attributes and their performance. More details
can be found in the paper.

| model | dataset  | realism | native resolution | mean IoU |
| :---: | :---:    | :---:   | :---:             | :---:    |
| h5    | single   | low     | 128 x 128 x 128   | 57.9%    |
| h7    | single   | high    | 128 x 128 x 128   | 59.1%    |
| y1    | single   | low     | 32 x 32 x 32      | 53.3%    |
| m7    | pairs    | high    | 128 x 128 x 128   | 43.1%    |
| m9    | triplets | high    | 128 x 128 x 128   | 43.9%    |

Note that all models are evaluated on a grid resolution of `128 x 128 x 128`,
independent of their native resolution (see section 3.5 in the paper). The
performance computed with this code matches the one reported in the paper for
`h5`, `h7`, `m7`, and `m9`. For `y1`, the performance here is slightly higher
(+0.2% IoU), as we no longer have the exact checkpoint used in the paper.

You can also run these models on individual images interactively, using the
`corenet_demo.ipynb` notebook. For this, you need to also
`pip install jupyter-notebook` in your virtual environment.
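The snippet below is a minimal sketch, not part of the repository, of how you
might inspect the `voxel_metrics.csv` file produced by the evaluation run
above. It makes no assumption about the column layout, which is defined by the
evaluation code; it simply prints the header and the first few rows.

```python
# Illustrative sketch: peek at the metrics CSV written by the evaluation run.
# Assumes the working directory is ~/prj/corenet; adjust the model name as needed.
import csv
from pathlib import Path

metrics_path = Path("output/paper_tf_models/h7/voxel_metrics.csv")
with metrics_path.open(newline="") as f:
    for i, row in enumerate(csv.reader(f)):
        print(row)
        if i >= 5:  # header plus the first five rows
            break
```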
## Training and evaluating a new model

We offer PyTorch code for training and evaluating models. To train a model, you
need to (once) import the starting ResNet50 checkpoint:

```bash
cd ~/prj/corenet
PYTHONPATH=src python -m import_resnet50_checkpoint
```

Then run:

```bash
MODEL=h7  # Set to one of: h5, h7, m7, m9

cd ~/prj/corenet
ulimit -n 4096
OMP_NUM_THREADS=2 CUDA_HOME=/usr/local/cuda-10.2 PYTHONPATH=src \
  TF_CPP_MIN_LOG_LEVEL=1 PATH="${PATH}:${CUDA_HOME}/bin" \
  FILL_VOXELS_CUDA_FLAGS=-ccbin=/usr/bin/gcc-8 \
  python -m dist_launch --nproc_per_node=1 \
  train --config_path=configs/models/${MODEL}.json5
```

Again, use `--nproc_per_node` and `CUDA_VISIBLE_DEVICES` to control parallel
execution on multiple GPUs; `CUDA_HOME`, `PATH`, and `FILL_VOXELS_CUDA_FLAGS`
control just-in-time compilation.

You can also evaluate individual checkpoints, for example:

```bash
cd ~/prj/corenet
ulimit -n 4096
OMP_NUM_THREADS=2 CUDA_HOME=/usr/local/cuda-10.2 PYTHONPATH=src \
  TF_CPP_MIN_LOG_LEVEL=1 PATH="${PATH}:${CUDA_HOME}/bin" \
  FILL_VOXELS_CUDA_FLAGS=-ccbin=/usr/bin/gcc-8 \
  python -m dist_launch --nproc_per_node=1 eval \
  --cpt_path=output/models/h7/cpt/persistent/state_000000000.cpt \
  --output_path=output/eval_cpt_example \
  --eval_names_regex="short.*" \
  -jq '(.. | .config? | select(.num_qualitative_results != null) | .num_qualitative_results) |= 4'
```

The `-jq` option limits the number of qualitative results to 4 (see also the
[Further details](#further-details) section).

We currently offer checkpoints trained with this code for models `h5`, `h7`,
`m7`, and `m9`, [in this .tgz](https://storage.googleapis.com/gresearch/corenet/model_checkpoints.tgz).
These checkpoints achieve slightly better performance than the ones in the
paper (see the table below). This is likely due to a different distributed
training strategy (synchronous here vs. asynchronous in the paper) and a
different ML framework (PyTorch vs. TensorFlow in the paper).

|          | h5    | h7    | m7    | m9    |
| :---:    | :---: | :---: | :---: | :---: |
| mean IoU | 60.2% | 61.6% | 45.0% | 46.9% |

## Further details

### Configuration files

The evaluation and training scripts are configured using JSON5 files that map
to the `TfModelEvalPipeline` and `TrainPipeline` dataclasses in
`src/corenet/configuration.py`. You can find a description of the different
configuration options in the code comments, starting from these two classes.

You can also modify the configuration on the fly, through
[jq](https://stedolan.github.io/jq/) queries, as well as defines that change
entries in the `string_templates` section. For example, the following options
change the number of workers and the prefetch factor of the data loaders, as
well as the location of the data and output directories:

```bash
... \
  -jq "'(.. | .data_loader? | select(. != null) | .num_data_workers) |= 12'" \
      "'(.. | .data_loader? | select(. != null) | .prefetch_factor) |= 4'" \
  -D 'data_dir=gs://some_gcs_bucket/data' \
     'output_dir=gs://some_gcs_bucket/output/models'
```
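To make these `-jq` overrides more concrete, here is a rough, purely
illustrative Python analogue of the `num_qualitative_results` query used in the
checkpoint-evaluation example above. It is not how the pipeline applies
overrides internally; it assumes the `json5` PyPI package for parsing
(`pip install json5` if it is not already present) and uses
`configs/models/h7.json5` merely as an example input.

```python
# Illustrative sketch only: a plain-Python analogue of the jq query
#   (.. | .config? | select(.num_qualitative_results != null)
#       | .num_qualitative_results) |= 4
import json5  # JSON5 parser from PyPI (assumption: available in the environment)


def limit_qualitative_results(node, value):
    """Recursively set `num_qualitative_results` in every `config` sub-dict that defines it."""
    if isinstance(node, dict):
        cfg = node.get("config")
        if isinstance(cfg, dict) and cfg.get("num_qualitative_results") is not None:
            cfg["num_qualitative_results"] = value
        for child in node.values():
            limit_qualitative_results(child, value)
    elif isinstance(node, list):
        for child in node:
            limit_qualitative_results(child, value)


with open("configs/models/h7.json5") as f:  # example input only
    config = json5.load(f)
limit_qualitative_results(config, 4)
```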
### Dataset statistics

The table below summarizes the number of scenes in each dataset:

|       | single | pairs  | triplets |
| :---: | :---:  | :---:  | :---:    |
| train | 883084 | 319981 | 80000    |
| val   | 127286 | 45600  | 11400    |
| test  | 246498 | 91194  | 22798    |

### Data format and transformations

Please see [this](doc/data_format_and_coordinate_systems.md) document.

### Licenses

The code and the checkpoints are released under the
[Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0). The
datasets, the documentation, and the configuration files are licensed under the
[Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).