# PoseFormer
**Repository Path**: AI52CV/PoseFormer
## Basic Information
- **Project Name**: PoseFormer
- **Description**: PoseFormer: the first purely Transformer-based network for 3D human pose estimation, achieving state-of-the-art performance.
https://mp.weixin.qq.com/s/DKWSeRu_ThMf_vf9j1GCbQ
Original repository: https://github.com/zczcwh/PoseFormer
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 1
- **Forks**: 2
- **Created**: 2021-04-16
- **Last Updated**: 2021-04-29
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# 3D Human Pose Estimation with Spatial and Temporal Transformers
This repo is the official implementation for [3D Human Pose Estimation with Spatial and Temporal Transformers](https://arxiv.org/pdf/2103.10455.pdf).
[Video Demonstration](https://youtu.be/z8HWOdXjGR8)
## PoseFormer Architecture
## Video Demo
*(Demo figures: 3D HPE on Human3.6M; 3D HPE on videos in-the-wild using PoseFormer.)*
Our code is built on top of [VideoPose3D](https://github.com/facebookresearch/VideoPose3D).
### Environment
The code was developed and tested under the following environment:
* Python 3.8.2
* PyTorch 1.7.1
* CUDA 11.0
You can create the environment via:
```bash
conda env create -f poseformer.yml
```
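Then activate the environment before running the commands below. Note that the environment name `poseformer` is an assumption based on the yml filename; check the `name:` field in `poseformer.yml` if activation fails:

```bash
# Activate the conda environment created from poseformer.yml
# (the name "poseformer" is assumed; see the yml's `name:` field).
conda activate poseformer
```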
### Dataset
Our code is compatible with the dataset setup introduced by [Martinez et al.](https://github.com/una-dinosauria/3d-pose-baseline) and [Pavllo et al.](https://github.com/facebookresearch/VideoPose3D). Please refer to [VideoPose3D](https://github.com/facebookresearch/VideoPose3D) to set up the Human3.6M dataset (./data directory).
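For reference, once the Human3.6M setup from VideoPose3D is complete, the `./data` directory should contain files along the following lines. The exact filenames below are inferred from the `-k` flags used later in this README and are not verified against this repository, so double-check them against the VideoPose3D instructions:

```bash
ls ./data
# Expected contents (names inferred, not verified against this repo):
#   data_3d_h36m.npz                  - 3D ground-truth poses for Human3.6M
#   data_2d_h36m_gt.npz               - ground-truth 2D keypoints   (-k gt)
#   data_2d_h36m_cpn_ft_h36m_dbb.npz  - CPN-detected 2D keypoints   (-k cpn_ft_h36m_dbb)
```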
### Evaluating pre-trained models
We provide the pre-trained 81-frame model (CPN detected 2D pose as input) [here](https://drive.google.com/file/d/1j0Vto7ljPHMdBndZKtGESaIUym6stAY_/view?usp=sharing). To evaluate it, put it into the `./checkpoint` directory and run:
```bash
python run_poseformer.py -k cpn_ft_h36m_dbb -f 81 -c checkpoint --evaluate detected81f.bin
```
We also provide a pre-trained 81-frame model (Ground truth 2D pose as input) [here](https://drive.google.com/file/d/1b_f22oFy9_SzoxdpOADS7so7l0Y3JE8-/view?usp=sharing). To evaluate it, put it into the `./checkpoint` directory and run:
```bash
python run_poseformer.py -k gt -f 81 -c checkpoint --evaluate gt81f.bin
```
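In both cases the checkpoint directory must exist and contain the downloaded weights, for example (the download location below is only illustrative):

```bash
mkdir -p checkpoint
# Move the downloaded weights into place (source paths are illustrative):
mv ~/Downloads/detected81f.bin checkpoint/
mv ~/Downloads/gt81f.bin checkpoint/
```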
### Training new models
* To train a model from scratch (CPN detected 2D pose as input), run:
```bash
python run_poseformer.py -k cpn_ft_h36m_dbb -f 27 -lr 0.00004 -lrd 0.99
```
`-f` controls how many frames are used as input: a 27-frame model achieves 47.0 mm and an 81-frame model achieves 44.3 mm (MPJPE). A sketch of the 81-frame command appears after this list.
* To train a model from scratch (Ground truth 2D pose as input), run:
```bash
python run_poseformer.py -k gt -f 81 -lr 0.00004 -lrd 0.99
```
The 81-frame model achieves 31.3 mm (MPJPE).
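The 44.3 mm CPN result quoted above corresponds to an 81-frame model; a sketch of that training command, assuming the same flags as the 27-frame run with only `-f` changed, would be:

```bash
# 81-frame model with CPN-detected 2D poses as input (flags assumed to match the 27-frame run)
python run_poseformer.py -k cpn_ft_h36m_dbb -f 81 -lr 0.00004 -lrd 0.99
```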
### Visualization and other functions
We keep our code consistent with [VideoPose3D](https://github.com/facebookresearch/VideoPose3D). Please refer to their project page for further information.
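For example, rendering a Human3.6M sequence with the pre-trained 81-frame model should follow VideoPose3D's visualization interface. The `--viz-*` flags below are taken from the VideoPose3D project and are assumed, not verified, to carry over unchanged:

```bash
# Visualization sketch: evaluate the 81-frame CPN checkpoint and render one sequence.
# All --viz-* flags follow VideoPose3D conventions (assumed to apply here as well).
python run_poseformer.py -k cpn_ft_h36m_dbb -f 81 -c checkpoint --evaluate detected81f.bin \
    --render --viz-subject S11 --viz-action Walking --viz-camera 0 \
    --viz-output output.gif --viz-size 3 --viz-downsample 2
```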
### Bibtex
If you find our work useful in your research, please consider citing:
    @article{zheng20213d,
      title={3D Human Pose Estimation with Spatial and Temporal Transformers},
      author={Zheng, Ce and Zhu, Sijie and Mendieta, Matias and Yang, Taojiannan and Chen, Chen and Ding, Zhengming},
      journal={arXiv preprint arXiv:2103.10455},
      year={2021}
    }
## Acknowledgement
Part of our code is borrowed from [VideoPose3D](https://github.com/facebookresearch/VideoPose3D). We thank the authors for releasing their code.