diff --git a/audio/speech_synthesis/tacotron2/pytorch/README.md b/audio/speech_synthesis/tacotron2/pytorch/README.md index f7f113ce60846e87b51f0ea8b01bbc4236f66ef0..b5bc8ef8fa0d1be86699164d4602dcea3929ea87 100644 --- a/audio/speech_synthesis/tacotron2/pytorch/README.md +++ b/audio/speech_synthesis/tacotron2/pytorch/README.md @@ -2,14 +2,12 @@ ## Model Description -This paper describes Tacotron 2, a neural network architecture for speech synthesis directly from text. The system is -composed of a recurrent sequence-to-sequence feature prediction network that maps character embeddings to mel-scale -spectrograms, followed by a modified WaveNet model acting as a vocoder to synthesize timedomain waveforms from those -spectrograms. Our model achieves a mean opinion score (MOS) of 4.53 comparable to a MOS of 4.58 for professionally -recorded speech. To validate our design choices, we present ablation studies of key components of our system and -evaluate the impact of using mel spectrograms as the input to WaveNet instead of linguistic, duration, and F_0 features. -We further demonstrate that using a compact acoustic intermediate representation enables significant simplification of -the WaveNet architecture. +Tacotron2 is an end-to-end neural text-to-speech synthesis system that directly converts text into natural-sounding +speech. It combines a sequence-to-sequence network that generates mel-spectrograms from text with a WaveNet-based +vocoder to produce high-quality audio. The model achieves near-human speech quality with a Mean Opinion Score (MOS) of +4.53, rivaling professional recordings. Its architecture simplifies traditional speech synthesis pipelines by using +learned acoustic representations, enabling more natural prosody and articulation while maintaining computational +efficiency. ## Model Preparation diff --git a/cv/3d-reconstruction/hashnerf/pytorch/README.md b/cv/3d-reconstruction/hashnerf/pytorch/README.md index 3cdc741931212b8319aedaed69a97b9a097a07f2..3b904466862a16b7178bafd4743b5e0cdeb5dfb8 100644 --- a/cv/3d-reconstruction/hashnerf/pytorch/README.md +++ b/cv/3d-reconstruction/hashnerf/pytorch/README.md @@ -2,16 +2,15 @@ ## Model description -A PyTorch implementation (Hash) of the NeRF part (grid encoder, density grid ray sampler) in instant-ngp, as described -in Instant Neural Graphics Primitives with a Multiresolution Hash Encoding. +HashNeRF is an efficient implementation of Neural Radiance Fields (NeRF) using a multiresolution hash encoding +technique. It accelerates 3D scene reconstruction and novel view synthesis by optimizing memory usage and computational +efficiency. Based on instant-ngp's approach, HashNeRF employs a grid encoder and density grid ray sampler to achieve +high-quality rendering results. The model supports various datasets and custom scenes, making it suitable for +applications in computer graphics, virtual reality, and 3D reconstruction tasks. -## Step 1: Installation +## Model Preparation -```sh -pip3 install -r requirements.txt -``` - -## Step 2: Preparing datasets +### Prepare Resources We use the same data format as instant-ngp, [fox](https://github.com/NVlabs/instant-ngp/tree/master/data/nerf/fox) and blender dataset [nerf_synthetic](https://drive.google.com/drive/folders/128yBriW1IG_3NJ5Rp7APSTZsJqdJdfc1).Please @@ -23,19 +22,23 @@ For custom dataset, you should: 2. put the video under a path like ./data/custom/video.mp4 or the images under ./data/custom/images/*.jpg. 3. call the preprocess code: (should install ffmpeg and colmap first! 
refer to the file for more options) -```sh +```bash python3 scripts/colmap2nerf.py --video ./data/custom/video.mp4 --run_colmap # if use video python3 scripts/colmap2nerf.py --images ./data/custom/images/ --run_colmap # if use images ``` -## Step 3: Training and test +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` -### One single GPU +## Model Training First time running will take some time to compile the CUDA extensions. -```sh -# train with fox dataset +```bash +# train with fox dataset on One single GPU python3 main_nerf.py data/fox --workspace trial_nerf -O # data/fox is dataset path; --workspace means output path; @@ -43,22 +46,18 @@ python3 main_nerf.py data/fox --workspace trial_nerf -O # test mode python3 main_nerf.py data/fox --workspace trial_nerf -O --test -``` -```sh # train with the blender dataset, you should add `--bound 1.0 --scale 0.8 --dt_gamma 0` # --bound means the scene is assumed to be inside box[-bound, bound] # --scale adjusts the camera locaction to make sure it falls inside the above bounding box. # --dt_gamma controls the adaptive ray marching speed, set to 0 turns it off. python3 main_nerf.py data/nerf_synthetic/lego --workspace trial_nerf -O --bound 1.0 --scale 0.8 --dt_gamma 0 -``` -```sh # train with custom dataset(you'll need to tune the scale & bound if necessary): python3 main_nerf.py data/custom_data --workspace trial_nerf -O ``` -## Results +## Model Results | Convergence criteria | Configuration (x denotes number of GPUs) | Performance | Accuracy | Power(W) | Scalability | Memory utilization(G) | Stability | |----------------------|------------------------------------------|-------------|----------|------------|-------------|-------------------------|-----------| diff --git a/cv/3d_detection/bevformer/pytorch/README.md b/cv/3d_detection/bevformer/pytorch/README.md index 22d54f0e580a3673c800da334b19de21de218431..7783512ddcc08462dc19b2a3145f7f3b4ea1a49d 100755 --- a/cv/3d_detection/bevformer/pytorch/README.md +++ b/cv/3d_detection/bevformer/pytorch/README.md @@ -1,99 +1,99 @@ # BEVFormer -## Model description -In this work, the authors present a new framework termed BEVFormer, which learns unified BEV representations with spatiotemporal transformers to support multiple autonomous driving perception tasks. In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries. To aggregate spatial information, the authors design a spatial cross-attention that each BEV query extracts the spatial features from the regions of interest across camera views. For temporal information, the authors propose a temporal self-attention to recurrently fuse the history BEV information. -The proposed approach achieves the new state-of-the-art **56.9\%** in terms of NDS metric on the nuScenes test set, which is **9.0** points higher than previous best arts and on par with the performance of LiDAR-based baselines. +## Model Description +BEVFormer is a transformer-based framework for autonomous driving perception that learns unified Bird's Eye View (BEV) +representations. It combines spatial and temporal information through innovative attention mechanisms: spatial +cross-attention extracts features from camera views, while temporal self-attention fuses historical BEV data. This +approach achieves state-of-the-art performance on nuScenes dataset, matching LiDAR-based systems. 
BEVFormer supports +multiple perception tasks simultaneously, making it a versatile solution for comprehensive scene understanding in +autonomous driving applications. -## Prepare -**Install mmcv-full.** -```shell -cd mmcv -bash clean_mmcv.sh -bash build_mmcv.sh -bash install_mmcv.sh -``` +## Model Preparation -**Install mmdet and mmseg.** -```shell -pip3 install mmdet==2.25.0 -pip3 install mmsegmentation==0.25.0 -``` +### Prepare Resources -**Install mmdet3d from source code.** -```shell -cd ../mmdetection3d -pip3 install -r requirements.txt,OR pip3 install -r requirements/optional.txt,pip3 install -r requirements/runtime.txt,pip3 install -r requirements/tests.txt -python3 setup.py install -``` - -**Install timm.** -```shell -pip3 install timm -``` +Download nuScenes V1.0-mini data and CAN bus expansion data from [HERE](https://www.nuscenes.org/download). Prepare +nuscenes data by running. -## NuScenes -Download nuScenes V1.0-mini data and CAN bus expansion data [HERE](https://www.nuscenes.org/download). Prepare nuscenes data by running - - -**Download CAN bus expansion** -``` -cd .. +```bash mkdir data -cd data +cd data/ + # download 'can_bus.zip' unzip can_bus.zip + # move can_bus to data dir ``` -**Prepare nuScenes data** +Prepare nuScenes data. -*We genetate custom annotation files which are different from mmdet3d's* -``` -cd .. +We genetate custom annotation files which are different from mmdet3d's + +```bash +cd ../ python3 tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes --version v1.0-mini --canbus ./data ``` Using the above code will generate `nuscenes_infos_temporal_{train,val}.pkl`. -## Prepare pretrained models +Prepare pretrained models. + ```shell mkdir ckpts cd ckpts & wget https://github.com/zhiqi-li/storage/releases/download/v1.0/bevformer_r101_dcn_24ep.pth -cd .. +cd ../ ``` -## Prerequisites +### Install Dependencies -**Please ensure you have prepared the environment and the nuScenes dataset.** +```shell +# Install mmcv-full +cd mmcv/ +bash clean_mmcv.sh +bash build_mmcv.sh +bash install_mmcv.sh -## Train and Test +# Install mmdet and mmseg +pip3 install mmdet==2.25.0 +pip3 install mmsegmentation==0.25.0 + +# Install mmdet3d from source code +cd ../mmdetection3d +pip3 install -r requirements.txt,OR pip3 install -r requirements/optional.txt,pip3 install -r requirements/runtime.txt,pip3 install -r requirements/tests.txt +python3 setup.py install -Train BEVFormer with 8 GPUs +# Install timm +pip3 install timm ``` + +## Model Training + +```bash +# Train BEVFormer with 8 GPUs ./tools/dist_train.sh ./projects/configs/bevformer/bevformer_base.py 8 -``` -Eval BEVFormer with 8 GPUs -``` +# Eval BEVFormer with 8 GPUs ./tools/dist_test.sh ./projects/configs/bevformer/bevformer_base.py ./path/to/ckpts.pth 8 ``` -Note: using 1 GPU to eval can obtain slightly higher performance because continuous video may be truncated with multiple GPUs. By default we report the score evaled with 8 GPUs. - +Note: using 1 GPU to eval can obtain slightly higher performance because continuous video may be truncated with multiple +GPUs. By default we report the score evaled with 8 GPUs. -## Using FP16 to train the model. -The above training script can not support FP16 training, +The above training script can not support FP16 training, and we provide another script to train BEVFormer with FP16. 
-``` +```bash +# Using FP16 to train the model ./tools/fp16/dist_train.sh ./projects/configs/bevformer_fp16/bevformer_tiny_fp16.py 8 ``` -## Results on BI-V100 -| GPUs | model | NDS | mAP | -|------|----------------|--------|--------| -| 1x8 | bevformer_base | 0.3516 | 0.3701 | +## Model Results + +| Model | GPU | model | NDS | mAP | +|-----------|------------|----------------|--------|--------| +| BEVFormer | BI-V100 x8 | bevformer_base | 0.3516 | 0.3701 | + +## References -## Reference: -[BEVFormer](https://github.com/fundamentalvision/BEVFormer/tree/master) +[BEVFormer](https://github.com/fundamentalvision/BEVFormer/tree/master) diff --git a/cv/3d_detection/centerpoint/pytorch/README.md b/cv/3d_detection/centerpoint/pytorch/README.md index e6de3a160bf5ddf5f8e53f262227ee5f83007d2f..27a9d5d67d8dee70b3c4e7aaa75c560cd0851803 100644 --- a/cv/3d_detection/centerpoint/pytorch/README.md +++ b/cv/3d_detection/centerpoint/pytorch/README.md @@ -1,10 +1,35 @@ # CenterPoint -## Model description -Three-dimensional objects are commonly represented as 3D boxes in a point-cloud. This representation mimics the well-studied image-based 2D bounding-box detection but comes with additional challenges. Objects in a 3D world do not follow any particular orientation, and box-based detectors have difficulties enumerating all orientations or fitting an axis-aligned bounding box to rotated objects. In this paper, we instead propose to represent, detect, and track 3D objects as points. Our framework, CenterPoint, first detects centers of objects using a keypoint detector and regresses to other attributes, including 3D size, 3D orientation, and velocity. In a second stage, it refines these estimates using additional point features on the object. In CenterPoint, 3D object tracking simplifies to greedy closest-point matching. The resulting detection and tracking algorithm is simple, efficient, and effective. CenterPoint achieved state-of-the-art performance on the nuScenes benchmark for both 3D detection and tracking, with 65.5 NDS and 63.8 AMOTA for a single model. On the Waymo Open Dataset, CenterPoint outperforms all previous single model method by a large margin and ranks first among all Lidar-only submissions. +## Model Description + +CenterPoint is a state-of-the-art 3D object detection and tracking framework that represents objects as points rather +than bounding boxes. It first detects object centers using a keypoint detector, then regresses other attributes like +size, orientation, and velocity. A second stage refines these estimates using additional point features. This approach +simplifies 3D tracking to greedy closest-point matching, achieving top performance on nuScenes and Waymo datasets while +maintaining efficiency and simplicity in implementation. + +## Model Preparation + +### Prepare Resources + +Download nuScenes from . 
+ +```bash +mkdir -p data/nuscenes +# For nuScenes Dataset +└── NUSCENES_DATASET_ROOT + ├── samples <-- key frames + ├── sweeps <-- frames without annotation + ├── maps <-- unused + ├── v1.0-trainval <-- metadata + +python3 tools/create_data.py nuscenes_data_prep --root-path ./data/nuscenes --version="v1.0-trainval" --nsweeps=10 -## Step 1: Installation ``` + +### Install Dependencies + +```bash ## install libGL and libboost yum install mesa-libGL yum install boost-devel @@ -30,48 +55,25 @@ bash setup.sh export PYTHONPATH="${PYTHONPATH}:PATH_TO_CENTERPOINT" ``` -## Step 2: Preparing datasets -Download nuScenes from https://www.nuscenes.org/download -``` -mkdir -p data/nuscenes -# For nuScenes Dataset -└── NUSCENES_DATASET_ROOT - ├── samples <-- key frames - ├── sweeps <-- frames without annotation - ├── maps <-- unused - ├── v1.0-trainval <-- metadata - -python3 tools/create_data.py nuscenes_data_prep --root-path ./data/nuscenes --version="v1.0-trainval" --nsweeps=10 - -``` - - -## Step 3: Training - -### Single GPU training +## Model Training ```bash +# Single GPU training python3 ./tools/train.py ./configs/nusc/voxelnet/nusc_centerpoint_voxelnet_01voxel.py -``` - -### Multiple GPU training -```bash +# Multiple GPU training python3 -m torch.distributed.launch --nproc_per_node=8 ./tools/train.py ./configs/nusc/voxelnet/nusc_centerpoint_voxelnet_01voxel.py -``` -### Evaluation - -```bash +# Evaluation python3 ./tools/dist_test.py ./configs/nusc/voxelnet/nusc_centerpoint_voxelnet_01voxel.py --work_dir work_dirs/nusc_centerpoint_voxelnet_01voxel --checkpoint work_dirs/nusc_centerpoint_voxelnet_01voxel/latest.pth ``` -## Results +## Model Results -GPUs | FPS | ACC ----- | --- | --- -BI-V100 x8 | 2.423 s/step | mAP: 0.5654 +| Model | GPU | FPS | ACC | +|-------------|------------|--------------|-------------| +| CenterPoint | BI-V100 x8 | 2.423 s/step | mAP: 0.5654 | +## References -## Reference - [CenterPoint](https://github.com/tianweiy/CenterPoint) diff --git a/cv/3d_detection/paconv/pytorch/README.md b/cv/3d_detection/paconv/pytorch/README.md index 3ff390de8a57a7a98690af2ce0290e56af60f8ea..69a10d0901ba4d30c719bd3e67c20368bf63ab18 100644 --- a/cv/3d_detection/paconv/pytorch/README.md +++ b/cv/3d_detection/paconv/pytorch/README.md @@ -1,9 +1,24 @@ -# PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds +# PAConv -## Model description -We introduce Position Adaptive Convolution (PAConv), a generic convolution operation for 3D point cloud processing. The key of PAConv is to construct the convolution kernel by dynamically assembling basic weight matrices stored in Weight Bank, where the coefficients of these weight matrices are self-adaptively learned from point positions through ScoreNet. In this way, the kernel is built in a data-driven manner, endowing PAConv with more flexibility than 2D convolutions to better handle the irregular and unordered point cloud data. Besides, the complexity of the learning process is reduced by combining weight matrices instead of brutally predicting kernels from point positions. Furthermore, different from the existing point convolution operators whose network architectures are often heavily engineered, we integrate our PAConv into classical MLP-based point cloud pipelines without changing network configurations. 
Even built on simple networks, our method still approaches or even surpasses the state-of-the-art models, and significantly improves baseline performance on both classification and segmentation tasks, yet with decent efficiency. Thorough ablation studies and visualizations are provided to understand PAConv. +## Model Description -## Step 1: Installation +PAConv (Position Adaptive Convolution) is an innovative convolution operation for 3D point cloud processing that +dynamically assembles convolution kernels. It constructs kernels by adaptively combining weight matrices from a Weight +Bank, with coefficients learned from point positions through ScoreNet. This data-driven approach provides flexibility to +handle irregular point cloud data efficiently. PAConv integrates seamlessly with existing MLP-based pipelines, achieving +state-of-the-art performance in classification and segmentation tasks while maintaining computational efficiency. + +## Model Preparation + +### Prepare Resources + +```bash +cd data/s3dis/ +``` + +Enter the data/s3dis/ folder, then prepare the dataset according to readme instructions in data/s3dis/ folder. + +### Install Dependencies ```bash # Install libGL @@ -18,14 +33,7 @@ cd mmdetection3d pip install -v -e . ``` -## Step 2: Preparing datasets - -```bash -cd data/s3dis/ -``` -Enter the data/s3dis/ folder, then prepare the dataset according to readme instructions in data/s3dis/ folder. - -## Step 3: Training +## Model Training ```bash # Single GPU training @@ -36,13 +44,12 @@ sed -i 's/python /python3 /g' tools/dist_train.sh bash tools/dist_train.sh configs/paconv/paconv_cuda_ssg_8x8_cosine_200e_s3dis_seg-3d-13class.py 8 ``` -## Results +## Model Results -classes | ceiling | floor | wall | beam | column | window | door | table | chair | sofa | bookcase | board | clutter | miou | acc | acc_cls ----------|---------|--------|--------|--------|--------|--------|--------|--------|--------|--------|----------|--------|---------|--------|--------|--------- -results | 0.9488 | 0.9838 | 0.8184 | 0.0000 | 0.1682 | 0.5836 | 0.7387 | 0.7782 | 0.8832 | 0.6101 | 0.7081 | 0.6876 | 0.5810 | 0.6530 | 0.8910 | 0.7131 +| Model | ceiling | floor | wall | beam | column | window | door | table | chair | sofa | bookcase | board | clutter | miou | acc | acc_cls | fps | +|--------|---------|--------|--------|--------|--------|--------|--------|--------|--------|--------|----------|--------|---------|--------|--------|---------|------------------| +| PAConv | 0.9488 | 0.9838 | 0.8184 | 0.0000 | 0.1682 | 0.5836 | 0.7387 | 0.7782 | 0.8832 | 0.6101 | 0.7081 | 0.6876 | 0.5810 | 0.6530 | 0.8910 | 0.7131 | 65.3 samples/sec | -fps = batchsize*8/1batchtime = 65.3 samples/sec +## References -## Reference -[mmdetection3d](https://github.com/open-mmlab/mmdetection3d/tree/v1.4.0) +- [mmdetection3d](https://github.com/open-mmlab/mmdetection3d/tree/v1.4.0) diff --git a/cv/3d_detection/part_a2_anchor/pytorch/README.md b/cv/3d_detection/part_a2_anchor/pytorch/README.md index 66b4549eb0dbbbef9af172d75cfead43311b6431..3e45faedcbde89a30c6a219fb93bf4e77c044b11 100644 --- a/cv/3d_detection/part_a2_anchor/pytorch/README.md +++ b/cv/3d_detection/part_a2_anchor/pytorch/README.md @@ -1,33 +1,16 @@ # Part-A2-Anchor -## Model description +## Model Description -3D object detection from LiDAR point cloud is a challenging problem in 3D scene understanding and has many practical applications. 
In this paper, we extend our preliminary work PointRCNN to a novel and strong point-cloud-based 3D object detection framework, the part-aware and aggregation neural network (Part-A2 net). The whole framework consists of the part-aware stage and the part-aggregation stage. Firstly, the part-aware stage for the first time fully utilizes free-of-charge part supervisions derived from 3D ground-truth boxes to simultaneously predict high quality 3D proposals and accurate intra-object part locations. The predicted intra-object part locations within the same proposal are grouped by our new-designed RoI-aware point cloud pooling module, which results in an effective representation to encode the geometry-specific features of each 3D proposal. Then the part-aggregation stage learns to re-score the box and refine the box location by exploring the spatial relationship of the pooled intra-object part locations. Extensive experiments are conducted to demonstrate the performance improvements from each component of our proposed framework. Our Part-A2 net outperforms all existing 3D detection methods and achieves new state-of-the-art on KITTI 3D object detection dataset by utilizing only the LiDAR point cloud data. +Part-A2-Anchor is an advanced 3D object detection framework for LiDAR point clouds, extending PointRCNN with enhanced +part-aware and aggregation capabilities. It operates in two stages: first, it predicts 3D proposals and intra-object +part locations using free part supervisions; second, it aggregates these parts to refine box scores and locations. This +approach effectively captures object geometry, achieving state-of-the-art performance on the KITTI dataset while +maintaining computational efficiency for practical applications. -## Step 1: Installation +## Model Preparation -```bash -## install libGL and libboost -yum install mesa-libGL -yum install boost-devel - -## switch to devtoolset-7 env -source /opt/rh/devtoolset-7/enable - -# Install spconv -cd toolbox/spconv -bash clean_spconv.sh -bash build_spconv.sh -bash install_spconv.sh - -# Install openpcdet -cd toolbox/openpcdet -pip3 install -r requirements.txt -bash build_openpcdet.sh -bash install_openpcdet.sh -``` - -## Step 2: Preparing datasets +### Prepare Resources Download the kitti dataset from @@ -52,17 +35,36 @@ cd toolbox/openpcdet python3 -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml ``` -## Step 3: Training - -### Single GPU training +### Install Dependencies ```bash -cd tools -python3 train.py --cfg_file cfgs/kitti_models/PartA2.yaml +## install libGL and libboost +yum install mesa-libGL +yum install boost-devel + +## switch to devtoolset-7 env +source /opt/rh/devtoolset-7/enable + +# Install spconv +cd toolbox/spconv +bash clean_spconv.sh +bash build_spconv.sh +bash install_spconv.sh + +# Install openpcdet +cd toolbox/openpcdet +pip3 install -r requirements.txt +bash build_openpcdet.sh +bash install_openpcdet.sh ``` -### Multiple GPU training +## Model Training ```bash +# Single GPU training +cd tools +python3 train.py --cfg_file cfgs/kitti_models/PartA2.yaml + +# Multiple GPU training bash scripts/dist_train.sh 16 --cfg_file cfgs/kitti_models/PartA2.yaml ``` diff --git a/cv/3d_detection/part_a2_free/pytorch/README.md b/cv/3d_detection/part_a2_free/pytorch/README.md index e64bebc0e07c58899f1fb913734985d749de340e..189ba0b7e988602302be37588f0cfa3dfa66ce58 100644 --- a/cv/3d_detection/part_a2_free/pytorch/README.md +++ b/cv/3d_detection/part_a2_free/pytorch/README.md @@ 
-1,30 +1,16 @@ # Part-A2-Free -## Model description +## Model Description -In this work, we propose the part-aware and aggregation neural network (PartA2-Net). The whole framework consists of the part-aware stage and the part-aggregation stage. Firstly, the part-aware stage for the first time fully utilizes free-of-charge part supervisions derived from 3D ground-truth boxes to simultaneously predict high quality 3D proposals and accurate intra-object part locations. The predicted intra-object part locations within the same proposal are grouped by our new-designed RoI-aware point cloud pooling module, which results in an effective representation to encode the geometry-specific features of each 3D proposal. Then the part-aggregation stage learns to re-score the box and refine the box location by exploring the spatial relationship of the pooled intra-object part locations. At the time of submission (July-9 2019), our PartA2-Net outperforms all existing 3D detection methods and achieves new state-of-the-art on KITTI 3D object detection learderbaord by utilizing only the LiDAR point cloud data. +Part-A2-Free is an advanced 3D object detection framework for LiDAR point clouds, leveraging part-aware and aggregation +techniques. It operates in two stages: first predicting 3D proposals and intra-object part locations using free part +supervisions, then aggregating these parts to refine box scores and locations. This approach effectively captures object +geometry through a novel RoI-aware point cloud pooling module, achieving state-of-the-art performance on the KITTI +dataset while maintaining computational efficiency for practical applications. -## Step 1: Installation +## Model Preparation -```bash -## install libGL and libboost -yum install mesa-libGL -yum install boost-devel - -# Install spconv -cd toolbox/spconv -bash clean_spconv.sh -bash build_spconv.sh -bash install_spconv.sh - -# Install openpcdet -cd toolbox/openpcdet -pip3 install -r requirements.txt -bash build_openpcdet.sh -bash install_openpcdet.sh -``` - -## Step 2: Preparing datasets +### Prepare Resources Download the kitti dataset from @@ -49,17 +35,33 @@ cd toolbox/openpcdet python3 -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml ``` -## Step 3: Training - -### Single GPU training +### Install Dependencies ```bash -cd tools -python3 train.py --cfg_file cfgs/kitti_models/PartA2_free.yaml +## install libGL and libboost +yum install mesa-libGL +yum install boost-devel + +# Install spconv +cd toolbox/spconv +bash clean_spconv.sh +bash build_spconv.sh +bash install_spconv.sh + +# Install openpcdet +cd toolbox/openpcdet +pip3 install -r requirements.txt +bash build_openpcdet.sh +bash install_openpcdet.sh ``` -### Multiple GPU training +## Model Training ```bash +# Single GPU training +cd tools/ +python3 train.py --cfg_file cfgs/kitti_models/PartA2_free.yaml + +# Multiple GPU training bash scripts/dist_train.sh 16 --cfg_file cfgs/kitti_models/PartA2_free.yaml ``` diff --git a/cv/3d_detection/pointnet2/pytorch/README.md b/cv/3d_detection/pointnet2/pytorch/README.md index 3b8dfcdc7444b7c2509206500459d79f7a927354..bd3547177d56425eabe102dd35768eec5848fbe4 100644 --- a/cv/3d_detection/pointnet2/pytorch/README.md +++ b/cv/3d_detection/pointnet2/pytorch/README.md @@ -1,10 +1,25 @@ -# PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space -> [PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space](https://arxiv.org/abs/1706.02413) +# PointNet++ 
-## Model description -Few prior works study deep learning on point sets. PointNet by Qi et al. is a pioneer in this direction. However, by design PointNet does not capture local structures induced by the metric space points live in, limiting its ability to recognize fine-grained patterns and generalizability to complex scenes. In this work, we introduce a hierarchical neural network that applies PointNet recursively on a nested partitioning of the input point set. By exploiting metric space distances, our network is able to learn local features with increasing contextual scales. With further observation that point sets are usually sampled with varying densities, which results in greatly decreased performance for networks trained on uniform densities, we propose novel set learning layers to adaptively combine features from multiple scales. +## Model Description + +PointNet++ is a hierarchical neural network for processing 3D point cloud data, extending the capabilities of PointNet. +It recursively applies PointNet on nested partitions of the input point set, enabling the learning of local features at +multiple scales. The network adapts to varying point densities through novel set learning layers, improving performance +on complex scenes. PointNet++ excels in tasks like 3D object classification and segmentation by effectively capturing +fine-grained geometric patterns in point clouds. + +## Model Preparation + +### Prepare Resources + +```bash +cd data/s3dis/ +``` + +Enter the data/s3dis/ folder, then prepare the dataset according to readme instructions in data/s3dis/ folder. + +### Install Dependencies -## Installing packages ```bash # Install libGL ## CentOS @@ -18,13 +33,8 @@ cd mmdetection3d pip install -v -e . ``` -## Prepare S3DIS Data -``` -cd data/s3dis/ -``` -Enter the data/s3dis/ folder, then prepare the dataset according to readme instructions in data/s3dis/ folder. 
+## Model Training -## Training ```bash # Single GPU training python3 tools/train.py configs/pointnet2/pointnet2_msg_2xb16-cosine-80e_s3dis-seg.py @@ -34,10 +44,12 @@ sed -i 's/python /python3 /g' tools/dist_train.sh bash tools/dist_train.sh configs/pointnet2/pointnet2_msg_2xb16-cosine-80e_s3dis-seg.py 8 ``` -## Training Results -| Classes | ceiling | floor | wall | beam | column | window | door | table | chair | sofa | bookcase | board | clutter | miou | acc | acc_cls | -| --------| ------- | ----- | ------ |------ |------ |------ |------ |------ |------ |------ |------ |------ |------ |------ |------ |------ | -| Results | 0.9147 | 0.9742 | 0.7800 | 0.0000 | 0.1881 | 0.5361 | 0.2265 | 0.6922 | 0.8249 | 0.3303 | 0.6585 | 0.5422 | 0.4607 | 0.5483 | 0.8490 | 0.6168 | +## Model Results + +| Model | ceiling | floor | wall | beam | column | window | door | table | chair | sofa | bookcase | board | clutter | miou | acc | acc_cls | +|------------|---------|--------|--------|--------|--------|--------|--------|--------|--------|--------|----------|--------|---------|--------|--------|---------| +| PointNet++ | 0.9147 | 0.9742 | 0.7800 | 0.0000 | 0.1881 | 0.5361 | 0.2265 | 0.6922 | 0.8249 | 0.3303 | 0.6585 | 0.5422 | 0.4607 | 0.5483 | 0.8490 | 0.6168 | + +## References -## Reference -[mmdetection3d](https://github.com/open-mmlab/mmdetection3d/tree/v1.4.0) \ No newline at end of file +- [mmdetection3d](https://github.com/open-mmlab/mmdetection3d/tree/v1.4.0) diff --git a/cv/3d_detection/pointpillars/pytorch/README.md b/cv/3d_detection/pointpillars/pytorch/README.md index 29d0d0c37c4e901adfa482e929816096daf194f0..ce5c69321f080213507cdce0f47e51a182cfc200 100755 --- a/cv/3d_detection/pointpillars/pytorch/README.md +++ b/cv/3d_detection/pointpillars/pytorch/README.md @@ -1,13 +1,86 @@ # PointPillars -## Model description -A Simple PointPillars PyTorch Implenmentation for 3D Lidar(KITTI) Detection. +## Model Description -- It can be run without installing [mmcv](https://github.com/open-mmlab/mmcv), [Spconv](https://github.com/traveller59/spconv), [mmdet](https://github.com/open-mmlab/mmdetection) or [mmdet3d](https://github.com/open-mmlab/mmdetection3d). -- Only one detection network (PointPillars) was implemented in this repo, so the code may be more easy to read. -- Sincere thanks for the great open-souce architectures [mmcv](https://github.com/open-mmlab/mmcv), [mmdet](https://github.com/open-mmlab/mmdetection) and [mmdet3d](https://github.com/open-mmlab/mmdetection3d), which helps me to learn 3D detetion and implement this repo. +PointPillars is an efficient 3D object detection framework designed for LiDAR point cloud data. It organizes point +clouds into vertical columns (pillars) to create a pseudo-image representation, enabling the use of 2D convolutional +networks for processing. This approach balances accuracy and speed, making it suitable for real-time applications like +autonomous driving. PointPillars achieves state-of-the-art performance on the KITTI dataset while maintaining +computational efficiency through its pillar-based encoding and simplified network architecture. 
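+
+The pillar-to-pseudo-image step described above amounts to a simple scatter of encoded pillar features onto a BEV
+grid. The following minimal sketch illustrates that idea; the grid size, tensor shapes, and function name are
+illustrative assumptions for exposition, not this repository's API.
+
+```python
+import torch
+
+def scatter_pillars(pillar_features, coords, grid_h=496, grid_w=432):
+    """Scatter per-pillar feature vectors onto a BEV grid, forming the pseudo-image for a 2D CNN."""
+    # pillar_features: (P, C) encoded pillar features; coords: (P, 2) integer (row, col) grid indices
+    channels = pillar_features.shape[1]
+    canvas = pillar_features.new_zeros(channels, grid_h * grid_w)
+    flat_idx = coords[:, 0] * grid_w + coords[:, 1]
+    canvas[:, flat_idx] = pillar_features.t()
+    return canvas.view(channels, grid_h, grid_w)
+```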
+ +## Model Preparation + +### Prepare Resources + +Download: + +- [point cloud (29GB)](https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_velodyne.zip) +- [images (12 GB)](https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_image_2.zip) +- [calibration files (16 MB)](https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_calib.zip) +- [labels (5 MB)](https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_label_2.zip) + +Format the datasets as follows: + +```bash +kitti + |- ImageSets + |- train.txt + |- val.txt + |- test.txt + |- trainval.txt + |- training + |- calib (#7481 .txt) + |- image_2 (#7481 .png) + |- label_2 (#7481 .txt) + |- velodyne (#7481 .bin) + |- testing + |- calib (#7518 .txt) + |- image_2 (#7518 .png) + |- velodyne (#7418 .bin) +``` + +The train.txt、val.txt、test.txt and trainval.txt you can get from: + +```bash +wget https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/test.txt +wget https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/train.txt +wget https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/val.txt +wget https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/trainval.txt +``` + +Pre-process KITTI datasets First. + +```bash +ln -s path/to/kitti/ImageSets ./dataset +python3 pre_process_kitti.py --data_root your_path_to_kitti +``` + +Now, we have datasets as follows: + +```bash +kitti + |- training + |- calib (#7481 .txt) + |- image_2 (#7481 .png) + |- label_2 (#7481 .txt) + |- velodyne (#7481 .bin) + |- velodyne_reduced (#7481 .bin) + |- testing + |- calib (#7518 .txt) + |- image_2 (#7518 .png) + |- velodyne (#7518 .bin) + |- velodyne_reduced (#7518 .bin) + |- kitti_gt_database (# 19700 .bin) + |- kitti_infos_train.pkl + |- kitti_infos_val.pkl + |- kitti_infos_trainval.pkl + |- kitti_infos_test.pkl + |- kitti_dbinfos_train.pkl + +``` + +### Install Dependencies -## [Compile] ```bash # Install libGL ## CentOS @@ -22,72 +95,13 @@ python3 setup.py build_ext --inplace pip install . ``` -## [Datasets] - -1. Download - - Download [point cloud](https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_velodyne.zip)(29GB), [images](https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_image_2.zip)(12 GB), [calibration files](https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_calib.zip)(16 MB)和[labels](https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_label_2.zip)(5 MB)。 - Format the datasets as follows: - ``` - kitti - |- ImageSets - |- train.txt - |- val.txt - |- test.txt - |- trainval.txt - |- training - |- calib (#7481 .txt) - |- image_2 (#7481 .png) - |- label_2 (#7481 .txt) - |- velodyne (#7481 .bin) - |- testing - |- calib (#7518 .txt) - |- image_2 (#7518 .png) - |- velodyne (#7418 .bin) - ``` - The train.txt、val.txt、test.txt and trainval.txt you can get from: - ``` - wget https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/test.txt - wget https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/train.txt - wget https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/val.txt - wget https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/trainval.txt - ``` -2. 
Pre-process KITTI datasets First - - ``` - ln -s path/to/kitti/ImageSets ./dataset - python3 pre_process_kitti.py --data_root your_path_to_kitti - ``` - - Now, we have datasets as follows: - ``` - kitti - |- training - |- calib (#7481 .txt) - |- image_2 (#7481 .png) - |- label_2 (#7481 .txt) - |- velodyne (#7481 .bin) - |- velodyne_reduced (#7481 .bin) - |- testing - |- calib (#7518 .txt) - |- image_2 (#7518 .png) - |- velodyne (#7518 .bin) - |- velodyne_reduced (#7518 .bin) - |- kitti_gt_database (# 19700 .bin) - |- kitti_infos_train.pkl - |- kitti_infos_val.pkl - |- kitti_infos_trainval.pkl - |- kitti_infos_test.pkl - |- kitti_dbinfos_train.pkl - - ``` - -## [Training] - -### Single GPU training +## Model Training + ```bash +# Single GPU training python3 train.py --data_root your_path_to_kitti ``` -## Reference -[PointPillars](https://github.com/zhulf0804/PointPillars/tree/620e6b0d07e4cb37b7b0114f26b934e8be92a0ba) \ No newline at end of file +## References + +- [PointPillars](https://github.com/zhulf0804/PointPillars/tree/620e6b0d07e4cb37b7b0114f26b934e8be92a0ba) diff --git a/cv/3d_detection/pointrcnn/pytorch/README.md b/cv/3d_detection/pointrcnn/pytorch/README.md index 573be6585caabaae9630ef7cdcc2065830443616..42d36c3dd3a5238facb375d46d0fdf2afa4846cf 100644 --- a/cv/3d_detection/pointrcnn/pytorch/README.md +++ b/cv/3d_detection/pointrcnn/pytorch/README.md @@ -1,31 +1,22 @@ # PointRCNN -## Model description -PointRCNN 3D object detector to directly generated accurate 3D box proposals from raw point cloud in a bottom-up manner, which are then refined in the canonical coordinate by the proposed bin-based 3D box regression loss. To the best of our knowledge, PointRCNN is the first two-stage 3D object detector for 3D object detection by using only the raw point cloud as input. PointRCNN is evaluated on the KITTI dataset and achieves state-of-the-art performance on the KITTI 3D object detection leaderboard among all published works at the time of submission. +## Model Description -## Step 1: Installation -```bash -## install libGL -yum install -y mesa-libGL +PointRCNN is a two-stage 3D object detection framework that directly processes raw point cloud data. In the first stage, +it generates accurate 3D box proposals in a bottom-up manner. The second stage refines these proposals using a bin-based +3D box regression loss in canonical coordinates. As the first two-stage detector using only raw point clouds, PointRCNN +achieves state-of-the-art performance on the KITTI dataset, demonstrating superior accuracy in 3D object detection +tasks. 
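+
+The bin-based regression mentioned above is not spelled out in this README. As a rough illustration only (the bin
+size, tensor shapes, and function name below are assumptions for exposition, not this repository's API), the idea is
+to turn each localization target into a coarse bin classification plus a fine residual regression inside the chosen
+bin:
+
+```python
+import torch
+import torch.nn.functional as F
+
+def bin_based_loss(bin_logits, bin_residuals, target, bin_size=0.5):
+    """Classify the coarse bin a target falls in, then regress the residual within that bin."""
+    num_bins = bin_logits.shape[1]
+    shifted = target + num_bins * bin_size / 2          # shift so bin 0 starts at the search-range minimum
+    target_bin = torch.clamp(torch.floor(shifted / bin_size).long(), 0, num_bins - 1)
+    residual = shifted - target_bin.float() * bin_size  # offset of the target inside its bin
+    cls_loss = F.cross_entropy(bin_logits, target_bin)
+    pred_residual = bin_residuals.gather(1, target_bin.unsqueeze(1)).squeeze(1)
+    reg_loss = F.smooth_l1_loss(pred_residual, residual)
+    return cls_loss + reg_loss
+```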
-pip3 install easydict tensorboardX shapely fire scikit-image +## Model Preparation -bash build_and_install.sh +### Prepare Resources -## install numba -pushd numba/ -bash clean_numba.sh -bash build_numba.sh -bash install_numba.sh -popd -``` - -## Step 2: Preparing datasets Download the kitti dataset from Download the "planes" subdataset from -``` +```bash PointRCNN ├── data │ ├── KITTI @@ -41,60 +32,64 @@ PointRCNN ``` Generate gt database + ```bash pushd tools/ python3 generate_gt_database.py --class_name 'Car' --split train popd ``` - -## Step 3: Training -### Training of RPN stage +### Install Dependencies ```bash -pushd tools/ +## install libGL +yum install -y mesa-libGL -# Single GPU training -export CUDA_VISIBLE_DEVICES=0 -python3 train_rcnn.py --cfg_file cfgs/default.yaml --batch_size 16 --train_mode rpn --epochs 200 +pip3 install easydict tensorboardX shapely fire scikit-image -# Multiple GPU training -CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 train_rcnn.py --cfg_file cfgs/default.yaml --batch_size 32 --train_mode rpn --epochs 200 --mgpus +bash build_and_install.sh +## install numba +pushd numba/ +bash clean_numba.sh +bash build_numba.sh +bash install_numba.sh popd ``` -### Training of RCNN stage +## Model Training ```bash -pushd tools/ +cd tools/ -# Single GPU training +# Training of RPN stage +## Single GPU training export CUDA_VISIBLE_DEVICES=0 -python3 train_rcnn.py --cfg_file cfgs/default.yaml --batch_size 32 --train_mode rcnn --epochs 70 --ckpt_save_interval 2 --rpn_ckpt ../output/rpn/default/ckpt/checkpoint_epoch_200.pth +python3 train_rcnn.py --cfg_file cfgs/default.yaml --batch_size 16 --train_mode rpn --epochs 200 -# Multiple GPU training -CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 train_rcnn.py --cfg_file cfgs/default.yaml --batch_size 32 --train_mode rcnn --epochs 70 --ckpt_save_interval 2 --rpn_ckpt ../output/rpn/default/ckpt/checkpoint_epoch_200.pth --mgpus +## Multiple GPU training +CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 train_rcnn.py --cfg_file cfgs/default.yaml --batch_size 32 --train_mode rpn --epochs 200 --mgpus -popd -``` -## Step 4: Evaluation +# Training of RCNN stage +## Single GPU training +export CUDA_VISIBLE_DEVICES=0 +python3 train_rcnn.py --cfg_file cfgs/default.yaml --batch_size 32 --train_mode rcnn --epochs 70 --ckpt_save_interval 2 --rpn_ckpt ../output/rpn/default/ckpt/checkpoint_epoch_200.pth -```bash -pushd tools/ +## Multiple GPU training +CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 train_rcnn.py --cfg_file cfgs/default.yaml --batch_size 32 --train_mode rcnn --epochs 70 --ckpt_save_interval 2 --rpn_ckpt ../output/rpn/default/ckpt/checkpoint_epoch_200.pth --mgpus +# Evaluation python3 eval_rcnn.py --cfg_file cfgs/default.yaml --ckpt ../output/rpn/default/ckpt/checkpoint_epoch_200.pth --batch_size 4 --eval_mode rpn python3 eval_rcnn.py --cfg_file cfgs/default.yaml --ckpt ../output/rcnn/default/ckpt/checkpoint_epoch_70.pth --batch_size 4 --eval_mode rcnn - -popd ``` -## Results +## Model Results + +| Model | GPU | Stage | FPS | ACC | +|-----------|------------|-------|-------------|-----------------------| +| PointRCNN | BI-V100 x8 | RPN | 127.56 s/it | iou avg: 0.5417 | +| PointRCNN | BI-V100 x8 | RCNN | 975.71 s/it | avg detections: 7.243 | -GPUs|Stage|FPS|ACC -----|-----|---|--- -BI-V100 x8|RPN| 127.56 s/it | iou avg: 0.5417 -BI-V100 x8|RCNN| 975.71 s/it | avg detections: 7.243 +## References -## Reference - [PointRCNN](https://github.com/sshaoshuai/PointRCNN) diff --git a/cv/3d_detection/pointrcnn_iou/pytorch/README.md 
b/cv/3d_detection/pointrcnn_iou/pytorch/README.md index 3c3feb0038e99b01130e48eb19608bbf6a303189..fe3b2e578d6b33a17ad1f46da061b3fedf388cc4 100644 --- a/cv/3d_detection/pointrcnn_iou/pytorch/README.md +++ b/cv/3d_detection/pointrcnn_iou/pytorch/README.md @@ -1,38 +1,16 @@ # PointRCNN-IoU -## Model description +## Model Description -PointRCNN-IoU is an extension of the PointRCNN object detection framework that incorporates Intersection over Union (IoU) as a metric for evaluation. IoU is a common metric used in object detection tasks to measure the overlap between predicted bounding boxes and ground truth bounding boxes. +PointRCNN-IoU is an enhanced version of the PointRCNN framework that incorporates Intersection over Union (IoU) +optimization for 3D object detection. It processes raw point cloud data in two stages: first generating 3D proposals, +then refining them with IoU-aware regression. This approach improves bounding box accuracy by directly optimizing the +overlap between predicted and ground truth boxes. PointRCNN-IoU maintains the efficiency of its predecessor while +achieving higher precision in 3D object detection tasks. -## Step 1: Installation +## Model Preparation -```bash -## install libGL and libboost -yum install mesa-libGL -yum install boost-devel - -# Install numba -pushd /toolbox/numba -python3 setup.py bdist_wheel -d build_pip 2>&1 | tee compile.log -pip3 install build_pip/numba-0.56.4-cp310-cp310-linux_x86_64.whl -popd - -# Install spconv -pushd /toolbox/spconv -bash clean_spconv.sh -bash build_spconv.sh -bash install_spconv.sh -popd - -# Install openpcdet -pushd /toolbox/openpcdet -pip3 install -r requirements.txt -bash build_openpcdet.sh -bash install_openpcdet.sh -popd -``` - -## Step 2: Preparing datasets +### Prepare Resources Download the kitti dataset from @@ -57,17 +35,41 @@ cd /toolbox/openpcdet python3 -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml ``` -## Step 3: Training - -### Single GPU training +### Install Dependencies ```bash -cd tools -python3 train.py --cfg_file cfgs/kitti_models/pointrcnn_iou.yaml +## install libGL and libboost +yum install mesa-libGL +yum install boost-devel + +# Install numba +pushd /toolbox/numba +python3 setup.py bdist_wheel -d build_pip 2>&1 | tee compile.log +pip3 install build_pip/numba-0.56.4-cp310-cp310-linux_x86_64.whl +popd + +# Install spconv +pushd /toolbox/spconv +bash clean_spconv.sh +bash build_spconv.sh +bash install_spconv.sh +popd + +# Install openpcdet +pushd /toolbox/openpcdet +pip3 install -r requirements.txt +bash build_openpcdet.sh +bash install_openpcdet.sh +popd ``` -### Multiple GPU training +## Model Training ```bash +# Single GPU training +cd tools/ +python3 train.py --cfg_file cfgs/kitti_models/pointrcnn_iou.yaml + +# Multiple GPU training bash scripts/dist_train.sh 16 --cfg_file cfgs/kitti_models/pointrcnn_iou.yaml ``` diff --git a/cv/3d_detection/second/pytorch/README.md b/cv/3d_detection/second/pytorch/README.md index 52d028ddd67662c31816ea4524e763d73098ddf6..e28fe30e969bb25f85812f254940a005a0d1a4ff 100644 --- a/cv/3d_detection/second/pytorch/README.md +++ b/cv/3d_detection/second/pytorch/README.md @@ -1,38 +1,16 @@ # SECOND -## Model description +## Model Description -LiDAR-based or RGB-D-based object detection is used in numerous applications, ranging from autonomous driving to robot vision. Voxel-based 3D convolutional networks have been used for some time to enhance the retention of information when processing point cloud LiDAR data. 
However, problems remain, including a slow inference speed and low orientation estimation performance. We therefore investigate an improved sparse convolution method for such networks, which significantly increases the speed of both training and inference. We also introduce a new form of angle loss regression to improve the orientation estimation performance and a new data augmentation approach that can enhance the convergence speed and performance. The proposed network produces state-of-the-art results on the KITTI 3D object detection benchmarks while maintaining a fast inference speed. +SECOND is an efficient 3D object detection framework for LiDAR point cloud data, utilizing sparse convolutional networks +to enhance information retention. It introduces improved sparse convolution methods for faster training and inference, +along with novel angle loss regression for better orientation estimation. The framework also incorporates a unique data +augmentation approach to boost convergence speed and performance. SECOND achieves state-of-the-art results on the KITTI +benchmark while maintaining rapid inference, making it suitable for real-time applications like autonomous driving. -## Step 1: Installation +## Model Preparation -```bash -## install libGL and libboost -yum install mesa-libGL -yum install boost-devel - -# Install numba -pushd /toolbox/numba -python3 setup.py bdist_wheel -d build_pip 2>&1 | tee compile.log -pip3 install build_pip/numba-0.56.4-cp310-cp310-linux_x86_64.whl -popd - -# Install spconv -pushd /toolbox/spconv -bash clean_spconv.sh -bash build_spconv.sh -bash install_spconv.sh -popd - -# Install openpcdet -pushd /toolbox/openpcdet -pip3 install -r requirements.txt -bash build_openpcdet.sh -bash install_openpcdet.sh -popd -``` - -## Step 2: Preparing datasets +### Prepare Resources Download the kitti dataset from @@ -57,17 +35,41 @@ cd /toolbox/openpcdet python3 -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml ``` -## Step 3: Training - -### Single GPU training +### Install Dependencies ```bash -cd tools -python3 train.py --cfg_file cfgs/kitti_models/second.yaml +## install libGL and libboost +yum install mesa-libGL +yum install boost-devel + +# Install numba +pushd /toolbox/numba +python3 setup.py bdist_wheel -d build_pip 2>&1 | tee compile.log +pip3 install build_pip/numba-0.56.4-cp310-cp310-linux_x86_64.whl +popd + +# Install spconv +pushd /toolbox/spconv +bash clean_spconv.sh +bash build_spconv.sh +bash install_spconv.sh +popd + +# Install openpcdet +pushd /toolbox/openpcdet +pip3 install -r requirements.txt +bash build_openpcdet.sh +bash install_openpcdet.sh +popd ``` -### Multiple GPU training +## Model Training ```bash +# Single GPU training +cd tools/ +python3 train.py --cfg_file cfgs/kitti_models/second.yaml + +# Multiple GPU training bash scripts/dist_train.sh 16 --cfg_file cfgs/kitti_models/second.yaml ``` diff --git a/cv/3d_detection/second_iou/pytorch/README.md b/cv/3d_detection/second_iou/pytorch/README.md index aaef1ec2c077e8658863e3542455a54f485e1ee3..a36cabb15c85ba7a820e14bf535891d8c1535997 100644 --- a/cv/3d_detection/second_iou/pytorch/README.md +++ b/cv/3d_detection/second_iou/pytorch/README.md @@ -1,38 +1,17 @@ # SECOND-IoU -## Model description +## Model Description -we present a novel approach called SECOND (Sparsely Embedded CONvolutional Detection), which addresses these challenges in 3D convolution-based detection by maximizing the use of the rich 3D information present in point cloud data. 
This method incorporates several improvements to the existing convolutional network architecture. Spatially sparse convolutional networks are introduced for LiDAR-based detection and are used to extract information from the z-axis before the 3D data are downsampled to something akin to 2D image data. +SECOND-IoU is an enhanced version of the SECOND framework that incorporates Intersection over Union (IoU) optimization +for 3D object detection from LiDAR point clouds. It leverages sparse convolutional networks to efficiently process 3D +data while maintaining spatial information. The model introduces IoU-aware regression to improve bounding box accuracy +and orientation estimation. SECOND-IoU achieves state-of-the-art performance on 3D detection benchmarks, offering faster +inference speeds and better precision than traditional methods, making it suitable for real-time applications like +autonomous driving. -## Step 1: Installation +## Model Preparation -```bash -## install libGL and libboost -yum install mesa-libGL -yum install boost-devel - -# Install numba -pushd /toolbox/numba -python3 setup.py bdist_wheel -d build_pip 2>&1 | tee compile.log -pip3 install build_pip/numba-0.56.4-cp310-cp310-linux_x86_64.whl -popd - -# Install spconv -pushd /toolbox/spconv -bash clean_spconv.sh -bash build_spconv.sh -bash install_spconv.sh -popd - -# Install openpcdet -pushd /toolbox/openpcdet -pip3 install -r requirements.txt -bash build_openpcdet.sh -bash install_openpcdet.sh -popd -``` - -## Step 2: Preparing datasets +### Prepare Resources Download the kitti dataset from @@ -57,17 +36,41 @@ cd /toolbox/openpcdet python3 -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml ``` -## Step 3: Training - -### Single GPU training +### Install Dependencies ```bash -cd tools -python3 train.py --cfg_file cfgs/kitti_models/second_iou.yaml +## install libGL and libboost +yum install mesa-libGL +yum install boost-devel + +# Install numba +pushd /toolbox/numba +python3 setup.py bdist_wheel -d build_pip 2>&1 | tee compile.log +pip3 install build_pip/numba-0.56.4-cp310-cp310-linux_x86_64.whl +popd + +# Install spconv +pushd /toolbox/spconv +bash clean_spconv.sh +bash build_spconv.sh +bash install_spconv.sh +popd + +# Install openpcdet +pushd /toolbox/openpcdet +pip3 install -r requirements.txt +bash build_openpcdet.sh +bash install_openpcdet.sh +popd ``` -### Multiple GPU training +## Model Training ```bash +# Single GPU training +cd tools/ +python3 train.py --cfg_file cfgs/kitti_models/second_iou.yaml + +# Multiple GPU training bash scripts/dist_train.sh 16 --cfg_file cfgs/kitti_models/second_iou.yaml ``` diff --git a/cv/classification/README.md b/cv/classification/README.md deleted file mode 100644 index 468826e5f6f9e564d7db452c5ab59bba5f9e345a..0000000000000000000000000000000000000000 --- a/cv/classification/README.md +++ /dev/null @@ -1 +0,0 @@ -# Image Classification diff --git a/cv/classification/acmix/pytorch/README.md b/cv/classification/acmix/pytorch/README.md index a64b45247375cf35be8baf5e933151dbd7cb1442..724ed82fc23d33e88acace40dbb2b3717b071e0a 100644 --- a/cv/classification/acmix/pytorch/README.md +++ b/cv/classification/acmix/pytorch/README.md @@ -1,18 +1,20 @@ # ACmix -## Model description +## Model Description -Convolution and self-attention are two powerful techniques for representation learning, and they are usually considered as two peer approaches that are distinct from each other. 
In this paper, we show that there exists a strong underlying relation between them, in the sense that the bulk of computations of these two paradigms are in fact done with the same operation. Specifically, we first show that a traditional convolution with kernel size k x k can be decomposed into k^2 individual 1x1 convolutions, followed by shift and summation operations. Then, we interpret the projections of queries, keys, and values in self-attention module as multiple 1x1 convolutions, followed by the computation of attention weights and aggregation of the values. Therefore, the first stage of both two modules comprises the similar operation. More importantly, the first stage contributes a dominant computation complexity (square of the channel size) comparing to the second stage. This observation naturally leads to an elegant integration of these two seemingly distinct paradigms, i.e., a mixed model that enjoys the benefit of both self-Attention and Convolution (ACmix), while having minimum computational overhead compared to the pure convolution or self-attention counterpart. Extensive experiments show that our model achieves consistently improved results over competitive baselines on image recognition and downstream tasks. Code and pre-trained models will be released at https://github.com/LeapLabTHU/ACmix and https://gitee.com/mindspore/models. +ACmix is an innovative deep learning model that unifies convolution and self-attention mechanisms by revealing their +shared computational foundation. It demonstrates that both operations can be decomposed into 1x1 convolutions followed +by different aggregation strategies. This insight enables ACmix to efficiently combine the benefits of both paradigms - +the local feature extraction of convolutions and the global context modeling of self-attention. The model achieves +improved performance on image recognition tasks with minimal computational overhead compared to pure convolution or +attention-based approaches. -## Step 1: Installing packages -```bash -git clone https://github.com/LeapLabTHU/ACmix.git -pip install termcolor==1.1.0 yacs==0.1.8 timm==0.4.5 -cd ACmix/Swin-Transformer -git checkout 81dddb6dff98f5e238a7fb6ab174e256489c07fa -``` +## Model Preparation + +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
The ImageNet dataset path structure should look like: @@ -30,21 +32,32 @@ imagenet └── val_list.txt ``` -## Step 2: Training +### Install Dependencies -### Swin-S + ACmix on ImageNet using 8 cards: ```bash -# fix --local-rank for torch 2.x +git clone https://github.com/LeapLabTHU/ACmix.git +pip install termcolor==1.1.0 yacs==0.1.8 timm==0.4.5 +cd ACmix/Swin-Transformer +git checkout 81dddb6dff98f5e238a7fb6ab174e256489c07fa +``` + +## Model Training + +```bash +# Swin-S + ACmix on ImageNet using 8 cards + +## fix --local-rank for torch 2.x sed -i 's/--local_rank/--local-rank/g' main.py + python3 -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py --cfg configs/acmix_swin_small_patch4_window7_224.yaml --data-path /path/to/imagenet --batch-size 128 ``` -## Results on BI-V100 +## Model Results -| card | batch_size | Single Card | 8 Cards | -|:-----|------------|------------:|:-------:| -| BI | 128 | 63.59 | 502.22 | +| Model | GPU | batch_size | Single Card | 8 Cards | +|-------|---------|------------|-------------|---------| +| ACmix | BI-V100 | 128 | 63.59 | 502.22 | +## References -## Reference -[acmix](https://github.com/leaplabthu/acmix) \ No newline at end of file +- [acmix](https://github.com/leaplabthu/acmix) diff --git a/cv/classification/acnet/pytorch/README.md b/cv/classification/acnet/pytorch/README.md index 3a6b9053589024d34ad134d527ccdfc7897c3799..65a5e803771abfa718722e6e06dd3d6cbb0ff3dd 100755 --- a/cv/classification/acnet/pytorch/README.md +++ b/cv/classification/acnet/pytorch/README.md @@ -1,17 +1,20 @@ # ACNet -## Model description -As designing appropriate Convolutional Neural Network (CNN) architecture in the context of a given application usually involves heavy human works or numerous GPU hours, the research community is soliciting the architecture-neutral CNN structures, which can be easily plugged into multiple mature architectures to improve the performance on our real-world applications. We propose Asymmetric Convolution Block (ACB), an architecture-neutral structure as a CNN building block, which uses 1D asymmetric convolutions to strengthen the square convolution kernels. For an off-the-shelf architecture, we replace the standard square-kernel convolutional layers with ACBs to construct an Asymmetric Convolutional Network (ACNet), which can be trained to reach a higher level of accuracy. After training, we equivalently convert the ACNet into the same original architecture, thus requiring no extra computations anymore. We have observed that ACNet can improve the performance of various models on CIFAR and ImageNet by a clear margin. Through further experiments, we attribute the effectiveness of ACB to its capability of enhancing the model's robustness to rotational distortions and strengthening the central skeleton parts of square convolution kernels. +## Model Description -## Step 1: Installation +ACNet (Asymmetric Convolutional Network) is an innovative CNN architecture that enhances model performance through +Asymmetric Convolution Blocks (ACBs). These blocks use 1D asymmetric convolutions to strengthen standard square +convolution kernels, improving robustness to rotational distortions and reinforcing central kernel structures. ACNet can +be seamlessly integrated into existing architectures, boosting accuracy without additional inference costs. After +training, ACNet converts back to the original architecture, maintaining efficiency. 
It demonstrates consistent +performance improvements across various models on datasets like CIFAR and ImageNet. -```bash -git clone https://github.com/DingXiaoH/ACNet.git -cd ACNet -git checkout 748fb0c734b41c48eacaacf7fc5e851e33a63ce8 -``` +## Model Preparation + +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -29,7 +32,15 @@ imagenet └── val_list.txt ``` -## Step 2: Training +### Install Dependencies + +```bash +git clone https://github.com/DingXiaoH/ACNet.git +cd ACNet/ +git checkout 748fb0c734b41c48eacaacf7fc5e851e33a63ce8 +``` + +## Model Training ```bash ln -s /path/to/imagenet imagenet_data @@ -37,27 +48,26 @@ rm -rf acnet/acb.py rm -rf utils/misc.py mv ../acb.py acnet/ mv ../misc.py utils/ + # fix --local-rank for torch 2.x sed -i 's/--local_rank/--local-rank/g' acnet/do_acnet.py export PYTHONPATH=$PYTHONPATH:. -``` -### One single GPU -```bash +# One single GPU export CUDA_VISIBLE_DEVICES=0 python3 -m torch.distributed.launch --nproc_per_node=1 acnet/do_acnet.py -a sres18 -b acb -``` -### 8 GPUs on one machine -```bash + +# 8 GPUs on one machine export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 -m torch.distributed.launch --nproc_per_node=8 acnet/do_acnet.py -a sres18 -b acb ``` -## Results +## Model Results + +| Model | GPU | ACC | FPS | +|-------|------------|-----------------------------|----------| +| ACNet | BI-V100 ×8 | top1=71.27000,top5=90.00800 | 5.78it/s | -| GPUS | ACC | FPS | -| ----------| ------------------------------|---------| -| BI V100×8 | top1=71.27000,top5=90.00800 | 5.78it/s| +## References -## Reference -- [ACNet](https://github.com/DingXiaoH/ACNet/tree/748fb0c734b41c48eacaacf7fc5e851e33a63ce8) \ No newline at end of file +- [ACNet](https://github.com/DingXiaoH/ACNet/tree/748fb0c734b41c48eacaacf7fc5e851e33a63ce8) diff --git a/cv/classification/alexnet/pytorch/README.md b/cv/classification/alexnet/pytorch/README.md index d6c665b094f7dd72520e738db71534d48b168278..699735310d33909d8231032fa480885bafb3b89e 100644 --- a/cv/classification/alexnet/pytorch/README.md +++ b/cv/classification/alexnet/pytorch/README.md @@ -1,14 +1,21 @@ # AlexNet -## Model description -AlexNet is a classic convolutional neural network architecture. It consists of convolutions, max pooling and dense layers as the basic building blocks. -## Step 1: Installing +## Model Description -```bash -pip3 install torch -pip3 install torchvision -``` -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +AlexNet is a groundbreaking deep convolutional neural network that revolutionized computer vision. It introduced key +innovations like ReLU activations, dropout regularization, and GPU acceleration. With its 8-layer architecture featuring +5 convolutional and 3 fully-connected layers, AlexNet achieved record-breaking performance on ImageNet in 2012. 
Its +success popularized deep learning and established CNNs as the dominant approach for image recognition. AlexNet's design +principles continue to influence modern neural network architectures in computer vision applications.AlexNet is a +classic convolutional neural network architecture. It consists of convolutions, max pooling and dense layers as the +basic building blocks. + +## Model Preparation + +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -26,28 +33,33 @@ imagenet └── val_list.txt ``` -## Step 2: Training +### Install Dependencies ```bash -cd start_scripts +pip3 install torch +pip3 install torchvision ``` -### One single GPU +## Model Training + ```bash -bash train_alexnet_torch.sh --data-path /path/to/imagenet +cd start_scripts ``` -### One single GPU (AMP) + ```bash +# One single GPU +bash train_alexnet_torch.sh --data-path /path/to/imagenet + +# One single GPU (AMP) bash train_alexnet_amp_torch.sh --data-path /path/to/imagenet -``` -### 8 GPUs on one machine -```bash + +# 8 GPUs on one machine bash train_alexnet_dist_torch.sh --data-path /path/to/imagenet -``` -### 8 GPUs on one machine (AMP) -```bash + +# 8 GPUs on one machine (AMP) bash train_alexnet_dist_amp_torch.sh --data-path /path/to/imagenet ``` -## Reference -https://github.com/pytorch/vision/blob/main/torchvision +## References + +- [vision](https://github.com/pytorch/vision/blob/main/torchvision) diff --git a/cv/classification/alexnet/tensorflow/README.md b/cv/classification/alexnet/tensorflow/README.md index f462a643b35ff3d7d766485f7ed7eac7c97f4848..f95aad1c93dd899091697bc48b221d98080ba11e 100644 --- a/cv/classification/alexnet/tensorflow/README.md +++ b/cv/classification/alexnet/tensorflow/README.md @@ -1,18 +1,23 @@ # AlexNet -AlexNet is a groundbreaking convolutional neural network (CNN) introduced in 2012. It revolutionized computer vision by demonstrating the power of deep learning in image classification. With eight layers, including five convolutional and three fully connected layers, it achieved remarkable results on the ImageNet challenge with a top-1 accuracy of around 57.1%. AlexNet's success paved the way for widespread adoption of deep neural networks in computer vision tasks. +## Model Description -## Installation +AlexNet is a groundbreaking deep convolutional neural network that revolutionized computer vision. It introduced key +innovations like ReLU activations, dropout regularization, and GPU acceleration. With its 8-layer architecture featuring +5 convolutional and 3 fully-connected layers, AlexNet achieved record-breaking performance on ImageNet in 2012. Its +success popularized deep learning and established CNNs as the dominant approach for image recognition. AlexNet's design +principles continue to influence modern neural network architectures in computer vision applications. 
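The TFRecords referenced in the next section are hosted on Kaggle; if you use the Kaggle CLI, fetching both parts could look roughly like the sketch below. The dataset slugs come from the links that follow; everything else, including the archive names, is an assumption.

```bash
# Assumes the Kaggle CLI is installed and API credentials live in ~/.kaggle/kaggle.json.
pip3 install kaggle
kaggle datasets download -d hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-0
kaggle datasets download -d hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-1
# Archive names may differ slightly; unpack both into the directory used later.
unzip -q imagenet-1k-tfrecords-ilsvrc2012-part-0.zip -d imagenet_tfrecord
unzip -q imagenet-1k-tfrecords-ilsvrc2012-part-1.zip -d imagenet_tfrecord
```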
-```bash -pip3 install absl-py git+https://github.com/NVIDIA/dllogger#egg=dllogger -``` +## Model Preparation -## Preparing datasets +### Prepare Resources You can get ImageNet 1K TFrecords ILSVRC2012 dataset directly from below links: -- [ImageNet 1K TFrecords ILSVRC2012 - part 0](https://www.kaggle.com/datasets/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-0) -- [ImageNet 1K TFrecords ILSVRC2012 - part 1](https://www.kaggle.com/datasets/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-1) + +- [ImageNet 1K TFrecords ILSVRC2012 - part + 0](https://www.kaggle.com/datasets/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-0) +- [ImageNet 1K TFrecords ILSVRC2012 - part + 1](https://www.kaggle.com/datasets/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-1) The ImageNet TFrecords dataset path structure should look like: @@ -26,9 +31,15 @@ imagenet_tfrecord └── validation-00127-of-00128 ``` -## Training +### Install Dependencies -**Put the TFrecords data in "./imagenet_tfrecord" directory or create a soft link.** +```bash +pip3 install absl-py git+https://github.com/NVIDIA/dllogger#egg=dllogger +``` + +## Model Training + +Put the TFrecords data in "./imagenet_tfrecord" directory or create a soft link. ```bash # 1 GPU @@ -38,10 +49,12 @@ bash run_train_alexnet_imagenet.sh bash run_train_alexnet_multigpu_imagenet.sh ``` -## Results -|GPUs|ACC|FPS| -|:---:|:---:|:---:| -|BI-v100 x8|Accuracy @1 = 0.5633 Accuracy @ 5 = 0.7964|1833.9 images/sec| +## Model Results + +| Model | GPU | ACC | FPS | +|---------|------------|--------------------------------------------|-------------------| +| AlexNet | BI-v100 x8 | Accuracy @1 = 0.5633 Accuracy @ 5 = 0.7964 | 1833.9 images/sec | + +## References -## Reference -- [TensorFlow Models](https://github.com/tensorflow/models) +- [TensorFlow Models](https://github.com/tensorflow/models) diff --git a/cv/classification/byol/pytorch/README.md b/cv/classification/byol/pytorch/README.md index c9640543650574d737f42c95fcffb1f998f0834d..3b53a8d0b66e1a37d9058b641564f1d702cc8b1d 100644 --- a/cv/classification/byol/pytorch/README.md +++ b/cv/classification/byol/pytorch/README.md @@ -1,13 +1,39 @@ # BYOL -> [Bootstrap your own latent: A new approach to self-supervised Learning](https://arxiv.org/abs/2006.07733) +## Model Description -## Model description +BYOL (Bootstrap Your Own Latent) is a self-supervised learning method that learns visual representations without +negative samples. It uses two neural networks - an online network and a target network - that learn from each other +through contrasting augmented views of the same image. BYOL's unique approach eliminates the need for negative pairs, +achieving state-of-the-art performance in unsupervised learning. It's particularly effective for pre-training models on +large datasets before fine-tuning for specific tasks. -**B**ootstrap **Y**our **O**wn **L**atent (BYOL) is a new approach to self-supervised image representation learning. BYOL relies on two neural networks, referred to as online and target networks, that interact and learn from each other. From an augmented view of an image, we train the online network to predict the target network representation of the same image under a different augmented view. At the same time, we update the target network with a slow-moving average of the online network. +## Model Preparation +### Prepare Resources -## Step 1: Installation +Prepare your dataset according to the +[docs](https://mmpretrain.readthedocs.io/en/latest/user_guides/dataset_prepare.html#prepare-dataset). 
Sign up and login +in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole +ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. + +The ImageNet dataset path structure should look like: + +```bash +imagenet +├── train +│ └── n01440764 +│ ├── n01440764_10026.JPEG +│ └── ... +├── train_list.txt +├── val +│ └── n01440764 +│ ├── ILSVRC2012_val_00000293.JPEG +│ └── ... +└── val_list.txt +``` + +### Install Dependencies ```bash # Install libGL @@ -34,29 +60,7 @@ sed -i 's/python /python3 /g' tools/dist_train.sh python3 setup.py install ``` -## Step 2: Preparing datasets - -Prepare your dataset according to the [docs](https://mmpretrain.readthedocs.io/en/latest/user_guides/dataset_prepare.html#prepare-dataset). -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. -Specify `/path/to/imagenet` to your ImageNet path in later training process. - -The ImageNet dataset path structure should look like: - -```bash -imagenet -├── train -│ └── n01440764 -│ ├── n01440764_10026.JPEG -│ └── ... -├── train_list.txt -├── val -│ └── n01440764 -│ ├── ILSVRC2012_val_00000293.JPEG -│ └── ... -└── val_list.txt -``` - -## Step 3: Training +## Model Training ```bash mkdir -p data @@ -70,12 +74,13 @@ model = dict( bash tools/dist_train.sh configs/byol/benchmarks/resnet50_8xb512-linear-coslr-90e_in1k.py 8 ``` -## Results -| GPUs | FPS | TOP1 Accuracy | -| ------------ | --------- | -------------- | -| BI-V100 x8 | 5408 | 71.80 | +## Model Results +| Model | GPU | FPS | TOP1 Accuracy | +|-------|------------|------|---------------| +| BYOL | BI-V100 x8 | 5408 | 71.80 | -## Reference -- [mmpretrain](https://github.com/open-mmlab/mmpretrain/) +## References +- [Paper](https://arxiv.org/abs/2006.07733) +- [mmpretrain](https://github.com/open-mmlab/mmpretrain/) diff --git a/cv/classification/cbam/pytorch/README.md b/cv/classification/cbam/pytorch/README.md index d6af1ceee63fcf1ca38fa4659ce7e317d2f20ec5..43d353876469b368eac57160586366e64828725f 100644 --- a/cv/classification/cbam/pytorch/README.md +++ b/cv/classification/cbam/pytorch/README.md @@ -1,36 +1,65 @@ # CBAM -## Model description -Official PyTorch code for "[CBAM: Convolutional Block Attention Module (ECCV2018)](http://openaccess.thecvf.com/content_ECCV_2018/html/Sanghyun_Woo_Convolutional_Block_Attention_ECCV_2018_paper.html)" +## Model Description +CBAM (Convolutional Block Attention Module) is an attention mechanism that enhances CNN feature representations. It +sequentially applies channel and spatial attention to refine feature maps, improving model performance without +significant computational overhead. CBAM helps networks focus on important features while suppressing irrelevant ones, +leading to better object recognition and localization. The module is lightweight and can be easily integrated into +existing CNN architectures, making it a versatile tool for improving various computer vision tasks. -## Step 1: Installing +## Model Preparation + +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. + +The ImageNet dataset path structure should look like: + +```bash +imagenet +├── train +│ └── n01440764 +│ ├── n01440764_10026.JPEG +│ └── ... 
+├── train_list.txt +├── val +│ └── n01440764 +│ ├── ILSVRC2012_val_00000293.JPEG +│ └── ... +└── val_list.txt +``` + +### Install Dependencies ```bash pip3 install torch pip3 install torchvision ``` -## Step 2: Training +## Model Training -ResNet50 based examples are included. Example scripts are included under ```./scripts/``` directory. ImageNet data should be included under ```./data/ImageNet/``` with foler named ```train``` and ```val```. -``` +ResNet50 based examples are included. Example scripts are included under ```./scripts/``` directory. + +```bash # To train with CBAM (ResNet50 backbone) -# For 8 GPUs +## For 8 GPUs python3 train_imagenet.py --ngpu 8 --workers 20 --arch resnet --depth 50 --epochs 100 --batch-size 256 --lr 0.1 --att-type CBAM --prefix RESNET50_IMAGENET_CBAM ./data/ImageNet -# For 1 GPUs + +## For 1 GPUs python3 train_imagenet.py --ngpu 1 --workers 20 --arch resnet --depth 50 --epochs 100 --batch-size 64 --lr 0.1 --att-type CBAM --prefix RESNET50_IMAGENET_CBAM ./data/ImageNet ``` -## Result +## Model Results -| GPU | FP32 | -| ----------- | ------------------------------------ | -| 8 cards | Prec@1 76.216 fps:83.11 | -| 1 cards | fps:2634.37 | +| Model | GPU | FP32 | +|-------|------------|---------------------------| +| CBAM | BI-V100 x8 | Prec@1 76.216 fps:83.11 | +| CBAM | BI-V100 x1 | fps:2634.37 | -## Reference +## References -- [MXNet implementation of CBAM with several modifications](https://github.com/bruinxiong/Modified-CBAMnet.mxnet) by [bruinxiong](https://github.com/bruinxiong) +- [Modified-CBAMnet.mxnet](https://github.com/bruinxiong/Modified-CBAMnet.mxnet) by diff --git a/cv/classification/convnext/pytorch/README.md b/cv/classification/convnext/pytorch/README.md index 0f9b1553d4829d066d1cfa737af56545d266d340..aab8f3f6648adf85bfbfce2e85d755c00e402547 100644 --- a/cv/classification/convnext/pytorch/README.md +++ b/cv/classification/convnext/pytorch/README.md @@ -1,14 +1,20 @@ # ConvNext -## Model description -The ConvNeXT model was proposed in [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie. ConvNeXT is a pure convolutional model (ConvNet), inspired by the design of Vision Transformers, that claims to outperform them. +## Model Description -## Step 1: Installing -```bash -pip install timm==0.4.12 tensorboardX six torch torchvision -``` +ConvNext is a modern convolutional neural network architecture that bridges the gap between traditional ConvNets and +Vision Transformers. Inspired by Transformer designs, it incorporates techniques like large kernel sizes, layer +normalization, and inverted bottlenecks to achieve state-of-the-art performance. ConvNext demonstrates that properly +modernized ConvNets can match or exceed Transformer-based models in accuracy and efficiency across various vision tasks. +Its simplicity and strong performance make it a compelling choice for image classification and other computer vision +applications. + +## Model Preparation -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
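Before committing eight GPUs to a long run, it can be worth sanity-checking the extracted dataset against the layout shown below; a minimal check, with counts that hold for standard ILSVRC2012:

```bash
ls /path/to/imagenet/train | wc -l                    # expect 1000 class folders
ls /path/to/imagenet/val   | wc -l                    # expect 1000 class folders
find /path/to/imagenet/train -name '*.JPEG' | wc -l   # expect ~1.28 million images
find /path/to/imagenet/val   -name '*.JPEG' | wc -l   # expect 50000 images
```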
The ImageNet dataset path structure should look like: @@ -26,12 +32,20 @@ imagenet └── val_list.txt ``` -## Step 2: Training -### Multiple GPUs on one machine +### Install Dependencies + ```bash +pip install timm==0.4.12 tensorboardX six torch torchvision + git clone https://github.com/facebookresearch/ConvNeXt.git -cd /path/to/ConvNeXt +cd ConvNeXt/ git checkout 048efcea897d999aed302f2639b6270aedf8d4c8 +``` + +## Model Training + +```bash +# Multiple GPUs on one machine python3 -m torch.distributed.launch --nproc_per_node=8 main.py \ --model convnext_tiny \ --drop_path 0.1 \ @@ -44,5 +58,6 @@ python3 -m torch.distributed.launch --nproc_per_node=8 main.py \ --output_dir /path/to/save_results ``` -## Reference -https://github.com/facebookresearch/ConvNeXt +## References + +- [ConvNeXt](https://github.com/facebookresearch/ConvNeXt) diff --git a/cv/classification/cspdarknet53/pytorch/README.md b/cv/classification/cspdarknet53/pytorch/README.md index 7d5a5938c6c6b1e9293392c5bb8f28e9e0dcb63d..c4812e97f19dbc5e0452e9b1a8c59fcd572d504b 100644 --- a/cv/classification/cspdarknet53/pytorch/README.md +++ b/cv/classification/cspdarknet53/pytorch/README.md @@ -1,37 +1,61 @@ # CspDarknet53 -## Model description +## Model Description -This is an implementation of CSPDarknet53 in pytorch. +CspDarknet53 is an efficient backbone network for object detection, combining Cross Stage Partial (CSP) connections with +the Darknet architecture. It reduces computational complexity while maintaining feature richness by splitting feature +maps across stages. The model achieves better gradient flow and reduces memory usage compared to traditional Darknet +architectures. CspDarknet53 is particularly effective in real-time detection tasks, offering a good balance between +accuracy and speed, making it popular in modern object detection frameworks like YOLOv4. -## Step 1: Installing +## Model Preparation + +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. + +The ImageNet dataset path structure should look like: ```bash -pip3 install torchsummary +imagenet +├── train +│ └── n01440764 +│ ├── n01440764_10026.JPEG +│ └── ... +├── train_list.txt +├── val +│ └── n01440764 +│ ├── ILSVRC2012_val_00000293.JPEG +│ └── ... 
+└── val_list.txt ``` -## Step 2: Training +### Install Dependencies + +```bash +pip3 install torchsummary +``` -### One single GPU +## Model Training ```bash +# One single GPU export CUDA_VISIBLE_DEVICES=0 python3 train.py --batch-size 64 --epochs 120 --data-path /home/datasets/cv/imagenet -``` -### 8 GPUs on one machine -```bash +# 8 GPUs on one machine export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --batch-size 64 --epochs 120 --data-path /home/datasets/cv/imagenet ``` -## Result +## Model Results -| GPU | FP32 | -| ----------- | ------------------------------------ | -| 8 cards | Acc@1 76.644 fps 1049 | -| 1 card | fps 148 | +| Model | GPU | FP32 | +|--------------|------------|---------------------------| +| CspDarknet53 | BI-V100 x8 | Acc@1 76.644 fps 1049 | +| CspDarknet53 | BI-V100 x1 | fps 148 | -## Reference +## References -https://github.com/WongKinYiu/CrossStagePartialNetworks +- [CrossStagePartialNetworks](https://github.com/WongKinYiu/CrossStagePartialNetworks) diff --git a/cv/classification/densenet/paddlepaddle/README.md b/cv/classification/densenet/paddlepaddle/README.md index ea27049f87bc8523bfa2452bf2c3460a18788183..678b4e196a5e65106609e9f6a4cbd392a96479a4 100644 --- a/cv/classification/densenet/paddlepaddle/README.md +++ b/cv/classification/densenet/paddlepaddle/README.md @@ -1,28 +1,19 @@ # DenseNet -## Model description +## Model Description -A DenseNet is a type of convolutional neural network that utilises dense connections between layers, through Dense Blocks, where we connect all layers (with matching feature-map sizes) directly with each other. To preserve the feed-forward nature, each layer obtains additional inputs from all preceding layers and passes on its own feature-maps to all subsequent layers. +DenseNet is an innovative convolutional neural network architecture that introduces dense connections between layers. In +each dense block, every layer receives feature maps from all preceding layers and passes its own features to all +subsequent layers. This dense connectivity pattern improves gradient flow, encourages feature reuse, and reduces +vanishing gradient problems. DenseNet achieves state-of-the-art performance with fewer parameters compared to +traditional CNNs, making it efficient for various computer vision tasks like image classification and object detection. -## Step 1: Installation +## Model Preparation -```bash -git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git - -cd PaddleClas - -yum install mesa-libGL -y - -pip3 install -r requirements.txt -pip3 install protobuf==3.20.3 -pip3 install urllib3==1.26.13 +### Prepare Resources -python3 setup.py install -``` - -## Step 2: Preparing Datasets - -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
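Note that this DenseNet recipe trains with PaddlePaddle rather than PyTorch. Once the packages listed under Install Dependencies below are in place, Paddle's built-in self-check is a quick way to confirm the GPU build works before pointing it at the dataset (assumes a Paddle 2.x installation):

```bash
# Runs Paddle's own environment check; it reports whether GPU training is usable.
python3 -c "import paddle; paddle.utils.run_check()"
```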
The ImageNet dataset path structure should look like: @@ -40,11 +31,25 @@ imagenet └── val_list.txt ``` -## Step 3: Training +### Install Dependencies + +```bash +git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git + +cd PaddleClas/ + +yum install mesa-libGL -y + +pip3 install -r requirements.txt +pip3 install protobuf==3.20.3 +pip3 install urllib3==1.26.13 + +python3 setup.py install +``` + +## Model Training ```bash -# Make sure your dataset path is the same as above -cd PaddleClas # Link your dataset to default location ln -s /path/to/imagenet ./dataset/ILSVRC2012 @@ -52,15 +57,17 @@ export FLAGS_cudnn_exhaustive_search=True export FLAGS_cudnn_batchnorm_spatial_persistent=True export CUDA_VISIBLE_DEVICES=0,1,2,3 -python3 -u -m paddle.distributed.launch --gpus=0,1,2,3 tools/train.py -c ppcls/configs/ImageNet/DenseNet/DenseNet121.yaml -o Arch.pretrained=False -o Global.device=gpu +python3 -u -m paddle.distributed.launch --gpus=0,1,2,3 tools/train.py \ + -c ppcls/configs/ImageNet/DenseNet/DenseNet121.yaml \ + -o Arch.pretrained=False -o Global.device=gpu ``` -## Results +## Model Results -| GPUs | Top1 | Top5 |ips | -|-------------|-------------|----------------|----------------| -| BI-V100 x 4 | 0.757 | 0.925 | 171 | +| Model | GPU | Top1 | Top5 | ips | +|-----------|-------------|-------|-------|-----| +| DeneseNet | BI-V100 x 4 | 0.757 | 0.925 | 171 | -## Reference +## References - [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) diff --git a/cv/classification/densenet/pytorch/README.md b/cv/classification/densenet/pytorch/README.md index ed15508c370301032764a7f812b440c8c6a69157..454f846aea25046b1b1437fd269c3537973382da 100755 --- a/cv/classification/densenet/pytorch/README.md +++ b/cv/classification/densenet/pytorch/README.md @@ -1,16 +1,19 @@ # DenseNet -## Model description +## Model Description -A DenseNet is a type of convolutional neural network that utilises dense connections between layers, through Dense Blocks, where we connect all layers (with matching feature-map sizes) directly with each other. To preserve the feed-forward nature, each layer obtains additional inputs from all preceding layers and passes on its own feature-maps to all subsequent layers. +DenseNet is an innovative convolutional neural network architecture that introduces dense connections between layers. In +each dense block, every layer receives feature maps from all preceding layers and passes its own features to all +subsequent layers. This dense connectivity pattern improves gradient flow, encourages feature reuse, and reduces +vanishing gradient problems. DenseNet achieves state-of-the-art performance with fewer parameters compared to +traditional CNNs, making it efficient for various computer vision tasks like image classification and object detection. -## Step 1: Installing +## Model Preparation -```bash -pip install torch torchvision -``` +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
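ImageNet is roughly 150 GB on disk, and dataloader throughput often becomes the bottleneck in the multi-GPU run below; if your copy lives on slow shared storage, one option is to stage it onto fast local disk first. All paths in this sketch are assumptions:

```bash
# Hypothetical staging step; point --data-path at the local copy afterwards.
rsync -a /shared/datasets/imagenet/ /mnt/local_nvme/imagenet/
ln -s /mnt/local_nvme/imagenet /path/to/imagenet
```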
The ImageNet dataset path structure should look like: @@ -28,20 +31,22 @@ imagenet └── val_list.txt ``` -## Step 2: Training - -### One single GPU +### Install Dependencies ```bash -python3 train.py --data-path /path/to/imagenet --model densenet201 --batch-size 128 +pip install torch torchvision ``` -### Multiple GPUs on one machine +## Model Training ```bash +# One single GPU +python3 train.py --data-path /path/to/imagenet --model densenet201 --batch-size 128 + +# Multiple GPUs on one machine python3 -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --data-path /path/to/imagenet --model densenet201 --batch-size 128 ``` -## Reference +## References -[densenet](https://github.com/pytorch/vision/blob/main/torchvision/models/densenet.py) +- [vision](https://github.com/pytorch/vision/blob/main/torchvision/models/densenet.py) diff --git a/cv/classification/dpn107/pytorch/README.md b/cv/classification/dpn107/pytorch/README.md index b4d0964f450c9fc4bb50770cef4e7ff5f6ef8020..3bf787dee93a6091207e40348dbc550d911cc6ca 100644 --- a/cv/classification/dpn107/pytorch/README.md +++ b/cv/classification/dpn107/pytorch/README.md @@ -1,13 +1,19 @@ # DPN107 -## Model description -A Dual Path Network (DPN) is a convolutional neural network which presents a new topology of connection paths internally.The intuition is that ResNets enables feature re-usage while DenseNet enables new feature exploration, and both are important for learning good representations. To enjoy the benefits from both path topologies, Dual Path Networks share common features while maintaining the flexibility to explore new features through dual path architectures. -## Step 1: Installing -```bash -pip3 install -r requirements.txt -``` +## Model Description + +DPN107 is an advanced dual-path network that combines the feature reuse capability of ResNet with the feature +exploration of DenseNet. This architecture enables efficient learning by maintaining two parallel paths: one for +preserving important features and another for discovering new ones. DPN107 achieves state-of-the-art performance in +image classification tasks while maintaining computational efficiency. Its unique design makes it particularly effective +for complex visual recognition tasks, offering a balance between model accuracy and resource utilization. + +## Model Preparation + +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -25,17 +31,21 @@ imagenet └── val_list.txt ``` -:beers: Done! +### Install Dependencies -## Step 2: Training -### Multiple GPUs on one machine (AMP) +```bash +pip3 install -r requirements.txt +``` + +## Model Training Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: ```bash +# Multiple GPUs on one machine (AMP) bash train_dpn107_amp_dist.sh ``` -:beers: Done! 
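The distributed script above uses all visible cards; if you only want a subset, exporting `CUDA_VISIBLE_DEVICES` beforehand is the pattern used elsewhere in these recipes, though whether this launcher honours it should be verified on your setup:

```bash
# Hypothetical 4-GPU run; DATA_PATH as exported above.
export DATA_PATH=/path/to/imagenet
export CUDA_VISIBLE_DEVICES=0,1,2,3
bash train_dpn107_amp_dist.sh
```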
-## Reference -- [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) +## References + +- [vision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/dpn92/pytorch/README.md b/cv/classification/dpn92/pytorch/README.md index bd40d63bb0b97bfd27536fbeaea804f140e36169..9dc35d69446b46e8309aeee446281dbe55d8cce5 100644 --- a/cv/classification/dpn92/pytorch/README.md +++ b/cv/classification/dpn92/pytorch/README.md @@ -1,13 +1,19 @@ # DPN92 -## Model description -A Dual Path Network (DPN) is a convolutional neural network which presents a new topology of connection paths internally. The intuition is that ResNets enables feature re-usage while DenseNet enables new feature exploration, and both are important for learning good representations. To enjoy the benefits from both path topologies, Dual Path Networks share common features while maintaining the flexibility to explore new features through dual path architectures. -## Step 1: Installing -```bash -pip3 install -r requirements.txt -``` +## Model Description + +DPN92 is a dual-path network that combines the strengths of ResNet and DenseNet architectures. It features two parallel +paths: one for feature reuse (like ResNet) and another for feature exploration (like DenseNet). This dual-path approach +enables efficient learning of both shared and new features. DPN92 achieves state-of-the-art performance in image +classification tasks while maintaining computational efficiency. Its unique architecture makes it particularly effective +for tasks requiring both feature preservation and discovery. + +## Model Preparation + +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -25,15 +31,21 @@ imagenet └── val_list.txt ``` -## Step2: Training +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` + +## Model Training -### Multiple GPUs on one machine (AMP) Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: ```bash +# Multiple GPUs on one machine (AMP) bash train_dpn92_amp_dist.sh ``` +## References -## Reference -- [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) +- [vision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/eca_mobilenet_v2/pytorch/README.md b/cv/classification/eca_mobilenet_v2/pytorch/README.md index 05f2dcd120f04a57bd23d82fe70bf9794c063805..7a0a347175ba7c7d26b08bc05fb515975cd9cdac 100644 --- a/cv/classification/eca_mobilenet_v2/pytorch/README.md +++ b/cv/classification/eca_mobilenet_v2/pytorch/README.md @@ -1,16 +1,20 @@ # ECA MobileNet V2 -## Model description +## Model Description -An ECA-Net is a type of convolutional neural network that utilises an Efficient Channel Attention module. +ECA MobileNet V2 is an efficient convolutional neural network that combines MobileNet V2's lightweight architecture with +an Efficient Channel Attention (ECA) module. 
The ECA module enhances feature representation by adaptively recalibrating +channel-wise feature responses without dimensionality reduction. This integration improves model performance while +maintaining computational efficiency, making it suitable for mobile and edge devices. ECA MobileNet V2 achieves better +accuracy than standard MobileNet V2 with minimal additional parameters, making it ideal for resource-constrained image +classification tasks. -## Step 1: Installing +## Model Preparation -```bash -pip3 install -r requirements.txt -``` +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -28,16 +32,21 @@ imagenet └── val_list.txt ``` -## Step 2: Training +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` -### Multiple GPUs on one machine (AMP) +## Model Training Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: ```bash +# Multiple GPUs on one machine (AMP) bash train_eca_mobilenet_v2_amp_dist.sh ``` -## Reference +## References -- [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) +- [vision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/eca_resnet152/pytorch/README.md b/cv/classification/eca_resnet152/pytorch/README.md index 53f461652c97be54144b49c18c253cfc342d2df7..93d9557577d59b7ea38f3589db76646266e12e1f 100644 --- a/cv/classification/eca_resnet152/pytorch/README.md +++ b/cv/classification/eca_resnet152/pytorch/README.md @@ -1,16 +1,20 @@ # ECA ResNet152 -## Model description +## Model Description -An ECA-Net is a type of convolutional neural network that utilises an Efficient Channel Attention module. +ECA ResNet152 is an enhanced version of ResNet152 that incorporates the Efficient Channel Attention (ECA) module. This +module improves feature representation by adaptively recalibrating channel-wise feature responses without dimensionality +reduction. The ECA mechanism boosts model performance while maintaining computational efficiency. ECA ResNet152 achieves +superior accuracy in image classification tasks compared to standard ResNet152, making it particularly effective for +complex visual recognition problems. Its architecture balances performance and efficiency, making it suitable for +various computer vision applications. -## Step 1: Installing +## Model Preparation -```bash -pip3 install -r requirements.txt -``` +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
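The AMP distributed recipe below assumes a CUDA-enabled PyTorch build; a one-line check before committing to a long run (standard PyTorch calls, nothing model-specific):

```bash
python3 -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.device_count())"
```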
The ImageNet dataset path structure should look like: @@ -28,16 +32,21 @@ imagenet └── val_list.txt ``` -## Step 2: Training +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` -### Multiple GPUs on one machine (AMP) +## Model Training Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: ```bash +# Multiple GPUs on one machine (AMP) bash train_eca_resnet152_amp_dist.sh ``` -## Reference +## References -- [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) +- [vision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/efficientnet_b0/paddlepaddle/README.md b/cv/classification/efficientnet_b0/paddlepaddle/README.md index aa41572f47e004346c230b0d05e53089f160f619..af8c0c9a39dc55a999f7e4aaedb2bc303f5bdceb 100644 --- a/cv/classification/efficientnet_b0/paddlepaddle/README.md +++ b/cv/classification/efficientnet_b0/paddlepaddle/README.md @@ -1,32 +1,25 @@ # EfficientNetB0 -## Model description +## Model Description -This model is the B0 version of the EfficientNet series, whitch can be used for image classification tasks, such as cat and dog classification, flower classification, and so on. +EfficientNetB0 is the baseline model in the EfficientNet series, known for its exceptional balance between accuracy and +efficiency. It uses compound scaling to uniformly scale up network width, depth, and resolution, achieving +state-of-the-art performance with minimal computational resources. The model employs mobile inverted bottleneck +convolution (MBConv) blocks with squeeze-and-excitation optimization. EfficientNetB0 is particularly effective for +mobile and edge devices, offering high accuracy in image classification tasks while maintaining low computational +requirements. -## Step 1: Installation +## Model Preparation -```bash -git clone -b release/2.5 https://github.com/PaddlePaddle/PaddleClas.git - -cd PaddleClas -pip3 install -r requirements.txt -pip3 install paddleclas -pip3 install protobuf==3.20.3 -yum install mesa-libGL -pip3 install urllib3==1.26.15 - -``` +### Prepare Resources - -## Step 2: Preparing datasets - -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `./PaddleClas/dataset/` to your ImageNet path in later training process. 
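If your ImageNet copy does not include the `train_list.txt`/`val_list.txt` index files that appear in the layout below, they can be regenerated from the folder tree. The sketch below is only an illustration: it assigns class ids in sorted-folder order, which may not match the official label mapping, so verify before trusting reported accuracy; the `train/` and `val/` prefix fix-up described further down still applies afterwards (GNU find is assumed).

```bash
# Hypothetical regeneration of the index files from the class folders.
cd /path/to/ILSVRC2012   # dataset root
: > train_list.txt; : > val_list.txt
i=0
for d in $(ls train | sort); do
  (cd train && find "$d" -name '*.JPEG' -printf "%p $i\n") >> train_list.txt
  (cd val   && find "$d" -name '*.JPEG' -printf "%p $i\n") >> val_list.txt
  i=$((i+1))
done
```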
The ImageNet dataset path structure should look like: ```bash -PaddleClas/dataset/ILSVRC2012/ +ILSVRC2012 ├── train │ └── n01440764 │ ├── n01440764_10026.JPEG @@ -39,9 +32,9 @@ PaddleClas/dataset/ILSVRC2012/ └── val_list.txt ``` -**Tips** +Tips: for `PaddleClas` training, the image path in train_list.txt and val_list.txt must contain `train/` and `val/` +directories: -For `PaddleClas` training, the image path in train_list.txt and val_list.txt must contain `train/` and `val/` directories: - train_list.txt: train/n01440764/n01440764_10026.JPEG 0 - val_list.txt: val/n01667114/ILSVRC2012_val_00000229.JPEG 35 @@ -51,7 +44,21 @@ sed -i 's#^#train/#g' train_list.txt sed -i 's#^#val/#g' val_list.txt ``` -## Step 3: Training +### Install Dependencies + +```bash +yum install -y mesa-libGL + +git clone -b release/2.5 https://github.com/PaddlePaddle/PaddleClas.git +cd PaddleClas/ +pip3 install -r requirements.txt +pip3 install paddleclas +pip3 install protobuf==3.20.3 +pip3 install urllib3==1.26.15 + +``` + +## Model Training ```bash # Link your dataset to default location @@ -67,11 +74,12 @@ export CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c ppcls/configs/ImageNet/EfficientNet/EfficientNetB0.yaml ``` -## Results +## Model Results + +| Model | GPU | ips | Top1 | Top5 | +|----------------|------------|---------|--------|--------| +| EfficientNetB0 | BI-V100 x8 | 1065.28 | 0.7683 | 0.9316 | -| GPUs| ips | Top1 | Top5 | -| ------ | ---------- |--------------|--------------| -| BI-V100 x8 | 1065.28 | 0.7683 | 0.9316 | +## References -## Reference -- [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/release/2.5) \ No newline at end of file +- [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/release/2.5) diff --git a/cv/classification/efficientnet_b4/pytorch/README.md b/cv/classification/efficientnet_b4/pytorch/README.md index aa4c4c9cf57c7b1058d08b337fcae6be32cfa891..91585de19d38c2da3ba6e53a700acf5f2784a92e 100755 --- a/cv/classification/efficientnet_b4/pytorch/README.md +++ b/cv/classification/efficientnet_b4/pytorch/README.md @@ -1,16 +1,19 @@ # EfficientNetB4 -## Model description +## Model Description -EfficientNet is a convolutional neural network architecture and scaling method that uniformly scales all dimensions of depth/width/resolution using a compound coefficient. +EfficientNetB4 is a scaled-up version of the EfficientNet architecture, using compound scaling to balance network width, +depth, and resolution. It builds upon the efficient MBConv blocks with squeeze-and-excitation optimization, achieving +superior accuracy compared to smaller EfficientNet variants. The model maintains computational efficiency while handling +more complex visual recognition tasks. EfficientNetB4 is particularly effective for high-accuracy image classification +scenarios where computational resources are available, offering a good trade-off between performance and efficiency. -## Step 1: Installing +## Model Preparation -```bash -pip3 install torch torchvision -``` +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
The ImageNet dataset path structure should look like: @@ -28,20 +31,22 @@ imagenet └── val_list.txt ``` -## Step 2: Training - -### One single GPU +### Install Dependencies ```bash -python3 train.py --data-path /path/to/imagenet --model efficientnet_b4 --batch-size 128 +pip3 install torch torchvision ``` -### Multiple GPUs on one machine +## Model Training ```bash +# One single GPU +python3 train.py --data-path /path/to/imagenet --model efficientnet_b4 --batch-size 128 + +# Multiple GPUs on one machine python3 -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --data-path /path/to/imagenet --model efficientnet_b4 --batch-size 128 ``` -## Reference +## References - +- [vision](https://github.com/pytorch/vision/blob/main/torchvision/models/efficientnet.py) diff --git a/cv/classification/fasternet/pytorch/README.md b/cv/classification/fasternet/pytorch/README.md index cfa2c15dea87e94954eb696894e0f2728d36f7c2..09b0e132a99364f826f09de61b63912462b1430d 100644 --- a/cv/classification/fasternet/pytorch/README.md +++ b/cv/classification/fasternet/pytorch/README.md @@ -1,27 +1,20 @@ # FasterNet -## Model description +## Model Description -This is the official Pytorch/PytorchLightning implementation of the paper:
-> [**Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks**](https://arxiv.org/abs/2303.03667) -> Jierun Chen, Shiu-hong Kao, Hao He, Weipeng Zhuo, Song Wen, Chul-Ho Lee, S.-H. Gary Chan -> *IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023* -> +FasterNet is a high-speed neural network architecture that introduces Partial Convolution (PConv) to optimize +computational efficiency. It achieves superior performance by reducing redundant computations while maintaining feature +learning capabilities. FasterNet is designed for real-time applications, offering an excellent balance between accuracy +and speed. Its innovative architecture makes it particularly effective for mobile and edge devices, where computational +resources are limited. The model demonstrates state-of-the-art results in various computer vision tasks while +maintaining low latency. -We propose a simple yet fast and effective partial convolution (**PConv**), as well as a latency-efficient family of architectures called **FasterNet**. +## Model Preparation -## Step 1: Installation -Clone this repo and install the required packages: -```bash -pip install -r requirements.txt -git clone https://github.com/JierunChen/FasterNet.git -cd FasterNet -git checkout e8fba4465ae912359c9f661a72b14e39347e4954 -``` - -## Step 2: Preparing datasets +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -39,8 +32,21 @@ imagenet └── val_list.txt ``` -## Step 3: Training -**Remark**: Training will prompt wondb visualization options, you'll need a W&B account to visualize, choose "3" if you don't need to. +### Install Dependencies + +Clone this repo and install the required packages: + +```bash +pip install -r requirements.txt +git clone https://github.com/JierunChen/FasterNet.git +cd FasterNet +git checkout e8fba4465ae912359c9f661a72b14e39347e4954 +``` + +## Model Training + +**Remark**: Training will prompt wondb visualization options, you'll need a W&B account to visualize, choose "3" if you +don't need to. FasterNet-T0 training on ImageNet with a 8-GPU node: @@ -64,14 +70,14 @@ python3 train_test.py -g 0 --num_nodes 1 -n 4 -b 512 -e 2000 \ --cfg cfg/fasternet_t0.yaml ``` -To train other FasterNet variants, `--cfg` need to be changed. You may also want to change the training batch size `-b`. +To train other FasterNet variants, `--cfg` need to be changed. You may also want to change the training batch size `-b`. 
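For example, a hypothetical FasterNet-T1 run would swap only those two arguments relative to the command above. The cfg filename here is assumed from the repo's naming scheme (check the `cfg/` directory), and the remaining flags mirror only the snippet visible above, so keep any other arguments from your original command unchanged:

```bash
# Hypothetical variant run: same invocation as above with a different cfg and batch size.
python3 train_test.py -g 0 --num_nodes 1 -n 4 -b 256 -e 2000 \
  --cfg cfg/fasternet_t1.yaml
```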
-## Results +## Model Results -| GPUs | FP32 | -| ----------- | ------------------------------------ | -| BI-V100 x8 | test_acc1 71.832 val_acc1 71.722 | +| Model | GPU | FP32 | +|-----------|------------|----------------------------------| +| FasterNet | BI-V100 x8 | test_acc1 71.832 val_acc1 71.722 | -## Reference +## References -[FasterNet](https://github.com/JierunChen/FasterNet/tree/e8fba4465ae912359c9f661a72b14e39347e4954) +- [FasterNet](https://github.com/JierunChen/FasterNet/tree/e8fba4465ae912359c9f661a72b14e39347e4954) diff --git a/cv/classification/googlenet/paddlepaddle/README.md b/cv/classification/googlenet/paddlepaddle/README.md index b1a78c5589d434a8422f22520483c37ca4146f24..6f4139d9c0d09962bfc3960c659d29eb77526cf9 100644 --- a/cv/classification/googlenet/paddlepaddle/README.md +++ b/cv/classification/googlenet/paddlepaddle/README.md @@ -1,19 +1,19 @@ # GoogLeNet -## Model description -GoogLeNet is a type of convolutional neural network based on the Inception architecture. It utilises Inception modules, which allow the network to choose between multiple convolutional filter sizes in each block. An Inception network stacks these modules on top of each other, with occasional max-pooling layers with stride 2 to halve the resolution of the grid. +## Model Description -## Step 1: Installing +GoogLeNet is a pioneering deep convolutional neural network that introduced the Inception architecture. It features +multiple parallel convolutional filters of different sizes within Inception modules, allowing efficient feature +extraction at various scales. The network uses 1x1 convolutions for dimensionality reduction, making it computationally +efficient. GoogLeNet achieved state-of-the-art performance in image classification tasks while maintaining relatively +low computational complexity. Its innovative design has influenced many subsequent CNN architectures in computer vision. -```bash -git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git -cd PaddleClas -pip3 install -r requirements.txt -``` +## Model Preparation -## Step 2: Download data +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -31,7 +31,15 @@ imagenet └── val_list.txt ``` -## Step 3: Run GoogLeNet AMP +### Install Dependencies + +```bash +git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git +cd PaddleClas +pip3 install -r requirements.txt +``` + +## Model Training ```bash # Modify the file: PaddleClas/ppcls/configs/ImageNet/Inception/GoogLeNet.yaml to add the option of AMP diff --git a/cv/classification/googlenet/pytorch/README.md b/cv/classification/googlenet/pytorch/README.md index ab5b3862a0eb5f26ccd6e1b6cf3fbbe026d894ae..759dd4d2855eba7ab3b42f8088c524e26ce9b2e2 100755 --- a/cv/classification/googlenet/pytorch/README.md +++ b/cv/classification/googlenet/pytorch/README.md @@ -1,11 +1,19 @@ # GoogLeNet -## Model description -GoogLeNet is a type of convolutional neural network based on the Inception architecture. 
It utilises Inception modules, which allow the network to choose between multiple convolutional filter sizes in each block. An Inception network stacks these modules on top of each other, with occasional max-pooling layers with stride 2 to halve the resolution of the grid. +## Model Description -## Step 1: Preparing +GoogLeNet is a pioneering deep convolutional neural network that introduced the Inception architecture. It features +multiple parallel convolutional filters of different sizes within Inception modules, allowing efficient feature +extraction at various scales. The network uses 1x1 convolutions for dimensionality reduction, making it computationally +efficient. GoogLeNet achieved state-of-the-art performance in image classification tasks while maintaining relatively +low computational complexity. Its innovative design has influenced many subsequent CNN architectures in computer vision. -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +## Model Preparation + +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -23,15 +31,16 @@ imagenet └── val_list.txt ``` -## Step 2: Training -### One single GPU +## Model Training + ```bash +# One single GPU python3 train.py --data-path /path/to/imagenet --model googlenet --batch-size 512 -``` -### 8 GPUs on one machine -```bash + +# 8 GPUs on one machine python3 -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --data-path /path/to/imagenet --model googlenet --batch-size 512 --wd 0.000001 ``` -## Reference -https://github.com/pytorch/vision/blob/main/torchvision/models/googlenet.py +## References + +- [vision](https://github.com/pytorch/vision/blob/main/torchvision/models/googlenet.py) diff --git a/cv/classification/inceptionv3/mindspore/README.md b/cv/classification/inceptionv3/mindspore/README.md index 2af9d8f880479a7aafc8bebb476efabc6ec25df5..45c8a0701ea238b2395776248245c70d7108fe1f 100644 --- a/cv/classification/inceptionv3/mindspore/README.md +++ b/cv/classification/inceptionv3/mindspore/README.md @@ -1,24 +1,20 @@ # InceptionV3 -## Model description -InceptionV3 is a convolutional neural network architecture from the Inception family that makes several improvements including using Label Smoothing, Factorized 7 x 7 convolutions, and the use of an auxiliary classifier to propagate label information lower down the network (along with the use of batch normalization for layers in the sidehead). +## Model Description -## Step 1: Installation +InceptionV3 is an advanced convolutional neural network architecture that improves upon previous Inception models with +several key innovations. It introduces factorized convolutions, label smoothing, and an auxiliary classifier to enhance +feature extraction and training stability. The network utilizes batch normalization in side branches to improve gradient +flow and convergence. InceptionV3 achieves state-of-the-art performance in image classification tasks while maintaining +computational efficiency, making it suitable for various computer vision applications requiring high accuracy and robust +feature learning. 
-```bash -yum install -y mesa-libGL -pip3 install -r requirements.txt -wget https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.7.tar.gz -tar xf openmpi-4.0.7.tar.gz -cd openmpi-4.0.7/ -./configure --prefix=/usr/local/bin --with-orte -make -j4 && make install -export LD_LIBRARY_PATH=/usr/local/lib/:$LD_LIBRARY_PATH -export PATH=/usr/local/openmpi/bin:$PATH -``` +## Model Preparation + +### Prepare Resources -## Step 2: Preparing Datasets -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -36,36 +32,45 @@ imagenet └── val_list.txt ``` +### Install Dependencies -## Step 3: Training -```shell -ln -sf $(which python3) $(which python) +```bash +yum install -y mesa-libGL +pip3 install -r requirements.txt +wget https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.7.tar.gz +tar xf openmpi-4.0.7.tar.gz +cd openmpi-4.0.7/ +./configure --prefix=/usr/local/bin --with-orte +make -j4 && make install +export LD_LIBRARY_PATH=/usr/local/lib/:$LD_LIBRARY_PATH +export PATH=/usr/local/openmpi/bin:$PATH ``` -### On single GPU -```shell -bash scripts/run_standalone_train_gpu.sh DEVICE_ID DATA_DIR CKPT_PATH -# example: bash scripts/run_standalone_train_gpu.sh /path/to/imagenet/train ./ckpt/ -``` +## Model Training -### Multiple GPUs on one machine ```shell -bash scripts/run_distribute_train_gpu.sh DATA_DIR CKPT_PATH -# example: bash scripts/run_distribute_train_gpu.sh /path/to/imagenet/train ./ckpt/ -``` +ln -sf $(which python3) $(which python) -### Use checkpoint to eval -```shell +# On single GPU +## bash scripts/run_standalone_train_gpu.sh DEVICE_ID DATA_DIR CKPT_PATH +bash scripts/run_standalone_train_gpu.sh /path/to/imagenet/train ./ckpt/ + +# Multiple GPUs on one machine +## bash scripts/run_distribute_train_gpu.sh DATA_DIR CKPT_PATH +bash scripts/run_distribute_train_gpu.sh /path/to/imagenet/train ./ckpt/ + +# Evaluation cd scripts/ DEVICE_ID=0 bash run_eval_gpu.sh $DEVICE_ID /path/to/imagenet/val/ /path/to/checkpoint ``` -## Results -| GPUS | ACC (epoch 108) | FPS | -| ----------| --------------------------| ----- | -| BI V100×4 | 'Loss': 3.9033, 'Top1-Acc': 0.4847, 'Top5-Acc': 0.7405 | 447.2 | +## Model Results + +| Model | GPU | epoch | Loss | ACC | FPS | +|-------------|-----------|-------|--------|----------------------------------------|-------| +| InceptionV3 | BI-V100×4 | 108 | 3.9033 | 'Top1-Acc': 0.4847, 'Top5-Acc': 0.7405 | 447.2 | +## References -## Reference -- [MindSpore Models](https://gitee.com/mindspore/models/tree/master/official/) \ No newline at end of file +- [mindspore/models](https://gitee.com/mindspore/models/tree/master/official/) diff --git a/cv/classification/inceptionv3/pytorch/README.md b/cv/classification/inceptionv3/pytorch/README.md index 731ae81135c838904d06cb0a35ca4b019d45f9b1..c8f869658ff31c4607dc508d0b1aae2983681832 100644 --- a/cv/classification/inceptionv3/pytorch/README.md +++ b/cv/classification/inceptionv3/pytorch/README.md @@ -1,15 +1,17 @@ # InceptionV3 -## Model description +## Model Description Inception-v3 is a convolutional neural network architecture from the Inception family that 
makes several improvements including using Label Smoothing, Factorized 7 x 7 convolutions, and the use of an auxiliary classifer to propagate label information lower down the network (along with the use of batch normalization for layers in the sidehead). -## Step 1: Installation +## Model Preparation + +### Install Dependencies ```bash pip3 install -r requirements.txt ``` -## Step 2: Preparing datasets +### Prepare Resources Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. @@ -29,7 +31,7 @@ imagenet └── val_list.txt ``` -## Step 3: Training +## Model Training ```bash @@ -40,5 +42,5 @@ export DATA_PATH=/path/to/imagenet bash train_inception_v3_amp_dist.sh ``` -## Reference -- [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) +## References +- [vision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/inceptionv3/tensorflow/README.md b/cv/classification/inceptionv3/tensorflow/README.md index 903a34ab3b738f18b276d3f011bee59feb108f7a..ae20907e70573a37adc4f08e065d505336b9c5ef 100644 --- a/cv/classification/inceptionv3/tensorflow/README.md +++ b/cv/classification/inceptionv3/tensorflow/README.md @@ -1,16 +1,18 @@ # InceptionV3 -## Model description +## Model Description InceptionV3 is a convolutional neural network architecture from the Inception family that makes several improvements including using Label Smoothing, Factorized 7 x 7 convolutions, and the use of an auxiliary classifer to propagate label information lower down the network (along with the use of batch normalization for layers in the sidehead). -## Step 1: Installation +## Model Preparation + +### Install Dependencies ```bash pip3 install absl-py git+https://github.com/NVIDIA/dllogger#egg=dllogger ``` -## Step 2: Preparing datasets +### Prepare Resources Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. @@ -37,7 +39,7 @@ Refer below links to convert ImageNet data to TFrecord data. Put the TFrecord data in "./imagenet_tfrecord" directory. -## Step 3: Training +## Model Training ```bash # 1 GPU @@ -47,12 +49,12 @@ bash run_train_inceptionV3_imagenet.sh bash run_train_inceptionV3_multigpu_imagenet.sh --epoch 200 ``` -## Results +## Model Results | GPUS | ACC | FPS | | ---------- | ----- | ------------ | | BI-V100 ×8 | 76.4% | 312 images/s | -## Reference +## References - [TensorFlow/benchmarks](https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks) \ No newline at end of file diff --git a/cv/classification/inceptionv4/pytorch/README.md b/cv/classification/inceptionv4/pytorch/README.md index 1ebba767d99010cc9a8d731132f8010a5b69025f..625b75f1819666fab9f146ae984ddeb2f3196c16 100644 --- a/cv/classification/inceptionv4/pytorch/README.md +++ b/cv/classification/inceptionv4/pytorch/README.md @@ -1,13 +1,20 @@ # InceptionV4 -## Model description -Inception-v4 is a convolutional neural network architecture that builds on previous iterations of the Inception family by simplifying the architecture and using more inception modules than Inception-v3. 
+## Model Description -## Step 1: Installing -```bash -pip3 install -r requirements.txt -``` -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +InceptionV4 is an advanced convolutional neural network architecture that refines the Inception family of models. It +simplifies previous designs while incorporating more inception modules for enhanced feature extraction. The architecture +achieves state-of-the-art performance in image classification tasks by efficiently balancing model depth and width. +InceptionV4 demonstrates improved accuracy over its predecessors while maintaining computational efficiency, making it +suitable for various computer vision applications. Its design focuses on optimizing network structure for better feature +representation and classification performance. + +## Model Preparation + +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -25,18 +32,21 @@ imagenet └── val_list.txt ``` -:beers: Done! +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` + +## Model Training -## Step 2: Training -### Multiple GPUs on one machine (AMP) Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: ```bash +# Multiple GPUs on one machine (AMP) bash train_inceptionv4_amp_dist.sh ``` -:beers: Done! - +## References -## Reference -- [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) +- [vision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/internimage/pytorch/README.md b/cv/classification/internimage/pytorch/README.md index 33708fe8a1496d6089a402527181ebea2d0eb9fc..6f61e8052d2bbea8f59fb55b94fb0fa36d62a723 100644 --- a/cv/classification/internimage/pytorch/README.md +++ b/cv/classification/internimage/pytorch/README.md @@ -1,15 +1,43 @@ -# InternImage for Image Classification +# InternImage -## Model description +## Model Description -"INTERN-2.5" is a powerful multimodal multitask general model jointly released by SenseTime and Shanghai AI Laboratory. It consists of large-scale vision foundation model "InternImage", pre-training method "M3I-Pretraining", generic decoder "Uni-Perceiver" series, and generic encoder for autonomous driving perception "BEVFormer" series. +InternImage is a large-scale vision foundation model developed by SenseTime and Shanghai AI Laboratory. It's part of the +INTERN-2.5 multimodal multitask general model, designed for comprehensive visual understanding tasks. The architecture +leverages advanced techniques to achieve state-of-the-art performance in image classification and other vision tasks. +InternImage demonstrates exceptional scalability and efficiency, making it suitable for various applications from +general image recognition to complex autonomous driving perception systems. Its design focuses on balancing model +capacity with computational efficiency. 
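
The dependency build described below (mmcv plus the DCNv3 CUDA operators) is sensitive to the toolchain, and the Install Dependencies section assumes `CUDA>=10.2` with `cudnn>=7` and `PyTorch>=1.10.0` with `torchvision>=0.9.0`. A quick, optional version check (a minimal sketch, assuming PyTorch and torchvision are already installed) can catch a mismatch before compiling:

```bash
# Optional: confirm the toolchain before building mmcv and the DCNv3 CUDA ops
nvcc --version
python3 -c "import torch, torchvision; print('torch', torch.__version__, '| torchvision', torchvision.__version__)"
python3 -c "import torch; print('cuda available:', torch.cuda.is_available(), '| built for CUDA', torch.version.cuda, '| cuDNN', torch.backends.cudnn.version())"
```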
-## Step 1: Installing +## Model Preparation -### Environment Preparation +### Prepare Resources -- `CUDA>=10.2` with `cudnn>=7` -- `PyTorch>=1.10.0` and `torchvision>=0.9.0` with `CUDA>=10.2` +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. + +The ImageNet dataset path structure should look like: + +```bash +imagenet +├── train +│ └── n01440764 +│ ├── n01440764_10026.JPEG +│ └── ... +├── train_list.txt +├── val +│ └── n01440764 +│ ├── ILSVRC2012_val_00000293.JPEG +│ └── ... +└── val_list.txt +``` + +### Install Dependencies + +Environment Preparation. + +- `CUDA>=10.2` with `cudnn>=7` +- `PyTorch>=1.10.0` and `torchvision>=0.9.0` with `CUDA>=10.2` ```bash # Install libGL @@ -18,53 +46,30 @@ yum install -y mesa-libGL ## Ubuntu apt install -y libgl1-mesa-glx -## Install mmcv +# Install mmcv cd mmcv/ bash clean_mmcv.sh bash build_mmcv.sh bash install_mmcv.sh cd ../ -## Install timm and mmdet +# Install timm and mmdet pip3 install timm==0.6.11 mmdet==2.28.1 -``` -- Install other requirements: - -```bash +# Install other requirements: pip3 install addict yapf opencv-python termcolor yacs pyyaml scipy -``` -- Compiling CUDA operators -```bash +# Compiling CUDA operators cd ./ops_dcnv3 sh ./make.sh + # unit test (should see all checking is True) python3 test.py -cd ../ -``` - -### Data Preparation -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. - -The ImageNet dataset path structure should look like: - -```bash -imagenet -├── train -│ └── n01440764 -│ ├── n01440764_10026.JPEG -│ └── ... -├── train_list.txt -├── val -│ └── n01440764 -│ ├── ILSVRC2012_val_00000293.JPEG -│ └── ... -└── val_list.txt +cd ../ ``` -## Step 2: Training +## Model Training ```bash # Training on 8 GPUs @@ -79,13 +84,13 @@ python3 main.py --cfg configs/internimage_t_1k_224.yaml --data-path /path/to/ima ``` -## Result +## Model Results -| GPU | FP32 | -| ----------- | ------------------------------------ | -| 8 cards | Acc@1 83.440 fps 252 | -| 1 card | fps 31 | +| Model | GPU | FP32 | +|-------------|------------|--------------------------| +| InternImage | BI-V100 x8 | Acc@1 83.440 fps 252 | +| InternImage | BI-V100 x1 | fps 31 | -## Reference +## References -https://github.com/OpenGVLab/InternImage +- [InternImage](https://github.com/OpenGVLab/InternImage) diff --git a/cv/classification/lenet/pytorch/README.md b/cv/classification/lenet/pytorch/README.md index 39244b1c8f5985aa8b6134f28a675d968b414868..a6b0bcd6f0ce3b11a85452e62b4f51eb888c7440 100755 --- a/cv/classification/lenet/pytorch/README.md +++ b/cv/classification/lenet/pytorch/README.md @@ -1,11 +1,19 @@ # LeNet -## Model description -LeNet is a classic convolutional neural network employing the use of convolutions, pooling and fully connected layers. It was used for the handwritten digit recognition task with the MNIST dataset. The architectural design served as inspiration for future networks such as AlexNet and VGG. +## Model Description -## Step 1: Preparing +LeNet is a pioneering convolutional neural network architecture developed for handwritten digit recognition. 
It +introduced fundamental concepts like convolutional layers, pooling, and fully connected layers, laying the groundwork +for modern deep learning. Designed for the MNIST dataset, LeNet demonstrated the effectiveness of CNNs for image +recognition tasks. Its simple yet effective architecture inspired subsequent networks like AlexNet and VGG, making it a +cornerstone in the evolution of deep learning for computer vision applications. -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +## Model Preparation + +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -23,15 +31,16 @@ imagenet └── val_list.txt ``` -## Step 2: Training -### One single GPU +## Model Training + ```bash +# One single GPU python3 train.py --data-path /path/to/imagenet --model lenet -``` -### 8 GPUs on one machine -```bash + +# 8 GPUs on one machine python3 -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --data-path /path/to/imagenet --model lenet ``` -## Reference -http://vision.stanford.edu/cs598_spring07/papers/Lecun98.pdf +## References + +- [Paper](http://vision.stanford.edu/cs598_spring07/papers/Lecun98.pdf) diff --git a/cv/classification/mobilenetv2/pytorch/README.md b/cv/classification/mobilenetv2/pytorch/README.md index fa81f1ee2a4971d073d3080ebbfaabb710568554..3f17237933d34059d3bcf050d0183b610c1473cb 100644 --- a/cv/classification/mobilenetv2/pytorch/README.md +++ b/cv/classification/mobilenetv2/pytorch/README.md @@ -1,14 +1,19 @@ # MobileNetV2 -## Model description -MobileNetV2 is a convolutional neural network architecture that seeks to perform well on mobile devices. It is based on an inverted residual structure where the residual connections are between the bottleneck layers. The intermediate expansion layer uses lightweight depthwise convolutions to filter features as a source of non-linearity. As a whole, the architecture of MobileNetV2 contains the initial fully convolution layer with 32 filters, followed by 19 residual bottleneck layers. +## Model Description -## Step 1: Installing -```bash -pip3 install -r requirements.txt -``` +MobileNetV2 is an efficient convolutional neural network designed for mobile and embedded vision applications. It +introduces inverted residual blocks with linear bottlenecks, using depthwise separable convolutions to reduce +computational complexity. This architecture maintains high accuracy while significantly decreasing model size and +latency compared to traditional CNNs. MobileNetV2's design focuses on balancing performance and efficiency, making it +ideal for real-time applications on resource-constrained devices like smartphones and IoT devices. + +## Model Preparation + +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. 
Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -26,15 +31,21 @@ imagenet └── val_list.txt ``` +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` + +## Model Training -## Step 2: Training -### Multiple GPUs on one machine (AMP) Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: ```bash +# Multiple GPUs on one machine (AMP) bash train_mobilenet_v2_amp_dist.sh ``` +## References -## Reference -- [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) +- [vision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/mobilenetv3/mindspore/README.md b/cv/classification/mobilenetv3/mindspore/README.md index f72c9801d807643d1db1540d335b4deefde3da6f..068999ce2d07dbbb737d5fc6a059cca1d88e6df3 100644 --- a/cv/classification/mobilenetv3/mindspore/README.md +++ b/cv/classification/mobilenetv3/mindspore/README.md @@ -1,30 +1,20 @@ # MobileNetV3 -## Model description -MobileNetV3 is tuned to mobile phone CPUs through a combination of hardware- aware network architecture search (NAS) complemented by the NetAdapt algorithm and then subsequently improved through novel architecture advances.Nov 20, 2019. +## Model Description -[Paper](https://arxiv.org/pdf/1905.02244) Howard, Andrew, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang et al. "Searching for mobilenetv3." In Proceedings of the IEEE International Conference on Computer Vision, pp. 1314-1324. 2019. +MobileNetV3 is an efficient convolutional neural network optimized for mobile devices, combining hardware-aware neural +architecture search with novel design techniques. It introduces improved nonlinearities and efficient network structures +to reduce computational complexity while maintaining accuracy. MobileNetV3 achieves state-of-the-art performance in +mobile vision tasks, offering variants for different computational budgets. Its design focuses on minimizing latency and +power consumption, making it ideal for real-time applications on resource-constrained devices like smartphones and +embedded systems. -## Step 1: Installation +## Model Preparation -```bash -# Install requirements -pip3 install easydict -yum install mesa-libGL - -# Install openmpi -wget https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.7.tar.gz -tar xf openmpi-4.0.7.tar.gz -cd openmpi-4.0.7/ -./configure --prefix=/usr/local/bin --with-orte -make -j4 && make install -export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/ -``` +### Prepare Resources - -## Step 2: Preparing datasets - -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -41,7 +31,24 @@ imagenet │ └── ... 
└── val_list.txt ``` -## Step 3: Training + +### Install Dependencies + +```bash +# Install requirements +pip3 install easydict +yum install mesa-libGL + +# Install openmpi +wget https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.7.tar.gz +tar xf openmpi-4.0.7.tar.gz +cd openmpi-4.0.7/ +./configure --prefix=/usr/local/bin --with-orte +make -j4 && make install +export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/ +``` + +## Model Training ```bash cd ../scripts @@ -51,21 +58,18 @@ bash run_train.sh GPU 1 0 /path/to/imagenet/train/ # 8 GPUs bash run_train.sh GPU 8 0,1,2,3,4,5,6,7 /path/to/imagenet/train/ -``` -## Step 4: Inference -```bash +# Inference bash run_infer.sh GPU /path/to/imagenet/val/ ../train/checkpointckpt_0/mobilenetv3-300_2135.ckpt ``` -## Results -
- 
-
-| GPUS | ACC (ckpt107) | FPS |
-| ---------- | ---------- | ---- |
-| BI-V100 ×8 | 0.55 | 378.43 |
+## Model Results
+
+| Model | GPU | ACC (ckpt107) | FPS |
+|-------------|------------|---------------|--------|
+| MobileNetV3 | BI-V100 ×8 | 0.55 | 378.43 |
-
+## References -## Reference -- [mindspore/models](https://gitee.com/mindspore/models) \ No newline at end of file +- [mindspore/models](https://gitee.com/mindspore/models) diff --git a/cv/classification/mobilenetv3/paddlepaddle/README.md b/cv/classification/mobilenetv3/paddlepaddle/README.md index 5e9a5f9b244c083f03ad53acb77671d8464fe688..1b707203d434fd8cc9a031e60fb5a33bc106bc63 100644 --- a/cv/classification/mobilenetv3/paddlepaddle/README.md +++ b/cv/classification/mobilenetv3/paddlepaddle/README.md @@ -1,20 +1,20 @@ # MobileNetV3 -## Model description -MobileNetV3 is a convolutional neural network that is tuned to mobile phone CPUs through a combination of hardware-aware network architecture search (NAS) complemented by the NetAdapt algorithm, and then subsequently improved through novel architecture advances. Advances include (1) complementary search techniques, (2) new efficient versions of nonlinearities practical for the mobile setting, (3) new efficient network design. -## Step 1: Installing -``` -git clone https://github.com/PaddlePaddle/PaddleClas.git -``` +## Model Description -```bash -cd PaddleClas -pip3 install -r requirements.txt -``` +MobileNetV3 is an efficient convolutional neural network optimized for mobile devices, combining hardware-aware neural +architecture search with novel design techniques. It introduces improved nonlinearities and efficient network structures +to reduce computational complexity while maintaining accuracy. MobileNetV3 achieves state-of-the-art performance in +mobile vision tasks, offering variants for different computational budgets. Its design focuses on minimizing latency and +power consumption, making it ideal for real-time applications on resource-constrained devices like smartphones and +embedded systems. -## Step 2: Prepare Datasets +## Model Preparation -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -32,9 +32,20 @@ imagenet └── val_list.txt ``` -## Step 3: Training -**Notice**: modify PaddleClas/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_small_x1_25.yaml file, modify the datasets path as yours. +### Install Dependencies + +```bash +git clone https://github.com/PaddlePaddle/PaddleClas.git +cd PaddleClas/ +pip3 install -r requirements.txt ``` + +## Model Training + +**Notice**: modify PaddleClas/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_small_x1_25.yaml file, modify the datasets +path as yours. 
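
In practice, "modify the datasets path" means pointing the image root and label-list entries in that YAML at your ImageNet copy. An alternative used by other PaddleClas recipes in this repository is to leave the config untouched and link the dataset into the default location it expects; a minimal sketch, assuming the config keeps PaddleClas's default `./dataset/ILSVRC2012` paths:

```bash
cd PaddleClas
# Expose /path/to/imagenet under the default dataset location referenced by the config
mkdir -p dataset
ln -s /path/to/imagenet ./dataset/ILSVRC2012
ls ./dataset/ILSVRC2012    # should list train/, val/, train_list.txt, val_list.txt
```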
+ +```bash cd PaddleClas export FLAGS_cudnn_exhaustive_search=True export FLAGS_cudnn_batchnorm_spatial_persistent=True @@ -42,5 +53,6 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -u -m paddle.distributed.launch --gpus=0,1,2,3 tools/train.py -c ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_small_x1_25.yaml -o Arch.pretrained=False -o Global.device=gpu ``` -## Reference +## References + - [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) diff --git a/cv/classification/mobilenetv3/pytorch/README.md b/cv/classification/mobilenetv3/pytorch/README.md index 1e83d9b7b7fb4658bf2ac6a18b1fcd5f14970f1e..29c69a3420c79449d4cf6b05744c5883ad1bb9f4 100644 --- a/cv/classification/mobilenetv3/pytorch/README.md +++ b/cv/classification/mobilenetv3/pytorch/README.md @@ -1,16 +1,20 @@ # MobileNetV3 -## Model description -MobileNetV3 is a convolutional neural network that is tuned to mobile phone CPUs through a combination of hardware-aware network architecture search (NAS) complemented by the NetAdapt algorithm, and then subsequently improved through novel architecture advances. Advances include (1) complementary search techniques, (2) new efficient versions of nonlinearities practical for the mobile setting, (3) new efficient network design. +## Model Description -## Step 1: Installation -```bash -pip3 install -r requirements.txt -``` +MobileNetV3 is an efficient convolutional neural network optimized for mobile devices, combining hardware-aware neural +architecture search with novel design techniques. It introduces improved nonlinearities and efficient network structures +to reduce computational complexity while maintaining accuracy. MobileNetV3 achieves state-of-the-art performance in +mobile vision tasks, offering variants for different computational budgets. Its design focuses on minimizing latency and +power consumption, making it ideal for real-time applications on resource-constrained devices like smartphones and +embedded systems. -## Step 2: Preparing datasets +## Model Preparation -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
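
If you start from the official `ILSVRC2012_img_train.tar`, note that it unpacks into one tar archive per synset rather than directly into class folders. A minimal extraction sketch (assuming the archive has been downloaded to `/path/to/imagenet`) that produces the per-class layout shown below:

```bash
cd /path/to/imagenet
mkdir -p train
tar -xf ILSVRC2012_img_train.tar -C train
cd train
# Each inner tar holds the images of one synset (class); unpack it into its own folder
for f in n*.tar; do
  d="${f%.tar}"
  mkdir -p "$d"
  tar -xf "$f" -C "$d"
  rm -f "$f"
done
```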
The ImageNet dataset path structure should look like: @@ -28,7 +32,13 @@ imagenet └── val_list.txt ``` -## Step 3: Training +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` + +## Model Training ```bash # Set data path @@ -38,5 +48,6 @@ export DATA_PATH=/path/to/imagenet bash train_mobilenet_v3_large_amp_dist.sh ``` -## Reference -- [torchvision](https://github.com/pytorch/vision/tree/main/references/classification#mobilenetv3-large--small) +## References + +- [vision](https://github.com/pytorch/vision/tree/main/references/classification#mobilenetv3-large--small) diff --git a/cv/classification/mobilenetv3_large_x1_0/paddlepaddle/README.md b/cv/classification/mobilenetv3_large_x1_0/paddlepaddle/README.md index 5d1796d4f34454734d6267d6c897b64f27b6eb9a..996ddf9f67162f6e85e020f0e481f6740f6caee3 100644 --- a/cv/classification/mobilenetv3_large_x1_0/paddlepaddle/README.md +++ b/cv/classification/mobilenetv3_large_x1_0/paddlepaddle/README.md @@ -1,25 +1,20 @@ # MobileNetV3_large_x1_0 -## Model description -MobileNetV3 is a convolutional neural network that is tuned to mobile phone CPUs through a combination of hardware-aware network architecture search (NAS) complemented by the NetAdapt algorithm, and then subsequently improved through novel architecture advances. Advances include (1) complementary search techniques, (2) new efficient versions of nonlinearities practical for the mobile setting, (3) new efficient network design. +## Model Description -## Step 1: Installation -``` -git clone https://github.com/PaddlePaddle/PaddleClas.git -``` +MobileNetV3_large_x1_0 is an efficient convolutional neural network optimized for mobile devices. It combines +hardware-aware neural architecture search with novel design techniques, including improved nonlinearities and efficient +network structures. This variant offers a balance between accuracy and computational efficiency, achieving 74.9% top-1 +accuracy on ImageNet. Its design focuses on reducing latency while maintaining performance, making it suitable for +mobile applications. MobileNetV3_large_x1_0 serves as a general-purpose backbone for various computer vision tasks on +resource-constrained devices. -```bash -cd PaddleClas -yum install mesa-libGL -y +## Model Preparation -pip3 install -r requirements.txt -pip3 install protobuf==3.20.3 -pip3 install urllib3==1.26.13 -python3 setup.py install -``` +### Prepare Resources -## Step 2: Preparing Datasets -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
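
Before launching a multi-GPU run it is worth verifying that the copy is complete; for the standard ILSVRC2012 split you would expect 1000 class folders under `train/` and 50,000 validation images. A small verification sketch, assuming the layout shown below:

```bash
ls /path/to/imagenet/train | wc -l                    # expect 1000 class folders
find /path/to/imagenet/val -name '*.JPEG' | wc -l     # expect 50000 validation images
wc -l /path/to/imagenet/train_list.txt /path/to/imagenet/val_list.txt
```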
The ImageNet dataset path structure should look like: @@ -37,11 +32,25 @@ imagenet └── val_list.txt ``` -## Step 3: Training +### Install Dependencies + +```bash +yum install mesa-libGL -y + +git clone https://github.com/PaddlePaddle/PaddleClas.git +cd PaddleClas/ +pip3 install -r requirements.txt +python3 setup.py install + +pip3 install protobuf==3.20.3 +pip3 install urllib3==1.26.13 +``` + +## Model Training ```bash # Make sure your dataset path is the same as above -cd PaddleClas +cd PaddleClas/ # Link your dataset to default location ln -s /path/to/imagenet ./dataset/ILSVRC2012 @@ -51,11 +60,12 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -u -m paddle.distributed.launch --gpus=0,1,2,3 tools/train.py -c ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_large_x1_0.yaml -o Arch.pretrained=False -o Global.device=gpu ``` -## Results -| GPUs | Top1 | Top5 | ips | -|-------------|-------------|----------------|----------| -| BI-V100 x 4 | 0.749 | 0.922 | 512 samples/s | +## Model Results + +| Model | GPU | Top1 | Top5 | ips | +|------------------------|------------|-------|-------|---------------| +| MobileNetV3_large_x1_0 | BI-V100 x4 | 0.749 | 0.922 | 512 samples/s | +## References -## Reference - [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) diff --git a/cv/classification/mobileone/pytorch/README.md b/cv/classification/mobileone/pytorch/README.md index 1ac88324187944baa6ceb58afdde9f3af5ac2531..f1770469ee1b134585ca992c4498722eeed4276f 100644 --- a/cv/classification/mobileone/pytorch/README.md +++ b/cv/classification/mobileone/pytorch/README.md @@ -1,18 +1,40 @@ # MobileOne -> [An Improved One millisecond Mobile Backbone](https://arxiv.org/abs/2206.04040) +## Model Description -## Model description +MobileOne is an efficient neural network backbone designed for mobile devices, focusing on real-world latency rather +than just FLOPs or parameter count. It uses reparameterization with depthwise and pointwise convolutions, optimizing for +speed on mobile chips. Achieving under 1ms inference time on iPhone 12 with 75.9% ImageNet accuracy, MobileOne +outperforms other efficient architectures in both speed and accuracy. It's versatile for tasks like image +classification, object detection, and segmentation, making it ideal for mobile deployment. -Mobileone is proposed by apple and based on reparameterization. On the apple chips, the accuracy of the model is close to 0.76 on the ImageNet dataset when the latency is less than 1ms. Its main improvements based on [RepVGG](../repvgg) are fllowing: +## Model Preparation -- Reparameterization using Depthwise convolution and Pointwise convolution instead of normal convolution. -- Removal of the residual structure which is not friendly to access memory. +### Prepare Resources +Prepare your dataset according to the +[docs](https://mmpretrain.readthedocs.io/en/latest/user_guides/dataset_prepare.html#prepare-dataset). -Efficient neural network backbones for mobile devices are often optimized for metrics such as FLOPs or parameter count. However, these metrics may not correlate well with latency of the network when deployed on a mobile device. Therefore, we perform extensive analysis of different metrics by deploying several mobile-friendly networks on a mobile device. We identify and analyze architectural and optimization bottlenecks in recent efficient neural networks and provide ways to mitigate these bottlenecks. 
To this end, we design an efficient backbone MobileOne, with variants achieving an inference time under 1 ms on an iPhone12 with 75.9% top-1 accuracy on ImageNet. We show that MobileOne achieves state-of-the-art performance within the efficient architectures while being many times faster on mobile. Our best model obtains similar performance on ImageNet as MobileFormer while being 38x faster. Our model obtains 2.3% better top-1 accuracy on ImageNet than EfficientNet at similar latency. Furthermore, we show that our model generalizes to multiple tasks - image classification, object detection, and semantic segmentation with significant improvements in latency and accuracy as compared to existing efficient architectures when deployed on a mobile device. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. -## Step 1: Installation +The ImageNet dataset path structure should look like: + +```bash +imagenet +├── train +│ └── n01440764 +│ ├── n01440764_10026.JPEG +│ └── ... +├── train_list.txt +├── val +│ └── n01440764 +│ ├── ILSVRC2012_val_00000293.JPEG +│ └── ... +└── val_list.txt +``` + +### Install Dependencies ```bash # Install libGL @@ -39,39 +61,18 @@ sed -i 's/python /python3 /g' tools/dist_train.sh python3 setup.py install ``` -## Step 2: Preparing datasets - -Prepare your dataset according to the [docs](https://mmpretrain.readthedocs.io/en/latest/user_guides/dataset_prepare.html#prepare-dataset). -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. -Specify `/path/to/imagenet` to your ImageNet path in later training process. - -The ImageNet dataset path structure should look like: +## Model Training ```bash -imagenet -├── train -│ └── n01440764 -│ ├── n01440764_10026.JPEG -│ └── ... -├── train_list.txt -├── val -│ └── n01440764 -│ ├── ILSVRC2012_val_00000293.JPEG -│ └── ... -└── val_list.txt +bash tools/dist_train.sh configs/mobileone/mobileone-s0_8xb32_in1k.py 8 ``` -## Step 3: Training +## Model Results -```bash -bash tools/dist_train.sh configs/mobileone/mobileone-s0_8xb32_in1k.py 8 -``` +| Model | GPU | FPS | TOP1 Accuracy | +|-----------|------------|------|---------------| +| MobileOne | BI-V100 x8 | 1014 | 71.49 | -## Results -| GPUs | FPS | TOP1 Accuracy | -| ------------ | --------- |-------------- | -| BI-V100 x8 | 1014 | 71.49 | +## References -## Reference - [mmpretrain](https://github.com/open-mmlab/mmpretrain/) - diff --git a/cv/classification/mocov2/pytorch/README.md b/cv/classification/mocov2/pytorch/README.md index 408f1ab2fca8a189ce0784ed6ffae2cf894b5c79..9c81bd97734f1228594b81d5952c5fef6378a29d 100644 --- a/cv/classification/mocov2/pytorch/README.md +++ b/cv/classification/mocov2/pytorch/README.md @@ -1,13 +1,39 @@ # MoCoV2 -> [Improved Baselines with Momentum Contrastive Learning](https://arxiv.org/abs/2003.04297) +## Model Description +MoCoV2 is an improved version of Momentum Contrast (MoCo) for unsupervised learning, combining the strengths of +contrastive learning with momentum-based updates. It introduces an MLP projection head and enhanced data augmentation +techniques to boost performance without requiring large batch sizes. This approach enables effective feature learning +from unlabeled data, establishing strong baselines for self-supervised learning. 
MoCoV2 outperforms previous methods +like SimCLR while maintaining computational efficiency, making it accessible for various computer vision tasks. -## Model description +## Model Preparation -Contrastive unsupervised learning has recently shown encouraging progress, e.g., in Momentum Contrast (MoCo) and SimCLR. In this note, we verify the effectiveness of two of SimCLR’s design improvements by implementing them in the MoCo framework. With simple modifications to MoCo—namely, using an MLP projection head and more data augmentation—we establish stronger baselines that outperform SimCLR and do not require large training batches. We hope this will make state-of-the-art unsupervised learning research more accessible. +### Prepare Resources -## Installation +Prepare your dataset according to the [docs](https://mmpretrain.readthedocs.io/en/latest/user_guides/dataset_prepare.html#prepare-dataset). + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. + +The ImageNet dataset path structure should look like: + +```bash +imagenet +├── train +│ └── n01440764 +│ ├── n01440764_10026.JPEG +│ └── ... +├── train_list.txt +├── val +│ └── n01440764 +│ ├── ILSVRC2012_val_00000293.JPEG +│ └── ... +└── val_list.txt +``` + +### Install Dependencies ```bash # Install libGL @@ -34,29 +60,7 @@ sed -i 's/python /python3 /g' tools/dist_train.sh python3 setup.py install ``` -## Preparing datasets - -Prepare your dataset according to the [docs](https://mmpretrain.readthedocs.io/en/latest/user_guides/dataset_prepare.html#prepare-dataset). -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. -Specify `/path/to/imagenet` to your ImageNet path in later training process. - -The ImageNet dataset path structure should look like: - -```bash -imagenet -├── train -│ └── n01440764 -│ ├── n01440764_10026.JPEG -│ └── ... -├── train_list.txt -├── val -│ └── n01440764 -│ ├── ILSVRC2012_val_00000293.JPEG -│ └── ... -└── val_list.txt -``` - -## Training +## Model Training ```bash # get mocov2_resnet50_8xb32-coslr-200e_in1k_20220825-b6d23c86.pth @@ -73,10 +77,12 @@ model = dict( bash tools/dist_train.sh configs/mocov2/mocov2_resnet50_8xb32-coslr-200e_in1k.py 8 ``` -## Results -| Model | FPS | TOP1 Accuracy | -| ------------ | --------- |--------------| -| BI-V100 x8 | 4663 | 67.50 | +## Model Results + + | Model | GPU | FPS | TOP1 Accuracy | + |--------|------------|------|---------------| + | MoCoV2 | BI-V100 x8 | 4663 | 67.50 | + +## References -## Reference - [mmpretrain](https://github.com/open-mmlab/mmpretrain/) diff --git a/cv/classification/pp-lcnet/paddlepaddle/README.md b/cv/classification/pp-lcnet/paddlepaddle/README.md index c7d75baa96993066bc0c339293955686b40e8798..54decf1f039279ed81911bedfd050719a785dea5 100644 --- a/cv/classification/pp-lcnet/paddlepaddle/README.md +++ b/cv/classification/pp-lcnet/paddlepaddle/README.md @@ -1,20 +1,20 @@ -# PP-LCNet: A Lightweight CPU Convolutional Neural Network +# PP-LCNet -## Model description -We propose a lightweight CPU network based on the MKLDNN acceleration strategy, named PP-LCNet, which improves the performance of lightweight models on multiple tasks. This paper lists technologies which can improve network accuracy while the latency is almost constant. 
With these improvements, the accuracy of PP-LCNet can greatly surpass the previous network structure with the same inference time for classification. It outperforms the most state-of-the-art models. And for downstream tasks of computer vision, it also performs very well, such as object detection, semantic segmentation, etc. All our experiments are implemented based on PaddlePaddle. Code and pretrained models are available at PaddleClas. +## Model Description -## Step 1: Installation +PP-LCNet is a lightweight CPU-optimized neural network designed for efficient inference on edge devices. It leverages +MKLDNN acceleration strategies to enhance performance while maintaining low latency. The architecture achieves +state-of-the-art accuracy for lightweight models in image classification tasks and performs well in downstream computer +vision applications like object detection and semantic segmentation. PP-LCNet's design focuses on maximizing accuracy +with minimal computational overhead, making it ideal for resource-constrained environments requiring fast and efficient +inference. -```bash -git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git -cd PaddleClas -pip3 install -r requirements.txt -python3 setup.py install -``` +## Model Preparation -## Step 2: Preparing datasets +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -32,7 +32,16 @@ imagenet └── val_list.txt ``` -## Step 3: Training +### Install Dependencies + +```bash +git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git +cd PaddleClas +pip3 install -r requirements.txt +python3 setup.py install +``` + +## Model Training ```bash # Make sure your dataset path is the same as above @@ -45,16 +54,12 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -u -m paddle.distributed.launch --gpus=0,1,2,3 tools/train.py -c ppcls/configs/ImageNet/PPLCNet/PPLCNet_x1_0.yaml -o Arch.pretrained=False -o Global.device=gpu ``` -## Results on BI-V100 +## Model Results -
+| Model | GPU | Crop Size | FPS | TOP1 Accuracy | +|--------------|------------|-----------|------|---------------| +| PPLCNet_x1_0 | BI-V100 x4 | 224x224 | 2537 | 0.7062 | -| Method | Crop Size | FPS (BI x 4) | TOP1 Accuracy | -| ------ | --------- | -------- |--------------:| -| PPLCNet_x1_0 | 224x224 | 2537 | 0.7062 | +## References -
- -## Reference - [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) - diff --git a/cv/classification/repmlp/pytorch/README.md b/cv/classification/repmlp/pytorch/README.md index 99d354727187eceaf7ced40baf5f9bbae876a60c..fbc158f2db98a354c205ef5bb2a067d0bfbe290c 100644 --- a/cv/classification/repmlp/pytorch/README.md +++ b/cv/classification/repmlp/pytorch/README.md @@ -1,20 +1,20 @@ # RepMLP -## Model description -RepMLP, a multi-layer-perceptron-style neural network building block for image recognition, which is composed of a series of fully-connected (FC) layers. Compared to convolutional layers, FC layers are more efficient, better at modeling the long-range dependencies and positional patterns, but worse at capturing the local structures, hence usually less favored for image recognition. Construct convolutional layers inside a RepMLP during training and merge them into the FC for inference. +## Model Description -## Step 1: Installation +RepMLP is an innovative neural network architecture that combines the strengths of fully-connected (FC) layers and +convolutional operations. It uses FC layers for efficient long-range dependency modeling while incorporating +convolutional layers during training to capture local structures. Through structural re-parameterization, RepMLP merges +these components into pure FC layers for inference, achieving both high accuracy and computational efficiency. This +architecture is particularly effective for image recognition tasks, offering a novel approach to balance global and +local feature learning. -```bash -pip3 install timm yacs -git clone https://github.com/DingXiaoH/RepMLP.git -cd RepMLP -git checkout 3eff13fa0257af28663880d870f327d665f0a8e2 -``` +## Model Preparation -## Step 2: Preparing datasets +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
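
The `train_list.txt` and `val_list.txt` index files in the layout below are plain text. A common convention, assumed here (check what your training scripts actually read), is one `relative/image/path label` pair per line; the helper file `synset_to_label.txt` in the sketch is purely illustrative:

```bash
cd /path/to/imagenet
# Assign a stable integer label to each synset folder (illustrative helper file)
ls train | sort | awk '{print $0, NR-1}' > synset_to_label.txt
# Emit "relative/path label" lines for every training image
while read -r synset label; do
  find "train/$synset" -name '*.JPEG' | sed "s|\$| $label|"
done < synset_to_label.txt > train_list.txt
wc -l train_list.txt    # roughly 1.28M lines for the full training set
```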
The ImageNet dataset path structure should look like: @@ -32,7 +32,16 @@ imagenet └── val_list.txt ``` -## Step 3: Training +### Install Dependencies + +```bash +pip3 install timm yacs +git clone https://github.com/DingXiaoH/RepMLP.git +cd RepMLP +git checkout 3eff13fa0257af28663880d870f327d665f0a8e2 +``` + +## Model Training ```bash # fix --local-rank for torch 2.x @@ -44,12 +53,12 @@ sed -i "s@dataset = torchvision.datasets.ImageNet(root=config.DATA.DATA_PATH, sp python3 -m torch.distributed.launch --nproc_per_node 8 --master_port 12349 main_repmlp.py --arch RepMLPNet-B256 --batch-size 32 --tag my_experiment --opts TRAIN.EPOCHS 100 TRAIN.BASE_LR 0.001 TRAIN.WEIGHT_DECAY 0.1 TRAIN.OPTIMIZER.NAME adamw TRAIN.OPTIMIZER.MOMENTUM 0.9 TRAIN.WARMUP_LR 5e-7 TRAIN.MIN_LR 0.0 TRAIN.WARMUP_EPOCHS 10 AUG.PRESET raug15 AUG.MIXUP 0.4 AUG.CUTMIX 1.0 DATA.IMG_SIZE 256 --data-path [/path/to/imagenet] ``` -## Results +## Model Results -|GPUs|FPS|ACC| -|----|---|---| -|BI-V100 x8|319|epoch 40: 64.866%| +| Model | GPU | FPS | ACC | +|--------|------------|-----|-------------------| +| RepMLP | BI-V100 x8 | 319 | epoch 40: 64.866% | -## Reference +## References - [RepMLP](https://github.com/DingXiaoH/RepMLP/tree/3eff13fa0257af28663880d870f327d665f0a8e2) diff --git a/cv/classification/repvgg/paddlepaddle/README.md b/cv/classification/repvgg/paddlepaddle/README.md index 5862f58f258cb3f49fcbf7c2f575fbf790cc3ca5..0f638266b5c99302bb07955ce278aee04b306ab3 100644 --- a/cv/classification/repvgg/paddlepaddle/README.md +++ b/cv/classification/repvgg/paddlepaddle/README.md @@ -1,18 +1,20 @@ # RepVGG -## Model description - A simple but powerful architecture of convolutional neural network, which has a VGG-like inference-time body composed of nothing but a stack of 3x3 convolution and ReLU, while the training-time model has a multi-branch topology. Such decoupling of the training-time and inference-time architecture is realized by a structural re-parameterization technique so that the model is named RepVGG. -## Step 1: Installing +## Model Description -```bash -git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git -cd PaddleClas -pip3 install -r requirements.txt -``` +RepVGG is a simple yet powerful convolutional neural network architecture that combines training-time multi-branch +topology with inference-time VGG-like simplicity. It uses structural re-parameterization to convert complex training +models into efficient inference models composed solely of 3x3 convolutions and ReLU activations. This approach achieves +state-of-the-art performance in image classification tasks while maintaining high speed and efficiency. RepVGG's design +is particularly suitable for applications requiring both high accuracy and fast inference, making it ideal for +real-world deployment scenarios. + +## Model Preparation -## Step 2: Download data +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
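
The official validation archive extracts as a flat directory of 50,000 JPEGs, while the layout below groups them by synset. A reorganization sketch, assuming a hypothetical helper file `val_synset_labels.txt` whose i-th line holds the synset of the i-th validation image (any equivalent ground-truth mapping works):

```bash
cd /path/to/imagenet
mkdir -p val
tar -xf ILSVRC2012_img_val.tar -C val
cd val
# val_synset_labels.txt is a hypothetical mapping: line i -> synset of the i-th validation image
i=0
while read -r synset; do
  i=$((i+1))
  img=$(printf "ILSVRC2012_val_%08d.JPEG" "$i")
  mkdir -p "$synset"
  mv "$img" "$synset/"
done < ../val_synset_labels.txt
```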
The ImageNet dataset path structure should look like: @@ -30,7 +32,15 @@ imagenet └── val_list.txt ``` -## Step 3: Run RepVGG +### Install Dependencies + +```bash +git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git +cd PaddleClas +pip3 install -r requirements.txt +``` + +## Model Training ```bash cd PaddleClas @@ -42,6 +52,8 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 -u -m paddle.distributed.launch --gpus=0,1,2,3,4,5,6,7 tools/train.py -c ppcls/configs/ImageNet/RepVGG/RepVGG_A0.yaml -o Arch.pretrained=False -o Global.device=gpu ``` -| GPU | FP32 | -| ----------- | ------------------------------------ | -| 8 cards | Acc@1=0.6990 | +## Model Results + +| Model | GPU | FP32 | +|--------|------------|--------------| +| RepVGG | BI-V100 x8 | Acc@1=0.6990 | diff --git a/cv/classification/repvgg/pytorch/README.md b/cv/classification/repvgg/pytorch/README.md index be5907217d58f2a15e6b9f9e715c77dcd384ccee..2b23b995fea8822a821c0cd01aade3fcb24a270a 100755 --- a/cv/classification/repvgg/pytorch/README.md +++ b/cv/classification/repvgg/pytorch/README.md @@ -1,19 +1,20 @@ # RepVGG -## Model description - A simple but powerful architecture of convolutional neural network, which has a VGG-like inference-time body composed of nothing but a stack of 3x3 convolution and ReLU, while the training-time model has a multi-branch topology. Such decoupling of the training-time and inference-time architecture is realized by a structural re-parameterization technique so that the model is named RepVGG. -## Step 1: Installing +## Model Description -```bash -pip3 install timm yacs -git clone https://github.com/DingXiaoH/RepVGG.git -cd RepVGG -git checkout eae7c5204001eaf195bbe2ee72fb6a37855cce33 -``` +RepVGG is a simple yet powerful convolutional neural network architecture that combines training-time multi-branch +topology with inference-time VGG-like simplicity. It uses structural re-parameterization to convert complex training +models into efficient inference models composed solely of 3x3 convolutions and ReLU activations. This approach achieves +state-of-the-art performance in image classification tasks while maintaining high speed and efficiency. RepVGG's design +is particularly suitable for applications requiring both high accuracy and fast inference, making it ideal for +real-world deployment scenarios. + +## Model Preparation -## Step 2: Download data +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
The ImageNet dataset path structure should look like: @@ -31,7 +32,16 @@ imagenet └── val_list.txt ``` -## Step 3: Run RepVGG +### Install Dependencies + +```bash +pip3 install timm yacs +git clone https://github.com/DingXiaoH/RepVGG.git +cd RepVGG +git checkout eae7c5204001eaf195bbe2ee72fb6a37855cce33 +``` + +## Model Training ```bash # fix --local-rank for torch 2.x @@ -44,7 +54,10 @@ sed -i "s@dataset = torchvision.datasets.ImageNet(root=config.DATA.DATA_PATH, sp python3 -m torch.distributed.launch --nproc_per_node 4 --master_port 12349 main.py --arch RepVGG-A0 --data-path ./imagenet --batch-size 32 --tag train_from_scratch --output ./ --opts TRAIN.EPOCHS 300 TRAIN.BASE_LR 0.1 TRAIN.WEIGHT_DECAY 1e-4 TRAIN.WARMUP_EPOCHS 5 MODEL.LABEL_SMOOTHING 0.1 AUG.PRESET weak AUG.MIXUP 0.0 DATA.DATASET imagenet DATA.IMG_SIZE 224 ``` -The original RepVGG models were trained in 120 epochs with cosine learning rate decay from 0.1 to 0. We used 8 GPUs, global batch size of 256, weight decay of 1e-4 (no weight decay on fc.bias, bn.bias, rbr_dense.bn.weight and rbr_1x1.bn.weight) (weight decay on rbr_identity.weight makes little difference, and it is better to use it in most of the cases), and the same simple data preprocssing as the PyTorch official example: +The original RepVGG models were trained in 120 epochs with cosine learning rate decay from 0.1 to 0. We used 8 GPUs, +global batch size of 256, weight decay of 1e-4 (no weight decay on fc.bias, bn.bias, rbr_dense.bn.weight and +rbr_1x1.bn.weight) (weight decay on rbr_identity.weight makes little difference, and it is better to use it in most of +the cases), and the same simple data preprocssing as the PyTorch official example: ```py trans = transforms.Compose([ @@ -54,16 +67,15 @@ The original RepVGG models were trained in 120 epochs with cosine learning rate transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])]) ``` -The valid model names include (--arch [model name]) +The valid model names include (--arch [model name]): -``` -RepVGGplus-L2pse, RepVGG-A0, RepVGG-A1, RepVGG-A2, RepVGG-B0, RepVGG-B1, RepVGG-B1g2, RepVGG-B1g4, RepVGG-B2, RepVGG-B2g2, RepVGG-B2g4, RepVGG-B3, RepVGG-B3g2, RepVGG-B3g4 -``` +RepVGGplus-L2pse, RepVGG-A0, RepVGG-A1, RepVGG-A2, RepVGG-B0, RepVGG-B1, RepVGG-B1g2, RepVGG-B1g4, RepVGG-B2, +RepVGG-B2g2, RepVGG-B2g4, RepVGG-B3, RepVGG-B3g2, RepVGG-B3g4. -| model | GPU | FP32 | -|----------| ----------- | ------------------------------------ | -| RepVGG-A0| 8 cards | Acc@1=0.7241 | +| Model | GPU | FP32 | +|-----------|------------|--------------| +| RepVGG-A0 | BI-V100 x8 | Acc@1=0.7241 | -## Reference +## References -- [RepMLP](https://github.com/DingXiaoH/RepVGG/tree/eae7c5204001eaf195bbe2ee72fb6a37855cce33) \ No newline at end of file +- [RepVGG](https://github.com/DingXiaoH/RepVGG/tree/eae7c5204001eaf195bbe2ee72fb6a37855cce33) diff --git a/cv/classification/repvit/pytorch/README.md b/cv/classification/repvit/pytorch/README.md index 1650ab34f57c6e8e5d3503c393b073915027ba91..05c43e7fe44f674a34102cb50cdf95ca24d8d478 100644 --- a/cv/classification/repvit/pytorch/README.md +++ b/cv/classification/repvit/pytorch/README.md @@ -1,24 +1,19 @@ -# RepViT -> [RepViT: Revisiting Mobile CNN From ViT Perspective](https://arxiv.org/abs/2307.09283) +# RepViT - +## Model Description -## Model description +RepViT is an efficient lightweight vision model that combines the strengths of CNNs and Transformers for mobile devices. 
+It enhances MobileNetV3 architecture with Transformer-inspired design choices, achieving superior performance and lower +latency than lightweight ViTs. RepViT demonstrates state-of-the-art accuracy on ImageNet while maintaining fast +inference speeds, making it ideal for resource-constrained applications. Its pure CNN architecture ensures +mobile-friendliness, with the largest variant achieving 83.7% accuracy at just 2.3ms latency on an iPhone 12. -Recently, lightweight Vision Transformers (ViTs) demonstrate superior performance and lower latency compared with lightweight Convolutional Neural Networks (CNNs) on resource-constrained mobile devices. This improvement is usually attributed to the multi-head self-attention module, which enables the model to learn global representations. However, the architectural disparities between lightweight ViTs and lightweight CNNs have not been adequately examined. In this study, we revisit the efficient design of lightweight CNNs and emphasize their potential for mobile devices. We incrementally enhance the mobile-friendliness of a standard lightweight CNN, specifically MobileNetV3, by integrating the efficient architectural choices of lightweight ViTs. This ends up with a new family of pure lightweight CNNs, namely RepViT. Extensive experiments show that RepViT outperforms existing state-of-the-art lightweight ViTs and exhibits favorable latency in various vision tasks. On ImageNet, RepViT achieves over 80\% top-1 accuracy with 1ms latency on an iPhone 12, which is the first time for a lightweight model, to the best of our knowledge. Our largest model, RepViT-M2.3, obtains 83.7\% accuracy with only 2.3ms latency. +## Model Preparation -## Step 1: Installation +### Prepare Resources -```bash -git clone https://github.com/THU-MIG/RepViT.git -cd RepViT -git checkout 298f42075eda5d2e6102559fad260c970769d34e -pip3 install -r requirements.txt -``` - -## Step 2: Preparing datasets - -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in the later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in the later training process. The ImageNet dataset path structure should look like: @@ -36,7 +31,16 @@ imagenet └── val_list.txt ``` -## Step 3: Training +### Install Dependencies + +```bash +git clone https://github.com/THU-MIG/RepViT.git +cd RepViT +git checkout 298f42075eda5d2e6102559fad260c970769d34e +pip3 install -r requirements.txt +``` + +## Model Training ```bash # On single GPU @@ -45,8 +49,10 @@ python3 main.py --model repvit_m0_9 --data-path /path/to/imagenet --dist-eval # Multiple GPUs on one machine python3 -m torch.distributed.launch --nproc_per_node=8 --master_port 12346 --use_env main.py --model repvit_m0_9 --data-path /path/to/imagenet --dist-eval ``` -Tips: -- Specify your data path and model name! + +Tips: + +- Specify your data path and model name! - Choose "3" when getting the output log below during training. 
```bash @@ -55,10 +61,12 @@ wandb: (2) Use an existing W&B account wandb: (3) Don't visualize my results ``` -## Results -|GPUs|FPS|ACC| -|:---:|:---:|:---:| -|BI-V100 x8|1.5984 s / it| Acc@1 78.53% | +## Model Results + +| Model | GPU | FPS | ACC | +|--------|------------|---------------|--------------| +| RepViT | BI-V100 x8 | 1.5984 s / it | Acc@1 78.53% | + +## References -## Reference -[RepViT](https://github.com/THU-MIG/RepViT/tree/298f42075eda5d2e6102559fad260c970769d34e) +- [RepViT](https://github.com/THU-MIG/RepViT/tree/298f42075eda5d2e6102559fad260c970769d34e) diff --git a/cv/classification/res2net50_14w_8s/paddlepaddle/README.md b/cv/classification/res2net50_14w_8s/paddlepaddle/README.md index d924475f5aa14060d58ee21e27ba5957b8d69068..c34af020488cb3e817683afa19fc9ea5fe2822bd 100644 --- a/cv/classification/res2net50_14w_8s/paddlepaddle/README.md +++ b/cv/classification/res2net50_14w_8s/paddlepaddle/README.md @@ -1,20 +1,19 @@ # Res2Net50_14w_8s -## Model description -Res2Net is modified from the source code of ResNet. The main function of Res2Net is to add hierarchical connections within the block and indirectly increase the receptive field while reusing the feature map. -## Step 1: Installation +## Model Description -```bash -git clone -b release/2.5 https://github.com/PaddlePaddle/PaddleClas.git -cd PaddleClas -pip3 install -r requirements.txt -pip3 install protobuf==3.20.3 urllib3==1.26.6 -yum install -y mesa-libGL -``` +Res2Net50_14w_8s is a convolutional neural network that enhances ResNet architecture by introducing hierarchical +residual-like connections within individual blocks. It increases the receptive field while reusing feature maps, +improving feature representation. The 14w_8s variant uses 14 width and 8 scales, achieving state-of-the-art performance +in image classification tasks. This architecture effectively balances model complexity and computational efficiency, +making it suitable for various computer vision applications requiring both high accuracy and efficient processing. -## Step 2: Preparing datasets +## Model Preparation -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
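
PaddleClas locates the two splits through the `train_list.txt`/`val_list.txt` index files, which are expected to pair an image path (relative to the dataset root) with an integer label on each line; list formats vary between preprocessing scripts, so this is worth confirming against your copy. A quick inspection sketch:

```bash
head -n 3 /path/to/imagenet/train_list.txt
# Flag any line that does not have exactly two whitespace-separated fields
awk 'NF != 2 {print "malformed line " NR ": " $0}' /path/to/imagenet/val_list.txt
```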
The ImageNet dataset path structure should look like: @@ -32,7 +31,17 @@ imagenet └── val_list.txt ``` -## Step 3: Training +### Install Dependencies + +```bash +git clone -b release/2.5 https://github.com/PaddlePaddle/PaddleClas.git +cd PaddleClas +pip3 install -r requirements.txt +pip3 install protobuf==3.20.3 urllib3==1.26.6 +yum install -y mesa-libGL +``` + +## Model Training ```bash cd PaddleClas @@ -44,11 +53,12 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m paddle.distributed.launch --gpus=0,1,2,3 tools/train.py -c ./ppcls/configs/ImageNet/Res2Net/Res2Net50_14w_8s.yaml -o Arch.pretrained=False -o Global.device=gpu ``` -## Results +## Model Results + +| Model | GPU | ACC | FPS | +|------------------|------------|--------------|-------------------| +| Res2Net50_14w_8s | BI-V100 x8 | top1: 0.7943 | 338.29 images/sec | -| GPUs | ACC | FPS -| ---------- | ------ | --- -| BI-V100 x8 | top1: 0.7943 | 338.29 images/sec +## References -## Reference - [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) diff --git a/cv/classification/resnest101/pytorch/README.md b/cv/classification/resnest101/pytorch/README.md index cd8593f1bfd63115f562e25b4bbc05f10e4da3fa..8177b6a5142013f5c2dcea4acd164231577611c9 100644 --- a/cv/classification/resnest101/pytorch/README.md +++ b/cv/classification/resnest101/pytorch/README.md @@ -1,14 +1,20 @@ # ResNeSt101 -## Model description -A ResNest is a variant on a ResNet, which instead stacks Split-Attention blocks. The cardinal group representations are then concatenated along the channel dimension.As in standard residual blocks, the final output of otheur Split-Attention block is produced using a shortcut connection. +## Model Description -## Step 1: Installing -```bash -pip3 install -r requirements.txt -``` +ResNeSt101 is a deep convolutional neural network that enhances ResNet architecture with Split-Attention blocks. It +introduces channel-wise attention mechanisms to improve feature representation, combining multiple feature-map groups +with adaptive feature aggregation. The architecture achieves state-of-the-art performance in image classification tasks +by effectively balancing computational efficiency and model capacity. ResNeSt101's design is particularly suitable for +large-scale visual recognition tasks, offering improved accuracy over standard ResNet variants while maintaining +efficient training and inference capabilities. -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +## Model Preparation + +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -26,16 +32,21 @@ imagenet └── val_list.txt ``` +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` + +## Model Training -## Step 2: Training -### Multiple GPUs on one machine (AMP) Set data path by `export DATA_PATH=/path/to/imagenet`. 
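For example, assuming the dataset was extracted to `/data/imagenet` (an illustrative path, not one required by the scripts), the variable can be exported in the same shell that launches training; the standard `CUDA_VISIBLE_DEVICES` variable can optionally be set first if you want to restrict which cards the distributed script uses.

```bash
# Illustrative values; adjust to your environment.
export DATA_PATH=/data/imagenet
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7  # optional: limit the visible GPUs
```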
The following command uses all cards to train: ```bash +# Multiple GPUs on one machine (AMP) bash train_resnest101_amp_dist.sh ``` +## References - -## Reference -https://github.com/zhanghang1989/ResNeSt +- [ResNeSt](https://github.com/zhanghang1989/ResNeSt) diff --git a/cv/classification/resnest14/pytorch/README.md b/cv/classification/resnest14/pytorch/README.md index be0d67917cfe172aaa34d2597317a213d15df164..230d10c6ae6b5eea31170399116c488b44662b48 100644 --- a/cv/classification/resnest14/pytorch/README.md +++ b/cv/classification/resnest14/pytorch/README.md @@ -1,14 +1,20 @@ # ResNeSt14 -## Model description -A ResNest is a variant on a ResNet, which instead stacks Split-Attention blocks. The cardinal group representations are then concatenated along the channel dimension.As in standard residual blocks, the final output of otheur Split-Attention block is produced using a shortcut connection. +## Model Description -## Step 1: Installing -```bash -pip3 install -r requirements.txt -``` +ResNeSt14 is a lightweight convolutional neural network that combines ResNet architecture with Split-Attention blocks. +It introduces channel-wise attention mechanisms to enhance feature representation, using adaptive feature aggregation +across multiple groups. The architecture achieves efficient performance in image classification tasks by balancing model +complexity and computational efficiency. ResNeSt14's design is particularly suitable for applications with limited +resources, offering improved accuracy over standard ResNet variants while maintaining fast training and inference +capabilities. -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +## Model Preparation + +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -26,16 +32,21 @@ imagenet └── val_list.txt ``` +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` + +## Model Training -## Step 2: Training -### Multiple GPUs on one machine (AMP) Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: ```bash +# Multiple GPUs on one machine (AMP) bash train_resnest14_amp_dist.sh ``` +## References - -## Reference -https://github.com/zhanghang1989/ResNeSt +- [ResNeSt](https://github.com/zhanghang1989/ResNeSt) diff --git a/cv/classification/resnest269/pytorch/README.md b/cv/classification/resnest269/pytorch/README.md index e51db47588f8f985d57dbea42f260c6dc1376ed6..d19cd53daadc9523f6efc4e5c383fcaab1f87ef1 100644 --- a/cv/classification/resnest269/pytorch/README.md +++ b/cv/classification/resnest269/pytorch/README.md @@ -1,13 +1,20 @@ # ResNeSt269 -## Model description -A ResNest is a variant on a ResNet, which instead stacks Split-Attention blocks. The cardinal group representations are then concatenated along the channel dimension.As in standard residual blocks, the final output of otheur Split-Attention block is produced using a shortcut connection. -## Step 1: Installing -```bash -pip3 install -r requirements.txt -``` +## Model Description + +ResNeSt269 is an advanced convolutional neural network that enhances ResNet architecture with Split-Attention blocks. 
It +introduces channel-wise attention mechanisms to improve feature representation, combining multiple feature-map groups +with adaptive feature aggregation. The architecture achieves state-of-the-art performance in image classification tasks +by effectively balancing computational efficiency and model capacity. ResNeSt269's design is particularly suitable for +large-scale visual recognition tasks, offering improved accuracy over standard ResNet variants while maintaining +efficient training and inference capabilities. + +## Model Preparation + +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -25,18 +32,21 @@ imagenet └── val_list.txt ``` -:beers: Done! +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` + +## Model Training -## Step 2: Training -### Multiple GPUs on one machine Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: ```bash +# Multiple GPUs on one machine bash train_resnest269_amp_dist.sh ``` -:beers: Done! - +## References -## Reference -https://github.com/zhanghang1989/ResNeSt +- [ResNeSt](https://github.com/zhanghang1989/ResNeSt) diff --git a/cv/classification/resnest50/paddlepaddle/README.md b/cv/classification/resnest50/paddlepaddle/README.md index 6204dd7d181c8df2f8278edf0ca367a474880d78..9717d0caa703594d3d672a605cbc3b0dfc285ef1 100644 --- a/cv/classification/resnest50/paddlepaddle/README.md +++ b/cv/classification/resnest50/paddlepaddle/README.md @@ -1,18 +1,20 @@ # ResNeSt50 -## Model description -A ResNest is a variant on a ResNet, which instead stacks Split-Attention blocks. The cardinal group representations are then concatenated along the channel dimension.As in standard residual blocks, the final output of otheur Split-Attention block is produced using a shortcut connection. -## Step 1: Installing +## Model Description -```bash -git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git -cd PaddleClas -pip3 install -r requirements.txt -``` +ResNeSt50 is a convolutional neural network that enhances ResNet architecture with Split-Attention blocks. It introduces +channel-wise attention mechanisms to improve feature representation, combining multiple feature-map groups with adaptive +feature aggregation. The architecture achieves state-of-the-art performance in image classification tasks by effectively +balancing computational efficiency and model capacity. ResNeSt50's design is particularly suitable for large-scale +visual recognition tasks, offering improved accuracy over standard ResNet variants while maintaining efficient training +and inference capabilities. + +## Model Preparation -## Step 2: Download data +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
+Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -30,7 +32,15 @@ imagenet └── val_list.txt ``` -## Step 3: Run ResNeSt50 +### Install Dependencies + +```bash +git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git +cd PaddleClas +pip3 install -r requirements.txt +``` + +## Model Training ```bash cd PaddleClas @@ -42,6 +52,6 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -u -m paddle.distributed.launch --gpus=0,1,2,3 tools/train.py -c ppcls/configs/ImageNet/ResNeSt/ResNeSt50.yaml -o Arch.pretrained=False -o Global.device=gpu ``` -| GPU | FP32 | -| ----------- | ------------------------------------ | -| 4 cards | Acc@1=0.7677 | +| Model | GPU | FP32 | +|-----------|------------|--------------| +| ResNeSt50 | BI-V100 x4 | Acc@1=0.7677 | diff --git a/cv/classification/resnest50/pytorch/README.md b/cv/classification/resnest50/pytorch/README.md index d801c719a7a2ed9cc65e0a65851f003722a9eb72..8f3fba24a893798942b8a7365c69f61e110681ae 100644 --- a/cv/classification/resnest50/pytorch/README.md +++ b/cv/classification/resnest50/pytorch/README.md @@ -1,14 +1,20 @@ # ResNeSt50 -## Model description -A ResNest is a variant on a ResNet, which instead stacks Split-Attention blocks. The cardinal group representations are then concatenated along the channel dimension.As in standard residual blocks, the final output of otheur Split-Attention block is produced using a shortcut connection. +## Model Description -## Step 1: Installing -```bash -pip3 install -r requirements.txt -``` +ResNeSt50 is a convolutional neural network that enhances ResNet architecture with Split-Attention blocks. It introduces +channel-wise attention mechanisms to improve feature representation, combining multiple feature-map groups with adaptive +feature aggregation. The architecture achieves state-of-the-art performance in image classification tasks by effectively +balancing computational efficiency and model capacity. ResNeSt50's design is particularly suitable for large-scale +visual recognition tasks, offering improved accuracy over standard ResNet variants while maintaining efficient training +and inference capabilities. + +## Model Preparation + +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -26,15 +32,21 @@ imagenet └── val_list.txt ``` +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` + +## Model Training -## Step 2: Training -### Multiple GPUs on one machine (AMP) Set data path by `export DATA_PATH=/path/to/imagenet`. 
The following command uses all cards to train: ```bash +# Multiple GPUs on one machine (AMP) bash train_resnest50_amp_dist.sh ``` +## References -## Reference -https://github.com/zhanghang1989/ResNeSt +- [ResNeSt](https://github.com/zhanghang1989/ResNeSt) diff --git a/cv/classification/resnet101/pytorch/README.md b/cv/classification/resnet101/pytorch/README.md index 96f8c04fb580234a813d0b9090069db441e2b300..ec5c7a7f5bf60040a429180f714b151890b97f31 100644 --- a/cv/classification/resnet101/pytorch/README.md +++ b/cv/classification/resnet101/pytorch/README.md @@ -1,13 +1,20 @@ # ResNet101 -## Model description -Residual Networks, or ResNets, learn residual functions with reference to the layer inputs, instead of learning unreferenced functions. Instead of hoping each few stacked layers directly fit a desired underlying mapping, residual nets let these layers fit a residual mapping. -## Step 1: Installing -```bash -pip3 install -r requirements.txt -``` +## Model Description + +ResNet101 is a deep convolutional neural network with 101 layers, building upon the ResNet architecture's residual +learning framework. It extends ResNet50's capabilities with additional layers for more complex feature extraction. The +model uses skip connections to address vanishing gradient problems, enabling effective training of very deep networks. +ResNet101 achieves state-of-the-art performance in image classification tasks while maintaining computational +efficiency. Its architecture is widely used as a backbone for various computer vision applications, including object +detection and segmentation. -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +## Model Preparation + +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -25,15 +32,21 @@ imagenet └── val_list.txt ``` -## Step 2: Training -### Multiple GPUs on one machine +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` + +## Model Training + Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: ```bash +# Multiple GPUs on one machine bash train_resnet101_amp_dist.sh ``` +## References - -## Reference -- [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) +- [vision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/resnet152/pytorch/README.md b/cv/classification/resnet152/pytorch/README.md index 404208c3fe1062ddef7b92da795c0e42980f02d6..d21818f6624c3c5f65eb88067692e9d9af998114 100644 --- a/cv/classification/resnet152/pytorch/README.md +++ b/cv/classification/resnet152/pytorch/README.md @@ -1,13 +1,20 @@ # ResNet152 -## Model description -Residual Networks, or ResNets, learn residual functions with reference to the layer inputs, instead of learning unreferenced functions. Instead of hoping each few stacked layers directly fit a desired underlying mapping, residual nets let these layers fit a residual mapping. 
-## Step 1: Installing -```bash -pip3 install -r requirements.txt -``` +## Model Description + +ResNet152 is a deep convolutional neural network with 152 layers, representing one of the largest variants in the ResNet +family. It builds upon the residual learning framework, using skip connections to enable effective training of very deep +networks. The model achieves state-of-the-art performance in image classification tasks by extracting complex +hierarchical features. ResNet152's architecture is particularly effective for large-scale visual recognition tasks, +offering improved accuracy over smaller ResNet variants while maintaining computational efficiency through its residual +connections. + +## Model Preparation + +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -25,16 +32,21 @@ imagenet └── val_list.txt ``` +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` + +## Model Training -## Step 2: Training -### Multiple GPUs on one machine Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: ```bash +# Multiple GPUs on one machine bash train_resnet152_amp_dist.sh ``` +## References - -## Reference -- [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) +- [vision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/resnet18/pytorch/README.md b/cv/classification/resnet18/pytorch/README.md index c75289f89008b10731c9e4dede0ea7c1fbe83fed..e2004f281f073e19a1ff9a8ff93b39c5485808f1 100644 --- a/cv/classification/resnet18/pytorch/README.md +++ b/cv/classification/resnet18/pytorch/README.md @@ -1,13 +1,19 @@ # ResNet18 -## Model description -Residual Networks, or ResNets, learn residual functions with reference to the layer inputs, instead of learning unreferenced functions. Instead of hoping each few stacked layers directly fit a desired underlying mapping, residual nets let these layers fit a residual mapping. -## Step 1: Installing -```bash -pip3 install -r requirements.txt -``` +## Model Description + +ResNet18 is a lightweight convolutional neural network with 18 layers, featuring residual connections that enable +efficient training of deep networks. It introduces skip connections that bypass layers, addressing vanishing gradient +problems and allowing for better feature learning. ResNet18 achieves strong performance in image classification tasks +while maintaining computational efficiency. Its compact architecture makes it suitable for applications with limited +resources, serving as a backbone for various computer vision tasks like object detection and segmentation. + +## Model Preparation + +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
+Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -25,18 +31,21 @@ imagenet └── val_list.txt ``` -:beers: Done! +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` + +## Model Training -## Step 2: Training -### Multiple GPUs on one machine Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: ```bash +# Multiple GPUs on one machine bash train_resnet18_amp_dist.sh ``` -:beers: Done! - +## References -## Reference -- [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) +- [vision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/resnet50/paddlepaddle/README.md b/cv/classification/resnet50/paddlepaddle/README.md index c4f0657c41c0f5c0d6bfe46a7ee0a880f09a4ee9..5151cfb89168c1901ed7dcb0571770622cab2cb8 100644 --- a/cv/classification/resnet50/paddlepaddle/README.md +++ b/cv/classification/resnet50/paddlepaddle/README.md @@ -1,21 +1,19 @@ # ResNet50 -## Model description -Residual Networks, or ResNets, learn residual functions with reference to the layer inputs, instead of learning unreferenced functions. Instead of hoping each few stacked layers directly fit a desired underlying mapping, residual nets let these layers fit a residual mapping. -## Step 1: Installing +## Model Description -```bash -git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git -cd PaddleClas -pip3 install -r requirements.txt -yum install mesa-libGL -y -pip3 install urllib3==1.26.6 -pip3 install protobuf==3.20.3 -``` +ResNet50 is a deep convolutional neural network with 50 layers, known for its innovative residual learning framework. It +introduces skip connections that bypass layers, enabling the training of very deep networks by addressing vanishing +gradient problems. This architecture achieved breakthrough performance in image classification tasks, winning the 2015 +ImageNet competition. ResNet50's efficient design and strong feature extraction capabilities make it widely used in +computer vision applications, serving as a backbone for various tasks like object detection and segmentation. + +## Model Preparation -## Step 2: Download data +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
The ImageNet dataset path structure should look like: @@ -33,9 +31,19 @@ imagenet └── val_list.txt ``` -**Tips** +### Install Dependencies + +```bash +git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git +cd PaddleClas +pip3 install -r requirements.txt +yum install mesa-libGL -y +pip3 install urllib3==1.26.6 +pip3 install protobuf==3.20.3 +``` + +Tips: for `PaddleClas` training, the images path in train_list.txt and val_list.txt must contain `train/` and `val/` directories: -For `PaddleClas` training, the images path in train_list.txt and val_list.txt must contain `train/` and `val/` directories: - train_list.txt: train/n01440764/n01440764_10026.JPEG 0 - val_list.txt: val/n01667114/ILSVRC2012_val_00000229.JPEG 35 @@ -58,12 +66,8 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -u -m paddle.distributed.launch --gpus=0,1,2,3 tools/train.py -c ppcls/configs/ImageNet/ResNet/ResNet50.yaml -o Arch.pretrained=False -o Global.device=gpu ``` -## Results on BI-V100 - -
- -| GPU | FP32 | -| ----------- | ------------------------------------ | -| 4 cards | Acc@1=76.27,FPS=80.37,BatchSize=64 | +## Model Results -
+| Model | GPU | FP32 | +|----------|------------|------------------------------------| +| ResNet50 | BI-V100 x4 | Acc@1=76.27,FPS=80.37,BatchSize=64 | diff --git a/cv/classification/resnet50/pytorch/README.md b/cv/classification/resnet50/pytorch/README.md index 3e3ef3f6ffb73e33c9114f4e9b80dfcaa4dfe27f..ce76b5930ce52852327420485f9c6da994696600 100644 --- a/cv/classification/resnet50/pytorch/README.md +++ b/cv/classification/resnet50/pytorch/README.md @@ -1,10 +1,19 @@ # ResNet50 -## Model description -Residual Networks, or ResNets, learn residual functions with reference to the layer inputs, instead of learning unreferenced functions. Instead of hoping each few stacked layers directly fit a desired underlying mapping, residual nets let these layers fit a residual mapping. -## Step 1: Preparing +## Model Description -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +ResNet50 is a deep convolutional neural network with 50 layers, known for its innovative residual learning framework. It +introduces skip connections that bypass layers, enabling the training of very deep networks by addressing vanishing +gradient problems. This architecture achieved breakthrough performance in image classification tasks, winning the 2015 +ImageNet competition. ResNet50's efficient design and strong feature extraction capabilities make it widely used in +computer vision applications, serving as a backbone for various tasks like object detection and segmentation. + +## Model Preparation + +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
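The layout below includes `train_list.txt` and `val_list.txt` index files. If your copy of ImageNet does not ship with them, they can be regenerated from the directory structure; the snippet below is only an illustrative sketch that assigns label indices to class folders in sorted order, and whether the training scripts actually read these files depends on the framework being used.

```bash
cd /path/to/imagenet
idx=0
: > train_list.txt
for cls in $(ls train); do
    for img in train/"$cls"/*.JPEG; do
        echo "$img $idx" >> train_list.txt
    done
    idx=$((idx + 1))
done
```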
The ImageNet dataset path structure should look like: @@ -22,45 +31,39 @@ imagenet └── val_list.txt ``` +## Model Training - -## Step 2: Training - -### One single GPU ```bash +# One single GPU bash scripts/fp32_1card.sh --data-path /path/to/imagenet -``` -### One single GPU (AMP) -```bash + +# One single GPU (AMP) bash scripts/amp_1card.sh --data-path /path/to/imagenet -``` -### Multiple GPUs on one machine -```bash + +# Multiple GPUs on one machine bash scripts/fp32_4cards.sh --data-path /path/to/imagenet bash scripts/fp32_8cards.sh --data-path /path/to/imagenet -``` -### Multiple GPUs on one machine (AMP) -```bash + +# Multiple GPUs on one machine (AMP) bash scripts/amp_4cards.sh --data-path /path/to/imagenet bash scripts/amp_8cards.sh --data-path /path/to/imagenet -``` + ### Multiple GPUs on two machines -```bash bash scripts/fp32_16cards.sh --data-path /path/to/imagenet ``` -## Results on BI-V100 +## Model Results -| | FP32 | AMP+NHWC | -| ----------- | ----------------------------------------------- | --------------------------------------------- | -| single card | Acc@1=76.02,FPS=330,Time=4d3h,BatchSize=280 | Acc@1=75.56,FPS=550,Time=2d13h,BatchSize=300 | -| 4 cards | Acc@1=75.89,FPS=1233,Time=1d2h,BatchSize=300 | Acc@1=79.04,FPS=2400,Time=11h,BatchSize=512 | -| 8 cards | Acc@1=74.98,FPS=2150,Time=12h43m,BatchSize=300 | Acc@1=76.43,FPS=4200,Time=8h,BatchSize=480 | +| Model | GPU | FP32 | AMP+NHWC | +|----------|------------|-------------------------------------------------|-----------------------------------------------| +| ResNet50 | BI-V100 x1 | Acc@1=76.02,FPS=330,Time=4d3h,BatchSize=280 | Acc@1=75.56,FPS=550,Time=2d13h,BatchSize=300 | +| ResNet50 | BI-V100 x4 | Acc@1=75.89,FPS=1233,Time=1d2h,BatchSize=300 | Acc@1=79.04,FPS=2400,Time=11h,BatchSize=512 | +| ResNet50 | BI-V100 x8 | Acc@1=74.98,FPS=2150,Time=12h43m,BatchSize=300 | Acc@1=76.43,FPS=4200,Time=8h,BatchSize=480 | | Convergence criteria | Configuration (x denotes number of GPUs) | Performance | Accuracy | Power(W) | Scalability | Memory utilization(G) | Stability | |----------------------|------------------------------------------|-------------|----------|------------|-------------|-------------------------|-----------| | top1 75.9% | SDK V2.2,bs:512,8x,AMP | 5221 | 76.43% | 128\*8 | 0.97 | 29.1\*8 | 1 | +## References -## Reference -- [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) +- [vision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/resnet50/tensorflow/README.md b/cv/classification/resnet50/tensorflow/README.md index e5151215cca78ee03a987c3544781569aba9e5d3..c12d7e5e358ecdc182f47d24543ddf65617917d1 100644 --- a/cv/classification/resnet50/tensorflow/README.md +++ b/cv/classification/resnet50/tensorflow/README.md @@ -1,41 +1,41 @@ # ResNet50 -## Model description -Residual Networks, or ResNets, learn residual functions with reference to the layer inputs, instead of learning unreferenced functions. Instead of hoping each few stacked layers directly fit a desired underlying mapping, residual nets let these layers fit a residual mapping. +## Model Description -## Prepare +ResNet50 is a deep convolutional neural network with 50 layers, known for its innovative residual learning framework. It +introduces skip connections that bypass layers, enabling the training of very deep networks by addressing vanishing +gradient problems. 
This architecture achieved breakthrough performance in image classification tasks, winning the 2015
+ImageNet competition. ResNet50's efficient design and strong feature extraction capabilities make it widely used in
+computer vision applications, serving as a backbone for various tasks like object detection and segmentation.
-### Install packages
+## Model Preparation
-```shell
-pip3 install absl-py git+https://github.com/NVIDIA/dllogger#egg=dllogger
-```
+### Prepare Resources
-### Download datasets
+Download ImageNet and convert it to TFRecord format following [ImageNet-to-TFrecord](https://github.com/kmonachopoulos/ImageNet-to-TFrecord)
+or [this guide](https://github.com/tensorflow/models/tree/master/research/slim#downloading-and-converting-to-tfrecord-format).
-[Downloading and converting to TFRecord format](https://github.com/kmonachopoulos/ImageNet-to-TFrecord) or
-[here](https://github.com/tensorflow/models/tree/master/research/slim#downloading-and-converting-to-tfrecord-format)
-make a file named imagenet_tfrecord, and store imagenet datasest convert to imagenet_tfrecord
+Make a directory named `imagenet_tfrecord`, and store the converted ImageNet TFRecord files in it.
+### Install Dependencies
+```shell
+pip3 install absl-py git+https://github.com/NVIDIA/dllogger#egg=dllogger
+```
-## Training
-
-### Training on single card
+## Model Training
```shell
+# Training on a single card
bash run_train_resnet50_imagenette.sh
-```
-### Training on mutil-cards
-```shell
+# Training on multiple cards
bash run_train_resnet50_multigpu_imagenette.sh
```
+## Model Results
-## Result
-
-| | acc | fps |
-| --- | --- | --- |
-| multi_card | 0.9860 | 236.9 |
+| Model | GPU | ACC | FPS |
+|----------|------------|--------|-------|
+| ResNet50 | BI-V100 x8 | 0.9860 | 236.9 |
diff --git a/cv/classification/resnext101_32x8d/pytorch/README.md b/cv/classification/resnext101_32x8d/pytorch/README.md
index 9b6abab27c4f31e7fc489405de894e377fb03e1c..fa33d3a431f11244d97a185ab7ca7c318099e01e 100644
--- a/cv/classification/resnext101_32x8d/pytorch/README.md
+++ b/cv/classification/resnext101_32x8d/pytorch/README.md
@@ -1,14 +1,20 @@
# ResNeXt101_32x8d
-## Model description
-A ResNeXt repeats a building block that aggregates a set of transformations with the same topology. Compared to a ResNet, it exposes a new dimension, cardinality (the size of the set of transformations) , as an essential factor in addition to the dimensions of depth and width.
+## Model Description
-## Step 1: Installing
-```bash
-pip3 install -r requirements.txt
-```
+ResNeXt101 is a deep convolutional network that extends ResNet architecture by introducing cardinality as a new
+dimension. The 32x8d variant uses 32 groups with 8-dimensional transformations in each block. This grouped convolution
+approach improves feature representation while maintaining computational efficiency. ResNeXt101 achieves
+state-of-the-art performance in image classification tasks by combining the benefits of residual learning with
+multi-branch transformations. Its architecture is particularly effective for large-scale visual recognition tasks,
+offering improved accuracy over standard ResNet models.
-Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process.
+## Model Preparation + +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -26,16 +32,21 @@ imagenet └── val_list.txt ``` +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` + +## Model Training -## Step 2: Training -### Multiple GPUs on one machine Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: ```bash +# Multiple GPUs on one machine bash train_resnext101_32x8d_amp_dist.sh ``` +## References - -## Reference -https://github.com/osmr/imgclsmob/blob/f2993d3ce73a2f7ddba05da3891defb08547d504/pytorch/pytorchcv/models/seresnext.py#L214 +- [imgclsmob](https://github.com/osmr/imgclsmob/blob/f2993d3ce73a2f7ddba05da3891defb08547d504/pytorch/pytorchcv/models/seresnext.py#L214) diff --git a/cv/classification/resnext50_32x4d/mindspore/README.md b/cv/classification/resnext50_32x4d/mindspore/README.md index 3ea8ce978c7c46e6be8b57515cde5ecb41ccd5e9..6a1ea169b578e3e7d9d93904c85c30fb0273bc08 100644 --- a/cv/classification/resnext50_32x4d/mindspore/README.md +++ b/cv/classification/resnext50_32x4d/mindspore/README.md @@ -1,27 +1,19 @@ # ResNeXt50_32x4d -## Model description -A ResNeXt repeats a building block that aggregates a set of transformations with the same topology. Compared to a ResNet, it exposes a new dimension, cardinality (the size of the set of transformations) , as an essential factor in addition to the dimensions of depth and width. +## Model Description -## Step 1: Installation -Install OpenMPI and mesa-libGL -```bash -wget https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.7.tar.gz -tar -xvf openmpi-4.0.7.tar.gz -cd openmpi-4.0.7 -./configure --prefix=/usr/local/bin --with-orte -make all -make install -vim ~/.bashrc -export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/ -source ~/.bashrc -yum install openssh-server, openssh-clients -yum install mesa-libGL -``` +ResNeXt50 is an enhanced version of ResNet50 that introduces cardinality as a new dimension alongside depth and width. +It uses grouped convolutions to create multiple parallel transformation paths within each block, improving feature +representation. The 32x4d variant has 32 groups with 4-dimensional transformations. This architecture achieves better +accuracy than ResNet50 with similar computational complexity, making it efficient for image classification tasks. +ResNeXt50's design has influenced many subsequent CNN architectures in computer vision. + +## Model Preparation -## Step 2:Preparing datasets +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
The ImageNet dataset path structure should look like: @@ -39,20 +31,38 @@ imagenet └── val_list.txt ``` -## Step 3: Training +### Install Dependencies + +Install OpenMPI and mesa-libGL + +```bash +wget https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.7.tar.gz +tar -xvf openmpi-4.0.7.tar.gz +cd openmpi-4.0.7 +./configure --prefix=/usr/local/bin --with-orte +make all +make install +vim ~/.bashrc +export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/ +source ~/.bashrc +yum install openssh-server, openssh-clients +yum install mesa-libGL +``` + +## Model Training + set `/path/to/checkpoint` to save the model. -single gpu: + ```bash +# Single gpu export CUDA_VISIBLE_DEVICES=0 python3 train.py \ --run_distribute=0 \ --device_target="GPU" \ --data_path=/path/to/imagenet/train \ --output_path /path/to/checkpoint -``` -multi-gpu: -```bash +# Multi-gpu export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 mpirun --allow-run-as-root -n 8 --output-filename log_output --merge-stderr-to-stdout \ python3 train.py \ @@ -62,21 +72,24 @@ mpirun --allow-run-as-root -n 8 --output-filename log_output --merge-stderr-to-s --output_path /path/to/checkpoint ``` -validate: -the " model_data_dir " in checkpoint_file_path should look like: `2022-02-02_time_02_22_22`, you should fill in +The " model_data_dir " in checkpoint_file_path should look like: `2022-02-02_time_02_22_22`, you should fill in the value based on your actual situation. + ```bash +# Evaluation +export CUDA_VISIBLE_DEVICES=0 python3 eval.py \ --data_path=/path/to/imagenet/val \ --device_target="GPU" \ --checkpoint_file_path=/path/to/checkpoint/model_data_dir/ckpt_0/ ``` -## Results +## Model Results + +| Model | GPU | FPS | ACC(TOP1) | ACC(TOP5) | +|-----------|------------|--------|-----------|-----------| +| ResNeXt50 | BI-V100 x8 | 109.97 | 78.18% | 94.03% | -| GPUs | FPS | ACC(TOP1) | ACC(TOP5) | -|-------------|-----------|--------------|--------------| -| BI-V100 x 8 | 109.97 | 78.18% | 94.03% | +## References -## Reference -https://gitee.com/mindspore/models/tree/master/research/cv/ResNeXt +- [ResNeXt](https://gitee.com/mindspore/models/tree/master/research/cv/ResNeXt) diff --git a/cv/classification/resnext50_32x4d/pytorch/README.md b/cv/classification/resnext50_32x4d/pytorch/README.md index 4e8a75883ad6ec36ae5a7898b3f5ac532a2e0e04..011b82080e34f0f4288c8800cf89233566661283 100644 --- a/cv/classification/resnext50_32x4d/pytorch/README.md +++ b/cv/classification/resnext50_32x4d/pytorch/README.md @@ -1,14 +1,19 @@ # ResNeXt50_32x4d -## Model description -A ResNeXt repeats a building block that aggregates a set of transformations with the same topology. Compared to a ResNet, it exposes a new dimension, cardinality (the size of the set of transformations) , as an essential factor in addition to the dimensions of depth and width. +## Model Description -## Step 1: Installing -```bash -pip3 install -r requirements.txt -``` +ResNeXt50 is an enhanced version of ResNet50 that introduces cardinality as a new dimension alongside depth and width. +It uses grouped convolutions to create multiple parallel transformation paths within each block, improving feature +representation. The 32x4d variant has 32 groups with 4-dimensional transformations. This architecture achieves better +accuracy than ResNet50 with similar computational complexity, making it efficient for image classification tasks. +ResNeXt50's design has influenced many subsequent CNN architectures in computer vision. 
-Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +## Model Preparation + +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -26,16 +31,21 @@ imagenet └── val_list.txt ``` +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` + +## Model Training -## Step 2: Training -### Multiple GPUs on one machine Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: ```bash +# Multiple GPUs on one machine bash train_resnext50_32x4d_amp_dist.sh ``` +## References - -## Reference -https://github.com/osmr/imgclsmob/blob/f2993d3ce73a2f7ddba05da3891defb08547d504/pytorch/pytorchcv/models/seresnext.py#L200 +- [imgclsmob](https://github.com/osmr/imgclsmob/blob/f2993d3ce73a2f7ddba05da3891defb08547d504/pytorch/pytorchcv/models/seresnext.py#L200) diff --git a/cv/classification/se_resnet50_vd/paddlepaddle/README.md b/cv/classification/se_resnet50_vd/paddlepaddle/README.md index e7ecfbcd47bf5c86fb66ec9fa03da99793e138e7..9f74f52cbb4621ddb66d4d2877130d647be5a7e2 100644 --- a/cv/classification/se_resnet50_vd/paddlepaddle/README.md +++ b/cv/classification/se_resnet50_vd/paddlepaddle/README.md @@ -1,22 +1,19 @@ # SE_ResNet50_vd -## Model description +## Model Description -The SENet structure is a weighted average between graph channels that can be embedded into other network structures. SE_ResNet50_vd is a model that adds the senet structure to ResNet50, further learning the dependency relationships between graph channels to obtain better image features. +SE_ResNet50_vd is an enhanced version of ResNet50 that incorporates Squeeze-and-Excitation (SE) blocks and variant +downsampling. The SE blocks adaptively recalibrate channel-wise feature responses, improving feature representation. The +variant downsampling preserves more information during feature map reduction. This architecture achieves better accuracy +than standard ResNet50 while maintaining computational efficiency. SE_ResNet50_vd is particularly effective for image +classification tasks, offering improved performance through better feature learning and channel attention mechanisms. -## Step 1: Installation +## Model Preparation -``` -pip3 install -r requirements.txt -python3 -m pip install urllib3==1.26.6 -yum install -y mesa-libGL - -git clone -b release/2.5 https://github.com/PaddlePaddle/PaddleClas.git -``` - -## Step 2: Preparing datasets +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `./PaddleClas/dataset/` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `./PaddleClas/dataset/` to your ImageNet path in later training process. 
The ImageNet dataset path structure should look like: @@ -34,33 +31,42 @@ ILSVRC2012 └── val_list.txt ``` -**Tips** +Tips: for `PaddleClas` training, the image path in train_list.txt and val_list.txt must contain `train/` and `val/` +directories: -For `PaddleClas` training, the image path in train_list.txt and val_list.txt must contain `train/` and `val/` directories: +- train_list.txt: train/n01440764/n01440764_10026.JPEG 0 +- val_list.txt: val/n01667114/ILSVRC2012_val_00000229.JPEG 35 -* train_list.txt: train/n01440764/n01440764_10026.JPEG 0 -* val_list.txt: val/n01667114/ILSVRC2012_val_00000229.JPEG 35 - -``` +```bash # add "train/" and "val/" to head of lines sed -i 's#^#train/#g' train_list.txt sed -i 's#^#val/#g' val_list.txt ``` -## Step 3: Training +### Install Dependencies + +```bash +pip3 install -r requirements.txt +python3 -m pip install urllib3==1.26.6 +yum install -y mesa-libGL +git clone -b release/2.5 https://github.com/PaddlePaddle/PaddleClas.git ``` + +## Model Training + +```bash cd PaddleClas/ export CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m paddle.distributed.launch --gpus="0,1,2,3" tools/train.py -c ./ppcls/configs/ImageNet/SENet/SE_ResNet50_vd.yaml ``` -## Results +## Model Results -| GPUS | ACC | FPS | -| ---- | ------ | --------- | -| BI-V100 x8 | 79.20% | 139.63 samples/s | +| Model | GPU | ACC | FPS | +|----------------|------------|--------|------------------| +| SE_ResNet50_vd | BI-V100 x8 | 79.20% | 139.63 samples/s | -## Reference +## References - [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/release/2.5) diff --git a/cv/classification/seresnext/pytorch/README.md b/cv/classification/seresnext/pytorch/README.md index 641c15a3508999495c8249b15af524aa17280405..ffc3ffb75795ae3f64d8489bde8317e863597724 100644 --- a/cv/classification/seresnext/pytorch/README.md +++ b/cv/classification/seresnext/pytorch/README.md @@ -1,12 +1,20 @@ # SEResNeXt -## Model description -SE ResNeXt is a variant of a ResNext that employs squeeze-and-excitation blocks to enable the network to perform dynamic channel-wise feature recalibration. -## Step 1: Installing -```bash -pip3 install -r requirements.txt -``` -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +## Model Description + +SEResNeXt is an advanced convolutional neural network that combines ResNeXt's grouped convolution with +Squeeze-and-Excitation (SE) blocks. It introduces channel attention mechanisms to adaptively recalibrate feature +responses, improving feature representation. The architecture leverages multiple parallel transformation paths within +each block while maintaining computational efficiency. SEResNeXt achieves state-of-the-art performance in image +classification tasks by effectively combining multi-branch transformations with channel-wise attention, making it +particularly suitable for complex visual recognition problems. + +## Model Preparation + +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
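Before launching a long training run it is worth a quick check that the extraction produced the expected number of images. The counts below are the standard ILSVRC2012 sizes (about 1.28 million training and 50,000 validation images); the commands are only an illustrative sanity check.

```bash
find /path/to/imagenet/train -name "*.JPEG" | wc -l  # expect ~1,281,167
find /path/to/imagenet/val -name "*.JPEG" | wc -l    # expect 50,000
```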
The ImageNet dataset path structure should look like: @@ -24,16 +32,21 @@ imagenet └── val_list.txt ``` +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` + +## Model Training -## Step 2: Training -### Multiple GPUs on one machine Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: ```bash +# Multiple GPUs on one machine bash train_seresnext101_32x4d_amp_dist.sh ``` +## References - -## Reference -https://github.com/osmr/imgclsmob/blob/f2993d3ce73a2f7ddba05da3891defb08547d504/pytorch/pytorchcv/models/seresnext.py#L214 +- [imgclsmob](https://github.com/osmr/imgclsmob/blob/f2993d3ce73a2f7ddba05da3891defb08547d504/pytorch/pytorchcv/models/seresnext.py#L214) diff --git a/cv/classification/shufflenetv2/paddlepaddle/README.md b/cv/classification/shufflenetv2/paddlepaddle/README.md index 35fa71c9c608ca3c269822ca8c2d90a13ca66d89..65bdbbf9523ab99c3c4c32a095e671afd810004f 100644 --- a/cv/classification/shufflenetv2/paddlepaddle/README.md +++ b/cv/classification/shufflenetv2/paddlepaddle/README.md @@ -1,24 +1,19 @@ # ShuffleNetv2 -## Model description -ShuffleNet v2 is a convolutional neural network optimized for a direct metric (speed) rather than indirect metrics like FLOPs. It builds upon ShuffleNet v1, which utilised pointwise group convolutions, bottleneck-like structures, and a channel shuffle operation. Differences are shown in the Figure to the right, including a new channel split operation and moving the channel shuffle operation further down the block.ShuffleNetv2 is an efficient convolutional neural network architecture for mobile devices. For more information check the paper: [ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design](https://arxiv.org/abs/1807.11164) -## Step 1: Installation -```bash -git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git +## Model Description -cd PaddleClas +ShuffleNetv2 is an efficient convolutional neural network designed specifically for mobile devices. It introduces +practical guidelines for CNN architecture design, focusing on direct speed optimization rather than indirect metrics +like FLOPs. The model features a channel split operation and optimized channel shuffle mechanism, improving both +accuracy and inference speed. ShuffleNetv2 achieves state-of-the-art performance in mobile image classification tasks +while maintaining low computational complexity, making it ideal for resource-constrained applications. -yum install mesa-libGL -y +## Model Preparation -pip3 install -r requirements.txt -pip3 install protobuf==3.20.3 -pip3 install urllib3==1.26.13 +### Prepare Resources -python3 setup.py install -``` - -## Step 2: Preparing Datasets -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
The ImageNet dataset path structure should look like: @@ -36,11 +31,26 @@ imagenet └── val_list.txt ``` -## Step 3: Training +### Install Dependencies + +```bash +yum install -y mesa-libGL + +git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git +cd PaddleClas/ +pip3 install -r requirements.txt +python3 setup.py install + +pip3 install protobuf==3.20.3 +pip3 install urllib3==1.26.13 + +``` + +## Model Training ```bash # Make sure your dataset path is the same as above -cd PaddleClas +cd PaddleClas/ # Link your dataset to default location ln -s /path/to/imagenet ./dataset/ILSVRC2012 @@ -51,11 +61,12 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -u -m paddle.distributed.launch --gpus=0,1,2,3 tools/train.py -c ppcls/configs/ImageNet/ShuffleNet/ShuffleNetV2_x1_0.yaml -o Arch.pretrained=False -o Global.device=gpu ``` -## Results +## Model Results + +| Model | GPU | Top1 | Top5 | ips | +|--------------|------------|-------|-------|------| +| ShuffleNetv2 | BI-V100 x4 | 0.684 | 0.881 | 1236 | -| GPUs | Top1 | Top5 |ips | -|-------------|-------------|----------------|----------------| -| BI-V100 x 4 | 0.684 | 0.881 | 1236 | +## References -## Reference - [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) diff --git a/cv/classification/shufflenetv2/pytorch/README.md b/cv/classification/shufflenetv2/pytorch/README.md index 8ed42cde5f2e8e21724eeda70af9e145943d5c02..1f28a20cafd4a8fd5d5a7bab0d09982dd6fa2e61 100644 --- a/cv/classification/shufflenetv2/pytorch/README.md +++ b/cv/classification/shufflenetv2/pytorch/README.md @@ -1,12 +1,19 @@ # ShuffleNetV2 -## Model description -ShuffleNet v2 is a convolutional neural network optimized for a direct metric (speed) rather than indirect metrics like FLOPs. It builds upon ShuffleNet v1, which utilised pointwise group convolutions, bottleneck-like structures, and a channel shuffle operation. -## Step 1: Installing -```bash -pip3 install -r requirements.txt -``` -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +## Model Description + +ShuffleNetv2 is an efficient convolutional neural network designed specifically for mobile devices. It introduces +practical guidelines for CNN architecture design, focusing on direct speed optimization rather than indirect metrics +like FLOPs. The model features a channel split operation and optimized channel shuffle mechanism, improving both +accuracy and inference speed. ShuffleNetv2 achieves state-of-the-art performance in mobile image classification tasks +while maintaining low computational complexity, making it ideal for resource-constrained applications. + +## Model Preparation + +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -24,15 +31,21 @@ imagenet └── val_list.txt ``` +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` + +## Model Training -## Step 2: Training -### Multiple GPUs on one machine Set data path by `export DATA_PATH=/path/to/imagenet`. 
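Since the script below starts a distributed AMP run across every visible card, it can help to confirm beforehand how many GPUs PyTorch actually sees; this optional one-liner assumes PyTorch is already installed from the requirements step.

```bash
# Optional: confirm the number of GPUs visible to PyTorch before launching.
python3 -c "import torch; print(torch.cuda.device_count())"
```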
The following command uses all cards to train: ```bash +# Multiple GPUs on one machine bash train_shufflenet_v2_x2_0_amp_dist.sh ``` +## References -## Reference -- [torchvision](https://github.com/pytorch/vision/tree/main/references/classification#shufflenet-v2) +- [vision](https://github.com/pytorch/vision/tree/main/references/classification#shufflenet-v2) diff --git a/cv/classification/squeezenet/pytorch/README.md b/cv/classification/squeezenet/pytorch/README.md index 61637e1cf2f4b4fa1a01d20941d6801e3562c68a..b084df03d0517eb8a2e631f8369cff08db3100b5 100644 --- a/cv/classification/squeezenet/pytorch/README.md +++ b/cv/classification/squeezenet/pytorch/README.md @@ -1,13 +1,20 @@ # SqueezeNet -## Model description -SqueezeNet is a convolutional neural network that employs design strategies to reduce the number of parameters, notably with the use of fire modules that "squeeze" parameters using 1x1 convolutions. +## Model Description -## Step 1: Installing -```bash -pip3 install torch torchvision -``` -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +SqueezeNet is a lightweight convolutional neural network designed for efficient deployment on resource-constrained +devices. It achieves AlexNet-level accuracy with 50x fewer parameters through innovative "fire modules" that combine 1x1 +"squeeze" convolutions with 1x1 and 3x3 "expand" convolutions. The architecture focuses on model compression while +maintaining good classification performance. SqueezeNet is particularly suitable for mobile and embedded applications +where model size and computational efficiency are critical, offering a balance between accuracy and resource +requirements. + +## Model Preparation + +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -25,17 +32,22 @@ imagenet └── val_list.txt ``` -:beers: Done! +### Install Dependencies -## Step 2: Training -### One single GPU ```bash -python3 train.py --data-path /path/to/imagenet --model squeezenet1_0 --lr 0.001 +pip3 install torch torchvision ``` -### Multiple GPUs on one machine + +## Model Training + ```bash +# One single GPU +python3 train.py --data-path /path/to/imagenet --model squeezenet1_0 --lr 0.001 + +# Multiple GPUs on one machine python3 -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --data-path /path/to/imagenette --model squeezenet1_0 --lr 0.001 ``` -## Reference -https://github.com/pytorch/vision/blob/main/torchvision/models/squeezenet.py +## References + +- [vision](https://github.com/pytorch/vision/blob/main/torchvision/models/squeezenet.py) diff --git a/cv/classification/swin_transformer/paddlepaddle/README.md b/cv/classification/swin_transformer/paddlepaddle/README.md index 218dbd9fd53d69834061f6ed75595950a36186f0..d63622abf20367cdda92fde8803b1b6b23ede6e2 100644 --- a/cv/classification/swin_transformer/paddlepaddle/README.md +++ b/cv/classification/swin_transformer/paddlepaddle/README.md @@ -1,18 +1,20 @@ # Swin Transformer -## Model description -The Swin Transformer is a type of Vision Transformer. 
It builds hierarchical feature maps by merging image patches (shown in gray) in deeper layers and has linear computation complexity to input image size due to computation of self-attention only within each local window (shown in red). It can thus serve as a general-purpose backbone for both image classification and dense recognition tasks. -## Step 1: Installing +## Model Description -```bash -git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git -cd PaddleClas -pip3 install -r requirements.txt -``` +The Swin Transformer is a hierarchical vision transformer that introduces shifted windows for efficient self-attention +computation. It processes images in local windows, reducing computational complexity while maintaining global modeling +capabilities. The architecture builds hierarchical feature maps by merging image patches in deeper layers, making it +suitable for both image classification and dense prediction tasks. Swin Transformer achieves state-of-the-art +performance in various vision tasks, offering a powerful alternative to traditional convolutional networks with its +transformer-based approach. + +## Model Preparation -## Step 2: Download data +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -30,7 +32,15 @@ imagenet └── val_list.txt ``` -## Step 3: Run Swin-Transformer +### Install Dependencies + +```bash +git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git +cd PaddleClas +pip3 install -r requirements.txt +``` + +## Model Training ```bash cd PaddleClas @@ -42,6 +52,8 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 -u -m paddle.distributed.launch --gpus=0,1,2,3,4,5,6,7 tools/train.py -c ppcls/configs/ImageNet/SwinTransformer/SwinTransformer_tiny_patch4_window7_224.yaml -o Arch.pretrained=False -o Global.device=gpu ``` -| GPU | FP32 | -| ----------- | ------------------------------------ | -| 8 cards | Acc@1=0.8024 | +## Model Results + +| Model | GPU | FP32 | +|------------------|------------|--------------| +| Swin Transformer | BI-V100 x8 | Acc@1=0.8024 | diff --git a/cv/classification/swin_transformer/pytorch/README.md b/cv/classification/swin_transformer/pytorch/README.md index 3b2d0b2802bd9c6b3092cfceb398709c5afc6c9f..d62655a1abed909e1c5d84030818bd8f98aa95fa 100644 --- a/cv/classification/swin_transformer/pytorch/README.md +++ b/cv/classification/swin_transformer/pytorch/README.md @@ -1,17 +1,20 @@ # Swin Transformer -## Model description -The Swin Transformer is a type of Vision Transformer. It builds hierarchical feature maps by merging image patches (shown in gray) in deeper layers and has linear computation complexity to input image size due to computation of self-attention only within each local window (shown in red). It can thus serve as a general-purpose backbone for both image classification and dense recognition tasks. 
-## Step 1: Installing
+## Model Description

-```bash
-git clone https://github.com/microsoft/Swin-Transformer.git
-git checkout f82860bfb5225915aca09c3227159ee9e1df874d
-cd Swin-Transformer
-pip install timm==0.4.12 yacs
-```
+The Swin Transformer is a hierarchical vision transformer that introduces shifted windows for efficient self-attention
+computation. It processes images in local windows, reducing computational complexity while maintaining global modeling
+capabilities. The architecture builds hierarchical feature maps by merging image patches in deeper layers, making it
+suitable for both image classification and dense prediction tasks. Swin Transformer achieves state-of-the-art
+performance in various vision tasks, offering a powerful alternative to traditional convolutional networks with its
+transformer-based approach.
+
+## Model Preparation
+
+### Prepare Resources

-Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process.
+Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to
+download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process.

The ImageNet dataset path structure should look like:

@@ -29,15 +32,27 @@ imagenet
└── val_list.txt
```

-## Step 2: Training
-### Multiple GPUs on one machine
+### Install Dependencies
+
```bash
-# fix --local-rank for torch 2.x
+git clone https://github.com/microsoft/Swin-Transformer.git
+cd Swin-Transformer
+git checkout f82860bfb5225915aca09c3227159ee9e1df874d
+pip install timm==0.4.12 yacs
+```
+
+## Model Training
+
+```bash
+# Multiple GPUs on one machine
+
+## fix --local-rank for torch 2.x
sed -i 's/--local_rank/--local-rank/g' main.py
python3 -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py \
--cfg configs/swin/swin_tiny_patch4_window7_224.yaml --data-path /path/to/imagenet --batch-size 128
```

-## Reference
-[Swin-Transformer](https://github.com/microsoft/Swin-Transformer)
+## References
+
+- [Swin-Transformer](https://github.com/microsoft/Swin-Transformer)
diff --git a/cv/classification/vgg/paddlepaddle/README.md b/cv/classification/vgg/paddlepaddle/README.md
index 4152d3a4b1ebf754d21be69915a687668f0d28bc..df26ef1909715f6a634d42b9dbd4f3e728318e37 100644
--- a/cv/classification/vgg/paddlepaddle/README.md
+++ b/cv/classification/vgg/paddlepaddle/README.md
@@ -1,21 +1,19 @@
# VGG16

-## Model description
-VGG is a classical convolutional neural network architecture. It was based on an analysis of how to increase the depth of such networks. The network utilises small 3 x 3 filters. Otherwise the network is characterized by its simplicity: the only other components being pooling layers and a fully connected layer.
-## Step 1: Installing
+## Model Description

-```bash
-git clone https://github.com/PaddlePaddle/PaddleClas.git
-```
+VGG is a classic convolutional neural network architecture known for its simplicity and depth. It uses small 3x3
+convolutional filters stacked in multiple layers, allowing for effective feature extraction. The architecture typically
+includes 16 or 19 weight layers, with VGG16 being the most popular variant. VGG achieved state-of-the-art performance in
+image classification tasks and became a benchmark for subsequent CNN architectures.
Its uniform structure and deep
+design have influenced many modern deep learning models in computer vision.

-```bash
-cd PaddleClas
-pip3 install -r requirements.txt
-```
+## Model Preparation

-## Step 2: Prepare Datasets
+### Prepare Resources

-Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process.
+Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to
+download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process.

The ImageNet dataset path structure should look like:

@@ -33,8 +31,21 @@ imagenet
└── val_list.txt
```

-## Step 3: Training
-Notice:if use AMP, modify PaddleClas/ppcls/configs/ImageNet/VGG/VGG16.yaml,
+### Install Dependencies
+
+```bash
+git clone https://github.com/PaddlePaddle/PaddleClas.git
+```
+
+```bash
+cd PaddleClas
+pip3 install -r requirements.txt
+```
+
+## Model Training
+
+Note: if you use AMP, modify `PaddleClas/ppcls/configs/ImageNet/VGG/VGG16.yaml` as follows:
+
```yaml
AMP:
   scale_loss: 128.0
@@ -53,5 +64,6 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -u -m paddle.distributed.launch --gpus=0,1,2,3 tools/train.py -c ppcls/configs/ImageNet/VGG/VGG16.yaml -o Arch.pretrained=False -o Global.device=gpu
```

-## Reference
+## References
+
- [PaddleClas](https://github.com/PaddlePaddle/PaddleClas)
diff --git a/cv/classification/vgg/pytorch/README.md b/cv/classification/vgg/pytorch/README.md
index a55b1cb7da45a1a5b3989e1ac89d4770db43f9f9..34ce4101fe03981088726e2af5fa0f63c09e894f 100644
--- a/cv/classification/vgg/pytorch/README.md
+++ b/cv/classification/vgg/pytorch/README.md
@@ -1,17 +1,19 @@
# VGG16

-## Model description
-VGG is a classical convolutional neural network architecture. It was based on an analysis of how to increase the depth of such networks. The network utilises small 3 x 3 filters. Otherwise the network is characterized by its simplicity: the only other components being pooling layers and a fully connected layer.
+## Model Description

-## Step 1: Installation
+VGG is a classic convolutional neural network architecture known for its simplicity and depth. It uses small 3x3
+convolutional filters stacked in multiple layers, allowing for effective feature extraction. The architecture typically
+includes 16 or 19 weight layers, with VGG16 being the most popular variant. VGG achieved state-of-the-art performance in
+image classification tasks and became a benchmark for subsequent CNN architectures. Its uniform structure and deep
+design have influenced many modern deep learning models in computer vision.

-```bash
-pip3 install -r requirements.txt
-```
+## Model Preparation

-## Step 2: Preparing datasets
+### Prepare Resources

-Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process.
+Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to
+download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process.
The ImageNet dataset path structure should look like: @@ -29,7 +31,13 @@ imagenet └── val_list.txt ``` -## Step 3: Training +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` + +## Model Training ```bash # Set data path @@ -39,5 +47,6 @@ export DATA_PATH=/path/to/imagenet bash train_vgg16_amp_dist.sh ``` -## Reference -- [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) +## References + +- [vision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/vgg/tensorflow/README.md b/cv/classification/vgg/tensorflow/README.md index c43928627752ed13fb410e9000889123f6072955..0a1ec7786770d454c246c94296aa35bf5352c4b6 100644 --- a/cv/classification/vgg/tensorflow/README.md +++ b/cv/classification/vgg/tensorflow/README.md @@ -1,20 +1,23 @@ # VGG16 -## Model description +## Model Description -VGG is a classical convolutional neural network architecture. It was based on an analysis of how to increase the depth of such networks. The network utilises small 3 x 3 filters. Otherwise the network is characterized by its simplicity: the only other components being pooling layers and a fully connected layer. +VGG is a classic convolutional neural network architecture known for its simplicity and depth. It uses small 3x3 +convolutional filters stacked in multiple layers, allowing for effective feature extraction. The architecture typically +includes 16 or 19 weight layers, with VGG16 being the most popular variant. VGG achieved state-of-the-art performance in +image classification tasks and became a benchmark for subsequent CNN architectures. Its uniform structure and deep +design have influenced many modern deep learning models in computer vision. -## Step 1: Installation +## Model Preparation -```bash -pip3 install absl-py git+https://github.com/NVIDIA/dllogger#egg=dllogger -``` - -## Step 2: Preparing datasets +### Prepare Resources You can get ImageNet 1K TFrecords ILSVRC2012 dataset directly from below links: -- [ImageNet 1K TFrecords ILSVRC2012 - part 0](https://www.kaggle.com/datasets/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-0) -- [ImageNet 1K TFrecords ILSVRC2012 - part 1](https://www.kaggle.com/datasets/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-1) + +- [ImageNet 1K TFrecords ILSVRC2012 - part + 0](https://www.kaggle.com/datasets/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-0) +- [ImageNet 1K TFrecords ILSVRC2012 - part + 1](https://www.kaggle.com/datasets/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-1) The ImageNet TFrecords dataset path structure should look like: @@ -28,8 +31,16 @@ imagenet_tfrecord └── validation-00127-of-00128 ``` -## Step 3: Training -**Put the TFrecords data in "./imagenet_tfrecord" directory or create a soft link.** +### Install Dependencies + +```bash +pip3 install absl-py git+https://github.com/NVIDIA/dllogger#egg=dllogger +``` + +## Model Training + +Put the TFrecords data in "./imagenet_tfrecord" directory or create a soft link. 
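+For example, if the TFrecords already exist somewhere else on disk, a symbolic link is one way to get the expected
+layout (the source path below is only a placeholder, not a path this repository provides):
+
+```bash
+# Link an existing TFrecords directory into the working directory (adjust the source path)
+ln -s /path/to/imagenet_tfrecord ./imagenet_tfrecord
+```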
+ ```bash # 1 GPU bash run_train_vgg16_imagenet.sh @@ -38,11 +49,12 @@ bash run_train_vgg16_imagenet.sh bash run_train_vgg16_multigpu_imagenet.sh ``` -## Results +## Model Results + +| Model | GPU | acc | fps | +|-------|------------|---------------------------|-------| +| VGG16 | BI-V100 ×8 | acc@1=0.7160,acc@5=0.9040 | 435.9 | -| GPUS | acc | fps | -| ----------| --------------------------| ----- | -| BI V100×8 | acc@1=0.7160,acc@5=0.9040 | 435.9 | +## References -## Reference -- [TensorFlow/benchmarks](https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks) \ No newline at end of file +- [tensorflow/benchmarks](https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks) diff --git a/cv/classification/wavemlp/pytorch/README.md b/cv/classification/wavemlp/pytorch/README.md index ac8c08c43a0670dd6f53f495a1e833d6ace147e2..b9a884a00fd1dfdab7cec705d89906d1a9db9ac5 100644 --- a/cv/classification/wavemlp/pytorch/README.md +++ b/cv/classification/wavemlp/pytorch/README.md @@ -1,18 +1,19 @@ # Wave-MLP -## Model description +## Model Description -In the field of computer vision, recent works show that a pure MLP architecture mainly stacked by fully-connected layers can achieve competing performance with CNN and transformer. An input image of vision MLP is usually split into multiple tokens (patches), while the existing MLP models directly aggregate them with fixed weights, neglecting the varying semantic information of tokens from different images. To dynamically aggregate tokens, we propose to represent each token as a wave function with two parts, amplitude and phase. Amplitude is the original feature and the phase term is a complex value changing according to the semantic contents of input images. Introducing the phase term can dynamically modulate the relationship between tokens and fixed weights in MLP. Based on the wave-like token representation, we establish a novel Wave-MLP architecture for vision tasks. Extensive experiments demonstrate that the proposed Wave-MLP is superior to the state-of-the-art MLP architectures on various vision tasks such as image classification, object detection and semantic segmentation. The source code is available at https://github.com/huawei-noah/CV-Backbones/tree/master/wavemlp_pytorch and https://gitee.com/mindspore/models/tree/master/research/cv/wave_mlp. +Wave-MLP is an innovative vision architecture that represents image tokens as wave functions with amplitude and phase +components. It dynamically modulates token relationships through phase terms, adapting to varying semantic information +in different images. This approach enhances feature aggregation in pure MLP architectures, outperforming traditional +CNNs and transformers in tasks like image classification and object detection. Wave-MLP offers efficient computation +while maintaining high accuracy, making it suitable for various computer vision applications. -## Step 1: Installing -```bash -pip install thop timm==0.4.5 torchprofile -git clone https://github.com/huawei-noah/Efficient-AI-Backbones.git -cd Efficient-AI-Backbones/wavemlp_pytorch/ -git checkout 25531f7fdcf61e300b47c52ba80973d0af8bb011 -``` +## Model Preparation + +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
+Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to
+download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process.

The ImageNet dataset path structure should look like:

@@ -30,9 +31,18 @@ imagenet
└── val_list.txt
```

-## Step 2: Training
+### Install Dependencies
+
+```bash
+pip install thop timm==0.4.5 torchprofile
+git clone https://github.com/huawei-noah/Efficient-AI-Backbones.git
+cd Efficient-AI-Backbones/wavemlp_pytorch/
+git checkout 25531f7fdcf61e300b47c52ba80973d0af8bb011
+```
+
+## Model Training

-### WaveMLP_T*:
+### WaveMLP_T*

### Multiple GPUs on one machine

@@ -48,25 +58,21 @@ sed -i 's/args.max_history/100/g' train.py

python3 -m torch.distributed.launch --nproc_per_node 8 --nnodes=1 --node_rank=0 train.py /your_path_to/imagenet/ --output /your_path_to/output/ --model WaveMLP_T_dw --sched cosine --epochs 300 --opt adamw -j 8 --warmup-lr 1e-6 --mixup .8 --cutmix 1.0 --model-ema --model-ema-decay 0.99996 --aa rand-m9-mstd0.5-inc1 --color-jitter 0.4 --warmup-epochs 5 --opt-eps 1e-8 --repeated-aug --remode pixel --reprob 0.25 --amp --lr 1e-3 --weight-decay .05 --drop 0 --drop-path 0.1 -b 128
```

-## Results on BI-V100
+## Model Results on BI-V100

-### FP16

-| card-batchsize-AMP opt-level | 1 card | 8 cards |
-| :-----| ----: | :----: |
-| BI-bs126-O1 | 114.76 | 884.27 |
-
-
-### FP32
-
-| batch_size | 1 card | 8 cards |
-| :-----| ----: | :----: |
-| 128 | 140.48 | 1068.15 |
+| Model | GPU | precision | batchsize | opt-level | fps |
+|----------|-----------|-----------|-----------|-----------|---------|
+| Wave-MLP | BI-V100x8 | FP16 | 128 | O1 | 884.27 |
+| Wave-MLP | BI-V100x1 | FP16 | 128 | O1 | 114.76 |
+| Wave-MLP | BI-V100x8 | FP32 | 128 | - | 1068.15 |
+| Wave-MLP | BI-V100x1 | FP32 | 128 | - | 140.48 |

| Convergence criteria | Configuration (x denotes number of GPUs) | Performance | Accuracy | Power(W) | Scalability | Memory utilization(G) | Stability |
|----------------------|------------------------------------------|-------------|----------|------------|-------------|-------------------------|-----------|
| 80.1 | SDK V2.2,bs:256,8x,fp32 | 1026 | 83.1 | 198\*8 | 0.98 | 29.4\*8 | 1 |

+## References

-## Reference
-[wavemlp_pytorch](https://github.com/huawei-noah/Efficient-AI-Backbones/blob/master/wavemlp_pytorch/)
+- [Efficient-AI-Backbones](https://github.com/huawei-noah/Efficient-AI-Backbones/blob/master/wavemlp_pytorch/)
diff --git a/cv/classification/wide_resnet101_2/pytorch/README.md b/cv/classification/wide_resnet101_2/pytorch/README.md
index b5cdc54ab5f2dd19519a683f9b28acae69a699a6..495bec852c2017ade18a6d75a74ee2c9a632b7ea 100644
--- a/cv/classification/wide_resnet101_2/pytorch/README.md
+++ b/cv/classification/wide_resnet101_2/pytorch/README.md
@@ -1,14 +1,19 @@
# Wide_ResNet101_2

-## Model description
-Wide Residual Networks are a variant on ResNets where we decrease depth and increase the width of residual networks. This is achieved through the use of wide residual blocks.
+## Model Description

-## Step 1: Installing
-```bash
-pip3 install -r requirements.txt
-```
+Wide_ResNet101_2 is an enhanced version of Wide_ResNet101 that further increases network width while maintaining
+residual connections. It uses wider residual blocks with more filters per layer, enabling richer feature representation.
+This architecture achieves superior performance in image classification tasks by balancing increased capacity with
+efficient training.
Wide_ResNet101_2 demonstrates improved accuracy over standard ResNet variants while maintaining +computational efficiency, making it suitable for complex visual recognition tasks requiring high performance. -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +## Model Preparation + +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -26,18 +31,21 @@ imagenet └── val_list.txt ``` -:beers: Done! +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` + +## Model Training -## Step 2: Training -### Multiple GPUs on one machine Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: ```bash +# Multiple GPUs on one machine bash train_wide_resnet101_2_amp_dist.sh ``` -:beers: Done! - +## References -## Reference -- [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) +- [vision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/xception/paddlepaddle/README.md b/cv/classification/xception/paddlepaddle/README.md index af341bc4976f8e08b789ab1ae5633d1504df36b0..425e10787afa8ff8a344e141e24baea58d30b80d 100644 --- a/cv/classification/xception/paddlepaddle/README.md +++ b/cv/classification/xception/paddlepaddle/README.md @@ -1,26 +1,25 @@ # Xception -## Model description +## Model Description -Xception is a convolutional neural network architecture that relies solely on depthwise separable convolution layers. +Xception is a deep convolutional neural network that extends the Inception architecture by replacing standard +convolutions with depthwise separable convolutions. This modification significantly reduces computational complexity +while maintaining high accuracy. Xception introduces extreme Inception modules that completely separate channel and +spatial correlations. The architecture achieves state-of-the-art performance in image classification tasks, offering an +efficient alternative to traditional CNNs. Its design is particularly suitable for applications requiring both high +accuracy and computational efficiency. -## Step 1: Installation +## Model Preparation -```bash -git clone -b release/2.5 https://github.com/PaddlePaddle/PaddleClas.git -cd PaddleClas -pip3 install scikit-learn easydict visualdl==2.2.0 urllib3==1.26.6 -yum install -y mesa-libGL -``` - -## Step 2: Preparing datasets +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `./PaddleClas/dataset/` to your ImageNet path in later training process. 
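+For example, once the PaddleClas repository has been cloned (see Install Dependencies below), one way to expose an
+existing ImageNet copy at the expected location is a symbolic link (the source path below is only a placeholder):
+
+```bash
+# Run after cloning PaddleClas; point the link at your extracted ImageNet copy
+mkdir -p ./PaddleClas/dataset
+ln -s /path/to/ILSVRC2012 ./PaddleClas/dataset/ILSVRC2012
+```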
The ImageNet dataset path structure should look like: ```bash -imagenet +ILSVRC2012 ├── train │ └── n01440764 │ ├── n01440764_10026.JPEG @@ -33,9 +32,9 @@ imagenet └── val_list.txt ``` -**Tips** +Tips: for `PaddleClas` training, the image path in train_list.txt and val_list.txt must contain `train/` and `val/` +directories: -For `PaddleClas` training, the image path in train_list.txt and val_list.txt must contain `train/` and `val/` directories: - train_list.txt: train/n01440764/n01440764_10026.JPEG 0 - val_list.txt: val/n01667114/ILSVRC2012_val_00000229.JPEG 35 @@ -45,7 +44,16 @@ sed -i 's#^#train/#g' train_list.txt sed -i 's#^#val/#g' val_list.txt ``` -## Step 3: Training +### Install Dependencies + +```bash +git clone -b release/2.5 https://github.com/PaddlePaddle/PaddleClas.git +cd PaddleClas +pip3 install scikit-learn easydict visualdl==2.2.0 urllib3==1.26.6 +yum install -y mesa-libGL +``` + +## Model Training ```bash # Make sure your dataset path is the same as above @@ -57,10 +65,12 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 -m paddle.distributed.launch --gpus=0,1,2,3,4,5,6,7 tools/train.py -c ./ppcls/configs/ImageNet/Xception/Xception41.yaml ``` -## Results -| GPUs | TOP1 | TOP5 | ips | -|:-----------:|:-----------:|:-----------:|:-----------:| -| BI-V100 x 8 |0.783 | 0.941 | 537.04 | +## Model Results + +| Model | GPU | TOP1 | TOP5 | ips | +|----------|------------|-------|-------|--------| +| Xception | BI-V100 x8 | 0.783 | 0.941 | 537.04 | + +## References -## Reference - [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) diff --git a/cv/classification/xception/pytorch/README.md b/cv/classification/xception/pytorch/README.md index a22a73fc6f316b74d3ddfe925ecd2cb5e05e94f1..beb31b487c02b626c8c7b4b04ddab69de122e563 100755 --- a/cv/classification/xception/pytorch/README.md +++ b/cv/classification/xception/pytorch/README.md @@ -1,14 +1,20 @@ # Xception -## Model description -Xception is a convolutional neural network architecture that relies solely on depthwise separable convolution layers. +## Model Description -## Step 1: Installing -```bash -pip3 install torch torchvision -``` +Xception is a deep convolutional neural network that extends the Inception architecture by replacing standard +convolutions with depthwise separable convolutions. This modification significantly reduces computational complexity +while maintaining high accuracy. Xception introduces extreme Inception modules that completely separate channel and +spatial correlations. The architecture achieves state-of-the-art performance in image classification tasks, offering an +efficient alternative to traditional CNNs. Its design is particularly suitable for applications requiring both high +accuracy and computational efficiency. + +## Model Preparation -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
The ImageNet dataset path structure should look like: @@ -26,15 +32,22 @@ imagenet └── val_list.txt ``` -## Step 2: Training -### One single GPU +### Install Dependencies + ```bash -python3 train.py --data-path /path/to/imagenet --model xception +pip3 install torch torchvision ``` -### Multiple GPUs on one machine + +## Model Training + ```bash +# One single GPU +python3 train.py --data-path /path/to/imagenet --model xception + +# Multiple GPUs on one machine python3 -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --data-path /path/to/imagenet --model xception ``` -## Reference -https://github.com/tstandley/Xception-PyTorch +## References + +- [Xception-PyTorch](https://github.com/tstandley/Xception-PyTorch)