# lambda-tensorflow-benchmark **Repository Path**: hangcui97/lambda-tensorflow-benchmark ## Basic Information - **Project Name**: lambda-tensorflow-benchmark - **Description**: lambda-tensorflow-benchmark - **Primary Language**: Unknown - **License**: BSD-3-Clause - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2022-12-02 - **Last Updated**: 2024-05-04 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README This is the code to produce the TensorFlow benchmark on this [website](https://lambdalabs.com/gpu-benchmarks) Here are also some related blog posts: - RTX 2080 Ti Deep Learning Benchmarks with TensorFlow - 2020: https://lambdalabs.com/blog/2080-ti-deep-learning-benchmarks/ - Titan RTX Deep Learning Benchmarks: https://lambdalabs.com/blog/titan-rtx-tensorflow-benchmarks/ - Titan V Deep Learning Benchmarks with TensorFlow in 2019: https://lambdalabs.com/blog/titan-v-deep-learning-benchmarks/ Tested Environment: - OS: Ubuntu 18.04 - TensorFlow version: 1.15.4 or 2.3.1 - CUDA Version 10.0 - CUDNN Version 7.6.5 You can use [Lambda stack](https://lambdalabs.com/lambda-stack-deep-learning-software) which system-wise install the above software stack. If you have CUDA 10.0 installed, you can also create a Python virtual environment by following these steps: ``` virtualenv -p /usr/bin/python3.6 venv . venv/bin/activate pip install matplotlib # TensorFlow 1.15.4 pip install tensorflow-gpu==1.15.4 # TensorFlow 2.3.1 pip install tensorflow-gpu==2.3.1 ``` #### Step One: Clone benchmark repo ``` git clone https://github.com/lambdal/lambda-tensorflow-benchmark.git --recursive ``` #### Step Two: Run benchmark with thermal profiler ``` TF_XLA_FLAGS=--tf_xla_auto_jit=2 \ ./batch_benchmark.sh \ min_num_gpus max_num_gpus \ num_runs num_batches_per_run \ thermal_sampling_frequency \ config_file ``` Notice if `min_num_gpus` is set to be different from `max_num_gpus`, then multiple benchmarks will be launched multiple times. One for each case between `min_num_gpus` and `max_num_gpus`. This is an example of benchmarking 4 GPUs (`min_num_gpus=4 and max_num_gpus=4`) for a single run (`num_runs=1`) of 100 batches (`num_batches_per_run=100`), measuring thermal every 2 seconds (`thermal_sampling_frequency=2`) and using the config file `config/config_resnet50_replicated_fp32_train_syn`. ``` TF_XLA_FLAGS=--tf_xla_auto_jit=2 \ ./batch_benchmark.sh 4 4 \ 1 100 \ 2 \ config/config_resnet50_replicated_fp32_train_syn ``` The config file `config_resnet50_replicated_fp32_train_syn.sh` sets up a `training` throughput test for `resnet50`, using `replicated` mode for parameter update, use `fp32` as the precision, and uses synthetic (`syn`) data: ``` MODELS="resnet50" VARIABLE_UPDATE="replicated" PRECISION="fp32" RUN_MODE="train" DATA_MODE="syn" ``` You can find more examples of configrations in the `config` folder. #### Step Three: Report Results This is the command to gather results in logs folder into a CSV file: ``` python tools/log2csv.py --precision fp32 python tools/log2csv.py --precision fp16 ``` The gathered results are saved in `tf-train-throughput-fp16.csv`, `tf-train-throughput-fp32.csv`, `tf-train-bs-fp16.csv` and `tf-train-bs-fp32.csv`. Add your own log to the `list_system` dictionary in `tools/log2csv.py`, so they can be included in the generated csv. You can also dispaly the `throughput v.s. time` and `GPU temperature v.s. time` graph using this command: ``` python display_thermal.py path-to-thermal.log --thermal_threshold ``` For example, this is the command to display the graphs of a ResNet50 training using 8x2080Ti: ``` python tools/display_thermal.py \ logs/Gold_6230-GeForce_RTX_2080_Ti_XLA_trt_TF2_2.logs/syn-replicated-fp16-8gpus/resnet50-128/thermal/1 \ --thermal_threshold 89 ``` #### Synthetic Data V.S. Real Data Set `DATA_MODE="syn"` in the config file uses synthetic data in the benchmarks. In which case images of random pixel colors were generated on GPU memory to avoid overheads such as I/O and data augmentation. You can also benchmark with real data. To do so, simply set `DATA_MODE="real"` in the config file. You also need to have imagenet tfrecords. For the purpose of benchmark training throughput, you can download and unzip this [mini portion of ImageNet](https://lambdalabs-files.s3-us-west-2.amazonaws.com/imagenet_mini.tar.gz)(1.3 GB) to your home directory. #### AMD Follow the guidance [here](https://github.com/ROCmSoftwarePlatform/tensorflow-upstream) ``` alias drun='sudo docker run \ -it \ --network=host \ --device=/dev/kfd \ --device=/dev/dri \ --ipc=host \ --shm-size 16G \ --group-add video \ --cap-add=SYS_PTRACE \ --security-opt seccomp=unconfined \ -v $HOME/dockerx:/dockerx' drun rocm/tensorflow:rocm3.5-tf2.1-dev #installed these two in the container https://repo.radeon.com/rocm/apt/3.5/pool/main/m/miopenkernels-gfx906-60/miopenkernels-gfx906-60_1.0.0_amd64.deb https://repo.radeon.com/rocm/apt/3.5/pool/main/m/miopenkernels-gfx906-64/miopenkernels-gfx906-64_1.0.0_amd64.deb cd /home/dockerx git clone https://github.com/lambdal/lambda-tensorflow-benchmark.git --recursive # Run a quick resnet50 test in FP32 ./batch_benchmark.sh 1 1 1 100 2 config_resnet50_replicated_fp32_train_syn # Run full test for all models, FP32 and FP16, training and inference ./batch_benchmark.sh 1 1 1 100 2 config_all ```