From 3f44b193af0362519af8a9110206d1326d7f140a Mon Sep 17 00:00:00 2001
From: lirongzhen1
Date: Wed, 29 Jul 2020 17:34:58 +0800
Subject: [PATCH] add wide and deep benchmark

---
 docs/source_en/benchmark.md    | 22 ++++++++++++++++++++++
 docs/source_zh_cn/benchmark.md | 22 ++++++++++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/docs/source_en/benchmark.md b/docs/source_en/benchmark.md
index 446ddac3bb..bd1b0becf3 100644
--- a/docs/source_en/benchmark.md
+++ b/docs/source_en/benchmark.md
@@ -27,3 +27,25 @@ For details about the MindSpore pre-trained model, see [Model Zoo](https://gitee
 
 1. The preceding performance is obtained based on ModelArts, the HUAWEI CLOUD AI development platform. The network contains 24 hidden layers, the sequence length is 128 tokens, and the vocabulary contains 21128 tokens.
 2. For details about other open source frameworks, see [BERT For TensorFlow](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/LanguageModeling/BERT).
+
+### Wide & Deep (data parallel)
+
+| Network | Network Type | Dataset | MindSpore Version | Resource | Precision | Batch Size | Throughput | Speedup |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| Wide & Deep | Recommendation | Criteo | 0.6.0 | Ascend: 1 * Ascend 910<br/>CPU: 24 Cores | Mixed | 16000 | 796892 samples/sec | - |
+| | | | | Ascend: 8 * Ascend 910<br/>CPU: 192 Cores | Mixed | 16000*8 | 4872849 samples/sec | 0.76 |
+
+1. The preceding performance is obtained on Atlas 800 with the network trained in data parallel mode.
+2. For details about other open source frameworks, see [Wide & Deep For TensorFlow](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Recommendation/WideAndDeep).
+
+### Wide & Deep (Host-Device model parallel)
+
+| Network | Network Type | Dataset | MindSpore Version | Resource | Precision | Batch Size | Throughput | Speedup |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| Wide & Deep | Recommendation | Criteo | 0.6.0 | Ascend: 1 * Ascend 910<br/>CPU: 24 Cores | Mixed | 8000 | 68715 samples/sec | - |
+| | | | | Ascend: 8 * Ascend 910<br/>CPU: 192 Cores | Mixed | 8000*8 | 283830 samples/sec | 0.51 |
+| | | | | Ascend: 16 * Ascend 910<br/>CPU: 384 Cores | Mixed | 8000*16 | 377848 samples/sec | 0.34 |
+| | | | | Ascend: 32 * Ascend 910<br/>CPU: 768 Cores | Mixed | 8000*32 | 433423 samples/sec | 0.20 |
+
+1. The preceding performance is obtained on Atlas 800 with the network trained in Host-Device hybrid model parallel mode.
+2. For details about other open source frameworks, see [Wide & Deep For TensorFlow](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Recommendation/WideAndDeep).
diff --git a/docs/source_zh_cn/benchmark.md b/docs/source_zh_cn/benchmark.md
index 2da80e81d9..1f4833f0c6 100644
--- a/docs/source_zh_cn/benchmark.md
+++ b/docs/source_zh_cn/benchmark.md
@@ -26,3 +26,25 @@
 
 1. The preceding data was measured on ModelArts, the HUAWEI CLOUD AI development platform. The network contains 24 hidden layers, the sequence length is 128 tokens, and the vocabulary contains 21128 tokens.
 2. For data on other open source frameworks in the industry, see [BERT For TensorFlow](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/LanguageModeling/BERT).
+
+### Wide & Deep (data parallel)
+
+| Network | Network Type | Dataset | MindSpore Version | Resource | Precision | Batch Size | Throughput | Speedup |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| Wide & Deep | Recommendation | Criteo | 0.6.0 | Ascend: 1 * Ascend 910<br/>CPU: 24 Cores | Mixed | 16000 | 796892 samples/sec | - |
+| | | | | Ascend: 8 * Ascend 910<br/>CPU: 192 Cores | Mixed | 16000*8 | 4872849 samples/sec | 0.76 |
+
+1. The preceding data was measured on Atlas 800 with the network trained in data parallel mode.
+2. For data on other open source frameworks in the industry, see [Wide & Deep For TensorFlow](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Recommendation/WideAndDeep).
+
+### Wide & Deep (Host-Device hybrid computing model parallel)
+
+| Network | Network Type | Dataset | MindSpore Version | Resource | Precision | Batch Size | Throughput | Speedup |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| Wide & Deep | Recommendation | Criteo | 0.6.0 | Ascend: 1 * Ascend 910<br/>CPU: 24 Cores | Mixed | 8000 | 68715 samples/sec | - |
+| | | | | Ascend: 8 * Ascend 910<br/>CPU: 192 Cores | Mixed | 8000*8 | 283830 samples/sec | 0.51 |
+| | | | | Ascend: 16 * Ascend 910<br/>CPU: 384 Cores | Mixed | 8000*16 | 377848 samples/sec | 0.34 |
+| | | | | Ascend: 32 * Ascend 910<br/>CPU: 768 Cores | Mixed | 8000*32 | 433423 samples/sec | 0.20 |
+
+1. The preceding data was measured on Atlas 800 with the network trained in model parallel mode.
+2. For data on other open source frameworks in the industry, see [Wide & Deep For TensorFlow](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Recommendation/WideAndDeep).
-- 
Gitee
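The Speedup column in the Wide & Deep tables is below 1 even for 32 devices, so it appears to be linear scaling efficiency rather than raw speedup: N-device throughput divided by N times the single-device throughput. The patch does not state this formula. The sketch below assumes it and recomputes the column from the Throughput values in the tables. The helper names (`scaling_efficiency`, `report`) are hypothetical and not part of MindSpore.

```python
"""Recompute the Speedup column of the Wide & Deep benchmark tables (a sketch).

Assumption (not stated in the patch): "Speedup" means linear scaling efficiency,
i.e. N-device throughput / (N * single-device throughput).
"""
from typing import Dict, Tuple


def scaling_efficiency(throughput_n: float, throughput_1: float, devices: int) -> float:
    """Return multi-device throughput as a fraction of ideal linear scaling."""
    if devices < 1 or throughput_1 <= 0:
        raise ValueError("devices must be >= 1 and throughput_1 must be positive")
    return throughput_n / (devices * throughput_1)


# Throughput figures (samples/sec) and reported Speedup values copied from the tables.
DATA_PARALLEL_1 = 796892
DATA_PARALLEL: Dict[int, Tuple[int, float]] = {8: (4872849, 0.76)}

HOST_DEVICE_1 = 68715
HOST_DEVICE: Dict[int, Tuple[int, float]] = {
    8: (283830, 0.51),
    16: (377848, 0.34),
    32: (433423, 0.20),
}


def report(name: str, single: float, runs: Dict[int, Tuple[int, float]]) -> Dict[int, float]:
    """Print the computed efficiency next to the reported value; return the computed values."""
    results: Dict[int, float] = {}
    for devices, (throughput, reported) in sorted(runs.items()):
        eff = scaling_efficiency(throughput, single, devices)
        results[devices] = eff
        print(f"{name}: {devices:>2} devices -> {eff:.3f} (reported {reported:.2f})")
    return results


if __name__ == "__main__":
    report("data parallel", DATA_PARALLEL_1, DATA_PARALLEL)
    report("host-device", HOST_DEVICE_1, HOST_DEVICE)
```

Under this assumption the computed values are about 0.764 for the 8-device data parallel run and 0.516, 0.344, and 0.197 for the Host-Device runs, all within 0.01 of the reported figures. The table seems to mix truncation and rounding (0.516 is listed as 0.51, while 0.197 is listed as 0.20).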