diff --git a/tutorials/notebook/enable_cache.ipynb b/tutorials/notebook/enable_cache.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..46ff5509d4a2231be4bf7f317f98b5c8dad5b98c --- /dev/null +++ b/tutorials/notebook/enable_cache.ipynb @@ -0,0 +1,456 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "naked-wright", + "metadata": {}, + "source": [ + "# Applying Single-Node Data Cache\n", + "Author: [陈超然](https://gitee.com/sunny_ccr) \n", + "`Linux` `Ascend` `GPU` `CPU` `Data Preparation` `Intermediate` `Advanced`" + ] + }, + { + "cell_type": "markdown", + "id": "bound-connecticut", + "metadata": {}, + "source": [ + "## Overview\n", + "When a dataset must be accessed repeatedly from a remote source or read repeatedly from disk, the single-node cache operator can cache the dataset in local memory to speed up reading.\n", + "\n", + "This tutorial demonstrates how to use the single-node cache service to cache data that has gone through data augmentation." + ] + }, + { + "cell_type": "markdown", + "id": "toxic-penguin", + "metadata": {}, + "source": [ + "## Configuring the Environment\n", + "Before using the cache service, install MindSpore and set the relevant environment variables. Taking a Conda environment as an example, enter the conda installation path and the name of the current virtual environment in the text boxes below to set `path_to_conda` and `your_env_name`. These two variables are then used to configure the `LD_LIBRARY_PATH` and `PATH` environment variables." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "social-excitement", + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "b65f11ec184b46dc9ce09bd9712f3381", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Box(children=(Label(value='please input your path_to_conda:'), Text(value='')))" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "fa8be545f5194394a65cb50a3179f080", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Box(children=(Label(value='please input your your_env_name:'), Text(value='')))" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import ipywidgets as widgets\n", + "from IPython.display import display\n", + "\n", + "label_conda = widgets.Label(value='please input your path_to_conda:')\n", + "text_conda = 
widgets.Text()\n", + "box_conda = widgets.Box([label_conda,text_conda])\n", + "display(box_conda)\n", + "\n", + "label_env = widgets.Label(value='please input your your_env_name:')\n", + "text_env = widgets.Text()\n", + "box_env = widgets.Box([label_env,text_env])\n", + "display(box_env)" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "satisfied-parcel", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "your path of the conda is /home/sunny/miniconda3\n", + "your conda env name is seb\n" + ] + } + ], + "source": [ + "path_to_conda = text_conda.value\n", + "your_env_name = text_env.value\n", + "print(\"your path of the conda is {path_to_conda}\".format(path_to_conda=path_to_conda))\n", + "print(\"your conda env name is {your_env_name}\".format(your_env_name=your_env_name))" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "robust-asthma", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "/home/sunny/miniconda3/envs/seb/lib/python3.7/site-packages/mindspore:/home/sunny/miniconda3/envs/seb/lib/python3.7/site-packages/mindspore/lib\n" + ] + } + ], + "source": [ + "import os\n", + "mindspore_path = \"{path_to_conda}/envs/{your_env_name}/lib/python3.7/site-packages/mindspore\".format(path_to_conda=path_to_conda, your_env_name=your_env_name)\n", + "\n", + "if 'LD_LIBRARY_PATH' not in os.environ:\n", + " os.environ['LD_LIBRARY_PATH'] = mindspore_path\n", + "elif mindspore_path not in os.environ['LD_LIBRARY_PATH']:\n", + " os.environ['LD_LIBRARY_PATH'] += \":\" + mindspore_path\n", + "\n", + "mindspore_lib_path = \"{path_to_conda}/envs/{your_env_name}/lib/python3.7/site-packages/mindspore/lib\".format(path_to_conda=path_to_conda, your_env_name=your_env_name)\n", + "if mindspore_lib_path not in os.environ['LD_LIBRARY_PATH']:\n", + " os.environ['LD_LIBRARY_PATH'] += \":\" + mindspore_lib_path\n", + "print(os.environ['LD_LIBRARY_PATH'])" + ] 
+ }, + { + "cell_type": "code", + "execution_count": 4, + "id": "banned-resource", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "/home/sunny/miniconda3/envs/seb/bin:/home/sunny/miniconda3/bin:/home/sunny/miniconda3/condabin:/usr/local/cuda-10.1/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin\n" + ] + } + ], + "source": [ + "conda_env_path = \"{path_to_conda}/envs/{your_env_name}/bin\".format(path_to_conda=path_to_conda, your_env_name=your_env_name)\n", + "if conda_env_path not in os.environ['PATH']:\n", + " os.environ['PATH'] += \":\" + conda_env_path\n", + "print(os.environ['PATH'])" + ] + }, + { + "cell_type": "markdown", + "id": "closed-crazy", + "metadata": {}, + "source": [ + "## Starting the Cache Server\n", + "Before using the single-node cache service, start the cache server:" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "dressed-privacy", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Cache server startup completed successfully!\n", + "The cache server daemon has been created as process id 11765 and listening on port 50052\n", + "\n", + "Recommendation:\n", + "Since the server is detached into its own daemon process, monitor the server logs (under /tmp/mindspore/cache/log) for any issues that may happen after startup\n", + "\n" + ] + } + ], + "source": [ + "!cache_admin --start" + ] + }, + { + "cell_type": "markdown", + "id": "spoken-chicago", + "metadata": {}, + "source": [ + "If an error reports that `libpython3.7m.so.1.0` cannot be found, locate the file in the virtual environment and add its directory to the environment variable:" + ] + }, + { + "cell_type": "raw", + "id": "killing-spotlight", + "metadata": {}, + "source": [ + "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:{path_to_conda}/envs/{your_env_name}/lib" + ] + }, + { + "cell_type": "markdown", + "id": "stylish-feelings", + "metadata": {}, + "source": [ + "## Creating a Cache Session\n", + "If the cache server has no cache session, create one to obtain a cache session id:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": 
"permanent-ribbon", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Session created for server on port 50052: 3562539694\n" + ] + } + ], + "source": [ + "!cache_admin -g" + ] + }, + { + "cell_type": "markdown", + "id": "qualified-elite", + "metadata": {}, + "source": [ + "缓存会话id由服务器随机分配。" + ] + }, + { + "cell_type": "markdown", + "id": "diagnostic-transparency", + "metadata": {}, + "source": [ + "## 创建缓存实例\n", + "在脚本中使用`DatasetCache` API来定义一个名为`some_cache`的缓存实例,并把上一步中创建的缓存会话id传入`session_id`参数:" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "hungarian-creature", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "[WARNING] ME(11750:140586135557952,MainProcess):2021-02-23-06:55:38.977.616 [mindspore/ops/operations/array_ops.py:2302] WARN_DEPRECATED: The usage of Pack is deprecated. Please use Stack.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING: 'ControlDepend' is deprecated from version 1.1 and will be removed in a future version, use 'Depend' instead.\n" + ] + } + ], + "source": [ + "import mindspore.dataset as ds\n", + "\n", + "some_cache = ds.DatasetCache(session_id=3562539694, size=0, spilling=False)" + ] + }, + { + "cell_type": "markdown", + "id": "spiritual-drain", + "metadata": {}, + "source": [ + "## 插入缓存实例\n", + "下面样例中使用到CIFAR-10数据集。运行样例前,需要参照数据集加载中的方法下载并存放CIFAR-10数据集。目录结构如下:" + ] + }, + { + "cell_type": "raw", + "id": "dated-british", + "metadata": {}, + "source": [ + "├─my_training_script.py\n", + "└─cifar-10-batches-bin\n", + " ├── batches.meta.txt\n", + " ├── data_batch_1.bin\n", + " ├── data_batch_2.bin\n", + " ├── data_batch_3.bin\n", + " ├── data_batch_4.bin\n", + " ├── data_batch_5.bin\n", + " ├── readme.html\n", + " └── test_batch.bin" + ] + }, + { + "cell_type": "markdown", + "id": "powered-enemy", + "metadata": {}, + "source": [ + "在应用数据增强算子时将所创建的`some_cache`作为其`cache`参数传入:" + ] + }, + { + 
"cell_type": "code", + "execution_count": 8, + "id": "limiting-belarus", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0 image shape: (32, 32, 3)\n", + "1 image shape: (32, 32, 3)\n", + "2 image shape: (32, 32, 3)\n", + "3 image shape: (32, 32, 3)\n", + "4 image shape: (32, 32, 3)\n" + ] + } + ], + "source": [ + "import mindspore.dataset.vision.c_transforms as c_vision\n", + "\n", + "dataset_dir = \"cifar-10-batches-bin/\"\n", + "data = ds.Cifar10Dataset(dataset_dir=dataset_dir, num_samples=5, shuffle=False, num_parallel_workers=1)\n", + "\n", + "# apply cache to map\n", + "rescale_op = c_vision.Rescale(1.0 / 255.0, -1.0)\n", + "data = data.map(input_columns=[\"image\"], operations=rescale_op, cache=some_cache)\n", + "\n", + "num_iter = 0\n", + "for item in data.create_dict_iterator(num_epochs=1): # each data is a dictionary\n", + " # in this example, each dictionary has a key \"image\"\n", + " print(\"{} image shape: {}\".format(num_iter, item[\"image\"].shape))\n", + " num_iter += 1" + ] + }, + { + "cell_type": "markdown", + "id": "analyzed-crazy", + "metadata": {}, + "source": [ + "通过cache_admin --list_sessions命令可以查看当前会话有五条数据,说明数据缓存成功。" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "exact-persian", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Listing sessions for server on port 50052\n", + "\n", + " Session Cache Id Mem cached Disk cached Avg cache size Numa hit\n", + " 3562539694 575278224 5 n/a 12442 5\n" + ] + } + ], + "source": [ + "!cache_admin --list_sessions" + ] + }, + { + "cell_type": "markdown", + "id": "structural-pharmacology", + "metadata": {}, + "source": [ + "## 销毁缓存会话\n", + "在训练结束后,可以选择将当前的缓存销毁并释放内存:" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "ancient-happening", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Drop session successfully for 
server on port 50052\n" + ] + } + ], + "source": [ + "!cache_admin --destroy_session 3562539694" + ] + }, + { + "cell_type": "markdown", + "id": "heated-gender", + "metadata": {}, + "source": [ + "The command above destroys the cache whose cache session id is 3562539694." + ] + }, + { + "cell_type": "markdown", + "id": "impaired-student", + "metadata": {}, + "source": [ + "## Stopping the Cache Server\n", + "When finished, you can stop the cache server. This destroys all cache sessions on the server and releases memory." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "threaded-vulnerability", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Cache server on port 50052 has been stopped successfully.\n" + ] + } + ], + "source": [ + "!cache_admin --stop" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.5" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}