# TensorNVMe

A Python library that provides APIs for moving PyTorch tensors between CPU and NVMe.

## Dependencies

- [liburing](https://github.com/axboe/liburing)
- [libaio](https://pagure.io/libaio)

## Install

This package is only supported on Linux.

`liburing` and `libaio` can be installed automatically. `liburing` is supported on Linux >= `5.10`, and it will not be installed if your Linux version is lower than `5.10`.

The installer searches for `libaio` and `liburing` in `/usr/lib`, `/usr/lib64`, and `$LD_LIBRARY_PATH`. If they are not found, the backends are installed in `~/.tensornvme`, and `~/.bashrc` is modified to set `$LD_LIBRARY_PATH` correctly. **Please `source ~/.bashrc` after installation.** If you use another shell, please make sure `$LD_LIBRARY_PATH` is set correctly.

> You must install PyTorch and CMake before installing tensornvme. Whenever you upgrade PyTorch, remember to reinstall tensornvme.

### From source

```shell
git clone https://github.com/hpcaitech/TensorNVMe.git && cd TensorNVMe
```

First, install the requirements:

```shell
pip install -r requirements.txt
```

To install `tensornvme` with both `liburing` and `libaio`:

```shell
pip install -v --no-cache-dir .
```

To install `tensornvme` with only `liburing`:

```shell
DISABLE_AIO=1 pip install -v --no-cache-dir .
```

To install `tensornvme` with only `libaio`:

```shell
DISABLE_URING=1 pip install -v --no-cache-dir .
```

If you want to install `libaio` or `liburing` system-wide:

```shell
WITH_ROOT=1 sudo pip install -v --no-cache-dir .
```

They will then be installed in `/usr`, and `~/.bashrc` will not be modified. Make sure you have root access.

### From PIP

```shell
pip install packaging
pip install tensornvme
```

All environment variables accepted when installing from source also apply here.

## Use Docker

```shell
git clone https://github.com/hpcaitech/TensorNVMe.git && cd TensorNVMe/docker && docker build -t tensornvme .
```

## CLI

We provide a CLI to check whether the backends work correctly:

```shell
tensornvme check
```

## Usage

TensorNVMe provides both synchronous and asynchronous I/O APIs.

> Only CPU and contiguous tensors can be offloaded. See the sketch after the synchronous example below.

Synchronous API:

```python
import torch
from tensornvme import DiskOffloader

x = torch.rand(2, 2)
y = torch.rand(4, 4, 4)
offloader = DiskOffloader('./offload')
offloader.sync_write(x)
# x is saved to a file on disk (in the ./offload folder) and the memory of x is freed
offloader.sync_read(x)
# x is restored
offloader.sync_writev([x, y])
# x and y are offloaded
offloader.sync_readv([x, y])
# x and y are restored
# sync_writev() and sync_readv() are order sensitive
# E.g. sync_writev([x, y]) and sync_writev([y, x]) are different
```
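The note above means a GPU or strided tensor must be copied before it can be offloaded. Here is a minimal sketch of that preparation step; the `view` and `safe` names are illustrative, and only standard PyTorch calls are used:

```python
import torch
from tensornvme import DiskOffloader

offloader = DiskOffloader('./offload')

t = torch.rand(4, 4)
view = t.t()                  # a transposed view is not contiguous
assert not view.is_contiguous()

safe = view.contiguous()      # materialize a contiguous CPU copy
# for a GPU tensor, move it to CPU first, e.g. g.detach().cpu().contiguous()
offloader.sync_write(safe)    # now eligible for offloading
offloader.sync_read(safe)
```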
Asynchronous API:

```python
import torch
from tensornvme import DiskOffloader

x = torch.rand(2, 2)
y = torch.rand(4, 4, 4)
offloader = DiskOffloader('./offload')
offloader.async_write(x)
# x is being offloaded in the background
offloader.sync_write_events()
# x is offloaded and the memory of x is freed
offloader.async_read(x)
# x is being restored in the background
offloader.sync_read_events()
# x is restored
offloader.async_writev([x, y])
# x and y are being offloaded in the background
offloader.synchronize()
# synchronize() waits for both write and read events
offloader.async_readv([x, y])
offloader.synchronize()
# x and y are restored
# async_writev() and async_readv() are also order sensitive
```

You can use the asynchronous API to overlap computation and data movement.

```python
tensors = []

for _ in range(10):
    tensor = torch.rand(2, 2)
    tensors.append(tensor)
    offloader.sync_write(tensor)

offloader.sync_read(tensors[0])

# prefetch=1: while computing on tensor[i], tensor[i+1] is being read back
for i, tensor in enumerate(tensors):
    offloader.sync_read_events()
    if i + 1 < len(tensors):
        offloader.async_read(tensors[i + 1])
    tensor.mul_(2.0)  # compute
    offloader.sync_write_events()
    offloader.async_write(tensor)

offloader.synchronize()
```

## How to test

We have C++ test scripts for the `AsyncIO` and `SpaceManager` classes. Make sure you have installed `liburing` and `libaio` and set the environment variables correctly before testing. To run the tests:

```shell
mkdir build
cd build
cmake ..
make
./test_asyncio
./test_space_mgr
```

We also have Python unit tests. Make sure you have installed `pytest`. To run them:

```shell
pytest ./tests
```

## How to benchmark

We have benchmarks for `Adam` and `CpuAdam` with different backends and prefetch depths to validate TensorNVMe's speed. To run the benchmarks:

```shell
cd benchmark
python benchmark_adam.py
python benchmark_cpuadam.py
```
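For a quick, self-contained sense of what the benchmarks measure, the sketch below (not one of the bundled scripts; the tensor count and sizes are arbitrary) times fully synchronous writes against background writes, using only the `DiskOffloader` APIs shown above:

```python
import time

import torch
from tensornvme import DiskOffloader

tensors = [torch.rand(1024, 1024) for _ in range(32)]
offloader = DiskOffloader('./offload')

start = time.perf_counter()
for t in tensors:
    offloader.sync_write(t)   # each write blocks until the tensor is on disk
print(f'sync writes:  {time.perf_counter() - start:.3f}s')

offloader.sync_readv(tensors)  # restore everything before the second round

start = time.perf_counter()
for t in tensors:
    offloader.async_write(t)  # writes are queued and run in the background
offloader.synchronize()       # wait for all outstanding writes to finish
print(f'async writes: {time.perf_counter() - start:.3f}s')
```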