# nano-vllm

**Repository Path**: CCodingMan/nano-vllm

## Basic Information

- **Project Name**: nano-vllm
- **Description**: A lightweight vLLM (large language model inference engine) implemented in Python, with a core codebase of just over 1,000 lines. The code is clearly structured and easy to read, delivers inference speed comparable to the original vLLM, and integrates inference optimizations such as prefix caching, tensor parallelism, and Torch compilation.
- **Primary Language**: Python
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-07-03
- **Last Updated**: 2025-07-03

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Nano-vLLM

A lightweight vLLM implementation built from scratch.

## Key Features

* 🚀 **Fast offline inference** - Inference speeds comparable to vLLM
* 📖 **Readable codebase** - Clean implementation in ~1,200 lines of Python code
* ⚡ **Optimization suite** - Prefix caching, tensor parallelism, Torch compilation, CUDA graphs, etc.

## Installation

```bash
pip install git+https://github.com/GeeeekExplorer/nano-vllm.git
```

## Manual Download

If you prefer to download the model weights manually, use the following command:

```bash
huggingface-cli download --resume-download Qwen/Qwen3-0.6B \
  --local-dir ~/huggingface/Qwen3-0.6B/ \
  --local-dir-use-symlinks False
```

## Quick Start

See `example.py` for usage. The API mirrors vLLM's interface, with minor differences in the `LLM.generate` method:

```python
from nanovllm import LLM, SamplingParams

llm = LLM("/YOUR/MODEL/PATH", enforce_eager=True, tensor_parallel_size=1)
sampling_params = SamplingParams(temperature=0.6, max_tokens=256)
prompts = ["Hello, Nano-vLLM."]
outputs = llm.generate(prompts, sampling_params)
outputs[0]["text"]
```

## Benchmark

See `bench.py` for the benchmark.
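Prefix caching, one of the optimizations listed above, can be illustrated with a toy sketch. This is *not* nano-vllm's actual code: `BLOCK_SIZE`, the chained hashing, and the helper names are illustrative assumptions. The idea is that sequences sharing a prompt prefix can reuse already-computed KV-cache blocks, identified by a hash of each fixed-size token block chained with the hash of the block before it:

```python
# Illustrative sketch of block-level prefix caching (toy code, not nano-vllm's
# implementation). Each full block of BLOCK_SIZE tokens gets a hash that also
# folds in the previous block's hash, so a block is only reusable when its
# entire preceding prefix matches.

BLOCK_SIZE = 4  # tokens per cache block (toy value; real engines use e.g. 16+)

def block_hashes(token_ids):
    """Return one chained hash per full block of token_ids."""
    hashes, prev = [], None
    n_full = len(token_ids) - len(token_ids) % BLOCK_SIZE
    for i in range(0, n_full, BLOCK_SIZE):
        prev = hash((prev, tuple(token_ids[i:i + BLOCK_SIZE])))
        hashes.append(prev)
    return hashes

def count_cached_prefix_blocks(cache, token_ids):
    """Count leading blocks already cached; insert the remaining blocks."""
    hits, matching = 0, True
    for h in block_hashes(token_ids):
        if matching and h in cache:
            hits += 1  # this block's KV cache can be reused as-is
        else:
            matching = False
            cache.add(h)  # new block: would be computed and cached
    return hits

cache = set()
first = list(range(12))                     # 3 blocks, nothing cached yet
second = list(range(8)) + [99, 98, 97, 96]  # shares the first 2 blocks
print(count_cached_prefix_blocks(cache, first))   # no reuse on the first request
print(count_cached_prefix_blocks(cache, second))  # first two blocks are reused
```

In a real engine the cached value would be the KV-cache block itself rather than just its hash, and cache eviction would be handled as well; this sketch only shows the prefix-matching bookkeeping.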
**Test Configuration:**

- Hardware: RTX 4070 Laptop GPU (8 GB)
- Model: Qwen3-0.6B
- Total Requests: 256 sequences
- Input Length: randomly sampled between 100 and 1024 tokens
- Output Length: randomly sampled between 100 and 1024 tokens

**Performance Results:**

| Inference Engine | Output Tokens | Time (s) | Throughput (tokens/s) |
|------------------|---------------|----------|-----------------------|
| vLLM             | 133,966       | 98.37    | 1361.84               |
| Nano-vLLM        | 133,966       | 93.41    | 1434.13               |

## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=GeeeekExplorer/nano-vllm&type=Date)](https://www.star-history.com/#GeeeekExplorer/nano-vllm&Date)
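As a sanity check, the throughput column is simply total output tokens divided by wall-clock time. Recomputing it from the table (a quick sketch; small differences from the printed values arise because the times are rounded to two decimals):

```python
# Recompute throughput (tokens/s) from the benchmark table:
# throughput = total output tokens / wall-clock time in seconds.
results = {
    "vLLM": (133_966, 98.37),
    "Nano-vLLM": (133_966, 93.41),
}

for engine, (tokens, seconds) in results.items():
    print(f"{engine}: {tokens / seconds:.2f} tokens/s")
```

Since both runs produce the same number of output tokens, Nano-vLLM's roughly 5% shorter wall-clock time translates directly into its roughly 5% higher throughput.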