# VLLMReasoning

**Repository Path**: Ghyan/vllm_reasoning

## Basic Information

- **Project Name**: VLLMReasoning
- **Description**: A demo of high-concurrency large language model inference built on the vLLM inference framework and the FastAPI backend framework
- **Primary Language**: Python
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: https://gitee.com/Ghyan/vllm_reasoning
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-07-31
- **Last Updated**: 2024-07-31

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README


#### Introduction

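This project is a demo of an efficient large language model inference system. It combines the vLLM inference framework with the FastAPI backend framework to respond quickly under high concurrency. vLLM is designed specifically for large language models and improves inference throughput and resource utilization. FastAPI, known for its performance and ease of use, provides the backend HTTP service.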
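
Why an async web framework matters for concurrency can be sketched with the standard library alone. In this minimal sketch, `fake_generate` is a hypothetical stand-in for an awaitable inference call (it is not vLLM's API). Because each handler awaits instead of blocking, ten simulated requests finish in roughly the time of one. The sketch does not model vLLM's continuous batching of concurrent requests on the GPU, which is a separate source of throughput.

```python
import asyncio
import time

SIMULATED_LATENCY = 0.2  # seconds per simulated generation (arbitrary value)


async def fake_generate(prompt: str) -> str:
    # Hypothetical stand-in for an awaitable engine call. asyncio.sleep yields
    # control to the event loop, so other requests progress in the meantime.
    await asyncio.sleep(SIMULATED_LATENCY)
    return f"echo: {prompt}"


async def handle_chat(prompt: str) -> dict:
    # Shape of an async endpoint body: await the engine, wrap the result.
    text = await fake_generate(prompt)
    return {"response": text}


async def serve_all(prompts: list[str]) -> list[dict]:
    # Run all simulated requests concurrently on one event loop.
    return await asyncio.gather(*(handle_chat(p) for p in prompts))


def run_demo(n: int = 10) -> tuple[list[dict], float]:
    start = time.perf_counter()
    results = asyncio.run(serve_all([f"q{i}" for i in range(n)]))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = run_demo()
    print(f"{len(results)} requests in {elapsed:.2f}s")
```

If `fake_generate` used a blocking call such as `time.sleep`, the same ten requests would run one after another on the event loop.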

#### Notes

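1. This project is only a demo. It shows how to combine vLLM and FastAPI into a basic high-concurrency inference service. For real deployments, customize and optimize it for your business requirements and the performance targets of your scenario.
2. The project depends on vLLM, which currently supports Linux only.
3. Create a virtual environment as needed. CUDA >= 12.1 and Python >= 3.10 are recommended.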
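
The requirements above can be checked before starting the service. A minimal sketch, assuming PyTorch is installed as in the installation steps; `check_environment` is a hypothetical helper, not part of this project. Note that `torch.version.cuda` reports the CUDA version PyTorch was built against, not the version of the installed driver.

```python
import platform
import sys


def check_environment(min_python=(3, 10), min_cuda=(12, 1)) -> list[str]:
    """Hypothetical pre-flight check; returns the problems found (empty list = OK)."""
    problems = []
    if platform.system() != "Linux":
        problems.append(f"vLLM supports Linux only, found {platform.system()}")
    if sys.version_info[:2] < tuple(min_python):
        problems.append(
            f"Python {sys.version_info.major}.{sys.version_info.minor} < "
            f"{min_python[0]}.{min_python[1]}"
        )
    try:
        import torch
    except ImportError:
        problems.append("PyTorch is not installed")
        return problems
    # CUDA version PyTorch was compiled against (None for CPU-only builds).
    built_cuda = torch.version.cuda
    if built_cuda is None:
        problems.append("PyTorch is a CPU-only build")
    else:
        major, minor = (int(x) for x in built_cuda.split(".")[:2])
        if (major, minor) < tuple(min_cuda):
            problems.append(
                f"PyTorch built for CUDA {built_cuda} < {min_cuda[0]}.{min_cuda[1]}"
            )
    if not torch.cuda.is_available():
        problems.append("no CUDA GPU is visible to PyTorch")
    return problems


if __name__ == "__main__":
    for problem in check_environment() or ["environment looks OK"]:
        print(problem)
```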

#### Installation

1. Clone the project: `git clone https://gitee.com/Ghyan/vllm_reasoning.git`
2. Install the dependencies: `pip install -r requirements.txt`
3. Install PyTorch 2.3.0 built for CUDA 12.1: `pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu121`
4. Download a model from ModelScope: https://www.modelscope.cn/models
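
Before pointing the project at a downloaded model, it can help to confirm the directory is complete. A minimal sketch, assuming the model uses the Hugging Face layout that vLLM loads (a `config.json`, tokenizer files, and `*.safetensors` or `*.bin` weights); `validate_model_dir` is a hypothetical helper, not part of this project.

```python
from pathlib import Path


def validate_model_dir(path: str) -> list[str]:
    """Hypothetical check; returns problems found (empty list = looks usable).

    Assumes a Hugging Face-style layout. This is a rough check: it does not
    verify that weight shards are complete or uncorrupted.
    """
    root = Path(path)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    if not (root / "config.json").is_file():
        problems.append("missing config.json")
    weights = list(root.glob("*.safetensors")) + list(root.glob("*.bin"))
    if not weights:
        problems.append("no *.safetensors or *.bin weight files found")
    tokenizer_files = ("tokenizer.json", "tokenizer.model", "tokenizer_config.json")
    if not any((root / name).is_file() for name in tokenizer_files):
        problems.append("no tokenizer files found")
    return problems
```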

#### Usage

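1. After installing the dependencies, set the model path on line 18 of `reasoning.py`.
2. Once everything is ready, run `python app.py` from the project directory.
3. Send requests to the endpoint: `http://127.0.0.1:8000/chat`
4. Example call: ![Example call](调用示例.png)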
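
A client for the endpoint can be sketched with the standard library. This assumes the endpoint accepts a POST with a JSON body and returns JSON; the field name `prompt` is hypothetical, so check the request model in `app.py` for the real schema. `send_many` sends several requests at once to exercise the service concurrently.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

CHAT_URL = "http://127.0.0.1:8000/chat"


def build_chat_request(prompt: str, url: str = CHAT_URL) -> urllib.request.Request:
    # Hypothetical payload: the "prompt" field name is an assumption.
    body = json.dumps({"prompt": prompt}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def send_chat(prompt: str, timeout: float = 60.0) -> dict:
    # Requires the service started with `python app.py` to be running.
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def send_many(prompts: list[str], workers: int = 8) -> list[dict]:
    # Issue requests in parallel threads to exercise the server's concurrency.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send_chat, prompts))
```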

#### Resources

1. vLLM on GitHub: https://github.com/vllm-project/vllm
2. FastAPI official website: https://fastapi.tiangolo.com/zh/
3. Ready-to-use asynchronous FastAPI template: https://gitee.com/Ghyan/fastapi-study