# sllm **Repository Path**: weizj2000/sllm ## Basic Information - **Project Name**: sllm - **Description**: No description available - **Primary Language**: Python - **License**: MIT - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2024-06-19 - **Last Updated**: 2024-06-19 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # sllm #### 介绍 {**以下是 Gitee 平台说明,您可以替换此简介** Gitee 是 OSCHINA 推出的基于 Git 的代码托管平台(同时支持 SVN)。专为开发者提供稳定、高效、安全的云端软件开发协作平台 无论是个人、团队、或是企业,都能够用 Gitee 实现代码托管、项目管理、协作开发。企业项目请看 [https://gitee.com/enterprises](https://gitee.com/enterprises)} ## 项目设计 ### 项目工程结构 使用python语言编程,分模块设计,功能包含quantization、encrypt、inference。 - 量化模块quantization,定义一个基类,实现原模型加载、参数量化、量化后模型保存、基准测试等流程;GPTQ、AWQ等量化类继承基类,改写其中参数量化的方法或者关键步骤,方便以后添加新的量化方法实现。 - 加密模块encrypt,定义一个基类,实现模型权重加载,权重加密,加密后模型保存等流程;不同模型新建不同类继承基类,改写权重加密方法。 - 模型推理inference,新建api_server.py,定义模型推理服务对外接口;定义多个推理引擎的集成和实现。 ``` sllm ├── assets ├── examples ├── input ├── models ├── notebooks ├── scripts ├── sllm │ ├── encrypt │ │ ├── __init__.py │ │ ├── base_encrypt.py │ │ ├── generic_encrypt.py │ │ └── llama_encrypt.py │ ├── inference │ │ ├── __init__.py │ │ ├── inference_engine.py │ │ ├── vllm_engine.py │ │ ├── lmdeploy_engine.py │ │ └── api_server.py │ ├── quantization │ │ ├── __init__.py │ │ ├── base_quantization.py │ │ ├── gptq.py │ │ └── awq.py │ ├── utils │ │ ├── __init__.py │ │ ├── log.py │ │ └── key.py │ └── __init__.py ├── tests │ ├── __init__.py │ ├── test_quantization.py │ ├── test_encrypt.py │ └── test_inference.py ├── thirdparty │ ├── vllm │ ├── lmdeploy │ ├── tgi ├── .dockerignore ├── .gitignore ├── LICENSE ├── poetry.lock ├── pyproject.toml └── README.md ``` ### 模块设计 #### 量化模块 基类 BaseQuantization: 定义量化模块的基本流程 ```python # sllm/quantization/base_quantization.py class BaseQuantization: def __init__(self, model): self.model = model def load_model(self, model_path): # 加载原始模型 pass def quantize_parameters(self): # 参数量化的方法,需要在子类中实现 raise NotImplementedError def save_quantized_model(self, save_path): # 保存量化后的模型 pass def benchmark(self): # 基准测试 pass ``` GPTQ: 继承BaseQuantization,改写量化方法 ```python # sllm/quantization/gptq.py from .base_quantization.py import BaseQuantization class GPTQ(BaseQuantization): def quantize_parameters(self): # 实现GPTQ的参数量化方法 pass ``` AWQ: 继承BaseQuantization,改写量化方法 ```python # sllm/quantization/awq.py from .base_quantization import BaseQuantization class AWQ(BaseQuantization): def quantize_parameters(self): # 实现AWQ的参数量化方法 pass ``` #### 加密模块 基类 BaseEncrypt: 定义模型权重加密的基本流程 ```python # sllm/encrypt/base_encrypt.py class BaseEncrypt: def __init__(self, model): self.model = model def load_weights(self, weight_path): # 加载模型权重 pass def encrypt_weights(self): # 权重加密方法,需要在子类中实现 raise NotImplementedError def save_encrypted_model(self, save_path): # 保存加密后的模型 pass ``` 具体加密类: 继承 BaseEncrypt,实现特定模型的加密方法。 ```python # sllm/encrypt/generic_encrypt.py from .base_encrypt import BaseEncrypt class GenericEncrypt(BaseEncrypt): def encrypt_weights(self): # 实现具体模型的权重加密方法 pass # sllm/encrypt/llama_encrypt.py from .base_encrypt import BaseEncrypt class LlamaEncrypt(BaseEncrypt): def encrypt_weights(self): # 实现具体模型的权重加密方法 pass ``` #### 推理模块 API Server: 提供模型推理的对外接口 ```python # sllm/inference/api_server.py import fastapi import uvicorn from .inference_engine import InferenceEngine app = fastapi.FastAPI engine = InferenceEngine() @app.post("/v1/chat/completions") async def create_chat_completion(request: ChatCompletionRequest, raw_request: Request): pass @app.get("/v1/models") async def show_available_models(): pass if __name__ == '__main__': uvicorn.run(app) ``` 推理引擎: 集成多个推理引擎并实现推理 ```python # sllm/inference/inference_engine.py class InferenceEngine: def __init__(self): # 初始化推理引擎,可以集成多个引擎 pass def predict(self, data): # 实现模型推理 pass ``` VLLMEngine类实现 ```python # sllm/inference/vllm_engine.py class VLLMEngine: def predict(self, data): # Implement VLLM prediction logic return {"result": "VLLM prediction result"} ``` LMDeployEngine类实现 ```python # sllm/inference/lmdeploy_engine.py class LMDeployEngine: def predict(self, data): # Implement LMDeploy prediction logic return {"result": "LMDeploy prediction result"} ``` #### 测试模块 测试量化模块 ```python # tests/test_quantization.py import unittest from sllm.quantization.gptq import GPTQ from sllm.quantization.awq import AWQ class TestQuantization(unittest.TestCase): def test_gptq(self): # 测试GPTQ量化方法 pass def test_awq(self): # 测试AWQ量化方法 pass if __name__ == '__main__': unittest.main() ``` 测试加密模块 ```python # tests/test_encrypt.py import unittest from sllm.encrypt.generic_encrypt import GenericEncrypt from sllm.encrypt.llama_encrypt import LlamaEncrypt class TestEncrypt(unittest.TestCase): def test_generic_encrypt(self): # 测试SpecificEncrypt1加密方法 pass def test_llama_encrypt(self): # 测试SpecificEncrypt2加密方法 pass if __name__ == '__main__': unittest.main() ``` 测试推理模块 ```python # tests/test_inference.py import unittest from sllm.inference.api_server import app class TestInference(unittest.TestCase): def setUp(self): self.app = app.test_client() self.app.testing = True def test_predict(self): response = self.app.post('/predict', json={"data": "test"}) self.assertEqual(response.status_code, 200) self.assertIn('result', response.json) if __name__ == '__main__': unittest.main() ``` ### 流程图 #### 量化流程 ```text Start | v Load Original Model | v Select Quantization Method |--------------------------| | | v v GPTQ Quantization AWQ Quantization | | v v Quantize Parameters Quantize Parameters | | v v Save Quantized Model Save Quantized Model | | v v Run Benchmark Run Benchmark | | v v End End +-----------------------+ | BaseQuantization | |-----------------------| | - model | |-----------------------| | + __init__(model) | | + load_model(path) | | + quantize_parameters()| | + save_quantized_model(path)| | + benchmark() | +-----------------------+ ^ | | +-------------------+ +------------------+ |GPTQ(BaseQuantization)||AWQ(BaseQuantization)| |-------------------| |------------------| | + quantize_parameters() | + quantize_parameters()| +-------------------+ +------------------+ ``` #### 加密流程 ```text Start | v Load Model Weights | v Select Encryption Method |--------------------------| | | v v Generic Encrypt Llama Encrypt | | v v Encrypt Weights Encrypt Weights | | v v Save Encrypted Model Save Encrypted Model | | v v End End +---------------------+ | BaseEncrypt | |---------------------| | - model | |---------------------| | + __init__(model) | | + load_weights(path)| | + encrypt_weights() | | + save_encrypted_model(path)| +---------------------+ ^ | | +--------------------+ +---------------------+ | GenericEncrypt | | LlamaEncrypt | |--------------------| |---------------------| | + encrypt_weights()| | + encrypt_weights() | +--------------------+ +---------------------+ ``` #### 推理流程 ```text Start | v Initialize Inference Engine | v Add VLLMEngine | v Add LMDeployEngine | v Receive Prediction Request (API) | v Process Data | v Select Appropriate Engine | v Run Inference with Selected Engine |--------------------------| | | v v VLLMEngine Predict LMDeployEngine Predict | | v v Return Results Return Results | | v v End End +----------------------+ | InferenceEngine | |----------------------| | - engines | |----------------------| | + __init__() | | + add_engine(engine) | | + predict(data) | +----------------------+ ^ | | +----------------------+ +----------------------+ | VLLMEngine | | LMDeployEngine | |----------------------| |----------------------| | + predict(data) | | + predict(data) | +----------------------+ +----------------------+ +----------------------+ | api_server | |----------------------| | - app | | - engine | |----------------------| | + predict() | +----------------------+ ```