# chatTTS-api

**Repository Path**: taisan/chat-tts-api

## Basic Information

- **Project Name**: chatTTS-api
- **Description**: An OpenAI-compatible web API service built on the open-source ChatTTS speech model
- **Primary Language**: Python
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 4
- **Created**: 2024-07-20
- **Last Updated**: 2025-03-10

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# chatTTS-openai-api

## Introduction

Wraps the open-source ChatTTS speech-synthesis model in a web API compatible with the OpenAI ChatGPT speech endpoint.

## Architecture

Built with uvicorn, FastAPI, ChatTTS, and other open-source libraries for a high-performance API.

More details: [AI speech model column](https://blog.csdn.net/weixin_40986713/category_12735457.html)

## Core Code

```python
import os

import ChatTTS
import torch
import uvicorn
from fastapi import FastAPI, Security, HTTPException
from fastapi.responses import StreamingResponse
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
from pydantic import BaseModel

from tools.audio import has_ffmpeg_installed
from tools.audio.pcm import pcm_arr_to_mp3_view
from tools.logger import get_logger
from tools.normalizer import text_normlization

mn = text_normlization.TextNormalizer()
chat = ChatTTS.Chat()
chat.load(compile=False)  # Set to True for better performance

app = FastAPI()
security = HTTPBearer()

env_bearer_token = os.getenv("ACCESS_TOKEN", "sk-tarzan")
# Environment variables are strings, so parse them explicitly
chat_stream = os.getenv("STREAM", "True").lower() == "true"
delay = int(os.getenv("WAIT", "1"))

logger = get_logger(" Main ")
use_mp3 = has_ffmpeg_installed()
if not use_mp3:
    logger.warning("no ffmpeg installed, use wav file output")

# Voice options: preset seeds for suitable timbres
voices = {
    "Tarzan": 2,
    "Alloy": 1111,
    "Echo": 2222,
    "Fable": 3333,
    "Onyx": 4444,
    "Nova": 5555,
}


class SpeechCreateParams(BaseModel):
    model: str
    voice: str
    input: str
    response_format: str = "mp3"
    speed: float = 1.0


@app.post("/v1/audio/speech")
async def audio_speech(
    params: SpeechCreateParams,
    credentials: HTTPAuthorizationCredentials = Security(security),
):
    if env_bearer_token is not None and credentials.credentials != env_bearer_token:
        raise HTTPException(status_code=401, detail="Invalid token")

    # Map the OpenAI-style speed multiplier to ChatTTS's integer 0-9 speed tag
    speed = max(0, min(9, int(5 * params.speed)))
    audio_seed = voices.get(params.voice, 2)
    torch.manual_seed(audio_seed)
    params_infer_code = ChatTTS.Chat.InferCodeParams(
        prompt=f"[speed_{speed}]",
        spk_emb=chat.sample_random_speaker(),
    )

    # Initial buffer size for streaming mode (WAIT seconds at 24 kHz)
    first_prefill_size = delay * 24000

    async def generate_audio():
        wav = chat.infer(
            mn.normalize_sentence(params.input),
            skip_refine_text=True,
            params_infer_code=params_infer_code,
            stream=chat_stream,
        )
        if chat_stream:
            prefill_bytes = b""
            meet = False
            for gen in wav:
                if gen is not None and len(gen) > 0:
                    mp3_bytes = pcm_arr_to_mp3_view(gen)
                    if not meet:
                        # Accumulate an initial chunk before streaming begins
                        prefill_bytes += mp3_bytes
                        if len(prefill_bytes) > first_prefill_size:
                            meet = True
                            yield prefill_bytes
                    else:
                        yield mp3_bytes
                del gen
        else:
            yield pcm_arr_to_mp3_view(wav[0])

    # Return the generator via StreamingResponse
    response = StreamingResponse(generate_audio(), media_type="audio/mpeg")
    response.headers["Content-Disposition"] = "attachment; filename=audio_speech.mp3"
    return response


if __name__ == "__main__":
    try:
        uvicorn.run("main:app", reload=True, host="0.0.0.0", port=3002)
    except Exception as e:
        print(f"Failed to start the API!\nError:\n{e}")
```

## Usage

### Running directly

1. Download the code.
2. Install Python.
3. In the project root, run `python -m venv .venv` to create a virtual environment.
4. Install the dependencies: in the project root, run `pip install -r requirements.txt`.
5. Run the code: in the project root, run `python main.py`.

The service is then available at `http://0.0.0.0:3002`.

**Note:**

- Due to the git repository's size limit, the model is downloaded on the first API call, which takes about 20 minutes; users in mainland China may need a proxy.
- If a proxy is not available, use the [domestic model download link](https://download.csdn.net/download/weixin_40986713/89562516): after downloading and extracting, place the `asset` and `config` folders into the chatTTS project root.

#### 1. Build the Docker image

```
docker build -t chattts-api .
```
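Once the service is running (directly or in Docker), it accepts OpenAI-style speech requests. A minimal client sketch using only the standard library — the URL, token, and model name below are assumptions based on the defaults above, so adjust them to your deployment:

```python
import json
import urllib.request

API_URL = "http://localhost:3002/v1/audio/speech"  # assumed local deployment
TOKEN = "sk-tarzan"  # the default ACCESS_TOKEN


def chattts_speed(speed: float) -> int:
    """Mirror the server-side mapping of the OpenAI-style speed
    multiplier to ChatTTS's integer 0-9 speed tag."""
    return max(0, min(9, int(5 * speed)))


def fetch_speech(text: str, voice: str = "Tarzan", speed: float = 1.0,
                 path: str = "audio_speech.mp3") -> None:
    """POST an OpenAI-style speech request and save the MP3 response."""
    payload = {
        "model": "chattts",        # placeholder; the server does not check it
        "voice": voice,            # one of the preset voices, e.g. "Alloy"
        "input": text,
        "response_format": "mp3",
        "speed": speed,
    }
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp, open(path, "wb") as f:
        # The server streams MP3 chunks; copy them as they arrive
        while chunk := resp.read(8192):
            f.write(chunk)


# fetch_speech("你好,世界")  # writes audio_speech.mp3
```

Note that an OpenAI speed of 1.0 maps to ChatTTS speed tag 5, and values outside roughly 0.0–1.8 are clamped to the 0–9 range.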
#### 2. Start with Docker

**GPU mode**

```
docker run -itd --name chattts-api -p 3002:3002 --gpus all --restart=always chattts-api
```

- Default `ACCESS_TOKEN=sk-tarzan`

**CPU mode**

```
docker run -itd --name chattts-api -p 3002:3002 --restart=always chattts-api
```

- Default `ACCESS_TOKEN=sk-tarzan`

**Authenticated mode**

```
docker run -itd --name chattts-api -p 3003:3003 -e ACCESS_TOKEN=yourtoken --gpus all --restart=always chattts-api
docker run -itd --name chattts-api -p 3003:3003 -e ACCESS_TOKEN=yourtoken --restart=always chattts-api
```

- Replace `yourtoken` with the auth token you set, and pass `Authorization: Bearer yourtoken` in the request header when calling the API.

**Parameterized mode**

```
docker run -itd --name chattts-api -p 3003:3003 -e ACCESS_TOKEN=yourtoken -e STREAM=False -e WAIT=1 --gpus all --restart=always chattts-api
```

- `ACCESS_TOKEN` is the auth token; it defaults to `sk-tarzan` if unset, in which case pass `Authorization: Bearer sk-tarzan` in the request header.
- `STREAM` enables streaming responses; it defaults to `True`.
- `WAIT` is the initial wait time in streaming mode, in seconds; it defaults to 1. One second is enough in GPU mode, while CPU mode may need 5 seconds or more.

#### Viewing Docker logs

```
docker logs -f [container ID or name]
```

# Recommended Articles

[Deploying Local Models with Xinference](https://tarzan.blog.csdn.net/article/details/138155630)
[Connecting FastGPT to a Local Whisper Model for Voice Input](https://tarzan.blog.csdn.net/article/details/139424714)
[Deploying and Using the bge-reranker Rerank Model](https://tarzan.blog.csdn.net/article/details/138711273)
[Deploying the M3E and chatglm2-m3e Text Embedding Models](https://tarzan.blog.csdn.net/article/details/138156015)
[Discussion of FastGPT Failing to Start or Work After Startup (startup failures, unregistered users, and similar issues)](https://tarzan.blog.csdn.net/article/details/137055690)
[A vLLM Inference Service Compatible with the OpenAI API](https://tarzan.blog.csdn.net/article/details/136972561)
[Fixing vLLM Errors When Enabling Multiple GPUs](https://tarzan.blog.csdn.net/article/details/137043339)

**If you have any questions, leave a comment — replies usually come the same day!**
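The `docker run` commands in the Docker section above can also be captured declaratively. A minimal docker-compose sketch, assuming the image has been built locally as `chattts-api` (GPU variant shown; drop the `deploy` block for CPU mode):

```yaml
services:
  chattts-api:
    image: chattts-api
    container_name: chattts-api
    ports:
      - "3002:3002"
    environment:
      - ACCESS_TOKEN=sk-tarzan   # change to your own token
      - STREAM=True
      - WAIT=1
    restart: always
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

Start it with `docker compose up -d`; the environment keys match the ones read by `os.getenv` in the core code.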