# java-llm-proxy
**Repository Path**: ppnt/java-llm-proxy
## Basic Information
- **Project Name**: java-llm-proxy
- **Description**: A multi-model large language model proxy service. The proxy accepts client requests (in both streaming and non-streaming modes) and forwards them through a unified interface to APIs such as OpenAI, Anthropic Claude, and Google Gemini, then returns the results to the client as either an HTTP response or an SSE (Server-Sent Events) stream.
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 1
- **Created**: 2025-07-07
- **Last Updated**: 2025-07-07
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# Building a Multi-Model LLM Proxy Service with tio-boot
This document describes how to quickly build a multi-model large language model proxy service on top of the [tio-boot](https://github.com/litongjava/tio-boot) framework. The proxy accepts client requests (in both streaming and non-streaming modes) and forwards them through a unified interface to APIs such as OpenAI, Anthropic Claude, and Google Gemini, then returns the results to the client as either an HTTP response or an SSE (Server-Sent Events) stream.
---
## 1. Overview
* **Use cases**: When multiple LLM services need to be accessed through a single entry point, this proxy solves the following problems:
  1. Network policy restrictions (e.g. no direct access to the API domains)
  2. Unified authentication and key management
  3. Multi-model request routing
  4. Streaming/non-streaming response format conversion
* **Core features**:
  1. The frontend sends requests to the local tio-boot service (e.g. `http://127.0.0.1/***`)
  2. Requests are automatically routed to the matching API service (OpenAI/Anthropic/Google) based on the request path
  3. The `stream` field in the request body selects between SSE streaming and a synchronous HTTP response
  4. The API result (JSON or SSE stream) is adapted and returned to the client
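The routing rules above boil down to a prefix lookup on the request path. The sketch below illustrates the idea in isolation; the upstream URLs are placeholders, since the real service takes them from the `OpenAiClient`/`ClaudeClient`/`GeminiClient` constants.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RouteSketch {
  // Placeholder upstream endpoints; the real service builds these from
  // the OpenAiClient / ClaudeClient / GeminiClient constants.
  private static final Map<String, String> UPSTREAMS = new LinkedHashMap<>();
  static {
    UPSTREAMS.put("/openai", "https://api.openai.com/v1/chat/completions");
    UPSTREAMS.put("/anthropic", "https://api.anthropic.com/v1/messages");
    UPSTREAMS.put("/google", "https://generativelanguage.googleapis.com/v1beta/models/");
  }

  /** Return the upstream URL for a proxy path, or null if no prefix matches. */
  public static String upstreamFor(String requestURI) {
    for (Map.Entry<String, String> e : UPSTREAMS.entrySet()) {
      if (requestURI.startsWith(e.getKey())) {
        return e.getValue();
      }
    }
    return null;
  }

  public static void main(String[] args) {
    System.out.println(upstreamFor("/openai/v1/chat/completions"));
  }
}
```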
---
## 2. Environment Setup
1. **JDK**: Java 8 or later
2. **Build tool**: Maven or Gradle
3. **Core dependencies** (pom.xml example):
```xml
<dependency>
  <groupId>com.litongjava</groupId>
  <artifactId>tio-boot-admin</artifactId>
  <version>1.0.4</version>
</dependency>
<dependency>
  <groupId>com.alibaba.fastjson2</groupId>
  <artifactId>fastjson2</artifactId>
  <version>2.0.30</version>
</dependency>
<dependency>
  <groupId>org.projectlombok</groupId>
  <artifactId>lombok</artifactId>
  <version>1.18.26</version>
  <scope>provided</scope>
</dependency>
```
---
## 3. Project Structure
```
llm-proxy-app/
├─ src/
│  ├─ main/
│  │  ├─ java/
│  │  │  ├─ com/litongjava/llm/proxy/
│  │  │  │  ├─ LLMProxyApp.java                        # Application entry point
│  │  │  ├─ com/litongjava/llm/proxy/config/
│  │  │  │  └─ LLMProxyConfig.java                     # Routing configuration
│  │  │  ├─ com/litongjava/llm/proxy/handler/
│  │  │  │  └─ LLMProxyHandler.java                    # Core request handler
│  │  │  ├─ com/litongjava/llm/proxy/callback/
│  │  │  │  └─ SSEProxyCallbackEventSourceListener.java # SSE callback handling
│  │  └─ resources/
│  │     └─ app.properties                             # Configuration file
└─ pom.xml
```
---
## 4. Key Implementation
### 4.1 LLMProxyHandler.java - Request Routing Handler
```java
package com.litongjava.llm.proxy.handler;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import com.alibaba.fastjson2.JSONObject;
import com.litongjava.claude.ClaudeClient;
import com.litongjava.gemini.GeminiClient;
import com.litongjava.llm.proxy.callback.SSEProxyCallbackEventSourceListener;
import com.litongjava.model.body.RespBodyVo;
import com.litongjava.openai.client.OpenAiClient;
import com.litongjava.proxy.AiChatProxyClient;
import com.litongjava.tio.boot.http.TioRequestContext;
import com.litongjava.tio.core.ChannelContext;
import com.litongjava.tio.http.common.HttpRequest;
import com.litongjava.tio.http.common.HttpResponse;
import com.litongjava.tio.http.common.utils.HttpIpUtils;
import com.litongjava.tio.http.server.util.CORSUtils;
import com.litongjava.tio.utils.environment.EnvUtils;
import com.litongjava.tio.utils.hutool.StrUtil;
import com.litongjava.tio.utils.json.FastJson2Utils;

import lombok.extern.slf4j.Slf4j;
import okhttp3.Response;
import okhttp3.sse.EventSourceListener;

@Slf4j
public class LLMProxyHandler {

  public HttpResponse completions(HttpRequest httpRequest) {
    long start = System.currentTimeMillis();
    HttpResponse httpResponse = TioRequestContext.getResponse();
    CORSUtils.enableCORS(httpResponse);

    String requestURI = httpRequest.getRequestURI();
    String bodyString = httpRequest.getBodyString();
    if (StrUtil.isBlank(bodyString)) {
      return httpResponse.setJson(RespBodyVo.fail("empty body"));
    }

    String realIp = HttpIpUtils.getRealIp(httpRequest);
    log.info("from:{},requestURI:{}", realIp, requestURI);

    Boolean stream = false;
    String url = null;
    Map<String, String> headers = new HashMap<>();

    if (requestURI.startsWith("/openai")) {
      url = OpenAiClient.OPENAI_API_URL + "/chat/completions";
      headers.put("authorization", httpRequest.getAuthorization());
      JSONObject requestVo = FastJson2Utils.parseObject(bodyString);
      stream = requestVo.getBoolean("stream");
    } else if (requestURI.startsWith("/anthropic")) {
      url = ClaudeClient.CLAUDE_API_URL + "/messages";
      headers.put("x-api-key", httpRequest.getHeader("x-api-key"));
      headers.put("anthropic-version", httpRequest.getHeader("anthropic-version"));
      JSONObject requestVo = FastJson2Utils.parseObject(bodyString);
      stream = requestVo.getBoolean("stream");
    } else if (requestURI.startsWith("/google")) {
      String key = httpRequest.getParam("key");
      String modelName = requestURI.substring(requestURI.lastIndexOf('/') + 1, requestURI.indexOf(':'));
      if (requestURI.endsWith("streamGenerateContent")) {
        url = GeminiClient.GEMINI_API_URL + modelName + ":streamGenerateContent?alt=sse&key=" + key;
        stream = true;
      } else {
        url = GeminiClient.GEMINI_API_URL + modelName + ":generateContent?key=" + key;
      }
    }

    if (stream != null && stream) {
      // Tell the default handler not to send the response body itself;
      // it will be pushed to the client manually over SSE.
      httpResponse.setSend(false);
      ChannelContext channelContext = httpRequest.getChannelContext();
      EventSourceListener callback = new SSEProxyCallbackEventSourceListener(channelContext, httpResponse, start);
      AiChatProxyClient.stream(url, headers, bodyString, callback);
    } else {
      try (Response response = AiChatProxyClient.generate(url, headers, bodyString)) {
        try {
          String string = response.body().string();
          httpResponse.setString(string, "utf-8", "application/json");
          if (EnvUtils.getBoolean("app.debug", false)) {
            log.info("chat:{},{}", bodyString, string);
          }
        } catch (IOException e) {
          log.error("failed to read upstream response body", e);
        }
        long end = System.currentTimeMillis();
        log.info("finish llm in {} ms", (end - start));
      }
    }
    return httpResponse;
  }
}
```
**Functionality**:
1. **Multi-model routing**: requests are routed by URL prefix (`/openai`, `/anthropic`, `/google`) to the matching service
2. **Header handling**: each platform's authentication headers (Authorization / x-api-key) are extracted and forwarded
3. **Stream detection**: the `stream` field in the request body selects the response mode
4. **Response conversion**: non-streaming mode returns JSON directly; streaming mode goes through the SSE callback
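The Gemini branch of the handler derives the model name and the streaming flag purely from the request path. That string logic can be isolated and verified on its own; this standalone sketch mirrors the `substring`/`endsWith` calls in `LLMProxyHandler`:

```java
public class GeminiPathParser {
  /**
   * Extract the model name from a Gemini-style proxy path such as
   * "/google/v1beta/models/gemini-2.5-flash:streamGenerateContent".
   * Mirrors the substring logic in LLMProxyHandler.
   */
  public static String modelName(String requestURI) {
    return requestURI.substring(requestURI.lastIndexOf('/') + 1, requestURI.indexOf(':'));
  }

  /** Streaming is requested when the path ends with streamGenerateContent. */
  public static boolean isStream(String requestURI) {
    return requestURI.endsWith("streamGenerateContent");
  }

  public static void main(String[] args) {
    String uri = "/google/v1beta/models/gemini-2.5-flash:streamGenerateContent";
    System.out.println(modelName(uri) + " stream=" + isStream(uri));
  }
}
```

Note that `modelName` assumes the path contains a `:` action suffix; a path without one would throw `StringIndexOutOfBoundsException`, a limitation the handler shares.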
---
### 4.2 SSEProxyCallbackEventSourceListener.java - SSE Callback Handler
```java
package com.litongjava.llm.proxy.callback;

import java.io.IOException;

import com.jfinal.kit.StrKit;
import com.litongjava.tio.core.ChannelContext;
import com.litongjava.tio.core.Tio;
import com.litongjava.tio.http.common.HttpResponse;
import com.litongjava.tio.http.common.sse.SsePacket;
import com.litongjava.tio.utils.SystemTimer;

import lombok.extern.slf4j.Slf4j;
import okhttp3.Response;
import okhttp3.sse.EventSource;
import okhttp3.sse.EventSourceListener;

@Slf4j
public class SSEProxyCallbackEventSourceListener extends EventSourceListener {

  private ChannelContext channelContext;
  private HttpResponse httpResponse;
  private long start;
  private boolean continueSend = true;

  public SSEProxyCallbackEventSourceListener(ChannelContext channelContext, HttpResponse httpResponse, long start) {
    this.channelContext = channelContext;
    this.httpResponse = httpResponse;
    this.start = start;
  }

  @Override
  public void onOpen(EventSource eventSource, Response response) {
    httpResponse.addServerSentEventsHeader();
    httpResponse.setSend(true);
    Tio.send(channelContext, httpResponse);
  }

  @Override
  public void onEvent(EventSource eventSource, String id, String type, String data) {
    if (StrKit.notBlank(data)) {
      sendPacket(new SsePacket(type, data.getBytes()));
      // [DONE] is OpenAI's end-of-stream marker
      if ("[DONE]".equals(data)) {
        finish(eventSource);
      }
    }
  }

  @Override
  public void onClosed(EventSource eventSource) {
    finish(eventSource);
  }

  @Override
  public void onFailure(EventSource eventSource, Throwable t, Response response) {
    log.error(t.getMessage(), t);
    // response may be null, e.g. on a pure network failure
    if (response != null) {
      try {
        int code = response.code();
        String string = response.body().string();
        httpResponse.status(code);
        httpResponse.body(string);
        httpResponse.setSend(true);
        Tio.send(channelContext, httpResponse);
      } catch (IOException e) {
        log.error("failed to read upstream error body", e);
      } finally {
        response.close();
      }
    }
    finish(eventSource);
  }

  private void finish(EventSource eventSource) {
    log.info("elapse:{}", SystemTimer.currTime - start);
    eventSource.cancel();
    Tio.close(channelContext, "finish");
  }

  /** Retry sending the SSE packet up to three times; stop sending once the connection is broken. */
  private void sendPacket(SsePacket packet) {
    if (!continueSend) {
      return;
    }
    for (int i = 0; i < 3; i++) {
      if (Tio.bSend(channelContext, packet)) {
        return;
      }
    }
    continueSend = false;
  }
}
```
**Core mechanisms**:
1. **SSE initialization**: `onOpen` sets `Content-Type: text/event-stream` and establishes the connection
2. **Streaming data forwarding**: `onEvent` forwards each received data chunk to the client in real time
3. **Termination handling**: the `[DONE]` marker is recognized and the connection is closed
4. **Error handling**: on an API error, the original error message and status code are returned
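For reference, every chunk the proxy forwards ultimately reaches the client as a standard SSE frame: an optional `event:` line, a `data:` line, and a blank line terminating the frame. The sketch below is illustrative only; tio-boot's `SsePacket` performs this serialization internally:

```java
public class SseFrame {
  /**
   * Render one Server-Sent Events frame: an optional "event:" line,
   * a "data:" line, and the blank line that terminates the frame.
   */
  public static String render(String event, String data) {
    StringBuilder sb = new StringBuilder();
    if (event != null && !event.isEmpty()) {
      sb.append("event: ").append(event).append('\n');
    }
    sb.append("data: ").append(data).append("\n\n");
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.print(render(null, "[DONE]"));
  }
}
```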
---
### 4.3 LLMProxyConfig.java - Routing Configuration
```java
package com.litongjava.llm.proxy.config;

import com.litongjava.context.BootConfiguration;
import com.litongjava.llm.proxy.handler.LLMProxyHandler;
import com.litongjava.tio.boot.server.TioBootServer;
import com.litongjava.tio.http.server.router.HttpRequestRouter;

public class LLMProxyConfig implements BootConfiguration {

  public void config() {
    TioBootServer server = TioBootServer.me();
    HttpRequestRouter requestRouter = server.getRequestRouter();
    LLMProxyHandler llmProxyHandler = new LLMProxyHandler();
    requestRouter.add("/openai/v1/chat/completions", llmProxyHandler::completions);
    requestRouter.add("/anthropic/v1/messages", llmProxyHandler::completions);
    requestRouter.add("/google/v1beta/models/*", llmProxyHandler::completions);
  }
}
```
```
**Configuration notes**:
1. Unified entry point: all API paths share the same handler method
2. Wildcard support: the Google model path uses the `*` wildcard
3. Auto wiring: routes are registered at startup via the `BootConfiguration` interface
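The trailing `*` route can be read as a simple prefix match. The sketch below illustrates that matching behavior; it is not tio-boot's actual router implementation:

```java
public class WildcardMatch {
  /**
   * Match a route pattern whose only wildcard is a trailing "*",
   * e.g. "/google/v1beta/models/*". Illustrative only.
   */
  public static boolean matches(String pattern, String path) {
    if (pattern.endsWith("*")) {
      return path.startsWith(pattern.substring(0, pattern.length() - 1));
    }
    return pattern.equals(path);
  }

  public static void main(String[] args) {
    System.out.println(matches("/google/v1beta/models/*",
        "/google/v1beta/models/gemini-2.5-flash:generateContent"));
  }
}
```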
---
### 4.4 LLMProxyApp.java - Application Entry Point
```java
package com.litongjava.llm.proxy;

import com.litongjava.llm.proxy.config.LLMProxyConfig;
import com.litongjava.tio.boot.TioApplication;

public class LLMProxyApp {

  public static void main(String[] args) {
    long start = System.currentTimeMillis();
    TioApplication.run(LLMProxyApp.class, new LLMProxyConfig(), args);
    long end = System.currentTimeMillis();
    System.out.println("started in " + (end - start) + "ms");
  }
}
```
---
## 5. Starting and Testing the Service
### 5.1 Start the Service
```bash
mvn clean package -DskipTests
java -jar target/llm-proxy-app-1.0.0.jar
```
### 5.2 Multi-Model Test Examples
**OpenAI non-streaming test**:
```bash
curl -X POST http://localhost/openai/v1/chat/completions \
  -H "Authorization: Bearer sk-proj-o" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role":"system","content":"Just say hi"}],
    "model": "gpt-3.5-turbo",
    "stream": false
  }'
```
**Google Gemini streaming test**:
```bash
curl -X POST 'http://localhost/google/v1beta/models/gemini-2.5-flash:streamGenerateContent?key=API_KEY' \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"role": "user", "parts": [{"text": "hi"}]}]
  }'
```
**Anthropic Claude streaming test**:
```bash
curl -X POST http://localhost/anthropic/v1/messages \
  -H "x-api-key: YOUR_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Just say hi"}],
    "stream": true,
    "model": "claude-3-7-sonnet-20250219"
  }'
```
---
## 6. Notes
1. **The response body is single-read**:
   - In non-streaming mode, `response.body().string()` reads and closes the stream in one call
   - In streaming mode, avoid reading the response body more than once
2. **Connection management**:
   - Reuse a single global OkHttpClient so the connection pool stays effective
   - Explicitly close the channel when an SSE connection ends
3. **Error handling**:
   - On an API error, pass through the original status code and message body
   - On a network exception, log it and close the connection
4. **Performance and debugging**:
   - Enable debug logging by setting `app.debug=true` in `app.properties`
   - Monitor request handling time by recording timestamps at key points
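The debug switch above is the one read by `EnvUtils.getBoolean("app.debug", false)` in `LLMProxyHandler`, so a minimal `app.properties` needs only:

```properties
# Log full request/response bodies in LLMProxyHandler (read via EnvUtils)
app.debug=true
```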
---
## 7. Summary
This proxy service, built on the tio-boot framework, implements the following core features:
1. **Unified multi-model access**:
   - OpenAI: standard Chat Completions API
   - Anthropic Claude: Messages API
   - Google Gemini: generateContent / streamGenerateContent
2. **Dual response modes**:
   - Synchronous mode: returns the complete JSON response directly
   - Streaming mode: transmits data chunks in real time over SSE
3. **Efficient routing**:
   - Path-prefix matching for the different API services
   - Wildcard handling for dynamic model paths
   - A single unified handler method
With this approach you can quickly build a unified proxy for the major language models, work around API access restrictions, and provide a consistent developer experience.