# LLM Database

**Repository Path**: git4chen/llmh2-db

## Basic Information

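- **Project Name**: LLM Database
- **Description**: An LLM database that combines LLMs, machine learning, datafabric, hpl, and neo4j. It provides LLM chat, RAG, agents, langgraph, and related features, and is accessed through JDBC, ODBC, and API interfaces.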
- **Primary Language**: Java
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: http://8.130.66.117:8080/ui
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 2
- **Created**: 2025-01-05
- **Last Updated**: 2025-01-05

## Categories & Tags

**Categories**: Uncategorized

**Tags**: deprecated

## README

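# LLM Database AGENT

The LLM database agent combines LLMs, RAG, Agent Mesh, Baidu web search, machine learning, fabric, NLP, and neo4j. It provides LLM chat, knowledge-base RAG orchestration, multi-agent langgraph orchestration, and related features.
The open-source edition supports adding and querying LLM vectors at the scale of millions; the enterprise edition supports the scale of tens of billions.
Java code can be compiled to a native binary with GraalVM, which reduces memory consumption and improves execution speed.
Web platform demo: http://8.130.66.117:8080/ui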

## Installation and Build

![img_4.png](img_4.png)

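1. GraalVM 23 (Java 23 or later) is recommended for compiling to a native binary. Python 3.10 or later is needed, together with langgraph and related components. Configure the LLM environment variables in the .env file on the server.
2. See the h2-* files, the startup script runllmdb.sh, the .env file, and the other files under the linux-bin directory.
3. For examples of calling the LLM from Python and SQL, see `python和sql例子.md`.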

``` 
[python]
# 8.130.66.117 is the IP of the LLM database cloud server; the SQL syntax is compatible with postgresql, mysql, oracle, etc.
# Single-connection alternative:
# import psycopg2
# con = psycopg2.connect("dbname=./llm.data;DATABASE_TO_LOWER=TRUE;DEFAULT_NULL_ORDERING=HIGH;AUTO_SERVER=TRUE user=sa password=sa host=8.130.66.117 port=5435 options='-c search_path=public' ")
# Python connection pool
import psycopg2_pool
pool = psycopg2_pool.ConnectionPool(minconn=1, maxconn=3,
                user="sa",
                password="sa",
                host="8.130.66.117",
                port="5435",
                options="-c search_path=public",
                database="./llm.data;DATABASE_TO_LOWER=TRUE;DEFAULT_NULL_ORDERING=HIGH;")
# Acquire the connection before the try block so that putconn never sees an undefined name
con = pool.getconn()
try:
    sql = "select chatgpt('info','') from aigc_llm LIMIT 1"
    with con.cursor() as cur:
        cur.execute(sql)
        results = cur.fetchall()
finally:
    # Always return the connection to the pool
    pool.putconn(con)

[java]
Call through JDBC
url=jdbc:h2:tcp://8.130.66.117:9092/./llm.data;MODE=PostgreSQL;DATABASE_TO_LOWER=TRUE
Username and password: sa/sa
```

Example of access from database tools such as DBeaver:
![img_3.png](img_3.png)

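# Why Use the LLM Database

## High Security

1. API_KEY management: the chatgpt API_KEY is configured on the database server. Users reach chatgpt through the API, JDBC, ODBC, and similar interfaces and never handle the API_KEY.
2. Prompt desensitization: before a prompt is submitted to an external chatgpt service, it is desensitized to prevent privacy leaks. For example, a phone number becomes 139***.
3. Information obfuscation: sensitive information in the prompt is automatically replaced with variables. For example, "Zhang San's phone number is 139***" becomes "A's phone number is B". After the LLM responds, A is replaced with Zhang San and B with 139***.
4. Encrypted prompt storage: modeled on the LangChain Hub feature, the database provides a standard prompt interface, encrypts prompts before storing them, and protects users' prompt assets by category.
5. Logging and auditing of LLM API calls, plus circuit breaking and rate limiting.
6. Integration with NeMo-Guardrails puts safety guardrails around the LLM, controlling, filtering, and evaluating harmful content in its inputs and outputs.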
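
The obfuscation round trip in item 3 can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the function names `obfuscate` and `restore`, the `<V0>`-style placeholders, and the phone-number pattern are all assumptions made for the example.

```python
import re

# Assumed pattern for illustration: an 11-digit number starting with 1.
PHONE_RE = re.compile(r"1\d{10}")


def obfuscate(prompt, names):
    """Replace known names and phone numbers with placeholders.

    Returns the masked prompt and the mapping needed to undo the masking.
    """
    mapping = {}

    def placeholder(value):
        # Reuse the placeholder if the same value appears more than once.
        for key, original in mapping.items():
            if original == value:
                return key
        key = f"<V{len(mapping)}>"
        mapping[key] = value
        return key

    for name in names:
        if name in prompt:
            prompt = prompt.replace(name, placeholder(name))
    prompt = PHONE_RE.sub(lambda m: placeholder(m.group()), prompt)
    return prompt, mapping


def restore(text, mapping):
    """Put the original values back into the model's reply."""
    for key, value in mapping.items():
        text = text.replace(key, value)
    return text


masked, mapping = obfuscate("Zhang San's phone number is 13912345678.", ["Zhang San"])
print(masked)                    # <V0>'s phone number is <V1>.
print(restore(masked, mapping))  # Zhang San's phone number is 13912345678.
```

Only the masked text would leave the server; the mapping stays local and is applied to the reply before it is returned to the caller.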

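## Faster and Cheaper

1. Database cache: caching is enabled by default. An identical LLM query is served from the cache, and chatgpt is called only on a cache miss, which reduces repeated token consumption.
2. Prompt obfuscation and compression: long strings in the prompt are replaced with short ones, and spaces, line breaks, and similar characters are compressed without changing the prompt's meaning, which reduces the prompt's token count.
3. Distributed processing: following hadoop's mapreduce approach, text that exceeds the LLM's context window is split into chunks that are computed by different LLM databases and then merged into one output.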

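## LLM Component Ecosystem

1. Download component code and examples that others have open-sourced from the LLM component marketplace.
2. Publish custom components to the component marketplace.
3. Use the download command to fetch custom functions from the marketplace.
4. Use the upload command to upload custom functions to the marketplace.
5. Use the list command to query the custom functions in the marketplace.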

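## High Development Efficiency, Low Learning Cost

1. Prompt management: the LLM database has built-in prompt management, which can be accessed through the API or joined in SQL.
2. chatgpt calls: a single SQL statement can implement one agent or coordinate several agents.
   2.1. Automatic generation and repair: calling the repair function runs RAG query -> content generation -> content validation -> error correction, and the corrected experience is written back to the vector store.
   2.2. Reinforcement learning: calling the lats function implements Tree search + ReAct + Plan&Execute + Reflection on top of langgraph.
   2.3. IoT bridge subsidence detection: calling the inspection function with the rules and experience base, the collected data fields, and the collected image fields determines whether the bridge has subsided and whether there is a risk.
   2.4. The classic beer-and-diapers association analysis: calling the machine learning function mlfpgrowth(id field, entity list field) finds the relationships between entities, which can also be written to neo4j.
3. Standardization: different programming languages (python, java, c#, rust, go, etc.) need different SDKs and different code. With the LLM database everything is expressed in SQL, which makes communication easier.
4. Low learning cost: there is no need to learn springAI, langchain, langgraph, llamaindex, machine learning algorithms, big data, or other frameworks and technologies. Anyone who can use SQL can use the LLM server.
5. Custom function extension: database user-defined functions make it easy to add LLM functions written in java, python, and other languages, building up a company's own assets without redeveloping the same features.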
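
To make the beer-and-diapers idea in item 2.4 concrete, here is a small sketch of association analysis over (id, entity list) data. It uses plain pairwise co-occurrence counting, not the FP-growth algorithm that the name mlfpgrowth suggests, and the sample orders and the `frequent_pairs` helper are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return item pairs whose support (share of baskets) >= min_support."""
    counts = Counter()
    for items in baskets.values():
        # Count each unordered pair once per basket.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: n / total for pair, n in counts.items() if n / total >= min_support}


# Hypothetical data: order id -> list of entities in that order.
baskets = {
    1: ["beer", "diapers", "milk"],
    2: ["beer", "diapers"],
    3: ["milk", "bread"],
    4: ["beer", "diapers", "bread"],
}
print(frequent_pairs(baskets, min_support=0.5))  # {('beer', 'diapers'): 0.75}
```

FP-growth reaches the same kind of result for larger item sets without enumerating every pair, which matters once baskets grow large.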

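## Compatibility with Other Systems

1. Mainstream LLMs: the features are built on langgraph and langchain, so any model that langchain supports also works with the LLM database.
2. grafana: connect grafana to the LLM database, explore and mine data with natural-language queries, and let grafana visualize the results.
3. ERP and other information systems: connect to the LLM database through jdbc or a postgresql connection. Once connected, the LLM's CRUD, NLP, machine learning, and LLM services are available without installing any extra components or systems.
4. Databases and files: through pip and similar mechanisms, the database can connect to 50+ mainstream database types. It supports pdf, office files (word, excel, ppt), csv, images, video, audio, md, and other files, and can index and query files and directories.
5. flink and spark: connect to flink and spark through pip and hand the computed results to the LLM database for further analysis.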

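## Agent-mesh: Dedicated LLMs for Microservices

1. Each service is split into its own small micro-model database, and local models cooperate seamlessly with external LLMs.
2. Simple to use, with no complex installation or configuration.
3. Fast startup and low memory consumption.
4. Supports distributed storage and computing for large-scale data and concurrent requests.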

# Main Feature Tutorials

## Agent Memory and Graph Association - Tutorial for rag-Prefixed Functions

Demo: http://8.130.66.117:8080/ui (agent rag tutorial)

```sql
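-- Note: if a parameter contains special characters, convert it to base64 with the BASE64ENCODE function before passing it in.
-- Batch example:
select rag_loadqa('ball-name', tmp.opendate,tmp.ball) 
from( select  "quest" as opendate,"answer" as ball from csvread('/data/llmdata/ball.txt') ) tmp

-- LLM chat interface
-- Parameter 1: system prompt
-- Parameter 2: user prompt
select llmchat('You are a chat assistant','Hello')

-- LLM chat interface
-- Parameter 1: the prompt in JSON format
-- Parameter 2: the question in JSON format (values for the placeholders in the prompt)
select llminvoke('[{"role": "system", "content": "You are a Postgresql database expert"},{"role": "user","content":"{sql}:{ddl}"} ]',
      '{"sql":"sql","ddl":"ddl"}')

-- Load a question and its answer into a vector table
-- Parameter 1: vector table name, proc/extract/tablename; created automatically if it does not exist
-- Parameter 2: the question, hostname(222)
-- Parameter 3: the answer, host_name(222)
-- Parameter 4: keyword from the question, host_name; a query that contains the keyword always matches this entry
select rag_loadqa('proc/extract/tablename', 'hostname(222)','host_name(222)','host_name')


-- Load document text into a vector table; the rag type is doc
-- Parameter 1: vector table name, proc/extract/tablename; created automatically if it does not exist
-- Parameter 2: the document content, hostname(111)
-- Parameter 3: rag type; currently qa (question and answer), doc (document), etc.; the default is 'doc'
select rag_loadtxt('proc/extract/tablename', 'hostname(111)','doc')


-- The LLM answers a question based on the answers in the vector table
-- Parameter 1: vector table name, proc/extract/tablename; to query several vector tables, separate them with semicolons (t1;t2;t3;) and the system queries all of them
-- Parameter 2: system prompt
-- Parameter 3: the question, hostname(222)
-- Parameter 4: rag type; currently qa (question and answer), doc (document), etc.
-- Parameter 5 (optional): primary key id of the ent_vectinfo vector configuration (algorithm such as BM25Okapi). The table name in the bm25d configuration overrides the vector table name in the SQL statement
select rag_chat('proc/extract/tablename;', 'You are a Postgresql database expert','hostname(222)','qa') 


-- Load a folder or file into a vector table; the rag type is doc
-- Parameter 1: folder or file name; for a folder, every file under it is loaded in batch (pdf, csv, word, ppt, md, images, video, etc.)
-- Parameter 2: delimiter used to split the document text, e.g. '[\n。?!？！]'
select rag_loadfile('/home/softrobot/llm/qwne05/txt/测试.txt', '\n')

-- Get the physical table name. With it you can run regular SQL such as drop table, select, delete, and update
-- Parameter 1: vector table name, proc/extract/tablename; to query several vector tables, separate them with semicolons (t1;t2;t3;); returns an array of table names
select rag_name('proc/extract/tablename;')

-- Return the raw matching strings
-- Parameter 1: vector table name, proc/extract/tablename
-- Parameter 2: the question, hostname
-- Parameter 3: rag type; currently qa (question and answer), doc (document), etc.
-- Parameter 4: query type: 1 = keyword search, 2 = vector search, 0 = hybrid search
select rag_query('proc/extract/tablename', 'hostname','qa','1')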
```
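
When these functions are called from Python, bound parameters keep quotes and other special characters out of the SQL text. The sketch below only builds the statement and its parameter tuple; `rag_chat_query` is a hypothetical helper, and whether bound parameters remove the need for BASE64ENCODE on this server is not confirmed by the source.

```python
def rag_chat_query(tables, system_prompt, question, rag_type="qa"):
    """Build a parameterized rag_chat call for a DB-API cursor.

    `tables` is a list of vector table names. They are joined with ';'
    (with a trailing ';'), following the t1;t2;t3; form in the tutorial.
    """
    table_arg = ";".join(tables) + ";"
    sql = "select rag_chat(%s, %s, %s, %s)"
    return sql, (table_arg, system_prompt, question, rag_type)


sql, params = rag_chat_query(
    ["proc/extract/tablename"],
    "You are a Postgresql database expert",
    "hostname(222)",
)
print(sql)
print(params)
# With a live connection this would be executed as: cur.execute(sql, params)
```

The `%s` placeholders follow the psycopg2 parameter style used by the connection example earlier in this README.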

## Selectable Tools (APIs, Files, Databases, Message Queues, etc.) - Functions Prefixed with tol

Demo: http://8.130.66.117:8080/ui (agent tol tutorial)

```sql

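-- Load tool code into the table
-- Parameter 1: tool name, read-table-ddl
-- Parameter 2: the tool's code
select tol_loadcode('read-table-ddl', 
base64encode('def getpsql_table_ddl(param):
    from sqlalchemy import create_engine, MetaData
    from sqlalchemy.schema import CreateTable
    # param has the form "<database uri>,<table name>"
    parts = param.split(",")
    print(parts)
    engine = create_engine(parts[0])
    # Reflect the tables in the database
    metadata = MetaData()
    metadata.reflect(bind=engine)
    # Look up the reflected table
    example_table = metadata.tables[parts[1]]
    # Render the CREATE TABLE statement (DDL) without executing it
    ddl_script = str(CreateTable(example_table).compile(engine))
    return ddl_script'))

-- Load the python and hpl tools under a folder into the table
-- Parameter 1: function name prefix, to avoid clashes between identical function names
-- Parameter 2: folder or file; separate multiple files or folders with semicolons
select tol_loadfile('prefix-','/home/softrobot/llm/qwne05/tools;')

-- Load and train the scenario in which a tool is used
-- Parameter 1: vector table name, database/tools
-- Parameter 2: description of the scenario in which the tool is used: Read the table ddl. uri is abc1, The name of the table is tab1
-- Parameter 3: tool name, read-table-ddl
-- Parameter 4: parameter description, tab1
select tol_loadfunc('database/tools','Read the table ddl. uri is abc1, The name of the table is tab1','read-table-ddl','tab1')


-- Run a tool automatically based on the description of the question
-- Parameter 1: vector table name, database/tools
-- Parameter 2: the question: Read the table ddl. uri is 1234, The name of the table is 234
select tol_chat('database/tools;','Read the table ddl. uri is 1234, The name of the table is 234')

-- Return the raw matching strings
-- Parameter 1: vector table name, database/tools
-- Parameter 2: the question: Read the table ddl
select tol_query('database/tools','Read the table ddl')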


```

## Task Execution and Actions (Task Planning, Task Decomposition, Exploration, etc.) - Tutorial for act-Prefixed Functions

Demo: http://8.130.66.117:8080/ui (agent act tutorial)

```sql

-- Load: decompose a question into multiple subtasks
-- Parameter 1: vector table name, act/plan/tablename; created automatically if it does not exist
-- Parameter 2: the question: Thomas, Toby, and Rebecca worked a total of 157 hours in one week
-- Parameter 3: the subtasks the question is decomposed into: subtask1: Find out the number of hours Thomas worked. #E2 = 1\nsubtask2: Find out the number of hours Toby worked. #E2 = 1
-- Parameter 4: keyword: worked
select act_loadmap('act/plan/tablename', 'Thomas, Toby, and Rebecca worked a total of 157 hours in one week',
'subtask1: Find out the number of hours Thomas worked. #E2 = 1\nsubtask2: Find out the number of hours Toby worked. #E2 = 1','worked')

-- Load: decompose a question into multiple plans
-- Parameter 1: vector table name, act/plan/tablename; created automatically if it does not exist
-- Parameter 2: the question: Thomas, Toby, and Rebecca worked a total of 157 hours in one week
-- Parameter 3: the plans the question is decomposed into: plan1: Find out the number of hours Thomas worked. #E2 = 1\nplan2: Find out the number of hours Toby worked. #E2 = 1
-- Parameter 4: keyword: worked
select act_loadplan('act/plan/tablename', 'Thomas, Toby, and Rebecca worked a total of 157 hours in one week',
'plan1: Find out the number of hours Thomas worked. #E2 = 1\nplan2: Find out the number of hours Toby worked. #E2 = 1','worked')

-- Load: rewrite a question (replace it with a similar question)
-- Parameter 1: vector table name, act/plan/tablename; created automatically if it does not exist
-- Parameter 2: the question: Thomas, Toby, and Rebecca worked a total of 157 hours in one week
-- Parameter 3: the rewritten question: Thomas, Toby, and Rebecca worked a total of 157 hours
-- Parameter 4: keyword: worked
select act_loadwt('act/plan/tablename', 'Thomas, Toby, and Rebecca worked a total of 157 hours in one week',
'Thomas, Toby, and Rebecca worked a total of 157 hours','worked')

-- Load: a list of items related to the question
-- Parameter 1: vector table name, act/plan/tablename; created automatically if it does not exist
-- Parameter 2: the question: Thomas, Toby, and Rebecca worked a total of 157 hours in one week
-- Parameter 3: the list related to the question: relevance1: Find out the number of hours Thomas worked. #E2 = 1\nrelevance2: Find out the number of hours Toby worked. #E2 = 1
-- Parameter 4: keyword: worked
select act_loadrelate('act/plan/tablename', 'Thomas, Toby, and Rebecca worked a total of 157 hours in one week',
'relevance1: Find out the number of hours Thomas worked. #E2 = 1\nrelevance2: Find out the number of hours Toby worked. #E2 = 1','worked')


-- Chat: decompose a question into multiple subtasks
-- Parameter 1: vector table name, act/plan/tablename;
-- Parameter 2: the question: Thomas, Toby, and Rebecca worked a total of 157 hours in one week
select act_chatmap('act/plan/tablename;', 'Thomas, Toby, and Rebecca worked a total of 157 hours in one week')

-- Chat: decompose a question into multiple plans
-- Parameter 1: vector table name, act/plan/tablename;
-- Parameter 2: the question: Thomas, Toby, and Rebecca worked a total of 157 hours in one week
select act_chatplan('act/plan/tablename;', 'Thomas, Toby, and Rebecca worked a total of 157 hours in one week')

-- Chat: rewrite a question (replace it with a similar question)
-- Parameter 1: vector table name, act/plan/tablename;
-- Parameter 2: the question: Thomas, Toby, and Rebecca worked a total of 157 hours in one week
select act_chatwt('act/plan/tablename;', 'Thomas, Toby, and Rebecca worked a total of 157 hours in one week')

-- Chat: generate a list of items related to the question
-- Parameter 1: vector table name, act/plan/tablename;
-- Parameter 2: the question: Thomas, Toby, and Rebecca worked a total of 157 hours in one week
select act_chatrelate('act/plan/tablename;', 'Thomas, Toby, and Rebecca worked a total of 157 hours in one week')


```
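
The subtask and plan strings above follow a "label: description" layout, one step per line. A caller that wants to work with individual steps could split them as sketched below; `parse_steps` is a hypothetical helper, and the assumption that results come back in the same layout as the loaded examples is not confirmed by the source.

```python
import re


def parse_steps(text):
    """Split 'label: description' lines into (label, description) pairs.

    Accepts real newlines as well as the two-character sequence
    backslash + n, since a SQL string literal may carry either.
    """
    steps = []
    for line in re.split(r"\\n|\n", text):
        label, sep, description = line.partition(":")
        if sep and description.strip():
            steps.append((label.strip(), description.strip()))
    return steps


raw = (r"subtask1: Find out the number of hours Thomas worked. #E2 = 1"
       r"\nsubtask2: Find out the number of hours Toby worked. #E2 = 1")
for label, description in parse_steps(raw):
    print(label, "->", description)
```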

## Reflecting on, Evaluating, and Optimizing Execution Results - Functions Prefixed with rft

Demo: http://8.130.66.117:8080/ui (agent rft tutorial)

```sql


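-- Load document text into a vector table; the rag type is doc
-- Parameter 1: vector table name, rft/relevancy; created automatically if it does not exist
-- Parameter 2: the text to store: The FIFA World Cup in 2018 was won by the French national football team
-- Parameter 3: keyword: FIFA World Cup
select rft_loadtxt('rft/relevancy', 'The FIFA World Cup in 2018 was won by the French national football team','%FIFA World Cup%')


-- Return the raw matching strings
-- Parameter 1: vector table name, rft/relevancy;
-- Parameter 2: the question: The FIFA World Cup in 2018
-- Parameter 3: query type: 1 = keyword search, 2 = vector search, 0 = hybrid search
select rft_query('rft/relevancy;','The FIFA World Cup in 2018','0')

-- Relevancy analysis
-- Parameter 1: vector table name, rft/relevancy;
-- Parameter 2: input value, input
-- Parameter 3: expected output value, expected
-- Parameter 4: actual output value, output
select rft_relevancy('rft/relevancy;','input','expected','output')

-- Relevancy analysis with a custom criterion
-- Parameter 1: vector table name, rft/relevancy;
-- Parameter 2: custom judging criterion, criteria
-- Parameter 3: input value, input
-- Parameter 4: expected output value, expected
-- Parameter 5: actual output value, output
select  rft_relevancy('rft/relevancy;','criteria','input','expected','output')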

```
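
To show what comparing an expected output with an actual output can look like, here is a deliberately simple scoring sketch. It uses Jaccard token overlap as a stand-in; how rft_relevancy actually scores relevancy is not described in the source, and the `relevancy` and `evaluate` helpers and the 0.5 threshold are assumptions for illustration.

```python
import re


def _tokens(text):
    """Lowercased set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevancy(expected, output):
    """Jaccard overlap between expected and actual output, in [0, 1]."""
    a, b = _tokens(expected), _tokens(output)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def evaluate(expected, output, threshold=0.5):
    """Return a small report with a rounded score and a pass/fail flag."""
    score = relevancy(expected, output)
    return {"score": round(score, 2), "passed": score >= threshold}


print(evaluate("France won the 2018 FIFA World Cup",
               "The 2018 FIFA World Cup was won by France"))
# {'score': 0.78, 'passed': True}
```

Token overlap ignores word order and meaning, so a production evaluator would more likely rely on embeddings or an LLM judge.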

## Multi-Agent Collaboration (Agent Mesh) - Tutorial for grp-Prefixed Functions

Demo: http://8.130.66.117:8080/ui (multi-agent langgraph tutorial)

```sql
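-- View the structure of the expert table
-- deleted INTEGER DEFAULT 0: deletion flag
-- vct_name CHARACTER VARYING(500): vector table name
-- name CHARACTER VARYING(200): name of the langgraph node agent
-- tools CHARACTER VARYING(200) DEFAULT 'database-sql-tools': tools used by the node agent
-- prompt CHARACTER VARYING: prompt used by the node agent
-- test CHARACTER VARYING(200): tool for testing the generated answer
-- verify CHARACTER VARYING(200): tool for comparing the similarity of the answer and the question
-- cond CHARACTER VARYING: langgraph conditional node
-- parent_id INTEGER DEFAULT 0: langgraph parent node
-- flag INTEGER DEFAULT 0: 0 = recommended, 1 = published, 2 = disabled
select * from ept_sql_convert

-- Register the contents of the expert table as a langgraph multi-agent microservice
-- Parameter 1: expert table name
-- Parameter 2: vector store name
select grp_regrec('ept_sql_convert','sql-convert/tools')

-- Check whether the microservice is registered
-- Parameter 1: expert table name
select grp_find('ept_sql_convert')

-- Call the registered microservice; if it does not exist, the message "Agent is None" is returned
-- Parameter 1: table name, ept_sql_convert
-- Parameter 2: the question
select grp_chat('ept_sql_convert','select * from "Project" where "ProjectID" >0')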

```
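
The expert table describes a graph: each row is an agent node and parent_id points to the node that precedes it. The sketch below turns such rows into edges and walks them without using langgraph. The sample rows, the `id` column, and the choice to keep only published rows (flag = 1) are assumptions for illustration; how grp_regrec actually builds its graph is not described in the source.

```python
from collections import defaultdict

# Hypothetical rows shaped like a few ept_sql_convert columns.
rows = [
    {"id": 1, "name": "planner",  "parent_id": 0, "flag": 1, "deleted": 0},
    {"id": 2, "name": "writer",   "parent_id": 1, "flag": 1, "deleted": 0},
    {"id": 3, "name": "verifier", "parent_id": 2, "flag": 1, "deleted": 0},
    {"id": 4, "name": "draft",    "parent_id": 1, "flag": 0, "deleted": 0},
]


def build_edges(rows):
    """Return parent -> children edges for published, non-deleted agents."""
    active = {r["id"]: r for r in rows if r["flag"] == 1 and r["deleted"] == 0}
    edges = defaultdict(list)
    for r in active.values():
        parent = active.get(r["parent_id"])
        # Rows without an active parent hang off a synthetic START node.
        edges[parent["name"] if parent else "START"].append(r["name"])
    return dict(edges)


def execution_order(edges, node="START"):
    """Depth-first walk from START, listing agents in visiting order."""
    order = []
    for child in edges.get(node, []):
        order.append(child)
        order.extend(execution_order(edges, child))
    return order


edges = build_edges(rows)
print(edges)                   # {'START': ['planner'], 'planner': ['writer'], 'writer': ['verifier']}
print(execution_order(edges))  # ['planner', 'writer', 'verifier']
```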

## LLM Safety - Tutorial for sft-Prefixed Functions

Demo: http://8.130.66.117:8080/ui (LLM safety tutorials)

```sql

```

## Machine Learning (ML) - Tutorial for ml-Prefixed Functions

Demo: http://8.130.66.117:8080/ui (machine learning ml tutorial)

```sql

```

## Multimedia Processing (Images, Video, Audio, etc.) - Tutorial for pic-Prefixed Functions

Demo: http://8.130.66.117:8080/ui (multimedia tutorial)

```sql

```

## Natural Language Processing (NLP) - Tutorial for nlp-Prefixed Functions

Demo: http://8.130.66.117:8080/ui (LLM NLP tutorials)

```sql

```

## LLM Information System (Data Fabric, Create, Delete, Update, Query, etc.) - Tutorial for dat-Prefixed Functions

Demo: http://8.130.66.117:8080/ui (LLM data tutorials)

```sql

```

## Self-Service BI Tutorial

Demo: http://8.130.66.117:8080/ui (LLM self-service BI tutorials)

```sql

```