diff --git "a/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224AI\345\256\271\345\231\250\346\240\210/Compatibility-AI-Infra/flows/get_all_docker_images_flow.yaml" "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224AI\345\256\271\345\231\250\346\240\210/Compatibility-AI-Infra/flows/get_all_docker_images_flow.yaml" new file mode 100644 index 0000000000000000000000000000000000000000..7da843c03e884e7fbe7eda3642daff20ffde9457 --- /dev/null +++ "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224AI\345\256\271\345\231\250\346\240\210/Compatibility-AI-Infra/flows/get_all_docker_images_flow.yaml" @@ -0,0 +1,15 @@ +name: get_all_supported_AI_docker_images +description: "获取所有支持的docker容器镜像,输入为空,输出为支持的AI容器镜像列表,包括名字、tag、registry、repository" +steps: + - name: start + call_type: api + params: + endpoint: GET /docker/images + next: list2markdown + - name: list2markdown + call_type: llm + params: + user_prompt: | + 当前已有的docker容器及tag为:{data}。请将这份内容输出为markdown表格,表头为registry、repository、image_name、tag,请注意如果一个容器镜像有多个tag版本,请分多行展示。 +next_flow: + - docker_pull_specified_AI_docker_images \ No newline at end of file diff --git "a/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224AI\345\256\271\345\231\250\346\240\210/Compatibility-AI-Infra/flows/pull_images_flow.yaml" "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224AI\345\256\271\345\231\250\346\240\210/Compatibility-AI-Infra/flows/pull_images_flow.yaml" new file mode 100644 index 0000000000000000000000000000000000000000..81a91bcec77b97e81d6b884c2efc8d1c461ca224 --- /dev/null +++ "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224AI\345\256\271\345\231\250\346\240\210/Compatibility-AI-Infra/flows/pull_images_flow.yaml" @@ -0,0 +1,15 @@ +name: docker_pull_specified_AI_docker_images +description: "从dockerhub拉取指定的docker容器镜像,输入为容器镜像的名字和tag" +steps: + - name: start + call_type: api + params: + endpoint: POST /docker/pull + next: extract_key + - name: extract_key + call_type: extract + params: + keys: + - data.shell +next_flow: + - docker_run_specified_AI_docker_images \ No newline at end of file diff --git "a/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224AI\345\256\271\345\231\250\346\240\210/Compatibility-AI-Infra/flows/run_images_flow.yaml" "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224AI\345\256\271\345\231\250\346\240\210/Compatibility-AI-Infra/flows/run_images_flow.yaml" new file mode 100644 index 0000000000000000000000000000000000000000..93a3527c5e3ea8e9d709ac911d0335e90f9b6d26 --- /dev/null +++ "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224AI\345\256\271\345\231\250\346\240\210/Compatibility-AI-Infra/flows/run_images_flow.yaml" @@ -0,0 +1,13 @@ +name: docker_run_specified_AI_docker_images +description: "运行指定的容器镜像,输入为容器镜像的名字和tag" +steps: + - name: start + call_type: api + params: + endpoint: POST /docker/run + next: extract_key + - name: extract_key + call_type: extract + params: + keys: + - data.shell diff --git "a/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224AI\345\256\271\345\231\250\346\240\210/Compatibility-AI-Infra/openapi.yaml" "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224AI\345\256\271\345\231\250\346\240\210/Compatibility-AI-Infra/openapi.yaml" new file mode 100644 index 0000000000000000000000000000000000000000..08585e29f3c78e66bdcfc5879f933ab8b618d57a --- /dev/null +++ "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224AI\345\256\271\345\231\250\346\240\210/Compatibility-AI-Infra/openapi.yaml" @@ -0,0 +1,190 @@ +openapi: 3.0.2 +info: + title: compatibility-ai-infra + version: 0.1.0 +servers: + - url: http://ai-infra-service.compatibility-ai-infra.svc.cluster.local:8101 +paths: + /docker/images: + get: + description: 获取所有支持的AI容器信息,返回容器名字和tag + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/ResponseData' + /docker/pull: + post: + description: 输入容器镜像名字和容器镜像tag,返回拉取该容器的shell命令 + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/PullDockerImages' + required: true + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/ResponseData' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /docker/run: + post: + description: 输入容器名字和tag,返回运行该容器的shell命令 + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/RunDockerImages' + required: true + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/ResponseData' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' +components: + schemas: + HTTPValidationError: + description: HTTP校验错误 + type: object + properties: + detail: + title: Detail + type: array + items: + $ref: '#/components/schemas/ValidationError' + PullDockerImages: + description: 生成容器拉取命令的接口的入参 + required: + - image_name + - image_tag + type: object + properties: + image_name: + description: 容器镜像的名字,不要包含转义符 + type: string + enum: + - cann + - oneapi-runtime + - oneapi-basekit + - llm-server + - mlflow + - llm + - tensorflow + - pytorch + - cuda + image_tag: + description: 容器镜像的tag,不要包含转义符 + type: string + enum: + - "8.0.RC1-oe2203sp4" + - "cann7.0.RC1.alpha002-oe2203sp2" + - "2024.2.0-oe2403lts" + - "1.0.0-oe2203sp3" + - "2.11.1-oe2203sp3" + - "2.13.1-oe2203sp3" + - "chatglm2_6b-pytorch2.1.0.a1-cann7.0.RC1.alpha002-oe2203sp2" + - "llama2-7b-q8_0-oe2203sp2" + - "chatglm2-6b-q8_0-oe2203sp2" + - "fastchat-pytorch2.1.0.a1-cann7.0.RC1.alpha002-oe2203sp2" + - "tensorflow2.15.0-oe2203sp2" + - "tensorflow2.15.0-cuda12.2.0-devel-cudnn8.9.5.30-oe2203sp2" + - "pytorch2.1.0-oe2203sp2" + - "pytorch2.1.0-cuda12.2.0-devel-cudnn8.9.5.30-oe2203sp2" + - "pytorch2.1.0.a1-cann7.0.RC1.alpha002-oe2203sp2" + - "cuda12.2.0-devel-cudnn8.9.5.30-oe2203sp2" + ResponseData: + description: 接口返回值的固定格式 + required: + - code + - message + - data + type: object + properties: + code: + description: 状态码 + type: integer + message: + description: 状态信息 + type: string + data: + description: 返回数据 + type: any + RunDockerImages: + description: 生成容器运行命令的接口的入参 + required: + - image_name + - image_tag + type: object + properties: + image_name: + description: 容器镜像的名字,不要包含转义符 + type: string + enum: + - cann + - oneapi-runtime + - oneapi-basekit + - llm-server + - mlflow + - llm + - tensorflow + - pytorch + - cuda + image_tag: + description: 容器镜像的tag,不要包含转义符 + type: string + enum: + - "8.0.RC1-oe2203sp4" + - "cann7.0.RC1.alpha002-oe2203sp2" + - "2024.2.0-oe2403lts" + - "1.0.0-oe2203sp3" + - "2.11.1-oe2203sp3" + - "2.13.1-oe2203sp3" + - "chatglm2_6b-pytorch2.1.0.a1-cann7.0.RC1.alpha002-oe2203sp2" + - "llama2-7b-q8_0-oe2203sp2" + - "chatglm2-6b-q8_0-oe2203sp2" + - "fastchat-pytorch2.1.0.a1-cann7.0.RC1.alpha002-oe2203sp2" + - "tensorflow2.15.0-oe2203sp2" + - "tensorflow2.15.0-cuda12.2.0-devel-cudnn8.9.5.30-oe2203sp2" + - "pytorch2.1.0-oe2203sp2" + - "pytorch2.1.0-cuda12.2.0-devel-cudnn8.9.5.30-oe2203sp2" + - "pytorch2.1.0.a1-cann7.0.RC1.alpha002-oe2203sp2" + - "cuda12.2.0-devel-cudnn8.9.5.30-oe2203sp2" + ValidationError: + description: 接口的入参校验错误时返回的内容格式 + required: + - loc + - msg + - type + type: object + properties: + loc: + title: Location + type: array + items: + anyOf: + - type: string + - type: integer + msg: + title: Message + type: string + type: + title: Error Type + type: string \ No newline at end of file diff --git "a/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224AI\345\256\271\345\231\250\346\240\210/Compatibility-AI-Infra/plugin.json" "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224AI\345\256\271\345\231\250\346\240\210/Compatibility-AI-Infra/plugin.json" new file mode 100644 index 0000000000000000000000000000000000000000..6136093d2313bd85ae2f2244adef96d48dad90bd --- /dev/null +++ "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224AI\345\256\271\345\231\250\346\240\210/Compatibility-AI-Infra/plugin.json" @@ -0,0 +1,6 @@ +{ + "id": "ai_docker_images", + "name": "AI容器镜像", + "description": "该插件接受用户的输入,检查当前支持哪些AI容器,拉取容器,运行容器", + "predefined_question": "查看当前支持哪些AI容器,拉取指定的容器,运行指定的容器" +} \ No newline at end of file diff --git "a/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224AI\345\256\271\345\231\250\346\240\210/\346\217\222\344\273\266\342\200\224AI\345\256\271\345\231\250\346\240\210\351\203\250\347\275\262\346\214\207\345\215\227.md" "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224AI\345\256\271\345\231\250\346\240\210/\346\217\222\344\273\266\342\200\224AI\345\256\271\345\231\250\346\240\210\351\203\250\347\275\262\346\214\207\345\215\227.md" new file mode 100644 index 0000000000000000000000000000000000000000..25e8340e4715cc1301257897d56c836e8c5fc5e7 --- /dev/null +++ "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224AI\345\256\271\345\231\250\346\240\210/\346\217\222\344\273\266\342\200\224AI\345\256\271\345\231\250\346\240\210\351\203\250\347\275\262\346\214\207\345\215\227.md" @@ -0,0 +1,36 @@ +# AI容器栈部署指南 + +## 准备工作 + ++ 提前部署euler-copilot-shell 客户端 + ++ 修改 /xxxx/xxxx/values.yaml 文件的 euler-copilot-tune 部分,将enable字段改为True + +```bash +enable: True +``` + ++ 更新环境 + +```bash +helm upgrade euler-copilot . +``` + ++ 检查Compatibility-AI-Infra目录下的openapi.yaml 中 servers.url 字段,确保AI容器服务的启动地址被正确设置 + ++ 获取 $plugin_dir 插件文件夹的路径,该变量位于 euler-copilot-helm/chart/euler_copilot/values.yaml 中的framework模块 + ++ 如果插件目录不存在,需新建该目录 + ++ 将该目录下的Compatibility-AI-Infra文件夹放到 $plugin_dir 中 + +```bash +cp -r ./Compatibility-AI-Infra $PLUGIN_DIR +``` + ++ 重建framework pod,重载插件配置 +```bash +kubectl delete pod framework-xxxx -n 命名空间 +``` + + diff --git "a/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\257\212\346\226\255/euler-copilot-rca/flows/demarcation.yaml" "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\257\212\346\226\255/euler-copilot-rca/flows/demarcation.yaml" new file mode 100644 index 0000000000000000000000000000000000000000..6831bdea203e1ffd360f765e5f85ebdce704a437 --- /dev/null +++ "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\257\212\346\226\255/euler-copilot-rca/flows/demarcation.yaml" @@ -0,0 +1,18 @@ +name: demarcation +description: 该工具的作用为针对已知异常事件进行定界分析。需从上下文中获取start_time(开始时间),end_time(结束时间),container_id(容器ID) +steps: + - name: start + call_type: api + params: + endpoint: POST /demarcation + next: report_gen + - name: report_gen + call_type: llm + params: + system_prompt: 你是一个系统智能助手,擅长分析系统的故障现象,最终生成分析报告。 + user_prompt: | + 您是一个专业的运维人员,擅长分析系统的故障现象,最终生成分析报告。当前异常检测结果为{data}。 + 将root_causes_metric_top3内容输出为表格形式,并为每个根因指标进行标号。 + 整个分析报告应该符合markdown规范 +next_flow: + - detection \ No newline at end of file diff --git "a/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\257\212\346\226\255/euler-copilot-rca/flows/detection.yaml" "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\257\212\346\226\255/euler-copilot-rca/flows/detection.yaml" new file mode 100644 index 0000000000000000000000000000000000000000..836c71423d63248cd84fe20593d6f848c9b35363 --- /dev/null +++ "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\257\212\346\226\255/euler-copilot-rca/flows/detection.yaml" @@ -0,0 +1,10 @@ +name: detection +description: 该工具的作用为针对已知容器ID和指标,执行profiling分析任务,得到任务ID。需从上下文中获取container_id(容器ID)和三个metric(指标)的其中一个。 +steps: + - name: start + call_type: api + params: + endpoint: POST /detection + next: end + - name: end + call_type: none diff --git "a/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\257\212\346\226\255/euler-copilot-rca/flows/inspection.yaml" "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\257\212\346\226\255/euler-copilot-rca/flows/inspection.yaml" new file mode 100644 index 0000000000000000000000000000000000000000..afaefe31106c5ec2016fb3f030fb363950b62516 --- /dev/null +++ "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\257\212\346\226\255/euler-copilot-rca/flows/inspection.yaml" @@ -0,0 +1,16 @@ +name: inspection +description: 该工具的作用为在指定机器上对容器进行异常事件检测。需从上下文中获取start_time(开始时间),end_time(结束时间),machine_id(机器IP) +steps: + - name: start + call_type: api + params: + endpoint: POST /inspection + next: list2markdown + - name: list2markdown + call_type: llm + params: + user_prompt: | + 您是一个专业的运维人员,擅长分析系统的故障现象,最终生成分析报告。当前的异常检测结果为{data}。请将anomaly_events_times_list的信息,输出为表格形式。整个分析报告请符合markdown规范。 + +next_flow: + - demarcation \ No newline at end of file diff --git "a/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\257\212\346\226\255/euler-copilot-rca/flows/show_profiling.yaml" "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\257\212\346\226\255/euler-copilot-rca/flows/show_profiling.yaml" new file mode 100644 index 0000000000000000000000000000000000000000..b82172eb272e6c0679dd32582e18e4ecda7dc2bf --- /dev/null +++ "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\257\212\346\226\255/euler-copilot-rca/flows/show_profiling.yaml" @@ -0,0 +1,36 @@ +name: show_profiling +description: 根据已知的智能诊断任务ID(task_id),获取报告的原始数据。随后根据原始数据,生成详细的报告。 +steps: + - name: start + call_type: api + params: + endpoint: POST /show_profiling + next: report_gen + - name: report_gen + call_type: llm + params: + system_prompt: | + 你是一个数据分析和性能分析的专家,请按以下的模板分析出应用的性能瓶颈: + + 1.分析topStackSelf字段中自身耗时排名前3的函数调用栈,分析结果中应该包含函数的耗时信息、函数调用栈的解释说明。 + 2.分析topStackTotal字段中总耗时排名前3的函数调用栈,分析结果中应该包含函数的耗时信息、函数调用栈的解释说明。 + 3.总结前两步的分析结果,并给出影响应用性能的瓶颈所在,同时给出建议。 + user_prompt: | + 现有定界分析结果:{data} + 上面提供了一个JSON对象,它包含了应用程序的Profiling分析报告。该JSON对象包括如下几个字段: + + - traceEvents:它是一个事件列表,列表中的每一项表示一个事件,每个事件以字典格式存储,事件的主要内容解释如下: + - cat 字段:表示事件的分类,它的值包括 syscall、python_gc、sample、pthread_sync,oncpu。其中,syscall 表示这是一个系统调用事件;python_gc 表示这是一个Python垃圾回收事件;sample表示这是一个cpu调用栈采样事件;oncpu表示这是一个OnCPU事件,它说明了pid字段所代表的进程正在占用cpu。 + - name字段:表示事件的名称; + - pid字段:表示事件的进程ID; + - tid字段:表示事件所在的线程ID; + - ts字段:表示事件发生的开始时间,它是一个时间戳格式,单位是微秒; + - dur字段:表示事件的耗时,单位是微秒; + - sf字段:表示事件的函数调用栈,内容是以分号(;)分隔的函数名列表,分号左边是调用方的函数名,分号右边是被调用的函数名。 + - args字段:表示每个事件特有的信息,内容主要包括:count字段,表示事件发生的计数;thread.name字段,表示事件所在的线程的名称;cpu字段,表示采样的cpu编号。 + - topStackSelf:表示应用程序在执行CPU操作期间,自身耗时排名前10的函数调用栈列表。自身耗时是指函数调用栈自身的耗时。列表中的每一项内容说明如下: + - stack:用字符串表示的一个函数调用栈,内容是以分号(;)分隔的函数名列表,分号左边是调用方的函数名,分号右边是被调用的函数名。 + - self_time:stack表示的函数调用栈的自身耗时,单位是毫秒。 + - topStackTotal:表示应用程序在执行CPU操作期间,总耗时排名前10的函数调用栈列表,总耗时是指函数调用栈累积的耗时,它包含了自身耗时。列表中的每一项内容说明如下: + - stack:用字符串表示的一个函数调用栈,内容是以分号(;)分隔的函数名列表,分号左边是调用方的函数名,分号右边是被调用的函数名。 + - total_time:stack表示的函数调用栈的总耗时,单位是毫秒。 \ No newline at end of file diff --git "a/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\257\212\346\226\255/euler-copilot-rca/openapi.yaml" "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\257\212\346\226\255/euler-copilot-rca/openapi.yaml" new file mode 100644 index 0000000000000000000000000000000000000000..9ebf2715d5ff61cd86150cfa9b208c2c48a2afa3 --- /dev/null +++ "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\257\212\346\226\255/euler-copilot-rca/openapi.yaml" @@ -0,0 +1,255 @@ +openapi: 3.0.2 +info: + title: 智能诊断 + version: 1.0.0 +servers: + - url: http://192.168.10.31:20030 +paths: + /inspection: + post: + description: 对指定机器进行异常检测,返回异常事件 + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/InspectionRequestData' + required: true + responses: + '200': + description: Successful Response + content: + application/json: + schema: {} + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /demarcation: + post: + description: 对指定容器进行异常定界 + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/DemarcationRequestData' + required: true + responses: + '200': + description: Successful Response + content: + application/json: + schema: {} + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /detection: + post: + description: 根据定界结果指标进行定位 + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/DetectionRequestData' + required: true + responses: + '200': + description: Successful Response + content: + application/json: + schema: {} + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /show_profiling: + post: + description: 根据任务ID,获取Profiling结果 + requestBody: + content: + application/json: + schema: + type: object + description: 请求数据 + required: + - task_id + properties: + task_id: + type: string + description: 任务ID,为UUID类型 + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: "#/components/schemas/ShowProfilingResponse" + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' +components: + schemas: + HTTPValidationError: + type: object + description: HTTP 校验错误 + properties: + detail: + type: array + items: + $ref: '#/components/schemas/ValidationError' + title: Detail + InspectionRequestData: + type: object + description: 巡检接口入参 + required: + - machine_id + - start_time + - end_time + properties: + machine_id: + description: 机器IP。如果给定的信息没有指定任何机器IP,则默认为“default_0.0.0.0”。 + type: string + title: Machine_ID + default: default_0.0.0.0 + start_time: + description: 根据给定的信息提取出开始时间,如果给定的信息不包含开始时间,开始时间可以设置为当前时间往前推2分钟,最终解析出的时间以'%Y-%m-%d %H:%M:%S'格式输出 + type: string + title: Start_Time + default: '' + end_time: + description: 根据给定的信息提取出结束时间,如果给定的信息不包含结束时间,结束时间可以设置为当前时间,最终解析出的时间以'%Y-%m-%d %H:%M:%S'格式输出 + type: string + title: End_Time + default: '' + DemarcationRequestData: + type: object + description: 定界接口入参 + required: + - start_time + - end_time + - container_id + properties: + start_time: + description: 根据给定的信息提取出开始时间,如果给定的信息不包含开始时间,开始时间可以设置为当前时间往前推2分钟,最终解析出的时间以'%Y-%m-%d %H:%M:%S'格式输出 + type: string + title: Start_Time + default: '' + end_time: + description: 根据给定的信息提取出结束时间,如果给定的信息不包含结束时间,结束时间可以设置为当前时间,最终解析出的时间以'%Y-%m-%d %H:%M:%S'格式输出 + type: string + title: End_Time + default: '' + container_id: + description: 结合问题中指定的具体异常事件,根据给定信息提取容器ID + type: string + title: Container_ID + default: '' + DetectionRequestData: + type: object + description: 定位接口入参 + required: + - container_id + - metric + properties: + container_id: + description: 结合问题中指定的具体指标或者指标号,根据给定信息提取容器ID + type: string + title: Container_ID + default: '' + metric: + description: 结合问题中的具体指标或者指标号,根据给定信息提取具体指标值作为metric + type: string + title: Metric + default: '' + ShowProfilingResponse: + type: object + description: show profiling 的返回结果 + properties: + traceEvents: + type: array + items: + type: object + properties: + cat: + type: string + description: Event category (syscall, python_gc, sample, pthread_sync, oncpu) + name: + type: string + description: Event name + pid: + type: integer + format: int32 + description: Process ID + tid: + type: integer + format: int32 + description: Thread ID + ts: + type: integer + format: int64 + description: Timestamp of the event start in microseconds + dur: + type: integer + format: int32 + description: Duration of the event in microseconds + sf: + type: string + description: Call stack represented as a list of function names separated by semicolons + args: + type: object + additionalProperties: true + description: Additional event-specific information such as count, thread.name, and cpu + topStackSelf: + type: array + items: + type: object + properties: + stack: + type: string + description: Call stack represented as a list of function names separated by semicolons + self_time: + type: number + format: int + description: Exclusive time spent in the call stack in milliseconds + topStackTotal: + type: array + items: + type: object + properties: + stack: + type: string + description: Call stack represented as a list of function names separated by semicolons + total_time: + type: number + format: int + description: Total inclusive time spent in the call stack in milliseconds + ValidationError: + type: object + required: + - loc + - msg + - type + title: ValidationError + properties: + loc: + type: array + items: + anyOf: + - type: string + - type: integer + title: Location + msg: + type: string + title: Message + type: + type: string + title: Error Type \ No newline at end of file diff --git "a/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\257\212\346\226\255/euler-copilot-rca/plugin.json" "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\257\212\346\226\255/euler-copilot-rca/plugin.json" new file mode 100644 index 0000000000000000000000000000000000000000..b0ef2fd7aa0c13ad626a01d0fc7a4bf010ab3178 --- /dev/null +++ "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\257\212\346\226\255/euler-copilot-rca/plugin.json" @@ -0,0 +1,5 @@ +{ + "id": "rca", + "name": "智能诊断", + "description": "该插件具备以下功能:巡检,定界,定位" +} \ No newline at end of file diff --git "a/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\257\212\346\226\255/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\257\212\346\226\255\351\203\250\347\275\262\346\214\207\345\215\227.md" "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\257\212\346\226\255/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\257\212\346\226\255\351\203\250\347\275\262\346\214\207\345\215\227.md" new file mode 100644 index 0000000000000000000000000000000000000000..39736d97b4eae94d210cdcad10448115fc405594 --- /dev/null +++ "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\257\212\346\226\255/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\257\212\346\226\255\351\203\250\347\275\262\346\214\207\345\215\227.md" @@ -0,0 +1,20 @@ +# 智能诊断部署指南 + +## 准备工作 + ++ 在对应需要诊断的机器上安装gala-gopher + +``` +hub.oepkgs.net/a-ops/gala-gopher-profiling-x86_64:930eulercopilot +``` ++ 启动gala-gopher + +``` +docker run -d --name gala-gopher-profiling --privileged --pid=host --network=host -v /:/host -v /etc/localtime:/etc/localtime:ro -v /sys:/sys -v /usr/lib/debug:/usr/lib/debug -v /var/lib/docker:/var/lib/docker -e GOPHER_HOST_PATH=/host gala-gopher-profiling-x86_64:930eulercopilot +``` + + + + + + diff --git "a/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\260\203\344\274\230/euler-copilot-tune/flows/data_collection.yaml" "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\260\203\344\274\230/euler-copilot-tune/flows/data_collection.yaml" new file mode 100644 index 0000000000000000000000000000000000000000..d042505bb68073c3bd90e2fee2a1fca432e5af70 --- /dev/null +++ "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\260\203\344\274\230/euler-copilot-tune/flows/data_collection.yaml" @@ -0,0 +1,15 @@ +name: data_collection +description: 采集某一指定ip主机的系统性能指标 +steps: + - name: start + call_type: api + params: + endpoint: POST /performance_metric + next: show_data + - name: show_data + call_type: llm + params: + user_prompt: | + 当前采集到系统性能指标为:{data}, 输出内容请符合markdown规范。 +next_flow: + - performance_analysis \ No newline at end of file diff --git "a/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\260\203\344\274\230/euler-copilot-tune/flows/performance_analysis.yaml" "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\260\203\344\274\230/euler-copilot-tune/flows/performance_analysis.yaml" new file mode 100644 index 0000000000000000000000000000000000000000..8da64d91e30eddc62cb10ba4230b9e203360c77b --- /dev/null +++ "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\260\203\344\274\230/euler-copilot-tune/flows/performance_analysis.yaml" @@ -0,0 +1,15 @@ +name: performance_analysis +description: 分析性能指标并生成性能分析报告 +steps: + - name: start + call_type: api + params: + endpoint: POST /performance_report + next: extract_key + - name: extract_key + call_type: extract + params: + keys: + - data.output +next_flow: + - performance_tuning \ No newline at end of file diff --git "a/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\260\203\344\274\230/euler-copilot-tune/flows/performance_tuning.yaml" "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\260\203\344\274\230/euler-copilot-tune/flows/performance_tuning.yaml" new file mode 100644 index 0000000000000000000000000000000000000000..841ad0385a0162f935d4c5019313e028cfc65a86 --- /dev/null +++ "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\260\203\344\274\230/euler-copilot-tune/flows/performance_tuning.yaml" @@ -0,0 +1,13 @@ +name: performance_tuning +description: 基于性能能分析报告,生成操作系统和Mysql应用的性能优化建议,结果以shell脚本的形式返回 +steps: + - name: start + call_type: api + params: + endpoint: POST /optimization_suggestion + next: extract_key + - name: extract_key + call_type: extract + params: + keys: + - data.script \ No newline at end of file diff --git "a/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\260\203\344\274\230/euler-copilot-tune/openapi.yaml" "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\260\203\344\274\230/euler-copilot-tune/openapi.yaml" new file mode 100644 index 0000000000000000000000000000000000000000..257a7eb33bbdd0cf3048fd6927efd8e185455857 --- /dev/null +++ "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\260\203\344\274\230/euler-copilot-tune/openapi.yaml" @@ -0,0 +1,147 @@ +openapi: 3.0.2 +info: + title: 智能诊断 + version: 1.0.0 +servers: + - url: http://euler-copilot-tune.euler-copilot.svc.cluster.local:8100 +paths: + /performance_metric: + post: + description: 对指定机器进行性能指标采集,返回指标值 + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/PerformanceMetricRequestData' + required: true + responses: + '200': + description: Successful Response + content: + application/json: + schema: {} + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /performance_report: + post: + description: 基于采集到的指标,对指定机器进行性能诊断,生成性能分析报告 + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/PerformanceReportRequestData' + required: true + responses: + '200': + description: Successful Response + content: + application/json: + schema: {} + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /optimization_suggestion: + post: + description: 根据性能分析报告,以及指定的机器应用信息,生成调优建议 + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/OptimizationSuggestionRequestData' + required: true + responses: + '200': + description: Successful Response + content: + application/json: + schema: {} + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' +components: + schemas: + HTTPValidationError: + type: object + description: HTTP 校验错误 + properties: + detail: + type: array + items: + $ref: '#/components/schemas/ValidationError' + OptimizationSuggestionRequestData: + type: object + description: 生成优化建议的接口的入参 + required: + - app + - ip + properties: + app: + type: string + description: 应用名称 + default: mysql + enum: + - mysql + - none + ip: + type: string + description: 点分十进制的ipv4地址, 例如192.168.10.43 + example: "192.168.10.43" + PerformanceMetricRequestData: + type: object + description: 性能指标采集的接口的入参 + required: + - app + - ip + properties: + ip: + type: string + description: 点分十进制的ipv4地址, 例如192.168.10.43 + example: "192.168.10.43" + app: + type: string + description: App + default: none + enum: + - mysql + - none + PerformanceReportRequestData: + type: object + description: 生成性能报告接口的入参 + required: + - ip + properties: + ip: + type: string + description: 点分十进制的ipv4地址, 例如192.168.10.43 + example: "192.168.10.43" + ValidationError: + type: object + required: + - loc + - msg + - type + title: ValidationError + properties: + loc: + type: array + items: + anyOf: + - type: string + - type: integer + title: Location + msg: + type: string + title: Message + type: + type: string + title: Error Type \ No newline at end of file diff --git "a/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\260\203\344\274\230/euler-copilot-tune/plugin.json" "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\260\203\344\274\230/euler-copilot-tune/plugin.json" new file mode 100644 index 0000000000000000000000000000000000000000..909e58cdf25bea526aa3640556f7999dc0133139 --- /dev/null +++ "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\260\203\344\274\230/euler-copilot-tune/plugin.json" @@ -0,0 +1,6 @@ +{ + "id": "tune", + "name": "智能性能优化", + "description": "该插件具备以下功能:采集系统性能指标,分析系统性能,推荐系统性能优化建议", + "automatic_flow": false +} \ No newline at end of file diff --git "a/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\260\203\344\274\230/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\260\203\344\274\230\351\203\250\347\275\262\346\214\207\345\215\227.md" "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\260\203\344\274\230/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\260\203\344\274\230\351\203\250\347\275\262\346\214\207\345\215\227.md" new file mode 100644 index 0000000000000000000000000000000000000000..9da4736bf3816e8dfa885356e947127dcb528398 --- /dev/null +++ "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\260\203\344\274\230/\346\217\222\344\273\266\342\200\224\346\231\272\350\203\275\350\260\203\344\274\230\351\203\250\347\275\262\346\214\207\345\215\227.md" @@ -0,0 +1,60 @@ +# 插件—智能调优部署指南 + +## 准备工作 + ++ 提前部署euler-copilot-shell 客户端 + ++ 被调优机器需要为 openEuler 2203 LTS-SP3 + ++ 在需要被调优的机器上安装依赖 + +```bash +yum install -y sysstat perf +``` + ++ 被调优机器需要开启SSH 22端口 + + ++ 修改 /xxxx/xxxx/values.yaml 文件的 euler-copilot-tune 部分,配置需要调优的机器,以及对应机器上的mysql的账号名以及密码 + +```bash +ip:xxxxx +password:xxxx +mysql: + user:xxx + password:xxxx +``` + ++ 修改 /xxxx/xxxx/values.yaml 文件的 euler-copilot-tune 部分,将enable字段改为True + +```bash +enable: True +``` + ++ 更新环境 + +```bash +helm upgrade euler-copilot . +``` + ++ 检查euler-copilot-tune目录下的openapi.yaml 中 servers.url 字段,确保调优服务的启动地址被正确设置 + ++ 获取 $plugin_dir 插件文件夹的路径,该变量位于 euler-copilot-helm/chart/euler_copilot/values.yaml 中的framework模块 + ++ 如果插件目录不存在,需新建该目录 + ++ 将该目录下的euler-copilot-tune文件夹放到 $plugin_dir 中 + +```bash +cp -r ./euler-copilot-tune $plugin_dir +``` + ++ 重建 framework pod,重载插件配置 +```bash +kubectl delete pod framework-xxxx -n 命名空间 +``` + + + + + diff --git "a/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\351\211\264\346\235\203\342\200\224AuthHub\351\203\250\347\275\262\346\214\207\345\215\227.md" "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\351\211\264\346\235\203\342\200\224AuthHub\351\203\250\347\275\262\346\214\207\345\215\227.md" new file mode 100644 index 0000000000000000000000000000000000000000..c8f46548bfd9db320728b0930dd1d6954ccd8f40 --- /dev/null +++ "b/docs/user-guide/\351\203\250\347\275\262\346\214\207\345\215\227/\351\211\264\346\235\203\342\200\224AuthHub\351\203\250\347\275\262\346\214\207\345\215\227.md" @@ -0,0 +1,16 @@ +# AuthHub 部署指南 + +## 准备工作 + ++ 将 values.yaml 中的 authHub-web,authHub 的 enable 字段改为True + +```bash +enable: True +``` + ++ 更新环境 + +```bash +helm upgrade euler-copilot . +``` +