diff --git a/gopher_tech.md b/gopher_tech.md
index 0fc775f4b3ed420e0955abeb0bce7dded584690e..f0ec13d4bc78dd78979579ae891e3b69527b87df 100644
--- a/gopher_tech.md
+++ b/gopher_tech.md
@@ -105,31 +105,31 @@
 
 Entity name: block
 
-| metrics_name | table_name | metrics_type | unit | metrics description | Support |
-| --- | --- | --- | --- | --- | --- |
-| major | block | key | | Block device major number | Supports three block device types: NVMe, SCSI, and VirtBlock |
-| first_minor | block | key | | Block device first minor number | |
-| blk_type | block | label | | Block object type (e.g. disk, part) | |
-| blk_name | block | label | | Block object name | |
-| disk_name | block | label | | Name of the disk the block object belongs to | |
-| latency_req_max | io_latency(0x01) | Gauge | ns | Maximum block-layer I/O latency | |
-| latency_req_last | io_latency(0x01) | Gauge | ns | Most recent block-layer I/O latency | |
-| latency_req_sum | io_latency(0x01) | Gauge | ns | Total block-layer I/O latency | |
-| latency_req_jitter | io_latency(0x01) | Gauge | ns | Block-layer I/O latency jitter | |
-| count_latency_req | io_latency(0x01) | Gauge | | Number of block-layer I/O operations | |
-| latency_driver_max | io_latency(0x01) | Gauge | ns | Maximum driver-layer latency | |
-| latency_driver_last | io_latency(0x01) | Gauge | ns | Most recent driver-layer latency | |
-| latency_driver_sum | io_latency(0x01) | Gauge | ns | Total driver-layer latency | |
-| latency_driver_jitter | io_latency(0x01) | Gauge | ns | Driver-layer latency jitter | |
-| count_latency_driver | io_latency(0x01) | Gauge | | Number of driver-layer operations | |
-| latency_device_max | io_latency(0x01) | Gauge | ns | Maximum device-layer latency | |
-| latency_device_last | io_latency(0x01) | Gauge | ns | Most recent device-layer latency | |
-| latency_device_sum | io_latency(0x01) | Gauge | ns | Total device-layer latency | |
-| latency_device_jitter | io_latency(0x01) | Gauge | ns | Device-layer latency jitter | |
-| count_latency_device | io_latency(0x01) | Gauge | | Number of device-layer operations | |
-| err_code | io_err(0x02) | Gauge | | Block-layer I/O error code | |
-| read_bytes | io_count(0x04) | Gauge | bytes | Bytes read by I/O operations | |
-| write_bytes | io_count(0x04) | Gauge | bytes | Bytes written by I/O operations | |
+| metrics_name | metrics_type | unit | metrics description | Support |
+| --- | --- | --- | --- | --- |
+| major | key | | Block device major number | Supports three block device types: NVMe, SCSI, and VirtBlock |
+| first_minor | key | | Block device first minor number | |
+| blk_type | label | | Block object type (e.g. disk, part) | |
+| blk_name | label | | Block object name | |
+| disk_name | label | | Name of the disk the block object belongs to | |
+| latency_req_max | Gauge | ns | Maximum block-layer I/O latency | |
+| latency_req_last | Gauge | ns | Most recent block-layer I/O latency | |
+| latency_req_sum | Gauge | ns | Total block-layer I/O latency | |
+| latency_req_jitter | Gauge | ns | Block-layer I/O latency jitter | |
+| count_latency_req | Gauge | | Number of block-layer I/O operations | |
+| latency_driver_max | Gauge | ns | Maximum driver-layer latency | |
+| latency_driver_last | Gauge | ns | Most recent driver-layer latency | |
+| latency_driver_sum | Gauge | ns | Total driver-layer latency | |
+| latency_driver_jitter | Gauge | ns | Driver-layer latency jitter | |
+| count_latency_driver | Gauge | | Number of driver-layer operations | |
+| latency_device_max | Gauge | ns | Maximum device-layer latency | |
+| latency_device_last | Gauge | ns | Most recent device-layer latency | |
+| latency_device_sum | Gauge | ns | Total device-layer latency | |
+| latency_device_jitter | Gauge | ns | Device-layer latency jitter | |
+| count_latency_device | Gauge | | Number of device-layer operations | |
+| err_code | Gauge | | Block-layer I/O error code | |
+| read_bytes | Gauge | bytes | Bytes read by I/O operations | |
+| write_bytes | Gauge | bytes | Bytes written by I/O operations | |
 
 
 
@@ -139,38 +139,38 @@
 
 Entity name: net
 
-| metrics_name | table_name | metrics_type | unit | metrics description |
-| --- | --- | --- | --- | --- |
-| origin | | key | | /proc/net/snmp |
-| tcp_curr_estab | system_tcp | gauge | | Current number of TCP connections |
-| tcp_in_segs | system_tcp | gauge | segs | TCP segments received |
-| tcp_out_segs | system_tcp | gauge | segs | TCP segments sent |
-| tcp_retrans_segs | system_tcp | gauge | segs | TCP segments retransmitted |
-| tcp_in_errs | system_tcp | gauge | | Erroneous TCP segments received |
-| udp_indata_grams | system_udp | gauge | segs | UDP datagrams received |
-| udp_outdata_grams | system_udp | gauge | segs | UDP datagrams sent |
+| metrics_name | metrics_type | unit | metrics description |
+| --- | --- | --- | --- |
+| origin | key | | /proc/net/snmp |
+| tcp_curr_estab | gauge | | Current number of TCP connections |
+| tcp_in_segs | gauge | segs | TCP segments received |
+| tcp_out_segs | gauge | segs | TCP segments sent |
+| tcp_retrans_segs | gauge | segs | TCP segments retransmitted |
+| tcp_in_errs | gauge | | Erroneous TCP segments received |
+| udp_indata_grams | gauge | segs | UDP datagrams received |
+| udp_outdata_grams | gauge | segs | UDP datagrams sent |
 
 #### NIC statistics
 
 Entity name: nic
 
-| metrics_name | table_name | metrics_type | unit | metrics description |
-| --- | --- | --- | --- | --- |
-| dev_name | nic | key | | NIC name |
-| rx_bytes | nic | gauge | bytes | Bytes received by the NIC |
-| rx_packets | nic | gauge | | Total packets received by the NIC |
-| rx_errs | nic | gauge | | Erroneous packets received by the NIC |
-| rx_dropped | nic | gauge | | Packets dropped by the NIC on receive |
-| tx_bytes | nic | gauge | bytes | Bytes sent by the NIC |
-| tx_packets | nic | gauge | | Total packets sent by the NIC |
-| tx_errs | nic | gauge | | Erroneous packets sent by the NIC |
-| tx_dropped | nic | gauge | | Packets dropped by the NIC on send |
-| rxspeed_KB | nic | gauge | Kbytes/s | NIC receive rate |
-| txspeed_KB | nic | gauge | Kbytes/s | NIC transmit rate |
-| tc_sent_drop | nic | gauge | | Packets dropped by TC on send |
-| tc_sent_overlimits | nic | gauge | | TC send queue overlimits |
-| tc_backlog | nic | gauge | | Number of packets in the TC backlog queue |
-| tc_ecn_mark | nic | gauge | | TC congestion (ECN) marks |
+| metrics_name | metrics_type | unit | metrics description |
+| --- | --- | --- | --- |
+| dev_name | key | | NIC name |
+| rx_bytes | gauge | bytes | Bytes received by the NIC |
+| rx_packets | gauge | | Total packets received by the NIC |
+| rx_errs | gauge | | Erroneous packets received by the NIC |
+| rx_dropped | gauge | | Packets dropped by the NIC on receive |
+| tx_bytes | gauge | bytes | Bytes sent by the NIC |
+| tx_packets | gauge | | Total packets sent by the NIC |
+| tx_errs | gauge | | Erroneous packets sent by the NIC |
+| tx_dropped | gauge | | Packets dropped by the NIC on send |
+| rxspeed_KB | gauge | Kbytes/s | NIC receive rate |
+| txspeed_KB | gauge | Kbytes/s | NIC transmit rate |
+| tc_sent_drop | gauge | | Packets dropped by TC on send |
+| tc_sent_overlimits | gauge | | TC send queue overlimits |
+| tc_backlog | gauge | | Number of packets in the TC backlog queue |
+| tc_ecn_mark | gauge | | TC congestion (ECN) marks |
 
 ## Container performance
 
@@ -432,25 +432,25 @@
 
 Entity name: kafka_topic_flow
 
-| metrics_name | table_name | metrics_type | unit | metrics description | Support |
-| --- | --- | --- | --- | --- | --- |
-| msg_type | kafka_topic_flow | key | | Access type: producer or consumer | Entity name needs to be changed |
-| client_ip | kafka_topic_flow | key | | Client IP | |
-| num | kafka_topic_flow | gauge | | Number of messages published by producers or consumed by consumers in one sampling period | |
-| topic | kafka_topic_flow | key | | Message topic | |
-| server_ip | kafka_topic_flow | key | | NIC IP of the host running the kafka server | |
-| server_port | kafka_topic_flow | key | | Port the kafka server is bound to | |
+| metrics_name | metrics_type | unit | metrics description | Support |
+| --- | --- | --- | --- | --- |
+| msg_type | key | | Access type: producer or consumer | Entity name needs to be changed |
+| client_ip | key | | Client IP | |
+| num | gauge | | Number of messages published by producers or consumed by consumers in one sampling period | |
+| topic | key | | Message topic | |
+| server_ip | key | | NIC IP of the host running the kafka server | |
+| server_port | key | | Port the kafka server is bound to | |
 
 ### Topic performance monitoring
 
 Entity name: kafka_topic_metrics
 
-| metrics_name | table_name | metrics_type | unit | metrics description | Support |
-| --- | --- | --- | --- | --- | --- |
-| topic | kafka_topic_metrics | key | | Message topic | TO BE |
-| server_ip | kafka_topic_metrics | key | | NIC IP of the host running the kafka server | TO BE |
-| server_port | kafka_topic_metrics | key | | Port the kafka server is bound to | TO BE |
-| throughput | kafka_topic_metrics | histogram | | Topic throughput | TO BE |
+| metrics_name | metrics_type | unit | metrics description | Support |
+| --- | --- | --- | --- | --- |
+| topic | key | | Message topic | TO BE |
+| server_ip | key | | NIC IP of the host running the kafka server | TO BE |
+| server_port | key | | Port the kafka server is bound to | TO BE |
+| throughput | histogram | | Topic throughput | TO BE |
 
 ## Nginx/Haproxy monitoring
 
@@ -458,26 +458,71 @@
 
 Entity name: nginx_link
 
-| metrics_name | table_name | metrics_type | unit | metrics description | Support |
-| --- | --- | --- | --- | --- | --- |
-| client_ip | nginx_link | key | | Client IP | Currently only nginx 1.12.1 is supported |
-| virtual_ip | nginx_link | key | | Virtual server IP | |
-| server_ip | nginx_link | key | | Real server IP | |
-| virtual_port | nginx_link | key | | Virtual server port | |
-| server_port | nginx_link | key | | Real server port | |
-| is_l7 | nginx_link | label | | 1: L7 LB / 0: L4 LB | |
-| link_count | nginx_link | gauge | | Number of connections | |
+| metrics_name | metrics_type | unit | metrics description | Support |
+| --- | --- | --- | --- | --- |
+| client_ip | key | | Client IP | Currently only nginx 1.12.1 is supported |
+| virtual_ip | key | | Virtual server IP | |
+| server_ip | key | | Real server IP | |
+| virtual_port | key | | Virtual server port | |
+| server_port | key | | Real server port | |
+| is_l7 | label | | 1: L7 LB / 0: L4 LB | |
+| link_count | gauge | | Number of connections | |
 
 ### Haproxy load-balancing monitoring
 
 Entity name: haproxy_link
 
-| metrics_name | table_name | metrics_type | unit | metrics description | Support |
-| --- | --- | --- | --- | --- | --- |
-| client_ip | haproxy_link | key | | Client IP | Currently only haproxy 2.5-dev0 is supported |
-| virtual_ip | haproxy_link | key | | Virtual server IP | |
-| server_ip | haproxy_link | key | | Real server IP | |
-| virtual_port | haproxy_link | key | | Virtual server port | |
-| server_port | haproxy_link | key | | Real server port | |
-| protocol | haproxy_link | label | | Protocol type (TCP/HTTP) | |
-| link_count | haproxy_link | gauge | | Number of connections | |
+| metrics_name | metrics_type | unit | metrics description | Support |
+| --- | --- | --- | --- | --- |
+| client_ip | key | | Client IP | Currently only haproxy 2.5-dev0 is supported |
+| virtual_ip | key | | Virtual server IP | |
+| server_ip | key | | Real server IP | |
+| virtual_port | key | | Virtual server port | |
+| server_port | key | | Real server port | |
+| protocol | label | | Protocol type (TCP/HTTP) | |
+| link_count | gauge | | Number of connections | |
+
+### Load-balancing flow topology construction
+
+By monitoring TCP connections and Nginx/haproxy load-balancing flows, the actual TCP topology flow between frontend and backend applications can be constructed, as shown in the figure below:
+
+![](./png/nginx_flow.png)
+
+
+
+## Redis/PostgreSQL monitoring
+
+### Redis performance monitoring
+
+Entity name: sli
+
+| metrics_name | metrics_type | unit | metrics description | Support |
+| --- | --- | --- | --- | --- |
+| tgid | key | | Process ID | Unencrypted traffic only |
+| ins_id | key | | Instance ID | |
+| app | key | | Application name | |
+| method | key | | Request method | |
+| server_ip | label | | Server IP | |
+| server_port | label | | Server port | |
+| client_ip | label | | Client IP | |
+| client_port | label | | Client port | |
+| rtt_nsec | gauge | ns | Redis protocol request RTT | |
+| max_rtt_nsec | gauge | ns | Maximum Redis protocol request RTT within the sampling period | |
+
+### PostgreSQL performance monitoring
+
+Entity name: sli
+
+| metrics_name | metrics_type | unit | metrics description | Support |
+| --- | --- | --- | --- | --- |
+| tgid | key | | Process ID | Encrypted traffic supported with openssl 1.1.1 |
+| ins_id | key | | Instance ID | |
+| app | key | | Application name | |
+| method | key | | Request method | |
+| server_ip | label | | Server IP | |
+| server_port | label | | Server port | |
+| client_ip | label | | Client IP | |
+| client_port | label | | Client port | |
+| rtt_nsec | gauge | ns | PostgreSQL protocol request RTT | |
+| max_rtt_nsec | gauge | ns | Maximum PostgreSQL protocol request RTT within the sampling period | |
+| tps | gauge | | Database throughput | openGauss 2.0 only |
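
The counters in the `net` entity (`tcp_curr_estab`, `tcp_in_segs`, `udp_indata_grams`, ...) correspond to fields that the Linux kernel exposes in `/proc/net/snmp`. In that file, each protocol appears as a header line of field names followed by a line of values with the same `Proto:` prefix. The snippet below is a minimal sketch, not gala-gopher's probe code. It assumes a one-to-one mapping from each metric name to a `Tcp`/`Udp` field; the mapping table and the function names `parse_snmp` and `net_metrics` are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical mapping (an assumption, not taken from gala-gopher) from the
# net entity's metric names to (protocol, field) pairs in /proc/net/snmp.
NET_METRIC_FIELDS: Dict[str, Tuple[str, str]] = {
    "tcp_curr_estab": ("Tcp", "CurrEstab"),
    "tcp_in_segs": ("Tcp", "InSegs"),
    "tcp_out_segs": ("Tcp", "OutSegs"),
    "tcp_retrans_segs": ("Tcp", "RetransSegs"),
    "tcp_in_errs": ("Tcp", "InErrs"),
    "udp_indata_grams": ("Udp", "InDatagrams"),
    "udp_outdata_grams": ("Udp", "OutDatagrams"),
}


def parse_snmp(text: str) -> Dict[str, Dict[str, int]]:
    """Parse /proc/net/snmp-style text into {protocol: {field: value}}.

    Each protocol occupies two lines: field names, then values,
    both starting with the same 'Proto:' prefix.
    """
    tables: Dict[str, Dict[str, int]] = {}
    rows = [line.split() for line in text.splitlines() if line.strip()]
    # Pair every header line with the value line that follows it.
    for header, values in zip(rows[0::2], rows[1::2]):
        proto = header[0].rstrip(":")
        if values[0].rstrip(":") != proto:
            raise ValueError(f"mismatched snmp lines for {proto}")
        tables[proto] = {
            name: int(value) for name, value in zip(header[1:], values[1:])
        }
    return tables


def net_metrics(text: str) -> Dict[str, int]:
    """Pick out the net entity's metrics from the parsed snmp tables."""
    tables = parse_snmp(text)
    return {
        metric: tables[proto][field]
        for metric, (proto, field) in NET_METRIC_FIELDS.items()
    }
```

On a Linux host, `net_metrics(open("/proc/net/snmp").read())` returns the raw values. The kernel reports fields such as `InSegs` and `OutSegs` as cumulative counts since boot, so a collector that wants per-sampling-period values would subtract successive samples (an assumption about how such gauges might be derived).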