From 99bc7dd1e1afdcfd51bdfbf800d733c1042d3a83 Mon Sep 17 00:00:00 2001 From: luzhihao Date: Tue, 12 Dec 2023 22:19:04 +0800 Subject: [PATCH] update docs --- gopher_tech.md | 1081 ++++++++++++++++-------------------------------- 1 file changed, 352 insertions(+), 729 deletions(-) diff --git a/gopher_tech.md b/gopher_tech.md index 36088b8..0fc775f 100644 --- a/gopher_tech.md +++ b/gopher_tech.md @@ -1,6 +1,6 @@ -# 系统性能 +基础设施 -## 主机概要 +## 主机 实体名:host @@ -14,7 +14,7 @@ | ip_addr | system_os | label | | 所有的IP地址 | | value | system_os | gauge | | 一个固定值作为metric,无实际意义 | -## CPU性能 +### CPU 实体名:cpu @@ -35,87 +35,73 @@ | rps_count | system_cpu | gauge | | CPU收到的RPS次数 | | total_used_per | system_cpu_util | gauge | % | CPU总利用率 | -## 内存性能 +### 内存 + +#### 系统内存 实体名:mem -| metrics_name | table_name | metrics_type | unit | metrics description | -| ------------- | -------------- | ------------ | ---- | ------------------------------------------------------ | -| mem | system_meminfo | key | | /proc/meminfo | -| mem_total | system_meminfo | gauge | KB | 系统总的可用物理内存 | -| mem_free | system_meminfo | gauge | KB | 系统还可用的物理内存 | -| mem_available | system_meminfo | gauge | KB | 用户还可用内存 | -| mem_util | system_meminfo | gauge | % | 系统内存使用率 | -| mem_buffers | system_meminfo | gauge | KB | 被 buffer使用的物理内存 | -| mem_cache | system_meminfo | gauge | KB | 被 cache使用的物理内存 | -| mem_active | system_meminfo | gauge | KB | 经常使用的cache页面大小 | -| mem_inactive | system_meminfo | gauge | KB | 非活跃内存大小,可回收 | -| swap_total | system_meminfo | gauge | KB | 交换区总量 | -| swap_free | system_meminfo | gauge | KB | 空闲交换区总量 | -| swap_util | system_meminfo | gauge | % | 交换区的使用率 | -| dentry | system_dentry | gauge | | dentry已占用的数量(注意dentry数量过多会引起系统卡顿) | -| unused_dentry | system_dentry | gauge | | dentry未使用的数量 | - -## 网络性能 - -### 协议栈统计 +| metrics_name | metrics_type | unit | metrics description | +| ------------- | ------------ | ---- | ------------------------------------------------------ | +| available_kB | gauge | KB | 
系统可用内存 | +| util | gauge | % | 系统内存使用率 | +| cache_kB | gauge | KB | 系统可用cache大小 | +| active_kB | gauge | KB | 系统活跃cache大小 | +| inactive_kB | gauge | KB | 非活跃cache大小 | +| swap_util | gauge | % | 交换区的使用率 | +| dentry | gauge | | dentry已占用的数量(注意dentry数量过多会引起系统卡顿) | +| unused_dentry | gauge | | dentry未使用的数量 | -实体名:net +#### 内核内存 -| metrics_name | table_name | metrics_type | unit | metrics description | -| ----------------- | ---------- | ------------ | ---- | ------------------- | -| origin | | key | | /proc/dev/snmp | -| tcp_curr_estab | system_tcp | gauge | | 当前的TCP连接数 | -| tcp_in_segs | system_tcp | gauge | segs | TCP接收的分片数 | -| tcp_out_segs | system_tcp | gauge | segs | TCP发送的分片数 | -| tcp_retrans_segs | system_tcp | gauge | segs | TCP重传的分片数 | -| tcp_in_errs | system_tcp | gauge | | TCP入包错误包数 | -| udp_indata_grams | system_udp | gauge | segs | UDP接收包量 | -| udp_outdata_grams | system_udp | gauge | segs | UDP发送包量 | +实体名:mem -### 网卡统计 +| metrics_name | metrics_type | unit | metrics description | 支持 | +| ------------ | ------------ | ---- | ------------------------------------------------------------ | ----- | +| kern_kB | gauge | KB | Linux内核所占内存 | TO BE | +| slab_kB | gauge | KB | Linux内核态小内存分配器所分配的内存(总计、可回收、不可回收) | TO BE | +| page_kB | gauge | KB | 页表内存 | TO BE | +| vmalloc_kB | gauge | KB | Linux内核通过Vmalloc分配的内存 | TO BE | +| stack_kB | gauge | KB | 进程的内核堆栈总和 | TO BE | +| allocPage_kB | gauge | KB | Linux内核调用AllocPage申请的内存 | TO BE | -实体名:nic +#### 应用内存 -| metrics_name | table_name | metrics_type | unit | metrics description | -| ------------------ | ---------- | ------------ | -------- | ---------------------- | -| dev_name | nic | key | | 网卡名称 | -| rx_bytes | nic | gauge | bytes | 网卡接收字节数 | -| rx_packets | nic | gauge | | 网卡接收的总数据包数 | -| rx_errs | nic | gauge | | 网卡接收错误的数据包数 | -| rx_dropped | nic | gauge | | 网卡接收丢弃的数据包数 | -| tx_bytes | nic | gauge | bytes | 网卡发送字节数 | -| tx_packets | nic | gauge | | 网卡发送的总数据包数 | -| tx_errs | nic | gauge | | 网卡发送错误的数据包数 | -| tx_dropped 
| nic | gauge | | 网卡发送丢弃的数据包数 | -| rxspeed_KB | nic | gauge | Kbytes/s | 网卡上行速率 | -| txspeed_KB | nic | gauge | Kbytes/s | 网卡下行速率 | -| tc_sent_drop | nic | gauge | | TC发送丢包 | -| tc_sent_overlimits | nic | gauge | | TC发送队列溢出 | -| tc_backlog | nic | gauge | | TC backlog队列包数量 | -| tc_ecn_mark | nic | gauge | | TC 拥塞标记 | +实体名:mem + +| metrics_name | metrics_type | unit | metrics description | 支持 | +| ---------------- | ------------ | ---- | ------------------------------------------------------------ | ----- | +| active_file_kB | gauge | KB | 文件缓存(活动) | TO BE | +| inactive_file_kB | gauge | KB | 文件缓存(非活动) | TO BE | +| active_anon_kB | gauge | KB | 匿名内存(活动) | TO BE | +| inactive_anon_kB | gauge | KB | 匿名内存(非活动) | TO BE | +| mlock | gauge | KB | 系统调用锁定内存 | TO BE | +| big_page_kB | gauge | KB | 系统大页内存大小 | TO BE | +| shmem_kB | gauge | KB | 共享内存(tmpfs)。业务进程退出后,经常会忘记删除tmpfs文件,或者在打开状态直接删掉tmpfs文件,都会造成shmem泄露。 | TO BE | -## I/O性能 -### 磁盘统计 + +### 磁盘 + +#### 磁盘统计 实体名:disk -| metrics_name | table_name | metrics_type | unit | metrics description | -| ------------ | ------------- | ------------ | --------------------- | --------------------------------------- | -| disk_name | system_iostat | key | | blk所在的物理磁盘名称 | -| rspeed | system_iostat | gauge | read times/second | 读速率(IOPS) | -| rspeed_kB | system_iostat | gauge | read kbytes/second | 吞吐量 | -| r_await | system_iostat | gauge | ms | 读响应时间 | -| rareq | system_iostat | gauge | | 饱和度(rareq-sz 和 wareq-sz+响应时间) | -| wspeed | system_iostat | gauge | write times/second | 写速率(IOPS) | -| wspeed_kB | system_iostat | gauge | write kbytes/second | 吞吐量 | -| w_await | system_iostat | gauge | ms | 写响应时间 | -| wareq | system_iostat | gauge | | 饱和度(rareq-sz 和 wareq-sz+响应时间) | -| aqu | system_iostat | gauge | | 平均队列深度 | -| util | system_iostat | gauge | % | 磁盘使用率 | - -### Block统计 +| metrics_name | metrics_type | unit | metrics description | +| ------------ | ------------ | --------------------- | --------------------------------------- | +| 
disk_name | key | | blk所在的物理磁盘名称 | +| rspeed | gauge | read times/second | 读速率(IOPS) | +| wspeed | gauge | write times/second | 写速率(IOPS) | +| rspeed_kB | gauge | read kbytes/second | 吞吐量 | +| wspeed_kB | gauge | write kbytes/second | 吞吐量 | +| r_await | gauge | ms | 读响应时间 | +| w_await | gauge | ms | 写响应时间 | +| rareq | gauge | | 饱和度(rareq-sz 和 wareq-sz+响应时间) | +| wareq | gauge | | 饱和度(rareq-sz 和 wareq-sz+响应时间) | +| aqu | gauge | | 平均队列深度 | +| util | gauge | % | 磁盘使用率 | + +#### Block统计 实体名:block @@ -145,395 +131,304 @@ | read_bytes | io_count(0x04) | Gauge | bytes | I/O操作读字节数 | | | write_bytes | io_count(0x04) | Gauge | bytes | I/O操作写字节数 | | + + +### 网络 + +#### 协议栈统计 + +实体名:net + +| metrics_name | table_name | metrics_type | unit | metrics description | +| ----------------- | ---------- | ------------ | ---- | ------------------- | +| origin | | key | | /proc/net/snmp | +| tcp_curr_estab | system_tcp | gauge | | 当前的TCP连接数 | +| tcp_in_segs | system_tcp | gauge | segs | TCP接收的分片数 | +| tcp_out_segs | system_tcp | gauge | segs | TCP发送的分片数 | +| tcp_retrans_segs | system_tcp | gauge | segs | TCP重传的分片数 | +| tcp_in_errs | system_tcp | gauge | | TCP入包错误包数 | +| udp_indata_grams | system_udp | gauge | segs | UDP接收包量 | +| udp_outdata_grams | system_udp | gauge | segs | UDP发送包量 | + +#### 网卡统计 + +实体名:nic + +| metrics_name | table_name | metrics_type | unit | metrics description | +| ------------------ | ---------- | ------------ | -------- | ---------------------- | +| dev_name | nic | key | | 网卡名称 | +| rx_bytes | nic | gauge | bytes | 网卡接收字节数 | +| rx_packets | nic | gauge | | 网卡接收的总数据包数 | +| rx_errs | nic | gauge | | 网卡接收错误的数据包数 | +| rx_dropped | nic | gauge | | 网卡接收丢弃的数据包数 | +| tx_bytes | nic | gauge | bytes | 网卡发送字节数 | +| tx_packets | nic | gauge | | 网卡发送的总数据包数 | +| tx_errs | nic | gauge | | 网卡发送错误的数据包数 | +| tx_dropped | nic | gauge | | 网卡发送丢弃的数据包数 | +| rxspeed_KB | nic | gauge | Kbytes/s | 网卡上行速率 | +| txspeed_KB | nic | gauge | Kbytes/s | 网卡下行速率 | +| tc_sent_drop | nic | gauge 
| | TC发送丢包 | +| tc_sent_overlimits | nic | gauge | | TC发送队列溢出 | +| tc_backlog | nic | gauge | | TC backlog队列包数量 | +| tc_ecn_mark | nic | gauge | | TC 拥塞标记 | + ## 容器性能 实体名:container -| metrics_name | table_name | metrics_type | unit | metrics description | -| -------------------------------------- | ----------------- | ------------ | ------- | ------------------------------------------------------------ | -| container_id | container | key | | 容器ID(简写) | -| name | container | label | | 容器名称 | -| cpucg_inode | container | label | | cpu,cpuacct cgroup ID(容器实例内cgroup目录对应的inode id) | -| memcg_inode | container | label | | memory cgroup ID(容器实例内cgroup目录对应的inode id) | -| pidcg_inode | container | label | | pids cgroup ID(容器实例内cgroup目录对应的inode id) | -| mnt_ns_id | container | label | | mount namespace | -| net_ns_id | container | label | | net namespace | -| proc_id | container | label | | 容器主进程ID | -| blkio_device_usage_total | container_blkio | Gauge | bytes | Blkio device bytes usage, unit bytes | -| cpu_load_average_10s | container_cpu | Gauge | | Value of container cpu load average over the last 10 seconds | -| cpu_system_seconds_total | container_cpu | Gauge | seconds | Cumulative system cpu time consumed, unit second | -| cpu_usage_seconds_total | container_cpu | Gauge | seconds | Cumulative cpu time consumed, unit second | -| cpu_user_seconds_total | container_cpu | Gauge | seconds | Cumulative user cpu time consumed, unit second | -| fs_inodes_free | container_fs | Gauge | | Number of available Inodes | -| fs_inodes_total | container_fs | Gauge | | Total number of Inodes | -| fs_io_current | container_fs | Gauge | | Number of I/Os currently in progress | -| fs_io_time_seconds_total | container_fs | Gauge | seconds | Cumulative count of seconds spent doing I/Os, unit second | -| fs_io_time_weighted_seconds_total | container_fs | Gauge | seconds | Cumulative weighted I/O time, unit second | -| fs_limit_bytes | container_fs | Gauge | bytes | Number of bytes that can 
be consumed by the container on this filesystem, unit bytes | -| fs_read_seconds_total | container_fs | Gauge | bytes | Cumulative count of bytes read, unit bytes | -| fs_reads_bytes_total | container_fs | Gauge | bytes | Cumulative count of bytes read | -| fs_reads_merged_total | container_fs | Gauge | | Cumulative count of reads merged | -| fs_reads_total | container_fs | Gauge | | Cumulative count of reads completed | -| fs_sector_reads_total | container_fs | Gauge | | Cumulative count of sector reads completed | -| fs_sector_writes_total | container_fs | Gauge | | Cumulative count of sector writes completed | -| fs_usage_bytes | container_fs | Gauge | bytes | Number of bytes that are consumed by the container on this filesystem | -| fs_write_seconds_total | container_fs | Gauge | seconds | Cumulative count of seconds spent writing | -| fs_writes_bytes_total | container_fs | Gauge | bytes | Cumulative count of bytes written | -| fs_writes_merged_total | container_fs | Gauge | | Cumulative count of writes merged | -| fs_writes_total | container_fs | Gauge | | Cumulative count of writes completed | -| memory_cache | container_memory | Gauge | bytes | Total page cache memory | -| memory_failcnt | container_memory | Gauge | | Number of memory usage hits limits | -| memory_failures_total | container_memory | Gauge | | Cumulative count of memory allocation failures | -| memory_mapped_file | container_memory | Gauge | bytes | Size of memory mapped files | -| memory_max_usage_bytes | container_memory | Gauge | bytes | Maximum memory usage recorded | -| memory_rss | container_memory | Gauge | bytes | Size of RSS | -| memory_swap | container_memory | Gauge | bytes | Container swap usage | -| memory_usage_bytes | container_memory | Gauge | bytes | Current memory usage, including all memory regardless of when it was accessed | -| memory_working_set_bytes | container_memory | Gauge | bytes | Current working set | -| network_receive_bytes_total | container_network | Gauge | 
bytes | Cumulative count of bytes received | -| network_receive_errors_total | container_network | Gauge | | Cumulative count of errors encountered while receiving | -| network_receive_packets_dropped_total | container_network | Gauge | | Cumulative count of packets dropped while receiving | -| network_receive_packets_total | container_network | Gauge | | Cumulative count of packets received | -| network_transmit_bytes_total | container_network | Gauge | bytes | Cumulative count of bytes transmitted | -| network_transmit_errors_total | container_network | Gauge | | Cumulative count of errors encountered while transmitting | -| network_transmit_packets_dropped_total | container_network | Gauge | | Cumulative count of packets dropped while transmitting | -| network_transmit_packets_total | container_network | Gauge | | Cumulative count of packets transmitted | -| oom_events_total | container_oom | Gauge | | Count of out of memory events observed for the container | -| spec_cpu_period | container_spec | Gauge | | CPU period of the container | -| spec_cpu_shares | container_spec | Gauge | | CPU share of the container | -| spec_memory_limit_bytes | container_spec | Gauge | bytes | Memory limit for the container | -| spec_memory_reservation_limit_bytes | container_spec | Gauge | bytes | Memory reservation limit for the container | -| spec_memory_swap_limit_bytes | container_spec | Gauge | bytes | Memory swap limit for the container | -| start_time_seconds | container_start | Gauge | seconds | Start time of the container since unix epoch | -| tasks_state | container_tasks | Gauge | | Number of tasks in given state (sleeping, running, stopped, uninterruptible, or ioawaiting) | -| | | | | | - -# 网络监控 - -## TCP流量监控 - -实体名:tcp_link - -| metrics_name | table_name | metrics_type | unit | metrics description | -| ------------ | -------------- | ------------ | ----- | ------------------------------------------------------------ | -| tgid | | key | | 进程ID | -| role | | key | | 
客户端/服务端 | -| client_ip | | key | | 客户端:本地IP;服务端:对端IP | -| server_ip | | key | | 客户端:对端IP;服务端:本地IP
备注:K8S场景支持Cluster IP转换成Backend IP | -| server_port | | key | | 客户端:对端端口;服务端:本地端口
备注:K8S场景支持Cluster Port转换成Backend Port | -| protocol | | key | | 协议族(IPv4、IPv6) | -| rx_bytes | tcp_tx_rx(0x8) | Gauge | bytes | rx bytes | -| tx_bytes | tcp_tx_rx(0x8) | Gauge | bytes | tx bytes | -| segs_in | tcp_tx_rx(0x8) | Gauge | segs | total number of segments received | -| segs_out | tcp_tx_rx(0x8) | Gauge | segs | total number of segments sent | - -## DNS访问监控 - -实体名:dns - -| metrics_name | table_name | metrics_type | unit | metrics description | -| ------------ | ---------- | ------------ | ---- | ------------------- | -| tgid | dns | key | | 进程ID | -| domain | dns | key | | 进程访问的DNS域名 | -| delay_avg | dns | Gauge | ms | DNS访问平均时延 | -| max_delay | dns | Gauge | ms | DNS访问最大时延 | -| error_ratio | dns | Gauge | % | DNS访问错误率 | -| count | dns | Gauge | | DNS访问次数 | - -## TCP/IP监控 - -### TCP异常监控 - -实体名:tcp_link - -| metrics_name | table_name | metrics_type | unit | metrics description | -| ------------------- | ------------- | ------------ | ---- | ------------------------------------------------------------ | -| tgid | | key | | 进程ID | -| role | | key | | 客户端/服务端 | -| client_ip | | key | | 客户端:本地IP;服务端:对端IP | -| server_ip | | key | | 客户端:对端IP;服务端:本地IP
备注:K8S场景支持Cluster IP转换成Backend IP | -| server_port | | key | | 客户端:对端端口;服务端:本地端口
备注:K8S场景支持Cluster Port转换成Backend Port | -| protocol | | key | | 协议族(IPv4、IPv6) | -| retran_packets | tcp_abn(0x01) | Gauge | | total number of retrans | -| retrans_ratio | tcp_abn(0x01) | Gauge | | retran ratio | -| backlog_drops | tcp_abn(0x01) | Gauge | | drops caused by backlog queue full | -| sk_drops | tcp_abn(0x01) | Counter | | Number of lost packets in the TCP protocol stack | -| lost_out | tcp_abn(0x01) | Gauge | segs | Number of lost segments estimated by TCP congestion | -| sacked_out | tcp_abn(0x01) | Gauge | segs | Number of out-of-order TCP packets (SACK) or number of repeated TCP ACKs (NO SACK) | -| filter_drops | tcp_abn(0x01) | Gauge | | drops caused by socket filter | -| tmout_count | tcp_abn(0x01) | Gauge | | counter of tcp link timeout | -| snd_buf_limit_count | tcp_abn(0x01) | Gauge | | counter of limits when allocate wmem | -| rmem_scheduls | tcp_abn(0x01) | Gauge | | rmem is not enough | -| tcp_oom | tcp_abn(0x01) | Gauge | | tcp out of memory | -| send_rsts | tcp_abn(0x01) | Gauge | | send_rsts | -| receive_rsts | tcp_abn(0x01) | Gauge | | receive_rsts | - -### Socket监控 - -实体名:endpoint_tcp - -| metrics_name | table_name | metrics_type | unit | metrics description | -| ------------------- | ------------ | ------------ | ---- | ------------------------------------------------------------ | -| tgid | endpoint_tcp | key | | 进程ID | -| role | endpoint_tcp | key | | 客户端(0)/服务端(1) | -| client_ip | endpoint_tcp | key | | 客户端:本地IP;服务端:对端IP | -| server_ip | endpoint_tcp | key | | 客户端:对端IP;服务端:本地IP
备注:K8S场景支持Cluster IP转换成Backend IP | -| server_port | endpoint_tcp | key | | 客户端:对端端口;服务端:本地端口
备注:K8S场景支持Cluster Port转换成Backend Port | -| protocol | endpoint_tcp | key | | 协议族(IPv4、IPv6) | -| listendrop | endpoint_tcp | Gauge | | TCP accept丢弃次数 | -| accept_overflow | endpoint_tcp | Gauge | | TCP accept队列溢出次数 | -| syn_overflow | endpoint_tcp | Gauge | | TCP syn队列溢出次数 | -| passive_open | endpoint_tcp | Gauge | | TCP被动发起的建链次数 | -| passive_open_failed | endpoint_tcp | Gauge | | TCP被动发起的建链失败次数 | -| retran_synacks | endpoint_tcp | Gauge | | TCP synack重传报文数 | -| lost_synacks | endpoint_tcp | Gauge | | TCP synack报文丢失导致的建链失败次数 | -| req_drops | endpoint_tcp | Gauge | | TCP request丢弃次数(因为服务端listen关闭) | -| active_open | endpoint_tcp | Gauge | | TCP主动发起的建链次数 | -| active_open_failed | endpoint_tcp | Gauge | | TCP主动发起的建链失败次数 | - - - -实体名:endpoint_udp - -| metrics_name | table_name | metrics_type | unit | metrics description | -| ------------- | ------------ | ------------ | ----- | -------------------- | -| tgid | endpoint_udp | key | | 进程ID | -| remote_ip | endpoint_udp | key | | udp 对端IP | -| local_ip | endpoint_udp | key | | udp 本地IP | -| protocol | endpoint_udp | key | | 协议族(IPv4、IPv6) | -| udp_rcv_drops | endpoint_udp | Gauge | bytes | UDP接收失败字节数 | -| udp_sends | endpoint_udp | Gauge | bytes | UDP发送字节数 | -| udp_rcvs | endpoint_udp | Gauge | bytes | UDP接收字节数 | - -# 应用(微服务)访问性能 - -实体名:l7 - -| metrics_name | table_name | metrics_type | unit | metrics description | Support | -| --------------- | ---------- | ------------ | ---- | ------------------------------------------------------------ | -------------------------- | -| tgid | | key | | Process ID of l7 session. | openSSL 1.1.1, Go SSL,JSSE | -| client_ip | | key | | Client IP address of l7 session. | | -| server_ip | | key | | Server IP address of l7 session.
备注:K8S场景支持Cluster IP转换成Backend IP | | -| server_port | | key | | Server Port of l7 session.
备注:K8S场景支持Cluster Port转换成Backend Port | | -| l4_role | | key | | Role of l4 protocol(TCP Client/Server or UDP) | | -| l7_role | | key | | Role of l7 protocol(Client or Server) | | -| protocol | | key | | Name of l7 protocol(http/http2/mysql...) | | -| ssl | | label | | Indicates whether an SSL-encrypted l7 session is used. | | -| bytes_sent | l7_link | gauge | | Number of bytes sent by a l7 session. | | -| bytes_recv | l7_link | gauge | | Number of bytes recv by a l7 session. | | -| segs_sent | l7_link | gauge | | Number of segs sent by a l7 session. | | -| segs_recv | l7_link | gauge | | Number of segs recv by a l7 session. | | -| throughput_req | l7_rpc | gauge | qps | Request throughput of l7 session. | | -| throughput_resp | l7_rpc | gauge | qps | Response throughput of l7 session. | | -| req_count | l7_rpc | gauge | | Request num of l7 session. | | -| resp_count | l7_rpc | gauge | | Response num of l7 session. | | -| latency_avg | l7_rpc | gauge | ns | L7 session averaged latency. | | -| latency | l7_rpc | histogram | ns | L7 session histogram latency. | | -| latency_sum | l7_rpc | gauge | ns | L7 session sum latency. | | -| err_ratio | l7_rpc | gauge | % | L7 session error rate. | | -| err_count | l7_rpc | gauge | | L7 session error count. | | - -# 应用性能监控 - -## TCP性能 - -实体名:tcp_link - -| metrics_name | table_name | metrics_type | unit | metrics description | -| ------------------ | ----------------- | ------------ | ----- | ------------------------------------------------------------ | -| tgid | | key | | 进程ID | -| role | | key | | 客户端/服务端 | -| client_ip | | key | | 客户端:本地IP;服务端:对端IP | -| server_ip | | key | | 客户端:对端IP;服务端:本地IP | -| server_port | | key | | 客户端:对端端口;服务端:本地端口 | -| protocol | | key | | 协议族(IPv4、IPv6) | -| rto | tcp_rate(0x20) | histogram | | Retransmission timeOut(us) | -| ato | tcp_rate(0x20) | histogram | | Estimated value of delayed ACK(us) | -| srtt | tcp_rtt(0x4) | histogram | us | Smoothed Round Trip Time(us). 
| -| snd_cwnd | tcp_windows(0x2) | histogram | | Congestion Control Window Size. | -| reordering | tcp_windows(0x2) | histogram | | Segments to be reordered. | -| rcv_rtt | tcp_rtt(0x4) | histogram | us | Receive end RTT (unidirectional measurement). | -| notsent_bytes | tcp_windows(0x2) | histogram | bytes | Number of bytes not sent currently. | -| notack_bytes | tcp_windows(0x2) | histogram | bytes | Number of bytes not ack currently. | -| snd_wnd | tcp_windows(0x2) | histogram | | Size of TCP send window. | -| rcv_wnd | tcp_windows(0x2) | histogram | | Size of TCP receive window. | -| zero_win_tx_ratio | tcp_windows(0x2) | Gauge | | Ratio of the number of times of sending window 0 to the number of sent bytes | -| zero_win_rx_ratio | tcp_windows(0x2) | Gauge | | Ratio of the number of receive window 0 windows to the number of received bytes | -| zero_rcv_wnd_count | tcp_windows(0x2) | Gauge | | The number of receive window 0 windows. | -| zero_snd_wnd_count | tcp_windows(0x2) | Gauge | | The number of sending window 0 windows. | -| avl_snd_wnd | tcp_windows(0x2) | histogram | | Size of TCP available send window. | -| syn_srtt | tcp_srtt | histogram | us | RTT of syn packet(us). | -| syn_srtt_max | tcp_srtt | Gauge | us | RTT of syn packet(us). | -| sk_rcvbuf | tcp_sockbuf(0x10) | histogram | bytes | Byte length of the RX buffer. | -| sk_sndbuf | tcp_sockbuf(0x10) | histogram | bytes | Byte length of the TX buffer. 
| +| metrics_name | metrics_type | unit | metrics description | +| -------------------------------------- | ------------ | ------- | ----------------------------------------------- | +| cpu_usage_seconds_total | Gauge | seconds | 容器一秒时间内的整体CPU负载,包括所有CPU Core | +| cpu_system_seconds_total | Gauge | seconds | 容器一秒时间内的系统态CPU负载,包括所有CPU Core | +| cpu_user_seconds_total | Gauge | seconds | 容器一秒时间内的用户态CPU负载,包括所有CPU Core | +| memory_mapped_file | Gauge | bytes | 容器映射文件占用大小 | +| memory_cache | Gauge | bytes | 容器Cache内存占用大小 | +| memory_rss | Gauge | bytes | 容器物理内存占用大小 | +| memory_working_set_bytes | Gauge | bytes | 容器实际占用内存大小(更具参考意义) | +| container_memory_usage_bytes | Gauge | bytes | 容器总共占用内存大小 | +| oom_events_total | Gauge | num | 容器内OOM次数 | +| network_receive_bytes_total | Gauge | bytes | 容器内网络接收统计 | +| network_transmit_bytes_total | Gauge | bytes | 容器内网络发送统计 | +| network_receive_errors_total | Gauge | num | 容器内网络异常统计(接收错误) | +| network_receive_packets_dropped_total | Gauge | num | 容器内网络异常统计(接收丢弃) | +| network_transmit_errors_total | Gauge | num | 容器内网络异常统计(发送错误) | +| network_transmit_packets_dropped_total | Gauge | num | 容器内网络异常统计(发送丢弃) | +| fs_reads_bytes_total | Gauge | bytes | 容器I/O读写字节统计 (读) | +| fs_writes_bytes_total | Gauge | bytes | 容器I/O读写字节统计 (写) | +| container_file_descriptors | Gauge | num | 容器内文件句柄数量 | +| fs_read_seconds_total | Gauge | seconds | 容器I/O读写时间 | +| fs_write_seconds_total | Gauge | seconds | 容器I/O读写时间 | +| fs_inodes_free | Gauge | num | 容器内inode资源统计(空闲) | +| fs_inodes_total | Gauge | num | 容器内inode资源统计(总计) | +| cpu_cfs_throttled_seconds_total | Gauge | seconds | 容器限流 | + + + +## GPU/NPU + +待上线 + + + +# 应用 ## 应用性能 -### 基于流的进程性能 - -实体名:proc_flow_perf - -| metrics_name | table_name | metrics_type | unit | metrics description | Support | -| ------------- | -------------- | ------------ | ---- | ----------------------------------------------------------- | ------------------------------------------------------------ | -| tgid | | key | | Process ID | | 
-| remote_ip | | key | | 对端IP地址 | | -| port | | key | | 客户端:对端Port;服务端:本地Port; | | -| role | | key | | 客户端/服务端 | | -| tx_delay | proc_flow_perf | histogram | us | Delay in the Tx direction of the application TCP link. | | -| rx_delay | proc_flow_perf | histogram | us | Delay in the Rx direction of the application TCP link. | 支持内核4.18(EulerOS版本)
>= 5.10
或者应用使用socket option SO_TIMESTAMPNS | -| tx_throughput | proc_flow_perf | gauge | bps | Throughput in the Tx direction of the application TCP link. | TO BE | -| rx_throughput | proc_flow_perf | gauge | bps | Throughput in the Rx direction of the application TCP link. | TO BE | - -### 进程性能 - -实体名:proc_perf - -| metrics_name | table_name | metrics_type | unit | metrics description | Support | -| ------------- | ---------- | ------------ | ---- | ----------------------------------- | ------------------------------------------------------------ | -| tgid | | key | | Process ID | | -| tx_delay | proc_perf | histogram | us | TCP delay in the Tx direction. | | -| rx_delay | proc_perf | histogram | us | TCP delay in the Rx direction. | 支持内核4.18(EulerOS版本)
>= 5.10
或者应用使用socket option SO_TIMESTAMPNS | -| tx_throughput | proc_perf | gauge | bps | TCP throughput in the Tx direction. | TO BE | -| rx_throughput | proc_perf | gauge | bps | TCP throughput in the Rx direction. | TO BE | - - - -## I/O性能 - -实体名:proc - -| metrics_name | table_name | metrics_type | unit | metrics description | -| --------------------- | ------------------ | ------------ | ---- | ------------------------------------------------------------ | -| tgid | | key | | 进程ID | -| ppid | system_proc | label | | 父进程ID | -| pgid | system_proc | label | | 进程组ID | -| comm | | label | | 执行程序名称 | -| cmdline | system_proc | label | | 执行程序命令(包括配置) | -| fd_count | system_proc | Gauge | | 进程文件句柄 | -| fd_free_per | system_proc | Gauge | | 进程剩余FD资源占比% | -| rchar_bytes | system_proc | Gauge | | 进程系统调用至FS的读字节数 | -| wchar_bytes | system_proc | Gauge | | 进程系统调用至FS的写字节数 | -| syscr_count | system_proc | Gauge | | 进程read()/pread()执行次数 | -| syscw_count | system_proc | Gauge | | 进程write()/pwrite()执行次数 | -| read_bytes | system_proc | Gauge | | 进程实际从磁盘读取的字节数 | -| write_bytes | system_proc | Gauge | | 进程实际从磁盘写入的字节数 (page cache情况下,该字段进表示设置dirty page的size) | -| cancelled_write_bytes | system_proc | Gauge | | 参考proc_write_bytes,因为存在page cache 如果write操作结束后,又发生文件被删除事件,会导致diry page并未写入磁盘,所以存在取消写的字节数统计 | -| ns_ext4_read | proc_ext4(0x20) | Gauge | ns | ext4文件系统读操作时间,单位ns | -| ns_ext4_write | proc_ext4(0x20) | Gauge | ns | ext4文件系统写操作时间,单位ns | -| ns_ext4_flush | proc_ext4(0x20) | Gauge | ns | ext4文件系统flush操作时间,单位ns | -| ns_ext4_open | proc_ext4(0x20) | Gauge | ns | ext4文件系统open操作时间,单位ns | -| ns_overlay_read | proc_overlay(0x40) | Gauge | ns | overlayfs文件系统读操作时间,单位ns | -| ns_overlay_write | proc_overlay(0x40) | Gauge | ns | overlayfs文件系统写操作时间,单位ns | -| ns_overlay_flush | proc_overlay(0x40) | Gauge | ns | overlayfs文件系统flush操作时间,单位ns | -| ns_overlay_open | proc_overlay(0x40) | Gauge | ns | overlayfs文件系统open操作时间,单位ns | -| ns_tmpfs_read | proc_tmpfs(0x80) | Gauge | ns | tmpfs文件系统读操作时间,单位ns | -| 
ns_tmpfs_write | proc_tmpfs(0x80) | Gauge | ns | tmpfs文件系统写操作时间,单位ns | -| ns_tmpfs_flush | proc_tmpfs(0x80) | Gauge | ns | tmpfs文件系统flush操作时间,单位ns | -| less_4k_io_read | proc_io(0x400) | Gauge | | Number of small I/O (less than 4 KB) read operations at the BIO layer. | -| less_4k_io_write | proc_io(0x400) | Gauge | | Number of small I/O (less than 4 KB) write operations at the BIO layer. | -| greater_4k_io_read | proc_io(0x400) | Gauge | | Number of big I/O (greater than 4 KB) read operations at the BIO layer. | -| greater_4k_io_write | proc_io(0x400) | Gauge | | Number of big I/O (greater than 4 KB) write operations at the BIO layer. | -| bio_latency | proc_io(0x400) | Gauge | ns | I/O operation delay at the BIO layer (unit: us). (备注:虚拟化场景针对qemu进程才有意义) | -| bio_err_count | proc_io(0x400) | Gauge | | Number of I/O operation failures at the BIO layer.(备注:虚拟化场景针对qemu进程才有意义) | -| hang_count | proc_io(0x400) | Gauge | | Number of process hang times. | -| iowait_us | proc_io(0x400) | Gauge | us | Process IO_wait time (unit: us). 
| - -## 内存 - -实体名:proc - -| metrics_name | table_name | metrics_type | unit | metrics description | -| --------------------- | ----------- | ------------ | ---- | --------------------------------------- | -| tgid | | key | | 进程ID | -| ppid | system_proc | label | | 父进程ID | -| pgid | system_proc | label | | 进程组ID | -| comm | | label | | 执行程序名称 | -| cmdline | system_proc | label | | 执行程序命令(包括配置) | -| shared_dirty_size | system_proc | Gauge | | 进程共享属性的dirty page size | -| shared_clean_size | system_proc | Gauge | | 进程共享属性的clean page size | -| private_dirty_size | system_proc | Gauge | | 进程私有属性的dirty page size | -| private_clean_size | system_proc | Gauge | | 进程私有属性的clean page size | -| referenced_size | system_proc | Gauge | | 进程当前已引用的page size | -| lazyfree_size | system_proc | Gauge | | 进程延迟释放内存的size | -| swap_data_size | system_proc | Gauge | | 进程swap区间数据size | -| swap_data_pss_size | system_proc | Gauge | | 进程物理内存swap区间数据size | -| minor pagefault_count | system_proc | Gauge | | 进程轻微pagefault次数(无需从磁盘拷贝) | -| major pagefault_count | system_proc | Gauge | | 进程严重pagefault次数(需从磁盘拷贝) | -| vm_size | system_proc | Gauge | | 进程当前虚拟地址空间大小 | -| pm_size | system_proc | Gauge | | 进程当前物理地址空间大小 | - -## 调度&系统调用 - -实体名:proc - -| metrics_name | table_name | metrics_type | unit | metrics description | -| -------------- | ------------------------ | ------------ | ---- | ----------------------------------- | -| tgid | | key | | 进程ID | -| ppid | system_proc | label | | 父进程ID | -| pgid | system_proc | label | | 进程组ID | -| comm | | label | | 执行程序名称 | -| cmdline | system_proc | label | | 执行程序命令(包括配置) | -| utime_jiffies | system_proc | Gauge | | 进程用户运行时间 | -| stime_jiffies | system_proc | Gauge | | 进程系统态运行时间 | -| ns_mount | proc_syscall_io(0x02) | Gauge | ns | 进程系统调用mount时长,单位ns | -| ns_umount | proc_syscall_io(0x02) | Gauge | ns | 进程系统调用umount时长,单位ns | -| ns_read | proc_syscall_io(0x02) | Gauge | ns | 进程系统调用read时长,单位ns | -| ns_write | proc_syscall_io(0x02) | Gauge | ns | 
进程系统调用write时长,单位ns | -| ns_fsync | proc_syscall_io(0x02) | Gauge | ns | 进程系统调用fsync时长,单位ns | -| ns_sendmsg | proc_syscall_net(0x04) | Gauge | ns | 进程系统调用sendmsg时长,单位ns | -| ns_recvmsg | proc_syscall_net(0x04) | Gauge | ns | 进程系统调用recvmsg时长,单位ns | -| ns_sched_yield | proc_syscall_sched(0x08) | Gauge | ns | 进程系统调用sched_yield时长,单位ns | -| ns_futex | proc_syscall_sched(0x08) | Gauge | ns | 进程系统调用futex时长,单位ns | -| ns_epoll_wait | proc_syscall_sched(0x08) | Gauge | ns | 进程系统调用epoll_wait时长,单位ns | -| ns_epoll_pwait | proc_syscall_sched(0x08) | Gauge | ns | 进程系统调用epoll_pwait时长,单位ns | -| ns_fork | proc_syscall_fork(0x10) | Gauge | ns | 进程系统调用fork时长,单位ns | -| ns_vfork | proc_syscall_fork(0x10) | Gauge | ns | 进程系统调用vfork时长,单位ns | -| ns_clone | proc_syscall_fork(0x10) | Gauge | ns | 进程系统调用clone时长,单位ns | -| syscall_failed | proc_syscall (0x01) | Gauge | | 进程系统调用失败次数 | - -## JVM监控 - -实体名:jvm - -| metrics_name | table_name | metrics_type | unit | metrics description | -| -------------------------- | ------------ | ------------ | ----- | ----------------------------------------- | -| tgid | | key | | Java 虚拟机的进程ID | -| runtime | jvm_info | label | | JVM 运行时信息 | -| vendor | jvm_info | label | | JVM 创建者/维护者 | -| version | jvm_info | label | | JVM 版本 | -| info | jvm_info | gauge | | 固定值1 | -| proc_start_time_secs | jvm_proc | gauge | s | 进程起始时间 | -| proc_cpu_secs_total | jvm_proc | counter | s | 进程已使用的CPU时间 | -| class_current_loaded | jvm_class | gauge | | JVM当前已加载类的数量 | -| class_loaded_total | jvm_class | counter | | JVM自执行以来加载的类的总数量 | -| threads_current | jvm_thread | gauge | | JVM当前线程数 | -| threads_daemon | jvm_thread | gauge | | JVM的守护线程数 | -| threads_peak | jvm_thread | gauge | | JVM的峰值线程数 | -| threads_started_total | jvm_thread | counter | | JVM的已启动线程数 | -| threads_deadlocked | jvm_thread | gauge | | JVM的死锁的线程数 | -| area | jvm_mem | label | | JVM内存类型:heap/noheap | -| mem_bytes_used | jvm_mem | gauge | bytes | 给定JVM内存区域的已使用字节数 | -| mem_bytes_commit | jvm_mem | gauge | bytes | 
Committed bytes of the given JVM memory area |
-| mem_bytes_max | jvm_mem | gauge | bytes | Maximum bytes of the given JVM memory area |
-| mem_bytes_init | jvm_mem | gauge | bytes | Initial bytes of the given JVM memory area |
-| pool | jvm_mem_pool | label | | Memory pool type |
-| mem_pool_bytes_used | jvm_mem_pool | gauge | bytes | Used bytes of the given JVM memory pool |
-| mem_pool_bytes_commit | jvm_mem_pool | gauge | bytes | Committed bytes of the given JVM memory pool |
-| mem_pool_bytes_max | jvm_mem_pool | gauge | bytes | Maximum bytes of the given JVM memory pool |
-| mem_pool_coll_used_bytes | jvm_mem_pool | gauge | bytes | Used bytes of the given JVM memory pool at the last garbage collection |
-| mem_pool_coll_commit_bytes | jvm_mem_pool | gauge | bytes | Size of the memory pool at the last GC |
-| mem_pool_coll_max_bytes | jvm_mem_pool | gauge | bytes | Maximum bytes of the memory pool at the last GC |
-| pool | jvm_buf_pool | label | | Buffer pool type |
-| buffer_pool_used_bytes | jvm_buf_pool | gauge | bytes | Used bytes of the given JVM buffer pool |
-| buffer_pool_used_buffers | jvm_buf_pool | gauge | | Number of used buffers in the given JVM buffer pool |
-| buffer_pool_capacity_bytes | jvm_buf_pool | gauge | bytes | Capacity in bytes of the given JVM buffer pool |
-| gc | jvm_gc | label | | Garbage collector name |
-| gc_coll_secs_count | jvm_gc | gauge | | Total number of GCs performed by the given garbage collector |
-| gc_coll_secs_sum | jvm_gc | gauge | s | Total time spent in the given garbage collector |
-
-# Kafka monitoring
-
-## Topic flow monitoring
+Application performance refers to the black-box performance metrics (RED) of an application, observed non-intrusively through eBPF; the observations are then aggregated per application.
+
+- Observed metrics carry the following labels (in addition to the common application labels):
+
+| label name | meaning |
+| --- | --- |
+| tgid | Process ID of l7 session. |
+| client_ip | Client IP address of l7 session. |
+| server_ip | Server IP address of l7 session. |
+| server_port | Server Port of l7 session. |
+| l4_role | Role of l4 protocol (TCP Client/Server or UDP) |
+| l7_role | Role of l7 protocol (Client or Server) |
+| protocol | Name of l7 protocol (http/http2/mysql...) |
+| ssl | Indicates whether an SSL-encrypted l7 session is used.
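|
+
+- Application performance supports the following protocols: HTTP 1.X, PGSQL, #Redis, #DNS, #HTTP2.0, #Dubbo, #MySQL, #Kafka;
+- Encrypted flows can also be observed, covering: openSSL, JSSE, #GoSSL
+
+- The application performance table presents three kinds of observed objects: **POD, container, process**; which one is presented depends on how the application is deployed. For example, in a K8S scenario, application performance data is presented per POD.
+
+Note: protocols marked with # are yet to be released.
+
+| metrics_name | metrics_type | unit | metrics description |
+| --- | --- | --- | --- |
+| req_count | gauge | num | Number of application client requests (used to compute the request rate, qps) |
+| resp_count | gauge | num | Number of application server responses (used to compute the response rate, qps) |
+| err_count | gauge | num | Number of application server errors (used to compute the error rate: err_count/resp_count) |
+| latency_sum | gauge | us | Sum of application request latency (used to compute the average request latency: latency_sum/req_count, and the average response latency: latency_sum/resp_count) |
+| srtt | gauge | us | Process TCP latency (tcp_link entity) |
+| iowait_ns | gauge | us | Process I/O blocking latency (proc entity) |
+| cpu | gauge | % | Process CPU utilization (proc entity) |
+
+## Traffic relationships
+
+A traffic relationship is the access relationship between applications; it is computed from the application performance observation data.
+
+| metrics_name | metrics_type | unit | metrics description |
+| --- | --- | --- | --- |
+| Client application name (e.g. POD NAME) | label | | |
+| Server application name (e.g. POD NAME) | label | | |
+| req_count | gauge | num | Number of application client requests (used to compute the request rate, qps) |
+| resp_count | gauge | num | Number of application server responses (used to compute the response rate, qps) |
+| err_count | gauge | num | Number of application server errors (used to compute the error rate: err_count/resp_count) |
+| latency_sum | gauge | us | Sum of application request latency (used to compute the average request latency: latency_sum/req_count, and the average response latency: latency_sum/resp_count) |
+| srtt | gauge | us | Process TCP latency (tcp_link entity) |
+
+## Path topology
+
+## Access logs
+
+To be released
+
+## Detailed application statistics
+
+### Application deployment relationships
+
+Application deployment information is carried in the labels of the application performance metrics and includes:
+
+**Node labels**: System ID (unique within the cluster), management IP.
+
+**Process labels**: process ID, process name, cmdline.
+
+**Network labels**: client/server ip, server port, role labels.
+
+**Container labels**: container ID, container name, container image.
+
+**POD labels**: POD ID, POD IP, Pod Name, Pod Namespace labels.
+
+### Detailed application metrics
+
+#### Application performance
+
+| metrics_name | metrics_type | unit | metrics description |
+| --- | --- | --- | --- |
+| req_count | gauge | num | Number of application client requests (used to compute the request rate, qps) |
+| resp_count | gauge | num |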
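Number of application server responses (used to compute the response rate, qps) |
+| err_count | gauge | num | Number of application server errors (used to compute the error rate: err_count/resp_count) |
+| latency_sum | gauge | us | Sum of application request latency (used to compute the average request latency: latency_sum/req_count, and the average response latency: latency_sum/resp_count) |
+| srtt | gauge | us | Process TCP latency (tcp_link entity) |
+| iowait_ns | gauge | us | Process I/O blocking latency (proc entity) |
+| cpu | gauge | % | Process CPU utilization (proc entity) |
+
+#### Application I/O
+
+| metrics_name | metrics_type | unit | metrics description | Support |
+| --- | --- | --- | --- | --- |
+| io_delay | Gauge | us | Application I/O latency | TO BE |
+| iowait_us | Gauge | us | Wait latency incurred by application I/O access | |
+| bio_latency | Gauge | us | BIO-layer latency incurred by application I/O access | |
+| bio_err_count | Gauge | num | Number of BIO errors incurred by application I/O access | |
+| rchar_bytes | Gauge | bytes | Bytes read by the application (used to compute the application I/O read rate) | |
+| wchar_bytes | Gauge | bytes | Bytes written by the application (used to compute the application I/O write rate) | |
+| fd_count | Gauge | num | Number of file handles held by the application | |
+| greater_4k_io_read | Gauge | num | Number of large I/O (greater than 4K) read operations in the application | |
+| greater_4k_io_write | Gauge | num | Number of large I/O (greater than 4K) write operations in the application | |
+| less_4k_io_read | Gauge | num | Number of small I/O (less than 4K) read operations in the application | |
+| less_4k_io_write | Gauge | num | Number of small I/O (less than 4K) write operations in the application | |
+| ns_ext4_read | Gauge | us | File system read latency of the application (ext4, a commonly used file system) | |
+| ns_overlay_read | Gauge | us | File system read latency of the application (overlay, commonly used in container scenarios) | |
+| ns_tmpfs_read | Gauge | us | File system read latency of the application (tmpfs, commonly used for temporary files) | |
+| ns_ext4_write | Gauge | us | File system write latency of the application (ext4, a commonly used file system) | |
+| ns_overlay_write | Gauge | us | File system write latency of the application (overlay, commonly used in container scenarios) | |
+| ns_tmpfs_write | Gauge | us | File system write latency of the application (tmpfs, commonly used for temporary files) | |
+| ns_ext4_flush | Gauge | us | File system flush latency of the application (ext4, a commonly used file system) | |
+| ns_overlay_flush | Gauge | us | File system flush latency of the application (overlay, commonly used in container scenarios) | |
+| ns_tmpfs_flush | Gauge | us | File system flush latency of the application (tmpfs, commonly used for temporary files) | |
+
+#### Application CPU
+
+| metrics_name | metrics_type | unit | metrics description | Support |
+| --- | --- | --- | --- | --- |
+| user_cpu_ratio | gauge | % | User-mode CPU utilization of the application (proc entity) | TO BE |
+|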
system_cpu_ratio | gauge | % | System-mode CPU utilization of the application (proc entity) | TO BE |
+| offcpu_ns | gauge | us | Time the application spends waiting to be scheduled onto a CPU (proc entity) | |
+
+#### Application memory
+
+| metrics_name | metrics_type | unit | metrics description |
+| --- | --- | --- | --- |
+| pm_size | Gauge | bytes | Physical memory of the application |
+| vm_size | Gauge | bytes | Virtual memory of the application |
+| minor_pagefault_count | Gauge | num | Number of minor page faults caused by the application |
+| major_pagefault_count | Gauge | num | Number of major page faults caused by the application |
+| swap_data_size | Gauge | bytes | Size of the application's swap area |
+| referenced_size | Gauge | bytes | Size of the pages referenced by the application |
+
+#### Application JVM
+
+| metrics_name | metrics_type | unit | metrics description |
+| --- | --- | --- | --- |
+| threads_current | gauge | num | Current number of JVM threads in the application |
+| threads_daemon | gauge | num | Number of daemon JVM threads in the application |
+| threads_peak | gauge | num | Peak number of JVM threads in the application |
+| threads_deadlocked | gauge | num | Number of deadlocked JVM threads in the application |
+| mem_bytes_used | gauge | bytes | Used JVM memory of the application |
+| mem_bytes_commit | gauge | bytes | Committed JVM memory of the application |
+| mem_bytes_max | gauge | bytes | Maximum JVM memory of the application |
+| mem_bytes_init | gauge | bytes | Initial JVM memory of the application |
+| mem_pool_bytes_used | gauge | bytes | Used JVM memory pool size of the application |
+| mem_pool_bytes_commit | gauge | bytes | Committed JVM memory pool size of the application |
+| mem_pool_bytes_max | gauge | bytes | Maximum JVM memory pool size of the application |
+| buffer_pool_used_bytes | gauge | bytes | Used JVM buffer memory of the application |
+| buffer_pool_capacity_bytes | gauge | bytes | JVM buffer memory capacity of the application |
+| gc_coll_secs_count | gauge | num | Number of GCs in the application |
+| gc_coll_secs_sum | gauge | seconds | Total time spent in GC in the application |
+
+### Application network
+
+#### TCP metrics
+
+| metrics_name | metrics_type | unit | metrics description |
+| --- | --- | --- | --- |
+| rx_bytes | Gauge | bytes | TCP bytes received by the application (used to compute the receive rate, bps) (tcp_link entity) |
+| tx_bytes | Gauge | bytes | TCP bytes sent by the application (used to compute the send rate, bps) (tcp_link entity) |
+| segs_in | Gauge | packets | Number of TCP packets received by the application (tcp_link entity) |
+| segs_out | Gauge | packets | Number of TCP packets sent by the application (tcp_link entity) |
+|
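retran_packets | Gauge | packets | Total number of TCP packets retransmitted by the application (used to compute the retransmission rate: retran_packets/segs_out) (tcp_link entity) |
+| active_open_failed | Gauge | num | Number of failed active TCP connection attempts in the application (endpoint_tcp entity) |
+| passive_open_failed | Gauge | num | Number of failed passive TCP connection attempts in the application (endpoint_tcp entity) |
+| srtt | histogram | us | P50/P90/P99 TCP transmission latency in the application (tcp_link entity) |
+| rto | histogram | us | P50/P90/P99 TCP retransmission timeout in the application (tcp_link entity) |
+| ato | histogram | us | P50/P90/P99 TCP delayed-ACK time in the application (tcp_link entity) |
+| rcv_rtt | histogram | us | P50/P90/P99 TCP receiver-side transmission latency in the application (tcp_link entity) |
+| client_estab_delay | histogram | us | P50/P90/P99 TCP client connection establishment latency in the application (tcp_link entity) |
+| server_estab_delay | histogram | us | P50/P90/P99 TCP server connection establishment latency in the application (tcp_link entity) |
+| reordering | histogram | num | P50/P90/P99 number of reordered TCP packets in the application (tcp_link entity) |
+| zero_win_tx_ratio | Gauge | % | TCP send zero-window ratio in the application (tcp_link entity) |
+| zero_win_rx_ratio | Gauge | % | TCP receive zero-window ratio in the application (tcp_link entity) |
+| snd_cwnd | histogram | size | P50/P90/P99 TCP congestion window size in the application (tcp_link entity) |
+| snd_wnd | histogram | size | P50/P90/P99 TCP send window size in the application (tcp_link entity) |
+| rcv_wnd | histogram | size | P50/P90/P99 TCP receive window size in the application (tcp_link entity) |
+| avl_snd_wnd | histogram | size | P50/P90/P99 TCP available send window size in the application (tcp_link entity) |
+| zero_rcv_wnd_count | Gauge | num | Number of TCP receive zero-window events in the application (tcp_link entity) |
+| zero_snd_wnd_count | Gauge | num | Number of TCP send zero-window events in the application (tcp_link entity) |
+| rst_sent | Gauge | packets | Number of RST segments sent by the application (endpoint_tcp entity) |
+| rst_recv | Gauge | packets | Number of RST segments received by the application (endpoint_tcp entity) |
+| sacked_out | Gauge | packets | Number of out-of-order TCP packets in the application (tcp_link entity) |
+| lost_out | Gauge | packets | Number of TCP packets lost to congestion in the application (tcp_link entity) |
+| sk_drops | counter | packets | Number of TCP packets dropped in the application (dropped in the IP protocol stack) (tcp_link entity) |
+| filter_drops | Gauge | packets | Number of TCP packets dropped in the application (dropped by a TCP filter, e.g. filtered by an eBPF rule) (tcp_link entity) |
+| backlog_drops | Gauge | num | Number of TCP receive queue overflows in the application (usually because the application processes data too slowly) (tcp_link entity) |
+| tcp_oom | Gauge | num | Number of TCP OOM events in the application (usually caused by too much buffered TCP data and slow application processing) (tcp_link entity) |
+| syn_sent | Gauge | packets | Number of SYN segments sent by the application (endpoint_tcp entity) |
+| retran_syn | Gauge | packets | Number of SYN segments retransmitted by the application (endpoint_tcp entity) |
+| synack_sent | Gauge | packets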
| Number of SYN-ACK segments sent by the application (endpoint_tcp entity) |
+| retran_synacks | Gauge | packets | Number of SYN-ACK segments retransmitted by the application (endpoint_tcp entity) |
+| req_drops | Gauge | num | Number of failed TCP server-side connection establishments in the application (a connection request arrived after listening was closed) (endpoint_tcp entity) |
+| accept_overflow | Gauge | num | Number of failed TCP server-side connection establishments in the application (the TCP accept queue overflowed) (endpoint_tcp entity) |
+| syn_overflow | Gauge | num | Number of failed TCP server-side connection establishments in the application (the TCP SYN queue overflowed) (endpoint_tcp entity) |
+
+#### UDP metrics
+
+| metrics_name | metrics_type | unit | metrics description |
+| --- | --- | --- | --- |
+| udp_sends | Gauge | bytes | UDP bytes sent by the application |
+| udp_rcvs | Gauge | bytes | UDP bytes received by the application |
+| udp_rcv_drops | Gauge | bytes | UDP bytes dropped on the receive side of the application |
+
+#### DNS metrics
+
+| metrics_name | metrics_type | unit | metrics description |
+| --- | --- | --- | --- |
+| domain | label | | DNS domain name accessed by the process |
+| delay_avg | Gauge | ms | Average DNS access latency |
+| max_delay | Gauge | ms | Maximum DNS access latency |
+| error_ratio | Gauge | % | DNS access error rate |
+| count | Gauge | | Number of DNS accesses |
+
+# Basic middleware
+
+
+
+## Kafka monitoring
+
+### Topic flow monitoring
 
 Entity name: kafka_topic_flow
 
@@ -546,7 +441,7 @@
 | server_ip | kafka_topic_flow | key | | NIC IP of the host running the kafka server | |
 | server_port | kafka_topic_flow | key | | Port the kafka server is bound to | |
 
-## Topic performance monitoring
+### Topic performance monitoring
 
 Entity name: kafka_topic_metrics
 
@@ -557,9 +452,9 @@
 | server_port | kafka_topic_metrics | key | | Port the kafka server is bound to | TO BE |
 | throughput | kafka_topic_metrics | histogram | | Topic throughput | TO BE |
 
-# Nginx/Haproxy monitoring
+## Nginx/Haproxy monitoring
 
-## Nginx load balancing monitoring
+### Nginx load balancing monitoring
 
 Entity name: nginx_link
 
@@ -573,7 +468,7 @@
 | is_l7 | nginx_link | label | | 1: layer-7 LB / 0: layer-4 LB | |
 | link_count | nginx_link | gauge | | Number of connections | |
 
-## Haproxy load balancing monitoring
+### Haproxy load balancing monitoring
 
 Entity name: haproxy_link
 
@@ -586,275 +481,3 @@
 | server_port | haproxy_link | key | | Real server port | |
 | protocol | haproxy_link | label | | Protocol type (TCP/HTTP) | |
 | link_count | haproxy_link | gauge | | Number of connections | |
-
-## TCP performance monitoring
-
-Entity name: tcp_link
-
-| metrics_name | table_name | metrics_type | unit | metrics description |
-| ------------------ | ----------------- | ------------ | -----
| ------------------------------------------------------------ | -| tgid | | key | | 进程ID | -| role | | key | | 客户端/服务端 | -| client_ip | | key | | 客户端:本地IP;服务端:对端IP | -| server_ip | | key | | 客户端:对端IP;服务端:本地IP | -| server_port | | key | | 客户端:对端端口;服务端:本地端口 | -| protocol | | key | | 协议族(IPv4、IPv6) | -| rto | tcp_rate(0x20) | histogram | | Retransmission timeOut(us) | -| ato | tcp_rate(0x20) | histogram | | Estimated value of delayed ACK(us) | -| srtt | tcp_rtt(0x4) | histogram | us | Smoothed Round Trip Time(us). | -| snd_cwnd | tcp_windows(0x2) | histogram | | Congestion Control Window Size. | -| reordering | tcp_windows(0x2) | histogram | | Segments to be reordered. | -| rcv_rtt | tcp_rtt(0x4) | histogram | us | Receive end RTT (unidirectional measurement). | -| notsent_bytes | tcp_windows(0x2) | histogram | bytes | Number of bytes not sent currently. | -| notack_bytes | tcp_windows(0x2) | histogram | bytes | Number of bytes not ack currently. | -| snd_wnd | tcp_windows(0x2) | histogram | | Size of TCP send window. | -| rcv_wnd | tcp_windows(0x2) | histogram | | Size of TCP receive window. | -| zero_win_tx_ratio | tcp_windows(0x2) | Gauge | | Ratio of the number of times of sending window 0 to the number of sent bytes | -| zero_win_rx_ratio | tcp_windows(0x2) | Gauge | | Ratio of the number of receive window 0 windows to the number of received bytes | -| zero_rcv_wnd_count | tcp_windows(0x2) | Gauge | | The number of receive window 0 windows. | -| zero_snd_wnd_count | tcp_windows(0x2) | Gauge | | The number of sending window 0 windows. | -| avl_snd_wnd | tcp_windows(0x2) | histogram | | Size of TCP available send window. | -| syn_srtt | tcp_srtt | histogram | us | RTT of syn packet(us). | -| syn_srtt_max | tcp_srtt | Gauge | us | RTT of syn packet(us). | -| sk_rcvbuf | tcp_sockbuf(0x10) | histogram | bytes | Byte length of the RX buffer. | -| sk_sndbuf | tcp_sockbuf(0x10) | histogram | bytes | Byte length of the TX buffer. 
| - -## TCP异常监控 - -实体名:tcp_link - -| metrics_name | table_name | metrics_type | unit | metrics description | -| ------------------- | ------------- | ------------ | ---- | ------------------------------------------------------------ | -| tgid | | key | | 进程ID | -| role | | key | | 客户端/服务端 | -| client_ip | | key | | 客户端:本地IP;服务端:对端IP | -| server_ip | | key | | 客户端:对端IP;服务端:本地IP | -| server_port | | key | | 客户端:对端端口;服务端:本地端口 | -| protocol | | key | | 协议族(IPv4、IPv6) | -| retran_packets | tcp_abn(0x01) | Gauge | | total number of retrans | -| retrans_ratio | tcp_abn(0x01) | Gauge | | retran ratio | -| backlog_drops | tcp_abn(0x01) | Gauge | | drops caused by backlog queue full | -| sk_drops | tcp_abn(0x01) | Counter | | Number of lost packets in the TCP protocol stack | -| lost_out | tcp_abn(0x01) | Gauge | segs | Number of lost segments estimated by TCP congestion | -| sacked_out | tcp_abn(0x01) | Gauge | segs | Number of out-of-order TCP packets (SACK) or number of repeated TCP ACKs (NO SACK) | -| filter_drops | tcp_abn(0x01) | Gauge | | drops caused by socket filter | -| tmout_count | tcp_abn(0x01) | Gauge | | counter of tcp link timeout | -| snd_buf_limit_count | tcp_abn(0x01) | Gauge | | counter of limits when allocate wmem | -| rmem_scheduls | tcp_abn(0x01) | Gauge | | rmem is not enough | -| tcp_oom | tcp_abn(0x01) | Gauge | | tcp out of memory | -| send_rsts | tcp_abn(0x01) | Gauge | | send_rsts | -| receive_rsts | tcp_abn(0x01) | Gauge | | receive_rsts | - -## Socket监控 - -实体名:endpoint_tcp - -| metrics_name | table_name | metrics_type | unit | metrics description | -| ------------------- | ------------ | ------------ | ---- | ------------------------------------------------------------ | -| tgid | endpoint_tcp | key | | 进程ID | -| role | endpoint_tcp | key | | 客户端(0)/服务端(1) | -| client_ip | endpoint_tcp | key | | 客户端:本地IP;服务端:对端IP | -| server_ip | endpoint_tcp | key | | 客户端:对端IP;服务端:本地IP
备注:K8S场景支持Cluster IP转换成Backend IP | -| server_port | endpoint_tcp | key | | 客户端:对端端口;服务端:本地端口
备注:K8S场景支持Cluster Port转换成Backend Port | -| protocol | endpoint_tcp | key | | 协议族(IPv4、IPv6) | -| listendrop | endpoint_tcp | Gauge | | TCP accept丢弃次数 | -| accept_overflow | endpoint_tcp | Gauge | | TCP accept队列溢出次数 | -| syn_overflow | endpoint_tcp | Gauge | | TCP syn队列溢出次数 | -| passive_open | endpoint_tcp | Gauge | | TCP被动发起的建链次数 | -| passive_open_failed | endpoint_tcp | Gauge | | TCP被动发起的建链失败次数 | -| retran_synacks | endpoint_tcp | Gauge | | TCP synack重传报文数 | -| lost_synacks | endpoint_tcp | Gauge | | TCP synack报文丢失导致的建链失败次数 | -| req_drops | endpoint_tcp | Gauge | | TCP request丢弃次数(因为服务端listen关闭) | -| active_open | endpoint_tcp | Gauge | | TCP主动发起的建链次数 | -| active_open_failed | endpoint_tcp | Gauge | | TCP主动发起的建链失败次数 | - -实体名:endpoint_udp - -| metrics_name | table_name | metrics_type | unit | metrics description | -| ------------- | ------------ | ------------ | ----- | -------------------- | -| tgid | endpoint_udp | key | | 进程ID | -| remote_ip | endpoint_udp | key | | udp 对端IP | -| local_ip | endpoint_udp | key | | udp 本地IP | -| protocol | endpoint_udp | key | | 协议族(IPv4、IPv6) | -| udp_rcv_drops | endpoint_udp | Gauge | bytes | UDP接收失败字节数 | -| udp_sends | endpoint_udp | Gauge | bytes | UDP发送字节数 | -| udp_rcvs | endpoint_udp | Gauge | bytes | UDP接收字节数 | - -## 负载分担流拓扑构建 - -通过针对TCP连接、Nginx/haproxy负载分担流监控,可以有效构建出前后端应用之间事件TCP拓扑流,如下图所示: - -![](./png/nginx_flow.png) - - - -# Redis/PostgreSQL监控 - -## Redis性能监控 - -实体名:sli - -| metrics_name | table_name | metrics_type | unit | metrics description | Support | -| ------------ | ------------- | ------------ | ---- | ------------------------------ | ---------------- | -| tgid | | key | | 进程ID | 仅支持非加密场景 | -| ins_id | | key | | 实例ID | | -| app | | key | | 应用名 | | -| method | | key | | 请求方法 | | -| server_ip | | label | | 服务端IP | | -| server_port | | label | | 服务端端口 | | -| client_ip | | label | | 客户端IP | | -| client_port | | label | | 客户端端口 | | -| rtt_nsec | redis_sli | gauge | ns | Redis协议请求RTT | | -| max_rtt_nsec | redis_max_sli 
| gauge | ns | Redis协议采样周期内最大请求RTT | | - -## PostgreSQL性能监控 - -实体名:sli - -| metrics_name | table_name | metrics_type | unit | metrics description | Support | -| ------------ | ---------- | ------------ | ---- | -------------------------------- | ------------------------------- | -| tgid | | key | | 进程ID | 支持加密场景,openssl 1.1.1版本 | -| ins_id | | key | | 实例ID | | -| app | | key | | 应用名 | | -| method | | key | | 请求方法 | | -| server_ip | | label | | 服务端IP | | -| server_port | | label | | 服务端端口 | | -| client_ip | | label | | 客户端IP | | -| client_port | | label | | 客户端端口 | | -| rtt_nsec | pg_sli | gauge | ns | Postgre协议请求RTT | | -| max_rtt_nsec | pg_max_sli | gauge | ns | Postgre协议采样周期内最大请求RTT | | -| tps | pg_tps | gauge | | 数据库吞吐量 | 仅支持openGauss 2.0 | - - - -## TCP性能监控 - -实体名:tcp_link - -| metrics_name | table_name | metrics_type | unit | metrics description | -| ------------------ | ----------------- | ------------ | ----- | ------------------------------------------------------------ | -| tgid | | key | | 进程ID | -| role | | key | | 客户端/服务端 | -| client_ip | | key | | 客户端:本地IP;服务端:对端IP | -| server_ip | | key | | 客户端:对端IP;服务端:本地IP | -| server_port | | key | | 客户端:对端端口;服务端:本地端口 | -| protocol | | key | | 协议族(IPv4、IPv6) | -| rto | tcp_rate(0x20) | histogram | | Retransmission timeOut(us) | -| ato | tcp_rate(0x20) | histogram | | Estimated value of delayed ACK(us) | -| srtt | tcp_rtt(0x4) | histogram | us | Smoothed Round Trip Time(us). | -| snd_cwnd | tcp_windows(0x2) | histogram | | Congestion Control Window Size. | -| reordering | tcp_windows(0x2) | histogram | | Segments to be reordered. | -| rcv_rtt | tcp_rtt(0x4) | histogram | us | Receive end RTT (unidirectional measurement). | -| notsent_bytes | tcp_windows(0x2) | histogram | bytes | Number of bytes not sent currently. | -| notack_bytes | tcp_windows(0x2) | histogram | bytes | Number of bytes not ack currently. | -| snd_wnd | tcp_windows(0x2) | histogram | | Size of TCP send window. 
| -| rcv_wnd | tcp_windows(0x2) | histogram | | Size of TCP receive window. | -| zero_win_tx_ratio | tcp_windows(0x2) | Gauge | | Ratio of the number of times of sending window 0 to the number of sent bytes | -| zero_win_rx_ratio | tcp_windows(0x2) | Gauge | | Ratio of the number of receive window 0 windows to the number of received bytes | -| zero_rcv_wnd_count | tcp_windows(0x2) | Gauge | | The number of receive window 0 windows. | -| zero_snd_wnd_count | tcp_windows(0x2) | Gauge | | The number of sending window 0 windows. | -| avl_snd_wnd | tcp_windows(0x2) | histogram | | Size of TCP available send window. | -| syn_srtt | tcp_srtt | histogram | us | RTT of syn packet(us). | -| syn_srtt_max | tcp_srtt | Gauge | us | RTT of syn packet(us). | -| sk_rcvbuf | tcp_sockbuf(0x10) | histogram | bytes | Byte length of the RX buffer. | -| sk_sndbuf | tcp_sockbuf(0x10) | histogram | bytes | Byte length of the TX buffer. | - -## TCP异常监控 - -实体名:tcp_link - -| metrics_name | table_name | metrics_type | unit | metrics description | -| ------------------- | ------------- | ------------ | ---- | ------------------------------------------------------------ | -| tgid | | key | | 进程ID | -| role | | key | | 客户端/服务端 | -| client_ip | | key | | 客户端:本地IP;服务端:对端IP | -| server_ip | | key | | 客户端:对端IP;服务端:本地IP | -| server_port | | key | | 客户端:对端端口;服务端:本地端口 | -| protocol | | key | | 协议族(IPv4、IPv6) | -| retran_packets | tcp_abn(0x01) | Gauge | | total number of retrans | -| retrans_ratio | tcp_abn(0x01) | Gauge | | retran ratio | -| backlog_drops | tcp_abn(0x01) | Gauge | | drops caused by backlog queue full | -| sk_drops | tcp_abn(0x01) | Counter | | Number of lost packets in the TCP protocol stack | -| lost_out | tcp_abn(0x01) | Gauge | segs | Number of lost segments estimated by TCP congestion | -| sacked_out | tcp_abn(0x01) | Gauge | segs | Number of out-of-order TCP packets (SACK) or number of repeated TCP ACKs (NO SACK) | -| filter_drops | tcp_abn(0x01) | Gauge | | drops caused by 
socket filter | -| tmout_count | tcp_abn(0x01) | Gauge | | counter of tcp link timeout | -| snd_buf_limit_count | tcp_abn(0x01) | Gauge | | counter of limits when allocate wmem | -| rmem_scheduls | tcp_abn(0x01) | Gauge | | rmem is not enough | -| tcp_oom | tcp_abn(0x01) | Gauge | | tcp out of memory | -| send_rsts | tcp_abn(0x01) | Gauge | | send_rsts | -| receive_rsts | tcp_abn(0x01) | Gauge | | receive_rsts | - -## Socket监控 - -实体名:endpoint_tcp - -| metrics_name | table_name | metrics_type | unit | metrics description | -| ------------------- | ------------ | ------------ | ---- | ------------------------------------------------------------ | -| tgid | endpoint_tcp | key | | 进程ID | -| role | endpoint_tcp | key | | 客户端(0)/服务端(1) | -| client_ip | endpoint_tcp | key | | 客户端:本地IP;服务端:对端IP | -| server_ip | endpoint_tcp | key | | 客户端:对端IP;服务端:本地IP
备注:K8S场景支持Cluster IP转换成Backend IP | -| server_port | endpoint_tcp | key | | 客户端:对端端口;服务端:本地端口
备注:K8S场景支持Cluster Port转换成Backend Port | -| protocol | endpoint_tcp | key | | 协议族(IPv4、IPv6) | -| listendrop | endpoint_tcp | Gauge | | TCP accept丢弃次数 | -| accept_overflow | endpoint_tcp | Gauge | | TCP accept队列溢出次数 | -| syn_overflow | endpoint_tcp | Gauge | | TCP syn队列溢出次数 | -| passive_open | endpoint_tcp | Gauge | | TCP被动发起的建链次数 | -| passive_open_failed | endpoint_tcp | Gauge | | TCP被动发起的建链失败次数 | -| retran_synacks | endpoint_tcp | Gauge | | TCP synack重传报文数 | -| lost_synacks | endpoint_tcp | Gauge | | TCP synack报文丢失导致的建链失败次数 | -| req_drops | endpoint_tcp | Gauge | | TCP request丢弃次数(因为服务端listen关闭) | -| active_open | endpoint_tcp | Gauge | | TCP主动发起的建链次数 | -| active_open_failed | endpoint_tcp | Gauge | | TCP主动发起的建链失败次数 | - -实体名:endpoint_udp - -| metrics_name | table_name | metrics_type | unit | metrics description | -| ------------- | ------------ | ------------ | ----- | -------------------- | -| tgid | endpoint_udp | key | | 进程ID | -| remote_ip | endpoint_udp | key | | udp 对端IP | -| local_ip | endpoint_udp | key | | udp 本地IP | -| protocol | endpoint_udp | key | | 协议族(IPv4、IPv6) | -| udp_rcv_drops | endpoint_udp | Gauge | bytes | UDP接收失败字节数 | -| udp_sends | endpoint_udp | Gauge | bytes | UDP发送字节数 | -| udp_rcvs | endpoint_udp | Gauge | bytes | UDP接收字节数 | - -## I/O性能 - -实体名:proc - -| metrics_name | table_name | metrics_type | unit | metrics description | -| --------------------- | ------------------ | ------------ | ---- | ------------------------------------------------------------ | -| tgid | | key | | 进程ID | -| ppid | system_proc | label | | 父进程ID | -| pgid | system_proc | label | | 进程组ID | -| comm | | label | | 执行程序名称 | -| cmdline | system_proc | label | | 执行程序命令(包括配置) | -| fd_count | system_proc | Gauge | | 进程文件句柄 | -| fd_free_per | system_proc | Gauge | | 进程剩余FD资源占比% | -| rchar_bytes | system_proc | Gauge | | 进程系统调用至FS的读字节数 | -| wchar_bytes | system_proc | Gauge | | 进程系统调用至FS的写字节数 | -| syscr_count | system_proc | Gauge | | 进程read()/pread()执行次数 | -| syscw_count | 
system_proc | Gauge | | 进程write()/pwrite()执行次数 | -| read_bytes | system_proc | Gauge | | 进程实际从磁盘读取的字节数 | -| write_bytes | system_proc | Gauge | | 进程实际从磁盘写入的字节数 (page cache情况下,该字段进表示设置dirty page的size) | -| cancelled_write_bytes | system_proc | Gauge | | 参考proc_write_bytes,因为存在page cache 如果write操作结束后,又发生文件被删除事件,会导致diry page并未写入磁盘,所以存在取消写的字节数统计 | -| ns_ext4_read | proc_ext4(0x20) | Gauge | ns | ext4文件系统读操作时间,单位ns | -| ns_ext4_write | proc_ext4(0x20) | Gauge | ns | ext4文件系统写操作时间,单位ns | -| ns_ext4_flush | proc_ext4(0x20) | Gauge | ns | ext4文件系统flush操作时间,单位ns | -| ns_ext4_open | proc_ext4(0x20) | Gauge | ns | ext4文件系统open操作时间,单位ns | -| ns_overlay_read | proc_overlay(0x40) | Gauge | ns | overlayfs文件系统读操作时间,单位ns | -| ns_overlay_write | proc_overlay(0x40) | Gauge | ns | overlayfs文件系统写操作时间,单位ns | -| ns_overlay_flush | proc_overlay(0x40) | Gauge | ns | overlayfs文件系统flush操作时间,单位ns | -| ns_overlay_open | proc_overlay(0x40) | Gauge | ns | overlayfs文件系统open操作时间,单位ns | -| ns_tmpfs_read | proc_tmpfs(0x80) | Gauge | ns | tmpfs文件系统读操作时间,单位ns | -| ns_tmpfs_write | proc_tmpfs(0x80) | Gauge | ns | tmpfs文件系统写操作时间,单位ns | -| ns_tmpfs_flush | proc_tmpfs(0x80) | Gauge | ns | tmpfs文件系统flush操作时间,单位ns | -| less_4k_io_read | proc_io(0x400) | Gauge | | Number of small I/O (less than 4 KB) read operations at the BIO layer. | -| less_4k_io_write | proc_io(0x400) | Gauge | | Number of small I/O (less than 4 KB) write operations at the BIO layer. | -| greater_4k_io_read | proc_io(0x400) | Gauge | | Number of big I/O (greater than 4 KB) read operations at the BIO layer. | -| greater_4k_io_write | proc_io(0x400) | Gauge | | Number of big I/O (greater than 4 KB) write operations at the BIO layer. | -| hang_count | proc_io(0x400) | Gauge | | Number of process hang times. | -| iowait_us | proc_io(0x400) | Gauge | us | Process IO_wait time (unit: us). | - -- Gitee