From 8a60f817c44c1dfa56cb7e419308ae78330aed3d Mon Sep 17 00:00:00 2001 From: yinbin6 Date: Wed, 10 Jul 2024 17:45:10 +0800 Subject: [PATCH] example: sync example update --- 0212-example-sync-example-update.patch | 6443 ++++++++++++++++++++++++ gazelle.spec | 6 +- 2 files changed, 6448 insertions(+), 1 deletion(-) create mode 100644 0212-example-sync-example-update.patch diff --git a/0212-example-sync-example-update.patch b/0212-example-sync-example-update.patch new file mode 100644 index 0000000..68edd8f --- /dev/null +++ b/0212-example-sync-example-update.patch @@ -0,0 +1,6443 @@ +From dc6bfbf12bdb318eeb4d6a1f0b4912095b3f79eb Mon Sep 17 00:00:00 2001 +From: yinbin6 +Date: Wed, 10 Jul 2024 18:48:38 +0800 +Subject: [PATCH] example: sync example update + + +diff --git a/README.md b/README.md +index 010e8a4..36f8454 100644 +--- a/README.md ++++ b/README.md +@@ -1,7 +1,9 @@ +-Gazelle ++ + + # 用户态协议栈Gazelle + ++[简体中文](README.md) | [English](README_en.md) ++ + ## 简介 + + Gazelle是一款高性能用户态协议栈。它基于DPDK在用户态直接读写网卡报文,共享大页内存传递报文,使用轻量级LwIP协议栈。能够大幅提高应用的网络I/O吞吐能力。专注于数据库网络性能加速,如MySQL、redis等。兼顾高性能与通用性: +@@ -12,51 +14,44 @@ Gazelle是一款高性能用户态协议栈。它基于DPDK在用户态直接读 + + ## 性能效果 + ### mysql 8.0.20 +- +- ++ ++ + + 使用内核协议栈跑分为54.84万,使用Gazelle跑分为66.85万,Gazelle提升20%+ +-详见[实践系列(一):Gazelle加速mysql 20%](doc/%E5%AE%9E%E8%B7%B5%E7%B3%BB%E5%88%97(%E4%B8%80)Gazelle%E5%8A%A0%E9%80%9Fmysql%2020%25.md) + + ### ceph 14.2.8 +- ++ + + 4k整机场景,Gazelle提升20%+ +-部署及测试详见 [高性能云盘](https://www.hikunpeng.com/document/detail/zh/kunpengcpfs/basicAccelFeatures/storageAccel/kunpengcpfs_hpcd_0002.html) + +-### 后续应用…… +-- redis 2022/12/30 +-- openGauss 2022/12/30 + + ## 详情 + 可点击标题跳转,欢迎投递文章、提意见。 + | 主题 | 内容简介 | 发布时间 | + |:---|:-----|:---| +-|[Gazelle使用指南](doc/Gazelle%E4%BD%BF%E7%94%A8%E6%8C%87%E5%8D%97.md)| 1,安装、部署环境、启动应用程序
2,配置参数说明
3,调测命令说明
4,使用约束、风险、注意事项|已发布| +-|Gazelle介绍| 1,介绍背景
2,简介技术方案
3,性能效果|2022/11/30| +-|[实践系列(一):Gazelle加速mysql 20%](doc/%E5%AE%9E%E8%B7%B5%E7%B3%BB%E5%88%97(%E4%B8%80)Gazelle%E5%8A%A0%E9%80%9Fmysql%2020%25.md)|1,详细测试步骤
2,性能效果|已发布| +-|实践系列(二):Gazelle加速redis xx|1,详细测试步骤
2,性能效果|2022/12/31| +-|[实践系列(三):Gazelle加速ceph client 20%](https://www.hikunpeng.com/document/detail/zh/kunpengcpfs/basicAccelFeatures/storageAccel/kunpengcpfs_hpcd_0002.html)|1,详细测试步骤
2,性能效果|已发布| +-|实践系列(四):Gazelle加速openGauss xx|1,详细测试步骤
2,性能效果|2022/12/31| +-|解读系列(一):Gazelle总体方案介绍|1,支持场景、特性、规格
2,与dpdk、lwip关系
3,总体框架
4,替换posix接口|2022/11/25| +-|解读系列(二):Gazelle为什么能提升xx|介绍提关键技术点:减少拷贝、亲和性、减少上下文切换|2022/12/2| +-|解读系列(三):Gazelle代码框架流程|1,Gazelle框架
2,事件、读、写、ltran报文流程图|2022/12/9| +-|参与Gazelle指导|1,怎么判断应用适不适合应gazelle加速
2,Gazelle常见问题调试
|2022/12/5| + |[openEuler指南](https://gitee.com/openeuler/community/blob/master/zh/contributors/README.md)| 如何参与openEuler社区 | 已发布 | ++|[Gazelle用户指南](doc/user-guide.md)| 1. 安装、部署环境、启动应用程序
2. 配置参数说明
3. 调测命令说明
4. 使用约束、风险、注意事项|已发布| ++|[Gazelle开发者指南](doc/programmer-guide.md)| 1. 技术原理
2. 架构设计| 待定 | ++|[实践系列-Gazelle加速mysql 20%](doc/%E5%AE%9E%E8%B7%B5%E7%B3%BB%E5%88%97-Gazelle%E5%8A%A0%E9%80%9Fmysql.md)|1. 详细测试步骤
2. 性能效果|已发布| ++|[实践系列-Gazelle加速ceph client 20%](https://www.hikunpeng.com/document/detail/zh/kunpengcpfs/basicAccelFeatures/storageAccel/kunpengcpfs_hpcd_0002.html)|1. 详细测试步骤
2. 性能效果|已发布| ++|实践系列-Gazelle加速redis |1. 详细测试步骤
2. 性能效果| 待定 | ++|实践系列-Gazelle加速openGauss |1. 详细测试步骤
2. 性能效果| 待定 | ++|[实践系列-Gazelle支持netperf性能测试](doc/netperf.md)| 1. 版本说明<br>
2. 详细测试步骤| 待定 | ++ ++## 特性变更 ++- [多进程模式即将下线](doc/releasenote.md) + + ## 支持列表 + - [posix接口列表及应用支持列表](doc/support.md) + + ## FAQ +-- [如何使用pdump工具抓包](doc/pdump/pdump.md) +-- listenshadow参数如何配置(文档11.30发布) +-- [多进程各自独立使用网卡](doc/mNIC/mNIC.md) ++- [如何使用pdump工具抓包](doc/pdump.md) ++- [多进程各自独立使用网卡](doc/multiple-nic.md) + + ## 路标 +-scene ++TODO + + ## 联系方式 + [订阅邮件列表](https://mailweb.openeuler.org/postorius/lists/high-performance-network.openeuler.org/) + [历史邮件](https://mailweb.openeuler.org/hyperkitty/list/high-performance-network@openeuler.org/) +-微信群名称:openEuler 高性能网络sig +-[SIG首页](https://gitee.com/openeuler/community/tree/master/sig/sig-high-performance-network) +\ No newline at end of file ++[SIG首页](https://gitee.com/openeuler/community/tree/master/sig/sig-high-performance-network) +diff --git a/README_en.md b/README_en.md +new file mode 100644 +index 0000000..f545381 +--- /dev/null ++++ b/README_en.md +@@ -0,0 +1,52 @@ ++ ++ ++# User-Space Protocol Stack Gazelle ++ ++[简体中文](README.md) | [English](README_en.md) ++ ++## Introduction ++ ++Gazelle is a high-performance user-space protocol stack. It is based on DPDK for directly reading and writing network packets in user space, sharing large-page memory to transmit packets, and using the lightweight LwIP protocol stack. It significantly improves the network I/O throughput of applications, focusing on accelerating database network performance, such as MySQL and Redis, while balancing high performance and generality: ++- High Performance ++Zero-copy packet processing, lock-free, flexible scale-out, adaptive scheduling. ++- Generality ++Fully POSIX-compatible, no modifications required, suitable for different types of applications. ++ ++## Performance Results ++ ++### MySQL 8.0.20 ++ ++ ++ ++The score using the kernel protocol stack is 548,400, while using Gazelle, it is 668,500, an improvement of over 20%. ++ ++### Ceph 14.2.8 ++ ++ ++In the 4k full-machine scenario, Gazelle improves performance by over 20%. ++ ++## Details ++Click on the titles for more details. Contributions and feedback are welcome. ++| Topic | Summary | Publication Date | ++|:---|:-----|:---| ++|[openEuler Guide](https://gitee.com/openeuler/community/blob/master/en/contributors/README.md)| How to participate in the openEuler community | Published | ++|[Gazelle User Guide](doc/user-guide_en.md)| 1. Installation, deployment environment, application startup
2. Parameter configuration explanation
3. Debugging command explanation
4. Usage constraints, risks, considerations|Published| ++|[Gazelle Developer Guide](doc/programmer-guide_en.md)| 1. Technical principles
2. Architecture design| To be determined | ++|[Practice Series - Gazelle Accelerating MySQL by 20%](doc/Practice_Series_Gazelle_Accelerates_MySQL.md)|1. Detailed testing steps
2. Performance results|Published| ++|[Practice Series - Gazelle Accelerating Ceph Client by 20%](https://www.hikunpeng.com/document/detail/zh/kunpengcpfs/basicAccelFeatures/storageAccel/kunpengcpfs_hpcd_0002.html)|1. Detailed testing steps
2. Performance results|Published| ++|Practice Series - Gazelle Accelerating Redis |1. Detailed testing steps
2. Performance results| To be determined | ++|Practice Series - Gazelle Accelerating openGauss |1. Detailed testing steps
2. Performance results| To be determined | ++|[Practice Series - Gazelle Supporting Netperf Performance Testing](doc/netperf_en.md)| 1. Version description
2. Detailed testing steps| To be determined | ++ ++## Support List ++- [POSIX interface list and application support list](doc/support_en.md) ++ ++## FAQ ++- [How to use the pdump tool for packet capture](doc/pdump_en.md) ++- [Using multiple processes independently with NICs](doc/multiple-nic_en.md) ++ ++ ++## Contact Information ++[Subscribe to the mailing list](https://mailweb.openeuler.org/postorius/lists/high-performance-network.openeuler.org/) ++[Archived emails](https://mailweb.openeuler.org/hyperkitty/list/high-performance-network@openeuler.org/) ++[SIG Homepage](https://gitee.com/openeuler/community/tree/master/sig/sig-high-performance-network) +diff --git a/build/build.sh b/build/build.sh +index 4464f8c..622e1cc 100755 +--- a/build/build.sh ++++ b/build/build.sh +@@ -31,3 +31,11 @@ if [ $? -ne 0 ]; then + fi + + cd - ++cd ../examples ++cmake . ++make ++if [ $? -ne 0 ]; then ++ echo "build examples failed" ++ exit 1 ++fi ++cd - +diff --git a/doc/Practice_Series_Gazelle_Accelerates_MySQL.md b/doc/Practice_Series_Gazelle_Accelerates_MySQL.md +new file mode 100644 +index 0000000..9bb6b6f +--- /dev/null ++++ b/doc/Practice_Series_Gazelle_Accelerates_MySQL.md +@@ -0,0 +1,304 @@ ++# Practice Series (Part 1) Gazelle Accelerates MySQL by 20% ++ ++## Background Introduction ++ ++The current improvement in network card performance far outpaces that of single-core CPUs. Single-core CPUs are no longer able to fully utilize the bandwidth dividend of network cards. Meanwhile, CPUs are evolving towards multi-core direction, and NUMA architecture is one of the multi-core solutions. From a hardware perspective, there are mainly two solutions to bridge the computational gap between CPU and network card: offloading CPU work to the network card, a hardware acceleration solution; and making full use of the NUMA architecture, a software acceleration solution. It may be instinctive to think that hardware acceleration is faster, but in practical tests, Gazelle software acceleration achieves greater performance improvement, especially in the segment where data efficiently transfers to applications, Gazelle handles it better. ++ ++![Network Card Trend](images/网卡趋势_en.png) ++ ++Currently, there is a wide variety of software programming models, but they can be summarized into two typical network models, as shown below: ++- IO multiplexing model: Application A's network threads are completely isolated from each other, and protocol state contexts are fixed within a thread. ++- Asymmetric model: Application B's network threads are asymmetric, and protocol state contexts migrate across multiple threads. ++ ++![Network Models](images/网络模型_en.png) ++ ++## Challenges in Improving MySQL Performance ++ ++MySQL's network model belongs to the aforementioned asymmetric model, where TCP migrates across threads. Common user-space protocol stacks in the industry are designed for asymmetric applications (such as f-stack), which cannot support TCP migration across threads, or they use global TCP resources (such as lwip). When the number of connections exceeds 40, performance rapidly deteriorates due to competition issues. ++ ++![MySQL Model](images/mysql模型_en.png) ![Accelerating MySQL in the Industry](images/业界加速mysql效果_en.png) ++ ++## Gazelle Solution ++ ++Gazelle is a high-performance user-space protocol stack. It directly reads and writes network packets in user space based on DPDK, shares large page memory for packet transmission, and uses lightweight LwIP protocol stack. 
It can significantly improve the network I/O throughput of applications, focusing on accelerating database network performance, such as MySQL, Redis, etc. It balances high performance with versatility: ++- High performance: Zero-copy of packets, lock-free, flexible scale-out, adaptive scheduling. ++- Versatility: Fully compatible with POSIX, zero modifications, suitable for different types of applications. ++ ++Gazelle decouples application threads from protocol stack threads, thereby supporting any thread model. Through the routing table of application thread fd and protocol stack thread sock, operations such as read/write of application threads can be executed in the corresponding protocol stack threads. Gazelle is deployed in a multi-core multi-threaded manner, avoiding NUMA traps through regionalized large page memory. ++ ++![Technical Features](images/框图_en.png) ++ ++- POSIX compatibility ++- DPDK bypass kernel ++- Regionalized large page memory management to avoid NUMA traps ++- Application thread affinity management ++- Distributed TCP Hash table, multi-core multi-threaded working mode ++- Decoupling of protocol stack threads and application threads ++- Efficient transmission of packets to applications ++ ++![MySQL Kernel](images/mysql_kernel.png) ![MySQL with Gazelle](images/mysql_gazelle.png) ++ ++As shown, using the kernel protocol stack achieves a score of 548,400, while using Gazelle achieves a score of 668,500, an increase of 20%+. ++ ++## Steps to Accelerate MySQL with Gazelle ++ ++### 1. Environment Requirements ++ ++#### 1.1 Hardware ++ ++One server (Server) and one client (Client) are required. ++ ++| | Server | Client | ++| :------- | :----------------------: | :---------------------: | ++| CPU | Kunpeng 920-4826 * 2 | Kunpeng 920-4826 * 2 | ++| Frequency| 2600MHz | 2600MHz | ++| Memory | 12 * 32G Micron 2666 MHz | 8 * 32G Micron 2666 MHz | ++| Network | 1822 25G | 1822 25G | ++| System Disk | 1.1T HDD TOSHIBA | 1.1T HDD TOSHIBA | ++| Data Disk | 3T HUAWEI SSD NVME | NA | ++ ++#### 1.2 Software ++ ++The software package defaults to using openEuler 22.03 yum source. ++ ++| Software Name | Version | ++| :---------------: | :-------: | ++| mysql | 8.0.20 | ++| benchmarksql | 5.0 | ++ ++#### 1.3 Networking ++ ++![Deployment](images/部署_en.png) ++ ++### 2. Server-Side Deployment ++ ++#### 2.1 Install MySQL Dependencies ++ ++```sh ++yum install -y cmake doxygen bison ncurses-devel openssl-devel libtool tar rpcgen libtirpc-devel bison bc unzip git gcc-c++ libaio libaio-devel numactl ++``` ++ ++#### 2.2 Compile and Install MySQL ++ ++- Download the source code package from the [official website](https://downloads.mysql.com/archives/community/). ++ ++![Download MySQL Source Code Package](images/下载mysql源码包.png) ++ ++- Download optimization patches: [Fine-grained lock optimization feature patch](https://github.com/kunpengcompute/mysql-server/releases/download/tp_v1.0.0/0001-SHARDED-LOCK-SYS.patch), [NUMA scheduling patch](https://github.com/kunpengcompute/mysql-server/releases/download/21.0.RC1.B031/0001-SCHED-AFFINITY.patch), [Lock-free optimization feature patch](https://github.com/kunpengcompute/mysql-server/releases/download/tp_v1.0.0/0002-LOCK-FREE-TRX-SYS.patch). ++ ++- Compile MySQL ++ ++ Ensure that the `libaio-devel` package is installed before compiling. 
++ ++```sh ++tar zxvf mysql-boost-8.0.20.tar.gz ++cd mysql-8.0.20/ ++patch -p1 < ../0001-SHARDED-LOCK-SYS.patch ++patch -p1 < ../0001-SCHED-AFFINITY.patch ++patch -p1 < ../0002-LOCK-FREE-TRX-SYS.patch ++cd cmake ++make clean ++cmake .. -DCMAKE_INSTALL_PREFIX=/usr/local/mysql-8.0.20 -DWITH_BOOST=../boost -DDOWNLOAD_BOOST=1 ++make -j 64 ++make install ++``` ++ ++#### 2.3 Configure MySQL Parameters ++ ++Use the `my.cnf-arm` configuration file from the Gazelle source code's `doc/conf/` directory. Place it in the `/etc` directory and rename it to `my.cnf`. ++ ++#### 2.4 Deploy MySQL ++ ++```sh ++# Mount the NVMe disk ++mkdir -p /data ++mount /dev/nvme0n1 /data ++mkdir -p /data/mysql ++mkdir -p /data/mysql/data ++mkdir -p /data/mysql/share ++mkdir -p /data/mysql/tmp ++mkdir -p /data/mysql/run ++mkdir -p /data/mysql/log ++ ++# Create user group ++groupadd mysql ++useradd -g mysql mysql ++chown -R mysql:mysql /data ++chown -R mysql:mysql /data/mysql/log/mysql.log ++ ++# Initialize ++echo "" > /data/mysql/log/mysql.log ++rm -fr /data/mysql/data/* ++/usr/local/mysql-8.0.20/bin/mysqld --defaults-file=/etc/my.cnf --user=root --initialize ++ ++# Start the service ++/usr/local/mysql-8.0.20/support-files/mysql.server start ++ ++# After initialization, a random password is generated. Use it to log in to MySQL ++/usr/local/mysql-8.0.20/bin/mysql -u root -p ++alter user 'root'@'localhost' identified by '123456'; ++flush privileges; ++quit ++ ++# Log in to the database again, password is '123456'. Update the root account to be able to access '%' domain, enabling remote access ++/usr/local/mysql-8.0.20/bin/mysql -u root -p ++use mysql; ++update user set host='%' where user='root'; ++flush privileges; ++create database tpcc; ++quit ++ ++# Stop the service first, and then start it again to apply the configured changes ++/usr/local/mysql-8.0.20/support-files/mysql.server stop ++``` ++ ++### 3. Deploying BenchmarkSQL Tool on the Client Side ++ ++- Compilation and Installation ++ ++Download the [BenchmarkSQL tool](https://mirrors.huaweicloud.com/kunpeng/archive/kunpeng_solution/database/patch/benchmarksql5.0-for-mysql.zip). ++ ++```sh ++# Install dependencies for BenchmarkSQL ++yum install -y java ++ ++unzip benchmarksql5.0-for-mysql.zip ++cd benchmarksql5.0-for-mysql/run ++chmod +x *.sh ++``` ++ ++- Configuring BenchmarkSQL Parameters ++ ++ Edit the `benchmarksql5.0-for-mysql/run/props.conf` file. ++ ++ | Configuration Item | Value | Description | ++ | ------------------ | ----- | -------------------------------------------- | ++ | Terminals | 300 | Number of concurrent connections for testing | ++ | runMins | 10 | Duration of the test in minutes | ++ | conn | ip | Modify the default IP to the server's IP | ++ ++### 4. Creating Test Data in MySQL ++ ++```sh ++# Start the MySQL service ++/usr/local/mysql-8.0.20/support-files/mysql.server start ++ ++# Create test data (data creation takes about 45 minutes, after completing the data creation, it is recommended to backup the data under /data/mysql/data on the server side for future tests, data can be copied from here) ++./runDatabaseBuild.sh props.conf ++ ++# Stop the database ++/usr/local/mysql-8.0.20/support-files/mysql.server stop ++``` ++ ++ ++ ++### 5. Configuring the Execution Environment ++ ++#### 5.1 Enabling STEAL Optimization ++ ++Enable STEAL optimization on the server side. ++ ++1. Add the parameter `sched_steal_node_limit=4` to the Linux system startup parameters, and reboot to take effect. 
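++
++One common way to append the parameter on an RPM-based system such as openEuler is via grubby (a sketch; adjust to however the environment manages its boot entries, e.g., editing the grub configuration directly). The `cat /proc/cmdline` output below can then be used to confirm the parameter took effect:
++
++```sh
++# Append sched_steal_node_limit=4 to the default kernel entry, then reboot
++grubby --update-kernel=DEFAULT --args="sched_steal_node_limit=4"
++reboot
++```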
++ ++```sh ++[root@localhost mysql]# cat /proc/cmdline ++BOOT_IMAGE=/vmlinuz-5.10.0-153.12.0.89.oe2203sp2.aarch64 root=/dev/mapper/openeuler-root ro rd.lvm.lv=openeuler/root rd.lvm.lv=openeuler/swap video=VGA-1:640x480-32@60me cgroup_disable=files apparmor=0 crashkernel=1024M,high smmu.bypassdev=0x1000:0x17 smmu.bypassdev=0x1000:0x15 console=tty0 sched_steal_node_limit=4 ++``` ++ ++2. Enable STEAL after rebooting. ++ ++```sh ++echo STEAL > /sys/kernel/debug/sched_features ++``` ++ ++#### 5.2 Disabling Test Impacting Factors ++ ++```sh ++# Disable irqbalance ++systemctl stop irqbalance.service ++systemctl disable irqbalance.service ++ ++# Disable firewall ++systemctl stop iptables ++systemctl stop firewalld ++``` ++ ++### 6. Kernel Protocol Stack Testing for MySQL ++ ++```sh ++# Server-side interrupt binding (replace NIC name and CPU core according to the environment) ++ethtool -L enp4s0 combined 5 ++irq1=`cat /proc/interrupts| grep -E enp4s0 | head -n5 | awk -F ':' '{print $1}'` ++cpulist=(91 92 93 94 95) ++c=0 ++for irq in $irq1 ++do ++echo ${cpulist[c]} "->" $irq ++echo ${cpulist[c]} > /proc/irq/$irq/smp_affinity_list ++let "c++" ++done ++ ++# Execute MySQL testing on the client side ++./runBenchmark.sh props.conf ++ ++## Restore the environment ++# Restore the database using backup data on the server side, or regenerate the data. ++rm -fr /data/mysql/data/* ++cp -fr /home/tpccdata/* /data/mysql/data/ ++# Shut down the MySQL process ++pkill -9 mysqld ++``` ++ ++Test results are as follows: ++ ++ ++ ++### 7. Gazelle Testing for MySQL ++Install software packages ++```sh ++yum -y install gazelle dpdk libconfig numactl libboundscheck libcap ++``` ++ ++Modify the `/etc/gazelle/lstack.conf` configuration file as follows: ++ ++| Configuration Item | Value | Description | ++| ------------------ | ------------------------------------------------------------ | --------------------------------------------------------- | ++| dpdk_args | ["--socket-mem", "2048,2048,2048,2048", "--huge-dir", "/mnt/hugepages-lstack", "--proc-type", "primary", "--legacy-mem", "--map-perfect"] | Configure 2G memory usage for each NUMA (can be smaller), mount directory for huge pages | ++| use_ltran | 0 | Do not use ltran | ++| listen_shadow | 1 | Use listen shadow FD, as one MySQL listen thread corresponds to 4 protocol stack threads | ++| num_cpus | "18,38,58,78" | Choose one CPU for each NUMA | ++ ++ ++ ++```sh ++# Server-side allocate huge pages ++echo 8192 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages # Choose page size accordingly ++mkdir -p /mnt/hugepages-lstack ++mount -t hugetlbfs nodev /mnt/hugepages-lstack # Do not repeat, otherwise the huge pages will be occupied and cannot be released ++ ++# Load ko on the server ++modprobe vfio enable_unsafe_noiommu_mode=1 ++modprobe vfio-pci ++ ++# Bind NIC to user space on the server ++ip link set enp4s0 down ++dpdk-devbind -b vfio-pci enp4s0 ++ ++# Start mysqld on the server ++LD_PRELOAD=/usr/lib64/liblstack.so GAZELLE_BIND_PROCNAME=mysqld /usr/local/mysql-8.0.20/bin/mysqld --defaults-file=/etc/my.cnf --bind-address=192.168.1.10 & ++ ++# Execute MySQL testing on the client side ++./runBenchmark.sh props.conf ++ ++## Restore the environment ++# Restore the database using backup data on the server side, or regenerate the data. ++rm -fr /data/mysql/data/* ++cp -fr /home/tpccdata/* /data/mysql/data/ ++# Shut down the MySQL process ++pkill -9 mysqld ++``` ++For detailed Gazelle deployment, refer to the [Gazelle User Guide](user-guide.md). 
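++
++If the Gazelle run does not come up as expected, a quick environment sanity check is sketched below. It only reuses commands introduced above, plus the `gazellectl` debugging tool shipped with the gazelle package (see the user guide for its exact syntax); the IP is the server address from this example:
++
++```sh
++# Huge pages: HugePages_Total should reflect the 8192 pages reserved earlier
++grep -i hugepages /proc/meminfo
++
++# NIC binding: enp4s0 should appear under "Network devices using DPDK-compatible driver"
++dpdk-devbind -s
++
++# Once mysqld is running with liblstack.so preloaded, query the stack state
++gazellectl lstack show 192.168.1.10
++```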
++ ++Test results are as follows: ++ ++ +diff --git a/doc/conf/my.cnf-arm b/doc/conf/my.cnf-arm +new file mode 100644 +index 0000000..68d91ee +--- /dev/null ++++ b/doc/conf/my.cnf-arm +@@ -0,0 +1,84 @@ ++[mysqld_safe] ++log-error=/data/mysql/log/mysql.log ++pid-file=/data/mysql/run/mysqld.pid ++ ++[client] ++socket=/data/mysql/run/mysql.sock ++default-character-set=utf8 ++ ++[mysqld] ++server-id=1 ++#log-error=/data/mysql/log/mysql.log ++#basedir=/usr/local/mysql ++socket=/data/mysql/run/mysql.sock ++tmpdir=/data/mysql/tmp ++datadir=/data/mysql/data ++default_authentication_plugin=mysql_native_password ++port=3306 ++user=root ++#innodb_page_size=4k ++ ++ ++max_connections=2000 ++back_log=4000 ++performance_schema=OFF ++max_prepared_stmt_count=128000 ++#transaction_isolation=READ-COMMITTED ++#skip-grant-tables ++ ++#file ++innodb_file_per_table ++innodb_log_file_size=2048M ++innodb_log_files_in_group=32 ++innodb_open_files=10000 ++table_open_cache_instances=64 ++ ++#buffers ++innodb_buffer_pool_size=230G ++innodb_buffer_pool_instances=16 ++innodb_log_buffer_size=2048M ++innodb_undo_log_truncate=OFF ++ ++#tune ++default_time_zone=+8:00 ++#innodb_numa_interleave=1 ++thread_cache_size=2000 ++sync_binlog=1 ++innodb_flush_log_at_trx_commit=1 ++innodb_use_native_aio=1 ++innodb_spin_wait_delay=180 ++innodb_sync_spin_loops=25 ++innodb_flush_method=O_DIRECT ++innodb_io_capacity=30000 ++innodb_io_capacity_max=40000 ++innodb_lru_scan_depth=9000 ++innodb_page_cleaners=16 ++#innodb_spin_wait_pause_multiplier=25 ++ ++#perf special ++innodb_flush_neighbors=0 ++innodb_write_io_threads=1 ++innodb_read_io_threads=1 ++innodb_purge_threads=1 ++ ++sql_mode=STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION,NO_AUTO_VALUE_ON_ZERO,STRICT_ALL_TABLES ++ ++log-bin=mysql-bin ++skip_log_bin ++ssl=0 ++table_open_cache=30000 ++max_connect_errors=2000 ++innodb_adaptive_hash_index=0 ++ ++mysqlx=0 ++ ++#sched_affinity_foreground_thread=0-21,24-42,46,48-69,72-94 ++#sched_affinity_log_writer=44 ++#sched_affinity_log_flusher=45 ++#sched_affinity_log_write_notifier=45 ++#sched_affinity_log_flush_notifier=45 ++#sched_affinity_log_closer=43 ++#sched_affinity_log_checkpointer=45 ++#sched_affinity_purge_coordinator=43 ++ ++ +diff --git a/doc/conf/my.cnf-x86 b/doc/conf/my.cnf-x86 +new file mode 100644 +index 0000000..9c6dadb +--- /dev/null ++++ b/doc/conf/my.cnf-x86 +@@ -0,0 +1,72 @@ ++[mysqld_safe] ++ ++log-error=/data/mysql/log/mysql.log ++pid-file=/data/mysql/run/mysqld.pid ++ ++[client] ++socket=/data/mysql/run/mysql.sock ++default-character-set=utf8 ++ ++[mysqld] ++server-id=1 ++#log-error=/data/mysql/log/mysql.log ++#basedir=/usr/local/mysql ++socket=/data/mysql/run/mysql.sock ++tmpdir=/data/mysql/tmp ++datadir=/data/mysql/data ++default_authentication_plugin=mysql_native_password ++port=3306 ++user=root ++#innodb_page_size=4k ++ ++max_connections=2000 ++back_log=4000 ++performance_schema=OFF ++max_prepared_stmt_count=128000 ++#transaction_isolation=READ-COMMITTED ++#skip-grant-tables ++ ++#file ++innodb_file_per_table ++innodb_log_file_size=1802M ++innodb_log_files_in_group=18 ++innodb_open_files=10000 ++table_open_cache_instances=64 ++ ++#buffers ++innodb_buffer_pool_size=230G ++innodb_buffer_pool_instances=23 ++innodb_log_buffer_size=159M ++ ++#tune ++default_time_zone=+8:00 ++#innodb_numa_interleave=1 ++thread_cache_size=2000 ++sync_binlog=0 ++innodb_flush_log_at_trx_commit=1 ++innodb_use_native_aio=1 ++innodb_spin_wait_delay=12 ++innodb_sync_spin_loops=436 ++innodb_flush_method=O_DIRECT ++innodb_io_capacity=36368 
++innodb_io_capacity_max=40000 ++innodb_lru_scan_depth=12 ++innodb_page_cleaners=19 ++innodb_thread_concurrency=280 ++#innodb_spin_wait_pause_multiplier=25 ++ ++#perf special ++innodb_flush_neighbors=0 ++innodb_write_io_threads=161 ++innodb_read_io_threads=27 ++innodb_purge_threads=32 ++ ++sql_mode=STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION,NO_AUTO_VALUE_ON_ZERO,STRICT_ALL_TABLES ++ ++log-bin=mysql-bin ++skip_log_bin ++ssl=0 ++table_open_cache=30000 ++max_connect_errors=2000 ++innodb_adaptive_hash_index=0 ++mysqlx=0 +diff --git a/doc/multiple-nic.md b/doc/multiple-nic.md +new file mode 100644 +index 0000000..24b6c3f +--- /dev/null ++++ b/doc/multiple-nic.md +@@ -0,0 +1,92 @@ ++# 多进程各自独立使用网卡 ++ ++gazelle多进程支持分别独立使用不同的网卡。本文档以lstack进程和ltran+lstack进程独立使用网卡为例,说明这种场景的配置方法。 ++ ++## 配置步骤: ++### 配置说明 ++- 同一个进程的ltran.conf和lstack.conf配置相同的`unix_prefix`参数,如示例中的`unix_prefix=08`。不同进程配置不同unix_prefix参数,如示例中的`unix_prefix=07` ++- Gazelle进程不同网卡需要配置不同的dpdk参数,包括 ++ `-a` (dpdk-19.11为`-w`) 指定绑定的网卡PCI地址白名单 ++ `--file-prefix` 指定共享目录名称(任意名称,初始化创建目录) ++ ++### 绑定网卡 ++- dpdk绑定多个网卡,下面的示例配置中,绑定了enp2s7(0000:02:07.0)和enp2s8(0000:02:08.0)两张网卡 ++``` ++[root@localhost ~]# dpdk-devbind -b igb_uio 0000:02:07.0 ++[root@localhost ~]# dpdk-devbind -b igb_uio 0000:02:08.0 ++[root@localhost ~]# dpdk-devbind -s ++ ++Network devices using DPDK-compatible driver ++============================================ ++0000:02:07.0 'Virtio network device 1000' drv=igb_uio unused= ++0000:02:08.0 'Virtio network device 1000' drv=igb_uio unused= ++``` ++ ++### 第一个进程配置 ++- 第一个进程只使用lstack模式(只是示例,任意模式都可以)在enp2s7上运行 ++ dpdk-21.11配置文件如下,dpdk-19.11把其中的-a替换成-w ++``` ++[root@localhost ~]# cat /etc/gazelle/lstack.conf ++dpdk_args=["-l", "0", "-a", "0000:02:07.0", "--socket-mem", "2048,0,0,0", "--huge-dir", "/mnt/hugepages-2M", "--proc-type", "primary", "--file-prefix", "07"] ++ ++use_ltran=0 ++kni_switch=0 ++ ++low_power_mode=0 ++listen_shadow=1 ++unix_prefix="07" ++ ++num_cpus="1" ++ ++host_addr="192.168.1.2" ++mask_addr="255.255.255.0" ++gateway_addr="192.168.1.1" ++devices="aa:bb:cc:dd:ee:ff" ++``` ++### 第二个进程配置 ++- 第二个进程使用lstack+ltran模式(只是示例,任意模式都可以)在enp2s8上运行 ++ dpdk-21.11配置文件如下,dpdk-19.11把其中的-a替换成-w ++``` ++[root@localhost ~]# cat /etc/gazelle/ltran.conf ++forward_kit="dpdk" ++forward_kit_args="-l 1 -a 0000:02:08.0 --socket-mem 1024,0,0,0 --huge-dir /mnt/hugepages --proc-type primary --legacy-mem --map-perfect --syslog daemon --file-prefix 08" ++ ++kni_switch=0 ++ ++dispatch_subnet="192.168.1.0" ++dispatch_subnet_length=8 ++dispatch_max_clients=30 ++unix_prefix="08" ++ ++bond_mode=1 ++bond_miimon=100 ++bond_mtu=1500 ++bond_ports="0x1" ++bond_macs="ff:ee:dd:cc:bb:aa" ++ ++tcp_conn_scan_interval=10 ++``` ++ ++``` ++[root@localhost ~]# cat /etc/gazelle/lstack2.conf ++dpdk_args=["-l", "3", "--socket-mem", "2048,0,0,0", "--huge-dir", "/mnt/hugepages-2M", "--proc-type", "primary", "--file-prefix", "18", "--legacy-mem", "--map-perfect"] ++ ++use_ltran=1 ++kni_switch=0 ++ ++low_power_mode=0 ++listen_shadow=1 ++unix_prefix="08" ++ ++num_cpus="2" ++ ++host_addr="192.168.1.3" ++mask_addr="255.255.255.0" ++gateway_addr="192.168.1.1" ++devices="ff:ee:dd:cc:bb:aa" ++``` ++ ++## 使用 ++- 两个进程可以分别使用不同网卡收发数据包 ++scene ++ +diff --git a/doc/multiple-nic_en.md b/doc/multiple-nic_en.md +new file mode 100644 +index 0000000..f4db49c +--- /dev/null ++++ b/doc/multiple-nic_en.md +@@ -0,0 +1,92 @@ ++# Configuring Multiple Processes to Independently Use Network Cards ++ ++Gazelle supports the capability for multiple processes to independently utilize different network cards. 
This document illustrates the configuration method for scenarios where the lstack process and the ltran+lstack process independently use network cards, using the example provided. ++ ++## Configuration Steps: ++### Configuration Explanation ++- For the same process, the ltran.conf and lstack.conf files should have the same `unix_prefix` parameter. For instance, in the example, `unix_prefix=08` is used. Different processes should have different `unix_prefix` parameters; for example, `unix_prefix=07` in the example. ++- Gazelle processes requiring different network cards need to be configured with different dpdk parameters, including: ++ - `-a` (in dpdk-19.11, `-w`) specifying the PCI address whitelist of the bound network card. ++ - `--file-prefix` specifying the shared directory name (any name, used to initialize and create directories). ++ ++### Binding Network Cards ++- Bind multiple network cards with dpdk. In the example configuration below, enp2s7 (0000:02:07.0) and enp2s8 (0000:02:08.0) are bound to network cards. ++``` ++[root@localhost ~]# dpdk-devbind -b igb_uio 0000:02:07.0 ++[root@localhost ~]# dpdk-devbind -b igb_uio 0000:02:08.0 ++[root@localhost ~]# dpdk-devbind -s ++ ++Network devices using DPDK-compatible driver ++============================================ ++0000:02:07.0 'Virtio network device 1000' drv=igb_uio unused= ++0000:02:08.0 'Virtio network device 1000' drv=igb_uio unused= ++``` ++ ++### Configuration for the First Process ++- The first process utilizes only the lstack mode (this is just an example, any mode can be used) and runs on enp2s7. ++ Below is the configuration file for dpdk-21.11. For dpdk-19.11, replace `-a` with `-w`. ++``` ++[root@localhost ~]# cat /etc/gazelle/lstack.conf ++dpdk_args=["-l", "0", "-a", "0000:02:07.0", "--socket-mem", "2048,0,0,0", "--huge-dir", "/mnt/hugepages-2M", "--proc-type", "primary", "--file-prefix", "07"] ++ ++use_ltran=0 ++kni_switch=0 ++ ++low_power_mode=0 ++listen_shadow=1 ++unix_prefix="07" ++ ++num_cpus="1" ++ ++host_addr="192.168.1.2" ++mask_addr="255.255.255.0" ++gateway_addr="192.168.1.1" ++devices="aa:bb:cc:dd:ee:ff" ++``` ++ ++### Configuration for the Second Process ++- The second process utilizes the lstack+ltran mode (this is just an example, any mode can be used) and runs on enp2s8. ++ Below is the configuration file for dpdk-21.11. For dpdk-19.11, replace `-a` with `-w`. ++``` ++[root@localhost ~]# cat /etc/gazelle/ltran.conf ++forward_kit="dpdk" ++forward_kit_args="-l 1 -a 0000:02:08.0 --socket-mem 1024,0,0,0 --huge-dir /mnt/hugepages --proc-type primary --legacy-mem --map-perfect --syslog daemon --file-prefix 08" ++ ++kni_switch=0 ++ ++dispatch_subnet="192.168.1.0" ++dispatch_subnet_length=8 ++dispatch_max_clients=30 ++unix_prefix="08" ++ ++bond_mode=1 ++bond_miimon=100 ++bond_mtu=1500 ++bond_ports="0x1" ++bond_macs="ff:ee:dd:cc:bb:aa" ++ ++tcp_conn_scan_interval=10 ++``` ++ ++``` ++[root@localhost ~]# cat /etc/gazelle/lstack2.conf ++dpdk_args=["-l", "3", "--socket-mem", "2048,0,0,0", "--huge-dir", "/mnt/hugepages-2M", "--proc-type", "primary", "--file-prefix", "18", "--legacy-mem", "--map-perfect"] ++ ++use_ltran=1 ++kni_switch=0 ++ ++low_power_mode=0 ++listen_shadow=1 ++unix_prefix="08" ++ ++num_cpus="2" ++ ++host_addr="192.168.1.3" ++mask_addr="255.255.255.0" ++gateway_addr="192.168.1.1" ++devices="ff:ee:dd:cc:bb:aa" ++``` ++ ++## Usage ++- The two processes can independently send and receive data packets using different network cards. 
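++
++A minimal launch sketch under the configurations above (`app1`/`app2` are placeholder application binaries, and `LSTACK_CONF_PATH` is the environment variable described in the user guide for pointing an lstack instance at a non-default configuration file):
++
++```sh
++# First process: lstack-only mode on enp2s7, reads /etc/gazelle/lstack.conf by default
++GAZELLE_BIND_PROCNAME=app1 LD_PRELOAD=/usr/lib64/liblstack.so ./app1 &
++
++# Second process: start ltran for enp2s8 first, then its lstack side with its own config
++ltran --config-file /etc/gazelle/ltran.conf &
++GAZELLE_BIND_PROCNAME=app2 LD_PRELOAD=/usr/lib64/liblstack.so LSTACK_CONF_PATH=/etc/gazelle/lstack2.conf ./app2 &
++```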
++scene +diff --git a/doc/netperf.md b/doc/netperf.md +new file mode 100644 +index 0000000..6fd9c16 +--- /dev/null ++++ b/doc/netperf.md +@@ -0,0 +1,137 @@ ++# Gazelle支持netperf性能测试 ++Netperf是一个网络性能测量工具,用于评估网络传输速度和延迟。它可以测试TCP和UDP协议的性能,并提供了多种测试模式和选项,以满足不同的测试需求。 ++gazelle已部分支持netperf测试,并持续适配及改进。 ++ ++## 支持情况说明 ++### 版本配套 ++lwip-2.1.3-115或之后版本:https://gitee.com/src-openeuler/lwip ++openeuler/gazelle 2024/02/02及之后版本:https://gitee.com/openeuler/gazelle master分支 ++netperf-2.7.0版本:https://gitee.com/src-openeuler/netperf ++ ++注:src-openEuler/gazelle暂未同步,同步后在此刷新支持netperf功能的版本号。 ++ ++### 测试范围 ++TCP_STREAM,测试tcp吞吐量 ++TCP_RR,测试tcp时延 ++注:目前TCP双端gazelle+物理机场景仅支持包长<1436(MTU) ++ ++UDP_STREAM,测试udp吞吐量 ++UDP_RR,测试udp时延 ++注:目前UDP相关测试仅支持包长<1436(MTU) ++ ++## 使用说明 ++### 环境配置 ++1、按照gazelle用户指南配置好环境后,yum install netperf或者通过源码安装netperf; ++``` ++gazelle用户指南:https://gitee.com/openeuler/gazelle/blob/master/doc/user-guide.md ++``` ++2、在/etc/gazelle/lstack.conf中,添加或修改配置项nonblock_mode=0; ++3、如果测试udp,需要在/etc/gazelle/lstack.conf中,添加或修改配置项udp_enable=1。 ++ ++### 测试命令 ++1、server ++``` ++GAZELLE_BIND_PROCNAME=netserver LD_PRELOAD=/usr/lib64/liblstack.so netserver -D -f -4 -L ip1 ++``` ++注:ip1与/etc/gazelle/lstack.conf一致;-D为取消后台运行;-f为取消执行fork,不支持fork;-4为ipv4 ++ ++2、client ++``` ++#TCP_STREAM ++GAZELLE_BIND_PROCNAME=netperf LD_PRELOAD=/usr/lib64/liblstack.so netperf -H ip1 -L ip2 -t TCP_STREAM -l 10 -- -m 1024 ++``` ++注:ip1为server ip;ip2为client ip;-t为指定测试类型;-l为指定测试时长;--为指定更多可配置参数;-m为*_STREAM相关测试类型指定包长 ++ ++``` ++#TCP_RR + 时延测试 ++GAZELLE_BIND_PROCNAME=netperf LD_PRELOAD=/usr/lib64/liblstack.so netperf -H ip1 -L ip2 -t TCP_RR -l 10 -- -r 1024 -O MIN_LATENCY,MAX_LATENCY,MEAN_LATENCY,P99_LATENCY,STDDEV_LATENCY,THROUGHPUT,THROUGHPUT_UNITS ++``` ++注:-r为*_RR相关测试类型指定包长;-O为指定需要show的测试结果 ++ ++``` ++#UDP_STREAM+ 时延测试 ++GAZELLE_BIND_PROCNAME=netperf LD_PRELOAD=/usr/lib64/liblstack.so netperf -H ip1 -L ip2 -t UDP_STREAM -l 10 -- -m 1024 ++``` ++注:ip1为server ip;ip2为client ip;-t为指定测试类型;-l为指定测试时长;--为指定更多可配置参数;-m为*_STREAM相关测试类型指定包长 ++ ++``` ++#UDP_RR + 时延测试 ++GAZELLE_BIND_PROCNAME=netperf LD_PRELOAD=/usr/lib64/liblstack.so netperf -H ip1 -L ip2 -t UDP_RR -l 10 -- -r 1024 -O MIN_LATENCY,MAX_LATENCY,MEAN_LATENCY,P99_LATENCY,STDDEV_LATENCY,THROUGHPUT,THROUGHPUT_UNITS ++``` ++注:-r为*_RR相关测试类型指定包长;-O为指定需要show的测试结果 ++ ++## 使用示例 ++以下示例因测试环境不同数据差异较大,仅供参考测试方法。 ++### server ++``` ++[root@openEuler ~]# GAZELLE_BIND_PROCNAME=netserver LD_PRELOAD=/usr/lib64/liblstack.so netserver -D -4 -f -L 192.168.1.36 ++#省略启动日志 ++Starting netserver with host '192.168.1.36' port '12865' and family AF_INET ++``` ++ ++### client ++#### TCP_STREAM ++``` ++[root@openEuler lstack]# GAZELLE_BIND_PROCNAME=netperf LD_PRELOAD=/usr/lib64/liblstack.so netperf -H 192.168.1.36 -L 192.168.1.34 -t TCP_STREAM -l 2 -- -m 1024 ++#省略启动日志 ++MIGRATED TCP STREAM TEST from 192.168.1.34 () port 0 AF_INET to 192.168.1.36 () port 0 AF_INET ++Recv Send Send ++Socket Socket Message Elapsed ++Size Size Size Time Throughput ++bytes bytes bytes secs. 
10^6bits/sec ++ ++131072 16384 1024 2.00 9824.61 ++``` ++ ++#### TCP_RR+时延测试 ++``` ++[root@openEuler lstack]# GAZELLE_BIND_PROCNAME=netperf LD_PRELOAD=/usr/lib64/liblstack.so netperf -H 192.168.1.36 -L 192.168.1.34 -t TCP_RR -l 2 -- -r 1024 -O MIN_LATENCY,MAX_LATENCY,MEAN_LATENCY,P99_LATENCY,STDDEV_LATENCY,THROUGHPUT,THROUGHPUT_UNITS ++#省略启动日志 ++MIGRATED TCP REQUEST/RESPONSE TEST from 192.168.1.34 () port 0 AF_INET to 192.168.1.36 () port 0 AF_INET : first burst 0 ++Minimum Maximum Mean 99th Stddev Throughput Throughput ++Latency Latency Latency Percentile Latency Units ++Microseconds Microseconds Microseconds Latency Microseconds ++ Microseconds ++4 227 8.94 28 1.68 60085.02 Trans/s ++ ++``` ++ ++#### UDP_STREAM ++``` ++[root@openEuler lstack]# GAZELLE_BIND_PROCNAME=netperf LD_PRELOAD=/usr/lib64/liblstack.so netperf -H 192.168.1.36 -L 192.168.1.34 -t UDP_STREAM -l 5 -- -m 1024 ++#省略启动日志 ++MIGRATED UDP STREAM TEST from 192.168.1.34 () port 0 AF_INET to 192.168.1.36 () port 0 AF_INET ++Socket Message Elapsed Messages ++Size Size Time Okay Errors Throughput ++bytes bytes secs # # 10^6bits/sec ++ ++212992 1024 5.00 344561 0 564.42 ++212992 5.00 344533 564.38 ++ ++``` ++ ++#### UDP_RR+时延测试 ++``` ++[root@openEuler lstack]# GAZELLE_BIND_PROCNAME=netperf LD_PRELOAD=/usr/lib64/liblstack.so netperf -H 192.168.1.36 -L 192.168.1.34 -t UDP_RR -l 10 -- -r 1024 -O MIN_LATENCY,MAX_LATENCY,MEAN_LATENCY,P99_LATENCY,STDDEV_LATENCY,THROUGHPUT,THROUGHPUT_UNITS ++#省略启动日志 ++MIGRATED UDP REQUEST/RESPONSE TEST from 192.168.1.34 () port 0 AF_INET to 192.168.1.36 () port 0 AF_INET : first burst 0 ++Minimum Maximum Mean 99th Stddev Throughput Throughput ++Latency Latency Latency Percentile Latency Units ++Microseconds Microseconds Microseconds Latency Microseconds ++ Microseconds ++77 7293 176.77 885 193.87 5646.59 Trans/s ++ ++``` ++ ++## 常见问题及解决方案 ++### 常见问题1 ++启动client后,sever退出,client报错如下 ++``` ++Resource temporarily unavailable ++netperf: remote error 11 ++``` ++原因:/etc/gazelle/lstack.conf中没有配置nonblock_mode=0 ++ ++### 常见问题2 ++测试TCP后,想要测试UDP出现错误或者数据异常 ++原因:/etc/gazelle/lstack.conf中没有配置udp_enable=1 +diff --git a/doc/netperf_en.md b/doc/netperf_en.md +new file mode 100644 +index 0000000..b7bfd99 +--- /dev/null ++++ b/doc/netperf_en.md +@@ -0,0 +1,135 @@ ++# Gazelle supports netperf performance testing ++ ++Netperf is a network performance measurement tool used to assess network throughput and latency. It can test the performance of TCP and UDP protocols and offers various test modes and options to meet different testing needs. Gazelle has partially supported netperf testing and continues to adapt and improve. ++ ++## Support Overview ++### Version Compatibility ++lwip-2.1.3-115 or later versions: [lwIP Repository](https://gitee.com/src-openeuler/lwip) ++openeuler/gazelle versions from 2024/02/02 onwards: [Gazelle Repository](https://gitee.com/openeuler/gazelle) (master branch) ++netperf-2.7.0 version: [Netperf Repository](https://gitee.com/src-openeuler/netperf) ++ ++Note: src-openEuler/gazelle is not currently synchronized. The version number supporting netperf functionality will be updated upon synchronization. ++ ++### Test Scope ++TCP_STREAM: Tests TCP throughput ++TCP_RR: Tests TCP latency ++Note: Currently, TCP bidirectional gazelle + physical machine scenarios only support packet lengths <1436 (MTU) ++ ++UDP_STREAM: Tests UDP throughput ++UDP_RR: Tests UDP latency ++Note: Currently, UDP-related tests only support packet lengths <1436 (MTU) ++ ++## Usage Instructions ++### Environment Setup ++1. 
Follow the gazelle user guide to configure the environment properly, then install netperf via yum or install netperf from source code. ++``` ++Gazelle User Guide: [link](https://gitee.com/openeuler/gazelle/blob/master/doc/user-guide.md) ++``` ++2. Add or modify the configuration item nonblock_mode=0 in /etc/gazelle/lstack.conf. ++3. If testing UDP, add or modify the configuration item udp_enable=1 in /etc/gazelle/lstack.conf. ++ ++### Testing Commands ++1. Server ++``` ++GAZELLE_BIND_PROCNAME=netserver LD_PRELOAD=/usr/lib64/liblstack.so netserver -D -f -4 -L ip1 ++``` ++Note: ip1 should be consistent with /etc/gazelle/lstack.conf; -D for running in the foreground; -f for not forking (fork not supported); -4 for IPv4. ++ ++2. Client ++``` ++# TCP_STREAM ++GAZELLE_BIND_PROCNAME=netperf LD_PRELOAD=/usr/lib64/liblstack.so netperf -H ip1 -L ip2 -t TCP_STREAM -l 10 -- -m 1024 ++``` ++Note: ip1 is the server IP; ip2 is the client IP; -t specifies the test type; -l specifies the test duration; -- specifies more configurable parameters; -m specifies the packet length for *_STREAM related test types. ++ ++``` ++# TCP_RR + Latency Test ++GAZELLE_BIND_PROCNAME=netperf LD_PRELOAD=/usr/lib64/liblstack.so netperf -H ip1 -L ip2 -t TCP_RR -l 10 -- -r 1024 -O MIN_LATENCY,MAX_LATENCY,MEAN_LATENCY,P99_LATENCY,STDDEV_LATENCY,THROUGHPUT,THROUGHPUT_UNITS ++``` ++Note: -r specifies the packet length for *_RR related test types; -O specifies the test results to display. ++ ++``` ++# UDP_STREAM + Latency Test ++GAZELLE_BIND_PROCNAME=netperf LD_PRELOAD=/usr/lib64/liblstack.so netperf -H ip1 -L ip2 -t UDP_STREAM -l 10 -- -m 1024 ++``` ++Note: ip1 is the server IP; ip2 is the client IP; -t specifies the test type; -l specifies the test duration; -- specifies more configurable parameters; -m specifies the packet length for *_STREAM related test types. ++ ++``` ++# UDP_RR + Latency Test ++GAZELLE_BIND_PROCNAME=netperf LD_PRELOAD=/usr/lib64/liblstack.so netperf -H ip1 -L ip2 -t UDP_RR -l 10 -- -r 1024 -O MIN_LATENCY,MAX_LATENCY,MEAN_LATENCY,P99_LATENCY,STDDEV_LATENCY,THROUGHPUT,THROUGHPUT_UNITS ++``` ++Note: -r specifies the packet length for *_RR related test types; -O specifies the test results to display. ++ ++## Usage Example ++The following examples are for reference and testing purposes. Actual data may vary due to different testing environments. ++ ++### Server ++``` ++[root@openEuler ~]# GAZELLE_BIND_PROCNAME=netserver LD_PRELOAD=/usr/lib64/liblstack.so netserver -D -4 -f -L 192.168.1.36 ++# Start-up logs omitted ++Starting netserver with host '192.168.1.36' port '12865' and family AF_INET ++``` ++ ++### Client ++#### TCP_STREAM ++``` ++[root@openEuler lstack]# GAZELLE_BIND_PROCNAME=netperf LD_PRELOAD=/usr/lib64/liblstack.so netperf -H 192.168.1.36 -L 192.168.1.34 -t TCP_STREAM -l 2 -- -m 1024 ++# Start-up logs omitted ++MIGRATED TCP STREAM TEST from 192.168.1.34 () port 0 AF_INET to 192.168.1.36 () port 0 AF_INET ++Recv Send Send ++Socket Socket Message Elapsed ++Size Size Size Time Throughput ++bytes bytes bytes secs. 
10^6bits/sec ++ ++131072 16384 1024 2.00 9824.61 ++``` ++ ++#### TCP_RR+Latency Test ++``` ++[root@openEuler lstack]# GAZELLE_BIND_PROCNAME=netperf LD_PRELOAD=/usr/lib64/liblstack.so netperf -H 192.168.1.36 -L 192.168.1.34 -t TCP_RR -l 2 -- -r 1024 -O MIN_LATENCY,MAX_LATENCY,MEAN_LATENCY,P99_LATENCY,STDDEV_LATENCY,THROUGHPUT,THROUGHPUT_UNITS ++# Start-up logs omitted ++MIGRATED TCP REQUEST/RESPONSE TEST from 192.168.1.34 () port 0 AF_INET to 192.168.1.36 () port 0 AF_INET : first burst 0 ++Minimum Maximum Mean 99th Stddev Throughput Throughput ++Latency Latency Latency Percentile Latency Units ++Microseconds Microseconds Microseconds Latency Microseconds ++ Microseconds ++4 227 8.94 28 1.68 60085.02 Trans/s ++``` ++ ++#### UDP_STREAM ++``` ++[root@openEuler lstack]# GAZELLE_BIND_PROCNAME=netperf LD_PRELOAD=/usr/lib64/liblstack.so netperf -H 192.168.1.36 -L 192.168.1.34 -t UDP_STREAM -l 5 -- -m 1024 ++# Start-up logs omitted ++MIGRATED UDP STREAM TEST from 192.168.1.34 () port 0 AF_INET to 192.168.1.36 () port 0 AF_INET ++Socket Message Elapsed Messages ++Size Size Time Okay Errors Throughput ++bytes bytes secs # # 10^6bits/sec ++ ++212992 1024 5.00 344561 0 564.42 ++212992 5.00 344533 564.38 ++``` ++ ++#### UDP_RR+Latency Test ++``` ++[root@openEuler lstack]# GAZELLE_BIND_PROCNAME=netperf LD_PRELOAD=/usr/lib64/liblstack.so netperf -H 192.168.1.36 -L 192.168.1.34 -t UDP_RR -l 10 -- -r 1024 -O MIN_LATENCY,MAX_LATENCY,MEAN_LATENCY,P99_LATENCY,STDDEV_LATENCY,THROUGHPUT,THROUGHPUT_UNITS ++# Start-up logs omitted ++MIGRATED UDP REQUEST/RESPONSE TEST from 192.168.1.34 () port 0 AF_INET to 192.168.1.36 () port 0 AF_INET : first burst 0 ++Minimum Maximum Mean 99th Stddev Throughput Throughput ++Latency Latency Latency Percentile Latency Units ++Microseconds Microseconds Microseconds Latency Microseconds ++ Microseconds ++77 7293 176.77 885 193.87 5646.59 Trans/s ++``` ++ ++## Common Issues and Solutions ++### Common Issue 1 ++After starting the client, the server exits, and the client reports the following error: ++``` ++Resource temporarily unavailable ++netperf: remote error 11 ++``` ++Reason: nonblock_mode=0 is not configured in /etc/gazelle/lstack.conf. ++ ++### Common Issue 2 ++After testing TCP, encountering errors or abnormal data when testing UDP. ++Reason: udp_enable=1 is not configured in /etc/gazelle/lstack.conf. +diff --git a/doc/pdump.md b/doc/pdump.md +new file mode 100644 +index 0000000..1466662 +--- /dev/null ++++ b/doc/pdump.md +@@ -0,0 +1,90 @@ ++# 使用pdump抓包 ++pdump作为gazelle的从进程,共享网卡驱动收发队列,获取到报文并按pcap格式写入文件,该文件可用wireshark查看。 ++ ++openEuler的dpdk软件包中提供了gazelle-pdump命令对Gazelle抓包。 ++ ++## 常用参数说明: ++ ++|选项|参数值示例|说明| ++|:---|:---|:---| ++|--file-prefix|gazelle|指定主进程共享目录位置
需要和lstack.conf或ltran.conf中的-file-prefix保持一致| ++|device_id|0000:01:00.0|抓包网卡的PCI地址
需要和dpdk-devbind -s命令查询的结果一致| ++|rx-dev|/root/capture-rx.pcap|网卡接收的数据包存放的文件位置| ++|tx-dev|/root/capture-tx.pcap|网卡发送的数据包存放的文件位置,如果它配置的路径与rx-dev的相同,则文件中会同时包含收发的数据包| ++ ++更多参数解释: ++``` ++gazelle-pdump --help ++``` ++ ++## 使用示例: ++``` ++gazelle-pdump --file-prefix gazelle -- --pdump 'device_id=0000:01:00.0,queue=*,rx-dev=/root/capture-rx.pcap,tx-dev=/root/capture-tx.pcap' ++``` ++scene ++ ++使用ctrl+C停止抓包,抓包完成后数据包将保存为pcap文件格式,它可以被`tcpdump`命令进一步处理。 ++ ++scene ++ ++下面的命令将过滤数据包中源IP为`192.168.1.10`的数据包: ++``` ++tcpdump -r /root/capture.pcap src host 192.168.1.10 -w /root/filter-capture.pcap ++``` ++ ++## 常见问题及解决方案: ++### 报错信息1 ++``` ++Device 0000:02:08.0 is not driven by the primary process ++EAL: Requested device 0000:02:08.0 cannot be used ++Port 1 MAC: 02 70 63 61 70 00 ++PDUMP: client request for pdump enable/disable failed ++PDUMP: client request for pdump enable/disable failed ++PDUMP: client request for pdump enable/disable failed ++``` ++原因:lstack/ltran使用的网卡和gazelle-pdump指定的网卡不一致,需要重新检查device_id参数。 ++ ++### 报错信息2 ++``` ++EAL: Multi-process socket /var/run/dpdk/(null)/mp_socket_3884565_28c50010577fe ++EAL: failed to send to (/var/run/dpdk/(null)/mp_socket) due to Connection refused ++EAL: Fail to send request /var/run/dpdk/(null)/mp_socket:bus_vdev_mp ++vdev_scan(): Failed to request vdev from primary ++EAL: Selected IOVA mode 'PA' ++EAL: Probing VFIO support... ++EAL: failed to send to (/var/run/dpdk/(null)/mp_socket) due to Connection refused ++EAL: Cannot send message to primary ++EAL: error allocating rte services array ++EAL: FATAL: rte_service_init() failed ++EAL: rte_service_init() failed ++``` ++原因:gazelle-pdump指定的共享内存路径中没有相应的文件,需要重新检查--file-prefix参数。 ++ ++### 报错信息3 ++``` ++EAL: Failed to hotplug add device ++EAL: Error - exiting with code: 1 ++ Cause: vdev creation failed:create_mp_ring_vdev:700 ++``` ++原因:`lstack`/`ltran`没有链接到`librte_pmd_pcap.so(dpdk-19.11)`/`librte_net_pcap.so(dpdk-21.11)`动态库,需要重新检查编译的Makefile,解决方法如下。 ++- 修改dpdk.spec加入PDUMP的编译选项,重新编译dpdk ++%build ++``` ++sed -ri 's,(LIBRTE_PMD_PCAP=).*,\1y,' %{target}/.config ++``` ++ ++ ++- 使用gazelle相同的编译参数编译dpdk-pdump ++pdump的源文件位于dpdk的目录下:`app/pdump/main.c ` ++ ++- 示例编译命令(基于dpdk-19.11): ++``` ++cc -O0 -g -fno-strict-aliasing -mssse3 -I/usr/include/dpdk -fstack-protector-strong -Werror -Wall -fPIC -c -o main.o main.c ++``` ++ ++- 示例链接命令(基于dpdk-19.11): ++``` ++cc -lm -lpthread -lrt -lnuma -lconfig -lboundscheck -Wl,--whole-archive /usr/lib64/librte_pci.so /usr/lib64/librte_bus_pci.so /usr/lib64/librte_cmdline.so /usr/lib64/librte_hash.so /usr/lib64/librte_mempool.so /usr/lib64/librte_mempool_ring.so /usr/lib64/librte_timer.so /usr/lib64/librte_eal.so /usr/lib64/librte_ring.so /usr/lib64/librte_mbuf.so /usr/lib64/librte_kni.so /usr/lib64/librte_gro.so /usr/lib64/librte_pmd_ixgbe.so /usr/lib64/librte_kvargs.so /usr/lib64/librte_pmd_hinic.so /usr/lib64/librte_pmd_i40e.so /usr/lib64/librte_pmd_virtio.so /usr/lib64/librte_bus_vdev.so /usr/lib64/librte_net.so /usr/lib64/librte_ethdev.so /usr/lib64/librte_pdump.so /usr/lib64//librte_pmd_pcap.so main.o -Wl,--no-whole-archive -Wl,--whole-archive -Wl,--no-whole-archive -o gazelle-pdump ++``` ++ ++保证链接命令中的动态库和liblstack.so使用的编译选项是相同的,就是Makefile里的LIBRTE_LIB库 +diff --git a/doc/pdump_en.md b/doc/pdump_en.md +new file mode 100644 +index 0000000..95856de +--- /dev/null ++++ b/doc/pdump_en.md +@@ -0,0 +1,95 @@ ++# Packet Capture with pdump ++ ++pdump acts as a subprocess of gazelle, sharing the network card driver receive and transmit queues to capture packets and write them to a file in pcap format. 
This file can be viewed with Wireshark. ++ ++The openEuler dpdk package provides the gazelle-pdump command for capturing packets with Gazelle. ++ ++## Commonly used parameters: ++ ++| Option | Example Value | Description | ++| ------------- | ---------------- | ----------- | ++| --file-prefix | gazelle | Specifies the shared directory location of the main process. It needs to match the value in lstack.conf or ltran.conf. | ++| device_id | 0000:01:00.0 | PCI address of the capture network card. This needs to match the result from the dpdk-devbind -s command. | ++| rx-dev | /root/capture-rx.pcap | Location where the received data packets from the network card are stored. | ++| tx-dev | /root/capture-tx.pcap | Location where the transmitted data packets from the network card are stored. If this path is the same as rx-dev, the file will contain both received and transmitted data packets. | ++ ++For more parameter explanations: ++``` ++gazelle-pdump --help ++``` ++ ++## Usage example: ++``` ++gazelle-pdump --file-prefix gazelle -- --pdump 'device_id=0000:01:00.0,queue=*,rx-dev=/root/capture-rx.pcap,tx-dev=/root/capture-tx.pcap' ++``` ++![scene](images/pdump.png) ++ ++Use ctrl+C to stop the capture. Once the capture is complete, the data packets will be saved in pcap file format, which can be further processed using the `tcpdump` command. ++ ++![scene](images/pdump-tcpdump.png) ++ ++The following command filters packets with a source IP of `192.168.1.10`: ++``` ++tcpdump -r /root/capture.pcap src host 192.168.1.10 -w /root/filter-capture.pcap ++``` ++ ++## Common issues and solutions: ++ ++### Error message 1 ++``` ++Device 0000:02:08.0 is not driven by the primary process ++EAL: Requested device 0000:02:08.0 cannot be used ++Port 1 MAC: 02 70 63 61 70 00 ++PDUMP: client request for pdump enable/disable failed ++PDUMP: client request for pdump enable/disable failed ++PDUMP: client request for pdump enable/disable failed ++``` ++Cause: The network card used by lstack/ltran and the one specified in gazelle-pdump do not match. Check the device_id parameter. ++ ++### Error message 2 ++``` ++EAL: Multi-process socket /var/run/dpdk/(null)/mp_socket_3884565_28c50010577fe ++EAL: failed to send to (/var/run/dpdk/(null)/mp_socket) due to Connection refused ++EAL: Fail to send request /var/run/dpdk/(null)/mp_socket:bus_vdev_mp ++vdev_scan(): Failed to request vdev from primary ++EAL: Selected IOVA mode 'PA' ++EAL: Probing VFIO support... ++EAL: failed to send to (/var/run/dpdk/(null)/mp_socket) due to Connection refused ++EAL: Cannot send message to primary ++EAL: error allocating rte services array ++EAL: FATAL: rte_service_init() failed ++EAL: rte_service_init() failed ++``` ++Cause: The specified shared memory path for gazelle-pdump does not contain the appropriate files. Check the --file-prefix parameter. ++ ++### Error message 3 ++``` ++EAL: Failed to hotplug add device ++EAL: Error - exiting with code: 1 ++ Cause: vdev creation failed:create_mp_ring_vdev:700 ++``` ++Cause: `lstack`/`ltran` is not linked to the dynamic library `librte_pmd_pcap.so(dpdk-19.11)`/`librte_net_pcap.so(dpdk-21.11)`. Check the compiled Makefile. ++ ++Here’s how to address it: ++- Modify dpdk.spec to include PDUMP compilation options and recompile dpdk. ++ ++ `%build` ++ ``` ++ sed -ri 's,(LIBRTE_PMD_PCAP=).*,\1y,' %{target}/.config ++ ``` ++ ++- Compile dpdk-pdump using the same compilation options as gazelle. ++ ++ The source file for pdump is located in the dpdk directory: `app/pdump/main.c`. 
++ ++- Example compilation command (based on dpdk-19.11): ++ ``` ++ cc -O0 -g -fno-strict-aliasing -mssse3 -I/usr/include/dpdk -fstack-protector-strong -Werror -Wall -fPIC -c -o main.o main.c ++ ``` ++ ++- Example linking command (based on dpdk-19.11): ++ ``` ++ cc -lm -lpthread -lrt -lnuma -lconfig -lboundscheck -Wl,--whole-archive /usr/lib64/librte_pci.so /usr/lib64/librte_bus_pci.so /usr/lib64/librte_cmdline.so /usr/lib64/librte_hash.so /usr/lib64/librte_mempool.so /usr/lib64/librte_mempool_ring.so /usr/lib64/librte_timer.so /usr/lib64/librte_eal.so /usr/lib64/librte_ring.so /usr/lib64/librte_mbuf.so /usr/lib64/librte_kni.so /usr/lib64/librte_gro.so /usr/lib64/librte_pmd_ixgbe.so /usr/lib64/librte_kvargs.so /usr/lib64/librte_pmd_hinic.so /usr/lib64/librte_pmd_i40e.so /usr/lib64/librte_pmd_virtio.so /usr/lib64/librte_bus_vdev.so /usr/lib64/librte_net.so /usr/lib64/librte_ethdev.so /usr/lib64/librte_pdump.so /usr/lib64//librte_pmd_pcap.so main.o -Wl,--no-whole-archive -Wl,--whole-archive -Wl,--no-whole-archive -o gazelle-pdump ++ ``` ++ ++Ensure that the dynamic libraries in the linking command and the compilation options used for `liblstack.so` are the same as those in the Makefile. +diff --git a/doc/programmer-guide.md b/doc/programmer-guide.md +new file mode 100644 +index 0000000..23e6e29 +--- /dev/null ++++ b/doc/programmer-guide.md +@@ -0,0 +1,168 @@ ++## 设计理念 ++ ++* 协议栈绑定独立的cpu、网卡队列,轮询模式收包,避免中断和调度开销。 ++* 协议栈使用独立的内存池和线程资源(线程化的全局变量、arp/udp/tcp表等),避免锁竞争和cache miss,线性可扩展。 ++* 请求通过网卡硬件均衡 (RSS / flow director),或者软件均衡 (hash table) ,分发流量到各协议栈。 ++* 提供标准POSIX API,应用零修改。 ++ ++ ++ ++## 线程模型 ++ ++### 1、分离线程模型 ++ ++![](images/programmer_分离线程模型.png) ++ ++* 适用场景:业务线程数量很多,支持fd跨线程使用。(通用场景) ++* 协议栈线程和业务线程分离:类似linux内核协议栈的软中断实现,收到请求时由协议栈唤醒业务线程。 ++* 高速无锁读写:业务线程`recv/send`通过无锁队列读写报文数据,不与协议栈产生锁竞争。其他控制面socket请求通过`rpc`发送到协议栈线程。 ++ ++### 2、共线程模型 ++ ++![](images/programmer_共线程模型.png) ++ ++* 适用场景:业务网络线程数量不多,数量固定。业务线程fd不跨线程使用。 ++* 协议栈和业务共线程:业务和协议栈在一个上下文运行,`poll/epoll`内执行协议栈轮询收包。 ++* 极致性能:独占cpu,不需要唤醒调度,但是业务处理时间长可能导致协议栈丢包。 ++* 各业务线程可能`listen`不同port、或者网卡队列数小于线程数,这时需要**流量均衡与转发**。 ++ ++ ++ ++## 多进程模式 ++ ++### 1、进程独占网卡 ++ ++![](images/programmer_sriov.png) ++ ++* `SR-IOV`网卡硬件虚拟化是一个应用普遍的技术,一个网卡PF可以虚拟出多个VF网卡,共享网卡带宽。 ++* PF/VF通过网卡硬件switch基于二层转发,网卡间转发会做DMA拷贝。因此各网卡可以分别绑定内核态、用户态驱动。 ++* 兼容内核协议栈,gazelle不支持的协议或者不需要加速的流量交给内核网卡处理,且不像dpdk kni一样有较高地性能损耗。 ++ ++```sh ++# 每个PF支持32队列,每个PF支持的VF是3个,VF队列数是1个 ++0000:7d:00.1 'HNS GE/10GE/25GE Network Controller a221' if=enp125s0f1 drv=hns3 unused=hclge ++ ++# 每个PF支持16队列,每个PF支持的VF是60个,VF队列数是4个 ++# 每个PF支持64队列,每个PF支持的VF是24个,VF队列数是8个 ++0000:03:00.0 'Hi1822 Family (4*25GE) 1822' if=enp3s0 drv=hinic unused= ++ ++# 每个PF支持63队列,每个PF支持的VF是8个,VF队列数是11个 ++0000:01:00.0 'MT27710 Family [ConnectX-4 Lx] 1015' if=enp1s0f0 drv=mlx5_core unused= *Active* ++ ++# 每个PF支持63队列,每个PF支持的VF是63个, ++0000:03:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection 10fb' if=enp3s0f0 drv=ixgbe unused= ++``` ++ ++### 2、进程共用网卡 ++ ++* 适用场景:网卡数量不够,需要多个进程共用一个网卡。但是进程隔离性可能较差。 ++* 各业务进程/线程可能`listen`不同port、或者网卡队列数小于线程数,这时需要**流量均衡与转发**。 ++ ++![](images/programmer_进程共用网卡.png) ++ ++当前ltran软件转发方案:隔离性好,性能差 ++ ++* 设计理念:进程隔离思想,业务进程重启互不影响。 ++* ltran作为独立的转发进程,收发线程额外占用cpu,单核收、单核发。 ++* ltran使用物理网卡,业务使用软件队列。 ++* 为了防止内存泄露,跨进程有报文拷贝。 ++ ++当前硬件转发方案:隔离性差,性能好 ++ ++* 各进程采用dpdk主从进程模式,共享一个大页内存地址空间,报文在进程间转发时不进行拷贝。 ++* 各进程由于共享大页地址空间,直接使用网卡队列,多进程间没有隔离。为防止内存泄露,必须将这些进程看作整体,同时启动、退出。 ++* Flow Director 硬件转发功能不普及。 ++* 网卡队列数量是固定,在初始化时确定,支持协议栈线程数量固定。 ++ ++### 3、流量均衡与转发 ++ ++设计目标: ++ ++* 融合“软件转发方案”和“硬件转发方案”。 ++* 去除中心转发节点和跨进程拷贝,进程异常退出不能泄露资源。 ++ 
++![](images/programmer_流量均衡与转发.png) ++ ++#### 软件转发方案 ++ ++* 不作为独立线程额外分配cpu,在协议栈网卡收包后执行。 ++* 基于dpdk hash表实现,支持`并发写并发读`。 ++* 各网卡硬件队列对应分配一个软件队列,通过软件队列分发报文到其他线程。软件队列采用`多生产者单消费者模式`。 ++ ++#### 内存回收方案 ++ ++需要一个管理节点,监控进程状态,在进程正常/异常退出时回收资源。 ++ ++启动一个协议栈线程: ++ ++* queue_alloc:申请一个queue_id,表示网卡硬件队列、软件转发队列,用于收发报文。 ++ ++* rule_add:在`connect/listen`时添加转发规则,执行`close`时删除转发规则。 ++ ++* memp_alloc:申请一系列memp(几十个左右),用于协议栈的定长结构体内存池。 ++ ++ 注意:需要创建一个`memp_list`存储协议栈线程的所有memp,用于释放。 ++ ++* mbuf_get:每个queue_id绑定了mbufpool,申请mbuf用于收发报文。 ++ ++ 注意:当发生**软件转发**或**进程间loopback**时,会导致mbuf跨进程传递。进程异常退出时需要回收mbuf。 ++ ++退出一个协议栈线程: ++ ++* queue_free:释放queue_id,此队列报文暂时会出现**丢包**。 ++ ++* rule_delete:遍历协议栈的`tcp连接表、udp连接表`,删除转发规则。 ++ ++* memp_free:遍历`memp_list`释放所有的memp。 ++ ++* mbuf_put:通过`rte_mempool_walk()`遍历mbufpool,通过`rte_mempool_obj_iter`遍历mbufpool的所有mbuf,回收未释放的mbuf。 ++ ++ ++ ++## mbuf内存管理 ++ ++![img](images/programmer_mbufpool.png) ++ ++报文数据流: ++ ++* `#1`接收报文 ++ ++ 每个网卡队列绑定了一个L1_mbufpool,由网卡驱动申请mbuf,gazelle释放mbuf。 ++ ++* `#2`发送报文 ++ ++ gazelle从L1_mbufpool申请mbuf,网卡发送结束后释放mbuf。 ++ ++* `#3`发送报文 ++ ++ 当连接数较多内存紧张时,即L1_mbufpool内存池低于`low watermark`,创建L2_mbufpool内存池用于发送。 ++ ++mbufpool竞争问题: ++ ++* 不再使用per connection cache,启用per cpu cache,参考`rte_mempool_default_cache()` ++* mbufpool内存池低于`low watermark`时,关闭`rte_lcore_id()`标记,扫描回收per cpu cache的内存。 ++ ++ ++ ++## DT测试 ++ ++![](images/programmer_veth.png) ++ ++当前问题: ++ ++* 要求物理网卡/虚拟网卡、两台主机;用例自动化程度低;只覆盖了“ltran软件转发方案”。 ++ ++设计目标: ++ ++* 不依赖硬件环境(一台主机,不要求物理/虚拟网卡),一键自动化部署,快速(10分钟内出结果)。作为开发门禁。 ++* 设计一种用户态虚拟网卡`user-veth`,使用软件队列模拟网卡硬件队列,`rte_softrss`模拟rss均衡。 ++* 设计一种虚拟网桥`user-vbridge`,使用hash模拟二层转发。 ++* 当测试网卡启动、硬件offload、网络性能时,才要求物理网卡。如果只有一台主机,可通过`SR-IOV`虚拟出多个VF网卡测试。 ++ ++ ++ ++## 性能调优 - TODO ++ ++* `rte_trace` ++* `rte_metrics` ++ +diff --git a/doc/programmer-guide_en.md b/doc/programmer-guide_en.md +new file mode 100644 +index 0000000..e8afd49 +--- /dev/null ++++ b/doc/programmer-guide_en.md +@@ -0,0 +1,130 @@ ++## Design Principles ++ ++* Bind the protocol stack to independent CPUs and NIC queues, using polling mode for packet reception to avoid interrupt and scheduling overhead. ++* Utilize separate memory pools and thread resources for the protocol stack (e.g., thread-local variables, ARP/UDP/TCP tables) to minimize lock contention and cache misses, ensuring linear scalability. ++* Distribute requests across NIC hardware (RSS/flow director) or software (hash table) for load balancing, directing traffic to different protocol stacks. ++* Provide a standard POSIX API with zero modifications required for applications. ++ ++## Thread Models ++ ++### 1. Separate Thread Model ++ ++![](images/programmer_分离线程模型_en.png) ++ ++* Suitable for scenarios with a large number of business threads, supporting cross-thread FD usage (general use case). ++* Separate protocol stack threads from business threads: similar to Linux kernel's soft interrupt implementation, the protocol stack wakes up business threads upon receiving requests. ++* High-speed lock-free read/write: business threads `recv/send` data through lock-free queues, avoiding lock contention with the protocol stack. Other control plane socket requests are sent to protocol stack threads via RPC. ++ ++### 2. Shared Thread Model ++ ++![](images/programmer_共线程模型_en.png) ++ ++* Suitable for scenarios where there are not many business network threads, and the number of threads is fixed. FDs are not shared across threads. 
++* Protocol stack and business threads share the same context: business and protocol stack run in the same context, executing packet polling within `poll/epoll`. ++* Ultimate performance: exclusive CPU usage without the need for wake-up scheduling, but long business processing times may lead to packet loss in the protocol stack. ++* Each business thread may `listen` on different ports or have fewer NIC queues than threads, requiring **traffic balancing and forwarding**. ++ ++## Multi-Process Mode ++ ++### 1. Process Exclusive NIC ++ ++![](images/programmer_sriov.png) ++ ++* SR-IOV NIC hardware virtualization is a widely used technology, where a NIC PF can virtualize multiple VF NICs, sharing NIC bandwidth. ++* PF/VF virtualization is based on hardware switch for layer 2 forwarding, with DMA copying between NICs. Hence, each NIC can be bound separately to kernel-space and user-space drivers. ++* Compatible with kernel protocol stack, allowing non-accelerated or unsupported protocols to be handled by the kernel NIC without significant performance loss. ++ ++### 2. Process Shared NIC ++ ++* Suitable for scenarios where the number of NICs is limited, and multiple processes need to share a single NIC. However, process isolation may be poor. ++* Each business process/thread may `listen` on different ports or have fewer NIC queues than threads, requiring **traffic balancing and forwarding**. ++ ++![](images/programmer_进程共用网卡_en.png) ++ ++Current software forwarding solution (ltran): good isolation, poor performance ++ ++* Design philosophy: process isolation, no impact on other business processes upon restart. ++* ltran acts as an independent forwarding process, with separate CPU usage for receive and transmit threads, one core each. ++* ltran uses a physical NIC, while business processes use software queues. ++* To prevent memory leaks, there is packet copying across processes. ++ ++Current hardware forwarding solution: poor isolation, good performance ++ ++* Each process adopts the dpdk master-slave process mode, sharing a large page memory address space, with packets forwarded between processes without copying. ++* Due to shared large page address space, there is no isolation between processes. To prevent memory leaks, these processes must be treated as a whole, started and stopped together. ++* Flow Director hardware forwarding functionality is not universal. ++* The number of NIC queues is fixed and determined during initialization, supporting a fixed number of protocol stack threads. ++ ++### 3. Traffic Balancing and Forwarding ++ ++Design goals: ++ ++* Integrate software and hardware forwarding solutions. ++* Remove central forwarding nodes and cross-process copying, preventing resource leaks upon process abnormal termination. ++ ++![](images/programmer_流量均衡与转发_en.png) ++ ++#### Software Forwarding Solution ++ ++* No additional CPU allocation as an independent thread; executed after packet reception in the protocol stack. ++* Implemented based on dpdk hash tables, supporting concurrent read and write. ++* Each NIC hardware queue corresponds to a software queue, distributing packets to other threads via software queues. Software queues adopt a multiple producers/single consumer model. ++ ++#### Memory Reclamation Solution ++ ++Requires a management node to monitor process states and reclaim resources upon normal/abnormal process termination. 
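++
++As a rough illustration of the reclaim step (the full start/exit flows are listed below), this minimal C sketch frees the mbufs left behind by a dead lstack process with DPDK's mempool iterators. `mbuf_owned_by()` is a hypothetical helper: ownership tracking is stack-specific, and it must match only in-flight mbufs, never ones already back in their pool.
++
++```c
++#include <stdbool.h>
++#include <sys/types.h>
++#include <rte_mbuf.h>
++#include <rte_mempool.h>
++
++static pid_t dead_pid; /* set by the management node when a process exits */
++
++/* Hypothetical: true only for in-flight mbufs still held by pid. */
++extern bool mbuf_owned_by(const struct rte_mbuf *m, pid_t pid);
++
++static void reclaim_obj(struct rte_mempool *mp, void *opaque, void *obj, unsigned idx)
++{
++    struct rte_mbuf *m = obj;
++    (void)mp; (void)opaque; (void)idx;
++    if (mbuf_owned_by(m, dead_pid))
++        rte_pktmbuf_free(m); /* return the leaked mbuf to its pool */
++}
++
++static void reclaim_pool(struct rte_mempool *mp, void *arg)
++{
++    rte_mempool_obj_iter(mp, reclaim_obj, arg); /* visit every object in this pool */
++}
++
++/* The mbuf_put step below: walk all mempools, reclaim what the dead process held. */
++void reclaim_mbufs_of(pid_t pid)
++{
++    dead_pid = pid;
++    rte_mempool_walk(reclaim_pool, NULL);
++}
++```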
++ ++Upon starting a protocol stack thread: ++ ++* queue_alloc: allocate a queue_id, representing the NIC hardware queue and software forwarding queue, used for packet reception and transmission. ++* rule_add: add forwarding rules during `connect/listen`, and delete them during `close`. ++* memp_alloc: allocate a series of memp (around dozens) for the protocol stack's fixed-length structure memory pool. ++ Note: a `memp_list` is created to store all memp of protocol stack threads for release. ++* mbuf_get: each queue_id is bound to an mbufpool, used for packet reception and transmission. ++ ++Upon exiting a protocol stack thread: ++ ++* queue_free: release queue_id, causing temporary packet loss for this queue. ++* rule_delete: traverse the `tcp connection table` and `udp connection table` of the protocol stack to delete forwarding rules. ++* memp_free: traverse `memp_list` to release all memp. ++* mbuf_put: traverse mbufpool using `rte_mempool_walk()` and `rte_mempool_obj_iter` to reclaim unreleased mbufs. ++ ++## mbuf Memory Management ++ ++![img](images/programmer_mbufpool_en.png) ++ ++Packet Data Flow: ++ ++* `#1` Packet Reception: ++ Each NIC queue is bound to an L1_mbufpool. Mbufs are allocated by the NIC driver and released by Gazelle. ++ ++* `#2` Packet Transmission: ++ Gazelle requests mbufs from the L1_mbufpool, and after the NIC finishes transmission, the mbufs are released. ++ ++* `#3` Packet Transmission with Memory Pressure: ++ When memory is tight due to a high number of connections, i.e., when the L1_mbufpool falls below the low watermark, an L2_mbufpool is created for transmission. ++ ++Mbufpool Contention Issues: ++ ++* Per-connection cache is replaced with per-CPU cache, as referenced in `rte_mempool_default_cache()`. ++* When the mbufpool falls below the low watermark, the `rte_lcore_id()` flag is turned off, and a scan is conducted to reclaim memory from per-CPU caches. ++ ++## DT Testing ++ ++![](images/programmer_veth.png) ++ ++Current Issues: ++ ++* Requires physical/virtual NICs and two hosts; low level of test automation; only covers the "ltran software forwarding solution". ++ ++Design Objectives: ++ ++* No dependency on hardware environment (single host, no requirement for physical/virtual NICs), one-click automation deployment, rapid results (within 10 minutes). Serve as a development barrier. ++* Design a user-space virtual NIC, `user-veth`, simulating NIC hardware queues using software queues, with `rte_softrss` simulating RSS load balancing. ++* Design a virtual bridge, `user-vbridge`, using hashing to simulate layer 2 forwarding. ++* Test physical NIC startup, hardware offload, and network performance only when necessary. If only one host is available, multiple VF NICs can be virtualized using `SR-IOV` for testing. 
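++
++Since `user-veth` is only a design objective, here is a hedged sketch of one possible shape: DPDK rings stand in for NIC hardware queues, and `rte_softrss()` picks the target queue the way hardware RSS would. Queue count, ring size, and all names are illustrative, not an existing Gazelle API.
++
++```c
++#include <stdio.h>
++#include <stdint.h>
++#include <rte_mbuf.h>
++#include <rte_ring.h>
++#include <rte_thash.h>
++
++#define VETH_QUEUE_NUM 4
++static struct rte_ring *veth_rxq[VETH_QUEUE_NUM]; /* one ring per emulated HW RX queue */
++static uint8_t rss_key[40]; /* fill with the same RSS key the stack under test uses */
++
++int veth_init(void)
++{
++    char name[32];
++    for (int i = 0; i < VETH_QUEUE_NUM; i++) {
++        snprintf(name, sizeof(name), "user_veth_rxq%d", i);
++        veth_rxq[i] = rte_ring_create(name, 1024, SOCKET_ID_ANY,
++                                      RING_F_SP_ENQ | RING_F_SC_DEQ);
++        if (veth_rxq[i] == NULL)
++            return -1;
++    }
++    return 0;
++}
++
++/* "Transmit": hash the IPv4 tuple (src ip, dst ip, ports) and enqueue to the RSS queue. */
++int veth_tx(struct rte_mbuf *m, uint32_t tuple[3])
++{
++    uint32_t hash = rte_softrss(tuple, 3, rss_key);
++    return rte_ring_enqueue(veth_rxq[hash % VETH_QUEUE_NUM], m);
++}
++
++/* Polled by a protocol stack thread in place of rte_eth_rx_burst(). */
++uint16_t veth_rx_burst(uint16_t qid, struct rte_mbuf **pkts, uint16_t n)
++{
++    return rte_ring_dequeue_burst(veth_rxq[qid], (void **)pkts, n, NULL);
++}
++```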
++ ++## Performance Tuning - TODO ++ ++* `rte_trace` ++* `rte_metrics` +diff --git a/doc/releasenote.md b/doc/releasenote.md +new file mode 100644 +index 0000000..9422b29 +--- /dev/null ++++ b/doc/releasenote.md +@@ -0,0 +1,7 @@ ++Gazelle ++ ++# Gazelle特性变更 ++## 2024-01-01 ++- 多进程模式衰退,后续将不会对该特性提供维护支持,并在将来的版本中移除; ++ 后续将会有新的多进程模式方案提供,请期待; ++ 期间有多进程使用需求的,可以尝试使用SR-IOV网卡硬件虚拟化组网模式; +diff --git a/doc/support.md b/doc/support.md +index 985e08c..2e4d578 100644 +--- a/doc/support.md ++++ b/doc/support.md +@@ -1,37 +1,36 @@ +-Gazelle +- +-# 用户态协议栈Gazelle支持posix接口列表 +-- int32_t epoll_create1(int32_t flags) +-- int32_t epoll_create(int32_t size) +-- int32_t epoll_ctl(int32_t epfd, int32_t op, int32_t fd, struct epoll_event* event) +-- int32_t epoll_wait(int32_t epfd, struct epoll_event* events, int32_t maxevents, int32_t timeout) +-- int32_t fcntl64(int32_t s, int32_t cmd, ...) +-- int32_t fcntl(int32_t s, int32_t cmd, ...) +-- int32_t ioctl(int32_t s, int32_t cmd, ...) +-- int32_t accept(int32_t s, struct sockaddr *addr, socklen_t *addrlen) +-- int32_t accept4(int32_t s, struct sockaddr *addr, socklen_t *addrlen, int32_t flags) +-- int32_t bind(int32_t s, const struct sockaddr *name, socklen_t namelen) +-- int32_t connect(int32_t s, const struct sockaddr *name, socklen_t namelen) +-- int32_t listen(int32_t s, int32_t backlog) +-- int32_t getpeername(int32_t s, struct sockaddr *name, socklen_t *namelen) +-- int32_t getsockname(int32_t s, struct sockaddr *name, socklen_t *namelen) +-- int32_t getsockopt(int32_t s, int32_t level, int32_t optname, void *optval, socklen_t *optlen) +-- int32_t setsockopt(int32_t s, int32_t level, int32_t optname, const void *optval, socklen_t optlen) +-- int32_t socket(int32_t domain, int32_t type, int32_t protocol) +-- ssize_t read(int32_t s, void *mem, size_t len) +-- ssize_t readv(int32_t s, const struct iovec *iov, int iovcnt) +-- ssize_t write(int32_t s, const void *mem, size_t size) +-- ssize_t writev(int32_t s, const struct iovec *iov, int iovcnt) +-- ssize_t recv(int32_t sockfd, void *buf, size_t len, int32_t flags) +-- ssize_t send(int32_t sockfd, const void *buf, size_t len, int32_t flags) +-- ssize_t recvmsg(int32_t s, struct msghdr *message, int32_t flags) +-- ssize_t sendmsg(int32_t s, const struct msghdr *message, int32_t flags) +-- int32_t close(int32_t s) +-- int32_t poll(struct pollfd *fds, nfds_t nfds, int32_t timeout) +-- int32_t ppoll(struct pollfd *fds, nfds_t nfds, const struct timespec *tmo_p, const sigset_t *sigmask) +-- int32_t sigaction(int32_t signum, const struct sigaction *act, struct sigaction *oldact) +-- pid_t fork(void) +- +-# 用户态协议栈Gazelle支持应用列表 +-- mysql 8.0.20 +-- ceph client 14.2.8 +\ No newline at end of file ++Gazelle ++ ++# Gazelle支持posix接口列表 ++- int32_t epoll_create1(int32_t flags) ++- int32_t epoll_create(int32_t size) ++- int32_t epoll_ctl(int32_t epfd, int32_t op, int32_t fd, struct epoll_event* event) ++- int32_t epoll_wait(int32_t epfd, struct epoll_event* events, int32_t maxevents, int32_t timeout) ++- int32_t fcntl64(int32_t s, int32_t cmd, ...) ++- int32_t fcntl(int32_t s, int32_t cmd, ...) ++- int32_t ioctl(int32_t s, int32_t cmd, ...) 
++- int32_t accept(int32_t s, struct sockaddr *addr, socklen_t *addrlen) ++- int32_t accept4(int32_t s, struct sockaddr *addr, socklen_t *addrlen, int32_t flags) ++- int32_t bind(int32_t s, const struct sockaddr *name, socklen_t namelen) ++- int32_t connect(int32_t s, const struct sockaddr *name, socklen_t namelen) ++- int32_t listen(int32_t s, int32_t backlog) ++- int32_t getpeername(int32_t s, struct sockaddr *name, socklen_t *namelen) ++- int32_t getsockname(int32_t s, struct sockaddr *name, socklen_t *namelen) ++- int32_t getsockopt(int32_t s, int32_t level, int32_t optname, void *optval, socklen_t *optlen) ++- int32_t setsockopt(int32_t s, int32_t level, int32_t optname, const void *optval, socklen_t optlen) ++- int32_t socket(int32_t domain, int32_t type, int32_t protocol) ++- ssize_t read(int32_t s, void *mem, size_t len) ++- ssize_t readv(int32_t s, const struct iovec *iov, int iovcnt) ++- ssize_t write(int32_t s, const void *mem, size_t size) ++- ssize_t writev(int32_t s, const struct iovec *iov, int iovcnt) ++- ssize_t recv(int32_t sockfd, void *buf, size_t len, int32_t flags) ++- ssize_t send(int32_t sockfd, const void *buf, size_t len, int32_t flags) ++- ssize_t recvmsg(int32_t s, struct msghdr *message, int32_t flags) ++- ssize_t sendmsg(int32_t s, const struct msghdr *message, int32_t flags) ++- int32_t close(int32_t s) ++- int32_t poll(struct pollfd *fds, nfds_t nfds, int32_t timeout) ++- int32_t ppoll(struct pollfd *fds, nfds_t nfds, const struct timespec *tmo_p, const sigset_t *sigmask) ++- int32_t sigaction(int32_t signum, const struct sigaction *act, struct sigaction *oldact) ++ ++# Gazelle支持应用列表 ++- mysql 8.0.20 ++- ceph client 14.2.8 +diff --git a/doc/support_en.md b/doc/support_en.md +new file mode 100644 +index 0000000..5e9fdce +--- /dev/null ++++ b/doc/support_en.md +@@ -0,0 +1,36 @@ ++Gazelle ++ ++# Gazelle Supported POSIX Interface List ++- int32_t epoll_create1(int32_t flags) ++- int32_t epoll_create(int32_t size) ++- int32_t epoll_ctl(int32_t epfd, int32_t op, int32_t fd, struct epoll_event* event) ++- int32_t epoll_wait(int32_t epfd, struct epoll_event* events, int32_t maxevents, int32_t timeout) ++- int32_t fcntl64(int32_t s, int32_t cmd, ...) ++- int32_t fcntl(int32_t s, int32_t cmd, ...) ++- int32_t ioctl(int32_t s, int32_t cmd, ...) 
++- int32_t accept(int32_t s, struct sockaddr *addr, socklen_t *addrlen) ++- int32_t accept4(int32_t s, struct sockaddr *addr, socklen_t *addrlen, int32_t flags) ++- int32_t bind(int32_t s, const struct sockaddr *name, socklen_t namelen) ++- int32_t connect(int32_t s, const struct sockaddr *name, socklen_t namelen) ++- int32_t listen(int32_t s, int32_t backlog) ++- int32_t getpeername(int32_t s, struct sockaddr *name, socklen_t *namelen) ++- int32_t getsockname(int32_t s, struct sockaddr *name, socklen_t *namelen) ++- int32_t getsockopt(int32_t s, int32_t level, int32_t optname, void *optval, socklen_t *optlen) ++- int32_t setsockopt(int32_t s, int32_t level, int32_t optname, const void *optval, socklen_t optlen) ++- int32_t socket(int32_t domain, int32_t type, int32_t protocol) ++- ssize_t read(int32_t s, void *mem, size_t len) ++- ssize_t readv(int32_t s, const struct iovec *iov, int iovcnt) ++- ssize_t write(int32_t s, const void *mem, size_t size) ++- ssize_t writev(int32_t s, const struct iovec *iov, int iovcnt) ++- ssize_t recv(int32_t sockfd, void *buf, size_t len, int32_t flags) ++- ssize_t send(int32_t sockfd, const void *buf, size_t len, int32_t flags) ++- ssize_t recvmsg(int32_t s, struct msghdr *message, int32_t flags) ++- ssize_t sendmsg(int32_t s, const struct msghdr *message, int32_t flags) ++- int32_t close(int32_t s) ++- int32_t poll(struct pollfd *fds, nfds_t nfds, int32_t timeout) ++- int32_t ppoll(struct pollfd *fds, nfds_t nfds, const struct timespec *tmo_p, const sigset_t *sigmask) ++- int32_t sigaction(int32_t signum, const struct sigaction *act, struct sigaction *oldact) ++ ++# Gazelle Supported Applications List ++- mysql 8.0.20 ++- ceph client 14.2.8 +diff --git a/doc/user-guide.md b/doc/user-guide.md +new file mode 100644 +index 0000000..eb3bac2 +--- /dev/null ++++ b/doc/user-guide.md +@@ -0,0 +1,308 @@ ++# Gazelle用户指南 ++ ++## 安装 ++配置openEuler的yum源,直接使用yum命令安装 ++```sh ++#dpdk >= 21.11-2 ++yum install dpdk ++yum install libconfig ++yum install numactl ++yum install libboundscheck ++yum install libpcap ++yum install gazelle ++``` ++ ++## 使用方法 ++配置运行环境,使用Gazelle加速应用程序步骤如下: ++### 1. 使用root权限安装ko ++根据实际情况选择使用ko,提供虚拟网口、绑定网卡到用户态功能。 ++若使用虚拟网口功能,则使用rte_kni.ko ++ ++``` sh ++modprobe rte_kni carrier="on" ++``` ++ ++配置NetworkManager不托管kni网卡 ++``` ++[root@localhost ~]# cat /etc/NetworkManager/conf.d/99-unmanaged-devices.conf ++[keyfile] ++unmanaged-devices=interface-name:kni ++[root@localhost ~]# systemctl reload NetworkManager ++``` ++ ++ ++网卡从内核驱动绑为用户态驱动的ko,根据实际情况选择一种。mlx4和mlx5网卡不需要绑定vfio或uio驱动。 ++``` sh ++#若IOMMU能使用 ++modprobe vfio-pci ++ ++#若IOMMU不能使用,且VFIO支持noiommu ++modprobe vfio enable_unsafe_noiommu_mode=1 ++modprobe vfio-pci ++ ++#其它情况 ++modprobe igb_uio ++``` ++ ++ ++### 2. dpdk绑定网卡 ++将网卡绑定到步骤1选择的驱动。为用户态网卡驱动提供网卡资源访问接口。 ++``` sh ++#使用vfio-pci ++dpdk-devbind -b vfio-pci enp3s0 ++ ++#使用igb_uio ++dpdk-devbind -b igb_uio enp3s0 ++``` ++ ++### 3. 大页内存配置 ++Gazelle使用大页内存提高效率。使用root权限配置系统预留大页内存,可选用任意页大小。因每页内存都需要一个fd,使用内存较大时,建议使用1G的大页,避免占用过多fd。 ++根据实际情况,选择一种页大小,配置足够的大页内存即可。配置大页操作如下: ++``` sh ++#配置2M大页内存:在node0上配置 2M * 1024 = 2G ++echo 1024 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages ++ ++#配置1G大页内存:在node0上配置1G * 5 = 5G ++echo 5 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages ++ ++#查看配置结果 ++grep Huge /proc/meminfo ++``` ++ ++### 4. 
挂载大页内存 ++创建两个目录,分别给lstack的进程、ltran进程访问大页内存使用。操作步骤如下: ++``` sh ++mkdir -p /mnt/hugepages-ltran ++mkdir -p /mnt/hugepages-lstack ++chmod -R 700 /mnt/hugepages-ltran ++chmod -R 700 /mnt/hugepages-lstack ++# 注: /mnt/hugepages-ltran 和 /mnt/hugepages-lstack 必须挂载同样pagesize的大页内存。 ++mount -t hugetlbfs nodev /mnt/hugepages-ltran -o pagesize=2M ++mount -t hugetlbfs nodev /mnt/hugepages-lstack -o pagesize=2M ++``` ++ ++### 5. 应用程序使用Gazelle ++有两种使用Gazelle方法,根据需要选择其一 ++- 重新编译应用程序,链接Gazelle的库 ++修改应用makefile文件链接liblstack.so,示例如下: ++``` ++#makefile中添加Gazelle的Makefile ++-include /etc/gazelle/lstack.Makefile ++ ++#编译添加LSTACK_LIBS变量 ++gcc test.c -o test ${LSTACK_LIBS} ++``` ++ ++- 使用LD_PRELOAD加载Gazelle的库 ++GAZELLE_BIND_PROCNAME环境变量指定进程名,LD_PRELOAD指定Gazelle库路径 ++``` ++GAZELLE_BIND_PROCNAME=test LD_PRELOAD=/usr/lib64/liblstack.so ./test ++``` ++ ++### 6. 配置文件 ++- lstack.conf用于指定lstack的启动参数,默认路径为/etc/gazelle/lstack.conf, 配置文件参数如下 ++ ++|选项|参数格式|说明| ++|:---|:---|:---| ++|dpdk_args|--socket-mem(必需)
--huge-dir(必需)
--proc-type(必需)
--legacy-mem
--map-perfect
-d
--iova-mode
等|dpdk初始化参数,参考dpdk说明
对于没有链接到liblstack.so的PMD,必须使用 -d 加载,比如librte_net_mlx5.so。
当使用非root用户启动,且dpdk版本>23.11,需要指定iova模式为va,即 --iova-mode va| ++|use_ltran| 0/1 | 是否使用ltran | ++|listen_shadow| 0/1 | 是否使用影子fd监听,单个listen线程多个协议栈线程时使用 | ++|num_cpus|"0,2,4 ..."|lstack线程绑定的cpu编号,编号的数量为lstack线程个数(小于等于网卡多队列数量)。可按NUMA选择cpu| ++|app_bind_numa|0/1|应用的epoll和poll线程是否绑定到协议栈所在的numa,缺省值是1,即绑定| ++|app_exclude_cpus|"7,8,9 ..."|应用的epoll和poll线程不会绑定到的cpu编号,app_bind_numa = 1时才生效| ++|low_power_mode|0/1|是否开启低功耗模式,暂不支持| ++|kni_switch|0/1|rte_kni开关,默认为0。只有不使用ltran时才能开启 | ++|unix_prefix|"string"|gazelle进程间通信使用的unix socket文件前缀字符串,默认为空,和需要通信的ltran.conf的unix_prefix或gazellectl的-u参数配置一致。不能含有特殊字符,最大长度为128。| ++|host_addr|"192.168.xx.xx"|协议栈的IP地址,必须和redis-server配置
文件里的“bind”字段保存一致。| ++|mask_addr|"255.255.xx.xx"|掩码地址| ++|gateway_addr|"192.168.xx.1"|网关地址| ++|devices|"aa:bb:cc:dd:ee:ff"|网卡通信的mac地址,需要与ltran.conf的bond_macs配置一致;在lstack bond1模式下,可指定bond1的主接口,取值为bond_slave_mac之一| ++|send_connect_number|4|设置为正整数,表示每次协议栈循环中发包处理的连接个数| ++|read_connect_number|4|设置为正整数,表示每次协议栈循环中收包处理的连接个数| ++|rpc_number|4|设置为正整数,表示每次协议栈循环中rpc消息处理的个数| ++|nic_read_num|128|设置为正整数,表示每次协议栈循环中从网卡读取的数据包的个数| ++|tcp_conn_count|1500|tcp的最大连接数,该参数乘以mbuf_count_per_conn是初始化时申请的mbuf池大小,配置过小会启动失败,tcp_conn_count * mbuf_count_per_conn * 2048字节不能大于大页大小 | ++|mbuf_count_per_conn|170|每个tcp连接需要的mbuf个数,该参数乘以tcp_conn_count是初始化时申请的mbuf地址池大小,配置过小会启动失败,tcp_conn_count * mbuf_count_per_conn * 2048字节不能大于大页大小| ++|nic_rxqueue_size|4096|网卡接收队列深度,范围512-8192,缺省值是4096| ++|nic_txqueue_size|2048|网卡发送队列深度,范围512-8192,缺省值是2048| ++|nic_vlan_mode|-1|vlan模式开关,变量值为vlanid,取值范围-1~4094,-1关闭,缺省值是-1| ++|bond_mode|n|bond模式,目前支持ACTIVE_BACKUP/8023AD/ALB三种模式,对应的取值是1/4/6;当取值为-1或者NULL时,表示未配置bond| ++|bond_slave_mac|"aa:bb:cc:dd:ee:ff;dd:aa:cc:dd:ee:ff"|用于组bond的两个子口的mac地址| ++|bond_miimon|n|链路监控时间,单位为ms,取值范围为1到2^31 - 1,缺省值为10ms| ++|| ++|flow_bifurcation|0/1|流量分叉开关(替代kni方案),通过gazelle将不支持处理的报文转发到内核,缺省值是0,即关闭| ++ ++lstack.conf示例: ++``` conf ++dpdk_args=["--socket-mem", "2048,0,0,0", "--huge-dir", "/mnt/hugepages-lstack", "--proc-type", "primary", "--legacy-mem", "--map-perfect"] ++ ++use_ltran=1 ++kni_switch=0 ++ ++low_power_mode=0 ++ ++num_cpus="2,22" ++ ++host_addr="192.168.1.10" ++mask_addr="255.255.255.0" ++gateway_addr="192.168.1.1" ++devices="aa:bb:cc:dd:ee:ff" ++ ++send_connect_number=4 ++read_connect_number=4 ++rpc_number=4 ++nic_read_num=128 ++tcp_conn_count=1500 ++mbuf_count_per_conn=170 ++``` ++ ++- ltran.conf用于指定ltran启动的参数,默认路径为/etc/gazelle/ltran.conf。使用ltran时,lstack.conf内配置use_ltran=1,配置参数如下: ++ ++|功能分类|选项|参数格式|说明| ++|:---|:---|:---|:---| ++|kit|forward_kit|"dpdk"|指定网卡收发模块。
保留字段,目前未使用。| ++||forward_kit_args|-l
--socket-mem(必需)
--huge-dir(必需)
--proc-type(必需)
--legacy-mem(必需)
--map-perfect(必需)
-d
等|dpdk初始化参数,参考dpdk说明。
注:--map-perfect为扩展特性,用于防止dpdk占用多余的地址空间,保证ltran有额外的地址空间分配给lstack。
对于没有链接到ltran的PMD,必须使用 -d 加载,比如librte_net_mlx5.so。
-l绑定的CPU核不要和lstack绑定的CPU重复,否则性能可能会急剧下降。
| ++|kni|kni_switch|0/1|rte_kni开关,默认为0| ++|unix|unix_prefix|"string"|gazelle进程间通信使用的unix socket文件前缀字符串,默认为空,和需要通信的lstack.conf的unix_prefix或gazellectl的-u参数配置一致| ++|dispatcher|dispatch_max_clients|n|ltran支持的最大client数。
1、多进程单线程场景,支持的lstack实例数不大于32,每lstack实例有1个网络线程
2、单进程多线程场景,支持的1个lstack实例,lstack实例的网络线程数不大于32| ++||dispatch_subnet|192.168.xx.xx|子网掩码,表示ltran能识别的IP所在子网网段。参数为样例,子网按实际值配置。| ++||dispatch_subnet_length|n|子网长度,表示ltran能识别的子网长度,例如length为4时,192.168.1.1-192.168.1.16| ++|bond|bond_mode|n|bond模式,目前只支持Active Backup(Mode1),取值为1| ++||bond_miimon|n|bond链路监控时间,单位为ms,取值范围为1到2^64 - 1 - (1000 * 1000)| ++||bond_ports|"0xaa"|使用的dpdk网卡,0x1表示第一块| ++||bond_macs|"aa:bb:cc:dd:ee:ff"|绑定的网卡mac地址,需要跟kni的mac地址保持一致| ++||bond_mtu|n|最大传输单元,默认是1500,不能超过1500,最小值为68,不能低于68| ++ ++ltran.conf示例: ++``` conf ++forward_kit_args="-l 0,1 --socket-mem 1024,0,0,0 --huge-dir /mnt/hugepages-ltran --proc-type primary --legacy-mem --map-perfect --syslog daemon" ++forward_kit="dpdk" ++ ++kni_switch=0 ++ ++dispatch_max_clients=30 ++dispatch_subnet="192.168.1.0" ++dispatch_subnet_length=8 ++ ++bond_mode=1 ++bond_mtu=1500 ++bond_miimon=100 ++bond_macs="aa:bb:cc:dd:ee:ff" ++bond_ports="0x1" ++ ++tcp_conn_scan_interval=10 ++``` ++### 7. 启动应用程序 ++- 启动ltran进程 ++单进程且网卡支持多队列,则直接使用网卡多队列分发报文到各线程,不启动ltran进程,lstack.conf的use_ltran配置为0. ++启动ltran时不使用-config-file指定配置文件,则使用默认路径/etc/gazelle/ltran.conf ++``` sh ++ltran --config-file ./ltran.conf ++``` ++- 启动应用程序 ++启动应用程序前不使用环境变量LSTACK_CONF_PATH指定配置文件,则使用默认路径/etc/gazelle/lstack.conf ++``` sh ++export LSTACK_CONF_PATH=./lstack.conf ++LD_PRELOAD=/usr/lib64/liblstack.so GAZELLE_BIND_PROCNAME=redis-server redis-server redis.conf ++``` ++ ++### 8. API ++Gazelle wrap应用程序POSIX接口,应用程序无需修改代码。 ++ ++### 9. 调测命令 ++- 不使用ltran模式时不支持gazellectl ltran xxx命令 ++- -u参数指定gazelle进程间通信的unix socket前缀,和需要通信的ltran.conf或lstack.conf的unix_prefix配置一致。 ++- 对于udp连接,目前gazellectl lstack xxx 命令目前仅支持无LSTACK_OPTIONS参数的。 ++``` ++Usage: gazellectl [-h | help] ++ or: gazellectl ltran {quit | show} [LTRAN_OPTIONS] [time] [-u UNIX_PREFIX] ++ or: gazellectl lstack show {ip | pid} [LSTACK_OPTIONS] [time] [-u UNIX_PREFIX] ++ ++ quit ltran process exit ++ ++ where LTRAN_OPTIONS := ++ show ltran all statistics ++ -r, rate show ltran statistics per second ++ -i, instance show ltran instance register info ++ -b, burst show ltran NIC packet len per second ++ -t, table {socktable | conntable} show ltran sock or conn table ++ -l, latency show ltran latency ++ ++ where LSTACK_OPTIONS := ++ show lstack all statistics ++ -r, rate show lstack statistics per second ++ -s, snmp show lstack snmp ++ -c, connetct show lstack connect ++ -l, latency show lstack latency ++ -x, xstats show lstack xstats ++ -k, nic-features show state of protocol offload and other features ++ -a, aggregatin [time] show lstack send/recv aggregation ++ set: ++ loglevel {error | info | debug} set lstack loglevel ++ lowpower {0 | 1} set lowpower enable ++ [time] measure latency time default 1S ++``` ++ ++**抓包工具** ++gazelle使用的网卡由dpdk接管,因此普通的tcpdump无法抓到gazelle的数据包。作为替代,gazelle使用dpdk-tools软件包中提供的gazelle-pdump作为数据包捕获工具,它使用dpdk的多进程模式和lstack/ltran进程共享内存。在ltran模式下,gazelle-pdump只能抓取和网卡直接通信的ltran的数据包,通过tcpdump的数据包过滤,可以过滤特定lstack的数据包。 ++[详细使用方法](https://gitee.com/openeuler/gazelle/blob/master/doc/pdump.md) ++ ++### 10. 使用注意 ++#### 1. dpdk配置文件的位置 ++如果是root用户,dpdk启动后的配置文件将会放到/var/run/dpdk目录下; ++如果是非root用户,dpdk配置文件的路径将由环境变量XDG_RUNTIME_DIR决定; ++- 如果XDG_RUNTIME_DIR为空,dpdk配置文件放到/tmp/dpdk目录下; ++- 如果XDG_RUNTIME_DIR不为空,dpdk配置文件放到变量XDG_RUNTIME_DIR下; ++- 注意有些机器会默认设置XDG_RUNTIME_DIR ++ ++#### 2. 
retbleed漏洞补丁影响gazelle性能 ++- 内核在5.10.0-60.57.0.85版本合入retbleed漏洞补丁,该补丁导致X86架构下gazelle性能下降,可以在启动参数内增加**retbleed=off mitigations=off** 来规避此CVE带来的性能损耗,用户可以根据自身产品特性来选择是否规避,出于安全考虑,默认是不规避的。 ++- 测试场景为发送端内核态,接收端用户态ltran模式,收发1024字节,性能由17000Mb/s下降至5000Mb/s。 ++- 受影响的版本包括openEuler-22.03-LTS(内核版本高于等于5.10.0-60.57.0.85) 及其之后的SP版本。 ++- 具体详情可参考:https:/gitee.com/openeuler/kernel/pulls/110 ++ ++## 约束限制 ++ ++使用 Gazelle 存在一些约束限制: ++#### 功能约束 ++- 不支持accept阻塞模式或者connect阻塞模式。 ++- 最多支持1500个TCP连接。 ++- 当前仅支持TCP、ICMP、ARP、IPv4 协议。 ++- 在对端ping Gazelle时,要求指定报文长度小于等于14792B。 ++- 不支持使用透明大页。 ++- ltran不支持使用多种类型的网卡混合组bond。 ++- ltran的bond1主备模式,只支持链路层故障主备切换(例如网线断开),不支持物理层故障主备切换(例如网卡下电、拔网卡)。 ++- 虚拟机网卡不支持多队列。 ++- bond4/6 模式需要开启混杂模式,该模式与硬件vlan过滤功能冲突,软件vlan过滤功能则需要关闭GRO功能,因此使用bond4/6 模式,若需要配置vlan过滤功能,请在交换机上进行配置。 ++#### 操作约束 ++- 提供的命令行、配置文件默认root权限。非root用户使用,需先提权以及修改文件所有者。 ++- 将用户态网卡绑回到内核驱动,必须先退出Gazelle。 ++- 大页内存不支持在挂载点里创建子目录重新挂载。 ++- ltran需要最低大页内存为1064MB。 ++- 每个应用实例协议栈线程最低大页内存为800MB。 ++- 仅支持64位系统。 ++- 构建x86版本的Gazelle使用了-march=native选项,基于构建环境的CPU(Intel® Xeon® Gold 5118 CPU @ 2.30GHz指令集进行优化。要求运行环境CPU支持 SSE4.2、AVX、AVX2、AVX-512 指令集。 ++- IP数据报重组的最大IP分片数为10(ping 最大包长14792B),TCP协议不使用IP分片。 ++- sysctl配置网卡rp_filter参数为1,否则可能不按预期使用Gazelle协议栈,而是依然使用内核协议栈。 ++- 不使用ltran模式,KNI网口不可配置只支持本地通讯使用,且需要启动前配置NetworkManager不管理KNI网卡。 ++- 虚拟KNI网口的IP及mac地址,需要与lstack.conf配置文件保持一致。 ++- 发送udp报文包长超过45952(32 * 1436)B时,需要将send_ring_size扩大为至少64个。 ++ ++## 风险提示 ++Gazelle可能存在如下安全风险,用户需要根据使用场景评估风险。 ++ ++**共享内存** ++- 现状 ++ 大页内存 mount 至 /mnt/hugepages-lstack 目录,链接 liblstack.so 的进程初始化时在 /mnt/hugepages-lstack 目录下创建文件,每个文件对应 2M 大页内存,并 mmap 这些文件。ltran 在收到 lstask 的注册信息后,根据大页内存配置信息也 mmap 目录下文件,实现大页内存共享。 ++ ltran 在 /mnt/hugepages-ltran 目录的大页内存同理。 ++- 当前消减措施 ++ 大页文件权限 600,只有 OWNER 用户才能访问文件,默认 root 用户,支持配置成其他用户; ++ 大页文件有 DPDK 文件锁,不能直接写或者映射。 ++- 风险点 ++ 属于同一用户的恶意进程模仿DPDK实现逻辑,通过大页文件共享大页内存,写破坏大页内存,导致Gazelle程序crash。建议用户下的进程属于同一信任域。 ++ ++**流量限制** ++Gazelle没有做流量限制,用户有能力发送最大网卡线速流量的报文到网络,可能导致网络流量拥塞。 ++ ++**进程仿冒** ++合法注册到ltran的两个lstack进程,进程A可仿冒进程B发送仿冒消息给ltran,修改ltran的转发控制信息,造成进程B通讯异常,进程B报文转发给进程A信息泄露等问题。建议lstack进程都为可信任进程。 +diff --git a/doc/user-guide_en.md b/doc/user-guide_en.md +new file mode 100644 +index 0000000..d3afc8a +--- /dev/null ++++ b/doc/user-guide_en.md +@@ -0,0 +1,307 @@ ++ # Gazelle User Guide ++ ++ ## Installation ++ Configure the OpenEuler yum repository and install directly using the yum command: ++ ```sh ++ # dpdk >= 21.11-2 ++ yum install dpdk ++ yum install libconfig ++ yum install numactl ++ yum install libboundscheck ++ yum install libpcap ++ yum install gazelle ++ ``` ++ ++ ## Usage ++ Configure the operating environment. The steps for accelerating applications using Gazelle are as follows: ++ ### 1. Install the ko with root privileges ++ Select the ko to use based on the actual situation. It provides virtual network ports and binds the network card to user mode functions. ++ If you use the virtual network port function, use rte_kni.ko ++ ++ ``` sh ++ modprobe rte_kni carrier="on" ++ ``` ++ ++ Configure NetworkManager not to manage the kni network card ++ ``` ++ [root@localhost ~]# cat /etc/NetworkManager/conf.d/99-unmanaged-devices.conf ++ [keyfile] ++ unmanaged-devices=interface-name:kni ++ [root@localhost ~]# systemctl reload NetworkManager ++ ``` ++ ++ Bind the network card from the kernel driver to the user mode driver ko. Select one according to the actual situation. mlx4 and mlx5 network cards do not need to bind vfio or uio drivers. 
++ ``` sh ++ # If IOMMU can be used ++ modprobe vfio-pci ++ ++ # If IOMMU cannot be used, and VFIO supports noiommu ++ modprobe vfio enable_unsafe_noiommu_mode=1 ++ modprobe vfio-pci ++ ++ # Other cases ++ modprobe igb_uio ++ ``` ++ ++ ### 2. Bind the network card to dpdk ++ Bind the network card to the driver selected in step 1. Provide network card resource access interface for user mode network card driver. ++ ``` sh ++ # Use vfio-pci ++ dpdk-devbind -b vfio-pci enp3s0 ++ ++ # Use igb_uio ++ dpdk-devbind -b igb_uio enp3s0 ++ ``` ++ ++ ### 3. Huge page memory configuration ++ Gazelle uses huge pages to improve efficiency. Use root privileges to configure the system to reserve huge pages, and any page size can be used. Since each page of memory requires an fd, when using a larger memory, it is recommended to use a 1G page to avoid occupying too many fds. ++ According to the actual situation, select a page size and configure enough huge pages. The steps to configure huge pages are as follows: ++ ``` sh ++ # Configure 2M huge pages: Configure 2M * 1024 = 2G on node0 ++ echo 1024 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages ++ ++ # Configure 1G huge pages: Configure 1G * 5 = 5G on node0 ++ echo 5 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages ++ ++ # View configuration results ++ grep Huge /proc/meminfo ++ ``` ++ ++ ### 4. Mount huge pages ++ Create two directories for the lstack process and ltran process to access huge pages. The operation steps are as follows: ++ ``` sh ++ mkdir -p /mnt/hugepages-ltran ++ mkdir -p /mnt/hugepages-lstack ++ chmod -R 700 /mnt/hugepages-ltran ++ chmod -R 700 /mnt/hugepages-lstack ++ # Note: /mnt/hugepages-ltran and /mnt/hugepages-lstack must mount huge pages of the same pagesize. ++ mount -t hugetlbfs nodev /mnt/hugepages-ltran -o pagesize=2M ++ mount -t hugetlbfs nodev /mnt/hugepages-lstack -o pagesize=2M ++ ``` ++ ++ ### 5. Application uses Gazelle ++ There are two ways to use Gazelle, choose one according to your needs ++ - Recompile the application and link the Gazelle library ++ Modify the application makefile file to link liblstack.so, as shown below: ++ ``` ++ # Add Gazelle's Makefile to makefile ++ -include /etc/gazelle/lstack.Makefile ++ ++ # Compile and add the LSTACK_LIBS variable ++ gcc test.c -o test ${LSTACK_LIBS} ++ ``` ++ ++ - Use LD_PRELOAD to load the Gazelle library ++ The GAZELLE_BIND_PROCNAME environment variable specifies the process name, and LD_PRELOAD specifies the Gazelle library path ++ ``` ++ GAZELLE_BIND_PROCNAME=test LD_PRELOAD=/usr/lib6 ++ ``` ++ ++ ### 6. Configuration File ++ - The lstack.conf file is used to specify the startup parameters for lstack, with the default path being /etc/gazelle/lstack.conf. The configuration file parameters are as follows: ++ ++ | Option | Parameter Format | Description | ++|:---|:---|:---| ++| dpdk_args | --socket-mem (mandatory)
--huge-dir (mandatory)
--proc-type (mandatory)
--legacy-mem
--map-perfect
-d
--iova-mode
etc. | DPDK initialization parameters, refer to DPDK documentation.
For PMDs not linked to liblstack.so, use -d to load them, such as librte_net_mlx5.so.
If Gazelle is started as a non-root user and the dpdk version is later than 23.11, the IOVA mode must be specified as VA, i.e. --iova-mode va | ++| use_ltran | 0/1 | Whether to use ltran. | ++| listen_shadow | 0/1 | Whether to use shadow FD listening. Used when there are multiple protocol stack threads for a single listen thread. | ++| num_cpus | "0,2,4 ..." | CPU numbers to which lstack threads are bound. The number of IDs corresponds to the number of lstack threads (which is less than or equal to the number of queues per NIC). CPUs can be selected according to NUMA. | ++| app_bind_numa | 0/1 | Whether epoll and poll threads of the application are bound to the NUMA where the protocol stack resides. Default is 1, meaning bound. | ++| app_exclude_cpus | "7,8,9 ..." | CPU numbers to which epoll and poll threads of the application are not bound. Only effective when app_bind_numa = 1. | ++| low_power_mode | 0/1 | Whether to enable low power mode. Currently not supported. | ++| kni_switch | 0/1 | rte_kni switch, default is 0. Can only be enabled when not using ltran. | ++| unix_prefix | "string" | Prefix string for inter-process communication using UNIX sockets. Default is empty and should be consistent with the unix_prefix in ltran.conf or the -u parameter of gazellectl. Cannot contain special characters, with a maximum length of 128. | ++| host_addr | "192.168.xx.xx" | IP address of the protocol stack, must be consistent with the "bind" field in the redis-server configuration file. | ++| mask_addr | "255.255.xx.xx" | Mask address. | ++| gateway_addr | "192.168.xx.1" | Gateway address. | ++| devices | "aa:bb:cc:dd:ee:ff" | MAC address for communication via the NIC, must be consistent with the bond_macs configuration in ltran.conf; in lstack bond1 mode, specify the primary interface of bond1, taking one of the bond_slave_mac values. | ++| send_connect_number | 4 | Positive integer indicating the number of connections processed per cycle in the protocol stack for packet transmission. | ++| read_connect_number | 4 | Positive integer indicating the number of connections processed per cycle in the protocol stack for packet reception. | ++| rpc_number | 4 | Positive integer indicating the number of RPC messages processed per cycle in the protocol stack. | ++| nic_read_num | 128 | Positive integer indicating the number of data packets read from the NIC per cycle in the protocol stack. | ++| tcp_conn_count | 1500 | Maximum number of TCP connections. This parameter multiplied by mbuf_count_per_conn is the size of the mbuf pool allocated during initialization. If set too small, startup may fail. tcp_conn_count * mbuf_count_per_conn * 2048 bytes must not exceed the size of the huge page. | ++| mbuf_count_per_conn | 170 | Number of mbufs required per TCP connection. This parameter multiplied by tcp_conn_count is the size of the mbuf address pool allocated during initialization. If set too small, startup may fail. tcp_conn_count * mbuf_count_per_conn * 2048 bytes must not exceed the size of the huge page. | ++| nic_rxqueue_size | 4096 | Depth of the NIC receive queue, range is 512-8192, default is 4096. | ++| nic_txqueue_size | 2048 | Depth of the NIC transmit queue, range is 512-8192, default is 2048. | ++| nic_vlan_mode | -1 | VLAN mode switch, variable value is the VLAN ID, range is -1 to 4094, -1 means disabled, default is -1. | ++| bond_mode | n | Bond mode, currently supports ACTIVE_BACKUP/8023AD/ALB, corresponding values are 1/4/6; when set to -1 or NULL, it means bond is not configured. 
| ++| bond_slave_mac | "aa:bb:cc:dd:ee:ff;dd:aa:cc:dd:ee:ff" | MAC addresses of the two sub-interfaces used to form a bond. | ++| bond_miimon | n | Link monitoring time in milliseconds, range is 1 to 2^31 - 1, default is 10ms. | ++|flow_bifurcation|0/1|flow bifurcation switch (alternative to KNI scheme), which forwards unsupported packets to the kernel through Gazelle. The default value is 0, which means it is turned off| ++ ++lstack.conf example: ++```conf ++dpdk_args=["--socket-mem", "2048,0,0,0", "--huge-dir", "/mnt/hugepages-lstack", "--proc-type", "primary", "--legacy-mem", "--map-perfect"] ++ ++use_ltran=1 ++kni_switch=0 ++ ++low_power_mode=0 ++ ++num_cpus="2,22" ++ ++host_addr="192.168.1.10" ++mask_addr="255.255.255.0" ++gateway_addr="192.168.1.1" ++devices="aa:bb:cc:dd:ee:ff" ++ ++send_connect_number=4 ++read_connect_number=4 ++rpc_number=4 ++nic_read_num=128 ++tcp_conn_count=1500 ++mbuf_count_per_conn=170 ++``` ++ ++ltran.conf is used to specify the parameters for starting ltran, with the default path being /etc/gazelle/ltran.conf. When using ltran, set use_ltran=1 in lstack.conf and configure the parameters as follows: ++ ++| Functional Category | Option | Parameter Format | Description | ++|:---|:---|:---|:---| ++| kit | forward_kit | "dpdk" | Specifies the NIC transmit/receive module.
Reserved field, currently not used. | ++|| forward_kit_args | -l
--socket-mem (required)
--huge-dir (required)
--proc-type (required)
--legacy-mem (required)
--map-perfect (required)
-d
etc. | DPDK initialization parameters, refer to DPDK documentation.
Note: --map-perfect is an extended feature used to prevent DPDK from occupying extra address space, ensuring ltran has additional address space allocated to lstack.
For PMDs not linked to ltran, -d must be used for loading, such as librte_net_mlx5.so.
The CPU cores bound by -l must not overlap with the CPU cores bound to lstack; otherwise, performance may drop drastically.
| ++| kni | kni_switch | 0/1 | rte_kni switch, default is 0 | ++| unix | unix_prefix | "string" | Unix socket file prefix string used for communication between gazelle processes, default is empty, consistent with unix_prefix in the communicating lstack.conf or the -u parameter of gazellectl | ++| dispatcher | dispatch_max_clients | n | Maximum number of clients supported by ltran.
1. In a multi-process single-thread scenario, the number of supported lstack instances is no more than 32, with one network thread per lstack instance.
2. In a single-process multi-thread scenario, only 1 lstack instance is supported, with the number of network threads per lstack instance no more than 32. | ++|| dispatch_subnet | 192.168.xx.xx | Subnet mask indicating the subnet segment where ltran can recognize IP addresses. The parameter is an example; configure the subnet according to the actual value. | ++|| dispatch_subnet_length | n | Subnet length indicating the length of the subnet that ltran can recognize. For example, when the length is 4, it covers IP addresses from 192.168.1.1 to 192.168.1.16 | ++| bond | bond_mode | n | Bonding mode, currently only supports Active Backup (Mode 1), with a value of 1 | ++|| bond_miimon | n | Bond link monitoring time, in milliseconds, with a range from 1 to 2^64 - 1 - (1000 * 1000) | ++|| bond_ports | "0xaa" | DPDK NICs used, where 0x1 represents the first one | ++|| bond_macs | "aa:bb:cc:dd:ee:ff" | MAC addresses bound to the NICs, must be consistent with the MAC address of kni | ++|| bond_mtu | n | Maximum transmission unit, default is 1500, cannot exceed 1500, minimum value is 68, cannot be lower than 68 | ++ ++ltran.conf example: ++```conf ++forward_kit_args="-l 0,1 --socket-mem 1024,0,0,0 --huge-dir /mnt/hugepages-ltran --proc-type primary --legacy-mem --map-perfect --syslog daemon" ++forward_kit="dpdk" ++ ++kni_switch=0 ++ ++dispatch_max_clients=30 ++dispatch_subnet="192.168.1.0" ++dispatch_subnet_length=8 ++ ++bond_mode=1 ++bond_mtu=1500 ++bond_miimon=100 ++bond_macs="aa:bb:cc:dd:ee:ff" ++bond_ports="0x1" ++ ++tcp_conn_scan_interval=10 ++``` ++### 7. Starting the Application ++- Starting the ltran process ++If it's a single process and the network card supports multiple queues, then directly use the network card's multiple queues to distribute packets to each thread, without starting the ltran process. Set use_ltran=0 in lstack.conf. ++If not specifying the configuration file with -config-file when starting ltran, it will use the default path /etc/gazelle/ltran.conf. ++``` sh ++ltran --config-file ./ltran.conf ++``` ++- Starting the application ++Before starting the application, if the LSTACK_CONF_PATH environment variable is not used to specify the configuration file, the default path /etc/gazelle/lstack.conf is used. ++``` sh ++export LSTACK_CONF_PATH=./lstack.conf ++LD_PRELOAD=/usr/lib64/liblstack.so GAZELLE_BIND_PROCNAME=redis-server redis-server redis.conf ++``` ++ ++### 8. API ++Gazelle wraps the POSIX interfaces of applications, so no code modifications are needed for the applications. ++ ++### 9. Debugging Commands ++- The gazellectl ltran xxx command is not supported when not using ltran mode. ++- The -u parameter specifies the unix socket prefix for Gazelle inter-process communication, which must be consistent with the unix_prefix setting in the communicating ltran.conf or lstack.conf. ++- For UDP connections, currently, the gazellectl lstack xxx command only supports without LSTACK_OPTIONS parameters. 
++``` ++Usage: gazellectl [-h | help] ++ or: gazellectl ltran {quit | show} [LTRAN_OPTIONS] [time] [-u UNIX_PREFIX] ++ or: gazellectl lstack show {ip | pid} [LSTACK_OPTIONS] [time] [-u UNIX_PREFIX] ++ ++ quit ltran process exit ++ ++ where LTRAN_OPTIONS := ++ show all ltran statistics ++ -r, rate show ltran statistics per second ++ -i, instance show ltran instance register info ++ -b, burst show ltran NIC packet length per second ++ -t, table {socktable | conntable} show ltran sock or conn table ++ -l, latency show ltran latency ++ ++ where LSTACK_OPTIONS := ++ show all lstack statistics ++ -r, rate show lstack statistics per second ++ -s, snmp show lstack snmp ++ -c, connect show lstack connect ++ -l, latency show lstack latency ++ -x, xstats show lstack xstats ++ -k, nic-features show state of protocol offload and other features ++ -a, aggregation [time] show lstack send/recv aggregation ++ set: ++ loglevel {error | info | debug} set lstack log level ++ lowpower {0 | 1} set low power mode ++ [time] measure latency time, default 1S ++``` ++ ++**Packet Capture Tool** ++The network cards used by Gazelle are managed by DPDK, so traditional tcpdump cannot capture packets from Gazelle. Instead, Gazelle uses the gazelle-pdump tool from the dpdk-tools package for packet capture. This tool uses DPDK's multi-process mode and shares memory with lstack/ltran processes. In ltran mode, gazelle-pdump can only capture packets that communicate directly with the network card. By using tcpdump's packet filtering, it is possible to filter packets specific to an lstack. ++[Detailed usage](https://gitee.com/openeuler/gazelle/blob/master/doc/pdump.md) ++ ++### 10. Usage Notes ++#### 1. Location of dpdk Configuration File ++The location of the dpdk configuration file depends on the user's privileges: ++- If running as the root user, the dpdk configuration file will be placed in the /var/run/dpdk directory after dpdk starts. ++- If running as a non-root user, the location of the dpdk configuration file is determined by the XDG_RUNTIME_DIR environment variable: ++ - If XDG_RUNTIME_DIR is empty, the dpdk configuration file will be placed in the /tmp/dpdk directory. ++ - If XDG_RUNTIME_DIR is not empty, the dpdk configuration file will be placed in the directory specified by the XDG_RUNTIME_DIR variable. ++ - Note that some machines may have XDG_RUNTIME_DIR set by default. ++ ++#### 2. Impact of retbleed Vulnerability Patch on gazelle Performance ++- The kernel version 5.10.0-60.57.0.85 introduced the retbleed vulnerability patch, which causes a performance degradation in gazelle on X86 architecture. To mitigate the performance loss caused by this CVE, users can add **retbleed=off mitigations=off** to the boot parameters. Users can choose whether to mitigate this CVE based on their product characteristics, but it is not mitigated by default for security reasons. ++- In the testing scenario where the sender is in kernel mode and the receiver is in user mode using ltran, with packets of 1024 bytes, the performance decreased from 17000 Mb/s to 5000 Mb/s. ++- Affected versions include openEuler-22.03-LTS (kernel version equal to or higher than 5.10.0-60.57.0.85) and subsequent SP versions. ++- For more details, please refer to: https:/gitee.com/openeuler/kernel/pulls/110 ++ ++## Constraints ++ ++There are certain constraints when using Gazelle: ++#### Functional Constraints ++- Blocking modes for accept or connect are not supported. ++- A maximum of 1500 TCP connections is supported. 
++- Currently, only TCP, ICMP, ARP, and IPv4 protocols are supported. ++- When pinging Gazelle from the peer, the packet length must be less than or equal to 14792 bytes. ++- Transparent huge pages are not supported. ++- ltran does not support mixing multiple types of bonded network cards. ++- In ltran's bond1 active-backup mode, only link layer failure is supported (e.g., cable disconnection), not physical layer failure (e.g., NIC power off, unplugging NIC). ++- Virtual machine network cards do not support multi-queue. ++#### Operational Constraints ++- The provided command-line and configuration files default to root privileges. Non-root users need to elevate privileges and change file ownership before use. ++- Returning the user-space network card to the kernel driver requires exiting Gazelle first. ++- Huge pages cannot be created in subdirectories under the mount point for remounting. ++- ltran requires a minimum of 1064MB of huge page memory. ++- Each application instance's protocol stack thread requires a minimum of 800MB of huge page memory. ++- Only 64-bit systems are supported. ++- Building the x86 version of Gazelle uses the -march=native option, optimizing for the CPU architecture of the build environment (Intel® Xeon® Gold 5118 CPU @ 2.30GHz instruction set). The running environment's CPU must support SSE4.2, AVX, AVX2, and AVX-512 instruction sets. ++- The maximum number of IP fragments for IP datagram reassembly is 10 (ping maximum packet length 14792 bytes), and TCP protocol does not use IP fragmentation. ++- Ensure sysctl configures the network card's rp_filter parameter to 1; otherwise, Gazelle protocol stack may not be used as expected, and the kernel protocol stack may still be used. ++- Without using ltran mode, KNI interfaces cannot be configured to only support local communication and require NetworkManager to be configured not to manage KNI interfaces before starting. ++- The IP and MAC addresses of virtual KNI interfaces must match those specified in the lstack.conf configuration file. ++- When sending UDP packets longer than 45952 (32 * 1436) bytes, the send_ring_size needs to be increased to at least 64. ++ ++## Risk Alert ++ ++Gazelle may have the following security risks, and users need to assess the risks based on their usage scenarios. ++ ++**Shared Memory** ++- Current Status ++ Huge pages are mounted to the /mnt/hugepages-lstack directory, and processes linking liblstack.so create files in the /mnt/hugepages-lstack directory during initialization, with each file corresponding to 2MB huge pages and mmap-ing these files. ltran, upon receiving registration information from lstask, also mmap-s files in the directory based on the huge page memory configuration, achieving shared huge page memory. ++ ltran operates similarly with huge page memory in the /mnt/hugepages-ltran directory. ++- Current Mitigation Measures ++ Huge page files have permissions set to 600, accessible only by the OWNER user, defaulting to the root user, and can be configured to other users. ++ Huge page files have DPDK file locks, preventing direct writing or mapping. ++- Risk Points ++ Malicious processes from the same user domain can mimic DPDK logic to share huge page memory via shared files, causing damage to huge page memory and leading to Gazelle program crashes. It is recommended that processes under the user belong to the same trust domain. 
++ ++**Traffic Limitation** ++Gazelle does not enforce traffic limitations, allowing users to send packets at the maximum network card line speed, potentially causing network traffic congestion. ++ ++**Process Impersonation** ++Legitimately registered lstack processes with ltran can impersonate each other (Process A can impersonate Process B) to send fake messages to ltran, altering ltran's forwarding control information, causing communication anomalies in Process B, and potentially leaking information from Process B to Process A. It is recommended that all lstack processes are trusted processes. +diff --git "a/doc/\345\256\236\350\267\265\347\263\273\345\210\227-Gazelle\345\212\240\351\200\237mysql.md" "b/doc/\345\256\236\350\267\265\347\263\273\345\210\227-Gazelle\345\212\240\351\200\237mysql.md" +new file mode 100644 +index 0000000..ebbf0d6 +--- /dev/null ++++ "b/doc/\345\256\236\350\267\265\347\263\273\345\210\227-Gazelle\345\212\240\351\200\237mysql.md" +@@ -0,0 +1,318 @@ ++ ++ ++ ++ ++Gazelle ++ ++# 实践系列(一)Gazelle加速mysql 20% ++ ++## 背景介绍 ++ ++ 当前网卡性能提升速度远远快于单核CPU,单核CPU已无法充分利用网卡带宽发展红利。同时cpu朝着多核方向发展,NUMA体系结构是现在众核解决方案之一。从硬件视角看解决CPU/网卡之间的算力差距,主要是两种方案,将CPU工作卸载到网卡,硬件加速方案;将NUMA体系结构充分利用起来,软件加速方案。下意识可能都会认为硬件加速方案更快,但是实测中,Gazelle软件加速性能提升更多,后续文章会详讲,主要是数据高效传递到应用这一段路径,Gazelle处理的更好。 ++ ++ ++现在软件的编程模型非常多样,但是可以总结出2类典型的网络模型,如下图: ++- IO复用模型:应用A网络线程之间完全隔离,协议状态上下文固定在某个线程内 ++- 非对称模型:应用B网络线程之间非对称,协议状态上下文会在多个线程之间迁移 ++ ++ ++ ++## 提升mysql性能遇到的问题 ++ ++​ mysql的网络模型属于上述非对称模型,TCP会跨线程迁移。目前业界常见的用户态协议栈都是针对非对称应用设计(如f-stack),不能支持TCP跨线程迁移场景;或使用全局的TCP资源(如lwip),当连接数超过40时,因为竞争问题性能迅速恶化。 ++ ++ ++ ++## Gazelle方案 ++ ++​ Gazelle是一款高性能用户态协议栈。它基于DPDK在用户态直接读写网卡报文,共享大页内存传递报文,使用轻量级LwIP协议栈。能够大幅提高应用的网络I/O吞吐能力。专注于数据库网络性能加速,如MySQL、redis等。兼顾高性能与通用性: ++- 高性能:报文零拷贝,无锁,灵活scale-out,自适应调度。 ++- 通用性 :完全兼容POSIX,零修改,适用不同类型的应用。 ++ ++​ Gazelle解耦应用线程与协议栈线程,从而支持任意的线程模型。通过应用线程fd与协议栈线程sock的路由表,应用线程的read/write等操作能在对应的协议栈线程执行。Gazelle是多核多线程部署,通过区域化大页内存,避免NUMA陷阱。 ++ ++ 技术特征 ++ ++- POSIX兼容 ++ ++- DPDK bypass 内核 ++ ++- 区域化大页内存管理,避免NUMA陷阱 ++ ++- 应用线程亲和性管理 ++ ++- 分布式TCP Hash table,多核多线程工作模式 ++ ++- 协议栈线程与应用线程解耦 ++ ++- 报文高效传递到应用 ++ ++ ++ ++ ++ ++效果如图,使用内核协议栈跑分为54.84万,使用Gazelle跑分为66.85万,Gazelle提升20%+。 ++ ++## Gazelle加速mysql测试步骤 ++ ++### 1. 环境要求 ++ ++#### 1.1 硬件 ++ ++要求服务端(Server)、客户端(Client)各一台 ++ ++| | Server | Client | ++| :------- | :----------------------: | :---------------------: | ++| CPU | Kunpeng 920-4826 * 2 | Kunpeng 920-4826 * 2 | ++| 主频 | 2600MHz | 2600MHz | ++| 内存大小 | 12 * 32G Micron 2666 MHz | 8 * 32G Micron 2666 MHz | ++| 网络 | 1822 25G | 1822 25G | ++| 系统盘 | 1.1T HDD TOSHIBA | 1.1T HDD TOSHIBA | ++| 数据盘 | 3T HUAWEI SSD NVME | NA | ++ ++#### 1.2 软件 ++ ++软件包默认使用openEuler 22.03的yum源 ++ ++| 软件名称 | 版本 | ++| :----------: | :----: | ++| mysql | 8.0.20 | ++| benchmarksql | 5.0 | ++ ++#### 1.3 组网 ++ ++部署 ++ ++### 2. 
Server端部署 ++ ++#### 2.1 安装mysql依赖包 ++ ++```sh ++yum install -y cmake doxygen bison ncurses-devel openssl-devel libtool tar rpcgen libtirpc-devel bison bc unzip git gcc-c++ libaio libaio-devel numactl ++``` ++ ++#### 2.2 编译安装mysql ++ ++- 从[官网下载](https://downloads.mysql.com/archives/community/)下载源码包 ++ ++下载 ++ ++- 下载优化补丁: [细粒度锁优化特性补丁](https://github.com/kunpengcompute/mysql-server/releases/download/tp_v1.0.0/0001-SHARDED-LOCK-SYS.patch) [NUMA调度补丁](https://github.com/kunpengcompute/mysql-server/releases/download/21.0.RC1.B031/0001-SCHED-AFFINITY.patch) [无锁优化特性补丁](https://github.com/kunpengcompute/mysql-server/releases/download/tp_v1.0.0/0002-LOCK-FREE-TRX-SYS.patch) ++ ++- 编译mysql ++ ++ 编译前确保已安装libaio-devel包 ++ ++```sh ++tar zxvf mysql-boost-8.0.20.tar.gz ++cd mysql-8.0.20/ ++patch -p1 < ../0001-SHARDED-LOCK-SYS.patch ++patch -p1 < ../0001-SCHED-AFFINITY.patch ++patch -p1 < ../0002-LOCK-FREE-TRX-SYS.patch ++cd cmake ++make clean ++cmake .. -DCMAKE_INSTALL_PREFIX=/usr/local/mysql-8.0.20 -DWITH_BOOST=../boost -DDOWNLOAD_BOOST=1 ++make -j 64 ++make install ++``` ++ ++#### 2.3 配置mysql参数 ++ ++使用gazelle源码中doc/conf/my.cnf-arm配置文件,放到/etc目录重命名为my.cnf ++ ++#### 2.4 部署mysql ++ ++```sh ++#挂载nvme盘 ++mkdir -p /data ++mount /dev/nvme0n1 /data ++mkdir -p /data/mysql ++mkdir -p /data/mysql/data ++mkdir -p /data/mysql/share ++mkdir -p /data/mysql/tmp ++mkdir -p /data/mysql/run ++mkdir -p /data/mysql/log ++ ++#创建用户组 ++groupadd mysql ++useradd -g mysql mysql ++chown -R mysql:mysql /data ++chown -R mysql:mysql /data/mysql/log/mysql.log ++ ++#初始化 ++echo "" > /data/mysql/log/mysql.log ++rm -fr /data/mysql/data/* ++/usr/local/mysql-8.0.20/bin/mysqld --defaults-file=/etc/my.cnf --user=root --initialize ++ ++#启动服务 ++/usr/local/mysql-8.0.20/support-files/mysql.server start ++ ++#完成初始化后会随机生成一个密码,用其登录mysql ++/usr/local/mysql-8.0.20/bin/mysql -u root -p ++alter user 'root'@'localhost' identified by '123456'; ++flush privileges; ++quit ++ ++#再次登录数据库,密码123456,更新root账号能够访问的域为%,从而可以支持远程访问 ++/usr/local/mysql-8.0.20/bin/mysql -u root -p ++use mysql; ++update user set host='%' where user='root'; ++flush privileges; ++create database tpcc; ++quit ++ ++#先关闭服务,后面测试启动生效配置 ++/usr/local/mysql-8.0.20/support-files/mysql.server stop ++``` ++ ++### 3. client部署benchmarksql工具 ++ ++- 编译安装 ++ ++下载 [benchmarksql工具](https://mirrors.huaweicloud.com/kunpeng/archive/kunpeng_solution/database/patch/benchmarksql5.0-for-mysql.zip) ++ ++```sh ++#安装benchmarksql依赖包 ++yum install -y java ++ ++unzip benchmarksql5.0-for-mysql.zip ++cd benchmarksql5.0-for-mysql/run ++chmod +x *.sh ++``` ++ ++- 配置benchmarksql参数 ++ ++ benchmarksql5.0-for-mysql/run/props.conf ++ ++ | 配置项 | 值 | 描述 | ++ | --------- | ---- | ------------------------------ | ++ | Terminals | 300 | 压力测试的并发数量。 | ++ | runMins | 10 | 压力测试运行时间(单位:分钟) | ++ | conn | ip | 修改默认IP为服务端IP | ++ ++### 4. mysql创建测试数据 ++ ++```sh ++#启动服务 ++/usr/local/mysql-8.0.20/support-files/mysql.server start ++ ++#创建测试数据(创建数据大约45分钟,完成测试数据创建后,建议对server端/data/mysql/data下数据进行备份,之后测试,数据从此拷贝即可) ++./runDatabaseBuild.sh props.conf ++ ++#停止数据库 ++/usr/local/mysql-8.0.20/support-files/mysql.server stop ++``` ++ ++ ++ ++### 5. 配置执行环境 ++ ++#### 5.1 开启STEAL优化 ++ ++服务端开启STEAL优化 ++ ++1. 
在linux系统启动项添加参数`sched_steal_node_limit=4`,reboot重启生效 ++ ++```sh ++[root@localhost mysql]# cat /proc/cmdline ++BOOT_IMAGE=/vmlinuz-5.10.0-153.12.0.89.oe2203sp2.aarch64 root=/dev/mapper/openeuler-root ro rd.lvm.lv=openeuler/root rd.lvm.lv=openeuler/swap video=VGA-1:640x480-32@60me cgroup_disable=files apparmor=0 crashkernel=1024M,high smmu.bypassdev=0x1000:0x17 smmu.bypassdev=0x1000:0x15 console=tty0 sched_steal_node_limit=4 ++``` ++ ++2. 重启后开启STEAL ++ ++```sh ++echo STEAL > /sys/kernel/debug/sched_features ++``` ++ ++#### 5.2 关闭测试影响项 ++ ++```sh ++#关闭irqbalance ++systemctl stop irqbalance.service ++systemctl disable irqbalance.service ++ ++#关闭防火墙 ++systemctl stop iptables ++systemctl stop firewalld ++``` ++ ++### 6. 内核协议栈测试mysql ++ ++```sh ++#服务端绑中断(根据环境替换网卡名称、绑核cpu核) ++ethtool -L enp4s0 combined 5 ++irq1=`cat /proc/interrupts| grep -E enp4s0 | head -n5 | awk -F ':' '{print $1}'` ++cpulist=(91 92 93 94 95) ++c=0 ++for irq in $irq1 ++do ++echo ${cpulist[c]} "->" $irq ++echo ${cpulist[c]} > /proc/irq/$irq/smp_affinity_list ++let "c++" ++done ++ ++#客户端执行mysql测试 ++./runBenchmark.sh props.conf ++ ++##恢复环境 ++#服务端使用备份数据恢复数据库,也可重新生成数据。 ++rm -fr /data/mysql/data/* ++cp -fr /home/tpccdata/* /data/mysql/data/ ++#关闭mysql进程 ++pkill -9 mysqld ++``` ++ ++测试结果如下: ++ ++ ++ ++### 7. Gazelle测试mysql ++安装软件包 ++```sh ++yum -y install gazelle dpdk libconfig numactl libboundscheck libcap ++``` ++ ++修改/etc/gazelle/lstack.conf配置文件修改如下 ++ ++| 配置项 | 值 | 描述 | ++| ------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ++| dpdk_args | ["--socket-mem", "2048,2048,2048,2048", "--huge-dir", "/mnt/hugepages-lstack", "--proc-type", "primary", "--legacy-mem", "--map-perfect"] | 配置每个NUMA使用2G内存(也可以更小),大页内存挂载目录 | ++| use_ltran | 0 | 不使用ltran | ++| listen_shadow | 1 | 使用listen影子fd,因为mysql一个listen线程对应4个协议栈线程 | ++| num_cpus | "18,38,58,78" | 每个NUMA选择一个cpu | ++ ++ ++ ++```sh ++#服务端分配大页 ++echo 8192 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages #根据实际选择pagesize ++mkdir -p /mnt/hugepages-lstack ++mount -t hugetlbfs nodev /mnt/hugepages-lstack #不能重复操作,否则大页被占用不能释放 ++ ++#服务端加载ko ++modprobe vfio enable_unsafe_noiommu_mode=1 ++modprobe vfio-pci ++ ++#服务端绑定网卡到用户态 ++ip link set enp4s0 down ++dpdk-devbind -b vfio-pci enp4s0 ++ ++#服务端启动mysqld ++LD_PRELOAD=/usr/lib64/liblstack.so GAZELLE_BIND_PROCNAME=mysqld /usr/local/mysql-8.0.20/bin/mysqld --defaults-file=/etc/my.cnf --bind-address=192.168.1.10 & ++ ++#客户端执行mysql测试 ++./runBenchmark.sh props.conf ++ ++##恢复环境 ++#服务端使用备份数据恢复数据库,也可重新生成数据。 ++rm -fr /data/mysql/data/* ++cp -fr /home/tpccdata/* /data/mysql/data/ ++#关闭mysql进程 ++pkill -9 mysqld ++``` ++Gazelle部署详见[Gazelle用户指南](user-guide.md) ++ ++测试结果如下: ++ ++ ++ +diff --git a/examples/FAULT_INJECT.md b/examples/FAULT_INJECT.md +new file mode 100644 +index 0000000..ff551a9 +--- /dev/null ++++ b/examples/FAULT_INJECT.md +@@ -0,0 +1,33 @@ ++# Gazelle 故障注入 说明 ++ ++## 需求 ++1. example:构造黑盒故障 ++ * 延迟类:accept|read: ++ * accept: 构造tcp_acceptmbox_full的情景. ++ * read: 构造tcp_refuse_count、recvmbox满 ++ * 跳过类:跳过 read/write并close: ++ * read: 构造链接关闭时时4次挥手的情景,验证TCP状态机。 ++2. gazelle/lwip: 构造白盒故障,支持注入故障报文、协议栈状态、事件设置、资源异常等 ++ * 编译宏支持 ++ * 提供接口:配置文件、env ++ * 故障报文注入: ++ * 类似内核tc工具: ++ * 内核TC工具qdisc指令原理:报文分组被添加到网卡队列(qdisc),该队列决定发包顺序。
++ qdisc指令可以在队列层面实现延时、丢包、重复等故障。 ++ * dpdk性能检测工具testpmd可以实现类似的故障模拟,但testpmd与gazelle不兼容,需要参考其中调用的dpdk接口来修改gazelle代码。
++ * 延时故障 ++ * 丢包故障 ++ - 思路:调整网卡队列,随机丢弃百分比的包,然后发送。 ++ - 函数调用:rte_rand(),rte_eth_tx_burst()。 ++ * 包重复故障 ++ * 随机故障 ++ * 乱序故障 ++ * 协议栈状态故障 ++ * ... ++ * 事件设置 ++ * ... ++ * 资源异常 ++ * 资源耗尽,无法申请。 ++ * ... ++ ++ +diff --git a/examples/README.md b/examples/README.md +index 5a73ce0..77a0f85 100644 +--- a/examples/README.md ++++ b/examples/README.md +@@ -7,6 +7,7 @@ + * 支持多线程网络非对称模型,一个 listen 线程,若干个读写线程。listen 线程和读写线程使用 `poll` / `epoll` 监听事件。 + * 支持 `recvmsg` 、`sendmsg` 、`recv` 、`send` 、`recvfrom`、`sendto`、`getpeername` 、`getsockopt` 、`epoll_ctl` 等 posix 接口。 + * 网络通讯报文采用问答方式,丢包或者内容错误则报错并停止通讯。报文内容有变化,长度可配。 ++* 支持网络故障注入,延迟进行(delay)、跳过(skip)read、write、accept等逻辑。 + + ## 网络模型 + +@@ -103,15 +104,15 @@ + * `-a, --as [server | client]`:作为服务端还是客户端。 + * `server`:作为服务端。 + * `client`:作为客户端。 +-* `-i, --ip [xxx.xxx.xxx.xxx]`:IP地址。 +-* `-g, --groupip [xxx.xxx.xxx.xxx]`:UDP组播地址。 ++* `-i, --ip [xxx.xxx.xxx.xxx]`:server端IP地址。当v4与v6地址同时存在时,以","分隔。例如:`-i 192.168.1.88,aa22:bb11:1122:cdef:1234:aa99:7654:7410` ++* `-g, --groupip [xxx.xxx.xxx.xxx,xxx.xxx.xxx.xxx]`:配置UDP组播地址与interface地址,以','分隔,其中interface地址为可选项。例如:`-g 224.0.0.24,192.168.1.202`或`-g 224.0.0.24` + * `-p, --port [xxxx]`:端口。 + * `-m, --model [mum | mud]`:采用的网络模型类型。 + * `mum (multi thread, unblock, multiplexing IO)`:多线程非阻塞IO复用。 + * `mud (multi thread, unblock, dissymmetric)`:多线程非阻塞非对称。 + * `-t, --threadnum`:线程数设置。 + * `-c, --connectnum`:连接数设置。当 `domain` 设置为 `udp` 时,`connectnum` 会被设置为1。 +-* `-D, --domain [unix | tcp | udp]`:通信协议。 ++* `-D, --domain [unix | tcp | udp]`:通信协议。当支持多个通信协议时以","分隔。例如:`-D tcp,udp` + * `unix`:基于 unix 协议实现。 + * `tcp`:基于 tcp 协议实现。 + * `udp`:基于 udp 协议实现。 +@@ -132,6 +133,19 @@ + * `-C, --accept`:accept的方式。 + * `ac`:使用accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen)通过套接口接受连接。 + * `ac4`:使用accept4(int sockfd, struct sockaddr *addr,socklen_t *addrlen, int flags)通过套接口接受连接,flags=SOCK_CLOEXEC。 ++* `-k, --keep_alive`:配置TCP keep_alive idle , keep_alive interval时间(second)。 ++* `-I, --inject`: 配置故障注入类型。 ++ * `delay`: ++ * `"delay 20 before_accept"`: 延迟20秒进行accept,时间可自定义,需大于0。可用于构造tcp_acceptmbox_full的情景 ++ * `"delay 20 before_read"`: 延迟20秒进行read,时间可自定义,需大于0。 ++ * `"delay 20 before_write"`: 延迟20秒进行write,时间可自定义,需大于0。 ++ * `"delay 20 before_read_and_write"`: 延迟20秒进行read和write,时间可自定义,需大于0。 ++ * `skip`: ++ * `"skip write"`: 跳过写过程,并关闭链接。 ++ * `"skip read"`: 跳过读过程,并关闭链接。 ++ * `"skip read_and_write"`: 跳过读写写过程,并关闭链接。 ++ ++ + ## 使用 + + * **环境配置** +@@ -235,12 +249,12 @@ make + * 创建udp组播服务端 + + ``` +-./example -A server -D udp -i 192.168.0.1 -g 225.0.0.1 -A recvfromsendto ++./example -A server -D udp -g 225.0.0.1,192.168.0.1 -A recvfromsendto + + [program parameters]: + --> [as]: server +---> [server ip]: 192.168.0.1 + --> [server group ip]: 225.0.0.1 ++--> [server groupip_interface]: 192.168.0.1 + --> [server port]: 5050 + --> [model]: mum + --> [thread number]: 1 +@@ -260,12 +274,12 @@ make + * 创建udp组播客户端 + + ``` +-./example -A client -D udp -i 192.168.0.1 -g 225.0.0.1 -A recvfromsendto ++./example -A client -D udp -g 225.0.0.1,192.168.0.1 -A recvfromsendto + + [program parameters]: +---> [as]: server +---> [server ip]: 225.0.0.1 +---> [client send ip]: 192.168.0.1 ++--> [as]: client ++--> [client group ip]: 225.0.0.1 ++--> [client groupip_interface]: 192.168.0.1 + --> [server port]: 5050 + --> [thread number]: 1 + --> [connection number]: 1 +@@ -280,3 +294,50 @@ make + [program informations]: + --> : [connect num]: 0, [send]: 0.000 B/s + ``` ++ ++* 混杂模式下server 与 client 配置 ++``` ++./example -a server -D tcp,udp -i 192.168.1.88 -p 33333 -g 
224.0.0.24,192.168.1.188
++[program parameters]:
++--> [as]: server
++--> [server group ip]: 224.0.0.24
++--> [server groupip_interface]: 192.168.1.188
++--> [server ip]: 192.168.1.88
++--> [server port]: 33333
++--> [model]: mum
++--> [thread number]: 1
++--> [domain]: tcp,udp
++--> [api]: read & write
++--> [packet length]: 1024
++--> [verify]: off
++--> [ringpmd]: off
++--> [debug]: off
++--> [epoll create]: ec
++--> [accept]: ac
++--> [inject]: none
++
++[program informations]:
++```
++```
++./example -a client -D tcp,udp -i 192.168.1.188 -p 33333 -g 224.0.0.24,192.168.1.202
++[program parameters]:
++--> [as]: client
++--> [client group ip]: 224.0.0.24
++--> [client groupip_interface]: 192.168.1.202
++--> [server ip]: 192.168.1.188
++--> [server port]: 33333
++--> [thread number]: 1
++--> [connection number]: 1
++--> [domain]: tcp,udp
++--> [api]: read & write
++--> [packet length]: 1024
++--> [verify]: off
++--> [ringpmd]: off
++--> [debug]: off
++--> [epoll create]: ec
++--> [accept]: ac
++--> [inject]: none
++
++[program informations]:
++
++```
\ No newline at end of file
+diff --git a/examples/inc/bussiness.h b/examples/inc/bussiness.h
+index 83645ef..3a78b1f 100644
+--- a/examples/inc/bussiness.h
++++ b/examples/inc/bussiness.h
+@@ -28,7 +28,9 @@
+  */
+ struct ServerHandler
+ {
++    int32_t listen_fd_array[PROTOCOL_MODE_MAX];
+     int32_t fd;    ///< socket file descriptor
++    int32_t is_v6;
+ };
+ 
+ /**
+@@ -39,6 +41,7 @@ struct ClientHandler
+ {
+     int32_t fd;    ///< socket file descriptor
+     uint32_t msg_idx;    ///< the start charactors index of message
++    int32_t sendtime_interverl;    ///< udp send packet interval
+ };
+ 
+ 
+@@ -90,24 +93,21 @@ int32_t client_bussiness(char *out, const char *in, uint32_t size, bool verify,
+ /**
+  * @brief server checks the information and answers
+  * This function checks the information and answers.
+- * @param server_handler server handler
++ * @param fd socket_fd
+  * @param pktlen the length of package
+  * @param api the api
+  * @return the result
+  */
+-int32_t server_ans(struct ServerHandler *server_handler, uint32_t pktlen, const char* api, const char* domain);
++int32_t server_ans(int32_t fd, uint32_t pktlen, const char* api, const char* domain);
+ 
+ /**
+  * @brief client asks server
+  * This function asks server.
+  * @param client_handler client handler
+- * @param pktlen the length of package
+- * @param api the api
+- * @param domain the domain
++ * @param client_unit ClientUnit
+  * @return the result
+  */
+-int32_t client_ask(struct ClientHandler *client_handler, uint32_t pktlen, const char* api, const char* domain, in_addr_t ip, uint16_t port);
+-
++int32_t client_ask(struct ClientHandler *client_handler, struct ClientUnit *client_unit);
+ /**
+  * @brief client checks the information and answers
+  * This function checks the information and answers. 
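++ * For a udp multicast client, the answer is sent back to the group address
++ * (see the 224.0.0.0~239.255.255.255 range check in bussiness.c).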
+@@ -119,7 +119,7 @@ int32_t client_ask(struct ClientHandler *client_handler, uint32_t pktlen, const
+  * @param ip the ip address of peer, maybe group ip
+  * @return the result
+  */
+-int32_t client_chkans(struct ClientHandler *client_handler, uint32_t pktlen, bool verify, const char* api, const char* domain, in_addr_t ip);
++int32_t client_chkans(struct ClientHandler *client_handler, uint32_t pktlen, bool verify, const char* api, const char* domain, ip_addr_t* ip);
+ 
+ 
+ #endif // __EXAMPLES_BUSSINESS_H__
+diff --git a/examples/inc/client.h b/examples/inc/client.h
+index 97af33f..0fe07aa 100644
+--- a/examples/inc/client.h
++++ b/examples/inc/client.h
+@@ -19,31 +19,8 @@
+ #include "parameter.h"
+ #include "bussiness.h"
+ 
+-
+-/**
+- * @brief client unit
+- * The information of each thread of client.
+- */
+-struct ClientUnit
+-{
+-    struct ClientHandler *handlers;    ///< the handlers
+-    int32_t epfd;    ///< the connect epoll file descriptor
+-    struct epoll_event *epevs;    ///< the epoll events
+-    uint32_t curr_connect;    ///< current connection number
+-    uint64_t send_bytes;    ///< total send bytes
+-    in_addr_t ip;    ///< server ip
+-    in_addr_t groupip;    ///< server groupip
+-    uint16_t port;    ///< server port
+-    uint16_t sport;    ///< client sport
+-    uint32_t connect_num;    ///< total connection number
+-    uint32_t pktlen;    ///< the length of peckage
+-    bool verify;    ///< if we verify or not
+-    char* domain;    ///< the communication domain
+-    char* api;    ///< the type of api
+-    bool debug;    ///< if we print the debug information
+-    char* epollcreate;    ///< epoll_create method
+-    struct ClientUnit *next;    ///< next pointer
+-};
++#define TIME_SCAN_INTERVAL 1
++#define TIME_SEND_INTERVAL 1
+ 
+ /**
+  * @brief client
+@@ -53,8 +30,14 @@
+ struct Client
+ {
+     struct ClientUnit *uints;    ///< the server mum unit
+     bool debug;    ///< if we print the debug information
++    uint32_t threadNum;    ///< the number of client threads
++    bool loop;    ///< whether loop mode info printing is enabled
+ };
+ 
++struct Client_domain_ip {
++    char *domain;
++    uint8_t ip_family;
++};
+ 
+ /**
+  * @brief the single thread, client prints informations
+@@ -66,7 +49,7 @@ struct Client
+  * @param debug if debug or not
+  * @return the result pointer
+  */
+-void client_debug_print(const char *ch_str, const char *act_str, in_addr_t ip, uint16_t port, bool debug);
++void client_debug_print(const char *ch_str, const char *act_str, ip_addr_t *ip, uint16_t port, bool debug);
+ 
+ /**
+  * @brief the client prints informations
+@@ -86,7 +69,7 @@ void client_info_print(struct Client *client);
+  * @param domain domain
+  * @return the result pointer
+  */
+-int32_t client_thread_try_connect(struct ClientHandler *client_handler, int32_t epoll_fd, in_addr_t ip, in_addr_t groupip, uint16_t port, uint16_t sport, const char *domain, const char *api);
++int32_t client_thread_try_connect(struct ClientHandler *client_handler, struct ClientUnit *client_unit);
+ 
+ /**
+  * @brief the single thread, client retry to connect to server, register to epoll
+@@ -122,4 +105,11 @@ void *client_s_create_and_run(void *arg);
+ int32_t client_create_and_run(struct ProgramParams *params);
+ 
+ 
++/**
++ * @brief loop server info
++ * This function prints server info in loop mode. 
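++ * In loop mode ("-a loop") one process runs both server and client; this
++ * helper dispatches to sermum_info_print() or sermud_info_print() according
++ * to loopmod.model.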
++ */
++void loop_info_print();
++
++
+ #endif // __EXAMPLES_CLIENT_H__
+diff --git a/examples/inc/parameter.h b/examples/inc/parameter.h
+index 93e3672..ff2f114 100644
+--- a/examples/inc/parameter.h
++++ b/examples/inc/parameter.h
+@@ -20,6 +20,8 @@
+ 
+ #define PARAM_DEFAULT_AS ("server")    ///< default type
+ #define PARAM_DEFAULT_IP ("127.0.0.1")    ///< default IP
++#define PARAM_DEFAULT_IP_V6 ("0.0.0.0.0.0.0.0")    ///< default IPv6 address
++#define PARAM_DEFAULT_ADDR_FAMILY (AF_INET)    ///< default address family
+ #define PARAM_DEFAULT_PORT (5050)    ///< default port
+ #define PARAM_DEFAULT_SPORT (0)    ///< default sport
+ #define PARAM_DEFAULT_MODEL ("mum")    ///< default model type
+@@ -34,6 +36,9 @@
+ #define PARAM_DEFAULT_EPOLLCREATE ("ec")    ///< default method of epoll_create
+ #define PARAM_DEFAULT_ACCEPT ("ac")    ///< default method of accept method
+ #define PARAM_DEFAULT_GROUPIP ("0.0.0.0")    ///< default group IP>
++#define PARAM_DEFAULT_KEEPALIVEIDLE (0)    ///< default TCP keepalive idle time
++
++#define TCP_KEEPALIVE_IDLE_MAX (3600)    // time: second
+ 
+ 
+ enum {
+@@ -43,7 +48,7 @@
+     PARAM_NUM_IP = 'i',
+ #define PARAM_NAME_PORT ("port")    ///< name of parameter port
+     PARAM_NUM_PORT = 'p',
+-#define PARAM_NAME_SPORT ("sport")    ///< name of parameter sport
++#define PARAM_NAME_SPORT ("sport")   ///< name of parameter sport
+     PARAM_NUM_SPORT = 's',
+ #define PARAM_NAME_MODEL ("model")    ///< name of parameter model type
+     PARAM_NUM_MODEL = 'm',
+@@ -71,12 +76,27 @@
+     PARAM_NUM_ACCEPT = 'C',
+ #define PARAM_NAME_GROUPIP ("groupip")    ///< name of parameter group ip
+     PARAM_NUM_GROUPIP = 'g',
++#define PARAM_NAME_KEEPALIVE ("keep_alive")    ///< name of parameter keep_alive
++    PARAM_NUM_KEEPALIVE = 'k',
++#define PARAM_NAME_INJECT ("inject")    ///< name of parameter fault inject
++    PARAM_NUM_INJECT = 'I',
+ };
+ 
+ #define NO_ARGUMENT 0    ///< options takes no arguments
+ #define REQUIRED_ARGUMETN 1    ///< options requires arguments
+ #define OPTIONAL_ARGUMETN 2    ///< options arguments are optional
+ 
++uint8_t getbit_num(uint8_t mode, uint8_t index);
++uint8_t setbitnum_on(uint8_t mode, uint8_t index);
++uint8_t setbitnum_off(uint8_t mode, uint8_t index);
++
++uint8_t program_get_protocol_mode_by_domain_ip(char* domain, char* ipv4, char* ipv6, char* group_ip);
++
++struct ServerBaseCfgInfo {
++    const char *domain;
++    const char *api;
++    uint32_t pktlen;
++};
+ 
+ /**
+  * @brief program option description
+@@ -96,12 +116,13 @@
+ struct ProgramParams {
+     char* as;    ///< as server or client
+     char* ip;    ///< IP address
+-    uint32_t port;    ///< port
+-    uint32_t sport;    ///< sport
++    char* ipv6;    ///< IPv6 address
++    bool port[UNIX_TCP_PORT_MAX];    ///< index:port list; value:port is set or not
++    bool sport[UNIX_TCP_PORT_MAX];    ///< index:sport list; value:sport is set or not
+     char* model;    ///< model type
+     uint32_t thread_num;    ///< the number of threads
+     uint32_t connect_num;    ///< the connection number
+-    char* domain;    ///< the communication dimain
++    char* domain;    ///< the communication domain
+     char* api;    ///< the type of api
+     uint32_t pktlen;    ///< the packet length
+     bool verify;    ///< if we verify the message or not
+@@ -110,8 +131,58 @@
+     char* accept;    ///< accept connections method
+     bool ringpmd;    ///< if we use ring PMD or not
+     char* groupip;    ///< group IP address>
++    char* groupip_interface;    ///< udp multicast interface address
++    uint32_t addr_family;    ///< IP address family
++    int32_t tcp_keepalive_idle;    ///< tcp keepalive idle time
++    int32_t tcp_keepalive_interval;    ///< tcp 
keepalive interval time
++#define INJECT_TYPE_IDX (0)    ///< the index of inject type
++#define INJECT_TIME_IDX (1)    ///< the index of delay time
++#define INJECT_SKIP_IDX (1)    ///< the index of skip location
++#define INJECT_LOCATION_IDX (2)    ///< the index of delay location
++#define FAULT_INJECT_PARA_COUNT (3)    ///< the count of fault injection parameters
++    char* inject[FAULT_INJECT_PARA_COUNT];    ///< fault inject
+ };
+ 
++typedef enum {
++    INJECT_DELAY_ACCEPT = 0,
++    INJECT_DELAY_READ,
++    INJECT_DELAY_WRITE,
++    INJECT_DELAY_MAX,
++} delay_type;
++
++typedef enum {
++    INJECT_SKIP_READ = 0,
++    INJECT_SKIP_WRITE,
++    INJECT_SKIP_MAX,
++} skip_type;
++
++typedef enum {
++    V4_TCP,
++    V6_TCP,
++    V4_UDP,
++    V6_UDP,
++    UDP_MULTICAST,
++    UNIX,
++    PROTOCOL_MODE_MAX
++} PROTOCOL_MODE_ENUM_TYPE;
++
++#define FAULT_INJECT_SKIP_BEGIN(skip_type) \
++    if (get_g_inject_skip((skip_type))) {} \
++    else {
++#define FAULT_INJECT_SKIP_END }
++
++/**
++ * @brief return g_inject_skip value
++ * This function returns the g_inject_skip value to decide whether to execute the skip.
++ */
++int32_t get_g_inject_skip(skip_type type);
++
++/**
++ * @brief execute delay inject
++ * This function delays execution of the code that follows it.
++ */
++void fault_inject_delay(delay_type type);
++
+ /**
+  * @brief initialize the parameters
+  * This function initializes the parameters of main function.
+@@ -142,5 +213,6 @@
+  */
+ void program_params_print(struct ProgramParams *params);
+ 
++bool ip_is_v6(const char *ip);
+ 
+ #endif // __EXAMPLES_PARAMETER_H__
+diff --git a/examples/inc/server.h b/examples/inc/server.h
+index a3affef..4631a28 100644
+--- a/examples/inc/server.h
++++ b/examples/inc/server.h
+@@ -31,8 +31,7 @@
+ struct ServerMumUnit
+ {
+     struct epoll_event *epevs;    ///< the epoll events
+     uint32_t curr_connect;    ///< current connection number
+     uint64_t recv_bytes;    ///< total receive bytes
+-    in_addr_t ip;    ///< server ip
+-    in_addr_t groupip;    ///< server group ip
++    struct ServerIpInfo server_ip_info;
+     uint16_t port;    ///< server port
+     uint32_t pktlen;    ///< the length of peckage
+     char* domain;    ///< communication domain
+@@ -40,6 +39,9 @@
+     bool debug;    ///< if we print the debug information
+     char* epollcreate;    ///< epoll_create method
+     char* accept;    ///< accept connections method
++    int32_t tcp_keepalive_idle;    ///< tcp keepalive idle time
++    int32_t tcp_keepalive_interval;    ///< tcp keepalive interval time
++    uint8_t protocol_type_mode;    ///< tcp/udp ipv4/ipv6 protocol mode
+     struct ServerMumUnit *next;    ///< next pointer
+ };
+ 
+@@ -64,11 +66,13 @@
+ struct ServerMudWorker
+ {
+     struct epoll_event *epevs;    ///< the epoll events
+     uint64_t recv_bytes;    ///< total receive bytes
+     uint32_t pktlen;    ///< the length of peckage
+-    in_addr_t ip;    ///< client ip
++    ip_addr_t ip;    ///< client ip
+     uint16_t port;    ///< client port
+     char* api;    ///< the type of api
+     bool debug;    ///< if we print the debug information
+     char* epollcreate;    ///< epoll_create method
++    char* domain;    ///< communication domain
++    uint32_t curr_connect;    ///< current connection number
+     struct ServerMudWorker *next;    ///< next pointer
+ };
+ 
+@@ -82,16 +86,17 @@
+ struct ServerMud
+ {
+     struct ServerMudWorker *workers;    ///< the workers
+     int32_t epfd;    ///< the listen epoll file descriptor
+     struct epoll_event *epevs;    ///< the epoll events
+-    uint32_t curr_connect;    ///< current connection number
+-    in_addr_t ip;    ///< server ip
+-    in_addr_t groupip;    ///< server group ip
+-    uint16_t port;    ///< server port
++    struct ServerIpInfo server_ip_info;
++    bool* port; 
///< server ports; points to the parameter's port list
+     uint32_t pktlen;    ///< the length of peckage
+     char* domain;    ///< communication domain
+     char* api;    ///< the type of api
+     bool debug;    ///< if we print the debug information
+     char* accept;    ///< accept connections method
+     char* epollcreate;    ///< epoll_create method
++    int32_t tcp_keepalive_idle;    ///< tcp keepalive idle time
++    int32_t tcp_keepalive_interval;    ///< tcp keepalive interval time
++    uint8_t protocol_type_mode;    ///< tcp/udp ipv4/ipv6 protocol mode
+ };
+ 
+ 
+@@ -105,7 +110,7 @@ struct ServerMud
+  * @param debug if debug or not
+  * @return the result pointer
+  */
+-void server_debug_print(const char *ch_str, const char *act_str, in_addr_t ip, uint16_t port, bool debug);
++void server_debug_print(const char *ch_str, const char *act_str, ip_addr_t *ip, uint16_t port, bool debug);
+ 
+ /**
+  * @brief the multi thread, unblock, dissymmetric server prints informations
+@@ -136,7 +141,7 @@ int32_t sermud_listener_create_epfd_and_reg(struct ServerMud *server_mud);
+  * @param server_mud the server unit
+  * @return the result pointer
+  */
+-int32_t sermud_listener_accept_connects(struct ServerMud *server_mud);
++int32_t sermud_listener_accept_connects(struct epoll_event *curr_epev, struct ServerMud *server_mud);
+ 
+ /**
+  * @brief the worker thread, unblock, dissymmetric server processes the events
+@@ -200,7 +205,7 @@ int32_t sersum_create_epfd_and_reg(struct ServerMumUnit *server_unit);
+  * @param server_handler the server handler
+  * @return the result pointer
+  */
+-int32_t sersum_accept_connects(struct ServerMumUnit *server_unit, struct ServerHandler *server_handler);
++int32_t sersum_accept_connects(struct epoll_event *cur_epev, struct ServerMumUnit *server_unit);
+ 
+ /**
+  * @brief the single thread, unblock, mutliplexing IO server processes the events
+diff --git a/examples/inc/utilities.h b/examples/inc/utilities.h
+index 0f9db4e..262481a 100644
+--- a/examples/inc/utilities.h
++++ b/examples/inc/utilities.h
+@@ -27,6 +27,7 @@
+ #include 
+ #include 
+ #include 
++#include 
+ 
+ #include 
+ #include 
+@@ -36,6 +37,7 @@
+ #include 
+ 
+ #include 
++#include 
+ #include 
+ 
+ #include "securec.h"
+@@ -47,7 +49,7 @@
+ { \
+     printf("\n[error]: "); \
+     printf(format, ##__VA_ARGS__); \
+-    printf("\n"); \
++    printf("\n\n"); \
+ } while (0)
+ #define PRINT_WARNNING(format, ...) do \
+ { \
+@@ -76,7 +78,7 @@
+ } while(0)
+ #define PRINT_CLIENT_DATAFLOW(format, ...) 
do \
+ { \
+-    printf("\033[?25l\033[A\033[K"); \
++    printf(" "); \
+     printf("--> : "); \
+     printf(format, ##__VA_ARGS__); \
+     printf("\033[?25h\n"); \
+@@ -90,24 +92,94 @@
+ #define PROGRAM_INPROGRESS (-2)    ///< program in progress flag
+ 
+ #define UNIX_TCP_PORT_MIN (1024)    ///< TCP minimum port number in unix
+-#define UNIX_TCP_PORT_MAX (65535)    ///< TCP minimum port number in unix
++#define UNIX_TCP_PORT_MAX (65535)    ///< TCP maximum port number in unix
+ #define THREAD_NUM_MIN (1)    ///< minimum number of thead
+ #define THREAD_NUM_MAX (1000)    ///< maximum number of thead
+ #define MESSAGE_PKTLEN_MIN (2)    ///< minimum length of message (1 byte)
+ #define MESSAGE_PKTLEN_MAX (1024 * 1024 * 10)    ///< maximum length of message (10 Mb)
++#define UDP_PKTLEN_MAX (65507)    ///< maximum length of udp message
+ 
+-#define SERVER_SOCKET_LISTEN_BACKLOG (128)    ///< the queue of socket
++#define SERVER_SOCKET_LISTEN_BACKLOG (4096)    ///< the queue of socket
+ #define SERVER_EPOLL_SIZE_MAX (10000)    ///< the max wait event of epoll
+ #define SERVER_EPOLL_WAIT_TIMEOUT (-1)    ///< the timeout value of epoll
+ 
+ #define CLIENT_EPOLL_SIZE_MAX (10000)    ///< the max wait event of epoll
+ #define CLIENT_EPOLL_WAIT_TIMEOUT (-1)    ///< the timeout value of epoll
+ 
+-#define TERMINAL_REFRESH_MS (100)    ///< the time cut off between of terminal refresh
++#define TERMINAL_REFRESH_MS (500)    ///< terminal refresh interval (ms)
+ 
+ #define SOCKET_UNIX_DOMAIN_FILE "unix_domain_file"    ///< socket unix domain file
+ 
++#define IPV4_STR "V4"
++#define IPV6_STR "V6"
++#define IPV4_MULTICAST "Multicast"
++#define INVAILD_STR "STR_NULL"
++
++#define TIMES_CONVERSION_RATE (1000)
++#define KB (1024)
++#define MB (KB * KB)
++#define GB (MB * KB)
++
++struct ThreadUintInfo {
++    uint64_t send_bytes;    ///< total send bytes
++    uint32_t cur_connect_num;    ///< current connection number
++    char* domain;    ///< communication domain
++    char* ip_type_info;    ///< ip version string (V4/V6/Multicast)
++    pthread_t thread_id;    ///< owning thread id
++};
++
++typedef struct ip_addr {
++    struct {
++        struct in_addr ip4;
++        struct in6_addr ip6;
++    } u_addr;
++    uint32_t addr_family;
++} ip_addr_t;
++
++typedef union sockaddr_union {
++    struct sockaddr sa;
++    struct sockaddr_in in;
++    struct sockaddr_in6 in6;
++} sockaddr_t;
+ 
++/**
++ * @brief client unit
++ * The information of each thread of client. 
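++ * One ClientUnit is allocated per client thread and chained via 'next';
++ * per-thread statistics live in the embedded ThreadUintInfo threadVolume.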
++ */
++struct ClientUnit {
++    struct ClientHandler *handlers;    ///< the handlers
++    int32_t epfd;    ///< the connect epoll file descriptor
++    struct epoll_event *epevs;    ///< the epoll events
++    uint32_t curr_connect;    ///< current connection number
++    ip_addr_t ip;    ///< server ip
++    ip_addr_t groupip;    ///< server groupip
++    uint32_t port;    ///< server port
++    ip_addr_t groupip_interface;    ///< udp multicast interface address
++    uint32_t sport;    ///< client sport
++    uint32_t connect_num;    ///< total connection number
++    uint32_t pktlen;    ///< the length of peckage
++    uint32_t loop;    ///< whether loop mode is enabled
++    bool verify;    ///< if we verify or not
++    char* domain;    ///< the communication domain
++    char* api;    ///< the type of api
++    bool debug;    ///< if we print the debug information
++    char* epollcreate;    ///< epoll_create method
++    uint8_t protocol_type_mode;    ///< tcp/udp ipv4/ipv6 protocol mode
++    struct ThreadUintInfo threadVolume;    ///< per-thread statistics
++    struct ClientUnit *next;    ///< next pointer
++};
++struct ServerIpInfo {
++    ip_addr_t ip;    ///< server ip
++    ip_addr_t groupip;    ///< server group ip
++    ip_addr_t groupip_interface;    ///< server group interface ip
++};
++
++struct LoopInfo {
++    char* model;    ///< the network model type (mum/mud)
++    struct ServerMud *server_mud_info;
++    struct ServerMum *server_mum_info;
++};
++extern struct LoopInfo loopmod;
+ /**
+  * @brief create the socket and listen
+  * Thi function creates the socket and listen.
+@@ -118,7 +190,8 @@
+  * @param domain domain
+  * @return the result
+  */
+-int32_t create_socket_and_listen(int32_t *socket_fd, in_addr_t ip, in_addr_t groupip, uint16_t port, const char *domain);
++int32_t create_socket_and_listen(int32_t *listen_fd_array, struct ServerIpInfo *server_ip_info, uint16_t port,
++                                 uint8_t protocol_mode);
+ 
+ /**
+  * @brief create the socket and connect
+@@ -131,7 +204,7 @@
+  * @param api api
+  * @return the result
+  */
+-int32_t create_socket_and_connect(int32_t *socket_fd, in_addr_t ip, in_addr_t groupip, uint16_t port, uint16_t sport, const char *domain, const char *api);
++int32_t create_socket_and_connect(int32_t *socket_fd, struct ClientUnit *client_unit);
+ 
+ /**
+  * @brief set the socket to unblock
+@@ -140,6 +213,7 @@
+  * @return the result
+  */
+ int32_t set_socket_unblock(int32_t socket_fd);
++int32_t set_tcp_keep_alive_info(int32_t sockfd, int32_t tcp_keepalive_idle, int32_t tcp_keepalive_interval);
+ 
+ 
+ #endif // __EXAMPLES_UTILITIES_H__
+diff --git a/examples/main.c b/examples/main.c
+index 5338572..dfee2db 100644
+--- a/examples/main.c
++++ b/examples/main.c
+@@ -31,7 +31,12 @@ int32_t main(int argc, char *argv[])
+ 
+     if (strcmp(prog_params.as, "server") == 0) {
+         server_create_and_run(&prog_params);
+-    } else {
++    } else if (strcmp(prog_params.as, "client") == 0) {
++        client_create_and_run(&prog_params);
++    } else if (strcmp(prog_params.as, "loop") == 0) {
++        server_create_and_run(&prog_params);
++        /* sleep to wait for server creation */
++        sleep(1);
+         client_create_and_run(&prog_params);
+     }
+ 
+diff --git a/examples/src/bussiness.c b/examples/src/bussiness.c
+index 7263371..46c99fe 100644
+--- a/examples/src/bussiness.c
++++ b/examples/src/bussiness.c
+@@ -11,8 +11,9 @@
+  */
+ 
+ 
+-#include "bussiness.h"
++#include "parameter.h"
+ #include "client.h"
++#include "bussiness.h"
+ 
+ 
+ static const char bussiness_messages_low[] = "abcdefghijklmnopqrstuvwxyz";    // the lower charactors of business message
+@@ 
-135,41 +136,41 @@ int32_t client_bussiness(char *out, const char *in, uint32_t size, bool verify, + return PROGRAM_OK; + } + +-// server answers +-int32_t server_ans(struct ServerHandler *server_handler, uint32_t pktlen, const char* api, const char* domain) ++static void server_ans_free_buff(char *buff_in, char *buff_out) + { +- const uint32_t length = pktlen; +- char *buffer_in = (char *)malloc(length * sizeof(char)); +- char *buffer_out = (char *)malloc(length * sizeof(char)); ++ if (buff_in) { ++ free(buff_in); ++ } ++ if (buff_out) { ++ free(buff_out); ++ } ++} ++ ++// server_ans_read ++static int32_t server_ans_read(int32_t socket_fd, struct ServerBaseCfgInfo *server_base_info, char *buffer_in, ++ struct sockaddr *client_addr) ++{ ++ const uint32_t length = server_base_info->pktlen; ++ const char *api = server_base_info->api; ++ const char *domain = server_base_info->domain; + + int32_t cread = 0; + int32_t sread = length; + int32_t nread = 0; +- struct sockaddr_in client_addr; +- socklen_t len = sizeof(client_addr); + +- if (strcmp(domain, "udp") == 0 && strncmp(api, "recvfrom", strlen("recvfrom")) != 0) { +- if (getpeername(server_handler->fd, (struct sockaddr *)&client_addr, &len) < 0) { +- if (recvfrom(server_handler->fd, buffer_in, length, MSG_PEEK, (struct sockaddr *)&client_addr, &len) < 0) { +- return PROGRAM_FAULT; +- } +- if (connect(server_handler->fd, (struct sockaddr *)&client_addr, sizeof(struct sockaddr_in)) < 0) { +- return PROGRAM_FAULT; +- } +- } +- } ++ socklen_t len = sizeof(sockaddr_t); + + while (cread < sread) { + if (strcmp(domain, "udp") == 0 && strcmp(api, "recvfromsendto") == 0) { +- nread = recvfrom(server_handler->fd, buffer_in, length, 0, (struct sockaddr *)&client_addr, &len); ++ nread = recvfrom(socket_fd, buffer_in, length, 0, client_addr, &len); + } else { +- nread = read_api(server_handler->fd, buffer_in, length, api); ++ nread = read_api(socket_fd, buffer_in, length, api); + } +- + if (nread == 0) { + return PROGRAM_ABORT; + } else if (nread < 0) { + if (errno != EINTR && errno != EWOULDBLOCK && errno != EAGAIN) { ++ PRINT_ERROR("nread =%d, errno=%d", nread, errno); + return PROGRAM_FAULT; + } + } else { +@@ -177,66 +178,152 @@ int32_t server_ans(struct ServerHandler *server_handler, uint32_t pktlen, const + continue; + } + } ++ return PROGRAM_OK; ++} + +- if (strcmp(api, "recvfrom") == 0) { +- free(buffer_in); +- free(buffer_out); +- return PROGRAM_OK; +- } +- +- server_bussiness(buffer_out, buffer_in, length); ++static int32_t server_ans_write(int32_t socket_fd, struct ServerBaseCfgInfo *server_base_info, char *buffer_out, ++ struct sockaddr *client_addr) ++{ ++ const uint32_t length = server_base_info->pktlen; ++ const char *api = server_base_info->api; ++ const char *domain = server_base_info->domain; + + int32_t cwrite = 0; + int32_t swrite = length; + int32_t nwrite = 0; ++ socklen_t len = sizeof(sockaddr_t); ++ + while (cwrite < swrite) { + if (strcmp(domain, "udp") == 0 && strcmp(api, "recvfromsendto") == 0) { +- nwrite = sendto(server_handler->fd, buffer_out, length, 0, (struct sockaddr *)&client_addr, len); ++ nwrite = sendto(socket_fd, buffer_out, swrite - cwrite, 0, client_addr, len); + } else { +- nwrite = write_api(server_handler->fd, buffer_out, length, api); ++ nwrite = write_api(socket_fd, buffer_out, swrite - cwrite, api); + } + + if (nwrite == 0) { + return PROGRAM_ABORT; + } else if (nwrite < 0) { +- if (errno != EINTR && errno != EWOULDBLOCK && errno != EAGAIN) { ++ if (errno != EINTR && errno != EWOULDBLOCK && errno != EAGAIN) 
{ ++ PRINT_ERROR("nwrite =%d, errno=%d", nwrite, errno); + return PROGRAM_FAULT; +- } ++ } + } else { + cwrite += nwrite; + continue; + } + } ++ return PROGRAM_OK; ++} + +- free(buffer_in); +- free(buffer_out); ++// server answers ++int32_t server_ans(int32_t fd, uint32_t pktlen, const char* api, const char* domain) ++{ ++ const uint32_t length = pktlen; ++ char *buffer_in = (char *)calloc(length, sizeof(char)); ++ char *buffer_out = (char *)calloc(length, sizeof(char)); ++ if (buffer_in == NULL || buffer_out == NULL) { ++ return PROGRAM_FAULT; ++ } ++ ++ struct ServerBaseCfgInfo server_base_info; ++ server_base_info.domain = domain; ++ server_base_info.api = api; ++ server_base_info.pktlen = pktlen; ++ ++ sockaddr_t client_addr; ++ socklen_t len = sizeof(sockaddr_t); ++ ++ if (strcmp(domain, "udp") == 0 && strncmp(api, "recvfrom", strlen("recvfrom")) != 0) { ++ if (getpeername(fd, (struct sockaddr *)&client_addr, &len) < 0) { ++ if (recvfrom(fd, buffer_in, length, MSG_PEEK, (struct sockaddr *)&client_addr, &len) < 0) { ++ server_ans_free_buff(buffer_in, buffer_out); ++ return PROGRAM_FAULT; ++ } ++ if (connect(fd, (struct sockaddr *)&client_addr, len) < 0) { ++ server_ans_free_buff(buffer_in, buffer_out); ++ return PROGRAM_FAULT; ++ } ++ } ++ } ++ ++ fault_inject_delay(INJECT_DELAY_READ); ++ FAULT_INJECT_SKIP_BEGIN(INJECT_SKIP_READ) ++ ++ if (server_ans_read(fd, &server_base_info, buffer_in, (struct sockaddr *)&client_addr) != PROGRAM_OK) { ++ server_ans_free_buff(buffer_in, buffer_out); ++ return PROGRAM_FAULT; ++ } ++ ++ FAULT_INJECT_SKIP_END ++ ++ if (strcmp(api, "recvfrom") == 0) { ++ server_ans_free_buff(buffer_in, buffer_out); ++ return PROGRAM_OK; ++ } ++ ++ server_bussiness(buffer_out, buffer_in, length); ++ ++ fault_inject_delay(INJECT_DELAY_WRITE); ++ FAULT_INJECT_SKIP_BEGIN(INJECT_SKIP_WRITE) ++ ++ if (server_ans_write(fd, &server_base_info, buffer_out, (struct sockaddr *)&client_addr) != PROGRAM_OK) { ++ server_ans_free_buff(buffer_in, buffer_out); ++ return PROGRAM_FAULT; ++ } ++ ++ FAULT_INJECT_SKIP_END ++ ++ server_ans_free_buff(buffer_in, buffer_out); + + return PROGRAM_OK; + } + + // client asks +-int32_t client_ask(struct ClientHandler *client_handler, uint32_t pktlen, const char* api, const char* domain, in_addr_t ip, uint16_t port) ++int32_t client_ask(struct ClientHandler *client_handler, struct ClientUnit *client_unit) + { +- const uint32_t length = pktlen; +- char *buffer_in = (char *)malloc(length * sizeof(char)); +- char *buffer_out = (char *)malloc(length * sizeof(char)); +- struct sockaddr_in server_addr; +- socklen_t len = sizeof(server_addr); +- memset_s(&server_addr, sizeof(server_addr), 0, sizeof(server_addr)); +- server_addr.sin_family = AF_INET; +- server_addr.sin_addr.s_addr = ip; +- server_addr.sin_port = port; ++ const char *api = client_unit->api; ++ const char *domain = client_unit->domain; ++ ++ ip_addr_t *ip = client_unit->protocol_type_mode == UDP_MULTICAST ? 
&client_unit->groupip : &client_unit->ip; ++ uint16_t port = client_unit->port; ++ ++ const uint32_t length = client_unit->pktlen; ++ char *buffer_in = (char *)calloc(length, sizeof(char)); ++ char *buffer_out = (char *)calloc(length, sizeof(char)); ++ if (buffer_in == NULL || buffer_out == NULL) { ++ return PROGRAM_FAULT; ++ } ++ sockaddr_t server_addr; ++ socklen_t len = 0; ++ ++ if (ip->addr_family == AF_INET6) { ++ memset_s(&server_addr, sizeof(struct sockaddr_in6), 0, sizeof(struct sockaddr_in6)); ++ ((struct sockaddr_in6 *)&server_addr)->sin6_family = AF_INET6; ++ ((struct sockaddr_in6 *)&server_addr)->sin6_addr = ip->u_addr.ip6; ++ ((struct sockaddr_in6 *)&server_addr)->sin6_port = port; ++ len = sizeof(struct sockaddr_in6); ++ } else if (ip->addr_family == AF_INET) { ++ memset_s(&server_addr, sizeof(struct sockaddr_in), 0, sizeof(struct sockaddr_in)); ++ ((struct sockaddr_in *)&server_addr)->sin_family = AF_INET; ++ ((struct sockaddr_in *)&server_addr)->sin_addr = ip->u_addr.ip4; ++ ((struct sockaddr_in *)&server_addr)->sin_port = port; ++ len = sizeof(struct sockaddr_in); ++ } + + client_bussiness(buffer_out, buffer_in, length, false, &(client_handler->msg_idx)); + + int32_t cwrite = 0; + int32_t swrite = length; + int32_t nwrite = 0; ++ ++ fault_inject_delay(INJECT_DELAY_WRITE); ++ FAULT_INJECT_SKIP_BEGIN(INJECT_SKIP_WRITE) ++ + while (cwrite < swrite) { + if (strcmp(domain, "udp") == 0 && strcmp(api, "recvfromsendto") == 0) { +- nwrite = sendto(client_handler->fd, buffer_out, length, 0, (struct sockaddr *)&server_addr, len); ++ nwrite = sendto(client_handler->fd, buffer_out, swrite - cwrite, 0, (struct sockaddr *)&server_addr, len); + } else { +- nwrite = write_api(client_handler->fd, buffer_out, length, api); ++ nwrite = write_api(client_handler->fd, buffer_out, swrite - cwrite, api); + } + if (nwrite == 0) { + return PROGRAM_ABORT; +@@ -250,6 +337,8 @@ int32_t client_ask(struct ClientHandler *client_handler, uint32_t pktlen, const + } + } + ++ FAULT_INJECT_SKIP_END ++ + free(buffer_in); + free(buffer_out); + +@@ -257,18 +346,24 @@ int32_t client_ask(struct ClientHandler *client_handler, uint32_t pktlen, const + } + + // client checks +-int32_t client_chkans(struct ClientHandler *client_handler, uint32_t pktlen, bool verify, const char* api, const char* domain, in_addr_t ip) ++int32_t client_chkans(struct ClientHandler *client_handler, uint32_t pktlen, bool verify, const char* api, const char* domain, ip_addr_t* ip) + { + const uint32_t length = pktlen; +- char *buffer_in = (char *)malloc(length * sizeof(char)); +- char *buffer_out = (char *)malloc(length * sizeof(char)); ++ char *buffer_in = (char *)calloc(length, sizeof(char)); ++ char *buffer_out = (char *)calloc(length, sizeof(char)); ++ if (buffer_in == NULL || buffer_out == NULL) { ++ return PROGRAM_FAULT; ++ } + + int32_t cread = 0; + int32_t sread = length; + int32_t nread = 0; +- struct sockaddr_in server_addr; +- socklen_t len = sizeof(server_addr); ++ sockaddr_t server_addr; ++ socklen_t len = ip->addr_family == AF_INET ? 
sizeof(struct sockaddr_in) : sizeof(struct sockaddr_in6); + ++ fault_inject_delay(INJECT_DELAY_READ); ++ FAULT_INJECT_SKIP_BEGIN(INJECT_SKIP_READ) ++ + while (cread < sread) { + if (strcmp(domain, "udp") == 0 && strcmp(api, "recvfromsendto") == 0) { + nread = recvfrom(client_handler->fd, buffer_in, length, 0, (struct sockaddr *)&server_addr, &len); +@@ -287,6 +382,8 @@ int32_t client_chkans(struct ClientHandler *client_handler, uint32_t pktlen, boo + } + } + ++ FAULT_INJECT_SKIP_END ++ + if (client_bussiness(buffer_out, buffer_in, length, verify, &(client_handler->msg_idx)) < 0) { + PRINT_ERROR("message verify fault! "); + getchar(); +@@ -295,15 +392,18 @@ int32_t client_chkans(struct ClientHandler *client_handler, uint32_t pktlen, boo + int32_t cwrite = 0; + int32_t swrite = length; + int32_t nwrite = 0; +- if (ip >= inet_addr("224.0.0.0") && ip <= inet_addr("239.255.255.255")) { +- server_addr.sin_addr.s_addr = ip; ++ if (ip->addr_family == AF_INET && ip->u_addr.ip4.s_addr >= inet_addr("224.0.0.0") && ip->u_addr.ip4.s_addr <= inet_addr("239.255.255.255")) { ++ ((struct sockaddr_in*)&server_addr)->sin_addr = ip->u_addr.ip4; + } + ++ fault_inject_delay(INJECT_DELAY_WRITE); ++ FAULT_INJECT_SKIP_BEGIN(INJECT_SKIP_WRITE) ++ + while (cwrite < swrite) { + if (strcmp(domain, "udp") == 0 && strcmp(api, "recvfromsendto") == 0) { +- nwrite = sendto(client_handler->fd, buffer_out, length, 0, (struct sockaddr *)&server_addr, len); ++ nwrite = sendto(client_handler->fd, buffer_out, swrite - cwrite, 0, (struct sockaddr *)&server_addr, len); + } else { +- nwrite = write_api(client_handler->fd, buffer_out, length, api); ++ nwrite = write_api(client_handler->fd, buffer_out, swrite - cwrite, api); + } + if (nwrite == 0) { + return PROGRAM_ABORT; +@@ -317,6 +417,8 @@ int32_t client_chkans(struct ClientHandler *client_handler, uint32_t pktlen, boo + } + } + ++ FAULT_INJECT_SKIP_END ++ + free(buffer_in); + free(buffer_out); + +diff --git a/examples/src/client.c b/examples/src/client.c +index 1366924..43fbd0e 100644 +--- a/examples/src/client.c ++++ b/examples/src/client.c +@@ -12,24 +12,61 @@ + + + #include "client.h" +- ++#include "server.h" + + static pthread_mutex_t client_debug_mutex; // the client mutex for printf ++struct Client *g_client_begin = NULL; ++ ++static int32_t client_process_ask(struct ClientHandler *client_handler, struct ClientUnit *client_unit); ++static void client_get_domain_ipversion(uint8_t protocol_type, struct ClientUnit *client_unit); + ++static void timer_handle(int signum) ++{ ++ if (g_client_begin == NULL) { ++ return; ++ } ++ ++ struct ClientUnit *begin_client_unit = g_client_begin->uints; ++ while (begin_client_unit != NULL) { ++ if (begin_client_unit->domain != NULL && strcmp(begin_client_unit->domain, "udp") != 0) { ++ begin_client_unit = begin_client_unit->next; ++ continue; ++ } ++ for (int32_t i = 0; i < begin_client_unit->connect_num; i++) { ++ struct ClientHandler *handle = begin_client_unit->handlers + i; ++ if (handle->sendtime_interverl == TIME_SEND_INTERVAL) { ++ client_process_ask(handle, begin_client_unit); ++ } else { ++ handle->sendtime_interverl++; ++ } ++ } ++ ++ begin_client_unit = begin_client_unit->next; ++ } ++ alarm(TIME_SCAN_INTERVAL); ++} ++ ++static struct Client_domain_ip g_cfgmode_map[PROTOCOL_MODE_MAX] = { ++ [V4_TCP] = {"tcp", AF_INET}, ++ [V6_TCP] = {"tcp", AF_INET6}, ++ [V4_UDP] = {"udp", AF_INET}, ++ [V6_UDP] = {"udp", AF_INET6}, ++ [UDP_MULTICAST] = {"udp", AF_INET}}; + + // the single thread, client prints informations +-void 
client_debug_print(const char *ch_str, const char *act_str, in_addr_t ip, uint16_t port, bool debug) ++void client_debug_print(const char *ch_str, const char *act_str, ip_addr_t *ip, uint16_t port, bool debug) + { + if (debug == true) { + pthread_mutex_lock(&client_debug_mutex); +- struct in_addr sin_addr; +- sin_addr.s_addr = ip; ++ uint8_t str_len = ip->addr_family == AF_INET ? INET_ADDRSTRLEN : INET6_ADDRSTRLEN; ++ char str_ip[str_len]; ++ inet_ntop(ip->addr_family, &ip->u_addr, str_ip, str_len); + PRINT_CLIENT("[%s] [pid: %d] [tid: %ld] [%s <- %s:%d]. ", \ + ch_str, \ + getpid(), \ + pthread_self(), \ + act_str, \ +- inet_ntoa(sin_addr), \ ++ str_ip, \ + ntohs(port)); + pthread_mutex_unlock(&client_debug_mutex); + } +@@ -41,7 +78,8 @@ void client_info_print(struct Client *client) + if (client->debug == false) { + struct timeval begin; + gettimeofday(&begin, NULL); +- uint64_t begin_time = (uint64_t)begin.tv_sec * 1000 + (uint64_t)begin.tv_usec / 1000; ++ uint64_t begin_time = (uint64_t)begin.tv_sec * TIMES_CONVERSION_RATE + ++ (uint64_t)begin.tv_usec / TIMES_CONVERSION_RATE; + + uint32_t curr_connect = 0; + double bytes_ps = 0; +@@ -49,45 +87,164 @@ void client_info_print(struct Client *client) + struct ClientUnit *begin_uint = client->uints; + while (begin_uint != NULL) { + curr_connect += begin_uint->curr_connect; +- begin_send_bytes += begin_uint->send_bytes; ++ begin_send_bytes += begin_uint->threadVolume.send_bytes; + begin_uint = begin_uint->next; + } + + struct timeval delay; + delay.tv_sec = 0; +- delay.tv_usec = TERMINAL_REFRESH_MS * 1000; ++ delay.tv_usec = TERMINAL_REFRESH_MS * TIMES_CONVERSION_RATE; + select(0, NULL, NULL, NULL, &delay); + + uint64_t end_send_bytes = 0; + struct ClientUnit *end_uint = client->uints; + while (end_uint != NULL) { +- end_send_bytes += end_uint->send_bytes; ++ end_send_bytes += end_uint->threadVolume.send_bytes; + end_uint = end_uint->next; + } + + struct timeval end; + gettimeofday(&end, NULL); +- uint64_t end_time = (uint64_t)end.tv_sec * 1000 + (uint64_t)end.tv_usec / 1000; +- ++ uint64_t end_time = (uint64_t)end.tv_sec * TIMES_CONVERSION_RATE + ++ (uint64_t)end.tv_usec / TIMES_CONVERSION_RATE; ++ + double bytes_sub = end_send_bytes > begin_send_bytes ? (double)(end_send_bytes - begin_send_bytes) : 0; +- double time_sub = end_time > begin_time ? (double)(end_time - begin_time) / 1000 : 0; ++ double time_sub = end_time > begin_time ? 
(double)(end_time - begin_time) / TIMES_CONVERSION_RATE : 0; + + bytes_ps = bytes_sub / time_sub; + +- if (bytes_ps < 1024) { ++ if (bytes_ps < KB) { + PRINT_CLIENT_DATAFLOW("[connect num]: %d, [send]: %.3f B/s", curr_connect, bytes_ps); +- } else if (bytes_ps < (1024 * 1024)) { +- PRINT_CLIENT_DATAFLOW("[connect num]: %d, [send]: %.3f KB/s", curr_connect, bytes_ps / 1024); ++ } else if (bytes_ps < MB) { ++ PRINT_CLIENT_DATAFLOW("[connect num]: %d, [send]: %.3f KB/s", curr_connect, bytes_ps / KB); + } else { +- PRINT_CLIENT_DATAFLOW("[connect num]: %d, [send]: %.3f MB/s", curr_connect, bytes_ps / (1024 * 1024)); ++ PRINT_CLIENT_DATAFLOW("[connect num]: %d, [send]: %.3f MB/s", curr_connect, bytes_ps / MB); ++ } ++ ++ if (client->loop) { ++ printf("\033[2A\033[120C\033[K\n"); ++ return; ++ } ++ printf("\033[A\033[K"); ++ } ++} ++ ++static int32_t client_process_ask(struct ClientHandler *client_handler, struct ClientUnit *client_unit) ++{ ++ // not support udp+v6 currently ++ if (strcmp(client_unit->domain, "udp") == 0 && client_unit->ip.addr_family == AF_INET6) { ++ return PROGRAM_OK; ++ } ++ ++ int32_t client_ask_ret = client_ask(client_handler, client_unit); ++ if (client_ask_ret == PROGRAM_FAULT) { ++ --client_unit->curr_connect; ++ struct epoll_event ep_ev; ++ if (client_handler->fd > 0 && epoll_ctl(client_unit->epfd, EPOLL_CTL_DEL, (client_handler)->fd, &ep_ev) < 0) { ++ PRINT_ERROR("client can't delete socket '%d' to control epoll %d! ", (client_handler)->fd, errno); ++ return PROGRAM_FAULT; + } ++ } else if (client_ask_ret == PROGRAM_ABORT) { ++ --client_unit->curr_connect; ++ if (close((client_handler)->fd) < 0) { ++ PRINT_ERROR("client can't close the socket! "); ++ return PROGRAM_FAULT; ++ } ++ client_debug_print("client unit", "close", &client_unit->ip, client_unit->port, client_unit->debug); ++ } else { ++ client_unit->threadVolume.send_bytes += client_unit->pktlen; ++ client_handler->sendtime_interverl = 0; ++ client_debug_print("client unit", "send", &client_unit->ip, client_unit->port, client_unit->debug); ++ } ++ return PROGRAM_OK; ++} ++ ++static void client_get_thread_volume(struct Client *client, struct ThreadUintInfo *threadVolume) ++{ ++ int index = 0; ++ struct ClientUnit *curUint = client->uints; ++ while (curUint != NULL && index < client->threadNum) { ++ threadVolume[index].send_bytes = curUint->threadVolume.send_bytes; ++ ++ threadVolume[index].cur_connect_num = curUint->curr_connect; ++ threadVolume[index].thread_id = curUint->threadVolume.thread_id; ++ threadVolume[index].domain = curUint->threadVolume.domain; ++ threadVolume[index].ip_type_info = curUint->threadVolume.ip_type_info; ++ curUint = curUint->next; ++ index++; ++ } ++} ++ ++void client_info_print_mixed(struct Client *client, struct ThreadUintInfo *threadVolume, ++ struct ThreadUintInfo *endThreadVolume) ++{ ++ if (client->debug == true) { ++ return; ++ } ++ int32_t pthread_num = client->threadNum; ++ int32_t not_support_thread = 0; ++ struct timeval cur = {0}; ++ ++ gettimeofday(&cur, NULL); ++ uint64_t begin_time = (uint64_t)cur.tv_sec * TIMES_CONVERSION_RATE + (uint64_t)cur.tv_usec / TIMES_CONVERSION_RATE; ++ ++ client_get_thread_volume(client, threadVolume); ++ ++ struct timeval delay; ++ delay.tv_sec = 0; ++ delay.tv_usec = TERMINAL_REFRESH_MS * TIMES_CONVERSION_RATE; ++ select(0, NULL, NULL, NULL, &delay); ++ ++ client_get_thread_volume(client, endThreadVolume); ++ ++ gettimeofday(&cur, NULL); ++ uint64_t end_time = (uint64_t)cur.tv_sec * TIMES_CONVERSION_RATE + (uint64_t)cur.tv_usec / 
TIMES_CONVERSION_RATE;
++
++    for (int i = 0; i < pthread_num; i++) {
++        uint64_t begin_send_bytes = threadVolume[i].send_bytes;
++        uint64_t end_send_bytes = endThreadVolume[i].send_bytes;
++        pthread_t thread_id = endThreadVolume[i].thread_id;
++        uint32_t connect_num = endThreadVolume[i].cur_connect_num;
++        char *domain = endThreadVolume[i].domain;
++        char *ip_ver = endThreadVolume[i].ip_type_info;
++
++        if (thread_id == 0) {
++            not_support_thread++;
++            continue;
++        }
++
++        double bytes_sub = end_send_bytes > begin_send_bytes ? (double)(end_send_bytes - begin_send_bytes) : 0;
++        double time_sub = end_time > begin_time ? (double)(end_time - begin_time) / TIMES_CONVERSION_RATE : 0;
++        double bytes_ps = bytes_sub / time_sub;
++
++        if (bytes_ps < KB) {
++            PRINT_CLIENT_DATAFLOW("threadID=%-15lu, %s_%-9s [connect num]: %u, [send]: %.3f B/s",
++                                  thread_id, domain, ip_ver, connect_num, bytes_ps);
++        } else if (bytes_ps < MB) {
++            PRINT_CLIENT_DATAFLOW("threadID=%-15lu, %s_%-9s [connect num]: %u, [send]: %.3f KB/s",
++                                  thread_id, domain, ip_ver, connect_num, bytes_ps / KB);
++        } else {
++            PRINT_CLIENT_DATAFLOW("threadID=%-15lu, %s_%-9s [connect num]: %u, [send]: %.3f MB/s",
++                                  thread_id, domain, ip_ver, connect_num, bytes_ps / MB);
++        }
++    }
++    printf("\033[%dA\033[K", pthread_num - not_support_thread);
++}
++
++void loop_info_print()
++{
++    printf(" ");
++    if (strcmp(loopmod.model, "mum") == 0) {
++        sermum_info_print(loopmod.server_mum_info);
++    } else {
++        sermud_info_print(loopmod.server_mud_info);
+     }
+ }
+ 
+ // the single thread, client try to connect to server, register to epoll
+-int32_t client_thread_try_connect(struct ClientHandler *client_handler, int32_t epoll_fd, in_addr_t ip, in_addr_t groupip, uint16_t port, uint16_t sport, const char *domain, const char *api)
++int32_t client_thread_try_connect(struct ClientHandler *client_handler, struct ClientUnit *client_unit)
+ {
+-    int32_t create_socket_and_connect_ret = create_socket_and_connect(&(client_handler->fd), ip, groupip, port, sport, domain, api);
++    int32_t create_socket_and_connect_ret = create_socket_and_connect(&(client_handler->fd), client_unit);
+     if (create_socket_and_connect_ret == PROGRAM_INPROGRESS) {
+         return PROGRAM_OK;
+     }
+@@ -97,7 +254,7 @@ int32_t client_thread_try_connect(struct ClientHandler *client_handler, int32_t
+ // the single thread, client retry to connect to server, register to epoll
+ int32_t client_thread_retry_connect(struct ClientUnit *client_unit, struct ClientHandler *client_handler)
+ {
+-    int32_t clithd_try_cnntask_ret = client_thread_try_connect(client_handler, client_unit->epfd, client_unit->ip, client_unit->groupip, client_unit->port, client_unit->sport, client_unit->domain, client_unit->api);
++    int32_t clithd_try_cnntask_ret = client_thread_try_connect(client_handler, client_unit);
+     if (clithd_try_cnntask_ret < 0) {
+         if (clithd_try_cnntask_ret == PROGRAM_INPROGRESS) {
+             return PROGRAM_OK;
+@@ -114,35 +271,27 @@
+ 
+     ++(client_unit->curr_connect);
+ 
+-    struct sockaddr_in server_addr;
+-    socklen_t server_addr_len = sizeof(server_addr);
++    sockaddr_t server_addr;
++    socklen_t server_addr_len = client_unit->ip.addr_family == AF_INET ? sizeof(struct sockaddr_in) : sizeof(struct sockaddr_in6);
+     if (getpeername(client_handler->fd, (struct sockaddr *)&server_addr, &server_addr_len) < 0) {
+         PRINT_ERROR("client can't socket peername %d! 
", errno); + return PROGRAM_FAULT; + } +- client_debug_print("client unit", "connect", server_addr.sin_addr.s_addr, server_addr.sin_port, client_unit->debug); + +- int32_t client_ask_ret = client_ask(client_handler, client_unit->pktlen, client_unit->api, client_unit->domain, client_unit->groupip ? client_unit->groupip:client_unit->ip, client_unit->port); +- if (client_ask_ret == PROGRAM_FAULT) { +- --client_unit->curr_connect; +- struct epoll_event ep_ev; +- if (epoll_ctl(client_unit->epfd, EPOLL_CTL_DEL, client_handler->fd, &ep_ev) < 0) { +- PRINT_ERROR("client can't delete socket '%d' to control epoll %d! ", client_handler->fd, errno); +- return PROGRAM_FAULT; +- } +- } else if (client_ask_ret == PROGRAM_ABORT) { +- --client_unit->curr_connect; +- if (close(client_handler->fd) < 0) { +- PRINT_ERROR("client can't close the socket %d! ", errno); +- return PROGRAM_FAULT; +- } +- client_debug_print("client unit", "close", server_addr.sin_addr.s_addr, server_addr.sin_port, client_unit->debug); +- } else { +- client_unit->send_bytes += client_unit->pktlen; +- client_debug_print("client unit", "send", server_addr.sin_addr.s_addr, server_addr.sin_port, client_unit->debug); ++ // sockaddr to ip, port ++ ip_addr_t remote_ip; ++ uint16_t remote_port = ((struct sockaddr_in*)&server_addr)->sin_port; ++ if (((struct sockaddr *)&server_addr)->sa_family == AF_INET) { ++ remote_ip.addr_family = AF_INET; ++ remote_ip.u_addr.ip4 = ((struct sockaddr_in *)&server_addr)->sin_addr; ++ } else if (((struct sockaddr *)&server_addr)->sa_family == AF_INET6) { ++ remote_ip.addr_family = AF_INET6; ++ remote_ip.u_addr.ip6 = ((struct sockaddr_in6 *)&server_addr)->sin6_addr; + } + +- return PROGRAM_OK; ++ client_debug_print("client unit", "connect", &remote_ip, remote_port, client_unit->debug); ++ ++ return client_process_ask(client_handler, client_unit); + } + + // the single thread, client connects and gets epoll feature descriptors +@@ -162,7 +311,7 @@ int32_t client_thread_create_epfd_and_reg(struct ClientUnit *client_unit) + } + + for (uint32_t i = 0; i < connect_num; ++i) { +- int32_t clithd_try_cnntask_ret = client_thread_try_connect(client_unit->handlers + i, client_unit->epfd, client_unit->ip, client_unit->groupip, client_unit->port, client_unit->sport, client_unit->domain, client_unit->api); ++ int32_t clithd_try_cnntask_ret = client_thread_try_connect(client_unit->handlers + i, client_unit); + if (clithd_try_cnntask_ret < 0) { + if (clithd_try_cnntask_ret == PROGRAM_INPROGRESS) { + continue; +@@ -179,26 +328,11 @@ int32_t client_thread_create_epfd_and_reg(struct ClientUnit *client_unit) + + ++(client_unit->curr_connect); + +- client_debug_print("client unit", "connect", client_unit->ip, client_unit->port, client_unit->debug); +- +- int32_t client_ask_ret = client_ask(client_unit->handlers + i, client_unit->pktlen, client_unit->api, client_unit->domain, client_unit->groupip ? client_unit->groupip:client_unit->ip, client_unit->port); +- if (client_ask_ret == PROGRAM_FAULT) { +- --client_unit->curr_connect; +- struct epoll_event ep_ev; +- if (epoll_ctl(client_unit->epfd, EPOLL_CTL_DEL, (client_unit->handlers + i)->fd, &ep_ev) < 0) { +- PRINT_ERROR("client can't delete socket '%d' to control epoll %d! ", client_unit->epevs[i].data.fd, errno); +- return PROGRAM_FAULT; +- } +- } else if (client_ask_ret == PROGRAM_ABORT) { +- --client_unit->curr_connect; +- if (close((client_unit->handlers + i)->fd) < 0) { +- PRINT_ERROR("client can't close the socket! 
"); +- return PROGRAM_FAULT; +- } +- client_debug_print("client unit", "close", client_unit->ip, client_unit->port, client_unit->debug); +- } else { +- client_unit->send_bytes += client_unit->pktlen; +- client_debug_print("client unit", "send", client_unit->ip, client_unit->port, client_unit->debug); ++ client_debug_print("client unit", "connect", &client_unit->ip, client_unit->port, client_unit->debug); ++ ++ int32_t client_ask_ret = client_process_ask(client_unit->handlers + i, client_unit); ++ if (client_ask_ret != PROGRAM_OK) { ++ return client_ask_ret; + } + } + } +@@ -206,15 +340,97 @@ int32_t client_thread_create_epfd_and_reg(struct ClientUnit *client_unit) + return PROGRAM_OK; + } + ++ ++static int32_t clithd_proc_epevs_epollout(struct epoll_event *curr_epev, struct ClientUnit *client_unit) ++{ ++ int32_t connect_error = 0; ++ socklen_t connect_error_len = sizeof(connect_error); ++ struct ClientHandler *client_handler = (struct ClientHandler *)curr_epev->data.ptr; ++ if (getsockopt(client_handler->fd, SOL_SOCKET, SO_ERROR, (void *)(&connect_error), &connect_error_len) < 0) { ++ PRINT_ERROR("client can't get socket option %d! ", errno); ++ return PROGRAM_FAULT; ++ } ++ if (connect_error < 0) { ++ if (connect_error == ETIMEDOUT) { ++ if (client_thread_retry_connect(client_unit, client_handler) < 0) { ++ return PROGRAM_FAULT; ++ } ++ return PROGRAM_OK; ++ } ++ PRINT_ERROR("client connect error %d! ", connect_error); ++ return PROGRAM_FAULT; ++ } else { ++ ++(client_unit->curr_connect); ++ ++ sockaddr_t server_addr; ++ socklen_t server_addr_len = ++ client_unit->ip.addr_family == AF_INET ? sizeof(struct sockaddr_in) : sizeof(struct sockaddr_in6); ++ if (getpeername(client_handler->fd, (struct sockaddr *)&server_addr, &server_addr_len) < 0) { ++ PRINT_ERROR("client can't socket peername %d! ", errno); ++ return PROGRAM_FAULT; ++ } ++ ++ // sockaddr to ip, port ++ ip_addr_t remote_ip; ++ uint16_t remote_port = ((struct sockaddr_in *)&server_addr)->sin_port; ++ if (((struct sockaddr *)&server_addr)->sa_family == AF_INET) { ++ remote_ip.addr_family = AF_INET; ++ remote_ip.u_addr.ip4 = ((struct sockaddr_in *)&server_addr)->sin_addr; ++ } else if (((struct sockaddr *)&server_addr)->sa_family == AF_INET6) { ++ remote_ip.addr_family = AF_INET6; ++ remote_ip.u_addr.ip6 = ((struct sockaddr_in6 *)&server_addr)->sin6_addr; ++ } ++ ++ client_debug_print("client unit", "connect", &remote_ip, remote_port, client_unit->debug); ++ ++ int32_t client_ask_ret = client_process_ask(client_handler, client_unit); ++ if (client_ask_ret != PROGRAM_OK) { ++ return client_ask_ret; ++ } ++ } ++ return PROGRAM_OK; ++} ++ ++static int32_t clithd_proc_epevs_epollin(struct epoll_event *curr_epev, struct ClientUnit *client_unit) ++{ ++ ip_addr_t *chkans_ip = client_unit->protocol_type_mode == UDP_MULTICAST ? &client_unit->groupip : &client_unit->ip; ++ int32_t client_chkans_ret = client_chkans((struct ClientHandler *)curr_epev->data.ptr, client_unit->pktlen, ++ client_unit->verify, client_unit->api, client_unit->domain, chkans_ip); ++ struct ClientHandler *client_handler = (struct ClientHandler *)curr_epev->data.ptr; ++ int32_t fd = client_handler->fd; ++ if (client_chkans_ret == PROGRAM_FAULT) { ++ --client_unit->curr_connect; ++ struct epoll_event ep_ev; ++ if (epoll_ctl(client_unit->epfd, EPOLL_CTL_DEL, fd, &ep_ev) < 0) { ++ PRINT_ERROR("client can't delete socket '%d' to control epoll %d! 
", fd, errno); ++ return PROGRAM_FAULT; ++ } ++ } else if (client_chkans_ret == PROGRAM_ABORT) { ++ --client_unit->curr_connect; ++ if (close(fd) < 0) { ++ PRINT_ERROR("client can't close the socket %d! ", errno); ++ return PROGRAM_FAULT; ++ } ++ client_debug_print("client unit", "close", &client_unit->ip, client_unit->port, client_unit->debug); ++ } else { ++ client_unit->threadVolume.send_bytes += client_unit->pktlen; ++ client_handler->sendtime_interverl = 0; ++ client_debug_print("client unit", "receive", &client_unit->ip, client_unit->port, client_unit->debug); ++ } ++ return PROGRAM_OK; ++} ++ + // the single thread, client processes epoll events + int32_t clithd_proc_epevs(struct ClientUnit *client_unit) + { + int32_t epoll_nfds = epoll_wait(client_unit->epfd, client_unit->epevs, CLIENT_EPOLL_SIZE_MAX, CLIENT_EPOLL_WAIT_TIMEOUT); ++ int ret = 0; + if (epoll_nfds < 0) { + PRINT_ERROR("client epoll wait error %d! ", errno); + return PROGRAM_FAULT; + } + ++ + for (int32_t i = 0; i < epoll_nfds; ++i) { + struct epoll_event *curr_epev = client_unit->epevs + i; + +@@ -222,76 +438,17 @@ int32_t clithd_proc_epevs(struct ClientUnit *client_unit) + PRINT_ERROR("client epoll wait error! %d", curr_epev->events); + return PROGRAM_FAULT; + } else if (curr_epev->events == EPOLLOUT) { +- int32_t connect_error = 0; +- socklen_t connect_error_len = sizeof(connect_error); +- struct ClientHandler *client_handler = (struct ClientHandler *)curr_epev->data.ptr; +- if (getsockopt(client_handler->fd, SOL_SOCKET, SO_ERROR, (void *)(&connect_error), &connect_error_len) < 0) { +- PRINT_ERROR("client can't get socket option %d! ", errno); +- return PROGRAM_FAULT; +- } +- if (connect_error < 0) { +- if (connect_error == ETIMEDOUT) { +- if (client_thread_retry_connect(client_unit, client_handler) < 0) { +- return PROGRAM_FAULT; +- } +- continue; +- } +- PRINT_ERROR("client connect error %d! ", connect_error); +- return PROGRAM_FAULT; +- } else { +- ++(client_unit->curr_connect); +- +- struct sockaddr_in server_addr; +- socklen_t server_addr_len = sizeof(server_addr); +- if (getpeername(client_handler->fd, (struct sockaddr *)&server_addr, &server_addr_len) < 0) { +- PRINT_ERROR("client can't socket peername %d! ", errno); +- return PROGRAM_FAULT; +- } +- client_debug_print("client unit", "connect", server_addr.sin_addr.s_addr, server_addr.sin_port, client_unit->debug); +- +- int32_t client_ask_ret = client_ask(client_handler, client_unit->pktlen, client_unit->api, client_unit->domain, client_unit->groupip ? client_unit->groupip:client_unit->ip, client_unit->port); +- if (client_ask_ret == PROGRAM_FAULT) { +- --client_unit->curr_connect; +- struct epoll_event ep_ev; +- if (epoll_ctl(client_unit->epfd, EPOLL_CTL_DEL, curr_epev->data.fd, &ep_ev) < 0) { +- PRINT_ERROR("client can't delete socket '%d' to control epoll %d! ", curr_epev->data.fd, errno); +- return PROGRAM_FAULT; +- } +- } else if (client_ask_ret == PROGRAM_ABORT) { +- --client_unit->curr_connect; +- if (close(curr_epev->data.fd) < 0) { +- PRINT_ERROR("client can't close the socket! 
"); +- return PROGRAM_FAULT; +- } +- client_debug_print("client unit", "close", server_addr.sin_addr.s_addr, server_addr.sin_port, client_unit->debug); +- } else { +- client_unit->send_bytes += client_unit->pktlen; +- client_debug_print("client unit", "send", server_addr.sin_addr.s_addr, server_addr.sin_port, client_unit->debug); +- } ++ ret = clithd_proc_epevs_epollout(curr_epev, client_unit); ++ if (ret != PROGRAM_OK) { ++ return ret; + } + } else if (curr_epev->events == EPOLLIN) { +- int32_t client_chkans_ret = client_chkans((struct ClientHandler *)curr_epev->data.ptr, client_unit->pktlen, client_unit->verify, client_unit->api, client_unit->domain, client_unit->groupip ? client_unit->groupip:client_unit->ip); +- if (client_chkans_ret == PROGRAM_FAULT) { +- --client_unit->curr_connect; +- struct epoll_event ep_ev; +- if (epoll_ctl(client_unit->epfd, EPOLL_CTL_DEL, curr_epev->data.fd, &ep_ev) < 0) { +- PRINT_ERROR("client can't delete socket '%d' to control epoll %d! ", curr_epev->data.fd, errno); +- return PROGRAM_FAULT; +- } +- } else if (client_chkans_ret == PROGRAM_ABORT) { +- --client_unit->curr_connect; +- if (close(curr_epev->data.fd) < 0) { +- PRINT_ERROR("client can't close the socket %d! ", errno); +- return PROGRAM_FAULT; +- } +- client_debug_print("client unit", "close", client_unit->ip, client_unit->port, client_unit->debug); +- } else { +- client_unit->send_bytes += client_unit->pktlen; +- client_debug_print("client unit", "receive", client_unit->ip, client_unit->port, client_unit->debug); ++ ret = clithd_proc_epevs_epollin(curr_epev, client_unit); ++ if (ret != PROGRAM_OK) { ++ return ret; + } + } + } +- + return PROGRAM_OK; + } + +@@ -299,6 +456,17 @@ int32_t clithd_proc_epevs(struct ClientUnit *client_unit) + void *client_s_create_and_run(void *arg) + { + struct ClientUnit *client_unit = (struct ClientUnit *)arg; ++ // update domain ip info. ++ client_get_domain_ipversion(client_unit->protocol_type_mode, client_unit); ++ ++ if (client_unit->protocol_type_mode == UDP_MULTICAST) { ++ client_unit->threadVolume.ip_type_info = IPV4_MULTICAST; ++ } else { ++ client_unit->threadVolume.ip_type_info = (client_unit->ip.addr_family == AF_INET ? IPV4_STR : IPV6_STR); ++ } ++ client_unit->threadVolume.thread_id = pthread_self(); ++ ++ client_unit->threadVolume.domain = client_unit->domain; + + if (client_thread_create_epfd_and_reg(client_unit) < 0) { + exit(PROGRAM_FAULT); +@@ -316,6 +484,42 @@ void *client_s_create_and_run(void *arg) + return (void *)PROGRAM_OK; + } + ++// prase the specific supported TCP IP types by cfg_mode. 
++static void client_get_protocol_type_by_cfgmode(uint8_t mode, int32_t *support_type_array, int32_t buff_len, ++ int32_t *actual_len) ++{ ++ int32_t index = 0; ++ for (uint8_t i = V4_TCP; i < PROTOCOL_MODE_MAX; i++) { ++ if (i == V6_UDP) { ++ continue; ++ } ++ if (getbit_num(mode, i) == 1) { ++ if (index >= buff_len) { ++ PRINT_ERROR("index is over, index =%d", index); ++ return; ++ } ++ support_type_array[index] = i; ++ index++; ++ } ++ } ++ *actual_len = index; ++} ++ ++static void client_get_domain_ipversion(uint8_t protocol_type, struct ClientUnit *client_unit) ++{ ++ client_unit->domain = g_cfgmode_map[protocol_type].domain; ++ client_unit->ip.addr_family = g_cfgmode_map[protocol_type].ip_family; ++} ++ ++static void alarm_init() ++{ ++ struct sigaction sa; ++ memset(&sa, 0, sizeof(sa)); ++ sa.sa_handler = &timer_handle; ++ sigaction(SIGALRM, &sa, NULL); ++ alarm(TIME_SCAN_INTERVAL); ++} ++ + // create client and run + int32_t client_create_and_run(struct ProgramParams *params) + { +@@ -323,16 +527,44 @@ int32_t client_create_and_run(struct ProgramParams *params) + const uint32_t thread_num = params->thread_num; + pthread_t *tids = (pthread_t *)malloc(thread_num * sizeof(pthread_t)); + struct Client *client = (struct Client *)malloc(sizeof(struct Client)); ++ g_client_begin = client; ++ client->threadNum = thread_num; ++ + struct ClientUnit *client_unit = (struct ClientUnit *)malloc(sizeof(struct ClientUnit)); ++ memset_s(client_unit, sizeof(struct ClientUnit), 0, sizeof(struct ClientUnit)); ++ int32_t protocol_support_array[PROTOCOL_MODE_MAX] = {0}; ++ int32_t number_of_support_type = 1; + + if (pthread_mutex_init(&client_debug_mutex, NULL) < 0) { + PRINT_ERROR("client can't init posix mutex %d! ", errno); + return PROGRAM_FAULT; + } + ++ bool v4_cfg_flag = (strcmp(params->ip, PARAM_DEFAULT_IP) != 0); ++ bool v6_cfg_flag = (strcmp(params->ipv6, PARAM_DEFAULT_IP_V6) != 0); ++ bool multcact_cfg_flag = (strcmp(params->groupip, PARAM_DEFAULT_GROUPIP) != 0); ++ ++ bool mixed_mode_flag = false; ++ if ((strchr(params->domain, ',') != NULL) || (v4_cfg_flag && v6_cfg_flag) || ++ (multcact_cfg_flag && (v4_cfg_flag || v6_cfg_flag))) { ++ mixed_mode_flag = true; ++ } ++ + client->uints = client_unit; + client->debug = params->debug; + ++ uint8_t protocol_type_mode = program_get_protocol_mode_by_domain_ip(params->domain, params->ip, params->ipv6, ++ params->groupip); ++ ++ client_get_protocol_type_by_cfgmode(protocol_type_mode, protocol_support_array, PROTOCOL_MODE_MAX, ++ &number_of_support_type); ++ ++ uint32_t port = UNIX_TCP_PORT_MIN; ++ uint32_t sport = 0; ++ uint32_t sp = 0; ++ ++ alarm_init(); ++ + for (uint32_t i = 0; i < thread_num; ++i) { + client_unit->handlers = (struct ClientHandler *)malloc(connect_num * sizeof(struct ClientHandler)); + for (uint32_t j = 0; j < connect_num; ++j) { +@@ -342,13 +574,42 @@ int32_t client_create_and_run(struct ProgramParams *params) + client_unit->epfd = -1; + client_unit->epevs = (struct epoll_event *)malloc(CLIENT_EPOLL_SIZE_MAX * sizeof(struct epoll_event)); + client_unit->curr_connect = 0; +- client_unit->send_bytes = 0; +- client_unit->ip = inet_addr(params->ip); +- client_unit->groupip = inet_addr(params->groupip); +- client_unit->port = htons(params->port); +- client_unit->sport = htons(params->sport); ++ ++ client_unit->threadVolume.cur_connect_num = 0; ++ client_unit->threadVolume.thread_id = 0; ++ client_unit->threadVolume.send_bytes = 0; ++ client_unit->threadVolume.ip_type_info = INVAILD_STR; ++ client_unit->threadVolume.domain = 
INVAILD_STR; ++ ++ client_unit->ip.addr_family = params->addr_family; ++ inet_pton(AF_INET, params->ip, &client_unit->ip.u_addr.ip4); ++ inet_pton(AF_INET6, params->ipv6, &client_unit->ip.u_addr.ip6); ++ client_unit->groupip.addr_family = AF_INET; ++ inet_pton(AF_INET, params->groupip, &client_unit->groupip.u_addr); ++ client_unit->groupip_interface.addr_family = params->addr_family; ++ inet_pton(AF_INET, params->groupip_interface, &client_unit->groupip_interface.u_addr); ++ ++ /* loop to set ports to each client_units */ ++ while (!((params->port)[port])) { ++ port = (port + 1) % UNIX_TCP_PORT_MAX; ++ } ++ client_unit->port = htons(port++); ++ ++ sp = sport; ++ sport++; ++ while (!((params->sport)[sport]) && (sport != sp)) { ++ sport = (sport + 1) % UNIX_TCP_PORT_MAX; ++ } ++ ++ client_unit->sport = htons(sport); + client_unit->connect_num = params->connect_num; + client_unit->pktlen = params->pktlen; ++ if (strcmp(params->as, "loop") == 0) { ++ client_unit->loop = 1; ++ } else { ++ client_unit->loop = 0; ++ } ++ + client_unit->verify = params->verify; + client_unit->domain = params->domain; + client_unit->api = params->api; +@@ -357,6 +618,16 @@ int32_t client_create_and_run(struct ProgramParams *params) + client_unit->next = (struct ClientUnit *)malloc(sizeof(struct ClientUnit)); + memset_s(client_unit->next, sizeof(struct ClientUnit), 0, sizeof(struct ClientUnit)); + ++ if (number_of_support_type > 0) { ++ int32_t index = i % number_of_support_type; ++ client_unit->protocol_type_mode = protocol_support_array[index]; ++ } ++ if (client_unit->protocol_type_mode == V4_UDP || client_unit->protocol_type_mode == V6_UDP || ++ client_unit->protocol_type_mode == UDP_MULTICAST) { ++ client_unit->pktlen = params->pktlen > UDP_PKTLEN_MAX ? UDP_PKTLEN_MAX : params->pktlen; ++ } else { ++ client_unit->pktlen = params->pktlen; ++ } + if (pthread_create((tids + i), NULL, client_s_create_and_run, client_unit) < 0) { + PRINT_ERROR("client can't create thread of poisx %d! 
", errno); + return PROGRAM_FAULT; +@@ -367,9 +638,34 @@ int32_t client_create_and_run(struct ProgramParams *params) + if (client->debug == false) { + printf("[program informations]: \n\n"); + } ++ ++ struct ThreadUintInfo *beginVolume = (struct ThreadUintInfo *)malloc(thread_num * sizeof(struct ThreadUintInfo)); ++ if (beginVolume == NULL) { ++ return PROGRAM_FAULT; ++ } ++ memset_s(beginVolume, thread_num * sizeof(struct ThreadUintInfo), 0, thread_num * sizeof(struct ThreadUintInfo)); ++ struct ThreadUintInfo *endVolume = (struct ThreadUintInfo *)malloc(thread_num * sizeof(struct ThreadUintInfo)); ++ if (endVolume == NULL) { ++ return PROGRAM_FAULT; ++ } ++ memset_s(endVolume, thread_num * sizeof(struct ThreadUintInfo), 0, thread_num * sizeof(struct ThreadUintInfo)); ++ ++ if (strcmp(params->as, "loop") == 0) { ++ client->loop = true; ++ } ++ + while (true) { +- client_info_print(client); ++ if (strcmp(params->as, "loop") == 0) { ++ loop_info_print(); ++ } ++ if (mixed_mode_flag == true) { ++ client_info_print_mixed(client, beginVolume, endVolume); ++ } else { ++ client_info_print(client); ++ } + } ++ free(beginVolume); ++ free(endVolume); + + pthread_mutex_destroy(&client_debug_mutex); + +diff --git a/examples/src/parameter.c b/examples/src/parameter.c +index 1bb6858..7f519e7 100644 +--- a/examples/src/parameter.c ++++ b/examples/src/parameter.c +@@ -13,6 +13,8 @@ + + #include "parameter.h" + ++static int32_t g_inject_delay[INJECT_DELAY_MAX] = {0}; ++static int32_t g_inject_skip[INJECT_SKIP_MAX]; + + // program short options + const char prog_short_opts[] = \ +@@ -30,9 +32,11 @@ const char prog_short_opts[] = \ + "r" // ringpmd + "d" // debug + "h" // help +- "E" // epollcreate +- "C" // accept ++ "E:" // epollcreate ++ "C:" // accept + "g:" // group address ++ "k:" // tcp keep_alive ++ "I:" // fault inject + ; + + // program long options +@@ -55,17 +59,72 @@ const struct ProgramOption prog_long_opts[] = \ + {PARAM_NAME_EPOLLCREATE, REQUIRED_ARGUMETN, NULL, PARAM_NUM_EPOLLCREATE}, + {PARAM_NAME_ACCEPT, REQUIRED_ARGUMETN, NULL, PARAM_NUM_ACCEPT}, + {PARAM_NAME_GROUPIP, REQUIRED_ARGUMETN, NULL, PARAM_NUM_GROUPIP}, ++ {PARAM_NAME_KEEPALIVE, REQUIRED_ARGUMETN, NULL, PARAM_NUM_KEEPALIVE}, ++ {PARAM_NAME_INJECT, REQUIRED_ARGUMETN, NULL, PARAM_NUM_INJECT}, + }; + + + // get long options + int getopt_long(int argc, char * const argv[], const char *optstring, const struct ProgramOption *long_opts, int *long_idx); ++// index [0,7) ++uint8_t getbit_num(uint8_t mode, uint8_t index) ++{ ++ return (mode & ((uint8_t)1 << index)) != 0; ++} ++ ++uint8_t setbitnum_on(uint8_t mode, uint8_t index) ++{ ++ mode |= ((uint8_t)1 << index); ++ return mode; ++} ++ ++uint8_t setbitnum_off(uint8_t mode, uint8_t index) ++{ ++ mode &= ~((uint8_t)1 << index); ++ return mode; ++} ++ ++static uint8_t program_set_protocol_mode(uint8_t protocol_mode, char *ipv4, char *ipv6, uint8_t index_v4, ++ uint8_t index_v6) ++{ ++ uint8_t protocol_mode_temp = protocol_mode; ++ if (strcmp(ipv4, PARAM_DEFAULT_IP) != 0) { ++ protocol_mode_temp = setbitnum_on(protocol_mode_temp, index_v4); ++ } ++ if (strcmp(ipv6, PARAM_DEFAULT_IP_V6) != 0) { ++ protocol_mode_temp = setbitnum_on(protocol_mode_temp, index_v6); ++ } ++ return protocol_mode_temp; ++} ++ ++uint8_t program_get_protocol_mode_by_domain_ip(char* domain, char* ipv4, char* ipv6, char* groupip) ++{ ++ uint8_t protocol_mode = 0; ++ char *cur_ptr = NULL; ++ char *next_Ptr = NULL; ++ cur_ptr = strtok_s(domain, ",", &next_Ptr); ++ while (cur_ptr) { ++ if (strcmp(cur_ptr, "tcp") == 0) { 
++            protocol_mode = program_set_protocol_mode(protocol_mode, ipv4, ipv6, V4_TCP, V6_TCP);
++        } else if (strcmp(cur_ptr, "udp") == 0) {
++            protocol_mode = program_set_protocol_mode(protocol_mode, ipv4, ipv6, V4_UDP, V6_UDP);
++        } else if (strcmp(cur_ptr, "unix") == 0) {
++            protocol_mode = setbitnum_on(protocol_mode, UNIX);
++        }
++        cur_ptr = strtok_s(NULL, ",", &next_Ptr);
++    }
++
++    if (strcmp(groupip, PARAM_DEFAULT_GROUPIP) != 0) {
++        protocol_mode = setbitnum_on(protocol_mode, UDP_MULTICAST);
++    }
+ 
++    return protocol_mode;
++}
+ 
+ // set `as` parameter
+ void program_param_parse_as(struct ProgramParams *params)
+ {
+-    if (strcmp(optarg, "server") == 0 || strcmp(optarg, "client") == 0) {
++    if (strcmp(optarg, "server") == 0 || strcmp(optarg, "client") == 0 || strcmp(optarg, "loop") == 0) {
+         params->as = optarg;
+     } else {
+         PRINT_ERROR("illigal argument -- %s \n", optarg);
+@@ -73,40 +132,113 @@ void program_param_parse_as(struct ProgramParams *params)
+     }
+ }
+ 
+-// set `ip` parameter
+-void program_param_parse_ip(struct ProgramParams *params)
++bool ip_is_v6(const char *cp)
++{
++    if (cp != NULL) {
++        const char *c;
++        for (c = cp; *c != 0; c++) {
++            if (*c == ':') {
++                return 1;
++            }
++        }
++    }
++    return 0;
++}
++
++
++static bool program_ipv4_check(char *ipv4)
+ {
+-    if (inet_addr(optarg) != INADDR_NONE) {
+-        params->ip = optarg;
++    in_addr_t ip = ntohl(inet_addr(ipv4));
++    if (ip == INADDR_NONE) {
++        PRINT_ERROR("illegal argument -- %s \n", ipv4);
++        return false;
++    }
++    if ((ip >= ntohl(inet_addr("1.0.0.1")) && ip <= ntohl(inet_addr("126.255.255.254"))) ||
++        (ip >= ntohl(inet_addr("127.0.0.1")) && ip <= ntohl(inet_addr("127.255.255.254"))) ||
++        (ip >= ntohl(inet_addr("128.0.0.1")) && ip <= ntohl(inet_addr("191.255.255.254"))) ||
++        (ip >= ntohl(inet_addr("192.0.0.1")) && ip <= ntohl(inet_addr("223.255.255.254"))) ||
++        (ip >= ntohl(inet_addr("224.0.0.1")) && ip <= ntohl(inet_addr("224.255.255.255"))) ) { // multicast range
++        return true;
++    }
++
++    PRINT_ERROR("illegal argument -- %s \n", ipv4);
++    return false;
++}
++
++static void program_param_parse_ipv4_addr(char* v4ip_addr, struct ProgramParams *params)
++{
++    struct in6_addr ip_tmp;
++    params->addr_family = AF_INET;
++    if (inet_pton(params->addr_family, v4ip_addr, &ip_tmp) > 0 && program_ipv4_check(v4ip_addr) == true) {
++        params->ip = v4ip_addr;
+     } else {
+-        PRINT_ERROR("illigal argument -- %s \n", optarg);
++        PRINT_ERROR("illegal ipv4 addr -- %s \n", v4ip_addr);
+         exit(PROGRAM_ABORT);
+     }
+ }
+ 
+-// set `port` parameter
+-void program_param_parse_port(struct ProgramParams *params)
++static void program_param_parse_ipv6_addr(char* v6ip_add, struct ProgramParams *params)
+ {
+-    int32_t port_arg = strtol(optarg, NULL, 0);
+-    printf("%d\n", port_arg);
+-    if (CHECK_VAL_RANGE(port_arg, UNIX_TCP_PORT_MIN, UNIX_TCP_PORT_MAX) == true) {
+-        params->port = (uint32_t)port_arg;
++    struct in6_addr ip_tmp;
++    params->addr_family = AF_INET6;
++    if (inet_pton(AF_INET6, v6ip_add, &ip_tmp) > 0) {
++        params->ipv6 = v6ip_add;
+     } else {
+-        PRINT_ERROR("illigal argument -- %s \n", optarg);
++        PRINT_ERROR("illegal ipv6 addr -- %s \n", v6ip_add);
+         exit(PROGRAM_ABORT);
+     }
+ }
++// set `ip` parameter; an ipv4 and an ipv6 address can be configured together, in the form ipv4,ipv6
++void program_param_parse_ip(struct ProgramParams *params)
++{
++    char *cur_ptr = NULL;
++    char *next_ptr = NULL;
++
++    cur_ptr = strtok_s(optarg, ",", &next_ptr);
++    while (cur_ptr) {
++        if (ip_is_v6(cur_ptr)) {
++            program_param_parse_ipv6_addr(cur_ptr, params);
++        } else {
++            
program_param_parse_ipv4_addr(cur_ptr, params);
++        }
++        cur_ptr = strtok_s(NULL, ",", &next_ptr);
++    }
++}
++
++// set `port` parameter
++void program_param_parse_port(struct ProgramParams *params)
++{
++    char* port_list = optarg;
++    char* token = NULL;
++    int32_t port_arg = 0;
++    params->port[PARAM_DEFAULT_PORT] = 0;
++
++    while ((token = strtok_r(port_list, ",", &port_list))) {
++        port_arg = strtol(token, NULL, 0);
++        if (CHECK_VAL_RANGE(port_arg, UNIX_TCP_PORT_MIN, UNIX_TCP_PORT_MAX) == true) {
++            params->port[port_arg] = 1;
++        } else {
++            PRINT_ERROR("illegal argument -- %s \n", optarg);
++            exit(PROGRAM_ABORT);
++        }
++    }
++}
+ 
+ // set `sport` parameter
+ void program_param_parse_sport(struct ProgramParams *params)
+ {
+-    int32_t sport_arg = strtol(optarg, NULL, 0);
+-    printf("%d\n", sport_arg);
+-    if (CHECK_VAL_RANGE(sport_arg, UNIX_TCP_PORT_MIN, UNIX_TCP_PORT_MAX) == true) {
+-        params->sport = (uint32_t)sport_arg;
+-    } else {
+-        PRINT_ERROR("illigal argument -- %s \n", optarg);
+-        exit(PROGRAM_ABORT);
++    char* port_list = optarg;
++    char* token = NULL;
++    int32_t port_arg = 0;
++
++    while ((token = strtok_r(port_list, ",", &port_list))) {
++        port_arg = strtol(token, NULL, 0);
++        if (CHECK_VAL_RANGE(port_arg, UNIX_TCP_PORT_MIN, UNIX_TCP_PORT_MAX) == true) {
++            params->sport[port_arg] = 1;
++        } else {
++            PRINT_ERROR("illegal argument -- %s \n", optarg);
++            exit(PROGRAM_ABORT);
++        }
+     }
+ }
+ 
+@@ -148,12 +280,23 @@ void program_param_parse_threadnum(struct ProgramParams *params)
+ // set `domain` parameter
+ void program_param_parse_domain(struct ProgramParams *params)
+ {
+-    if (strcmp(optarg, "unix") == 0 || strcmp(optarg, "tcp") == 0 || strcmp(optarg, "udp") == 0) {
+-        params->domain = optarg;
+-    } else {
+-        PRINT_ERROR("illigal argument -- %s \n", optarg);
++    char temp[100] = {0};
++    int32_t ret = strcpy_s(temp, sizeof(temp) / sizeof(char), optarg);
++    if (ret != 0) {
++        PRINT_ERROR("strcpy_s fail ret=%d \n", ret);
+         exit(PROGRAM_ABORT);
+     }
++    char *cur_ptr = temp;
++    char *next_ptr = NULL;
++    cur_ptr = strtok_s(cur_ptr, ",", &next_ptr);
++    while (cur_ptr) {
++        if (strcmp(cur_ptr, "tcp") != 0 && strcmp(cur_ptr, "udp") != 0 && strcmp(cur_ptr, "unix") != 0) {
++            PRINT_ERROR("illegal argument -- %s \n", cur_ptr);
++            exit(PROGRAM_ABORT);
++        }
++        cur_ptr = strtok_s(NULL, ",", &next_ptr);
++    }
++    params->domain = optarg;
+ }
+ 
+ // set `api` parameter
+@@ -174,6 +317,9 @@ void program_param_parse_pktlen(struct ProgramParams *params)
+     int32_t pktlen_arg = strtol(optarg, NULL, 0);
+     if (CHECK_VAL_RANGE(pktlen_arg, MESSAGE_PKTLEN_MIN, MESSAGE_PKTLEN_MAX) == true) {
+         params->pktlen = (uint32_t)pktlen_arg;
++        if (strstr(params->domain, "udp") && params->pktlen > UDP_PKTLEN_MAX) {
++            PRINT_WARNNING("udp message too long, change it to %d \n", UDP_PKTLEN_MAX);
++        }
+     } else {
+         PRINT_ERROR("illigal argument -- %s \n", optarg);
+         exit(PROGRAM_ABORT);
+@@ -202,16 +348,196 @@ void program_param_parse_accept(struct ProgramParams *params)
+     }
+ }
+ 
++// set tcp keepalive parameters (keep_alive_idle,keep_alive_interval)
++void program_param_parse_keepalive(struct ProgramParams *params)
++{
++    char *token = NULL;
++    char *next_token = NULL;
++    token = strtok_s(optarg, ",", &next_token);
++    if (token == NULL) {
++        PRINT_ERROR("parse keep_alive idle null, illegal argument (%s) \n", optarg);
++        exit(PROGRAM_ABORT);
++    }
++
++    int32_t keep_alive_idle = strtol(optarg, NULL, 0);
++    if (keep_alive_idle > 0 && keep_alive_idle <= TCP_KEEPALIVE_IDLE_MAX) {
++        params->tcp_keepalive_idle = keep_alive_idle;
++    } else {
++        
PRINT_ERROR("keep_alive_idle=%d,illigal argument -- %s \n", keep_alive_idle, optarg); ++ exit(PROGRAM_ABORT); ++ } ++ ++ token = strtok_s(NULL, ",", &next_token); ++ if (token == NULL) { ++ PRINT_ERROR("parse keep_alive interval null, illigal argument(%s) \n", optarg); ++ exit(PROGRAM_ABORT); ++ } ++ int32_t keep_alive_interval = strtol(token, NULL, 0); ++ if (keep_alive_interval > 0 && keep_alive_interval <= TCP_KEEPALIVE_IDLE_MAX) { ++ params->tcp_keepalive_interval = keep_alive_interval; ++ } else { ++ PRINT_ERROR("keep_alive_interval=%d,illigal argument -- %s \n", keep_alive_interval, optarg); ++ exit(PROGRAM_ABORT); ++ } ++} ++ + // set `group ip` parameter + void program_param_parse_groupip(struct ProgramParams *params) + { +- in_addr_t ip = inet_addr(optarg); +- if (ip != INADDR_NONE && ip >= inet_addr("224.0.0.0") && ip <= inet_addr("239.255.255.255")) { +- params->groupip = optarg; ++ char *cur_ptr = NULL; ++ char *next_ptr = NULL; ++ ++ cur_ptr = strtok_s(optarg, ",", &next_ptr); ++ if (program_ipv4_check(cur_ptr) == false) { ++ PRINT_ERROR("illigal argument -- %s \n", cur_ptr); ++ exit(PROGRAM_ABORT); ++ } ++ ++ in_addr_t ip = ntohl(inet_addr(cur_ptr)); ++ if (ip != INADDR_NONE && ip >= ntohl(inet_addr("224.0.0.0")) && ip <= ntohl(inet_addr("239.255.255.255"))) { ++ params->groupip = cur_ptr; + } else { +- PRINT_ERROR("illigal argument -- %s \n", optarg); ++ PRINT_ERROR("illigal argument -- %s \n", cur_ptr); ++ exit(PROGRAM_ABORT); ++ } ++ ++ if (*next_ptr) { ++ if (program_ipv4_check(next_ptr)) { ++ params->groupip_interface = next_ptr; ++ } else { ++ PRINT_ERROR("illigal argument -- %s \n", next_ptr); ++ exit(PROGRAM_ABORT); ++ } ++ } ++} ++ ++void fault_inject_delay(delay_type type) ++{ ++ if (g_inject_delay[type]) { ++ printf("FAULT INJECT: Delay begin, sleep %d seconds.\n", g_inject_delay[type]); ++ sleep(g_inject_delay[type]); ++ g_inject_delay[type] = 0; ++ printf("FAULT INJECT: Delay finished.\n"); ++ } ++} ++ ++ ++// apply fault inject type of delay ++static void delay_param_parse(struct ProgramParams *params) ++{ ++ int32_t time = 0; ++ if (params->inject[INJECT_TIME_IDX] != NULL) { ++ time = atoi(params->inject[INJECT_TIME_IDX]); ++ } ++ if (time <= 0) { ++ PRINT_ERROR("FAULT INJECT: delay time input error! 
receive: \"%s\"\n", params->inject[INJECT_TIME_IDX]); ++ exit(PROGRAM_ABORT); ++ } ++ ++ char *location = params->inject[INJECT_LOCATION_IDX]; ++ if (location == NULL) { ++ PRINT_ERROR("FAULT INJECT: Lack param for delay fault inject, The location is not appointed.\n"); + exit(PROGRAM_ABORT); + } ++ ++ if (strcmp("before_accept", location) == 0) { ++ g_inject_delay[INJECT_DELAY_ACCEPT] = time; ++ return; ++ } ++ if (strcmp("before_read", location) == 0) { ++ g_inject_delay[INJECT_DELAY_READ] = time; ++ return; ++ } ++ if (strcmp("before_write", location) == 0) { ++ g_inject_delay[INJECT_DELAY_WRITE] = time; ++ return; ++ } ++ if (strcmp("before_read_and_write", location) == 0) { ++ g_inject_delay[INJECT_DELAY_READ] = time; ++ g_inject_delay[INJECT_DELAY_WRITE] = time; ++ return; ++ } ++ ++ PRINT_ERROR("FAULT INJECT: Unidentified fault inject location -- %s \n", location); ++ exit(PROGRAM_ABORT); ++} ++ ++// apply fault inject type of skip ++static void skip_param_parse(struct ProgramParams *params) ++{ ++ char* location = params->inject[INJECT_SKIP_IDX]; ++ if (location == NULL) { ++ PRINT_ERROR("FAULT INJECT: Lack param for skip fault inject, location is not appointed.\n"); ++ exit(PROGRAM_ABORT); ++ } ++ ++ if (strcmp("read", location) == 0) { ++ g_inject_skip[INJECT_SKIP_READ] = 1; ++ return; ++ } ++ if (strcmp("write", location) == 0) { ++ g_inject_skip[INJECT_SKIP_WRITE] = 1; ++ return; ++ } ++ if (strcmp("read_and_write", location) == 0) { ++ g_inject_skip[INJECT_SKIP_READ] = 1; ++ g_inject_skip[INJECT_SKIP_WRITE] = 1; ++ return; ++ } ++ ++ PRINT_ERROR("FAULT INJECT: Unidentified fault inject location -- %s \n", location); ++ exit(PROGRAM_ABORT); ++} ++ ++// judge if need skip fault inject ++int32_t get_g_inject_skip(skip_type type) ++{ ++ return g_inject_skip[type]; ++} ++ ++// check legitimacy of fault injection and apply it. ++static void apply_fault_inject(struct ProgramParams *params) ++{ ++ char *inject_type = params->inject[INJECT_TYPE_IDX]; ++ if (strcmp("delay", inject_type) == 0) { ++ delay_param_parse(params); ++ return; ++ } ++ if (strcmp("skip", inject_type) == 0) { ++ skip_param_parse(params); ++ return; ++ } ++ ++ PRINT_ERROR("FAULT INJCET: Unidentified fault inject -- %s \n", inject_type); ++ exit(PROGRAM_ABORT); ++} ++ ++// set `fault injection` parameter ++static void program_param_parse_inject(struct ProgramParams *params) ++{ ++ int32_t inject_idx = 0; ++ char *inject_input = strdup(optarg); ++ if (inject_input == NULL) { ++ PRINT_ERROR("FAULT INJCET: Insufficient memory, strdup failed.\n"); ++ exit(PROGRAM_ABORT); ++ } ++ ++ char *context = NULL; ++ char *elem = strtok_s(inject_input, " ", &context); ++ if (elem == NULL) { ++ PRINT_ERROR("FAULT INJECT: Input error. -- %s \n", inject_input); ++ exit(PROGRAM_ABORT); ++ } ++ while (elem != NULL) { ++ if (inject_idx == FAULT_INJECT_PARA_COUNT) { ++ PRINT_ERROR("FAULT INJECT: Exceed the max count (3) of fault inject params. 
-- %s\n", optarg); ++ exit(PROGRAM_ABORT); ++ } ++ params->inject[inject_idx++] = elem; ++ elem = strtok_s(NULL, " ", &context); ++ } ++ ++ apply_fault_inject(params); + } + + // initialize the parameters +@@ -219,8 +545,11 @@ void program_params_init(struct ProgramParams *params) + { + params->as = PARAM_DEFAULT_AS; + params->ip = PARAM_DEFAULT_IP; +- params->port = PARAM_DEFAULT_PORT; +- params->sport = PARAM_DEFAULT_SPORT; ++ params->ipv6 = PARAM_DEFAULT_IP_V6; ++ params->addr_family = PARAM_DEFAULT_ADDR_FAMILY; ++ memset_s(params->port, sizeof(bool)*UNIX_TCP_PORT_MAX, 0, sizeof(bool)*UNIX_TCP_PORT_MAX); ++ memset_s(params->sport, sizeof(bool)*UNIX_TCP_PORT_MAX, 0, sizeof(bool)*UNIX_TCP_PORT_MAX); ++ (params->port)[PARAM_DEFAULT_PORT] = 1; + params->model = PARAM_DEFAULT_MODEL; + params->thread_num = PARAM_DEFAULT_THREAD_NUM; + params->connect_num = PARAM_DEFAULT_CONNECT_NUM; +@@ -233,15 +562,19 @@ void program_params_init(struct ProgramParams *params) + params->epollcreate = PARAM_DEFAULT_EPOLLCREATE; + params->accept = PARAM_DEFAULT_ACCEPT; + params->groupip = PARAM_DEFAULT_GROUPIP; ++ params->groupip_interface = PARAM_DEFAULT_GROUPIP; ++ params->tcp_keepalive_idle = PARAM_DEFAULT_KEEPALIVEIDLE; ++ params->tcp_keepalive_interval = PARAM_DEFAULT_KEEPALIVEIDLE; + } + + // print program helps + void program_params_help(void) + { + printf("\n"); +- printf("-a, --as [server | client]: set programas server or client. \n"); ++ printf("-a, --as [server | client | loop]: set programas server, client or loop. \n"); + printf(" server: as server. \n"); + printf(" client: as client. \n"); ++ printf(" loop: as server and client. \n"); + printf("-i, --ip [???.???.???.???]: set ip address. \n"); + printf("-g, --groupip [???.???.???.???]: set group ip address. \n"); + printf("-p, --port [????]: set port number in range of %d - %d. \n", UNIX_TCP_PORT_MIN, UNIX_TCP_PORT_MAX); +@@ -268,6 +601,16 @@ void program_params_help(void) + printf("-h, --help: see helps. \n"); + printf("-E, --epollcreate [ec | ec1]: epoll_create method. \n"); + printf("-C, --accept [ac | ac4]: accept method. \n"); ++ printf("-k, --keep_alive [keep_alive_idle:keep_alive_interval]: set tcp-alive info in range of %d-%d. 
\n", ++ PARAM_DEFAULT_KEEPALIVEIDLE, TCP_KEEPALIVE_IDLE_MAX); ++ printf("-I, --inject [\"fault_inject_param0 fault_inject_param1 fault_inject_param2\"]: fault inject\n"); ++ printf(" for example: \"delay 20 before_accept\"\n"); ++ printf(" \"delay 20 before_read\"\n"); ++ printf(" \"delay 20 before_write\"\n"); ++ printf(" \"delay 20 before_read_and_write\"\n"); ++ printf(" \"skip read\"\n"); ++ printf(" \"skip write\"\n"); ++ printf(" \"skip read_and_write\"\n"); + printf("\n"); + } + +@@ -295,7 +638,7 @@ int32_t program_params_parse(struct ProgramParams *params, uint32_t argc, char * + case (PARAM_NUM_PORT): + program_param_parse_port(params); + break; +- case (PARAM_NUM_SPORT): ++ case (PARAM_NUM_SPORT): + program_param_parse_sport(params); + break; + case (PARAM_NUM_MODEL): +@@ -331,9 +674,15 @@ int32_t program_params_parse(struct ProgramParams *params, uint32_t argc, char * + case (PARAM_NUM_ACCEPT): + program_param_parse_accept(params); + break; +- case (PARAM_NUM_GROUPIP): +- program_param_parse_groupip(params); +- break; ++ case (PARAM_NUM_GROUPIP): ++ program_param_parse_groupip(params); ++ break; ++ case (PARAM_NUM_KEEPALIVE): ++ program_param_parse_keepalive(params); ++ break; ++ case (PARAM_NUM_INJECT): ++ program_param_parse_inject(params); ++ break; + case (PARAM_NUM_HELP): + program_params_help(); + return PROGRAM_ABORT; +@@ -345,11 +694,6 @@ int32_t program_params_parse(struct ProgramParams *params, uint32_t argc, char * + } + } + +- if (strcmp(params->domain, "tcp") != 0) { +- params->thread_num = 1; +- params->connect_num = 1; +- } +- + return PROGRAM_OK; + } + +@@ -361,22 +705,47 @@ void program_params_print(struct ProgramParams *params) + printf("--> [as]: %s \n", params->as); + if (strcmp(params->groupip, PARAM_DEFAULT_GROUPIP) != 0) { + if (strcmp(params->as, "server") == 0) { +- printf("--> [server ip]: %s \n", params->ip); + printf("--> [server group ip]: %s \n", params->groupip); ++ printf("--> [server groupip_interface]: %s \n", params->groupip_interface); + } else { +- printf("--> [server ip]: %s \n", params->groupip); +- printf("--> [client send ip]: %s \n", params->ip); ++ printf("--> [client group ip]: %s \n", params->groupip); ++ printf("--> [client groupip_interface]: %s \n", params->groupip_interface); + } +- } else { +- printf("--> [server ip]: %s \n", params->ip); + } +- if ((strcmp(params->as, "server") == 0 && strcmp(params->groupip, PARAM_DEFAULT_GROUPIP)) != 0) { +- printf("--> [server group ip]: %s \n", params->groupip); ++ printf("--> [server ip]: %s \n", params->ip); ++ if (strcmp(params->ipv6, PARAM_DEFAULT_IP_V6) != 0) { ++ printf("--> [server ipv6]: %s \n", params->ipv6); ++ } ++ ++ printf("--> [server port]: "); ++ uint32_t comma = 0; ++ uint32_t sport = 0; ++ ++ /* use comma to print port list */ ++ for (uint32_t i = UNIX_TCP_PORT_MIN; i < UNIX_TCP_PORT_MAX; i++) { ++ if ((params->port)[i]) { ++ printf("%s%u", comma?",":"", i); ++ comma = 1; ++ } ++ if ((params->sport)[i]) { ++ sport = i; ++ } + } +- printf("--> [server port]: %u \n", params->port); +- if (params->sport && strcmp(params->as, "client") == 0) { +- printf("--> [client sport]: %u \n", params->sport); ++ printf(" \n"); ++ ++ /* use comma to print sport list */ ++ if (sport && strcmp(params->as, "client") == 0) { ++ printf("--> [client sport]: "); ++ comma = 0; ++ for (uint32_t i = UNIX_TCP_PORT_MIN; i < sport + 1; i++) { ++ if ((params->sport)[i]) { ++ printf("%s%u", comma?",":"", i); ++ comma = 1; ++ } ++ } ++ printf(" \n"); + } ++ + if (strcmp(params->as, "server") == 0) { + 
printf("--> [model]: %s \n", params->model); + } +@@ -404,5 +773,16 @@ void program_params_print(struct ProgramParams *params) + printf("--> [debug]: %s \n", (params->debug == true) ? "on" : "off"); + printf("--> [epoll create]: %s \n", params->epollcreate); + printf("--> [accept]: %s \n", params->accept); ++ printf("--> [inject]: "); ++ if (params->inject[INJECT_TYPE_IDX] == NULL) { ++ printf("none \n"); ++ } else { ++ for (int32_t i = 0; i < FAULT_INJECT_PARA_COUNT; ++i) { ++ if (params->inject[i] != NULL) { ++ printf("%s ", params->inject[i]); ++ } ++ } ++ printf("\n"); ++ } + printf("\n"); + } +diff --git a/examples/src/server.c b/examples/src/server.c +index 8634dde..7bc7d9e 100644 +--- a/examples/src/server.c ++++ b/examples/src/server.c +@@ -14,20 +14,22 @@ + #include "server.h" + + static pthread_mutex_t server_debug_mutex; // the server mutex for debug ++struct LoopInfo loopmod; + + // server debug information print +-void server_debug_print(const char *ch_str, const char *act_str, in_addr_t ip, uint16_t port, bool debug) ++void server_debug_print(const char *ch_str, const char *act_str, ip_addr_t *ip, uint16_t port, bool debug) + { + if (debug == true) { + pthread_mutex_lock(&server_debug_mutex); +- struct in_addr sin_addr; +- sin_addr.s_addr = ip; ++ uint8_t str_len = ip->addr_family == AF_INET ? INET_ADDRSTRLEN : INET6_ADDRSTRLEN; ++ char str_ip[str_len]; ++ inet_ntop(ip->addr_family, &ip->u_addr, str_ip, str_len); + PRINT_SERVER("[%s] [pid: %d] [tid: %ld] [%s <- %s:%d]. ", \ + ch_str, \ + getpid(), \ + pthread_self(), \ + act_str, \ +- inet_ntoa(sin_addr), \ ++ str_ip, \ + ntohs(port)); + pthread_mutex_unlock(&server_debug_mutex); + } +@@ -37,7 +39,7 @@ void server_debug_print(const char *ch_str, const char *act_str, in_addr_t ip, u + void sermud_info_print(struct ServerMud *server_mud) + { + if (server_mud->debug == false) { +- uint32_t curr_connect = server_mud->curr_connect; ++ uint32_t curr_connect = 0; + + struct timeval begin; + gettimeofday(&begin, NULL); +@@ -48,6 +50,7 @@ void sermud_info_print(struct ServerMud *server_mud) + struct ServerMudWorker *begin_uint = server_mud->workers; + while (begin_uint != NULL) { + begin_recv_bytes += begin_uint->recv_bytes; ++ curr_connect += begin_uint->curr_connect; + begin_uint = begin_uint->next; + } + +@@ -122,45 +125,82 @@ int32_t sermud_listener_create_epfd_and_reg(struct ServerMud *server_mud) + } + + struct epoll_event ep_ev; +- ep_ev.data.ptr = (void *)&(server_mud->listener); + ep_ev.events = EPOLLIN | EPOLLET; +- if (epoll_ctl(server_mud->epfd, EPOLL_CTL_ADD, server_mud->listener.fd, &ep_ev) < 0) { +- PRINT_ERROR("server can't control epoll %d! ", errno); +- return PROGRAM_FAULT; ++ for (int i = 0; i < PROTOCOL_MODE_MAX; i++) { ++ if (server_mud->listener.listen_fd_array[i] != -1) { ++ struct ServerHandler *server_handler = (struct ServerHandler *)malloc(sizeof(struct ServerHandler)); ++ memset_s(server_handler, sizeof(struct ServerHandler), 0, sizeof(struct ServerHandler)); ++ server_handler->fd = server_mud->listener.listen_fd_array[i]; ++ ep_ev.data.ptr = (void *)server_handler; ++ if (epoll_ctl(server_mud->epfd, EPOLL_CTL_ADD, server_mud->listener.listen_fd_array[i], &ep_ev) < 0) { ++ PRINT_ERROR("epoll_ctl failed %d! listen_fd=%d ", errno, server_mud->listener.listen_fd_array[i]); ++ return PROGRAM_FAULT; ++ } ++ } ++ } ++ ++ return PROGRAM_OK; ++} ++ ++static void sermud_accept_get_remote_ip(sockaddr_t *accept_addr, ip_addr_t *remote_ip, bool is_tcp_v6_flag) ++{ ++ remote_ip->addr_family = is_tcp_v6_flag ? 
AF_INET6 : AF_INET; ++ if (is_tcp_v6_flag == false) { ++ remote_ip->u_addr.ip4 = ((struct sockaddr_in *)accept_addr)->sin_addr; ++ } else { ++ remote_ip->u_addr.ip6 = ((struct sockaddr_in6 *)accept_addr)->sin6_addr; + } ++} + +- server_debug_print("server mud listener", "waiting", server_mud->ip, server_mud->port, server_mud->debug); ++int32_t sermud_set_socket_opt(int32_t accept_fd, struct ServerMud *server_mud) ++{ ++ if (set_tcp_keep_alive_info(accept_fd, server_mud->tcp_keepalive_idle, server_mud->tcp_keepalive_interval) < 0) { ++ PRINT_ERROR("cant't set_tcp_keep_alive_info! "); ++ return PROGRAM_FAULT; ++ } + ++ if (set_socket_unblock(accept_fd) < 0) { ++ PRINT_ERROR("server can't set the connect socket to unblock! "); ++ return PROGRAM_FAULT; ++ } + return PROGRAM_OK; + } + + // the listener thread, unblock, dissymmetric server accepts the connections +-int32_t sermud_listener_accept_connects(struct ServerMud *server_mud) ++int32_t sermud_listener_accept_connects(struct epoll_event *curr_epev, struct ServerMud *server_mud) + { ++ int32_t fd = ((struct ServerHandler*)(curr_epev->data.ptr))->fd; ++ fault_inject_delay(INJECT_DELAY_ACCEPT); ++ + while (true) { +- struct sockaddr_in accept_addr; +- uint32_t sockaddr_in_len = sizeof(struct sockaddr_in); ++ sockaddr_t accept_addr; ++ bool is_tcp_v6_flag = (fd == server_mud->listener.listen_fd_array[V6_TCP]) ? true : false; ++ ++ uint32_t sockaddr_in_len = is_tcp_v6_flag ? sizeof(struct sockaddr_in6) : sizeof(struct sockaddr_in); ++ + int32_t accept_fd; +- if (strcmp(server_mud->domain, "udp") == 0) { +- break; +- } ++ ++ int32_t listen_fd_index = is_tcp_v6_flag ? V6_TCP : V4_TCP; ++ int32_t listen_fd = server_mud->listener.listen_fd_array[listen_fd_index]; + + if (strcmp(server_mud->accept, "ac4") == 0) { +- accept_fd = accept4(server_mud->listener.fd, (struct sockaddr *)&accept_addr, &sockaddr_in_len, SOCK_CLOEXEC); ++ accept_fd = accept4(listen_fd, (struct sockaddr *)&accept_addr, &sockaddr_in_len, SOCK_CLOEXEC); + } else { +- accept_fd = accept(server_mud->listener.fd, (struct sockaddr *)&accept_addr, &sockaddr_in_len); ++ accept_fd = accept(listen_fd, (struct sockaddr *)&accept_addr, &sockaddr_in_len); + } +- ++ + if (accept_fd < 0) { + break; + } + +- if (set_socket_unblock(accept_fd) < 0) { +- PRINT_ERROR("server can't set the connect socket to unblock! "); ++ if (sermud_set_socket_opt(accept_fd, server_mud) < 0) { + return PROGRAM_FAULT; + } + +- ++(server_mud->curr_connect); ++ // sockaddr to ip, port ++ ip_addr_t remote_ip; ++ uint16_t remote_port = ((struct sockaddr_in *)&accept_addr)->sin_port; ++ sermud_accept_get_remote_ip(&accept_addr, &remote_ip, is_tcp_v6_flag); + + pthread_t *tid = (pthread_t *)malloc(sizeof(pthread_t)); + struct ServerMudWorker *worker = (struct ServerMudWorker *)malloc(sizeof(struct ServerMudWorker)); +@@ -169,26 +209,50 @@ int32_t sermud_listener_accept_connects(struct ServerMud *server_mud) + worker->epevs = (struct epoll_event *)malloc(sizeof(struct epoll_event)); + worker->recv_bytes = 0; + worker->pktlen = server_mud->pktlen; +- worker->ip = accept_addr.sin_addr.s_addr; +- worker->port = accept_addr.sin_port; ++ worker->ip = remote_ip; ++ worker->port = remote_port; + worker->api = server_mud->api; + worker->debug = server_mud->debug; + worker->next = server_mud->workers; + worker->epollcreate = server_mud->epollcreate; ++ worker->worker.is_v6 = is_tcp_v6_flag ? 
1 : 0; ++ worker->domain = server_mud->domain; ++ worker->curr_connect = 1; + + server_mud->workers = worker; + +- if (pthread_create(tid, NULL, sermud_worker_create_and_run, server_mud) < 0) { ++ if (pthread_create(tid, NULL, sermud_worker_create_and_run, worker) < 0) { + PRINT_ERROR("server can't create poisx thread %d! ", errno); + return PROGRAM_FAULT; + } + +- server_debug_print("server mud listener", "accept", accept_addr.sin_addr.s_addr, accept_addr.sin_port, server_mud->debug); ++ server_debug_print("server mud listener", "accept", &remote_ip, remote_port, server_mud->debug); + } + + return PROGRAM_OK; + } + ++static int32_t server_handler_close(int32_t epfd, struct ServerHandler *server_handler) ++{ ++ int32_t fd = server_handler->fd; ++ struct epoll_event ep_ev; ++ if (server_handler) { ++ free(server_handler); ++ } ++ ++ if (epoll_ctl(epfd, EPOLL_CTL_DEL, fd, &ep_ev) < 0) { ++ PRINT_ERROR("server can't delete socket '%d' to control epoll %d! ", fd, errno); ++ return PROGRAM_FAULT; ++ } ++ ++ if (close(fd) < 0) { ++ PRINT_ERROR("server can't close the socket %d! ", errno); ++ return PROGRAM_FAULT; ++ } ++ ++ return 0; ++} ++ + // the worker thread, unblock, dissymmetric server processes the events + int32_t sermud_worker_proc_epevs(struct ServerMudWorker *worker_unit, const char* domain) + { +@@ -201,32 +265,60 @@ int32_t sermud_worker_proc_epevs(struct ServerMudWorker *worker_unit, const char + for (int32_t i = 0; i < epoll_nfds; ++i) { + struct epoll_event *curr_epev = worker_unit->epevs + i; + +- if (curr_epev->events == EPOLLERR || curr_epev->events == EPOLLHUP || curr_epev->events == EPOLLRDHUP) { ++ if (curr_epev->events & (EPOLLERR | EPOLLHUP | EPOLLRDHUP)) { ++ worker_unit->curr_connect--; + PRINT_ERROR("server epoll wait error %d! ", curr_epev->events); +- return PROGRAM_FAULT; ++ if (server_handler_close(worker_unit->epfd, (struct ServerHandler *)curr_epev->data.ptr) != 0) { ++ return PROGRAM_FAULT; ++ } + } + + if (curr_epev->events == EPOLLIN) { + struct ServerHandler *server_handler = (struct ServerHandler *)curr_epev->data.ptr; + +- int32_t server_ans_ret = server_ans(server_handler, worker_unit->pktlen, worker_unit->api, domain); ++ int32_t server_ans_ret = server_ans(server_handler->fd, worker_unit->pktlen, worker_unit->api, "tcp"); + if (server_ans_ret == PROGRAM_FAULT) { +- struct epoll_event ep_ev; +- if (epoll_ctl(worker_unit->epfd, EPOLL_CTL_DEL, server_handler->fd, &ep_ev) < 0) { +- PRINT_ERROR("server can't delete socket '%d' to control epoll %d! ", server_handler->fd, errno); ++ worker_unit->curr_connect--; ++ if (server_handler_close(worker_unit->epfd, server_handler) != 0) { + return PROGRAM_FAULT; + } + } else if (server_ans_ret == PROGRAM_ABORT) { +- if (close(server_handler->fd) < 0) { +- PRINT_ERROR("server can't close the socket %d! 
", errno); ++ worker_unit->curr_connect--; ++ server_debug_print("server mud worker", "close", &worker_unit->ip, worker_unit->port, worker_unit->debug); ++ if (server_handler_close(worker_unit->epfd, server_handler) != 0) { + return PROGRAM_FAULT; + } +- server_debug_print("server mud worker", "close", worker_unit->ip, worker_unit->port, worker_unit->debug); + } else { + worker_unit->recv_bytes += worker_unit->pktlen; +- server_debug_print("server mud worker", "receive", worker_unit->ip, worker_unit->port, worker_unit->debug); ++ server_debug_print("server mud worker", "receive", &worker_unit->ip, worker_unit->port, worker_unit->debug); ++ } ++ } ++ } ++ ++ return PROGRAM_OK; ++} ++ ++static int32_t sermud_process_epollin_event(struct epoll_event *curr_epev, struct ServerMud *server_mud) ++{ ++ struct ServerHandler *server_handler = (struct ServerHandler *)curr_epev->data.ptr; ++ ++ if (server_handler->fd == server_mud->listener.listen_fd_array[V4_UDP] || ++ server_handler->fd == server_mud->listener.listen_fd_array[UDP_MULTICAST]) { ++ uint32_t pktlen = server_mud->pktlen > UDP_PKTLEN_MAX ? UDP_PKTLEN_MAX : server_mud->pktlen; ++ int32_t server_ans_ret = server_ans(server_handler->fd, pktlen, server_mud->api, "udp"); ++ if (server_ans_ret != PROGRAM_OK) { ++ if (server_handler_close(server_mud->epfd, server_handler) != 0) { ++ PRINT_ERROR("server_handler_close server_ans_ret %d! \n", server_ans_ret); ++ return PROGRAM_FAULT; + } + } ++ server_mud->workers->recv_bytes += pktlen; ++ } else { ++ int32_t sermud_listener_accept_connects_ret = sermud_listener_accept_connects(curr_epev, server_mud); ++ if (sermud_listener_accept_connects_ret < 0) { ++ PRINT_ERROR("server try accept error %d! ", sermud_listener_accept_connects_ret); ++ return PROGRAM_FAULT; ++ } + } + + return PROGRAM_OK; +@@ -244,15 +336,14 @@ int32_t sermud_listener_proc_epevs(struct ServerMud *server_mud) + for (int32_t i = 0; i < epoll_nfds; ++i) { + struct epoll_event *curr_epev = server_mud->epevs + i; + +- if (curr_epev->events == EPOLLERR || curr_epev->events == EPOLLHUP || curr_epev->events == EPOLLRDHUP) { ++ if (curr_epev->events & (EPOLLERR | EPOLLHUP | EPOLLRDHUP)) { + PRINT_ERROR("server epoll wait error %d! ", curr_epev->events); +- return PROGRAM_FAULT; ++ server_handler_close(server_mud->epfd, (struct ServerHandler *)curr_epev->data.ptr); ++ return PROGRAM_OK; + } + + if (curr_epev->events == EPOLLIN) { +- int32_t sermud_listener_accept_connects_ret = sermud_listener_accept_connects(server_mud); +- if (sermud_listener_accept_connects_ret < 0) { +- PRINT_ERROR("server try accept error %d! 
", sermud_listener_accept_connects_ret); ++ if (sermud_process_epollin_event(curr_epev, server_mud) < 0) { + return PROGRAM_FAULT; + } + } +@@ -266,15 +357,15 @@ void *sermud_worker_create_and_run(void *arg) + { + pthread_detach(pthread_self()); + +- struct ServerMudWorker *worker_unit = ((struct ServerMud *)arg)->workers; +- char* domain = ((struct ServerMud *)arg)->domain; ++ struct ServerMudWorker *worker_unit = (struct ServerMudWorker *)arg; ++ char *domain = worker_unit->domain; + + if (sermud_worker_create_epfd_and_reg(worker_unit) < 0) { +- exit(PROGRAM_FAULT); ++ return (void *)PROGRAM_OK; + } + while (true) { + if (sermud_worker_proc_epevs(worker_unit, domain) < 0) { +- exit(PROGRAM_FAULT); ++ return (void *)PROGRAM_OK; + } + } + +@@ -284,26 +375,60 @@ void *sermud_worker_create_and_run(void *arg) + return (void *)PROGRAM_OK; + } + ++void sermud_memory_recycle(struct ServerMud *server_mud) ++{ ++ // recycle mem of epevs ++ if (server_mud->epevs) { ++ free(server_mud->epevs); ++ } ++ struct ServerMudWorker *head = server_mud->workers; ++ while (head) { ++ if (head->epevs) { ++ free(head->epevs); ++ } ++ struct ServerMudWorker *next = head->next; ++ free(head); ++ head = next; ++ } ++} ++ + // create the listener thread, unblock, dissymmetric server and run + void *sermud_listener_create_and_run(void *arg) + { + struct ServerMud *server_mud = (struct ServerMud *)arg; + +- if (create_socket_and_listen(&(server_mud->listener.fd), server_mud->ip, server_mud->groupip, server_mud->port, server_mud->domain) < 0) { +- exit(PROGRAM_FAULT); ++ uint32_t port = 0; ++ for (; port < UNIX_TCP_PORT_MAX; port++) { ++ if ((server_mud->port)[port]) { ++ if (create_socket_and_listen(server_mud->listener.listen_fd_array, &(server_mud->server_ip_info), ++ htons(port), server_mud->protocol_type_mode) < 0) { ++ PRINT_ERROR("create_socket_and_listen err"); ++ sermud_memory_recycle(server_mud); ++ exit(PROGRAM_FAULT); ++ } ++ } + } ++ + if (sermud_listener_create_epfd_and_reg(server_mud) < 0) { +- exit(PROGRAM_FAULT); ++ sermud_memory_recycle(server_mud); ++ exit(PROGRAM_FAULT); + } + while (true) { + if (sermud_listener_proc_epevs(server_mud) < 0) { ++ sermud_memory_recycle(server_mud); + exit(PROGRAM_FAULT); + } + } +- if (close(server_mud->listener.fd) < 0 || close(server_mud->epfd) < 0) { +- exit(PROGRAM_FAULT); +- } + ++ for (int i = 0; i < PROTOCOL_MODE_MAX; i++) { ++ if (server_mud->listener.listen_fd_array[i] == -1) ++ continue; ++ if (close(server_mud->listener.listen_fd_array[i]) < 0) { ++ sermud_memory_recycle(server_mud); ++ exit(PROGRAM_FAULT); ++ } ++ } ++ sermud_memory_recycle(server_mud); + return (void *)PROGRAM_OK; + } + +@@ -319,19 +444,44 @@ int32_t sermud_create_and_run(struct ProgramParams *params) + } + + server_mud->listener.fd = -1; +- server_mud->workers = NULL; ++ for (int32_t i = 0; i < PROTOCOL_MODE_MAX; i++) { ++ server_mud->listener.listen_fd_array[i] = -1; ++ } ++ ++ struct ServerMudWorker *workers = (struct ServerMudWorker *)malloc(sizeof(struct ServerMudWorker)); ++ if (workers == NULL) { ++ PRINT_ERROR("malloc truct ServerMudWorker failed "); ++ return PROGRAM_FAULT; ++ } ++ memset_s(workers, sizeof(struct ServerMudWorker), 0, sizeof(struct ServerMudWorker)); ++ workers->next = NULL; ++ server_mud->workers = workers; ++ + server_mud->epfd = -1; + server_mud->epevs = (struct epoll_event *)malloc(SERVER_EPOLL_SIZE_MAX * sizeof(struct epoll_event)); +- server_mud->curr_connect = 0; +- server_mud->ip = inet_addr(params->ip); +- server_mud->groupip = inet_addr(params->groupip); 
+- server_mud->port = htons(params->port); ++ server_mud->server_ip_info.ip.addr_family = params->addr_family; ++ ++ inet_pton(AF_INET, params->ip, &server_mud->server_ip_info.ip.u_addr.ip4); ++ inet_pton(AF_INET6, params->ipv6, &server_mud->server_ip_info.ip.u_addr.ip6); ++ ++ server_mud->server_ip_info.groupip.addr_family = params->addr_family; ++ inet_pton(AF_INET, params->groupip, &server_mud->server_ip_info.groupip.u_addr); ++ ++ server_mud->server_ip_info.groupip_interface.addr_family = params->addr_family; ++ inet_pton(AF_INET, params->groupip_interface, &server_mud->server_ip_info.groupip_interface.u_addr); ++ ++ server_mud->port = params->port; + server_mud->pktlen = params->pktlen; +- server_mud->domain = params->domain; ++ ++ server_mud->protocol_type_mode = program_get_protocol_mode_by_domain_ip(params->domain, params->ip, params->ipv6, ++ params->groupip); ++ + server_mud->api = params->api; + server_mud->debug = params->debug; + server_mud->epollcreate = params->epollcreate; + server_mud->accept = params->accept; ++ server_mud->tcp_keepalive_idle = params->tcp_keepalive_idle; ++ server_mud->tcp_keepalive_interval = params->tcp_keepalive_interval; + + if (pthread_create(tid, NULL, sermud_listener_create_and_run, server_mud) < 0) { + PRINT_ERROR("server can't create poisx thread %d! ", errno); +@@ -341,10 +491,17 @@ int32_t sermud_create_and_run(struct ProgramParams *params) + if (server_mud->debug == false) { + printf("[program informations]: \n\n"); + } +- while (true) { +- sermud_info_print(server_mud); ++ ++ if (strcmp(params->as, "server") == 0) { ++ while (true) { ++ sermud_info_print(server_mud); ++ } ++ } else if (strcmp(params->as, "loop") == 0) { ++ loopmod.model = params->model; ++ loopmod.server_mud_info = server_mud; + } + ++ + pthread_mutex_destroy(&server_debug_mutex); + + return PROGRAM_OK; +@@ -413,39 +570,62 @@ int32_t sersum_create_epfd_and_reg(struct ServerMumUnit *server_unit) + return PROGRAM_FAULT; + } + +- struct epoll_event ep_ev; +- ep_ev.data.ptr = (void *)&(server_unit->listener); ++ struct epoll_event ep_ev = {0}; + ep_ev.events = EPOLLIN | EPOLLET; +- if (epoll_ctl(server_unit->epfd, EPOLL_CTL_ADD, server_unit->listener.fd, &ep_ev) < 0) { +- PRINT_ERROR("server can't control epoll %d! ", errno); +- return PROGRAM_FAULT; ++ ++ for (int32_t i = 0; i < PROTOCOL_MODE_MAX; i++) { ++ if (server_unit->listener.listen_fd_array[i] != -1) { ++ struct ServerHandler *server_handler = (struct ServerHandler *)malloc(sizeof(struct ServerHandler)); ++ memset_s(server_handler, sizeof(struct ServerHandler), 0, sizeof(struct ServerHandler)); ++ server_handler->fd = server_unit->listener.listen_fd_array[i]; ++ ++ ep_ev.data.ptr = (void *)server_handler; ++ if (epoll_ctl(server_unit->epfd, EPOLL_CTL_ADD, server_unit->listener.listen_fd_array[i], &ep_ev) < 0) { ++ PRINT_ERROR("epoll_ctl failed %d! 
listen_fd=%d ", errno, server_unit->listener.listen_fd_array[i]); ++ return PROGRAM_FAULT; ++ } ++ } + } + +- server_debug_print("server mum unit", "waiting", server_unit->ip, server_unit->port, server_unit->debug); ++ server_debug_print("server mum unit", "waiting", &server_unit->server_ip_info.ip, server_unit->port, ++ server_unit->debug); + + return PROGRAM_OK; + } + + // the single thread, unblock, mutliplexing IO server accepts the connections +-int32_t sersum_accept_connects(struct ServerMumUnit *server_unit, struct ServerHandler *server_handler) ++int32_t sersum_accept_connects(struct epoll_event *cur_epev, struct ServerMumUnit *server_unit) + { ++ fault_inject_delay(INJECT_DELAY_ACCEPT); ++ int32_t fd = ((struct ServerHandler*)(cur_epev->data.ptr))->fd; + while (true) { +- struct sockaddr_in accept_addr; +- uint32_t sockaddr_in_len = sizeof(struct sockaddr_in); ++ sockaddr_t accept_addr; ++ bool is_tcp_v6 = (fd == (server_unit->listener.listen_fd_array[V6_TCP])) ? true : false; ++ ++ socklen_t sockaddr_in_len = is_tcp_v6 ? sizeof(struct sockaddr_in6) : sizeof(struct sockaddr_in); + int32_t accept_fd; +- if (strcmp(server_unit->domain, "udp") == 0) { +- break; +- } ++ int32_t ret = 0; ++ ++ int32_t listen_index = (is_tcp_v6) ? V6_TCP : V4_TCP; ++ int32_t listen_fd = server_unit->listener.listen_fd_array[listen_index]; + + if (strcmp(server_unit->accept, "ac4") == 0) { +- accept_fd = accept4(server_unit->listener.fd, (struct sockaddr *)&accept_addr, &sockaddr_in_len, SOCK_CLOEXEC); ++ accept_fd = accept4(listen_fd, (struct sockaddr *)&accept_addr, &sockaddr_in_len, SOCK_CLOEXEC); + } else { +- accept_fd = accept(server_unit->listener.fd, (struct sockaddr *)&accept_addr, &sockaddr_in_len); ++ accept_fd = accept(listen_fd, (struct sockaddr *)&accept_addr, &sockaddr_in_len); + } +- ++ + if (accept_fd < 0) { ++ if (errno != EWOULDBLOCK && errno != EAGAIN){ ++ PRINT_ERROR("accept_fd=%d , errno=%d ", accept_fd, errno); ++ } + break; + } ++ ret = set_tcp_keep_alive_info(accept_fd, server_unit->tcp_keepalive_idle, server_unit->tcp_keepalive_interval); ++ if (ret < 0) { ++ PRINT_ERROR("set_tcp_keep_alive_info ret=%d \n", ret); ++ return PROGRAM_FAULT; ++ } + + if (set_socket_unblock(accept_fd) < 0) { + PRINT_ERROR("server can't set the connect socket to unblock! "); +@@ -454,6 +634,8 @@ int32_t sersum_accept_connects(struct ServerMumUnit *server_unit, struct ServerH + + struct ServerHandler *server_handler = (struct ServerHandler *)malloc(sizeof(struct ServerHandler)); + server_handler->fd = accept_fd; ++ server_handler->is_v6 = (is_tcp_v6) ? 1 : 0; ++ + struct epoll_event ep_ev; + ep_ev.data.ptr = (void *)server_handler; + ep_ev.events = EPOLLIN | EPOLLET; +@@ -463,13 +645,98 @@ int32_t sersum_accept_connects(struct ServerMumUnit *server_unit, struct ServerH + } + + ++server_unit->curr_connect; +- +- server_debug_print("server mum unit", "accept", accept_addr.sin_addr.s_addr, accept_addr.sin_port, server_unit->debug); ++ ++ // sockaddr tp ip, port ++ ip_addr_t remote_ip; ++ uint16_t remote_port = ((struct sockaddr_in*)&accept_addr)->sin_port; ++ remote_ip.addr_family = (is_tcp_v6) ? 
AF_INET6 : AF_INET; ++ if (is_tcp_v6 == false) { ++ remote_ip.u_addr.ip4 = ((struct sockaddr_in *)&accept_addr)->sin_addr; ++ } else { ++ remote_ip.u_addr.ip6 = ((struct sockaddr_in6 *)&accept_addr)->sin6_addr; ++ } ++ ++ server_debug_print("server mum unit", "accept", &remote_ip, remote_port, server_unit->debug); + } + + return PROGRAM_OK; + } + ++static int sersum_get_remote_ip(struct ServerHandler *server_handler, ip_addr_t *remote_ip, uint16_t *remote_port) ++{ ++ sockaddr_t connect_addr; ++ socklen_t connect_addr_len = server_handler->is_v6 == 0 ? sizeof(struct sockaddr_in) : sizeof(struct sockaddr_in6); ++ if (getpeername(server_handler->fd, (struct sockaddr *)&connect_addr, &connect_addr_len) < 0) { ++ PRINT_ERROR("server can't socket peername %d! ", errno); ++ return PROGRAM_ABORT; ++ } ++ ++ *remote_port = ((struct sockaddr_in *)&connect_addr)->sin_port; ++ if (((struct sockaddr *)&connect_addr)->sa_family == AF_INET) { ++ remote_ip->addr_family = AF_INET; ++ remote_ip->u_addr.ip4 = ((struct sockaddr_in *)&connect_addr)->sin_addr; ++ } else if (((struct sockaddr *)&connect_addr)->sa_family == AF_INET6) { ++ remote_ip->addr_family = AF_INET6; ++ remote_ip->u_addr.ip6 = ((struct sockaddr_in6 *)&connect_addr)->sin6_addr; ++ } ++ return PROGRAM_OK; ++} ++ ++static int sersum_process_tcp_accept_event(struct ServerMumUnit *server_unit, struct epoll_event *curr_epev) ++{ ++ struct ServerHandler *server_handler = (struct ServerHandler *)curr_epev->data.ptr; ++ ip_addr_t remote_ip; ++ uint16_t remote_port; ++ ++ if (sersum_get_remote_ip(server_handler, &remote_ip, &remote_port) != PROGRAM_OK) { ++ return PROGRAM_ABORT; ++ } ++ ++ int32_t server_ans_ret = server_ans(server_handler->fd, server_unit->pktlen, server_unit->api, "tcp"); ++ if (server_ans_ret == PROGRAM_FAULT) { ++ --server_unit->curr_connect; ++ server_handler_close(server_unit->epfd, server_handler); ++ } else if (server_ans_ret == PROGRAM_ABORT) { ++ --server_unit->curr_connect; ++ server_debug_print("server mum unit", "close", &remote_ip, remote_port, server_unit->debug); ++ server_handler_close(server_unit->epfd, server_handler); ++ } else { ++ server_unit->recv_bytes += server_unit->pktlen; ++ server_debug_print("server mum unit", "receive", &remote_ip, remote_port, server_unit->debug); ++ } ++ return PROGRAM_OK; ++} ++ ++static int sersum_process_epollin_event(struct ServerMumUnit *server_unit, struct epoll_event *curr_epev) ++{ ++ struct ServerHandler *server_handler = (struct ServerHandler *)curr_epev->data.ptr; ++ int32_t fd = server_handler->fd; ++ if (fd == (server_unit->listener.listen_fd_array[V4_TCP]) || ++ fd == (server_unit->listener.listen_fd_array[V6_TCP])) { ++ int32_t sersum_accept_connects_ret = sersum_accept_connects(curr_epev, server_unit); ++ if (sersum_accept_connects_ret < 0) { ++ PRINT_ERROR("server try accept error %d! ", sersum_accept_connects_ret); ++ return PROGRAM_ABORT; ++ } ++ } else if (fd == (server_unit->listener.listen_fd_array[V4_UDP]) || ++ fd == (server_unit->listener.listen_fd_array[UDP_MULTICAST])) { ++ uint32_t pktlen = server_unit->pktlen > UDP_PKTLEN_MAX ? UDP_PKTLEN_MAX : server_unit->pktlen; ++ int32_t server_ans_ret = server_ans(fd, pktlen, server_unit->api, "udp"); ++ if (server_ans_ret != PROGRAM_OK) { ++ if (server_handler_close(server_unit->epfd, server_handler) != 0) { ++ PRINT_ERROR("server_handler_close ret %d! 
\n", server_ans_ret); ++ return PROGRAM_ABORT; ++ } ++ } ++ server_unit->recv_bytes += pktlen; ++ } else { ++ if (sersum_process_tcp_accept_event(server_unit, curr_epev) != PROGRAM_OK) { ++ return PROGRAM_ABORT; ++ } ++ } ++ return PROGRAM_OK; ++} ++ + // the single thread, unblock, mutliplexing IO server processes the events + int32_t sersum_proc_epevs(struct ServerMumUnit *server_unit) + { +@@ -482,47 +749,16 @@ int32_t sersum_proc_epevs(struct ServerMumUnit *server_unit) + for (int32_t i = 0; i < epoll_nfds; ++i) { + struct epoll_event *curr_epev = server_unit->epevs + i; + +- if (curr_epev->events == EPOLLERR || curr_epev->events == EPOLLHUP || curr_epev->events == EPOLLRDHUP) { +- PRINT_ERROR("server epoll wait error %d! ", curr_epev->events); +- return PROGRAM_FAULT; ++ if (curr_epev->events & (EPOLLERR | EPOLLHUP | EPOLLRDHUP)) { ++ server_unit->curr_connect--; ++ if (server_handler_close(server_unit->epfd, (struct ServerHandler *)curr_epev->data.ptr) != 0) { ++ return PROGRAM_OK; ++ } + } + + if (curr_epev->events == EPOLLIN) { +- if (curr_epev->data.ptr == (void *)&(server_unit->listener) && strcmp(server_unit->domain, "udp") != 0) { +- int32_t sersum_accept_connects_ret = sersum_accept_connects(server_unit, &(server_unit->listener)); +- if (sersum_accept_connects_ret < 0) { +- PRINT_ERROR("server try accept error %d! ", sersum_accept_connects_ret); +- return PROGRAM_FAULT; +- } +- continue; +- } else { +- struct ServerHandler *server_handler = (struct ServerHandler *)curr_epev->data.ptr; +- struct sockaddr_in connect_addr; +- socklen_t connect_addr_len = sizeof(connect_addr); +- if (strcmp(server_unit->domain, "udp") != 0 && getpeername(server_handler->fd, (struct sockaddr *)&connect_addr, &connect_addr_len) < 0) { +- PRINT_ERROR("server can't socket peername %d! ", errno); +- return PROGRAM_FAULT; +- } +- +- int32_t server_ans_ret = server_ans(server_handler, server_unit->pktlen, server_unit->api, server_unit->domain); +- if (server_ans_ret == PROGRAM_FAULT) { +- --server_unit->curr_connect; +- struct epoll_event ep_ev; +- if (epoll_ctl(server_unit->epfd, EPOLL_CTL_DEL, server_handler->fd, &ep_ev) < 0) { +- PRINT_ERROR("server can't delete socket '%d' to control epoll %d! ", server_handler->fd, errno); +- return PROGRAM_FAULT; +- } +- } else if (server_ans_ret == PROGRAM_ABORT) { +- --server_unit->curr_connect; +- if (close(server_handler->fd) < 0) { +- PRINT_ERROR("server can't close the socket %d! ", errno); +- return PROGRAM_FAULT; +- } +- server_debug_print("server mum unit", "close", connect_addr.sin_addr.s_addr, connect_addr.sin_port, server_unit->debug); +- } else { +- server_unit->recv_bytes += server_unit->pktlen; +- server_debug_print("server mum unit", "receive", connect_addr.sin_addr.s_addr, connect_addr.sin_port, server_unit->debug); +- } ++ if (sersum_process_epollin_event(server_unit, curr_epev) != PROGRAM_OK) { ++ return PROGRAM_ABORT; + } + } + } +@@ -535,7 +771,9 @@ void *sersum_create_and_run(void *arg) + { + struct ServerMumUnit *server_unit = (struct ServerMumUnit *)arg; + +- if (create_socket_and_listen(&(server_unit->listener.fd), server_unit->ip, server_unit->groupip, server_unit->port, server_unit->domain) < 0) { ++ if (create_socket_and_listen(server_unit->listener.listen_fd_array, &(server_unit->server_ip_info), ++ server_unit->port, server_unit->protocol_type_mode) < 0) { ++ PRINT_ERROR("create_socket_and_listen err! 
\n"); + exit(PROGRAM_FAULT); + } + if (sersum_create_epfd_and_reg(server_unit) < 0) { +@@ -560,6 +798,7 @@ int32_t sermum_create_and_run(struct ProgramParams *params) + pthread_t *tids = (pthread_t *)malloc(thread_num * sizeof(pthread_t)); + struct ServerMum *server_mum = (struct ServerMum *)malloc(sizeof(struct ServerMum)); + struct ServerMumUnit *server_unit = (struct ServerMumUnit *)malloc(sizeof(struct ServerMumUnit)); ++ memset_s(server_unit, sizeof(struct ServerMumUnit), 0, sizeof(struct ServerMumUnit)); + + if (pthread_mutex_init(&server_debug_mutex, NULL) < 0) { + PRINT_ERROR("server can't init posix mutex %d! ", errno); +@@ -568,22 +807,47 @@ int32_t sermum_create_and_run(struct ProgramParams *params) + + server_mum->uints = server_unit; + server_mum->debug = params->debug; ++ uint32_t port = UNIX_TCP_PORT_MIN; + + for (uint32_t i = 0; i < thread_num; ++i) { + server_unit->listener.fd = -1; ++ for (int32_t i = 0; i < PROTOCOL_MODE_MAX; i++) { ++ server_unit->listener.listen_fd_array[i] = -1; ++ } + server_unit->epfd = -1; + server_unit->epevs = (struct epoll_event *)malloc(SERVER_EPOLL_SIZE_MAX * sizeof(struct epoll_event)); + server_unit->curr_connect = 0; + server_unit->recv_bytes = 0; +- server_unit->ip = inet_addr(params->ip); +- server_unit->groupip = inet_addr(params->groupip); +- server_unit->port = htons(params->port); ++ server_unit->server_ip_info.ip.addr_family = params->addr_family; ++ inet_pton(AF_INET, params->ip, &server_unit->server_ip_info.ip.u_addr.ip4); ++ inet_pton(AF_INET6, params->ipv6, &server_unit->server_ip_info.ip.u_addr.ip6); ++ ++ server_unit->server_ip_info.groupip.addr_family = AF_INET; ++ inet_pton(AF_INET, params->groupip, &server_unit->server_ip_info.groupip.u_addr); ++ ++ server_unit->server_ip_info.groupip_interface.addr_family = AF_INET; ++ inet_pton(AF_INET, params->groupip_interface, &server_unit->server_ip_info.groupip_interface.u_addr); ++ ++ /* loop to set ports to each server_mums */ ++ while (!((params->port)[port])) { ++ port = (port + 1) % UNIX_TCP_PORT_MAX; ++ } ++ server_unit->port = htons(port++); + server_unit->pktlen = params->pktlen; +- server_unit->domain = params->domain; ++ ++ server_unit->protocol_type_mode = program_get_protocol_mode_by_domain_ip(params->domain, params->ip, ++ params->ipv6, params->groupip); ++ ++ // Create multicast sockets only on the first thread ++ if (i != 0) { ++ server_unit->protocol_type_mode = setbitnum_off(server_unit->protocol_type_mode, UDP_MULTICAST); ++ } + server_unit->api = params->api; + server_unit->debug = params->debug; + server_unit->epollcreate = params->epollcreate; + server_unit->accept = params->accept; ++ server_unit->tcp_keepalive_idle = params->tcp_keepalive_idle; ++ server_unit->tcp_keepalive_interval = params->tcp_keepalive_interval; + server_unit->next = (struct ServerMumUnit *)malloc(sizeof(struct ServerMumUnit)); + if (server_unit->next) { + memset_s(server_unit->next, sizeof(struct ServerMumUnit), 0, sizeof(struct ServerMumUnit)); +@@ -599,8 +863,14 @@ int32_t sermum_create_and_run(struct ProgramParams *params) + if (server_mum->debug == false) { + printf("[program informations]: \n\n"); + } +- while (true) { +- sermum_info_print(server_mum); ++ ++ if (strcmp(params->as, "server") == 0) { ++ while (true) { ++ sermum_info_print(server_mum); ++ } ++ } else if (strcmp(params->as, "loop") == 0) { ++ loopmod.model = params->model; ++ loopmod.server_mum_info = server_mum; + } + + pthread_mutex_destroy(&server_debug_mutex); +diff --git a/examples/src/utilities.c 
b/examples/src/utilities.c +index 7247b44..59d8bea 100644 +--- a/examples/src/utilities.c ++++ b/examples/src/utilities.c +@@ -11,35 +11,215 @@ + */ + + +-#include "utilities.h" ++#include "parameter.h" + ++int32_t set_tcp_keep_alive_info(int32_t sockfd, int32_t tcp_keepalive_idle, int32_t tcp_keepalive_interval) ++{ ++ int32_t ret = 0; ++ int32_t keep_alive = 1; ++ int32_t keep_idle = 1; ++ int32_t keep_interval = 1; ++ ++ if ((tcp_keepalive_idle == PARAM_DEFAULT_KEEPALIVEIDLE) || ++ (tcp_keepalive_interval == PARAM_DEFAULT_KEEPALIVEIDLE)) { ++ return 0; ++ } ++ ++ keep_idle = tcp_keepalive_idle; ++ ret = setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE, (void *)&keep_alive, sizeof(keep_alive)); ++ if (ret != 0) { ++ PRINT_ERROR("setsockopt keep_alive err ret=%d \n", ret); ++ return ret; ++ } ++ ++ ret = setsockopt(sockfd, SOL_TCP, TCP_KEEPIDLE, (void *)&keep_idle, sizeof(keep_idle)); ++ if (ret != 0) { ++ PRINT_ERROR("setsockopt keep_idle err ret=%d \n", ret); ++ return ret; ++ } ++ ++ keep_interval = tcp_keepalive_interval; ++ ret = setsockopt(sockfd, SOL_TCP, TCP_KEEPINTVL, (void *)&keep_interval, sizeof(keep_interval)); ++ if (ret != 0) { ++ PRINT_ERROR("setsockopt keep_interval err ret=%d \n", ret); ++ return ret; ++ } ++ return ret; ++} ++ ++static int32_t process_unix_fd(int32_t *socket_fd, int32_t *listen_fd_array) ++{ ++ struct sockaddr_un socket_addr; ++ int32_t fd = socket(AF_UNIX, SOCK_STREAM, 0); ++ if (fd < 0) { ++ PRINT_ERROR("can't create socket %d! ", errno); ++ return PROGRAM_FAULT; ++ } ++ *socket_fd = fd; ++ ++ unlink(SOCKET_UNIX_DOMAIN_FILE); ++ socket_addr.sun_family = AF_UNIX; ++ strcpy_s(socket_addr.sun_path, sizeof(socket_addr.sun_path), SOCKET_UNIX_DOMAIN_FILE); ++ if (bind(*socket_fd, (struct sockaddr *)&socket_addr, sizeof(struct sockaddr_un)) < 0) { ++ PRINT_ERROR("can't bind the address to socket %d! ", errno); ++ return PROGRAM_FAULT; ++ } ++ ++ if (listen(*socket_fd, SERVER_SOCKET_LISTEN_BACKLOG) < 0) { ++ PRINT_ERROR("server socket can't lisiten %d! ", errno); ++ return PROGRAM_FAULT; ++ } ++ return PROGRAM_OK; ++} ++ ++static int32_t process_udp_groupip(int32_t fd, ip_addr_t *ip, ip_addr_t *groupip, sockaddr_t *socker_add_info, ++ ip_addr_t *groupip_interface) ++{ ++ struct ip_mreq mreq; ++ if (groupip->u_addr.ip4.s_addr) { ++ mreq.imr_multiaddr = groupip->u_addr.ip4; ++ if (groupip_interface->u_addr.ip4.s_addr) { ++ mreq.imr_interface = groupip_interface->u_addr.ip4; ++ } else { ++ mreq.imr_interface = ip->u_addr.ip4; ++ } ++ ++ if (setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(struct ip_mreq)) == -1) { ++ PRINT_ERROR("can't set the address to group %d! ", errno); ++ return PROGRAM_FAULT; ++ } ++ ((struct sockaddr_in *)socker_add_info)->sin_addr = groupip->u_addr.ip4; ++ return PROGRAM_OK; ++ } ++ return PROGRAM_OK; ++} ++ ++static int32_t server_create_sock(uint8_t protocol_mode, int32_t* fd_arry) ++{ ++ bool ret = true; ++ for (int32_t i = 0; i < PROTOCOL_MODE_MAX; i++) { ++ if (getbit_num(protocol_mode, i) == 0) ++ continue; ++ if (i == V4_TCP) { ++ fd_arry[i] = socket(AF_INET, SOCK_STREAM, 0); ++ } else if (i == V6_TCP) { ++ fd_arry[i] = socket(AF_INET6, SOCK_STREAM, 0); ++ } else if (i == V4_UDP) { ++ fd_arry[i] = socket(AF_INET, SOCK_DGRAM, 0); ++ } else if (i == UDP_MULTICAST) { ++ fd_arry[i] = socket(AF_INET, SOCK_DGRAM, 0); ++ } else { ++ continue; ++ } ++ if (fd_arry[i] < 0) { ++ PRINT_ERROR("can't create socket type=%d errno=%d! 
", i, errno); ++ ret = false; ++ break; ++ } ++ } ++ ++ if (ret == false) { ++ for (int32_t i = 0; i< PROTOCOL_MODE_MAX; i++) { ++ if (fd_arry[i] > 0) { ++ close(fd_arry[i]); ++ } ++ } ++ return PROGRAM_FAULT; ++ } ++ return PROGRAM_OK; ++} ++ ++static int32_t socket_add_info_init(int32_t idx, uint16_t port, struct ServerIpInfo *server_ip_info, ++ sockaddr_t *socker_add_info, int32_t *listen_fd_array) ++{ ++ ip_addr_t *ip = &(server_ip_info->ip); ++ ip_addr_t *groupip = &(server_ip_info->groupip); ++ ip_addr_t *groupip_interface = &(server_ip_info->groupip_interface); ++ ++ uint32_t len = ((idx == V4_TCP || idx == V4_UDP || idx == UDP_MULTICAST) ? ++ sizeof(struct sockaddr_in) : sizeof(struct sockaddr_in6)); ++ memset_s(socker_add_info, len, 0, len); ++ ++ if (idx == V4_TCP || idx == V4_UDP) { ++ ((struct sockaddr_in *)socker_add_info)->sin_addr = ip->u_addr.ip4; ++ } else if (idx == V6_TCP) { ++ ((struct sockaddr_in6 *)socker_add_info)->sin6_addr = ip->u_addr.ip6; ++ } else if (idx == UDP_MULTICAST) { ++ if (process_udp_groupip(listen_fd_array[idx], ip, groupip, socker_add_info, groupip_interface) != PROGRAM_OK) { ++ return PROGRAM_FAULT; ++ } ++ } ++ ++ ((struct sockaddr *)socker_add_info)->sa_family = ((idx == V4_TCP || idx == V4_UDP || idx == UDP_MULTICAST) ? ++ AF_INET : AF_INET6); ++ ((struct sockaddr_in *)socker_add_info)->sin_port = port; ++ return PROGRAM_OK; ++} + + // create the socket and listen +-int32_t create_socket_and_listen(int32_t *socket_fd, in_addr_t ip, in_addr_t groupip, uint16_t port, const char *domain) ++int32_t create_socket_and_listen(int32_t *listen_fd_array, struct ServerIpInfo *server_ip_info, ++ uint16_t port, uint8_t protocol_mode) + { +- if (strcmp(domain, "tcp") == 0) { +- *socket_fd = socket(AF_INET, SOCK_STREAM, 0); +- if (*socket_fd < 0) { +- PRINT_ERROR("can't create socket %d! ", errno); ++ int32_t port_multi = 1; ++ uint32_t len = 0; ++ sockaddr_t socker_add_info; ++ ++ if (getbit_num(protocol_mode, UNIX) == 1) { ++ if (process_unix_fd(&listen_fd_array[UNIX], listen_fd_array) != PROGRAM_OK) { + return PROGRAM_FAULT; + } +- } else if (strcmp(domain, "unix") == 0) { +- *socket_fd = socket(AF_UNIX, SOCK_STREAM, 0); +- if (*socket_fd < 0) { +- PRINT_ERROR("can't create socket %d! ", errno); ++ return PROGRAM_OK; ++ } ++ ++ if (server_create_sock(protocol_mode, listen_fd_array) != PROGRAM_OK) { ++ return PROGRAM_FAULT; ++ } ++ ++ for (int32_t i = 0;i< PROTOCOL_MODE_MAX; i++) { ++ if (listen_fd_array[i] <= 0) ++ continue; ++ if (setsockopt(listen_fd_array[i], SOL_SOCKET, SO_REUSEPORT, (void *)&port_multi, sizeof(int32_t)) < 0) { ++ PRINT_ERROR("can't set the option of socket %d! ", errno); + return PROGRAM_FAULT; + } +- } else if (strcmp(domain, "udp") == 0) { +- *socket_fd = socket(AF_INET, SOCK_DGRAM, 0); +- if (*socket_fd < 0) { +- PRINT_ERROR("can't create socket %d! ", errno); ++ if (set_socket_unblock(listen_fd_array[i]) < 0) { ++ PRINT_ERROR("can't set the socket to unblock! "); ++ return PROGRAM_FAULT; ++ } ++ if (socket_add_info_init(i, port, server_ip_info, &socker_add_info, listen_fd_array) != PROGRAM_OK) { ++ return PROGRAM_FAULT; ++ } ++ ++ len = ((i == V4_TCP || i == V4_UDP || i == UDP_MULTICAST) ? 
++ sizeof(struct sockaddr_in) : sizeof(struct sockaddr_in6));
++
++ if (bind(listen_fd_array[i], (struct sockaddr *)&socket_add_info, len) < 0) {
++ PRINT_ERROR("can't bind the address %d! i=%d, listen_fd_array[i]=%d ", errno, i, listen_fd_array[i]);
+ return PROGRAM_FAULT;
+ }
++
++ if (i == V4_TCP || i == V6_TCP) {
++ if (listen(listen_fd_array[i], SERVER_SOCKET_LISTEN_BACKLOG) < 0) {
++ PRINT_ERROR("server socket can't listen %d! ", errno);
++ return PROGRAM_FAULT;
++ }
++ }
+ }
++ return PROGRAM_OK;
++}
+
+- int32_t port_multi = 1;
+- if (setsockopt(*socket_fd, SOL_SOCKET, SO_REUSEPORT, (void *)&port_multi, sizeof(int32_t)) < 0) {
+- PRINT_ERROR("can't set the option of socket %d! ", errno);
++static int32_t create_socket_init(int32_t *socket_fd, struct ClientUnit *client_unit, sockaddr_t *server_addr)
++{
++ ip_addr_t *ip = &client_unit->ip;
++ const char *domain = client_unit->domain;
++
++ if (strcmp(domain, "tcp") == 0) {
++ *socket_fd = socket(ip->addr_family, SOCK_STREAM, 0);
++ } else {
++ *socket_fd = socket(AF_INET, SOCK_DGRAM, 0);
++ }
++ if (*socket_fd < 0) {
++ PRINT_ERROR("client can't create socket %d! ", errno);
+ return PROGRAM_FAULT;
+ }
+
+@@ -48,106 +228,118 @@ int32_t create_socket_and_listen(int32_t *socket_fd, in_addr_t ip, in_addr_t gro
+ return PROGRAM_FAULT;
+ }
+
+- if (strcmp(domain, "tcp") == 0) {
+- struct sockaddr_in socket_addr;
+- memset_s(&socket_addr, sizeof(socket_addr), 0, sizeof(socket_addr));
+- socket_addr.sin_family = AF_INET;
+- socket_addr.sin_addr.s_addr = ip;
+- socket_addr.sin_port = port;
+- if (bind(*socket_fd, (struct sockaddr *)&socket_addr, sizeof(struct sockaddr_in)) < 0) {
+- PRINT_ERROR("can't bind the address to socket %d! ", errno);
+- return PROGRAM_FAULT;
+- }
++ ((struct sockaddr *)server_addr)->sa_family = ip->addr_family;
+
+- if (listen(*socket_fd, SERVER_SOCKET_LISTEN_BACKLOG) < 0) {
+- PRINT_ERROR("server socket can't lisiten %d! ", errno);
+- return PROGRAM_FAULT;
++ return PROGRAM_OK;
++}
++
++static int32_t process_connect_sport(int32_t *socket_fd, struct ClientUnit *client_unit, sockaddr_t *server_addr)
++{
++ uint16_t sport = client_unit->sport;
++ ip_addr_t *ip = &client_unit->ip;
++ uint32_t addr_len = ip->addr_family == AF_INET ? sizeof(struct sockaddr_in) : sizeof(struct sockaddr_in6);
++
++ if (sport) {
++ if (ip->addr_family == AF_INET) {
++ ((struct sockaddr_in *)server_addr)->sin_addr.s_addr = htonl(INADDR_ANY);
++ } else if (ip->addr_family == AF_INET6) {
++ ((struct sockaddr_in6 *)server_addr)->sin6_addr = in6addr_any;
+ }
+- } else if (strcmp(domain, "unix") == 0) {
+- struct sockaddr_un socket_addr;
+- unlink(SOCKET_UNIX_DOMAIN_FILE);
+- socket_addr.sun_family = AF_UNIX;
+- strcpy_s(socket_addr.sun_path, sizeof(socket_addr.sun_path), SOCKET_UNIX_DOMAIN_FILE);
+- if (bind(*socket_fd, (struct sockaddr *)&socket_addr, sizeof(struct sockaddr_un)) < 0) {
++ ((struct sockaddr_in *)server_addr)->sin_port = sport;
++ if (bind(*socket_fd, (struct sockaddr *)server_addr, addr_len) < 0) {
+ PRINT_ERROR("can't bind the address to socket %d! ", errno);
+ return PROGRAM_FAULT;
+ }
++ }
++ return PROGRAM_OK;
++}
+
+- if (listen(*socket_fd, SERVER_SOCKET_LISTEN_BACKLOG) < 0) {
+- PRINT_ERROR("server socket can't lisiten %d! ", errno);
++static int32_t process_unix_create_connect(int32_t *socket_fd)
++{
++ *socket_fd = socket(AF_UNIX, SOCK_STREAM, 0);
++ if (*socket_fd < 0) {
++ PRINT_ERROR("client can't create socket %d! 
", errno); ++ return PROGRAM_FAULT; ++ } ++ ++ struct sockaddr_un server_addr; ++ server_addr.sun_family = AF_UNIX; ++ strcpy_s(server_addr.sun_path, sizeof(server_addr.sun_path), SOCKET_UNIX_DOMAIN_FILE); ++ if (connect(*socket_fd, (struct sockaddr *)&server_addr, sizeof(struct sockaddr_un)) < 0) { ++ if (errno == EINPROGRESS) { ++ return PROGRAM_INPROGRESS; ++ } else { ++ PRINT_ERROR("client can't connect to the server %d! ", errno); + return PROGRAM_FAULT; + } +- } else if (strcmp(domain, "udp") == 0) { +- struct sockaddr_in socket_addr; +- memset_s(&socket_addr, sizeof(socket_addr), 0, sizeof(socket_addr)); +- socket_addr.sin_family = AF_INET; +- socket_addr.sin_port = port; +- +- if (groupip) { +- struct ip_mreq mreq; +- mreq.imr_multiaddr.s_addr = groupip; +- mreq.imr_interface.s_addr = ip; +- if (setsockopt(*socket_fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(struct ip_mreq)) == -1) { +- PRINT_ERROR("can't set the address to group %d! ", errno); +- return PROGRAM_FAULT;; ++ } ++ return PROGRAM_OK; ++} ++ ++static int32_t pocess_udp_multicast(int32_t *socket_fd, struct ClientUnit *client_unit, sockaddr_t *server_addr) ++{ ++ const uint32_t loop = client_unit->loop; ++ ip_addr_t *groupip = &client_unit->groupip; ++ if (client_unit->protocol_type_mode == UDP_MULTICAST) { ++ /* set the local device for a multicast socket */ ++ ((struct sockaddr_in *)server_addr)->sin_addr = groupip->u_addr.ip4; ++ ++ struct in_addr localInterface; ++ localInterface.s_addr = client_unit->groupip_interface.u_addr.ip4.s_addr; ++ if (localInterface.s_addr) { ++ if (setsockopt(*socket_fd, IPPROTO_IP, IP_MULTICAST_IF, (char *)&localInterface, ++ sizeof(localInterface)) < 0) { ++ PRINT_ERROR("can't set the multicast interface %d! ", errno); ++ return PROGRAM_FAULT; + } +- socket_addr.sin_addr.s_addr = groupip; +- } else { +- socket_addr.sin_addr.s_addr = ip; + } + +- if (bind(*socket_fd, (struct sockaddr *)&socket_addr, sizeof(struct sockaddr_in)) < 0) { +- PRINT_ERROR("can't bind the address to socket %d! ", errno); ++ /* sent multicast packets should be looped back to the local socket */ ++ if (setsockopt(*socket_fd, IPPROTO_IP, IP_MULTICAST_LOOP, &loop, sizeof(loop)) == -1) { ++ PRINT_ERROR("can't set the multicast loop %d! ", errno); + return PROGRAM_FAULT; +- } ++ } + } +- + return PROGRAM_OK; + } + + // create the socket and connect +-int32_t create_socket_and_connect(int32_t *socket_fd, in_addr_t ip, in_addr_t groupip, uint16_t port, uint16_t sport, const char *domain, const char *api) ++int32_t create_socket_and_connect(int32_t *socket_fd, struct ClientUnit *client_unit) + { ++ ip_addr_t *ip = &client_unit->ip; ++ const char *domain = client_unit->domain; ++ const char *api = client_unit->api; ++ ++ sockaddr_t server_addr; ++ + if (strcmp(domain, "tcp") == 0 || strcmp(domain, "udp") == 0) { +- if (strcmp(domain, "tcp") == 0) { +- *socket_fd = socket(AF_INET, SOCK_STREAM, 0); +- } else { +- *socket_fd = socket(AF_INET, SOCK_DGRAM, 0); +- } +- if (*socket_fd < 0) { +- PRINT_ERROR("client can't create socket %d! ", errno); ++ uint32_t addr_len = ip->addr_family == AF_INET ? sizeof(struct sockaddr_in) : sizeof(struct sockaddr_in6); ++ memset_s(&server_addr, addr_len, 0, addr_len); ++ ++ if (creat_socket_init(socket_fd, client_unit, &server_addr) != PROGRAM_OK) { + return PROGRAM_FAULT; + } + +- if (set_socket_unblock(*socket_fd) < 0) { +- PRINT_ERROR("can't set the socket to unblock! 
"); ++ if (pocess_connect_sport(socket_fd, client_unit, &server_addr) < 0) { + return PROGRAM_FAULT; + } + +- struct sockaddr_in server_addr; +- memset_s(&server_addr, sizeof(server_addr), 0, sizeof(server_addr)); +- server_addr.sin_family = AF_INET; +- if (sport) { +- server_addr.sin_addr.s_addr = htonl(INADDR_ANY); +- server_addr.sin_port = sport; +- if (bind(*socket_fd, (struct sockaddr *)&server_addr, sizeof(struct sockaddr_in)) < 0) { +- PRINT_ERROR("can't bind the address to socket %d! ", errno); +- return PROGRAM_FAULT; +- } ++ if (ip->addr_family == AF_INET) { ++ ((struct sockaddr_in *)&server_addr)->sin_addr = ip->u_addr.ip4; ++ } else if (ip->addr_family == AF_INET6) { ++ ((struct sockaddr_in6 *)&server_addr)->sin6_addr = ip->u_addr.ip6; + } +- server_addr.sin_addr.s_addr = ip; +- server_addr.sin_port = port; ++ ((struct sockaddr_in *)&server_addr)->sin_port = client_unit->port; ++ + if (strcmp(domain, "udp") == 0) { +- if (groupip) { +- server_addr.sin_addr.s_addr = groupip; +- if (setsockopt(*socket_fd, IPPROTO_IP, IP_MULTICAST_IF, &ip, sizeof(ip)) != 0) { +- PRINT_ERROR("can't set the multicast interface %d! ", errno); +- return PROGRAM_FAULT; +- } ++ int32_t ret = pocess_udp_multicast(socket_fd, client_unit, &server_addr); ++ if (ret != PROGRAM_OK) { ++ return ret; + } + } ++ + if (strcmp(domain, "udp") != 0 || strcmp(api, "recvfromsendto") != 0) { +- if (connect(*socket_fd, (struct sockaddr *)&server_addr, sizeof(struct sockaddr_in)) < 0) { ++ if (connect(*socket_fd, (struct sockaddr *)&server_addr, addr_len) < 0) { + if (errno == EINPROGRESS) { + return PROGRAM_INPROGRESS; + } else { +@@ -157,25 +349,11 @@ int32_t create_socket_and_connect(int32_t *socket_fd, in_addr_t ip, in_addr_t gr + } + } + } else if (strcmp(domain, "unix") == 0) { +- *socket_fd = socket(AF_UNIX, SOCK_STREAM, 0); +- if (*socket_fd < 0) { +- PRINT_ERROR("client can't create socket %d! ", errno); +- return PROGRAM_FAULT; +- } +- +- struct sockaddr_un server_addr; +- server_addr.sun_family = AF_UNIX; +- strcpy_s(server_addr.sun_path, sizeof(server_addr.sun_path), SOCKET_UNIX_DOMAIN_FILE); +- if (connect(*socket_fd, (struct sockaddr *)&server_addr, sizeof(struct sockaddr_un)) < 0) { +- if (errno == EINPROGRESS) { +- return PROGRAM_INPROGRESS; +- } else { +- PRINT_ERROR("client can't connect to the server %d! ", errno); +- return PROGRAM_FAULT; +- } ++ int32_t ret = pocess_unix_create_connect(socket_fd); ++ if (ret != PROGRAM_OK) { ++ return ret; + } + } +- + return PROGRAM_OK; + } + +diff --git a/src/lstack/core/lstack_protocol_stack.c b/src/lstack/core/lstack_protocol_stack.c +index 3e6eeef..e272a04 100644 +--- a/src/lstack/core/lstack_protocol_stack.c ++++ b/src/lstack/core/lstack_protocol_stack.c +@@ -655,7 +655,6 @@ int32_t stack_setup_thread(void) + goto OUT1; + } + } +- + for (uint32_t i = 0; i < queue_num; i++) { + if (get_global_cfg_params()->seperate_send_recv) { + if (i % 2 == 0) { +@@ -694,6 +693,7 @@ int32_t stack_setup_thread(void) + g_stack_group.stack_num = queue_num; + + return 0; ++ + OUT1: + for (int32_t i = 0; i < queue_num; ++i) { + if (t_params[i] != NULL) { +diff --git a/src/lstack/core/lstack_virtio.c b/src/lstack/core/lstack_virtio.c +index bc42bb9..ad3088d 100644 +--- a/src/lstack/core/lstack_virtio.c ++++ b/src/lstack/core/lstack_virtio.c +@@ -298,11 +298,9 @@ int virtio_port_create(int lstack_net_port) + return retval; + } + +- uint16_t actual_queue_num = (g_virtio_instance.rx_queue_num < g_virtio_instance.tx_queue_num) ? 
+- g_virtio_instance.rx_queue_num : g_virtio_instance.tx_queue_num;
+ retval = snprintf(portargs, sizeof(portargs),
+ "path=/dev/vhost-net,queues=%u,queue_size=%u,iface=%s,mac=" RTE_ETHER_ADDR_PRT_FMT,
+- actual_queue_num, VIRTIO_TX_RX_RING_SIZE, VIRTIO_USER_NAME, RTE_ETHER_ADDR_BYTES(&addr));
++ VIRTIO_MAX_QUEUE_NUM, VIRTIO_TX_RX_RING_SIZE, VIRTIO_USER_NAME, RTE_ETHER_ADDR_BYTES(&addr));
+ if (retval < 0) {
+ LSTACK_LOG(ERR, LSTACK, "virtio portargs snprintf failed ret=%d \n", retval);
+ return retval;
+-- 
+2.33.0
+
diff --git a/gazelle.spec b/gazelle.spec
index 9e2cafc..3482b9a 100644
--- a/gazelle.spec
+++ b/gazelle.spec
@@ -2,7 +2,7 @@
 Name: gazelle
 Version: 1.0.2
-Release: 48
+Release: 49
 Summary: gazelle is a high performance user-mode stack
 License: MulanPSL-2.0
 URL: https://gitee.com/openeuler/gazelle
@@ -228,6 +228,7 @@ Patch9208: 0208-virtio-mode-actual_queue_num.patch
 Patch9209: 0209-virtio-update-g_rule_port-by-reg_ring_type-enum.patch
 Patch9210: 0210-virtio-dfx-data-of-virtio.patch
 Patch9211: 0211-add-flow_bifurcation-switch-in-lstack_cfg-file.patch
+Patch9212: 0212-example-sync-example-update.patch
 
 %description
 %{name} is a high performance user-mode stack.
@@ -269,6 +270,9 @@ install -Dpm 0640 %{_builddir}/%{name}-%{version}/src/ltran/ltran.conf %{b
 %config(noreplace) %{conf_path}/ltran.conf
 
 %changelog
+* Wed Jul 10 2024 yinbin6 - 1.0.2-49
+- example: sync example update
+
 * Fri Jul 5 2024 yinbin6 - 1.0.2-48
 - add flow_bifurcation switch in lstack_cfg file
 - virtio: dfx data of virtio
-- 
Gitee