diff --git a/app/en/blogs/RDMA/RDMA_Network_Guide.md b/app/en/blogs/RDMA/RDMA_Network_Guide.md
new file mode 100644
index 0000000000000000000000000000000000000000..1919a1f076873efb17be9a7e66b3cd0e85f354fc
--- /dev/null
+++ b/app/en/blogs/RDMA/RDMA_Network_Guide.md
@@ -0,0 +1,459 @@
+# Identifying CX4/CX5 NICs
+
+Run the following command:
+
+```
+lspci |grep Mellanox
+```
+
+Command output:
+
+```
+81:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
+81:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
+```
+
+# Installing the MLNX Driver
+
+1. Download the driver package that matches the OS from [https://network.nvidia.com/products/infiniband-drivers/linux/mlnx\_ofed/](https://network.nvidia.com/products/infiniband-drivers/linux/mlnx_ofed/).
+
+    ![](figures/en-us_image_0000001745738904.png)
+
+2. Create a directory and mount the OS image file to it. Replace the OS image name with the actual one.
+
+    ```
+    mkdir -p /mnt/iso
+    mount openEuler-22.03-LTS-x86_64-dvd.iso /mnt/iso
+    ```
+
+3. Configure the OS image source (the local image in this example) to obtain the dependencies required during the installation.
+    1. Open the image source file.
+
+        ```
+        vim /etc/yum.repos.d/openEuler.repo
+        ```
+
+    2. Press **i** to enter insert mode and retain only the following content:
+
+        ```
+        [OS]
+        name=OS
+        baseurl=file:///mnt/iso
+        enabled=1
+        gpgcheck=0
+        ```
+
+    3. Press **Esc**, type **:wq!**, and press **Enter** to save the file and exit.
+    4. Cache the software package metadata.
+
+        ```
+        yum makecache
+        ```
+
+4. Upload the driver package to the server and decompress it. Replace the driver package name with the actual one.
+
+    ```
+    tar -zxvf MLNX_OFED_LINUX-5.4-3.7.5.0-openeuler22.03-x86_64.tgz
+    ```
+
+5. Go to the directory extracted from the driver package and run the following command to install the driver:
+
+    ```
+    ./mlnxofedinstall --without-depcheck --without-fw-update --force
+    ```
+
+    If the system displays a message indicating that the kernel does not support the driver version, run the following command:
+
+    ```
+    ./mlnxofedinstall --add-kernel-support
+    ```
+
+6. Configure the driver to start automatically at system startup.
+
+    ```
+    chkconfig --add openibd
+    /etc/init.d/openibd start
+    chkconfig openibd on
+    ```
+
+7. Reboot the server after the installation is complete.
+
+# Verifying the Installation
+
+1. Check the RoCE LAG function of the driver.
+    1. Check whether the RoCE LAG function is enabled.
+
+        ```
+        find /sys/ -name roce_lag_enable | xargs cat
+        ```
+
+        - If the command output is **1**, the function is enabled.
+        - If the command output is **0** or no command output is displayed, the function is disabled.
+        - The function is expected to be disabled. If the function is enabled, go to [1.b](#li519083722516).
+
+    2. Disable the RoCE LAG function.
+
+        ```
+        sed '/load_module mlx5_core/a\ files=`find /sys -name roce_lag_enable`;for file in $files;do echo 0 > $file;done' -i /etc/init.d/openibd
+        ```
+
+    3. Reboot the node to apply the modification. Then, perform [1.a](#li389014811257) again to check whether the modification has taken effect.
+
+        ```
+        reboot
+        ```
+
+2. Query the driver version.
+
+    ```
+    ofed_info -s
+    ```
+
+    If the queried driver version is the same as the version installed in [Installing the MLNX Driver](installing-the-mlnx-driver.md), the driver version is correct.
+
+3. Load the MST tool.
+
+    ```
+    mst start
+    ```
+
+    If the following information is displayed, the loading is successful.
+
+    ```
+    Starting MST (Mellanox Software Tools) driver set
+    Loading MST PCI module - Success
+    Loading MST PCI configuration module - Success
+    Create devices
+    Unloading MST PCI module (unused) - Success
+    ```
+
+4. Query the device path and network port.
+    1. Query the device paths of RoCE and IB cards.
+
+        ```
+        mst status
+        ```
+
+        Command output:
+
+        ```
+        MST modules:
+        ------------
+            MST PCI module is not loaded
+            MST PCI configuration module loaded
+
+        MST devices:
+        ------------
+        /dev/mst/mt4119_pciconf0         - PCI configuration cycles access.
+                                           domain:bus:dev.fn=0000:81:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
+                                           Chip revision is: 00
+        ```
+
+        A device path **/dev/mst/**_mst\_typeN_ \(_N_ can be 0, 1, 2, ...\) enumerated in the **MST devices** field indicates a CX card. For details about the mapping between **mst\_type** and CX NIC models, see [Table 1](#table93856218287).
+
+        **Table 1** Mapping between mst\_type and CX NIC models
+

+        | mst_type | NIC Model |
+        | --- | --- |
+        | mt4099_pci_cr | CX3 |
+        | mt4117_pciconf | CX4-Lx |
+        | mt4119_pciconf | CX5 |
+        | mt4123_pciconf | CX6 |
+
+    2. Query the network ports to be checked. Subsequent steps will check all the queried ports.
+
+        ```
+        ll /dev/mst
+        ```
+
+        ![](figures/en-us_image_0000001792658857.png)
+
+        Ports **mt4119\_pciconf0** and **mt4119\_pciconf0.1** on the current node will be checked.
+
+5. Check the firmware version.
+    1. Query the firmware version of the RoCE or IB card. In the command, **/dev/mst/mt4119\_pciconf0** is the device path queried in the previous step. Replace it as required.
+
+        ```
+        flint -d /dev/mst/mt4119_pciconf0 q
+        ```
+
+        The command output is as follows:
+
+        ```
+        Image type:            FS4
+        FW Version:            16.31.2006
+        FW Release Date:       31.8.2021
+        Product Version:       16.31.2006
+        Rom Info:              type=UEFI version=14.24.15 cpu=AMD64
+                               type=PXE version=3.6.404 cpu=AMD64
+        Description:           UID                GuidsNumber
+        Base GUID:             ec0d9a0300c152e4        8
+        Base MAC:              ec0d9ac152e4            8
+        Image VSD:             N/A
+        Device VSD:            N/A
+        PSID:                  MT_0000000012
+        Security Attributes:   N/A
+        ```
+
+6. Check the firmware network protocol.
+    1. Query the current network protocol. The ETH protocol is used as an example.
+
+        ```
+        ibdev2netdev -v
+        ```
+
+        ![](figures/en-us_image_0000001745738920.png)
+
+        - If the NIC name prefix is **ib**, the current network protocol is IB. Go to [6.b](#li7704752131814).
+        - If the NIC name prefix is **en**, the current network protocol is ETH. Go to [7](#li71941554151214).
+
+    2. Query the values of **LINK\_TYPE\_P1** and **LINK\_TYPE\_P2**. The following uses **/dev/mst/mt4123\_pciconf0** as an example.
+
+        ```
+        mlxconfig -d /dev/mst/mt4123_pciconf0 q|grep LINK_TYPE_P1
+        mlxconfig -d /dev/mst/mt4123_pciconf0 q|grep LINK_TYPE_P2
+        ```
+
+        - If the command output is empty, the network protocol cannot be changed in the current environment. In this case, change the environment.
+        - If the query result is displayed, the network protocol can be modified.
+            - The queried values are expected to be **ETH\(2\)**. If so, go to [7](#li71941554151214).
+
+                ![](figures/en-us_image_0000001792658861.png)
+
+            - If the queried values are **IB\(1\)**, go to [6.c](#li1297794616574).
+
+                ![](figures/en-us_image_0000001792578601.png)
+
+    3. Change the values of **LINK\_TYPE\_P1** and **LINK\_TYPE\_P2**. The following uses **/dev/mst/mt4123\_pciconf0** as an example.
+
+        ```
+        mlxconfig -d /dev/mst/mt4123_pciconf0 s LINK_TYPE_P1=2
+        mlxconfig -d /dev/mst/mt4123_pciconf0 s LINK_TYPE_P2=2
+        ```
+
+        ![](figures/en-us_image_0000001745738908.png)
+
+    4. Run the **reboot** command to reboot the system, and then perform [6.b](#li7704752131814) again to verify that the modification is successful.
+
+7. Verify the RDMA network.
+
+    Run the following command on the server node:
+
+    ```
+    ib_send_bw -d mlx5_1
+    ```
+
+    Run the following command on the client node \(_xx.xx.xx.xx_ indicates the IP address of the server node\):
+
+    ```
+    ib_send_bw -d mlx5_1 xx.xx.xx.xx
+    ```
+
+8. \(Optional\) Set firmware options.
+
+    >![](public_sys-resources/icon-note.gif) **NOTE:**
+    >This step is recommended because it reduces network latency.
+
+    1. Query the value of the CX card firmware option **PCI\_WR\_ORDERING**.
+
+        Take **/dev/mst/mt4119\_pciconf0** as an example. Query the firmware settings of the two ports of the device. In the query result, the value of **PCI\_WR\_ORDERING** is expected to be **1**. If not, go to [8.b](#li19557114614540).
+
+        ```
+        mlxconfig -d /dev/mst/mt4119_pciconf0 q | grep PCI_WR_ORDERING
+        mlxconfig -d /dev/mst/mt4119_pciconf0.1 q | grep PCI_WR_ORDERING
+        ```
+
+        ![](figures/en-us_image_0000001745738916.png)
+
+    2. Set the firmware option **PCI\_WR\_ORDERING** for the two ports of a CX5 card, and run the **reboot** command to restart the system. After the system is back up, perform [8.a](#li1555794655417) again to check whether the modification is successful.
+
+        ```
+        mlxconfig -y -d /dev/mst/mt4119_pciconf0 s PCI_WR_ORDERING=1
+        ```
+
+        ![](figures/en-us_image_0000001792658853.png)
+
+        ```
+        mlxconfig -y -d /dev/mst/mt4119_pciconf0.1 s PCI_WR_ORDERING=1
+        ```
+
+        ![](figures/en-us_image_0000001745579752.png)
+
+# Configuring NIC IP Addresses
+
+1. View the association between Ethernet devices and IB devices/ports.
+
+    ```
+    ibdev2netdev -v
+    ```
+
+    - NIC associated with the device mlx5\_0 on the current node: **enp24s0f0**
+    - NIC associated with the device mlx5\_1 on the current node: **enp24s0f1**
+
+    ![](figures/en-us_image_0000001792578597.png)
+
+2. Check the NIC status.
+
+    ```
+    ifconfig -a
+    ```
+
+    ![](figures/en-us_image_0000001792578593.png)
+
+    If the following four fields are displayed as expected, the NIC can be used properly.
+
+    - **UP** indicates that the NIC is enabled.
+    - **RUNNING** indicates that the network cable of the NIC is connected.
+    - **MULTICAST** indicates that multicasting is supported.
+    - **MTU 1500** indicates that the maximum transmission unit is 1500 bytes.
+
+3. Configure the NIC IP address based on your environment. The following figure shows how to add the NIC IP address in the **/etc/sysconfig/network-scripts/ifcfg-enp24s0f0** configuration file. Then run **systemctl restart network.service** to restart the network service and apply the change.
+
+    ![](figures/en-us_image_0000001745579748.png)
+
+    After the configuration is complete, check the NIC status by referring to [2](#li1681619318285).
+
+    ![](figures/en-us_image_0000001745579744.png)
+
+# Common IB Commands
+
+**Table 1** Common IB commands
+

+| Command | Description |
+| --- | --- |
+| lspci \|grep Mell | Checks whether an IB card exists on the host (by searching for the vendor name Mellanox). |
+| ibstatus | Displays IB card information, including the link status, port rate, and port GUID. |
+| ibstat | Provides functions similar to those of ibstatus. |
+| ofed_info -s | Queries the version of the installed driver. |
+| ibv_devinfo | Queries the IB device information on the current node. |
+| ibqueryerrors -C mlx4_0 -P 1 | Queries the statistics of each port on the current IB network. |
+| perfquery | Queries the IB port counters to check for packet loss and symbol errors. |
+| ibv_devices | Lists the IB cards on the current node. |
+| ibdump | Captures packets at the IB layer. It is provided by Mellanox. |
+| ethtool --set-priv-flags eth-s0 sniffer on | Enables the sniffer function so that tcpdump can be used to capture packets. |
+| ib_atomic_bw | Calculates the bandwidth of RDMA atomic transactions between a pair of machines (one server and one client). It obtains the time for receiving complete messages through CPU sampling to calculate the bandwidth. It supports two-way tests and allows you to change the MTU size, TX size, number of iterations, and message size. For more usage, see the -a parameter. |
+| ib_atomic_lat | Calculates the latency of atomic transactions between a pair of machines for a given RDMA message size. The client sends RDMA atomic operations to the server, samples the CPU clock to obtain the time when all the messages are received, and calculates the latency. |
+| ib_read_bw | Calculates the bandwidth of RDMA read operations between a pair of machines. |
+| ib_read_lat | Calculates the read operation latency between a pair of machines for a given RDMA message size. |
+| ib_send_bw -d mlx5_1 | Calculates the RDMA send operation bandwidth between a pair of machines. |
+| ib_send_lat | Calculates the send operation latency between a pair of machines for a given RDMA message size. |
+| ib_write_bw | Calculates the RDMA write operation bandwidth between a pair of machines. |
+| ib_write_lat | Calculates the write operation latency between a pair of machines for a given RDMA message size. |
+| raw_ethernet_bw | Calculates the send bandwidth between a pair of machines. |
+| raw_ethernet_lat | Calculates the latency of sending messages of a given size between a pair of machines. |
+| rping | Checks whether the RDMA CM connection works properly. |
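As a rough illustration of the mapping between mst\_type and CX NIC models used in the verification steps, the sketch below turns an MST device path into a NIC model name. The function name `mst_to_model` is hypothetical, and the mapping covers only the four mst\_type prefixes listed in this guide; any other device is reported as `unknown`.

```shell
#!/bin/sh
# Hypothetical helper (not part of MST): map an MST device path to the
# NIC model listed in the mst_type table of this guide.
mst_to_model() {
    # Strip the directory so only the device name (e.g. mt4119_pciconf0) remains.
    case "$(basename "$1")" in
        mt4099_pci_cr*)  echo "CX3" ;;
        mt4117_pciconf*) echo "CX4-Lx" ;;
        mt4119_pciconf*) echo "CX5" ;;
        mt4123_pciconf*) echo "CX6" ;;
        *)               echo "unknown" ;;
    esac
}

# Example with the two ports used throughout this guide.
mst_to_model /dev/mst/mt4119_pciconf0
mst_to_model /dev/mst/mt4119_pciconf0.1
```

On a real node, the same function could be applied to every entry under `/dev/mst` to see at a glance which cards the later firmware steps will touch.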
+ diff --git a/app/en/blogs/RDMA/figures/en-us_image_0000001745579744.png b/app/en/blogs/RDMA/figures/en-us_image_0000001745579744.png new file mode 100644 index 0000000000000000000000000000000000000000..5afc7ae67dbc12459b144eaee658efda4deae424 Binary files /dev/null and b/app/en/blogs/RDMA/figures/en-us_image_0000001745579744.png differ diff --git a/app/en/blogs/RDMA/figures/en-us_image_0000001745579748.png b/app/en/blogs/RDMA/figures/en-us_image_0000001745579748.png new file mode 100644 index 0000000000000000000000000000000000000000..5090e1ba6d0a64dac4028b8d34884325477688c2 Binary files /dev/null and b/app/en/blogs/RDMA/figures/en-us_image_0000001745579748.png differ diff --git a/app/en/blogs/RDMA/figures/en-us_image_0000001745579752.png b/app/en/blogs/RDMA/figures/en-us_image_0000001745579752.png new file mode 100644 index 0000000000000000000000000000000000000000..adc54afdba28d0f8f816ca9bf8fdcde1f2ea4a76 Binary files /dev/null and b/app/en/blogs/RDMA/figures/en-us_image_0000001745579752.png differ diff --git a/app/en/blogs/RDMA/figures/en-us_image_0000001745738904.png b/app/en/blogs/RDMA/figures/en-us_image_0000001745738904.png new file mode 100644 index 0000000000000000000000000000000000000000..67deb5f9173575918afacc16ed0b0b6c5906d4f0 Binary files /dev/null and b/app/en/blogs/RDMA/figures/en-us_image_0000001745738904.png differ diff --git a/app/en/blogs/RDMA/figures/en-us_image_0000001745738908.png b/app/en/blogs/RDMA/figures/en-us_image_0000001745738908.png new file mode 100644 index 0000000000000000000000000000000000000000..48437c8dcd9645a8a7800c15d6193bff17619589 Binary files /dev/null and b/app/en/blogs/RDMA/figures/en-us_image_0000001745738908.png differ diff --git a/app/en/blogs/RDMA/figures/en-us_image_0000001745738916.png b/app/en/blogs/RDMA/figures/en-us_image_0000001745738916.png new file mode 100644 index 0000000000000000000000000000000000000000..8f062c69ae64e37b3c5426ada17aa7675f9bc3ce Binary files /dev/null and 
b/app/en/blogs/RDMA/figures/en-us_image_0000001745738916.png differ diff --git a/app/en/blogs/RDMA/figures/en-us_image_0000001745738920.png b/app/en/blogs/RDMA/figures/en-us_image_0000001745738920.png new file mode 100644 index 0000000000000000000000000000000000000000..d3ad4a1c470be5773f07218c6581e6a4bf1a194e Binary files /dev/null and b/app/en/blogs/RDMA/figures/en-us_image_0000001745738920.png differ diff --git a/app/en/blogs/RDMA/figures/en-us_image_0000001792578593.png b/app/en/blogs/RDMA/figures/en-us_image_0000001792578593.png new file mode 100644 index 0000000000000000000000000000000000000000..05e6e0c2ff8b696be08652d9a0461db24561d53e Binary files /dev/null and b/app/en/blogs/RDMA/figures/en-us_image_0000001792578593.png differ diff --git a/app/en/blogs/RDMA/figures/en-us_image_0000001792578597.png b/app/en/blogs/RDMA/figures/en-us_image_0000001792578597.png new file mode 100644 index 0000000000000000000000000000000000000000..e86a5bf8707494643c15c9c73ec25cbe79f0975e Binary files /dev/null and b/app/en/blogs/RDMA/figures/en-us_image_0000001792578597.png differ diff --git a/app/en/blogs/RDMA/figures/en-us_image_0000001792578601.png b/app/en/blogs/RDMA/figures/en-us_image_0000001792578601.png new file mode 100644 index 0000000000000000000000000000000000000000..8a3a349b1b4b17065bb49d79d8afa478fcfc22ec Binary files /dev/null and b/app/en/blogs/RDMA/figures/en-us_image_0000001792578601.png differ diff --git a/app/en/blogs/RDMA/figures/en-us_image_0000001792658853.png b/app/en/blogs/RDMA/figures/en-us_image_0000001792658853.png new file mode 100644 index 0000000000000000000000000000000000000000..2c1a901d84a1f5daeca3d49a168903ce84f58e5d Binary files /dev/null and b/app/en/blogs/RDMA/figures/en-us_image_0000001792658853.png differ diff --git a/app/en/blogs/RDMA/figures/en-us_image_0000001792658857.png b/app/en/blogs/RDMA/figures/en-us_image_0000001792658857.png new file mode 100644 index 
0000000000000000000000000000000000000000..021b2c08bb8f25129d3019f77b2c0990b888d31d Binary files /dev/null and b/app/en/blogs/RDMA/figures/en-us_image_0000001792658857.png differ diff --git a/app/en/blogs/RDMA/figures/en-us_image_0000001792658861.png b/app/en/blogs/RDMA/figures/en-us_image_0000001792658861.png new file mode 100644 index 0000000000000000000000000000000000000000..dae5c550cee5854d0b4733d3ea4e743020fb54cf Binary files /dev/null and b/app/en/blogs/RDMA/figures/en-us_image_0000001792658861.png differ diff --git a/app/en/blogs/RDMA/public_sys-resources/icon-caution.gif b/app/en/blogs/RDMA/public_sys-resources/icon-caution.gif new file mode 100644 index 0000000000000000000000000000000000000000..81fb2aba954177efa588e675927082b1f6bed41f Binary files /dev/null and b/app/en/blogs/RDMA/public_sys-resources/icon-caution.gif differ diff --git a/app/en/blogs/RDMA/public_sys-resources/icon-danger.gif b/app/en/blogs/RDMA/public_sys-resources/icon-danger.gif new file mode 100644 index 0000000000000000000000000000000000000000..81fb2aba954177efa588e675927082b1f6bed41f Binary files /dev/null and b/app/en/blogs/RDMA/public_sys-resources/icon-danger.gif differ diff --git a/app/en/blogs/RDMA/public_sys-resources/icon-note.gif b/app/en/blogs/RDMA/public_sys-resources/icon-note.gif new file mode 100644 index 0000000000000000000000000000000000000000..db3995e34b6644fc11c916ffe69c7cb5512610d8 Binary files /dev/null and b/app/en/blogs/RDMA/public_sys-resources/icon-note.gif differ diff --git a/app/en/blogs/RDMA/public_sys-resources/icon-notice.gif b/app/en/blogs/RDMA/public_sys-resources/icon-notice.gif new file mode 100644 index 0000000000000000000000000000000000000000..75397a3efc5c345922fd37f551d7d28675ab6c5f Binary files /dev/null and b/app/en/blogs/RDMA/public_sys-resources/icon-notice.gif differ diff --git a/app/en/blogs/RDMA/public_sys-resources/icon-tip.gif b/app/en/blogs/RDMA/public_sys-resources/icon-tip.gif new file mode 100644 index 
0000000000000000000000000000000000000000..110cd67cefa9f6b2800a2b8076a7a0dcc00b783c
Binary files /dev/null and b/app/en/blogs/RDMA/public_sys-resources/icon-tip.gif differ
diff --git a/app/en/blogs/RDMA/public_sys-resources/icon-warning.gif b/app/en/blogs/RDMA/public_sys-resources/icon-warning.gif
new file mode 100644
index 0000000000000000000000000000000000000000..81fb2aba954177efa588e675927082b1f6bed41f
Binary files /dev/null and b/app/en/blogs/RDMA/public_sys-resources/icon-warning.gif differ
diff --git a/app/en/blogs/weak-modules/Modification_of_the_weak-modules_Script_for_OS_Compatibility.md b/app/en/blogs/weak-modules/Modification_of_the_weak-modules_Script_for_OS_Compatibility.md
new file mode 100644
index 0000000000000000000000000000000000000000..53af4d971096d4988360e89aabd562b511fd1537
--- /dev/null
+++ b/app/en/blogs/weak-modules/Modification_of_the_weak-modules_Script_for_OS_Compatibility.md
@@ -0,0 +1,163 @@
+# Symptom
+
+Although the **umdk-urma-kmod** and **umdk-urma-compat-ib-kmod** dependencies are successfully installed, the system displays a message indicating that the **ubcore.ko** module in kernel 4.19.90-2012.5.0.0054.oe1.x86\_64 is incompatible with the symbols of kernel 4.19.90-2109.1.0.0108.oe1.x86\_64. When the **modinfo** and **modprobe** commands are used to view and load the .ko files, the .ko files cannot be found.
+
+- Installing the **umdk-urma-kmod** dependency:
+
+    ```
+    [root@localhost dlock]# rpm -ivh umdk-urma-kmod-1.3.0-206.3.0.B130.x86_64.rpm
+    Verifying...                          ################################# [100%]
+    Preparing...                          ################################# [100%]
+    Updating / installing...
+       1:umdk-urma-kmod-1.3.0-206.3.0.B130################################# [100%]
+    /var/tmp/rpm-tmp.wSpThz: line 2: fg: no job control
+    Module ubcore.ko from kernel 4.19.90-2012.5.0.0054.oe1.x86_64 is not compatible with kernel 4.19.90-2109.1.0.0108.oe1.x86_64 in symbols: memcpy_s memset_s
+    Module uburma.ko from kernel 4.19.90-2012.5.0.0054.oe1.x86_64 is not compatible with kernel 4.19.90-2109.1.0.0108.oe1.x86_64 in symbols: snprintf_s
+    ```
+
+- Installing the **umdk-urma-compat-ib-kmod** dependency:
+
+    ```
+    [root@localhost dlock]# rpm -ivh umdk-urma-compat-ib-kmod-1.3.0-206.3.0.B130.x86_64.rpm
+    Verifying...                          ################################# [100%]
+    Preparing...                          ################################# [100%]
+    Updating / installing...
+       1:umdk-urma-compat-ib-kmod-1.3.0-20################################# [100%]
+    Module uboib.ko from kernel 4.19.90-2012.5.0.0054.oe1.x86_64 is not compatible with kernel 4.19.90-2109.1.0.0108.oe1.x86_64 in symbols: ib_destroy_cq_user ib_register_client ib_set_client_data ib_destroy_qp_user rdma_query_gid ib_query_port ib_unregister_client memcpy_s __ib_create_cq ib_dealloc_pd_user backport_dependency_symbol ib_query_qp strcpy_s __ib_alloc_pd memset_s ib_create_qp
+    Created symlink /etc/systemd/system/multi-user.target.wants/uboib-module.service → /usr/lib/systemd/system/uboib-module.service.
+    ```
+
+# Possible Cause
+
+The uboib module depends on OFED, which is installed separately after the OS is installed. The OFED symbols are in **/lib/modules/**_current\_kernel\_version_**/extra**. When the RPM dependencies are installed, the **weak-modules** script cannot find the OFED symbols, so it checks ubcore and the other modules against the default symbols of the OS only. As a result, the compatibility check fails.
+
+**OS Compatibility**
+
+**Table 1** OS compatibility information
+

+| OS | Compatibility Issue Detected |
+| --- | --- |
+| openEuler 20.03 SP1 x86 | Yes |
+| openEuler 20.03 SP1 Arm | Yes |
+| openEuler 20.03 SP3 x86 | No |
+| openEuler 20.03 SP3 Arm | No |
+| openEuler 20.03 LTS x86 | No |
+| openEuler 20.03 LTS Arm | No |
+| openEuler 22.03 LTS x86 | No |
+| openEuler 22.03 LTS Arm | No |
+
+**Root Cause**
+
+The kernel version that the urma modules were built for does not match the kernel version of the running system.
+
+- Run the **rpm -ql umdk-urma-kmod** command to check the minor kernel version used for urma compilation.
+
+    ```
+    /etc/modules-load.d/ubcore.conf
+    /etc/modules-load.d/uburma.conf
+    /lib/modules/4.19.90-2012.5.0.0054.oe1.x86_64/extra/urma
+    /lib/modules/4.19.90-2012.5.0.0054.oe1.x86_64/extra/urma/ubcore.ko
+    /lib/modules/4.19.90-2012.5.0.0054.oe1.x86_64/extra/urma/uburma.ko
+    /usr/include/umdk
+    /usr/src/ubcore/Module.symvers
+    ```
+
+- Run the **uname -r** command to query the minor kernel version of the OS.
+
+    ```
+    4.19.90-2109.1.0.0108.oe1.x86_64
+    ```
+
+According to the query result, the minor version of the kernel used for urma compilation does not match the minor kernel version of the OS.
+
+**Direct Cause**
+
+The compatibility check does not pick up the .ko files that are actually in use. As a result, the **weak-modules** script incorrectly determines that the .ko file is incompatible.
+
+The **weak-modules** script checks compatibility as follows:
+
+1. Obtain the symbol table generated when the current OS kernel was compiled, denoted as **S-kernel**.
+2. Obtain the symbol tables of the extra .ko files in **/lib/modules/$module\_krel/extra**, denoted as **S-extra**.
+3. Combine the symbol tables from the first two steps into **S-all**.
+4. Obtain the symbol table of the .ko file to be checked, denoted as **S-target**.
+5. Check whether all symbols in **S-target** can be matched in **S-all**.
+
+- If all symbols in **S-target** can be found in **S-all**, the .ko file is determined as compatible. In this case, a symbolic link to the .ko file is created in **/lib/modules/$\(uname -r\)/weak-updates/**.
+- If any symbol in **S-target** cannot be found in **S-all**, the .ko file is considered incompatible and an error is reported. For details about error handling, see [Solution](solution.md).
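The five-step check above boils down to a set-containment test. The following is a simplified sketch of that idea only, not the actual **weak-modules** code: the function name `missing_symbols`, the temporary files, and the toy symbol lists are all assumptions made for illustration, and the sketch compares symbol names alone, whereas the script in the Solution section records a checksum together with each symbol name.

```shell
#!/bin/sh
# Hypothetical helper: print the symbols that appear in S-target ($2)
# but not in S-all ($1). Both files hold one symbol name per line.
missing_symbols() {
    s_all_sorted=$(mktemp)
    s_target_sorted=$(mktemp)
    sort -u "$1" > "$s_all_sorted"
    sort -u "$2" > "$s_target_sorted"
    # comm -13 keeps only the lines unique to the second (sorted) file.
    comm -13 "$s_all_sorted" "$s_target_sorted"
    rm -f "$s_all_sorted" "$s_target_sorted"
}

# Toy data standing in for the real symbol tables.
workdir=$(mktemp -d)
printf '%s\n' printk kmalloc ib_query_port > "$workdir/s_kernel"   # S-kernel
printf '%s\n' memcpy_s memset_s > "$workdir/s_extra"               # S-extra
cat "$workdir/s_kernel" "$workdir/s_extra" > "$workdir/s_all"      # S-all
printf '%s\n' memcpy_s ib_query_port snprintf_s > "$workdir/s_target"  # S-target

missing=$(missing_symbols "$workdir/s_all" "$workdir/s_target")
if [ -z "$missing" ]; then
    echo "compatible"
else
    echo "incompatible, unresolved symbols: $missing"
fi
```

In this toy run `snprintf_s` is absent from **S-all**, which mirrors the failure mode described above: if the directory that actually provides a symbol is not scanned when **S-extra** is built, the module is reported as incompatible even though the symbol exists on the system.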
+
+# Precautions and Recommendations
+
+Before installing the RPM packages, check whether the minor kernel version used for urma compilation matches the minor kernel version of the OS. If they match, install the RPM packages.
+
+If not, refer to [Solution](solution.md).
+
+# Solution
+
+Add the .ko files that have already been installed using RPM to **S-all** so that they are included when symbol dependencies are checked.
+
+1. Log in to the server and open the **weak-modules** script.
+
+    ```
+    vi /sbin/weak-modules
+    ```
+
+2. Add the following script at the position highlighted with a white background in the figure.
+
+    ![](figures/weak.png)
+
+    The script is as follows:
+
+    ```
+    # Collect the exported symbols of every installed .ko file for kernel $krel
+    # and append them to the symbol table used for the compatibility check.
+    if type "nm" > /dev/null; then
+        find -L /lib/modules/$krel -name '*.ko' \
+            | xargs nm \
+            | sed -nre 's:^[0]*([0-9a-f]{8}) A __crc_(.*):0x\1 \2:p'
+    else
+        # Fall back to modprobe when nm is not available.
+        for ko in `find -L /lib/modules/$krel -name '*.ko'`
+        do
+            modprobe --show-exports $ko 2>/dev/null
+        done
+    fi >> $tmpdir/symvers-$krel
+    ```
+
+3. Save the content and perform the installation again.
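The pre-installation check recommended under Precautions and Recommendations can be sketched as a small script. This is an illustrative sketch under stated assumptions: the function names `kmod_kernel_release` and `check_kmod_matches` are hypothetical, and the file list is hard-coded from the `rpm -ql` output shown earlier so that the example runs without the package installed.

```shell
#!/bin/sh
# Hypothetical helper: read a package file list on stdin and print the kernel
# release embedded in module paths such as /lib/modules/<release>/.../x.ko.
kmod_kernel_release() {
    sed -n 's|^/lib/modules/\([^/]*\)/.*\.ko$|\1|p' | sort -u
}

# Hypothetical helper: $1 = package file list, $2 = running kernel release.
check_kmod_matches() {
    built_for=$(printf '%s\n' "$1" | kmod_kernel_release)
    if [ "$built_for" = "$2" ]; then
        echo "match: $2"
    else
        echo "mismatch: built for $built_for, running $2"
    fi
}

# Sample data copied from the rpm -ql and uname -r output in this document.
file_list='/etc/modules-load.d/ubcore.conf
/lib/modules/4.19.90-2012.5.0.0054.oe1.x86_64/extra/urma/ubcore.ko
/lib/modules/4.19.90-2012.5.0.0054.oe1.x86_64/extra/urma/uburma.ko'

check_kmod_matches "$file_list" "4.19.90-2109.1.0.0108.oe1.x86_64"
```

On a real system the two arguments would presumably come from `rpm -ql umdk-urma-kmod` (or `rpm -qlp` on the package file before installation) and `uname -r`.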
+ diff --git a/app/en/blogs/weak-modules/figures/weak.png b/app/en/blogs/weak-modules/figures/weak.png new file mode 100644 index 0000000000000000000000000000000000000000..9958d0ad87ec4c20467a6d8f4bd2c7ecb5b0a538 Binary files /dev/null and b/app/en/blogs/weak-modules/figures/weak.png differ diff --git a/app/en/blogs/weak-modules/public_sys-resources/icon-caution.gif b/app/en/blogs/weak-modules/public_sys-resources/icon-caution.gif new file mode 100644 index 0000000000000000000000000000000000000000..81fb2aba954177efa588e675927082b1f6bed41f Binary files /dev/null and b/app/en/blogs/weak-modules/public_sys-resources/icon-caution.gif differ diff --git a/app/en/blogs/weak-modules/public_sys-resources/icon-danger.gif b/app/en/blogs/weak-modules/public_sys-resources/icon-danger.gif new file mode 100644 index 0000000000000000000000000000000000000000..81fb2aba954177efa588e675927082b1f6bed41f Binary files /dev/null and b/app/en/blogs/weak-modules/public_sys-resources/icon-danger.gif differ diff --git a/app/en/blogs/weak-modules/public_sys-resources/icon-note.gif b/app/en/blogs/weak-modules/public_sys-resources/icon-note.gif new file mode 100644 index 0000000000000000000000000000000000000000..db3995e34b6644fc11c916ffe69c7cb5512610d8 Binary files /dev/null and b/app/en/blogs/weak-modules/public_sys-resources/icon-note.gif differ diff --git a/app/en/blogs/weak-modules/public_sys-resources/icon-notice.gif b/app/en/blogs/weak-modules/public_sys-resources/icon-notice.gif new file mode 100644 index 0000000000000000000000000000000000000000..75397a3efc5c345922fd37f551d7d28675ab6c5f Binary files /dev/null and b/app/en/blogs/weak-modules/public_sys-resources/icon-notice.gif differ diff --git a/app/en/blogs/weak-modules/public_sys-resources/icon-tip.gif b/app/en/blogs/weak-modules/public_sys-resources/icon-tip.gif new file mode 100644 index 0000000000000000000000000000000000000000..110cd67cefa9f6b2800a2b8076a7a0dcc00b783c Binary files /dev/null and 
b/app/en/blogs/weak-modules/public_sys-resources/icon-tip.gif differ
diff --git a/app/en/blogs/weak-modules/public_sys-resources/icon-warning.gif b/app/en/blogs/weak-modules/public_sys-resources/icon-warning.gif
new file mode 100644
index 0000000000000000000000000000000000000000..81fb2aba954177efa588e675927082b1f6bed41f
Binary files /dev/null and b/app/en/blogs/weak-modules/public_sys-resources/icon-warning.gif differ
diff --git a/app/zh/blogs/RDMA/.keep b/app/zh/blogs/RDMA/.keep
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/app/zh/blogs/RDMA/RDMA.md b/app/zh/blogs/RDMA/RDMA.md
index c4e75aaad9f268d2109ad09903c18ad7fcdd1a8f..274e0d3b243274a6790891bc1b9871f4da54ea25 100644
--- a/app/zh/blogs/RDMA/RDMA.md
+++ b/app/zh/blogs/RDMA/RDMA.md
@@ -121,7 +121,7 @@ lspci |grep Mellanox
     ofed_info -s
     ```
 
-    If the displayed driver version is the same as the installed MLNX driver version, the driver version is correct.
+    If the displayed driver version is the same as the version in [Installing the MLNX Driver](安装MLNX驱动.md), the driver version is correct.
 
 3. Load the MST tool.
 
diff --git "a/app/zh/blogs/weak-modules/OS\345\205\274\345\256\271\346\200\247weak-modules\350\204\232\346\234\254\344\277\256\346\224\271\346\226\271\346\263\225.md" "b/app/zh/blogs/weak-modules/OS\345\205\274\345\256\271\346\200\247weak-modules\350\204\232\346\234\254\344\277\256\346\224\271\346\226\271\346\263\225.md"
new file mode 100644
index 0000000000000000000000000000000000000000..77c1c14507c1f8aa6af356993ef90e53d1a86542
--- /dev/null
+++ "b/app/zh/blogs/weak-modules/OS\345\205\274\345\256\271\346\200\247weak-modules\350\204\232\346\234\254\344\277\256\346\224\271\346\226\271\346\263\225.md"
@@ -0,0 +1,163 @@
+# Symptom
+
+Although the umdk-urma-kmod and umdk-urma-compat-ib-kmod dependencies are successfully installed, the system displays a message indicating that the ubcore.ko module in kernel 4.19.90-2012.5.0.0054.oe1.x86\_64 is incompatible with the symbols of kernel 4.19.90-2109.1.0.0108.oe1.x86\_64. When **modinfo** and **modprobe** are used to view and load the .ko files, the .ko files cannot be found.
+
+- Installing the umdk-urma-kmod dependency:
+
+    ```
+    [root@localhost dlock]# rpm -ivh umdk-urma-kmod-1.3.0-206.3.0.B130.x86_64.rpm
+    Verifying...                          ################################# [100%]
+    Preparing...                          ################################# [100%]
+    Updating / installing...
+       1:umdk-urma-kmod-1.3.0-206.3.0.B130################################# [100%]
+    /var/tmp/rpm-tmp.wSpThz: line 2: fg: no job control
+    Module ubcore.ko from kernel 4.19.90-2012.5.0.0054.oe1.x86_64 is not compatible with kernel 4.19.90-2109.1.0.0108.oe1.x86_64 in symbols: memcpy_s memset_s
+    Module uburma.ko from kernel 4.19.90-2012.5.0.0054.oe1.x86_64 is not compatible with kernel 4.19.90-2109.1.0.0108.oe1.x86_64 in symbols: snprintf_s
+    ```
+
+- Installing the umdk-urma-compat-ib-kmod dependency:
+
+    ```
+    [root@localhost dlock]# rpm -ivh umdk-urma-compat-ib-kmod-1.3.0-206.3.0.B130.x86_64.rpm
+    Verifying...                          ################################# [100%]
+    Preparing...                          ################################# [100%]
+    Updating / installing...
+       1:umdk-urma-compat-ib-kmod-1.3.0-20################################# [100%]
+    Module uboib.ko from kernel 4.19.90-2012.5.0.0054.oe1.x86_64 is not compatible with kernel 4.19.90-2109.1.0.0108.oe1.x86_64 in symbols: ib_destroy_cq_user ib_register_client ib_set_client_data ib_destroy_qp_user rdma_query_gid ib_query_port ib_unregister_client memcpy_s __ib_create_cq ib_dealloc_pd_user backport_dependency_symbol ib_query_qp strcpy_s __ib_alloc_pd memset_s ib_create_qp
+    Created symlink /etc/systemd/system/multi-user.target.wants/uboib-module.service → /usr/lib/systemd/system/uboib-module.service.
+    ```
+
+# Possible Cause
+
+The uboib module depends on OFED, which is installed separately after the OS is installed. The OFED symbols are in "/lib/modules/current kernel version/extra". When the RPM dependencies are installed, the weak-modules script cannot find the OFED symbols, so it checks ubcore and the other modules against the default symbols of the OS only. As a result, the compatibility check fails.
+
+**OS Compatibility Information**
+
+**Table 1** OS compatibility information
+

+| OS | Compatibility Issue Detected |
+| --- | --- |
+| openEuler 20.03 SP1 x86 | Yes |
+| openEuler 20.03 SP1 Arm | Yes |
+| openEuler 20.03 SP3 x86 | No |
+| openEuler 20.03 SP3 Arm | No |
+| openEuler 20.03 LTS x86 | No |
+| openEuler 20.03 LTS Arm | No |
+| openEuler 22.03 LTS x86 | No |
+| openEuler 22.03 LTS Arm | No |
+
+**Root Cause**
+
+The kernel version that the urma modules were built for does not match the kernel version of the running system.
+
+- Run the **rpm -ql umdk-urma-kmod** command to check the minor kernel version used for urma compilation.
+
+    ```
+    /etc/modules-load.d/ubcore.conf
+    /etc/modules-load.d/uburma.conf
+    /lib/modules/4.19.90-2012.5.0.0054.oe1.x86_64/extra/urma
+    /lib/modules/4.19.90-2012.5.0.0054.oe1.x86_64/extra/urma/ubcore.ko
+    /lib/modules/4.19.90-2012.5.0.0054.oe1.x86_64/extra/urma/uburma.ko
+    /usr/include/umdk
+    /usr/src/ubcore/Module.symvers
+    ```
+
+- Run **uname -r** to query the minor kernel version of the OS.
+
+    ```
+    4.19.90-2109.1.0.0108.oe1.x86_64
+    ```
+
+According to the query result, the minor version of the kernel used for urma compilation does not match the minor kernel version of the OS.
+
+**Direct Cause**
+
+The compatibility check does not pick up the .ko files that are actually in use. As a result, the .ko file is incorrectly determined to be incompatible.
+
+The weak-modules script checks compatibility as follows:
+
+1. Obtain the symbol table generated when the current OS kernel was compiled, denoted as **S-kernel**.
+2. Obtain the symbol tables of the extra .ko files in "/lib/modules/$module\_krel/extra", denoted as **S-extra**.
+3. Combine the symbol tables from the first two steps into **S-all**.
+4. Obtain the symbol table of the .ko file to be checked, denoted as **S-target**.
+5. Check whether all symbols in **S-target** can be matched in **S-all**.
+
+- If all symbols in **S-target** can be found in **S-all**, the .ko file is determined as compatible. In this case, a symbolic link to the original .ko file is created in "/lib/modules/$\(uname -r\)/weak-updates/".
+- If any symbol in **S-target** cannot be found in **S-all**, the .ko file is considered incompatible and an error is reported. For details about error handling, see [Solution](解决方法.md).
+
+# Precautions and Recommendations
+
+Before installing the RPM packages, check whether the minor kernel version used for urma compilation matches the minor kernel version of the OS. Install the RPM packages only after the versions match.
+
+If they do not match, you can also refer to [Solution](解决方法.md).
+
+# Solution
+
+Add the .ko files that have already been installed using RPM to **S-all** so that they are included when symbol dependencies are checked.
+
+1. Log in to the server and open the weak-modules script.
+
+    ```
+    vi /sbin/weak-modules
+    ```
+
+2. Add the following script at the position highlighted in white in the figure.
+
+    ![](figures/weak.png)
+
+    The script is as follows:
+
+    ```
+    if type "nm" > /dev/null; then
+        find -L /lib/modules/$krel -name '*.ko' \
+            | xargs nm \
+            | sed -nre 's:^[0]*([0-9a-f]{8}) A __crc_(.*):0x\1 \2:p'
+    else
+        for ko in `find -L /lib/modules/$krel -name '*.ko'`
+        do
+            modprobe --show-exports $ko 2>/dev/null
+        done
+    fi >> $tmpdir/symvers-$krel
+    ```
+
+3. Save the content and perform the installation again.
+
diff --git a/app/zh/blogs/weak-modules/figures/weak.png b/app/zh/blogs/weak-modules/figures/weak.png
new file mode 100644
index 0000000000000000000000000000000000000000..9958d0ad87ec4c20467a6d8f4bd2c7ecb5b0a538
Binary files /dev/null and b/app/zh/blogs/weak-modules/figures/weak.png differ
diff --git a/app/zh/blogs/weak-modules/public_sys-resources/icon-caution.gif b/app/zh/blogs/weak-modules/public_sys-resources/icon-caution.gif
new file mode 100644
index 0000000000000000000000000000000000000000..81fb2aba954177efa588e675927082b1f6bed41f
Binary files /dev/null and b/app/zh/blogs/weak-modules/public_sys-resources/icon-caution.gif differ
diff --git a/app/zh/blogs/weak-modules/public_sys-resources/icon-danger.gif b/app/zh/blogs/weak-modules/public_sys-resources/icon-danger.gif
new file mode 100644
index 0000000000000000000000000000000000000000..81fb2aba954177efa588e675927082b1f6bed41f
Binary files /dev/null and b/app/zh/blogs/weak-modules/public_sys-resources/icon-danger.gif differ
diff --git a/app/zh/blogs/weak-modules/public_sys-resources/icon-note.gif b/app/zh/blogs/weak-modules/public_sys-resources/icon-note.gif
new file mode 100644
index 0000000000000000000000000000000000000000..db3995e34b6644fc11c916ffe69c7cb5512610d8
Binary files /dev/null and b/app/zh/blogs/weak-modules/public_sys-resources/icon-note.gif differ
diff --git a/app/zh/blogs/weak-modules/public_sys-resources/icon-notice.gif b/app/zh/blogs/weak-modules/public_sys-resources/icon-notice.gif
new file mode 100644
index 0000000000000000000000000000000000000000..75397a3efc5c345922fd37f551d7d28675ab6c5f
Binary files /dev/null and b/app/zh/blogs/weak-modules/public_sys-resources/icon-notice.gif differ
diff --git a/app/zh/blogs/weak-modules/public_sys-resources/icon-tip.gif b/app/zh/blogs/weak-modules/public_sys-resources/icon-tip.gif
new file mode 100644
index 0000000000000000000000000000000000000000..110cd67cefa9f6b2800a2b8076a7a0dcc00b783c
Binary files /dev/null and
b/app/zh/blogs/weak-modules/public_sys-resources/icon-tip.gif differ diff --git a/app/zh/blogs/weak-modules/public_sys-resources/icon-warning.gif b/app/zh/blogs/weak-modules/public_sys-resources/icon-warning.gif new file mode 100644 index 0000000000000000000000000000000000000000..81fb2aba954177efa588e675927082b1f6bed41f Binary files /dev/null and b/app/zh/blogs/weak-modules/public_sys-resources/icon-warning.gif differ