diff --git a/README-en.md b/README-en.md index 511700f2dd7bb94c350914e59c7f600fbd569241..006e468927366fb390b4d1ccde6a81b6ad48c464 100644 --- a/README-en.md +++ b/README-en.md @@ -8,7 +8,7 @@ DOCS contains all documents of the openEuler community, including the release no ### Searching for a Document in DOCS -1. Open the **docs** folder. The folder contains documents in Chinese (**zh** folder) and English (**en** folder). For example, in the **en** folder, the **docs** folder contains different documents, and the **menu** folder displays the list of documents. +1. Open the **docs** folder. The folder contains documents in Chinese (**zh** folder) and English (**en** folder). For example, in the **en** folder, the **docs** folder contains different documents, and the **menu** folder displays the list of documents. 2. Open the **docs** folder. You can see sub-folders, each with relevant documents inside. The following table shows an example of what the sub-folders may contain: | Folder | Content | @@ -25,34 +25,38 @@ DOCS contains all documents of the openEuler community, including the release no | **userGuide** | *openEuler Toolset User Guide* | | **StratoVirt** | *StratoVirt User Guide* | - ### Modifying a Document When the openEuler version information is updated, the documents here also need to be updated. Thank you for providing updates. ### Checking the Relationship Between Versions and Branches + Before performing updates, first make sure the branch you choose is consistent with the version to be updated.
The DOCS contains the following five branches: | Branch | Description | Documentation | |--------|-------------|---------------| | **master** | development branch, which is the default branch || -| **stable2-1.0\_Base** | 1.0 Base version branch | **DOCS** > **1.0 BASE** on the [openEuler community website](https://openeuler.org/) | -| **stable2-20.03\_LTS** | 20.03 LTS version branch | **DOCS** > **20.03 LTS** on the [openEuler community website](https://openeuler.org/) | -| **stable2-20.09** | 20.09 version branch | **DOCS** > **20.09** on the [openEuler community website](https://openeuler.org/) | +| **stable2-1.0\_Base** | 1.0 Base version branch | **Learning** > **Documentation** > **1.0 BASE** on the [openEuler community website](https://openeuler.org/en/) | +| **stable2-20.03\_LTS** | 20.03 LTS version branch | **Learning** > **Documentation** > **20.03 LTS** on the [openEuler community website](https://openeuler.org/en/) | +| **stable2-20.09** | 20.09 version branch | **Learning** > **Documentation** > **20.09** on the [openEuler community website](https://openeuler.org/en/) | +| **stable2-23.03** | 23.03 version branch | **Learning** > **Documentation** > **23.03** on the [openEuler community website](https://openeuler.org/en/) | ### Participation + Create or reply to an issue: You can discuss with us by creating or replying to an issue. Submit a Pull Request (PR): You can participate in the SIG by submitting a PR. Make a comment: You can submit comments on issues or PRs. You can also comment on the document through **Feedback** on the [website document page](https://docs.openeuler.org/en/). Your PRs are welcome.
### Member + #### Maintainers + - Rudy_Tan[@rudy_tan](https://gitee.com/rudy_tan) - amyMaYun[@amy_mayun](https://gitee.com/amy_mayun) - qiaominna[@qiaominna](https://gitee.com/qiaominna) +### Contacting Us -### Contacting Us E-mail: doc@openeuler.org IRC: #openeuler-doc diff --git a/README.md b/README.md index 27bc8483e94c9999b00a8001c900eecff2128503..59280b2b2de9bee630f01a4d4d6b327f7ea88900 100644 --- a/README.md +++ b/README.md @@ -43,9 +43,10 @@ Docs currently uses the following five branches: | Branch | Description | Content Location | |-----|----|----| | master | Development branch (default) || -| stable2-1.0_Base | 1.0 Base version branch | Displayed under "Documentation > 1.0 BASE" on the [openEuler community](https://openeuler.org/) website | -| stable2-20.03_LTS | 20.03 LTS version branch | Displayed under "Documentation > 20.03 LTS" on the [openEuler community](https://openeuler.org/) website | -| stable2-20.09 | 20.09 version branch | Displayed under "Documentation > 20.09" on the [openEuler community](https://openeuler.org/) website | +| stable2-1.0_Base | 1.0 Base version branch | Displayed under "Learning > Documentation > 1.0 BASE" on the [openEuler community](https://openeuler.org/) website | +| stable2-20.03_LTS | 20.03 LTS version branch | Displayed under "Learning > Documentation > 20.03 LTS" on the [openEuler community](https://openeuler.org/) website | +| stable2-20.09 | 20.09 version branch | Displayed under "Learning > Documentation > 20.09" on the [openEuler community](https://openeuler.org/) website | +| stable2-23.03 | 23.03 version branch | Displayed under "Learning > Documentation > 23.03" on the [openEuler community](https://openeuler.org/) website | ### How to Participate in the SIG diff --git a/docs/en/docs/A-Ops/architecture-awareness-service-manual.md b/docs/en/docs/A-Ops/architecture-awareness-service-manual.md deleted file mode 100644 index 7bd4426c511d9ebba88342b8cfe9f323c742fe93..0000000000000000000000000000000000000000 --- a/docs/en/docs/A-Ops/architecture-awareness-service-manual.md +++ /dev/null @@ -1,77 +0,0 @@ -# Architecture Awareness Service Manual - -## Installation - -### Manual Installation - -- Installing using the repo source mounted by Yum. - - Configure the Yum sources **openEuler22.09** and **openEuler22.09:Epol** in the **/etc/yum.repos.d/openEuler.repo** file.
- ```ini - [everything] # openEuler 22.09 officially released repository - name=openEuler22.09 - baseurl=https://repo.openeuler.org/openEuler-22.09/everything/$basearch/ - enabled=1 - gpgcheck=1 - gpgkey=https://repo.openeuler.org/openEuler-22.09/everything/$basearch/RPM-GPG-KEY-openEuler - - [Epol] # openEuler 22.09:Epol officially released repository - name=Epol - baseurl=https://repo.openeuler.org/openEuler-22.09/EPOL/main/$basearch/ - enabled=1 - gpgcheck=1 - gpgkey=https://repo.openeuler.org/openEuler-22.09/OS/$basearch/RPM-GPG-KEY-openEuler - ``` - - Run the following commands to download and install gala-spider and its dependencies. - - ```shell - # A-Ops architecture awareness service, usually installed on the master node - yum install gala-spider - yum install python3-gala-spider - - # A-Ops architecture awareness probe, usually installed on the master node - yum install gala-gopher - ``` - -- Installing using the RPM packages. Download **gala-spider-vx.x.x-x.oe1.aarch64.rpm** and **gala-gopher-vx.x.x-x.oe1.aarch64.rpm**, and then run the following commands to install the modules. (`x.x.x-x` indicates the version. Replace it with the actual version number.) - - ```shell - rpm -ivh gala-spider-vx.x.x-x.oe1.aarch64.rpm - - rpm -ivh gala-gopher-vx.x.x-x.oe1.aarch64.rpm - ``` - - - -### Installing Using the A-Ops Deployment Service - -#### Editing the Task List - -Modify the deployment task list and enable the gala_gopher and gala_spider steps (set `enable` to `true`): - -```yaml ---- -step_list: - ... - gala_gopher: - enable: true - continue: false - gala_spider: - enable: true - continue: false - ... -``` - -#### Editing the Host List - -For details about the host configuration, see section 2.2.3.8 in the [Deployment Management Manual](./deployment-management-manual.md). - -#### Editing the Variable List - -For details about the variable configuration, see section 2.2.3.8 in the [Deployment Management Manual](./deployment-management-manual.md).
- -#### Executing the Deployment Task - -See section 3 in the [Deployment Management Manual](./deployment-management-manual.md) to execute the deployment task. \ No newline at end of file diff --git a/docs/en/docs/A-Ops/configuration-source-tracing-service-manual.md b/docs/en/docs/A-Ops/configuration-source-tracing-service-manual.md deleted file mode 100644 index 5cb306fed3342b6636fcf86b37014fa57f3feba1..0000000000000000000000000000000000000000 --- a/docs/en/docs/A-Ops/configuration-source-tracing-service-manual.md +++ /dev/null @@ -1,167 +0,0 @@ -gala-ragdoll Usage Guide -============================ - -## Installation - -#### Manual Installation - -- Installing using the repo source mounted by Yum. - - Configure the Yum sources **openEuler22.09** and **openEuler22.09:Epol** in the **/etc/yum.repos.d/openEuler.repo** file. - - ```ini - [everything] # openEuler 22.09 officially released repository - name=openEuler22.09 - baseurl=https://repo.openeuler.org/openEuler-22.09/everything/$basearch/ - enabled=1 - gpgcheck=1 - gpgkey=https://repo.openeuler.org/openEuler-22.09/everything/$basearch/RPM-GPG-KEY-openEuler - - [Epol] # openEuler 22.09:Epol officially released repository - name=Epol - baseurl=https://repo.openeuler.org/openEuler-22.09/EPOL/main/$basearch/ - enabled=1 - gpgcheck=1 - gpgkey=https://repo.openeuler.org/openEuler-22.09/OS/$basearch/RPM-GPG-KEY-openEuler - - ``` - - Run the following commands to download and install gala-ragdoll and its dependencies. - - ```shell - yum install gala-ragdoll # A-Ops configuration source tracing service - yum install python3-gala-ragdoll - - yum install gala-spider # A-Ops architecture awareness service - yum install python3-gala-spider - ``` - -- Installing using the RPM packages. Download **gala-ragdoll-vx.x.x-x.oe1.aarch64.rpm**, and then run the following commands to install the modules. (`x.x.x-x` indicates the version. Replace it with the actual version number.)
- - ```shell - rpm -ivh gala-ragdoll-vx.x.x-x.oe1.aarch64.rpm - ``` - - - -#### Installing Using the A-Ops Deployment Service - -##### Editing the Task List - -Modify the deployment task list and enable the gala_ragdoll step (set `enable` to `true`): - -```yaml ---- -step_list: - ... - gala_ragdoll: - enable: true - continue: false - ... -``` - -##### Editing the Host List - -For details about the host configuration, see section 2.2.3.10 in the [Deployment Management Manual](./deployment-management-manual.md). - -##### Editing the Variable List - -For details about the variable configuration, see section 2.2.3.10 in the [Deployment Management Manual](./deployment-management-manual.md). - -##### Executing the Deployment Task - -See section 3 in the [Deployment Management Manual](./deployment-management-manual.md) to execute the deployment task. - - - -### Configuration File Description - -`/etc/yum.repos.d/openEuler.repo` is the configuration file used to specify the Yum source address. The content of the configuration file is as follows: - -```ini -[OS] -name=OS -baseurl=http://repo.openeuler.org/openEuler-20.09/OS/$basearch/ -enabled=1 -gpgcheck=1 -gpgkey=http://repo.openeuler.org/openEuler-20.09/OS/$basearch/RPM-GPG-KEY-openEuler -``` - -### YANG Model Description - -`/etc/yum.repos.d/openEuler.repo` is expressed using the YANG language. For details, see `gala-ragdoll/yang_modules/openEuler-logos-openEuler.repo.yang`. -The following extended fields are added: - -| Extended Field Name | Extended Field Format| Example| - | ------------ | ---------------------- | ----------------------------------------- | - | path | OS_TYPE:CONFIGURATION_FILE_PATH | openEuler:/etc/yum.repos.d/openEuler.repo | - | type | Configuration file type | ini, key-value, json, text, and more | - | spacer | Spacer between a configuration item and its value | " ", "=", ":", and more | - -Reference: the YANG language is specified in RFC 7950: https://datatracker.ietf.org/doc/html/rfc7950/.
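To make the role of the three extended fields concrete, the following is a minimal sketch of how `path`, `type`, and `spacer` could drive parsing of a key-value style file such as **openEuler.repo**. This is a hypothetical illustration only, not the actual gala-ragdoll implementation; the function name and logic are assumptions.

```python
# Hypothetical sketch: use the YANG extended fields (path, type, spacer)
# to split a key-value config file into {section: {key: value}}.
# This is NOT gala-ragdoll code; names and logic are illustrative.

def parse_with_model(text: str, file_type: str, spacer: str) -> dict:
    if file_type not in ("ini", "key-value"):
        raise ValueError(f"unsupported type: {file_type}")
    result, section = {}, "DEFAULT"
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue                      # skip blanks and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]          # new section header
            result[section] = {}
        elif spacer in line:
            key, _, value = line.partition(spacer)
            result.setdefault(section, {})[key.strip()] = value.strip()
    return result

model = {  # values taken from the extended-field table above
    "path": "openEuler:/etc/yum.repos.d/openEuler.repo",
    "type": "ini",
    "spacer": "=",
}
os_type, _, conf_path = model["path"].partition(":")
sample = "[OS]\nname=OS\nenabled=1\n"
print(os_type, conf_path, parse_with_model(sample, model["type"], model["spacer"]))
```

The `path` field is split on the first `:` into the OS type and the file path, while `type` and `spacer` tell the parser how to tokenize each line.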
- -### Creating Domains Using Configuration Source Tracing - -#### Viewing the Configuration File - -gala-ragdoll provides the configuration file for configuration source tracing. - -``` -[root@openeuler-development-1-1drnd ~]# cat /etc/ragdoll/gala-ragdoll.conf -[git] // Defines the current Git information, including the directory and user information of the Git repository. -git_dir = "/home/confTraceTestConf" -user_name = "user" -user_email = "email" - -[collect] // The collect interface provided by A-Ops. -collect_address = "http://192.168.0.0:11111" -collect_api = "/manage/config/collect" - -[ragdoll] -port = 11114 - -``` - -#### Creating the Configuration Domain - - -![](./figures/create_service_domain.png) - - - -#### Adding Managed Nodes to the Configuration Domain - -![](./figures/add_node.png) - - - -#### Adding Configurations to the Configuration Domain - - -![](./figures/add_config.png) - -#### Querying the Expected Configuration - - -![](./figures/view_expected_config.png) - -#### Deleting Configurations - -![](./figures/delete_config.png) - -#### Querying the Actual Configuration - -![](./figures/query_actual_config.png) - - - -#### Verifying the Configuration - - -![](./figures/query_status.png) - - - -#### Configuration Synchronization - -Not currently provided. diff --git a/docs/en/docs/A-Ops/deploying-aops-agent.md b/docs/en/docs/A-Ops/deploying-aops-agent.md deleted file mode 100644 index 6e8445dbe0a64eb479655266c96c19759458ec61..0000000000000000000000000000000000000000 --- a/docs/en/docs/A-Ops/deploying-aops-agent.md +++ /dev/null @@ -1,670 +0,0 @@ - -# Deploying aops-agent -### 1. Environment Requirements - -One host running on openEuler 20.03 or later - -### 2. Configuring the Deployment Environment - -#### 2.1 Disabling the Firewall - -```shell -systemctl stop firewalld -systemctl disable firewalld -systemctl status firewalld -``` - -#### 2.2 Deploying aops-agent - -1.
Run `yum install aops-agent` to install aops-agent based on the Yum source. - -2. Modify the configuration file. Change the value of **ip** in the **[agent]** section to the IP address of the local host. - -``` -vim /etc/aops/agent.conf -``` - - The following uses 192.168.1.47 as an example. - - ```ini - [agent] - ;IP address and port number bound when the aops-agent is started. - ip=192.168.1.47 - port=12000 - - [gopher] - ;Default path of the gala-gopher configuration file. If you need to change the path, ensure that the file path is correct. - config_path=/opt/gala-gopher/gala-gopher.conf - - ;aops-agent log collection configuration - [log] - ;Level of the logs to be collected, which can be set to DEBUG, INFO, WARNING, ERROR, or CRITICAL - log_level=INFO - ;Location for storing collected logs - log_dir=/var/log/aops - ;Maximum size of a log file - max_bytes=31457280 - ;Number of backup logs - backup_count=40 - ``` - -3. Run `systemctl start aops-agent` to start the service. - -#### 2.3 Registering with aops-manager - -To identify callers and prevent APIs from being invoked arbitrarily, aops-agent uses tokens to authenticate requests, which reduces the load on the deployed hosts. - -For security purposes, the token is obtained through active registration. Before registering, prepare the information to be registered on aops-agent and run the `register` command to register it with aops-manager. No database is configured for aops-agent. After the registration succeeds, the token is automatically saved to the specified file and the registration result is displayed on the GUI. The local host information is also saved to the aops-manager database for subsequent management. - -1. Prepare the **register.json** file. - - Prepare the information required for registration on aops-agent and save it in JSON format.
The data structure is as follows: - -```JSON -{ - // Name of the login user - "web_username":"admin", - // User password - "web_password": "changeme", - // Host name - "host_name": "host1", - // Name of the group to which the host belongs - "host_group_name": "group1", - // IP address of the host where aops-manager is running - "manager_ip":"192.168.1.23", - // Whether to register as a management host - "management":false, - // External port for running aops-manager - "manager_port":"11111", - // Port for running aops-agent - "agent_port":"12000" -} -``` - -Note: Ensure that aops-manager is running on the target host, for example, 192.168.1.23, and the registered host group exists. - -2. Run `aops_agent register -f register.json`. -3. The registration result is displayed on the GUI. If the registration is successful, the token character string is saved to a specified file. If the registration fails, locate the fault based on the message and log content (**/var/log/aops/aops.log**). - -The following is an example of the registration result: - -- Registration succeeded. - -```shell -[root@localhost ~]# aops_agent register -f register.json -Agent Register Success -``` - -- Registration failed. The following uses the aops-manager start failure as an example. - -```shell -[root@localhost ~]# aops_agent register -f register.json -Agent Register Fail -[root@localhost ~]# -``` - -- Log content - -```shell -2022-09-05 16:11:52,576 ERROR command_manage/register/331: HTTPConnectionPool(host='192.168.1.23', port=11111): Max retries exceeded with url: /manage/host/add (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused')) -[root@localhost ~]# -``` - -### 3. Plug-in Support - -#### 3.1 gala-gopher - -##### 3.1.1 Introduction - -gala-gopher is a low-load probe framework based on eBPF. It can be used to monitor the CPU, memory, and network status of hosts and collect data. 
You can configure the collection status of existing probes based on service requirements. - -##### 3.1.2 Deployment - -1. Run `yum install gala-gopher` to install gala-gopher based on the Yum source. -2. Enable probes based on service requirements. You can view information about probes in **/opt/gala-gopher/gala-gopher.conf**. -3. Run `systemctl start gala-gopher` to start the gala-gopher service. - -##### 3.1.3 Others - -For more information about gala-gopher, see https://gitee.com/openeuler/gala-gopher/blob/master/README.md. - -### 4. API Support - -#### 4.1 List of External APIs - -| No.| API | Type| Description | -| ---- | ------------------------------ | ---- | ----------------------| -| 1 | /v1/agent/plugin/start | POST | Starts a plug-in. | -| 2 | /v1/agent/plugin/stop | POST | Stops a plug-in. | -| 3 | /v1/agent/application/info | GET | Collects running applications in the target application collection.| -| 4 | /v1/agent/host/info | GET | Obtains host information. | -| 5 | /v1/agent/plugin/info | GET | Obtains the plug-in running information in aops-agent. | -| 6 | /v1/agent/file/collect | POST | Collects content of the configuration file. | -| 7 | /v1/agent/collect/items/change | POST | Changes the running status of plug-in collection items. | - -##### 4.1.1 /v1/agent/plugin/start - -+ Description: Starts the plug-in that is installed but not running. Currently, only the gala-gopher plug-in is supported. 
- -+ HTTP request mode: POST - -+ Data submission mode: query - -+ Request parameter - - | Parameter | Mandatory| Type| Description | - | ----------- | ---- | ---- | ------ | - | plugin_name | True | str | Plug-in name| - -+ Request parameter example - - | Parameter | Value | - | ----------- | ----------- | - | plugin_name | gala-gopher | - -+ Response body parameters - - | Parameter| Type| Description | - | ------ | ---- | ---------------- | - | code | int | Return code | - | msg | str | Information corresponding to the status code| - -+ Response example - - ```json - { - "code": 200, - "msg": "xxxx" - } - ``` - - -##### 4.1.2 /v1/agent/plugin/stop - -+ Description: Stops a running plug-in. Currently, only the gala-gopher plug-in is supported. - -+ HTTP request mode: POST - -+ Data submission mode: query - -+ Request parameter - - | Parameter | Mandatory| Type| Description | - | ----------- | ---- | ---- | ------ | - | plugin_name | True | str | Plug-in name| - -+ Request parameter example - - | Parameter | Value | - | ----------- | ----------- | - | plugin_name | gala-gopher | - -+ Response body parameters - - | Parameter| Type| Description | - | ------ | ---- | ---------------- | - | code | int | Return code | - | msg | str | Information corresponding to the status code| - -+ Response example - - ```json - { - "code": 200, - "msg": "xxxx" - } - ``` - - -##### 4.1.3 /v1/agent/application/info - -+ Description: Collects running applications in the target application collection. Currently, the target application collection contains MySQL, Kubernetes, Hadoop, Nginx, Docker, and gala-gopher. 
- -+ HTTP request mode: GET - -+ Data submission mode: query - -+ Request parameter - - | Parameter| Mandatory| Type| Description| - | ------ | ---- | ---- | ---- | - | | | | | - -+ Request parameter example - - | Parameter| Value| - | ------ | ------ | - | | | - -+ Response body parameters - - | Parameter| Type| Description | - | ------ | ---- | ---------------- | - | code | int | Return code | - | msg | str | Information corresponding to the status code| - | resp | dict | Response body | - - + resp - - | Parameter | Type | Description | - | ------- | --------- | -------------------------- | - | running | List[str] | List of the running applications| - -+ Response example - - ```json - { - "code": 200, - "msg": "xxxx", - "resp": { - "running": [ - "mysql", - "docker" - ] - } - } - ``` - - -##### 4.1.4 /v1/agent/host/info - -+ Description: Obtains information about the host where aops-agent is installed, including the system version, BIOS version, kernel version, CPU information, and memory information. - -+ HTTP request mode: POST - -+ Data submission mode: application/json - -+ Request parameter - - | Parameter | Mandatory| Type | Description | - | --------- | ---- | --------- | ------------------------------------------------ | - | info_type | True | List[str] | List of the information to be collected. 
Currently, only the CPU, disk, memory, and OS are supported.| - -+ Request parameter example - - ```json - ["os", "cpu","memory", "disk"] - ``` - -+ Response body parameters - - | Parameter| Type| Description | - | ------ | ---- | ---------------- | - | code | int | Return code | - | msg | str | Information corresponding to the status code| - | resp | dict | Response body | - - + resp - - | Parameter| Type | Description | - | ------ | ---------- | -------- | - | cpu | dict | CPU information | - | memory | dict | Memory information| - | os | dict | OS information | - | disk | List[dict] | Disk information| - - + cpu - - | Parameter | Type| Description | - | ------------ | ---- | --------------- | - | architecture | str | CPU architecture | - | core_count | int | Number of cores | - | l1d_cache | str | L1 data cache size| - | l1i_cache | str | L1 instruction cache size| - | l2_cache | str | L2 cache size | - | l3_cache | str | L3 cache size | - | model_name | str | Model name | - | vendor_id | str | Vendor ID | - - + memory - - | Parameter| Type | Description | - | ------ | ---------- | -------------- | - | size | str | Total memory | - | total | int | Number of DIMMs | - | info | List[dict] | Information about all DIMMs| - - + info - - | Parameter | Type| Description | - | ------------ | ---- | -------- | - | size | str | Memory size| - | type | str | Type | - | speed | str | Speed | - | manufacturer | str | Vendor | - - + os - - | Parameter | Type| Description | - | ------------ | ---- | -------- | - | bios_version | str | BIOS version| - | os_version | str | OS version| - | kernel | str | Kernel version| - -+ Response example - - ```json - { - "code": 200, - "msg": "operate success", - "resp": { - "cpu": { - "architecture": "aarch64", - "core_count": "128", - "l1d_cache": "8 MiB (128 instances)", - "l1i_cache": "8 MiB (128 instances)", - "l2_cache": "64 MiB (128 instances)", - "l3_cache": "128 MiB (4 instances)", - "model_name": "Kunpeng-920", - "vendor_id": 
"HiSilicon" - }, - "memory": { - "info": [ - { - "manufacturer": "Hynix", - "size": "16 GB", - "speed": "2933 MT/s", - "type": "DDR4" - }, - { - "manufacturer": "Hynix", - "size": "16 GB", - "speed": "2933 MT/s", - "type": "DDR4" - } - ], - "size": "32G", - "total": 2 - }, - "os": { - "bios_version": "1.82", - "kernel": "5.10.0-60.18.0.50", - "os_version": "openEuler 22.03 LTS" - }, - "disk": [ - { - "capacity": "xxGB", - "model": "xxxxxx" - } - ] - } - } - ``` - - -##### 4.1.5 /v1/agent/plugin/info - -+ Description: Obtains the plug-in running status of the host. Currently, only the gala-gopher plug-in is supported. - -+ HTTP request mode: GET - -+ Data submission mode: query - -+ Request parameter - - | Parameter| Mandatory| Type| Description| - | ------ | ---- | ---- | ---- | - | | | | | - -+ Request parameter example - - | Parameter| Value| - | ------ | ------ | - | | | - -+ Response body parameters - - | Parameter| Type | Description | - | ------ | ---------- | ---------------- | - | code | int | Return code | - | msg | str | Information corresponding to the status code| - | resp | List[dict] | Response body | - - + resp - - | Parameter | Type | Description | - | ------------- | ---------- | ------------------ | - | plugin_name | str | Plug-in name | - | collect_items | list | Running status of plug-in collection items| - | is_installed | str | Information corresponding to the status code | - | resource | List[dict] | Plug-in resource usage | - | status | str | Plug-in running status | - - + resource - - | Parameter | Type| Description | - | ------------- | ---- | ---------- | - | name | str | Resource name | - | current_value | str | Resource usage| - | limit_value | str | Resource limit| - -+ Response example - - ``` - { - "code": 200, - "msg": "operate success", - "resp": [ - { - "collect_items": [ - { - "probe_name": "system_tcp", - "probe_status": "off", - "support_auto": false - }, - { - "probe_name": "haproxy", - "probe_status": "auto", - 
"support_auto": true - }, - { - "probe_name": "nginx", - "probe_status": "auto", - "support_auto": true - } - ], - "is_installed": true, - "plugin_name": "gala-gopher", - "resource": [ - { - "current_value": "0.0%", - "limit_value": null, - "name": "cpu" - }, - { - "current_value": "13 MB", - "limit_value": null, - "name": "memory" - } - ], - "status": "active" - } - ] - } - ``` - - -##### 4.1.6 /v1/agent/file/collect - -+ Description: Collects information such as the content, permission, and owner of the target configuration file. Currently, only text files that are smaller than 1 MB, have no execute permission, and are UTF-8 encoded can be read. - -+ HTTP request mode: POST - -+ Data submission mode: application/json - -+ Request parameter - - | Parameter | Mandatory| Type | Description | - | --------------- | ---- | --------- | ------------------------ | - | configfile_path | True | List[str] | List of the full paths of the files to be collected| - -+ Request parameter example - - ```json - [ "/home/test.conf", "/home/test.ini", "/home/test.json"] - ``` - -+ Response body parameters - - | Parameter | Type | Description | - | ------------- | ---------- | ---------------- | - | infos | List[dict] | File collection information | - | success_files | List[str] | List of files successfully collected| - | fail_files | List[str] | List of files that fail to be collected| - - + infos - - | Parameter | Type| Description | - | --------- | ---- | -------- | - | path | str | File path| - | content | str | File content| - | file_attr | dict | File attributes| - - + file_attr - - | Parameter| Type| Description | - | ------ | ---- | ------------ | - | mode | str | File type and permission| - | owner | str | File owner| - | group | str | Group to which the file belongs| - -+ Response example - - ```json - { - "infos": [ - { - "content": "this is a test file", - "file_attr": { - "group": "root", - "mode": "0644", - "owner": "root" - }, - "path": "/home/test.txt" - } - ],
"success_files": [ - "/home/test.txt" - ], - "fail_files": [ - "/home/test.txt" - ] - } - ``` - - -##### 4.1.7 /v1/agent/collect/items/change - -+ Description: Changes the collection status of the plug-in collection items. Currently, only the status of the gala-gopher collection items can be changed. For the gala-gopher collection items, see **/opt/gala-gopher/gala-gopher.conf**. - -+ HTTP request mode: POST - -+ Data submission mode: application/json - -+ Request parameter - - | Parameter | Mandatory| Type| Description | - | ----------- | ---- | ---- | -------------------------- | - | plugin_name | True | dict | Expected modification result of the plug-in collection items| - - + plugin_name - - | Parameter | Mandatory| Type | Description | - | ------------ | ---- | ------ | ------------------ | - | collect_item | True | string | Expected modification result of the collection item| - -+ Request parameter example - - ```json - { - "gala-gopher":{ - "redis":"auto", - "system_inode":"on", - "tcp":"on", - "haproxy":"auto" - } - } - ``` - -+ Response body parameters - - | Parameter| Type | Description | - | ------ | ---------- | ---------------- | - | code | int | Return code | - | msg | str | Information corresponding to the status code| - | resp | List[dict] | Response body | - - + resp - - | Parameter | Type| Description | - | ----------- | ---- | ------------------ | - | plugin_name | dict | Modification result of the corresponding collection item| - - + plugin_name - - | Parameter | Type | Description | - | ------- | --------- | ---------------- | - | success | List[str] | Collection items that are successfully modified| - | failure | List[str] | Collection items that fail to be modified| - -+ Response example - - ```json - { - "code": 200, - "msg": "operate success", - "resp": { - "gala-gopher": { - "failure": [ - "redis" - ], - "success": [ - "system_inode", - "tcp", - "haproxy" - ] - } - } - } - ``` - - - - ### FAQs - -1. 
If an error is reported, view the **/var/log/aops/aops.log** file, rectify the fault based on the error message in the log file, and restart the service. - -2. You are advised to run aops-agent in Python 3.7 or later. Pay attention to the version of the Python dependency library when installing it. - -3. The value of **access_token** can be obtained from the **/etc/aops/agent.conf** file after the registration is complete. - -4. To limit the CPU and memory resources of a plug-in, add **MemoryHigh** and **CPUQuota** to the **Service** section in the service file corresponding to the plug-in. - - For example, set the memory limit of gala-gopher to 40 MB and the CPU limit to 20%. - - ```ini - [Unit] - Description=a-ops gala gopher service - After=network.target - - [Service] - Type=exec - ExecStart=/usr/bin/gala-gopher - Restart=on-failure - RestartSec=1 - RemainAfterExit=yes - ;Limit the maximum memory that can be used by processes in the unit. The limit can be exceeded. However, after the limit is exceeded, the process running speed is limited, and the system reclaims the excess memory as much as possible. - ;The option value can be an absolute memory size in bytes (K, M, G, or T suffix based on 1024) or a relative memory size in percentage. - MemoryHigh=40M - ;Set the CPU time limit for the processes of this unit. The value must be a percentage ending with %, indicating the maximum percentage of the total time that the unit can use a single CPU. - CPUQuota=20% - - [Install] - WantedBy=multi-user.target - ``` - - - - - - diff --git a/docs/en/docs/A-Ops/deploying-aops.md b/docs/en/docs/A-Ops/deploying-aops.md deleted file mode 100644 index f7f2cb8d867ab915bf51bcdf7b002a172936a01e..0000000000000000000000000000000000000000 --- a/docs/en/docs/A-Ops/deploying-aops.md +++ /dev/null @@ -1,460 +0,0 @@ -# Deploying A-Ops - -## 1. 
Environment Requirements - -- Two hosts running on openEuler 22.09 - - These two hosts are used to deploy the two modes of the check module: scheduler and executor. Other services, such as MySQL, Elasticsearch, and aops-manager, can be independently deployed on any host. To facilitate operations, deploy these services on host A. - -- At least 8 GB of memory - -## 2. Configuring the Deployment Environment - -### Host A: - -Deploy the following A-Ops services on host A: aops-tools, aops-manager, aops-check, aops-web, aops-agent, and gala-gopher. - -Deploy the following third-party services on host A: MySQL, Elasticsearch, ZooKeeper, Kafka, and Prometheus. - -The deployment procedure is as follows: - -#### 2.1 Disabling the Firewall - -Disable the firewall on the local host. - -``` -systemctl stop firewalld -systemctl disable firewalld -systemctl status firewalld -``` - -#### 2.2 Deploying aops-tools - -Install aops-tools. - -``` -yum install aops-tools -``` - -#### 2.3 Deploying Databases MySQL and Elasticsearch - -##### 2.3.1 Deploying MySQL - -Use the **aops-basedatabase** script installed during aops-tools installation to install MySQL. - -``` -cd /opt/aops/aops_tools -./aops-basedatabase mysql -``` - -Modify the MySQL configuration file. - -``` -vim /etc/my.cnf -``` - -Add **bind-address** and set it to the IP address of the local host. - -![1662346986112](./figures/修改mysql配置文件.png) - -Restart the MySQL service. - -``` -systemctl restart mysqld -``` - -Connect to the database and set the permission. - -``` -mysql -show databases; -use mysql; -select user,host from user; -- If user is root and host is localhost, the MySQL database can be connected only from the local host and cannot be accessed from the external network or by local software clients. -update user set host = '%' where user='root'; -flush privileges; -- Refresh the privileges.
-exit -``` - -##### 2.3.2 Deploying Elasticsearch - -Use the **aops-basedatabase** script installed during aops-tools installation to install Elasticsearch. - -``` -cd /opt/aops/aops_tools -./aops-basedatabase elasticsearch -``` - -Modify the Elasticsearch configuration file. - -``` -vim /etc/elasticsearch/elasticsearch.yml -``` - -![1662370718890](./figures/elasticsearch配置2.png) - -![1662370575036](./figures/elasticsearch配置1.png) - -![1662370776219](./figures/elasticsearch3.png) - -Restart the Elasticsearch service. - -``` -systemctl restart elasticsearch -``` - -#### 2.4 Deploying aops-manager - -Install aops-manager. - -``` -yum install aops-manager -``` - -Modify the configuration file. - -``` -vim /etc/aops/manager.ini -``` - -Change the IP address of each service in the configuration file to the actual IP address. Because all services are deployed on host A, you need to set their IP addresses to the actual IP address of host A. - -``` -[manager] -ip=192.168.1.1 // Change the service IP address to the actual IP address of host A. -port=11111 -host_vault_dir=/opt/aops -host_vars=/opt/aops/host_vars - -[uwsgi] -wsgi-file=manage.py -daemonize=/var/log/aops/uwsgi/manager.log -http-timeout=600 -harakiri=600 - -[elasticsearch] -ip=192.168.1.1 // Change the service IP address to the actual IP address of host A. -port=9200 -max_es_query_num=10000000 - -[mysql] -ip=192.168.1.1 // Change the service IP address to the actual IP address of host A. -port=3306 -database_name=aops -engine_format=mysql+pymysql://@%s:%s/%s -pool_size=10000 -pool_recycle=7200 - -[aops_check] -ip=192.168.1.1 // Change the service IP address to the actual IP address of host A. -port=11112 -``` - -Start the aops-manager service. - -``` -systemctl start aops-manager -``` - -#### 2.5 Deploying aops-web - -Install aops-web. - -``` -yum install aops-web -``` - -Modify the configuration file.
Because all services are deployed on host A, set the IP address of each service accessed by aops-web to the actual IP address of host A. - -``` -vim /etc/nginx/aops-nginx.conf -``` - -The following figure shows the configuration of some services. - -![1662378186528](./figures/配置web.png) - -Enable the aops-web service. - -``` -systemctl start aops-web -``` - -#### 2.6 Deploying Kafka - -##### 2.6.1 Deploying ZooKeeper - -Install ZooKeeper. - -``` -yum install zookeeper -``` - -Start the ZooKeeper service. - -``` -systemctl start zookeeper -``` - -##### 2.6.2 Deploying Kafka - -Install Kafka. - -``` -yum install kafka -``` - -Modify the configuration file. - -``` -vim /opt/kafka/config/server.properties -``` - -Change the value of **listeners** to the IP address of the local host. - -![1662381371927](./figures/kafka配置.png) - -Start the Kafka service. - -``` -cd /opt/kafka/bin -nohup ./kafka-server-start.sh ../config/server.properties & -tail -f ./nohup.out # Check all the outputs of nohup. If the IP address of host A and the Kafka startup success INFO are displayed, Kafka is started successfully. -``` - -#### 2.7 Deploying aops-check - -Install aops-check. - -``` -yum install aops-check -``` - -Modify the configuration file. - -``` -vim /etc/aops/check.ini -``` - -Change the IP address of each service in the configuration file to the actual IP address. Because all services are deployed on host A, you need to set their IP addresses to the actual IP address of host A. - -``` -[check] -ip=192.168.1.1 // Change the service IP address to the actual IP address of host A. -port=11112 -mode=configurable // Configurable mode, which means aops-check is used as the scheduler in common diagnosis mode. -timing_check=on - -[default_mode] -period=30 -step=30 - -[elasticsearch] -ip=192.168.1.1 // Change the service IP address to the actual IP address of host A. -port=9200 - -[mysql] -ip=192.168.1.1 // Change the service IP address to the actual IP address of host A. 
-port=3306 -database_name=aops -engine_format=mysql+pymysql://@%s:%s/%s -pool_size=10000 -pool_recycle=7200 - -[prometheus] -ip=192.168.1.1 // Change the service IP address to the actual IP address of host A. -port=9090 -query_range_step=15s - -[agent] -default_instance_port=8888 - -[manager] -ip=192.168.1.1 // Change the service IP address to the actual IP address of host A. -port=11111 - -[consumer] -kafka_server_list=192.168.1.1:9092 // Change the service IP address to the actual IP address of host A. -enable_auto_commit=False -auto_offset_reset=earliest -timeout_ms=5 -max_records=3 -task_name=CHECK_TASK -task_group_id=CHECK_TASK_GROUP_ID -result_name=CHECK_RESULT -[producer] -kafka_server_list = 192.168.1.1:9092 // Change the service IP address to the actual IP address of host A. -api_version = 0.11.5 -acks = 1 -retries = 3 -retry_backoff_ms = 100 -task_name=CHECK_TASK -task_group_id=CHECK_TASK_GROUP_ID -``` - -Start the aops-check service in configurable mode. - -``` -systemctl start aops-check -``` - -#### 2.8 Deploying the Client Services - -aops-agent and gala-gopher must be deployed on the client. For details, see the [Deploying aops-agent](deploying-aops-agent.md). - -Note: Before registering a host, you need to add a host group to ensure that the host group to which the host belongs exists. In this example, only host A is deployed and managed. - -#### 2.9 Deploying Prometheus - -Install Prometheus. - -``` -yum install prometheus2 -``` - -Modify the configuration file. - -``` -vim /etc/prometheus/prometheus.yml -``` - -Add the gala-gopher addresses of all clients to the monitoring host of Prometheus. - -![1662377261742](./figures/prometheus配置.png) - -Start the Prometheus service: - -``` -systemctl start prometheus -``` - -#### 2.10 Deploying gala-ragdoll - -The configuration source tracing function of A-Ops depends on gala-ragdoll. Git is used to monitor configuration file changes. - -Install gala-ragdoll. 
- -```shell -yum install gala-ragdoll # A-Ops configuration source tracing -``` - -Modify the configuration file. - -```shell -vim /etc/ragdoll/gala-ragdoll.conf -``` - -Change the IP address in **collect_address** of the **collect** section to the IP address of host A, and change the values of **collect_api** and **collect_port** to the actual API and port number. - -``` -[git] -git_dir = "/home/confTraceTest" -user_name = "user_name" -user_email = "user_email" - -[collect] -collect_address = "http://192.168.1.1" // Change the IP address to the actual IP address of host A. -collect_api = "/manage/config/collect" // Change the API to the actual API for collecting configuration files. -collect_port = 11111 // Change the port number to the actual port number of the service. - -[sync] -sync_address = "http://0.0.0.0" -sync_api = "/demo/syncConf" -sync_port = 11114 - - -[ragdoll] -port = 11114 - -``` - -Start the gala-ragdoll service. - -```shell -systemctl start gala-ragdoll -``` - -### Host B: - -Only aops-check needs to be deployed on host B as the executor. - -#### 2.11 Deploying aops-check - -Install aops-check. - -``` -yum install aops-check -``` - -Modify the configuration file. - -``` -vim /etc/aops/check.ini -``` - -Change the IP address of each service in the configuration file to the actual IP address. Change the IP address of the aops-check service deployed on host B to the IP address of host B. Because other services are deployed on host A, change the IP addresses of those services to the IP address of host A. - -``` -[check] -ip=192.168.1.2 // Change the IP address to the actual IP address of host B. -port=11112 -mode=executor // Executor mode, which means aops-check is used as the executor in normal diagnosis mode. -timing_check=on - -[default_mode] -period=30 -step=30 - -[elasticsearch] -ip=192.168.1.1 // Change the service IP address to the actual IP address of host A. 
-port=9200 - -[mysql] -ip=192.168.1.1 // Change the service IP address to the actual IP address of host A. -port=3306 -database_name=aops -engine_format=mysql+pymysql://@%s:%s/%s -pool_size=10000 -pool_recycle=7200 - -[prometheus] -ip=192.168.1.1 // Change the service IP address to the actual IP address of host A. -port=9090 -query_range_step=15s - -[agent] -default_instance_port=8888 - -[manager] -ip=192.168.1.1 // Change the service IP address to the actual IP address of host A. -port=11111 - -[consumer] -kafka_server_list=192.168.1.1:9092 // Change the service IP address to the actual IP address of host A. -enable_auto_commit=False -auto_offset_reset=earliest -timeout_ms=5 -max_records=3 -task_name=CHECK_TASK -task_group_id=CHECK_TASK_GROUP_ID -result_name=CHECK_RESULT -[producer] -kafka_server_list = 192.168.1.1:9092 // Change the service IP address to the actual IP address of host A. -api_version = 0.11.5 -acks = 1 -retries = 3 -retry_backoff_ms = 100 -task_name=CHECK_TASK -task_group_id=CHECK_TASK_GROUP_ID -``` - -Start the aops-check service in executor mode. - -``` -systemctl start aops-check -``` - - - -The service deployment on the two hosts is complete. 
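With both hosts configured, it can be useful to confirm that every component is actually reachable on its port before using A-Ops. The following Python sketch is not part of A-Ops; the `port_open` and `check_services` helpers and the host/port map are assumptions based on the example addresses used above (192.168.1.1 for host A, 192.168.1.2 for host B).

```python
import socket

# Post-deployment sanity check (illustrative sketch, not part of A-Ops).
# Host/port pairs assume the example addresses used in this guide:
# 192.168.1.1 is host A, 192.168.1.2 is host B.
SERVICES = {
    "aops-manager": ("192.168.1.1", 11111),
    "aops-check (scheduler, host A)": ("192.168.1.1", 11112),
    "aops-check (executor, host B)": ("192.168.1.2", 11112),
    "MySQL": ("192.168.1.1", 3306),
    "Elasticsearch": ("192.168.1.1", 9200),
    "Kafka": ("192.168.1.1", 9092),
    "Prometheus": ("192.168.1.1", 9090),
}

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_services(services=SERVICES):
    """Map each service name to True (reachable) or False (unreachable)."""
    return {name: port_open(host, port) for name, (host, port) in services.items()}
```

After substituting your real addresses, `check_services()` returns a dictionary mapping each service name to `True` or `False`; any `False` entry is a candidate for `systemctl status` and log inspection.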
diff --git a/docs/en/docs/A-Ops/figures/0BFA7C40-D404-4772-9C47-76EAD7D24E69.png b/docs/en/docs/A-Ops/figures/0BFA7C40-D404-4772-9C47-76EAD7D24E69.png deleted file mode 100644 index 910f58dbf8fb13d52826b7c74728f4c28599660f..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/0BFA7C40-D404-4772-9C47-76EAD7D24E69.png and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/1631073636579.png b/docs/en/docs/A-Ops/figures/1631073636579.png deleted file mode 100644 index 5aacc487264ac63fbe5322b4f89fca3ebf9c7cd9..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/1631073636579.png and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/1631073840656.png b/docs/en/docs/A-Ops/figures/1631073840656.png deleted file mode 100644 index 122e391eafe7c0d8d081030a240df90aea260150..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/1631073840656.png and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/1631101736624.png b/docs/en/docs/A-Ops/figures/1631101736624.png deleted file mode 100644 index 74e2f2ded2ea254c66b221e8ac27a0d8bed9362a..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/1631101736624.png and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/1631101865366.png b/docs/en/docs/A-Ops/figures/1631101865366.png deleted file mode 100644 index abfbc280a368b93af1e1165385af3a9cac89391d..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/1631101865366.png and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/1631101982829.png b/docs/en/docs/A-Ops/figures/1631101982829.png deleted file mode 100644 index 0b1c9c7c3676b804dbdf19afbe4f3ec9dbe0627f..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/1631101982829.png and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/1631102019026.png b/docs/en/docs/A-Ops/figures/1631102019026.png deleted file mode 100644 
index 54e8e7d1cffbb28711074e511b08c73f66c1fb75..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/1631102019026.png and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/20210908212726.png b/docs/en/docs/A-Ops/figures/20210908212726.png deleted file mode 100644 index f7d399aecd46605c09fe2d1f50a1a8670cd80432..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/20210908212726.png and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/D466AC8C-2FAF-4797-9A48-F6C346A1EC77.png b/docs/en/docs/A-Ops/figures/D466AC8C-2FAF-4797-9A48-F6C346A1EC77.png deleted file mode 100644 index 4b937ab846017ead71ca8b5a75b8af1f0f28e1ef..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/D466AC8C-2FAF-4797-9A48-F6C346A1EC77.png and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/a-ops_architecture.png b/docs/en/docs/A-Ops/figures/a-ops_architecture.png deleted file mode 100644 index 7a831b183e8cba5da16b9be9d965abe9811ada5b..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/a-ops_architecture.png and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/add_config.png b/docs/en/docs/A-Ops/figures/add_config.png deleted file mode 100644 index 18d71c2e099c19b5d28848eec6a8d11f29ccee27..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/add_config.png and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/add_fault_tree.png b/docs/en/docs/A-Ops/figures/add_fault_tree.png deleted file mode 100644 index 664efd5150fcb96f009ce0eddc3d9ac91b9e622f..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/add_fault_tree.png and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/add_host_group.png b/docs/en/docs/A-Ops/figures/add_host_group.png deleted file mode 100644 index ed4ab3616d418ecf33a006fee3985b8b6d2d965d..0000000000000000000000000000000000000000 Binary files 
a/docs/en/docs/A-Ops/figures/add_host_group.png and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/add_node.png b/docs/en/docs/A-Ops/figures/add_node.png deleted file mode 100644 index d68f5e12a62548f2ec59374bda9ab07f43b8b5cb..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/add_node.png and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/check.PNG b/docs/en/docs/A-Ops/figures/check.PNG deleted file mode 100644 index 2dce821dd43eec6f0d13cd6b2dc1e30653f35489..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/check.PNG and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/create_service_domain.png b/docs/en/docs/A-Ops/figures/create_service_domain.png deleted file mode 100644 index 4f5b8de2d2c4ddb9bfdfba1ac17258a834561e2d..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/create_service_domain.png and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/dashboard.PNG b/docs/en/docs/A-Ops/figures/dashboard.PNG deleted file mode 100644 index 2a4a827191367309aad28a8a6c1835df602bdf72..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/dashboard.PNG and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/decryption.png b/docs/en/docs/A-Ops/figures/decryption.png deleted file mode 100644 index da07cfdf9296e201a82cceb210e651261fe7ecee..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/decryption.png and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/delete_config.png b/docs/en/docs/A-Ops/figures/delete_config.png deleted file mode 100644 index cfea2eb44f7b8aa809404b8b49b4bd2e24172568..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/delete_config.png and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/delete_host_group.png b/docs/en/docs/A-Ops/figures/delete_host_group.png deleted file mode 100644 index 
e4d85f6e3f1a269a483943f5115f54daa3de51de..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/delete_host_group.png and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/delete_hosts.png b/docs/en/docs/A-Ops/figures/delete_hosts.png deleted file mode 100644 index b3da935739369dad1318fe135146755ede13c694..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/delete_hosts.png and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/deploy.PNG b/docs/en/docs/A-Ops/figures/deploy.PNG deleted file mode 100644 index e30dcb0eb05eb4f41202c736863f3e0ff216398d..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/deploy.PNG and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/diag.PNG b/docs/en/docs/A-Ops/figures/diag.PNG deleted file mode 100644 index a67e8515b8313a50b06cb985611ef9c166851811..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/diag.PNG and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/diag_error1.png b/docs/en/docs/A-Ops/figures/diag_error1.png deleted file mode 100644 index 9e5b1139febe9f00156b37f3268269ac30a78737..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/diag_error1.png and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/diag_main_page.png b/docs/en/docs/A-Ops/figures/diag_main_page.png deleted file mode 100644 index b536af938250004bac3053b234bf20bcbf075c9b..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/diag_main_page.png and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/diagnosis.png b/docs/en/docs/A-Ops/figures/diagnosis.png deleted file mode 100644 index 2c85102fe28deaac0a35fde85fd4497994d2c031..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/diagnosis.png and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/domain.PNG b/docs/en/docs/A-Ops/figures/domain.PNG 
deleted file mode 100644 index bad499f96df5934565d36edf2308cec5e4147719..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/domain.PNG and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/domain_config.PNG b/docs/en/docs/A-Ops/figures/domain_config.PNG deleted file mode 100644 index 8995424b35cda75f08881037446b7816a0ca09dc..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/domain_config.PNG and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/execute_diag.png b/docs/en/docs/A-Ops/figures/execute_diag.png deleted file mode 100644 index afb5f7e9fbfb1d1ce46d096a61729766b4940cd3..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/execute_diag.png and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/group.PNG b/docs/en/docs/A-Ops/figures/group.PNG deleted file mode 100644 index 584fd1f7195694a3419482cace2a71fa1cd9a3ec..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/group.PNG and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/host.PNG b/docs/en/docs/A-Ops/figures/host.PNG deleted file mode 100644 index 3c00681a567cf8f1e1baddfb6fdb7b6cf7df43de..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/host.PNG and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/hosts.png b/docs/en/docs/A-Ops/figures/hosts.png deleted file mode 100644 index f4c7b9103baab7748c83392f6120c8f00880860f..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/hosts.png and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/hosts_in_group.png b/docs/en/docs/A-Ops/figures/hosts_in_group.png deleted file mode 100644 index 9f188d207162fa1418a61a10f83ef9c51a512e65..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/hosts_in_group.png and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/query_actual_config.png 
b/docs/en/docs/A-Ops/figures/query_actual_config.png deleted file mode 100644 index d5f6e450fc0e1e246492ca71a6fcd8db572eb469..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/query_actual_config.png and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/query_status.png b/docs/en/docs/A-Ops/figures/query_status.png deleted file mode 100644 index a3d0b3294bf6e0eeec50a2c2f8c5059bdc256376..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/query_status.png and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/spider.PNG b/docs/en/docs/A-Ops/figures/spider.PNG deleted file mode 100644 index 53bad6dd38e36db9cadfdbeda21cbc3ef59eddf7..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/spider.PNG and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/spider_detail.jpg b/docs/en/docs/A-Ops/figures/spider_detail.jpg deleted file mode 100644 index b69636fe2161380be56f37caf7fd904d2e63e302..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/spider_detail.jpg and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/view_expected_config.png b/docs/en/docs/A-Ops/figures/view_expected_config.png deleted file mode 100644 index bbead6a91468d5dee570cfdc66faf9a4ab155d7c..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/view_expected_config.png and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/view_fault_tree.png b/docs/en/docs/A-Ops/figures/view_fault_tree.png deleted file mode 100644 index a566417b18e8bcf19153730904893fc8d827d885..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/view_fault_tree.png and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/view_report.png b/docs/en/docs/A-Ops/figures/view_report.png deleted file mode 100644 index 2029141179302ecef45d34cb0c9dc916b9142e7b..0000000000000000000000000000000000000000 Binary files 
a/docs/en/docs/A-Ops/figures/view_report.png and /dev/null differ diff --git a/docs/en/docs/A-Ops/figures/view_report_list.png b/docs/en/docs/A-Ops/figures/view_report_list.png deleted file mode 100644 index 58307ec6ef4c73b6b0f039b1052e5870629ac2e8..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Ops/figures/view_report_list.png and /dev/null differ diff --git a/docs/en/docs/A-Ops/overview.md b/docs/en/docs/A-Ops/overview.md deleted file mode 100644 index cc9cbb1ef494f11184eaa2b2a1a40ada1f86db02..0000000000000000000000000000000000000000 --- a/docs/en/docs/A-Ops/overview.md +++ /dev/null @@ -1,4 +0,0 @@ -# A-Ops User Guide - -This document describes the A-Ops intelligent O&M framework and how to install and use services such as intelligent location and configuration source tracing, helping you quickly understand and use A-Ops. By using A-Ops, you can reduce the O&M cost of the system cluster, quickly locate system faults, and centrally manage configuration items. - diff --git a/docs/en/docs/A-Ops/using-gala-anteater.md b/docs/en/docs/A-Ops/using-gala-anteater.md deleted file mode 100644 index 92907378c87b4d1b338584bc598db0763aadb5bf..0000000000000000000000000000000000000000 --- a/docs/en/docs/A-Ops/using-gala-anteater.md +++ /dev/null @@ -1,157 +0,0 @@ -# Using gala-anteater - -gala-anteater is an AI-based operating system exception detection platform. It provides functions such as time series data preprocessing, exception detection, and exception reporting. Based on offline pre-training, online model incremental learning and model update, it can be well adapted to multi-dimensional and multi-modal data fault diagnosis. - -This chapter describes how to deploy and use the gala-anteater service. - -#### Installation - -Mount the repo sources. 
- -```basic -[oe-2209] # openEuler 22.09 officially released repository -name=oe2209 -baseurl=http://119.3.219.20:82/openEuler:/22.09/standard_x86_64 -enabled=1 -gpgcheck=0 -priority=1 - -[oe-2209:Epol] # openEuler 22.09: Epol officially released repository -name=oe2209_epol -baseurl=http://119.3.219.20:82/openEuler:/22.09:/Epol/standard_x86_64/ -enabled=1 -gpgcheck=0 -priority=1 -``` - -Install gala-anteater. - -```bash -# yum install gala-anteater -``` - - - -#### Configuration - -> Note: gala-anteater has no configuration file. All of its parameters are passed as command-line options at startup. - -##### Startup Parameters - -| Parameter| Parameter Full Name| Type| Mandatory (Yes/No)| Default Value| Name| Description| -|---|---|---|---|---|---|---| -| -ks | --kafka_server | string | True | | KAFKA_SERVER | IP address of the Kafka server, for example, **localhost / xxx.xxx.xxx.xxx**.| -| -kp | --kafka_port | string | True | | KAFKA_PORT | Port number of the Kafka server, for example, **9092**.| -| -ps | --prometheus_server | string | True | | PROMETHEUS_SERVER | IP address of the Prometheus server, for example, **localhost / xxx.xxx.xxx.xxx**.| -| -pp | --prometheus_port | string | True | | PROMETHEUS_PORT | Port number of the Prometheus server, for example, **9090**.| -| -m | --model | string | False | vae | MODEL | Exception detection model. Currently, two exception detection models are supported: **random_forest** and **vae**.
**random_forest**: random forest model, which does not support online learning.
**vae**: Variational Autoencoder (VAE), which is an unsupervised model and supports model update based on historical data during the first startup.| -| -d | --duration | int | False | 1 | DURATION | Frequency of executing the exception detection model. The unit is minute, which means that the detection is performed every *x* minutes.| -| -r | --retrain | bool | False | False | RETRAIN | Whether to use historical data to update and iterate the model during startup. Currently, only the VAE model is supported.| -| -l | --look_back | int | False | 4 | LOOK_BACK | Number of days of historical data used to update the model (the last *x* days).| -| -t | --threshold | float | False | 0.8 | THRESHOLD | Threshold of the exception detection model, ranging from 0 to 1. A larger value can reduce the false positive rate of the model. It is recommended that the value be greater than or equal to 0.5.| -| -sli | --sli_time | int | False | 400 | SLI_TIME | Threshold of the application performance metric (SLI), in ms. A larger value can reduce the false positive rate of the model. It is recommended that the value be greater than or equal to 200.
For scenarios with a high false positive rate, it is recommended that the value be greater than 1000.| - - - -#### Start - -Start gala-anteater. - -> Note: gala-anteater can be started and run in command line mode, but cannot be started and run in systemd mode. - -- Running in online training mode (recommended) -```bash -gala-anteater -ks {ip} -kp {port} -ps {ip} -pp {port} -m vae -r True -l 7 -t 0.6 -sli 400 -``` - -- Running in common mode -```bash -gala-anteater -ks {ip} -kp {port} -ps {ip} -pp {port} -m vae -t 0.6 -sli 400 -``` - -Query the gala-anteater service status. - -If the following information is displayed, the service is started successfully. The startup log is saved to the **logs/anteater.log** file in the current running directory. - -```log -2022-09-01 17:52:54,435 - root - INFO - Run gala_anteater main function... -2022-09-01 17:52:54,436 - root - INFO - Start to try updating global configurations by querying data from Kafka! -2022-09-01 17:52:54,994 - root - INFO - Loads metric and operators from file: xxx\metrics.csv -2022-09-01 17:52:54,997 - root - INFO - Loads metric and operators from file: xxx\metrics.csv -2022-09-01 17:52:54,998 - root - INFO - Start to re-train the model based on last day metrics dataset! -2022-09-01 17:52:54,998 - root - INFO - Get training data during 2022-08-31 17:52:00+08:00 to 2022-09-01 17:52:00+08:00! -2022-09-01 17:53:06,994 - root - INFO - Spends: 11.995422840118408 seconds to get unique machine_ids! -2022-09-01 17:53:06,995 - root - INFO - The number of unique machine ids is: 1! -2022-09-01 17:53:06,996 - root - INFO - Fetch metric values from machine: xxxx. -2022-09-01 17:53:38,385 - root - INFO - Spends: 31.3896164894104 seconds to get get all metric values! -2022-09-01 17:53:38,392 - root - INFO - The shape of training data: (17281, 136) -2022-09-01 17:53:38,444 - root - INFO - Start to execute vae model training... 
-2022-09-01 17:53:38,456 - root - INFO - Using cpu device -2022-09-01 17:53:38,658 - root - INFO - Epoch(s): 0 train Loss: 136.68 validate Loss: 117.00 -2022-09-01 17:53:38,852 - root - INFO - Epoch(s): 1 train Loss: 113.73 validate Loss: 110.05 -2022-09-01 17:53:39,044 - root - INFO - Epoch(s): 2 train Loss: 110.60 validate Loss: 108.76 -2022-09-01 17:53:39,235 - root - INFO - Epoch(s): 3 train Loss: 109.39 validate Loss: 106.93 -2022-09-01 17:53:39,419 - root - INFO - Epoch(s): 4 train Loss: 106.48 validate Loss: 103.37 -... -2022-09-01 17:53:57,744 - root - INFO - Epoch(s): 98 train Loss: 97.63 validate Loss: 96.76 -2022-09-01 17:53:57,945 - root - INFO - Epoch(s): 99 train Loss: 97.75 validate Loss: 96.58 -2022-09-01 17:53:57,969 - root - INFO - Schedule recurrent job with time interval 1 minute(s). -2022-09-01 17:53:57,973 - apscheduler.scheduler - INFO - Adding job tentatively -- it will be properly scheduled when the scheduler starts -2022-09-01 17:53:57,974 - apscheduler.scheduler - INFO - Added job "partial" to job store "default" -2022-09-01 17:53:57,974 - apscheduler.scheduler - INFO - Scheduler started -2022-09-01 17:53:57,975 - apscheduler.scheduler - DEBUG - Looking for jobs to run -2022-09-01 17:53:57,975 - apscheduler.scheduler - DEBUG - Next wakeup is due at 2022-09-01 17:54:57.973533+08:00 (in 59.998006 seconds) -``` - - - -#### Output Data - -If gala-anteater detects an exception, it sends the result to Kafka. 
The output data format is as follows: - -```json -{ - "Timestamp":1659075600000, - "Attributes":{ - "entity_id":"xxxxxx_sli_1513_18", - "event_id":"1659075600000_1fd37742xxxx_sli_1513_18", - "event_type":"app" - }, - "Resource":{ - "anomaly_score":1.0, - "anomaly_count":13, - "total_count":13, - "duration":60, - "anomaly_ratio":1.0, - "metric_label":{ - "machine_id":"1fd37742xxxx", - "tgid":"1513", - "conn_fd":"18" - }, - "recommend_metrics":{ - "gala_gopher_tcp_link_notack_bytes":{ - "label":{ - "__name__":"gala_gopher_tcp_link_notack_bytes", - "client_ip":"x.x.x.165", - "client_port":"51352", - "hostname":"localhost.localdomain", - "instance":"x.x.x.172:8888", - "job":"prometheus-x.x.x.172", - "machine_id":"xxxxxx", - "protocol":"2", - "role":"0", - "server_ip":"x.x.x.172", - "server_port":"8888", - "tgid":"3381701" - }, - "score":0.24421279500639545 - }, - ... - }, - "metrics":"gala_gopher_ksliprobe_recent_rtt_nsec" - }, - "SeverityText":"WARN", - "SeverityNumber":14, - "Body":"TimeStamp, WARN, APP may be impacting sli performance issues." -} -``` diff --git a/docs/en/docs/A-Ops/using-gala-gopher.md b/docs/en/docs/A-Ops/using-gala-gopher.md deleted file mode 100644 index 6277655b7051d11c1254fbf98b63c5285e6d2846..0000000000000000000000000000000000000000 --- a/docs/en/docs/A-Ops/using-gala-gopher.md +++ /dev/null @@ -1,228 +0,0 @@ -# Using gala-gopher - -As a data collection module, gala-gopher provides OS-level monitoring capabilities, supports dynamic probe installation and uninstallation, and integrates third-party probes in a non-intrusive manner to quickly expand the monitoring scope. - -This chapter describes how to deploy and use the gala-gopher service. - -#### Installation - -Mount the repo sources. 
- -```basic -[oe-2209] # openEuler 22.09 officially released repository -name=oe2209 -baseurl=http://119.3.219.20:82/openEuler:/22.09/standard_x86_64 -enabled=1 -gpgcheck=0 -priority=1 - -[oe-2209:Epol] # openEuler 22.09: Epol officially released repository -name=oe2209_epol -baseurl=http://119.3.219.20:82/openEuler:/22.09:/Epol/standard_x86_64/ -enabled=1 -gpgcheck=0 -priority=1 -``` - -Install gala-gopher. - -```bash -# yum install gala-gopher -``` - - - -#### Configuration - -##### Configuration Description - -The configuration file of gala-gopher is **/opt/gala-gopher/gala-gopher.conf**. The configuration items in the file are described as follows (the parts that do not need to be manually configured are not described): - -The following configurations can be modified as required: - -- `global`: gala-gopher global configuration information. - - `log_directory`: directory for gala-gopher log files. - - `pin_path`: path for storing the map shared by the eBPF probe. You are advised to retain the default value. -- `metric`: metric output mode. - - `out_channel`: metric output channel. The value can be `web_server` or `kafka`. If this parameter is left empty, the output channel is disabled. - - `kafka_topic`: topic configuration information if the output channel is Kafka. -- `event`: output mode of abnormal events. - - `out_channel`: event output channel. The value can be `logs` or `kafka`. If this parameter is left empty, the output channel is disabled. - - `kafka_topic`: topic configuration information if the output channel is Kafka. -- `meta`: metadata output mode. - - `out_channel`: metadata output channel. The value can be `logs` or `kafka`. If this parameter is left empty, the output channel is disabled. - - `kafka_topic`: topic configuration information if the output channel is Kafka. -- `imdb`: cache specification configuration. - - `max_tables_num`: maximum number of cache tables. In the **/opt/gala-gopher/meta** directory, each meta file corresponds to a table.
- - `max_records_num`: maximum number of records in each cache table. Generally, each probe generates at least one observation record in an observation period. - - `max_metrics_num`: maximum number of metrics contained in each observation record. - - `record_timeout`: aging time of the cache table. If a record in the cache table is not updated within the aging time, the record is deleted. The unit is second. -- `web_server`: configuration of the web_server output channel. - - `port`: listening port. -- `kafka`: configuration of the Kafka output channel. - - `kafka_broker`: IP address and port number of the Kafka server. -- `logs`: configuration of the logs output channel. - - `metric_dir`: path for storing metric data logs. - - `event_dir`: path for storing abnormal event data logs. - - `meta_dir`: metadata log path. - - `debug_dir`: path of gala-gopher run logs. -- `probes`: native probe configuration. - - `name`: probe name, which must be the same as the native probe name. For example, the name of the **example.probe** probe is **example**. - - `param`: probe startup parameters. For details about the supported parameters, see [Startup Parameters](#startup-parameters). - - `switch`: whether to start a probe. The value can be `on` or `off`. -- `extend_probes`: third-party probe configuration. - - `name`: probe name. - - `command`: command for starting a probe. - - `param`: probe startup parameters. For details about the supported parameters, see [Startup Parameters](#startup-parameters). - - `start_check`: If `switch` is set to `auto`, the system determines whether to start the probe based on the execution result of `start_check`. - - `switch`: whether to start a probe. The value can be `on`, `off`, or `auto`. The value `auto` determines whether to start the probe based on the result of `start_check`. 
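 The `probes` and `extend_probes` blocks described above use a libconfig-style syntax. As a minimal sketch of how a deployment script might toggle a probe's `switch` value before restarting the service, consider the following (the probe name, configuration snippet, and helper function are invented for this illustration and are not part of gala-gopher itself):

```python
import re

def set_probe_switch(conf_text, probe, state):
    """Set the `switch` value of the probe block named `probe`
    in a libconfig-style gala-gopher.conf snippet (sketch only)."""
    pattern = re.compile(
        r'(name\s*=\s*"%s";.*?switch\s*=\s*")(?:on|off|auto)(")' % re.escape(probe),
        re.DOTALL,
    )
    # \g<1> and \g<2> restore the text surrounding the replaced state.
    return pattern.sub(r'\g<1>%s\g<2>' % state, conf_text)

snippet = '''
probes =
(
    {
        name = "example";
        param = "";
        switch = "off";
    },
);
'''

updated = set_probe_switch(snippet, "example", "on")
print('switch = "on"' in updated)
```

gala-gopher parses this file natively; the helper only illustrates the structure a configuration-management tool would have to respect (one block per probe, matched by `name`).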
- -##### Startup Parameters - -| Parameter| Description | -| ------ | ------------------------------------------------------------ | -| -l | Whether to enable the function of reporting abnormal events. | -| -t | Sampling period, in seconds. By default, the probe reports data every 5 seconds. | -| -T | Delay threshold, in ms. The default value is **0**. | -| -J | Jitter threshold, in ms. The default value is **0**. | -| -O | Offline time threshold, in ms. The default value is **0**. | -| -D | Packet loss threshold. The default value is **0**. | -| -F | If this parameter is set to `task`, data is filtered by **task_whitelist.conf**. If this parameter is set to the PID of a process, only the process is monitored.| -| -P | Range of probe programs loaded to each probe. Currently, the tcpprobe and taskprobe probes are involved.| -| -U | Resource usage threshold (upper limit). The default value is **0** (%). | -| -L | Resource usage threshold (lower limit). The default value is **0** (%). | -| -c | Whether the probe (TCP) identifies `client_port`. The default value is **0** (no). | -| -N | Name of the observation process of the specified probe (ksliprobe). The default value is **NULL**. | -| -p | Binary file path of the process to be observed, for example, `nginx_probe`. You can run `-p /user/local/sbin/nginx` to specify the Nginx file path. The default value is **NULL**.| -| -w | Filtering scope of monitored applications, for example, `-w /opt/gala-gopher/task_whitelist.conf`. You can write the names of the applications to be monitored to the **task_whitelist.conf** file. The default value is **NULL**, indicating that the applications are not filtered.| -| -n | NIC to mount tc eBPF. The default value is **NULL**, indicating that all NICs are mounted. Example: `-n eth0`| - -##### Configuration File Example - -- Select the data output channels. 
- - ```yaml - metric = - { - out_channel = "web_server"; - kafka_topic = "gala_gopher"; - }; - - event = - { - out_channel = "kafka"; - kafka_topic = "gala_gopher_event"; - }; - - meta = - { - out_channel = "kafka"; - kafka_topic = "gala_gopher_metadata"; - }; - ``` - -- Configure Kafka and Web Server. - - ```yaml - web_server = - { - port = 8888; - }; - - kafka = - { - kafka_broker = ":9092"; - }; - ``` - -- Select the probe to be enabled. The following is an example. - - ```yaml - probes = - ( - { - name = "system_infos"; - param = "-t 5 -w /opt/gala-gopher/task_whitelist.conf -l warn -U 80"; - switch = "on"; - }, - ); - extend_probes = - ( - { - name = "tcp"; - command = "/opt/gala-gopher/extend_probes/tcpprobe"; - param = "-l warn -c 1 -P 7"; - switch = "on"; - } - ); - ``` - - - -#### Start - -After the configuration is complete, start gala-gopher. - -```bash -# systemctl start gala-gopher.service -``` - -Query the status of the gala-gopher service. - -```bash -# systemctl status gala-gopher.service -``` - -If the following information is displayed, the service is started successfully: Check whether the enabled probe is started. If the probe thread does not exist, check the configuration file and gala-gopher run log file. - -![gala-gopher成功启动状态](./figures/gala-gopher成功启动状态.png) - -> Note: The root permission is required for deploying and running gala-gopher. - - - -#### How to Use - -##### Deployment of External Dependent Software - -![gopher软件架构图](./figures/gopher软件架构图.png) - -As shown in the preceding figure, the green parts are external dependent components of gala-gopher. gala-gopher outputs metric data to Prometheus, metadata and abnormal events to Kafka. gala-anteater and gala-spider in gray rectangles obtain data from Prometheus and Kafka. - -> Note: Obtain the installation packages of Kafka and Prometheus from the official websites. - - - -##### Output Data - -- **Metric** - - Prometheus Server has a built-in Express Browser UI. 
You can use PromQL statements to query metric data. For details, see [Using the expression browser](https://prometheus.io/docs/prometheus/latest/getting_started/#using-the-expression-browser) in the official document. The following is an example. - - If the specified metric is `gala_gopher_tcp_link_rcv_rtt`, the metric data displayed on the UI is as follows: - - ```basic - gala_gopher_tcp_link_rcv_rtt{client_ip="x.x.x.165",client_port="1234",hostname="openEuler",instance="x.x.x.172:8888",job="prometheus",machine_id="1fd3774xx",protocol="2",role="0",server_ip="x.x.x.172",server_port="3742",tgid="1516"} 1 - ``` - -- **Metadata** - - You can directly consume data from the Kafka topic `gala_gopher_metadata`. The following is an example. - - ```bash - # Input request - ./bin/kafka-console-consumer.sh --bootstrap-server x.x.x.165:9092 --topic gala_gopher_metadata - # Output data - {"timestamp": 1655888408000, "meta_name": "thread", "entity_name": "thread", "version": "1.0.0", "keys": ["machine_id", "pid"], "labels": ["hostname", "tgid", "comm", "major", "minor"], "metrics": ["fork_count", "task_io_wait_time_us", "task_io_count", "task_io_time_us", "task_hang_count"]} - ``` - -- **Abnormal events** - - You can directly consume data from the Kafka topic `gala_gopher_event`. The following is an example. 
- - ```bash - # Input request - ./bin/kafka-console-consumer.sh --bootstrap-server x.x.x.165:9092 --topic gala_gopher_event - # Output data - {"timestamp": 1655888408000, "meta_name": "thread", "entity_name": "thread", "version": "1.0.0", "keys": ["machine_id", "pid"], "labels": ["hostname", "tgid", "comm", "major", "minor"], "metrics": ["fork_count", "task_io_wait_time_us", "task_io_count", "task_io_time_us", "task_hang_count"]} - ``` diff --git a/docs/en/docs/A-Ops/using-gala-spider.md b/docs/en/docs/A-Ops/using-gala-spider.md deleted file mode 100644 index 43bcc6e022528a32145b2ab95989a97f1f238d03..0000000000000000000000000000000000000000 --- a/docs/en/docs/A-Ops/using-gala-spider.md +++ /dev/null @@ -1,541 +0,0 @@ -# Using gala-spider - -This chapter describes how to deploy and use gala-spider and gala-inference. - -## gala-spider - -gala-spider provides the OS-level topology drawing function. It periodically obtains the data of all observed objects collected by gala-gopher (an OS-level data collection software) at a certain time point and calculates the topology relationship between them. The generated topology is saved to the graph database ArangoDB. - -### Installation - -Mount the Yum sources. - -```basic -[oe-2209] # openEuler 22.09 officially released repository -name=oe2209 -baseurl=http://119.3.219.20:82/openEuler:/22.09/standard_x86_64 -enabled=1 -gpgcheck=0 -priority=1 - -[oe-2209:Epol] # openEuler 22.09: Epol officially released repository -name=oe2209_epol -baseurl=http://119.3.219.20:82/openEuler:/22.09:/Epol/standard_x86_64/ -enabled=1 -gpgcheck=0 -priority=1 -``` - -Install gala-spider. - -```sh -# yum install gala-spider -``` - - - -### Configuration - -#### Configuration File Description - -The configuration file of gala-spider is **/etc/gala-spider/gala-spider.yaml**. The configuration items in this file are described as follows: - -- `global`: global configuration information. - - `data_source`: database for collecting observation metrics. 
Currently, only `prometheus` is supported. - - `data_agent`: agent for collecting observation metrics. Currently, only `gala_gopher` is supported. -- `spider`: - - `log_conf`: log configuration information. - - `log_path`: log file path. - - `log_level`: level of the logs to be printed. The value can be `DEBUG`, `INFO`, `WARNING`, `ERROR`, or `CRITICAL`. - - `max_size`: log file size, in MB. - - `backup_count`: number of backup log files. -- `storage`: configuration information about the topology storage service. - - `period`: storage period, in seconds, indicating the interval for storing the topology. - - `database`: graph database for storage. Currently, only `arangodb` is supported. - - `db_conf`: configuration information of the graph database. - - `url`: IP address of the graph database server. - - `db_name`: name of the database where the topology is stored. -- `kafka`: Kafka configuration information. - - `server`: Kafka server address. - - `metadata_topic`: topic name of the observed metadata messages. - - `metadata_group_id`: consumer group ID of the observed metadata messages. -- `prometheus`: Prometheus database configuration information. - - `base_url`: IP address of the Prometheus server. - - `instant_api`: API for collecting data at a single time point. - - `range_api`: API for collecting data in a time range. - - `step`: collection time step, which is configured for `range_api`. 
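 The Prometheus settings above map directly onto Prometheus's HTTP API. The following sketch shows the query URLs implied by this configuration (the server address, metric name, and timestamps are assumed example values; gala-spider issues equivalent queries internally):

```python
from urllib.parse import urlencode

# Values taken from the configuration keys described above; the
# localhost address is an assumption for this example.
conf = {
    "base_url": "http://localhost:9090/",
    "instant_api": "/api/v1/query",
    "range_api": "/api/v1/query_range",
    "step": 1,
}

def instant_url(metric, ts):
    """URL for a single-time-point query (instant_api)."""
    q = urlencode({"query": metric, "time": ts})
    return conf["base_url"].rstrip("/") + conf["instant_api"] + "?" + q

def range_url(metric, start, end):
    """URL for a time-range query (range_api), sampled every `step` seconds."""
    q = urlencode({"query": metric, "start": start, "end": end,
                   "step": conf["step"]})
    return conf["base_url"].rstrip("/") + conf["range_api"] + "?" + q

print(instant_url("up", 1661853360))
```

Per the descriptions above, `instant_api` collects data at a single time point, while `range_api` walks a time range in `step`-second increments.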
- -#### Configuration File Example - -```yaml -global: - data_source: "prometheus" - data_agent: "gala_gopher" - -prometheus: - base_url: "http://localhost:9090/" - instant_api: "/api/v1/query" - range_api: "/api/v1/query_range" - step: 1 - -spider: - log_conf: - log_path: "/var/log/gala-spider/spider.log" - # log level: DEBUG/INFO/WARNING/ERROR/CRITICAL - log_level: INFO - # unit: MB - max_size: 10 - backup_count: 10 - -storage: - # unit: second - period: 60 - database: arangodb - db_conf: - url: "http://localhost:8529" - db_name: "spider" - -kafka: - server: "localhost:9092" - metadata_topic: "gala_gopher_metadata" - metadata_group_id: "metadata-spider" -``` - - - -### Start - -- Run the following command to start gala-spider. - - ```sh - # spider-storage - ``` - -- Use the systemd service to start gala-spider. - - ```sh - # systemctl start gala-spider - ``` - - - -### How to Use - -##### Deployment of External Dependent Software - -The running of gala-spider depends on multiple external software for interaction. Therefore, before starting gala-spider, you need to deploy the software on which gala-spider depends. The following figure shows the software dependency of gala-spider. - -![gala-spider软件架构图](./figures/gala-spider软件架构图.png) - -The dotted box on the right indicates the two functional components of gala-spider. The green parts indicate the external components that gala-spider directly depends on, and the gray rectangles indicate the external components that gala-spider indirectly depends on. - -- **spider-storage**: core component of gala-spider, which provides the topology storage function. - 1. Obtains the metadata of the observation object from Kafka. - 2. Obtains information about all observation object instances from Prometheus. - 3. Saves the generated topology to the graph database ArangoDB. -- **gala-inference**: core component of gala-spider, which provides the root cause locating function. 
It subscribes to abnormal KPI events from Kafka to trigger the root cause locating process of abnormal KPIs, constructs a fault propagation graph based on the topology obtained from ArangoDB, and outputs the root cause locating result to Kafka. -- **prometheus**: time series database. The observation metric data collected by the gala-gopher component is reported to Prometheus for further processing. -- **kafka**: messaging middleware, which is used to store the observation object metadata reported by gala-gopher, exception events reported by the exception detection component gala-anteater, and root cause locating results reported by the cause-inference component. -- **arangodb**: graph database, which is used to store the topology generated by spider-storage. -- **gala-gopher**: data collection component. It must be deployed in advance. -- **arangodb-ui**: UI provided by ArangoDB, which can be used to query topologies. - -The two functional components in gala-spider are released as independent software packages: - -- **spider-storage**: corresponds to the gala-spider software package in this section. - -- **gala-inference**: corresponds to the gala-inference software package. - -For details about how to deploy the gala-gopher software, see [Using gala-gopher](using-gala-gopher.md). This section only describes how to deploy ArangoDB. - -The current ArangoDB version is 3.8.7, which has the following requirements on the operating environment: - -- Only the x86 architecture is supported. -- GCC 10 or later is required. - -For details about ArangoDB deployment, see [Deployment](https://www.arangodb.com/docs/3.9/deployment.html) in the ArangoDB official documentation. - -The RPM-based ArangoDB deployment process is as follows: - -1. Configure the Yum sources. 
- - ```basic - [oe-2209] # openEuler 22.09 officially released repository - name=oe2209 - baseurl=http://119.3.219.20:82/openEuler:/22.09/standard_x86_64 - enabled=1 - gpgcheck=0 - priority=1 - - [oe-2209:Epol] # openEuler 22.09: Epol officially released repository - name=oe2209_epol - baseurl=http://119.3.219.20:82/openEuler:/22.09:/Epol/standard_x86_64/ - enabled=1 - gpgcheck=0 - priority=1 - ``` - -2. Install arangodb3. - - ```sh - # yum install arangodb3 - ``` - -3. Modify the configurations. - - The configuration file of the arangodb3 server is **/etc/arangodb3/arangod.conf**. You need to modify the following configurations: - - - `endpoint`: IP address of the arangodb3 server. - - `authentication`: whether identity authentication is required for accessing the arangodb3 server. Currently, gala-spider does not support identity authentication. Therefore, set `authentication` to `false`. - - The following is an example. - - ```yaml - [server] - endpoint = tcp://0.0.0.0:8529 - authentication = false - ``` - -4. Start arangodb3. - - ```sh - # systemctl start arangodb3 - ``` - -##### Modifying gala-spider Configuration Items - -After the dependent software is started, you need to modify some configuration items in the gala-spider configuration file. The following is an example. - -Configure the Kafka server address. - -```yaml -kafka: - server: "localhost:9092" -``` - -Configure the Prometheus server address. - -```yaml -prometheus: - base_url: "http://localhost:9090/" -``` - -Configure the IP address of the ArangoDB server. - -```yaml -storage: - db_conf: - url: "http://localhost:8529" -``` - -##### Starting the Service - -Run `systemctl start gala-spider` to start the service. Run `systemctl status gala-spider` to check the startup status. 
If the following information is displayed, the startup is successful: - -```sh -[root@openEuler ~]# systemctl status gala-spider -● gala-spider.service - a-ops gala spider service - Loaded: loaded (/usr/lib/systemd/system/gala-spider.service; enabled; vendor preset: disabled) - Active: active (running) since Tue 2022-08-30 17:28:38 CST; 1 day 22h ago - Main PID: 2263793 (spider-storage) - Tasks: 3 (limit: 98900) - Memory: 44.2M - CGroup: /system.slice/gala-spider.service - └─2263793 /usr/bin/python3 /usr/bin/spider-storage -``` - -##### Output Example - -You can query the topology generated by gala-spider on the UI provided by ArangoDB. The procedure is as follows: - -1. Enter the IP address of the ArangoDB server in the address box of the browser, for example, **http://localhost:8529**. The ArangoDB UI is displayed. - -2. Click **DB** in the upper right corner of the page to switch to the spider database. - -3. On the **COLLECTIONS** page, you can view the collections of observation object instances and topology relationships stored in different time segments, as shown in the following figure. - - ![spider拓扑关系图](./figures/spider拓扑关系图.png) - -4. You can query the stored topology using the AQL statements provided by ArangoDB. For details, see the [AQL Documentation](https://www.arangodb.com/docs/3.8/aql/). - - - -## gala-inference - -gala-inference provides the capability of locating root causes of abnormal KPIs. It uses the exception detection result and topology as the input and outputs the root cause locating result to Kafka. The gala-inference component is archived in the gala-spider project. - -### Installation - -Mount the Yum sources. 
- -```basic -[oe-2209] # openEuler 22.09 officially released repository -name=oe2209 -baseurl=http://119.3.219.20:82/openEuler:/22.09/standard_x86_64 -enabled=1 -gpgcheck=0 -priority=1 - -[oe-2209:Epol] # openEuler 22.09: Epol officially released repository -name=oe2209_epol -baseurl=http://119.3.219.20:82/openEuler:/22.09:/Epol/standard_x86_64/ -enabled=1 -gpgcheck=0 -priority=1 -``` - -Install gala-inference. - -```sh -# yum install gala-inference -``` - - - -### Configuration - -#### Configuration File Description - -The configuration items in the gala-inference configuration file **/etc/gala-inference/gala-inference.yaml** are described as follows: - -- `inference`: configuration information about the root cause locating algorithm. - - `tolerated_bias`: tolerable time offset for querying the topology at the exception time point, in seconds. - - `topo_depth`: maximum depth for topology query. - - `root_topk`: top *K* root cause metrics generated in the root cause locating result. - - `infer_policy`: root cause derivation policy, which can be `dfs` or `rw`. - - `sample_duration`: sampling period of historical metric data, in seconds. - - `evt_valid_duration`: valid period of abnormal system metric events during root cause locating, in seconds. - - `evt_aging_duration`: aging period of abnormal metric events during root cause locating, in seconds. -- `kafka`: Kafka configuration information. - - `server`: IP address of the Kafka server. - - `metadata_topic`: configuration information about the observed metadata messages. - - `topic_id`: topic name of the observed metadata messages. - - `group_id`: consumer group ID of the observed metadata messages. - - `abnormal_kpi_topic`: configuration information about abnormal KPI event messages. - - `topic_id`: topic name of the abnormal KPI event messages. - - `group_id`: consumer group ID of the abnormal KPI event messages. - - `abnormal_metric_topic`: configuration information about abnormal metric event messages. 
- - `topic_id`: topic name of the abnormal metric event messages. - - `group_id`: consumer group ID of the abnormal system metric event messages. - - `consumer_to`: timeout interval for consuming abnormal system metric event messages, in seconds. - - `inference_topic`: configuration information about the output event messages of the root cause locating result. - - `topic_id`: topic name of the output event messages of the root cause locating result. -- `arangodb`: configuration information about the ArangoDB graph database, which is used to query sub-topologies required for root cause locating. - - `url`: IP address of the graph database server. - - `db_name`: name of the database where the topology is stored. -- `log_conf`: log configuration information. - - `log_path`: log file path. - - `log_level`: level of the logs to be printed. The value can be `DEBUG`, `INFO`, `WARNING`, `ERROR`, or `CRITICAL`. - - `max_size`: log file size, in MB. - - `backup_count`: number of backup log files. -- `prometheus`: Prometheus database configuration information, which is used to obtain historical time series data of metrics. - - `base_url`: IP address of the Prometheus server. - - `range_api`: API for collecting data in a time range. - - `step`: collection time step, which is configured for `range_api`. - -#### Configuration File Example - -```yaml -inference: - # Tolerable time offset for querying the topology at the exception time point, in seconds. - tolerated_bias: 120 - topo_depth: 10 - root_topk: 3 - infer_policy: "dfs" - # Unit: second - sample_duration: 600 - # Valid period of abnormal metric events during root cause locating, in seconds. - evt_valid_duration: 120 - # Aging period of abnormal metric events, in seconds. 
- evt_aging_duration: 600 - -kafka: - server: "localhost:9092" - metadata_topic: - topic_id: "gala_gopher_metadata" - group_id: "metadata-inference" - abnormal_kpi_topic: - topic_id: "gala_anteater_hybrid_model" - group_id: "abn-kpi-inference" - abnormal_metric_topic: - topic_id: "gala_anteater_metric" - group_id: "abn-metric-inference" - consumer_to: 1 - inference_topic: - topic_id: "gala_cause_inference" - -arangodb: - url: "http://localhost:8529" - db_name: "spider" - -log: - log_path: "/var/log/gala-inference/inference.log" - # log level: DEBUG/INFO/WARNING/ERROR/CRITICAL - log_level: INFO - # unit: MB - max_size: 10 - backup_count: 10 - -prometheus: - base_url: "http://localhost:9090/" - range_api: "/api/v1/query_range" - step: 5 -``` - - - -### Start - -- Run the following command to start gala-inference. - - ```sh - # gala-inference - ``` - -- Use the systemd service to start gala-inference. - - ```sh - # systemctl start gala-inference - ``` - - - -### How to Use - -##### Dependent Software Deployment - -The running dependency of gala-inference is the same as that of gala-spider. For details, see [Deployment of External Dependent Software](#deployment-of-external-dependent-software). In addition, gala-inference indirectly depends on the running of [gala-spider](#gala-spider) and [gala-anteater](using-gala-anteater.md). Deploy gala-spider and gala-anteater in advance. - -##### Modify configuration items. - -Modify some configuration items in the gala-inference configuration file. The following is an example. - -Configure the Kafka server address. - -```yaml -kafka: - server: "localhost:9092" -``` - -Configure the Prometheus server address. - -```yaml -prometheus: - base_url: "http://localhost:9090/" -``` - -Configure the IP address of the ArangoDB server. - -```yaml -arangodb: - url: "http://localhost:8529" -``` - -##### Starting the Service - -Run `systemctl start gala-inference` to start the service. 
Run `systemctl status gala-inference` to check the startup status. If the following information is displayed, the startup is successful: - -```sh -[root@openEuler ~]# systemctl status gala-inference -● gala-inference.service - a-ops gala inference service - Loaded: loaded (/usr/lib/systemd/system/gala-inference.service; enabled; vendor preset: disabled) - Active: active (running) since Tue 2022-08-30 17:55:33 CST; 1 day 22h ago - Main PID: 2445875 (gala-inference) - Tasks: 10 (limit: 98900) - Memory: 48.7M - CGroup: /system.slice/gala-inference.service - └─2445875 /usr/bin/python3 /usr/bin/gala-inference -``` - -##### Output Example - -When the exception detection module gala-anteater detects a KPI exception, it exports the corresponding abnormal KPI event to Kafka. The gala-inference keeps monitoring the message of the abnormal KPI event. If gala-inference receives the message of the abnormal KPI event, root cause locating is triggered. The root cause locating result is exported to Kafka. You can view the root cause locating result on the Kafka server. The basic procedure is as follows: - -1. If Kafka is installed using the source code, go to the Kafka installation directory. - - ```sh - cd /root/kafka_2.13-2.8.0 - ``` - -2. Run the command for consuming the topic to obtain the output of root cause locating. 
- - ```sh - ./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic gala_cause_inference - ``` - - Output example: - - ```json - { - "Timestamp": 1661853360000, - "event_id": "1661853360000_1fd37742xxxx_sli_12154_19", - "Atrributes": { - "event_id": "1661853360000_1fd37742xxxx_sli_12154_19" - }, - "Resource": { - "abnormal_kpi": { - "metric_id": "gala_gopher_sli_rtt_nsec", - "entity_id": "1fd37742xxxx_sli_12154_19", - "timestamp": 1661853360000, - "metric_labels": { - "machine_id": "1fd37742xxxx", - "tgid": "12154", - "conn_fd": "19" - } - }, - "cause_metrics": [ - { - "metric_id": "gala_gopher_proc_write_bytes", - "entity_id": "1fd37742xxxx_proc_12154", - "metric_labels": { - "__name__": "gala_gopher_proc_write_bytes", - "cmdline": "/opt/redis/redis-server x.x.x.172:3742", - "comm": "redis-server", - "container_id": "5a10635e2c43", - "hostname": "openEuler", - "instance": "x.x.x.172:8888", - "job": "prometheus", - "machine_id": "1fd37742xxxx", - "pgid": "12154", - "ppid": "12126", - "tgid": "12154" - }, - "timestamp": 1661853360000, - "path": [ - { - "metric_id": "gala_gopher_proc_write_bytes", - "entity_id": "1fd37742xxxx_proc_12154", - "metric_labels": { - "__name__": "gala_gopher_proc_write_bytes", - "cmdline": "/opt/redis/redis-server x.x.x.172:3742", - "comm": "redis-server", - "container_id": "5a10635e2c43", - "hostname": "openEuler", - "instance": "x.x.x.172:8888", - "job": "prometheus", - "machine_id": "1fd37742xxxx", - "pgid": "12154", - "ppid": "12126", - "tgid": "12154" - }, - "timestamp": 1661853360000 - }, - { - "metric_id": "gala_gopher_sli_rtt_nsec", - "entity_id": "1fd37742xxxx_sli_12154_19", - "metric_labels": { - "machine_id": "1fd37742xxxx", - "tgid": "12154", - "conn_fd": "19" - }, - "timestamp": 1661853360000 - } - ] - } - ] - }, - "SeverityText": "WARN", - "SeverityNumber": 13, - "Body": "A cause inferring event for an abnormal event" - } - ``` diff --git a/docs/en/docs/A-Tune/A-Tune.md b/docs/en/docs/A-Tune/A-Tune.md 
deleted file mode 100644 index cb94a36db10e5d10f1ed758055c3a7ad99011d38..0000000000000000000000000000000000000000 --- a/docs/en/docs/A-Tune/A-Tune.md +++ /dev/null @@ -1,5 +0,0 @@ -# A-Tune User Guide - -This document describes how to install and use A-Tune, which is a performance self-optimization software for openEuler. - -This document is intended for developers, open-source enthusiasts, and partners who use the openEuler system and want to know and use A-Tune. You need to have basic knowledge of the Linux OS. \ No newline at end of file diff --git a/docs/en/docs/A-Tune/appendixes.md b/docs/en/docs/A-Tune/appendixes.md deleted file mode 100644 index 2d776555c04a00f5a7c56e5d8b503925019af32a..0000000000000000000000000000000000000000 --- a/docs/en/docs/A-Tune/appendixes.md +++ /dev/null @@ -1,25 +0,0 @@ -# Appendixes - -- [Appendixes](#appendixes) - - [Acronyms and Abbreviations](#acronyms-and-abbreviations) - - -## Acronyms and Abbreviations - -**Table 1** Terminology - - - - - - - - - -

| Term | Description |
| ------- | ------------------------------------------------------ |
| profile | Set of optimization items and optimal parameter configuration. |
-
 - - diff --git a/docs/en/docs/A-Tune/faqs.md b/docs/en/docs/A-Tune/faqs.md deleted file mode 100644 index 0a350b3ebf59fe290d0be52a0c9bd838bc54df4a..0000000000000000000000000000000000000000 --- a/docs/en/docs/A-Tune/faqs.md +++ /dev/null @@ -1,57 +0,0 @@ -# FAQs - -## Q1: An error occurs when the **train** command is used to train a model, and the message "training data failed" is displayed. - -Cause: Only one type of data is collected by using the **collection** command. - -Solution: Collect data of at least two data types for training. - - - -## Q2: The atune-adm cannot connect to the atuned service. - -Possible cause: - -1. The atuned service is not started, or the atuned listening address is incorrect. Check the service status and listening address: - - ``` - # systemctl status atuned - # netstat -nap | grep atuned - ``` - -2. The firewall blocks the atuned listening port. -3. The HTTP proxy is configured in the system. As a result, the connection fails. - -Solution: - -1. If the atuned service is not started, run the following command to start the service: - - ``` - # systemctl start atuned - ``` - -2. Run the following command on the atuned and atune-adm servers to allow the listening port to receive network packets. In the command, **60001** is the listening port number of the atuned server. - - ``` - # iptables -I INPUT -p tcp --dport 60001 -j ACCEPT - # iptables -I INPUT -p tcp --sport 60001 -j ACCEPT - ``` - -3. Run the following command to delete the HTTP proxy or disable the HTTP proxy for the listening IP address without affecting services: - - ``` - # no_proxy=$no_proxy, Listening IP address - ``` - -## Q3: The atuned service cannot be started, and the message "Job for atuned.service failed because a timeout was exceeded." is displayed. - -Cause: The hosts file does not contain the localhost information. - -Solution: Add localhost to the line starting with **127.0.0.1** in the **/etc/hosts** file. 
- -``` -127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 -``` - diff --git a/docs/en/docs/A-Tune/figures/en-us_image_0213178479.png b/docs/en/docs/A-Tune/figures/en-us_image_0213178479.png deleted file mode 100644 index 62ef0decdf6f1e591059904001d712a54f727e68..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Tune/figures/en-us_image_0213178479.png and /dev/null differ diff --git a/docs/en/docs/A-Tune/figures/en-us_image_0213178480.png b/docs/en/docs/A-Tune/figures/en-us_image_0213178480.png deleted file mode 100644 index ad5ed3f7beeb01e6a48707c4806606b41d687e22..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Tune/figures/en-us_image_0213178480.png and /dev/null differ diff --git a/docs/en/docs/A-Tune/figures/en-us_image_0214540398.png b/docs/en/docs/A-Tune/figures/en-us_image_0214540398.png deleted file mode 100644 index cea2292307b57854aa629ec102a5bc1b16d244a0..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Tune/figures/en-us_image_0214540398.png and /dev/null differ diff --git a/docs/en/docs/A-Tune/figures/en-us_image_0227497000.png b/docs/en/docs/A-Tune/figures/en-us_image_0227497000.png deleted file mode 100644 index 3df66e5f25177cba7fe65cfb859fab860bfb7b46..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Tune/figures/en-us_image_0227497000.png and /dev/null differ diff --git a/docs/en/docs/A-Tune/figures/en-us_image_0227497343.png b/docs/en/docs/A-Tune/figures/en-us_image_0227497343.png deleted file mode 100644 index a8654b170295b4b0be3c37187e4b227ca635fbc0..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Tune/figures/en-us_image_0227497343.png and /dev/null differ diff --git a/docs/en/docs/A-Tune/figures/en-us_image_0231122163.png b/docs/en/docs/A-Tune/figures/en-us_image_0231122163.png deleted file mode 100644 index c61c39c5f5119d84c6799b1e17285a7fe313639f..0000000000000000000000000000000000000000 Binary files 
a/docs/en/docs/A-Tune/figures/en-us_image_0231122163.png and /dev/null differ diff --git a/docs/en/docs/A-Tune/figures/en-us_image_0245342444.png b/docs/en/docs/A-Tune/figures/en-us_image_0245342444.png deleted file mode 100644 index 10f0fceb42c00c80ef49decdc0c480eb04c2ca6d..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Tune/figures/en-us_image_0245342444.png and /dev/null differ diff --git a/docs/en/docs/A-Tune/figures/picture1.png b/docs/en/docs/A-Tune/figures/picture1.png deleted file mode 100644 index 624d148b98bc9890befbecc53f29d6a4890d06af..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Tune/figures/picture1.png and /dev/null differ diff --git a/docs/en/docs/A-Tune/figures/picture4.png b/docs/en/docs/A-Tune/figures/picture4.png deleted file mode 100644 index c576fd0369008e847e6943d6f99351caccf9f3e5..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Tune/figures/picture4.png and /dev/null differ diff --git a/docs/en/docs/A-Tune/getting-to-know-a-tune.md b/docs/en/docs/A-Tune/getting-to-know-a-tune.md deleted file mode 100644 index 2092e0152e2c31ea4bf1aa95277302bcc981b6a9..0000000000000000000000000000000000000000 --- a/docs/en/docs/A-Tune/getting-to-know-a-tune.md +++ /dev/null @@ -1,195 +0,0 @@ -# Getting to Know A-Tune - -- [Getting to Know A-Tune](#getting-to-know-a-tune) - - [Introduction](#introduction) - - [Architecture](#architecture) - - [Supported Features and Service Models](#supported-features-and-service-models) - - - -## Introduction - -An operating system \(OS\) is basic software that connects applications and hardware. It is critical for users to adjust OS and application configurations and make full use of software and hardware capabilities to achieve optimal service performance. However, numerous workload types and varied applications run on the OS, and the requirements on resources are different. 
Currently, the application environment composed of hardware and software involves more than 7000 configuration objects. As the service complexity and optimization objects increase, the time cost for optimization increases exponentially. As a result, optimization efficiency decreases sharply. Optimization becomes complex and brings great challenges to users. - -Second, as infrastructure software, the OS provides a large number of software and hardware management capabilities. The capability required varies in different scenarios. Therefore, capabilities need to be enabled or disabled depending on scenarios, and a combination of capabilities will maximize the optimal performance of applications. - -In addition, the actual business embraces hundreds and thousands of scenarios, and each scenario involves a wide variety of hardware configurations for computing, network, and storage. The lab cannot list all applications, business scenarios, and hardware combinations. - -To address the preceding challenges, openEuler launches A-Tune. - -A-Tune is an AI-based engine that optimizes system performance. It uses AI technologies to precisely profile business scenarios, discover and infer business characteristics, so as to make intelligent decisions, match with the optimal system parameter configuration combination, and give recommendations, ensuring the optimal business running status. - -![](figures/en-us_image_0227497000.png) - -## Architecture - -The following figure shows the A-Tune core technical architecture, which consists of intelligent decision-making, system profile, and interaction system. - -- Intelligent decision-making layer: consists of the awareness and decision-making subsystems, which implements intelligent awareness of applications and system optimization decision-making, respectively. -- System profile layer: consists of the feature engineering and two-layer classification model. 
The feature engineering is used to automatically select service features, and the two-layer classification model is used to learn and classify service models. -- Interaction system layer: monitors and configures various system resources and executes optimization policies. - -![](figures/en-us_image_0227497343.png) - -## Supported Features and Service Models - -### Supported Features - -[Table 1](#table1919220557576) describes the main features supported by A-Tune, feature maturity, and usage suggestions. - -**Table 1** Feature maturity - - - - - - - - - - - - - - - - - - - -

-| Feature | Maturity | Usage Suggestion |
-| ------- | -------- | ---------------- |
-| Auto optimization of 15 applications in 11 workload types | Tested | Pilot |
-| User-defined profile and service models | Tested | Pilot |
-| Automatic parameter optimization | Tested | Pilot |
-
-### Supported Service Models
-
-Based on the workload characteristics of applications, A-Tune classifies services into 11 types. For details about the bottleneck of each type and the applications supported by A-Tune, see [Table 2](#table2819164611311).
-
-**Table 2** Supported workload types and applications
-

-| Service category | Type | Bottleneck | Supported Application |
-| ---------------- | ---- | ---------- | --------------------- |
-| default | Default type | Low resource usage in terms of CPU, memory, network, and I/O | N/A |
-| webserver | Web application | Bottlenecks of CPU and network | Nginx, Apache Traffic Server |
-| database | Database | Bottlenecks of CPU, memory, and I/O | Mongodb, Mysql, Postgresql, Mariadb |
-| big_data | Big data | Bottlenecks of CPU and memory | Hadoop-hdfs, Hadoop-spark |
-| middleware | Middleware framework | Bottlenecks of CPU and network | Dubbo |
-| in-memory_database | Memory database | Bottlenecks of memory and I/O | Redis |
-| basic-test-suite | Basic test suite | Bottlenecks of CPU and memory | SPECCPU2006, SPECjbb2015 |
-| hpc | Human genome | Bottlenecks of CPU, memory, and I/O | Gatk4 |
-| storage | Storage | Bottlenecks of network and I/O | Ceph |
-| virtualization | Virtualization | Bottlenecks of CPU, memory, and I/O | Consumer-cloud, Mariadb |
-| docker | Docker | Bottlenecks of CPU, memory, and I/O | Mariadb |
- - - diff --git a/docs/en/docs/A-Tune/installation-and-deployment.md b/docs/en/docs/A-Tune/installation-and-deployment.md deleted file mode 100644 index bc3bc6d8043e57210984b6f625418fc20919df3e..0000000000000000000000000000000000000000 --- a/docs/en/docs/A-Tune/installation-and-deployment.md +++ /dev/null @@ -1,511 +0,0 @@ -# Installation and Deployment - -This chapter describes how to install and deploy A-Tune. - -- [Installation and Deployment](#installation-and-deployment) - - [Software and Hardware Requirements](#software-and-hardware-requirements) - - [Hardware Requirement](#hardware-requirement) - - [Software Requirement](#software-requirement) - - [Environment Preparation](#environment-preparation) - - [A-Tune Installation](#a-tune-installation) - - [Installation Modes](#installation-modes) - - [Installation Procedure](#installation-procedure) - - [A-Tune Deployment](#a-tune-deployment) - - [Overview](#overview) - - [Example](#example) - - [Example](#example-1) - - [Starting A-Tune](#starting-a-tune) - - [Starting A-Tune engine](#starting-a-tune-engine) - - - - -## Software and Hardware Requirements - -### Hardware Requirement - -- Huawei Kunpeng 920 processor - -### Software Requirement - -- OS: openEuler 22.03 - -## Environment Preparation - -For details about installing an openEuler OS, see the _openEuler 22.03 Installation Guide_. - -## A-Tune Installation - -This section describes the installation modes and methods of the A-Tune. - -### Installation Modes - -A-Tune can be installed in single-node, distributed, and cluster modes. - -- Single-node mode - - The client and server are installed on the same system. - -- Distributed mode - - The client and server are installed on different systems. - -- Cluster mode - A cluster consists of a client and more than one servers. - - -The installation modes are as follows: - -![](./figures/en-us_image_0231122163.png) - -   - -### Installation Procedure - -To install the A-Tune, perform the following steps: - -1. 
Mount an openEuler ISO image. - - ``` - # mount openEuler-22.03-LTS-everything-x86_64-dvd.iso /mnt - ``` - > Use the **everything** ISO image. - -2. Configure the local Yum source. - - ``` - # vim /etc/yum.repos.d/local.repo - ``` - - The configured contents are as follows: - - ``` - [local] - name=local - baseurl=file:///mnt - gpgcheck=1 - enabled=1 - ``` - -3. Import the GPG public key of the RPM digital signature to the system. - - ``` - # rpm --import /mnt/RPM-GPG-KEY-openEuler - ``` - - -4. Install an A-Tune server. - - >![](./public_sys-resources/icon-note.gif) **NOTE:** - >In this step, both the server and client software packages are installed. For the single-node deployment, skip **Step 5**. - - ``` - # yum install atune -y - # yum install atune-engine -y - ``` - -5. For a distributed mode, install an A-Tune client on associated server. - - ``` - # yum install atune-client -y - ``` - -6. Check whether the installation is successful. - - ``` - # rpm -qa | grep atune - atune-client-xxx - atune-db-xxx - atune-xxx - atune-engine-xxx - ``` - - If the preceding information is displayed, the installation is successful. - - -## A-Tune Deployment - -This section describes how to deploy A-Tune. - - - -### Overview - -The configuration items in the A-Tune configuration file **/etc/atuned/atuned.cnf** are described as follows: - -- A-Tune service startup configuration (modify the parameter values as required). - - - **protocol**: Protocol used by the gRPC service. The value can be **unix** or **tcp**. **unix** indicates the local socket communication mode, and **tcp** indicates the socket listening port mode. The default value is **unix**. - - **address**: Listening IP address of the gRPC service. The default value is **unix socket**. If the gRPC service is deployed in distributed mode, change the value to the listening IP address. - - **port**: Listening port of the gRPC server. The value ranges from 0 to 65535. 
If **protocol** is set to **unix**, you do not need to set this parameter. - - **connect**: IP address list of the nodes where the A-Tune is located when the A-Tune is deployed in a cluster. IP addresses are separated by commas (,). - - **rest_host**: Listening address of the REST service. The default value is localhost. - - **rest_port**: Listening port of the REST service. The value ranges from 0 to 65535. The default value is 8383. - - **engine_host**: IP address for connecting to the A-Tune engine service of the system. - - **engine_port**: Port for connecting to the A-Tune engine service of the system. - - **sample_num**: Number of samples collected when the system executes the analysis process. The default value is 20. - - **interval**: Interval for collecting samples when the system executes the analysis process. The default value is 5s. - - **grpc_tls**: Indicates whether to enable SSL/TLS certificate verification for the gRPC service. By default, this function is disabled. After grpc_tls is enabled, you need to set the following environment variables before running the **atune-adm** command to communicate with the server: - - export ATUNE_TLS=yes - - export ATUNED_CACERT= - - export ATUNED_CLIENTCERT= - - export ATUNED_CLIENTKEY= - - export ATUNED_SERVERCN=server - - **tlsservercafile**: Path of the gPRC server's CA certificate. - - **tlsservercertfile**: Path of the gPRC server certificate. - - **tlsserverkeyfile**: Path of the gPRC server key. - - **rest_tls**: Indicates whether to enable SSL/TLS certificate verification for the REST service. This function is enabled by default. - - **tlsrestcacertfile**: Path of the server's CA certificate of the REST service. - - **tlsrestservercertfile**: Path of the server certificate of the REST service. - - **tlsrestserverkeyfile**: Indicates the key path of the REST service. - - **engine_tls**: Indicates whether to enable SSL/TLS certificate verification for the A-Tune engine service. 
This function is enabled by default.. - - **tlsenginecacertfile**: Path of the client CA certificate of the A-Tune engine service. - - **tlsengineclientcertfile**: Client certificate path of the A-Tune engine service. - - **tlsengineclientkeyfile**: Client key path of the A-Tune engine service. - -- System information - - System is the parameter information required for system optimization. You must modify the parameter information according to the actual situation. - - - **disk**: Disk information to be collected during the analysis process or specified disk during disk optimization. - - **network**: NIC information to be collected during the analysis process or specified NIC during NIC optimization. - - **user**: User name used for ulimit optimization. Currently, only the user **root** is supported. - -- Log information - - Change the log level as required. The default log level is info. Log information is recorded in the **/var/log/messages** file. - -- Monitor information - - Hardware information that is collected by default when the system is started. - -- Tuning information - - Tuning is the parameter information required for offline tuning. - - - **noise**: Evaluation value of Gaussian noise. - - **sel_feature**: Indicates whether to enable the function of generating the importance ranking of offline tuning parameters. By default, this function is disabled. 
- - -#### Example - -``` -#################################### server ############################### - # atuned config - [server] - # the protocol grpc server running on - # ranges: unix or tcp - protocol = unix - - # the address that the grpc server to bind to - # default is unix socket /var/run/atuned/atuned.sock - # ranges: /var/run/atuned/atuned.sock or ip address - address = /var/run/atuned/atuned.sock - - # the atune nodes in cluster mode, separated by commas - # it is valid when protocol is tcp - # connect = ip01,ip02,ip03 - - # the atuned grpc listening port - # the port can be set between 0 to 65535 which not be used - # port = 60001 - - # the rest service listening port, default is 8383 - # the port can be set between 0 to 65535 which not be used - rest_host = localhost - rest_port = 8383 - - # the tuning optimizer host and port, start by engine.service - # if engine_host is same as rest_host, two ports cannot be same - # the port can be set between 0 to 65535 which not be used - engine_host = localhost - engine_port = 3838 - - # when run analysis command, the numbers of collected data. 
- # default is 20 - sample_num = 20 - - # interval for collecting data, default is 5s - interval = 5 - - # enable gRPC authentication SSL/TLS - # default is false - # grpc_tls = false - # tlsservercafile = /etc/atuned/grpc_certs/ca.crt - # tlsservercertfile = /etc/atuned/grpc_certs/server.crt - # tlsserverkeyfile = /etc/atuned/grpc_certs/server.key - - # enable rest server authentication SSL/TLS - # default is true - rest_tls = true - tlsrestcacertfile = /etc/atuned/rest_certs/ca.crt - tlsrestservercertfile = /etc/atuned/rest_certs/server.crt - tlsrestserverkeyfile = /etc/atuned/rest_certs/server.key - - # enable engine server authentication SSL/TLS - # default is true - engine_tls = true - tlsenginecacertfile = /etc/atuned/engine_certs/ca.crt - tlsengineclientcertfile = /etc/atuned/engine_certs/client.crt - tlsengineclientkeyfile = /etc/atuned/engine_certs/client.key - - - #################################### log ############################### - [log] - # either "debug", "info", "warn", "error", "critical", default is "info" - level = info - - #################################### monitor ############################### - [monitor] - # with the module and format of the MPI, the format is {module}_{purpose} - # the module is Either "mem", "net", "cpu", "storage" - # the purpose is "topo" - module = mem_topo, cpu_topo - - #################################### system ############################### - # you can add arbitrary key-value here, just like key = value - # you can use the key in the profile - [system] - # the disk to be analysis - disk = sda - - # the network to be analysis - network = enp189s0f0 - - user = root - - #################################### tuning ############################### - # tuning configs - [tuning] - noise = 0.000000001 - sel_feature = false -``` - -The configuration items in the configuration file **/etc/atuned/engine.cnf** of the A-Tune engine are described as follows: - -- Startup configuration of the A-Tune engine service (modify the 
parameter values as required). - - - **engine_host**: Listening address of the A-Tune engine service. The default value is localhost. - - **engine_port**: Listening port of the A-Tune engine service. The value ranges from 0 to 65535. The default value is 3838. - - **engine_tls**: Indicates whether to enable SSL/TLS certificate verification for the A-Tune engine service. This function is enabled by default. - - **tlsenginecacertfile**: Path of the server CA certificate of the A-Tune engine service. - - **tlsengineservercertfile**: Path of the server certificate of the A-Tune engine service. - - **tlsengineserverkeyfile**: Server key path of the A-Tune engine service. - -- Log information - - Change the log level as required. The default log level is info. Log information is recorded in the **/var/log/messages** file. - - -#### Example - -``` -#################################### engine ############################### - [server] - # the tuning optimizer host and port, start by engine.service - # if engine_host is same as rest_host, two ports cannot be same - # the port can be set between 0 to 65535 which not be used - engine_host = localhost - engine_port = 3838 - - # enable engine server authentication SSL/TLS - # default is true - engine_tls = true - tlsenginecacertfile = /etc/atuned/engine_certs/ca.crt - tlsengineservercertfile = /etc/atuned/engine_certs/server.crt - tlsengineserverkeyfile = /etc/atuned/engine_certs/server.key - - #################################### log ############################### - [log] - # either "debug", "info", "warn", "error", "critical", default is "info" - level = info -``` - -## Starting A-Tune - -After A-Tune is installed, you need to configure the A-Tune service before starting it. -- Configure the A-Tune service. - Modify the network adapter and drive information in the **atuned.cnf** configuration file. 
- > Note: - > - > If atuned is installed through `make install`, the network adapter and drive information in the configuration file is automatically updated to the default devices on the machine. To collect data from other devices, perform the following steps to configure atuned. - - Run the following command to search for the network adapter that needs to be specified for optimization or data collection, and change the value of **network** in the **/etc/atuned/atuned.cnf** file to the specified network adapter: - ``` - ip addr - ``` - Run the following command to search for the drive that need to be specified for optimization or data collection, and change the value of **disk** in the **/etc/atuned/atuned.cnf** file to the specified drive: - ``` - fdisk -l | grep dev - ``` -- About the certificate: - The A-Tune engine and client use the gRPC communication protocol. Therefore, you need to configure a certificate to ensure system security. For information security purposes, A-Tune does not provide a certificate generation method. You need to configure a system certificate by yourself. - If security is not considered, set **rest_tls** and **engine_tls** in the **/etc/atuned/atuned.cnf** file to **false**, set **engine_tls** in the **/etc/atuned/engine.cnf** file to **false**. - A-Tune is not liable for any consequences incurred if no security certificate is configured. - -- Start the atuned service. - - ``` - # systemctl start atuned - ``` - - -- Query the atuned service status. - - ``` - # systemctl status atuned - ``` - - If the following command output is displayed, the service is started successfully: - - ![](./figures/en-us_image_0214540398.png) - -## Starting A-Tune Engine - -To use AI functions, you need to start the A-Tune engine service. - -- Start the atune-engine service. - - ``` - # systemctl start atune-engine - ``` - - -- Query the atune-engine service status. 
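The NIC and drive configuration steps above can be sketched as a small shell routine. The config path and device values here are illustrative: the sketch operates on a local sample file rather than the real **/etc/atuned/atuned.cnf**, and assumes each key appears once per file.

```shell
# Sketch of the config update above: pick a NIC and a disk, then rewrite the
# "network" and "disk" keys in place. Run against a sample copy; on a real
# system the target would be /etc/atuned/atuned.cnf.
cfg=atuned.cnf.sample
printf 'disk = sda\nnetwork = enp189s0f0\n' > "$cfg"

nic=eth0     # e.g. chosen from the output of: ip addr
disk=vda     # e.g. chosen from the output of: fdisk -l | grep dev

sed -i "s/^network = .*/network = $nic/" "$cfg"
sed -i "s/^disk = .*/disk = $disk/" "$cfg"
cat "$cfg"
```

After the rewrite, restart the atuned service so the new device settings take effect.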
- - ``` - # systemctl status atune-engine - ``` - - If the following command output is displayed, the service is started successfully: - - ![](./figures/en-us_image_0245342444.png) - -## Distributed Deployment - -### Purpose of Distributed Deployment -A-Tune supports distributed deployment to implement distributed architecture and on-demand deployment. The components of A-Tune can be deployed separately. Lightweight component deployment has little impact on services and avoids installing too many dependencies to reduce the system load. - -This document describes only a common deployment mode: deploying the client and server on the same node and deploying the engine module on another node. For details about other deployment modes, contact A-Tune developers. - -**Deployment relationship** -![](figures/picture1.png) - -### Configuration File -In distributed deployment mode, you need to configure the write the IP address and port number of the engine in the configuration file so that other components can access the engine component through the IP address. - -1. Modify the **/etc/atuned/atuned.cnf** file on the server node. - - Change the values of **engine_host** and **engine_port** in line 34 to the IP address and port number of the engine node. For the deployment in the preceding figure, the values are **engine_host = 192.168.0.1 engine_port = 3838**. - - Change the values of **rest_tls** and **engine_tls** in lines 49 and 55 to **false**. Otherwise, you need to apply for and configure certificates. You do not need to configure SSL certificates in the test environment. However, you need to configure SSL certificates in the production environment to prevent security risks. -2. Modify the** /etc/atuned/engine.cnf** file on the engine node. - - Change the values of **engine_host** and **engine_port** in lines 17 and 18 to the IP address and port number of the engine node. 
For the deployment in the preceding figure, the value are **engine_host = 192.168.0.1 engine_port = 3838**. - - Change the value of **engine_tls** in line 22 to **false**. -3. After modifying the configuration file, restart the service for the modification to take effect. - - Run the `systemctl restart atuned command` on the server node. - - Run the `systemctl restart atune-engine` command on the engine node. -4. (Optional) Run the `tuning` command in the **A-Tune/examples/tuning/compress** folder. - - For details, see **A-Tune/examples/tuning/compress/README**. - - Run the `atune-adm tuning --project compress --detail compress_client.yaml` command. - - This step is to check whether the distributed deployment is successful. - -### Precautions -1. This document does not describe how to configure the authentication certificates. You can set **rest_tls** or **engine_tls** in the **atuned.cnf** and **engine.cnf** files to **false** if necessary. -2. After modifying the configuration file, restart the service. Otherwise, the modification does not take effect. -3. Do not enable the proxy when using A-Tune. -4. The **disk** and **network** items of the **[system]** section in the **atuned.cnf** file need to be modified. For details about how to modify the items, see the [A-Tune User Guide](https://gitee.com/gaoruoshu/A-Tune/blob/master/Documentation/UserGuide/A-Tune%E7%94%A8%E6%88%B7%E6%8C%87%E5%8D%97.md). - -### Example -#### atuned.cnf -```bash -# ...... - -# the tuning optimizer host and port, start by engine.service -# if engine_host is same as rest_host, two ports cannot be same -# the port can be set between 0 to 65535 which not be used -engine_host = 192.168.0.1 -engine_port = 3838 - -# ...... 
-``` -#### engine.cnf -```bash -[server] -# the tuning optimizer host and port, start by engine.service -# if engine_host is same as rest_host, two ports cannot be same -# the port can be set between 0 to 65535 which not be used -engine_host = 192.168.0.1 -engine_port = 3838 -``` -## Cluster Deployment - -### Purpose of Cluster Deployment -To support fast tuning in multi-node scenarios, A-Tune supports dynamic tuning of parameter settings on multiple nodes at the same time. In this way, you do not need to tune each node separately, improving tuning efficiency. -Cluster deployment mode consists of one master node and several agent nodes. The client and server are deployed on the master node to receive commands and interact with the engine. Other nodes receive instructions from the master node and configure the parameters of the current node. - -**Deployment relationship** - ![](figures/picture4.png) - -In the preceding figure, the client and server are deployed on the node whose IP address is 192.168.0.0. Project files are stored on this node. Other nodes do not contain project files. -The master node communicates with the agent nodes through TCP. Therefore, you need to modify the configuration file. - -### Modifications to atuned.cnf -1. Set the value of **protocol** to **tcp**. -2. Set the value of **address** to the IP address of the current node. -3. Set the value of **connect** to the IP addresses of all nodes. The first IP address is the IP address of the master node, and the subsequent IP addresses are the IP addresses of agent nodes. Use commas (,) to separate the IP addresses. -4. During debugging, you can set **rest_tls** and **engine_tls** to **false**. -5. Perform the same modification on the **atuned.cnf** files of all the master and agent nodes. - -### Precautions -1. The values of **engine_host** and **engine_port** must be consistent in the **engine.cnf** file and the **atuned.cnf** file on the server. -2. 
This document does not describe how to configure the authentication certificates. You can set **rest_tls** or **engine_tls** in the **atuned.cnf** and **engine.cnf** files to **false** if necessary. -3. After modifying the configuration file, restart the service. Otherwise, the modification does not take effect. -4. Do not enable the proxy when using A-Tune. - -### Example -#### atuned.cnf -```bash -# ...... - -[server] -# the protocol grpc server running on -# ranges: unix or tcp -protocol = tcp - -# the address that the grpc server to bind to -# default is unix socket /var/run/atuned/atuned.sock -# ranges: /var/run/atuned/atuned.sock or ip address -address = 192.168.0.0 - -# the atune nodes in cluster mode, separated by commas -# it is valid when protocol is tcp -connect = 192.168.0.0,192.168.0.1,192.168.0.2,192.168.0.3 - -# the atuned grpc listening port -# the port can be set between 0 to 65535 which not be used -port = 60001 - -# the rest service listening port, default is 8383 -# the port can be set between 0 to 65535 which not be used -rest_host = localhost -rest_port = 8383 - -# the tuning optimizer host and port, start by engine.service -# if engine_host is same as rest_host, two ports cannot be same -# the port can be set between 0 to 65535 which not be used -engine_host = 192.168.1.1 -engine_port = 3838 - -# ...... -``` - -#### engine.cnf -```bash -[server] -# the tuning optimizer host and port, start by engine.service -# if engine_host is same as rest_host, two ports cannot be same -# the port can be set between 0 to 65535 which not be used -engine_host = 192.168.1.1 -engine_port = 3838 -``` - -Note: For details about the **engine.cnf** file, see the configuration file for distributed deployment. 
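Because **engine_host** and **engine_port** must stay consistent between **atuned.cnf** and **engine.cnf**, a quick consistency check before restarting the services can catch typos. The sketch below uses local sample files in place of the real paths under **/etc/atuned/**.

```shell
# Sketch: compare engine_host/engine_port across the two config files before
# restarting atuned and atune-engine. Sample files stand in for
# /etc/atuned/atuned.cnf and /etc/atuned/engine.cnf.
cat > atuned.sample <<'EOF'
engine_host = 192.168.1.1
engine_port = 3838
EOF
cat > engine.sample <<'EOF'
engine_host = 192.168.1.1
engine_port = 3838
EOF

for key in engine_host engine_port; do
  a=$(awk -F' = ' -v k="$key" '$1 == k {print $2}' atuned.sample)
  b=$(awk -F' = ' -v k="$key" '$1 == k {print $2}' engine.sample)
  if [ "$a" = "$b" ]; then
    echo "$key OK ($a)"
  else
    echo "$key MISMATCH: $a vs $b"
  fi
done
```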
diff --git a/docs/en/docs/A-Tune/public_sys-resources/icon-caution.gif b/docs/en/docs/A-Tune/public_sys-resources/icon-caution.gif deleted file mode 100644 index 6e90d7cfc2193e39e10bb58c38d01a23f045d571..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Tune/public_sys-resources/icon-caution.gif and /dev/null differ diff --git a/docs/en/docs/A-Tune/public_sys-resources/icon-danger.gif b/docs/en/docs/A-Tune/public_sys-resources/icon-danger.gif deleted file mode 100644 index 6e90d7cfc2193e39e10bb58c38d01a23f045d571..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Tune/public_sys-resources/icon-danger.gif and /dev/null differ diff --git a/docs/en/docs/A-Tune/public_sys-resources/icon-note.gif b/docs/en/docs/A-Tune/public_sys-resources/icon-note.gif deleted file mode 100644 index 6314297e45c1de184204098efd4814d6dc8b1cda..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Tune/public_sys-resources/icon-note.gif and /dev/null differ diff --git a/docs/en/docs/A-Tune/public_sys-resources/icon-notice.gif b/docs/en/docs/A-Tune/public_sys-resources/icon-notice.gif deleted file mode 100644 index 86024f61b691400bea99e5b1f506d9d9aef36e27..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Tune/public_sys-resources/icon-notice.gif and /dev/null differ diff --git a/docs/en/docs/A-Tune/public_sys-resources/icon-tip.gif b/docs/en/docs/A-Tune/public_sys-resources/icon-tip.gif deleted file mode 100644 index 93aa72053b510e456b149f36a0972703ea9999b7..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Tune/public_sys-resources/icon-tip.gif and /dev/null differ diff --git a/docs/en/docs/A-Tune/public_sys-resources/icon-warning.gif b/docs/en/docs/A-Tune/public_sys-resources/icon-warning.gif deleted file mode 100644 index 6e90d7cfc2193e39e10bb58c38d01a23f045d571..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/A-Tune/public_sys-resources/icon-warning.gif and /dev/null 
differ diff --git a/docs/en/docs/A-Tune/usage-instructions.md b/docs/en/docs/A-Tune/usage-instructions.md deleted file mode 100644 index f7e9d88251cd92be878e8f3d56f0607563f44d79..0000000000000000000000000000000000000000 --- a/docs/en/docs/A-Tune/usage-instructions.md +++ /dev/null @@ -1,1146 +0,0 @@ -# Usage Instructions - -You can use functions provided by A-Tune through the CLI client atune-adm. This chapter describes the functions and usage of the A-Tune client. - -- [Usage Instructions](#usage-instructions) - - [Overview](#overview) - - [Querying Workload Types](#querying-workload-types) - - [list](#list) - - [Workload Type Analysis and Auto Optimization](#workload-type-analysis-and-auto-optimization) - - [analysis](#analysis) - - [User-defined Model](#user-defined-model) - - [define](#define) - - [collection](#collection) - - [train](#train) - - [undefine](#undefine) - - [Querying Profiles](#querying-profiles) - - [info](#info) - - [Updating a Profile](#updating-a-profile) - - [update](#update) - - [Activating a Profile](#activating-a-profile) - - [profile](#profile) - - [Rolling Back Profiles](#rolling-back-profiles) - - [rollback](#rollback) - - [Updating Database](#updating-database) - - [upgrade](#upgrade) - - [Querying System Information](#querying-system-information) - - [check](#check) - - [Automatic Parameter Optimization](#automatic-parameter-optimization) - - [Tuning](#tuning) - - - -## Overview - -- You can run the **atune-adm help/--help/-h** command to query commands supported by atune-adm. -- The **define**, **update**, **undefine**, **collection**, **train**, and **upgrade **commands do not support remote execution. -- In the command format, brackets \(\[\]\) indicate that the parameter is optional, and angle brackets \(<\>\) indicate that the parameter is mandatory. The actual parameters prevail. - - -## Querying Workload Types - - - -### list - -#### Function - -Query the supported profiles, and the values of Active. 
- -#### Format - -**atune-adm list** - -#### Example - -``` -# atune-adm list - -Support profiles: -+------------------------------------------------+-----------+ -| ProfileName | Active | -+================================================+===========+ -| arm-native-android-container-robox | false | -+------------------------------------------------+-----------+ -| basic-test-suite-euleros-baseline-fio | false | -+------------------------------------------------+-----------+ -| basic-test-suite-euleros-baseline-lmbench | false | -+------------------------------------------------+-----------+ -| basic-test-suite-euleros-baseline-netperf | false | -+------------------------------------------------+-----------+ -| basic-test-suite-euleros-baseline-stream | false | -+------------------------------------------------+-----------+ -| basic-test-suite-euleros-baseline-unixbench | false | -+------------------------------------------------+-----------+ -| basic-test-suite-speccpu-speccpu2006 | false | -+------------------------------------------------+-----------+ -| basic-test-suite-specjbb-specjbb2015 | false | -+------------------------------------------------+-----------+ -| big-data-hadoop-hdfs-dfsio-hdd | false | -+------------------------------------------------+-----------+ -| big-data-hadoop-hdfs-dfsio-ssd | false | -+------------------------------------------------+-----------+ -| big-data-hadoop-spark-bayesian | false | -+------------------------------------------------+-----------+ -| big-data-hadoop-spark-kmeans | false | -+------------------------------------------------+-----------+ -| big-data-hadoop-spark-sql1 | false | -+------------------------------------------------+-----------+ -| big-data-hadoop-spark-sql10 | false | -+------------------------------------------------+-----------+ -| big-data-hadoop-spark-sql2 | false | -+------------------------------------------------+-----------+ -| big-data-hadoop-spark-sql3 | false | 
-+------------------------------------------------+-----------+ -| big-data-hadoop-spark-sql4 | false | -+------------------------------------------------+-----------+ -| big-data-hadoop-spark-sql5 | false | -+------------------------------------------------+-----------+ -| big-data-hadoop-spark-sql6 | false | -+------------------------------------------------+-----------+ -| big-data-hadoop-spark-sql7 | false | -+------------------------------------------------+-----------+ -| big-data-hadoop-spark-sql8 | false | -+------------------------------------------------+-----------+ -| big-data-hadoop-spark-sql9 | false | -+------------------------------------------------+-----------+ -| big-data-hadoop-spark-tersort | false | -+------------------------------------------------+-----------+ -| big-data-hadoop-spark-wordcount | false | -+------------------------------------------------+-----------+ -| cloud-compute-kvm-host | false | -+------------------------------------------------+-----------+ -| database-mariadb-2p-tpcc-c3 | false | -+------------------------------------------------+-----------+ -| database-mariadb-4p-tpcc-c3 | false | -+------------------------------------------------+-----------+ -| database-mongodb-2p-sysbench | false | -+------------------------------------------------+-----------+ -| database-mysql-2p-sysbench-hdd | false | -+------------------------------------------------+-----------+ -| database-mysql-2p-sysbench-ssd | false | -+------------------------------------------------+-----------+ -| database-postgresql-2p-sysbench-hdd | false | -+------------------------------------------------+-----------+ -| database-postgresql-2p-sysbench-ssd | false | -+------------------------------------------------+-----------+ -| default-default | false | -+------------------------------------------------+-----------+ -| docker-mariadb-2p-tpcc-c3 | false | -+------------------------------------------------+-----------+ -| docker-mariadb-4p-tpcc-c3 | false | 
-+------------------------------------------------+-----------+ -| hpc-gatk4-human-genome | false | -+------------------------------------------------+-----------+ -| in-memory-database-redis-redis-benchmark | false | -+------------------------------------------------+-----------+ -| middleware-dubbo-dubbo-benchmark | false | -+------------------------------------------------+-----------+ -| storage-ceph-vdbench-hdd | false | -+------------------------------------------------+-----------+ -| storage-ceph-vdbench-ssd | false | -+------------------------------------------------+-----------+ -| virtualization-consumer-cloud-olc | false | -+------------------------------------------------+-----------+ -| virtualization-mariadb-2p-tpcc-c3 | false | -+------------------------------------------------+-----------+ -| virtualization-mariadb-4p-tpcc-c3 | false | -+------------------------------------------------+-----------+ -| web-apache-traffic-server-spirent-pingpo | false | -+------------------------------------------------+-----------+ -| web-nginx-http-long-connection | true | -+------------------------------------------------+-----------+ -| web-nginx-https-short-connection | false | -+------------------------------------------------+-----------+ - -``` - ->![](public_sys-resources/icon-note.gif) **NOTE:** ->If the value of Active is **true**, the profile is activated. In the example, the profile of web-nginx-http-long-connection is activated. - -## Workload Type Analysis and Auto Optimization - - -### analysis - -#### Function - -Collect real-time statistics from the system to identify and automatically optimize workload types. - -#### Format - -**atune-adm analysis** \[OPTIONS\] - -#### Parameter Description - -- OPTIONS - - - - - - - - - - - - - - - - - - - - -

| Parameter | Description |
| ----------------------- | ------------------------------------------------------------ |
| --model, -m | New model generated after user self-training |
| --characterization, -c | Use the default model for application identification and do not perform automatic optimization |
| --times value, -t value | Time duration for data collection |
| --script value, -s value | File to be executed |
- - -#### Example - -- Use the default model for application identification. - - ``` - # atune-adm analysis --characterization - ``` - -- Use the default model to identify applications and perform automatic tuning. - - ``` - # atune-adm analysis - ``` - -- Use the user-defined training model for recognition. - - ``` - # atune-adm analysis --model /usr/libexec/atuned/analysis/models/new-model.m - ``` - - -## User-defined Model - -A-Tune allows users to define and learn new models. To define a new model, perform the following steps: - -1. Run the **define** command to define a new profile. -2. Run the **collection** command to collect the system data corresponding to the application. -3. Run the **train** command to train the model. - - -### define - -#### Function - -Add a user-defined application scenarios and the corresponding profile tuning items. - -#### Format - -**atune-adm define** - -#### Example - -Add a profile whose service_type is **test_service**, application_name is **test_app**, scenario_name is **test_scenario**, and tuning item configuration file is **example.conf**. - -``` -# atune-adm define test_service test_app test_scenario ./example.conf -``` - -The **example.conf** file can be written as follows (the following optimization items are optional and are for reference only). You can also run the **atune-adm info** command to view how the existing profile is written. 
- -``` - [main] - # list its parent profile - [kernel_config] - # to change the kernel config - [bios] - # to change the bios config - [bootloader.grub2] - # to change the grub2 config - [sysfs] - # to change the /sys/* config - [systemctl] - # to change the system service status - [sysctl] - # to change the /proc/sys/* config - [script] - # the script extension of cpi - [ulimit] - # to change the resources limit of user - [schedule_policy] - # to change the schedule policy - [check] - # check the environment - [tip] - # the recommended optimization, which should be performed manunaly -``` - -### collection - -#### Function - -Collect the global resource usage and OS status information during service running, and save the collected information to a CSV output file as the input dataset for model training. - ->![](public_sys-resources/icon-note.gif) **NOTE:** ->- This command depends on the sampling tools such as perf, mpstat, vmstat, iostat, and sar. ->- Currently, only the Kunpeng 920 CPU is supported. You can run the **dmidecode -t processor** command to check the CPU model. - -#### Format - -**atune-adm collection** - -#### Parameter Description - -- OPTIONS - - - - - - - - - - - - - - - - - - - - - - - - - - - -

| Parameter | Description |
| ----------------- | ------------------------------------------------------------ |
| --filename, -f | Name of the generated CSV file used for training: name-timestamp.csv |
| --output_path, -o | Path for storing the generated CSV file. The absolute path is required. |
| --disk, -b | Disk used during service running, for example, /dev/sda. |
| --network, -n | Network port used during service running, for example, eth0. |
| --app_type, -t | Mark the application type of the service as a label for training. |
| --duration, -d | Data collection time during service running, in seconds. The default collection time is 1200 seconds. |
| --interval, -i | Interval for collecting data, in seconds. The default interval is 5 seconds. |
- - -#### Example - -``` -# atune-adm collection --filename name --interval 5 --duration 1200 --output_path /home/data --disk sda --network eth0 --app_type test_service-test_app-test_scenario -``` - -> Note: -> -> In the example, data is collected every 5 seconds for a duration of 1200 seconds. The collected data is stored as the *name* file in the **/home/data** directory. The application type of the service is defined by the `atune-adm define` command, which is **test_service-test_app-test_scenario** in this example. -> The data collection interval and duration can be specified using the preceding command options. - - -### train - -#### Function - -Use the collected data to train the model. Collect data of at least two application types during training. Otherwise, an error is reported. - -#### Format - -**atune-adm train** - -#### Parameter Description - -- OPTIONS - - | Parameter | Description | - | ----------------- | ------------------------------------------------------ | - | --data_path, -d | Path for storing CSV files required for model training | - | --output_file, -o | Model generated through training | - - -#### Example - -Use the CSV file in the **data** directory as the training input. The generated model **new-model.m** is stored in the **model** directory. - -``` -# atune-adm train --data_path /home/data --output_file /usr/libexec/atuned/analysis/models/new-model.m -``` - -### undefine - -#### Function - -Delete a user-defined profile. - -#### Format - -**atune-adm undefine** - -#### Example - -Delete the user-defined profile. - -``` -# atune-adm undefine test_service-test_app-test_scenario -``` - -## Querying Profiles - - -### info - -#### Function - -View the profile content. - -#### Format - -**atune-adm info** - -#### Example - -View the profile content of web-nginx-http-long-connection. 
- -``` -# atune-adm info web-nginx-http-long-connection - -*** web-nginx-http-long-connection: - -# -# nginx http long connection A-Tune configuration -# -[main] -include = default-default - -[kernel_config] -#TODO CONFIG - -[bios] -#TODO CONFIG - -[bootloader.grub2] -iommu.passthrough = 1 - -[sysfs] -#TODO CONFIG - -[systemctl] -sysmonitor = stop -irqbalance = stop - -[sysctl] -fs.file-max = 6553600 -fs.suid_dumpable = 1 -fs.aio-max-nr = 1048576 -kernel.shmmax = 68719476736 -kernel.shmall = 4294967296 -kernel.shmmni = 4096 -kernel.sem = 250 32000 100 128 -net.ipv4.tcp_tw_reuse = 1 -net.ipv4.tcp_syncookies = 1 -net.ipv4.ip_local_port_range = 1024 65500 -net.ipv4.tcp_max_tw_buckets = 5000 -net.core.somaxconn = 65535 -net.core.netdev_max_backlog = 262144 -net.ipv4.tcp_max_orphans = 262144 -net.ipv4.tcp_max_syn_backlog = 262144 -net.ipv4.tcp_timestamps = 0 -net.ipv4.tcp_synack_retries = 1 -net.ipv4.tcp_syn_retries = 1 -net.ipv4.tcp_fin_timeout = 1 -net.ipv4.tcp_keepalive_time = 60 -net.ipv4.tcp_mem = 362619 483495 725238 -net.ipv4.tcp_rmem = 4096 87380 6291456 -net.ipv4.tcp_wmem = 4096 16384 4194304 -net.core.wmem_default = 8388608 -net.core.rmem_default = 8388608 -net.core.rmem_max = 16777216 -net.core.wmem_max = 16777216 - -[script] -prefetch = off -ethtool = -X {network} hfunc toeplitz - -[ulimit] -{user}.hard.nofile = 102400 -{user}.soft.nofile = 102400 - -[schedule_policy] -#TODO CONFIG - -[check] -#TODO CONFIG - -[tip] -SELinux provides extra control and security features to linux kernel. Disabling SELinux will improve the performance but may cause security risks. = kernel -disable the nginx log = application -``` - -## Updating a Profile - -You can update the existing profile as required. - - -### update - -#### Function - -Update the original tuning items in the existing profile to the content in the **new.conf** file. 
- -#### Format - -**atune-adm update** - -#### Example - -Change the tuning item of the profile named **test_service-test_app-test_scenario** to **new.conf**. - -``` -# atune-adm update test_service-test_app-test_scenario ./new.conf -``` - -## Activating a Profile - -### profile - -#### Function - -Manually activate the profile to make it in the active state. - -#### Format - -**atune-adm profile** - -#### Parameter Description - -For details about the profile name, see the query result of the list command. - -#### Example - -Activate the profile corresponding to the web-nginx-http-long-connection. - -``` -# atune-adm profile web-nginx-http-long-connection -``` - -## Rolling Back Profiles - -### rollback - -#### Functions - -Roll back the current configuration to the initial configuration of the system. - -#### Format - -**atune-adm rollback** - -#### Example - -``` -# atune-adm rollback -``` - -## Updating Database - -### upgrade - -#### Function - -Update the system database. - -#### Format - -**atune-adm upgrade** - -#### Parameter Description - -- DB\_FILE - - New database file path. - - -#### Example - -The database is updated to **new\_sqlite.db**. - -``` -# atune-adm upgrade ./new_sqlite.db -``` - -## Querying System Information - - -### check - -#### Function - -Check the CPU, BIOS, OS, and NIC information. 
- -#### Format - -**atune-adm check** - -#### Example - -``` -# atune-adm check - cpu information: - cpu:0 version: Kunpeng 920-6426 speed: 2600000000 HZ cores: 64 - cpu:1 version: Kunpeng 920-6426 speed: 2600000000 HZ cores: 64 - system information: - DMIBIOSVersion: 0.59 - OSRelease: 4.19.36-vhulk1906.3.0.h356.eulerosv2r8.aarch64 - network information: - name: eth0 product: HNS GE/10GE/25GE RDMA Network Controller - name: eth1 product: HNS GE/10GE/25GE Network Controller - name: eth2 product: HNS GE/10GE/25GE RDMA Network Controller - name: eth3 product: HNS GE/10GE/25GE Network Controller - name: eth4 product: HNS GE/10GE/25GE RDMA Network Controller - name: eth5 product: HNS GE/10GE/25GE Network Controller - name: eth6 product: HNS GE/10GE/25GE RDMA Network Controller - name: eth7 product: HNS GE/10GE/25GE Network Controller - name: docker0 product: -``` - -## Automatic Parameter Optimization - -A-Tune provides the automatic search capability with the optimal configuration, saving the trouble of manually configuring parameters and performance evaluation. This greatly improves the search efficiency of optimal configurations. - - -### Tuning - -#### Function - -Use the specified project file to search the dynamic space for parameters and find the optimal solution under the current environment configuration. - -#### Format - -**atune-adm tuning** \[OPTIONS\] - ->![](public_sys-resources/icon-note.gif) **NOTE:** ->Before running the command, ensure that the following conditions are met: ->1. The YAML configuration file on the server has been edited and stored in the **/etc/atuned/tuning/** directory of the atuned service. ->2. The YAML configuration file of the client has been edited and stored on the atuned client. - -#### Parameter Description - -- OPTIONS - - - - - - - - - - - - - - - - - - -

| Parameter | Description |
| ------------- | ------------------------------------------------------------ |
| --restore, -r | Restores the initial configuration before tuning. |
| --project, -p | Specifies the project name in the YAML file to be restored. |
| --restart, -c | Perform tuning based on historical tuning results. |
| --detail, -d | Print detailed information about the tuning process. |
- - - >![](public_sys-resources/icon-note.gif) **NOTE:** - >If this parameter is used, the -p parameter must be followed by a specific project name and the YAML file of the project must be specified. - - -- **PROJECT\_YAML**: YAML configuration file of the client. - -#### Configuration Description - -**Table 1** YAML file on the server - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

| Name | Description | Type | Value Range |
| ------------- | ------------------------------------------------------------ | ---------------- | ----------- |
| project | Project name. | Character string | - |
| startworkload | Script for starting the service to be optimized. | Character string | - |
| stopworkload | Script for stopping the service to be optimized. | Character string | - |
| maxiterations | Maximum number of optimization iterations, which is used to limit the number of iterations on the client. Generally, the more optimization iterations, the better the optimization effect, but the longer the time required. Set this parameter based on the site requirements. | Integer | >10 |
| object | Parameters to be optimized and related information. For details about the object configuration items, see Table 2. | - | - |
- -**Table 2** Description of object configuration items - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

| Name | Description | Type | Value Range |
| ----------- | ------------------------------------------------------------ | ---------------- | ------------------------------------------------------------ |
| name | Parameter to be optimized. | Character string | - |
| desc | Description of parameters to be optimized. | Character string | - |
| get | Script for querying parameter values. | - | - |
| set | Script for setting parameter values. | - | - |
| needrestart | Specifies whether to restart the service for the parameter to take effect. | Enumeration | true or false |
| type | Parameter type. Currently, the discrete and continuous types are supported. | Enumeration | discrete or continuous |
| dtype | This parameter is available only when type is set to discrete. Currently, int, float and string are supported. | Enumeration | int, float, string |
| scope | Parameter setting range. This parameter is valid only when type is set to discrete and dtype is set to int or float, or type is set to continuous. | Integer/Float | The value is user-defined and must be within the valid range of this parameter. |
| step | Parameter value step, which is used when dtype is set to int or float. | Integer/Float | This value is user-defined. |
| items | Enumerated value of which the parameter value is not within the scope. This is used when dtype is set to int or float. | Integer/Float | The value is user-defined and must be within the valid range of this parameter. |
| options | Enumerated value range of the parameter value, which is used when dtype is set to string. | Character string | The value is user-defined and must be within the valid range of this parameter. |
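The `get` and `set` entries are ordinary shell one-liners that A-Tune runs on each tuning iteration, substituting the candidate value for `$value` in the `set` script. The following is a minimal sketch of such a pair; the scratch file and the `compressLevel` parameter are illustrative placeholders, not part of A-Tune itself:

```shell
# Hypothetical tunable: compressLevel in a scratch file (illustration only).
cat > /tmp/demo_compress.py <<'EOF'
compressLevel=6
EOF

# "get" script: report the current value of the parameter.
get_value() {
    grep 'compressLevel=' /tmp/demo_compress.py | awk -F '=' '{print $2}'
}

# "set" script: A-Tune substitutes the candidate value for $value each iteration.
value=9
sed -i "s/compressLevel=[0-9]*/compressLevel=$value/g" /tmp/demo_compress.py

get_value   # prints 9
```

Because the tuner only sees what `get` prints and only changes what `set` rewrites, both scripts must address exactly the same parameter, as in the full YAML example later in this section.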
- -**Table 3** Description of configuration items of a YAML file on the client - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

| Name | Description | Type | Value Range |
| --------------------- | ------------------------------------------------------------ | ---------------- | ------------------------------------------------- |
| project | Project name, which must be the same as that in the configuration file on the server. | Character string | - |
| engine | Tuning algorithm. | Character string | "random", "forest", "gbrt", "bayes", "extraTrees" |
| iterations | Number of optimization iterations. | Integer | ≥ 10 |
| random_starts | Number of random iterations. | Integer | < iterations |
| feature_filter_engine | Parameter search algorithm, which is used to select important parameters. This parameter is optional. | Character string | "lhs" |
| feature_filter_cycle | Parameter search cycles, which is used to select important parameters. This parameter is used together with feature_filter_engine. | Integer | - |
| feature_filter_iters | Number of iterations for each cycle of parameter search, which is used to select important parameters. This parameter is used together with feature_filter_engine. | Integer | - |
| split_count | Number of evenly selected parameters in the value range of tuning parameters, which is used to select important parameters. This parameter is used together with feature_filter_engine. | Integer | - |
| benchmark | Performance test script. | - | - |
| evaluations | Performance test evaluation index. For details about the evaluations configuration items, see Table 4. | - | - |
- - -**Table 4** Description of evaluations configuration item - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

| Name | Description | Type | Value Range |
| --------- | ------------------------------------------------------------ | ---------------- | -------------------- |
| name | Evaluation index name. | Character string | - |
| get | Script for obtaining performance evaluation results. | - | - |
| type | Specifies a positive or negative type of the evaluation result. The value positive indicates that the performance value is minimized, and the value negative indicates that the performance value is maximized. | Enumeration | positive or negative |
| weight | Weight of the index. The value ranges from 0 to 100. | Integer | 0-100 |
| threshold | Minimum performance requirement of the index. | Integer | User-defined |
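Each evaluation's `get` script receives the raw output of the benchmark script through the `$out` placeholder and must print a single number for that index. A sketch of how such scripts extract their metrics; the sample benchmark output below is hypothetical, but the grep/awk pattern matches the client YAML example later in this section:

```shell
# Hypothetical benchmark output; in a real run A-Tune captures this from
# the benchmark script and passes it to each evaluation's "get" script as $out.
out='time used 12.5
compress_ratio is 3.2'

# Evaluation "get" scripts: each pulls one metric out of $out.
time_val=$(echo "$out" | grep 'time' | awk '{print $3}')
ratio_val=$(echo "$out" | grep 'compress_ratio' | awk '{print $3}')

echo "time=$time_val ratio=$ratio_val"   # prints "time=12.5 ratio=3.2"
```

The tuner then combines the printed values according to each index's type and weight, so the benchmark script's output format and the `get` scripts must be kept in sync.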
- -#### Example - -The following is an example of the YAML file configuration on a server: - -``` -project: "compress" -maxiterations: 500 -startworkload: "" -stopworkload: "" -object : - - - name : "compressLevel" - info : - desc : "The compresslevel parameter is an integer from 1 to 9 controlling the level of compression" - get : "cat /root/A-Tune/examples/tuning/compress/compress.py | grep 'compressLevel=' | awk -F '=' '{print $2}'" - set : "sed -i 's/compressLevel=\\s*[0-9]*/compressLevel=$value/g' /root/A-Tune/examples/tuning/compress/compress.py" - needrestart : "false" - type : "continuous" - scope : - - 1 - - 9 - dtype : "int" - - - name : "compressMethod" - info : - desc : "The compressMethod parameter is a string controlling the compression method" - get : "cat /root/A-Tune/examples/tuning/compress/compress.py | grep 'compressMethod=' | awk -F '=' '{print $2}' | sed 's/\"//g'" - set : "sed -i 's/compressMethod=\\s*[0-9,a-z,\"]*/compressMethod=\"$value\"/g' /root/A-Tune/examples/tuning/compress/compress.py" - needrestart : "false" - type : "discrete" - options : - - "bz2" - - "zlib" - - "gzip" - dtype : "string" -``` - -   - -The following is an example of the YAML file configuration on a client: - -``` -project: "compress" -engine : "gbrt" -iterations : 20 -random_starts : 10 - -benchmark : "python3 /root/A-Tune/examples/tuning/compress/compress.py" -evaluations : - - - name: "time" - info: - get: "echo '$out' | grep 'time' | awk '{print $3}'" - type: "positive" - weight: 20 - - - name: "compress_ratio" - info: - get: "echo '$out' | grep 'compress_ratio' | awk '{print $3}'" - type: "negative" - weight: 80 -``` - -   - -#### Example - -- Download test data. - ``` - wget http://cs.fit.edu/~mmahoney/compression/enwik8.zip - ``` -- Prepare the tuning environment. 
- - Example of **prepare.sh**: - ``` - #!/usr/bin/bash - if [ "$#" -ne 1 ]; then - echo "USAGE: $0 the path of enwik8.zip" - exit 1 - fi - - path=$( - cd "$(dirname "$0")" - pwd - ) - - echo "unzip enwik8.zip" - unzip "$path"/enwik8.zip - - echo "set FILE_PATH to the path of enwik8 in compress.py" - sed -i "s#compress/enwik8#$path/enwik8#g" "$path"/compress.py - - echo "update the client and server yaml files" - sed -i "s#python3 .*compress.py#python3 $path/compress.py#g" "$path"/compress_client.yaml - sed -i "s# compress/compress.py# $path/compress.py#g" "$path"/compress_server.yaml - - echo "copy the server yaml file to /etc/atuned/tuning/" - cp "$path"/compress_server.yaml /etc/atuned/tuning/ - ``` - Run the script. - ``` - sh prepare.sh enwik8.zip - ``` -- Run the `tuning` command to tune the parametres. - - ``` - atune-adm tuning --project compress --detail compress_client.yaml - ``` - -- Restore the configuration before runinng `tuning`. **compress** indicates the project name in the YAML file. - - ``` - atune-adm tuning --restore --project compress - ``` \ No newline at end of file diff --git a/docs/en/docs/Administration/memory-management.md b/docs/en/docs/Administration/memory-management.md deleted file mode 100644 index 0aa95dd06936f32f16618b64dd7267114821e4a1..0000000000000000000000000000000000000000 --- a/docs/en/docs/Administration/memory-management.md +++ /dev/null @@ -1,799 +0,0 @@ -# etmem - -## Introduction - -The development of CPU computing power - particularly lower costs of ARM cores - makes memory cost and capacity become the core frustration that restricts business costs and performance. Therefore, the most pressing issue is how to save memory cost and how to expand memory capacity. - -etmem is a tiered memory expansion technology that uses DRAM+memory compression/high-performance storage media to form tiered memory storage. 
Memory data is tiered, and cold data is migrated from memory media to high-performance storage media to release memory space and reduce memory costs. - -The etmem software package runs on the etmem client and etmemd server. The etmemd server is resident after being started. It implements functions such as hot and cold memory identification and elimination for the target process. The etmem client runs once when being invoked and controls the etmemd server to respond to different operations based on command options. - -## Compilation Tutorial - -1. Download the etmem source code. - - ```bash - $ git clone https://gitee.com/openeuler/etmem.git - ``` - -2. Install the compilation and running dependency. - - The compilation and running of etmem depend on the libboundscheck component. - -3. Build source code. - - ```bash - $ cd etmem - - $ mkdir build - - $ cd build - - $ cmake .. - - $ make - ``` - -## Precautions - -### Running Dependencies - -As a memory expansion tool, etmem depends on kernel-mode features. To identify memory access and proactively write memory to the swap partition for vertical memory expansion, `etmem_scan` and `etmem_swap` modules need to be inserted when etmem is running. - -```bash -modprobe etmem_scan -modprobe etmem_swap -``` - -### Permission Control - -The root permission is required for running the etmem process. The **root** user has the highest permission in the system. When performing operations as the **root** user, strictly follow the operation guide to prevent system management and security risks caused by other operations. - -### Constraints - -- The etmem client and server must be deployed on the same server. Cross-server communication is not supported. -- etmem can scan only the target processes whose names contain fewer than or equal to 15 characters. The process name can contain letters, digits, special characters ./%-_, and any combination of the preceding three types of characters. Other combinations are invalid. 
-- When the AEP media is used for memory expansion, the system must be able to correctly identify AEP devices and initialize the AEP devices as NUMA nodes. In addition, the `vm_flags` field in the configuration file can only be set to `ht`. -- Private engine commands are valid only for the corresponding engine and tasks under the engine, for example, `showhostpages` and `showtaskpages` supported by cslide. -- In the third-party policy implementation code, the `fd` field in the `eng_mgt_func` interface cannot be set to `0xff` or `0xfe`. -- Multiple third-party policy dynamic libraries can be added to a project. They are differentiated by `eng_name` in the configuration file. -- Do not scan the same process concurrently. -- Do not use the `/proc/xxx/idle_pages` and `/proc/xxx/swap_pages` files when `etmem_scan.ko` and`etmem_swap.ko` are not loaded. -- The owner of the etmem configuration file must be the **root** user, the permission must be **600** or **400**, and the size of the configuration file cannot exceed 10 MB. -- When etmem injects third-party policies, the owner of the `so` files of the third-party policies must be the **root** user and the permission must be **500** or **700**. - -## Instructions - -### etmem Configuration File - -Before running the etmem process, the administrator needs to plan the processes that require memory expansion, configure the process information in the etmem configuration file, and configure the memory scan loops and times, and cold and hot memory thresholds. - -The sample configuration files are stored in the `/etc/etmem` directory in the source package. There are three sample files by function. 
- -```text -/etc/etmem/cslide_conf.yaml -/etc/etmem/slide_conf.yaml -/etc/etmem/thirdparty_conf.yaml -``` - -The samples are as follows: - -```sh -#Example of the slide engine -#slide_conf.yaml -[project] -name=test -loop=1 -interval=1 -sleep=1 -sysmem_threshold=50 -swapcache_high_vmark=10 -swapcache_low_vmark=6 - -[engine] -name=slide -project=test - -[task] -project=test -engine=slide -name=background_slide -type=name -value=mysql -T=1 -max_threads=1 -swap_threshold=10g -swap_flag=yes - -#Example of the cslide engine -#cslide_conf.yaml -[engine] -name=cslide -project=test -node_pair=2,0;3,1 -hot_threshold=1 -node_mig_quota=1024 -node_hot_reserve=1024 - -[task] -project=test -engine=cslide -name=background_cslide -type=pid -name=23456 -vm_flags=ht -anon_only=no -ign_host=no - -#Example of the thirdparty engine -#thirdparty_conf.yaml -[engine] -name=thirdparty -project=test -eng_name=my_engine -libname=/usr/lib/etmem_fetch/my_engine.so -ops_name=my_engine_ops -engine_private_key=engine_private_value - -[task] -project=test -engine=my_engine -name=background_third -type=pid -value=12345 -task_private_key=task_private_value -``` - -Fields in the configuration file are described as follows. - -| Item | Description | Mandatory (Yes/No)| With Parameters (Yes/No)| Value Range | Example Description | -|-----------|---------------------|------|-------|------------|-----------------------------------------------------------------| -| [project] | Start flag of the project common configuration section | No | No | N/A | Start flag of the `project` configuration item, indicating that the following configuration items, before another *[xxx]* or to the end of the file, belong to the project section.| -| name | Name of a project | Yes | Yes | A string of fewer than 64 characters| Identifies a project. When configuring an engine or task, you need to specify the project to which the engine or task is mounted. 
| -| loop | Number of memory scan loops | Yes | Yes | 1 to 120 | `loop=3` // Scan for three times. | -| interval | Interval for memory scans | Yes | Yes | 1 to 1200 | `interval=5` // The scan interval is 5s. | -| sleep | Interval between large loops of memory scans and operations| Yes | Yes | 1 to 1200 | `sleep=10` // The interval between large loops is 10s. | -| sysmem_threshold| Configuration item of the `slide` engine, which specifies the threshold of the system memory swap-out | No | Yes | 0 to 100 | `sysmem_threshold=50` // etmem triggers memory swap-out only when the remaining system memory is less than 50%.| -| swapcache_high_wmark| Configuration item of the `slide` engine, which specifies the high watermark of the system memory occupied by the swap cache| No | Yes | 1 to 100 | `swapcache_high_wmark=5` // The swap cache memory usage can be 5% of the system memory. If the usage exceeds 5%, etmem triggers swap cache reclamation.
Note: The value of `swapcache_high_wmark` must be greater than that of `swapcache_low_wmark`.| -| swapcache_low_wmark| Configuration item of the `slide` engine, which specifies the low watermark of the system memory occupied by the swap cache| No | Yes | [1, **swapcache_high_wmark**) | `swapcache_low_wmark=3` // After swap cache reclamation is triggered, the system reclaims the swap cache memory until the usage is reduced to less than 3%.| -| [engine] | Start flag of the engine common configuration section | No | No | N/A | Start flag of the `engine` configuration item, indicating that the following configuration items, before another *[xxx]* or to the end of the file, belong to the engine section.| -| project | Project | Yes | Yes | A string of fewer than 64 characters | If a project named **test** already exists, enter **project=test**. | -| engine | Engine | Yes | Yes | `slide`, `cslide`, or `thirdparty` | Identifies the slide, cslide, or thirdparty policy. | -| node_pair | Configuration item of the `cslide` engine, which specifies the node pair of the AEP and DRAM in the system| Mandatory when `engine` is set to `cslide`| Yes | Node IDs of the AEP and DRAM are configured in pairs and separated by commas (,). Node pairs are separated by semicolons (;).| `node_pair=2,0;3,1` | -| hot_threshold | Configuration item of the `cslide` engine, which specifies the threshold of the hot and cold memory | Mandatory when `engine` is set to `cslide`| Yes | An integer greater than or equal to `0` and less than or equal to `INT_MAX` | `hot_threshold=3` // Memory accessed fewer than 3 times is identified as cold memory. | -|node_mig_quota|Configuration item of the `cslide` engine, which specifies the maximum unidirectional traffic during each migration between the DRAM and AEP|Mandatory when `engine` is set to `cslide`|Yes|An integer greater than or equal to `0` and less than or equal to `INT_MAX`|`node_mig_quota=1024` // The unit is MB. 
A maximum of 1,024 MB data can be migrated from the AEP to the DRAM or from the DRAM to the AEP at a time.| -|node_hot_reserve|Configuration item of the `cslide` engine, which specifies the size of the reserved space for the hot memory in the DRAM|Mandatory when `engine` is set to `cslide`|Yes|An integer greater than or equal to `0` and less than or equal to `INT_MAX`|`node_hot_reserve=1024` // The unit is MB. When the hot memory of all VMs is greater than the value of this configuration item, the hot memory is migrated to the AEP.| -|eng_name|Configuration item of the `thirdparty` engine, which specifies the engine name and is used for task mounting|Mandatory when `engine` is set to `thirdparty`|Yes|A string of fewer than 64 characters|`eng_name=my_engine` // To mount a task to the thirdparty engine, you can enter `engine=my_engine` in the task.| -|libname|Configuration item of the `thirdparty` engine, which specifies the address of the dynamic library of the third-party policy. The address is an absolute address.|Mandatory when `engine` is set to `thirdparty`|Yes|A string of fewer than 256 characters|libname=/user/lib/etmem_fetch/code_test/my_engine.so| -|ops_name|Configuration item of the `thirdparty` engine, which specifies the name of the operator in the dynamic library of the third-party policy|Mandatory when `engine` is set to `thirdparty`|Yes|A string of fewer than 256 characters|`ops_name=my_engine_ops` // Name of the structure of the third-party policy implementation interface| -|engine_private_key|(Optional) Configuration item of the `thirdparty` engine. 
This configuration item is reserved for the third-party policy to parse private parameters.|No|No|Configured based on the private parameters of the third-party policy|Set this configuration item based on the private engine configuration items of the third-party policy.| -| [task] | Start flag of the task common configuration section| No| No| N/A | Start flag of the `task configuration item`, indicating that the following configuration items, before another *[xxx]* or to the end of the file, belong to the task section.| -| project | Project to which a task is mounted | Yes| Yes| A string of fewer than 64 characters | If a project named **test** already exists, enter **project=test**. | -| engine | Engine to which a task is mounted | Yes| Yes| A string of fewer than 64 characters | Specifies the engine to which a task is mounted. | -| name | Name of a task | Yes| Yes| A string of fewer than 64 characters | `name=background1` // The task name is `backgound1`. | -| type | Method of identifying the target process | Yes| Yes| `pid` or `name` | `pid` indicates that the process is identified by the process ID, and `name` indicates that the process is identified by the process name. | -| value | Specific fields identified by the target process | Yes| Yes| Actual process ID/name| This configuration item is used together with the `type` configuration item to specify the ID or name of the target process. Ensure that the configuration is correct and unique. | -| T | Task configuration item of the `slide` engine, which specifies the threshold of the hot and cold memory | Mandatory when `engine` is set to `slide`| Yes| 0 to **loop** x 3 | `T=3` // Memory accessed fewer than 3 times is identified as cold memory. | -| max_threads | Task configuration item of the `slide` engine, which specifies the maximum number of threads in the internal thread pool of etmemd. Each thread processes a memory scan+operation task of a process or child process.| No | Yes| 1 to 2 x Number of cores + 1. 
The default value is `1`.| This configuration item controls the number of internal processing threads of etmemd. When the target process has multiple child processes, a larger value of this configuration item indicates a larger number of concurrent executions but more occupied resources.| -| vm_flags | Task configuration item of the `cslide` engine, which specifies the flag of the VMA to be scanned. If this configuration item is not configured, the VMA is not distinguished. | No | Yes| The value is a string of fewer than 256 characters. Different flags are separated by spaces. | `vm_flags=ht` // Scan the VMA memory whose flag is `ht` (huge page). | -| anon_only | Task configuration item of the `cslide` engine, which specifies whether to scan only anonymous pages | No | Yes| `yes` or `no` | `anon_only=no` // `yes` indicates that only anonymous pages are scanned. `no` indicates that non-anonymous pages are also scanned. | -| ign_host | Task configuration item of the `cslide` engine, which specifies whether to ignore the page table scan information on the host | No | Yes| `yes` or `no` | `ign_host=no` // `yes`: Ignore; `no`: Do not ignore. | -| task_private_key | (Optional) Task configuration item of the `thirdparty` engine. This configuration item is reserved for the task of the third-party policy to parse private parameters. | No | No| Configured based on the private parameters of the third-party policy | Configured based on the private task parameters of the third-party policy | -| swap_threshold |Configuration item of the `slide` engine, which specifies the threshold of the process memory swap-out | No | Yes| Absolute value of the available memory of a process | `swap_threshold=10g` // If the memory usage of a process is less than 10 GB, swap-out is not triggered.
In the current version, only g/G can be used as the absolute memory unit. This configuration item is used together with `sysmem_threshold`. When the system memory is lower than the threshold, the system checks the threshold of the processes in the allowlist.|
-| swap_flag|Configuration item of the `slide` engine, which specifies process memory to be swapped out | No | Yes| `yes` or `no` | `swap_flag=yes` // Specify process memory to be swapped out.|
-
-### Starting the etmemd Server
-
-When using the services provided by etmem, you need to modify the corresponding configuration file as required, and then run the etmemd server to operate the memory of the target process. In addition to starting the etmemd process in binary mode on the CLI, you can write a `service` file to enable the etmemd server to start the etmemd process in `systemctl` mode. In this scenario, you need to use the `mode-systemctl` parameter to specify whether to enable the function.
-
-#### How to Use
-
-You can run the following command to start the etmemd server:
-
-```bash
-etmemd -l 0 -s etmemd_socket
-```
-
-Or
-
-```bash
-etmemd --log-level 0 --socket etmemd_socket
-```
-
-`0` in `-l` and `etmemd_socket` in `-s` are user-defined parameters. For details about the parameters, see the following table.
-
-#### Command-Line Options
-
-| Option | Description | Mandatory (Yes/No)| With Parameters (Yes/No)| Value Range | Example Description |
-| --------------- | ---------------------------------- | -------- | ---------- | --------------------- | ------------------------------------------------------------ |
-| `-l` or `\-\-log-level` | etmemd log level. | No | Yes | 0 to 3 | `0`: debug level.
`1`: info level.
`2`: warning level.
`3`: error level.
Only logs of the level that is higher than or equal to the configured level are recorded in the `/var/log/message` file.| -| `-s` or `\-\-socket` | Name of the etmemd listener, which is used to interact with the client.| Yes | Yes | A string of fewer than 107 characters| Specifies the name of the server listener. | -| `-m` or `\-\-mode-systemctl`| Starts the etmemd server in systemctl mode.| No| No| N/A| The `-m` option must be specified in the `service` file.| -| `-h` or `\-\-help` | Prints help information. | No | No | N/A | If this option is specified, the command execution exits after the command output is printed. | - -### Adding or Deleting a Project, Engine, or Task on the etmem Client - -#### Scenarios - -1. The administrator adds an etmem project, engine, or task. (A project can contain multiple etmem engines, and an engine can contain multiple tasks.) - -2. The administrator deletes an existing etmem project, engine, or task. (Before a project is deleted, all tasks in the project automatically stop.) - -#### How to Use - -After the etmemd server runs properly, you can use the `obj` parameter on the etmem client to add or delete a project, engine, or task. The project, engine, or task is identified based on the content configured in the configuration file. - -- Add an object. - ```bash - etmem obj add -f /etc/etmem/slide_conf.yaml -s etmemd_socket - ``` - - Or - - ```bash - etmem obj add --file /etc/etmem/slide_conf.yaml --socket etmemd_socket - ``` - -- Delete an object. 
- ```bash - etmem obj del -f /etc/etmem/slide_conf.yaml -s etmemd_socket - ``` - - Or - - ```bash - etmem obj del --file /etc/etmem/slide_conf.yaml --socket etmemd_socket - ``` - -#### Command-Line Options - - -| Option | Description | Mandatory (Yes/No)| With Parameters (Yes/No)| Example Description | -| ------------ | ------------------------------------------------------------ | -------- | ---------- | -------------------------------------------------------- | -| `-f` or `\-\-file` | Configuration file of the specified object | Yes | Yes | Specifies the file path. | -| `-s` or `\-\-socket` | Name of the socket for communicating with the etmemd server. The value must be the same as that specified when the etmemd server is started.| Yes | Yes | This option is mandatory. When there are multiple etmemd servers, the administrator selects an etmemd server to communicate with.| - -### Querying, Starting, or Stopping a Project on the etmem Client - -#### Scenarios - -After adding a project by running the `etmem obj add` command, the administrator can start or stop the etmem project before running the `etmem obj del` command to delete the project. - -1. The administrator starts an added project. - -2. The administrator stops a project that has been started. - -When the administrator runs the `obj del` command to delete a project, the project automatically stops if it has been started. - -#### How to Use - -For a project that has been successfully added, you can run the `etmem project` command to start or stop the project. Example commands are as follows: - -- Query a project. - - ```bash - etmem project show -n test -s etmemd_socket - ``` - - Or - - ```bash - etmem project show --name test --socket etmemd_socket - ``` - -- Start a project. - - ```bash - etmem project start -n test -s etmemd_socket - ``` - - Or - - ```bash - etmem project start --name test --socket etmemd_socket - ``` - -- Stop a project. 
-
-  ```bash
-  etmem project stop -n test -s etmemd_socket
-  ```
-
-  Or
-
-  ```bash
-  etmem project stop --name test --socket etmemd_socket
-  ```
-
-- Print help information.
-
-  ```bash
-  etmem project help
-  ```
-
-#### Command-Line Options
-
-| Option | Description | Mandatory (Yes/No)| With Parameters (Yes/No)| Example Description |
-| ------------ | ------------------------------------------------------------ | -------- | ---------- | -------------------------------------------------------- |
-| `-n` or `\-\-name` | Project name | Yes | Yes | Project name, which corresponds to the configuration file. |
-| `-s` or `\-\-socket` | Name of the socket for communicating with the etmemd server. The value must be the same as that specified when the etmemd server is started.| Yes | Yes | This option is mandatory. When there are multiple etmemd servers, the administrator selects an etmemd server to communicate with.|
-
-### Performing Memory Swap-out on the etmem Client Based on the Memory Swap-out Threshold and Flag
-
-Among the currently supported policies, only the `slide` policy supports private functions and features.
-
-- Swapping out process or system memory based on threshold
-
-To achieve optimal service performance, the timing of etmem memory swap-out must be considered. When the available system memory is sufficient and the system memory pressure is low, memory swap-out is not performed. Likewise, memory swap-out is not performed for processes with low memory usage. Separate thresholds are therefore provided to control system memory swap-out and process memory swap-out.
-
-- Swapping out the specified process memory
-
-In the storage environment, I/O latency-sensitive server processes must keep their memory resident. Therefore, a mechanism is provided for services to specify which memory can be swapped out.
-
-You can add the `sysmem_threshold`, `swap_threshold`, and `swap_flag` parameters to the configuration file. 
For details, see the description of the etmem configuration file. - -```sh -#slide_conf.yaml -[project] -name=test -loop=1 -interval=1 -sleep=1 -sysmem_threshold=50 - -[engine] -name=slide -project=test - -[task] -project=test -engine=slide -name=background_slide -type=name -value=mysql -T=1 -max_threads=1 -swap_threshold=10g -swap_flag=yes -``` - -#### Swapping Out System Memory Based on Threshold - -In the configuration file, `sysmem_threshold` indicates the threshold for system memory swap-out. The value of `sysmem_threshold` ranges from 0 to 100. If `sysmem_threshold` is configured in the configuration file, etmem triggers memory swap-out only when the available system memory is less than the value of `sysmem_threshold`. - -Procedure: - -1. Compile the configuration file. Configure the `sysmem_threshold` parameter in the configuration file, for example, `sysmem_threshold=20`. -2. Start the server, and add and start a project. - - ```bash - etmemd -l 0 -s monitor_app & - etmem obj add -f etmem_config -s monitor_app - etmem project start -n test -s monitor_app - etmem project show -s monitor_app - ``` - -3. Check the memory swap-out result. etmem triggers memory swap-out only when the available system memory is less than 20%. - -#### Swapping Out Process Memory Based on Threshold - -In the configuration file, `swap_threshold` indicates the threshold for process memory swap-out. `swap_threshold` specifies the absolute value of the process memory usage (*number*+**g**/**G**). If `swap_threshold` is configured in the configuration file, etmem will not trigger memory swap-out for a process when the memory usage of the process is less than the value of `swap_threshold`. - -Procedure: - -1. Compile the configuration file. Configure the `swap_threshold` parameter in the configuration file, for example, `swap_threshold=5g`. -2. Start the server, and add and start a project. 
- - ```bash - etmemd -l 0 -s monitor_app & - etmem obj add -f etmem_config -s monitor_app - etmem project start -n test -s monitor_app - etmem project show -s monitor_app - ``` - -3. Check the memory swap-out result. etmem triggers memory swap-out only when the absolute value of the memory occupied by the process is greater than 5 GB. - -#### Swapping Out the Specified Process Memory - -In the configuration file, `swap_flag` specifies the process memory that can be swapped out. `swap_flag` can be set to `yes` or `no`. If `swap_flag` is set to `no` or not set in the configuration file, the memory swap-out function of etmem remains unchanged. If `swap_flag` is set to `yes`, only the specified process memory can be swapped out. - -Procedure: - -1. Compile the configuration file. Configure the `swap_flag` parameter in the configuration file, for example, `swap_flag=yes`. -2. Mark the process memory to be swapped out. - - ```bash - madvise(addr_start, addr_len, MADV_SWAPFLAG) - ``` - -3. Start the server, and add and start a project. - - ```bash - etmemd -l 0 -s monitor_app & - etmem obj add -f etmem_config -s monitor_app - etmem project start -n test -s monitor_app - etmem project show -s monitor_app - ``` - -4. Check the memory swap-out result. Only the marked process memory is swapped out. Other memory is retained in the DRAM and will not be swapped out. - -In the scenario where a specified page of a process is swapped out, the `ioctl` call is added to the original scan interface `idle_pages` to ensure that the VMA without a specific flag is not scanned or swapped out. - -Scan Management Interface - -- Prototype - - ```c - ioctl(fd, cmd, void *arg); - ``` - -- Input parameters - - ```text - 1. fd: file descriptor, which is obtained by the open call in /proc/pid/idle_pages. - - 2. cmd: controls the scanning behavior. Currently, the following commands are supported: - VMA_SCAN_ADD_FLAGS: adds a VMA swap-out flag. Only VMAs with the specified flag are scanned. 
- VMA_SCAN_REMOVE_FLAGS: removes the new VMA swap-out flag. - - 3. args: int pointer argument, which is used to transfer the specific flag mask. Currently, only the following argument is supported: - VMA_SCAN_FLAG: Before the etmem_scan.ko module starts scanning, the `walk_page_test` interface is called to check whether the VMA address meets the scanning requirements. If this flag is set, only the VMA address segment with a specific swap-out flag is scanned, and other VMA addresses are ignored. - ``` - -- Return value - - ```text - 1. If the operation is successful, 0 is returned. - 2. If the operation fails, a non-zero value is returned. - ``` - -- Note - - ```text - All unsupported flags are ignored, but no error is returned. - ``` - -### Reclaiming Swap Cache Memory on the etmem Client - -The user-mode etmem initiates a memory eviction and reclamation operation and interacts with the kernel-mode memory reclamation module through the `write procfs` interface. The kernel-mode memory reclamation module parses the virtual address delivered by the user-mode etmem, obtains the page corresponding to the address, and calls the native kernel interface to swap out the memory corresponding to the page for reclamation. During memory swap-out, the swap cache occupies certain system memory. To further save memory, the swap cache memory reclamation function is added. - -You can add the `swapcache_high_wmark` and `swapcache_low_wmark` parameters to the configuration file to use this function. - -- `swapcache_high_wmark`: high watermark of the system memory that can be occupied by the swap cache. -- `swapcache_low_wmark`: low watermark of the system memory that can be occupied by the swap cache. - -After performing a memory swap-out, etmem checks the memory usage of the swap cache. If the memory usage exceeds the high watermark, etmem delivers the `ioctl` command in `swap_pages` to trigger swap cache memory reclamation. 
The reclamation stops when the memory usage comes down to the low watermark.
-
-The following is an example of parameter configuration. For details, see the sections related to the etmem configuration file.
-
-```sh
-#slide_conf.yaml
-[project]
-name=test
-loop=1
-interval=1
-sleep=1
-swapcache_high_wmark=5
-swapcache_low_wmark=3
-
-[engine]
-name=slide
-project=test
-
-[task]
-project=test
-engine=slide
-name=background_slide
-type=name
-value=mysql
-T=1
-max_threads=1
-```
-
-In the swap-out scenario, the swap cache memory needs to be reclaimed to further save the memory. The `ioctl` call is added to the `swap_pages` interface to set the swap cache watermark and enable or disable the swap cache memory reclamation.
-
-- Prototype
-
-  ```c
-  ioctl(fd, cmd, void *arg);
-  ```
-
-- Input parameters
-
-  ```text
-  1. fd: file descriptor, which is obtained by the open call in /proc/pid/idle_pages.
-
-  2. cmd: controls the swap cache reclamation behavior. Currently, the following commands are supported:
-     RECLAIM_SWAPCACHE_ON: enables swap cache memory swap-out.
-     RECLAIM_SWAPCACHE_OFF: disables swap cache memory swap-out.
-     SET_SWAPCACHE_WMARK: specifies the swap cache memory watermark.
-
-  3. args: int pointer argument, which is used to transfer the swap cache watermark when SET_SWAPCACHE_WMARK is specified.
-  ```
-
-- Return value
-
-  ```text
-  1. If the operation is successful, 0 is returned.
-  2. If the operation fails, a non-zero value is returned.
-  ```
-
-- Note
-
-  ```text
-  All unsupported flags are ignored, but no error is returned.
-  ```
-
-### Executing Private Engine Commands or Functions on the etmem Client
-
-Among the supported policies, only the `cslide` policy supports private commands.
-
-- `showtaskpages`
-- `showhostpages`
-
-You can run these commands to view the page access information related to the task and the system huge page usage on the host of the VM. 
-
-The following are example commands:
-
-```bash
-etmem engine showtaskpages <-t task_name> -n proj_name -e cslide -s etmemd_socket
-
-etmem engine showhostpages -n proj_name -e cslide -s etmemd_socket
-```
-
-**Note**: `showtaskpages` and `showhostpages` support only the cslide engine.
-
-#### Command-Line Options
-| Option| Description | Mandatory (Yes/No)| With Parameters (Yes/No)| Example Description|
-|----|------|------|-------|------|
-|`-n` or `\-\-proj_name`| Project name| Yes| Yes| Specifies the name of an existing project to be executed.|
-|`-s` or `\-\-socket`| Name of the socket for communicating with the etmemd server. The value must be the same as that specified when the etmemd process is started.| Yes| Yes| This option is mandatory. When there are multiple etmemd servers, the administrator selects an etmemd server to communicate with.|
-|`-e` or `\-\-engine`| Name of the engine to be executed| Yes| Yes| Specifies the name of an existing engine to be executed.|
-|`-t` or `\-\-task_name`| Name of the task to be executed| No| Yes| Specifies the name of an existing task to be executed.|
-
-### Enabling and Disabling the Kernel Swap Function
-
-When etmem is used for memory expansion, you can determine whether to enable the kernel swap function. You can disable the native swap mechanism of the kernel to prevent it from swapping out memory that should not be swapped out, which could cause user-mode process exceptions.
-
-The sys interface is provided to implement the preceding control. The `kobj` object named `kernel_swap_enable` is created in the `/sys/kernel/mm/swap` directory. It is used to enable or disable kernel swap. The default value is `true`.
-
-Examples:
-
-```sh
-#Enable kernel swap.
-echo true > /sys/kernel/mm/swap/kernel_swap_enable
-# Or
-echo 1 > /sys/kernel/mm/swap/kernel_swap_enable
-
-#Disable kernel swap.
-echo false > /sys/kernel/mm/swap/kernel_swap_enable
-# Or
-echo 0 > /sys/kernel/mm/swap/kernel_swap_enable
-
-```
-
-### Automatically Starting etmem with System
-
-#### Scenarios
-
-etmemd can be configured as a `systemd` service and started in `fork` mode.
-
-#### How to Use
-
-Write the `service` configuration file to start etmemd. Use the `-m` option to specify the mode. For example:
-
-```bash
-etmemd -l 0 -s etmemd_socket -m
-```
-
-#### Command-Line Options
-| Option | Description | Mandatory (Yes/No)| With Parameters (Yes/No)| Value Range| Example Description |
-|----------------|------------|------|-------|------|-----------|
-| `-l` or `\-\-log-level` | etmemd log level.| No | Yes | 0 to 3 | `0`: debug level. `1`: info level. `2`: warning level. `3`: error level. Only logs of the level that is higher than or equal to the configured level are recorded in the `/var/log/message` file.|
-| `-s` or `\-\-socket` |Name of the etmemd listener, which is used to interact with the client.| Yes| Yes| A string of fewer than 107 characters| Name of the server listener|
-|`-m` or `\-\-mode-systemctl`| When etmemd is started as a service, this option must be specified in the command.| No| No| N/A| N/A|
-| `-h` or `\-\-help` | Prints help information.| No |No|N/A|If this option is specified, the command execution exits after the command output is printed.|
-
-### Supporting Third-Party Memory Extension Policies
-
-#### Scenarios
-
-etmem allows you to register third-party memory extension policies and provides the dynamic library of the scan module. When etmem is running, the third-party policy eviction algorithm is used to evict the memory.
-
-You can use the dynamic library of the scan module provided by etmem and implement the interfaces in the structure required for connecting to etmem.
-
-#### How to Use
-
-To use a third-party extended eviction policy, perform the following steps:
-
-1. 
Invoke the scan interface provided by the scan module as required.
-
-2. Implement each interface based on the function template provided in the etmem header file and encapsulate the interfaces into structures.
-
-3. Compile the dynamic library of the third-party extended eviction policy.
-
-4. Specify the `thirdparty` engine in the configuration file as required.
-
-5. Enter the dynamic library name and interface structure name in the `task` field in the configuration file as required.
-
-Other operations are similar to those of other etmem engines.
-
-Interface structure templates:
-
-```c
-struct engine_ops {
-
-/* Parse the private parameters of the engine. If there are private parameters, implement this interface; otherwise, set it to NULL. */
-
-int (*fill_eng_params)(GKeyFile *config, struct engine *eng);
-
-/* Clear the private parameters of the engine. If there are private parameters, implement this interface; otherwise, set it to NULL. */
-
-void (*clear_eng_params)(struct engine *eng);
-
-/* Parse the private parameters of the task. If there are private parameters, implement this interface; otherwise, set it to NULL. */
-
-int (*fill_task_params)(GKeyFile *config, struct task *task);
-
-/* Clear the private parameters of the task. If there are private parameters, implement this interface; otherwise, set it to NULL. */
-
-void (*clear_task_params)(struct task *tk);
-
-/* Interface for starting a task */
-
-int (*start_task)(struct engine *eng, struct task *tk);
-
-/* Interface for stopping a task */
-
-void (*stop_task)(struct engine *eng, struct task *tk);
-
-/* Fill in the private parameters related to the PID. */
-
-int (*alloc_pid_params)(struct engine *eng, struct task_pid **tk_pid);
-
-/* Destroy the private parameters related to the PID. */
-
-void (*free_pid_params)(struct engine *eng, struct task_pid **tk_pid);
-
-/* Private commands required by third-party policies. If no private command is required, set it to NULL. 
 */
-
-int (*eng_mgt_func)(struct engine *eng, struct task *tk, char *cmd, int fd);
-
-};
-```
-
-External interfaces of the scan module
-
-| Interface Name |Interface Description|
-| ------------ | --------------------- |
-| etmemd_scan_init | Initializes the scan module.|
-| etmemd_scan_exit | Destructs the scan module.|
-| etmemd_get_vmas | Obtains the VMAs to be scanned.|
-| etmemd_free_vmas | Releases the VMAs scanned by `etmemd_get_vmas`.|
-| etmemd_get_page_refs | Scans pages in VMAs.|
-| etmemd_free_page_refs | Releases the linked list of page access information obtained by `etmemd_get_page_refs`.|
-
-In the VM scanning scenario, the `ioctl` call is added to the original scan interface `idle_pages` to provide a mechanism for distinguishing the `ept` scanning granularity and determining whether to ignore the page access flag on the host.
-
-In the scenario where a specified page of a process is swapped out, the `ioctl` call is added to the original scan interface `idle_pages` to ensure that the VMA without a specific flag is not scanned or swapped out.
-
-Scan management interface:
-
-- Prototype
-
-  ```c
-  ioctl(fd, cmd, void *arg);
-  ```
-
-- Input parameters
-
-  ```text
-  1. fd: file descriptor, which is obtained by the open call in /proc/pid/idle_pages.
-
-  2. cmd: controls the scanning behavior. Currently, the following commands are supported:
-  IDLE_SCAN_ADD_FLAG: adds a scan flag.
-  IDLE_SCAN_REMOVE_FLAGS: removes a scan flag.
-  VMA_SCAN_ADD_FLAGS: adds a VMA swap-out flag. Only VMAs with the specified flag are scanned.
-  VMA_SCAN_REMOVE_FLAGS: removes the VMA swap-out flag.
-
-  3. args: int pointer argument, which is used to transfer the specific flag mask. Currently, only the following argument is supported:
-  SCAN_AS_HUGE: scans whether a page has been accessed based on the 2 MB huge page granularity when scanning the ept page table. If this flag is not set, scanning is performed based on the granularity of the ept page table. 
- SCAN_IGN_HUGE: ignores the access flag in the page table on the host side during VM scanning. If this flag is not set, the access flag in the page table on the host side is not ignored. - VMA_SCAN_FLAG: Before the etmem_scan.ko module starts scanning, the `walk_page_test` interface is called to check whether the VMA address meets the scanning requirements. If this flag is set, only the VMA address segment with a specific swap-out flag is scanned, and other VMA addresses are ignored. - ``` - -- Return value - - ```text - 1. If the operation is successful, 0 is returned. - 2. If the operation fails, a non-zero value is returned. - ``` - -- Note - - ```text - All unsupported flags are ignored, but no error is returned. - ``` - -The following is an example of the configuration file. For details, see the configuration file description. - -```sh -#thirdparty -[engine] - -name=thirdparty - -project=test - -eng_name=my_engine - -libname=/user/lib/etmem_fetch/code_test/my_engine.so - -ops_name=my_engine_ops - -engine_private_key=engine_private_value - -[task] - -project=test - -engine=my_engine - -name=background1 - -type=pid - -value=1798245 - -task_private_key=task_private_value -``` - - **Notes**: - -You must use the dynamic library of the scan module provided by etmem and implement the interfaces in the structure required for connecting to etmem. - -The `fd` field in the `eng_mgt_func` interface cannot be set to `0xff` or `0xfe`. - -Multiple third-party policy dynamic libraries can be added to a project. They are differentiated by `eng_name` in the configuration file. 
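The build-and-register flow above can be sketched as a shell dry run. The `run` wrapper only prints each command instead of executing it (drop the wrapper to run for real); the source file name `my_engine.c`, the compiler flags, and the configuration file name `thirdparty_conf` are illustrative assumptions, while the `.so` path matches the `libname` in the example configuration above.

```shell
# Dry-run helper: print each command instead of executing it.
# Remove the wrapper to actually run the commands.
run() { echo "+ $*"; }

# Build the third-party policy as a shared object. The source file must
# implement an instance of struct engine_ops (here named my_engine_ops).
run gcc -shared -fPIC -o /user/lib/etmem_fetch/code_test/my_engine.so my_engine.c

# Register and start the thirdparty engine like any other engine, using
# the configuration file shown above (file name is a placeholder).
run etmem obj add -f thirdparty_conf -s etmemd_socket
run etmem project start -n test -s etmemd_socket
```

Because the `libname` and `ops_name` fields are resolved at run time, the `.so` path in the configuration file must match the path the library was installed to.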
- -### etmem Client and Server Help - -Run the following command to print the help information about the etmem server: - -```bash -etmemd -h -``` - -Or - -```bash -etmemd --help -``` - -Run the following command to print the help information about the etmem client: - -```bash -etmem help -``` - -Run the following command to print help information about projects, engines, and tasks on the etmem client: - -```bash -etmem obj help -``` - -Run the following command to print the help information about the project on the etmem client: - -```bash -etmem project help -``` - -## How to Contribute - -1. Fork this repository. -2. Create a branch. -3. Commit your code. -4. Create a pull request (PR). diff --git a/docs/en/docs/DPUOffload/figures/offload-arch.png b/docs/en/docs/DPUOffload/figures/offload-arch.png deleted file mode 100644 index b0f7b8587c47838880bcca5d6694f66a16ec0aaf..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/DPUOffload/figures/offload-arch.png and /dev/null differ diff --git a/docs/en/docs/DPUOffload/figures/qtfs-arch.png b/docs/en/docs/DPUOffload/figures/qtfs-arch.png deleted file mode 100644 index 40fd7e28707642801ec0b984690a25c08e092ac4..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/DPUOffload/figures/qtfs-arch.png and /dev/null differ diff --git a/docs/en/docs/DPUOffload/imperceptible-container-management-plane-offload.md b/docs/en/docs/DPUOffload/imperceptible-container-management-plane-offload.md deleted file mode 100644 index 0b954cbe6b6d7fbe90d56d667c48b4f6750e7d7a..0000000000000000000000000000000000000000 --- a/docs/en/docs/DPUOffload/imperceptible-container-management-plane-offload.md +++ /dev/null @@ -1,35 +0,0 @@ -# Imperceptible Container Management Plane Offload - -## Overview - -Moore's law ceases to apply in data center and cloud scenarios. The CPU computing power growth rate of general processing units is slowing down, while the network I/O speed and performance keep increasing. 
As a result, the processing capability of current general-purpose processors cannot meet the I/O processing requirements of the network and drives. In traditional data centers, more and more general-purpose CPU computing power is occupied by I/O and management planes. This part of resource loss is called data center tax. According to AWS statistics, the data center tax may account for more than 30% of the computing power of the data center.
-
-The data processing unit (DPU) is developed to release the computing resources from the host CPU. The management plane, network, storage, and security capabilities are offloaded to DPUs for acceleration, reducing costs and improving efficiency. Mainstream cloud vendors, such as AWS, Alibaba Cloud, and Huawei Cloud, use self-developed processors to offload the management plane and related data plane, achieving 100% utilization of data center computing resources.
-
-The management plane processes can be offloaded to the DPU by splitting the component source code. The source code is split into two parts that run independently on the host and DPU based on the function logic. In this way, the component is offloaded. However, this method has the following problems:
-1. The software compatibility of the component is affected. You need to maintain the component and related patches in subsequent version upgrades, which increases the maintenance workload.
-2. The offload cannot be inherited by other components. You need to split each component based on code logic analysis.
-
-To solve these problems, openEuler introduces imperceptible DPU offload. The abstraction layer provided by the OS shields the cross-host access differences between the host and DPU, and enables service processes to be offloaded to the DPU with virtually zero modification. This work is done at the common layer of the OS and is irrelevant to upper-layer services, so other services can also inherit it to offload to the DPU. 
- -## Architecture - -#### Imperceptible Container Management Plane DPU Offload Architecture - -**Figure 1** Imperceptible Container Management Plane DPU Offload Architecture - -![offload-arch](./figures/offload-arch.png) - -As shown in Figure 1, after the container management plane is offloaded, management processes such as dockerd and kubelet run on the DPU side, and container processes run on the host. The interaction between processes is ensured by the system layer. - -* Communication layer: DPUs and hosts can communicate with each other through PCIe interfaces or networks. A communication interface layer is provided based on underlying physical connections to provide communication interfaces for upper-layer services. - -* qtfs kernel shared file system: The container management plane components kubelet and dockerd interact with container processes through file systems. Management plane tools need to prepare data plane paths to rootfs and volume for container processes. In addition, the proc and cgroup file systems need to be used to control and monitor the resources and status of container processes. For details about qtfs, see [qtfs Shared File System Introduction and Usage](./qtfs-architecture-and-usage.md). - -* User-mode offload environment: You need to use qtfs to prepare the runtime environment for the offloaded management plane, and remotely mount the container management and runtime directories of the host to the DPU. System management file systems such as proc, sys, and cgroup need to be mounted. To prevent damage to the native system functions of the DPU, the preceding mounting operations are performed in the chroot environment. In addition, the management plane (running on the DPU) and container processes (running on the host) have invoking relationships. The rexec remote binary execution tool needs to be used to provide corresponding functions. 
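The environment preparation described in the bullets above can be sketched as a shell dry run. The `run` wrapper only echoes each command; the qtfs mount invocation and the directory list are assumptions to be checked against the qtfs usage guide and the Deployment Guide, and `/another_rootfs` is an illustrative name for the dedicated rootfs on the DPU.

```shell
# Dry-run helper: print each command instead of executing it.
run() { echo "+ $*"; }

# Dedicated rootfs on the DPU for the offloaded management plane
# (name is illustrative).
ROOTFS=/another_rootfs

# Remotely mount the host's container directories and management file
# systems through qtfs (mount syntax and paths are assumptions).
run mount -t qtfs /var/lib/docker "$ROOTFS/var/lib/docker"
run mount -t qtfs /proc "$ROOTFS/proc"
run mount -t qtfs /sys "$ROOTFS/sys"
run mount -t qtfs /sys/fs/cgroup "$ROOTFS/sys/fs/cgroup"

# Run the management plane inside the chroot so the DPU's native
# mounts stay untouched.
run chroot "$ROOTFS" /usr/bin/dockerd
```

Performing the mounts inside a chroot, as the text notes, keeps the DPU's own `/proc` and `/sys` intact while the offloaded components see the host's views.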
For details about how to offload the container management plane, see the [Deployment Guide](./offload-deployment-guide.md).

> ![](public_sys-resources/icon-note.gif) **NOTE**:
>
> In this user guide, modifications are performed on container management plane components and the rexec tool of a specific version. You can modify other versions based on the actual execution environment. The patch provided in this document is for verification only and is not for commercial use.

diff --git a/docs/en/docs/DPUOffload/offload-deployment-guide.md b/docs/en/docs/DPUOffload/offload-deployment-guide.md
deleted file mode 100644
index 5a0fa60b9fc5285aa00977c853d79ef6fcfcdbf9..0000000000000000000000000000000000000000
--- a/docs/en/docs/DPUOffload/offload-deployment-guide.md
+++ /dev/null
@@ -1,166 +0,0 @@

# Imperceptible Container Management Plane Offload Deployment Guide

> ![](./public_sys-resources/icon-note.gif) **NOTE**:
>
> In this user guide, modifications are performed on container management plane components and the rexec tool of a specific version. You can modify other versions based on the actual execution environment. The patch provided in this document is for verification only and is not for commercial use.

> ![](./public_sys-resources/icon-note.gif) **NOTE**:
>
> The communication between shared file systems is implemented through the network. You can perform a simulated offload using two physical machines or VMs connected through the network.
>
> Before the verification, you are advised to set up a Kubernetes cluster and container running environment that work properly, and offload the management plane process of a single node. You can use a physical machine or VM connected to the network as an emulated DPU.

## Introduction

The container management plane consists of container management tools such as Kubernetes, dockerd, containerd, and isulad. Container management plane offload moves the container management plane from the host where the containers are located to another host: the DPU, a hardware unit with an independent running environment.

By mounting the directories related to container running on the host to the DPU through qtfs, the container management plane tools running on the DPU can access these directories and prepare the running environment for the containers running on the host. To remotely mount special file systems such as proc and sys, a dedicated rootfs is created as the running environment of Kubernetes and dockerd (referred to as **/another_rootfs** below).

In addition, rexec is used to start and delete containers, so that the container management plane and the containers can run on two different hosts for remote container management.

## Related Component Patches

#### rexec

rexec is a remote execution tool written in Go, based on the [rexec](https://github.com/docker/libchan/tree/master/examples/rexec) example tool of Docker/libchan. rexec is used to remotely invoke binary files. For ease of use, capabilities such as environment variable transfer and monitoring the exit of the original process have been added to rexec.

To use the rexec tool, run `CMD_NET_ADDR=tcp://0.0.0.0:<port> rexec_server` on the server to start the rexec service process, and then run `CMD_NET_ADDR=tcp://<server IP>:<port> rexec [command]` on the client. This instructs rexec_server to execute the command.

#### dockerd

The changes to dockerd are based on version 18.09.

In containerd, the part that invokes libnetwork-setkey through a hook is commented out. This does not affect container startup. In addition, to ensure that `docker load` works properly, an error in the `mount` function in **mounter_linux.go** is commented out.

In the running environment of the container management plane, **/proc** is mounted to the proc file system of the server, and the local proc file system is mounted to **/local_proc**. In dockerd and containerd, accesses to **/proc/self/xxx**, **/proc/getpid()/xxx**, and related file systems are changed from **/proc** to **/local_proc**.

#### containerd

The changes to containerd are based on containerd-1.2-rc.1.

When mounting information is obtained, **/proc/self/mountinfo** can obtain only the local mounting information of dockerd but not that of the server. Therefore, **/proc/self/mountinfo** is changed to **/proc/1/mountinfo**, which obtains the mounting information of the server through process 1 of the server.

In containerd-shim, the Unix socket used to communicate with containerd is changed to TCP. containerd obtains the IP address of the containerd-shim running environment, that is, the IP address of the server, through the **SHIM_HOST** environment variable. The hash value of the shim is used to generate a port number, which serves as the communication port for starting containerd-shim.

In addition, the original method of sending signals to containerd-shim is changed to remotely invoking the `kill` command to send signals to the shim, ensuring that Docker can correctly kill containers.

#### Kubernetes

kubelet is not modified. The container QoS manager may fail to be configured the first time. This error does not affect the subsequent pod startup process.

## Container Management Plane Offload Operation Guide

Start rexec_server on both the server and the client. rexec_server on the server is used by rexec invocations to start containerd-shim. rexec_server on the client is used to execute the invocations of dockerd and containerd issued by containerd-shim.

#### Server

Create the folders required by the container management plane, insert **qtfs_server.ko**, and start the engine process.
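The server-side preparation described above can be sketched as follows. This is an unverified outline: the module path, IP address, and engine arguments are examples modeled on the qtfs document, and the script only prints the commands by default (`DRY_RUN=1`) so it can be reviewed before execution.

```shell
#!/bin/bash
# Dry-run sketch of the server-side setup: create shared directories,
# insert the qtfs server module, and start the engine process.
# All paths and parameter values are examples.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"      # print only; set DRY_RUN=0 to execute for real
    else
        "$@"
    fi
}

run mkdir -p /var/lib/docker /var/lib/kubelet/pods /run/containerd
run insmod /YOUR/QTFS/PATH/qtfs_server.ko qtfs_server_ip=192.168.10.1 qtfs_server_port=12345 qtfs_log_level=WARN
run ./engine 4096 16
```

The dry-run indirection is deliberate: every command here needs root privileges and a prepared environment, so the sketch favors inspectability over execution.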
In addition, you need to create the rexec script **/usr/bin/dockerd** on the server.

``` shell
#!/bin/bash
CMD_NET_ADDR=tcp://<client IP>:<port> rexec /usr/bin/dockerd $*
```

#### Client

Prepare a rootfs as the running environment of dockerd and containerd. Use the following script to mount the server directories required by dockerd and containerd to the client. Ensure that the remote directories mounted in the script exist on both the server and the client.

``` shell
#!/bin/bash
mkdir -p /another_rootfs/var/run/docker/containerd
iptables -t nat -N DOCKER
echo "---------insmod qtfs ko----------"
insmod /YOUR/QTFS/PATH/qtfs.ko qtfs_server_ip=<server IP> qtfs_log_level=INFO

# The proc file system in the chroot environment is replaced by the proc shared file system of the DPU. The actual proc file system of the local host needs to be mounted to /local_proc.
mount -t proc proc /another_rootfs/local_proc/

# Bind the chroot internal environment to the external environment to facilitate configuration and running.
mount --bind /var/run/ /another_rootfs/var/run/
mount --bind /var/lib/ /another_rootfs/var/lib/
mount --bind /etc /another_rootfs/etc

mkdir -p /another_rootfs/var/lib/isulad

# Create and mount the dev, sys, and cgroup file systems in the chroot environment.
mount -t devtmpfs devtmpfs /another_rootfs/dev/
mount -t sysfs sysfs /another_rootfs/sys
mkdir -p /another_rootfs/sys/fs/cgroup
mount -t tmpfs tmpfs /another_rootfs/sys/fs/cgroup
list="perf_event freezer files net_cls,net_prio hugetlb pids rdma cpu,cpuacct memory devices blkio cpuset"
for i in $list
do
    echo $i
    mkdir -p /another_rootfs/sys/fs/cgroup/$i
    mount -t cgroup cgroup -o rw,nosuid,nodev,noexec,relatime,$i /another_rootfs/sys/fs/cgroup/$i
done

## common system dir
mount -t qtfs -o proc /proc /another_rootfs/proc
echo "proc"
mount -t qtfs /sys /another_rootfs/sys
echo "cgroup"

# Mount the shared directories required by the container management plane.
mount -t qtfs /var/lib/docker/containers /another_rootfs/var/lib/docker/containers
mount -t qtfs /var/lib/docker/containerd /another_rootfs/var/lib/docker/containerd
mount -t qtfs /var/lib/docker/overlay2 /another_rootfs/var/lib/docker/overlay2
mount -t qtfs /var/lib/docker/image /another_rootfs/var/lib/docker/image
mount -t qtfs /var/lib/docker/tmp /another_rootfs/var/lib/docker/tmp
mkdir -p /another_rootfs/run/containerd/io.containerd.runtime.v1.linux/
mount -t qtfs /run/containerd/io.containerd.runtime.v1.linux/ /another_rootfs/run/containerd/io.containerd.runtime.v1.linux/
mkdir -p /another_rootfs/var/run/docker/containerd
mount -t qtfs /var/run/docker/containerd /another_rootfs/var/run/docker/containerd
mount -t qtfs /var/lib/kubelet/pods /another_rootfs/var/lib/kubelet/pods
```

In **/another_rootfs**, create the following scripts to support cross-host operations:

* /another_rootfs/usr/local/bin/containerd-shim

``` shell
#!/bin/bash
CMD_NET_ADDR=tcp://<server IP>:<port> /usr/bin/rexec /usr/bin/containerd-shim $*
```

* /another_rootfs/usr/local/bin/remote_kill

``` shell
#!/bin/bash
CMD_NET_ADDR=tcp://<server IP>:<port> /usr/bin/rexec /usr/bin/kill $*
```

* /another_rootfs/usr/sbin/modprobe

``` shell
#!/bin/bash
CMD_NET_ADDR=tcp://<server IP>:<port> /usr/bin/rexec /usr/sbin/modprobe $*
```

After changing the root directories of dockerd and containerd to the required rootfs, run the following commands to start dockerd and containerd:

* containerd

``` shell
#!/bin/bash
SHIM_HOST=<server IP> containerd --config /var/run/docker/containerd/containerd.toml --address /var/run/containerd/containerd.sock
```

* dockerd

``` shell
#!/bin/bash
SHIM_HOST=<server IP> CMD_NET_ADDR=tcp://<server IP>:<port> /usr/bin/dockerd --containerd /var/run/containerd/containerd.sock
```

* kubelet

Use the original parameters to start kubelet in the chroot environment.
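The start-up sequence above can be sketched as a wrapper that enters the offloaded rootfs. This is illustrative only: `HOST_IP` and `REXEC_PORT` are placeholders for your environment, kubelet keeps its original parameters, and the function echoes the chroot command instead of executing it.

```shell
#!/bin/bash
# Illustrative only: print the chroot invocations that would start the
# offloaded components. HOST_IP and REXEC_PORT are placeholders.
HOST_IP=${HOST_IP:-192.168.10.1}
REXEC_PORT=${REXEC_PORT:-7777}

start_in_rootfs() {
    # Echo instead of executing; drop the leading 'echo' on a prepared DPU.
    echo chroot /another_rootfs /bin/sh -c "$1"
}

start_in_rootfs "SHIM_HOST=${HOST_IP} containerd --config /var/run/docker/containerd/containerd.toml --address /var/run/containerd/containerd.sock"
start_in_rootfs "SHIM_HOST=${HOST_IP} CMD_NET_ADDR=tcp://${HOST_IP}:${REXEC_PORT} /usr/bin/dockerd --containerd /var/run/containerd/containerd.sock"
# kubelet: reuse the original kubelet parameters inside the same chroot.
```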
Because **/var/run/** is bound to **/another_rootfs/var/run/**, you can use Docker to access the **docker.sock** interface for container management in the regular rootfs.

The container management plane is now offloaded to the DPU. You can run `docker` commands to create and delete containers, or use `kubectl` on the current node to schedule and destroy pods. The actual container service processes run on the host.

> ![](./public_sys-resources/icon-note.gif) **NOTE**:
>
> This guide describes only the container management plane offload. The offload of container networks and data volumes requires additional offload capabilities, which are not included. You can perform cross-node startup of containers that are not configured with networks or storage by referring to this guide.

diff --git a/docs/en/docs/DPUOffload/overview.md b/docs/en/docs/DPUOffload/overview.md
deleted file mode 100644
index 40a8d09060de0b5cf1dfc6231fac89a49ee82b0f..0000000000000000000000000000000000000000
--- a/docs/en/docs/DPUOffload/overview.md
+++ /dev/null
@@ -1,11 +0,0 @@

# Imperceptible Container Management Plane DPU Offload User Guide

This document describes the container management plane DPU offload function of openEuler and how to install and deploy it. This function shields the differences of cross-host resource access of the container management plane through a unified abstraction layer provided by the operating system, so that container management plane services can be offloaded to the DPU.

This document is intended for community developers, open source enthusiasts, and partners who use the openEuler system and want to learn about and use the OS kernel and containers. Users must:

- Know basic Linux operations.

- Know the basic mechanisms of the Linux kernel file system.

- Understand Kubernetes and Docker and know how to deploy and use Docker and Kubernetes.
diff --git a/docs/en/docs/DPUOffload/public_sys-resources/icon-note.gif b/docs/en/docs/DPUOffload/public_sys-resources/icon-note.gif
deleted file mode 100644
index 6314297e45c1de184204098efd4814d6dc8b1cda..0000000000000000000000000000000000000000
Binary files a/docs/en/docs/DPUOffload/public_sys-resources/icon-note.gif and /dev/null differ
diff --git a/docs/en/docs/DPUOffload/qtfs-architecture-and-usage.md b/docs/en/docs/DPUOffload/qtfs-architecture-and-usage.md
deleted file mode 100644
index c1c5bc7e943aeb235d5bd62e9c675a8a622fdbe4..0000000000000000000000000000000000000000
--- a/docs/en/docs/DPUOffload/qtfs-architecture-and-usage.md
+++ /dev/null
@@ -1,67 +0,0 @@

# qtfs Shared File System Architecture and Usage

## Introduction

qtfs is a shared file system project. It can be deployed on the host-DPU hardware architecture or on two hosts. qtfs works in client-server mode and enables the client to access specified file systems on the server in a way similar to accessing local files.

Features of qtfs:

+ Mount point propagation

+ Sharing of special file systems such as proc, sys, and cgroup

+ Sharing of remote file read and write operations

+ Remote mounting of server file systems from the client

+ Customized processing of special files

+ Remote FIFO, Unix sockets, and epoll, so that the client and server can use these files as in local communication

+ Bottom-layer host-DPU communication based on the PCIe protocol, providing better performance than the network

+ Kernel module development, preventing intrusive modification of the kernel

## Software Architecture

![qtfs-arch](./figures/qtfs-arch.png)

## Installation

Directory structure:

+ **qtfs**: code related to the client kernel module. Compile the client .ko file in this directory.

+ **qtfs_server**: code related to the server kernel module. Compile the server .ko file and related programs in this directory.

+ **qtinfo**: diagnosis tool, which is used to query the working status of the file system and change the log level.

+ **demo**, **test**, and **doc**: test programs, demo programs, and project documents.

+ Root directory: common module code used by the client and server.

Configure the kernel compilation environment on two servers (or VMs):

 1. The kernel version must be 5.10 or later.
 2. Install the kernel development package: `yum install kernel-devel`

Install the server:

 1. cd qtfs_server
 2. make clean && make
 3. insmod qtfs_server.ko qtfs_server_ip=x.x.x.x qtfs_server_port=12345 qtfs_log_level=WARN
 4. ./engine 4096 16

Install the client:

 1. cd qtfs
 2. make clean && make
 3. insmod qtfs.ko qtfs_server_ip=x.x.x.x qtfs_server_port=12345 qtfs_log_level=WARN

## Usage

After the installation is complete, mount the server file system to make it visible to the client. For example:

    mount -t qtfs / /root/mnt/

Access **/root/mnt** on the client to view and perform operations on files on the server.

diff --git a/docs/en/docs/HASK/develop_with_hsak.md b/docs/en/docs/HASK/develop_with_hsak.md
deleted file mode 100644
index 923f3bed0dfa78066b7187b6d458e0a873609187..0000000000000000000000000000000000000000
--- a/docs/en/docs/HASK/develop_with_hsak.md
+++ /dev/null
@@ -1,229 +0,0 @@

## Instructions

### **nvme.conf.in** Configuration File

By default, the HSAK configuration file is located in **/etc/spdk/nvme.conf.in**. You can modify the configuration file based on service requirements. The content of the configuration file is as follows:

- [Global]

1. **ReactorMask**: cores used for I/O polling. The value is a hexadecimal number and cannot be set to core 0. The bits from the least significant one to the most significant one indicate different CPU cores. For example, 0x1 indicates core 0, and 0x6 indicates cores 1 and 2. This parameter supports a maximum of 34 characters, including the hexadecimal flag **0x**.
Each hexadecimal character can be F at most, indicating four cores. Therefore, a maximum of 128 (32 x 4) cores are supported.
2. **LogLevel**: HSAK log print level (**0**: error; **1**: warning; **2**: notice; **3**: info; **4**: debug).
3. **MemSize**: memory occupied by HSAK (the minimum value is 500 MB).
4. **MultiQ**: whether to enable multi-queue on the same block device.
5. **E2eDif**: DIF type (**1**: half-way protection; **2**: full protection). Drives from different vendors may have different DIF support capabilities. For details, see the documents provided by hardware vendors.
6. **IoStat**: whether to enable the I/O statistics function. The options are **Yes** and **No**.
7. **RpcServer**: whether to start the RPC listening thread. The options are **Yes** and **No**.
8. **NvmeCUSE**: whether to enable the CUSE function. The options are **Yes** and **No**. After the function is enabled, the NVMe character device is generated in the **/dev/spdk** directory.

- [Nvme]

1. **TransportID**: PCI address and name of the NVMe controller. The format is **TransportID "trtype:PCIe traddr:0000:09:00.0" nvme0**.
2. **RetryCount**: number of retries upon an I/O failure. The value **0** indicates no retry. The maximum value is **255**.
3. **TimeoutUsec**: I/O timeout interval, in μs. If this parameter is set to **0** or left blank, no timeout interval is set.
4. **ActionOnTimeout**: I/O timeout behavior (**None**: prints information only; **Reset**: resets the controller; **abort**: aborts the command). The default value is **None**.

- [Reactor]

1. **BatchSize**: number of I/Os that can be submitted in batches. The default value is **8**, and the maximum value is **32**.

### Header File Reference

HSAK provides two external header files. Include both files when using HSAK for development.

1. **bdev_rw.h**: defines the macros, enumerations, data structures, and APIs of the user-mode I/O operations on the data plane.
2. **ublock.h**: defines the macros, enumerations, data structures, and APIs for functions such as device management and information obtaining on the management plane.

### Service Running

After software development and compilation, you must run the **setup.sh** script to rebind the NVMe drive driver to user mode before running the software. The script is located in **/opt/spdk** by default.

Run the following commands to change the drive driver's binding from kernel mode to user mode and reserve 1024 x 2 MB huge pages:

```shell
[root@localhost ~]# cd /opt/spdk
[root@localhost spdk]# ./setup.sh
0000:3f:00.0 (8086 2701): nvme -> uio_pci_generic
0000:40:00.0 (8086 2701): nvme -> uio_pci_generic
```

Run the following commands to restore the drive driver from user mode to kernel mode and free the reserved huge pages:

```shell
[root@localhost ~]# cd /opt/spdk
[root@localhost spdk]# ./setup.sh reset
0000:3f:00.0 (8086 2701): uio_pci_generic -> nvme
0000:40:00.0 (8086 2701): uio_pci_generic -> nvme
```

### User-Mode I/O Read and Write Scenarios

Call the HSAK APIs in the following sequence to read and write service data through the user-mode I/O channel:

1. Initialize the HSAK UIO module.
   Call **libstorage_init_module** to initialize the HSAK user-mode I/O channel.

2. Open a drive block device.
   Call **libstorage_open** to open a specified block device. If multiple block devices need to be opened, call this API repeatedly.

3. Allocate I/O memory.
   Call **libstorage_alloc_io_buf** or **libstorage_mem_reserve** to allocate memory. **libstorage_alloc_io_buf** can allocate I/Os of a maximum of 65 KB, and **libstorage_mem_reserve** can allocate unlimited memory unless there is no available space.

4. Perform read and write operations on a drive.
   You can call the following APIs to perform read and write operations based on service requirements:

   - libstorage_async_read
   - libstorage_async_readv
   - libstorage_async_write
   - libstorage_async_writev
   - libstorage_sync_read
   - libstorage_sync_write

5. Free I/O memory.
   Call **libstorage_free_io_buf** or **libstorage_mem_free** to free the memory. The API used must correspond to the API used to allocate the memory.

6. Close a drive block device.
   Call **libstorage_close** to close a specified block device. If multiple block devices are opened, call this API repeatedly to close them.

   | API | Description |
   | ----------------------- | ------------------------------------------------------------ |
   | libstorage_init_module | Initializes the HSAK module. |
   | libstorage_open | Opens a block device. |
   | libstorage_alloc_io_buf | Allocates memory from buf_small_pool or buf_large_pool of SPDK. |
   | libstorage_mem_reserve | Allocates memory space from the huge page memory reserved by DPDK. |
   | libstorage_async_read | Delivers asynchronous I/O read requests (the read buffer is a contiguous buffer). |
   | libstorage_async_readv | Delivers asynchronous I/O read requests (the read buffer is a discrete buffer). |
   | libstorage_async_write | Delivers asynchronous I/O write requests (the write buffer is a contiguous buffer). |
   | libstorage_async_writev | Delivers asynchronous I/O write requests (the write buffer is a discrete buffer). |
   | libstorage_sync_read | Delivers synchronous I/O read requests (the read buffer is a contiguous buffer). |
   | libstorage_sync_write | Delivers synchronous I/O write requests (the write buffer is a contiguous buffer). |
   | libstorage_free_io_buf | Frees the allocated memory to buf_small_pool or buf_large_pool of SPDK. |
   | libstorage_mem_free | Frees the memory space that libstorage_mem_reserve allocates. |
   | libstorage_close | Closes a block device. |
   | libstorage_exit_module | Exits the HSAK module. |
### Drive Management Scenarios

HSAK contains a group of C APIs that can be used to format drives and create and delete namespaces.

1. Call the C API to initialize the HSAK UIO component. If the HSAK UIO component has been initialized, skip this operation.

   libstorage_init_module

2. Call the corresponding APIs to perform drive operations based on service requirements. The following APIs can be called separately:

   - libstorage_create_namespace
   - libstorage_delete_namespace
   - libstorage_delete_all_namespace
   - libstorage_nvme_create_ctrlr
   - libstorage_nvme_delete_ctrlr
   - libstorage_nvme_reload_ctrlr
   - libstorage_low_level_format_nvm
   - libstorage_deallocate_block

3. If you exit the program, destroy the HSAK UIO. If other services are using the HSAK UIO, you do not need to exit the program and destroy the HSAK UIO.

   libstorage_exit_module

   | API | Description |
   | ------------------------------- | ------------------------------------------------------------ |
   | libstorage_create_namespace | Creates a namespace on a specified controller (the prerequisite is that the controller supports namespace management). |
   | libstorage_delete_namespace | Deletes a namespace from a specified controller. |
   | libstorage_delete_all_namespace | Deletes all namespaces from a specified controller. |
   | libstorage_nvme_create_ctrlr | Creates an NVMe controller based on the PCI address. |
   | libstorage_nvme_delete_ctrlr | Destroys an NVMe controller based on the controller name. |
   | libstorage_nvme_reload_ctrlr | Automatically creates or destroys the NVMe controller based on the input configuration file. |
   | libstorage_low_level_format_nvm | Low-level formats an NVMe drive. |
   | libstorage_deallocate_block | Notifies NVMe drives of blocks that can be freed for garbage collection. |

### Data-Plane Drive Information Query

The I/O data plane of HSAK provides a group of C APIs for querying drive information. Upper-layer services can process service logic based on the queried information.

1. Call the C API to initialize the HSAK UIO component. If the HSAK UIO component has been initialized, skip this operation.

   libstorage_init_module

2. Call the corresponding APIs to query information based on service requirements. The following APIs can be called separately:

   - libstorage_get_nvme_ctrlr_info
   - libstorage_get_mgr_info_by_esn
   - libstorage_get_mgr_smart_by_esn
   - libstorage_get_bdev_ns_info
   - libstorage_get_ctrl_ns_info

3. If you exit the program, destroy the HSAK UIO. If other services are using the HSAK UIO, you do not need to exit the program and destroy the HSAK UIO.

   libstorage_exit_module

   | API | Description |
   | ------------------------------- | ------------------------------------------------------------ |
   | libstorage_get_nvme_ctrlr_info | Obtains information about all controllers. |
   | libstorage_get_mgr_info_by_esn | Obtains the management information of the drive corresponding to an ESN. |
   | libstorage_get_mgr_smart_by_esn | Obtains the S.M.A.R.T. information of the drive corresponding to an ESN. |
   | libstorage_get_bdev_ns_info | Obtains namespace information based on the device name. |
   | libstorage_get_ctrl_ns_info | Obtains information about all namespaces based on the controller name. |

### Management-Plane Drive Information Query

The management plane component Ublock of HSAK provides a group of C APIs for querying drive information on the management plane.

1. Call the C API to initialize the HSAK Ublock server.

2. Call the HSAK UIO component initialization API in another process based on service requirements.

3. If multiple processes are required to query drive information, initialize the Ublock client.

4. Call the APIs listed in the following table in the Ublock server process or client process to query information.

5. After obtaining the block device list, call the APIs listed in the following table to free resources.

6. If you exit the program, destroy the HSAK Ublock module (the destruction method on the server is the same as that on the client).

   | API | Description |
   | ---------------------------- | ------------------------------------------------------------ |
   | init_ublock | Initializes the Ublock function module. This API must be called before the other Ublock APIs. A process can be initialized only once, because the init_ublock API initializes DPDK. The initial memory allocated by DPDK is bound to the process PID, and one PID can be bound to only one memory. In addition, DPDK does not provide an API for freeing the memory; it is freed only when the process exits. |
   | ublock_init | It is the macro definition of the init_ublock API. It can be considered as initializing Ublock to an RPC service. |
   | ublock_init_norpc | It is the macro definition of the init_ublock API. It can be considered as initializing Ublock to a non-RPC service. |
   | ublock_get_bdevs | Obtains the device list. The obtained device list contains only PCI addresses and does not contain specific device information. To obtain specific device information, call the ublock_get_bdev API. |
   | ublock_get_bdev | Obtains information about a specific device, including the device serial number, model, and firmware version. The information is stored in character arrays instead of character strings. |
   | ublock_get_bdev_by_esn | Obtains the device information based on the specified ESN, including the serial number, model, and firmware version. |
   | ublock_get_SMART_info | Obtains the S.M.A.R.T. information of a specified device. |
   | ublock_get_SMART_info_by_esn | Obtains the S.M.A.R.T. information of the device corresponding to an ESN. |
   | ublock_get_error_log_info | Obtains the error log information of a device. |
   | ublock_get_log_page | Obtains information about a specified log page of a specified device. |
   | ublock_free_bdevs | Frees the device list. |
   | ublock_free_bdev | Frees device resources. |
   | ublock_fini | Destroys the Ublock module. This API destroys the Ublock module and internally created resources. It must be used together with the Ublock initialization API. |

### Log Management

HSAK logs are exported to **/var/log/messages** through syslog by default and are managed by the rsyslog service of the OS. If a custom log directory is required, use rsyslog to configure it:

1. Modify the **/etc/rsyslog.conf** configuration file:

   ```shell
   if ($programname == 'LibStorage') then {
       action(type="omfile" fileCreateMode="0600" file="/var/log/HSAK/run.log")
       stop
   }
   ```

2. Restart the rsyslog service:

   ```shell
   systemctl restart rsyslog
   ```

3. Start the HSAK process. The log information is redirected to the target directory.

4. If redirected logs need to be dumped, manually configure log dumping in the **/etc/logrotate.d/syslog** file.

diff --git a/docs/en/docs/HASK/hsak_tools_usage.md b/docs/en/docs/HASK/hsak_tools_usage.md
deleted file mode 100644
index 51ccaaae576de301b7c630126871e074ea6b1f5c..0000000000000000000000000000000000000000
--- a/docs/en/docs/HASK/hsak_tools_usage.md
+++ /dev/null
@@ -1,123 +0,0 @@

## Command-Line Interface

### Command for Querying Drive Information

#### Format

```shell
libstorage-list [<commands>] [<device>]
```

#### Parameters

- *commands*: Only **help** is available. **libstorage-list help** displays the help information.

- *device*: specifies the PCI address, in the format **0000:09:00.0**. Multiple PCI addresses are allowed, separated by spaces. If no PCI address is specified, the command lists the information of all enumerated devices.
#### Precautions

- The fault injection function applies only to development, debugging, and test scenarios. Do not use this function on live networks. Otherwise, service and security risks may occur.

- Before running this command, ensure that the management component (Ublock) server has been started, and the user-mode I/O component (UIO) has either not been started or has been correctly started.

- Drives that are not occupied by the Ublock or UIO component will be occupied during command execution. If the Ublock or UIO component attempts to obtain the control permission on such a drive, a storage device access conflict may occur and the command execution fails.

### Command for Switching Drivers for Drives

#### Format

```shell
libstorage-shutdown reset [<device> ...]
```

#### Parameters

- **reset**: switches the UIO driver to the kernel-mode driver for specific drives.

- *device*: specifies the PCI address, for example, **0000:09:00.0**. Multiple PCI addresses are allowed, separated by spaces.

#### Precautions

- The **libstorage-shutdown reset** command is used to switch a drive from the user-mode UIO driver to the kernel-mode NVMe driver.

- Before running this command, ensure that the Ublock server has been started, and the UIO component has either not been started or has been correctly started.

- The **libstorage-shutdown reset** command is risky. Before switching to the NVMe driver, ensure that the user-mode instance has stopped delivering I/Os to the NVMe device, all FDs on the NVMe device have been closed, and the instance that accesses the NVMe device has exited.

### Command for Obtaining I/O Statistics

#### Format

```shell
libstorage-iostat [-t <interval>] [-i <count>] [-d <device>]
```

#### Parameters

- **-t**: interval, in seconds. The value ranges from 1 to 3600. This parameter is of the int type. If the input value exceeds the upper limit of the int type, the value is truncated to a negative or positive number.

- **-i**: number of collection times. The minimum value is **1** and the maximum value is *MAX_INT*. If this parameter is not set, information is collected at the specified interval by default. This parameter is of the int type. If the input value exceeds the upper limit of the int type, the value is truncated to a negative or positive number.

- **-d**: name of a block device (for example, **nvme0n1**, which depends on the controller name configured in **/etc/spdk/nvme.conf.in**). You can use this parameter to collect performance data of one or more specified devices. If this parameter is not set, performance data of all detected devices is collected.

#### Precautions

- The I/O statistics function is enabled.

- The process has delivered I/O operations through the UIO component to the drive whose performance information needs to be queried.

- If no device in the current environment is occupied by service processes to deliver I/Os, the command exits after the message "You cannot get iostat info for nvme device no deliver io" is displayed.

- When multiple queues are enabled on a drive, the I/O statistics tool summarizes the performance data of the queues and outputs the data in a unified manner.

- The I/O statistics tool supports data records of a maximum of 8192 drive queues.
- -- The I/O statistics are as follows: - - | Device | r/s | w/s | rKB/s | wKB/s | avgrq-sz | avgqu-sz | r_await | w_await | await | svctm | util% | poll-n | - | ----------- | ------------------------------ | ------------------------------- | ----------------------------------- | ------------------------------------ | -------------------------------------- | -------------------------- | --------------------- | ---------------------- | ------------------------------- | --------------------------------------- | ------------------ | -------------------------- | - | Device name | Number of read I/Os per second | Number of write I/Os per second | Number of read I/O bytes per second | Number of write I/O bytes per second | Average size of delivered I/Os (bytes) | I/O depth of a drive queue | I/O read latency (μs) | I/O write latency (μs) | Average read/write latency (μs) | Processing latency of a single I/O (μs) | Device utilization | Number of polling timeouts | - -## Commands for Drive Read/Write Operations - -#### Format - -```shell -libstorage-rw [OPTIONS...] -``` - -#### Parameters - -1. **COMMAND** parameters - - - **read**: reads a specified logical block from the device to the data buffer (standard output by default). - - - **write**: writes data in a data buffer (standard input by default) to a specified logical block of the NVMe device. - - - **help**: displays the help information about the command line. - -2. **device**: specifies the PCI address, for example, **0000:09:00.0**. - -3. **OPTIONS** parameters - - - --**start-block, -s**: indicates the 64-bit start address of the logical block to be read or written. The default value is **0**. - - - --**block-count, -c**: indicates the number of the logical blocks to be read or written (counted from 0). - - - --**data-size, -z**: indicates the number of bytes of the data to be read or written. - - - --**namespae-id, -n**: indicates the namespace ID of the device. The default value is **1**. 
-
-   - --**data, -d**: indicates the data file used for read and write operations (the file where read data is saved during a read operation, and from which data is taken during a write operation).
-
-   - --**limited-retry, -l**: indicates that the device controller retries only a limited number of times to complete device read and write operations.
-
-   - --**force-unit-access, -f**: ensures that read and write operations are completed from the nonvolatile media before the instruction is completed.
-
-   - --**show-command, -v**: displays instruction information before sending a read/write command.
-
-   - --**dry-run, -w**: displays only information about read and write instructions but does not perform actual read and write operations.
-
-   - --**latency, -t**: collects statistics on the end-to-end read and write latency of the CLI.
-
-   - --**help, -h**: displays the help information about related commands.
\ No newline at end of file
diff --git a/docs/en/docs/HASK/introduce_hsak.md b/docs/en/docs/HASK/introduce_hsak.md deleted file mode 100644 index 248dd928aee88531f5162f7a2dc9add414f458ee..0000000000000000000000000000000000000000 --- a/docs/en/docs/HASK/introduce_hsak.md +++ /dev/null @@ -1,47 +0,0 @@
-# HSAK Developer Guide
-
-## Overview
-
-As the performance of storage media such as NVMe SSDs and SCMs continuously improves, the latency overhead of the media layer in the I/O stack keeps decreasing, and the overhead of the software stack becomes the bottleneck. Therefore, the kernel I/O data plane needs to be reconstructed to reduce the overhead of the software stack. HSAK provides a high-bandwidth and low-latency I/O software stack for new storage media, which reduces the overhead by more than 50% compared with the traditional I/O software stack.
-The HSAK user-mode I/O engine is developed based on the open-source SPDK.
-
-1. A unified interface is provided for external systems to shield the differences between open-source interfaces.
-2. 
Enhanced I/O data plane features are added, such as DIF, drive formatting, batch I/O delivery, trim, and dynamic drive addition and deletion.
-3. Commercial features such as drive device management, drive I/O monitoring, and maintenance and test tools are provided.
-
-## Compilation Tutorial
-
-1. Download the HSAK source code.
-
-   $ git clone https://gitee.com/openeuler/hsak.git
-
-2. Install the compilation and running dependencies.
-
-   The compilation and running of HSAK depend on components such as Storage Performance Development Kit (SPDK), Data Plane Development Kit (DPDK), and libboundscheck.
-
-3. Start the compilation.
-
-   $ cd hsak
-
-   $ mkdir build
-
-   $ cd build
-
-   $ cmake ..
-
-   $ make
-
-## Precautions
-
-### Constraints
-
-- A maximum of 512 NVMe devices can be used and managed on the same machine.
-- When HSAK is enabled to execute I/O-related services, ensure that the system has at least 500 MB of contiguous free huge page memory.
-- Before enabling the user-mode I/O component to execute services, ensure that the drive management component (Ublock) has been enabled.
-- When the drive management component (Ublock) is enabled to execute services, ensure that the system has sufficient contiguous free memory. Each time the Ublock component is initialized, 20 MB of huge page memory is allocated.
-- Before HSAK is run, **setup.sh** is called to configure huge page memory and unbind the kernel-mode driver of the NVMe device.
-- Other interfaces provided by the HSAK module can be used only after libstorage_init_module is successfully executed. Each process can call libstorage_init_module only once.
-- After the libstorage_exit_module function is executed, other interfaces provided by HSAK cannot be used. In multi-thread scenarios, exit HSAK only after all threads end.
-- Only one HSAK Ublock server can be started on a server, and it supports concurrent access from a maximum of 64 Ublock clients.
The Ublock server can process a maximum of 20 client requests per second.
-- The HSAK Ublock component must be started earlier than the data plane I/O component and Ublock clients. The command line tool provided by HSAK can be executed only after the Ublock server is started.
-- Do not register a handler for the SIGBUS signal. SPDK has its own handler for this signal. If that handler is overwritten, the registered signal handler becomes invalid and a core dump occurs.
\ No newline at end of file
diff --git a/docs/en/docs/HSAK/develop_with_hsak.md b/docs/en/docs/HSAK/develop_with_hsak.md deleted file mode 100644 index d135ebd16b0767aa2c4945395e8411b1e7a1353a..0000000000000000000000000000000000000000 --- a/docs/en/docs/HSAK/develop_with_hsak.md +++ /dev/null @@ -1,229 +0,0 @@
-## Instructions
-
-### **nvme.conf.in** Configuration File
-
-By default, the HSAK configuration file is located in **/etc/spdk/nvme.conf.in**. You can modify the configuration file based on service requirements. The content of the configuration file is as follows:
-
-- [Global]
-
-1. **ReactorMask**: cores used for I/O polling. The value is a hexadecimal number and cannot include core 0. The bits from the least significant one to the most significant one indicate different CPU cores. For example, 0x1 indicates core 0, and 0x6 indicates cores 1 and 2. This parameter supports a maximum of 34 characters, including the hexadecimal flag **0x**. Each hexadecimal digit after **0x** encodes four cores, so the 32 digits support a maximum of 128 (32 x 4) cores.
-2. **LogLevel**: HSAK log print level (**0**: error; **1**: warning; **2**: notice; **3**: info; **4**: debug).
-3. **MemSize**: memory occupied by HSAK (the minimum value is 500 MB).
-4. **MultiQ**: whether to enable multi-queue on the same block device.
-5. **E2eDif**: DIF type (**1**: half-way protection; **2**: full protection).
Drives from different vendors may have different DIF support capabilities. For details, see the documents provided by hardware vendors. -6. **IoStat**: whether to enable the I/O statistics function. The options are **Yes** and **No**. -7. **RpcServer**: whether to start the RPC listening thread. The options are **Yes** and **No**. -8. **NvmeCUSE**: whether to enable the CUSE function. The options are **Yes** and **No**. After the function is enabled, the NVMe character device is generated in the **/dev/spdk** directory. - -- [Nvme] - -1. **TransportID**: PCI address and name of the NVMe controller. The format is **TransportID "trtype:PCIe traddr:0000:09:00.0" nvme0**. -2. **RetryCount**: number of retries upon an I/O failure. The value **0** indicates no retry. The maximum value is **255**. -3. **TimeoutUsec**: I/O timeout interval. If this parameter is set to **0** or left blank, no timeout interval is set. The unit is μs. -4. **ActionOnTimeout**: I/O timeout behavior (**None**: prints information only; **Reset**: resets the controller; **abort**: aborts the command). The default value is **None**. - -- [Reactor] - -1. **BatchSize**: number of I/Os that can be submitted in batches. The default value is **8**, and the maximum value is **32**. - -### Header File Reference - -HSAK provides two external header files. Include the two files when using HSAK for development. - -1. **bdev_rw.h**: defines the macros, enumerations, data structures, and APIs of the user-mode I/O operations on the data plane. -2. **ublock.h**: defines macros, enumerations, data structures, and APIs for functions such as device management and information obtaining on the management plane. - -### Service Running - -After software development and compilation, you must run the **setup.sh** script to rebind the NVMe drive driver to the user mode before running the software. The script is located in **/opt/spdk** by default. 
-Run the following commands to change the drive driver's binding mode from kernel to user and reserve 1024 x 2 MB huge pages: - -```shell -[root@localhost ~]# cd /opt/spdk -[root@localhost spdk]# ./setup.sh -0000:3f:00.0 (8086 2701): nvme -> uio_pci_generic -0000:40:00.0 (8086 2701): nvme -> uio_pci_generic -``` - -Run the following commands to restore the drive driver's mode from user to kernel and free the reserved huge pages: - -```shell -[root@localhost ~]# cd /opt/spdk -[root@localhost spdk]# ./setup.sh reset -0000:3f:00.0 (8086 2701): uio_pci_generic -> nvme -0000:40:00.0 (8086 2701): uio_pci_generic -> nvme -``` - -### User-Mode I/O Read and Write Scenarios - -Call HSAK APIs in the following sequence to read and write service data through the user-mode I/O channel: - -1. Initialize the HSAK UIO module. - Call **libstorage_init_module** to initialize the HSAK user-mode I/O channel. - -2. Open a drive block device. - Call **libstorage_open** to open a specified block device. If multiple block devices need to be opened, call this API repeatedly. - -3. Allocate I/O memory. - Call **libstorage_alloc_io_buf** or **libstorage_mem_reserve** to allocate memory. **libstorage_alloc_io_buf** can allocate a maximum of 65 KB I/Os, and **libstorage_mem_reserve** can allocate unlimited memory unless there is no available space. - -4. Perform read and write operations on a drive. - You can call the following APIs to perform read and write operations based on service requirements: - - - libstorage_async_read - - libstorage_async_readv - - libstorage_async_write - - libstorage_async_writev - - libstorage_sync_read - - libstorage_sync_write - -5. Free I/O memory. - Call **libstorage_free_io_buf** or **libstorage_mem_free** to free memory, which must correspond to the API used to allocate memory. - -6. Close a drive block device. - Call **libstorage_close** to close a specified block device. If multiple block devices are opened, call this API repeatedly to close them. 
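The six steps above can be sketched end to end as follows. This is pseudocode: the call signatures are simplified placeholders, not the exact HSAK prototypes.

```
/* Pseudocode sketch of the user-mode I/O flow (signatures simplified). */
libstorage_init_module(...);             /* 1. once per process           */
dev = libstorage_open("nvme0n1");        /* 2. once per block device      */
buf = libstorage_alloc_io_buf(io_size);  /* 3. or libstorage_mem_reserve  */
libstorage_sync_write(dev, buf, ...);    /* 4. or any async/readv variant */
libstorage_sync_read(dev, buf, ...);
libstorage_free_io_buf(buf, io_size);    /* 5. must match the alloc API   */
libstorage_close(dev);                   /* 6. per opened device          */
libstorage_exit_module();                /* on process exit               */
```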
-
-  | API | Description |
-  | --- | --- |
-  | libstorage_init_module | Initializes the HSAK module. |
-  | libstorage_open | Opens a block device. |
-  | libstorage_alloc_io_buf | Allocates memory from buf_small_pool or buf_large_pool of SPDK. |
-  | libstorage_mem_reserve | Allocates memory space from the huge page memory reserved by DPDK. |
-  | libstorage_async_read | Delivers asynchronous I/O read requests (the read buffer is a contiguous buffer). |
-  | libstorage_async_readv | Delivers asynchronous I/O read requests (the read buffer is a discrete buffer). |
-  | libstorage_async_write | Delivers asynchronous I/O write requests (the write buffer is a contiguous buffer). |
-  | libstorage_async_writev | Delivers asynchronous I/O write requests (the write buffer is a discrete buffer). |
-  | libstorage_sync_read | Delivers synchronous I/O read requests (the read buffer is a contiguous buffer). |
-  | libstorage_sync_write | Delivers synchronous I/O write requests (the write buffer is a contiguous buffer). |
-  | libstorage_free_io_buf | Frees the allocated memory to buf_small_pool or buf_large_pool of SPDK. |
-  | libstorage_mem_free | Frees the memory space that libstorage_mem_reserve allocates. |
-  | libstorage_close | Closes a block device. |
-  | libstorage_exit_module | Exits the HSAK module. |
-
-### Drive Management Scenarios
-
-HSAK contains a group of C APIs, which can be used to format drives and create and delete namespaces.
-
-1. Call the C API to initialize the HSAK UIO component. If the HSAK UIO component has been initialized, skip this operation.
-
-   libstorage_init_module
-
-2. Call corresponding APIs to perform drive operations based on service requirements.
The following APIs can be called separately: - - - libstorage_create_namespace - - - libstorage_delete_namespace - - - libstorage_delete_all_namespace - - - libstorage_nvme_create_ctrlr - - - libstorage_nvme_delete_ctrlr - - - libstorage_nvme_reload_ctrlr - - - libstorage_low_level_format_nvm - - - libstorage_deallocate_block - -3. If you exit the program, destroy the HSAK UIO. If other services are using the HSAK UIO, you do not need to exit the program and destroy the HSAK UIO. - - libstorage_exit_module - - | API | Description | - | ------------------------------- | ------------------------------------------------------------ | - | libstorage_create_namespace | Creates a namespace on a specified controller (the prerequisite is that the controller supports namespace management). | - | libstorage_delete_namespace | Deletes a namespace from a specified controller. | - | libstorage_delete_all_namespace | Deletes all namespaces from a specified controller. | - | libstorage_nvme_create_ctrlr | Creates an NVMe controller based on the PCI address. | - | libstorage_nvme_delete_ctrlr | Destroys an NVMe controller based on the controller name. | - | libstorage_nvme_reload_ctrlr | Automatically creates or destroys the NVMe controller based on the input configuration file. | - | libstorage_low_level_format_nvm | Low-level formats an NVMe drive. | - | libstorage_deallocate_block | Notifies NVMe drives of blocks that can be freed for garbage collection. | - -### Data-Plane Drive Information Query - -The I/O data plane of HSAK provides a group of C APIs for querying drive information. Upper-layer services can process service logic based on the queried information. - -1. Call the C API to initialize the HSAK UIO component. If the HSAK UIO component has been initialized, skip this operation. - - libstorage_init_module - -2. Call corresponding APIs to query information based on service requirements. 
The following APIs can be called separately: - - - libstorage_get_nvme_ctrlr_info - - - libstorage_get_mgr_info_by_esn - - - libstorage_get_mgr_smart_by_esn - - - libstorage_get_bdev_ns_info - - - libstorage_get_ctrl_ns_info - -3. If you exit the program, destroy the HSAK UIO. If other services are using the HSAK UIO, you do not need to exit the program and destroy the HSAK UIO. - - libstorage_exit_module - - | API | Description | - | ------------------------------- | ------------------------------------------------------------ | - | libstorage_get_nvme_ctrlr_info | Obtains information about all controllers. | - | libstorage_get_mgr_info_by_esn | Obtains the management information of the drive corresponding to an ESN. | - | libstorage_get_mgr_smart_by_esn | Obtains the S.M.A.R.T. information of the drive corresponding to an ESN. | - | libstorage_get_bdev_ns_info | Obtains namespace information based on the device name. | - | libstorage_get_ctrl_ns_info | Obtains information about all namespaces based on the controller name. | - -### Management-Plane Drive Information Query - -The management plane component Ublock of HSAK provides a group of C APIs for querying drive information on the management plane. - -1. Call the C API to initialize the HSAK Ublock server. - -2. Call the HSAK UIO component initialization API in another process based on service requirements. - -3. If multiple processes are required to query drive information, initialize the Ublock client. - -4. Call the APIs listed in the following table on the Ublock server process or client process to query information. - -5. After obtaining the block device list, call the APIs listed in the following table to free resources. - -6. If you exit the program, destroy the HSAK Ublock module (the destruction method on the server is the same as that on the client). 
- - | API | Description | - | ---------------------------- | ------------------------------------------------------------ | - | init_ublock | Initializes the Ublock function module. This API must be called before the other Ublock APIs. A process can be initialized only once because the init_ublock API initializes DPDK. The initial memory allocated by DPDK is bound to the process PID. One PID can be bound to only one memory. In addition, DPDK does not provide an API for freeing the memory. The memory can be freed only by exiting the process. | - | ublock_init | It is the macro definition of the init_ublock API. It can be considered as initializing Ublock to an RPC service. | - | ublock_init_norpc | It is the macro definition of the init_ublock API. It can be considered as initializing Ublock to a non-RPC service. | - | ublock_get_bdevs | Obtains the device list. The obtained device list contains only PCI addresses and does not contain specific device information. To obtain specific device information, call the ublock_get_bdev API. | - | ublock_get_bdev | Obtains information about a specific device, including the device serial number, model, and firmware version. The information is stored in character arrays instead of character strings. | - | ublock_get_bdev_by_esn | Obtains the device information based on the specified ESN, including the serial number, model, and firmware version. | - | ublock_get_SMART_info | Obtains the S.M.A.R.T. information of a specified device. | - | ublock_get_SMART_info_by_esn | Obtains the S.M.A.R.T. information of the device corresponding to an ESN. | - | ublock_get_error_log_info | Obtains the error log information of a device. | - | ublock_get_log_page | Obtains information about a specified log page of a specified device. | - | ublock_free_bdevs | Frees the device list. | - | ublock_free_bdev | Frees device resources. | - | ublock_fini | Destroys the Ublock module. This API destroys the Ublock module and internally created resources. 
This API must be used together with the Ublock initialization API. |
-
-### Log Management
-
-HSAK logs are exported to **/var/log/messages** through syslog by default and managed by the rsyslog service of the OS. If a custom log directory is required, use rsyslog to configure the log directory.
-
-1. Modify the **/etc/rsyslog.conf** configuration file by adding the following rule:
-
-    ```
-    if ($programname == 'LibStorage') then {
-    action(type="omfile" fileCreateMode="0600" file="/var/log/HSAK/run.log")
-    stop
-    }
-    ```
-
-2. Restart the rsyslog service:
-
-    ```shell
-    systemctl restart rsyslog
-    ```
-
-3. Start the HSAK process. The log information is redirected to the target directory.
-
-4. If redirected logs need to be dumped, manually configure log dump in the **/etc/logrotate.d/syslog** file.
\ No newline at end of file
diff --git a/docs/en/docs/HSAK/hsak_interface.md b/docs/en/docs/HSAK/hsak_interface.md deleted file mode 100644 index c1e45123bf7ca1bf14ec9bbdac6420f5182321ba..0000000000000000000000000000000000000000 --- a/docs/en/docs/HSAK/hsak_interface.md +++ /dev/null @@ -1,2551 +0,0 @@
-## C APIs
-
-### Macro Definition and Enumeration
-
-#### bdev_rw.h
-
-##### enum libstorage_ns_lba_size
-
-1. Prototype
-
-```
-enum libstorage_ns_lba_size
-{
-LIBSTORAGE_NVME_NS_LBA_SIZE_512 = 0x9,
-LIBSTORAGE_NVME_NS_LBA_SIZE_4K = 0xc
-};
-```
-
-2. Description
-
-Sector (data) size of a drive.
-
-##### enum libstorage_ns_md_size
-
-1. Prototype
-
-```
-enum libstorage_ns_md_size
-{
-LIBSTORAGE_METADATA_SIZE_0 = 0,
-LIBSTORAGE_METADATA_SIZE_8 = 8,
-LIBSTORAGE_METADATA_SIZE_64 = 64
-};
-```
-
-2. Description
-
-Metadata size of a drive.
-
-3. Remarks
-
-- ES3000 V3 (single-port) supports formatting of five sector types (512+0, 512+8, 4K+64, 4K, and 4K+8).
-
-- ES3000 V3 (dual-port) supports formatting of four sector types (512+0, 512+8, 4K+64, and 4K).
-
-- ES3000 V5 supports formatting of five sector types (512+0, 512+8, 4K+64, 4K, and 4K+8).
-
-- Optane drives support formatting of seven sector types (512+0, 512+8, 512+16, 4K, 4K+8, 4K+64, and 4K+128).
-
-
-##### enum libstorage_ns_pi_type
-
-1. Prototype
-
-```
-enum libstorage_ns_pi_type
-{
-LIBSTORAGE_FMT_NVM_PROTECTION_DISABLE = 0x0,
-LIBSTORAGE_FMT_NVM_PROTECTION_TYPE1 = 0x1,
-LIBSTORAGE_FMT_NVM_PROTECTION_TYPE2 = 0x2,
-LIBSTORAGE_FMT_NVM_PROTECTION_TYPE3 = 0x3,
-};
-```
-
-2. Description
-
-Protection type supported by drives.
-
-3. Remarks
-
-ES3000 supports only protection types 0 and 3. Optane drives support only protection types 0 and 1.
-
-##### enum libstorage_crc_and_prchk
-
-1. Prototype
-
-```
-enum libstorage_crc_and_prchk
-{
-LIBSTORAGE_APP_CRC_AND_DISABLE_PRCHK = 0x0,
-LIBSTORAGE_APP_CRC_AND_ENABLE_PRCHK = 0x1,
-LIBSTORAGE_LIB_CRC_AND_DISABLE_PRCHK = 0x2,
-LIBSTORAGE_LIB_CRC_AND_ENABLE_PRCHK = 0x3,
-#define NVME_NO_REF 0x4
-LIBSTORAGE_APP_CRC_AND_DISABLE_PRCHK_NO_REF = LIBSTORAGE_APP_CRC_AND_DISABLE_PRCHK | NVME_NO_REF,
-LIBSTORAGE_APP_CRC_AND_ENABLE_PRCHK_NO_REF = LIBSTORAGE_APP_CRC_AND_ENABLE_PRCHK | NVME_NO_REF,
-};
-```
-
-2. Description
-
-- **LIBSTORAGE_APP_CRC_AND_DISABLE_PRCHK**: Cyclic redundancy check (CRC) is performed for the application layer, but not for HSAK. CRC is disabled for drives.
-
-- **LIBSTORAGE_APP_CRC_AND_ENABLE_PRCHK**: CRC is performed for the application layer, but not for HSAK. CRC is enabled for drives.
-
-- **LIBSTORAGE_LIB_CRC_AND_DISABLE_PRCHK**: CRC is performed for HSAK, but not for the application layer. CRC is disabled for drives.
-
-- **LIBSTORAGE_LIB_CRC_AND_ENABLE_PRCHK**: CRC is performed for HSAK, but not for the application layer. CRC is enabled for drives.
-
-- **LIBSTORAGE_APP_CRC_AND_DISABLE_PRCHK_NO_REF**: CRC is performed for the application layer, but not for HSAK. CRC is disabled for drives. REF tag verification is disabled for drives whose PI TYPE is 1 (Intel Optane P4800).
-
-- **LIBSTORAGE_APP_CRC_AND_ENABLE_PRCHK_NO_REF**: CRC is performed for the application layer, but not for HSAK. CRC is enabled for drives. REF tag verification is disabled for drives whose PI TYPE is 1 (Intel Optane P4800).
-
-- If PI TYPE of an Intel Optane P4800 drive is 1, the CRC and REF tag of the metadata area are verified by default.
-
-- Intel Optane P4800 drives support DIF in 512+8 format but do not support DIF in 4096+64 format.
-
-- For ES3000 V3 and ES3000 V5, PI TYPE of the drives is 3. By default, only the CRC of the metadata area is verified.
-
-- ES3000 V3 supports DIF in 512+8 format but does not support DIF in 4096+64 format. ES3000 V5 supports DIF in both 512+8 and 4096+64 formats.
-
-The summary is as follows:
-| E2E Verification Mode | Ctrl Flag | CRC Generator | Write: Application Verification | Write: CRC for HSAK | Write: CRC for Drives | Read: Application Verification | Read: CRC for HSAK | Read: CRC for Drives |
-| --- | --- | --- | --- | --- | --- | --- | --- | --- |
-| Halfway protection | 0 | Controller | X | X | X | X | X | X |
-| Halfway protection | 1 | Controller | X | X | X | X | X | √ |
-| Halfway protection | 2 | Controller | X | X | X | X | X | X |
-| Halfway protection | 3 | Controller | X | X | X | X | X | √ |
-| Full protection | 0 | App | √ | X | X | √ | X | X |
-| Full protection | 1 | App | √ | X | √ | √ | X | √ |
-| Full protection | 2 | HSAK | X | √ | X | X | √ | X |
-| Full protection | 3 | HSAK | X | √ | √ | X | √ | √ |
- - - - - -##### enum libstorage_print_log_level - -1. Prototype - -``` -enum libstorage_print_log_level -{ -LIBSTORAGE_PRINT_LOG_ERROR, -LIBSTORAGE_PRINT_LOG_WARN, -LIBSTORAGE_PRINT_LOG_NOTICE, -LIBSTORAGE_PRINT_LOG_INFO, -LIBSTORAGE_PRINT_LOG_DEBUG, -}; -``` - -2. Description - -Storage Performance Development Kit (SPDK) log print levels: ERROR, WARN, NOTICE, INFO, and DEBUG, corresponding to 0 to 4 in the configuration file. - -##### MAX_BDEV_NAME_LEN - -1. Prototype - -``` -#define MAX_BDEV_NAME_LEN 24 -``` - -2. Description - -Maximum length of a block device name. - -##### MAX_CTRL_NAME_LEN - -1. Prototype - -``` -#define MAX_CTRL_NAME_LEN 16 -``` - -2. Description - -Maximum length of a controller. - -##### LBA_FORMAT_NUM - -1. Prototype - -``` -#define LBA_FORMAT_NUM 16 -``` - -2. Description - -Number of LBA formats supported by a controller. - -##### LIBSTORAGE_MAX_DSM_RANGE_DESC_COUNT - -1. Prototype - -``` -#define LIBSTORAGE_MAX_DSM_RANGE_DESC_COUNT 256 -``` - -2. Description - -Maximum number of 16-byte sets in the dataset management command. - -#### ublock.h - -##### UBLOCK_NVME_UEVENT_SUBSYSTEM_UIO - -1. Prototype - -``` -#define UBLOCK_NVME_UEVENT_SUBSYSTEM_UIO 1 -``` - -2. Description - -This macro is used to define that the subsystem corresponding to the uevent event is the userspace I/O subsystem (UIO) provided by the kernel. When the service receives the uevent event, this macro is used to determine whether the event is a UIO event that needs to be processed. - -The value of the int subsystem member in struct ublock_uevent is **UBLOCK_NVME_UEVENT_SUBSYSTEM_UIO**. Currently, only this value is available. - -##### UBLOCK_TRADDR_MAX_LEN - -1. Prototype - -``` -#define UBLOCK_TRADDR_MAX_LEN 256 -``` - -2. Description - -The *Domain:Bus:Device.Function* (**%04x:%02x:%02x.%x**) format indicates the maximum length of the PCI address character string. The actual length is far less than 256 bytes. - -##### UBLOCK_PCI_ADDR_MAX_LEN - -1. 
Prototype - -``` -#define UBLOCK_PCI_ADDR_MAX_LEN 256 -``` - -2. Description - -Maximum length of the PCI address character string. The actual length is far less than 256 bytes. The possible formats of the PCI address are as follows: - -- Full address: **%x:%x:%x.%x** or **%x.%x.%x.%x** - -- When the **Function** value is **0**: **%x:%x:%x** - -- When the **Domain** value is **0**: **%x:%x.%x** or **%x.%x.%x** - -- When the **Domain** and **Function** values are **0**: **%x:%x** or **%x.%x** - -##### UBLOCK_SMART_INFO_LEN - -1. Prototype - -``` -#define UBLOCK_SMART_INFO_LEN 512 -``` - -2. Description - -Size of the structure for the S.M.A.R.T. information of an NVMe drive, which is 512 bytes. - -##### enum ublock_rpc_server_status - -1. Prototype - -``` -enum ublock_rpc_server_status { -// start rpc server or not -UBLOCK_RPC_SERVER_DISABLE = 0, -UBLOCK_RPC_SERVER_ENABLE = 1, -}; -``` - -2. Description - -Status of the RPC service in HSAK. The status can be enabled or disabled. - -##### enum ublock_nvme_uevent_action - -1. Prototype - -``` -enum ublock_nvme_uevent_action { -UBLOCK_NVME_UEVENT_ADD = 0, -UBLOCK_NVME_UEVENT_REMOVE = 1, -UBLOCK_NVME_UEVENT_INVALID, -}; -``` - -2. Description - -Indicates whether the uevent hot swap event is to insert or remove a drive. - -##### enum ublock_subsystem_type - -1. Prototype - -``` -enum ublock_subsystem_type { -SUBSYSTEM_UIO = 0, -SUBSYSTEM_NVME = 1, -SUBSYSTEM_TOP -}; -``` - -2. Description - -Type of the callback function, which is used to determine whether the callback function is registered for the UIO driver or kernel NVMe driver. - -### Data Structure - -#### bdev_rw.h - -##### struct libstorage_namespace_info - -1. 
Prototype - -``` -struct libstorage_namespace_info -{ -char name[MAX_BDEV_NAME_LEN]; -uint64_t size; /** namespace size in bytes */ -uint64_t sectors; /** number of sectors */ -uint32_t sector_size; /** sector size in bytes */ -uint32_t md_size; /** metadata size in bytes */ -uint32_t max_io_xfer_size; /** maximum i/o size in bytes */ -uint16_t id; /** namespace id */ -uint8_t pi_type; /** end-to-end data protection information type */ -uint8_t is_active :1; /** namespace is active or not */ -uint8_t ext_lba :1; /** namespace support extending LBA size or not */ -uint8_t dsm :1; /** namespace supports Dataset Management or not */ -uint8_t pad :3; -uint64_t reserved; -}; -``` - -2. Description - -This data structure contains the namespace information of a drive. - -3. Struct members - -| Member | Description | -| ---------------------------- | ------------------------------------------------------------ | -| char name[MAX_BDEV_NAME_LEN] | Name of the namespace. | -| uint64_t size | Size of the drive space allocated to the namespace, in bytes. | -| uint64_t sectors | Number of sectors. | -| uint32_t sector_size | Size of each sector, in bytes. | -| uint32_t md_size | Metadata size, in bytes. | -| uint32_t max_io_xfer_size | Maximum size of data in a single I/O operation, in bytes. | -| uint16_t id | Namespace ID. | -| uint8_t pi_type | Data protection type. The value is obtained from enum libstorage_ns_pi_type. | -| uint8_t is_active :1 | Namespace active or not. | -| uint8_t ext_lba :1 | Whether the namespace supports logical block addressing (LBA) in extended mode. | -| uint8_t dsm :1 | Whether the namespace supports dataset management. | -| uint8_t pad :3 | Reserved parameter. | -| uint64_t reserved | Reserved parameter. | - - - - -##### struct libstorage_nvme_ctrlr_info - -1. 
Prototype - -``` -struct libstorage_nvme_ctrlr_info -{ -char name[MAX_CTRL_NAME_LEN]; -char address[24]; -struct -{ -uint32_t domain; -uint8_t bus; -uint8_t dev; -uint8_t func; -} pci_addr; -uint64_t totalcap; /* Total NVM Capacity in bytes */ -uint64_t unusecap; /* Unallocated NVM Capacity in bytes */ -int8_t sn[20]; /* Serial number */ -uint8_t fr[8]; /* Firmware revision */ -uint32_t max_num_ns; /* Number of namespaces */ -uint32_t version; -uint16_t num_io_queues; /* num of io queues */ -uint16_t io_queue_size; /* io queue size */ -uint16_t ctrlid; /* Controller id */ -uint16_t pad1; -struct -{ -struct -{ -/** metadata size */ -uint32_t ms : 16; -/** lba data size */ -uint32_t lbads : 8; -uint32_t reserved : 8; -} lbaf[LBA_FORMAT_NUM]; -uint8_t nlbaf; -uint8_t pad2[3]; -uint32_t cur_format : 4; -uint32_t cur_extended : 1; -uint32_t cur_pi : 3; -uint32_t cur_pil : 1; -uint32_t cur_can_share : 1; -uint32_t mc_extented : 1; -uint32_t mc_pointer : 1; -uint32_t pi_type1 : 1; -uint32_t pi_type2 : 1; -uint32_t pi_type3 : 1; -uint32_t md_start : 1; -uint32_t md_end : 1; -uint32_t ns_manage : 1; /* Supports the Namespace Management and Namespace Attachment commands */ -uint32_t directives : 1; /* Controller support Directives or not */ -uint32_t streams : 1; /* Controller support Streams Directives or not */ -uint32_t dsm : 1; /* Controller support Dataset Management or not */ -uint32_t reserved : 11; -} cap_info; -}; -``` - -1. Description - -This data structure contains the controller information of a drive. - -2. Struct members - - -| Member | Description | -| ------------------------------------------------------------ | ------------------------------------------------------------ | -| char name[MAX_CTRL_NAME_LEN] | Controller name. | -| char address[24] | PCI address, which is a character string. | -| struct
{
uint32_t domain;
uint8_t bus;
uint8_t dev;
uint8_t func;
} pci_addr | PCI address, in segments. | -| uint64_t totalcap | Total capacity of the controller, in bytes. Optane drives are based on the NVMe 1.0 protocol and do not support this parameter. | -| uint64_t unusecap | Free capacity of the controller, in bytes. Optane drives are based on the NVMe 1.0 protocol and do not support this parameter. | -| int8_t sn[20]; | Serial number of a drive, which is an ASCII character string without **0**. | -| uint8_t fr[8]; | Drive firmware version, which is an ASCII character string without **0**. | -| uint32_t max_num_ns | Maximum number of namespaces. | -| uint32_t version | NVMe protocol version supported by the controller. | -| uint16_t num_io_queues | Number of I/O queues supported by a drive. | -| uint16_t io_queue_size | Maximum length of an I/O queue. | -| uint16_t ctrlid | Controller ID. | -| uint16_t pad1 | Reserved parameter. | - -Members of the struct cap_info substructure: - -| Member | Description | -| ------------------------------------------------------------ | ------------------------------------------------------------ | -| struct
{
uint32_t ms : 16;
uint32_t lbads : 8;
uint32_t reserved : 8;
}lbaf[LBA_FORMAT_NUM] | **ms**: metadata size. The minimum value is 8 bytes.&#8203;**lbads**: The LBA size is 2^lbads, and the value of **lbads** is greater than or equal to 9. |
| uint8_t nlbaf | Number of LBA formats supported by the controller. |
| uint8_t pad2[3] | Reserved parameter. |
| uint32_t cur_format : 4 | Current LBA format of the controller. |
| uint32_t cur_extended : 1 | Whether the controller supports LBA in extended mode. |
| uint32_t cur_pi : 3 | Current protection type of the controller. |
| uint32_t cur_pil : 1 | Whether the current protection information (PI) of the controller is located in the first or last eight bytes of the metadata. |
| uint32_t cur_can_share : 1 | Whether the namespace supports multi-path transmission. |
| uint32_t mc_extented : 1 | Whether metadata is transmitted as part of the data buffer. |
| uint32_t mc_pointer : 1 | Whether metadata is separated from the data buffer. |
| uint32_t pi_type1 : 1 | Whether the controller supports protection type 1. |
| uint32_t pi_type2 : 1 | Whether the controller supports protection type 2. |
| uint32_t pi_type3 : 1 | Whether the controller supports protection type 3. |
| uint32_t md_start : 1 | Whether the controller supports protection information in the first eight bytes of metadata. |
| uint32_t md_end : 1 | Whether the controller supports protection information in the last eight bytes of metadata. |
| uint32_t ns_manage : 1 | Whether the controller supports namespace management. |
| uint32_t directives : 1 | Whether the Directives command set is supported. |
| uint32_t streams : 1 | Whether Streams Directives is supported. |
| uint32_t dsm : 1 | Whether Dataset Management commands are supported. |
| uint32_t reserved : 11 | Reserved parameter. |

##### struct libstorage_dsm_range_desc

1. Prototype

```
struct libstorage_dsm_range_desc
{
    /* RESERVED */
    uint32_t reserved;

    /* NUMBER OF LOGICAL BLOCKS */
    uint32_t block_count;

    /* UNMAP LOGICAL BLOCK ADDRESS */
    uint64_t lba;
};
```

2. Description

Definition of a single 16-byte set in the data management command set.

3. Struct members

| Member | Description |
| -------------------- | ------------------------ |
| uint32_t reserved | Reserved parameter. |
| uint32_t block_count | Number of LBAs per unit. |
| uint64_t lba | Start LBA. |

##### struct libstorage_ctrl_streams_param

1. Prototype

```
struct libstorage_ctrl_streams_param
{
    /* MAX Streams Limit */
    uint16_t msl;

    /* NVM Subsystem Streams Available */
    uint16_t nssa;

    /* NVM Subsystem Streams Open */
    uint16_t nsso;

    uint16_t pad;
};
```

2. Description

Streams attribute values supported by NVMe drives.

3. Struct members

| Member | Description |
| ------------- | ------------------------------------------------------------ |
| uint16_t msl | Maximum number of Streams resources supported by a drive. |
| uint16_t nssa | Number of Streams resources that can be used by each NVM subsystem. |
| uint16_t nsso | Number of Streams resources used by each NVM subsystem. |
| uint16_t pad | Reserved parameter. |

##### struct libstorage_bdev_streams_param

1. Prototype

```
struct libstorage_bdev_streams_param
{
    /* Stream Write Size */
    uint32_t sws;

    /* Stream Granularity Size */
    uint16_t sgs;

    /* Namespace Streams Allocated */
    uint16_t nsa;

    /* Namespace Streams Open */
    uint16_t nso;

    uint16_t reserved[3];
};
```

2. Description

Streams attribute values of the namespace.

3. Struct members

| Member | Description |
| -------------------- | ------------------------------------------------------------ |
| uint32_t sws | Write granularity with the optimal performance, in sectors. |
| uint16_t sgs | Write granularity allocated to Streams, in units of **sws**. |
| uint16_t nsa | Number of private Streams resources that can be used by a namespace. |
| uint16_t nso | Number of private Streams resources used by a namespace. |
| uint16_t reserved[3] | Reserved parameter. |

##### struct libstorage_mgr_info

1. Prototype

```
struct libstorage_mgr_info
{
    char pci[24];
    char ctrlName[MAX_CTRL_NAME_LEN];
    uint64_t sector_size;
    uint64_t cap_size;
    uint16_t device_id;
    uint16_t subsystem_device_id;
    uint16_t vendor_id;
    uint16_t subsystem_vendor_id;
    uint16_t controller_id;
    int8_t serial_number[20];
    int8_t model_number[40];
    uint8_t firmware_revision[8];
};
```

2. Description

Drive management information (consistent with the drive information used by the management plane).

3. Struct members

| Member | Description |
| -------------------------------- | ---------------------------------------------- |
| char pci[24] | Character string of the drive PCI address. |
| char ctrlName[MAX_CTRL_NAME_LEN] | Character string of the drive controller name. |
| uint64_t sector_size | Drive sector size. |
| uint64_t cap_size | Drive capacity, in bytes. |
| uint16_t device_id | Drive device ID. |
| uint16_t subsystem_device_id | Drive subsystem device ID. |
| uint16_t vendor_id | Drive vendor ID. |
| uint16_t subsystem_vendor_id | Drive subsystem vendor ID. |
| uint16_t controller_id | Drive controller ID. |
| int8_t serial_number[20] | Drive serial number. |
| int8_t model_number[40] | Device model. |
| uint8_t firmware_revision[8] | Firmware version. |
##### struct __attribute__((packed)) libstorage_smart_info

1. Prototype

```
/* same with struct spdk_nvme_health_information_page in nvme_spec.h */
struct __attribute__((packed)) libstorage_smart_info {
    /*
     * details of uint8_t critical_warning:
     *
     * union spdk_nvme_critical_warning_state {
     *     uint8_t raw;
     *     struct {
     *         uint8_t available_spare : 1;
     *         uint8_t temperature : 1;
     *         uint8_t device_reliability : 1;
     *         uint8_t read_only : 1;
     *         uint8_t volatile_memory_backup : 1;
     *         uint8_t reserved : 3;
     *     } bits;
     * };
     */
    uint8_t critical_warning;
    uint16_t temperature;
    uint8_t available_spare;
    uint8_t available_spare_threshold;
    uint8_t percentage_used;
    uint8_t reserved[26];

    /*
     * Note that the following are 128-bit values, but are
     * defined as an array of 2 64-bit values.
     */
    /* Data Units Read is always in 512-byte units. */
    uint64_t data_units_read[2];
    /* Data Units Written is always in 512-byte units. */
    uint64_t data_units_written[2];
    /* For NVM command set, this includes Compare commands. */
    uint64_t host_read_commands[2];
    uint64_t host_write_commands[2];
    /* Controller Busy Time is reported in minutes. */
    uint64_t controller_busy_time[2];
    uint64_t power_cycles[2];
    uint64_t power_on_hours[2];
    uint64_t unsafe_shutdowns[2];
    uint64_t media_errors[2];
    uint64_t num_error_info_log_entries[2];
    /* Controller temperature related. */
    uint32_t warning_temp_time;
    uint32_t critical_temp_time;
    uint16_t temp_sensor[8];
    uint8_t reserved2[296];
};
```

2. Description

This data structure defines the S.M.A.R.T. information of a drive.

3. Struct members

| Member | **Description (For details, see the NVMe protocol.)** |
| -------------------------------------- | ------------------------------------------------------------ |
| uint8_t critical_warning | Critical alarm of the controller status. If a bit is set to 1, the bit is valid. Multiple bits can be valid at the same time. Critical alarms are returned to the host through asynchronous events.&#8203;Bit 0: When this bit is set to 1, the redundant space is less than the specified threshold.&#8203;Bit 1: When this bit is set to 1, the temperature is higher or lower than a major threshold.&#8203;Bit 2: When this bit is set to 1, component reliability is reduced due to major media errors or internal errors.&#8203;Bit 3: When this bit is set to 1, the medium has been set to the read-only mode.&#8203;Bit 4: When this bit is set to 1, the volatile component of the controller fails. This parameter is valid only when the volatile component exists in the controller.&#8203;Bits 5-7: reserved. |
| uint16_t temperature | Temperature of a component. The unit is Kelvin. |
| uint8_t available_spare | Percentage of the available redundant space (0 to 100%). |
| uint8_t available_spare_threshold | Threshold of the available redundant space. An asynchronous event is reported when the available redundant space is lower than the threshold. |
| uint8_t percentage_used | Percentage of the actual service life of a component to the service life expected by the manufacturer. The value **100** indicates that the actual service life has reached the expected service life, but the component can still be used. The value can be greater than 100, but any value greater than 254 will be set to 255. |
| uint8_t reserved[26] | Reserved. |
| uint64_t data_units_read[2] | Number of 512-byte units read by the host from the controller. The value **1** indicates that 1000 x 512 bytes are read, excluding metadata. If the LBA size is not 512 bytes, the controller converts it into 512-byte units for calculation. The value is expressed in hexadecimal notation. |
| uint64_t data_units_written[2] | Number of 512-byte units written by the host to the controller. The value **1** indicates that 1000 x 512 bytes are written, excluding metadata. If the LBA size is not 512 bytes, the controller converts it into 512-byte units for calculation. The value is expressed in hexadecimal notation. |
| uint64_t host_read_commands[2] | Number of read commands delivered to the controller. |
| uint64_t host_write_commands[2] | Number of write commands delivered to the controller. |
| uint64_t controller_busy_time[2] | Busy time for the controller to process I/O commands. The controller is busy from the time a command is delivered to the time the result is returned to the CQ. The time is expressed in minutes. |
| uint64_t power_cycles[2] | Number of machine on/off cycles. |
| uint64_t power_on_hours[2] | Power-on duration, in hours. |
| uint64_t unsafe_shutdowns[2] | Number of abnormal power-off times. The value is incremented by 1 when CC.SHN is not received during power-off. |
| uint64_t media_errors[2] | Number of times the controller detects unrecoverable data integrity errors, including uncorrectable ECC errors, CRC errors, and LBA tag mismatches. |
| uint64_t num_error_info_log_entries[2] | Number of entries in the error information log within the controller lifecycle. |
| uint32_t warning_temp_time | Accumulated time when the temperature exceeds the warning alarm threshold, in minutes. |
| uint32_t critical_temp_time | Accumulated time when the temperature exceeds the critical alarm threshold, in minutes. |
| uint16_t temp_sensor[8] | Temperature of temperature sensors 1-8. The unit is Kelvin. |
| uint8_t reserved2[296] | Reserved. |

##### libstorage_dpdk_contig_mem

1. Prototype

```
struct libstorage_dpdk_contig_mem {
    uint64_t virtAddr;
    uint64_t memLen;
    uint64_t allocLen;
};
```

2. Description

Description of a contiguous virtual memory segment in the parameters of the callback function that notifies the service layer of initialization completion after the DPDK memory is initialized.

Currently, about 800 MB of memory is reserved for HSAK. The remaining memory is returned to the service layer through **allocLen** in this struct so that the service layer can allocate and manage it itself.

The total memory to be reserved for HSAK is about 800 MB. The memory reserved on each memory segment is calculated based on the number of NUMA nodes in the environment. When there are too many NUMA nodes, the memory reserved on each segment becomes too small and HSAK initialization fails. Therefore, HSAK supports only environments with a maximum of four NUMA nodes.

3. Struct members

| Member | Description |
| ----------------- | -------------------------------------------------------- |
| uint64_t virtAddr | Start address of the virtual memory. |
| uint64_t memLen | Length of the virtual memory, in bytes. |
| uint64_t allocLen | Available memory length in the memory segment, in bytes. |

##### struct libstorage_dpdk_init_notify_arg

1. Prototype

```
struct libstorage_dpdk_init_notify_arg {
    uint64_t baseAddr;
    uint16_t memsegCount;
    struct libstorage_dpdk_contig_mem *memseg;
};
```

2. Description

Callback function parameter used to notify the service layer of initialization completion after DPDK memory initialization, indicating information about all virtual memory segments.

3. Struct members

| Member | Description |
| ----------------------------------------- | ------------------------------------------------------------ |
| uint64_t baseAddr | Start address of the virtual memory. |
| uint16_t memsegCount | Number of valid **memseg** array members, that is, the number of contiguous virtual memory segments. |
| struct libstorage_dpdk_contig_mem *memseg | Pointer to the memory segment array. Each array element is a contiguous virtual memory segment, and every two elements are discontiguous. |

##### struct libstorage_dpdk_init_notify

1. Prototype

```
struct libstorage_dpdk_init_notify {
    const char *name;
    void (*notifyFunc)(const struct libstorage_dpdk_init_notify_arg *arg);
    TAILQ_ENTRY(libstorage_dpdk_init_notify) tailq;
};
```

2. Description

Struct used to register, with the service layer, the callback function that is invoked after the DPDK memory is initialized.

3. Struct members

| Member | Description |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| const char *name | Name of the service-layer module that registered the callback function. |
| void (*notifyFunc)(const struct libstorage_dpdk_init_notify_arg *arg) | Callback function used to notify the service layer of initialization completion after the DPDK memory is initialized. |
| TAILQ_ENTRY(libstorage_dpdk_init_notify) tailq | Linked list that stores registered callback functions. |

#### ublock.h

##### struct ublock_bdev_info

1. Prototype

```
struct ublock_bdev_info {
    uint64_t sector_size;
    uint64_t cap_size;
    uint16_t device_id;
    uint16_t subsystem_device_id;  /* subsystem device id of nvme control */
    uint16_t vendor_id;
    uint16_t subsystem_vendor_id;
    uint16_t controller_id;
    int8_t serial_number[20];
    int8_t model_number[40];
    int8_t firmware_revision[8];
};
```

2. Description

This data structure contains the device information of a drive.

3. Struct members

| Member | Description |
| ---------------------------- | ----------------------------------------------- |
| uint64_t sector_size | Sector size of a drive, for example, 512 bytes. |
| uint64_t cap_size | Total drive capacity, in bytes. |
| uint16_t device_id | Device ID. |
| uint16_t subsystem_device_id | Device ID of a subsystem. |
| uint16_t vendor_id | Main ID of the device vendor. |
| uint16_t subsystem_vendor_id | Sub-ID of the device vendor. |
| uint16_t controller_id | ID of the device controller. |
| int8_t serial_number[20] | Device serial number. |
| int8_t model_number[40] | Device model. |
| int8_t firmware_revision[8] | Firmware version. |

##### struct ublock_bdev

1. Prototype

```
struct ublock_bdev {
    char pci[UBLOCK_PCI_ADDR_MAX_LEN];
    struct ublock_bdev_info info;
    struct spdk_nvme_ctrlr *ctrlr;
    TAILQ_ENTRY(ublock_bdev) link;
};
```

2. Description

This data structure contains the drive information for a specified PCI address; the structure itself is a node in a queue.

3. Struct members

| Member | Description |
| --------------------------------- | ------------------------------------------------------------ |
| char pci[UBLOCK_PCI_ADDR_MAX_LEN] | PCI address. |
| struct ublock_bdev_info info | Drive information. |
| struct spdk_nvme_ctrlr *ctrlr | Data structure of the device controller. The members in this structure are not open to external systems. External services can obtain the corresponding member data through the SPDK open source interface. |
| TAILQ_ENTRY(ublock_bdev) link | Structure of the pointers before and after a queue node. |

##### struct ublock_bdev_mgr

1. Prototype

```
struct ublock_bdev_mgr {
    TAILQ_HEAD(, ublock_bdev) bdevs;
};
```

2. Description

This data structure defines the header structure of a ublock_bdev queue.

3. Struct members

| Member | Description |
| -------------------------------- | ----------------------- |
| TAILQ_HEAD(, ublock_bdev) bdevs; | Queue header structure. |

##### struct __attribute__((packed)) ublock_SMART_info

1. Prototype

```
struct __attribute__((packed)) ublock_SMART_info {
    uint8_t critical_warning;
    uint16_t temperature;
    uint8_t available_spare;
    uint8_t available_spare_threshold;
    uint8_t percentage_used;
    uint8_t reserved[26];

    /*
     * Note that the following are 128-bit values, but are
     * defined as an array of 2 64-bit values.
     */
    /* Data Units Read is always in 512-byte units. */
    uint64_t data_units_read[2];
    /* Data Units Written is always in 512-byte units. */
    uint64_t data_units_written[2];
    /* For NVM command set, this includes Compare commands. */
    uint64_t host_read_commands[2];
    uint64_t host_write_commands[2];
    /* Controller Busy Time is reported in minutes. */
    uint64_t controller_busy_time[2];
    uint64_t power_cycles[2];
    uint64_t power_on_hours[2];
    uint64_t unsafe_shutdowns[2];
    uint64_t media_errors[2];
    uint64_t num_error_info_log_entries[2];
    /* Controller temperature related. */
    uint32_t warning_temp_time;
    uint32_t critical_temp_time;
    uint16_t temp_sensor[8];
    uint8_t reserved2[296];
};
```

2. Description

This data structure defines the S.M.A.R.T. information of a drive.

3. Struct members

| Member | Description (For details, see the NVMe protocol.) |
| -------------------------------------- | ------------------------------------------------------------ |
| uint8_t critical_warning | Critical alarm of the controller status. If a bit is set to 1, the bit is valid. Multiple bits can be valid at the same time. Critical alarms are returned to the host through asynchronous events.&#8203;Bit 0: When this bit is set to 1, the redundant space is less than the specified threshold.&#8203;Bit 1: When this bit is set to 1, the temperature is higher or lower than a major threshold.&#8203;Bit 2: When this bit is set to 1, component reliability is reduced due to major media errors or internal errors.&#8203;Bit 3: When this bit is set to 1, the medium has been set to the read-only mode.&#8203;Bit 4: When this bit is set to 1, the volatile component of the controller fails. This parameter is valid only when the volatile component exists in the controller.&#8203;Bits 5-7: reserved. |
| uint16_t temperature | Temperature of a component. The unit is Kelvin. |
| uint8_t available_spare | Percentage of the available redundant space (0 to 100%). |
| uint8_t available_spare_threshold | Threshold of the available redundant space. An asynchronous event is reported when the available redundant space is lower than the threshold. |
| uint8_t percentage_used | Percentage of the actual service life of a component to the service life expected by the manufacturer. The value **100** indicates that the actual service life has reached the expected service life, but the component can still be used. The value can be greater than 100, but any value greater than 254 will be set to 255. |
| uint8_t reserved[26] | Reserved. |
| uint64_t data_units_read[2] | Number of 512-byte units read by the host from the controller. The value **1** indicates that 1000 x 512 bytes are read, excluding metadata. If the LBA size is not 512 bytes, the controller converts it into 512-byte units for calculation. The value is expressed in hexadecimal notation. |
| uint64_t data_units_written[2] | Number of 512-byte units written by the host to the controller. The value **1** indicates that 1000 x 512 bytes are written, excluding metadata. If the LBA size is not 512 bytes, the controller converts it into 512-byte units for calculation. The value is expressed in hexadecimal notation. |
| uint64_t host_read_commands[2] | Number of read commands delivered to the controller. |
| uint64_t host_write_commands[2] | Number of write commands delivered to the controller. |
| uint64_t controller_busy_time[2] | Busy time for the controller to process I/O commands. The controller is busy from the time a command is delivered to the time the result is returned to the CQ. The value is expressed in minutes. |
| uint64_t power_cycles[2] | Number of machine on/off cycles. |
| uint64_t power_on_hours[2] | Power-on duration, in hours. |
| uint64_t unsafe_shutdowns[2] | Number of abnormal power-off times. The value is incremented by 1 when CC.SHN is not received during power-off. |
| uint64_t media_errors[2] | Number of unrecoverable data integrity errors detected by the controller, including uncorrectable ECC errors, CRC errors, and LBA tag mismatches. |
| uint64_t num_error_info_log_entries[2] | Number of entries in the error information log within the controller lifecycle. |
| uint32_t warning_temp_time | Accumulated time when the temperature exceeds the warning alarm threshold, in minutes. |
| uint32_t critical_temp_time | Accumulated time when the temperature exceeds the critical alarm threshold, in minutes. |
| uint16_t temp_sensor[8] | Temperature of temperature sensors 1-8. The unit is Kelvin. |
| uint8_t reserved2[296] | Reserved. |

##### struct ublock_nvme_error_info

1. Prototype

```
struct ublock_nvme_error_info {
    uint64_t error_count;
    uint16_t sqid;
    uint16_t cid;
    uint16_t status;
    uint16_t error_location;
    uint64_t lba;
    uint32_t nsid;
    uint8_t vendor_specific;
    uint8_t reserved[35];
};
```

2. Description

This data structure contains the content of a single error message in the device controller. The number of errors supported by different controllers may vary.

3. Struct members

| Member | Description (For details, see the NVMe protocol.) |
| ----------------------- | ------------------------------------------------------------ |
| uint64_t error_count | Error sequence number, which increases in ascending order. |
| uint16_t sqid | Submission queue identifier for the command associated with an error message. If an error cannot be associated with a specific command, this parameter should be set to **FFFFh**. |
| uint16_t cid | Command identifier associated with an error message. If an error cannot be associated with a specific command, this parameter should be set to **FFFFh**. |
| uint16_t status | Status of a completed command. |
| uint16_t error_location | Command parameter associated with an error message. |
| uint64_t lba | First LBA when an error occurs. |
| uint32_t nsid | Namespace where an error occurs. |
| uint8_t vendor_specific | Log page identifier associated with the page if other vendor-specific error messages are available. The value **00h** indicates that no additional information is available. The valid value ranges from 80h to FFh. |
| uint8_t reserved[35] | Reserved. |

##### struct ublock_uevent

1. Prototype

```
struct ublock_uevent {
    enum ublock_nvme_uevent_action action;
    int subsystem;
    char traddr[UBLOCK_TRADDR_MAX_LEN + 1];
};
```

2. Description

This data structure contains parameters related to the uevent event.

3. Struct members

| Member | Description |
| -------------------------------------- | ------------------------------------------------------------ |
| enum ublock_nvme_uevent_action action | Indicates, through the enumeration, whether the uevent event type is drive insertion or removal. |
| int subsystem | Subsystem type of the uevent event. Currently, only **UBLOCK_NVME_UEVENT_SUBSYSTEM_UIO** is supported. If the application receives other values, no processing is required. |
| char traddr[UBLOCK_TRADDR_MAX_LEN + 1] | PCI address character string in the *Domain:Bus:Device.Function* (**%04x:%02x:%02x.%x**) format. |

##### struct ublock_hook

1. Prototype

```
struct ublock_hook
{
    ublock_callback_func ublock_callback;
    void *user_data;
};
```

2. Description

This data structure is used to register callback functions.

3. Struct members

| Member | Description |
| ------------------------------------ | ------------------------------------------------------------ |
| ublock_callback_func ublock_callback | Function executed during callback. The type is bool func(void *info, void *user_data). |
| void *user_data | User parameter transferred to the callback function. |
Prototype - -``` -struct ublock_ctrl_iostat_info -{ -uint64_t num_read_ops; -uint64_t num_write_ops; -uint64_t read_latency_ms; -uint64_t write_latency_ms; -uint64_t io_outstanding; -uint64_t num_poll_timeout; -uint64_t io_ticks_ms; -}; -``` - -2. Description - -This data structure is used to obtain the I/O statistics of a controller. - -3. Struct members - -| Member | Description | -| ------------------------- | ------------------------------------------------------------ | -| uint64_t num_read_ops | Accumulated number of read I/Os of the controller. | -| uint64_t num_write_ops | Accumulated number of write I/Os of the controller. | -| uint64_t read_latency_ms | Accumulated read latency of the controller, in ms. | -| uint64_t write_latency_ms | Accumulated write latency of the controller, in ms. | -| uint64_t io_outstanding | Queue depth of the controller. | -| uint64_t num_poll_timeout | Accumulated number of polling timeouts of the controller. | -| uint64_t io_ticks_ms | Accumulated I/O processing latency of the controller, in ms. | - -### API - -#### bdev_rw.h - -##### libstorage_get_nvme_ctrlr_info - -1. Prototype - -uint32_t libstorage_get_nvme_ctrlr_info(struct libstorage_nvme_ctrlr_info** ppCtrlrInfo); - -2. Description - -Obtains information about all controllers. - -3. Parameters - -| Parameter | Description | -| ----------------------------------------------- | ------------------------------------------------------------ | -| struct libstorage_nvme_ctrlr_info** ppCtrlrInfo | Output parameter, which returns all obtained controller information.
Note:
Free the memory using the free API in a timely manner. | - -4. Return value - -| Return Value | Description | -| ------------ | ------------------------------------------------------------ | -| 0 | Failed to obtain controller information or no controller information is obtained. | -| > 0 | Number of obtained controllers. | - -##### libstorage_get_mgr_info_by_esn - -1. Prototype - -``` -int32_t libstorage_get_mgr_info_by_esn(const char *esn, struct libstorage_mgr_info *mgr_info); -``` - -2. Description - -Obtains the management information about the NVMe drive corresponding to the ESN. - -3. Parameters - -| Parameter | Description | -| ------------------------------------ | ------------------------------------------------------------ | -| const char *esn | ESN of the target device.
Note:
An ESN is a string of a maximum of 20 characters (excluding the end character of the string), but the length may vary according to hardware vendors. For example, if the length is less than 20 characters, spaces are padded at the end of the character string.
| -| struct libstorage_mgr_info *mgr_info | Output parameter, which returns all obtained NVMe drive management information. | - -4. Return value - -| Return Value | Description | -| ------------ | ------------------------------------------------------------ | -| 0 | Succeeded in querying the NVMe drive management information corresponding to an ESN. | -| -1 | Failed to query the NVMe drive management information corresponding to an ESN. | -| -2 | No NVMe drive matching an ESN is obtained. | - -##### libstorage_get_mgr_smart_by_esn - -1. Prototype - -``` -int32_t libstorage_get_mgr_smart_by_esn(const char *esn, uint32_t nsid, struct libstorage_smart_info *mgr_smart_info); -``` - -2. Description - -Obtains the S.M.A.R.T. information of the NVMe drive corresponding to an ESN. - -3. Parameters - -| Parameter | Description | -| ------------------------------------ | ------------------------------------------------------------ | -| const char *esn | ESN of the target device.
Note:
An ESN is a string of a maximum of 20 characters (excluding the end character of the string), but the length may vary according to hardware vendors. For example, if the length is less than 20 characters, spaces are padded at the end of the character string.
| -| uint32_t nsid | Specified namespace. | -| struct libstorage_mgr_info *mgr_info | Output parameter, which returns all obtained S.M.A.R.T. information of NVMe drives. | - -4. Return value - -| Return Value | Description | -| ------------ | ------------------------------------------------------------ | -| 0 | Succeeded in querying the S.M.A.R.T. information of the NVMe drive corresponding to an ESN. | -| -1 | Failed to query the S.M.A.R.T. information of the NVMe drive corresponding to an ESN. | -| -2 | No NVMe drive matching an ESN is obtained. | - -##### libstorage_get_bdev_ns_info - -1. Prototype - -``` -uint32_t libstorage_get_bdev_ns_info(const char* bdevName, struct libstorage_namespace_info** ppNsInfo); -``` - -2. Description - -Obtains namespace information based on the device name. - -3. Parameters - -| Parameter | Description | -| ------------------------------------------- | ------------------------------------------------------------ | -| const char* bdevName | Device name. | -| struct libstorage_namespace_info** ppNsInfo | Output parameter, which returns namespace information.
Note:
Free the memory using the free API in a timely manner. | - -4. Return value - -| Return Value | Description | -| ------------ | ---------------------------- | -| 0 | The operation failed. | -| 1 | The operation is successful. | - -##### libstorage_get_ctrl_ns_info - -1. Prototype - -``` -uint32_t libstorage_get_ctrl_ns_info(const char* ctrlName, struct libstorage_namespace_info** ppNsInfo); -``` - -2. Description - -Obtains information about all namespaces based on the controller name. - -3. Parameters - -| Parameter | Description | -| ------------------------------------------- | ------------------------------------------------------------ | -| const char* ctrlName | Controller name. | -| struct libstorage_namespace_info** ppNsInfo | Output parameter, which returns information about all namespaces.
Note:
Free the memory using the free API in a timely manner. | - -4. Return value - -| Return Value | Description | -| ------------ | ------------------------------------------------------------ | -| 0 | Failed to obtain the namespace information or no namespace information is obtained. | -| > 0 | Number of namespaces obtained. | - -##### libstorage_create_namespace - -1. Prototype - -``` -int32_t libstorage_create_namespace(const char* ctrlName, uint64_t ns_size, char** outputName); -``` - -2. Description - -Creates a namespace on a specified controller (the prerequisite is that the controller supports namespace management). - -Optane drives are based on the NVMe 1.0 protocol and do not support namespace management. Therefore, this API is not supported. - -ES3000 V3 and V5 support only one namespace by default. By default, a namespace exists on the controller. To create a namespace, delete the original namespace. - -3. Parameters - -| Parameter | Description | -| -------------------- | ------------------------------------------------------------ | -| const char* ctrlName | Controller name. | -| uint64_t ns_size | Size of the namespace to be created (unit: sector_size). | -| char** outputName | Output parameter, which indicates the name of the created namespace.
Note:
Free the memory using the free API in a timely manner. | - -4. Return value - -| Return Value | Description | -| ------------ | ---------------------------------------------- | -| ≤ 0 | Failed to create the namespace. | -| > 0 | ID of the created namespace (starting from 1). | - -##### libstorage_delete_namespace - -1. Prototype - -``` -int32_t libstorage_delete_namespace(const char* ctrlName, uint32_t ns_id); -``` - -2. Description - -Deletes a namespace from a specified controller. Optane drives are based on the NVMe 1.0 protocol and do not support namespace management. Therefore, this API is not supported. - -3. Parameters - -| Parameter | Description | -| -------------------- | ---------------- | -| const char* ctrlName | Controller name. | -| uint32_t ns_id | Namespace ID | - -4. Return value - -| Return Value | Description | -| ------------ | ------------------------------------------------------------ | -| 0 | Deletion succeeded. | -| Other values | Deletion failed.
Note:
Before deleting a namespace, stop I/O operations. Otherwise, the namespace fails to be deleted. | - -##### libstorage_delete_all_namespace - -1. Prototype - -``` -int32_t libstorage_delete_all_namespace(const char* ctrlName); -``` - -2. Description - -Deletes all namespaces from a specified controller. Optane drives are based on the NVMe 1.0 protocol and do not support namespace management. Therefore, this API is not supported. - -3. Parameters - -| Parameter | Description | -| -------------------- | ---------------- | -| const char* ctrlName | Controller name. | - -4. Return value - -| Return Value | Description | -| ------------ | ------------------------------------------------------------ | -| 0 | Deletion succeeded. | -| Other values | Deletion failed.
Note:
Before deleting a namespace, stop I/O operations. Otherwise, the namespace fails to be deleted. | - -##### libstorage_nvme_create_ctrlr - -1. Prototype - -``` -int32_t libstorage_nvme_create_ctrlr(const char *pci_addr, const char *ctrlr_name); -``` - -2. Description - -Creates an NVMe controller based on the PCI address. - -3. Parameters - -| Parameter | Description | -| ---------------- | ---------------- | -| char *pci_addr | PCI address. | -| char *ctrlr_name | Controller name. | - -4. Return value - -| Return Value | Description | -| ------------ | ------------------- | -| < 0 | Creation failed. | -| 0 | Creation succeeded. | - -##### libstorage_nvme_delete_ctrlr - -1. Prototype - -``` -int32_t libstorage_nvme_delete_ctrlr(const char *ctrlr_name); -``` - -1. Description - -Destroys an NVMe controller based on the controller name. - -2. Parameters - -| Parameter | Description | -| ---------------------- | ---------------- | -| const char *ctrlr_name | Controller name. | - -This API can be called only after all delivered I/Os are returned. - -3. Return value - -| Return Value | Description | -| ------------ | ---------------------- | -| < 0 | Destruction failed. | -| 0 | Destruction succeeded. | - -##### libstorage_nvme_reload_ctrlr - -1. Prototype - -``` -int32_t libstorage_nvme_reload_ctrlr(const char *cfgfile); -``` - -2. Description - -Adds or deletes an NVMe controller based on the configuration file. - -3. Parameters - -| Parameter | Description | -| ------------------- | ------------------------------- | -| const char *cfgfile | Path of the configuration file. | - - -Before using this API to delete a drive, ensure that all delivered I/Os have been returned. - -4. Return value - -| Return Value | Description | -| ------------ | ------------------------------------------------------------ | -| < 0 | Failed to add or delete drives based on the configuration file. (Drives may be successfully added or deleted for some controllers.) 
| -| 0 | Drives are successfully added or deleted based on the configuration file. | - -> Constraints - -- Currently, a maximum of 36 controllers can be configured in the configuration file. - -- The reload API creates as many controllers as possible. If a controller fails to be created, the creation of other controllers is not affected. - -- In concurrency scenarios, the final drive initialization status may be inconsistent with the input configuration file. - -- If a reload operation deletes a drive that is processing I/Os, those I/Os fail. - -- If the controller name (for example, **nvme0**) corresponding to a PCI address in the configuration file is modified, the modification does not take effect when this API is called. - -- The reload function is valid only when drives are added or deleted. Other configuration items in the configuration file cannot be reloaded. - -##### libstorage_low_level_format_nvm - -1. Prototype - -``` -int8_t libstorage_low_level_format_nvm(const char* ctrlName, uint8_t lbaf, -enum libstorage_ns_pi_type piType, -bool pil_start, bool ms_extented, uint8_t ses); -``` - -2. Description - -Performs low-level formatting on NVMe drives. - -3. Parameters - -| Parameter | Description | -| --------------------------------- | ------------------------------------------------------------ | -| const char* ctrlName | Controller name. | -| uint8_t lbaf | LBA format to be used. | -| enum libstorage_ns_pi_type piType | Protection type to be used. | -| bool pil_start | Whether the protection information is stored in the first eight bytes (1) or the last eight bytes (0) of the metadata. | -| bool ms_extented | Whether to format to the extended type. | -| uint8_t ses | Whether to perform secure erase during formatting. Currently, only the value **0** (no secure erase) is supported. | - -4. Return value - -| Return Value | Description | -| ------------ | ------------------------------------------------- | -| < 0 | Formatting failed. 
| ≥ 0 | LBA format generated after successful formatting. | - -> Constraints - -- This low-level formatting API will clear the data and metadata of the drive namespace. Exercise caution when using this API. - -- It takes several seconds to format an ES3000 drive and several minutes to format an Intel Optane drive. Before using this API, wait until the formatting is complete. If the formatting process is forcibly stopped, the formatting fails. - -- Before formatting, stop the I/O operations on the data plane. If the drive is processing I/O requests, the formatting may occasionally fail, and even a successful format may cause the drive to discard the I/O requests being processed. Therefore, ensure that the I/O operations on the data plane are stopped before formatting. - -- During the formatting, the controller is reset. As a result, the initialized drive resources are unavailable. Therefore, after the formatting is complete, restart the I/O process on the data plane. - -- ES3000 V3 supports protection types 0 and 3, PI start and PI end, and mc extended. ES3000 V3 supports DIF in 512+8 format but does not support DIF in 4096+64 format. - -- ES3000 V5 supports protection types 0 and 3, PI start and PI end, mc extended, and mc pointer. ES3000 V5 supports DIF in both 512+8 and 4096+64 formats. - -- Optane drives support protection types 0 and 1, PI end, and mc extended. Optane drives support DIF in 512+8 format but do not support DIF in 4096+64 format. - -| **Drive Type** | **LBA Format** | **Drive Type** | **LBA Format** | -| ------------------ | ------------------------------------------------------------ | -------------- | ------------------------------------------------------------ | -| Intel Optane P4800 | lbaf0:512+0
lbaf1:512+8
lbaf2:512+16
lbaf3:4096+0
lbaf4:4096+8
lbaf5:4096+64
lbaf6:4096+128 | ES3000 V3, V5 | lbaf0:512+0
lbaf1:512+8
lbaf2:4096+64
lbaf3:4096+0
lbaf4:4096+8 | - -##### LIBSTORAGE_CALLBACK_FUNC - -1. Prototype - -``` -typedef void (*LIBSTORAGE_CALLBACK_FUNC)(int32_t cb_status, int32_t sct_code, void* cb_arg); -``` - -2. Description - -I/O completion callback function registered with HSAK. - -3. Parameters - -| Parameter | Description | -| ----------------- | ------------------------------------------------------------ | -| int32_t cb_status | I/O status code. The value **0** indicates success, a negative value indicates a system error code, and a positive value indicates a drive error code (for different error codes,
see [Appendixes](#Appendixes)). | -| int32_t sct_code | I/O status code type:
0: [GENERIC](#generic)
1: [COMMAND_SPECIFIC](#command_specific)
2: [MEDIA_DATA_INTERGRITY_ERROR](#media_data_intergrity_error)
7: VENDOR_SPECIFIC | -| void* cb_arg | Input parameter of the callback function. | - -4. Return value - -None. - -##### libstorage_deallocate_block - -1. Prototype - -``` -int32_t libstorage_deallocate_block(int32_t fd, struct libstorage_dsm_range_desc *range, uint16_t range_count, LIBSTORAGE_CALLBACK_FUNC cb, void* cb_arg); -``` - -2. Description - -Notifies NVMe drives of the blocks that can be released. - -3. Parameters - -| Parameter | Description | -| --------------------------------------- | ------------------------------------------------------------ | -| int32_t fd | Open drive file descriptor. | -| struct libstorage_dsm_range_desc *range | Description of blocks that can be released on NVMe drives.
Note:
The memory for this parameter must be allocated as huge page memory by **libstorage_mem_reserve**, with 4 KB alignment (that is, **align** set to **4096**).
The supported TRIM range differs between drive models. Exceeding a drive's maximum TRIM range may cause data exceptions. | -| uint16_t range_count | Number of members in the **range** array. | -| LIBSTORAGE_CALLBACK_FUNC cb | Callback function. | -| void* cb_arg | Callback function parameter. | - -4. Return value - -| Return Value | Description | -| ------------ | ------------------------------- | -| < 0 | Failed to deliver the request. | -| 0 | Request submitted successfully. | - -##### libstorage_async_write - -1. Prototype - -``` -int32_t libstorage_async_write(int32_t fd, void *buf, size_t nbytes, off64_t offset, void *md_buf, size_t md_len, enum libstorage_crc_and_prchk dif_flag, LIBSTORAGE_CALLBACK_FUNC cb, void* cb_arg); -``` - -2. Description - -Delivers asynchronous I/O write requests (the write buffer is a contiguous buffer). - -3. Parameters - -| Parameter | Description | -| -------------------------------------- | ------------------------------------------------------------ | -| int32_t fd | File descriptor of the block device. | -| void *buf | Buffer for I/O write data (four-byte aligned and cannot cross the 4 KB page boundary).
Note:
LBAs in extended mode must contain the metadata memory size.
Note:
Only the data size is included. LBAs in extended mode do not include the metadata size. | -| off64_t offset | Write offset of the LBA, in bytes (an integer multiple of **sector_size**).
Note:
Only the data size is included. LBAs in extended mode do not include the metadata size. | -| void *md_buf | Metadata buffer. (Applicable only to LBAs in separated mode. Set this parameter to **NULL** for LBAs in extended mode.) | -| size_t md_len | Buffer length of metadata. (Applicable only to LBAs in separated mode. Set this parameter to **0** for LBAs in extended mode.) | -| enum libstorage_crc_and_prchk dif_flag | Whether to calculate DIF and whether to enable drive verification. | -| LIBSTORAGE_CALLBACK_FUNC cb | Registered callback function. | -| void* cb_arg | Parameters of the callback function. | - -4. Return value - -| Return Value | Description | -| ------------ | ---------------------------------------------- | -| 0 | I/O write requests are submitted successfully. | -| Other values | Failed to submit I/O write requests. | - -##### libstorage_async_read - -1. Prototype - -``` -int32_t libstorage_async_read(int32_t fd, void *buf, size_t nbytes, off64_t offset, void *md_buf, size_t md_len, enum libstorage_crc_and_prchk dif_flag, LIBSTORAGE_CALLBACK_FUNC cb, void* cb_arg); -``` - -2. Description - -Delivers asynchronous I/O read requests (the read buffer is a contiguous buffer). - -3. Parameters - -| Parameter | Description | -| -------------------------------------- | ------------------------------------------------------------ | -| int32_t fd | File descriptor of the block device. | -| void *buf | Buffer for I/O read data (four-byte aligned and cannot cross the 4 KB page boundary).
Note:
LBAs in extended mode must contain the metadata memory size. | -| size_t nbytes | Size of a single read I/O, in bytes (an integer multiple of **sector_size**).
Note:
Only the data size is included. LBAs in extended mode do not include the metadata size. | -| off64_t offset | Read offset of the LBA, in bytes (an integer multiple of **sector_size**).
Note:
Only the data size is included. LBAs in extended mode do not include the metadata size. | -| void *md_buf | Metadata buffer. (Applicable only to LBAs in separated mode. Set this parameter to **NULL** for LBAs in extended mode.) | -| size_t md_len | Buffer length of metadata. (Applicable only to LBAs in separated mode. Set this parameter to **0** for LBAs in extended mode.) | -| enum libstorage_crc_and_prchk dif_flag | Whether to calculate DIF and whether to enable drive verification. | -| LIBSTORAGE_CALLBACK_FUNC cb | Registered callback function. | -| void* cb_arg | Parameters of the callback function. | - -4. Return value - -| Return Value | Description | -| ------------ | --------------------------------------------- | -| 0 | I/O read requests are submitted successfully. | -| Other values | Failed to submit I/O read requests. | - -##### libstorage_async_writev - -1. Prototype - -``` -int32_t libstorage_async_writev(int32_t fd, struct iovec *iov, int iovcnt, size_t nbytes, off64_t offset, void *md_buf, size_t md_len, enum libstorage_crc_and_prchk dif_flag, LIBSTORAGE_CALLBACK_FUNC cb, void* cb_arg); -``` - -2. Description - -Delivers asynchronous I/O write requests (the write buffer is a discrete buffer). - -3. Parameters - -| Parameter | Description | -| -------------------------------------- | ------------------------------------------------------------ | -| int32_t fd | File descriptor of the block device. | -| struct iovec *iov | Buffer for I/O write data.
Note:
LBAs in extended mode must contain the metadata size.
The address must be 4-byte-aligned and the length cannot exceed 4 GB. | -| int iovcnt | Number of buffers for I/O write data. | -| size_t nbytes | Size of a single write I/O, in bytes (an integer multiple of **sector_size**).
Note:
Only the data size is included. LBAs in extended mode do not include the metadata size. | -| off64_t offset | Write offset of the LBA, in bytes (an integer multiple of **sector_size**).
Note:
Only the data size is included. LBAs in extended mode do not include the metadata size. | -| void *md_buf | Metadata buffer. (Applicable only to LBAs in separated mode. Set this parameter to **NULL** for LBAs in extended mode.) | -| size_t md_len | Length of the metadata buffer. (Applicable only to LBAs in separated mode. Set this parameter to **0** for LBAs in extended mode.) | -| enum libstorage_crc_and_prchk dif_flag | Whether to calculate DIF and whether to enable drive verification. | -| LIBSTORAGE_CALLBACK_FUNC cb | Registered callback function. | -| void* cb_arg | Parameters of the callback function. | - -4. Return value - -| Return Value | Description | -| ------------ | ---------------------------------------------- | -| 0 | I/O write requests are submitted successfully. | -| Other values | Failed to submit I/O write requests. | - -##### libstorage_async_readv - -1. Prototype - -``` -int32_t libstorage_async_readv(int32_t fd, struct iovec *iov, int iovcnt, size_t nbytes, off64_t offset, void *md_buf, size_t md_len, enum libstorage_crc_and_prchk dif_flag, LIBSTORAGE_CALLBACK_FUNC cb, void* cb_arg); -``` - -2. Description - -Delivers asynchronous I/O read requests (the read buffer is a discrete buffer). - -3. Parameters - -| Parameter | Description | -| -------------------------------------- | ------------------------------------------------------------ | -| int32_t fd | File descriptor of the block device. | -| struct iovec *iov | Buffer for I/O read data.
Note:
LBAs in extended mode must contain the metadata size.
The address must be 4-byte-aligned and the length cannot exceed 4 GB. | -| int iovcnt | Number of buffers for I/O read data. | -| size_t nbytes | Size of a single read I/O, in bytes (an integer multiple of **sector_size**).
Note:
Only the data size is included. LBAs in extended mode do not include the metadata size. | -| off64_t offset | Read offset of the LBA, in bytes (an integer multiple of **sector_size**).
Note:
Only the data size is included. LBAs in extended mode do not include the metadata size. | -| void *md_buf | Metadata buffer. (Applicable only to LBAs in separated mode. Set this parameter to **NULL** for LBAs in extended mode.) | -| size_t md_len | Length of the metadata buffer. (Applicable only to LBAs in separated mode. Set this parameter to **0** for LBAs in extended mode.) | -| enum libstorage_crc_and_prchk dif_flag | Whether to calculate DIF and whether to enable drive verification. | -| LIBSTORAGE_CALLBACK_FUNC cb | Registered callback function. | -| void* cb_arg | Parameters of the callback function. | - -4. Return value - -| Return Value | Description | -| ------------ | --------------------------------------------- | -| 0 | I/O read requests are submitted successfully. | -| Other values | Failed to submit I/O read requests. | - -##### libstorage_sync_write - -1. Prototype - -``` -int32_t libstorage_sync_write(int fd, const void *buf, size_t nbytes, off_t offset); -``` - -2. Description - -Delivers synchronous I/O write requests (the write buffer is a contiguous buffer). - -3. Parameters - -| Parameter | Description | -| -------------- | ------------------------------------------------------------ | -| int32_t fd | File descriptor of the block device. | -| void *buf | Buffer for I/O write data (four-byte aligned and cannot cross the 4 KB page boundary).
Note:
LBAs in extended mode must contain the metadata memory size. | -| size_t nbytes | Size of a single write I/O, in bytes (an integer multiple of **sector_size**).
Note:
Only the data size is included. LBAs in extended mode do not include the metadata size. | -| off64_t offset | Write offset of the LBA, in bytes (an integer multiple of **sector_size**).
Note:
Only the data size is included. LBAs in extended mode do not include the metadata size. | - -4. Return value - -| Return Value | Description | -| ------------ | ---------------------------------------------- | -| 0 | I/O write requests are submitted successfully. | -| Other values | Failed to submit I/O write requests. | - -##### libstorage_sync_read - -1. Prototype - -``` -int32_t libstorage_sync_read(int fd, const void *buf, size_t nbytes, off_t offset); -``` - -2. Description - -Delivers synchronous I/O read requests (the read buffer is a contiguous buffer). - -3. Parameters - -| Parameter | Description | -| -------------- | ------------------------------------------------------------ | -| int32_t fd | File descriptor of the block device. | -| void *buf | Buffer for I/O read data (four-byte aligned and cannot cross the 4 KB page boundary).
Note:
LBAs in extended mode must contain the metadata memory size. | -| size_t nbytes | Size of a single read I/O, in bytes (an integer multiple of **sector_size**).
Note:
Only the data size is included. LBAs in extended mode do not include the metadata size. | -| off64_t offset | Read offset of the LBA, in bytes (an integer multiple of **sector_size**).
Note:
Only the data size is included. LBAs in extended mode do not include the metadata size. | - -4. Return value - -| Return Value | Description | -| ------------ | --------------------------------------------- | -| 0 | I/O read requests are submitted successfully. | -| Other values | Failed to submit I/O read requests. | - -##### libstorage_open - -1. Prototype - -``` -int32_t libstorage_open(const char* devfullname); -``` - -2. Description - -Opens a block device. - -3. Parameters - -| Parameter | Description | -| ----------------------- | ---------------------------------------- | -| const char* devfullname | Block device name (format: **nvme0n1**). | - -4. Return value - -| Return Value | Description | -| ------------ | ------------------------------------------------------------ | -| -1 | Opening failed. For example, the device name is incorrect, or the number of opened FDs is greater than the number of available channels of the NVMe drive. | -| > 0 | File descriptor of the block device. | - -When the MultiQ function in **nvme.conf.in** is enabled, different FDs are returned if a thread opens the same device multiple times. Otherwise, the same FD is returned. This attribute applies only to NVMe devices. - -##### libstorage_close - -1. Prototype - -``` -int32_t libstorage_close(int32_t fd); -``` - -2. Description - -Closes a block device. - -3. Parameters - -| Parameter | Description | -| ---------- | ------------------------------------------ | -| int32_t fd | File descriptor of an opened block device. | - -4. Return value - -| Return Value | Description | -| ------------ | ----------------------------------------------- | -| -1 | Invalid file descriptor. | -| -16 | The file descriptor is busy. Retry is required. | -| 0 | Close succeeded. | - -##### libstorage_mem_reserve - -1. Prototype - -``` -void* libstorage_mem_reserve(size_t size, size_t align); -``` - -2. Description - -Allocates memory space from the huge page memory reserved by the DPDK. - -3. 
Parameters - -| Parameter | Description | -| ------------ | ----------------------------------- | -| size_t size | Size of the memory to be allocated. | -| size_t align | Aligns allocated memory space. | - -4. Return value - -| Return Value | Description | -| ------------ | -------------------------------------- | -| NULL | Allocation failed. | -| Other values | Address of the allocated memory space. | - -##### libstorage_mem_free - -1. Prototype - -``` -void libstorage_mem_free(void* ptr); -``` - -2. Description - -Frees the memory space pointed to by **ptr**. - -3. Parameters - -| Parameter | Description | -| --------- | ---------------------------------------- | -| void* ptr | Address of the memory space to be freed. | - -4. Return value - -None. - -##### libstorage_alloc_io_buf - -1. Prototype - -``` -void* libstorage_alloc_io_buf(size_t nbytes); -``` - -2. Description - -Allocates memory from buf_small_pool or buf_large_pool of the SPDK. - -3. Parameters - -| Parameter | Description | -| ------------- | ----------------------------------- | -| size_t nbytes | Size of the buffer to be allocated. | - -4. Return value - -| Return Value | Description | -| ------------ | -------------------------------------- | -| Other values | Start address of the allocated buffer. | - -##### libstorage_free_io_buf - -1. Prototype - -``` -int32_t libstorage_free_io_buf(void *buf, size_t nbytes); -``` - -2. Description - -Frees the allocated memory to buf_small_pool or buf_large_pool of the SPDK. - -3. Parameters - -| Parameter | Description | -| ------------- | ---------------------------------------- | -| void *buf | Start address of the buffer to be freed. | -| size_t nbytes | Size of the buffer to be freed. | - -4. Return value - -| Return Value | Description | -| ------------ | ------------------ | -| -1 | Freeing failed. | -| 0 | Freeing succeeded. | - -##### libstorage_init_module - -1. Prototype - -``` -int32_t libstorage_init_module(const char* cfgfile); -``` - -2. 
Description - -Initializes the HSAK module. - -3. Parameters - -| Parameter | Description | -| ------------------- | ------------------------------------ | -| const char* cfgfile | Name of the HSAK configuration file. | - -4. Return value - -| Return Value | Description | -| ------------ | ------------------------- | -| Other values | Initialization failed. | -| 0 | Initialization succeeded. | - -##### libstorage_exit_module - -1. Prototype - -``` -int32_t libstorage_exit_module(void); -``` - -2. Description - -Exits the HSAK module. - -3. Parameters - -None. - -4. Return value - -| Return Value | Description | -| ------------ | --------------------------------- | -| Other values | Exit and cleanup failed. | -| 0 | Exit and cleanup succeeded. | - -##### LIBSTORAGE_REGISTER_DPDK_INIT_NOTIFY - -1. Prototype - -``` -LIBSTORAGE_REGISTER_DPDK_INIT_NOTIFY(_name, _notify) -``` - -2. Description - -Service-layer registration macro that registers a callback function to be invoked when DPDK initialization is complete. - -3. Parameters - -| Parameter | Description | -| --------- | ------------------------------------------------------------ | -| _name | Name of a module at the service layer. | -| _notify | Prototype of the callback function registered at the service layer: **void (*notifyFunc)(const struct libstorage_dpdk_init_notify_arg *arg);** | - -4. Return value - -None. - -#### ublock.h - -##### init_ublock - -1. Prototype - -``` -int init_ublock(const char *name, enum ublock_rpc_server_status flg); -``` - -2. Description - -Initializes the Ublock module. This API must be called before other Ublock APIs. If the flag is set to **UBLOCK_RPC_SERVER_ENABLE**, that is, Ublock functions as the RPC server, the same process can be initialized only once. - -When Ublock is started as the RPC server, a monitor thread is started at the same time. 
When the monitor thread detects that the RPC server thread is abnormal (for example, suspended), it calls the exit function to trigger the process to exit. - -In this case, the product script must restart the process. - -3. Parameters - -| Parameter | Description | -| ------------------------------------ | ------------------------------------------------------------ | -| const char *name | Module name. The default value is **ublock**. You are advised to set this parameter to **NULL**. | -| enum ublock_rpc_server_status
flg | Whether to enable RPC. The value can be **UBLOCK_RPC_SERVER_DISABLE** or **UBLOCK_RPC_SERVER_ENABLE**.
If RPC is disabled and the drive is occupied by service processes, the Ublock module cannot obtain the drive information. | - -4. Return value - -| Return Value | Description | -| ------------- | ------------------------------------------------------------ | -| 0 | Initialization succeeded. | -| -1 | Initialization failed. Possible cause: The Ublock module has been initialized. | -| Process exits | Ublock considers that the following exceptions cannot be rectified and directly calls the exit API to exit the process:
- The RPC service needs to be created but fails to be created.
- Failed to create a hot swap monitoring thread. | - -##### ublock_init - -1. Prototype - -``` -#define ublock_init(name) init_ublock(name, UBLOCK_RPC_SERVER_ENABLE) -``` - -2. Description - -A macro wrapper of the init_ublock API that initializes Ublock as an RPC server. - -3. Parameters - -| Parameter | Description | -| --------- | ------------------------------------------------------------ | -| name | Module name. The default value is **ublock**. You are advised to set this parameter to **NULL**. | - -4. Return value - -| Return Value | Description | -| ------------- | ------------------------------------------------------------ | -| 0 | Initialization succeeded. | -| -1 | Initialization failed. Possible cause: The Ublock RPC server module has been initialized. | -| Process exits | Ublock considers that the following exceptions cannot be rectified and directly calls the exit API to exit the process:
- The RPC service needs to be created but fails to be created.
- Failed to create a hot swap monitoring thread. | - -##### ublock_init_norpc - -1. Prototype - -``` -#define ublock_init_norpc(name) init_ublock(name, UBLOCK_RPC_SERVER_DISABLE) -``` - -2. Description - -A macro wrapper of the init_ublock API that initializes Ublock as a non-RPC service. - -3. Parameters - -| Parameter | Description | -| --------- | ------------------------------------------------------------ | -| name | Module name. The default value is **ublock**. You are advised to set this parameter to **NULL**. | - -4. Return value - -| Return Value | Description | -| ------------- | ------------------------------------------------------------ | -| 0 | Initialization succeeded. | -| -1 | Initialization failed. Possible cause: The Ublock client module has been initialized. | -| Process exits | Ublock considers that the following exceptions cannot be rectified and directly calls the exit API to exit the process:
- The RPC service needs to be created but fails to be created.
- Failed to create a hot swap monitoring thread. | - -##### ublock_fini - -1. Prototype - -``` -void ublock_fini(void); -``` - -2. Description - -Destroys the Ublock module and internally created resources. This API must be used together with the Ublock initialization API. - -3. Parameters - -None. - -4. Return value - -None. - -##### ublock_get_bdevs - -1. Prototype - -``` -int ublock_get_bdevs(struct ublock_bdev_mgr* bdev_list); -``` - -2. Description - -Obtains the device list (all NVMe devices in the environment, including kernel-mode and user-mode drivers). The obtained NVMe device list contains only PCI addresses and does not contain specific device information. To obtain specific device information, call ublock_get_bdev. - -3. Parameters - -| Parameter | Description | -| --------------------------------- | ------------------------------------------------------------ | -| struct ublock_bdev_mgr* bdev_list | Output parameter, which returns the device queue. The **bdev_list** pointer must be allocated externally. | - -4. Return value - -| Return Value | Description | -| ------------ | ------------------------------------------ | -| 0 | The device queue is obtained successfully. | -| -2 | No NVMe device exists in the environment. | -| Other values | Failed to obtain the device list. | - -##### ublock_free_bdevs - -1. Prototype - -``` -void ublock_free_bdevs(struct ublock_bdev_mgr* bdev_list); -``` - -2. Description - -Releases a device list. - -3. Parameters - -| Parameter | Description | -| --------------------------------- | ------------------------------------------------------------ | -| struct ublock_bdev_mgr* bdev_list | Head pointer of the device queue. After the device queue is cleared, the **bdev_list** pointer is not released. | - -4. Return value - -None. - -##### ublock_get_bdev - -1. Prototype - -``` -int ublock_get_bdev(const char *pci, struct ublock_bdev *bdev); -``` - -2. Description - -Obtains information about a specific device. 
In the device information, the serial number, model, and firmware version of the NVMe device are saved as character arrays instead of character strings. (The return format varies depending on the drive controller, and the arrays may not end with 0.) - -After this API is called, the corresponding device is occupied by Ublock. Therefore, call ublock_free_bdev to free resources immediately after the required service operation is complete. - -3. Parameters - -| Parameter | Description | -| ------------------------ | ------------------------------------------------------------ | -| const char *pci | PCI address of the device whose information needs to be obtained. | -| struct ublock_bdev *bdev | Output parameter, which returns the device information. The **bdev** pointer must be allocated externally. | - -4. Return value - -| Return Value | Description | -| ------------ | ------------------------------------------------------------ | -| 0 | The device information is obtained successfully. | -| -1 | Failed to obtain device information due to incorrect parameters. | -| -11(EAGAIN) | Failed to obtain device information due to the RPC query failure. A retry is required (3s sleep is recommended). | - -##### ublock_get_bdev_by_esn - -1. Prototype - -``` -int ublock_get_bdev_by_esn(const char *esn, struct ublock_bdev *bdev); -``` - -2. Description - -Obtains information about the device corresponding to an ESN. In the device information, the serial number, model, and firmware version of the NVMe device are saved as character arrays instead of character strings. (The return format varies depending on the drive controller, and the arrays may not end with 0.) - -After this API is called, the corresponding device is occupied by Ublock. Therefore, call ublock_free_bdev to free resources immediately after the required service operation is complete. - -3. 
Parameters - -| Parameter | Description | -| ------------------------ | ------------------------------------------------------------ | -| const char *esn | ESN of the device whose information is to be obtained.
Note:
An ESN is a string of a maximum of 20 characters (excluding the terminating character), but the actual length may vary by hardware vendor. For example, if the ESN is shorter than 20 characters, the vendor may pad the end of the string with spaces. | -| struct ublock_bdev *bdev | Output parameter, which returns the device information. The **bdev** pointer must be allocated externally. | - -4. Return value - -| Return Value | Description | -| ------------ | ------------------------------------------------------------ | -| 0 | The device information is obtained successfully. | -| -1 | Failed to obtain device information due to incorrect parameters. | -| -11(EAGAIN) | Failed to obtain device information due to the RPC query failure. A retry is required (3s sleep is recommended). | - -##### ublock_free_bdev - -1. Prototype - -``` -void ublock_free_bdev(struct ublock_bdev *bdev); -``` - -2. Description - -Frees device resources. - -3. Parameters - -| Parameter | Description | -| ------------------------ | ------------------------------------------------------------ | -| struct ublock_bdev *bdev | Pointer to the device information. After the data in the pointer is cleared, the **bdev** pointer is not freed. | - -4. Return value - -None. - -##### TAILQ_FOREACH_SAFE - -1. Prototype - -``` -#define TAILQ_FOREACH_SAFE(var, head, field, tvar) -for ((var) = TAILQ_FIRST((head)); -(var) && ((tvar) = TAILQ_NEXT((var), field), 1); -(var) = (tvar)) -``` - -2. Description - -Macro that safely traverses each member of a queue, allowing the current node to be removed during traversal. - -3. Parameters - -| Parameter | Description | -| --------- | ------------------------------------------------------------ | -| var | Queue node member on which you are performing operations. | -| head | Queue head pointer. Generally, it refers to the object address defined by **TAILQ_HEAD(xx, xx) obj**. | -| field | Name of the struct field in the queue node that stores the pointers to the previous and next queue nodes. 
Generally, it is the name defined by **TAILQ_ENTRY (xx) name**. | -| tvar | Next queue node member. | - -4. Return value - -None. - -##### ublock_get_SMART_info - -1. Prototype - -``` -int ublock_get_SMART_info(const char *pci, uint32_t nsid, struct ublock_SMART_info *smart_info); -``` - -2. Description - -Obtains the S.M.A.R.T. information of a specified device. - -3. Parameters - -| Parameter | Description | -| ------------------------------------ | ------------------------------------------------------------ | -| const char *pci | Device PCI address. | -| uint32_t nsid | Specified namespace. | -| struct ublock_SMART_info *smart_info | Output parameter, which returns the S.M.A.R.T. information of the device. | - -4. Return value - -| Return Value | Description | -| ------------ | ------------------------------------------------------------ | -| 0 | The S.M.A.R.T. information is obtained successfully. | -| -1 | Failed to obtain S.M.A.R.T. information due to incorrect parameters. | -| -11(EAGAIN) | Failed to obtain S.M.A.R.T. information due to the RPC query failure. A retry is required (3s sleep is recommended). | - -##### ublock_get_SMART_info_by_esn - -1. Prototype - -``` -int ublock_get_SMART_info_by_esn(const char *esn, uint32_t nsid, struct ublock_SMART_info *smart_info); -``` - -2. Description - -Obtains the S.M.A.R.T. information of the device corresponding to an ESN. - -3. Parameters - -| Parameter | Description | -| --------------------------------------- | ------------------------------------------------------------ | -| const char *esn | Device ESN.
Note:
An ESN is a string of a maximum of 20 characters (excluding the end character of the string), but the length may vary according to hardware vendors. For example, if the length is less than 20 characters, spaces are padded at the end of the character string. | -| uint32_t nsid | Specified namespace. | -| struct ublock_SMART_info
*smart_info | Output parameter, which returns the S.M.A.R.T. information of the device. | - -4. Return value - -| Return Value | Description | -| ------------ | ------------------------------------------------------------ | -| 0 | The S.M.A.R.T. information is obtained successfully. | -| -1 | Failed to obtain S.M.A.R.T. information due to incorrect parameters. | -| -11(EAGAIN) | Failed to obtain S.M.A.R.T. information due to the RPC query failure. A retry is required (3s sleep is recommended). | - -##### ublock_get_error_log_info - -1. Prototype - -``` -int ublock_get_error_log_info(const char *pci, uint32_t err_entries, struct ublock_nvme_error_info *errlog_info); -``` - -2. Description - -Obtains the error log information of a specified device. - -3. Parameters - -| Parameter | Description | -| ------------------------------------------ | ------------------------------------------------------------ | -| const char *pci | Device PCI address. | -| uint32_t err_entries | Number of error logs to be obtained. A maximum of 256 error logs can be obtained. | -| struct ublock_nvme_error_info *errlog_info | Output parameter, which returns the error log information of the device. For the **errlog_info** pointer, the caller needs to allocate space and ensure that the allocated space is greater than or equal to err_entries x sizeof(struct ublock_nvme_error_info). | - -4. Return value - -| Return Value | Description | -| ------------------------------------------------------------ | ------------------------------------------------------------ | -| Number of obtained error logs. The value is greater than or equal to 0. | Error logs are obtained successfully. | -| -1 | Failed to obtain error logs due to incorrect parameters. | -| -11(EAGAIN) | Failed to obtain error logs due to the RPC query failure. A retry is required (3s sleep is recommended). | - -##### ublock_get_log_page - -1.
Prototype - -``` -int ublock_get_log_page(const char *pci, uint8_t log_page, uint32_t nsid, void *payload, uint32_t payload_size); -``` - -2. Description - -Obtains information about a specified device and log page. - -3. Parameters - -| Parameter | Description | -| --------------------- | ------------------------------------------------------------ | -| const char *pci | Device PCI address. | -| uint8_t log_page | ID of the log page to be obtained. For example, **0xC0** and **0xCA** indicate the customized S.M.A.R.T. information of ES3000 V5 drives. | -| uint32_t nsid | Namespace ID. Some log pages support obtaining by namespace while some do not. If obtaining by namespace is not supported, the caller must pass **0xFFFFFFFF**. | -| void *payload | Output parameter, which stores log page information. The caller is responsible for allocating memory. | -| uint32_t payload_size | Size of the allocated payload, which cannot be greater than 4096 bytes. | - -4. Return value - -| Return Value | Description | -| ------------ | ---------------------------------------------------- | -| 0 | The log page is obtained successfully. | -| -1 | Failed to obtain the log page due to parameter errors. | - -##### ublock_info_get_pci_addr - -1. Prototype - -``` -char *ublock_info_get_pci_addr(const void *info); -``` - -2. Description - -Obtains the PCI address of the hot swap device. - -The memory occupied by **info** and the memory occupied by the returned PCI address do not need to be freed by the service process. - -3. Parameters - -| Parameter | Description | -| ---------------- | ------------------------------------------------------------ | -| const void *info | Hot swap event information transferred by the hot swap monitoring thread to the callback function. | - -4. Return value - -| Return Value | Description | -| ------------ | --------------------------------- | -| NULL | Failed to obtain the information. | -| Other values | Obtained PCI address.
| - -##### ublock_info_get_action - -1. Prototype - -``` -enum ublock_nvme_uevent_action ublock_info_get_action(const void *info); -``` - -2. Description - -Obtains the type of the hot swap event. - -The memory occupied by info does not need to be freed by service process. - -3. Parameters - -| Parameter | Description | -| ---------------- | ------------------------------------------------------------ | -| const void *info | Hot swap event information transferred by the hot swap monitoring thread to the callback function. | - -4. Return value - -| Return Value | Description | -| -------------------------- | ------------------------------------------------------------ | -| Type of the hot swap event | Type of the event that triggers the callback function. For details, see the definition in **5.1.2.6 enum ublock_nvme_uevent_action**. | - -##### ublock_get_ctrl_iostat - -1. Prototype - -``` -int ublock_get_ctrl_iostat(const char* pci, struct ublock_ctrl_iostat_info *ctrl_iostat); -``` - -2. Description - -Obtains the I/O statistics of a controller. - -3. Parameters - -| Parameter | Description | -| ------------------------------------------- | ------------------------------------------------------------ | -| const char* pci | PCI address of the controller whose I/O statistics are to be obtained. | -| struct ublock_ctrl_iostat_info *ctrl_iostat | Output parameter, which returns I/O statistics. The **ctrl_iostat** pointer must be allocated externally. | - -4. Return value - -| Return Value | Description | -| ------------ | ------------------------------------------------------------ | -| 0 | Succeeded in obtaining I/O statistics. | -| -1 | Failed to obtain I/O statistics due to invalid parameters or RPC errors. | -| -2 | Failed to obtain I/O statistics because the NVMe drive is not taken over by the I/O process. | -| -3 | Failed to obtain I/O statistics because the I/O statistics function is disabled. | - -##### ublock_nvme_admin_passthru - -1. 
Prototype - -``` -int32_t ublock_nvme_admin_passthru(const char *pci, void *cmd, void *buf, size_t nbytes); -``` - -2. Description - -Transparently transmits the **nvme admin** command to the NVMe device. Currently, only the **nvme admin** command for obtaining the identify parameter is supported. - -3. Parameters - -| Parameter | Description | -| --------------- | ------------------------------------------------------------ | -| const char *pci | PCI address of the destination controller of the **nvme admin** command. | -| void *cmd | Pointer to the **nvme admin** command struct. The struct size is 64 bytes. For details, see the NVMe specifications. Currently, only the command for obtaining the identify parameter is supported. | -| void *buf | Saves the output of the **nvme admin** command. The space is allocated by the caller, and its size is specified by **nbytes**. | -| size_t nbytes | Size of the user buffer, in bytes. For the command that obtains the identify parameter, the buffer must be 4096 bytes. | - -4. Return value - -| Return Value | Description | -| ------------ | ------------------------------------------ | -| 0 | The user command is executed successfully. | -| -1 | Failed to execute the user command.
| - -## Appendixes - -### GENERIC - -Generic Error Code Reference - -| sc | value | -| ------------------------------------------ | ----- | -| NVME_SC_SUCCESS | 0x00 | -| NVME_SC_INVALID_OPCODE | 0x01 | -| NVME_SC_INVALID_FIELD | 0x02 | -| NVME_SC_COMMAND_ID_CONFLICT | 0x03 | -| NVME_SC_DATA_TRANSFER_ERROR | 0x04 | -| NVME_SC_ABORTED_POWER_LOSS | 0x05 | -| NVME_SC_INTERNAL_DEVICE_ERROR | 0x06 | -| NVME_SC_ABORTED_BY_REQUEST | 0x07 | -| NVME_SC_ABORTED_SQ_DELETION | 0x08 | -| NVME_SC_ABORTED_FAILED_FUSED | 0x09 | -| NVME_SC_ABORTED_MISSING_FUSED | 0x0a | -| NVME_SC_INVALID_NAMESPACE_OR_FORMAT | 0x0b | -| NVME_SC_COMMAND_SEQUENCE_ERROR | 0x0c | -| NVME_SC_INVALID_SGL_SEG_DESCRIPTOR | 0x0d | -| NVME_SC_INVALID_NUM_SGL_DESCIRPTORS | 0x0e | -| NVME_SC_DATA_SGL_LENGTH_INVALID | 0x0f | -| NVME_SC_METADATA_SGL_LENGTH_INVALID | 0x10 | -| NVME_SC_SGL_DESCRIPTOR_TYPE_INVALID | 0x11 | -| NVME_SC_INVALID_CONTROLLER_MEM_BUF | 0x12 | -| NVME_SC_INVALID_PRP_OFFSET | 0x13 | -| NVME_SC_ATOMIC_WRITE_UNIT_EXCEEDED | 0x14 | -| NVME_SC_OPERATION_DENIED | 0x15 | -| NVME_SC_INVALID_SGL_OFFSET | 0x16 | -| NVME_SC_INVALID_SGL_SUBTYPE | 0x17 | -| NVME_SC_HOSTID_INCONSISTENT_FORMAT | 0x18 | -| NVME_SC_KEEP_ALIVE_EXPIRED | 0x19 | -| NVME_SC_KEEP_ALIVE_INVALID | 0x1a | -| NVME_SC_ABORTED_PREEMPT | 0x1b | -| NVME_SC_SANITIZE_FAILED | 0x1c | -| NVME_SC_SANITIZE_IN_PROGRESS | 0x1d | -| NVME_SC_SGL_DATA_BLOCK_GRANULARITY_INVALID | 0x1e | -| NVME_SC_COMMAND_INVALID_IN_CMB | 0x1f | -| NVME_SC_LBA_OUT_OF_RANGE | 0x80 | -| NVME_SC_CAPACITY_EXCEEDED | 0x81 | -| NVME_SC_NAMESPACE_NOT_READY | 0x82 | -| NVME_SC_RESERVATION_CONFLICT | 0x83 | -| NVME_SC_FORMAT_IN_PROGRESS | 0x84 | - -### COMMAND_SPECIFIC - -Error Code Reference for Specific Commands - -| sc | value | -| ------------------------------------------ | ----- | -| NVME_SC_COMPLETION_QUEUE_INVALID | 0x00 | -| NVME_SC_INVALID_QUEUE_IDENTIFIER | 0x01 | -| NVME_SC_MAXIMUM_QUEUE_SIZE_EXCEEDED | 0x02 | -| NVME_SC_ABORT_COMMAND_LIMIT_EXCEEDED | 0x03 | -| 
NVME_SC_ASYNC_EVENT_REQUEST_LIMIT_EXCEEDED | 0x05 | -| NVME_SC_INVALID_FIRMWARE_SLOT | 0x06 | -| NVME_SC_INVALID_FIRMWARE_IMAGE | 0x07 | -| NVME_SC_INVALID_INTERRUPT_VECTOR | 0x08 | -| NVME_SC_INVALID_LOG_PAGE | 0x09 | -| NVME_SC_INVALID_FORMAT | 0x0a | -| NVME_SC_FIRMWARE_REQ_CONVENTIONAL_RESET | 0x0b | -| NVME_SC_INVALID_QUEUE_DELETION | 0x0c | -| NVME_SC_FEATURE_ID_NOT_SAVEABLE | 0x0d | -| NVME_SC_FEATURE_NOT_CHANGEABLE | 0x0e | -| NVME_SC_FEATURE_NOT_NAMESPACE_SPECIFIC | 0x0f | -| NVME_SC_FIRMWARE_REQ_NVM_RESET | 0x10 | -| NVME_SC_FIRMWARE_REQ_RESET | 0x11 | -| NVME_SC_FIRMWARE_REQ_MAX_TIME_VIOLATION | 0x12 | -| NVME_SC_FIRMWARE_ACTIVATION_PROHIBITED | 0x13 | -| NVME_SC_OVERLAPPING_RANGE | 0x14 | -| NVME_SC_NAMESPACE_INSUFFICIENT_CAPACITY | 0x15 | -| NVME_SC_NAMESPACE_ID_UNAVAILABLE | 0x16 | -| NVME_SC_NAMESPACE_ALREADY_ATTACHED | 0x18 | -| NVME_SC_NAMESPACE_IS_PRIVATE | 0x19 | -| NVME_SC_NAMESPACE_NOT_ATTACHED | 0x1a | -| NVME_SC_THINPROVISIONING_NOT_SUPPORTED | 0x1b | -| NVME_SC_CONTROLLER_LIST_INVALID | 0x1c | -| NVME_SC_DEVICE_SELF_TEST_IN_PROGRESS | 0x1d | -| NVME_SC_BOOT_PARTITION_WRITE_PROHIBITED | 0x1e | -| NVME_SC_INVALID_CTRLR_ID | 0x1f | -| NVME_SC_INVALID_SECONDARY_CTRLR_STATE | 0x20 | -| NVME_SC_INVALID_NUM_CTRLR_RESOURCES | 0x21 | -| NVME_SC_INVALID_RESOURCE_ID | 0x22 | -| NVME_SC_CONFLICTING_ATTRIBUTES | 0x80 | -| NVME_SC_INVALID_PROTECTION_INFO | 0x81 | -| NVME_SC_ATTEMPTED_WRITE_TO_RO_PAGE | 0x82 | - -### MEDIA_DATA_INTERGRITY_ERROR - -Error Code Reference for Medium Exceptions - -| sc | value | -| -------------------------------------- | ----- | -| NVME_SC_WRITE_FAULTS | 0x80 | -| NVME_SC_UNRECOVERED_READ_ERROR | 0x81 | -| NVME_SC_GUARD_CHECK_ERROR | 0x82 | -| NVME_SC_APPLICATION_TAG_CHECK_ERROR | 0x83 | -| NVME_SC_REFERENCE_TAG_CHECK_ERROR | 0x84 | -| NVME_SC_COMPARE_FAILURE | 0x85 | -| NVME_SC_ACCESS_DENIED | 0x86 | -| NVME_SC_DEALLOCATED_OR_UNWRITTEN_BLOCK | 0x87 | \ No newline at end of file diff --git 
a/docs/en/docs/HSAK/hsak_tools_usage.md b/docs/en/docs/HSAK/hsak_tools_usage.md deleted file mode 100644 index 9eef8d83401b13c80302e0d52e9356f7ec86e2e5..0000000000000000000000000000000000000000 --- a/docs/en/docs/HSAK/hsak_tools_usage.md +++ /dev/null @@ -1,123 +0,0 @@ -## Command-Line Interface - -### Command for Querying Drive Information - -#### Format - -```shell -libstorage-list [<commands>] [<device>] -``` - -#### Parameters - -- *commands*: Only **help** is available. **libstorage-list help** is used to display the help information. - -- *device*: specifies the PCI address. The format is **0000:09:00.0**. Multiple PCI addresses are allowed and separated by spaces. If no specific PCI address is set, the command line lists all enumerated device information. - -#### Precautions - -- The fault injection function applies only to development, debugging, and test scenarios. Do not use this function on live networks. Otherwise, service and security risks may occur. - -- Before running this command, ensure that the management component (Ublock) server has been started, and the user-mode I/O component (UIO) has not been started or has been correctly started. - -- Drives that are not occupied by the Ublock or UIO component will be occupied during the command execution. If the Ublock or UIO component attempts to obtain the drive control permission, a storage device access conflict may occur. As a result, the command execution fails. - -### Command for Switching Drivers for Drives - -#### Format - -```shell -libstorage-shutdown reset [<device> ...] -``` - -#### Parameters - -- **reset**: switches the UIO driver to the kernel-mode driver for a specific drive. - -- *device*: specifies the PCI address, for example, **0000:09:00.0**. Multiple PCI addresses are allowed and separated by spaces. - -#### Precautions - -- The **libstorage-shutdown reset** command is used to switch a drive from the user-mode UIO driver to the kernel-mode NVMe driver.
- -- Before running this command, ensure that the Ublock server has been started, and the UIO component has not been started or has been correctly started. - -- The **libstorage-shutdown reset** command is risky. Before switching to the NVMe driver, ensure that the user-mode instance has stopped delivering I/Os to the NVMe device, all file descriptors (FDs) on the NVMe device have been closed, and the instance that accesses the NVMe device has exited. - -### Command for Obtaining I/O Statistics - -#### Format - -```shell -libstorage-iostat [-t <interval>] [-i <count>] [-d <device>] -``` - -#### Parameters - -- -**t**: interval, in seconds. The value ranges from 1 to 3600. This parameter is of the int type. If the input parameter value exceeds the upper limit of the int type, the value is truncated to a negative or positive number. - -- -**i**: number of collection times. The minimum value is **1** and the maximum value is *MAX_INT*. If this parameter is not set, information is collected at an interval by default. This parameter is of the int type. If the input parameter value exceeds the upper limit of the int type, the value is truncated to a negative or positive number. - -- -**d**: name of a block device (for example, **nvme0n1**, which depends on the controller name configured in **/etc/spdk/nvme.conf.in**). You can use this parameter to collect performance data of one or more specified devices. If this parameter is not set, performance data of all detected devices is collected. - -#### Precautions - -- The I/O statistics configuration is enabled. - -- The process has delivered I/O operations to the drive whose performance information needs to be queried through the UIO component. - -- If no device in the current environment is occupied by service processes to deliver I/Os, the command exits after the message "You cannot get iostat info for nvme device no deliver io" is displayed.
- -- When multiple queues are enabled on a drive, the I/O statistics tool summarizes the performance data of multiple queues on the drive and outputs the data in a unified manner. - -- The I/O statistics tool supports data records of a maximum of 8192 drive queues. - -- The I/O statistics are as follows: - - | Device | r/s | w/s | rKB/s | wKB/s | avgrq-sz | avgqu-sz | r_await | w_await | await | svctm | util% | poll-n | - | ----------- | ------------------------------ | ------------------------------- | ----------------------------------- | ------------------------------------ | -------------------------------------- | -------------------------- | --------------------- | ---------------------- | ------------------------------- | --------------------------------------- | ------------------ | -------------------------- | - | Device name | Number of read I/Os per second | Number of write I/Os per second | Number of kilobytes read per second | Number of kilobytes written per second | Average size of delivered I/Os (bytes) | I/O depth of a drive queue | I/O read latency (μs) | I/O write latency (μs) | Average read/write latency (μs) | Processing latency of a single I/O (μs) | Device utilization | Number of polling timeouts | - -### Commands for Drive Read/Write Operations - -#### Format - -```shell -libstorage-rw <COMMAND> <device> [OPTIONS...] -``` - -#### Parameters - -1. **COMMAND** parameters - - - **read**: reads a specified logical block from the device to the data buffer (standard output by default). - - - **write**: writes data in a data buffer (standard input by default) to a specified logical block of the NVMe device. - - - **help**: displays the help information about the command line. - -2. **device**: specifies the PCI address, for example, **0000:09:00.0**. - -3. **OPTIONS** parameters - - - --**start-block, -s**: indicates the 64-bit start address of the logical block to be read or written. The default value is **0**.
- - - --**block-count, -c**: indicates the number of the logical blocks to be read or written (counted from 0). - - - --**data-size, -z**: indicates the number of bytes of the data to be read or written. - - - --**namespace-id, -n**: indicates the namespace ID of the device. The default value is **1**. - - - --**data, -d**: indicates the data file used for read and write operations (the read data is saved during read operations, and the written data is provided during write operations). - - - --**limited-retry, -l**: indicates that the device controller restarts for a limited number of times to complete device read and write operations. - - - --**force-unit-access, -f**: ensures that read and write operations are completed from the nonvolatile media before the instruction is completed. - - - --**show-command, -v**: displays instruction information before sending a read/write command. - - - --**dry-run, -w**: displays only information about read and write instructions but does not perform actual read and write operations. - - - --**latency, -t**: collects statistics on the end-to-end read and write latency of the CLI. - - - --**help, -h**: displays the help information about related commands. \ No newline at end of file diff --git a/docs/en/docs/HSAK/introduce_hsak.md b/docs/en/docs/HSAK/introduce_hsak.md deleted file mode 100644 index e3e86faaacc97f92da75a2151b31f057920ff3d4..0000000000000000000000000000000000000000 --- a/docs/en/docs/HSAK/introduce_hsak.md +++ /dev/null @@ -1,47 +0,0 @@ -# HSAK Developer Guide - -## Overview - -As the performance of storage media such as NVMe SSDs and SCMs continues to improve, the latency overhead of the media layer in the I/O stack keeps decreasing, and the overhead of the software stack becomes the bottleneck. Therefore, the kernel I/O data plane needs to be reconstructed to reduce the overhead of the software stack.
HSAK provides a high-bandwidth and low-latency I/O software stack for new storage media, which reduces the overhead by more than 50% compared with the traditional I/O software stack. -The HSAK user-mode I/O engine is developed based on the open-source SPDK. - -1. A unified interface is provided for external systems to shield the differences between open-source interfaces. -2. Enhanced I/O data plane features are added, such as DIF, drive formatting, batch I/O delivery, trim, and dynamic drive addition and deletion. -3. Commercial features such as drive device management, drive I/O monitoring, and maintenance and test tools are provided. - -## Compilation Tutorial - -1. Download the HSAK source code. - - $ git clone https://gitee.com/openeuler/hsak.git - -2. Install the compilation and running dependencies. - - The compilation and running of HSAK depend on components such as Storage Performance Development Kit (SPDK), Data Plane Development Kit (DPDK), and libboundscheck. - -3. Start the compilation. - - $ cd hsak - - $ mkdir build - - $ cd build - - $ cmake .. - - $ make - -## Precautions - -### Constraints - -- A maximum of 512 NVMe devices can be used and managed on the same machine. -- When HSAK is enabled to execute I/O-related services, ensure that the system has at least 500 MB continuous idle huge page memory. -- Before enabling the user-mode I/O component to execute services, ensure that the drive management component (Ublock) has been enabled. -- When the drive management component (Ublock) is enabled to execute services, ensure that the system has sufficient continuous idle memory. Each time the Ublock component is initialized, 20 MB huge page memory is allocated. -- Before HSAK is run, **setup.sh** is called to configure huge page memory and unbind the kernel-mode driver of the NVMe device. -- Other interfaces provided by the HSAK module can be used only after libstorage_init_module is successfully executed. 
Each process can call libstorage_init_module only once. -- After the libstorage_exit_module function is executed, other interfaces provided by HSAK cannot be used. In multi-thread scenarios, exit HSAK after all threads end. -- Only one service can be started for the HSAK Ublock component on a server, and it supports concurrent access of a maximum of 64 Ublock clients. The Ublock server can process a maximum of 20 client requests per second. -- The HSAK Ublock component must be started earlier than the data plane I/O component and Ublock clients. The command line tool provided by HSAK can be executed only after the Ublock server is started. -- Do not register the function for processing the SIGBUS signal. SPDK has an independent processing function for the signal. If the processing function is overwritten, the registered signal processing function becomes invalid and a core dump occurs. \ No newline at end of file diff --git a/docs/en/docs/Installation/Installation-Modes1.md b/docs/en/docs/Installation/Installation-Modes1.md index 3a3287f6d6583b5df028d3706efa4c0dc78b1118..7eebd07283e2d98e2d5e6d8be5c0ffd0cf03f6c1 100644 --- a/docs/en/docs/Installation/Installation-Modes1.md +++ b/docs/en/docs/Installation/Installation-Modes1.md @@ -1,12 +1,12 @@ # Installation Modes > ![](./public_sys-resources/icon-notice.gif) **NOTE** -> +> > - The hardware supports only Raspberry Pi 3B/3B+/4B. > - The installation is performed by writing images to the SD card. This section describes how to write images on Windows, Linux, and Mac. > - The image used in this section is the Raspberry Pi image of openEuler. For details about how to obtain the image, see [Installation Preparations](./安装准备-1.html). - + - [Installation Modes](#installation-modes) - [Writing Images on Windows](#writing-images-on-windows) @@ -33,9 +33,9 @@ To format the SD card, perform the following procedures: 1. Download and install an SD card formatting tool. The following operations use SD Card Formatter as an example.
2. Start SD Card Formatter. In **Select card**, select the drive letter of the SD card to be formatted. - + If no image has been installed in the SD card, only one drive letter exists. In **Select card**, select the drive letter of the SD card to be formatted. - + If an image has been installed in the SD card, one or more drive letters exist. For example, the SD card corresponds to three drive letters: E, G, and H. In **Select card**, you can select the drive letter E of the boot partition. 3. In **Formatting options**, select a formatting mode. The default mode is **Quick format**. @@ -46,10 +46,10 @@ To format the SD card, perform the following procedures: ### Writing Images to the SD Card -> ![](./public_sys-resources/icon-notice.gif) **NOTE** -If the compressed image file **openEuler-21.09-raspi-aarch64.img.xz** is obtained, decompress the file to obtain the **openEuler-21.09-raspi-aarch64.img** image file. +> ![](./public_sys-resources/icon-notice.gif) **NOTE** +If the compressed image file **openEuler-23.03-raspi-aarch64.img.xz** is obtained, decompress the file to obtain the **openEuler-23.03-raspi-aarch64.img** image file. -To write the **openEuler-21.09-raspi-aarch64.img** image file to the SD card, perform the following procedures: +To write the **openEuler-23.03-raspi-aarch64.img** image file to the SD card, perform the following procedures: 1. Download and install a tool for writing images. The following operations use Win32 Disk Imager as an example. 2. Start Win32 Disk Imager and right-click **Run as administrator**. @@ -64,27 +64,27 @@ This section describes how to write images to the SD card in the Linux environme ### Checking Drive Partition Information -Run the ` **fdisk -l** ` command as the **root** user to obtain the drive information of the SD card. For example, the drive partition corresponding to the SD card can be **/dev/sdb**. +Run the `fdisk -l` command as the **root** user to obtain the drive information of the SD card. 
For example, the drive partition corresponding to the SD card can be **/dev/sdb**. ### Unmounting the SD Card -1. Run the ` **df -lh** ` command to check the mounted volumes. +1. Run the `df -lh` command to check the mounted volumes. 2. If the partitions corresponding to the SD card are not mounted, skip this step. If the partitions (for example, /dev/sdb1 and /dev/sdb3) are mounted, run the following commands as the **root** user to unmount them: - + `umount /dev/sdb1` - + `umount /dev/sdb3` ### Writing Images to the SD Card -1. If the image obtained is compressed, run the ` **xz -d openEuler-21.09-raspi-aarch64.img.xz** ` command to decompress the compressed file to obtain the **openEuler-21.09-raspi-aarch64.img** image file. Otherwise, skip this step. +1. If the image obtained is compressed, run the `xz -d openEuler-23.03-raspi-aarch64.img.xz` command to decompress the compressed file to obtain the **openEuler-23.03-raspi-aarch64.img** image file. Otherwise, skip this step. + +2. Run the following command as the **root** user to write the `openEuler-23.03-raspi-aarch64.img` image to the SD card: + + `dd bs=4M if=openEuler-23.03-raspi-aarch64.img of=/dev/sdb` -2. Run the following command as the **root** user to write the `openEuler-21.09-raspi-aarch64.img` image to the SD card: - - `dd bs=4M if=openEuler-21.09-raspi-aarch64.img of=/dev/sdb` - - > ![](./public_sys-resources/icon-note.gif) **NOTE** + > ![](./public_sys-resources/icon-note.gif) **NOTE** Generally, the block size is set to 4 MB. If the write operation fails or the written image cannot be used, you can set the block size to 1 MB and try again. However, the write operation is time-consuming when the block size is set to 1 MB. ## Writing Images on Mac @@ -93,26 +93,25 @@ This section describes how to flash images to the SD card in the Mac environment ### Checking Drive Partition Information -Run the ` **diskutil list** ` command as the **root** user to obtain the drive information of the SD card. 
For example, the drive partition corresponding to the SD card can be **/dev/disk3**. +Run the `diskutil list` command as the **root** user to obtain the drive information of the SD card. For example, the drive partition corresponding to the SD card can be **/dev/disk3**. ### Unmounting the SD Card -1. Run the ` **df -lh** ` command to check the mounted volumes. +1. Run the `df -lh` command to check the mounted volumes. 2. If the partitions corresponding to the SD card are not mounted, skip this step. If the partitions (for example, dev/disk3s1 and /dev/disk3s3) are mounted, run the following commands as the **root** user to unmount them: - + `diskutil umount /dev/disk3s1` - + `diskutil umount /dev/disk3s3` ### Writing Images to the SD Card -1. If the image obtained is compressed, run the `xz -d openEuler-21.09-raspi-aarch64.img.xz` command to decompress the compressed file to obtain the **openEuler-21.09-raspi-aarch64.img** image file. Otherwise, skip this step. +1. If the image obtained is compressed, run the `xz -d openEuler-23.03-raspi-aarch64.img.xz` command to decompress the compressed file to obtain the **openEuler-23.03-raspi-aarch64.img** image file. Otherwise, skip this step. + +2. Run the following command as the **root** user to write the image `openEuler-23.03-raspi-aarch64.img` to the SD card: + + `dd bs=4m if=openEuler-23.03-raspi-aarch64.img of=/dev/disk3` -2. Run the following command as the **root** user to write the image `openEuler-21.09-raspi-aarch64.img` to the SD card: - - `dd bs=4m if=openEuler-21.09-raspi-aarch64.img of=/dev/disk3` - > ![](./public_sys-resources/icon-note.gif) **NOTE** > Generally, the block size is set to 4 MB. If the write operation fails or the written image cannot be used, you can set the block size to 1 MB and try again. However, the write operation is time-consuming when the block size is set to 1 MB. 
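Regardless of platform, a `dd`-based write can be rehearsed and verified on an ordinary file before a real SD card is used. The sketch below is only an illustration with made-up file names (`dummy.img`, `target.img`) and assumes GNU coreutils; it uses the recommended 4 MB block size and then checks the result byte for byte:

```shell
# Create a small stand-in "image" (8 MB of random data) in place of a real
# openEuler image file; on real hardware, of= would point at the SD card device.
dd if=/dev/urandom of=dummy.img bs=1M count=8 status=none

# Write the image with a 4 MB block size; conv=fsync flushes the data to the
# target before dd exits, so the copy is complete when the command returns.
dd if=dummy.img of=target.img bs=4M conv=fsync status=none

# Read the target back and compare it byte for byte with the source image.
cmp dummy.img target.img && echo "image written and verified"
```

If a 4 MB write fails on real hardware, rerunning the same commands with `bs=1M` matches the fallback suggested in the note above.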
- diff --git a/docs/en/docs/Installation/Installation-Preparations1.md b/docs/en/docs/Installation/Installation-Preparations1.md index 8d5fd29b54e1503d390d94661e93db2538325f2e..b34f11897a6af91760458f441e7b3ce388d476c9 100644 --- a/docs/en/docs/Installation/Installation-Preparations1.md +++ b/docs/en/docs/Installation/Installation-Preparations1.md @@ -2,18 +2,16 @@ This section describes the compatibility of the hardware and software and the related configurations and preparations required for the installation. - -- [Installation Preparations](#安装准备) - - [Obtaining the Installation Source](#获取安装源) - - [Verifying the Image Integrity](#镜像完整性校验) - - [Overview](#简介) - - [Prerequisites](#前提条件) - - [Procedures](#操作指导) - - [Installation Requirements](#安装要求) - - [Hardware Compatibility](#硬件兼容支持) - - [Minimum Hardware Specifications](#最小硬件要求) - - +- [Installation Preparations](#installation-preparations) + - [Obtaining the Installation Source](#obtaining-the-installation-source) + - [Verifying the Image Integrity](#verifying-the-image-integrity) + - [Overview](#overview) + - [Prerequisites](#prerequisites) + - [Procedures](#procedures) + - [Installation Requirements](#installation-requirements) + - [Hardware Compatibility](#hardware-compatibility) + - [Minimum Hardware Specifications](#minimum-hardware-specifications) + ## Obtaining the Installation Source Before installation, obtain the openEuler Raspberry Pi image and its verification file. @@ -26,11 +24,11 @@ Before installation, obtain the openEuler Raspberry Pi image and its verificatio - **aarch64**: image of the AArch64 architecture -6. Click **aarch64**. The Raspberry Pi AArch64 image download list is displayed. +4. Click **aarch64**. The Raspberry Pi AArch64 image download list is displayed. -7. Click **openEuler-21.09-raspi-aarch64.img.xz** to download the openEuler Raspberry Pi image to the local PC. +5. Click **openEuler-21.09-raspi-aarch64.img.xz** to download the openEuler Raspberry Pi image to the local PC. 
-8. Click **openEuler-21.09-raspi-aarch64.img.xz.sha256sum** to download the verification file of the openEuler Raspberry Pi image to the local PC. +6. Click **openEuler-21.09-raspi-aarch64.img.xz.sha256sum** to download the verification file of the openEuler Raspberry Pi image to the local PC. ## Verifying the Image Integrity @@ -53,21 +51,21 @@ Verification file: **openEuler-21.09-raspi-aarch64.img.xz.sha256sum** To verify the file integrity, perform the following procedures: 1. Obtain the verification value from the verification file. Run the following command: - - ``` - $ cat openEuler-21.09-raspi-aarch64.img.xz.sha256sum + + ```shell + cat openEuler-21.09-raspi-aarch64.img.xz.sha256sum ``` 2. Calculate the SHA256 verification value of the file. Run the following command: - - ``` - $ sha256sum openEuler-21.09-raspi-aarch64.img.xz + + ```shell + sha256sum openEuler-21.09-raspi-aarch64.img.xz ``` - + After the command is executed, the verification value is displayed. 3. Check whether the verification values obtained from the step 1 and step 2 are consistent. - + If they are consistent, the downloaded file is not damaged. Otherwise, the downloaded file is incomplete and you need to obtain the file again. ## Installation Requirements @@ -118,4 +116,3 @@ Currently, the openEuler Raspberry Pi image supports the 3B, 3B+, and 4B version - diff --git a/docs/en/docs/Installation/installation-preparations.md b/docs/en/docs/Installation/installation-preparations.md index 57a45ae2278a6630d2ce5404606a0766ade25670..2e7eb4ce5b8ec33b722b7bb7a89503a199699a9d 100644 --- a/docs/en/docs/Installation/installation-preparations.md +++ b/docs/en/docs/Installation/installation-preparations.md @@ -25,30 +25,30 @@ Obtain the openEuler release package and verification file before the installati Perform the following operations to obtain the openEuler release package: -1. Visit the [openEuler](https://www.openeuler.org/en/) website. -2. Click **Download > Software Packages**. -3. 
Click **Server Image** below **openEuler 21.09**. The ISO list is displayed. - - **aarch64**: ISO image files of the AArch64 architecture - - **x86_64**: ISO image files of the x86_64 architecture - - **source**: ISO image files of the openEuler source code -4. Select the target openEuler release package and verification file based on the actual environment. - - AArch64 architecture: - 1. Click **aarch64**. - 2. If you install openEuler from a local source, download the release package **openEuler-21.09-aarch64-dvd.iso** and the verification file **openEuler-21.09-aarch64-dvd.iso.sha256sum** to the local host. - 3. If you install openEuler through the network, download the release package **openEuler-21.09-netinst-aarch64-dvd.iso** and the verification file **openEuler-21.09-netinst-aarch64-dvd.iso.sha256sum** to the local host. - - - x86_64 architecture: - 1. Click **x86_64**. - 1. If you install openEuler from a local source, download the release package **openEuler-21.09-x86_64-dvd.iso** and the verification file **openEuler-21.09-x86_64-dvd.iso.sha256sum** to the local host. - 1. If you install openEuler through the network, download the release package **openEuler-21.09-netinst-x86_64-dvd.iso** and the verification file **openEuler-21.09-netinst-x86_64-dvd.iso.sha256sum** to the local host. - ->![](./public_sys-resources/icon-note.gif) **Note** +1. Visit the [openEuler](https://www.openeuler.org/en/) website. +2. Click **Download > Software Packages**. +3. Click **Server Image** below **openEuler 23.03**. The ISO list is displayed. + - **aarch64**: ISO image files of the AArch64 architecture + - **x86_64**: ISO image files of the x86_64 architecture + - **source**: ISO image files of the openEuler source code +4. Select the target openEuler release package and verification file based on the actual environment. + - AArch64 architecture: + 1. Click **aarch64**. + 2. 
If you install openEuler from a local source, download the release package **openEuler-23.03-aarch64-dvd.iso** and the verification file **openEuler-23.03-aarch64-dvd.iso.sha256sum** to the local host. + 3. If you install openEuler through the network, download the release package **openEuler-23.03-netinst-aarch64-dvd.iso** and the verification file **openEuler-23.03-netinst-aarch64-dvd.iso.sha256sum** to the local host. + + - x86_64 architecture: + 1. Click **x86_64**. + 2. If you install openEuler from a local source, download the release package **openEuler-23.03-x86_64-dvd.iso** and the verification file **openEuler-23.03-x86_64-dvd.iso.sha256sum** to the local host. + 3. If you install openEuler through the network, download the release package **openEuler-23.03-netinst-x86_64-dvd.iso** and the verification file **openEuler-23.03-netinst-x86_64-dvd.iso.sha256sum** to the local host. + +>![](./public_sys-resources/icon-note.gif) **Note** > When the network is available, network installation is recommended because the netinst ISO release package is small. > The AArch64 release package supports the UEFI mode, while the x86_64 release package supports both the UEFI and Legacy modes. ## Release Package Integrity Check ->![](./public_sys-resources/icon-note.gif) **NOTE** +>![](./public_sys-resources/icon-note.gif) **NOTE** >This section describes how to verify the integrity of the release package in the AArch64 architecture. The procedure for verifying the integrity of the release package in the x86_64 architecture is the same. 
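The two-step comparison of checksum values described in this guide can also be collapsed into a single command: `sha256sum -c` reads the expected value from the `.sha256sum` file and verifies the ISO automatically. A minimal sketch, assuming the ISO and its verification file sit in the current directory:

```shell
# sha256sum -c recomputes the checksum of the file named inside the
# .sha256sum file and compares it with the recorded value; it exits 0
# only when the two values match.
if sha256sum -c openEuler-23.03-aarch64-dvd.iso.sha256sum; then
    echo "Image is intact."
else
    echo "Checksum mismatch: obtain the file again." >&2
fi
```

The same one-liner works for any of the release packages, for example by pointing it at the Raspberry Pi verification file instead.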
### Introduction @@ -61,29 +61,29 @@ Compare the verification value recorded in the verification file with the .iso f Before verifying the integrity of the release package, you need to prepare the following files: -ISO file: **openEuler-21.09-aarch64-dvd.iso** +ISO file: **openEuler-23.03-aarch64-dvd.iso** -Verification file: **openEuler-21.09-aarch64-dvd.iso.sha256sum** +Verification file: **openEuler-23.03-aarch64-dvd.iso.sha256sum** ### Procedures To verify the file integrity, perform the following operations: -1. Obtain the verification value in the verification file. Run the following command: +1. Obtain the verification value in the verification file. Run the following command: - ``` - $ cat openEuler-21.09-aarch64-dvd.iso.sha256sum + ```shell + cat openEuler-23.03-aarch64-dvd.iso.sha256sum ``` -2. Calculate the SHA256 verification value of the file. Run the following command: +2. Calculate the SHA256 verification value of the file. Run the following command: - ``` - $ sha256sum openEuler-21.09-aarch64-dvd.iso + ```shell + sha256sum openEuler-23.03-aarch64-dvd.iso ``` After the command is run, the verification value is displayed. -3. Check whether the values obtained from the step 1 and step 2 are consistent. +3. Check whether the values obtained in steps 1 and 2 are consistent. If the values are consistent, the .iso file is not damaged. Otherwise, the file is damaged and you need to obtain it again. @@ -95,10 +95,10 @@ To install the openEuler OS on a PM, the PM must meet the following requirements You need to take hardware compatibility into account during openEuler installation. [Table 1](#table14948632047) describes the types of supported servers. ->![](./public_sys-resources/icon-note.gif) **NOTE:** +>![](./public_sys-resources/icon-note.gif) **NOTE:** > ->- TaiShan 200 servers are backed by Huawei Kunpeng 920 processors. ->- Currently, only Huawei TaiShan and FusionServer Pro servers are supported.
More servers from other vendors will be supported in the future. +>- TaiShan 200 servers are backed by Huawei Kunpeng 920 processors. +>- Currently, only Huawei TaiShan and FusionServer Pro servers are supported. More servers from other vendors will be supported in the future. **Table 1** Supported servers @@ -128,8 +128,8 @@ To install the openEuler OS on a VM, the PM must meet the following requirements When installing openEuler, pay attention to the compatibility of the virtualization platform. Currently, the following virtualization platforms are supported: -- A virtualization platform created by the virtualization components \(openEuler as the host OS and QEMU and KVM provided in the release package\) of openEuler -- x86 virtualization platform of Huawei public cloud +- A virtualization platform created by the virtualization components \(openEuler as the host OS and QEMU and KVM provided in the release package\) of openEuler +- x86 virtualization platform of Huawei public cloud ### Minimum Virtualization Space @@ -143,23 +143,3 @@ When installing openEuler, pay attention to the compatibility of the virtualizat | CPU | Two CPUs| | Memory | ≥ 4 GB (8 GB or higher recommended for better user experience) | | Hard disk | ≥ 32 GB (120 GB or higher recommended for better user experience) | - - - - - - - - - - - - - - - - - - - - diff --git a/docs/en/docs/Installation/using-kickstart-for-automatic-installation.md b/docs/en/docs/Installation/using-kickstart-for-automatic-installation.md index f5ef2f025332aef52ade4743856a444cebb1e239..4c3d50cdca3327672ecbbdac5e7349841f47803f 100644 --- a/docs/en/docs/Installation/using-kickstart-for-automatic-installation.md +++ b/docs/en/docs/Installation/using-kickstart-for-automatic-installation.md @@ -2,16 +2,16 @@ - [Using Kickstart for Automatic Installation](#using-kickstart-for-automatic-installation) - - [Introduction](#introduction) - - [Overview](#overview) - - [Advantages and Disadvantages](#advantages-and-disadvantages) - - 
[Background](#background) - - [Semi-automatic Installation Guide](#semi-automatic-installation-guide) - - [Environment Requirements](#environment-requirements) - - [Procedure](#procedure) - - [Full-automatic Installation Guide](#full-automatic-installation-guide) - - [Environment Requirements](#environment-requirements-1) - - [Procedure](#procedure-1) + - [Introduction](#introduction) + - [Overview](#overview) + - [Advantages and Disadvantages](#advantages-and-disadvantages) + - [Background](#background) + - [Semi-automatic Installation Guide](#semi-automatic-installation-guide) + - [Environment Requirements](#environment-requirements) + - [Procedure](#procedure) + - [Full-automatic Installation Guide](#full-automatic-installation-guide) + - [Environment Requirements](#environment-requirements-1) + - [Procedure](#procedure-1) @@ -21,8 +21,8 @@ You can use the kickstart tool to automatically install the openEuler OS in either of the following ways: -- Semi-automatic installation: You only need to specify the location of the kickstart file. Kickstart automatically configures OS attributes such as keyboard, language, and partitions. -- Automatic installation: The OS is automatically installed. +- Semi-automatic installation: You only need to specify the location of the kickstart file. Kickstart automatically configures OS attributes such as keyboard, language, and partitions. +- Automatic installation: The OS is automatically installed. ### Advantages and Disadvantages @@ -58,17 +58,17 @@ You can use the kickstart tool to automatically install the openEuler OS in eith ### Background -**Kickstart** +#### Kickstart Kickstart is an unattended installation mode. The principle of kickstart is to record typical parameters that need to be manually entered during the installation and generate the configuration file **ks.cfg**. During the installation, the installation program searches the **ks.cfg** configuration file first for required parameters. 
If no matching parameters are found, you need to manually configure these parameters. If all required parameters are covered by the kickstart file, automatic installation can be achieved by only specifying the path of the kickstart file. Both full-automatic and semi-automatic installation can be achieved with kickstart. -**PXE** +#### PXE Pre-boot Execution Environment \(PXE\) works in client/server network mode. The PXE client can obtain an IP address from the DHCP server during startup and implement client boot and installation through the network based on protocols such as trivial file transfer protocol \(TFTP\). -**TFTP** +#### TFTP TFTP is used to transfer simple and trivial files between clients and the server. @@ -78,33 +78,34 @@ TFTP is used to transfer simple and trivial files between clients and the server. The environment requirements for semi-automatic installation of openEuler OS using kickstart are as follows: -- PM/VM \(For details about how to create VMs, see the documents from corresponding vendors\): includes the computer where kickstart is used for automatic installation and the computer where the kickstart tool is installed. -- Httpd: stores the kickstart file. -- ISO: openEuler-21.09-aarch64-dvd.iso +- PM/VM \(For details about how to create VMs, see the documents from corresponding vendors\): includes the computer where kickstart is used for automatic installation and the computer where the kickstart tool is installed. +- Httpd: stores the kickstart file. +- ISO: openEuler-23.03-aarch64-dvd.iso ### Procedure To use kickstart to perform semi-automatic installation of openEuler, perform the following steps: -**Environment Preparation** +#### Environment Preparation ->![](./public_sys-resources/icon-note.gif) **NOTE:** +>![](./public_sys-resources/icon-note.gif) **NOTE:** >Before the installation, ensure that the firewall of the HTTP server is disabled. Run the following command to disable the firewall: ->``` +> +>```shell >iptables -F >``` -1. 
Install httpd and start the service. +1. Install httpd and start the service. - ``` + ```shell # dnf install httpd -y # systemctl start httpd # systemctl enable httpd ``` -2. Run the following commands to prepare the kickstart file: +2. Run the following commands to prepare the kickstart file: - ``` + ```shell # mkdir /var/www/html/ks #vim /var/www/html/ks/openEuler-ks.cfg ===>The file can be obtained by modifying the **anaconda-ks.cfg** file automatically generated from openEuler OS. ==================================== @@ -156,9 +157,10 @@ To use kickstart to perform semi-automatic installation of openEuler, perform th ===================================== ``` - >![](./public_sys-resources/icon-note.gif) **NOTE:** + >![](./public_sys-resources/icon-note.gif) **NOTE:** >The method of generating the password ciphertext is as follows: - >``` +> + >```shell ># python3 >Python 3.7.0 (default, Apr 1 2019, 00:00:00) >[GCC 7.3.0] on linux @@ -169,63 +171,62 @@ To use kickstart to perform semi-automatic installation of openEuler, perform th >$6$63c4tDmQGn5SDayV$mZoZC4pa9Jdt6/ALgaaDq6mIExiOO2EjzomB.Rf6V1BkEMJDcMddZeGdp17cMyc9l9ML9ldthytBEPVcnboR/0 >``` -3. Mount the ISO image file to the CD-ROM drive of the computer where openEuler is to be installed. +3. Mount the ISO image file to the CD-ROM drive of the computer where openEuler is to be installed. If you want to install openEuler through the NFS, specify the path \(which is **cdrom** by default\) of installation source in the kickstart file. +#### Installing the System -**Installing the System** - -1. The installation selection dialog box is displayed. - 1. On the installation wizard page in [Starting the Installation](./installation-guideline.html#starting-the-installation), select **Install openEuler 21.09** and press **e**. - 2. Add **inst.ks=http://server ip/ks/openEuler-ks.cfg** to the startup parameters. +1. The installation selection dialog box is displayed. + 1. 
On the installation wizard page in [Starting the Installation](./installation-guideline.html#starting-the-installation), select **Install openEuler 23.03** and press **e**. + 2. Add **inst.ks=http://server ip/ks/openEuler-ks.cfg** to the startup parameters. ![startparam.png](https://gitee.com/openeuler/docs/raw/master/docs/zh/docs/Installation/figures/startparam.png "startparam.png") - 3. Press **Ctrl**+**x** to start the automatic installation. + 3. Press **Ctrl**+**x** to start the automatic installation. -2. Verify that the installation is complete. +2. Verify that the installation is complete. After the installation is complete, the system automatically reboots. If the first boot option of the system is set to the CD-ROM, the installation page is displayed again. Shut down the computer and change the startup option to boot from the hard disk first. ![](./figures/completing-the-automatic-installation.png) - ## Full-automatic Installation Guide ### Environment Requirements The environment requirements for full-automatic installation of openEuler using kickstart are as follows: -- PM/VM \(For details about how to create VMs, see the documents from corresponding vendors\): includes the computer where kickstart is used for automatic installation and the computer where the kickstart tool is installed. -- Httpd: stores the kickstart file. -- TFTP: provides vmlinuz and initrd files. -- DHCPD/PXE: provides the DHCP service. 
+- ISO: openEuler-23.03-aarch64-dvd.iso ### Procedure To use kickstart to perform full-automatic installation of openEuler, perform the following steps: -**Environment Preparation** +#### Environment Preparation ->![](./public_sys-resources/icon-note.gif) **NOTE:** +>![](./public_sys-resources/icon-note.gif) **NOTE:** >Before the installation, ensure that the firewall of the HTTP server is disabled. Run the following command to disable the firewall: ->``` +> +>```shell >iptables -F >``` -1. Install httpd and start the service. +1. Install httpd and start the service. - ``` + ```shell # dnf install httpd -y # systemctl start httpd # systemctl enable httpd ``` -2. Install and configure TFTP. +2. Install and configure TFTP. - ``` + ```shell # dnf install tftp-server -y # vim /etc/xinetd.d/tftp service tftp @@ -248,16 +249,16 @@ To use kickstart to perform full-automatic installation of openEuler, perform th # systemctl enable xinetd ``` -3. Prepare the installation source. +3. Prepare the installation source. - ``` - # mount openEuler-21.09-aarch64-dvd.iso /mnt + ```shell + # mount openEuler-23.03-aarch64-dvd.iso /mnt # cp -r /mnt/* /var/www/html/openEuler/ ``` -4. Set and modify the kickstart configuration file **openEuler-ks.cfg**. Select the HTTP installation source by referring to [3](#en-us_topic_0229291289_l1692f6b9284e493683ffa2ef804bc7ca). +4. Set and modify the kickstart configuration file **openEuler-ks.cfg**. Select the HTTP installation source by referring to [3](#en-us_topic_0229291289_l1692f6b9284e493683ffa2ef804bc7ca). - ``` + ```shell #vim /var/www/html/ks/openEuler-ks.cfg ==================================== ***Modify the following information as required.*** @@ -281,9 +282,9 @@ To use kickstart to perform full-automatic installation of openEuler, perform th ... ``` -5. Modify the PXE configuration file **grub.cfg** as follows. (Note: Currently, openEuler does not support the cfg file in bls format.) +5. 
Modify the PXE configuration file **grub.cfg** as follows. (Note: Currently, openEuler does not support the cfg file in bls format.) - ``` + ```shell # cp -r /mnt/images/pxeboot/* /var/lib/tftpboot/ # cp /mnt/EFI/BOOT/grubaa64.efi /var/lib/tftpboot/ # cp /mnt/EFI/BOOT/grub.cfg /var/lib/tftpboot/ @@ -325,7 +326,7 @@ To use kickstart to perform full-automatic installation of openEuler, perform th 6. Configure DHCP \(which can be replaced by DNSmasq\). - ``` + ```shell # dnf install dhcp -y # # DHCP Server Configuration file. @@ -348,10 +349,9 @@ To use kickstart to perform full-automatic installation of openEuler, perform th # systemctl enable dhcpd ``` +#### Installing the System -**Installing the System** - -1. On the **Start boot option** screen, press **F2** to boot from the PXE and start automatic installation. +1. On the **Start boot option** screen, press **F2** to boot from the PXE and start automatic installation. ![](./figures/en-us_image_0229291270.png) @@ -359,8 +359,7 @@ To use kickstart to perform full-automatic installation of openEuler, perform th ![](./figures/en-us_image_0229291247.png) -2. The automatic installation window is displayed. -3. Verify that the installation is complete. +2. The automatic installation window is displayed. +3. Verify that the installation is complete. ![](./figures/completing-the-automatic-installation.png) - diff --git a/docs/en/docs/K3s/K3s-deployment-guide.md b/docs/en/docs/K3s/K3s-deployment-guide.md deleted file mode 100644 index e61d7082278eaae6b3dcc246d7a60517a524450e..0000000000000000000000000000000000000000 --- a/docs/en/docs/K3s/K3s-deployment-guide.md +++ /dev/null @@ -1,86 +0,0 @@ -# K3s Deployment Guide - -### What Is K3s? -K3s is a lightweight Kubernetes distribution that is optimized for edge computing and IoT scenarios. The K3s provides the following enhanced features: -- Packaged as a single binary file. 
-- Uses SQLite3-based lightweight storage backend as the default storage mechanism and supports etcd3, MySQL, and PostgreSQL. -- Encapsulated in a simple launcher that handles various complex TLS and options. -- Secure by default and has reasonable default values for lightweight environments. -- Batteries included, providing simple but powerful functions such as local storage providers, service load balancers, Helm controllers, and Traefik Ingress controllers. -- Encapsulates all operations of the Kubernetes control plane in a single binary file and process, capable of automating and managing complex cluster operations including certificate distribution. -- Minimizes external dependencies and requires only kernel and cgroup mounting. - -### Application Scenarios -K3s is applicable to the following scenarios: - -- Edge computing -- IoT -- Continuous integration -- Development -- ARM -- Embedded Kubernetes - -The resources required for running K3s are small. Therefore, K3s is also suitable for development and test scenarios. In these scenarios, K3s facilitates function verification and problem reproduction by shortening cluster startup time and reducing resources consumed by the cluster. - -### Deploying K3s - -#### Preparations - -- Ensure that the host names of the server node and agent node are different. - -You can run the `hostnamectl set-hostname "host name"` command to change the host name. - -![1661829534335](./figures/set-hostname.png) - -- Install K3s on each node using Yum. - - The K3s official website provides binary executable files of different architectures and the **install.sh** script for offline installation. The openEuler community migrates the compilation process of the binary file to the community and releases the compiled RPM package. You can run the `yum` command to download and install K3s. 
- -![1661830441538](./figures/yum-install.png) - -#### Deploying the Server Node - -To install K3s on a single server, run the following command on the server node: -``` -INSTALL_K3S_SKIP_DOWNLOAD=true k3s-install.sh -``` - -![1661825352724](./figures/server-install.png) - -#### Checking Server Deployment - -![1661825403705](./figures/check-server.png) - -#### Deploying the Agent Node - -Query the token value of the server node. The token is stored in the **/var/lib/rancher/k3s/server/node-token** file on the server node. - -> **Note**: -> -> Only the second half of the token is used. - -![1661825538264](./figures/token.png) - -Add agents. Run the following command on each agent node: - -``` -INSTALL_K3S_SKIP_DOWNLOAD=true K3S_URL=https://myserver:6443 K3S_TOKEN=mynodetoken k3s-install.sh -``` - -> **Note:** -> -> Replace **myserver** with the IP address of the server or a valid DNS, and replace **mynodetoken** with the token of the server node. - -![1661829392357](./figures/agent-install.png) - -#### Checking Agent Deployment - -After the installation is complete, run `kubectl get nodes` on the server node to check if the agent node is successfully registered. - -![1661826797319](./figures/check-agent.png) - -A basic K3S cluster is set up. - -#### More - -For details about how to use K3s, visit the K3s [official website](https://rancher.com/docs/k3s/latest/en/). 
diff --git a/docs/en/docs/K3s/figures/agent-install.png b/docs/en/docs/K3s/figures/agent-install.png deleted file mode 100644 index dca1d64ec8aae821393bb715daf4c56b783a68e0..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/K3s/figures/agent-install.png and /dev/null differ diff --git a/docs/en/docs/K3s/figures/check-agent.png b/docs/en/docs/K3s/figures/check-agent.png deleted file mode 100644 index aa467713353d70ad513e8ee13ac9d8b6520b7ee0..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/K3s/figures/check-agent.png and /dev/null differ diff --git a/docs/en/docs/K3s/figures/check-server.png b/docs/en/docs/K3s/figures/check-server.png deleted file mode 100644 index 06343de9a8b0eacb0f6194cf438b2b27af88cae4..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/K3s/figures/check-server.png and /dev/null differ diff --git a/docs/en/docs/K3s/figures/server-install.png b/docs/en/docs/K3s/figures/server-install.png deleted file mode 100644 index 7d30c8f4f73946c8b0555186c1736492039da731..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/K3s/figures/server-install.png and /dev/null differ diff --git a/docs/en/docs/K3s/figures/set-hostname.png b/docs/en/docs/K3s/figures/set-hostname.png deleted file mode 100644 index 32564d6159825b6d4131a6b138a493188ce88c6c..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/K3s/figures/set-hostname.png and /dev/null differ diff --git a/docs/en/docs/K3s/figures/token.png b/docs/en/docs/K3s/figures/token.png deleted file mode 100644 index 79e5313bd1d5e707659cd08d4aafdf528b9df8f0..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/K3s/figures/token.png and /dev/null differ diff --git a/docs/en/docs/K3s/figures/yum-install.png b/docs/en/docs/K3s/figures/yum-install.png deleted file mode 100644 index 0e601a23a5a67e7927f12bc90d1a4137e1a3a567..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/K3s/figures/yum-install.png 
and /dev/null differ diff --git a/docs/en/docs/Kernel/how-to-use.md b/docs/en/docs/Kernel/how-to-use.md deleted file mode 100644 index 6079a2bb92663f33297627a8e0f3009acf1d1a25..0000000000000000000000000000000000000000 --- a/docs/en/docs/Kernel/how-to-use.md +++ /dev/null @@ -1,255 +0,0 @@ -## How to Use - -### Tiered-Reliability Memory Management for the OS - -**Overview** - -Memory is divided into two ranges based on high and low reliability. Therefore, memory allocation and release must be managed separately based on the reliability. The OS must be able to control the memory allocation path. User-mode processes use low reliable memory, and kernel-mode processes use highly reliable memory. When the highly reliable memory is insufficient, the allocation needs to fall back to the low reliable memory range or the allocation fails. - -In addition, according to the reliability requirements and types of processes, on-demand allocation of highly reliable and low reliable memory is required. For example, specify highly reliable memory for key processes to reduce the probability of memory errors encountered by key processes. Currently, the kernel uses highly reliable memory, and user-mode processes use low reliable memory. As a result, some key or core services, such as the service forwarding process, are unstable. If an exception occurs, I/Os are interrupted, affecting service stability. Therefore, these key services must use highly reliable memory to improve the stability of key processes. - -When a memory error occurs in the system, the OS overwrites the unallocated low reliable memory to clear the undetected memory error. - -**Restrictions** - -- **High-reliability memory for key processes** - - 1. The abuse of the `/proc//reliable` API may cause excessive use of highly reliable memory. - 2. The `reliable` attribute of a user-mode process can be modified by using the proc API or directly inherited from its parent process only after the process is started. 
`systemd (pid=1)` uses highly reliable memory. Its `reliable` attribute is useless and is not inherited. The `reliable` attribute of kernel-mode threads is invalid. - 3. The program and data segments of processes use highly reliable memory. Because the highly reliable memory is insufficient, the low reliable memory is used for startup. - 4. Common processes also use highly reliable memory in some scenarios, such as HugeTLB, page cache, vDSO, and TMPFS. - -- **Overwrite of unallocated memory** - - The overwrite of the unallocated memory can be executed only once and does not support concurrent operations. If this feature is executed, it will have the following impacts: - - 1. This feature takes a long time. When one CPU of each node is occupied by the overwrite thread, other tasks cannot be scheduled on the CPU. - 2. During the overwrite process, the zone lock needs to be obtained. Other service processes need to wait until the overwrite is complete. As a result, the memory may not be allocated in time. - 3. In the case of concurrent execution, queuing is blocked, resulting in a longer delay. - - If the machine performance is poor, the kernel RCU stall or soft lockup alarm may be triggered, and the process memory allocation may be blocked. Therefore, this feature can be used only on physical machines if necessary. There is a high probability that the preceding problem occurs on VMs. - - The following table lists the reference data of physical machines. (The actual time required depends on the hardware performance and system load.) - - -Table 1 Test data when the TaiShan 2280 V2 server is unloaded - -| Test Item | Node 0 | Node 1 | Node 2 | Node 3 | -| ------------- | ------ | ------ | ------ | ------ | -| Free Mem (MB) | 109290 | 81218 | 107365 | 112053 | - -The total time is 3.2s. - -**Usage** - -This sub-feature provides multiple APIs. You only need to perform steps 1 to 6 to enable and verify the sub-feature. - -1. 
Configure `kernelcore=reliable` to enable tiered-reliability memory management. `CONFIG_MEMORY_RELIABLE` is mandatory. Otherwise, tiered-reliability memory management is disabled for the entire system. - -2. You can use the startup parameter `reliable_debug=[F][,S][,P]` to disable the fallback function (`F`), disable the TMPFS to use highly reliable memory (`S`), or disable the read/write cache to use highly reliable memory (`P`). By default, all the preceding functions are enabled. - -3. Based on the address range reported by the BIOS, the system searches for and marks the highly reliable memory. For the NUMA system, not every node needs to reserve reliable memory. However, the lower 4 GB physical space on node 0 must be highly reliable memory. During the system startup, the system allocates memory. If the highly reliable memory cannot be allocated, the low reliable memory is allocated (based on the fallback logic of the mirroring function) or the system cannot be started. If low reliable memory is used, the entire system is unstable. Therefore, the highly reliable memory on node 0 must be retained and the lower 4 GB physical space must be highly reliable memory. - -4. After the startup, you can check whether memory tiering is enabled based on the startup log. If it is enabled, the following information is displayed: - - ``` - mem reliable: init succeed, mirrored memory - ``` - -5. The physical address range corresponding to the highly reliable memory can be queried in the startup log. Observe the attributes in the memory map reported by the EFI. The memory range with `MR` is the highly reliable memory range. The following is an excerpt of the startup log. The memory range `mem06` is the highly reliable memory, and `mem07` is the low reliable memory. Their physical address ranges are also listed (the highly and low reliable memory address ranges cannot be directly queried in other modes). 
- - ``` - [ 0.000000] efi: mem06: [Conventional Memory| |MR| | | | | | |WB| | | ] range=[0x0000000100000000-0x000000013fffffff] (1024MB) - [ 0.000000] efi: mem07: [Conventional Memory| | | | | | | | |WB| | | ] range=[0x0000000140000000-0x000000083eb6cfff] (28651MB) - ``` - -6. During kernel-mode development, a page struct page can be determined based on the zone where the page is located. `ZONE_MOVABLE` indicates a low reliable memory zone. If the zone ID is smaller than `ZONE_MOVABLE`, the zone is a highly reliable memory zone. The following is an example: - - ``` - bool page_reliable(struct page *page) - { - if (!mem_reliable_status() || !page) - return false; - return page_zonenum(page) < ZONE_MOVABLE; - } - ``` - - In addition, the provided APIs are classified based on their functions: - - 1. **Checking whether the reliability function is enabled at the code layer**: In the kernel module, use the following API to check whether the tiered-reliability memory management function is enabled. If `true` is returned, the function is enabled. If `false` is returned, the function is disabled. - - ``` - #include - bool mem_reliable_status(void); - ``` - - 2. **Memory hot swap**: If the kernel enables the memory hot swap operation (Logical Memory hot-add), the highly and low reliable memories also support this operation. The operation unit is the memory block, which is the same as the native process. - - ``` - # Bring the memory online to the highly reliable memory range. - echo online_kernel > /sys/devices/system/memory/auto_online_blocks - # Bring the memory online to the low reliable memory range. - echo online_movable > /sys/devices/system/memory/auto_online_blocks - ``` - - 3. **Dynamically disabling a tiered management function**: The long type is used to determine whether to enable or disable the tiered-reliability memory management function based on each bit. - - - `bit0`: enables tiered-reliability memory management. 
- - `bit1`: disables fallback to the low reliable memory range. - - `bit2`: disables TMPFS to use highly reliable memory. - - `bit3`: disables the page cache to use highly reliable memory. - - Other bits are reserved for extension. If you need to change the value, call the following proc API (the permission is 600). The value range is 0-15. (The subsequent functions are processed only when `bit0` is `1`. Otherwise, all functions are disabled.) - - ``` - # bit0 is 1: tiered management stays enabled; bits 1-3 disable the corresponding functions. - echo 15 > /proc/sys/vm/reliable_debug - # bit0 is 0: all functions are disabled. - echo 14 > /proc/sys/vm/reliable_debug - ``` - - This interface can only disable functions. It cannot re-enable a function that has been disabled, whether at startup or during running. - - Note: This function is an escape mechanism and should be configured only when the tiered-reliability memory management feature needs to be disabled in abnormal scenarios or during commissioning. Do not use it as a common function. - - 4. **Viewing highly reliable memory statistics**: Call the native `/proc/meminfo` API. - - - `ReliableTotal`: total size of reliable memory managed by the kernel. - - `ReliableUsed`: total size of reliable memory used by the system, including the reserved memory used in the system. - - `ReliableBuddyMem`: remaining reliable memory in the buddy system. - - `ReliableTaskUsed`: highly reliable memory used by systemd and key user processes, including anonymous pages and file pages. - - `ReliableShmem`: highly reliable memory usage of the shared memory, including the total highly reliable memory used by the shared memory, TMPFS, and rootfs. - - `ReliableFileCache`: highly reliable memory usage of the read/write cache. - - 5. **Overwrite of unallocated memory**: This function requires the configuration item to be enabled. - - Enable `CONFIG_CLEAR_FREELIST_PAGE` and add the startup parameter `clear_freelist`. Call the proc API.
The value can only be `1` (the permission is 0200). - - ``` - echo 1 > /proc/sys/vm/clear_freelist_pages - ``` - - Note: This feature depends on the startup parameter `clear_freelist`. The kernel matches only the prefix of the startup parameter. Therefore, this feature also takes effect for parameters with a misspelled suffix, such as `clear_freelisttt`. - - To prevent misoperations, add the kernel module parameter `cfp_timeout_ms` to indicate the maximum execution duration of the overwrite function. If the overwrite function times out, it exits even if the overwrite operation is not complete. The default value is `2000` ms (the permission is 0644). - - ``` - echo 500 > /sys/module/clear_freelist_page/parameters/cfp_timeout_ms # Set the timeout to 500 ms. - ``` - - 6. **Checking and modifying the reliability attribute of the current process**: Call the `/proc/<pid>/reliable` API to check whether a process is a highly reliable process. The attribute is inherited by child processes created while it is set; if a child process does not require the attribute, manually modify the child's attribute. The systemd process and kernel threads do not support reading or writing of the attribute. The value can be `0` or `1`. The default value is `0`, indicating a low reliable process (the permission is 0644). - - ``` - # Change the process whose PID is 1024 to a highly reliable process. After the change, the process applies for memory from the highly reliable memory range. If the memory fails to be allocated, the allocation may fall back to the low reliable memory range. - echo 1 > /proc/1024/reliable - ``` - - 7. **Setting the upper limit of highly reliable memory requested by user-mode processes**: Call `/proc/sys/vm/task_reliable_limit` to modify the upper limit of highly reliable memory requested by user-mode processes. The value range is [`ReliableTaskUsed`, `ReliableTotal`], and the unit is byte (the permission is 0644).
Notes: - - - The default value is `ulong_max`, indicating that there is no limit. - - If the value is `0`, reliable processes cannot use the highly reliable memory. If fallback is enabled, the allocation falls back to the low reliable memory range; otherwise, OOM occurs. - - If the value is not `0` and the limit is triggered while the fallback function is enabled, the allocation falls back to the low reliable memory range. If the fallback function is disabled, OOM is returned. - -### Highly Reliable Memory for Read and Write Cache - -**Overview** - -A page cache is also called a file cache. When Linux reads or writes files, the page cache is used to cache the logical content of the files to accelerate access to images and data on disks. If low reliable memory is allocated to page caches, a UCE may be triggered during access, causing system exceptions. Therefore, the read/write cache (page cache) needs to be placed in the highly reliable memory zone. In addition, to prevent the highly reliable memory from being exhausted by excessive page cache allocations (unlimited by default), the total number of page caches and the total amount of reliable memory they use need to be limited. - -**Restrictions** - -1. When the page cache exceeds the limit, it is reclaimed periodically. If the page cache is generated faster than it is reclaimed, the number of page caches may be higher than the specified limit. -2. The usage of `/proc/sys/vm/reliable_pagecache_max_bytes` has certain restrictions. In some scenarios, the page cache forcibly uses reliable memory. For example, when metadata (such as inodes and dentries) of the file system is read, the reliable memory used by the page cache exceeds the API limit. In this case, you can run `echo 2 > /proc/sys/vm/drop_caches` to release inodes and dentries. -3. When the highly reliable memory used by the page cache exceeds the `reliable_pagecache_max_bytes` limit, the low reliable memory is allocated by default.
If the low reliable memory cannot be allocated, the native allocation process is followed. -4. FileCache statistics are first collected in the percpu cache. When the value in the cache exceeds the threshold, it is added to the system-wide counter and then displayed in `/proc/meminfo`. `ReliableFileCache` does not have such a threshold in `/proc/meminfo`. As a result, the value of `ReliableFileCache` may be greater than that of `FileCache`. -5. Write cache scenarios are restricted by `dirty_limit` (restricted by `/proc/sys/vm/dirty_ratio`, indicating the percentage of dirty pages on a single memory node). If the threshold is exceeded, the current zone is skipped. For tiered-reliability memory, because highly and low reliable memories are in different zones, the write cache may trigger fallback of the local node and use the low reliable memory of the local node. You can run `echo 100 > /proc/sys/vm/dirty_ratio` to cancel the restriction. -6. The highly reliable memory feature for the read/write cache limits the page cache usage. The system performance is affected in the following scenarios: - - If the upper limit of the page cache is too small, the I/O increases and the system performance is affected. - - If the page cache is reclaimed too frequently, system freezing may occur. - - If a large amount of page cache is reclaimed each time the page cache exceeds the limit, system freezing may occur. - -**Usage** - -The highly reliable memory is enabled by default for the read/write cache. To disable it, configure `reliable_debug=P`. In addition, page cache usage cannot be unlimited. The function of limiting the page cache size depends on the `CONFIG_SHRINK_PAGECACHE` configuration item. - -`FileCache` in `/proc/meminfo` can be used to query the usage of the page cache, and `ReliableFileCache` can be used to query the usage of the reliable memory in the page cache.
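Both counters can be pulled out of `meminfo` with a short script. The following is a sketch: `FileCache` and `ReliableFileCache` are the entries described above, and the optional file path argument is only there so the snippet can also be run against a saved copy of `/proc/meminfo` on a system where these fields are absent.

```shell
# Report the page cache counters relevant to this feature from a meminfo-style file.
# Pass a path explicitly, or omit it to read /proc/meminfo.
pagecache_report() {
    awk '/^FileCache:|^ReliableFileCache:/ { printf "%s %s kB\n", $1, $2 }' "${1:-/proc/meminfo}"
}

# Usage on a live system:
#   pagecache_report            # reads /proc/meminfo
```

If `ReliableFileCache` is close to `FileCache`, most of the page cache already resides in highly reliable memory; per restriction 4 above, `ReliableFileCache` may even exceed `FileCache` because of percpu counting thresholds.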
- -The function of limiting the page cache size depends on several proc APIs, which are defined in `/proc/sys/vm/` to control the page cache usage. For details, see the following table. - -| API Name (Native/New) | Permission| Description | Default Value | -| ------------------------------------ | ---- | ------------------------------------------------------------ | ------------------------------------------ | -| `cache_reclaim_enable` (native) | 644 | Whether to enable the page cache restriction function.
Value range: `0` or `1`. If an invalid value is input, an error is returned.
Example: `echo 1 > cache_reclaim_enable`| 1 | -| `cache_limit_mbytes` (new) | 644 | Upper limit of the cache, in MB.
Value range: The minimum value is 0, indicating that the restriction function is disabled. The maximum value is the memory size in MB, for example, the value displayed by running the `free -m` command (the value of `MemTotal` in `meminfo` converted to MB).
Example: `echo 1024 > cache_limit_mbytes`
Others: It is recommended that the cache upper limit be greater than or equal to half of the total memory. Otherwise, the I/O performance may be affected if the cache is too small.| 0 | -| `cache_reclaim_s` (native) | 644 | Interval for triggering cache reclamation, in seconds. The system creates work queues based on the number of online CPUs. If there are *n* CPUs, the system creates *n* work queues. Each work queue performs reclamation every `cache_reclaim_s` seconds. This parameter is compatible with the CPU online and offline functions. If the CPU is offline, the number of work queues decreases. If the CPU is online, the number of work queues increases.
Value range: The minimum value is `0` (indicating that the periodic reclamation function is disabled) and the maximum value is `43200`. If an invalid value is input, an error is returned.
Example: `echo 120 > cache_reclaim_s`
Others: You are advised to set the reclamation interval to several minutes (for example, 2 minutes). Otherwise, frequent reclamation may cause system freezing.| 0 | -| `cache_reclaim_weight` (native) | 644 | Weight of each reclamation. Each CPU of the kernel expects to reclaim `32 x cache_reclaim_weight` pages each time. This weight applies to both reclamation triggered by the page upper limit and periodic page cache reclamation.
Value range: 1 to 100. If an invalid value is input, an error is returned.
Example: `echo 10 > cache_reclaim_weight`
Others: You are advised to set this parameter to `10` or a smaller value. Otherwise, the system may freeze each time too much memory is reclaimed.| 1 | -| `reliable_pagecache_max_bytes` (new)| 644 | Total amount of highly reliable memory in the page cache.
Value range: 0 to the maximum highly reliable memory, in bytes. You can call `/proc/meminfo` to query the maximum highly reliable memory. If an invalid value is input, an error is returned.
Example: `echo 4096000 > reliable_pagecache_max_bytes`| Maximum value of the unsigned long type, indicating that the usage is not limited.| - -### Highly Reliable Memory for TMPFS - -**Overview** - -If TMPFS is used as rootfs, it stores core files and data used by the OS. However, TMPFS uses low reliable memory by default, which makes core files and data unreliable. Therefore, TMPFS must use highly reliable memory. - -**Usage** - -By default, the highly reliable memory is enabled for TMPFS. To disable it, configure `reliable_debug=S`. You can dynamically disable it by calling `/proc/sys/vm/reliable_debug`, but cannot dynamically enable it. - -When enabling TMPFS to use highly reliable memory, you can check `ReliableShmem` in `/proc/meminfo` to view the highly reliable memory that has been used by TMPFS. - -By default, the upper limit for TMPFS to use highly reliable memory is half of the physical memory (except when TMPFS is used as rootfs). The conventional SYS V shared memory is restricted by `/proc/sys/kernel/shmmax` and `/proc/sys/kernel/shmall` and can be dynamically configured. It is also restricted by the highly reliable memory used by TMPFS. For details, see the following table. - -| **Parameter** | **Description** | |---------------------------------|--------------------------------| | `/proc/sys/kernel/shmmax` (native)| Size of a single SYS V shared memory range.| | `/proc/sys/kernel/shmall` (native)| Total size of the SYS V shared memory that can be used. | - -The `/proc/sys/vm/shmem_reliable_bytes_limit` API is added for you to set the available highly reliable size (in bytes) of the system-level TMPFS. The default value is `LONG_MAX`, indicating that the usage is not limited. The value ranges from 0 to the total reliable memory size of the system. The permission is 644. When fallback is disabled and the memory usage reaches the upper limit, an error indicating that no memory is available is returned.
When fallback is enabled, the system attempts to allocate memory from the low reliable memory zone. Example: - -``` -echo 10000000 > /proc/sys/vm/shmem_reliable_bytes_limit -``` - -### No Kernel Reset upon UCE During the Switch from User Mode to Kernel Mode - -**Overview** - -Based on the tiered-reliability memory management solution, the kernel and key processes use highly reliable memory, while most user-mode processes use low reliable memory. When the system is running, a large amount of data needs to be exchanged between the user mode and kernel mode. When data is transferred to the kernel mode, data in the low reliable memory zone is copied to the highly reliable memory zone. The copy operation is performed in kernel mode. If a UCE occurs when the user-mode data is read, that is, if the UCE is consumed in kernel mode, the system triggers a panic. This sub-feature provides solutions for scenarios where a UCE occurs during the switch from user mode to kernel mode, to avoid a system reset. It covers the copy-on-write (COW), copy_to_user, copy_from_user, get_user, put_user, and core dump scenarios. Other scenarios are not supported. - -**Restrictions** - -1. The CPU must be ARMv8.2 or later and support the RAS feature. -2. This feature changes the synchronous exception handling policy. Therefore, it takes effect only when the kernel receives a synchronous exception reported by the firmware. -3. The kernel processing depends on the error type reported by the BIOS. The kernel cannot process fatal hardware errors but can process recoverable hardware errors. -4. Only the COW, copy_to_user (including the read page cache), copy_from_user, get_user, put_user, and core dump scenarios are supported. -5. In the core dump scenario, UCE tolerance needs to be implemented on the write API of the file system. This feature supports only three common file systems: ext4, TMPFS, and PipeFS.
The corresponding error tolerance APIs are as follows: - - PipeFS: `copy_page_from_iter` - - ext4/TMPFS: `iov_iter_copy_from_user_atomic` - -**Usage** - -Ensure that `CONFIG_ARCH_HAS_COPY_MC` is enabled in the kernel. If `/proc/sys/kernel/machine_check_safe` is set to `1`, this feature is enabled for all scenarios. If `/proc/sys/kernel/machine_check_safe` is set to `0`, this feature is disabled for all scenarios. Other values are invalid. - -The fault tolerance mechanism in each scenario is as follows: - -| **No.**| **Scenario** | **Symptom** | **Mitigation Measure** | -| ---- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | -| 1 | `copy_from/to_user`: basic switch to the user mode, involving syscall, sysctl, and procfs operations| If a UCE occurs during the copy, the kernel is reset. | If a UCE occurs, kill the current process. The kernel does not automatically reset. | -| 2 | `get/put_user`: simple variable copy, mainly in netlink scenarios.| If a UCE occurs during the copy, the kernel is reset. | If a UCE occurs, kill the current process. The kernel does not automatically reset. | -| 3 | COW: fork subprocess, which triggers COW. | COW is triggered. If a UCE occurs, the kernel is reset. | If a UCE occurs, kill related processes. The kernel does not automatically reset.| -| 4 | Read cache: The user mode uses low reliable memory. When a user-mode program reads or writes files, the OS uses idle memory to cache hard disk files, improving performance. However, when the user-mode program reads a file, the kernel accesses the cache first.| A UCE occurs, causing the kernel to reset. | If a UCE occurs, kill the current process. The kernel does not automatically reset.| -| 5 | UCE is triggered by memory access during a core dump. | A UCE occurs, causing the kernel to reset. | If a UCE occurs, kill the current process. 
The kernel does not automatically reset. | -| 6 | Write cache: When the write cache is flushed back to the disk, a UCE is triggered. | Cache flushing is actually disk DMA data migration. If a UCE is triggered during this process, the page write fails after a timeout. As a result, data inconsistency occurs and the file system becomes unavailable. If the data is key data, the kernel resets.| No solution is available. The kernel will be reset. | -| 7 | Kernel startup parameters and module parameters use highly reliable memory. | / | Not supported. The risk is reduced. | -| 8 | relayfs: a file system that quickly forwards data from the kernel mode to the user mode.| / | Not supported. The risk is reduced. | -| 9 | `seq_file`: transfers kernel data to the user mode as a file. | / | Not supported. The risk is reduced. | - -Most user-mode data uses low reliable memory. Therefore, this project involves only the scenario where user-mode data is read in kernel mode. In Linux, data can be exchanged between the user space and kernel space in ten modes: kernel startup parameters, module parameters, sysfs, sysctl, syscall (system call), netlink, procfs, seq_file, debugfs, and relayfs. There are two other cases: COW (triggered when a process is created) and the read/write file cache (page cache). - -In sysfs, syscall, netlink, and procfs modes, data is transferred from the user mode to the kernel mode in copy_from_user or get_user mode. - -The user mode can be switched to the kernel mode in the following scenarios: - -copy_from_user, get_user, COW, read cache, and write cache flushing.
- -The kernel mode can be switched to the user mode in the following scenarios: - -relayfs, seq_file, copy_to_user, and put_user diff --git a/docs/en/docs/Kernel/overview.md b/docs/en/docs/Kernel/overview.md deleted file mode 100644 index 2751a20e048a4a77e19609d6f9cf9a47f090fb12..0000000000000000000000000000000000000000 --- a/docs/en/docs/Kernel/overview.md +++ /dev/null @@ -1,3 +0,0 @@ -## Overview - -This feature allows you to allocate memory with corresponding reliability as required and mitigates the impact of some possible UCEs or CEs to some extent. In this way, the overall service reliability does not deteriorate when partial memory mirroring (a RAS feature called address range mirroring) is used. \ No newline at end of file diff --git a/docs/en/docs/Kernel/restrictions.md b/docs/en/docs/Kernel/restrictions.md deleted file mode 100644 index 3656a25694c047dbc18979d9b1847c36c0d5bdf2..0000000000000000000000000000000000000000 --- a/docs/en/docs/Kernel/restrictions.md +++ /dev/null @@ -1,87 +0,0 @@ -## Restrictions - -This section describes the general constraints of this feature. Each subfeature has specific constraints, which are described in the corresponding section. - -**Compatibility** - -1. Currently, this feature applies only to ARM64. -2. The hardware needs to support partial memory mirroring (address range mirroring), that is, the memory whose attribute is `EFI_MEMORY_MORE_RELIABLE` is reported through the UEFI standard API. Common memory does not need to be set. The mirrored memory is the highly reliable memory, and the common memory is the low reliable memory. -3. High-reliability and low reliable memory tiering is implemented by using the memory management zones of the kernel. They cannot dynamically flow (that is, pages cannot move between different zones). -4. Continuous physical memory with different reliability is divided into different memblocks. 
As a result, the allocation of large continuous physical memory blocks may be restricted after memory tiering is enabled. -5. To enable this feature, the value of `kernelcore` must be `reliable`, which is incompatible with other values of this parameter. - - -**Design Specifications** - -1. During kernel-mode development, pay attention to the following points when allocating memory: - - - If the memory allocation API supports the specified `gfp_flag`, only the memory allocation whose `gfp_flag` contains `__GFP_HIGHMEM` and `__GFP_MOVABLE` forcibly allocates the common memory range or redirects to the reliable memory range. Other `gfp_flags` do not intervene. - - - High-reliability memory is allocated from slab, slub, and slob. (If the memory allocated at a time is greater than `KMALLOC_MAX_CACHE_SIZE` and `gfp_flag` is set to a common memory range, low reliable memory may be allocated.) - -2. During user-mode development, pay attention to the following points when allocating memory: - - - After the attribute of a common process is changed to a key process, the highly reliable memory is used only in the actual physical memory allocation phase (page fault). The attribute of the previously allocated memory does not change, and vice versa. Therefore, the memory allocated when a common process is started and changed to a key process may not be highly reliable memory. Whether the configuration takes effect can be verified by querying whether the physical address corresponding to the virtual address belongs to the highly reliable memory range. - - Similar mechanisms (ptmalloc, tcmalloc, and dpdk) in libc libraries, such as chunks in glibc, use cache logic to improve performance. However, memory cache causes inconsistency between the memory allocation logics of the user and the kernel. When a common process becomes a key process, this feature cannot be enabled (it is enabled only when the kernel allocates memory). - -3. 
When an upper-layer service applies for memory, if the highly reliable memory is insufficient (triggering the native min waterline of the zone) or the corresponding limit is triggered, the page cache is preferentially released to attempt to reclaim the highly reliable memory. If the memory still cannot be allocated, the kernel selects OOM or fallback to the low reliable memory range based on the fallback switch to complete memory allocation. (Fallback indicates that when the memory of a memory management zone or node is insufficient, memory is allocated from other memory management zones or nodes.) - -4. The dynamic memory migration mechanism similar to `NUMA_BALANCING` may cause the allocated highly reliable or low reliable memory to be migrated to another node. Because the migration operation loses the memory allocation context and the target node may not have the corresponding reliable memory, the memory reliability after the migration may not be as expected. - -5. The following configuration files are introduced based on the usage of the user-mode highly reliable memory: - - - **/proc/sys/vm/task_reliable_limit**: upper limit of the highly reliable memory used by key processes (including systemd). It contains anonymous pages and file pages. The SHMEM used by the process is also counted (included in anonymous pages). - - - **/proc/sys/vm/reliable_pagecache_max_bytes**: soft upper limit of the highly reliable memory used by the global page cache. The number of highly reliable page caches used by common processes is limited. By default, the system does not limit the highly reliable memory used by page caches. This restriction does not apply to scenarios such as highly reliable processes and file system metadata. Regardless of whether fallback is enabled, when a common process triggers the upper limit, the low reliable memory is allocated by default. If the low reliable memory cannot be allocated, the native process is used. 
- - - **/proc/sys/vm/shmem_reliable_bytes_limit**: soft upper limit of the highly reliable memory used by the global SHMEM. It limits the amount of highly reliable memory used by the SHMEM of common processes. By default, the system does not limit the amount of highly reliable memory used by SHMEM. High-reliability processes are not subject to this restriction. When fallback is disabled, if a common process triggers the upper limit, memory allocation fails, but OOM does not occur (consistent with the native process). - - If the above limits are reached, memory allocation fallback or OOM may occur. - - Memory allocation caused by page faults generated by key processes in the TMPFS or page cache may trigger multiple limits. For details about the interaction between multiple limits, see the following table. - - | Whether task_reliable_limit Is Reached| Whether reliable_pagecache_max_bytes or shmem_reliable_bytes_limit Is Reached| Memory Allocation Processing Policy | - | --------------------------- | ------------------------------------------------------------ | ------------------------------------------------ | - | Yes | Yes | The page cache is reclaimed first to meet the allocation requirements. Otherwise, fallback or OOM occurs.| - | Yes | No | The page cache is reclaimed first to meet the allocation requirements. Otherwise, fallback or OOM occurs.| - | No | No | High-reliability memory is allocated first. Otherwise, fallback or OOM occurs. | - | No | Yes | High-reliability memory is allocated first. Otherwise, fallback or OOM occurs. | - - Key processes comply with `task_reliable_limit`. If `task_reliable_limit` is greater than `tmpfs` or `pagecachelimit`, page cache and TMPFS generated by key processes still use highly reliable memory. As a result, the highly reliable memory used by page cache and TMPFS is greater than the corresponding limit. 
- - When `task_reliable_limit` is triggered, if the size of the highly reliable file cache is less than 4 MB, the file cache will not be reclaimed synchronously. If the highly reliable file cache is less than 4 MB when the page cache is generated, the allocation will fall back to the low reliable memory range. If the highly reliable file cache is greater than 4 MB, the page cache is reclaimed preferentially for allocation. However, when the size is close to 4 MB, direct cache reclamation is triggered more frequently. Because the lock overhead of direct cache reclamation is high, the CPU usage is high. In this case, the file read/write performance is close to the raw disk performance. - -6. Even if the system has sufficient highly reliable memory, the allocation may fall back to the low reliable memory range. - - - If the memory cannot be migrated to another node for allocation, the allocation falls back to the low reliable memory range of the current node. The common scenarios are as follows: - - If the memory allocation contains `__GFP_THISNODE` (for example, transparent huge page allocation), memory can be allocated only from the current node. If the highly reliable memory of the node does not meet the allocation requirements, the system attempts to allocate memory from the low reliable memory range of the memory node. - - A process runs on a node that contains common memory by running commands such as `taskset` and `numactl`. - - A process is scheduled to a common memory node under the native scheduling mechanism of the system memory. - - High-reliability memory allocation triggers the highly reliable memory usage threshold, which also causes fallback to the low reliable memory range. - -7. If tiered-reliability memory fallback is disabled, highly reliable memory cannot be expanded to low reliable memory. 
As a result, user-mode applications may not be compatible with this feature in determining the memory usage, for example, determining the available memory based on MemFree. - -8. If tiered-reliability memory fallback is enabled, the native fallback is affected. The main difference lies in the selection of the memory management zone and NUMA node. - - - Fallback process of **common user processes**: low reliable memory of the local node -> low reliable memory of the remote node. - - Fallback process of **key user processes**: highly reliable memory of the local node -> highly reliable memory of the remote node. If no memory is allocated and the fallback function of `reliable` is enabled, the system retries as follows: low reliable memory of the local node -> low reliable memory of the remote node. - -**Scenarios** - -1. The default page size (`PAGE_SIZE`) is 4 KB. -2. The lower 4 GB memory of the NUMA node 0 must be highly reliable, and the highly reliable memory size and low reliable memory size must meet the kernel requirements. Otherwise, the system may fail to start. There is no requirement on the highly reliable memory size of other nodes. However, - if a node does not have highly reliable memory or the highly reliable memory is insufficient, the per-node management structure may be located in the highly reliable memory of other nodes (because the per-node management structure is a kernel data structure and needs to be located in the highly reliable memory zone). As a result, a kernel warning is generated, for example, `vmemmap_verify` alarms are generated and the performance is affected. -3. Some statistics (such as the total amount of highly reliable memory for TMPFS) of this feature are collected using the percpu technology, which causes extra overhead. To reduce the impact on performance, there is a certain error when calculating the sum. It is normal that the error is less than 10%. -4. 
Huge page limit: - - In the startup phase, static huge pages are low reliable memory. By default, static huge pages allocated during running are low reliable memory. If memory allocation occurs in the context of a key process, the allocated huge pages are highly reliable memory. - - In the transparent huge page (THP) scenario, if one of the 512 4 KB pages to be combined (2 MB for example) is a highly reliable page, the newly allocated 2 MB huge page uses highly reliable memory. That is, the THP uses more highly reliable memory. - - The allocation of the reserved 2 MB huge page complies with the native fallback process. If the current node lacks low reliable memory, the allocation falls back to the highly reliable range. - - In the startup phase, 2 MB huge pages are reserved. If no memory node is specified, the load is balanced to each memory node for huge page reservation. If a memory node lacks low reliable memory, highly reliable memory is used according to the native process. -5. Currently, only the normal system startup scenario is supported. In some abnormal scenarios, kernel startup may be incompatible with the memory tiering function, for example, the kdump startup phase. (Currently, kdump can be automatically disabled. In other scenarios, it needs to be disabled by upper-layer services.) -6. In the swap-in and swap-out, memory offline, KSM, cma, and gigantic page processes, the newly allocated page types are not considered based on the tiered-reliability memory. As a result, the page types may not be defined (for example, the highly reliable memory usage statistics are inaccurate and the reliability level of the allocated memory is not as expected). - -**Impact on Performance** - -- Due to the introduction of tiered-reliability memory management, the judgment logic is added for physical page allocation, which affects the performance. The impact depends on the system status, memory type, and high- and low reliable memory margin of each node. 
-- This feature introduces highly reliable memory usage statistics, which affects system performance. -- When `task_reliable_limit` is triggered, the cache in the highly reliable zone is reclaimed synchronously, which increases the CPU usage. In the scenario where `task_reliable_limit` is triggered by page cache allocation (file read/write operations, such as dd), if the available highly reliable memory (ReliableFileCache is considered as available memory) is close to 4 MB, cache reclamation is triggered more frequently. The overhead of direct cache reclamation is high, causing high CPU usage. In this case, the file read/write performance is close to the raw disk performance. \ No newline at end of file diff --git a/docs/en/docs/Kmesh/Kmesh.md b/docs/en/docs/Kmesh/Kmesh.md new file mode 100644 index 0000000000000000000000000000000000000000..6d992110dea65af8ecb301fbb8ae8ba667e3d860 --- /dev/null +++ b/docs/en/docs/Kmesh/Kmesh.md @@ -0,0 +1,5 @@ +# Kmesh User Guide + +This document describes how to install, deploy, and use Kmesh, a high-performance data plane for the service mesh of openEuler. + +This document is intended for community developers, open source enthusiasts, and partners who use the openEuler operating system (OS) and want to learn and use Kmesh. Users must have basic knowledge of the Linux OS. diff --git a/docs/en/docs/Kmesh/appendix.md b/docs/en/docs/Kmesh/appendix.md new file mode 100644 index 0000000000000000000000000000000000000000..fd5059d26b33ff142b15f73c686e347baa9d549a --- /dev/null +++ b/docs/en/docs/Kmesh/appendix.md @@ -0,0 +1,3 @@ +# Appendix + +Learn more about [Kmesh](https://gitee.com/openeuler/Kmesh#kmesh). diff --git a/docs/en/docs/Kmesh/faqs.md b/docs/en/docs/Kmesh/faqs.md new file mode 100644 index 0000000000000000000000000000000000000000..e3193cd3b12a60f4add3e1a78a57200b1b8c770f --- /dev/null +++ b/docs/en/docs/Kmesh/faqs.md @@ -0,0 +1,23 @@ +# FAQs + +## 1. 
An error is reported and the Kmesh service exits after being started because the IP address of the control plane program is not configured in cluster startup mode + +![](./figures/not_set_cluster_ip.png) + +Possible cause: In cluster startup mode, the Kmesh service needs to communicate with the control plane program and obtain configuration information from the control plane. Therefore, you need to set the correct IP address of the control plane program. + +Solution: Set the correct IP address of the control plane program by referring to the cluster startup mode in [Installation and Deployment](./installation-and-deployment.md). + +## 2. The message "get kube config error!" is displayed when the Kmesh service is started + +![](./figures/get_kubeconfig_error.png) + +Possible cause: In cluster startup mode, the Kmesh service automatically obtains the IP address of the control plane program based on the Kubernetes configuration. If the kubeconfig path of Kubernetes is not configured in the environment, the kubeconfig will fail to be obtained, and the message "get kube config error!" is displayed. (If the Kmesh configuration file has been manually modified and the IP address of the control plane program has been correctly configured, ignore this problem.) 
+ +Solution: Configure kubeconfig as follows: + +```shell +mkdir -p $HOME/.kube +sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config +sudo chown $(id -u):$(id -g) $HOME/.kube/config +``` diff --git a/docs/en/docs/Kmesh/figures/get_kubeconfig_error.png b/docs/en/docs/Kmesh/figures/get_kubeconfig_error.png new file mode 100644 index 0000000000000000000000000000000000000000..99087b68c6fafea1506e5f8bd862c371e93bdc97 Binary files /dev/null and b/docs/en/docs/Kmesh/figures/get_kubeconfig_error.png differ diff --git a/docs/en/docs/Kmesh/figures/kmesh-arch.png b/docs/en/docs/Kmesh/figures/kmesh-arch.png new file mode 100644 index 0000000000000000000000000000000000000000..000ec80ff35556199caef6ce78953599c1c52312 Binary files /dev/null and b/docs/en/docs/Kmesh/figures/kmesh-arch.png differ diff --git a/docs/en/docs/Kmesh/figures/not_set_cluster_ip.png b/docs/en/docs/Kmesh/figures/not_set_cluster_ip.png new file mode 100644 index 0000000000000000000000000000000000000000..9c879f37fa93c0f4fe0ab0f6220beff174e5f436 Binary files /dev/null and b/docs/en/docs/Kmesh/figures/not_set_cluster_ip.png differ diff --git a/docs/en/docs/Kmesh/installation-and-deployment.md b/docs/en/docs/Kmesh/installation-and-deployment.md new file mode 100644 index 0000000000000000000000000000000000000000..2db6b79e65f94764f8558ed3f6edf409954c677f --- /dev/null +++ b/docs/en/docs/Kmesh/installation-and-deployment.md @@ -0,0 +1,87 @@ +# Installation and Deployment + +## Software + +* OS: openEuler 23.03 + +## Hardware + +* x86_64 + +## Preparing the Environment + +* Install the openEuler operating system. For details, see the *openEuler 23.03 Installation Guide*. + +* The root permission is required for installing Kmesh. + +## Installing Kmesh + +* Install the Kmesh software package. + +```shell +[root@openEuler ~]# yum install Kmesh +``` + +* Check whether the installation is successful. If the command output contains the name of the software package, the installation is successful. 
+ +```shell +[root@openEuler ~]# rpm -q Kmesh +``` + +## Deploying Kmesh + +### Cluster Startup Mode + +Before starting Kmesh, modify the configuration to set the IP address of the control plane program (for example, Istiod IP address) in the cluster. + +```json + "clusters": [ + { + "name": "xds-grpc", + "type" : "STATIC", + "connect_timeout": "1s", + "lb_policy": "ROUND_ROBIN", + "load_assignment": { + "cluster_name": "xds-grpc", + "endpoints": [{ + "lb_endpoints": [{ + "endpoint": { + "address":{ + "socket_address": { + "protocol": "TCP", + "address": "192.168.0.1",# Set the control plane IP address (for example, Istiod IP address). + "port_value": 15010 + } + } + } + }] + }] +``` + +### Local Startup Mode + +Before starting Kmesh, modify `kmesh.service` to disable ADS. + +```shell +[root@openEuler ~]# vim /usr/lib/systemd/system/kmesh.service +ExecStart=/usr/bin/kmesh-daemon -enable-kmesh -enable-ads=false +[root@openEuler ~]# systemctl daemon-reload +``` + +When the Kmesh service is started, the kmesh-daemon program is invoked. For details about how to use the kmesh-daemon program, see [Using kmesh-daemon](./usage.md). + +### Starting Kmesh + +```shell +# Start the Kmesh service. +[root@openEuler ~]# systemctl start kmesh.service +# Check the Kmesh running status. +[root@openEuler ~]# systemctl status kmesh.service +``` + +### Stopping Kmesh + +```shell +# Stop the Kmesh service. +[root@openEuler ~]# systemctl stop kmesh.service +``` diff --git a/docs/en/docs/Kmesh/introduction-to-kmesh.md b/docs/en/docs/Kmesh/introduction-to-kmesh.md new file mode 100644 index 0000000000000000000000000000000000000000..ff9c1b0262afd516958733113b01334c2ddc9a00 --- /dev/null +++ b/docs/en/docs/Kmesh/introduction-to-kmesh.md @@ -0,0 +1,38 @@ +# Introduction to Kmesh + +## Overview + +As the number of cloud-native applications surges, the scale of cloud applications and application SLAs pose high requirements on cloud infrastructure. 
+ +The Kubernetes-based cloud infrastructure can help implement agile deployment and management of applications. However, it does not support application traffic orchestration. The emergence of service mesh makes up for the lack of traffic orchestration in Kubernetes and complements Kubernetes to implement agile cloud application development and O&M. However, with the development of service mesh applications, the current Sidecar-based mesh architecture has obvious performance defects on the data plane, which has become a consensus in the industry. + +* Long delay + Take the typical service mesh Istio as an example. After meshing, the single-hop delay of service access increases by 2.65 ms, which cannot meet the requirements of delay-sensitive applications. + +* High overhead + In Istio, each Sidecar occupies more than 50 MB of memory for configurations and exclusively occupies two CPU cores by default. For large-scale clusters, the overhead is high, reducing the deployment density of service containers. + +Based on the programmable kernel, Kmesh offloads mesh traffic governance to the OS and shortens the data path from 3 hops to 1 hop, greatly shortening the delay of the data plane and accelerating service innovation. + +## Architecture + +The following figure shows the overall architecture of Kmesh. + +![](./figures/kmesh-arch.png) + +Kmesh consists of the following components: + +* kmesh-controller + Kmesh management program, which is responsible for Kmesh lifecycle management, xDS protocol interconnection, and O&M monitoring. + +* kmesh-api + API layer provided by Kmesh for external systems, including orchestration APIs converted by xDS and O&M monitoring channels. + +* kmesh-runtime + Runtime that supports L3 to L7 traffic orchestration implemented in the kernel. + +* kmesh-orchestration + L3 to L7 traffic orchestration implemented based on eBPF, such as routing, gray release, and load balancing. 
+ +* kmesh-probe + O&M monitoring probe, providing E2E monitoring capabilities. diff --git a/docs/en/docs/Kmesh/usage.md b/docs/en/docs/Kmesh/usage.md new file mode 100644 index 0000000000000000000000000000000000000000..89a971b7785bb3965c8a9b9fd34123aea6ec0712 --- /dev/null +++ b/docs/en/docs/Kmesh/usage.md @@ -0,0 +1,69 @@ +# Usage + +## Using kmesh-daemon + +```shell +# Command help +[root@openEuler ~]# kmesh-daemon -h +Usage of kmesh-daemon: + -bpf-fs-path string + bpf fs path (default "/sys/fs/bpf") + -cgroup2-path string + cgroup2 path (default "/mnt/kmesh_cgroup2") + -config-file string + [if -enable-kmesh] deploy in kube cluster (default "/etc/kmesh/kmesh.json") + -enable-ads + [if -enable-kmesh] enable control-plane from ads (default true) + -enable-kmesh + enable bpf kmesh + -service-cluster string + [if -enable-kmesh] TODO (default "TODO") + -service-node string + [if -enable-kmesh] TODO (default "TODO") + +# Enable ADS by default. +[root@openEuler ~]# kmesh-daemon -enable-kmesh + +# Enable ADS and specify the path of the configuration file. +[root@openEuler ~]# kmesh-daemon -enable-kmesh -enable-ads=true -config-file=/examples/kmesh.json + +# Disable ADS. +[root@openEuler ~]# kmesh-daemon -enable-kmesh -enable-ads=false +``` + +## Using kmesh-cmd + +```shell +# Command help +[root@openEuler ~]# kmesh-cmd -h +Usage of kmesh-cmd: + -config-file string + input config-resources to bpf maps (default "./config-resources.json") + +# Manually load configurations. +[root@openEuler ~]# kmesh-cmd -config-file=/examples/config-resources.json +``` + +## Using O&M Commands + +```shell +# Command help +[root@openEuler ~]# curl http://localhost:15200/help + /help: print list of commands + /options: print config options + /bpf/kmesh/maps: print bpf kmesh maps in kernel + /controller/envoy: print control-plane in envoy cache + /controller/kubernetes: print control-plane in kubernetes cache + +# Read the loaded configurations. 
+[root@openEuler ~]# curl http://localhost:15200/bpf/kmesh/maps +[root@openEuler ~]# curl http://localhost:15200/options +``` + +## Restrictions + +* If `-enable-ads=true` is configured, Kmesh automatically receives orchestration rules from the service mesh control plane. In this case, do not run the `kmesh-cmd` command to deliver rules to avoid repeated configurations. + +* The `-bpf-fs-path` option is used to specify the path of the BPF file system of the OS. Data related to the Kmesh BPF program is stored in this path. The default path is `/sys/fs/bpf`. + +* The `-cgroup2-path` option is used to specify the cgroup path of the OS. The default path is `/mnt/kmesh_cgroup2`. diff --git a/docs/en/docs/KubeOS/about-kubeos.md b/docs/en/docs/KubeOS/about-kubeos.md deleted file mode 100644 index 54feebcead5600c1102f97fe1be6708c0b44d362..0000000000000000000000000000000000000000 --- a/docs/en/docs/KubeOS/about-kubeos.md +++ /dev/null @@ -1,42 +0,0 @@ -# About KubeOS - -## Introduction - -Containers and Kubernetes are widely used in cloud scenarios. However, a current manner of managing the containers and the OSs separately usually faces problems of function redundancy and difficult collaboration between scheduling systems. In addition, it is difficult to manage OS versions. Software packages are installed, updated, and deleted separately in OSs of the same version. After a period of time, the OS versions become inconsistent, causing version fragmentation. Besides, the OSs may be tightly coupled with services, making it difficult to upgrade major versions. To solve the preceding problems, openEuler provides KubeOS, a container OS upgrade tool based on openEuler. - -Container OSs are lightweight OSs designed for scenarios where services run in containers. KubeOS connects container OSs as components to Kubernetes, so that the container OSs are in the same position as services. With KubeOS, a Kubernetes cluster manages containers and container OSs in a unified system. 
- -KubeOS is a Kubernetes operator for controlling the container OS upgrade process and upgrading the container OSs as a whole to implement collaboration between the OS managers and services. Before the container OSs are upgraded, services are migrated to other nodes to reduce the impact on services during OS upgrade and configuration. In this upgrade pattern, the container OSs are upgraded atomically so that the OSs remain synchronized with the expected status. This ensures that the OS versions in the cluster are consistent, preventing version fragmentation. - -## Architecture - -### KubeOS Architecture - -**Figure 1** KubeOS architecture - -![](./figures/kubeos-architecture.png) - -As shown in the preceding figure, KubeOS consists of three components: os-operator, os-proxy, and os-agent. The os-operator and os-proxy components run in containers and are deployed in the Kubernetes cluster. os-agent is not considered a cluster component. Its instances run on worker nodes as processes. - -- os-operator: global container OS manager, which continuously checks the container OS versions of all nodes, controls the number of nodes to be upgraded concurrently based on the configured information, and marks the nodes to be upgraded. - -- os-proxy: OS manager of a single node, which continuously checks the container OS version of the node. If a node is marked as the node to be upgraded by os-operator, the node is locked, the pod is evicted, and the upgrade information is forwarded to os-agent. - -- os-agent: receives information from os-proxy, downloads the container OS image used for upgrade from the OS image server, upgrades the container OS, and restarts the node. - - -### File System of a Container OS - -**Figure 2** File system layout of a container OS - -![](./figures/file-system-layout-of-a-container-os.png) - - - -As shown in the figure, a container OS comprises four partitions: - -- boot partition: GRUB2 file partition. 
-- Persist partition: stores persistent user data. When the container OS is upgraded, the data in this partition is retained. -- Two root partitions: Container OSs use the dual-partition mode with two root partitions, rootA and rootB. Assume that the container runs the OS stored in the rootA partition after initialization. When the system is upgraded, the new system is downloaded to the rootB partition. GRUB has two boot options: A and B. The default boot option of GRUB is set to B and the node is restarted. After the node is started, the container runs the upgraded OS in the rootB partition. - -The root file system of a container OS is read-only. Users' persistent data is stored in the Persist partition. diff --git a/docs/en/docs/KubeOS/figures/file-system-layout-of-a-container-os.png b/docs/en/docs/KubeOS/figures/file-system-layout-of-a-container-os.png deleted file mode 100644 index add62e72f85b103b7dd5780d2e360049f5f712df..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/KubeOS/figures/file-system-layout-of-a-container-os.png and /dev/null differ diff --git a/docs/en/docs/KubeOS/figures/kubeos-architecture.png b/docs/en/docs/KubeOS/figures/kubeos-architecture.png deleted file mode 100644 index 7834a3793b73c49ddd046502c65335a08f576c30..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/KubeOS/figures/kubeos-architecture.png and /dev/null differ diff --git a/docs/en/docs/KubeOS/installation-and-deployment.md b/docs/en/docs/KubeOS/installation-and-deployment.md deleted file mode 100644 index 1b10d159d6e1b535c904ebf7772465e332c83436..0000000000000000000000000000000000000000 --- a/docs/en/docs/KubeOS/installation-and-deployment.md +++ /dev/null @@ -1,205 +0,0 @@ -# Installation and Deployment - -This chapter describes how to install and deploy the KubeOS tool. 
- -- [Installation and Deployment](#installation-and-deployment) - - [Software and Hardware Requirements](#software-and-hardware-requirements) - - [Hardware Requirements](#hardware-requirements) - - [Software Requirements](#software-requirements) - - [Environment Preparation](#environment-preparation) - - [KubeOS Installation](#kubeos-installation) - - [KubeOS Deployment](#kubeos-deployment) - - [Building the os-operator and os-proxy Images](#building-the-os-operator-and-os-proxy-images) - - [Creating a KubeOS VM Image](#creating-a-kubeos-vm-image) - - [Deploying CRD, os-operator, and os-proxy](#deploying-crd-os-operator-and-os-proxy) - -## Software and Hardware Requirements - -### Hardware Requirements - -* Currently, only the x86 and AArch64 architectures are supported. - -### Software Requirements - -* OS: openEuler 22.09 - -### Environment Preparation - -* Install the openEuler system. For details, see the *openEuler 22.09 Installation Guide*. -* Install qemu-img, bc, Parted, tar, Yum, Docker, and dosfstools. - -## KubeOS Installation - -To install KubeOS, perform the following steps: - -1. Configure the Yum sources openEuler 22.09 and openEuler 22.09:EPOL: - - ``` - [openEuler22.09] # openEuler 22.09 official source - name=openEuler22.09 - baseurl=http://repo.openeuler.org/openEuler-22.09/everything/$basearch/ - enabled=1 - gpgcheck=1 - gpgkey=http://repo.openeuler.org/openEuler-22.09/everything/$basearch/RPM-GPG-KEY-openEuler - ``` - - ``` - [Epol] # openEuler 22.09:EPOL official source - name=Epol - baseurl=http://repo.openeuler.org/openEuler-22.09/EPOL/main/$basearch/ - enabled=1 - gpgcheck=1 - gpgkey=http://repo.openeuler.org/openEuler-22.09/OS/$basearch/RPM-GPG-KEY-openEuler - ``` - -2. Install KubeOS as the **root** user. 
- - ```shell - # yum install KubeOS KubeOS-scripts -y - ``` - - -> ![](./public_sys-resources/icon-note.gif)**NOTE**: -> -> KubeOS is installed in the **/opt/kubeOS** directory, including the os-operator, os-proxy, os-agent binary files, KubeOS image build tools, and corresponding configuration files. - -## KubeOS Deployment - -After KubeOS is installed, you need to configure and deploy it. This section describes how to configure and deploy KubeOS. - -### Building the os-operator and os-proxy Images - -#### Environment Preparation - -Before using Docker to create a container image, ensure that Docker has been installed and configured. - -#### Procedure - -1. Go to the working directory. - - ```shell - cd /opt/kubeOS - ``` - -2. Specify the image repository, name, and version for os-proxy. - - ```shell - export IMG_PROXY=your_imageRepository/os-proxy_imageName:version - ``` - -3. Specify the image repository, name, and version for os-operator. - - ```shell - export IMG_OPERATOR=your_imageRepository/os-operator_imageName:version - ``` - -4. Compile a Dockerfile to build an image. Pay attention to the following points when compiling a Dockerfile: - - * The os-operator and os-proxy images must be built based on the base image. Ensure that the base image is safe. - * Copy the os-operator and os-proxy binary files to the corresponding images. - * Ensure that the owner and owner group of the os-proxy binary file in the os-proxy image are **root**, and the file permission is **500**. - * Ensure that the owner and owner group of the os-operator binary file in the os-operator image are the user who runs the os-operator process in the container, and the file permission is **500**. - * The locations of the os-operator and os-proxy binary files in the image and the commands run during container startup must correspond to the parameters specified in the YAML file used for deployment. 
- - An example Dockerfile is as follows: - - ``` - FROM your_baseimage - COPY ./bin/proxy /proxy - ENTRYPOINT ["/proxy"] - ``` - - ``` - FROM your_baseimage - COPY --chown=6552:6552 ./bin/operator /operator - ENTRYPOINT ["/operator"] - ``` - - Alternatively, you can use multi-stage builds in the Dockerfile. - -5. Build the images (the os-operator and os-proxy images) to be included in the containers OS image. - - ```shell - # Specify the Dockerfile path of os-proxy. - export DOCKERFILE_PROXY=your_dockerfile_proxy - # Specify the Dockerfile path of os-operator. - export DOCKERFILE_OPERATOR=your_dockerfile_operator - # Build images. - docker build -t ${IMG_OPERATOR} -f ${DOCKERFILE_OPERATOR} . - docker build -t ${IMG_PROXY} -f ${DOCKERFILE_PROXY} . - ``` - -6. Push the images to the image repository. - - ```shell - docker push ${IMG_OPERATOR} - docker push ${IMG_PROXY} - ``` - - -### Creating a KubeOS VM Image - -#### Precautions - -* The VM image is used as an example. For details about how to create a physical machine image, see **KubeOS Image Creation**. -* The root permission is required for creating a KubeOS image. -* The RPM sources of the kbimg are the **everything** and **EPOL** repositories of openEuler of a specific version. In the Repo file provided during image creation, you are advised to configure the **everything** and **EPOL** repositories of a specific openEuler version for the Yum source. -* By default, the KubeOS VM image built using the default RPM list is stored in the same path as the kbimg tool. This partition must have at least 25 GiB free drive space. -* When creating a KubeOS image, you cannot customize the file system to be mounted. - -#### Procedure - -Use the **kbimg.sh** script to create a KubeOS VM image. For details about the commands, see **KubeOS Image Creation**. - -To create a KubeOS VM image, perform the following steps: - -1. Go to the working directory. - - ```shell - cd /opt/kubeOS/scripts - ``` - -2. 
Run `kbimg.sh` to create a KubeOS image. The following is a command example: - - ```shell - bash kbimg.sh create vm-image -p xxx.repo -v v1 -b ../bin/os-agent -e '''$1$xyz$RdLyKTL32WEvK3lg8CXID0''' - ``` - In the command, **xxx.repo** indicates the actual Yum source file used for creating the image. You are advised to configure both the **everything** and **EPOL** repositories as Yum sources. - - After the KubeOS image is created, the following files are generated in the **/opt/kubeOS/scripts** directory: - - - **system.img**: system image in raw format. The default size is 20 GiB. The size of the root file system partition is less than 2,020 MiB, and the size of the Persist partition is less than 16 GiB. - - **system.qcow2**: system image in QCOW2 format. - - **update.img**: partition image of the root file system that is used for upgrade. - - The created KubeOS VM image can be used only in a VM of the x86 or AArch64 architecture. KubeOS does not support legacy boot in an x86 VM. - - -### Deploying CRD, os-operator, and os-proxy - -#### Precautions - -* The Kubernetes cluster must be deployed first. For details, see the *openEuler 22.09 Kubernetes Cluster Deployment Guide*. - -- The OS of the worker nodes to be upgraded in the cluster must be the KubeOS built using the method described in the previous section. If it is not, use **system.qcow2** to deploy the VM again. For details about how to deploy a VM, see the *openEuler 22.09 Virtualization User Guide*. Currently, KubeOS does not support master nodes. Use openEuler 22.09 on the master nodes. -- The YAML files for deploying CustomResourceDefinition (CRD), os-operator, os-proxy, and role-based access control (RBAC) of the OS need to be compiled. -- The os-operator and os-proxy components are deployed in the Kubernetes cluster. os-operator must be deployed as a Deployment, and os-proxy as a DaemonSet. 
-- Kubernetes security mechanisms, such as the RBAC, pod service account, and security policies, must be deployed. - -#### Procedure - -1. Prepare YAML files used for deploying CRD, RBAC, os-operator, and os-proxy of the OS. For details, see [YAML examples](https://gitee.com/openeuler/KubeOS/tree/master/docs/example/config). The following uses **crd.yaml**, **rbac.yaml**, and **manager.yaml** as examples. - -2. Deploy CRD, RBAC, os-operator, and os-proxy. Assume that the **crd.yaml**, **rbac.yaml**, and **manager.yaml** files are stored in the **config/crd**, **config/rbac**, and **config/manager** directories, respectively. Run the following commands: - - ```shell - kubectl apply -f config/crd - kubectl apply -f config/rbac - kubectl apply -f config/manager - ``` - -3. After the deployment is complete, run the following command to check whether each component is started properly. If **STATUS** of all components is **Running**, the components are started properly. - - ```shell - kubectl get pods -A - ``` diff --git a/docs/en/docs/KubeOS/kubeos-image-creation.md b/docs/en/docs/KubeOS/kubeos-image-creation.md deleted file mode 100644 index 167520be172e76399583e910970d304a4aba192a..0000000000000000000000000000000000000000 --- a/docs/en/docs/KubeOS/kubeos-image-creation.md +++ /dev/null @@ -1,161 +0,0 @@ -# KubeOS Image Creation # - -## Introduction ## - -kbimg is an image creation tool required for KubeOS deployment and upgrade. You can use kbimg to create KubeOS Docker, VM, and physical machine images. - -## Commands ## - -### Command Format ### - -**bash kbimg.sh** \[ --help | -h \] create \[ COMMANDS \] \[ OPTIONS \] - -### Parameter Description ### - -* COMMANDS - - | Parameter | Description | - | ------------- | ---------------------------------------------- | - | upgrade-image | Generates a Docker image for installation and upgrade.| - | vm-image | Generates a VM image for installation and upgrade. 
| - | pxe-image | Generates images and files required for physical machine installation. | - - - -* OPTIONS - - | Option | Description | - | ------------ | ------------------------------------------------------------ | - | -p | Path of the repo file. The Yum source required for creating an image is configured in the repo file. | - | -v | Version of the created KubeOS image. | - | -b | Path of the os-agent binary file. | - | -e | Password of the **root** user of the KubeOS image, which is an encrypted password with a salt value. You can run the OpenSSL or KIWI command to generate the password.| - | -d | Generated or used Docker image. | - | -h --help | Help information. | - - - -## Usage Description ## - -#### Precautions #### - -* The root permission is required for executing **kbimg.sh**. -* Currently, only the x86 and AArch64 architectures are supported. -* The RPM sources of kbimg are the **everything** and **EPOL** repositories of openEuler of a specific version. In the Repo file provided during image creation, you are advised to configure the **everything** and **EPOL** repositories of a specific openEuler version for the Yum source. - -### Creating a KubeOS Docker Image ### - -#### Precautions #### - -* The created Docker image can be used only for subsequent VM or physical machine image creation or upgrade. It cannot be used to start containers. -* If the default RPM list is used to create a KubeOS image, at least 6 GB drive space is required. If the RPM list is customized, the occupied drive space may exceed 6 GB. - -#### Example #### -* To configure the DNS, customize the `resolv.conf` file in the `scripts` directory. -```shell - cd /opt/kubeOS/scripts - touch resolv.conf - vim resolv.conf -``` -* Create a KubeOS image. 
-``` shell -cd /opt/kubeOS/scripts -bash kbimg.sh create upgrade-image -p xxx.repo -v v1 -b ../bin/os-agent -e '''$1$xyz$RdLyKTL32WEvK3lg8CXID0''' -d your_imageRepository/imageName:version -``` - -* After the creation is complete, view the created KubeOS image. - -``` shell -docker images -``` - -### Creating a KubeOS VM Image ### - -#### Precautions #### - -* To use a Docker image to create a KubeOS VM image, pull the corresponding image or create a Docker image first and ensure the security of the Docker image. -* The created KubeOS VM image can be used only in a VM of the x86 or AArch64 architecture. -* Currently, KubeOS does not support legacy boot in an x86 VM. -* If the default RPM list is used to create a KubeOS image, at least 25 GB drive space is required. If the RPM list is customized, the occupied drive space may exceed 25 GB. - -#### Example #### - -* Using the Repo Source - * To configure the DNS, customize the `resolv.conf` file in the `scripts` directory. - ```shell - cd /opt/kubeOS/scripts - touch resolv.conf - vim resolv.conf - ``` - * Create a KubeOS VM image. - ``` shell - cd /opt/kubeOS/scripts - bash kbimg.sh create vm-image -p xxx.repo -v v1 -b ../bin/os-agent -e '''$1$xyz$RdLyKTL32WEvK3lg8CXID0''' - ``` - -* Using a Docker Image - - ``` shell - cd /opt/kubeOS/scripts - bash kbimg.sh create vm-image -d your_imageRepository/imageName:version - ``` -* Result Description - After the KubeOS image is created, the following files are generated in the **/opt/kubeOS/scripts** directory: - * **system.qcow2**: system image in QCOW2 format. The default size is 20 GiB. The size of the root file system partition is less than 2,020 MiB, and the size of the Persist partition is less than 16 GiB. - * **update.img**: partition image of the root file system used for upgrade. 
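The generated **update.img** is the artifact that a later disk-image upgrade verifies against a SHA-256 checksum. A minimal sketch of computing that digest with standard tools follows; the stand-in file created here is only so the commands run anywhere, and in practice you would point `sha256sum` at the real **/opt/kubeOS/scripts/update.img**:

```shell
# Compute the SHA-256 digest of the upgrade partition image.
# A stand-in file is created so this sketch is self-contained;
# replace it with the real /opt/kubeOS/scripts/update.img.
printf 'stand-in for update.img' > update.img
checksum=$(sha256sum update.img | awk '{print $1}')
echo "checksum=${checksum}"
```

The printed 64-character hex digest is the value to record for the `checksum` field of the OS custom resource when upgrading with a disk image.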
- - -### Creating Images and Files Required for Installing KubeOS on Physical Machines ### - -#### Precautions #### - -* To use a Docker image to create a KubeOS VM image, pull the corresponding image or create a Docker image first and ensure the security of the Docker image. -* The created image can only be used to install KubeOS on a physical machine of the x86 or AArch64 architecture. -* The IP address specified in the **Global.cfg** file is a temporary IP address used during installation. After the system is installed and started, configure the network by referring to **openEuler 22.09 Administrator Guide** > **Configuring the Network**. -* KubeOS cannot be installed on multiple drives at the same time. Otherwise, the startup may fail or the mounting may be disordered. -* Currently, KubeOS does not support legacy boot in an x86 physical machine. -* If the default RPM list is used to create a KubeOS image, at least 5 GB drive space is required. If the RPM list is customized, the occupied drive space may exceed 5 GB. -#### Example #### - -* Modify the `00bootup/Global.cfg` file. All parameters are mandatory. Currently, only IPv4 addresses are supported. The following is a configuration example: - - ```shell - # rootfs file name - rootfs_name=kubeos.tar - # select the target disk to install kubeOS - disk=/dev/sda - # pxe server ip address where stores the rootfs on the http server - server_ip=192.168.1.50 - # target machine temporary ip - local_ip=192.168.1.100 - # target machine temporary route - route_ip=192.168.1.1 - # target machine temporary netmask - netmask=255.255.255.0 - # target machine netDevice name - net_name=eth0 - ``` - -* Using the Repo Source - * To configure the DNS, customize the `resolv.conf` file in the `scripts` directory. - ```shell - cd /opt/kubeOS/scripts - touch resolv.conf - vim resolv.conf - ``` - * Create an image required for installing KubeOS on a physical machine. 
- ``` - cd /opt/kubeOS/scripts - bash kbimg.sh create pxe-image -p xxx.repo -v v1 -b ../bin/os-agent -e '''$1$xyz$RdLyKTL32WEvK3lg8CXID0''' - ``` - -* Using a Docker Image - ``` shell - cd /opt/kubeOS/scripts - bash kbimg.sh create pxe-image -d your_imageRepository/imageName:version - ``` - -* Result Description - - * **initramfs.img**: initramfs image used for boot from PXE. - * **kubeos.tar**: OS used for installation from PXE. diff --git a/docs/en/docs/KubeOS/kubeos-user-guide.md b/docs/en/docs/KubeOS/kubeos-user-guide.md deleted file mode 100644 index aa615c852e059d2ea061ea75ee0cd9d5264ebfa7..0000000000000000000000000000000000000000 --- a/docs/en/docs/KubeOS/kubeos-user-guide.md +++ /dev/null @@ -1,8 +0,0 @@ -# KubeOS User Guide - -This document describes how to install, deploy, and use KubeOS in the openEuler system. KubeOS connects the container OS to the scheduling system in standard extension pattern and manages the OS upgrade of nodes in the cluster through the scheduling system. - -This document is intended for community developers, open source enthusiasts, and partners who use the openEuler system and want to learn and use the container OSs. Users must: - -* Know basic Linux operations. -* Understand Kubernetes and Docker. 
diff --git a/docs/en/docs/KubeOS/public_sys-resources/icon-note.gif b/docs/en/docs/KubeOS/public_sys-resources/icon-note.gif deleted file mode 100644 index 6314297e45c1de184204098efd4814d6dc8b1cda..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/KubeOS/public_sys-resources/icon-note.gif and /dev/null differ diff --git a/docs/en/docs/KubeOS/usage-instructions.md b/docs/en/docs/KubeOS/usage-instructions.md deleted file mode 100644 index 1f4446d57d02271f81e9ca66e97007579e51c4be..0000000000000000000000000000000000000000 --- a/docs/en/docs/KubeOS/usage-instructions.md +++ /dev/null @@ -1,169 +0,0 @@ -# Usage Instructions - -- [Usage Instructions](#usage-instructions) - - [Precautions](#precautions) - - [Upgrade](#upgrade) - - [Rollback](#rollback) - - [Application Scenarios](#application-scenarios) - - [Manual Rollback](#manual-rollback) - - [KubeOS-based Rollback](#kubeos-based-rollback) - -## Precautions - -1. KubeOS upgrades the container OS in an atomic manner, where all software packages are upgraded at the same time. By default, single-package upgrade is not supported. -2. KubeOS supports container OSs with two partitions. Partitions more than two are not supported. -3. You can view the upgrade logs of a single node in the **/var/log/messages** file on the node. -4. Strictly follow the upgrade and rollback procedures described in this document. If the steps are performed in a wrong sequence, the system may fail to be upgraded or rolled back. -5. Upgrade using a Docker image and mTLS two-way authentication are supported only in openEuler 22.09 or later. -6. Cross-major version upgrade is not supported. - -## Upgrade - -Create a custom object of the OS type in the cluster and set the corresponding fields. The OS type comes from the CRD object created in the installation and deployment sections. The following table describes the fields. 
- -| Parameter | Type | Description | How to Use| Mandatory (Yes/No) | -| -------------- | ------ | ------------------------------------------------------------ | ----- | ---------------- | -| imagetype | string | Type of the upgrade image | The value must be `docker` or `disk`. Other values are invalid. This parameter is valid only in upgrade scenarios.|Yes | -| opstype | string | Operation, that is, upgrade or rollback| The value must be `upgrade` or `rollback`. Other values are invalid.|Yes | -| osversion | string | OS version of the image used for upgrade or rollback | The value must be a KubeOS version, for example, `KubeOS 1.0.0`.|Yes | -| maxunavailable | int | Number of nodes to be upgraded or rolled back at the same time| If the value of `maxunavailable` is greater than the actual number of nodes in the cluster, the deployment can be performed. The upgrade or rollback is performed based on the actual number of nodes in the cluster.|Yes | -| dockerimage | string | Docker image used for upgrade | The value must be in the *repository/name:tag* format. This parameter is valid only when a Docker image is used for upgrade.|Yes | -| imageurl | string | Address of the disk image used for the upgrade| `imageurl` contains the protocol, and only HTTP or HTTPS is supported, for example, `https://192.168.122.15/update.img`. This parameter is valid only when a disk image is used for upgrade.|Yes | -| checksum | string | Checksum (SHA-256) value for disk image verification during the upgrade. | This parameter is valid only when a disk image is used for upgrade.|Yes | -| flagSafe | bool | Whether `imageurl` specifies a secure HTTP address | The value must be `true` or `false`. This parameter is valid only when `imageurl` specifies an HTTP address.|Yes | -| mtls | bool | Whether HTTPS two-way authentication is used for the connection to the `imageurl` address. | The value must be `true` or `false`.
This parameter is valid only when `imageurl` specifies an HTTPS address.|Yes | -| cacert | string | Root certificate file used for HTTPS or HTTPS two-way authentication | This parameter is valid only when `imageurl` specifies an HTTPS address.| This parameter is mandatory when `imageurl` specifies an HTTPS address.| -| clientcert | string | Client certificate file used for HTTPS two-way authentication | This parameter is valid only when HTTPS two-way authentication is used.|This parameter is mandatory when `mtls` is set to `true`.| -| clientkey | string | Client private key used for HTTPS two-way authentication | This parameter is valid only when HTTPS two-way authentication is used.|This parameter is mandatory when `mtls` is set to `true`.| - -The address specified by `imageurl` contains the protocol, and only HTTP or HTTPS is supported. If `imageurl` is set to an HTTPS address, secure transmission is used. If `imageurl` is set to an HTTP address, set `flagSafe` to `true` to declare that the address is secure; the image is downloaded only then. If `imageurl` is set to an HTTP address but `flagSafe` is not set to `true`, the address is treated as insecure by default. The image will not be downloaded, and a message indicating that the address is insecure is written to the log of the node to be upgraded. - -You are advised to set `imageurl` to an HTTPS address. In this case, ensure that the required certificate has been installed on the node to be upgraded. If you maintain the image server yourself, you need to sign the certificate and ensure that it has been installed on the node to be upgraded. Place the certificate in the **/etc/KubeOS/certs** directory of KubeOS. The administrator specifies the address and must ensure its security. An intranet address is recommended. - -The container OS image provider must check the validity of the image to ensure that the downloaded container OS image comes from a reliable source.
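Since the `checksum` field above must hold the SHA-256 digest of the disk image, the value can be computed before filling in the CR. A minimal sketch; the image name `update.img` is an illustrative assumption, and the placeholder file is created only so the snippet runs anywhere:

```shell
# sha256sum prints "<digest>  <file>"; keep only the digest for the checksum field.
img=update.img
[ -f "$img" ] || printf 'placeholder' > "$img"  # stand-in for the real disk image
checksum=$(sha256sum "$img" | awk '{print $1}')
echo "$checksum"
```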
- -Compile the YAML file for deploying the OS as a custom resource (CR) instance in the cluster. The following is an example YAML file for deploying the CR instance: - -* Upgrade using a disk image - - ``` - apiVersion: upgrade.openeuler.org/v1alpha1 - kind: OS - metadata: - name: os-sample - spec: - imagetype: disk - opstype: upgrade - osversion: edit.os.version - maxunavailable: edit.node.upgrade.number - dockerimage: "" - imageurl: edit.image.url - checksum: image.checksum - flagSafe: imageurl.safety - mtls: imageurl use mtls or not - cacert: ca certificate - clientcert: client certificate - clientkey: client certificate key - ``` - -* Upgrade using a Docker image - - ``` shell - apiVersion: upgrade.openeuler.org/v1alpha1 - kind: OS - metadata: - name: os-sample - spec: - imagetype: docker - opstype: upgrade - osversion: edit.os.version - maxunavailable: edit.node.upgrade.number - dockerimage: dockerimage like repository/name:tag - imageurl: "" - checksum: "" - flagSafe: false - mtls: true - ``` - - Before using a Docker image to perform the upgrade, create the image first. For details about how to create a Docker image, see **KubeOS Image Creation**. - -Assume that the YAML file is **upgrade_v1alpha1_os.yaml**. - -Check the OS version of the node that is not upgraded. - -``` -kubectl get nodes -o custom-columns='NAME:.metadata.name,OS:.status.nodeInfo.osImage' -``` - -Run the following command to deploy the CR instance in the cluster. The node is upgraded based on the configured parameters. - -``` -kubectl apply -f upgrade_v1alpha1_os.yaml -``` - -Check the node OS version again to determine whether the node upgrade is complete. 
- -``` -kubectl get nodes -o custom-columns='NAME:.metadata.name,OS:.status.nodeInfo.osImage' -``` - -> ![](./public_sys-resources/icon-note.gif)**NOTE**: -> -> If you need to perform the upgrade again, modify the `imageurl`, `osversion`, `checksum`, `maxunavailable`, `flagSafe`, or `dockerimage` parameters in **upgrade_v1alpha1_os.yaml**. - -## Rollback - -### Application Scenarios - -- If a node cannot be started, you can only manually roll back the container OS to the previous version that can be properly started. -- If a node can start and run the system, you can roll back the container OS manually or by using KubeOS (similar to the upgrade). You are advised to use KubeOS. - -### Manual Rollback - -Manually restart the node and select the second boot option to roll back the container OS. Manual rollback can only roll back the container OS to the version before the upgrade. - -### KubeOS-based Rollback - -* Roll back to any version. - * Modify the YAML configuration file (for example, **upgrade_v1alpha1_os.yaml**) of the CR instance of the OS and set the corresponding fields to the image information of the target version. The OS type comes from the CRD object created in the installation and deployment sections. For details about the fields and examples, see the previous section. - - * After the YAML is modified, run the update command. After the custom object is updated in the cluster, the node performs rollback based on the configured field information. - - ``` - kubectl apply -f upgrade_v1alpha1_os.yaml - ``` - -* Roll back to the previous version. - - * Modify the **upgrade_v1alpha1_os.yaml** file. Set **osversion** to the previous version and **opstype** to **rollback** to roll back to the previous version (that is, switch to the previous partition).
Example YAML: - - ``` - apiVersion: upgrade.openeuler.org/v1alpha1 - kind: OS - metadata: - name: os-sample - spec: - imagetype: "" - opstype: rollback - osversion: KubeOS previous version - maxunavailable: 2 - dockerimage: "" - imageurl: "" - checksum: "" - flagSafe: false - mtls: true - ``` - - * After the YAML is modified, run the update command. After the custom object is updated in the cluster, the node performs rollback based on the configured field information. - - ``` - kubectl apply -f upgrade_v1alpha1_os.yaml - ``` - - After the update is complete, the node rolls back the container OS based on the configuration information. - -* Check the OS version of the container on the node to determine whether the rollback is successful. - - ``` - kubectl get nodes -o custom-columns='NAME:.metadata.name,OS:.status.nodeInfo.osImage' - ``` diff --git a/docs/en/docs/Kubernetes/Kubernetes.md b/docs/en/docs/Kubernetes/Kubernetes.md deleted file mode 100644 index 4b1174b926b9a10a3de4a7ecf97757a64da0c341..0000000000000000000000000000000000000000 --- a/docs/en/docs/Kubernetes/Kubernetes.md +++ /dev/null @@ -1,12 +0,0 @@ -# Kubernetes Cluster Deployment Guide - -This document describes how to deploy a Kubernetes cluster in binary mode on openEuler. - -Note: All operations in this document are performed using the `root` permission. - -## Cluster Status - -The cluster status used in this document is as follows: - -- Cluster structure: six VMs running the `openEuler 21.09` OS, three master nodes, and three worker nodes. -- Physical machine: `x86/ARM` server of `openEuler 21.09`.
diff --git a/docs/en/docs/Kubernetes/deploying-a-Kubernetes-cluster-manually.md b/docs/en/docs/Kubernetes/deploying-a-Kubernetes-cluster-manually.md deleted file mode 100644 index 4f88d05bdb4b924ea9cb23bd394100dc4968b206..0000000000000000000000000000000000000000 --- a/docs/en/docs/Kubernetes/deploying-a-Kubernetes-cluster-manually.md +++ /dev/null @@ -1,20 +0,0 @@ -# Deploying a Kubernetes Cluster Manually - -**Note: Manual deployment applies only to experimental and learning environments and is not intended for commercial environments.** - - -This chapter describes how to deploy a Kubernetes cluster. - -## Environment - -Deploy based on the preceding [VM installation](./preparing-VMs.md) and obtain the following VM list: - -| HostName | MAC | IPv4 | -| ---------- | ----------------- | -------------------| -| k8smaster0 | 52:54:00:00:00:80 | 192.168.122.154/24 | -| k8smaster1 | 52:54:00:00:00:81 | 192.168.122.155/24 | -| k8smaster2 | 52:54:00:00:00:82 | 192.168.122.156/24 | -| k8snode1 | 52:54:00:00:00:83 | 192.168.122.157/24 | -| k8snode2 | 52:54:00:00:00:84 | 192.168.122.158/24 | -| k8snode3 | 52:54:00:00:00:85 | 192.168.122.159/24 | - diff --git a/docs/en/docs/Kubernetes/deploying-a-node-component.md b/docs/en/docs/Kubernetes/deploying-a-node-component.md deleted file mode 100644 index 485f96d6c4880007b91ec630f380b1b379658a2f..0000000000000000000000000000000000000000 --- a/docs/en/docs/Kubernetes/deploying-a-node-component.md +++ /dev/null @@ -1 +0,0 @@ -# Deploying a Node Component This section uses the `k8snode1` node as an example. ## Environment Preparation ```bash # A proxy needs to be configured for the intranet. $ dnf install -y docker iSulad conntrack-tools socat containernetworking-plugins $ swapoff -a $ mkdir -p /etc/kubernetes/pki/ $ mkdir -p /etc/cni/net.d $ mkdir -p /opt/cni # Delete the default kubeconfig file. $ rm /etc/kubernetes/kubelet.kubeconfig ## Use iSulad as the runtime ########. # Configure the iSulad. 
cat /etc/isulad/daemon.json { "registry-mirrors": [ "docker.io" ], "insecure-registries": [ "k8s.gcr.io", "quay.io" ], "pod-sandbox-image": "k8s.gcr.io/pause:3.2",# pause type "network-plugin": "cni", # If this parameter is left blank, the CNI network plug-in is disabled. In this case, the following two paths become invalid. After the plug-in is installed, restart iSulad. "cni-bin-dir": "/usr/libexec/cni/", "cni-conf-dir": "/etc/cni/net.d", } # Add the proxy to the iSulad environment variable and download the image. cat /usr/lib/systemd/system/isulad.service [Service] Type=notify Environment="HTTP_PROXY=http://name:password@proxy:8080" Environment="HTTPS_PROXY=http://name:password@proxy:8080" # Restart the iSulad and set it to start automatically upon power-on. systemctl daemon-reload systemctl restart isulad ## If Docker is used as the runtime, run the following command: ######## $ dnf install -y docker # If a proxy environment is required, configure a proxy for Docker, add the configuration file http-proxy.conf, and edit the following content. Replace name, password, and proxy-addr with the actual values. 
$ cat /etc/systemd/system/docker.service.d/http-proxy.conf [Service] Environment="HTTP_PROXY=http://name:password@proxy-addr:8080" $ systemctl daemon-reload $ systemctl restart docker ``` ## Creating kubeconfig Configuration Files Perform the following operations on each node to create a configuration file: ```bash $ kubectl config set-cluster openeuler-k8s \ --certificate-authority=/etc/kubernetes/pki/ca.pem \ --embed-certs=true \ --server=https://192.168.122.154:6443 \ --kubeconfig=k8snode1.kubeconfig $ kubectl config set-credentials system:node:k8snode1 \ --client-certificate=/etc/kubernetes/pki/k8snode1.pem \ --client-key=/etc/kubernetes/pki/k8snode1-key.pem \ --embed-certs=true \ --kubeconfig=k8snode1.kubeconfig $ kubectl config set-context default \ --cluster=openeuler-k8s \ --user=system:node:k8snode1 \ --kubeconfig=k8snode1.kubeconfig $ kubectl config use-context default --kubeconfig=k8snode1.kubeconfig ``` **Note: Change k8snode1 to the corresponding node name.** ## Copying the Certificate Similar to the control plane, all certificates, keys, and related configurations are stored in the `/etc/kubernetes/pki/` directory. ```bash $ ls /etc/kubernetes/pki/ ca.pem k8snode1.kubeconfig kubelet_config.yaml kube-proxy-key.pem kube-proxy.pem k8snode1-key.pem k8snode1.pem kube_proxy_config.yaml kube-proxy.kubeconfig ``` ## CNI Network Configuration containernetworking-plugins is used as the CNI plug-in used by kubelet. In the future, plug-ins such as calico and flannel can be introduced to enhance the network capability of the cluster. 
```bash # Bridge Network Configuration $ cat /etc/cni/net.d/10-bridge.conf { "cniVersion": "0.3.1", "name": "bridge", "type": "bridge", "bridge": "cnio0", "isGateway": true, "ipMasq": true, "ipam": { "type": "host-local", "subnet": "10.244.0.0/16", "gateway": "10.244.0.1" }, "dns": { "nameservers": [ "10.244.0.1" ] } } # Loopback Network Configuration $ cat /etc/cni/net.d/99-loopback.conf { "cniVersion": "0.3.1", "name": "lo", "type": "loopback" } ``` ## Deploying the kubelet Service ### Configuration File on Which Kubelet Depends ```bash $ cat /etc/kubernetes/pki/kubelet_config.yaml kind: KubeletConfiguration apiVersion: kubelet.config.k8s.io/v1beta1 authentication: anonymous: enabled: false webhook: enabled: true x509: clientCAFile: /etc/kubernetes/pki/ca.pem authorization: mode: Webhook clusterDNS: - 10.32.0.10 clusterDomain: cluster.local runtimeRequestTimeout: "15m" tlsCertFile: "/etc/kubernetes/pki/k8snode1.pem" tlsPrivateKeyFile: "/etc/kubernetes/pki/k8snode1-key.pem" ``` **Note: The IP address of the cluster DNS is 10.32.0.10, which must be the same as the value of service-cluster-ip-range.** ### Compiling the systemd Configuration File ```bash $ cat /usr/lib/systemd/system/kubelet.service [Unit] Description=kubelet: The Kubernetes Node Agent Documentation=https://kubernetes.io/docs/ Wants=network-online.target After=network-online.target [Service] ExecStart=/usr/bin/kubelet \ --config=/etc/kubernetes/pki/kubelet_config.yaml \ --network-plugin=cni \ --pod-infra-container-image=k8s.gcr.io/pause:3.2 \ --kubeconfig=/etc/kubernetes/pki/k8snode1.kubeconfig \ --register-node=true \ --hostname-override=k8snode1 \ --cni-bin-dir="/usr/libexec/cni/" \ --v=2 Restart=always StartLimitInterval=0 RestartSec=10 [Install] WantedBy=multi-user.target ``` **Note: If iSulad is used as the runtime, add the following configuration:** ```bash --container-runtime=remote \ --container-runtime-endpoint=unix:///var/run/isulad.sock \ ``` ## Deploying kube-proxy ### Configuration File 
on Which kube-proxy Depends ```bash cat /etc/kubernetes/pki/kube_proxy_config.yaml kind: KubeProxyConfiguration apiVersion: kubeproxy.config.k8s.io/v1alpha1 clientConnection: kubeconfig: /etc/kubernetes/pki/kube-proxy.kubeconfig clusterCIDR: 10.244.0.0/16 mode: "iptables" ``` ### Compiling the systemd Configuration File ```bash $ cat /usr/lib/systemd/system/kube-proxy.service [Unit] Description=Kubernetes Kube-Proxy Server Documentation=https://kubernetes.io/docs/reference/generated/kube-proxy/ After=network.target [Service] EnvironmentFile=-/etc/kubernetes/config EnvironmentFile=-/etc/kubernetes/proxy ExecStart=/usr/bin/kube-proxy \ $KUBE_LOGTOSTDERR \ $KUBE_LOG_LEVEL \ --config=/etc/kubernetes/pki/kube_proxy_config.yaml \ --hostname-override=k8snode1 \ $KUBE_PROXY_ARGS Restart=on-failure LimitNOFILE=65536 [Install] WantedBy=multi-user.target ``` ## Starting a Component Service ```bash $ systemctl enable kubelet kube-proxy $ systemctl start kubelet kube-proxy ``` Deploy other nodes in sequence. ## Verifying the Cluster Status Wait for several minutes and run the following command to check the node status: ```bash $ kubectl get nodes --kubeconfig /etc/kubernetes/pki/admin.kubeconfig NAME STATUS ROLES AGE VERSION k8snode1 Ready 17h v1.20.2 k8snode2 Ready 19m v1.20.2 k8snode3 Ready 12m v1.20.2 ``` ## Deploying coredns coredns can be deployed on a node or master node. In this document, coredns is deployed on the `k8snode1` node. ### Compiling the coredns Configuration File ```bash $ cat /etc/kubernetes/pki/dns/Corefile .:53 { errors health { lameduck 5s } ready kubernetes cluster.local in-addr.arpa ip6.arpa { pods insecure endpoint https://192.168.122.154:6443 tls /etc/kubernetes/pki/ca.pem /etc/kubernetes/pki/admin-key.pem /etc/kubernetes/pki/admin.pem kubeconfig /etc/kubernetes/pki/admin.kubeconfig default fallthrough in-addr.arpa ip6.arpa } prometheus :9153 forward . 
/etc/resolv.conf { max_concurrent 1000 } cache 30 loop reload loadbalance } ``` Note: - Listen to port 53. - Configure the Kubernetes plug-in, including the certificate and the URL of kube api. ### Preparing the service File of systemd ```bash cat /usr/lib/systemd/system/coredns.service [Unit] Description=Kubernetes Core DNS server Documentation=https://github.com/coredns/coredns After=network.target [Service] ExecStart=bash -c "KUBE_DNS_SERVICE_HOST=10.32.0.10 coredns -conf /etc/kubernetes/pki/dns/Corefile" Restart=on-failure LimitNOFILE=65536 [Install] WantedBy=multi-user.target ``` ### Starting the Service ```bash $ systemctl enable coredns $ systemctl start coredns ``` ### Creating the Service Object of coredns ```bash $ cat coredns_server.yaml apiVersion: v1 kind: Service metadata: name: kube-dns namespace: kube-system annotations: prometheus.io/port: "9153" prometheus.io/scrape: "true" labels: k8s-app: kube-dns kubernetes.io/cluster-service: "true" kubernetes.io/name: "CoreDNS" spec: clusterIP: 10.32.0.10 ports: - name: dns port: 53 protocol: UDP - name: dns-tcp port: 53 protocol: TCP - name: metrics port: 9153 protocol: TCP ``` ### Creating the Endpoint Object of coredns ```bash $ cat coredns_ep.yaml apiVersion: v1 kind: Endpoints metadata: name: kube-dns namespace: kube-system subsets: - addresses: - ip: 192.168.122.157 ports: - name: dns-tcp port: 53 protocol: TCP - name: dns port: 53 protocol: UDP - name: metrics port: 9153 protocol: TCP ``` ### Confirming the coredns Service ```bash # View the service object. $ kubectl get service -n kube-system kube-dns NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE kube-dns ClusterIP 10.32.0.10 53/UDP,53/TCP,9153/TCP 51m # View the endpoint object. 
$ kubectl get endpoints -n kube-system kube-dns NAME ENDPOINTS AGE kube-dns 192.168.122.157:53,192.168.122.157:53,192.168.122.157:9153 52m ``` \ No newline at end of file diff --git a/docs/en/docs/Kubernetes/deploying-control-plane-components.md b/docs/en/docs/Kubernetes/deploying-control-plane-components.md deleted file mode 100644 index a9b9bb2faff7c208fe6fb3fb1f02616d5c2f7f18..0000000000000000000000000000000000000000 --- a/docs/en/docs/Kubernetes/deploying-control-plane-components.md +++ /dev/null @@ -1,357 +0,0 @@ -# Deploying Components on the Control Plane - -## Preparing the kubeconfig File for All Components - -### kube-proxy - -```bash -kubectl config set-cluster openeuler-k8s --certificate-authority=/etc/kubernetes/pki/ca.pem --embed-certs=true --server=https://192.168.122.154:6443 --kubeconfig=kube-proxy.kubeconfig -kubectl config set-credentials system:kube-proxy --client-certificate=/etc/kubernetes/pki/kube-proxy.pem --client-key=/etc/kubernetes/pki/kube-proxy-key.pem --embed-certs=true --kubeconfig=kube-proxy.kubeconfig -kubectl config set-context default --cluster=openeuler-k8s --user=system:kube-proxy --kubeconfig=kube-proxy.kubeconfig -kubectl config use-context default --kubeconfig=kube-proxy.kubeconfig -``` - -### kube-controller-manager - -```bash -kubectl config set-cluster openeuler-k8s --certificate-authority=/etc/kubernetes/pki/ca.pem --embed-certs=true --server=https://127.0.0.1:6443 --kubeconfig=kube-controller-manager.kubeconfig -kubectl config set-credentials system:kube-controller-manager --client-certificate=/etc/kubernetes/pki/kube-controller-manager.pem --client-key=/etc/kubernetes/pki/kube-controller-manager-key.pem --embed-certs=true --kubeconfig=kube-controller-manager.kubeconfig -kubectl config set-context default --cluster=openeuler-k8s --user=system:kube-controller-manager --kubeconfig=kube-controller-manager.kubeconfig -kubectl config use-context default --kubeconfig=kube-controller-manager.kubeconfig -``` - -### 
kube-scheduler - -```bash -kubectl config set-cluster openeuler-k8s --certificate-authority=/etc/kubernetes/pki/ca.pem --embed-certs=true --server=https://127.0.0.1:6443 --kubeconfig=kube-scheduler.kubeconfig -kubectl config set-credentials system:kube-scheduler --client-certificate=/etc/kubernetes/pki/kube-scheduler.pem --client-key=/etc/kubernetes/pki/kube-scheduler-key.pem --embed-certs=true --kubeconfig=kube-scheduler.kubeconfig -kubectl config set-context default --cluster=openeuler-k8s --user=system:kube-scheduler --kubeconfig=kube-scheduler.kubeconfig -kubectl config use-context default --kubeconfig=kube-scheduler.kubeconfig -``` - -### admin - -```bash -kubectl config set-cluster openeuler-k8s --certificate-authority=/etc/kubernetes/pki/ca.pem --embed-certs=true --server=https://127.0.0.1:6443 --kubeconfig=admin.kubeconfig -kubectl config set-credentials admin --client-certificate=/etc/kubernetes/pki/admin.pem --client-key=/etc/kubernetes/pki/admin-key.pem --embed-certs=true --kubeconfig=admin.kubeconfig -kubectl config set-context default --cluster=openeuler-k8s --user=admin --kubeconfig=admin.kubeconfig -kubectl config use-context default --kubeconfig=admin.kubeconfig -``` - -### Obtaining the kubeconfig Configuration File - -```bash -admin.kubeconfig kube-proxy.kubeconfig kube-controller-manager.kubeconfig kube-scheduler.kubeconfig -``` - -## Configuration for Generating the Key Provider - -When api-server is started, an encryption configuration must be provided through `--encryption-provider-config=/etc/kubernetes/pki/encryption-config.yaml`. In this document, the encryption key is generated by using urandom: - -```bash -$ cat generate.bash -#!/bin/bash - -ENCRYPTION_KEY=$(head -c 32 /dev/urandom | base64) - -cat > encryption-config.yaml < ![](./public_sys-resources/icon-note.gif)**NOTE:** -> -> - When a cluster is deleted, all data in the cluster is deleted and cannot be restored.
Exercise caution when performing this operation. -> - Currently, dismantling a cluster does not delete the containers and the container images. However, if the Kubernetes cluster is configured to install a container engine during the deployment, the container engine will be deleted. As a result, the containers may run abnormally. -> - Some error information may be displayed when dismantling the cluster. Generally, this is caused by the error results returned during the delete operations. The cluster can still be properly dismantled. -> - -You can use the command line to delete the entire cluster. For example, run the following command to delete the k8s-cluster: - -```shell -$ eggo -d cleanup --id k8s-cluster -``` diff --git a/docs/en/docs/Kubernetes/eggo-tool-introduction.md b/docs/en/docs/Kubernetes/eggo-tool-introduction.md deleted file mode 100644 index f7beebeeaf057ea60b24915037d79535b8b78ac8..0000000000000000000000000000000000000000 --- a/docs/en/docs/Kubernetes/eggo-tool-introduction.md +++ /dev/null @@ -1,433 +0,0 @@ -# Tool Introduction - -This chapter describes the information related to the automatic deployment tool. You are advised to read this chapter before deployment. - -## Deployment Modes - -The automatic Kubernetes cluster deployment tool provided by openEuler supports one-click deployment using the CLI. The tool provides the following deployment modes: - -- Offline deployment: Prepare all required RPM packages, binary files, plugins, and container images on the local host, pack the packages into a tar.gz file in a specified format, and compile the corresponding YAML configuration file. Then, you can run commands to deploy the cluster in one click. This deployment mode can be used when the VM cannot access the external network. -- Online deployment: Compile the YAML configuration file. The required RPM packages, binary files, plugins, and container images are automatically downloaded from the Internet during installation and deployment.
In this mode, the VM must be able to access the software sources and the image repository on which the cluster depends, for example, Docker Hub. - -## Configurations - -When you use the automatic Kubernetes cluster deployment tool, use the YAML configuration file to describe the cluster deployment information. This section describes the configuration items and provides configuration examples. - -### Configuration Items - -- cluster-id: Cluster name, which must comply with the naming rules for the DNS names. Example: k8s-cluster - -- username: User name used to log in through SSH to the hosts where the Kubernetes cluster is to be deployed. The user name must be identical on all hosts. - -- private-key-path: The path of the key for password-free SSH login. You only need to configure either private-key-path or password. If both are configured, private-key-path is used preferentially. - -- masters: The master node list. It is recommended that each master node is also set as a worker node. Each master node contains the following sub-items. Each master node must be configured with a group of sub-items: - - name: The name of the master node, which is the node name displayed to the Kubernetes cluster. - - ip: The IP address of the master node. - - port: The port for SSH login of the node. The default value is 22. - - arch: CPU architecture of the master node. For example, the value for x86_64 CPUs is amd64. - -- workers: The list of the worker nodes. Each worker node contains the following sub-items. Each worker node must be configured with a group of sub-items: - - name: The name of the worker node, which is the node name displayed to the Kubernetes cluster. - - ip: The IP address of the worker node. - - port: The port for SSH login of the node. The default value is 22. - - arch: CPU architecture of the worker node. For example, the value for x86_64 CPUs is amd64. - -- etcds: The list of etcd nodes.
If this parameter is left empty, one etcd node is deployed for each master node. Otherwise, only the configured etcd node is deployed. Each etcd node contains the following sub-items. Each etcd node must be configured with a group of sub-items: - - name: The name of the etcd node, which is the node name displayed to the Kubernetes cluster. - - ip: The IP address of the etcd node. - - port: The port for SSH login. - - arch: CPU architecture of the etcd node. For example, the value for x86_64 CPUs is amd64. - -- loadbalance: The loadbalance node list. Each loadbalance node contains the following sub-items. Each loadbalance node must be configured with a group of sub-items: - - name: The name of the loadbalance node, which is the node name displayed to the Kubernetes cluster. - - ip: The IP address of the loadbalance node. - - port: The port for SSH login. - - arch: CPU architecture of the loadbalance node. For example, the value for x86_64 CPUs is amd64. - - bind-port: The listening port of the load balancing service. - -- external-ca: Whether to use an external CA certificate. If yes, set this parameter to true. Otherwise, set this parameter to false. - -- external-ca-path: The path of the external CA certificate file. This parameter takes effect only when external-ca is set to true. - -- service: The information about the service created by Kubernetes. The service configuration item contains the following sub-items: - - cidr: The IP address segment of the service created by Kubernetes. - - dnsaddr: The DNS address of the service created by Kubernetes. - - gateway: The gateway address of the service created by Kubernetes. - - dns: The configuration item of the CoreDNS created by Kubernetes. The dns configuration item contains the following sub-items: - - corednstype: The deployment type of the CoreDNS created by Kubernetes. The value can be pod or binary. - - imageversion: The CoreDNS image version of the pod deployment type.
- - replicas: The number of CoreDNS replicas of the pod deployment type. - -- network: The network configuration of the Kubernetes cluster. The network configuration item contains the following sub-items: - - podcidr: IP address segment of the Kubernetes cluster network. - - plugin: The network plugin deployed in the Kubernetes cluster. - - plugin-args: The configuration file path of the network plugin of the Kubernetes cluster network. Example: {"NetworkYamlPath": "/etc/kubernetes/addons/calico.yaml"} - -- apiserver-endpoint: The IP address or domain name of the APIServer service that can be accessed by external systems. If loadbalance is configured, set this parameter to the IP address of the loadbalance node. Otherwise, set this parameter to the IP address of the first master node. - -- apiserver-cert-sans: The IP addresses and domain names that need to be configured in the APIServer certificate. This configuration item contains the following sub-items: - - dnsnames: The array list of the domain names that need to be configured in the APIServer certificate. - - ips: The array list of IP addresses that need to be configured in the APIServer certificate. - -- apiserver-timeout: The APIServer response timeout interval. - -- etcd-token: The etcd cluster name. - -- dns-vip: The virtual IP address of the DNS. - -- dns-domain: The DNS domain name suffix. - -- pause-image: The complete image name of the pause container. - -- network-plugin: The type of the network plugin. This parameter can only be set to cni. If this item is not configured, the default Kubernetes network is used. - -- cni-bin-dir: The network plugin address. Use commas (,) to separate multiple addresses. For example: /usr/libexec/cni,/opt/cni/bin. - -- runtime: The type of the container runtime. Currently, docker and iSulad are supported. - -- runtime-endpoint: The endpoint of the container runtime. This parameter is optional when runtime is set to docker.
- -- registry-mirrors: The mirror site address of the image repository used for downloading container images. - -- insecure-registries: The address of the image repository used for downloading container images through HTTP. - -- config-extra-args: The extra parameters for starting services of each component (such as kube-apiserver and etcd). This configuration item contains the following sub-items: - - name: The component name. The value can be etcd, kube-apiserver, kube-controller-manager, kube-scheduler, kube-proxy, or kubelet. - - - extra-args: The extended parameters of the component. The format is key: value. Note that the component parameter corresponding to key must be prefixed with a hyphen (-) or two hyphens (--). - - - open-ports: Configure the ports that need to be enabled additionally. The ports required by Kubernetes do not need to be configured. Other plugin ports need to be configured additionally. - - worker | master | etcd | loadbalance: The type of the node where the ports are enabled. Each configuration item contains one or more port and protocol sub-items. - - port: The port address. - - protocol: The port type. The value can be tcp or udp. - - - install: Configure the detailed information about the installation packages or binary files to be installed on each type of node. Note that the corresponding files must be packaged in a tar.gz installation package. The following describes the full configuration. Select the configuration items as needed. - - package-source: The detailed information about the installation package. - - type: The compression type of the installation package. Currently, only tar.gz installation packages are supported. - - dstpath: The path where the installation package is to be decompressed on the peer host. The path must be a valid absolute path. - - srcpath: The path for storing the installation packages of different architectures. The architecture must correspond to the host architecture.
The path must be a valid absolute path. - - arm64: The path of the installation package of the ARM64 architecture. This parameter is required if any ARM64 node is included in the configuration. - - amd64: The path of the installation package of the AMD64 architecture. This parameter is required if any x86_64 node is included in the configuration. - - > ![](./public_sys-resources/icon-note.gif)**NOTE**: - > - > - In the install configuration item, the sub-items of etcd, kubernetes-master, kubernetes-worker, network, loadbalance, container, image, and dns are the same, that is, name, type, dst, schedule, and TimeOut. dst, schedule, and TimeOut are optional. You can determine whether to configure them based on the files to be installed. The following uses the etcd and kubernetes-master nodes as an example. - - - etcd: The list of packages or binary files to be installed on etcd nodes. - - name: The names of the software packages or binary files to be installed. If the software package is an installation package, enter only the name and do not specify the version. During the installation, `$name*` is used for identification. Example: etcd. If there are multiple software packages, use commas (,) to separate them. - - type: The type of the configuration item. The value can be pkg, repo, bin, file, dir, image, yaml, or shell. If type is set to repo, configure the repo source on the corresponding node. - - dst: The path of the destination folder. This parameter is required when type is set to bin, file, or dir. It indicates the directory where a file or folder is stored. To prevent users from incorrectly configuring a path and deleting important files during cleanup, this parameter must be set to a path in the whitelist. For details, see "Whitelist Description." - - kubernetes-master: The list of packages or binary files to be installed on the Kubernetes master nodes. 
- - kubernetes-worker: The list of packages or binary files to be installed on the Kubernetes worker nodes. - - network: The list of packages or binary files to be installed for the network. - - loadbalance: The list of packages or binary files to be installed on the loadbalance nodes. - - container: The list of packages or binary files to be installed for the containers. - - image: The tar package of the container image. - - dns: Kubernetes CoreDNS installation package. If corednstype is set to pod, this parameter is not required. - - addition: The list of additional installation packages or binary files. - - master: The following configurations will be installed on all master nodes. - - name: The name of the software package or binary file to be installed. - - type: The type of the configuration item. The value can be pkg, repo, bin, file, dir, image, yaml, or shell. If type is set to repo, configure the repo source on the corresponding node. - - schedule: Valid only when type is set to shell. This parameter indicates when the user wants to execute the script. The value can be prejoin (before the node is added), postjoin (after the node is added), precleanup (before the node is removed), or postcleanup (after the node is removed). - - TimeOut: The script execution timeout interval. If the execution times out, the process is forcibly stopped. The default value is 30s. - - worker: The configurations will be installed on all worker nodes. The configuration format is the same as that of master under addition. - -### Whitelist Description - -The value of dst under install must match the whitelist rules. Set it to a path in the whitelist or a subdirectory of the path. The current whitelist is as follows: - -- /usr/bin -- /usr/local/bin -- /opt/cni/bin -- /usr/libexec/cni -- /etc/kubernetes -- /usr/lib/systemd/system -- /etc/systemd/system -- /tmp - -### Configuration Example - -The following is an example of the YAML file configuration. 
As shown in the example, nodes of different types can be deployed on the same host, but the configurations of these nodes must be the same. For example, a master node and a worker node are deployed on test0. - -```yaml -cluster-id: k8s-cluster -username: root -private-key-path: /root/.ssh/private.key -masters: -- name: test0 - ip: 192.168.0.1 - port: 22 - arch: arm64 -workers: -- name: test0 - ip: 192.168.0.1 - port: 22 - arch: arm64 -- name: test1 - ip: 192.168.0.3 - port: 22 - arch: arm64 -etcds: -- name: etcd-0 - ip: 192.168.0.4 - port: 22 - arch: amd64 -loadbalance: - name: k8s-loadbalance - ip: 192.168.0.5 - port: 22 - arch: amd64 - bind-port: 8443 -external-ca: false -external-ca-path: /opt/externalca -service: - cidr: 10.32.0.0/16 - dnsaddr: 10.32.0.10 - gateway: 10.32.0.1 - dns: - corednstype: pod - imageversion: 1.8.4 - replicas: 2 -network: - podcidr: 10.244.0.0/16 - plugin: calico - plugin-args: {"NetworkYamlPath": "/etc/kubernetes/addons/calico.yaml"} -apiserver-endpoint: 192.168.122.222:6443 -apiserver-cert-sans: - dnsnames: [] - ips: [] -apiserver-timeout: 120s -etcd-external: false -etcd-token: etcd-cluster -dns-vip: 10.32.0.10 -dns-domain: cluster.local -pause-image: k8s.gcr.io/pause:3.2 -network-plugin: cni -cni-bin-dir: /usr/libexec/cni,/opt/cni/bin -runtime: docker -runtime-endpoint: unix:///var/run/docker.sock -registry-mirrors: [] -insecure-registries: [] -config-extra-args: - - name: kubelet - extra-args: - "--cgroup-driver": systemd -open-ports: - worker: - - port: 111 - protocol: tcp - - port: 179 - protocol: tcp -install: - package-source: - type: tar.gz - dstpath: "" - srcpath: - arm64: /root/rpms/packages-arm64.tar.gz - amd64: /root/rpms/packages-x86.tar.gz - etcd: - - name: etcd - type: pkg - dst: "" - kubernetes-master: - - name: kubernetes-client,kubernetes-master - type: pkg - kubernetes-worker: - - name: docker-engine,kubernetes-client,kubernetes-node,kubernetes-kubelet - type: pkg - dst: "" - - name: conntrack-tools,socat - type: pkg
- dst: "" - network: - - name: containernetworking-plugins - type: pkg - dst: "" - loadbalance: - - name: gd,gperftools-libs,libunwind,libwebp,libxslt - type: pkg - dst: "" - - name: nginx,nginx-all-modules,nginx-filesystem,nginx-mod-http-image-filter,nginx-mod-http-perl,nginx-mod-http-xslt-filter,nginx-mod-mail,nginx-mod-stream - type: pkg - dst: "" - container: - - name: emacs-filesystem,gflags,gpm-libs,re2,rsync,vim-filesystem,vim-common,vim-enhanced,zlib-devel - type: pkg - dst: "" - - name: libwebsockets,protobuf,protobuf-devel,grpc,libcgroup - type: pkg - dst: "" - - name: yajl,lxc,lxc-libs,lcr,clibcni,iSulad - type: pkg - dst: "" - image: - - name: pause.tar - type: image - dst: "" - dns: - - name: coredns - type: pkg - dst: "" - addition: - master: - - name: prejoin.sh - type: shell - schedule: "prejoin" - TimeOut: "30s" - - name: calico.yaml - type: yaml - dst: "" - worker: - - name: docker.service - type: file - dst: /usr/lib/systemd/system/ - - name: postjoin.sh - type: shell - schedule: "postjoin" -``` - -### Installation Package Structure - -For offline deployment, you need to prepare the Kubernetes software package and the related offline installation packages, and store the offline installation packages in a specific directory structure. The directory structure is as follows: - -```shell -package -├── bin -├── dir -├── file -├── image -├── pkg -└── packages_notes.md -``` - -The preceding directories are described as follows: - -- The directory structure of the offline deployment package corresponds to the package types in the cluster configuration file config. The package types include pkg, repo, bin, file, dir, image, yaml and shell. - -- The bin directory stores binary files, corresponding to the bin package type. - -- The dir directory stores the directory that needs to be copied to the target host. You need to configure the dst destination path, corresponding to the dir package type. 
- -- The file directory stores three types of files: file, yaml, and shell. The file type indicates the files to be copied to the target host, and requires the dst destination path to be configured. The yaml type indicates the user-defined YAML files, which will be applied after the cluster is deployed. The shell type indicates the scripts to be executed, and requires the schedule execution time to be configured. The execution time includes prejoin (before the node is added), postjoin (after the node is added), precleanup (before the node is removed), and postcleanup (after the node is removed). - -- The image directory stores the container images to be imported. The container images must be in a tar package format that is compatible with Docker (for example, images exported by Docker or isula-build). - -- The pkg directory stores the rpm/deb packages to be installed, corresponding to the pkg package type. You are advised to use binary files to facilitate cross-release deployment. - -### Command Reference - -To utilize the cluster deployment tool provided by openEuler, use the eggo command to deploy the cluster. 
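The package layout above can be assembled into the architecture-specific tar.gz archives referenced by srcpath. The following shell sketch builds a minimal skeleton; every file inside it is an empty illustrative placeholder, not a package mandated by eggo:

```shell
# Build a minimal offline package skeleton in a scratch directory
# (contents are empty placeholders for illustration only).
workdir=$(mktemp -d) && cd "$workdir"
mkdir -p package/bin package/dir package/file package/image package/pkg
: > package/packages_notes.md
: > package/pkg/etcd.rpm        # pkg: RPM packages to install on the nodes
: > package/image/pause.tar     # image: container images exported as tar
: > package/file/prejoin.sh     # file: scripts/YAML referenced in the config
# Pack everything into the tar.gz that install.package-source.srcpath expects.
tar -czf packages-arm64.tar.gz -C package .
tar -tzf packages-arm64.tar.gz
```

In a real deployment, the placeholders are replaced with actual RPMs, image tarballs, and scripts, and the path of the resulting archive is set as srcpath arm64 (or amd64) in the cluster YAML.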
- -#### Deploying the Kubernetes Cluster - -Run the following command to deploy a Kubernetes cluster using the specified YAML configuration: - -**eggo deploy** [ **-d** ] **-f** *deploy.yaml* - -| Parameter| Mandatory (Yes/No)| Description | -| ------------- | -------- | --------------------------------- | -| --debug \| -d | No| Displays the debugging information.| -| --file \| -f | Yes| Specifies the path of the YAML file for the Kubernetes cluster deployment.| - -#### Adding a Single Node - -Run the following command to add a specified single node to the Kubernetes cluster: - -**eggo** **join** [ **-d** ] **--id** *k8s-cluster* [ **--type** *master,worker* ] **--arch** *arm64* **--port** *22* [ **--name** *master1*] *IP* - -| Parameter| Mandatory (Yes/No) | Description| -| ------------- | -------- | ------------------------------------------------------------ | -| --debug \| -d | No| Displays the debugging information.| -| --id | Yes| Specifies the name of the Kubernetes cluster where the node is to be added.| -| --type \| -t | No| Specifies the type of the node to be added. The value can be master or worker. Use commas (,) to separate multiple types. 
The default value is worker.| -| --arch \| -a | Yes| Specifies the CPU architecture of the node to be added.| -| --port \| -p | Yes| Specifies the port number for SSH login of the node to be added.| -| --name \| -n | No| Specifies the name of the node to be added.| -| *IP* | Yes| Actual IP address of the node to be added.| - -#### Adding Multiple Nodes - -Run the following command to add specified multiple nodes to the Kubernetes cluster: - -**eggo** **join** [ **-d** ] **--id** *k8s-cluster* **-f** *nodes.yaml* - -| Parameter| Mandatory (Yes/No) | Description | -| ------------- | -------- | -------------------------------- | -| --debug \| -d | No| Displays the debugging information.| -| --id | Yes| Specifies the name of the Kubernetes cluster where the nodes are to be added.| -| --file \| -f | Yes| Specifies the path of the YAML configuration file for adding the nodes.| - -#### Deleting Nodes - -Run the following command to delete one or more nodes from the Kubernetes cluster: - -**eggo delete** [ **-d** ] **--id** *k8s-cluster* *node* [*node...*] - -| Parameter| Mandatory (Yes/No) | Description | -| ------------- | -------- | -------------------------------------------- | -| --debug \| -d | No| Displays the debugging information.| -| --id | Yes| Specifies the name of the cluster where the one or more nodes to be deleted are located.| -| *node* | Yes| Specifies the IP addresses or names of the one or more nodes to be deleted.| - -#### Deleting the Cluster - -Run the following command to delete the entire Kubernetes cluster: - -**eggo cleanup** [ **-d** ] **--id** *k8s-cluster* [ **-f** *deploy.yaml* ] - -| Parameter| Mandatory (Yes/No) | Description| -| ------------- | -------- | ------------------------------------------------------------ | -| --debug \| -d | No| Displays the debugging information.| -| --id | Yes| Specifies the name of the Kubernetes cluster to be deleted.| -| --file \| -f | No| Specifies the path of the YAML file for the Kubernetes cluster 
deletion. If this parameter is not specified, the cluster configuration cached during cluster deployment is used by default. In normal cases, you are advised not to set this parameter. Set this parameter only when an exception occurs.| - -> ![](./public_sys-resources/icon-note.gif)**NOTE**: -> -> - The cluster configuration cached during cluster deployment is recommended when you delete the cluster. That is, you are advised not to set the --file | -f parameter in normal cases. Set this parameter only when the cache configuration is damaged or lost due to an exception. - - - -#### Querying the Cluster - -Run the following command to query all Kubernetes clusters deployed using eggo: - -**eggo list** [ **-d** ] - -| Parameter| Mandatory (Yes/No) | Description | -| ------------- | -------- | ------------ | -| --debug \| -d | No| Displays the debugging information.| - -#### Generating the Cluster Configuration File - -Run the following command to quickly generate the required YAML configuration file for the Kubernetes cluster deployment. 
- -**eggo template** **-d** **-f** *template.yaml* **-n** *k8s-cluster* **-u** *username* **-p** *password* **--etcd** [*192.168.0.1,192.168.0.2*] **--masters** [*192.168.0.1,192.168.0.2*] **--workers** *192.168.0.3* **--loadbalance** *192.168.0.4* - -| Parameter| Mandatory (Yes/No) | Description | -| ------------------- | -------- | ------------------------------- | -| --debug \| -d | No| Displays the debugging information.| -| --file \| -f | No| Specifies the path of the generated YAML file.| -| --name \| -n | No| Specifies the name of the Kubernetes cluster.| -| --username \| -u | No| Specifies the user name for SSH login of the configured node.| -| --password \| -p | No| Specifies the password for SSH login of the configured node.| -| --etcd | No| Specifies the IP address list of the etcd nodes.| -| --masters | No| Specifies the IP address list of the master nodes.| -| --workers | No| Specifies the IP address list of the worker nodes.| -| --loadbalance \| -l | No| Specifies the IP address of the loadbalance node.| - -#### Querying the Help Information - -Run the following command to query the help information of the eggo command: - - **eggo help** - -#### Querying the Help Information of Subcommands - -Run the following command to query the help information of the eggo subcommands: - -**eggo deploy | join | delete | cleanup | list | template -h** - -| Parameter| Mandatory (Yes/No) | Description | -| ----------- | -------- | ------------ | -| --help\| -h | Yes| Displays the help information.| diff --git a/docs/en/docs/Kubernetes/installing-etcd.md b/docs/en/docs/Kubernetes/installing-etcd.md deleted file mode 100644 index 9bd37d031107a5b3c3f880db5a90bf5783b2935a..0000000000000000000000000000000000000000 --- a/docs/en/docs/Kubernetes/installing-etcd.md +++ /dev/null @@ -1,89 +0,0 @@ -# Installing etcd - - -## Preparing the Environment - -Run the following command to enable the port used by etcd: -```bash -firewall-cmd --zone=public --add-port=2379/tcp 
-firewall-cmd --zone=public --add-port=2380/tcp -``` - -## Installing the etcd Binary Package - -Currently, the RPM package is used for installation. - -```bash -rpm -ivh etcd*.rpm -``` - -Prepare the directories. - -```bash -mkdir -p /etc/etcd /var/lib/etcd -cp ca.pem /etc/etcd/ -cp kubernetes-key.pem /etc/etcd/ -cp kubernetes.pem /etc/etcd/ -# Disable SELinux. -setenforce 0 -# Disable the default configuration in the /etc/etcd/etcd.conf file by -# commenting out the line, for example, ETCD_LISTEN_CLIENT_URLS="http://localhost:2379". -``` - -## Compiling the etcd.service File - -The following uses the `k8smaster0` machine as an example: - -```bash -$ cat /usr/lib/systemd/system/etcd.service -[Unit] -Description=Etcd Server -After=network.target -After=network-online.target -Wants=network-online.target - -[Service] -Type=notify -WorkingDirectory=/var/lib/etcd/ -EnvironmentFile=-/etc/etcd/etcd.conf -# set GOMAXPROCS to number of processors -ExecStart=/bin/bash -c "ETCD_UNSUPPORTED_ARCH=arm64 /usr/bin/etcd --name=k8smaster0 --cert-file=/etc/etcd/kubernetes.pem --key-file=/etc/etcd/kubernetes-key.pem --peer-cert-file=/etc/etcd/kubernetes.pem --peer-key-file=/etc/etcd/kubernetes-key.pem --trusted-ca-file=/etc/etcd/ca.pem --peer-trusted-ca-file=/etc/etcd/ca.pem --peer-client-cert-auth --client-cert-auth --initial-advertise-peer-urls https://192.168.122.154:2380 --listen-peer-urls https://192.168.122.154:2380 --listen-client-urls https://192.168.122.154:2379,https://127.0.0.1:2379 --advertise-client-urls https://192.168.122.154:2379 --initial-cluster-token etcd-cluster-0 --initial-cluster k8smaster0=https://192.168.122.154:2380,k8smaster1=https://192.168.122.155:2380,k8smaster2=https://192.168.122.156:2380 --initial-cluster-state new --data-dir /var/lib/etcd" - -Restart=always -RestartSec=10s -LimitNOFILE=65536 - -[Install] -WantedBy=multi-user.target -``` - -**Note:** - -- The boot setting `ETCD_UNSUPPORTED_ARCH=arm64` must be added on ARM64 hosts. -- In this document, etcd and
the Kubernetes control plane are deployed on the same machine. Therefore, the `kubernetes.pem` and `kubernetes-key.pem` certificates are used to start both etcd and the Kubernetes control plane. -- A single CA certificate is used throughout the deployment. etcd can generate its own CA certificate and use it to sign other certificates. However, the APIServer must use a client certificate signed by this CA when it accesses etcd. -- `initial-cluster` needs to be added to all configurations for deploying etcd. -- To improve the storage efficiency of etcd, you can set `data-dir` to a directory on an SSD. - -Start the etcd service. - -```bash -$ systemctl enable etcd -$ systemctl start etcd -``` - -Then, deploy the other hosts in sequence. - -## Verifying Basic Functions - -```bash -$ ETCDCTL_API=3 etcdctl -w table endpoint status --endpoints=https://192.168.122.155:2379,https://192.168.122.156:2379,https://192.168.122.154:2379 --cacert=/etc/etcd/ca.pem --cert=/etc/etcd/kubernetes.pem --key=/etc/etcd/kubernetes-key.pem -+------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+ -| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS | -+------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+ -| https://192.168.122.155:2379 | b50ec873e253ebaa | 3.4.14 | 262 kB | false | false | 819 | 21 | 21 | | -| https://192.168.122.156:2379 | e2b0d126774c6d02 | 3.4.14 | 262 kB | true | false | 819 | 21 | 21 | | -| https://192.168.122.154:2379 | f93b3808e944c379 | 3.4.14 | 328 kB | false | false | 819 | 21 | 21 | | -+------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+ -``` - diff --git
a/docs/en/docs/Kubernetes/installing-the-Kubernetes-software-package.md b/docs/en/docs/Kubernetes/installing-the-Kubernetes-software-package.md deleted file mode 100644 index e88f1adec2524cbf79e5556ce63bee85e5d1fa7f..0000000000000000000000000000000000000000 --- a/docs/en/docs/Kubernetes/installing-the-Kubernetes-software-package.md +++ /dev/null @@ -1,14 +0,0 @@ -# Installing the Kubernetes Software Package - - -```bash -$ dnf install -y docker conntrack-tools socat -``` - -After the EPOL source is configured, you can directly install the Kubernetes packages through DNF. - -```bash -$ rpm -ivh kubernetes*.rpm -``` - - diff --git a/docs/en/docs/Kubernetes/preparing-VMs.md b/docs/en/docs/Kubernetes/preparing-VMs.md deleted file mode 100644 index 52b21caf6eb76e8ee4219640855f4adcf538bd68..0000000000000000000000000000000000000000 --- a/docs/en/docs/Kubernetes/preparing-VMs.md +++ /dev/null @@ -1,150 +0,0 @@ -# Preparing VMs - -This document describes how to use virt-manager to install a VM. Skip this section if your VMs are already prepared. - -## Installing Dependency Tools - -VM installation depends on several tools. The following commands are an example of installing the dependencies and enabling the libvirtd service. (If a proxy is required, configure the proxy first.)
- -```bash -dnf install virt-install virt-manager libvirt-daemon-qemu edk2-aarch64.noarch virt-viewer -systemctl start libvirtd -systemctl enable libvirtd -``` - -## Preparing VM Disk Files - -```bash -dnf install -y qemu-img -virsh pool-define-as vmPool --type dir --target /mnt/vm/images/ -virsh pool-build vmPool -virsh pool-start vmPool -virsh pool-autostart vmPool -virsh vol-create-as --pool vmPool --name master0.img --capacity 200G --allocation 1G --format qcow2 -virsh vol-create-as --pool vmPool --name master1.img --capacity 200G --allocation 1G --format qcow2 -virsh vol-create-as --pool vmPool --name master2.img --capacity 200G --allocation 1G --format qcow2 -virsh vol-create-as --pool vmPool --name node1.img --capacity 300G --allocation 1G --format qcow2 -virsh vol-create-as --pool vmPool --name node2.img --capacity 300G --allocation 1G --format qcow2 -virsh vol-create-as --pool vmPool --name node3.img --capacity 300G --allocation 1G --format qcow2 -``` - -## Enabling Firewall Ports - -**Method 1** - -1. Query a port. - - ```shell - netstat -lntup | grep qemu-kvm - ``` - -2. Enable the VNC firewall port. For example, if the port number starts from 5900, run the following commands: - - ```shell - firewall-cmd --zone=public --add-port=5900/tcp - firewall-cmd --zone=public --add-port=5901/tcp - firewall-cmd --zone=public --add-port=5902/tcp - firewall-cmd --zone=public --add-port=5903/tcp - firewall-cmd --zone=public --add-port=5904/tcp - firewall-cmd --zone=public --add-port=5905/tcp - ``` - -**Method 2** - -Disable the firewall. - -```shell -systemctl stop firewalld -``` - -## Preparing the VM Configuration File - -A VM configuration file is required for creating a VM. 
For example, if the configuration file is master.xml and the host name of the VM is k8smaster0, the configuration is as follows: - -```bash - cat master.xml - - - k8smaster0 - 8 - 8 - - hvm - /usr/share/edk2/aarch64/QEMU_EFI-pflash.raw - /var/lib/libvirt/qemu/nvram/k8smaster0.fd - - - - - - - - - 1 - - destroy - restart - restart - - /usr/libexec/qemu-kvm - - - - - - - - - - - - - - - - - - - - - - - - - - - - -``` - -The VM configuration must be unique. Therefore, you need to modify the following to ensure that the VM is unique: - -- name: host name of the VM. You are advised to use lowercase letters. In this example, the value is `k8smaster0`. -- nvram: handle file path of the NVRAM, which must be globally unique. In this example, the value is `/var/lib/libvirt/qemu/nvram/k8smaster0.fd`. -- disk source file: VM disk file path. In this example, the value is `/mnt/vm/images/master0.img`. -- mac address of the interface: MAC address of the interface. In this example, the value is `52:54:00:00:00:80`. - -## Installing a VM - -1. Create and start a VM. - - ```shell - virsh define master.xml - virsh start k8smaster0 - ``` - -2. Obtain the VNC port number of the VM. - - ```shell - virsh vncdisplay k8smaster0 - ``` - -3. Use a VM connection tool, such as VNC Viewer, to remotely connect to the VM and perform configurations as prompted. - -4. Set the host name of the VM, for example, k8smaster0. 
- - ```shell - hostnamectl set-hostname k8smaster0 - ``` diff --git a/docs/en/docs/Kubernetes/preparing-certificates.md b/docs/en/docs/Kubernetes/preparing-certificates.md deleted file mode 100644 index 4981a1076e76d65dacc04b54a04b97103b3d3101..0000000000000000000000000000000000000000 --- a/docs/en/docs/Kubernetes/preparing-certificates.md +++ /dev/null @@ -1,413 +0,0 @@ - -# Preparing Certificates - -**Statement: The certificate used in this document is self-signed and cannot be used in a commercial environment.** - -Before deploying a cluster, you need to generate certificates required for communication between components in the cluster. This document uses the open-source CFSSL as the verification and deployment tool to help users understand the certificate configuration and the association between certificates of cluster components. You can select a tool based on the site requirements, for example, OpenSSL. - -## Building and Installing CFSSL - -The following commands for building and installing CFSSL are for your reference (the CFSSL website access permission is required, and the proxy must be configured first): - -```bash -$ wget --no-check-certificate https://github.com/cloudflare/cfssl/archive/v1.5.0.tar.gz -$ tar -zxf v1.5.0.tar.gz -$ cd cfssl-1.5.0/ -$ make -j6 -# cp bin/* /usr/local/bin/ -``` - -## Generating a Root Certificate - -Compile the CA configuration file, for example, ca-config.json: - -```bash -$ cat ca-config.json | jq -{ - "signing": { - "default": { - "expiry": "8760h" - }, - "profiles": { - "kubernetes": { - "usages": [ - "signing", - "key encipherment", - "server auth", - "client auth" - ], - "expiry": "8760h" - } - } - } -} -``` - -Compile a CA CSR file, for example, ca-csr.json: - -```bash -$ cat ca-csr.json | jq -{ - "CN": "Kubernetes", - "key": { - "algo": "rsa", - "size": 2048 - }, - "names": [ - { - "C": "CN", - "L": "HangZhou", - "O": "openEuler", - "OU": "WWW", - "ST": "BinJiang" - } - ] -} -``` - -Generate the CA certificate and 
key: - -```bash -$ cfssl gencert -initca ca-csr.json | cfssljson -bare ca -``` - -The following certificates are obtained: - -```bash -ca.csr ca-key.pem ca.pem -``` - -## Generating the admin Account Certificate - -admin is an account used by K8S for system management. Compile the CSR configuration of the admin account, for example, admin-csr.json: - -```bash -cat admin-csr.json | jq -{ - "CN": "admin", - "key": { - "algo": "rsa", - "size": 2048 - }, - "names": [ - { - "C": "CN", - "L": "HangZhou", - "O": "system:masters", - "OU": "Containerum", - "ST": "BinJiang" - } - ] -} -``` - -Generate a certificate: - -```bash -$ cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes admin-csr.json | cfssljson -bare admin -``` - -The result is as follows: - -```bash -admin.csr admin-key.pem admin.pem -``` - -## Generating a service-account Certificate - -Compile the CSR configuration file of the service-account account, for example, service-account-csr.json: - -```bash -cat service-account-csr.json | jq -{ - "CN": "service-accounts", - "key": { - "algo": "rsa", - "size": 2048 - }, - "names": [ - { - "C": "CN", - "L": "HangZhou", - "O": "Kubernetes", - "OU": "openEuler k8s install", - "ST": "BinJiang" - } - ] -} -``` - -Generate a certificate: - -```bash -$ cfssl gencert -ca=../ca/ca.pem -ca-key=../ca/ca-key.pem -config=../ca/ca-config.json -profile=kubernetes service-account-csr.json | cfssljson -bare service-account -``` - -The result is as follows: - -```bash -service-account.csr service-account-key.pem service-account.pem -``` - -## Generating the kube-controller-manager Certificate - -Compile the CSR configuration of kube-controller-manager: - -```bash -{ - "CN": "system:kube-controller-manager", - "key": { - "algo": "rsa", - "size": 2048 - }, - "names": [ - { - "C": "CN", - "L": "HangZhou", - "O": "system:kube-controller-manager", - "OU": "openEuler k8s kcm", - "ST": "BinJiang" - } - ] -} -``` - -Generate a certificate: - -```bash -$ 
cfssl gencert -ca=../ca/ca.pem -ca-key=../ca/ca-key.pem -config=../ca/ca-config.json -profile=kubernetes kube-controller-manager-csr.json | cfssljson -bare kube-controller-manager -``` - -The result is as follows: - -```bash -kube-controller-manager.csr kube-controller-manager-key.pem kube-controller-manager.pem -``` - -## Generating the kube-proxy Certificate - -Compile the CSR configuration of kube-proxy: - -```bash -{ - "CN": "system:kube-proxy", - "key": { - "algo": "rsa", - "size": 2048 - }, - "names": [ - { - "C": "CN", - "L": "HangZhou", - "O": "system:node-proxier", - "OU": "openEuler k8s kube proxy", - "ST": "BinJiang" - } - ] -} -``` - -Generate a certificate: - -```bash -$ cfssl gencert -ca=../ca/ca.pem -ca-key=../ca/ca-key.pem -config=../ca/ca-config.json -profile=kubernetes kube-proxy-csr.json | cfssljson -bare kube-proxy -``` - -The result is as follows: - -```bash -kube-proxy.csr kube-proxy-key.pem kube-proxy.pem -``` - -## Generating the kube-scheduler Certificate - -Compile the CSR configuration of kube-scheduler: - -```bash -{ - "CN": "system:kube-scheduler", - "key": { - "algo": "rsa", - "size": 2048 - }, - "names": [ - { - "C": "CN", - "L": "HangZhou", - "O": "system:kube-scheduler", - "OU": "openEuler k8s kube scheduler", - "ST": "BinJiang" - } - ] -} -``` - -Generate a certificate: - -```bash -$ cfssl gencert -ca=../ca/ca.pem -ca-key=../ca/ca-key.pem -config=../ca/ca-config.json -profile=kubernetes kube-scheduler-csr.json | cfssljson -bare kube-scheduler -``` - -The result is as follows: - -```bash -kube-scheduler.csr kube-scheduler-key.pem kube-scheduler.pem -``` - -## Generating the kubelet Certificate - -The certificate involves the host name and IP address of the server where kubelet is located. Therefore, the configuration of each node is different. 
The script is compiled as follows: - -```bash -$ cat node_csr_gen.bash - -#!/bin/bash - -nodes=(k8snode1 k8snode2 k8snode3) -IPs=("192.168.122.157" "192.168.122.158" "192.168.122.159") - -for i in "${!nodes[@]}"; do - -cat > "${nodes[$i]}-csr.json" < + +Install Homebrew based on the network status. + +Run the following command to install Homebrew: + +``` Shell +/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)" +``` + +Alternatively, run the following command to install Homebrew: + +``` Shell +/bin/zsh -c "$(curl -fsSL https://gitee.com/cunkai/HomebrewCN/raw/master/Homebrew.sh)" +``` + +### Installing QEMU and Wget + +OmniVirt depends on QEMU to run on macOS, and image download depends on Wget. Homebrew can be used to easily download and manage such software. Run the following command to install QEMU and Wget: + +``` Shell +brew install qemu +brew install wget +``` + +### Configuring the sudo Password-Free Permission + +OmniVirt depends on QEMU to run on macOS. To improve user network experience, [vmnet framework][1] of macOS is used to provide VM network capabilities. Currently, administrator permissions are required for using vmnet. When using the QEMU backend to create VMs with vmnet network devices, you need to enable the administrator permission. OmniVirt automatically uses the `sudo` command to implement this process during startup. Therefore, you need to configure the `sudo` password-free permission for the current user. If you do not want to perform this configuration, please stop using OmniVirt. + +1. On the macOS desktop, press **Shift**+**Command**+**U** to open the **Utilities** folder in **Go** and find **Terminal.app**. + + +2. Enter `sudo visudo` in the terminal to modify the sudo configuration file. Note that you may be required to enter the password in this step. Enter the password as prompted. + +3. Find and replace `%admin ALL=(ALL) ALL` with `%admin ALL=(ALL) NOPASSWD: ALL`. + + +4. 
Press **ESC** and enter **:wq** to save the settings. + +## Installing OmniVirt + +OmniVirt supports macOS Ventura for Apple Silicon and x86 architectures. [Download the latest version of OmniVirt][1] for macOS and decompress it to the desired location. + +The directory generated after the decompression contains the following files: + + + +**install.exe** is the installation file, which is used to install the support files required by OmniVirt to the specified location. **OmniVirt.dmg** is the disk image of the main program. + +1. Install the support files. (This operation requires the `sudo` permission. You need to complete the preceding steps first.) Double-click **install.exe** and wait until the program execution is completed. + +2. Configure OmniVirt. + + - Check the locations of QEMU and Wget. The name of the QEMU binary file varies according to the architecture. Select the correct name (Apple Silicon: **qemu-system-aarch64**; Intel: **qemu-system-x86_64**) as required. + + ``` Shell + which wget + which qemu-system-{host_arch} + ``` + + Reference output: + + ```shell + /opt/homebrew/bin/wget + /opt/homebrew/bin/qemu-system-aarch64 + ``` + + Record the paths, which will be used in the following steps. + + - Open the **omnivirt.conf** file and configure it. + + ``` Shell + sudo vi /Library/Application\ Support/org.openeuler.omnivirt/omnivirt.conf + ``` + + OmniVirt configurations are as follows: + + ```shell + [default] + log_dir = # Log file location (xxx.log) + work_dir = # OmniVirt working directory, which is used to store VM images and VM files. + wget_dir = # Path of the Wget executable file. Set this parameter based on the previous step. + qemu_dir = # Path of the QEMU executable file. Set this parameter based on the previous step. + debug = True + + [vm] + cpu_num = 1 # Number of CPUs of the VM. + memory = 1024 #Memory size of the VM, in MB. Do not set this parameter to a value greater than 2048 for M1 users. + ``` + + Save the modifications and exit. 
+ +3. Install **OmniVirt.app**. + + - Double-click **OmniVirt.dmg**. In the displayed window, drag **OmniVirt.app** to **Applications** to complete the installation. You can then find **OmniVirt.app** in Applications. + + + +## Using OmniVirt + +1. Find **OmniVirt.app** in Applications and click it to start the program. + +2. OmniVirt needs to access the network. When the following dialog box is displayed, click **Allow**. + + + +3. Currently, OmniVirt can be accessed only in CLI mode. Open **Terminal.app** and use the CLI to perform operations. + +### Operations on Images + +1. List available images. + +```Shell +omnivirt images + ++-----------+----------+--------------+ +| Images | Location | Status | ++-----------+----------+--------------+ +| 22.03-LTS | Remote | Downloadable | +| 21.09 | Remote | Downloadable | +| 2203-load | Local | Ready | ++-----------+----------+--------------+ +``` + +There are two types of OmniVirt images: remote images and local images. Only local images in the **Ready** state can be used to create VMs. Remote images can be used only after being downloaded. You can also load a downloaded local image to OmniVirt. For details, see the following sections. + +2. Download a remote image. + +```Shell +omnivirt download-image 22.03-LTS + +Downloading: 22.03-LTS, this might take a while, please check image status with "images" command. +``` + +The image download request is asynchronous; the download completes in the background, and the time required depends on your network. The overall image download process includes download, decompression, and format conversion. During the download, you can run the `images` command at any time to view the download progress and image status.
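That status polling can be scripted. The sketch below loops until a named image reports **Ready**; the `omnivirt` shell function at the top is a stand-in stub (echoing a ready row) so the sketch runs standalone. Remove it to poll the real CLI:

```shell
#!/bin/sh
# Stand-in for the real CLI so this sketch is self-contained; delete this
# function to run the loop against the actual omnivirt command.
omnivirt() { echo "| 22.03-LTS | Local | Ready |"; }

# Poll the image list until the named image reports Ready.
image="22.03-LTS"
until omnivirt images | grep -q "| ${image} *| Local *| Ready"; do
  sleep 5
done
echo "${image} is ready"
```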
+ +```Shell +omnivirt images + ++-----------+----------+--------------+ +| Images | Location | Status | ++-----------+----------+--------------+ +| 22.03-LTS | Remote | Downloadable | +| 21.09 | Remote | Downloadable | +| 22.03-LTS | Local | Downloading | ++-----------+----------+--------------+ +``` + +When the image status changes to **Ready**, the image is downloaded successfully. The image in the **Ready** state can be used to create VMs. + +```Shell +omnivirt images + ++-----------+----------+--------------+ +| Images | Location | Status | ++-----------+----------+--------------+ +| 22.03-LTS | Remote | Downloadable | +| 21.09 | Remote | Downloadable | +| 22.03-LTS | Local | Ready | ++-----------+----------+--------------+ +``` + +3. Load a local image. + +Load a custom image or an image downloaded to the local host to OmniVirt to create a custom VM. + +```Shell +omnivirt load-image --path {image_file_path} IMAGE_NAME +``` + +The supported image formats are *xxx***.qcow2.xz** and *xxx***.qcow2**. + +Example: + +```Shell +omnivirt load-image --path /opt/openEuler-22.03-LTS-x86_64.qcow2.xz 2203-load + +Loading: 2203-load, this might take a while, please check image status with "images" command. +``` + +Load the **openEuler-22.03-LTS-x86_64.qcow2.xz** file in the **/opt** directory to the OmniVirt system and name it **2203-load**. Similar to the download command, the load command is also an asynchronous command. You need to run the image list command to query the image status until the image status is **Ready**. Compared with directly downloading an image, loading an image is much faster. 
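Because only the two formats above are accepted, it can help to validate a file name before passing it to `load-image`. A minimal sketch:

```shell
#!/bin/sh
# Succeed only for the image formats OmniVirt accepts (.qcow2 and .qcow2.xz).
is_supported_image() {
  case "$1" in
    *.qcow2|*.qcow2.xz) return 0 ;;
    *) return 1 ;;
  esac
}

is_supported_image "openEuler-22.03-LTS-x86_64.qcow2.xz" && echo "supported"
```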
+ +```Shell +omnivirt images + ++-----------+----------+--------------+ +| Images | Location | Status | ++-----------+----------+--------------+ +| 22.03-LTS | Remote | Downloadable | +| 21.09 | Remote | Downloadable | +| 2203-load | Local | Loading | ++-----------+----------+--------------+ + +omnivirt images + ++-----------+----------+--------------+ +| Images | Location | Status | ++-----------+----------+--------------+ +| 22.03-LTS | Remote | Downloadable | +| 21.09 | Remote | Downloadable | +| 2203-load | Local | Ready | ++-----------+----------+--------------+ +``` + +4. Delete an image. + +Run the following command to delete an image from the OmniVirt system: + +```Shell +omnivirt delete-image 2203-load + +Image: 2203-load has been successfully deleted. +``` + +### Operations on VMs + +1. List VMs. + +```shell +omnivirt list + ++----------+-----------+---------+---------------+ +| Name | Image | State | IP | ++----------+-----------+---------+---------------+ +| test1 | 2203-load | Running | 172.22.57.220 | ++----------+-----------+---------+---------------+ +| test2 | 2203-load | Running | N/A | ++----------+-----------+---------+---------------+ +``` + +If the VM IP address is **N/A** and the VM status is **Running**, the VM is newly created and the network configuration is not complete. Configuring the network takes several seconds. You can obtain the VM information again later. + +2. Log in to a VM. + +If an IP address has been assigned to a VM, you can run the `ssh` command to log in to the VM. + +```Shell +ssh root@{instance_ip} +``` + +If the official image provided by the openEuler community is used, the default username is **root** and the default password is **openEuler12#$**. + +3. Create a VM. + +```Shell +omnivirt launch --image {image_name} {instance_name} +``` + +Use `--image` to specify an image and a VM name. + +4. Delete a VM. + +```Shell +omnivirt delete-instance {instance_name} +``` + +Delete a specified VM based on the VM name. 
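Output like the `list` table above is regular enough to script against. The sketch below pulls the IP column for a named VM; the `omnivirt` function is a stub that reproduces the documented output so the parsing runs standalone (replace it with the real CLI in practice):

```shell
#!/bin/sh
# Stub reproducing the documented "omnivirt list" output; replace with the real CLI.
omnivirt() {
cat <<'EOF'
+----------+-----------+---------+---------------+
| Name     | Image     | State   | IP            |
+----------+-----------+---------+---------------+
| test1    | 2203-load | Running | 172.22.57.220 |
+----------+-----------+---------+---------------+
| test2    | 2203-load | Running | N/A           |
+----------+-----------+---------+---------------+
EOF
}

# Print the IP column for the named VM; N/A means the network is still configuring.
vm_ip() { omnivirt list | awk -F'|' -v vm=" $1 " 'index($2, vm) { gsub(/ /, "", $5); print $5 }'; }

vm_ip test1   # prints 172.22.57.220
vm_ip test2   # prints N/A
```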
+ +[1]: https://developer.apple.com/documentation/vmnet diff --git a/docs/en/docs/OmniVirt/overall.md b/docs/en/docs/OmniVirt/overall.md new file mode 100644 index 0000000000000000000000000000000000000000..6410bd29bb81653ed54aaccd1856c41c218930f9 --- /dev/null +++ b/docs/en/docs/OmniVirt/overall.md @@ -0,0 +1,18 @@ +# OmniVirt + +OmniVirt is a developer tool set incubated by the technical operation team and infrastructure team of the openEuler community. It integrates the virtualization technologies (such as LXD, Hyper-V, and Virtualization Framework) of mainstream desktop operating systems (OSs). It uses VMs and container images officially released by the openEuler community to give developers a unified experience of provisioning and managing development resources (such as VMs and containers) on Windows, macOS, and Linux, improving the convenience and stability of the openEuler development environment on mainstream desktop OSs, as well as the developer experience. + +## Background + +The convenience and stability of development resources (such as VMs and containers) provided by mainstream desktop OSs are important factors that affect the experience of openEuler developers, especially individual and university developers with limited development resources. Common VM management platforms have many limitations. For example, VirtualBox requires developers to download a large ISO image and install the OS, WSL cannot provide a real openEuler kernel, most VM management software does not fully support Apple Silicon chips, and much of the software requires payment, greatly reducing developers' work efficiency. + +OmniVirt provides a convenient, easy-to-use, and unified developer tool set on mainstream desktop OSs such as Windows, macOS, and Linux (planned). It supports the x86_64 and AArch64 hardware architectures, including Apple Silicon chips.
It also supports the virtual hardware acceleration capabilities of different platforms, providing developers with high-performance development resources. OmniVirt allows users to use VMs and container images (planned) released by the openEuler community, Daily Build images provided by the openEuler community, and other custom images that meet the requirements, providing developers with multiple choices. + +## Quick Start + +For macOS users, see [Installing and Running OmniVirt on macOS][1]. + +For Windows users, see [Installing and Running OmniVirt in Windows][2]. + +[1]: ./mac-user-manual.md +[2]: ./win-user-manual.md diff --git a/docs/en/docs/OmniVirt/win-user-manual.md b/docs/en/docs/OmniVirt/win-user-manual.md new file mode 100644 index 0000000000000000000000000000000000000000..c69fc8cdcc7610cc4fb7544e346fa17923396d28 --- /dev/null +++ b/docs/en/docs/OmniVirt/win-user-manual.md @@ -0,0 +1,190 @@ +# Installing and Running OmniVirt in Windows + +OmniVirt currently supports Windows 10 and 11. [Download the latest version of OmniVirt][1] for Windows and decompress it to the desired location. +Right-click **config-env.bat** and choose **Run as administrator** from the shortcut menu. This script adds the current directory to the system environment variable `path`. If you prefer to configure environment variables yourself, or if the script fails, manually add the directory containing the script and its **qemu-img** subdirectory to the system environment variable `path`. + +To run OmniVirt on Windows, you need to connect OmniVirt to the Hyper-V virtualization backend. Hyper-V is a Microsoft hardware virtualization product that provides better performance for VMs on Windows. Before running OmniVirt, check whether Hyper-V is enabled in your system. For details about how to check and enable Hyper-V, see [Install Hyper-V][2] or other online resources.
+ +The directory generated after the decompression contains the following files: + +- **omnivirtd.exe**: main process of OmniVirt. It is a daemon process running in the background. It interacts with various virtualization backends and manages the life cycles of VMs, containers, and images. +- **omnivirt.exe**: OmniVirt CLI client. You can use this client to interact with the omnivirtd daemon process and perform operations on VMs and images. +- **omnivirt-win.conf**: OmniVirt configuration file, which must be stored in the same directory as **omnivirtd.exe**. Configure the file as follows: + +```Conf +[default] +# Configure the directory for storing log files. +log_dir = D:\omnivirt-workdir\logs +# Whether to enable the debug log level. +debug = True +# Configure the OmniVirt working directory. +work_dir = D:\omnivirt-workdir +# Configure the OmniVirt image directory. The image directory is relative to the working directory. +image_dir = images +# Configure the VM file directory of OmniVirt. The VM file directory is relative to the working directory. +instance_dir = instances +``` + +After the configuration is complete, right-click **omnivirtd.exe** and choose **Run as administrator** from the shortcut menu. **omnivirtd.exe** runs as a daemon process in the background. + +Start PowerShell or Terminal to prepare for the corresponding operation. + +## Exiting the omnivirtd Background Process in Windows + +After **omnivirtd.exe** is executed, the omnivirtd icon is displayed in the system tray in the lower right corner of the OS. + + +Right-click the icon in the system tray and choose **Exit OmniVirt** from the shortcut menu to exit the omnivirtd background process. + +### Operations on Images + +1. List available images. 
+ +```PowerShell +omnivirt.exe images + ++-----------+----------+--------------+ +| Images | Location | Status | ++-----------+----------+--------------+ +| 22.03-LTS | Remote | Downloadable | +| 21.09 | Remote | Downloadable | +| 2203-load | Local | Ready | ++-----------+----------+--------------+ +``` + +There are two types of OmniVirt images: remote images and local images. Only local images in the **Ready** state can be used to create VMs. Remote images can be used only after being downloaded. You can also load a downloaded local image to OmniVirt. For details, see the following sections. + +2. Download a remote image. + +```PowerShell +omnivirt.exe download-image 22.03-LTS + +Downloading: 22.03-LTS, this might take a while, please check image status with "images" command. +``` + +The image download request is asynchronous; the download completes in the background, and the time required depends on your network. The overall image download process includes download, decompression, and format conversion. During the download, you can run the `images` command at any time to view the download progress and image status. + +```PowerShell +omnivirt.exe images + ++-----------+----------+--------------+ +| Images | Location | Status | ++-----------+----------+--------------+ +| 22.03-LTS | Remote | Downloadable | +| 21.09 | Remote | Downloadable | +| 22.03-LTS | Local | Downloading | ++-----------+----------+--------------+ +``` + +When the image status changes to **Ready**, the image is downloaded successfully. An image in the **Ready** state can be used to create VMs. + +```PowerShell +omnivirt.exe images + ++-----------+----------+--------------+ +| Images | Location | Status | ++-----------+----------+--------------+ +| 22.03-LTS | Remote | Downloadable | +| 21.09 | Remote | Downloadable | +| 22.03-LTS | Local | Ready | ++-----------+----------+--------------+ +``` + +3. Load a local image.
+ +Load a custom image or an image downloaded to the local host to OmniVirt to create a custom VM. + +```PowerShell +omnivirt.exe load-image --path {image_file_path} IMAGE_NAME +``` + +The supported image formats are *xxx***.qcow2.xz** and *xxx***.qcow2**. + +Example: + +```PowerShell +omnivirt.exe load-image --path D:\openEuler-22.03-LTS-x86_64.qcow2.xz 2203-load + +Loading: 2203-load, this might take a while, please check image status with "images" command. +``` + +This loads the **openEuler-22.03-LTS-x86_64.qcow2.xz** file in the **D:\** directory into the OmniVirt system and names it **2203-load**. Similar to the download command, the load command is also asynchronous. You need to run the image list command to query the image status until the image status is **Ready**. Compared with directly downloading an image, loading an image is much faster. + +```PowerShell +omnivirt.exe images + ++-----------+----------+--------------+ +| Images | Location | Status | ++-----------+----------+--------------+ +| 22.03-LTS | Remote | Downloadable | +| 21.09 | Remote | Downloadable | +| 2203-load | Local | Loading | ++-----------+----------+--------------+ + +omnivirt.exe images + ++-----------+----------+--------------+ +| Images | Location | Status | ++-----------+----------+--------------+ +| 22.03-LTS | Remote | Downloadable | +| 21.09 | Remote | Downloadable | +| 2203-load | Local | Ready | ++-----------+----------+--------------+ +``` + +4. Delete an image. + +Run the following command to delete an image from the OmniVirt system: + +```PowerShell +omnivirt.exe delete-image 2203-load + +Image: 2203-load has been successfully deleted. +``` + +### Operations on VMs + +1. List VMs.
+ +```PowerShell +omnivirt.exe list + ++----------+-----------+---------+---------------+ +| Name | Image | State | IP | ++----------+-----------+---------+---------------+ +| test1 | 2203-load | Running | 172.22.57.220 | ++----------+-----------+---------+---------------+ +| test2 | 2203-load | Running | N/A | ++----------+-----------+---------+---------------+ +``` + +If the VM IP address is **N/A** and the VM status is **Running**, the VM is newly created and the network configuration is not complete. Configuring the network takes several seconds. You can obtain the VM information again later. + +2. Log in to a VM. + +If an IP address has been assigned to a VM, you can run the `ssh` command to log in to the VM. + +```PowerShell +ssh root@{instance_ip} +``` + +If the official image provided by the openEuler community is used, the default username is **root** and the default password is **openEuler12#$**. + +3. Create a VM. + +```PowerShell +omnivirt.exe launch --image {image_name} {instance_name} +``` + +Use `--image` to specify an image and provide a VM name. + +4. Delete a VM. + +```PowerShell +omnivirt.exe delete-instance {instance_name} +``` + +This deletes the specified VM by name.
+ +[1]: https://gitee.com/openeuler/omnivirt/releases +[2]: https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/quick-start/enable-hyper-v diff --git a/docs/en/docs/Open-Source-Software-Notice/openEuler-Open-Source-Software-Notice.zip b/docs/en/docs/Open-Source-Software-Notice/openEuler-Open-Source-Software-Notice.zip new file mode 100644 index 0000000000000000000000000000000000000000..ab74b7f0aa515f45d656bca9e018d47f7159c3d2 Binary files /dev/null and b/docs/en/docs/Open-Source-Software-Notice/openEuler-Open-Source-Software-Notice.zip differ diff --git a/docs/en/docs/Pin/pin-user-guide.md b/docs/en/docs/Pin/pin-user-guide.md new file mode 100644 index 0000000000000000000000000000000000000000..7d010d6ace7d52cb246c651fbc6755591fadb82d --- /dev/null +++ b/docs/en/docs/Pin/pin-user-guide.md @@ -0,0 +1,77 @@ +# Installation and Deployment + +## Software + +* Operating system (OS): openEuler 23.03 + +## Hardware + +* x86_64 +* Arm + +## Preparing the Environment + +* Install the openEuler OS. For details, see the *openEuler 23.03 Installation Guide*. 
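A quick way to confirm the host matches the OS requirement above is to read `/etc/os-release` (a sketch; on any other distribution it simply reports what it found):

```shell
#!/bin/sh
# Report whether the host is the openEuler release this guide assumes.
os_release_check() {
  if [ -r /etc/os-release ]; then
    . /etc/os-release
  fi
  if [ "${NAME:-}" = "openEuler" ] && [ "${VERSION_ID:-}" = "23.03" ]; then
    echo "environment OK"
  else
    echo "expected openEuler 23.03, found: ${NAME:-unknown} ${VERSION_ID:-}"
  fi
}
os_release_check
```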
+ +## Installing PIN + +### rpmbuild + +#### Building a GCC Client + +```shell +rpmbuild -ba pin-gcc-client-0.4.1 +``` + +#### Building a PIN Server + +```shell +rpmbuild -ba pin-server-0.4.0 +``` + +### Build + +#### Building a GCC Client + +```shell +git clone https://gitee.com/openeuler/pin-gcc-client.git +cd pin-gcc-client +mkdir build +cd build +cmake ../ -DCMAKE_INSTALL_PREFIX=${INSTALL_PATH} -DCMAKE_INSTALL_LIBDIR=${INSTALL_LIB} -DCMAKE_SKIP_RPATH=ON -DMLIR_DIR=${MLIR_PATH} -DLLVM_DIR=${LLVM_PATH} +make && make install +``` + +#### Building a PIN Server + +```shell +git clone https://gitee.com/openeuler/pin-server.git +cd pin-server +mkdir build +cd build +cmake ../ -DCMAKE_INSTALL_PREFIX=${INSTALL_PATH} -DCMAKE_INSTALL_LIBDIR=${INSTALL_LIB} -DCMAKE_SKIP_RPATH=ON -DMLIR_DIR=${MLIR_PATH} -DLLVM_DIR=${LLVM_PATH} +make && make install +``` + +# Usage + +You can use `-fplugin` and `-fplugin-arg-libpin_xxx` to enable the Plug-IN (PIN) tool. +Command: + +```shell +$(TARGET): $(OBJS) + $(CXX) -fplugin=${CLIENT_PATH}/build/libpin_gcc_client.so \ + -fplugin-arg-libpin_gcc_client-server_path="${SERVER_PATH}/build/pin_server" \ + -fplugin-arg-libpin_gcc_client-log_level="1" \ + -fplugin-arg-libpin_gcc_client-arg1="xxx" +``` + +Compile options: + +`-fplugin`: path of the .so file of the PIN client. + +`-fplugin-arg-libpin_gcc_client-server_path`: path of the executable program of the PIN server. + +`-fplugin-arg-libpin_gcc_client-log_level`: default log level. The value ranges from `0` to `3`. The default value is `1`. + +`-fplugin-arg-libpin_gcc_client-argN`: other parameters that can be specified as required. The value of `N` must be a positive integer. 
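Putting the options together, a complete minimal Makefile might look like the following. `CLIENT_PATH` and `SERVER_PATH` are placeholders for your own pin-gcc-client and pin-server build trees, and **main.cpp** is a hypothetical source file; none of these names come from the Pin documentation itself.

```makefile
CXX := g++
# Placeholders: point these at your pin-gcc-client and pin-server checkouts.
CLIENT_PATH ?= /path/to/pin-gcc-client
SERVER_PATH ?= /path/to/pin-server

PIN_FLAGS := -fplugin=$(CLIENT_PATH)/build/libpin_gcc_client.so \
             -fplugin-arg-libpin_gcc_client-server_path=$(SERVER_PATH)/build/pin_server \
             -fplugin-arg-libpin_gcc_client-log_level=1

main: main.cpp
	$(CXX) $(PIN_FLAGS) -o $@ $<
```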
diff --git a/docs/en/docs/Quickstart/quick-start.md b/docs/en/docs/Quickstart/quick-start.md index a18ba86d199fc9d4a34b3e3e40c1ebfcc9a9b466..608c7408564d25d98aebad9ed43d86d1a8a0ae8a 100644 --- a/docs/en/docs/Quickstart/quick-start.md +++ b/docs/en/docs/Quickstart/quick-start.md @@ -1,6 +1,6 @@ # Quick Start -This document uses openEuler 21.09 installed on the TaiShan 200 server as an example to describe how to quickly install and use openEuler OS. For details about the installation requirements and methods, see the [Installation Guide](./../Installation/Installation.html). +This document uses openEuler 23.03 installed on the TaiShan 200 server as an example to describe how to quickly install and use openEuler OS. For details about the installation requirements and methods, see the [Installation Guide](./../Installation/Installation.html). - [Quick Start](#quick-start) - [Making Preparations](#making-preparations) @@ -95,7 +95,6 @@ This document uses openEuler 21.09 installed on the TaiShan 200 server as an exa - ## Obtaining the Installation Source Perform the following operations to obtain the openEuler release package: @@ -106,74 +105,74 @@ Perform the following operations to obtain the openEuler release package: 3. Click the link provided after **Download ISO**. The download list is displayed. -4. Click **openEuler-21.09**. The openEuler 21.09 version download list is displayed. +4. Click **openEuler-23.03**. The openEuler 23.03 version download list is displayed. 5. Click **ISO**. The ISO download list is displayed. - + - **aarch64**: ISO image file of the AArch64 architecture - **x86\_64**: ISO image file of the x86\_64 architecture - **source**: ISO image file of the openEuler source code 6. Select the openEuler release package and verification file to be downloaded based on the architecture of the environment to be installed. - + - If the AArch64 architecture is used: - + 1. Click **aarch64**. - 2. 
Click **openEuler-21.09-aarch64-dvd.iso** to download the openEuler release package to the local host. - 3. Click **openEuler-21.09-aarch64-dvd.iso.sha256sum** to download the openEuler verification file to the local host. - + 2. Click **openEuler-23.03-aarch64-dvd.iso** to download the openEuler release package to the local host. + 3. Click **openEuler-23.03-aarch64-dvd.iso.sha256sum** to download the openEuler verification file to the local host. + - If the x86\_64 architecture is used: - + 1. Click **x86\_64**. - 2. Click **openEuler-21.09-x86\_64-dvd.iso** to download the openEuler release package to the local host. - 3. Click **openEuler-21.09-x86\_64-dvd.iso.sha256sum** to download the openEuler verification file to the local host. + 2. Click **openEuler-23.03-x86\_64-dvd.iso** to download the openEuler release package to the local host. + 3. Click **openEuler-23.03-x86\_64-dvd.iso.sha256sum** to download the openEuler verification file to the local host. ## Checking the Release Package Integrity To prevent incomplete download of the software package due to network or storage device problems during the transmission, you can perform the following steps to check the integrity of the obtained openEuler software package: 1. Obtain the verification value in the verification file. Run the following command: - - ``` - $cat openEuler-21.09-aarch64-dvd.iso.sha256sum + + ```shell + $cat openEuler-23.03-aarch64-dvd.iso.sha256sum ``` 2. Calculate the SHA256 verification value of the file. Run the following command: - - ``` - $sha256sum openEuler-21.09-aarch64-dvd.iso + + ```shell + $sha256sum openEuler-23.03-aarch64-dvd.iso ``` - + After the command is run, the verification value is displayed. 3. Check whether the values calculated in step 1 and step 2 are the same. - + If the verification values are the same, the .iso file is not damaged. If they are not the same, the file is damaged and you need to obtain the file again. ## Starting Installation 1. 
Log in to the iBMC WebUI. - + For details, see [TaiShan 200 Server User Guide (Model 2280)](https://support.huawei.com/enterprise/en/doc/EDOC1100093459). 2. Choose **Configuration** from the main menu, and select **Boot Device** from the navigation tree. The **Boot Device** page is displayed. - + Set **Effective** and **Boot Medium** to **One-time** and **DVD-ROM**, respectively, and click **Save**, as shown in [Figure 1](#fig1011938131018). - + **Figure 1** Setting the boot device ![](./figures/setting-the-boot-device.png "setting-the-boot-device") 3. Choose **Remote Console** from the main menu. The **Remote Console** page is displayed. - + Select an integrated remote console as required to access the remote virtual console, for example, **Java Integrated Remote Console (Shared)**. 4. On the toolbar, click the icon shown in the following figure. - + **Figure 2** Drive icon ![](./figures/drive-icon.png "drive-icon") - + An image dialog box is displayed, as shown in the following figure. - + **Figure 3** Image dialog box ![](./figures/image-dialog-box.png "image-dialog-box") @@ -182,69 +181,69 @@ To prevent incomplete download of the software package due to network or storage 6. Select the image file and click **Open**. In the image dialog box, click **Connect**. If **Connect** changes to **Disconnect**, the virtual CD/DVD-ROM drive is connected to the server. 7. On the toolbar, click the restart icon shown in the following figure to restart the device. - + **Figure 4** Restart icon ![](./figures/restart-icon.png "restart-icon") 8. A boot menu is displayed after the system restarts, as shown in [Figure 5](#fig1648754873314). - + > ![](./public_sys-resources/icon-note.gif) **NOTE:** - > - > - If you do not perform any operations within 1 minute, the system automatically selects the default option **Test this media \& install openEuler 21.09** and enters the installation page. 
+ > + > - If you do not perform any operations within 1 minute, the system automatically selects the default option **Test this media \& install openEuler 23.03** and enters the installation page. > - During physical machine installation, if you cannot use the arrow keys to select boot options and the system does not respond after you press **Enter**, click ![](./figures/en-us_image_0229420473.png) on the BMC page and configure **Key \& Mouse Reset**. - + **Figure 5** Installation wizard ![](./figures/Installation_wizard.png "Installation_wizard") -9. On the installation wizard page, press **Enter** to select the default option **Test this media \& install openEuler 21.09** to enter the GUI installation page. +9. On the installation wizard page, press **Enter** to select the default option **Test this media \& install openEuler 23.03** to enter the GUI installation page. ## Performing Installation After entering the GUI installation page, perform the following operations to install the system: 1. Set an installation language. The default language is English. You can change the language based on the site requirements, as shown in [Figure 6](#fig874344811484). - + **Figure 6** Selecting a language ![](./figures/selecting-a-language.png "selecting-a-language") 2. On the **INSTALLATION SUMMARY** page, set configuration items based on the site requirements. - + - A configuration item with an alarm symbol must be configured. When the alarm symbol disappears, you can perform the next operation. - A configuration item without an alarm symbol is configured by default. - You can click **Begin Installation** to install the system only when all alarms are cleared. - + **Figure 7** Installation summary - ![](./figures/installation-summary.png "installation-summary") - + ![](./figures/installation-summary.png "installation-summary") + 1. Select **Software Selection** to set configuration items. 
- + Based on the site requirements, select **Minimal Install** on the left box and select an add-on in the **Add-Ons for Selected Environment** area on the right, as shown in [Figure 8](#fig1133717611109). - + **Figure 8** Selecting installation software - ![](./figures/selecting-installation-software.png "selecting-installation-software") - + ![](./figures/selecting-installation-software.png "selecting-installation-software") + > ![](./public_sys-resources/icon-note.gif) **NOTE:** - > + > > - In **Minimal Install** mode, not all packages in the installation source are installed. If a required package is not installed, you can mount the installation source to the local host as a repo source, and use DNF to install the package. > - If you select **Virtual Host**, the virtualization components QEMU, libvirt, and edk2 are installed by default. You can select whether to install the OVS component in the add-on area. - + After the setting is complete, click **Done** in the upper left corner to go back to the **INSTALLATION SUMMARY** page. - + 2. Select **Installation Destination** to set configuration items. - + On the **INSTALLATION DESTINATION** page, select a local storage device. - + > ![](./public_sys-resources/icon-notice.gif) **NOTICE:** > The NVMe data protection feature is not supported because the NVMe drivers built in the BIOSs of many servers are of earlier versions. (Data protection: Format disk sectors to 512+N or 4096+N bytes.) Therefore, when selecting a proper storage medium, do not select an NVMe SSD with data protection enabled as the system disk. Otherwise, the OS may fail to boot. > Users can consult the server vendor about whether the BIOS supports NVMe disks with data protection enabled as system disks. If you cannot confirm whether the BIOS supports NVMe disks with data protection enabled as system disks, you are not advised to use an NVMe disk to install the OS, or you can disable the data protection function of an NVMe disk to install the OS. 
- + You also need to configure the storage to partition the system. You can either manually configure partitions or select **Automatic** for automatic partitioning. Select **Automatic** if the software is installed in a new storage device or the data in the storage device is not required, as shown in [Figure 9](#fig153381468101). - + **Figure 9** Setting the installation destination - ![](./figures/setting-the-installation-destination.png "setting-the-installation-destination") - + ![](./figures/setting-the-installation-destination.png "setting-the-installation-destination") + > ![](./public_sys-resources/icon-note.gif) **NOTE:** - > + > > - During partitioning, to ensure system security and performance, you are advised to divide the device into the following partitions: **/boot**, **/var**, **/var/log**, **/var/log/audit**, **/home**, and **/tmp**. > - If the system is configured with the **swap** partition, the **swap** partition is used when the physical memory of the system is insufficient. Although the **swap** partition can be used to expand the physical memory, if the **swap** partition is used due to insufficient memory, the system response slows and the system performance deteriorates. Therefore, you are not advised to configure the **swap** partition in a system with sufficient physical memory or a performance-sensitive system. > - If you need to split a logical volume group, select **Custom** to manually partition the logical volume group. On the **MANUAL PARTITIONING** page, click **Modify** in the **Volume Group** area to reconfigure the logical volume group. @@ -252,76 +251,76 @@ After entering the GUI installation page, perform the following operations to in After the setting is complete, click **Done** in the upper left corner to go back to the **INSTALLATION SUMMARY** page. 3. Select **Root Password** and set the root password. 
- + On the **ROOT PASSWORD** page, enter a password that meets the [Password Complexity](#password-complexity) requirements and confirm the password, as shown in [Figure 10](#zh-cn_topic_0186390266_zh-cn_topic_0122145909_fig1323165793018). - + > ![](./public_sys-resources/icon-note.gif) **NOTE:** - > + > > - The **root** account is used to perform key system management tasks. You are not advised to use the **root** account for daily work or system access. - > + > > - If you select **Lock root account** on the **Root Password** page, the **root** account will be disabled. - + **Password Complexity** - + The password of the **root** user or a new user must meet the password complexity requirements. Otherwise, the password setting or user creation will fail. The password must meet the following requirements: - + 1. Contains at least eight characters. - + 2. Contains at least three of the following: uppercase letters, lowercase letters, digits, and special characters. - + 3. Different from the user name. - + 4. Not allowed to contain words in the dictionary. - - > ![](./public_sys-resources/icon-note.gif) **NOTE:** + + > ![](./public_sys-resources/icon-note.gif) **NOTE:** > In the openEuler environment, you can run the `cracklib-unpacker /usr/share/cracklib/pw_dict > dictionary.txt` command to export the dictionary library file **dictionary.txt**. You can check whether the password is in this dictionary. - + **Figure 10** root password ![](./figures/password-of-the-root-account.png "Root password") - + After the settings are completed, click **Done** in the upper left corner to go back to the **INSTALLATION SUMMARY** page. 4. Select **Create a User** and set the parameters. - + [Figure 11](#zh-cn_topic_0186390266_zh-cn_topic_0122145909_fig1237715313319) shows the page for creating a user. Enter the user name and set the password. The password complexity requirements are the same as those of the root password. 
In addition, you can set the home directory and user group by clicking **Advanced**, as shown in [Figure 12](#zh-cn_topic_0186390266_zh-cn_topic_0122145909_fig1237715313319). - + **Figure 11** Creating a user - ![](./figures/creating-a-user.png "creating-a-user") + ![](./figures/creating-a-user.png "creating-a-user") **Figure 12** Advanced user configuration ![](./figures/advanced-user-configuration.png "Advanced user configuration") - + After the settings are completed, click **Done** in the upper left corner to go back to the **INSTALLATION SUMMARY** page. 5. Set other configuration items. You can use the default values for other configuration items. - + 3. Click **Start the Installation** to install the system, as shown in [Figure 13](#zh-cn_topic_0186390266_zh-cn_topic_0122145909_fig1237715313319). - + **Figure 13** Starting the installation ![](./figures/installation-process.png "installation-process") 4. After the installation is completed, restart the system. - + openEuler has been installed. Click **Reboot** to restart the system. ## Viewing System Information -After the system is installed and restarted, the system CLI login page is displayed. Enter the username and password set during the installation to log in to openEuler and view the following system information. For details about system management and configuration, see the [openEuler 21.09 Administrator Guide](../Administration/administration.html). +After the system is installed and restarted, the system CLI login page is displayed. Enter the username and password set during the installation to log in to openEuler and view the following system information. For details about system management and configuration, see the [openEuler 23.03 Administrator Guide](../Administration/administration.html). 
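The password complexity rules listed in the installation steps above can be sketched as a small shell helper. This is an illustrative sketch only, not part of the openEuler installer; the dictionary rule (rule 4) is left to cracklib, as the note about `cracklib-unpacker` describes.

```shell
# Hypothetical helper (not from the installer): applies complexity rules 1-3.
# The dictionary check (rule 4) is handled separately by cracklib.
check_password() {
    local user="$1" pass="$2" classes=0
    # Rule 1: at least eight characters.
    [ "${#pass}" -ge 8 ] || return 1
    # Rule 2: at least three of the four character classes.
    case "$pass" in *[[:upper:]]*) classes=$((classes + 1));; esac
    case "$pass" in *[[:lower:]]*) classes=$((classes + 1));; esac
    case "$pass" in *[[:digit:]]*) classes=$((classes + 1));; esac
    case "$pass" in *[[:punct:]]*) classes=$((classes + 1));; esac
    [ "$classes" -ge 3 ] || return 1
    # Rule 3: must differ from the user name.
    [ "$pass" != "$user" ]
}
```

For example, `check_password root 'Str0ng#Pass' && echo acceptable` succeeds, while a lowercase-only password fails rule 2.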
- Run the following command to view the system information: - ``` + ```shell cat /etc/os-release ``` For example, the command and output are as follows: - ``` + ```shell $ cat /etc/os-release NAME="openEuler" - VERSION="21.09" + VERSION="23.03" ID="openEuler" - VERSION_ID="21.09" - PRETTY_NAME="openEuler 21.09" + VERSION_ID="23.03" + PRETTY_NAME="openEuler 23.03" ANSI_COLOR="0;31" ``` @@ -329,24 +328,24 @@ After the system is installed and restarted, the system CLI login page is displa Run the following command to view the CPU information: - ``` + ```shell lscpu ``` Run the following command to view the memory information: - ``` + ```shell free ``` Run the following command to view the disk information: - ``` + ```shell fdisk -l ``` - Run the following command to view the IP address: - ``` + ```shell ip addr ``` diff --git a/docs/en/docs/Releasenotes/installing-the-os.md b/docs/en/docs/Releasenotes/installing-the-os.md index 41846ebe9e2eb7d5ef4215137e9ff094e5a8ad36..be4b9e9e0447abc4f9d736c810053720c705ec49 100644 --- a/docs/en/docs/Releasenotes/installing-the-os.md +++ b/docs/en/docs/Releasenotes/installing-the-os.md @@ -2,7 +2,7 @@ ## Release Files -The openEuler release files include [ISO release package](http://repo.openeuler.org/openEuler-22.09/ISO/), [VM images](http://repo.openeuler.org/openEuler-22.09/virtual_machine_img/), [container images](http://repo.openeuler.org/openEuler-22.09/docker_img/), [embedded images](http://repo.openeuler.org/openEuler-22.09/embedded_img/), and [repo sources](http://repo.openeuler.org/openEuler-22.09/). [Table 1](#table8396719144315) describes the ISO release packages. [Table 3](#table1276911538154) describes the container images. [Table 5](#table953512211576) describes the repo sources, which are convenient for online use. 
+The openEuler release files include [ISO release package](http://repo.openeuler.org/openEuler-23.03/ISO/), [VM images](http://repo.openeuler.org/openEuler-23.03/virtual_machine_img/), [container images](http://repo.openeuler.org/openEuler-23.03/docker_img/), [embedded images](http://repo.openeuler.org/openEuler-23.03/embedded_img/), and [repo sources](http://repo.openeuler.org/openEuler-23.03/). [Table 1](#table8396719144315) describes the ISO release packages. [Table 3](#table1276911538154) describes the container images. [Table 5](#table953512211576) describes the repo sources, which are convenient for online use. **Table 1** ISO release packages @@ -14,37 +14,37 @@ The openEuler release files include [ISO release package](http://repo.openeuler. -

-openEuler-22.09-aarch64-dvd.iso
+openEuler-23.03-aarch64-dvd.iso
 Base installation ISO file of the AArch64 architecture, including the core components for running the minimum system.
-openEuler-22.09-everything-aarch64-dvd.iso
+openEuler-23.03-everything-aarch64-dvd.iso
 Full installation ISO file of the AArch64 architecture, including all components for running the entire system.
-openEuler-22.09-everything-debug-aarch64-dvd.iso
+openEuler-23.03-everything-debug-aarch64-dvd.iso
 ISO file for openEuler debugging in the AArch64 architecture, including the symbol table information required for debugging.
-openEuler-22.09-x86_64-dvd.iso
+openEuler-23.03-x86_64-dvd.iso
 Base installation ISO file of the x86_64 architecture, including the core components for running the minimum system.
-openEuler-22.09-everything-x86_64-dvd.iso
+openEuler-23.03-everything-x86_64-dvd.iso
 Full installation ISO file of the x86_64 architecture, including all components for running the entire system.
-openEuler-22.09-everything-debuginfo-x86_64-dvd.iso
+openEuler-23.03-everything-debuginfo-x86_64-dvd.iso
 ISO file for openEuler debugging in the x86_64 architecture, including the symbol table information required for debugging.
-openEuler-22.09-source-dvd.iso
+openEuler-23.03-source-dvd.iso
 ISO file of the openEuler source code.
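Each ISO in Table 1 is published alongside a `.sha256sum` file, and verifying the download before installation can be done with `sha256sum -c`. The sketch below demonstrates the mechanism on a stand-in file; the real file names (e.g. `openEuler-23.03-aarch64-dvd.iso`) come from repo.openeuler.org.

```shell
# Sketch: verify a downloaded ISO against its published checksum file.
# A stand-in file is used here in place of the real ISO.
workdir=$(mktemp -d)
printf 'stand-in for ISO contents' > "$workdir/release.iso"
# Normally the .sha256sum file is downloaded, not generated locally:
( cd "$workdir" && sha256sum release.iso > release.iso.sha256sum )
# Verification prints "release.iso: OK" only if the file is intact:
( cd "$workdir" && sha256sum -c release.iso.sha256sum )
```

A corrupted or truncated download makes `sha256sum -c` report `FAILED` and exit non-zero, so the check is easy to script.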
@@ -72,7 +72,6 @@ The openEuler release files include [ISO release package](http://repo.openeuler. - **Table 2** VM images @@ -83,12 +82,12 @@ The openEuler release files include [ISO release package](http://repo.openeuler. -

-openEuler-22.09-aarch64.qcow2.xz
+openEuler-23.03-aarch64.qcow2.xz
 VM image of openEuler in the AArch64 architecture.
-openEuler-22.09-x86_64.qcow2.xz
+openEuler-23.03-x86_64.qcow2.xz
 VM image of openEuler in the x86_64 architecture.
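The VM images in Table 2 are xz-compressed qcow2 files, so they must be decompressed before use. The sketch below shows the mechanism on a stand-in file; the qemu invocation is only indicative and the real file name (e.g. `openEuler-23.03-x86_64.qcow2.xz`) is assumed from the table.

```shell
# Sketch: decompress an xz-compressed qcow2 VM image before booting it.
# A stand-in file is used here in place of the real image.
workdir=$(mktemp -d)
cd "$workdir"
printf 'qcow2 stand-in' > demo.qcow2
xz demo.qcow2            # produces demo.qcow2.xz and removes demo.qcow2
xz -d demo.qcow2.xz      # restores demo.qcow2
cat demo.qcow2
# The restored image could then be booted, for example:
#   qemu-system-x86_64 -m 2048 -hda openEuler-23.03-x86_64.qcow2
```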
@@ -96,9 +95,8 @@ The openEuler release files include [ISO release package](http://repo.openeuler.
-
->![](./public_sys-resources/icon-note.gif) **NOTE**
->The default password of root user of the VM image is **openEuler12#$**. Change the password upon the first login.
+>![](./public_sys-resources/icon-note.gif) **NOTE**
+>The default password of the **root** user of the VM image is **openEuler12#$**. Change the password upon the first login.

**Table 3** Container images

@@ -123,16 +121,15 @@ The openEuler release files include [ISO release package](http://repo.openeuler.
-
**Table 4** Embedded images

| Name | Description |
| -------------------------------------- | ------------------------------------------------------------ |
| arm64/aarch64-std/zImage | Kernel image that supports QEMU in the AArch64 architecture. |
-| arm64/aarch64-std/\*toolchain-22.09.sh | Development and compilation toolchain in the AArch64 architecture. |
+| arm64/aarch64-std/\*toolchain-23.03.sh | Development and compilation toolchain in the AArch64 architecture. |
| arm64/aarch64-std/\*rootfs.cpio.gz | File system that supports QEMU in the AArch64 architecture. |
| arm32/arm-std/zImage | Kernel image that supports QEMU in the ARM architecture. |
-| arm32/arm-std/\*toolchain-22.09.sh | Development and compilation toolchain in the ARM architecture. |
+| arm32/arm-std/\*toolchain-23.03.sh | Development and compilation toolchain in the ARM architecture. |
| arm32/arm-std/\*rootfs.cpio.gz | File system that supports QEMU in the ARM architecture. |
| source-list/manifest.xml | Manifest of source code used for building. |

@@ -204,11 +201,9 @@ The openEuler release files include [ISO release package](http://repo.openeuler.
-
-
## Minimum Hardware Specifications

-[Table 6](#zh-cn_topic_0182825778_tff48b99c9bf24b84bb602c53229e2541) lists the minimum hardware specifications for installing openEuler 22.09-LTS.
+[Table 6](#zh-cn_topic_0182825778_tff48b99c9bf24b84bb602c53229e2541) lists the minimum hardware specifications for installing openEuler 23.03.

**Table 6** Minimum hardware requirements

@@ -239,7 +234,6 @@ The openEuler release files include [ISO release package](http://repo.openeuler.
-
## Hardware Compatibility

[Table 7](#zh-cn_topic_0227922427_table39822012) describes the typical configurations of servers and components supported by openEuler. openEuler will support more servers in the future. Partners and developers are welcome to participate in the contribution and verification. For details about the servers supported by openEuler, see [Compatibility List](https://www.openeuler.org/en/compatibility/).

@@ -314,6 +308,3 @@ The openEuler release files include [ISO release package](http://repo.openeuler.
-
-
-
diff --git a/docs/en/docs/Releasenotes/key-features.md b/docs/en/docs/Releasenotes/key-features.md
index 0b6bd0c0697c835c38e1d2028cf82a23638ce3db..4a10d0124e462ad49095256f4d4d62a321b2a3d1 100644
--- a/docs/en/docs/Releasenotes/key-features.md
+++ b/docs/en/docs/Releasenotes/key-features.md
@@ -1,6 +1,6 @@
 # Key Features

-## OpenEuler 22.09 is built based on Linux Kernel 5.10 and absorbs beneficial features and innovative features of later versions from the community.
+## openEuler 23.03 is built on Linux Kernel 6.1 and absorbs beneficial and innovative features of later versions from the community.

 - **BPF CO-RE (Compile Once-Run Everywhere) feature**: Solves the portability problem of BPF. After being compiled and verified by the kernel, the compiled program can correctly run on kernels of different versions without recompilation for different kernels.
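CO-RE relocation depends on the running kernel exposing its BTF type information. A quick preflight check is sketched below; the BTF path is the standard upstream kernel location, not anything openEuler-specific, and the helper takes an optional path argument purely for testing.

```shell
# Sketch: libbpf resolves CO-RE relocations against the kernel's BTF type
# information, exposed at /sys/kernel/btf/vmlinux when the kernel is built
# with CONFIG_DEBUG_INFO_BTF=y. An optional path argument eases testing.
btf_available() {
    [ -r "${1:-/sys/kernel/btf/vmlinux}" ]
}
if btf_available; then
    echo "kernel BTF found: CO-RE programs can relocate at load time"
else
    echo "no kernel BTF: the kernel needs CONFIG_DEBUG_INFO_BTF=y"
fi
```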
@@ -72,7 +72,7 @@ HybridSched is a full-stack solution for hybrid deployment of VMs, including enh ## Third-party Application Support - **OpenStack Yoga**: The OpenStack version is updated to the latest stable version Yoga released in April 2022, and the OpenStack-Helm component is supported in the OpenStack Yoga version of openEuler. -- **OpenStack deployment tool opensd**: Supports basic deployment of OpenStack Yoga on openEuler 22.09. +- **OpenStack deployment tool opensd**: Supports basic deployment of OpenStack Yoga on openEuler 23.03. - **Hybrid deployment of OpenStack Yoga on VMs**: The high-priority and low-priority VM technology is introduced to OpenStack Nova. VMs that have different requirements on CPU, I/O, and memory resources are deployed and migrated to the same compute node in scheduling, fully utilizing node resources. - **File backup and restoration**: Provide functions such as system backup, file backup, customized restoration, and one-click backup and restoration, greatly reducing O&M costs. diff --git a/docs/en/docs/Releasenotes/release_notes.md b/docs/en/docs/Releasenotes/release_notes.md index 34bc859270bc500e308ec2cb838b809a8f50714b..ceaf10372b370255257e346d1b9da487a9f8f75d 100644 --- a/docs/en/docs/Releasenotes/release_notes.md +++ b/docs/en/docs/Releasenotes/release_notes.md @@ -1,3 +1,3 @@ # Release Notes -This document is the release notes of openEuler 22.09. \ No newline at end of file +This document is the release notes of openEuler 23.03. 
diff --git a/docs/en/docs/Releasenotes/terms-of-use.md b/docs/en/docs/Releasenotes/terms-of-use.md index 760f72a7b96728d048315c01c35e5bd6840944bc..8fab84f65144bb0c9208baf04afcc3251bccf46d 100644 --- a/docs/en/docs/Releasenotes/terms-of-use.md +++ b/docs/en/docs/Releasenotes/terms-of-use.md @@ -1,6 +1,6 @@ # Terms of Use -**Copyright © 2022 openEuler Community** +**Copyright © 2023 openEuler Community** Your replication, use, modification, and distribution of this document are governed by the Creative Commons License Attribution-ShareAlike 4.0 International Public License \(CC BY-SA 4.0\). You can visit [https://creativecommons.org/licenses/by-sa/4.0/](https://creativecommons.org/licenses/by-sa/4.0/) to view a human-readable summary of \(and not a substitute for\) CC BY-SA 4.0. For the complete CC BY-SA 4.0, visit [https://creativecommons.org/licenses/by-sa/4.0/legalcode](https://creativecommons.org/licenses/by-sa/4.0/legalcode). diff --git a/docs/en/docs/Releasenotes/user-notice.md b/docs/en/docs/Releasenotes/user-notice.md index 60f0e9f9feaccb468f2d3e5c660a5cead22a8efd..58100935a8e155531b4abca47e2e70e3ee0c8474 100644 --- a/docs/en/docs/Releasenotes/user-notice.md +++ b/docs/en/docs/Releasenotes/user-notice.md @@ -1,6 +1,5 @@ # User Notice -- The version number counting rule of openEuler is changed from openEuler _x.x_ to openEuler _year_._month_. For example, openEuler 21.03 indicates that the version is released in March 2021. -- The [Python core team](https://www.python.org/dev/peps/pep-0373/#update) has stopped maintaining Python 2 in January 2020. Python 2 reached end of maintenance (EOM) on December 31, 2020. In 2021, openEuler 21.03 fixed only the critical CVEs related to Python 2. Please switch to Python 3 as soon as possible. -- From openEuler 22.03 LTS, only Python 3 is supported. Please switch to Python 3 to use the OS. - +- The version number counting rule of openEuler is changed from openEuler _x.x_ to openEuler _year_._month_. 
For example, openEuler 21.03 indicates that the version is released in March 2021. +- The [Python core team](https://www.python.org/dev/peps/pep-0373/#update) has stopped maintaining Python 2 in January 2020. Python 2 reached end of maintenance (EOM) on December 31, 2020. In 2021, openEuler 21.03 fixed only the critical CVEs related to Python 2. Please switch to Python 3 as soon as possible. +- From openEuler 22.03 LTS, only Python 3 is supported. Please switch to Python 3 to use the OS. diff --git a/docs/en/docs/TailorCustom/figures/flowchart.png b/docs/en/docs/TailorCustom/figures/flowchart.png deleted file mode 100644 index d3a71e8bfdb886222151cea3b2a3c0e8d8eae64a..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/TailorCustom/figures/flowchart.png and /dev/null differ diff --git a/docs/en/docs/TailorCustom/figures/lack_pack.png b/docs/en/docs/TailorCustom/figures/lack_pack.png deleted file mode 100644 index a4b7f1da15da70f63a86aae360e89017c2b98f2d..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/TailorCustom/figures/lack_pack.png and /dev/null differ diff --git a/docs/en/docs/TailorCustom/imageTailor-user-guide.md b/docs/en/docs/TailorCustom/imageTailor-user-guide.md deleted file mode 100644 index 2ad4ae70147104cf945e4eeeedfa07e587a552e0..0000000000000000000000000000000000000000 --- a/docs/en/docs/TailorCustom/imageTailor-user-guide.md +++ /dev/null @@ -1,928 +0,0 @@ -# ImageTailor User Guide - - - [Introduction](#introduction) - - [Installation](#installation) - - [Software and Hardware Requirements](#software-and-hardware-requirements) - - [Obtaining the Installation Package](#obtaining-the-installation-package) - - [Installing imageTailor](#installing-imagetailor) - - [Directory Description](#directory-description) - - [Image Customization](#image-customization) - - [Overall Process](#overall-process) - - [Customizing Service Packages](#customizing-service-packages) - - [Setting a Local Repo 
Source](#setting-a-local-repo-source) - - [Adding Files](#adding-files) - - [Adding RPM Packages](#adding-rpm-packages) - - [Adding Hook Scripts](#adding-hook-scripts) - - [Configuring System Parameters](#configuring-system-parameters) - - [Configuring Host Parameters](#configuring-host-parameters) - - [Configuring Initial Passwords](#configuring-initial-passwords) - - [Configuring Partitions](#configuring-partitions) - - [Configuring the Network](#configuring-the-network) - - [Configuring Kernel Parameters](#configuring-kernel-parameters) - - [Creating an Image](#creating-an-image) - - [Command Description](#command-description) - - [Image Creation Guide](#image-creation-guide) - - [Tailoring Time Zones](#tailoring-time-zones) - - [Customization Example](#customization-example) - - - -## Introduction - -In addition to the kernel, an operating system contains various peripheral packages. These peripheral packages provide functions of a general-purpose operating system but also cause the following problems: - -- A large number of resources (such as memory, disks, and CPUs) are occupied, resulting in low system performance. -- Unnecessary functions increase the development and maintenance costs. - -To address these problems, openEuler provides the imageTailor tool for tailoring and customization images. You can tailor unnecessary peripheral packages in the OS image or add service packages or files as required. This tool includes the following functions: - -- System package tailoring: Tailors system commands, libraries, and drivers based on the list of RPM packages to be installed. -- System configuration modification: Configures the host name, startup services, time zone, network, partitions, drivers to be loaded, and kernel version. -- Software package addition: Adds custom RPM packages or files to the system. - - - -## Installation - -This section uses openEuler 22.03 LTS in the AArch64 architecture as an example to describe the installation method. 
- -### Software and Hardware Requirements - -The software and hardware requirements of imageTailor are as follows: - -- The architecture is x86_64 or AArch64. - -- The OS is openEuler 22.03 LTS (the kernel version is 5.10 and the Python version is 3.9, which meet the tool requirements). - -- The root directory **/** of the host to run the tool have at least 40 GB space. - -- The Python version is 3.9 or later. - -- The kernel version is 5.10 or later. - -- The SElinux service is disabled. - - ```shell - $ sudo setenforce 0 - $ getenforce - Permissive - ``` - - - -### Obtaining the Installation Package - -Download the openEuler release package to install and use imageTailor. - -1. Obtain the ISO image file and the corresponding verification file. - - The image must be an everything image. Assume that the image is to be stored in the **root** directory. Run the following commands: - - ```shell - $ cd /root/temp - $ wget https://repo.openeuler.org/openEuler-22.03-LTS/ISO/aarch64/openEuler-22.03-LTS-everything-aarch64-dvd.iso - $ wget https://repo.openeuler.org/openEuler-22.03-LTS/ISO/aarch64/openEuler-22.03-LTS-everything-aarch64-dvd.iso.sha256sum - ``` - -3. Obtain the verification value in the sha256sum verification file. - - ```shell - $ cat openEuler-22.03-LTS-everything-aarch64-dvd.iso.sha256sum - ``` - -4. Calculate the verification value of the ISO image file. - - ```shell - $ sha256sum openEuler-22.03-LTS-everything-aarch64-dvd.iso - ``` - -5. Compare the verification value in the sha256sum file with that of the ISO image. If they are the same, the file integrity is verified. Otherwise, the file integrity is damaged. You need to obtain the file again. - -### Installing imageTailor - -The following uses openEuler 22.03 LTS in AArch64 architecture as an example to describe how to install imageTailor. - -1. Ensure that openEuler 22.03 LTS (or a running environment that meets the requirements of imageTailor) has been installed on the host. 
- - ```shell - $ cat /etc/openEuler-release - openEuler release 22.03 LTS - ``` - -2. Create a **/etc/yum.repos.d/local.repo** file to configure the Yum source. The following is an example of the configuration file. **baseurl** indicates the directory for mounting the ISO image. - - ```shell - [local] - name=local - baseurl=file:///root/imageTailor_mount - gpgcheck=0 - enabled=1 - ``` - -3. Run the following commands as the **root** user to mount the image to the **/root/imageTailor_mount** directory as the Yum source (ensure that the value of **baseurl** is the same as that configured in the repo file and the disk space of the directory is greater than 20 GB): - - ```shell - $ mkdir /root/imageTailor_mount - $ sudo mount -o loop /root/temp/openEuler-22.03-LTS-everything-aarch64-dvd.iso /root/imageTailor_mount/ - ``` - -4. Make the Yum source take effect. - - ```shell - $ yum clean all - $ yum makecache - ``` - -5. Install the imageTailor tool as the **root** user. - - ```shell - $ sudo yum install -y imageTailor - ``` - -6. Run the following command as the **root** user to verify that the tool has been installed successfully: - - ```shell - $ cd /opt/imageTailor/ - $ sudo ./mkdliso -h - ------------------------------------------------------------------------------------------------------------- - Usage: mkdliso -p product_name -c configpath [--minios yes|no|force] [-h] [--sec] - Options: - -p,--product Specify the product to make, check custom/cfg_yourProduct. 
- -c,--cfg-path Specify the configuration file path, the form should be consistent with custom/cfg_xxx - --minios Make minios: yes|no|force - --sec Perform security hardening - -h,--help Display help information - - Example: - command: - ./mkdliso -p openEuler -c custom/cfg_openEuler --sec - - help: - ./mkdliso -h - ------------------------------------------------------------------------------------------------------------- - ``` - -### Directory Description - -After imageTailor is installed, the directory structure of the tool package is as follows: - -```shell -[imageTailor] - |-[custom] - |-[cfg_openEuler] - |-[usr_file] // Stores files to be added. - |-[usr_install] //Stores hook scripts to be added. - |-[all] - |-[conf] - |-[hook] - |-[cmd.conf] // Configures the default commands and libraries used by an ISO image. - |-[rpm.conf] // Configures the list of RPM packages and drivers installed by default for an ISO image. - |-[security_s.conf] // Configures security hardening policies. - |-[sys.conf] // Configures ISO image system parameters. - |-[kiwi] // Basic configurations of imageTailor. - |-[repos] //RPM sources for obtaining the RPM packages required for creating an ISO image. - |-[security-tool] // Security hardening tool. - |-mkdliso // Executable script for creating an ISO image. -``` - -## Image Customization - -This section describes how to use the imageTailor tool to package the service RPM packages, custom files, drivers, commands, and libraries to the target ISO image. - -### Overall Process - -The following figure shows the process of using imageTailor to customize an image. - -![](./figures/flowchart.png) - -The steps are described as follows: - -- Check software and hardware environment: Ensure that the host for creating the ISO image meets the software and hardware requirements. 
- -- Customize service packages: Add RPM packages (including service RPM packages, commands, drivers, and library files) and files (including custom files, commands, drivers, and library files). - - - Adding service RPM packages: Add RPM packages to the ISO image as required. For details, see [Installation](#installation). - - Adding custom files: If you want to perform custom operations such as hardware check, system configuration check, and driver installation when the target ISO system is installed or started, you can compile custom files and package them to the ISO image. - - Adding drivers, commands, and library files: If the RPM package source of openEuler does not contain the required drivers, commands, or library files, you can use imageTailor to package the corresponding drivers, commands, or library files into the ISO image. - -- Configure system parameters: - - - Configuring host parameters: To ensure that the ISO image is successfully installed and started, you need to configure host parameters. - - Configuring partitions: You can configure service partitions based on the service plan and adjust system partitions. - - Configuring the network: You can set system network parameters as required, such as the NIC name, IP address, and subnet mask. - - Configuring the initial password: To ensure that the ISO image is successfully installed and started, you need to configure the initial passwords of the **root** user and GRUB. - - Configuring kernel parameters: You can configure the command line parameters of the kernel as required. - -- Configure security hardening policies. - - ImageTailor provides default security hardening policies. You can modify **security_s.conf** (in the ISO image customization phase) to perform secondary security hardening on the system based on service requirements. For details, see the [Security Hardening Guide](https://docs.openeuler.org/en/docs/22.03_LTS/docs/SecHarden/secHarden.html). 
- -- Create an ISO image: - - Use the imageTailor tool to create an ISO image. - -### Customizing Service Packages - -You can pack service RPM packages, custom files, drivers, commands, and library files into the target ISO image as required. - -#### Setting a Local Repo Source - -To customize an ISO image, you must set a repo source in the **/opt/imageTailor/repos/euler_base/** directory. This section describes how to set a local repo source. - -1. Download the ISO file released by openEuler. (The RPM package of the everything image released by the openEuler must be used.) - ```shell - $ cd /opt - $ wget https://repo.openeuler.org/openEuler-22.03-LTS/ISO/aarch64/openEuler-22.03-LTS-everything-aarch64-dvd.iso - ``` - -2. Create a mount directory **/opt/openEuler_repo** and mount the ISO file to the directory. - ```shell - $ sudo mkdir -p /opt/openEuler_repo - $ sudo mount openEuler-22.03-LTS-everything-aarch64-dvd.iso /opt/openEuler_repo - mount: /opt/openEuler_repo: WARNING: source write-protected, mounted read-only. - ``` - -3. Copy the RPM packages in the ISO file to the **/opt/imageTailor/repos/euler_base/** directory. - ```shell - $ sudo rm -rf /opt/imageTailor/repos/euler_base && sudo mkdir -p /opt/imageTailor/repos/euler_base - $ sudo cp -ar /opt/openEuler_repo/Packages/* /opt/imageTailor/repos/euler_base - $ sudo chmod -R 644 /opt/imageTailor/repos/euler_base - $ sudo ls /opt/imageTailor/repos/euler_base|wc -l - 2577 - $ sudo umount /opt/openEuler_repo && sudo rm -rf /opt/openEuler_repo - $ cd /opt/imageTailor - ``` - -#### Adding Files - -You can add files to an ISO image as required. The file types include custom files, drivers, commands, or library file. Store the files to the **/opt/imageTailor/custom/cfg_openEuler/usr_file** directory. - -##### Precautions - -- The commands to be packed must be executable. Otherwise, imageTailor will fail to pack the commands into the ISO. 
- -- The file stored in the **/opt/imageTailor/custom/cfg_openEuler/usr_file** directory will be generated in the root directory of the ISO. Therefore, the directory structure of the file must be a complete path starting from the root directory so that imageTailor can place the file in the correct directory. - - For example, if you want **file1** to be in the **/opt** directory of the ISO, create an **opt** directory in the **usr_file** directory and copy **file1** to the **opt** directory. For example: - - ```shell - $ pwd - /opt/imageTailor/custom/cfg_openEuler/usr_file - - $ tree - . - ├── etc - │   ├── default - │   │   └── grub - │   └── profile.d - │   └── csh.precmd - └── opt - └── file1 - - 4 directories, 3 files - ``` - -- The paths in **/opt/imageTailor/custom/cfg_openEuler/usr_file** must be real paths. For example, the paths do not contain soft links. You can run the `realpath` or `readlink -f` command to query the real path. - -- If you need to invoke a custom script in the system startup or installation phase, that is, a hook script, store the file in the **hook** directory. - -#### Adding RPM Packages - -##### Procedure - -To add RPM packages (drivers, commands, or library files) to an ISO image, perform the following steps: - ->![](./public_sys-resources/icon-note.gif) **NOTE:** -> ->- The **rpm.conf** and **cmd.conf** files are stored in the **/opt/imageTailor/custom/cfg_openEuler/** directory. ->- The RPM package tailoring granularity below indicates **sys_cut='no'**. For details about the cutout granularity, see [Configuring Host Parameters](#configuring-host-parameters). ->- If no local repo source is configured, configure a local repo source by referring to [Setting a Local Repo Source](#setting-a-local-repo-source). -> - -1. Check whether the **/opt/imageTailor/repos/euler_base/** directory contains the RPM package to be added. - - - If yes, go to step 2. - - If no, go to step 3. -2. 
Configure the RPM package information in the **\** section in the **rpm.conf** file. - - For the RPM package tailoring granularity, no further action is required. - - For other tailoring granularities, go to step 4. -3. Obtain the RPM package and store it in the **/opt/imageTailor/custom/cfg_openEuler/usr_rpm** directory. If the RPM package depends on other RPM packages, store the dependency packages to this directory because the added RPM package and its dependent RPM packages must be packed into the ISO image at the same time. - - For the RPM package tailoring granularity, go to step 4. - - For other tailoring granularities, no further action is required. -4. Configure the drivers, commands, and library files to be retained in the RPM package in the **rpm.conf** and **cmd.conf** files. If there are common files to be tailored, configure them in the **\\** section in the **cmd.conf** file. - - -##### Configuration File Description - -| Operation | Configuration File| Section | -| :----------- | :----------- | :----------------------------------------------------------- | -| Adding drivers | rpm.conf | \
\
\

Note: The **driver_name** is the relative path of **/lib/modules/{kernel_version_number}/kernel/**.| -| Adding commands | cmd.conf | \
\
\
| -| Adding library files | cmd.conf | \
\
\
| -| Deleting other files| cmd.conf | \
\
\

Note: The file name must be an absolute path.| - -**Example** - -- Adding drivers - - ```shell - - - - - ...... - - ``` - -- Adding commands - - ```shell - - - - - ...... - - ``` - -- Adding library files - - ```shell - - - - - - ``` - -- Deleting other files - - ```shell - - - - - - ``` - -#### Adding Hook Scripts - -A hook script is invoked by the OS during startup and installation to execute the actions defined in the script. The directory for storing hook scripts of imageTailor is **custom/cfg_openEuler/usr_install/hook directory**, which has different subdirectories. Each subdirectory represents an OS startup or installation phase. Store the scripts based on the phases in which the scripts are invoked. - -##### Script Naming Rule - -The script name must start with **S+number** (the number must be at least two digits). The number indicates the execution sequence of the hook script. Example: **S01xxx.sh** - ->![](./public_sys-resources/icon-note.gif) **NOTE:** -> ->The scripts in the **hook** directory are executed using the `source` command. Therefore, exercise caution when using the `exit` command in the scripts because the entire installation script exits after the `exit` command is executed. 
- - - -##### Description of hook Subdirectories - -| Subdirectory | Script Example | Time for Execution | Description | -| :-------------------- | :---------------------| :------------------------------- | :----------------------------------------------------------- | -| insmod_drv_hook | N/A | After OS drivers are loaded | N/A | -| custom_install_hook | S01custom_install.sh | After the drivers are loaded, that is, after **insmod_drv_hook** is executed| You can customize the OS installation process by using a custom script.| -| env_check_hook | S01check_hw.sh | Before the OS installation initialization | The script is used to check hardware specifications and types before initialization.| -| set_install_ip_hook | S01set_install_ip.sh | When network configuration is being performed during OS installation initialization. | You can customize the network configuration by using a custom script.| -| before_partition_hook | S01checkpart.sh | Before partitioning | You can check correctness of the partition configuration file by using a custom script.| -| before_setup_os_hook | N/A | Before the repo file is decompressed | You can customize partition mounting.
If the decompression path of the installation package is not the root partition specified in the partition configuration, customize partition mounting and assign the decompression path to the input global variable.| -| before_mkinitrd_hook | S01install_drv.sh | Before the `mkinitrd` command is run | The hook script executed before running the `mkinitrd` command when **initrd** is saved to the disk. You can add and update driver files in **initrd**.| -| after_setup_os_hook | N/A | After OS installation | After the installation is complete, you can perform custom operations on the system files, such as modifying **grub.cfg**.| -| install_succ_hook | N/A | When the OS is successfully installed | The scripts in this subdirectory are used to parse the installation information and send information about whether the installation succeeds. **install_succ_hook** cannot be set to **install_break**.| -| install_fail_hook | N/A | When the OS installation fails | The scripts in this subdirectory are used to parse the installation information and send information about whether the installation succeeds. **install_fail_hook** cannot be set to **install_break**.| - -### Configuring System Parameters - -Before creating an ISO image, you need to configure system parameters, including host parameters, initial passwords, partitions, network, compilation parameters, and system command line parameters. - -#### Configuring Host Parameters - -The **\ \** section in the **/opt/imageTailor/custom/cfg_openEuler/sys.conf** file is used to configure common system parameters, such as the host name and kernel boot parameters. - -The default configuration provided by openEuler is as follows. You can modify the configuration as required.
- -```shell - - sys_service_enable='ipcc' - sys_service_disable='cloud-config cloud-final cloud-init-local cloud-init' - sys_utc='yes' - sys_timezone='' - sys_cut='no' - sys_usrrpm_cut='no' - sys_hostname='Euler' - sys_usermodules_autoload='' - sys_gconv='GBK' - -``` - -The parameters are described as follows: - -- sys_service_enable - - (Optional) Services enabled by the OS by default. Separate multiple services with spaces. If you do not need to add a system service, use the default value **ipcc**. Pay attention to the following during the configuration: - - - Default system services cannot be deleted. - - You can configure additional services, but the repo source must contain the corresponding service RPM packages. - - By default, only the services configured in this parameter are enabled. If a service depends on other services, you need to configure the services it depends on in this parameter as well. - -- sys_service_disable - - (Optional) Services that are not allowed to automatically start upon system startup. Separate multiple services with spaces. If no system service needs to be disabled, leave this parameter blank. - -- sys_utc - - (Mandatory) Indicates whether to use Coordinated Universal Time (UTC). The value can be **yes** or **no**. The default value is **yes**. - -- sys_timezone - - (Optional) Sets the time zone. The value can be a time zone supported by openEuler, which can be queried in the **/usr/share/zoneinfo/zone.tab** file. - -- sys_cut - - (Mandatory) Indicates whether to tailor the RPM packages. The value can be **yes**, **no**, or **debug**. **yes** indicates that the RPM packages are tailored. **no** indicates that the RPM packages are not tailored (only the RPM packages in the **rpm.conf** file are installed). **debug** indicates that the RPM packages are tailored but the `rpm` command is retained for customization after installation. The default value is **no**.
- - >![](./public_sys-resources/icon-note.gif) NOTE: - > - > - imageTailor installs the RPM packages added by the user, deletes the files configured in the **\** section of the **cmd.conf** file, and then deletes the commands, libraries, and drivers that are not configured in **cmd.conf** or **rpm.conf**. - > - When **sys_cut='yes'** is configured, imageTailor does not support the installation of the `rpm` command. Even if the `rpm` command is configured in the **rpm.conf** file, the configuration does not take effect. - -- sys_usrrpm_cut - - (Mandatory) Indicates whether to tailor the RPM packages added by users to the **/opt/imageTailor/custom/cfg_openEuler/usr_rpm** directory. The value can be **yes** or **no**. The default value is **no**. - - - **sys_usrrpm_cut='yes'**: imageTailor installs the RPM packages added by the user, deletes the files configured in the **\** section in the **cmd.conf** file, and then deletes the commands, libraries, and drivers that are not configured in **cmd.conf** or **rpm.conf**. - - - **sys_usrrpm_cut='no'**: imageTailor installs the RPM packages added by the user but does not delete the files in the RPM packages. - -- sys_hostname - - (Mandatory) Host name. After the OS is deployed in batches, you are advised to change the host name of each node so that each host name is unique. - - The host name must be a combination of letters, digits, and hyphens (-) and must start with a letter or digit. Letters are case sensitive. The value contains a maximum of 63 characters. The default value is **Euler**. - -- sys_usermodules_autoload - - (Optional) Drivers to load during system startup. When configuring this parameter, you do not need to enter the file extension **.ko**. If there are multiple drivers, separate them with spaces. By default, this parameter is left blank, indicating that no additional driver is loaded. - -- sys_gconv - - (Optional) This parameter is used to tailor **/usr/lib/gconv** and **/usr/lib64/gconv**.
The options are as follows: - - - **null**/**NULL**: indicates that this parameter is not configured. If **sys_cut='yes'** is configured, **/usr/lib/gconv** and **/usr/lib64/gconv** will be deleted. - - **all**/**ALL**: keeps **/usr/lib/gconv** and **/usr/lib64/gconv**. - - **xxx,xxx**: keeps the corresponding files in the **/usr/lib/gconv** and **/usr/lib64/gconv** directories. If multiple files need to be kept, use commas (,) to separate them. - -- sys_man_cut - - (Optional) Indicates whether to tailor the man pages. The value can be **yes** or **no**. The default value is **yes**. - - - ->![](./public_sys-resources/icon-note.gif) NOTE: -> -> If both **sys_cut** and **sys_usrrpm_cut** are configured, **sys_cut** takes precedence. The following rules apply: -> -> - sys_cut='no' -> -> No matter whether **sys_usrrpm_cut** is set to **yes** or **no**, no tailoring is performed. That is, imageTailor installs the RPM packages in the repo source and the RPM packages in the **usr_rpm** directory, but the files in the RPM packages are not deleted. Even if some files in the RPM packages are not required, imageTailor does not delete them. -> -> - sys_cut='yes' -> -> - sys_usrrpm_cut='no' -> -> System RPM package tailoring granularity: imageTailor deletes files in the RPM packages in the repo sources as configured. -> -> - sys_usrrpm_cut='yes' -> -> System and user RPM package tailoring granularity: imageTailor deletes files in the RPM packages in the repo sources and the **usr_rpm** directory as configured. -> - - - -#### Configuring Initial Passwords - -The **root** and GRUB passwords must be configured during OS installation. Otherwise, you cannot log in to the OS as the **root** user after the OS is installed using the tailored ISO image. This section describes how to configure the initial passwords. - -> ![](./public_sys-resources/icon-note.gif) NOTE: -> -> You must configure the initial **root** and GRUB passwords manually.
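A SHA-512 ciphertext suitable for these password fields can also be generated directly with OpenSSL, instead of reading it from **/etc/shadow** as described below (a sketch; the password shown is a placeholder, and `openssl passwd -6` requires OpenSSL 1.1.1 or later):

```shell
# Generate a SHA-512 crypt ciphertext for an initial password.
# 'Example_P@ssw0rd' is a placeholder; omit the argument to have
# openssl prompt for the password without echoing it.
pwd_hash=$(openssl passwd -6 'Example_P@ssw0rd')
echo "${pwd_hash}"    # the ciphertext starts with the $6$ (SHA-512) identifier
```

The printed ciphertext is what goes into the **pwd** field of **rpm.conf**.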
- -##### Configuring the Initial Password of the root User - -###### Introduction - -The initial password of the **root** user is stored in the **/opt/imageTailor/custom/cfg_openEuler/rpm.conf** file. You can modify this file to set the initial password of the **root** user. - ->![](./public_sys-resources/icon-note.gif) **NOTE:** -> ->- If the `--minios yes/force` parameter is required when you run the `mkdliso` command to create an ISO image, you need to enter the corresponding information in the **/opt/imageTailor/kiwi/minios/cfg_minios/rpm.conf** file. - -The default configuration of the initial password of the **root** user in the **/opt/imageTailor/custom/cfg_openEuler/rpm.conf** file is as follows. Add a password of your choice. - -``` - - - -``` - -The parameters are described as follows: - -- **group**: group to which the user belongs. -- **pwd**: ciphertext of the initial password. The encryption algorithm is SHA-512. Replace **${pwd}** with the actual ciphertext. -- **home**: home directory of the user. -- **name**: name of the user to be configured. - -###### Modification Method - -Before creating an ISO image, you need to change the initial password of the **root** user. The following describes how to set the initial password of the **root** user (**root** permissions are required): - -1. Add a user for generating a password, for example, **testUser**. - - ```shell - $ sudo useradd testUser - ``` - -2. Set the password of **testUser**. Run the following command and set the password as prompted: - - ```shell - $ sudo passwd testUser - Changing password for user testUser. - New password: - Retype new password: - passwd: all authentication tokens updated successfully. - ``` - -3. View the **/etc/shadow** file. The content following **testUser** (string between two colons) is the ciphertext of the password. 
- - ``` shell script - $ sudo cat /etc/shadow | grep testUser - testUser:$6$YkX5uFDGVO1VWbab$jvbwkZ2Kt0MzZXmPWy.7bJsgmkN0U2gEqhm9KqT1jwQBlwBGsF3Z59heEXyh8QKm3Qhc5C3jqg2N1ktv25xdP0:19052:0:90:7:35:: - ``` - -4. Copy and paste the ciphertext to the **pwd** field in the **/opt/imageTailor/custom/cfg_openEuler/rpm.conf** file. - ``` shell script - - - - ``` - -5. If the `--minios yes/force` parameter is required when you run the `mkdliso` command to create an ISO image, configure the **pwd** field of the corresponding user in **/opt/imageTailor/kiwi/minios/cfg_minios/rpm.conf**. - - ``` shell script - - - - ``` - -##### Configuring the Initial GRUB Password - -The initial GRUB password is stored in the **/opt/imageTailor/custom/cfg_openEuler/usr_file/etc/default/grub** file. Modify this file to configure the initial GRUB password. If the initial GRUB password is not configured, the ISO image will fail to be created. - -> ![](./public_sys-resources/icon-note.gif) NOTE: -> -> - The **root** permissions are required for configuring the initial GRUB password. -> - The default user corresponding to the GRUB password is **root**. -> -> - The `grub2-set-password` command must exist in the system. If the command does not exist, install it in advance. - -1. Run the following command and set the GRUB password as prompted: - - ```shell - $ sudo grub2-set-password -o ./ - Enter password: - Confirm password: - grep: .//grub.cfg: No such file or directory - WARNING: The current configuration lacks password support! - Update your configuration with grub2-mkconfig to support this feature. - ``` - -2. After the command is executed, the **user.cfg** file is generated in the current directory. The content starting with **grub.pbkdf2.sha512** is the encrypted GRUB password. 
- - ```shell - $ sudo cat user.cfg - GRUB2_PASSWORD=grub.pbkdf2.sha512.10000.CE285BE1DED0012F8B2FB3DEA38782A5B1040FEC1E49D5F602285FD6A972D60177C365F1 - B5D4CB9D648AD4C70CF9AA2CF9F4D7F793D4CE008D9A2A696A3AF96A.0AF86AB3954777F40D324816E45DD8F66CA1DE836DC7FBED053DB02 - 4456EE657350A27FF1E74429546AD9B87BE8D3A13C2E686DD7C71D4D4E85294B6B06E0615 - ``` - -3. Copy the preceding ciphertext and add the following configuration to the **/opt/imageTailor/custom/cfg_openEuler/usr_file/etc/default/grub** file: - - ```shell - GRUB_PASSWORD="grub.pbkdf2.sha512.10000.CE285BE1DED0012F8B2FB3DEA38782A5B1040FEC1E49D5F602285FD6A972D60177C365F1 - B5D4CB9D648AD4C70CF9AA2CF9F4D7F793D4CE008D9A2A696A3AF96A.0AF86AB3954777F40D324816E45DD8F66CA1DE836DC7FBED053DB02 - 4456EE657350A27FF1E74429546AD9B87BE8D3A13C2E686DD7C71D4D4E85294B6B06E0615" - ``` - - -#### Configuring Partitions - -If you want to adjust system partitions or service partitions, modify the **\** section in the **/opt/imageTailor/custom/cfg_openEuler/sys.conf** file. - ->![](./public_sys-resources/icon-note.gif) **NOTE:** -> ->- System partition: partition for storing the OS. ->- Service partition: partition for service data. ->- The type of a partition is determined by the content it stores, not the size, mount path, or file system. ->- Partition configuration is optional. You can manually configure partitions after OS installation. - - The format of **\** is as follows: - -disk_ID mount_path partition_size partition_type file_system [Secondary formatting flag] - -The default configuration is as follows: - -```shell - -hd0 /boot 512M primary ext4 yes -hd0 /boot/efi 200M primary vfat yes -hd0 / 30G primary ext4 -hd0 - - extended - -hd0 /var 1536M logical ext4 -hd0 /home max logical ext4 - -``` - -The parameters are described as follows: - -- disk_ID: - ID of a disk. Set this parameter in the format of **hd***x*, where *x* indicates the *x*th disk.
- - >![](./public_sys-resources/icon-note.gif) **NOTE:** - > - >Partition configuration takes effect only when the disk can be recognized. - -- mount_path: - Mount path to a specified partition. You can configure service partitions and adjust the default system partition. If you do not mount partitions, set this parameter to **-**. - - >![](./public_sys-resources/icon-note.gif) **NOTE:** - > - >- The mount path **/** must be configured. You can adjust the mount paths of other partitions according to your needs. - >- When the UEFI boot mode is used, the partition configuration in the x86_64 architecture must contain the mount path **/boot**, and the partition configuration in the AArch64 architecture must contain the mount path **/boot/efi**. - -- partition_size: - The value types are as follows: - - - G/g: The unit of a partition size is GB, for example, 2G. - - M/m: The unit of a partition size is MB, for example, 300M. - - T/t: The unit of a partition size is TB, for example, 1T. - - MAX/max: The remaining space of a hard disk is used to create a partition. This value can only be assigned to the last partition. - - >![](./public_sys-resources/icon-note.gif) **NOTE:** -> - >- A partition size value cannot contain decimal numbers. If there are decimal numbers, change the unit of the value to make the value an integer. For example, 1.5 GB should be changed to 1536 MB. - >- When the partition size is set to **MAX**/**max**, the size of the remaining partition cannot exceed the limit of the supported file system type (the default file system type is **ext4**, and the maximum size is **16T**). - -- partition_type: - The values of partition types are as follows: - - - primary: primary partitions - - extended: extended partition (configure only *disk_ID* for this partition) - - logical: logical partitions - -- file_system: - Currently, **ext4** and **vfat** file systems are supported.
- -- [Secondary formatting flag]: - Indicates whether to format the disk during secondary installation. This parameter is optional. - - - The value can be **yes** or **no**. The default value is **no**. - - >![](./public_sys-resources/icon-note.gif) **NOTE:** - > - >Secondary formatting indicates that openEuler has been installed on the disk before this installation. If the partition table configuration (partition size, mount point, and file type) used in the previous installation is the same as that used in the current installation, this flag can be used to configure whether to format the previous partitions, except the **/boot** and **/** partitions. If the target host is installed for the first time, this flag does not take effect, and all partitions with specified file systems are formatted. - -#### Configuring the Network - -The system network parameters are stored in **/opt/imageTailor/custom/cfg_openEuler/sys.conf**. You can modify the network parameters of the target ISO image, such as the NIC name, IP address, and subnet mask, by configuring **\\** in this file. - -The default network configuration in the **sys.conf** file is as follows. **netconfig-0** indicates the **eth0** NIC. If you need to configure an additional NIC, for example, **eth1**, add **\\** to the configuration file and set the parameters of **eth1**. - -```shell - -BOOTPROTO="dhcp" -DEVICE="eth0" -IPADDR="" -NETMASK="" -STARTMODE="auto" - -``` - -The following table describes the parameters. - -| Parameter | Mandatory or Not | Value | Description | - | :-------- | -------- | :------------------------------------------------ | :----------------------------------------------------------- | - | BOOTPROTO | Yes | none / static / dhcp | **none**: No protocol is used for boot, and no IP address is assigned.
**static**: An IP address is statically assigned.
**dhcp**: An IP address is dynamically obtained using the dynamic host configuration protocol (DHCP).| - | DEVICE | Yes | Example: **eth1** | NIC name. | - | IPADDR | Yes | Example: **192.168.11.100** | IP address.
This parameter must be configured only when the value of **BOOTPROTO** is **static**.| - | NETMASK | Yes | - | Subnet mask.
This parameter must be configured only when the value of **BOOTPROTO** is **static**.| - | STARTMODE | Yes | manual / auto / hotplug / ifplugd / nfsroot / off | NIC start mode.
**manual**: A user runs the `ifup` command on a terminal to start an NIC.
**auto**/**hotplug**/**ifplugd**/**nfsroot**: The NIC is started when the OS identifies it.<br>
**off**: The NIC cannot be started in any situation.<br>
For details about the parameters, run the `man ifcfg` command on the host that is used to create the ISO image.| - - -#### Configuring Kernel Parameters - -To ensure stable and efficient running of the system, you can modify kernel command line parameters as required. For an OS image created by imageTailor, you can modify the **GRUB_CMDLINE_LINUX** configuration in the **/opt/imageTailor/custom/cfg_openEuler/usr_file/etc/default/grub** file to adjust the kernel command line parameters. The default settings of the kernel command line parameters in **GRUB_CMDLINE_LINUX** are as follows: - -```shell -GRUB_CMDLINE_LINUX="net.ifnames=0 biosdevname=0 crashkernel=512M oops=panic softlockup_panic=1 reserve_kbox_mem=16M crash_kexec_post_notifiers panic=3 console=tty0" -``` - -The meanings of the configurations are as follows (for details about other common kernel command line parameters, see related kernel documents): - -- net.ifnames=0 biosdevname=0 - - Names NICs in the traditional way, such as **eth0**. - -- crashkernel=512M - - The memory space reserved for kdump is 512 MB. - -- oops=panic panic=3 - - The kernel panics when an oops error occurs, and the system restarts 3 seconds later. - -- softlockup_panic=1 - - The kernel panics when a soft-lockup is detected. - -- reserve_kbox_mem=16M - - The memory space reserved for Kbox is 16 MB. - -- console=tty0 - - Specifies **tty0** as the output device of the first virtual console. - -- crash_kexec_post_notifiers - - After the system crashes, the function registered with the panic notification chain is called first, and then kdump is executed. - -### Creating an Image - -After customizing the operating system, you can use the `mkdliso` script to create the OS image file. The OS image created using imageTailor is an ISO image file.
- -#### Command Description - -##### Syntax - -**mkdliso -p openEuler -c custom/cfg_openEuler [--minios yes|no|force] [--sec] [-h]** - -##### Parameter Description - -| Parameter| Mandatory| Description | Value Range | -| -------- | -------- | ------------------------------------------------------------ | ------------------------------------------------------------ | -| -p | Yes | Specifies the product name. | **openEuler** | -| -c | Yes | Specifies the relative path of the configuration file. | **custom/cfg_openEuler** | -| --minios | No | Specifies whether to create the **initrd** file that is used to boot the system during system installation. | The default value is **yes**.
**yes**: The **initrd** file will be created when the command is executed for the first time. When a subsequent `mkdliso` is executed, the system checks whether the **initrd** file exists in the **usr_install/boot** directory using sha256 verification. If the **initrd** file exists, it is not created again. Otherwise, it is created.
**no**: The **initrd** file is not created. The **initrd** file used for system boot and running is the same.
**force**: The **initrd** file will be created forcibly, regardless of whether it exists in the **usr_install/boot** directory or not.| -| --sec | No | Specifies whether to perform security hardening on the generated ISO file.
If this parameter is not specified, the user should undertake the resultant security risks.| N/A | -| -h | No | Obtains help information. | N/A | - -#### Image Creation Guide - -To create an ISO image using `mkdliso`, perform the following steps: - ->![](./public_sys-resources/icon-note.gif) NOTE: -> -> - The absolute path to `mkdliso` must not contain spaces. Otherwise, the ISO image creation will fail. -> - In the environment for creating the ISO image, the value of **umask** must be set to **0022**. - -1. Run the `mkdliso` command as the **root** user to generate the ISO image file. The following command is used for reference: - - ```shell - # sudo /opt/imageTailor/mkdliso -p openEuler -c custom/cfg_openEuler --sec - ``` - - After the command is executed, the created files are stored in the **/opt/imageTailor/result/{date}** directory, including **openEuler-aarch64.iso** and **openEuler-aarch64.iso.sha256**. - -2. Verify the integrity of the ISO image file. Assume that the date and time is **2022-03-21-14-48**. - - ```shell - $ cd /opt/imageTailor/result/2022-03-21-14-48/ - $ sha256sum -c openEuler-aarch64.iso.sha256 - ``` - - If the following information is displayed, the ISO image creation is complete. - - ``` - openEuler-aarch64.iso: OK - ``` - - If the following information is displayed, the image is incomplete. The ISO image file is damaged and needs to be created again. - - ```shell - openEuler-aarch64.iso: FAILED - sha256sum: WARNING: 1 computed checksum did NOT match - ``` - -3. View the logs. - - After an image is created, you can view logs as required (for example, when an error occurs during image creation). When an image is created for the first time, the corresponding log file and security hardening log file are compressed into a TAR package (the log file is named in the format of **sys_custom_log_{Date}.tar.gz**) and stored in the **result/log** directory. Only the latest 50 compressed log packages are stored in this directory.
If the number of compressed log packages exceeds 50, the earliest files will be overwritten. - - - -### Tailoring Time Zones - -After the customized ISO image is installed, you can tailor the time zones supported by the openEuler system as required. This section describes how to tailor the time zones. - -The information about time zones supported by openEuler is stored in the time zone folder **/usr/share/zoneinfo**. You can run the following command to view the time zone information: - -```shell -$ ls /usr/share/zoneinfo/ -Africa/ America/ Asia/ Atlantic/ Australia/ Etc/ Europe/ -Pacific/ zone.tab -``` - -Each subfolder represents an area. The current areas include continents, oceans, and **Etc**. Each area folder contains the locations that belong to it. Generally, a location is a city or an island. - -All time zones are in the format of *area/location*. For example, if China Standard Time is used in southern China, the time zone is Asia/Shanghai (the location is not necessarily the capital). The corresponding time zone file is: - -``` -/usr/share/zoneinfo/Asia/Shanghai -``` - -If you want to tailor some time zones, delete the corresponding time zone files. - -### Customization Example - -This section describes how to use imageTailor to create an ISO image. - -1. Check whether the environment used to create the ISO meets the requirements. - - ```shell - $ cat /etc/openEuler-release - openEuler release 22.03 LTS - ``` - -2. Ensure that the root directory has at least 40 GB free space. - - ```shell - $ df -h - Filesystem Size Used Avail Use% Mounted on - ...... - /dev/vdb 196G 28K 186G 1% / - ``` - -3. Install the imageTailor tailoring tool. For details, see [Installation](#installation). - - ```shell - $ sudo yum install -y imageTailor - $ ll /opt/imageTailor/ - total 88K - drwxr-xr-x. 3 root root 4.0K Mar 3 08:00 custom - drwxr-xr-x. 10 root root 4.0K Mar 3 08:00 kiwi - -r-x------. 1 root root 69K Mar 3 08:00 mkdliso - drwxr-xr-x.
2 root root 4.0K Mar 9 14:48 repos - drwxr-xr-x. 2 root root 4.0K Mar 9 14:48 security-tool - ``` - -4. Configure a local repo source. - - ```shell - $ wget https://repo.openeuler.org/openEuler-22.03-LTS/ISO/aarch64/openEuler-22.03-LTS-everything-aarch64-dvd.iso - $ sudo mkdir -p /opt/openEuler_repo - $ sudo mount openEuler-22.03-LTS-everything-aarch64-dvd.iso /opt/openEuler_repo - mount: /opt/openEuler_repo: WARNING: source write-protected, mounted read-only. - $ sudo rm -rf /opt/imageTailor/repos/euler_base && sudo mkdir -p /opt/imageTailor/repos/euler_base - $ sudo cp -ar /opt/openEuler_repo/Packages/* /opt/imageTailor/repos/euler_base - $ sudo chmod -R 644 /opt/imageTailor/repos/euler_base - $ sudo ls /opt/imageTailor/repos/euler_base|wc -l - 2577 - $ sudo umount /opt/openEuler_repo && sudo rm -rf /opt/openEuler_repo - $ cd /opt/imageTailor - ``` - -5. Change the **root** and GRUB passwords. - - Replace **\${pwd}** with the encrypted password by referring to [Configuring Initial Passwords](#configuring-initial-passwords). - - ```shell - $ cd /opt/imageTailor/ - $ sudo vi custom/cfg_openEuler/usr_file/etc/default/grub - GRUB_PASSWORD="${pwd1}" - $ - $ sudo vi kiwi/minios/cfg_minios/rpm.conf - - - - $ - $ sudo vi custom/cfg_openEuler/rpm.conf - - - - ``` - -6. Run the tailoring command. - - ```shell - $ sudo rm -rf /opt/imageTailor/result - $ sudo ./mkdliso -p openEuler -c custom/cfg_openEuler --minios force - ...... - Complete release iso file at: result/2022-03-09-15-31/openEuler-aarch64.iso - move all mkdliso log file to result/log/sys_custom_log_20220309153231.tar.gz - $ ll result/2022-03-09-15-31/ - total 889M - -rw-r--r--. 1 root root 889M Mar 9 15:32 openEuler-aarch64.iso - -rw-r--r--. 
1 root root 87 Mar 9 15:32 openEuler-aarch64.iso.sha256 - ``` diff --git a/docs/en/docs/TailorCustom/isocut-usage-guide.md b/docs/en/docs/TailorCustom/isocut-usage-guide.md deleted file mode 100644 index 76df612d5595f78027843dbeff6d396d6378c8ba..0000000000000000000000000000000000000000 --- a/docs/en/docs/TailorCustom/isocut-usage-guide.md +++ /dev/null @@ -1,760 +0,0 @@ -# isocut Usage Guide - -- [Introduction](#introduction) -- [Software and Hardware Requirements](#software-and-hardware-requirements) -- [Installation](#Installation) -- [Tailoring and Customizing an Image](#tailoring-and-customizing-an-image) - - [Command Description](#command-description) - - [Software Package Source](#software-package-source) - - [Operation Guide](#operation-guide) - - -## Introduction -The size of an openEuler image is large, and the process of downloading or transferring an image is time-consuming. In addition, when an openEuler image is used to install the OS, all RPM packages contained in the image are installed. You cannot choose to install only the required software packages. - -In some scenarios, you do not need to install the full software package provided by the image, or you need to install additional software packages. Therefore, openEuler provides an image tailoring and customization tool. You can use this tool to customize an ISO image that contains only the required RPM packages based on an openEuler image. The software packages can be the ones contained in an official ISO image or specified in addition to meet custom requirements. - -This document describes how to install and use the openEuler image tailoring and customization tool. - -## Software and Hardware Requirements - -The hardware and software requirements of the computer to make an ISO file using the openEuler tailoring and customization tool are as follows: - -- The CPU architecture is AArch64 or x86_64. -- The operating system is openEuler 20.03 LTS SP3. 
-- You are advised to reserve at least 30 GB drive space for running the tailoring and customization tool and storing the ISO image. - -## Installation - -The following uses openEuler 20.03 LTS SP3 on the AArch64 architecture as an example to describe how to install the ISO image tailoring and customization tool. - -1. Ensure that openEuler 20.03 LTS SP3 has been installed on the computer. - - ``` shell script - $ cat /etc/openEuler-release - openEuler release 20.03 (LTS-SP3) - ``` - -2. Download the ISO image (must be an **everything** image) of the corresponding architecture and save it to any directory (it is recommended that the available space of the directory be greater than 20 GB). In this example, the ISO image is saved to the **/home/isocut_iso** directory. - - The download address of the AArch64 image is as follows: - - https://repo.openeuler.org/openEuler-20.03-LTS-SP3/ISO/aarch64/openEuler-20.03-LTS-SP3-everything-aarch64-dvd.iso - - > **Note:** - > The download address of the x86_64 image is as follows: - > - > https://repo.openeuler.org/openEuler-20.03-LTS-SP3/ISO/x86_64/openEuler-20.03-LTS-SP3-everything-x86_64-dvd.iso - -3. Create a **/etc/yum.repos.d/local.repo** file to configure the Yum source. The following is an example of the configuration file. **baseurl** is the directory for mounting the ISO image. - - ``` shell script - [local] - name=local - baseurl=file:///home/isocut_mount - gpgcheck=0 - enabled=1 - ``` - -4. Run the following command as the **root** user to mount the image to the **/home/isocut_mount** directory (ensure that the mount directory is the same as **baseurl** configured in the **repo** file) as the Yum source: - - ```shell - sudo mount -o loop /home/isocut_iso/openEuler-20.03-LTS-SP3-everything-aarch64-dvd.iso /home/isocut_mount - ``` - -5. Make the Yum source take effect. - - ```shell - yum clean all - yum makecache - ``` - -6. Install the image tailoring and customization tool as the **root** user. 
- - ```shell - sudo yum install -y isocut - ``` - -7. Run the following command as the **root** user to verify that the tool has been installed successfully: - - ```shell - $ sudo isocut -h - Checking input ... - usage: isocut [-h] [-t temporary_path] [-r rpm_path] [-k file_path] source_iso dest_iso - - Cut openEuler iso to small one - - positional arguments: - source_iso source iso image - dest_iso destination iso image - - optional arguments: - -h, --help show this help message and exit - -t temporary_path temporary path - -r rpm_path extern rpm packages path - -k file_path kickstart file - ``` - - - -## Tailoring and Customizing an Image - -This section describes how to use the image tailoring and customization tool to create an image by tailoring or adding RPM packages to an openEuler image. - -### Command Description - -#### Format - -Run the `isocut` command to use the image tailoring and customization tool. The command format is as follows: - -**isocut** [ --help | -h ] [ -t <*temp_path*> ] [ -r <*rpm_path*> ] [ -k <*file_path*> ] < *source_iso* > < *dest_iso* > - -#### Parameter Description - -| Parameter| Mandatory| Description| -| ------------ | -------- | -------------------------------------------------------- | -| --help \| -h | No| Queries the help information about the command.| -| -t <*temp_path*> | No| Specifies the temporary directory *temp_path* for running the tool, which is an absolute path. The default value is **/tmp**.| -| -r <*rpm_path*> | No| Specifies the path of the RPM packages to be added to the ISO image.| -| -k <*file_path*> | No | Specifies the kickstart template path if kickstart is used for automatic installation. | -| *source_iso* | Yes| Path and name of the ISO source image to be tailored. If no path is specified, the current path is used by default.| -| *dest_iso* | Yes| Specifies the path and name of the new ISO image created by the tool. 
If no path is specified, the current path is used by default.| - - - -### Software Package Source - -The RPM packages of the new image can be: - -- Packages contained in an official ISO image. In this case, the RPM packages to be installed are specified in the configuration file **/etc/isocut/rpmlist**. The configuration format is *software_package_name.architecture*. For example, **kernel.aarch64**. - -- Specified in addition. In this case, use the `-r` parameter to specify the path in which the RPM packages are stored when running the `isocut` command and add the RPM package names to the **/etc/isocut/rpmlist** configuration file. (See the name format above.) - - - - >![](./public_sys-resources/icon-note.gif) **NOTE:** - > - >- When customizing an image, if an RPM package specified in the configuration file cannot be found, the RPM package will not be added to the image. - >- If the dependency of the RPM package is incorrect, an error may be reported when running the tailoring and customization tool. - -### kickstart Functions - -You can use kickstart to install images automatically by using the `-k` parameter to specify a kickstart file when running the **isocut** command. - -The isocut tool provides a kickstart template (**/etc/isocut/anaconda-ks.cfg**). You can modify the template as required. - -#### Modifying the kickstart Template - -If you need to use the kickstart template provided by the isocut tool, perform the following modifications: - -- Configure the root user password and the GRUB2 password in the **/etc/isocut/anaconda-ks.cfg** file. Otherwise, the automatic image installation will pause during the password setting process, waiting for you to manually enter the passwords. -- If you want to specify additional RPM packages and use kickstart for automatic installation, specify the RPM packages in the **%packages** field in both the **/etc/isocut/rpmlist** file and the kickstart file. 
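
The sections below derive the **root** password hash by creating a temporary user and reading it back from **/etc/shadow**. As a lighter-weight alternative, the same kind of SHA-512 crypt string can be generated directly. This is a sketch only: it assumes OpenSSL 1.1.1 or later is installed, and the password and salt shown are placeholders.

```shell
# Sketch: generate a SHA-512 crypt hash for the "rootpw --iscrypted" line.
# "MyPassw0rd" and "examplesalt" are placeholders, not recommended values.
hash=$(openssl passwd -6 -salt examplesalt MyPassw0rd)
echo "rootpw --iscrypted ${hash}"
```

Whichever way the hash is produced, the resulting string replaces **${pwd}** in the template.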
- -See the next section for details about how to modify the kickstart file. - -##### Configuring Initial Passwords - -###### Setting the Initial Password of the **root** User - -Set the initial password of the **root** user as follows in the **/etc/isocut/anaconda-ks.cfg** file. Replace **${pwd}** with the encrypted password. - -```shell -rootpw --iscrypted ${pwd} -``` - -Obtain the initial password of the **root** user as follows (**root** permissions are required): - -1. Add a user for generating the password, for example, **testUser**. - - ``` shell script - $ sudo useradd testUser - ``` - -2. Set the password for the **testUser** user. Run the following command to set the password as prompted: - - ``` shell script - $ sudo passwd testUser - Changing password for user testUser. - New password: - Retype new password: - passwd: all authentication tokens updated successfully. - ``` - -3. View the **/etc/shadow** file to obtain the encrypted password. The encrypted password is the string between the two colons (:) following the **testUser** user name. (******* is used as an example.) - - ``` shell script - $ sudo cat /etc/shadow | grep testUser - testUser:***:19052:0:90:7:35:: - ``` - -4. Run the following command to replace the **pwd** field in the **/etc/isocut/anaconda-ks.cfg** file with the encrypted password (replace __***__ with the actual password): - ``` shell script - rootpw --iscrypted *** - ``` - -###### Configuring the Initial GRUB2 Password - -Add the following configuration to the **/etc/isocut/anaconda-ks.cfg** file to set the initial GRUB2 password: Replace **${pwd}** with the encrypted password. - -```shell -%addon com_huawei_grub_safe --iscrypted --password='${pwd}' -%end -``` - -> ![](./public_sys-resources/icon-note.gif) NOTE: -> -> - The **root** permissions are required for configuring the initial GRUB password. -> - The default user corresponding to the GRUB password is **root**. 
-> -> - The `grub2-set-password` command must exist in the system. If the command does not exist, install it in advance. - -1. Run the following command and set the GRUB2 password as prompted: - - ```shell - $ sudo grub2-set-password -o ./ - Enter password: - Confirm password: - grep: .//grub.cfg: No such file or directory - WARNING: The current configuration lacks password support! - Update your configuration with grub2-mkconfig to support this feature. - ``` - -2. After the command is executed, the **user.cfg** file is generated in the current directory. The content starting with **grub.pbkdf2.sha512** is the encrypted GRUB2 password. - - ```shell - $ sudo cat user.cfg - GRUB2_PASSWORD=grub.pbkdf2.sha512.*** - ``` - -3. Add the following information to the **/etc/isocut/anaconda-ks.cfg** file. Replace ******* with the encrypted GRUB2 password. - - ```shell - %addon com_huawei_grub_safe --iscrypted --password='grub.pbkdf2.sha512.***' - %end - ``` - -##### Configuring the %packages Field - -If you want to specify additional RPM packages and use kickstart for automatic installation, specify the RPM packages in the **%packages** field in both the **/etc/isocut/rpmlist** file and the kickstart file. - -This section describes how to specify RPM packages in the **/etc/isocut/anaconda-ks.cfg** file. - -The default configurations of **%packages** in the **/etc/isocut/anaconda-ks.cfg** file are as follows: - -```shell -%packages --multilib --ignoremissing -acl.aarch64 -aide.aarch64 -...... -NetworkManager.aarch64 -%end -``` - -Add specified RPM packages to the **%packages** configurations in the following format: - -*software_package_name.architecture*. For example, **kernel.aarch64**. - -```shell -%packages --multilib --ignoremissing -acl.aarch64 -aide.aarch64 -...... 
-NetworkManager.aarch64 -kernel.aarch64 -%end -``` - -### Operation Guide - - - ->![](./public_sys-resources/icon-note.gif) **NOTE:** -> ->- Do not modify or delete the default configuration items in the **/etc/isocut/rpmlist** file. ->- All `isocut` operations require **root** permissions. ->- The source image to be tailored can be a basic image or **everything** image. In this example, the basic image **openEuler-20.03-LTS-SP3-aarch64-dvd.iso** is used. ->- In this example, assume that the new image is named **new.iso** and stored in the **/home/result** directory, the temporary directory for running the tool is **/home/temp**, and the additional RPM packages are stored in the **/home/rpms** directory. - - - -1. Open the configuration file **/etc/isocut/rpmlist** and specify the RPM packages to be installed (from the official ISO image). - - ``` shell script - sudo vi /etc/isocut/rpmlist - ``` - -2. Ensure that the space of the temporary directory for running the image tailoring and customization tool is greater than 8 GB. The default temporary directory is **/tmp**. You can also use the `-t` parameter to specify another directory as the temporary directory. The path of the directory must be an absolute path. In this example, the **/home/temp** directory is used. The following command output indicates that the available drive space of the **/home** directory is 38 GB, which meets the requirements. - - ```shell - $ df -h - Filesystem Size Used Avail Use% Mounted on - devtmpfs 1.2G 0 1.2G 0% /dev - tmpfs 1.5G 0 1.5G 0% /dev/shm - tmpfs 1.5G 23M 1.5G 2% /run - tmpfs 1.5G 0 1.5G 0% /sys/fs/cgroup - /dev/mapper/openeuler_openeuler-root 69G 2.8G 63G 5% / - /dev/sda2 976M 114M 796M 13% /boot - /dev/mapper/openeuler_openeuler-home 61G 21G 38G 35% /home - ``` - -3. Tailor and customize the image. - - **Scenario 1**: All RPM packages of the new image are from the official ISO image.
- - ``` shell script - $ sudo isocut -t /home/temp /home/isocut_iso/openEuler-20.03-LTS-SP3-aarch64-dvd.iso /home/result/new.iso - Checking input ... - Checking user ... - Checking necessary tools ... - Initing workspace ... - Copying basic part of iso image ... - Downloading rpms ... - Finish create yum conf - finished - Regenerating repodata ... - Checking rpm deps ... - Getting the description of iso image ... - Remaking iso ... - Adding checksum for iso ... - Adding sha256sum for iso ... - ISO cutout succeeded, enjoy your new image "/home/result/new.iso" - isocut.lock unlocked ... - ``` - If the preceding information is displayed, the custom image **new.iso** is successfully created. - - **Scenario 2**: The RPM packages of the new image are from the official ISO image and additional packages in **/home/rpms**. - - ```shell - sudo isocut -t /home/temp -r /home/rpms /home/isocut_iso/openEuler-20.03-LTS-SP3-aarch64-dvd.iso /home/result/new.iso - ``` - **Scenario 3**: The kickstart file is used for automatic installation. You need to modify the **/etc/isocut/anaconda-ks.cfg** file. - ```shell - sudo isocut -t /home/temp -k /etc/isocut/anaconda-ks.cfg /home/isocut_iso/openEuler-20.03-LTS-SP3-aarch64-dvd.iso /home/result/new.iso - ``` - - -## FAQs - -### The System Fails to Be Installed Using an Image Tailored Based on the Default RPM Package List - -#### Context - -When isocut is used to tailor an image, the **/etc/isocut/rpmlist** configuration file is used to specify the software packages to be installed. - -Images of different OS versions contain different software packages. As a result, some packages may be missing during image tailoring. -Therefore, the **/etc/isocut/rpmlist** file contains only the kernel software package by default, -ensuring that the image can be successfully tailored. - -#### Symptom - -The ISO image is successfully tailored using the default configuration, but fails to be installed. 
- -An error message is displayed during the installation, indicating that packages are missing: - -![](./figures/lack_pack.png) - -#### Possible Cause - -The ISO image tailored based on the default RPM package list lacks necessary RPM packages during installation. -The missing RPM packages are displayed in the error message, and may vary depending on the version. - -#### Solution - -1. Add the missing packages. - - 1. Find the missing RPM packages based on the error message. - 2. Add the missing RPM packages to the **/etc/isocut/rpmlist** configuration file. - 3. Tailor and install the ISO image again. - - For example, if the missing packages are those in the example error message, modify the **rpmlist** configuration file as follows: - ```shell - $ cat /etc/isocut/rpmlist - kernel.aarch64 - lvm2.aarch64 - chrony.aarch64 - authselect.aarch64 - shim.aarch64 - efibootmgr.aarch64 - grub2-efi-aa64.aarch64 - dosfstools.aarch64 - ``` -# isocut Usage Guide - -- [Introduction](#introduction) -- [Software and Hardware Requirements](#software-and-hardware-requirements) -- [Installation](#Installation) -- [Tailoring and Customizing an Image](#tailoring-and-customizing-an-image) - - [Command Description](#command-description) - - [Software Package Source](#software-package-source) - - [Operation Guide](#operation-guide) - - -## Introduction -The size of an openEuler image is large, and the process of downloading or transferring an image is time-consuming. In addition, when an openEuler image is used to install the OS, all RPM packages contained in the image are installed. You cannot choose to install only the required software packages. - -In some scenarios, you do not need to install the full software package provided by the image, or you need to install additional software packages. Therefore, openEuler provides an image tailoring and customization tool. You can use this tool to customize an ISO image that contains only the required RPM packages based on an openEuler image. 
The software packages can be the ones contained in an official ISO image or specified in addition to meet custom requirements. - -This document describes how to install and use the openEuler image tailoring and customization tool. - -## Software and Hardware Requirements - -The hardware and software requirements of the computer to make an ISO file using the openEuler tailoring and customization tool are as follows: - -- The CPU architecture is AArch64 or x86_64. -- The operating system is openEuler 22.03 LTS. -- You are advised to reserve at least 30 GB drive space for running the tailoring and customization tool and storing the ISO image. - -## Installation - -The following uses openEuler 22.03 LTS on the AArch64 architecture as an example to describe how to install the ISO image tailoring and customization tool. - -1. Ensure that openEuler 22.03 LTS has been installed on the computer. - - ``` shell script - $ cat /etc/openEuler-release - openEuler release 22.03 LTS - ``` - -2. Download the ISO image (must be an **everything** image) of the corresponding architecture and save it to any directory (it is recommended that the available space of the directory be greater than 20 GB). In this example, the ISO image is saved to the **/home/isocut_iso** directory. - - The download address of the AArch64 image is as follows: - - https://repo.openeuler.org/openEuler-22.03-LTS/ISO/aarch64/openEuler-22.03-LTS-everything-aarch64-dvd.iso - - > **Note:** - > The download address of the x86_64 image is as follows: - > - > https://repo.openeuler.org/openEuler-22.03-LTS/ISO/x86_64/openEuler-22.03-LTS-everything-x86_64-dvd.iso - -3. Create a **/etc/yum.repos.d/local.repo** file to configure the Yum source. The following is an example of the configuration file. **baseurl** is the directory for mounting the ISO image. - - ``` shell script - [local] - name=local - baseurl=file:///home/isocut_mount - gpgcheck=0 - enabled=1 - ``` - -4. 
Run the following command as the **root** user to mount the image to the **/home/isocut_mount** directory (ensure that the mount directory is the same as **baseurl** configured in the **repo** file) as the Yum source: - - ```shell - sudo mount -o loop /home/isocut_iso/openEuler-22.03-LTS-everything-aarch64-dvd.iso /home/isocut_mount - ``` - -5. Make the Yum source take effect. - - ```shell - yum clean all - yum makecache - ``` - -6. Install the image tailoring and customization tool as the **root** user. - - ```shell - sudo yum install -y isocut - ``` - -7. Run the following command as the **root** user to check whether the tool has been installed successfully: - - ```shell - $ sudo isocut -h - Checking input ... - usage: isocut [-h] [-t temporary_path] [-r rpm_path] [-k file_path] source_iso dest_iso - - Cut EulerOS iso to small one - - positional arguments: - source_iso source iso image - dest_iso destination iso image - - optional arguments: - -h, --help show this help message and exit - -t temporary_path temporary path - -r rpm_path extern rpm packages path - -k file_path kickstart file - ``` - - - -## Tailoring and Customizing an Image - -This section describes how to use the image tailoring and customization tool to create an image by tailoring or adding RPM packages to an openEuler image. - -### Command Description - -#### Format - -Run the `isocut` command to use the image tailoring and customization tool. The command format is as follows: - -**isocut** [ --help | -h ] [ -t <*temp_path*> ] [ -r <*rpm_path*> ] [ -k <*file_path*> ] < *source_iso* > < *dest_iso* > - -#### Parameter Description - -| Parameter| Mandatory| Description| -| ------------ | -------- | -------------------------------------------------------- | -| --help \| -h | No| Queries the help information about the command.| -| -t <*temp_path*> | No| Specifies the temporary directory *temp_path* for running the tool, which is an absolute path. 
The default value is **/tmp**.| -| -r <*rpm_path*> | No| Specifies the path of the RPM packages to be added to the ISO image.| -| -k <*file_path*> | No | Specifies the kickstart template path if kickstart is used for automatic installation. | -| *source_iso* | Yes| Path and name of the ISO source image to be tailored. If no path is specified, the current path is used by default.| -| *dest_iso* | Yes| Specifies the path and name of the new ISO image created by the tool. If no path is specified, the current path is used by default.| - - - -### Software Package Source - -The RPM packages of the new image can be: - -- Packages contained in an official ISO image. In this case, the RPM packages to be installed are specified in the configuration file **/etc/isocut/rpmlist**. The configuration format is *software_package_name.architecture*. For example, **kernel.aarch64**. - -- Specified in addition. In this case, use the `-r` parameter to specify the path in which the RPM packages are stored when running the `isocut` command and add the RPM package names to the **/etc/isocut/rpmlist** configuration file. (See the name format above.) - - - - >![](./public_sys-resources/icon-note.gif) **NOTE:** - > - >- When customizing an image, if an RPM package specified in the configuration file cannot be found, the RPM package will not be added to the image. - >- If the dependency of the RPM package is incorrect, an error may be reported when running the tailoring and customization tool. - -### kickstart Functions - -You can use kickstart to install images automatically by using the `-k` parameter to specify a kickstart file when running the **isocut** command. - -The isocut tool provides a kickstart template (**/etc/isocut/anaconda-ks.cfg**). You can modify the template as required. 
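
If the template edits are scripted rather than made by hand, the **${pwd}** placeholder can be substituted with `sed`. The sketch below works on a temporary stand-in file so that it is self-contained; on a real system you would operate on a copy of **/etc/isocut/anaconda-ks.cfg**, and the hash shown is a dummy value.

```shell
# Sketch: substitute an encrypted password into a kickstart-style rootpw line.
ks=$(mktemp)
printf 'rootpw --iscrypted ${pwd}\n' > "$ks"   # stand-in for the template line
hash='$6$examplesalt$examplehash'              # dummy hash, not a real one
sed -i "s|^rootpw --iscrypted .*|rootpw --iscrypted ${hash}|" "$ks"
cat "$ks"
rm -f "$ks"
```

The same substitution pattern applies to the GRUB2 `%addon` line covered in this section.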
- -#### Modifying the kickstart Template - -If you need to use the kickstart template provided by the isocut tool, perform the following modifications: - -- Configure the root user password and the GRUB2 password in the **/etc/isocut/anaconda-ks.cfg** file. Otherwise, the automatic image installation will pause during the password setting process, waiting for you to manually enter the passwords. -- If you want to specify additional RPM packages and use kickstart for automatic installation, specify the RPM packages in the **%packages** field in both the **/etc/isocut/rpmlist** file and the kickstart file. - -See the next section for details about how to modify the kickstart file. - -##### Configuring Initial Passwords - -###### Setting the Initial Password of the **root** User - -Set the initial password of the **root** user as follows in the **/etc/isocut/anaconda-ks.cfg** file. Replace **${pwd}** with the encrypted password. - -```shell -rootpw --iscrypted ${pwd} -``` - -Obtain the initial password of the **root** user as follows (**root** permissions are required): - -1. Add a user for generating the password, for example, **testUser**. - - ``` shell script - $ sudo useradd testUser - ``` - -2. Set the password for the **testUser** user. Run the following command to set the password as prompted: - - ``` shell script - $ sudo passwd testUser - Changing password for user testUser. - New password: - Retype new password: - passwd: all authentication tokens updated successfully. - ``` - -3. View the **/etc/shadow** file to obtain the encrypted password. The encrypted password is the string between the two colons (:) following the **testUser** user name. (******* is used as an example.) - - ``` shell script - $ sudo cat /etc/shadow | grep testUser - testUser:***:19052:0:90:7:35:: - ``` - -4. 
Run the following command to replace the **pwd** field in the **/etc/isocut/anaconda-ks.cfg** file with the encrypted password (replace __***__ with the actual password): - ``` shell script - rootpw --iscrypted *** - ``` - -###### Configuring the Initial GRUB2 Password - -Add the following configuration to the **/etc/isocut/anaconda-ks.cfg** file to set the initial GRUB2 password: Replace **${pwd}** with the encrypted password. - -```shell -%addon com_huawei_grub_safe --iscrypted --password='${pwd}' -%end -``` - -> ![](./public_sys-resources/icon-note.gif) NOTE: -> -> - The **root** permissions are required for configuring the initial GRUB password. -> - The default user corresponding to the GRUB password is **root**. -> -> - The `grub2-set-password` command must exist in the system. If the command does not exist, install it in advance. - -1. Run the following command and set the GRUB2 password as prompted: - - ```shell - $ sudo grub2-set-password -o ./ - Enter password: - Confirm password: - grep: .//grub.cfg: No such file or directory - WARNING: The current configuration lacks password support! - Update your configuration with grub2-mkconfig to support this feature. - ``` - -2. After the command is executed, the **user.cfg** file is generated in the current directory. The content starting with **grub.pbkdf2.sha512** is the encrypted GRUB2 password. - - ```shell - $ sudo cat user.cfg - GRUB2_PASSWORD=grub.pbkdf2.sha512.*** - ``` - -3. Add the following information to the **/etc/isocut/anaconda-ks.cfg** file. Replace ******* with the encrypted GRUB2 password. - - ```shell - %addon com_huawei_grub_safe --iscrypted --password='grub.pbkdf2.sha512.***' - %end - ``` - -##### Configuring the %packages Field - -If you want to specify additional RPM packages and use kickstart for automatic installation, specify the RPM packages in the **%packages** field in both the **/etc/isocut/rpmlist** file and the kickstart file. 
- -This section describes how to specify RPM packages in the **/etc/isocut/anaconda-ks.cfg** file. - -The default configurations of **%packages** in the **/etc/isocut/anaconda-ks.cfg** file are as follows: - -```shell -%packages --multilib --ignoremissing -acl.aarch64 -aide.aarch64 -...... -NetworkManager.aarch64 -%end -``` - -Add specified RPM packages to the **%packages** configurations in the following format: - -*software_package_name.architecture*. For example, **kernel.aarch64**. - -```shell -%packages --multilib --ignoremissing -acl.aarch64 -aide.aarch64 -...... -NetworkManager.aarch64 -kernel.aarch64 -%end -``` - -### Operation Guide - - - ->![](./public_sys-resources/icon-note.gif) **NOTE:** -> ->- Do not modify or delete the default configuration items in the **/etc/isocut/rpmlist** file. ->- All `isocut` operations require **root** permissions. ->- The source image to be tailored can be a basic image or **everything** image. In this example, the basic image **openEuler-22.03-LTS-aarch64-dvd.iso** is used. ->- In this example, assume that the new image is named **new.iso** and stored in the **/home/result** directory, the temporary directory for running the tool is **/home/temp**, and the additional RPM packages are stored in the **/home/rpms** directory. - - - -1. Open the configuration file **/etc/isocut/rpmlist** and specify the RPM packages to be installed (from the official ISO image). - - ``` shell script - sudo vi /etc/isocut/rpmlist - ``` - -2. Ensure that the space of the temporary directory for running the image tailoring and customization tool is greater than 8 GB. The default temporary directory is **/tmp**. You can also use the `-t` parameter to specify another directory as the temporary directory. The path of the directory must be an absolute path. In this example, the **/home/temp** directory is used. The following command output indicates that the available drive space of the **/home** directory is 38 GB, which meets the requirements.
- - ```shell - $ df -h - Filesystem Size Used Avail Use% Mounted on - devtmpfs 1.2G 0 1.2G 0% /dev - tmpfs 1.5G 0 1.5G 0% /dev/shm - tmpfs 1.5G 23M 1.5G 2% /run - tmpfs 1.5G 0 1.5G 0% /sys/fs/cgroup - /dev/mapper/openeuler_openeuler-root 69G 2.8G 63G 5% / - /dev/sda2 976M 114M 796M 13% /boot - /dev/mapper/openeuler_openeuler-home 61G 21G 38G 35% /home - ``` - -3. Tailor and customize the image. - - **Scenario 1**: All RPM packages of the new image are from the official ISO image. - - ``` shell script - $ sudo isocut -t /home/temp /home/isocut_iso/openEuler-22.03-LTS-aarch64-dvd.iso /home/result/new.iso - Checking input ... - Checking user ... - Checking necessary tools ... - Initing workspace ... - Copying basic part of iso image ... - Downloading rpms ... - Finish create yum conf - finished - Regenerating repodata ... - Checking rpm deps ... - Getting the description of iso image ... - Remaking iso ... - Adding checksum for iso ... - Adding sha256sum for iso ... - ISO cutout succeeded, enjoy your new image "/home/result/new.iso" - isocut.lock unlocked ... - ``` - If the preceding information is displayed, the custom image **new.iso** is successfully created. - - **Scenario 2**: The RPM packages of the new image are from the official ISO image and additional packages in **/home/rpms**. - - ```shell - sudo isocut -t /home/temp -r /home/rpms /home/isocut_iso/openEuler-22.03-LTS-aarch64-dvd.iso /home/result/new.iso - ``` - **Scenario 3**: The kickstart file is used for automatic installation. You need to modify the **/etc/isocut/anaconda-ks.cfg** file. 
- ```shell - sudo isocut -t /home/temp -k /etc/isocut/anaconda-ks.cfg /home/isocut_iso/openEuler-22.03-LTS-aarch64-dvd.iso /home/result/new.iso - ``` - - -## FAQs - -### The System Fails to Be Installed Using an Image Tailored Based on the Default RPM Package List - -#### Context - -When isocut is used to tailor an image, the **/etc/isocut/rpmlist** configuration file is used to specify the software packages to be installed. - -Images of different OS versions contain different software packages. As a result, some packages may be missing during image tailoring. -Therefore, the **/etc/isocut/rpmlist** file contains only the kernel software package by default, -ensuring that the image can be successfully tailored. - -#### Symptom - -The ISO image is successfully tailored using the default configuration, but fails to be installed. - -An error message is displayed during the installation, indicating that packages are missing: - -![](./figures/lack_pack.png) - -#### Possible Cause - -The ISO image tailored based on the default RPM package list lacks necessary RPM packages during installation. -The missing RPM packages are displayed in the error message, and may vary depending on the version. - -#### Solution - -1. Add the missing packages. - - 1. Find the missing RPM packages based on the error message. - 2. Add the missing RPM packages to the **/etc/isocut/rpmlist** configuration file. - 3. Tailor and install the ISO image again. 
- - For example, if the missing packages are those in the example error message, modify the **rpmlist** configuration file as follows: - ```shell - $ cat /etc/isocut/rpmlist - kernel.aarch64 - lvm2.aarch64 - chrony.aarch64 - authselect.aarch64 - shim.aarch64 - efibootmgr.aarch64 - grub2-efi-aa64.aarch64 - dosfstools.aarch64 - ``` diff --git a/docs/en/docs/TailorCustom/overview.md b/docs/en/docs/TailorCustom/overview.md deleted file mode 100644 index 053bc0b8481c1b95bbdba0abbe17e94ff674f4fa..0000000000000000000000000000000000000000 --- a/docs/en/docs/TailorCustom/overview.md +++ /dev/null @@ -1,3 +0,0 @@ -# Tailoring and Customization Tool Usage Guide - -This document describes the tailoring and customization tool of openEuler, including the introduction, installation, and usage. \ No newline at end of file diff --git a/docs/en/docs/TailorCustom/public_sys-resources/icon-note.gif b/docs/en/docs/TailorCustom/public_sys-resources/icon-note.gif deleted file mode 100644 index 6314297e45c1de184204098efd4814d6dc8b1cda..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/TailorCustom/public_sys-resources/icon-note.gif and /dev/null differ diff --git a/docs/en/docs/commercial_cryptography_features/comm_crypto_app_conf.md b/docs/en/docs/commercial_cryptography_features/comm_crypto_app_conf.md deleted file mode 100644 index 30404ce4c54634bf430d2d154c10c45b8b1eebc1..0000000000000000000000000000000000000000 --- a/docs/en/docs/commercial_cryptography_features/comm_crypto_app_conf.md +++ /dev/null @@ -1 +0,0 @@ -TODO \ No newline at end of file diff --git a/docs/en/docs/oncn-bwm/.keep b/docs/en/docs/oncn-bwm/.keep deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/docs/en/docs/oncn-bwm/overview.md b/docs/en/docs/oncn-bwm/overview.md deleted file mode 100644 index 5068a6ed0285ae1cc217b022337a02a4eeb7a691..0000000000000000000000000000000000000000 --- a/docs/en/docs/oncn-bwm/overview.md +++ 
/dev/null @@ -1,257 +0,0 @@ -# oncn-bwm User Guide - -## Introduction - -With the rapid development of technologies such as cloud computing, big data, artificial intelligence, 5G, and the Internet of Things (IoT), data centers have become increasingly important. However, server resource utilization in data centers remains very low, resulting in a huge waste of resources. oncn-bwm was developed to improve server resource utilization. - -oncn-bwm is a pod bandwidth management tool applicable to hybrid deployment of online and offline services. It properly schedules network resources for nodes based on QoS levels to ensure online service experience and greatly improve the overall network bandwidth utilization of nodes. - -The oncn-bwm tool supports the following functions: - -- Enabling/Disabling/Querying pod bandwidth management -- Setting the pod network priority -- Setting the offline service bandwidth range and online service waterline -- Querying internal statistics - - - -## Installation - -This section describes how to install the oncn-bwm tool. - -### Environmental Requirements - -* Operating system: openEuler 22.09 - -### Installation Procedure - -1. Configure the openEuler Yum source on the host, then run the following command as the **root** user to install oncn-bwm: - - ```shell - # yum install oncn-bwm - ``` - -## How to Use - -The oncn-bwm tool provides the `bwmcli` command line tool to enable pod bandwidth management or perform related configurations. The overall format of the `bwmcli` command is as follows: - -**bwmcli** < option(s) > - -> Note: -> -> The root permission is required for running the `bwmcli` command.
-> - -> Pod bandwidth management is supported only in the outbound direction of a node (packets are sent from the node to other nodes). - -> Pod bandwidth management cannot be enabled for NICs for which tc qdisc rules have been configured. - -> Upgrading the oncn-bwm package does not affect the enabling status before the upgrade. Uninstalling the oncn-bwm package disables pod bandwidth management for all NICs. - - -### Command Interfaces - -#### Pod Bandwidth Management - -**Commands and Functions** - -| Command Format | Function | -| --------------------------- | ------------------------------------------------------------ | -| **bwmcli -e** | Enables pod bandwidth management for a specified NIC.| -| **bwmcli -d** | Disables pod bandwidth management for a specified NIC.| -| **bwmcli -p devs** | Queries pod bandwidth management of all NICs on a node.| - -> Note: -> -> - If no NIC name is specified, the preceding commands take effect for all NICs on a node. -> -> - Enable pod bandwidth management before running other `bwmcli` commands. - - - -**Examples** - -- Enable pod bandwidth management for NICs eth0 and eth1. - - ```shell - # bwmcli -e eth0 -e eth1 - enable eth0 success - enable eth1 success - ``` - -- Disable pod bandwidth management for NICs eth0 and eth1. - - ```shell - # bwmcli -d eth0 -d eth1 - disable eth0 success - disable eth1 success - ``` - -- Query pod bandwidth management of all NICs on a node. - - ```shell - # bwmcli -p devs - eth0 : enabled - eth1 : disabled - eth2 : disabled - docker0 : disabled - lo : disabled - ``` - -#### Pod Network Priority - -**Commands and Functions** - -| Command Format | Function | -| ------------------------------------------------------------ | ------------------------------------------------------------ | -| **bwmcli -s** *path* *prio* | Sets the network priority of a pod. *path* indicates the cgroup path corresponding to the pod, and *prio* indicates the priority.
The value of *path* can be a relative path or an absolute path. The default value of *prio* is **0**. The optional values are **0** and **-1**. The value **0** indicates online services, and the value **-1** indicates offline services.| -| **bwmcli -p** *path* | Queries the network priority of a pod. | - -> Note: -> -> Online and offline network priorities are supported. The oncn-bwm tool controls the bandwidth of pods in real time based on the network priority. The specific policy is as follows: For online pods, the bandwidth is not limited. For offline pods, the bandwidth is limited within the offline bandwidth range. - -**Examples** - -- Set the priority of the pod whose cgroup path is **/sys/fs/cgroup/net_cls/test_online** to **0**. - - ```shell - # bwmcli -s /sys/fs/cgroup/net_cls/test_online 0 - set prio success - ``` - -- Query the priority of the pod whose cgroup path is **/sys/fs/cgroup/net_cls/test_online**. - - ```shell - # bwmcli -p /sys/fs/cgroup/net_cls/test_online - 0 - ``` - - - -#### Offline Service Bandwidth Range - -| Command Format | Function | -| ---------------------------------- | ------------------------------------------------------------ | -| **bwmcli -s bandwidth** *low*,*high* | Sets the offline bandwidth for a host or VM. **low** indicates the minimum bandwidth, and **high** indicates the maximum bandwidth. The unit is KB, MB, or GB, and the value range is [1 MB, 9999 GB].| -| **bwmcli -p bandwidth** | Queries the offline bandwidth of a host or VM. | - -> Note: -> -> - All NICs with pod bandwidth management enabled on a host are considered as a whole, that is, the configured online service waterline and offline service bandwidth range are shared. -> -> - The pod bandwidth configured using `bwmcli` takes effect for all offline services on a node. The total bandwidth of all offline services cannot exceed the bandwidth range configured for the offline services. There is no network bandwidth limit for online services.
->
-> - The offline service bandwidth range and online service waterline are used together to limit the offline service bandwidth. When the online service bandwidth is lower than the configured waterline, the offline services can use the configured maximum bandwidth. When the online service bandwidth is higher than the configured waterline, the offline services can use the configured minimum bandwidth.
-
-
-
-**Examples**
-
-- Set the offline bandwidth range to 30 MB to 100 MB.
-
-  ```shell
-  # bwmcli -s bandwidth 30mb,100mb
-  set bandwidth success
-  ```
-
-- Query the offline bandwidth range.
-
-  ```shell
-  # bwmcli -p bandwidth
-  bandwidth is 31457280(B),104857600(B)
-  ```
-
-
-
-
-#### Online Service Waterline
-
-**Commands and Functions**
-
-| Command Format                                 | Function                                                     |
-| ---------------------------------------------- | ------------------------------------------------------------ |
-| **bwmcli -s waterline** *val* | Sets the online service waterline for a host or VM. *val* indicates the waterline value. The unit is KB, MB, or GB, and the value range is [20 MB, 9999 GB].|
-| **bwmcli -p waterline** | Queries the online service waterline of a host or VM. |
-
-> Note:
->
-> - When the total bandwidth of all online services on a host is higher than the waterline, the bandwidth that can be used by offline services is limited. When the total bandwidth of all online services on a host is lower than the waterline, the bandwidth that can be used by offline services is increased.
-> - The system checks every 10 ms whether the total bandwidth of online services exceeds the configured waterline, and then determines the bandwidth limit for offline services based on whether the online bandwidth collected within each 10 ms period is higher than the waterline.
-
-**Examples**
-
-- Set the online service waterline to 20 MB.
-
-  ```shell
-  # bwmcli -s waterline 20mb
-  set waterline success
-  ```
-
-- Query the online service waterline. 
-
-  ```shell
-  # bwmcli -p waterline
-  waterline is 20971520(B)
-  ```
-
-
-
-#### Statistics
-
-**Commands and Functions**
-
-| Command Format      | Function           |
-| ------------------- | ------------------ |
-| **bwmcli -p stats** | Queries internal statistics.|
-
-
-> Note:
->
-> - **offline_target_bandwidth**: target bandwidth for offline services.
->
-> - **online_pkts**: total number of online service packets after pod bandwidth management is enabled.
->
-> - **offline_pkts**: total number of offline service packets after pod bandwidth management is enabled.
->
-> - **online_rate**: current online service rate.
->
-> - **offline_rate**: current offline service rate.
-
-
-**Examples**
-
-Query internal statistics.
-
-```shell
-# bwmcli -p stats
-offline_target_bandwidth: 2097152
-online_pkts: 2949775
-offline_pkts: 0
-online_rate: 602
-offline_rate: 0
-```
-
-
-
-
-
-### Typical Use Case
-
-To configure pod bandwidth management on a node, perform the following steps:
-
-```shell
-bwmcli -p devs                              # Query the pod bandwidth management status of the NICs in the system.
-bwmcli -e eth0                              # Enable pod bandwidth management for the eth0 NIC.
-bwmcli -s /sys/fs/cgroup/net_cls/online 0   # Set the network priority of the online service pod to 0.
-bwmcli -s /sys/fs/cgroup/net_cls/offline -1 # Set the network priority of the offline service pod to -1.
-bwmcli -s bandwidth 20mb,1gb                # Set the bandwidth range for offline services.
-bwmcli -s waterline 30mb                    # Set the waterline for online services. 
-``` diff --git a/docs/en/docs/rubik/example-of-isolation-for-hybrid-deployed-services.md b/docs/en/docs/rubik/example-of-isolation-for-hybrid-deployed-services.md deleted file mode 100644 index 669a51b6ab25409f1bdc10dbdebd0cd88d208453..0000000000000000000000000000000000000000 --- a/docs/en/docs/rubik/example-of-isolation-for-hybrid-deployed-services.md +++ /dev/null @@ -1,233 +0,0 @@ -## Example of Isolation for Hybrid Deployed Services - -### Environment Preparation - -Check whether the kernel supports isolation of hybrid deployed services. - -```bash -# Check whether isolation of hybrid deployed services is enabled in the /boot/config- system configuration. -# If CONFIG_QOS_SCHED=y, the function is enabled. Example: -cat /boot/config-5.10.0-60.18.0.50.oe2203.x86_64 | grep CONFIG_QOS -CONFIG_QOS_SCHED=y -``` - -Install the Docker engine. - -```bash -yum install -y docker-engine -docker version -# The following shows the output of docker version. -Client: - Version: 18.09.0 - EulerVersion: 18.09.0.300 - API version: 1.39 - Go version: go1.17.3 - Git commit: aa1eee8 - Built: Wed Mar 30 05:07:38 2022 - OS/Arch: linux/amd64 - Experimental: false - -Server: - Engine: - Version: 18.09.0 - EulerVersion: 18.09.0.300 - API version: 1.39 (minimum version 1.12) - Go version: go1.17.3 - Git commit: aa1eee8 - Built: Tue Mar 22 00:00:00 2022 - OS/Arch: linux/amd64 - Experimental: false -``` - -### Hybrid Deployed Services - -**Online Service ClickHouse** - -Use the clickhouse-benchmark tool to test the performance and collect statistics on performance metrics such as QPS, P50, P90, and P99. For details, see https://clickhouse.com/docs/en/operations/utilities/clickhouse-benchmark/. - -**Offline Service Stress** - -Stress is a CPU-intensive test tool. You can specify the **--cpu** option to start multiple concurrent CPU-intensive tasks to increase the stress on the system. - -### Usage Instructions - -1) Start a ClickHouse container (online service). 
-
-2) Access the container and run the **clickhouse-benchmark** command. Set the number of concurrent queries to **10**, the number of queries to **10000**, and the time limit to **30**.
-
-3) Start a Stress container (offline service) at the same time and concurrently execute 10 CPU-intensive tasks to increase the stress on the environment.
-
-4) After the **clickhouse-benchmark** command is executed, a performance test report is generated.
-
-The **test_demo.sh** script for the isolation test for hybrid deployed services is as follows:
-
-```bash
-#!/bin/bash
-
-with_offline=${1:-no_offline}
-enable_isolation=${2:-no_isolation}
-stress_num=${3:-10}
-concurrency=10
-timeout=30
-output=/tmp/result.json
-online_container=
-offline_container=
-
-exec_sql="echo \"SELECT * FROM system.numbers LIMIT 10000000 OFFSET 10000000\" | clickhouse-benchmark -i 10000 -c $concurrency -t $timeout"
-
-function prepare()
-{
-    echo "Launch clickhouse container."
-    online_container=$(docker run -itd \
-        -v /tmp:/tmp:rw \
-        --ulimit nofile=262144:262144 \
-        -p 34424:34424 \
-        yandex/clickhouse-server)
-
-    sleep 3
-    echo "Clickhouse container launched."
-}
-
-function clickhouse()
-{
-    echo "Start clickhouse benchmark test."
-    docker exec $online_container bash -c "$exec_sql --json $output"
-    echo "Clickhouse benchmark test done."
-}
-
-function stress()
-{
-    echo "Launch stress container."
-    offline_container=$(docker run -itd joedval/stress --cpu $stress_num)
-    echo "Stress container launched."
-
-    if [ $enable_isolation == "enable_isolation" ]; then
-        echo "Set stress container qos level to -1."
-        echo -1 > /sys/fs/cgroup/cpu/docker/$offline_container/cpu.qos_level
-    fi
-}
-
-function benchmark()
-{
-    if [ $with_offline == "with_offline" ]; then
-        stress
-        sleep 3
-    fi
-    clickhouse
-    echo "Remove test containers."
-    docker rm -f $online_container
-    docker rm -f $offline_container
-    echo "Finish benchmark test for clickhouse(online) and stress(offline) colocation." 
-    echo "===============================clickhouse benchmark=================================================="
-    cat $output
-    echo "===============================clickhouse benchmark=================================================="
-}
-
-prepare
-benchmark
-```
-
-### Test Results
-
-Independently execute the online service ClickHouse.
-
-```bash
-sh test_demo.sh no_offline no_isolation
-```
-
-The baseline QoS data (QPS/P50/P90/P99) of the online service is as follows:
-
-```json
-{
-"localhost:9000": {
-"statistics": {
-"QPS": 1.8853412284364512,
-......
-},
-"query_time_percentiles": {
-......
-"50": 0.484905256,
-"60": 0.519641313,
-"70": 0.570876148,
-"80": 0.632544937,
-"90": 0.728295525,
-"95": 0.808700418,
-"99": 0.873945121,
-......
-}
-}
-}
-```
-
-Execute the **test_demo.sh** script to start the offline service Stress and run the test with the isolation function disabled.
-
-```bash
-# **with_offline** indicates that the offline service Stress is enabled.
-# **no_isolation** indicates that isolation of hybrid deployed services is disabled.
-sh test_demo.sh with_offline no_isolation
-```
-
-**When isolation of hybrid deployed services is disabled**, the QoS data (QPS/P50/P90/P99) of the ClickHouse service is as follows:
-
-```json
-{
-"localhost:9000": {
-"statistics": {
-"QPS": 0.9424028693636205,
-......
-},
-"query_time_percentiles": {
-......
-"50": 0.840476774,
-"60": 1.304607373,
-"70": 1.393591017,
-"80": 1.41277543,
-"90": 1.430316688,
-"95": 1.457534764,
-"99": 1.555646855,
-......
-}
-}
-}
-```
-
-Execute the **test_demo.sh** script to start the offline service Stress and run the test with the isolation function enabled.
-
-```bash
-# **with_offline** indicates that the offline service Stress is enabled.
-# **enable_isolation** indicates that isolation of hybrid deployed services is enabled. 
-sh test_demo.sh with_offline enable_isolation
-```
-
-**When isolation of hybrid deployed services is enabled**, the QoS data (QPS/P50/P90/P99) of the ClickHouse service is as follows:
-
-```json
-{
-"localhost:9000": {
-"statistics": {
-"QPS": 1.8825798759270718,
-......
-},
-"query_time_percentiles": {
-......
-"50": 0.485725185,
-"60": 0.512629901,
-"70": 0.55656488,
-"80": 0.636395956,
-"90": 0.734695906,
-"95": 0.804118275,
-"99": 0.887807409,
-......
-}
-}
-}
-```
-
-The following table lists the test results.
-
-| Service Deployment Mode                  | QPS           | P50           | P90           | P99           |
-| ---------------------------------------- | ------------- | ------------- | ------------- | ------------- |
-| ClickHouse (baseline)                    | 1.885         | 0.485         | 0.728         | 0.874         |
-| ClickHouse + Stress (isolation disabled) | 0.942 (-50%)  | 0.840 (-42%)  | 1.430 (-49%)  | 1.556 (-44%)  |
-| ClickHouse + Stress (isolation enabled)  | 1.883 (-0.11%) | 0.486 (-0.21%) | 0.735 (-0.96%) | 0.888 (-1.58%) |
-
-When isolation of hybrid deployed services is disabled, the QPS of ClickHouse decreases from approximately 1.9 to 0.9, the service response delay (P90) increases from approximately 0.7s to 1.4s, and the QoS decreases by about 50%. When isolation of hybrid deployed services is enabled, the QPS and response delay (P50/P90/P99) of ClickHouse decrease by less than 2% compared with the baseline, and the QoS remains unchanged. 
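The percentage columns in the table above can be reproduced directly from the measured QPS values. A minimal sketch using `awk` (the three input values are the figures from the test output above; the latency columns are computed analogously):

```shell
# Recompute the QPS deltas of the results table from the measured values.
baseline=1.885    # ClickHouse alone
disabled=0.942    # with Stress, isolation disabled
enabled=1.883     # with Stress, isolation enabled

awk -v b="$baseline" -v d="$disabled" -v e="$enabled" 'BEGIN {
    # Relative change against the baseline, as printed in the table.
    printf "isolation disabled: %.3f (%+.0f%%)\n", d, (d - b) / b * 100
    printf "isolation enabled:  %.3f (%+.2f%%)\n", e, (e - b) / b * 100
}'
# isolation disabled: 0.942 (-50%)
# isolation enabled:  1.883 (-0.11%)
```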
diff --git a/docs/en/docs/rubik/figures/icon-note.gif b/docs/en/docs/rubik/figures/icon-note.gif deleted file mode 100644 index 6314297e45c1de184204098efd4814d6dc8b1cda..0000000000000000000000000000000000000000 Binary files a/docs/en/docs/rubik/figures/icon-note.gif and /dev/null differ diff --git a/docs/en/docs/rubik/http-apis.md b/docs/en/docs/rubik/http-apis.md deleted file mode 100644 index f7f4752051ad6b91035d343afda0d89475546edc..0000000000000000000000000000000000000000 --- a/docs/en/docs/rubik/http-apis.md +++ /dev/null @@ -1,67 +0,0 @@ -# HTTP APIs - -## Overview - -The open APIs of Rubik are all HTTP APIs, including the API for setting or updating the pod priority, API for detecting the Rubik availability, and API for querying the Rubik version. - -## APIs - -### API for Setting or Updating the Pod Priority - -Rubik provides the function of setting or updating the pod priority. External systems can call this API to send pod information. Rubik sets the priority based on the received pod information to isolate resources. The API call format is as follows: - -```bash -HTTP POST /run/rubik/rubik.sock -{ - "Pods": { - "podaaa": { - "CgroupPath": "kubepods/burstable/podaaa", - "QosLevel": 0 - }, - "podbbb": { - "CgroupPath": "kubepods/burstable/podbbb", - "QosLevel": -1 - } - } -} -``` - -In the **Pods** settings, specify information about the pods whose priorities need to be set or updated. At least one pod must be specified for each HTTP request, and **CgroupPath** and **QosLevel** must be specified for each pod. The meanings of **CgroupPath** and **QosLevel** are as follows: - -| Item | Value Type| Value Range| Description | -| ---------- | ---------- | ------------ | ------------------------------------------------------- | -| QosLevel | Integer | 0, -1 | pod priority. The value **0** indicates that the service is an online service, and the value **-1** indicates that the service is an offline service. 
| -| CgroupPath | String | Relative path | cgroup subpath of the pod (relative path in the cgroup subsystem)| - -The following is an example of calling the API: - -```sh -curl -v -H "Accept: application/json" -H "Content-type: application/json" -X POST --data '{"Pods": {"podaaa": {"CgroupPath": "kubepods/burstable/podaaa","QosLevel": 0},"podbbb": {"CgroupPath": "kubepods/burstable/podbbb","QosLevel": -1}}}' --unix-socket /run/rubik/rubik.sock http://localhost/ -``` - -### API for Detecting Availability - -As an HTTP service, Rubik provides an API for detecting whether it is running. - -API format: HTTP/GET /ping - -The following is an example of calling the API: - -```sh -curl -XGET --unix-socket /run/rubik/rubik.sock http://localhost/ping -``` - -If **ok** is returned, the Rubik service is running. - -### API for Querying Version Information - -Rubik allows you to query the Rubik version number through an HTTP request. - -API format: HTTP/GET /version - -The following is an example of calling the API: - -```sh -curl -XGET --unix-socket /run/rubik/rubik.sock http://localhost/version -{"Version":"0.0.1","Release":"1","Commit":"29910e6","BuildTime":"2021-05-12"} -``` diff --git a/docs/en/docs/rubik/installation-and-deployment.md b/docs/en/docs/rubik/installation-and-deployment.md deleted file mode 100644 index 2d19687826b9189b541aa7f11c15b101c53a7498..0000000000000000000000000000000000000000 --- a/docs/en/docs/rubik/installation-and-deployment.md +++ /dev/null @@ -1,199 +0,0 @@ -# Installation and Deployment - -## Overview - -This chapter describes how to install and deploy the Rubik component. - -## Software and Hardware Requirements - -### Hardware - -* Architecture: x86 or AArch64 -* Drive: 1 GB or more -* Memory: 100 MB or more - -### Software - -* OS: openEuler 22.03-LTS -* Kernel: openEuler 22.03-LTS kernel - -### Environment Preparation - -* Install the openEuler OS. For details, see the _openEuler 22.03-LTS Installation Guide_. 
-* Install and deploy Kubernetes. For details, see the _Kubernetes Cluster Deployment Guide_. -* Install the Docker or iSulad container engine. If the iSulad container engine is used, you need to install the isula-build container image building tool. - -## Installing Rubik - -Rubik is deployed on each Kubernetes node as a DaemonSet. Therefore, you need to perform the following steps to install the Rubik RPM package on each node. - -1. Configure the Yum repositories openEuler 22.03-LTS and openEuler 22.03-LTS:EPOL (the Rubik component is available only in the EPOL repository). - - ``` - # openEuler 22.03-LTS official repository - name=openEuler22.03 - baseurl=https://repo.openeuler.org/openEuler-22.03-LTS/everything/$basearch/ - enabled=1 - gpgcheck=1 - gpgkey=https://repo.openeuler.org/openEuler-22.03-LTS/everything/$basearch/RPM-GPG-KEY-openEuler - ``` - - ``` - # openEuler 22.03-LTS:EPOL official repository - name=Epol - baseurl=https://repo.openeuler.org/openEuler-22.03-LTS/EPOL/$basearch/ - enabled=1 - gpgcheck=0 - ``` - -2. Install Rubik with **root** permissions. - - ```shell - sudo yum install -y rubik - ``` - - -> ![](./figures/icon-note.gif)**Note**: -> -> Files related to Rubik are installed in the **/var/lib/rubik** directory. - -## Deploying Rubik - -Rubik runs as a container in a Kubernetes cluster in hybrid deployment scenarios. It is used to isolate and restrict resources for services with different priorities to prevent offline services from interfering with online services, improving the overall resource utilization and ensuring the quality of online services. Currently, Rubik supports isolation and restriction of CPU and memory resources, and must be used together with the openEuler 22.03-LTS kernel. To enable or disable the memory priority feature (that is, memory tiering for services with different priorities), you need to set the value in the **/proc/sys/vm/memcg_qos_enable** file. The value can be **0** or **1**. 
The default value **0** indicates that the feature is disabled, and the value **1** indicates that the feature is enabled.
-
-```bash
-echo 1 | sudo tee /proc/sys/vm/memcg_qos_enable
-```
-
-### Deploying Rubik DaemonSet
-
-1. Use the Docker or isula-build engine to build Rubik images. Because Rubik is deployed as a DaemonSet, each node requires a Rubik image. After building an image on a node, use the **docker save** and **docker load** commands to load the Rubik image to each node of Kubernetes. Alternatively, build a Rubik image on each node. The following uses isula-build as an example. The command is as follows:
-
-   ```sh
-   isula-build ctr-img build -f /var/lib/rubik/Dockerfile --tag rubik:0.1.0 .
-   ```
-
-2. On the Kubernetes master node, change the Rubik image name in the **/var/lib/rubik/rubik-daemonset.yaml** file to the name of the image built in the previous step.
-
-   ```yaml
-   ...
-   containers:
-   - name: rubik-agent
-     image: rubik:0.1.0           # The image name must be the same as the Rubik image name built in the previous step.
-     imagePullPolicy: IfNotPresent
-   ...
-   ```
-
-3. On the Kubernetes master node, run the **kubectl** command to deploy the Rubik DaemonSet so that Rubik will be automatically deployed on all Kubernetes nodes.
-
-   ```sh
-   kubectl apply -f /var/lib/rubik/rubik-daemonset.yaml
-   ```
-
-4. Run the **kubectl get pods -A** command to check whether Rubik has been deployed on each node in the cluster. (The number of rubik-agents is the same as the number of nodes and all rubik-agents are in the Running status.)
-
-   ```sh
-   [root@localhost rubik]# kubectl get pods -A
-   NAMESPACE     NAME                READY   STATUS    RESTARTS   AGE
-   ...
-   kube-system   rubik-agent-76ft6   1/1     Running   0          4s
-   ...
-   ```
-
-## Common Configuration Description
-
-The Rubik deployed using the preceding method is started with the default configurations. 
You can modify the Rubik configurations as required by modifying the **config.json** section in the **rubik-daemonset.yaml** file and then redeploy the Rubik DaemonSet. - -This section describes common configurations in **config.json**. - -### Configuration Item Description - -```yaml -# The configuration items are in the config.json section of the rubik-daemonset.yaml file. -{ - "autoConfig": true, - "autoCheck": false, - "logDriver": "stdio", - "logDir": "/var/log/rubik", - "logSize": 1024, - "logLevel": "info", - "cgroupRoot": "/sys/fs/cgroup" -} -``` - -| Item | Value Type| Value Range | Description | -| ---------- | ---------- | ------------------ | ------------------------------------------------------------ | -| autoConfig | Boolean | **true** or **false** | **true**: enables automatic pod awareness.
**false**: disables automatic pod awareness.| -| autoCheck | Boolean | **true** or **false** | **true**: enables pod priority check.
**false**: disables pod priority check.| -| logDriver | String | **stdio** or **file** | **stdio**: prints logs to the standard output. The scheduling platform collects and dumps logs.
**file**: prints files to the log directory specified by **logDir**.| -| logDir | String | Absolute path | Directory for storing logs. | -| logSize | Integer | [10,1048576] | Total size of logs, in MB. If the total size of logs reaches the upper limit, the earliest logs will be discarded.| -| logLevel | String | **error**, **info**, or **debug**| Log level. | -| cgroupRoot | String | Absolute path | cgroup mount point. | - -### Automatic Configuration of Pod Priorities - -If **autoConfig** is set to **true** in **config.json** to enable automatic pod awareness, you only need to specify the priority using annotations in the YAML file when deploying the service pods. After being deployed successfully, Rubik automatically detects the creation and update of the pods on the current node, and sets the pod priorities based on the configured priorities. - -### Pod Priority Configuration Depending on kubelet - -Automatic pod priority configuration depends on the pod creation event notifications from the API server, which have a certain delay. The pod priority cannot be configured before the process is started. As a result, the service performance may fluctuate. To avoid this problem, you can disable the automatic priority configuration option and modify the kubelet source code, so that pod priorities can be configured using Rubik HTTP APIs after the cgroup of each container is created and before each container process is started. For details about how to use the HTTP APIs, see [HTTP APIs](./http-apis.md). - -### Automatic Verification of Pod Priorities - -Rubik supports consistency check on the pod QoS priority configurations of the current node during startup. It checks whether the configuration in the Kubernetes cluster is consistent with the pod priority configuration of Rubik. This function is disabled by default. You can enable or disable it using the **autoCheck** option. 
If this function is enabled, Rubik automatically verifies and corrects the pod priority configuration of the current node when it is started or restarted.
-
-## Configuring Rubik for Online and Offline Services
-
-After Rubik is successfully deployed, you can modify the YAML file of a service to specify the service type based on the following configuration example. Then Rubik can configure the priority of the service after it is deployed to isolate resources.
-
-The following is an example of deploying an online Nginx service:
-
-```yaml
-apiVersion: v1
-kind: Pod
-metadata:
-  name: nginx
-  namespace: qosexample
-  annotations:
-    volcano.sh/preemptable: "false"   # If volcano.sh/preemptable is set to true, the service is an offline service. If it is set to false, the service is an online service. The default value is false.
-spec:
-  containers:
-  - name: nginx
-    image: nginx
-    resources:
-      limits:
-        memory: "200Mi"
-        cpu: "1"
-      requests:
-        memory: "200Mi"
-        cpu: "1"
-```
-
-## Restrictions
-
-- The maximum number of concurrent HTTP requests that Rubik can receive is 1,000 QPS. If the number of concurrent HTTP requests exceeds the upper limit, an error is reported.
-
-- The maximum number of pods in a single request received by Rubik is 100. If the number of pods exceeds the upper limit, an error is reported.
-
-- Only one set of Rubik instances can be deployed on each Kubernetes node. Multiple sets of Rubik instances may conflict with each other.
-
-- Rubik does not provide port access and can communicate only through sockets.
-
-- Rubik accepts only valid HTTP request paths and network protocols: http://localhost/ (POST), http://localhost/ping (GET), and http://localhost/version (GET). For details about the functions of HTTP requests, see [HTTP APIs](./http-apis.md).
-
-- Rubik drive requirement: 1 GB or more.
-
-- Rubik memory requirement: 100 MB or more.
-
-- Services cannot be switched from a low priority (offline services) to a high priority (online services). 
For example, if service A is set to an offline service and then to an online service, Rubik reports an error.
-
-- When directories are mounted to a Rubik container, the minimum permission on the Rubik local socket directory **/run/rubik** is **700** on the service side.
-
-- When the Rubik service is available, the timeout interval of a single request is 120s. If the Rubik process enters the T (stopped or being traced) or D (uninterruptible sleep) state, the service becomes unavailable. In this case, the Rubik service does not respond to any request. To avoid this problem, set the timeout interval on the client to avoid infinite waiting.
-
-- If hybrid deployment is used, the original CPU share function of cgroup has the following restrictions:
-
-  - If both online and offline tasks are running on the CPU, the CPU share configuration of offline tasks does not take effect.
-
-  - If the current CPU has only online or offline tasks, the CPU share configuration takes effect.
diff --git a/docs/en/docs/rubik/overview.md b/docs/en/docs/rubik/overview.md
deleted file mode 100644
index 7c9aa04a502613ea7eb83fff57430a096ee1e232..0000000000000000000000000000000000000000
--- a/docs/en/docs/rubik/overview.md
+++ /dev/null
@@ -1,17 +0,0 @@
-# Rubik User Guide
-
-## Overview
-
-Low server resource utilization has always been a recognized challenge in the industry. With the development of cloud native technologies, hybrid deployment of online (high-priority) and offline (low-priority) services becomes an effective means to improve resource utilization.
-
-In hybrid service deployment scenarios, Rubik can properly schedule resources based on Quality of Service (QoS) levels to greatly improve resource utilization while ensuring the quality of online services. 
-
-Rubik supports the following features:
-
-- Pod CPU priority configuration
-- Pod memory priority configuration
-
-This document is intended for community developers, open source enthusiasts, and partners who use the openEuler system and want to learn and use Rubik. Users must:
-
-* Know basic Linux operations.
-* Be familiar with basic operations of Kubernetes and Docker/iSulad.
diff --git a/docs/en/docs/thirdparty_migration/OpenStack-train.md b/docs/en/docs/thirdparty_migration/OpenStack-train.md
deleted file mode 100644
index 7ad42a2867c6e95a01ee866f7d271e663b43335c..0000000000000000000000000000000000000000
--- a/docs/en/docs/thirdparty_migration/OpenStack-train.md
+++ /dev/null
@@ -1,2961 +0,0 @@
-# OpenStack-Train Deployment Guide
-
-
-
-- [OpenStack-Train Deployment Guide](#openstack-train-deployment-guide)
-  - [OpenStack](#openstack)
-  - [Conventions](#conventions)
-  - [Preparing the Environment](#preparing-the-environment)
-    - [Environment Configuration](#environment-configuration)
-    - [Installing the SQL Database](#installing-the-sql-database)
-    - [Installing RabbitMQ](#installing-rabbitmq)
-    - [Installing Memcached](#installing-memcached)
-  - [OpenStack Installation](#openstack-installation)
-    - [Installing Keystone](#installing-keystone)
-    - [Installing Glance](#installing-glance)
-    - [Installing Placement](#installing-placement)
-    - [Installing Nova](#installing-nova)
-    - [Installing Neutron](#installing-neutron)
-    - [Installing Cinder](#installing-cinder)
-    - [Installing Horizon](#installing-horizon)
-    - [Installing Tempest](#installing-tempest)
-    - [Installing Ironic](#installing-ironic)
-    - [Installing Kolla](#installing-kolla)
-    - [Installing Trove](#installing-trove)
-    - [Installing Swift](#installing-swift)
-    - [Installing Cyborg](#installing-cyborg)
-    - [Installing Aodh](#installing-aodh)
-    - [Installing Gnocchi](#installing-gnocchi)
-    - [Installing Ceilometer](#installing-ceilometer)
-    - [Installing Heat](#installing-heat)
-  - 
[OpenStack Quick Installation](#openstack-quick-installation) - - -## OpenStack - -OpenStack is an open source cloud computing infrastructure software project developed by the community. It provides an operating platform or tool set for deploying the cloud, offering scalable and flexible cloud computing for organizations. - -As an open source cloud computing management platform, OpenStack consists of several major components, such as Nova, Cinder, Neutron, Glance, Keystone, and Horizon. OpenStack supports almost all cloud environments. The project aims to provide a cloud computing management platform that is easy-to-use, scalable, unified, and standardized. OpenStack provides an infrastructure as a service (IaaS) solution that combines complementary services, each of which provides an API for integration. - -The official source of openEuler 22.03-LTS now supports OpenStack Train. You can configure the Yum source then deploy OpenStack by following the instructions of this document. - -## Conventions - -OpenStack supports multiple deployment modes. This document includes two deployment modes: **All in One** and **Distributed**. The conventions are as follows: - -**All in One** mode: - -```text -Ignores all possible suffixes. -``` - -**Distributed** mode: - -```text -A suffix of (CTL) indicates that the configuration or command applies only to the control node. -A suffix of (CPT) indicates that the configuration or command applies only to the compute node. -A suffix of (STG) indicates that the configuration or command applies only to the storage node. -In other cases, the configuration or command applies to both the control node and compute node. -``` - -***Note*** - -The services involved in the preceding conventions are as follows: - -- Cinder -- Nova -- Neutron - -## Preparing the Environment - -### Environment Configuration - -1. Start the OpenStack Train Yum source. 
-
-   ```shell
-   yum update
-   yum install openstack-release-train
-   yum clean all && yum makecache
-   ```
-
-   **Note**: Enable the EPOL repository for the Yum source if it is not enabled already.
-
-   ```shell
-   vi /etc/yum.repos.d/openEuler.repo
-
-   [EPOL]
-   name=EPOL
-   baseurl=http://repo.openeuler.org/openEuler-22.03-LTS/EPOL/main/$basearch/
-   enabled=1
-   gpgcheck=1
-   gpgkey=http://repo.openeuler.org/openEuler-22.03-LTS/OS/$basearch/RPM-GPG-KEY-openEuler
-   ```
-
-2. Change the host name and mapping.
-
-   Set the host name of each node:
-
-   ```shell
-   hostnamectl set-hostname controller                  (CTL)
-   hostnamectl set-hostname compute                     (CPT)
-   ```
-
-   Assuming the IP address of the controller node is **10.0.0.11** and the IP address of the compute node (if any) is **10.0.0.12**, add the following information to the **/etc/hosts** file:
-
-   ```shell
-   10.0.0.11   controller
-   10.0.0.12   compute
-   ```
-
-### Installing the SQL Database
-
-1. Run the following command to install the software package:
-
-   ```shell
-   yum install mariadb mariadb-server python3-PyMySQL
-   ```
-
-2. Run the following command to create and edit the **/etc/my.cnf.d/openstack.cnf** file:
-
-   ```shell
-   vim /etc/my.cnf.d/openstack.cnf
-
-   [mysqld]
-   bind-address = 10.0.0.11
-   default-storage-engine = innodb
-   innodb_file_per_table = on
-   max_connections = 4096
-   collation-server = utf8_general_ci
-   character-set-server = utf8
-   ```
-
-   ***Note***
-
-   **`bind-address` is set to the management IP address of the controller node.**
-
-3. Run the following commands to start the database service and configure it to automatically start upon system boot:
-
-   ```shell
-   systemctl enable mariadb.service
-   systemctl start mariadb.service
-   ```
-
-4. (Optional) Configure the default database password:
-
-   ```shell
-   mysql_secure_installation
-   ```
-
-   ***Note***
-
-   **Perform operations as prompted.**
-
-### Installing RabbitMQ
-
-1. 
Run the following command to install the software package: - - ```shell - yum install rabbitmq-server - ``` - -2. Start the RabbitMQ service and configure it to automatically start upon system boot: - - ```shell - systemctl enable rabbitmq-server.service - systemctl start rabbitmq-server.service - ``` - -3. Add the OpenStack user: - - ```shell - rabbitmqctl add_user openstack RABBIT_PASS - ``` - - ***Note*** - - **Replace *RABBIT_PASS* to set the password for the openstack user.** - -4. Run the following command to set the permission of the **openstack** user to allow the user to perform configuration, write, and read operations: - - ```shell - rabbitmqctl set_permissions openstack ".*" ".*" ".*" - ``` - -### Installing Memcached - -1. Run the following command to install the dependency package: - - ```shell - yum install memcached python3-memcached - ``` - -2. Open the **/etc/sysconfig/memcached** file in insert mode. - - ```shell - vim /etc/sysconfig/memcached - - OPTIONS="-l 127.0.0.1,::1,controller" - ``` - -3. Run the following command to start the Memcached service and configure it to automatically start upon system boot: - - ```shell - systemctl enable memcached.service - systemctl start memcached.service - ``` - - ***Note*** - - **After the service is started, you can run `memcached-tool controller stats` to ensure that the service is started properly and available. You can replace `controller` with the management IP address of the controller node.** - -## OpenStack Installation - -### Installing Keystone - -1. 
Create the **keystone** database and grant permissions:

    ```sql
    mysql -u root -p

    MariaDB [(none)]> CREATE DATABASE keystone;
    MariaDB [(none)]> GRANT ALL PRIVILEGES ON keystone.* TO 'keystone'@'localhost' \
    IDENTIFIED BY 'KEYSTONE_DBPASS';
    MariaDB [(none)]> GRANT ALL PRIVILEGES ON keystone.* TO 'keystone'@'%' \
    IDENTIFIED BY 'KEYSTONE_DBPASS';
    MariaDB [(none)]> exit
    ```

    ***Note***

    **Replace *KEYSTONE_DBPASS* with the password to be set for the keystone database.**

2. Install the software packages:

    ```shell
    yum install openstack-keystone httpd mod_wsgi
    ```

3. Configure Keystone:

    ```shell
    vim /etc/keystone/keystone.conf

    [database]
    connection = mysql+pymysql://keystone:KEYSTONE_DBPASS@controller/keystone

    [token]
    provider = fernet
    ```

    ***Description***

    In the **[database]** section, configure the database entry.

    In the **[token]** section, configure the token provider.

    ***Note***

    **Replace *KEYSTONE_DBPASS* with the password of the keystone database.**

4. Synchronize the database:

    ```shell
    su -s /bin/sh -c "keystone-manage db_sync" keystone
    ```

5. Initialize the Fernet keystore:

    ```shell
    keystone-manage fernet_setup --keystone-user keystone --keystone-group keystone
    keystone-manage credential_setup --keystone-user keystone --keystone-group keystone
    ```

6. Start the service:

    ```shell
    keystone-manage bootstrap --bootstrap-password ADMIN_PASS \
        --bootstrap-admin-url http://controller:5000/v3/ \
        --bootstrap-internal-url http://controller:5000/v3/ \
        --bootstrap-public-url http://controller:5000/v3/ \
        --bootstrap-region-id RegionOne
    ```

    ***Note***

    **Replace *ADMIN_PASS* with the password to be set for the admin user.**

7.
Configure the Apache HTTP server:

    ```shell
    vim /etc/httpd/conf/httpd.conf

    ServerName controller
    ```

    ```shell
    ln -s /usr/share/keystone/wsgi-keystone.conf /etc/httpd/conf.d/
    ```

    ***Description***

    Configure **ServerName** to use the controller node.

    ***Note***

    **If the ServerName item does not exist, create it.**

8. Start the Apache HTTP service:

    ```shell
    systemctl enable httpd.service
    systemctl start httpd.service
    ```

9. Create environment variables:

    ```shell
    cat << EOF >> ~/.admin-openrc
    export OS_PROJECT_DOMAIN_NAME=Default
    export OS_USER_DOMAIN_NAME=Default
    export OS_PROJECT_NAME=admin
    export OS_USERNAME=admin
    export OS_PASSWORD=ADMIN_PASS
    export OS_AUTH_URL=http://controller:5000/v3
    export OS_IDENTITY_API_VERSION=3
    export OS_IMAGE_API_VERSION=2
    EOF
    ```

    ***Note***

    **Replace *ADMIN_PASS* with the password of the admin user.**

10. Create domains, projects, users, and roles in sequence. python3-openstackclient must be installed first:

    ```shell
    yum install python3-openstackclient
    ```

    Import the environment variables:

    ```shell
    source ~/.admin-openrc
    ```

    Create the project **service**. The domain **default** has already been created during **keystone-manage bootstrap**.

    ```shell
    openstack domain create --description "An Example Domain" example
    ```

    ```shell
    openstack project create --domain default --description "Service Project" service
    ```

    Create the (non-admin) project **myproject**, user **myuser**, and role **myrole**, and add the role **myrole** to **myproject** and **myuser**.

    ```shell
    openstack project create --domain default --description "Demo Project" myproject
    openstack user create --domain default --password-prompt myuser
    openstack role create myrole
    openstack role add --project myproject --user myuser myrole
    ```

11. Perform the verification.

    Unset the temporary environment variables **OS_AUTH_URL** and **OS_PASSWORD**:
    ```shell
    source ~/.admin-openrc
    unset OS_AUTH_URL OS_PASSWORD
    ```

    Request a token for the **admin** user:

    ```shell
    openstack --os-auth-url http://controller:5000/v3 \
        --os-project-domain-name Default --os-user-domain-name Default \
        --os-project-name admin --os-username admin token issue
    ```

    Request a token for user **myuser**:

    ```shell
    openstack --os-auth-url http://controller:5000/v3 \
        --os-project-domain-name Default --os-user-domain-name Default \
        --os-project-name myproject --os-username myuser token issue
    ```

### Installing Glance

1. Create the database, service credentials, and API endpoints.

    Create the database:

    ```sql
    mysql -u root -p

    MariaDB [(none)]> CREATE DATABASE glance;
    MariaDB [(none)]> GRANT ALL PRIVILEGES ON glance.* TO 'glance'@'localhost' \
    IDENTIFIED BY 'GLANCE_DBPASS';
    MariaDB [(none)]> GRANT ALL PRIVILEGES ON glance.* TO 'glance'@'%' \
    IDENTIFIED BY 'GLANCE_DBPASS';
    MariaDB [(none)]> exit
    ```

    ***Note***

    **Replace *GLANCE_DBPASS* with the password to be set for the glance database.**

    Create the service credential:

    ```shell
    source ~/.admin-openrc

    openstack user create --domain default --password-prompt glance
    openstack role add --project service --user glance admin
    openstack service create --name glance --description "OpenStack Image" image
    ```

    Create the API endpoints for the image service:

    ```shell
    openstack endpoint create --region RegionOne image public http://controller:9292
    openstack endpoint create --region RegionOne image internal http://controller:9292
    openstack endpoint create --region RegionOne image admin http://controller:9292
    ```

2. Install the software package:

    ```shell
    yum install openstack-glance
    ```

3.
Configure Glance:

    ```shell
    vim /etc/glance/glance-api.conf

    [database]
    connection = mysql+pymysql://glance:GLANCE_DBPASS@controller/glance

    [keystone_authtoken]
    www_authenticate_uri = http://controller:5000
    auth_url = http://controller:5000
    memcached_servers = controller:11211
    auth_type = password
    project_domain_name = Default
    user_domain_name = Default
    project_name = service
    username = glance
    password = GLANCE_PASS

    [paste_deploy]
    flavor = keystone

    [glance_store]
    stores = file,http
    default_store = file
    filesystem_store_datadir = /var/lib/glance/images/
    ```

    ***Description***

    In the **[database]** section, configure the database entry.

    In the **[keystone_authtoken]** and **[paste_deploy]** sections, configure the identity authentication service entry.

    In the **[glance_store]** section, configure the local file system storage and the location of image files.

    ***Note***

    **Replace *GLANCE_DBPASS* with the password of the glance database.**

    **Replace *GLANCE_PASS* with the password of user glance.**

4. Synchronize the database:

    ```shell
    su -s /bin/sh -c "glance-manage db_sync" glance
    ```

5. Start the service:

    ```shell
    systemctl enable openstack-glance-api.service
    systemctl start openstack-glance-api.service
    ```

6. Perform the verification.

    Download the image:

    ```shell
    source ~/.admin-openrc

    wget http://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img
    ```

    ***Note***

    **If the Kunpeng architecture is used in your environment, download the AArch64 version of the image.
The cirros-0.5.2-aarch64-disk.img image file has been tested.**

    Upload the image to the image service:

    ```shell
    openstack image create --disk-format qcow2 --container-format bare \
        --file cirros-0.4.0-x86_64-disk.img --public cirros
    ```

    Confirm the image upload and verify the attributes:

    ```shell
    openstack image list
    ```

### Installing Placement

1. Create a database, service credentials, and API endpoints.

    Create a database.

    Access the database as the **root** user. Create the **placement** database, and grant permissions.

    ```sql
    mysql -u root -p

    MariaDB [(none)]> CREATE DATABASE placement;
    MariaDB [(none)]> GRANT ALL PRIVILEGES ON placement.* TO 'placement'@'localhost' \
    IDENTIFIED BY 'PLACEMENT_DBPASS';
    MariaDB [(none)]> GRANT ALL PRIVILEGES ON placement.* TO 'placement'@'%' \
    IDENTIFIED BY 'PLACEMENT_DBPASS';
    MariaDB [(none)]> exit
    ```

    ***Note***

    **Replace *PLACEMENT_DBPASS* with the password to be set for the placement database.**

    ```shell
    source ~/.admin-openrc
    ```

    Run the following commands to create the **placement** user, add the **admin** role to the user, and create the Placement API service:

    ```shell
    openstack user create --domain default --password-prompt placement
    openstack role add --project service --user placement admin
    openstack service create --name placement --description "Placement API" placement
    ```

    Create the API endpoints of the **placement** service:

    ```shell
    openstack endpoint create --region RegionOne placement public http://controller:8778
    openstack endpoint create --region RegionOne placement internal http://controller:8778
    openstack endpoint create --region RegionOne placement admin http://controller:8778
    ```

2. Perform the installation and configuration.
    Install the software package:

    ```shell
    yum install openstack-placement-api
    ```

    Configure Placement by editing the **/etc/placement/placement.conf** file:

    - In the **[placement_database]** section, configure the database entry.
    - In the **[api]** and **[keystone_authtoken]** sections, configure the identity authentication service entry.

    ```shell
    vim /etc/placement/placement.conf

    [placement_database]
    # ...
    connection = mysql+pymysql://placement:PLACEMENT_DBPASS@controller/placement

    [api]
    # ...
    auth_strategy = keystone

    [keystone_authtoken]
    # ...
    auth_url = http://controller:5000/v3
    memcached_servers = controller:11211
    auth_type = password
    project_domain_name = Default
    user_domain_name = Default
    project_name = service
    username = placement
    password = PLACEMENT_PASS
    ```

    Replace *PLACEMENT_DBPASS* with the password of the **placement** database, and replace *PLACEMENT_PASS* with the password of the **placement** user.

    Synchronize the database:

    ```shell
    su -s /bin/sh -c "placement-manage db sync" placement
    ```

    Restart the httpd service:

    ```shell
    systemctl restart httpd
    ```

3. Perform the verification.

    Run the following commands to check the status:

    ```shell
    source ~/.admin-openrc
    placement-status upgrade check
    ```

    Run the following commands to install osc-placement and list the available resource classes and traits:

    ```shell
    yum install python3-osc-placement
    openstack --os-placement-api-version 1.2 resource class list --sort-column name
    openstack --os-placement-api-version 1.6 trait list --sort-column name
    ```

### Installing Nova

1. Create a database, service credentials, and API endpoints.

    Create a database.
    ```sql
    mysql -u root -p                                              (CTL)

    MariaDB [(none)]> CREATE DATABASE nova_api;
    MariaDB [(none)]> CREATE DATABASE nova;
    MariaDB [(none)]> CREATE DATABASE nova_cell0;
    MariaDB [(none)]> GRANT ALL PRIVILEGES ON nova_api.* TO 'nova'@'localhost' \
    IDENTIFIED BY 'NOVA_DBPASS';
    MariaDB [(none)]> GRANT ALL PRIVILEGES ON nova_api.* TO 'nova'@'%' \
    IDENTIFIED BY 'NOVA_DBPASS';
    MariaDB [(none)]> GRANT ALL PRIVILEGES ON nova.* TO 'nova'@'localhost' \
    IDENTIFIED BY 'NOVA_DBPASS';
    MariaDB [(none)]> GRANT ALL PRIVILEGES ON nova.* TO 'nova'@'%' \
    IDENTIFIED BY 'NOVA_DBPASS';
    MariaDB [(none)]> GRANT ALL PRIVILEGES ON nova_cell0.* TO 'nova'@'localhost' \
    IDENTIFIED BY 'NOVA_DBPASS';
    MariaDB [(none)]> GRANT ALL PRIVILEGES ON nova_cell0.* TO 'nova'@'%' \
    IDENTIFIED BY 'NOVA_DBPASS';
    MariaDB [(none)]> exit
    ```

    ***Note***

    **Replace *NOVA_DBPASS* with the password to be set for the nova databases.**

    ```shell
    source ~/.admin-openrc                                        (CTL)
    ```

    Run the following commands to create the Nova service credentials:

    ```shell
    openstack user create --domain default --password-prompt nova (CTL)
    openstack role add --project service --user nova admin        (CTL)
    openstack service create --name nova --description "OpenStack Compute" compute (CTL)
    ```

    Create the Nova API endpoints:

    ```shell
    openstack endpoint create --region RegionOne compute public http://controller:8774/v2.1 (CTL)
    openstack endpoint create --region RegionOne compute internal http://controller:8774/v2.1 (CTL)
    openstack endpoint create --region RegionOne compute admin http://controller:8774/v2.1 (CTL)
    ```

2.
Install the software packages:

    ```shell
    yum install openstack-nova-api openstack-nova-conductor \     (CTL)
        openstack-nova-novncproxy openstack-nova-scheduler

    yum install openstack-nova-compute                            (CPT)
    ```

    ***Note***

    **If the ARM64 architecture is used, you also need to run the following command:**

    ```shell
    yum install edk2-aarch64                                      (CPT)
    ```

3. Configure Nova:

    ```shell
    vim /etc/nova/nova.conf

    [DEFAULT]
    enabled_apis = osapi_compute,metadata
    transport_url = rabbit://openstack:RABBIT_PASS@controller:5672/
    my_ip = 10.0.0.11
    use_neutron = true
    firewall_driver = nova.virt.firewall.NoopFirewallDriver
    compute_driver = libvirt.LibvirtDriver                        (CPT)
    instances_path = /var/lib/nova/instances/                     (CPT)
    lock_path = /var/lib/nova/tmp                                 (CPT)

    [api_database]
    connection = mysql+pymysql://nova:NOVA_DBPASS@controller/nova_api (CTL)

    [database]
    connection = mysql+pymysql://nova:NOVA_DBPASS@controller/nova (CTL)

    [api]
    auth_strategy = keystone

    [keystone_authtoken]
    www_authenticate_uri = http://controller:5000/
    auth_url = http://controller:5000/
    memcached_servers = controller:11211
    auth_type = password
    project_domain_name = Default
    user_domain_name = Default
    project_name = service
    username = nova
    password = NOVA_PASS

    [vnc]
    enabled = true
    server_listen = $my_ip
    server_proxyclient_address = $my_ip
    novncproxy_base_url = http://controller:6080/vnc_auto.html    (CPT)

    [glance]
    api_servers = http://controller:9292

    [oslo_concurrency]
    lock_path = /var/lib/nova/tmp                                 (CTL)

    [placement]
    region_name = RegionOne
    project_domain_name = Default
    project_name = service
    auth_type = password
    user_domain_name = Default
    auth_url = http://controller:5000/v3
    username = placement
    password = PLACEMENT_PASS

    [neutron]
    auth_url = http://controller:5000
    auth_type = password
    project_domain_name = default
    user_domain_name = default
    region_name = RegionOne
    project_name = service
    username = neutron
    password = NEUTRON_PASS
    service_metadata_proxy = true                                 (CTL)
    metadata_proxy_shared_secret = METADATA_SECRET                (CTL)
    ```

    ***Description***

    In the **[DEFAULT]** section, enable the compute and metadata APIs, configure the RabbitMQ message queue entry, configure **my_ip**, and enable the network service **neutron**.

    In the **[api_database]** and **[database]** sections, configure the database entry.

    In the **[api]** and **[keystone_authtoken]** sections, configure the identity service entry.

    In the **[vnc]** section, enable and configure the entry for the remote console.

    In the **[glance]** section, configure the API address for the image service.

    In the **[oslo_concurrency]** section, configure the lock path.

    In the **[placement]** section, configure the entry of the Placement service.

    ***Note***

    **Replace *RABBIT_PASS* with the password of the openstack user in RabbitMQ.**

    **Set *my_ip* to the management IP address of the controller node.**

    **Replace *NOVA_DBPASS* with the password of the nova database.**

    **Replace *NOVA_PASS* with the password of the nova user.**

    **Replace *PLACEMENT_PASS* with the password of the placement user.**

    **Replace *NEUTRON_PASS* with the password of the neutron user.**

    **Replace *METADATA_SECRET* with a proper metadata agent secret.**

    Others:

    Check whether VM hardware acceleration (x86 architecture) is supported:

    ```shell
    egrep -c '(vmx|svm)' /proc/cpuinfo                            (CPT)
    ```

    If the returned value is **0**, hardware acceleration is not supported, and you need to configure libvirt to use QEMU instead of KVM:

    ```shell
    vim /etc/nova/nova.conf                                       (CPT)

    [libvirt]
    virt_type = qemu
    ```

    If the returned value is **1** or larger, hardware acceleration is supported, and you can set **virt_type** to **kvm**.
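    The acceleration check above can be wrapped in a small helper so the same logic is reusable on any node. This is only an illustrative sketch: `choose_virt_type` is our hypothetical function, not part of OpenStack, and it reads a cpuinfo-style file so it can be tried outside a compute node.

    ```shell
    # Sketch: derive the [libvirt] virt_type value from CPU flags.
    # choose_virt_type is a hypothetical helper; on a real compute node
    # you would pass /proc/cpuinfo to it.
    choose_virt_type() {
        cpuinfo=$1
        if grep -E -q '(vmx|svm)' "$cpuinfo"; then
            echo kvm      # hardware acceleration available
        else
            echo qemu     # fall back to plain QEMU emulation
        fi
    }

    # Exercise the helper against a sample cpuinfo snippet.
    sample=$(mktemp)
    printf 'flags\t\t: fpu vmx sse sse2\n' > "$sample"
    choose_virt_type "$sample"    # prints "kvm" for this sample
    rm -f "$sample"
    ```

    On a compute node, `choose_virt_type /proc/cpuinfo` prints the value to place under **[libvirt]** in **/etc/nova/nova.conf**.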
    ***Note***

    **If the ARM64 architecture is used, you also need to run the following commands on the compute node:**

    ```shell
    mkdir -p /usr/share/AAVMF
    chown nova:nova /usr/share/AAVMF

    ln -s /usr/share/edk2/aarch64/QEMU_EFI-pflash.raw \
        /usr/share/AAVMF/AAVMF_CODE.fd
    ln -s /usr/share/edk2/aarch64/vars-template-pflash.raw \
        /usr/share/AAVMF/AAVMF_VARS.fd

    vim /etc/libvirt/qemu.conf

    nvram = ["/usr/share/AAVMF/AAVMF_CODE.fd: \
    /usr/share/AAVMF/AAVMF_VARS.fd", \
    "/usr/share/edk2/aarch64/QEMU_EFI-pflash.raw: \
    /usr/share/edk2/aarch64/vars-template-pflash.raw"]
    ```

    In addition, if the ARM deployment environment is itself nested virtualization, configure the **[libvirt]** section as follows:

    ```shell
    [libvirt]
    virt_type = qemu
    cpu_mode = custom
    cpu_model = cortex-a72
    ```

4. Synchronize the database.

    Run the following command to synchronize the **nova-api** database:

    ```shell
    su -s /bin/sh -c "nova-manage api_db sync" nova               (CTL)
    ```

    Run the following command to register the **cell0** database:

    ```shell
    su -s /bin/sh -c "nova-manage cell_v2 map_cell0" nova         (CTL)
    ```

    Create the **cell1** cell:

    ```shell
    su -s /bin/sh -c "nova-manage cell_v2 create_cell --name=cell1 --verbose" nova (CTL)
    ```

    Synchronize the **nova** database:

    ```shell
    su -s /bin/sh -c "nova-manage db sync" nova                   (CTL)
    ```

    Verify whether **cell0** and **cell1** are correctly registered:

    ```shell
    su -s /bin/sh -c "nova-manage cell_v2 list_cells" nova        (CTL)
    ```

    Add the compute node to the OpenStack cluster:

    ```shell
    su -s /bin/sh -c "nova-manage cell_v2 discover_hosts --verbose" nova (CPT)
    ```

5.
Start the services:

    ```shell
    systemctl enable \                                            (CTL)
        openstack-nova-api.service \
        openstack-nova-scheduler.service \
        openstack-nova-conductor.service \
        openstack-nova-novncproxy.service

    systemctl start \                                             (CTL)
        openstack-nova-api.service \
        openstack-nova-scheduler.service \
        openstack-nova-conductor.service \
        openstack-nova-novncproxy.service
    ```

    ```shell
    systemctl enable libvirtd.service openstack-nova-compute.service (CPT)
    systemctl start libvirtd.service openstack-nova-compute.service (CPT)
    ```

6. Perform the verification.

    ```shell
    source ~/.admin-openrc                                        (CTL)
    ```

    List the service components to verify that each process is successfully started and registered:

    ```shell
    openstack compute service list                                (CTL)
    ```

    List the API endpoints in the identity service to verify the connection to the identity service:

    ```shell
    openstack catalog list                                        (CTL)
    ```

    List the images in the image service to verify the connections:

    ```shell
    openstack image list                                          (CTL)
    ```

    Check whether the cells are running properly and whether other prerequisites are met:

    ```shell
    nova-status upgrade check                                     (CTL)
    ```

### Installing Neutron

1. Create the database, service credentials, and API endpoints.
    Create the database:

    ```sql
    mysql -u root -p                                              (CTL)

    MariaDB [(none)]> CREATE DATABASE neutron;
    MariaDB [(none)]> GRANT ALL PRIVILEGES ON neutron.* TO 'neutron'@'localhost' \
    IDENTIFIED BY 'NEUTRON_DBPASS';
    MariaDB [(none)]> GRANT ALL PRIVILEGES ON neutron.* TO 'neutron'@'%' \
    IDENTIFIED BY 'NEUTRON_DBPASS';
    MariaDB [(none)]> exit
    ```

    ***Note***

    **Replace *NEUTRON_DBPASS* with the password to be set for the neutron database.**

    ```shell
    source ~/.admin-openrc                                        (CTL)
    ```

    Create the **neutron** service credential:

    ```shell
    openstack user create --domain default --password-prompt neutron (CTL)
    openstack role add --project service --user neutron admin     (CTL)
    openstack service create --name neutron --description "OpenStack Networking" network (CTL)
    ```

    Create the API endpoints of the Neutron service:

    ```shell
    openstack endpoint create --region RegionOne network public http://controller:9696 (CTL)
    openstack endpoint create --region RegionOne network internal http://controller:9696 (CTL)
    openstack endpoint create --region RegionOne network admin http://controller:9696 (CTL)
    ```

2. Install the software packages:

    ```shell
    yum install openstack-neutron openstack-neutron-linuxbridge ebtables ipset \ (CTL)
        openstack-neutron-ml2
    ```

    ```shell
    yum install openstack-neutron-linuxbridge ebtables ipset      (CPT)
    ```

3. Configure Neutron.
    Set the main configuration items:

    ```shell
    vim /etc/neutron/neutron.conf

    [database]
    connection = mysql+pymysql://neutron:NEUTRON_DBPASS@controller/neutron (CTL)

    [DEFAULT]
    core_plugin = ml2                                             (CTL)
    service_plugins = router                                      (CTL)
    allow_overlapping_ips = true                                  (CTL)
    transport_url = rabbit://openstack:RABBIT_PASS@controller
    auth_strategy = keystone
    notify_nova_on_port_status_changes = true                     (CTL)
    notify_nova_on_port_data_changes = true                       (CTL)
    api_workers = 3                                               (CTL)

    [keystone_authtoken]
    www_authenticate_uri = http://controller:5000
    auth_url = http://controller:5000
    memcached_servers = controller:11211
    auth_type = password
    project_domain_name = Default
    user_domain_name = Default
    project_name = service
    username = neutron
    password = NEUTRON_PASS

    [nova]
    auth_url = http://controller:5000                             (CTL)
    auth_type = password                                          (CTL)
    project_domain_name = Default                                 (CTL)
    user_domain_name = Default                                    (CTL)
    region_name = RegionOne                                       (CTL)
    project_name = service                                        (CTL)
    username = nova                                               (CTL)
    password = NOVA_PASS                                          (CTL)

    [oslo_concurrency]
    lock_path = /var/lib/neutron/tmp
    ```

    ***Description***

    Configure the database entry in the **[database]** section.

    Enable the ML2 and router plugins, allow IP address overlapping, and configure the RabbitMQ message queue entry in the **[DEFAULT]** section.

    Configure the identity authentication service entry in the **[DEFAULT]** and **[keystone_authtoken]** sections.

    Enable the network to notify changes of the compute network topology in the **[DEFAULT]** and **[nova]** sections.

    Configure the lock path in the **[oslo_concurrency]** section.
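    The settings above can also be applied without opening vim; a minimal sketch using a hypothetical `ini_set` helper (relies on GNU `sed -i`), demonstrated on a scratch file so it is safe to try — *NEUTRON_DBPASS* stays a placeholder exactly as in the text:

    ```shell
    # Sketch: apply "key = value" settings to an INI-style config non-interactively.
    # ini_set is a hypothetical helper: it creates the section header if missing
    # and inserts the key directly under it.
    ini_set() {
        file=$1 section=$2 key=$3 value=$4
        grep -q "^\[$section\]" "$file" || printf '\n[%s]\n' "$section" >> "$file"
        sed -i "/^\[$section\]/a $key = $value" "$file"
    }

    # Demonstrate on a scratch file rather than the real /etc/neutron/neutron.conf.
    conf=$(mktemp)
    ini_set "$conf" database connection "mysql+pymysql://neutron:NEUTRON_DBPASS@controller/neutron"
    ini_set "$conf" DEFAULT core_plugin ml2
    ini_set "$conf" DEFAULT service_plugins router
    cat "$conf"
    ```

    The same pattern works for any of the INI-style files edited in this guide; point `file` at the real configuration path once the output looks right.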
    ***Note***

    **Replace *NEUTRON_DBPASS* with the password of the neutron database.**

    **Replace *RABBIT_PASS* with the password of the openstack user in RabbitMQ.**

    **Replace *NEUTRON_PASS* with the password of the neutron user.**

    **Replace *NOVA_PASS* with the password of the nova user.**

    Configure the ML2 plugin:

    ```shell
    vim /etc/neutron/plugins/ml2/ml2_conf.ini

    [ml2]
    type_drivers = flat,vlan,vxlan
    tenant_network_types = vxlan
    mechanism_drivers = linuxbridge,l2population
    extension_drivers = port_security

    [ml2_type_flat]
    flat_networks = provider

    [ml2_type_vxlan]
    vni_ranges = 1:1000

    [securitygroup]
    enable_ipset = true
    ```

    Create the symbolic link for **/etc/neutron/plugin.ini**:

    ```shell
    ln -s /etc/neutron/plugins/ml2/ml2_conf.ini /etc/neutron/plugin.ini
    ```

    ***Note***

    **Enable flat, VLAN, and VXLAN networks, enable the linuxbridge and l2population mechanisms, and enable the port security extension driver in the [ml2] section.**

    **Configure the flat network as the provider virtual network in the [ml2_type_flat] section.**

    **Configure the range of VXLAN network identifiers in the [ml2_type_vxlan] section.**

    **Enable ipset in the [securitygroup] section.**

    ***Remarks***

    **The actual L2 configuration can be modified as required. In this example, the provider network + linuxbridge is used.**

    Configure the Linux bridge agent:

    ```shell
    vim /etc/neutron/plugins/ml2/linuxbridge_agent.ini

    [linux_bridge]
    physical_interface_mappings = provider:PROVIDER_INTERFACE_NAME

    [vxlan]
    enable_vxlan = true
    local_ip = OVERLAY_INTERFACE_IP_ADDRESS
    l2_population = true

    [securitygroup]
    enable_security_group = true
    firewall_driver = neutron.agent.linux.iptables_firewall.IptablesFirewallDriver
    ```

    ***Description***

    Map the provider virtual network to the physical network interface in the **[linux_bridge]** section.
    Enable the VXLAN overlay network, configure the IP address of the physical network interface that handles the overlay network, and enable layer-2 population in the **[vxlan]** section.

    Enable the security group and configure the Linux bridge iptables firewall driver in the **[securitygroup]** section.

    ***Note***

    **Replace *PROVIDER_INTERFACE_NAME* with the name of the physical network interface.**

    **Replace *OVERLAY_INTERFACE_IP_ADDRESS* with the management IP address of the controller node.**

    Configure the Layer-3 agent:

    ```shell
    vim /etc/neutron/l3_agent.ini                                 (CTL)

    [DEFAULT]
    interface_driver = linuxbridge
    ```

    ***Description***

    Set the interface driver to linuxbridge in the **[DEFAULT]** section.

    Configure the DHCP agent:

    ```shell
    vim /etc/neutron/dhcp_agent.ini                               (CTL)

    [DEFAULT]
    interface_driver = linuxbridge
    dhcp_driver = neutron.agent.linux.dhcp.Dnsmasq
    enable_isolated_metadata = true
    ```

    ***Description***

    In the **[DEFAULT]** section, configure the linuxbridge interface driver and the Dnsmasq DHCP driver, and enable isolated metadata.

    Configure the metadata agent:

    ```shell
    vim /etc/neutron/metadata_agent.ini                           (CTL)

    [DEFAULT]
    nova_metadata_host = controller
    metadata_proxy_shared_secret = METADATA_SECRET
    ```

    ***Description***

    In the **[DEFAULT]** section, configure the metadata host and the shared secret.

    ***Note***

    **Replace *METADATA_SECRET* with a proper metadata agent secret.**

4.
Configure Nova:

    ```shell
    vim /etc/nova/nova.conf

    [neutron]
    auth_url = http://controller:5000
    auth_type = password
    project_domain_name = Default
    user_domain_name = Default
    region_name = RegionOne
    project_name = service
    username = neutron
    password = NEUTRON_PASS
    service_metadata_proxy = true                                 (CTL)
    metadata_proxy_shared_secret = METADATA_SECRET                (CTL)
    ```

    ***Description***

    In the **[neutron]** section, configure the access parameters, enable the metadata agent, and configure the secret.

    ***Note***

    **Replace *NEUTRON_PASS* with the password of the neutron user.**

    **Replace *METADATA_SECRET* with a proper metadata agent secret.**

5. Synchronize the database:

    ```shell
    su -s /bin/sh -c "neutron-db-manage --config-file /etc/neutron/neutron.conf \
        --config-file /etc/neutron/plugins/ml2/ml2_conf.ini upgrade head" neutron
    ```

6. Run the following command to restart the compute API service:

    ```shell
    systemctl restart openstack-nova-api.service
    ```

7. Start the network services:

    ```shell
    systemctl enable neutron-server.service neutron-linuxbridge-agent.service \ (CTL)
        neutron-dhcp-agent.service neutron-metadata-agent.service \
        neutron-l3-agent.service

    systemctl restart neutron-server.service neutron-linuxbridge-agent.service \ (CTL)
        neutron-dhcp-agent.service neutron-metadata-agent.service \
        neutron-l3-agent.service

    systemctl enable neutron-linuxbridge-agent.service            (CPT)
    systemctl restart neutron-linuxbridge-agent.service openstack-nova-compute.service (CPT)
    ```

8. Perform the verification.

    Run the following command to verify whether the Neutron agents are started successfully:

    ```shell
    openstack network agent list
    ```

### Installing Cinder

1. Create the database, service credentials, and API endpoints.
    Create the database:

    ```sql
    mysql -u root -p

    MariaDB [(none)]> CREATE DATABASE cinder;
    MariaDB [(none)]> GRANT ALL PRIVILEGES ON cinder.* TO 'cinder'@'localhost' \
    IDENTIFIED BY 'CINDER_DBPASS';
    MariaDB [(none)]> GRANT ALL PRIVILEGES ON cinder.* TO 'cinder'@'%' \
    IDENTIFIED BY 'CINDER_DBPASS';
    MariaDB [(none)]> exit
    ```

    ***Note***

    **Replace *CINDER_DBPASS* with the password to be set for the cinder database.**

    ```shell
    source ~/.admin-openrc
    ```

    Create the Cinder service credentials:

    ```shell
    openstack user create --domain default --password-prompt cinder
    openstack role add --project service --user cinder admin
    openstack service create --name cinderv2 --description "OpenStack Block Storage" volumev2
    openstack service create --name cinderv3 --description "OpenStack Block Storage" volumev3
    ```

    Create the API endpoints for the block storage service:

    ```shell
    openstack endpoint create --region RegionOne volumev2 public http://controller:8776/v2/%\(project_id\)s
    openstack endpoint create --region RegionOne volumev2 internal http://controller:8776/v2/%\(project_id\)s
    openstack endpoint create --region RegionOne volumev2 admin http://controller:8776/v2/%\(project_id\)s
    openstack endpoint create --region RegionOne volumev3 public http://controller:8776/v3/%\(project_id\)s
    openstack endpoint create --region RegionOne volumev3 internal http://controller:8776/v3/%\(project_id\)s
    openstack endpoint create --region RegionOne volumev3 admin http://controller:8776/v3/%\(project_id\)s
    ```

2. Install the software packages:

    ```shell
    yum install openstack-cinder-api openstack-cinder-scheduler   (CTL)
    ```

    ```shell
    yum install lvm2 device-mapper-persistent-data scsi-target-utils rpcbind nfs-utils \ (STG)
        openstack-cinder-volume openstack-cinder-backup
    ```

3. Prepare the storage devices.
The following is an example:

    ```shell
    pvcreate /dev/vdb
    vgcreate cinder-volumes /dev/vdb

    vim /etc/lvm/lvm.conf

    devices {
    ...
    filter = [ "a/vdb/", "r/.*/"]
    ```

    ***Description***

    In the **devices** section, add a filter to allow the **/dev/vdb** device and reject all other devices.

4. Prepare the NFS share:

    ```shell
    mkdir -p /root/cinder/backup

    cat << EOF >> /etc/exports
    /root/cinder/backup 192.168.1.0/24(rw,sync,no_root_squash,no_all_squash)
    EOF
    ```

5. Configure Cinder:

    ```shell
    vim /etc/cinder/cinder.conf

    [DEFAULT]
    transport_url = rabbit://openstack:RABBIT_PASS@controller
    auth_strategy = keystone
    my_ip = 10.0.0.11
    enabled_backends = lvm                                        (STG)
    backup_driver = cinder.backup.drivers.nfs.NFSBackupDriver     (STG)
    backup_share = HOST:PATH                                      (STG)

    [database]
    connection = mysql+pymysql://cinder:CINDER_DBPASS@controller/cinder

    [keystone_authtoken]
    www_authenticate_uri = http://controller:5000
    auth_url = http://controller:5000
    memcached_servers = controller:11211
    auth_type = password
    project_domain_name = Default
    user_domain_name = Default
    project_name = service
    username = cinder
    password = CINDER_PASS

    [oslo_concurrency]
    lock_path = /var/lib/cinder/tmp

    [lvm]
    volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver     (STG)
    volume_group = cinder-volumes                                 (STG)
    iscsi_protocol = iscsi                                        (STG)
    iscsi_helper = tgtadm                                         (STG)
    ```

    ***Description***

    In the **[database]** section, configure the database entry.

    In the **[DEFAULT]** section, configure the RabbitMQ message queue entry and **my_ip**.

    In the **[DEFAULT]** and **[keystone_authtoken]** sections, configure the identity authentication service entry.

    In the **[oslo_concurrency]** section, configure the lock path.
    ***Note***

    **Replace *CINDER_DBPASS* with the password of the cinder database.**

    **Replace *RABBIT_PASS* with the password of the openstack user in RabbitMQ.**

    **Set *my_ip* to the management IP address of the controller node.**

    **Replace *CINDER_PASS* with the password of the cinder user.**

    **Replace *HOST:PATH* with the host IP address and the shared path of the NFS.**

6. Synchronize the database:

    ```shell
    su -s /bin/sh -c "cinder-manage db sync" cinder               (CTL)
    ```

7. Configure Nova:

    ```shell
    vim /etc/nova/nova.conf                                       (CTL)

    [cinder]
    os_region_name = RegionOne
    ```

8. Restart the compute API service:

    ```shell
    systemctl restart openstack-nova-api.service
    ```

9. Start the Cinder services:

    ```shell
    systemctl enable openstack-cinder-api.service openstack-cinder-scheduler.service (CTL)
    systemctl start openstack-cinder-api.service openstack-cinder-scheduler.service (CTL)
    ```

    ```shell
    systemctl enable rpcbind.service nfs-server.service tgtd.service iscsid.service \ (STG)
        openstack-cinder-volume.service \
        openstack-cinder-backup.service
    systemctl start rpcbind.service nfs-server.service tgtd.service iscsid.service \ (STG)
        openstack-cinder-volume.service \
        openstack-cinder-backup.service
    ```

    ***Note***

    If the Cinder volumes are mounted using tgtadm, modify the **/etc/tgt/tgtd.conf** file as follows so that tgtd can discover the iSCSI targets of cinder-volume:

    ```shell
    include /var/lib/cinder/volumes/*
    ```

10. Perform the verification:

    ```shell
    source ~/.admin-openrc
    openstack volume service list
    ```

### Installing Horizon

1. Install the software package:

    ```shell
    yum install openstack-dashboard
    ```

2. Modify the file.
- - Modify the variables: - - ```text - vim /etc/openstack-dashboard/local_settings - - OPENSTACK_HOST = "controller" - ALLOWED_HOSTS = ['*', ] - - SESSION_ENGINE = 'django.contrib.sessions.backends.cache' - - CACHES = { - 'default': { - 'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache', - 'LOCATION': 'controller:11211', - } - } - - OPENSTACK_KEYSTONE_URL = "http://%s:5000/v3" % OPENSTACK_HOST - OPENSTACK_KEYSTONE_MULTIDOMAIN_SUPPORT = True - OPENSTACK_KEYSTONE_DEFAULT_DOMAIN = "Default" - OPENSTACK_KEYSTONE_DEFAULT_ROLE = "user" - - OPENSTACK_API_VERSIONS = { - "identity": 3, - "image": 2, - "volume": 3, - } - ``` - -3. Restart the httpd service: - - ```shell - systemctl restart httpd.service memcached.service - ``` - -4. Perform the verification. - Open the browser, enter in the address bar, and log in to Horizon. - - ***Note*** - - **Replace *HOSTIP* with the management plane IP address of the controller node.** - -### Installing Tempest - -Tempest is the integrated test service of OpenStack. If you need to run a fully automatic test of the functions of the installed OpenStack environment, you are advised to use Tempest. Otherwise, you can choose not to install it. - -1. Install Tempest: - - ```shell - yum install openstack-tempest - ``` - -2. Initialize the directory: - - ```shell - tempest init mytest - ``` - -3. Modify the configuration file: - - ```shell - cd mytest - vi etc/tempest.conf - ``` - - Configure the current OpenStack environment information in **tempest.conf**. For details, see the [official example](https://docs.openstack.org/tempest/latest/sampleconf.html). - -4. Perform the test: - - ```shell - tempest run - ``` - -5. (Optional) Install the tempest extensions. - The OpenStack services have provided some tempest test packages. You can install these packages to enrich the tempest test content. In Train, extension tests for Cinder, Glance, Keystone, Ironic and Trove are provided. 
You can run the following command to install and use them: - ``` - yum install python3-cinder-tempest-plugin python3-glance-tempest-plugin python3-ironic-tempest-plugin python3-keystone-tempest-plugin python3-trove-tempest-plugin - ``` - -### Installing Ironic - -Ironic is the bare metal service of OpenStack. If you need to deploy bare metal machines, Ironic is recommended. Otherwise, you can choose not to install it. - -1. Set the database. - - The bare metal service stores information in the database. Create a **ironic** database that can be accessed by the **ironic** user and replace **IRONIC_DBPASSWORD** with a proper password. - - ```sql - mysql -u root -p - - MariaDB [(none)]> CREATE DATABASE ironic CHARACTER SET utf8; - MariaDB [(none)]> GRANT ALL PRIVILEGES ON ironic.* TO 'ironic'@'localhost' \ - IDENTIFIED BY 'IRONIC_DBPASSWORD'; - MariaDB [(none)]> GRANT ALL PRIVILEGES ON ironic.* TO 'ironic'@'%' \ - IDENTIFIED BY 'IRONIC_DBPASSWORD'; - ``` - -2. Install the software packages. - - ```shell - yum install openstack-ironic-api openstack-ironic-conductor python3-ironicclient - ``` - - Start the services. - - ```shell - systemctl enable openstack-ironic-api openstack-ironic-conductor - systemctl start openstack-ironic-api openstack-ironic-conductor - ``` - -3. Create service user authentication. - - 1. Create the bare metal service user: - - ```shell - openstack user create --password IRONIC_PASSWORD \ - --email ironic@example.com ironic - openstack role add --project service --user ironic admin - openstack service create --name ironic \ - --description "Ironic baremetal provisioning service" baremetal - ``` - - 1. Create the bare metal service access entries: - - ```shell - openstack endpoint create --region RegionOne baremetal admin http://$IRONIC_NODE:6385 - openstack endpoint create --region RegionOne baremetal public http://$IRONIC_NODE:6385 - openstack endpoint create --region RegionOne baremetal internal http://$IRONIC_NODE:6385 - ``` - -4. 
Configure the ironic-api service. - - Configuration file path: **/etc/ironic/ironic.conf** - - 1. Use **connection** to configure the location of the database as follows. Replace **IRONIC_DBPASSWORD** with the password of user **ironic** and replace **DB_IP** with the IP address of the database server. - - ```shell - [database] - - # The SQLAlchemy connection string used to connect to the - # database (string value) - - connection = mysql+pymysql://ironic:IRONIC_DBPASSWORD@DB_IP/ironic - ``` - - 1. Configure the ironic-api service to use the RabbitMQ message broker. Replace **RPC_\*** with the detailed address and the credential of RabbitMQ. - - ```shell - [DEFAULT] - - # A URL representing the messaging driver to use and its full - # configuration. (string value) - - transport_url = rabbit://RPC_USER:RPC_PASSWORD@RPC_HOST:RPC_PORT/ - ``` - - You can also use json-rpc instead of RabbitMQ. - - 1. Configure the ironic-api service to use the credential of the identity authentication service. Replace **PUBLIC_IDENTITY_IP** with the public IP address of the identity authentication server and **PRIVATE_IDENTITY_IP** with the private IP address of the identity authentication server, replace **IRONIC_PASSWORD** with the password of the **ironic** user in the identity authentication service. - - ```shell - [DEFAULT] - - # Authentication strategy used by ironic-api: one of - # "keystone" or "noauth". "noauth" should not be used in a - # production environment because all authentication will be - # disabled. (string value) - - auth_strategy=keystone - - [keystone_authtoken] - # Authentication type to load (string value) - auth_type=password - # Complete public Identity API endpoint (string value) - www_authenticate_uri=http://PUBLIC_IDENTITY_IP:5000 - # Complete admin Identity API endpoint. (string value) - auth_url=http://PRIVATE_IDENTITY_IP:5000 - # Service username. (string value) - username=ironic - # Service account password. 
(string value) - password=IRONIC_PASSWORD - # Service tenant name. (string value) - project_name=service - # Domain name containing project (string value) - project_domain_name=Default - # User's domain name (string value) - user_domain_name=Default - - ``` - - 1. Create the bare metal service database table: - - ```shell - ironic-dbsync --config-file /etc/ironic/ironic.conf create_schema - ``` - - 1. Restart the ironic-api service: - - ```shell - sudo systemctl restart openstack-ironic-api - ``` - -5. Configure the ironic-conductor service. - - 1. Replace **HOST_IP** with the IP address of the conductor host. - - ```shell - [DEFAULT] - - # IP address of this host. If unset, will determine the IP - # programmatically. If unable to do so, will use "127.0.0.1". - # (string value) - - my_ip=HOST_IP - ``` - - 1. Specifies the location of the database. ironic-conductor must use the same configuration as ironic-api. Replace **IRONIC_DBPASSWORD** with the password of user **ironic** and replace **DB_IP** with the IP address of the database server. - - ```shell - [database] - - # The SQLAlchemy connection string to use to connect to the - # database. (string value) - - connection = mysql+pymysql://ironic:IRONIC_DBPASSWORD@DB_IP/ironic - ``` - - 1. Configure the ironic-api service to use the RabbitMQ message broker. ironic-conductor must use the same configuration as ironic-api. Replace **RPC_\*** with the detailed address and the credential of RabbitMQ. - - ```shell - [DEFAULT] - - # A URL representing the messaging driver to use and its full - # configuration. (string value) - - transport_url = rabbit://RPC_USER:RPC_PASSWORD@RPC_HOST:RPC_PORT/ - ``` - - You can also use json-rpc instead of RabbitMQ. - - 1. Configure the credentials to access other OpenStack services. - - To communicate with other OpenStack services, the bare metal service needs to use the service users to get authenticated by the OpenStack Identity service when requesting other services. 
The credentials of these users must be configured in each configuration file associated to the corresponding service. - - ```shell - [neutron] - Accessing the OpenStack network services. - [glance] - Accessing the OpenStack image service. - [swift] - Accessing the OpenStack object storage service. - [cinder] - Accessing the OpenStack block storage service. - [inspector] Accessing the OpenStack bare metal introspection service. - [service_catalog] - A special item to store the credential used by the bare metal service. The credential is used to discover the API URL endpoint registered in the OpenStack identity authentication service catalog by the bare metal service. - ``` - - For simplicity, you can use one service user for all services. For backward compatibility, the user name must be the same as that configured in [keystone_authtoken] of the ironic-api service. However, this is not mandatory. You can also create and configure a different service user for each service. - - In the following example, the authentication information for the user to access the OpenStack network service is configured as follows: - - ```shell - The network service is deployed in the identity authentication service domain named RegionOne. Only the public endpoint interface is registered in the service catalog. - - A specific CA SSL certificate is used for HTTPS connection when sending a request. - - The same service user as that configured for ironic-api. - - The dynamic password authentication plugin discovers a proper identity authentication service API version based on other options. 
- ``` - - ```shell - [neutron] - - # Authentication type to load (string value) - auth_type = password - # Authentication URL (string value) - auth_url=https://IDENTITY_IP:5000/ - # Username (string value) - username=ironic - # User's password (string value) - password=IRONIC_PASSWORD - # Project name to scope to (string value) - project_name=service - # Domain ID containing project (string value) - project_domain_id=default - # User's domain id (string value) - user_domain_id=default - # PEM encoded Certificate Authority to use when verifying - # HTTPs connections. (string value) - cafile=/opt/stack/data/ca-bundle.pem - # The default region_name for endpoint URL discovery. (string - # value) - region_name = RegionOne - # List of interfaces, in order of preference, for endpoint - # URL. (list value) - valid_interfaces=public - ``` - - By default, to communicate with other services, the bare metal service attempts to discover a proper endpoint of the service through the service catalog of the identity authentication service. If you want to use a different endpoint for a specific service, specify the endpoint_override option in the bare metal service configuration file. - - ```shell - [neutron] ... endpoint_override = - ``` - - 1. Configure the allowed drivers and hardware types. - - Set enabled_hardware_types to specify the hardware types that can be used by ironic-conductor: - - ```shell - [DEFAULT] enabled_hardware_types = ipmi - ``` - - Configure hardware interfaces: - - ```shell - enabled_boot_interfaces = pxe enabled_deploy_interfaces = direct,iscsi enabled_inspect_interfaces = inspector enabled_management_interfaces = ipmitool enabled_power_interfaces = ipmitool - ``` - - Configure the default value of the interface: - - ```shell - [DEFAULT] default_deploy_interface = direct default_network_interface = neutron - ``` - - If any driver that uses Direct Deploy is enabled, you must install and configure the Swift backend of the image service. 
    The Ceph object gateway (RADOS gateway) can also be used as the backend of the image service.

    1. Restart the ironic-conductor service:

        ```shell
        sudo systemctl restart openstack-ironic-conductor
        ```

6. Configure the httpd service.

    1. Create the root directory of the httpd used by Ironic, and set the owner and owner group. The directory path must be the same as the path specified by the **http_root** configuration item in the **[deploy]** group in **/etc/ironic/ironic.conf**.

        ```
        mkdir -p /var/lib/ironic/httproot
        chown ironic.ironic /var/lib/ironic/httproot
        ```

    2. Install and configure the httpd service.

        1. Install the httpd service. If the httpd service is already installed, skip this step.

            ```
            yum install httpd -y
            ```

        2. Create the **/etc/httpd/conf.d/openstack-ironic-httpd.conf** file. The file content is as follows:

            ```
            Listen 8080

            <VirtualHost *:8080>
                ServerName ironic.openeuler.com

                ErrorLog "/var/log/httpd/openstack-ironic-httpd-error_log"
                CustomLog "/var/log/httpd/openstack-ironic-httpd-access_log" "%h %l %u %t \"%r\" %>s %b"

                DocumentRoot "/var/lib/ironic/httproot"
                <Directory "/var/lib/ironic/httproot">
                    Options Indexes FollowSymLinks
                    Require all granted
                </Directory>
                LogLevel warn
                AddDefaultCharset UTF-8
                EnableSendfile on
            </VirtualHost>
            ```

            The listening port must be the same as the port specified by **http_url** in the **[deploy]** section of **/etc/ironic/ironic.conf**.

        3. Restart the httpd service:

            ```
            systemctl restart httpd
            ```

7. Create the deploy ramdisk image.

    The ramdisk image of Train can be created using the ironic-python-agent service or the disk-image-builder tool, the latest ironic-python-agent-builder provided by the community, or other tools.
    To use the Train native tools, you need to install the corresponding software packages.
- - ```shell - yum install openstack-ironic-python-agent - or - yum install diskimage-builder - ``` - - For details, see the [official document](https://docs.openstack.org/ironic/queens/install/deploy-ramdisk.html). - - The following describes how to use the ironic-python-agent-builder to build the deploy image used by ironic. - - 1. Install ironic-python-agent-builder. - - 1. Install the tool: - - ```shell - pip install ironic-python-agent-builder - ``` - - 2. Modify the python interpreter in the following files: - - ```shell - /usr/bin/yum /usr/libexec/urlgrabber-ext-down - ``` - - 3. Install the other necessary tools: - - ```shell - yum install git - ``` - - **DIB** depends on the `semanage` command. Therefore, check whether the `semanage --help` command is available before creating an image. If the system displays a message indicating that the command is unavailable, install the command: - - ```shell - # Check which package needs to be installed. - [root@localhost ~]# yum provides /usr/sbin/semanage - Loaded plug-in: fastestmirror - Loading mirror speeds from cached hostfile - * base: mirror.vcu.edu - * extras: mirror.vcu.edu - * updates: mirror.math.princeton.edu - policycoreutils-python-2.5-34.el7.aarch64 : SELinux policy core python utilities - Source: base - Matching source: - File name: /usr/sbin/semanage - # Install. - [root@localhost ~]# yum install policycoreutils-python - ``` - - 2. Create the image. 
- - For Arm architecture, add the following information: - ```shell - export ARCH=aarch64 - ``` - - Basic usage: - - ```shell - usage: ironic-python-agent-builder [-h] [-r RELEASE] [-o OUTPUT] [-e ELEMENT] - [-b BRANCH] [-v] [--extra-args EXTRA_ARGS] - distribution - - positional arguments: - distribution Distribution to use - - optional arguments: - -h, --help show this help message and exit - -r RELEASE, --release RELEASE - Distribution release to use - -o OUTPUT, --output OUTPUT - Output base file name - -e ELEMENT, --element ELEMENT - Additional DIB element to use - -b BRANCH, --branch BRANCH - If set, override the branch that is used for ironic- - python-agent and requirements - -v, --verbose Enable verbose logging in diskimage-builder - --extra-args EXTRA_ARGS - Extra arguments to pass to diskimage-builder - ``` - - Example: - - ```shell - ironic-python-agent-builder centos -o /mnt/ironic-agent-ssh -b origin/stable/rocky - ``` - - 3. Allow SSH login. - - Initialize the environment variables and create the image: - - ```shell - export DIB_DEV_USER_USERNAME=ipa \ - export DIB_DEV_USER_PWDLESS_SUDO=yes \ - export DIB_DEV_USER_PASSWORD='123' - ironic-python-agent-builder centos -o /mnt/ironic-agent-ssh -b origin/stable/rocky -e selinux-permissive -e devuser - ``` - - 4. Specify the code repository. - - Initialize the corresponding environment variables and create the image: - - ```shell - # Specify the address and version of the repository. - DIB_REPOLOCATION_ironic_python_agent=git@172.20.2.149:liuzz/ironic-python-agent.git - DIB_REPOREF_ironic_python_agent=origin/develop - - # Clone code from Gerrit. - DIB_REPOLOCATION_ironic_python_agent=https://review.opendev.org/openstack/ironic-python-agent - DIB_REPOREF_ironic_python_agent=refs/changes/43/701043/1 - ``` - - Reference: [source-repositories](https://docs.openstack.org/diskimage-builder/latest/elements/source-repositories/README.html). 
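Because diskimage-builder silently falls back to its default repositories when these variables are absent, it can help to fail fast if they are not set in the current shell. A minimal sketch follows; the `check_dib_env` helper is hypothetical:

    ```shell
    # Sketch: fail fast when the diskimage-builder repo overrides are unset.
    # The two variable names are the ones used in the example above.
    check_dib_env() {
        for v in DIB_REPOLOCATION_ironic_python_agent DIB_REPOREF_ironic_python_agent; do
            if [ -z "$(eval echo "\$$v")" ]; then
                echo "missing: $v"
                return 1
            fi
        done
        echo "ok"
    }

    DIB_REPOLOCATION_ironic_python_agent=https://review.opendev.org/openstack/ironic-python-agent
    DIB_REPOREF_ironic_python_agent=origin/develop
    check_dib_env   # ok
    ```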
    The specified repository address and version have been verified to work.

    5. Notes

        The template of the PXE configuration file of the native OpenStack does not support the ARM64 architecture. You need to modify the native OpenStack code.

        In Train, Ironic provided by the community does not support booting from ARM 64-bit UEFI PXE. As a result, the format of the generated **grub.cfg** file (generally in **/tftpboot/**) is incorrect, causing a PXE boot failure. You need to modify the code logic for generating the **grub.cfg** file.

        The following TLS error is reported when Ironic sends a request to IPA to query the command execution status:

        By default, both IPA and Ironic of Train have TLS authentication enabled when sending requests to each other. Disable TLS authentication as described on the official website.

        1. Add **ipa-insecure=1** to the following configuration in the Ironic configuration file (**/etc/ironic/ironic.conf**):

            ```
            [agent]
            verify_ca = False

            [pxe]
            pxe_append_params = nofb nomodeset vga=normal coreos.autologin ipa-insecure=1
            ```

        2. Add the IPA configuration file **/etc/ironic_python_agent/ironic_python_agent.conf** to the ramdisk image and configure the TLS as follows (the **/etc/ironic_python_agent** directory must be created in advance):

            ```
            [DEFAULT]
            enable_auto_tls = False
            ```

            Set the permission:

            ```
            chown -R ipa.ipa /etc/ironic_python_agent/
            ```

        3. Modify the startup file of the IPA service and add the configuration file option.
            vim /usr/lib/systemd/system/ironic-python-agent.service

            ```
            [Unit]
            Description=Ironic Python Agent
            After=network-online.target

            [Service]
            ExecStartPre=/sbin/modprobe vfat
            ExecStart=/usr/local/bin/ironic-python-agent --config-file /etc/ironic_python_agent/ironic_python_agent.conf
            Restart=always
            RestartSec=30s

            [Install]
            WantedBy=multi-user.target
            ```

Other services such as ironic-inspector are also provided for OpenStack Train. Install the services based on site requirements.

### Installing Kolla

Kolla provides the OpenStack service with container-based deployment that is ready for the production environment.

The installation of Kolla is simple. You only need to install the corresponding RPM packages:

```
yum install openstack-kolla openstack-kolla-ansible
```

After the installation is complete, you can run commands such as `kolla-ansible`, `kolla-build`, `kolla-genpwd`, and `kolla-mergepwd` to create an image or deploy a container environment.

### Installing Trove

Trove is the database service of OpenStack. If you need to use the database service provided by OpenStack, Trove is recommended. Otherwise, you can choose not to install it.

1. Set the database.

    The database service stores information in the database. Create a **trove** database that can be accessed by the **trove** user and replace **TROVE_DBPASSWORD** with a proper password.

    ```sql
    mysql -u root -p

    MariaDB [(none)]> CREATE DATABASE trove CHARACTER SET utf8;
    MariaDB [(none)]> GRANT ALL PRIVILEGES ON trove.* TO 'trove'@'localhost' \
        IDENTIFIED BY 'TROVE_DBPASSWORD';
    MariaDB [(none)]> GRANT ALL PRIVILEGES ON trove.* TO 'trove'@'%' \
        IDENTIFIED BY 'TROVE_DBPASSWORD';
    ```

2. Create service user authentication.

    1. Create the **Trove** service user.
- - ```shell - openstack user create --domain default --password-prompt trove - openstack role add --project service --user trove admin - openstack service create --name trove --description "Database" database - ``` - **Description:** Replace *TROVE_PASSWORD* with the password of the **trove** user. - - 1. Create the **Database** service access entry - - ```shell - openstack endpoint create --region RegionOne database public http://controller:8779/v1.0/%\(tenant_id\)s - openstack endpoint create --region RegionOne database internal http://controller:8779/v1.0/%\(tenant_id\)s - openstack endpoint create --region RegionOne database admin http://controller:8779/v1.0/%\(tenant_id\)s - ``` - -3. Install and configure the **Trove** components. - - 1. Install the **Trove** package: - ```shell script - yum install openstack-trove python3-troveclient - ``` - - 2. Configure **trove.conf**: - ```shell script - vim /etc/trove/trove.conf - - [DEFAULT] - log_dir = /var/log/trove - trove_auth_url = http://controller:5000/ - nova_compute_url = http://controller:8774/v2 - cinder_url = http://controller:8776/v1 - swift_url = http://controller:8080/v1/AUTH_ - rpc_backend = rabbit - transport_url = rabbit://openstack:RABBIT_PASS@controller:5672 - auth_strategy = keystone - add_addresses = True - api_paste_config = /etc/trove/api-paste.ini - nova_proxy_admin_user = admin - nova_proxy_admin_pass = ADMIN_PASSWORD - nova_proxy_admin_tenant_name = service - taskmanager_manager = trove.taskmanager.manager.Manager - use_nova_server_config_drive = True - # Set these if using Neutron Networking - network_driver = trove.network.neutron.NeutronDriver - network_label_regex = .* - - [database] - connection = mysql+pymysql://trove:TROVE_DBPASSWORD@controller/trove - - [keystone_authtoken] - www_authenticate_uri = http://controller:5000/ - auth_url = http://controller:5000/ - auth_type = password - project_domain_name = default - user_domain_name = default - project_name = service - username = trove 
- password = TROVE_PASSWORD - ``` - **Description:** - - In the **[Default]** section, **nova_compute_url** and **cinder_url** are endpoints created by Nova and Cinder in Keystone. - - **nova_proxy_XXX** is a user who can access the Nova service. In the preceding example, the **admin** user is used. - - **transport_url** is the **RabbitMQ** connection information, and **RABBIT_PASS** is the RabbitMQ password. - - In the **[database]** section, **connection** is the information of the database created for Trove in MySQL. - - Replace **TROVE_PASSWORD** in the Trove user information with the password of the **trove** user. - - 3. Configure **trove-guestagent.conf**: - ```shell script - vim /etc/trove/trove-guestagent.conf - - rabbit_host = controller - rabbit_password = RABBIT_PASS - trove_auth_url = http://controller:5000/ - ``` - **Description:** **guestagent** is an independent component in Trove and needs to be pre-built into the virtual machine image created by Trove using Nova. - After the database instance is created, the guestagent process is started to report heartbeat messages to the Trove through the message queue (RabbitMQ). - Therefore, you need to configure the user name and password of the RabbitMQ. - **Since Victoria, Trove uses a unified image to run different types of databases. The database service runs in the Docker container of the Guest VM.** - - Replace **RABBIT_PASS** with the RabbitMQ password. - - 4. Generate the **Trove** database table. - ```shell script - su -s /bin/sh -c "trove-manage db_sync" trove - ``` - -4. Complete the installation and configuration. - 1. Configure the **Trove** service to automatically start: - ```shell script - systemctl enable openstack-trove-api.service \ - openstack-trove-taskmanager.service \ - openstack-trove-conductor.service - ``` - 2. 
Start the services: - ```shell script - systemctl start openstack-trove-api.service \ - openstack-trove-taskmanager.service \ - openstack-trove-conductor.service - ``` -### Installing Swift - -Swift provides a scalable and highly available distributed object storage service, which is suitable for storing unstructured data in large scale. - -1. Create the service credentials and API endpoints. - - Create the service credential: - - ``` shell - # Create the swift user. - openstack user create --domain default --password-prompt swift - # Add the admin role for the swift user. - openstack role add --project service --user swift admin - # Create the swift service entity. - openstack service create --name swift --description "OpenStack Object Storage" object-store - ``` - - Create the Swift API endpoints. - - ```shell - openstack endpoint create --region RegionOne object-store public http://controller:8080/v1/AUTH_%\(project_id\)s - openstack endpoint create --region RegionOne object-store internal http://controller:8080/v1/AUTH_%\(project_id\)s - openstack endpoint create --region RegionOne object-store admin http://controller:8080/v1 - ``` - - -2. Install the software packages: - - ```shell - yum install openstack-swift-proxy python3-swiftclient python3-keystoneclient python3-keystonemiddleware memcached (CTL) - ``` - -3. Configure the proxy-server. - - The Swift RPM package contains a **proxy-server.conf** file which is basically ready to use. You only need to change the values of **ip** and swift **password** in the file. - - ***Note*** - - **Replace password with the password you set for the swift user in the identity service.** - -4. Install and configure the storage node. 
(STG) - - Install the supported program packages: - ```shell - yum install xfsprogs rsync - ``` - - Format the /dev/vdb and /dev/vdc devices into XFS: - - ```shell - mkfs.xfs /dev/vdb - mkfs.xfs /dev/vdc - ``` - - Create the mount point directory structure: - - ```shell - mkdir -p /srv/node/vdb - mkdir -p /srv/node/vdc - ``` - - Find the UUID of the new partition: - - ```shell - blkid - ``` - - Add the following to the **/etc/fstab** file: - - ```shell - UUID="" /srv/node/vdb xfs noatime 0 2 - UUID="" /srv/node/vdc xfs noatime 0 2 - ``` - - Mount the devices: - - ```shell - mount /srv/node/vdb - mount /srv/node/vdc - ``` - ***Note*** - - **If the disaster recovery function is not required, you only need to create one device and skip the following rsync configuration.** - - (Optional) Create or edit the **/etc/rsyncd.conf** file to include the following content: - - ```shell - [DEFAULT] - uid = swift - gid = swift - log file = /var/log/rsyncd.log - pid file = /var/run/rsyncd.pid - address = MANAGEMENT_INTERFACE_IP_ADDRESS - - [account] - max connections = 2 - path = /srv/node/ - read only = False - lock file = /var/lock/account.lock - - [container] - max connections = 2 - path = /srv/node/ - read only = False - lock file = /var/lock/container.lock - - [object] - max connections = 2 - path = /srv/node/ - read only = False - lock file = /var/lock/object.lock - ``` - **Replace *MANAGEMENT_INTERFACE_IP_ADDRESS* with the management network IP address of the storage node.** - - Start the rsyncd service and configure it to start upon system startup. - - ```shell - systemctl enable rsyncd.service - systemctl start rsyncd.service - ``` - -5. Install and configure the components on storage nodes. 
(STG) - - Install the software packages: - - ```shell - yum install openstack-swift-account openstack-swift-container openstack-swift-object - ``` - - Edit **account-server.conf**, **container-server.conf**, and **object-server.conf** in the **/etc/swift directory** and replace **bind_ip** with the management network IP address of the storage node. - - Ensure the proper ownership of the mount point directory structure. - - ```shell - chown -R swift:swift /srv/node - ``` - - Create the recon directory and ensure that it has the correct ownership. - - ```shell - mkdir -p /var/cache/swift - chown -R root:swift /var/cache/swift - chmod -R 775 /var/cache/swift - ``` - -6. Create the account ring. (CTL) - - Switch to the **/etc/swift** directory: - - ```shell - cd /etc/swift - ``` - - Create the basic **account.builder** file: - - ```shell - swift-ring-builder account.builder create 10 1 1 - ``` - - Add each storage node to the ring: - - ```shell - swift-ring-builder account.builder add --region 1 --zone 1 --ip STORAGE_NODE_MANAGEMENT_INTERFACE_IP_ADDRESS --port 6202 --device DEVICE_NAME --weight DEVICE_WEIGHT - ``` - - **Replace *STORAGE_NODE_MANAGEMENT_INTERFACE_IP_ADDRESS* with the management network IP address of the storage node. Replace *DEVICE_NAME* with the name of the storage device on the same storage node.** - - ***Note*** - **Repeat this command to each storage device on each storage node.** - - Verify the ring contents: - - ```shell - swift-ring-builder account.builder - ``` - - Rebalance the ring: - - ```shell - swift-ring-builder account.builder rebalance - ``` - -7. Create the container ring. 
(CTL) - - Switch to the **/etc/swift** directory: - - Create the basic **container.builder** file: - - ```shell - swift-ring-builder container.builder create 10 1 1 - ``` - - Add each storage node to the ring: - - ```shell - swift-ring-builder container.builder \ - add --region 1 --zone 1 --ip STORAGE_NODE_MANAGEMENT_INTERFACE_IP_ADDRESS --port 6201 \ - --device DEVICE_NAME --weight 100 - - ``` - - **Replace *STORAGE_NODE_MANAGEMENT_INTERFACE_IP_ADDRESS* with the management network IP address of the storage node. Replace *DEVICE_NAME* with the name of the storage device on the same storage node.** - - ***Note*** - **Repeat this command to every storage devices on every storage nodes.** - - Verify the ring contents: - - ```shell - swift-ring-builder container.builder - ``` - - Rebalance the ring: - - ```shell - swift-ring-builder container.builder rebalance - ``` - -8. Create the object ring. (CTL) - - Switch to the **/etc/swift** directory: - - Create the basic **object.builder** file: - - ```shell - swift-ring-builder object.builder create 10 1 1 - ``` - - Add each storage node to the ring: - - ```shell - swift-ring-builder object.builder \ - add --region 1 --zone 1 --ip STORAGE_NODE_MANAGEMENT_INTERFACE_IP_ADDRESS --port 6200 \ - --device DEVICE_NAME --weight 100 - ``` - - **Replace *STORAGE_NODE_MANAGEMENT_INTERFACE_IP_ADDRESS* with the management network IP address of the storage node. Replace *DEVICE_NAME* with the name of the storage device on the same storage node.** - - ***Note*** - **Repeat this command to every storage devices on every storage nodes.** - - Verify the ring contents: - - ```shell - swift-ring-builder object.builder - ``` - - Rebalance the ring: - - ```shell - swift-ring-builder object.builder rebalance - ``` - - Distribute ring configuration files: - - Copy **account.ring.gz**, **container.ring.gz**, and **object.ring.gz** to the **/etc/swift** directory on each storage node and any additional nodes running the proxy service. - - - -9. 
Complete the installation. - - Edit the **/etc/swift/swift.conf** file: - - ``` shell - [swift-hash] - swift_hash_path_suffix = test-hash - swift_hash_path_prefix = test-hash - - [storage-policy:0] - name = Policy-0 - default = yes - ``` - - **Replace test-hash with a unique value.** - - Copy the **swift.conf** file to the **/etc/swift** directory on each storage node and any additional nodes running the proxy service. - - Ensure correct ownership of the configuration directory on all nodes: - - ```shell - chown -R root:swift /etc/swift - ``` - - On the controller node and any additional nodes running the proxy service, start the object storage proxy service and its dependencies, and configure them to start upon system startup. - - ```shell - systemctl enable openstack-swift-proxy.service memcached.service - systemctl start openstack-swift-proxy.service memcached.service - ``` - - On the storage node, start the object storage services and configure them to start upon system startup. - - ```shell - systemctl enable openstack-swift-account.service openstack-swift-account-auditor.service openstack-swift-account-reaper.service openstack-swift-account-replicator.service - - systemctl start openstack-swift-account.service openstack-swift-account-auditor.service openstack-swift-account-reaper.service openstack-swift-account-replicator.service - - systemctl enable openstack-swift-container.service openstack-swift-container-auditor.service openstack-swift-container-replicator.service openstack-swift-container-updater.service - - systemctl start openstack-swift-container.service openstack-swift-container-auditor.service openstack-swift-container-replicator.service openstack-swift-container-updater.service - - systemctl enable openstack-swift-object.service openstack-swift-object-auditor.service openstack-swift-object-replicator.service openstack-swift-object-updater.service - - systemctl start openstack-swift-object.service openstack-swift-object-auditor.service 
openstack-swift-object-replicator.service openstack-swift-object-updater.service - ``` -### Installing Cyborg - -Cyborg provides acceleration device support for OpenStack, for example, GPUs, FPGAs, ASICs, NPs, SoCs, NVMe/NOF SSDs, ODPs, DPDKs, and SPDKs. - -1. Initialize the databases. - -``` -CREATE DATABASE cyborg; -GRANT ALL PRIVILEGES ON cyborg.* TO 'cyborg'@'localhost' IDENTIFIED BY 'CYBORG_DBPASS'; -GRANT ALL PRIVILEGES ON cyborg.* TO 'cyborg'@'%' IDENTIFIED BY 'CYBORG_DBPASS'; -``` - -2. Create Keystone resource objects. - -``` -$ openstack user create --domain default --password-prompt cyborg -$ openstack role add --project service --user cyborg admin -$ openstack service create --name cyborg --description "Acceleration Service" accelerator - -$ openstack endpoint create --region RegionOne \ - accelerator public http://:6666/v1 -$ openstack endpoint create --region RegionOne \ - accelerator internal http://:6666/v1 -$ openstack endpoint create --region RegionOne \ - accelerator admin http://:6666/v1 -``` - -3. Install Cyborg - -``` -yum install openstack-cyborg -``` - -4. Configure Cyborg - -Modify **/etc/cyborg/cyborg.conf**. 
- -``` -[DEFAULT] -transport_url = rabbit://%RABBITMQ_USER%:%RABBITMQ_PASSWORD%@%OPENSTACK_HOST_IP%:5672/ -use_syslog = False -state_path = /var/lib/cyborg -debug = True - -[database] -connection = mysql+pymysql://%DATABASE_USER%:%DATABASE_PASSWORD%@%OPENSTACK_HOST_IP%/cyborg - -[service_catalog] -project_domain_id = default -user_domain_id = default -project_name = service -password = PASSWORD -username = cyborg -auth_url = http://%OPENSTACK_HOST_IP%/identity -auth_type = password - -[placement] -project_domain_name = Default -project_name = service -user_domain_name = Default -password = PASSWORD -username = placement -auth_url = http://%OPENSTACK_HOST_IP%/identity -auth_type = password - -[keystone_authtoken] -memcached_servers = localhost:11211 -project_domain_name = Default -project_name = service -user_domain_name = Default -password = PASSWORD -username = cyborg -auth_url = http://%OPENSTACK_HOST_IP%/identity -auth_type = password -``` - -Set the user names, passwords, and IP addresses as required. - -5. Synchronize the database table. - -``` -cyborg-dbsync --config-file /etc/cyborg/cyborg.conf upgrade -``` - -6. Start the Cyborg services. - -``` -systemctl enable openstack-cyborg-api openstack-cyborg-conductor openstack-cyborg-agent -systemctl start openstack-cyborg-api openstack-cyborg-conductor openstack-cyborg-agent -``` - -### Installing Aodh - -1. Create the database. - -``` -CREATE DATABASE aodh; - -GRANT ALL PRIVILEGES ON aodh.* TO 'aodh'@'localhost' IDENTIFIED BY 'AODH_DBPASS'; - -GRANT ALL PRIVILEGES ON aodh.* TO 'aodh'@'%' IDENTIFIED BY 'AODH_DBPASS'; -``` - -2. Create Keystone resource objects.
- -``` -openstack user create --domain default --password-prompt aodh - -openstack role add --project service --user aodh admin - -openstack service create --name aodh --description "Telemetry" alarming - -openstack endpoint create --region RegionOne alarming public http://controller:8042 - -openstack endpoint create --region RegionOne alarming internal http://controller:8042 - -openstack endpoint create --region RegionOne alarming admin http://controller:8042 -``` - -3. Install Aodh. - -``` -yum install openstack-aodh-api openstack-aodh-evaluator openstack-aodh-notifier openstack-aodh-listener openstack-aodh-expirer python3-aodhclient -``` - -4. Modify the configuration file. - -``` -[database] -connection = mysql+pymysql://aodh:AODH_DBPASS@controller/aodh - -[DEFAULT] -transport_url = rabbit://openstack:RABBIT_PASS@controller -auth_strategy = keystone - -[keystone_authtoken] -www_authenticate_uri = http://controller:5000 -auth_url = http://controller:5000 -memcached_servers = controller:11211 -auth_type = password -project_domain_id = default -user_domain_id = default -project_name = service -username = aodh -password = AODH_PASS - -[service_credentials] -auth_type = password -auth_url = http://controller:5000/v3 -project_domain_id = default -user_domain_id = default -project_name = service -username = aodh -password = AODH_PASS -interface = internalURL -region_name = RegionOne -``` - -5. Initialize the database. - -``` -aodh-dbsync -``` - -6. Start the Aodh services. - -``` -systemctl enable openstack-aodh-api.service openstack-aodh-evaluator.service openstack-aodh-notifier.service openstack-aodh-listener.service - -systemctl start openstack-aodh-api.service openstack-aodh-evaluator.service openstack-aodh-notifier.service openstack-aodh-listener.service -``` - -### Installing Gnocchi - -1. Create the database. 
- -``` -CREATE DATABASE gnocchi; - -GRANT ALL PRIVILEGES ON gnocchi.* TO 'gnocchi'@'localhost' IDENTIFIED BY 'GNOCCHI_DBPASS'; - -GRANT ALL PRIVILEGES ON gnocchi.* TO 'gnocchi'@'%' IDENTIFIED BY 'GNOCCHI_DBPASS'; -``` - -2. Create Keystone resource objects. - -``` -openstack user create --domain default --password-prompt gnocchi - -openstack role add --project service --user gnocchi admin - -openstack service create --name gnocchi --description "Metric Service" metric - -openstack endpoint create --region RegionOne metric public http://controller:8041 - -openstack endpoint create --region RegionOne metric internal http://controller:8041 - -openstack endpoint create --region RegionOne metric admin http://controller:8041 -``` - -3. Install Gnocchi. - -``` -yum install openstack-gnocchi-api openstack-gnocchi-metricd python3-gnocchiclient -``` - -4. Modify the **/etc/gnocchi/gnocchi.conf** configuration file. - -``` -[api] -auth_mode = keystone -port = 8041 -uwsgi_mode = http-socket - -[keystone_authtoken] -auth_type = password -auth_url = http://controller:5000/v3 -project_domain_name = Default -user_domain_name = Default -project_name = service -username = gnocchi -password = GNOCCHI_PASS -interface = internalURL -region_name = RegionOne - -[indexer] -url = mysql+pymysql://gnocchi:GNOCCHI_DBPASS@controller/gnocchi - -[storage] -# coordination_url is not required but specifying one will improve -# performance with better workload division across workers. -coordination_url = redis://controller:6379 -file_basepath = /var/lib/gnocchi -driver = file -``` - -5. Initialize the database. - -``` -gnocchi-upgrade -``` - -6. Start the Gnocchi services. - -``` -systemctl enable openstack-gnocchi-api.service openstack-gnocchi-metricd.service - -systemctl start openstack-gnocchi-api.service openstack-gnocchi-metricd.service -``` - -### Installing Ceilometer - -1. Create Keystone resource objects.
- -``` -openstack user create --domain default --password-prompt ceilometer - -openstack role add --project service --user ceilometer admin - -openstack service create --name ceilometer --description "Telemetry" metering -``` - -2. Install Ceilometer. - -``` -yum install openstack-ceilometer-notification openstack-ceilometer-central -``` - -3. Modify the **/etc/ceilometer/pipeline.yaml** configuration file. - -``` -publishers: - # set address of Gnocchi - # + filter out Gnocchi-related activity meters (Swift driver) - # + set default archive policy - - gnocchi://?filter_project=service&archive_policy=low -``` - -4. Modify the **/etc/ceilometer/ceilometer.conf** configuration file. - -``` -[DEFAULT] -transport_url = rabbit://openstack:RABBIT_PASS@controller - -[service_credentials] -auth_type = password -auth_url = http://controller:5000/v3 -project_domain_id = default -user_domain_id = default -project_name = service -username = ceilometer -password = CEILOMETER_PASS -interface = internalURL -region_name = RegionOne -``` - -5. Initialize the database. - -``` -ceilometer-upgrade -``` - -6. Start the Ceilometer services. - -``` -systemctl enable openstack-ceilometer-notification.service openstack-ceilometer-central.service - -systemctl start openstack-ceilometer-notification.service openstack-ceilometer-central.service -``` - -### Installing Heat - -1. Create the **heat** database and grant proper privileges to it. Replace **HEAT_DBPASS** with a proper password. - -``` -CREATE DATABASE heat; -GRANT ALL PRIVILEGES ON heat.* TO 'heat'@'localhost' IDENTIFIED BY 'HEAT_DBPASS'; -GRANT ALL PRIVILEGES ON heat.* TO 'heat'@'%' IDENTIFIED BY 'HEAT_DBPASS'; -``` - -2. Create a service credential. Create the **heat** user and add the **admin** role to it. - -``` -openstack user create --domain default --password-prompt heat -openstack role add --project service --user heat admin -``` - -3. Create the **heat** and **heat-cfn** services and their API endpoints.
- -``` -openstack service create --name heat --description "Orchestration" orchestration -openstack service create --name heat-cfn --description "Orchestration" cloudformation -openstack endpoint create --region RegionOne orchestration public http://controller:8004/v1/%\(tenant_id\)s -openstack endpoint create --region RegionOne orchestration internal http://controller:8004/v1/%\(tenant_id\)s -openstack endpoint create --region RegionOne orchestration admin http://controller:8004/v1/%\(tenant_id\)s -openstack endpoint create --region RegionOne cloudformation public http://controller:8000/v1 -openstack endpoint create --region RegionOne cloudformation internal http://controller:8000/v1 -openstack endpoint create --region RegionOne cloudformation admin http://controller:8000/v1 -``` - -4. Create additional OpenStack management information, including the **heat** domain and its administrator **heat_domain_admin**, the **heat_stack_owner** role, and the **heat_stack_user** role. - -``` -openstack user create --domain heat --password-prompt heat_domain_admin -openstack role add --domain heat --user-domain heat --user heat_domain_admin admin -openstack role create heat_stack_owner -openstack role create heat_stack_user -``` - -5. Install the software packages. - -``` -yum install openstack-heat-api openstack-heat-api-cfn openstack-heat-engine -``` - -6. Modify the configuration file **/etc/heat/heat.conf**. 
- -``` -[DEFAULT] -transport_url = rabbit://openstack:RABBIT_PASS@controller -heat_metadata_server_url = http://controller:8000 -heat_waitcondition_server_url = http://controller:8000/v1/waitcondition -stack_domain_admin = heat_domain_admin -stack_domain_admin_password = HEAT_DOMAIN_PASS -stack_user_domain_name = heat - -[database] -connection = mysql+pymysql://heat:HEAT_DBPASS@controller/heat - -[keystone_authtoken] -www_authenticate_uri = http://controller:5000 -auth_url = http://controller:5000 -memcached_servers = controller:11211 -auth_type = password -project_domain_name = default -user_domain_name = default -project_name = service -username = heat -password = HEAT_PASS - -[trustee] -auth_type = password -auth_url = http://controller:5000 -username = heat -password = HEAT_PASS -user_domain_name = default - -[clients_keystone] -auth_uri = http://controller:5000 -``` - -7. Initialize the **heat** database table. - -``` -su -s /bin/sh -c "heat-manage db_sync" heat -``` - -8. Start the services. - -``` -systemctl enable openstack-heat-api.service openstack-heat-api-cfn.service openstack-heat-engine.service -systemctl start openstack-heat-api.service openstack-heat-api-cfn.service openstack-heat-engine.service -``` - -## OpenStack Quick Installation - -The OpenStack SIG provides the Ansible script for one-click deployment of OpenStack in All in One or Distributed modes. Users can use the script to quickly deploy an OpenStack environment based on openEuler RPM packages. The following uses the All in One mode installation as an example. - -1. Install the OpenStack SIG Tool. - - ```shell - pip install openstack-sig-tool - ``` - -2. Configure the OpenStack Yum source. - - ```shell - yum install openstack-release-train - ``` - - **Note**: Enable the EPOL repository for the Yum source if it is not enabled already. 
- - ```shell - vi /etc/yum.repos.d/openEuler.repo - - [EPOL] - name=EPOL - baseurl=http://repo.openeuler.org/openEuler-22.03-LTS/EPOL/main/$basearch/ - enabled=1 - gpgcheck=1 - gpgkey=http://repo.openeuler.org/openEuler-22.03-LTS/OS/$basearch/RPM-GPG-KEY-openEuler - ``` - -3. Update the Ansible configurations. - - Open the **/usr/local/etc/inventory/all_in_one.yaml** file and modify the configuration based on the environment and requirements. Modify the file as follows: - - ```yaml - all: - hosts: - controller: - ansible_host: - ansible_ssh_private_key_file: - ansible_ssh_user: root - vars: - mysql_root_password: root - mysql_project_password: root - rabbitmq_password: root - project_identity_password: root - enabled_service: - - keystone - - neutron - - cinder - - placement - - nova - - glance - - horizon - - aodh - - ceilometer - - cyborg - - gnocchi - - kolla - - heat - - swift - - trove - - tempest - neutron_provider_interface_name: br-ex - default_ext_subnet_range: 10.100.100.0/24 - default_ext_subnet_gateway: 10.100.100.1 - neutron_dataplane_interface_name: eth1 - cinder_block_device: vdb - swift_storage_devices: - - vdc - swift_hash_path_suffix: ash - swift_hash_path_prefix: has - children: - compute: - hosts: controller - storage: - hosts: controller - network: - hosts: controller - vars: - test-key: test-value - dashboard: - hosts: controller - vars: - allowed_host: '*' - kolla: - hosts: controller - vars: - # We add openEuler OS support for kolla in OpenStack Queens/Rocky release - # Set this var to true if you want to use it in Q/R - openeuler_plugin: false - ``` - - Key Configurations - - | Item | Description | - |---|---| - | ansible_host | IP address of the all-in-one node.| - | ansible_ssh_private_key_file | Key used by the Ansible script for logging in to the all-in-one node.| - | ansible_ssh_user | User used by the Ansible script for logging in to the all-in-one node.| - | enabled_service | List of services to be installed.
You can delete services as required.| - | neutron_provider_interface_name | Neutron L3 bridge name. | - | default_ext_subnet_range | Neutron private network IP address range. | - | default_ext_subnet_gateway | Neutron private network gateway. | - | neutron_dataplane_interface_name | NIC used by Neutron. You are advised to use a new NIC to avoid conflicts with existing NICs causing disconnection of the all-in-one node. | - | cinder_block_device | Name of the block device used by Cinder.| - | swift_storage_devices | Name of the block device used by Swift. | - -4. Run the installation command. - - ```shell - oos env setup all_in_one - ``` - - After the command is executed, the OpenStack environment of the All in One mode is successfully deployed. - - The environment variable file **.admin-openrc** is stored in the home directory of the current user. - -5. Initialize the Tempest environment. - - If you want to perform the Tempest test in the environment, run the `oos env init all_in_one` command to create the OpenStack resources required by Tempest. - - After the command is executed successfully, a **mytest** directory is generated in the home directory of the user. You can run the `tempest run` command in the directory. 
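The inventory for the quick installation ships placeholder Swift hash secrets (`swift_hash_path_prefix: has`, `swift_hash_path_suffix: ash`). These values must be unique to your deployment and identical on every Swift node, so it is worth generating real random values before deploying. A minimal sketch, assuming a Linux host with `od` and `/dev/urandom` available:

```shell
# Generate two random 16-hex-character secrets for the Swift hash path
# prefix and suffix. Keep them private and use the same pair on all nodes.
prefix=$(od -An -tx8 -N8 /dev/urandom | tr -d ' \n')
suffix=$(od -An -tx8 -N8 /dev/urandom | tr -d ' \n')
echo "swift_hash_path_prefix = ${prefix}"
echo "swift_hash_path_suffix = ${suffix}"
```

Paste the generated values into the inventory (or into **/etc/swift/swift.conf** for a manual installation) before deploying. Once objects have been stored, these values must never change, or the existing data becomes unreachable.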
diff --git a/docs/en/docs/thirdparty_migration/OpenStack-wallaby.md b/docs/en/docs/thirdparty_migration/OpenStack-wallaby.md deleted file mode 100644 index 486d1856d5d70faa55066435483d203939059cf4..0000000000000000000000000000000000000000 --- a/docs/en/docs/thirdparty_migration/OpenStack-wallaby.md +++ /dev/null @@ -1,3208 +0,0 @@ -# OpenStack-Wallaby Deployment Guide - - - -- [OpenStack-Wallaby Deployment Guide](#openstack-wallaby-deployment-guide) - - [OpenStack](#openstack) - - [Conventions](#conventions) - - [Preparing the Environment](#preparing-the-environment) - - [Environment Configuration](#environment-configuration) - - [Installing the SQL Database](#installing-the-sql-database) - - [Installing RabbitMQ](#installing-rabbitmq) - - [Installing Memcached](#installing-memcached) - - [Installing OpenStack](#installing-openstack) - - [Installing Keystone](#installing-keystone) - - [Installing Glance](#installing-glance) - - [Installing Placement](#installing-placement) - - [Installing Nova](#installing-nova) - - [Installing Neutron](#installing-neutron) - - [Installing Cinder](#installing-cinder) - - [Installing Horizon](#installing-horizon) - - [Installing Tempest](#installing-tempest) - - [Installing Ironic](#installing-ironic) - - [Installing Kolla](#installing-kolla) - - [Installing Trove](#installing-trove) - - [Installing Swift](#installing-swift) - - -## OpenStack - -OpenStack is an open source cloud computing infrastructure software project developed by the community. It provides an operating platform or tool set for deploying the cloud, offering scalable and flexible cloud computing for organizations. - -As an open source cloud computing management platform, OpenStack consists of several major components, such as Nova, Cinder, Neutron, Glance, Keystone, and Horizon. OpenStack supports almost all cloud environments. The project aims to provide a cloud computing management platform that is easy-to-use, scalable, unified, and standardized. 
OpenStack provides an infrastructure as a service (IaaS) solution that combines complementary services, each of which provides an API for integration. - -The official source of openEuler 22.03 LTS now supports OpenStack Wallaby. You can configure the Yum source and then deploy OpenStack by following the instructions of this document. - -## Conventions - -OpenStack supports multiple deployment modes. This document includes two deployment modes: `All in One` and `Distributed`. The conventions are as follows: - -`All in One` mode: - -```text -Ignores all possible suffixes. -``` - -`Distributed` mode: - -```text -A suffix of `(CTL)` indicates that the configuration or command applies only to the `control node`. -A suffix of `(CPT)` indicates that the configuration or command applies only to the `compute node`. -A suffix of `(STG)` indicates that the configuration or command applies only to the `storage node`. -In other cases, the configuration or command applies to both the `control node` and `compute node`. -``` - -***Note*** - -The services involved in the preceding conventions are as follows: - -- Cinder -- Nova -- Neutron - -## Preparing the Environment - -### Environment Configuration - -1. Configure the openEuler 22.03 LTS official Yum source. Enable the EPOL software repository to support OpenStack. - - ```shell - yum update - yum install openstack-release-wallaby - yum clean all && yum makecache - ``` - - **Note**: Enable the EPOL repository for the Yum source if it is not enabled already. - - ```shell - vi /etc/yum.repos.d/openEuler.repo - - [EPOL] - name=EPOL - baseurl=http://repo.openeuler.org/openEuler-22.03-LTS/EPOL/main/$basearch/ - enabled=1 - gpgcheck=1 - gpgkey=http://repo.openeuler.org/openEuler-22.03-LTS/OS/$basearch/RPM-GPG-KEY-openEuler - ``` - -2. Change the host name and mapping.
- - Set the host name of each node: - - ```shell - hostnamectl set-hostname controller (CTL) - hostnamectl set-hostname compute (CPT) - ``` - - Assuming the IP address of the controller node is **10.0.0.11** and the IP address of the compute node (if any) is **10.0.0.12**, add the following information to the **/etc/hosts** file: - - ```shell - 10.0.0.11 controller - 10.0.0.12 compute - ``` - -### Installing the SQL Database - -1. Run the following command to install the software package: - - ```shell - yum install mariadb mariadb-server python3-PyMySQL - ``` - -2. Run the following command to create and edit the `/etc/my.cnf.d/openstack.cnf` file: - - ```shell - vim /etc/my.cnf.d/openstack.cnf - - [mysqld] - bind-address = 10.0.0.11 - default-storage-engine = innodb - innodb_file_per_table = on - max_connections = 4096 - collation-server = utf8_general_ci - character-set-server = utf8 - ``` - - ***Note*** - - **`bind-address` is set to the management IP address of the controller node.** - -3. Run the following commands to start the database service and configure it to automatically start upon system boot: - - ```shell - systemctl enable mariadb.service - systemctl start mariadb.service - ``` - -4. (Optional) Configure the default database password: - - ```shell - mysql_secure_installation - ``` - - ***Note*** - - **Perform operations as prompted.** - -### Installing RabbitMQ - -1. Run the following command to install the software package: - - ```shell - yum install rabbitmq-server - ``` - -2. Start the RabbitMQ service and configure it to automatically start upon system boot: - - ```shell - systemctl enable rabbitmq-server.service - systemctl start rabbitmq-server.service - ``` - -3. Add the OpenStack user: - - ```shell - rabbitmqctl add_user openstack RABBIT_PASS - ``` - - ***Note*** - - **Replace `RABBIT_PASS` to set the password for the openstack user.** - -4. 
Run the following command to set the permission of the openstack user to allow the user to perform configuration, write, and read operations: - - ```shell - rabbitmqctl set_permissions openstack ".*" ".*" ".*" - ``` - -### Installing Memcached - -1. Run the following command to install the dependency package: - - ```shell - yum install memcached python3-memcached - ``` - -2. Edit the `/etc/sysconfig/memcached` file: - - ```shell - vim /etc/sysconfig/memcached - - OPTIONS="-l 127.0.0.1,::1,controller" - ``` - -3. Run the following command to start the Memcached service and configure it to automatically start upon system boot: - - ```shell - systemctl enable memcached.service - systemctl start memcached.service - ``` - - ***Note*** - - **After the service is started, you can run `memcached-tool controller stats` to ensure that the service is started properly and available. You can replace `controller` with the management IP address of the controller node.** - -## Installing OpenStack - -### Installing Keystone - -1. Create the **keystone** database and grant permissions: - - ```sql - mysql -u root -p - - MariaDB [(none)]> CREATE DATABASE keystone; - MariaDB [(none)]> GRANT ALL PRIVILEGES ON keystone.* TO 'keystone'@'localhost' \ - IDENTIFIED BY 'KEYSTONE_DBPASS'; - MariaDB [(none)]> GRANT ALL PRIVILEGES ON keystone.* TO 'keystone'@'%' \ - IDENTIFIED BY 'KEYSTONE_DBPASS'; - MariaDB [(none)]> exit - ``` - - ***Note*** - - **Replace `KEYSTONE_DBPASS` to set the password for the keystone database.** - -2. Install the software package: - - ```shell - yum install openstack-keystone httpd mod_wsgi - ``` - -3. Configure Keystone: - - ```shell - vim /etc/keystone/keystone.conf - - [database] - connection = mysql+pymysql://keystone:KEYSTONE_DBPASS@controller/keystone - - [token] - provider = fernet - ``` - - ***Description*** - - In the **[database]** section, configure the database entry. - - In the **[token]** section, configure the token provider.
- - ***Note:*** - - **Replace `KEYSTONE_DBPASS` with the password of the keystone database.** - -4. Synchronize the database: - - ```shell - su -s /bin/sh -c "keystone-manage db_sync" keystone - ``` - -5. Initialize the Fernet keystore: - - ```shell - keystone-manage fernet_setup --keystone-user keystone --keystone-group keystone - keystone-manage credential_setup --keystone-user keystone --keystone-group keystone - ``` - -6. Start the service: - - ```shell - keystone-manage bootstrap --bootstrap-password ADMIN_PASS \ - --bootstrap-admin-url http://controller:5000/v3/ \ - --bootstrap-internal-url http://controller:5000/v3/ \ - --bootstrap-public-url http://controller:5000/v3/ \ - --bootstrap-region-id RegionOne - ``` - - ***Note*** - - **Replace `ADMIN_PASS` to set the password for the admin user.** - -7. Configure the Apache HTTP server: - - ```shell - vim /etc/httpd/conf/httpd.conf - - ServerName controller - ``` - - ```shell - ln -s /usr/share/keystone/wsgi-keystone.conf /etc/httpd/conf.d/ - ``` - - ***Description*** - - Configure `ServerName` to use the control node. - - ***Note*** - **If the `ServerName` item does not exist, create it.** - -8. Start the Apache HTTP service: - - ```shell - systemctl enable httpd.service - systemctl start httpd.service - ``` - -9. Create environment variables: - - ```shell - cat << EOF >> ~/.admin-openrc - export OS_PROJECT_DOMAIN_NAME=Default - export OS_USER_DOMAIN_NAME=Default - export OS_PROJECT_NAME=admin - export OS_USERNAME=admin - export OS_PASSWORD=ADMIN_PASS - export OS_AUTH_URL=http://controller:5000/v3 - export OS_IDENTITY_API_VERSION=3 - export OS_IMAGE_API_VERSION=2 - EOF - ``` - - ***Note*** - - **Replace `ADMIN_PASS` with the password of the admin user.** - -10.
Create domains, projects, users, and roles in sequence. The python3-openstackclient package must be installed first: - - ```shell - yum install python3-openstackclient - ``` - - Import the environment variables: - - ```shell - source ~/.admin-openrc - ``` - - Create the project `service`. The domain `default` has been created during `keystone-manage bootstrap`. - - ```shell - openstack domain create --description "An Example Domain" example - ``` - - ```shell - openstack project create --domain default --description "Service Project" service - ``` - - Create the (non-admin) project `myproject`, user `myuser`, and role `myrole`, and add the role `myrole` to `myproject` and `myuser`. - - ```shell - openstack project create --domain default --description "Demo Project" myproject - openstack user create --domain default --password-prompt myuser - openstack role create myrole - openstack role add --project myproject --user myuser myrole - ``` - -11. Perform the verification. - - Cancel the temporary environment variables `OS_AUTH_URL` and `OS_PASSWORD`. - - ```shell - source ~/.admin-openrc - unset OS_AUTH_URL OS_PASSWORD - ``` - - Request a token for the **admin** user: - - ```shell - openstack --os-auth-url http://controller:5000/v3 \ - --os-project-domain-name Default --os-user-domain-name Default \ - --os-project-name admin --os-username admin token issue - ``` - - Request a token for user **myuser**: - - ```shell - openstack --os-auth-url http://controller:5000/v3 \ - --os-project-domain-name Default --os-user-domain-name Default \ - --os-project-name myproject --os-username myuser token issue - ``` - -### Installing Glance - -1. Create the database, service credentials, and the API endpoints.
- - Create the database: - - ```sql - mysql -u root -p - - MariaDB [(none)]> CREATE DATABASE glance; - MariaDB [(none)]> GRANT ALL PRIVILEGES ON glance.* TO 'glance'@'localhost' \ - IDENTIFIED BY 'GLANCE_DBPASS'; - MariaDB [(none)]> GRANT ALL PRIVILEGES ON glance.* TO 'glance'@'%' \ - IDENTIFIED BY 'GLANCE_DBPASS'; - MariaDB [(none)]> exit - ``` - - ***Note:*** - - **Replace `GLANCE_DBPASS` to set the password for the glance database.** - - Create the service credential: - - ```shell - source ~/.admin-openrc - - openstack user create --domain default --password-prompt glance - openstack role add --project service --user glance admin - openstack service create --name glance --description "OpenStack Image" image - ``` - - Create the API endpoints for the image service: - - ```shell - openstack endpoint create --region RegionOne image public http://controller:9292 - openstack endpoint create --region RegionOne image internal http://controller:9292 - openstack endpoint create --region RegionOne image admin http://controller:9292 - ``` - -2. Install the software package: - - ```shell - yum install openstack-glance - ``` - -3. Configure Glance: - - ```shell - vim /etc/glance/glance-api.conf - - [database] - connection = mysql+pymysql://glance:GLANCE_DBPASS@controller/glance - - [keystone_authtoken] - www_authenticate_uri = http://controller:5000 - auth_url = http://controller:5000 - memcached_servers = controller:11211 - auth_type = password - project_domain_name = Default - user_domain_name = Default - project_name = service - username = glance - password = GLANCE_PASS - - [paste_deploy] - flavor = keystone - - [glance_store] - stores = file,http - default_store = file - filesystem_store_datadir = /var/lib/glance/images/ - ``` - - ***Description:*** - - In the **[database]** section, configure the database entry. - - In the **[keystone_authtoken]** and **[paste_deploy]** sections, configure the identity authentication service entry. 
- - In the **[glance_store]** section, configure the local file system storage and the location of image files. - - ***Note*** - - **Replace `GLANCE_DBPASS` with the password of the glance database.** - - **Replace `GLANCE_PASS` with the password of user glance.** - -4. Synchronize the database: - - ```shell - su -s /bin/sh -c "glance-manage db_sync" glance - ``` - -5. Start the service: - - ```shell - systemctl enable openstack-glance-api.service - systemctl start openstack-glance-api.service - ``` - -6. Perform the verification. - - Download the image: - - ```shell - source ~/.admin-openrc - - wget http://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img - ``` - - ***Note*** - - **If the Kunpeng architecture is used in your environment, download the image of the AArch64 version. The image cirros-0.5.2-aarch64-disk.img has been tested.** - - Upload the image to the image service: - - ```shell - openstack image create --disk-format qcow2 --container-format bare \ - --file cirros-0.4.0-x86_64-disk.img --public cirros - ``` - - Confirm the image upload and verify the attributes: - - ```shell - openstack image list - ``` - -### Installing Placement - -1. Create a database, service credentials, and API endpoints. - - Create a database. - - Access the database as the **root** user. Create the **placement** database, and grant permissions.
- - ```shell - mysql -u root -p - MariaDB [(none)]> CREATE DATABASE placement; - MariaDB [(none)]> GRANT ALL PRIVILEGES ON placement.* TO 'placement'@'localhost' \ - IDENTIFIED BY 'PLACEMENT_DBPASS'; - MariaDB [(none)]> GRANT ALL PRIVILEGES ON placement.* TO 'placement'@'%' \ - IDENTIFIED BY 'PLACEMENT_DBPASS'; - MariaDB [(none)]> exit - ``` - - **Note**: - - **Replace `PLACEMENT_DBPASS` to set the password for the placement database.** - - ```shell - source ~/.admin-openrc - ``` - - Run the following commands to create the Placement service credentials, create the **placement** user, and add the **admin** role to the **placement** user: - - Create the Placement API Service. - - ```shell - openstack user create --domain default --password-prompt placement - openstack role add --project service --user placement admin - openstack service create --name placement --description "Placement API" placement - ``` - - Create API endpoints of the Placement service. - - ```shell - openstack endpoint create --region RegionOne placement public http://controller:8778 - openstack endpoint create --region RegionOne placement internal http://controller:8778 - openstack endpoint create --region RegionOne placement admin http://controller:8778 - ``` - -2. Perform the installation and configuration. - - Install the software package: - - ```shell - yum install openstack-placement-api - ``` - - Configure Placement: - - Edit the **/etc/placement/placement.conf** file: - - In the **[placement_database]** section, configure the database entry. - - In **[api]** and **[keystone_authtoken]** sections, configure the identity authentication service entry. - - ```shell - # vim /etc/placement/placement.conf - [placement_database] - # ... - connection = mysql+pymysql://placement:PLACEMENT_DBPASS@controller/placement - [api] - # ... - auth_strategy = keystone - [keystone_authtoken] - # ...
- auth_url = http://controller:5000/v3 - memcached_servers = controller:11211 - auth_type = password - project_domain_name = Default - user_domain_name = Default - project_name = service - username = placement - password = PLACEMENT_PASS - ``` - - Replace **PLACEMENT_DBPASS** with the password of the **placement** database, and replace **PLACEMENT_PASS** with the password of the **placement** user. - - Synchronize the database: - - ```shell - su -s /bin/sh -c "placement-manage db sync" placement - ``` - - Restart the httpd service. - - ```shell - systemctl restart httpd - ``` - -3. Perform the verification. - - Run the following command to check the status: - - ```shell - source ~/.admin-openrc - placement-status upgrade check - ``` - - Run the following command to install osc-placement and list the available resource types and features: - - ```shell - yum install python3-osc-placement - openstack --os-placement-api-version 1.2 resource class list --sort-column name - openstack --os-placement-api-version 1.6 trait list --sort-column name - ``` - -### Installing Nova - -1. Create a database, service credentials, and API endpoints. - - Create a database.
-
-   ```sql
-   mysql -u root -p (CTL)
-
-   MariaDB [(none)]> CREATE DATABASE nova_api;
-   MariaDB [(none)]> CREATE DATABASE nova;
-   MariaDB [(none)]> CREATE DATABASE nova_cell0;
-   MariaDB [(none)]> GRANT ALL PRIVILEGES ON nova_api.* TO 'nova'@'localhost' \
-     IDENTIFIED BY 'NOVA_DBPASS';
-   MariaDB [(none)]> GRANT ALL PRIVILEGES ON nova_api.* TO 'nova'@'%' \
-     IDENTIFIED BY 'NOVA_DBPASS';
-   MariaDB [(none)]> GRANT ALL PRIVILEGES ON nova.* TO 'nova'@'localhost' \
-     IDENTIFIED BY 'NOVA_DBPASS';
-   MariaDB [(none)]> GRANT ALL PRIVILEGES ON nova.* TO 'nova'@'%' \
-     IDENTIFIED BY 'NOVA_DBPASS';
-   MariaDB [(none)]> GRANT ALL PRIVILEGES ON nova_cell0.* TO 'nova'@'localhost' \
-     IDENTIFIED BY 'NOVA_DBPASS';
-   MariaDB [(none)]> GRANT ALL PRIVILEGES ON nova_cell0.* TO 'nova'@'%' \
-     IDENTIFIED BY 'NOVA_DBPASS';
-   MariaDB [(none)]> exit
-   ```
-
-   **Note**:
-
-   **Replace `NOVA_DBPASS` with a password for the nova database.**
-
-   ```shell
-   source ~/.admin-openrc (CTL)
-   ```
-
-   Run the following commands to create the Nova service credentials:
-
-   ```shell
-   openstack user create --domain default --password-prompt nova (CTL)
-   openstack role add --project service --user nova admin (CTL)
-   openstack service create --name nova --description "OpenStack Compute" compute (CTL)
-   ```
-
-   Create a Nova API endpoint.
-
-   ```shell
-   openstack endpoint create --region RegionOne compute public http://controller:8774/v2.1 (CTL)
-   openstack endpoint create --region RegionOne compute internal http://controller:8774/v2.1 (CTL)
-   openstack endpoint create --region RegionOne compute admin http://controller:8774/v2.1 (CTL)
-   ```
-
-2. Install the software packages:
-
-   ```shell
-   yum install openstack-nova-api openstack-nova-conductor \ (CTL)
-       openstack-nova-novncproxy openstack-nova-scheduler
-
-   yum install openstack-nova-compute (CPT)
-   ```
-
-   **Note**:
-
-   **If the ARM64 architecture is used, you also need to run the following command:**
-
-   ```shell
-   yum install edk2-aarch64 (CPT)
-   ```
-
-3. Configure Nova:
-
-   ```shell
-   vim /etc/nova/nova.conf
-
-   [DEFAULT]
-   enabled_apis = osapi_compute,metadata
-   transport_url = rabbit://openstack:RABBIT_PASS@controller:5672/
-   my_ip = 10.0.0.1
-   use_neutron = true
-   firewall_driver = nova.virt.firewall.NoopFirewallDriver
-   compute_driver = libvirt.LibvirtDriver (CPT)
-   instances_path = /var/lib/nova/instances/ (CPT)
-   lock_path = /var/lib/nova/tmp (CPT)
-
-   [api_database]
-   connection = mysql+pymysql://nova:NOVA_DBPASS@controller/nova_api (CTL)
-
-   [database]
-   connection = mysql+pymysql://nova:NOVA_DBPASS@controller/nova (CTL)
-
-   [api]
-   auth_strategy = keystone
-
-   [keystone_authtoken]
-   www_authenticate_uri = http://controller:5000/
-   auth_url = http://controller:5000/
-   memcached_servers = controller:11211
-   auth_type = password
-   project_domain_name = Default
-   user_domain_name = Default
-   project_name = service
-   username = nova
-   password = NOVA_PASS
-
-   [vnc]
-   enabled = true
-   server_listen = $my_ip
-   server_proxyclient_address = $my_ip
-   novncproxy_base_url = http://controller:6080/vnc_auto.html (CPT)
-
-   [libvirt]
-   virt_type = qemu (CPT)
-   cpu_mode = custom (CPT)
-   cpu_model = cortex-a72 (CPT)
-
-   [glance]
-   api_servers = http://controller:9292
-
-   [oslo_concurrency]
-   lock_path = /var/lib/nova/tmp (CTL)
-
-   [placement]
-   region_name = RegionOne
-   project_domain_name = Default
-   project_name = service
-   auth_type = password
-   user_domain_name = Default
-   auth_url = http://controller:5000/v3
-   username = placement
-   password = PLACEMENT_PASS
-
-   [neutron]
-   auth_url = http://controller:5000
-   auth_type = password
-   project_domain_name = default
-   user_domain_name = default
-   region_name = RegionOne
-   project_name = service
-   username = neutron
-   password = NEUTRON_PASS
-   service_metadata_proxy = true (CTL)
-   metadata_proxy_shared_secret = METADATA_SECRET (CTL)
-   ```
-
-   Description
-
-   In the **[DEFAULT]** section, enable the compute and metadata APIs, configure the RabbitMQ message queue entry, configure **my_ip**, and enable the network service **neutron**.
-
-   In the **[api_database]** and **[database]** sections, configure the database entry.
-
-   In the **[api]** and **[keystone_authtoken]** sections, configure the identity service entry.
-
-   In the **[vnc]** section, enable and configure the entry for the remote console.
-
-   In the **[glance]** section, configure the API address for the image service.
-
-   In the **[oslo_concurrency]** section, configure the lock path.
-
-   In the **[placement]** section, configure the entry of the Placement service.
-
-   **Note**:
-
-   **Replace `RABBIT_PASS` with the password of the openstack user in RabbitMQ.**
-
-   **Set `my_ip` to the management IP address of the controller node.**
-
-   **Replace `NOVA_DBPASS` with the password of the nova database.**
-
-   **Replace `NOVA_PASS` with the password of the nova user.**
-
-   **Replace `PLACEMENT_PASS` with the password of the placement user.**
-
-   **Replace `NEUTRON_PASS` with the password of the neutron user.**
-
-   **Replace `METADATA_SECRET` with a proper metadata agent secret.**
-
-   Others
-
-   Check whether VM hardware acceleration is supported (x86 architecture):
-
-   ```shell
-   egrep -c '(vmx|svm)' /proc/cpuinfo (CPT)
-   ```
-
-   If the returned value is **0**, hardware acceleration is not supported. You need to configure libvirt to use QEMU instead of KVM.
-
-   ```shell
-   vim /etc/nova/nova.conf (CPT)
-
-   [libvirt]
-   virt_type = qemu
-   ```
-
-   If the returned value is **1** or a larger value, hardware acceleration is supported, and no extra configuration is required.
-
-   **Note**:
-
-   **If the ARM64 architecture is used, you also need to perform the following configuration:**
-
-   ```shell
-   vim /etc/libvirt/qemu.conf
-
-   nvram = ["/usr/share/AAVMF/AAVMF_CODE.fd: \
-            /usr/share/AAVMF/AAVMF_VARS.fd", \
-           "/usr/share/edk2/aarch64/QEMU_EFI-pflash.raw: \
-            /usr/share/edk2/aarch64/vars-template-pflash.raw"]
-
-   vim /etc/qemu/firmware/edk2-aarch64.json
-
-   {
-       "description": "UEFI firmware for ARM64 virtual machines",
-       "interface-types": [
-           "uefi"
-       ],
-       "mapping": {
-           "device": "flash",
-           "executable": {
-               "filename": "/usr/share/edk2/aarch64/QEMU_EFI-pflash.raw",
-               "format": "raw"
-           },
-           "nvram-template": {
-               "filename": "/usr/share/edk2/aarch64/vars-template-pflash.raw",
-               "format": "raw"
-           }
-       },
-       "targets": [
-           {
-               "architecture": "aarch64",
-               "machines": [
-                   "virt-*"
-               ]
-           }
-       ],
-       "features": [
-
-       ],
-       "tags": [
-
-       ]
-   }
-
-   (CPT)
-   ```
-
-4. Synchronize the database.
-
-   Run the following command to synchronize the **nova-api** database:
-
-   ```shell
-   su -s /bin/sh -c "nova-manage api_db sync" nova (CTL)
-   ```
-
-   Run the following command to register the **cell0** database:
-
-   ```shell
-   su -s /bin/sh -c "nova-manage cell_v2 map_cell0" nova (CTL)
-   ```
-
-   Create the **cell1** cell:
-
-   ```shell
-   su -s /bin/sh -c "nova-manage cell_v2 create_cell --name=cell1 --verbose" nova (CTL)
-   ```
-
-   Synchronize the **nova** database:
-
-   ```shell
-   su -s /bin/sh -c "nova-manage db sync" nova (CTL)
-   ```
-
-   Verify whether **cell0** and **cell1** are correctly registered:
-
-   ```shell
-   su -s /bin/sh -c "nova-manage cell_v2 list_cells" nova (CTL)
-   ```
-
-   Add the compute node to the OpenStack cluster:
-
-   ```shell
-   su -s /bin/sh -c "nova-manage cell_v2 discover_hosts --verbose" nova (CPT)
-   ```
-
-5. Start the services:
-
-   ```shell
-   systemctl enable \ (CTL)
-       openstack-nova-api.service \
-       openstack-nova-scheduler.service \
-       openstack-nova-conductor.service \
-       openstack-nova-novncproxy.service
-
-   systemctl start \ (CTL)
-       openstack-nova-api.service \
-       openstack-nova-scheduler.service \
-       openstack-nova-conductor.service \
-       openstack-nova-novncproxy.service
-   ```
-
-   ```shell
-   systemctl enable libvirtd.service openstack-nova-compute.service (CPT)
-   systemctl start libvirtd.service openstack-nova-compute.service (CPT)
-   ```
-
-6. Perform the verification.
-
-   ```shell
-   source ~/.admin-openrc (CTL)
-   ```
-
-   List the service components to verify that each process is successfully started and registered:
-
-   ```shell
-   openstack compute service list (CTL)
-   ```
-
-   List the API endpoints in the identity service to verify the connection to the identity service:
-
-   ```shell
-   openstack catalog list (CTL)
-   ```
-
-   List the images in the image service to verify the connections:
-
-   ```shell
-   openstack image list (CTL)
-   ```
-
-   Check whether the cells are running properly and whether other prerequisites are met:
-
-   ```shell
-   nova-status upgrade check (CTL)
-   ```
-
-### Installing Neutron
-
-1. Create the database, service credentials, and API endpoints.
-
-   Create the database:
-
-   ```sql
-   mysql -u root -p (CTL)
-
-   MariaDB [(none)]> CREATE DATABASE neutron;
-   MariaDB [(none)]> GRANT ALL PRIVILEGES ON neutron.* TO 'neutron'@'localhost' \
-     IDENTIFIED BY 'NEUTRON_DBPASS';
-   MariaDB [(none)]> GRANT ALL PRIVILEGES ON neutron.* TO 'neutron'@'%' \
-     IDENTIFIED BY 'NEUTRON_DBPASS';
-   MariaDB [(none)]> exit
-   ```
-
-   ***Note***
-
-   **Replace `NEUTRON_DBPASS` with a password for the neutron database.**
-
-   ```shell
-   source ~/.admin-openrc (CTL)
-   ```
-
-   Create the **neutron** service credential:
-
-   ```shell
-   openstack user create --domain default --password-prompt neutron (CTL)
-   openstack role add --project service --user neutron admin (CTL)
-   openstack service create --name neutron --description "OpenStack Networking" network (CTL)
-   ```
-
-   Create the API endpoints of the Neutron service:
-
-   ```shell
-   openstack endpoint create --region RegionOne network public http://controller:9696 (CTL)
-   openstack endpoint create --region RegionOne network internal http://controller:9696 (CTL)
-   openstack endpoint create --region RegionOne network admin http://controller:9696 (CTL)
-   ```
-
-2. Install the software packages:
-
-   ```shell
-   yum install openstack-neutron openstack-neutron-linuxbridge ebtables ipset \ (CTL)
-       openstack-neutron-ml2
-   ```
-
-   ```shell
-   yum install openstack-neutron-linuxbridge ebtables ipset (CPT)
-   ```
-
-3. Configure Neutron.
-
-   Set the main configuration items:
-
-   ```shell
-   vim /etc/neutron/neutron.conf
-
-   [database]
-   connection = mysql+pymysql://neutron:NEUTRON_DBPASS@controller/neutron (CTL)
-
-   [DEFAULT]
-   core_plugin = ml2 (CTL)
-   service_plugins = router (CTL)
-   allow_overlapping_ips = true (CTL)
-   transport_url = rabbit://openstack:RABBIT_PASS@controller
-   auth_strategy = keystone
-   notify_nova_on_port_status_changes = true (CTL)
-   notify_nova_on_port_data_changes = true (CTL)
-   api_workers = 3 (CTL)
-
-   [keystone_authtoken]
-   www_authenticate_uri = http://controller:5000
-   auth_url = http://controller:5000
-   memcached_servers = controller:11211
-   auth_type = password
-   project_domain_name = Default
-   user_domain_name = Default
-   project_name = service
-   username = neutron
-   password = NEUTRON_PASS
-
-   [nova]
-   auth_url = http://controller:5000 (CTL)
-   auth_type = password (CTL)
-   project_domain_name = Default (CTL)
-   user_domain_name = Default (CTL)
-   region_name = RegionOne (CTL)
-   project_name = service (CTL)
-   username = nova (CTL)
-   password = NOVA_PASS (CTL)
-
-   [oslo_concurrency]
-   lock_path = /var/lib/neutron/tmp
-   ```
-
-   ***Description***
-
-   Configure the database entry in the **[database]** section.
-
-   Enable the ML2 and router plugins, allow IP address overlapping, and configure the RabbitMQ message queue entry in the **[DEFAULT]** section.
-
-   Configure the identity authentication service entry in the **[DEFAULT]** and **[keystone_authtoken]** sections.
-
-   Enable the network to notify the compute service of network topology changes in the **[DEFAULT]** and **[nova]** sections.
-
-   Configure the lock path in the **[oslo_concurrency]** section.
-
-   ***Note***
-
-   **Replace `NEUTRON_DBPASS` with the password of the neutron database.**
-
-   **Replace `RABBIT_PASS` with the password of the openstack user in RabbitMQ.**
-
-   **Replace `NEUTRON_PASS` with the password of the neutron user.**
-
-   **Replace `NOVA_PASS` with the password of the nova user.**
-
-   Configure the ML2 plugin:
-
-   ```shell
-   vim /etc/neutron/plugins/ml2/ml2_conf.ini
-
-   [ml2]
-   type_drivers = flat,vlan,vxlan
-   tenant_network_types = vxlan
-   mechanism_drivers = linuxbridge,l2population
-   extension_drivers = port_security
-
-   [ml2_type_flat]
-   flat_networks = provider
-
-   [ml2_type_vxlan]
-   vni_ranges = 1:1000
-
-   [securitygroup]
-   enable_ipset = true
-   ```
-
-   Create the symbolic link for /etc/neutron/plugin.ini:
-
-   ```shell
-   ln -s /etc/neutron/plugins/ml2/ml2_conf.ini /etc/neutron/plugin.ini
-   ```
-
-   **Note**
-
-   **Enable flat, vlan, and vxlan networks, enable the linuxbridge and l2population mechanisms, and enable the port security extension driver in the [ml2] section.**
-
-   **Configure the flat network as the provider virtual network in the [ml2_type_flat] section.**
-
-   **Configure the range of the VXLAN network identifier in the [ml2_type_vxlan] section.**
-
-   **Set ipset enabled in the [securitygroup] section.**
-
-   **Remarks**
-
-   **The actual L2 configuration can be modified as required. In this example, the provider network + linuxbridge is used.**
-
-   Configure the Linux bridge agent:
-
-   ```shell
-   vim /etc/neutron/plugins/ml2/linuxbridge_agent.ini
-
-   [linux_bridge]
-   physical_interface_mappings = provider:PROVIDER_INTERFACE_NAME
-
-   [vxlan]
-   enable_vxlan = true
-   local_ip = OVERLAY_INTERFACE_IP_ADDRESS
-   l2_population = true
-
-   [securitygroup]
-   enable_security_group = true
-   firewall_driver = neutron.agent.linux.iptables_firewall.IptablesFirewallDriver
-   ```
-
-   ***Description***
-
-   Map the provider virtual network to the physical network interface in the **[linux_bridge]** section.
-
-   Enable the VXLAN overlay network, configure the IP address of the physical network interface that processes the overlay network, and enable layer-2 population in the **[vxlan]** section.
-
-   Enable the security group and configure the Linux bridge iptables firewall driver in the **[securitygroup]** section.
-
-   ***Note***
-
-   **Replace `PROVIDER_INTERFACE_NAME` with the physical network interface.**
-
-   **Replace `OVERLAY_INTERFACE_IP_ADDRESS` with the management IP address of the controller node.**
-
-   Configure the Layer-3 agent:
-
-   ```shell
-   vim /etc/neutron/l3_agent.ini (CTL)
-
-   [DEFAULT]
-   interface_driver = linuxbridge
-   ```
-
-   ***Description***
-
-   Set the interface driver to linuxbridge in the **[DEFAULT]** section.
-
-   Configure the DHCP agent:
-
-   ```shell
-   vim /etc/neutron/dhcp_agent.ini (CTL)
-
-   [DEFAULT]
-   interface_driver = linuxbridge
-   dhcp_driver = neutron.agent.linux.dhcp.Dnsmasq
-   enable_isolated_metadata = true
-   ```
-
-   ***Description***
-
-   In the **[DEFAULT]** section, configure the linuxbridge interface driver and the Dnsmasq DHCP driver, and enable isolated metadata.
-
-   Configure the metadata agent:
-
-   ```shell
-   vim /etc/neutron/metadata_agent.ini (CTL)
-
-   [DEFAULT]
-   nova_metadata_host = controller
-   metadata_proxy_shared_secret = METADATA_SECRET
-   ```
-
-   ***Description***
-
-   In the **[DEFAULT]** section, configure the metadata host and the shared secret.
-
-   ***Note***
-
-   **Replace `METADATA_SECRET` with a proper metadata agent secret.**
-
-4. Configure Nova:
-
-   ```shell
-   vim /etc/nova/nova.conf
-
-   [neutron]
-   auth_url = http://controller:5000
-   auth_type = password
-   project_domain_name = Default
-   user_domain_name = Default
-   region_name = RegionOne
-   project_name = service
-   username = neutron
-   password = NEUTRON_PASS
-   service_metadata_proxy = true (CTL)
-   metadata_proxy_shared_secret = METADATA_SECRET (CTL)
-   ```
-
-   ***Description***
-
-   In the **[neutron]** section, configure the access parameters, enable the metadata agent, and configure the secret.
-
-   ***Note***
-
-   **Replace `NEUTRON_PASS` with the password of the neutron user.**
-
-   **Replace `METADATA_SECRET` with a proper metadata agent secret.**
-
-5. Synchronize the database:
-
-   ```shell
-   su -s /bin/sh -c "neutron-db-manage --config-file /etc/neutron/neutron.conf \
-       --config-file /etc/neutron/plugins/ml2/ml2_conf.ini upgrade head" neutron
-   ```
-
-6. Run the following command to restart the compute API service:
-
-   ```shell
-   systemctl restart openstack-nova-api.service
-   ```
-
-7. Start the network services:
-
-   ```shell
-   systemctl enable neutron-server.service neutron-linuxbridge-agent.service \ (CTL)
-       neutron-dhcp-agent.service neutron-metadata-agent.service
-   systemctl enable neutron-l3-agent.service
-   systemctl restart openstack-nova-api.service neutron-server.service \ (CTL)
-       neutron-linuxbridge-agent.service neutron-dhcp-agent.service \
-       neutron-metadata-agent.service neutron-l3-agent.service
-
-   systemctl enable neutron-linuxbridge-agent.service (CPT)
-   systemctl restart neutron-linuxbridge-agent.service openstack-nova-compute.service (CPT)
-   ```
-
-8. Perform the verification.
-
-   Run the following command to verify whether the Neutron agents are started successfully:
-
-   ```shell
-   openstack network agent list
-   ```
-
-### Installing Cinder
-
-1. Create the database, service credentials, and API endpoints.
-
-   Create the database:
-
-   ```sql
-   mysql -u root -p
-
-   MariaDB [(none)]> CREATE DATABASE cinder;
-   MariaDB [(none)]> GRANT ALL PRIVILEGES ON cinder.* TO 'cinder'@'localhost' \
-     IDENTIFIED BY 'CINDER_DBPASS';
-   MariaDB [(none)]> GRANT ALL PRIVILEGES ON cinder.* TO 'cinder'@'%' \
-     IDENTIFIED BY 'CINDER_DBPASS';
-   MariaDB [(none)]> exit
-   ```
-
-   ***Note***
-
-   **Replace `CINDER_DBPASS` with a password for the cinder database.**
-
-   ```shell
-   source ~/.admin-openrc
-   ```
-
-   Create the Cinder service credentials:
-
-   ```shell
-   openstack user create --domain default --password-prompt cinder
-   openstack role add --project service --user cinder admin
-   openstack service create --name cinderv2 --description "OpenStack Block Storage" volumev2
-   openstack service create --name cinderv3 --description "OpenStack Block Storage" volumev3
-   ```
-
-   Create the API endpoints for the block storage service:
-
-   ```shell
-   openstack endpoint create --region RegionOne volumev2 public http://controller:8776/v2/%\(project_id\)s
-   openstack endpoint create --region RegionOne volumev2 internal http://controller:8776/v2/%\(project_id\)s
-   openstack endpoint create --region RegionOne volumev2 admin http://controller:8776/v2/%\(project_id\)s
-   openstack endpoint create --region RegionOne volumev3 public http://controller:8776/v3/%\(project_id\)s
-   openstack endpoint create --region RegionOne volumev3 internal http://controller:8776/v3/%\(project_id\)s
-   openstack endpoint create --region RegionOne volumev3 admin http://controller:8776/v3/%\(project_id\)s
-   ```
-
-2. Install the software packages:
-
-   ```shell
-   yum install openstack-cinder-api openstack-cinder-scheduler (CTL)
-   ```
-
-   ```shell
-   yum install lvm2 device-mapper-persistent-data scsi-target-utils rpcbind nfs-utils \ (STG)
-       openstack-cinder-volume openstack-cinder-backup
-   ```
-
-3. Prepare the storage devices. The following is an example:
-
-   ```shell
-   pvcreate /dev/vdb
-   vgcreate cinder-volumes /dev/vdb
-
-   vim /etc/lvm/lvm.conf
-
-   devices {
-       ...
-       filter = [ "a/vdb/", "r/.*/"]
-   ```
-
-   ***Description***
-
-   In the **devices** section, add a filter to allow the **/dev/vdb** device and reject other devices.
-
-4. Prepare the NFS:
-
-   ```shell
-   mkdir -p /root/cinder/backup
-
-   cat << EOF >> /etc/exports
-   /root/cinder/backup 192.168.1.0/24(rw,sync,no_root_squash,no_all_squash)
-   EOF
-
-   ```
-
-5. Configure Cinder:
-
-   ```shell
-   vim /etc/cinder/cinder.conf
-
-   [DEFAULT]
-   transport_url = rabbit://openstack:RABBIT_PASS@controller
-   auth_strategy = keystone
-   my_ip = 10.0.0.11
-   enabled_backends = lvm (STG)
-   backup_driver = cinder.backup.drivers.nfs.NFSBackupDriver (STG)
-   backup_share = HOST:PATH (STG)
-
-   [database]
-   connection = mysql+pymysql://cinder:CINDER_DBPASS@controller/cinder
-
-   [keystone_authtoken]
-   www_authenticate_uri = http://controller:5000
-   auth_url = http://controller:5000
-   memcached_servers = controller:11211
-   auth_type = password
-   project_domain_name = Default
-   user_domain_name = Default
-   project_name = service
-   username = cinder
-   password = CINDER_PASS
-
-   [oslo_concurrency]
-   lock_path = /var/lib/cinder/tmp
-
-   [lvm]
-   volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver (STG)
-   volume_group = cinder-volumes (STG)
-   iscsi_protocol = iscsi (STG)
-   iscsi_helper = tgtadm (STG)
-   ```
-
-   ***Description***
-
-   In the **[database]** section, configure the database entry.
-
-   In the **[DEFAULT]** section, configure the RabbitMQ message queue entry and **my_ip**.
-
-   In the **[DEFAULT]** and **[keystone_authtoken]** sections, configure the identity authentication service entry.
-
-   In the **[oslo_concurrency]** section, configure the lock path.
-
-   ***Note***
-
-   **Replace `CINDER_DBPASS` with the password of the cinder database.**
-
-   **Replace `RABBIT_PASS` with the password of the openstack user in RabbitMQ.**
-
-   **Set `my_ip` to the management IP address of the controller node.**
-
-   **Replace `CINDER_PASS` with the password of the cinder user.**
-
-   **Replace `HOST:PATH` with the host IP address and the shared path of the NFS.**
-
-6. Synchronize the database:
-
-   ```shell
-   su -s /bin/sh -c "cinder-manage db sync" cinder (CTL)
-   ```
-
-7. Configure Nova:
-
-   ```shell
-   vim /etc/nova/nova.conf (CTL)
-
-   [cinder]
-   os_region_name = RegionOne
-   ```
-
-8. Restart the compute API service:
-
-   ```shell
-   systemctl restart openstack-nova-api.service
-   ```
-
-9. Start the Cinder services:
-
-   ```shell
-   systemctl enable openstack-cinder-api.service openstack-cinder-scheduler.service (CTL)
-   systemctl start openstack-cinder-api.service openstack-cinder-scheduler.service (CTL)
-   ```
-
-   ```shell
-   systemctl enable rpcbind.service nfs-server.service tgtd.service iscsid.service \ (STG)
-       openstack-cinder-volume.service \
-       openstack-cinder-backup.service
-   systemctl start rpcbind.service nfs-server.service tgtd.service iscsid.service \ (STG)
-       openstack-cinder-volume.service \
-       openstack-cinder-backup.service
-   ```
-
-   ***Note***
-
-   If the Cinder volumes are mounted using tgtadm, modify the **/etc/tgt/tgtd.conf** file as follows so that tgtd can discover the iSCSI targets of cinder-volume:
-
-   ```shell
-   include /var/lib/cinder/volumes/*
-   ```
-
-10. Perform the verification:
-
-    ```shell
-    source ~/.admin-openrc
-    openstack volume service list
-    ```
-
-### Installing Horizon
-
-1. Install the software package:
-
-   ```shell
-   yum install openstack-dashboard
-   ```
-
-2. Modify the file.
-
-   Modify the variables:
-
-   ```text
-   vim /etc/openstack-dashboard/local_settings
-
-   OPENSTACK_HOST = "controller"
-   ALLOWED_HOSTS = ['*', ]
-
-   SESSION_ENGINE = 'django.contrib.sessions.backends.cache'
-
-   CACHES = {
-       'default': {
-           'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
-           'LOCATION': 'controller:11211',
-       }
-   }
-
-   OPENSTACK_KEYSTONE_URL = "http://%s:5000/v3" % OPENSTACK_HOST
-   OPENSTACK_KEYSTONE_MULTIDOMAIN_SUPPORT = True
-   OPENSTACK_KEYSTONE_DEFAULT_DOMAIN = "Default"
-   OPENSTACK_KEYSTONE_DEFAULT_ROLE = "user"
-
-   OPENSTACK_API_VERSIONS = {
-       "identity": 3,
-       "image": 2,
-       "volume": 3,
-   }
-   ```
-
-3. Restart the httpd service:
-
-   ```shell
-   systemctl restart httpd.service memcached.service
-   ```
-
-4. Perform the verification.
-
-   Open the browser, enter **http://HOSTIP/dashboard/** in the address bar, and log in to Horizon.
-
-   ***Note***
-
-   **Replace `HOSTIP` with the management plane IP address of the controller node.**
-
-### Installing Tempest
-
-Tempest is the integration test service of OpenStack. If you need to run a fully automatic test of the functions of the installed OpenStack environment, you are advised to use Tempest. Otherwise, you can choose not to install it.
-
-1. Install Tempest:
-
-   ```shell
-   yum install openstack-tempest
-   ```
-
-2. Initialize the directory:
-
-   ```shell
-   tempest init mytest
-   ```
-
-3. Modify the configuration file:
-
-   ```shell
-   cd mytest
-   vi etc/tempest.conf
-   ```
-
-   Configure the current OpenStack environment information in **tempest.conf**. For details, see the [official example](https://docs.openstack.org/tempest/latest/sampleconf.html).
-
-4. Perform the test:
-
-   ```shell
-   tempest run
-   ```
-
-5. (Optional) Install the Tempest extensions.
-
-   The OpenStack services provide some Tempest test packages that you can install to enrich the Tempest test content. In Wallaby, extension tests for Cinder, Glance, Keystone, Ironic, and Trove are provided. You can run the following command to install and use them:
-
-   ```shell
-   yum install python3-cinder-tempest-plugin python3-glance-tempest-plugin python3-ironic-tempest-plugin python3-keystone-tempest-plugin python3-trove-tempest-plugin
-   ```
-
-### Installing Ironic
-
-Ironic is the bare metal service of OpenStack. If you need to deploy bare metal machines, Ironic is recommended. Otherwise, you can choose not to install it.
-
-1. Set the database.
-
-   The bare metal service stores information in the database. Create an **ironic** database that can be accessed by the **ironic** user and replace **IRONIC_DBPASSWORD** with a proper password.
-
-   ```sql
-   mysql -u root -p
-
-   MariaDB [(none)]> CREATE DATABASE ironic CHARACTER SET utf8;
-   MariaDB [(none)]> GRANT ALL PRIVILEGES ON ironic.* TO 'ironic'@'localhost' \
-     IDENTIFIED BY 'IRONIC_DBPASSWORD';
-   MariaDB [(none)]> GRANT ALL PRIVILEGES ON ironic.* TO 'ironic'@'%' \
-     IDENTIFIED BY 'IRONIC_DBPASSWORD';
-   ```
-
-2. Create service user authentication.
-
-   1. Create the bare metal service users:
-
-      ```shell
-      openstack user create --password IRONIC_PASSWORD \
-          --email ironic@example.com ironic
-      openstack role add --project service --user ironic admin
-      openstack service create --name ironic \
-          --description "Ironic baremetal provisioning service" baremetal
-
-      openstack service create --name ironic-inspector --description "Ironic inspector baremetal provisioning service" baremetal-introspection
-      openstack user create --password IRONIC_INSPECTOR_PASSWORD --email ironic_inspector@example.com ironic_inspector
-      openstack role add --project service --user ironic_inspector admin
-      ```
-
-   2. Create the bare metal service access entries:
-
-      ```shell
-      openstack endpoint create --region RegionOne baremetal admin http://$IRONIC_NODE:6385
-      openstack endpoint create --region RegionOne baremetal public http://$IRONIC_NODE:6385
-      openstack endpoint create --region RegionOne baremetal internal http://$IRONIC_NODE:6385
-      openstack endpoint create --region RegionOne baremetal-introspection internal http://172.20.19.13:5050/v1
-      openstack endpoint create --region RegionOne baremetal-introspection public http://172.20.19.13:5050/v1
-      openstack endpoint create --region RegionOne baremetal-introspection admin http://172.20.19.13:5050/v1
-      ```
-
-3. Configure the ironic-api service.
-
-   Configuration file path: **/etc/ironic/ironic.conf**
-
-   1. Use **connection** to configure the location of the database as follows. Replace **IRONIC_DBPASSWORD** with the password of user **ironic** and replace **DB_IP** with the IP address of the database server.
-
-      ```shell
-      [database]
-
-      # The SQLAlchemy connection string used to connect to the
-      # database (string value)
-
-      connection = mysql+pymysql://ironic:IRONIC_DBPASSWORD@DB_IP/ironic
-      ```
-
-   2. Configure the ironic-api service to use the RabbitMQ message broker. Replace **RPC_\*** with the detailed address and the credential of RabbitMQ.
-
-      ```shell
-      [DEFAULT]
-
-      # A URL representing the messaging driver to use and its full
-      # configuration. (string value)
-
-      transport_url = rabbit://RPC_USER:RPC_PASSWORD@RPC_HOST:RPC_PORT/
-      ```
-
-      You can also use json-rpc instead of RabbitMQ.
-
-   3. Configure the ironic-api service to use the credential of the identity authentication service. Replace **PUBLIC_IDENTITY_IP** with the public IP address of the identity authentication server, **PRIVATE_IDENTITY_IP** with the private IP address of the identity authentication server, and **IRONIC_PASSWORD** with the password of the **ironic** user in the identity authentication service.
- - ```shell - [DEFAULT] - - # Authentication strategy used by ironic-api: one of - # "keystone" or "noauth". "noauth" should not be used in a - # production environment because all authentication will be - # disabled. (string value) - - auth_strategy=keystone - host = controller - memcache_servers = controller:11211 - enabled_network_interfaces = flat,noop,neutron - default_network_interface = noop - transport_url = rabbit://openstack:RABBITPASSWD@controller:5672/ - enabled_hardware_types = ipmi - enabled_boot_interfaces = pxe - enabled_deploy_interfaces = direct - default_deploy_interface = direct - enabled_inspect_interfaces = inspector - enabled_management_interfaces = ipmitool - enabled_power_interfaces = ipmitool - enabled_rescue_interfaces = no-rescue,agent - isolinux_bin = /usr/share/syslinux/isolinux.bin - logging_context_format_string = %(asctime)s.%(msecs)03d %(process)d %(levelname)s %(name)s [%(global_request_id)s %(request_id)s %(user_identity)s] %(instance)s%(message)s - - [keystone_authtoken] - # Authentication type to load (string value) - auth_type=password - # Complete public Identity API endpoint (string value) - www_authenticate_uri=http://PUBLIC_IDENTITY_IP:5000 - # Complete admin Identity API endpoint. (string value) - auth_url=http://PRIVATE_IDENTITY_IP:5000 - # Service username. (string value) - username=ironic - # Service account password. (string value) - password=IRONIC_PASSWORD - # Service tenant name. 
(string value) - project_name=service - # Domain name containing project (string value) - project_domain_name=Default - # User's domain name (string value) - user_domain_name=Default - - [agent] - deploy_logs_collect = always - deploy_logs_local_path = /var/log/ironic/deploy - deploy_logs_storage_backend = local - image_download_source = http - stream_raw_images = false - force_raw_images = false - verify_ca = False - - [oslo_concurrency] - - [oslo_messaging_notifications] - transport_url = rabbit://openstack:123456@172.20.19.25:5672/ - topics = notifications - driver = messagingv2 - - [oslo_messaging_rabbit] - amqp_durable_queues = True - rabbit_ha_queues = True - - [pxe] - ipxe_enabled = false - pxe_append_params = nofb nomodeset vga=normal coreos.autologin ipa-insecure=1 - image_cache_size = 204800 - tftp_root=/var/lib/tftpboot/cephfs/ - tftp_master_path=/var/lib/tftpboot/cephfs/master_images - - [dhcp] - dhcp_provider = none - ``` - - 4. Create the bare metal service database table: - - ```shell - ironic-dbsync --config-file /etc/ironic/ironic.conf create_schema - ``` - - 5. Restart the ironic-api service: - - ```shell - sudo systemctl restart openstack-ironic-api - ``` - -4. Configure the ironic-conductor service. - - 1. Replace **HOST_IP** with the IP address of the conductor host. - - ```shell - [DEFAULT] - - # IP address of this host. If unset, will determine the IP - # programmatically. If unable to do so, will use "127.0.0.1". - # (string value) - - my_ip=HOST_IP - ``` - - 2. Specifies the location of the database. ironic-conductor must use the same configuration as ironic-api. Replace **IRONIC_DBPASSWORD** with the password of user **ironic** and replace **DB_IP** with the IP address of the database server. - - ```shell - [database] - - # The SQLAlchemy connection string to use to connect to the - # database. (string value) - - connection = mysql+pymysql://ironic:IRONIC_DBPASSWORD@DB_IP/ironic - ``` - - 3. 
Configure the ironic-api service to use the RabbitMQ message broker. ironic-conductor must use the same configuration as ironic-api. Replace **RPC_\*** with the detailed address and the credential of RabbitMQ. - - ```shell - [DEFAULT] - - # A URL representing the messaging driver to use and its full - # configuration. (string value) - - transport_url = rabbit://RPC_USER:RPC_PASSWORD@RPC_HOST:RPC_PORT/ - ``` - - You can also use json-rpc instead of RabbitMQ. - - 4. Configure the credentials to access other OpenStack services. - - To communicate with other OpenStack services, the bare metal service needs to use the service users to get authenticated by the OpenStack Identity service when requesting other services. The credentials of these users must be configured in each configuration file associated to the corresponding service. - - ```shell - [neutron] - Accessing the OpenStack network services. - [glance] - Accessing the OpenStack image service. - [swift] - Accessing the OpenStack object storage service. - [cinder] - Accessing the OpenStack block storage service. - [inspector] Accessing the OpenStack bare metal introspection service. - [service_catalog] - A special item to store the credential used by the bare metal service. The credential is used to discover the API URL endpoint registered in the OpenStack identity authentication service catalog by the bare metal service. - ``` - - For simplicity, you can use one service user for all services. For backward compatibility, the user name must be the same as that configured in [keystone_authtoken] of the ironic-api service. However, this is not mandatory. You can also create and configure a different service user for each service. - - In the following example, the authentication information for the user to access the OpenStack network service is configured as follows: - - ```shell - The network service is deployed in the identity authentication service domain named RegionOne. 
        Only the public endpoint interface is registered in the service catalog.

        A specific CA SSL certificate is used for HTTPS connection when sending a request.

        The same service user as that configured for ironic-api.

        The dynamic password authentication plugin discovers a proper identity authentication service API version based on other options.
        ```

        ```shell
        [neutron]

        # Authentication type to load (string value)
        auth_type = password
        # Authentication URL (string value)
        auth_url=https://IDENTITY_IP:5000/
        # Username (string value)
        username=ironic
        # User's password (string value)
        password=IRONIC_PASSWORD
        # Project name to scope to (string value)
        project_name=service
        # Domain ID containing project (string value)
        project_domain_id=default
        # User's domain id (string value)
        user_domain_id=default
        # PEM encoded Certificate Authority to use when verifying
        # HTTPs connections. (string value)
        cafile=/opt/stack/data/ca-bundle.pem
        # The default region_name for endpoint URL discovery. (string
        # value)
        region_name = RegionOne
        # List of interfaces, in order of preference, for endpoint
        # URL. (list value)
        valid_interfaces=public
        ```

        By default, to communicate with other services, the bare metal service attempts to discover a proper endpoint of the service through the service catalog of the identity authentication service. If you want to use a different endpoint for a specific service, specify the endpoint_override option in the bare metal service configuration file.

        ```shell
        [neutron]
        ...
        endpoint_override =
        ```

    5. Configure the allowed drivers and hardware types.
        Set enabled_hardware_types to specify the hardware types that can be used by ironic-conductor:

        ```shell
        [DEFAULT]
        enabled_hardware_types = ipmi
        ```

        Configure hardware interfaces:

        ```shell
        enabled_boot_interfaces = pxe
        enabled_deploy_interfaces = direct,iscsi
        enabled_inspect_interfaces = inspector
        enabled_management_interfaces = ipmitool
        enabled_power_interfaces = ipmitool
        ```

        Configure the default value of the interface:

        ```shell
        [DEFAULT]
        default_deploy_interface = direct
        default_network_interface = neutron
        ```

        If any driver that uses Direct Deploy is enabled, you must install and configure the Swift backend of the image service. The Ceph object gateway (RADOS gateway) can also be used as the backend of the image service.

    6. Restart the ironic-conductor service:

        ```shell
        sudo systemctl restart openstack-ironic-conductor
        ```

5. Configure the ironic-inspector service.

    Configuration file path: **/etc/ironic-inspector/inspector.conf**.

    1. Create the database:

        ```shell
        # mysql -u root -p

        MariaDB [(none)]> CREATE DATABASE ironic_inspector CHARACTER SET utf8;

        MariaDB [(none)]> GRANT ALL PRIVILEGES ON ironic_inspector.* TO 'ironic_inspector'@'localhost' \
        IDENTIFIED BY 'IRONIC_INSPECTOR_DBPASSWORD';
        MariaDB [(none)]> GRANT ALL PRIVILEGES ON ironic_inspector.* TO 'ironic_inspector'@'%' \
        IDENTIFIED BY 'IRONIC_INSPECTOR_DBPASSWORD';
        ```

    2. Use **connection** to configure the location of the database as follows.
       Replace **IRONIC_INSPECTOR_DBPASSWORD** with the password of user **ironic_inspector** and replace **DB_IP** with the IP address of the database server:

        ```shell
        [database]
        backend = sqlalchemy
        connection = mysql+pymysql://ironic_inspector:IRONIC_INSPECTOR_DBPASSWORD@DB_IP/ironic_inspector
        min_pool_size = 100
        max_pool_size = 500
        pool_timeout = 30
        max_retries = 5
        max_overflow = 200
        db_retry_interval = 2
        db_inc_retry_interval = True
        db_max_retry_interval = 2
        db_max_retries = 5
        ```

    3. Configure the communication address of the message queue:

        ```shell
        [DEFAULT]
        transport_url = rabbit://RPC_USER:RPC_PASSWORD@RPC_HOST:RPC_PORT/
        ```

    4. Configure the Keystone authentication:

        ```shell
        [DEFAULT]

        auth_strategy = keystone
        timeout = 900
        rootwrap_config = /etc/ironic-inspector/rootwrap.conf
        logging_context_format_string = %(asctime)s.%(msecs)03d %(process)d %(levelname)s %(name)s [%(global_request_id)s %(request_id)s %(user_identity)s] %(instance)s%(message)s
        log_dir = /var/log/ironic-inspector
        state_path = /var/lib/ironic-inspector
        use_stderr = False

        [ironic]
        api_endpoint = http://IRONIC_API_HOST_ADDRESS:6385
        auth_type = password
        auth_url = http://PUBLIC_IDENTITY_IP:5000
        auth_strategy = keystone
        ironic_url = http://IRONIC_API_HOST_ADDRESS:6385
        os_region = RegionOne
        project_name = service
        project_domain_name = Default
        user_domain_name = Default
        username = IRONIC_SERVICE_USER_NAME
        password = IRONIC_SERVICE_USER_PASSWORD

        [keystone_authtoken]
        auth_type = password
        auth_url = http://control:5000
        www_authenticate_uri = http://control:5000
        project_domain_name = default
        user_domain_name = default
        project_name = service
        username = ironic_inspector
        password = IRONICPASSWD
        region_name = RegionOne
        memcache_servers = control:11211
        token_cache_time = 300

        [processing]
        add_ports = active
        processing_hooks =
$default_processing_hooks,local_link_connection,lldp_basic
        ramdisk_logs_dir = /var/log/ironic-inspector/ramdisk
        always_store_ramdisk_logs = true
        store_data = none
        power_off = false

        [pxe_filter]
        driver = iptables

        [capabilities]
        boot_mode=True
        ```

    5. Configure the ironic inspector dnsmasq service:

        ```shell
        # Configuration file path: /etc/ironic-inspector/dnsmasq.conf
        port=0
        interface=enp3s0    # Replace with the actual listening network interface.
        dhcp-range=172.20.19.100,172.20.19.110    # Replace with the actual DHCP IP address range.
        bind-interfaces
        enable-tftp

        dhcp-match=set:efi,option:client-arch,7
        dhcp-match=set:efi,option:client-arch,9
        dhcp-match=aarch64,option:client-arch,11
        dhcp-boot=tag:aarch64,grubaa64.efi
        dhcp-boot=tag:!aarch64,tag:efi,grubx64.efi
        dhcp-boot=tag:!aarch64,tag:!efi,pxelinux.0

        tftp-root=/tftpboot    # Replace with the actual tftpboot directory.
        log-facility=/var/log/dnsmasq.log
        ```

    6. Disable DHCP for the subnet of the ironic provision network.

        ```
        openstack subnet set --no-dhcp 72426e89-f552-4dc4-9ac7-c4e131ce7f3c
        ```

    7. Initialize the database of the ironic-inspector service.

        Run the following command on the controller node:

        ```
        ironic-inspector-dbsync --config-file /etc/ironic-inspector/inspector.conf upgrade
        ```

    8. Start the services:

        ```shell
        systemctl enable --now openstack-ironic-inspector.service
        systemctl enable --now openstack-ironic-inspector-dnsmasq.service
        ```

6. Configure the httpd service.

    1. Create the root directory of the httpd used by Ironic, and set the owner and owner group. The directory path must be the same as the path specified by the **http_root** configuration item in the **[deploy]** group in **/etc/ironic/ironic.conf**.

        ```
        mkdir -p /var/lib/ironic/httproot
        chown ironic.ironic /var/lib/ironic/httproot
        ```

    2. Install and configure the httpd service.

        1. Install the httpd service.
           If the httpd service is already installed, skip this step.

            ```
            yum install httpd -y
            ```

        2. Create the **/etc/httpd/conf.d/openstack-ironic-httpd.conf** file. The file content is as follows:

            ```
            Listen 8080

            <VirtualHost *:8080>
                ServerName ironic.openeuler.com

                ErrorLog "/var/log/httpd/openstack-ironic-httpd-error_log"
                CustomLog "/var/log/httpd/openstack-ironic-httpd-access_log" "%h %l %u %t \"%r\" %>s %b"

                DocumentRoot "/var/lib/ironic/httproot"
                <Directory "/var/lib/ironic/httproot">
                    Options Indexes FollowSymLinks
                    Require all granted
                </Directory>
                LogLevel warn
                AddDefaultCharset UTF-8
                EnableSendfile on
            </VirtualHost>
            ```

            The listening port must be the same as the port specified by **http_url** in the **[deploy]** section of **/etc/ironic/ironic.conf**.

        3. Restart the httpd service:

            ```
            systemctl restart httpd
            ```

7. Create the deploy ramdisk image.

    The ramdisk image of Wallaby can be created using the ironic-python-agent service or the disk-image-builder tool. You can also use the latest ironic-python-agent-builder provided by the community, or other tools.
    To use the Wallaby native tools, install the corresponding software package:

    ```shell
    yum install openstack-ironic-python-agent
    # or
    yum install diskimage-builder
    ```

    For details, see the [official document](https://docs.openstack.org/ironic/queens/install/deploy-ramdisk.html).

    The following describes how to use ironic-python-agent-builder to build the deploy image used by ironic.

    1. Install ironic-python-agent-builder.

        1. Install the tool:

            ```shell
            pip install ironic-python-agent-builder
            ```

        2. Modify the python interpreter in the following files:

            ```shell
            /usr/bin/yum
            /usr/libexec/urlgrabber-ext-down
            ```

        3. Install the other necessary tools:

            ```shell
            yum install git
            ```

            `DIB` depends on the `semanage` command. Therefore, check whether the `semanage --help` command is available before creating an image.
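The availability check can be scripted. A minimal sketch (the `check_cmd` helper is illustrative and not part of the guide):

```shell
# Report whether a command is on the PATH, e.g. before running DIB.
check_cmd() {
    # Prints "available" or "missing" for the given command name.
    if command -v "$1" >/dev/null 2>&1; then
        echo "available"
    else
        echo "missing"
    fi
}

check_cmd semanage
```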
            If the system displays a message indicating that the command is unavailable, install it:

            ```shell
            # Check which package needs to be installed.
            [root@localhost ~]# yum provides /usr/sbin/semanage
            Loaded plug-in: fastestmirror
            Loading mirror speeds from cached hostfile
             * base: mirror.vcu.edu
             * extras: mirror.vcu.edu
             * updates: mirror.math.princeton.edu
            policycoreutils-python-2.5-34.el7.aarch64 : SELinux policy core python utilities
            Source: base
            Matching source:
            File name: /usr/sbin/semanage
            # Install.
            [root@localhost ~]# yum install policycoreutils-python
            ```

    2. Create the image.

        For the `arm` architecture, add the following environment variable:

        ```shell
        export ARCH=aarch64
        ```

        Basic usage:

        ```shell
        usage: ironic-python-agent-builder [-h] [-r RELEASE] [-o OUTPUT] [-e ELEMENT]
                                           [-b BRANCH] [-v] [--extra-args EXTRA_ARGS]
                                           distribution

        positional arguments:
          distribution          Distribution to use

        optional arguments:
          -h, --help            show this help message and exit
          -r RELEASE, --release RELEASE
                                Distribution release to use
          -o OUTPUT, --output OUTPUT
                                Output base file name
          -e ELEMENT, --element ELEMENT
                                Additional DIB element to use
          -b BRANCH, --branch BRANCH
                                If set, override the branch that is used for ironic-
                                python-agent and requirements
          -v, --verbose         Enable verbose logging in diskimage-builder
          --extra-args EXTRA_ARGS
                                Extra arguments to pass to diskimage-builder
        ```

        Example:

        ```shell
        ironic-python-agent-builder centos -o /mnt/ironic-agent-ssh -b origin/stable/rocky
        ```

    3. Allow SSH login.

        Initialize the environment variables and create the image:

        ```shell
        export DIB_DEV_USER_USERNAME=ipa
        export DIB_DEV_USER_PWDLESS_SUDO=yes
        export DIB_DEV_USER_PASSWORD='123'
        ironic-python-agent-builder centos -o /mnt/ironic-agent-ssh -b origin/stable/rocky -e selinux-permissive -e devuser
        ```

    4. Specify the code repository.
        Initialize the corresponding environment variables and create the image:

        ```shell
        # Specify the address and version of the repository.
        DIB_REPOLOCATION_ironic_python_agent=git@172.20.2.149:liuzz/ironic-python-agent.git
        DIB_REPOREF_ironic_python_agent=origin/develop

        # Clone code from Gerrit.
        DIB_REPOLOCATION_ironic_python_agent=https://review.opendev.org/openstack/ironic-python-agent
        DIB_REPOREF_ironic_python_agent=refs/changes/43/701043/1
        ```

        Reference: [source-repositories](https://docs.openstack.org/diskimage-builder/latest/elements/source-repositories/README.html).

        The specified repository address and version have been verified to work.

    5. Note

The template of the PXE configuration file of the native OpenStack does not support the ARM64 architecture. You need to modify the native OpenStack code.

In Wallaby, Ironic provided by the community does not support booting from ARM 64-bit UEFI PXE. As a result, the generated grub.cfg file (generally in /tftpboot/) has an incorrect format, causing a PXE boot failure.

(Figure omitted: example of an incorrectly generated grub.cfg file.)

In the ARM architecture, the commands for loading the vmlinux and ramdisk images are **linux** and **initrd**, respectively, whereas the incorrectly generated file uses the x86 UEFI PXE startup commands. You need to modify the code logic for generating the grub.cfg file.

A TLS error is reported when Ironic sends a request to IPA to query the command execution status. By default, both IPA and Ironic of Wallaby have TLS authentication enabled when sending requests to each other. Disable TLS authentication as described on the official website.

1.
   Add **ipa-insecure=1** to the following configuration in the Ironic configuration file (**/etc/ironic/ironic.conf**):

    ```
    [agent]
    verify_ca = False

    [pxe]
    pxe_append_params = nofb nomodeset vga=normal coreos.autologin ipa-insecure=1
    ```

2. Add the IPA configuration file **/etc/ironic_python_agent/ironic_python_agent.conf** to the ramdisk image and configure TLS as follows:

    **/etc/ironic_python_agent/ironic_python_agent.conf** (The **/etc/ironic_python_agent** directory must be created in advance.)

    ```
    [DEFAULT]
    enable_auto_tls = False
    ```

    Set the permission:

    ```
    chown -R ipa.ipa /etc/ironic_python_agent/
    ```

3. Modify the startup file of the IPA service and add the configuration file option.

    vim /usr/lib/systemd/system/ironic-python-agent.service

    ```
    [Unit]
    Description=Ironic Python Agent
    After=network-online.target

    [Service]
    ExecStartPre=/sbin/modprobe vfat
    ExecStart=/usr/local/bin/ironic-python-agent --config-file /etc/ironic_python_agent/ironic_python_agent.conf
    Restart=always
    RestartSec=30s

    [Install]
    WantedBy=multi-user.target
    ```

### Installing Kolla

Kolla provides the OpenStack service with a container-based deployment function that is ready for the production environment. The Kolla and Kolla-ansible services were introduced in openEuler 22.03 LTS.

Kolla is simple to install. You only need to install the corresponding RPM packages:

```
yum install openstack-kolla openstack-kolla-ansible
```

After the installation is complete, you can run commands such as `kolla-ansible`, `kolla-build`, `kolla-genpwd`, and `kolla-mergepwd`.

### Installing Trove

Trove is the database service of OpenStack. If you need to use the database service provided by OpenStack, Trove is recommended. Otherwise, you can choose not to install it.

1. Set up the database.

    The database service stores information in the database.
   Create a **trove** database that can be accessed by the **trove** user and replace **TROVE_DBPASSWORD** with a proper password.

    ```sql
    mysql -u root -p

    MariaDB [(none)]> CREATE DATABASE trove CHARACTER SET utf8;
    MariaDB [(none)]> GRANT ALL PRIVILEGES ON trove.* TO 'trove'@'localhost' \
    IDENTIFIED BY 'TROVE_DBPASSWORD';
    MariaDB [(none)]> GRANT ALL PRIVILEGES ON trove.* TO 'trove'@'%' \
    IDENTIFIED BY 'TROVE_DBPASSWORD';
    ```

2. Create the service user credentials.

    1. Create the **Trove** service user.

        ```shell
        openstack user create --password TROVE_PASSWORD \
          --email trove@example.com trove
        openstack role add --project service --user trove admin
        openstack service create --name trove \
          --description "Database service" database
        ```

        **Description:** Replace `TROVE_PASSWORD` with the password of the `trove` user.

    2. Create the **Database** service access endpoints.

        ```shell
        openstack endpoint create --region RegionOne database public http://controller:8779/v1.0/%\(tenant_id\)s
        openstack endpoint create --region RegionOne database internal http://controller:8779/v1.0/%\(tenant_id\)s
        openstack endpoint create --region RegionOne database admin http://controller:8779/v1.0/%\(tenant_id\)s
        ```

3. Install and configure the **Trove** components.

    1. Install the **Trove** package:

        ```shell
        yum install openstack-trove python-troveclient
        ```

    2.
       Configure `trove.conf`:

        ```shell
        vim /etc/trove/trove.conf

        [DEFAULT]
        bind_host=TROVE_NODE_IP
        log_dir = /var/log/trove
        network_driver = trove.network.neutron.NeutronDriver
        management_security_groups =
        nova_keypair = trove-mgmt
        default_datastore = mysql
        taskmanager_manager = trove.taskmanager.manager.Manager
        trove_api_workers = 5
        transport_url = rabbit://openstack:RABBIT_PASS@controller:5672/
        reboot_time_out = 300
        usage_timeout = 900
        agent_call_high_timeout = 1200
        use_syslog = False
        debug = True

        # Set this if using Neutron Networking
        network_label_regex=.*

        [database]
        connection = mysql+pymysql://trove:TROVE_DBPASS@controller/trove

        [keystone_authtoken]
        project_domain_name = Default
        project_name = service
        user_domain_name = Default
        password = trove
        username = trove
        auth_url = http://controller:5000/v3/
        auth_type = password

        [service_credentials]
        auth_url = http://controller:5000/v3/
        region_name = RegionOne
        project_name = service
        password = trove
        project_domain_name = Default
        user_domain_name = Default
        username = trove

        [mariadb]
        tcp_ports = 3306,4444,4567,4568

        [mysql]
        tcp_ports = 3306

        [postgresql]
        tcp_ports = 5432
        ```

        **Description:**

        - In the `[DEFAULT]` section, set `bind_host` to the IP address of the node where Trove is deployed.
        - `nova_compute_url` and `cinder_url` are endpoints created by Nova and Cinder in Keystone.
        - `nova_proxy_XXX` is a user who can access the Nova service. In the preceding example, the `admin` user is used.
        - `transport_url` is the `RabbitMQ` connection information, and `RABBIT_PASS` is the RabbitMQ password.
        - In the `[database]` section, `connection` is the information of the database created for Trove in MySQL.
        - Replace `TROVE_PASS` in the Trove user information with the password of the **trove** user.

    3. Configure `trove-guestagent.conf`:

        ```shell
        vim /etc/trove/trove-guestagent.conf

        [DEFAULT]
        log_file = trove-guestagent.log
        log_dir = /var/log/trove/
        ignore_users = os_admin
        control_exchange = trove
        transport_url = rabbit://openstack:RABBIT_PASS@controller:5672/
        rpc_backend = rabbit
        command_process_timeout = 60
        use_syslog = False
        debug = True

        [service_credentials]
        auth_url = http://controller:5000/v3/
        region_name = RegionOne
        project_name = service
        password = TROVE_PASS
        project_domain_name = Default
        user_domain_name = Default
        username = trove

        [mysql]
        docker_image = your-registry/your-repo/mysql
        backup_docker_image = your-registry/your-repo/db-backup-mysql:1.1.0
        ```

        **Description:** `guestagent` is an independent component in Trove and needs to be pre-built into the virtual machine image created by Trove using Nova. After a database instance is created, the guestagent process starts and reports heartbeat messages to Trove through the message queue (RabbitMQ). Therefore, you need to configure the RabbitMQ user name and password.

        **Since Victoria, Trove uses a unified image to run different types of databases. The database service runs in the Docker container of the Guest VM.**

        - `transport_url` is the `RabbitMQ` connection information, and `RABBIT_PASS` is the RabbitMQ password.
        - Replace `TROVE_PASS` in the Trove user information with the password of the **trove** user.

    4. Generate the `Trove` database table.

        ```shell
        su -s /bin/sh -c "trove-manage db_sync" trove
        ```

4. Complete the installation and configuration.

    1. Configure the **Trove** service to automatically start:

        ```shell
        systemctl enable openstack-trove-api.service \
          openstack-trove-taskmanager.service \
          openstack-trove-conductor.service
        ```

    2.
       Start the services:

        ```shell
        systemctl start openstack-trove-api.service \
          openstack-trove-taskmanager.service \
          openstack-trove-conductor.service
        ```

### Installing Swift

Swift provides a scalable and highly available distributed object storage service, which is suitable for storing large amounts of unstructured data.

1. Create the service credentials and API endpoints.

    Create the service credentials:

    ```shell
    # Create the swift user.
    openstack user create --domain default --password-prompt swift
    # Add the admin role for the swift user.
    openstack role add --project service --user swift admin
    # Create the swift service entity.
    openstack service create --name swift --description "OpenStack Object Storage" object-store
    ```

    Create the Swift API endpoints.

    ```shell
    openstack endpoint create --region RegionOne object-store public http://controller:8080/v1/AUTH_%\(project_id\)s
    openstack endpoint create --region RegionOne object-store internal http://controller:8080/v1/AUTH_%\(project_id\)s
    openstack endpoint create --region RegionOne object-store admin http://controller:8080/v1
    ```

2. Install the software packages (CTL):

    ```shell
    yum install openstack-swift-proxy python3-swiftclient python3-keystoneclient python3-keystonemiddleware memcached
    ```

3. Configure the proxy-server.

    The Swift RPM package contains a **proxy-server.conf** file which is basically ready to use. You only need to change the values of **ip** and the swift **password** in the file.

    ***Note***

    **Replace password with the password you set for the swift user in the identity service.**

4. Install and configure the storage node.
   (STG)

    Install the supporting packages:

    ```shell
    yum install xfsprogs rsync
    ```

    Format the /dev/vdb and /dev/vdc devices into XFS:

    ```shell
    mkfs.xfs /dev/vdb
    mkfs.xfs /dev/vdc
    ```

    Create the mount point directory structure:

    ```shell
    mkdir -p /srv/node/vdb
    mkdir -p /srv/node/vdc
    ```

    Find the UUID of the new partitions:

    ```shell
    blkid
    ```

    Add the following to the **/etc/fstab** file:

    ```shell
    UUID="" /srv/node/vdb xfs noatime 0 2
    UUID="" /srv/node/vdc xfs noatime 0 2
    ```

    Mount the devices:

    ```shell
    mount /srv/node/vdb
    mount /srv/node/vdc
    ```

    ***Note***

    **If the disaster recovery function is not required, you only need to create one device and can skip the following rsync configuration.**

    (Optional) Create or edit the **/etc/rsyncd.conf** file to include the following content:

    ```shell
    [DEFAULT]
    uid = swift
    gid = swift
    log file = /var/log/rsyncd.log
    pid file = /var/run/rsyncd.pid
    address = MANAGEMENT_INTERFACE_IP_ADDRESS

    [account]
    max connections = 2
    path = /srv/node/
    read only = False
    lock file = /var/lock/account.lock

    [container]
    max connections = 2
    path = /srv/node/
    read only = False
    lock file = /var/lock/container.lock

    [object]
    max connections = 2
    path = /srv/node/
    read only = False
    lock file = /var/lock/object.lock
    ```

    **Replace `MANAGEMENT_INTERFACE_IP_ADDRESS` with the management network IP address of the storage node.**

    Start the rsyncd service and configure it to start upon system startup.

    ```shell
    systemctl enable rsyncd.service
    systemctl start rsyncd.service
    ```

5. Install and configure the components on storage nodes.
   (STG)

    Install the software packages:

    ```shell
    yum install openstack-swift-account openstack-swift-container openstack-swift-object
    ```

    Edit **account-server.conf**, **container-server.conf**, and **object-server.conf** in the **/etc/swift** directory and replace **bind_ip** with the management network IP address of the storage node.

    Ensure the proper ownership of the mount point directory structure.

    ```shell
    chown -R swift:swift /srv/node
    ```

    Create the recon directory and ensure that it has the correct ownership.

    ```shell
    mkdir -p /var/cache/swift
    chown -R root:swift /var/cache/swift
    chmod -R 775 /var/cache/swift
    ```

6. Create the account ring. (CTL)

    Switch to the `/etc/swift` directory:

    ```shell
    cd /etc/swift
    ```

    Create the basic `account.builder` file:

    ```shell
    swift-ring-builder account.builder create 10 1 1
    ```

    Add each storage node to the ring:

    ```shell
    swift-ring-builder account.builder add --region 1 --zone 1 --ip STORAGE_NODE_MANAGEMENT_INTERFACE_IP_ADDRESS --port 6202 --device DEVICE_NAME --weight DEVICE_WEIGHT
    ```

    **Replace `STORAGE_NODE_MANAGEMENT_INTERFACE_IP_ADDRESS` with the management network IP address of the storage node. Replace `DEVICE_NAME` with the name of the storage device on the same storage node.**

    ***Note***
    **Repeat this command for each storage device on each storage node.**

    Verify the ring contents:

    ```shell
    swift-ring-builder account.builder
    ```

    Rebalance the ring:

    ```shell
    swift-ring-builder account.builder rebalance
    ```

7. Create the container ring.
   (CTL)

    Switch to the `/etc/swift` directory.

    Create the basic `container.builder` file:

    ```shell
    swift-ring-builder container.builder create 10 1 1
    ```

    Add each storage node to the ring:

    ```shell
    swift-ring-builder container.builder \
      add --region 1 --zone 1 --ip STORAGE_NODE_MANAGEMENT_INTERFACE_IP_ADDRESS --port 6201 \
      --device DEVICE_NAME --weight 100
    ```

    **Replace `STORAGE_NODE_MANAGEMENT_INTERFACE_IP_ADDRESS` with the management network IP address of the storage node. Replace `DEVICE_NAME` with the name of the storage device on the same storage node.**

    ***Note***
    **Repeat this command for every storage device on every storage node.**

    Verify the ring contents:

    ```shell
    swift-ring-builder container.builder
    ```

    Rebalance the ring:

    ```shell
    swift-ring-builder container.builder rebalance
    ```

8. Create the object ring. (CTL)

    Switch to the `/etc/swift` directory.

    Create the basic `object.builder` file:

    ```shell
    swift-ring-builder object.builder create 10 1 1
    ```

    Add each storage node to the ring:

    ```shell
    swift-ring-builder object.builder \
      add --region 1 --zone 1 --ip STORAGE_NODE_MANAGEMENT_INTERFACE_IP_ADDRESS --port 6200 \
      --device DEVICE_NAME --weight 100
    ```

    **Replace `STORAGE_NODE_MANAGEMENT_INTERFACE_IP_ADDRESS` with the management network IP address of the storage node. Replace `DEVICE_NAME` with the name of the storage device on the same storage node.**

    ***Note***
    **Repeat this command for every storage device on every storage node.**

    Verify the ring contents:

    ```shell
    swift-ring-builder object.builder
    ```

    Rebalance the ring:

    ```shell
    swift-ring-builder object.builder rebalance
    ```

    Distribute the ring configuration files:

    Copy `account.ring.gz`, `container.ring.gz`, and `object.ring.gz` to the `/etc/swift` directory on each storage node and any additional nodes running the proxy service.

9.
   Complete the installation.

    Edit the `/etc/swift/swift.conf` file:

    ```shell
    [swift-hash]
    swift_hash_path_suffix = test-hash
    swift_hash_path_prefix = test-hash

    [storage-policy:0]
    name = Policy-0
    default = yes
    ```

    **Replace test-hash with a unique value.**

    Copy the `swift.conf` file to the `/etc/swift` directory on each storage node and any additional nodes running the proxy service.

    Ensure correct ownership of the configuration directory on all nodes:

    ```shell
    chown -R root:swift /etc/swift
    ```

    On the controller node and any additional nodes running the proxy service, start the object storage proxy service and its dependencies, and configure them to start upon system startup.

    ```shell
    systemctl enable openstack-swift-proxy.service memcached.service
    systemctl start openstack-swift-proxy.service memcached.service
    ```

    On the storage nodes, start the object storage services and configure them to start upon system startup.

    ```shell
    systemctl enable openstack-swift-account.service openstack-swift-account-auditor.service openstack-swift-account-reaper.service openstack-swift-account-replicator.service

    systemctl start openstack-swift-account.service openstack-swift-account-auditor.service openstack-swift-account-reaper.service openstack-swift-account-replicator.service

    systemctl enable openstack-swift-container.service openstack-swift-container-auditor.service openstack-swift-container-replicator.service openstack-swift-container-updater.service

    systemctl start openstack-swift-container.service openstack-swift-container-auditor.service openstack-swift-container-replicator.service openstack-swift-container-updater.service

    systemctl enable openstack-swift-object.service openstack-swift-object-auditor.service openstack-swift-object-replicator.service openstack-swift-object-updater.service

    systemctl start openstack-swift-object.service openstack-swift-object-auditor.service \
      openstack-swift-object-replicator.service openstack-swift-object-updater.service
    ```

### Installing Cyborg

Cyborg provides acceleration device support for OpenStack, for example, GPUs, FPGAs, ASICs, NPs, SoCs, NVMe/NOF SSDs, ODPs, DPDKs, and SPDKs.

1. Initialize the database.

```
CREATE DATABASE cyborg;
GRANT ALL PRIVILEGES ON cyborg.* TO 'cyborg'@'localhost' IDENTIFIED BY 'CYBORG_DBPASS';
GRANT ALL PRIVILEGES ON cyborg.* TO 'cyborg'@'%' IDENTIFIED BY 'CYBORG_DBPASS';
```

2. Create Keystone resource objects.

```
$ openstack user create --domain default --password-prompt cyborg
$ openstack role add --project service --user cyborg admin
$ openstack service create --name cyborg --description "Acceleration Service" accelerator

$ openstack endpoint create --region RegionOne \
    accelerator public http://:6666/v1
$ openstack endpoint create --region RegionOne \
    accelerator internal http://:6666/v1
$ openstack endpoint create --region RegionOne \
    accelerator admin http://:6666/v1
```

3. Install Cyborg.

```
yum install openstack-cyborg
```

4. Configure Cyborg.

Modify **/etc/cyborg/cyborg.conf**.
```
[DEFAULT]
transport_url = rabbit://%RABBITMQ_USER%:%RABBITMQ_PASSWORD%@%OPENSTACK_HOST_IP%:5672/
use_syslog = False
state_path = /var/lib/cyborg
debug = True

[database]
connection = mysql+pymysql://%DATABASE_USER%:%DATABASE_PASSWORD%@%OPENSTACK_HOST_IP%/cyborg

[service_catalog]
project_domain_id = default
user_domain_id = default
project_name = service
password = PASSWORD
username = cyborg
auth_url = http://%OPENSTACK_HOST_IP%/identity
auth_type = password

[placement]
project_domain_name = Default
project_name = service
user_domain_name = Default
password = PASSWORD
username = placement
auth_url = http://%OPENSTACK_HOST_IP%/identity
auth_type = password

[keystone_authtoken]
memcached_servers = localhost:11211
project_domain_name = Default
project_name = service
user_domain_name = Default
password = PASSWORD
username = cyborg
auth_url = http://%OPENSTACK_HOST_IP%/identity
auth_type = password
```

Set the user names, passwords, and IP addresses as required.

5. Synchronize the database table.

```
cyborg-dbsync --config-file /etc/cyborg/cyborg.conf upgrade
```

6. Start the Cyborg services.

```
systemctl enable openstack-cyborg-api openstack-cyborg-conductor openstack-cyborg-agent
systemctl start openstack-cyborg-api openstack-cyborg-conductor openstack-cyborg-agent
```

### Installing Aodh

1. Create the database.

```
CREATE DATABASE aodh;

GRANT ALL PRIVILEGES ON aodh.* TO 'aodh'@'localhost' IDENTIFIED BY 'AODH_DBPASS';

GRANT ALL PRIVILEGES ON aodh.* TO 'aodh'@'%' IDENTIFIED BY 'AODH_DBPASS';
```

2. Create Keystone resource objects.
```
openstack user create --domain default --password-prompt aodh

openstack role add --project service --user aodh admin

openstack service create --name aodh --description "Telemetry" alarming

openstack endpoint create --region RegionOne alarming public http://controller:8042

openstack endpoint create --region RegionOne alarming internal http://controller:8042

openstack endpoint create --region RegionOne alarming admin http://controller:8042
```

3. Install Aodh.

```
yum install openstack-aodh-api openstack-aodh-evaluator openstack-aodh-notifier openstack-aodh-listener openstack-aodh-expirer python3-aodhclient
```

4. Modify the **/etc/aodh/aodh.conf** configuration file.

```
[database]
connection = mysql+pymysql://aodh:AODH_DBPASS@controller/aodh

[DEFAULT]
transport_url = rabbit://openstack:RABBIT_PASS@controller
auth_strategy = keystone

[keystone_authtoken]
www_authenticate_uri = http://controller:5000
auth_url = http://controller:5000
memcached_servers = controller:11211
auth_type = password
project_domain_id = default
user_domain_id = default
project_name = service
username = aodh
password = AODH_PASS

[service_credentials]
auth_type = password
auth_url = http://controller:5000/v3
project_domain_id = default
user_domain_id = default
project_name = service
username = aodh
password = AODH_PASS
interface = internalURL
region_name = RegionOne
```

5. Initialize the database.

```
aodh-dbsync
```

6. Start the Aodh services.

```
systemctl enable openstack-aodh-api.service openstack-aodh-evaluator.service openstack-aodh-notifier.service openstack-aodh-listener.service

systemctl start openstack-aodh-api.service openstack-aodh-evaluator.service openstack-aodh-notifier.service openstack-aodh-listener.service
```

### Installing Gnocchi

1. Create the database.
-
-```
-CREATE DATABASE gnocchi;
-
-GRANT ALL PRIVILEGES ON gnocchi.* TO 'gnocchi'@'localhost' IDENTIFIED BY 'GNOCCHI_DBPASS';
-
-GRANT ALL PRIVILEGES ON gnocchi.* TO 'gnocchi'@'%' IDENTIFIED BY 'GNOCCHI_DBPASS';
-```
-
-2. Create Keystone resource objects.
-
-```
-openstack user create --domain default --password-prompt gnocchi
-
-openstack role add --project service --user gnocchi admin
-
-openstack service create --name gnocchi --description "Metric Service" metric
-
-openstack endpoint create --region RegionOne metric public http://controller:8041
-
-openstack endpoint create --region RegionOne metric internal http://controller:8041
-
-openstack endpoint create --region RegionOne metric admin http://controller:8041
-```
-
-3. Install Gnocchi.
-
-```
-yum install openstack-gnocchi-api openstack-gnocchi-metricd python3-gnocchiclient
-```
-
-4. Modify the **/etc/gnocchi/gnocchi.conf** configuration file.
-
-```
-[api]
-auth_mode = keystone
-port = 8041
-uwsgi_mode = http-socket
-
-[keystone_authtoken]
-auth_type = password
-auth_url = http://controller:5000/v3
-project_domain_name = Default
-user_domain_name = Default
-project_name = service
-username = gnocchi
-password = GNOCCHI_PASS
-interface = internalURL
-region_name = RegionOne
-
-[indexer]
-url = mysql+pymysql://gnocchi:GNOCCHI_DBPASS@controller/gnocchi
-
-[storage]
-# coordination_url is not required but specifying one will improve
-# performance with better workload division across workers.
-coordination_url = redis://controller:6379
-file_basepath = /var/lib/gnocchi
-driver = file
-```
-
-5. Initialize the database.
-
-```
-gnocchi-upgrade
-```
-
-6. Start the Gnocchi services.
-
-```
-systemctl enable openstack-gnocchi-api.service openstack-gnocchi-metricd.service
-
-systemctl start openstack-gnocchi-api.service openstack-gnocchi-metricd.service
-```
-
-### Installing Ceilometer
-
-1. Create Keystone resource objects.
-
-```
-openstack user create --domain default --password-prompt ceilometer
-
-openstack role add --project service --user ceilometer admin
-
-openstack service create --name ceilometer --description "Telemetry" metering
-```
-
-2. Install Ceilometer.
-
-```
-yum install openstack-ceilometer-notification openstack-ceilometer-central
-```
-
-3. Modify the **/etc/ceilometer/pipeline.yaml** configuration file.
-
-```
-publishers:
-    # set address of Gnocchi
-    # + filter out Gnocchi-related activity meters (Swift driver)
-    # + set default archive policy
-    - gnocchi://?filter_project=service&archive_policy=low
-```
-
-4. Modify the **/etc/ceilometer/ceilometer.conf** configuration file.
-
-```
-[DEFAULT]
-transport_url = rabbit://openstack:RABBIT_PASS@controller
-
-[service_credentials]
-auth_type = password
-auth_url = http://controller:5000/v3
-project_domain_id = default
-user_domain_id = default
-project_name = service
-username = ceilometer
-password = CEILOMETER_PASS
-interface = internalURL
-region_name = RegionOne
-```
-
-5. Initialize the database.
-
-```
-ceilometer-upgrade
-```
-
-6. Start the Ceilometer services.
-
-```
-systemctl enable openstack-ceilometer-notification.service openstack-ceilometer-central.service
-
-systemctl start openstack-ceilometer-notification.service openstack-ceilometer-central.service
-```
-
-### Installing Heat
-
-1. Create the **heat** database and grant proper privileges to it. Replace **HEAT_DBPASS** with a proper password.
-
-```
-CREATE DATABASE heat;
-GRANT ALL PRIVILEGES ON heat.* TO 'heat'@'localhost' IDENTIFIED BY 'HEAT_DBPASS';
-GRANT ALL PRIVILEGES ON heat.* TO 'heat'@'%' IDENTIFIED BY 'HEAT_DBPASS';
-```
-
-2. Create a service credential. Create the **heat** user and add the **admin** role to it.
-
-```
-openstack user create --domain default --password-prompt heat
-openstack role add --project service --user heat admin
-```
-
-3. Create the **heat** and **heat-cfn** services and their API endpoints.
- -``` -openstack service create --name heat --description "Orchestration" orchestration -openstack service create --name heat-cfn --description "Orchestration" cloudformation -openstack endpoint create --region RegionOne orchestration public http://controller:8004/v1/%\(tenant_id\)s -openstack endpoint create --region RegionOne orchestration internal http://controller:8004/v1/%\(tenant_id\)s -openstack endpoint create --region RegionOne orchestration admin http://controller:8004/v1/%\(tenant_id\)s -openstack endpoint create --region RegionOne cloudformation public http://controller:8000/v1 -openstack endpoint create --region RegionOne cloudformation internal http://controller:8000/v1 -openstack endpoint create --region RegionOne cloudformation admin http://controller:8000/v1 -``` - -4. Create additional OpenStack management information, including the **heat** domain and its administrator **heat_domain_admin**, the **heat_stack_owner** role, and the **heat_stack_user** role. - -``` -openstack user create --domain heat --password-prompt heat_domain_admin -openstack role add --domain heat --user-domain heat --user heat_domain_admin admin -openstack role create heat_stack_owner -openstack role create heat_stack_user -``` - -5. Install the software packages. - -``` -yum install openstack-heat-api openstack-heat-api-cfn openstack-heat-engine -``` - -6. Modify the configuration file **/etc/heat/heat.conf**. 
- -``` -[DEFAULT] -transport_url = rabbit://openstack:RABBIT_PASS@controller -heat_metadata_server_url = http://controller:8000 -heat_waitcondition_server_url = http://controller:8000/v1/waitcondition -stack_domain_admin = heat_domain_admin -stack_domain_admin_password = HEAT_DOMAIN_PASS -stack_user_domain_name = heat - -[database] -connection = mysql+pymysql://heat:HEAT_DBPASS@controller/heat - -[keystone_authtoken] -www_authenticate_uri = http://controller:5000 -auth_url = http://controller:5000 -memcached_servers = controller:11211 -auth_type = password -project_domain_name = default -user_domain_name = default -project_name = service -username = heat -password = HEAT_PASS - -[trustee] -auth_type = password -auth_url = http://controller:5000 -username = heat -password = HEAT_PASS -user_domain_name = default - -[clients_keystone] -auth_uri = http://controller:5000 -``` - -7. Initialize the **heat** database table. - -``` -su -s /bin/sh -c "heat-manage db_sync" heat -``` - -8. Start the services. - -``` -systemctl enable openstack-heat-api.service openstack-heat-api-cfn.service openstack-heat-engine.service -systemctl start openstack-heat-api.service openstack-heat-api-cfn.service openstack-heat-engine.service -``` - -## OpenStack Quick Installation - -The OpenStack SIG provides the Ansible script for one-click deployment of OpenStack in All in One or Distributed modes. Users can use the script to quickly deploy an OpenStack environment based on openEuler RPM packages. The following uses the All in One mode installation as an example. - -1. Install the OpenStack SIG Tool. - - ```shell - pip install openstack-sig-tool - ``` - -2. Configure the OpenStack Yum source. - - ```shell - yum install openstack-release-wallaby - ``` - - **Note**: Enable the EPOL repository for the Yum source if it is not enabled already. 
-
-    ```shell
-    cat >> /etc/yum.repos.d/openEuler.repo << EOF
-
-    [EPOL]
-    name=EPOL
-    baseurl=http://repo.openeuler.org/openEuler-22.03-LTS/EPOL/main/$basearch/
-    enabled=1
-    gpgcheck=1
-    gpgkey=http://repo.openeuler.org/openEuler-22.03-LTS/OS/$basearch/RPM-GPG-KEY-openEuler
-    EOF
-    ```
-
-3. Update the Ansible configurations.
-
-    Open the **/usr/local/etc/inventory/all_in_one.yaml** file and modify the configuration based on the environment and requirements. Modify the file as follows:
-
-    ```yaml
-    all:
-      hosts:
-        controller:
-          ansible_host:
-          ansible_ssh_private_key_file:
-          ansible_ssh_user: root
-      vars:
-        mysql_root_password: root
-        mysql_project_password: root
-        rabbitmq_password: root
-        project_identity_password: root
-        enabled_service:
-          - keystone
-          - neutron
-          - cinder
-          - placement
-          - nova
-          - glance
-          - horizon
-          - aodh
-          - ceilometer
-          - cyborg
-          - gnocchi
-          - kolla
-          - heat
-          - swift
-          - trove
-          - tempest
-        neutron_provider_interface_name: br-ex
-        default_ext_subnet_range: 10.100.100.0/24
-        default_ext_subnet_gateway: 10.100.100.1
-        neutron_dataplane_interface_name: eth1
-        cinder_block_device: vdb
-        swift_storage_devices:
-          - vdc
-        swift_hash_path_suffix: ash
-        swift_hash_path_prefix: has
-      children:
-        compute:
-          hosts: controller
-        storage:
-          hosts: controller
-        network:
-          hosts: controller
-          vars:
-            test-key: test-value
-        dashboard:
-          hosts: controller
-          vars:
-            allowed_host: '*'
-        kolla:
-          hosts: controller
-          vars:
-            # We add openEuler OS support for kolla in OpenStack Queens/Rocky release
-            # Set this var to true if you want to use it in Q/R
-            openeuler_plugin: false
-    ```
-
-    Key Configurations
-
-    | Item | Description|
-    |---|---|
-    | ansible_host | IP address of the all-in-one node.|
-    | ansible_ssh_private_key_file | Key used by the Ansible script for logging in to the all-in-one node.|
-    | ansible_ssh_user | User used by the Ansible script for logging in to the all-in-one node.|
-    | enabled_service | List of services to be installed.
You can delete services as required.| - | neutron_provider_interface_name | Neutron L3 bridge name. | - | default_ext_subnet_range | Neutron private network IP address range. | - | default_ext_subnet_gateway | Neutron private network gateway. | - | neutron_dataplane_interface_name | NIC used by Neutron. You are advised to use a new NIC to avoid conflicts with existing NICs causing disconnection of the all-in-one node. | - | cinder_block_device | Name of the block device used by Cinder.| - | swift_storage_devices | Name of the block device used by Swift. | - -4. Run the installation command. - - ```shell - oos env setup all_in_one - ``` - - After the command is executed, the OpenStack environment of the All in One mode is successfully deployed. - - The environment variable file **.admin-openrc** is stored in the home directory of the current user. - -5. Initialize the Tempest environment. - - If you want to perform the Tempest test in the environment, run the `oos env init all_in_one` command to create the OpenStack resources required by Tempest. - - After the command is executed successfully, a **mytest** directory is generated in the home directory of the user. You can run the `tempest run` command in the directory. \ No newline at end of file diff --git a/docs/en/docs/thirdparty_migration/ha.md b/docs/en/docs/thirdparty_migration/ha.md new file mode 100644 index 0000000000000000000000000000000000000000..fec66a0ad93916672cecdf647322c18d1e1a4a35 --- /dev/null +++ b/docs/en/docs/thirdparty_migration/ha.md @@ -0,0 +1,3 @@ +# HA User Guide + +This document describes how to install and use HA. 
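Step 4 of the quick installation notes that an environment variable file **.admin-openrc** is written to the current user's home directory. Such files follow the standard OpenStack RC pattern of exported `OS_*` variables. The sketch below is illustrative only: the variable values are placeholders, not what the deployment tool actually writes, and a scratch path is used so it can be run safely.

```shell
# Illustrative OpenStack RC file; the values below are placeholders and
# will differ from the .admin-openrc that the deployment actually writes.
rc=$(mktemp)
cat > "$rc" << 'EOF'
export OS_PROJECT_DOMAIN_NAME=Default
export OS_USER_DOMAIN_NAME=Default
export OS_PROJECT_NAME=admin
export OS_USERNAME=admin
export OS_PASSWORD=ADMIN_PASS
export OS_AUTH_URL=http://controller:5000/v3
export OS_IDENTITY_API_VERSION=3
EOF
# Source the file so the openstack CLI can read the OS_* variables.
. "$rc"
echo "$OS_AUTH_URL"   # prints http://controller:5000/v3
```

After sourcing the real **.admin-openrc**, commands such as `openstack service list` run against the deployed environment.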
diff --git a/docs/en/docs/thirdparty_migration/installha.md b/docs/en/docs/thirdparty_migration/installha.md index 849ea3a1805042be48be1c1e0b56820380df88d2..6de0a76a960bf5f3e74e2b42f8be6c5d935c1062 100644 --- a/docs/en/docs/thirdparty_migration/installha.md +++ b/docs/en/docs/thirdparty_migration/installha.md @@ -2,8 +2,6 @@ This section describes how to install and deploy an HA cluster. -\[\[toc]] - ## Installation and Deployment ### Preparing the Environment @@ -17,14 +15,14 @@ At least two physical machines or virtual machines (VMs) installed with openEule Before using the HA software, ensure that the host name has been changed and all host names have been written into the **/etc/hosts** file. 1. Run the following command to change the host name: - - ``` + + ```shell # hostnamectl set-hostname ha1 ``` 2. Edit the `/etc/hosts` file and write the following fields: - - ``` + + ```shell 172.30.30.65 ha1 172.30.30.66 ha2 ``` @@ -33,7 +31,7 @@ Before using the HA software, ensure that the host name has been changed and all After the system is successfully installed, the Yum source is configured by default. The file location is stored in the `/etc/yum.repos.d/openEuler.repo` file. The HA software package uses the following sources: -``` +```shell [OS] name=OS baseurl=http://repo.openeuler.org/openEuler-20.03-LTS-SP1/OS/$basearch/ @@ -58,19 +56,19 @@ gpgkey=http://repo.openeuler.org/openEuler-20.03-LTS-SP1/OS/$basearch/RPM-GPG-KE ### Installing the Components of the HA Software Package -``` +```shell # yum install -y corosync pacemaker pcs fence-agents fence-virt corosync-qdevice sbd drbd drbd-utils ``` ### Setting the **hacluster** User Password -``` +```shell # passwd hacluster ``` ### Modifying the `/etc/corosync/corosync.conf` file -``` +```shell totem { version: 2 cluster_name: hacluster @@ -113,73 +111,76 @@ nodelist { #### Disabling the Firewall 1. Run the following command to disable the firewall: - ``` + + ```shell # systemctl stop firewalld ``` + 2. 
Change **SELinux** to **disabled** in the **`/etc/selinux/config`** file. - ``` + + ```shell # SELINUX=disabled ``` #### Managing the pcs Service 1. Run the following command to start the pcs service: - - ``` + + ```shell # systemctl start pcsd ``` 2. Run the following command to query the pcs service status: - - ``` + + ```shell # systemctl status pcsd ``` - + The service is started successfully if the following information is displayed: - + ![](./figures/HA-pcs.png) #### Managing the Pacemaker Service 1. Run the following command to start the Pacemaker service: - - ``` + + ```shell # systemctl start pacemaker ``` 2. Run the following command to query the Pacemaker service status: - - ``` + + ```shell # systemctl status pacemaker ``` - + The service is started successfully if the following information is displayed: - + ![](./figures/HA-pacemaker.png) #### Managing the Corosync Service 1. Run the following command to start the Corosync service: - - ``` + + ```shell # systemctl start corosync ``` 2. Run the following command to query the Corosync service status: - - ``` + + ```shell # systemctl status corosync ``` - + The service is started successfully if the following information is displayed: - + ![](./figures/HA-corosync.png) ### Performing Node Authentication **Note: Perform this operation on either node.** -``` +```shell # pcs host auth ha1 ha2 ``` @@ -198,4 +199,4 @@ For details about how to install the management platform newly developed by the ![](./figures/HA-api.png) >**Note:** -> only the Chinese version is available. \ No newline at end of file +> only the Chinese version is available. 
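The `corosync.conf` file from the installation steps above defines the cluster membership. As a sanity check, the two-node layout can be rendered into a scratch file and inspected before it is copied to `/etc/corosync/corosync.conf` on both nodes. This is a minimal sketch: only the `totem` and `nodelist` sections are shown (a real file also needs the remaining sections from the guide), and the node names and ring addresses are the guide's example values.

```shell
# Render a minimal two-node corosync.conf into a scratch file for inspection.
# Only totem and nodelist are shown; node names/addresses (ha1/ha2,
# 172.30.30.65/66) are the example values used in this guide.
out=$(mktemp)
cat > "$out" << 'EOF'
totem {
    version: 2
    cluster_name: hacluster
}

nodelist {
    node {
        name: ha1
        nodeid: 1
        ring0_addr: 172.30.30.65
    }

    node {
        name: ha2
        nodeid: 2
        ring0_addr: 172.30.30.66
    }
}
EOF
grep -c 'ring0_addr' "$out"   # prints 2: one ring address per node
```

Once the rendered file is identical on both nodes, copy it to `/etc/corosync/corosync.conf` and restart the Corosync service as described above.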
diff --git a/docs/en/docs/thirdparty_migration/openstack.md b/docs/en/docs/thirdparty_migration/openstack.md deleted file mode 100644 index 53304cbfd6c89621c6bf4bb5b6c0b00f56f869e9..0000000000000000000000000000000000000000 --- a/docs/en/docs/thirdparty_migration/openstack.md +++ /dev/null @@ -1 +0,0 @@ -openEuler OpenStack documents have been moved to [OpenStack SIG Doc](https://openeuler.gitee.io/openstack/). diff --git a/docs/en/docs/thirdparty_migration/thidrparty.md b/docs/en/docs/thirdparty_migration/thidrparty.md index 66f59126694b37d126c81238ab201744905d6b21..690288f8c7fb9b3090d6f7785171d71d4659484b 100644 --- a/docs/en/docs/thirdparty_migration/thidrparty.md +++ b/docs/en/docs/thirdparty_migration/thidrparty.md @@ -1,3 +1,3 @@ # Third-Party Software Porting Guide -This document is intended for community developers, open source enthusiasts, and partners who use the openEuler OS and intend to learn more about third-party software. Basic knowledge about the Linux OS is required for reading this document. \ No newline at end of file +This document is intended for community developers, open source enthusiasts, and partners who use the openEuler OS and intend to learn more about third-party software. Basic knowledge about the Linux OS is required for reading this document. diff --git a/docs/en/docs/thirdparty_migration/usecase.md b/docs/en/docs/thirdparty_migration/usecase.md new file mode 100644 index 0000000000000000000000000000000000000000..b9ef42c8179c021138abea4a486be7cfe5db4d5d --- /dev/null +++ b/docs/en/docs/thirdparty_migration/usecase.md @@ -0,0 +1,249 @@ +# HA Use Cases + +This section describes how to get started with the HA cluster and add an instance. If you are not familiar with HA cluster installation, see [Installing and Deploying an HA Cluster](./installha.md). + +## Quick Start Guide + +The following operations use the management platform newly developed by the community as an example. 
+
+### Login Page
+
+The user name is `hacluster`, and the password is the one set on the host by the user.
+
+![](./figures/HA-api.png)
+
+### Home Page
+
+After logging in to the system, the main page is displayed. The main page consists of the side navigation bar, the top operation area, the resource node list area, and the node operation floating area.
+
+The following describes the features and usage of the four areas in detail.
+
+![](./figures/HA-home-page.png)
+
+#### Navigation Bar
+
+The side navigation bar consists of two parts: the name and logo of the HA cluster software, and the system navigation. The system navigation consists of three parts: **System**, **Cluster Configurations**, and **Tools**. **System** is the default option and corresponds to the home page. It displays the information and operation entries of all resources in the system. **Preference Settings** and **Heartbeat Configurations** are set under **Cluster Configurations**. **Log Download** and **Quick Cluster Operation** are set under **Tools**. These two items are displayed in a pop-up box after you click them.
+
+#### Top Operation Area
+
+The current login user is displayed statically. When you hover the mouse cursor over the user icon, the operation menu items are displayed, including **Refresh Settings** and **Log Out**. After you click **Refresh Settings**, the **Refresh Settings** dialog box is displayed. You can set the automatic refresh mode for the system; the options are **Do not refresh automatically**, **Refresh every 5 seconds**, and **Refresh every 10 seconds**. By default, **Do not refresh automatically** is selected. Click **Log Out** to log out and jump to the login page. After that, a re-login is required if you want to continue to access the system.
+ +![](./figures/HA-refresh.png) + +#### Resource Node List Area + +The resource node list displays the resource information such as **Resource Name**, **Status**, **Resource Type**, **Service**, and **Running Node** of all resources in the system, and the node information such as all nodes in the system and the running status of the nodes. In addition, you can **Add**, **Edit**, **Start**, **Stop**, **Clear**, **Migrate**, **Migrate Back**, **Delete**, and **Associate** the resources. + +#### Node Operation Floating Area + +By default, the node operation floating area is collapsed. When you click a node in the heading of the resource node list, the node operation area is displayed on the right, as shown in the preceding figure. This area consists of the collapse button, the node name, the stop button, and the standby button, and provides the stop and standby operations. Click the arrow in the upper left corner of the area to collapse the area. + +### Preference Settings + +The following operations can be performed using command lines. The following is a simple example. For more command details, run the `pcs --help` command. + +- Through the CLI + + ```shell + # pcs property set stonith-enabled=false + # pcs property set no-quorum-policy=ignore + ``` + + Run the following command to view all configurations: + + ```shell + pcs property + ``` + + ![](./figures/HA-firstchoice-cmd.png) + +- Through the GUI + Clicking **Preference Settings** in the navigation bar, the **Preference Settings** dialog box is displayed. Change the values of **No Quorum Policy** and **Stonith Enabled** from the default values to the values shown in the following figure. Then, click **OK**. + + ![](./figures/HA-firstchoice.png) + +### Add Resource + +#### Adding Common Resources + +1. Click **Add Common Resource**. The **Create Resource** dialog box is displayed. + All mandatory configuration items of a resource are displayed on the **Basic** page. 
After you select a resource type on the **Basic** page, other mandatory and optional configuration items of the resource are displayed. + +2. Enter the resource configuration information. + A gray text area is displayed on the right of the dialog box to describe the current configuration item. After all mandatory parameters are set, click **OK** to create a common resource or click **Cancel** to cancel the add operation. + The optional configuration items on the **Instance Attribute**, **Meta Attribute**, or **Operation Attribute** page are optional. If they are not configured, the resource creation process is not affected. You can modify them as required. Otherwise, the default values are used. + +The following uses Apache as an example to describe how to add resources through the CLI and GUI. + +- Through the CLI + + ```shell + # pcs resource create httpd ocf:heartbeat:apache + ``` + + Check the resource running status: + + ```shell + # pcs status + ``` + + ![](./figures/HA-pcs-status.png) + +- Through the GUI + +1. Enter the resource name and resource type, as shown in the following figure. + + ![](./figures/HA-add-resource.png) + +2. If the following information is displayed, the resource is successfully added and started, and runs on a node, for example, ha1. + + ![](./figures/HA-apache-suc.png) +3. Access the Apache page. + + ![](./figures/HA-apache-show.png) + +#### Adding Group Resources + +>**Note:** +> Adding group resources requires at least one common resource in the cluster. + +1. Click **Add Group Resource**. The **Create Resource** dialog box is displayed. + All the parameters on the **Basic** tab page are mandatory. After setting the parameters, click **OK** to add the resource or click **Cancel** to cancel the add operation. + + ![](./figures/HA-group.png) + + >**Notes:** + > Group resources are started in the sequence of child resources. Therefore, you need to select child resources in sequence. + +2. 
If the following information is displayed, the resource is added successfully.
+
+    ![](./figures/HA-group-suc.png)
+
+#### Adding Clone Resources
+
+1. Click **Add Clone Resource**. The **Create Resource** dialog box is displayed.
+    On the **Basic** page, enter the object to be cloned. The resource name is automatically generated. After entering the object name, click **OK** to add the resource, or click **Cancel** to cancel the add operation.
+
+    ![](./figures/HA-clone.png)
+
+2. If the following information is displayed, the resource is added successfully.
+
+    ![](./figures/HA-clone-suc.png)
+
+### Editing Resources
+
+- Starting a resource: Select a target resource from the resource node list. The target resource must not be running. Start the resource.
+- Stopping a resource: Select a target resource from the resource node list. The target resource must be running. Stop the resource.
+- Clearing a resource: Select a target resource from the resource node list. Clear the resource.
+- Migrating a resource: Select a target resource from the resource node list. The resource must be a common resource or group resource in the running state. Migrate the resource to a specified node.
+- Migrating back a resource: Select a target resource from the resource node list. The resource must be a migrated resource. Migrating the resource back clears its migration settings and returns it to the original node. After you click **Migrate Back**, the status change of the resource item in the list is the same as that when the resource is started.
+- Deleting a resource: Select a target resource from the resource node list. Delete the resource.
+
+### Setting Resource Relationships
+
+Resource relationships are used to set restrictions for the target resources. There are three types of resource restrictions: resource location, resource collaboration, and resource order.
+ +- Resource location: sets the running level of the resource on the nodes in the cluster to determine the node where the resource runs during startup or switchover. The running levels are Master Node and Slave 1 in descending order. +- Resource collaboration: indicates whether the target resource and other resources in the cluster run on the same node. **Same Node** indicates that this node must run on the same node as the target resource. **Mutually Exclusive** indicates that this node cannot run on the same node as the target resource. +- Resource order: Set the order in which the target resource and other resources in the cluster are started. **Front Resource** indicates that this resource must be started before the target resource. **Follow-up Resource** indicates that this resource can be started only after the target resource is started. + +## HA MySQL Configuration Example + +### Configuring the Virtual IP Address + +1. On the home page, choose **Add** > **Add Common Resource**, and set the parameters as follows: + + ![](./figures/HA-vip.png) + +2. The resource is successfully created and started, and runs on a node, for example, ha1. +3. The IP address can be pinged and connected. After login, you can perform various operations normally. Resources can be switched to ha2 and can be accessed normally. See the following figure. + ![](./figures/HA-vip-suc.png) + +### Configuring NFS Storage + +Perform the following steps to configure another host as the NFS server: + +1. Install the software package. + + ```shell + # yum install -y nfs-utils rpcbind + ``` + +2. Disable the firewall. + + ```shell + # systemctl stop firewalld && systemctl disable firewalld + ``` + +3. Modify the `/etc/selinux/config` file to change the status of SELinux to disabled. + + ```shell + # SELINUX=disabled + ``` + +4. Start services. + + ```shell + # systemctl start rpcbind && systemctl enable rpcbind + # systemctl start nfs-server && systemctl enable nfs-server + ``` + +5. 
Create a shared directory on the server. + + ```shell + # mkdir -p /test + ``` + +6. Modify the NFS configuration file. + + ```shell + # vim /etc/exports + # /test *(rw,no_root_squash) + ``` + +7. Reload the service. + + ```shell + # systemctl reload nfs + ``` + +8. Install the software package on the client. Install MySQL first and then mount NFS to the MySQL data path. + + ```shell + # yum install -y nfs-utils mariadb-server + ``` + +9. On the home page, choose **Add** > **Add Common Resource** and configure the NFS resource as follows: + + ![](./figures/HA-nfs.png) + +10. The resource is successfully created and started, and runs on a node, for example, ha1. The NFS is mounted to the `/var/lib/mysql` directory. The resource is switched to ha2. The NFS is unmounted from ha1 and automatically mounted to ha2. See the following figure. + + ![](./figures/HA-nfs-suc.png) + +### Configuring MySQL + +1. On the home page, choose **Add** > **Add Common Resource** and configure the MySQL resource as follows: + + ![](./figures/HA-mariadb.png) + +2. If the following information is displayed, the resource is successfully added: + + ![](./figures/HA-mariadb-suc.png) + +### Adding the Preceding Resources as a Group Resource + +1. Add the three resources in the resource startup sequence. + + On the home page, choose **Add** > **Add Group Resource** and configure the group resource as follows: + + ![](./figures/HA-group-new.png) + +2. The group resource is successfully created and started. If the command output is the same as that of the preceding common resources, the group resource is successfully added. + + ![](./figures/HA-group-new-suc.png) + +3. Use ha1 as the standby node and migrate the group resource to the ha2 node. The system is running properly. 
+ + ![](./figures/HA-group-new-suc2.png) diff --git a/docs/en/menu/index.md b/docs/en/menu/index.md index bd91d22e975d79b8a2b550732a9b9873797302a8..fc8d550e97a9dda7e63ed9a2a65821501544a090 100644 --- a/docs/en/menu/index.md +++ b/docs/en/menu/index.md @@ -1,261 +1,221 @@ --- headless: true --- -- [Terms of Use]({{< relref "./docs/Releasenotes/terms-of-use.md" >}}) -- [Release Notes]({{< relref "./docs/Releasenotes/release_notes.md" >}}) - - [User Notice]({{< relref "./docs/Releasenotes/user-notice.md" >}}) - - [Account List]({{< relref "./docs/Releasenotes/account-list.md" >}}) - - [Introduction]({{< relref "./docs/Releasenotes/introduction.md" >}}) - - [OS Installation]({{< relref "./docs/Releasenotes/installing-the-os.md" >}}) - - [Key Features]({{< relref "./docs/Releasenotes/key-features.md" >}}) - - [Known Issues]({{< relref "./docs/Releasenotes/known-issues.md" >}}) - - [Resolved Issues]({{< relref "./docs/Releasenotes/resolved-issues.md" >}}) - - [Common Vulnerabilities and Exposures (CVE)]({{< relref "./docs/Releasenotes/common-vulnerabilities-and-exposures-(cve).md" >}}) - - [Source Code]({{< relref "./docs/Releasenotes/source-code.md" >}}) - - [Contribution]({{< relref "./docs/Releasenotes/contribution.md" >}}) - - [Acknowledgment]({{< relref "./docs/Releasenotes/acknowledgment.md" >}}) -- [Quick Start]({{< relref "./docs/Quickstart/quick-start.md" >}}) -- [Installation Guide]({{< relref "./docs/Installation/Installation.md" >}}) - - [Installation on Servers]({{< relref "./docs/Installation/install-server.md" >}}) - - [Installation Preparations]({{< relref "./docs/Installation/installation-preparations.md" >}}) - - [Installation Modes]({{< relref "./docs/Installation/Installation-Modes1.md" >}}) - - [Installation Guideline]({{< relref "./docs/Installation/installation-guideline.md" >}}) - - [Using Kickstart for Automatic Installation]({{< relref "./docs/Installation/using-kickstart-for-automatic-installation.md" >}}) - - [FAQs]({{< relref 
"./docs/Installation/faqs.md" >}}) - - [Installation on Raspberry Pi]({{< relref "./docs/Installation/install-pi.md" >}}) - - [Installation Preparations]({{< relref "./docs/Installation/Installation-Preparations1.md" >}}) - - [Installation Mode]({{< relref "./docs/Installation/Installation-Modes1.md" >}}) - - [Installation Guideline]({{< relref "./docs/Installation/Installation-Guide1" >}}) - - [FAQs]({{< relref "./docs/Installation/FAQ1.md" >}}) - - [More Resources]({{< relref "./docs/Installation/More-Resources.md" >}}) -- [Administrator Guide]({{< relref "./docs/Administration/administration.md" >}}) - - [Viewing System Information]({{< relref "./docs/Administration/viewing-system-information.md" >}}) - - [Basic Configuration]({{< relref "./docs/Administration/basic-configuration.md" >}}) - - [User and User Group Management]({{< relref "./docs/Administration/user-and-user-group-management.md" >}}) - - [Software Package Management with DNF]({{< relref "./docs/Administration/using-the-dnf-to-manage-software-packages.md" >}}) - - [Service Management]({{< relref "./docs/Administration/service-management.md" >}}) - - [Process Management]({{< relref "./docs/Administration/process-management.md" >}}) - - [Memory Management]({{< relref "./docs/Administration/memory-management.md" >}}) - - [Network Configuration]({{< relref "./docs/Administration/configuring-the-network.md" >}}) - - [Hard Disk Management with LVM]({{< relref "./docs/Administration/managing-hard-disks-through-lvm.md" >}}) - - [KAE Usage]({{< relref "./docs/Administration/using-the-kae.md" >}}) - - [Service Configuration]({{< relref "./docs/Administration/configuring-services.md" >}}) - - [Configuring the Repo Server]({{< relref "./docs/Administration/configuring-the-repo-server.md" >}}) - - [Configuring the FTP Server]({{< relref "./docs/Administration/configuring-the-ftp-server.md" >}}) - - [Configuring the Web Server]({{< relref "./docs/Administration/configuring-the-web-server.md" >}}) - - [Setting Up 
the Database Server]({{< relref "./docs/Administration/setting-up-the-database-server.md" >}}) - - [Trusted Computing]({{< relref "./docs/Administration/trusted-computing.md" >}}) - - [FAQs]({{< relref "./docs/Administration/faqs.md" >}}) -- [O&M Guide]({{< relref "./docs/ops_guide/overview.md" >}}) - - [O&M Overview]({{< relref "./docs/ops_guide/om-overview.md" >}}) - - [System Resources and Performance]({{< relref "./docs/ops_guide/system-resources-and-performance.md" >}}) - - [Information Collection]({{< relref "./docs/ops_guide/information-collection.md" >}}) - - [Troubleshooting]({{< relref "./docs/ops_guide/troubleshooting.md" >}}) - - [Commonly Used Tools]({{< relref "./docs/ops_guide/commonly-used-tools.md" >}}) - - [Common Skills]({{< relref "./docs/ops_guide/common-skills.md" >}}) -- [Security Hardening Guide]({{< relref "./docs/SecHarden/secHarden.md" >}}) - - [OS Hardening Overview]({{< relref "./docs/SecHarden/os-hardening-overview.md" >}}) - - [Security Hardening Guide]({{< relref "./docs/SecHarden/security-hardening-guide.md" >}}) - - [Account Passwords]({{< relref "./docs/SecHarden/account-passwords.md" >}}) - - [Authentication and Authorization]({{< relref "./docs/SecHarden/authentication-and-authorization.md" >}}) - - [System Services]({{< relref "./docs/SecHarden/system-services.md" >}}) - - [File Permissions]({{< relref "./docs/SecHarden/file-permissions.md" >}}) - - [Kernel Parameters]({{< relref "./docs/SecHarden/kernel-parameters.md" >}}) - - [SELinux Configuration]({{< relref "./docs/SecHarden/selinux-configuration.md" >}}) - - [Security Hardening Tools]({{< relref "./docs/SecHarden/security-hardening-tools.md" >}}) - - [Appendix]({{< relref "./docs/SecHarden/appendix.md" >}}) -- [Virtualization User Guide]({{< relref "./docs/Virtualization/virtualization.md" >}}) - - [Introduction to Virtualization]({{< relref "./docs/Virtualization/introduction-to-virtualization.md" >}}) - - [Installing Virtualization Components]({{< relref 
"./docs/Virtualization/virtualization-installation.md" >}}) - - [Environment Preparation]({{< relref "./docs/Virtualization/environment-preparation.md" >}}) - - [VM Configuration]({{< relref "./docs/Virtualization/vm-configuration.md" >}}) - - [Managing VMs]({{< relref "./docs/Virtualization/managing-vms.md" >}}) - - [VM Live Migration]({{< relref "./docs/Virtualization/vm-live-migration.md" >}}) - - [System Resource Management]({{< relref "./docs/Virtualization/system-resource-management.md" >}}) - - [Managing Devices]({{< relref "./docs/Virtualization/managing-devices.md" >}}) - - [VM Maintainability Management]({{< relref "./docs/Virtualization/vm-maintainability-management.md" >}}) - - [Best Practices]({{< relref "./docs/Virtualization/best-practices.md" >}}) - - [Tool Guide]({{< relref "./docs/Virtualization/tool-guide.md" >}}) - - [vmtop]({{< relref "./docs/Virtualization/vmtop.md" >}}) - - [LibcarePlus]({{< relref "./docs/Virtualization/LibcarePlus.md" >}}) - - [Skylark VM Hybrid Deployment]({{< relref "./docs/Virtualization/Skylark.md" >}}) - - [Appendix]({{< relref "./docs/Virtualization/appendix.md" >}}) -- [StratoVirt Virtualization User Guide]({{< relref "./docs/StratoVirt/StratoVirt_guidence.md" >}}) - - [Introduction to StratoVirt]({{< relref "./docs/StratoVirt/StratoVirt_introduction.md" >}}) - - [Installing StratoVirt]({{< relref "./docs/StratoVirt/Install_StratoVirt.md" >}}) - - [Preparing the Environment]({{< relref "./docs/StratoVirt/Prepare_env.md" >}}) - - [Configuring a VM]({{< relref "./docs/StratoVirt/VM_configuration.md" >}}) - - [VM Management]({{< relref "./docs/StratoVirt/VM_management.md" >}}) - - [Connecting to the iSula Secure Container]({{< relref "./docs/StratoVirt/interconnect_isula.md" >}}) - - [Interconnecting with libvirt]({{< relref "./docs/StratoVirt/Interconnect_libvirt.md" >}}) - - [StratoVirt VFIO Instructions]({{< relref "./docs/StratoVirt/StratoVirt_VFIO_instructions.md" >}}) -- [Container User Guide]({{< relref 
"./docs/Container/container.md" >}}) - - [iSulad Container Engine]({{< relref "./docs/Container/isulad-container-engine.md" >}}) - - [Installation, Upgrade and Uninstallation]({{< relref "./docs/Container/installation-upgrade-Uninstallation.md" >}}) - - [Installation and Configuration]({{< relref "./docs/Container/installation-configuration.md" >}}) - - [Upgrade Methods]({{< relref "./docs/Container/upgrade-methods.md" >}}) - - [Uninstallation]({{< relref "./docs/Container/uninstallation.md" >}}) - - [Application Scenarios]({{< relref "./docs/Container/application-scenarios.md" >}}) - - [Container Management]({{< relref "./docs/Container/container-management.md" >}}) - - [Interconnection with the CNI Network]({{< relref "./docs/Container/interconnection-with-the-cni-network.md" >}}) - - [Privileged Container]({{< relref "./docs/Container/privileged-container.md" >}}) - - [CRI]({{< relref "./docs/Container/cri.md" >}}) - - [Image Management]({{< relref "./docs/Container/image-management.md" >}}) - - [Checking the Container Health Status]({{< relref "./docs/Container/checking-the-container-health-status.md" >}}) - - [Querying Information]({{< relref "./docs/Container/querying-information.md" >}}) - - [Security Features]({{< relref "./docs/Container/security-features.md" >}}) - - [Supporting OCI hooks]({{< relref "./docs/Container/supporting-oci-hooks.md" >}}) - - [Local Volume Management]({{< relref "./docs/Container/local-volume-management.md" >}}) - - [Interconnecting iSulad shim v2 with StratoVirt]({{< relref "./docs/Container/interconnecting-isula-shim-v2-with-stratovirt.md" >}}) - - [Appendix]({{< relref "./docs/Container/appendix.md" >}}) - - [System Container]({{< relref "./docs/Container/system-container.md" >}}) - - [Installation Guideline]({{< relref "./docs/Container/installation-guideline.md" >}}) - - [Usage Guide]({{< relref "./docs/Container/usage-guide.md" >}}) - - [Specifying Rootfs to Create a Container]({{< relref 
"./docs/Container/specifying-rootfs-to-create-a-container.md" >}}) - - [Using systemd to Start a Container]({{< relref "./docs/Container/using-systemd-to-start-a-container.md" >}}) - - [Reboot or Shutdown in a Container]({{< relref "./docs/Container/reboot-or-shutdown-in-a-container.md" >}}) - - [Configurable Cgroup Path]({{< relref "./docs/Container/configurable-cgroup-path.md" >}}) - - [Writable Namespace Kernel Parameters]({{< relref "./docs/Container/writable-namespace-kernel-parameters.md" >}}) - - [Shared Memory Channels]({{< relref "./docs/Container/shared-memory-channels.md" >}}) - - [Dynamically Loading the Kernel Module]({{< relref "./docs/Container/dynamically-loading-the-kernel-module.md" >}}) - - [Environment Variable Persisting]({{< relref "./docs/Container/environment-variable-persisting.md" >}}) - - [Maximum Number of Handles]({{< relref "./docs/Container/maximum-number-of-handles.md" >}}) - - [Security and Isolation]({{< relref "./docs/Container/security-and-isolation.md" >}}) - - [Dynamically Managing Container Resources \\(syscontainer-tools\\)]({{< relref "./docs/Container/dynamically-managing-container-resources-(syscontainer-tools).md" >}}) - - [Appendix]({{< relref "./docs/Container/appendix-1.md" >}}) - - [Secure Container]({{< relref "./docs/Container/secure-container.md" >}}) - - [Installation and Deployment]({{< relref "./docs/Container/installation-and-deployment-2.md" >}}) - - [Application Scenarios]({{< relref "./docs/Container/application-scenarios-2.md" >}}) - - [Managing the Lifecycle of a Secure Container]({{< relref "./docs/Container/managing-the-lifecycle-of-a-secure-container.md" >}}) - - [Configuring Resources for a Secure Container]({{< relref "./docs/Container/configuring-resources-for-a-secure-container.md" >}}) - - [Monitoring Secure Containers]({{< relref "./docs/Container/monitoring-secure-containers.md" >}}) - - [Appendix]({{< relref "./docs/Container/appendix-2.md" >}}) - - [Docker Container]({{< relref 
"./docs/Container/docker-container.md" >}}) - - [Installation and Deployment]({{< relref "./docs/Container/installation-and-deployment-3.md" >}}) - - [Container Management]({{< relref "./docs/Container/container-management-1.md" >}}) - - [Image Management]({{< relref "./docs/Container/image-management-1.md" >}}) - - [Command Reference]({{< relref "./docs/Container/command-reference.md" >}}) - - [Container Engine]({{< relref "./docs/Container/container-engine.md" >}}) - - [Container Management]({{< relref "./docs/Container/container-management-2.md" >}}) - - [Image Management]({{< relref "./docs/Container/image-management-2.md" >}}) - - [Statistics]({{< relref "./docs/Container/statistics.md" >}}) - - [Image Building]({{< relref "./docs/Container/isula-build.md" >}}) -- [A-Tune User Guide]({{< relref "./docs/A-Tune/A-Tune.md" >}}) - - [Getting to Know A-Tune]({{< relref "./docs/A-Tune/getting-to-know-a-tune.md" >}}) - - [Installation and Deployment]({{< relref "./docs/A-Tune/installation-and-deployment.md" >}}) - - [Usage Instructions]({{< relref "./docs/A-Tune/usage-instructions.md" >}}) - - [FAQs]({{< relref "./docs/A-Tune/faqs.md" >}}) - - [Appendixes]({{< relref "./docs/A-Tune/appendixes.md" >}}) -- [openEuler Embedded User Guide](https://openeuler.gitee.io/yocto-meta-openeuler/master/index.html) -- [Kernel Live Upgrade Guide]({{< relref "./docs/KernelLiveUpgrade/KernelLiveUpgrade.md" >}}) - - [Installation and Deployment]({{< relref "./docs/KernelLiveUpgrade/installation-and-deployment.md" >}}) - - [How to Run]({{< relref "./docs/KernelLiveUpgrade/how-to-run.md" >}}) - - [Common Problems and Solutions]({{< relref "./docs/KernelLiveUpgrade/common-problems-and-solutions.md" >}}) -- [Tiered-Reliability Memory User Guide]({{< relref "./docs/Kernel/overview.md" >}}) - - [Restrictions]({{< relref "./docs/Kernel/restrictions.md" >}}) - - [How to Use]({{< relref "./docs/Kernel/how-to-use.md" >}}) -- [Application Development Guide]({{< relref 
"./docs/ApplicationDev/application-development.md" >}}) - - [Preparation]({{< relref "./docs/ApplicationDev/preparations-for-development-environment.md" >}}) - - [Using GCC for Compilation]({{< relref "./docs/ApplicationDev/using-gcc-for-compilation.md" >}}) - - [Using Make for Compilation]({{< relref "./docs/ApplicationDev/using-make-for-compilation.md" >}}) - - [Using JDK for Compilation]({{< relref "./docs/ApplicationDev/using-jdk-for-compilation.md" >}}) - - [Building an RPM Package]({{< relref "./docs/ApplicationDev/building-an-rpm-package.md" >}}) - - [FAQ]({{< relref "./docs/ApplicationDev/FAQ.md" >}}) -- [secGear Development Guide]({{< relref "./docs/secGear/secGear.md" >}}) - - [Introduction to secGear]({{< relref "./docs/secGear/introduction-to-secGear.md" >}}) - - [Installing secGear]({{< relref "./docs/secGear/installing-secGear.md" >}}) - - [secGear Development Guide]( {{< relref "./docs/secGear/secGear-development-guide" >}}) - - [Using the Switchless Feature]({{< relref "./docs/secGear/using-the-switchless-feature.md" >}}) - - [Using the secGear Tool](For {{< relref "./docs/secGear/using-the-secGear-tool.md" >}}) - - [API Reference]({{< relref "./docs/secGear/api-description.md" >}}) -- [Kubernetes Cluster Deployment Guide]({{< relref "./docs/Kubernetes/Kubernetes.md" >}}) - - [Preparing VMs]( {{< relref "./docs/Kubernetes/preparing-VMs.md">}}) - - [Deploying a Kubernetes Cluster]({{< relref "./docs/Kubernetes/deploying-a-Kubernetes-cluster-manually.md" >}}) - - [Installing the Kubernetes Software Package]( {{< relref "./docs/Kubernetes/installing-the-Kubernetes-software-package.md" >}}) - - [Preparing Certificates]({{< relref "./docs/Kubernetes/preparing-certificates.md" >}}) - - [Installing etcd]({{< relref "./docs/Kubernetes/installing-etcd.md" >}}) - - [Deploying Components on the Control Plane]({{< relref "./docs/Kubernetes/deploying-control-plane-components.md" >}}) - - [Deploying a Node Component]({{< relref 
"./docs/Kubernetes/deploying-a-node-component.md" >}}) - - [Automatic Cluster Deployment]({{< relref "./docs/Kubernetes/eggo-automatic-deployment.md" >}}) - - [Tool Introduction]({{< relref "./docs/Kubernetes/eggo-tool-introduction.md" >}}) - - [Deploying a Cluster]({{< relref "./docs/Kubernetes/eggo-deploying-a-cluster.md" >}}) - - [Dismantling a Cluster]({{< relref "./docs/Kubernetes/eggo-dismantling-a-cluster.md" >}}) - - [Running the Test Pod]({{< relref "./docs/Kubernetes/running-the-test-pod.md" >}}) -- [KubeEdge User Guide]({{< relref "./docs/KubeEdge/overview.md" >}}) - - [KubeEdge Usage Guide]({{< relref "./docs/KubeEdge/kubeedge-usage-guide.md" >}}) - - [KubeEdge Deployment Guide]({{< relref "./docs/KubeEdge/kubeedge-deployment-guide.md" >}}) -- [K3s Deployment Guide]({{< relref "./docs/K3s/K3s-deployment-guide.md" >}}) -- [Third-Party Software Deployment Guide]({{< relref "./docs/thirdparty_migration/thidrparty.md" >}}) - - [OpenStack]({{< relref "./docs/thirdparty_migration/openstack.md" >}}) - - [HA User Guide]({{< relref "./docs/desktop/ha.md" >}}) - - [Deploying an HA Cluster]({{< relref "./docs/thirdparty_migration/installha.md" >}}) - - [HA Usage Example]({{< relref "./docs/desktop/HA_usage_example.md" >}}) - - [KubeSphere Deployment Guide]({{< relref "./docs/desktop/kubesphere.md" >}}) -- [Memory Fabric User Guide]({{< relref "./docs/memory-fabric/memory-fabric-user-guide.md" >}}) -- [Desktop Environment User Guide]({{< relref "./docs/desktop/desktop.md" >}}) - - [UKUI]({{< relref "./docs/desktop/ukui.md" >}}) - - [UKUI Installation]({{< relref "./docs/desktop/installing-UKUI.md" >}}) - - [UKUI User Guide]({{< relref "./docs/desktop/UKUI-user-guide.md" >}}) - - [DDE]({{< relref "./docs/desktop/dde.md" >}}) - - [DDE Installation]({{< relref "./docs/desktop/installing-DDE.md" >}}) - - [DDE User Guide]({{< relref "./docs/desktop/DDE-user-guide.md" >}}) - - [Xfce]({{< relref "./docs/desktop/xfce.md" >}}) - - [Xfce Installation]({{< relref 
"./docs/desktop/installing-Xfce.md" >}}) - - [Xfce User Guide]({{< relref "./docs/desktop/Xfce_userguide.md" >}}) - - [GNOME]({{< relref "./docs/desktop/gnome.md" >}}) - - [GNOME Installation]({{< relref "./docs/desktop/installing-GNOME.md" >}}) - - [GNOME User Guide]({{< relref "./docs/desktop/Gnome_userguide.md" >}}) -- [Toolset User Guide]({{< relref "./docs/userguide/overview.md" >}}) - - [Patch Tracking]({{< relref "./docs/userguide/patch-tracking.md" >}}) - - [pkgship]({{< relref "./docs/userguide/pkgship.md" >}}) -- [A-Ops User Guide]({{< relref "./docs/A-Ops/overview.md" >}}) - - [Deploying A-Ops]({{< relref "./docs/A-Ops/deploying-aops.md" >}}) - - [Deploying aops-agent]({{< relref "./docs/A-Ops/deploying-aops-agent.md" >}}) - - [Using gala-ragdoll]({{< relref "./docs/A-Ops/configuration-source-tracing-service-manual.md" >}}) - - [Using Architecture Awareness Service]({{< relref "./docs/A-Ops/architecture-awareness-service-manual.md" >}}) - - [Using gala-gopher]({{< relref "./docs/A-Ops/using-gala-gopher.md" >}}) - - [Using gala-anteater]({{< relref "./docs/A-Ops/using-gala-anteater.md" >}}) - - [Using gala-spider]({{< relref "./docs/A-Ops/using-gala-spider.md" >}}) -- [KubeOS User Guide]({{< relref "./docs/KubeOS/kubeos-user-guide.md" >}}) - - [About KubeOS]({{< relref "./docs/KubeOS/about-kubeos.md" >}}) - - [Installation and Deployment]({{< relref "./docs/KubeOS/installation-and-deployment.md" >}}) - - [Usage Instructions]({{< relref "./docs/KubeOS/usage-instructions.md" >}}) - - [KubeOS Image Creation]({{< relref "./docs/KubeOS/kubeos-image-creation.md" >}}) -- [Rubik User Guide]({{< relref "./docs/rubik/overview.md" >}}) - - [Installation and Deployment]({{< relref "./docs/rubik/installation-and-deployment.md" >}}) - - [HTTP APIs]({{< relref "./docs/rubik/http-apis.md" >}}) - - [Example of Isolation for Hybrid Deployed Services]({{< relref "./docs/rubik/example-of-isolation-for-hybrid-deployed-services.md" >}}) -- [oncn-bwm User Guide]({{< relref 
"./docs/oncn-bwm/overview.md" >}}) -- [Image Tailoring and Customization Tool]({{< relref "./docs/TailorCustom/overview.md" >}}) - - [isocut Usage Guide]({{< relref "./docs/TailorCustom/isocut-usage-guide.md" >}}) - - [ImageTailor User Guide]({{< relref "./docs/TailorCustom/imageTailor-user-guide.md" >}}) -- [Gazelle User Guide]({{< relref "./docs/Gazelle/Gazelle.md" >}}) -- [System Analysis and Tuning User Guide]({{< relref "./docs/SystemOptimization/overview.md" >}}) - - [MySQL Performance Tuning]({{< relref "./docs/SystemOptimization/mysql-performance-tuning.md" >}}) - - [Bid Data Tuning]({{< relref "./docs/SystemOptimization/big-data-tuning.md" >}}) -- [NestOS User Guide]({{< relref "./docs/NestOS/overview.md" >}}) - - [Installation and Deployment]({{< relref "./docs/NestOS/installation-and-deployment.md" >}}) - - [Setting Up Kubernetes and iSulad]({{< relref "./docs/NestOS/usage.md" >}}) - - [Feature Description]({{< relref "./docs/NestOS/feature-description.md" >}}) -- [ShangMi Feature User Guide]({{< relref "./docs/ShangMi/overview.md" >}}) - - [Algorithm Library]({{< relref "./docs/ShangMi/algorithm-library.md" >}}) - - [Certificates]({{< relref "./docs/ShangMi/certificates.md" >}}) - - [User Identity Authentication]({{< relref "./docs/ShangMi/user-identity-authentication.md" >}}) - - [SSH Protocol Stack]({{< relref "./docs/ShangMi/ssh-stack.md" >}}) - - [TLCP Stack]({{< relref "./docs/ShangMi/tlcp-stack.md" >}}) - - [Kernel Module Signing]({{< relref "./docs/ShangMi/kernel-module-signing.md" >}}) - - [File Integrity Protection]({{< relref "./docs/ShangMi/file-integrity-protection.md" >}}) - - [Disk Encryption]({{< relref "./docs/ShangMi/disk-encryption.md" >}}) - - [Secure Boot]({{< relref "./docs/ShangMi/secure-boot.md" >}}) -- [astream User Guide]({{< relref "./docs/astream/overview.md" >}}) - - [Installation and Usage]({{< relref "./docs/astream/installation_and_usage.md" >}}) - - [astream for MySQL]({{< relref 
"./docs/astream/astream-for-mysql-guide.md" >}}) -- [Imperceptible Container Management Plane DPU Offload User Guide]({{< relref "./docs/DPUOffload/overview.md" >}}) - - [Imperceptible Container Management Plane Offload]({{< relref "./docs/DPUOffload/imperceptible-container-management-plane-offload.md" >}}) - - [qtfs Shared File System]({{< relref "./docs/DPUOffload/qtfs-architecture-and-usage.md" >}}) - - [Imperceptible Offload Deployment Guide]({{< relref "./docs/DPUOffload/offload-deployment-guide.md" >}}) -- [HSAK Developer Guide]({{< relref "./docs/HSAK/introduce_hsak.md" >}}) - - [Develop With HSAK]({{< relref "./docs/HSAK/develop_with_hsak.md" >}}) - - [HSAK Tool Usage]({{< relref "./docs/HSAK/hsak_tools_usage.md" >}}) - - [HSAK Interfaces]({{< relref "./docs/HSAK/hsak_interface.md" >}}) - +- [Terms of Use]({{< relref "./docs/Releasenotes/terms-of-use.md" >}}) +- [Release Notes]({{< relref "./docs/Releasenotes/release_notes.md" >}}) + - [User Notice]({{< relref "./docs/Releasenotes/user-notice.md" >}}) + - [Account List]({{< relref "./docs/Releasenotes/account-list.md" >}}) + - [Introduction]({{< relref "./docs/Releasenotes/introduction.md" >}}) + - [OS Installation]({{< relref "./docs/Releasenotes/installing-the-os.md" >}}) + - [Key Features]({{< relref "./docs/Releasenotes/key-features.md" >}}) + - [Known Issues]({{< relref "./docs/Releasenotes/known-issues.md" >}}) + - [Resolved Issues]({{< relref "./docs/Releasenotes/resolved-issues.md" >}}) + - [Common Vulnerabilities and Exposures (CVE)]({{< relref "./docs/Releasenotes/common-vulnerabilities-and-exposures-(cve).md" >}}) + - [Source Code]({{< relref "./docs/Releasenotes/source-code.md" >}}) + - [Contribution]({{< relref "./docs/Releasenotes/contribution.md" >}}) + - [Acknowledgment]({{< relref "./docs/Releasenotes/acknowledgment.md" >}}) +- [Quick Start]({{< relref "./docs/Quickstart/quick-start.md" >}}) +- [Installation Guide]({{< relref "./docs/Installation/Installation.md" >}}) + - [Installation on 
Servers]({{< relref "./docs/Installation/install-server.md" >}}) + - [Installation Preparations]({{< relref "./docs/Installation/installation-preparations.md" >}}) + - [Installation Modes]({{< relref "./docs/Installation/Installation-Modes1.md" >}}) + - [Installation Guideline]({{< relref "./docs/Installation/installation-guideline.md" >}}) + - [Using Kickstart for Automatic Installation]({{< relref "./docs/Installation/using-kickstart-for-automatic-installation.md" >}}) + - [FAQs]({{< relref "./docs/Installation/faqs.md" >}}) + - [Installation on Raspberry Pi]({{< relref "./docs/Installation/install-pi.md" >}}) + - [Installation Preparations]({{< relref "./docs/Installation/Installation-Preparations1.md" >}}) + - [Installation Mode]({{< relref "./docs/Installation/Installation-Modes1.md" >}}) + - [Installation Guideline]({{< relref "./docs/Installation/Installation-Guide1" >}}) + - [FAQs]({{< relref "./docs/Installation/FAQ1.md" >}}) + - [More Resources]({{< relref "./docs/Installation/More-Resources.md" >}}) +- [Administrator Guide]({{< relref "./docs/Administration/administration.md" >}}) + - [Viewing System Information]({{< relref "./docs/Administration/viewing-system-information.md" >}}) + - [Basic Configuration]({{< relref "./docs/Administration/basic-configuration.md" >}}) + - [User and User Group Management]({{< relref "./docs/Administration/user-and-user-group-management.md" >}}) + - [Software Package Management with DNF]({{< relref "./docs/Administration/using-the-dnf-to-manage-software-packages.md" >}}) + - [Service Management]({{< relref "./docs/Administration/service-management.md" >}}) + - [Process Management]({{< relref "./docs/Administration/process-management.md" >}}) + - [Network Configuration]({{< relref "./docs/Administration/configuring-the-network.md" >}}) + - [Hard Disk Management with LVM]({{< relref "./docs/Administration/managing-hard-disks-through-lvm.md" >}}) + - [KAE Usage]({{< relref "./docs/Administration/using-the-kae.md" >}}) + - 
[Service Configuration]({{< relref "./docs/Administration/configuring-services.md" >}}) + - [Configuring the Repo Server]({{< relref "./docs/Administration/configuring-the-repo-server.md" >}}) + - [Configuring the FTP Server]({{< relref "./docs/Administration/configuring-the-ftp-server.md" >}}) + - [Configuring the Web Server]({{< relref "./docs/Administration/configuring-the-web-server.md" >}}) + - [Setting Up the Database Server]({{< relref "./docs/Administration/setting-up-the-database-server.md" >}}) + - [Trusted Computing]({{< relref "./docs/Administration/trusted-computing.md" >}}) + - [FAQs]({{< relref "./docs/Administration/faqs.md" >}}) +- [O&M Guide]({{< relref "./docs/ops_guide/overview.md" >}}) + - [O&M Overview]({{< relref "./docs/ops_guide/om-overview.md" >}}) + - [System Resources and Performance]({{< relref "./docs/ops_guide/system-resources-and-performance.md" >}}) + - [Information Collection]({{< relref "./docs/ops_guide/information-collection.md" >}}) + - [Troubleshooting]({{< relref "./docs/ops_guide/troubleshooting.md" >}}) + - [Commonly Used Tools]({{< relref "./docs/ops_guide/commonly-used-tools.md" >}}) + - [Common Skills]({{< relref "./docs/ops_guide/common-skills.md" >}}) +- [Security Hardening Guide]({{< relref "./docs/SecHarden/secHarden.md" >}}) + - [OS Hardening Overview]({{< relref "./docs/SecHarden/os-hardening-overview.md" >}}) + - [Security Hardening Guide]({{< relref "./docs/SecHarden/security-hardening-guide.md" >}}) + - [Account Passwords]({{< relref "./docs/SecHarden/account-passwords.md" >}}) + - [Authentication and Authorization]({{< relref "./docs/SecHarden/authentication-and-authorization.md" >}}) + - [System Services]({{< relref "./docs/SecHarden/system-services.md" >}}) + - [File Permissions]({{< relref "./docs/SecHarden/file-permissions.md" >}}) + - [Kernel Parameters]({{< relref "./docs/SecHarden/kernel-parameters.md" >}}) + - [SELinux Configuration]({{< relref "./docs/SecHarden/selinux-configuration.md" >}}) + - 
[Security Hardening Tools]({{< relref "./docs/SecHarden/security-hardening-tools.md" >}}) + - [Appendix]({{< relref "./docs/SecHarden/appendix.md" >}}) +- [Virtualization User Guide]({{< relref "./docs/Virtualization/virtualization.md" >}}) + - [Introduction to Virtualization]({{< relref "./docs/Virtualization/introduction-to-virtualization.md" >}}) + - [Installing Virtualization Components]({{< relref "./docs/Virtualization/virtualization-installation.md" >}}) + - [Environment Preparation]({{< relref "./docs/Virtualization/environment-preparation.md" >}}) + - [VM Configuration]({{< relref "./docs/Virtualization/vm-configuration.md" >}}) + - [Managing VMs]({{< relref "./docs/Virtualization/managing-vms.md" >}}) + - [VM Live Migration]({{< relref "./docs/Virtualization/vm-live-migration.md" >}}) + - [System Resource Management]({{< relref "./docs/Virtualization/system-resource-management.md" >}}) + - [Managing Devices]({{< relref "./docs/Virtualization/managing-devices.md" >}}) + - [VM Maintainability Management]({{< relref "./docs/Virtualization/vm-maintainability-management.md" >}}) + - [Best Practices]({{< relref "./docs/Virtualization/best-practices.md" >}}) + - [Tool Guide]({{< relref "./docs/Virtualization/tool-guide.md" >}}) + - [vmtop]({{< relref "./docs/Virtualization/vmtop.md" >}}) + - [LibcarePlus]({{< relref "./docs/Virtualization/LibcarePlus.md" >}}) + - [Skylark VM Hybrid Deployment]({{< relref "./docs/Virtualization/Skylark.md" >}}) + - [Appendix]({{< relref "./docs/Virtualization/appendix.md" >}}) +- [StratoVirt Virtualization User Guide]({{< relref "./docs/StratoVirt/StratoVirt_guidence.md" >}}) + - [Introduction to StratoVirt]({{< relref "./docs/StratoVirt/StratoVirt_introduction.md" >}}) + - [Installing StratoVirt]({{< relref "./docs/StratoVirt/Install_StratoVirt.md" >}}) + - [Preparing the Environment]({{< relref "./docs/StratoVirt/Prepare_env.md" >}}) + - [Configuring a VM]({{< relref "./docs/StratoVirt/VM_configuration.md" >}}) + - [VM 
Management]({{< relref "./docs/StratoVirt/VM_management.md" >}}) + - [Connecting to the iSula Secure Container]({{< relref "./docs/StratoVirt/interconnect_isula.md" >}}) + - [Interconnecting with libvirt]({{< relref "./docs/StratoVirt/Interconnect_libvirt.md" >}}) + - [StratoVirt VFIO Instructions]({{< relref "./docs/StratoVirt/StratoVirt_VFIO_instructions.md" >}}) +- [Container User Guide]({{< relref "./docs/Container/container.md" >}}) + - [iSulad Container Engine]({{< relref "./docs/Container/isulad-container-engine.md" >}}) + - [Installation, Upgrade and Uninstallation]({{< relref "./docs/Container/installation-upgrade-Uninstallation.md" >}}) + - [Installation and Configuration]({{< relref "./docs/Container/installation-configuration.md" >}}) + - [Upgrade Methods]({{< relref "./docs/Container/upgrade-methods.md" >}}) + - [Uninstallation]({{< relref "./docs/Container/uninstallation.md" >}}) + - [Application Scenarios]({{< relref "./docs/Container/application-scenarios.md" >}}) + - [Container Management]({{< relref "./docs/Container/container-management.md" >}}) + - [Interconnection with the CNI Network]({{< relref "./docs/Container/interconnection-with-the-cni-network.md" >}}) + - [Privileged Container]({{< relref "./docs/Container/privileged-container.md" >}}) + - [CRI]({{< relref "./docs/Container/cri.md" >}}) + - [Image Management]({{< relref "./docs/Container/image-management.md" >}}) + - [Checking the Container Health Status]({{< relref "./docs/Container/checking-the-container-health-status.md" >}}) + - [Querying Information]({{< relref "./docs/Container/querying-information.md" >}}) + - [Security Features]({{< relref "./docs/Container/security-features.md" >}}) + - [Supporting OCI hooks]({{< relref "./docs/Container/supporting-oci-hooks.md" >}}) + - [Local Volume Management]({{< relref "./docs/Container/local-volume-management.md" >}}) + - [Interconnecting iSulad shim v2 with StratoVirt]({{< relref 
"./docs/Container/interconnecting-isula-shim-v2-with-stratovirt.md" >}}) + - [Appendix]({{< relref "./docs/Container/appendix.md" >}}) + - [System Container]({{< relref "./docs/Container/system-container.md" >}}) + - [Installation Guideline]({{< relref "./docs/Container/installation-guideline.md" >}}) + - [Usage Guide]({{< relref "./docs/Container/usage-guide.md" >}}) + - [Specifying Rootfs to Create a Container]({{< relref "./docs/Container/specifying-rootfs-to-create-a-container.md" >}}) + - [Using systemd to Start a Container]({{< relref "./docs/Container/using-systemd-to-start-a-container.md" >}}) + - [Reboot or Shutdown in a Container]({{< relref "./docs/Container/reboot-or-shutdown-in-a-container.md" >}}) + - [Configurable Cgroup Path]({{< relref "./docs/Container/configurable-cgroup-path.md" >}}) + - [Writable Namespace Kernel Parameters]({{< relref "./docs/Container/writable-namespace-kernel-parameters.md" >}}) + - [Shared Memory Channels]({{< relref "./docs/Container/shared-memory-channels.md" >}}) + - [Dynamically Loading the Kernel Module]({{< relref "./docs/Container/dynamically-loading-the-kernel-module.md" >}}) + - [Environment Variable Persisting]({{< relref "./docs/Container/environment-variable-persisting.md" >}}) + - [Maximum Number of Handles]({{< relref "./docs/Container/maximum-number-of-handles.md" >}}) + - [Security and Isolation]({{< relref "./docs/Container/security-and-isolation.md" >}}) + - [Dynamically Managing Container Resources \\(syscontainer-tools\\)]({{< relref "./docs/Container/dynamically-managing-container-resources-(syscontainer-tools).md" >}}) + - [Appendix]({{< relref "./docs/Container/appendix-1.md" >}}) + - [Secure Container]({{< relref "./docs/Container/secure-container.md" >}}) + - [Installation and Deployment]({{< relref "./docs/Container/installation-and-deployment-2.md" >}}) + - [Application Scenarios]({{< relref "./docs/Container/application-scenarios-2.md" >}}) + - [Managing the Lifecycle of a Secure Container]({{< 
relref "./docs/Container/managing-the-lifecycle-of-a-secure-container.md" >}}) + - [Configuring Resources for a Secure Container]({{< relref "./docs/Container/configuring-resources-for-a-secure-container.md" >}}) + - [Monitoring Secure Containers]({{< relref "./docs/Container/monitoring-secure-containers.md" >}}) + - [Appendix]({{< relref "./docs/Container/appendix-2.md" >}}) + - [Docker Container]({{< relref "./docs/Container/docker-container.md" >}}) + - [Installation and Deployment]({{< relref "./docs/Container/installation-and-deployment-3.md" >}}) + - [Container Management]({{< relref "./docs/Container/container-management-1.md" >}}) + - [Image Management]({{< relref "./docs/Container/image-management-1.md" >}}) + - [Command Reference]({{< relref "./docs/Container/command-reference.md" >}}) + - [Container Engine]({{< relref "./docs/Container/container-engine.md" >}}) + - [Container Management]({{< relref "./docs/Container/container-management-2.md" >}}) + - [Image Management]({{< relref "./docs/Container/image-management-2.md" >}}) + - [Statistics]({{< relref "./docs/Container/statistics.md" >}}) + - [Image Building]({{< relref "./docs/Container/isula-build.md" >}}) +- [openEuler Embedded User Guide](https://openeuler.gitee.io/yocto-meta-openeuler/master/index.html) +- [Kernel Live Upgrade Guide]({{< relref "./docs/KernelLiveUpgrade/KernelLiveUpgrade.md" >}}) + - [Installation and Deployment]({{< relref "./docs/KernelLiveUpgrade/installation-and-deployment.md" >}}) + - [How to Run]({{< relref "./docs/KernelLiveUpgrade/how-to-run.md" >}}) + - [Common Problems and Solutions]({{< relref "./docs/KernelLiveUpgrade/common-problems-and-solutions.md" >}}) +- [Application Development Guide]({{< relref "./docs/ApplicationDev/application-development.md" >}}) + - [Preparation]({{< relref "./docs/ApplicationDev/preparations-for-development-environment.md" >}}) + - [Using GCC for Compilation]({{< relref "./docs/ApplicationDev/using-gcc-for-compilation.md" >}}) + - [Using 
Make for Compilation]({{< relref "./docs/ApplicationDev/using-make-for-compilation.md" >}}) + - [Using JDK for Compilation]({{< relref "./docs/ApplicationDev/using-jdk-for-compilation.md" >}}) + - [Building an RPM Package]({{< relref "./docs/ApplicationDev/building-an-rpm-package.md" >}}) + - [FAQ]({{< relref "./docs/ApplicationDev/FAQ.md" >}}) +- [secGear Development Guide]({{< relref "./docs/secGear/secGear.md" >}}) + - [Introduction to secGear]({{< relref "./docs/secGear/introduction-to-secGear.md" >}}) + - [Installing secGear]({{< relref "./docs/secGear/installing-secGear.md" >}}) + - [secGear Development Guide]({{< relref "./docs/secGear/secGear-development-guide" >}}) + - [Using the Switchless Feature]({{< relref "./docs/secGear/using-the-switchless-feature.md" >}}) + - [Using the secGear Tool]({{< relref "./docs/secGear/using-the-secGear-tool.md" >}}) + - [API Reference]({{< relref "./docs/secGear/api-description.md" >}}) +- [KubeEdge User Guide]({{< relref "./docs/KubeEdge/overview.md" >}}) + - [KubeEdge Usage Guide]({{< relref "./docs/KubeEdge/kubeedge-usage-guide.md" >}}) + - [KubeEdge Deployment Guide]({{< relref "./docs/KubeEdge/kubeedge-deployment-guide.md" >}}) +- [Third-Party Software Deployment Guide]({{< relref "./docs/thirdparty_migration/thidrparty.md" >}}) + - [HA User Guide]({{< relref "./docs/thirdparty_migration/ha.md" >}}) + - [Deploying an HA Cluster]({{< relref "./docs/thirdparty_migration/installha.md" >}}) + - [HA Use Cases]({{< relref "./docs/thirdparty_migration/usecase.md" >}}) + - [KubeSphere Deployment Guide]({{< relref "./docs/desktop/kubesphere.md" >}}) +- [Memory Fabric User Guide]({{< relref "./docs/memory-fabric/memory-fabric-user-guide.md" >}}) +- [Desktop Environment User Guide]({{< relref "./docs/desktop/desktop.md" >}}) + - [UKUI]({{< relref "./docs/desktop/ukui.md" >}}) + - [UKUI Installation]({{< relref "./docs/desktop/installing-UKUI.md" >}}) + - [UKUI User Guide]({{< relref "./docs/desktop/UKUI-user-guide.md" >}})
+ - [DDE]({{< relref "./docs/desktop/dde.md" >}}) + - [DDE Installation]({{< relref "./docs/desktop/installing-DDE.md" >}}) + - [DDE User Guide]({{< relref "./docs/desktop/DDE-user-guide.md" >}}) + - [Xfce]({{< relref "./docs/desktop/xfce.md" >}}) + - [Xfce Installation]({{< relref "./docs/desktop/installing-Xfce.md" >}}) + - [Xfce User Guide]({{< relref "./docs/desktop/Xfce_userguide.md" >}}) + - [GNOME]({{< relref "./docs/desktop/gnome.md" >}}) + - [GNOME Installation]({{< relref "./docs/desktop/installing-GNOME.md" >}}) + - [GNOME User Guide]({{< relref "./docs/desktop/Gnome_userguide.md" >}}) +- [Toolset User Guide]({{< relref "./docs/userguide/overview.md" >}}) + - [Patch Tracking]({{< relref "./docs/userguide/patch-tracking.md" >}}) + - [pkgship]({{< relref "./docs/userguide/pkgship.md" >}}) +- [Gazelle User Guide]({{< relref "./docs/Gazelle/Gazelle.md" >}}) +- [System Analysis and Tuning User Guide]({{< relref "./docs/SystemOptimization/overview.md" >}}) + - [MySQL Performance Tuning]({{< relref "./docs/SystemOptimization/mysql-performance-tuning.md" >}}) + - [Big Data Tuning]({{< relref "./docs/SystemOptimization/big-data-tuning.md" >}}) +- [NestOS User Guide]({{< relref "./docs/NestOS/overview.md" >}}) + - [Installation and Deployment]({{< relref "./docs/NestOS/installation-and-deployment.md" >}}) + - [Setting Up Kubernetes and iSulad]({{< relref "./docs/NestOS/usage.md" >}}) + - [Feature Description]({{< relref "./docs/NestOS/feature-description.md" >}}) +- [ShangMi Feature User Guide]({{< relref "./docs/ShangMi/overview.md" >}}) + - [Algorithm Library]({{< relref "./docs/ShangMi/algorithm-library.md" >}}) + - [Certificates]({{< relref "./docs/ShangMi/certificates.md" >}}) + - [User Identity Authentication]({{< relref "./docs/ShangMi/user-identity-authentication.md" >}}) + - [SSH Protocol Stack]({{< relref "./docs/ShangMi/ssh-stack.md" >}}) + - [TLCP Stack]({{< relref "./docs/ShangMi/tlcp-stack.md" >}}) + - [Kernel Module Signing]({{< relref
"./docs/ShangMi/kernel-module-signing.md" >}}) + - [File Integrity Protection]({{< relref "./docs/ShangMi/file-integrity-protection.md" >}}) + - [Disk Encryption]({{< relref "./docs/ShangMi/disk-encryption.md" >}}) + - [Secure Boot]({{< relref "./docs/ShangMi/secure-boot.md" >}}) +- [astream User Guide]({{< relref "./docs/astream/overview.md" >}}) + - [Installation and Usage]({{< relref "./docs/astream/installation_and_usage.md" >}}) + - [astream for MySQL]({{< relref "./docs/astream/astream-for-mysql-guide.md" >}}) +- [Kmesh User Guide]({{< relref "./docs/Kmesh/Kmesh.md" >}}) + - [Introduction to Kmesh]({{< relref "./docs/Kmesh/introduction-to-kmesh.md" >}}) + - [Installation and Deployment]({{< relref "./docs/Kmesh/installation-and-deployment.md" >}}) + - [Usage]({{< relref "./docs/Kmesh/usage.md" >}}) + - [FAQs]({{< relref "./docs/Kmesh/faqs.md" >}}) + - [Appendix]({{< relref "./docs/Kmesh/appendix.md" >}}) +- [NFS Multipathing User Guide]({{< relref "./docs/NfsMultipath/nfs-multipathing-user-guide.md" >}}) + - [Introduction to NFS Multipathing]({{< relref "./docs/NfsMultipath/introduction-to-nfs-multipathing.md" >}}) + - [Installation and Deployment]({{< relref "./docs/NfsMultipath/installation-and-deployment.md" >}}) + - [Usage]({{< relref "./docs/NfsMultipath/usage.md" >}}) + - [FAQs]({{< relref "./docs/NfsMultipath/faqs.md" >}}) +- [PIN User Guide]({{< relref "./docs/Pin/pin-user-guide.md" >}}) +- [OmniVirt User Guide]({{< relref "./docs/OmniVirt/overall.md" >}}) + - [Installing and Running OmniVirt in Windows]({{< relref "./docs/OmniVirt/win-user-manual.md" >}}) + - [Installing and Running OmniVirt on macOS]({{< relref "./docs/OmniVirt/mac-user-manual.md" >}}) diff --git "a/docs/zh/docs/A-Ops/AOps\346\231\272\350\203\275\345\256\232\344\275\215\346\241\206\346\236\266\344\275\277\347\224\250\346\211\213\345\206\214.md" 
"b/docs/zh/docs/A-Ops/AOps\346\231\272\350\203\275\345\256\232\344\275\215\346\241\206\346\236\266\344\275\277\347\224\250\346\211\213\345\206\214.md" deleted file mode 100644 index 63a8e172d44804120c72279437c80d7776dcfb36..0000000000000000000000000000000000000000 --- "a/docs/zh/docs/A-Ops/AOps\346\231\272\350\203\275\345\256\232\344\275\215\346\241\206\346\236\266\344\275\277\347\224\250\346\211\213\345\206\214.md" +++ /dev/null @@ -1,194 +0,0 @@ -# AOps Intelligent Fault Location Framework User Manual - -After deploying the AOps frontend and backend services by following the [AOps Deployment Guide](AOps部署指南.md), you can use the AOps intelligent fault location framework. - -The sections below introduce the features of the framework page by page. - -### 1. Workbench - - This page is the data dashboard; it is the page users land on after logging in. - - ![4911661916984_.pic](./figures/工作台.jpg) - -Supported operations: - -- Viewing the number of currently managed hosts -- Viewing the number of all unconfirmed alarms - -- Viewing alarm statistics for each host group - -- User account operations - - - Changing the password - - Logging out -- Service domain and CVE information are not supported yet - -### 2. Asset Management - -Asset management covers host groups and hosts. When a host registers on the agent side, it must specify an existing host group; after registration, the host is displayed on the frontend. - -(1) Host group page: - -![4761661915951_.pic](./figures/主机组.jpg) - -The following operations are supported: - -- Adding host groups -- Deleting host groups -- Viewing all current host groups -- Viewing the hosts in each host group - -When adding a host group, specify its name and description. Note: do not use duplicate names. - -![添加主机组](./figures/添加主机组.jpg) - -(2) Host management page: - -![主机管理](./figures/主机管理.jpg) - -The following operations are supported: - -- Viewing the host list (filterable by host group and management node, sortable by host name) -- Deleting hosts -- Clicking a host to open its details page - -(3) Host details page: - -![主机详情](./figures/主机详情.jpg) - -The upper part of the details page shows basic information about the host, such as its operating system and CPU. - -![插件管理](./figures/插件管理.jpg) - -In the lower part, users can see the collection plugins currently running on the host (the agent currently supports only the gala-gopher plugin). - -The following operations are supported: - -- Viewing basic host information and plugin information -- Plugin management (gala-gopher) - - Viewing plugin resources - - Enabling and managing the plugin - - Enabling and disabling gala-gopher collection probes -- Host scenario identification - -After you click scenario identification, the system generates the scenario of the host and recommends the plugins and collection items to enable for detecting that scenario; users can adjust the plugins and probes based on the recommendation. - -Note: after modifying plugin information, for example disabling a plugin or toggling a probe, click Save for the changes to take effect. - -![修改插件](./figures/修改插件.png) - -### 3. 
Intelligent Fault Location - -The intelligent fault location strategy of the AOps project uses the built-in network diagnosis app as a template and generates customized workflows for detection and diagnosis. - -An "app" serves as the workflow template: it describes how the detection steps are chained together and contains the logic for recommending the detection model used in each step. When generating a workflow, users can customize its details based on information such as the collection items and scenarios of each host. - -(1) Workflow list page: - -![工作流](./figures/工作流.jpg) - -Supported operations: - -- Viewing the current workflow list, filterable by host group, app, and status, with pagination support -- Viewing the current app list - -(2) Workflow details page: - -![工作流详情](./figures/工作流详情.jpg) - -Supported operations: - -- Viewing basic information such as the workflow's host group, number of hosts, and status -- Viewing the detailed algorithm model information of the single-metric detection, multi-metric detection, and cluster fault diagnosis steps -- Changing the model applied in each detection step -- Executing, pausing, and deleting the workflow - -To change the model of a detection step, search the built-in model library by model name or tag, select a model, and click Apply. - -![修改模型](./figures/修改模型.png) - -(3) App details page - -![app详情](./figures/应用.png) - -Supported operations: - -- Viewing the overall flow of the app -- Creating a workflow based on the app - -To create a workflow, click the Create Workflow button in the upper right corner, enter the workflow's name and description in the window that pops up on the right, and select the host group to check. Once a host group is selected, all its hosts are listed below; move the desired hosts to the list on the right and click Create. The new workflow then appears in the workflow list. - -![app详情](./figures/app详情.jpg) - -![创建工作流](./figures/创建工作流.jpg) - -(4) Alarms - -After a workflow is started, diagnosis is triggered periodically according to the workflow's execution cycle. Each diagnosis that returns an abnormal result is stored in the database as an alarm and is also shown on the frontend alarm page. - -![告警](./figures/告警.jpg) - -Supported operations: - -- Viewing the total number of current alarms -- Viewing the number of alarms in each host group -- Viewing the alarm list -- Confirming alarms -- Viewing alarm details -- Downloading diagnosis reports - -Confirmed alarms are no longer displayed in the list. - -![告警确认](./figures/告警确认.jpg) - -Click an anomaly's details to view the alarm details by host, including the abnormal data items and the inferred root cause node and root cause anomaly. - -![告警详情](./figures/告警详情.jpg) - -### 4. 
Configuration Tracing - -Configuration tracing in the AOps project detects and records changes to configuration file content on target hosts, which is a strong aid in locating faults caused by configuration file errors. - -#### Creating a configuration domain - - -![](./figures/chuangjianyewuyu.png) - - - -#### Adding managed nodes to a configuration domain - -![](./figures/tianjianode.png) - - - -#### Adding configurations to a configuration domain - - -![](./figures/xinzengpeizhi.png) - -#### Querying expected configurations - - -![](./figures/chakanyuqi.png) - -#### Deleting configurations - -![](./figures/shanchupeizhi.png) - -#### Querying actual configurations - -![](./figures/chaxunshijipeizhi.png) - - - -#### Configuration verification - - -![](./figures/zhuangtaichaxun.png) - - - -#### Configuration synchronization - -Not available yet \ No newline at end of file diff --git "a/docs/zh/docs/A-Ops/AOps\351\203\250\347\275\262\346\214\207\345\215\227.md" "b/docs/zh/docs/A-Ops/AOps\351\203\250\347\275\262\346\214\207\345\215\227.md" deleted file mode 100644 index bd2edfd7c479277b3b322bce82b9179197a0957f..0000000000000000000000000000000000000000 --- "a/docs/zh/docs/A-Ops/AOps\351\203\250\347\275\262\346\214\207\345\215\227.md" +++ /dev/null @@ -1,460 +0,0 @@ -# A-Ops Deployment Guide - -## 1. Environment Requirements - -- Two openEuler 22.09 machines - - Used to deploy the two modes of the check module: the scheduler and the executor. Other services such as mysql, elasticsearch, and aops-manager can be deployed independently on either machine; for ease of operation, they are all deployed on machine A. - -- 8 GB or more of memory is recommended - -## 2. Configuring the Deployment Environment - -### Machine A: - -The aops services to deploy on machine A are aops-tools, aops-manager, aops-check, aops-web, aops-agent, and gala-gopher. - -The third-party services to deploy are mysql, elasticsearch, zookeeper, kafka, and prometheus. - -The detailed deployment steps are as follows: - -#### 2.1 Disable the firewall - -Disable the firewall on this node - -``` -systemctl stop firewalld -systemctl disable firewalld -systemctl status firewalld -``` - -#### 2.2 Deploy aops-tools - -Install aops-tools: - -``` -yum install aops-tools -``` - -#### 2.3 Deploy the databases [mysql, elasticsearch] - -##### 2.3.1 Deploy mysql - -Install it using the aops-basedatabase script installed with aops-tools - -``` -cd /opt/aops/aops_tools -./aops-basedatabase mysql -``` - -Modify the mysql configuration file - -``` -vim /etc/my.cnf -``` - -Add bind-address, set to the local IP - -![1662346986112](./figures/修改mysql配置文件.png) - -Restart the mysql service - -``` -systemctl restart mysqld -``` - -Connect to the database and set permissions: - -``` -mysql -show databases; -use mysql; -select user,host from user;//if user is root and host is localhost, mysql allows only local connections; remote hosts and local client software cannot connect. -update user set host = '%' where user='root'; -flush 
privileges;//refresh privileges -exit -``` - -##### 2.3.2 Deploy elasticsearch - -Install it using the aops-basedatabase script installed with aops-tools - -``` -cd /opt/aops/aops_tools -./aops-basedatabase elasticsearch -``` - -Modify the configuration file: - -Modify the elasticsearch configuration file: - -``` -vim /etc/elasticsearch/elasticsearch.yml -``` - -![1662370718890](./figures/elasticsearch配置2.png) - -![1662370575036](./figures/elasticsearch配置1.png) - -![1662370776219](./figures/elasticsearch3.png) - -Restart the elasticsearch service: - -``` -systemctl restart elasticsearch -``` - -#### 2.4 Deploy aops-manager - -Install aops-manager - -``` -yum install aops-manager -``` - -Modify the configuration file: - -``` -vim /etc/aops/manager.ini -``` - -Change the address of each service in the configuration file to its real address. Because all services are deployed on machine A, set the IP addresses to machine A's address. - -``` -[manager] -ip=192.168.1.1 // change this and the following service IPs to machine A's real IP -port=11111 -host_vault_dir=/opt/aops -host_vars=/opt/aops/host_vars - -[uwsgi] -wsgi-file=manage.py -daemonize=/var/log/aops/uwsgi/manager.log -http-timeout=600 -harakiri=600 - -[elasticsearch] -ip=192.168.1.1 // change this and the following service IPs to machine A's real IP -port=9200 -max_es_query_num=10000000 - -[mysql] -ip=192.168.1.1 // change this and the following service IPs to machine A's real IP -port=3306 -database_name=aops -engine_format=mysql+pymysql://@%s:%s/%s -pool_size=10000 -pool_recycle=7200 - -[aops_check] -ip=192.168.1.1 // change this and the following service IPs to machine A's real IP -port=11112 -``` - -Start the aops-manager service: - -``` -systemctl start aops-manager -``` - -#### 2.5 Deploy aops-web - -Install aops-web - -``` -yum install aops-web -``` - -Modify the configuration file. Because all services are deployed on machine A, configure the address of each service accessed by the web to machine A's real IP. - -``` -vim /etc/nginx/aops-nginx.conf -``` - -Screenshot of part of the service configuration: - -![1662378186528](./figures/配置web.png) - -Start the aops-web service: - -``` -systemctl start aops-web -``` - -#### 2.6 Deploy kafka - -##### 2.6.1 Deploy zookeeper - -Install: - -``` -yum install zookeeper -``` - -Start the service: - -``` -systemctl start zookeeper -``` - -##### 2.6.2 Deploy kafka - -Install: - -``` -yum install kafka -``` - -Modify the configuration file: - -``` -vim /opt/kafka/config/server.properties -``` - -Change listeners to the local IP - -![1662381371927](./figures/kafka配置.png) - -Start the kafka service: - -``` -cd /opt/kafka/bin -nohup ./kafka-server-start.sh ../config/server.properties & 
-tail -f ./nohup.out # check that the nohup output contains machine A's IP and the INFO message indicating that kafka started successfully -``` - -#### 2.7 Deploy aops-check - -Install aops-check: - -``` -yum install aops-check -``` - -Modify the configuration file: - -``` -vim /etc/aops/check.ini -``` - -Change the address of each service in the configuration file to its real address. Because all services are deployed on machine A, set the IP addresses to machine A's address. - -``` -[check] -ip=192.168.1.1 // change this and the following service IPs to machine A's real IP -port=11112 -mode=configurable // configurable mode, used for the scheduler in regular diagnosis mode -timing_check=on - -[default_mode] -period=30 -step=30 - -[elasticsearch] -ip=192.168.1.1 // change this and the following service IPs to machine A's real IP -port=9200 - -[mysql] -ip=192.168.1.1 // change this and the following service IPs to machine A's real IP -port=3306 -database_name=aops -engine_format=mysql+pymysql://@%s:%s/%s -pool_size=10000 -pool_recycle=7200 - -[prometheus] -ip=192.168.1.1 // change this and the following service IPs to machine A's real IP -port=9090 -query_range_step=15s - -[agent] -default_instance_port=8888 - -[manager] -ip=192.168.1.1 // change this and the following service IPs to machine A's real IP -port=11111 - -[consumer] -kafka_server_list=192.168.1.1:9092 // change this and the following service IPs to machine A's real IP -enable_auto_commit=False -auto_offset_reset=earliest -timeout_ms=5 -max_records=3 -task_name=CHECK_TASK -task_group_id=CHECK_TASK_GROUP_ID -result_name=CHECK_RESULT -[producer] -kafka_server_list = 192.168.1.1:9092 // change this and the following service IPs to machine A's real IP -api_version = 0.11.5 -acks = 1 -retries = 3 -retry_backoff_ms = 100 -task_name=CHECK_TASK -task_group_id=CHECK_TASK_GROUP_ID -``` - -Start the aops-check service (configurable mode): - -``` -systemctl start aops-check -``` - -#### 2.8 Deploy the client services - -Client machines need aops-agent and gala-gopher deployed; see the [aops-agent deployment guide](aops-agent部署指南.md) for details. - -Note: before a host registers, add its host group on the frontend to make sure the group exists. Here only machine A is deployed and managed. - -#### 2.9 Deploy prometheus - -Install prometheus: - -``` -yum install prometheus2 -``` - -Modify the configuration file: - -``` -vim /etc/prometheus/prometheus.yml -``` - -Add the gala-gopher addresses of all clients to the prometheus monitoring targets. - -![1662377261742](./figures/prometheus配置.png) - -Start the service: - -``` -systemctl start prometheus -``` - -#### 2.10 Deploy gala-ragdoll - -The A-Ops configuration tracing feature depends on gala-ragdoll, which uses Git to monitor configuration file changes. - -Install gala-ragdoll: - -```shell -yum install gala-ragdoll # A-Ops configuration tracing -``` - -Modify the configuration file: - -```shell -vim 
/etc/ragdoll/gala-ragdoll.conf -``` - -In the collect section, change the IP address in collect_address to machine A's address, and change collect_api and collect_port to the actual API path and port. - -``` -[git] -git_dir = "/home/confTraceTest" -user_name = "user_name" -user_email = "user_email" - -[collect] -collect_address = "http://192.168.1.1" //change to machine A's real IP -collect_api = "/manage/config/collect" //change to the actual API for configuration file collection -collect_port = 11111 //change to the service's actual port - -[sync] -sync_address = "http://0.0.0.0" -sync_api = "/demo/syncConf" -sync_port = 11114 - - -[ragdoll] -port = 11114 - -``` - -Start the gala-ragdoll service - -```shell -systemctl start gala-ragdoll -``` - -### Machine B: - -Machine B only needs aops-check deployed as the executor. - -#### 2.11 Deploy aops-check - -Install aops-check: - -``` -yum install aops-check -``` - -Modify the configuration file: - -``` -vim /etc/aops/check.ini -``` - -Change the address of each service in the configuration file to its real address. Except for the check service, which uses machine B's address, all other services are deployed on machine A, so configure their IP addresses as machine A's address. - -``` -[check] -ip=192.168.1.2 // change this IP to machine B's real IP -port=11112 -mode=executor // executor, used for the executor in regular diagnosis mode -timing_check=on - -[default_mode] -period=30 -step=30 - -[elasticsearch] -ip=192.168.1.1 // change this and the following service IPs to machine A's real IP -port=9200 - -[mysql] -ip=192.168.1.1 // change this and the following service IPs to machine A's real IP -port=3306 -database_name=aops -engine_format=mysql+pymysql://@%s:%s/%s -pool_size=10000 -pool_recycle=7200 - -[prometheus] -ip=192.168.1.1 // change this and the following service IPs to machine A's real IP -port=9090 -query_range_step=15s - -[agent] -default_instance_port=8888 - -[manager] -ip=192.168.1.1 // change this and the following service IPs to machine A's real IP -port=11111 - -[consumer] -kafka_server_list=192.168.1.1:9092 // change this and the following service IPs to machine A's real IP -enable_auto_commit=False -auto_offset_reset=earliest -timeout_ms=5 -max_records=3 -task_name=CHECK_TASK -task_group_id=CHECK_TASK_GROUP_ID -result_name=CHECK_RESULT -[producer] -kafka_server_list = 192.168.1.1:9092 // change this and the following service IPs to machine A's real IP -api_version = 0.11.5 -acks = 1 -retries = 3 -retry_backoff_ms = 100 -task_name=CHECK_TASK -task_group_id=CHECK_TASK_GROUP_ID -``` - -Start the aops-check service (executor mode): - -``` -systemctl start aops-check -``` - - - -At this point, the services on both machines are deployed. \ No newline at end of file diff --git 
"a/docs/zh/docs/A-Ops/aops-agent\351\203\250\347\275\262\346\214\207\345\215\227.md" "b/docs/zh/docs/A-Ops/aops-agent\351\203\250\347\275\262\346\214\207\345\215\227.md" deleted file mode 100644 index e217ee834650f9d13f79a444ea65e1468e832cc4..0000000000000000000000000000000000000000 --- "a/docs/zh/docs/A-Ops/aops-agent\351\203\250\347\275\262\346\214\207\345\215\227.md" +++ /dev/null @@ -1,665 +0,0 @@ - -# aops-agent Deployment Guide -### 1. Environment Requirements - -One openEuler machine; openEuler-20.03 or later is recommended. - -### 2. Environment Configuration and Deployment - -#### 1. Disable the firewall - -```shell -systemctl stop firewalld -systemctl disable firewalld -systemctl status firewalld -``` - -#### 2. Deploy aops-agent - -1. Install from the yum repository: yum install aops-agent - -2. Modify the configuration file: change the value of the ip field under the agent section to the local IP, - - vim /etc/aops/agent.conf. The following uses the IP address 192.168.1.47 as an example - - ```ini - [agent] - ;IP and port bound when aops-agent starts - ip=192.168.1.47 - port=12000 - - [gopher] - ;default path of the gala-gopher configuration file; if you change it, make sure the file path is correct - config_path=/opt/gala-gopher/gala-gopher.conf - - ;aops-agent log collection configuration - [log] - ;log level, one of DEBUG, INFO, WARNING, ERROR, CRITICAL - log_level=INFO - ;directory where logs are stored - log_dir=/var/log/aops - ;maximum size of a log file - max_bytes=31457280 - ;number of backup log files - backup_count=40 - ``` - -3. Start the service: systemctl start aops-agent -#### 3. Register with aops-manager - -To verify the caller's identity and keep the APIs from being invoked arbitrarily, aops-agent uses a token for authentication, which reduces the load on the deployed host. - -For security, the project obtains the token through active registration. Before registering, prepare the registration information on the agent side and run the register command to register with aops-manager. Because the agent has no database configured, the token is automatically saved to a specified file after successful registration and the result is displayed in the foreground. The host's information is also stored in the database on the aops-manager side for later management. - -1. Prepare the register.json file - - On the aops-agent side, save the registration information as a JSON file with the following structure: - -```JSON -{ - // frontend login username - "web_username":"admin", - // user password - "web_password": "changeme", - // host name - "host_name": "host1", - // name of the host group the host belongs to - "host_group_name": "group1", - // IP address of the host running aops-manager - "manager_ip":"192.168.1.23", - // whether to register as a management machine - "management":false, - // external port of aops-manager - "manager_port":"11111", - // agent port - "agent_port":"12000" -} -``` - -`Note: make sure aops-manager is running on the target host, for example 192.168.1.23, and that the host group to register exists.` - -2. Run: aops_agent register -f register.json. -3. 
The registration result is displayed in the foreground. On success, the token string is saved to the specified file; on failure, check the prompts and the log content (/var/log/aops/aops.log) for the specific cause. - -Examples of registration results: - -`Registration succeeded` - -```shell -[root@localhost ~]# aops_agent register -f register.json -Agent Register Success -``` - -`Registration failed; aops-manager not started is used as the example` - -```shell -[root@localhost ~]# aops_agent register -f register.json -Agent Register Fail -[root@localhost ~]# -``` - -`Corresponding log content` - -```shell -2022-09-05 16:11:52,576 ERROR command_manage/register/331: HTTPConnectionPool(host='192.168.1.23', port=11111): Max retries exceeded with url: /manage/host/add (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused')) -[root@localhost ~]# -``` - -### 3. Plugin Support - -#### 3.1 gala-gopher - -##### 3.1.1 Introduction - -gala-gopher is a low-overhead, eBPF-based probe framework for monitoring host states such as CPU, memory, and network, and for data collection. The collection status of the existing probes can be configured according to actual service needs. - -##### 3.1.2 Deployment - -1. Install from the yum repository: yum install gala-gopher -2. Based on actual service needs, choose the probes to enable; probe information is available in /opt/gala-gopher/gala-gopher.conf. -3. Start the service: systemctl start gala-gopher - -##### 3.1.3 Others - -For more information about gala-gopher, see https://gitee.com/openeuler/gala-gopher/blob/master/README.md - -### 4. API Support - -#### 4.1 External API List - -| No. | API | Type | Description | - ---- | ------------------------------ | ---- | ----------------------| -| 1 | /v1/agent/plugin/start | POST | Start a plugin | -| 2 | /v1/agent/plugin/stop | POST | Stop a plugin | -| 3 | /v1/agent/application/info | GET | Collect the running applications in the target application set | -| 4 | /v1/agent/host/info | GET | Get host information | -| 5 | /v1/agent/plugin/info | GET | Get plugin running information in the agent | -| 6 | /v1/agent/file/collect | POST | Collect configuration file content | -| 7 | /v1/agent/collect/items/change | POST | Change the running status of plugin collection items | - -##### 4.1.1. /v1/agent/plugin/start - -+ Description: starts an installed but not yet running plugin. Currently only the gala-gopher plugin is supported. - -+ HTTP method: POST - -+ Data submission method: query - -+ Request parameters: - - | Parameter | Mandatory | Type | Description | - | ----------- | ---- | ---- | ------ | - | plugin_name | True | str | plugin name | - -+ Request parameter example - - | Parameter | Value | - | ----------- | ----------- | - | plugin_name | gala-gopher | - -+ Response body parameters - - | Parameter | Type | Description | - | ------ | ---- | ---------------- | - | code | int | return code | - | msg | 
str | information corresponding to the status code | - -+ Response example - - ```json - { - "code": 200, - "msg": "xxxx" - } - ``` - - -##### 4.1.2. /v1/agent/plugin/stop - -+ Description: stops a running plugin. Currently only the gala-gopher plugin is supported. - -+ HTTP method: POST - -+ Data submission method: query - -+ Request parameters: - - | Parameter | Mandatory | Type | Description | - | ----------- | ---- | ---- | ------ | - | plugin_name | True | str | plugin name | - -+ Request parameter example: - - | Parameter | Value | - | ----------- | ----------- | - | plugin_name | gala-gopher | - -+ Response body parameters: - - | Parameter | Type | Description | - | ------ | ---- | ---------------- | - | code | int | return code | - | msg | str | information corresponding to the status code | - -+ Response example: - - ```json - { - "code": 200, - "msg": "xxxx" - } - ``` - - -##### 4.1.3. /v1/agent/application/info - -+ Description: collects the running applications in the target application set, which currently contains mysql, kubernetes, hadoop, nginx, docker, and gala-gopher. - -+ HTTP method: GET - -+ Data submission method: query - -+ Request parameters: - - | Parameter | Mandatory | Type | Description | - | ------ | ---- | ---- | ---- | - | | | | | - -+ Request parameter example: - - | Parameter | Value | - | ------ | ------ | - | | | - -+ Response body parameters: - - | Parameter | Type | Description | - | ------ | ---- | ---------------- | - | code | int | return code | - | msg | str | information corresponding to the status code | - | resp | dict | response data body | - - + resp - - | Parameter | Type | Description | - | ------- | --------- | -------------------------- | - | running | List[str] | list of the names of the running applications | - -+ Response example: - - ```json - { - "code": 200, - "msg": "xxxx", - "resp": { - "running": [ - "mysql", - "docker" - ] - } - } - ``` - - -##### 4.1.4. /v1/agent/host/info - -+ Description: gets information about the host where the agent is installed, including the OS version, BIOS version, kernel version, CPU information, and memory information. - -+ HTTP method: POST - -+ Data submission method: application/json - -+ Request parameters: - - | Parameter | Mandatory | Type | Description | - | --------- | ---- | --------- | ------------------------------------------------ | - | info_type | True | List[str] | names of the information to collect; currently only cpu, disk, memory, and os are supported | - -+ Request parameter example: - - ```json - ["os", "cpu","memory", "disk"] - ``` - -+ Response body parameters: - - | Parameter | Type | Description | - | ------ | ---- | ---------------- | - | code | int | return code | - | msg | str | information corresponding to the status code | - | resp | dict | response data body | - - + resp - - | Parameter | Type | Description | - | ------ | ---------- | -------- | - | cpu | dict | CPU information | - | memory | dict | memory information | - | os | 
dict | OS information | - | disk | List[dict] | disk information | - - + cpu - - | Parameter | Type | Description | - | ------------ | ---- | --------------- | - | architecture | str | CPU architecture | - | core_count | int | number of cores | - | l1d_cache | str | L1 data cache size | - | l1i_cache | str | L1 instruction cache size | - | l2_cache | str | L2 cache size | - | l3_cache | str | L3 cache size | - | model_name | str | model name | - | vendor_id | str | vendor ID | - - + memory - - | Parameter | Type | Description | - | ------ | ---------- | -------------- | - | size | str | total memory size | - | total | int | number of memory modules | - | info | List[dict] | information about all memory modules | - - + info - - | Parameter | Type | Description | - | ------------ | ---- | -------- | - | size | str | memory size | - | type | str | type | - | speed | str | speed | - | manufacturer | str | manufacturer | - - + os - - | Parameter | Type | Description | - | ------------ | ---- | -------- | - | bios_version | str | BIOS version | - | os_version | str | OS name | - | kernel | str | kernel version | - -+ Response example: - - ```json - { - "code": 200, - "msg": "operate success", - "resp": { - "cpu": { - "architecture": "aarch64", - "core_count": "128", - "l1d_cache": "8 MiB (128 instances)", - "l1i_cache": "8 MiB (128 instances)", - "l2_cache": "64 MiB (128 instances)", - "l3_cache": "128 MiB (4 instances)", - "model_name": "Kunpeng-920", - "vendor_id": "HiSilicon" - }, - "memory": { - "info": [ - { - "manufacturer": "Hynix", - "size": "16 GB", - "speed": "2933 MT/s", - "type": "DDR4" - }, - { - "manufacturer": "Hynix", - "size": "16 GB", - "speed": "2933 MT/s", - "type": "DDR4" - } - ], - "size": "32G", - "total": 2 - }, - "os": { - "bios_version": "1.82", - "kernel": "5.10.0-60.18.0.50", - "os_version": "openEuler 22.03 LTS" - }, - "disk": [ - { - "capacity": "xxGB", - "model": "xxxxxx" - } - ] - } - } - ``` - - -##### 4.1.5. /v1/agent/plugin/info - -+ Description: gets the plugin running status of the host. Currently only the gala-gopher plugin is supported. - -+ HTTP method: GET - -+ Data submission method: query - -+ Request parameters: - - | Parameter | Mandatory | Type | Description | - | ------ | ---- | ---- | ---- | - | | | | | - -+ Request parameter example: - - | Parameter | Value | - | ------ | ------ | - | | | - -+ Response body parameters: - - | Parameter | Type | Description | - | ------ | ---------- | 
---------------- | - | code | int | return code | - | msg | str | information corresponding to the status code | - | resp | List[dict] | response data body | - - + resp - - | Parameter | Type | Description | - | ------------- | ---------- | ------------------ | - | plugin_name | str | plugin name | - | collect_items | list | running status of the plugin's collection items | - | is_installed | str | whether the plugin is installed | - | resource | List[dict] | plugin resource usage | - | status | str | plugin running status | - - + resource - - | Parameter | Type | Description | - | ------------- | ---- | ---------- | - | name | str | resource name | - | current_value | str | current resource usage | - | limit_value | str | resource limit | - -+ Response example: - - ``` - { - "code": 200, - "msg": "operate success", - "resp": [ - { - "collect_items": [ - { - "probe_name": "system_tcp", - "probe_status": "off", - "support_auto": false - }, - { - "probe_name": "haproxy", - "probe_status": "auto", - "support_auto": true - }, - { - "probe_name": "nginx", - "probe_status": "auto", - "support_auto": true - }, - ], - "is_installed": true, - "plugin_name": "gala-gopher", - "resource": [ - { - "current_value": "0.0%", - "limit_value": null, - "name": "cpu" - }, - { - "current_value": "13 MB", - "limit_value": null, - "name": "memory" - } - ], - "status": "active" - } - ] - } - ``` - - -##### 4.1.6. /v1/agent/file/collect - -+ Description: collects information such as the content, permissions, and owner of target configuration files. Currently only text files smaller than 1 MB, without execute permission, and UTF-8 encoded can be read - -+ HTTP method: POST - -+ Data submission method: application/json - -+ Request parameters: - - | Parameter | Mandatory | Type | Description | - | --------------- | ---- | --------- | ------------------------ | - | configfile_path | True | List[str] | list of the full paths of the files to collect | - -+ Request parameter example: - - ```json - [ "/home/test.conf", "/home/test.ini", "/home/test.json"] - ``` - -+ Response body parameters: - - | Parameter | Type | Description | - | ------------- | ---------- | ---------------- | - | infos | List[dict] | file collection information | - | success_files | List[str] | list of successfully collected files | - | fail_files | List[str] | list of files that failed to be collected | - - + infos - - | Parameter | Type | Description | - | --------- | ---- | -------- | - | path | str | file path | - | content | str | file content | - | file_attr | dict | file attributes | - - + file_attr - - | Parameter | Type | Description | - | ------ | ---- | ------------ | - | 
mode | str | file type and permissions | - | owner | str | file owner | - | group | str | file group | - -+ Response example: - - ```json - { - "infos": [ - { - "content": "this is a test file", - "file_attr": { - "group": "root", - "mode": "0644", - "owner": "root" - }, - "path": "/home/test.txt" - } - ], - "success_files": [ - "/home/test.txt" - ], - "fail_files": [ - "/home/test.txt" - ] - } - ``` - - -##### 4.1.7. /v1/agent/collect/items/change - -+ Description: changes the collection status of plugin collection items. Currently only gala-gopher collection items can be changed; they can be viewed in the configuration file `/opt/gala-gopher/gala-gopher.conf` - -+ HTTP method: POST - -+ Data submission method: application/json - -+ Request parameters: - - | Parameter | Mandatory | Type | Description | - | ----------- | ---- | ---- | -------------------------- | - | plugin_name | True | dict | expected modification data for the plugin's collection items | - - + plugin_name - - | Parameter | Mandatory | Type | Description | - | ------------ | ---- | ------ | ------------------ | - | collect_item | True | string | expected modification result of the collection item | - -+ Request parameter example: - - ```json - { - "gala-gopher":{ - "redis":"auto", - "system_inode":"on", - "tcp":"on", - "haproxy":"auto" - } - } - ``` - -+ Response body parameters: - - | Parameter | Type | Description | - | ------ | ---------- | ---------------- | - | code | int | return code | - | msg | str | information corresponding to the status code | - | resp | List[dict] | response data body | - - + resp - - | Parameter | Type | Description | - | ----------- | ---- | ------------------ | - | plugin_name | dict | modification results of the corresponding collection items | - - + plugin_name - - | Parameter | Type | Description | - | ------- | --------- | ---------------- | - | success | List[str] | collection items modified successfully | - | failure | List[str] | collection items that failed to be modified | - -+ Response example: - - ```json - { - "code": 200, - "msg": "operate success", - "resp": { - "gala-gopher": { - "failure": [ - "redis" - ], - "success": [ - "system_inode", - "tcp", - "haproxy" - ] - } - } - } - ``` - - - - ### FAQ: - -1. If an error occurs, check the log /var/log/aops/aops.log, resolve the problem according to the error messages in the log, and restart the service. - -2. The project is recommended to run on Python 3.7 or later; pay attention to the versions of the Python dependency libraries when installing them. - -3. After registration is complete, the access_token value can be obtained from the `/etc/aops/agent.conf` file. - -4. 
Resource limits on plugin CPU and memory are currently implemented by adding the MemoryHigh and CPUQuota settings under the Service section of the plugin's service file. - - For example, to limit gala-gopher to 40 MB of memory and 20% CPU: - - ```ini - [Unit] - Description=a-ops gala gopher service - After=network.target - - [Service] - Type=exec - ExecStart=/usr/bin/gala-gopher - Restart=on-failure - RestartSec=1 - RemainAfterExit=yes - ;limits, as far as possible, how much memory the processes in this unit may use; the limit may be exceeded, but processes then run throttled and the system reclaims the excess memory as much as possible - ;the value can be an absolute memory size in bytes (with base-1024 suffixes K, M, G, T) or a relative size expressed as a percentage - MemoryHigh=40M - ;sets a CPU time quota for the processes of this unit; must be a percentage ending with "%", meaning the unit may use at most that percentage of one CPU's total time - CPUQuota=20% - - [Install] - WantedBy=multi-user.target - ``` - - - - - - diff --git a/docs/zh/docs/A-Ops/figures/029B66B9-5A3E-447E-B33C-98B894FC4833.png b/docs/zh/docs/A-Ops/figures/029B66B9-5A3E-447E-B33C-98B894FC4833.png deleted file mode 100644 index 230489c21dba54311356bbf2df56e817c0975f91..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Ops/figures/029B66B9-5A3E-447E-B33C-98B894FC4833.png and /dev/null differ diff --git a/docs/zh/docs/A-Ops/figures/0BFA7C40-D404-4772-9C47-76EAD7D24E69.png b/docs/zh/docs/A-Ops/figures/0BFA7C40-D404-4772-9C47-76EAD7D24E69.png deleted file mode 100644 index 528bf4e30dc6221c496dd9a6d637359f592856db..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Ops/figures/0BFA7C40-D404-4772-9C47-76EAD7D24E69.png and /dev/null differ diff --git a/docs/zh/docs/A-Ops/figures/1631073636579.png b/docs/zh/docs/A-Ops/figures/1631073636579.png deleted file mode 100644 index 5aacc487264ac63fbe5322b4f89fca3ebf9c7cd9..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Ops/figures/1631073636579.png and /dev/null differ diff --git a/docs/zh/docs/A-Ops/figures/1631073840656.png b/docs/zh/docs/A-Ops/figures/1631073840656.png deleted file mode 100644 index 122e391eafe7c0d8d081030a240df90aea260150..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Ops/figures/1631073840656.png and /dev/null differ diff --git 
a/docs/zh/docs/A-Ops/figures/1631101736624.png b/docs/zh/docs/A-Ops/figures/1631101736624.png deleted file mode 100644 index 74e2f2ded2ea254c66b221e8ac27a0d8bed9362a..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Ops/figures/1631101736624.png and /dev/null differ diff --git a/docs/zh/docs/A-Ops/figures/1631101865366.png b/docs/zh/docs/A-Ops/figures/1631101865366.png deleted file mode 100644 index abfbc280a368b93af1e1165385af3a9cac89391d..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Ops/figures/1631101865366.png and /dev/null differ diff --git a/docs/zh/docs/A-Ops/figures/1631101982829.png b/docs/zh/docs/A-Ops/figures/1631101982829.png deleted file mode 100644 index 0b1c9c7c3676b804dbdf19afbe4f3ec9dbe0627f..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Ops/figures/1631101982829.png and /dev/null differ diff --git a/docs/zh/docs/A-Ops/figures/1631102019026.png b/docs/zh/docs/A-Ops/figures/1631102019026.png deleted file mode 100644 index 54e8e7d1cffbb28711074e511b08c73f66c1fb75..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Ops/figures/1631102019026.png and /dev/null differ diff --git a/docs/zh/docs/A-Ops/figures/20210908212726.png b/docs/zh/docs/A-Ops/figures/20210908212726.png deleted file mode 100644 index f7d399aecd46605c09fe2d1f50a1a8670cd80432..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Ops/figures/20210908212726.png and /dev/null differ diff --git a/docs/zh/docs/A-Ops/figures/D466AC8C-2FAF-4797-9A48-F6C346A1EC77.png b/docs/zh/docs/A-Ops/figures/D466AC8C-2FAF-4797-9A48-F6C346A1EC77.png deleted file mode 100644 index d87c5e04fa8cf4f2af0884226be66ddfb5f481e1..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Ops/figures/D466AC8C-2FAF-4797-9A48-F6C346A1EC77.png and /dev/null differ diff --git "a/docs/zh/docs/A-Ops/figures/a-ops\350\275\257\344\273\266\346\236\266\346\236\204.png" 
"b/docs/zh/docs/A-Ops/figures/a-ops\350\275\257\344\273\266\346\236\266\346\236\204.png" deleted file mode 100644 index 047c6f1bfe3e38c66d34285563d910f6f3bd07e1..0000000000000000000000000000000000000000 Binary files "a/docs/zh/docs/A-Ops/figures/a-ops\350\275\257\344\273\266\346\236\266\346\236\204.png" and /dev/null differ diff --git "a/docs/zh/docs/A-Ops/figures/app\350\257\246\346\203\205.jpg" "b/docs/zh/docs/A-Ops/figures/app\350\257\246\346\203\205.jpg" deleted file mode 100644 index bd179be46c9e711d7148ee44dc56f4a7a02f56bf..0000000000000000000000000000000000000000 Binary files "a/docs/zh/docs/A-Ops/figures/app\350\257\246\346\203\205.jpg" and /dev/null differ diff --git a/docs/zh/docs/A-Ops/figures/chakanyuqi.png b/docs/zh/docs/A-Ops/figures/chakanyuqi.png deleted file mode 100644 index bbead6a91468d5dee570cfdc66faf9a4ab155d7c..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Ops/figures/chakanyuqi.png and /dev/null differ diff --git a/docs/zh/docs/A-Ops/figures/chaxunshijipeizhi.png b/docs/zh/docs/A-Ops/figures/chaxunshijipeizhi.png deleted file mode 100644 index d5f6e450fc0e1e246492ca71a6fcd8db572eb469..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Ops/figures/chaxunshijipeizhi.png and /dev/null differ diff --git a/docs/zh/docs/A-Ops/figures/check.PNG b/docs/zh/docs/A-Ops/figures/check.PNG deleted file mode 100644 index 2dce821dd43eec6f0d13cd6b2dc1e30653f35489..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Ops/figures/check.PNG and /dev/null differ diff --git a/docs/zh/docs/A-Ops/figures/chuangjianyewuyu.png b/docs/zh/docs/A-Ops/figures/chuangjianyewuyu.png deleted file mode 100644 index 4f5b8de2d2c4ddb9bfdfba1ac17258a834561e2d..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Ops/figures/chuangjianyewuyu.png and /dev/null differ diff --git a/docs/zh/docs/A-Ops/figures/dashboard.PNG b/docs/zh/docs/A-Ops/figures/dashboard.PNG deleted file mode 100644 index 
2a4a827191367309aad28a8a6c1835df602bdf72..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Ops/figures/dashboard.PNG and /dev/null differ diff --git a/docs/zh/docs/A-Ops/figures/deploy.PNG b/docs/zh/docs/A-Ops/figures/deploy.PNG deleted file mode 100644 index e30dcb0eb05eb4f41202c736863f3e0ff216398d..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Ops/figures/deploy.PNG and /dev/null differ diff --git a/docs/zh/docs/A-Ops/figures/diag.PNG b/docs/zh/docs/A-Ops/figures/diag.PNG deleted file mode 100644 index a67e8515b8313a50b06cb985611ef9c166851811..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Ops/figures/diag.PNG and /dev/null differ diff --git a/docs/zh/docs/A-Ops/figures/domain.PNG b/docs/zh/docs/A-Ops/figures/domain.PNG deleted file mode 100644 index bad499f96df5934565d36edf2308cec5e4147719..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Ops/figures/domain.PNG and /dev/null differ diff --git a/docs/zh/docs/A-Ops/figures/domain_config.PNG b/docs/zh/docs/A-Ops/figures/domain_config.PNG deleted file mode 100644 index 8995424b35cda75f08881037446b7816a0ca09dc..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Ops/figures/domain_config.PNG and /dev/null differ diff --git a/docs/zh/docs/A-Ops/figures/elasticsearch3.png b/docs/zh/docs/A-Ops/figures/elasticsearch3.png deleted file mode 100644 index 893aae242aa9117c64f323374d4728d230894973..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Ops/figures/elasticsearch3.png and /dev/null differ diff --git "a/docs/zh/docs/A-Ops/figures/elasticsearch\351\205\215\347\275\2561.png" "b/docs/zh/docs/A-Ops/figures/elasticsearch\351\205\215\347\275\2561.png" deleted file mode 100644 index 1b7e0eab093b2f0455b8f3972884e5f757fbec3d..0000000000000000000000000000000000000000 Binary files "a/docs/zh/docs/A-Ops/figures/elasticsearch\351\205\215\347\275\2561.png" and /dev/null differ diff --git 
"a/docs/zh/docs/A-Ops/figures/elasticsearch\351\205\215\347\275\2562.png" "b/docs/zh/docs/A-Ops/figures/elasticsearch\351\205\215\347\275\2562.png" deleted file mode 100644 index 620dbbda71259e3b6ee6a2efb646a9692adf2456..0000000000000000000000000000000000000000 Binary files "a/docs/zh/docs/A-Ops/figures/elasticsearch\351\205\215\347\275\2562.png" and /dev/null differ diff --git "a/docs/zh/docs/A-Ops/figures/gala-gopher\346\210\220\345\212\237\345\220\257\345\212\250\347\212\266\346\200\201.png" "b/docs/zh/docs/A-Ops/figures/gala-gopher\346\210\220\345\212\237\345\220\257\345\212\250\347\212\266\346\200\201.png" deleted file mode 100644 index ab16e9d3661db3fd4adc6c605b2d2d08e79fdc1c..0000000000000000000000000000000000000000 Binary files "a/docs/zh/docs/A-Ops/figures/gala-gopher\346\210\220\345\212\237\345\220\257\345\212\250\347\212\266\346\200\201.png" and /dev/null differ diff --git "a/docs/zh/docs/A-Ops/figures/gala-spider\350\275\257\344\273\266\346\236\266\346\236\204\345\233\276.png" "b/docs/zh/docs/A-Ops/figures/gala-spider\350\275\257\344\273\266\346\236\266\346\236\204\345\233\276.png" deleted file mode 100644 index c5a0768be63a98ef7ccc4a56996a8c715f7090af..0000000000000000000000000000000000000000 Binary files "a/docs/zh/docs/A-Ops/figures/gala-spider\350\275\257\344\273\266\346\236\266\346\236\204\345\233\276.png" and /dev/null differ diff --git "a/docs/zh/docs/A-Ops/figures/gopher\350\275\257\344\273\266\346\236\266\346\236\204\345\233\276.png" "b/docs/zh/docs/A-Ops/figures/gopher\350\275\257\344\273\266\346\236\266\346\236\204\345\233\276.png" deleted file mode 100644 index f151965a21d11dd7a3e215cc4ef23d70d059f4b1..0000000000000000000000000000000000000000 Binary files "a/docs/zh/docs/A-Ops/figures/gopher\350\275\257\344\273\266\346\236\266\346\236\204\345\233\276.png" and /dev/null differ diff --git a/docs/zh/docs/A-Ops/figures/group.PNG b/docs/zh/docs/A-Ops/figures/group.PNG deleted file mode 100644 index 
584fd1f7195694a3419482cace2a71fa1cd9a3ec..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Ops/figures/group.PNG and /dev/null differ diff --git a/docs/zh/docs/A-Ops/figures/host.PNG b/docs/zh/docs/A-Ops/figures/host.PNG deleted file mode 100644 index 3c00681a567cf8f1e1baddfb6fdb7b6cf7df43de..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Ops/figures/host.PNG and /dev/null differ diff --git a/docs/zh/docs/A-Ops/figures/jiemi.png b/docs/zh/docs/A-Ops/figures/jiemi.png deleted file mode 100644 index da07cfdf9296e201a82cceb210e651261fe7ecee..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Ops/figures/jiemi.png and /dev/null differ diff --git "a/docs/zh/docs/A-Ops/figures/kafka\351\205\215\347\275\256.png" "b/docs/zh/docs/A-Ops/figures/kafka\351\205\215\347\275\256.png" deleted file mode 100644 index 57eb17ccbd2fa63d97f700c29847fac7f08042ff..0000000000000000000000000000000000000000 Binary files "a/docs/zh/docs/A-Ops/figures/kafka\351\205\215\347\275\256.png" and /dev/null differ diff --git "a/docs/zh/docs/A-Ops/figures/prometheus\351\205\215\347\275\256.png" "b/docs/zh/docs/A-Ops/figures/prometheus\351\205\215\347\275\256.png" deleted file mode 100644 index 7c8d0328967e8eb9bc4aa7465a273b9ef5a30b58..0000000000000000000000000000000000000000 Binary files "a/docs/zh/docs/A-Ops/figures/prometheus\351\205\215\347\275\256.png" and /dev/null differ diff --git a/docs/zh/docs/A-Ops/figures/shanchupeizhi.png b/docs/zh/docs/A-Ops/figures/shanchupeizhi.png deleted file mode 100644 index cfea2eb44f7b8aa809404b8b49b4bd2e24172568..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Ops/figures/shanchupeizhi.png and /dev/null differ diff --git a/docs/zh/docs/A-Ops/figures/shanchuzhuji.png b/docs/zh/docs/A-Ops/figures/shanchuzhuji.png deleted file mode 100644 index b3da935739369dad1318fe135146755ede13c694..0000000000000000000000000000000000000000 Binary files 
a/docs/zh/docs/A-Ops/figures/shanchuzhuji.png and /dev/null differ diff --git a/docs/zh/docs/A-Ops/figures/shanchuzhujizu.png b/docs/zh/docs/A-Ops/figures/shanchuzhujizu.png deleted file mode 100644 index e4d85f6e3f1a269a483943f5115f54daa3de51de..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Ops/figures/shanchuzhujizu.png and /dev/null differ diff --git a/docs/zh/docs/A-Ops/figures/spider.PNG b/docs/zh/docs/A-Ops/figures/spider.PNG deleted file mode 100644 index 53bad6dd38e36db9cadfdbeda21cbc3ef59eddf7..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Ops/figures/spider.PNG and /dev/null differ diff --git a/docs/zh/docs/A-Ops/figures/spider_detail.jpg b/docs/zh/docs/A-Ops/figures/spider_detail.jpg deleted file mode 100644 index 6d4d2e2b9e79c53dbd359faa03e1c90f07c0ade6..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Ops/figures/spider_detail.jpg and /dev/null differ diff --git "a/docs/zh/docs/A-Ops/figures/spider\346\213\223\346\211\221\345\205\263\347\263\273\345\233\276.png" "b/docs/zh/docs/A-Ops/figures/spider\346\213\223\346\211\221\345\205\263\347\263\273\345\233\276.png" deleted file mode 100644 index 5823a116f384801e1197350f151b4d04ef519ac4..0000000000000000000000000000000000000000 Binary files "a/docs/zh/docs/A-Ops/figures/spider\346\213\223\346\211\221\345\205\263\347\263\273\345\233\276.png" and /dev/null differ diff --git a/docs/zh/docs/A-Ops/figures/tianjianode.png b/docs/zh/docs/A-Ops/figures/tianjianode.png deleted file mode 100644 index d68f5e12a62548f2ec59374bda9ab07f43b8b5cb..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Ops/figures/tianjianode.png and /dev/null differ diff --git a/docs/zh/docs/A-Ops/figures/tianjiazhujizu.png b/docs/zh/docs/A-Ops/figures/tianjiazhujizu.png deleted file mode 100644 index ed4ab3616d418ecf33a006fee3985b8b6d2d965d..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Ops/figures/tianjiazhujizu.png and 
/dev/null differ diff --git a/docs/zh/docs/A-Ops/figures/xinzengpeizhi.png b/docs/zh/docs/A-Ops/figures/xinzengpeizhi.png deleted file mode 100644 index 18d71c2e099c19b5d28848eec6a8d11f29ccee27..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Ops/figures/xinzengpeizhi.png and /dev/null differ diff --git a/docs/zh/docs/A-Ops/figures/zhuangtaichaxun.png b/docs/zh/docs/A-Ops/figures/zhuangtaichaxun.png deleted file mode 100644 index a3d0b3294bf6e0eeec50a2c2f8c5059bdc256376..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Ops/figures/zhuangtaichaxun.png and /dev/null differ diff --git a/docs/zh/docs/A-Ops/figures/zhuji.png b/docs/zh/docs/A-Ops/figures/zhuji.png deleted file mode 100644 index f4c7b9103baab7748c83392f6120c8f00880860f..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Ops/figures/zhuji.png and /dev/null differ diff --git a/docs/zh/docs/A-Ops/figures/zuneizhuji.png b/docs/zh/docs/A-Ops/figures/zuneizhuji.png deleted file mode 100644 index 9f188d207162fa1418a61a10f83ef9c51a512e65..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Ops/figures/zuneizhuji.png and /dev/null differ diff --git "a/docs/zh/docs/A-Ops/figures/\344\270\273\346\234\272\347\256\241\347\220\206.jpg" "b/docs/zh/docs/A-Ops/figures/\344\270\273\346\234\272\347\256\241\347\220\206.jpg" deleted file mode 100644 index 9f6d8858468c0cc72c1bd395403f064cc63f82bd..0000000000000000000000000000000000000000 Binary files "a/docs/zh/docs/A-Ops/figures/\344\270\273\346\234\272\347\256\241\347\220\206.jpg" and /dev/null differ diff --git "a/docs/zh/docs/A-Ops/figures/\344\270\273\346\234\272\347\273\204.jpg" "b/docs/zh/docs/A-Ops/figures/\344\270\273\346\234\272\347\273\204.jpg" deleted file mode 100644 index fb5472de6b3d30abf6af73e286f70ac8e1d58c15..0000000000000000000000000000000000000000 Binary files "a/docs/zh/docs/A-Ops/figures/\344\270\273\346\234\272\347\273\204.jpg" and /dev/null differ diff --git 
"a/docs/zh/docs/A-Ops/figures/\344\270\273\346\234\272\350\257\246\346\203\205.jpg" "b/docs/zh/docs/A-Ops/figures/\344\270\273\346\234\272\350\257\246\346\203\205.jpg" deleted file mode 100644 index effd8c29aba14c2e8f301f9f60d8f25ce8c533f0..0000000000000000000000000000000000000000 Binary files "a/docs/zh/docs/A-Ops/figures/\344\270\273\346\234\272\350\257\246\346\203\205.jpg" and /dev/null differ diff --git "a/docs/zh/docs/A-Ops/figures/\344\277\256\346\224\271mysql\351\205\215\347\275\256\346\226\207\344\273\266.png" "b/docs/zh/docs/A-Ops/figures/\344\277\256\346\224\271mysql\351\205\215\347\275\256\346\226\207\344\273\266.png" deleted file mode 100644 index d83425ee0622be329782620318818662b292e88b..0000000000000000000000000000000000000000 Binary files "a/docs/zh/docs/A-Ops/figures/\344\277\256\346\224\271mysql\351\205\215\347\275\256\346\226\207\344\273\266.png" and /dev/null differ diff --git "a/docs/zh/docs/A-Ops/figures/\344\277\256\346\224\271\346\217\222\344\273\266.png" "b/docs/zh/docs/A-Ops/figures/\344\277\256\346\224\271\346\217\222\344\273\266.png" deleted file mode 100644 index ba4a8d4d9aadb7f712bdcb4b193f05f956d38841..0000000000000000000000000000000000000000 Binary files "a/docs/zh/docs/A-Ops/figures/\344\277\256\346\224\271\346\217\222\344\273\266.png" and /dev/null differ diff --git "a/docs/zh/docs/A-Ops/figures/\344\277\256\346\224\271\346\250\241\345\236\213.png" "b/docs/zh/docs/A-Ops/figures/\344\277\256\346\224\271\346\250\241\345\236\213.png" deleted file mode 100644 index 23ff4e5fddb87ac157b1002a70c47d9b4c76b873..0000000000000000000000000000000000000000 Binary files "a/docs/zh/docs/A-Ops/figures/\344\277\256\346\224\271\346\250\241\345\236\213.png" and /dev/null differ diff --git "a/docs/zh/docs/A-Ops/figures/\345\210\233\345\273\272\345\267\245\344\275\234\346\265\201.jpg" "b/docs/zh/docs/A-Ops/figures/\345\210\233\345\273\272\345\267\245\344\275\234\346\265\201.jpg" deleted file mode 100644 index 
1a2b45e860914a1ac0cfb6908b02fb5cad4cbd60..0000000000000000000000000000000000000000 Binary files "a/docs/zh/docs/A-Ops/figures/\345\210\233\345\273\272\345\267\245\344\275\234\346\265\201.jpg" and /dev/null differ diff --git "a/docs/zh/docs/A-Ops/figures/\345\221\212\350\255\246.jpg" "b/docs/zh/docs/A-Ops/figures/\345\221\212\350\255\246.jpg" deleted file mode 100644 index 89ac88e154275d4be8179d773e7093f2357f425f..0000000000000000000000000000000000000000 Binary files "a/docs/zh/docs/A-Ops/figures/\345\221\212\350\255\246.jpg" and /dev/null differ diff --git "a/docs/zh/docs/A-Ops/figures/\345\221\212\350\255\246\347\241\256\350\256\244.jpg" "b/docs/zh/docs/A-Ops/figures/\345\221\212\350\255\246\347\241\256\350\256\244.jpg" deleted file mode 100644 index 57844f772853c541f7a1328b007a9b6ae4d5caf0..0000000000000000000000000000000000000000 Binary files "a/docs/zh/docs/A-Ops/figures/\345\221\212\350\255\246\347\241\256\350\256\244.jpg" and /dev/null differ diff --git "a/docs/zh/docs/A-Ops/figures/\345\221\212\350\255\246\350\257\246\346\203\205.jpg" "b/docs/zh/docs/A-Ops/figures/\345\221\212\350\255\246\350\257\246\346\203\205.jpg" deleted file mode 100644 index 5b4830b47897a0d51be28238a879a70b1de9ca3b..0000000000000000000000000000000000000000 Binary files "a/docs/zh/docs/A-Ops/figures/\345\221\212\350\255\246\350\257\246\346\203\205.jpg" and /dev/null differ diff --git "a/docs/zh/docs/A-Ops/figures/\345\267\245\344\275\234\345\217\260.jpg" "b/docs/zh/docs/A-Ops/figures/\345\267\245\344\275\234\345\217\260.jpg" deleted file mode 100644 index 998b81e3b88d888d0915dcff48dc8cc5df30d91c..0000000000000000000000000000000000000000 Binary files "a/docs/zh/docs/A-Ops/figures/\345\267\245\344\275\234\345\217\260.jpg" and /dev/null differ diff --git "a/docs/zh/docs/A-Ops/figures/\345\267\245\344\275\234\346\265\201.jpg" "b/docs/zh/docs/A-Ops/figures/\345\267\245\344\275\234\346\265\201.jpg" deleted file mode 100644 index 
17fb5b13034e1fc5276c68583fed1952415b0b5f..0000000000000000000000000000000000000000 Binary files "a/docs/zh/docs/A-Ops/figures/\345\267\245\344\275\234\346\265\201.jpg" and /dev/null differ diff --git "a/docs/zh/docs/A-Ops/figures/\345\267\245\344\275\234\346\265\201\350\257\246\346\203\205.jpg" "b/docs/zh/docs/A-Ops/figures/\345\267\245\344\275\234\346\265\201\350\257\246\346\203\205.jpg" deleted file mode 100644 index 458e023847bb2ad1f198f5a2dd1691748038137e..0000000000000000000000000000000000000000 Binary files "a/docs/zh/docs/A-Ops/figures/\345\267\245\344\275\234\346\265\201\350\257\246\346\203\205.jpg" and /dev/null differ diff --git "a/docs/zh/docs/A-Ops/figures/\345\272\224\347\224\250.png" "b/docs/zh/docs/A-Ops/figures/\345\272\224\347\224\250.png" deleted file mode 100644 index aa34bb909ee7c86a95126c13fa532ce93410a931..0000000000000000000000000000000000000000 Binary files "a/docs/zh/docs/A-Ops/figures/\345\272\224\347\224\250.png" and /dev/null differ diff --git "a/docs/zh/docs/A-Ops/figures/\346\211\247\350\241\214\350\257\212\346\226\255.png" "b/docs/zh/docs/A-Ops/figures/\346\211\247\350\241\214\350\257\212\346\226\255.png" deleted file mode 100644 index afb5f7e9fbfb1d1ce46d096a61729766b4940cd3..0000000000000000000000000000000000000000 Binary files "a/docs/zh/docs/A-Ops/figures/\346\211\247\350\241\214\350\257\212\346\226\255.png" and /dev/null differ diff --git "a/docs/zh/docs/A-Ops/figures/\346\212\245\345\221\212\345\206\205\345\256\271.png" "b/docs/zh/docs/A-Ops/figures/\346\212\245\345\221\212\345\206\205\345\256\271.png" deleted file mode 100644 index 2029141179302ecef45d34cb0c9dc916b9142e7b..0000000000000000000000000000000000000000 Binary files "a/docs/zh/docs/A-Ops/figures/\346\212\245\345\221\212\345\206\205\345\256\271.png" and /dev/null differ diff --git "a/docs/zh/docs/A-Ops/figures/\346\217\222\344\273\266\347\256\241\347\220\206.jpg" "b/docs/zh/docs/A-Ops/figures/\346\217\222\344\273\266\347\256\241\347\220\206.jpg" deleted file mode 
100644 index 2258d03976902052aaf39d36b6374fa680b9f8aa..0000000000000000000000000000000000000000 Binary files "a/docs/zh/docs/A-Ops/figures/\346\217\222\344\273\266\347\256\241\347\220\206.jpg" and /dev/null differ diff --git "a/docs/zh/docs/A-Ops/figures/\346\226\260\345\242\236\346\225\205\351\232\234\346\240\221.png" "b/docs/zh/docs/A-Ops/figures/\346\226\260\345\242\236\346\225\205\351\232\234\346\240\221.png" deleted file mode 100644 index 664efd5150fcb96f009ce0eddc3d9ac91b9e622f..0000000000000000000000000000000000000000 Binary files "a/docs/zh/docs/A-Ops/figures/\346\226\260\345\242\236\346\225\205\351\232\234\346\240\221.png" and /dev/null differ diff --git "a/docs/zh/docs/A-Ops/figures/\346\237\245\347\234\213\346\212\245\345\221\212\345\210\227\350\241\250.png" "b/docs/zh/docs/A-Ops/figures/\346\237\245\347\234\213\346\212\245\345\221\212\345\210\227\350\241\250.png" deleted file mode 100644 index 58307ec6ef4c73b6b0f039b1052e5870629ac2e8..0000000000000000000000000000000000000000 Binary files "a/docs/zh/docs/A-Ops/figures/\346\237\245\347\234\213\346\212\245\345\221\212\345\210\227\350\241\250.png" and /dev/null differ diff --git "a/docs/zh/docs/A-Ops/figures/\346\237\245\347\234\213\346\225\205\351\232\234\346\240\221.png" "b/docs/zh/docs/A-Ops/figures/\346\237\245\347\234\213\346\225\205\351\232\234\346\240\221.png" deleted file mode 100644 index a566417b18e8bcf19153730904893fc8d827d885..0000000000000000000000000000000000000000 Binary files "a/docs/zh/docs/A-Ops/figures/\346\237\245\347\234\213\346\225\205\351\232\234\346\240\221.png" and /dev/null differ diff --git "a/docs/zh/docs/A-Ops/figures/\346\267\273\345\212\240\344\270\273\346\234\272\347\273\204.jpg" "b/docs/zh/docs/A-Ops/figures/\346\267\273\345\212\240\344\270\273\346\234\272\347\273\204.jpg" deleted file mode 100644 index 9fcd24d949e500323e7a466be7cbfaf48d257ad0..0000000000000000000000000000000000000000 Binary files 
"a/docs/zh/docs/A-Ops/figures/\346\267\273\345\212\240\344\270\273\346\234\272\347\273\204.jpg" and /dev/null differ diff --git "a/docs/zh/docs/A-Ops/figures/\350\257\212\346\226\255error1.png" "b/docs/zh/docs/A-Ops/figures/\350\257\212\346\226\255error1.png" deleted file mode 100644 index 9e5b1139febe9f00156b37f3268269ac30a78737..0000000000000000000000000000000000000000 Binary files "a/docs/zh/docs/A-Ops/figures/\350\257\212\346\226\255error1.png" and /dev/null differ diff --git "a/docs/zh/docs/A-Ops/figures/\350\257\212\346\226\255\344\270\273\347\225\214\351\235\242.png" "b/docs/zh/docs/A-Ops/figures/\350\257\212\346\226\255\344\270\273\347\225\214\351\235\242.png" deleted file mode 100644 index b536af938250004bac3053b234bf20bcbf075c9b..0000000000000000000000000000000000000000 Binary files "a/docs/zh/docs/A-Ops/figures/\350\257\212\346\226\255\344\270\273\347\225\214\351\235\242.png" and /dev/null differ diff --git "a/docs/zh/docs/A-Ops/figures/\350\257\212\346\226\255\345\233\276\347\211\207.png" "b/docs/zh/docs/A-Ops/figures/\350\257\212\346\226\255\345\233\276\347\211\207.png" deleted file mode 100644 index 6cef6216522407997d705d29131287f3a30b0f8f..0000000000000000000000000000000000000000 Binary files "a/docs/zh/docs/A-Ops/figures/\350\257\212\346\226\255\345\233\276\347\211\207.png" and /dev/null differ diff --git "a/docs/zh/docs/A-Ops/figures/\351\205\215\347\275\256web.png" "b/docs/zh/docs/A-Ops/figures/\351\205\215\347\275\256web.png" deleted file mode 100644 index 721335115922e03f255e67e6b775c1ac0cfbbc50..0000000000000000000000000000000000000000 Binary files "a/docs/zh/docs/A-Ops/figures/\351\205\215\347\275\256web.png" and /dev/null differ diff --git "a/docs/zh/docs/A-Ops/gala-anteater\344\275\277\347\224\250\346\211\213\345\206\214.md" "b/docs/zh/docs/A-Ops/gala-anteater\344\275\277\347\224\250\346\211\213\345\206\214.md" deleted file mode 100644 index 23f8c0b53634bae6cdc3b2dda5affee2692ccd73..0000000000000000000000000000000000000000 --- 
"a/docs/zh/docs/A-Ops/gala-anteater\344\275\277\347\224\250\346\211\213\345\206\214.md" +++ /dev/null @@ -1,158 +0,0 @@ -# **gala-anteater使用手册** - -gala-anteater是一款基于AI的操作系统异常检测平台。主要提供时序数据预处理、异常点发现、异常上报等功能。基于线下预训练、线上模型的增量学习与模型更新,能够很好地适应于多维多模态数据故障诊断。 - -本文主要介绍如何部署和使用gala-anteater服务。 - -#### 安装 - -挂载repo源: - -```basic -[oe-2209] # openEuler 2209 官方发布源 -name=oe2209 -baseurl=http://119.3.219.20:82/openEuler:/22.09/standard_x86_64 -enabled=1 -gpgcheck=0 -priority=1 - -[oe-2209:Epol] # openEuler 2209:Epol 官方发布源 -name=oe2209_epol -baseurl=http://119.3.219.20:82/openEuler:/22.09:/Epol/standard_x86_64/ -enabled=1 -gpgcheck=0 -priority=1 -``` - -安装gala-anteater: - -```bash -# yum install gala-anteater -``` - - - -#### 配置 - -> 说明:gala-anteater不包含额外需要配置的config文件,其参数通过命令行的启动参数传递。 - -##### 启动参数介绍 - -| 参数项 | 参数详细名 | 类型 | 是否必须 | 默认值 | 名称 | 含义 | -|---|---|---|---|---|---|---| -| -ks | --kafka_server | string | True | | KAFKA_SERVER | Kafka Server的ip地址,如:localhost / xxx.xxx.xxx.xxx | -| -kp | --kafka_port | string | True | | KAFKA_PORT | Kafka Server的port,如:9092 | -| -ps | --prometheus_server | string | True | | PROMETHEUS_SERVER | Prometheus Server的ip地址,如:localhost / xxx.xxx.xxx.xxx | -| -pp | --prometheus_port | string | True | | PROMETHEUS_PORT | Prometheus Server的port,如:9090 | -| -m | --model | string | False | vae | MODEL | 异常检测模型,目前支持两种异常检测模型,可选(random_forest,vae)
random_forest:随机森林模型,不支持在线学习
vae:Variational Atuoencoder,无监督模型,支持首次启动时,利用历史数据,进行模型更新迭代 | -| -d | --duration | int | False | 1 | DURATION | 异常检测模型执行频率(单位:分),每x分钟,检测一次 | -| -r | --retrain | bool | False | False | RETRAIN | 是否在启动时,利用历史数据,进行模型更新迭代,目前仅支持vae模型 | -| -l | --look_back | int | False | 4 | LOOK_BACK | 利用过去x天的历史数据,更新模型 | -| -t | --threshold | float | False | 0.8 | THRESHOLD | 异常检测模型的阈值:(0,1),较大的值,能够减少模型的误报率,推荐大于等于0.5 | -| -sli | --sli_time | int | False | 400 | SLI_TIME | 表示应用性能指标(单位:毫秒),较大的值,能够减少模型的误报率,推荐大于等于200
对于误报率较高的场景,推荐1000以上 | - - - -#### 启动 - -执行如下命令启动gala-anteater。 - -> 说明:gala-anteter支持命令行方式启动运行,不支持systemd方式。 - -##### 在线训练方式运行(推荐) -```bash -gala-anteater -ks {ip} -kp {port} -ps {ip} -pp {port} -m vae -r True -l 7 -t 0.6 -sli 400 -``` - -##### 普通方式运行 -```bash -gala-anteater -ks {ip} -kp {port} -ps {ip} -pp {port} -m vae -t 0.6 -sli 400 -``` - -##### 查询gala-anteater服务状态 - -若日志显示如下内容,说明服务启动成功,启动日志也会保存到当前运行目录下`logs/anteater.log`文件中。 - -```log -2022-09-01 17:52:54,435 - root - INFO - Run gala_anteater main function... -2022-09-01 17:52:54,436 - root - INFO - Start to try updating global configurations by querying data from Kafka! -2022-09-01 17:52:54,994 - root - INFO - Loads metric and operators from file: xxx\metrics.csv -2022-09-01 17:52:54,997 - root - INFO - Loads metric and operators from file: xxx\metrics.csv -2022-09-01 17:52:54,998 - root - INFO - Start to re-train the model based on last day metrics dataset! -2022-09-01 17:52:54,998 - root - INFO - Get training data during 2022-08-31 17:52:00+08:00 to 2022-09-01 17:52:00+08:00! -2022-09-01 17:53:06,994 - root - INFO - Spends: 11.995422840118408 seconds to get unique machine_ids! -2022-09-01 17:53:06,995 - root - INFO - The number of unique machine ids is: 1! -2022-09-01 17:53:06,996 - root - INFO - Fetch metric values from machine: xxxx. -2022-09-01 17:53:38,385 - root - INFO - Spends: 31.3896164894104 seconds to get get all metric values! -2022-09-01 17:53:38,392 - root - INFO - The shape of training data: (17281, 136) -2022-09-01 17:53:38,444 - root - INFO - Start to execute vae model training... 
-2022-09-01 17:53:38,456 - root - INFO - Using cpu device -2022-09-01 17:53:38,658 - root - INFO - Epoch(s): 0 train Loss: 136.68 validate Loss: 117.00 -2022-09-01 17:53:38,852 - root - INFO - Epoch(s): 1 train Loss: 113.73 validate Loss: 110.05 -2022-09-01 17:53:39,044 - root - INFO - Epoch(s): 2 train Loss: 110.60 validate Loss: 108.76 -2022-09-01 17:53:39,235 - root - INFO - Epoch(s): 3 train Loss: 109.39 validate Loss: 106.93 -2022-09-01 17:53:39,419 - root - INFO - Epoch(s): 4 train Loss: 106.48 validate Loss: 103.37 -... -2022-09-01 17:53:57,744 - root - INFO - Epoch(s): 98 train Loss: 97.63 validate Loss: 96.76 -2022-09-01 17:53:57,945 - root - INFO - Epoch(s): 99 train Loss: 97.75 validate Loss: 96.58 -2022-09-01 17:53:57,969 - root - INFO - Schedule recurrent job with time interval 1 minute(s). -2022-09-01 17:53:57,973 - apscheduler.scheduler - INFO - Adding job tentatively -- it will be properly scheduled when the scheduler starts -2022-09-01 17:53:57,974 - apscheduler.scheduler - INFO - Added job "partial" to job store "default" -2022-09-01 17:53:57,974 - apscheduler.scheduler - INFO - Scheduler started -2022-09-01 17:53:57,975 - apscheduler.scheduler - DEBUG - Looking for jobs to run -2022-09-01 17:53:57,975 - apscheduler.scheduler - DEBUG - Next wakeup is due at 2022-09-01 17:54:57.973533+08:00 (in 59.998006 seconds) -``` - - - -#### 输出数据 - -gala-anteater如果检测到的异常点,会将结果输出至kafka。输出数据格式如下: - -```json -{ - "Timestamp":1659075600000, - "Attributes":{ - "entity_id":"xxxxxx_sli_1513_18", - "event_id":"1659075600000_1fd37742xxxx_sli_1513_18", - "event_type":"app" - }, - "Resource":{ - "anomaly_score":1.0, - "anomaly_count":13, - "total_count":13, - "duration":60, - "anomaly_ratio":1.0, - "metric_label":{ - "machine_id":"1fd37742xxxx", - "tgid":"1513", - "conn_fd":"18" - }, - "recommend_metrics":{ - "gala_gopher_tcp_link_notack_bytes":{ - "label":{ - "__name__":"gala_gopher_tcp_link_notack_bytes", - "client_ip":"x.x.x.165", - "client_port":"51352", - 
"hostname":"localhost.localdomain", - "instance":"x.x.x.172:8888", - "job":"prometheus-x.x.x.172", - "machine_id":"xxxxxx", - "protocol":"2", - "role":"0", - "server_ip":"x.x.x.172", - "server_port":"8888", - "tgid":"3381701" - }, - "score":0.24421279500639545 - }, - ... - }, - "metrics":"gala_gopher_ksliprobe_recent_rtt_nsec" - }, - "SeverityText":"WARN", - "SeverityNumber":14, - "Body":"TimeStamp, WARN, APP may be impacting sli performance issues." -} -``` - diff --git "a/docs/zh/docs/A-Ops/gala-gopher\344\275\277\347\224\250\346\211\213\345\206\214.md" "b/docs/zh/docs/A-Ops/gala-gopher\344\275\277\347\224\250\346\211\213\345\206\214.md" deleted file mode 100644 index 785655c57a5d7dfaa9e8dcf2c0637d07fb0acd8c..0000000000000000000000000000000000000000 --- "a/docs/zh/docs/A-Ops/gala-gopher\344\275\277\347\224\250\346\211\213\345\206\214.md" +++ /dev/null @@ -1,228 +0,0 @@ -# **gala-gopher使用手册** - -gala-gopher作为数据采集模块提供OS级的监控能力,支持动态加 /卸载探针,可无侵入式地集成第三方探针,快速扩展监控范围。 - -本文介绍如何部署和使用gala-gopher服务。 - -#### 安装 - -挂载repo源: - -```basic -[oe-2209] # openEuler 2209 官方发布源 -name=oe2209 -baseurl=http://119.3.219.20:82/openEuler:/22.09/standard_x86_64 -enabled=1 -gpgcheck=0 -priority=1 - -[oe-2209:Epol] # openEuler 2209:Epol 官方发布源 -name=oe2209_epol -baseurl=http://119.3.219.20:82/openEuler:/22.09:/Epol/standard_x86_64/ -enabled=1 -gpgcheck=0 -priority=1 -``` - -安装gala-gopher: - -```bash -# yum install gala-gopher -``` - - - -#### 配置 - -##### 配置介绍 - -gala-gopher配置文件为`/opt/gala-gopher/gala-gopher.conf`,该文件配置项说明如下(省略无需用户配置的部分)。 - -如下配置可以根据需要进行修改: - -- global:gala-gopher全局配置信息。 - - log_directory:gala-gopher日志文件名。 - - pin_path:ebpf探针共享map存放路径(建议维持默认配置)。 -- metric:指标数据metrics输出方式配置。 - - out_channel:metrics输出通道,支持配置web_server|kafka,配置为空则输出通道关闭。 - - kafka_topic:若输出通道为kafka,此为topic配置信息。 -- event:异常事件event输出方式配置。 - - out_channel:event输出通道,支持配置logs|kafka,配置为空则输出通道关闭。 - - kafka_topic:若输出通道为kafka,此为topic配置信息。 -- meta:元数据metadata输出方式配置。 - - out_channel:metadata输出通道,支持logs|kafka,配置为空则输出通道关闭。 - - 
-  - kafka_topic: topic to use when the output channel is kafka.
-- imdb: cache settings.
-  - max_tables_num: maximum number of cache tables; each meta file in the /opt/gala-gopher/meta directory corresponds to one table.
-  - max_records_num: maximum number of records per cache table; each probe normally produces at least one record per observation period.
-  - max_metrics_num: maximum number of metrics contained in one record.
-  - record_timeout: cache table aging time in seconds; a record that has not been refreshed within this time is deleted.
-- web_server: settings of the web_server output channel.
-  - port: listening port.
-- kafka: settings of the kafka output channel.
-  - kafka_broker: IP address and port of the kafka server.
-- logs: settings of the logs output channel.
-  - metric_dir: log path for metrics data.
-  - event_dir: log path for abnormal events.
-  - meta_dir: log path for metadata.
-  - debug_dir: path of the gala-gopher run logs.
-- probes: native probe settings.
-  - name: probe name, which must match the native probe name; e.g. the probe name of example.probe is example.
-  - param: probe startup parameters; see the startup parameter table below for supported parameters.
-  - switch: whether to start the probe, on|off.
-- extend_probes: third-party probe settings.
-  - name: probe name.
-  - command: probe startup command.
-  - param: probe startup parameters; see the startup parameter table below for supported parameters.
-  - start_check: when switch is auto, the result of start_check determines whether the probe is started.
-  - switch: whether to start the probe, on|off|auto; auto decides based on the start_check result.
-
-##### Startup parameters
-
-| Option | Description |
-| ------ | ------------------------------------------------------------ |
-| -l | Whether to enable abnormal event reporting. |
-| -t | Sampling period in seconds; by default probes report data every 5s. |
-| -T | Latency threshold in ms, 0 by default. |
-| -J | Jitter threshold in ms, 0 by default. |
-| -O | Offline time threshold in ms, 0 by default. |
-| -D | Packet loss threshold, 0 by default. |
-| -F | Set to `task` to filter by `task_whitelist.conf`, or to a pid to monitor only that process. |
-| -P | Range of probe programs loaded by each probe; currently applies to the tcpprobe and taskprobe probes. |
-| -U | Resource utilization threshold (upper limit), 0% by default. |
-| -L | Resource utilization threshold (lower limit), 0% by default. |
-| -c | Whether the (tcp) probe records client_port, 0 (no) by default. |
-| -N | Name of the process observed by the (ksliprobe) probe, NULL by default. |
-| -p | Binary path of the process to observe, e.g. -p /user/local/sbin/nginx for nginx_probe; NULL by default. |
-| -w | Application monitoring whitelist, e.g. -w /opt/gala-gopher/task_whitelist.conf; write the names of the programs to monitor into task_whitelist.conf; NULL by default (no filtering). |
-| -n | NIC on which to attach the tc eBPF program; NULL by default (all NICs), e.g. -n eth0. |
-
-##### Example configuration
-
-- Choose the data output channels:
-
-  ```yaml
-  metric =
-  {
-      out_channel = "web_server";
-      kafka_topic = "gala_gopher";
-  };
-
-  event =
-  {
-      out_channel = "kafka";
-      kafka_topic = "gala_gopher_event";
-  };
-
-  meta =
-  {
-      out_channel = "kafka";
-      kafka_topic = "gala_gopher_metadata";
-  };
-  ```
-
-- Configure kafka and webServer:
-
-  ```yaml
-  web_server =
-  {
-      port = 8888;
-  };
-
-  kafka =
-  {
-      kafka_broker = ":9092";
-  };
-  ```
-
-- Select the probes to enable, for example:
-
-  ```yaml
-  probes =
-  (
-      {
-          name = "system_infos";
-          param = "-t 5 -w /opt/gala-gopher/task_whitelist.conf -l warn -U 80";
-          switch = "on";
-      },
-  );
-  extend_probes =
-  (
-      {
-          name = "tcp";
-          command = "/opt/gala-gopher/extend_probes/tcpprobe";
-          param = "-l warn -c 1 -P 7";
-          switch = "on";
-      }
-  );
-  ```
-
-#### Startup
-
-After configuration, run the following command to start gala-gopher.
-
-```bash
-# systemctl start gala-gopher.service
-```
-
-Check the gala-gopher service status.
-
-```bash
-# systemctl status gala-gopher.service
-```
-
-If the output looks as follows, the service started successfully. Also check whether the enabled probes are running; if a probe thread is missing, check the configuration file and the gala-gopher run log.
-
-![gala-gopher startup status](./figures/gala-gopher成功启动状态.png)
-
-> Note: deploying and running gala-gopher both require root privileges.
-
-#### Usage
-
-##### Deploying external dependencies
-
-![gopher architecture](./figures/gopher软件架构图.png)
-
-As shown above, the green parts are gala-gopher's external dependencies. gala-gopher reports metrics data to Prometheus and sends metadata and abnormal events to kafka; the grey parts, gala-anteater and gala-spider, fetch data from Prometheus and kafka.
-
-> Note: obtain the kafka and Prometheus packages from their official sites for deployment.
-
-##### Output data
-
-- **Metrics**
-
-  Prometheus Server has a built-in expression browser UI where metrics can be queried with PromQL statements. For a detailed tutorial, see the official documentation: [Using the expression browser](https://prometheus.io/docs/prometheus/latest/getting_started/#using-the-expression-browser). For example, for the metric named `gala_gopher_tcp_link_rcv_rtt`, the UI shows:
-
-  ```basic
-  gala_gopher_tcp_link_rcv_rtt{client_ip="x.x.x.165",client_port="1234",hostname="openEuler",instance="x.x.x.172:8888",job="prometheus",machine_id="1fd3774xx",protocol="2",role="0",server_ip="x.x.x.172",server_port="3742",tgid="1516"} 1
-  ```
-
-- **Metadata**
-
-  Metadata can be consumed directly from the kafka topic `gala_gopher_metadata`. For example:
-
-  ```bash
-  # request
-  ./bin/kafka-console-consumer.sh --bootstrap-server x.x.x.165:9092 --topic gala_gopher_metadata
-  # output
-  {"timestamp": 1655888408000, "meta_name": "thread", "entity_name": "thread", "version": "1.0.0", "keys": ["machine_id", "pid"], "labels": ["hostname", "tgid", "comm", "major", "minor"], "metrics": ["fork_count", "task_io_wait_time_us", "task_io_count", "task_io_time_us", "task_hang_count"]}
-  ```
-
-- **Abnormal events**
-
-  Events can be consumed directly from the kafka topic `gala_gopher_event`. For example:
-
-  ```bash
-  # request
-  ./bin/kafka-console-consumer.sh --bootstrap-server x.x.x.165:9092 --topic gala_gopher_event
-  # output
-  {"timestamp": 1655888408000, "meta_name": "thread", "entity_name": "thread", "version": "1.0.0", "keys": ["machine_id", "pid"], "labels": ["hostname", "tgid", "comm", "major", "minor"], "metrics": ["fork_count", "task_io_wait_time_us", "task_io_count", "task_io_time_us", "task_hang_count"]}
-  ```
\ No newline at end of file
diff --git "a/docs/zh/docs/A-Ops/gala-spider使用手册.md" "b/docs/zh/docs/A-Ops/gala-spider使用手册.md"
deleted file mode 100644
index 1d67eefddf43ec7d7dcd439c5b1959db774d3398..0000000000000000000000000000000000000000
--- "a/docs/zh/docs/A-Ops/gala-spider使用手册.md"
+++ /dev/null
@@ -1,541 +0,0 @@
-# gala-spider User Manual
-
-This document describes how to deploy and use gala-spider and gala-inference.
-
-## gala-spider
-
-gala-spider provides OS-level topology drawing. It periodically fetches the data of all observed objects collected at a given time by gala-gopher (an OS-level data collection component), computes the topology relationships between them, and stores the resulting topology graph in the arangodb graph database.
-
-### Installation
-
-Configure the yum repo:
-
-```basic
-[oe-2209] # openEuler 22.09 official release repo
-name=oe2209
-baseurl=http://119.3.219.20:82/openEuler:/22.09/standard_x86_64
-enabled=1
-gpgcheck=0
-priority=1
-
-[oe-2209:Epol] # openEuler 22.09 Epol official release repo
-name=oe2209_epol
-baseurl=http://119.3.219.20:82/openEuler:/22.09:/Epol/standard_x86_64/
-enabled=1
-gpgcheck=0
-priority=1
-```
-
-Install gala-spider:
-
-```sh
-# yum install gala-spider
-```
-
-### Configuration
-
-#### Configuration file
-
-The gala-spider configuration file is `/etc/gala-spider/gala-spider.yaml`. Its options are described below.
-
-- global: global settings
-  - data_source: database from which observation metrics are collected; currently only prometheus is supported
-  - data_agent: metric collection agent; currently only gala_gopher is supported
-- spider:
-  - log_conf: logging settings
-    - log_path: log file path
-    - log_level: log level, one of DEBUG/INFO/WARNING/ERROR/CRITICAL
-    - max_size: log file size in megabytes (MB)
-    - backup_count: number of log backup files
-- storage: topology graph storage settings
-  - period: storage period in seconds, i.e. how often the topology graph is stored
-  - database: graph database used for storage; currently only arangodb is supported
-  - db_conf: graph database settings
-    - url: graph database server address
-    - db_name: name of the database storing the topology graph
-- kafka: kafka settings
-  - server: kafka server address
-  - metadata_topic: topic name of observed-object metadata messages
-  - metadata_group_id: consumer group ID for metadata messages
-- prometheus: prometheus settings
-  - base_url: prometheus server address
-  - instant_api: API for instant (single point in time) queries
-  - range_api: API for range queries
-  - step: query step used by the range API
-
-#### Example configuration
-
-```yaml
-global:
-  data_source: "prometheus"
-  data_agent: "gala_gopher"
-
-prometheus:
-  base_url: "http://localhost:9090/"
-  instant_api: "/api/v1/query"
-  range_api: "/api/v1/query_range"
-  step: 1
-
-spider:
-  log_conf:
-    log_path: "/var/log/gala-spider/spider.log"
-    # log level: DEBUG/INFO/WARNING/ERROR/CRITICAL
-    log_level: INFO
-    # unit: MB
-    max_size: 10
-    backup_count: 10
-
-storage:
-  # unit: second
-  period: 60
-  database: arangodb
-  db_conf:
-    url: "http://localhost:8529"
-    db_name: "spider"
-
-kafka:
-  server: "localhost:9092"
-  metadata_topic: "gala_gopher_metadata"
-  metadata_group_id: "metadata-spider"
-```
-
-### Startup
-
-1. Start from the command line.
-
-   ```sh
-   # spider-storage
-   ```
-
-2. Start via the systemd service.
-
-   ```sh
-   # systemctl start gala-spider
-   ```
-
-### Usage
-
-##### Deploying external dependencies
-
-gala-spider interacts with several external components at runtime; deploy them before starting gala-spider. The figure below shows the software dependencies of the gala-spider project.
-
-![gala-spider architecture](./figures/gala-spider软件架构图.png)
-
-The dashed box on the right contains the 2 functional components of the gala-spider project; the green parts are its direct external dependencies, and the grey parts are indirect dependencies.
-
-- **spider-storage**: core gala-spider component providing topology graph storage.
-  1. Obtains observed-object metadata from kafka.
-  2. Obtains all observed instances from Prometheus.
-  3. Stores the generated topology graph in the arangodb graph database.
-- **gala-inference**: core gala-spider component providing root cause analysis. Triggered by abnormal KPI events subscribed from kafka, it builds a fault propagation graph based on the topology obtained from arangodb and publishes the root cause analysis result to kafka.
-- **prometheus**: time-series database; the observation metrics collected by gala-gopher are reported to prometheus and then further processed by gala-spider.
-- **kafka**: message middleware storing the observed-object metadata reported by gala-gopher, the abnormal events reported by the anomaly detection component, and the root-cause results reported by the cause-inference component.
-- **arangodb**: graph database storing the topology graphs generated by spider-storage.
-- **gala-gopher**: data collection component; deploy gala-gopher in advance.
-- **arangodb-ui**: UI provided by arangodb; can be used to query topology graphs.
-
-The 2 functional components of the gala-spider project are released as separate packages:
-
-  The **spider-storage** component corresponds to the gala-spider package covered in this section.
-
-  The **gala-inference** component corresponds to the gala-inference package.
-
-For deploying gala-gopher, see the gala-gopher User Manual; only arangodb deployment is described here.
-
-The arangodb version currently used is 3.8.7, which places the following requirements on the runtime environment:
-
-- x86 systems only
-- gcc 10 or later
-
-For the official arangodb deployment documentation, see [arangodb deployment](https://www.arangodb.com/docs/3.9/deployment.html).
-
-The RPM-based deployment of arangodb is as follows:
-
-1. Configure the yum repo.
-
-   ```basic
-   [oe-2209] # openEuler 22.09 official release repo
-   name=oe2209
-   baseurl=http://119.3.219.20:82/openEuler:/22.09/standard_x86_64
-   enabled=1
-   gpgcheck=0
-   priority=1
-
-   [oe-2209:Epol] # openEuler 22.09 Epol official release repo
-   name=oe2209_epol
-   baseurl=http://119.3.219.20:82/openEuler:/22.09:/Epol/standard_x86_64/
-   enabled=1
-   gpgcheck=0
-   priority=1
-   ```
-
-2. Install arangodb3.
-
-   ```sh
-   # yum install arangodb3
-   ```
-
-3. Modify the configuration.
-
-   The arangodb3 server configuration file is `/etc/arangodb3/arangod.conf`. Modify the following settings:
-
-   - endpoint: arangodb3 server address
-   - authentication: whether accessing the arangodb3 server requires authentication; gala-spider does not yet support authentication, so set authentication to false.
-
-   Example:
-
-   ```yaml
-   [server]
-   endpoint = tcp://0.0.0.0:8529
-   authentication = false
-   ```
-
-4. Start arangodb3.
-
-   ```sh
-   # systemctl start arangodb3
-   ```
-
-##### Updating gala-spider settings
-
-After the dependencies are started, update the relevant options in the gala-spider configuration file. For example:
-
-Configure the kafka server address:
-
-```yaml
-kafka:
-  server: "localhost:9092"
-```
-
-Configure the prometheus server address:
-
-```yaml
-prometheus:
-  base_url: "http://localhost:9090/"
-```
-
-Configure the arangodb server address:
-
-```yaml
-storage:
-  db_conf:
-    url: "http://localhost:8529"
-```
-
-##### Starting the service
-
-Run `systemctl start gala-spider`. To check the status, run `systemctl status gala-spider`; the following output indicates a successful start.
-
-```sh
-[root@openEuler ~]# systemctl status gala-spider
-● gala-spider.service - a-ops gala spider service
-   Loaded: loaded (/usr/lib/systemd/system/gala-spider.service; enabled; vendor preset: disabled)
-   Active: active (running) since Tue 2022-08-30 17:28:38 CST; 1 day 22h ago
- Main PID: 2263793 (spider-storage)
-    Tasks: 3 (limit: 98900)
-   Memory: 44.2M
-   CGroup: /system.slice/gala-spider.service
-           └─2263793 /usr/bin/python3 /usr/bin/spider-storage
-```
-
-##### Example output
-
-The topology graphs output by gala-spider can be queried through the UI provided by arangodb:
-
-1. Enter the arangodb server address in a browser, e.g. http://localhost:8529, to open the arangodb UI.
-
-2. Switch to the `spider` database at the top right of the page.
-
-3. The `Collections` panel shows the collections of observed-object instances and topology relationships stored over different time windows, as shown below:
-
-   ![spider topology collections](./figures/spider拓扑关系图.png)
-
-4.
可进一步根据 arangodb 提供的 AQL 查询语句查询存储的拓扑关系图,详细教程参见官方文档: [aql文档](https://www.arangodb.com/docs/3.8/aql/)。 - - - -## gala-inference - -gala-inference 提供异常 KPI 根因定位能力,它将基于异常检测的结果和拓扑图作为输入,根因定位的结果作为输出,输出到 kafka 中。gala-inference 组件在 gala-spider 项目下进行归档。 - -### 安装 - -挂载 yum 源: - -```basic -[oe-2209] # openEuler 2209 官方发布源 -name=oe2209 -baseurl=http://119.3.219.20:82/openEuler:/22.09/standard_x86_64 -enabled=1 -gpgcheck=0 -priority=1 - -[oe-2209:Epol] # openEuler 2209:Epol 官方发布源 -name=oe2209_epol -baseurl=http://119.3.219.20:82/openEuler:/22.09:/Epol/standard_x86_64/ -enabled=1 -gpgcheck=0 -priority=1 -``` - -安装 gala-inference: - -```sh -# yum install gala-inference -``` - - - -### 配置 - -#### 配置文件说明 - -gala-inference 配置文件 `/etc/gala-inference/gala-inference.yaml` 配置项说明如下。 - -- inference:根因定位算法的配置信息。 - - tolerated_bias:异常时间点的拓扑图查询所容忍的时间偏移,单位为秒。 - - topo_depth:拓扑图查询的最大深度。 - - root_topk:根因定位结果输出前 K 个根因指标。 - - infer_policy:根因推导策略,包括 dfs 和 rw 。 - - sample_duration:指标的历史数据的采样周期,单位为秒。 - - evt_valid_duration:根因定位时,有效的系统异常指标事件周期,单位为秒。 - - evt_aging_duration:根因定位时,系统异常指标事件的老化周期,单位为秒。 -- kafka:kafka配置信息。 - - server:kafka服务器地址。 - - metadata_topic:观测对象元数据消息的配置信息。 - - topic_id:观测对象元数据消息的topic名称。 - - group_id:观测对象元数据消息的消费者组ID。 - - abnormal_kpi_topic:异常 KPI 事件消息的配置信息。 - - topic_id:异常 KPI 事件消息的topic名称。 - - group_id:异常 KPI 事件消息的消费者组ID。 - - abnormal_metric_topic:系统异常指标事件消息的配置信息。 - - topic_id:系统异常指标事件消息的topic名称。 - - group_id:系统异常指标事件消息的消费者组ID。 - - consumer_to:消费系统异常指标事件消息的超时时间,单位为秒。 - - inference_topic:根因定位结果输出事件消息的配置信息。 - - topic_id:根因定位结果输出事件消息的topic名称。 -- arangodb:arangodb图数据库的配置信息,用于查询根因定位所需要的拓扑子图。 - - url:图数据库的服务器地址。 - - db_name:拓扑图存储的数据库名称。 -- log_conf:日志配置信息。 - - log_path:日志文件路径。 - - log_level:日志打印级别,值包括 DEBUG/INFO/WARNING/ERROR/CRITICAL。 - - max_size:日志文件大小,单位为兆字节(MB)。 - - backup_count:日志备份文件数量。 -- prometheus:prometheus数据库配置信息,用于获取指标的历史时序数据。 - - base_url:prometheus服务器地址。 - - range_api:区间采集API。 - - step:采集时间步长,用于区间采集API。 - -#### 配置文件示例 - -```yaml -inference: - # 异常时间点的拓扑图查询所容忍的时间偏移,单位:秒 - 
tolerated_bias: 120 - topo_depth: 10 - root_topk: 3 - infer_policy: "dfs" - # 单位: 秒 - sample_duration: 600 - # 根因定位时,有效的异常指标事件周期,单位:秒 - evt_valid_duration: 120 - # 异常指标事件的老化周期,单位:秒 - evt_aging_duration: 600 - -kafka: - server: "localhost:9092" - metadata_topic: - topic_id: "gala_gopher_metadata" - group_id: "metadata-inference" - abnormal_kpi_topic: - topic_id: "gala_anteater_hybrid_model" - group_id: "abn-kpi-inference" - abnormal_metric_topic: - topic_id: "gala_anteater_metric" - group_id: "abn-metric-inference" - consumer_to: 1 - inference_topic: - topic_id: "gala_cause_inference" - -arangodb: - url: "http://localhost:8529" - db_name: "spider" - -log: - log_path: "/var/log/gala-inference/inference.log" - # log level: DEBUG/INFO/WARNING/ERROR/CRITICAL - log_level: INFO - # unit: MB - max_size: 10 - backup_count: 10 - -prometheus: - base_url: "http://localhost:9090/" - range_api: "/api/v1/query_range" - step: 5 -``` - - - -### 启动 - -1. 通过命令启动。 - - ```sh - # gala-inference - ``` - -2. 通过 systemd 服务启动。 - - ```sh - # systemctl start gala-inference - ``` - - - -### 使用方法 - -##### 依赖软件部署 - -gala-inference 的运行依赖和 gala-spider一样,请参见[外部依赖软件部署](#外部依赖软件部署)。此外,gala-inference 还间接依赖 [gala-spider](#gala-spider) 和 [gala-anteater](gala-anteater使用手册.md) 软件的运行,请提前部署gala-spider和gala-anteater软件。 - -##### 配置项修改 - -修改 gala-inference 的配置文件中部分配置项。示例如下: - -配置 kafka 服务器地址: - -```yaml -kafka: - server: "localhost:9092" -``` - -配置 prometheus 服务器地址: - -```yaml -prometheus: - base_url: "http://localhost:9090/" -``` - -配置 arangodb 服务器地址: - -```yaml -arangodb: - url: "http://localhost:8529" -``` - -##### 启动服务 - -直接运行 `systemctl start gala-inference` 即可。可通过执行 `systemctl status gala-inference` 查看启动状态,如下打印表示启动成功。 - -```sh -[root@openEuler ~]# systemctl status gala-inference -● gala-inference.service - a-ops gala inference service - Loaded: loaded (/usr/lib/systemd/system/gala-inference.service; enabled; vendor preset: disabled) - Active: active (running) since Tue 2022-08-30 17:55:33 CST; 1 day 22h 
ago - Main PID: 2445875 (gala-inference) - Tasks: 10 (limit: 98900) - Memory: 48.7M - CGroup: /system.slice/gala-inference.service - └─2445875 /usr/bin/python3 /usr/bin/gala-inference -``` - -##### 输出示例 - -当异常检测模块 gala-anteater 检测到 KPI 异常后,会将对应的异常 KPI 事件输出到 kafka 中,gala-inference 会一直监测该异常 KPI 事件的消息,如果收到异常 KPI 事件的消息,就会触发根因定位。根因定位会将定位结果输出到 kafka 中,用户可以在 kafka 服务器中查看根因定位的输出结果,基本步骤如下: - -1. 若通过源码安装 kafka ,需要进入 kafka 的安装目录下。 - - ```sh - cd /root/kafka_2.13-2.8.0 - ``` - -2. 执行消费 topic 的命令获取根因定位的输出结果。 - - ```sh - ./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic gala_cause_inference - ``` - - 输出示例如下: - - ```json - { - "Timestamp": 1661853360000, - "event_id": "1661853360000_1fd37742xxxx_sli_12154_19", - "Atrributes": { - "event_id": "1661853360000_1fd37742xxxx_sli_12154_19" - }, - "Resource": { - "abnormal_kpi": { - "metric_id": "gala_gopher_sli_rtt_nsec", - "entity_id": "1fd37742xxxx_sli_12154_19", - "timestamp": 1661853360000, - "metric_labels": { - "machine_id": "1fd37742xxxx", - "tgid": "12154", - "conn_fd": "19" - } - }, - "cause_metrics": [ - { - "metric_id": "gala_gopher_proc_write_bytes", - "entity_id": "1fd37742xxxx_proc_12154", - "metric_labels": { - "__name__": "gala_gopher_proc_write_bytes", - "cmdline": "/opt/redis/redis-server x.x.x.172:3742", - "comm": "redis-server", - "container_id": "5a10635e2c43", - "hostname": "openEuler", - "instance": "x.x.x.172:8888", - "job": "prometheus", - "machine_id": "1fd37742xxxx", - "pgid": "12154", - "ppid": "12126", - "tgid": "12154" - }, - "timestamp": 1661853360000, - "path": [ - { - "metric_id": "gala_gopher_proc_write_bytes", - "entity_id": "1fd37742xxxx_proc_12154", - "metric_labels": { - "__name__": "gala_gopher_proc_write_bytes", - "cmdline": "/opt/redis/redis-server x.x.x.172:3742", - "comm": "redis-server", - "container_id": "5a10635e2c43", - "hostname": "openEuler", - "instance": "x.x.x.172:8888", - "job": "prometheus", - "machine_id": "1fd37742xxxx", - "pgid": "12154", - "ppid": "12126", 
- "tgid": "12154" - }, - "timestamp": 1661853360000 - }, - { - "metric_id": "gala_gopher_sli_rtt_nsec", - "entity_id": "1fd37742xxxx_sli_12154_19", - "metric_labels": { - "machine_id": "1fd37742xxxx", - "tgid": "12154", - "conn_fd": "19" - }, - "timestamp": 1661853360000 - } - ] - } - ] - }, - "SeverityText": "WARN", - "SeverityNumber": 13, - "Body": "A cause inferring event for an abnormal event" - } - ``` \ No newline at end of file diff --git a/docs/zh/docs/A-Ops/overview.md b/docs/zh/docs/A-Ops/overview.md deleted file mode 100644 index cd3fe77186fcd48076a6a4aefd9d43442d95194a..0000000000000000000000000000000000000000 --- a/docs/zh/docs/A-Ops/overview.md +++ /dev/null @@ -1,3 +0,0 @@ -# A-Ops用户指南 - -本文介绍A-Ops智能运维框架以及智能定位、配置溯源等服务的安装与使用方法,使用户能够快速了解并使用A-Ops。用户能够借由A-Ops降低系统集群的运维成本,实现系统故障快速定位、配置项统筹管理等功能。 diff --git "a/docs/zh/docs/A-Ops/\346\236\266\346\236\204\346\204\237\347\237\245\346\234\215\345\212\241\344\275\277\347\224\250\346\211\213\345\206\214.md" "b/docs/zh/docs/A-Ops/\346\236\266\346\236\204\346\204\237\347\237\245\346\234\215\345\212\241\344\275\277\347\224\250\346\211\213\345\206\214.md" deleted file mode 100644 index f81ce61d67a5d223ef1e447b87cadec22a5ad0b7..0000000000000000000000000000000000000000 --- "a/docs/zh/docs/A-Ops/\346\236\266\346\236\204\346\204\237\347\237\245\346\234\215\345\212\241\344\275\277\347\224\250\346\211\213\345\206\214.md" +++ /dev/null @@ -1,77 +0,0 @@ -# 架构感知服务使用手册 - -## 安装 - -#### 手动安装 - -- 通过yum挂载repo源实现 - - 配置yum源:openEuler22.09 和 openEuler22.09:Epol,repo源路径:/etc/yum.repos.d/openEuler.repo。 - - ```ini - [everything] # openEuler 22.09 官方发布源 - name=openEuler22.09 - baseurl=https://repo.openeuler.org/openEuler-22.09/everything/$basearch/ - enabled=1 - gpgcheck=1 - gpgkey=https://repo.openeuler.org/openEuler-22.09/everything/$basearch/RPM-GPG-KEY-openEuler - - [Epol] # openEuler 22.09:Epol 官方发布源 - name=Epol - baseurl=https://repo.openeuler.org/openEuler-22.09/EPOL/main/$basearch/ - enabled=1 - gpgcheck=1 - 
gpgkey=https://repo.openeuler.org/openEuler-22.09/OS/$basearch/RPM-GPG-KEY-openEuler - ``` - - 然后执行如下指令下载以及安装gala-ragdoll及其依赖。 - - ```shell - # A-Ops 架构感知,通常安装在主节点上 - yum install gala-spider - yum install python3-gala-spider - - # A-Ops 架构感知探针,通常安装在主节点上 - yum install gala-gopher - ``` - -- 通过安装rpm包实现。先下载gala-ragdoll-vx.x.x-x.oe1.aarch64.rpm,然后执行如下命令进行安装(其中x.x-x表示版本号,请用实际情况替代)。 - - ```shell - rpm -ivh gala-spider-vx.x.x-x.oe1.aarch64.rpm - - rpm -ivh gala-gopher-vx.x.x-x.oe1.aarch64.rpm - ``` - - - -#### 使用Aops部署服务安装 - -##### 编辑任务列表 - -修改部署任务列表,打开gala_ragdoll步骤开关: - -```yaml ---- -step_list: - ... - gala_gopher: - enable: false - continue: false - gala_spider: - enable: false - continue: false - ... -``` - -##### 编辑主机清单 - -具体步骤参见[部署管理使用手册](部署管理使用手册.md)章节2.2.2.11章节gala-spider与gala-gopher模块主机配置 - -##### 编辑变量列表 - -具体步骤参见[部署管理使用手册](部署管理使用手册.md)章节2.2.2.11章节gala-spider与gala-gopher模块变量配置 - -##### 执行部署任务 - -具体步骤参见[部署管理使用手册](部署管理使用手册.md)章节3执行部署任务 \ No newline at end of file diff --git "a/docs/zh/docs/A-Ops/\351\205\215\347\275\256\346\272\257\346\272\220\346\234\215\345\212\241\344\275\277\347\224\250\346\211\213\345\206\214.md" "b/docs/zh/docs/A-Ops/\351\205\215\347\275\256\346\272\257\346\272\220\346\234\215\345\212\241\344\275\277\347\224\250\346\211\213\345\206\214.md" deleted file mode 100644 index 97afbb57203f58cc8af1c0239e4f51314814f29e..0000000000000000000000000000000000000000 --- "a/docs/zh/docs/A-Ops/\351\205\215\347\275\256\346\272\257\346\272\220\346\234\215\345\212\241\344\275\277\347\224\250\346\211\213\345\206\214.md" +++ /dev/null @@ -1,166 +0,0 @@ -gala-ragdoll的使用指导 -============================ - -## 安装 - -#### 手动安装 - -- 通过yum挂载repo源实现 - - 配置yum源:openEuler22.09 和 openEuler22.09:Epol,repo源路径:/etc/yum.repos.d/openEuler.repo。 - - ```ini - [everything] # openEuler 22.09 官方发布源 - name=openEuler22.09 - baseurl=https://repo.openeuler.org/openEuler-22.09/everything/$basearch/ - enabled=1 - gpgcheck=1 - 
gpgkey=https://repo.openeuler.org/openEuler-22.09/everything/$basearch/RPM-GPG-KEY-openEuler - - [Epol] # openEuler 22.09:Epol 官方发布源 - name=Epol - baseurl=https://repo.openeuler.org/openEuler-22.09/EPOL/main/$basearch/ - enabled=1 - gpgcheck=1 - gpgkey=https://repo.openeuler.org/openEuler-22.09/OS/$basearch/RPM-GPG-KEY-openEuler - ``` - - 然后执行如下指令下载以及安装gala-ragdoll及其依赖。 - - ```shell - yum install gala-ragdoll # A-Ops 配置溯源 - yum install python3-gala-ragdoll - - yum install gala-spider # A-Ops 架构感知 - yum install python3-gala-spider - ``` - -- 通过安装rpm包实现。先下载gala-ragdoll-vx.x.x-x.oe1.aarch64.rpm,然后执行如下命令进行安装(其中x.x-x表示版本号,请用实际情况替代) - - ```shell - rpm -ivh gala-ragdoll-vx.x.x-x.oe1.aarch64.rpm - ``` - - - -#### 使用Aops部署服务安装 - -##### 编辑任务列表 - -修改部署任务列表,打开gala_ragdoll步骤开关: - -```yaml ---- -step_list: - ... - gala_ragdoll: - enable: false - continue: false - ... -``` - -##### 编辑主机清单 - -具体步骤参见[部署管理使用手册](部署管理使用手册.md)章节2.2.2.10章节gala-ragdoll模块主机配置 - -##### 编辑变量列表 - -具体步骤参见[部署管理使用手册](部署管理使用手册.md)章节2.2.2.10章节gala-ragdoll模块变量配置 - -##### 执行部署任务 - -具体步骤参见[部署管理使用手册](部署管理使用手册.md)章节3执行部署任务 - - - -### 配置文件介绍 - -```/etc/yum.repos.d/openEuler.repo```是用来规定yum源地址的配置文件,该配置文件内容为: - -``` -[OS] -name=OS -baseurl=http://repo.openeuler.org/openEuler-22.09/OS/$basearch/ -enabled=1 -gpgcheck=1 -gpgkey=http://repo.openeuler.org/openEuler-22.09/OS/$basearch/RPM-GPG-KEY-openEuler -``` - -### yang模型介绍 - -`/etc/yum.repos.d/openEuler.repo`采用yang语言进行表示,参见`gala-ragdoll/yang_modules/openEuler-logos-openEuler.repo.yang`; -其中增加了三个拓展字段: - -| 拓展字段名称 | 拓展字段格式 | 样例 | -| ------------ | ---------------------- | ----------------------------------------- | -| path | OS类型:配置文件的路径 | openEuler:/etc/yum.repos.d/openEuler.repo | -| type | 配置文件类型 | ini、key-value、json、text等 | -| spacer | 配置项和配置值的中间键 | “ ”、“=”、“:”等 | - -附:yang语言的学习地址:https://tonydeng.github.io/rfc7950-zh/ - -### 通过配置溯源创建域 - -#### 查看配置文件 - -gala-ragdoll中存在配置溯源的配置文件 - -``` -[root@openeuler-development-1-1drnd ~]# cat /etc/ragdoll/gala-ragdoll.conf -[git] // 
定义当前的git信息:包括git仓的目录和用户信息 -git_dir = "/home/confTraceTestConf" -user_name = "user" -user_email = "email" - -[collect] // A-OPS 对外提供的collect接口 -collect_address = "http://192.168.0.0:11111" -collect_api = "/manage/config/collect" - -[ragdoll] -port = 11114 - -``` - -#### 创建配置域 - - -![](./figures/chuangjianyewuyu.png) - - - -#### 添加配置域纳管node - -![](./figures/tianjianode.png) - - - -#### 添加配置域配置 - - -![](./figures/xinzengpeizhi.png) - -#### 查询预期配置 - - -![](./figures/chakanyuqi.png) - -#### 删除配置 - -![](./figures/shanchupeizhi.png) - -#### 查询实际配置 - -![](./figures/chaxunshijipeizhi.png) - - - -#### 配置校验 - - -![](./figures/zhuangtaichaxun.png) - - - -#### 配置同步 - -暂未提供 diff --git a/docs/zh/docs/A-Tune/A-Tune.md b/docs/zh/docs/A-Tune/A-Tune.md deleted file mode 100644 index cb0c369559000de42481e7f9ad90f7a4380574de..0000000000000000000000000000000000000000 --- a/docs/zh/docs/A-Tune/A-Tune.md +++ /dev/null @@ -1,5 +0,0 @@ -# A-Tune 用户指南 - -本文档介绍openEuler系统性能自优化软件A-Tune的安装部署和使用方法,以指导用户快速了解并使用A-Tune。 - -本文档适用于使用openEuler系统并希望了解和使用A-Tune的社区开发者、开源爱好者以及相关合作伙伴。使用人员需要具备基本的Linux操作系统知识。 \ No newline at end of file diff --git a/docs/zh/docs/A-Tune/figures/picture1.png b/docs/zh/docs/A-Tune/figures/picture1.png deleted file mode 100644 index 52d496e95f06ef8636730dbbc1aa84d88aea6a34..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Tune/figures/picture1.png and /dev/null differ diff --git a/docs/zh/docs/A-Tune/figures/picture4.png b/docs/zh/docs/A-Tune/figures/picture4.png deleted file mode 100644 index 85d57aa2024615a6f0fbff5a7d2a207941eb3085..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Tune/figures/picture4.png and /dev/null differ diff --git a/docs/zh/docs/A-Tune/figures/zh-cn_image_0213178479.png b/docs/zh/docs/A-Tune/figures/zh-cn_image_0213178479.png deleted file mode 100644 index d245d48dc07e2b01734e21ec1952e89fa9269bdb..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Tune/figures/zh-cn_image_0213178479.png and 
/dev/null differ diff --git a/docs/zh/docs/A-Tune/figures/zh-cn_image_0213178480.png b/docs/zh/docs/A-Tune/figures/zh-cn_image_0213178480.png deleted file mode 100644 index a32856aa08e459ed0f51f8fcf4c2f51511c12095..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Tune/figures/zh-cn_image_0213178480.png and /dev/null differ diff --git a/docs/zh/docs/A-Tune/figures/zh-cn_image_0214540398.png b/docs/zh/docs/A-Tune/figures/zh-cn_image_0214540398.png deleted file mode 100644 index cea2292307b57854aa629ec102a5bc1b16d244a0..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Tune/figures/zh-cn_image_0214540398.png and /dev/null differ diff --git a/docs/zh/docs/A-Tune/figures/zh-cn_image_0227497000.png b/docs/zh/docs/A-Tune/figures/zh-cn_image_0227497000.png deleted file mode 100644 index db9b5ce8b6d211d54ea36930504cca415ddfb8ca..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Tune/figures/zh-cn_image_0227497000.png and /dev/null differ diff --git a/docs/zh/docs/A-Tune/figures/zh-cn_image_0227497343.png b/docs/zh/docs/A-Tune/figures/zh-cn_image_0227497343.png deleted file mode 100644 index aecf293846ebd12f15b9a3fb5fdc2618d9d527dc..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Tune/figures/zh-cn_image_0227497343.png and /dev/null differ diff --git a/docs/zh/docs/A-Tune/figures/zh-cn_image_0231122163.png b/docs/zh/docs/A-Tune/figures/zh-cn_image_0231122163.png deleted file mode 100644 index 66bf082a6537ad70c84e4e8f07de745f973482b9..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Tune/figures/zh-cn_image_0231122163.png and /dev/null differ diff --git a/docs/zh/docs/A-Tune/figures/zh-cn_image_0245342444.png b/docs/zh/docs/A-Tune/figures/zh-cn_image_0245342444.png deleted file mode 100644 index 10f0fceb42c00c80ef49decdc0c480eb04c2ca6d..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Tune/figures/zh-cn_image_0245342444.png and /dev/null differ 
diff --git a/docs/zh/docs/A-Tune/native-turbo.md b/docs/zh/docs/A-Tune/native-turbo.md deleted file mode 100644 index 35d86d1d6cce2c27334b3fb165ba04047c4ad364..0000000000000000000000000000000000000000 --- a/docs/zh/docs/A-Tune/native-turbo.md +++ /dev/null @@ -1,55 +0,0 @@ -# native-turbo特性 - -## 简介 - -大型程序的代码段、数据段可达数百MB,关键业务流程TLB miss较高。内核页表大小对性能有影响。 - -为了方便用户使用大页,native-turbo特性实现了加载程序时自动使用大页的功能,可以针对代码段、数据段使用大页。 - -## 使用方法 - -1. 打开特性开关 - - 该特性有两级开关,sysctl fs.exec-use-hugetlb用于控制本系统是否打开该特性(由root用户控制,0不打开,1打开,其他值非法)。 - - 如果不打开该开关,即使用户设置了环境变量也不会使用该特性,内核会忽略相关流程。 - - 系统打开该特性后,普通用户可以通过环境变量HUGEPAGE_PROBE自行决定运行的程序是否需要使用大页(1使用,不设置或其他值不使用)。 - - ```shell - sysctl fs.exec-use-hugetlb=1 #主程序使用大页 - export HUGEPAGE_PROBE=1 #动态库使用大页 - ``` - - 动态库大页也可以使用LD_HUGEPAGE_LIB=1环境变量强制所有段使用大页。 - -2. 标记需要使用大页的段,默认标记所有段,-x表示仅代码段,-d清除已有标记。 - - ```shell - hugepageedit [-x] [-d] app - ``` - - 该工具由glibc-devel包提供。 - -3. 启动程序 - - ./app - -## 约束限制 - -1. 程序与动态库必须按照2M对齐编译,可通过添加如下gcc编译参数实现: - - ```shell - -zcommon-page-size=0x200000 -zmax-page-size=0x200000 - ``` - -2. 使用前需要预留足够的大页,否则程序会执行失败。 - - 如果使用cgruop,请注意hugetlb的限制,如果限制小于所需大页数量,可能导致运行时崩溃。 - -3. 由于进程页表改为2M,mprotect等系统调用的参数需要按2M对齐,否则会执行失败。 - -4. 不支持libcareplus热补丁机制。 - -5. 
多个进程间无法共享大页,会消耗多倍内存。 - diff --git a/docs/zh/docs/A-Tune/public_sys-resources/icon-caution.gif b/docs/zh/docs/A-Tune/public_sys-resources/icon-caution.gif deleted file mode 100644 index 6e90d7cfc2193e39e10bb58c38d01a23f045d571..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Tune/public_sys-resources/icon-caution.gif and /dev/null differ diff --git a/docs/zh/docs/A-Tune/public_sys-resources/icon-danger.gif b/docs/zh/docs/A-Tune/public_sys-resources/icon-danger.gif deleted file mode 100644 index 6e90d7cfc2193e39e10bb58c38d01a23f045d571..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Tune/public_sys-resources/icon-danger.gif and /dev/null differ diff --git a/docs/zh/docs/A-Tune/public_sys-resources/icon-note.gif b/docs/zh/docs/A-Tune/public_sys-resources/icon-note.gif deleted file mode 100644 index 6314297e45c1de184204098efd4814d6dc8b1cda..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Tune/public_sys-resources/icon-note.gif and /dev/null differ diff --git a/docs/zh/docs/A-Tune/public_sys-resources/icon-notice.gif b/docs/zh/docs/A-Tune/public_sys-resources/icon-notice.gif deleted file mode 100644 index 86024f61b691400bea99e5b1f506d9d9aef36e27..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Tune/public_sys-resources/icon-notice.gif and /dev/null differ diff --git a/docs/zh/docs/A-Tune/public_sys-resources/icon-tip.gif b/docs/zh/docs/A-Tune/public_sys-resources/icon-tip.gif deleted file mode 100644 index 93aa72053b510e456b149f36a0972703ea9999b7..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/A-Tune/public_sys-resources/icon-tip.gif and /dev/null differ diff --git a/docs/zh/docs/A-Tune/public_sys-resources/icon-warning.gif b/docs/zh/docs/A-Tune/public_sys-resources/icon-warning.gif deleted file mode 100644 index 6e90d7cfc2193e39e10bb58c38d01a23f045d571..0000000000000000000000000000000000000000 Binary files 
a/docs/zh/docs/A-Tune/public_sys-resources/icon-warning.gif and /dev/null differ diff --git "a/docs/zh/docs/A-Tune/\344\275\277\347\224\250\346\226\271\346\263\225.md" "b/docs/zh/docs/A-Tune/\344\275\277\347\224\250\346\226\271\346\263\225.md" deleted file mode 100644 index 4e3099da2d976e66ece738912e5851c3ccf42114..0000000000000000000000000000000000000000 --- "a/docs/zh/docs/A-Tune/\344\275\277\347\224\250\346\226\271\346\263\225.md" +++ /dev/null @@ -1,1155 +0,0 @@ -# 使用方法 -用户可以通过命令行客户端atune-adm使用A-Tune提供的功能。本章介绍A-Tune客户端包含的功能和使用方法。 - - - -- [使用方法](#使用方法) - - [总体说明](#总体说明) - - [查询负载类型](#查询负载类型) - - [list](#list) - - [分析负载类型并自优化](#分析负载类型并自优化) - - [analysis](#analysis) - - [自定义模型](#自定义模型) - - [define](#define) - - [collection](#collection) - - [train](#train) - - [undefine](#undefine) - - [查询profile](#查询profile) - - [info](#info) - - [更新profile](#更新profile) - - [update](#update) - - [激活profile](#激活profile) - - [profile](#profile) - - [回滚profile](#回滚profile) - - [rollback](#rollback) - - [更新数据库](#更新数据库) - - [upgrade](#upgrade) - - [系统信息查询](#系统信息查询) - - [check](#check) - - [参数自调优](#参数自调优) - - [tuning](#tuning) - - - -## 总体说明 - -- 使用A-Tune需要使用root权限。 -- atune-adm支持的命令可以通过 **atune-adm help/--help/-h** 查询。 -- define、update、undefine、collection、train、upgrade不支持远程执行。 -- 命令格式中,\[ \] 表示参数可选,<\> 表示参数必选,具体参数由实际情况确定。 - - -## 查询负载类型 -### list - -### 功能描述 - -查询系统当前支持的profile,以及当前处于active状态的profile。 - -### 命令格式 - -**atune-adm list** - -### 使用示例 - -``` -# atune-adm list - -Support profiles: -+------------------------------------------------+-----------+ -| ProfileName | Active | -+================================================+===========+ -| arm-native-android-container-robox | false | -+------------------------------------------------+-----------+ -| basic-test-suite-euleros-baseline-fio | false | -+------------------------------------------------+-----------+ -| basic-test-suite-euleros-baseline-lmbench | false | -+------------------------------------------------+-----------+ 
-| basic-test-suite-euleros-baseline-netperf | false | -+------------------------------------------------+-----------+ -| basic-test-suite-euleros-baseline-stream | false | -+------------------------------------------------+-----------+ -| basic-test-suite-euleros-baseline-unixbench | false | -+------------------------------------------------+-----------+ -| basic-test-suite-speccpu-speccpu2006 | false | -+------------------------------------------------+-----------+ -| basic-test-suite-specjbb-specjbb2015 | false | -+------------------------------------------------+-----------+ -| big-data-hadoop-hdfs-dfsio-hdd | false | -+------------------------------------------------+-----------+ -| big-data-hadoop-hdfs-dfsio-ssd | false | -+------------------------------------------------+-----------+ -| big-data-hadoop-spark-bayesian | false | -+------------------------------------------------+-----------+ -| big-data-hadoop-spark-kmeans | false | -+------------------------------------------------+-----------+ -| big-data-hadoop-spark-sql1 | false | -+------------------------------------------------+-----------+ -| big-data-hadoop-spark-sql10 | false | -+------------------------------------------------+-----------+ -| big-data-hadoop-spark-sql2 | false | -+------------------------------------------------+-----------+ -| big-data-hadoop-spark-sql3 | false | -+------------------------------------------------+-----------+ -| big-data-hadoop-spark-sql4 | false | -+------------------------------------------------+-----------+ -| big-data-hadoop-spark-sql5 | false | -+------------------------------------------------+-----------+ -| big-data-hadoop-spark-sql6 | false | -+------------------------------------------------+-----------+ -| big-data-hadoop-spark-sql7 | false | -+------------------------------------------------+-----------+ -| big-data-hadoop-spark-sql8 | false | -+------------------------------------------------+-----------+ -| big-data-hadoop-spark-sql9 | false | 
-+------------------------------------------------+-----------+ -| big-data-hadoop-spark-tersort | false | -+------------------------------------------------+-----------+ -| big-data-hadoop-spark-wordcount | false | -+------------------------------------------------+-----------+ -| cloud-compute-kvm-host | false | -+------------------------------------------------+-----------+ -| database-mariadb-2p-tpcc-c3 | false | -+------------------------------------------------+-----------+ -| database-mariadb-4p-tpcc-c3 | false | -+------------------------------------------------+-----------+ -| database-mongodb-2p-sysbench | false | -+------------------------------------------------+-----------+ -| database-mysql-2p-sysbench-hdd | false | -+------------------------------------------------+-----------+ -| database-mysql-2p-sysbench-ssd | false | -+------------------------------------------------+-----------+ -| database-postgresql-2p-sysbench-hdd | false | -+------------------------------------------------+-----------+ -| database-postgresql-2p-sysbench-ssd | false | -+------------------------------------------------+-----------+ -| default-default | false | -+------------------------------------------------+-----------+ -| docker-mariadb-2p-tpcc-c3 | false | -+------------------------------------------------+-----------+ -| docker-mariadb-4p-tpcc-c3 | false | -+------------------------------------------------+-----------+ -| hpc-gatk4-human-genome | false | -+------------------------------------------------+-----------+ -| in-memory-database-redis-redis-benchmark | false | -+------------------------------------------------+-----------+ -| middleware-dubbo-dubbo-benchmark | false | -+------------------------------------------------+-----------+ -| storage-ceph-vdbench-hdd | false | -+------------------------------------------------+-----------+ -| storage-ceph-vdbench-ssd | false | -+------------------------------------------------+-----------+ -| 
virtualization-consumer-cloud-olc | false | -+------------------------------------------------+-----------+ -| virtualization-mariadb-2p-tpcc-c3 | false | -+------------------------------------------------+-----------+ -| virtualization-mariadb-4p-tpcc-c3 | false | -+------------------------------------------------+-----------+ -| web-apache-traffic-server-spirent-pingpo | false | -+------------------------------------------------+-----------+ -| web-nginx-http-long-connection | true | -+------------------------------------------------+-----------+ -| web-nginx-https-short-connection | false | -+------------------------------------------------+-----------+ - -``` - ->![](./public_sys-resources/icon-note.gif) **说明:** ->Active为true表示当前激活的profile,示例表示当前激活的profile是web-nginx-http-long-connection。 - -## 分析负载类型并自优化 -### analysis - -### 功能描述 - -采集系统的实时统计数据进行负载类型识别,并进行自动优化。 - -### 命令格式 - -**atune-adm analysis** \[OPTIONS\] - -### 参数说明 - -- OPTIONS - - - - - - - - - - - - - - - - - - - - - - -

-    | 参数                     | 描述                                       |
-    | ------------------------ | ------------------------------------------ |
-    | --model, -m              | 用户自训练产生的新模型                     |
-    | --characterization, -c   | 使用默认的模型进行应用识别,不进行自动优化 |
-    | --times value, -t value  | 指定收集数据的时长                         |
-    | --script value, -s value | 指定需要运行的文件                         |
-
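上表中的各参数可以组合使用。下面给出一个示意性的 Python 片段(假设性代码,非 A-Tune 自带接口,仅演示参数组合方式,假设 atune-adm 已安装在 PATH 中),按所选参数拼装 atune-adm analysis 命令行:

```python
# 示意:根据所选参数拼装 atune-adm analysis 命令行
# (仅作说明用途,选项名取自上表)
def build_analysis_cmd(model=None, characterization=False, times=None, script=None):
    cmd = ["atune-adm", "analysis"]
    if model:
        # 使用自训练产生的新模型
        cmd += ["--model", model]
    if characterization:
        # 仅做应用识别,不进行自动优化
        cmd.append("--characterization")
    if times is not None:
        # 指定收集数据的时长
        cmd += ["--times", str(times)]
    if script:
        # 指定需要运行的文件
        cmd += ["--script", script]
    return cmd

print(build_analysis_cmd(model="/usr/libexec/atuned/analysis/models/new-model.m"))
```

拼装好的命令行可交由 subprocess 等方式执行,效果与下文使用示例中的命令一致。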
- - -### 使用示例 - -- 使用默认的模型进行应用识别 - - ``` - # atune-adm analysis --characterization - ``` - -- 使用默认的模型进行应用识别,并进行自动优化 - - ``` - # atune-adm analysis - ``` - -- 使用自训练的模型进行应用识别 - - ``` - # atune-adm analysis --model /usr/libexec/atuned/analysis/models/new-model.m - ``` - - -## 自定义模型 - -A-Tune支持用户定义并学习新模型。定义新模型的操作流程如下: - -1. 用define命令定义一个新应用的profile -2. 用collection命令收集应用对应的系统数据 -3. 用train命令训练得到模型 - - - -### define - -### 功能描述 - -添加用户自定义的应用场景,及对应的profile优化项。 - -### 命令格式 - -**atune-adm define** - -### 使用示例 - -新增一个profile,service_type的名称为test_service,application_name的名称为test_app,scenario_name的名称为test_scenario,优化项的配置文件为example.conf。 - -``` -# atune-adm define test_service test_app test_scenario ./example.conf -``` - -example.conf 可以参考如下方式书写(以下各优化项非必填,仅供参考),也可通过**atune-adm info**查看已有的profile是如何书写的。 - -``` - [main] - # list its parent profile - [kernel_config] - # to change the kernel config - [bios] - # to change the bios config - [bootloader.grub2] - # to change the grub2 config - [sysfs] - # to change the /sys/* config - [systemctl] - # to change the system service status - [sysctl] - # to change the /proc/sys/* config - [script] - # the script extension of cpi - [ulimit] - # to change the resources limit of user - [schedule_policy] - # to change the schedule policy - [check] - # check the environment - [tip] - # the recommended optimization, which should be performed manunaly -``` - -### collection - -### 功能描述 - -采集业务运行时系统的全局资源使用情况以及OS的各项状态信息,并将收集的结果保存到csv格式的输出文件中,作为模型训练的输入数据集。 - ->![](./public_sys-resources/icon-note.gif) **说明:** ->- 本命令依赖采样工具perf,mpstat,vmstat,iostat,sar。 ->- CPU型号目前仅支持鲲鹏920,可通过dmidecode -t processor检查CPU型号。 - -### 命令格式 - -**atune-adm collection** - -### 参数说明 - -- OPTIONS - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

-    | 参数              | 描述                                                 |
-    | ----------------- | ---------------------------------------------------- |
-    | --filename, -f    | 生成的用于训练的csv文件名:名称-时间戳.csv           |
-    | --output_path, -o | 生成的csv文件的存放路径,需提供绝对路径              |
-    | --disk, -b        | 业务运行时实际使用的磁盘,如/dev/sda                 |
-    | --network, -n     | 业务运行时使用的网络接口,如eth0                     |
-    | --app_type, -t    | 标记业务的应用类型,作为训练时使用的标签             |
-    | --duration, -d    | 业务运行时采集数据的时间,单位秒,默认采集时间1200秒 |
-    | --interval, -i    | 采集数据的时间间隔,单位秒,默认采集间隔5秒          |
-
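按上表默认值(采集 1200 秒、间隔 5 秒),一次采集约产生 240 个采样点。以下为一个最小示意(假设按固定间隔均匀采样,非 A-Tune 自带代码),可用于在采集前估算数据集规模:

```python
# 示意:估算一次 collection 采集将产生的采样点数量
# 默认值与上表一致:duration=1200 秒,interval=5 秒
def estimate_samples(duration=1200, interval=5):
    """按固定间隔均匀采样时,估算一次采集产生的采样点数量。"""
    if duration <= 0 or interval <= 0:
        raise ValueError("duration 与 interval 必须为正数")
    return duration // interval

print(estimate_samples())        # 默认参数:1200 // 5
print(estimate_samples(600, 10)  # 采集 600 秒、间隔 10 秒
      )
```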
- - -### 使用示例 - -``` -# atune-adm collection --filename name --interval 5 --duration 1200 --output_path /home/data --disk sda --network eth0 --app_type test_service-test_app-test_scenario -``` -> 说明: -> -> 实例中定义了每隔5秒收集一次数据,一共收集1200秒;采集后的数据存放在/home/data目录下名称为name的文件中,业务的应用类型是通过atune-adm define指定的业务类型,这里为test_service-test_app-test_scenario -> 采集间隔和采集时间都可以通过上述选项指定时长。 -### train - -### 功能描述 - -使用采集的数据进行模型的训练。训练时至少采集两种应用类型的数据,否则训练会出错。 - -### 命令格式 - -**atune-adm train** - -### 参数说明 - -- OPTIONS - - - - - - - - - - - - - -

-    | 参数              | 描述                            |
-    | ----------------- | ------------------------------- |
-    | --data_path, -d   | 存放模型训练所需的csv文件的目录 |
-    | --output_file, -o | 训练生成的新模型                |
-
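如上文功能描述所述,train 至少需要两种应用类型的数据,否则训练会出错。以下是一个示意性的预检查片段(假设 csv 中含名为 app_type 的标签列,该字段名仅为示例,实际以 collection 生成的 csv 为准):

```python
# 示意:训练前校验采集数据是否覆盖至少两种应用类型
import csv
import io

def distinct_app_types(csv_texts):
    """从若干 csv 文本中提取 app_type 标签集合(字段名为假设)。"""
    labels = set()
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            if row.get("app_type"):
                labels.add(row["app_type"])
    return labels

def check_trainable(csv_texts):
    """训练集中至少需要两种不同的应用类型,否则提前报错。"""
    labels = distinct_app_types(csv_texts)
    if len(labels) < 2:
        raise ValueError("训练至少需要两种应用类型的数据,当前仅有: %s" % labels)
    return labels
```

在执行 atune-adm train 之前做一次此类校验,可以提前发现数据集覆盖不足的问题,而不必等到训练失败。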
- - -### 使用示例 - -使用data目录下的csv文件作为训练输入,生成的新模型new-model.m存放在model目录下。 - -``` -# atune-adm train --data_path /home/data --output_file /usr/libexec/atuned/analysis/models/new-model.m -``` - -### undefine - -### 功能描述 - -删除用户自定义的profile。 - -### 命令格式 - -**atune-adm undefine** - -### 使用示例 - -删除自定义的profile。 - -``` -# atune-adm undefine test_service-test_app-test_scenario -``` - -## 查询profile - -### info - -### 功能描述 - -查看对应的profile内容。 - -### 命令格式 - -**atune-adm info** - -### 使用示例 - -查看web-nginx-http-long-connection的profile内容: - -``` -# atune-adm info web-nginx-http-long-connection - -*** web-nginx-http-long-connection: - -# -# nginx http long connection A-Tune configuration -# -[main] -include = default-default - -[kernel_config] -#TODO CONFIG - -[bios] -#TODO CONFIG - -[bootloader.grub2] -iommu.passthrough = 1 - -[sysfs] -#TODO CONFIG - -[systemctl] -sysmonitor = stop -irqbalance = stop - -[sysctl] -fs.file-max = 6553600 -fs.suid_dumpable = 1 -fs.aio-max-nr = 1048576 -kernel.shmmax = 68719476736 -kernel.shmall = 4294967296 -kernel.shmmni = 4096 -kernel.sem = 250 32000 100 128 -net.ipv4.tcp_tw_reuse = 1 -net.ipv4.tcp_syncookies = 1 -net.ipv4.ip_local_port_range = 1024 65500 -net.ipv4.tcp_max_tw_buckets = 5000 -net.core.somaxconn = 65535 -net.core.netdev_max_backlog = 262144 -net.ipv4.tcp_max_orphans = 262144 -net.ipv4.tcp_max_syn_backlog = 262144 -net.ipv4.tcp_timestamps = 0 -net.ipv4.tcp_synack_retries = 1 -net.ipv4.tcp_syn_retries = 1 -net.ipv4.tcp_fin_timeout = 1 -net.ipv4.tcp_keepalive_time = 60 -net.ipv4.tcp_mem = 362619 483495 725238 -net.ipv4.tcp_rmem = 4096 87380 6291456 -net.ipv4.tcp_wmem = 4096 16384 4194304 -net.core.wmem_default = 8388608 -net.core.rmem_default = 8388608 -net.core.rmem_max = 16777216 -net.core.wmem_max = 16777216 - -[script] -prefetch = off -ethtool = -X {network} hfunc toeplitz - -[ulimit] -{user}.hard.nofile = 102400 -{user}.soft.nofile = 102400 - -[schedule_policy] -#TODO CONFIG - -[check] -#TODO CONFIG - -[tip] -SELinux provides extra 
control and security features to linux kernel. Disabling SELinux will improve the performance but may cause security risks. = kernel -disable the nginx log = application -``` - -## 更新profile - -用户根据需要更新已有profile。 -### update - -### 功能描述 - -将已有profile中原来的优化项更新为new.conf中的内容。 - -### 命令格式 - -**atune-adm update** - -### 使用示例 - -更新名为test_service-test_app-test_scenario的profile优化项为new.conf。 - -``` -# atune-adm update test_service-test_app-test_scenario ./new.conf -``` - -## 激活profile -### profile - -### 功能描述 - -手动激活profile,使其处于active状态。 - -### 命令格式 - -**atune-adm profile** - -### 参数说明 - -profile名参考list命令查询结果。 - -### 使用示例 - -激活web-nginx-http-long-connection对应的profile配置。 - -``` -# atune-adm profile web-nginx-http-long-connection -``` - -## 回滚profile -### rollback - -### 功能描述 - -回退当前的配置到系统的初始配置。 - -### 命令格式 - -**atune-adm rollback** - -### 使用示例 - -``` -# atune-adm rollback -``` - -## 更新数据库 -### upgrade - -### 功能描述 - -更新系统的数据库。 - -### 命令格式 - -**atune-adm upgrade** - -### 参数说明 - -- DB\_FILE - - 新的数据库文件路径 - - -### 使用示例 - -数据库更新为new\_sqlite.db。 - -``` -# atune-adm upgrade ./new_sqlite.db -``` - -## 系统信息查询 -### check - -### 功能描述 - -检查系统当前的cpu、bios、os、网卡等信息。 - -### 命令格式 - -**atune-adm check** - -### 使用示例 - -``` -# atune-adm check - cpu information: - cpu:0 version: Kunpeng 920-6426 speed: 2600000000 HZ cores: 64 - cpu:1 version: Kunpeng 920-6426 speed: 2600000000 HZ cores: 64 - system information: - DMIBIOSVersion: 0.59 - OSRelease: 4.19.36-vhulk1906.3.0.h356.eulerosv2r8.aarch64 - network information: - name: eth0 product: HNS GE/10GE/25GE RDMA Network Controller - name: eth1 product: HNS GE/10GE/25GE Network Controller - name: eth2 product: HNS GE/10GE/25GE RDMA Network Controller - name: eth3 product: HNS GE/10GE/25GE Network Controller - name: eth4 product: HNS GE/10GE/25GE RDMA Network Controller - name: eth5 product: HNS GE/10GE/25GE Network Controller - name: eth6 product: HNS GE/10GE/25GE RDMA Network Controller - name: eth7 product: HNS GE/10GE/25GE Network Controller - 
name: docker0 product: -``` - -## 参数自调优 - -A-Tune提供了最佳配置的自动搜索能力,免去人工反复做参数调整、性能评价的调优过程,极大地提升最优配置的搜寻效率。 -### tuning - -### 功能描述 - -使用指定的项目文件对参数进行动态空间的搜索,找到当前环境配置下的最优解。 - -### 命令格式 - ->![](./public_sys-resources/icon-note.gif) **说明:** ->在运行命令前,需要满足如下条件: ->1. 服务端的yaml配置文件已经编辑完成并放置于 atuned服务下的**/etc/atuned/tuning/**目录中。 ->2. 客户端的yaml配置文件已经编辑完成并放置于atuned客户端任意目录下。 - -**atune-adm tuning** \[OPTIONS\] - -### 参数说明 - -- OPTIONS - - - - - - - - - - - - - - - - - - - - -

| Parameter | Description |
|-----------|-------------|
| --restore, -r | Restores the initial configuration before tuning. |
| --project, -p | Specifies the project name in the YAML file to be restored. |
| --restart, -c | Performs tuning based on historical tuning results. |
| --detail, -d | Prints detailed information about the tuning process. |

>![](./public_sys-resources/icon-note.gif) **NOTE:**
>When the options are used, the -p option must be followed by a specific project name, and the YAML file of that project must be specified.

- PROJECT\_YAML: YAML configuration file of the client.

### Configuration Description

**Table 1** Server YAML file

| Name | Description | Parameter Type | Value Range |
|------|-------------|----------------|-------------|
| project | Project name. | String | - |
| startworkload | Script for starting the service to be tuned. | String | - |
| stopworkload | Script for stopping the service to be tuned. | String | - |
| maxiterations | Maximum number of tuning iterations, which limits the number of client iterations. Generally, more tuning iterations produce a better optimization effect but take a longer time. Configure this item based on the actual service scenario. | Integer | >10 |
| object | Parameters to be tuned and related information. For details about the object configuration items, see Table 2. | - | - |
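Taken together, the items in Table 1 form the top level of a server-side project file. A minimal skeleton is sketched below (all values are placeholders for illustration, not a working project):

```yaml
project: "example"
maxiterations: 100
startworkload: ""
stopworkload: ""
object :
  # one entry per parameter to be tuned; see Table 2 for the per-entry fields
```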
**Table 2** Description of the object configuration items

| Name | Description | Parameter Type | Value Range |
|------|-------------|----------------|-------------|
| name | Name of the parameter to be tuned. | String | - |
| desc | Description of the parameter to be tuned. | String | - |
| get | Script for querying the parameter value. | - | - |
| set | Script for setting the parameter value. | - | - |
| needrestart | Whether the service needs to be restarted for the parameter to take effect. | Enumeration | "true", "false" |
| type | Parameter type. Currently, the discrete and continuous types are supported, corresponding to discrete and continuous parameters. | Enumeration | "discrete", "continuous" |
| dtype | This item is configured only when type is discrete. Currently, the int, float, and string types are supported. | Enumeration | int, float, string |
| scope | Parameter value range. Valid only when type is discrete and dtype is int or float, or when type is continuous. | Integer/Float | User defined, within the valid range of the parameter |
| step | Step of the parameter value. Used when dtype is int or float. | Integer/Float | User defined |
| items | Enumerated values of the parameter outside the range defined by scope. Used when dtype is int or float. | Integer/Float | User defined, within the valid range of the parameter |
| options | Enumerated value range of the parameter. Used when dtype is string. | String | User defined, within the valid range of the parameter |

**Table 3** Description of the client YAML file configuration items

| Name | Description | Parameter Type | Value Range |
|------|-------------|----------------|-------------|
| project | Project name, which must match the project in the corresponding configuration file on the server. | String | - |
| engine | Tuning algorithm. | String | "random", "forest", "gbrt", "bayes", "extraTrees" |
| iterations | Number of tuning iterations. | Integer | >=10 |
| random_starts | Number of random iterations. | Integer | <iterations |
| feature_filter_engine | Parameter search algorithm, which is used to select important parameters. This item is optional. | String | "lhs" |
| feature_filter_cycle | Number of parameter search rounds, which is used to select important parameters. This item is used together with feature_filter_engine. | Integer | - |
| feature_filter_iters | Number of iterations in each round of parameter search, which is used to select important parameters. This item is used together with feature_filter_engine. | Integer | - |
| split_count | Number of evenly spaced candidate values taken from the value range of a tuning parameter, which is used to select important parameters. This item is used together with feature_filter_engine. | Integer | - |
| benchmark | Performance test script. | - | - |
| evaluations | Performance test evaluation metrics. For details about the evaluations configuration items, see Table 4. | - | - |

**Table 4** Description of the evaluations configuration items

| Name | Description | Parameter Type | Value Range |
|------|-------------|----------------|-------------|
| name | Name of the evaluation metric. | String | - |
| get | Script for obtaining the performance evaluation result. | - | - |
| type | Positive or negative type of the evaluation result. The value positive indicates that the performance value is minimized, and the value negative indicates that the performance value is maximized. | Enumeration | "positive", "negative" |
| weight | Weight percentage of the metric, from 0 to 100. | Integer | 0-100 |
| threshold | Minimum performance requirement of the metric. | Integer | User specified |
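The `get` script of each evaluation metric parses the benchmark output, which is passed in through the `$out` variable. A minimal sketch of how such a grep/awk pipeline behaves follows; the sample output lines are made up for illustration, while in a real run `$out` holds the stdout of the configured benchmark command:

```shell
# Simulated benchmark output; the real $out comes from the "benchmark" script.
out='time cost 12.5
compress_ratio is 2.8'

# Same pipeline shape as the client YAML "get" scripts: select the metric
# line, then print its third whitespace-separated field.
echo "$out" | grep 'time' | awk '{print $3}'            # -> 12.5
echo "$out" | grep 'compress_ratio' | awk '{print $3}'  # -> 2.8
```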
### Configuration Example

Server YAML file example:

```
project: "compress"
maxiterations: 500
startworkload: ""
stopworkload: ""
object :
  -
    name : "compressLevel"
    info :
        desc : "The compresslevel parameter is an integer from 1 to 9 controlling the level of compression"
        get : "cat /root/A-Tune/examples/tuning/compress/compress.py | grep 'compressLevel=' | awk -F '=' '{print $2}'"
        set : "sed -i 's/compressLevel=\\s*[0-9]*/compressLevel=$value/g' /root/A-Tune/examples/tuning/compress/compress.py"
        needrestart : "false"
        type : "continuous"
        scope :
          - 1
          - 9
        dtype : "int"
  -
    name : "compressMethod"
    info :
        desc : "The compressMethod parameter is a string controlling the compression method"
        get : "cat /root/A-Tune/examples/tuning/compress/compress.py | grep 'compressMethod=' | awk -F '=' '{print $2}' | sed 's/\"//g'"
        set : "sed -i 's/compressMethod=\\s*[0-9,a-z,\"]*/compressMethod=\"$value\"/g' /root/A-Tune/examples/tuning/compress/compress.py"
        needrestart : "false"
        type : "discrete"
        options :
          - "bz2"
          - "zlib"
          - "gzip"
        dtype : "string"
```

Client YAML file example:

```
project: "compress"
engine : "gbrt"
iterations : 20
random_starts : 10

benchmark : "python3 /root/A-Tune/examples/tuning/compress/compress.py"
evaluations :
  -
    name: "time"
    info:
        get: "echo '$out' | grep 'time' | awk '{print $3}'"
        type: "positive"
        weight: 20
  -
    name: "compress_ratio"
    info:
        get: "echo '$out' | grep 'compress_ratio' | awk '{print $3}'"
        type: "negative"
        weight: 80
```

### Usage Example

- Download the test data.

  ```
  wget http://cs.fit.edu/~mmahoney/compression/enwik8.zip
  ```

- Prepare the tuning environment.

  Example of the prepare.sh file:

  ```
  #!/usr/bin/bash
  if [ "$#" -ne 1 ]; then
      echo "USAGE: $0 the path of enwik8.zip"
      exit 1
  fi

  path=$(
      cd "$(dirname "$0")"
      pwd
  )

  echo "unzip enwik8.zip"
  unzip "$path"/enwik8.zip

  echo "set FILE_PATH to the path of enwik8 in compress.py"
  sed -i "s#compress/enwik8#$path/enwik8#g" "$path"/compress.py

  echo "update the client and server yaml files"
  sed -i "s#python3 .*compress.py#python3 $path/compress.py#g" "$path"/compress_client.yaml
  sed -i "s# compress/compress.py# $path/compress.py#g" "$path"/compress_server.yaml

  echo "copy the server yaml file to /etc/atuned/tuning/"
  cp "$path"/compress_server.yaml /etc/atuned/tuning/
  ```

  Run the script:

  ```
  sh prepare.sh enwik8.zip
  ```

- Perform tuning.

  ```
  atune-adm tuning --project compress --detail compress_client.yaml
  ```

- Restore the initial configuration before tuning. compress is the project name in the YAML file.

  ```
  atune-adm tuning --restore --project compress
  ```

# Installation and Deployment

This chapter describes how to install and deploy A-Tune.

- [Installation and Deployment](#installation-and-deployment)
    - [Software and Hardware Requirements](#software-and-hardware-requirements)
    - [Environment Preparation](#environment-preparation)
    - [Installing A-Tune](#installing-a-tune)
        - [Installation Modes](#installation-modes)
        - [Installation Procedure](#installation-procedure)
    - [Deploying A-Tune](#deploying-a-tune)
        - [Configuration Description](#configuration-description)
    - [Starting A-Tune](#starting-a-tune)
    - [Starting A-Tune Engine](#starting-a-tune-engine)

## Software and Hardware Requirements

### Hardware Requirements

- Kunpeng 920 processor

### Software Requirements

- Operating system: openEuler 21.03

## Environment Preparation

- Install the openEuler OS. For the installation method, see the *openEuler 21.03 Installation Guide*.

- Installing A-Tune requires root permission.

## Installing A-Tune

This section describes the installation modes and procedure of A-Tune.

### Installation Modes

A-Tune can be installed in single-node, distributed, or cluster mode:

- Single-node mode

  The client and server are installed on the same machine.

- Distributed mode

  The client and server are installed on different machines.

- Cluster mode

  A cluster consists of one client machine and more than one server machine.

A simple diagram of the three installation modes is as follows:

![](./figures/zh-cn_image_0231122163.png)

### Installation Procedure

To install A-Tune, perform the following steps:

1. Mount the openEuler ISO file.

   ```
   # mount openEuler-22.03-LTS-everything-x86_64-dvd.iso /mnt
   ```

   Use the **everything** ISO.
2. Configure a local yum source.

   ```
   # vim /etc/yum.repos.d/local.repo
   ```

   The configuration is as follows:

   ```
   [local]
   name=local
   baseurl=file:///mnt
   gpgcheck=1
   enabled=1
   ```

3. Import the GPG public key of the RPM digital signature to the system.

   ```
   # rpm --import /mnt/RPM-GPG-KEY-openEuler
   ```

4. Install the A-Tune server.

   >![](./public_sys-resources/icon-note.gif) **NOTE:**
   >This step installs both the server and client software packages. For single-node deployment, skip **step 5**.

   ```
   # yum install atune -y
   # yum install atune-engine -y
   ```

5. For distributed deployment, install the A-Tune client.

   ```
   # yum install atune-client -y
   ```

6. Verify the installation. The following command output indicates that the installation is successful.

   ```
   # rpm -qa | grep atune
   atune-client-xxx
   atune-db-xxx
   atune-xxx
   atune-engine-xxx
   ```

## Deploying A-Tune

This section describes the configuration and deployment of A-Tune.

### Configuration Description

The configuration items in the A-Tune configuration file /etc/atuned/atuned.cnf are described as follows:

- A-Tune service startup configuration (modify as required).

  - protocol: Protocol used by the system gRPC service, which can be unix or tcp. unix indicates local socket communication, and tcp indicates socket listening port communication. The default value is unix.
  - address: Listening address of the system gRPC service. The default value is a unix socket. For distributed deployment, change it to the listening IP address.
  - port: Listening port of the system gRPC service, an unused port in the range 0 to 65535. This item is not required when protocol is set to unix.
  - connect: For cluster deployment, the IP list of the nodes where A-Tune is located; IP addresses are separated by commas (,).
  - rest_host: Listening address of the system REST service. The default value is localhost.
  - rest_port: Listening port of the system REST service, an unused port in the range 0 to 65535. The default value is 8383.
  - engine_host: Address for connecting to the system A-Tune engine service.
  - engine_port: Port for connecting to the system A-Tune engine service.
  - sample_num: Number of samples collected during the analysis process. The default value is 20.
  - interval: Interval for collecting samples during the analysis process. The default value is 5s.
  - grpc_tls: Whether to enable SSL/TLS certificate verification for the system gRPC service. By default, this function is disabled. After grpc_tls is enabled, set the following environment variables before running the atune-adm command so that it can communicate with the server:
    - export ATUNE_TLS=yes
    - export ATUNED_CACERT=<path of the client CA certificate>
    - export ATUNED_CLIENTCERT=<path of the client certificate>
    - export ATUNED_CLIENTKEY=<path of the client key>
    - export ATUNED_SERVERCN=server
  - tlsservercafile: Path of the gRPC server CA certificate.
  - tlsservercertfile: Path of the gRPC server certificate.
  - tlsserverkeyfile: Path of the gRPC server key.
  - rest_tls: Whether to enable SSL/TLS certificate verification for the system REST service. By default, this function is enabled.
  - tlsrestcacertfile: Path of the server CA certificate of the system REST service.
  - tlsrestservercertfile: Path of the server certificate of the system REST service.
  - tlsrestserverkeyfile: Path of the server key of the system REST service.
  - engine_tls: Whether to enable SSL/TLS certificate verification for the system A-Tune engine service. By default, this function is enabled.
  - tlsenginecacertfile: Path of the client CA certificate of the system A-Tune engine service.
  - tlsengineclientcertfile: Path of the client certificate of the system A-Tune engine service.
  - tlsengineclientkeyfile: Path of the client key of the system A-Tune engine service.

- System information

  The system section contains the parameter information required for system optimization, which must be modified according to the actual system situation.

  - disk: Disk information to be collected during the analysis process, or the disk to be specified for disk-related optimization.
  - network: NIC information to be collected during the analysis process, or the NIC to be specified for NIC-related optimization.
  - user: User name used for ulimit-related optimization. Currently, only the root user is supported.

- Log information

  Change the log level as required. The default level is info. Log information is printed in /var/log/messages.

- Monitor information

  Hardware information collected by default when the system is started.

- Tuning information

  The tuning section contains the parameter information required for offline tuning.

  - noise: Evaluation value of Gaussian noise.
  - sel_feature: Whether to enable the importance ranking output of offline tuning parameters. By default, this function is disabled.

### Configuration Example

```
#################################### server ###############################
 # atuned config
 [server]
 # the protocol grpc server running on
 # ranges: unix or tcp
 protocol = unix

 # the address that the grpc server to bind to
 # default is unix socket /var/run/atuned/atuned.sock
 # ranges: /var/run/atuned/atuned.sock or ip address
 address = /var/run/atuned/atuned.sock

 # the atune nodes in cluster mode, separated by commas
 # it is valid when protocol is tcp
 # connect = ip01,ip02,ip03

 # the atuned grpc listening port
 # the port can be set between 0 to 65535 which not be used
 # port = 60001

 # the rest service listening port, default is 8383
 # the port can be set between 0 to 65535 which not be used
 rest_host = localhost
 rest_port = 8383

 # the tuning optimizer host and port, start by engine.service
 # if engine_host is same as rest_host, two ports cannot be same
 # the port can be set between 0 to 65535 which not be used
 engine_host = localhost
 engine_port = 3838

 # when run analysis command, the numbers of collected data.
 # default is 20
 sample_num = 20

 # interval for collecting data, default is 5s
 interval = 5

 # enable gRPC authentication SSL/TLS
 # default is false
 # grpc_tls = false
 # tlsservercafile = /etc/atuned/grpc_certs/ca.crt
 # tlsservercertfile = /etc/atuned/grpc_certs/server.crt
 # tlsserverkeyfile = /etc/atuned/grpc_certs/server.key

 # enable rest server authentication SSL/TLS
 # default is true
 rest_tls = true
 tlsrestcacertfile = /etc/atuned/rest_certs/ca.crt
 tlsrestservercertfile = /etc/atuned/rest_certs/server.crt
 tlsrestserverkeyfile = /etc/atuned/rest_certs/server.key

 # enable engine server authentication SSL/TLS
 # default is true
 engine_tls = true
 tlsenginecacertfile = /etc/atuned/engine_certs/ca.crt
 tlsengineclientcertfile = /etc/atuned/engine_certs/client.crt
 tlsengineclientkeyfile = /etc/atuned/engine_certs/client.key


 #################################### log ###############################
 [log]
 # either "debug", "info", "warn", "error", "critical", default is "info"
 level = info

 #################################### monitor ###############################
 [monitor]
 # with the module and format of the MPI, the format is {module}_{purpose}
 # the module is Either "mem", "net", "cpu", "storage"
 # the purpose is "topo"
 module = mem_topo, cpu_topo

 #################################### system ###############################
 # you can add arbitrary key-value here, just like key = value
 # you can use the key in the profile
 [system]
 # the disk to be analysis
 disk = sda

 # the network to be analysis
 network = enp189s0f0

 user = root

 #################################### tuning ###############################
 # tuning configs
 [tuning]
 noise = 0.000000001
 sel_feature = false
```

The configuration items in the A-Tune engine configuration file /etc/atuned/engine.cnf are described as follows:

- A-Tune engine service startup configuration (modify as required).

  - engine_host: Listening address of the system A-Tune engine service. The default value is localhost.
  - engine_port: Listening port of the system A-Tune engine service, an unused port in the range 0 to 65535. The default value is 3838.
  - engine_tls: Whether to enable SSL/TLS certificate verification for the system A-Tune engine service. By default, this function is enabled.
  - tlsenginecacertfile: Path of the server CA certificate of the system A-Tune engine service.
  - tlsengineservercertfile: Path of the server certificate of the system A-Tune engine service.
  - tlsengineserverkeyfile: Path of the server key of the system A-Tune engine service.

- Log information

  Change the log level as required. The default level is info. Log information is printed in /var/log/messages.

### Configuration Example

```
 #################################### engine ###############################
 [server]
 # the tuning optimizer host and port, start by engine.service
 # if engine_host is same as rest_host, two ports cannot be same
 # the port can be set between 0 to 65535 which not be used
 engine_host = localhost
 engine_port = 3838

 # enable engine server authentication SSL/TLS
 # default is true
 engine_tls = true
 tlsenginecacertfile = /etc/atuned/engine_certs/ca.crt
 tlsengineservercertfile = /etc/atuned/engine_certs/server.crt
 tlsengineserverkeyfile = /etc/atuned/engine_certs/server.key

 #################################### log ###############################
 [log]
 # either "debug", "info", "warn", "error", "critical", default is "info"
 level = info
```

## Starting A-Tune

After A-Tune is installed, configure and then start the A-Tune service.

- Configure the A-Tune service:

  Modify the NIC and disk information in the atuned.cnf configuration file.

  > NOTE:
  >
  > If the atuned service is installed by running 'make install', the NIC and disk have been automatically updated to the default devices on the current machine. To collect data from other devices, configure the atuned service as follows.

  Run the following command to find the NIC to be specified for data collection or NIC-related optimization, and change the network configuration item in /etc/atuned/atuned.cnf to the specified NIC:

  ```
  ip addr
  ```

  Run the following command to find the disk to be specified for data collection or disk-related optimization, and change the disk configuration item in /etc/atuned/atuned.cnf to the specified disk:

  ```
  fdisk -l | grep dev
  ```

- About certificates:

  The A-Tune engine and client use the gRPC communication protocol, so certificates need to be configured for system security. For information security reasons, A-Tune does not provide a certificate generation method; configure system certificates by yourself.
  If security is not a concern, set the rest_tls and engine_tls configuration items in /etc/atuned/atuned.cnf to false, and set the engine_tls configuration item in /etc/atuned/engine.cnf to false.
  A-Tune is not liable for any consequences caused by not configuring security certificates.

- Start the atuned service:

  ```
  # systemctl start atuned
  ```

- Query the status of the atuned service:

  ```
  # systemctl status atuned
  ```

  If the following command output is displayed, the service is started successfully:

  ![](./figures/zh-cn_image_0214540398.png)
## Starting A-Tune Engine

To use AI-related functions, the A-Tune engine service must be started.

- Start the atune-engine service:

  ```
  # systemctl start atune-engine
  ```

- Query the status of the atune-engine service:

  ```
  # systemctl status atune-engine
  ```

  If the following command output is displayed, the service is started successfully:

  ![](./figures/zh-cn_image_0245342444.png)

## Distributed Deployment

### Purpose of Distributed Deployment

To achieve a distributed architecture and on-demand deployment, A-Tune supports distributed deployment. The three components can be deployed separately. Lightweight component deployment has little impact on services, avoids installing too many dependency packages, and reduces the system load.
Deployment mode: This document describes only one common deployment mode: deploying the client and server on the same node and deploying the engine module on another node. For other deployment modes, consult the A-Tune developers.

**Deployment diagram:**

![Deployment diagram](figures/picture1.png)

### Configuration Files

For distributed deployment, the configuration files need to be modified to record the IP address and port number of the engine, so that the other components can access the engine component at that IP address.

1. Modify the `/etc/atuned/atuned.cnf` file on the server node:

   - Change `engine_host` and `engine_port` in line 34 to the IP address and port number of the engine node. For the figure above, they should be changed to `engine_host = 192.168.0.1` and `engine_port = 3838`.
   - Change rest_tls and engine_tls in lines 49 and 55 to false; otherwise, certificates need to be applied for and configured. SSL certificates do not need to be configured in a test environment, but they must be configured in a production environment; otherwise, security risks exist.

2. Modify the /etc/atuned/engine.cnf file on the engine node:

   - Change `engine_host` and `engine_port` in lines 17 and 18 to the IP address and port number of the engine node. For the figure above, they should be changed to `engine_host = 192.168.0.1` and `engine_port = 3838`.
   - Change the value of engine_tls in line 22 to false.

3. After modifying the configuration files, restart the services for the configuration to take effect:

   - On the server node, run `systemctl restart atuned`.
   - On the engine node, run `systemctl restart atune-engine`.

4. (Optional) Run the tuning command in the `A-Tune/examples/tuning/compress` folder:

   - Perform preprocessing as instructed in `A-Tune/examples/tuning/compress/README`.
   - Run `atune-adm tuning --project compress --detail compress_client.yaml`.
   - The purpose of this step is to check whether the distributed deployment is successful.

### Precautions

1. This document does not describe how to configure authentication certificates in detail. If necessary, you can set rest_tls/engine_tls in atuned.cnf and engine.cnf to false.
2. After modifying the configuration files, restart the services; otherwise, the modifications do not take effect.
3. Do not enable a proxy while using the atune service.
4. The disk and network items in the [system] section of the atuned.cnf file need to be modified. For the modification method, see [Section 2.4.1 of the A-Tune User Guide](https://gitee.com/gaoruoshu/A-Tune/blob/master/Documentation/UserGuide/A-Tune%E7%94%A8%E6%88%B7%E6%8C%87%E5%8D%97.md); the details are not described here.

### Example

#### atuned.cnf

```bash
# ...omitted...

# the tuning optimizer host and port, start by engine.service
# if engine_host is same as rest_host, two ports cannot be same
# the port can be set between 0 to 65535 which not be used
engine_host = 192.168.0.1
engine_port = 3838

# ...omitted...
```

#### engine.cnf

```bash
[server]
# the tuning optimizer host and port, start by engine.service
# if engine_host is same as rest_host, two ports cannot be same
# the port can be set between 0 to 65535 which not be used
engine_host = 192.168.0.1
engine_port = 3838
```

## Cluster Deployment

### Purpose of Cluster Deployment

To support fast tuning in multi-node scenarios, A-Tune can dynamically tune the parameter configurations of multiple nodes at the same time, so that users do not have to tune each node separately, which improves tuning efficiency.
Cluster deployment mode: one primary node and several subnodes. The client and server are deployed on the primary node to receive commands and interact with the engine. The other nodes receive instructions from the primary node and configure the parameters of the current node.

**Deployment diagram:**

![Deployment diagram](figures/picture4.png)

In the figure above, the client and server are deployed on the node whose IP address is 192.168.0.0. Project files are stored on that node; the other nodes do not need project files.

The primary node and subnodes communicate over the TCP protocol, so the configuration files need to be modified.

### Modifying the atuned.cnf Configuration File

1. Set protocol to tcp.
2. Set address to the IP address of the current node.
3. Set connect to the IP addresses of all nodes. The first one is the IP address of the primary node, and the others are the subnode IP addresses, separated by commas (,).
4. During commissioning, rest_tls and engine_tls can be set to false.
5. Modify atuned.cnf on all primary and subnodes as described above.

### Precautions

1. Set `engine_host` and `engine_port` in engine.cnf to the same IP address and port number as `engine_host` and `engine_port` in atuned.cnf on the server.
2. This document does not describe how to configure authentication certificates in detail. If necessary, you can set rest_tls and engine_tls in atuned.cnf and engine.cnf to false.
3. After modifying the configuration files, restart the services; otherwise, the modifications do not take effect.
4. Do not enable a proxy while using the atune service.

### Example

#### atuned.cnf

```bash
# ...omitted...

[server]
# the protocol grpc server running on
# ranges: unix or tcp
protocol = tcp

# the address that the grpc server to bind to
# default is unix socket /var/run/atuned/atuned.sock
# ranges: /var/run/atuned/atuned.sock or ip address
address = 192.168.0.0

# the atune nodes in cluster mode, separated by commas
# it is valid when protocol is tcp
connect = 192.168.0.0,192.168.0.1,192.168.0.2,192.168.0.3

# the atuned grpc listening port
# the port can be set between 0 to 65535 which not be used
port = 60001

# the rest service listening port, default is 8383
# the port can be set between 0 to 65535 which not be used
rest_host = localhost
rest_port = 8383

# the tuning optimizer host and port, start by engine.service
# if engine_host is same as rest_host, two ports cannot be same
# the port can be set between 0 to 65535 which not be used
engine_host = 192.168.1.1
engine_port = 3838

# ...omitted...
```

#### engine.cnf

```bash
[server]
# the tuning optimizer host and port, start by engine.service
# if engine_host is same as rest_host, two ports cannot be same
# the port can be set between 0 to 65535 which not be used
engine_host = 192.168.1.1
engine_port = 3838
```

**Note:** For engine.cnf, refer to the configuration file in distributed deployment.

# FAQs

## Issue 1: An error is reported when the train command is used to train a model, and the message "training data failed" is displayed

Cause: The collection command collects only one type of data.

Solution: Collect data of at least two data types for training.

## Issue 2: atune-adm cannot connect to the atuned service

Possible causes:

1. Check whether the atuned service is started, and check its listening address.

   ```
   # systemctl status atuned
   # netstat -nap | grep atuned
   ```

2. The firewall blocks the listening port of atuned.
3. An HTTP proxy is configured in the system, making the connection fail.

Solutions:

1. If atuned is not started, start the service:

   ```
   # systemctl start atuned
   ```

2. Run the following commands on the servers of atuned and atune-adm to allow the listening port to receive network packets, where 60001 is the listening port number of atuned:

   ```
   # iptables -I INPUT -p tcp --dport 60001 -j ACCEPT
   # iptables -I INPUT -p tcp --sport 60001 -j ACCEPT
   ```
3. Delete the HTTP proxy without affecting services, or exclude the listening IP address from the HTTP proxy:

   ```
   # no_proxy=$no_proxy,<listening address>
   ```

## Issue 3: The atuned service cannot be started, and the message "Job for atuned.service failed because a timeout was exceeded." is displayed

Cause: The localhost configuration is missing from the hosts file.

Solution: Add localhost to the 127.0.0.1 line in the /etc/hosts file.

```
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
```

# Getting to Know A-Tune

- [Getting to Know A-Tune](#getting-to-know-a-tune)
  - [Introduction](#introduction)
  - [Architecture](#architecture)
  - [Supported Features and Service Models](#supported-features-and-service-models)

## Introduction

As the basic software connecting applications and hardware, the operating system is critical to users: adjusting system and application configurations to fully release software and hardware capabilities and achieve optimal service performance. However, hundreds to thousands of service types run on operating systems, with widely different application forms and resource requirements. The current application environment composed of hardware and basic software involves more than 7,000 configuration objects. As service complexity and the number of tuning objects increase, the time cost of tuning grows exponentially and tuning efficiency drops sharply. Tuning has become an extremely complex project that poses great challenges to users.

Second, as infrastructure software, the operating system provides a large number of software and hardware management capabilities. Each capability applies to different scenarios and is not universally beneficial; therefore, different scenarios require enabling or disabling different capabilities, and the capabilities provided by the system must be combined to achieve optimal application performance.

In addition, there are thousands of actual service scenarios and countless computing, network, and storage hardware configurations. A lab cannot exhaustively traverse all applications, service scenarios, and hardware combinations.

To address these challenges, openEuler launched A-Tune.

A-Tune is an AI-based engine that optimizes system performance. It uses AI technologies to build an accurate system profile for service scenarios, perceive and infer service characteristics, make intelligent decisions, and match and recommend the optimal combination of system parameter configurations to keep services in the optimal running state.

![](./figures/zh-cn_image_0227497000.png)

## Architecture

The following figure shows the core technical architecture of A-Tune, which consists of the intelligent decision-making, system profiling, and interaction layers.

- Intelligent decision-making layer: consists of the awareness and decision-making subsystems, which implement intelligent awareness of applications and tuning decisions for the system, respectively.
- System profiling layer: consists of automatic feature engineering and a two-layer classification model. Automatic feature engineering is used to automatically select service features, and the two-layer classification model is used for learning and classifying service models.
- Interaction layer: monitors and configures various system resources; tuning policies are executed in this layer.

![](./figures/zh-cn_image_0227497343.png)

## Supported Features and Service Models

### Supported Features

Table 1 describes the main features supported by A-Tune, the feature maturity, and usage suggestions.

**Table 1** Feature maturity

| Feature | Maturity | Usage Suggestion |
|---------|----------|------------------|
| Automatic optimization of 15 application workload types in 11 categories | Tested | Pilot |
| User-defined profiles and service models | Tested | Pilot |
| Automatic parameter tuning | Tested | Pilot |

### Supported Service Models

Based on the workload characteristics of applications, A-Tune classifies services into 11 categories. For the workload characteristics of each category and the applications supported by A-Tune, see Table 2.

**Table 2** Supported service types and applications

| Service Category | Service Type | Bottleneck | Supported Application |
|------------------|--------------|------------|-----------------------|
| default | Default type | Low resource usage in terms of compute, memory, network, and I/O | N/A |
| webserver | Web application | Compute and network bottlenecks | Nginx, Apache Traffic Server |
| database | Database | Compute, memory, and I/O bottlenecks | MongoDB, MySQL, PostgreSQL, MariaDB |
| big-data | Big data | Compute and memory bottlenecks | Hadoop-HDFS, Hadoop-Spark |
| middleware | Middleware framework | Compute and network bottlenecks | Dubbo |
| in-memory-database | In-memory database | Memory and I/O bottlenecks | Redis |
| basic-test-suite | Basic test suite | Compute and memory bottlenecks | SPECCPU2006, SPECjbb2015 |
| hpc | Human genome | Compute, memory, and I/O bottlenecks | Gatk4 |
| storage | Storage | Network and I/O bottlenecks | Ceph |
| virtualization | Virtualization | Compute, memory, and I/O bottlenecks | Consumer-cloud, MariaDB |
| docker | Container | Compute, memory, and I/O bottlenecks | MariaDB |

# Appendix

- [Appendix](#appendix)
  - [Terminology and Acronyms](#terminology-and-acronyms)

## Terminology and Acronyms

**Table 1** Terminology

| Term | Meaning |
|------|---------|
| profile | A set of optimization items; the optimal parameter configuration. |

# etmem

## Introduction

With the development of CPU computing power, and especially the decreasing cost of ARM cores, memory cost and memory capacity have become the core pain points that restrict service cost and performance. Therefore, saving memory cost and expanding memory capacity are urgent problems for storage to solve.

The etmem memory tiering and extension technology combines DRAM with memory compression or new high-performance storage media to form multi-level memory storage. It grades memory data and migrates cold data from memory media to high-performance storage media, expanding memory capacity and thereby reducing memory cost.

The tools in the etmem software package consist of the etmem client and the etmemd server. The etmemd server tool is resident after being started, and implements functions such as identifying hot and cold memory of target processes and evicting cold memory. The etmem client tool runs once per invocation and, according to the command parameters, controls the etmemd server to respond with different operations.

## Compilation Tutorial

1. Download the etmem source code.

   ```bash
   $ git clone https://gitee.com/openeuler/etmem.git
   ```

2. Compilation and running dependencies

   The compilation and running of etmem depend on the libboundscheck component.

3. Compilation

   ```bash
   $ cd etmem

   $ mkdir build

   $ cd build

   $ cmake ..

   $ make
   ```

## Precautions

### Running Dependencies

As a memory extension tool, etmem depends on kernel-mode feature support. To identify memory access and support proactively writing memory into the swap partition for vertical memory extension, the `etmem_scan` and `etmem_swap` modules must be inserted when etmem runs:

```bash
modprobe etmem_scan
modprobe etmem_swap
```

### Permission Restrictions

Running the etmem process requires root permission. The root user has the highest system permission. When performing operations as the root user, follow the operation guide strictly to avoid system management and security risks caused by other operations.

### Usage Constraints

- The etmem client and server must be deployed on the same server; cross-server communication is not supported.
- etmem can scan only target processes whose names are 15 characters or less in length. Valid characters in a process name are letters, digits, the special characters "./%-_", and combinations of the above; other combinations are regarded as invalid.
- When AEP media are used for memory extension, the system must be able to correctly identify the AEP devices and initialize them as `numa node`s, and the `vm_flags` field in the configuration file can only be set to `ht`.
- Engine private commands are valid only for the corresponding engine and the tasks under it, such as `showhostpages` and `showtaskpages` supported by cslide.
- In third-party policy implementation code, the `fd` in the `eng_mgt_func` interface cannot be written with the `0xff` or `0xfe` word.
- Multiple different third-party policy dynamic libraries can be added to one project, distinguished by `eng_name` in the configuration file.
- Concurrent scanning of the same process is prohibited.
- When the `etmem_scan` and `etmem_swap` kos are not loaded, using the `/proc/xxx/idle_pages` and `/proc/xxx/swap_pages` files is prohibited.
- The etmem configuration file must be owned by the root user, with permission 600 or 400, and must not exceed 10 MB in size.
- When a third-party policy is injected into etmem, the third-party policy's `so` must be owned by the root user, with permission 500 or 700.

## Usage Instructions

### etmem Configuration File

Before running the etmem process, the administrator needs to plan which processes require memory extension, configure the process information in the etmem configuration file, and configure information such as the memory scan period, scan times, and hot/cold memory thresholds.

The example configuration files are in the source package and placed in the `/etc/etmem` path. There are three example files by function:
- -```text -/etc/etmem/cslide_conf.yaml -/etc/etmem/slide_conf.yaml -/etc/etmem/thirdparty_conf.yaml -``` - -示例内容分别为: - -```sh -#slide引擎示例 -#slide_conf.yaml -[project] -name=test -loop=1 -interval=1 -sleep=1 -sysmem_threshold=50 -swapcache_high_vmark=10 -swapcache_low_vmark=6 - -[engine] -name=slide -project=test - -[task] -project=test -engine=slide -name=background_slide -type=name -value=mysql -T=1 -max_threads=1 -swap_threshold=10g -swap_flag=yes - -#cslide引擎示例 -#cslide_conf.yaml -[engine] -name=cslide -project=test -node_pair=2,0;3,1 -hot_threshold=1 -node_mig_quota=1024 -node_hot_reserve=1024 - -[task] -project=test -engine=cslide -name=background_cslide -type=pid -name=23456 -vm_flags=ht -anon_only=no -ign_host=no - -#thirdparty引擎示例 -#thirdparty_conf.yaml -[engine] -name=thirdparty -project=test -eng_name=my_engine -libname=/usr/lib/etmem_fetch/my_engine.so -ops_name=my_engine_ops -engine_private_key=engine_private_value - -[task] -project=test -engine=my_engine -name=background_third -type=pid -value=12345 -task_private_key=task_private_value -``` - -配置文件各字段说明: - -| 配置项 | 配置项含义 | 是否必须 | 是否有参数 | 参数范围 | 示例说明 | -|-----------|---------------------|------|-------|------------|-----------------------------------------------------------------| -| [project] | project公用配置段起始标识 | 否 | 否 | NA | project参数的开头标识,表示下面的参数直到另外的[xxx]或文件结尾为止的范围内均为project section的参数 | -| name | project的名字 | 是 | 是 | 64个字以内的字符串 | 用来标识project,engine和task在配置时需要指定要挂载到的project | -| loop | 内存扫描的循环次数 | 是 | 是 | 1~120 | loop=3 //扫描3次 | -| interval | 每次内存扫描的时间间隔 | 是 | 是 | 1~1200 | interval=5 //每次扫描之间间隔5s | -| sleep | 每个内存扫描+操作的大周期之间时间间隔 | 是 | 是 | 1~1200 | sleep=10 //每次大周期之间间隔10s | -| sysmem_threshold| slide engine的配置项,系统内存换出阈值 | 否 | 是 | 0~100 | sysmem_threshold=50 //系统内存剩余量小于50%时,etmem才会触发内存换出| -| swapcache_high_wmark| slide engine的配置项,swacache可以占用系统内存的比例,高水线 | 否 | 是 | 1~100 | swapcache_high_wmark=5 //swapcache内存占用量可以为系统内存的5%,超过该比例,etmem会触发swapcache回收
注: swapcache_high_wmark需要大于swapcache_low_wmark| -| swapcache_low_wmark| slide engine的配置项,swacache可以占用系统内存的比例,低水线 | 否 | 是 | [1~swapcache_high_wmark) | swapcache_low_wmark=3 //触发swapcache回收后,系统会将swapcache内存占用量回收到低于3%| -| [engine] | engine公用配置段起始标识 | 否 | 否 | NA | engine参数的开头标识,表示下面的参数直到另外的[xxx]或文件结尾为止的范围内均为engine section的参数 | -| project | 声明所在的project | 是 | 是 | 64个字以内的字符串 | 已经存在名字为test的project,则可以写为project=test | -| engine | 声明所在的engine | 是 | 是 | slide/cslide/thridparty | 声明使用的是slide或cslide或thirdparty策略 | -| node_pair | cslide engine的配置项,声明系统中AEP和DRAM的node pair | engine为cslide时必须配置 | 是 | 成对配置AEP和DRAM的node号,AEP和DRAM之间用逗号隔开,没对pair之间用分号隔开 | node_pair=2,0;3,1 | -| hot_threshold | cslide engine的配置项,声明内存冷热水线的阈值 | engine为cslide时必须配置 | 是 | 大于等于0,小于等于INT_MAX的整数 | hot_threshold=3 //访问次数小于3的内存会被识别为冷内存 | -|node_mig_quota|cslide engine的配置项,流控,声明每次DRAM和AEP互相迁移时单向最大流量|engine为cslide时必须配置|是|大于等于0,小于等于INT_MAX的整数|node_mig_quota=1024 //单位为MB,AEP到DRAM或DRAM到AEP搬迁一次最大1024M| -|node_hot_reserve|cslide engine的配置项,声明DRAM中热内存的预留空间大小|engine为cslide时必须配置|是|大于等于0,小于等于INT_MAX的整数|node_hot_reserve=1024 //单位为MB,当所有虚拟机热内存大于此配置值时,热内存也会迁移到AEP中| -|eng_name|thirdparty engine的配置项,声明engine自己的名字,供task挂载|engine为thirdparty时必须配置|是|64个字以内的字符串|eng_name=my_engine //对此第三方策略engine挂载task时,task中写明engine=my_engine| -|libname|thirdparty engine的配置项,声明第三方策略的动态库的地址,绝对地址|engine为thirdparty时必须配置|是|256个字以内的字符串|libname=/user/lib/etmem_fetch/code_test/my_engine.so| -|ops_name|thirdparty engine的配置项,声明第三方策略的动态库中操作符号的名字|engine为thirdparty时必须配置|是|256个字以内的字符串|ops_name=my_engine_ops //第三方策略实现接口的结构体的名字| -|engine_private_key|thirdparty engine的配置项,预留给第三方策略自己解析私有参数的配置项,选配|否|否|根据第三方策略私有参数自行限制|根据第三方策略私有engine参数自行配置| -| [task] | task公用配置段起始标识 | 否 | 否 | NA | task参数的开头标识,表示下面的参数直到另外的[xxx]或文件结尾为止的范围内均为task section的参数 | -| project | 声明所挂的project | 是 | 是 | 64个字以内的字符串 | 已经存在名字为test的project,则可以写为project=test | -| engine | 声明所挂的engine | 是 | 是 | 64个字以内的字符串 | 所要挂载的engine的名字 | -| name | task的名字 | 是 | 是 | 64个字以内的字符串 | name=background1 //声明task的名字是backgound1 
| -| type | 目标进程识别的方式 | 是 | 是 | pid/name | pid代表通过进程号识别,name代表通过进程名称识别 | -| value | 目标进程识别的具体字段 | 是 | 是 | 实际的进程号/进程名称 | 与type字段配合使用,指定目标进程的进程号或进程名称,由使用者保证配置的正确及唯一性 | -| T | engine为slide的task配置项,声明内存冷热水线的阈值 | engine为slide时必须配置 | 是 | 0~loop * 3 | T=3 //访问次数小于3的内存会被识别为冷内存 | -| max_threads | engine为slide的task配置项,etmemd内部线程池最大线程数,每个线程处理一个进程/子进程的内存扫描+操作任务 | 否 | 是 | 1~2 * core数 + 1,默认为1 | 对外部无表象,控制etmemd服务端内部处理线程个数,当目标进程有多个子进程时,配置越大,并发执行的个数也多,但占用资源也越多 | -| vm_flags | engine为cslide的task配置项,通过指定flag扫描的vma,不配置此项时扫描则不会区分 | 否 | 是 | 256长度以内的字符串,不同flag以空格隔开 | vm_flags=ht //扫描flags为ht(大页)的vma内存 | -| anon_only | engine为cslide的task配置项,标识是否只扫描匿名页 | 否 | 是 | yes/no | anon_only=no //配置为yes时只扫描匿名页,配置为no时非匿名页也会扫描 | -| ign_host | engine为cslide的task配置项,标识是否忽略host上的页表扫描信息 | 否 | 是 | yes/no | ign_host=no //yes为忽略,no为不忽略 | -| task_private_key | engine为thirdparty的task配置项,预留给第三方策略的task解析私有参数的配置项,选配 | 否 | 否 | 根据第三方策略私有参数自行限制 | 根据第三方策略私有task参数自行配置 | -| swap_threshold |slide engine的配置项,进程内存换出阈值 | 否 | 是 | 进程可用内存绝对值 | swap_threshold=10g //进程占用内存在低于10g时不会触发换出。
当前版本下,仅支持g/G作为内存绝对值单位。与sysmem_threshold配合使用,仅系统内存低于阈值时,进行白名单中进程阈值判断 | -| swap_flag|slide engine的配置项,进程指定内存换出 | 否 | 是 | yes/no | swap_flag=yes//使能进程指定内存换出 | - -### etmemd服务端启动 - -在使用etmem提供的服务时,首先根据需要修改相应的配置文件,然后运行etmemd服务端,常驻在系统中来操作目标进程的内存。除了支持在命令行中通过二进制来启动etmemd的进程外,还可以通过配置`service`文件来使etmemd服务端通过`systemctl`方式拉起,此场景需要通过`mode-systemctl`参数来指定支持 - -#### 使用方法 - -可以通过下列示例命令启动etmemd的服务端: - -```bash -etmemd -l 0 -s etmemd_socket -``` - -或者 - -```bash -etmemd --log-level 0 --socket etmemd_socket -``` - -其中`-l`的`0`和`-s`的`etmemd_socket`是用户自己输入的参数,参数具体含义参考以下列表 - -#### 命令行参数说明 - -| 参数 | 参数含义 | 是否必须 | 是否有参数 | 参数范围 | 示例说明 | -| --------------- | ---------------------------------- | -------- | ---------- | --------------------- | ------------------------------------------------------------ | -| -l或\-\-log-level | etmemd日志级别 | 否 | 是 | 0~3 | 0:debug级别
1:info级别
2:warning级别
3:error级别
只有大于等于配置的级别才会打印到/var/log/message文件中 | -| -s或\-\-socket | etmemd监听的名称,用于与客户端交互 | 是 | 是 | 107个字符之内的字符串 | 指定服务端监听的名称 | -| -m或\-\-mode-systemctl| 指定通过systemctl方式来拉起etmemd服务| 否| 否| NA| service文件中需要指定-m参数| -| -h或\-\-help | 帮助信息 | 否 | 否 | NA | 执行时带有此参数会打印后退出 | - -### 通过etmem客户端添加或者删除工程/引擎/任务 - -#### 场景描述 - -1)管理员创建etmem的project/engine/task(一个工程可包含多个etmem engine,一个engine可以包含多个任务) - -2)管理员删除已有的etmem project/engine/task(删除工程前,会自动先停止该工程中的所有任务) - -#### 使用方法 - -在etmemd服务端正常运行后,通过etmem客户端,通过第二个参数指定为obj,来进行创建或删除动作,对project/engine/task则是通过配置文件中配置的内容来进行识别和区分。 - -- 添加对象: - ```bash - etmem obj add -f /etc/etmem/slide_conf.yaml -s etmemd_socket - ``` - - 或 - - ```bash - etmem obj add --file /etc/etmem/slide_conf.yaml --socket etmemd_socket - ``` - -- 删除对象: - ```bash - etmem obj del -f /etc/etmem/slide_conf.yaml -s etmemd_socket - ``` - - 或 - - ```bash - etmem obj del --file /etc/etmem/slide_conf.yaml --socket etmemd_socket - ``` - -#### 命令行参数说明 - - -| 参数 | 参数含义 | 是否必须 | 是否有参数 | 示例说明 | -| ------------ | ------------------------------------------------------------ | -------- | ---------- | -------------------------------------------------------- | -| -f或\-\-file | 指定对象的配置文件 | 是 | 是 | 需要指定路径名称 | -| -s或\-\-socket | 与etmemd服务端通信的socket名称,需要与etmemd启动时指定的保持一致 | 是 | 是 | 必须配置,在有多个etmemd时,由管理员选择与哪个etmemd通信 | - -### 通过etmem客户端查询/启动/停止工程 - -#### 场景描述 - -在已经通过`etmem obj add`添加工程之后,在还未调用`etmem obj del`删除工程之前,可以对etmem的工程进行启动和停止。 - -1)管理员启动已添加的工程 - -2)管理员停止已启动的工程 - -在管理员调用`obj del`删除工程时,如果工程已经启动,则会自动停止。 - -#### 使用方法 - -对于已经添加成功的工程,可以通过`etmem project`的命令来控制工程的启动和停止,命令示例如下: - -- 查询工程 - - ```bash - etmem project show -n test -s etmemd_socket - ``` - - 或 - - ```bash - etmem project show --name test --socket etmemd_socket - ``` - -- 启动工程 - - ```bash - etmem project start -n test -s etmemd_socket - ``` - - 或 - - ```bash - etmem project start --name test --socket etmemd_socket - ``` - -- 停止工程 - - ```bash - etmem project stop -n test -s etmemd_socket - ``` - - 或 - - ```bash - etmem project stop --name test 
--socket etmemd_socket - ``` - -- 打印帮助 - - ```bash - etmem project help - ``` - -#### 命令行参数说明 - -| 参数 | 参数含义 | 是否必须 | 是否有参数 | 示例说明 | -| ------------ | ------------------------------------------------------------ | -------- | ---------- | -------------------------------------------------------- | -| -n或\-\-name | 指定project名称 | 是 | 是 | project名称,与配置文件一一对应 | -| -s或\-\-socket | 与etmemd服务端通信的socket名称,需要与etmemd启动时指定的保持一致 | 是 | 是 | 必须配置,在有多个etmemd时,由管理员选择与哪个etmemd通信 | - - -### 通过etmem客户端,支持内存阈值换出以及指定内存换出 - -当前支持的策略中,只有slide策略支持私有的功能特性 - -- 进程或系统内存阈值换出 - -为了获得业务的极致性能,需要考虑etmem内存扩展进行内存换出的时机;当系统可用内存足够,系统内存压力不大时,不进行内存交换;当进程占用内存不高时,不进行内存交换;提供系统内存换出阈值控制以及进程内存换出阈值控制。 - -- 进程指定内存换出 - -在存储环境下,具有IO时延敏感型业务进程,上述进程内存不希望进行换出,因此提供一种机制,由业务指定可换出内存 - -针对进程或系统内存阈值换出,进程指定内存换出功能,可以在配置文件中添加`sysmem_threshold`,`swap_threshold`,`swap_flag`参数,示例如下,具体含义请参考etmem配置文件说明章节。 - -```sh -#slide_conf.yaml -[project] -name=test -loop=1 -interval=1 -sleep=1 -sysmem_threshold=50 - -[engine] -name=slide -project=test - -[task] -project=test -engine=slide -name=background_slide -type=name -value=mysql -T=1 -max_threads=1 -swap_threshold=10g -swap_flag=yes -``` - -#### 系统内存阈值换出 - -配置文件中`sysmem_threshold`用于指示系统内存阈值换出功能,`sysmem_threshold`取值范围为0-100,如果配置文件中设定了`sysmem_threshold`,那么只有系统内存剩余量低于该比例时,etmem才会触发内存换出流程 - -示例使用方法如下: - -1. 参考示例编写配置文件,配置文件中填写`sysmem_threshold`参数,例如`sysmem_threshold=20` -2. 启动服务端,并通过服务端添加,启动工程。 - - ```bash - etmemd -l 0 -s monitor_app & - etmem obj add -f etmem_config -s monitor_app - etmem project start -n test -s monitor_app - etmem project show -s monitor_app - ``` - -3. 观察内存换出结果,只有系统可用内存低于20%时,etmem才会触发内存换出 - -#### 进程内存阈值换出 - -配置文件中`swap_threshold`用于指示进程内存阈值换出功能,`swap_threshold`为进程内存占用量绝对值(格式为"数字+单位g/G"),如果配置文件中设定了`swap_threshold`,那么该进程内存占用量在小于该设定的可用内存量时,etmem不会针对该进程触发换出流程 - -示例使用方法如下: - -1. 参考示例编写配置文件,配置文件中填写`swap_threshold`参数,例如`swap_threshold=5g` -2. 
启动服务端,并通过服务端添加,启动工程。 - - ```bash - etmemd -l 0 -s monitor_app & - etmem obj add -f etmem_config -s monitor_app - etmem project start -n test -s monitor_app - etmem project show -s monitor_app - ``` - -3. 观察内存换出结果,只有进程占用内存绝对值高于5G时,etmem才会触发内存换出 - -#### 进程指定内存换出 - -配置文件中`swap_flag`用于指示进程指定内存换出功能,`swap_flag`取值仅有两个:`yes/no`,如果配置文件中设定了`swap_flag`为no或者未配置,那么etmem换出功能无变化,如果`swap_flag`设定为yes,那么etmem仅仅换出进程指定的内存。 - -示例使用方法如下: - -1. 参考示例编写配置文件,配置文件中填写`swap_flag`参数,例如`swap_flag=yes` -2. 业务进程对需要进行换出的内存打标记 - - ```bash - madvise(addr_start, addr_len, MADV_SWAPFLAG) - ``` - -3. 启动服务端,并通过服务端添加,启动工程。 - - ```bash - etmemd -l 0 -s monitor_app & - etmem obj add -f etmem_config -s monitor_app - etmem project start -n test -s monitor_app - etmem project show -s monitor_app - ``` - -4. 观察内存换出结果,只有进程打标记的部分内存会被换出,其余内存保留在DRAM中,不会被换出 - -针对进程指定页面换出的场景中,在原扫描接口`idle_pages`中添加`ioctl`命令字的形式,来确认不带有特定标记的vma不进行扫描与换出操作 - -扫描管理接口 - -- 函数原型 - - ```c - ioctl(fd, cmd, void *arg); - ``` - -- 输入参数 - - ```text - 1. fd:文件描述符,通过open调用在/proc/pid/idle_pages下打开文件获得 - - 2.cmd:控制扫描行为,当前支持如下cmd: - VMA_SCAN_ADD_FLAGS:新增vma指定内存换出标记,仅扫描带有特定标记的VMA - VMA_SCAN_REMOVE_FLAGS:删除新增的VMA指定内存换出标记 - - 3.args:int指针参数,传递具体标记掩码,当前仅支持如下参数: - VMA_SCAN_FLAG:在etmem_scan.ko扫描模块开始扫描前,会调用接口walk_page_test接口判断vma地址是否符合扫描要求,此标记置位时,会仅扫描带有特定换出标记的vma地址段,而忽略其他vma地址 - ``` - -- 返回值 - - ```text - 1.成功,返回0 - 2.失败返回非0 - ``` - -- 注意事项 - - ```text - 所有不支持的标记都会被忽略,但是不会返回错误 - ``` - -### 通过etmem客户端,支持swapcache内存回收指令 - -用户态etmem发起内存淘汰回收操作,通过`write procfs`接口与内核态的内存回收模块交互,内存回收模块解析用户态下发的虚拟地址,获取地址对应的page页面,并调用内核原生接口将该page对应内存进行换出回收,在内存换出的过程中,swapcache会占用部分系统内存,为进一步节约内存,添加swapcache内存回收功能. 
- -针对swapcache内存回收功能,可以在配置文件中添加`swapcache_high_wmark`,`swapcache_low_wmark`参数。 - -- `swapcache_high_wmark`: swapcache可以占用系统内存的高水位线 -- `swapcache_low_wmark`:swapcache可以占用系统内存的低水位线 - -在etmem进行一轮内存换出后,会进行swapcache占用系统内存比例的检查,当占用比例超过高水位线后,会通过`swap_pages`下发`ioctl`命令,触发swapcache内存回收,并回收到低水位线停止 - -配置参数示例如下,具体请参考etmem配置文件相关章节: - -```sh -#slide_conf.yaml -[project] -name=test -loop=1 -interval=1 -sleep=1 -swapcache_high_wmark=5 -swapcache_low_wmark=3 - -[engine] -name=slide -project=test - -[task] -project=test -engine=slide -name=background_slide -type=name -value=mysql -T=1 -max_threads=1 -``` - -针对swap换出场景中,需要通过swapcache内存回收进一步节约内存,在原内存换出接口`swap_pages`中通过添加`ioctl`接口的方式,来提供swapcache水线的设定以及swapcache内存占用量回收的启动与关闭 - -- 函数原型 - - ```c - ioctl(fd, cmd, void *arg); - ``` - -- 输入参数 - - ```text - 1. fd:文件描述符,通过open调用在/proc/pid/idle_pages下打开文件获得 - - 2.cmd:控制扫描行为,当前支持如下cmd: - RECLAIM_SWAPCACHE_ON:启动swapcache内存换出 - RECLAIM_SWAPCACHE_OFF:关闭swapcache内存换出 - SET_SWAPCACHE_WMARK:设定swapcache内存水线 - - 3.args:int指针参数,传递具体标记掩码,当前仅支持如下参数: - 参数用来传递swapcache水线具体值 - ``` - -- 返回值 - - ```text - 1.成功,返回0 - 2.失败返回非0 - ``` - -- 注意事项 - - ```text - 所有不支持的标记都会被忽略,但是不会返回错误 - ``` - -### 通过etmem客户端,执行引擎私有命令或功能 - -当前支持的策略中,只有cslide策略支持私有的命令 - -- `showtaskpages` -- `showhostpages` - -针对使用此策略引擎的engine和engine所有的task,可以通过这两个命令分别查看task相关的页面访问情况和虚拟机的host上系统大页的使用情况。 - -示例命令如下: - -```bash -etmem engine showtaskpages <-t task_name> -n proj_name -e cslide -s etmemd_socket - -etmem engine showhostpages -n proj_name -e cslide -s etmemd_socket -``` - -**注意** :`showtaskpages`和`showhostpages`仅支持引擎使用cslide的场景 - -#### 命令行参数说明 -| 参数 | 参数含义 | 是否必须 | 是否有参数 | 实例说明 | -|----|------|------|-------|------| -|-n或\-\-proj_name| 指定project的名字| 是| 是| 指定已经存在,所需要执行的project的名字| -|-s或\-\-socket| 与etmemd服务端通信的socket名称,需要与etmemd启动时指定的保持一致| 是| 是| 必须配置,在有多个etmemd时,由管理员选择与哪个etmemd通信| -|-e或\-\-engine| 指定执行的引擎的名字| 是| 是| 指定已经存在的,所需要执行的引擎的名字| -|-t或\-\-task_name| 指定执行的任务的名字| 否| 是| 指定已经存在的,所需要执行的任务的名字| - -### 支持kernel swap功能开启与关闭 - 
-针对swap换出到磁盘场景,当etmem用于内存扩展时,用户可以选择是否同时开启内核swap功能。用户可以关闭内核原生swap机制,以免原生swap机制换出不应被换出的内存,导致用户态进程出现问题。 - -通过提供sys接口实现上述控制,在`/sys/kernel/mm/swap`目录下创建`kobj`对象,对象名为`kernel_swap_enable`,默认为`true`,用于控制kernel swap的启动与关闭 - -具体示例如下: - -```sh -#开启kernel swap -echo true > /sys/kernel/mm/swap/kernel_swap_enable -或者 -echo 1 > /sys/kernel/mm/swap/kernel_swap_enable - -#关闭kernel swap -echo false > /sys/kernel/mm/swap/kernel_swap_enable -或者 -echo 0 > /sys/kernel/mm/swap/kernel_swap_enable - -``` - -### etmem支持随系统自启动 - -#### 场景描述 - -etmemd支持由用户配置`systemd`配置文件后,以`fork`模式作为`systemd`服务被拉起运行 - -#### 使用方法 - -编写`service`配置文件,来启动etmemd,必须使用-m参数来指定此模式,例如 - -```bash -etmemd -l 0 -s etmemd_socket -m -``` - -#### 命令行参数说明 -| 参数 | 参数含义 | 是否必须 | 是否有参数 | 参数范围 | 实例说明 | -|----------------|------------|------|-------|------|-----------| -| -l或\-\-log-level | etmemd日志级别 | 否 | 是 | 0~3 | 0:debug级别;1:info级别;2:warning级别;3:error级别;只有大于等于配置的级别才会打印到/var/log/message文件中| -| -s或\-\-socket |etmemd监听的名称,用于与客户端交互 | 是 | 是| 107个字符之内的字符串| 指定服务端监听的名称| -|-m或\-\-mode-systemctl | etmemd作为service被拉起时,命令中需要指定此参数来支持 | 否 | 否 | NA | NA | -| -h或\-\-help | 帮助信息 | 否 |否 |NA |执行时带有此参数会打印后退出| - - -### etmem支持第三方内存扩展策略 - -#### 场景描述 - -etmem支持用户注册第三方内存扩展策略,同时提供扫描模块动态库,运行时通过第三方策略淘汰算法淘汰内存。 - -用户使用etmem所提供的扫描模块动态库并实现对接etmem所需要的结构体中的接口 - -#### 使用方法 - -用户使用自己实现的第三方扩展淘汰策略,主要需要按下面步骤进行实现和操作: - -1. 按需调用扫描模块提供的扫描接口 - -2. 按照etmem头文件中提供的函数模板来实现各个接口,最终封装成结构体 - -3. 编译出第三方扩展淘汰策略的动态库 - -4. 在配置文件中按要求声明类型为thirdparty的engine - -5. 
将动态库的名称和接口结构体的名称按要求填入配置文件中task对应的字段 - -其他操作步骤与使用etmem的其他engine类似 - -接口结构体模板 - -```c -struct engine_ops { - -/* 针对引擎私有参数的解析,如果有,需要实现,否则置NULL */ - -int (*fill_eng_params)(GKeyFile *config, struct engine *eng); - -/* 针对引擎私有参数的清理,如果有,需要实现,否则置NULL */ - -void (*clear_eng_params)(struct engine *eng); - -/* 针对任务私有参数的解析,如果有,需要实现,否则置NULL */ - -int (*fill_task_params)(GKeyFile *config, struct task *task); - -/* 针对任务私有参数的清理,如果有,需要实现,否则置NULL */ - -void (*clear_task_params)(struct task *tk); - -/* 启动任务的接口 */ - -int (*start_task)(struct engine *eng, struct task *tk); - -/* 停止任务的接口 */ - -void (*stop_task)(struct engine *eng, struct task *tk); - -/* 填充pid相关私有参数 */ - -int (*alloc_pid_params)(struct engine *eng, struct task_pid **tk_pid); - -/* 销毁pid相关私有参数 */ - -void (*free_pid_params)(struct engine *eng, struct task_pid **tk_pid); - -/* 第三方策略自身所需要的私有命令支持,如果没有,置为NULL */ - -int (*eng_mgt_func)(struct engine *eng, struct task *tk, char *cmd, int fd); - -}; -``` - -扫描模块对外接口说明 - -| 接口名称 |接口描述| -| ------------ | --------------------- | -| etmemd_scan_init | scan模块初始化| -| etmemd_scan_exit | scan模块析构| -| etmemd_get_vmas | 获取需要扫描的vma| -| etmemd_free_vmas | 释放etmemd_get_vmas扫描到的vma| -| etmemd_get_page_refs | 扫描vmas中的页面| -| etmemd_free_page_refs | 释放etmemd_get_page_refs获取到的页访问信息链表| - -针对扫描虚拟机的场景中,在原扫描接口`idle_pages`中添加`ioctl`接口的方式,来提供区分扫描`ept`的粒度和是否忽略host上页访问标记的机制 - -针对进程指定页面换出的场景中,在原扫描接口`idle_pages`中添加`ioctl`命令字的形式,来确认不带有特定标记的vma不进行扫描和换出操作 - -扫描管理接口: - -- 函数原型 - - ```c - ioctl(fd, cmd, void *arg); - ``` - -- 输入参数 - - ```text - 1. 
fd:文件描述符,通过open调用在/proc/pid/idle_pages下打开文件获得 - - 2.cmd:控制扫描行为,当前支持如下cmd: - IDLE_SCAN_ADD_FLAG:新增一个扫描标记 - IDLE_SCAN_REMOVE_FLAGS:删除一个扫描标记 - VMA_SCAN_ADD_FLAGS:新增vma指定内存换出标记,仅扫描带有特定标记的VMA - VMA_SCAN_REMOVE_FLAGS:删除新增的VMA指定内存换出标记 - - 3.args:int指针参数,传递具体标记掩码,当前仅支持如下参数: - SCAN_AS_HUGE:扫描ept页表时,按照2M大页粒度扫描页是否被访问过。此标记未置位时,按照ept页表自身粒度扫描 - SCAN_IGN_HUGE:扫描虚拟机时,忽略host侧页表上的访问标记。此标记未置位时,不会忽略host侧页表上的访问标记。 - VMA_SCAN_FLAG:在etmem_scan.ko扫描模块开始扫描前,会调用walk_page_test接口判断vma地址是否符合扫描要求,此标记置位时,会仅扫描带有特定换出标记的vma地址段,而忽略其他vma地址 - ``` - -- 返回值 - - ```text - 1.成功,返回0 - 2.失败返回非0 - ``` - -- 注意事项 - - ```text - 所有不支持的标记都会被忽略,但是不会返回错误 - ``` - -配置文件示例如下所示,具体含义请参考配置文件说明章节: - -```sh -#thirdparty -[engine] - -name=thirdparty - -project=test - -eng_name=my_engine - -libname=/user/lib/etmem_fetch/code_test/my_engine.so - -ops_name=my_engine_ops - -engine_private_key=engine_private_value - -[task] - -project=test - -engine=my_engine - -name=background1 - -type=pid - -value=1798245 - -task_private_key=task_private_value -``` - - **注意** : - -用户需使用etmem所提供的扫描模块动态库并实现对接etmem所需要的结构体中的接口 - -`eng_mgt_func`接口中的`fd`不能写入`0xff`和`0xfe`字符 - -支持在一个工程内添加多个不同的第三方策略动态库,以配置文件中的`eng_name`来区分 - -### etmem客户端和服务端帮助说明 - -通过下列命令可以打印etmem服务端帮助说明 - -```bash -etmemd -h -``` - -或 - -```bash -etmemd --help -``` - -通过下列命令可以打印etmem客户端帮助说明 - -```bash -etmem help -``` - -通过下列命令可以打印etmem客户端操作工程/引擎/任务相关帮助说明 - -```bash -etmem obj help -``` - -通过下列命令可以打印etmem客户端对项目相关帮助说明 - -```bash -etmem project help -``` - -## 参与贡献 - -1. Fork本仓库 -2. 新建个人分支 -3. 提交代码 -4. 
新建Pull Request diff --git a/docs/zh/docs/DPUOffload/figures/offload-arch.png b/docs/zh/docs/DPUOffload/figures/offload-arch.png deleted file mode 100644 index 944900b42c13091e4ec40c6d51dc3c95088aa1b8..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/DPUOffload/figures/offload-arch.png and /dev/null differ diff --git a/docs/zh/docs/DPUOffload/figures/qtfs-arch.png b/docs/zh/docs/DPUOffload/figures/qtfs-arch.png deleted file mode 100644 index 40fd7e28707642801ec0b984690a25c08e092ac4..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/DPUOffload/figures/qtfs-arch.png and /dev/null differ diff --git a/docs/zh/docs/DPUOffload/overview.md b/docs/zh/docs/DPUOffload/overview.md deleted file mode 100644 index 518deb0158764ba6e51207b29543f8ebaf513294..0000000000000000000000000000000000000000 --- a/docs/zh/docs/DPUOffload/overview.md +++ /dev/null @@ -1,11 +0,0 @@ -# 容器管理面DPU无感卸载指南 - -本文档介绍基于openEuler操作系统的容器管理面DPU无感卸载功能特性及安装部署方法,该特性可以通过操作系统提供的统一抽象层,屏蔽容器管理面跨主机资源访问的差异,实现容器管理面业务无感卸载到DPU上。 - -本文档适用于使用openEuler系统并希望了解和使用操作系统内核及容器的社区开发者、开源爱好者以及相关合作伙伴。使用人员需要具备以下经验和技能: - -- 熟悉Linux基本操作 - -- 熟悉linux内核文件系统相关基础机制 - -- 对kubernetes和docker有一定了解,熟悉docker及kubernetes部署及使用 \ No newline at end of file diff --git a/docs/zh/docs/DPUOffload/public_sys-resources/icon-note.gif b/docs/zh/docs/DPUOffload/public_sys-resources/icon-note.gif deleted file mode 100644 index 6314297e45c1de184204098efd4814d6dc8b1cda..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/DPUOffload/public_sys-resources/icon-note.gif and /dev/null differ diff --git "a/docs/zh/docs/DPUOffload/qtfs\345\205\261\344\272\253\346\226\207\344\273\266\347\263\273\347\273\237\346\236\266\346\236\204\345\217\212\344\275\277\347\224\250\346\211\213\345\206\214.md" "b/docs/zh/docs/DPUOffload/qtfs\345\205\261\344\272\253\346\226\207\344\273\266\347\263\273\347\273\237\346\236\266\346\236\204\345\217\212\344\275\277\347\224\250\346\211\213\345\206\214.md" deleted file mode 100644 index 
8088f480ee8008714ab9c3fa066be15c2864bcda..0000000000000000000000000000000000000000 --- "a/docs/zh/docs/DPUOffload/qtfs\345\205\261\344\272\253\346\226\207\344\273\266\347\263\273\347\273\237\346\236\266\346\236\204\345\217\212\344\275\277\347\224\250\346\211\213\345\206\214.md" +++ /dev/null @@ -1,69 +0,0 @@ -# qtfs - -## 介绍 - -qtfs是一个共享文件系统项目,可部署在host-dpu的硬件架构上,也可以部署在2台服务器之间。以客户端服务器的模式工作,使客户端能通过qtfs访问服务端的指定文件系统,得到本地文件访问一致的体验。 - -qtfs的特性: - -+ 支持挂载点传播; - -+ 支持proc、sys、cgroup等特殊文件系统的共享; - -+ 支持远程文件读写的共享; - -+ 支持在客户端对服务端的文件系统进行远程挂载; - -+ 支持特殊文件的定制化处理; - -+ 支持远端fifo、unix-socket等,并且支持epoll,使客户端和服务端像本地通信一样使用这些文件; - -+ 支持基于host-dpu架构通过PCIe协议底层通信,性能大大优于网络; - -+ 支持内核模块形式开发,无需对内核进行侵入式修改。 - -## 软件架构 - -软件大体框架图: - -![qtfs-arch](./figures/qtfs-arch.png) - -## 安装教程 - -目录说明: - -+ **qtfs**: 客户端内核模块相关代码,直接在该目录下编译客户端ko。 - -+ **qtfs_server**: 服务端内核模块相关代码,直接在该目录下编译服务端ko和相关程序。 - -+ **qtinfo**: 诊断工具,支持查询文件系统的工作状态以及修改log级别等。 - -+ **demo**、**test**、**doc**: 测试程序、演示程序以及项目资料等。 - -+ 根目录: 客户端与服务端通用的公共模块代码。 - -首先找两台服务器(或虚拟机)配置内核编译环境: - - 1. 要求内核版本在5.10或更高版本。 -  2. 安装内核开发包:yum install kernel-devel。 - -服务端安装: - - 1. cd qtfs_server - 2. make clean && make - 3. insmod qtfs_server.ko qtfs_server_ip=x.x.x.x qtfs_server_port=12345 qtfs_log_level=WARN - 4. ./engine 4096 16 - -客户端安装: - - 1. cd qtfs - 2. make clean && make - 3. 
insmod qtfs.ko qtfs_server_ip=x.x.x.x qtfs_server_port=12345 qtfs_log_level=WARN - -## 使用说明 - -安装完成后,客户端通过挂载把服务端的文件系统让客户端可见,例如: - - mount -t qtfs / /root/mnt/ - -客户端进入"/root/mnt"后便可查看到server端的所有文件,以及对其进行相关操作。 diff --git "a/docs/zh/docs/DPUOffload/\345\256\271\345\231\250\347\256\241\347\220\206\351\235\242\346\227\240\346\204\237\345\215\270\350\275\275.md" "b/docs/zh/docs/DPUOffload/\345\256\271\345\231\250\347\256\241\347\220\206\351\235\242\346\227\240\346\204\237\345\215\270\350\275\275.md" deleted file mode 100644 index 2e4be2f1c0cf2c4c5d2a9dca9cafefc3d4aefc7e..0000000000000000000000000000000000000000 --- "a/docs/zh/docs/DPUOffload/\345\256\271\345\231\250\347\256\241\347\220\206\351\235\242\346\227\240\346\204\237\345\215\270\350\275\275.md" +++ /dev/null @@ -1,31 +0,0 @@ -# 容器管理面无感卸载介绍 - -## 概述 - -在数据中心及云场景下,随着摩尔定律失效,通用处理单元CPU算力增长速率放缓,而同时网络IO类速率及性能不断攀升,二者增长速率差异形成的剪刀差,即当前通用处理器的处理能力无法跟上网络、磁盘等IO处理的需求。传统数据中心下越来越多的通用CPU算力被IO及管理面等占用,这部分资源损耗称之为数据中心税(Data-center Tax)。据AWS统计,数据中心税可能占据数据中心算力的30%以上,部分场景下甚至可能更多。 - -DPU的出现就是为了将这部分算力资源从主机CPU上解放出来,通过将管理面、网络、存储、安全等能力卸载到专有的处理器芯片(DPU)上进行处理加速,达成降本增效的结果。目前主流云厂商如AWS、阿里云、华为云都通过自研芯片完成管理面及相关数据面的卸载,达成数据中心计算资源100%售卖给客户。 - -管理面进程卸载到DPU可以通过对组件源码进行拆分达成,将源码根据功能逻辑拆分成独立运行的两部分,分别运行在主机和DPU,达成组件卸载的目的。但是这种做法有以下问题:一是影响组件的软件兼容性,组件后续版本升级和维护需要自己维护相关patch,带来一定的维护工作量;二是卸载工作无法被其他组件继承,后续组件卸载后仍需要进行代码逻辑分析和拆分等工作。为解决上述问题,本方案提出DPU的无感卸载,通过OS提供的抽象层,屏蔽应用在主机和DPU间跨主机访问的差异,让业务进程近似0改动达成卸载到DPU运行的目标,且这部分工作属于操作系统通用层,与上层业务无关,其他业务进行DPU卸载时也可以继承。 - -## 架构介绍 - -#### 容器管理面DPU无感卸载架构 - -**图1**容器管理面DPU无感卸载架构 - -![offload-arch](./figures/offload-arch.png) - -如图1所示,容器管理面卸载后,dockerd、kubelet等管理进程运行在DPU侧,容器进程本身运行在HOST,进程之间的交互关系由系统层提供对应的能力来保证: - -* 通信层:DPU和主机之间可能通过PCIe或网络进行通信,需要基于底层物理连接提供通信接口层,为上层业务提供通信接口。 - -* 内核共享文件系统qtfs:容器管理面组件kubelet、dockerd与容器进程之间的主要交互通过文件系统进行;管理面工具需要为容器进程准备rootfs、volume等数据面路径;还需要在运行时通过proc文件系统、cgroup文件系统等控制和监控容器进程的资源及状态。共享文件系统的详细介绍参考[共享文件系统介绍](qtfs共享文件系统架构及使用手册.md) - -* 
用户态卸载环境:用户态需要使用qtfs为容器管理面准备卸载后的运行时环境,将主机的容器管理及运行时相关目录远程挂载到DPU;另外由于需要挂载proc、sys、cgroup等系统管理文件系统,为防止对DPU原生系统功能的破坏,上述挂载动作都在chroot环境内完成。另外管理面(运行于DPU)和容器进程(运行于主机)之间仍存在调用关系,需要通过远程二进制执行工具(rexec)提供对应功能。 - -容器管理面无感卸载的操作步骤可参考[部署指导文档](./无感卸载部署指导.md) - -> ![](./public_sys-resources/icon-note.gif)**说明**: -> -> 上述操作指导涉及对容器管理面组件的少量改动和rexec工具修改,这些修改基于指定版本,其他版本可基于实际执行环境做适配修改。文档中提供的patch仅供验证指导使用,不具备实际商用的条件 \ No newline at end of file diff --git "a/docs/zh/docs/DPUOffload/\346\227\240\346\204\237\345\215\270\350\275\275\351\203\250\347\275\262\346\214\207\345\257\274.md" "b/docs/zh/docs/DPUOffload/\346\227\240\346\204\237\345\215\270\350\275\275\351\203\250\347\275\262\346\214\207\345\257\274.md" deleted file mode 100644 index c15eed9c1fef689e3bb0f379c0e67582e9509ec7..0000000000000000000000000000000000000000 --- "a/docs/zh/docs/DPUOffload/\346\227\240\346\204\237\345\215\270\350\275\275\351\203\250\347\275\262\346\214\207\345\257\274.md" +++ /dev/null @@ -1,166 +0,0 @@ - -# 容器管理面无感卸载部署指导 - -> ![](./public_sys-resources/icon-note.gif)**说明**: -> -> 本指导涉及对容器管理面组件的少量改动和rexec工具修改,这些修改基于指定版本,其他版本可基于实际执行环境做适配修改。文档中提供的patch仅供验证指导使用,不具备实际商用的条件。 - -> ![](./public_sys-resources/icon-note.gif)**说明**: -> -> 当前共享文件系统之间通信通过网络完成,可通过网络互连的两台物理机器或VM模拟验证。 -> -> 建议用户验证前先搭建可正常使用的kubernetes集群和容器运行环境,针对其中单个节点的管理面进程进行卸载验证,卸载环境(DPU)可选择一台具备网络连接的物理机或VM。 - -## 简介 - -容器管理面,即kubernetes、dockerd、containerd、isulad等容器的管理工具,而容器管理面卸载,即是将容器管理面卸载到与容器所在机器(以下称为HOST)之外的另一台机器(当前场景下是指DPU,一个具备独立运行环境的硬件集合)上运行。 - -我们使用共享文件系统qtfs将HOST上与容器运行相关的目录挂载到DPU上,使得容器管理面工具(运行在DPU)可以访问到这些目录,并为容器(运行在HOST)准备运行所需要的环境,此处,因为需要挂载远端的proc和sys等特殊文件系统,所以,我们创建了一个专门的rootfs以作为kubernetes、dockerd的运行环境(以下称为`/another_rootfs`)。 - -并且通过rexec执行容器的拉起、删除等操作,使得可以将容器管理面和容器分离在不同的两台机器上,远程对容器进行管理。 - -## 相关组件补丁介绍 - -#### rexec介绍 - -rexec是一个用go语言开发的远程执行工具,基于docker/libchan下的[rexec](https://github.com/docker/libchan/tree/master/examples/rexec)示例工具改造而成,实现远程调用远端二进制的功能,为方便使用在rexec中增加了环境变量传递和监控原进程退出等能力。 - -rexec工具的具体使用方式为在服务器端用`CMD_NET_ADDR=tcp://0.0.0.0:<端口号> 
rexec_server`的方式拉起rexec服务进程,然后在客户端用`CMD_NET_ADDR=tcp://<服务端ip>:<端口号> rexec [要执行的指令] `的方式启动,便可以调用rexec_server执行需要执行的指令,并等待指令执行结果返回。 - -#### dockerd相关改动介绍 - -对dockerd的改动基于18.09版本。 - -在containerd中,暂时注释掉了通过hook调用libnetwork-setkey的部分,此处不影响容器的拉起。并且,为了docker load的正常使用,注释掉了在mounter_linux.go 中mount函数中一处错误的返回。 - -最后,因为在容器管理面的运行环境中,将`/proc`挂在了服务端的proc文件系统,而本地的proc文件系统则挂载在了`/local_proc`,所以,dockerd以及containerd中的对`/proc/self/xxx`或者`/proc/getpid()/xxx`或者相关的文件系统访问的部分,我们统统将`/proc`改为了`/local_proc`。 - -#### containerd相关改动介绍 - -对于containerd的改动基于containerd-1.2-rc.1版本。 - -在获取mountinfo时,因为`/proc/self/mountinfo`只能获取到dockerd本身在本地的mountinfo,而无法获取到服务端的mountinfo,所以,将其改为了`/proc/1/mountinfo`,使其通过获取服务端1号进程mountinfo的方式得到服务端的mountinfo。 - -在contaienrd-shim中,将与containerd通信的unix socket改为了用tcp通信,containerd通过`SHIM_HOST`环境变量获取containerd-shim所运行环境的ip,即服务端ip。用shim的哈希值计算出一个端口号,并以此作为通信的端口,来拉起containerd-shim. - -并且,将原来的通过系统调用给contaienr-shim发信号的方式,改为了通过远程调用kill指令的方式向shim发信号,确保了docker杀死容器的行为可以正确的执行。 - -#### kubernetes相关改动介绍 - -kubelet暂不需要功能性改动,可能会遇到容器QoS管理器首次设置失败的错误,该错误不影响后续Pods拉起流程,暂时忽略该报错。 - -## 容器管理面卸载操作指南 - -在服务器端和客户端,都要拉起rexec_server。服务器端拉起rexec_server,主要是用于客户端创建容器时用rexec拉起containerd-shim,而客户端拉起rexec_server,则是为了执行containerd-shim对dockerd和containerd的调用。 - -#### 服务器端 - -创建容器管理面所需要的文件夹,然后插入qtfs_server.ko,并拉起engine进程。 - -此外在服务器端,还需要创建rexec脚本/usr/bin/dockerd. 
- -``` shell -#!/bin/bash -CMD_NET_ADDR=tcp://<客户端ip>: rexec /usr/bin/dockerd $* -``` - -#### 客户端 - -需要准备一个rootfs,作为dockerd与containerd的运行环境,通过如下的脚本,将dockerd、containerd所需要的服务端目录挂载到客户端。并且,需要确保在以下脚本中被挂载的远程目录在服务端和客户端都存在。 - -``` shell -#!/bin/bash -mkdir -p /another_rootfs/var/run/docker/containerd -iptables -t nat -N DOCKER -echo "---------insmod qtfs ko----------" -insmod /YOUR/QTFS/PATH/qtfs.ko qtfs_server_ip=<服务端ip> qtfs_log_level=INFO - -# chroot环境内的proc使用DPU的proc共享文件系统替换,需要将本机真实proc文件系统挂载到local_proc下使用 -mount -t proc proc /another_rootfs/local_proc/ - -# 将chroot内环境与外部环境bind,方便进行配置和运行 -mount --bind /var/run/ /another_rootfs/var/run/ -mount --bind /var/lib/ /another_rootfs/var/lib/ -mount --bind /etc /another_rootfs/etc - -mkdir -p /another_rootfs/var/lib/isulad - -# 在chroot环境内创建并挂载dev、sys和cgroup文件系统 -mount -t devtmpfs devtmpfs /another_rootfs/dev/ -mount -t sysfs sysfs /another_rootfs/sys -mkdir -p /another_rootfs/sys/fs/cgroup -mount -t tmpfs tmpfs /another_rootfs/sys/fs/cgroup -list="perf_event freezer files net_cls,net_prio hugetlb pids rdma cpu,cpuacct memory devices blkio cpuset" -for i in $list -do - echo $i - mkdir -p /another_rootfs/sys/fs/cgroup/$i - mount -t cgroup cgroup -o rw,nosuid,nodev,noexec,relatime,$i /another_rootfs/sys/fs/cgroup/$i -done - -## common system dir -mount -t qtfs -o proc /proc /another_rootfs/proc -echo "proc" -mount -t qtfs /sys /another_rootfs/sys -echo "cgroup" - -# 挂载容器管理面所需要的共享目录 -mount -t qtfs /var/lib/docker/containers /another_rootfs/var/lib/docker/containers -mount -t qtfs /var/lib/docker/containerd /another_rootfs/var/lib/docker/containerd -mount -t qtfs /var/lib/docker/overlay2 /another_rootfs/var/lib/docker/overlay2 -mount -t qtfs /var/lib/docker/image /another_rootfs/var/lib/docker/image -mount -t qtfs /var/lib/docker/tmp /another_rootfs/var/lib/docker/tmp -mkdir -p /another_rootfs/run/containerd/io.containerd.runtime.v1.linux/ -mount -t qtfs /run/containerd/io.containerd.runtime.v1.linux/ 
/another_rootfs/run/containerd/io.containerd.runtime.v1.linux/ -mkdir -p /another_rootfs/var/run/docker/containerd -mount -t qtfs /var/run/docker/containerd /another_rootfs/var/run/docker/containerd -mount -t qtfs /var/lib/kubelet/pods /another_rootfs/var/lib/kubelet/pods -``` - -在/another_rootfs中,需要创建以下脚本,用来支持部分跨主机操作。 - -* /another_rootfs/usr/local/bin/containerd-shim - -``` shell -#!/bin/bash -CMD_NET_ADDR=tcp://<服务端ip>: /usr/bin/rexec /usr/bin/containerd-shim $* -``` - -* /another_rootfs/usr/local/bin/remote_kill - -``` shell -#!/bin/bash -CMD_NET_ADDR=tcp://<服务端ip>: /usr/bin/rexec /usr/bin/kill $* -``` - -* /another_rootfs/usr/sbin/modprobe -``` shell -#!/bin/bash -CMD_NET_ADDR=tcp://<服务端ip>: /usr/bin/rexec /usr/sbin/modprobe $* -``` - -在chroot到dockerd和containerd运行所需的rootfs后,用如下的命令拉起dockerd和containerd - -* containerd -``` shell -#!/bin/bash -SHIM_HOST=<服务端ip> containerd --config /var/run/docker/containerd/containerd.toml --address /var/run/containerd/containerd.sock -``` - -* dockerd -``` shell -#!/bin/bash -SHIM_HOST=<服务端ip> CMD_NET_ADDR=tcp://<服务端ip>: /usr/bin/dockerd --containerd /var/run/containerd/containerd.sock -``` - -* kubelet - -在chroot环境内使用原参数拉起kubelet即可。 - -因为我们已经将/var/run/和/another_rootfs/var/run/绑定在了一起,所以可以在正常的rootfs下,通过docker来访问docker.sock接口进行容器管理。 - -至此,完成容器管理面卸载到DPU,可以通过docker相关操作进行容器创建、删除等操作,也可以通过kubectl在当前节点进行pods调度和销毁,且实际容器业务进程运行在HOST侧。 - -> ![](./public_sys-resources/icon-note.gif)**说明**: -> -> 本指导所述操作只涉及容器管理面进程卸载,不包含容器网络和数据卷volume等卸载,如有相关需求,需要通过额外的网络或存储卸载能力支持。本指导支持不带网络和存储的容器跨节点拉起。 \ No newline at end of file diff --git a/docs/zh/docs/HSAK/develop_with_hsak.md b/docs/zh/docs/HSAK/develop_with_hsak.md deleted file mode 100644 index d96c08a1337679bd2b4c73a97ac45bda9872a094..0000000000000000000000000000000000000000 --- a/docs/zh/docs/HSAK/develop_with_hsak.md +++ /dev/null @@ -1,227 +0,0 @@ - -## 使用说明 - -### nvme.conf.in配置文件 - -HSAK配置文件默认安装在/etc/spdk/nvme.conf.in,开发人员可以根据实际业务需要对配置文件进行修改,配置文件内容如下: -- [Global] -1. 
ReactorMask:指定用于轮询IO的核(16进制,不能指定0核,按bit位从低位到高位,分别表示不同CPU核,如:0x1表示0核,0x6表示1、2两个核,以此类推,本字段最大支持34个字符,去掉表示16进制的0x标记,剩余32个计数字符,每个16进制字符最大是F,可表示4个核,所以最多可以支持32*4=128个核)。 -2. LogLevel:HSAK日志打印级别(0:error;1:warning;2:notice;3:info;4:debug)。 -3. MemSize:HSAK占用的内存(最小值为500MB)。 -4. MultiQ:是否在同一个块设备上开启多队列。 -5. E2eDif:DIF类型(1:半程保护;2:全程保护),不同厂商的硬盘对DIF支持能力可能不同,具体请参考硬件厂家资料。 -6. IoStat:是否使能IO统计开关(Yes\No)。 -7. RpcServer:是否启动rpc侦听线程(Yes\No)。 -8. NvmeCUSE:是否启动CUSE功能(Yes\No),开启后在/dev/spdk目录下生成nvme字符设备。 -- [Nvme] -1. TransportID:指定NVMe控制器的PCI地址和名称,使用格式为:TransportID "trtype:PCIe traddr:0000:09:00.0" nvme0。 -2. RetryCount:IO失败时的重试次数,0表示不重试,最大255。 -3. TimeoutUsec:IO超时时间,0或者不配置该配置项表示不设置超时时间,单位是μs。 -4. ActionOnTimeout:IO超时行为(None:仅打印信息;Reset:reset控制器;abort:丢弃超时指令),默认None。 -- [Reactor] -1. BatchSize:支持批量提交提交IO的个数,默认是8,最大是32。 - -### 头文件引用 - -HSAK提供两个对外头文件,开发者在使用HSAK进行开发时需要包含这两个文件: -1. bdev_rw.h:定义了数据面用户态IO操作的宏、枚举、数据结构和接口API。 -2. ublock.h:定义了管理面设备管理、信息获取等功能的宏、枚举、数据结构和接口API。 - -### 业务运行 - -开发者在进行软件开发编译后,运行前,需要先运行setup.sh脚本程序,用于重新绑定NVMe盘驱动到用户态,该脚本默认安装在:/opt/spdk。 -执行如下命令将盘驱动从内核态绑定到用户态,同时预留1024个2M大页: - -```shell -[root@localhost ~]# cd /opt/spdk -[root@localhost spdk]# ./setup.sh -0000:3f:00.0 (8086 2701): nvme -> uio_pci_generic -0000:40:00.0 (8086 2701): nvme -> uio_pci_generic -``` - -执行如下命令将盘驱动从用户态恢复到内核态,同时释放预留的大页: - -```shell -[root@localhost ~]# cd /opt/spdk -[root@localhost spdk]# ./setup.sh reset -0000:3f:00.0 (8086 2701): uio_pci_generic -> nvme -0000:40:00.0 (8086 2701): uio_pci_generic -> nvme -``` - -### 用户态IO读写场景 - -开发者通过以下顺序调用HSAK接口,实现经由用户态IO通道的业务数据读写: - -1. 初始化HSAK UIO模块。可调用接口libstorage_init_module,完成HSAK用户态IO通道的初始化。 - -2. 打开磁盘块设备。可调用libstorage_open,打开指定块设备,如需打开多个块设备,需要多次重复调用。 - -3. 申请IO内存。可调用接口libstorage_alloc_io_buf或libstorage_mem_reserve,前者最大可申请单个65K的IO,后者没有限制(除非无可用空间)。 - -4. 对磁盘进行读写操作。根据实际业务需要,可调用如下接口进行读写操作: - - - libstorage_async_read - - libstorage_async_readv - - libstorage_async_write - - libstorage_async_writev - - libstorage_sync_read - - libstorage_sync_write - -5. 
释放IO内存。可调用接口libstorage_free_io_buf或libstorage_mem_free,需要与申请时调用的接口对应。 - -6. 关闭磁盘块设备。可调用接口libstorage_close,关闭指定块设备,如果打开了多个块设备,则需要多次重复调用接口进行关闭。 - - | 接口名称 | 功能描述 | - | ----------------------- | --------------------------------------------- | - | libstorage_init_module | HSAK模块初始化接口。 | - | libstorage_open | 打开块设备。 | - | libstorage_alloc_io_buf | 从SPDK的buf_small_pool或者buf_large_pool中分配内存。 | - | libstorage_mem_reserve | 从DPDK预留的大页内存中分配内存空间。 | - | libstorage_async_read | HSAK下发异步IO读请求的接口(读缓冲区为连续buffer)。 | - | libstorage_async_readv | HSAK下发异步IO读请求的接口(读缓冲区为离散buffer)。 | - | libstorage_async_write | HSAK下发异步IO写请求的接口(写缓冲区为连续buffer)。 | - | libstorage_async_writev | HSAK下发异步IO写请求的接口(写缓冲区为离散buffer)。 | - | libstorage_sync_read | HSAK下发同步IO读请求的接口(读缓冲区为连续buffer)。 | - | libstorage_sync_write | HSAK下发同步IO写请求的接口(写缓冲区为连续buffer)。 | - | libstorage_free_io_buf | 释放所分配的内存到SPDK的buf_small_pool或者buf_large_pool中。 | - | libstorage_mem_free | 释放libstorage_mem_reserve所申请的内存空间。 | - | libstorage_close | 关闭块设备。 | - | libstorage_exit_module | HSAK模块退出接口。 | - -### 盘管理场景 - -HSAK包含一组C接口,可以对盘进行格式化、创建、删除namespace操作。 - -1. 首先需要调用C接口对HSAK UIO组件进行初始化,如果已经初始化过了,就不需要再调用了。 - - libstorage_init_module - -2. 根据业务需要,调用相应的接口进行盘操作,以下接口可单独调用: - - - libstorage_create_namespace - - - libstorage_delete_namespace - - - libstorage_delete_all_namespace - - - libstorage_nvme_create_ctrlr - - - libstorage_nvme_delete_ctrlr - - - libstorage_nvme_reload_ctrlr - - - libstorage_low_level_format_nvm - - - libstorage_deallocate_block - -3. 
最后如果退出程序,则需要销毁HSAK UIO,如果还有其他业务在使用,不需要退出,则不用销毁。 - - libstorage_exit_module - - | 接口名称 | 功能描述 | - | ------------------------------- | ----------------------------------------- | - | libstorage_create_namespace | 在指定控制器上创建namespace(前提是控制器具有namespace管理能力)。 | - | libstorage_delete_namespace | 在指定控制器上删除namespace。 | - | libstorage_delete_all_namespace | 删除指定控制器上所有namespace。 | - | libstorage_nvme_create_ctrlr | 根据PCI地址创建NVMe控制器。 | - | libstorage_nvme_delete_ctrlr | 根据控制器名称销毁NVMe控制器。 | - | libstorage_nvme_reload_ctrlr | 根据传入的配置文件自动创建或销毁NVMe控制器。 | - | libstorage_low_level_format_nvm | 低级格式化NVMe盘。 | - | libstorage_deallocate_block | 告知NVMe盘可释放的块,用于垃圾回收。 | - -### 数据面盘信息查询 - -在HSAK的IO数据面提供一组C接口,用于查询盘信息,上层业务可根据查询到的信息进行相关的业务逻辑处理。 - -1. 首先需要调用C接口对HSAK UIO进行初始化,如果已经初始化过了,就不需要再调用了。 - - libstorage_init_module - -2. 根据业务需要,调用相应接口进行信息查询,以下接口可单独调用: - - - libstorage_get_nvme_ctrlr_info - - - libstorage_get_mgr_info_by_esn - - - libstorage_get_mgr_smart_by_esn - - - libstorage_get_bdev_ns_info - - - libstorage_get_ctrl_ns_info - -3. 最后如果退出程序,则需要销毁HSAK UIO,如果还有其他业务在使用,不需要退出,则不用销毁。 - - libstorage_exit_module - - | 接口名称 | 功能描述 | - | ------------------------------- | ----------------------------- | - | libstorage_get_nvme_ctrlr_info | 获取所有控制器信息。 | - | libstorage_get_mgr_info_by_esn | 数据面获取设备序列号(ESN)对应的磁盘的管理信息。 | - | libstorage_get_mgr_smart_by_esn | 数据面获取设备序列号(ESN)对应的磁盘的SMART信息。 | - | libstorage_get_bdev_ns_info | 根据设备名称,获取namespace信息。 | - | libstorage_get_ctrl_ns_info | 根据控制器名称,获取所有namespace信息。 | - -### 管理面盘信息查询场景 - -在HSAK的管理面组件ublock提供一组C接口,用于支持在管理面对盘信息进行查询。 - -1. 
首先调用C接口对HSAK ublock服务端进行初始化。 - - | 接口名称 | 功能描述 | - | ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------- | - | init_ublock | 初始化ublock功能模块,本接口必须在其他所有ublock接口之前被调用,同一个进程只能初始化一次,原因是init_ublock接口中会初始化DPDK,而DPDK初始化所分配的内存同进程PID绑定,一个PID只能绑定一块内存,且DPDK没有提供释放这块内存的接口,只能通过进程退出来释放。 | - | ublock_init | 本身是对init_ublock接口的宏定义,可理解为将ublock初始化为需要RPC服务。 | - | ublock_init_norpc | 本身是对init_ublock接口的宏定义,可理解为ublock初始化为无RPC服务。 | - -2. 根据业务需要,在另一个进程中调用HSAK UIO组件初始化接口。 - -3. 在ublock服务端进程或客户端进程调用如下接口进行相应的信息查询业务。 - - | 接口名称 | 功能描述 | - | ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------- | - | ublock_get_bdevs | 业务进程通过调用本接口获取设备列表,获取的设备列表中只有PCI地址,不包含具体设备信息,需要获取具体设备信息,请调用接口ublock_get_bdev。 | - | ublock_get_bdev | 进程通过调用本接口获取具体某个设备的信息,设备信息中包括:设备的序列号、型号、fw版本号信息以字符数组形式保持,不是字符串形式。 | - | ublock_get_bdev_by_esn | 进程通过调用该接口,根据给定的ESN号获取对应设备的信息,设备信息中:序列号、型号、fw版本号。 | - | ublock_get_SMART_info | 进程通过调用本接口获取指定设备的SMART信息。 | - | ublock_get_SMART_info_by_esn | 进程通过调用本接口获取ESN号对应设备的SMART信息。 | - | ublock_get_error_log_info | 进程通过调用本接口获取设备的Error log信息。 | - | ublock_get_log_page | 进程通过调用本接口获取指定设备,指定log page的信息。 | - - -4. 对于块设备列表,在获取相应信息后需要调用以下接口进行资源释放。 - - | 接口名称 | 功能描述 | - | ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------- | - | ublock_free_bdevs | 进程通过调用本接口释放设备列表。 | - | ublock_free_bdev | 进程通过调用本接口释放设备资源。 | - - -5. 
最后如果退出程序,则需要销毁HSAK ublock模块(服务端和客户端销毁方法相同)。

| 接口名称 | 功能描述 |
| ---------------------------- | ------------------------------------------------------------ |
| ublock_fini | 销毁ublock功能模块。本接口会销毁ublock模块以及内部创建的资源,需要与ublock初始化接口配对使用。 |

### 日志管理

HSAK的日志当前通过syslog默认输出到/var/log/messages中,由操作系统的rsyslog服务管理。如果需要自定义日志目录,可以通过rsyslog配置:

1. 在配置文件/etc/rsyslog.conf中增加如下内容:

```shell
if ($programname == 'LibStorage') then {
    action(type="omfile" fileCreateMode="0600" file="/var/log/HSAK/run.log")
    stop
}
```

2. 重启rsyslog服务:

```shell
systemctl restart rsyslog
```

3. 启动HSAK进程,日志信息即重定向到对应目录。

4. 重定向日志如果需要转储,需要用户在/etc/logrotate.d/syslog文件中手动配置。

diff --git a/docs/zh/docs/HSAK/hsak_interface.md b/docs/zh/docs/HSAK/hsak_interface.md
deleted file mode 100644
index d1d1680a66ee412362d04f967382e5f6a4e4a908..0000000000000000000000000000000000000000
--- a/docs/zh/docs/HSAK/hsak_interface.md
+++ /dev/null
@@ -1,2392 +0,0 @@

## C接口

### 宏定义和枚举

#### bdev_rw.h

##### enum libstorage_ns_lba_size

1. 原型

```
enum libstorage_ns_lba_size
{
    LIBSTORAGE_NVME_NS_LBA_SIZE_512 = 0x9,
    LIBSTORAGE_NVME_NS_LBA_SIZE_4K = 0xc
};
```

2. 描述

磁盘sector_size(数据)大小。

##### enum libstorage_ns_md_size

1. 原型

```
enum libstorage_ns_md_size
{
    LIBSTORAGE_METADATA_SIZE_0 = 0,
    LIBSTORAGE_METADATA_SIZE_8 = 8,
    LIBSTORAGE_METADATA_SIZE_64 = 64
};
```

2. 描述

磁盘meta data(元数据)大小。

3. 备注

- ES3000 V3(单端口)支持5种扇区类型的格式化(512+0,512+8,4K+64,4K,4K+8)。

- ES3000 V3(双端口)支持4种扇区类型的格式化(512+0,512+8,4K+64,4K)。

- ES3000 V5支持5种扇区类型的格式化(512+0,512+8,4K+64,4K,4K+8)。

- Optane盘支持7种扇区类型的格式化(512+0,512+8,512+16,4K,4K+8,4K+64,4K+128)。

##### enum libstorage_ns_pi_type

1.
原型 -``` -enum libstorage_ns_pi_type -{ -LIBSTORAGE_FMT_NVM_PROTECTION_DISABLE = 0x0, -LIBSTORAGE_FMT_NVM_PROTECTION_TYPE1 = 0x1, -LIBSTORAGE_FMT_NVM_PROTECTION_TYPE2 = 0x2, -LIBSTORAGE_FMT_NVM_PROTECTION_TYPE3 = 0x3, -}; -``` -2. 描述 - -磁盘支持的保护类型。 - -3. 备注 - -ES3000仅支持保护类型0和保护类型3,Optane盘仅支持保护类型0和保护类型1。 -##### enum libstorage_crc_and_prchk - -1. 原型 -``` -enum libstorage_crc_and_prchk -{ -LIBSTORAGE_APP_CRC_AND_DISABLE_PRCHK = 0x0, -LIBSTORAGE_APP_CRC_AND_ENABLE_PRCHK = 0x1, -LIBSTORAGE_LIB_CRC_AND_DISABLE_PRCHK = 0x2, -LIBSTORAGE_LIB_CRC_AND_ENABLE_PRCHK = 0x3, -#define NVME_NO_REF 0x4 -LIBSTORAGE_APP_CRC_AND_DISABLE_PRCHK_NO_REF = LIBSTORAGE_APP_CRC_AND_DISABLE_PRCHK | NVME_NO_REF, -LIBSTORAGE_APP_CRC_AND_ENABLE_PRCHK_NO_REF = LIBSTORAGE_APP_CRC_AND_ENABLE_PRCHK | NVME_NO_REF, -}; -``` -2. 描述 - -- LIBSTORAGE_APP_CRC_AND_DISABLE_PRCHK:应用层做CRC校验,HSAK不做CRC校验,关闭盘的CRC校验。 - -- LIBSTORAGE_APP_CRC_AND_ENABLE_PRCHK:应用层做CRC校验,HSAK不做CRC校验,开启盘的CRC校验。 - -- LIBSTORAGE_LIB_CRC_AND_DISABLE_PRCHK:应用层不做CRC校验,HSAK做CRC校验,关闭盘的CRC校验。 - -- LIBSTORAGE_LIB_CRC_AND_ENABLE_PRCHK:应用层不做CRC校验,HSAK做CRC校验,开启盘的CRC校验。 - -- LIBSTORAGE_APP_CRC_AND_DISABLE_PRCHK_NO_REF:应用层做CRC校验,HSAK不做CRC校验,关闭盘的CRC校验。对于PI TYPE为1的磁盘(Intel optane P4800),关闭盘的REF TAG校验。 - -- LIBSTORAGE_APP_CRC_AND_ENABLE_PRCHK_NO_REF:应用层做CRC校验,HSAK不做CRC校验,开启盘的CRC校验。对于PI TYPE为1的磁盘(Intel optane P4800),关闭盘的REF TAG校验。 - -- Intel optane P4800盘PI TYPE为1,默认会校验元数据区的CRC和REF TAG。 - -- Intel optane P4800盘的512+8格式支持DIF,4096+64格式不支持。 - -- ES3000 V3和ES3000 V5盘PI TYPE为3,默认只校验元数据区的CRC。 - -- ES3000 V3的512+8格式支持DIF,4096+64格式不支持。ES3000 V5的512+8和4096+64格式均支持DIF。 - - -总结为如下: - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| 端到端校验方式 | ctrl flag | CRC生成者 | 写流程:应用校验CRC | 写流程:HSAK校验CRC | 写流程:盘校验CRC | 读流程:应用校验CRC | 读流程:HSAK校验CRC | 读流程:盘校验CRC |
| -------------- | --------- | --------- | --- | --- | --- | --- | --- | --- |
| 半程保护 | 0 | 控制器 | × | × | × | × | × | × |
| 半程保护 | 1 | 控制器 | × | × | × | × | × | √ |
| 半程保护 | 2 | 控制器 | × | × | × | × | × | × |
| 半程保护 | 3 | 控制器 | × | × | × | × | × | √ |
| 全程保护 | 0 | APP | √ | × | × | √ | × | × |
| 全程保护 | 1 | APP | √ | × | √ | √ | × | √ |
| 全程保护 | 2 | HSAK | × | √ | × | × | √ | × |
| 全程保护 | 3 | HSAK | × | √ | √ | × | √ | √ |
- - -##### enum libstorage_print_log_level - -1. 原型 - -``` -enum libstorage_print_log_level -{ -LIBSTORAGE_PRINT_LOG_ERROR, -LIBSTORAGE_PRINT_LOG_WARN, -LIBSTORAGE_PRINT_LOG_NOTICE, -LIBSTORAGE_PRINT_LOG_INFO, -LIBSTORAGE_PRINT_LOG_DEBUG, -}; -``` - -2. 描述 - -SPDK日志打印级别:ERROR、WARN、NOTICE、INFO、DEBUG,分别对应配置文件中的0~4。 - -##### MAX_BDEV_NAME_LEN - -1. 原型 -``` -#define MAX_BDEV_NAME_LEN 24 -``` -2. 描述 - -块设备名最大长度限制。 - -##### MAX_CTRL_NAME_LEN - -1. 原型 -``` -#define MAX_CTRL_NAME_LEN 16 -``` -2. 描述 - -控制器名最大长度限制。 - -##### LBA_FORMAT_NUM - -1. 原型 -``` -#define LBA_FORMAT_NUM 16 -``` -2. 描述 - -控制器所支持的LBA格式数目。 - -##### LIBSTORAGE_MAX_DSM_RANGE_DESC_COUNT - -1. 原型 - -#define LIBSTORAGE_MAX_DSM_RANGE_DESC_COUNT 256 - -2. 描述 - -数据集管理命令中16字节集的最大数目。 - -#### ublock.h - -##### UBLOCK_NVME_UEVENT_SUBSYSTEM_UIO - -1. 原型 -``` -#define UBLOCK_NVME_UEVENT_SUBSYSTEM_UIO 1 -``` -2. 描述 - -用于定义uevent事件所对应的子系统是内核uio,在业务收到uevent事件时,通过该宏定义判断是否为需要处理的内核uio事件。 - -数据结构struct ublock_uevent中成员int subsystem的值取值为UBLOCK_NVME_UEVENT_SUBSYSTEM_UIO,当前仅此一个可选值。 - -##### UBLOCK_TRADDR_MAX_LEN - -1. 原型 -``` -#define UBLOCK_TRADDR_MAX_LEN 256 -``` -2. 描述 - -以"域:总线:设备.功能"(%04x:%02x:%02x.%x)格式表示的PCI地址字符串的最大长度,其实实际长度远小于256字节。 - -##### UBLOCK_PCI_ADDR_MAX_LEN - -1. 原型 -``` -#define UBLOCK_PCI_ADDR_MAX_LEN 256 -``` -2. 描述 - -PCI地址字符串最大长度,实际长度远小于256字节;此处PCI地址格式可能的形式为: - -- 全地址:%x:%x:%x.%x 或 %x.%x.%x.%x。 - -- 功能值为0:%x:%x:%x。 - -- 域值为0:%x:%x.%x 或 %x.%x.%x。 - -- 域和功能值为0:%x:%x 或 %x.%x。 - -##### UBLOCK_SMART_INFO_LEN - -1. 原型 -``` -#define UBLOCK_SMART_INFO_LEN 512 -``` -2. 描述 - -获取NVMe盘SMART信息结构体的大小,为512字节。 - -##### enum ublock_rpc_server_status - -1. 原型 -``` -enum ublock_rpc_server_status { -// start rpc server or not -UBLOCK_RPC_SERVER_DISABLE = 0, -UBLOCK_RPC_SERVER_ENABLE = 1, -}; -``` -2. 描述 - -用于表示HSAK内部RPC服务状态,启用或关闭。 - -##### enum ublock_nvme_uevent_action - -1. 原型 -``` -enum ublock_nvme_uevent_action { -UBLOCK_NVME_UEVENT_ADD = 0, -UBLOCK_NVME_UEVENT_REMOVE = 1, -UBLOCK_NVME_UEVENT_INVALID, -}; -``` -2. 
描述 - -用于表示uevent热插拔事件是插入硬盘还是移除硬盘。 - -##### enum ublock_subsystem_type - -1. 原型 -``` -enum ublock_subsystem_type { -SUBSYSTEM_UIO = 0, -SUBSYSTEM_NVME = 1, -SUBSYSTEM_TOP -}; -``` -2. 描述 - -指定回调函数类型,用于区分产品注册回调函数时是针对于uio驱动还是针对于内核nvme驱动。 - -### 数据结构 - -#### bdev_rw.h - -##### struct libstorage_namespace_info - -1. 原型 -``` -struct libstorage_namespace_info -{ -char name[MAX_BDEV_NAME_LEN]; -uint64_t size; /** namespace size in bytes */ -uint64_t sectors; /** number of sectors */ -uint32_t sector_size; /** sector size in bytes */ -uint32_t md_size; /** metadata size in bytes */ -uint32_t max_io_xfer_size; /** maximum i/o size in bytes */ -uint16_t id; /** namespace id */ -uint8_t pi_type; /** end-to-end data protection information type */ -uint8_t is_active :1; /** namespace is active or not */ -uint8_t ext_lba :1; /** namespace support extending LBA size or not */ -uint8_t dsm :1; /** namespace supports Dataset Management or not */ -uint8_t pad :3; -uint64_t reserved; -}; -``` -2. 描述 - -该数据结构中包含硬盘namespace相关信息。 - -3. 结构体成员 - -| **成员** | 描述 | -|------------------------------|------------------------------------------------| -| char name[MAX_BDEV_NAME_LEN] | Namespace名字 | -| uint64_t size | 该namespace所分配的硬盘大小,字节为单位 | -| uint64_t sectors | 扇区数 | -| uint32_t sector_size | 每扇区大小,字节为单位 | -| uint32_t md_size | Metadata大小,字节为单位 | -| uint32_t max_io_xfer_size | 最大允许的单次IO操作数据大小,字节为单位 | -| uint16_t id | Namespace ID | -| uint8_t pi_type | 数据保护类型,取值自enum libstorage_ns_pi_type | -| uint8_t is_active :1 | Namespace是否激活 | -| uint8_t ext_lba :1 | Namespace是否支持扩展LBA | -| uint8_t dsm :1 | Namespace是否支持数据集管理 | -| uint8_t pad :3 | 保留字段 | -| uint64_t reserved | 保留字段 | - - - - -##### struct libstorage_nvme_ctrlr_info - -1. 
原型 -``` -struct libstorage_nvme_ctrlr_info -{ -char name[MAX_CTRL_NAME_LEN]; -char address[24]; -struct -{ -uint32_t domain; -uint8_t bus; -uint8_t dev; -uint8_t func; -} pci_addr; -uint64_t totalcap; /* Total NVM Capacity in bytes */ -uint64_t unusecap; /* Unallocated NVM Capacity in bytes */ -int8_t sn[20]; /* Serial number */ -uint8_t fr[8]; /* Firmware revision */ -uint32_t max_num_ns; /* Number of namespaces */ -uint32_t version; -uint16_t num_io_queues; /* num of io queues */ -uint16_t io_queue_size; /* io queue size */ -uint16_t ctrlid; /* Controller id */ -uint16_t pad1; -struct -{ -struct -{ -/** metadata size */ -uint32_t ms : 16; -/** lba data size */ -uint32_t lbads : 8; -uint32_t reserved : 8; -} lbaf[LBA_FORMAT_NUM]; -uint8_t nlbaf; -uint8_t pad2[3]; -uint32_t cur_format : 4; -uint32_t cur_extended : 1; -uint32_t cur_pi : 3; -uint32_t cur_pil : 1; -uint32_t cur_can_share : 1; -uint32_t mc_extented : 1; -uint32_t mc_pointer : 1; -uint32_t pi_type1 : 1; -uint32_t pi_type2 : 1; -uint32_t pi_type3 : 1; -uint32_t md_start : 1; -uint32_t md_end : 1; -uint32_t ns_manage : 1; /* Supports the Namespace Management and Namespace Attachment commands */ -uint32_t directives : 1; /* Controller support Directives or not */ -uint32_t streams : 1; /* Controller support Streams Directives or not */ -uint32_t dsm : 1; /* Controller support Dataset Management or not */ -uint32_t reserved : 11; -} cap_info; -}; -``` -1. 描述 - -该数据结构中包含硬盘控制器相关信息。 - -2. 结构体成员 - - -| **成员** | **描述** | -|----------|----------| -| char name[MAX_CTRL_NAME_LEN] | 控制器名字 | -| char address[24] | PCI地址,字符串形式 | -| struct
{
uint32_t domain;
uint8_t bus;
uint8_t dev;
uint8_t func;
} pci_addr | PCI地址,分段形式 |
| uint64_t totalcap | 控制器的总容量大小(字节为单位)。Optane盘基于NVMe 1.0协议,不支持该字段 |
| uint64_t unusecap | 控制器未使用的容量大小(字节为单位)。Optane盘基于NVMe 1.0协议,不支持该字段 |
| int8_t sn[20] | 硬盘序列号。不带'\0'结束符的ASCII字符串 |
| uint8_t fr[8] | 硬盘firmware版本号。不带'\0'结束符的ASCII字符串 |
| uint32_t max_num_ns | 最大允许的namespace数 |
| uint32_t version | 控制器支持的NVMe标准协议版本号 |
| uint16_t num_io_queues | 硬盘支持的IO队列数量 |
| uint16_t io_queue_size | IO队列最大深度 |
| uint16_t ctrlid | 控制器ID |
| uint16_t pad1 | 保留字段 |

struct cap_info子结构体成员:

| **成员** | **描述** |
|-----------------------------------|------------------------------------|
| struct<br>
{
uint32_t ms : 16;
uint32_t lbads : 8;
uint32_t reserved : 8;
}lbaf[LBA_FORMAT_NUM] | ms:元数据大小,最小为8字节
lbads:指示LBA大小为2^lbads,lbads不小于9 | -| uint8_t nlbaf | 控制器所支持的LBA格式数 | -| uint8_t pad2[3] | 保留字段 | -| uint32_t cur_format : 4 | 控制器当前的LBA格式 | -| uint32_t cur_extended : 1 | 控制器当前是否支持扩展型LBA | -| uint32_t cur_pi : 3 | 控制器当前的保护类型 | -| uint32_t cur_pil : 1 | 控制器当前的PI(保护信息)位于元数据的first eight bytes或者last eight bytes | -| uint32_t cur_can_share : 1 | namespace是否支持多路径传输 | -| uint32_t mc_extented : 1 | 元数据是否作为数据缓冲区的一部分进行传输 | -| uint32_t mc_pointer : 1 | 元数据是否与数据缓冲区分离 | -| uint32_t pi_type1 : 1 | 控制器是否支持保护类型一 | -| uint32_t pi_type2 : 1 | 控制器是否支持保护类型二 | -| uint32_t pi_type3 : 1 | 控制器是否支持保护类型三 | -| uint32_t md_start : 1 | 控制器是否支持PI(保护信息)位于元数据的first eight bytes | -| uint32_t md_end : 1 | 控制器是否支持PI(保护信息)位于元数据的last eight bytes | -| uint32_t ns_manage : 1 | 控制器是否支持namespace管理 | -| uint32_t directives : 1 | 是否支持Directives命令集 | -| uint32_t streams : 1 | 是否支持Streams Directives | -| uint32_t dsm : 1 | 是否支持Dataset Management命令 | -| uint32_t reserved : 11 | 保留字段 | - -##### struct libstorage_dsm_range_desc - -1. 原型 -``` -struct libstorage_dsm_range_desc -{ -/* RESERVED */ -uint32_t reserved; - -/* NUMBER OF LOGICAL BLOCKS */ -uint32_t block_count; - -/* UNMAP LOGICAL BLOCK ADDRESS */uint64_t lba;}; -``` -2. 描述 - -数据管理命令集中单个16字节集的定义。 - -3. 结构体成员 - -| **成员** | **描述** | -|----------------------|--------------| -| uint32_t reserved | 保留字段 | -| uint32_t block_count | 单位LBA的数量 | -| uint64_t lba | 起始LBA | - -##### struct libstorage_ctrl_streams_param - -1. 原型 -``` -struct libstorage_ctrl_streams_param -{ -/* MAX Streams Limit */ -uint16_t msl; - -/* NVM Subsystem Streams Available */ -uint16_t nssa; - -/* NVM Subsystem Streams Open */uint16_t nsso; - -uint16_t pad; -}; -``` -2. 描述 - -NVMe盘支持的Streams属性值。 - -3. 
结构体成员

| **成员** | **描述** |
|---------------|--------------------------------------|
| uint16_t msl | 硬盘支持的最大Streams资源数 |
| uint16_t nssa | 每个NVM子系统可使用的Streams资源数 |
| uint16_t nsso | 每个NVM子系统已经使用的Streams资源数 |
| uint16_t pad | 保留字段 |

##### struct libstorage_bdev_streams_param

1. 原型

```
struct libstorage_bdev_streams_param
{
    /* Stream Write Size */
    uint32_t sws;

    /* Stream Granularity Size */
    uint16_t sgs;

    /* Namespace Streams Allocated */
    uint16_t nsa;

    /* Namespace Streams Open */
    uint16_t nso;

    uint16_t reserved[3];
};
```

2. 描述

Namespace的Streams属性值。

3. 结构体成员

| **成员** | **描述** |
|-------------------------|---------------------------------|
| uint32_t sws | 性能最优的写粒度,单位:sectors |
| uint16_t sgs | Streams分配的写粒度,单位:sws |
| uint16_t nsa | Namespace可使用的私有Streams资源数 |
| uint16_t nso | Namespace已使用的私有Streams资源数 |
| uint16_t reserved[3] | 保留字段 |

##### struct libstorage_mgr_info

1. 原型

```
struct libstorage_mgr_info
{
    char pci[24];
    char ctrlName[MAX_CTRL_NAME_LEN];
    uint64_t sector_size;
    uint64_t cap_size;
    uint16_t device_id;
    uint16_t subsystem_device_id;
    uint16_t vendor_id;
    uint16_t subsystem_vendor_id;
    uint16_t controller_id;
    int8_t serial_number[20];
    int8_t model_number[40];
    uint8_t firmware_revision[8];
};
```

2. 描述

磁盘管理信息(与管理面使用的磁盘信息一致)。

3. 结构体成员

| **成员** | **描述** |
|-------------------------|------------------------------------|
| char pci[24] | 磁盘PCI地址字符串 |
| char ctrlName[MAX_CTRL_NAME_LEN] | 磁盘控制器名字符串 |
| uint64_t sector_size | 磁盘扇区大小 |
| uint64_t cap_size | 磁盘容量,单位:字节 |
| uint16_t device_id | 磁盘设备ID |
| uint16_t subsystem_device_id | 磁盘子系统设备ID |
| uint16_t vendor_id | 磁盘厂商ID |
| uint16_t subsystem_vendor_id | 磁盘子系统厂商ID |
| uint16_t controller_id | 磁盘控制器ID |
| int8_t serial_number[20] | 磁盘序列号 |
| int8_t model_number[40] | 设备型号 |
| uint8_t firmware_revision[8] | 固件版本号 |

##### struct __attribute__((packed)) libstorage_smart_info

1.
原型 -``` -/* same with struct spdk_nvme_health_information_page in nvme_spec.h */ -struct __attribute__((packed)) libstorage_smart_info { -/* details of uint8_t critical_warning - -union spdk_nvme_critical_warning_state { - -uint8_t raw; -* - -struct { - -uint8_t available_spare : 1; - -uint8_t temperature : 1; - -uint8_t device_reliability : 1; - -uint8_t read_only : 1; - -uint8_t volatile_memory_backup : 1; - -uint8_t reserved : 3; - -} bits; - -}; -*/ -uint8_t critical_warning; -uint16_t temperature; -uint8_t available_spare; -uint8_t available_spare_threshold; -uint8_t percentage_used; -uint8_t reserved[26]; - -/* - -Note that the following are 128-bit values, but are - -defined as an array of 2 64-bit values. -*/ -/* Data Units Read is always in 512-byte units. */ -uint64_t data_units_read[2]; -/* Data Units Written is always in 512-byte units. */ -uint64_t data_units_written[2]; -/* For NVM command set, this includes Compare commands. */ -uint64_t host_read_commands[2]; -uint64_t host_write_commands[2]; -/* Controller Busy Time is reported in minutes. */ -uint64_t controller_busy_time[2]; -uint64_t power_cycles[2]; -uint64_t power_on_hours[2]; -uint64_t unsafe_shutdowns[2]; -uint64_t media_errors[2]; -uint64_t num_error_info_log_entries[2]; -/* Controller temperature related. */ -uint32_t warning_temp_time; -uint32_t critical_temp_time; -uint16_t temp_sensor[8]; -uint8_t reserved2[296]; -}; -``` -1. 描述 - -该数据结构定义了硬盘SMART INFO信息内容。 - -2. 结构体成员 - -| **成员** | **描述(具体可以参考NVMe协议)** | -|-----------------------------------|------------------------------------| -| uint8_t critical_warning | 该域表示控制器状态的重要的告警,bit位设置为1表示有效,可以设置
多个bit位有效。重要的告警信息通过异步事件返回给主机端。
Bit0:设置为1时表示冗余空间小于设定的阈值
Bit1:设置为1时表示温度超过或低于一个重要的阈值
Bit2:设置为1时表示由于重要的media错误或者internal error,器件的可靠性已经降低。
Bit3:设置为1时,该介质已经被置为只读模式。
Bit4:设置为1时,表示控制器的易失性器件fail,该域仅在控制器内部存在易失性器件时有效。
Bit 5~7:保留 | -| uint16_t temperature | 表示整个器件的温度,单位为Kelvin。 | -| uint8_t available_spare | 表示可用冗余空间的百分比(0到100%)。 | -| uint8_t available_spare_threshold | 可用冗余空间的阈值,低于该阈值时上报异步事件。 | -| uint8_t percentage_used | 该值表示用户实际使用和厂家设定的器件寿命的百分比,100表示已经达
到厂家预期的寿命,但可能不会失效,可以继续使用。该值允许大于100
,高于254的值都会被置为255。 | -| uint8_t reserved[26] | 保留 | -| uint64_t data_units_read[2] | 该值表示主机端从控制器中读走的512字节数目,其中1表示读走100
0个512字节,该值不包括metadata。当LBA大小不为512
B时,控制器将其转换成512B进行计算。16进制表示。 | -| uint64_t data_units_written[2] | 该值表示主机端写入控制器中的512字节数目,其中1表示写入1000
个512字节,该值不包括metadata。当LBA大小不为512B
时,控制器将其转换成512B进行计算。16进制表示。 | -| uint64_t host_read_commands[2] | 表示下发到控制器的读命令的个数。 | -| uint64_t host_write_commands[2] | 表示下发到控制器的写命令的个数 | -| uint64_t controller_busy_time[2] | 表示控制器处理I/O命令的busy时间,从命令下发SQ到完成命令返回到CQ的整个过程都为busy。该值以分钟为单位。 | -| uint64_t power_cycles[2] | 上下电次数。 | -| uint64_t power_on_hours[2] | power-on时间小时数。 | -| uint64_t unsafe_shutdowns[2] | 异常关机次数,掉电时仍未接收到CC.SHN时该值加1。 | -| uint64_t media_errors[2] | 表示控制器检测到不可恢复的数据完整性错误的次数,
其中包括不可纠的ECC错误,CRC错误,LBA tag不匹配。 | -| uint64_t num_error_info_log_entries[2] | 该域表示控制器生命周期内的错误信息日志的entry数目。 | -| uint32_t warning_temp_time | 温度超过warning告警值的累积时间,单位分钟。 | -| uint32_t critical_temp_time | 温度超过critical告警值的累积时间,单位分钟。 | -| uint16_t temp_sensor[8] | 温度传感器1~8的温度值,单位Kelvin。 | -| uint8_t reserved2[296] | 保留 | - -##### libstorage_dpdk_contig_mem - -1. 原型 -``` -struct libstorage_dpdk_contig_mem { -uint64_t virtAddr; -uint64_t memLen; -uint64_t allocLen; -}; -``` -2. 描述 - -DPDK内存初始化之后,通知业务层初始化完成的回调函数参数中描述一段连续虚拟内存的信息。 - -当前HSAK预留了800M内存,其他内存通过该结构体中的allocLen返回给业务层,用于业务层申请内存自行管理。 - -HSAK需要预留的总内存是800M左右,每一个内存段上预留的内存是根据环境的NUMA节点数来计算的。在NUMA节点过多时,每个内存段上预留的内存过小,会导致HSAK初始化失败。因此HSAK只支持最多4个NUMA节点的环境。 - -3. 结构体成员 - -| **成员** | **描述** | -|--------------------|----------------------| - |uint64_t virtAddr |虚拟内存起始地址。| - |uint64_t memLen |虚拟内存长度,单位:字节。| - |uint64_t allocLen |该内存段中可用的内存长度,单位:字节。| - -##### struct libstorage_dpdk_init_notify_arg - -1. 原型 -``` -struct libstorage_dpdk_init_notify_arg { -uint64_t baseAddr; -uint16_t memsegCount; -struct libstorage_dpdk_contig_mem *memseg; -}; -``` -2. 描述 - -用于DPDK内存初始化之后,通知业务层初始化完成的回调函数参数,表示所有虚拟内存段信息。 - -3. 结构体成员 - -| **成员** | **描述**| -|------------------------|-----------------------| - |uint64_t baseAddr |虚拟内存起始地址。| - |uint16_t memsegCount |有效的'memseg'数组成员个数,即连续的虚拟内存段的段数。| - |struct libstorage_dpdk_contig_mem *memseg |指向内存段数组的指针,每个数组元素都是一段连续的虚拟内存,两两元素之间是不连续的。| - -##### struct libstorage_dpdk_init_notify - -1. 原型 -``` -struct libstorage_dpdk_init_notify { -const char *name; -void (*notifyFunc)(const struct libstorage_dpdk_init_notify_arg *arg); -TAILQ_ENTRY(libstorage_dpdk_init_notify) tailq; -}; -``` -2. 描述 - -用于DPDK内存初始化之后,通知业务层回调函数注册的结构体。 - -3. 
结构体成员 - -| **成员** | **描述**| -|-------------------------------|--------------------------| - |const char *name |注册的回调函数的业务层模块名字。| - |void (*notifyFunc)(const struct libstorage_dpdk_init_notify_arg *arg) |DPDK内存初始化之后,通知业务层初始化完成的回调函数参数。| - |TAILQ_ENTRY(libstorage_dpdk_init_notify) tailq |存放回调函数注册的链表。| - -#### ublock.h - -##### struct ublock_bdev_info - -1. 原型 -``` -struct ublock_bdev_info { -uint64_t sector_size; -uint64_t cap_size; // cap_size -uint16_t device_id; -uint16_t subsystem_device_id; // subsystem device id of nvme control -uint16_t vendor_id; -uint16_t subsystem_vendor_id; -uint16_t controller_id; -int8_t serial_number[20]; -int8_t model_number[40]; -int8_t firmware_revision[8]; -}; -``` -2. 描述 - -该数据结构中包含硬盘设备信息。 - -3. 结构体成员 - -| **成员** | **描述**| -|------------------|------------| - |uint64_t sector_size |硬盘扇区大小,比如512字节 | - |uint64_t cap_size |硬盘总容量,字节为单位 | - |uint16_t device_id |设备id号 | - |uint16_t subsystem_device_id |子系统的设备id号 | - |uint16_t vendor_id |设备厂商主id号 | - |uint16_t subsystem_vendor_id |设备厂商子id号 | - |uint16_t controller_id |设备控制器id号 | - |int8_t serial_number[20] |设备序列号 | - |int8_t model_number[40] |设备型号 | - |int8_t firmware_revision[8] |固件版本号 | - -##### struct ublock_bdev - -1. 原型 -``` -struct ublock_bdev { -char pci[UBLOCK_PCI_ADDR_MAX_LEN]; -struct ublock_bdev_info info; -struct spdk_nvme_ctrlr *ctrlr; -TAILQ_ENTRY(ublock_bdev) link; -}; -``` -2. 描述 - -该数据结构中包含指定PCI地址的硬盘信息,而结构本身为队列的一个节点。 - -3. 结构体成员 - - |**成员** | **描述** | -|-----------------------------------|----------------------------------------------------------------------------------------------------| - |char pci[UBLOCK_PCI_ADDR_MAX_LEN] | PCI地址 | - |struct ublock_bdev_info info | 硬盘设备信息 | - |struct spdk_nvme_ctrlr *ctrlr | 设备控制器数据结构,该结构体内成员不对外开放,外部业务可通过SPDK开源接口获取相应成员数据。 | - |TAILQ_ENTRY(ublock_bdev) link | 队列前后指针结构体 | - -##### struct ublock_bdev_mgr - -1. 原型 -``` -struct ublock_bdev_mgr { -TAILQ_HEAD(, ublock_bdev) bdevs; -}; -``` -2. 描述 - -该数据结构内定义了一个ublock_bdev队列的头结构。 - -3. 
结构体成员 - - |**成员** | **描述** | - |---------------------------------|------------------| - |TAILQ_HEAD(, ublock_bdev) bdevs; | 队列头结构体 | - -##### struct __attribute__((packed)) ublock_SMART_info - -1. 原型 -``` -struct __attribute__((packed)) ublock_SMART_info { -uint8_t critical_warning; -uint16_t temperature; -uint8_t available_spare; -uint8_t available_spare_threshold; -uint8_t percentage_used; -uint8_t reserved[26]; -/* - -Note that the following are 128-bit values, but are - -defined as an array of 2 64-bit values. -*/ -/* Data Units Read is always in 512-byte units. */ -uint64_t data_units_read[2]; -/* Data Units Written is always in 512-byte units. */ -uint64_t data_units_written[2]; -/* For NVM command set, this includes Compare commands. */ -uint64_t host_read_commands[2]; -uint64_t host_write_commands[2]; -/* Controller Busy Time is reported in minutes. */ -uint64_t controller_busy_time[2]; -uint64_t power_cycles[2]; -uint64_t power_on_hours[2]; -uint64_t unsafe_shutdowns[2]; -uint64_t media_errors[2]; -uint64_t num_error_info_log_entries[2]; -/* Controller temperature related. */ -uint32_t warning_temp_time; -uint32_t critical_temp_time; -uint16_t temp_sensor[8]; -uint8_t reserved2[296]; -}; -``` -2. 描述 - -该数据结构定义了硬盘SMART INFO信息内容。 - -3. 结构体成员 - -| **成员** | **描述(具体可以参考NVMe协议)** | -|-----------------------------------|-----------------------------------| -| uint8_t critical_warning | 该域表示控制器状态的重要的告警,bit位设置为1表示有效,可以设置
多个bit位有效。重要的告警信息通过异步事件返回给主机端。
Bit0:设置为1时表示冗余空间小于设定的阈值
Bit1:设置为1时表示温度超过或低于一个重要的阈值
Bit2:设置为1时表示由于重要的media错误或者internal error,器件的可靠性已经降低。
Bit3:设置为1时,该介质已经被置为只读模式。
Bit4:设置为1时,表示控制器的易失性器件fail,该域仅在控制器内部存在易失性器件时有效。
Bit 5~7:保留 | -| uint16_t temperature | 表示整个器件的温度,单位为Kelvin。 | -| uint8_t available_spare | 表示可用冗余空间的百分比(0到100%)。 | -| uint8_t available_spare_threshold | 可用冗余空间的阈值,低于该阈值时上报异步事件。 | -| uint8_t percentage_used | 该值表示用户实际使用和厂家设定的器件寿命的百分比,100表示已经达
到厂家预期的寿命,但可能不会失效,可以继续使用。该值允许大于100
,高于254的值都会被置为255。 | -| uint8_t reserved[26] | 保留 | -| uint64_t data_units_read[2] | 该值表示主机端从控制器中读走的512字节数目,其中1表示读走100
0个512字节,该值不包括metadata。当LBA大小不为512B
时,控制器将其转换成512B进行计算。16进制表示。 | -| uint64_t data_units_written[2] | 该值表示主机端写入控制器中的512字节数目,其中1表示写入1000
个512字节,该值不包括metadata。当LBA大小不为512B
时,控制器将其转换成512B进行计算。16进制表示。 | -| uint64_t host_read_commands[2] | 表示下发到控制器的读命令的个数。 | -| uint64_t host_write_commands[2] | 表示下发到控制器的写命令的个数 | -| uint64_t controller_busy_time[2] | 表示控制器处理I/O命令的busy时间,从命令下发SQ到完成命令返回到CQ的整个过程都为busy。该值以分钟为单位。| -| uint64_t power_cycles[2] | 上下电次数。 | -| uint64_t power_on_hours[2] | power-on时间小时数。 | -| uint64_t unsafe_shutdowns[2] | 异常关机次数,掉电时仍未接收到CC.SHN时该值加1。 | -| uint64_t media_errors[2] | 表示控制器检测到不可恢复的数据完整性错误的次数,其中包括不可纠的E
CC错误,CRC错误,LBA
tag不匹配。 | -| uint64_t num_error_info_log_entries[2] | 该域表示控制器生命周期内的错误信息日志的entry数目。 | -| uint32_t warning_temp_time | 温度超过warning告警值的累积时间,单位分钟。 | -| uint32_t critical_temp_time | 温度超过critical告警值的累积时间,单位分钟。 | -| uint16_t temp_sensor[8] | 温度传感器1~8的温度值,单位Kelvin。 | -| uint8_t reserved2[296] | 保留 | - -##### struct ublock_nvme_error_info - -1. 原型 -``` -struct ublock_nvme_error_info { -uint64_t error_count; -uint16_t sqid; -uint16_t cid; -uint16_t status; -uint16_t error_location; -uint64_t lba; -uint32_t nsid; -uint8_t vendor_specific; -uint8_t reserved[35]; -}; -``` -2. 描述 - -该数据结构中包含设备控制器中单条错误信息具体内容,不同控制器可支持的错误条数可能不同。 - -3. 结构体成员 - - |**成员** | **描述(具体可以参考NVMe协议)** | - |------------------------|----------------------------------------------------------------------------------------------------------------------------------------| - |uint64_t error_count | Error序号,累增。 | - |uint16_t sqid | 此字段指示与错误信息关联的命令的提交队列标识符。如果错误无法关联特定命令,则该字段应设置为FFFFh。 | - |uint16_t cid | 此字段指示与错误信息关联的命令标识符。如果错误无法关联特定命令,则该字段应设置为FFFFh。 | - |uint16_t status | 此字段指示已完成命令的"状态字段"。 | - |uint16_t error_location | 此字段指示与错误信息关联的命令参数。 | - |uint64_t lba | 该字段表示遇到错误情况的第一个LBA。 | - |uint32_t nsid | 该字段表示遇到错误情况的namespace。 | - |uint8_t vendor_specific | 如果有其他供应商特定的错误信息可用,则此字段提供与该页面关联的日志页面标识符。 值00h表示没有可用的附加信息。有效值的范围为80h至FFh。 | - |uint8_t reserved[35] | 保留 | - -##### struct ublock_uevent - -1. 原型 -``` -struct ublock_uevent { -enum ublock_nvme_uevent_action action; -int subsystem; -char traddr[UBLOCK_TRADDR_MAX_LEN + 1]; -}; -``` -2. 描述 - -该数据结构中包含用于表示uevent事件的相关参数。 - -3. 
结构体成员 - - | **成员** | **描述** | - |----------------------------------------|-------------------------------------------------------------------------------------------------------------------------| - | enum ublock_nvme_uevent_action action | 通过枚举,表示uevent事件类型为插入硬盘,还是移除硬盘。 | - | int subsystem | 表示uevent事件的子系统类型,当前仅支持UBLOCK_NVME_UEVENT_SUBSYSTEM_UIO,如果应用程序收到其他值,则可不处理。 | - | char traddr[UBLOCK_TRADDR_MAX_LEN + 1] | 以"域:总线:设备.功能"(%04x:%02x:%02x.%x)格式表示的PCI地址字符串。 | - -##### struct ublock_hook - -1. 原型 -``` -struct ublock_hook -{ -ublock_callback_func ublock_callback; -void *user_data; -}; -``` -2. 描述 - -该数据结构用于注册回调函数。 - -3. 结构体成员 - -| **成员** | **描述** | -|---------------------------------------|---------------------------------------------------------------------------| -| ublock_callback_func ublock_callback | 表示回调时执行的函数,类型为bool func(void *info, void *user_data). | -| void *user_data | 传给回调函数的用户参数 | - -##### struct ublock_ctrl_iostat_info - -1. 原型 -``` -struct ublock_ctrl_iostat_info -{ -uint64_t num_read_ops; -uint64_t num_write_ops; -uint64_t read_latency_ms; -uint64_t write_latency_ms; -uint64_t io_outstanding; -uint64_t num_poll_timeout; -uint64_t io_ticks_ms; -}; -``` -2. 描述 - -该数据结构用于获取控制器的IO统计信息。 - -3. 结构体成员 - -| **成员** | **描述** | -|-----------------------------|---------------------------------------------| -| uint64_t num_read_ops | 获取的该控制器的读IO个数(累加值) | -| uint64_t num_write_ops | 获取的该控制器的写IO个数(累加值) | -| uint64_t read_latency_ms | 获取的该控制器的读时延(累加值,ms) | -| uint64_t write_latency_ms | 获取的该控制器的写时延(累加值,ms) | -| uint64_t io_outstanding | 获取的该控制器的队列深度 | -| uint64_t num_poll_timeout | 获取的该控制器的轮询超时次数(累加值) | -| uint64_t io_ticks_ms | 获取的该控制器的IO处理时延(累加值,ms) | - -### API - -#### bdev_rw.h - -##### libstorage_get_nvme_ctrlr_info - -1. 接口原型 - -uint32_t libstorage_get_nvme_ctrlr_info(struct libstorage_nvme_ctrlr_info** ppCtrlrInfo); - -2. 接口描述 - -获取所有控制器信息。 - -3. 
参数 - -| **参数成员** | **描述** | -|-----------------------------------|-----------------------------------| -| struct libstorage_nvme_ctrlr_info** ppCtrlrInfo| 出参,返回所有获取到的控制器信息。
说明:
使用后务必通过free接口释放内存。 | - -4. 返回值 - -| **返回值** | **描述** | -|-------------|----------------------------------------------| -| 0 | 控制器信息获取失败,或未获取到任何控制器信息 | -| 大于0 | 获取到的控制器个数 | - -##### libstorage_get_mgr_info_by_esn - -1. 接口原型 -``` -int32_t libstorage_get_mgr_info_by_esn(const char *esn, struct libstorage_mgr_info *mgr_info); -``` -2. 接口描述 - -数据面获取设备序列号(ESN)对应的NVMe磁盘的管理信息。 - -3. 参数 - -| **参数成员** | **描述** | -|---------------------------|----------------------------------------------| -| const char *esn | 被查询设备的ESN号
说明:
ESN号是最大有效长度为20的字符串(不包括字符串结束符),但该长
度根据不同硬件厂商可能存在差异,如不足20字符,需要在字符串末尾加
空格补齐。 | -| struct libstorage_mgr_info *mgr_info | 出参,返回所有获取到的NVMe磁盘管理信息。 | - -4. 返回值 - -| **返回值** | **描述** | -|-------------|------------------------------------------| -| 0 | 查询ESN对应的NVMe磁盘管理信息成功。 | -| -1 | 查询ESN对应的NVMe磁盘管理信息失败。 | -| -2 | 未获取到任何匹配ESN的NVMe磁盘。 | - -##### libstorage_get_mgr_smart_by_esn - -1. 接口原型 -``` -int32_t libstorage_get_mgr_smart_by_esn(const char *esn, uint32_t nsid, struct libstorage_smart_info *mgr_smart_info); -``` -2. 接口描述 - -数据面获取设备序列号(ESN)对应的NVMe磁盘的SMART信息。 - -3. 参数 - -| **参数成员** | **描述** | -|-------------------------------|------------------------------------------| -| const char *esn | 被查询设备的ESN号
说明:
ESN号是最大有效长度为20的字符串(不包括字符串结束符),但该长
度根据不同硬件厂商可能存在差异,如不足20字符,需要在字符串末尾加
空格补齐。 | -| uint32_t nsid | 指定的namespace | -| struct libstorage_mgr_info *mgr_info | 出参,返回所有获取到的NVMe磁盘SMART信息。 | - -4. 返回值 - -| **返回值** | **描述** | -|-------------|---------------------------------------| -| 0 | 查询ESN对应的NVMe磁盘SMART信息成功。 | -| -1 | 查询ESN对应的NVMe磁盘SMART信息失败。 | -| -2 | 未获取到任何匹配ESN的NVMe磁盘。 | - -##### libstorage_get_bdev_ns_info - -1. 接口原型 -``` -uint32_t libstorage_get_bdev_ns_info(const char* bdevName, struct libstorage_namespace_info** ppNsInfo); -``` -2. 接口描述 - -根据设备名称,获取namespace信息。 - -3. 参数 - -| **参数成员** | **描述** | -|-------------------------------|---------------------------------------| -| const char* bdevName | 设备名称 | -| struct libstorage_namespace_info** ppNsInfo | 出参,返回namespace信息。
说明
使用后务必通过free接口释放内存。 | - -4. 返回值 - -| **返回值**| **描述** | -|------------|---------------| -| 0 | 获取失败 | -| 1 | 获取成功 | - -##### libstorage_get_ctrl_ns_info - -1. 接口原型 -``` -uint32_t libstorage_get_ctrl_ns_info(const char* ctrlName, struct libstorage_namespace_info** ppNsInfo); -``` -2. 接口描述 - -根据控制器名称,获取所有namespace信息。 - -3. 参数 - -| **参数成员** | **描述** | -|-------------------------------|---------------------------------------| -| const char* ctrlName | 控制器名称 | -| struct libstorage_namespace_info** ppNsInfo| 出参,返回所有namespace信息。
说明
使用后务必通过free接口释放内存。 |

4. 返回值

| **返回值**| **描述** |
|------------|-------------------------------------------|
| 0 | 获取失败,或未获取到任何namespace信息 |
| 大于0 | 获取到的namespace个数 |

##### libstorage_create_namespace

1. 接口原型

```
int32_t libstorage_create_namespace(const char* ctrlName, uint64_t ns_size, char** outputName);
```

2. 接口描述

在指定控制器上创建namespace(前提是控制器具有namespace管理能力)。

Optane盘基于NVMe 1.0协议,不支持namespace管理,因此不支持该接口的使用。

ES3000 V3和V5默认只支持一个namespace。控制器上默认会存在一个namespace,如果要创建新的namespace,需要先将原有namespace删除。

3. 参数

| **参数成员** | **描述** |
|--------------------------|-------------------------------------------|
| const char* ctrlName | 控制器名称 |
| uint64_t ns_size | 要创建的namespace大小(以sector_size为单位) |
| char** outputName | 出参:创建的namespace名称<br>
说明
使用后务必通过free接口释放内存。 | - -4. 返回值 - -| **返回值** | **描述** | -|-------------|-----------------------------------| -| 小于等于0 | 创建namespace失败 | -| 大于0 | 所创建的namespace编号(从1开始) | - -##### libstorage_delete_namespace - -1. 接口原型 -``` -int32_t libstorage_delete_namespace(const char* ctrlName, uint32_t ns_id); -``` -2. 接口描述 - -在指定控制器上删除namespace。Optane盘基于NVMe 1.0协议,不支持namespace管理,因此不支持该接口的使用。 - -3. 参数 - -| **参数成员** | **描述** | -|-----------------------|-------------------| -| const char* ctrlName | 控制器名字 | -| uint32_t ns_id | Namespace ID | - -4. 返回值 - -| **返回值** | **描述** | -|-------------|-------------------------------------------| -| 0 | 删除成功 | -| 非0 | 删除失败
说明
删除namespace前要求先停止IO相关动作,否则删除失败。 | - -##### libstorage_delete_all_namespace - -1. 接口原型 -``` -int32_t libstorage_delete_all_namespace(const char* ctrlName); -``` -2. 接口描述 - -删除指定控制器上所有namespace。Optane盘基于NVMe 1.0协议,不支持namespace管理,因此不支持该接口的使用。 - -3. 参数 - -| **参数成员** | **描述** | -|------------------------|----------------| -| const char* ctrlName |控制器名称 | - -4. 返回值 - -| **返回值** | **描述** | -|-------------|-------------------------------------------| -| 0 | 删除成功 | -| 非0 | 删除失败
说明
删除namespace前要求先停止IO相关动作,否则删除失败。 | - -##### libstorage_nvme_create_ctrlr - -1. 接口原型 -``` -int32_t libstorage_nvme_create_ctrlr(const char *pci_addr, const char *ctrlr_name); -``` -2. 接口描述 - -根据PCI地址创建NVMe控制器。 - -3. 参数 - -| **参数成员** | **描述** | -|--------------------|-------------------| -| char *pci_addr |PCI地址 | -| char *ctrlr_name |控制器名称 | - -4. 返回值 - -| **返回值** | **描述** | -|-------------|--------------| -| 小于0 | 创建失败 | -| 0 | 创建成功 | - -##### libstorage_nvme_delete_ctrlr - -1. 接口原型 -``` -int32_t libstorage_nvme_delete_ctrlr(const char *ctrlr_name); -``` -1. 接口描述 - -根据控制器名称销毁NVMe控制器。 - -2. 参数 - -| **参数成员** | **描述** | -|-------------------------|-----------------| -| const char *ctrlr_name | 控制器名称 | - -确保已下发的io已经全部返回后方可调用本接口。 - -3. 返回值 - -| **返回值** | **描述** | -|-------------|--------------| -| 小于0 | 销毁失败 | -| 0 | 销毁成功 | - -##### libstorage_nvme_reload_ctrlr - -1. 接口原型 -``` -int32_t libstorage_nvme_reload_ctrlr(const char *cfgfile); -``` -2. 接口描述 - -根据配置文件增删NVMe控制器。 - -3. 参数 - -| **参数成员** | **描述** | -|----------------------|-------------------| -| const char *cfgfile | 配置文件路径 | - - -使用本接口删盘时,需要确保已下发的io已经全部返回。 - -4. 返回值 - -| **返回值** | **描述** | -|-------------|-----------------------------------------------------| -| 小于0 | 根据配置文件增删盘失败(可能部分控制器增删成功) | -| 0 | 根据配置文件增删盘成功 | - -> 使用限制 - -- 目前最多支持在配置文件中配置36个控制器。 - -- 重加载接口会尽可能创建多的控制器,某个控制器创建失败,不会影响其他控制器的创建。 - -- 无法保证并发场景下最终的盘初始化情况与最后调用传入的配置文件相符。 - -- 对正在下发io的盘通过reload删除时,会导致io失败。 - -- 修改配置文件中pci地址对应的控制器名称(e.g.nvme0),调用此接口后无法生效。 - -- reload仅针对于增删盘的场景有效,配置文件中的其他配置项修改无法重载。 - -##### libstorage_low_level_format_nvm - -1. 接口原型 -``` -int8_t libstorage_low_level_format_nvm(const char* ctrlName, uint8_t lbaf, -enum libstorage_ns_pi_type piType, -bool pil_start, bool ms_extented, uint8_t ses); -``` -2. 接口描述 - -低级格式化NVMe盘。 - -3. 
| **Parameter** | **Description** |
|-------------------------------------|----------------------------------------------------------------------------|
| const char* ctrlName | Controller name |
| uint8_t lbaf | LBA format to use |
| enum libstorage_ns_pi_type piType | Protection information type to use |
| bool pil_start | PI placed in the first eight bytes (1) or last eight bytes (0) of the metadata |
| bool ms_extented | Whether to format as extended LBA |
| uint8_t ses | Whether to secure-erase during formatting (currently only 0, no-secure erase, is supported) |

4. Return value

| **Return value** | **Description** |
|-------------|-----------------------------|
| < 0 | Formatting failed |
| >= 0 | LBA format in effect after the successful format |

> Usage restrictions

- This low-level format interface erases the data and metadata of the disk's namespaces. Use it with caution.
- Formatting takes several seconds on ES3000 disks and several minutes on Intel Optane disks; wait for it to complete. Forcibly killing the formatting process causes the format to fail.
- Stop data-plane I/O before formatting. If the disk is handling I/O requests, formatting may occasionally fail, and even a successful format may cause the disk to discard in-flight I/O. Ensure data-plane I/O has stopped before formatting.
- Formatting resets the controller, which invalidates previously initialized disk resources. Restart the data-plane I/O process after formatting completes.
- ES3000 V3 supports protection types 0 and 3, PI start and PI end, and only mc extended. The 512+8 format of ES3000 V3 supports DIF; the 4096+64 format does not.
- ES3000 V5 supports protection types 0 and 3, PI start and PI end, and both mc extended and mc pointer. Both the 512+8 and 4096+64 formats of ES3000 V5 support DIF.
- Optane disks support protection types 0 and 1, only PI end, and only mc extended. The 512+8 format of Optane supports DIF; the 4096+64 format does not.

| **Disk type** | **LBA formats** |
|----------------------|-----------------|
| Intel Optane P4800 | lbaf0: 512+0<br>lbaf1: 512+8<br>lbaf2: 512+16<br>lbaf3: 4096+0<br>lbaf4: 4096+8<br>lbaf5: 4096+64<br>lbaf6: 4096+128 |
| ES3000 V3, V5 | lbaf0: 512+0<br>lbaf1: 512+8<br>lbaf2: 4096+64<br>lbaf3: 4096+0<br>lbaf4: 4096+8 |
##### LIBSTORAGE_CALLBACK_FUNC

1. Interface prototype

```
typedef void (*LIBSTORAGE_CALLBACK_FUNC)(int32_t cb_status, int32_t sct_code, void* cb_arg);
```

2. Interface description

Registered HSAK I/O completion callback.

3. Parameters

| **Parameter** | **Description** |
|------------------------------------|-----------------------------|
| int32_t cb_status | I/O status code: 0 on success, a negative value for a system error code, a positive value for a disk error code (see the [appendix](#appendix) for the meaning of each error code) |
| int32_t sct_code | I/O status code type (0: [GENERIC](#generic); 1: [COMMAND_SPECIFIC](#command_specific); 2: [MEDIA_DATA_INTERGRITY_ERROR](#media_data_intergrity_error); 7: VENDOR_SPECIFIC) |
| void* cb_arg | Argument passed to the callback |

4. Return value

None.
##### libstorage_deallocate_block

1. Interface prototype

```
int32_t libstorage_deallocate_block(int32_t fd, struct libstorage_dsm_range_desc *range, uint16_t range_count, LIBSTORAGE_CALLBACK_FUNC cb, void* cb_arg);
```

2. Interface description

Tells the NVMe disk which blocks can be released.

3. Parameters

| **Parameter** | **Description** |
|------------------------------------|-----------------------------|
| int32_t fd | File descriptor of an opened disk |
| struct libstorage_dsm_range_desc *range | List of releasable block ranges on the NVMe disk. **Note**: This parameter must be allocated with libstorage_mem_reserve from hugepage memory with 4 KB alignment (align set to 4096). The TRIM range is constrained per disk model; exceeding the disk-side maximum TRIM range may trigger data corruption. |
| uint16_t range_count | Number of elements in the range array |
| LIBSTORAGE_CALLBACK_FUNC cb | Callback |
| void* cb_arg | Callback argument |

4. Return value

| **Return value** | **Description** |
|-------------|----------------|
| < 0 | Request submission failed |
| 0 | Request submitted successfully |
##### libstorage_async_write

1. Interface prototype

```
int32_t libstorage_async_write(int32_t fd, void *buf, size_t nbytes, off64_t offset, void *md_buf, size_t md_len, enum libstorage_crc_and_prchk dif_flag, LIBSTORAGE_CALLBACK_FUNC cb, void* cb_arg);
```

2. Interface description

HSAK interface for submitting an asynchronous I/O write request (contiguous write buffer).

3. Parameters

| **Parameter** | **Description** |
|------------------------------------|-----------------------------|
| int32_t fd | File descriptor of the block device |
| void *buf | Write data buffer (4-byte aligned; must not cross a 4 KB page boundary). **Note**: For extended LBA, the buffer must also cover the metadata size. |
| size_t nbytes | Size of a single write I/O (in bytes; a multiple of sector_size). **Note**: Data size only; even for extended LBA it excludes the metadata size. |
| off64_t offset | Write offset in the LBA space (in bytes; a multiple of sector_size). **Note**: Data size only; even for extended LBA it excludes the metadata size. |
| void *md_buf | Metadata buffer (separate-LBA only; set to NULL for extended LBA) |
| size_t md_len | Length of the metadata buffer (separate-LBA only; set to 0 for extended LBA) |
| enum libstorage_crc_and_prchk dif_flag | Whether to compute the DIF and whether to enable disk-side verification |
| LIBSTORAGE_CALLBACK_FUNC cb | Registered callback |
| void* cb_arg | Callback argument |

4. Return value

| **Return value** | **Description** |
|------------|--------------------|
| 0 | Write I/O request submitted successfully |
| Non-0 | Write I/O request submission failed |
##### libstorage_async_read

1. Interface prototype

```
int32_t libstorage_async_read(int32_t fd, void *buf, size_t nbytes, off64_t offset, void *md_buf, size_t md_len, enum libstorage_crc_and_prchk dif_flag, LIBSTORAGE_CALLBACK_FUNC cb, void* cb_arg);
```

2. Interface description

HSAK interface for submitting an asynchronous I/O read request (contiguous read buffer).

3. Parameters

| **Parameter** | **Description** |
|------------------------------------|----------------------------------|
| int32_t fd | File descriptor of the block device |
| void *buf | Read data buffer (4-byte aligned; must not cross a 4 KB page boundary). **Note**: For extended LBA, the buffer must also cover the metadata size. |
| size_t nbytes | Size of a single read I/O (in bytes; a multiple of sector_size). **Note**: Data size only; even for extended LBA it excludes the metadata size. |
| off64_t offset | Read offset in the LBA space (in bytes; a multiple of sector_size). **Note**: Data size only; even for extended LBA it excludes the metadata size. |
| void *md_buf | Metadata buffer (separate-LBA only; set to NULL for extended LBA) |
| size_t md_len | Length of the metadata buffer (separate-LBA only; set to 0 for extended LBA) |
| enum libstorage_crc_and_prchk dif_flag | Whether to compute the DIF and whether to enable disk-side verification |
| LIBSTORAGE_CALLBACK_FUNC cb | Registered callback |
| void* cb_arg | Callback argument |

4. Return value

| **Return value** | **Description** |
|-------------|------------------|
| 0 | Read I/O request submitted successfully |
| Non-0 | Read I/O request submission failed |
##### libstorage_async_writev

1. Interface prototype

```
int32_t libstorage_async_writev(int32_t fd, struct iovec *iov, int iovcnt, size_t nbytes, off64_t offset, void *md_buf, size_t md_len, enum libstorage_crc_and_prchk dif_flag, LIBSTORAGE_CALLBACK_FUNC cb, void* cb_arg);
```

2. Interface description

HSAK interface for submitting an asynchronous I/O write request (scattered write buffers).

3. Parameters

| **Parameter** | **Description** |
|------------------------------------|----------------------------------|
| int32_t fd | File descriptor of the block device |
| struct iovec *iov | Write data buffers. **Note**: For extended LBA, include the metadata size. Addresses must be 4-byte aligned and the length must not exceed 4 GB. |
| int iovcnt | Number of write data buffers |
| size_t nbytes | Size of a single write I/O (in bytes; a multiple of sector_size). **Note**: Data size only; even for extended LBA it excludes the metadata size. |
| off64_t offset | Write offset in the LBA space (in bytes; a multiple of sector_size). **Note**: Data size only; even for extended LBA it excludes the metadata size. |
| void *md_buf | Metadata buffer (separate-LBA only; set to NULL for extended LBA) |
| size_t md_len | Length of the metadata buffer (separate-LBA only; set to 0 for extended LBA) |
| enum libstorage_crc_and_prchk dif_flag | Whether to compute the DIF and whether to enable disk-side verification |
| LIBSTORAGE_CALLBACK_FUNC cb | Registered callback |
| void* cb_arg | Callback argument |

4. Return value

| **Return value** | **Description** |
|--------------|-------------------|
| 0 | Write I/O request submitted successfully |
| Non-0 | Write I/O request submission failed |
##### libstorage_async_readv

1. Interface prototype

```
int32_t libstorage_async_readv(int32_t fd, struct iovec *iov, int iovcnt, size_t nbytes, off64_t offset, void *md_buf, size_t md_len, enum libstorage_crc_and_prchk dif_flag, LIBSTORAGE_CALLBACK_FUNC cb, void* cb_arg);
```

2. Interface description

HSAK interface for submitting an asynchronous I/O read request (scattered read buffers).

3. Parameters

| **Parameter** | **Description** |
|------------------------------------|----------------------------------|
| int32_t fd | File descriptor of the block device |
| struct iovec *iov | Read data buffers. **Note**: For extended LBA, include the metadata size. Addresses must be 4-byte aligned and the length must not exceed 4 GB. |
| int iovcnt | Number of read data buffers |
| size_t nbytes | Size of a single read I/O (in bytes; a multiple of sector_size). **Note**: Data size only; even for extended LBA it excludes the metadata size. |
| off64_t offset | Read offset in the LBA space (in bytes; a multiple of sector_size). **Note**: Data size only; even for extended LBA it excludes the metadata size. |
| void *md_buf | Metadata buffer (separate-LBA only; set to NULL for extended LBA) |
| size_t md_len | Length of the metadata buffer (separate-LBA only; set to 0 for extended LBA) |
| enum libstorage_crc_and_prchk dif_flag | Whether to compute the DIF and whether to enable disk-side verification |
| LIBSTORAGE_CALLBACK_FUNC cb | Registered callback |
| void* cb_arg | Callback argument |

4. Return value

| **Return value** | **Description** |
|-------------|----------------------|
| 0 | Read I/O request submitted successfully |
| Non-0 | Read I/O request submission failed |
##### libstorage_sync_write

1. Interface prototype

```
int32_t libstorage_sync_write(int fd, const void *buf, size_t nbytes, off_t offset);
```

2. Interface description

HSAK interface for submitting a synchronous I/O write request (contiguous write buffer).

3. Parameters

| **Parameter** | **Description** |
|------------------------------------|----------------------------------|
| int32_t fd | File descriptor of the block device |
| void *buf | Write data buffer (4-byte aligned; must not cross a 4 KB page boundary). **Note**: For extended LBA, the buffer must also cover the metadata size. |
| size_t nbytes | Size of a single write I/O (in bytes; a multiple of sector_size). **Note**: Data size only; even for extended LBA it excludes the metadata size. |
| off64_t offset | Write offset in the LBA space (in bytes; a multiple of sector_size). **Note**: Data size only; even for extended LBA it excludes the metadata size. |

4. Return value

| **Return value** | **Description** |
|-------------|-----------------------|
| 0 | Write I/O request submitted successfully |
| Non-0 | Write I/O request submission failed |
##### libstorage_sync_read

1. Interface prototype

```
int32_t libstorage_sync_read(int fd, const void *buf, size_t nbytes, off_t offset);
```

2. Interface description

HSAK interface for submitting a synchronous I/O read request (contiguous read buffer).

3. Parameters

| **Parameter** | **Description** |
|------------------------------------|----------------------------------|
| int32_t fd | File descriptor of the block device |
| void *buf | Read data buffer (4-byte aligned; must not cross a 4 KB page boundary). **Note**: For extended LBA, the buffer must also cover the metadata size. |
| size_t nbytes | Size of a single read I/O (in bytes; a multiple of sector_size). **Note**: Data size only; even for extended LBA it excludes the metadata size. |
| off64_t offset | Read offset in the LBA space (in bytes; a multiple of sector_size). **Note**: Data size only; even for extended LBA it excludes the metadata size. |

4. Return value

| **Return value** | **Description** |
|-------------|-----------------------|
| 0 | Read I/O request submitted successfully |
| Non-0 | Read I/O request submission failed |
##### libstorage_open

1. Interface prototype

```
int32_t libstorage_open(const char* devfullname);
```

2. Interface description

Opens a block device.

3. Parameters

| **Parameter** | **Description** |
|--------------------------|---------------------------------|
| const char* devfullname | Block device name (in the form nvme0n1) |

4. Return value

| **Return value** | **Description** |
|-------------|-------------------------------------------------------------------|
| -1 | Open failed (e.g. wrong device name, or the number of opened fds exceeds the number of usable channels of the NVMe disk) |
| > 0 | File descriptor of the block device |

After the MultiQ switch in nvme.conf.in is enabled, opening the same device multiple times from the same thread returns different fds; otherwise the same fd is returned. This behavior applies only to NVMe devices.

##### libstorage_close

1. Interface prototype

```
int32_t libstorage_close(int32_t fd);
```

2. Interface description

Closes a block device.

3. Parameters

| **Parameter** | **Description** |
|--------------|---------------------------|
| int32_t fd | File descriptor of an opened block device |

4. Return value

| **Return value** | **Description** |
|------------|--------------------------------|
| -1 | Invalid file descriptor |
| -16 | File descriptor busy; retry |
| 0 | Closed successfully |

##### libstorage_mem_reserve

1. Interface prototype

```
void* libstorage_mem_reserve(size_t size, size_t align);
```

2. Interface description

Allocates memory from the hugepage memory reserved by DPDK.

3. Parameters

| **Parameter** | **Description** |
|---------------|-------------------------------|
| size_t size | Size of the memory to allocate |
| size_t align | The allocated memory is aligned to align |

4. Return value

| **Return value** | **Description** |
|-------------|---------------------------|
| NULL | Allocation failed |
| Non-NULL | Address of the allocated memory |

##### libstorage_mem_free

1. Interface prototype

```
void libstorage_mem_free(void* ptr);
```

2. Interface description

Frees the memory pointed to by ptr.

3. Parameters

| **Parameter** | **Description** |
|---------------|--------------------------|
| void* ptr | Address of the memory to free |

4. Return value

None.
##### libstorage_alloc_io_buf

1. Interface prototype

```
void* libstorage_alloc_io_buf(size_t nbytes);
```

2. Interface description

Allocates memory from SPDK's buf_small_pool or buf_large_pool.

3. Parameters

| **Parameter** | **Description** |
|----------------|-----------------------------|
| size_t nbytes | Size of the buffer to allocate |

4. Return value

| **Return value** | **Description** |
|-------------|--------------------------|
| Non-NULL | Start address of the allocated buffer |

##### libstorage_free_io_buf

1. Interface prototype

```
int32_t libstorage_free_io_buf(void *buf, size_t nbytes);
```

2. Interface description

Returns allocated memory to SPDK's buf_small_pool or buf_large_pool.

3. Parameters

| **Parameter** | **Description** |
|----------------|------------------------------|
| void *buf | Start address of the buffer to free |
| size_t nbytes | Size of the buffer to free |

4. Return value

| **Return value** | **Description** |
|-------------|--------------|
| -1 | Free failed |
| 0 | Free succeeded |

##### libstorage_init_module

1. Interface prototype

```
int32_t libstorage_init_module(const char* cfgfile);
```

2. Interface description

HSAK module initialization interface.

3. Parameters

| **Parameter** | **Description** |
|----------------------|---------------------|
| const char* cfgfile | HSAK configuration file name |

4. Return value

| **Return value** | **Description** |
|-------------|---------------|
| Non-0 | Initialization failed |
| 0 | Initialization succeeded |

##### libstorage_exit_module

1. Interface prototype

```
int32_t libstorage_exit_module(void);
```

2. Interface description

HSAK module exit interface.

3. Parameters

None.

4. Return value

| **Return value** | **Description** |
|-------------|---------------|
| Non-0 | Exit cleanup failed |
| 0 | Exit cleanup succeeded |

##### LIBSTORAGE_REGISTER_DPDK_INIT_NOTIFY

1. Interface prototype

```
LIBSTORAGE_REGISTER_DPDK_INIT_NOTIFY(_name, _notify)
```

2. Interface description

Service-layer registration macro for a callback invoked when DPDK initialization completes.

3. Parameters

| **Parameter** | **Description** |
|----------------|---------------------------------------------------------------------------------------------------|
| _name | Service-layer module name. |
| _notify | Callback registered by the service layer, with the prototype: void (*notifyFunc)(const struct libstorage_dpdk_init_notify_arg *arg); |

4. Return value

None.

#### ublock.h

##### init_ublock

1. Interface prototype

```
int init_ublock(const char *name, enum ublock_rpc_server_status flg);
```

2. Interface description

Initializes the Ublock module. This interface must be called before any other Ublock interface. If flg is set to UBLOCK_RPC_SERVER_ENABLE, that is, ublock acts as the RPC server, a process may initialize the module only once.

When ublock starts as the RPC server, it also starts a server monitor thread. If the monitor thread detects that the RPC server thread is abnormal (for example, hung), it calls exit to terminate the process.

Product-specific scripts are then responsible for restarting the processes involved.

3. Parameters

| **Parameter** | **Description** |
|----------------------------------|---------------------------------|
| const char *name | Module name. The default is "ublock"; passing NULL is recommended. |
| enum ublock_rpc_server_status flg | Whether to enable RPC: UBLOCK_RPC_SERVER_DISABLE or UBLOCK_RPC_SERVER_ENABLE. With RPC disabled, the Ublock module cannot obtain information about a disk occupied by a service process. |

4. Return value

| **Return value** | **Description** |
|----------------------------------|---------------------------------|
| 0 | Initialization succeeded. |
| -1 | Initialization failed; possible cause: the Ublock module is already initialized. |
| Process exit | Ublock treats two situations as unrecoverable and calls exit directly: the RPC service is required but its thread could not be created; the hot-plug monitor thread could not be created. |
##### ublock_init

1. Interface prototype

```
#define ublock_init(name) init_ublock(name, UBLOCK_RPC_SERVER_ENABLE)
```

2. Interface description

A macro around init_ublock that initializes Ublock with the RPC service enabled.

3. Parameters

| **Parameter** | **Description** |
|---------------|----------------------------------------------------|
| name | Module name. The default is "ublock"; passing NULL is recommended. |

4. Return value

| **Return value** | **Description** |
|---------------|----------------------------------------------------|
| 0 | Initialization succeeded. |
| -1 | Initialization failed; possible cause: the Ublock RPC server module is already initialized. |
| Process exit | Ublock treats two situations as unrecoverable and calls exit directly: the RPC service is required but its thread could not be created; the hot-plug monitor thread could not be created. |
##### ublock_init_norpc

1. Interface prototype

```
#define ublock_init_norpc(name) init_ublock(name, UBLOCK_RPC_SERVER_DISABLE)
```

2. Interface description

A macro around init_ublock that initializes Ublock without the RPC service.

3. Parameters

| **Parameter** | **Description** |
|---------------|------------------------------------------------------|
| name | Module name. The default is "ublock"; passing NULL is recommended. |

4. Return value

| **Return value** | **Description** |
|---------------------------------|-----------------------------|
| 0 | Initialization succeeded. |
| -1 | Initialization failed; possible cause: the Ublock client module is already initialized. |
| Process exit | Ublock treats two situations as unrecoverable and calls exit directly: the RPC service is required but its thread could not be created; the hot-plug monitor thread could not be created. |

##### ublock_fini

1. Interface prototype

```
void ublock_fini(void);
```

2. Interface description

Destroys the Ublock module and the resources it created internally. This interface must be paired with the Ublock initialization interface.

3. Parameters

None.

4. Return value

None.

##### ublock_get_bdevs

1. Interface prototype

```
int ublock_get_bdevs(struct ublock_bdev_mgr* bdev_list);
```

2. Interface description

A service process calls this interface to obtain the device list (all NVMe devices in the environment, covering both kernel and user-space drivers). The returned list contains only PCI addresses, not detailed device information; to obtain details, call ublock_get_bdev.

3. Parameters

| **Parameter** | **Description** |
|-------------------------------------|------------------------------------------------------|
| struct ublock_bdev_mgr* bdev_list | Out parameter returning the device queue; the bdev_list pointer must be allocated by the caller. |

4. Return value

| **Return value** | **Description** |
|-------------|-----------------------|
| 0 | Device queue obtained successfully. |
| -2 | No NVMe device in the environment. |
| Other values | Failed to obtain the device queue. |

##### ublock_free_bdevs

1. Interface prototype

```
void ublock_free_bdevs(struct ublock_bdev_mgr* bdev_list);
```

2. Interface description

A service process calls this interface to free the device list.

3. Parameters

| **Parameter** | **Description** |
|-------------------------------------|--------------------------------------------------------------|
| struct ublock_bdev_mgr* bdev_list | Head pointer of the device queue. After the queue is emptied, the bdev_list pointer itself is not freed. |

4. Return value

None.

##### ublock_get_bdev

1. Interface prototype

```
int ublock_get_bdev(const char *pci, struct ublock_bdev *bdev);
```

2. Interface description

A service process calls this interface to obtain the information of a specific device. In the device information, the serial number, model, and firmware version of the NVMe device are stored as character arrays, not strings (controllers return them in different formats, and a trailing '\0' is not guaranteed).

After this call, the device is occupied by Ublock. Call ublock_free_bdev to release the resource as soon as the corresponding service operation is complete.

3. Parameters

| **Parameter** | **Description** |
|---------------------------|--------------------------------------------------|
| const char *pci | PCI address of the device to query |
| struct ublock_bdev *bdev | Out parameter returning the device information; the bdev pointer must be allocated by the caller. |

4. Return value

| **Return value** | **Description** |
|-------------|------------------------------------------------------------|
| 0 | Device information obtained successfully. |
| -1 | Failed to obtain device information, e.g. a parameter error. |
| -11 (EAGAIN) | Failed to obtain device information, e.g. the RPC query failed; retry (a 3-second sleep is recommended). |
##### ublock_get_bdev_by_esn

1. Interface prototype

```
int ublock_get_bdev_by_esn(const char *esn, struct ublock_bdev *bdev);
```

2. Interface description

A service process calls this interface to obtain the information of the device matching the given ESN. In the device information, the serial number, model, and firmware version of the NVMe device are stored as character arrays, not strings (controllers return them in different formats, and a trailing '\0' is not guaranteed).

After this call, the device is occupied by Ublock. Call ublock_free_bdev to release the resource as soon as the corresponding service operation is complete.

3. Parameters

| **Parameter** | **Description** |
|---------------------------|--------------------------------------------------|
| const char *esn | ESN of the device to query. **Note**: An ESN is a string with at most 20 valid characters (excluding the terminator), though the length may differ by hardware vendor; if it is shorter than 20 characters, pad the end of the string with spaces. |
| struct ublock_bdev *bdev | Out parameter returning the device information; the bdev pointer must be allocated by the caller. |

4. Return value

| **Return value** | **Description** |
|-------------|--------------------------------------------------------------|
| 0 | Device information obtained successfully. |
| -1 | Failed to obtain device information, e.g. a parameter error. |
| -11 (EAGAIN) | Failed to obtain device information, e.g. the RPC query failed; retry (a 3-second sleep is recommended). |

##### ublock_free_bdev

1. Interface prototype

```
void ublock_free_bdev(struct ublock_bdev *bdev);
```

2. Interface description

A service process calls this interface to release device resources.

3. Parameters

| **Parameter** | **Description** |
|----------------------------|-------------------------------------------------------------|
| struct ublock_bdev *bdev | Device information pointer. After the data inside it is cleared, the bdev pointer itself is not freed. |

4. Return value

None.

##### TAILQ_FOREACH_SAFE

1. Interface prototype

```
#define TAILQ_FOREACH_SAFE(var, head, field, tvar)
for ((var) = TAILQ_FIRST((head));
(var) && ((tvar) = TAILQ_NEXT((var), field), 1);
(var) = (tvar))
```

2. Interface description

Macro for safely visiting every member of a queue.

3. Parameters

| **Parameter** | **Description** |
|---------------|----------------------------------------------------------------------------------------------------|
| var | Queue node currently being operated on |
| head | Queue head pointer; usually the address of an obj defined via TAILQ_HEAD(xx, xx) obj |
| field | Name of the struct member in a queue node that stores the queue's forward and backward pointers; usually the name defined via TAILQ_ENTRY(xx) name |
| tvar | Next queue node |

4. Return value

None.

##### ublock_get_SMART_info

1. Interface prototype

```
int ublock_get_SMART_info(const char *pci, uint32_t nsid, struct ublock_SMART_info *smart_info);
```

2. Interface description

A service process calls this interface to obtain the SMART information of the specified device.

3. Parameters

| **Parameter** | **Description** |
|---------------------------------------|----------------------------|
| const char *pci | PCI address of the device |
| uint32_t nsid | Target namespace |
| struct ublock_SMART_info *smart_info | Out parameter returning the device's SMART information |

4. Return value

| **Return value** | **Description** |
|-------------|---------------------------------------------------------------|
| 0 | SMART information obtained successfully. |
| -1 | Failed to obtain SMART information, e.g. a parameter error. |
| -11 (EAGAIN) | Failed to obtain SMART information, e.g. the RPC query failed; retry (a 3-second sleep is recommended). |
##### ublock_get_SMART_info_by_esn

1. Interface prototype

```
int ublock_get_SMART_info_by_esn(const char *esn, uint32_t nsid, struct ublock_SMART_info *smart_info);
```

2. Interface description

A service process calls this interface to obtain the SMART information of the device matching the given ESN.

3. Parameters

| **Parameter** | **Description** |
|--------------------------|-----------------------------------------------|
| const char *esn | ESN of the device. **Note**: An ESN is a string with at most 20 valid characters (excluding the terminator), though the length may differ by hardware vendor; if it is shorter than 20 characters, pad the end of the string with spaces. |
| uint32_t nsid | Target namespace |
| struct ublock_SMART_info *smart_info | Out parameter returning the device's SMART information |

4. Return value

| **Return value** | **Description** |
|-------------|--------------------------------------------------------------|
| 0 | SMART information obtained successfully. |
| -1 | Failed to obtain SMART information, e.g. a parameter error. |
| -11 (EAGAIN) | Failed to obtain SMART information, e.g. the RPC query failed; retry (a 3-second sleep is recommended). |

##### ublock_get_error_log_info

1. Interface prototype

```
int ublock_get_error_log_info(const char *pci, uint32_t err_entries, struct ublock_nvme_error_info *errlog_info);
```

2. Interface description

A service process calls this interface to obtain the error log of the specified device.

3. Parameters

| **Parameter** | **Description** |
|---------------------------------------------|-----------------------------------------------------------------------------------|
| const char *pci | PCI address of the device |
| uint32_t err_entries | Number of error-log entries to fetch, at most 256 |
| struct ublock_nvme_error_info *errlog_info | Out parameter returning the device's error log. The caller must allocate errlog_info and ensure the allocation is at least err_entries * sizeof(struct ublock_nvme_error_info). |

4. Return value

| **Return value** | **Description** |
|-------------------------------------|--------------------------------------------------------------|
| Number of entries obtained, >= 0 | Error log obtained successfully. |
| -1 | Failed to obtain the error log, e.g. a parameter error. |
| -11 (EAGAIN) | Failed to obtain the error log, e.g. the RPC query failed; retry (a 3-second sleep is recommended). |

##### ublock_get_log_page

1. Interface prototype

```
int ublock_get_log_page(const char *pci, uint8_t log_page, uint32_t nsid, void *payload, uint32_t payload_size);
```

2. Interface description

A service process calls this interface to obtain the specified log page of the specified device.

3. Parameters

| **Parameter** | **Description** |
|------------------------|-------------------------------------------------------------------------------------------------------------------------|
| const char *pci | PCI address of the device |
| uint8_t log_page | ID of the log page to fetch; for example, 0xC0 and 0xCA denote the vendor-specific SMART information of ES3000 V5 disks |
| uint32_t nsid | Namespace ID. Support for per-namespace retrieval varies by log page; if it is not supported, the caller must explicitly pass 0xFFFFFFFF |
| void *payload | Out parameter storing the log page; memory is allocated by the caller |
| uint32_t payload_size | Size of the allocated payload, at most 4096 bytes |

4. Return value

| **Return value** | **Description** |
|-------------|------------------------------------|
| 0 | Log page obtained successfully |
| -1 | Failed to obtain the log page, e.g. a parameter error |

##### ublock_info_get_pci_addr

1. Interface prototype

```
char *ublock_info_get_pci_addr(const void *info);
```

2. Interface description

In the service process's callback, this interface obtains the PCI address of the hot-plugged device.

The memory occupied by info and by the returned PCI address does not need to be freed by the service process.

3. Parameters

| **Parameter** | **Description** |
|-------------------|---------------------------------------------|
| const void *info | Hot-plug event information passed to the callback by the hot-plug monitor thread |

4. Return value

| **Return value** | **Description** |
|-------------|--------------------|
| NULL | Retrieval failed |
| Non-NULL | The PCI address obtained |

##### ublock_info_get_action

1. Interface prototype

```
enum ublock_nvme_uevent_action ublock_info_get_action(const void *info);
```

2. Interface description

In the service process's callback, this interface obtains the type of the hot-plug event.

The memory occupied by info does not need to be freed by the service process.

3. Parameters

| **Parameter** | **Description** |
|-------------------|------------------------------------------------|
| const void *info | Hot-plug event information passed to the callback by the hot-plug monitor thread |

4. Return value

| **Return value** | **Description** |
|----------------|------------------------------------------------------------------------------|
| Hot-plug event type | Type of the event that triggered the callback; see the definition of enum ublock_nvme_uevent_action. |

##### ublock_get_ctrl_iostat

1. Interface prototype

```
int ublock_get_ctrl_iostat(const char* pci, struct ublock_ctrl_iostat_info *ctrl_iostat);
```

2. Interface description

A service process calls this interface to obtain the controller's I/O statistics.

3. Parameters

| **Parameter** | **Description** |
|-----------------------------------------------|----------------------------------------------------------|
| const char* pci | PCI address of the controller whose I/O statistics are to be obtained. |
| struct ublock_ctrl_iostat_info *ctrl_iostat | Out parameter returning the I/O statistics; the ctrl_iostat pointer must be allocated by the caller. |

4. Return value

| **Return value** | **Description** |
|-------------|-------------------------------------------------|
| 0 | I/O statistics obtained successfully. |
| -1 | Failed to obtain I/O statistics (invalid parameter or RPC error). |
| -2 | Failed to obtain I/O statistics (the NVMe disk is not taken over by an I/O process). |
| -3 | Failed to obtain I/O statistics (the I/O statistics switch is off). |
##### ublock_nvme_admin_passthru

1. Interface prototype

```
int32_t ublock_nvme_admin_passthru(const char *pci, void *cmd, void *buf, size_t nbytes);
```

2. Interface description

A service process calls this interface to pass an NVMe admin command through to the NVMe device. Currently only the admin command that fetches the identify data is supported.

3. Parameters

| **Parameter** | **Description** |
|------------------|----------------------------------------------------------------------------------------------------|
| const char *pci | PCI address of the controller targeted by the NVMe admin command. |
| void *cmd | Pointer to the NVMe admin command structure, which is 64 bytes; see the NVMe specification for its content. Currently only the identify command is supported. |
| void *buf | Holds the response of the NVMe admin command; its space is allocated by the user and its size is nbytes. |
| size_t nbytes | Size of the user buffer. The identify data is 4096 bytes, so nbytes is 4096 for the identify command. |

4. Return value

| **Return value** | **Description** |
|------------|--------------------|
| 0 | The user command executed successfully. |
| -1 | The user command failed. |

## Appendix

### GENERIC

Generic status codes.

| sc | value |
|---------------------------------------------|---------------|
| NVME_SC_SUCCESS | 0x00 |
| NVME_SC_INVALID_OPCODE | 0x01 |
| NVME_SC_INVALID_FIELD | 0x02 |
| NVME_SC_COMMAND_ID_CONFLICT | 0x03 |
| NVME_SC_DATA_TRANSFER_ERROR | 0x04 |
| NVME_SC_ABORTED_POWER_LOSS | 0x05 |
| NVME_SC_INTERNAL_DEVICE_ERROR | 0x06 |
| NVME_SC_ABORTED_BY_REQUEST | 0x07 |
| NVME_SC_ABORTED_SQ_DELETION | 0x08 |
| NVME_SC_ABORTED_FAILED_FUSED | 0x09 |
| NVME_SC_ABORTED_MISSING_FUSED | 0x0a |
| NVME_SC_INVALID_NAMESPACE_OR_FORMAT | 0x0b |
| NVME_SC_COMMAND_SEQUENCE_ERROR | 0x0c |
| NVME_SC_INVALID_SGL_SEG_DESCRIPTOR | 0x0d |
| NVME_SC_INVALID_NUM_SGL_DESCIRPTORS | 0x0e |
| NVME_SC_DATA_SGL_LENGTH_INVALID | 0x0f |
| NVME_SC_METADATA_SGL_LENGTH_INVALID | 0x10 |
| NVME_SC_SGL_DESCRIPTOR_TYPE_INVALID | 0x11 |
| NVME_SC_INVALID_CONTROLLER_MEM_BUF | 0x12 |
| NVME_SC_INVALID_PRP_OFFSET | 0x13 |
| NVME_SC_ATOMIC_WRITE_UNIT_EXCEEDED | 0x14 |
| NVME_SC_OPERATION_DENIED | 0x15 |
| NVME_SC_INVALID_SGL_OFFSET | 0x16 |
| NVME_SC_INVALID_SGL_SUBTYPE | 0x17 |
| NVME_SC_HOSTID_INCONSISTENT_FORMAT | 0x18 |
| NVME_SC_KEEP_ALIVE_EXPIRED | 0x19 |
| NVME_SC_KEEP_ALIVE_INVALID | 0x1a |
| NVME_SC_ABORTED_PREEMPT | 0x1b |
| NVME_SC_SANITIZE_FAILED | 0x1c |
| NVME_SC_SANITIZE_IN_PROGRESS | 0x1d |
| NVME_SC_SGL_DATA_BLOCK_GRANULARITY_INVALID | 0x1e |
| NVME_SC_COMMAND_INVALID_IN_CMB | 0x1f |
| NVME_SC_LBA_OUT_OF_RANGE | 0x80 |
| NVME_SC_CAPACITY_EXCEEDED | 0x81 |
| NVME_SC_NAMESPACE_NOT_READY | 0x82 |
| NVME_SC_RESERVATION_CONFLICT | 0x83 |
| NVME_SC_FORMAT_IN_PROGRESS | 0x84 |

### COMMAND_SPECIFIC

Command-specific status codes.

| sc | value |
|---------------------------------------------|---------------|
| NVME_SC_COMPLETION_QUEUE_INVALID | 0x00 |
| NVME_SC_INVALID_QUEUE_IDENTIFIER | 0x01 |
| NVME_SC_MAXIMUM_QUEUE_SIZE_EXCEEDED | 0x02 |
| NVME_SC_ABORT_COMMAND_LIMIT_EXCEEDED | 0x03 |
| NVME_SC_ASYNC_EVENT_REQUEST_LIMIT_EXCEEDED | 0x05 |
| NVME_SC_INVALID_FIRMWARE_SLOT | 0x06 |
| NVME_SC_INVALID_FIRMWARE_IMAGE | 0x07 |
| NVME_SC_INVALID_INTERRUPT_VECTOR | 0x08 |
| NVME_SC_INVALID_LOG_PAGE | 0x09 |
| NVME_SC_INVALID_FORMAT | 0x0a |
| NVME_SC_FIRMWARE_REQ_CONVENTIONAL_RESET | 0x0b |
| NVME_SC_INVALID_QUEUE_DELETION | 0x0c |
| NVME_SC_FEATURE_ID_NOT_SAVEABLE | 0x0d |
| NVME_SC_FEATURE_NOT_CHANGEABLE | 0x0e |
| NVME_SC_FEATURE_NOT_NAMESPACE_SPECIFIC | 0x0f |
| NVME_SC_FIRMWARE_REQ_NVM_RESET | 0x10 |
| NVME_SC_FIRMWARE_REQ_RESET | 0x11 |
| NVME_SC_FIRMWARE_REQ_MAX_TIME_VIOLATION | 0x12 |
| NVME_SC_FIRMWARE_ACTIVATION_PROHIBITED | 0x13 |
| NVME_SC_OVERLAPPING_RANGE | 0x14 |
| NVME_SC_NAMESPACE_INSUFFICIENT_CAPACITY | 0x15 |
| NVME_SC_NAMESPACE_ID_UNAVAILABLE | 0x16 |
| NVME_SC_NAMESPACE_ALREADY_ATTACHED | 0x18 |
| NVME_SC_NAMESPACE_IS_PRIVATE | 0x19 |
| NVME_SC_NAMESPACE_NOT_ATTACHED | 0x1a |
| NVME_SC_THINPROVISIONING_NOT_SUPPORTED | 0x1b |
| NVME_SC_CONTROLLER_LIST_INVALID | 0x1c |
| NVME_SC_DEVICE_SELF_TEST_IN_PROGRESS | 0x1d |
| NVME_SC_BOOT_PARTITION_WRITE_PROHIBITED | 0x1e |
| NVME_SC_INVALID_CTRLR_ID | 0x1f |
| NVME_SC_INVALID_SECONDARY_CTRLR_STATE | 0x20 |
| NVME_SC_INVALID_NUM_CTRLR_RESOURCES | 0x21 |
| NVME_SC_INVALID_RESOURCE_ID | 0x22 |
| NVME_SC_CONFLICTING_ATTRIBUTES | 0x80 |
| NVME_SC_INVALID_PROTECTION_INFO | 0x81 |
| NVME_SC_ATTEMPTED_WRITE_TO_RO_PAGE | 0x82 |
### MEDIA_DATA_INTERGRITY_ERROR

Media and data-integrity status codes.

| sc | value |
|-----------------------------------------|---------------|
| NVME_SC_WRITE_FAULTS | 0x80 |
| NVME_SC_UNRECOVERED_READ_ERROR | 0x81 |
| NVME_SC_GUARD_CHECK_ERROR | 0x82 |
| NVME_SC_APPLICATION_TAG_CHECK_ERROR | 0x83 |
| NVME_SC_REFERENCE_TAG_CHECK_ERROR | 0x84 |
| NVME_SC_COMPARE_FAILURE | 0x85 |
| NVME_SC_ACCESS_DENIED | 0x86 |
| NVME_SC_DEALLOCATED_OR_UNWRITTEN_BLOCK | 0x87 |

diff --git a/docs/zh/docs/HSAK/hsak_tools_usage.md b/docs/zh/docs/HSAK/hsak_tools_usage.md
deleted file mode 100644
index 8cf8411e88cc2cf9959b1e961e79d241c2643d14..0000000000000000000000000000000000000000
--- a/docs/zh/docs/HSAK/hsak_tools_usage.md
+++ /dev/null

## Command-Line Interface

### Disk Information Query Command

#### Command format

```shell
libstorage-list [<commands>] [<device>]
```

#### Parameter description

- commands: only "help" is available; "libstorage-list help" prints the help text.

- device: PCI address in the form 0000:09:00.0. Multiple addresses may be given, separated by spaces. If no PCI address is specified, the command lists the information of all enumerated devices.

#### Notes

- The fault injection capability is restricted to development, debugging, and test scenarios. Do not use it on a live customer network; otherwise it introduces service and security risks.

- When this command runs, the management component (ublock) server must already be started; the user-space I/O component (uio) may be either not started or correctly started.

- A disk occupied by neither the ublock component nor the user-space I/O component is occupied while this command runs; if the ublock component or the user-space I/O component tries to take control of the disk at that moment, a storage device access conflict may occur and cause a failure.

### Disk Driver Switch Command

#### Command format

```shell
libstorage-shutdown reset <device> [<device> ...]
```

#### Parameter description

- reset: switches the specified disks from the uio driver back to the kernel driver.

- device: PCI address in the form 0000:09:00.0. Multiple addresses may be given, separated by spaces.

#### Notes

- The libstorage-shutdown reset command switches a disk from the user-space uio driver to the kernel nvme driver.

- When this command runs, the management component (ublock) server must already be started; the user-space I/O component may be either not started or correctly started.

- The libstorage-shutdown reset command is a dangerous operation. Before switching the NVMe device driver, confirm that user-space instances have stopped issuing I/O to the NVMe device, that all fds on the NVMe device are closed, and that the instances accessing the NVMe device have exited.

### I/O Statistics Command

#### Command format

```shell
libstorage-iostat [-t <interval>] [-i <count>] [-d <device>]
```

#### Parameter description

- -t: interval in seconds, minimum 1, maximum 3600. The parameter is an int; a value beyond the int range is truncated into a negative or positive number.

- -i: number of collections, minimum 1, maximum MAX_INT. If unset, collection continues at the given interval. The parameter is an int; a value beyond the int range is truncated into a negative or positive number.

- -d: block device name (e.g. nvme0n1, which depends on the controller name configured in /etc/spdk/nvme.conf.in). This parameter collects the performance data of one or more specified devices; if unset, the performance data of all recognized devices is collected.

#### Notes

- The I/O statistics configuration item must be enabled.

- A process must already be issuing I/O through the user-space I/O component to the disks whose performance is being queried.

- If no device in the environment is occupied by a service process issuing I/O, the command prints "You cannot get iostat info for nvme device no deliver io" and exits.

- When multiple queues are open on a disk, the I/O statistics tool aggregates the multi-queue performance data of that disk into a single output.

- The I/O statistics tool supports data records for at most 8192 disk queues.

- The I/O statistics output looks as follows:

  | Device | r/s | w/s | rKB/s | wKB/s | avgrq-sz | avgqu-sz | r_await | w_await | await | svctm | util% | poll-n |
  | ------ | ------- | ------- | ------- | ------- | ------------ | --------- | --------- | --------- | ---------- | ------------ | ----- | ------ |
  | Device name | Read I/Os per second | Write I/Os per second | Read I/O bytes per second | Write I/O bytes per second | Average submitted I/O size (bytes) | I/O queue depth on the disk | Read I/O latency (us) | Write I/O latency (us) | Average read/write latency (us) | Per-I/O service latency (us) | Device utilization | Number of polling timeouts |
### Disk Read/Write Command

#### Command format

```shell
libstorage-rw <COMMAND> <device> [OPTIONS...]
```

#### Parameter description

1. COMMAND

    - read: reads the specified logical blocks from the device into the data buffer (standard output by default).

    - write: writes the data buffer (standard input by default) to the specified logical blocks of the NVMe device.

    - help: prints the help text of this command line.

2. device: PCI address in the form 0000:09:00.0.

3. OPTIONS

    - --start-block, -s: 64-bit starting address of the logical blocks to read or write (default 0).

    - --block-count, -c: number of logical blocks to read or write (counted from 0).

    - --data-size, -z: number of data bytes to read or write.

    - --namespace-id, -n: namespace ID of the device (default 1).

    - --data, -d: data file used for the read or write (stores the data read out; supplies the data to write).

    - --limited-retry, -l: the device controller performs a limited number of retries to complete the read or write.

    - --force-unit-access, -f: ensures the read or write completes from nonvolatile media before the command finishes.

    - --show-command, -v: prints the command information before the read/write command is sent.

    - --dry-run, -w: only prints the read/write command information; no actual read or write is performed.

    - --latency, -t: collects end-to-end read/write latency statistics for the command line.

    - --help, -h: prints the help text of the command.

diff --git a/docs/zh/docs/HSAK/introduce_hsak.md b/docs/zh/docs/HSAK/introduce_hsak.md
deleted file mode 100644
index 4c280e1f25775f7297abd36c3850734d7074d4df..0000000000000000000000000000000000000000
--- a/docs/zh/docs/HSAK/introduce_hsak.md
+++ /dev/null

# HSAK Developer Guide

## Introduction

As the performance of storage media such as NVMe SSDs and SCM keeps improving, the media layer accounts for an ever smaller share of I/O-stack latency, and the software stack itself becomes the bottleneck, so the kernel I/O data plane needs restructuring to reduce software overhead. HSAK provides a high-bandwidth, low-latency I/O software stack for new storage media that cuts software-stack overhead by more than 50% compared with the traditional I/O stack.
The HSAK user-space I/O engine is built on top of open-source SPDK:

1. A unified external interface hides the differences among the open-source interfaces.
2. Enhanced I/O data-plane features are added on top of the open-source base, such as DIF, disk formatting, batch I/O submission, TRIM, and dynamic disk addition and removal.
3. Disk device management, disk I/O monitoring, and maintenance and test tools are provided.

## Build Tutorial

1. Download the hsak source code.

    $ git clone https://gitee.com/openeuler/hsak.git

2. Build and runtime dependencies.

    Building and running hsak depends on spdk, dpdk, libboundscheck, and other components.

3. Build.

    $ cd hsak

    $ mkdir build

    $ cd build

    $ cmake ..
    $ make

## Notes

### Usage Constraints

- At most 512 NVMe devices can be used and managed on one machine.
- When HSAK runs I/O services, the system must have at least 500 MB of contiguous free hugepage memory.
- When the user-space I/O component runs services, the disk management component (ublock) must already be started.
- When the disk management component (ublock) runs services, the system must have enough contiguous free memory; each initialization of the ublock component requests 20 MB of hugepage memory.
- Before each run of HSAK, the product must invoke setup.sh to configure hugepages and unbind NVMe devices from the kernel driver.
- The other interfaces provided by the HSAK module may be used only after libstorage_init_module succeeds; each process may call libstorage_init_module only once.
- After libstorage_exit_module is called, no other HSAK interface may be used. Pay particular attention in multi-threaded scenarios: exit HSAK only after all threads have finished.
- The HSAK ublock component can start only one service per server, supports at most 64 concurrent ublock clients, and the ublock server handles at most 20 client requests per second.
- The HSAK ublock component must be started before the data-plane I/O component and the ublock clients; the command-line tools provided by HSAK can also run only after the ublock server has started.
- Do not register a SIGBUS handler; SPDK has a dedicated handler for that signal. If it is overwritten, SPDK's registered SIGBUS handler becomes ineffective and a core dump occurs.

diff --git "a/docs/zh/docs/Installation/\344\275\277\347\224\250kickstart\350\207\252\345\212\250\345\214\226\345\256\211\350\243\205.md" "b/docs/zh/docs/Installation/\344\275\277\347\224\250kickstart\350\207\252\345\212\250\345\214\226\345\256\211\350\243\205.md"
index 801f40d558e71531d2d7b890bab22b459045f02a..b5a8cd0a54a0b9812e07774f378cdbc582b8b224 100644
--- "a/docs/zh/docs/Installation/\344\275\277\347\224\250kickstart\350\207\252\345\212\250\345\214\226\345\256\211\350\243\205.md"
+++ "b/docs/zh/docs/Installation/\344\275\277\347\224\250kickstart\350\207\252\345\212\250\345\214\226\345\256\211\350\243\205.md"
@@ -80,7 +80,7 @@ TFTP(Trivial File Transfer Protocol,简单文件传输协议),该协议
 
 - 物理机/虚拟机(虚拟机创建可参考对应厂商的资料)。包括使用kickstart工具进行自动化安装的计算机和被安装的计算机。
 - httpd:存放kickstart文件。
-- ISO: openEuler-21.09-aarch64-dvd.iso
+- ISO: openEuler-23.03-aarch64-dvd.iso
 
 ### 操作步骤
 
@@ -177,7 +177,7 @@ TFTP(Trivial File Transfer Protocol,简单文件传输协议),该协议
     **安装系统**
 
 1. 启动系统进入安装选择界面。
-    1. 在“[启动安装](./安装指导.html#启动安装)”中的“安装引导界面”中选择“Install openEuler 21.09”,并按下“e”键。
+    1. 在“[启动安装](./安装指导.html#启动安装)”中的“安装引导界面”中选择“Install openEuler 23.03”,并按下“e”键。
Append "inst.ks=http://server ip/ks/openEuler-ks.cfg" to the boot parameters.

       ![](./figures/startparam.png)

@@ -201,7 +201,7 @@

- httpd: hosts the kickstart file.
- tftp: provides the vmlinuz and initrd files.
- dhcpd/pxe: provides the DHCP service.
-- ISO: openEuler-21.09-aarch64-dvd.iso
+- ISO: openEuler-23.03-aarch64-dvd.iso

### Procedure

@@ -251,7 +251,7 @@

3. Create the installation source.

   ```
-   # mount openEuler-21.09-aarch64-dvd.iso /mnt
+   # mount openEuler-23.03-aarch64-dvd.iso /mnt
   # cp -r /mnt/* /var/www/html/openEuler/
   ```

diff --git "a/docs/zh/docs/Installation/\345\256\211\350\243\205\345\207\206\345\244\207-1.md" "b/docs/zh/docs/Installation/\345\256\211\350\243\205\345\207\206\345\244\207-1.md"
index 3fe7818d50d7864097dd92fed1824745c554fd91..060545ce4d621c331972e0c54b696ea9e4d5f89b 100644
--- "a/docs/zh/docs/Installation/\345\256\211\350\243\205\345\207\206\345\244\207-1.md"
+++ "b/docs/zh/docs/Installation/\345\256\211\350\243\205\345\207\206\345\244\207-1.md"
@@ -20,13 +20,13 @@

Before the installation starts, obtain the Raspberry Pi image released by openEuler and its verification file.

1. Log in to the [openEuler community](https://openeuler.org/zh/download/) website.
-2. Click "Download" on the openEuler 21.09 card.
+2. Click "Download" on the openEuler 23.03 card.
3. Click "raspi_img" to open the download list of Raspberry Pi images.
   - aarch64: image for the AArch64 architecture.
4. Click "aarch64" to open the download list of Raspberry Pi AArch64 images.
-5. Click "openEuler-21.09-raspi-aarch64.img.xz" to download the Raspberry Pi image released by openEuler to the local host.
-6. Click "openEuler-21.09-raspi-aarch64.img.xz.sha256sum" to download the verification file of the image to the local host.
+5. Click "openEuler-23.03-raspi-aarch64.img.xz" to download the Raspberry Pi image released by openEuler to the local host.
+6. Click "openEuler-23.03-raspi-aarch64.img.xz.sha256sum" to download the verification file of the image to the local host.

## Verifying Image Integrity

@@ -40,9 +40,9 @@

Before verifying the integrity of the image file, prepare the following files:

-Image file: openEuler-21.09-raspi-aarch64.img.xz
+Image file: openEuler-23.03-raspi-aarch64.img.xz

-Verification file: openEuler-21.09-raspi-aarch64.img.xz.sha256sum
+Verification file: openEuler-23.03-raspi-aarch64.img.xz.sha256sum

### Procedure

1.
Obtain the checksum recorded in the verification file:

   ```
-   $ cat openEuler-21.09-raspi-aarch64.img.xz.sha256sum
+   $ cat openEuler-23.03-raspi-aarch64.img.xz.sha256sum
   ```

2. Compute the sha256 checksum of the file:

   ```
-   $ sha256sum openEuler-21.09-raspi-aarch64.img.xz
+   $ sha256sum openEuler-23.03-raspi-aarch64.img.xz
   ```

   The checksum is printed when the command completes.

diff --git "a/docs/zh/docs/Installation/\345\256\211\350\243\205\345\207\206\345\244\207.md" "b/docs/zh/docs/Installation/\345\256\211\350\243\205\345\207\206\345\244\207.md"
index 3e99b3f9440e6551f6f7c2b11e32a156ea121d06..6f4ab752d216f6b2be83e2b20397cb9d53b18257 100644
--- "a/docs/zh/docs/Installation/\345\256\211\350\243\205\345\207\206\345\244\207.md"
+++ "b/docs/zh/docs/Installation/\345\256\211\350\243\205\345\207\206\345\244\207.md"
@@ -11,20 +11,20 @@

1. Log in to the [openEuler community](https://openeuler.org/zh/) website.
2. Click "Download" to open the version download page.
-3. Click "Get ISO" on the openEuler 21.09 card to show the ISO download list.
+3. Click "Get ISO" on the openEuler 23.03 card to show the ISO download list.
   - aarch64: ISO for the AArch64 architecture.
   - x86_64: ISO for the x86_64 architecture.
   - source: openEuler source ISO.
4. Choose the openEuler release package and verification file matching the architecture of the target environment.
   - For AArch64:
     1. Click "aarch64".
-     2. For a local installation, download the release package "openEuler-21.09-aarch64-dvd.iso" and the verification file "openEuler-21.09-aarch64-dvd.iso.sha256sum".
-     3. For a network installation, download the release package "openEuler-21.09-netinst-aarch64-dvd.iso" and the verification file "openEuler-21.09-netinst-aarch64-dvd.iso.sha256sum".
+     2. For a local installation, download the release package "openEuler-23.03-aarch64-dvd.iso" and the verification file "openEuler-23.03-aarch64-dvd.iso.sha256sum".
+     3. For a network installation, download the release package "openEuler-23.03-netinst-aarch64-dvd.iso" and the verification file "openEuler-23.03-netinst-aarch64-dvd.iso.sha256sum".
   - For x86_64:
     1. Click "x86_64".
-     2. For a local installation, download the release package "openEuler-21.09-x86_64-dvd.iso" and the verification file "openEuler-21.09-x86_64-dvd.iso.sha256sum".
+     2. For a local installation, download the release package "openEuler-23.03-x86_64-dvd.iso" and the verification file "openEuler-23.03-x86_64-dvd.iso.sha256sum".
+     3.
For a network installation, download the release package "openEuler-23.03-netinst-x86_64-dvd.iso" and the verification file "openEuler-23.03-netinst-x86_64-dvd.iso.sha256sum".

>![](./public_sys-resources/icon-note.gif) **Note:**
> - The network-installation ISO is much smaller; choose it when the installation environment has network access.

@@ -45,9 +45,9 @@

Before verifying the integrity of the release package, prepare the following files:

-ISO file: openEuler-21.09-aarch64-dvd.iso
+ISO file: openEuler-23.03-aarch64-dvd.iso

-Verification file: openEuler-21.09-aarch64-dvd.iso.sha256sum
+Verification file: openEuler-23.03-aarch64-dvd.iso.sha256sum

### Procedure

@@ -56,13 +56,13 @@

1. Obtain the checksum recorded in the verification file:

   ```
-   $ cat openEuler-21.09-aarch64-dvd.iso.sha256sum
+   $ cat openEuler-23.03-aarch64-dvd.iso.sha256sum
   ```

2. Compute the sha256 checksum of the file:

   ```
-   $ sha256sum openEuler-21.09-aarch64-dvd.iso
+   $ sha256sum openEuler-23.03-aarch64-dvd.iso
   ```

   The checksum is printed when the command completes.

diff --git "a/docs/zh/docs/Installation/\345\256\211\350\243\205\346\226\271\345\274\217\344\273\213\347\273\215-1.md" "b/docs/zh/docs/Installation/\345\256\211\350\243\205\346\226\271\345\274\217\344\273\213\347\273\215-1.md"
index 84bdd00f4550140e14a0cd5cc5f0490192ef69eb..a6ed6744074bdd15bb00b8f3c05d378dc01ab2a9 100644
--- "a/docs/zh/docs/Installation/\345\256\211\350\243\205\346\226\271\345\274\217\344\273\213\347\273\215-1.md"
+++ "b/docs/zh/docs/Installation/\345\256\211\350\243\205\346\226\271\345\274\217\344\273\213\347\273\215-1.md"
@@ -44,9 +44,9 @@

### Writing to the SD Card

>![](./public_sys-resources/icon-notice.gif) **Notice:**
->If you obtained the compressed image "openEuler-21.09-raspi-aarch64.img.xz", decompress it first to get the "openEuler-21.09-raspi-aarch64.img" image file.
+>If you obtained the compressed image "openEuler-23.03-raspi-aarch64.img.xz", decompress it first to get the "openEuler-23.03-raspi-aarch64.img" image file.

-Follow these steps to write the "openEuler-21.09-raspi-aarch64.img" image file to the SD card:
+Follow these steps to write the "openEuler-23.03-raspi-aarch64.img" image file to the SD card:

1. Download and install an image-flashing tool; the steps below use Win32 Disk Imager as an example.
2. Right-click, choose "Run as administrator", and open Win32 Disk Imager.

@@ -74,10 +74,10 @@

### Writing to the SD Card

-1. If you obtained the compressed image, run `xz -d openEuler-21.09-raspi-aarch64.img.xz` first to get the "openEuler-21.09-raspi-aarch64.img" image file; otherwise, skip this step.
-2.
Write the image `openEuler-21.09-raspi-aarch64.img` to the SD card by running the following command as root:
+1. If you obtained the compressed image, run `xz -d openEuler-23.03-raspi-aarch64.img.xz` first to get the "openEuler-23.03-raspi-aarch64.img" image file; otherwise, skip this step.
+2. Write the image `openEuler-23.03-raspi-aarch64.img` to the SD card by running the following command as root:

-   `dd bs=4M if=openEuler-21.09-raspi-aarch64.img of=/dev/sdb`
+   `dd bs=4M if=openEuler-23.03-raspi-aarch64.img of=/dev/sdb`

>![](./public_sys-resources/icon-note.gif) **Note:**
>A block size of 4M is normally fine. If the write fails or the written image does not work, retry with a block size of 1M, which is slower.

@@ -101,10 +101,10 @@

### Writing to the SD Card

-1. If you obtained the compressed image, run `xz -d openEuler-21.09-raspi-aarch64.img.xz` first to get the "openEuler-21.09-raspi-aarch64.img" image file; otherwise, skip this step.
-2. Write the image `openEuler-21.09-raspi-aarch64.img` to the SD card by running the following command as root:
+1. If you obtained the compressed image, run `xz -d openEuler-23.03-raspi-aarch64.img.xz` first to get the "openEuler-23.03-raspi-aarch64.img" image file; otherwise, skip this step.
+2. Write the image `openEuler-23.03-raspi-aarch64.img` to the SD card by running the following command as root:

-   `dd bs=4m if=openEuler-21.09-raspi-aarch64.img of=/dev/disk3`
+   `dd bs=4m if=openEuler-23.03-raspi-aarch64.img of=/dev/disk3`

>![](./public_sys-resources/icon-note.gif) **Note:**
>A block size of 4m is normally fine. If the write fails or the written image does not work, retry with a block size of 1m, which is slower.

diff --git "a/docs/zh/docs/K3s/K3s\351\203\250\347\275\262\346\214\207\345\215\227.md" "b/docs/zh/docs/K3s/K3s\351\203\250\347\275\262\346\214\207\345\215\227.md"
deleted file mode 100644
index 6671cf170d9ac19c37e8c8e8abb1bfe75e3d0166..0000000000000000000000000000000000000000
--- "a/docs/zh/docs/K3s/K3s\351\203\250\347\275\262\346\214\207\345\215\227.md"
+++ /dev/null
@@ -1,86 +0,0 @@
# K3s Deployment Guide

### What Is K3s

K3s is a lightweight Kubernetes distribution, heavily optimized for edge computing, IoT, and similar scenarios. K3s adds the following enhancements:

- Packaged as a single binary.
- Uses a lightweight sqlite3-based storage backend as the default storage mechanism; etcd3, MySQL, and PostgreSQL are also supported.
- Wrapped in a simple launcher that handles much of the TLS and option complexity.
- Secure by default, with sensible defaults for lightweight environments.
- Adds simple but powerful batteries-included features, such as a local storage provider, a service load balancer, a Helm controller, and the Traefik ingress controller.
- All Kubernetes control-plane components run in a single binary and process, giving K3s
the ability to automate and manage complex cluster operations, including certificate distribution.
- Minimized external dependencies: K3s needs only a kernel and cgroup mounts.

### Applicable Scenarios

K3s suits the following scenarios:

- Edge computing
- IoT
- CI
- Development
- ARM
- Embedded Kubernetes

Because K3s needs relatively few resources, it also fits development and test scenarios. When developers or testers need to verify a feature or reproduce a problem, K3s shortens cluster startup time and reduces the resources the cluster consumes.

### Deploying K3s

#### Preparation

- Make sure the server node and the agent nodes have different host names:

You can change a host name with hostnamectl set-hostname "hostname".

![1661829534335](./figures/set-hostname.png)

- Install K3s with yum on each node:

  Upstream, K3s is installed offline by downloading the architecture-specific binary through the install.sh script. The openEuler community ported that binary build into the community and packages it as an RPM, so here it can be installed directly with yum.

![1661830441538](./figures/yum-install.png)

#### Deploying the server node

To install K3s on a single server, run the following on the server node:

```
INSTALL_K3S_SKIP_DOWNLOAD=true k3s-install.sh
```

![1661825352724](./figures/server-install.png)

#### Checking the server deployment

![1661825403705](./figures/check-server.png)

#### Deploying agent nodes

First obtain the server node's token, found on the server node in /var/lib/rancher/k3s/server/node-token.

> **Note:**
>
> Only the latter part of the token is used in the following steps.

![1661825538264](./figures/token.png)

To add agents, run the following on each agent node:

```
INSTALL_K3S_SKIP_DOWNLOAD=true K3S_URL=https://myserver:6443 K3S_TOKEN=mynodetoken k3s-install.sh
```

> **Note:**
>
> Replace myserver with the server's IP address or a valid DNS name, and replace mynodetoken with the server node's token.

![1661829392357](./figures/agent-install.png)

#### Checking whether the agent nodes deployed successfully

After the installation finishes, go back to the **server** node and run `kubectl get nodes`; the agent node should appear as registered.

![1661826797319](./figures/check-agent.png)

At this point a basic K3s cluster is up.

#### Further Usage

For more K3s usage, see the K3s documentation: https://rancher.com/docs/k3s/latest/en/ and https://docs.rancher.cn/k3s/

diff --git a/docs/zh/docs/K3s/figures/agent-install.png b/docs/zh/docs/K3s/figures/agent-install.png
deleted file mode 100644
index dca1d64ec8aae821393bb715daf4c56b783a68e0..0000000000000000000000000000000000000000
Binary files a/docs/zh/docs/K3s/figures/agent-install.png and /dev/null differ
diff --git a/docs/zh/docs/K3s/figures/check-agent.png b/docs/zh/docs/K3s/figures/check-agent.png
deleted file mode 100644
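As a side note to the agent-join step in the K3s guide above: the node-token read from /var/lib/rancher/k3s/server/node-token typically has the form `K10<ca-hash>::server:<password>`, and the guide notes that only the trailing part is used. A hypothetical shell helper to pull that trailing part out (the exact token layout is an assumption, not something the guide spells out):

```shell
# Hypothetical helper: extract the trailing secret from a K3s node-token.
# Assumes the K10<ca-hash>::server:<password> layout described above.
k3s_token_secret() {
  printf '%s\n' "$1" | awk -F: '{print $NF}'
}

# Example with a made-up token:
k3s_token_secret 'K10abcdef::server:mynodetoken'   # prints: mynodetoken
```

The full token also works as K3S_TOKEN; the split is only a convenience when a shorter secret is wanted.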
index aa467713353d70ad513e8ee13ac9d8b6520b7ee0..0000000000000000000000000000000000000000
Binary files a/docs/zh/docs/K3s/figures/check-agent.png and /dev/null differ
diff --git a/docs/zh/docs/K3s/figures/check-server.png b/docs/zh/docs/K3s/figures/check-server.png
deleted file mode 100644
index 06343de9a8b0eacb0f6194cf438b2b27af88cae4..0000000000000000000000000000000000000000
Binary files a/docs/zh/docs/K3s/figures/check-server.png and /dev/null differ
diff --git a/docs/zh/docs/K3s/figures/server-install.png b/docs/zh/docs/K3s/figures/server-install.png
deleted file mode 100644
index 7d30c8f4f73946c8b0555186c1736492039da731..0000000000000000000000000000000000000000
Binary files a/docs/zh/docs/K3s/figures/server-install.png and /dev/null differ
diff --git a/docs/zh/docs/K3s/figures/set-hostname.png b/docs/zh/docs/K3s/figures/set-hostname.png
deleted file mode 100644
index 32564d6159825b6d4131a6b138a493188ce88c6c..0000000000000000000000000000000000000000
Binary files a/docs/zh/docs/K3s/figures/set-hostname.png and /dev/null differ
diff --git a/docs/zh/docs/K3s/figures/token.png b/docs/zh/docs/K3s/figures/token.png
deleted file mode 100644
index 79e5313bd1d5e707659cd08d4aafdf528b9df8f0..0000000000000000000000000000000000000000
Binary files a/docs/zh/docs/K3s/figures/token.png and /dev/null differ
diff --git a/docs/zh/docs/K3s/figures/yum-install.png b/docs/zh/docs/K3s/figures/yum-install.png
deleted file mode 100644
index 0e601a23a5a67e7927f12bc90d1a4137e1a3a567..0000000000000000000000000000000000000000
Binary files a/docs/zh/docs/K3s/figures/yum-install.png and /dev/null differ
diff --git "a/docs/zh/docs/Kernel/\345\206\205\345\255\230\345\217\257\351\235\240\346\200\247\345\210\206\347\272\247\347\211\271\346\200\247\344\275\277\347\224\250\346\214\207\345\215\227.md" "b/docs/zh/docs/Kernel/\345\206\205\345\255\230\345\217\257\351\235\240\346\200\247\345\210\206\347\272\247\347\211\271\346\200\247\344\275\277\347\224\250\346\214\207\345\215\227.md"
deleted file mode
100644
index 5ea0eff5b88425f13b020fed80854226fa0763f4..0000000000000000000000000000000000000000
--- "a/docs/zh/docs/Kernel/\345\206\205\345\255\230\345\217\257\351\235\240\346\200\247\345\210\206\347\272\247\347\211\271\346\200\247\344\275\277\347\224\250\346\214\207\345\215\227.md"
+++ /dev/null
@@ -1,371 +0,0 @@
# Memory Reliability Grading

[1.1 Overview](#overview)

[1.2 Constraints](#constraints)

[1.3 Usage](#usage)

[1.3.1 OS Support for Memory Grading](#os-support-for-memory-grading)

[1.3.2 High-Reliability Memory for the Read/Write Cache](#high-reliability-memory-for-the-readwrite-cache)

[1.3.3 High-Reliability Memory for tmpfs](#high-reliability-memory-for-tmpfs)

[1.3.4 Surviving Kernel UCE from User Space Without Reset](#surviving-kernel-uce-from-user-space-without-reset)

## Overview

This feature lets users allocate memory at the required reliability level and mitigates, to a degree, the impact of potential UCE or CE faults, so that with only partial MR (address-range-mirrored) memory, overall workload reliability does not degrade.

## Constraints

This section lists the feature's general constraints; each sub-feature has its own constraints, detailed in the corresponding section.

**Compatibility**

1. The feature currently applies to ARM64 only.
2. The hardware must support partial mirrored memory (address range mirror), i.e. report memory with the EFI_MEMORY_MORE_RELIABLE attribute through the standard UEFI interface; ordinary memory needs no extra flag. Mirrored (MR) memory is the high-reliability tier; ordinary memory is the low-reliability tier.
3. The high/low reliability tiers are implemented with kernel memory zones; pages cannot flow dynamically between them (a page cannot move between zones).
4. Physically contiguous memory of different reliability is split into separate memblocks, so scenarios that used to allocate large contiguous physical ranges may be restricted once grading is enabled.
5. The feature requires the boot parameter kernelcore=reliable and is incompatible with any other value of that parameter.

**Design Specifications**

1. For kernel-mode development, note the following about allocations:

   - For interfaces that accept a gfp_flag, only allocations whose gfp_flag contains both __GFP_HIGHMEM and __GFP_MOVABLE are forced into the ordinary-memory zone or redirected to the reliable zone; no other gfp_flag is intercepted.
   - Allocations from slab/slub/slob return high-reliability memory (except that a one-off allocation larger than KMALLOC_MAX_CACHE_SIZE whose gfp_flag targets the ordinary zone may return low-reliability memory).

2. For user-mode development, note the following:

   - After an ordinary process is marked as a key process, high-reliability memory is used only from the point where physical pages are actually allocated (on page fault); memory allocated earlier keeps its attribute, and vice versa. Memory requested between process start and the attribute change may therefore not be high-reliability. Whether the change took effect can be verified by checking whether the physical address behind a virtual address lies in a high-reliability segment.
   - Libc-style caching mechanisms such as glibc chunks (ptmalloc, tcmalloc, dpdk) cache memory for performance, so user-level requests do not map one-to-one onto kernel allocations; turning an ordinary process into a key process cannot fully take effect through such caches (the flag applies only when the kernel actually allocates).

3. When an upper-layer allocation finds high-reliability memory short (the zone's native min watermark is hit) or a limit is triggered, the kernel first drops page cache to try to reclaim high-reliability memory. If the allocation still fails, the kernel either OOMs or falls back to the low-reliability zone, depending on the fallback switch. (Fallback means allocating from another zone/node when the current zone/node is short of memory.)

4.
Dynamic migration mechanisms similar to NUMA_BALANCING may move already-allocated high/low reliability pages to another node. The migration loses the allocation context, and the target node may have no memory of the matching reliability, so the reliability after migration may not match expectations.

5. Three configuration files are introduced for user-mode high-reliability usage:

   - /proc/sys/vm/task_reliable_limit: upper limit on the high-reliability memory used by key processes (including systemd). It covers anonymous pages and file pages; shmem used by the process is also counted (within anonymous pages).
   - /proc/sys/vm/reliable_pagecache_max_bytes: global soft limit on high-reliability memory used by the page cache. It constrains how much high-reliability page cache ordinary processes use; by default there is no limit. High-reliability processes and file-system metadata are not subject to it. Regardless of the fallback switch, an ordinary process hitting this limit allocates low-reliability memory by default; if that fails, the native flow is followed.
   - /proc/sys/vm/shmem_reliable_bytes_limit: global soft limit on high-reliability memory used by shmem. It constrains how much high-reliability memory ordinary processes' shmem uses; unlimited by default. High-reliability processes are not subject to it. With fallback off, an ordinary process hitting it fails the allocation without OOM (consistent with the native flow).

   Hitting these values may cause allocation fallback or OOM.

   A page fault taken by a key process on the tmpfs or pagecache path may trigger several limits at once; the interactions are shown below.

   | Hits task_reliable_limit | Hits reliable_pagecache_max_bytes or shmem_reliable_bytes_limit | Allocation policy |
   | --- | --- | --- |
   | Yes | Yes | Reclaim page cache first to satisfy the request; otherwise fall back or OOM |
   | Yes | No | Reclaim page cache first to satisfy the request; otherwise fall back or OOM |
   | No | No | High-reliability memory first; on failure fall back or OOM |
   | No | Yes | High-reliability memory first; on failure fall back or OOM |

   Key processes obey task_reliable_limit. If task_reliable_limit is higher than the tmpfs or pagecache limit, page cache and tmpfs produced by key processes still use high-reliability memory, so their high-reliability usage can exceed the corresponding limit.

   When task_reliable_limit is hit and the high-reliability file cache is below 4 MB, no synchronous reclaim happens: a page-cache allocation made while the high-reliability file cache is below 4 MB falls back to low-reliability memory, while above 4 MB page cache is reclaimed first to satisfy the request. Near 4 MB, direct cache reclaim triggers more often; since direct reclaim takes expensive locks, CPU usage rises and file I/O performance approaches raw-disk performance.

6. Even when enough high-reliability memory is available, the following cases can still fall back to the low-reliability zone:

   - If the allocation cannot go to another node, it falls back to the current node's low-reliability memory. Typical cases:
     - The allocation carries __GFP_THISNODE (for example transparent huge pages), meaning memory may come only from the current node; if the node's high-reliability memory cannot satisfy the request, its low-reliability zone is tried.
     - The process is pinned with taskset or numactl to a node that contains ordinary memory.
     - The native scheduler places the process on a node that contains ordinary memory.
   - An allocation that hits the high-reliability usage watermark also falls back to low reliability.

7. With memory-grading fallback off, high-reliability memory cannot spill into low-reliability memory, so user-space estimates of available memory (for example via MemFree) may be incompatible with this feature.

8.
With memory-grading fallback on, the native fallback order changes, mainly in zone and NUMA-node selection:

   - For an **ordinary user process**, the fallback order is: local-node low-reliability memory -> remote-node low-reliability memory.
   - For a **key user process**, the order is: local-node high-reliability memory -> remote-node high-reliability memory. If that fails and memory-reliable fallback is enabled, it additionally retries: local-node low-reliability memory -> remote-node low-reliability memory.

**Scenario Constraints**

1. Only the default 4 KB page size (PAGE_SIZE) is supported.
2. The low 4 GB on NUMA node 0 must be high-reliability memory, and the amounts of high- and low-reliability memory must satisfy kernel usage; otherwise the system may fail to boot. Other nodes have no size requirement, but note: if a node has no or too little high-reliability memory, its per-node management structures may land in another node's high-reliability memory (they are kernel data structures and must live in the reliable region), producing kernel warnings (for example vmemmap_verify warnings) and a performance impact.
3. Some of the feature's counters (for example total high-reliability tmpfs usage) are kept per-CPU, which adds overhead; summing is deferred to limit the performance impact, so the values carry some error. Up to 10% deviation is normal.
4. Huge-page limits:
   - Static huge pages reserved at boot are low-reliability. Static huge pages allocated at run time default to low-reliability, but become high-reliability if the allocation happens in a key-process context.
   - For transparent huge pages (THP), if any of the 512 4 KB pages being merged into a 2 MB page (taking 2 MB as the example) is high-reliability, the newly allocated 2 MB page uses high-reliability memory, so THP can consume more high-reliability memory.
   - Reserved 2 MB huge-page allocation follows the native fallback flow: if the current node lacks low-reliability memory, it falls back to the high-reliability range.
   - When 2 MB huge pages are reserved at boot without a node specified, the reservation is balanced across nodes; any node short of low-reliability memory uses high-reliability memory per the native flow.
5. Only normal system boot is currently supported. Some abnormal boot paths may be incompatible with memory grading, such as the kdump boot phase (kdump can now disable the feature automatically; other scenarios must be disabled by the upper layer).
6. In the SWAP-in/out, memory offline, KSM, cma, and gigantic-page paths, newly allocated pages do not take memory grading into account, so behavior is undefined (including inaccurate high-reliability usage counters and allocations whose reliability differs from expectations).

**Performance Impact**

- Physical page allocation gains extra checks from grading, with some performance cost; the impact depends on system state, allocation type, and how much high/low reliability memory each node has left.
- The feature introduces high-reliability usage counters, which affect system performance.
- Hitting task_reliable_limit synchronously reclaims cache in the high-reliability region, raising CPU usage. When a page-cache allocation (file I/O, for example dd) hits task_reliable_limit while available high-reliability memory (ReliableFileCache counts as available) is near 4 MB, direct cache reclaim triggers more often; its heavy locking drives CPU usage up, and file I/O performance approaches raw-disk performance.

## Usage

### OS Support for Memory Grading

**Overview**

Memory is split into high- and low-reliability ranges, so allocation and freeing must also be managed per tier. The OS must be able to steer allocation paths: user-mode processes use low-reliability memory and the kernel uses high-reliability memory. When high-reliability memory runs short, allocations must either fall back to the low-reliability region or simply fail.

It must also be possible to request high- or low-reliability memory on demand, based on the reliability needs of a process's memory segments and the nature of the process itself, for example letting key processes use high-reliability memory to reduce their exposure to memory errors. Today the kernel uses only high-reliability memory and user-mode processes only low-reliability memory, which destabilizes some critical services: if a forwarding process fails, I/O is interrupted and service stability suffers. Those key services therefore need special handling so they run on high-reliability memory.

When the system hits a memory error, the OS should overwrite unallocated low-reliability memory to scrub undetected memory errors.

**Constraints**

- **Key processes using high-reliability memory**

  1. Abuse of the /proc/<pid>/reliable interface risks excessive high-reliability memory consumption.
  2.
The reliable attribute of a user-mode process can only be changed through the proc interface after the process starts, or be inherited from its parent. systemd (pid=1) uses high-reliability memory; its reliable attribute has no effect and is not inherited. The attribute is ineffective for kernel threads.
  3. A process's text and data segments use high-reliability memory; if it is insufficient, the process starts on low-reliability memory.
  4. Ordinary processes also use high-reliability memory in some scenarios, such as hugetlb, pagecache, vdso, and tmpfs.

- **Overwriting unallocated memory**

  The overwrite of unallocated memory can run only once and does not support concurrency. Running it has these effects:

  1. It is time-consuming; on each node one CPU is occupied by the overwrite thread and other tasks cannot be scheduled on it.
  2. The overwrite takes the zone lock, so other processes' allocations must wait for it to finish, which may delay allocation.
  3. Concurrent invocations queue up and add further latency.

  On a slow machine this can trigger kernel RCU stalls or soft-lockup warnings, and block allocations. Restrict the feature to when it is necessary and to physical machines; virtual machines and similar environments are very likely to show the symptoms above.

  Reference timings on a physical machine are below (actual times depend on hardware performance and current system load).

Table: measurements on an idle TaiShan 2280 V2 physical machine

| Item | Node 0 | Node 1 | Node 2 | Node 3 |
| --- | --- | --- | --- | --- |
| Free Mem (MB) | 109290 | 81218 | 107365 | 112053 |

Total time: 3.2 s

**Usage**

This sub-feature exposes many interfaces; steps 1-6 are enough to enable it and verify it.

1. Set the boot parameter "kernelcore=reliable" to turn on memory grading. CONFIG_MEMORY_RELIABLE is mandatory; without it, memory reliability grading is not enabled at all.

2. Optionally use the boot parameter reliable_debug=[F][,S][,P] to selectively disable fallback (F), tmpfs use of high-reliability memory (S), or page-cache use of high-reliability memory (P). All are enabled by default.

3. High-reliability memory is located and marked from the address ranges reported by the BIOS. On NUMA systems, not every node needs reserved reliable memory, but the low 4 GB of physical space on node 0 must be high-reliability. The system allocates memory during boot; if no high-reliability memory is available, it either falls back to low-reliability memory (the fallback logic built into the mirror feature) or fails to boot. A system running on low-reliability memory is unstable overall, so reserve node 0's high-reliability memory and keep its low 4 GB of physical space high-reliability.

4. After boot, the boot log shows whether grading is enabled; the following line should appear:

   ```
   mem reliable: init succeed, mirrored memory
   ```

5. The physical ranges of high-reliability memory can be read from the boot log: in the EFI-reported memory map, segments flagged "MR" are high-reliability. In the excerpt below, segment mem06 is high-reliability and mem07 is low-reliability, with their physical ranges listed (there is no other way to query the ranges directly).

   ```
   [    0.000000] efi:   mem06: [Conventional Memory|   |MR| | | | | | |WB| | | ] range=[0x0000000100000000-0x000000013fffffff] (1024MB)
   [    0.000000] efi:   mem07: [Conventional Memory|   |  | | | | | | |WB| | | ] range=[0x0000000140000000-0x000000083eb6cfff] (28651MB)
   ```

6.
For kernel-mode development, a page's tier can be judged from the zone of its struct page: ZONE_MOVABLE is the low-reliability region, and any zone numbered below ZONE_MOVABLE is high-reliability. For example:

   ```
   bool page_reliable(struct page *page)
   {
       if (!mem_reliable_status() || !page)
           return false;
       return page_zonenum(page) < ZONE_MOVABLE;
   }
   ```

   The remaining interfaces, grouped by function, are:

   1. **Checking in code whether grading is enabled:** kernel modules can use the interface below; true means memory grading is actually enabled, false means it is not.

      ```
      #include
      bool mem_reliable_status(void);
      ```

   2. **Memory hotplug:** if the kernel supports memory hot-add (Logical Memory hot-add), high- and low-reliability memory support it too, per memory block, same as the native flow.

      ```
      # Online memory into the high-reliability region
      echo online_kernel > /sys/devices/system/memory/auto_online_blocks
      # Online memory into the low-reliability region
      echo online_movable > /sys/devices/system/memory/auto_online_blocks
      ```

   3. **Dynamically disabling individual grading functions:** a long value controls the switches bit by bit:

      - bit 0: memory reliability grading as a whole.
      - bit 1: forbid fallback to the low-reliability region.
      - bit 2: disable tmpfs use of high-reliability memory.
      - bit 3: disable page-cache use of high-reliability memory.

      The other bits are reserved for extension. Changes go through the proc interface below (permission 600), value range 0-15. (The other bits are processed only while bit 0 is 1; otherwise all functions are turned off.)

      ```
      echo 15 > /proc/sys/vm/reliable_debug
      # Turns everything off, because bit 0 is 0:
      echo 14 > /proc/sys/vm/reliable_debug
      ```

      This command can only disable functions; it cannot re-enable a function that is already off or was disabled at run time.

      Note: this is an escape hatch, intended only for abnormal situations or debugging when memory reliability must be switched off; do not use it as a regular feature.

   4. **Viewing high-reliability memory statistics:** via the native /proc/meminfo:

      - ReliableTotal: total reliable memory managed by the kernel.
      - ReliableUsed: total reliable memory in use, including reserved usage from the boot phase.
      - ReliableBuddyMem: reliable memory remaining in the buddy system.
      - ReliableTaskUsed: high-reliability memory used by key user processes and systemd, including anonymous and file pages.
      - ReliableShmem: high-reliability shared-memory usage, covering shared memory, tmpfs, and rootfs.
      - ReliableFileCache: high-reliability memory used by the read/write cache.

   5. **Overwriting unallocated memory:** requires the config option CONFIG_CLEAR_FREELIST_PAGE plus the boot parameter clear_freelist; both must be present. Trigger it through the proc interface, which accepts only the value 1 (permission 0200).

      ```
      echo 1 > /proc/sys/vm/clear_freelist_pages
      ```

      Note: the feature depends on the boot parameter clear_freelist. The kernel matches parameter prefixes only, so a parameter like "clear_freelisttt" also enables it.

      To guard against mistakes, the kernel module parameter cfp_timeout_ms bounds the overwrite's run time (on timeout it exits even if unfinished); the default is 2000 ms (permission 0644):

      ```
      echo 500 > /sys/module/clear_freelist_page/parameters/cfp_timeout_ms # set the timeout to 500 ms
      ```

   6.
**Viewing and changing a process's reliability attribute:** /proc/<pid>/reliable shows whether a process is high-reliability. The flag is inherited on fork; if a child process does not need it, change the child's attribute manually. systemd and kernel threads do not support reading or writing this attribute. Valid values are 0 and 1; the default 0 means a low-reliability process (permission 0644).

      ```
      # Make pid 1024 a high-reliability process; from then on, memory its page
      # faults request comes from the high-reliability region, possibly falling
      # back to low reliability when that fails.
      echo 1 > /proc/1024/reliable
      ```

   7. **Setting the user-mode high-reliability allocation cap:** /proc/sys/vm/task_reliable_limit caps the high-reliability memory user-mode processes may request; valid range [ReliableTaskUsed, ReliableTotal], in bytes (permission 0644). Note:

      - The default is ulong_max, meaning no limit.
      - At 0, reliable processes cannot use high-reliability memory: with fallback on, allocations fall back to the low-reliability region; otherwise they OOM.
      - At a non-zero value, when the limit is hit: with fallback on, allocations fall back to the low-reliability region; with fallback off, they OOM.

### High-Reliability Memory for the Read/Write Cache

**Overview**

The page cache (file cache) caches file contents when Linux reads and writes files, speeding access to on-disk images and data. If the page cache were allocated from low-reliability memory, accessing it could hit a UCE and crash the system. The page cache must therefore live in the high-reliability region, and because it is unbounded by default, both its total size and its reliable-memory usage must be limited so it cannot exhaust high-reliability memory.

**Constraints**

1. Once over the limit, the page cache is reclaimed periodically; if pages are produced faster than they are reclaimed, the page-cache size cannot be guaranteed to stay under the limit.
2. /proc/sys/vm/reliable_pagecache_max_bytes has limited reach: some page cache, such as file-system metadata (inodes, dentries), is forced into reliable memory, so the page cache's reliable usage can exceed the limit. inodes and dentries can be released with echo 2 > /proc/sys/vm/drop_caches.
3. When the page cache's high-reliability usage exceeds reliable_pagecache_max_bytes, allocations default to low-reliability memory; if that fails, the native flow is followed.
4. FileCache statistics are first accumulated in per-CPU caches and added to the system-wide totals only when a threshold is crossed, after which they appear in /proc/meminfo. ReliableFileCache has no such threshold in /proc/meminfo, so it can occasionally read slightly larger than FileCache.
5. Write-cache scenarios are bounded by the dirty limit (/proc/sys/vm/dirty_ratio, the per-node dirty-page percentage); past the threshold the current zone is skipped. Because the high- and low-reliability tiers are different zones, the write cache may trigger a local-node fallback and use the node's low-reliability memory. The limit can be lifted with echo 100 > /proc/sys/vm/dirty_ratio.
6.
Because this feature limits page-cache usage, system performance can suffer in the following cases:

   - Too small a page-cache cap increases I/O and hurts performance.
   - Too frequent page-cache reclaim can cause stutter.
   - Reclaiming too much per pass once over the limit can cause stutter.

**Usage**

High-reliability memory for the read/write cache is enabled by default; to disable it, set reliable_debug=P on the boot command line. The page cache must not grow without bound, so its usage must be limited; that limiting depends on the config option CONFIG_SHRINK_PAGECACHE.

FileCache in /proc/meminfo reports page-cache usage; ReliableFileCache reports the reliable-memory share of it.

The limiting relies on several proc interfaces under /proc/sys/vm/, listed below:

| Interface (native/new) | Permission | Description | Default |
| --- | --- | --- | --- |
| cache_reclaim_enable (native) | 644 | Enables the page-cache limiting feature. **Range:** 0 or 1; invalid input returns an error. **Example:** echo 1 > cache_reclaim_enable | 1 |
| cache_limit_mbytes (new) | 644 | **Meaning:** cache cap in MB. **Range:** minimum 0 (limit disabled) up to the MemTotal value from meminfo converted to MB (e.g. what free -m shows). **Example:** echo 1024 > cache_limit_mbytes. **Note:** keep the cap at no less than half of total memory; a cache that is too small may hurt I/O performance. | 0 |
| cache_reclaim_s (native) | 644 | **Meaning:** interval in seconds for periodic cache reclaim. The system creates one workqueue per online CPU (n CPUs, n workqueues), each reclaiming every cache_reclaim_s seconds. Compatible with CPU hotplug: offlining removes workqueues, onlining adds them. **Range:** 0 (periodic reclaim off) to 43200; invalid input returns an error. **Example:** echo 120 > cache_reclaim_s. **Note:** set the interval on the order of minutes (for example 2 minutes); overly frequent reclaim may cause stutter. | 0 |
| cache_reclaim_weight (native) | 644 | **Meaning:** reclaim weight per pass; each kernel CPU tries to reclaim 32 * cache_reclaim_weight pages per pass. Applies both to limit-triggered and to periodic page-cache reclaim. **Range:** 1 to 100; invalid input returns an error. **Example:** echo 10 > cache_reclaim_weight. **Note:** keep it at 10 or below, or reclaiming too much memory at once may cause stutter. | 1 |
| reliable_pagecache_max_bytes (new) | 644 | **Meaning:** caps the page cache's total high-reliability memory. **Range:** 0 up to the high-reliability maximum, in bytes; the maximum can be read from /proc/meminfo; invalid input returns an error. **Example:** echo 4096000 > reliable_pagecache_max_bytes | Maximum of unsigned long, meaning unlimited |

### High-Reliability Memory for tmpfs

**Overview**
When tmpfs is used as the rootfs, it holds the operating system's core files and data, yet tmpfs uses low-reliability memory by default, leaving those core files and data unreliable. tmpfs as a whole therefore needs to use high-reliability memory.

**Usage**

High-reliability memory for tmpfs is enabled by default; to disable it, set reliable_debug=S on the boot command line. It can be disabled dynamically via /proc/sys/vm/reliable_debug, but cannot be re-enabled dynamically.

While enabled, the high-reliability memory already used by tmpfs can be read from ReliableShmem in /proc/meminfo.

By default tmpfs may use up to half of physical memory (except when rootfs is tmpfs). Traditional SysV shared memory is additionally bounded by /proc/sys/kernel/shmmax and /proc/sys/kernel/shmall, both dynamically configurable, and is also subject to the tmpfs high-reliability limit. See the table below.

| Parameter | Description |
| --- | --- |
| /proc/sys/kernel/shmmax (native) | Maximum size of a single SysV shared-memory segment |
| /proc/sys/kernel/shmall (native) | Maximum total SysV shared memory |

The new interface /proc/sys/vm/shmem_reliable_bytes_limit lets users set the system-wide high-reliability memory available to tmpfs, in bytes. The default LONG_MAX means unlimited usage; the settable range is [0, total reliable memory]; permission 644. With fallback off, hitting the cap returns an out-of-memory error; with fallback on, allocation is retried from the low-reliability region. Example:

```
echo 10000000 > /proc/sys/vm/shmem_reliable_bytes_limit
```

### Surviving Kernel UCE from User Space Without Reset

**Overview**

Under the memory reliability grading scheme, the kernel and key processes use high-reliability memory while most user-mode processes use low-reliability memory. At run time, large volumes of data cross between user and kernel space: when data enters the kernel, it is copied from low-reliability memory into the high-reliability region. That copy runs in kernel mode, and if a UCE occurs while reading the user-mode data, the kernel consumes the memory UCE and the system panics. This sub-feature avoids the reset for some user-to-kernel crossing scenarios, namely COW, copy_to_user, copy_from_user, get_user, put_user, and coredump; no other scenario is supported.

**Constraints**

1. Requires the RAS features of ARMv8.2 or later.
2. The feature changes the synchronous-exception handling policy, so it takes effect only when the kernel receives a synchronous exception reported by the firmware.
3. Kernel handling depends on the error type reported by the BIOS: fatal hardware errors cannot be handled; recoverable ones can.
4. Only the six user/kernel crossing scenarios are supported: COW, copy_to_user (including the pagecache read path), copy_from_user, get_user, put_user, and coredump; no other scenario is supported.
5.
In the coredump scenario, UCE tolerance has to be implemented in the file system's write interface, so the feature supports only three common file systems: ext4, tmpfs, and pipefs. The tolerant interfaces are:
   - pipefs: copy_page_from_iter
   - ext4/tmpfs: iov_iter_copy_from_user_atomic

**Usage**

Make sure the kernel config option CONFIG_ARCH_HAS_COPY_MC is enabled. Setting /proc/sys/kernel/machine_check_safe to 1 enables the feature for all scenarios; 0 disables it; all other values are invalid.

Current fault-tolerance handling per scenario:

| No. | Scenario | Symptom | Handling |
| --- | --- | --- | --- |
| 1 | copy_from/to_user: the basic user/kernel crossing, mainly syscall, sysctl, and procfs operations | A UCE during the copy resets the kernel | On UCE, kill the current process; the kernel does not reset |
| 2 | get/put_user: simple variable copies, mostly used by netlink | A UCE during the copy resets the kernel | On UCE, kill the current process; the kernel does not reset |
| 3 | COW: forking a child triggers copy-on-write | A UCE during copy-on-write resets the kernel | On UCE, kill the affected processes; the kernel does not reset |
| 4 | Read cache: user space uses low-reliability memory; when user programs read and write files, the OS caches disk files in free memory for performance, and file reads go through the kernel cache | A UCE resets the kernel | On UCE, kill the current process; the kernel does not reset |
| 5 | UCE during coredump memory access | A UCE resets the kernel | On UCE, kill the current process; the kernel does not reset |
| 6 | Write cache: UCE while flushing the write cache to disk | Flushing is DMA data movement to disk; a UCE here makes the page write fail after the timeout, causing data inconsistency and an unusable file system; for critical data the kernel resets | No solution; unsupported; the kernel resets |
| 7 | Kernel boot parameters and module parameters use high-reliability memory | / | Unsupported; the risk is inherently reduced |
| 8 | relayfs: a fast data-relay file system for passing data from kernel space to user space | / | Unsupported; the risk is inherently reduced |
| 9 | seq_file: transfers kernel data to user space in file form | / | Unsupported; the risk is inherently reduced |

Since user-mode data mostly lives in low-reliability memory, this project covers only the kernel reading user-mode data. Linux has nine ways of exchanging data between user and kernel space: kernel boot parameters, module parameters with sysfs, sysctl, syscall (system calls), netlink, procfs, seq_file, debugfs, and relayfs. There are two further cases: COW (copy-on-write) during process creation, and the file read/write cache (pagecache).

sysfs, syscall, netlink, procfs, and the like all move data from user space into the kernel via copy_from_user or get_user.

So the user-to-kernel crossings are: copy_from_user, get_user, COW, read cache, and write-cache flush.

The kernel-to-user paths are: relayfs, seq_file, copy_to_user, and put_user.

diff --git "a/docs/zh/docs/KubeOS/figures/\345\256\271\345\231\250OS\346\226\207\344\273\266\345\270\203\345\261\200.png"
"b/docs/zh/docs/KubeOS/figures/\345\256\271\345\231\250OS\346\226\207\344\273\266\345\270\203\345\261\200.png"
deleted file mode 100644
index 7dfdcb3aaef79462ecc196159659b22cb21b9a9d..0000000000000000000000000000000000000000
Binary files "a/docs/zh/docs/KubeOS/figures/\345\256\271\345\231\250OS\346\226\207\344\273\266\345\270\203\345\261\200.png" and /dev/null differ
diff --git "a/docs/zh/docs/KubeOS/figures/\345\256\271\345\231\250OS\346\236\266\346\236\204.png" "b/docs/zh/docs/KubeOS/figures/\345\256\271\345\231\250OS\346\236\266\346\236\204.png"
deleted file mode 100644
index 626071e62735bab2e33ec2a6f1a5839409d33319..0000000000000000000000000000000000000000
Binary files "a/docs/zh/docs/KubeOS/figures/\345\256\271\345\231\250OS\346\236\266\346\236\204.png" and /dev/null differ
diff --git a/docs/zh/docs/KubeOS/overview.md b/docs/zh/docs/KubeOS/overview.md
deleted file mode 100644
index beb6b093378dc4c24fd7d81e1f37081c1ebf6a91..0000000000000000000000000000000000000000
--- a/docs/zh/docs/KubeOS/overview.md
+++ /dev/null
@@ -1,9 +0,0 @@
# Container OS Upgrade Guide

This document describes how to install, deploy, and use the container OS upgrade feature on openEuler. The feature lets the OS plug into a scheduling system through a standard extension mechanism, so that the scheduler manages OS upgrades for the nodes in a cluster.

It is intended for community developers, open-source enthusiasts, and partners who use openEuler and want to understand and use the container OS. Readers need the following experience and skills:

* Familiarity with basic Linux operations.
* A basic understanding of Kubernetes and Docker.

diff --git a/docs/zh/docs/KubeOS/public_sys-resources/icon-note.gif b/docs/zh/docs/KubeOS/public_sys-resources/icon-note.gif
deleted file mode 100644
index 6314297e45c1de184204098efd4814d6dc8b1cda..0000000000000000000000000000000000000000
Binary files a/docs/zh/docs/KubeOS/public_sys-resources/icon-note.gif and /dev/null differ
diff --git "a/docs/zh/docs/KubeOS/\344\275\277\347\224\250\346\226\271\346\263\225.md" "b/docs/zh/docs/KubeOS/\344\275\277\347\224\250\346\226\271\346\263\225.md"
deleted file mode 100644
index 22db85467ed49e737ff7bd105ec135e1bac545db..0000000000000000000000000000000000000000
--- "a/docs/zh/docs/KubeOS/\344\275\277\347\224\250\346\226\271\346\263\225.md"
+++ /dev/null
@@ -1,184 +0,0 @@
# Usage
- [Usage](#usage)
  - [Precautions](#precautions)
  - [Upgrade Guide](#upgrade-guide)
  - [Rollback Guide](#rollback-guide)
    - [Scenarios](#scenarios)
    - [Manual Rollback](#manual-rollback)
    - [Tool-Based Rollback](#tool-based-rollback)

## Precautions

1. Container OS upgrades are atomic upgrades of all packages; per-package upgrades are not offered inside the container OS by default.
2. Container OS upgrades use a dual-partition scheme; more than two partitions are not supported.
3. The upgrade log of a single node can be found in the node's /var/log/messages file.
4. Follow the upgrade and rollback procedures strictly; calling them out of order may leave the system unable to upgrade or roll back.
5. Upgrading from Docker images and mTLS mutual authentication are supported only on openEuler 22.09 and later.
6. Upgrades across major versions are not supported.

## Upgrade Guide

Create a custom object of kind OS in the cluster and set its fields. The kind OS comes from the CRD object created in the Installation and Deployment chapter. The fields are:

| Parameter | Type | Description | Usage notes | Mandatory |
| --- | --- | --- | --- | --- |
| imagetype | string | Type of upgrade image | Must be docker or disk; other values are invalid. Effective only for upgrades. | Yes |
| opstype | string | Operation to perform, upgrade or rollback | Must be upgrade or rollback; other values are invalid. | Yes |
| osversion | string | OS version of the image used for the upgrade or rollback | Must be a KubeOS version, e.g. KubeOS 1.0.0. | Yes |
| maxunavailable | int | Number of nodes upgraded or rolled back concurrently | A value larger than the actual number of cluster nodes still deploys normally; the upgrade or rollback then proceeds with the actual node count. | Yes |
| dockerimage | string | Container image used for the upgrade | Must use the container-image format repository/name:tag; effective only when upgrading from a container image. | Yes |
| imageurl | string | URL of the disk image used for the upgrade | imageurl must include the protocol, http or https only, e.g. https://192.168.122.15/update.img; effective only when upgrading from a disk image. | Yes |
| checksum | string | Checksum (SHA-256) for verifying the disk image | Effective only when upgrading from a disk image. | Yes |
| flagSafe | bool | Whether the http address in imageurl is considered safe | Must be true or false; effective only when imageurl uses http. | Yes |
| mtls | bool | Whether the connection to imageurl uses https mutual authentication | Must be true or false; effective only when imageurl uses https. | Yes |
| cacert | string | Root certificate file for https or https mutual authentication | Effective only when imageurl uses https. | Mandatory when imageurl uses https |
| clientcert | string | Client certificate file for https mutual authentication | Effective only with https mutual authentication. | Mandatory when mtls is true |
| clientkey | string | Client key for https mutual authentication | Effective only with https mutual authentication. | Mandatory when mtls is true |

The address in imageurl includes the protocol; only http and https are supported. An https imageurl means secure transfer; with an http address, flagSafe must be set to true, i.e. the image is downloaded only when the user explicitly declares the address safe. If imageurl is an http address and flagSafe is not set to true, the address is treated as unsafe by default, the image is not downloaded, and the upgrading node's log tells the user the address is unsafe.

https is recommended for imageurl; it requires the corresponding certificate to be installed on the machines being upgraded. If the image server is maintained by the user, the user must sign the image and make sure the upgrading nodes have the matching certificate installed. Place the certificate in the container OS directory
/etc/KubeOS/certs目录下。地址由管理员传入,管理员应该保证网址的安全性,推荐采用内网地址。 - -容器OS镜像的合法性检查需要由容器OS镜像服务提供者做合法性检查,确保下载的容器OS镜像来源可靠 - -编写YAML文件,在集群中部署 OS 的cr实例,用于部署cr实例的YAML示例如下: - -* 使用磁盘镜像进行升级 - - ``` - apiVersion: upgrade.openeuler.org/v1alpha1 - kind: OS - metadata: - name: os-sample - spec: - imagetype: disk - opstype: upgrade - osversion: edit.os.version - maxunavailable: edit.node.upgrade.number - dockerimage: "" - imageurl: edit.image.url - checksum: image.checksum - flagSafe: imageurl.safety - mtls: imageurl use mtls or not - cacert: ca certificate - clientcert: client certificate - clientkey: client certificate key - ``` - -* 使用容器镜像升级 - - ``` shell - apiVersion: upgrade.openeuler.org/v1alpha1 - kind: OS - metadata: - name: os-sample - spec: - imagetype: docker - opstype: upgrade - osversion: edit.os.version - maxunavailable: edit.node.upgrade.number - dockerimage: dockerimage like repository/name:tag - imageurl: "" - checksum: "" - flagSafe: false - mtls: true - ``` - - 使用容器镜像进行升级前请先制作升级所需的容器镜像,制作方式请见《容器OS镜像制作指导》 - -假定将上面的YAML保存到upgrade_v1alpha1_os.yaml - -查看未升级的节点的 OS 版本 - -``` -kubectl get nodes -o custom-columns='NAME:.metadata.name,OS:.status.nodeInfo.osImage' -``` - -执行命令,在集群中部署cr实例后,节点会根据配置的参数信息进行升级。 - -``` -kubectl apply -f upgrade_v1alpha1_os.yaml -``` - -再次查看节点的 OS 版本来确认节点是否升级完成 - -``` -kubectl get nodes -o custom-columns='NAME:.metadata.name,OS:.status.nodeInfo.osImage' -``` - -> ![](./public_sys-resources/icon-note.gif)**说明**: -> -> 如果后续需要再次升级,与上面相同对 upgrade_v1alpha1_os.yaml 的 imageurl ,osversion,checksum,maxunavailable,flagSafe 或者dockerimage字段进行相应修改。 - -## 回退指导 - -### 使用场景 - -- 虚拟机无法正常启动时,需要退回到上一可以启动的版本时进行回退操作,仅支持手动回退容器 OS 。 -- 虚拟机能够正常启动并且进入系统,需要将当前版本退回到老版本时进行回退操作,支持工具回退(类似升级方式)和手动回退,建议使用工具回退。 - -### 手动回退 - -手动重启虚拟机,选择第二启动项进行回退,手动回退仅支持回退到本次升级之前的版本。 - -### 工具回退 - -* 回退至任意版本 - * 修改 OS 的cr实例的YAML 配置文件(例如 upgrade_v1alpha1_os.yaml),设置相应字段为期望回退的老版本镜像信息。类别OS来自于安装和部署章节创建的CRD对象,字段说明及示例请见上一节升级指导。 - - * YAML修改完成后执行更新命令,在集群中更新定制对象后,节点会根据配置的字段信息进行回退 - - ``` - kubectl apply -f 
upgrade_v1alpha1_os.yaml
-    ```
-
-* 回退至上一版本
-
-  * 修改upgrade_v1alpha1_os.yaml,设置osversion为上一版本,opstype为rollback,回退至上一版本(即切换至上一分区)。YAML示例如下:
-
-    ```
-    apiVersion: upgrade.openeuler.org/v1alpha1
-    kind: OS
-    metadata:
-      name: os-sample
-    spec:
-      imagetype: ""
-      opstype: rollback
-      osversion: KubeOS previous version
-      maxunavailable: 2
-      dockerimage: ""
-      imageurl: ""
-      checksum: ""
-      flagSafe: false
-      mtls: true
-    ```
-
-  * YAML修改完成后执行更新命令,在集群中更新定制对象后,节点会根据配置的字段信息进行回退。
-
-    ```
-    kubectl apply -f upgrade_v1alpha1_os.yaml
-    ```
-
-    更新完成后,节点会根据配置信息回退容器 OS。
-
-* 查看节点容器 OS 版本,确认回退是否成功。
-
-    ```
-    kubectl get nodes -o custom-columns='NAME:.metadata.name,OS:.status.nodeInfo.osImage'
-    ```
-
diff --git "a/docs/zh/docs/KubeOS/\345\256\211\350\243\205\344\270\216\351\203\250\347\275\262.md" "b/docs/zh/docs/KubeOS/\345\256\211\350\243\205\344\270\216\351\203\250\347\275\262.md"
deleted file mode 100644
index 418f2a3b64c6414761a9afaa95c9a4d4fe57deb6..0000000000000000000000000000000000000000
--- "a/docs/zh/docs/KubeOS/\345\256\211\350\243\205\344\270\216\351\203\250\347\275\262.md"
+++ /dev/null
@@ -1,224 +0,0 @@
-# 安装与部署
-
-本章介绍如何安装和部署容器 OS 升级工具。
-
-- [安装与部署](#安装与部署)
-  - [软硬件要求](#软硬件要求)
-    - [硬件要求](#硬件要求)
-    - [软件要求](#软件要求)
-    - [环境准备](#环境准备)
-  - [安装容器OS升级工具](#安装容器os升级工具)
-  - [部署容器OS升级工具](#部署容器os升级工具)
-    - [制作os-operator和os-proxy镜像](#制作os-operator和os-proxy镜像)
-    - [制作容器OS镜像](#制作容器os镜像)
-    - [部署CRD,operator和proxy](#部署crd,operator和proxy)
-
-## 软硬件要求
-
-### 硬件要求
-
-* 当前仅支持 x86 和 AArch64 架构
-
-### 软件要求
-
-* 操作系统:openEuler 22.09
-
-### 环境准备
-
-* 安装 openEuler 系统,安装方法参考《openEuler 22.09 安装指南》
-* 安装 qemu-img,bc,parted,tar,yum,docker,dosfstools
-
-## 安装容器OS升级工具
-
-安装容器 OS 升级工具的操作步骤如下:
-
-1. 
配置 yum 源:openEuler 22.09 和 openEuler 22.09 EPOL
-
-    ```
-    [openEuler22.09] # openEuler 22.09 官方发布源
-    name=openEuler22.09
-    baseurl=http://repo.openeuler.org/openEuler-22.09/everything/$basearch/
-    enabled=1
-    gpgcheck=1
-    gpgkey=http://repo.openeuler.org/openEuler-22.09/everything/$basearch/RPM-GPG-KEY-openEuler
-    ```
-
-    ```
-    [Epol] # openEuler 22.09:Epol 官方发布源
-    name=Epol
-    baseurl=http://repo.openeuler.org/openEuler-22.09/EPOL/main/$basearch/
-    enabled=1
-    gpgcheck=1
-    gpgkey=http://repo.openeuler.org/openEuler-22.09/OS/$basearch/RPM-GPG-KEY-openEuler
-    ```
-
-2. 使用 root 账户安装容器 OS 升级工具:
-
-    ```shell
-    # yum install KubeOS KubeOS-scripts -y
-    ```
-
-> ![](./public_sys-resources/icon-note.gif)**说明**:
->
-> 容器 OS 升级工具会安装在 /opt/kubeOS 目录下,包括 os-operator,os-proxy,os-agent 二进制,制作容器 OS 工具及相应配置文件。
-
-## 部署容器OS升级工具
-
-容器OS升级工具安装完成后,需要对其进行配置部署,本章介绍如何配置和部署容器OS升级工具。
-
-### 制作os-operator和os-proxy镜像
-
-#### 环境准备
-
-使用 Docker 制作容器镜像,请先确保 Docker 已经安装和配置完成。
-
-#### 操作步骤
-
-1. 进入工作目录。
-
-    ```shell
-    cd /opt/kubeOS
-    ```
-
-2. 指定 proxy 的镜像仓库、镜像名及版本。
-
-    ```shell
-    export IMG_PROXY=your_imageRepository/os-proxy_imageName:version
-    ```
-
-3. 指定 operator 的镜像仓库、镜像名及版本。
-
-    ```shell
-    export IMG_OPERATOR=your_imageRepository/os-operator_imageName:version
-    ```
-
-4. 请用户自行编写 Dockerfile 来构建镜像,Dockerfile 编写请注意以下几项:
-
-    * os-operator 和 os-proxy 镜像需要基于 baseimage 进行构建,请用户保证 baseimage 的安全性
-    * 需将 os-operator 和 os-proxy 二进制文件分别拷贝到对应的镜像中
-    * 请确保 os-proxy 镜像中 os-proxy 二进制文件属主和属组为 root,文件权限为 500
-    * 请确保 os-operator 镜像中 os-operator 二进制文件属主和属组为容器内运行 os-operator 进程的用户,文件权限为 500
-    * os-operator 和 os-proxy 的二进制文件在镜像内的位置和容器启动时运行的命令需与部署的 yaml 中指定的字段相对应
-
-    Dockerfile 示例如下:
-
-    ```
-    FROM your_baseimage
-    COPY ./bin/proxy /proxy
-    ENTRYPOINT ["/proxy"]
-    ```
-
-    ```
-    FROM your_baseimage
-    COPY --chown=6552:6552 ./bin/operator /operator
-    ENTRYPOINT ["/operator"]
-    ```
-
-    Dockerfile 也可以使用多阶段构建。
-
-5. 
构建容器镜像(os-operator 和 os-proxy 镜像)。
-
-    ```shell
-    # 指定 proxy 的 Dockerfile 路径
-    export DOCKERFILE_PROXY=your_dockerfile_proxy
-    # 指定 operator 的 Dockerfile 路径
-    export DOCKERFILE_OPERATOR=your_dockerfile_operator
-    # 镜像构建
-    docker build -t ${IMG_OPERATOR} -f ${DOCKERFILE_OPERATOR} .
-    docker build -t ${IMG_PROXY} -f ${DOCKERFILE_PROXY} .
-    ```
-
-6. 将容器镜像 push 到镜像仓库。
-
-    ```shell
-    docker push ${IMG_OPERATOR}
-    docker push ${IMG_PROXY}
-    ```
-
-### 制作容器OS虚拟机镜像
-
-#### 注意事项
-
-* 以虚拟机镜像为例,如需进行物理机的镜像制作请见《容器OS镜像制作指导》
-* 制作容器OS 镜像需要使用 root 权限
-* 容器OS 镜像制作工具的 rpm 包源为 openEuler 具体版本的 everything 仓库和 EPOL 仓库。制作镜像时提供的 repo 文件中,yum 源建议同时配置 openEuler 具体版本的 everything 仓库和 EPOL 仓库
-* 使用默认 rpmlist 制作的容器OS虚拟机镜像,默认和制作工具保存在相同路径,该分区至少有 25GiB 的剩余磁盘空间
-* 制作容器 OS 镜像时,不支持用户自定义配置挂载文件
-
-#### 操作步骤
-
-制作容器OS 虚拟机镜像使用 kbimg.sh 脚本,命令详情请见《容器OS镜像制作指导》。
-
-制作容器OS 虚拟机镜像的步骤如下:
-
-1. 进入执行目录:
-
-    ```shell
-    cd /opt/kubeOS/scripts
-    ```
-
-2. 执行 kbimg.sh 制作容器OS,参考命令如下:
-
-    ```shell
-    bash kbimg.sh create vm-image -p xxx.repo -v v1 -b ../bin/os-agent -e '''$1$xyz$RdLyKTL32WEvK3lg8CXID0'''
-    ```
-
-    其中 xxx.repo 为制作镜像所需要的 yum 源,yum 源建议配置为 openEuler 具体版本的 everything 仓库和 EPOL 仓库。
-
-    容器 OS 镜像制作完成后,会在 /opt/kubeOS/scripts 目录下生成:
-
-    - raw 格式的系统镜像 system.img,system.img 大小默认为 20GiB,支持的根文件系统分区大小 < 2020 MiB,持久化分区 < 16GiB。
-    - qcow2 格式的系统镜像 system.qcow2。
-    - 可用于升级的根文件系统分区镜像 update.img。
-
-    制作出来的容器 OS 虚拟机镜像目前只能用于 CPU 架构为 x86 和 AArch64 的虚拟机场景,不支持 x86 架构的虚拟机使用 legacy 启动模式启动。
-
-### 部署CRD,operator和proxy
-
-#### 注意事项
-
-- 请先部署 Kubernetes 集群,部署方法参考《openEuler 22.09 Kubernetes 集群部署指南》
-- 集群中准备进行升级的 Worker 节点的 OS 需要为使用上一节方式制作出来的容器 OS,如不是,请用 system.qcow2 重新部署虚拟机,虚拟机部署请见《openEuler 22.09 虚拟化用户指南》;Master 节点目前不支持容器 OS 升级,请用 openEuler 22.09 部署 Master 节点
-- 部署 OS 的 CRD(CustomResourceDefinition)、os-operator、os-proxy 以及 RBAC(Role-based access control)机制的 YAML 需要用户自行编写
-- operator 和 proxy 部署在 kubernetes 集群中,operator 应部署为 deployment,proxy 应部署为 daemonset
-- 尽量部署好 kubernetes 的安全措施,如 rbac 机制,pod 的 service account 和 security policy 配置等
-
-#### 操作步骤
-
-1. 
准备 YAML 文件,包括用于部署 OS 的 CRD、RBAC 机制、os-operator 和 os-proxy 的 YAML 文件,可参考[yaml-example](https://gitee.com/openeuler/KubeOS/tree/master/docs/example/config)。假设分别为 crd.yaml、rbac.yaml、manager.yaml。
-
-2. 部署 CRD、RBAC、os-operator 和 os-proxy。假设 crd.yaml、rbac.yaml、manager.yaml 文件分别存放在当前目录的 config/crd、config/rbac、config/manager 目录下,参考命令如下:
-
-    ```shell
-    kubectl apply -f config/crd
-    kubectl apply -f config/rbac
-    kubectl apply -f config/manager
-    ```
-
-3. 部署完成后,执行以下命令,确认各个组件是否正常启动。如果所有组件的 STATUS 为 Running,说明组件已经正常启动。
-
-    ```shell
-    kubectl get pods -A
-    ```
-
diff --git "a/docs/zh/docs/KubeOS/\345\256\271\345\231\250OS\351\225\234\345\203\217\345\210\266\344\275\234\346\214\207\345\257\274.md" "b/docs/zh/docs/KubeOS/\345\256\271\345\231\250OS\351\225\234\345\203\217\345\210\266\344\275\234\346\214\207\345\257\274.md"
deleted file mode 100644
index ca4009432d6798f10c2c69aa8b7bf7d3e833ed32..0000000000000000000000000000000000000000
--- "a/docs/zh/docs/KubeOS/\345\256\271\345\231\250OS\351\225\234\345\203\217\345\210\266\344\275\234\346\214\207\345\257\274.md"
+++ /dev/null
@@ -1,162 +0,0 @@
-# 容器OS镜像制作指导 #
-
-## 简介 ##
-
-kbimg 是 KubeOS 部署和升级所需的镜像制作工具,可以使用 kbimg 制作 KubeOS 的 docker 镜像、虚拟机镜像和物理机镜像。
-
-## 命令介绍 ##
-
-### 命令格式 ###
-
-**bash kbimg.sh** \[ --help | -h \] create \[ COMMANDS \] \[ OPTIONS \]
-
-### 参数说明 ###
-
-* COMMANDS
-
-  | 参数          | 描述                                             |
-  | ------------- | ------------------------------------------------ |
-  | upgrade-image | 生成用于安装和升级的 docker 镜像格式的 KubeOS 镜像 |
-  | vm-image      | 生成用于部署和升级的虚拟机镜像                   |
-  | pxe-image     | 生成物理机安装所需的镜像及文件                   |
-
-* OPTIONS
-
-  | 参数      | 描述                                                         |
-  | --------- | ------------------------------------------------------------ |
-  | -p        | repo 文件的路径,repo 文件中配置制作镜像所需要的 yum 源       |
-  | -v        | 制作出来的 KubeOS 镜像的版本                                  |
-  | -b        | os-agent 二进制的路径                                         |
-  | -e        | KubeOS 镜像 root 用户密码,加密后的带盐值的密码,可以用 openssl,kiwi 命令生成 |
-  | -d        | 生成或者使用的 docker 镜像                                    |
-  | -h --help | 查看帮助信息                                                  |
-
-## 使用说明 ##
-
-#### 注意事项 ####
-
-* kbimg.sh 执行需要 root 权限
-* 当前仅支持 x86 和 AArch64 架构使用
-* 容器 OS 镜像制作工具的 rpm 包源为 openEuler 
具体版本的 everything 仓库和 EPOL 仓库。制作镜像时提供的 repo 文件中,yum 源建议同时配置 openEuler 具体版本的 everything 仓库和 EPOL 仓库 - -### KubeOS docker镜像制作 ### - -#### 注意事项 #### - -* 制作的 docker 镜像仅用于后续的虚拟机/物理机镜像制作或升级使用,不支持启动容器 -* 使用默认 rpmlist 进行容器OS镜像制作时所需磁盘空间至少为6G,如自已定义 rpmlist 可能会超过6G - -#### 使用示例 #### -* 如需进行DNS配置,请先在```scripts```目录下自定义```resolv.conf```文件 -```shell - cd /opt/kubeOS/scripts - touch resolv.conf - vim resolv.conf -``` -* 制作KubeOS容器镜像 -``` shell -cd /opt/kubeOS/scripts -bash kbimg.sh create upgrade-image -p xxx.repo -v v1 -b ../bin/os-agent -e '''$1$xyz$RdLyKTL32WEvK3lg8CXID0''' -d your_imageRepository/imageName:version -``` - -* 制作完成后查看制作出来的KubeOS容器镜像 - -``` shell -docker images -``` - -### KubeOS 虚拟机镜像制作 ### - -#### 注意事项 #### - -* 如使用 docker 镜像制作请先拉取相应镜像或者先制作docker镜像,并保证 docker 镜像的安全性 -* 制作出来的容器 OS 虚拟机镜像目前只能用于 CPU 架构为 x86 和 AArch64 的虚拟机 -* 容器 OS 目前不支持 x86 架构的虚拟机使用 legacy 启动模式启动 -* 使用默认rpmlist进行容器OS镜像制作时所需磁盘空间至少为25G,如自已定义rpmlist可能会超过25G - -#### 使用示例 #### - -* 使用repo源制作 - * 如需进行DNS配置,请先在```scripts```目录下自定义```resolv.conf```文件 - ```shell - cd /opt/kubeOS/scripts - touch resolv.conf - vim resolv.conf - ``` - * KubeOS虚拟机镜像制作 - ``` shell - cd /opt/kubeOS/scripts - bash kbimg.sh create vm-image -p xxx.repo -v v1 -b ../bin/os-agent -e '''$1$xyz$RdLyKTL32WEvK3lg8CXID0''' - ``` - -* 使用docker镜像制作 - - ``` shell - cd /opt/kubeOS/scripts - bash kbimg.sh create vm-image -d your_imageRepository/imageName:version - ``` -* 结果说明 - 容器 OS 镜像制作完成后,会在 /opt/kubeOS/scripts 目录下生成: - * system.qcow2: qcow2 格式的系统镜像,大小默认为 20GiB,支持的根文件系统分区大小 < 2020 MiB,持久化分区 < 16GiB 。 - * update.img: 用于升级的根文件系统分区镜像 - - -### KubeOS 物理机安装所需镜像及文件制作 ### - -#### 注意事项 #### - -* 如使用 docker 镜像制作请先拉取相应镜像或者先制作 docker 镜像,并保证 docker 镜像的安全性 -* 制作出来的容器 OS 物理安装所需的镜像目前只能用于 CPU 架构为 x86 和 AArch64 的物理机安装 -* Global.cfg配置中指定的ip为安装时使用的临时ip,请在系统安装启动后请参考《openEuler 22.09 管理员指南-配置网络》进行网络配置 -* 不支持多个磁盘都安装KubeOS,可能会造成启动失败或挂载紊乱 -* 容器OS 目前不支持 x86 架构的物理机使用 legacy 启动模式启动 -* 使用默认rpmlist进行镜像制作时所需磁盘空间至少为5G,如自已定义 rpmlist 可能会超过5G -#### 使用示例 #### - -* 
首先需要修改```00bootup/Global.cfg```的配置,对相关参数进行配置,参数均为必填,ip目前仅支持ipv4,配置示例如下 - - ```shell - # rootfs file name - rootfs_name=kubeos.tar - # select the target disk to install kubeOS - disk=/dev/sda - # pxe server ip address where stores the rootfs on the http server - server_ip=192.168.1.50 - # target machine temporary ip - local_ip=192.168.1.100 - # target machine temporary route - route_ip=192.168.1.1 - # target machine temporary netmask - netmask=255.255.255.0 - # target machine netDevice name - net_name=eth0 - ``` - -* 使用 repo 源制作 - * 如需进行DNS配置,请在```scripts```目录下自定义```resolv.conf```文件 - ```shell - cd /opt/kubeOS/scripts - touch resolv.conf - vim resolv.conf - ``` - * KubeOS物理机安装所需镜像制作 - ``` - cd /opt/kubeOS/scripts - bash kbimg.sh create pxe-image -p xxx.repo -v v1 -b ../bin/os-agent -e '''$1$xyz$RdLyKTL32WEvK3lg8CXID0''' - ``` - -* 使用 docker 镜像制作 - ``` shell - cd /opt/kubeOS/scripts - bash kbimg.sh create pxe-image -d your_imageRepository/imageName:version - ``` - -* 结果说明 - - * initramfs.img: 用于pxe启动用的 initramfs 镜像 - * kubeos.tar: pxe安装所用的 OS - diff --git "a/docs/zh/docs/KubeOS/\350\256\244\350\257\206\345\256\271\345\231\250OS\345\215\207\347\272\247.md" "b/docs/zh/docs/KubeOS/\350\256\244\350\257\206\345\256\271\345\231\250OS\345\215\207\347\272\247.md" deleted file mode 100644 index c12842050ee3aaf0df34477d27ed5c7e0d2e7200..0000000000000000000000000000000000000000 --- "a/docs/zh/docs/KubeOS/\350\256\244\350\257\206\345\256\271\345\231\250OS\345\215\207\347\272\247.md" +++ /dev/null @@ -1,42 +0,0 @@ -# 认识容器 OS 升级 - -## 概述 - -在云场景中,容器和 kubernetes 的应用越来越广泛。然而,当前对容器和 OS 进行独立管理的方式,往往面临功能冗余、两套调度系统协同困难的问题。另外,OS 的版本管理比较困难,相同版本的 OS 在使用过程中会各自安装、更新、删除软件包,一段时间后 OS 版本变得不一致,导致版本分裂,并且 OS 可能和业务紧耦合,造成大版本升级等比较困难。为了应对上述问题,openEuler 推出了基于openEuler的容器 OS 升级工具。 - -容器 OS 针对业务以容器的形式运行的场景,专门设计的一种轻量级操作系统。基于openEuler的容器 OS 升级工具将容器 OS 作为组件接入 kubernetes,使容器 OS 和业务处于同等地位,通过 kubernetes 集群统一管理容器和容器 OS,实现一套系统管理容器和OS。 - -openEuler 容器 OS 升级工具通过 kubernetes operator 扩展机制控制容器 OS 的升级流程,对容器 OS 
进行整体升级,从而实现 OS 管理器和业务协同,该升级方式会在容器 OS 升级前,将业务迁移到其他非升级节点,减少 OS 升级、配置过程中对业务的影响。该升级方式是对容器 OS 进行原子升级,使 OS 一直向预想的状态同步,保证集群里的 OS 版本一致,避免版本分裂问题。 - -## 架构介绍 - -### 容器 OS 升级架构 - -**图1** 容器 OS 升级架构 - -![](./figures/容器OS架构.png) - -如图所示,容器 OS 主要包含三个组件 os-operator,os-proxy 和 os-agent 。os-operator 和 os-proxy 运行在容器中,部署在 kubernetes 集群内;os-agent 不属于集群,直接作为进程运行在 Worker Node 中。 - -- os-operator:全局的容器 OS 管理器,持续查看所有节点的容器 OS 版本信息,并根据用户配置的信息控制同时进行升级的节点个数,并标记准备升级的节点。 - -- os-proxy:单节点的 OS 管理器,持续查看当前节点的容器 OS 版本信息。如果当前节点被 os-operator 标记为准备升级的节点后,锁定节点并驱逐 pod,转发升级信息到 os-agent 。 - -- os-agent:接收来自 proxy 的信息,从 OSImage Server 下载用于更新的容器 OS 镜像,然后进行升级并重启节点。 - - -### 容器 OS 文件系统 - -**图 2** 容器 OS 文件系统布局 - -![](./figures/容器OS文件布局.png) - - - -如图所示,容器 OS 包含四个分区: - -- boot 分区:grub2文件分区 -- Persist 分区:用于存放持久性用户数据,容器 OS 升级时,该分区的数据也会保留 -- 两个 root 分区:容器 OS 采用双分区模式,将 root 分区划分为 rootA 和 rootB。假定初始化时,系统运行在 rootA 分区上,当进行系统更新时,会下载新系统到 rootB 分区,grub会有两个启动项分别为A,B,将 grub 默认启动项设置为B,最后会重启虚拟机。虚拟机启动后容器 OS 将运行在刚更新过的 rootB 分区上。 - -容器OS的root文件系统为只读,用户的持久化数据存放在Persist持久化数据分区 。 \ No newline at end of file diff --git a/docs/zh/docs/Kubernetes/Kubernetes.md b/docs/zh/docs/Kubernetes/Kubernetes.md deleted file mode 100644 index 0ac8aaeb8010db54b03c7d1551bcefad601fc788..0000000000000000000000000000000000000000 --- a/docs/zh/docs/Kubernetes/Kubernetes.md +++ /dev/null @@ -1,13 +0,0 @@ -# Kubernetes 集群部署指南 - -本文档介绍在 openEuler 操作系统上,通过二进制部署 K8S 集群的一个参考方法。 - -说明:本文所有操作均使用 `root`权限执行。 - -## 集群状态 - -本文所使用的集群状态如下: - -- 集群结构:6 个 `openEuler 21.09`系统的虚拟机,3 个 master 和 3 个 node 节点 -- 物理机:`openEuler 21.09 `的 `x86/ARM`服务器 - diff --git "a/docs/zh/docs/Kubernetes/eggo\345\267\245\345\205\267\344\273\213\347\273\215.md" "b/docs/zh/docs/Kubernetes/eggo\345\267\245\345\205\267\344\273\213\347\273\215.md" deleted file mode 100644 index e81d4f591b03a80da4f5c7fc9daa58e5d2166823..0000000000000000000000000000000000000000 --- "a/docs/zh/docs/Kubernetes/eggo\345\267\245\345\205\267\344\273\213\347\273\215.md" +++ /dev/null @@ -1,433 +0,0 @@ -# 工具介绍 - 
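本章后续的"配置项介绍"将逐项说明部署用 YAML 配置的字段。作为补充示例(示意代码,并非 eggo 自带功能,字段名取自下文的配置项说明,节点 IP 与节点名均为假设值),下面的 Python 片段演示如何按这些字段拼出一个最小的配置骨架:

```python
def render_minimal_config(cluster_id, username, masters, workers):
    """按"配置项介绍"中的字段名拼接一个最小的集群配置骨架(示意)。"""
    def node_block(ips):
        # 节点名 node-i 为示意命名,实际部署时应填写真实的节点名称
        lines = []
        for i, ip in enumerate(ips):
            lines += [f"- name: node-{i}",
                      f"  ip: {ip}",
                      "  port: 22",
                      "  arch: amd64"]
        return lines

    out = [f"cluster-id: {cluster_id}", f"username: {username}", "masters:"]
    out += node_block(masters)
    out.append("workers:")
    out += node_block(workers)
    return "\n".join(out)
```

实际使用时,更推荐直接用 `eggo template` 生成模板后按需修改;该片段仅用于说明配置的层次结构。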
-本章介绍自动化部署工具的相关内容,建议用户在部署前阅读。 - -## 部署方式 - -openEuler 提供的 Kubernetes 集群自动化部署工具使用命令行方式进行集群的一键部署。它提供了如下几种部署方式: - -- 离线部署:本地准备好所有需要用到的 RPM 软件包、二进制文件、插件、容器镜像,并将它们按照一定的格式打包成一个 tar.gz 文件,然后完成对应 YAML 配置文件的编写,即可执行命令实现一键部署。当虚拟机无法访问外部网络时,可以采用该部署方式。 -- 在线部署:只需要完成对应 YAML 配置文件的编写,所需的RPM 软件包、二进制文件、插件、容器镜像,都在安装部署阶段连接互联网自动下载。该方式需要虚拟机能够访问软件源、集群依赖的镜像仓库,例如 Docker Hub 。 - -## 配置介绍 - -使用工具自动化部署 Kubernetes 集群时,使用 YAML 配置文件描述集群部署的信息,此处介绍各配置项含义以及配置示例。 - -### 配置项介绍 - -- cluster-id:集群名称,请遵循 DNS 域名的命名规范。例如 k8s-cluster - -- username:需要部署 k8s 集群的机器的 ssh 登录用户名,所有机器都需要使用同一个用户名。 - -- private-key-path:ssh 免密登录的秘钥存储文件的路径。private-key-path 和 password 只需要配置其中一项,如果两者都进行了配置,优先使用 private-key-path - -- masters:master 节点列表,建议每个 master 节点同时作为 worker 节点。每个 master 节点包含如下配置子项,多个 master 节点配置多组子项内容: - - name:master 节点名称,为 k8s 集群看到的该节点名称 - - ip:master 节点的 IP 地址 - - port:ssh 登录该节点的端口,默认为 22 - - arch:master 节点的 CPU 架构,例如 x86_64 取值为 amd64 - -- workers:worker 节点列表。每个 worker 节点包含如下配置子项,多个 worker 节点配置多个子项内容: - - name:worker 节点名称,为 k8s 集群看到的该节点名称 - - ip:worker 节点的 IP 地址 - - port:ssh 登录该节点的端口,默认为 22 - - arch:worker 节点的 CPU 架构,例如 x86_64 取值为 amd64 - -- etcds:etcd 节点的列表。如果该项为空,则会为每个 master 节点部署一个 etcd,否则只会部署配置的 etcd 节点。每个 etcd 节点包含如下配置子项,多个 etcd 节点配置多组子项内容: - - name:etcd 节点的名称,为 k8s 集群看到的该节点的名称 - - ip:etcd 节点的 IP 地址 - - port:ssh 登录的端口 - - arch:etcd 节点的 CPU 架构,例如 x86_64 取值为 amd64 - -- loadbalance:loadbalance 节点列表。每个 loadbalance 节点包含如下配置子项,多个 loadbalance 节点配置多组子项内容: - - name:loadbalance 节点的名称,为 k8s 集群看到的该节点的名称 - - ip:loadbalance 节点的 IP 地址 - - port:ssh 登录的端口 - - arch:loadbalance 节点的 CPU 架构,例如 x86_64 取值为 amd64 - - bind-port:负载均衡服务的侦听端口 - -- external-ca:是否使用外部 CA 证书,使用则配置为 true,反之,配置为 false - -- external-ca-path:外部 CA 证书文件的路径 。仅 external-ca 为 true 时有效 - -- service:k8s 创建的 service 信息。service 配置包含如下配置子项: - - cidr:k8s 创建的 service 的 IP 地址网段 - - dnsaddr:k8s 创建的 service 的 DNS 地址 - - gateway:k8s创建的 service 的网关地址 - - dns:k8s 创建的 coredns 的配置。dns 配置包含如下配置子项: - - corednstype:k8s 创建的 coredns 的部署类型,支持 pod 和 binary - - imageversion:pod 部署类型的 
coredns 镜像版本 - - replicas:pod 部署类型的 coredns 副本数量 - -- network:k8s 集群网络配置。network 配置包含如下配置子项: - - podcidr:k8s 集群网络的 IP 地址网段 - - plugin:k8s 集群部署的网络插件 - - plugin-args:k8s 集群网络的网络插件的配置文件路径。例如 : {"NetworkYamlPath": "/etc/kubernetes/addons/calico.yaml"} - -- apiserver-endpoint:进群外部可访问的 APISERVER 服务的地址或域名,如果配置了 loadbalances 则填loadbalance 地址,否则填写第 1 个 master 节点地址。 - -- apiserver-cert-sans:apiserver 相关证书中需要额外配置的 IP 和域名。它包含如下子配置项 - - dnsnames:apiserver 相关证书中需要额外配置的域名数组列表。 - - ips:apiserver 相关证书中需要额外配置的 IP 地址数组列表。 - -- apiserver-timeout:apiserver 响应超时时间 - -- etcd-token:etcd 集群名称 - -- dns-vip:dns 的虚拟 IP 地址 - -- dns-domain:DNS 域名后缀 - -- pause-image:pause 容器的完整镜像名称 - -- network-plugin:网络插件类型。仅支持配置 cni ,配置为空时使用 k8s 默认网络。 - -- cni-bin-dir:网络插件地址,多个地址使用 "," 分隔,例如:/usr/libexec/cni,/opt/cni/bin - -- runtime:指定容器运行时类型,目前支持 docker 和 iSulad - -- runtime-endpoint:容器运行时 endpoint,当 runtime 为 docker 时,可以不指定 - -- registry-mirrors:下载容器镜像时,使用的镜像仓库的 mirror 站点地址 - -- insecure-registries:下载容器镜像时,使用 http 协议下载镜像的镜像仓库地址 - -- config-extra-args:各个组件(例如 kube-apiserver、etcd)服务启动配置的额外参数。它包含如下子配置项: - - name:组件名称,支持 etcd、kube-apiserver、kube-controller-manager、kube-scheduler、kube-proxy、kubelet - - - extra-args:组件的拓展参数,格式为 key: value 格式,注意 key 对应的组件参数前需要加上 "-" 或者 "--" 。 - - - open-ports:配置需要额外打开的端口,k8s 自身所需端口不需要进行配置,k8s 以外的插件端口需要进行额外配置。 - - worker | master | etcd | loadbalance:指定打开端口的节点类型,每项配置包含一个多或者多个 port 和 protocol 子配置项。 - - port:端口地址 - - protocol:端口类型,可选值为 tcp 或者 udp - - - install:配置各种类型节点上需要安装的安装包或者二进制文件的详细信息,注意将对应文件放到在 tar.gz 安装包中。以下给全量配置说明,具体配置请根据实际情况选择。 - - package-source:配置安装包的详细信息 - - type:安装包的压缩类型,目前只支持 tar.gz 类型的安装包 - - dstpath:安装包在对端机器上的路径,必须是可用的绝对路径 - - srcpath:不同架构安装包的存放路径,架构必须与机器架构相对应,必须是可用的绝对路径 - - arm64:arm64 架构安装包的路径,配置的机器中存在 arm64 机器场景下需要配置 - - amd64:amd64 类型安装包的路径,配置的机器中存在 x86_64 机器场景下需要配置 - - > ![](./public_sys-resources/icon-note.gif)**说明**: - > - > - install 配置中 etcd、kubernetes-master、kubernetes-worker、network、loadbalance、container、image、dns 中的子配置项相同,都是 name、type、dst,schedule、TimeOut 
。其中 dst,schedule、TimeOut 为可选项,用户根据安装的文件决定是否配置。下述仅以 etcd 和 kubernetes-master 节点的配置为例说明。 - - - etcd:etcd 类型节点需要安装的包或二进制文件列表 - - name:需要安装的软件包或二进制文件的名称,如果是安装包则只写名称,不填写具体的版本号,安装时会使用 `$name*` 识别,例如 etcd 。如果为多个软件包,各名称使用 ,分隔 。 - - type:配置项类型,可选值为 pkg、repo、bin、file、dir、image、yaml、shell 。如果配置为 repo ,请在对应节点上配置 repo 源 - - dst:目的文件夹路径,type 为 bin、file、dir 类型时需要配置。表示将文件/文件夹放到节点的哪个目录下,为了防止用户误配置路径,导致 cleanup 时删除重要文件,此配置必须配置为白名单中的路径。详见 “白名单说明” - - kubernetes-master:k8s master 类型节点需要安装的包或二进制文件列表 - - kubernetes-worker:k8s worker 类型节点需要安装的包或二进制文件列表 - - network:网络需要安装的包或二进制文件列表 - - loadbalance:loadbalance 类型节点需要安装的包或二进制文件列表 - - container:容器需要安装的包或二进制文件列表 - - image:容器镜像 tar 包 - - dns:k8s coredns 安装包。如果 corednstype 配置为 pod,此处无需配置 - - addition:额外的安装包或二进制文件列表 - - master:以下配置会安装在所有 master 节点 - - name:需要安装的软件包包或二进制文件的名称 - - type:配置项类型,可选值为 pkg、repo、bin、file、dir、image、yaml、shell 。如果配置为 repo ,请在对应节点上配置 repo 源 - - schedule:仅在 type 为 shell 时有效,代表用户想要执行脚本的时机,支持 prejoin(节点加入前)、postjoin(节点加入后)、precleanup(节点退出前)、postcleanup(节点退出后)。 - - TimeOut:脚本执行超时时间,超时时该进程被强制终止运行。未配置默认为 30s - - worker:配置会安装在所有 worker 节点,具体配置格式和 addition 下的 master 相同 - -### 白名单介绍 - -install 配置中 dst 项的值必须符合白名单规则,配置为白名单对应路径及其子目录。当前白名单如下: - -- /usr/bin -- /usr/local/bin -- /opt/cni/bin -- /usr/libexec/cni -- /etc/kubernetes -- /usr/lib/systemd/system -- /etc/systemd/system -- /tmp - -### 配置示例 - -此处给出一个 YAML 文件配置示例。从示例可知,同一台机器,可以部署多个类型的节点,但是不同节点的配置必须一致,例如 test0 机器部署了 master 和 worker 类型。 - -```yaml -cluster-id: k8s-cluster -username: root -private-key-path: /root/.ssh/private.key -masters: -- name: test0 - ip: 192.168.0.1 - port: 22 - arch: arm64 -workers: -- name: test0 - ip: 192.168.0.1 - port: 22 - arch: arm64 -- name: test1 - ip: 192.168.0.3 - port: 22 - arch: arm64 -etcds: -- name: etcd-0 - ip: 192.168.0.4 - port: 22 - arch: amd64 -loadbalance: - name: k8s-loadbalance - ip: 192.168.0.5 - port: 22 - arch: amd64 - bind-port: 8443 -external-ca: false -external-ca-path: /opt/externalca -service: - cidr: 10.32.0.0/16 - dnsaddr: 
10.32.0.10 - gateway: 10.32.0.1 - dns: - corednstype: pod - imageversion: 1.8.4 - replicas: 2 -network: - podcidr: 10.244.0.0/16 - plugin: calico - plugin-args: {"NetworkYamlPath": "/etc/kubernetes/addons/calico.yaml"} -apiserver-endpoint: 192.168.122.222:6443 -apiserver-cert-sans: - dnsnames: [] - ips: [] -apiserver-timeout: 120s -etcd-external: false -etcd-token: etcd-cluster -dns-vip: 10.32.0.10 -dns-domain: cluster.local -pause-image: k8s.gcr.io/pause:3.2 -network-plugin: cni -cni-bin-dir: /usr/libexec/cni,/opt/cni/bin -runtime: docker -runtime-endpoint: unix:///var/run/docker.sock -registry-mirrors: [] -insecure-registries: [] -config-extra-args: - - name: kubelet - extra-args: - "--cgroup-driver": systemd -open-ports: - worker: - - port: 111 - protocol: tcp - - port: 179 - protocol: tcp -install: - package-source: - type: tar.gz - dstpath: "" - srcpath: - arm64: /root/rpms/packages-arm64.tar.gz - amd64: /root/rpms/packages-x86.tar.gz - etcd: - - name: etcd - type: pkg - dst: "" - kubernetes-master: - - name: kubernetes-client,kubernetes-master - type: pkg - kubernetes-worker: - - name: docker-engine,kubernetes-client,kubernetes-node,kubernetes-kubelet - type: pkg - dst: "" - - name: conntrack-tools,socat - type: pkg - dst: "" - network: - - name: containernetworking-plugins - type: pkg - dst: "" - loadbalance: - - name: gd,gperftools-libs,libunwind,libwebp,libxslt - type: pkg - dst: "" - - name: nginx,nginx-all-modules,nginx-filesystem,nginx-mod-http-image-filter,nginx-mod-http-perl,nginx-mod-http-xslt-filter,nginx-mod-mail,nginx-mod-stream - type: pkg - dst: "" - container: - - name: emacs-filesystem,gflags,gpm-libs,re2,rsync,vim-filesystem,vim-common,vim-enhanced,zlib-devel - type: pkg - dst: "" - - name: libwebsockets,protobuf,protobuf-devel,grpc,libcgroup - type: pkg - dst: "" - - name: yajl,lxc,lxc-libs,lcr,clibcni,iSulad - type: pkg - dst: "" - image: - - name: pause.tar - type: image - dst: "" - dns: - - name: coredns - type: pkg - dst: "" - addition: 
- master: - - name: prejoin.sh - type: shell - schedule: "prejoin" - TimeOut: "30s" - - name: calico.yaml - type: yaml - dst: "" - worker: - - name: docker.service - type: file - dst: /usr/lib/systemd/system/ - - name: postjoin.sh - type: shell - schedule: "postjoin" -``` - -### 安装包结构 - -如果是离线部署,需要准备 Kubernetes 以及相关的离线安装包,并遵循特定目录结构存放离线安装包。需要遵循的目录结构如下: - -```shell -package -├── bin -├── dir -├── file -├── image -├── pkg -└── packages_notes.md -``` - -上述各目录的含义如下: - -- 离线部署包的目录结构与集群配置 config 中的 package 的类型对应,package 类型有 pkg、repo、bin、file、dir、image、yaml、shell 八种。 - -- bin 目录存放二进制文件,对应 package 类型 bin 。 - -- dir 目录存放需要拷贝到目标机器的目录,需要配置 dst 目的地路径,对应 package 类型 dir 。 - -- file 目录存放 file、yaml、shell 三种类型的文件。其中 file 类型代表需要拷贝到目标机器的文件,同时需要配置 dst 目的地路径;yaml 类型代表用户自定义的 YAML 文件,会在集群部署完成后 apply 该 YAML 文件;shell 类型代表用户想要执行的脚本,同时需要配置 schedule 执行时机,执行时机包括 prejoin(节点加入前)、postjoin(节点加入后)、precleanup(节点退出前)、postcleanup(节点退出后)四个阶段。 - -- image 目录存放需要导入的容器镜像。这些容器镜像必须兼容 docker 的 tar 包格式(例如由 docker 或 isula-build 导出镜像)。 - -- pkg 目录下存放需要安装的 rpm/deb 包,对应 package 类型 pkg 。建议使用二进制文件,便于跨发行版本的部署。 - -### 命令参考 - -openEuler 提供的集群部署工具,使用命令行 eggo 进行集群部署。 - -#### 部署 k8s 集群 - -通过指定的 YAML 配置部署 k8s 集群: - -**eggo deploy** [ **-d** ] **-f** *deploy.yaml* - -| 参数 | 是否必选 | 参数含义 | -| ------------- | -------- | --------------------------------- | -| --debug \| -d | 否 | 打印调试信息 | -| --file \| -f | 是 | 指定部署 k8s 集群的 YAML 文件路径 | - -#### 加入单节点 - -将指定的单节点加入到 k8s 集群中: - -**eggo** **join** [ **-d** ] **--id** *k8s-cluster* [ **--type** *master,worker* ] **--arch** *arm64* **--port** *22* [ **--name** *master1*] *IP* - -| 参数 | 是否必选 | 参数含义 | -| ------------- | -------- | ------------------------------------------------------------ | -| --debug \| -d | 否 | 打印调试信息 | -| --id | 是 | 指定将要加入 k8s 集群名称 | -| --type \| -t | 否 | 指定加入节点的类型,支持 master、worker 。多个类型使用 “,” 隔开,默认值为 worker 。 | -| --arch \| -a | 是 | 指定加入节点的 CPU 架构 | -| --port \| -p | 是 | 指定 ssh 登录所加入节点的端口号 | -| --name \| -n | 否 | 指定加入节点的名称 | -| *IP* | 是 | 加入节点的实际 IP 地址 | - -#### 
加入多节点 - -将指定的多个节点加入到 k8s 集群: - -**eggo** **join** [ **-d** ] **--id** *k8s-cluster* **-f** *nodes.yaml* - -| 参数 | 是否必选 | 参数含义 | -| ------------- | -------- | -------------------------------- | -| --debug \| -d | 否 | 打印调试信息 | -| --id | 是 | 指定将要加入 k8s 集群名称 | -| --file \| -f | 是 | 指定加入节点的 YAML 配置文件路径 | - -#### 删除节点 - -删除 k8s 集群中的一个或者多个节点: - -**eggo delete** [ **-d** ] **--id** *k8s-cluster* *node* [*node...*] - -| 参数 | 是否必选 | 参数含义 | -| ------------- | -------- | -------------------------------------------- | -| --debug \| -d | 否 | 打印调试信息 | -| --id | 是 | 指定将要删除的节点所在的集群名称 | -| *node* | 是 | 要删除的单个或多个节点的 IP 地址或者节点名称 | - -#### 删除集群 - -删除整个 k8s 集群: - -**eggo cleanup** [ **-d** ] **--id** *k8s-cluster* [ **-f** *deploy.yaml* ] - -| 参数 | 是否必选 | 参数含义 | -| ------------- | -------- | ------------------------------------------------------------ | -| --debug \| -d | 否 | 打印调试信息 | -| --id | 是 | 指定将要清除的 k8s 集群名称 | -| --file \| -f | 否 | 指定清除 k8s 集群的 YAML 文件路径。不指定时,默认使用部署集群时缓存的集群配置。正常情况下,建议不配置该选项,仅异常情况下配置。 | - -> ![](./public_sys-resources/icon-note.gif)**说明** -> -> - 建议使用部署集群时缓存的集群配置删除集群,即正常情况下,不建议配置 --file | -f 参数。当异常导致缓存配置破坏或者丢失时,才配置该参数。 - - - -#### 查询集群 - -查询当前所有通过 eggo 部署的 k8s 集群: - -**eggo list** [ **-d** ] - -| 参数 | 是否必选 | 参数含义 | -| ------------- | -------- | ------------ | -| --debug \| -d | 否 | 打印调试信息 | - -#### 生成集群配置文件 - -快速生成部署 k8s 集群所需的 YAML 配置文件: - -**eggo template** **-d** **-f** *template.yaml* **-n** *k8s-cluster* **-u** *username* **-p** *password* **--etcd** [*192.168.0.1,192.168.0.2*] **--masters** [*192.168.0.1,192.168.0.2*] **--workers** *192.168.0.3* **--loadbalance** *192.168.0.4* - -| 参数 | 是否必选 | 参数含义 | -| ------------------- | -------- | ------------------------------- | -| --debug \| -d | 否 | 打印调试信息 | -| --file \| -f | 否 | 指定生成的 YAML 文件的路径 | -| --name \| -n | 否 | 指定 k8s 集群的名称 | -| --username \| -u | 否 | 指定 ssh 登录所配置节点的用户名 | -| --password \| -p | 否 | 指定 ssh 登录所配置节点的密码 | -| --etcd | 否 | 指定 etcd 节点的 IP 列表 | -| --masters | 否 | 指定 master 节点的 IP 列表 | -| --workers | 
否 | 指定 worker 节点的 IP 列表 | -| --loadbalance \| -l | 否 | 指定 loadbalance 节点的 IP | - -#### 查询帮助信息 - -查询 eggo 命令的帮助信息: - - **eggo help** - -#### 查询子命令帮助信息 - -查询 eggo 子命令的帮助信息: - -**eggo deploy | join | delete | cleanup | list | template -h** - -| 参数 | 是否必选 | 参数含义 | -| ----------- | -------- | ------------ | -| --help\| -h | 是 | 打印帮助信息 | \ No newline at end of file diff --git "a/docs/zh/docs/Kubernetes/eggo\346\213\206\351\231\244\351\233\206\347\276\244.md" "b/docs/zh/docs/Kubernetes/eggo\346\213\206\351\231\244\351\233\206\347\276\244.md" deleted file mode 100644 index edc8e8aa203a4d5dc20d784f5db9e5910af570fe..0000000000000000000000000000000000000000 --- "a/docs/zh/docs/Kubernetes/eggo\346\213\206\351\231\244\351\233\206\347\276\244.md" +++ /dev/null @@ -1,27 +0,0 @@ -# 拆除集群 - -当业务需求下降,不需要原有数量的节点时,可以通过删除集群中的节点,节省系统资源,从而降低成本。当业务不需要集群时,也可以直接删除整个集群。 - -## 删除节点 - -可以使用命令行删除集群中的节点。例如,删除 k8s-cluster 集群中 IP 地址为 *192.168.0.5* 和 *192.168.0.6* 所有节点类型,参考命令如下: - -```shell -$ eggo -d delete --id k8s-cluster 192.168.0.5 192.168.0.6 -``` - -## 删除整个集群 - -> ![](./public_sys-resources/icon-note.gif)**说明** -> -> - 删除集群会删除整个集群的数据,且无法恢复,请谨慎操作。 -> - 当前,拆除集群不会清理容器和容器镜像,但若部署 Kubernetes 集群时,配置了需要安装容器引擎,则会清除容器引擎,这可能导致容器运行异常。 -> - 拆除集群过程中可能会打印一些错误信息,一般是由于清理过程中操作集群时反馈了错误的结果导致,集群仍然能够正常拆除 -> - -可以使用命令行方式删除整个集群。例如,删除 k8s-cluster 集群的参考命令如下: - -```shell -$ eggo -d cleanup --id k8s-cluster -``` - diff --git "a/docs/zh/docs/Kubernetes/eggo\350\207\252\345\212\250\345\214\226\351\203\250\347\275\262.md" "b/docs/zh/docs/Kubernetes/eggo\350\207\252\345\212\250\345\214\226\351\203\250\347\275\262.md" deleted file mode 100644 index 1b13cc16ddb99ccd5d46000287100c1adfe95760..0000000000000000000000000000000000000000 --- "a/docs/zh/docs/Kubernetes/eggo\350\207\252\345\212\250\345\214\226\351\203\250\347\275\262.md" +++ /dev/null @@ -1,23 +0,0 @@ -# 自动化部署 - -由于手动部署 Kubernetes 集群依赖人工部署各类组件,该方式耗时耗力。尤其是在大规模部署 Kubernetes 集群环境时,面临效率和出错的问题。为了解决该问题,openEuler 自 21.09 版本推出 Kubernetes 集群部署工具,该工具实现了大规模 Kubernetes 
的自动化部署、部署流程追踪等功能,并且具备高度的灵活性。 - -这里介绍 Kubernetes 集群自动化部署工具的使用方法。 - -## 架构简介 - - - -![](./figures/arch.png) - -自动化集群部署整体架构如图所示,各模块含义如下: - -- GitOps:负责集群配置信息的管理,如更新、创建、删除等; 21.09 版本暂时不提供集群管理集群的功能。 -- InitCluster:元集群,作为中心集群管理其他业务集群。 -- eggops:自定义 CRD 和 controller 用于抽象 k8s 集群。 -- master:k8s 的 master 节点,承载集群的控制面。 -- worker:k8s 的负载节点,承载用户业务。 -- ClusterA、ClusterB、ClusterC:业务集群,承载用户业务。 - -如果您对openEuler提供的k8s集群部署工具感兴趣,欢迎访问源码仓:[https://gitee.com/openeuler/eggo](https://gitee.com/openeuler/eggo) - diff --git "a/docs/zh/docs/Kubernetes/eggo\351\203\250\347\275\262\351\233\206\347\276\244.md" "b/docs/zh/docs/Kubernetes/eggo\351\203\250\347\275\262\351\233\206\347\276\244.md" deleted file mode 100644 index 23d370bccc6712ab3eba610c360f69572bb19bc4..0000000000000000000000000000000000000000 --- "a/docs/zh/docs/Kubernetes/eggo\351\203\250\347\275\262\351\233\206\347\276\244.md" +++ /dev/null @@ -1,258 +0,0 @@ -# 部署集群 - -本小节介绍如何部署 Kubernetes 集群。 - -## 环境准备 - -openEuler 提供的 Kubernetes 集群自动化部署工具: - -- 支持在多种常见 Linux 发行版(例如 openEuler、CentOS、Ubuntu)上部署 Kubernetes 集群。 -- 支持在不同 CPU 架构(例如 AMD64 和 ARM64)上混合部署。 - -### 前提条件 - -使用 Kubernetes 集群自动化部署工具,需要满足如下要求: - -- 部署集群需要使用 root 权限 -- 待部署 Kubernetes 的机器已经配置好机器名称 hostname ,并且已安装 tar 命令,确保能够使用 tar 命令解压 tar.gz 格式的压缩包。 -- 待部署 Kubernetes 的机器已经配置 ssh ,确保能够远程访问。如果是普通用户 ssh 登录,需要确保该用户有免密执行 sudo 的权限。 - -## 准备安装包 - -如果是离线安装,请根据集群的架构,准备对应架构的依赖包(ETCD 相关软件包、容器引擎相关软件包、Kubernetes 集群组件软件包、网络相关的软件包、coredns 软件包、依赖的容器镜像等)。 - -假设网络插件为 calico、集群中所有机器的架构为 ARM64,准备安装包的步骤如下: - -1. 下载依赖的软件包和 calico.yaml 。 - -2. 导出容器镜像。 - - ```shell - $ docker save -o images.tar calico/node:v3.19.1 calico/cni:v3.19.1 calico/kube-controllers:v3.19.1 calico/pod2daemon-flexvol:v3.19.1 k8s.gcr.io/pause:3.2 - ``` - -3. 
按照规定的目录存放下载的安装包、文件和镜像(具体存放格式请参见 “准备环境”)。例如: - - ```shell - $ tree package - package - ├── bin - │ ├── bandwidth - │ ├── bridge - │ ├── conntrack - │ ├── containerd - │ ├── containerd-shim - │ ├── coredns - │ ├── ctr - │ ├── dhcp - │ ├── docker - │ ├── dockerd - │ ├── docker-init - │ ├── docker-proxy - │ ├── etcd - │ ├── etcdctl - │ ├── firewall - │ ├── flannel - │ ├── host-device - │ ├── host-local - │ ├── ipvlan - │ ├── kube-apiserver - │ ├── kube-controller-manager - │ ├── kubectl - │ ├── kubelet - │ ├── kube-proxy - │ ├── kube-scheduler - │ ├── loopback - │ ├── macvlan - │ ├── portmap - │ ├── ptp - │ ├── runc - │ ├── sbr - │ ├── socat - │ ├── static - │ ├── tuning - │ ├── vlan - │ └── vrf - ├── file - │ ├── calico.yaml - │ └── docker.service - ├── image - │ └── images.tar - └── packages_notes.md - ``` - -4. 编写 packages_notes.md,声明软件包来源,便于用户查看。 - - ```shell - 1. ETCD - - etcd,etcdctl - - 架构:arm64 - - 版本:3.5.0 - - 地址:https://github.com/etcd-io/etcd/releases/download/v3.5.0/etcd-v3.5.0-linux-arm64.tar.gz - - 2. Docker Engine - - containerd,containerd-shim,ctr,docker,dockerd,docker-init,docker-proxy,runc - - 架构:arm64 - - 版本:19.03.0 - - 地址:https://download.docker.com/linux/static/stable/aarch64/docker-19.03.0.tgz - - 3. Kubernetes - - kube-apiserver,kube-controller-manager,kube-scheduler,kubectl,kubelet,kube-proy - - 架构:arm64 - - 版本:1.21.3 - - 地址:https://www.downloadkubernetes.com/ - - 4. network - - bandwidth,dhcp,flannel,host-local,loopback,portmap,sbr,tuning,vrf,bridge,firewall,host-device,ipvlan,macvlan,ptp,static,vlan - - 架构:arm64 - - 版本:0.9.1 - - 地址:https://github.com/containernetworking/plugins/releases/download/v0.9.1/cni-plugins-linux-arm64-v0.9.1.tgz - - 5. coredns - - coredns - - 架构:arm64 - - 版本:1.8.4 - - 地址:https://github.com/coredns/coredns/releases/download/v1.8.4/coredns_1.8.4_linux_arm64.tgz - - 6. 
images.tar - - calico/node:v3.19.1 calico/cni:v3.19.1 calico/kube-controllers:v3.19.1 calico/pod2daemon-flexvol:v3.19.1 k8s.gcr.io/pause:3.2 - - 架构:arm64 - - 版本:NA - - 地址:NA - 7. calico.yaml - - 架构:NA - - 版本:v3.19.1 - - 地址:https://docs.projectcalico.org/manifests/calico.yaml - ``` - -5. 进入 package 目录,将下载的软件包打包成 packages-arm64.tar.gz - - ```shell - $ tar -zcf packages-arm64.tar.gz * - ``` - -6. 查看压缩包,确认打包成功。 - - ```shell - $ tar -tvf package/packages-arm64.tar.gz - drwxr-xr-x root/root 0 2021-07-29 10:37 bin/ - -rwxr-xr-x root/root 3636214 2021-02-05 23:43 bin/sbr - -rwxr-xr-x root/root 40108032 2021-07-28 16:40 bin/kube-proxy - -rwxr-xr-x root/root 4186218 2021-02-05 23:43 bin/vlan - -rwxr-xr-x root/root 3076118 2021-02-05 23:43 bin/static - -rwxr-xr-x root/root 3496425 2021-02-05 23:43 bin/host-local - -rwxr-xr-x root/root 3847814 2021-02-05 23:43 bin/portmap - -rwxr-xr-x root/root 9681959 2021-02-05 23:43 bin/dhcp - -rwxr-xr-x root/root 4054640 2021-02-05 23:43 bin/host-device - -rwxr-xr-x root/root 43909120 2021-07-28 16:41 bin/kube-scheduler - -rwxr-xr-x root/root 32831616 2019-07-18 02:27 bin/containerd - -rwxr-xr-x root/root 3284795 2021-02-05 23:43 bin/flannel - -rwxr-xr-x root/root 21757952 2021-06-16 05:52 bin/etcd - -rwxr-xr-x root/root 546520 2019-07-18 02:27 bin/docker-init - -rwxr-xr-x root/root 5878304 2019-07-18 02:27 bin/containerd-shim - -rwxr-xr-x root/root 4191734 2021-02-05 23:43 bin/macvlan - -rwxr-xr-x root/root 55248437 2019-07-18 02:27 bin/docker - -rwxr-xr-x root/root 376208 2019-10-27 01:42 bin/socat - -rwxr-xr-x root/root 4053707 2021-02-05 23:43 bin/bandwidth - -rwxr-xr-x root/root 4328311 2021-02-05 23:43 bin/ptp - -rwxr-xr-x root/root 3633613 2021-02-05 23:43 bin/vrf - -rwxr-xr-x root/root 3432839 2021-02-05 23:43 bin/loopback - -rwxr-xr-x root/root 109617672 2021-07-28 16:42 bin/kubelet - -rwxr-xr-x root/root 113442816 2021-07-28 16:42 bin/kube-apiserver - -rwxr-xr-x root/root 44171264 2021-05-28 18:33 bin/coredns - -rwxr-xr-x 
root/root 43122688 2021-07-28 16:41 bin/kubectl - -rwxr-xr-x root/root 16711680 2021-06-16 05:52 bin/etcdctl - -rwxr-xr-x root/root 3570597 2021-02-05 23:43 bin/tuning - -rwxr-xr-x root/root 4397098 2021-02-05 23:43 bin/bridge - -rwxr-xr-x root/root 4612178 2021-02-05 23:43 bin/firewall - -rwxr-xr-x root/root 68921120 2019-07-18 02:27 bin/dockerd - -rwxr-xr-x root/root 2898746 2019-07-18 02:27 bin/docker-proxy - -rwxr-xr-x root/root 4186585 2021-02-05 23:43 bin/ipvlan - -rwxr-xr-x root/root 18446016 2019-07-18 02:27 bin/ctr - -rwxr-xr-x root/root 80752 2019-01-27 19:40 bin/conntrack - -rwxr-xr-x root/root 8037728 2019-07-18 02:27 bin/runc - drwxr-xr-x root/root 0 2021-07-29 10:39 file/ - -rw-r--r-- root/root 20713 2021-07-29 10:39 file/calico.yaml - -rw-r--r-- root/root 1004 2021-07-29 10:39 file/docker.service - drwxr-xr-x root/root 0 2021-07-29 11:02 image/ - -rw-r--r-- root/root 264783872 2021-07-29 11:02 image/images.tar - -rw-r--r-- root/root 1298 2021-07-29 11:05 packages_notes.md - ``` - - - -## 准备配置文件 - -准备部署时使用的 YAML 配置文件。可以使用如下命令生成一个模板配置,然后根据部署需求修改生成的 template.yaml 。 - -```shell -$ eggo template -f template.yaml -``` - -或者直接使用命令行方式修改默认配置,参考命令如下: - -```shell -$ eggo template -f template.yaml -n k8s-cluster -u username -p password --masters 192.168.0.1 --masters 192.168.0.2 --workers 192.168.0.3 --etcds 192.168.0.4 --loadbalancer 192.168.0.5 -``` - -## 安装 Kubernetes 集群 - -安装 Kubernetes 集群。此处假设指定配置文件 template.yaml 。 - -```shell -$ eggo -d deploy -f template.yaml -``` - -安装完成后,根据回显信息,确认集群各节点是否安装成功。 - -```shell -\------------------------------- -message: create cluster success -summary: -192.168.0.1 success -192.168.0.2 success -192.168.0.3 success -\------------------------------- -To start using cluster: cluster-example, you need following as a regular user: - -​ export KUBECONFIG=/etc/eggo/cluster-example/admin.conf -``` - -## 加入节点 - -当集群中节点不满足业务需求,需要扩容时,可以在集群中新增节点。 - -- 添加单个节点:通过命令行添加。示例参考如下: - - ```shell - $ eggo -d join --id k8s-cluster --type 
master,worker --arch arm64 --port 22 192.168.0.5 - ``` - -- 添加多个节点:通过配置文件方式添加。 - - ```shell - $ eggo -d join --id k8s-cluster --file join.yaml - ``` - - join.yaml 中配置新增的节点信息,示例如下: - - ```yaml - masters: # 配置master节点列表,建议每个master节点同时作为worker节点,否则master节点可能无法直接访问pod - - name: test0 # 该节点的名称,为 k8s 集群查询显示的该节点名称 - ip: 192.168.0.2 # 该节点的 IP 地址 - port: 22 # ssh 登录的端口号 - arch: arm64 # 机器架构,x86_64 配置为 amd64 - - name: test1 - ip: 192.168.0.3 - port: 22 - arch: arm64 - workers: # 配置 worker 节点列表 - - name: test0 # 该节点的名称,为 k8s 集群查询显示的该节点名称 - ip: 192.168.0.4 # 该节点的 IP 地址 - port: 22 # ssh 登录的端口号 - arch: arm64 # 机器架构,x86_64 配置为 amd64 - - name: test2 - ip: 192.168.0.5 - port: 22 - arch: arm64 - ``` \ No newline at end of file diff --git a/docs/zh/docs/Kubernetes/figures/arch.png b/docs/zh/docs/Kubernetes/figures/arch.png deleted file mode 100644 index 93c5b4cb56b6d165dc5a5cf7aa76a007c362ef55..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/Kubernetes/figures/arch.png and /dev/null differ diff --git a/docs/zh/docs/Kubernetes/public_sys-resources/icon-note.gif b/docs/zh/docs/Kubernetes/public_sys-resources/icon-note.gif deleted file mode 100644 index 6314297e45c1de184204098efd4814d6dc8b1cda..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/Kubernetes/public_sys-resources/icon-note.gif and /dev/null differ diff --git "a/docs/zh/docs/Kubernetes/\345\207\206\345\244\207\350\231\232\346\213\237\346\234\272.md" "b/docs/zh/docs/Kubernetes/\345\207\206\345\244\207\350\231\232\346\213\237\346\234\272.md" deleted file mode 100644 index 946c4af0c86dd8c969d96ed28dc08a195e2c5b5c..0000000000000000000000000000000000000000 --- "a/docs/zh/docs/Kubernetes/\345\207\206\345\244\207\350\231\232\346\213\237\346\234\272.md" +++ /dev/null @@ -1,157 +0,0 @@ -# 准备虚拟机 - - -本章介绍使用 virt manager 安装虚拟机的方法,如果您已经准备好虚拟机,可以跳过本章节。 - -## 安装依赖工具 - -安装虚拟机,会依赖相关工具,安装依赖并使能 libvirtd 服务的参考命令如下(如果需要代理,请先配置代理): - -```bash -$ dnf install virt-install virt-manager libvirt-daemon-qemu 
edk2-aarch64.noarch virt-viewer -$ systemctl start libvirtd -$ systemctl enable libvirtd -``` - -## 准备虚拟机磁盘文件 - -```bash -$ dnf install -y qemu-img -$ virsh pool-define-as vmPool --type dir --target /mnt/vm/images/ -$ virsh pool-build vmPool -$ virsh pool-start vmPool -$ virsh pool-autostart vmPool -$ virsh vol-create-as --pool vmPool --name master0.img --capacity 200G --allocation 1G --format qcow2 -$ virsh vol-create-as --pool vmPool --name master1.img --capacity 200G --allocation 1G --format qcow2 -$ virsh vol-create-as --pool vmPool --name master2.img --capacity 200G --allocation 1G --format qcow2 -$ virsh vol-create-as --pool vmPool --name node1.img --capacity 300G --allocation 1G --format qcow2 -$ virsh vol-create-as --pool vmPool --name node2.img --capacity 300G --allocation 1G --format qcow2 -$ virsh vol-create-as --pool vmPool --name node3.img --capacity 300G --allocation 1G --format qcow2 -``` - -## 打开 VNC 防火墙端口 - -**方法一** - -1. 查询端口 - - ```shell - $ netstat -lntup | grep qemu-kvm - ``` - -2. 
打开 VNC 的防火墙端口。假设端口从 5900 开始,参考命令如下: - - ```shell - $ firewall-cmd --zone=public --add-port=5900/tcp - $ firewall-cmd --zone=public --add-port=5901/tcp - $ firewall-cmd --zone=public --add-port=5902/tcp - $ firewall-cmd --zone=public --add-port=5903/tcp - $ firewall-cmd --zone=public --add-port=5904/tcp - $ firewall-cmd --zone=public --add-port=5905/tcp - ``` - - - -**方法二** - -直接关闭防火墙 - -```shell -$ systemctl stop firewalld -``` - - - -## 准备虚拟机配置文件 - -创建虚拟机需要虚拟机配置文件。假设配置文件为 master.xml ,以虚拟机 hostname 为 k8smaster0 的节点为例,参考配置如下: - -```bash - cat master.xml - - - k8smaster0 - 8 - 8 - - hvm - /usr/share/edk2/aarch64/QEMU_EFI-pflash.raw - /var/lib/libvirt/qemu/nvram/k8smaster0.fd - - - - - - - - - 1 - - destroy - restart - restart - - /usr/libexec/qemu-kvm - - - - - - - - - - - - - - - - - - - - - - - - - - - - -``` - -由于虚拟机相关配置必须唯一,新增虚拟机需要适配修改如下内容,保证虚拟机的唯一性: - -- name:虚拟机 hostname,建议尽量小写。例中为 `k8smaster0` -- nvram:nvram的句柄文件路径,需要全局唯一。例中为 `/var/lib/libvirt/qemu/nvram/k8smaster0.fd` -- disk 的 source file:虚拟机磁盘文件路径。例中为 `/mnt/vm/images/master0.img` -- interface 的 mac address:interface 的 mac 地址。例中为 `52:54:00:00:00:80` - - - -## 安装虚拟机 - -1. 创建并启动虚拟机 - - ```shell - $ virsh define master.xml - $ virsh start k8smaster0 - ``` - -2. 获取虚拟机的 VNC 端口号 - - ```shell - $ virsh vncdisplay k8smaster0 - ``` - -3. 使用虚拟机链接工具,例如 VNC Viewer 远程链接虚拟机,并根据提示依次选择配置,完成系统安装 - -4. 
设置虚拟机 hostname,例如设置为 k8smaster0 - - ```shell - $ hostnamectl set-hostname k8smaster0 - ``` diff --git "a/docs/zh/docs/Kubernetes/\345\207\206\345\244\207\350\257\201\344\271\246.md" "b/docs/zh/docs/Kubernetes/\345\207\206\345\244\207\350\257\201\344\271\246.md" deleted file mode 100644 index b60d91f5e589646ec37f7dcd897198d9703ed104..0000000000000000000000000000000000000000 --- "a/docs/zh/docs/Kubernetes/\345\207\206\345\244\207\350\257\201\344\271\246.md" +++ /dev/null @@ -1,388 +0,0 @@ - -# 准备证书 - - -**声明:本文使用的证书为自签名,不能用于商用环境** - -部署集群前,需要生成集群各组件之间通信所需的证书。本文使用开源 CFSSL 作为验证部署工具,以便用户了解证书的配置和集群组件之间证书的关联关系。用户可以根据实际情况选择合适的工具,例如 OpenSSL 。 - -## 编译安装 CFSSL - -编译安装 CFSSL 的参考命令如下(需要互联网下载权限,需要配置代理的请先完成配置,需要配置 go语言环境), - -```bash -$ wget --no-check-certificate https://github.com/cloudflare/cfssl/archive/v1.5.0.tar.gz -$ tar -zxf v1.5.0.tar.gz -$ cd cfssl-1.5.0/ -$ make -j6 -# cp bin/* /usr/local/bin/ -``` - -## 生成根证书 - -编写 CA 配置文件,例如 ca-config.json: - -```bash -$ cat ca-config.json | jq -{ - "signing": { - "default": { - "expiry": "8760h" - }, - "profiles": { - "kubernetes": { - "usages": [ - "signing", - "key encipherment", - "server auth", - "client auth" - ], - "expiry": "8760h" - } - } - } -} -``` - -编写 CA CSR 文件,例如 ca-csr.json: - -```bash -$ cat ca-csr.json | jq -{ - "CN": "Kubernetes", - "key": { - "algo": "rsa", - "size": 2048 - }, - "names": [ - { - "C": "CN", - "L": "HangZhou", - "O": "openEuler", - "OU": "WWW", - "ST": "BinJiang" - } - ] -} -``` - -生成 CA 证书和密钥: -```bash -$ cfssl gencert -initca ca-csr.json | cfssljson -bare ca -``` - -得到如下证书: - -```bash -ca.csr ca-key.pem ca.pem -``` - -## 生成 admin 账户证书 - -admin 是 K8S 用于系统管理的一个账户,编写 admin 账户的 CSR 配置,例如 admin-csr.json: -```bash -cat admin-csr.json | jq -{ - "CN": "admin", - "key": { - "algo": "rsa", - "size": 2048 - }, - "names": [ - { - "C": "CN", - "L": "HangZhou", - "O": "system:masters", - "OU": "Containerum", - "ST": "BinJiang" - } - ] -} -``` - -生成证书: -```bash -$ cfssl gencert -ca=ca.pem -ca-key=ca-key.pem 
-config=ca-config.json -profile=kubernetes admin-csr.json | cfssljson -bare admin -``` - -结果如下: -```bash -admin.csr admin-key.pem admin.pem -``` - -## 生成 service-account 账户证书 - -编写 service-account 账户的 CSR 配置文件,例如 service-account-csr.json: -```bash -cat service-account-csr.json | jq -{ - "CN": "service-accounts", - "key": { - "algo": "rsa", - "size": 2048 - }, - "names": [ - { - "C": "CN", - "L": "HangZhou", - "O": "Kubernetes", - "OU": "openEuler k8s install", - "ST": "BinJiang" - } - ] -} -``` - -生成证书: -```bash -$ cfssl gencert -ca=../ca/ca.pem -ca-key=../ca/ca-key.pem -config=../ca/ca-config.json -profile=kubernetes service-account-csr.json | cfssljson -bare service-account -``` - -结果如下: -```bash -service-account.csr service-account-key.pem service-account.pem -``` - -## 生成 kube-controller-manager 组件证书 - -编写 kube-controller-manager 的 CSR 配置: -```bash -{ - "CN": "system:kube-controller-manager", - "key": { - "algo": "rsa", - "size": 2048 - }, - "names": [ - { - "C": "CN", - "L": "HangZhou", - "O": "system:kube-controller-manager", - "OU": "openEuler k8s kcm", - "ST": "BinJiang" - } - ] -} -``` - -生成证书: -```bash -$ cfssl gencert -ca=../ca/ca.pem -ca-key=../ca/ca-key.pem -config=../ca/ca-config.json -profile=kubernetes kube-controller-manager-csr.json | cfssljson -bare kube-controller-manager -``` - -结果如下: -```bash -kube-controller-manager.csr kube-controller-manager-key.pem kube-controller-manager.pem -``` - -## 生成 kube-proxy 证书 - -编写 kube-proxy 的 CSR 配置: -```bash -{ - "CN": "system:kube-proxy", - "key": { - "algo": "rsa", - "size": 2048 - }, - "names": [ - { - "C": "CN", - "L": "HangZhou", - "O": "system:node-proxier", - "OU": "openEuler k8s kube proxy", - "ST": "BinJiang" - } - ] -} -``` - -生成证书: -```bash -$ cfssl gencert -ca=../ca/ca.pem -ca-key=../ca/ca-key.pem -config=../ca/ca-config.json -profile=kubernetes kube-proxy-csr.json | cfssljson -bare kube-proxy -``` - -结果如下: -```bash -kube-proxy.csr kube-proxy-key.pem kube-proxy.pem -``` - -## 生成 kube-scheduler 证书 
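kube-scheduler 证书的生成与前文各组件完全同构:先编写 CSR 配置,再由根 CA 用 cfssl 签发。签发后建议用 openssl 核对证书 Subject 中的 O/CN 字段,它们决定该证书在 Kubernetes RBAC 中对应的用户组与用户。下面是一个演示脚本(仅为示例:为便于独立运行,脚本先用 openssl 自签一张测试证书来演示查看命令;实际操作时请将 /tmp/demo.pem 替换为 cfssl 签出的 kube-scheduler.pem):

```shell
# 为演示生成一张自签名测试证书(实际场景中应使用 cfssl 签出的组件证书)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -keyout /tmp/demo-key.pem -out /tmp/demo.pem \
    -subj "/O=system:kube-scheduler/CN=system:kube-scheduler" 2>/dev/null

# 查看证书的 Subject,确认 O/CN 字段与 RBAC 预期一致
openssl x509 -in /tmp/demo.pem -noout -subject
```

输出的 subject 中应能看到 O 与 CN 均为 system:kube-scheduler;若字段不符,组件将无法通过 kube-apiserver 的鉴权。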
- -编写 kube-scheduler 的 CSR 配置: -```bash -{ - "CN": "system:kube-scheduler", - "key": { - "algo": "rsa", - "size": 2048 - }, - "names": [ - { - "C": "CN", - "L": "HangZhou", - "O": "system:kube-scheduler", - "OU": "openEuler k8s kube scheduler", - "ST": "BinJiang" - } - ] -} -``` - -生成证书: -```bash -$ cfssl gencert -ca=../ca/ca.pem -ca-key=../ca/ca-key.pem -config=../ca/ca-config.json -profile=kubernetes kube-scheduler-csr.json | cfssljson -bare kube-scheduler -``` - -结果如下: -```bash -kube-scheduler.csr kube-scheduler-key.pem kube-scheduler.pem -``` - -## 生成 kubelet 证书 - -由于证书涉及到 kubelet 所在机器的 hostname 和 IP 地址信息,因此每个 node 节点配置不尽相同,所以编写脚本完成,生成脚本如下: -```bash -$ cat node_csr_gen.bash - -#!/bin/bash - -nodes=(k8snode1 k8snode2 k8snode3) -IPs=("192.168.122.157" "192.168.122.158" "192.168.122.159") - -for i in "${!nodes[@]}"; do - -cat > "${nodes[$i]}-csr.json" < 17h v1.20.2 -k8snode2 Ready 19m v1.20.2 -k8snode3 Ready 12m v1.20.2 -``` - -## 部署 coredns - -coredns可以部署到node节点或者master节点,本文这里部署到节点`k8snode1`。 - -### 编写 coredns 配置文件 - -```bash -$ cat /etc/kubernetes/pki/dns/Corefile -.:53 { - errors - health { - lameduck 5s - } - ready - kubernetes cluster.local in-addr.arpa ip6.arpa { - pods insecure - endpoint https://192.168.122.154:6443 - tls /etc/kubernetes/pki/ca.pem /etc/kubernetes/pki/admin-key.pem /etc/kubernetes/pki/admin.pem - kubeconfig /etc/kubernetes/pki/admin.kubeconfig default - fallthrough in-addr.arpa ip6.arpa - } - prometheus :9153 - forward . 
/etc/resolv.conf { - max_concurrent 1000 - } - cache 30 - loop - reload - loadbalance -} -``` - -说明: - -- 监听53端口; -- 设置kubernetes插件配置:证书、kube api的URL; - -### 准备 systemd 的 service 文件 - -```bash -cat /usr/lib/systemd/system/coredns.service -[Unit] -Description=Kubernetes Core DNS server -Documentation=https://github.com/coredns/coredns -After=network.target - -[Service] -ExecStart=bash -c "KUBE_DNS_SERVICE_HOST=10.32.0.10 coredns -conf /etc/kubernetes/pki/dns/Corefile" - -Restart=on-failure -LimitNOFILE=65536 - -[Install] -WantedBy=multi-user.target -``` - -### 启动服务 - -```bash -$ systemctl enable coredns -$ systemctl start coredns -``` - -### 创建 coredns 的 Service 对象 - -```bash -$ cat coredns_server.yaml -apiVersion: v1 -kind: Service -metadata: - name: kube-dns - namespace: kube-system - annotations: - prometheus.io/port: "9153" - prometheus.io/scrape: "true" - labels: - k8s-app: kube-dns - kubernetes.io/cluster-service: "true" - kubernetes.io/name: "CoreDNS" -spec: - clusterIP: 10.32.0.10 - ports: - - name: dns - port: 53 - protocol: UDP - - name: dns-tcp - port: 53 - protocol: TCP - - name: metrics - port: 9153 - protocol: TCP -``` - -### 创建 coredns 的 endpoint 对象 - -```bash -$ cat coredns_ep.yaml -apiVersion: v1 -kind: Endpoints -metadata: - name: kube-dns - namespace: kube-system -subsets: - - addresses: - - ip: 192.168.122.157 - ports: - - name: dns-tcp - port: 53 - protocol: TCP - - name: dns - port: 53 - protocol: UDP - - name: metrics - port: 9153 - protocol: TCP -``` - -### 确认 coredns 服务 - -```bash -# 查看service对象 -$ kubectl get service -n kube-system kube-dns -NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE -kube-dns ClusterIP 10.32.0.10 53/UDP,53/TCP,9153/TCP 51m -# 查看endpoint对象 -$ kubectl get endpoints -n kube-system kube-dns -NAME ENDPOINTS AGE -kube-dns 192.168.122.157:53,192.168.122.157:53,192.168.122.157:9153 52m -``` \ No newline at end of file diff --git 
"a/docs/zh/docs/Kubernetes/\351\203\250\347\275\262\346\216\247\345\210\266\351\235\242\347\273\204\344\273\266.md" "b/docs/zh/docs/Kubernetes/\351\203\250\347\275\262\346\216\247\345\210\266\351\235\242\347\273\204\344\273\266.md" deleted file mode 100644 index 410f35a191b4f62c13d3e86be3919891af5de791..0000000000000000000000000000000000000000 --- "a/docs/zh/docs/Kubernetes/\351\203\250\347\275\262\346\216\247\345\210\266\351\235\242\347\273\204\344\273\266.md" +++ /dev/null @@ -1,353 +0,0 @@ -# 部署控制面组件 - - -## 准备所有组件的 kubeconfig - -### kube-proxy - -```bash -$ kubectl config set-cluster openeuler-k8s --certificate-authority=/etc/kubernetes/pki/ca.pem --embed-certs=true --server=https://192.168.122.154:6443 --kubeconfig=kube-proxy.kubeconfig -$ kubectl config set-credentials system:kube-proxy --client-certificate=/etc/kubernetes/pki/kube-proxy.pem --client-key=/etc/kubernetes/pki/kube-proxy-key.pem --embed-certs=true --kubeconfig=kube-proxy.kubeconfig -$ kubectl config set-context default --cluster=openeuler-k8s --user=system:kube-proxy --kubeconfig=kube-proxy.kubeconfig -$ kubectl config use-context default --kubeconfig=kube-proxy.kubeconfig -``` - -### kube-controller-manager - -```bash -$ kubectl config set-cluster openeuler-k8s --certificate-authority=/etc/kubernetes/pki/ca.pem --embed-certs=true --server=https://127.0.0.1:6443 --kubeconfig=kube-controller-manager.kubeconfig -$ kubectl config set-credentials system:kube-controller-manager --client-certificate=/etc/kubernetes/pki/kube-controller-manager.pem --client-key=/etc/kubernetes/pki/kube-controller-manager-key.pem --embed-certs=true --kubeconfig=kube-controller-manager.kubeconfig -$ kubectl config set-context default --cluster=openeuler-k8s --user=system:kube-controller-manager --kubeconfig=kube-controller-manager.kubeconfig -$ kubectl config use-context default --kubeconfig=kube-controller-manager.kubeconfig -``` - -### kube-scheduler - -```bash -$ kubectl config set-cluster openeuler-k8s 
--certificate-authority=/etc/kubernetes/pki/ca.pem --embed-certs=true --server=https://127.0.0.1:6443 --kubeconfig=kube-scheduler.kubeconfig -$ kubectl config set-credentials system:kube-scheduler --client-certificate=/etc/kubernetes/pki/kube-scheduler.pem --client-key=/etc/kubernetes/pki/kube-scheduler-key.pem --embed-certs=true --kubeconfig=kube-scheduler.kubeconfig -$ kubectl config set-context default --cluster=openeuler-k8s --user=system:kube-scheduler --kubeconfig=kube-scheduler.kubeconfig -$ kubectl config use-context default --kubeconfig=kube-scheduler.kubeconfig -``` - -### admin - -```bash -$ kubectl config set-cluster openeuler-k8s --certificate-authority=/etc/kubernetes/pki/ca.pem --embed-certs=true --server=https://127.0.0.1:6443 --kubeconfig=admin.kubeconfig -$ kubectl config set-credentials admin --client-certificate=/etc/kubernetes/pki/admin.pem --client-key=/etc/kubernetes/pki/admin-key.pem --embed-certs=true --kubeconfig=admin.kubeconfig -$ kubectl config set-context default --cluster=openeuler-k8s --user=admin --kubeconfig=admin.kubeconfig -$ kubectl config use-context default --kubeconfig=admin.kubeconfig -``` - -### 获得相关 kubeconfig 配置文件 - -```bash -admin.kubeconfig kube-proxy.kubeconfig kube-controller-manager.kubeconfig kube-scheduler.kubeconfig -``` - -## 生成密钥提供者的配置 - -api-server 启动时需要提供一个密钥对`--encryption-provider-config=/etc/kubernetes/pki/encryption-config.yaml`,本文通过 urandom 生成一个: - -```bash -$ cat generate.bash -#!/bin/bash - -ENCRYPTION_KEY=$(head -c 32 /dev/urandom | base64) - -cat > encryption-config.yaml < - [快速入门](#快速入门) @@ -103,7 +103,7 @@ 1. 登录[openEuler社区](https://openeuler.org)网站。 2. 单击“下载”。 3. 单击“获取ISO:”后面的“Link”,显示版本列表。 -4. 单击“openEuler-21.09”,进入openEuler 21.09版本下载列表。 +4. 单击“openEuler-23.03”,进入openEuler 23.03版本下载列表。 5. 单击“ISO”,进入ISO下载列表。 - aarch64:AArch64架构的ISO。 - x86\_64:x86\_64架构的ISO。 @@ -112,13 +112,13 @@ 6. 根据实际待安装环境的架构选择需要下载的openEuler的发布包和校验文件。 - 若为AArch64架构。 1. 单击“aarch64”。 - 2. 
单击“openEuler-21.09-aarch64-dvd.iso”,将openEuler发布包下载到本地。 - 3. 单击“openEuler-21.09-aarch64-dvd.iso.sha256sum”,将openEuler校验文件下载到本地。 + 2. 单击“openEuler-23.03-aarch64-dvd.iso”,将openEuler发布包下载到本地。 + 3. 单击“openEuler-23.03-aarch64-dvd.iso.sha256sum”,将openEuler校验文件下载到本地。 - 若为x86\_64架构。 1. 单击“x86\_64”。 - 2. 单击“openEuler-21.09-x86\_64-dvd.iso”,将openEuler发布包下载到本地。 - 3. 单击“openEuler-21.09-x86\_64-dvd.iso.sha256sum”,将openEuler校验文件下载到本地。 + 2. 单击“openEuler-23.03-x86\_64-dvd.iso”,将openEuler发布包下载到本地。 + 3. 单击“openEuler-23.03-x86\_64-dvd.iso.sha256sum”,将openEuler校验文件下载到本地。 @@ -129,13 +129,13 @@ 1. 获取校验文件中的校验值。执行命令如下: ``` - $ cat openEuler-21.09-aarch64-dvd.iso.sha256sum + $ cat openEuler-23.03-aarch64-dvd.iso.sha256sum ``` 2. 计算文件的sha256校验值。执行命令如下: ``` - $ sha256sum openEuler-21.09-aarch64-dvd.iso + $ sha256sum openEuler-23.03-aarch64-dvd.iso ``` 命令执行完成后,输出校验值。 @@ -179,13 +179,13 @@ 8. 设备重启后进入到openEuler操作系统安装引导界面,如[图5](#fig1648754873314)所示。 >![](./public_sys-resources/icon-note.gif) **说明:** - >- 如果60秒内未按任何键,系统将从默认选项“Test this media & install openEuler 21.09”自动进入安装界面。 + >- 如果60秒内未按任何键,系统将从默认选项“Test this media & install openEuler 23.03”自动进入安装界面。 >- 安装物理机时,如果使用键盘上下键无法选择启动选项,按“Enter”键无响应,可以单击BMC界面上的鼠标控制图标“![](./figures/zh-cn_image_0229420473.png)”,设置“键鼠复位”。 **图 5** 安装引导界面 ![](./figures/Installation_wizard.png) -9. 在安装引导界面,按“Enter”,进入默认选项“Test this media & install openEuler 21.09”的图形化安装界面。 +9. 
在安装引导界面,按“Enter”,进入默认选项“Test this media & install openEuler 23.03”的图形化安装界面。 ## 安装 @@ -295,7 +295,7 @@ ## 查看系统信息 -系统安装完成并重启后直接进入系统命令行登录界面,输入安装过程中设置的用户和密码,进入openEuler操作系统,查看如下系统信息。若需要进行系统管理和配置操作,请参考《[管理员指南](https://openeuler.org/zh/docs/21.09/docs/Administration/administration.html)》。 +系统安装完成并重启后直接进入系统命令行登录界面,输入安装过程中设置的用户和密码,进入openEuler操作系统,查看如下系统信息。若需要进行系统管理和配置操作,请参考《[管理员指南](https://openeuler.org/zh/docs/23.03/docs/Administration/administration.html)》。 - 查看系统信息,命令如下: ``` @@ -307,10 +307,10 @@ ``` # cat /etc/os-release NAME="openEuler" - VERSION="21.09" + VERSION="23.03" ID="openEuler" - VERSION_ID="21.09" - PRETTY_NAME="openEuler 21.09" + VERSION_ID="23.03" + PRETTY_NAME="openEuler 23.03" ANSI_COLOR="0;31" ``` diff --git a/docs/zh/docs/Releasenotes/release_notes.md b/docs/zh/docs/Releasenotes/release_notes.md index cafa7bd670491e00fea478d8a85e8d9f859a85b9..24c3e33efdf61d9973f2b531cebd258f86a9d5aa 100644 --- a/docs/zh/docs/Releasenotes/release_notes.md +++ b/docs/zh/docs/Releasenotes/release_notes.md @@ -1,3 +1,3 @@ # 发行说明 -本文档是 openEuler 22.09 版本的发行说明。 \ No newline at end of file +本文档是 openEuler 23.03 版本的发行说明。 \ No newline at end of file diff --git "a/docs/zh/docs/Releasenotes/\345\205\263\351\224\256\347\211\271\346\200\247.md" "b/docs/zh/docs/Releasenotes/\345\205\263\351\224\256\347\211\271\346\200\247.md" index 0e0e904238c2abbf201f55b44149bc1afbcf8af9..0795432aadb717505cf74deb9f67ef4c2587838e 100644 --- "a/docs/zh/docs/Releasenotes/\345\205\263\351\224\256\347\211\271\346\200\247.md" +++ "b/docs/zh/docs/Releasenotes/\345\205\263\351\224\256\347\211\271\346\200\247.md" @@ -1,6 +1,6 @@ # 关键特性 -## openEuler 22.09基于 Linux Kernel 5.10内核构建, 同时吸收了社区高版本的有益特性及社区创新特性。 +## openEuler 23.03基于 Linux Kernel 6.1内核构建, 同时吸收了社区高版本的有益特性及社区创新特性。 - **BPF CO-RE(Compile Once-Run Everywhere)特性**: 解决BPF的可移植性,即编写的程序通过编译和内核校验之后,能正确地在 不同版本的内核上运行 —— 而无需针对不同内核重新编译。 @@ -62,7 +62,7 @@ HybridSched 是虚拟机混部全栈解决方案,包括增强的 OpenStack 集 ## 更多的第三方应用支持 - **OpenStack 
Yoga**:OpenStack版本更新到2022年4月份发布的最新稳定版本Yoga,并且在openEuler的OpenStack Yoga版本中我们增加支持了OpenStack-Helm组件。 -- **openStack部署工具opensd(联通)**:支持OpenStack Yoga版本在openEuler 22.09上的基本部署。 +- **openStack部署工具opensd(联通)**:支持OpenStack Yoga版本在openEuler 23.03上的基本部署。 - **OpenStack Yoga虚拟机混部**:在OpenStack Nova中引入虚拟机高低优先级技术,对CPU、IO、Memory等资源有不同需求的虚拟机通过调度方式部署、迁移到同一个计算节点上,从而使得节点的资源得到充分利用。 - **文件备份还原**,备份还原具备系统备份,文件备份,自定义还原,一键备份还原等功能,可极大减少运维成本。 diff --git "a/docs/zh/docs/Releasenotes/\346\263\225\345\276\213\345\243\260\346\230\216.md" "b/docs/zh/docs/Releasenotes/\346\263\225\345\276\213\345\243\260\346\230\216.md" index 9d3a65aaabc32dcad6ab0373dbd91d34a2a67b88..84b32c8c06683c9e4bad76d3da4896abfa333f97 100644 --- "a/docs/zh/docs/Releasenotes/\346\263\225\345\276\213\345\243\260\346\230\216.md" +++ "b/docs/zh/docs/Releasenotes/\346\263\225\345\276\213\345\243\260\346\230\216.md" @@ -1,6 +1,6 @@ # 法律声明 -**版权所有 © 2022 openEuler社区** +**版权所有 © 2023 openEuler社区** 您对“本文档”的复制、使用、修改及分发受知识共享\(Creative Commons\)署名—相同方式共享4.0国际公共许可协议\(以下简称“CC BY-SA 4.0”\)的约束。为了方便用户理解,您可以通过访问[https://creativecommons.org/licenses/by-sa/4.0/](https://creativecommons.org/licenses/by-sa/4.0/) 了解CC BY-SA 4.0的概要 \(但不是替代\)。CC BY-SA 4.0的完整协议内容您可以访问如下网址获取:[https://creativecommons.org/licenses/by-sa/4.0/legalcode](https://creativecommons.org/licenses/by-sa/4.0/legalcode)。 diff --git "a/docs/zh/docs/Releasenotes/\347\224\250\346\210\267\351\241\273\347\237\245.md" "b/docs/zh/docs/Releasenotes/\347\224\250\346\210\267\351\241\273\347\237\245.md" index 9fd87a0e99c5f8fdc583c75581290995db0c3c1b..dd37cb62e70c0096b8f5ba02c6842279bdf4da35 100644 --- "a/docs/zh/docs/Releasenotes/\347\224\250\346\210\267\351\241\273\347\237\245.md" +++ "b/docs/zh/docs/Releasenotes/\347\224\250\346\210\267\351\241\273\347\237\245.md" @@ -2,4 +2,4 @@ - openEuler版本号计数规则由openEuler x.x变更为以年月为版本号,以便用户了解版本发布时间,例如openEuler 21.03表示发布时间为2021年3月。 - [Python核心团队](https://www.python.org/dev/peps/pep-0373/#update)已经于2020年1月停止对Python 2的维护。2021年,openEuler 21.03 仅修复Python 
2的致命CVE。Python 2已于2020年12月31日全面停止维护。请您尽快切换到Python 3。 -- openEuler 22.03-LTS版本开始,停止支持和维护Python 2,仅支持Python 3,请您切换和使用Python 3。 \ No newline at end of file +- openEuler 22.03-LTS版本开始,停止支持和维护Python 2,仅支持Python 3,请您切换和使用Python 3。 \ No newline at end of file diff --git "a/docs/zh/docs/Releasenotes/\347\263\273\347\273\237\345\256\211\350\243\205.md" "b/docs/zh/docs/Releasenotes/\347\263\273\347\273\237\345\256\211\350\243\205.md" index 8ae5b391c3fa608d750aae8c5821aa01716fa1e2..886df70ef3710316546507e6711032ac3d2f44b1 100644 --- "a/docs/zh/docs/Releasenotes/\347\263\273\347\273\237\345\256\211\350\243\205.md" +++ "b/docs/zh/docs/Releasenotes/\347\263\273\347\273\237\345\256\211\350\243\205.md" @@ -2,7 +2,7 @@ ## 发布件 -openEuler发布件包括[ISO发布包](http://repo.openeuler.org/openEuler-22.09/ISO/)、[虚拟机镜像](http://repo.openeuler.org/openEuler-22.09/virtual_machine_img/)、[容器镜像](http://repo.openeuler.org/openEuler-22.09/docker_img/)、[嵌入式镜像](http://repo.openeuler.org/openEuler-22.09/embedded_img/)和[repo源](http://repo.openeuler.org/openEuler-22.09/)。ISO发布包请参见[表1](#table8396719144315)。容器镜像清单参见[表3](#table1276911538154)。repo源方便在线使用,repo源目录请参见[表5](#table953512211576)。 +openEuler发布件包括[ISO发布包](http://repo.openeuler.org/openEuler-23.03/ISO/)、[虚拟机镜像](http://repo.openeuler.org/openEuler-23.03/virtual_machine_img/)、[容器镜像](http://repo.openeuler.org/openEuler-23.03/docker_img/)、[嵌入式镜像](http://repo.openeuler.org/openEuler-23.03/embedded_img/)和[repo源](http://repo.openeuler.org/openEuler-23.03/)。ISO发布包请参见[表1](#table8396719144315)。容器镜像清单参见[表3](#table1276911538154)。repo源方便在线使用,repo源目录请参见[表5](#table953512211576)。 **表 1** 发布ISO列表 @@ -14,37 +14,37 @@ openEuler发布件包括[ISO发布包](http://repo.openeuler.org/openEuler-22.09 -

openEuler-22.09-aarch64-dvd.iso

+

openEuler-23.03-aarch64-dvd.iso

AArch64架构的基础安装ISO,包含了运行最小系统的核心组件

-

openEuler-22.09-everything-aarch64-dvd.iso

+

openEuler-23.03-everything-aarch64-dvd.iso

AArch64架构的全量安装ISO,包含了运行完整系统所需的全部组件

-

openEuler-22.09-everything-debug-aarch64-dvd.iso

+

openEuler-23.03-everything-debug-aarch64-dvd.iso

AArch64架构下openEuler的调试ISO,包含了调试所需的符号表信息

-

openEuler-22.09-x86_64-dvd.iso

+

openEuler-23.03-x86_64-dvd.iso

x86_64架构的基础安装ISO,包含了运行最小系统的核心组件

-

openEuler-22.09-everything-x86_64-dvd.iso

+

openEuler-23.03-everything-x86_64-dvd.iso

x86_64架构的全量安装ISO,包含了运行完整系统所需的全部组件

-

openEuler-22.09-everything-debuginfo-x86_64-dvd.iso

+

openEuler-23.03-everything-debuginfo-x86_64-dvd.iso

x86_64架构下openEuler的调试ISO,包含了调试所需的符号表信息

-

openEuler-22.09-source-dvd.iso

+

openEuler-23.03-source-dvd.iso

openEuler源码ISO

@@ -82,12 +82,12 @@ openEuler发布件包括[ISO发布包](http://repo.openeuler.org/openEuler-22.09 -

openEuler-22.09-aarch64.qcow2.xz

+

openEuler-23.03-aarch64.qcow2.xz

AArch64架构下openEuler虚拟机镜像

-

openEuler-22.09-x86_64.qcow2.xz

+

openEuler-23.03-x86_64.qcow2.xz

x86_64架构下openEuler虚拟机镜像

@@ -125,10 +125,10 @@ openEuler发布件包括[ISO发布包](http://repo.openeuler.org/openEuler-22.09 | 名称 | 描述 | | -------------------------------------- | ------------------------------- | | arm64/aarch64-std/zImage | aarch64架构下支持qemu的内核镜像 | -| arm64/aarch64-std/\*toolchain-22.09.sh | aarch64架构下对应的开发编译链 | +| arm64/aarch64-std/\*toolchain-23.03.sh | aarch64架构下对应的开发编译链 | | arm64/aarch64-std/\*rootfs.cpio.gz | aarch64架构下支持qemu的文件系统 | | arm32/arm-std/zImage | arm架构下支持qemu的内核镜像 | -| arm32/arm-std/\*toolchain-22.09.sh | arm架构下对应的开发编译链 | +| arm32/arm-std/\*toolchain-23.03.sh | arm架构下对应的开发编译链 | | arm32/arm-std/\*rootfs.cpio.gz | arm架构下支持qemu的文件系统 | | source-list/manifest.xml | 构建使用的源码清单 | @@ -203,7 +203,7 @@ openEuler发布件包括[ISO发布包](http://repo.openeuler.org/openEuler-22.09 ## 最小硬件要求 -安装 openEuler 22.09-LTS 所需的最小硬件要求如[表6](#zh-cn_topic_0182825778_tff48b99c9bf24b84bb602c53229e2541)所示。 +安装 openEuler 23.03-LTS 所需的最小硬件要求如[表6](#zh-cn_topic_0182825778_tff48b99c9bf24b84bb602c53229e2541)所示。 **表 6** 最小硬件要求 diff --git a/docs/zh/docs/TailorCustom/figures/flowchart.png b/docs/zh/docs/TailorCustom/figures/flowchart.png deleted file mode 100644 index e4fecb8b310f204d6cfd07449ccc3c93d1badd51..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/TailorCustom/figures/flowchart.png and /dev/null differ diff --git a/docs/zh/docs/TailorCustom/figures/lack_pack.png b/docs/zh/docs/TailorCustom/figures/lack_pack.png deleted file mode 100644 index a4b7f1da15da70f63a86aae360e89017c2b98f2d..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/TailorCustom/figures/lack_pack.png and /dev/null differ diff --git "a/docs/zh/docs/TailorCustom/imageTailor \344\275\277\347\224\250\346\214\207\345\215\227.md" "b/docs/zh/docs/TailorCustom/imageTailor \344\275\277\347\224\250\346\214\207\345\215\227.md" deleted file mode 100644 index 10cfe8d67d4100cec6974db37e47c4eb0e3f5699..0000000000000000000000000000000000000000 --- "a/docs/zh/docs/TailorCustom/imageTailor 
\344\275\277\347\224\250\346\214\207\345\215\227.md" +++ /dev/null @@ -1,933 +0,0 @@ -# imageTailor 使用指南 - -- [简介](#简介) -- [安装工具](#安装工具) - - [软硬件要求](#软硬件要求) - - [获取安装包](#获取安装包) - - [安装 imageTailor](#安装-imageTailor) - - [目录介绍](#目录介绍) -- [定制系统](#定制系统) - - [总体流程](#总体流程) - - [定制业务包](#定制业务包) - - [配置本地 repo 源](#配置本地-repo-源) - - [添加文件](#添加文件) - - [添加 RPM 包](#添加-RPM-包) - - [添加 hook 脚本](#添加-hook-脚本) - - [配置系统参数](#配置系统参数) - - [配置主机参数](#配置主机参数) - - [配置初始密码](#配置初始密码) - - [配置分区](#配置分区) - - [配置网络](#配置网络) - - [配置内核参数](#配置内核参数) - - [制作系统](#制作系统) - - [命令介绍](#命令介绍) - - [制作指导](#制作指导) - - [裁剪时区](#裁剪时区) - - [定制示例](#定制示例) - - - -## 简介 - -操作系统除内核外,还包含各种功能的外围包。通用操作系统包含较多外围包,提供了丰富的功能,但是这也带来了一些影响: - -- 占用资源(内存、磁盘、CPU 等)多,导致系统运行效率低 -- 很多功能用户不需要,增加了开发和维护成本 - -因此,openEuler 提供了 imageTailor 镜像裁剪定制工具。用户可以根据需求裁剪操作系统镜像中不需要的外围包,或者添加所需的业务包或文件。该工具主要提供了以下功能: - -- 系统包裁剪定制:用户可以选择默认安装以及裁剪的rpm,也支持用户裁剪定制系统命令、库、驱动。 -- 系统配置定制:用户可以配置主机名、启动服务、时区、网络、分区、加载驱动、版本号等。 -- 用户文件定制:支持用户添加定制文件到系统镜像中。 - - - -## 安装工具 - -本节以 openEuler 22.03 LTS 版本 AArch64 架构为例,说明安装方法。 - -### 软硬件要求 - -安装和运行 imageTailor 需要满足以下软硬件要求: - -- 机器架构为 x86_64 或者 AArch64 - -- 操作系统为 openEuler 22.03 LTS(该版本内核版本为 5.10,python 版本为 3.9,满足工具要求) - -- 运行工具的机器根目录 '/' 需要 40 GB 以上空间 - -- python 版本 3.9 及以上 - -- kernel 内核版本 5.10 及以上 - -- 关闭 SElinux 服务 - - ```shell - $ sudo setenforce 0 - $ getenforce - Permissive - ``` - - - -### 获取安装包 - -安装和使用 imageTailor 工具,首先需要下载 openEuler 发布件。 - -1. 获取 ISO 镜像文件和对应的校验文件。 - - 镜像必须为 everything 版本,此处假设存放在 root 目录,参考命令如下: - - ```shell - $ cd /root/temp - $ wget https://repo.openeuler.org/openEuler-22.03-LTS/ISO/aarch64/openEuler-22.03-LTS-everything-aarch64-dvd.iso - $ wget https://repo.openeuler.org/openEuler-22.03-LTS/ISO/aarch64/openEuler-22.03-LTS-everything-aarch64-dvd.iso.sha256sum - ``` - -3. 获取 sha256sum 校验文件中的校验值。 - - ```shell - $ cat openEuler-22.03-LTS-everything-aarch64-dvd.iso.sha256sum - ``` - -4. 计算 ISO 镜像文件的校验值。 - - ```shell - $ sha256sum openEuler-22.03-LTS-everything-aarch64-dvd.iso - ``` - -5. 
对比上述 sha256sum 文件的检验值和 ISO 镜像的校验值,如果两者相同,说明文件完整性检验成功。否则说明文件完整性被破坏,需要重新获取文件。 - -### 安装 imageTailor - -此处以 openEuler 22.03 LTS 版本的 AArch64 架构为例,介绍如何安装 imageTailor 工具。 - -1. 确认机器已经安装操作系统 openEuler 22.03 LTS( imageTailor 工具的运行环境)。 - - ```shell - $ cat /etc/openEuler-release - openEuler release 22.03 LTS - ``` - -2. 创建文件 /etc/yum.repos.d/local.repo,配置对应 yum 源。配置内容参考如下,其中 baseurl 是用于挂载 ISO 镜像的目录: - - ```shell - [local] - name=local - baseurl=file:///root/imageTailor_mount - gpgcheck=0 - enabled=1 - ``` - -3. 使用 root 权限,挂载光盘镜像到 /root/imageTailor_mount 目录(请与上述 repo 文件中配置的 baseurl 保持一致,且建议该目录的磁盘空间大于 20 GB)作为 yum 源,参考命令如下: - - ```shell - $ mkdir /root/imageTailor_mount - $ sudo mount -o loop /root/temp/openEuler-22.03-LTS-everything-aarch64-dvd.iso /root/imageTailor_mount/ - ``` - -4. 使 yum 源生效: - - ```shell - $ yum clean all - $ yum makecache - ``` - -5. 使用 root 权限,安装 imageTailor 裁剪工具: - - ```shell - $ sudo yum install -y imageTailor - ``` - -6. 使用 root 权限,确认工具已安装成功。 - - ```shell - $ cd /opt/imageTailor/ - $ sudo ./mkdliso -h - ------------------------------------------------------------------------------------------------------------- - Usage: mkdliso -p product_name -c configpath [--minios yes|no|force] [-h] [--sec] - Options: - -p,--product Specify the product to make, check custom/cfg_yourProduct. 
- -c,--cfg-path Specify the configuration file path, the form should be consistent with custom/cfg_xxx - --minios Make minios: yes|no|force - --sec Perform security hardening - -h,--help Display help information - - Example: - command: - ./mkdliso -p openEuler -c custom/cfg_openEuler --sec - - help: - ./mkdliso -h - ------------------------------------------------------------------------------------------------------------- - ``` - -### 目录介绍 - -imageTailor 工具安装完成后,工具包的目录结构如下: - -```shell -[imageTailor] - |-[custom] - |-[cfg_openEuler] - |-[usr_file] // 存放用户添加的文件 - |-[usr_install] // 存放用户的 hook 脚本 - |-[all] - |-[conf] - |-[hook] - |-[cmd.conf] // 配置 ISO 镜像默认使用的命令和库 - |-[rpm.conf] // 配置 ISO 镜像默认安装的 RPM 包和驱动列表 - |-[security_s.conf] // 配置安全加固策略 - |-[sys.conf] // 配置 ISO 镜像系统参数 - |-[kiwi] // imageTailor 基础配置 - |-[repos] // RPM 源,制作 ISO 镜像需要的 RPM 包 - |-[security-tool] // 安全加固工具 - |-mkdliso // 制作 ISO 镜像的可执行脚本 -``` - -## 定制系统 - -本章介绍使用 imageTailor 工具将业务 RPM 包、自定义文件、驱动、命令和文件打包至目标 ISO 镜像。 - -### 总体流程 - -使用 imageTailor 工具定制系统的流程请参见下图: - -![](./figures/flowchart.png) - -各流程含义如下: - -- 检查软硬件环境:确认制作 ISO 镜像的机器满足软硬件要求。 - -- 定制业务包:包括添加 RPM 包(包括业务 RPM 包、命令、驱动、库文件)和添加文件(包括自定义文件、命令、驱动、库文件) - - - 添加业务 RPM 包:用户可以根据需要,添加 RPM 包到 ISO 镜像。具体要求请参见 [安装工具](# 安装工具) 章节。 - - 添加自定义文件:若用户希望在目标 ISO 系统安装或启动时,能够进行自定义的硬件检查、系统配置检查、驱动安装等操作,可编写自定义文件,并打包到 ISO 镜像。 - - 添加驱动、命令、库文件:当 openEuler 的 RPM 包源未包含用户需要的驱动、命令或库文件时,可以使用 imageTailor 工具将对应驱动、命令或库文件打包至 ISO 镜像。 - -- 配置系统参数 - - - 配置主机参数:为了确保 ISO 镜像安装和启动成功,需要配置主机参数。 - - 配置分区:用户可以根据业务规划配置业务分区,同时可以调整系统分区。 - - 配置网络:用户可以根据需要配置系统网络参数,例如:网卡名称、IP 地址、子网掩码。 - - 配置初始密码:为了确保 ISO 镜像安装和启动成功,需要配置 root 初始密码和 grub 初始密码。 - - 配置内核参数:用户可以根据需求配置内核的命令行参数。 - -- 配置安全加固策略 - - imageTailor 提供了默认地安全加固策略。用户可以根据业务需要,通过编辑 security_s.conf 对系统进行二次加固(仅在系统 ISO 镜像定制阶段),具体的操作方法请参见 《 [安全加固指南](https://docs.openeuler.org/zh/docs/22.03_LTS/docs/SecHarden/secHarden.html) 》。 - -- 制作操作系统 ISO 镜像 - - 使用 imageTailor 工具制作操作系统 ISO 镜像。 - -### 定制业务包 - -用户可以根据业务需要,将业务 RPM 包、自定义文件、驱动、命令和库文件打包至目标 ISO 镜像。 - -#### 
配置本地 repo 源 - -定制 ISO 操作系统镜像,必须在 /opt/imageTailor/repos/euler_base/ 目录配置 repo 源。本节主要介绍配置本地 repo 源的方法。 - -1. 下载 openEuler 发布的 ISO (必须使用 openEuler 发布 everything 版本镜像 的 RPM 包)。 - ```shell - $ cd /opt - $ wget https://repo.openeuler.org/openEuler-22.03-LTS/ISO/aarch64/openEuler-22.03-LTS-everything-aarch64-dvd.iso - ``` - -2. 创建挂载目录 /opt/openEuler_repo ,并挂载 ISO 到该目录 。 - ```shell - $ sudo mkdir -p /opt/openEuler_repo - $ sudo mount openEuler-22.03-LTS-everything-aarch64-dvd.iso /opt/openEuler_repo - mount: /opt/openEuler_repo: WARNING: source write-protected, mounted read-only. - ``` - -3. 拷贝 ISO 中的 RPM 包到 /opt/imageTailor/repos/euler_base/ 目录下。 - ```shell - $ sudo rm -rf /opt/imageTailor/repos/euler_base && sudo mkdir -p /opt/imageTailor/repos/euler_base - $ sudo cp -ar /opt/openEuler_repo/Packages/* /opt/imageTailor/repos/euler_base - $ sudo chmod -R 644 /opt/imageTailor/repos/euler_base - $ sudo ls /opt/imageTailor/repos/euler_base|wc -l - 2577 - $ sudo umount /opt/openEuler_repo && sudo rm -rf /opt/openEuler_repo - $ cd /opt/imageTailor - ``` - -#### 添加文件 - -用户可以根据需要添加文件到 ISO 镜像,此处的文件类型可以是用户自定义文件、驱动、命令、库文件。用户只需要将文件放至 /opt/imageTailor/custom/cfg_openEuler/usr_file 目录下即可。 - -##### 注意事项 - -- 命令必须具有可执行权限,否则 imageTailor 工具无法将该命令打包至 ISO 中。 - -- 存放在 /opt/imageTailor/custom/cfg_openEuler/usr_file 目录下的文件,会生成在 ISO 根目录下,所以文件的目录结构必须是从根目录开始的完整路径,以便 imageTailor 工具能够将该文件放至正确的目录下。 - - 例如:假设希望文件 file1 在 ISO 的 /opt 目录下,则需要在 usr_file 目录下新建 opt 目录,再将 file1 文件拷贝至 opt 目录。如下: - - ```shell - $ pwd - /opt/imageTailor/custom/cfg_openEuler/usr_file - - $ tree - . 
- ├── etc - │   ├── default - │   │   └── grub - │   └── profile.d - │   └── csh.precmd - └── opt - └── file1 - - 4 directories, 3 files - ``` - -- 存放在 /opt/imageTailor/custom/cfg_openEuler/usr_file 目录下的目录必须是真实路径(例如路径中不包含软连接。可在系统中使用 `realpath` 或 `readlink -f` 命令查询真实路径)。 - -- 如果需要在系统启动或者安装阶段调用用户提供的脚本,即 hook 脚本,则需要将该文件放在 hook 目录下。 - -#### 添加 RPM 包 - -##### 操作流程 - -用户可以添加 RPM 包(驱动、命令或库文件)到 ISO 镜像,操作步骤如下: - ->![](./public_sys-resources/icon-note.gif) **说明:** -> ->- 下述 rpm.conf 和 cmd.conf 均在 /opt/imageTailor/custom/cfg_openEuler/ 目录下。 ->- 下述 RPM 包裁剪粒度是指 sys_cut='no' 。裁剪粒度详情请参见 [配置主机参数](#配置主机参数) 。 ->- 若没有配置本地 repo 源,请参见 [配置本地 repo 源 ](#配置本地 repo 源)进行配置。 -> - -1. 确认 /opt/imageTailor/repos/euler_base/ 目录中是否包含需要添加的 RPM 包。 - - - 是,请执行步骤 2 。 - - 否,请执行步骤 3 。 -2. 在 rpm.conf 的 \ 字段配置该 RPM 包信息。 - - 若为 RPM 包裁剪粒度,则操作完成。 - - 若为其他裁剪粒度,请执行步骤 4 。 -3. 用户自己提供 RPM 包,放至 /opt/imageTailor/custom/cfg_openEuler/usr_rpm 目录下。如果 RPM 包依赖于其他 RPM 包,也必须将依赖包放至该目录,因为新增 RPM 包需要和依赖 RPM 包同时打包至 ISO 镜像。 - - 若为用户 RPM 包文件裁剪,则执行 4 。 - - 其他裁剪粒度,则操作完成。 -4. 请在 rpm.conf 和 cmd.conf 中配置该 RPM 包中要保留的驱动、命令和库文件。如果有要裁剪的普通文件,需要在 cmd.conf 文件中的 \\ 区域配置。 - - -##### 配置文件说明 - -| 对象 | 对应配置文件 | 填写区域 | -| :----------- | :----------- | :----------------------------------------------------------- | -| 添加驱动 | rpm.conf | \
\
\

说明:其中驱动名称所在路径为 " /lib/modules/{内核版本号}/kernel/ " 的相对路径 | -| 添加命令 | cmd.conf | \
\
\
| -| 添加库文件 | cmd.conf | \
\
\
| -| 删除其他文件 | cmd.conf | \
\
\

说明:普通文件名称必须包含绝对路径 | - -**示例** - -- 添加驱动 - - ```shell - - - - - ...... - - ``` - -- 添加命令 - - ```shell - - - - - ...... - - ``` - -- 添加库文件 - - ```shell - - - - - - ``` - -- 删除其他文件 - - ```shell - - - - - - ``` - -#### 添加 hook 脚本 - -hook 脚本由 OS 在启动和安装过程中调用,执行脚本中定义的动作。imageTailor 工具存放 hook 脚本的目录为 custom/cfg_openEuler/usr_install/hook,且其下有不同子目录,每个子目录代表 OS 启动或安装的不同阶段,用户根据脚本需要被调用的阶段存放,OS 会在对应阶段调用该脚本。用户可以根据需要存放自定义脚本到指定目录。 - -##### **脚本命名规则** - -用户可自定义脚本名称,必须 "S+数字(至少两位,个位数以0开头)" 开头,数字代表 hook 脚本的执行顺序。脚本名称示例:S01xxx.sh - ->![](./public_sys-resources/icon-note.gif) **说明:** -> ->hook 目录下的脚本是通过 source 方式调用,所以脚本中需要谨慎使用 exit 命令,因为调用 exit 命令之后,整个安装的脚本程序也同步退出了。 - - - -##### hook 子目录说明 - -| hook 子目录 | hook 脚本举例 | hook 执行点 | 说明 | -| :-------------------- | :---------------------| :------------------------------- | :----------------------------------------------------------- | -| insmod_drv_hook | 无 | 加载 OS 驱动之后 | 无 | -| custom_install_hook | S01custom_install.sh | 驱动加载完成后(即 insmod_drv_hook 执行后) | 用户可以自定义安装过程,不需要使用 OS 默认安装流程。 | -| env_check_hook | S01check_hw.sh | 安装初始化之前 | 初始化之前检查硬件配置规格、获取硬件类型。 | -| set_install_ip_hook | S01set_install_ip.sh | 安装初始化过程中,配置网络时 | 用户根据自身组网,自定义网络配置。 | -| before_partition_hook | S01checkpart.sh | 在分区前调用 | 用户可以在分区之前检查分区配置文件是否正确。 | -| before_setup_os_hook | 无 | 解压repo之前 | 用户可以进行自定义分区挂载操作。
如果安装包解压的路径不是分区配置中指定的根分区。则需要用户自定义分区挂载,并将解压路径赋值给传入的全局变量。 | -| before_mkinitrd_hook | S01install_drv.sh | 执行 mkinitrd 操作之前 | initrd 放在硬盘的场景下,执行 mkinitrd 操作之前的 hook。用户可以进行添加、更新驱动文件等自定义操作。 | -| after_setup_os_hook | 无 | 安装完系统之后 | 用户可以在安装完成之后进行系统文件的自定义操作,包括修改 grub.cfg 等 | -| install_succ_hook | 无 | 系统安装流程成功结束 | 用户执行解析安装信息,回传安装是否成功等操作。install_succ_hook 不可以设置为 install_break。 | -| install_fail_hook | 无 | 系统安装失败 | 用户执行解析安装信息,回传安装是否成功等操作。install_fail_hook 不可以设置为 install_break。 | - -### 配置系统参数 - -开始制作操作系统 ISO 镜像之前,需要配置系统参数,包括主机参数、初始密码、分区、网络、编译参数和系统命令行参数。 - -#### 配置主机参数 - - /opt/imageTailor/custom/cfg_openEuler/sys.conf 文件的 \ \ 区域用于配置系统的常用参数,例如主机名、内核启动参数等。 - -openEuler 提供的默认配置如下,用户可以需要进行修改: - -```shell - - sys_service_enable='ipcc' - sys_service_disable='cloud-config cloud-final cloud-init-local cloud-init' - sys_utc='yes' - sys_timezone='' - sys_cut='no' - sys_usrrpm_cut='no' - sys_hostname='Euler' - sys_usermodules_autoload='' - sys_gconv='GBK' - -``` - -配置中的各参数含义如下: - -- sys_service_enable - - 可选配置。OS 默认启用的服务,多个服务请以空格分开。如果用户不需要新增系统服务,请保持默认值,默认值为 ipcc 。配置时请注意: - - - 只能在默认配置的基础上增加系统服务,不能删减系统服务。 - - 可以配置业务相关的服务,但是需要 repo 源中包含业务 RPM 包。 - - 默认只开启该参数中配置的服务,如果服务依赖其他服务,需要将被依赖的服务也配置在该参数中。 - -- sys_service_disable - - 可选配置。禁止服务开机自启动的服务,多个服务请以空格分开。如果用户没有需要禁用的系统服务,请修改该参数为空。 - -- sys_utc - - 必选配置。是否采用 UTC 时间。yes 表示采用,no 表示不采用,默认值为 yes 。 - -- sys_timezone - - 可选配置。设置时区,即该单板所处的时区。可配置的范围为 openEuler 支持的时区,可通过 /usr/share/zoneinfo/zone.tab 文件查询。 - -- sys_cut - - 必选配置。是否裁剪 RPM 包。可配置为 yes、no 或者 debug 。yes 表示裁剪,no 表示不裁剪(仅安装 rpm.conf 中的 RPM 包),debug 表示裁剪但会保留 `rpm` 命令方便安装后定制。默认值为 no 。 - - >![](./public_sys-resources/icon-note.gif) 说明: - > - > - imageTailor 工具会先安装用户添加的 RPM 包,再删除 cmd.conf - > 中 \ 区域的文件,最后删除 - > cmd.conf 和 rpm.conf 中未配置的命令、库和驱动。 - > - sys_cut='yes' 时,imageTailor 工具不支持 `rpm` 命令的安装,即使在 rpm.conf 中配置了也不生效。 - -- sys_usrrpm_cut - - 必选配置。是否裁剪用户添加到 /opt/imageTailor/custom/cfg_openEuler/usr_rpm 目录下的 RPM 包。yes 表示裁剪,no 表示不裁剪。默认值为 no 。 - - - sys_usrrpm_cut='yes' :imageTailor 工具会先安装用户添加的 RPM 
包,然后删除 cmd.conf 中 \ 区域配置的文件,最后删除 cmd.conf 和 rpm.conf 中未配置的命令、库和驱动。 - - - sys_usrrpm_cut='no' :imageTailor 工具会安装用户添加的 RPM 包,不删除用户 RPM 包中的文件。 - -- sys_hostname - - 必选配置。主机名。大批量部署 OS 时,部署成功后,建议修改每个节点的主机名,确保各个节点的主机名不重复。 - - 主机名要求:字母、数字、"-" 的组合,首字母必须是字母或数字。字母支持大小写。字符个数不超过 63 。默认值为 Euler 。 - -- sys_usermodules_autoload - - 可选配置。系统启动阶段加载的驱动,配置该参数时,不需要填写后缀 .ko 。如果有多个驱动,请以空格分开。默认为空,不加载额外驱动。 - -- sys_gconv - - 可选配置。该参数用于定制 /usr/lib/gconv, /usr/lib64/gconv ,配置取值为: - - - null/NULL:表示不配置。如果裁剪系统(sys_cut=“yes”),则/usr/lib/gconv 和 /usr/lib64/gconv 会被删除。 - - all/ALL:不裁剪 /usr/lib/gconv 和 /usr/lib64/gconv 。 - - xxx,xxx: 保留 /usr/lib/gconv 和 /usr/lib64/gconv 目录下对应的文件。若需要保留多个文件,可用 "," 分隔。 - -- sys_man_cut - - 可选配置。配置是否裁剪 man 文档。yes 表示裁剪,no 表示不裁剪。默认值为 yes 。 - - - ->![](./public_sys-resources/icon-note.gif) 说明: -> -> sys_cut 和 sys_usrrpm_cut 同时配置时,sys_cut 优先级更高,即遵循如下原则: -> -> - sys_cut='no' -> -> 无论 sys_usrrpm_cut='no' 还是 sys_usrrpm_cut='yes' ,都为系统 RPM 包裁剪粒度,即imageTailor 会安装 repo 源中的 RPM 包和 usr_rpm 目录下的 RPM 包,但不会裁剪 RPM 包中的文件。即使用户不需要这些 RPM 包中的部分文件,imageTailor 也不会进行裁剪。 -> -> - sys_cut='yes' -> -> - sys_usrrpm_cut='no' -> -> 系统 RPM 包文件裁剪粒度:imageTailor 会根据用户配置,裁剪 repo 源中 RPM 包的文件。 -> -> - sys_usrrpm_cut='yes' -> -> 系统和用户 RPM 包文件裁剪粒度:imageTailor 会根据用户的配置,裁剪 repo 源和 usr_rpm 目录中 RPM 包的文件。 -> - - - -#### 配置初始密码 - -操作系统安装时,必须具有 root 初始密码和 grub 初始密码,否则裁剪得到的 ISO 在安装后无法使用 root 账号进行登录。本节介绍配置初始密码的方法。 - -> ![](./public_sys-resources/icon-note.gif)说明: -> -> root 初始密码和 grub 初始密码,必须由用户自行配置。 - -##### 配置 root 初始密码 - -###### 简介 - -root 初始密码保存在 "/opt/imageTailor/custom/cfg_openEuler/rpm.conf" 中,用户通过修改该文件配置 root 初始密码。 - ->![](./public_sys-resources/icon-note.gif) **说明:** -> ->- 若使用 `mkdliso` 命令制作 ISO 镜像时需要使用 --minios yes/force 参数(制作在系统安装时进行系统引导的 initrd),则还需要在 /opt/imageTailor/kiwi/minios/cfg_minios/rpm.conf 中填写相应信息。 - -/opt/imageTailor/custom/cfg_openEuler/rpm.conf 中 root 初始密码的默认配置如下,需要用户自行添加: - -``` - - - -``` - -各参数含义如下: - -- group:用户所属组。 -- pwd:用户初始密码的加密密文,加密算法为 SHA-512。${pwd} 需要替换成用户实际的加密密文。 -- home:用户的家目录。 
-- name:需要配置用户的用户名。 - -###### 修改方法 - -用户在制作 ISO 镜像前需要修改 root 用户的初始密码,这里给出设置 root 初始密码的方法(需使用 root 权限): - -1. 添加用于生成密码的用户,此处假设 testUser。 - - ```shell - $ sudo useradd testUser - ``` - -2. 设置 testUser 用户的密码。参考命令如下,根据提示设置密码: - - ```shell - $ sudo passwd testUser - Changing password for user testUser. - New password: - Retype new password: - passwd: all authentication tokens updated successfully. - ``` - -3. 查看 /etc/shadow 文件,testUser 后的内容(两个 : 间的字符串)即为加密后的密码。 - - ``` shell script - $ sudo cat /etc/shadow | grep testUser - testUser:$6$YkX5uFDGVO1VWbab$jvbwkZ2Kt0MzZXmPWy.7bJsgmkN0U2gEqhm9KqT1jwQBlwBGsF3Z59heEXyh8QKm3Qhc5C3jqg2N1ktv25xdP0:19052:0:90:7:35:: - ``` - -4. 拷贝上述加密密码替换 /opt/imageTailor/custom/cfg_openEuler/rpm.conf 中的 pwd 字段,如下所示: - ``` shell script - - - - ``` - -5. 若使用 `mkdliso` 命令制作 ISO 镜像时需要使用 --minios yes/force 参数,请修改 /opt/imageTailor/kiwi/minios/cfg_minios/rpm.conf 中对应用户的 pwd 字段。 - - ``` shell script - - - - ``` - -##### 配置 grub 初始密码 - -grub 初始密码保存在 /opt/imageTailor/custom/cfg_openEuler/usr_file/etc/default/grub 中,用户通过修改该文件配置 grub 初始密码。如果未配置 grub 初始密码,制作 ISO 镜像会失败。 - -> ![](./public_sys-resources/icon-note.gif)说明: -> -> - 配置 grub 初始密码需要使用 root 权限。 -> - grub 密码对应的默认用户为 root 。 -> -> - 系统中需有 grub2-set-password 命令,若不存在,请提前安装该命令。 - -1. 执行如下命令,根据提示设置 grub 密码: - - ```shell - $ sudo grub2-set-password -o ./ - Enter password: - Confirm password: - grep: .//grub.cfg: No such file or directory - WARNING: The current configuration lacks password support! - Update your configuration with grub2-mkconfig to support this feature. - ``` - -2. 命令执行完成后,会在当前目录生成 user.cfg 文件,grub.pbkdf2.sha512 开头的内容即 grub 加密密码。 - - ```shell - $ sudo cat user.cfg - GRUB2_PASSWORD=grub.pbkdf2.sha512.10000.CE285BE1DED0012F8B2FB3DEA38782A5B1040FEC1E49D5F602285FD6A972D60177C365F1 - B5D4CB9D648AD4C70CF9AA2CF9F4D7F793D4CE008D9A2A696A3AF96A.0AF86AB3954777F40D324816E45DD8F66CA1DE836DC7FBED053DB02 - 4456EE657350A27FF1E74429546AD9B87BE8D3A13C2E686DD7C71D4D4E85294B6B06E0615 - ``` - -3. 
复制上述密文,并在 /opt/imageTailor/custom/cfg_openEuler/usr_file/etc/default/grub 文件中增加如下配置: - - ```shell - GRUB_PASSWORD="grub.pbkdf2.sha512.10000.CE285BE1DED0012F8B2FB3DEA38782A5B1040FEC1E49D5F602285FD6A972D60177C365F1 - B5D4CB9D648AD4C70CF9AA2CF9F4D7F793D4CE008D9A2A696A3AF96A.0AF86AB3954777F40D324816E45DD8F66CA1DE836DC7FBED053DB02 - 4456EE657350A27FF1E74429546AD9B87BE8D3A13C2E686DD7C71D4D4E85294B6B06E0615" - ``` - - -#### 配置分区 - -若用户想调整系统分区或业务分区,可以通过修改 /opt/imageTailor/custom/cfg_openEuler/sys.conf 文件中的 \ 实现。 - ->![](./public_sys-resources/icon-note.gif) **说明:** -> ->- 系统分区:存放操作系统的分区 ->- 业务分区:存放业务数据的分区 ->- 差别:在于存放的内容,而每个分区的大小、挂载路径和文件系统类型都不是区分业务分区和系统分区的依据。 ->- 配置分区为可选项,用户也可以在安装 OS 之后,手动配置分区 - - \ 的配置格式为: - -hd 磁盘号 挂载路径 分区大小 分区类型 文件系统类型 [二次格式化标志位] - -其默认配置如下: - -``` shell script - -hd0 /boot 512M primary ext4 yes -hd0 /boot/efi 200M primary vfat yes -hd0 / 30G primary ext4 -hd0 - - extended - -hd0 /var 1536M logical ext4 -hd0 /home max logical ext4 - -``` - -各参数含义如下: - -- hd 磁盘号 - 磁盘的编号。请按照 hdx 的格式填写,x 指第 x 块盘。 - - >![](./public_sys-resources/icon-note.gif) **说明:** - > - >分区配置只在被安装机器的磁盘能被识别时才有效。 - -- 挂载路径 - 指定分区挂载的路径。用户既可以配置业务分区,也可以对默认配置中的系统分区进行调整。如果不挂载,则设置为 '-'。 - - >![](./public_sys-resources/icon-note.gif) **说明:** - > - >- 分区配置中必须有 '/' 挂载路径。其他的请用户自行调整。 - >- 采用 UEFI 引导时,在 x86_64 的分区配置中必须有 '/boot' 挂载路径,在 AArch64 的分区配置中必须有 '/boot/efi' 挂载路径。 - -- 分区大小 - 分区大小的取值有以下四种: - - - G/g:指定以 GB 为单位的分区大小,例如:2G。 - - M/m:指定以 MB 为单位的分区大小,例如:300M。 - - T/t:指定以 TB 为单位的分区大小,例如:1T。 - - MAX/max:指定将硬盘上剩余的空间全部用来创建一个分区。只能在最后一个分区配置该值。 - - >![](./public_sys-resources/icon-note.gif) **说明:** -> - >- 分区大小不支持小数,如果是小数,请换算成其他单位,调整为整数的数值。例如:不能填写 1.5G,应填写为 1536M。 - >- 分区大小取 MAX/max 值时,剩余分区大小不能超过支持文件系统类型的限制(默认文件系统类型 ext4,限制大小 16T)。 - -- 分区类型 - 分区有以下三种: - - - 主分区: primary - - 扩展分区:extended(该分区只需配置 hd 磁盘号即可) - - 逻辑分区:logical - -- 文件系统类型 - 目前支持的文件系统类型有:ext4、vfat - -- 二次格式化标志位 - 可选配置,表示二次安装时是否格式化: - - - 是:yes - - 否:no 。不配置默认为 no 。 - - >![](./public_sys-resources/icon-note.gif) **说明:** - > - 
>二次格式化是指本次安装之前,磁盘已安装过 openEuler 系统。当前一次安装跟本次安装的使用相同的分区表配置(分区大小,挂载点,文件类型)时,该标志位可以配置是否格式化之前的分区,'/boot' 和 '/' 分区除外,每次都会重新格式化。如果目标机器第一次安装,则该标志位不生效,所有指定了文件系统的分区都会进行格式化。 - -#### 配置网络 - -系统网络参数保存在 /opt/imageTailor/custom/cfg_openEuler/sys.conf 中,用户可以通过该文件的\\ 配置修改目标 ISO 镜像的网络参数,例如:网卡名称、IP地址、子网掩码。 - -sys.conf 中默认的网络配置如下,其中 netconfig-0 代表网卡 eth0。如果需要配置多块网卡,例如eth1,请在配置文件中增加 \\,并在其中填写网卡 eth1 的各项参数。 - -```shell - -BOOTPROTO="dhcp" -DEVICE="eth0" -IPADDR="" -NETMASK="" -STARTMODE="auto" - -``` - -各参数含义请参见下表: - -- | 参数名称 | 是否必配 | 参数值 | 说明 | - | :-------- | -------- | :------------------------------------------------ | :----------------------------------------------------------- | - | BOOTPROTO | 是 | none / static / dhcp | none:引导时不使用协议,不配地址
static:静态分配地址
dhcp:使用 DHCP 协议动态获取地址 | -| DEVICE | 是 | 如:eth1 | 网卡名称 | -| IPADDR | 否 | 如:192.168.11.100 | IP 地址
当 BOOTPROTO 参数为 static 时,该参数必配;其他情况下,该参数不用配置 | -| NETMASK | 否 | - | 子网掩码
当 BOOTPROTO 参数为 static 时,该参数必配;其他情况下,该参数不用配置 | - | STARTMODE | 是 | manual / auto / hotplug / ifplugd / nfsroot / off | 启用网卡的方法:
manual:用户在终端执行 ifup 命令启用网卡。
auto \ hotplug \ ifplugd \ nfsroot:当 OS 识别到该网卡时,便启用该网卡。
off:任何情况下,网卡都无法被启用。
各参数更具体的说明请在制作 ISO 镜像的机器上执行 `man ifcfg` 命令查看。 | - - -#### 配置内核参数 - -为了系统能够更稳定高效地运行,用户可以根据需要修改内核命令行参数。imageTailor 工具制作的 OS 镜像,可以通过修改 /opt/imageTailor/custom/cfg_openEuler/usr_file/etc/default/grub 中的 GRUB_CMDLINE_LINUX 配置实现内核命令行参数修改。 GRUB_CMDLINE_LINUX 中内核命令行参数的默认配置如下: - -```shell -GRUB_CMDLINE_LINUX="net.ifnames=0 biosdevname=0 crashkernel=512M oops=panic softlockup_panic=1 reserve_kbox_mem=16M crash_kexec_post_notifiers panic=3 console=tty0" -``` - -此处各配置含义如下(其余常见的内核命令行参数请查阅内核相关文档): - -- net.ifnames=0 biosdevname=0 - - 以传统方式命名网卡。 - -- crashkernel=512M - - 为 kdump 预留的内存空间大小为 512 MB。 - -- oops=panic panic=3 - - 内核 oops 时直接 panic,并且 3 秒后重启系统。 - -- softlockup_panic=1 - - 在检测到软死锁(soft-lockup)时让内核 panic。 - -- reserve_kbox_mem=16M - - 为 kbox 预留的内存空间大小为 16 MB。 - -- console=tty0 - - 指定第一个虚拟控制台的输出设备为 tty0。 - -- crash_kexec_post_notifiers - - 系统 crash 后,先调用注册到 panic 通知链上的函数,再执行 kdump。 - -### 制作系统 - -操作系统定制完成后,可以通过 mkdliso 脚本制作系统镜像文件。 imageTailor 制作的 OS 为 ISO 格式的镜像文件。 - -#### 命令介绍 - -##### 命令格式 - -**mkdliso -p openEuler -c custom/cfg_openEuler [--minios yes|no|force] [--sec] [-h]** - -##### 参数说明 - -| 参数名称 | 是否必选 | 参数含义 | 取值范围 | -| -------- | -------- | ------------------------------------------------------------ | ------------------------------------------------------------ | -| -p | 是 | 设置产品名称 | openEuler | -| c | 是 | 指定配置文件的相对路径 | custom/cfg_openEuler | -| --minios | 否 | 制作在系统安装时进行系统引导的 initrd | 默认为 yes
yes:第一次执行命令时会制作 initrd,之后执行命令会判断 'usr_install/boot'
目录下是否存在 initrd(sha256 校验)。如果存在,就不重新制作 initrd,否则制作 initrd 。
no:不制作 initrd,采用原有方式,系统引导和运行使用的 initrd 相同。
force:强制制作 initrd,不管 'usr_install/boot' 目录下是否存在 initrd。 | -| --sec | 否 | 是否对生成的 ISO 进行安全加固
如果用户不输入该参数,则由此造成的安全风险由用户承担 | 无 | -| -h | 否 | 获取帮助信息 | 无 | - -#### 制作指导 - -使用 mkdliso 制作 ISO 镜像的操作步骤如下: - ->![](./public_sys-resources/icon-note.gif) 说明: -> -> - mkdliso 所在的绝对路径中不能有空格,否则会导致制作 ISO 失败。 -> - 制作 ISO 的环境中,umask 的值必须设置为 0022。 - -1. 使用 root 权限,执行 mkdliso 命令,生成 ISO 镜像文件。参考命令如下: - - ```shell - # sudo /opt/imageTailor/mkdliso -p openEuler -c custom/cfg_openEuler --sec - ``` - - 命令执行完成后,制作出的新文件在 /opt/imageTailor/result/{日期} 目录下,包括 openEuler-aarch64.iso 和 openEuler-aarch64.iso.sha256 。 - -2. 验证 ISO 镜像文件的完整性。此处假设日期为 2022-03-21-14-48 。 - - ```shell - $ cd /opt/imageTailor/result/2022-03-21-14-48/ - $ sha256sum -c openEuler-aarch64.iso.sha256 - ``` - - 回显如下,表示 ISO 镜像文件完整,ISO 制作完成。 - - ``` - openEuler-aarch64.iso: OK - ``` - - 若回显如下,表示镜像不完整,说明 ISO 镜像文件完整性被破坏,需要重新制作。 - - ```shell - openEuler-aarch64.iso: FAILED - sha256sum: WARNING: 1 computed checksum did NOT match - ``` - -3. 查看日志 - - 镜像制作完成后,可以根据需要(例如制作出错时)查看日志。第一次制作镜像时,对应的日志和安全加固日志被压缩为一个 tar 包(日志的命名格式为:sys_custom_log_{*日期* }.tar.gz),存放在 result/log 目录下。该目录只保留最近时间的 50 个日志压缩包,超过 50 个时会对旧文件进行覆盖。 - - - -### 裁剪时区 - -定制完成的 ISO 镜像安装后,用户可以根据需求裁剪 openEuler 系统支持的时区。本节介绍裁剪时区的方法。 - -openEuler 操作系统支持的时区信息存放在时区文件夹 /usr/share/zoneinfo 下,可通过如下命令查看: - -```shell -$ ls /usr/share/zoneinfo/ -Africa/ America/ Asia/ Atlantic/ Australia/ Etc/ Europe/ -Pacific/ zone.tab -``` - -其中每个子文件夹代表一个 Area ,当前 Area 包括:大陆、海洋以及 Etc 。每个 Area 文件夹内部则包含了隶属于其的 Location 。一个 Location 一般为一座城市或者一个岛屿。 - -所有时区均以 Area/Location 的形式来表示,比如中国大陆南部使用北京时间,其时区为 Asia/Shanghai(Location 并不一定会使用首都)。对应的,其时区文件为: - -``` -/usr/share/zoneinfo/Asia/Shanghai -``` - -若用户希望裁剪某些时区,则只需将对应的时区文件删除即可。 - -### 定制示例 - -本节给出使用 imageTailor 工具定制一个 ISO 操作系统镜像的简易方案,方便用户了解制作的整体流程。 - -1. 检查制作 ISO 所在环境是否满足要求。 - - ```shell - $ cat /etc/openEuler-release - openEuler release 22.03 LTS - ``` - -2. 确保根目录有 40 GB 以上空间。 - - ```shell - $ df -h - Filesystem Size Used Avail Use% Mounted on - ...... - /dev/vdb 196G 28K 186G 1% / - ``` - -3. 
安装 imageTailor 裁剪工具。具体安装方法请参见 [安装工具](#安装工具) 章节。 - - ```shell - $ sudo yum install -y imageTailor - $ ll /opt/imageTailor/ - total 88K - drwxr-xr-x. 3 root root 4.0K Mar 3 08:00 custom - drwxr-xr-x. 10 root root 4.0K Mar 3 08:00 kiwi - -r-x------. 1 root root 69K Mar 3 08:00 mkdliso - drwxr-xr-x. 2 root root 4.0K Mar 9 14:48 repos - drwxr-xr-x. 2 root root 4.0K Mar 9 14:48 security-tool - ``` - -4. 配置本地 repo 源。 - - ```shell - $ wget https://repo.openeuler.org/openEuler-22.03-LTS/ISO/aarch64/openEuler-22.03-LTS-everything-aarch64-dvd.iso - $ sudo mkdir -p /opt/openEuler_repo - $ sudo mount openEuler-22.03-LTS-everything-aarch64-dvd.iso /opt/openEuler_repo - mount: /opt/openEuler_repo: WARNING: source write-protected, mounted read-only. - $ sudo rm -rf /opt/imageTailor/repos/euler_base && sudo mkdir -p /opt/imageTailor/repos/euler_base - $ sudo cp -ar /opt/openEuler_repo/Packages/* /opt/imageTailor/repos/euler_base - $ sudo chmod -R 644 /opt/imageTailor/repos/euler_base - $ sudo ls /opt/imageTailor/repos/euler_base|wc -l - 2577 - $ sudo umount /opt/openEuler_repo && sudo rm -rf /opt/openEuler_repo - $ cd /opt/imageTailor - ``` - -5. 修改 grub/root 密码 - - 以下 ${pwd} 的实际内容请参见 [配置初始密码](#配置初始密码) 章节生成并替换。 - - ```shell - $ cd /opt/imageTailor/ - $ sudo vi custom/cfg_openEuler/usr_file/etc/default/grub - GRUB_PASSWORD="${pwd1}" - $ - $ sudo vi kiwi/minios/cfg_minios/rpm.conf - - - - $ - $ sudo vi custom/cfg_openEuler/rpm.conf - - - - ``` - -6. 执行裁剪命令。 - - ```shell - $ sudo rm -rf /opt/imageTailor/result - $ sudo ./mkdliso -p openEuler -c custom/cfg_openEuler --minios force - ...... - Complete release iso file at: result/2022-03-09-15-31/openEuler-aarch64.iso - move all mkdliso log file to result/log/sys_custom_log_20220309153231.tar.gz - $ ll result/2022-03-09-15-31/ - total 889M - -rw-r--r--. 1 root root 889M Mar 9 15:32 openEuler-aarch64.iso - -rw-r--r--. 
1 root root 87 Mar 9 15:32 openEuler-aarch64.iso.sha256 - ``` - - diff --git "a/docs/zh/docs/TailorCustom/isocut\344\275\277\347\224\250\346\214\207\345\215\227.md" "b/docs/zh/docs/TailorCustom/isocut\344\275\277\347\224\250\346\214\207\345\215\227.md" deleted file mode 100644 index 79d839523105d0e9415064c965a7f5b294ae2463..0000000000000000000000000000000000000000 --- "a/docs/zh/docs/TailorCustom/isocut\344\275\277\347\224\250\346\214\207\345\215\227.md" +++ /dev/null @@ -1,384 +0,0 @@ -# isocut 使用指南 - -- [简介](#简介) -- [软硬件要求](#软硬件要求) -- [安装工具](#安装工具) -- [裁剪定制镜像](#裁剪定制镜像) - - [命令介绍](#命令介绍) - - [软件包来源](#软件包来源) - - [操作指导](#操作指导) -- [FAQ](#FAQ) - - [默认 rpm 包列表安装系统失败](#默认-rpm-包列表安装系统失败) - - -## 简介 -openEuler 光盘镜像较大,下载、传输镜像很耗时。另外,使用 openEuler 光盘镜像安装操作系统时,会安装镜像所包含的全量 RPM 软件包,用户无法只安装部分所需的软件包。 - -在某些场景下,用户不需要安装镜像提供的全量软件包,或者需要一些额外的软件包。因此,openEuler 提供了镜像裁剪定制工具。通过该工具,用户可以基于 openEuler 光盘镜像裁剪定制仅包含所需 RPM 软件包的 ISO 镜像。这些软件包可以来自原有 ISO 镜像,也可以额外指定,从而满足用户定制需求。 - -本文档介绍 openEuler 镜像裁剪定制工具的安装和使用方法,以指导用户更好的完成镜像裁剪定制。 - -## 软硬件要求 - -使用 openEuler 裁剪定制工具制作 ISO 所使用的机器需要满足如下软硬件要求: - -- CPU 架构为 AArch64 或者 x86_64 -- 操作系统为 openEuler 20.03 LTS SP3 -- 建议预留 30 GB 以上的磁盘空间(用于运行裁剪定制工具和存放 ISO 镜像) - -## 安装工具 - -此处以 openEuler 20.03 LTS SP3 版本的 AArch64 架构为例,介绍 ISO 镜像裁剪定制工具的安装操作。 - -1. 确认机器已安装操作系统 openEuler 20.03 LTS SP3(镜像裁剪定制工具的运行环境)。 - - ``` shell script - $ cat /etc/openEuler-release - openEuler release 20.03 (LTS-SP3) - ``` - -2. 下载对应架构的 ISO 镜像(必须是 everything 版本),并存放在任一目录(建议该目录磁盘空间大于 20 GB),此处假设存放在 /home/isocut_iso 目录。 - - AArch64 架构的镜像下载链接为: - - https://repo.openeuler.org/openEuler-20.03-LTS-SP3/ISO/aarch64/openEuler-20.03-LTS-SP3-everything-aarch64-dvd.iso - - > **说明:** - > x86_64 架构的镜像下载链接为: - > - > https://repo.openeuler.org/openEuler-20.03-LTS-SP3/ISO/x86_64/openEuler-20.03-LTS-SP3-everything-x86_64-dvd.iso - -3. 
创建文件 /etc/yum.repos.d/local.repo,配置对应 yum 源。配置内容参考如下,其中 baseurl 是用于挂载 ISO 镜像的目录: - - ``` shell script - [local] - name=local - baseurl=file:///home/isocut_mount - gpgcheck=0 - enabled=1 - ``` - -4. 使用 root 权限,挂载光盘镜像到 /home/isocut_mount 目录(请与上述 repo 文件中配置的 baseurl 保持一致)作为 yum 源,参考命令如下: - - ```shell - sudo mount -o loop /home/isocut_iso/openEuler-20.03-LTS-SP3-everything-aarch64-dvd.iso /home/isocut_mount - ``` - -5. 使 yum 源生效: - - ```shell - yum clean all - yum makecache - ``` - -6. 使用 root 权限,安装镜像裁剪定制工具: - - ```shell - sudo yum install -y isocut - ``` - -7. 使用 root 权限,确认工具已安装成功。 - - ```shell - $ sudo isocut -h - Checking input ... - usage: isocut [-h] [-t temporary_path] [-r rpm_path] [-k file_path] source_iso dest_iso - - Cut openEuler iso to small one - - positional arguments: - source_iso source iso image - dest_iso destination iso image - - optional arguments: - -h, --help show this help message and exit - -t temporary_path temporary path - -r rpm_path extern rpm packages path - -k file_path kickstart file - ``` - - - -## 裁剪定制镜像 - -此处介绍如何使用镜像裁剪定制工具基于 openEuler 光盘镜像裁剪或添加额外 RPM 软件包制作新镜像的方法。 - -### 命令介绍 - -#### 命令格式 - -镜像裁剪定制工具通过 isocut 命令执行功能。命令的使用格式为: - -**isocut** [ --help | -h ] [ -t <*temp_path*> ] [ -r <*rpm_path*> ] [ -k <*file_path*> ] < *source_iso* > < *dest_iso* > - -#### 参数说明 - -| 参数 | 是否必选 | 参数含义 | -| ---------------- | -------- | ------------------------------------------------------------ | -| --help \| -h | 否 | 查询命令的帮助信息。 | -| -t <*temp_path*> | 否 | 指定工具运行的临时目录 *temp_path*,其中 *temp_path* 为绝对路径。默认为 /tmp 。 | -| -r <*rpm_path*> | 否 | 用户需要额外添加到 ISO 镜像中的 RPM 包路径。 | -| -k <*file_path*> | 否 | 用户需要使用 kickstart 自动安装,指定 kickstart 模板路径。 | -| *source_iso* | 是 | 用于裁剪的 ISO 源镜像所在路径和名称。不指定路径时,默认当前路径。 | -| *dest_iso* | 是 | 裁剪定制生成的 ISO 新镜像存放路径和名称。不指定路径时,默认当前路径。 | - - - -### 软件包来源 - -新镜像的 RPM 包来源有: - -- 原有 ISO 镜像。该情况通过配置文件 /etc/isocut/rpmlist 指定需要安装的 RPM 软件包,配置格式为 "软件包名.对应架构",例如:kernel.aarch64 。 - -- 额外指定。执行 **isocut** 时使用 -r 参数指定软件包所在路径,并将添加的 RPM 包按上述格式添加到配置文件 
/etc/isocut/rpmlist 中。 - - - - >![](./public_sys-resources/icon-note.gif) **说明:** - > - >- 裁剪定制镜像时,若无法找到配置文件中指定的 RPM 包,则镜像中不会添加该 RPM 包。 - >- 若 RPM 包的依赖有问题,则裁剪定制镜像时可能会报错。 - - -### kickstart 功能介绍 - -用户需要实现镜像自动化安装,可以通过 kickstart 的方式。在执行 **isocut** 时使用 -k 参数指定 kickstart 文件。 - -isocut 为用户提供了 kickstart 模板,路径是 /etc/isocut/anaconda-ks.cfg,用户可以基于该模板修改。 - -#### 修改 kickstart 模板 - -若用户需要使用 isocut 工具提供的 kickstart 模板,需要修改以下内容: - -- 必须在文件 /etc/isocut/anaconda-ks.cfg 中配置 root 和 grub2 的密码。否则镜像自动化安装会卡在设置密码的环节,等待用户手动输入密码。 -- 如果要添加额外 RPM 包,并使用 kickstart 自动安装,则在 /etc/isocut/rpmlist 和 kickstart 文件的 %packages 字段都要指定该 RPM 包。 - -接下来介绍 kickstart 文件详细修改方法。 - -##### 配置初始密码 - -###### 配置 root 初始密码 - -/etc/isocut/anaconda-ks.cfg 中 root 初始密码的默认配置如下,其中 ${pwd} 需要替换成用户实际的加密密文: - -```shell -rootpw --iscrypted ${pwd} -``` - -这里给出设置 root 初始密码的方法(需使用 root 权限): - -1. 添加用于生成密码的用户,此处假设 testUser。 - - ``` shell script - $ sudo useradd testUser - ``` - -2. 设置 testUser 用户的密码。参考命令如下,根据提示设置密码: - - ``` shell script - $ sudo passwd testUser - Changing password for user testUser. - New password: - Retype new password: - passwd: all authentication tokens updated successfully. - ``` - -3. 查看 /etc/shadow 文件,获取加密密码(用户 testUser 后,两个 : 间的字符串,此处使用 *** 代替)。 - - ``` shell script - $ sudo cat /etc/shadow | grep testUser - testUser:***:19052:0:90:7:35:: - ``` - -4. 拷贝上述加密密码替换 /etc/isocut/anaconda-ks.cfg 中的 pwd 字段,如下所示(请用实际内容替换 *** ): - ``` shell script - rootpw --iscrypted *** - ``` - -###### 配置 grub2 初始密码 - -/etc/isocut/anaconda-ks.cfg 文件中添加以下配置,配置 grub2 初始密码。其中 ${pwd} 需要替换成用户实际的加密密文: - -```shell -%addon com_huawei_grub_safe --iscrypted --password='${pwd}' -%end -``` - -> ![](./public_sys-resources/icon-note.gif)说明: -> -> - 配置 grub 初始密码需要使用 root 权限。 -> - grub 密码对应的默认用户为 root 。 -> -> - 系统中需有 grub2-set-password 命令,若不存在,请提前安装该命令。 - -1. 
执行如下命令,根据提示设置 grub2 密码: - - ```shell - $ sudo grub2-set-password -o ./ - Enter password: - Confirm password: - grep: .//grub.cfg: No such file or directory - WARNING: The current configuration lacks password support! - Update your configuration with grub2-mkconfig to support this feature. - ``` - -2. 命令执行完成后,会在当前目录生成 user.cfg 文件,grub.pbkdf2.sha512 开头的内容即 grub2 加密密码。 - - ```shell - $ sudo cat user.cfg - GRUB2_PASSWORD=grub.pbkdf2.sha512.*** - ``` - -3. 复制上述密文,并在 /etc/isocut/anaconda-ks.cfg 文件中增加如下配置: - - ```shell - %addon com_huawei_grub_safe --iscrypted --password='grub.pbkdf2.sha512.***' - %end - ``` - -##### 配置 %packages 字段 - -如果需要添加额外 RPM 包,并使用 kickstart 自动安装,需要在 /etc/isocut/rpmlist 和 kickstart 文件的 %packages 字段都指定该 RPM 包。 - -此处介绍在 /etc/isocut/anaconda-ks.cfg 文件中添加 RPM 包。 - -/etc/isocut/anaconda-ks.cfg 文件的 %packages 默认配置如下: - -```shell -%packages --multilib --ignoremissing -acl.aarch64 -aide.aarch64 -...... -NetworkManager.aarch64 -%end -``` - -将额外指定的 RPM 软件包添加到 %packages 配置中,需要遵循如下配置格式: - -"软件包名.对应架构",例如:kernel.aarch64 - -```shell -%packages --multilib --ignoremissing -acl.aarch64 -aide.aarch64 -...... -NetworkManager.aarch64 -kernel.aarch64 -%end -``` - - -### 操作指导 - ->![](./public_sys-resources/icon-note.gif) **说明:** -> ->- 请不要修改或删除 /etc/isocut/rpmlist 文件中的默认配置项。 ->- isocut 的所有操作需要使用 root 权限。 ->- 待裁剪的源镜像可以为基础镜像,也可以是 everything 版镜像,例子中以基础版镜像 openEuler-20.03-LTS-SP3-aarch64-dvd.iso 为例。 ->- 例子中假设新生成的镜像名称为 new.iso,且存放在 /home/result 路径;运行工具的临时目录为 /home/temp;额外的 RPM 软件包存放在 /home/rpms 目录。 - - - -1. 修改配置文件 /etc/isocut/rpmlist,指定用户需要安装的 RPM 软件包(来自原有 ISO 镜像)。 - - ``` shell script - sudo vi /etc/isocut/rpmlist - ``` - -2. 
确定运行镜像裁剪定制工具的临时目录空间大于 8 GB 。默认临时目录为 /tmp,也可以使用 -t 参数指定其他目录作为临时目录,该目录必须为绝对路径。本例中使用目录 /home/temp,由如下回显可知 /home 目录可用磁盘为 38 GB,满足要求。 - - ```shell - $ df -h - Filesystem Size Used Avail Use% Mounted on - devtmpfs 1.2G 0 1.2G 0% /dev - tmpfs 1.5G 0 1.5G 0% /dev/shm - tmpfs 1.5G 23M 1.5G 2% /run - tmpfs 1.5G 0 1.5G 0% /sys/fs/cgroup - /dev/mapper/openeuler_openeuler-root 69G 2.8G 63G 5% / - /dev/sda2 976M 114M 796M 13% /boot - /dev/mapper/openeuler_openeuler-home 61G 21G 38G 35% /home - ``` - -3. 执行裁剪定制。 - - **场景一**:新镜像的所有 RPM 包来自原有 ISO 镜像 - - ``` shell script - $ sudo isocut -t /home/temp /home/isocut_iso/openEuler-20.03-LTS-SP3-aarch64-dvd.iso /home/result/new.iso - Checking input ... - Checking user ... - Checking necessary tools ... - Initing workspace ... - Copying basic part of iso image ... - Downloading rpms ... - Finish create yum conf - finished - Regenerating repodata ... - Checking rpm deps ... - Getting the description of iso image ... - Remaking iso ... - Adding checksum for iso ... - Adding sha256sum for iso ... - ISO cutout succeeded, enjoy your new image "/home/result/new.iso" - isocut.lock unlocked ... 
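 - # 补充示例(假设:isocut 会同时为新镜像生成 sha256 校验文件,实际文件名以工具回显为准): - # 制作完成后,可参照 imageTailor 的做法校验新镜像完整性,例如: - # sha256sum -c /home/result/new.iso.sha256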
- ``` - 回显如上,说明新镜像 new.iso 定制成功。 - - **场景二**:新镜像的 RPM 包除来自原有 ISO 镜像,还包含来自 /home/rpms 的额外软件包 - - ```shell - sudo isocut -t /home/temp -r /home/rpms /home/isocut_iso/openEuler-20.03-LTS-SP3-aarch64-dvd.iso /home/result/new.iso - ``` - - **场景三**:使用 kickstart 文件实现自动化安装,需要修改 /etc/isocut/anaconda-ks.cfg 文件 - ```shell - sudo isocut -t /home/temp -k /etc/isocut/anaconda-ks.cfg /home/isocut_iso/openEuler-20.03-LTS-SP3-aarch64-dvd.iso /home/result/new.iso - ``` - - -## FAQ - -### 默认 rpm 包列表安装系统失败 - -#### 背景描述 - -用户使用 isocut 裁剪镜像时通过配置文件 /etc/isocut/rpmlist 指定需要安装的软件包。 - -由于不同版本会有软件包减少,可能导致裁剪镜像时出现缺包等问题。 -因此 /etc/isocut/rpmlist 中默认只包含 kernel 软件包。 -保证默认配置裁剪镜像必定成功。 - -#### 问题描述 - -使用默认配置裁剪出来的 iso 镜像,能够裁剪成功,但是安装可能失败。 - -安装报错缺包,报错截图如下: - -![](./figures/lack_pack.png) - -#### 原因分析 - -使用默认配置的 RPM 软件包列表,裁剪的 iso 镜像在安装时缺少必要的 RPM 包。 -缺少的包如报错的图示,并且在不同版本中,缺少的 RPM 包也可能是不同的,以安装时实际报错为准。 - -#### 解决方案 - -1. 增加缺少的包 - - 1. 根据报错的提示整理缺少的 RPM 包列表 - 2. 将上述 RPM 包列表添加到配置文件 /etc/isocut/rpmlist 中。 - 3. 再次裁剪安装 iso 镜像 - - 以问题描述中的缺包情况为例,修改 rpmlist 配置文件如下: - ```shell - $ cat /etc/isocut/rpmlist - kernel.aarch64 - lvm2.aarch64 - chrony.aarch64 - authselect.aarch64 - shim.aarch64 - efibootmgr.aarch64 - grub2-efi-aa64.aarch64 - dosfstools.aarch64 - ``` - diff --git a/docs/zh/docs/TailorCustom/overview.md b/docs/zh/docs/TailorCustom/overview.md deleted file mode 100644 index a59750ba597bdf80a9c5b817c6e18f067b7d463e..0000000000000000000000000000000000000000 --- a/docs/zh/docs/TailorCustom/overview.md +++ /dev/null @@ -1,3 +0,0 @@ -# 裁剪定制工具使用指南 - -本文主要介绍 openEuler 的裁剪定制工具,包含工具的介绍、安装以及使用等内容。 \ No newline at end of file diff --git a/docs/zh/docs/TailorCustom/public_sys-resources/icon-note.gif b/docs/zh/docs/TailorCustom/public_sys-resources/icon-note.gif deleted file mode 100644 index 6314297e45c1de184204098efd4814d6dc8b1cda..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/TailorCustom/public_sys-resources/icon-note.gif and /dev/null differ diff --git a/docs/zh/docs/oncn-bwm/overview.md 
b/docs/zh/docs/oncn-bwm/overview.md deleted file mode 100644 index 4c2e2af162808890545ba76234a2f93d84b6346a..0000000000000000000000000000000000000000 --- a/docs/zh/docs/oncn-bwm/overview.md +++ /dev/null @@ -1,258 +0,0 @@ -# oncn-bwm用户指南 - -## 简介 - -随着云计算、大数据、人工智能、5G、物联网等技术的迅速发展,数据中心的建设越来越重要。然而,数据中心的服务器资源利用率很低,造成了巨大的资源浪费。为了提高服务器资源利用率,oncn-bwm 应运而生。 - -oncn-bwm 是一款适用于离线业务混合部署场景的 Pod 带宽管理工具,它会根据 QoS 分级对节点内的网络资源进行合理调度,保障在线业务服务体验的同时,大幅提升节点整体的网络带宽利用率。 - -oncn-bwm 工具支持如下功能: - -- 使能/去除/查询 Pod 带宽管理 -- 设置 Pod 网络优先级 -- 设置离线业务带宽范围和在线业务水线 -- 内部统计信息查询 - - - -## 安装 - -安装 oncn-bwm 工具需要操作系统为 openEuler 22.09,在配置了 openEuler yum 源的机器直接使用 yum 命令安装,参考命令如下: - -```shell -# yum install oncn-bwm -``` - -此处介绍如何安装 oncn-bwm 工具。 - -### 环境要求 - -* 操作系统:openEuler 22.09 - -### 安装步骤 - -安装 oncn-bwm 工具的操作步骤如下: - -1. 配置openEuler的yum源,直接使用yum命令安装 - - ``` - yum install oncn-bwm - ``` - -## 使用方法 - -oncn-bwm 工具提供了 `bwmcli` 命令行工具来使能 Pod 带宽管理或进行相关配置。`bwmcli` 命令的整体格式如下: - -**bwmcli** < option(s) > - -> 说明: -> -> 使用 `bwmcli` 命令需要 root 权限。 -> -> 仅支持节点上出方向(报文从节点内发往其他节点)的 Pod 带宽管理。 -> -> 已设置 tc qdisc 规则的网卡,不支持使能 Pod 带宽管理。 -> -> 升级 oncn-bwm 包不会影响升级前的使能状态;卸载 oncn-bwm 包会关闭所有网卡的 Pod 带宽管理。 - - -### 命令接口 - -#### Pod 带宽管理 - -**命令和功能** - -| 命令格式 | 功能 | -| --------------------------- | ------------------------------------------------------------ | -| **bwmcli -e** <网卡名称> | 使能指定网卡的 Pod 带宽管理。 | -| **bwmcli -d** <网卡名称> | 去除指定网卡的 Pod 带宽管理。 | -| **bwmcli -p devs** | 查询节点所有网卡的 Pod 带宽管理。 | - -> 说明: -> -> - 不指定网卡名时,上述命令会对节点上的所有的网卡生效。 -> -> - 执行 `bwmcli` 其他命令前需要开启 Pod 带宽管理。 - - - -**使用示例** - -- 使能网卡 eth0 和 eth1 的 Pod 带宽管理 - - ```shell - # bwmcli -e eth0 -e eth1 - enable eth0 success - enable eth1 success - ``` - -- 取消网卡 eth0 和 eth1 的 Pod 带宽管理 - - ```shell - # bwmcli -d eth0 -d eth1 - disable eth0 success - disable eth1 success - ``` - -- 查询节点所有网卡的 Pod 带宽管理 - - ```shell - # bwmcli -p devs - eth0 : enabled - eth1 : disabled - eth2 : disabled - docker0 : disabled - lo : disabled - ``` - -#### Pod 网络优先级 - -**命令和功能** - -| 命令格式 | 功能 | -| ------------------------------------------------------------ | ------------------------------------------------------------ | -| **bwmcli -s** *path* *prio* | 设置 Pod 网络优先级。其中 *path* 为 Pod 对应的 cgroup 路径, *prio* 为优先级。*path* 取相对路径或者绝对路径均可。 *prio* 默认值为 0,可选值为 0 和 -1,0 标识为在线业务,-1 标识为离线业务。 | -| **bwmcli -p** *path* | 查询 Pod 网络优先级。 | - -> 说明: -> -> 支持在线或离线两种网络优先级,oncn-bwm 工具会按照网络优先级实时控制 Pod 的带宽,具体策略为:对于在线类型的 Pod ,不会限制其带宽;对于离线类型的 Pod ,会将其带宽限制在离线带宽范围内。 - -**使用示例** - -- 设置 cgroup 路径为 /sys/fs/cgroup/net_cls/test_online 的 Pod 的优先级为 0 - - ```shell - # bwmcli -s /sys/fs/cgroup/net_cls/test_online 0 - set prio success - ``` - -- 查询 cgroup 路径为 /sys/fs/cgroup/net_cls/test_online 的 Pod 的优先级 - - ```shell - # bwmcli -p /sys/fs/cgroup/net_cls/test_online - 0 - ``` - - - -#### 离线业务带宽范围 - -| 命令格式 | 功能 | -| ---------------------------------- | ------------------------------------------------------------ | -| **bwmcli -s bandwidth** *low*,*high* | 设置一个主机/虚拟机的离线带宽。其中 low 表示最低带宽,high 表示最高带宽,其单位可取值为 kb/mb/gb ,有效范围为 [1mb, 9999gb]。 | -| **bwmcli -p bandwidth** | 查询一个主机/虚拟机的离线带宽。 | - -> 说明: -> -> - 一个主机上所有使能 Pod 带宽管理的网卡在实现内部被当成一个整体看待,也就是共享设置的在线业务水线和离线业务带宽范围。 -> -> - 使用 `bwmcli` 设置 Pod 带宽对此节点上所有离线业务生效,所有离线业务的总带宽不能超过离线业务带宽范围。在线业务没有网络带宽限制。 -> -> - 离线业务带宽范围与在线业务水线共同完成离线业务带宽限制,当在线业务带宽低于设置的水线时:离线业务允许使用设置的最高带宽;当在线业务带宽高于设置的水线时,离线业务允许使用设置的最低带宽。 - - - -**使用示例** - -- 设置离线带宽范围在 30mb 到 100mb - - ```shell - # bwmcli -s bandwidth 30mb,100mb - set bandwidth success - ``` - -- 查询离线带宽范围 - - ```shell - # bwmcli -p bandwidth - bandwidth is 31457280(B),104857600(B) - ``` - - - - -#### 在线业务水线 - -**命令和功能** - -| 命令格式 | 功能 | -| ---------------------------------------------- | ------------------------------------------------------------ | -| **bwmcli -s waterline** *val* | 设置一个主机/虚拟机的在线业务水线,其中 *val* 为水线值,单位可取值为 kb/mb/gb ,有效范围为 [20mb, 9999gb]。 | -| **bwmcli -p waterline** | 查询一个主机/虚拟机的在线业务水线。 | - -> 说明: -> -> - 当一个主机上所有在线业务的总带宽高于水线时,会限制离线业务可以使用的带宽,反之当一个主机上所有在线业务的总带宽低于水线时,会提高离线业务可以使用的带宽。 -> - 判断在线业务的总带宽是否超过/低于设置的水线的时机:每 10 ms 判断一次,根据每个 10 ms 内统计的在线带宽是否高于水线来决定对离线业务采用的带宽限制。 - -**使用示例** - -- 设置在线业务水线为 20mb - - ```shell - # bwmcli -s waterline 20mb - set waterline success - ``` - -- 查询在线业务水线 - - ```shell - # bwmcli -p waterline - waterline is 20971520(B) - ``` - - - -#### 统计信息 - -**命令和功能** - -| 命令格式 | 功能 | -| ------------------- | ------------------ | -| **bwmcli -p stats** | 查询内部统计信息。 | - - -> 说明: -> -> - offline_target_bandwidth 表示离线业务目标带宽 -> -> - online_pkts 表示开启 Pod 带宽管理后在线业务总包数 -> -> - offline_pkts 表示开启 Pod 带宽管理后离线业务总包数 -> -> - online_rate 表示当前在线业务速率 -> -> - offline_rate 表示当前离线业务速率 - - -**使用示例** - -查询内部统计信息 - -```shell -# bwmcli -p stats -offline_target_bandwidth: 2097152 -online_pkts: 2949775 -offline_pkts: 0 -online_rate: 602 -offline_rate: 0 -``` - - - - - -### 典型使用案例 - -完整配置一个节点上的 Pod 带宽管理可以按照如下步骤顺序操作: - -``` -bwmcli -p devs # 查询系统当前网卡 Pod 带宽管理状态 -bwmcli -e eth0 # 使能 eth0 的网卡 Pod 带宽管理 -bwmcli -s /sys/fs/cgroup/net_cls/online 0 # 设置在线业务 Pod 的网络优先级为 0 -bwmcli -s /sys/fs/cgroup/net_cls/offline -1 # 设置离线业务 Pod 的网络优先级为 -1 -bwmcli -s bandwidth 20mb,1gb # 配置离线业务带宽范围 -bwmcli -s waterline 30mb # 配置在线业务的水线 -``` - diff --git a/docs/zh/docs/rubik/figures/icon-note.gif b/docs/zh/docs/rubik/figures/icon-note.gif deleted file mode 100644 index 6314297e45c1de184204098efd4814d6dc8b1cda..0000000000000000000000000000000000000000 Binary files a/docs/zh/docs/rubik/figures/icon-note.gif and /dev/null differ diff --git "a/docs/zh/docs/rubik/http\346\216\245\345\217\243\346\226\207\346\241\243.md" "b/docs/zh/docs/rubik/http\346\216\245\345\217\243\346\226\207\346\241\243.md" deleted file mode 100644 index 75bca4a21377e1cd452552922cb1da17e27520ed..0000000000000000000000000000000000000000 --- "a/docs/zh/docs/rubik/http\346\216\245\345\217\243\346\226\207\346\241\243.md" +++ /dev/null @@ -1,67 +0,0 @@ -# http接口 - -## 概述 - -rubik对外开放接口均为http接口,当前包括pod优先级设置/更新接口、rubik探活接口和rubik版本号查询接口。 - -## 接口介绍 - -### 设置、更新Pod优先级接口 - -rubik提供了设置或更新pod优先级的功能,外部可通过调用该接口发送pod相关信息,rubik根据接收到的pod信息对其设置优先级从而达到资源隔离的目的。接口调用格式为: - -```bash -HTTP
POST /run/rubik/rubik.sock -{ - "Pods": { - "podaaa": { - "CgroupPath": "kubepods/burstable/podaaa", - "QosLevel": 0 - }, - "podbbb": { - "CgroupPath": "kubepods/burstable/podbbb", - "QosLevel": -1 - } - } -} -``` - -Pods 配置中为需要设置或更新优先级的 Pod 信息,每一个http请求至少需要指定配置1个 pod,每个 pod 必须指定CgroupPath 和 QosLevel,其含义如下: - -| 配置项 | 配置值类型 | 配置取值范围 | 配置含义 | -| ---------- | ---------- | ------------ | ------------------------------------------------------- | -| QosLevel | int | 0、-1 | pod优先级,0表示其为在线业务,-1表示其为离线业务 | -| CgroupPath | string | 相对路径 | 对应Pod的cgroup子路径(即其在cgroup子系统下的相对路径) | - -接口调用示例如下: - -```sh -curl -v -H "Accept: application/json" -H "Content-type: application/json" -X POST --data '{"Pods": {"podaaa": {"CgroupPath": "kubepods/burstable/podaaa","QosLevel": 0},"podbbb": {"CgroupPath": "kubepods/burstable/podbbb","QosLevel": -1}}}' --unix-socket /run/rubik/rubik.sock http://localhost/ -``` - -### 探活接口 - -rubik作为HTTP服务,提供探活接口用于帮助判断rubik是否处于运行状态。 - -接口形式:HTTP/GET /ping - -接口调用示例如下: - -```sh -curl -XGET --unix-socket /run/rubik/rubik.sock http://localhost/ping -``` - -若返回ok则代表rubik服务处于运行状态。 - -### 版本信息查询接口 - -rubik支持通过HTTP请求查询当前rubik的版本号。 - -接口形式:HTTP/GET /version - -接口调用示例如下: - -```sh -curl -XGET --unix-socket /run/rubik/rubik.sock http://localhost/version -{"Version":"0.0.1","Release":"1","Commit":"29910e6","BuildTime":"2021-05-12"} -``` diff --git a/docs/zh/docs/rubik/overview.md b/docs/zh/docs/rubik/overview.md deleted file mode 100644 index 52f70b7f583482bf51f21b8956771da0ada6b3d5..0000000000000000000000000000000000000000 --- a/docs/zh/docs/rubik/overview.md +++ /dev/null @@ -1,18 +0,0 @@ -# rubik使用指南 - -## 概述 - -服务器资源利用率低一直是业界公认的难题,随着云原生技术的发展,将在线(高优先级)、离线(低优先级)业务混合部署成为了当下提高资源利用率的有效手段。 - -rubik容器调度在业务混合部署的场景下,根据QoS分级,对资源进行合理调度,从而实现在保障在线业务服务质量的前提下,大幅提升资源利用率。 - -rubik当前支持如下特性: - -- pod CPU优先级的配置 -- pod memory优先级的配置 - -本文档适用于使用openEuler系统并希望了解和使用rubik的社区开发者、开源爱好者以及相关合作伙伴。使用人员需要具备以下经验和技能: - -* 熟悉Linux基本操作 -* 熟悉kubernetes和docker/iSulad基本操作 - diff --git 
"a/docs/zh/docs/rubik/\345\256\211\350\243\205\344\270\216\351\203\250\347\275\262.md" "b/docs/zh/docs/rubik/\345\256\211\350\243\205\344\270\216\351\203\250\347\275\262.md" deleted file mode 100644 index 1a1df9490d08a90bb582327285ac5d478ebff1ef..0000000000000000000000000000000000000000 --- "a/docs/zh/docs/rubik/\345\256\211\350\243\205\344\270\216\351\203\250\347\275\262.md" +++ /dev/null @@ -1,199 +0,0 @@ -# 安装与部署 - -## 概述 - -本章节主要介绍rubik组件的安装以及部署方式。 - -## 软硬件要求 - -### 硬件要求 - -* 当前仅支持 x86、aarch64架构 -* rubik磁盘使用需求:配额1GB及以上。 -* rubik内存使用需求:配额100MB及以上。 - -### 软件要求 - -* 操作系统:openEuler 22.03-LTS -* 内核:openEuler 22.03-LTS版本内核 - -### 环境准备 - -* 安装 openEuler 系统,安装方法参考《openEuler 22.03-LTS 安装指南》 -* 安装并部署 kubernetes,安装及部署方法参考《Kubernetes 集群部署指南》。 -* 安装docker或isulad容器引擎,若采用isulad容器引擎,需同时安装isula-build容器镜像构建工具。 - -## 安装rubik - -rubik以k8s daemonSet形式部署在k8s的每一个节点上,故需要在每一个节点上使用以下步骤安装rubik rpm包。 - -1. 配置 yum 源:openEuler 22.03-LTS 和 openEuler 22.03-LTS:EPOL(rubik组件当前仅在EPOL源中),参考如下(repo 段名称如 [openEuler22.03]、[Epol] 可自行命名): - - ``` - # openEuler 22.03-LTS 官方发布源 - [openEuler22.03] - name=openEuler22.03 - baseurl=https://repo.openeuler.org/openEuler-22.03-LTS/everything/$basearch/ - enabled=1 - gpgcheck=1 - gpgkey=https://repo.openeuler.org/openEuler-22.03-LTS/everything/$basearch/RPM-GPG-KEY-openEuler - ``` - - ``` - # openEuler 22.03-LTS:Epol 官方发布源 - [Epol] - name=Epol - baseurl=https://repo.openeuler.org/openEuler-22.03-LTS/EPOL/$basearch/ - enabled=1 - gpgcheck=0 - ``` - -2. 使用root权限安装rubik: - - ```shell - sudo yum install -y rubik - ``` - - -> ![](./figures/icon-note.gif)**说明**: -> -> rubik工具相关文件会安装在/var/lib/rubik目录下 - -## 部署rubik - -rubik以容器形式运行在混合部署场景下的k8s集群中,用于对不同优先级业务进行资源隔离和限制,避免离线业务对在线业务产生干扰,在提高资源总体利用率的同时保障在线业务的服务质量。当前rubik支持对CPU、内存资源进行隔离和限制,需配合openEuler 22.03-LTS版本的内核使用。若用户想要开启内存优先级特性(即针对不同优先级业务实现内存资源的分级),需要通过设置/proc/sys/vm/memcg_qos_enable开关,有效值为0和1,其中0为默认值表示关闭特性,1表示开启特性。 - -```bash -echo 1 | sudo tee /proc/sys/vm/memcg_qos_enable -``` - -### 部署rubik daemonset - -1. 
使用docker或isula-build容器引擎构建rubik镜像,由于rubik以daemonSet形式部署,故每一个节点都需要rubik镜像。用户可以在一个节点构建镜像后使用docker save/load功能将rubik镜像load到k8s的每一个节点,也可以在各节点上都构建一遍rubik镜像。以isula-build为例,参考命令如下: - -```sh -isula-build ctr-img build -f /var/lib/rubik/Dockerfile --tag rubik:0.1.0 . -``` - -2. 在k8s master节点,修改`/var/lib/rubik/rubik-daemonset.yaml`文件中的rubik镜像名,与上一步构建出来的镜像名保持一致。 - -```yaml -... -containers: -- name: rubik-agent - image: rubik:0.1.0 # 此处镜像名需与上一步构建的rubik镜像名一致 - imagePullPolicy: IfNotPresent -... -``` - -3. 在k8s master节点,使用kubectl命令部署rubik daemonset,rubik会自动被部署在k8s的所有节点: - -```sh -kubectl apply -f /var/lib/rubik/rubik-daemonset.yaml -``` - -4. 使用`kubectl get pods -A`命令查看rubik是否已部署到集群每一个节点上(rubik-agent数量与节点数量相同且均为Running状态) - -```sh -[root@localhost rubik]# kubectl get pods -A -NAMESPACE NAME READY STATUS RESTARTS AGE -... -kube-system rubik-agent-76ft6 1/1 Running 0 4s -... -``` - -## 常用配置说明 - -通过以上方式部署的rubik将以默认配置启动,用户可以根据实际需要修改rubik配置,可通过修改rubik-daemonset.yaml文件中的config.json段落内容后重新部署rubik daemonset实现。 - -本章介绍 config.json 的常用配置,以方便用户根据需要进行配置。 - -### 配置项说明 - -```yaml -# 该部分配置内容位于rubik-daemonset.yaml文件中的config.json段落 -{ - "autoConfig": true, - "autoCheck": false, - "logDriver": "stdio", - "logDir": "/var/log/rubik", - "logSize": 1024, - "logLevel": "info", - "cgroupRoot": "/sys/fs/cgroup" -} -``` - -| 配置项 | 配置值类型 | 配置取值范围 | 配置含义 | -| ---------- | ---------- | ------------------ | ------------------------------------------------------------ | -| autoConfig | bool | true、false | true:开启Pod自动感知功能。
false:关闭 Pod 自动感知功能。 | -| autoCheck | bool | true、false | true:开启 Pod 优先级校验功能。
false:关闭 Pod 优先级校验功能。 | -| logDriver | string | stdio、file | stdio:直接向标准输出打印日志,日志收集和转储由调度平台完成。
file:将文件打印到日志目录,路径由logDir指定。 | -| logDir | string | 绝对路径 | 指定日志存放的目录路径。 | -| logSize | int | [10,1048576] | 指定日志存储总大小,单位 MB,若日志总量达到上限则最早的日志会被丢弃。 | -| logLevel | string | error、info、debug | 指定日志级别。 | -| cgroupRoot | string | 绝对路径 | 指定 cgroup 挂载点。 | - -### Pod优先级自动配置 - -若在rubik config中配置autoConfig为true开启了Pod自动感知配置功能,用户仅需在部署业务pod时在yaml中通过annotation指定其优先级,部署后rubik会自动感知当前节点pod的创建与更新,并根据用户配置的优先级设置pod优先级。 - -### 依赖于kubelet的Pod优先级配置 - -由于Pod优先级自动配置依赖于来自api-server pod创建事件的通知,具有一定的延迟性,无法在进程启动之前及时完成Pod优先级的配置,导致业务性能可能存在抖动。用户可以关闭优先级自动配置选项,通过修改kubelet源码,在容器cgroup创建后、容器进程启动前调用rubik http接口配置pod优先级,http接口具体使用方法详见[http接口文档](./http接口文档.md) - -### 支持自动校对Pod优先级 - -rubik支持在启动时对当前节点Pod QoS优先级配置进行一致性校对,此处的一致性是指k8s集群中的配置和rubik对pod优先级的配置之间的一致性。该校对功能默认关闭,用户可以通过 autoCheck 选项控制是否开启。若开启该校对功能,启动或者重启 rubik 时,rubik会自动校验并更正当前节点pod优先级配置。 - -## 在离线业务配置示例 - -rubik部署成功后,用户在部署实际业务时,可以根据以下配置示例对业务yaml文件进行修改,指定业务的在离线类型,rubik即可在业务部署后对其优先级进行配置,从而达到资源隔离的目的。 - -以下为部署一个nginx在线业务的示例: - -```yaml -apiVersion: v1 -kind: Pod -metadata: - name: nginx - namespace: qosexample - annotations: - volcano.sh/preemptable: "false" # volcano.sh/preemptable为true代表业务为离线业务,false代表业务为在线业务,默认为false -spec: - containers: - - name: nginx - image: nginx - resources: - limits: - memory: "200Mi" - cpu: "1" - requests: - memory: "200Mi" - cpu: "1" -``` - -## 约束限制 - -- rubik接受HTTP请求并发量上限1000QPS,并发量超过上限则报错。 - -- rubik接受的单个请求中pod上限为100个,pod数量越界则报错。 - -- 每个k8s节点只能部署一个rubik,多个rubik会冲突。 - -- rubik不提供端口访问,只能通过socket通信。 - -- rubik只接收合法http请求路径及网络协议:http://localhost/(POST)、http://localhost/ping(GET)、http://localhost/version(GET)。各http请求的功能详见[http接口文档](./http接口文档.md) - -- rubik磁盘使用需求:配额1GB及以上。 - -- rubik内存使用需求:配额100MB及以上。 - -- 禁止将业务从低优先级(离线业务)往高优先级(在线业务)切换。如业务A先被设置为离线业务,接着请求设置为在线业务,rubik报错。 - -- 容器挂载目录时,rubik本地套接字/run/rubik的目录权限需由业务侧保证最小权限700。 - -- rubik服务端可用时,单个请求超时时间为120s。如果rubik进程进入T(暂停状态或跟踪状态)、D状态(不可中断的睡眠状态),则服务端不可用,此时rubik服务不会响应任何请求。为了避免此情况的发生,请在客户端设置超时时间,避免无限等待。 - -- 使用混部后,原始的cgroup cpu share功能存在限制。具体表现为: - - 若当前CPU中同时有在线任务和离线任务运行,则离线任务的CPU 
share配置无法生效。 - - 若当前CPU中只有在线任务或只有离线任务,CPU share能生效。 \ No newline at end of file diff --git "a/docs/zh/docs/rubik/\346\267\267\351\203\250\351\232\224\347\246\273\347\244\272\344\276\213.md" "b/docs/zh/docs/rubik/\346\267\267\351\203\250\351\232\224\347\246\273\347\244\272\344\276\213.md" deleted file mode 100644 index 109179e5a80cf48f39365f3f5036f185a816340a..0000000000000000000000000000000000000000 --- "a/docs/zh/docs/rubik/\346\267\267\351\203\250\351\232\224\347\246\273\347\244\272\344\276\213.md" +++ /dev/null @@ -1,233 +0,0 @@ -## 混部隔离示例 - -### 环境准备 - -查看内核是否支持混部隔离功能 - -```bash -# 查看/boot/config-系统配置是否开启混部隔离功能 -# 若CONFIG_QOS_SCHED=y则说明使能了混部隔离功能,例如: -cat /boot/config-5.10.0-60.18.0.50.oe2203.x86_64 | grep CONFIG_QOS -CONFIG_QOS_SCHED=y -``` - -安装docker容器引擎 - -```bash -yum install -y docker-engine -docker version -# 如下为docker version显示结果 -Client: - Version: 18.09.0 - EulerVersion: 18.09.0.300 - API version: 1.39 - Go version: go1.17.3 - Git commit: aa1eee8 - Built: Wed Mar 30 05:07:38 2022 - OS/Arch: linux/amd64 - Experimental: false - -Server: - Engine: - Version: 18.09.0 - EulerVersion: 18.09.0.300 - API version: 1.39 (minimum version 1.12) - Go version: go1.17.3 - Git commit: aa1eee8 - Built: Tue Mar 22 00:00:00 2022 - OS/Arch: linux/amd64 - Experimental: false -``` - -### 混部业务 - -**在线业务(clickhouse)** - -使用clickhouse-benchmark测试工具进行性能测试,统计出QPS/P50/P90/P99等相关性能指标,用法参考:https://clickhouse.com/docs/zh/operations/utilities/clickhouse-benchmark/ - -**离线业务(stress)** - -stress是一个CPU密集型测试工具,可以通过指定--cpu参数启动多个并发CPU密集型任务给系统环境加压 - -### 使用说明 - -1)启动一个clickhouse容器(在线业务)。 - -2)进入容器内执行clickhouse-benchmark命令,设置并发线程数为10个、查询10000次、查询总时间30s。 - -3)同时启动一个stress容器(离线业务),并发执行10个CPU密集型任务对环境进行加压。 - -4)clickhouse-benchmark执行完后输出一个性能测试报告。 - -混部隔离测试脚本(**test_demo.sh**)如下: - -```bash -#!/bin/bash - -with_offline=${1:-no_offline} -enable_isolation=${2:-no_isolation} -stress_num=${3:-10} -concurrency=10 -timeout=30 -output=/tmp/result.json -online_container= -offline_container= - 
-exec_sql="echo \"SELECT * FROM system.numbers LIMIT 10000000 OFFSET 10000000\" | clickhouse-benchmark -i 10000 -c $concurrency -t $timeout" - -function prepare() -{ - echo "Launch clickhouse container." - online_container=$(docker run -itd \ - -v /tmp:/tmp:rw \ - --ulimit nofile=262144:262144 \ - -p 34424:34424 \ - yandex/clickhouse-server) - - sleep 3 - echo "Clickhouse container launched." -} - -function clickhouse() -{ - echo "Start clickhouse benchmark test." - docker exec $online_container bash -c "$exec_sql --json $output" - echo "Clickhouse benchmark test done." -} - -function stress() -{ - echo "Launch stress container." - offline_container=$(docker run -itd joedval/stress --cpu $stress_num) - echo "Stress container launched." - - if [ $enable_isolation == "enable_isolation" ]; then - echo "Set stress container qos level to -1." - echo -1 > /sys/fs/cgroup/cpu/docker/$offline_container/cpu.qos_level - fi -} - -function benchmark() -{ - if [ $with_offline == "with_offline" ]; then - stress - sleep 3 - fi - clickhouse - echo "Remove test containers." - docker rm -f $online_container - docker rm -f $offline_container - echo "Finish benchmark test for clickhouse(online) and stress(offline) colocation." - echo "===============================clickhouse benchmark==================================================" - cat $output - echo "===============================clickhouse benchmark==================================================" -} - -prepare -benchmark -``` - -### 测试结果 - -单独执行clickhouse在线业务 - -```bash -sh test_demo.sh no_offline no_isolation -``` - -得到在线业务的QoS(QPS/P50/P90/P99等指标)**基线数据**如下: - -```json -{ -"localhost:9000": { -"statistics": { -"QPS": 1.8853412284364512, -...... -}, -"query_time_percentiles": { -...... -"50": 0.484905256, -"60": 0.519641313, -"70": 0.570876148, -"80": 0.632544937, -"90": 0.728295525, -"95": 0.808700418, -"99": 0.873945121, -...... 
-} -} -} -``` - -启用stress离线业务,未开启混部隔离功能下,执行test_demo.sh测试脚本 - -```bash -# with_offline参数表示启用stress离线业务 -# no_isolation参数表示未开启混部隔离功能 -sh test_demo.sh with_offline no_isolation -``` - -**未开启混部隔离的情况下**,clickhouse业务QoS数据(QPS/P80/P90/P99等指标)如下: - -```json -{ -"localhost:9000": { -"statistics": { -"QPS": 0.9424028693636205, -...... -}, -"query_time_percentiles": { -...... -"50": 0.840476774, -"60": 1.304607373, -"70": 1.393591017, -"80": 1.41277543, -"90": 1.430316688, -"95": 1.457534764, -"99": 1.555646855, -...... -} -} -``` - -启用stress离线业务,开启混部隔离功能下,执行test_demo.sh测试脚本 - -```bash -# with_offline参数表示启用stress离线业务 -# enable_isolation参数表示开启混部隔离功能 -sh test_demo.sh with_offline enable_isolation -``` - -**开启混部隔离功能的情况下**,clickhouse业务QoS数据(QPS/P80/P90/P99等指标)如下: - -```json -{ -"localhost:9000": { -"statistics": { -"QPS": 1.8825798759270718, -...... -}, -"query_time_percentiles": { -...... -"50": 0.485725185, -"60": 0.512629901, -"70": 0.55656488, -"80": 0.636395956, -"90": 0.734695906, -"95": 0.804118275, -"99": 0.887807409, -...... 
-} -} -} -``` - -从上面的测试结果整理出一个表格如下: - -| 业务部署方式 | QPS | P50 | P90 | P99 | -| -------------------------------------- | ------------- | ------------- | ------------- | ------------- | -| 单独运行clickhouse在线业务(基线) | 1.885 | 0.485 | 0.728 | 0.874 | -| clickhouse+stress(未开启混部隔离功能) | 0.942(-50%) | 0.840(-42%) | 1.430(-49%) | 1.556(-44%) | -| clickhouse+stress(开启混部隔离功能) | 1.883(-0.11%) | 0.486(-0.21%) | 0.735(-0.96%) | 0.888(-1.58%) | - -在未开启混部隔离功能的情况下,在线业务clickhouse的QPS从1.9下降到0.9,同时业务的响应时延(P90)也从0.7s增大到1.4s,在线业务QoS下降了50%左右;而在开启混部隔离功能的情况下,不管是在线业务的QPS还是响应时延(P50/P90/P99)相比于基线值下降不到2%,在线业务QoS基本没有变化。 diff --git a/docs/zh/docs/thirdparty_migration/openstack.md b/docs/zh/docs/thirdparty_migration/openstack.md deleted file mode 100644 index 831b8883bbca3d02b33d81a5c01e5df7471d58f5..0000000000000000000000000000000000000000 --- a/docs/zh/docs/thirdparty_migration/openstack.md +++ /dev/null @@ -1 +0,0 @@ -openEuler OpenStack相关文档已迁移至[OpenStack SIG官网文档](https://openeuler.gitee.io/openstack/)。请访问链接获取详细信息。 diff --git a/docs/zh/menu/index.md b/docs/zh/menu/index.md index 3073f79a85e9ab0d4b63d8ee32481d4b4e75cefb..c7fc8ea7e214f551f9fd91d4399515cd0b5a4b36 100644 --- a/docs/zh/menu/index.md +++ b/docs/zh/menu/index.md @@ -35,7 +35,6 @@ headless: true - [使用DNF管理软件包]({{< relref "./docs/Administration/使用DNF管理软件包.md" >}}) - [管理服务]({{< relref "./docs/Administration/管理服务.md" >}}) - [管理进程]({{< relref "./docs/Administration/管理进程.md" >}}) - - [管理内存]({{< relref "./docs/Administration/memory-management.md" >}}) - [配置网络]({{< relref "./docs/Administration/配置网络.md" >}}) - [使用LVM管理硬盘]({{< relref "./docs/Administration/使用LVM管理硬盘.md" >}}) - [使用KAE加速引擎]({{< relref "./docs/Administration/使用KAE加速引擎.md" >}}) @@ -134,19 +133,11 @@ headless: true - [镜像管理]({{< relref "./docs/Container/镜像管理-4.md" >}}) - [统计信息]({{< relref "./docs/Container/统计信息-4.md" >}}) - [容器镜像构建]({{< relref "./docs/Container/isula-build构建工具.md" >}}) -- [A-Tune用户指南]({{< relref "./docs/A-Tune/A-Tune.md" >}}) - - [认识A-Tune]({{< relref 
"./docs/A-Tune/认识A-Tune.md" >}}) - - [安装与部署]({{< relref "./docs/A-Tune/安装与部署.md" >}}) - - [使用方法]({{< relref "./docs/A-Tune/使用方法.md" >}}) - - [native-turbo特性]({{< relref "./docs/A-Tune/native-turbo.md" >}}) - - [常见问题与解决方法]({{< relref "./docs/A-Tune/常见问题与解决方法.md" >}}) - - [附录]({{< relref "./docs/A-Tune/附录.md" >}}) - [Embedded用户指南](https://openeuler.gitee.io/yocto-meta-openeuler/master/index.html) - [内核热升级指南]({{< relref "./docs/KernelLiveUpgrade/KernelLiveUpgrade.md" >}}) - [安装与部署]({{< relref "./docs/KernelLiveUpgrade/安装与部署.md" >}}) - [使用方法]({{< relref "./docs/KernelLiveUpgrade/使用方法.md" >}}) - [常见问题与解决方法]({{< relref "./docs/KernelLiveUpgrade/常见问题与解决方法.md" >}}) -- [内存可靠性分级特性使用指南]({{< relref "./docs/Kernel/内存可靠性分级特性使用指南.md" >}}) - [应用开发指南]({{< relref "./docs/ApplicationDev/application-development.md" >}}) - [开发环境准备]({{< relref "./docs/ApplicationDev/开发环境准备.md" >}}) - [使用GCC编译]({{< relref "./docs/ApplicationDev/使用GCC编译.md" >}}) @@ -162,23 +153,8 @@ headless: true - [应用场景]({{< relref "./docs/secGear/应用场景.md" >}}) - [使用secGear工具]({{< relref "./docs/secGear/使用secGear工具.md" >}}) - [接口参考]({{< relref "./docs/secGear/接口参考.md" >}}) -- [Kubernetes集群部署指南]({{< relref "./docs/Kubernetes/Kubernetes.md" >}}) - - [准备虚拟机]({{< relref "./docs/Kubernetes/准备虚拟机.md" >}}) - - [手动部署集群]({{< relref "./docs/Kubernetes/手动部署集群.md" >}}) - - [安装Kubernetes软件包]({{< relref "./docs/Kubernetes/安装Kubernetes软件包.md" >}}) - - [准备证书]({{< relref "./docs/Kubernetes/准备证书.md" >}}) - - [安装etcd]({{< relref "./docs/Kubernetes/安装etcd.md" >}}) - - [部署控制面组件]({{< relref "./docs/Kubernetes/部署控制面组件.md" >}}) - - [部署Node节点组件]({{< relref "./docs/Kubernetes/部署Node节点组件.md" >}}) - - [自动部署集群]({{< relref "./docs/Kubernetes/eggo自动化部署.md" >}}) - - [工具介绍]({{< relref "./docs/Kubernetes/eggo工具介绍.md" >}}) - - [部署集群]({{< relref "./docs/Kubernetes/eggo部署集群.md" >}}) - - [拆除集群]({{< relref "./docs/Kubernetes/eggo拆除集群.md" >}}) - - [运行测试pod]({{< relref "./docs/Kubernetes/运行测试pod.md" >}}) - [Kubeedge部署指南]({{< relref 
"./docs/Kubernetes/Kubernetes.md" >}}) -- [k3s部署指南]({{< relref "./docs/k3s/k3s部署指南.md" >}}) - [第三方软件安装指南]({{< relref "./docs/thirdparty_migration/thidrparty.md" >}}) - - [OpenStack]({{< relref "./docs/thirdparty_migration/openstack.md" >}}) - [HA 用户指南]({{< relref "./docs/thirdparty_migration/ha.md" >}}) - [部署 HA]({{< relref "./docs/thirdparty_migration/installha.md" >}}) - [HA 使用实例]({{< relref "./docs/thirdparty_migration/usecase.md" >}}) @@ -200,28 +176,6 @@ headless: true - [工具集用户指南]({{< relref "./docs/userguide/overview.md" >}}) - [patch-tracking]({{< relref "./docs/userguide/patch-tracking.md" >}}) - [pkgship]({{< relref "./docs/userguide/pkgship.md" >}}) -- [Aops用户指南]({{< relref "./docs/A-Ops/overview.md" >}}) - - [AOps部署指南]({{< relref "./docs/A-Ops/AOps部署指南.md" >}}) - - [AOps智能定位框架使用手册]({{< relref "./docs/A-Ops/AOps智能定位框架使用手册.md" >}}) - - [aops-agent部署指南]({{< relref "./docs/A-Ops/aops-agent部署指南.md" >}}) - - [配置溯源服务使用手册]({{< relref "./docs/A-Ops/配置溯源服务使用手册.md" >}}) - - [架构感知服务使用手册]({{< relref "./docs/A-Ops/架构感知服务使用手册.md" >}}) - - [gala-gopher使用手册]({{< relref "./docs/A-Ops/gala-gopher使用手册.md" >}}) - - [gala-anteater使用手册]({{< relref "./docs/A-Ops/gala-anteater使用手册.md" >}}) - - [gala-spider使用手册]({{< relref "./docs/A-Ops/gala-spider使用手册.md" >}}) -- [容器OS升级用户指南]({{< relref "./docs/KubeOS/overview.md" >}}) - - [认识容器OS升级]({{< relref "./docs/KubeOS/认识容器OS升级.md" >}}) - - [安装与部署]({{< relref "./docs/KubeOS/安装与部署.md" >}}) - - [使用方法]({{< relref "./docs/KubeOS/使用方法.md" >}}) - - [容器OS镜像制作指导]({{< relref "./docs/KubeOS/容器OS镜像制作指导.md" >}}) -- [云原生混合部署rubik用户指南]({{< relref "./docs/rubik/overview.md" >}}) - - [安装与部署]({{< relref "./docs/rubik/安装与部署.md" >}}) - - [http接口文档]({{< relref "./docs/rubik/http接口文档.md" >}}) - - [混部隔离示例]({{< relref "./docs/rubik/混部隔离示例.md" >}}) -- [oncn-bwm用户指南]({{< relref "./docs/oncn-bwm/overview.md" >}}) -- [镜像裁剪定制工具使用指南]({{< relref "./docs/TailorCustom/overview.md" >}}) - - [isocut 使用指南]({{< relref "./docs/TailorCustom/isocut使用指南.md" >}}) - - 
[imageTailor 使用指南]({{< relref "./docs/TailorCustom/imageTailor 使用指南.md" >}}) - [Gazelle用户指南]({{< relref "./docs/Gazelle/Gazelle.md" >}}) - [NestOS用户指南]({{< relref "./docs/NestOS/overview.md" >}}) - [安装与部署]({{< relref "./docs/NestOS/安装与部署.md" >}}) @@ -240,14 +194,6 @@ headless: true - [astream用户指南]({{< relref "./docs/astream/overview.md" >}}) - [安装与使用方法]({{< relref "./docs/astream/安装与使用方法.md" >}}) - [astream用户指南]({{< relref "./docs/astream/astream应用于MySQL指导.md" >}}) -- [容器管理面DPU无感卸载用户指南]({{< relref "./docs/DPUOffload/overview.md" >}}) - - [容器管理面无感卸载]({{< relref "./docs/DPUOffload/容器管理面无感卸载.md" >}}) - - [qtfs共享文件系统架构及使用手册]({{< relref "./docs/DPUOffload/qtfs共享文件系统架构及使用手册.md" >}}) - - [无感卸载部署指导]({{< relref "./docs/DPUOffload/无感卸载部署指导.md" >}}) -- [HSAK开发者指南]({{< relref "./docs/HSAK/introduce_hsak.md" >}}) - - [使用HSAK开发应用程序]({{< relref "./docs/HSAK/develop_with_hsak.md" >}}) - - [HSAK工具使用说明]({{< relref "./docs/HSAK/hsak_tools_usage.md" >}}) - - [HSAK接口说明]({{< relref "./docs/HSAK/hsak_interface.md" >}}) - [Kmesh用户指南]({{< relref "./docs/Kmesh/Kmesh.md" >}}) - [认识Kmesh]({{< relref "./docs/Kmesh/认识Kmesh.md" >}}) - [安装与部署]({{< relref "./docs/Kmesh/安装与部署.md" >}}) @@ -260,3 +206,6 @@ headless: true - [使用方法]({{< relref "./docs/NfsMultipath/使用方法.md" >}}) - [常见问题与解决办法]({{< relref "./docs/NfsMultipath/常见问题与解决方法.md" >}}) - [插件框架特性用户指南]({{< relref "./docs/Pin/插件框架特性用户指南.md" >}}) +- [OmniVirt用户指南]({{< relref "./docs/OmniVirt/overall.md" >}}) + - [在Windows下安装与运行OmniVirt]({{< relref "./docs/OmniVirt/win-user-manual.md" >}}) + - [在MacOS下安装与运行OmniVirt]({{< relref "./docs/OmniVirt/mac-user-manual.md" >}})