# obs-datax-plugins
**Repository Path**: HuaweiCloudDeveloper/obs-datax-plugins
## Basic Information
- **Project Name**: obs-datax-plugins
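- **Description**: OBS plugins for DataX, providing fast and stable data movement between a wide range of heterogeneous data sources in complex network environments, as well as data synchronization solutions for complicated business scenarios.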
- **Primary Language**: Java
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 8
- **Forks**: 2
- **Created**: 2022-08-25
- **Last Updated**: 2025-08-16
## Categories & Tags
**Categories**: big-data
**Tags**: None
## README
OBS plugins for DataX, split into two parts: a reader and a writer.  
Contributed to by [潍坊雷鸣云网络科技有限公司](http://www.leimingyun.com).
## Usage Documentation
* [obsreader (read)](obsreader/doc/obsreader.md)
* [obswriter (write)](obswriter/doc/obswriter.md)
## Usage Steps
#### 1. Clone the official DataX repository with git
Official DataX git repository: https://github.com/alibaba/DataX
#### 2. Modify DataX's pom.xml
Add the `obsreader` and `obswriter` module entries inside the `<modules>` element:
```
    <module>obsreader</module>
    <module>obswriter</module>
```

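A quick way to confirm that the edit took effect is to parse `pom.xml` and list any module that is still missing. The following is a minimal sketch, not part of the plugin; the helper name `missing_modules` and the trimmed-down sample POM are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Maven POM files declare this default namespace on the <project> root.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def missing_modules(pom_text, required=("obsreader", "obswriter")):
    """Return the required module names that are not listed under <modules>."""
    root = ET.fromstring(pom_text)
    declared = {
        m.text.strip()
        for m in root.findall("m:modules/m:module", POM_NS)
        if m.text
    }
    return [name for name in required if name not in declared]


if __name__ == "__main__":
    # Hypothetical, trimmed-down pom.xml used only to illustrate the check.
    sample = """<project xmlns="http://maven.apache.org/POM/4.0.0">
      <modules>
        <module>common</module>
        <module>obsreader</module>
      </modules>
    </project>"""
    print(missing_modules(sample))  # ['obswriter']
```

Against a real checkout, the same function can be fed the file contents, for example `missing_modules(open("pom.xml", "rb").read())`; an empty list means both modules are declared.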
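#### 3. Modify DataX's package.xml
Add the `obsreader` and `obswriter` file set entries inside the `<fileSets>` element:
```
    <fileSet>
        <directory>obsreader/target/datax/</directory>
        <includes>
            <include>**/*.*</include>
        </includes>
        <outputDirectory>datax</outputDirectory>
    </fileSet>
    <fileSet>
        <directory>obswriter/target/datax/</directory>
        <includes>
            <include>**/*.*</include>
        </includes>
        <outputDirectory>datax</outputDirectory>
    </fileSet>
```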

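The `package.xml` change can be checked in the same spirit by looking for the two plugin output directories among the `<directory>` entries. This is only a sketch: `missing_filesets` is a hypothetical helper, and it deliberately ignores XML namespaces so it does not depend on which assembly descriptor namespace the file declares.

```python
import xml.etree.ElementTree as ET


def missing_filesets(package_xml, plugins=("obsreader", "obswriter")):
    """Return plugins whose '<plugin>/target/datax/' directory is not listed."""
    root = ET.fromstring(package_xml)
    # Strip any '{namespace}' prefix so the check works with or without xmlns.
    directories = {
        el.text.strip()
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "directory" and el.text
    }
    return [p for p in plugins if f"{p}/target/datax/" not in directories]


if __name__ == "__main__":
    # Hypothetical, trimmed-down descriptor used only to illustrate the check.
    sample = """<assembly>
      <fileSets>
        <fileSet>
          <directory>obsreader/target/datax/</directory>
          <outputDirectory>datax</outputDirectory>
        </fileSet>
      </fileSets>
    </assembly>"""
    print(missing_filesets(sample))  # ['obswriter']
```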
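#### 4. Build the package
The following uses macOS as an example.
1. Go to the DataX project root directory (here DataX is checked out at git/DataX)
    `cd git/DataX`
2. Build the package with Maven
    `mvn -U clean package assembly:assembly -Dmaven.test.skip=true`
When the build succeeds, the log ends as follows: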
```
[INFO] BUILD SUCCESS
[INFO] -----------------------------------------------------------------
[INFO] Total time: 08:12 min
[INFO] Finished at: 2022-09-13T16:26:48+08:00
[INFO] Final Memory: 133M/960M
[INFO] -----------------------------------------------------------------
```
After a successful build, the DataX package is located at {DataX_source_code_home}/target/datax/datax/ and has the following structure:
```
$ ls ./target/datax/datax/
bin		conf		job		lib		log		log_perf	plugin		script		tmp
```
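Before moving on, it can help to verify that both OBS plugins actually ended up in the built package. The sketch below assumes the usual DataX layout of `plugin/reader/<name>` and `plugin/writer/<name>` under the package root; that layout and the helper name `missing_plugins` are assumptions, not something this README states.

```python
import sys
from pathlib import Path


def missing_plugins(datax_home,
                    expected=(("reader", "obsreader"), ("writer", "obswriter"))):
    """Return plugin names with no directory under <datax_home>/plugin/<kind>/."""
    home = Path(datax_home)
    return [
        name
        for kind, name in expected
        # Assumed layout: plugin/reader/obsreader and plugin/writer/obswriter.
        if not (home / "plugin" / kind / name).is_dir()
    ]


if __name__ == "__main__":
    # Default path matches the package location described above.
    home = sys.argv[1] if len(sys.argv) > 1 else "./target/datax/datax"
    print(missing_plugins(home))
```

An empty list means both plugin directories were found; any name printed points to a module that was not packaged, which usually traces back to step 2 or step 3.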
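#### 5. Usage
1. Go to the bin directory of the built DataX package
    `cd git/DataX/target/datax/datax/bin/`
2. Create the job configuration file (JSON format)
   (1) View the configuration template with the following command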
    
    `python datax.py -r obsreader -w obswriter`
   
   The template looks like this:
   ```
   $ python datax.py -r obsreader -w obswriter
   DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
   Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
   Please refer to the obsreader document:
      https://github.com/alibaba/DataX/blob/master/obsreader/doc/obsreader.md
   Please refer to the obswriter document:
      https://github.com/alibaba/DataX/blob/master/obswriter/doc/obswriter.md
   Please save the following configuration as a json file and  use
      python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json
   to run the job.
   {
      "job": {
         "content": [
            {
               "reader": {
                  "name": "obsreader",
                  "parameter": {
                     "accessKey": "",
                     "bucket": "",
                     "column": [],
                     "compress": "",
                     "encoding": "",
                     "endpoint": "",
                     "fieldDelimiter": "",
                     "object": [],
                     "secretKey": ""
                  }
               },
               "writer": {
                  "name": "obswriter",
                  "parameter": {
                     "accessKey": "",
                     "bucket": "",
                     "encoding": "",
                     "endpoint": "",
                     "fieldDelimiter": "",
                     "object": "",
                     "secretKey": "",
                     "writeMode": ""
                  }
               }
            }
         ],
         "setting": {
            "speed": {
               "channel": ""
            }
         }
      }
   }
   ```
   (2) Fill in a JSON file based on the template, for example `obsreader2obswriter.json`:
    ```
    {
      "job": {
        "content": [
          {
            "reader": {
              "name": "obsreader",
              "parameter": {
                "accessKey": "YOURHUAWEICLOUDACCESSKEY",
                "bucket": "testbucket",
                "column": [
                  "*"
                ],
                "compress": "",
                "encoding": "utf-8",
                "endpoint": "https://obs.cn-north-4.myhuaweicloud.com",
                "fieldDelimiter": ",",
                "object": [
                  "readertest/*"
                ],
                "secretKey": "YOURHUAWEICLOUDSECRETKEY"
              }
            },
            "writer": {
              "name": "obswriter",
              "parameter": {
                "accessKey": "YOURHUAWEICLOUDACCESSKEY",
                "secretKey": "YOURHUAWEICLOUDSECRETKEY",
                "bucket": "testbucket",
                "encoding": "",
                "endpoint": "https://obs.cn-north-4.myhuaweicloud.com",
                "fieldDelimiter": "",
                "object": "writertest/test",
                "writeMode": "append"
              }
            }
          }
        ],
        "setting": {
          "speed": {
            "channel": "1"
          }
        }
      }
    }
    ```
3. Start DataX
   ```
   cd git/DataX/target/datax/datax/bin/
   python datax.py ./obsreader2obswriter.json
   ```
   When the synchronization finishes, the log shows:
   ```
   ...
   2022-09-14 09:18:25.508 [job-0] INFO  JobContainer - 
   任务启动时刻                    : 2022-09-14 09:18:12
   任务结束时刻                    : 2022-09-14 09:18:25
   任务总计耗时                    :                 13s
   任务平均流量                    :              445B/s
   记录写入速度                    :              4rec/s
   读出记录总数                    :                  48
   读写失败总数                    :                   0
   ```
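To keep access keys out of a hand-edited file, the job configuration shown in step 2 can also be generated. The sketch below mirrors the example values from that configuration; the function name `build_job` and the environment variable names `OBS_ACCESS_KEY` and `OBS_SECRET_KEY` are hypothetical choices, not something the plugins require.

```python
import json
import os


def build_job(access_key, secret_key, bucket, endpoint,
              src_objects, dst_object, channel="1"):
    """Build an obsreader -> obswriter job dict shaped like the example above."""
    common = {
        "accessKey": access_key,
        "secretKey": secret_key,
        "bucket": bucket,
        "endpoint": endpoint,
    }
    reader = {
        "name": "obsreader",
        "parameter": {
            **common,
            "column": ["*"],
            "compress": "",
            "encoding": "utf-8",
            "fieldDelimiter": ",",
            "object": list(src_objects),
        },
    }
    writer = {
        "name": "obswriter",
        "parameter": {
            **common,
            "encoding": "",
            "fieldDelimiter": "",
            "object": dst_object,
            "writeMode": "append",
        },
    }
    return {
        "job": {
            "content": [{"reader": reader, "writer": writer}],
            "setting": {"speed": {"channel": channel}},
        }
    }


if __name__ == "__main__":
    job = build_job(
        # Hypothetical environment variable names; pick whatever suits your setup.
        access_key=os.environ.get("OBS_ACCESS_KEY", ""),
        secret_key=os.environ.get("OBS_SECRET_KEY", ""),
        bucket="testbucket",
        endpoint="https://obs.cn-north-4.myhuaweicloud.com",
        src_objects=["readertest/*"],
        dst_object="writertest/test",
    )
    print(json.dumps(job, indent=2))
```

Redirecting the output to `obsreader2obswriter.json` produces a file that can be passed to `python datax.py` as in step 3.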
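## Quick Deployment from the Huawei Cloud Marketplace
The plugins are published in the Huawei Cloud Marketplace. All you need is a Huawei Cloud account to install and deploy them with one click; the deployment comes with a visual UI that can be accessed and used directly.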
[https://marketplace.huaweicloud.com/contents/59350065-23ed-4522-8dcd-fae455fa6e7b#productid=00301-1361005-0--0](https://marketplace.huaweicloud.com/contents/59350065-23ed-4522-8dcd-fae455fa6e7b#productid=00301-1361005-0--0)