# DynaMigrate

**Repository Path**: yunwe/DynaMigrate

## Basic Information

- **Project Name**: DynaMigrate
- **Description**: A helper tool for DataX that automatically generates job configurations and runs synchronization
- **Primary Language**: Unknown
- **License**: MulanPSL-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-04-07
- **Last Updated**: 2025-07-07

## Categories & Tags

**Categories**: Uncategorized

**Tags**: Database, MySQL, Python, shell, sync

## README

# DynaMigrate

This project automatically generates DataX job configuration files to synchronize table schemas and data between various data sources. It detects all tables in the source data source and generates a corresponding DataX job file for each table.

## Features

1. Automatically discovers all tables in the source database
2. Generates a complete DataX job configuration for every table
3. Automatically detects a primary key or unique index to use for data sharding
4. Supports multiple write modes (insert/update/replace)
5. Configurable performance parameters (channel count, batch size, etc.)

## Getting Started

Run the container. If you are not using the container, install DataX and a Python 3 environment yourself.

```bash
docker run --name datax -d registry.cn-hangzhou.aliyuncs.com/huang-image/datax:v202309_5
docker exec -it datax bash
```

## Synchronizing Data

### MySQL_To_MySQL

#### Sync Steps

1. Create the configuration file `config.json`

After entering the container, change to the working directory: `cd /opt/datax/MySQL2MySQL`

Configure the database connection information and the write mode (update/insert/replace):

```json
{
  "source_db": {
    "host": "10.180.20.40",
    "port": 3309,
    "user": "root",
    "password": "Cityos123@",
    "database": "test",
    "ssl": false
  },
  "target_db": {
    "host": "10.180.20.41",
    "port": 3309,
    "user": "root",
    "password": "Cityos123@",
    "database": "test"
  },
  "datax": {
    "job_dir": "./job",
    "script_dir": "./scripts",
    "max_workers": 4,
    "channel": 8,
    "batch_size": 4096,
    "write_mode": "update",
    "pre_sql": []
  }
}
```

2. Sync the table schema

```bash
8ac94625d3ed:/opt/datax/MySQL2MySQL# python3 sync_mysql_schema.py config.json
2025-06-19 03:22:21,633 - INFO - Starting schema sync, begin time: 2025-06-19 03:22:21.633622
2025-06-19 03:22:21,633 - INFO - Connecting to source database...
2025-06-19 03:22:21,667 - INFO - Connecting to target database...
2025-06-19 03:22:21,670 - INFO - Getting source database objects...
2025-06-19 03:22:21,706 - INFO - Successfully got structure for table app_user
2025-06-19 03:22:21,706 - INFO - Got structures for 1 tables and 0 views
2025-06-19 03:22:21,706 - INFO - Starting schema sync to target database...
2025-06-19 03:22:21,708 - INFO - Starting table sync...
2025-06-19 03:22:21,708 - INFO - Table app_user already exists, skipping creation
2025-06-19 03:22:21,708 - INFO - Table app_user already exists, skipping creation
2025-06-19 03:22:21,711 - INFO - All objects synced successfully
2025-06-19 03:22:21,711 - INFO - Sync completed, end time: 2025-06-19 03:22:21.711862
2025-06-19 03:22:21,711 - INFO - Total duration: 0:00:00.078240

# Check the schema and row-count differences between the source and target databases
8ac94625d3ed:/opt/datax/MySQL2MySQL# python3 database_comparator.py config.json
2025-06-19 03:26:03,966 - INFO - Connecting to source database...
2025-06-19 03:26:03,985 - INFO - Connecting to target database...
2025-06-19 03:26:03,987 - INFO - Successfully connected to both databases
2025-06-19 03:26:03,990 - INFO - Common tables: 1
2025-06-19 03:26:03,990 - INFO - Tables only in source: 0
2025-06-19 03:26:03,990 - INFO - Tables only in target: 0
2025-06-19 03:26:03,990 - INFO - Starting row count comparison for common tables...
2025-06-19 03:26:04,396 - INFO - Table app_user: Counts match (800000)
2025-06-19 03:26:04,396 - INFO - Comparison summary:
2025-06-19 03:26:04,396 - INFO - Total tables compared: 1
2025-06-19 03:26:04,396 - INFO - Tables with matching counts: 1
2025-06-19 03:26:04,396 - INFO - Tables with mismatched counts: 0
2025-06-19 03:26:04,396 - INFO - Tables with comparison errors: 0
2025-06-19 03:26:04,396 - INFO - Results saved to results/comparison_results.json
2025-06-19 03:26:04,396 - INFO - Database connections closed
```

3.
Generate the job files. The generated JSON files are saved in the directory specified by `job_dir`.

```bash
8ac94625d3ed:/opt/datax/MySQL2MySQL# python3 generate_mysql_job.py config.json
2025-06-19 03:26:58,316 - INFO - Built writer config for table app_user with 9 columns
2025-06-19 03:26:58,316 - INFO - Generated 1 jobs
```

4. Synchronize the data

```bash
# If the script reports that some tables failed to migrate, their logs are stored under ./logs/. To run in the background, use: nohup ./run_sync.sh &
8ac94625d3ed:/opt/datax# ./run_sync.sh
[2025-06-19 03:27:10] Starting DataX sync with 4 parallel workers
[2025-06-19 03:27:10] Processing job (PID: 171)...
[2025-06-19 03:27:10] Processing app_user (PID: 172)...
[SUCCESS] Synced job
[SUCCESS] Synced app_user
----------------------------------------
[2025-06-19 03:27:33] Sync completed
Total runtime: 00:00:23
All tables synced successfully
```

To migrate only specific tables, move the JSON files of the tables you do not want to migrate out of `./job/` to another location after **step 3** (job generation) and before running **step 4**.

#### Configuration File Reference

An example `config.json`:

```json
{
  "source_db": {
    "host": "source_host",
    "port": 3306,
    "user": "username",
    "password": "password",
    "database": "source_db"
  },
  "target_db": {
    "host": "target_host",
    "port": 3306,
    "user": "username",
    "password": "password",
    "database": "target_db"
  },
  "datax": {
    "job_dir": "./jobs",
    "script_dir": "./scripts",
    "channel": 8,
    "batch_size": 4096,
    "write_mode": "insert",
    "where": "create_time > '2023-01-01'",
    "pre_sql": "truncate table ${table}"
  }
}
```

- **source_db**: source database connection settings
  - `host`: database host
  - `port`: port number (default 3306)
  - `user`: user name
  - `password`: password
  - `database`: database name
- **target_db**: target database connection settings (same format as `source_db`)
- **datax**: DataX-related settings
  - `job_dir`: directory where the generated job files are stored
  - `script_dir`: directory where the generated script files are stored
  - `max_workers`: optional, number of parallel sync workers used by `run_sync.sh` (4 in the first example above)
  - `channel`: number of concurrent channels (default 8)
  - `batch_size`: batch size (default 4096)
  - `write_mode`: write mode (insert/update/replace)
  - `where`: optional, data filter condition
  - `pre_sql`: optional, SQL executed before writing (the `${table}` variable may be used)

#### Advanced Options

1. **Incremental sync**: set a `where` condition in the config file, e.g. `"where": "update_time > '2023-01-01'"`
2. **Update mode**: set `"write_mode": "update"` and make sure the table has a primary key or unique index
3. **Preprocessing**: use `pre_sql` to run cleanup before syncing, e.g. `"pre_sql": "truncate table ${table}"`

#### Notes

1. Make sure the source and target databases can reach each other over the network
2. The target table schema should be identical to the source table
3. For large tables, configure an appropriate `splitPk` (the tool detects the primary key automatically)
4.
Back up the target database before synchronizing

### Mongodb_To_Mongodb

#### Sync Steps

#### Configuration File Description
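The MongoDB steps and configuration are not filled in yet. For orientation only, a DataX job between two MongoDB instances would typically pair the `mongodbreader` and `mongodbwriter` plugins. The sketch below is an assumed example, not taken from this repository: the hosts, credentials, collection, and column list are placeholders, and the exact parameter names should be verified against the DataX plugin documentation.

```json
{
  "job": {
    "setting": { "speed": { "channel": 2 } },
    "content": [{
      "reader": {
        "name": "mongodbreader",
        "parameter": {
          "address": ["source_host:27017"],
          "userName": "username",
          "userPassword": "password",
          "dbName": "source_db",
          "collectionName": "app_user",
          "column": [
            { "name": "_id", "type": "string" },
            { "name": "name", "type": "string" }
          ]
        }
      },
      "writer": {
        "name": "mongodbwriter",
        "parameter": {
          "address": ["target_host:27017"],
          "userName": "username",
          "userPassword": "password",
          "dbName": "target_db",
          "collectionName": "app_user",
          "column": [
            { "name": "_id", "type": "string" },
            { "name": "name", "type": "string" }
          ],
          "upsertInfo": { "isUpsert": "true", "upsertKey": "_id" }
        }
      }
    }]
  }
}
```

Unlike the MySQL case, MongoDB has no fixed schema to sync, so the column list must be declared explicitly per collection.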