# Mediapipe-BiLSTM-SLR

**Repository Path**: elfbobo_admin_admin/bilstmslr

## Basic Information

- **Project Name**: Mediapipe-BiLSTM-SLR
- **Description**: A sign language recognition system based on MediaPipe and BiLSTM, covering data collection, data processing, model training, model evaluation, and a visual testing panel.
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2026-04-22
- **Last Updated**: 2026-04-22

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# MediapipeBiLSTMSLR

MediapipeBiLSTMSLR is a lightweight open-source sign language recognition project based on `MediaPipe` landmark extraction and `BiLSTM` temporal modeling. It provides a complete pipeline including **data collection, data processing, model training, model evaluation, and a real-time testing panel**, making it suitable for coursework, research experiments, algorithm reproduction, and secondary development.

## Overview

This project uses `MediaPipe Hands + Pose` to extract hand and upper-body landmarks, converts them into frame-wise feature vectors, and feeds the sequential features into a `BiLSTM` model for sign language recognition. The implementation is designed to be simple, lightweight, and easy to reproduce, so a prototype system can be built quickly on ordinary hardware.
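As a rough illustration of the frame-wise feature vectors mentioned above, the sketch below concatenates the `(x, y, z)` coordinates of both hands and the pose into one fixed-dimensional vector. This is an assumption-laden sketch, not the project's actual code: the landmark counts are MediaPipe defaults (21 per hand, 33 for full Pose), while the real feature dimension and landmark subset (e.g. upper-body only) are defined in `config.py`.

```python
import numpy as np

NUM_HAND_LANDMARKS = 21   # MediaPipe Hands landmarks per hand
NUM_POSE_LANDMARKS = 33   # full MediaPipe Pose; the project may keep only upper-body points

def frame_feature(left_hand, right_hand, pose):
    """Concatenate (x, y, z) coordinates of both hands and the pose into one vector.

    Missing detections are replaced with zeros so every frame has the same dimension.
    Each input is an iterable of (x, y, z) triples, or None when nothing was detected.
    """
    parts = []
    for landmarks, count in ((left_hand, NUM_HAND_LANDMARKS),
                             (right_hand, NUM_HAND_LANDMARKS),
                             (pose, NUM_POSE_LANDMARKS)):
        if landmarks is None:
            parts.append(np.zeros(count * 3, dtype=np.float32))
        else:
            parts.append(np.asarray(landmarks, dtype=np.float32).reshape(-1))
    return np.concatenate(parts)

# A sample is then a (num_frames, feature_dim) array, which is what the BiLSTM consumes.
```

With full Pose landmarks this yields a 225-dimensional vector per frame (2 × 21 × 3 + 33 × 3); zero-filling undetected hands keeps the dimension constant across frames.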
### Key Features

- Hand and pose landmark extraction based on `MediaPipe`
- Temporal sequence modeling with `BiLSTM`
- GUI-based data collection tool
- Data augmentation, normalization, and fixed-length sequence processing
- Model training and evaluation modules
- Real-time sign language recognition test panel
- Clear code structure for open-source release and secondary development

## Modules

- `data_collector_gui.py`: GUI tool for sign language data collection
- `data_processor.py`: data augmentation, sequence alignment, normalization, and dataset generation
- `model_trainer.py`: BiLSTM model training and training-curve export
- `model_evaluator.py`: model evaluation, metrics summary, and visualization
- `test_gui.py`: real-time recognition test panel
- `utils.py`: utility functions for landmark extraction, position checking, and visualization
- `config.py`: project paths and hyperparameter settings
- `mp_compat.py`: compatibility patch for MediaPipe / protobuf

## Project Structure

```text
MediapipeBiLSTMSLR/
├── config.py
├── mp_compat.py
├── utils.py
├── data_collector_gui.py
├── data_processor.py
├── model_trainer.py
├── model_evaluator.py
├── test_gui.py
├── requirements.txt
├── README.md
├── data/
│   ├── raw/
│   └── processed/
└── models/
```

## Requirements

`Python 3.10 ~ 3.11` is recommended. Install dependencies with:

```bash
pip install -r requirements.txt
```

## Data Format

### Raw Data Directory

Collected landmark sequences are stored in:

```text
data/raw//
```

Each sample usually contains:

- `video_xxx.npy`: landmark feature sequence
- `video_xxx_metadata.json`: sample metadata

### Processed Data Directory

Processed training data are saved in:

```text
data/processed/
```

Main files include:

- `sequences.npy`
- `labels.npy`
- `label_map.json`

## Usage
### 1. Data Collection

```bash
python data_collector_gui.py
```

Suggestions:

- Keep the upper body fully visible in the camera view
- Make hand movements clear and consistent
- Collect multiple samples for each gesture class
- Include different signers, speeds, and motion amplitudes for better generalization

### 2. Data Processing

```bash
python data_processor.py
```

This step performs:

- raw data loading
- data augmentation
- sequence padding or truncation
- feature normalization
- processed dataset export

### 3. Model Training

```bash
python model_trainer.py
```

After training, the `models/` directory will contain:

- `best_model.keras`
- `model.keras`
- `label_map.json`
- `model_config.json`
- `training_history.png`

### 4. Model Evaluation

```bash
python model_evaluator.py
```

Evaluation outputs include:

- classification report
- confusion matrix
- confidence distribution plot
- per-class performance plot
- `evaluation_results.json`

### 5. Real-time Testing

```bash
python test_gui.py
```

This panel allows real-time sign language recording and recognition via webcam.

## Method

The overall pipeline is as follows:

1. Extract hand and upper-body landmarks using `MediaPipe`
2. Convert each frame into a fixed-dimensional feature vector
3. Build temporal sequences from consecutive frames
4. Use `BiLSTM` to model temporal dynamics
5. Predict sign categories with fully connected layers

## Configurable Parameters

You can modify the following in `config.py`:

- data collection settings
- MediaPipe detection thresholds
- sequence length
- feature dimension
- augmentation factor
- learning rate
- batch size
- number of epochs
- dropout rate
- recognition threshold

## Use Cases

This project is suitable for:

- sign language recognition coursework
- undergraduate / graduate research projects
- learning MediaPipe + temporal modeling
- small-scale action recognition prototyping
- secondary development and algorithm extension

## Notes

- This project trains on landmark sequences instead of raw video frames.
- With limited data, model performance may vary significantly.
- For better generalization, expand the dataset and include more subjects.
- Real-time recognition performance depends on camera quality, lighting conditions, and motion consistency.

## Open-source Statement

This repository is an open-source extraction of the core sign language recognition module, retaining the essential pipeline and major functionality for standalone deployment and reuse.

If you use this repository in research or engineering projects, citation or attribution is appreciated.
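## Appendix: Model Sketch

For readers who want a concrete starting point, here is a minimal sketch of the kind of BiLSTM classifier the pipeline describes: stacked bidirectional LSTM layers over a padded landmark sequence, followed by dropout and a softmax output. The layer sizes, sequence length, feature dimension, and class count below are illustrative assumptions, not the values used by `model_trainer.py` (those live in `config.py`).

```python
# A hedged sketch of a BiLSTM sign classifier in Keras; hyperparameters are placeholders.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(seq_len=30, feature_dim=225, num_classes=10, dropout=0.5):
    model = models.Sequential([
        layers.Input(shape=(seq_len, feature_dim)),
        layers.Masking(mask_value=0.0),  # skip zero-padded frames
        layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
        layers.Bidirectional(layers.LSTM(64)),
        layers.Dropout(dropout),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Training then reduces to `model.fit(sequences, labels, ...)` on the arrays produced by the processing step; `Masking` lets shorter samples that were zero-padded to the fixed sequence length be handled cleanly.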