# ocr_video

**Repository Path**: juht/ocr_video

## Basic Information

- **Project Name**: ocr_video
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-09-09
- **Last Updated**: 2025-09-09

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# OpenOCR Video Pipeline with Memory Optimization

This project provides a high-performance OCR pipeline specifically optimized for video processing with intelligent memory management. It supports configurable processing strategies for videos of all lengths while preventing memory overflow.

## Key Features

### Performance Optimizations
* **Memory-Optimized Processing:** Automatically adjusts batch sizes and segment durations based on available memory
* **Dynamic Resource Management:** Monitors and controls RAM usage to prevent out-of-memory errors
* **Video Length Optimization:** Special handling for short, medium and long videos
* **Segment-Based Processing:** Divides long videos into manageable chunks to optimize memory usage

### OCR Capabilities
* **Region of Interest (ROI) Filtering:** Focus OCR on specific areas (e.g., subtitle regions)
* **Text Line Merging:** Intelligently combines related text boxes into coherent lines
* **Frame Skipping:** Process only key frames to improve performance
* **Subtitle Tracking:** Tracks text across frames for consistent subtitle extraction
* **Duplicate Suppression:** Prevents repetitive text while maintaining temporal accuracy

### Output Options
* **JSON Export:** Detailed OCR results with timestamps and positions
* **SRT Generation:** Creates subtitle files compatible with video players
* **Visualization:** Optional debug visualizations of detected text regions
* **Performance Metrics:** Memory usage graphs and timing statistics

## Installation

```bash
# Clone repository
git clone https://github.com/yourusername/openocr-video.git
cd openocr-video

# Install dependencies
pip install -r requirements.txt
```

## Quick Start

```bash
# Basic usage with memory optimization
python main.py --video_path input.mp4 --generate_srt --max_memory_gb 4.0

# For subtitle extraction with region of interest (bottom of screen)
python main.py --video_path input.mp4 --roi "0.1,0.7,0.9,0.97" --max_memory_gb 4.0

# For high-resolution videos on limited-memory systems
python main.py --video_path 4k_video.mp4 --resize_factor 0.4 --max_memory_gb 2.0
```

## Memory Optimization Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| `--max_memory_gb` | Maximum memory usage allowed in GB | 4.0 |
| `--resize_factor` | Frame size reduction factor (0-1) | 0.5 |
| `--initial_batch_size` | Starting batch size for processing | 8 |
| `--segment_duration_sec` | Duration of each video segment in seconds | 60 |
| `--dynamic_sizing` | Enable automatic batch and segment size adjustment | True |

## Understanding `segment_duration_sec`

The `segment_duration_sec` parameter is crucial for memory management and processing efficiency:

### What It Does
This parameter controls how video frames are grouped for processing. The system:
1. Divides the video into time segments (e.g., 60-second chunks)
2. Opens the video file and processes one segment
3. Closes the video file and releases all memory
4. Repeats for the next segment

### Impact on Memory Usage
- **Smaller Values (10-30 seconds)**:
  - Lower peak memory consumption
  - More frequent memory cleanup
  - Better for memory-constrained systems
  - Recommended for 4K videos or when processing with limited RAM

- **Larger Values (120-300 seconds)**:
  - Higher peak memory usage
  - Fewer memory cleanup operations
  - Better for systems with abundant RAM (16GB+)
  - May cause out-of-memory errors on long, high-resolution videos

### Impact on Processing Speed
- **Smaller Values (10-30 seconds)**:
  - More overhead from frequent video file open/close operations
  - Slightly slower overall processing due to segment transition overhead
  - Better for real-time monitoring of progress
  
- **Larger Values (120-300 seconds)**:
  - Less file operation overhead
  - Typically faster overall processing when memory is sufficient
  - May become slower if memory limits are reached and system starts swapping

### Recommended Settings by Video Length

| Video Length | Recommended Setting | Impact on Speed |
|--------------|---------------------|-----------------|
| < 3 minutes  | Set to video length | 2-5x faster     |
| 3-10 minutes | 1/3 of video length | 1.5-3x faster   |
| > 10 minutes | 60-120 seconds      | Minimal impact  |

**Note:** With `dynamic_sizing=True` (default), the system will automatically optimize this parameter based on video length.

## Optimizing for Video Length

The system automatically adjusts its processing strategy based on video duration:

### Short Video Optimization (< 3 minutes)
- **Single Segment Processing**: Processes the entire video in one pass
- **Larger Batch Sizes**: Uses up to 2x larger batch sizes for faster processing
- **Reduced Overhead**: Eliminates file opening/closing operations
- **Maximum Memory Utilization**: Uses more available memory to speed up processing

### Medium Video Optimization (3-30 minutes)
- **Reduced Segmentation**: Uses fewer segments to minimize overhead
- **Balanced Memory Usage**: Adjusts segment size proportionally to video length
- **Adaptive Batch Sizing**: Maintains larger batch sizes when memory permits

### Long Video Optimization (> 30 minutes)
- **Full Memory Management**: Applies complete memory optimization strategy
- **Controlled Segmentation**: Keeps segment size between 30-300 seconds based on available memory
- **Conservative Batch Sizing**: Prioritizes stable processing over speed

### Memory vs Speed Trade-off:
- With `dynamic_sizing=True`, the system automatically optimizes for your video length
- For short videos on high-memory systems, set `max_memory_gb` higher (e.g., 8.0-16.0) to maximize speed
- For short videos on limited memory, keep `resize_factor` low (e.g., 0.3-0.4) to allow larger batch sizes

## OCR Processing Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| `--sec_skip` | Process frames every N seconds | 0.5 |
| `--roi` | Region of interest (format: "x1,y1,x2,y2") | "0.1,0.6,0.9,0.97" |
| `--drop_score` | Minimum confidence score | 0.9 |
| `--line_y_thresh` | Vertical threshold for line merging | 0.5 |
| `--line_x_gap` | Horizontal gap threshold for line merging | 0.3 |
| `--iou_thresh` | IoU threshold for duplicate detection | 0.5 |
| `--min_interval` | Minimum time between duplicate text | 5.0 |
| `--text_sim_threshold` | Text similarity threshold | 0.8 |

## Output Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| `--output_json` | Path to save JSON results | "output_results.json" |
| `--output_srt` | Path to save SRT subtitle file | "output_results.srt" |
| `--generate_srt` | Generate SRT file | True |
| `--debug_det_dir` | Directory to save detection visualizations | None |

## Advanced Usage Examples

### Processing a Long HD Movie

```bash
python main.py --video_path movie.mp4 --max_memory_gb 6.0 --segment_duration_sec 120 \
  --roi "0.1,0.75,0.9,0.95" --sec_skip 1.0 --resize_factor 0.5
```

### Processing a 4K Video on Limited Memory

```bash
python main.py --video_path 4k_concert.mp4 --max_memory_gb 2.0 --segment_duration_sec 30 \
  --resize_factor 0.3 --initial_batch_size 4
```

### Maximum Speed for Short Videos

```bash
python main.py --video_path short_clip.mp4 --max_memory_gb 16.0 \
  --dynamic_sizing --initial_batch_size 16
```

### Extracting Subtitles from an Anime

```bash
python main.py --video_path anime_episode.mp4 --roi "0.1,0.8,0.9,0.95" \
  --text_sim_threshold 0.95 --min_interval 3.0
```

## Performance Logging

The system automatically logs performance metrics to the `logs/` directory:
- Memory usage graphs
- Processing speed statistics
- Segment and batch timing details

## JSON Output Format

```json
[
  {
    "start_time": 10.5,
    "end_time": 13.2,
    "timestamp": 10.5,
    "texts": [
      {
        "text": "Example subtitle text",
        "score": 0.97,
        "box": [250, 400, 800, 50]
      }
    ]
  }
]
```

## Understanding Memory Optimization

The system uses a multi-level memory optimization approach:

1. **Frame-Level**: Resizes frames to reduce memory footprint
2. **Batch-Level**: Dynamically adjusts batch sizes based on available memory
3. **Segment-Level**: Processes the video in time-based chunks (segment_duration_sec)
4. **Video-Length Aware**: Uses different strategies for short, medium, and long videos
5. **Monitoring**: Continuously tracks memory usage and adjusts accordingly

## Troubleshooting

### Out of Memory Errors
- Decrease `--max_memory_gb` to set a lower limit
- Reduce `--resize_factor` to work with smaller frames
- Decrease `--segment_duration_sec` to process smaller video chunks
- Lower `--initial_batch_size` to reduce peak memory usage

### Slow Processing
- Increase `--sec_skip` to process fewer frames
- Set a specific ROI to process smaller image regions
- Increase `--max_memory_gb` if you have available RAM
- Decrease `--segment_duration_sec` for short videos to avoid unnecessary segmentation
- Increase `--initial_batch_size` for faster batch processing when memory allows

### Poor Text Recognition
- Decrease `--resize_factor` to maintain higher image quality
- Adjust `--line_y_thresh` and `--line_x_gap` for better line merging
- Increase `--drop_score` threshold for higher confidence results
- Reduce `--text_sim_threshold` to capture more variations in text

## License

[MIT License](LICENSE)