# ocr

**Repository Path**: jigr/ocr

## Basic Information

- **Project Name**: ocr
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-11-01
- **Last Updated**: 2025-11-01

## README

# OCR Image Recognition API

A FastAPI-based OCR (Optical Character Recognition) service using PaddleOCR for text recognition from images.

## Features

- 🚀 Fast and efficient OCR processing using PaddleOCR 3.1
- 🔧 Built with FastAPI for high-performance web API
- 🖼️ Support for multiple image formats (JPG, PNG, BMP, TIFF, WebP)
- 🌐 Multiple language support (Chinese, English, and more)
- 📁 File upload and Base64 image processing
- 🏥 Health check endpoints
- 📊 Detailed OCR results with confidence scores and bounding boxes
- ⚙️ Configurable settings via environment variables
- 🖼️ **Advanced Image Preprocessing** - Enhance image quality for better OCR accuracy
- 📐 **Skew Correction** - Automatically correct tilted documents and images
- 🎯 **High Accuracy Models** - Use server models for maximum recognition accuracy

## Requirements

- Python 3.8+
- PaddlePaddle 3.2.0 (CPU version)
- PaddleOCR 3.1.0
- FastAPI
- Other dependencies listed in `requirements.txt`

## Installation

1. **Clone or download this project**
   ```bash
   cd ocr
   ```

2. **Create a virtual environment (recommended)**
   ```bash
   python -m venv venv
   
   # On Windows
   venv\Scripts\activate
   
   # On macOS/Linux
   source venv/bin/activate
   ```

3. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```
   
   **Note:** If you encounter dependency conflicts (especially with numpy/opencv), see [DEPENDENCY_RESOLUTION.md](DEPENDENCY_RESOLUTION.md) for solutions, or use the stable version:
   ```bash
   pip install -r requirements-stable.txt
   ```

4. **Set up environment variables (optional)**
   ```bash
   # On Windows
   copy .env.example .env

   # On macOS/Linux
   cp .env.example .env

   # Then edit the .env file to customize settings
   ```

## Usage

### Starting the Server

```bash
python main.py
```

Or using uvicorn directly:
```bash
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```

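The examples in this README assume the API is reachable at `http://localhost:8001`. The `uvicorn` command above binds port 8000, and the documented `PORT` default is also 8000, so either start the server on port 8001 or adjust the port in the example URLs.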

### API Documentation

- **Swagger UI**: `http://localhost:8001/docs`
- **ReDoc**: `http://localhost:8001/redoc`
- **Enhanced API Guide**: [API_DOCUMENTATION.md](API_DOCUMENTATION.md)
- **Digit-Only OCR Guide**: [DIGITS_OCR_DOCUMENTATION.md](DIGITS_OCR_DOCUMENTATION.md)
- **License Plate Removal OCR Guide**: [LICENSE_PLATE_OCR_DOCUMENTATION.md](LICENSE_PLATE_OCR_DOCUMENTATION.md)
- **Image Preprocessing Guide**: [IMAGE_PREPROCESSING_DOCUMENTATION.md](IMAGE_PREPROCESSING_DOCUMENTATION.md)
- **High Accuracy Models Guide**: [HIGH_ACCURACY_MODELS.md](HIGH_ACCURACY_MODELS.md)

## API Endpoints

### Enhanced OCR API (Recommended)
- `POST /api/ocr/recognize` - Advanced OCR with configurable options
  - Language selection (ch, en, fr, german, korean, etc.)
  - Confidence threshold filtering
  - **Image enhancement and skew correction options**
  - Detailed response with statistics
  - File size validation
- `POST /api/ocr/digits` - Specialized digit-only OCR with filtering
  - Optimized for numerical data (phone numbers, IDs, prices, etc.)
  - Configurable space and special character filtering
  - **Image enhancement and skew correction options**
  - Higher default confidence threshold for accuracy
- `POST /api/ocr/digits_without_license_plate` - Digit-only OCR with license plate removal
  - Detects and removes license plate regions
  - **Image enhancement and skew correction options**
  - Duplicate digit removal capabilities
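
The space and special-character filtering described for `/api/ocr/digits` can be pictured with a small sketch. The parameter names `filter_spaces` and `filter_special_chars` mirror the query parameters used in the Python client example further down; the function `filter_digits` and its logic are assumptions for illustration, not the service's actual implementation.

```python
def filter_digits(text: str, filter_spaces: bool = True,
                  filter_special_chars: bool = True) -> str:
    """Keep ASCII digits; optionally keep whitespace and punctuation.

    Hypothetical helper: letters are always dropped, whitespace is kept
    only when filter_spaces is False, and other non-letter characters
    are kept only when filter_special_chars is False.
    """
    kept = []
    for ch in text:
        if "0" <= ch <= "9":
            kept.append(ch)
        elif ch.isspace():
            if not filter_spaces:
                kept.append(ch)
        elif not ch.isalpha() and not filter_special_chars:
            kept.append(ch)  # punctuation such as "-" or ":"
    return "".join(kept)
```

With both filters on, `filter_digits("Tel: 138-0013 8000")` keeps only the eleven digits. Turning `filter_spaces` off preserves the grouping of an ID or phone number.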

### Legacy Endpoints
- `POST /ocr/upload` - Basic OCR from uploaded image file
- `POST /ocr/base64` - OCR from base64 encoded image

### Utility Endpoints
- `GET /` - Basic health check
- `GET /health` - Detailed health check with OCR service status
- `GET /ocr/languages` - Get list of supported OCR languages
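
Because the service may need time to load models at startup, a client can poll `GET /health` before sending OCR requests. The following is a minimal sketch that treats any HTTP 200 as "ready". The body of the health response is not documented here, so it is not inspected, and the helper names `http_ok` and `wait_until_healthy` are hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status.
        return False


def wait_until_healthy(check: Callable[[], bool], attempts: int = 10,
                       delay: float = 1.0) -> bool:
    """Call `check` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A typical call would be `wait_until_healthy(lambda: http_ok("http://localhost:8001/health"))`. Passing the check as a callable keeps the retry loop independent of the HTTP library.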

## Usage Examples

### 1. Enhanced OCR API (Recommended)

```bash
# Basic usage
curl -X POST "http://localhost:8001/api/ocr/recognize" \
     -H "accept: application/json" \
     -H "Content-Type: multipart/form-data" \
     -F "file=@image.jpg"

# With advanced options including preprocessing
curl -X POST "http://localhost:8001/api/ocr/recognize?language=en&confidence_threshold=0.8&enhance_image=true&correct_skew=true" \
     -H "accept: application/json" \
     -H "Content-Type: multipart/form-data" \
     -F "file=@document.png"

# Digit-only OCR with preprocessing
curl -X POST "http://localhost:8001/api/ocr/digits?enhance_image=true" \
     -H "accept: application/json" \
     -H "Content-Type: multipart/form-data" \
     -F "file=@image_with_digits.jpg"

# Digit-only OCR with all options
curl -X POST "http://localhost:8001/api/ocr/digits?confidence_threshold=0.9&enhance_image=true&correct_skew=true" \
     -H "accept: application/json" \
     -H "Content-Type: multipart/form-data" \
     -F "file=@id_card.png"
```

### 2. Legacy Upload Image File

```bash
curl -X POST "http://localhost:8001/ocr/upload" \
     -H "accept: application/json" \
     -H "Content-Type: multipart/form-data" \
     -F "file=@image.jpg"
```

### 3. Base64 Image Processing

```bash
curl -X POST "http://localhost:8001/ocr/base64" \
     -H "accept: application/json" \
     -H "Content-Type: application/json" \
     -d '{"image_base64": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD..."}'
```

### 4. Python Client Example

```python
import base64

import requests

# Enhanced OCR API with preprocessing (recommended)
with open('image.jpg', 'rb') as f:
    files = {'file': ('image.jpg', f, 'image/jpeg')}
    params = {
        'language': 'ch',  # Chinese + English
        'confidence_threshold': 0.7,
        'enhance_image': True,     # Apply image enhancement
        'correct_skew': True       # Apply skew correction
    }
    response = requests.post(
        'http://localhost:8001/api/ocr/recognize', 
        files=files, 
        params=params
    )
    result = response.json()
    print(f"Recognized text: {result['data']['total_text']}")
    print(f"Confidence: {result['data']['average_confidence']:.3f}")

# Digit-only OCR API with preprocessing (specialized for numbers)
with open('id_card.jpg', 'rb') as f:
    files = {'file': ('id_card.jpg', f, 'image/jpeg')}
    params = {
        'confidence_threshold': 0.9,  # Higher for digits
        'filter_spaces': False,        # Keep spaces in ID numbers
        'filter_special_chars': True,  # Remove special chars
        'enhance_image': True          # Apply image enhancement
    }
    response = requests.post(
        'http://localhost:8001/api/ocr/digits', 
        files=files, 
        params=params
    )
    result = response.json()
    print(f"Recognized digits: {result['data']['total_text']}")

# Legacy upload API
with open('image.jpg', 'rb') as f:
    files = {'file': f}
    response = requests.post('http://localhost:8001/ocr/upload', files=files)
    result = response.json()
    print("Recognized text:", result['data']['total_text'])

# Base64 processing
with open('image.jpg', 'rb') as f:
    image_data = base64.b64encode(f.read()).decode()
    response = requests.post(
        'http://localhost:8001/ocr/base64', 
        json={'image_base64': image_data}
    )
    result = response.json()
    print("Recognized text:", result['data']['total_text'])
```
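
The curl example for `/ocr/base64` sends a data URI (`data:image/jpeg;base64,...`), while the Python example above sends the bare Base64 payload. This README does not state whether the service accepts both forms. If the data URI form is needed, a helper along these lines could build it; `to_data_uri` is a hypothetical name, and the MIME type is guessed from the file name only.

```python
import base64
import mimetypes


def to_data_uri(image_bytes: bytes, filename: str) -> str:
    """Encode raw image bytes as a `data:<mime>;base64,<payload>` string."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {filename!r}")
    payload = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{payload}"
```

The result can be passed as the `image_base64` field of the JSON body in place of `image_data`.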

## Response Format

### Successful OCR Response
```json
{
  "success": true,
  "message": "OCR recognition completed successfully",
  "data": {
    "results": [
      {
        "text": "Recognized text",
        "confidence": 0.95,
        "bbox": [[x1, y1], [x2, y2], [x3, y3], [x4, y4]]
      }
    ],
    "total_text": "All recognized text combined",
    "processing_time": 1.23
  }
}
```
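
On the client side, the `results` list can be post-processed with plain Python. The sketch below assumes only the fields shown in the response above (`text`, `confidence`, `bbox`); the helper names are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def confident_texts(response: Dict[str, Any],
                    min_confidence: float = 0.8) -> List[str]:
    """Return the text of every result at or above `min_confidence`."""
    results = response["data"]["results"]
    return [r["text"] for r in results if r["confidence"] >= min_confidence]


def bbox_to_rect(bbox: List[List[float]]) -> Tuple[float, float, float, float]:
    """Collapse a four-point box into (left, top, right, bottom)."""
    xs = [point[0] for point in bbox]
    ys = [point[1] for point in bbox]
    return min(xs), min(ys), max(xs), max(ys)
```

`bbox_to_rect` is useful when the four corner points need to be drawn or cropped as an axis-aligned rectangle; for rotated text the rectangle is the smallest upright box that contains the quadrilateral.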

### Error Response
```json
{
  "success": false,
  "message": "Error description",
  "data": null
}
```
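
Since both response shapes share the `success`, `message`, and `data` fields, a client can check `success` once before touching `data`. A minimal sketch, with `OcrError` and `unwrap` as hypothetical names:

```python
from typing import Any, Dict


class OcrError(Exception):
    """Raised when the service reports success = false."""


def unwrap(response: Dict[str, Any]) -> Dict[str, Any]:
    """Return the `data` payload, or raise OcrError carrying the server message."""
    if not response.get("success"):
        raise OcrError(response.get("message") or "unknown OCR error")
    return response["data"]
```

In the client examples above, `unwrap(response.json())["total_text"]` would then fail with a readable message instead of a `TypeError` on `None`.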

## Configuration

The application can be configured using environment variables or a `.env` file:

| Variable | Default | Description |
|----------|---------|-------------|
| `HOST` | `0.0.0.0` | Server host address |
| `PORT` | `8000` | Server port |
| `DEBUG` | `true` | Enable debug mode |
| `OCR_LANGUAGE` | `ch` | OCR language (ch=Chinese+English) |
| `OCR_USE_ANGLE_CLS` | `true` | Enable angle classification |
| `OCR_USE_SPACE_CHAR` | `true` | Use space character in recognition |
| `OCR_GPU` | `false` | Use GPU acceleration (set to false for CPU) |
| `OCR_USE_HIGH_ACCURACY` | `false` | Use high accuracy server models |
| `OCR_DET_MODEL_DIR` | *(empty)* | Custom detection model directory |
| `OCR_REC_MODEL_DIR` | *(empty)* | Custom recognition model directory |
| `OCR_CLS_MODEL_DIR` | *(empty)* | Custom classification model directory |
| `MAX_FILE_SIZE` | `10485760` | Max upload file size (10MB) |
| `LOG_LEVEL` | `INFO` | Logging level |
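
As an illustration of how such variables can be turned into typed settings, here is a minimal sketch using only the standard library. It covers a subset of the table, uses the defaults listed there, and does not read a `.env` file. `Settings`, `load_settings`, and `as_bool` are hypothetical names and are not taken from the project's `config/settings.py`.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def as_bool(value: str) -> bool:
    """Interpret common truthy spellings; everything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    host: str
    port: int
    debug: bool
    ocr_language: str
    ocr_gpu: bool
    max_file_size: int
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from `env`, falling back to the defaults in the table."""
    return Settings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        debug=as_bool(env.get("DEBUG", "true")),
        ocr_language=env.get("OCR_LANGUAGE", "ch"),
        ocr_gpu=as_bool(env.get("OCR_GPU", "false")),
        max_file_size=int(env.get("MAX_FILE_SIZE", "10485760")),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Accepting the environment as a mapping makes the loader easy to exercise with a plain dictionary, for example `load_settings({"PORT": "8001"})`.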

## Supported Languages

The default configuration supports Chinese and English (`ch`). Other supported languages include:

- `en` - English
- `fr` - French  
- `german` - German
- `korean` - Korean
- `japan` - Japanese
- `chinese_cht` - Traditional Chinese
- And more...

Use `GET /ocr/languages` endpoint to get the full list.

## Project Structure

```
ocr/
├── main.py                 # FastAPI application entry point
├── requirements.txt        # Python dependencies
├── .env.example           # Environment variables template
├── config/
│   ├── __init__.py
│   └── settings.py        # Application settings
├── models/
│   ├── __init__.py
│   └── response_models.py # Pydantic response models
├── services/
│   ├── __init__.py
│   └── ocr_service.py     # OCR service implementation
└── utils/
    ├── __init__.py
    ├── file_utils.py      # File handling utilities
    └── logging_utils.py   # Logging configuration
```

## Development

### Running Tests
```bash
# Add test files and run
pytest
```

### Code Formatting
```bash
# Using black for code formatting
black .

# Using flake8 for linting
flake8 .
```

## Troubleshooting

### Common Issues

1. **PaddleOCR installation issues**
   - Make sure you have the correct Python version (3.8+)
   - Try installing with specific versions: `pip install paddlepaddle==3.2.0 paddleocr==3.1.0`

2. **Slow first request or insufficient disk space**
   - The first OCR request may take longer because PaddleOCR downloads its models
   - Ensure sufficient disk space for model files (~500MB)

3. **Image processing errors**
   - Verify image format is supported
   - Check image file is not corrupted
   - Ensure image size is within limits
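
Some of the checks in item 3 can be run locally before uploading. The sketch below looks only at the file extension and size, using the formats listed under Features and the documented `MAX_FILE_SIZE` default; it does not detect corrupted image data. `check_image` and the extension set are assumptions for illustration.

```python
import os
from typing import List

# Extensions for the formats listed under Features (JPG, PNG, BMP, TIFF, WebP).
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff", ".webp"}
MAX_FILE_SIZE = 10485760  # documented default, in bytes


def check_image(path: str, max_size: int = MAX_FILE_SIZE) -> List[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    extension = os.path.splitext(path)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {extension or '(none)'}")
    if not os.path.isfile(path):
        problems.append("file not found")
        return problems
    size = os.path.getsize(path)
    if size == 0:
        problems.append("file is empty")
    elif size > max_size:
        problems.append(f"file is {size} bytes, limit is {max_size}")
    return problems
```

If the server was started with a different `MAX_FILE_SIZE`, pass the same value as `max_size` so the local check matches the server's limit.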

### Logs

Check application logs for detailed error information. Logs include:
- OCR initialization status
- Processing times
- Error details
- API request/response information

## Performance Notes

- First OCR request may be slower due to model loading
- CPU version is used by default for better compatibility
- For production use, consider using GPU version for better performance
- Large images may take more time to process

## License

This project is provided as-is for educational and development purposes.

## Contributing

Feel free to submit issues and enhancement requests!