# ocr

**Repository Path**: jigr/ocr

## Basic Information

- **Project Name**: ocr
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-11-01
- **Last Updated**: 2025-11-01

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# OCR Image Recognition API

A FastAPI-based OCR (Optical Character Recognition) service using PaddleOCR for text recognition from images.

## Features

- 🚀 Fast and efficient OCR processing using PaddleOCR 3.1
- 🔧 Built with FastAPI for high-performance web API
- 🖼️ Support for multiple image formats (JPG, PNG, BMP, TIFF, WebP)
- 🌐 Multiple language support (Chinese, English, and more)
- 📁 File upload and Base64 image processing
- 🏥 Health check endpoints
- 📊 Detailed OCR results with confidence scores and bounding boxes
- ⚙️ Configurable settings via environment variables
- 🖼️ **Advanced Image Preprocessing** - Enhance image quality for better OCR accuracy
- 📐 **Skew Correction** - Automatically correct tilted documents and images
- 🎯 **High Accuracy Models** - Use server models for maximum recognition accuracy

## Requirements

- Python 3.8+
- PaddlePaddle 3.2.0 (CPU version)
- PaddleOCR 3.1.0
- FastAPI
- Other dependencies listed in `requirements.txt`

## Installation

1. **Clone or download this project**

   ```bash
   cd ocr
   ```

2. **Create a virtual environment (recommended)**

   ```bash
   python -m venv venv

   # On Windows
   venv\Scripts\activate

   # On macOS/Linux
   source venv/bin/activate
   ```

3. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   ```

   **Note:** If you encounter dependency conflicts (especially with numpy/opencv), see [DEPENDENCY_RESOLUTION.md](DEPENDENCY_RESOLUTION.md) for solutions, or use the stable version:

   ```bash
   pip install -r requirements-stable.txt
   ```

4. **Set up environment variables (optional)**

   ```bash
   # On Windows
   copy .env.example .env

   # On macOS/Linux
   cp .env.example .env

   # Edit the .env file to customize settings
   ```

## Usage

### Starting the Server

```bash
python main.py
```

Or using uvicorn directly:

```bash
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```

The API will be available at `http://localhost:8001`, or at whichever port you set via `--port` or `PORT` (the `uvicorn` command above uses 8000).

### API Documentation

- **Swagger UI**: `http://localhost:8001/docs`
- **ReDoc**: `http://localhost:8001/redoc`
- **Enhanced API Guide**: [API_DOCUMENTATION.md](API_DOCUMENTATION.md)
- **Digit-Only OCR Guide**: [DIGITS_OCR_DOCUMENTATION.md](DIGITS_OCR_DOCUMENTATION.md)
- **License Plate Removal OCR Guide**: [LICENSE_PLATE_OCR_DOCUMENTATION.md](LICENSE_PLATE_OCR_DOCUMENTATION.md)
- **Image Preprocessing Guide**: [IMAGE_PREPROCESSING_DOCUMENTATION.md](IMAGE_PREPROCESSING_DOCUMENTATION.md)
- **High Accuracy Models Guide**: [HIGH_ACCURACY_MODELS.md](HIGH_ACCURACY_MODELS.md)

## API Endpoints

### Enhanced OCR API (Recommended)

- `POST /api/ocr/recognize` - Advanced OCR with configurable options
  - Language selection (ch, en, fr, german, korean, etc.)
  - Confidence threshold filtering
  - **Image enhancement and skew correction options**
  - Detailed response with statistics
  - File size validation
- `POST /api/ocr/digits` - Specialized digit-only OCR with filtering
  - Optimized for numerical data (phone numbers, IDs, prices, etc.)
  - Configurable space and special character filtering
  - **Image enhancement and skew correction options**
  - Higher default confidence threshold for accuracy
- `POST /api/ocr/digits_without_license_plate` - Digit-only OCR with license plate removal
  - Detects and removes license plate regions
  - **Image enhancement and skew correction options**
  - Duplicate digit removal capabilities

### Legacy Endpoints

- `POST /ocr/upload` - Basic OCR from uploaded image file
- `POST /ocr/base64` - OCR from base64 encoded image

### Utility Endpoints

- `GET /` - Basic health check
- `GET /health` - Detailed health check with OCR service status
- `GET /ocr/languages` - Get list of supported OCR languages

## Usage Examples

### 1. Enhanced OCR API (Recommended)

```bash
# Basic usage
curl -X POST "http://localhost:8001/api/ocr/recognize" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@image.jpg"

# With advanced options including preprocessing
curl -X POST "http://localhost:8001/api/ocr/recognize?language=en&confidence_threshold=0.8&enhance_image=true&correct_skew=true" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@document.png"

# Digit-only OCR with preprocessing
curl -X POST "http://localhost:8001/api/ocr/digits?enhance_image=true" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@image_with_digits.jpg"

# Digit-only OCR with all options
curl -X POST "http://localhost:8001/api/ocr/digits?confidence_threshold=0.9&enhance_image=true&correct_skew=true" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@id_card.png"
```

### 2. Legacy Upload Image File

```bash
curl -X POST "http://localhost:8001/ocr/upload" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@image.jpg"
```

### 3. Base64 Image Processing

```bash
curl -X POST "http://localhost:8001/ocr/base64" \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{"image_base64": "..."}'
```

### 4. Python Client Example

```python
import base64

import requests

# Enhanced OCR API with preprocessing (recommended)
with open('image.jpg', 'rb') as f:
    files = {'file': ('image.jpg', f, 'image/jpeg')}
    params = {
        'language': 'ch',  # Chinese + English
        'confidence_threshold': 0.7,
        'enhance_image': True,  # Apply image enhancement
        'correct_skew': True  # Apply skew correction
    }
    response = requests.post(
        'http://localhost:8001/api/ocr/recognize',
        files=files,
        params=params
    )
    result = response.json()
    print(f"Recognized text: {result['data']['total_text']}")
    print(f"Confidence: {result['data']['average_confidence']:.3f}")

# Digit-only OCR API with preprocessing (specialized for numbers)
with open('id_card.jpg', 'rb') as f:
    files = {'file': ('id_card.jpg', f, 'image/jpeg')}
    params = {
        'confidence_threshold': 0.9,  # Higher for digits
        'filter_spaces': False,  # Keep spaces in ID numbers
        'filter_special_chars': True,  # Remove special chars
        'enhance_image': True  # Apply image enhancement
    }
    response = requests.post(
        'http://localhost:8001/api/ocr/digits',
        files=files,
        params=params
    )
    result = response.json()
    print(f"Recognized digits: {result['data']['total_text']}")

# Legacy upload API
with open('image.jpg', 'rb') as f:
    files = {'file': f}
    response = requests.post('http://localhost:8001/ocr/upload', files=files)
    result = response.json()
    print("Recognized text:", result['data']['total_text'])

# Base64 processing: read the file, then send the encoded bytes as JSON
with open('image.jpg', 'rb') as f:
    image_data = base64.b64encode(f.read()).decode()

response = requests.post(
    'http://localhost:8001/ocr/base64',
    json={'image_base64': image_data}
)
result = response.json()
print("Recognized text:", result['data']['total_text'])
```

## Response Format

### Successful OCR Response

```json
{
  "success": true,
  "message": "OCR recognition completed successfully",
  "data": {
    "results": [
      {
        "text": "Recognized text",
        "confidence": 0.95,
        "bbox": [[x1, y1], [x2, y2], [x3, y3], [x4, y4]]
      }
    ],
    "total_text": "All recognized text combined",
    "processing_time": 1.23
  }
}
```

### Error Response

```json
{
  "success": false,
  "message": "Error description",
  "data": null
}
```

## Configuration

The application can be configured using environment variables or a `.env` file:

| Variable | Default | Description |
|----------|---------|-------------|
| `HOST` | `0.0.0.0` | Server host address |
| `PORT` | `8000` | Server port |
| `DEBUG` | `true` | Enable debug mode |
| `OCR_LANGUAGE` | `ch` | OCR language (ch=Chinese+English) |
| `OCR_USE_ANGLE_CLS` | `true` | Enable angle classification |
| `OCR_USE_SPACE_CHAR` | `true` | Use space character in recognition |
| `OCR_GPU` | `false` | Use GPU acceleration (set to false for CPU) |
| `OCR_USE_HIGH_ACCURACY` | `false` | Use high accuracy server models |
| `OCR_DET_MODEL_DIR` | (empty) | Custom detection model directory |
| `OCR_REC_MODEL_DIR` | (empty) | Custom recognition model directory |
| `OCR_CLS_MODEL_DIR` | (empty) | Custom classification model directory |
| `MAX_FILE_SIZE` | `10485760` | Max upload file size in bytes (10MB) |
| `LOG_LEVEL` | `INFO` | Logging level |

## Supported Languages

The default configuration supports Chinese and English (`ch`). Other supported languages include:

- `en` - English
- `fr` - French
- `german` - German
- `korean` - Korean
- `japan` - Japanese
- `chinese_cht` - Traditional Chinese
- And more...

Use the `GET /ocr/languages` endpoint to get the full list.
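The variables in the Configuration table, including `OCR_LANGUAGE`, could be read in Python roughly as sketched below. This is only an illustration using the standard library and the defaults from the table. The names `Settings`, `load_settings`, and `_env_bool` are hypothetical and are not taken from the project's `config/settings.py`, which may work differently. The sketch reads process environment variables only and does not parse a `.env` file.

```python
import os
from dataclasses import dataclass


def _env_bool(name: str, default: bool) -> bool:
    """Return True for common truthy strings; use the default when unset."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container mirroring the README's table."""
    host: str
    port: int
    debug: bool
    ocr_language: str
    ocr_use_angle_cls: bool
    ocr_use_space_char: bool
    ocr_gpu: bool
    ocr_use_high_accuracy: bool
    max_file_size: int
    log_level: str


def load_settings() -> Settings:
    """Build Settings from environment variables with the documented defaults."""
    return Settings(
        host=os.environ.get("HOST", "0.0.0.0"),
        port=int(os.environ.get("PORT", "8000")),
        debug=_env_bool("DEBUG", True),
        ocr_language=os.environ.get("OCR_LANGUAGE", "ch"),
        ocr_use_angle_cls=_env_bool("OCR_USE_ANGLE_CLS", True),
        ocr_use_space_char=_env_bool("OCR_USE_SPACE_CHAR", True),
        ocr_gpu=_env_bool("OCR_GPU", False),
        ocr_use_high_accuracy=_env_bool("OCR_USE_HIGH_ACCURACY", False),
        max_file_size=int(os.environ.get("MAX_FILE_SIZE", "10485760")),
        log_level=os.environ.get("LOG_LEVEL", "INFO"),
    )


if __name__ == "__main__":
    print(load_settings())
```

Keeping the defaults in one place like this makes it easy to compare a running configuration against the table above.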
## Project Structure

```
ocr/
├── main.py                 # FastAPI application entry point
├── requirements.txt        # Python dependencies
├── .env.example            # Environment variables template
├── config/
│   ├── __init__.py
│   └── settings.py         # Application settings
├── models/
│   ├── __init__.py
│   └── response_models.py  # Pydantic response models
├── services/
│   ├── __init__.py
│   └── ocr_service.py      # OCR service implementation
└── utils/
    ├── __init__.py
    ├── file_utils.py       # File handling utilities
    └── logging_utils.py    # Logging configuration
```

## Development

### Running Tests

```bash
# Add test files and run
pytest
```

### Code Formatting

```bash
# Using black for code formatting
black .

# Using flake8 for linting
flake8 .
```

## Troubleshooting

### Common Issues

1. **PaddleOCR installation issues**
   - Make sure you have the correct Python version (3.8+)
   - Try installing with specific versions: `pip install paddlepaddle==3.2.0 paddleocr==3.1.0`

2. **Memory issues**
   - The first OCR recognition may take longer as PaddleOCR downloads models
   - Ensure sufficient disk space for model files (~500MB)

3. **Image processing errors**
   - Verify image format is supported
   - Check image file is not corrupted
   - Ensure image size is within limits

### Logs

Check application logs for detailed error information. Logs include:

- OCR initialization status
- Processing times
- Error details
- API request/response information

## Performance Notes

- First OCR request may be slower due to model loading
- CPU version is used by default for better compatibility
- For production use, consider using GPU version for better performance
- Large images may take more time to process

## License

This project is provided as-is for educational and development purposes.

## Contributing

Feel free to submit issues and enhancement requests!
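For the image processing errors listed under Troubleshooting, a client-side check before uploading can rule out the simplest causes. The sketch below is a minimal example, assuming the formats named in Features (JPG, PNG, BMP, TIFF, WebP) and the default `MAX_FILE_SIZE` of 10485760 bytes. The function name `precheck_image` is hypothetical, and the `.jpeg` and `.tif` extensions are assumed aliases. The server's own validation may apply different rules.

```python
from pathlib import Path
from typing import List

# Extensions for the formats named in the README; .jpeg and .tif are assumed aliases.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff", ".webp"}
MAX_FILE_SIZE = 10_485_760  # README default for MAX_FILE_SIZE, in bytes


def precheck_image(path: str, max_size: int = MAX_FILE_SIZE) -> List[str]:
    """Return a list of problems found; an empty list means the file looks uploadable."""
    file_path = Path(path)
    if not file_path.is_file():
        return [f"{file_path} does not exist or is not a regular file"]

    problems = []
    if file_path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {file_path.suffix or '(none)'}")

    size = file_path.stat().st_size
    if size == 0:
        problems.append("file is empty")
    elif size > max_size:
        problems.append(f"file is {size} bytes, above the {max_size} byte limit")
    return problems


if __name__ == "__main__":
    for issue in precheck_image("image.jpg"):
        print("Problem:", issue)
```

This only inspects the file name and size. It does not decode the image, so a corrupted file with a valid extension would still pass.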