
Building a Scalable Paddle OCR Pipeline with AWS and Azure

Published on 09.04.2026

This article provides a technical breakdown of our PaddleOCR document analysis system. The architecture is designed to handle high-volume OCR tasks using a multi-cloud approach, combining AWS for API management and Azure for GPU-accelerated processing.

Try out our PaddleOCR API live in your browser: https://silverlining.cloud/products/ocr-api

System Architecture


The system uses an asynchronous, worker-based pattern to ensure stability during large jobs.

  1. API Entry Point (AWS Lambda): The user sends a request containing a file_url. The Lambda validates the input, downloads the file, and stores it in AWS S3.
  2. Job Queue (Azure Queue Storage): Once the file is stored, the Lambda adds a message to an Azure Storage Queue. This message contains the job metadata (language, page ranges, and file location).
  3. GPU Processing (Azure Container Apps): A dedicated worker pulls jobs from the queue. This worker runs in a containerized environment with access to NVIDIA T4 GPUs.
  4. Result Storage: The worker produces two types of output, both stored in Azure Blob Storage:
     • Raw JSON: The full, detailed output from the OCR engine.
     • Normalized JSON: A cleaned version containing full text, layout regions, and table data.
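Steps 1 and 2 can be sketched as a small Python entry point. This is a minimal illustration, not our production code: the bucket name, queue name, and message schema below are hypothetical, and credentials/error handling are omitted.

```python
import json
import uuid


def build_job_message(file_url: str, s3_key: str, language: str = "en",
                      pages: str = "all") -> str:
    """Serialize job metadata for the Azure Storage Queue (hypothetical schema)."""
    return json.dumps({
        "job_id": str(uuid.uuid4()),
        "file_url": file_url,
        "s3_key": s3_key,
        "language": language,
        "pages": pages,
    })


def handle_request(file_url: str) -> str:
    """Lambda-style handler: stage the file in S3, then enqueue the OCR job."""
    import urllib.request
    import boto3                                  # AWS SDK
    from azure.storage.queue import QueueClient   # Azure SDK

    # 1. Download the user-supplied file and stage it in S3.
    data = urllib.request.urlopen(file_url).read()
    s3_key = f"uploads/{uuid.uuid4()}.pdf"
    boto3.client("s3").put_object(Bucket="ocr-input", Key=s3_key, Body=data)

    # 2. Hand the job to the Azure-side workers via a queue message.
    message = build_job_message(file_url, s3_key)
    queue = QueueClient.from_connection_string("<connection-string>", "ocr-jobs")
    queue.send_message(message)
    return json.loads(message)["job_id"]
```

Decoupling the Lambda from the GPU workers this way means a burst of uploads only grows the queue; the API stays responsive regardless of processing load.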

The Processing Engine


The core of the system is PaddleOCR (PP-StructureV3). This model does more than just recognize text; it performs complex document analysis:

  • Layout Analysis: It identifies different regions such as text blocks, titles, and images.
  • Table Recognition: It converts visual tables into structured data.
  • Multi-language Support: The system supports over 30 languages, including Chinese, Arabic, Cyrillic, and various Devanagari scripts.
  • PDF Handling: We use pypdfium2 to render PDF pages into high-resolution images for the OCR engine to process.
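The PDF-to-OCR path can be sketched roughly as follows. This is a simplified, assumption-laden example: the exact `PPStructureV3` call signature may differ between PaddleOCR releases, and the numpy conversion step is our guess at the expected input format.

```python
def render_scale(target_dpi: int = 300, pdf_dpi: float = 72.0) -> float:
    """pypdfium2 renders at 72 dpi by default; scale up for OCR-quality images."""
    return target_dpi / pdf_dpi


def pdf_to_images(path: str, dpi: int = 300):
    """Render each PDF page to a PIL image via pypdfium2."""
    import pypdfium2 as pdfium
    pdf = pdfium.PdfDocument(path)
    for page in pdf:
        yield page.render(scale=render_scale(dpi)).to_pil()


def analyze(path: str):
    """Run PP-StructureV3 layout and table analysis over every page."""
    import numpy as np
    from paddleocr import PPStructureV3

    pipeline = PPStructureV3()  # downloads model weights on first run
    results = []
    for image in pdf_to_images(path):
        # predict() accepts image arrays; convert from PIL (assumed interface)
        results.extend(pipeline.predict(np.array(image)))
    return results
```

Rendering at 300 dpi rather than the PDF-native 72 dpi matters in practice: small fonts and thin table borders are often unrecognizable at lower resolutions.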

Infrastructure and Scaling


We use a "Scale-to-Zero" strategy to keep costs efficient while maintaining high performance.

  • GPU Acceleration: The worker uses CUDA 11.8 to run PaddlePaddle on GPU, which significantly reduces processing time for multi-page documents.
  • KEDA Scaling: We use KEDA (Kubernetes-based Event Driven Autoscaling) on Azure. If the queue is empty, the system scales down to zero replicas, meaning no GPU costs are incurred. When jobs arrive, it can scale up to 15 concurrent workers.
  • Monitoring: Every request and job status is tracked in Amazon DynamoDB, allowing users to poll for results or receive updates via Webhooks.
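A scale-to-zero worker can be sketched as a drain-and-exit loop. The queue name, message schema, and visibility timeout below are illustrative assumptions; the key idea is that an empty queue lets the process exit, so KEDA observes zero queue depth and scales replicas (and GPU cost) to zero.

```python
import json


def parse_job(message_text: str) -> dict:
    """Decode a queue message into job metadata (hypothetical schema)."""
    job = json.loads(message_text)
    for field in ("job_id", "s3_key", "language"):
        if field not in job:
            raise ValueError(f"malformed job message: missing {field}")
    return job


def worker_loop(conn_string: str, queue_name: str = "ocr-jobs") -> None:
    """Drain the queue, then exit so the replica can be scaled away."""
    from azure.storage.queue import QueueClient

    queue = QueueClient.from_connection_string(conn_string, queue_name)
    while True:
        batch = list(queue.receive_messages(max_messages=1, visibility_timeout=600))
        if not batch:
            break  # empty queue -> exit; KEDA scales the app back to zero
        msg = batch[0]
        job = parse_job(msg.content)
        # ... run the GPU OCR pipeline on job["s3_key"] here ...
        queue.delete_message(msg)  # acknowledge only after successful processing
```

Deleting the message only after processing means a crashed worker simply lets the message reappear after its visibility timeout, so no job is silently lost.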

Data Output


Users receive a job_id to fetch their results. The API provides a normalized payload that maps OCR text lines back to their specific layout regions (e.g., matching a line of text to a specific box on the page). If the result is smaller than 1MB, it is returned directly in the API response; otherwise, a secure download URL is provided.
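A client-side polling flow might look like the sketch below. The endpoint path and response fields (`status`, `result`, `download_url`) are hypothetical placeholders, not the documented API contract.

```python
import json
import time
import urllib.request


def extract_result(payload: dict):
    """Pick the delivery path for a finished job (hypothetical response shape).

    Results under 1 MB are returned inline; larger results arrive as a
    secure download URL.
    """
    if "result" in payload:
        return "inline", payload["result"]
    return "download", payload["download_url"]


def wait_for_job(base_url: str, job_id: str,
                 interval: float = 2.0, timeout: float = 300.0):
    """Poll GET {base_url}/jobs/{job_id} until the job reports 'done'."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with urllib.request.urlopen(f"{base_url}/jobs/{job_id}") as resp:
            payload = json.load(resp)
        if payload.get("status") == "done":
            return extract_result(payload)
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
```

For long-running documents, the Webhook option mentioned above avoids polling entirely; this loop is the fallback for simple integrations.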


Skip the Infrastructure Pain


Setting up GPU clusters and managing auto-scaling is a headache. Skip the infrastructure work and start using our ready-made API today. You can focus on your code while we handle the servers.