PP-OCRv6 on Hugging Face: 50-Language OCR from 1.5M to 34.5M Parameters

PP-OCRv6_medium reaches 86.2% detection Hmean, in a model family ranging from 1.5M to 34.5M parameters

PP-OCRv6 is the latest release in PaddleOCR’s universal OCR model family, designed to handle text detection and recognition across documents, screenshots, multilingual images, digital displays, industrial labels, and scene text.

The model family scales from 1.5M to 34.5M parameters across three tiers: tiny, small, and medium. The medium and small tiers support 50 languages, including Simplified Chinese, Traditional Chinese, English, Japanese, and 46 Latin-script languages. You can test PP-OCRv6 online using the PP-OCRv6 Online Demo.

On PaddleOCR’s official multi-scenario benchmarks, PP-OCRv6_medium achieves 86.2% detection Hmean and 83.2% recognition accuracy. Compared with PP-OCRv5_server, it improves text detection by 4.6 percentage points and text recognition by 5.1 percentage points.
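For readers unfamiliar with the metric: in text detection benchmarks, Hmean conventionally denotes the harmonic mean of precision and recall (the F1 score). The sketch below shows that calculation only. The precision and recall values are hypothetical and are not PP-OCRv6 measurements, and the exact matching protocol used by PaddleOCR's benchmark is not described here.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (equivalent to the F1 score)."""
    if precision + recall == 0:
        # Avoid division by zero when nothing was detected or matched.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical values for illustration only, not PP-OCRv6 results.
example = hmean(0.90, 0.80)
print(f"Hmean: {example:.3f}")
```

Because the harmonic mean is pulled toward the lower of the two values, a detector cannot score well by trading away recall for precision or the reverse.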

PP-OCRv6 targets a practical need: producing accurate, structured text outputs with small models and flexible deployment options. For context on why specialized OCR models remain useful despite the rise of Vision-Language Models, see the previous blog post on PP-OCRv5.

The release introduces improvements to architecture, training, and data for both detection and recognition. The main goal is to increase accuracy while keeping model sizes suitable for different deployment settings.

Model tiers and performance

PP-OCRv6 provides three model tiers covering different sizes and accuracy levels:

PP-OCRv6_tiny: 1.5M parameters. Detection Hmean 80.6%. Recognition accuracy 73.5%. Typical scenarios include edge devices, lightweight local OCR, latency-sensitive demos, and constrained environments.
PP-OCRv6_small: 7.7M parameters. Detection Hmean 84.1%. Recognition accuracy 81.3%. Typical scenarios include mobile, desktop, balanced OCR services, and multilingual OCR with lower compute cost.
PP-OCRv6_medium: 34.5M parameters. Detection Hmean 86.2%. Recognition accuracy 83.2%. Typical scenarios include accuracy-oriented OCR, server-side pipelines, industrial OCR, document ingestion, and multilingual OCR.
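
As a rough illustration of how the tiers trade size for accuracy, the sketch below encodes the figures listed above and picks the most accurate tier that fits a parameter budget. The `Tier` class and `pick_tier` helper are hypothetical names introduced for this example and are not part of PaddleOCR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    params_m: float   # parameters, in millions
    det_hmean: float  # detection Hmean, percent
    rec_acc: float    # recognition accuracy, percent


# Figures copied from the tier list in this post.
TIERS = [
    Tier("PP-OCRv6_tiny", 1.5, 80.6, 73.5),
    Tier("PP-OCRv6_small", 7.7, 84.1, 81.3),
    Tier("PP-OCRv6_medium", 34.5, 86.2, 83.2),
]


def pick_tier(max_params_m: float, min_rec_acc: float = 0.0) -> Optional[Tier]:
    """Return the most accurate tier within the budget, or None if none fits."""
    candidates = [
        t for t in TIERS
        if t.params_m <= max_params_m and t.rec_acc >= min_rec_acc
    ]
    return max(candidates, key=lambda t: t.rec_acc, default=None)


choice = pick_tier(max_params_m=10)
print(choice.name if choice else "no tier fits")
```

In practice the decision would also depend on latency and memory measurements on the target hardware, which this sketch does not model.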

PP-OCRv6 uses PPLCNetV4 as a unified backbone for text detection and text recognition.

For developers, the main benefit is consistency across the model family. The tiny, small, and medium tiers belong to the same OCR family and share a common architectural direction, rather than being separate, unrelated models.

Technical improvements

Text detection is the first stage of the OCR pipeline. Detection quality affects the crops sent to the recognizer, and poor crops often lead to poorer recognition.

PP-OCRv6 upgrades the detection module with RepLKFPN, a lightweight large-kernel feature pyramid network designed for multi-scale text detection while keeping inference efficient.

This is relevant for real-world OCR inputs, where text may be small, dense, rotated, low-resolution, or embedded in complex backgrounds.

For text recognition, PP-OCRv6 uses EncoderWithLightSVTR. It combines local context modeling with global attention to improve recognition quality on challenging text crops.

The recognition improvements are especially relevant for multilingual text, screen text, industrial characters, special symbols, dense text, and noisy image regions.

The medium and small tiers support 50 languages in one model family, covering Simplified Chinese, Traditional Chinese, English, Japanese, and 46 Latin-script languages.

This helps reduce the need for separate OCR models across common multilingual scenarios.

Deployment options

Install PaddleOCR:

pip install paddleocr

Run OCR with Paddle Inference (the default backend):

from paddleocr import PaddleOCR

# Model: PP-OCRv6_medium (default)
# Backend: Paddle Inference (default)
ocr = PaddleOCR(
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_textline_orientation=False,
)
result = ocr.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png")

for res in result:
    res.print()
    res.save_to_img("output")
    res.save_to_json("output")

The OCR result can be saved as visualization images and structured JSON output. The structured output can then be used by downstream systems such as document parsing, search, extraction, RAG, analytics, or agent workflows.
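
A downstream step might, for example, keep only the lines recognized with high confidence before indexing them. The following is a minimal sketch under stated assumptions: the `rec_texts` and `rec_scores` keys are assumed from the output of earlier PaddleOCR 3.x pipelines and may differ in your installed version, so inspect the JSON actually written to the output directory. The `confident_lines` helper and the sample data are hypothetical.

```python
import json
from typing import Any


def confident_lines(ocr_result: dict[str, Any], min_score: float = 0.8) -> list[str]:
    """Keep recognized text lines whose confidence meets the threshold.

    Assumes parallel lists under the keys "rec_texts" and "rec_scores";
    these key names are an assumption, not confirmed by this post.
    """
    texts = ocr_result.get("rec_texts", [])
    scores = ocr_result.get("rec_scores", [])
    return [text for text, score in zip(texts, scores) if score >= min_score]


# Hypothetical sample standing in for a JSON file saved by save_to_json.
sample = json.loads(
    '{"rec_texts": ["Invoice", "T0tal", "42.00"], "rec_scores": [0.99, 0.55, 0.93]}'
)
print(confident_lines(sample))
```

The threshold is a tuning knob: a search index may tolerate noisy lines, while a field-extraction step usually benefits from a stricter cutoff.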

PP-OCRv6 can be used with multiple inference backends through PaddleOCR. PaddleOCR 3.7 provides a unified inference-engine interface: the engine parameter selects the underlying runtime, and related configuration can be passed through the pipeline or module API.

Available backends include:

Transformers: Hugging Face / PyTorch-oriented inference path for supported PaddleOCR models.
ONNX Runtime: Portable inference path for ONNX-based deployment environments.
Paddle Inference: Native Paddle inference format.

For Hugging Face users, PaddleOCR supports running selected OCR and document parsing models with a Transformers backend. This can be enabled with:

engine="transformers"

For more details on how the Transformers backend works in PaddleOCR, see the PaddleOCR blog post on running OCR and Document Parsing Tasks with a Transformers Backend.

Run PP-OCRv6 with the Transformers backend:

from paddleocr import PaddleOCR

# Model: PP-OCRv6_medium (default)
# Backend: Transformers
ocr = PaddleOCR(
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_textline_orientation=False,
    engine="transformers",
)
result = ocr.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png")

# Print each result, as in the default-backend example
for res in result:
    res.print()

ONNX variants are also available in the PP-OCRv6 Collection for environments that use ONNX Runtime through engine="onnxruntime":

from paddleocr import PaddleOCR

# Model: PP-OCRv6_medium (default)
# Backend: ONNX Runtime
ocr = PaddleOCR(
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_textline_orientation=False,
    engine="onnxruntime",
)
result = ocr.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png")

# Print each result, as in the default-backend example
for res in result:
    res.print()

Together, these backend options make PP-OCRv6 available across different runtime environments while keeping the same OCR model family on the Hugging Face Hub.
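
Since the three examples differ only in the engine argument, the backend choice can be isolated in one place. The sketch below builds the keyword arguments shown in this post and adds engine only when a non-default backend is requested. The `build_ocr_kwargs` helper is a hypothetical name, and the set of accepted engine strings is limited to the two values that appear in this post; the default Paddle Inference path is represented by omitting the argument.

```python
from typing import Optional

# Engine strings that appear in this post; other values are not assumed here.
SUPPORTED_ENGINES = {"transformers", "onnxruntime"}


def build_ocr_kwargs(engine: Optional[str] = None) -> dict:
    """Return PaddleOCR keyword arguments for the chosen backend.

    engine=None leaves the argument out, which selects the default backend.
    """
    kwargs = {
        "use_doc_orientation_classify": False,
        "use_doc_unwarping": False,
        "use_textline_orientation": False,
    }
    if engine is None:
        return kwargs
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"Unsupported engine: {engine!r}")
    kwargs["engine"] = engine
    return kwargs


# Usage (requires paddleocr to be installed):
#   from paddleocr import PaddleOCR
#   ocr = PaddleOCR(**build_ocr_kwargs("onnxruntime"))
print(build_ocr_kwargs("transformers"))
```

Keeping the selection in a single helper makes it straightforward to read the backend from an environment variable or configuration file without touching the rest of the pipeline code.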

PP-OCRv6 extends PaddleOCR with a lightweight, multilingual OCR model family for real-world text detection and recognition.

The release includes three model tiers from 1.5M to 34.5M parameters, up to 50-language OCR support, improved detection and recognition accuracy over PP-OCRv5_server, and multiple model formats on the Hugging Face Hub, including safetensors, Paddle inference models, and ONNX models.

Together with the hosted Hugging Face Space and the available PaddleOCR inference backends, PP-OCRv6 provides several entry points for evaluation and integration:

Online Demo: PP-OCRv6 Online Demo
Model Collection: PP-OCRv6 Collection
Transformers Backend Blog: PaddleOCR with Transformers Backend
PaddleOCR Documentation: PP-OCRv6 Documentation
PaddleOCR Official Website: https://www.paddleocr.com

You can evaluate PP-OCRv6 with the online demo, explore the available model assets in the Collection, and use the inference backend that matches your own OCR workflow.
