Mistral OCR 4 Brings Citation-Ready Structured Output to RAG, Agentic, and Enterprise Search Pipelines

By AI Maestro, June 24, 2026
Mistral AI has launched OCR 4, a document-understanding model that outputs bounding boxes, block classifications, and per-word confidence scores alongside the extracted text. It handles 170 languages across 10 groups and is designed for self-hosted use in enterprise search and retrieval pipelines.

Key specifications

  • The model returns structured data including typed block labels and confidence metrics, not just plain text.
  • Performance gains are noted for rare and low-resource languages.
  • Independent annotators preferred OCR 4 over other tested systems, with an average win rate of 72%.
  • Pricing is set at $4 per 1,000 pages, reducing to $2 with the Batch-API discount.
  • A single endpoint handles both raw extraction and schema-driven Document AI output.

Structured output over plain text

Previous versions focused on converting pages into clean text and tables. OCR 4 instead provides a structured representation of the entire document. Each block is localised with a bounding box and classified by type, such as titles, tables, equations, or signatures. Inline confidence scores are generated for every page and word.

Downstream systems gain more than just the content. They also know where each element sits, its role, and the model’s confidence level. This context is essential for citations, redactions, and human-in-the-loop verification.
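
To make this concrete, here is a minimal sketch of how a downstream system might model one block and turn it into a citation. The class and field names (`Block`, `bbox`, `block_type`, `confidence`) and the coordinate convention are illustrative assumptions, not the OCR 4 response schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """One localised, classified region of a page (hypothetical shape)."""
    page: int                                # 1-based page number
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed page coordinates
    block_type: str                          # e.g. "title", "table", "equation", "signature"
    text: str
    confidence: float                        # assumed to lie in [0, 1]


def cite(block: Block, source: str) -> str:
    """Format a citation that points at the block's location in the source."""
    x0, y0, x1, y1 = block.bbox
    return (f"{source}, p. {block.page}, {block.block_type} "
            f"at ({x0:g}, {y0:g})-({x1:g}, {y1:g})")


block = Block(page=3, bbox=(72, 140, 540, 310), block_type="table",
              text="Revenue by region", confidence=0.97)
print(cite(block, "annual_report.pdf"))
```

Because the location travels with the text, the same record can drive a highlight in a viewer, a redaction mask, or a reviewer queue.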

The model accepts standard enterprise formats including PDF, DOC, PPT, and OpenDocument. It is compact enough for a single-container deployment. Enterprise customers can access self-managed deployment for data residency and compliance needs.

Independent benchmark results

Mistral compared OCR 4 against AI-native models, frontier general-purpose models, enterprise document services, and its own OCR 3. Independent annotators preferred OCR 4 over every leading system tested. Win rates averaged 72% across the comparison set.

The evaluation used over 600 documents across 12+ languages sourced from third-party vendors. Annotators ranked each competitor’s output against OCR 4’s on a document-by-document basis.

On automated benchmarks, OCR 4 scored 85.20 on the public OlmOCRBench. It scored 93.07 on OmniDocBench and .98 on Mistral’s internal Crawl Multilingual evaluation.

Two customer data points add context. Rogo reported equivalent accuracy at roughly 8x lower cost and 17x lower latency versus leading agentic parsers. Anaqua measured roughly 4x faster per page than its incumbent provider.

Why segmentation matters

Bounding boxes were Mistral’s most-requested capability. They localise text for in-context highlighting and reliable data pipelines.

Block types and confidence scores add two further signals: what each region is and how far to trust it. Together with the boxes, they drive source-grounded citations, redactions, and human-in-the-loop verification. This structure supports several downstream workloads.

Clean, classified blocks become better retrieval units for RAG. Agents gain structural primitives to act on documents, not just read them. Connectors receive consistent, typed output for ingestion and indexing.
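
As a rough illustration of blocks as retrieval units, the sketch below groups body blocks under the most recent title so that each chunk keeps its heading. The dictionary keys (`type`, `text`, `page`) are hypothetical stand-ins for whatever the real response provides.

```python
def blocks_to_chunks(blocks):
    """Group non-title blocks under the preceding title block."""
    chunks, current = [], None
    for b in blocks:
        if b["type"] == "title":
            # A title starts a new chunk.
            current = {"heading": b["text"], "page": b["page"], "text": []}
            chunks.append(current)
        else:
            if current is None:
                # Content that appears before any title gets an empty heading.
                current = {"heading": "", "page": b["page"], "text": []}
                chunks.append(current)
            current["text"].append(b["text"])
    # Join the collected block texts into one string per chunk.
    return [{**c, "text": "\n".join(c["text"])} for c in chunks]


sample_blocks = [
    {"type": "title", "text": "1. Scope", "page": 1},
    {"type": "paragraph", "text": "This agreement covers hosting.", "page": 1},
    {"type": "table", "text": "Fee: 100", "page": 2},
    {"type": "title", "text": "2. Term", "page": 2},
    {"type": "paragraph", "text": "Twelve months.", "page": 2},
]
chunks = blocks_to_chunks(sample_blocks)
```

Each chunk records the page of its heading, so a retrieved passage can still be cited back to a location in the source document.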

OCR 4 is also an ingestion component of Mistral Search Toolkit, now in public preview. Search Toolkit is Mistral’s open-source, composable search framework. Its structured output supplies citation-ready inputs to retrieval and evaluation workflows.

Use cases and examples

OCR 4 supports both high-volume pipelines and interactive document workflows.

  • Document parsing and extraction: Turn a multilingual contract into clean, structured markdown for indexing.
  • Retrieval-Augmented Generation (RAG): Feed classified blocks into Search Toolkit for source-grounded answers with citations.
  • Agentic workflows: Give an invoice-processing agent typed fields and bounding boxes to fill forms automatically.
  • Confidence-gated pipelines: Route low-confidence regions to human verifiers, and auto-approve the rest.
  • Enterprise search: Use OCR 4 as a data-source component for ingestion and entity extraction across an archive.

Early users apply OCR 4 to turn invoices into structured fields and digitise company archives. Others extract clean text from technical reports or power enterprise search.
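
The confidence-gated pipeline from the list above can be sketched in a few lines. This is an assumption-laden example: the word records, the `confidence` key, and the 0.90 threshold are all hypothetical, and a real threshold would need tuning against reviewed samples.

```python
def route_words(words, threshold=0.90):
    """Split words into auto-approved and needs-review lists by confidence."""
    approved, review = [], []
    for w in words:
        target = approved if w["confidence"] >= threshold else review
        target.append(w)
    return approved, review


words = [
    {"text": "Invoice", "confidence": 0.99},
    {"text": "N0.", "confidence": 0.62},   # likely misread, should go to a human
    {"text": "4471", "confidence": 0.95},
]
approved, review = route_words(words)
print([w["text"] for w in review])
```

Only the uncertain regions reach a human verifier, which is where the per-word scores pay for themselves.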

A note on scope from Mistral's official release: OCR 4 is a document-understanding model, not a decision-maker. It is not intended for medical diagnosis, legal judgment, or high-stakes financial decisions. It is also unsuited to safety-critical systems, real-time processing, or non-document inputs like raw audio or video.

Comparison: Pure Extraction vs Document AI

OCR 4 ships behind a single API endpoint. Every request runs the same model. It always returns extracted content, bounding boxes, block types, confidence scores, and markdown. What varies is how much you layer on top.

Capability       | Pure Extraction Mode                      | Document AI Mode (same endpoint)
Output           | Markdown, bboxes, block types, confidence | Structured JSON in a schema you define
How it works     | Raw OCR response                          | OCR output fed to mistral-small-2603
Image annotation | Not applied                               | Per-image vision-language call on schema
Custom prompt    | No                                        | Yes, guides interpretation or summary
Best for         | Pipelines, agents, batch ingestion        | Business users, pilots, no parsing logic
Price            | $4 / 1,000 pages ($2 batch)               | $5 / 1,000 pages
Self-hosting     | Available for enterprise                  | Available for enterprise

The decision rule is simple. Need raw extracted content? Use OCR 4 as-is. Need the output reshaped into a schema or annotated with domain fields? Add the Document AI parameters to the same call.
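
For budgeting, the per-page prices quoted in this article translate into a simple estimate. The mode names below are made up for the example, and the sketch does not model a batch discount for Document AI because the article does not state one.

```python
# USD per 1,000 pages, as quoted in this article.
PRICE_PER_1K = {
    "ocr": 4.00,          # pure extraction
    "ocr_batch": 2.00,    # pure extraction with the Batch-API discount
    "document_ai": 5.00,  # schema-driven Document AI output
}


def estimate_cost(pages: int, mode: str = "ocr") -> float:
    """Return the estimated cost in USD for processing `pages` pages."""
    return pages / 1000 * PRICE_PER_1K[mode]


for mode in PRICE_PER_1K:
    print(mode, estimate_cost(250_000, mode))
```

For a 250,000-page archive this gives $1,000 for standard extraction, $500 through batch, and $1,250 in Document AI mode.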

Working With the API

Basic extraction takes a document URL and returns structured pages. Set include_blocks=True to get the typed blocks and bounding boxes.

import os
from mistralai.client import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

ocr_response = client.ocr.process(
    model="mistral-ocr-latest",
    document={
        "type": "document_url",
        "document_url": "https://arxiv.org/pdf/2201.04234"
    },
    include_blocks=True,                  # typed blocks + bounding boxes
    table_format="html",                  # None (inline), "markdown", or "html"
    include_image_base64=True
)

The response is a JSON object with a pages array. Each page carries markdown, images, tables, hyperlinks, dimensions, and confidence_scores. To gate a human-review pipeline, request per-word confidence.

ocr_response = client.ocr.process(
    model="mistral-ocr-latest",
    document={"type": "document_url",
              "document_url": "https://arxiv.org/pdf/2201.04234"},
    confidence_scores_granularity="word"   # or "page" for aggregates
)

The “word” setting adds a word_confidence_scores array per page and per table entry. For high-volume jobs, Mistral recommends the Batch Inference service, which halves the per-page cost.
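
Once a response comes back, a common first step is to stitch the per-page markdown into one document for indexing. The sketch below assumes the response exposes `pages` and each page exposes `markdown` as attributes. It uses a stand-in object so it runs without an API key, and the `[page N]` marker is an arbitrary choice for this example.

```python
from types import SimpleNamespace


def join_markdown(ocr_response) -> str:
    """Concatenate per-page markdown, with a page marker before each page."""
    parts = []
    for number, page in enumerate(ocr_response.pages, start=1):
        parts.append(f"[page {number}]\n{page.markdown}")
    return "\n\n".join(parts)


# Stand-in for a real response object (assumed attribute names).
fake_response = SimpleNamespace(pages=[
    SimpleNamespace(markdown="# Title"),
    SimpleNamespace(markdown="Body text"),
])
print(join_markdown(fake_response))
```

Keeping the page marker in the joined text preserves a coarse location for citations even when the block-level data is not stored.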

Check out the Mistral OCR 4 announcement, OCR 4 model card, and OCR Processor docs.