Azure Document Intelligence (formerly Form Recognizer) is Microsoft’s managed intelligent document processing (IDP) service. It handles invoices, receipts, purchase orders, and ID documents well — out of the box, with no custom training required for standard formats.
For many use cases, it’s a reasonable starting point. For many production workflows, it’s not enough.
## What Azure Document Intelligence does well
Before the alternatives, it’s worth being clear about where Azure DI genuinely works:
Standard document types. The prebuilt models for invoices, receipts, W-2s, and ID documents are competent on documents that look like the training data. If your invoices look like invoices, the prebuilt invoice model works.
Getting started quickly. No custom training, no infrastructure to manage. You upload a document, get a response. For prototypes and low-stakes workflows, that speed matters.
Handwriting recognition. Azure DI has strong handwriting OCR, which is useful for forms and paper documents.
Managed infrastructure. Microsoft handles scaling, availability, and model updates. You don’t run anything.
## Where it falls short
The limits become visible as soon as your documents deviate from what the models expect.
### Edge cases and layout variation
Azure DI’s prebuilt models are trained on representative examples of common document types. Your documents aren’t always representative.
A water utility’s lab report. A freight forwarder’s bill of lading. A logistics company’s delivery manifest. A financial services firm’s proprietary reporting template. These documents have specific structures that prebuilt models weren’t trained on.
When you submit a document that doesn’t match the training distribution, extraction quality drops — often without a clear signal that it has. The API returns results with confidence scores, but those scores reflect the model’s certainty about its extraction, not whether the extracted values are actually correct.
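One way to catch wrong-but-confident extractions is a cross-field validation check that is independent of the model’s confidence scores. A minimal sketch — the field names and tolerance here are illustrative, not the Azure DI response schema:

```python
def validate_invoice(fields: dict, tolerance: float = 0.01) -> list[str]:
    """Cross-field checks that catch extractions that are confident but wrong."""
    errors = []
    line_items = fields.get("line_items", [])
    total = fields.get("invoice_total")
    if total is not None and line_items:
        computed = sum(item["amount"] for item in line_items)
        if abs(computed - total) > tolerance:
            errors.append(f"line items sum to {computed}, total says {total}")
    return errors

# An internally inconsistent invoice fails the check regardless of
# how confident the model was about each individual field:
extracted = {
    "invoice_total": 120.00,
    "line_items": [{"amount": 50.00}, {"amount": 60.00}],
}
print(validate_invoice(extracted))
```

Checks like this are cheap to write per document type and give you a correctness signal the confidence score can’t.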
### Custom training has real limits
Azure DI does support custom models. You label examples, train a model, deploy it. In practice, this works for document types with consistent layouts and enough labelled examples.
It struggles with:
- High layout variation within a single document type (the same invoice from 20 different suppliers)
- Small label sets (you need enough examples for the model to generalise)
- Documents where field locations are unpredictable or context-dependent
### Ongoing maintenance cost
When your document layouts change — and they will — retraining a custom model is a project. There’s no quick “update the extraction rule” option; you’re back to labelling and retraining.
### Cost at volume
Azure DI pricing is per-page. At low volumes, the cost is negligible. At high volumes — thousands of pages per day — it becomes significant. At that point, the economics of a custom pipeline often look better.
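A back-of-envelope comparison makes the crossover concrete. The figures below are illustrative assumptions, not quoted prices:

```python
# Hypothetical figures for illustration only — substitute your own.
pages_per_day = 5_000
per_page_rate = 0.01              # assumed managed-platform rate, USD/page
managed_monthly = pages_per_day * 30 * per_page_rate

pipeline_monthly = 400            # assumed infra + maintenance, USD/month

print(f"managed: ${managed_monthly:,.0f}/month, pipeline: ${pipeline_monthly}/month")
```

At these assumed numbers the managed platform costs roughly $1,500/month against a few hundred for fixed infrastructure — and the gap widens linearly with volume, which is what shifts the economics toward a custom pipeline.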
### No control over failure modes
When Azure DI fails, it fails opaquely. You get a low-confidence result or an empty field. There’s no mechanism for routing uncertain extractions to a human reviewer as part of the pipeline itself — that logic is yours to build on top.
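That routing layer doesn’t have to be elaborate. A minimal sketch — the field structure loosely mirrors what a managed OCR API returns (value plus per-field confidence), and the threshold is an assumption to tune per field and workflow:

```python
REVIEW_THRESHOLD = 0.85  # assumed cutoff; tune per field and workflow

def route(fields: dict) -> tuple[dict, dict]:
    """Split extracted fields into auto-accepted values and a
    human-review queue, based on per-field confidence."""
    accepted, review = {}, {}
    for name, field in fields.items():
        target = accepted if field["confidence"] >= REVIEW_THRESHOLD else review
        target[name] = field["value"]
    return accepted, review

accepted, review = route({
    "invoice_number": {"value": "INV-1042", "confidence": 0.98},
    "due_date": {"value": "2024-03-01", "confidence": 0.61},
})
```

The point is less the code than the ownership: on a managed platform this logic lives outside the platform, in your codebase, and you have to build and maintain it yourself.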
## The alternatives
### AWS Textract
Amazon’s equivalent. Similar strengths and weaknesses. Solid OCR, prebuilt models for common types, custom models for domain-specific documents.
Worth considering if you’re already in AWS infrastructure. The same edge-case limitations apply.
### Google Document AI
Google’s offering. Stronger on form parsing and table extraction than the other two in some benchmarks. Similar managed model constraints.
Worth evaluating if you’re in GCP or if table-heavy documents are your main challenge.
### Open-source OCR + custom pipeline
Tools like Tesseract (OCR), pdfplumber (PDF parsing), and PyMuPDF (text and layout extraction) handle the ingestion layer. You build the extraction logic yourself.
This is the approach with the most control and the highest build cost. It makes sense when:
- Your document layouts are highly specific
- You need complete control over failure modes
- Volume justifies the engineering investment
- You need the extraction logic to be auditable and maintainable
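Once an OCR layer (Tesseract) or PDF parser (pdfplumber, PyMuPDF) has produced raw text, the extraction layer can be as plain as rules you control end to end. A minimal sketch over already-extracted text — the field patterns are hypothetical examples for one known layout, not a general solution:

```python
import re

# Hypothetical patterns for one known lab-report layout.
FIELD_PATTERNS = {
    "sample_id": re.compile(r"Sample ID:\s*([A-Z0-9-]+)"),
    "ph": re.compile(r"pH\s*\(at 25°C\):\s*([\d.]+)"),
}

def extract_fields(text: str) -> dict:
    """Apply per-field regexes. A missing field comes back as None —
    a loud, checkable failure rather than a silent one."""
    return {
        name: (m.group(1) if (m := pattern.search(text)) else None)
        for name, pattern in FIELD_PATTERNS.items()
    }

report_text = "Sample ID: WQ-2291\npH (at 25°C): 7.4\n"
print(extract_fields(report_text))
```

Rules like these are trivially auditable: when a layout changes, you see exactly which pattern stopped matching and update one line, rather than relabelling and retraining a model.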
### Custom pipeline with selective LLMs
The approach I use in production: rules-based extraction as the baseline, LLMs introduced only where layout variation genuinely makes rules insufficient, confidence scoring on every field, uncertain results routed to human review.
This gives you the accuracy and control of a custom pipeline with the flexibility of LLMs for the hard cases — without the risk of LLM hallucination passing silently downstream.
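The control flow behind that approach can be sketched in a few lines. This is a simplified illustration of the rules-first pattern, not the production implementation; the rule lambdas and the LLM callback are stand-ins:

```python
def extract(text: str, rules: dict, llm_fallback) -> dict:
    """Rules first; only fields the rules can't resolve go to the LLM,
    and every LLM answer is flagged for human review rather than
    trusted blindly — so a hallucination can't pass silently."""
    results = {}
    for name, rule in rules.items():
        value = rule(text)
        if value is not None:
            results[name] = {"value": value, "source": "rule", "review": False}
        else:
            results[name] = {
                "value": llm_fallback(name, text),  # stub; a real API call goes here
                "source": "llm",
                "review": True,
            }
    return results

results = extract(
    "Ref: PO-77",
    {"po_number": lambda t: "PO-77" if "PO-77" in t else None,
     "incoterm": lambda t: None},            # the rule can't resolve this field
    llm_fallback=lambda field, text: "FOB",  # hypothetical LLM call
)
```

Because every value carries its provenance (`source`) and a review flag, downstream systems can treat rule-extracted and LLM-extracted fields differently.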
## How to decide
The right choice depends on your documents and your tolerance for failure.
| | Azure DI / AWS / Google | Custom Pipeline |
|---|---|---|
| Standard document types | ✓ Works well | Overkill |
| High layout variation | ✗ Breaks at edges | ✓ Handles it |
| Domain-specific documents | Needs custom training | ✓ Built for this |
| Silent failures acceptable | Manageable | Not recommended |
| Control over failure modes | Limited | ✓ Full control |
| Time to first result | Days | Weeks |
| Cost at high volume | Per-page pricing | Fixed infrastructure |
| Ongoing maintenance | Platform-managed | Your team or contractor |
A useful heuristic: if 100% of your documents look like textbook examples of their type, a managed platform is probably fine. If any significant portion of your documents are domain-specific, have variable layouts, or require high accuracy for downstream decisions — you need more control than a managed platform gives you.
## The real question
The decision between Azure DI and a custom pipeline isn’t primarily about technology. It’s about where your documents sit on the variation spectrum and what the cost of extraction errors is in your specific workflow.
If errors in your extracted data propagate into compliance records, financial reports, or operational decisions — the cost of silent failures is high. That’s the scenario where the confidence scoring and human-in-the-loop design of a custom pipeline pays for itself.
If you’re not sure where your documents fall, the fastest way to find out is to run your actual documents — the awkward ones, not the clean examples — through the platform you’re evaluating. The results usually make the decision obvious.
## Frequently asked questions
**What are the main alternatives to Azure Document Intelligence?** The primary alternatives are AWS Textract and Google Document AI (other managed cloud platforms with similar tradeoffs), custom extraction pipelines built with Python libraries (pdfplumber, PyMuPDF) plus selective LLM augmentation, and other managed IDP vendors like Nanonets or Docsumo. The right choice depends on your document types, layout variation, and accuracy requirements.

**When should I use a custom pipeline instead of Azure Document Intelligence?** When your documents have significant layout variation, are domain-specific (lab reports, certificates of analysis, customs documents), or when extraction errors have direct consequences in compliance, finance, or operations. Azure DI performs well on standard formats like invoices and receipts; it struggles with unusual layouts and documents outside its training distribution.

**Is Azure Document Intelligence accurate enough for production use?** For standard document types — invoices, receipts, purchase orders in common formats — yes. For domain-specific documents with significant layout variation, accuracy on edge cases is lower. The key question isn’t average accuracy but failure mode: does it fail loudly (so you can catch errors) or silently (so errors propagate downstream)? Azure DI’s confidence scores can be used to detect low-confidence extractions, but the routing logic to handle them is yours to build.

**How does Azure Document Intelligence compare to AWS Textract?** Both are managed cloud platforms with pre-trained models for common document types. Azure DI has broader out-of-the-box support for document types (invoices, receipts, IDs, business cards, tax forms). AWS Textract has stronger table extraction. Both struggle with domain-specific documents and high layout variation. At scale, Textract’s pricing model can be more favourable; Azure DI integrates better with existing Azure infrastructure.

**What does a custom extraction pipeline cost compared to Azure Document Intelligence?** Azure DI charges per page processed (typically $0.001–$0.01 per page depending on the feature). A custom pipeline has upfront build cost but lower ongoing cost at high volume. The more relevant comparison for most businesses is build-and-maintain cost versus per-page cost at their specific volume — and the accuracy difference on their specific documents.
Book a Diagnostic Session →