
What is Straight-Through Processing (STP)?

Subhajit Bhar
I build production-grade document extraction pipelines for businesses that process invoices, lab reports, contracts, and other document types at scale.

Straight-through processing (STP) is the automated handling of a document or transaction from receipt to completion without any manual intervention.


What STP means in document processing

A document arrives. Data is extracted from it. The extracted data is validated against expected formats and business rules. The validated output is delivered to a downstream system — an ERP, a database, a workflow tool. All of this happens automatically, with no human touching the document at any point.

That’s straight-through processing. The document moves from ingestion to output in seconds, not hours.
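To make the four stages concrete, here is a minimal Python sketch. It assumes plain-text input and a hypothetical invoice layout with `Invoice No:` and `Total:` labels. The names `extract`, `validate`, and `process` are illustrative, not part of any library, and the `deliver` and `send_to_review` callbacks stand in for whatever ERP, database, or workflow integration you actually use.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Document:
    doc_id: str
    text: str


@dataclass
class Result:
    doc_id: str
    fields: dict
    errors: list = field(default_factory=list)


def extract(doc: Document) -> dict:
    """Hypothetical rules-based extraction for one known invoice layout."""
    fields = {}
    if m := re.search(r"Invoice No:\s*(\S+)", doc.text):
        fields["invoice_number"] = m.group(1)
    if m := re.search(r"Total:\s*(\d+\.\d{2})", doc.text):
        fields["total"] = m.group(1)
    return fields


def validate(fields: dict) -> list:
    """Check required fields and one simple business rule; return error messages."""
    errors = [f"missing {name}" for name in ("invoice_number", "total")
              if name not in fields]
    if "total" in fields and float(fields["total"]) <= 0:
        errors.append("total must be positive")
    return errors


def process(doc: Document,
            deliver: Callable[[Result], None],
            send_to_review: Callable[[Result], None]) -> bool:
    """Run one document through the pipeline; return True if it went straight through."""
    fields = extract(doc)
    result = Result(doc.doc_id, fields, validate(fields))
    if result.errors:
        send_to_review(result)  # a human has to touch this one
        return False
    deliver(result)  # no manual intervention: straight-through
    return True
```

Only a document that clears every stage without a person touching it counts toward STP; anything that fails validation lands in the review queue instead of the downstream system.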

In intelligent document processing, STP is the target state for any document a system handles well. You’re not aiming for STP on every document regardless of circumstances — you’re aiming for it on the documents where the extraction is reliable enough to trust without human review.


STP rate as a metric

STP rate is the percentage of documents processed from start to finish without manual intervention.

A system with an 85% STP rate handles 85 out of every 100 documents automatically. The other 15 require some form of human review before the output is accepted.
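Computing the metric is simple once each document carries a flag recording whether anyone touched it. A minimal sketch, assuming you log one boolean per document (the `stp_rate` name is hypothetical):

```python
def stp_rate(straight_through: list[bool]) -> float:
    """Percentage of documents that needed no manual intervention.

    Each entry is True if the document went from receipt to output untouched,
    False if a person reviewed or corrected it at any point.
    """
    if not straight_through:
        raise ValueError("no documents processed")
    return 100 * sum(straight_through) / len(straight_through)


# The example from the text: 85 handled automatically, 15 reviewed.
print(stp_rate([True] * 85 + [False] * 15))  # 85.0
```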

A higher STP rate means more automation and less manual effort. But STP rate on its own isn’t the right metric to optimise for. What matters is the STP rate on documents where the extraction is actually correct.

A system that achieves 95% STP by passing uncertain extractions through automatically isn’t high-performing; it’s producing bad data at speed. The uncertain documents that should have gone to review are now in your downstream systems, carrying errors no one caught.
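One way to keep the headline number honest is to audit a sample of straight-through documents against ground truth and report how many of them were actually right. The sketch below assumes such an audited sample exists; `Audited` and `stp_report` are illustrative names, and the numbers in the example are made up to mirror the 95% scenario above.

```python
from typing import NamedTuple


class Audited(NamedTuple):
    straight_through: bool  # passed without human review
    correct: bool           # extraction matched ground truth when audited


def stp_report(records: list[Audited]) -> dict:
    """Compare the raw STP rate with the STP rate counting only correct extractions."""
    if not records:
        raise ValueError("no audited records")
    n = len(records)
    auto = [r for r in records if r.straight_through]
    auto_correct = sum(r.correct for r in auto)
    return {
        "stp_rate": 100 * len(auto) / n,
        "correct_stp_rate": 100 * auto_correct / n,
        "errors_passed_through": len(auto) - auto_correct,
    }


# 95 documents went straight through, but 10 of them were wrong.
sample = ([Audited(True, True)] * 85 + [Audited(True, False)] * 10
          + [Audited(False, True)] * 5)
print(stp_report(sample))
# {'stp_rate': 95.0, 'correct_stp_rate': 85.0, 'errors_passed_through': 10}
```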


The tension between STP and accuracy

This is the design problem at the centre of any production IDP pipeline: pushing more documents through automatically increases STP rate, but it also increases the risk of incorrect extractions reaching downstream systems unchecked.

The right design resolves this tension by using confidence scoring to separate what the system handles confidently from what it doesn’t. High-confidence extractions proceed automatically — those are your STP documents. Low-confidence extractions are routed to human-in-the-loop review before they go anywhere.

This gives you a high STP rate on the documents the system handles well, without sacrificing accuracy on the difficult ones. The reviewer only sees the documents the system is genuinely uncertain about. Everything else goes straight through.
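The routing rule itself can be very small. Here is a minimal sketch, assuming the extractor returns a confidence between 0 and 1 for each field; the `Extraction` type, the `route` function, and the 0.9 default threshold are all hypothetical and would need tuning against your own documents. It routes on the lowest field confidence, so a single uncertain field is enough to send the whole document to review.

```python
from dataclasses import dataclass


@dataclass
class Extraction:
    field: str
    value: str
    confidence: float  # 0.0 to 1.0, as reported by the extractor


def route(extractions: list[Extraction], threshold: float = 0.9) -> str:
    """Return 'auto' for straight-through processing, 'review' for human-in-the-loop."""
    if not extractions:
        return "review"  # extracting nothing is never a confident result
    weakest = min(e.confidence for e in extractions)
    return "auto" if weakest >= threshold else "review"


doc = [Extraction("invoice_number", "INV-001", 0.98),
       Extraction("total", "120.50", 0.72)]
print(route(doc))  # review: the total is below the threshold
```

Routing on the weakest field is a conservative choice; a pipeline could instead route individual fields, so the reviewer only checks the uncertain values.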

Treating STP rate as the goal without confidence scoring leads to one outcome: you eventually stop trusting the output.


What determines STP rate

Several factors affect how many documents a pipeline can process without human intervention:

Document consistency. Consistent layouts extract reliably. A document type where every supplier uses the same template will have a higher STP rate than one where each sender has their own format. The more predictable the document, the more confidently the system can extract from it.

Extraction approach. Rules-based extraction on known formats produces deterministic, high-confidence results. When a regex matches a known field pattern exactly, confidence is high. LLM-based extraction on variable layouts introduces more uncertainty and typically requires tighter threshold management to maintain accuracy.
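To illustrate the difference, here is a sketch of a rules-based extractor for a single field. It reports a high confidence only when the known template pattern matches exactly, and a much lower one when it falls back to a generic date search. The patterns and the 0.99 and 0.6 scores are hypothetical placeholders, not calibrated values.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for one supplier's known layout: a labelled ISO date on its own line.
TEMPLATE_DATE = re.compile(r"^Invoice Date:\s*(\d{4}-\d{2}-\d{2})$", re.MULTILINE)
# Generic fallback: any date-like token anywhere in the text.
ANY_DATE = re.compile(r"\b(\d{1,2}[/.-]\d{1,2}[/.-]\d{2,4})\b")


def extract_invoice_date(text: str) -> Tuple[Optional[str], float]:
    """Return (value, confidence) for the invoice date field."""
    if m := TEMPLATE_DATE.search(text):
        return m.group(1), 0.99  # exact template match: deterministic result
    if m := ANY_DATE.search(text):
        return m.group(1), 0.6   # a date was found, but not where the template expects it
    return None, 0.0             # nothing found


print(extract_invoice_date("Invoice Date: 2024-03-15\nTotal: 10.00"))  # ('2024-03-15', 0.99)
print(extract_invoice_date("Dated 15/03/2024, payable in 30 days"))    # ('15/03/2024', 0.6)
```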

Confidence thresholds. Lowering the threshold that triggers human review increases STP rate — more extractions pass through automatically. But lower thresholds also mean lower-confidence extractions proceeding without verification. Setting thresholds is a deliberate trade-off, not a configuration detail.
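A practical way to make that trade-off deliberately is to sweep candidate thresholds over a labelled validation set and compare how many documents would pass automatically with how many wrong extractions would pass along with them. A minimal sketch, assuming you have (confidence, correct) pairs for a sample of documents; the function name is hypothetical and the five samples are invented purely for illustration.

```python
def sweep_thresholds(samples: list[tuple[float, bool]],
                     thresholds: list[float]) -> list[tuple[float, float, int]]:
    """For each threshold, return (threshold, STP rate %, wrong extractions passed through).

    samples: (confidence, correct) pairs from a labelled validation set.
    """
    n = len(samples)
    rows = []
    for t in thresholds:
        passed = [correct for confidence, correct in samples if confidence >= t]
        rows.append((t, 100 * len(passed) / n, passed.count(False)))
    return rows


samples = [(0.99, True), (0.95, True), (0.92, False), (0.85, True), (0.70, False)]
for t, rate, errors in sweep_thresholds(samples, [0.9, 0.8, 0.6]):
    print(f"threshold {t:.1f}: STP {rate:.0f}%, errors passed through {errors}")
# threshold 0.9: STP 60%, errors passed through 1
# threshold 0.8: STP 80%, errors passed through 1
# threshold 0.6: STP 100%, errors passed through 2
```

In this toy sample, lowering the threshold from 0.9 to 0.8 raises STP without letting another error through, while lowering it to 0.6 does let a second error through.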

Document quality. Poor scans, low-resolution images, and handwritten content reduce extraction confidence even on known layouts. A batch of clean PDFs will have a higher STP rate than a batch of photographed paper forms, all else being equal.


STP in practice

A well-designed document extraction pipeline on a mature document type with known layouts typically achieves 85-95% STP. The remaining 5-15% go to human review — not because the system failed, but because those documents fell outside the confident operating range.

Over time, as the system accumulates more layouts and refines its extraction rules, STP rate tends to improve. When a new document source arrives with an unfamiliar format, the STP rate for that source drops temporarily while extraction rules for it are established. Once extraction is stable for the new layout, the STP rate for that source recovers.
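Because the dip is specific to one source, it helps to track STP rate by source and by period rather than as a single blended number, which can hide a new supplier's temporary drop. A minimal sketch, assuming each processed document is logged as a (source, period, straight_through) tuple; the function name and the sample data are hypothetical.

```python
from collections import defaultdict


def stp_by_source(events):
    """events: iterable of (source, period, straight_through) tuples.

    Returns {(source, period): STP rate %}.
    """
    counts = defaultdict(lambda: [0, 0])  # (source, period) -> [straight-through, total]
    for source, period, straight_through in events:
        bucket = counts[(source, period)]
        bucket[0] += int(straight_through)
        bucket[1] += 1
    return {key: 100 * stp / total for key, (stp, total) in counts.items()}


events = ([("acme", "week 1", True)] * 9 + [("acme", "week 1", False)]
          + [("newco", "week 1", True)] * 2 + [("newco", "week 1", False)] * 2
          + [("newco", "week 2", True)] * 4)
for key, rate in sorted(stp_by_source(events).items()):
    print(key, f"{rate:.0f}%")
# ('acme', 'week 1') 90%
# ('newco', 'week 1') 50%
# ('newco', 'week 2') 100%
```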

The review queue isn’t a sign that something is wrong. It’s the mechanism that keeps the STP documents trustworthy.


