Logistics runs on paperwork. A single shipment from a manufacturer in Guangzhou to a distributor in Hamburg might require a bill of lading, commercial invoice, packing list, certificate of origin, customs entry, and a dangerous goods declaration — all of which need to be read, keyed into systems, and verified before anything moves.
There are 30+ major container shipping lines, each with their own bill of lading format. Customs declarations vary by country, and HS code requirements shift with trade agreements. One wrong field — a miskeyed consignee address, an incorrect HS code, a discrepancy between the invoice total and the line item sum — and the shipment stops. Sometimes it’s a delay. Sometimes it’s a fine.
That’s the context for intelligent document processing in logistics. This isn’t about automating low-stakes paperwork. It’s about processing documents where errors have real financial and legal consequences, across formats that were never designed to be machine-readable.
The document types logistics teams process
Bill of lading (B/L). The contract of carriage between the shipper and carrier. Every shipping line has its own template. MSC’s B/L looks nothing like Maersk’s, and both look nothing like a house B/L issued by a freight forwarder. Key fields: shipper, consignee, notify party, vessel, voyage, port of loading, port of discharge, container numbers, cargo description, gross weight. A mistake on the consignee means the cargo can’t be released.
Commercial invoice. The basis for customs valuation and duty calculation. Varies enormously by supplier: small exporters often use their own templates with no consistent structure. Key fields: buyer, seller, incoterms, unit prices, quantities, total value, currency, HS codes. A discrepancy between the declared value and what HMRC or CBP expects can trigger an audit.
Packing list. Details how cargo is physically packed: carton counts, dimensions, weights per line item. Often arrives as a multi-page table. When packing lists don’t match the invoice or B/L, customs wants to know why.
Customs declaration (import/export entry). Country-specific. In the UK, that’s an entry on the Customs Declaration Service (CDS), which has replaced CHIEF. In the US, a CBP Form 7501. The HS code itself is six digits, but the national tariff codes built on it run longer (10 digits for UK imports and for the US HTS, for example), so the code must be correct at the level the jurisdiction requires. Duty rates, VAT, and trade preference eligibility all flow from the HS code. An incorrect HS code isn’t just an accuracy problem; it can lead to a post-clearance audit.
Certificate of origin. Required for preferential tariff treatment under trade agreements. Format depends on the issuing body — some are chamber-stamped documents, some are REX declarations, some are EUR.1 movement certificates. The key risk is claiming preference on a shipment that doesn’t qualify, which leads to back-duty plus interest.
Dangerous goods declaration (DGD). IATA or IMDG format depending on air or sea. Highly structured, but the UN number, packing group, and emergency contact must be extracted correctly. Errors here have regulatory consequences beyond customs.
Proof of delivery (POD). Often a scanned, handwritten or semi-structured document. Key fields: delivery date, signatory, condition notes. Used for billing and dispute resolution.
Why standard tools struggle
The obvious answer is layout variation. Bills of lading from 30 carriers, commercial invoices from hundreds of suppliers, customs forms from dozens of countries — no two look the same, and they all change over time.
But layout variation is just the start.
Multi-page tables are common in packing lists and invoices. A table that starts on page 2 and continues on page 3, with a subtotal only on page 3, is difficult to reconstruct correctly using basic PDF extraction. Merged cells in carrier B/L tables make column alignment unreliable.
Scanned documents add OCR complexity. A handwritten amendment to a printed B/L — a changed container number, a corrected weight — needs to be detected and reconciled against the typed fields. Low-resolution scans from port agents in some origin countries make field-level OCR unreliable without pre-processing.
Then there’s the compliance angle. Standard tools are built to extract data. They’re not built to validate it. A tool that silently extracts an HS code in the wrong format, or misreads a digit in a container number, doesn’t fail visibly — it just passes bad data downstream. In a workflow where that data feeds a customs entry, silent failures are unacceptable.
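One concrete way to make a misread container number fail loudly is the check digit defined in ISO 6346: the eleventh character of a container number is computed from the first ten. The sketch below is a minimal illustration, not a full implementation of the standard; the function names are hypothetical, and the sample number is the commonly cited example CSQU3054383.

```python
import re


def _letter_values() -> dict[str, int]:
    """ISO 6346 letter values: A=10 upward, skipping multiples of 11."""
    values, n = {}, 10
    for ch in "ABCDEFGHIJKLMNOPQRSTUVWXYZ":
        if n % 11 == 0:
            n += 1
        values[ch] = n
        n += 1
    return values


LETTER_VALUES = _letter_values()


def container_check_digit(first_ten: str) -> int:
    """Compute the check digit from the first 10 characters (upper case)."""
    total = 0
    for position, ch in enumerate(first_ten):
        value = LETTER_VALUES[ch] if ch.isalpha() else int(ch)
        total += value * (2 ** position)
    # A remainder of 10 is written as 0.
    return total % 11 % 10


def is_valid_container_number(raw: str) -> bool:
    """True if `raw` has the 4-letter + 7-digit shape and a matching check digit."""
    number = re.sub(r"[\s-]", "", raw).upper()
    if not re.fullmatch(r"[A-Z]{4}[0-9]{7}", number):
        return False
    return container_check_digit(number[:10]) == int(number[10])


print(is_valid_container_number("CSQU3054383"))  # True
print(is_valid_container_number("CSQU3064383"))  # False: one digit misread
```

A check like this catches most single-character OCR errors, though not every possible one, so it complements field-level confidence rather than replacing it.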
What an IDP pipeline for logistics actually does
A production document extraction pipeline for logistics has four stages.
Ingestion. Documents arrive via email attachments, EDI message attachments, carrier portals, and forwarded PDFs. The pipeline needs to handle native PDFs, scanned PDFs, and image files. Pre-processing — deskew, denoise, resolution normalisation — runs before extraction. Document classification happens here too: is this a B/L or a packing list? Sometimes that’s obvious from filename conventions; sometimes it requires reading the document.
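As a rough illustration of the classification step, the sketch below scores extracted text against keyword lists and returns "unknown" when the evidence is thin or tied. The keyword lists, the `min_hits` threshold, and the function name are all assumptions for illustration; a production classifier would be tuned on real documents and might use a trained model instead.

```python
# Hypothetical keyword lists; tune these on your own document sample.
KEYWORDS = {
    "bill_of_lading": ["bill of lading", "port of loading",
                       "port of discharge", "notify party"],
    "commercial_invoice": ["commercial invoice", "unit price", "incoterms"],
    "packing_list": ["packing list", "cartons", "net weight"],
}


def classify(text: str, min_hits: int = 2) -> str:
    """Return the best-matching document type, or 'unknown' if unsure."""
    lowered = text.lower()
    scores = {
        doc_type: sum(keyword in lowered for keyword in keywords)
        for doc_type, keywords in KEYWORDS.items()
    }
    best_score = max(scores.values())
    best_types = [t for t, s in scores.items() if s == best_score]
    # Too little evidence, or a tie between types: leave it for review.
    if best_score < min_hits or len(best_types) > 1:
        return "unknown"
    return best_types[0]


sample = "BILL OF LADING\nPort of Loading: Guangzhou\nPort of Discharge: Hamburg"
print(classify(sample))  # bill_of_lading
```

Returning "unknown" rather than a best guess keeps the classifier consistent with the rest of the pipeline: uncertain cases go to a person.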
Extraction. This is where schema-first extraction matters. You define the output schema first — every field you need, its type, whether it’s required, and its validation rules. Then you build extraction to populate it. For carriers with consistent B/L layouts, regex and coordinate-based rules are faster and more reliable than an LLM. For variable supplier invoices, a language model handles the layout variation. The extraction strategy is per document type, sometimes per carrier — not a single model for everything.
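To make "schema first" concrete, here is a minimal sketch of a B/L schema declared before any extraction logic exists. The field names follow the key fields listed earlier, but which fields are required and the validators themselves are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FieldSpec:
    name: str
    required: bool
    validator: Callable[[str], bool]


def non_empty(value: str) -> bool:
    return bool(value.strip())


def positive_number(value: str) -> bool:
    try:
        return float(value.replace(",", "")) > 0
    except ValueError:
        return False


# Assumed schema: required flags are illustrative, not a standard.
BILL_OF_LADING_SCHEMA = [
    FieldSpec("shipper", True, non_empty),
    FieldSpec("consignee", True, non_empty),
    FieldSpec("notify_party", False, non_empty),
    FieldSpec("port_of_loading", True, non_empty),
    FieldSpec("port_of_discharge", True, non_empty),
    FieldSpec("gross_weight_kg", True, positive_number),
]


def check_against_schema(extracted: dict[str, str],
                         schema: list[FieldSpec]) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for spec in schema:
        value = extracted.get(spec.name)
        if value is None or not value.strip():
            if spec.required:
                problems.append(f"{spec.name}: missing required field")
            continue
        if not spec.validator(value):
            problems.append(f"{spec.name}: failed validation")
    return problems
```

Whatever extraction method sits behind it (rules for one carrier, a language model for another), the output is checked against the same schema, so downstream systems see one shape.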
Validation. Extraction alone isn’t enough. Cross-checks run after extraction: does the invoice total match the sum of line items? Does the HS code match the expected format for the destination country? Are all required fields present? Is the gross weight on the B/L within tolerance of the packing list total? Confidence scoring runs on every extracted field, so the pipeline knows which fields it’s certain about and which need review.
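Two of these cross-checks can be sketched in a few lines. The tolerances below (one cent on the invoice total, 2% on weight) and the function names are assumptions chosen for illustration; real tolerances depend on the customer and the trade lane.

```python
from decimal import Decimal


def invoice_total_matches(line_items: list[tuple[Decimal, Decimal]],
                          declared_total: Decimal,
                          tolerance: Decimal = Decimal("0.01")) -> bool:
    """line_items holds (quantity, unit_price) pairs."""
    computed = sum((qty * price for qty, price in line_items), Decimal("0"))
    return abs(computed - declared_total) <= tolerance


def weight_within_tolerance(bl_gross_kg: Decimal,
                            packing_list_kg: Decimal,
                            rel_tolerance: Decimal = Decimal("0.02")) -> bool:
    """Compare the B/L gross weight with the packing list total."""
    if packing_list_kg <= 0:
        return False
    deviation = abs(bl_gross_kg - packing_list_kg) / packing_list_kg
    return deviation <= rel_tolerance


items = [(Decimal("10"), Decimal("2.50")), (Decimal("3"), Decimal("19.99"))]
print(invoice_total_matches(items, Decimal("84.97")))             # True
print(weight_within_tolerance(Decimal("1000"), Decimal("1100")))  # False
```

`Decimal` is used instead of floats so that currency sums do not pick up binary rounding errors.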
Output. Validated data goes to downstream systems — TMS, customs software (e.g., ASM Sequoia, e-Customs), WMS — via API or structured file. Exceptions route to a review queue. Human-in-the-loop processing isn’t an afterthought; it’s how the system handles the cases it isn’t confident about, which in logistics can include anything from an unfamiliar carrier template to a document with a handwritten amendment.
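The routing decision itself can be very small. In this sketch the thresholds, field names, and queue names are hypothetical; the point is that higher-stakes fields such as the HS code can carry a stricter threshold than the default.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, as reported by the extraction step


# Hypothetical thresholds; stricter for fields with legal consequences.
DEFAULT_THRESHOLD = 0.90
FIELD_THRESHOLDS = {"hs_code": 0.98, "consignee": 0.95}


def fields_needing_review(fields: list[ExtractedField],
                          validation_errors: list[str]) -> list[str]:
    """Names of fields that failed validation or fell below threshold."""
    flagged = set(validation_errors)
    for field in fields:
        threshold = FIELD_THRESHOLDS.get(field.name, DEFAULT_THRESHOLD)
        if field.confidence < threshold:
            flagged.add(field.name)
    return sorted(flagged)


def route(fields: list[ExtractedField], validation_errors: list[str]) -> str:
    flagged = fields_needing_review(fields, validation_errors)
    return "review_queue" if flagged else "straight_through"
```

Returning the list of flagged field names, not just a yes/no, is what lets a reviewer look at exactly the field in question.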
The edge cases that matter in logistics
Multi-supplier invoices. A consolidation shipment might include goods from three suppliers, each with their own invoice format, bundled into one consignment. The pipeline needs to handle multiple document layouts in a single batch, and ensure the extracted data maps to the correct line items.
Documents in multiple languages. A commercial invoice from a Chinese supplier might be bilingual (Chinese and English), with some fields in one language and some in the other. Certificates of origin from some countries arrive in French or Spanish. The pipeline needs to handle this without requiring manual pre-sorting.
HS code extraction and validation. HS codes are frequently embedded in invoice tables, sometimes in a column labelled “Tariff Code” or “HTS” or “CN Code” or just “Code.” After extraction, format validation checks that the code is plausible — correct digit count, valid chapter prefix. For UK/EU trade, the pipeline can cross-reference against the trade tariff to flag codes that don’t exist.
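A format check along these lines might look like the sketch below. It only tests plausibility: digit count (6, 8, or 10 digits are accepted here as an assumption) and chapter prefix (HS chapters run 01 to 97 with 77 reserved; 98 and 99 are set aside for national use). It does not confirm the code exists in any tariff, and the function names are hypothetical.

```python
import re

ALLOWED_LENGTHS = {6, 8, 10}  # assumed; adjust per jurisdiction


def normalise_hs_code(raw: str) -> str:
    """Strip the dots, spaces and hyphens commonly used as separators."""
    return re.sub(r"[\s.\-]", "", raw)


def hs_code_problems(raw: str, allow_national_chapters: bool = False) -> list[str]:
    """Return plausibility problems; an empty list means the format looks fine."""
    code = normalise_hs_code(raw)
    if not re.fullmatch(r"[0-9]+", code):
        return ["contains non-digit characters"]
    problems = []
    if len(code) not in ALLOWED_LENGTHS:
        problems.append(f"unexpected length {len(code)}")
    chapter = int(code[:2]) if len(code) >= 2 else 0
    max_chapter = 99 if allow_national_chapters else 97
    if chapter < 1 or chapter > max_chapter or chapter == 77:
        problems.append(f"implausible chapter {code[:2]}")
    return problems


print(hs_code_problems("8471.30.0100"))  # []
print(hs_code_problems("84O1.30"))       # ['contains non-digit characters']
```

The second example is a typical OCR confusion (letter O for zero), which a format check catches before the code reaches a customs entry.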
Split shipments. A single purchase order shipped in two separate containers, with separate B/Ls and packing lists but a single commercial invoice. Reconciling the documents to each other — and to the PO — requires the pipeline to understand relationships across documents, not just extract individual ones.
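As a simplified sketch of cross-document reconciliation, the example below links packing lists to an invoice through a shared PO number and compares carton counts. The record shapes and names are hypothetical, and it assumes the PO number was extracted reliably from every document, which in practice is itself something to validate.

```python
from dataclasses import dataclass


@dataclass
class PackingList:
    po_number: str
    bl_number: str
    cartons: int


@dataclass
class Invoice:
    po_number: str
    cartons: int


def reconcile_cartons(invoice: Invoice,
                      packing_lists: list[PackingList]) -> dict:
    """Sum cartons across all packing lists that share the invoice's PO."""
    related = [p for p in packing_lists if p.po_number == invoice.po_number]
    shipped = sum(p.cartons for p in related)
    return {
        "po_number": invoice.po_number,
        "bl_numbers": sorted({p.bl_number for p in related}),
        "invoiced_cartons": invoice.cartons,
        "shipped_cartons": shipped,
        "matches": shipped == invoice.cartons,
    }
```

A mismatch here is not necessarily an error (the second container may not have arrived in the batch yet), so the result is better treated as a flag for review than as a rejection.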
Amendments. A B/L amendment changes the consignee after the original was issued. A revised commercial invoice changes the declared value. The pipeline needs to handle version control: which document supersedes which, and does the downstream system need to be updated?
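One simple way to model this is to key every document by type and reference, keep the most recently issued version, and diff it against its predecessor to decide whether downstream systems need an update. The sketch below assumes the issue date and reference are available and trustworthy; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DocumentVersion:
    doc_type: str
    reference: str
    issued: date
    fields: dict


def current_versions(versions: list[DocumentVersion]) -> dict:
    """Keep the latest-issued version per (doc_type, reference)."""
    latest = {}
    for version in versions:
        key = (version.doc_type, version.reference)
        if key not in latest or version.issued > latest[key].issued:
            latest[key] = version
    return latest


def changed_fields(old: DocumentVersion, new: DocumentVersion) -> dict:
    """Map each changed field to its (old, new) values."""
    keys = old.fields.keys() | new.fields.keys()
    return {
        key: (old.fields.get(key), new.fields.get(key))
        for key in sorted(keys)
        if old.fields.get(key) != new.fields.get(key)
    }
```

If `changed_fields` returns anything for a field that has already been sent downstream, such as the consignee, that is the trigger for an update or a review.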
What this looks like in practice
A freight forwarder processing 200 shipments per month, each requiring four to six documents, is looking at 800 to 1,200 documents monthly — all of which need to be read and keyed into their customs and TMS software.
With manual processing, each document takes 5-15 minutes depending on complexity. Across 800 to 1,200 documents, that’s roughly 67-300 person-hours per month, not counting error correction.
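The bounds can be checked directly from the figures in this example (200 shipments, four to six documents each, 5 to 15 minutes per document). The variable names are only for illustration.

```python
shipments_per_month = 200
docs_per_shipment = (4, 6)   # low, high
minutes_per_doc = (5, 15)    # low, high

docs_low = shipments_per_month * docs_per_shipment[0]
docs_high = shipments_per_month * docs_per_shipment[1]

# Lowest case: fewest documents at the fastest pace, and vice versa.
hours_low = docs_low * minutes_per_doc[0] / 60
hours_high = docs_high * minutes_per_doc[1] / 60

print(f"{docs_low}-{docs_high} documents, "
      f"{hours_low:.0f}-{hours_high:.0f} person-hours")
# 800-1200 documents, 67-300 person-hours
```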
An IDP pipeline changes the workflow. Documents arrive and are processed automatically. Straightforward cases — a B/L from a carrier whose format the pipeline knows well, an invoice that matches the expected structure — go straight through. Anomalies get flagged: an HS code that doesn’t validate, a weight discrepancy between documents, a missing required field. Those route to a review queue where a coordinator looks at exactly the flagged field, confirms or corrects it, and moves on.
The result is that the team’s time shifts from keying data to reviewing exceptions. The volume of work that requires human attention shrinks to the genuinely ambiguous cases.
How to evaluate an IDP solution for logistics
Does it handle your carriers’ specific bill of lading formats? Ask for a test on a sample of your actual documents — not a demo on clean PDFs. Include documents from your three highest-volume carriers and your most problematic supplier.
How does it handle amendments? Amended B/Ls and revised invoices are common. Find out whether the system detects that a document is an amendment, and how it handles the downstream update.
What’s the accuracy on HS codes? HS code extraction is higher stakes than extracting a shipment reference. Ask specifically about HS code accuracy, not just overall field-level accuracy figures.
How does it fail — loudly or silently? A system that outputs a wrong HS code with no flag is more dangerous than one that routes the document to manual review. Understand how the system signals low confidence, and what proportion of documents trigger review versus pass through automatically.
What happens when a new carrier or supplier format appears? Carrier templates change. New suppliers come onboard. How long does it take the system to handle a new format — days, weeks, or a support ticket queue?
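When running a test on your own documents, two numbers are worth computing from the results: accuracy for a specific field such as the HS code, and the share of fields that were wrong without being flagged. The sketch below assumes you have a labelled sample in which each row records the field name, the predicted value, the true value, and whether the system flagged it; that row format is hypothetical.

```python
def field_accuracy(results: list[dict], field: str):
    """Share of rows for `field` where the prediction matches the truth."""
    rows = [r for r in results if r["field"] == field]
    if not rows:
        return None
    return sum(r["predicted"] == r["truth"] for r in rows) / len(rows)


def silent_error_rate(results: list[dict]):
    """Share of all rows that were wrong and not flagged for review."""
    if not results:
        return None
    silent = sum(
        r["predicted"] != r["truth"] and not r["flagged"] for r in results
    )
    return silent / len(results)


sample = [
    {"field": "hs_code", "predicted": "847130", "truth": "847130", "flagged": False},
    {"field": "hs_code", "predicted": "847131", "truth": "847130", "flagged": True},
    {"field": "hs_code", "predicted": "847103", "truth": "847130", "flagged": False},
    {"field": "shipper", "predicted": "ACME", "truth": "ACME", "flagged": False},
]
print(silent_error_rate(sample))  # 0.25
```

In this sample the second row is a wrong value that was flagged, which is the acceptable kind of failure; the third row is wrong and unflagged, which is the kind to worry about.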
