Python

Schema-First PDF Extraction in Python with Pydantic

4 March 2026·1186 words·6 mins

Most PDF extraction projects start with the document. You open a PDF, look at the text, write a regex, extract a value. Repeat for each field.

pdfplumber vs PyMuPDF vs PyPDF2 for PDF Extraction

4 March 2026·875 words·5 mins

PDF Extraction

If you’re extracting data from PDFs in Python, you’ll encounter three libraries repeatedly: pdfplumber, PyMuPDF (imported as fitz), and PyPDF2. They overlap in capability but differ in what they’re optimised for.

Extracting Tables from PDFs in Python: The Complete Guide

4 March 2026·1073 words·6 mins

PDF Extraction

Extracting tables from PDFs is one of the most common requirements in document automation, and one of the most reliable ways to introduce subtle errors if you do it carelessly.
