Python 3.10 introduced the match statement (structural pattern matching, PEP 634), including literal, OR, sequence, and guard patterns. Combined with Literal and union type hints, it replaces long if-elif chains.
    # Works on Python 3.10 and later
    from typing import Literal, Union

    State = Literal["idle", "running", "paused"]
    Event = Union[State, tuple[str, int]]


    def handle_event(event: Event) -> str:
        match event:
            case "idle":
                return "Starting engine"
            case "running" | "paused":  # OR pattern
                return "Already active"
            case ("error", code) if code > 400:  # Guard + sequence pattern
                return f"Critical error {code}"
            case _:
                return "Unknown"
Impact: Your code becomes self-documenting. The match block lists every handled state in one place, so an unhandled case is much easier to spot.
Abandon setup.py, requirements.txt, and Pipfile. Modern Python packaging standardizes project metadata and tool configuration in pyproject.toml (PEP 518 and PEP 621):
    [project]
    name = "pdf-power"
    version = "3.0.0"
    requires-python = ">=3.12"
    dependencies = ["pypdf>=4.0", "numpy>=1.26"]

    [build-system]
    requires = ["setuptools>=68"]
    build-backend = "setuptools.build_meta"

    [tool.ruff]  # Linter replacing flake8/black/isort
    line-length = 88
Impact: One file controls dependencies, build, linting, testing, and typing.
Impact: True separation of content, style, and template.
Instead of writing PDFs line by line, use a layered approach: a Canvas for headers and footers, a DocTemplate for flowables, and a Story list for dynamic content. This pattern mirrors HTML/CSS and enables reuse of corporate design without touching business logic.
Impact: Up to 10x throughput for batch PDF reports, depending on the workload.
PDF operations are often I/O-bound (reading files, writing results). Use ThreadPoolExecutor for operations like merging, watermarking, or form filling. Use ProcessPoolExecutor for CPU-bound OCR or image-to-PDF conversion. The strategy: separate pypdf operations (light) from pdf2image (heavy).
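A minimal sketch of that split, using only the standard library. The two task functions are hypothetical stand-ins: in a real pipeline light_io_task would wrap a pypdf operation and heavy_cpu_task would wrap OCR or pdf2image rendering.

```python
import hashlib
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def light_io_task(name: str) -> str:
    """Stand-in for an I/O-bound step such as merging or watermarking."""
    return f"{name}: processed"


def heavy_cpu_task(payload: bytes) -> str:
    """Stand-in for CPU-bound work such as OCR or page rasterization."""
    digest = payload
    for _ in range(1000):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


def run_io_batch(names, max_workers=8):
    # Threads are enough here: the workers spend most of their time waiting on I/O.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(light_io_task, names))


def run_cpu_batch(payloads, max_workers=2):
    # Separate processes sidestep the GIL for CPU-heavy work.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(heavy_cpu_task, payloads))
```

On platforms that start workers by spawning a fresh interpreter, call run_cpu_batch from under an `if __name__ == "__main__":` guard, and keep the task functions at module level so they can be pickled.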
Impact: Extract tables and text from very large reports (thousands of pages) without parsing every page up front.
pdfplumber builds on pdfminer.six but adds table detection and layout-aware extraction. Its secret weapon: pages are parsed lazily, and the opened PDF works as a context manager, so the file handle is released as soon as the block ends.
    import pdfplumber

    with pdfplumber.open("large_report.pdf") as pdf:
        # Only the first page is parsed here
        first_page = pdf.pages[0]
        table = first_page.extract_table()

        # Iterate over the pages one at a time
        for page in pdf.pages:
            text = page.extract_text() or ""  # guard against pages with no text
            if "_summary_" in text.lower():
                print(page.extract_tables())
Strategy: Combine with functools.lru_cache when repeatedly extracting from the same page.
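One way to apply lru_cache is to key the cache on the file path and page index rather than on a live page object, so the cache does not depend on an open file handle. The sketch below uses a hypothetical stand-in extractor (fake_extract) so it runs without a PDF. In practice the wrapped function would open the file with pdfplumber and call extract_text on the requested page.

```python
from functools import lru_cache
from typing import Callable


def make_cached_extractor(extract: Callable[[str, int], str], maxsize: int = 128):
    """Wrap a (path, page_index) -> text function in an LRU cache."""

    @lru_cache(maxsize=maxsize)
    def cached(path: str, page_index: int) -> str:
        return extract(path, page_index)

    return cached


calls = []  # records every real extraction, to show the cache working


def fake_extract(path: str, page_index: int) -> str:
    """Hypothetical stand-in for an expensive page extraction."""
    calls.append((path, page_index))
    return f"text of {path} page {page_index}"


get_text = make_cached_extractor(fake_extract)
get_text("large_report.pdf", 0)
get_text("large_report.pdf", 0)  # second call is served from the cache
```

Note that a cached result goes stale if the file changes on disk; get_text.cache_clear() resets the cache in that case.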
    # PEP 695 type aliases (the type statement requires Python 3.12+)
    from pathlib import Path

    type Matrix = list[list[float]]
    type Point = tuple[float, float, float]
    type ImageOrPDF = Path | bytes | None
Strategy: Use this for domain-driven design. Define all business types at the top of your module as a "contract."
Use try-except-else with three fallbacks:
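A minimal sketch of a try-except-else fallback chain. The three extractors (fast, layout, ocr), their order, and the ExtractionError class are hypothetical and chosen only for illustration; they are not taken from the text.

```python
from typing import Callable


class ExtractionError(Exception):
    """Raised by an extractor that cannot handle the input (hypothetical)."""


def extract_with_fallbacks(
    data: bytes,
    extractors: list[tuple[str, Callable[[bytes], str]]],
) -> tuple[str, str]:
    """Try each extractor in order; return (name, text) from the first success."""
    errors = []
    for name, extractor in extractors:
        try:
            text = extractor(data)
        except ExtractionError as exc:
            errors.append(f"{name}: {exc}")  # remember why this one failed
        else:
            return name, text  # runs only if no exception was raised
    raise ExtractionError("all extractors failed: " + "; ".join(errors))


def fast_extract(data: bytes) -> str:
    raise ExtractionError("no text layer")  # stand-in: always falls through


def layout_extract(data: bytes) -> str:
    if not data:
        raise ExtractionError("empty input")
    return data.decode("utf-8", errors="replace")


def ocr_extract(data: bytes) -> str:
    return "<ocr text>"  # stand-in for a last-resort OCR pass


EXTRACTORS = [("fast", fast_extract), ("layout", layout_extract), ("ocr", ocr_extract)]
```

Keeping the success path in the else branch means the try block covers only the call that is expected to fail, so unrelated bugs are not swallowed by the except clause.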
Impact: Intelligent text reflow.
Unlike pypdf’s raw text extraction (which can interleave text from multi-column layouts), pdfminer.six exposes layout objects (LTPage and its children) with bounding boxes, from which reading order can be reconstructed. Strategy: sort components by y0 descending and x0 ascending, then group by vertical overlap to reconstruct columns.
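A minimal sketch of the sorting and grouping step on plain bounding boxes. It reads "vertical overlap" as boxes stacked on top of each other, so it groups boxes whose horizontal extents (x0 to x1) overlap into one column, then orders each column top to bottom. The Box class is a hypothetical stand-in: with pdfminer.six the coordinates would come from the x0, y0, x1, y1 attributes of text boxes and the text from get_text().

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x0: float
    y0: float
    x1: float
    y1: float
    text: str


def group_into_columns(boxes: list[Box]) -> list[list[Box]]:
    """Group boxes whose x-ranges overlap, returning columns left to right."""
    columns: list[list[Box]] = []
    for box in sorted(boxes, key=lambda b: b.x0):
        for column in columns:
            col_x0 = min(b.x0 for b in column)
            col_x1 = max(b.x1 for b in column)
            if box.x0 < col_x1 and box.x1 > col_x0:  # x-ranges overlap
                column.append(box)
                break
        else:
            columns.append([box])  # no overlap with any column: start a new one
    for column in columns:
        # PDF coordinates start at the bottom-left, so a larger y0 is higher up.
        column.sort(key=lambda b: (-b.y0, b.x0))
    return columns


def reading_order(boxes: list[Box]) -> list[str]:
    return [b.text for column in group_into_columns(boxes) for b in column]


sample = [
    Box(300, 700, 500, 720, "right top"),
    Box(50, 650, 250, 670, "left bottom"),
    Box(50, 700, 250, 720, "left top"),
    Box(300, 650, 500, 670, "right bottom"),
]
print(reading_order(sample))
```

This simple grouping would merge everything into one column if a full-width heading spans the page, so such boxes are best handled separately before grouping.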