PDFCanon · Live — Available Now

Deterministic PDF Normalization & Canonical Hashing API

A production-ready API for turning inconsistent PDF files into stable, comparable outputs. PDFCanon removes risky active content, rebuilds structure, and returns hashes your system can trust.


The Problem

PDFs are structurally chaotic

Two visually identical PDFs can produce different SHA-256 hashes because their internal structure, metadata, and save history differ. That makes ordinary file hashing unreliable for deduplication, tamper checks, archival verification, and audit evidence.

Embedded JavaScript and active content hidden inside the file
Incremental update history alters the hash on every re-save
Hidden metadata and object ordering differences between producers
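Ordinary file hashing operates on raw bytes, so any byte-level difference changes the digest even when the rendered pages look the same. In the sketch below, `save_a` and `save_b` are hypothetical, truncated stand-ins (not valid PDF files) for two saves of one document that differ only in a `/Producer` metadata entry.

```python
import hashlib

# Hypothetical, truncated stand-ins for two saves of the same document.
# Only the /Producer metadata differs; the content object is identical.
save_a = b"%PDF-1.7\n1 0 obj << /Producer (Tool A) >> endobj\n2 0 obj (Hello) endobj\n"
save_b = b"%PDF-1.7\n1 0 obj << /Producer (Tool B) >> endobj\n2 0 obj (Hello) endobj\n"

def sha256_hex(data: bytes) -> str:
    """Plain file hash: sensitive to every byte, including metadata."""
    return hashlib.sha256(data).hexdigest()

# Same visible content, different bytes, so the digests differ.
print(sha256_hex(save_a) == sha256_hex(save_b))  # False
```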

The Solution

A strict, deterministic normalization pipeline

PDFCanon applies a deterministic normalization pipeline before hashing. The same input produces the same normalized output, giving your application a stable value to compare and store.

Active content stripped: no JavaScript, no embedded files
Structure rebuilt: XRef table regenerated, objects reordered, incremental updates collapsed
Stable SHA-256 hash produced: deterministic, idempotent, and auditable
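The determinism claim can be illustrated with a toy model. This is not PDF parsing and not PDFCanon's pipeline: it assumes a document is a hypothetical dict of numbered objects plus a metadata dict, drops the metadata, orders objects by number, and hashes a fixed JSON serialization. Inputs that differ only in object order and metadata then hash identically, and normalizing an already-normalized document gives the same hash (idempotence).

```python
import hashlib
import json

def normalize(doc: dict) -> dict:
    """Toy canonical form: drop metadata and list objects in object-number order."""
    objects = doc.get("objects", {})
    return {"objects": {num: objects[num] for num in sorted(objects)}, "metadata": {}}

def canonical_hash(doc: dict) -> str:
    """SHA-256 of a deterministic JSON serialization of the normalized document."""
    # sort_keys and fixed separators pin down the exact bytes that get hashed.
    payload = json.dumps(normalize(doc), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Hypothetical inputs: same objects, different order and producer metadata.
doc_a = {"objects": {1: "<< /Type /Catalog >>", 2: "(Hello)"}, "metadata": {"Producer": "Tool A"}}
doc_b = {"objects": {2: "(Hello)", 1: "<< /Type /Catalog >>"}, "metadata": {"Producer": "Tool B"}}

print(canonical_hash(doc_a) == canonical_hash(doc_b))             # True
print(canonical_hash(normalize(doc_a)) == canonical_hash(doc_a))  # True: idempotent
```

A real normalizer has to make many more decisions (which metadata to keep, how to serialize streams), but the principle is the same: fix every free choice before hashing.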

Key Capabilities

The core pieces needed to make PDF integrity checks repeatable.

Active Content Removal

JavaScript, embedded files, rich media, and AcroForms are stripped or flattened during normalization.

Structural Canonicalization

Object ordering, XRef rebuilds, incremental update collapse, and metadata stripping reduce producer-specific differences.
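In a PDF incremental update, new and changed objects are appended to the end of the file together with a new cross-reference section and trailer, and an object in a later revision supersedes the earlier object with the same number. Collapsing the history means keeping only the final state of each object. The sketch below is a conceptual model under that assumption, not a parser and not how PDFCanon or qpdf implement it: each revision is a hypothetical dict from object number to content, with `None` marking an object freed in that revision.

```python
from typing import Optional

Revision = dict[int, Optional[str]]

def collapse(revisions: list[Revision]) -> dict[int, str]:
    """Apply revisions oldest-first; later entries win and None deletes."""
    state: dict[int, str] = {}
    for rev in revisions:
        for num, content in rev.items():
            if content is None:
                state.pop(num, None)  # object freed in this revision
            else:
                state[num] = content  # new or replacement object
    # Emit objects in number order so the result does not depend on save history.
    return {num: state[num] for num in sorted(state)}

original = {1: "<< /Type /Catalog >>", 2: "(Draft)", 3: "(Note)"}
update_1 = {2: "(Final)"}  # replaces object 2
update_2 = {3: None}       # frees object 3

print(collapse([original, update_1, update_2]))
# {1: '<< /Type /Catalog >>', 2: '(Final)'}

single_save = {1: "<< /Type /Catalog >>", 2: "(Final)"}
print(collapse([original, update_1, update_2]) == collapse([single_save]))  # True
```

The last comparison mirrors the problem described earlier: a document edited across several saves and one saved once in its final state collapse to the same object set.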

Stable SHA-256 Hashing

Returns both original and normalized SHA-256 values so your system can compare source files and canonical outputs separately.
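Keeping the two values separate lets an application tell a byte-for-byte duplicate apart from a file that differs only in structure or metadata. The sketch below assumes two normalization results carrying the `original_sha256` and `normalized_sha256` fields shown in the API example on this page; the `compare` function, its labels, and the short placeholder hash strings are hypothetical.

```python
def compare(stored: dict, incoming: dict) -> str:
    """Classify two normalization results by their hash fields."""
    if stored["original_sha256"] == incoming["original_sha256"]:
        return "identical file"               # same source bytes
    if stored["normalized_sha256"] == incoming["normalized_sha256"]:
        return "same canonical document"      # bytes differ, normalized output matches
    return "different normalized output"

# Placeholder values; real fields are full 64-character hex digests.
a = {"original_sha256": "aa11", "normalized_sha256": "cc33"}
b = {"original_sha256": "bb22", "normalized_sha256": "cc33"}
print(compare(a, b))  # same canonical document
```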

Compliance-Ready Reports

A machine-readable JSON report documents what was detected and changed during normalization.
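Because the report is machine-readable, it can be stored verbatim next to the hashes as audit evidence. This page does not list the report's fields, so the sketch below treats the report as an opaque JSON object; the `audit_record` and `verify_record` helpers and the `record_sha256` field are hypothetical. Hashing a canonical JSON serialization (sorted keys, fixed separators) of the record makes later edits to the stored evidence detectable.

```python
import hashlib
import json

def _digest(obj: dict) -> str:
    """SHA-256 of a canonical JSON serialization (sorted keys, no extra whitespace)."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def audit_record(response: dict) -> dict:
    """Copy the identifying fields and the report, then seal them with a digest."""
    record = {
        "id": response["id"],
        "original_sha256": response["original_sha256"],
        "normalized_sha256": response["normalized_sha256"],
        "report": response["report"],  # kept verbatim; no report schema is assumed
    }
    record["record_sha256"] = _digest(record)
    return record

def verify_record(record: dict) -> bool:
    """Recompute the digest over every field except record_sha256 itself."""
    body = {k: v for k, v in record.items() if k != "record_sha256"}
    return _digest(body) == record.get("record_sha256")

# Hypothetical response; the report contents are placeholders.
response = {"id": "nrm_example", "status": "success", "original_sha256": "aa11",
            "normalized_sha256": "cc33", "report": {"placeholder": True}}
record = audit_record(response)
print(verify_record(record))  # True
record["normalized_sha256"] = "ff00"
print(verify_record(record))  # False: the stored evidence was altered
```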

Multi-Tenant API

Organization isolation, API key management, webhooks, and billing for production SaaS use.
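This page does not describe how webhook deliveries are authenticated. A common pattern is an HMAC-SHA256 signature over the raw request body computed with a shared secret; the sketch below assumes that scheme, and the secret, the hex encoding, and the way the signature reaches the receiver are hypothetical rather than documented PDFCanon behavior.

```python
import hashlib
import hmac

def verify_webhook(raw_body: bytes, signature_hex: str, secret: bytes) -> bool:
    """Assumed scheme: hex-encoded HMAC-SHA256 of the raw body with a shared secret."""
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many characters matched through timing.
    return hmac.compare_digest(expected, signature_hex)

secret = b"hypothetical-shared-secret"
body = b'{"id":"nrm_example","status":"success"}'
sig = hmac.new(secret, body, hashlib.sha256).hexdigest()  # what a sender would compute

print(verify_webhook(body, sig, secret))         # True
print(verify_webhook(body + b" ", sig, secret))  # False: body changed in transit
```

Verifying against the raw bytes, before any JSON parsing, matters: re-serializing parsed JSON can change whitespace or key order and break the check.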

Usage-Based Pricing

Pay for the volume you process, with a free tier so teams can test the API before committing production volume.

Who Uses PDFCanon

PDFCanon is built for teams that accept, store, compare, or audit PDFs from outside their own systems.

SaaS platforms

Accepting PDF uploads from untrusted sources

E-sign companies

Verifying document integrity before and after signing

Fintech & HR SaaS

Onboarding pipelines with document deduplication

Legal tech

Tamper detection and chain-of-custody requirements

Government contractors

Document submission audit trail requirements

Compliance-driven teams

SOC 2, ISO 27001, and regulatory audit evidence
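For onboarding deduplication, the normalized hash can serve as the lookup key, so a re-saved copy of a document already on file can be caught whenever normalization maps both files to the same output. The in-memory `DedupIndex` class below is a hypothetical sketch; in production, a persistent store with a unique constraint on the same key would more likely play this role.

```python
from typing import Optional

class DedupIndex:
    """In-memory map from normalized_sha256 to the first upload that produced it."""

    def __init__(self) -> None:
        self._first_seen: dict[str, str] = {}

    def add(self, upload_id: str, normalized_sha256: str) -> Optional[str]:
        """Register an upload; return the earlier upload ID if it is a duplicate, else None."""
        earlier = self._first_seen.get(normalized_sha256)
        if earlier is None:
            self._first_seen[normalized_sha256] = upload_id
        return earlier

index = DedupIndex()
print(index.add("upload-1", "cc33"))  # None: first time this normalized output is seen
print(index.add("upload-2", "cc33"))  # upload-1: same normalized output as an earlier upload
```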

Technical Details

Implementation details for technical evaluators.

API example

POST /v1/normalize

POST https://api.pdfcanon.com/v1/normalize
Authorization: Bearer pk_live_••••••••
Content-Type: multipart/form-data

// Response 200 OK
{
  "id": "nrm_01j8x7k...",
  "status": "success",
  "original_sha256": "a3f4b7c2...",
  "normalized_sha256": "e8c1d290...",
  "report": { ... }
}
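The same call can be made from application code. The sketch below uses only Python's standard library to build the multipart request; the form field name `file`, the part's `application/pdf` content type, and the default filename are assumptions, since the example above does not show the request body. It only constructs the request, so it can be inspected before being sent with `urllib.request.urlopen`.

```python
import json
import urllib.request
import uuid

API_URL = "https://api.pdfcanon.com/v1/normalize"

def build_normalize_request(pdf_bytes: bytes, api_key: str,
                            filename: str = "document.pdf") -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST to /v1/normalize."""
    boundary = uuid.uuid4().hex
    # One form part holding the PDF; the field name "file" is an assumption.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=head + pdf_bytes + tail,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )

def parse_normalize_response(body: bytes) -> tuple[str, str]:
    """Return (original_sha256, normalized_sha256) from a successful JSON response."""
    data = json.loads(body)
    if data.get("status") != "success":
        raise ValueError(f"normalization did not succeed: {data.get('status')!r}")
    return data["original_sha256"], data["normalized_sha256"]

# Usage (sending is left commented out; the key is a hypothetical placeholder):
# req = build_normalize_request(open("invoice.pdf", "rb").read(), "pk_live_REPLACE_ME")
# with urllib.request.urlopen(req) as resp:
#     original, normalized = parse_normalize_response(resp.read())
```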

Technology stack

High-performance cloud backend

High-throughput normalization with minimal latency

qpdf toolchain

Widely used open-source tool for content-preserving structural transformations of PDF files

Schema-level tenant isolation

Each organization's data is fully partitioned

Cloud object storage

Scalable, redundant storage for normalized outputs

Evaluate PDFCanon with your own files

Start at pdfcanon.com, review the current free tier, and test the API against the PDF workflows your product already handles.