PDFCanon · Live — Available Now

Deterministic PDF Normalization & Canonical Hashing API

A production-ready API for turning inconsistent PDF files into stable, comparable outputs. PDFCanon removes risky active content, rebuilds structure, and returns hashes your system can trust.

Problem and Solution

The Problem

PDFs are structurally chaotic

Two visually identical PDFs can produce different SHA-256 hashes because their internal structure, metadata, and save history differ. That makes ordinary file hashing unreliable for deduplication, tamper checks, archival verification, and audit evidence.

  • Embedded JavaScript and active content hidden inside the file
  • Incremental update history alters the hash on every re-save
  • Hidden metadata and object ordering differences between producers
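
The effect is easy to reproduce with plain file hashing. The sketch below uses two hypothetical, simplified byte strings that stand in for two saves of the same page content; they are not real PDF files, and the only difference between them is a modification timestamp.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of raw bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-ins for two saves of the same visible content.
# Only the /ModDate value differs between them.
body = b"BT /F1 12 Tf (Invoice 1001) Tj ET"
save_a = b"%PDF-1.7\n" + body + b"\n/ModDate (D:20240101120000Z)\n%%EOF\n"
save_b = b"%PDF-1.7\n" + body + b"\n/ModDate (D:20240102093000Z)\n%%EOF\n"

hashes_differ = sha256_hex(save_a) != sha256_hex(save_b)
print(hashes_differ)  # True: a one-field metadata change yields a different hash
```

A raw-byte hash cannot tell a changed timestamp from changed content, which is why it fails as a deduplication or tamper-check key.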

The Solution

A strict, deterministic normalization pipeline

PDFCanon applies a deterministic normalization pipeline before hashing. The same input produces the same normalized output, giving your application a stable value to compare and store.

  • Active content stripped — no JavaScript, no embedded files
  • Structure rebuilt — XRef table, object ordering, incremental updates collapsed
  • Stable SHA-256 hash produced — idempotent, auditable, trustworthy
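
The idea of "normalize first, then hash" can be illustrated with a toy model. This is a minimal sketch, not PDFCanon's pipeline: the document is represented as a plain dictionary, and the set of volatile keys and the JSON serialization are assumptions chosen only for illustration.

```python
import hashlib
import json

# Hypothetical set of fields treated as producer noise in this toy model.
VOLATILE_KEYS = {"ModDate", "CreationDate", "Producer"}


def normalize(doc: dict) -> dict:
    """Drop volatile keys and return the remaining keys in sorted order."""
    return {key: doc[key] for key in sorted(doc) if key not in VOLATILE_KEYS}


def canonical_hash(doc: dict) -> str:
    """Hash a fixed serialization of the normalized document."""
    payload = json.dumps(normalize(doc), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Same content, different producer, timestamp, and key order.
doc_a = {"Contents": "Invoice 1001", "Pages": 1, "Producer": "Tool A", "ModDate": "2024-01-01"}
doc_b = {"ModDate": "2024-01-02", "Producer": "Tool B", "Pages": 1, "Contents": "Invoice 1001"}

hashes_match = canonical_hash(doc_a) == canonical_hash(doc_b)
is_idempotent = normalize(normalize(doc_a)) == normalize(doc_a)
print(hashes_match, is_idempotent)  # True True
```

Two properties matter here: inputs that differ only in noise map to the same hash, and normalizing an already normalized document changes nothing.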

Key Capabilities

The core pieces needed to make PDF integrity checks repeatable.

Active Content Removal

JavaScript, embedded files, rich media, and AcroForms are stripped or flattened during normalization.

Structural Canonicalization

Consistent object ordering, a rebuilt XRef table, collapsed incremental updates, and stripped metadata reduce producer-specific differences.

Stable SHA-256 Hashing

Returns both original and normalized SHA-256 values so your system can compare source files and canonical outputs separately.
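
One way to use the two values is to key deduplication on the normalized hash while keeping the original hash for reference. The sketch below assumes response dictionaries shaped like the API example on this page (`id`, `original_sha256`, `normalized_sha256`); the `DedupIndex` class and the hash strings are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DedupIndex:
    """In-memory index from normalized hash to the first document id seen."""

    seen: Dict[str, str] = field(default_factory=dict)

    def register(self, response: dict) -> Optional[str]:
        """Return the id of an earlier canonical duplicate, or None if new."""
        key = response["normalized_sha256"]
        earlier = self.seen.get(key)
        if earlier is None:
            self.seen[key] = response["id"]
        return earlier


index = DedupIndex()

# Placeholder values: different source bytes, same canonical output.
first = {"id": "nrm_a", "original_sha256": "aaa1", "normalized_sha256": "ccc9"}
second = {"id": "nrm_b", "original_sha256": "bbb2", "normalized_sha256": "ccc9"}

first_match = index.register(first)    # None: nothing seen yet
second_match = index.register(second)  # "nrm_a": canonical duplicate of the first
print(first_match, second_match)
```

A production system would back this with a database table and a unique constraint rather than a dictionary.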

Compliance-Ready Reports

A machine-readable JSON report documents what was detected and changed during normalization.

Multi-Tenant API

Organization isolation, API key management, webhooks, and billing, built for production SaaS usage.

Usage-Based Pricing

Billing scales with usage, and a free tier lets teams test the API before committing production volume.

Who Uses PDFCanon

PDFCanon is built for teams that accept, store, compare, or audit PDFs from outside their own systems.

  • SaaS platforms
    Accepting PDF uploads from untrusted sources
  • E-sign companies
    Verifying document integrity before and after signing
  • Fintech & HR SaaS
    Onboarding pipelines with document deduplication
  • Legal tech
    Tamper detection and chain-of-custody requirements
  • Government contractors
    Document submission audit trail requirements
  • Compliance-driven teams
    SOC 2, ISO 27001, and regulatory audit evidence

Technical Details

Implementation details for technical evaluators.

API example

POST https://api.pdfcanon.com/v1/normalize
Authorization: Bearer pk_live_••••••••
Content-Type: multipart/form-data

// Response 200 OK
{
  "id": "nrm_01j8x7k...",
  "status": "success",
  "original_sha256": "a3f4b7c2...",
  "normalized_sha256": "e8c1d290...",
  "report": { ... }
}
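
A client call for the request above might look like the following. This is a sketch using only the Python standard library; the multipart field name `file`, the default filename, and the 30-second timeout are assumptions, since the example does not specify them. The URL, headers, and response fields come from the example.

```python
import json
import urllib.request
import uuid

API_URL = "https://api.pdfcanon.com/v1/normalize"  # from the example above


def build_multipart(pdf_bytes: bytes, filename: str, field_name: str = "file"):
    """Encode one file part as multipart/form-data.

    The field name "file" is an assumption, not documented API behavior.
    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + pdf_bytes + tail, f"multipart/form-data; boundary={boundary}"


def normalize_pdf(pdf_bytes: bytes, api_key: str, filename: str = "document.pdf") -> dict:
    """POST a PDF to the normalize endpoint and return the parsed JSON response."""
    body, content_type = build_multipart(pdf_bytes, filename)
    request = urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
    )
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `normalize_pdf(open("contract.pdf", "rb").read(), api_key)` would then return a dictionary whose `original_sha256` and `normalized_sha256` keys can be stored side by side.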

Technology stack

  • High-performance cloud backend
    High-throughput normalization with minimal latency
  • qpdf toolchain
    Industry-standard PDF structural transformation
  • Schema-level tenant isolation
    Each organization's data is fully partitioned
  • Cloud object storage
    Scalable, redundant storage for normalized outputs
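
For evaluators who want a feel for what the qpdf layer contributes, the sketch below drives the qpdf command-line tool from Python. The flag selection is an assumption made for illustration and is not PDFCanon's actual configuration. It requires qpdf to be installed and on the PATH; the function names are hypothetical.

```python
import subprocess
from typing import List


def build_qpdf_command(src: str, dst: str) -> List[str]:
    """Build a qpdf invocation that rewrites src into dst.

    --deterministic-id derives the file /ID from the content instead of the
    current time; --object-streams=disable writes objects uncompressed into
    the body. qpdf rewrites the whole file, so prior incremental updates
    are not carried over as separate revisions.
    """
    return ["qpdf", "--deterministic-id", "--object-streams=disable", src, dst]


def rewrite_with_qpdf(src: str, dst: str) -> None:
    """Run qpdf and raise CalledProcessError on a non-zero exit status.

    Note: qpdf exits with status 3 when it emits warnings, which
    check=True also treats as a failure.
    """
    subprocess.run(build_qpdf_command(src, dst), check=True)
```

A structural rewrite like this does not remove metadata or active content on its own, so files from different producers would still hash differently afterwards; those steps belong to the rest of the pipeline described above.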

Evaluate PDFCanon with your own files

Start at pdfcanon.com, review the current free tier, and test the API against the PDF workflows your product already handles.