Why this OCR API is different

Many OCR APIs are thin wrappers around Tesseract. They return a flat string of text with no structure: no headings, no tables, no formatting. You then have to write post-processing code to reconstruct the document layout.

pdfToMarkdown uses a vision-language model that reads documents the way a human does. It understands that a column of numbers with a header row is a table, that a line in all-caps is a heading, and that a block of indented text is a code sample or list.

The output is structured markdown you can actually use.

What comes out of the API

Send any PDF in, get markdown out:

# Invoice #1042

**Vendor:** Acme Corp
**Date:** 2024-01-15

| Description       | Qty | Unit Price | Total  |
|-------------------|-----|------------|--------|
| API Pro Plan      |   1 |    $299.00 | $299.00|
| Setup fee         |   1 |     $49.00 |  $49.00|

**Subtotal:** $348.00
**Tax (8%):** $27.84
**Total due:** $375.84

Tables stay as tables. Headings stay as headings. Multi-column layouts are linearized into a single reading order.
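Because tables come back as ordinary pipe tables, they can be consumed with a few lines of code. The sketch below is an illustration only, not part of the API: `parse_markdown_table` and `money` are hypothetical helpers, and the parser assumes the text holds a single pipe table with no escaped `|` characters inside cells. It parses the invoice table shown above and re-checks the arithmetic.

```python
from decimal import Decimal

# The table from the sample invoice output above.
SAMPLE = """\
| Description       | Qty | Unit Price | Total  |
|-------------------|-----|------------|--------|
| API Pro Plan      |   1 |    $299.00 | $299.00|
| Setup fee         |   1 |     $49.00 |  $49.00|
"""


def parse_markdown_table(markdown: str) -> list[dict[str, str]]:
    """Parse a single pipe table into a list of {header: cell} dicts.

    Hypothetical helper: assumes one table and no escaped pipes in cells.
    """
    lines = [ln.strip() for ln in markdown.splitlines() if ln.strip().startswith("|")]
    if len(lines) < 2:
        return []

    def cells(line: str) -> list[str]:
        # Drop the outer pipes, split on the inner ones, trim padding.
        return [c.strip() for c in line.strip("|").split("|")]

    header = cells(lines[0])
    # lines[1] is the |---| separator row, so data starts at index 2.
    return [dict(zip(header, cells(line))) for line in lines[2:]]


def money(text: str) -> Decimal:
    """Convert a string such as '$299.00' to a Decimal."""
    return Decimal(text.replace("$", "").replace(",", ""))


rows = parse_markdown_table(SAMPLE)
subtotal = sum(money(r["Total"]) for r in rows)
tax = (subtotal * Decimal("0.08")).quantize(Decimal("0.01"))
total_due = subtotal + tax
print(subtotal, tax, total_due)
```

`Decimal` is used instead of `float` so that currency values compare exactly against the totals printed on the invoice.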

Supported document types

The API handles any PDF, but performs especially well on:

Invoices and receipts — line items, totals, vendor details extracted cleanly
Research and academic papers — equations, citations, multi-column layouts
Legal contracts — clause structure, defined terms, signature blocks
Financial reports — tables with merged cells, footnotes, appendices
Scanned documents — the vision model handles low-resolution and rotated scans

Integration in 60 seconds

No SDKs, no API wrappers, no config files needed:

# Free tier: public demo key, no signup, page 1 only
curl -X POST https://pdftomarkdown.dev/v1/convert \
  -H "Authorization: Bearer demo_public_key" \
  -H "Content-Type: application/json" \
  -d '{"input":{"pdf_url":"https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"}}'
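The same request can be built with nothing but the Python standard library. This is a minimal sketch that mirrors the curl call above: the URL, headers, and JSON body are copied from it, while `build_convert_request` and `send` are hypothetical helper names. The response format is not described here, so `send` returns the raw body as text rather than assuming any field names.

```python
import json
import urllib.request

API_URL = "https://pdftomarkdown.dev/v1/convert"
DEMO_KEY = "demo_public_key"  # public demo key from the curl example


def build_convert_request(pdf_url: str, api_key: str = DEMO_KEY) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"input": {"pdf_url": pdf_url}}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the raw response body as text.

    The response schema is an unknown here, so nothing is parsed.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


request = build_convert_request(
    "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"
)
# print(send(request))  # uncomment to perform the network call
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without touching the network.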

Or with the Python SDK:

from pdftomarkdown import convert

result = convert("document.pdf")
print(result.markdown)

Compared to other OCR APIs

Feature	pdfToMarkdown	Tesseract-based APIs	Cloud Vision APIs
Table detection	✓	✗	Partial
Markdown output	✓	✗	✗
Free tier, no signup	✓	Varies	✗
Multi-page PDFs	✓ (Developer)	✓	✓
Math/equation support	✓	✗	Partial


Pricing

Both tiers are free. No credit card required.

Hacker

Free, no signup

Public demo key — copy & paste
Only page 1 is processed
1 request/min per IP
Watermark in output
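With a limit of 1 request/min per IP, a client that converts several documents in a row may want to space its calls out. The sketch below is one possible client-side throttle, assuming a simple fixed 60-second gap is enough. `MinIntervalThrottle` is a hypothetical class, and how the server actually enforces or reports the limit is not covered here.

```python
import time


class MinIntervalThrottle:
    """Block so that consecutive calls are at least `min_interval_s` apart.

    Hypothetical client-side helper; clock and sleep are injectable for testing.
    """

    def __init__(self, min_interval_s: float = 60.0, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous call was released

    def wait(self) -> float:
        """Sleep if needed before the next request; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._min_interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay


# Usage: call throttle.wait() immediately before each API request.
throttle = MinIntervalThrottle(min_interval_s=60.0)
```

The first call returns immediately; later calls sleep only for whatever part of the 60 seconds has not already elapsed.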


Developer

Free, GitHub login

Personal API key
100 pages/month
Multi-page PDFs
No watermark


Get your free API key

The Hacker tier needs no account. It converts page 1 only and adds a watermark. Sign in with GitHub to move to the Developer tier, which removes the watermark and unlocks full multi-page PDFs.

