OCR API for Developers

One endpoint. POST a PDF, get clean markdown back — tables, headings, and lists preserved exactly as they appear on the page.

$ curl -X POST https://pdftomarkdown.dev/v1/convert \
  -H "Authorization: Bearer demo_public_key" \
  -H "Content-Type: application/json" \
  -d '{"input":{"pdf_url":"https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"}}'
{
  "markdown": "# Document\n\n| Column A | Column B |\n|---|---|\n| Value 1 | Value 2 |\n\n**Summary text here**",
  "pages": 3,
  "request_id": "req_abc123"
}
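
To make the response shape concrete, here is a minimal Python sketch that decodes a body like the one above. The `ConvertResult` class and `parse_convert_response` function are hypothetical names used for illustration, not part of any official client. The sketch assumes the three fields shown in the sample (`markdown`, `pages`, `request_id`) are always present.

```python
import json
from dataclasses import dataclass


@dataclass
class ConvertResult:
    """Hypothetical container for the fields shown in the sample response."""
    markdown: str
    pages: int
    request_id: str


def parse_convert_response(body: str) -> ConvertResult:
    """Decode a JSON response body into a ConvertResult.

    Raises KeyError if one of the three sample fields is missing.
    """
    data = json.loads(body)
    return ConvertResult(
        markdown=data["markdown"],
        pages=int(data["pages"]),
        request_id=data["request_id"],
    )


# A shortened copy of the sample response, used as test input.
sample = '{"markdown": "# Document\\n\\n**Summary text here**", "pages": 3, "request_id": "req_abc123"}'
result = parse_convert_response(sample)
print(result.request_id, result.pages)
```

Keeping `request_id` alongside the markdown is a reasonable habit if you expect to report a failed conversion later.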

Why this OCR API is different

Many OCR APIs are wrappers around Tesseract or a similar text-only engine. They give you a flat string of text with no structure: no headings, no tables, no formatting. You then have to write post-processing code to reconstruct the document layout.

pdfToMarkdown uses a vision-language model that reads documents the way a human does. It understands that a column of numbers with a header row is a table, that a line in all-caps is a heading, and that a block of indented text is a code sample or list.

The output is structured markdown you can actually use.
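
As one illustration of what structured output makes possible, the sketch below builds a heading outline from returned markdown using only the standard library. The `outline` helper is a hypothetical name, and the sketch assumes the API emits ATX-style headings (lines starting with `#`), as the samples on this page do.

```python
import re

# ATX headings: one to six '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*$")
FENCE = "`" * 3  # marker that opens or closes a fenced code block


def outline(markdown: str) -> list[tuple[int, str]]:
    """Return (level, title) for every ATX heading outside fenced code."""
    found = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue  # '#' inside code is a comment, not a heading
        match = HEADING.match(line)
        if match:
            found.append((len(match.group(1)), match.group(2)))
    return found


doc = "# Invoice #1042\n\n**Vendor:** Acme Corp\n\n## Line items\n"
print(outline(doc))
```

The `## Line items` heading in the test string is invented for the example. With a flat OCR string there would be nothing reliable to match on, which is the point of the comparison above.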

What comes out of the API

Send any PDF in, get markdown out:

# Invoice #1042

**Vendor:** Acme Corp
**Date:** 2024-01-15

| Description       | Qty | Unit Price | Total  |
|-------------------|-----|------------|--------|
| API Pro Plan      |   1 |    $299.00 | $299.00|
| Setup fee         |   1 |     $49.00 |  $49.00|

**Subtotal:** $348.00
**Tax (8%):** $27.84
**Total due:** $375.84

Tables stay as tables. Headings stay as headings. Multi-column layouts are linearized intelligently.
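
Because tables arrive as pipe tables, they can be turned back into records with a few lines of code. The following is a rough sketch, not an official helper: `parse_table` is a hypothetical function that reads the first pipe table it finds and assumes no cell contains an escaped `|`.

```python
from decimal import Decimal


def parse_table(markdown: str) -> list[dict[str, str]]:
    """Parse the first pipe table in `markdown` into one dict per body row."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            if rows:
                break  # the first table has ended
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        rows.append(cells)
    if len(rows) < 2:
        return []
    header, body = rows[0], rows[2:]  # rows[1] is the |---|---| separator
    return [dict(zip(header, row)) for row in body]


invoice = """\
| Description       | Qty | Unit Price | Total  |
|-------------------|-----|------------|--------|
| API Pro Plan      |   1 |    $299.00 | $299.00|
| Setup fee         |   1 |     $49.00 |  $49.00|
"""
items = parse_table(invoice)
subtotal = sum(Decimal(item["Total"].lstrip("$")) for item in items)
print(subtotal)  # 348.00, matching the Subtotal line in the sample
```

`Decimal` is used instead of `float` so that currency sums stay exact.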

Supported document types

The API handles any PDF, but performs especially well on:

  • Invoices and receipts — line items, totals, vendor details extracted cleanly
  • Research and academic papers — equations, citations, multi-column layouts
  • Legal contracts — clause structure, defined terms, signature blocks
  • Financial reports — tables with merged cells, footnotes, appendices
  • Scanned documents — the vision model handles low-resolution and rotated scans

Integration in 60 seconds

No SDKs, no API wrappers, no config files needed:

# Free tier: public demo key, no signup, page 1 only
curl -X POST https://pdftomarkdown.dev/v1/convert \
  -H "Authorization: Bearer demo_public_key" \
  -H "Content-Type: application/json" \
  -d '{"input":{"pdf_url":"https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"}}'

Or with the Python SDK:

from pdftomarkdown import convert

result = convert("document.pdf")
print(result.markdown)
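
If you would rather not install anything, the curl call can also be reproduced with Python's standard library. This is a sketch under stated assumptions: the endpoint, headers, and JSON body are copied from the curl example, while the `build_convert_request` and `convert_url` names, the 60-second timeout, and the absence of error handling are illustrative choices.

```python
import json
import urllib.request

API_URL = "https://pdftomarkdown.dev/v1/convert"


def build_convert_request(pdf_url: str, api_key: str = "demo_public_key") -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    payload = json.dumps({"input": {"pdf_url": pdf_url}}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def convert_url(pdf_url: str, api_key: str = "demo_public_key", timeout: float = 60.0) -> dict:
    """Send the request and return the decoded JSON body (performs network I/O)."""
    request = build_convert_request(pdf_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `convert_url(...)` with a PDF URL should return a dictionary whose `"markdown"` key holds the converted text, assuming the response matches the sample shown at the top of this page. Splitting request construction from sending keeps the first function easy to test offline.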

Compared to other OCR APIs

Feature | pdfToMarkdown | Tesseract-based APIs | Cloud Vision APIs
Table detection: Partial
Markdown output
Free tier, no signup: Varies
Multi-page PDFs: ✓ (Developer)
Math/equation support: Partial

Pricing

Both tiers are free. No credit card required.

Hacker

Free, no signup

  • Public demo key — copy & paste
  • Only page 1 is processed
  • 1 request/min per IP
  • Watermark in output
View docs →
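
With a limit of 1 request/min per IP, a batch script on the Hacker tier needs to pace itself. Here is a minimal client-side sketch; `MinIntervalThrottle` is a hypothetical helper, and the 60-second default simply mirrors the limit stated above. How the server responds to requests over the limit is not documented here, so the sketch only spaces calls out and does not handle rejections.

```python
import time


class MinIntervalThrottle:
    """Ensure at least `interval` seconds separate consecutive calls."""

    def __init__(self, interval: float = 60.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous call, None before the first

    def wait(self) -> float:
        """Sleep if needed, record the call time, and return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay
```

Call `throttle.wait()` immediately before each conversion request. The first call returns at once, and later calls block only for whatever remains of the minute.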

Developer

Free, GitHub login

  • Personal API key
  • 100 pages/month
  • Multi-page PDFs
  • No watermark
Get API key →

Get your free API key

The free Hacker tier needs no account. It converts page 1 only and adds a watermark. Sign in with GitHub for the Developer tier to remove the watermark and unlock full multi-page PDFs.