OCR API for Developers
One endpoint. POST a PDF, get clean markdown back — tables, headings, and lists preserved exactly as they appear on the page.
$ curl -X POST https://pdftomarkdown.dev/v1/convert \
  -H "Authorization: Bearer demo_public_key" \
  -H "Content-Type: application/json" \
  -d '{"input":{"pdf_url":"https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"}}'

{
  "markdown": "# Document\n\n| Column A | Column B |\n|---|---|\n| Value 1 | Value 2 |\n\n**Summary text here**",
  "pages": 3,
  "request_id": "req_abc123"
}

Why this OCR API is different
Many OCR APIs are thin wrappers around Tesseract. They return a flat string of text with no headings, tables, or formatting, so you have to write post-processing code to reconstruct the document layout yourself.
pdfToMarkdown uses a vision-language model that reads documents the way a human does. It understands that a column of numbers with a header row is a table, that a line in all-caps is a heading, and that a block of indented text is a code sample or list.
The output is structured markdown you can actually use.
What comes out of the API
Send any PDF in, get markdown out:
# Invoice #1042
**Vendor:** Acme Corp
**Date:** 2024-01-15
| Description  | Qty | Unit Price | Total   |
|--------------|-----|------------|---------|
| API Pro Plan | 1   | $299.00    | $299.00 |
| Setup fee    | 1   | $49.00     | $49.00  |
**Subtotal:** $348.00
**Tax (8%):** $27.84
**Total due:** $375.84
Tables stay as tables. Headings stay as headings. Multi-column layouts are linearized intelligently.
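Because tables come back as ordinary pipe tables, they can be read downstream with a few lines of code. The following is a minimal sketch, not part of the API: `parse_pipe_table` is a hypothetical helper, and it assumes a simple table like the invoice above (one header row, one separator row, no escaped `|` characters inside cells).

```python
from typing import Dict, List


def parse_pipe_table(markdown: str) -> List[Dict[str, str]]:
    """Return the first pipe table found in `markdown` as a list of row dicts."""
    rows: List[List[str]] = []
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("|"):
            # Drop the outer pipes, then split the remaining text into cells.
            rows.append([cell.strip() for cell in stripped.strip("|").split("|")])
        elif rows:
            break  # the first table has ended
    if len(rows) < 2:
        return []
    header, body = rows[0], rows[2:]  # rows[1] is the |---|---| separator row
    return [dict(zip(header, cells)) for cells in body]


# Sample text shaped like the invoice output shown above.
sample = """# Invoice #1042

| Description | Qty | Unit Price | Total |
|---|---|---|---|
| API Pro Plan | 1 | $299.00 | $299.00 |
| Setup fee | 1 | $49.00 | $49.00 |

**Subtotal:** $348.00
"""

items = parse_pipe_table(sample)
subtotal = sum(float(item["Total"].lstrip("$")) for item in items)
print(items[0]["Description"], subtotal)
```

For tables with merged cells or escaped pipes, a full markdown parser is likely a better fit than this line-based approach.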
Supported document types
The API handles any PDF, but performs especially well on:
- Invoices and receipts — line items, totals, vendor details extracted cleanly
- Research and academic papers — equations, citations, multi-column layouts
- Legal contracts — clause structure, defined terms, signature blocks
- Financial reports — tables with merged cells, footnotes, appendices
- Scanned documents — the vision model handles low-resolution and rotated scans
Integration in 60 seconds
No SDK or config file is required. A single HTTP request is enough:
# Free tier: public demo key, no signup, page 1 only
curl -X POST https://pdftomarkdown.dev/v1/convert \
  -H "Authorization: Bearer demo_public_key" \
  -H "Content-Type: application/json" \
  -d '{"input":{"pdf_url":"https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"}}'
Or with the Python SDK:
from pdftomarkdown import convert
result = convert("document.pdf")
print(result.markdown)
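If you would rather not install anything, the curl call above can be mirrored with the Python standard library. This is a sketch under assumptions: `build_request` and `convert_url` are hypothetical helper names, the 60-second timeout is arbitrary, and the response is assumed to be the JSON object shown earlier (`markdown`, `pages`, `request_id`). Error responses are not documented on this page, so they are left to `urllib` to raise.

```python
import json
import urllib.request

API_URL = "https://pdftomarkdown.dev/v1/convert"
DEMO_KEY = "demo_public_key"  # public demo key from the curl example


def build_request(pdf_url: str, api_key: str = DEMO_KEY) -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    payload = json.dumps({"input": {"pdf_url": pdf_url}}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def convert_url(pdf_url: str, api_key: str = DEMO_KEY, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON body (performs a network call)."""
    request = build_request(pdf_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `convert_url("https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf")["markdown"]` should then yield the converted text, assuming the response shape above holds.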
Compared to other OCR APIs
| Feature | pdfToMarkdown | Tesseract-based APIs | Cloud Vision APIs |
|---|---|---|---|
| Table detection | ✓ | ✗ | Partial |
| Markdown output | ✓ | ✗ | ✗ |
| Free tier, no signup | ✓ | Varies | ✗ |
| Multi-page PDFs | ✓ (Developer tier) | ✓ | ✓ |
| Math/equation support | ✓ | ✗ | Partial |
Related pages
- Invoice OCR API — vertical guide for accounts payable workflows
- PDF Parsing API — parsing structured data from PDFs
- API documentation — full endpoint reference with examples
Pricing
Both tiers are free. No credit card required.
Hacker
Free, no signup
- Public demo key — copy & paste
- Only page 1 is processed
- 1 request/min per IP
- Watermark in output
Developer
Free, GitHub login
- Personal API key
- 100 pages/month
- Multi-page PDFs
- No watermark
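On the Hacker tier, the limit of 1 request/min per IP means a batch script needs to pace itself. Here is a minimal client-side sketch: `MinIntervalThrottle` is a hypothetical helper, not part of any SDK, and it only spaces out calls within a single process. How the API responds when the limit is exceeded is not stated on this page, so that case is not handled.

```python
import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Blocks so that consecutive calls are at least `interval` seconds apart."""

    def __init__(
        self,
        interval: float = 60.0,  # 1 request/min on the Hacker tier
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time at which the last call was released

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay
```

Create one `MinIntervalThrottle()` per script and call `wait()` immediately before each request. The clock and sleep functions are injectable so the pacing can be tested without actually waiting.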
Get your free API key
The Hacker tier needs no account, but it converts page 1 only and adds a watermark. Upgrade to the Developer tier to remove the watermark and unlock multi-page PDFs.