If you’re building a RAG pipeline or LLM application that ingests PDFs, you’ve likely encountered LlamaParse — the PDF parsing service from the LlamaIndex team. It’s a capable tool, but it comes with trade-offs that make it the wrong choice for many projects.

This comparison covers what LlamaParse and pdfToMarkdown each do, where each shines, and how to choose between them.

What is LlamaParse?

LlamaParse is a cloud-based document parsing service built by the LlamaIndex team (the framework’s Python package is llama_index). It’s designed primarily for RAG (retrieval-augmented generation) workflows: taking PDFs and converting them into formats suitable for vector embedding and retrieval.

It launched in 2024 as a paid add-on to the LlamaIndex ecosystem, with a free tier of 1,000 pages/day.

What is pdfToMarkdown?

pdfToMarkdown is a standalone API that converts PDFs to clean markdown. It’s not tied to any particular framework. You send a PDF, you get back markdown. Use it with LlamaIndex, LangChain, raw OpenAI calls, or anything else.

The core difference: ecosystem lock-in

LlamaParse is designed to work inside the LlamaIndex framework. The native Python interface is:

from llama_parse import LlamaParse

parser = LlamaParse(result_type="markdown")
documents = parser.load_data("document.pdf")
# returns LlamaIndex Document objects

This is convenient if you’re already using LlamaIndex. But if you’re not — or if you want to switch frameworks later — you’re dealing with an extra dependency and a specific object model.
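
If you do need to move LlamaParse output into a non-LlamaIndex pipeline, the unwrapping step is small. The following is a minimal sketch, assuming each returned object exposes text and metadata attributes the way LlamaIndex Document objects do. StubDocument and to_plain_markdown are hypothetical names, used here so the example runs without the SDK installed:

```python
from dataclasses import dataclass, field


@dataclass
class StubDocument:
    """Stand-in for a LlamaIndex Document (hypothetical, for illustration only)."""
    text: str
    metadata: dict = field(default_factory=dict)


def to_plain_markdown(documents, separator="\n\n"):
    """Join the text of Document-like objects into one markdown string.

    Objects whose text is empty or whitespace-only are skipped.
    """
    parts = [doc.text.strip() for doc in documents]
    return separator.join(part for part in parts if part)


docs = [
    StubDocument("# Title\n\nIntro."),
    StubDocument("   "),  # e.g. a blank page
    StubDocument("## Section\n\nBody."),
]
markdown = to_plain_markdown(docs)
print(markdown)
```

The point of the sketch is that the coupling lives in the object model, not in the text itself: once the strings are extracted, the rest of the pipeline no longer needs to know which parser produced them.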

pdfToMarkdown returns plain text. It works with any stack:

from pdftomarkdown import convert

result = convert("document.pdf")
markdown = result.markdown  # just a string

# Use it with anything (the splitter lives in the langchain-text-splitters package)
from langchain_text_splitters import MarkdownTextSplitter
chunks = MarkdownTextSplitter().split_text(markdown)

Pricing comparison

	LlamaParse	pdfToMarkdown
Free tier	1,000 pages/day	100 pages/month
Authentication	API key (account required)	Public key (no account) or GitHub login
Paid plans	Credits-based	TBD
Credit card required	Yes for paid	No

LlamaParse’s free tier is far more generous in raw page count, but it requires creating an account and is structured around LlamaCloud credits. pdfToMarkdown’s free tier is much smaller (100 pages/month with a GitHub login), but the public demo key works immediately, with no signup and no credit card.
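
To put the two free tiers on the same scale, a quick back-of-the-envelope calculation helps. The sketch below uses only the limits from the table above. The 30-day month and the 2,500-page workload are assumptions chosen for illustration:

```python
import math

LLAMAPARSE_PAGES_PER_DAY = 1_000      # free tier, from the pricing table
PDFTOMARKDOWN_PAGES_PER_MONTH = 100   # free tier with GitHub login
DAYS_PER_MONTH = 30                   # assumption for a rough comparison


def periods_needed(total_pages, pages_per_period):
    """Number of quota periods (days or months) needed to process total_pages."""
    return math.ceil(total_pages / pages_per_period)


workload = 2_500  # hypothetical backlog of pages

llamaparse_monthly_cap = LLAMAPARSE_PAGES_PER_DAY * DAYS_PER_MONTH
days_on_llamaparse = periods_needed(workload, LLAMAPARSE_PAGES_PER_DAY)
months_on_pdftomarkdown = periods_needed(workload, PDFTOMARKDOWN_PAGES_PER_MONTH)

print(f"LlamaParse free tier: up to {llamaparse_monthly_cap:,} pages per 30-day month")
print(f"{workload:,} pages: {days_on_llamaparse} days vs {months_on_pdftomarkdown} months")
```

Under these assumptions the free-tier gap is roughly 300x in monthly volume, so the pdfToMarkdown free tier is best read as a way to evaluate the output, not as a way to process a backlog.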

Output quality

Both tools use modern vision-language models to understand document layout, so both handle tables, columns, and mixed content better than traditional OCR.

LlamaParse has a feature called “instruction following” — you can give natural-language instructions alongside your document:

parser = LlamaParse(
    result_type="markdown",
    parsing_instruction="Extract only the financial tables, ignore boilerplate text."
)

This is genuinely powerful for specific extraction tasks where you know exactly what you want.

pdfToMarkdown takes the opposite approach: return a clean, complete markdown representation of the document and let you process it however you want. For most use cases — especially when you don’t know the document structure in advance — this is more flexible.
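
As an illustration of processing the complete output yourself, here is a minimal sketch that pulls only the tables out of a markdown string, roughly the local counterpart of the instruction in the LlamaParse example. It assumes pipe-delimited markdown tables with a separator row. extract_tables is a hypothetical helper, not part of either tool:

```python
def _is_separator(row):
    """True for a header separator row such as |------|:---:|."""
    return "-" in row and set(row) <= set("|-: ")


def extract_tables(markdown):
    """Return each markdown table in the text as one newline-joined string."""
    blocks, current = [], []
    for line in markdown.splitlines():
        stripped = line.strip()
        if len(stripped) > 1 and stripped.startswith("|") and stripped.endswith("|"):
            current.append(stripped)
        elif current:
            blocks.append(current)
            current = []
    if current:  # flush a table that ends the document
        blocks.append(current)
    # A real table has a header row followed by a separator row.
    return ["\n".join(rows) for rows in blocks
            if len(rows) >= 2 and _is_separator(rows[1])]


sample = """# Q3 Report

Some boilerplate text.

| Item | Amount |
|------|--------|
| Revenue | 120 |
| Costs | 80 |

Closing remarks.
"""

tables = extract_tables(sample)
print(tables[0])
```

The trade-off is that filtering happens after conversion, in code you control, instead of inside the parser via a prompt.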

API design

LlamaParse is built around its Python SDK. A REST API exists, but parsing runs as an asynchronous job (upload, poll, fetch the result), so there’s no single-request curl call for quick testing:

# LlamaParse: requires pip install llama-parse
# nest_asyncio is only needed where an event loop is already running (e.g. Jupyter)
import nest_asyncio
nest_asyncio.apply()

from llama_parse import LlamaParse
parser = LlamaParse(api_key="llx-...", result_type="markdown")
docs = parser.load_data("file.pdf")

pdfToMarkdown works from a single HTTP request:

curl -X POST https://pdftomarkdown.dev/v1/convert \
  -H "Authorization: Bearer demo_public_key" \
  -H "Content-Type: application/json" \
  -d '{"input":{"pdf_url":"https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"}}'

For prototyping, debugging, or use in non-Python environments (Node.js, Go, Ruby, shell scripts), the HTTP API is much more accessible.
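
The same request translates directly to other environments. Below is a minimal Python sketch using only the standard library, mirroring the curl call above. The endpoint, headers, and body shape are copied from that command. The response format is not described in this article, so the sketch returns the decoded JSON unchanged instead of assuming field names. build_convert_request and convert_pdf_url are hypothetical helper names:

```python
import json
import urllib.request

API_URL = "https://pdftomarkdown.dev/v1/convert"  # taken from the curl example


def build_convert_request(pdf_url, api_key="demo_public_key"):
    """Build the POST request without sending it."""
    body = json.dumps({"input": {"pdf_url": pdf_url}}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def convert_pdf_url(pdf_url, api_key="demo_public_key", timeout=60):
    """Send the request and return the parsed JSON response as-is."""
    request = build_convert_request(pdf_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Inspect the request locally; no network call is made here.
request = build_convert_request(
    "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"
)
print(request.get_method(), request.full_url)
```

Calling convert_pdf_url with the same URL would perform the actual conversion. Keeping request construction separate from sending makes the payload easy to check before spending any quota.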

When to use LlamaParse

You’re already using LlamaIndex and want seamless integration
You need instruction-following extraction (specific fields, filtered content)
You’re processing large volumes and the free tier limits fit your usage
You want LlamaCloud’s other features (vector stores, managed indexes)

When to use pdfToMarkdown

You want framework-agnostic markdown output
You’re testing a new project and don’t want to commit to an ecosystem
You need to integrate with LangChain, raw OpenAI, or a custom stack
You want to test the API instantly without account creation
You prefer a simpler pricing model without ecosystem coupling

Framework agnosticism in practice

Here’s how pdfToMarkdown output plugs into LangChain. Because the result is already a plain string, there is no LlamaIndex Document to unwrap first:

from pdftomarkdown import convert
from langchain_text_splitters import MarkdownHeaderTextSplitter

# Convert the PDF to a markdown string
result = convert("report.pdf", api_key="your-key")

# Split by markdown headers for better chunking
splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=[("#", "h1"), ("##", "h2"), ("###", "h3")]
)

# split_text already returns standard LangChain Document objects,
# with the enclosing headers stored in each chunk's metadata
documents = splitter.split_text(result.markdown)
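
To make concrete what header-based splitting does with markdown output, here is a dependency-free sketch of the same idea. It is a simplified stand-in, not LangChain's implementation: split_by_headers is a hypothetical helper that handles only #, ## and ### headings and ignores edge cases such as headings inside code fences:

```python
import re

HEADING = re.compile(r"^(#{1,3})\s+(.*)$")


def split_by_headers(markdown):
    """Split markdown into chunks, tagging each with its enclosing headings."""
    chunks = []
    path = {}    # heading level -> current title, e.g. {1: "Report", 2: "Costs"}
    buffer = []  # body lines collected since the last heading

    def flush():
        text = "\n".join(buffer).strip()
        if text:
            metadata = {f"h{level}": title for level, title in sorted(path.items())}
            chunks.append({"content": text, "metadata": metadata})
        buffer.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # A new heading replaces its own level and discards deeper ones.
            path = {lvl: title for lvl, title in path.items() if lvl < level}
            path[level] = match.group(2).strip()
        else:
            buffer.append(line)
    flush()
    return chunks


sample = "# Report\nIntro text.\n## Costs\nCosts rose.\n## Revenue\nRevenue fell.\n"
for chunk in split_by_headers(sample):
    print(chunk["metadata"], "->", chunk["content"])
```

Because each chunk carries its heading path as metadata, a retriever can show where in the document a passage came from, which is the main reason to split on headers instead of fixed character counts.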

And with raw OpenAI:

from pdftomarkdown import convert
from openai import OpenAI

client = OpenAI()
result = convert("contract.pdf", api_key="your-key")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": f"Summarize this contract:\n\n{result.markdown}"}
    ]
)

No special adapters, no framework types — just text.

Bottom line

	LlamaParse	pdfToMarkdown
Best for	LlamaIndex pipelines	Any stack, framework-agnostic use
Free tier	1,000 pages/day (account required)	Demo key (no signup) + 100 pages/month with GitHub login
Instruction following	Yes	No
API simplicity	SDK-first	HTTP-first
Output	LlamaIndex Documents	Plain markdown string
Ecosystem coupling	High (LlamaCloud)	None

If you’re building inside the LlamaIndex ecosystem, LlamaParse is the obvious choice. If you want a clean HTTP API that returns standard markdown and works with any framework, pdfToMarkdown is one curl command away.