Search "Claude OCR" and you'll find developers excited about using Anthropic's vision models to extract text from documents. And they're right — Claude Sonnet's visual understanding is impressive. It can read complex tables, understand layouts, and extract text that traditional OCR misses.
But there's a significant gap between "Claude can read this PDF" and "I have a production document processing system."
Claude OCR means calling Anthropic's API with images and hoping for good results. Graphlit is document infrastructure that uses Claude (among other extractors) as one component of a complete semantic platform.
This comparison explains what you get with raw Claude API calls, what you're missing, and why document infrastructure matters.
Table of Contents
- TL;DR — Quick Comparison
- What "Claude OCR" Actually Means
- The DIY Claude Pipeline
- What's Missing from Raw API Calls
- What Graphlit Provides
- Cost Comparison
- When to Use Claude Directly
- When You Need Infrastructure
TL;DR — Quick Comparison

| | Raw Claude API | Graphlit |
|---|---|---|
| Extraction quality | Excellent (Claude vision) | Excellent (Claude plus other backends) |
| Multi-page PDFs | DIY chunking and reassembly | Automatic |
| Rate limits and retries | DIY | Built in |
| Embeddings, search, RAG | DIY | Included |
| Time to production | Months of engineering | Hours of integration |
What "Claude OCR" Actually Means
When people say "Claude OCR," they mean using Claude's vision capabilities to extract text from images or PDFs. Claude Sonnet 3.5/4 can:
- Read text from images with high accuracy
- Understand complex table structures
- Interpret charts and diagrams
- Handle handwritten content (to a degree)
- Preserve document layout in output
The vision quality is genuinely excellent. For complex documents with tables and mixed layouts, Claude often outperforms traditional OCR and even specialized document AI services.
But Claude is an LLM, not document infrastructure.
The DIY Claude Pipeline
To build "Claude OCR" into a working system, you need to:
1. PDF Processing
# Convert PDF pages to images
# Handle different PDF types (native, scanned, mixed)
# Manage image resolution and quality
# Deal with corrupted or malformed PDFs
2. Token Management
# Claude has context limits
# Large documents exceed single-call limits
# Split documents into chunks
# Reassemble results coherently
3. API Integration
# Handle rate limits (429 errors)
# Implement exponential backoff
# Manage concurrent requests
# Queue large batches
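The retry logic in steps 3 and 4 can be sketched with nothing but the standard library. This is a minimal sketch; `RateLimitError` here is a stand-in for `anthropic.RateLimitError`, and a real pipeline would also cap total wait time:

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for anthropic.RateLimitError (HTTP 429)."""

def with_backoff(call, max_retries=5):
    """Retry `call` on rate limits with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            # Wait 1s, 2s, 4s, ... plus jitter so concurrent workers
            # don't all retry in lockstep
            time.sleep(2 ** attempt + random.random())
    raise RuntimeError(f"still rate limited after {max_retries} retries")
```

In production you would wrap every Claude call in something like this, and layer a queue on top so a burst of documents doesn't trigger 429s in the first place.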
4. Error Handling
# Retry failed requests
# Handle partial failures
# Log errors for debugging
# Implement fallback strategies
5. Output Normalization
# Claude's output varies
# Parse and normalize responses
# Handle unexpected formats
# Convert to consistent structure
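One common case: Claude, like most LLMs, will sometimes wrap its answer in a Markdown code fence even when you asked for bare Markdown. A minimal normalization pass that strips such a wrapper might look like this (a sketch, not an exhaustive normalizer):

```python
def normalize_page(text: str) -> str:
    """Strip a wrapping code fence and surrounding whitespace from one page of output."""
    text = text.strip()
    if text.startswith("```"):
        # Drop an opening fence like ``` or ```markdown, and the closing fence if present
        lines = text.splitlines()
        if lines[-1].strip() == "```":
            lines = lines[1:-1]
        else:
            lines = lines[1:]
        text = "\n".join(lines)
    return text.strip()
```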
6. Everything Else
# Store extracted text
# Generate embeddings
# Build search index
# Create entity extraction
# Implement RAG conversations
# ...
This is months of engineering work to handle edge cases, scale reliably, and maintain over time.
What's Missing from Raw API Calls
No Standardized Output
Claude returns what it returns. Different prompts, different documents, different results. You need to build parsing and normalization.
No Multi-Page Handling
PDFs have hundreds of pages. Claude has context limits. You need to split, process, and reassemble — maintaining document coherence across chunks.
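One way to sketch the split step is to pack pages greedily under a token budget before sending each batch to Claude. The 4-characters-per-token estimate below is a rough heuristic (an assumption, not Anthropic's actual tokenizer), and `budget` would be tuned to the model's context window:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token
    return max(1, len(text) // 4)

def pack_pages(pages, budget=150_000):
    """Greedily pack page texts into batches that stay under a token budget."""
    batches, current, used = [], [], 0
    for page in pages:
        cost = estimate_tokens(page)
        if current and used + cost > budget:
            batches.append(current)
            current, used = [], 0
        current.append(page)
        used += cost
    if current:
        batches.append(current)
    return batches
```

Each batch becomes one API call; reassembly then has to join the per-batch results without losing page order or duplicating content at the seams.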
No Rate Limit Management
Hit Anthropic's rate limits and your pipeline stops. You need queuing, backoff, and retry logic.
No Error Recovery
API calls fail. Networks time out. Claude occasionally hallucinates. Production systems need graceful degradation.
No Downstream Processing
Extraction is step one. What about:
- Embedding for vector search?
- Entity extraction (people, companies, dates)?
- Knowledge graph construction?
- Search indexing?
- RAG conversation assembly?
No Cost Optimization
Claude vision tokens are expensive. Without proper chunking and caching, costs explode at scale.
No Observability
When something goes wrong (and it will), how do you debug? Production systems need logging, metrics, and tracing.
What Graphlit Provides
Graphlit uses Claude as one of several extraction backends — but wraps it in production infrastructure:
Automatic PDF Processing
Upload a PDF. Get Markdown. We handle:
- Page extraction and image conversion
- Resolution optimization
- Multi-page document assembly
- Corrupted file handling
Token-Optimized Processing
We chunk documents intelligently:
- Respect Claude's context limits
- Maintain document coherence
- Optimize for extraction quality
- Minimize token costs
Production Reliability
Built-in infrastructure:
- Rate limit management with queuing
- Automatic retries with backoff
- Error logging and recovery
- Processing status tracking
Consistent Output
Every document produces:
- Clean Markdown with structure
- Preserved tables and formatting
- Consistent quality regardless of input
Everything After Extraction
Automatic downstream processing:
- Vector embeddings for semantic search
- Entity extraction (Schema.org types)
- Knowledge graph construction
- Hybrid search indexing
- RAG-ready conversations
Multiple Backend Options
Claude is one choice. You can also use:
- Azure AI Document Intelligence — Fast, reliable default
- Reducto — Specialized for structured documents
- Deepseek — Cost-effective for high volume
Cost Comparison
DIY Claude OCR Costs
API costs (Claude Sonnet):
- Input: ~$3 per million tokens
- Output: ~$15 per million tokens
- A 10-page PDF with images: ~50K tokens = ~$0.15-0.50
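The per-document figure follows from simple arithmetic (the output token count below is an assumption; actual output depends on how much text the pages contain):

```python
# Back-of-envelope cost for one 10-page PDF at Claude Sonnet list prices
input_tokens = 50_000   # page images plus prompt, per the estimate above
output_tokens = 10_000  # extracted Markdown (assumption)

input_cost = input_tokens / 1_000_000 * 3.00     # $3 per million input tokens
output_cost = output_tokens / 1_000_000 * 15.00  # $15 per million output tokens
total = input_cost + output_cost
print(f"${total:.2f} per document")  # $0.30, mid-range of the $0.15-0.50 estimate
```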
Infrastructure costs:
- PDF processing compute
- Vector database ($70-200/month)
- Embedding API ($50-100/month)
- Queue/worker infrastructure
Engineering costs:
- Initial build: 2-4 months
- Ongoing maintenance: Significant
- Edge case debugging: Continuous
Graphlit Costs
Credits include extraction (Claude or other backends) plus all infrastructure — embeddings, entities, search, conversations.
Real Cost Comparison
Processing 500 documents/month:
DIY Claude:
- API: ~$75-250/month (varies by document size)
- Vector DB: $70/month
- Embedding: $30/month
- Engineering: ??? (your team's time)
- Total: $175+ plus significant engineering time
Graphlit:
- Pro plan: $149/month
- Engineering: Hours of integration
- Total: $149/month, everything included
When to Use Claude Directly
Use raw Claude API calls when:
- One-off extraction: Single documents, manual review
- Experimentation: Learning how vision models work
- Custom prompts: Specific extraction needs with unique prompting
- Existing infrastructure: You've already built the pipeline
- Cost optimization at scale: Very high volume with custom optimization
If you're extracting a few documents manually and don't need search, embeddings, or conversations, direct API calls work fine.
When You Need Infrastructure
Use Graphlit when:
- Production workloads: Reliability and consistency matter
- Multiple documents: More than occasional one-offs
- Search and retrieval: You need to find information later
- Knowledge graphs: Entity extraction and relationships
- RAG applications: Conversational AI over documents
- Team collaboration: Multiple users accessing shared knowledge
- Time constraints: You can't spend months building infrastructure
The gap between "Claude can read this" and "production document system" is infrastructure. Graphlit provides that infrastructure.
Integration Example
DIY Claude OCR
```python
import base64
import io

import anthropic
from pdf2image import convert_from_path

client = anthropic.Anthropic()

# Convert PDF pages to images
images = convert_from_path("document.pdf")

results = []
for i, image in enumerate(images):
    # Encode each page as a base64 PNG
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    image_data = base64.b64encode(buffer.getvalue()).decode()

    # Call Claude (handle rate limits yourself)
    try:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": image_data}},
                    {"type": "text", "text": "Extract all text from this document page as Markdown."}
                ]
            }]
        )
        results.append(response.content[0].text)
    except anthropic.RateLimitError:
        # Handle rate limiting (implement backoff)
        pass
    except Exception:
        # Handle other errors
        pass

# Combine results (handle page boundaries)
full_text = "\n\n".join(results)

# Now you need:
# - Embedding pipeline
# - Vector database
# - Entity extraction
# - Search indexing
# - RAG implementation
# - Error handling
# - ...
```
Graphlit with Claude Backend
```typescript
import { Graphlit, Types } from 'graphlit-client';

const client = new Graphlit();

// Create workflow with Claude extraction
const workflow = await client.createWorkflow({
  name: "Claude LLM Extraction",
  preparation: {
    jobs: [{
      connector: {
        type: Types.FilePreparationServiceTypes.ModelDocument,
        modelDocument: {
          specification: { id: claudeSpecificationId }
        }
      }
    }]
  }
});

// Ingest document — everything automatic
const result = await client.ingestUri(
  "https://example.com/complex-document.pdf",
  "Financial Report",
  undefined,
  undefined,
  true,
  { id: workflow.createWorkflow?.id }
);

// Document is now:
// - Extracted by Claude with vision
// - Multi-page handled automatically
// - Embedded for vector search
// - Entities extracted
// - Knowledge graph updated
// - Search indexed

// Query immediately
const contents = await client.queryContents({
  search: "quarterly revenue"
});

// RAG conversation ready
const response = await client.promptConversation(
  "Summarize the key financial metrics",
  conversationId,
  { id: specificationId }
);
```
Summary
Claude's vision capabilities are excellent. For document understanding and text extraction, Claude Sonnet is among the best available.
But "Claude OCR" isn't a product — it's an API call. Building production document processing requires:
- PDF handling and page management
- Token optimization and chunking
- Rate limiting and error recovery
- Output normalization
- Embedding and search infrastructure
- Entity extraction and knowledge graphs
- RAG conversation assembly
Graphlit provides the infrastructure that turns Claude's capabilities into a production system. You get Claude's extraction quality plus everything else you need to build AI applications.
Don't build document infrastructure from scratch. Use Claude through Graphlit and focus on your application.
Explore Graphlit Features:
- Document Processing — Extraction backend options
- Building Knowledge Graphs — Automatic entity extraction
- Complete Guide to Search — Hybrid semantic search
- PDF Extraction Comparison
Claude can read documents. Graphlit turns that into infrastructure.