Search "Claude OCR" and you'll find developers excited about using Anthropic's vision models to extract text from documents. And they're right — Claude Sonnet's visual understanding is impressive. It can read complex tables, understand layouts, and extract text that traditional OCR misses.
But there's a significant gap between "Claude can read this PDF" and "I have a production document processing system."
Claude OCR means calling Anthropic's API with images and hoping for good results. Graphlit is document infrastructure that uses Claude (among other extractors) as one component of a complete semantic platform.
This comparison explains what you get with raw Claude API calls, what you're missing, and why document infrastructure matters.
Table of Contents
- TL;DR — Quick Comparison
- What "Claude OCR" Actually Means
- The DIY Claude Pipeline
- What's Missing from Raw API Calls
- What Graphlit Provides
- Cost Comparison
- When to Use Claude Directly
- When You Need Infrastructure
TL;DR — Quick Comparison

| | Raw Claude API | Graphlit |
|---|---|---|
| Extraction quality | Excellent (Claude vision) | Excellent (Claude plus other backends) |
| Multi-page PDFs | DIY chunking and reassembly | Automatic |
| Rate limits and retries | DIY | Built in |
| Embeddings, search, RAG | DIY | Included |
| Time to production | Months of engineering | Hours of integration |
What "Claude OCR" Actually Means
When people say "Claude OCR," they mean using Claude's vision capabilities to extract text from images or PDFs. Claude Sonnet 3.5/4 can:
- Read text from images with high accuracy
- Understand complex table structures
- Interpret charts and diagrams
- Handle handwritten content (to a degree)
- Preserve document layout in output
The vision quality is genuinely excellent. For complex documents with tables and mixed layouts, Claude often outperforms traditional OCR and even specialized document AI services.
But Claude is an LLM, not document infrastructure.
The DIY Claude Pipeline
To build "Claude OCR" into a working system, you need to:
1. PDF Processing
# Convert PDF pages to images
# Handle different PDF types (native, scanned, mixed)
# Manage image resolution and quality
# Deal with corrupted or malformed PDFs
2. Token Management
# Claude has context limits
# Large documents exceed single-call limits
# Split documents into chunks
# Reassemble results coherently
3. API Integration
# Handle rate limits (429 errors)
# Implement exponential backoff
# Manage concurrent requests
# Queue large batches
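The retry logic in steps 3 and 4 can be sketched with nothing but the standard library. This is a minimal sketch; `RateLimitError` here is a stand-in for `anthropic.RateLimitError`, and a real pipeline would also cap total wait time:

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for anthropic.RateLimitError (HTTP 429)."""

def with_backoff(call, max_retries=5):
    """Retry `call` on rate limits with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            # Wait 1s, 2s, 4s, ... plus jitter so concurrent workers
            # don't all retry in lockstep
            time.sleep(2 ** attempt + random.random())
    raise RuntimeError(f"still rate limited after {max_retries} retries")
```

In production you would wrap every Claude call in something like this, and layer a queue on top so a burst of documents doesn't trigger 429s in the first place.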
4. Error Handling
# Retry failed requests
# Handle partial failures
# Log errors for debugging
# Implement fallback strategies
5. Output Normalization
# Claude's output varies
# Parse and normalize responses
# Handle unexpected formats
# Convert to consistent structure
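One common case: Claude, like most LLMs, will sometimes wrap its answer in a Markdown code fence even when you asked for bare Markdown. A minimal normalization pass that strips such a wrapper might look like this (a sketch, not an exhaustive normalizer):

```python
def normalize_page(text: str) -> str:
    """Strip a wrapping code fence and surrounding whitespace from one page of output."""
    text = text.strip()
    if text.startswith("```"):
        # Drop an opening fence like ``` or ```markdown, and the closing fence if present
        lines = text.splitlines()
        if lines[-1].strip() == "```":
            lines = lines[1:-1]
        else:
            lines = lines[1:]
        text = "\n".join(lines)
    return text.strip()
```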
6. Everything Else
# Store extracted text
# Generate embeddings
# Build search index
# Create entity extraction
# Implement RAG conversations
# ...
This is months of engineering work to handle edge cases, scale reliably, and maintain over time.
What's Missing from Raw API Calls
No Standardized Output
Claude returns what it returns. Different prompts, different documents, different results. You need to build parsing and normalization.
No Multi-Page Handling
PDFs have hundreds of pages. Claude has context limits. You need to split, process, and reassemble — maintaining document coherence across chunks.
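One way to sketch the split step is to pack pages greedily under a token budget before sending each batch to Claude. The 4-characters-per-token estimate below is a rough heuristic (an assumption, not Anthropic's actual tokenizer), and `budget` would be tuned to the model's context window:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token
    return max(1, len(text) // 4)

def pack_pages(pages, budget=150_000):
    """Greedily pack page texts into batches that stay under a token budget."""
    batches, current, used = [], [], 0
    for page in pages:
        cost = estimate_tokens(page)
        if current and used + cost > budget:
            batches.append(current)
            current, used = [], 0
        current.append(page)
        used += cost
    if current:
        batches.append(current)
    return batches
```

Each batch becomes one API call; reassembly then has to join the per-batch results without losing page order or duplicating content at the seams.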
No Rate Limit Management
Hit Anthropic's rate limits and your pipeline stops. You need queuing, backoff, and retry logic.
No Error Recovery
API calls fail. Networks time out. Claude occasionally hallucinates. Production systems need graceful degradation.
No Downstream Processing
Extraction is step one. What about:
- Embedding for vector search?
- Entity extraction (people, companies, dates)?
- Knowledge graph construction?
- Search indexing?
- RAG conversation assembly?
No Cost Optimization
Claude vision tokens are expensive. Without proper chunking and caching, costs explode at scale.
No Observability
When something goes wrong (and it will), how do you debug? Production systems need logging, metrics, and tracing.
What Graphlit Provides
Graphlit uses Claude as one of several extraction backends — but wraps it in production infrastructure:
Automatic PDF Processing
Upload a PDF. Get Markdown. We handle:
- Page extraction and image conversion
- Resolution optimization
- Multi-page document assembly
- Corrupted file handling
Token-Optimized Processing
We chunk documents intelligently:
- Respect Claude's context limits
- Maintain document coherence
- Optimize for extraction quality
- Minimize token costs
Production Reliability
Built-in infrastructure:
- Rate limit management with queuing
- Automatic retries with backoff
- Error logging and recovery
- Processing status tracking
Consistent Output
Every document produces:
- Clean Markdown with structure
- Preserved tables and formatting
- Consistent quality regardless of input
Everything After Extraction
Automatic downstream processing:
- Vector embeddings for semantic search
- Entity extraction (Schema.org types)
- Knowledge graph construction
- Hybrid search indexing
- RAG-ready conversations
Multiple Backend Options
Claude is one choice. You can also use:
- Azure AI Document Intelligence — Fast, reliable default
- Reducto — Specialized for structured documents
- Deepseek — Cost-effective for high volume
Cost Comparison
DIY Claude OCR Costs
API costs (Claude Sonnet):
- Input: ~$3 per million tokens
- Output: ~$15 per million tokens
- A 10-page PDF with images: ~50K tokens = ~$0.15-0.50
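The per-document figure follows from simple arithmetic (the output token count below is an assumption; actual output depends on how much text the pages contain):

```python
# Back-of-envelope cost for one 10-page PDF at Claude Sonnet list prices
input_tokens = 50_000   # page images plus prompt, per the estimate above
output_tokens = 10_000  # extracted Markdown (assumption)

input_cost = input_tokens / 1_000_000 * 3.00     # $3 per million input tokens
output_cost = output_tokens / 1_000_000 * 15.00  # $15 per million output tokens
total = input_cost + output_cost
print(f"${total:.2f} per document")  # $0.30, mid-range of the $0.15-0.50 estimate
```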
Infrastructure costs:
- PDF processing compute
- Vector database ($70-200/month)
- Embedding API ($50-100/month)
- Queue/worker infrastructure
Engineering costs:
- Initial build: 2-4 months
- Ongoing maintenance: Significant
- Edge case debugging: Continuous
Graphlit Costs
Credits include extraction (Claude or other backends) plus all infrastructure — embeddings, entities, search, conversations.
Real Cost Comparison
Processing 500 documents/month:
DIY Claude:
- API: ~$75-250/month (varies by document size)
- Vector DB: $70/month
- Embedding: $30/month
- Engineering: ??? (your team's time)
- Total: $175+ plus significant engineering time
Graphlit:
- Pro plan: $149/month
- Engineering: Hours of integration
- Total: $149/month, everything included
When to Use Claude Directly
Use raw Claude API calls when:
- One-off extraction: Single documents, manual review
- Experimentation: Learning how vision models work
- Custom prompts: Specific extraction needs with unique prompting
- Existing infrastructure: You've already built the pipeline
- Cost optimization at scale: Very high volume with custom optimization
If you're extracting a few documents manually and don't need search, embeddings, or conversations, direct API calls work fine.
When You Need Infrastructure
Use Graphlit when:
- Production workloads: Reliability and consistency matter
- Multiple documents: More than occasional one-offs
- Search and retrieval: You need to find information later
- Knowledge graphs: Entity extraction and relationships
- RAG applications: Conversational AI over documents
- Team collaboration: Multiple users accessing shared knowledge
- Time constraints: You can't spend months building infrastructure
The gap between "Claude can read this" and "production document system" is infrastructure. Graphlit provides that infrastructure.
Integration Example
DIY Claude OCR
```python
import base64
import io

import anthropic
from pdf2image import convert_from_path

client = anthropic.Anthropic()

# Convert PDF pages to images
images = convert_from_path("document.pdf")

results = []
for i, image in enumerate(images):
    # Encode each page as a base64 PNG
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    image_data = base64.b64encode(buffer.getvalue()).decode()

    # Call Claude (handle rate limits yourself)
    try:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": image_data}},
                    {"type": "text", "text": "Extract all text from this document page as Markdown."}
                ]
            }]
        )
        results.append(response.content[0].text)
    except anthropic.RateLimitError:
        # Handle rate limiting (implement backoff)
        pass
    except Exception:
        # Handle other errors
        pass

# Combine results (handle page boundaries)
full_text = "\n\n".join(results)

# Now you need:
# - Embedding pipeline
# - Vector database
# - Entity extraction
# - Search indexing
# - RAG implementation
# - Error handling
# - ...
```
Graphlit with Claude Backend
```typescript
import { Graphlit, Types } from 'graphlit-client';

const client = new Graphlit();

// Create workflow with Claude extraction
const workflow = await client.createWorkflow({
  name: "Claude LLM Extraction",
  preparation: {
    jobs: [{
      connector: {
        type: Types.FilePreparationServiceTypes.ModelDocument,
        modelDocument: {
          specification: { id: claudeSpecificationId }
        }
      }
    }]
  }
});

// Ingest document — everything automatic
const result = await client.ingestUri(
  "https://example.com/complex-document.pdf",
  "Financial Report",
  undefined,
  undefined,
  true,
  { id: workflow.createWorkflow?.id }
);

// Document is now:
// - Extracted by Claude with vision
// - Multi-page handled automatically
// - Embedded for vector search
// - Entities extracted
// - Knowledge graph updated
// - Search indexed

// Query immediately
const contents = await client.queryContents({
  search: "quarterly revenue"
});

// RAG conversation ready
const response = await client.promptConversation(
  "Summarize the key financial metrics",
  conversationId,
  { id: specificationId }
);
```
Summary
Claude's vision capabilities are excellent. For document understanding and text extraction, Claude Sonnet is among the best available.
But "Claude OCR" isn't a product — it's an API call. Building production document processing requires:
- PDF handling and page management
- Token optimization and chunking
- Rate limiting and error recovery
- Output normalization
- Embedding and search infrastructure
- Entity extraction and knowledge graphs
- RAG conversation assembly
Graphlit provides the infrastructure that turns Claude's capabilities into a production system. You get Claude's extraction quality plus everything else you need to build AI applications.
Don't build document infrastructure from scratch. Use Claude through Graphlit and focus on your application.
Explore Graphlit Features:
- Document Processing — Extraction backend options
- Building Knowledge Graphs — Automatic entity extraction
- Complete Guide to Search — Hybrid semantic search
- PDF Extraction Comparison
Claude can read documents. Graphlit turns that into infrastructure.