Reducto + Graphlit: World-Class Extraction Meets Semantic Infrastructure

Here's the good news: you don't have to choose between Reducto and Graphlit. Reducto is one of the extraction backends that Graphlit integrates, so you can get Reducto's document parsing and Graphlit's semantic infrastructure in a single platform.

Reducto specializes in document parsing — turning PDFs, spreadsheets, and presentations into structured, chunked content optimized for LLM applications. It's particularly strong at table extraction, form parsing, and intelligent chunking.

Graphlit is the semantic infrastructure layer that handles everything after extraction — embedding, entity extraction, knowledge graphs, hybrid search, and conversational AI. And when you use Graphlit, you can choose Reducto as your extraction backend.

This page explains what each platform does, when to use Reducto directly, and when the combination through Graphlit gives you more power.

TL;DR — Quick Comparison
What Reducto Does Best
What Graphlit Adds
Using Reducto Through Graphlit
When to Use Reducto Directly
When to Use Graphlit with Reducto
Pricing Comparison
Integration Example

TL;DR — Quick Comparison

Capability	Reducto	Graphlit (with Reducto)
Primary Focus	Document parsing and extraction	End-to-end semantic infrastructure
PDF Extraction	Excellent — state-of-the-art table and form extraction	Uses Reducto (or other backends) for extraction
Table Extraction	Best-in-class, layout-aware chunking	Inherits Reducto's capabilities
Form Extraction	Structured JSON extraction with custom schemas	Inherits Reducto's capabilities
Output Format	Markdown, JSON, structured chunks	Markdown with automatic downstream processing
Vector Embeddings	Not included	Automatic on ingestion
Entity Extraction	Not included	Automatic Schema.org entities
Knowledge Graphs	Not included	Per-user knowledge graphs
Semantic Search	Not included	Hybrid vector + keyword + graph search
RAG Conversations	Not included	Built-in streaming conversations
Data Connectors	Upload/URL only	30+ connectors (Slack, GitHub, email, feeds)
Pricing	$0.015/credit (~$0.015/page)	Usage-based credits (includes extraction + infrastructure)

What Reducto Does Best

Reducto is a document parsing powerhouse. If you need to extract structured data from complex documents, Reducto delivers:

State-of-the-Art Table Extraction

Reducto's table extraction is among the best available. It handles:

Complex merged cells
Multi-page tables
Nested table structures
Tables without clear borders

Intelligent Chunking

Unlike basic text splitters, Reducto's chunking is layout-aware:

Respects document structure (headings, sections)
Keeps tables intact
Preserves semantic coherence
Optimized for RAG retrieval
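
To make the idea concrete, here is a minimal sketch of layout-aware chunking. It is not Reducto's algorithm: the Block type, the chunkByLayout function, and the maxChars limit are hypothetical names chosen for illustration. The sketch opens a new chunk at every heading and never splits a table across chunks.

```typescript
// Hypothetical block model: a parsed document as a flat list of typed blocks.
type Block =
  | { kind: "heading"; text: string }
  | { kind: "paragraph"; text: string }
  | { kind: "table"; text: string };

// Group blocks into chunks. A heading always opens a new chunk, and a block
// (including a table) is never split, even if it exceeds maxChars on its own.
function chunkByLayout(blocks: Block[], maxChars: number): string[] {
  const chunks: string[] = [];
  let current: string[] = [];
  let size = 0;

  // Close the chunk being built, if it has any content.
  const flush = (): void => {
    if (current.length > 0) {
      chunks.push(current.join("\n"));
      current = [];
      size = 0;
    }
  };

  for (const block of blocks) {
    const startsSection = block.kind === "heading";
    const wouldOverflow = size + block.text.length > maxChars;
    if (startsSection || (wouldOverflow && current.length > 0)) {
      flush();
    }
    current.push(block.text);
    size += block.text.length;
  }
  flush();
  return chunks;
}

// Made-up sample document for illustration.
const sampleChunks = chunkByLayout(
  [
    { kind: "heading", text: "Revenue" },
    { kind: "paragraph", text: "Revenue grew in Q3." },
    { kind: "table", text: "Quarter | Revenue\nQ2 | 10\nQ3 | 12" },
    { kind: "heading", text: "Outlook" },
    { kind: "paragraph", text: "Guidance is unchanged." },
  ],
  40
);
```

With a 40-character limit, the table does not fit alongside the first section's text, so it moves whole into its own chunk instead of being cut in the middle. A basic fixed-size text splitter would have broken it at the limit.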

Structured Extraction

The /extract endpoint lets you define custom schemas and pull specific fields:

{
  "invoice_number": "INV-2024-001",
  "vendor_name": "Acme Corp",
  "line_items": [
    { "description": "Widget A", "quantity": 10, "price": 99.99 }
  ],
  "total_amount": 999.90
}
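
Once structured fields like these come back, it is usually worth validating them before they feed an automation workflow. The sketch below types the example payload and checks that the line items add up to the stated total. The Invoice interface and the lineItemsMatchTotal helper are illustrative names, not part of Reducto's API, and price is assumed to be a per-unit price.

```typescript
interface LineItem {
  description: string;
  quantity: number;
  price: number; // assumed to be the unit price
}

interface Invoice {
  invoice_number: string;
  vendor_name: string;
  line_items: LineItem[];
  total_amount: number;
}

// Convert a currency amount to integer cents to avoid floating-point drift.
function toCents(amount: number): number {
  return Math.round(amount * 100);
}

// True when quantity × unit price, summed over all line items, equals the total.
function lineItemsMatchTotal(invoice: Invoice): boolean {
  const sumCents = invoice.line_items.reduce(
    (sum, item) => sum + item.quantity * toCents(item.price),
    0
  );
  return sumCents === toCents(invoice.total_amount);
}

// The example payload from above.
const invoice: Invoice = {
  invoice_number: "INV-2024-001",
  vendor_name: "Acme Corp",
  line_items: [{ description: "Widget A", quantity: 10, price: 99.99 }],
  total_amount: 999.9,
};

const invoiceIsConsistent = lineItemsMatchTotal(invoice);
```

For the example above, 10 × 99.99 equals 999.90, so the check passes. Comparing in integer cents keeps rounding noise from producing false mismatches.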

Format Support

Reducto handles 30+ file types:

PDFs (native and scanned)
Office documents (DOCX, XLSX, PPTX)
Images (PNG, JPEG, TIFF)
And more

What Graphlit Adds

Graphlit takes Reducto's excellent extraction and builds the complete AI infrastructure around it:

Automatic Embedding

Every document is embedded for vector search immediately after extraction — no separate embedding pipeline needed.

Entity Extraction

People, organizations, places, events, and products are automatically identified and linked:

Document: "Q3 Report.pdf"
  → Person: Alice Chen (CFO)
  → Organization: Acme Inc.
  → Event: Earnings Call (Oct 15, 2024)

Knowledge Graphs

Entities aren't just extracted — they're connected. Alice's mentions across all documents are linked, creating a navigable knowledge graph.
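
As a simplified picture of what linking mentions across documents means, the sketch below indexes mentions by entity type and normalized name. The Mention type, the linkMentions function, and the second document name are hypothetical. Real entity resolution also has to handle aliases and disambiguation, which this deliberately skips.

```typescript
interface Mention {
  document: string;
  entityType: "Person" | "Organization" | "Event";
  name: string;
}

// Map each entity (type + normalized name) to the set of documents mentioning it.
function linkMentions(mentions: Mention[]): Map<string, Set<string>> {
  const graph = new Map<string, Set<string>>();
  for (const mention of mentions) {
    const key = `${mention.entityType}:${mention.name.trim().toLowerCase()}`;
    const documents = graph.get(key) ?? new Set<string>();
    documents.add(mention.document);
    graph.set(key, documents);
  }
  return graph;
}

// Sample data; "Board Minutes.pdf" is made up for illustration.
const entityGraph = linkMentions([
  { document: "Q3 Report.pdf", entityType: "Person", name: "Alice Chen" },
  { document: "Board Minutes.pdf", entityType: "Person", name: "alice chen " },
  { document: "Q3 Report.pdf", entityType: "Organization", name: "Acme Inc." },
]);
```

Both spellings of Alice's name resolve to one node, so a lookup on that node returns every document that mentions her.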

Hybrid Search

Search across your entire knowledge base with:

Vector semantic search
Keyword/BM25 search
Graph-aware context expansion
Entity and metadata filters
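
One common way to merge a vector ranking with a keyword ranking is reciprocal rank fusion (RRF). This page does not say how Graphlit combines results internally, so treat the sketch below as a generic illustration of hybrid ranking, not a description of Graphlit's implementation. The function name, the document IDs, and the default k = 60 are assumptions.

```typescript
// Reciprocal rank fusion: each ranking contributes 1 / (k + rank) per item,
// and items are ordered by their summed score across all rankings.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Hypothetical result lists from a vector index and a keyword index.
const vectorHits = ["doc-b", "doc-a", "doc-c"];
const keywordHits = ["doc-a", "doc-d", "doc-b"];
const fused = reciprocalRankFusion([vectorHits, keywordHits]);
```

Here doc-a ends up first because it ranks near the top of both lists, even though it tops only one of them. Documents that appear in a single list fall to the bottom.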

RAG Conversations

Built-in conversational AI with streaming responses, conversation branching, and automatic source citations.

30+ Data Connectors

Beyond document upload:

Communication: Slack, Discord, Teams, Email
Development: GitHub, Linear, Jira
Cloud Storage: Google Drive, Dropbox, SharePoint
Media: RSS feeds, podcasts, YouTube

Using Reducto Through Graphlit

When you configure a Graphlit workflow with Reducto as the extraction backend, you get:

Reducto's extraction quality — tables, forms, layouts all handled correctly
Automatic embedding — no separate pipeline
Entity extraction — people, orgs, places identified
Knowledge graph — relationships connected
Search index — immediately queryable
RAG ready — conversational AI available

import { Graphlit, Types } from 'graphlit-client';

const client = new Graphlit();

// Create a workflow that uses Reducto for extraction
const workflow = await client.createWorkflow({
    name: "Reducto Extraction Workflow",
    preparation: {
        jobs: [{
            connector: {
                type: Types.FilePreparationServiceTypes.Reducto,
                reducto: {
                    // Reducto-specific configuration
                }
            }
        }]
    }
});

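// Make sure the workflow was created before referencing it
const workflowId = workflow.createWorkflow?.id;
if (!workflowId) {
    throw new Error("Workflow creation did not return an id");
}

// Ingest using Reducto extraction + full Graphlit processing
const result = await client.ingestUri(
    "https://example.com/complex-report.pdf",
    "Q3 Financial Report",  // display name for the content
    undefined,              // optional argument, left unset
    undefined,              // optional argument, left unset
    true,                   // synchronous: wait for processing to finish
    { id: workflowId }      // reference to the Reducto workflow
);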

// Document is now:
// - Extracted by Reducto (tables, forms, layout preserved)
// - Embedded for vector search
// - Entity-enriched
// - Knowledge graph connected
// - Searchable and RAG-ready

When to Use Reducto Directly

Use Reducto's API directly when:

You only need extraction: Your pipeline handles everything else
You have existing infrastructure: Vector DB, embedding pipeline, search — all built
You need the /extract endpoint: Structured JSON extraction with custom schemas for automation workflows
You're building document automation: Invoice processing, form extraction, data entry automation

Reducto excels as a focused extraction tool. If extraction is your only need, use it directly.

When to Use Graphlit with Reducto

Use Graphlit (with Reducto as the backend) when:

You need the full stack: Extraction through conversation, all managed
You want automatic entity extraction: People, organizations, events identified
You need knowledge graphs: Relationships and connections across documents
You have multiple data sources: Not just PDFs — Slack, GitHub, email, feeds
You're building AI applications: Semantic search, RAG chatbots, knowledge assistants
You want zero infrastructure: No databases to manage, no pipelines to maintain

The combination gives you Reducto's extraction quality plus Graphlit's semantic infrastructure.

Pricing Comparison

Reducto Pricing

Plan	Cost	Includes
Standard	$0.015/credit	Parse, Extract, Edit, Split APIs
Growth	Custom	Volume discounts, zero retention
Enterprise	Custom	VPC, custom SLAs, dedicated support

Credits roughly correspond to pages, though complex documents may use more.

Graphlit Pricing

Plan	Cost	Includes
Free	100 credits	Full platform (extraction + infrastructure)
Starter	$49/month	1,000 credits
Pro	$149/month	5,000 credits
Enterprise	Custom	Volume discounts, SLA

Graphlit credits include extraction (via Reducto or other backends) plus all downstream processing — embedding, entity extraction, search indexing, and conversations.

Cost Comparison

For a RAG application processing about 1,000 pages/month (roughly 1,000 Reducto credits, assuming one credit per page):

Reducto alone:

Extraction: ~$15/month (1,000 credits × $0.015)
Plus: Vector database, embedding API, entity extraction, search infrastructure, engineering time

Graphlit with Reducto:

Starter plan: $49/month (includes everything)
No additional infrastructure costs
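
The arithmetic behind this comparison can be sketched as follows, using the prices from the tables above and treating the monthly volume as 1,000 single-credit pages. The one-credit-per-page ratio is an assumption (complex documents may use more), and the function names are hypothetical.

```typescript
const REDUCTO_USD_PER_CREDIT = 0.015; // Standard plan rate from the Reducto pricing table
const GRAPHLIT_STARTER_USD = 49; // Starter plan monthly price from the Graphlit pricing table

// Assumption: one Reducto credit per page unless creditsPerPage says otherwise.
function reductoExtractionCost(pages: number, creditsPerPage = 1): number {
  const cost = pages * creditsPerPage * REDUCTO_USD_PER_CREDIT;
  return Math.round(cost * 100) / 100; // round to whole cents
}

// Monthly amount left for a self-built stack (vector database, embeddings,
// search) before Reducto alone costs as much as the Graphlit Starter plan.
function selfBuiltBudget(pages: number, creditsPerPage = 1): number {
  const remaining = GRAPHLIT_STARTER_USD - reductoExtractionCost(pages, creditsPerPage);
  return Math.round(remaining * 100) / 100;
}

const extractionCost = reductoExtractionCost(1000); // 15
const remainingBudget = selfBuiltBudget(1000); // 34
```

At this volume, extraction alone costs $15, leaving $34/month for everything else before the two options cost the same. Engineering time is not included in that figure.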

Integration Example

Reducto Direct: Extraction Only

import reducto

client = reducto.Reducto(api_key="...")

# Parse document
result = client.parse.run(
    document_url="https://example.com/report.pdf",
    options={"chunking_method": "semantic"}
)

# Now you need to:
# 1. Send chunks to embedding API
# 2. Store in vector database
# 3. Build entity extraction pipeline
# 4. Create search infrastructure
# 5. Implement RAG conversations

Graphlit with Reducto: Complete Infrastructure

import { Graphlit, Types } from 'graphlit-client';

const client = new Graphlit();

// Ingest with Reducto extraction + automatic processing
const result = await client.ingestUri(
    "https://example.com/report.pdf",
    "Financial Report",
    undefined,
    undefined,
    true,
    { id: reductoWorkflowId }  // Workflow configured for Reducto
);

// Everything is automatic:
// - Reducto extracts with excellent table handling
// - Content is embedded
// - Entities are extracted
// - Knowledge graph is updated
// - Search index is ready

// Immediately queryable
const contents = await client.queryContents({
    types: [Types.ContentTypes.File],
    search: "quarterly revenue"
});

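// RAG conversation ready
// (conversationId and specificationId are assumed to reference an existing
// conversation and LLM specification created earlier)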
const response = await client.promptConversation(
    "What were the key financial metrics?",
    conversationId,
    { id: specificationId }
);

Summary

Reducto is excellent at what it does — document parsing with state-of-the-art table extraction and intelligent chunking.

Graphlit builds the semantic infrastructure around extraction — embedding, entities, knowledge graphs, search, and conversations.

Together, you get the best of both worlds: Reducto's extraction quality powering Graphlit's AI infrastructure. If extraction is all you need, use Reducto directly; if you need more, use Reducto through Graphlit and get both.

Explore Graphlit Features:

Document Processing — Configure extraction backends
Building Knowledge Graphs — Automatic entity extraction
Complete Guide to Search — Hybrid semantic search
Workflows and Processing — Custom processing pipelines

Great extraction is the foundation. Semantic infrastructure is what you build on top.