Comparison

Reducto + Graphlit: World-Class Extraction Meets Semantic Infrastructure

Kirk Marple
December 5, 2025

Here's the good news: you don't have to choose between Reducto and Graphlit. Reducto is one of the extraction backends that Graphlit integrates, so you get Reducto's document parsing and Graphlit's semantic infrastructure in a single platform.

Reducto specializes in document parsing — turning PDFs, spreadsheets, and presentations into structured, chunked content optimized for LLM applications. It's particularly strong at table extraction, form parsing, and intelligent chunking.

Graphlit is the semantic infrastructure layer that handles everything after extraction — embedding, entity extraction, knowledge graphs, hybrid search, and conversational AI. And when you use Graphlit, you can choose Reducto as your extraction backend.

This page explains what each platform does, when to use Reducto directly, and when the combination through Graphlit gives you more power.


Table of Contents

  1. TL;DR — Quick Comparison
  2. What Reducto Does Best
  3. What Graphlit Adds
  4. Using Reducto Through Graphlit
  5. When to Use Reducto Directly
  6. When to Use Graphlit with Reducto
  7. Pricing Comparison
  8. Integration Example

TL;DR — Quick Comparison

| Capability | Reducto | Graphlit (with Reducto) |
|---|---|---|
| Primary Focus | Document parsing and extraction | End-to-end semantic infrastructure |
| PDF Extraction | Excellent: state-of-the-art table and form extraction | Uses Reducto (or other backends) for extraction |
| Table Extraction | Best-in-class, layout-aware chunking | Inherits Reducto's capabilities |
| Form Extraction | Structured JSON extraction with custom schemas | Inherits Reducto's capabilities |
| Output Format | Markdown, JSON, structured chunks | Markdown with automatic downstream processing |
| Vector Embeddings | Not included | Automatic on ingestion |
| Entity Extraction | Not included | Automatic Schema.org entities |
| Knowledge Graphs | Not included | Per-user knowledge graphs |
| Semantic Search | Not included | Hybrid vector + keyword + graph search |
| RAG Conversations | Not included | Built-in streaming conversations |
| Data Connectors | Upload/URL only | 30+ connectors (Slack, GitHub, email, feeds) |
| Pricing | $0.015/credit (~$0.015/page) | Usage-based credits (includes extraction + infrastructure) |

What Reducto Does Best

Reducto is a document parsing powerhouse. If you need to extract structured data from complex documents, Reducto delivers:

State-of-the-Art Table Extraction

Reducto's table extraction is among the best available. It handles:

  • Complex merged cells
  • Multi-page tables
  • Nested table structures
  • Tables without clear borders

Intelligent Chunking

Unlike basic text splitters, Reducto's chunking is layout-aware:

  • Respects document structure (headings, sections)
  • Keeps tables intact
  • Preserves semantic coherence
  • Optimized for RAG retrieval
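To make the contrast with a basic text splitter concrete, here is a minimal sketch of layout-aware chunking. It is not Reducto's algorithm; the `Block` type, the `chunk_by_layout` function, and the `max_chars` limit are hypothetical names chosen for illustration. The sketch opens a new chunk at every heading and never splits a block, so a table always stays in one piece.

```python
from dataclasses import dataclass


@dataclass
class Block:
    kind: str  # "heading", "paragraph", or "table" (hypothetical labels)
    text: str


def chunk_by_layout(blocks: list[Block], max_chars: int = 500) -> list[str]:
    """Group blocks into chunks that start at headings and never split a block."""
    chunks: list[list[Block]] = []
    current: list[Block] = []
    for block in blocks:
        size = sum(len(b.text) for b in current)
        # Avoid orphaning a heading: keep it with whatever block follows it.
        heading_only = len(current) == 1 and current[0].kind == "heading"
        starts_section = block.kind == "heading"
        overflows = size + len(block.text) > max_chars
        if current and (starts_section or (overflows and not heading_only)):
            chunks.append(current)
            current = []
        # Blocks are appended whole, so a table can exceed max_chars
        # but is never cut in the middle.
        current.append(block)
    if current:
        chunks.append(current)
    return ["\n".join(b.text for b in chunk) for chunk in chunks]
```

A fixed-size splitter would cut wherever the character budget runs out, which can separate a table's header row from its data rows. Treating each layout block as indivisible trades some chunk-size uniformity for chunks that remain readable on their own.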

Structured Extraction

The /extract endpoint lets you define custom schemas and pull specific fields:

{
  "invoice_number": "INV-2024-001",
  "vendor_name": "Acme Corp",
  "line_items": [
    { "description": "Widget A", "quantity": 10, "price": 99.99 }
  ],
  "total_amount": 999.90
}
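Structured output like this is easy to sanity-check before it feeds an automation workflow. The following is a minimal sketch, assuming the extracted JSON has exactly the fields shown above; `check_invoice` is a hypothetical helper, not part of Reducto's API. It verifies that the required fields are present and that the line items add up to `total_amount`, using `Decimal` to avoid floating-point rounding.

```python
import json
from decimal import Decimal

REQUIRED_FIELDS = ("invoice_number", "vendor_name", "line_items", "total_amount")


def check_invoice(raw_json: str) -> list[str]:
    """Return a list of problems found in an extracted invoice (empty if none)."""
    # parse_float=Decimal keeps prices exact instead of converting to float.
    invoice = json.loads(raw_json, parse_float=Decimal)
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in invoice]
    if problems:
        return problems
    line_total = sum(
        (Decimal(item["quantity"]) * item["price"] for item in invoice["line_items"]),
        Decimal("0"),
    )
    if line_total != invoice["total_amount"]:
        problems.append(
            f"line items sum to {line_total}, but total_amount is {invoice['total_amount']}"
        )
    return problems
```

For the invoice above, 10 units at 99.99 come to 999.90, which matches `total_amount`, so the function returns an empty list.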

Format Support

Reducto handles 30+ file types:

  • PDFs (native and scanned)
  • Office documents (DOCX, XLSX, PPTX)
  • Images (PNG, JPEG, TIFF)
  • And more

What Graphlit Adds

Graphlit takes Reducto's excellent extraction and builds the complete AI infrastructure around it:

Automatic Embedding

Every document is embedded for vector search immediately after extraction — no separate embedding pipeline needed.

Entity Extraction

People, organizations, places, events, and products are automatically identified and linked:

Document: "Q3 Report.pdf"
  → Person: Alice Chen (CFO)
  → Organization: Acme Inc.
  → Event: Earnings Call (Oct 15, 2024)

Knowledge Graphs

Entities aren't just extracted — they're connected. Alice's mentions across all documents are linked, creating a navigable knowledge graph.
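The idea of linking mentions across documents can be illustrated with a small in-memory index. This is only a sketch of the concept, not how Graphlit stores its graph: `build_entity_index` and `co_occurring` are hypothetical helpers, and the name normalization (trim and lowercase) is a deliberately naive stand-in for real entity resolution.

```python
from collections import defaultdict

Entity = tuple[str, str]  # (entity type, normalized name)


def build_entity_index(mentions: list[tuple[str, str, str]]) -> dict[Entity, set[str]]:
    """Map each entity to the set of documents that mention it.

    `mentions` holds (document, entity_type, entity_name) triples.
    """
    index: dict[Entity, set[str]] = defaultdict(set)
    for document, entity_type, name in mentions:
        key = (entity_type.lower(), name.strip().lower())
        index[key].add(document)
    return dict(index)


def co_occurring(index: dict[Entity, set[str]], entity: Entity) -> set[Entity]:
    """Return the entities that share at least one document with `entity`."""
    docs = index.get(entity, set())
    return {
        other
        for other, other_docs in index.items()
        if other != entity and docs & other_docs
    }
```

Feeding it the mentions from "Q3 Report.pdf" plus a second, hypothetical document that also names Alice Chen would place both documents under the same `("person", "alice chen")` key, which is the link a knowledge graph makes navigable.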

Hybrid Search

Search across your entire knowledge base with:

  • Vector semantic search
  • Keyword/BM25 search
  • Graph-aware context expansion
  • Entity and metadata filters
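One common way to combine a vector ranking with a keyword ranking is reciprocal rank fusion (RRF). The sketch below shows that technique as an illustration of what "hybrid" can mean; the source does not say which fusion method Graphlit uses, so treat this as an assumption. The function name and the constant `k = 60` (a conventional default for RRF) are choices made for this example.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked well by several retrievers rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

RRF needs only ranks, not raw scores, which is convenient because cosine similarities and BM25 scores are on different scales and are hard to compare directly.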

RAG Conversations

Built-in conversational AI with streaming responses, conversation branching, and automatic source citations.

30+ Data Connectors

Beyond document upload:

  • Communication: Slack, Discord, Teams, Email
  • Development: GitHub, Linear, Jira
  • Cloud Storage: Google Drive, Dropbox, SharePoint
  • Media: RSS feeds, podcasts, YouTube

Using Reducto Through Graphlit

When you configure a Graphlit workflow with Reducto as the extraction backend, you get:

  1. Reducto's extraction quality: tables, forms, layouts all handled correctly
  2. Automatic embedding: no separate pipeline
  3. Entity extraction: people, orgs, places identified
  4. Knowledge graph: relationships connected
  5. Search index: immediately queryable
  6. RAG ready: conversational AI available

import { Graphlit, Types } from 'graphlit-client';

const client = new Graphlit();

// Create a workflow that uses Reducto for extraction
const workflow = await client.createWorkflow({
    name: "Reducto Extraction Workflow",
    preparation: {
        jobs: [{
            connector: {
                type: Types.FilePreparationServiceTypes.Reducto,
                reducto: {
                    // Reducto-specific configuration
                }
            }
        }]
    }
});

// Ingest using Reducto extraction + full Graphlit processing
const result = await client.ingestUri(
    "https://example.com/complex-report.pdf",
    "Q3 Financial Report",
    undefined,
    undefined,
    true,
    { id: workflow.createWorkflow?.id }
);

// Document is now:
// - Extracted by Reducto (tables, forms, layout preserved)
// - Embedded for vector search
// - Entity-enriched
// - Knowledge graph connected
// - Searchable and RAG-ready

When to Use Reducto Directly

Use Reducto's API directly when:

  • You only need extraction: Your pipeline handles everything else
  • You have existing infrastructure: Vector DB, embedding pipeline, search — all built
  • You need the /extract endpoint: Structured JSON extraction with custom schemas for automation workflows
  • You're building document automation: Invoice processing, form extraction, data entry automation

Reducto excels as a focused extraction tool. If extraction is your only need, use it directly.


When to Use Graphlit with Reducto

Use Graphlit (with Reducto as the backend) when:

  • You need the full stack: Extraction through conversation, all managed
  • You want automatic entity extraction: People, organizations, events identified
  • You need knowledge graphs: Relationships and connections across documents
  • You have multiple data sources: Not just PDFs — Slack, GitHub, email, feeds
  • You're building AI applications: Semantic search, RAG chatbots, knowledge assistants
  • You want zero infrastructure: No databases to manage, no pipelines to maintain

The combination gives you Reducto's extraction quality plus Graphlit's semantic infrastructure.


Pricing Comparison

Reducto Pricing

| Plan | Cost | Includes |
|---|---|---|
| Standard | $0.015/credit | Parse, Extract, Edit, Split APIs |
| Growth | Custom | Volume discounts, zero retention |
| Enterprise | Custom | VPC, custom SLAs, dedicated support |

Credits roughly correspond to pages, though complex documents may use more.

Graphlit Pricing

| Plan | Cost | Includes |
|---|---|---|
| Free | 100 credits | Full platform (extraction + infrastructure) |
| Starter | $49/month | 1,000 credits |
| Pro | $149/month | 5,000 credits |
| Enterprise | Custom | Volume discounts, SLA |

Graphlit credits include extraction (via Reducto or other backends) plus all downstream processing — embedding, entity extraction, search indexing, and conversations.

Cost Comparison

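For a RAG application ingesting 1,000 documents/month, assuming roughly one Reducto credit per document (longer or more complex documents use more):

Reducto alone:

  • Extraction: ~$15/month (1,000 credits × $0.015/credit)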
  • Plus: Vector database, embedding API, entity extraction, search infrastructure, engineering time

Graphlit with Reducto:

  • Starter plan: $49/month (includes everything)
  • No additional infrastructure costs
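The arithmetic behind this comparison can be sketched as follows. The two prices come from the pricing tables above; the `credits_per_document` parameter is an assumption, since credits only roughly correspond to pages, and the function names are hypothetical.

```python
from decimal import Decimal

REDUCTO_PRICE_PER_CREDIT = Decimal("0.015")  # Reducto Standard plan, per credit
GRAPHLIT_STARTER_MONTHLY = Decimal("49")     # Graphlit Starter plan, per month


def reducto_extraction_cost(documents: int, credits_per_document: int = 1) -> Decimal:
    """Monthly Reducto extraction cost, assuming a fixed credit count per document."""
    return REDUCTO_PRICE_PER_CREDIT * documents * credits_per_document


def diy_budget_headroom(documents: int, credits_per_document: int = 1) -> Decimal:
    """What is left of the Starter plan price after paying for extraction alone.

    This is the monthly budget available for a vector database, embedding API,
    and search infrastructure before the do-it-yourself stack costs more than
    the Starter plan. A negative value means extraction alone already exceeds it.
    """
    return GRAPHLIT_STARTER_MONTHLY - reducto_extraction_cost(documents, credits_per_document)
```

At one credit per document, 1,000 documents cost $15 in extraction, leaving $34/month for everything else. At an assumed ten credits per document, extraction alone comes to $150, so the outcome depends heavily on document length.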

Integration Example

Reducto Direct: Extraction Only

import reducto

client = reducto.Reducto(api_key="...")

# Parse document
result = client.parse.run(
    document_url="https://example.com/report.pdf",
    options={"chunking_method": "semantic"}
)

# Now you need to:
# 1. Send chunks to embedding API
# 2. Store in vector database
# 3. Build entity extraction pipeline
# 4. Create search infrastructure
# 5. Implement RAG conversations
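
To give a sense of what steps 1, 2, and 4 involve, here is a deliberately tiny, self-contained sketch. Everything in it is a stand-in: `embed` is a toy bag-of-words function in place of a real embedding API, and `InMemoryVectorStore` is a hypothetical class in place of a real vector database. It shows the shape of the work, not a production design.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding API: a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Hypothetical minimal store: keeps (chunk, vector) pairs in a list."""

    def __init__(self) -> None:
        self._rows: list[tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        self._rows.append((chunk, embed(chunk)))

    def search(self, query: str, top_k: int = 3) -> list[str]:
        query_vector = embed(query)
        ranked = sorted(
            self._rows, key=lambda row: cosine(query_vector, row[1]), reverse=True
        )
        return [chunk for chunk, _ in ranked[:top_k]]
```

A real pipeline replaces each piece with a managed service or library and adds persistence, batching, and error handling, which is where most of the engineering time goes.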

Graphlit with Reducto: Complete Infrastructure

import { Graphlit, Types } from 'graphlit-client';

const client = new Graphlit();

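// reductoWorkflowId, conversationId, and specificationId are assumed to be
// IDs of a workflow, conversation, and specification created earlier.

// Ingest with Reducto extraction + automatic processing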
const result = await client.ingestUri(
    "https://example.com/report.pdf",
    "Financial Report",
    undefined,
    undefined,
    true,
    { id: reductoWorkflowId }  // Workflow configured for Reducto
);

// Everything is automatic:
// - Reducto extracts with excellent table handling
// - Content is embedded
// - Entities are extracted
// - Knowledge graph is updated
// - Search index is ready

// Immediately queryable
const contents = await client.queryContents({
    types: [Types.ContentTypes.File],
    search: "quarterly revenue"
});

// RAG conversation ready
const response = await client.promptConversation(
    "What were the key financial metrics?",
    conversationId,
    { id: specificationId }
);

Summary

Reducto is excellent at what it does — document parsing with state-of-the-art table extraction and intelligent chunking.

Graphlit builds the semantic infrastructure around extraction — embedding, entities, knowledge graphs, search, and conversations.

Together, you get the best of both worlds: Reducto's extraction quality powering Graphlit's AI infrastructure. No need to choose — use Reducto through Graphlit and get everything.



Great extraction is the foundation. Semantic infrastructure is what you build on top.

Ready to Build with Graphlit?

Start building AI-powered applications with our API-first platform. Free tier includes 100 credits/month — no credit card required.
