Azure AI Document Intelligence + Graphlit: Enterprise Extraction Meets Semantic Infrastructure

Azure AI Document Intelligence (formerly Form Recognizer) is Microsoft's enterprise document processing service. It's battle-tested, SOC 2 compliant, and deeply integrated with the Azure ecosystem. It's also Graphlit's default extraction backend.

When you ingest documents into Graphlit, Azure AI Document Intelligence handles the extraction by default — giving you Microsoft's enterprise-grade OCR and layout analysis, combined with Graphlit's semantic infrastructure for everything that comes after.

This page explains what each platform provides, how they work together, and when you might want to use Azure directly vs. through Graphlit.

TL;DR — Quick Comparison

Capability	Azure AI Document Intelligence	Graphlit (with Azure AI)
Primary Focus	Document extraction and analysis	End-to-end semantic infrastructure
OCR Quality	Excellent — enterprise-grade, 275+ languages	Uses Azure AI for extraction
Layout Analysis	Strong — tables, headers, paragraphs, figures	Uses Azure AI, adds semantic chunking
Pre-built Models	Invoices, receipts, IDs, tax forms, etc.	Access via Azure AI backend
Custom Models	Train on your document types	Use Azure directly for custom training
Output Format	JSON with coordinates and confidence scores (Layout can also return Markdown)	Markdown with automatic downstream processing
Vector Embeddings	Not included	Automatic on ingestion
Entity Extraction	Key-value pairs, tables	Schema.org entities (people, orgs, places, events)
Knowledge Graphs	Not included	Per-user knowledge graphs with relationships
Semantic Search	Not included	Hybrid vector + keyword + graph search
RAG Conversations	Not included	Built-in streaming conversations
Data Connectors	Azure Blob, custom integration	30+ connectors (Slack, GitHub, email, feeds)
Compliance	SOC 2, HIPAA, ISO 27001, Azure Government	Azure certifications cover the extraction step; Graphlit's own platform compliance covers the rest

What Azure AI Document Intelligence Does

Its strengths fall into five areas:

Enterprise-Grade OCR

275+ languages supported
Handles low-quality scans and photos
Confidence scores for extracted text
Bounding box coordinates for every element
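Confidence scores and bounding boxes are what make human-in-the-loop review practical. Below is a minimal TypeScript sketch that flags low-confidence words for review. It assumes a simplified subset of the analyze-result JSON (`pages[].words[]` with `content`, `confidence`, and a flat `polygon` of x,y corner pairs); the types, the 0.8 threshold, and the `lowConfidenceWords` helper are hypothetical.

```typescript
// Simplified subset of an Azure AI Document Intelligence analyze result.
// Only the fields used below are modeled; real responses carry many more.
interface Word {
  content: string;
  confidence: number; // 0..1
  polygon: number[]; // x,y pairs for the corners, in the page's unit
}

interface Page {
  pageNumber: number;
  words: Word[];
}

interface Box {
  left: number;
  top: number;
  right: number;
  bottom: number;
}

interface ReviewItem {
  page: number;
  text: string;
  confidence: number;
  box: Box;
}

// Collapse a polygon into an axis-aligned box, e.g. for a UI highlight.
function toBox(polygon: number[]): Box {
  const xs = polygon.filter((_, i) => i % 2 === 0);
  const ys = polygon.filter((_, i) => i % 2 === 1);
  return {
    left: Math.min(...xs),
    top: Math.min(...ys),
    right: Math.max(...xs),
    bottom: Math.max(...ys),
  };
}

// Hypothetical helper: collect words below the threshold for human review.
function lowConfidenceWords(pages: Page[], threshold = 0.8): ReviewItem[] {
  const flagged: ReviewItem[] = [];
  for (const page of pages) {
    for (const word of page.words) {
      if (word.confidence < threshold) {
        flagged.push({
          page: page.pageNumber,
          text: word.content,
          confidence: word.confidence,
          box: toBox(word.polygon),
        });
      }
    }
  }
  return flagged;
}

const samplePages: Page[] = [
  {
    pageNumber: 1,
    words: [
      { content: "Invoice", confidence: 0.99, polygon: [1, 1, 3, 1, 3, 1.5, 1, 1.5] },
      { content: "Tota1", confidence: 0.42, polygon: [4, 6, 5, 6, 5, 6.4, 4, 6.4] },
    ],
  },
];

for (const item of lowConfidenceWords(samplePages)) {
  console.log(`page ${item.page}: "${item.text}" (${item.confidence})`, item.box);
}
```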

Layout Analysis

The Layout model detects document structure:

Headers and titles
Paragraphs and sections
Tables, including merged cells that span rows or columns
Figures and captions
Page numbers and headers/footers
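Merged cells are the hard part of table extraction. Here is a minimal sketch, assuming Azure-style cell records (`rowIndex`, `columnIndex`, optional `rowSpan`/`columnSpan`, `content`), that lays the cells onto a grid and renders a Markdown table. Repeating a spanned cell's text into every slot it covers is one simple convention chosen for illustration; Azure's own Markdown output and Graphlit's conversion may handle spans differently, and `tableToMarkdown` is a hypothetical helper.

```typescript
// Subset of a table cell as returned by the Layout model.
interface TableCell {
  rowIndex: number;
  columnIndex: number;
  rowSpan?: number; // treated as 1 when absent
  columnSpan?: number; // treated as 1 when absent
  content: string;
}

interface Table {
  rowCount: number;
  columnCount: number;
  cells: TableCell[];
}

// Render a table as Markdown, treating the first row as the header.
function tableToMarkdown(table: Table): string {
  // Build an empty rowCount x columnCount grid.
  const grid: string[][] = Array.from({ length: table.rowCount }, () =>
    Array<string>(table.columnCount).fill("")
  );
  // Copy each cell's text into every slot its span covers.
  for (const cell of table.cells) {
    const rows = cell.rowSpan ?? 1;
    const cols = cell.columnSpan ?? 1;
    for (let r = cell.rowIndex; r < cell.rowIndex + rows; r++) {
      for (let c = cell.columnIndex; c < cell.columnIndex + cols; c++) {
        grid[r][c] = cell.content.replace(/\|/g, "\\|"); // escape pipes
      }
    }
  }
  const line = (cells: string[]) => `| ${cells.join(" | ")} |`;
  const [header, ...body] = grid;
  const separator = line(header.map(() => "---"));
  return [line(header), separator, ...body.map(line)].join("\n");
}

const sample: Table = {
  rowCount: 3,
  columnCount: 2,
  cells: [
    { rowIndex: 0, columnIndex: 0, content: "Item" },
    { rowIndex: 0, columnIndex: 1, content: "Amount" },
    { rowIndex: 1, columnIndex: 0, content: "Consulting" },
    { rowIndex: 1, columnIndex: 1, content: "$500" },
    { rowIndex: 2, columnIndex: 0, columnSpan: 2, content: "Paid in full" },
  ],
};

console.log(tableToMarkdown(sample));
```

Flattening spans this way keeps every row the same width, which Markdown tables require, at the cost of duplicating text.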

Pre-built Models

Specialized models for common document types:

Invoices: Vendor, line items, totals, due dates
Receipts: Merchant, items, subtotal, tax, tip
ID Documents: Name, DOB, address, ID number
Tax Forms: W-2, 1040, 1099 data extraction
Contracts: Parties, terms, dates, clauses
Health Insurance Cards: Member info, coverage details
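Pre-built models return typed fields rather than free text. A minimal sketch, assuming the documented invoice field names (`VendorName`, `DueDate`, `SubTotal`, `Items`) and a simplified subset of the JSON field shape (`valueString`, `valueDate`, `valueCurrency`, `valueArray`, `valueObject`); the `Invoice` type and both helpers are hypothetical. The subtotal check illustrates the kind of validation rule that structured invoice automation typically layers on top of extraction.

```typescript
// Simplified subset of a pre-built model field; real fields also carry
// content, spans, bounding regions, and per-field confidence.
interface Field {
  type: string;
  valueString?: string;
  valueDate?: string; // ISO date, e.g. "2024-04-15"
  valueCurrency?: { amount: number; currencyCode?: string };
  valueArray?: Field[];
  valueObject?: Record<string, Field>;
}

interface Invoice {
  vendor?: string;
  dueDate?: string;
  subTotal?: number;
  lineItems: { description?: string; amount?: number }[];
}

// Hypothetical helper: map raw fields to a flat, typed record.
function readInvoice(raw: Record<string, Field>): Invoice {
  const items = raw["Items"]?.valueArray ?? [];
  return {
    vendor: raw["VendorName"]?.valueString,
    dueDate: raw["DueDate"]?.valueDate,
    subTotal: raw["SubTotal"]?.valueCurrency?.amount,
    lineItems: items.map((item) => ({
      description: item.valueObject?.["Description"]?.valueString,
      amount: item.valueObject?.["Amount"]?.valueCurrency?.amount,
    })),
  };
}

// Hypothetical validation rule: do the line items add up to the subtotal?
function lineItemsMatchSubTotal(invoice: Invoice, tolerance = 0.01): boolean {
  if (invoice.subTotal === undefined) return false;
  const sum = invoice.lineItems.reduce((acc, li) => acc + (li.amount ?? 0), 0);
  return Math.abs(sum - invoice.subTotal) <= tolerance;
}

const invoiceFields: Record<string, Field> = {
  VendorName: { type: "string", valueString: "Widget Co" },
  DueDate: { type: "date", valueDate: "2024-04-15" },
  SubTotal: { type: "currency", valueCurrency: { amount: 150, currencyCode: "USD" } },
  Items: {
    type: "array",
    valueArray: [
      { type: "object", valueObject: {
        Description: { type: "string", valueString: "Widgets" },
        Amount: { type: "currency", valueCurrency: { amount: 100 } } } },
      { type: "object", valueObject: {
        Description: { type: "string", valueString: "Shipping" },
        Amount: { type: "currency", valueCurrency: { amount: 50 } } } },
    ],
  },
};

const invoice = readInvoice(invoiceFields);
console.log(invoice.vendor, lineItemsMatchSubTotal(invoice)); // Widget Co true
```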

Custom Models

Train models on your specific document types:

Template-based (fixed layouts)
Neural (variable layouts)
Composed models (multiple document types)

Compliance

Enterprise-ready security:

SOC 2 Type 2
HIPAA BAA available
ISO 27001, 27017, 27018
Azure Government regions

What Graphlit Adds

Graphlit takes Azure AI's extraction and builds complete semantic infrastructure:

Automatic Embedding

Every document is embedded for vector search immediately — no separate pipeline.
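For intuition about what "no separate pipeline" replaces: before embedding, extracted Markdown is usually split into chunks small enough for an embedding model. The sketch below is a simple structure-aware chunker (split at headings, then pack paragraphs up to a character budget). It is an illustration only; Graphlit's actual chunking strategy is not described on this page, and the `chunkMarkdown` helper and its 1,000-character default are hypothetical.

```typescript
// A chunk of Markdown plus the heading it sits under, ready to embed.
interface Chunk {
  heading: string;
  text: string;
}

// Split Markdown at headings, then pack paragraphs into chunks of at most
// maxChars. A single paragraph longer than maxChars is kept whole.
function chunkMarkdown(markdown: string, maxChars = 1000): Chunk[] {
  const chunks: Chunk[] = [];
  let heading = "";
  let buffer: string[] = [];

  const flush = () => {
    if (buffer.length > 0) {
      chunks.push({ heading, text: buffer.join("\n\n") });
      buffer = [];
    }
  };

  // Put every heading line in its own block, then split on blank lines.
  const normalized = markdown.replace(/^(#{1,6}[ \t].*)$/gm, "\n$1\n");
  for (const block of normalized.split(/\n{2,}/)) {
    const trimmed = block.trim();
    if (trimmed === "") continue;
    if (/^#{1,6}[ \t]/.test(trimmed)) {
      flush(); // a new section always starts a new chunk
      heading = trimmed.replace(/^#{1,6}[ \t]+/, "");
      continue;
    }
    const size = buffer.join("\n\n").length;
    if (buffer.length > 0 && size + 2 + trimmed.length > maxChars) flush();
    buffer.push(trimmed);
  }
  flush();
  return chunks;
}

const doc = [
  "# Parties",
  "Acme Inc and Widget Co enter this agreement.",
  "## Term",
  "The agreement runs for two years.",
  "Either party may renew.",
].join("\n\n");

for (const chunk of chunkMarkdown(doc, 40)) {
  console.log(`[${chunk.heading}] ${chunk.text}`);
}
```

Keeping the heading with each chunk gives the embedding (and later the citation) some context about where the text came from.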

Semantic Entity Extraction

Beyond Azure's key-value pairs, Graphlit identifies Schema.org entities:

Document: "Partnership Agreement.pdf"
  → Person: Alice Chen (CEO, Acme Inc)
  → Person: Bob Smith (CFO, Widget Co)
  → Organization: Acme Inc
  → Organization: Widget Co
  → Event: Signing Date (March 15, 2024)

Knowledge Graphs

Entities are connected across documents. Alice appears in meeting notes, contracts, and emails — all linked in a navigable graph.
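A minimal sketch of the cross-document linking idea, assuming each document has already been reduced to a list of typed entities. Entities are merged by type plus normalized name, and two entities count as related when they appear in the same document. The `EntityGraph` class and this naive matching rule are hypothetical; production entity resolution also has to handle aliases and ambiguous names.

```typescript
type EntityType = "Person" | "Organization" | "Place" | "Event";

interface Entity {
  type: EntityType;
  name: string;
}

interface ExtractedDocument {
  id: string;
  entities: Entity[];
}

// Naive identity rule: same type and same case-insensitive name.
const keyOf = (e: Entity) => `${e.type}:${e.name.trim().toLowerCase()}`;

class EntityGraph {
  // entity key -> ids of documents that mention it
  private mentions: Record<string, string[]> = {};
  // entity key -> first-seen form of the entity
  private entities: Record<string, Entity> = {};

  addDocument(doc: ExtractedDocument): void {
    for (const entity of doc.entities) {
      const key = keyOf(entity);
      if (!(key in this.entities)) {
        this.entities[key] = entity;
        this.mentions[key] = [];
      }
      if (this.mentions[key].indexOf(doc.id) === -1) this.mentions[key].push(doc.id);
    }
  }

  documentsMentioning(entity: Entity): string[] {
    return (this.mentions[keyOf(entity)] ?? []).slice().sort();
  }

  // Entities that share at least one document with the given entity.
  related(entity: Entity): Entity[] {
    const target = keyOf(entity);
    const docs = this.mentions[target] ?? [];
    return Object.keys(this.mentions)
      .filter((key) => key !== target && this.mentions[key].some((id) => docs.indexOf(id) !== -1))
      .map((key) => this.entities[key]);
  }
}

const graph = new EntityGraph();
const alice: Entity = { type: "Person", name: "Alice Chen" };
graph.addDocument({ id: "contract.pdf", entities: [
  alice,
  { type: "Organization", name: "Acme Inc" },
  { type: "Organization", name: "Widget Co" },
] });
graph.addDocument({ id: "meeting-notes.md", entities: [
  { type: "Person", name: "alice chen" },
  { type: "Person", name: "Bob Smith" },
] });
graph.addDocument({ id: "email-042.eml", entities: [{ type: "Person", name: "Bob Smith" }] });

console.log(graph.documentsMentioning(alice).join(", ")); // contract.pdf, meeting-notes.md
console.log(graph.related(alice).map((e) => e.name).join(", ")); // Acme Inc, Widget Co, Bob Smith
```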

Hybrid Search

Search with:

Vector semantic similarity
Keyword matching
Graph-aware context expansion
Entity and metadata filters
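One common way to merge vector and keyword results is Reciprocal Rank Fusion (RRF), which scores each item by summing 1 / (k + rank) across the ranked lists. The page does not say which fusion method Graphlit uses, so treat this as a generic illustration of hybrid ranking, with graph expansion and filters left out; the function name and sample IDs are hypothetical.

```typescript
// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)),
// with 1-based ranks. k = 60 is a commonly used default.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores: Record<string, number> = {};
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores[id] = (scores[id] ?? 0) + 1 / (k + index + 1);
    });
  }
  // Highest fused score first.
  return Object.keys(scores).sort((a, b) => scores[b] - scores[a]);
}

const vectorHits = ["contract.pdf", "meeting-notes.md", "email-042.eml"];
const keywordHits = ["email-042.eml", "contract.pdf", "pricing.xlsx"];

console.log(reciprocalRankFusion([vectorHits, keywordHits]).join(", "));
// contract.pdf, email-042.eml, meeting-notes.md, pricing.xlsx
```

RRF only needs ranks, not raw scores, which sidesteps the problem that cosine similarities and keyword scores live on different scales.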

RAG Conversations

Built-in conversational AI:

Streaming responses
Source citations
Conversation branching
Multi-turn context
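Source citations usually work by numbering the retrieved chunks in the prompt and mapping the model's `[n]` markers back to those chunks. A minimal sketch of that pattern, assuming retrieval has already produced the sources; the prompt wording, the `Source` type, and both helpers are hypothetical and not Graphlit's internal implementation. Streaming, branching, and multi-turn history are out of scope here.

```typescript
interface Source {
  id: string;
  title: string;
  text: string;
}

// Number each retrieved chunk so the model can cite it as [n].
function buildGroundedPrompt(question: string, sources: Source[]): string {
  const context = sources
    .map((s, i) => `[${i + 1}] ${s.title}\n${s.text}`)
    .join("\n\n");
  return [
    "Answer using only the sources below. Cite sources as [n].",
    context,
    `Question: ${question}`,
  ].join("\n\n");
}

// Map [n] markers in the model's answer back to source records (deduplicated).
function citedSources(answer: string, sources: Source[]): Source[] {
  const cited: Source[] = [];
  const pattern = /\[(\d+)\]/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answer)) !== null) {
    const source = sources[Number(match[1]) - 1];
    if (source && cited.indexOf(source) === -1) cited.push(source);
  }
  return cited;
}

const sources: Source[] = [
  { id: "c1", title: "Partnership Agreement.pdf", text: "The term is two years." },
  { id: "c2", title: "Meeting notes", text: "Renewal was discussed." },
];

const groundedPrompt = buildGroundedPrompt("How long is the partnership?", sources);
console.log(groundedPrompt);

// Stand-in for an LLM reply.
const answer = "The partnership runs for two years [1].";
console.log(citedSources(answer, sources).map((s) => s.title).join(", ")); // Partnership Agreement.pdf
```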

30+ Data Connectors

Beyond document upload:

Communication: Slack, Discord, Teams, Email
Development: GitHub, Linear, Jira
Cloud Storage: Google Drive, Dropbox, SharePoint, OneDrive
Media: RSS feeds, podcasts, YouTube

The Default Integration

Azure AI Document Intelligence is Graphlit's default extraction backend. When you ingest a document without specifying a workflow, Azure AI handles extraction automatically.

import { Graphlit } from 'graphlit-client';

const client = new Graphlit();

// Default ingestion uses Azure AI Document Intelligence
const result = await client.ingestUri(
    "https://example.com/contract.pdf",
    "Partnership Agreement"
);

// Document is:
// - Extracted by Azure AI (OCR, layout, tables)
// - Converted to Markdown
// - Embedded for vector search
// - Entity-enriched
// - Knowledge graph connected
// - Search indexed

This gives you enterprise-grade extraction with zero configuration.

When to Use Azure Directly

Use Azure AI Document Intelligence directly when:

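You need custom models: Training on document types the pre-built models don't cover (proprietary forms, industry-specific layouts)
You need typed fields from pre-built models: Invoice line items, receipt totals, or ID fields as structured JSON rather than Markdown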
You need raw coordinates: Bounding boxes for UI overlays or validation
You're all-in on Azure: Deep Azure ecosystem integration (Logic Apps, Power Automate)
You need Azure Government: FedRAMP compliance requirements
You only need extraction: No search, conversations, or knowledge graphs

Azure AI is particularly strong for structured document automation — processing thousands of invoices or forms with consistent schemas.

When to Use Graphlit with Azure

Use Graphlit (which uses Azure AI by default) when:

You need the full stack: Extraction through conversation, all managed
You want automatic embeddings: No separate vector pipeline
You need semantic entities: People, organizations, events identified
You need knowledge graphs: Relationships across documents
You have multiple data sources: Not just documents — Slack, email, GitHub
You're building AI applications: Search, RAG chatbots, knowledge assistants
You want zero infrastructure: No databases, no pipelines, no maintenance

Graphlit gives you Azure AI's extraction quality plus complete semantic infrastructure.

Pricing Comparison

Azure AI Document Intelligence Pricing

Model	Price (Pay-as-you-go)
Read (OCR)	$1.50 / 1,000 pages
Layout	$10.00 / 1,000 pages
Pre-built (Invoice, Receipt, etc.)	$10.00 / 1,000 pages
Custom Extraction	$30.00 / 1,000 pages
Custom Classification	$10.00 / 1,000 pages

Commitment tiers available for volume discounts.

Graphlit Pricing

Plan	Cost	Includes
Free	$0	100 credits; full platform (Azure AI extraction + infrastructure)
Starter	$49/month	1,000 credits
Pro	$149/month	5,000 credits
Enterprise	Custom	Volume discounts, SLA

Graphlit credits include Azure AI extraction plus all downstream processing — embedding, entity extraction, search indexing, and conversations.

Cost Comparison

For a knowledge base with 1,000 documents/month:

Azure AI alone:

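Layout extraction: ~$10/month, assuming one page per document ($10 per 1,000 pages); multi-page documents scale the cost linearly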
Plus: Vector database, embedding API, entity extraction, search infrastructure, engineering time

Graphlit with Azure AI:

Starter plan: $49/month, covering extraction and all downstream processing, provided the month's volume fits within the plan's 1,000 credits
No additional infrastructure costs
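The Azure side of this comparison scales with pages, not documents. A small sketch of the arithmetic using the pay-as-you-go prices from the Azure pricing table (commitment-tier discounts are not modeled); the constant and function names are hypothetical. The page does not state how many credits a document consumes in Graphlit, so only the Azure extraction cost is computed.

```typescript
// Azure pay-as-you-go list prices from the pricing table, in USD per 1,000 pages.
const AZURE_PRICE_PER_1000_PAGES = {
  read: 1.5,
  layout: 10,
  prebuilt: 10,
  customExtraction: 30,
  customClassification: 10,
} as const;

type AzureModel = keyof typeof AZURE_PRICE_PER_1000_PAGES;

// Monthly extraction cost only: no embeddings, storage, search, or engineering time.
function azureMonthlyCost(docsPerMonth: number, avgPagesPerDoc: number, model: AzureModel): number {
  const pages = docsPerMonth * avgPagesPerDoc;
  return (pages / 1000) * AZURE_PRICE_PER_1000_PAGES[model];
}

for (const pagesPerDoc of [1, 5, 20]) {
  const cost = azureMonthlyCost(1000, pagesPerDoc, "layout");
  console.log(`1,000 docs x ${pagesPerDoc} page(s): $${cost.toFixed(2)}/month for Layout`);
}
// $10.00, $50.00, and $200.00 respectively
```

At five pages per document, Layout extraction alone reaches $50/month, before any of the downstream infrastructure is counted.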

Integration Example

Azure AI Direct: Extraction Only

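import os

from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeDocumentRequest
from azure.core.credentials import AzureKeyCredential

# Endpoint and key for your Document Intelligence resource
endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]
document_url = "https://example.com/contract.pdf"

client = DocumentIntelligenceClient(endpoint, AzureKeyCredential(key))

# Analyze the document with the pre-built Layout model (long-running operation)
poller = client.begin_analyze_document(
    "prebuilt-layout",
    AnalyzeDocumentRequest(url_source=document_url),
)
result = poller.result()  # AnalyzeResult: pages, paragraphs, tables, ...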

# Now you need to:
# 1. Convert result to your format
# 2. Send to embedding API
# 3. Store in vector database  
# 4. Build entity extraction
# 5. Create search infrastructure
# 6. Implement RAG conversations

Graphlit with Azure AI: Complete Infrastructure

import { Graphlit, Types } from 'graphlit-client';

const client = new Graphlit();

// Ingest with Azure AI extraction (default) + automatic processing
const result = await client.ingestUri(
    "https://example.com/contract.pdf",
    "Partnership Agreement"
);

// Everything automatic:
// - Azure AI extracts with enterprise-grade OCR
// - Content converted to Markdown
// - Embedded for vector search
// - Entities extracted (people, orgs, dates)
// - Knowledge graph updated
// - Search index ready

// Queryable once ingestion and indexing have completed
const contents = await client.queryContents({
    types: [Types.ContentTypes.File],
    search: "partnership terms"
});

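// RAG conversation (conversationId and specificationId are placeholders
// for an existing conversation and LLM specification)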
const response = await client.promptConversation(
    "What are the key terms of this partnership?",
    conversationId,
    { id: specificationId }
);

Summary

Azure AI Document Intelligence is Microsoft's enterprise document processing powerhouse — battle-tested OCR, layout analysis, and pre-built models with enterprise compliance.

Graphlit uses Azure AI as its default extraction backend, adding semantic infrastructure — embeddings, entity extraction, knowledge graphs, search, and conversations.

Together, they pair Microsoft's enterprise extraction quality with Graphlit's AI infrastructure. Azure AI is already integrated: start ingesting documents and the combination works automatically.

Explore Graphlit Features:

Document Processing — Extraction backend options
Building Knowledge Graphs — Automatic entity extraction
Complete Guide to Search — Hybrid semantic search
Data Connectors — 30+ source integrations

Enterprise extraction is the foundation. Semantic infrastructure is what makes it intelligent.