Workflows control how Graphlit processes content—from text extraction to entity recognition to enrichment. Think of them as pipelines: content enters, gets transformed through multiple stages, and emerges indexed and searchable with extracted metadata.
This guide covers the three workflow stages (preparation, extraction, enrichment), model selection, entity type configuration, and production patterns. By the end, you'll know how to customize processing for any use case.
What You'll Learn
- The three workflow stages and when to use each
- Preparation: Text extraction strategies (OCR, vision models)
- Extraction: Entity extraction configuration
- Enrichment: Summarization and generation
- Model selection by content type
- Multi-stage complex workflows
- Production workflow patterns
Prerequisites: A Graphlit project, SDK installed, content to process.
Developer Note: All Graphlit IDs are GUIDs. Example outputs show realistic GUID format.
Part 1: Workflow Architecture
The Three Stages
Every workflow can have up to three stages:
Content → PREPARATION → EXTRACTION → ENRICHMENT → Indexed Content
Preparation: Extract raw text from files (PDFs, images, audio, video)
Extraction: Extract structured data (entities, topics, relationships)
Enrichment: Generate derivatives (summaries, audio, translations)
You can use any combination—one stage, two stages, or all three.
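To make "any combination" concrete, here is a minimal sketch that reports which stages a workflow definition actually enables. The `WorkflowSketch` type and the `enabledStages` helper are hypothetical and written for this guide; they are not part of the Graphlit SDK and only mirror the `preparation`, `extraction`, and `enrichment` keys used in the examples that follow.

```typescript
// Hypothetical, simplified shape of a workflow definition (not the SDK's type).
type StageSketch = { jobs: unknown[] };

interface WorkflowSketch {
  name: string;
  preparation?: StageSketch;
  extraction?: StageSketch;
  enrichment?: StageSketch;
}

const STAGE_ORDER = ["preparation", "extraction", "enrichment"] as const;
type StageName = (typeof STAGE_ORDER)[number];

// Return the stages that have at least one job, in pipeline order.
function enabledStages(workflow: WorkflowSketch): StageName[] {
  return STAGE_ORDER.filter((stage) => (workflow[stage]?.jobs.length ?? 0) > 0);
}

// A workflow with only an extraction stage, as you might use for plain text.
const extractionOnly: WorkflowSketch = {
  name: "Text Only",
  extraction: { jobs: [{}] },
};

const stages = enabledStages(extractionOnly); // ["extraction"]
```

A check like this can be handy in tests or logs when you maintain several workflow definitions and want to confirm each one runs the stages you expect.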
Basic Workflow
import { Graphlit } from 'graphlit-client';
import {
  FilePreparationServiceTypes,
  EntityExtractionServiceTypes,
  ObservableTypes
} from 'graphlit-client/dist/generated/graphql-types';
const graphlit = new Graphlit();
const workflow = await graphlit.createWorkflow({
  name: "PDF Processing",
  preparation: {
    jobs: [{
      connector: {
        type: FilePreparationServiceTypes.ModelDocument // Vision-based extraction
      }
    }]
  },
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelDocument,
        extractedTypes: [
          ObservableTypes.Person,
          ObservableTypes.Organization,
          ObservableTypes.Place
        ]
      }
    }]
  }
});
console.log('Workflow created:', workflow.createWorkflow.id);
// Use workflow with content
const content = await graphlit.ingestUri(
  'https://example.com/document.pdf',
  'Document',
  undefined,
  undefined,
  undefined,
  { id: workflow.createWorkflow.id }
);
Part 2: Preparation Stage
Preparation extracts raw text from files. This step is what makes file content searchable.
Text Files (No Preparation Needed)
// For plain text, Markdown, and HTML: skip preparation
const workflow = await graphlit.createWorkflow({
  name: "Text Only",
  // No preparation stage
  extraction: {
    jobs: [/* ... */]
  }
});
Document Files (PDFs, Word, Excel)
Two strategies:
1. Traditional OCR (Fast, Lower Quality)
import { FilePreparationServiceTypes } from 'graphlit-client/dist/generated/graphql-types';
const ocrWorkflow = await graphlit.createWorkflow({
  name: "OCR Extraction",
  preparation: {
    jobs: [{
      connector: {
        type: FilePreparationServiceTypes.Document // Traditional OCR
      }
    }]
  }
});
Good for: Simple PDFs, fast processing
Bad for: Scanned docs, tables, multi-column layouts
2. Vision Models (Slower, Higher Quality)
const visionWorkflow = await graphlit.createWorkflow({
  name: "Vision Extraction",
  preparation: {
    jobs: [{
      connector: {
        type: FilePreparationServiceTypes.ModelDocument // GPT-4 Vision
      }
    }]
  }
});
Good for: Scanned PDFs, complex tables, handwriting
Bad for: Cost-sensitive applications (uses GPT-4 Vision)
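The trade-off between the two strategies can be written down as a small decision helper. This is only a sketch: the `DocumentTraits` fields and the `choosePreparation` function are hypothetical, and the two string values stand in for `FilePreparationServiceTypes.Document` and `FilePreparationServiceTypes.ModelDocument` so the example runs without the SDK.

```typescript
// String stand-ins for the two FilePreparationServiceTypes members shown above.
type PreparationChoice = "Document" | "ModelDocument";

// Hypothetical traits; derive them from whatever metadata you have.
interface DocumentTraits {
  scanned: boolean;
  complexTables: boolean;
  handwriting: boolean;
  costSensitive: boolean;
}

interface PreparationDecision {
  type: PreparationChoice;
  note: string;
}

// Prefer the faster path unless the document has traits that it handles poorly.
function choosePreparation(traits: DocumentTraits): PreparationDecision {
  const needsVision = traits.scanned || traits.complexTables || traits.handwriting;
  if (!needsVision) {
    return { type: "Document", note: "Simple document: the faster path should suffice." };
  }
  const note = traits.costSensitive
    ? "Vision extraction chosen for quality; expect higher cost."
    : "Vision extraction chosen for quality.";
  return { type: "ModelDocument", note };
}

const decision = choosePreparation({
  scanned: true,
  complexTables: false,
  handwriting: false,
  costSensitive: true,
});
// decision.type is "ModelDocument", and decision.note mentions the cost.
```

Here quality wins over cost when the two conflict, which is a design choice for this sketch rather than a rule; a cost-sensitive application might instead route such documents to a review queue.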
Audio/Video Files
const audioWorkflow = await graphlit.createWorkflow({
  name: "Audio Transcription",
  preparation: {
    jobs: [{
      connector: {
        type: FilePreparationServiceTypes.ModelAudio // Transcribe to text
      }
    }]
  }
});
What it does:
- Transcribes speech to text
- Identifies speakers (if multiple)
- Timestamps segments
Use cases: Podcast processing, meeting transcripts, video content
Email/Message Files
const emailWorkflow = await graphlit.createWorkflow({
  name: "Email Processing",
  preparation: {
    jobs: [{
      connector: {
        type: FilePreparationServiceTypes.Email // Email-specific parsing
      }
    }]
  }
});
Extracts:
- Email body (HTML to text)
- Headers (from, to, subject, date)
- Attachments
- Quoted replies
Part 3: Extraction Stage
Extract structured data from text.
Entity Extraction
Configure entity types:
import {
  ObservableTypes,
  EntityExtractionServiceTypes,
  FilePreparationServiceTypes
} from 'graphlit-client/dist/generated/graphql-types';
const entityWorkflow = await graphlit.createWorkflow({
  name: "Entity Extraction",
  preparation: {
    jobs: [{
      connector: {
        type: FilePreparationServiceTypes.ModelDocument
      }
    }]
  },
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelDocument,
        extractedTypes: [
          // General types
          ObservableTypes.Person,
          ObservableTypes.Organization,
          ObservableTypes.Place,
          ObservableTypes.Event,
          ObservableTypes.Product,
          ObservableTypes.CreativeWork
        ]
      }
    }]
  }
});
Medical entity types:
extraction: {
  jobs: [{
    connector: {
      type: EntityExtractionServiceTypes.ModelDocument,
      extractedTypes: [
        ObservableTypes.MedicalCondition,
        ObservableTypes.MedicalProcedure,
        ObservableTypes.MedicalTreatment,
        ObservableTypes.MedicalDrug,
        ObservableTypes.MedicalDevice,
        ObservableTypes.MedicalTest
      ]
    }
  }]
}
Multi-Pass Extraction
Run multiple extraction jobs:
extraction: {
  jobs: [
    // Pass 1: General entities
    {
      connector: {
        type: EntityExtractionServiceTypes.ModelDocument,
        extractedTypes: [
          ObservableTypes.Person,
          ObservableTypes.Organization
        ]
      }
    },
    // Pass 2: Medical entities (if needed)
    {
      connector: {
        type: EntityExtractionServiceTypes.ModelDocument,
        extractedTypes: [
          ObservableTypes.MedicalCondition,
          ObservableTypes.MedicalProcedure
        ]
      }
    }
  ]
}
When to use: Documents with both general and domain-specific entities (e.g., medical reports mentioning doctors and treatments)
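When the number of passes grows, the `jobs` array can be generated rather than written by hand. The sketch below builds one job per group of entity types. `ExtractionJobSketch` and `buildExtractionJobs` are hypothetical names, and plain strings stand in for the SDK enums so the example runs on its own. Dropping types that an earlier pass already covers is an extra convenience added here, not something the platform is known to require.

```typescript
// Hypothetical local shape; in real code use the SDK's enums and input types.
interface ExtractionJobSketch {
  connector: { type: string; extractedTypes: string[] };
}

// Build one extraction job per pass, skipping empty passes and removing
// entity types that an earlier pass has already claimed.
function buildExtractionJobs(connectorType: string, passes: string[][]): ExtractionJobSketch[] {
  const seen = new Set<string>();
  const jobs: ExtractionJobSketch[] = [];
  for (const pass of passes) {
    const fresh: string[] = [];
    for (const entityType of pass) {
      if (!seen.has(entityType)) {
        seen.add(entityType);
        fresh.push(entityType);
      }
    }
    if (fresh.length > 0) {
      jobs.push({ connector: { type: connectorType, extractedTypes: fresh } });
    }
  }
  return jobs;
}

// Two passes: general entities first, then medical ones.
const jobs = buildExtractionJobs("ModelDocument", [
  ["Person", "Organization"],
  ["MedicalCondition", "MedicalProcedure", "Person"], // "Person" is dropped here
]);
```

The resulting array has the same shape as the `jobs` list above, so with the real enum values substituted it could be passed as `extraction: { jobs }`.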
Part 4: Enrichment Stage
Generate derivatives from content.
Summarization
import {
  EnrichmentServiceTypes,
  FilePreparationServiceTypes
} from 'graphlit-client/dist/generated/graphql-types';
const summaryWorkflow = await graphlit.createWorkflow({
  name: "Summarization",
  preparation: {
    jobs: [{
      connector: {
        type: FilePreparationServiceTypes.ModelDocument
      }
    }]
  },
  enrichment: {
    jobs: [{
      connector: {
        type: EnrichmentServiceTypes.ModelSummarization,
        prompt: "Summarize this document in 3-5 sentences."
      }
    }]
  }
});
Use cases:
- Document previews
- Executive summaries
- Email digests
Text-to-Speech
enrichment: {
  jobs: [{
    connector: {
      type: EnrichmentServiceTypes.ModelAudioGeneration,
      voice: "alloy" // ElevenLabs or OpenAI voice
    }
  }]
}
Generates: Audio version of text content
Image Generation
enrichment: {
  jobs: [{
    connector: {
      type: EnrichmentServiceTypes.ModelImageGeneration,
      prompt: "Create a visual representation of this concept"
    }
  }]
}
Part 5: Complete Workflow Examples
Example 1: Research Paper Processing
const researchWorkflow = await graphlit.createWorkflow({
  name: "Research Papers",
  preparation: {
    jobs: [{
      connector: {
        type: FilePreparationServiceTypes.ModelDocument // Vision for tables/diagrams
      }
    }]
  },
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelDocument,
        extractedTypes: [
          ObservableTypes.Person, // Authors, researchers
          ObservableTypes.Organization, // Universities, labs
          ObservableTypes.CreativeWork // Citations
        ]
      }
    }]
  },
  enrichment: {
    jobs: [{
      connector: {
        type: EnrichmentServiceTypes.ModelSummarization,
        prompt: "Summarize the key findings, methodology, and conclusions."
      }
    }]
  }
});
Example 2: Customer Support Emails
const supportWorkflow = await graphlit.createWorkflow({
  name: "Support Emails",
  preparation: {
    jobs: [{
      connector: {
        type: FilePreparationServiceTypes.Email
      }
    }]
  },
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelText,
        extractedTypes: [
          ObservableTypes.Person, // Customer names
          ObservableTypes.Organization, // Company names
          ObservableTypes.Product // Products mentioned
        ]
      }
    }]
  }
});
Example 3: Meeting Recordings
const meetingWorkflow = await graphlit.createWorkflow({
  name: "Meeting Transcripts",
  preparation: {
    jobs: [{
      connector: {
        type: FilePreparationServiceTypes.ModelAudio // Transcribe
      }
    }]
  },
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelText,
        extractedTypes: [
          ObservableTypes.Person, // Attendees, mentioned people
          ObservableTypes.Event, // Mentioned meetings, deadlines
          ObservableTypes.Product // Products discussed
        ]
      }
    }]
  },
  enrichment: {
    jobs: [{
      connector: {
        type: EnrichmentServiceTypes.ModelSummarization,
        prompt: "Summarize: meeting topic, key decisions, action items, attendees."
      }
    }]
  }
});
Part 6: Workflow Management
Query Workflows
const workflows = await graphlit.queryWorkflows();
workflows.workflows.results.forEach(workflow => {
  console.log(`${workflow.name}:`);
  console.log(` Preparation: ${workflow.preparation ? 'Yes' : 'No'}`);
  console.log(` Extraction: ${workflow.extraction ? 'Yes' : 'No'}`);
  console.log(` Enrichment: ${workflow.enrichment ? 'Yes' : 'No'}`);
});
Update Workflow
await graphlit.updateWorkflow(workflowId, {
  name: 'Updated Name',
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelDocument,
        extractedTypes: [
          ObservableTypes.Person,
          ObservableTypes.Organization,
          ObservableTypes.Product // Added Product
        ]
      }
    }]
  }
});
Delete Workflow
await graphlit.deleteWorkflow(workflowId);
Part 7: Production Patterns
Pattern 1: Workflow Library
Create reusable workflows for common scenarios:
// Create once, use many times
const workflows = {
  simple: await graphlit.createWorkflow({ name: "Simple Text" }),
  entities: await graphlit.createWorkflow({ name: "Entity Extraction" }),
  summary: await graphlit.createWorkflow({ name: "With Summary" }),
  medical: await graphlit.createWorkflow({ name: "Medical Entities" })
};
// Use appropriate workflow per content type
function getWorkflowForContent(contentType: string) {
  switch (contentType) {
    case 'medical': return workflows.medical.createWorkflow.id;
    case 'research': return workflows.entities.createWorkflow.id;
    default: return workflows.simple.createWorkflow.id;
  }
}
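If workflows should be created lazily rather than all at startup, "create once, use many times" can be enforced with a small cache. The following is a sketch under stated assumptions: `createWorkflowRegistry` is a hypothetical helper, and the `create` callback stands in for whatever actually creates the workflow and returns its ID. A fake creator is used so the example runs without the SDK.

```typescript
// Resolves to a workflow ID. In real code this might wrap graphlit.createWorkflow.
type CreateWorkflow = (name: string) => Promise<string>;

// Hypothetical helper: wraps a creator so that each named workflow is created
// at most once, even when it is requested several times concurrently.
function createWorkflowRegistry(create: CreateWorkflow) {
  const pending = new Map<string, Promise<string>>();

  return function getId(name: string): Promise<string> {
    let id = pending.get(name);
    if (!id) {
      id = create(name);
      pending.set(name, id);
      // Forget failed attempts so that a later call can retry.
      id.catch(() => pending.delete(name));
    }
    return id;
  };
}

// Fake creator for illustration; it only counts how often it is called.
let created = 0;
const getWorkflowId = createWorkflowRegistry(async (name) => {
  created += 1;
  return `fake-id-for-${name}`;
});

const first = getWorkflowId("Medical Entities");
const second = getWorkflowId("Medical Entities");
// first and second are the same promise, and the creator ran only once.
```

Caching the promise rather than the resolved ID is what prevents two concurrent callers from both triggering a creation.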
Pattern 2: Conditional Workflows
Apply different workflows based on content characteristics:
// visionWorkflowId, audioWorkflowId, and simpleWorkflowId are the IDs of workflows created earlier
async function ingestWithSmartWorkflow(uri: string) {
  // Check file type
  const isPdf = uri.endsWith('.pdf');
  const isAudio = uri.match(/\.(mp3|wav|m4a)$/);
  let workflowId: string;
  if (isPdf) {
    workflowId = visionWorkflowId; // Use vision for PDFs
  } else if (isAudio) {
    workflowId = audioWorkflowId; // Use transcription for audio
  } else {
    workflowId = simpleWorkflowId; // Default
  }
  return graphlit.ingestUri(uri, undefined, undefined, undefined, undefined, { id: workflowId });
}
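URIs in practice often carry query strings, fragments, or uppercase extensions. As a sketch of a classifier that tolerates all three, the hypothetical `classifyUri` helper below returns a label that you would then map to one of your workflow IDs. The labels and the extension list are assumptions that follow the file types used in this pattern.

```typescript
type WorkflowKind = "vision" | "audio" | "simple";

const AUDIO_EXTENSIONS = new Set(["mp3", "wav", "m4a"]);

// Classify a URI by its file extension, ignoring query strings,
// fragments, and letter case.
function classifyUri(uri: string): WorkflowKind {
  // Drop anything after the first "?" or "#".
  const path = uri.split(/[?#]/)[0] ?? uri;
  const match = /\.([a-z0-9]+)$/i.exec(path);
  const extension = (match?.[1] ?? "").toLowerCase();

  if (extension === "pdf") return "vision";
  if (AUDIO_EXTENSIONS.has(extension)) return "audio";
  return "simple";
}

const kind = classifyUri("https://example.com/Report.PDF?token=abc"); // "vision"
```

Extension checks remain a heuristic. If the server reports a content type, that is usually a more reliable signal than the file name.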
Pattern 3: Workflow Monitoring
Track workflow usage:
// After processing, check what was extracted
const content = await graphlit.getContent(contentId);
console.log(`Workflow: ${content.content.workflow?.name}`);
console.log(`Pages extracted: ${content.content.pages?.length}`);
console.log(`Entities extracted: ${content.content.observations?.length}`);
if (content.content.summary) {
  console.log(`Summary generated: ${content.content.summary.length} chars`);
}
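For monitoring at scale, it can help to turn those fields into a structured report rather than log lines. The sketch below assumes only the fields read in the snippet above. `ProcessedContentSketch` is a hypothetical local type, not the SDK's content type, and `reportProcessing` is an illustrative helper.

```typescript
// Hypothetical subset of the content fields read in the monitoring snippet.
interface ProcessedContentSketch {
  workflow?: { name?: string | null } | null;
  pages?: unknown[] | null;
  observations?: unknown[] | null;
  summary?: string | null;
}

interface ProcessingReport {
  workflowName: string;
  pageCount: number;
  observationCount: number;
  summaryChars: number;
  missing: string[]; // outputs that were absent or empty
}

// Summarize what a workflow produced, treating absent fields as zero.
function reportProcessing(content: ProcessedContentSketch): ProcessingReport {
  const pageCount = content.pages?.length ?? 0;
  const observationCount = content.observations?.length ?? 0;
  const summaryChars = content.summary?.length ?? 0;

  const missing: string[] = [];
  if (pageCount === 0) missing.push("pages");
  if (observationCount === 0) missing.push("observations");
  if (summaryChars === 0) missing.push("summary");

  return {
    workflowName: content.workflow?.name ?? "(none)",
    pageCount,
    observationCount,
    summaryChars,
    missing,
  };
}

const report = reportProcessing({
  workflow: { name: "Research Papers" },
  pages: [{}, {}],
  observations: [{}],
});
// report.missing is ["summary"] because no summary was present.
```

An entry in `missing` is not necessarily an error: a workflow without an enrichment stage will never produce a summary, so compare the report against the stages you configured.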
Common Issues & Solutions
Issue: No Text Extracted
Problem: Content indexed but no searchable text.
Solution: Check preparation stage:
// Bad: No preparation for PDF
const workflow = await graphlit.createWorkflow({
  name: "Broken",
  // Missing preparation!
  extraction: { /* ... */ }
});
// Good: Add preparation
const fixed = await graphlit.createWorkflow({
  name: "Fixed",
  preparation: {
    jobs: [{
      connector: {
        type: FilePreparationServiceTypes.ModelDocument
      }
    }]
  },
  extraction: { /* ... */ }
});
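This mistake can be caught before ingestion with a small pre-flight check. The sketch below is hypothetical throughout: `missingPreparation` is not an SDK function, and the list of text extensions is an assumption based on the formats this guide says can skip preparation (plain text, Markdown, HTML).

```typescript
// Extensions treated as plain text here; these can skip preparation.
const TEXT_EXTENSIONS = new Set(["txt", "md", "html", "htm"]);

// Hypothetical minimal view of a workflow definition.
interface WorkflowStagesSketch {
  preparation?: { jobs: unknown[] };
}

// Return true when a file that is not plain text would be processed
// by a workflow that has no preparation jobs.
function missingPreparation(fileName: string, workflow: WorkflowStagesSketch): boolean {
  const extension = (/\.([a-z0-9]+)$/i.exec(fileName)?.[1] ?? "").toLowerCase();
  const isText = TEXT_EXTENSIONS.has(extension);
  const hasPreparation = (workflow.preparation?.jobs.length ?? 0) > 0;
  return !isText && !hasPreparation;
}

const broken = missingPreparation("report.pdf", {}); // true: a PDF with no preparation
```

Files with an unknown or absent extension are flagged as well, which errs on the side of a false warning rather than silently indexing content with no text.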
Issue: Poor Entity Extraction
Problem: Entities not detected or incorrect.
Solutions:
- Use a vision model for PDFs:
preparation: {
  jobs: [{
    connector: {
      type: FilePreparationServiceTypes.ModelDocument // Not .Document
    }
  }]
}
- Check entity types are relevant:
// Bad: Extracting medical entities from business docs
extractedTypes: [ObservableTypes.MedicalCondition]
// Good: Extract relevant types
extractedTypes: [ObservableTypes.Person, ObservableTypes.Organization]
Issue: Slow Processing
Problem: Content takes more than 5 minutes to process.
Solutions:
- Simplify workflow (remove enrichment if not needed)
- Use faster preparation models:
// Slower: Vision model
type: FilePreparationServiceTypes.ModelDocument
// Faster: Traditional OCR
type: FilePreparationServiceTypes.Document
What's Next?
You now know how the three workflow stages fit together and how to configure each one. Next steps:
- Create a workflow library for your use cases
- Optimize model selection (vision vs OCR)
- Monitor extraction quality in production
- Combine with specifications for custom models
Related guides:
- AI Models and Specifications - Choose custom models
- Building Knowledge Graphs - Use extracted entities
- Content Ingestion - Apply workflows during ingestion
Happy processing! ⚙️