Workflows and Processing Pipelines

Configure content processing pipelines with preparation, extraction, and enrichment stages. Learn about workflow creation, model selection, and complex multi-stage pipelines.

Workflows control how Graphlit processes content—from text extraction to entity recognition to enrichment. Think of them as pipelines: content enters, gets transformed through multiple stages, and emerges indexed and searchable with extracted metadata.

This guide covers the three workflow stages (preparation, extraction, enrichment), model selection, entity type configuration, and production patterns. By the end, you'll know how to customize processing for any use case.

What You'll Learn

  • The three workflow stages and when to use each
  • Preparation: Text extraction strategies (OCR, vision models)
  • Extraction: Entity extraction configuration
  • Enrichment: Summarization and generation
  • Model selection by content type
  • Multi-stage complex workflows
  • Production workflow patterns

Prerequisites: A Graphlit project, SDK installed, content to process.

Developer Note: All Graphlit IDs are GUIDs. Example outputs show realistic GUID format.


Part 1: Workflow Architecture

The Three Stages

Every workflow can have up to three stages:

Content → PREPARATION → EXTRACTION → ENRICHMENT → Indexed Content

Preparation: Extract raw text from files (PDFs, images, audio, video)
Extraction: Extract structured data (entities, topics, relationships)
Enrichment: Generate derivatives (summaries, audio, translations)

You can use any combination—one stage, two stages, or all three.
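
As a rough illustration of combining stages, the sketch below builds a workflow input that contains only the stages requested. The `Stage` and `WorkflowSketch` types and the `buildWorkflow` helper are hypothetical stand-ins written for this guide, not SDK types, and the connector strings are placeholders rather than real enum values.

```typescript
// Hypothetical local types; the real SDK input types carry more fields.
interface Stage {
  jobs: { connector: { type: string } }[];
}

interface WorkflowSketch {
  name: string;
  preparation?: Stage;
  extraction?: Stage;
  enrichment?: Stage;
}

interface StageSelection {
  preparation?: string;
  extraction?: string;
  enrichment?: string;
}

// Wrap one connector type in the jobs/connector shape used throughout this guide.
function singleJobStage(connectorType: string): Stage {
  return { jobs: [{ connector: { type: connectorType } }] };
}

// Include a stage only when the caller asked for it; omitted stages stay absent.
function buildWorkflow(name: string, stages: StageSelection): WorkflowSketch {
  const workflow: WorkflowSketch = { name };
  if (stages.preparation) workflow.preparation = singleJobStage(stages.preparation);
  if (stages.extraction) workflow.extraction = singleJobStage(stages.extraction);
  if (stages.enrichment) workflow.enrichment = singleJobStage(stages.enrichment);
  return workflow;
}

// Placeholder connector names, not SDK enum values.
const extractionOnly = buildWorkflow("Text Only", { extraction: "placeholder-extractor" });
const fullPipeline = buildWorkflow("Full Pipeline", {
  preparation: "placeholder-preparer",
  extraction: "placeholder-extractor",
  enrichment: "placeholder-enricher",
});

console.log(Object.keys(extractionOnly));
console.log(Object.keys(fullPipeline));
```

In real code the resulting object would be passed to `createWorkflow` with the SDK enum values shown in the examples that follow.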

Basic Workflow

import { Graphlit } from 'graphlit-client';
import {
  FilePreparationServiceTypes,
  EntityExtractionServiceTypes,
  ObservableTypes
} from 'graphlit-client/dist/generated/graphql-types';

const graphlit = new Graphlit();

const workflow = await graphlit.createWorkflow({
  name: "PDF Processing",
  preparation: {
    jobs: [{
      connector: {
        type: FilePreparationServiceTypes.ModelDocument  // Vision-based extraction
      }
    }]
  },
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelDocument,
        extractedTypes: [
          ObservableTypes.Person,
          ObservableTypes.Organization,
          ObservableTypes.Place
        ]
      }
    }]
  }
});

console.log('Workflow created:', workflow.createWorkflow.id);

// Use workflow with content
const content = await graphlit.ingestUri(
  'https://example.com/document.pdf',
  'Document',
  undefined,
  undefined,
  undefined,
  { id: workflow.createWorkflow.id }
);

Part 2: Preparation Stage

Preparation extracts raw text from files. Without it, non-text content such as PDFs, images, audio, and video has no text to index or search.

Text Files (No Preparation Needed)

// For plain text, markdown, HTML—skip preparation
const workflow = await graphlit.createWorkflow({
  name: "Text Only",
  // No preparation stage
  extraction: {
    jobs: [/* ... */]
  }
});

Document Files (PDFs, Word, Excel)

Two strategies:

1. Traditional OCR (Fast, Lower Quality)

import { FilePreparationServiceTypes } from 'graphlit-client/dist/generated/graphql-types';

const ocrWorkflow = await graphlit.createWorkflow({
  name: "OCR Extraction",
  preparation: {
    jobs: [{
      connector: {
        type: FilePreparationServiceTypes.Document  // Traditional OCR
      }
    }]
  }
});

Good for: Simple PDFs, fast processing
Bad for: Scanned docs, tables, multi-column layouts

2. Vision Models (Slower, Higher Quality)

const visionWorkflow = await graphlit.createWorkflow({
  name: "Vision Extraction",
  preparation: {
    jobs: [{
      connector: {
        type: FilePreparationServiceTypes.ModelDocument  // GPT-4 Vision
      }
    }]
  }
});

Good for: Scanned PDFs, complex tables, handwriting
Bad for: Cost-sensitive applications (uses GPT-4 Vision)
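
The trade-off between the two strategies can be captured in a small decision helper. This is a minimal sketch based only on the "good for / bad for" notes above; the `DocumentTraits` shape and the `choosePreparation` function are hypothetical, and how you detect these traits is left to your application.

```typescript
// "ocr" corresponds to FilePreparationServiceTypes.Document,
// "vision" to FilePreparationServiceTypes.ModelDocument.
type PreparationStrategy = "ocr" | "vision";

// Hypothetical description of a document; populate it however your app can.
interface DocumentTraits {
  scanned: boolean;
  hasComplexTables: boolean;
  hasHandwriting: boolean;
  multiColumn: boolean;
}

// Prefer the cheaper OCR path unless a trait listed above calls for vision.
function choosePreparation(traits: DocumentTraits): PreparationStrategy {
  const needsVision =
    traits.scanned ||
    traits.hasComplexTables ||
    traits.hasHandwriting ||
    traits.multiColumn;
  return needsVision ? "vision" : "ocr";
}

const simplePdf: DocumentTraits = {
  scanned: false,
  hasComplexTables: false,
  hasHandwriting: false,
  multiColumn: false,
};
const scannedInvoice: DocumentTraits = { ...simplePdf, scanned: true, hasComplexTables: true };

console.log(choosePreparation(simplePdf));
console.log(choosePreparation(scannedInvoice));
```

Defaulting to OCR keeps cost down for simple PDFs, and the vision path is reserved for the layouts where OCR is described above as weak.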

Audio/Video Files

const audioWorkflow = await graphlit.createWorkflow({
  name: "Audio Transcription",
  preparation: {
    jobs: [{
      connector: {
        type: FilePreparationServiceTypes.ModelAudio  // Transcribe to text
      }
    }]
  }
});

What it does:

  • Transcribes speech to text
  • Identifies speakers (if multiple)
  • Timestamps segments

Use cases: Podcast processing, meeting transcripts, video content

Email/Message Files

const emailWorkflow = await graphlit.createWorkflow({
  name: "Email Processing",
  preparation: {
    jobs: [{
      connector: {
        type: FilePreparationServiceTypes.Email  // Email-specific parsing
      }
    }]
  }
});

Extracts:

  • Email body (HTML to text)
  • Headers (from, to, subject, date)
  • Attachments
  • Quoted replies

Part 3: Extraction Stage

Extract structured data from text.

Entity Extraction

Configure entity types:

import {
  ObservableTypes,
  EntityExtractionServiceTypes,
  FilePreparationServiceTypes
} from 'graphlit-client/dist/generated/graphql-types';

const entityWorkflow = await graphlit.createWorkflow({
  name: "Entity Extraction",
  preparation: {
    jobs: [{
      connector: {
        type: FilePreparationServiceTypes.ModelDocument
      }
    }]
  },
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelDocument,
        extractedTypes: [
          // General types
          ObservableTypes.Person,
          ObservableTypes.Organization,
          ObservableTypes.Place,
          ObservableTypes.Event,
          ObservableTypes.Product,
          ObservableTypes.CreativeWork
        ]
      }
    }]
  }
});

Medical entity types:

extraction: {
  jobs: [{
    connector: {
      type: EntityExtractionServiceTypes.ModelDocument,
      extractedTypes: [
        ObservableTypes.MedicalCondition,
        ObservableTypes.MedicalProcedure,
        ObservableTypes.MedicalTreatment,
        ObservableTypes.MedicalDrug,
        ObservableTypes.MedicalDevice,
        ObservableTypes.MedicalTest
      ]
    }
  }]
}
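
One way to keep type lists like these manageable is to group them into named presets and merge the ones a workflow needs. The sketch below uses plain strings that mirror the `ObservableTypes` member names shown above; `ENTITY_PRESETS` and `mergePresets` are hypothetical helpers, and in real code the arrays would hold the SDK enum values instead.

```typescript
// Strings mirror the ObservableTypes member names used in this guide.
const ENTITY_PRESETS = {
  general: ["Person", "Organization", "Place", "Event", "Product", "CreativeWork"],
  medical: [
    "MedicalCondition",
    "MedicalProcedure",
    "MedicalTreatment",
    "MedicalDrug",
    "MedicalDevice",
    "MedicalTest",
  ],
  people: ["Person", "Organization"],
} as const;

type PresetName = keyof typeof ENTITY_PRESETS;

// Merge presets in order, dropping duplicates while keeping first-seen order.
function mergePresets(...names: PresetName[]): string[] {
  const merged = new Set<string>();
  for (const name of names) {
    for (const entityType of ENTITY_PRESETS[name]) {
      merged.add(entityType);
    }
  }
  return Array.from(merged);
}

const clinicalTypes = mergePresets("people", "medical");
console.log(clinicalTypes);
```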

Multi-Pass Extraction

Run multiple extraction jobs:

extraction: {
  jobs: [
    // Pass 1: General entities
    {
      connector: {
        type: EntityExtractionServiceTypes.ModelDocument,
        extractedTypes: [
          ObservableTypes.Person,
          ObservableTypes.Organization
        ]
      }
    },
    // Pass 2: Medical entities (if needed)
    {
      connector: {
        type: EntityExtractionServiceTypes.ModelDocument,
        extractedTypes: [
          ObservableTypes.MedicalCondition,
          ObservableTypes.MedicalProcedure
        ]
      }
    }
  ]
}

When to use: Documents with both general and domain-specific entities (e.g., medical reports mentioning doctors and treatments)
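
The multi-pass shape above is repetitive enough that it can be generated from a list of type groups. The following is a sketch under that assumption: `buildExtractionPasses` and the `ExtractionJob` type are hypothetical, and the connector type and entity names are passed as plain strings standing in for the SDK enum values.

```typescript
// Hypothetical local type matching the job shape shown above.
interface ExtractionJob {
  connector: { type: string; extractedTypes: string[] };
}

// Create one extraction job per non-empty group of entity types.
function buildExtractionPasses(connectorType: string, passes: string[][]): ExtractionJob[] {
  return passes
    .filter((types) => types.length > 0)
    .map((types) => ({
      connector: { type: connectorType, extractedTypes: [...types] },
    }));
}

const jobs = buildExtractionPasses("placeholder-extractor", [
  ["Person", "Organization"],               // Pass 1: general entities
  [],                                       // Empty groups are skipped
  ["MedicalCondition", "MedicalProcedure"], // Pass 2: medical entities
]);

console.log(JSON.stringify({ extraction: { jobs } }, null, 2));
```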


Part 4: Enrichment Stage

Generate derivatives from content.

Summarization

import {
  EnrichmentServiceTypes,
  FilePreparationServiceTypes
} from 'graphlit-client/dist/generated/graphql-types';

const summaryWorkflow = await graphlit.createWorkflow({
  name: "Summarization",
  preparation: {
    jobs: [{
      connector: {
        type: FilePreparationServiceTypes.ModelDocument
      }
    }]
  },
  enrichment: {
    jobs: [{
      connector: {
        type: EnrichmentServiceTypes.ModelSummarization,
        prompt: "Summarize this document in 3-5 sentences."
      }
    }]
  }
});

Use cases:

  • Document previews
  • Executive summaries
  • Email digests

Text-to-Speech

enrichment: {
  jobs: [{
    connector: {
      type: EnrichmentServiceTypes.ModelAudioGeneration,
      voice: "alloy"  // ElevenLabs or OpenAI voice
    }
  }]
}

Generates: Audio version of text content

Image Generation

enrichment: {
  jobs: [{
    connector: {
      type: EnrichmentServiceTypes.ModelImageGeneration,
      prompt: "Create a visual representation of this concept"
    }
  }]
}

Part 5: Complete Workflow Examples

Example 1: Research Paper Processing

const researchWorkflow = await graphlit.createWorkflow({
  name: "Research Papers",
  preparation: {
    jobs: [{
      connector: {
        type: FilePreparationServiceTypes.ModelDocument  // Vision for tables/diagrams
      }
    }]
  },
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelDocument,
        extractedTypes: [
          ObservableTypes.Person,          // Authors, researchers
          ObservableTypes.Organization,    // Universities, labs
          ObservableTypes.CreativeWork     // Citations
        ]
      }
    }]
  },
  enrichment: {
    jobs: [{
      connector: {
        type: EnrichmentServiceTypes.ModelSummarization,
        prompt: "Summarize the key findings, methodology, and conclusions."
      }
    }]
  }
});

Example 2: Customer Support Emails

const supportWorkflow = await graphlit.createWorkflow({
  name: "Support Emails",
  preparation: {
    jobs: [{
      connector: {
        type: FilePreparationServiceTypes.Email
      }
    }]
  },
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelText,
        extractedTypes: [
          ObservableTypes.Person,        // Customer names
          ObservableTypes.Organization,  // Company names
          ObservableTypes.Product        // Products mentioned
        ]
      }
    }]
  }
});

Example 3: Meeting Recordings

const meetingWorkflow = await graphlit.createWorkflow({
  name: "Meeting Transcripts",
  preparation: {
    jobs: [{
      connector: {
        type: FilePreparationServiceTypes.ModelAudio  // Transcribe
      }
    }]
  },
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelText,
        extractedTypes: [
          ObservableTypes.Person,  // Attendees, mentioned people
          ObservableTypes.Event,   // Mentioned meetings, deadlines
          ObservableTypes.Product  // Products discussed
        ]
      }
    }]
  },
  enrichment: {
    jobs: [{
      connector: {
        type: EnrichmentServiceTypes.ModelSummarization,
        prompt: "Summarize: meeting topic, key decisions, action items, attendees."
      }
    }]
  }
});

Part 6: Workflow Management

Query Workflows

const workflows = await graphlit.queryWorkflows();

workflows.workflows.results.forEach(workflow => {
  console.log(`${workflow.name}:`);
  console.log(`  Preparation: ${workflow.preparation ? 'Yes' : 'No'}`);
  console.log(`  Extraction: ${workflow.extraction ? 'Yes' : 'No'}`);
  console.log(`  Enrichment: ${workflow.enrichment ? 'Yes' : 'No'}`);
});

Update Workflow

await graphlit.updateWorkflow(workflowId, {
  name: 'Updated Name',
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelDocument,
        extractedTypes: [
          ObservableTypes.Person,
          ObservableTypes.Organization,
          ObservableTypes.Product  // Added Product
        ]
      }
    }]
  }
});

Delete Workflow

await graphlit.deleteWorkflow(workflowId);

Part 7: Production Patterns

Pattern 1: Workflow Library

Create reusable workflows for common scenarios:

// Create once, use many times (stage configuration omitted here for brevity)
const workflows = {
  simple: await graphlit.createWorkflow({ name: "Simple Text" }),
  entities: await graphlit.createWorkflow({ name: "Entity Extraction" }),
  summary: await graphlit.createWorkflow({ name: "With Summary" }),
  medical: await graphlit.createWorkflow({ name: "Medical Entities" })
};

// Use appropriate workflow per content type
function getWorkflowForContent(contentType: string) {
  switch (contentType) {
    case 'medical': return workflows.medical.createWorkflow.id;
    case 'research': return workflows.entities.createWorkflow.id;
    default: return workflows.simple.createWorkflow.id;
  }
}

Pattern 2: Conditional Workflows

Apply different workflows based on content characteristics:

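// Assumes visionWorkflowId, audioWorkflowId, and simpleWorkflowId hold the IDs
// of workflows created earlier (for example, from a workflow library).
async function ingestWithSmartWorkflow(uri: string) {
  // Check file type by extension
  const isPdf = uri.endsWith('.pdf');
  const isAudio = /\.(mp3|wav|m4a)$/.test(uri);

  let workflowId: string;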
  if (isPdf) {
    workflowId = visionWorkflowId;  // Use vision for PDFs
  } else if (isAudio) {
    workflowId = audioWorkflowId;   // Use transcription for audio
  } else {
    workflowId = simpleWorkflowId;  // Default
  }
  
  return graphlit.ingestUri(uri, undefined, undefined, undefined, undefined, { id: workflowId });
}
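
Matching on the raw URI string misses uppercase extensions and URIs that carry a query string or fragment. The sketch below shows one possible way to normalize the extension before routing; `fileExtension`, `workflowKindForUri`, and the `WorkflowKind` labels are hypothetical names, and mapping each label to a real workflow ID is left to the caller.

```typescript
type WorkflowKind = "vision" | "audio" | "simple";

const AUDIO_EXTENSIONS = ["mp3", "wav", "m4a"];

// Return the lowercase extension of the last path segment, ignoring ?query and #fragment.
function fileExtension(uri: string): string {
  const path = uri.split(/[?#]/)[0] ?? "";
  const lastSegment = path.split("/").pop() ?? "";
  const dot = lastSegment.lastIndexOf(".");
  return dot > 0 ? lastSegment.slice(dot + 1).toLowerCase() : "";
}

// Same routing rules as above: PDFs use vision, audio uses transcription, else default.
function workflowKindForUri(uri: string): WorkflowKind {
  const ext = fileExtension(uri);
  if (ext === "pdf") return "vision";
  if (AUDIO_EXTENSIONS.indexOf(ext) !== -1) return "audio";
  return "simple";
}

console.log(workflowKindForUri("https://example.com/files/REPORT.PDF?download=1"));
console.log(workflowKindForUri("https://example.com/audio/episode.m4a#t=30"));
console.log(workflowKindForUri("https://example.com/notes.md"));
```

Extension checks remain a heuristic. If the server reports a content type, that is usually a more reliable signal than the file name.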

Pattern 3: Workflow Monitoring

Track workflow usage:

// After processing, check what was extracted
const content = await graphlit.getContent(contentId);

console.log(`Workflow: ${content.content.workflow?.name}`);
console.log(`Pages extracted: ${content.content.pages?.length}`);
console.log(`Entities extracted: ${content.content.observations?.length}`);

if (content.content.summary) {
  console.log(`Summary generated: ${content.content.summary.length} chars`);
}
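
For monitoring across many items, the same fields can be folded into a small report with warnings. This is a sketch only: `ProcessedContent` is a hypothetical subset of the content fields used above (`workflow`, `pages`, `observations`, `summary`), and the warning rules are assumptions that fit document content, since audio or plain text may legitimately have no pages.

```typescript
// Hypothetical subset of the content fields read in the snippet above.
interface ProcessedContent {
  workflow?: { name: string } | null;
  pages?: unknown[] | null;
  observations?: unknown[] | null;
  summary?: string | null;
}

interface ExtractionReport {
  workflowName: string;
  pageCount: number;
  entityCount: number;
  summaryLength: number;
  warnings: string[];
}

// Summarize what a workflow produced and flag stages that returned nothing.
function buildExtractionReport(content: ProcessedContent): ExtractionReport {
  const pageCount = content.pages?.length ?? 0;
  const entityCount = content.observations?.length ?? 0;
  const summaryLength = content.summary?.length ?? 0;

  const warnings: string[] = [];
  if (pageCount === 0) warnings.push("No pages reported; check the preparation stage.");
  if (entityCount === 0) warnings.push("No entities reported; check the extraction stage.");

  return {
    workflowName: content.workflow?.name ?? "(none)",
    pageCount,
    entityCount,
    summaryLength,
    warnings,
  };
}

// Sample data shaped like a processed PDF with no extracted entities.
const sample: ProcessedContent = {
  workflow: { name: "PDF Processing" },
  pages: [{}, {}],
  observations: [],
  summary: "Short summary.",
};

const report = buildExtractionReport(sample);
console.log(report);
```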

Common Issues & Solutions

Issue: No Text Extracted

Problem: Content indexed but no searchable text.

Solution: Check preparation stage:

// Bad: No preparation for PDF
const workflow = await graphlit.createWorkflow({
  name: "Broken",
  // Missing preparation!
  extraction: { /* ... */ }
});

// Good: Add preparation
const fixed = await graphlit.createWorkflow({
  name: "Fixed",
  preparation: {
    jobs: [{
      connector: {
        type: FilePreparationServiceTypes.ModelDocument
      }
    }]
  },
  extraction: { /* ... */ }
});
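
A small pre-flight check can catch this before ingestion. The sketch below is hypothetical: `missingPreparationWarning` and the list of text extensions are assumptions made for illustration (based on the plain text, Markdown, and HTML case described earlier), not part of the SDK.

```typescript
// Assumed set of formats that can skip preparation; adjust for your content.
const TEXT_EXTENSIONS = ["txt", "md", "html", "htm"];

// Hypothetical minimal view of a workflow configuration.
interface WorkflowLike {
  name: string;
  preparation?: unknown;
}

// Return a warning when a non-text file would be ingested without a preparation stage.
function missingPreparationWarning(workflow: WorkflowLike, extension: string): string | null {
  const ext = extension.toLowerCase().replace(/^\./, "");
  const isPlainText = TEXT_EXTENSIONS.indexOf(ext) !== -1;
  if (isPlainText || workflow.preparation != null) return null;
  return `Workflow "${workflow.name}" has no preparation stage, so .${ext} files may yield no searchable text.`;
}

const brokenConfig: WorkflowLike = { name: "Broken" };
const fixedConfig: WorkflowLike = { name: "Fixed", preparation: { jobs: [] } };

console.log(missingPreparationWarning(brokenConfig, ".pdf"));
console.log(missingPreparationWarning(fixedConfig, ".pdf"));
```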

Issue: Poor Entity Extraction

Problem: Entities not detected or incorrect.

Solutions:

  1. Use vision model for PDFs:
preparation: {
  jobs: [{
    connector: {
      type: FilePreparationServiceTypes.ModelDocument  // Not .Document
    }
  }]
}
  2. Check entity types are relevant:
// Bad: Extracting medical entities from business docs
extractedTypes: [ObservableTypes.MedicalCondition]

// Good: Extract relevant types
extractedTypes: [ObservableTypes.Person, ObservableTypes.Organization]

Issue: Slow Processing

Problem: Content takes > 5 minutes to process.

Solutions:

  1. Simplify workflow (remove enrichment if not needed)
  2. Use faster preparation models:
// Slower: Vision model
type: FilePreparationServiceTypes.ModelDocument

// Faster: Traditional OCR
type: FilePreparationServiceTypes.Document

What's Next?

You have now seen how the preparation, extraction, and enrichment stages fit together. Next steps:

  1. Create a workflow library for your use cases
  2. Optimize model selection (vision vs OCR)
  3. Monitor extraction quality in production
  4. Combine with specifications for custom models

Happy processing! ⚙️