Building Knowledge Graphs: From Zero to Production

Knowledge graphs transform unstructured content—PDFs, emails, meeting transcripts, Slack messages—into queryable networks of entities and relationships. Instead of searching for keywords, you can ask "Who works at which company?" or "What topics were discussed in meetings about Project Apollo?"

This guide takes you from your first entity extraction to production knowledge graphs serving real applications. We'll cover the Observable/Observation model, entity types, extraction workflows, and advanced querying patterns—with complete code examples.

What You'll Build

By the end of this guide, you'll know how to:

Extract entities (people, companies, places) from any content type
Configure extraction workflows for your use case
Understand the Observable/Observation architecture
Query entities and relationships
Build entity-filtered search and RAG systems
Handle deduplication, confidence scores, and edge cases
Scale to production with multi-content knowledge graphs

Prerequisites:

A Graphlit project (free tier works) - Sign up (2 min)
SDK installed: npm install graphlit-client (30 sec)
Some content to process (we'll use a public PDF)

Time to complete: 90 minutes
Difficulty: Intermediate

Developer Note: All Graphlit IDs are GUIDs (e.g., 550e8400-e29b-41d4-a716-446655440000). In code examples below, we use short placeholders like content-123 for readability where they're variables that would be populated at runtime. Example outputs show realistic GUID format.

Table of Contents

Your First Knowledge Graph (15 min)
Understanding Observable/Observation Model
Entity Types and Extraction Strategies
Querying Your Knowledge Graph
Different Content Types
Production Patterns
Advanced Querying

Part 1: Your First Knowledge Graph (15 minutes)

Let's start with the simplest possible example: extract entities from a single PDF.

Step 1: Create an Extraction Workflow

Workflows tell Graphlit how to process content. For knowledge graphs, you need an extraction stage:

import { Graphlit } from 'graphlit-client';
import {
  FilePreparationServiceTypes,
  EntityExtractionServiceTypes,
  ObservableTypes
} from 'graphlit-client/dist/generated/graphql-types';

const graphlit = new Graphlit();

// Create workflow with entity extraction
const workflow = await graphlit.createWorkflow({
  name: "PDF Entity Extraction",
  preparation: {
    jobs: [{
      connector: {
        type: FilePreparationServiceTypes.ModelDocument  // Uses vision model for PDFs
      }
    }]
  },
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelDocument,
        extractedTypes: [
          ObservableTypes.Person,
          ObservableTypes.Organization,
          ObservableTypes.Place,
          ObservableTypes.Event
        ]
      }
    }]
  }
});

console.log(`✓ Workflow created: ${workflow.createWorkflow.id}`);

What's happening:

preparation stage: Vision model extracts text from PDF pages (handles scans, tables, multi-column layouts)
extraction stage: LLM reads extracted text and identifies entities
extractedTypes: Only extract these 4 entity types (there are 12 medical types and 7 general types available)

💡 Pro Tip: Use FilePreparationServiceTypes.ModelDocument (vision model) for scanned PDFs—3x more accurate than OCR but costs 2x more. For text-based PDFs, use FilePreparationServiceTypes.Document (OCR) to save costs.

Step 2: Ingest Content with the Workflow

// Ingest a research paper
const content = await graphlit.ingestUri(
  'https://arxiv.org/pdf/2301.00001.pdf',
  "AI Research Paper",
  undefined,
  undefined,
  undefined,
  { id: workflow.createWorkflow.id }  // Use our extraction workflow
);

console.log(`✓ Ingesting: ${content.ingestUri.id}`);

// Wait for processing to complete
let isDone = false;
while (!isDone) {
  const status = await graphlit.isContentDone(content.ingestUri.id);
  isDone = status.isContentDone.result;
  
  if (!isDone) {
    await new Promise(resolve => setTimeout(resolve, 2000));
  }
}

console.log('✓ Extraction complete!');

Developer hint: isContentDone polls processing status. For production, use webhooks instead.

⚠️ Warning: Polling in tight loops will hit rate limits. Add 2-5 second delays between checks, or better yet, use webhooks for production apps.
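
If you do poll, it helps to cap the total wait and back off between checks. The helper below is a minimal sketch of that idea, not part of the Graphlit SDK: `pollUntilDone`, `PollOptions`, and the default delays are all hypothetical names and values chosen for illustration.

```typescript
// Hypothetical helper (not a Graphlit SDK function): polls with capped
// exponential backoff and gives up after a timeout.
interface PollOptions {
  initialDelayMs?: number;  // wait before the second check
  maxDelayMs?: number;      // upper bound for the backoff delay
  timeoutMs?: number;       // total time budget before giving up
}

async function pollUntilDone(
  check: () => Promise<boolean>,
  { initialDelayMs = 2000, maxDelayMs = 10000, timeoutMs = 300000 }: PollOptions = {}
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  let delay = initialDelayMs;

  while (true) {
    if (await check()) return;

    // Stop if the next wait would run past the deadline
    if (Date.now() + delay > deadline) {
      throw new Error(`Timed out after ${timeoutMs} ms waiting for processing`);
    }

    await new Promise<void>(resolve => setTimeout(resolve, delay));
    delay = Math.min(delay * 2, maxDelayMs);  // double the delay, up to the cap
  }
}
```

With the client from Step 1, the `check` callback would wrap the same call used in the loop above, for example `async () => Boolean((await graphlit.isContentDone(id)).isContentDone.result)`.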

Step 3: Retrieve Extracted Entities

// Get content with observations (entity mentions)
const contentDetails = await graphlit.getContent(content.ingestUri.id);
const observations = contentDetails.content.observations || [];

console.log(`Found ${observations.length} entity observations`);

// Group by entity type
const byType = new Map<string, Set<string>>();

observations.forEach(obs => {
  if (!byType.has(obs.type)) {
    byType.set(obs.type, new Set());
  }
  byType.get(obs.type)!.add(obs.observable.name);
});

// Display results
byType.forEach((entities, type) => {
  console.log(`\n${type} (${entities.size} unique):`);
  Array.from(entities).slice(0, 5).forEach(name => {
    console.log(`  - ${name}`);
  });
});

Example output:

PERSON (23 unique):
  - Geoffrey Hinton
  - Yann LeCun
  - Yoshua Bengio
  - Andrew Ng
  - Fei-Fei Li

ORGANIZATION (12 unique):
  - Google
  - OpenAI
  - Stanford University
  - MIT
  - DeepMind

🎉 Congratulations! You just built your first knowledge graph. But this is just the beginning—let's understand what's actually happening under the hood.

Part 2: Understanding the Observable/Observation Model

This is the most important concept in Graphlit knowledge graphs. Get this, and everything else makes sense.

The Two-Tier Architecture

Observable = The entity itself (e.g., the person "Geoffrey Hinton")
Observation = A specific mention of that entity in content

Why two layers?

Deduplication: "Geoffrey Hinton" mentioned 50 times across 10 PDFs = 1 Observable, 50 Observations
Provenance: Track exactly where each mention appears (page 3, paragraph 2, coordinates)
Confidence: Each mention has its own confidence score
Relationships: Find co-occurrences (who was mentioned with whom, on which pages)

Data Flow Diagram

PDF Ingestion
    ↓
Vision Model Extracts Text (Preparation)
    ↓
LLM Identifies Entities (Extraction)
    ↓
For Each Entity Mention:
  ├─ Create Observation
  │  ├─ Type (PERSON, ORGANIZATION, etc.)
  │  ├─ Confidence (0.0-1.0)
  │  ├─ Page number & coordinates
  │  └─ Text context
  ↓
Entity Resolution (Automatic Deduplication)
  ├─ Is this entity already in the graph?
  ├─ Match by name, properties
  └─ Create new Observable OR link to existing
  ↓
Knowledge Graph Updated
  └─ Observable now has N observations
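
To make the flow above concrete, here is a small in-memory sketch of the two-tier idea. It is an illustration only: `TinyGraph`, `Mention`, and `ObservableNode` are hypothetical names, and matching on a normalized name plus type is an assumed stand-in for whatever resolution logic Graphlit actually applies.

```typescript
// Illustrative types only; the real Graphlit schema has more fields.
interface Mention {
  name: string;        // entity name as it appears in the text
  type: string;        // e.g. "PERSON"
  contentId: string;   // which content item the mention came from
  pageIndex: number;
  confidence: number;  // 0.0-1.0
}

interface ObservableNode {
  id: string;
  name: string;
  type: string;
  observations: Mention[];  // every mention linked to this entity
}

class TinyGraph {
  private byKey = new Map<string, ObservableNode>();

  // Assumed resolution key: type + case-insensitive, whitespace-normalized name
  private static key(m: Mention): string {
    return `${m.type}:${m.name.trim().replace(/\s+/g, ' ').toLowerCase()}`;
  }

  // Link the mention to an existing entity, or create a new one
  addMention(m: Mention): ObservableNode {
    const key = TinyGraph.key(m);
    let node = this.byKey.get(key);
    if (!node) {
      node = {
        id: `observable-${this.byKey.size + 1}`,
        name: m.name.trim(),
        type: m.type,
        observations: []
      };
      this.byKey.set(key, node);
    }
    node.observations.push(m);
    return node;
  }

  listObservables(): ObservableNode[] {
    return Array.from(this.byKey.values());
  }
}
```

Adding the same person from two different content items returns the same node both times, which is the deduplication and provenance behavior the diagram describes.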

Code Example: Observations vs Observables

// Get observations from a single content item
const content = await graphlit.getContent('content-123');

content.content.observations?.forEach(observation => {
  console.log(`\nObservation ID: ${observation.id}`);
  console.log(`  Entity Type: ${observation.type}`);
  console.log(`  Entity Name: ${observation.observable.name}`);
  console.log(`  Observable ID: ${observation.observable.id}`);  // The entity itself
  
  // Where was this entity mentioned?
  observation.occurrences?.forEach(occ => {
    console.log(`    Page ${occ.pageIndex}, confidence: ${occ.confidence}`);
    if (occ.boundingBox) {
      console.log(`    Location: (${occ.boundingBox.left}, ${occ.boundingBox.top})`);
    }
  });
});

Example output:

Observation ID: 8c3e2f1a-4b5d-6e7f-8a9b-0c1d2e3f4a5b
  Entity Type: PERSON
  Entity Name: Geoffrey Hinton
  Observable ID: a1b2c3d4-e5f6-7890-abcd-ef1234567890  ← The actual entity
    Page 3, confidence: 0.95
    Location: (120, 450)
    Page 12, confidence: 0.89
    Location: (340, 200)

Now query the Observable (entity) directly:

// Get the entity itself with all its mentions across ALL content
const entityResult = await graphlit.queryObservables({
  observables: [{ id: 'a1b2c3d4-e5f6-7890-abcd-ef1234567890' }]
});

const entity = entityResult.observables?.results?.[0];

console.log(`Entity: ${entity?.observable.name}`);
console.log(`Type: ${entity?.type}`);
console.log(`Mentioned in ${entity?.observable.observationCount} places`);

Key insight: Observations are scoped to single content items. Observables span your entire knowledge graph.

✅ Quick Win: Once you understand this model, you can build entity-filtered search and RAG chatbots that answer questions like "What did Alice say about Project Phoenix?"

Part 3: Entity Types and Extraction Strategies

Graphlit supports 19 built-in entity types across two categories:

General Entity Types (7)

import { ObservableTypes } from 'graphlit-client/dist/generated/graphql-types';

const generalTypes = [
  ObservableTypes.Person,           // People, authors, speakers
  ObservableTypes.Organization,     // Companies, institutions, agencies
  ObservableTypes.Place,            // Cities, countries, addresses
  ObservableTypes.Event,            // Meetings, conferences, incidents
  ObservableTypes.Product,          // Software, devices, offerings
  ObservableTypes.CreativeWork,     // Books, papers, articles
  ObservableTypes.Other             // Catch-all for domain-specific entities
];

Medical Entity Types (12)

For healthcare, research, and clinical applications:

const medicalTypes = [
  ObservableTypes.MedicalCondition,      // Diseases, symptoms, diagnoses
  ObservableTypes.MedicalProcedure,      // Surgeries, treatments, therapies
  ObservableTypes.MedicalTest,           // Labs, imaging, diagnostics
  ObservableTypes.MedicalTreatment,      // Drugs, protocols, interventions
  ObservableTypes.MedicalAnatomy,        // Organs, body parts, systems
  ObservableTypes.MedicalDevice,         // Equipment, implants, instruments
  ObservableTypes.MedicalGuideline,      // Protocols, standards, best practices
  ObservableTypes.MedicalStudy,          // Clinical trials, research papers
  ObservableTypes.MedicalMeasurement,    // Vital signs, lab values, metrics
  ObservableTypes.MedicalCode,           // ICD-10, CPT codes
  ObservableTypes.MedicalQuality,        // Severity descriptors
  ObservableTypes.MedicalDrug            // Pharmaceuticals, medications
];

Choosing Entity Types: Decision Matrix

Use Case	Entity Types	Why
Corporate docs (emails, reports)	Person, Organization, Place, Event	Track who works where, meeting attendees
Research papers	Person, Organization, CreativeWork	Authors, citations, institutions
Customer support	Person, Organization, Product, Event	Customer names, companies, products discussed, incidents
Medical records	All medical types + Person, Place	Patient data, treatments, locations
Legal documents	Person, Organization, Place, Event	Parties, companies, jurisdictions, dates
News/Social media	Person, Organization, Place, Event, Product	Who, what, where, when
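
One way to keep these choices in one place is a small lookup that merges the rows you need. This is a sketch under assumptions: the `UseCase` keys and `entityTypesFor` are hypothetical, and type names are plain strings so the snippet runs without the SDK (in a real workflow you would map them to `ObservableTypes` members). The medical row is left out because it expands to all medical types.

```typescript
type UseCase = 'corporate' | 'research' | 'support' | 'legal' | 'news';

// Mirrors the decision matrix; plain strings stand in for ObservableTypes members
const ENTITY_TYPES_BY_USE_CASE: Record<UseCase, readonly string[]> = {
  corporate: ['Person', 'Organization', 'Place', 'Event'],
  research: ['Person', 'Organization', 'CreativeWork'],
  support: ['Person', 'Organization', 'Product', 'Event'],
  legal: ['Person', 'Organization', 'Place', 'Event'],
  news: ['Person', 'Organization', 'Place', 'Event', 'Product']
};

// Union of entity types for one or more use cases, without duplicates
function entityTypesFor(...useCases: UseCase[]): string[] {
  const merged = new Set<string>();
  for (const useCase of useCases) {
    for (const type of ENTITY_TYPES_BY_USE_CASE[useCase]) {
      merged.add(type);
    }
  }
  return Array.from(merged);
}
```

For a project that mixes research papers and support tickets, `entityTypesFor('research', 'support')` yields a single deduplicated list to pass as `extractedTypes`.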

Configuring Multi-Type Extraction

// Comprehensive entity extraction workflow
const comprehensiveWorkflow = await graphlit.createWorkflow({
  name: "Comprehensive Entity Extraction",
  preparation: {
    jobs: [{
      connector: {
        type: FilePreparationServiceTypes.ModelDocument
      }
    }]
  },
  extraction: {
    jobs: [
      // First pass: General entities
      {
        connector: {
          type: EntityExtractionServiceTypes.ModelDocument,
          extractedTypes: [
            ObservableTypes.Person,
            ObservableTypes.Organization,
            ObservableTypes.Place,
            ObservableTypes.Event,
            ObservableTypes.Product,
            ObservableTypes.CreativeWork
          ]
        }
      },
      // Second pass: Medical entities (if applicable)
      {
        connector: {
          type: EntityExtractionServiceTypes.ModelDocument,
          extractedTypes: [
            ObservableTypes.MedicalCondition,
            ObservableTypes.MedicalProcedure,
            ObservableTypes.MedicalTreatment,
            ObservableTypes.MedicalDrug
          ]
        }
      }
    ]
  }
});

Developer hint: Multiple extraction jobs run in parallel. Use this for medical + general extraction or different models per entity type.

Part 4: Querying Your Knowledge Graph

Now that you have entities, let's query them like a graph database.

Basic Queries: Get All Entities of a Type

import { EntityState } from 'graphlit-client/dist/generated/graphql-types';

// Get all people in your knowledge graph
const people = await graphlit.queryObservables({
  filter: {
    types: [ObservableTypes.Person],
    states: [EntityState.Enabled]  // Exclude disabled/deleted entities
  }
});

console.log(`Total people: ${people.observables?.results?.length}`);

people.observables?.results?.forEach(result => {
  const person = result.observable;
  console.log(`- ${person.name} (mentioned ${person.observationCount} times)`);
});

Filtering by Name (Search Entities)

// Find all organizations with "Research" in the name
const researchOrgs = await graphlit.queryObservables({
  filter: {
    types: [ObservableTypes.Organization],
    searchText: "Research",  // Fuzzy name matching
    states: [EntityState.Enabled]
  }
});

researchOrgs.observables?.results?.forEach(result => {
  console.log(result.observable.name);
});
// Output: "OpenAI Research", "Google Research", "MIT Research Lab", etc.

Advanced: Find Entity Relationships (Co-occurrence)

Entities that appear on the same page are likely related. Let's find person-organization relationships:

// Get content with observations
const content = await graphlit.getContent('content-123');
const observations = content.content.observations || [];

// Build co-occurrence matrix
const relationships: Array<{
  person: string;
  organization: string;
  pages: number[];
}> = [];

observations
  .filter(obs => obs.type === ObservableTypes.Person)
  .forEach(personObs => {
    const personPages = new Set(
      personObs.occurrences?.map(occ => occ.pageIndex) || []
    );
    
    observations
      .filter(obs => obs.type === ObservableTypes.Organization)
      .forEach(orgObs => {
        const orgPages = new Set(
          orgObs.occurrences?.map(occ => occ.pageIndex) || []
        );
        
        // Find shared pages
        const sharedPages = Array.from(personPages).filter(p => orgPages.has(p));
        
        if (sharedPages.length > 0) {
          relationships.push({
            person: personObs.observable.name,
            organization: orgObs.observable.name,
            pages: sharedPages
          });
        }
      });
  });

// Display top relationships
relationships
  .sort((a, b) => b.pages.length - a.pages.length)
  .slice(0, 10)
  .forEach(rel => {
    console.log(`${rel.person} ↔ ${rel.organization}`);
    console.log(`  Co-occurs on ${rel.pages.length} pages: ${rel.pages.join(', ')}`);
  });

Example output:

Geoffrey Hinton ↔ Google
  Co-occurs on 8 pages: 3, 7, 12, 15, 18, 23, 29, 31

Yann LeCun ↔ Meta
  Co-occurs on 5 pages: 4, 9, 14, 20, 27

Entity-Filtered Search (RAG)

Use entities to filter search results—find content mentioning specific people or companies:

// Search for content mentioning both "Geoffrey Hinton" AND "neural networks"
const searchResults = await graphlit.searchContents(
  'neural networks',
  {
    filters: [
      {
        observations: {
          observables: [
            { id: 'entity-person-123' }  // Geoffrey Hinton's entity ID
          ]
        }
      }
    ]
  }
);

searchResults.results?.forEach(result => {
  console.log(`${result.name} - Score: ${result.score}`);
});

Use case: "Show me all emails where Alice mentioned Project Phoenix" or "Find meeting transcripts with Bob and the CFO".

Part 5: Building Knowledge Graphs from Different Content Types

The workflow pattern stays the same, but extraction strategies differ by content type.

Emails (Gmail, Outlook)

import { FeedTypes, FeedServiceTypes } from 'graphlit-client/dist/generated/graphql-types';

// Create email-optimized workflow
const emailWorkflow = await graphlit.createWorkflow({
  name: "Email Entity Extraction",
  preparation: {
    jobs: [{
      connector: {
        type: FilePreparationServiceTypes.Email  // Email-specific parsing
      }
    }]
  },
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelText,  // Emails are pure text
        extractedTypes: [
          ObservableTypes.Person,        // Senders, recipients, mentioned people
          ObservableTypes.Organization,  // Companies, clients
          ObservableTypes.Place,         // Office locations, meeting venues
          ObservableTypes.Event          // Meetings, deadlines
        ]
      }
    }]
  }
});

// Create Gmail feed with workflow
const feed = await graphlit.createFeed({
  name: 'Gmail Entity Extraction',
  type: FeedTypes.Email,
  email: {
    type: FeedServiceTypes.GoogleEmail,
    google: {
      refreshToken: refresh_token  // OAuth refresh token from your Google auth flow (not a short-lived access token)
    },
    readLimit: 100  // Last 100 emails
  },
  workflow: { id: emailWorkflow.createWorkflow.id }
});

What you get: Automatic contact extraction, company mentions, meeting locations—queryable as a knowledge graph.

Slack Messages

// Slack-optimized workflow
const slackWorkflow = await graphlit.createWorkflow({
  name: "Slack Entity Extraction",
  preparation: {
    jobs: [{
      connector: {
        type: FilePreparationServiceTypes.Message
      }
    }]
  },
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelText,
        extractedTypes: [
          ObservableTypes.Person,
          ObservableTypes.Organization,
          ObservableTypes.Product,
          ObservableTypes.Event
        ]
      }
    }]
  }
});

// Create Slack feed for #general channel
const slackFeed = await graphlit.createFeed({
  name: 'Slack Channel Entities',
  type: FeedTypes.Slack,
  slack: {
    token: slack_token,
    channel: 'general'  // Single channel
  },
  workflow: { id: slackWorkflow.createWorkflow.id }
});

Use case: "Who's talking about which products in Slack?" or "Which customers were mentioned in #support today?"

Meeting Transcripts

// Meeting/audio workflow
const meetingWorkflow = await graphlit.createWorkflow({
  name: "Meeting Transcript Entities",
  preparation: {
    jobs: [
      {
        connector: {
          type: FilePreparationServiceTypes.ModelAudio  // Transcribe audio
        }
      }
    ]
  },
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelText,
        extractedTypes: [
          ObservableTypes.Person,   // Attendees, mentioned people
          ObservableTypes.Event,    // Mentioned meetings, deadlines
          ObservableTypes.Product,  // Products discussed
          ObservableTypes.Place     // Office locations, cities
        ]
      }
    }]
  }
});

Developer hint: Transcription happens in the preparation stage. Extraction runs on the transcript text.

Part 6: Production Patterns

Pattern 1: Multi-Content Knowledge Graphs

Build a unified knowledge graph across all your content:

// Create workflow once
const productionWorkflow = await graphlit.createWorkflow({
  name: "Production Entity Extraction",
  preparation: {
    jobs: [{
      connector: {
        type: FilePreparationServiceTypes.ModelDocument
      }
    }]
  },
  extraction: {
    jobs: [{
      connector: {
        type: EntityExtractionServiceTypes.ModelDocument,
        extractedTypes: [
          ObservableTypes.Person,
          ObservableTypes.Organization,
          ObservableTypes.Place,
          ObservableTypes.Event
        ]
      }
    }]
  }
});

// Ingest multiple content items with same workflow
const contents = [
  'https://company.com/q4-report.pdf',
  'https://company.com/strategy.pdf',
  'https://company.com/org-chart.pdf'
];

for (const url of contents) {
  await graphlit.ingestUri(
    url,
    undefined,
    undefined,
    undefined,
    undefined,
    { id: productionWorkflow.createWorkflow.id }
  );
}

// Now query entities across ALL ingested content
const allPeople = await graphlit.queryObservables({
  filter: {
    types: [ObservableTypes.Person],
    states: [EntityState.Enabled]
  }
});

console.log(`Total people across all documents: ${allPeople.observables?.results?.length}`);

Key insight: Entities are automatically deduplicated. "Alice Johnson" mentioned in 3 different PDFs = 1 Observable with 3+ Observations.
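
The loop above ingests one URL at a time. For larger batches you may want a few requests in flight at once without flooding the API. The helper below is a generic sketch of bounded concurrency; `mapWithConcurrency` is a hypothetical name, not an SDK function, and a sensible limit depends on your plan's rate limits.

```typescript
// Hypothetical helper: run `worker` over `items` with at most `limit` calls in flight.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index until none remain
  async function run(): Promise<void> {
    while (next < items.length) {
      const index = next++;  // no race: JavaScript runs this synchronously
      results[index] = await worker(items[index]);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => run()));
  return results;  // same order as `items`
}
```

In the ingestion loop, the worker would be the existing `graphlit.ingestUri(...)` call for a single URL, with a small limit such as 3.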

Pattern 2: Confidence-Based Filtering

Not all entity mentions are equally reliable. Filter by confidence:

// Get high-confidence observations only
const content = await graphlit.getContent('content-123');

const highConfidenceEntities = (content.content.observations || [])
  .map(obs => {
    const occurrences = obs.occurrences || [];
    // Handle the empty case explicitly: `obs.occurrences?.reduce(...) / n`
    // evaluates to NaN when occurrences is missing
    const confidence = occurrences.length > 0
      ? occurrences.reduce((sum, occ) => sum + (occ.confidence || 0), 0) / occurrences.length
      : 0;
    return {
      name: obs.observable.name,
      type: obs.type,
      confidence  // Average confidence across all occurrences
    };
  })
  .filter(entity => entity.confidence >= 0.8);  // 80%+ confidence

console.log('High-confidence entities:', highConfidenceEntities);

Rule of thumb:

0.9+: Very reliable (use in production UIs)
0.7-0.9: Reliable (good for most use cases)
0.5-0.7: Moderate (review manually)
<0.5: Low confidence (likely false positives)
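
These bands translate directly into a small classifier. The sketch below uses hypothetical names (`ConfidenceTier`, `confidenceTier`, `groupByTier`) and assumes each boundary value belongs to the higher band, so 0.9 counts as very reliable and 0.7 as reliable.

```typescript
type ConfidenceTier = 'very-reliable' | 'reliable' | 'moderate' | 'low';

// Map a confidence score (0.0-1.0) to the rule-of-thumb bands
function confidenceTier(confidence: number): ConfidenceTier {
  if (confidence >= 0.9) return 'very-reliable';
  if (confidence >= 0.7) return 'reliable';
  if (confidence >= 0.5) return 'moderate';
  return 'low';
}

// Bucket any items that carry a confidence score
function groupByTier<T extends { confidence: number }>(items: T[]): Record<ConfidenceTier, T[]> {
  const groups: Record<ConfidenceTier, T[]> = {
    'very-reliable': [],
    reliable: [],
    moderate: [],
    low: []
  };
  for (const item of items) {
    groups[confidenceTier(item.confidence)].push(item);
  }
  return groups;
}
```

A typical use is to show the top two groups directly, queue the `moderate` group for manual review, and drop `low`.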

Pattern 3: Webhooks for Real-Time Processing

Don't poll isContentDone—use webhooks:

// Configure webhook when creating feed
const feed = await graphlit.createFeed({
  name: 'Gmail with Webhooks',
  type: FeedTypes.Email,
  email: {
    type: FeedServiceTypes.GoogleEmail,
    google: {
      refreshToken: refresh_token  // OAuth refresh token from your Google auth flow (not a short-lived access token)
    },
    readLimit: 100
  },
  workflow: { id: workflow.createWorkflow.id },
  webhookUrl: 'https://yourapp.com/webhooks/graphlit'
});

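// Your webhook handler (Express.js example)
import express from 'express';

const app = express();
app.use(express.json());  // Parse JSON bodies so req.body is populated

app.post('/webhooks/graphlit', async (req, res) => {
  const event = req.body;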
  
  if (event.type === 'content.done') {
    const contentId = event.contentId;
    
    // Retrieve entities
    const content = await graphlit.getContent(contentId);
    const entities = content.content.observations || [];
    
    // Store in your database, trigger notifications, etc.
    await storeEntities(entities);
    
    console.log(`Processed ${entities.length} entities from ${event.contentId}`);
  }
  
  res.sendStatus(200);
});

Pattern 4: Entity Deduplication Strategies

Graphlit deduplicates automatically by name, but you can improve matching:

// Query entity with properties to help deduplication
const entity = await graphlit.getObservable('entity-person-123');

console.log('Entity properties:', {
  name: entity.observable?.name,
  email: entity.observable?.properties?.email,
  affiliation: entity.observable?.properties?.affiliation,
  alternateNames: entity.observable?.alternateNames
});

// Entities with same name + same email = deduplicated automatically
// "Kirk Marple" (kirk@graphlit.com) in PDF + "Kirk Marple" (kirk@graphlit.com) in email = 1 Observable

Part 7: Advanced Querying & Graph Traversal

Query Entities with Related Content

Find all content mentioning a specific person:

// Get all content where "Alice Johnson" is mentioned
const aliceEntity = await graphlit.queryObservables({
  filter: {
    searchText: "Alice Johnson",
    types: [ObservableTypes.Person]
  }
});

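const aliceId = aliceEntity.observables?.results?.[0]?.observable.id;

// Stop early if no matching entity exists; an undefined ID would not filter anything useful
if (!aliceId) {
  throw new Error('No entity found for "Alice Johnson"');
}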

// Search content filtered by this entity
const relatedContent = await graphlit.searchContents('', {
  filters: [{
    observations: {
      observables: [{ id: aliceId }]
    }
  }]
});

console.log(`Content mentioning Alice Johnson: ${relatedContent.results?.length}`);
relatedContent.results?.forEach(content => {
  console.log(`- ${content.name}`);
});

Build Entity Timeline (Chronological Mentions)

// Get all observations of an entity, sorted by content date
const entityTimeline = await graphlit.queryObservables({
  observables: [{ id: 'entity-person-123' }]
});

// Fetch each content item to get dates
const timeline = await Promise.all(
  entityTimeline.observables?.results?.[0]?.observable.observations?.map(async obs => {
    const content = await graphlit.getContent(obs.contentId);
    return {
      contentName: content.content.name,
      date: content.content.finishedDate || content.content.creationDate,
      pages: obs.occurrences?.map(occ => occ.pageIndex)
    };
  }) || []
);

// Sort by date
timeline
  .sort((a, b) => new Date(a.date).getTime() - new Date(b.date).getTime())
  .forEach(item => {
    console.log(`${item.date}: ${item.contentName} (pages ${item.pages?.join(', ')})`);
  });

Use case: "Show me the history of mentions for Project Phoenix across all docs, chronologically."

Entity Co-Occurrence Network (Graph Visualization Data)

// Build person-person co-occurrence network
interface PersonRelationship {
  person1: string;
  person2: string;
  strength: number;  // Number of shared pages
}

const network: PersonRelationship[] = [];

const content = await graphlit.getContent('content-123');
const personObservations = content.content.observations
  ?.filter(obs => obs.type === ObservableTypes.Person) || [];

// For each pair of people
for (let i = 0; i < personObservations.length; i++) {
  for (let j = i + 1; j < personObservations.length; j++) {
    const person1 = personObservations[i];
    const person2 = personObservations[j];
    
    const pages1 = new Set(person1.occurrences?.map(occ => occ.pageIndex));
    const pages2 = new Set(person2.occurrences?.map(occ => occ.pageIndex));
    
    const sharedPages = Array.from(pages1).filter(p => pages2.has(p));
    
    if (sharedPages.length > 0) {
      network.push({
        person1: person1.observable.name,
        person2: person2.observable.name,
        strength: sharedPages.length
      });
    }
  }
}

// Export for visualization (D3.js, Cytoscape, etc.)
console.log('Network data for graph visualization:', JSON.stringify(network));

Use case: Visualize who works together based on co-mentions in documents.
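
Most graph libraries expect separate node and link lists rather than a flat list of pairs. The conversion below is a sketch: `toNodeLinkGraph`, `GraphNode`, and `GraphLink` are hypothetical names, and the `id` / `source` / `target` / `value` field names follow a shape commonly used with force-directed layouts, so adjust them to whatever your visualization library expects.

```typescript
interface PersonRelationship {
  person1: string;
  person2: string;
  strength: number;  // Number of shared pages
}

interface GraphNode { id: string; degree: number; }
interface GraphLink { source: string; target: string; value: number; }

// Convert pairwise relationships into node and link lists
function toNodeLinkGraph(
  network: PersonRelationship[]
): { nodes: GraphNode[]; links: GraphLink[] } {
  const degree = new Map<string, number>();  // number of connections per person
  const links: GraphLink[] = [];

  for (const rel of network) {
    degree.set(rel.person1, (degree.get(rel.person1) ?? 0) + 1);
    degree.set(rel.person2, (degree.get(rel.person2) ?? 0) + 1);
    links.push({ source: rel.person1, target: rel.person2, value: rel.strength });
  }

  const nodes: GraphNode[] = [];
  degree.forEach((count, id) => nodes.push({ id, degree: count }));
  return { nodes, links };
}
```

The `degree` field is handy for sizing nodes, and `value` for link thickness.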

Common Issues & Solutions

Issue: Too Many False Positives

Problem: LLM extracts irrelevant entities (e.g., "Monday" as a place).

Solutions:

Filter by confidence score (>= 0.8)
Use more specific entity types (avoid Other)
Post-process with custom filters:

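// Day and month names that are sometimes misclassified as places
const TEMPORAL_WORDS = new Set([
  'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday',
  'january', 'february', 'march', 'april', 'may', 'june', 'july',
  'august', 'september', 'october', 'november', 'december'
]);

const validEntities = observations.filter(obs => {
  // Exclude day/month names tagged as places; a blanket single-word rule
  // would also drop real places such as "Paris" or "Tokyo"
  if (obs.type === ObservableTypes.Place &&
      TEMPORAL_WORDS.has(obs.observable.name.trim().toLowerCase())) {
    return false;
  }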
  
  // Exclude generic organization names
  if (obs.type === ObservableTypes.Organization && 
      ['Company', 'Corporation', 'Inc'].includes(obs.observable.name)) {
    return false;
  }
  
  return true;
});

Issue: Entities Not Deduplicating

Problem: "Alice Johnson" and "A. Johnson" appear as separate entities.

Solutions:

Check alternateNames field (Graphlit populates this automatically)
Manual entity merging (not yet supported—coming soon)
Use entity properties to help matching:

// Enrich entities with properties from source data
// Graphlit will deduplicate entities with matching email addresses
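
Until merging is available, you can at least flag likely duplicates for review on your side. The heuristic below is a rough sketch, not Graphlit's resolution logic: `couldBeSamePerson` and `findMergeCandidates` are hypothetical helpers that treat two names as candidates when the surnames match and the first names are equal or one is a matching initial.

```typescript
// Lowercase, strip periods, split on whitespace: "A. Johnson" -> ["a", "johnson"]
function nameTokens(name: string): string[] {
  return name.toLowerCase().replace(/\./g, ' ').split(/\s+/).filter(Boolean);
}

// Heuristic: same surname, and first names equal or one is a matching initial
function couldBeSamePerson(a: string, b: string): boolean {
  const ta = nameTokens(a);
  const tb = nameTokens(b);
  if (ta.length < 2 || tb.length < 2) return false;
  if (ta[ta.length - 1] !== tb[tb.length - 1]) return false;

  const firstA = ta[0];
  const firstB = tb[0];
  if (firstA === firstB) return true;
  return (firstA.length === 1 && firstB.charAt(0) === firstA) ||
         (firstB.length === 1 && firstA.charAt(0) === firstB);
}

// All pairs of names worth a manual look
function findMergeCandidates(names: string[]): Array<[string, string]> {
  const pairs: Array<[string, string]> = [];
  for (let i = 0; i < names.length; i++) {
    for (let j = i + 1; j < names.length; j++) {
      if (couldBeSamePerson(names[i], names[j])) {
        pairs.push([names[i], names[j]]);
      }
    }
  }
  return pairs;
}
```

Treat the output as suggestions only: two different people can share a surname and an initial, so confirm with a second signal such as email or affiliation before treating them as one entity.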

Issue: Missing Entities

Problem: Expected entities not extracted.

Solutions:

Verify entity type is in extractedTypes array
Check PDF text extraction quality (scanned PDFs may have OCR errors)
Use ModelDocument for PDFs (not Text)—vision models are more accurate
Lower confidence threshold temporarily to see if entities are extracted with low confidence
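
A quick way to work through this checklist is to compare the types you requested with the types that actually came back. The sketch below uses hypothetical helpers (`countByType`, `missingTypes`) and assumes only that each observation exposes a `type` string, as in the earlier examples.

```typescript
interface ObservationLike {
  type: string;  // e.g. "PERSON"
}

// Count observations per entity type
function countByType(observations: ObservationLike[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const obs of observations) {
    counts.set(obs.type, (counts.get(obs.type) ?? 0) + 1);
  }
  return counts;
}

// Requested types that produced no observations at all
function missingTypes(requested: string[], observations: ObservationLike[]): string[] {
  const counts = countByType(observations);
  return requested.filter(type => !counts.has(type));
}
```

If a type you listed in `extractedTypes` shows up in `missingTypes`, start with the text extraction quality for that document before changing the workflow.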

What's Next?

You now have everything you need to build production knowledge graphs. Next steps:

Integrate with your app: Use entity IDs to filter search, build entity-driven UIs
Add more content types: Emails, Slack, meetings—unified knowledge graph
Explore relationships: Build co-occurrence networks, guided search
Scale to production: Webhooks, batch processing, monitoring

Related guides:

The Complete Guide to Search - Use entities to filter search results
Building AI Chat Applications - Entity-filtered RAG
Data Connectors Guide - Connect Gmail, Slack, etc. for entity extraction

Complete Example: Production Knowledge Graph

Here's a complete, production-ready example that ties everything together:

import { Graphlit } from 'graphlit-client';
import {
  FilePreparationServiceTypes,
  EntityExtractionServiceTypes,
  ObservableTypes,
  EntityState
} from 'graphlit-client/dist/generated/graphql-types';

const graphlit = new Graphlit();

async function buildProductionKnowledgeGraph() {
  // 1. Create extraction workflow
  console.log('Creating workflow...');
  const workflow = await graphlit.createWorkflow({
    name: "Production Entity Extraction",
    preparation: {
      jobs: [{
        connector: {
          type: FilePreparationServiceTypes.ModelDocument
        }
      }]
    },
    extraction: {
      jobs: [{
        connector: {
          type: EntityExtractionServiceTypes.ModelDocument,
          extractedTypes: [
            ObservableTypes.Person,
            ObservableTypes.Organization,
            ObservableTypes.Place,
            ObservableTypes.Event,
            ObservableTypes.Product
          ]
        }
      }]
    }
  });

  // 2. Ingest multiple documents
  console.log('Ingesting documents...');
  const documents = [
    'https://company.com/annual-report.pdf',
    'https://company.com/strategy.pdf',
    'https://company.com/team-directory.pdf'
  ];

  const contentIds: string[] = [];

  for (const url of documents) {
    const content = await graphlit.ingestUri(
      url,
      undefined,
      undefined,
      undefined,
      undefined,
      { id: workflow.createWorkflow.id }
    );
    contentIds.push(content.ingestUri.id);
  }

  // 3. Wait for all to complete
  console.log('Processing...');
  for (const contentId of contentIds) {
    let isDone = false;
    while (!isDone) {
      const status = await graphlit.isContentDone(contentId);
      isDone = status.isContentDone.result;
      if (!isDone) await new Promise(r => setTimeout(r, 2000));
    }
  }

  // 4. Query unified knowledge graph
  console.log('\n=== Knowledge Graph Statistics ===');
  
  const people = await graphlit.queryObservables({
    filter: { types: [ObservableTypes.Person], states: [EntityState.Enabled] }
  });
  console.log(`Total people: ${people.observables?.results?.length}`);

  const orgs = await graphlit.queryObservables({
    filter: { types: [ObservableTypes.Organization], states: [EntityState.Enabled] }
  });
  console.log(`Total organizations: ${orgs.observables?.results?.length}`);

  const places = await graphlit.queryObservables({
    filter: { types: [ObservableTypes.Place], states: [EntityState.Enabled] }
  });
  console.log(`Total places: ${places.observables?.results?.length}`);

  // 5. Find top entities by mention count
  console.log('\n=== Most Mentioned People ===');
  people.observables?.results
    ?.sort((a, b) => (b.observable.observationCount || 0) - (a.observable.observationCount || 0))
    .slice(0, 10)
    .forEach((result, i) => {
      console.log(`${i + 1}. ${result.observable.name} (${result.observable.observationCount} mentions)`);
    });

  // 6. Build relationship network
  console.log('\n=== Person-Organization Relationships ===');
  
  // Get all observations from all content, tagging each with its source content ID
  const allObservations: any[] = [];
  for (const contentId of contentIds) {
    const content = await graphlit.getContent(contentId);
    for (const obs of content.content.observations || []) {
      allObservations.push({ ...obs, contentId });
    }
  }

  // Find co-occurrences
  const relationships = new Map<string, Set<string>>();

  allObservations
    .filter(obs => obs.type === ObservableTypes.Person)
    .forEach(personObs => {
      const personName = personObs.observable.name;
      
      allObservations
        .filter(obs => obs.type === ObservableTypes.Organization)
        .forEach(orgObs => {
          const orgName = orgObs.observable.name;
          
          // Check if on same pages in same document
          // (keys use the content ID tagged above, so page 3 of one PDF never matches page 3 of another)
          const personPages = new Set(
            personObs.occurrences?.map((occ: any) => `${personObs.contentId}-${occ.pageIndex}`)
          );
          const orgPages = new Set(
            orgObs.occurrences?.map((occ: any) => `${orgObs.contentId}-${occ.pageIndex}`)
          );
          
          const overlap = Array.from(personPages).filter(p => orgPages.has(p));
          
          if (overlap.length > 0) {
            if (!relationships.has(personName)) {
              relationships.set(personName, new Set());
            }
            relationships.get(personName)!.add(orgName);
          }
        });
    });

  // Display top relationships
  Array.from(relationships.entries())
    .sort((a, b) => b[1].size - a[1].size)
    .slice(0, 10)
    .forEach(([person, orgs]) => {
      console.log(`${person} ↔ ${Array.from(orgs).join(', ')}`);
    });

  console.log('\n✓ Knowledge graph complete!');
}

buildProductionKnowledgeGraph().catch(console.error);

Run this, and you'll have a production knowledge graph with entity statistics, relationships, and queryable structure—ready to power search, RAG, and AI applications.

Happy graph building! 🚀
