Knowledge graphs transform unstructured content—PDFs, emails, meeting transcripts, Slack messages—into queryable networks of entities and relationships. Instead of searching for keywords, you can ask "Who works at which company?" or "What topics were discussed in meetings about Project Apollo?"
This guide takes you from your first entity extraction to production knowledge graphs serving real applications. We'll cover the Observable/Observation model, entity types, extraction workflows, and advanced querying patterns—with complete code examples.
What You'll Build
By the end of this guide, you'll know how to:
- Extract entities (people, companies, places) from any content type
- Configure extraction workflows for your use case
- Understand the Observable/Observation architecture
- Query entities and relationships
- Build entity-filtered search and RAG systems
- Handle deduplication, confidence scores, and edge cases
- Scale to production with multi-content knowledge graphs
Prerequisites:
- A Graphlit project (free tier works): Sign up (2 min)
- SDK installed: npm install graphlit-client (30 sec)
- Some content to process (we'll use a public PDF)
Time to complete: 90 minutes
Difficulty: Intermediate
Developer Note: All Graphlit IDs are GUIDs (e.g., 550e8400-e29b-41d4-a716-446655440000). In the code examples below, we use short placeholders like content-123 for readability where the values are variables populated at runtime. Example outputs show the realistic GUID format.
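Since placeholders like content-123 are only stand-ins, a quick shape check can catch one that slips into a real call. The sketch below assumes the canonical 8-4-4-4-12 hexadecimal layout shown above; isGuid and assertGuid are hypothetical helpers, not part of the Graphlit SDK.

```typescript
// Hypothetical helpers (not part of the Graphlit SDK): check the canonical
// 8-4-4-4-12 hexadecimal GUID layout, e.g. 550e8400-e29b-41d4-a716-446655440000.
const GUID_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function isGuid(id: string): boolean {
  return GUID_PATTERN.test(id);
}

// Fail fast if a readability placeholder was left in before calling the API.
function assertGuid(id: string): string {
  if (!isGuid(id)) {
    throw new Error(`Expected a GUID, got "${id}" (placeholder left in?)`);
  }
  return id;
}

console.log(isGuid("550e8400-e29b-41d4-a716-446655440000")); // true
console.log(isGuid("content-123")); // false
```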
Table of Contents
- Your First Knowledge Graph (15 min)
- Understanding Observable/Observation Model
- Entity Types and Extraction Strategies
- Querying Your Knowledge Graph
- Different Content Types
- Production Patterns
- Advanced Querying
Part 1: Your First Knowledge Graph (15 minutes)
Let's start with the simplest possible example: extract entities from a single PDF.
Step 1: Create an Extraction Workflow
Workflows tell Graphlit how to process content. For knowledge graphs, you need an extraction stage:
import { Graphlit } from 'graphlit-client';
import {
FilePreparationServiceTypes,
EntityExtractionServiceTypes,
ObservableTypes
} from 'graphlit-client/dist/generated/graphql-types';
const graphlit = new Graphlit();
// Create workflow with entity extraction
const workflow = await graphlit.createWorkflow({
name: "PDF Entity Extraction",
preparation: {
jobs: [{
connector: {
type: FilePreparationServiceTypes.ModelDocument // Uses vision model for PDFs
}
}]
},
extraction: {
jobs: [{
connector: {
type: EntityExtractionServiceTypes.ModelDocument,
extractedTypes: [
ObservableTypes.Person,
ObservableTypes.Organization,
ObservableTypes.Place,
ObservableTypes.Event
]
}
}]
}
});
console.log(`✓ Workflow created: ${workflow.createWorkflow.id}`);
What's happening:
- preparation stage: A vision model extracts text from the PDF pages (handles scans, tables, and multi-column layouts).
- extraction stage: An LLM reads the extracted text and identifies entities.
- extractedTypes: Only these 4 entity types are extracted (19 built-in types are available: 7 general and 12 medical).
💡 Pro Tip: Use FilePreparationServiceTypes.ModelDocument (vision model) for scanned PDFs: it is 3x more accurate than OCR but costs 2x more. For text-based PDFs, use FilePreparationServiceTypes.Document (OCR) to save costs.
Step 2: Ingest Content with the Workflow
// Ingest a research paper
const content = await graphlit.ingestUri(
'https://arxiv.org/pdf/2301.00001.pdf',
"AI Research Paper",
undefined,
undefined,
undefined,
{ id: workflow.createWorkflow.id } // Use our extraction workflow
);
console.log(`✓ Ingesting: ${content.ingestUri.id}`);
// Wait for processing to complete
let isDone = false;
while (!isDone) {
const status = await graphlit.isContentDone(content.ingestUri.id);
isDone = status.isContentDone.result;
if (!isDone) {
await new Promise(resolve => setTimeout(resolve, 2000));
}
}
console.log('✓ Extraction complete!');
Developer hint: isContentDone reports whether processing has finished, so the loop above polls it. For production, use webhooks instead.
⚠️ Warning: Polling in tight loops will hit rate limits. Add 2-5 second delays between checks, or better yet, use webhooks for production apps.
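Here is one way to follow that advice when you do poll: a generic helper that waits 2 seconds at first, doubles the delay up to a cap, and gives up after a timeout. It is a minimal sketch; waitUntilDone and its option names are hypothetical, and the check function you pass in is assumed to wrap isContentDone as in the loop above.

```typescript
// Minimal polling sketch with exponential backoff and an overall timeout.
// `checkDone` is any async function that resolves to true once processing is done.
interface PollOptions {
  initialDelayMs?: number; // first wait between checks
  maxDelayMs?: number;     // cap for the backoff delay
  timeoutMs?: number;      // give up once total waiting would exceed this
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Resolves with the number of checks made; rejects if the timeout is reached.
async function waitUntilDone(
  checkDone: () => Promise<boolean>,
  { initialDelayMs = 2000, maxDelayMs = 10000, timeoutMs = 10 * 60 * 1000 }: PollOptions = {}
): Promise<number> {
  let delay = initialDelayMs;
  let waited = 0;
  let checks = 0;
  while (true) {
    checks++;
    if (await checkDone()) return checks;
    if (waited + delay > timeoutMs) {
      throw new Error(`Not done after ${checks} checks (${waited} ms waited)`);
    }
    await sleep(delay);
    waited += delay;
    delay = Math.min(delay * 2, maxDelayMs); // double the wait, up to the cap
  }
}

// Usage (assumes `graphlit` and `contentId` are in scope, as in the loop above):
// await waitUntilDone(async () =>
//   (await graphlit.isContentDone(contentId)).isContentDone?.result === true);
```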
Step 3: Retrieve Extracted Entities
// Get content with observations (entity mentions)
const contentDetails = await graphlit.getContent(content.ingestUri.id);
const observations = contentDetails.content.observations || [];
console.log(`Found ${observations.length} entity observations`);
// Group by entity type
const byType = new Map<string, Set<string>>();
observations.forEach(obs => {
if (!byType.has(obs.type)) {
byType.set(obs.type, new Set());
}
byType.get(obs.type)!.add(obs.observable.name);
});
// Display results
byType.forEach((entities, type) => {
console.log(`\n${type} (${entities.size} unique):`);
Array.from(entities).slice(0, 5).forEach(name => {
console.log(` - ${name}`);
});
});
Example output:
PERSON (23 unique):
- Geoffrey Hinton
- Yann LeCun
- Yoshua Bengio
- Andrew Ng
- Fei-Fei Li
ORGANIZATION (12 unique):
- Google
- OpenAI
- Stanford University
- MIT
- DeepMind
🎉 Congratulations! You just built your first knowledge graph. But this is just the beginning—let's understand what's actually happening under the hood.
Part 2: Understanding the Observable/Observation Model
This is the most important concept in Graphlit knowledge graphs. Get this, and everything else makes sense.
The Two-Tier Architecture
Observable = The entity itself (e.g., the person "Geoffrey Hinton")
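Observation = The record of that entity in one specific content item; each individual mention inside that item is an occurrence (page, coordinates, confidence)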
Why two layers?
- Deduplication: "Geoffrey Hinton" mentioned 50 times across 10 PDFs = 1 Observable, 10 Observations (one per PDF), 50 occurrences
- Provenance: Track exactly where each mention appears (page 3, paragraph 2, coordinates)
- Confidence: Each mention has its own confidence score
- Relationships: Find co-occurrences (who was mentioned with whom, on which pages)
Data Flow Diagram
PDF Ingestion
↓
Vision Model Extracts Text (Preparation)
↓
LLM Identifies Entities (Extraction)
↓
For Each Entity Found in the Content:
├─ Create Observation
│ ├─ Type (PERSON, ORGANIZATION, etc.)
│ └─ Occurrences (one per mention)
│   ├─ Confidence (0.0-1.0)
│   ├─ Page number & coordinates
│   └─ Text context
↓
Entity Resolution (Automatic Deduplication)
├─ Is this entity already in the graph?
├─ Match by name, properties
└─ Create new Observable OR link to existing
↓
Knowledge Graph Updated
└─ Observable now has N observations
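To make the two tiers concrete, here is a toy in-memory model that mirrors the shapes used in this part's code (an observation per content item, holding a list of occurrences). The resolution rule, same type plus case-insensitive name, is a deliberately naive assumption for illustration; Graphlit's actual entity resolution is not shown here.

```typescript
// Toy model of the two tiers (NOT Graphlit's actual resolution logic):
// one Observable per distinct entity, one Observation per (entity, content item),
// and one occurrence per individual mention.
interface Occurrence { pageIndex: number; confidence: number }
interface Observation { contentId: string; occurrences: Occurrence[] }
interface Observable { id: string; type: string; name: string; observations: Observation[] }
interface Mention { contentId: string; type: string; name: string; pageIndex: number; confidence: number }

function buildGraph(mentions: Mention[]): Map<string, Observable> {
  const graph = new Map<string, Observable>();
  for (const m of mentions) {
    // Naive resolution: same type + case-insensitive name => same entity
    const key = `${m.type}:${m.name.trim().toLowerCase()}`;
    let observable = graph.get(key);
    if (!observable) {
      observable = { id: `obs-${graph.size + 1}`, type: m.type, name: m.name, observations: [] };
      graph.set(key, observable);
    }
    // Link to this content item's existing Observation, or create one
    let observation = observable.observations.find((o) => o.contentId === m.contentId);
    if (!observation) {
      observation = { contentId: m.contentId, occurrences: [] };
      observable.observations.push(observation);
    }
    observation.occurrences.push({ pageIndex: m.pageIndex, confidence: m.confidence });
  }
  return graph;
}

const graph = buildGraph([
  { contentId: "pdf-1", type: "PERSON", name: "Geoffrey Hinton", pageIndex: 3, confidence: 0.95 },
  { contentId: "pdf-1", type: "PERSON", name: "Geoffrey Hinton", pageIndex: 12, confidence: 0.89 },
  { contentId: "pdf-2", type: "PERSON", name: "geoffrey hinton", pageIndex: 1, confidence: 0.9 },
]);
const hinton = graph.get("PERSON:geoffrey hinton")!;
// Observables, observations, occurrences
console.log(graph.size, hinton.observations.length,
  hinton.observations.reduce((n, o) => n + o.occurrences.length, 0)); // 1 2 3
```

Three mentions across two PDFs resolve to one Observable with two Observations, which is the deduplication-plus-provenance split described above.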
Code Example: Observations vs Observables
// Get observations from a single content item
const content = await graphlit.getContent('content-123');
content.content.observations?.forEach(observation => {
console.log(`\nObservation ID: ${observation.id}`);
console.log(` Entity Type: ${observation.type}`);
console.log(` Entity Name: ${observation.observable.name}`);
console.log(` Observable ID: ${observation.observable.id}`); // The entity itself
// Where was this entity mentioned?
observation.occurrences?.forEach(occ => {
console.log(` Page ${occ.pageIndex}, confidence: ${occ.confidence}`);
if (occ.boundingBox) {
console.log(` Location: (${occ.boundingBox.left}, ${occ.boundingBox.top})`);
}
});
});
Example output:
Observation ID: 8c3e2f1a-4b5d-6e7f-8a9b-0c1d2e3f4a5b
Entity Type: PERSON
Entity Name: Geoffrey Hinton
Observable ID: a1b2c3d4-e5f6-7890-abcd-ef1234567890 ← The actual entity
Page 3, confidence: 0.95
Location: (120, 450)
Page 12, confidence: 0.89
Location: (340, 200)
Now query the Observable (entity) directly:
// Get the entity itself with all its mentions across ALL content
const entityResult = await graphlit.queryObservables({
observables: [{ id: 'a1b2c3d4-e5f6-7890-abcd-ef1234567890' }]
});
const entity = entityResult.observables?.results?.[0];
console.log(`Entity: ${entity?.observable.name}`);
console.log(`Type: ${entity?.type}`);
console.log(`Mentioned in ${entity?.observable.observationCount} places`);
Key insight: Observations are scoped to single content items. Observables span your entire knowledge graph.
✅ Quick Win: Once you understand this model, you can build entity-filtered search and RAG chatbots that answer questions like "What did Alice say about Project Phoenix?"
Part 3: Entity Types and Extraction Strategies
Graphlit supports 19 built-in entity types across two categories:
General Entity Types (7)
import { ObservableTypes } from 'graphlit-client/dist/generated/graphql-types';
const generalTypes = [
ObservableTypes.Person, // People, authors, speakers
ObservableTypes.Organization, // Companies, institutions, agencies
ObservableTypes.Place, // Cities, countries, addresses
ObservableTypes.Event, // Meetings, conferences, incidents
ObservableTypes.Product, // Software, devices, offerings
ObservableTypes.CreativeWork, // Books, papers, articles
ObservableTypes.Other // Catch-all for domain-specific entities
];
Medical Entity Types (12)
For healthcare, research, and clinical applications:
const medicalTypes = [
ObservableTypes.MedicalCondition, // Diseases, symptoms, diagnoses
ObservableTypes.MedicalProcedure, // Surgeries, treatments, therapies
ObservableTypes.MedicalTest, // Labs, imaging, diagnostics
ObservableTypes.MedicalTreatment, // Drugs, protocols, interventions
ObservableTypes.MedicalAnatomy, // Organs, body parts, systems
ObservableTypes.MedicalDevice, // Equipment, implants, instruments
ObservableTypes.MedicalGuideline, // Protocols, standards, best practices
ObservableTypes.MedicalStudy, // Clinical trials, research papers
ObservableTypes.MedicalMeasurement, // Vital signs, lab values, metrics
ObservableTypes.MedicalCode, // ICD-10, CPT codes
ObservableTypes.MedicalQuality, // Severity descriptors
ObservableTypes.MedicalDrug // Pharmaceuticals, medications
];
Choosing Entity Types: Decision Matrix
Configuring Multi-Type Extraction
// Comprehensive entity extraction workflow
const comprehensiveWorkflow = await graphlit.createWorkflow({
name: "Comprehensive Entity Extraction",
preparation: {
jobs: [{
connector: {
type: FilePreparationServiceTypes.ModelDocument
}
}]
},
extraction: {
jobs: [
// First pass: General entities
{
connector: {
type: EntityExtractionServiceTypes.ModelDocument,
extractedTypes: [
ObservableTypes.Person,
ObservableTypes.Organization,
ObservableTypes.Place,
ObservableTypes.Event,
ObservableTypes.Product,
ObservableTypes.CreativeWork
]
}
},
// Second pass: Medical entities (if applicable)
{
connector: {
type: EntityExtractionServiceTypes.ModelDocument,
extractedTypes: [
ObservableTypes.MedicalCondition,
ObservableTypes.MedicalProcedure,
ObservableTypes.MedicalTreatment,
ObservableTypes.MedicalDrug
]
}
}
]
}
});
Developer hint: Multiple extraction jobs run in parallel. Use this for medical + general extraction or different models per entity type.
Part 4: Querying Your Knowledge Graph
Now that you have entities, let's query them like a graph database.
Basic Queries: Get All Entities of a Type
import { EntityState } from 'graphlit-client/dist/generated/graphql-types';
// Get all people in your knowledge graph
const people = await graphlit.queryObservables({
filter: {
types: [ObservableTypes.Person],
states: [EntityState.Enabled] // Exclude disabled/deleted entities
}
});
console.log(`Total people: ${people.observables?.results?.length}`);
people.observables?.results?.forEach(result => {
const person = result.observable;
console.log(`- ${person.name} (mentioned ${person.observationCount} times)`);
});
Filtering by Name (Search Entities)
// Find all organizations with "Research" in the name
const researchOrgs = await graphlit.queryObservables({
filter: {
types: [ObservableTypes.Organization],
searchText: "Research", // Fuzzy name matching
states: [EntityState.Enabled]
}
});
researchOrgs.observables?.results?.forEach(result => {
console.log(result.observable.name);
});
// Output: "OpenAI Research", "Google Research", "MIT Research Lab", etc.
Advanced: Find Entity Relationships (Co-occurrence)
Entities that appear on the same page are likely related. Let's find person-organization relationships:
// Get content with observations
const content = await graphlit.getContent('content-123');
const observations = content.content.observations || [];
// Build co-occurrence matrix
const relationships: Array<{
person: string;
organization: string;
pages: number[];
}> = [];
observations
.filter(obs => obs.type === ObservableTypes.Person)
.forEach(personObs => {
const personPages = new Set(
personObs.occurrences?.map(occ => occ.pageIndex) || []
);
observations
.filter(obs => obs.type === ObservableTypes.Organization)
.forEach(orgObs => {
const orgPages = new Set(
orgObs.occurrences?.map(occ => occ.pageIndex) || []
);
// Find shared pages
const sharedPages = Array.from(personPages).filter(p => orgPages.has(p));
if (sharedPages.length > 0) {
relationships.push({
person: personObs.observable.name,
organization: orgObs.observable.name,
pages: sharedPages
});
}
});
});
// Display top relationships
relationships
.sort((a, b) => b.pages.length - a.pages.length)
.slice(0, 10)
.forEach(rel => {
console.log(`${rel.person} ↔ ${rel.organization}`);
console.log(` Co-occurs on ${rel.pages.length} pages: ${rel.pages.join(', ')}`);
});
Example output:
Geoffrey Hinton ↔ Google
Co-occurs on 8 pages: 3, 7, 12, 15, 18, 23, 29, 31
Yann LeCun ↔ Meta
Co-occurs on 5 pages: 4, 9, 14, 20, 27
Entity-Filtered Search (RAG)
Use entities to filter search results—find content mentioning specific people or companies:
// Search for content mentioning both "Geoffrey Hinton" AND "neural networks"
const searchResults = await graphlit.searchContents(
'neural networks',
{
filters: [
{
observations: {
observables: [
{ id: 'entity-person-123' } // Geoffrey Hinton's entity ID
]
}
}
]
}
);
searchResults.results?.forEach(result => {
console.log(`${result.name} - Score: ${result.score}`);
});
Use case: "Show me all emails where Alice mentioned Project Phoenix" or "Find meeting transcripts with Bob and the CFO".
Part 5: Building Knowledge Graphs from Different Content Types
The workflow pattern stays the same, but extraction strategies differ by content type.
Emails (Gmail, Outlook)
import { FeedServiceTypes } from 'graphlit-client/dist/generated/graphql-types';
// Create email-optimized workflow
const emailWorkflow = await graphlit.createWorkflow({
name: "Email Entity Extraction",
preparation: {
jobs: [{
connector: {
type: FilePreparationServiceTypes.Email // Email-specific parsing
}
}]
},
extraction: {
jobs: [{
connector: {
type: EntityExtractionServiceTypes.ModelText, // Emails are pure text
extractedTypes: [
ObservableTypes.Person, // Senders, recipients, mentioned people
ObservableTypes.Organization, // Companies, clients
ObservableTypes.Place, // Office locations, meeting venues
ObservableTypes.Event // Meetings, deadlines
]
}
}]
}
});
// Create Gmail feed with workflow
const feed = await graphlit.createFeed(
FeedServiceTypes.Gmail,
{ id: emailWorkflow.createWorkflow.id },
{ readLimit: 100 }, // Last 100 emails
'Gmail Entity Extraction'
);
What you get: Automatic contact extraction, company mentions, meeting locations—queryable as a knowledge graph.
Slack Messages
// Slack-optimized workflow
const slackWorkflow = await graphlit.createWorkflow({
name: "Slack Entity Extraction",
preparation: {
jobs: [{
connector: {
type: FilePreparationServiceTypes.Message
}
}]
},
extraction: {
jobs: [{
connector: {
type: EntityExtractionServiceTypes.ModelText,
extractedTypes: [
ObservableTypes.Person,
ObservableTypes.Organization,
ObservableTypes.Product,
ObservableTypes.Event
]
}
}]
}
});
// Create Slack feed
const slackFeed = await graphlit.createFeed(
FeedServiceTypes.Slack,
{ id: slackWorkflow.createWorkflow.id },
{
type: FeedTypeTypes.Channel,
channels: ['general', 'engineering', 'product']
},
'Slack Channel Entities'
);
Use case: "Who's talking about which products in Slack?" or "Which customers were mentioned in #support today?"
Meeting Transcripts
// Meeting/audio workflow
const meetingWorkflow = await graphlit.createWorkflow({
name: "Meeting Transcript Entities",
preparation: {
jobs: [
{
connector: {
type: FilePreparationServiceTypes.ModelAudio // Transcribe audio
}
}
]
},
extraction: {
jobs: [{
connector: {
type: EntityExtractionServiceTypes.ModelText,
extractedTypes: [
ObservableTypes.Person, // Attendees, mentioned people
ObservableTypes.Event, // Mentioned meetings, deadlines
ObservableTypes.Product, // Products discussed
ObservableTypes.Place // Office locations, cities
]
}
}]
}
});
Developer hint: Transcription happens in the preparation stage. Extraction runs on the transcript text.
Part 6: Production Patterns
Pattern 1: Multi-Content Knowledge Graphs
Build a unified knowledge graph across all your content:
// Create workflow once
const productionWorkflow = await graphlit.createWorkflow({
name: "Production Entity Extraction",
preparation: {
jobs: [{
connector: {
type: FilePreparationServiceTypes.ModelDocument
}
}]
},
extraction: {
jobs: [{
connector: {
type: EntityExtractionServiceTypes.ModelDocument,
extractedTypes: [
ObservableTypes.Person,
ObservableTypes.Organization,
ObservableTypes.Place,
ObservableTypes.Event
]
}
}]
}
});
// Ingest multiple content items with same workflow
const contents = [
'https://company.com/q4-report.pdf',
'https://company.com/strategy.pdf',
'https://company.com/org-chart.pdf'
];
for (const url of contents) {
await graphlit.ingestUri(
url,
undefined,
undefined,
undefined,
undefined,
{ id: productionWorkflow.createWorkflow.id }
);
}
// Once processing finishes (poll isContentDone as in Part 1, or use webhooks),
// query entities across ALL ingested content
const allPeople = await graphlit.queryObservables({
filter: {
types: [ObservableTypes.Person],
states: [EntityState.Enabled]
}
});
console.log(`Total people across all documents: ${allPeople.observables?.results?.length}`);
Key insight: Entities are automatically deduplicated. "Alice Johnson" mentioned in 3 different PDFs = 1 Observable with 3 Observations (one per PDF), each holding that PDF's individual mentions.
Pattern 2: Confidence-Based Filtering
Not all entity mentions are equally reliable. Filter by confidence:
// Get high-confidence observations only
const content = await graphlit.getContent('content-123');
// Average the per-occurrence confidence scores of one observation (0 if none)
const avgConfidence = (
occurrences?: Array<{ confidence?: number | null } | null> | null
): number => {
const scores = (occurrences ?? []).map(occ => occ?.confidence ?? 0);
return scores.length > 0
? scores.reduce((sum, c) => sum + c, 0) / scores.length
: 0;
};
const highConfidenceEntities = (content.content.observations ?? [])
.map(obs => ({
name: obs.observable.name,
type: obs.type,
confidence: avgConfidence(obs.occurrences)
}))
.filter(entity => entity.confidence >= 0.8); // 80%+ average confidence
console.log('High-confidence entities:', highConfidenceEntities);
Rule of thumb:
- 0.9+: Very reliable (use in production UIs)
- 0.7-0.9: Reliable (good for most use cases)
- 0.5-0.7: Moderate (review manually)
- <0.5: Low confidence (likely false positives)
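A small helper can turn those ranges into labels, for example to decide which entities go straight into a UI and which go to a review queue. A sketch, assuming inclusive lower bounds (so exactly 0.9 counts as very reliable); confidenceTier is a hypothetical name.

```typescript
// Maps an average confidence score onto the rule-of-thumb tiers above.
// Assumption: each lower bound is inclusive (0.9 => "very reliable").
type ConfidenceTier = "very reliable" | "reliable" | "moderate" | "low";

function confidenceTier(score: number): ConfidenceTier {
  if (score >= 0.9) return "very reliable"; // use in production UIs
  if (score >= 0.7) return "reliable";      // good for most use cases
  if (score >= 0.5) return "moderate";      // review manually
  return "low";                             // likely a false positive
}

// Example: label a few entities before routing them
const scores: Record<string, number> = { "Geoffrey Hinton": 0.92, "Monday": 0.41, "MIT": 0.74 };
for (const [name, score] of Object.entries(scores)) {
  console.log(`${name}: ${confidenceTier(score)}`);
}
```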
Pattern 3: Webhooks for Real-Time Processing
Don't poll isContentDone—use webhooks:
// Configure webhook when creating feed or workflow
const feed = await graphlit.createFeed(
FeedServiceTypes.Gmail,
{ id: workflow.createWorkflow.id },
{ readLimit: 100 },
'Gmail with Webhooks',
undefined,
'https://yourapp.com/webhooks/graphlit' // Your webhook endpoint
);
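// Your webhook handler (Express.js example)
import express from 'express';
const app = express();
app.use(express.json()); // Parse JSON bodies so req.body is populated
app.post('/webhooks/graphlit', async (req, res) => {
const event = req.body;
// Payload field names below are illustrative; confirm them against the
// webhook payloads your Graphlit project actually sends
if (event.type === 'content.done') {
const contentId = event.contentId;
// Retrieve entities
const content = await graphlit.getContent(contentId);
const entities = content.content.observations || [];
// Store in your database, trigger notifications, etc.
// (storeEntities is your own function, not part of the Graphlit SDK)
await storeEntities(entities);
console.log(`Processed ${entities.length} entities from ${contentId}`);
}
res.sendStatus(200);
});
app.listen(3000); // Example port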
Pattern 4: Entity Deduplication Strategies
Graphlit deduplicates automatically by name; entity properties such as email can also help matching. To inspect what an entity carries:
// Query entity with properties to help deduplication
const entity = await graphlit.getObservable('entity-person-123');
console.log('Entity properties:', {
name: entity.observable?.name,
email: entity.observable?.properties?.email,
affiliation: entity.observable?.properties?.affiliation,
alternateNames: entity.observable?.alternateNames
});
// Entities with same name + same email = deduplicated automatically
// "Kirk Marple" (kirk@graphlit.com) in PDF + "Kirk Marple" (kirk@graphlit.com) in email = 1 Observable
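If you mirror entities into your own database, you can apply the same name-plus-email rule locally to see which records should collapse into one. This is a sketch of that rule only, under assumed normalization (trimmed, whitespace-collapsed, lowercased); EntityRecord and groupDuplicates are hypothetical, and this is not how Graphlit resolves entities internally.

```typescript
// Hypothetical local record shape, e.g. entities copied into your own DB.
interface EntityRecord { name: string; email?: string; source: string }

// Normalize the (name, email) pair: trim, collapse whitespace, lowercase.
// Records without an email are keyed on name alone.
function dedupKey(r: EntityRecord): string {
  const name = r.name.trim().replace(/\s+/g, " ").toLowerCase();
  const email = r.email?.trim().toLowerCase() ?? "";
  return `${name}|${email}`;
}

// Group records that share a key; each group would be one entity.
function groupDuplicates(records: EntityRecord[]): Map<string, EntityRecord[]> {
  const groups = new Map<string, EntityRecord[]>();
  for (const r of records) {
    const key = dedupKey(r);
    groups.set(key, [...(groups.get(key) ?? []), r]);
  }
  return groups;
}

const groups = groupDuplicates([
  { name: "Kirk Marple", email: "kirk@graphlit.com", source: "pdf" },
  { name: "Kirk  Marple", email: "Kirk@Graphlit.com", source: "email" },
  { name: "Kirk Marple", source: "slack" }, // no email: stays separate under this rule
]);
console.log(groups.size); // 2
```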
Part 7: Advanced Querying & Graph Traversal
Query Entities with Related Content
Find all content mentioning a specific person:
// Get all content where "Alice Johnson" is mentioned
const aliceEntity = await graphlit.queryObservables({
filter: {
searchText: "Alice Johnson",
types: [ObservableTypes.Person]
}
});
const aliceId = aliceEntity.observables?.results?.[0]?.observable.id;
if (!aliceId) {
throw new Error('No entity found for "Alice Johnson"');
}
// Search content filtered by this entity
const relatedContent = await graphlit.searchContents('', {
filters: [{
observations: {
observables: [{ id: aliceId }]
}
}]
});
console.log(`Content mentioning Alice Johnson: ${relatedContent.results?.length}`);
relatedContent.results?.forEach(content => {
console.log(`- ${content.name}`);
});
Build Entity Timeline (Chronological Mentions)
// Get all observations of an entity, sorted by content date
const entityTimeline = await graphlit.queryObservables({
observables: [{ id: 'entity-person-123' }]
});
// Fetch each content item to get dates
const timeline = await Promise.all(
entityTimeline.observables?.results?.[0]?.observable.observations?.map(async obs => {
const content = await graphlit.getContent(obs.contentId);
return {
contentName: content.content.name,
date: content.content.finishedDate || content.content.creationDate,
pages: obs.occurrences?.map(occ => occ.pageIndex)
};
}) || []
);
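// Sort by date (items without a date go last)
const toTime = (date?: string | null) =>
date ? new Date(date).getTime() : Number.POSITIVE_INFINITY;
timeline
.sort((a, b) => toTime(a.date) - toTime(b.date))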
.forEach(item => {
console.log(`${item.date}: ${item.contentName} (pages ${item.pages?.join(', ')})`);
});
Use case: "Show me the history of mentions for Project Phoenix across all docs, chronologically."
Entity Co-Occurrence Network (Graph Visualization Data)
// Build person-person co-occurrence network
interface PersonRelationship {
person1: string;
person2: string;
strength: number; // Number of shared pages
}
const network: PersonRelationship[] = [];
const content = await graphlit.getContent('content-123');
const personObservations = content.content.observations
?.filter(obs => obs.type === ObservableTypes.Person) || [];
// For each pair of people
for (let i = 0; i < personObservations.length; i++) {
for (let j = i + 1; j < personObservations.length; j++) {
const person1 = personObservations[i];
const person2 = personObservations[j];
const pages1 = new Set(person1.occurrences?.map(occ => occ.pageIndex));
const pages2 = new Set(person2.occurrences?.map(occ => occ.pageIndex));
const sharedPages = Array.from(pages1).filter(p => pages2.has(p));
if (sharedPages.length > 0) {
network.push({
person1: person1.observable.name,
person2: person2.observable.name,
strength: sharedPages.length
});
}
}
}
// Export for visualization (D3.js, Cytoscape, etc.)
console.log('Network data for graph visualization:', JSON.stringify(network));
Use case: Visualize who works together based on co-mentions in documents.
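The network array is an edge list. Force-directed layouts in D3 typically take separate nodes and links arrays, where each link names its source and target node IDs. A converter sketch, assuming that shape; the value field for edge weight and the degree field on nodes are conventions chosen here, not requirements.

```typescript
interface PersonRelationship { person1: string; person2: string; strength: number }
interface GraphNode { id: string; degree: number }
interface GraphLink { source: string; target: string; value: number }

// Convert an edge list into { nodes, links }: links reference nodes by id,
// `value` carries the co-occurrence strength, `degree` counts each node's edges.
function toNodeLink(network: PersonRelationship[]): { nodes: GraphNode[]; links: GraphLink[] } {
  const degree = new Map<string, number>();
  const links: GraphLink[] = network.map(({ person1, person2, strength }) => {
    degree.set(person1, (degree.get(person1) ?? 0) + 1);
    degree.set(person2, (degree.get(person2) ?? 0) + 1);
    return { source: person1, target: person2, value: strength };
  });
  const nodes = Array.from(degree, ([id, d]) => ({ id, degree: d }));
  return { nodes, links };
}

const graphData = toNodeLink([
  { person1: "Geoffrey Hinton", person2: "Yann LeCun", strength: 4 },
  { person1: "Geoffrey Hinton", person2: "Yoshua Bengio", strength: 2 },
]);
console.log(JSON.stringify(graphData));
```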
Common Issues & Solutions
Issue: Too Many False Positives
Problem: LLM extracts irrelevant entities (e.g., "Monday" as a place).
Solutions:
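- Filter by confidence score (>= 0.8)
- Use more specific entity types (avoid Other)
- Post-process with custom filters:
// Day and month names that are often mis-tagged as places
const DATE_WORDS = new Set([
'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday',
'january', 'february', 'march', 'april', 'may', 'june', 'july',
'august', 'september', 'october', 'november', 'december'
]);
const validEntities = observations.filter(obs => {
const name = obs.observable.name ?? '';
// Exclude "places" that are really days or months (e.g., "Monday"),
// without dropping real single-word places like "Paris"
if (obs.type === ObservableTypes.Place && DATE_WORDS.has(name.trim().toLowerCase())) {
return false;
}
// Exclude generic organization names
if (obs.type === ObservableTypes.Organization &&
['Company', 'Corporation', 'Inc'].includes(name)) {
return false;
}
return true;
});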
Issue: Entities Not Deduplicating
Problem: "Alice Johnson" and "A. Johnson" appear as separate entities.
Solutions:
- Check the alternateNames field (Graphlit populates this automatically)
- Manual entity merging (not yet supported; coming soon)
- Use entity properties to help matching:
// Enrich entities with properties from source data
// Graphlit will deduplicate entities with matching email addresses
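Until manual merging is available, you can at least flag likely name variants for review. A rough heuristic sketch (maybeSamePerson is hypothetical and is not Graphlit's resolution logic): two names are candidates when their last names match and their first names are equal, or one of them is a bare initial with the same first letter. It will miss nicknames and over-match common surnames, so treat its output as a review list, not a merge list.

```typescript
// Hypothetical heuristic for flagging possible duplicate person names.
// Splits "First [Middle] Last", compares last names case-insensitively,
// and lets an initial such as "A." match any first name starting with "A".
function parseName(full: string): { first: string; last: string } {
  const parts = full.trim().split(/\s+/);
  return {
    first: (parts[0] ?? "").replace(/\.$/, "").toLowerCase(),
    last: (parts[parts.length - 1] ?? "").toLowerCase(),
  };
}

function maybeSamePerson(a: string, b: string): boolean {
  const x = parseName(a);
  const y = parseName(b);
  if (!x.last || x.last !== y.last) return false;
  if (x.first === y.first) return true;
  // One side is a bare initial: compare first letters only
  const isInitial = (s: string) => s.length === 1;
  return (isInitial(x.first) || isInitial(y.first)) && x.first[0] === y.first[0];
}

console.log(maybeSamePerson("Alice Johnson", "A. Johnson")); // true
console.log(maybeSamePerson("Alice Johnson", "Bob Johnson")); // false
```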
Issue: Missing Entities
Problem: Expected entities not extracted.
Solutions:
- Verify the entity type is in the extractedTypes array
- Check PDF text extraction quality (scanned PDFs may have OCR errors)
- Use ModelDocument for PDFs (not Text): vision models are more accurate
- Lower the confidence threshold temporarily to see if entities are extracted with low confidence
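When tracking down missing entities, it helps to diff the names you expected against what was extracted, separating true misses from low-confidence hits. A sketch over plain data, assuming you have already collected (name, confidence) pairs, for example by averaging each observation's occurrence confidences as in Pattern 2; ExtractedEntity and auditExtraction are hypothetical.

```typescript
interface ExtractedEntity { name: string; confidence: number }

interface ExtractionAudit {
  found: string[];         // expected and extracted at or above the threshold
  lowConfidence: string[]; // expected and extracted, but below the threshold
  missing: string[];       // expected and not extracted at all
}

// Compare expected names (case-insensitive) against extraction results.
function auditExtraction(expected: string[], extracted: ExtractedEntity[], threshold = 0.8): ExtractionAudit {
  const best = new Map<string, number>();
  for (const e of extracted) {
    const key = e.name.trim().toLowerCase();
    best.set(key, Math.max(best.get(key) ?? 0, e.confidence)); // keep the highest score per name
  }
  const audit: ExtractionAudit = { found: [], lowConfidence: [], missing: [] };
  for (const name of expected) {
    const score = best.get(name.trim().toLowerCase());
    if (score === undefined) audit.missing.push(name);
    else if (score >= threshold) audit.found.push(name);
    else audit.lowConfidence.push(name);
  }
  return audit;
}

const audit = auditExtraction(
  ["Geoffrey Hinton", "DeepMind", "Fei-Fei Li"],
  [{ name: "Geoffrey Hinton", confidence: 0.95 }, { name: "deepmind", confidence: 0.55 }],
);
console.log(audit);
```

If names land in lowConfidence, lowering the threshold will surface them; names in missing point back to extractedTypes or text-extraction quality.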
What's Next?
You now have everything you need to build production knowledge graphs. Next steps:
- Integrate with your app: Use entity IDs to filter search, build entity-driven UIs
- Add more content types: Emails, Slack, meetings—unified knowledge graph
- Explore relationships: Build co-occurrence networks, guided search
- Scale to production: Webhooks, batch processing, monitoring
Related guides:
- The Complete Guide to Search - Use entities to filter search results
- Building AI Chat Applications - Entity-filtered RAG
- Data Connectors Guide - Connect Gmail, Slack, etc. for entity extraction
Complete Example: Production Knowledge Graph
Here's a complete, production-ready example that ties everything together:
import { Graphlit } from 'graphlit-client';
import {
FilePreparationServiceTypes,
EntityExtractionServiceTypes,
ObservableTypes,
EntityState
} from 'graphlit-client/dist/generated/graphql-types';
const graphlit = new Graphlit();
async function buildProductionKnowledgeGraph() {
// 1. Create extraction workflow
console.log('Creating workflow...');
const workflow = await graphlit.createWorkflow({
name: "Production Entity Extraction",
preparation: {
jobs: [{
connector: {
type: FilePreparationServiceTypes.ModelDocument
}
}]
},
extraction: {
jobs: [{
connector: {
type: EntityExtractionServiceTypes.ModelDocument,
extractedTypes: [
ObservableTypes.Person,
ObservableTypes.Organization,
ObservableTypes.Place,
ObservableTypes.Event,
ObservableTypes.Product
]
}
}]
}
});
// 2. Ingest multiple documents
console.log('Ingesting documents...');
const documents = [
'https://company.com/annual-report.pdf',
'https://company.com/strategy.pdf',
'https://company.com/team-directory.pdf'
];
const contentIds: string[] = [];
for (const url of documents) {
const content = await graphlit.ingestUri(
url,
undefined,
undefined,
undefined,
undefined,
{ id: workflow.createWorkflow.id }
);
contentIds.push(content.ingestUri.id);
}
// 3. Wait for all to complete
console.log('Processing...');
for (const contentId of contentIds) {
let isDone = false;
while (!isDone) {
const status = await graphlit.isContentDone(contentId);
isDone = status.isContentDone.result;
if (!isDone) await new Promise(r => setTimeout(r, 2000));
}
}
// 4. Query unified knowledge graph
console.log('\n=== Knowledge Graph Statistics ===');
const people = await graphlit.queryObservables({
filter: { types: [ObservableTypes.Person], states: [EntityState.Enabled] }
});
console.log(`Total people: ${people.observables?.results?.length}`);
const orgs = await graphlit.queryObservables({
filter: { types: [ObservableTypes.Organization], states: [EntityState.Enabled] }
});
console.log(`Total organizations: ${orgs.observables?.results?.length}`);
const places = await graphlit.queryObservables({
filter: { types: [ObservableTypes.Place], states: [EntityState.Enabled] }
});
console.log(`Total places: ${places.observables?.results?.length}`);
// 5. Find top entities by mention count
console.log('\n=== Most Mentioned People ===');
people.observables?.results
?.sort((a, b) => (b.observable.observationCount || 0) - (a.observable.observationCount || 0))
.slice(0, 10)
.forEach((result, i) => {
console.log(`${i + 1}. ${result.observable.name} (${result.observable.observationCount} mentions)`);
});
// 6. Build relationship network
console.log('\n=== Person-Organization Relationships ===');
// Get all observations from all content, tagged with their source content ID
// (page indexes are only comparable within the same document)
const allObservations: any[] = [];
for (const contentId of contentIds) {
const content = await graphlit.getContent(contentId);
allObservations.push(
...(content.content.observations || []).map((obs: any) => ({ ...obs, contentId }))
);
}
// Find co-occurrences
const relationships = new Map<string, Set<string>>();
allObservations
.filter(obs => obs.type === ObservableTypes.Person)
.forEach(personObs => {
const personName = personObs.observable.name;
allObservations
.filter(obs => obs.type === ObservableTypes.Organization)
.forEach(orgObs => {
const orgName = orgObs.observable.name;
// Check if on same pages in same document (key = contentId + page index)
const personPages = new Set(
personObs.occurrences?.map((occ: any) => `${personObs.contentId}-${occ.pageIndex}`)
);
const orgPages = new Set(
orgObs.occurrences?.map((occ: any) => `${orgObs.contentId}-${occ.pageIndex}`)
);
const overlap = Array.from(personPages).filter(p => orgPages.has(p));
if (overlap.length > 0) {
if (!relationships.has(personName)) {
relationships.set(personName, new Set());
}
relationships.get(personName)!.add(orgName);
}
});
});
// Display top relationships
Array.from(relationships.entries())
.sort((a, b) => b[1].size - a[1].size)
.slice(0, 10)
.forEach(([person, orgs]) => {
console.log(`${person} ↔ ${Array.from(orgs).join(', ')}`);
});
console.log('\n✓ Knowledge graph complete!');
}
buildProductionKnowledgeGraph().catch(console.error);
Run this, and you'll have a production knowledge graph with entity statistics, relationships, and queryable structure—ready to power search, RAG, and AI applications.
Happy graph building! 🚀