Search is the foundation of AI applications—whether you're building RAG systems, Q&A interfaces, or knowledge bases. Graphlit provides three search types (vector, keyword, hybrid), advanced filtering, and performance optimizations that let you build production-grade search experiences.
This guide covers everything from basic queries to advanced patterns like entity filtering, metadata queries, and performance tuning. By the end, you'll know which search type to use, how to optimize queries, and how to build powerful filtered search UIs.
What You'll Learn
- Vector, keyword, and hybrid search—when to use each
- How the RRF (Reciprocal Rank Fusion) algorithm works
- Content filters: types, dates, entities, metadata
- "Find similar" queries for recommendation engines
- Performance patterns and optimization strategies
- Production search architecture
Prerequisites:
- A Graphlit project with ingested content - Get started (15 min)
- SDK installed: npm install graphlit-client (30 sec)
Time to complete: 60 minutes
Difficulty: Intermediate
Developer Note: All Graphlit IDs are GUIDs (e.g., 550e8400-e29b-41d4-a716-446655440000). In code examples below, we use short placeholders for readability. Example outputs show realistic GUID format.
Table of Contents
- The Three Search Types
- How Hybrid Search Works (RRF)
- Filtering Search Results
- Find Similar Queries
- Performance Patterns
- Advanced Patterns
- Production Search Architecture
Part 1: The Three Search Types
Graphlit supports three search approaches. Understanding when to use each is critical for building great search experiences.
Hybrid Search (Default & Recommended)
Combines vector and keyword search using the Reciprocal Rank Fusion (RRF) algorithm. This is the default and works best for most use cases.
import { Graphlit } from 'graphlit-client';
const graphlit = new Graphlit();
// Hybrid search (default - no searchType needed)
const results = await graphlit.queryContents({
search: "machine learning applications in healthcare"
});
console.log(`Found ${results.contents.results.length} results`);
results.contents.results.forEach((content, index) => {
console.log(`${index + 1}. ${content.name}`);
console.log(` Relevance: ${(content.relevance * 100).toFixed(1)}%`);
console.log(` Type: ${content.type}`);
});
Example output:
Found 15 results
1. Machine Learning in Medical Diagnosis
Relevance: 92.3%
Type: DOCUMENT
2. Healthcare AI Applications
Relevance: 87.1%
Type: DOCUMENT
3. ML Models for Patient Care
Relevance: 83.5%
Type: DOCUMENT
When to use:
- Almost always (it's the default for a reason)
- Production applications
- User-facing search
- Mixed queries (names + concepts)
- When unsure which search type to use
Vector Search (Semantic)
Finds content by meaning, not exact words. Converts text to embeddings (high-dimensional vectors) and measures similarity.
import { SearchTypes } from 'graphlit-client/dist/generated/graphql-types';
// Pure vector search
const vectorResults = await graphlit.queryContents({
search: "reducing carbon emissions",
searchType: SearchTypes.Vector
});
What it finds:
- "lowering CO2 output" ✓
- "decreasing greenhouse gases" ✓
- "climate change mitigation" ✓
- "sustainability efforts" ✓
When to use:
- Conceptual queries ("What is AI safety?")
- Finding semantically similar content
- Cross-language concepts
- Exploratory search
- When users don't know exact terminology
When NOT to use:
- Exact phrases ("Project Apollo v2.3")
- Names and identifiers ("JIRA-1234")
- Very short queries (< 3 words)
- Code or technical identifiers
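To make "measures similarity" concrete, here is a minimal sketch of cosine similarity, the measure that appears in the pipeline diagram in Part 2. The three-dimensional vectors are made-up stand-ins for real embeddings, and `cosineSimilarity` is an illustrative helper, not part of the Graphlit SDK.

```typescript
// Illustrative helper (not part of the Graphlit SDK): cosine similarity of two vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same number of dimensions");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as having no similarity.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 3-dimensional "embeddings" (made-up values; real embeddings have far more dimensions).
const queryEmbedding = [0.9, 0.1, 0.0];  // stands in for "reducing carbon emissions"
const co2Embedding = [0.8, 0.2, 0.1];    // stands in for "lowering CO2 output"
const recipeEmbedding = [0.0, 0.1, 0.9]; // stands in for unrelated content

const co2Score = cosineSimilarity(queryEmbedding, co2Embedding);
const recipeScore = cosineSimilarity(queryEmbedding, recipeEmbedding);
console.log(co2Score > recipeScore); // true: the related text points in a closer direction
```

The score depends only on the direction of the vectors, not their length, which is why texts with different wording but similar meaning can still score highly.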
Keyword Search (Exact Matching)
Traditional token-based search using the BM25 algorithm. Finds exact word matches.
// Pure keyword search
const keywordResults = await graphlit.queryContents({
search: "Kirk Marple",
searchType: SearchTypes.Keyword
});
What it finds:
- Documents containing "Kirk" AND "Marple" ✓
- Exact name matches ✓
- Phrase proximity
When to use:
- Exact phrases or names
- IDs and codes
- Technical terms (not concepts)
- When precision > recall
- Short, specific queries
When NOT to use:
- Conceptual queries
- When synonyms matter
- Cross-language search
- Exploratory queries
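For intuition about how BM25 ranks exact matches, here is a simplified scorer. It assumes a naive lowercase tokenizer and the commonly used defaults k1 = 1.2 and b = 0.75. This guide doesn't state which tokenizer or parameters Graphlit uses, so treat `tokenize` and `bm25Scores` as hypothetical helpers for illustration only.

```typescript
// Simplified BM25 for illustration only; the tokenizer and parameters are assumptions.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(token => token.length > 0);
}

function bm25Scores(query: string, documents: string[], k1: number = 1.2, b: number = 0.75): number[] {
  const docTokens = documents.map(tokenize);
  const docCount = docTokens.length;
  const totalLength = docTokens.reduce((sum, tokens) => sum + tokens.length, 0);
  const avgLength = totalLength / Math.max(docCount, 1);
  // De-duplicate query terms so a repeated word is not counted twice.
  const queryTerms = tokenize(query).filter((term, i, all) => all.indexOf(term) === i);

  return docTokens.map(tokens => {
    let score = 0;
    for (const term of queryTerms) {
      const termFrequency = tokens.filter(token => token === term).length;
      if (termFrequency === 0) continue;
      // Number of documents that contain the term at least once.
      const docsWithTerm = docTokens.filter(other => other.indexOf(term) !== -1).length;
      // IDF variant that stays non-negative: rarer terms get a larger weight.
      const idf = Math.log(1 + (docCount - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
      // Longer-than-average documents are penalized slightly.
      const lengthNorm = 1 - b + b * (tokens.length / avgLength);
      score += (idf * termFrequency * (k1 + 1)) / (termFrequency + k1 * lengthNorm);
    }
    return score;
  });
}

const corpus = [
  "Kirk Marple founded the company",
  "Captain Kirk commands the ship",
  "An overview of vector databases",
];
const keywordScores = bm25Scores("Kirk Marple", corpus);
console.log(keywordScores);
```

In this toy corpus the first document matches both terms and scores highest, the second matches only "Kirk", and the third scores zero because it shares no tokens with the query. That zero is the behavior that makes keyword search precise but blind to synonyms.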
Decision Matrix
Rule of thumb: When in doubt, use hybrid (the default). It handles edge cases better than either approach alone.
✅ Quick Win: Start with hybrid search—it's the default for a reason. Only switch to pure vector or keyword if you have specific requirements after testing.
Part 2: How Hybrid Search Works (RRF Deep Dive)
Hybrid search uses Reciprocal Rank Fusion (RRF) to merge vector and keyword results.
The Algorithm
For each result:
RRF_score = 1/(k + vector_rank) + 1/(k + keyword_rank)
Where:
k = 60 (constant)
vector_rank = position in vector search results (1-indexed)
keyword_rank = position in keyword search results (1-indexed)
Worked Example
Query: "machine learning"
Vector search results (semantic):
- "ML Applications" (rank 1)
- "AI Algorithms" (rank 2)
- "Deep Learning Guide" (rank 3)
- "Neural Networks" (rank 4)
Keyword search results (exact match):
- "Machine Learning Basics" (rank 1)
- "ML Applications" (rank 2) ← Also in vector!
- "Learn Machine Learning" (rank 3)
- "ML Fundamentals" (rank 4)
RRF scoring:
"ML Applications" (appears in BOTH):
- Vector: 1/(60+1) = 0.0164
- Keyword: 1/(60+2) = 0.0161
- Combined: 0.0325 ← Highest score!
"Machine Learning Basics" (keyword only):
- Vector: not in top results = 0
- Keyword: 1/(60+1) = 0.0164
- Combined: 0.0164
"AI Algorithms" (vector only):
- Vector: 1/(60+2) = 0.0161
- Keyword: not in top results = 0
- Combined: 0.0161
Final ranking:
- "ML Applications" (0.0325) ← Appears in both, ranked highest
- "Machine Learning Basics" (0.0164)
- "AI Algorithms" (0.0161)
- "Deep Learning Guide" (0.0159)
- "Learn Machine Learning" (0.0159)
Key insight: Content that appears in BOTH search types gets boosted significantly. This is why hybrid works well—it rewards content that matches both semantic meaning AND exact terms.
💡 Pro Tip: RRF eliminates the need for score normalization. Since it uses ranks (1, 2, 3...) not scores (0.95, 124, etc.), you can combine results from any algorithms without complex math.
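The formula and worked example above can be reproduced in a few lines. This is a sketch of the general RRF technique, not Graphlit's internal implementation; `rrfFuse` is a hypothetical helper, and it assumes each input list contains a given ID at most once.

```typescript
// Reciprocal Rank Fusion over any number of ranked lists (illustrative helper).
// Each inner array holds result IDs or titles in rank order, best first.
function rrfFuse(rankings: string[][], k: number = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-indexed
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest combined score first; items missing from a list simply contribute 0 for it.
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

const vectorRanking = ["ML Applications", "AI Algorithms", "Deep Learning Guide", "Neural Networks"];
const keywordRanking = ["Machine Learning Basics", "ML Applications", "Learn Machine Learning", "ML Fundamentals"];

const fused = rrfFuse([vectorRanking, keywordRanking]);
fused.forEach(item => console.log(`${item.id}: ${item.score.toFixed(4)}`));
```

Running it on the two lists from the worked example puts "ML Applications" first with a score of about 0.0325, followed by "Machine Learning Basics" and "AI Algorithms", matching the final ranking above.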
The Pipeline
User Query: "machine learning in healthcare"
↓
Split into TWO parallel searches:
├─ Vector Search
│ ↓
│ Query → Embedding (3072-dim vector)
│ ↓
│ Cosine similarity vs content embeddings
│ ↓
│ Ranked results A
│
└─ Keyword Search
↓
Token matching (BM25)
↓
Ranked results B
↓
RRF Fusion (merge A + B using reciprocal ranks)
↓
Final ranked results (sorted by RRF score)
Why RRF Works
Problem with simple concatenation:
// Bad: Just concatenate results
const results = [...vectorResults, ...keywordResults];
// Issue: How do you merge scores from different algorithms?
// Vector scores: 0.95, 0.89, 0.82
// Keyword scores: 124, 98, 76
// Can't compare directly!
RRF solution:
// Good: Use ranks (positions), not scores
// Ranks are universal: 1, 2, 3, 4...
// RRF converts ranks to comparable scores
Benefits:
- No score normalization needed
- Robust to different search algorithms
- Rewards consensus (appearing in both)
- Simple and effective
Part 3: Filtering Search Results
Filters let you narrow search results by content type, date, entities, and metadata.
Filter by Content Type
import { ContentTypes } from 'graphlit-client/dist/generated/graphql-types';
// Search only documents (PDFs, Word files, etc.)
const documentResults = await graphlit.queryContents({
search: "annual report",
filter: {
types: [ContentTypes.Document]
}
});
// Search only emails
const emailResults = await graphlit.queryContents({
search: "project update",
filter: {
types: [ContentTypes.Email]
}
});
// Multiple types
const results = await graphlit.queryContents({
search: "Q4 strategy",
filter: {
types: [ContentTypes.Document, ContentTypes.Email, ContentTypes.Message]
}
});
Available content types:
- ContentTypes.File - Generic files
- ContentTypes.Page - Web pages
- ContentTypes.Message - Slack, Teams, Discord
- ContentTypes.Post - Social media
- ContentTypes.Email - Gmail, Outlook
- ContentTypes.Event - Calendar events
- ContentTypes.Issue - Jira, Linear, GitHub issues
- ContentTypes.Document - PDFs, Word, etc.
Filter by File Type
import { FileTypes } from 'graphlit-client/dist/generated/graphql-types';
// Search only PDFs
const pdfResults = await graphlit.queryContents({
search: "technical specification",
filter: {
fileTypes: [FileTypes.Pdf]
}
});
// Search documents and presentations
const docResults = await graphlit.queryContents({
search: "product roadmap",
filter: {
fileTypes: [FileTypes.Pdf, FileTypes.Docx, FileTypes.Pptx]
}
});
Common file types:
- FileTypes.Pdf
- FileTypes.Docx
- FileTypes.Pptx
- FileTypes.Xlsx
- FileTypes.Md - Markdown
- FileTypes.Txt
- FileTypes.Mp4 - Video
- FileTypes.Mp3 - Audio
Filter by Date Range
// Content created in last 30 days
const recent = await graphlit.queryContents({
search: "project updates",
filter: {
creationDateRange: {
from: new Date(Date.now() - 30 * 24 * 60 * 60 * 1000).toISOString(),
to: new Date().toISOString()
}
}
});
// Content from specific quarter
const q4Results = await graphlit.queryContents({
search: "financial report",
filter: {
creationDateRange: {
from: "2024-10-01T00:00:00Z",
to: "2024-12-31T23:59:59Z"
}
}
});
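If you build date ranges in several places, small helpers keep the arithmetic in one spot. The sketch below is plain TypeScript with no SDK calls; `lastDays` and `quarterRange` are hypothetical names, and both work in UTC.

```typescript
interface IsoDateRange {
  from: string;
  to: string;
}

// Range covering the last `days` days, ending at `now` (defaults to the current time).
function lastDays(days: number, now: Date = new Date()): IsoDateRange {
  const millisPerDay = 24 * 60 * 60 * 1000;
  return {
    from: new Date(now.getTime() - days * millisPerDay).toISOString(),
    to: now.toISOString(),
  };
}

// Range covering one calendar quarter (1-4) in UTC.
function quarterRange(year: number, quarter: 1 | 2 | 3 | 4): IsoDateRange {
  const startMonth = (quarter - 1) * 3; // 0-based month index
  const start = Date.UTC(year, startMonth, 1);
  const nextQuarterStart = Date.UTC(year, startMonth + 3, 1); // month overflow rolls into next year
  return {
    from: new Date(start).toISOString(),
    to: new Date(nextQuarterStart - 1).toISOString(), // last millisecond of the quarter
  };
}

const q4 = quarterRange(2024, 4);
console.log(q4.from); // 2024-10-01T00:00:00.000Z
console.log(q4.to);   // 2024-12-31T23:59:59.999Z
```

Either result can be passed as the `creationDateRange` value in the filters shown above.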
Filter by Entity (Knowledge Graph)
Powerful pattern: Search content mentioning specific people, companies, or places.
import { ObservableTypes } from 'graphlit-client/dist/generated/graphql-types';
// First, find the entity
const aliceEntity = await graphlit.queryObservables({
filter: {
searchText: "Alice Johnson",
types: [ObservableTypes.Person]
}
});
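const aliceId = aliceEntity.observables?.results?.[0]?.observable.id;
// Stop early if no matching entity was found; an undefined ID would make the filter meaningless
if (!aliceId) throw new Error('No entity found for "Alice Johnson"');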
// Search content mentioning Alice
const results = await graphlit.queryContents({
search: "project status",
filter: {
observations: {
observables: [{ id: aliceId }]
}
}
});
console.log(`Content mentioning Alice Johnson about project status:`);
results.contents.results.forEach(content => {
console.log(`- ${content.name}`);
});
Use cases:
- "Show me all emails mentioning Bob"
- "Find docs mentioning Acme Corp"
- "Content about Project Phoenix"
- "Meetings with the CFO"
Combine Multiple Filters
// Complex filter: Recent PDFs mentioning Alice about "budget"
const complexResults = await graphlit.queryContents({
search: "budget proposal",
filter: {
types: [ContentTypes.Document],
fileTypes: [FileTypes.Pdf],
creationDateRange: {
from: new Date(Date.now() - 90 * 24 * 60 * 60 * 1000).toISOString()
},
observations: {
observables: [{ id: aliceId }]
}
}
});
console.log(`Recent PDF budget proposals mentioning Alice: ${complexResults.contents.results.length}`);
Part 4: "Find Similar" Queries
Build recommendation engines and "more like this" features.
Find Similar Documents
// User reads a document, show similar content
const currentContentId = '550e8400-e29b-41d4-a716-446655440000';
const similar = await graphlit.queryContents({
filter: {
similarContents: [{ id: currentContentId }]
},
limit: 5
});
console.log('Similar documents:');
similar.contents.results.forEach((content, index) => {
console.log(`${index + 1}. ${content.name} (${(content.relevance * 100).toFixed(1)}% match)`);
});
How it works:
- Uses vector embeddings of entire document
- Finds nearest neighbors in embedding space
- No query text needed—pure similarity
Use cases:
- "More like this" buttons
- Related articles
- Recommendation engines
- Content discovery
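Conceptually, "nearest neighbors in embedding space" looks like the brute-force sketch below. It is an illustration under simplifying assumptions: the two-dimensional embeddings are invented, `findSimilar` is a hypothetical helper, and a real service would typically use an approximate index rather than scanning every document.

```typescript
interface EmbeddedDoc {
  id: string;
  embedding: number[];
}

// Cosine similarity, assuming both vectors have the same length.
function similarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, value, i) => sum + value * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((sum, value) => sum + value * value, 0));
  const denominator = norm(a) * norm(b);
  return denominator === 0 ? 0 : dot / denominator;
}

// Return the `k` documents closest to `sourceId`, excluding the source itself.
function findSimilar(docs: EmbeddedDoc[], sourceId: string, k: number): Array<{ id: string; score: number }> {
  const source = docs.find(doc => doc.id === sourceId);
  if (!source) throw new Error(`Unknown document: ${sourceId}`);
  return docs
    .filter(doc => doc.id !== sourceId)
    .map(doc => ({ id: doc.id, score: similarity(source.embedding, doc.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Invented 2-dimensional embeddings for four documents.
const library: EmbeddedDoc[] = [
  { id: "a", embedding: [1, 0] },
  { id: "b", embedding: [0.9, 0.1] },
  { id: "c", embedding: [0, 1] },
  { id: "d", embedding: [0.7, 0.7] },
];
const neighbors = findSimilar(library, "a", 2);
console.log(neighbors.map(n => n.id)); // [ 'b', 'd' ]
```

Note that no query text is involved anywhere: the source document's own embedding plays the role of the query.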
Similar + Search
// Find similar documents about a specific topic
const results = await graphlit.queryContents({
search: "machine learning",
filter: {
similarContents: [{ id: currentContentId }]
}
});
// Result: Documents similar to currentContent that also mention "machine learning"
Similar + Filters
// Similar PDFs from last 3 months
const filteredSimilar = await graphlit.queryContents({
filter: {
similarContents: [{ id: currentContentId }],
fileTypes: [FileTypes.Pdf],
creationDateRange: {
from: new Date(Date.now() - 90 * 24 * 60 * 60 * 1000).toISOString()
}
}
});
Part 5: Performance Patterns
Pagination
// First page (10 results)
const page1 = await graphlit.queryContents({
search: "artificial intelligence",
limit: 10,
offset: 0
});
// Second page
const page2 = await graphlit.queryContents({
search: "artificial intelligence",
limit: 10,
offset: 10
});
// Third page
const page3 = await graphlit.queryContents({
search: "artificial intelligence",
limit: 10,
offset: 20
});
Best practice: Limit to 10-20 results per page for optimal performance.
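A small helper can translate page numbers from your UI into the `limit` and `offset` values shown above. `pageParams` and `hasNextPage` are hypothetical names; the "short page means last page" check is a common heuristic, not something this guide documents the API as guaranteeing.

```typescript
// Convert a 1-based page number into the limit/offset pair used by the queries above.
function pageParams(page: number, pageSize: number = 10): { limit: number; offset: number } {
  if (!Number.isInteger(page) || page < 1) {
    throw new Error("page must be an integer >= 1");
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new Error("pageSize must be an integer >= 1");
  }
  return { limit: pageSize, offset: (page - 1) * pageSize };
}

// Heuristic: a full page may have more results after it; a short page is the last one.
function hasNextPage(resultsOnPage: number, pageSize: number): boolean {
  return resultsOnPage === pageSize;
}

console.log(pageParams(3, 10)); // { limit: 10, offset: 20 }
```

You could then spread the result into a query, for example `graphlit.queryContents({ search: query, ...pageParams(2) })`.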
Limit Results for Speed
// Fast: Get top 5 results
const topResults = await graphlit.queryContents({
search: "quarterly report",
limit: 5
});
// Slower: Get top 100 results
const manyResults = await graphlit.queryContents({
search: "quarterly report",
limit: 100 // Takes longer
});
Rule of thumb (approximate; actual latency depends on corpus size, filters, and network conditions):
- Top 10-20: Very fast (< 100ms)
- Top 50: Fast (< 300ms)
- Top 100+: Slower (> 500ms)
⚠️ Warning: Requesting 1000+ results can time out. Use pagination instead: fetch 20 at a time as users scroll.
Pre-filter for Speed
// Slow: Search everything, then filter in app
const allResults = await graphlit.queryContents({
search: "project update"
});
const pdfResults = allResults.contents.results.filter(c => c.fileType === 'PDF');
// Fast: Filter at query time
const fastResults = await graphlit.queryContents({
search: "project update",
filter: {
fileTypes: [FileTypes.Pdf]
}
});
Why faster:
- Pre-filtering reduces search space
- Vector search only scans relevant content
- Indexed filters are optimized
Cache Repeated Queries
// In your application (not Graphlit SDK)
const searchCache = new Map<string, any>();
async function cachedSearch(query: string) {
if (searchCache.has(query)) {
console.log('Cache hit!');
return searchCache.get(query);
}
const results = await graphlit.queryContents({ search: query });
searchCache.set(query, results);
// Expire after 5 minutes
setTimeout(() => searchCache.delete(query), 5 * 60 * 1000);
return results;
}
// Usage
const results = await cachedSearch("machine learning"); // First call: hits API
const results2 = await cachedSearch("machine learning"); // Second call: cache hit
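The cache above keys on the query text alone, so two requests with the same text but different filters or pages would share an entry. One possible refinement, sketched below with hypothetical names (`TtlCache`, `cacheKey`), is to key on the whole request and expire entries lazily on read. The injectable clock is there only to make expiry easy to test.

```typescript
// Small TTL cache with lazy expiry (illustrative; not part of the Graphlit SDK).
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: drop it on read instead of using a timer
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Build a key that covers the whole request, not only the query text.
// Note: JSON.stringify is sensitive to property order inside `filter`.
function cacheKey(request: { search: string; limit?: number; offset?: number; filter?: unknown }): string {
  return JSON.stringify([
    request.search,
    request.limit ?? null,
    request.offset ?? null,
    request.filter ?? null,
  ]);
}
```

A wrapper like `cachedSearch` could then look up `cacheKey(request)` first and only call the API on a miss.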
Optimize Filters
// Slow: Multiple sequential filters
const step1 = await graphlit.queryContents({ search: "AI" });
const step2 = step1.contents.results.filter(c => c.type === 'DOCUMENT');
const step3 = step2.filter(c => new Date(c.creationDate) > someDate);
// Fast: Single query with all filters
const optimized = await graphlit.queryContents({
search: "AI",
filter: {
types: [ContentTypes.Document],
creationDateRange: { from: someDate.toISOString() }
}
});
Part 6: Advanced Patterns
Faceted Search
Build filter UIs that show available options:
// Get all content types in results
const results = await graphlit.queryContents({
search: "machine learning"
});
const typeCounts = new Map<string, number>();
results.contents.results.forEach(content => {
typeCounts.set(content.type, (typeCounts.get(content.type) || 0) + 1);
});
console.log('Available filters:');
typeCounts.forEach((count, type) => {
console.log(`${type}: ${count} results`);
});
Example output:
Available filters:
DOCUMENT: 23 results
EMAIL: 12 results
PAGE: 8 results
MESSAGE: 5 results
Use case: Build filter sidebar with counts.
Search + Entity Extraction
Search for content, then extract mentioned entities:
// Search
const results = await graphlit.queryContents({
search: "quarterly financial results"
});
// For each result, get mentioned companies
for (const content of results.contents.results) {
const details = await graphlit.getContent(content.id);
const companies = details.content.observations
?.filter(obs => obs.type === ObservableTypes.Organization)
.map(obs => obs.observable.name);
console.log(`${content.name} mentions: ${companies?.join(', ')}`);
}
Use case: "Show search results grouped by mentioned company."
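The loop above makes one `getContent` call per result, so keep `limit` small when you use it. Once you have the company names, grouping is a simple inversion. The sketch below assumes you have already collected each result's name and company list into plain objects; `groupByCompany` is a hypothetical helper.

```typescript
interface ResultWithCompanies {
  name: string;
  companies: string[];
}

// Invert "result -> companies" into "company -> result names".
function groupByCompany(results: ResultWithCompanies[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const result of results) {
    // De-duplicate so a company mentioned twice in one result is listed once.
    const unique = result.companies.filter((company, i, all) => all.indexOf(company) === i);
    for (const company of unique) {
      const names = groups.get(company) ?? [];
      names.push(result.name);
      groups.set(company, names);
    }
  }
  return groups;
}

// Placeholder data standing in for search results plus extracted organizations.
const grouped = groupByCompany([
  { name: "Q3 Report", companies: ["Acme", "Globex", "Acme"] },
  { name: "Q4 Report", companies: ["Acme"] },
]);
grouped.forEach((names, company) => console.log(`${company}: ${names.join(", ")}`));
```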
Multi-Step Filtering
Narrow results progressively:
// Step 1: Broad search
const step1 = await graphlit.queryContents({
search: "project updates",
limit: 100
});
// User selects "last 30 days" filter
const step2 = await graphlit.queryContents({
search: "project updates",
filter: {
creationDateRange: {
from: new Date(Date.now() - 30 * 24 * 60 * 60 * 1000).toISOString()
}
},
limit: 100
});
// User selects "PDFs only"
const step3 = await graphlit.queryContents({
search: "project updates",
filter: {
creationDateRange: {
from: new Date(Date.now() - 30 * 24 * 60 * 60 * 1000).toISOString()
},
fileTypes: [FileTypes.Pdf]
}
});
console.log(`Narrowed from ${step1.contents.results.length} → ${step2.contents.results.length} → ${step3.contents.results.length}`);
Search Highlighting (Manual)
Graphlit doesn't return highlighted snippets, but you can build them:
const results = await graphlit.queryContents({
search: "machine learning"
});
// For each result, find matching text
results.contents.results.forEach(content => {
if (content.pages && content.pages.length > 0) {
const topChunk = content.pages[0].chunks?.[0];
if (topChunk) {
// Simple highlight (you'd use a better algorithm in production)
const highlighted = topChunk.text
.replace(/machine learning/gi, '<mark>$&</mark>');
console.log(`${content.name}: ...${highlighted.substring(0, 200)}...`);
}
}
});
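The hard-coded regex above works for a fixed phrase. For user-supplied queries, you would likely want to escape regex metacharacters and HTML before injecting `<mark>` tags. A minimal sketch, with hypothetical helper names:

```typescript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Escape characters that have special meaning in HTML.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Wrap each case-insensitive occurrence of `query` in <mark> tags.
function highlightMatches(text: string, query: string): string {
  const trimmed = query.trim();
  if (trimmed.length === 0) return escapeHtml(text);
  const pattern = new RegExp(`(${escapeRegExp(trimmed)})`, "gi");
  // split() with one capture group returns matches at the odd indexes.
  return text
    .split(pattern)
    .map((part, index) =>
      index % 2 === 1 ? `<mark>${escapeHtml(part)}</mark>` : escapeHtml(part)
    )
    .join("");
}

console.log(highlightMatches("Machine learning rocks", "machine learning"));
// <mark>Machine learning</mark> rocks
```

Matching against the raw text and escaping each piece afterwards avoids accidentally highlighting inside an HTML entity such as `&amp;`.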
Part 7: Production Search Architecture
Pattern 1: Two-Tier Search
Fast initial search, detailed follow-up:
// Tier 1: Fast search (small result set)
const quickResults = await graphlit.queryContents({
search: query,
limit: 20
});
// Display results to user
displayResults(quickResults);
// Tier 2: User clicks a result, fetch full details
async function onResultClick(contentId: string) {
const details = await graphlit.getContent(contentId);
// Show full content with metadata, entities, etc.
displayFullContent(details);
}
Pattern 2: Search + RAG
Use search results as context for AI:
// Search for relevant content
const searchResults = await graphlit.queryContents({
search: "What is our return policy?",
limit: 5
});
// Extract text from top results
const context = searchResults.contents.results
.map(c => c.pages?.[0]?.chunks?.[0]?.text || '')
.join('\n\n');
// Pass to conversation (RAG)
const conversation = await graphlit.createConversation('Customer Support');
const response = await graphlit.promptConversation(
`Context:\n${context}\n\nQuestion: What is our return policy?`,
conversation.createConversation.id
);
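Search results can be long, and some may have no extracted text, so it can help to cap how much context you send. The sketch below uses a character budget as a rough stand-in for a token budget (an assumption; pick a limit that suits your model). `buildContext` is a hypothetical helper.

```typescript
// Join chunks into one context string without exceeding `maxChars`.
function buildContext(chunks: string[], maxChars: number, separator: string = "\n\n"): string {
  const selected: string[] = [];
  let used = 0;
  for (const chunk of chunks) {
    const text = chunk.trim();
    if (text.length === 0) continue; // skip results with no extracted text
    const cost = text.length + (selected.length > 0 ? separator.length : 0);
    if (used + cost > maxChars) break; // keep whole chunks only; stop at the budget
    selected.push(text);
    used += cost;
  }
  return selected.join(separator);
}

// Placeholder chunks standing in for text taken from the top search results.
const ragContext = buildContext(
  ["Returns are accepted within 30 days.", "", "Refunds are issued to the original payment method."],
  200
);
console.log(ragContext);
```

Because results arrive ordered by relevance, stopping at the first chunk that does not fit keeps the most relevant text and drops the tail.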
Pattern 3: User-Scoped Search
Multi-tenant search with user-specific content:
// Each user has a collection
const userCollectionId = await getUserCollection(userId);
// Search only user's content
const results = await graphlit.queryContents({
search: query,
filter: {
collections: [{ id: userCollectionId }]
}
});
Pattern 4: Real-Time Search
Debounced search-as-you-type:
import { debounce } from 'lodash';
// Debounce search to avoid hammering API
const debouncedSearch = debounce(async (query: string) => {
if (query.length < 3) return; // Don't search short queries
const results = await graphlit.queryContents({
search: query,
limit: 10
});
updateSearchResults(results);
}, 300); // 300ms delay
// User types in search box
searchInput.addEventListener('input', (e) => {
// e.target is typed as EventTarget, so cast it to read the input's value
debouncedSearch((e.target as HTMLInputElement).value);
});
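Debouncing limits how often you call the API, but responses can still arrive out of order. One common complement, sketched here with a hypothetical `createLatestOnly` helper, is to tag each request and ignore any response that is no longer the latest.

```typescript
// Tracks the most recent request so late responses from older requests can be ignored.
function createLatestOnly(): { begin: () => number; isCurrent: (token: number) => boolean } {
  let latest = 0;
  return {
    begin: () => ++latest, // call when a request starts; returns its token
    isCurrent: (token: number) => token === latest, // check when its response arrives
  };
}

const tracker = createLatestOnly();
const firstRequest = tracker.begin();  // e.g. the user typed "mac"
const secondRequest = tracker.begin(); // e.g. the user typed "machine"
console.log(tracker.isCurrent(firstRequest));  // false: a newer request has started
console.log(tracker.isCurrent(secondRequest)); // true
```

Inside the debounced handler you would call `tracker.begin()` before the query, and only call `updateSearchResults` if `tracker.isCurrent(token)` is still true when the response arrives.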
Pattern 5: Federated Search
Search multiple content sources simultaneously:
// Search docs
const docResults = graphlit.queryContents({
search: query,
filter: { types: [ContentTypes.Document] }
});
// Search emails
const emailResults = graphlit.queryContents({
search: query,
filter: { types: [ContentTypes.Email] }
});
// Search messages
const messageResults = graphlit.queryContents({
search: query,
filter: { types: [ContentTypes.Message] }
});
// Combine results
const [docs, emails, messages] = await Promise.all([docResults, emailResults, messageResults]);
console.log(`Found: ${docs.contents.results.length} docs, ${emails.contents.results.length} emails, ${messages.contents.results.length} messages`);
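If you want a single blended list rather than three separate ones, one simple option is round-robin interleaving, which avoids assuming that relevance scores from separate queries are directly comparable. `interleave` is a hypothetical helper, and the IDs are placeholders.

```typescript
// Interleave several ranked lists round-robin, preserving each list's internal order.
function interleave<T>(lists: T[][], limit: number): T[] {
  const merged: T[] = [];
  const longest = lists.reduce((max, list) => Math.max(max, list.length), 0);
  for (let position = 0; position < longest && merged.length < limit; position++) {
    for (const list of lists) {
      // Take the item at this position from each list that still has one.
      if (position < list.length && merged.length < limit) {
        merged.push(list[position]);
      }
    }
  }
  return merged;
}

// Placeholder IDs standing in for document, email, and message results.
const blended = interleave([["d1", "d2", "d3"], ["e1"], ["m1", "m2"]], 5);
console.log(blended); // [ 'd1', 'e1', 'm1', 'd2', 'm2' ]
```

Each source gets a fair share of the top slots, which suits a "top results across everything" view. If you need a stricter global ordering, the rank-based fusion described in Part 2 is another option.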
Common Issues & Solutions
Issue: No Results for Exact Phrase
Problem: Searching "Project Alpha" returns nothing, but you know it exists.
Solutions:
- Try keyword search:
const results = await graphlit.queryContents({
search: "Project Alpha",
searchType: SearchTypes.Keyword
});
- Check content state:
import { EntityState } from 'graphlit-client/dist/generated/graphql-types';
const results = await graphlit.queryContents({
search: "Project Alpha",
filter: {
states: [EntityState.Enabled, EntityState.Disabled] // Include all states
}
});
Issue: Too Many Irrelevant Results
Problem: Search returns hundreds of loosely related documents.
Solutions:
- Add filters:
const filtered = await graphlit.queryContents({
search: "AI",
filter: {
types: [ContentTypes.Document],
creationDateRange: { from: "2024-01-01T00:00:00Z" }
}
});
- Lower limit:
const topResults = await graphlit.queryContents({
search: "AI",
limit: 10 // Only top 10 most relevant
});
Issue: Slow Queries
Problem: Search takes > 1 second.
Solutions:
- Reduce limit:
// Fast
const results = await graphlit.queryContents({
search: query,
limit: 10
});
- Add filters to narrow scope:
const results = await graphlit.queryContents({
search: query,
filter: {
types: [ContentTypes.Document], // Search only docs
creationDateRange: { from: recentDate } // Only recent
}
});
- Cache common queries (see Performance Patterns above)
What's Next?
You now have a complete understanding of search in Graphlit. Next steps:
- Build a search UI with filters and facets
- Integrate with RAG using search results as context
- Add entity filtering for knowledge graph-powered search
- Optimize performance with the patterns above
Related guides:
- Building Knowledge Graphs - Extract entities for filtered search
- Building AI Chat Applications - RAG is powered by search
- Metadata Filtering - Advanced query optimization
Complete Example: Production Search API
Here's a production-ready search endpoint with all patterns combined:
import { Graphlit } from 'graphlit-client';
import {
ContentTypes,
FileTypes,
SearchTypes,
EntityState
} from 'graphlit-client/dist/generated/graphql-types';
const graphlit = new Graphlit();
interface SearchOptions {
query: string;
contentTypes?: ContentTypes[];
fileTypes?: FileTypes[];
dateFrom?: string;
entityIds?: string[];
searchType?: SearchTypes;
limit?: number;
offset?: number;
}
async function productionSearch(options: SearchOptions) {
const {
query,
contentTypes,
fileTypes,
dateFrom,
entityIds,
searchType = SearchTypes.Hybrid, // Default hybrid
limit = 20,
offset = 0
} = options;
// Build filter dynamically
const filter: any = {};
if (contentTypes && contentTypes.length > 0) {
filter.types = contentTypes;
}
if (fileTypes && fileTypes.length > 0) {
filter.fileTypes = fileTypes;
}
if (dateFrom) {
filter.creationDateRange = { from: dateFrom };
}
if (entityIds && entityIds.length > 0) {
filter.observations = {
observables: entityIds.map(id => ({ id }))
};
}
// Execute search
console.log(`Searching: "${query}" with ${Object.keys(filter).length} filters`);
const startTime = Date.now();
const results = await graphlit.queryContents({
search: query,
searchType,
filter: Object.keys(filter).length > 0 ? filter : undefined,
limit,
offset
});
const duration = Date.now() - startTime;
// Return formatted results
return {
query,
// Number of results returned on this page, not the total number of matches
total: results.contents.results.length,
duration: `${duration}ms`,
results: results.contents.results.map(content => ({
id: content.id,
name: content.name,
type: content.type,
fileType: content.fileType,
relevance: content.relevance,
creationDate: content.creationDate,
// Preview text from first chunk
preview: content.pages?.[0]?.chunks?.[0]?.text?.substring(0, 200)
}))
};
}
// Example usage
const searchResults = await productionSearch({
query: "machine learning healthcare applications",
contentTypes: [ContentTypes.Document],
fileTypes: [FileTypes.Pdf],
dateFrom: new Date(Date.now() - 365 * 24 * 60 * 60 * 1000).toISOString(),
limit: 10
});
console.log(JSON.stringify(searchResults, null, 2));
Output:
{
"query": "machine learning healthcare applications",
"total": 10,
"duration": "142ms",
"results": [
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"name": "ML in Medical Diagnosis.pdf",
"type": "DOCUMENT",
"fileType": "PDF",
"relevance": 0.923,
"creationDate": "2024-06-15T10:30:00Z",
"preview": "Machine learning has revolutionized medical diagnosis by enabling accurate pattern recognition in medical imaging. Deep learning models can now detect anomalies in X-rays..."
}
]
}
Happy searching! 🔍