Specifications control which AI models Graphlit uses for embeddings, chat, entity extraction, and enrichment. The right model choice impacts quality, cost, and latency. This guide helps you choose wisely.
What You'll Learn
- LLM options (GPT-4, Claude, Gemini, Llama)
- Embedding models and dimensions
- Creating custom specifications
- Model selection by use case
- Cost vs quality tradeoffs
- Latency optimization
Prerequisites: A Graphlit project, SDK installed.
Developer Note: All Graphlit IDs are GUIDs. Example outputs show realistic GUID format.
Part 1: Model Types
LLMs (Large Language Models)
Used for: Chat, summarization, entity extraction
Available models:
OpenAI:
- gpt-4o - Latest, best quality, balanced speed
- gpt-4-turbo - Previous generation, still excellent
- gpt-3.5-turbo - Faster, cheaper, lower quality
- gpt-4o-mini - Ultra-fast, cheapest OpenAI model
Anthropic Claude:
- claude-3-5-sonnet - Best reasoning, slower
- claude-3-opus - Highest quality, expensive
- claude-3-haiku - Fastest, cheapest Claude
Google Gemini:
- gemini-1.5-pro - Strong multimodal, good balance
- gemini-1.5-flash - Fast, cost-effective
Meta Llama (via Groq):
- llama-3-70b - Open source, fast inference
Embedding Models
Used for: Vector search, similarity
Available models:
OpenAI:
- text-embedding-3-large - 3072 dimensions, best quality (default)
- text-embedding-3-small - 1536 dimensions, good quality, 2x cheaper
- text-embedding-ada-002 - Legacy, 1536 dimensions
Cohere:
- embed-english-v3.0 - Optimized for English
- embed-multilingual-v3.0 - 100+ languages
Voyage AI:
- voyage-large-2 - High quality, 1536 dimensions
- voyage-code-2 - Optimized for code
Part 2: Creating Specifications
Default Specification
If you don't attach a specification, Graphlit falls back to these defaults:
- LLM: gpt-4o
- Embeddings: text-embedding-3-large
import { Graphlit } from 'graphlit-client';
const graphlit = new Graphlit();
// Uses default models
const conversation = await graphlit.createConversation('My Chat');
const response = await graphlit.promptConversation('Hello', conversation.createConversation.id);
Custom LLM Specification
import { ModelServiceTypes, SpecificationTypes } from 'graphlit-client/dist/generated/graphql-types';
// Create specification with Claude
const spec = await graphlit.createSpecification({
name: 'Claude Sonnet',
type: SpecificationTypes.Completion,
serviceType: ModelServiceTypes.Anthropic,
anthropic: {
model: 'claude-3-5-sonnet-20241022',
temperature: 0.7,
probability: 0.2,
completionTokenLimit: 4096
}
});
console.log('Specification created:', spec.createSpecification.id);
// Use specification with conversation
const conversation = await graphlit.createConversation(
'Claude Chat',
{ id: spec.createSpecification.id }
);
const response = await graphlit.promptConversation('Explain AI', conversation.createConversation.id);
Custom Embedding Specification
// Create embedding specification
const embeddingSpec = await graphlit.createSpecification({
name: 'Small Embeddings',
type: SpecificationTypes.Embedding,
serviceType: ModelServiceTypes.OpenAI,
openAI: {
model: 'text-embedding-3-small', // Cheaper, 1536 dimensions
modelType: 'EMBEDDING'
}
});
// Use with content ingestion
const workflow = await graphlit.createWorkflow({
name: 'Custom Embeddings',
preparation: { /* ... */ },
embedding: { id: embeddingSpec.createSpecification.id }
});
Part 3: Model Selection by Use Case
Chat Applications
Recommended: GPT-4o
const chatSpec = await graphlit.createSpecification({
name: 'Chat Optimized',
type: SpecificationTypes.Completion,
serviceType: ModelServiceTypes.OpenAI,
openAI: {
model: 'gpt-4o',
temperature: 0.7, // Balanced creativity
completionTokenLimit: 2048
}
});
Why: Fast response, good quality, reasonable cost
Alternative: Claude 3.5 Sonnet (see the sketch after this list)
- Use for complex reasoning tasks
- Better for multi-step analysis
- Slower but higher quality
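For those reasoning-heavy cases, a Claude-backed chat specification is a straightforward swap. The following is a minimal sketch that mirrors the Anthropic example from Part 2; the name and parameter values are illustrative starting points, not recommendations.
// Claude 3.5 Sonnet for reasoning-heavy chat
const reasoningChatSpec = await graphlit.createSpecification({
  name: 'Chat (Complex Reasoning)',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: {
    model: 'claude-3-5-sonnet-20241022',
    temperature: 0.5, // Slightly lower for analytical, multi-step answers
    completionTokenLimit: 2048
  }
});
// Attach it when creating the conversation, as in Part 2
const reasoningChat = await graphlit.createConversation(
  'Reasoning Chat',
  { id: reasoningChatSpec.createSpecification.id }
);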
Summarization
Recommended: GPT-4o-mini
const summarySpec = await graphlit.createSpecification({
name: 'Summarization',
type: SpecificationTypes.Completion,
serviceType: ModelServiceTypes.OpenAI,
openAI: {
model: 'gpt-4o-mini',
temperature: 0.3, // Lower for factual output
completionTokenLimit: 512
}
});
Why: Fast, cheap, good enough for summaries
Entity Extraction
Recommended: GPT-4o
const entitySpec = await graphlit.createSpecification({
name: 'Entity Extraction',
type: SpecificationTypes.Completion,
serviceType: ModelServiceTypes.OpenAI,
openAI: {
model: 'gpt-4o',
temperature: 0.1, // Very low for precision
completionTokenLimit: 1024
}
});
Why: Accurate, reliable entity detection
Search/Embeddings
Recommended: text-embedding-3-large (default)
const searchSpec = await graphlit.createSpecification({
name: 'High Quality Embeddings',
type: SpecificationTypes.Embedding,
serviceType: ModelServiceTypes.OpenAI,
openAI: {
model: 'text-embedding-3-large',
modelType: 'EMBEDDING'
}
});
Why: Best search quality, worth the cost
Cost-sensitive alternative: text-embedding-3-small
- 2x cheaper
- 1536 vs 3072 dimensions
- 95% of the quality
Code/Technical Content
Recommended: GPT-4o or Gemini 1.5 Pro
const codeSpec = await graphlit.createSpecification({
name: 'Code Optimized',
type: SpecificationTypes.Completion,
serviceType: ModelServiceTypes.Google,
google: {
model: 'gemini-1.5-pro',
temperature: 0.2,
completionTokenLimit: 4096
}
});
Why: Strong code understanding, good for technical docs
Part 4: Cost Optimization
Pricing Overview (Approximate)
LLM Costs (per 1M tokens):
- GPT-4o: $2.50 input, $10 output
- GPT-4o-mini: $0.15 input, $0.60 output
- Claude 3.5 Sonnet: $3 input, $15 output
- Claude 3 Haiku: $0.25 input, $1.25 output
- Gemini 1.5 Pro: $1.25 input, $5 output
- Gemini 1.5 Flash: $0.075 input, $0.30 output
Embedding Costs (per 1M tokens):
- text-embedding-3-large: $0.13
- text-embedding-3-small: $0.02
- text-embedding-ada-002: $0.10
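To translate these rates into a monthly estimate, you can do the arithmetic directly. The helper below is purely illustrative: it hard-codes the approximate per-1M-token prices listed above, and the token volumes in the usage example are hypothetical.
// Rough monthly cost from the approximate per-1M-token prices above
const pricesPerMillion: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 2.5, output: 10 },
  'gpt-4o-mini': { input: 0.15, output: 0.6 },
  'claude-3-5-sonnet': { input: 3, output: 15 },
  'gemini-1.5-flash': { input: 0.075, output: 0.3 }
};

function estimateCost(model: string, inputTokens: number, outputTokens: number): number {
  const price = pricesPerMillion[model];
  return (inputTokens / 1_000_000) * price.input + (outputTokens / 1_000_000) * price.output;
}

// Example: 10M input tokens and 2M output tokens per month
console.log(estimateCost('gpt-4o', 10_000_000, 2_000_000));      // ~$45.00
console.log(estimateCost('gpt-4o-mini', 10_000_000, 2_000_000)); // ~$2.70
The gap between those two numbers is why the strategies below start with model choice.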
Optimization Strategies
1. Use cheaper models for simple tasks:
// For summarization, don't use GPT-4o
const cheapSummary = await graphlit.createSpecification({
  name: 'Cheap Summary',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAI,
  openAI: { model: 'gpt-4o-mini' } // 17x cheaper!
});
2. Lower temperature for factual tasks:
// Lower temperature = less randomness, which keeps factual answers focused and often shorter
openAI: {
model: 'gpt-4o',
temperature: 0.1, // vs 0.7 default
}
3. Limit output tokens:
openAI: {
model: 'gpt-4o',
completionTokenLimit: 512 // vs 2048 default
}
4. Use smaller embeddings:
// text-embedding-3-small = 2x cheaper
// 95% of quality for most use cases
openAI: {
model: 'text-embedding-3-small'
}
5. Cache embeddings (automatic):
- Graphlit caches embeddings automatically
- Re-ingesting same content doesn't re-embed
Part 5: Latency Optimization
Fastest Models
For chat (latency-critical):
- gpt-4o-mini - Fastest OpenAI model
- claude-3-haiku - Fastest Claude model
- gemini-1.5-flash - Fast Google model
const fastSpec = await graphlit.createSpecification({
name: 'Ultra Fast',
type: SpecificationTypes.Completion,
serviceType: ModelServiceTypes.OpenAI,
openAI: {
model: 'gpt-4o-mini',
temperature: 0.7,
completionTokenLimit: 1024 // Smaller = faster
}
});
Streaming for Perceived Speed
// Even with slower models, streaming feels faster
await graphlit.streamAgent(
'Explain quantum computing',
async (event) => {
// Text appears immediately as it generates
},
conversationId,
{ id: claudeSpecId } // Even Claude feels fast with streaming
);
Part 6: Production Patterns
Pattern 1: Specification Library
// Create once, reuse everywhere
const specs = {
  chat: await graphlit.createSpecification({
    name: 'Chat',
    type: SpecificationTypes.Completion,
    serviceType: ModelServiceTypes.OpenAI,
    openAI: { model: 'gpt-4o' }
  }),
  summary: await graphlit.createSpecification({
    name: 'Summary',
    type: SpecificationTypes.Completion,
    serviceType: ModelServiceTypes.OpenAI,
    openAI: { model: 'gpt-4o-mini' }
  }),
  entities: await graphlit.createSpecification({
    name: 'Entities',
    type: SpecificationTypes.Completion,
    serviceType: ModelServiceTypes.OpenAI,
    openAI: { model: 'gpt-4o', temperature: 0.1 }
  })
};
// Use appropriate spec per task
function getSpecForTask(task: string) {
  switch (task) {
    case 'chat': return specs.chat.createSpecification.id;
    case 'summarize': return specs.summary.createSpecification.id;
    case 'extract': return specs.entities.createSpecification.id;
    default: return specs.chat.createSpecification.id; // Fall back to the general chat spec
  }
}
Pattern 2: A/B Testing Models
// Test different models for quality/cost
const modelA = await graphlit.promptConversation(
'Test question',
undefined,
{ id: gpt4SpecId }
);
const modelB = await graphlit.promptConversation(
'Test question',
undefined,
{ id: claudeSpecId }
);
// Compare quality, cost, latency
console.log('GPT-4o:', modelA.promptConversation?.message?.message);
console.log('Claude:', modelB.promptConversation?.message?.message);
Pattern 3: Dynamic Model Selection
async function smartPrompt(prompt: string, complexity: 'simple' | 'complex') {
const specId = complexity === 'simple'
? fastCheapSpecId
: highQualitySpecId;
return graphlit.promptConversation(prompt, undefined, { id: specId });
}
// Simple question → cheap model
await smartPrompt('What is 2+2?', 'simple');
// Complex analysis → expensive model
await smartPrompt('Analyze Q4 strategy implications', 'complex');
Common Issues & Solutions
Issue: Poor Response Quality
Problem: AI responses are inaccurate or nonsensical.
Solutions:
- Try higher-quality model:
// Upgrade to GPT-4o or Claude 3.5 Sonnet
openAI: { model: 'gpt-4o' }
- Lower temperature for factual tasks:
openAI: { temperature: 0.1 } // vs 0.7
Issue: Slow Responses
Problem: Chat takes > 10 seconds.
Solution: Use faster models:
openAI: { model: 'gpt-4o-mini' } // 3x faster
Issue: High Costs
Problem: Bills are too high.
Solutions:
- Use cheaper models for non-critical tasks
- Lower completion token limits
- Use smaller embeddings
What's Next?
You now understand model selection. Next steps:
- Create specification library for your use cases
- A/B test models to find best quality/cost balance
- Monitor usage to optimize costs
Related guides:
- Building AI Chat Applications - Use specifications in chat
- Workflows and Processing - Apply specs to workflows
Happy optimizing! 🎯