Specifications control which AI models Graphlit uses for embeddings, chat, entity extraction, and enrichment. The right model choice impacts quality, cost, and latency. This guide helps you choose wisely.
What You'll Learn
- LLM options (GPT-4, Claude, Gemini, Llama)
- Embedding models and dimensions
- Creating custom specifications
- Model selection by use case
- Cost vs quality tradeoffs
- Latency optimization
Prerequisites: A Graphlit project, SDK installed.
Developer Note: All Graphlit IDs are GUIDs. Example outputs show realistic GUID format.
Part 1: Model Types
LLMs (Large Language Models)
Used for: Chat, summarization, entity extraction
Available models:
OpenAI:
- gpt-4o - Latest, best quality, balanced speed
- gpt-4-turbo - Previous generation, still excellent
- gpt-3.5-turbo - Faster, cheaper, lower quality
- gpt-4o-mini - Ultra-fast, cheapest OpenAI model
Anthropic Claude:
- claude-3-5-sonnet - Best reasoning, slower
- claude-3-opus - Highest quality, expensive
- claude-3-haiku - Fastest, cheapest Claude
Google Gemini:
- gemini-1.5-pro - Strong multimodal, good balance
- gemini-1.5-flash - Fast, cost-effective
Meta Llama (via Groq):
- llama-3-70b - Open source, fast inference
Embedding Models
Used for: Vector search, similarity
Available models:
OpenAI:
- text-embedding-3-large - 3072 dimensions, best quality (default)
- text-embedding-3-small - 1536 dimensions, good quality, 2x cheaper
- text-embedding-ada-002 - Legacy, 1536 dimensions
Cohere:
- embed-english-v3.0 - Optimized for English
- embed-multilingual-v3.0 - 100+ languages
Voyage AI:
- voyage-large-2 - High quality, 1536 dimensions
- voyage-code-2 - Optimized for code
Part 2: Creating Specifications
Default Specification
If you don't attach a specification, Graphlit falls back to these defaults:
- LLM: gpt-4o
- Embeddings: text-embedding-3-large
import { Graphlit } from 'graphlit-client';
const graphlit = new Graphlit();
// Uses default models
const conversation = await graphlit.createConversation('My Chat');
const response = await graphlit.promptConversation('Hello', conversation.createConversation.id);
Custom LLM Specification
import { ModelServiceTypes, SpecificationTypes } from 'graphlit-client/dist/generated/graphql-types';
// Create specification with Claude
const spec = await graphlit.createSpecification({
name: 'Claude Sonnet',
type: SpecificationTypes.Completion,
serviceType: ModelServiceTypes.Anthropic,
anthropic: {
model: 'claude-3-5-sonnet-20241022',
temperature: 0.7,
probability: 0.2,
completionTokenLimit: 4096
}
});
console.log('Specification created:', spec.createSpecification.id);
// Use specification with conversation
const conversation = await graphlit.createConversation(
'Claude Chat',
{ id: spec.createSpecification.id }
);
const response = await graphlit.promptConversation('Explain AI', conversation.createConversation.id);
Custom Embedding Specification
// Create embedding specification
const embeddingSpec = await graphlit.createSpecification({
name: 'Small Embeddings',
type: SpecificationTypes.Embedding,
serviceType: ModelServiceTypes.OpenAI,
openAI: {
model: 'text-embedding-3-small', // Cheaper, 1536 dimensions
modelType: 'EMBEDDING'
}
});
// Use with content ingestion
const workflow = await graphlit.createWorkflow({
name: 'Custom Embeddings',
preparation: { /* ... */ },
embedding: { id: embeddingSpec.createSpecification.id }
});
Part 3: Model Selection by Use Case
Chat Applications
Recommended: GPT-4o
const chatSpec = await graphlit.createSpecification({
name: 'Chat Optimized',
type: SpecificationTypes.Completion,
serviceType: ModelServiceTypes.OpenAI,
openAI: {
model: 'gpt-4o',
temperature: 0.7, // Balanced creativity
completionTokenLimit: 2048
}
});
Why: Fast response, good quality, reasonable cost
Alternative: Claude 3.5 Sonnet (see the sketch after this list)
- Use for complex reasoning tasks
- Better for multi-step analysis
- Slower but higher quality
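For those reasoning-heavy cases, a Claude-backed chat specification is a straightforward swap. The following is a minimal sketch that mirrors the Anthropic example from Part 2; the name and parameter values are illustrative starting points, not recommendations.
// Claude 3.5 Sonnet for reasoning-heavy chat
const reasoningChatSpec = await graphlit.createSpecification({
  name: 'Chat (Complex Reasoning)',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: {
    model: 'claude-3-5-sonnet-20241022',
    temperature: 0.5, // Slightly lower for analytical, multi-step answers
    completionTokenLimit: 2048
  }
});
// Attach it when creating the conversation, as in Part 2
const reasoningChat = await graphlit.createConversation(
  'Reasoning Chat',
  { id: reasoningChatSpec.createSpecification.id }
);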
Summarization
Recommended: GPT-4o-mini
const summarySpec = await graphlit.createSpecification({
name: 'Summarization',
type: SpecificationTypes.Completion,
serviceType: ModelServiceTypes.OpenAI,
openAI: {
model: 'gpt-4o-mini',
temperature: 0.3, // Lower for factual output
completionTokenLimit: 512
}
});
Why: Fast, cheap, good enough for summaries
Entity Extraction
Recommended: GPT-4o
const entitySpec = await graphlit.createSpecification({
name: 'Entity Extraction',
type: SpecificationTypes.Completion,
serviceType: ModelServiceTypes.OpenAI,
openAI: {
model: 'gpt-4o',
temperature: 0.1, // Very low for precision
completionTokenLimit: 1024
}
});
Why: Accurate, reliable entity detection
Search/Embeddings
Recommended: text-embedding-3-large (default)
const searchSpec = await graphlit.createSpecification({
name: 'High Quality Embeddings',
type: SpecificationTypes.Embedding,
serviceType: ModelServiceTypes.OpenAI,
openAI: {
model: 'text-embedding-3-large',
modelType: 'EMBEDDING'
}
});
Why: Best search quality, worth the cost
Cost-sensitive alternative: text-embedding-3-small
- 2x cheaper
- 1536 vs 3072 dimensions
- 95% of the quality
Code/Technical Content
Recommended: GPT-4o or Gemini 1.5 Pro
const codeSpec = await graphlit.createSpecification({
name: 'Code Optimized',
type: SpecificationTypes.Completion,
serviceType: ModelServiceTypes.Google,
google: {
model: 'gemini-1.5-pro',
temperature: 0.2,
completionTokenLimit: 4096
}
});
Why: Strong code understanding, good for technical docs
Part 4: Cost Optimization
Pricing Overview (Approximate)
LLM Costs (per 1M tokens):
- GPT-4o: $2.50 input, $10 output
- GPT-4o-mini: $0.15 input, $0.60 output
- Claude 3.5 Sonnet: $3 input, $15 output
- Claude 3 Haiku: $0.25 input, $1.25 output
- Gemini 1.5 Pro: $1.25 input, $5 output
- Gemini 1.5 Flash: $0.075 input, $0.30 output
Embedding Costs (per 1M tokens):
- text-embedding-3-large: $0.13
- text-embedding-3-small: $0.02
- text-embedding-ada-002: $0.10
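To translate these rates into a monthly estimate, you can do the arithmetic directly. The helper below is purely illustrative: it hard-codes the approximate per-1M-token prices listed above, and the token volumes in the usage example are hypothetical.
// Rough monthly cost from the approximate per-1M-token prices above
const pricesPerMillion: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 2.5, output: 10 },
  'gpt-4o-mini': { input: 0.15, output: 0.6 },
  'claude-3-5-sonnet': { input: 3, output: 15 },
  'gemini-1.5-flash': { input: 0.075, output: 0.3 }
};

function estimateCost(model: string, inputTokens: number, outputTokens: number): number {
  const price = pricesPerMillion[model];
  return (inputTokens / 1_000_000) * price.input + (outputTokens / 1_000_000) * price.output;
}

// Example: 10M input tokens and 2M output tokens per month
console.log(estimateCost('gpt-4o', 10_000_000, 2_000_000));      // ~$45.00
console.log(estimateCost('gpt-4o-mini', 10_000_000, 2_000_000)); // ~$2.70
The gap between those two numbers is why the strategies below start with model choice.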
Optimization Strategies
1. Use cheaper models for simple tasks:
// For summarization, don't use GPT-4o
const cheapSummary = await graphlit.createSpecification({
  name: 'Cheap Summary',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAI,
  openAI: { model: 'gpt-4o-mini' } // 17x cheaper!
});
2. Lower temperature for factual tasks:
// Lower temperature = less randomness, which keeps factual answers focused and often shorter
openAI: {
model: 'gpt-4o',
temperature: 0.1, // vs 0.7 default
}
3. Limit output tokens:
openAI: {
model: 'gpt-4o',
completionTokenLimit: 512 // vs 2048 default
}
4. Use smaller embeddings:
// text-embedding-3-small = 2x cheaper
// 95% of quality for most use cases
openAI: {
model: 'text-embedding-3-small'
}
5. Cache embeddings (automatic):
- Graphlit caches embeddings automatically
- Re-ingesting same content doesn't re-embed
Part 5: Latency Optimization
Fastest Models
For chat (latency-critical):
- gpt-4o-mini - Fastest OpenAI model
- claude-3-haiku - Fastest Claude model
- gemini-1.5-flash - Fast Google model
const fastSpec = await graphlit.createSpecification({
name: 'Ultra Fast',
type: SpecificationTypes.Completion,
serviceType: ModelServiceTypes.OpenAI,
openAI: {
model: 'gpt-4o-mini',
temperature: 0.7,
completionTokenLimit: 1024 // Smaller = faster
}
});
Streaming for Perceived Speed
// Even with slower models, streaming feels faster
await graphlit.streamAgent(
'Explain quantum computing',
async (event) => {
// Text appears immediately as it generates
},
conversationId,
{ id: claudeSpecId } // Even Claude feels fast with streaming
);
Part 6: Production Patterns
Pattern 1: Specification Library
// Create once, reuse everywhere
const specs = {
  chat: await graphlit.createSpecification({
    name: 'Chat',
    type: SpecificationTypes.Completion,
    serviceType: ModelServiceTypes.OpenAI,
    openAI: { model: 'gpt-4o' }
  }),
  summary: await graphlit.createSpecification({
    name: 'Summary',
    type: SpecificationTypes.Completion,
    serviceType: ModelServiceTypes.OpenAI,
    openAI: { model: 'gpt-4o-mini' }
  }),
  entities: await graphlit.createSpecification({
    name: 'Entities',
    type: SpecificationTypes.Completion,
    serviceType: ModelServiceTypes.OpenAI,
    openAI: { model: 'gpt-4o', temperature: 0.1 }
  })
};
// Use appropriate spec per task
function getSpecForTask(task: string) {
  switch (task) {
    case 'chat': return specs.chat.createSpecification.id;
    case 'summarize': return specs.summary.createSpecification.id;
    case 'extract': return specs.entities.createSpecification.id;
    default: return specs.chat.createSpecification.id; // Fall back to the general chat spec
  }
}
Pattern 2: A/B Testing Models
// Test different models for quality/cost
const modelA = await graphlit.promptConversation(
'Test question',
undefined,
{ id: gpt4SpecId }
);
const modelB = await graphlit.promptConversation(
'Test question',
undefined,
{ id: claudeSpecId }
);
// Compare quality, cost, latency
console.log('GPT-4o:', modelA.promptConversation?.message?.message);
console.log('Claude:', modelB.promptConversation?.message?.message);
Pattern 3: Dynamic Model Selection
async function smartPrompt(prompt: string, complexity: 'simple' | 'complex') {
const specId = complexity === 'simple'
? fastCheapSpecId
: highQualitySpecId;
return graphlit.promptConversation(prompt, undefined, { id: specId });
}
// Simple question → cheap model
await smartPrompt('What is 2+2?', 'simple');
// Complex analysis → expensive model
await smartPrompt('Analyze Q4 strategy implications', 'complex');
Common Issues & Solutions
Issue: Poor Response Quality
Problem: AI responses are inaccurate or nonsensical.
Solutions:
- Try higher-quality model:
// Upgrade to GPT-4o or Claude 3.5 Sonnet
openAI: { model: 'gpt-4o' }
- Lower temperature for factual tasks:
openAI: { temperature: 0.1 } // vs 0.7
Issue: Slow Responses
Problem: Chat takes > 10 seconds.
Solution: Use faster models:
openAI: { model: 'gpt-4o-mini' } // 3x faster
Issue: High Costs
Problem: Bills are too high.
Solutions:
- Use cheaper models for non-critical tasks
- Lower completion token limits
- Use smaller embeddings
What's Next?
You now understand model selection. Next steps:
- Create specification library for your use cases
- A/B test models to find best quality/cost balance
- Monitor usage to optimize costs
Related guides:
- Building AI Chat Applications - Use specifications in chat
- Workflows and Processing - Apply specs to workflows
Happy optimizing! 🎯