
AI Models and Specifications Guide

Choose and configure the right AI models for your use case. This guide covers LLM selection (GPT-4, Claude, Gemini), embedding models, custom specifications, and cost optimization.

Specifications control which AI models Graphlit uses for embeddings, chat, entity extraction, and enrichment. The right model choice impacts quality, cost, and latency. This guide helps you choose wisely.

What You'll Learn

  • LLM options (GPT-4, Claude, Gemini, Llama)
  • Embedding models and dimensions
  • Creating custom specifications
  • Model selection by use case
  • Cost vs quality tradeoffs
  • Latency optimization

Prerequisites: A Graphlit project, SDK installed.

Developer Note: All Graphlit IDs are GUIDs. Example outputs show realistic GUID format.
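
Before the examples below, make sure the client is initialized. A minimal setup sketch, assuming credentials are provided via the standard Graphlit environment variables:

import { Graphlit } from 'graphlit-client';

// With no constructor arguments, the client reads
// GRAPHLIT_ORGANIZATION_ID, GRAPHLIT_ENVIRONMENT_ID, and
// GRAPHLIT_JWT_SECRET from the environment.
const graphlit = new Graphlit();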


Part 1: Model Types

LLMs (Large Language Models)

Used for: Chat, summarization, entity extraction

Available models:

OpenAI:

  • gpt-4o - Latest, best quality, balanced speed
  • gpt-4-turbo - Previous gen, still excellent
  • gpt-3.5-turbo - Faster, cheaper, lower quality
  • gpt-4o-mini - Ultra-fast, cheapest OpenAI model

Anthropic Claude:

  • claude-3-5-sonnet - Best reasoning, slower
  • claude-3-opus - Highest quality, expensive
  • claude-3-haiku - Fastest, cheapest Claude

Google Gemini:

  • gemini-1.5-pro - Strong multimodal, good balance
  • gemini-1.5-flash - Fast, cost-effective

Meta Llama (via Groq):

  • llama-3-70b - Open source, fast inference

Embedding Models

Used for: Vector search, similarity

Available models:

OpenAI:

  • text-embedding-3-large - 3072 dimensions, best quality (default)
  • text-embedding-3-small - 1536 dimensions, good quality, ~6x cheaper
  • text-embedding-ada-002 - Legacy, 1536 dimensions

Cohere:

  • embed-english-v3.0 - Optimized for English
  • embed-multilingual-v3.0 - 100+ languages

Voyage AI:

  • voyage-large-2 - High quality, 1536 dimensions
  • voyage-code-2 - Optimized for code
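
Cohere and Voyage specs follow the same createSpecification pattern shown in Part 2. As a rough sketch (the cohere field name and the ModelServiceTypes.Cohere enum value are assumptions; verify against the generated GraphQL types in your SDK version):

import { ModelServiceTypes, SpecificationTypes } from 'graphlit-client/dist/generated/graphql-types';

// Hypothetical Cohere embedding specification -- field names are
// assumptions; check your SDK's generated types before relying on this.
const multilingualSpec = await graphlit.createSpecification({
  name: 'Multilingual Embeddings',
  type: SpecificationTypes.Embedding,
  serviceType: ModelServiceTypes.Cohere,
  cohere: {
    model: 'embed-multilingual-v3.0'
  }
});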

Part 2: Creating Specifications

Default Specification

Graphlit uses defaults if you don't specify:

  • LLM: gpt-4o
  • Embeddings: text-embedding-3-large

import { Graphlit } from 'graphlit-client';

const graphlit = new Graphlit();

// Uses default models
const conversation = await graphlit.createConversation('My Chat');
const response = await graphlit.promptConversation('Hello', conversation.createConversation.id);

Custom LLM Specification

import { ModelServiceTypes, SpecificationTypes } from 'graphlit-client/dist/generated/graphql-types';

// Create specification with Claude
const spec = await graphlit.createSpecification({
  name: 'Claude Sonnet',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Anthropic,
  anthropic: {
    model: 'claude-3-5-sonnet-20241022',
    temperature: 0.7,           // sampling randomness
    probability: 0.2,           // nucleus sampling (top-p)
    completionTokenLimit: 4096  // max output tokens
  }
});

console.log('Specification created:', spec.createSpecification.id);

// Use specification with conversation
const conversation = await graphlit.createConversation(
  'Claude Chat',
  { id: spec.createSpecification.id }
);

const response = await graphlit.promptConversation('Explain AI', conversation.createConversation.id);

Custom Embedding Specification

// Create embedding specification
const embeddingSpec = await graphlit.createSpecification({
  name: 'Small Embeddings',
  type: SpecificationTypes.Embedding,
  serviceType: ModelServiceTypes.OpenAI,
  openAI: {
    model: 'text-embedding-3-small',  // Cheaper, 1536 dimensions
    modelType: 'EMBEDDING'
  }
});

// Use with content ingestion
const workflow = await graphlit.createWorkflow({
  name: 'Custom Embeddings',
  preparation: { /* ... */ },
  embedding: { id: embeddingSpec.createSpecification.id }
});

Part 3: Model Selection by Use Case

Chat Applications

Recommended: GPT-4o

const chatSpec = await graphlit.createSpecification({
  name: 'Chat Optimized',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAI,
  openAI: {
    model: 'gpt-4o',
    temperature: 0.7,  // Balanced creativity
    completionTokenLimit: 2048
  }
});

Why: Fast response, good quality, reasonable cost

Alternative: Claude 3.5 Sonnet

  • Use for complex reasoning tasks
  • Better for multi-step analysis
  • Slower but higher quality

Summarization

Recommended: GPT-4o-mini

const summarySpec = await graphlit.createSpecification({
  name: 'Summarization',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAI,
  openAI: {
    model: 'gpt-4o-mini',
    temperature: 0.3,  // Lower for factual output
    completionTokenLimit: 512
  }
});

Why: Fast, cheap, good enough for summaries

Entity Extraction

Recommended: GPT-4o

const entitySpec = await graphlit.createSpecification({
  name: 'Entity Extraction',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAI,
  openAI: {
    model: 'gpt-4o',
    temperature: 0.1,  // Very low for precision
    completionTokenLimit: 1024
  }
});

Why: Accurate, reliable entity detection

Search/Embeddings

Recommended: text-embedding-3-large (default)

const searchSpec = await graphlit.createSpecification({
  name: 'High Quality Embeddings',
  type: SpecificationTypes.Embedding,
  serviceType: ModelServiceTypes.OpenAI,
  openAI: {
    model: 'text-embedding-3-large',
    modelType: 'EMBEDDING'
  }
});

Why: Best search quality, worth the cost

Cost-sensitive alternative: text-embedding-3-small

  • About 6x cheaper ($0.02 vs $0.13 per 1M tokens)
  • 1536 vs 3072 dimensions
  • 95% of the quality

Code/Technical Content

Recommended: GPT-4o or Gemini 1.5 Pro

const codeSpec = await graphlit.createSpecification({
  name: 'Code Optimized',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.Google,
  google: {
    model: 'gemini-1.5-pro',
    temperature: 0.2,
    completionTokenLimit: 4096
  }
});

Why: Strong code understanding, good for technical docs


Part 4: Cost Optimization

Pricing Overview (Approximate)

LLM Costs (per 1M tokens):

  • GPT-4o: $2.50 input, $10 output
  • GPT-4o-mini: $0.15 input, $0.60 output
  • Claude 3.5 Sonnet: $3 input, $15 output
  • Claude 3 Haiku: $0.25 input, $1.25 output
  • Gemini 1.5 Pro: $1.25 input, $5 output
  • Gemini 1.5 Flash: $0.075 input, $0.30 output

Embedding Costs (per 1M tokens):

  • text-embedding-3-large: $0.13
  • text-embedding-3-small: $0.02
  • text-embedding-ada-002: $0.10
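
These numbers make back-of-envelope estimates easy. A small helper with prices hardcoded from the table above (prices will drift as providers change them; treat as illustrative):

// Estimate LLM cost per request from the approximate prices above.
// Prices are USD per 1M tokens.
const PRICES: Record<string, { input: number; output: number }> = {
  'gpt-4o':            { input: 2.50,  output: 10.00 },
  'gpt-4o-mini':       { input: 0.15,  output: 0.60 },
  'claude-3-5-sonnet': { input: 3.00,  output: 15.00 },
  'gemini-1.5-flash':  { input: 0.075, output: 0.30 }
};

function estimateCostUSD(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICES[model];
  if (!p) throw new Error(`Unknown model: ${model}`);
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}

// A 2,000-token prompt with a 500-token answer:
// gpt-4o:      ~$0.0100
// gpt-4o-mini: ~$0.0006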

Optimization Strategies

1. Use cheaper models for simple tasks:

// For summarization, don't use GPT-4o
// For summarization, don't use GPT-4o
const cheapSummary = await graphlit.createSpecification({
  name: 'Cheap Summaries',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAI,
  openAI: { model: 'gpt-4o-mini' }  // 17x cheaper!
});

2. Lower temperature for factual tasks:

// Lower temperature = less randomness; responses tend to be more focused
// and concise, which can also reduce output tokens
openAI: {
  model: 'gpt-4o',
  temperature: 0.1,  // vs 0.7 default
}

3. Limit output tokens:

openAI: {
  model: 'gpt-4o',
  completionTokenLimit: 512  // vs 2048 default
}

4. Use smaller embeddings:

// text-embedding-3-small = ~6x cheaper ($0.02 vs $0.13 per 1M tokens)
// 95% of quality for most use cases
openAI: {
  model: 'text-embedding-3-small'
}

5. Cache embeddings (automatic):

  • Graphlit caches embeddings automatically
  • Re-ingesting same content doesn't re-embed

Part 5: Latency Optimization

Fastest Models

For chat (latency-critical):

  1. gpt-4o-mini - Fastest OpenAI model
  2. claude-3-haiku - Fastest Claude model
  3. gemini-1.5-flash - Fast Google model

const fastSpec = await graphlit.createSpecification({
  name: 'Ultra Fast',
  type: SpecificationTypes.Completion,
  serviceType: ModelServiceTypes.OpenAI,
  openAI: {
    model: 'gpt-4o-mini',
    temperature: 0.7,
    completionTokenLimit: 1024  // Smaller = faster
  }
});

Streaming for Perceived Speed

// Even with slower models, streaming feels faster
await graphlit.streamAgent(
  'Explain quantum computing',
  async (event) => {
    // Text appears immediately as it generates
  },
  conversationId,
  { id: claudeSpecId }  // Even Claude feels fast with streaming
);

Part 6: Production Patterns

Pattern 1: Specification Library

// Create once, reuse everywhere
const specs = {
  chat: await graphlit.createSpecification({
    name: 'Chat',
    type: SpecificationTypes.Completion,
    serviceType: ModelServiceTypes.OpenAI,
    openAI: { model: 'gpt-4o' }
  }),

  summary: await graphlit.createSpecification({
    name: 'Summary',
    type: SpecificationTypes.Completion,
    serviceType: ModelServiceTypes.OpenAI,
    openAI: { model: 'gpt-4o-mini' }
  }),

  entities: await graphlit.createSpecification({
    name: 'Entities',
    type: SpecificationTypes.Completion,
    serviceType: ModelServiceTypes.OpenAI,
    openAI: { model: 'gpt-4o', temperature: 0.1 }
  })
};

// Use appropriate spec per task
function getSpecForTask(task: string) {
  switch (task) {
    case 'chat': return specs.chat.createSpecification.id;
    case 'summarize': return specs.summary.createSpecification.id;
    case 'extract': return specs.entities.createSpecification.id;
    default: return specs.chat.createSpecification.id;  // fall back to the chat spec
  }
}

Pattern 2: A/B Testing Models

// Test different models for quality/cost
const modelA = await graphlit.promptConversation(
  'Test question',
  undefined,
  { id: gpt4SpecId }
);

const modelB = await graphlit.promptConversation(
  'Test question',
  undefined,
  { id: claudeSpecId }
);

// Compare quality, cost, latency
console.log('GPT-4o:', modelA.promptConversation?.message?.message);
console.log('Claude:', modelB.promptConversation?.message?.message);
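
To fold latency into the comparison, you can wrap each call with a simple timer (plain Date.now(); nothing Graphlit-specific is assumed here):

// Time a prompt against a given specification.
async function timedPrompt(prompt: string, specId: string) {
  const start = Date.now();
  const result = await graphlit.promptConversation(prompt, undefined, { id: specId });
  return { result, elapsedMs: Date.now() - start };
}

const a = await timedPrompt('Test question', gpt4SpecId);
const b = await timedPrompt('Test question', claudeSpecId);
console.log(`GPT-4o: ${a.elapsedMs}ms, Claude: ${b.elapsedMs}ms`);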

Pattern 3: Dynamic Model Selection

async function smartPrompt(prompt: string, complexity: 'simple' | 'complex') {
  const specId = complexity === 'simple' 
    ? fastCheapSpecId 
    : highQualitySpecId;
  
  return graphlit.promptConversation(prompt, undefined, { id: specId });
}

// Simple question → cheap model
await smartPrompt('What is 2+2?', 'simple');

// Complex analysis → expensive model
await smartPrompt('Analyze Q4 strategy implications', 'complex');
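
How you classify complexity is up to you. A deliberately naive heuristic, keyword and length based and purely illustrative, might look like:

// Naive complexity classifier -- a placeholder for whatever routing
// logic fits your domain (length, keywords, a cheap classifier model, etc.).
function classifyComplexity(prompt: string): 'simple' | 'complex' {
  const complexHints = ['analyze', 'compare', 'strategy', 'implications', 'explain why'];
  const lower = prompt.toLowerCase();
  if (prompt.length > 200 || complexHints.some(h => lower.includes(h))) {
    return 'complex';
  }
  return 'simple';
}

const prompt = 'Analyze Q4 strategy implications';
await smartPrompt(prompt, classifyComplexity(prompt));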

Common Issues & Solutions

Issue: Poor Response Quality

Problem: AI responses are inaccurate or nonsensical.

Solutions:

  1. Try a higher-quality model:

// Upgrade to GPT-4o or Claude 3.5 Sonnet
openAI: { model: 'gpt-4o' }

  2. Lower temperature for factual tasks:

openAI: { temperature: 0.1 }  // vs 0.7

Issue: Slow Responses

Problem: Chat takes > 10 seconds.

Solution: Use faster models:

openAI: { model: 'gpt-4o-mini' }  // 3x faster

Issue: High Costs

Problem: Bills are too high.

Solutions:

  1. Use cheaper models for non-critical tasks
  2. Lower completion token limits
  3. Use smaller embeddings

What's Next?

You now understand model selection. Next steps:

  1. Create specification library for your use cases
  2. A/B test models to find best quality/cost balance
  3. Monitor usage to optimize costs

Happy optimizing! 🎯
