Guide

Model-Agnostic Chat: Use GPT, Claude, Gemini, or Any LLM with Your Data

Escape vendor lock-in. Learn how to use GPT-4o, Claude 3.5 Sonnet, Gemini 2.5, Llama, or custom models with your team's knowledge in Zine.

Vendor lock-in is real. ChatGPT Enterprise forces you into GPT-only. NotebookLM locks you into Gemini. Most RAG platforms tie you to one AI provider—and when that model degrades or pricing changes, you're stuck.

Zine takes a different approach: model-agnostic architecture. Use GPT-4o for fast summaries, Claude 3.5 Sonnet for code reasoning, Gemini 2.5 Flash for long documents, or bring your own fine-tuned Llama model.

Same knowledge, different models, per conversation. This guide shows you how to leverage model flexibility for better results and lower costs.


Table of Contents

  1. Why Model Flexibility Matters
  2. Supported Models in Zine
  3. Choosing the Right Model for Each Task
  4. How to Switch Models in Zine
  5. Specifications: Model + System Prompt Presets
  6. Cost Optimization Strategies
  7. Model Performance Comparison
  8. Custom Models and Fine-Tuning
  9. Team Model Preferences
  10. Best Practices
  11. Next Steps

Why Model Flexibility Matters

The Vendor Lock-In Problem

Scenario 1: Model Performance Degrades

In mid-2024, many developers reported GPT-4's code quality regressed after an update. Responses became more verbose and less accurate. Teams using ChatGPT Enterprise couldn't switch—they were locked in.

With Zine: Switch to Claude 3.5 Sonnet for code while keeping GPT-4o for other tasks. Same knowledge, better results.

Scenario 2: Pricing Changes

GPT-4 was expensive. Then GPT-4 Turbo launched (cheaper). Then GPT-4o became default (different pricing). Teams locked into GPT-4 Enterprise pricing couldn't optimize.

With Zine: Use GPT-4o for cheap/fast queries, o3-mini for complex reasoning. Optimize costs per use case.

Scenario 3: New Models Emerge

Claude 4.5 Sonnet launched with best-in-class code reasoning. Gemini 2.5 Flash has a 2M token context window (fits entire codebases). Teams locked into one provider miss these advances.

With Zine: Adopt new models as they launch. No migration needed—your knowledge stays connected.

Scenario 4: Domain-Specific Needs

Your legal team wants a fine-tuned Llama model trained on legal documents. Your sales team wants GPT-4o for speed. Your eng team wants Claude for code.

With Zine: Everyone uses the best model for their domain. Same platform, different models.


Supported Models in Zine

Zine supports all major AI providers plus custom models:

OpenAI Models

GPT-4o (Recommended for most use cases)

  • Best for: General chat, fast summaries, multimodal tasks
  • Context: 128K tokens
  • Speed: Fast (~1-2 sec response time)
  • Cost: $$ (moderate)

GPT-4 Turbo

  • Best for: High-quality reasoning, detailed analysis
  • Context: 128K tokens
  • Speed: Medium (~2-4 sec)
  • Cost: $$$ (higher)

o3-mini (Reasoning model)

  • Best for: Complex analysis, math, logic problems
  • Context: 200K tokens
  • Speed: Slow (~10-30 sec, thinks before responding)
  • Cost: $$$ (higher, but more accurate)

o3 (Advanced reasoning)

  • Best for: Most complex problems, research-grade reasoning
  • Context: 200K tokens
  • Speed: Very slow (~30-60 sec)
  • Cost: $$$$ (highest)

Anthropic Models

Claude 3.5 Sonnet (Recommended for code)

  • Best for: Code generation, technical reasoning, long conversations
  • Context: 200K tokens
  • Speed: Fast (~1-2 sec)
  • Cost: $$ (moderate)

Claude 4

  • Best for: Complex tasks that need stronger reasoning than 3.5
  • Context: 200K tokens
  • Speed: Fast (~2-3 sec)
  • Cost: $$$ (higher)

Claude 4.5 Sonnet (Latest, 2025)

  • Best for: The strongest code reasoning available, complex technical tasks
  • Context: 200K tokens
  • Speed: Fast (~2-3 sec)
  • Cost: $$$ (higher)

Google Models

Gemini 1.5 Pro

  • Best for: Long context tasks, document analysis
  • Context: 2M tokens (!)
  • Speed: Medium (~3-5 sec)
  • Cost: $$ (moderate)

Gemini 2.0

  • Best for: Improved reasoning, multimodal tasks
  • Context: 2M tokens
  • Speed: Fast (~2-3 sec)
  • Cost: $$ (moderate)

Gemini 2.5 Flash (Recommended for long docs)

  • Best for: Ultra-long documents, entire codebases
  • Context: 2M tokens (~1.4M words)
  • Speed: Very fast (~1-2 sec)
  • Cost: $ (cheap for the context window)

Meta Models

Llama 3.1 (Open source)

  • Best for: Privacy-sensitive workloads, cost optimization
  • Context: 128K tokens
  • Speed: Fast (depends on hosting)
  • Cost: $ (self-hosted) or $$ (via cloud)

Custom Models

Bring Your Own Model

  • Best for: Domain-specific fine-tuning, proprietary models
  • Context: Varies
  • Speed: Varies
  • Cost: Your hosting costs

Choosing the Right Model for Each Task

Quick Decision Matrix

| Task | Best Model | Why |
|---|---|---|
| General chat | GPT-4o | Fast, accurate, cost-effective |
| Code generation | Claude 3.5/4.5 Sonnet | Best at code reasoning, fewer hallucinations |
| Complex math/logic | o3-mini or o3 | Reasoning models think step-by-step |
| Long document analysis | Gemini 2.5 Flash | 2M context window, fast, cheap |
| Quick summaries | GPT-4o or Gemini Flash | Fast response time, good-enough quality |
| Legal/compliance review | Claude 4 | Better at nuanced reasoning, safety-focused |
| Creative writing | GPT-4o or Claude 4 | Both excel at natural language |
| Data analysis | o3-mini | Better at structured reasoning |
| Codebase-wide queries | Gemini 2.5 Flash | Can fit 100K+ lines of code in context |
| Privacy-sensitive | Llama 3.1 (self-hosted) | Keep data on your infrastructure |
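If you script against an API or want the matrix in a shareable form, it can be encoded as a simple lookup. The model identifiers below are illustrative placeholders, not Zine's exact API names:

```python
# The decision matrix as a routing table.
# Model identifiers are illustrative placeholders, not exact API names.
ROUTING = {
    "general_chat": "gpt-4o",
    "code_generation": "claude-3.5-sonnet",
    "complex_reasoning": "o3-mini",
    "long_documents": "gemini-2.5-flash",
    "quick_summaries": "gpt-4o",
    "privacy_sensitive": "llama-3.1-self-hosted",
}

def pick_model(task: str, default: str = "gpt-4o") -> str:
    """Return the preferred model for a task, falling back to the default."""
    return ROUTING.get(task, default)
```

`pick_model("code_generation")` returns `"claude-3.5-sonnet"`; unrecognized tasks fall back to the default, matching the "start with GPT-4o" guidance later in this guide.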

Team-Specific Recommendations

Developers:

  • Primary: Claude 3.5 Sonnet (code)
  • Secondary: GPT-4o (general), Gemini 2.5 Flash (reading entire files)

Product Managers:

  • Primary: GPT-4o (summaries, drafts)
  • Secondary: Claude 4 (detailed analysis)

Data Analysts:

  • Primary: o3-mini (complex analysis)
  • Secondary: GPT-4o (quick queries)

Sales/CS:

  • Primary: GPT-4o (fast responses, email drafts)
  • Secondary: Claude 3.5 (detailed customer analysis)

Legal/Compliance:

  • Primary: Claude 4 (nuanced reasoning)
  • Secondary: o3 (complex contract analysis)

How to Switch Models in Zine

Per-Conversation Model Selection

In any Zine conversation:

  1. Open Model Selector: Click the model name (defaults to GPT-4o)
  2. Choose Model: Select from dropdown
    • OpenAI: GPT-4o, GPT-4 Turbo, o3-mini, o3
    • Anthropic: Claude 3.5 Sonnet, Claude 4, Claude 4.5 Sonnet
    • Google: Gemini 1.5 Pro, Gemini 2.0, Gemini 2.5 Flash
    • Meta: Llama 3.1
    • Custom: Your uploaded models
  3. Chat: All responses in this conversation use the selected model

Mid-Conversation Switching

You can change models mid-conversation:

  1. User (with GPT-4o): "Summarize today's Slack activity"
  2. GPT-4o responds with summary
  3. User switches to Claude 3.5 Sonnet
  4. User: "Now write code to implement the API discussed in Slack"
  5. Claude responds with code

Knowledge persists across model switches—you're still querying the same Zine workspace.

Default Model Preference

Set your personal default model:

  1. Go to Settings → Preferences
  2. Select Default Model: (e.g., Claude 3.5 Sonnet)
  3. All new conversations start with this model

Specifications: Model + System Prompt Presets

Specifications are reusable configs that combine:

  • Model selection
  • System prompt/instructions
  • Temperature and other parameters

Creating a Specification

  1. Click New Specification
  2. Configure:
    Name: Code Assistant
    Model: Claude 3.5 Sonnet
    System Prompt: You are an expert software engineer.
                   Always provide working code with error handling.
                   Follow team's coding standards from Notion wiki.
    Temperature: 0.3 (more deterministic)
    
  3. Save
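Conceptually, a specification is just a small structured object. Here is the preset above as plain data; the field names mirror the form but are an assumption, not Zine's documented schema:

```python
# The "Code Assistant" preset as plain data.
# Field names follow the form above; they are assumptions, not Zine's schema.
code_assistant_spec = {
    "name": "Code Assistant",
    "model": "claude-3.5-sonnet",
    "system_prompt": (
        "You are an expert software engineer. "
        "Always provide working code with error handling. "
        "Follow the team's coding standards from the Notion wiki."
    ),
    "temperature": 0.3,  # lower temperature = more deterministic output
}
```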

Example Specifications

"Code Assistant" (Claude 3.5 Sonnet)

System Prompt: Expert software engineer. Provide working code
with error handling. Reference team's coding standards.
Use cases: Code generation, debugging, code reviews

"Research Analyst" (GPT-4o)

System Prompt: Synthesize information from multiple sources.
Provide structured outputs with citations. Be concise.
Use cases: Market research, competitive analysis

"Meeting Summarizer" (Gemini 2.5 Flash)

System Prompt: Summarize meeting transcripts. Extract:
- Key decisions
- Action items (with owners)
- Follow-up questions
Format as bullet points.
Use cases: Meeting notes, weekly reviews

"Customer Success Agent" (Claude 4)

System Prompt: Empathetic and solution-focused. When analyzing
customer issues, provide: root cause, solution steps, and
follow-up recommendations. Reference CRM history.
Use cases: Customer support, account management

"Deep Reasoning" (o3-mini)

System Prompt: Think step-by-step. Show your reasoning.
Identify assumptions. Consider edge cases.
Use cases: Complex problem solving, architecture decisions

Using Specifications

In any conversation:

  1. Click Load Specification
  2. Select specification (e.g., "Code Assistant")
  3. Conversation uses that model + prompt

Time saved: No need to re-type instructions or remember which model works best for each task.


Cost Optimization Strategies

Strategy 1: Use Cheaper Models for Simple Tasks

Before (GPT-4 Turbo for everything):

  • 100 queries/day × $0.03/query = $3/day = $90/month

After (model selection):

  • 70 simple queries (GPT-4o) × $0.01 = $0.70/day
  • 30 complex queries (Claude 4) × $0.03 = $0.90/day
  • Total: $1.60/day = $48/month

Savings: $42/month per user
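The arithmetic generalizes to any query mix. A quick sketch using the illustrative per-query prices from the example (not current list prices):

```python
def monthly_cost(mix, days=30):
    """mix is a list of (queries_per_day, price_per_query) pairs."""
    return sum(queries * price for queries, price in mix) * days

before = monthly_cost([(100, 0.03)])            # everything on GPT-4 Turbo
after = monthly_cost([(70, 0.01), (30, 0.03)])  # split by complexity
savings = before - after                        # roughly $42/month per user
```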

Strategy 2: Use Gemini for Long Documents

Before (GPT-4 Turbo for 50-page PDF analysis):

  • PDF has 100K tokens
  • GPT-4 Turbo: $0.30 per analysis
  • 10 PDFs/day = $3/day = $90/month

After (Gemini 2.5 Flash):

  • Gemini handles 2M tokens, faster, cheaper
  • $0.05 per analysis
  • 10 PDFs/day = $0.50/day = $15/month

Savings: $75/month

Strategy 3: Self-Host Llama for High-Volume Simple Queries

Before (GPT-4o for 1000 simple queries/day):

  • $0.01/query × 1000 = $10/day = $300/month

After (Llama 3.1 self-hosted):

  • Server cost: $50/month
  • Inference cost: ~$0
  • Total: $50/month

Savings: $250/month
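Whether self-hosting pays off depends on volume. Using the example figures, and assuming near-zero marginal inference cost on your own server, the break-even point works out like this:

```python
def breakeven_queries_per_month(api_price_per_query, server_cost_per_month):
    """Monthly volume above which self-hosting beats the API,
    assuming near-zero marginal inference cost on your own server."""
    return server_cost_per_month / api_price_per_query

# Example figures: $0.01/query on the API vs. a $50/month server
threshold = breakeven_queries_per_month(0.01, 50)  # roughly 5,000 queries/month
```

At 1000 queries/day (~30,000/month) you are well past the threshold, which is why the example shows such a large saving.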

Team-Wide Cost Optimization

Guideline: Set default models per role:

  • Developers: Claude 3.5 Sonnet (best for code, worth the cost)
  • Sales/CS: GPT-4o (fast, cheap, good enough)
  • Analysts: o3-mini (complex work, worth the cost) + GPT-4o (quick queries)
  • Admins: GPT-4o (general use, cost-effective)

Result: 30-50% cost reduction vs. using premium models for everything.


Model Performance Comparison

Code Generation Quality (Developer Survey, 2025)

  1. Claude 4.5 Sonnet - 92% satisfaction
  2. Claude 3.5 Sonnet - 89% satisfaction
  3. GPT-4o - 82% satisfaction
  4. o3-mini - 78% satisfaction
  5. Gemini 2.0 - 75% satisfaction

Takeaway: For code, Claude models are preferred.

Speed (Average Response Time)

  1. Gemini 2.5 Flash - 1.2 sec
  2. GPT-4o - 1.8 sec
  3. Claude 3.5 Sonnet - 2.1 sec
  4. Claude 4 - 2.8 sec
  5. o3-mini - 18 sec (reasoning models are slow)

Takeaway: For speed, GPT-4o or Gemini Flash.

Long Context Handling (2M+ tokens)

  1. Gemini 2.5 Flash - 2M tokens, excellent
  2. Gemini 2.0 - 2M tokens, good
  3. Claude 4 - 200K tokens, good
  4. GPT-4o - 128K tokens, adequate
  5. o3 - 200K tokens, good

Takeaway: For entire codebases or massive docs, use Gemini.

Reasoning Depth (Complex Problem Solving)

  1. o3 - Best at step-by-step reasoning
  2. o3-mini - Good reasoning, faster
  3. Claude 4 - Strong nuanced reasoning
  4. GPT-4 Turbo - Good general reasoning
  5. GPT-4o - Adequate (optimized for speed)

Takeaway: For complex analysis, use reasoning models (o3, Claude 4).


Custom Models and Fine-Tuning

Bring Your Own Model

Zine supports custom model endpoints via API:

Setup:

  1. Host your model (AWS, Azure, GCP, or on-prem)
  2. Expose OpenAI-compatible API endpoint
  3. Add to Zine: Settings → Custom Models
  4. Enter:
    Name: Legal Llama (Fine-Tuned)
    Endpoint: https://your-server.com/v1/chat/completions
    API Key: your-api-key
    

Use Cases:

  • Legal/compliance: Fine-tuned on legal documents
  • Medical: HIPAA-compliant model on private infrastructure
  • Finance: Fine-tuned on financial analysis
  • Proprietary: Your company's domain-specific model
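Because the endpoint is OpenAI-compatible, any OpenAI-style client can call it. A minimal standard-library sketch that builds (but does not send) a chat completion request against the placeholder endpoint from the setup form; the model name is a hypothetical identifier for the fine-tuned Llama:

```python
import json
import urllib.request

def build_chat_request(endpoint, api_key, model, user_message):
    """Build an OpenAI-compatible chat completion request (not sent here)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_chat_request(
    "https://your-server.com/v1/chat/completions",
    "your-api-key",
    "legal-llama-ft",  # placeholder model name for the fine-tuned Llama
    "Summarize the termination clause in our standard NDA.",
)
# Send with urllib.request.urlopen(req) once the endpoint is live.
```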

Fine-Tuning Llama 3.1

Steps:

  1. Export your Zine knowledge (Slack, GitHub, docs) as training data
  2. Fine-tune Llama 3.1 on your domain
    • Use tools like Hugging Face, Modal, or Replicate
  3. Host fine-tuned model
  4. Add to Zine as custom model
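For step 1, most fine-tuning tools accept chat-format JSONL. A minimal formatter, assuming you have already exported your knowledge as question/answer pairs (Zine's actual export format is not specified here):

```python
import json

def to_jsonl(pairs):
    """Turn (question, answer) pairs into chat-format JSONL lines,
    the shape most fine-tuning tools accept."""
    lines = []
    for question, answer in pairs:
        record = {
            "messages": [
                {"role": "user", "content": question},
                {"role": "assistant", "content": answer},
            ]
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)

sample = to_jsonl([
    ("What does SLA mean in our contracts?",
     "Service Level Agreement; see the legal wiki for our standard terms."),
])
```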

Benefits:

  • Model "knows" your team's terminology, patterns
  • Better performance on domain-specific queries
  • Full control over data (never leaves your infrastructure)

Costs:

  • Fine-tuning: $100-$1000 (one-time)
  • Hosting: $50-$500/month (depends on usage)

Team Model Preferences

Shared Specifications for Teams

Admins can create team-wide specifications:

  1. Admin creates: "Engineering Code Assistant" (Claude 4.5 Sonnet)
  2. All engineers see this spec in their dropdown
  3. Ensures consistency (everyone uses best model for code)

Role-Based Default Models

Set default models by role:

  • Developers: Claude 3.5 Sonnet
  • Product: GPT-4o
  • Sales: GPT-4o
  • Executives: Claude 4 (detailed analysis)

Model Usage Analytics

Admins can view:

  • Which models are most popular
  • Cost per model per team
  • Performance metrics (user satisfaction)

Optimize based on data: If GPT-4o satisfaction is high but Claude 4 is rarely used, shift more users to GPT-4o to save costs.


Best Practices

1. Start with GPT-4o, Switch When Needed

Default: GPT-4o for most queries (fast, cheap, good enough)

Switch to:

  • Claude 3.5/4.5 Sonnet: Code generation, technical reasoning
  • Gemini 2.5 Flash: Long documents, codebase-wide queries
  • o3-mini/o3: Complex analysis, math, logic

2. Create Specifications for Repeated Tasks

Don't re-type instructions every time. Save specifications:

  • "Code Review Bot" (Claude 3.5 + code review prompt)
  • "Customer Email Drafter" (GPT-4o + empathetic tone)
  • "Architecture Analyzer" (o3-mini + step-by-step reasoning)

3. Experiment with New Models

When a new model launches:

  1. Create a test specification
  2. Try on a few queries
  3. Compare to your current model
  4. Switch if better (no migration needed)
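Step 3 can be as simple as running one prompt through each candidate and comparing answers side by side. A sketch with a stubbed query function (swap `fake_ask` for a real API call):

```python
def compare_models(prompt, models, ask):
    """Run one prompt through several models via ask(model, prompt)
    and collect the answers for side-by-side review."""
    return {model: ask(model, prompt) for model in models}

def fake_ask(model, prompt):
    # Stub standing in for a real API call
    return f"[{model}] answer to: {prompt}"

results = compare_models("Summarize our Q3 roadmap",
                         ["gpt-4o", "claude-3.5-sonnet"], fake_ask)
```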

4. Monitor Costs

Check your usage:

  • Settings → Usage Dashboard
  • See cost per model
  • Identify expensive queries (long context, complex reasoning)
  • Optimize: Use cheaper models for simple tasks

5. Team Training

Educate your team:

  • When to use which model (share the decision matrix above)
  • How to create specifications (save time)
  • Cost awareness (don't use o3 for simple summaries)

Next Steps

Now that you understand model flexibility:

  1. Experiment: Try GPT-4o, Claude, and Gemini on the same query
  2. Create Specifications: Save your favorite model + prompt combos
  3. Set Defaults: Choose your personal default model
  4. Share with Team: Create team specifications
  5. Monitor Usage: Check which models work best for you


Same knowledge, different models. Choose the best tool for every job.

Ready to Build with Graphlit?

Start building AI-powered applications with our API-first platform. The free tier includes 100 credits/month with no credit card required, and you can make your first API call in about 5 minutes.
