Vendor lock-in is real. ChatGPT Enterprise forces you into GPT-only. NotebookLM locks you into Gemini. Most RAG platforms tie you to one AI provider—and when that model degrades or pricing changes, you're stuck.
Zine takes a different approach: model-agnostic architecture. Use GPT-4o for fast summaries, Claude 3.5 Sonnet for code reasoning, Gemini 2.5 Flash for long documents, or bring your own fine-tuned Llama model.
Same knowledge, different models, per conversation. This guide shows you how to leverage model flexibility for better results and lower costs.
Table of Contents
- Why Model Flexibility Matters
- Supported Models in Zine
- Choosing the Right Model for Each Task
- How to Switch Models in Zine
- Specifications: Model + System Prompt Presets
- Cost Optimization Strategies
- Model Performance Comparison
- Custom Models and Fine-Tuning
- Team Model Preferences
Why Model Flexibility Matters
The Vendor Lock-In Problem
Scenario 1: Model Performance Degrades
In mid-2024, many developers reported that GPT-4's code quality had regressed after an update. Responses became verbose and less accurate. Teams using ChatGPT Enterprise couldn't switch—they were locked in.
With Zine: Switch to Claude 3.5 Sonnet for code (better at coding) while keeping GPT-4o for other tasks. Same knowledge, better results.
Scenario 2: Pricing Changes
GPT-4 was expensive. Then GPT-4 Turbo launched (cheaper). Then GPT-4o became default (different pricing). Teams locked into GPT-4 Enterprise pricing couldn't optimize.
With Zine: Use GPT-4o for cheap/fast queries, o3-mini for complex reasoning. Optimize costs per use case.
Scenario 3: New Models Emerge
Claude 4.5 Sonnet launched with best-in-class code reasoning. Gemini 2.5 Flash has a 2M token context window (fits entire codebases). Teams locked into one provider miss these advances.
With Zine: Adopt new models as they launch. No migration needed—your knowledge stays connected.
Scenario 4: Domain-Specific Needs
Your legal team wants a fine-tuned Llama model trained on legal documents. Your sales team wants GPT-4o for speed. Your eng team wants Claude for code.
With Zine: Everyone uses the best model for their domain. Same platform, different models.
Supported Models in Zine
Zine supports all major AI providers plus custom models:
OpenAI Models
GPT-4o (Recommended for most use cases)
- Best for: General chat, fast summaries, multimodal tasks
- Context: 128K tokens
- Speed: Fast (~1-2 sec response time)
- Cost: $$ (moderate)
GPT-4 Turbo
- Best for: High-quality reasoning, detailed analysis
- Context: 128K tokens
- Speed: Medium (~2-4 sec)
- Cost: $$$ (higher)
o3-mini (Reasoning model)
- Best for: Complex analysis, math, logic problems
- Context: 128K tokens
- Speed: Slow (~10-30 sec, thinks before responding)
- Cost: $$$ (higher, but more accurate)
o3 (Advanced reasoning)
- Best for: Most complex problems, research-grade reasoning
- Context: 200K tokens
- Speed: Very slow (~30-60 sec)
- Cost: $$$$ (highest)
Anthropic Models
Claude 3.5 Sonnet (Recommended for code)
- Best for: Code generation, technical reasoning, long conversations
- Context: 200K tokens
- Speed: Fast (~1-2 sec)
- Cost: $$ (moderate)
Claude 4
- Best for: Complex tasks that need stronger reasoning than 3.5
- Context: 200K tokens
- Speed: Fast (~2-3 sec)
- Cost: $$$ (higher)
Claude 4.5 Sonnet (Latest, 2025)
- Best for: State-of-the-art code reasoning, complex technical tasks
- Context: 200K tokens
- Speed: Fast (~2-3 sec)
- Cost: $$$ (higher)
Google Models
Gemini 1.5 Pro
- Best for: Long context tasks, document analysis
- Context: 2M tokens (!)
- Speed: Medium (~3-5 sec)
- Cost: $$ (moderate)
Gemini 2.0
- Best for: Improved reasoning, multimodal tasks
- Context: 2M tokens
- Speed: Fast (~2-3 sec)
- Cost: $$ (moderate)
Gemini 2.5 Flash (Recommended for long docs)
- Best for: Ultra-long documents, entire codebases
- Context: 2M tokens (~1.4M words)
- Speed: Very fast (~1-2 sec)
- Cost: $ (cheap for the context window)
Meta Models
Llama 3.1 (Open source)
- Best for: Privacy-sensitive workloads, cost optimization
- Context: 128K tokens
- Speed: Fast (depends on hosting)
- Cost: $ (self-hosted) or $$ (via cloud)
Custom Models
Bring Your Own Model
- Best for: Domain-specific fine-tuning, proprietary models
- Context: Varies
- Speed: Varies
- Cost: Your hosting costs
Choosing the Right Model for Each Task
Quick Decision Matrix
- General chat, summaries, drafts: GPT-4o
- Code generation, technical reasoning: Claude 3.5 / 4.5 Sonnet
- Long documents, entire codebases: Gemini 2.5 Flash
- Complex analysis, math, logic: o3-mini or o3
- Privacy-sensitive or high-volume simple queries: Llama 3.1 (self-hosted)
Team-Specific Recommendations
Developers:
- Primary: Claude 3.5 Sonnet (code)
- Secondary: GPT-4o (general), Gemini 2.5 Flash (reading entire files)
Product Managers:
- Primary: GPT-4o (summaries, drafts)
- Secondary: Claude 4 (detailed analysis)
Data Analysts:
- Primary: o3-mini (complex analysis)
- Secondary: GPT-4o (quick queries)
Sales/CS:
- Primary: GPT-4o (fast responses, email drafts)
- Secondary: Claude 3.5 (detailed customer analysis)
Legal/Compliance:
- Primary: Claude 4 (nuanced reasoning)
- Secondary: o3 (complex contract analysis)
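The recommendations above boil down to a simple routing rule. A minimal sketch in Python—the task categories and model names mirror this guide's decision matrix; this is an illustration, not a Zine API:

```python
# Illustrative task-to-model routing based on the recommendations above.
# Categories and model names come from this guide, not from a Zine API.
TASK_MODEL = {
    "general": "GPT-4o",                 # fast chat, summaries, drafts
    "code": "Claude 3.5 Sonnet",         # code generation, technical reasoning
    "long_context": "Gemini 2.5 Flash",  # entire codebases, long documents
    "reasoning": "o3-mini",              # complex analysis, math, logic
}

def pick_model(task: str) -> str:
    """Return the recommended model for a task, defaulting to GPT-4o."""
    return TASK_MODEL.get(task, "GPT-4o")

print(pick_model("code"))     # Claude 3.5 Sonnet
print(pick_model("unknown"))  # GPT-4o (sensible default)
```

Defaulting to GPT-4o matches the "start cheap and fast, escalate when needed" guidance later in this guide.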
How to Switch Models in Zine
Per-Conversation Model Selection
In any Zine conversation:
- Open Model Selector: Click the model name (defaults to GPT-4o)
- Choose Model: Select from dropdown
- OpenAI: GPT-4o, GPT-4 Turbo, o3-mini, o3
- Anthropic: Claude 3.5 Sonnet, Claude 4, Claude 4.5 Sonnet
- Google: Gemini 1.5 Pro, Gemini 2.0, Gemini 2.5 Flash
- Meta: Llama 3.1
- Custom: Your uploaded models
- Chat: All responses in this conversation use the selected model
Mid-Conversation Switching
You can change models mid-conversation:
- User (with GPT-4o): "Summarize today's Slack activity"
- GPT-4o responds with summary
- User switches to Claude 3.5 Sonnet
- User: "Now write code to implement the API discussed in Slack"
- Claude responds with code
Knowledge persists across model switches—you're still querying the same Zine workspace.
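Conceptually, mid-conversation switching means the model is a per-turn attribute while the history is shared. A toy sketch of that idea (hypothetical, not Zine's internals):

```python
class Conversation:
    """Toy model of a conversation whose history survives model switches."""

    def __init__(self, model: str):
        self.model = model
        self.history: list[dict] = []  # shared across all models

    def switch_model(self, model: str) -> None:
        self.model = model             # history is untouched

    def ask(self, prompt: str) -> None:
        # Record which model handled each turn.
        self.history.append({"model": self.model, "prompt": prompt})

convo = Conversation("GPT-4o")
convo.ask("Summarize today's Slack activity")
convo.switch_model("Claude 3.5 Sonnet")
convo.ask("Now write code to implement the API discussed in Slack")
print(len(convo.history))  # 2 -- both turns survive the switch
```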
Default Model Preference
Set your personal default model:
- Go to Settings → Preferences
- Select Default Model: (e.g., Claude 3.5 Sonnet)
- All new conversations start with this model
Specifications: Model + System Prompt Presets
Specifications are reusable configs that combine:
- Model selection
- System prompt/instructions
- Temperature and other parameters
Creating a Specification
- Click New Specification
- Configure:
- Configure:
Name: Code Assistant
Model: Claude 3.5 Sonnet
System Prompt: You are an expert software engineer. Always provide working code with error handling. Follow team's coding standards from Notion wiki.
Temperature: 0.3 (more deterministic)
- Save
Example Specifications
"Code Assistant" (Claude 3.5 Sonnet)
System Prompt: Expert software engineer. Provide working code
with error handling. Reference team's coding standards.
Use cases: Code generation, debugging, code reviews
"Research Analyst" (GPT-4o)
System Prompt: Synthesize information from multiple sources.
Provide structured outputs with citations. Be concise.
Use cases: Market research, competitive analysis
"Meeting Summarizer" (Gemini 2.5 Flash)
System Prompt: Summarize meeting transcripts. Extract:
- Key decisions
- Action items (with owners)
- Follow-up questions
Format as bullet points.
Use cases: Meeting notes, weekly reviews
"Customer Success Agent" (Claude 4)
System Prompt: Empathetic and solution-focused. When analyzing
customer issues, provide: root cause, solution steps, and
follow-up recommendations. Reference CRM history.
Use cases: Customer support, account management
"Deep Reasoning" (o3-mini)
System Prompt: Think step-by-step. Show your reasoning.
Identify assumptions. Consider edge cases.
Use cases: Complex problem solving, architecture decisions
Using Specifications
In any conversation:
- Click Load Specification
- Select specification (e.g., "Code Assistant")
- Conversation uses that model + prompt
Time saved: No need to re-type instructions or remember which model works best for each task.
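Under the hood, a specification is just a bundle of settings. A hedged sketch of the "Code Assistant" preset as data—the field names here are illustrative, not Zine's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Specification:
    """Illustrative shape of a reusable model-plus-prompt preset."""
    name: str
    model: str
    system_prompt: str
    temperature: float = 0.7  # assumed default; Zine's may differ

code_assistant = Specification(
    name="Code Assistant",
    model="Claude 3.5 Sonnet",
    system_prompt=(
        "You are an expert software engineer. Always provide working "
        "code with error handling. Follow team's coding standards."
    ),
    temperature=0.3,  # more deterministic, per the example above
)
print(code_assistant.model)  # Claude 3.5 Sonnet
```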
Cost Optimization Strategies
Strategy 1: Use Cheaper Models for Simple Tasks
Before (GPT-4 Turbo for everything):
- 100 queries/day × $0.03/query = $3/day = $90/month
After (model selection):
- 70 simple queries (GPT-4o) × $0.01 = $0.70/day
- 30 complex queries (Claude 4) × $0.03 = $0.90/day
- Total: $1.60/day = $48/month
Savings: $42/month per user
Strategy 2: Use Gemini for Long Documents
Before (GPT-4 Turbo for 50-page PDF analysis):
- PDF has 100K tokens
- GPT-4 Turbo: $0.30 per analysis
- 10 PDFs/day = $3/day = $90/month
After (Gemini 2.5 Flash):
- Gemini handles up to 2M tokens and is faster and cheaper
- $0.05 per analysis
- 10 PDFs/day = $0.50/day = $15/month
Savings: $75/month
Strategy 3: Self-Host Llama for High-Volume Simple Queries
Before (GPT-4o for 1000 simple queries/day):
- $0.01/query × 1000 = $10/day = $300/month
After (Llama 3.1 self-hosted):
- Server cost: $50/month
- Inference cost: ~$0
- Total: $50/month
Savings: $250/month
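The savings figures in the three strategies above are easy to verify. A quick check of the arithmetic, working in cents to avoid rounding noise (prices are the per-query estimates used in this guide, not official rates):

```python
# All amounts in cents; 30-day month, per the estimates in this guide.

# Strategy 1: mixed models instead of GPT-4 Turbo for everything.
before_1 = 100 * 3 * 30               # 100 queries/day at $0.03 = $90/mo
after_1 = (70 * 1 + 30 * 3) * 30      # 70 at $0.01 + 30 at $0.03 = $48/mo

# Strategy 2: Gemini 2.5 Flash instead of GPT-4 Turbo for long PDFs.
before_2 = 10 * 30 * 30               # 10 PDFs/day at $0.30 = $90/mo
after_2 = 10 * 5 * 30                 # 10 PDFs/day at $0.05 = $15/mo

# Strategy 3: self-hosted Llama instead of GPT-4o for bulk simple queries.
before_3 = 1000 * 1 * 30              # 1000 queries/day at $0.01 = $300/mo
after_3 = 5000                        # $50/mo server, ~$0 inference

# Monthly savings in dollars:
print((before_1 - after_1) // 100,
      (before_2 - after_2) // 100,
      (before_3 - after_3) // 100)    # 42 75 250
```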
Team-Wide Cost Optimization
Guideline: Set default models per role:
- Developers: Claude 3.5 Sonnet (best for code, worth the cost)
- Sales/CS: GPT-4o (fast, cheap, good enough)
- Analysts: o3-mini (complex work, worth the cost) + GPT-4o (quick queries)
- Admins: GPT-4o (general use, cost-effective)
Result: 30-50% cost reduction vs. using premium models for everything.
Model Performance Comparison
Code Generation Quality (Developer Survey, 2025)
- Claude 4.5 Sonnet - 92% satisfaction
- Claude 3.5 Sonnet - 89% satisfaction
- GPT-4o - 82% satisfaction
- o3-mini - 78% satisfaction
- Gemini 2.0 - 75% satisfaction
Takeaway: For code, Claude models are preferred.
Speed (Average Response Time)
- Gemini 2.5 Flash - 1.2 sec
- GPT-4o - 1.8 sec
- Claude 3.5 Sonnet - 2.1 sec
- Claude 4 - 2.8 sec
- o3-mini - 18 sec (reasoning models are slow)
Takeaway: For speed, GPT-4o or Gemini Flash.
Long Context Handling (2M+ tokens)
- Gemini 2.5 Flash - 2M tokens, excellent
- Gemini 2.0 - 2M tokens, good
- Claude 4 - 200K tokens, good
- GPT-4o - 128K tokens, adequate
- o3 - 200K tokens, good
Takeaway: For entire codebases or massive docs, use Gemini.
Reasoning Depth (Complex Problem Solving)
- o3 - Best at step-by-step reasoning
- o3-mini - Good reasoning, faster
- Claude 4 - Strong nuanced reasoning
- GPT-4 Turbo - Good general reasoning
- GPT-4o - Adequate (optimized for speed)
Takeaway: For complex analysis, use reasoning models (o3, Claude 4).
Custom Models and Fine-Tuning
Bring Your Own Model
Zine supports custom model endpoints via API:
Setup:
- Host your model (AWS, Azure, GCP, or on-prem)
- Expose OpenAI-compatible API endpoint
- Add to Zine: Settings → Custom Models
- Enter:
Name: Legal Llama (Fine-Tuned)
Endpoint: https://your-server.com/v1/chat/completions
API Key: your-api-key
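"OpenAI-compatible" means the endpoint accepts the standard chat-completions request shape. A minimal sketch of the request such an endpoint would receive—the URL and key are the placeholders from the example above, the model name is hypothetical, and no network call is made here:

```python
import json

# Placeholders from the custom-model example above -- substitute your own.
ENDPOINT = "https://your-server.com/v1/chat/completions"
API_KEY = "your-api-key"

# The standard OpenAI-compatible chat-completions request body.
payload = {
    "model": "legal-llama-ft",  # hypothetical fine-tuned model name
    "messages": [
        {"role": "system", "content": "You are a legal assistant."},
        {"role": "user", "content": "Summarize this NDA clause."},
    ],
    "temperature": 0.2,
}
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

body = json.dumps(payload)
# To actually send it, POST `body` with `headers` to ENDPOINT
# (e.g. via urllib.request or requests).
print(json.loads(body)["messages"][0]["role"])  # system
```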
Use Cases:
- Legal/compliance: Fine-tuned on legal documents
- Medical: HIPAA-compliant model on private infrastructure
- Finance: Fine-tuned on financial analysis
- Proprietary: Your company's domain-specific model
Fine-Tuning Llama 3.1
Steps:
- Export your Zine knowledge (Slack, GitHub, docs) as training data
- Fine-tune Llama 3.1 on your domain
- Use tools like Hugging Face, Modal, or Replicate
- Host fine-tuned model
- Add to Zine as custom model
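Exported knowledge from step 1 typically needs reshaping into chat-format JSONL before fine-tuning. A hedged sketch of that conversion—the record fields here are hypothetical; a real Zine export will have its own schema:

```python
import json

# Hypothetical exported records -- a real export will have its own schema.
exported = [
    {"question": "What is our deploy process?",
     "answer": "Merge to main, CI builds, auto-deploy to staging."},
]

def to_chat_jsonl(records) -> str:
    """Convert Q&A records into the chat-format JSONL used for fine-tuning."""
    lines = []
    for rec in records:
        example = {"messages": [
            {"role": "user", "content": rec["question"]},
            {"role": "assistant", "content": rec["answer"]},
        ]}
        lines.append(json.dumps(example))
    return "\n".join(lines)

first = json.loads(to_chat_jsonl(exported).splitlines()[0])
print([m["role"] for m in first["messages"]])  # ['user', 'assistant']
```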
Benefits:
- Model "knows" your team's terminology, patterns
- Better performance on domain-specific queries
- Full control over data (never leaves your infrastructure)
Costs:
- Fine-tuning: $100-$1000 (one-time)
- Hosting: $50-$500/month (depends on usage)
Team Model Preferences
Shared Specifications for Teams
Admins can create team-wide specifications:
- Admin creates: "Engineering Code Assistant" (Claude 4.5 Sonnet)
- All engineers see this spec in their dropdown
- Ensures consistency (everyone uses best model for code)
Role-Based Default Models
Set default models by role:
- Developers: Claude 3.5 Sonnet
- Product: GPT-4o
- Sales: GPT-4o
- Executives: Claude 4 (detailed analysis)
Model Usage Analytics
Admins can view:
- Which models are most popular
- Cost per model per team
- Performance metrics (user satisfaction)
Optimize based on data: If GPT-4o satisfaction is high but Claude 4 is rarely used, shift more users to GPT-4o to save costs.
Best Practices
1. Start with GPT-4o, Switch When Needed
Default: GPT-4o for most queries (fast, cheap, good enough)
Switch to:
- Claude 3.5/4.5 Sonnet: Code generation, technical reasoning
- Gemini 2.5 Flash: Long documents, codebase-wide queries
- o3-mini/o3: Complex analysis, math, logic
2. Create Specifications for Repeated Tasks
Don't re-type instructions every time. Save specifications:
- "Code Review Bot" (Claude 3.5 + code review prompt)
- "Customer Email Drafter" (GPT-4o + empathetic tone)
- "Architecture Analyzer" (o3-mini + step-by-step reasoning)
3. Experiment with New Models
When a new model launches:
- Create a test specification
- Try on a few queries
- Compare to your current model
- Switch if better (no migration needed)
4. Monitor Costs
Check your usage:
- Settings → Usage Dashboard
- See cost per model
- Identify expensive queries (long context, complex reasoning)
- Optimize: Use cheaper models for simple tasks
5. Team Training
Educate your team:
- When to use which model (share the decision matrix above)
- How to create specifications (save time)
- Cost awareness (don't use o3 for simple summaries)
Next Steps
Now that you understand model flexibility:
- ✅ Experiment: Try GPT-4o, Claude, and Gemini on the same query
- ✅ Create Specifications: Save your favorite model + prompt combos
- ✅ Set Defaults: Choose your personal default model
- ✅ Share with Team: Create team specifications
- ✅ Monitor Usage: Check which models work best for you
Related Guides:
- MCP Integration - Use Zine with Cursor, VS Code
- Data Connectors - Connect your knowledge sources
- Automated Alerts - Set up model-specific briefings
Learn More:
- Try Zine - Free tier available
- Schedule a demo - Get help choosing models for your team
Same knowledge, different models. Choose the best tool for every job.