
Incident Response: Find Root Cause in 2 Minutes, Not 30

When production breaks, every minute counts. Search Slack incidents, GitHub changes, and Sentry errors in one query—not five tools.

3:47 AM. Your phone buzzes. Checkout API is down.

You're on-call. You need to:

  1. Check error logs (Sentry)
  2. Search Slack #incidents for similar issues
  3. Check GitHub for recent deployments
  4. Search Notion for runbooks
  5. Ask teammates if they remember this error

By the time you've gathered context, it's 4:15 AM—28 minutes wasted before you even start fixing.

Zine transforms incident response: Search once, get everything—error logs (via Sentry MCP), Slack incident history, recent GitHub changes, and runbooks. Root cause identified in 2 minutes.

This guide shows DevOps, SREs, and on-call engineers how to set up unified incident context that saves hours during critical moments.


Table of Contents

  1. The Incident Response Time Sink
  2. The Zine Incident Response Workflow
  3. Setup: One-Time Configuration
  4. During an Incident: The 2-Minute Context Gather
  5. Connecting Sentry MCP for Error Logs
  6. Post-Incident: Automated Documentation
  7. Real Incident Examples
  8. Best Practices

The Incident Response Time Sink

Traditional Incident Response Flow

Step 1: Identify the problem (2 minutes)

  • Alert comes in: "Checkout API 500 errors spiking"
  • Check monitoring dashboard (Datadog, New Relic, etc.)

Step 2: Check error logs (5-10 minutes)

  • Open Sentry or CloudWatch
  • Filter by service: checkout-api
  • Filter by timeframe: Last hour
  • Read stack traces, identify error patterns

Step 3: Search Slack #incidents (5-10 minutes)

  • "Has this happened before?"
  • Manually scroll through channel
  • Find similar incident from 3 months ago
  • Read 50-message thread to find resolution

Step 4: Check recent deployments (5-10 minutes)

  • Open GitHub
  • Check recent PRs merged to production
  • Read PR descriptions, commits
  • Identify suspicious changes

Step 5: Search runbooks (3-5 minutes)

  • Open Notion or Confluence
  • Search "checkout troubleshooting"
  • Find (hopefully) relevant runbook

Step 6: Ask teammates (10-15 minutes)

  • Slack: "Anyone seen checkout errors before?"
  • Wait for responses
  • Senior engineer shares context from memory

Total time: 30-45 minutes of context gathering before you start fixing.

During that time: Users can't checkout. Revenue lost. Stress accumulates.


The Zine Incident Response Workflow

Unified Incident Context in One Query

3:47 AM Alert: Checkout API errors

3:48 AM - Open Zine, one query:

Checkout API errors OR timeouts

3:49 AM - Zine returns (in 15 seconds):

  1. Sentry (via MCP): 87 errors in last hour, stack trace shows Redis timeout
  2. Slack #incidents (2 months ago): Same error, resolution documented (increase Redis timeout)
  3. GitHub PR #567: Merged yesterday, "Optimize Redis cache" (modified timeout config)
  4. Slack #engineering (yesterday): "Concerns about new Redis timeout settings" (Alice raised this)
  5. Notion runbook: "Redis Troubleshooting" (timeout adjustment procedure)
  6. GitHub PR #601 (2 months ago): Past fix for same issue

3:49 AM - Hypothesis identified:

  • The recent Redis config change (PR #567) made the timeout too aggressive
  • The same issue occurred 2 months ago (PR #601 fixed it then)
  • Alice warned about this yesterday in Slack

3:50 AM - Fix:

  • Revert Redis timeout config
  • OR: Increase timeout based on runbook
  • Deploy fix

Total time: 3 minutes from alert to fix deployment.

Time saved: 27-42 minutes.


Setup: One-Time Configuration

Step 1: Connect Core Tools to Zine

Required:

  1. Slack: Connect #incidents, #engineering, #devops channels
  2. GitHub: Connect repos (especially backend services)
  3. Notion: Connect runbooks, architecture docs

Recommended:

  4. Meeting recordings: Past architecture/incident review meetings
  5. Email: Vendor discussions, escalation threads

Initial sync: 1-3 hours (one-time)

Step 2: Connect Sentry MCP (Optional but Powerful)

Sentry offers an MCP server for error tracking.

Add the Sentry MCP server to Zine (Zine acts as the MCP client):

  1. Zine Settings → MCP Servers → Add Server
  2. Select "Sentry MCP"
  3. Enter Sentry API key
  4. Authorize

Now Zine can query Sentry errors in addition to searching Slack/GitHub.

Benefit: One query gets error logs + team discussions + code changes.

Step 3: Create Saved Views for Incidents

In Zine, create saved views:

"Recent Incidents":

source:slack channel:#incidents after:30d

"Production Changes":

source:github label:production merged after:7d

"Critical Bugs":

source:github label:critical state:open

Time saved: One-click access during incidents.

Step 4: Set Up Alert (Optional)

Create an alert for proactive monitoring:

Alert Name: "Incident Monitor"
Query:

Slack #incidents new threads
OR GitHub issues labeled 'production' OR 'outage'
OR Sentry errors increased by 50%+
from the last hour

Schedule: Hourly
Delivery: Slack DM

Benefit: Know about incidents immediately, even if you're not actively monitoring.


During an Incident: The 2-Minute Context Gather

Query Templates for Common Incidents

API Errors:

[service-name] API errors OR timeouts OR 500

Database Issues:

Database OR postgres OR mongodb slow OR timeout OR connection

Cache Problems:

Redis OR memcached OR cache timeout OR eviction

Deployment Issues:

Deployment OR deploy failed OR rollback recent

Performance Degradation:

Slow OR performance OR latency [service-name]

What to Look For in Results

1. Past Incidents (Slack #incidents):

  • "Has this happened before?"
  • If yes: How was it resolved?
  • Time saved: Don't re-diagnose

2. Recent Changes (GitHub):

  • PRs merged in last 24-48 hours
  • Changes to affected service
  • Likely culprits for new bugs

3. Known Issues (GitHub Issues):

  • Open issues about this service
  • Known bugs or limitations
  • Workarounds documented

4. Runbooks (Notion):

  • Troubleshooting procedures
  • Recovery steps
  • Contact information for escalation

5. Team Knowledge (Slack #engineering):

  • Discussions about this service
  • Known gotchas or edge cases
  • Expertise (who knows this system best)

Connecting Sentry MCP for Error Logs

Why Sentry MCP?

Without Sentry MCP:

  • Search Zine → Get Slack/GitHub context
  • Open Sentry separately → Get error logs
  • Manually correlate the two

With Sentry MCP:

  • Search Zine → Get Slack/GitHub context + Sentry error logs in one query
  • AI correlates automatically

Setup Sentry MCP

Option 1: Connect to Zine (Recommended)

  1. Zine Settings → MCP Servers → Add Server
  2. Select "Sentry"
  3. Enter:
    Sentry API Key: your-sentry-api-key
    Sentry Organization: your-org-slug
    
  4. Save

Now when you query Zine, it can include Sentry data.

Option 2: Connect Directly in Cursor

Add Sentry MCP alongside Zine MCP in Cursor config:

{
  "mcpServers": {
    "zine": { ... },
    "sentry": {
      "type": "sse",
      "url": "sentry-mcp-endpoint",
      "apiKey": "your-sentry-key"
    }
  }
}
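
If you want the same data outside of Cursor (for example, in a small on-call helper script), the official MCP TypeScript SDK (@modelcontextprotocol/sdk) can talk to an SSE-based MCP server directly. The sketch below is an assumption-heavy illustration: the endpoint URL, the authentication setup, and the "search_errors" tool name are placeholders, so list the server's tools first and call whatever it actually exposes. The same pattern applies to the Zine MCP server referenced above.

// incident-context.ts: minimal sketch using the official MCP TypeScript SDK.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { SSEClientTransport } from "@modelcontextprotocol/sdk/client/sse.js";

// Placeholder endpoint; authentication headers are omitted from this sketch.
const transport = new SSEClientTransport(new URL("https://your-sentry-mcp-endpoint/sse"));
const client = new Client({ name: "incident-context", version: "0.1.0" });

await client.connect(transport);

// Discover what the server actually exposes before calling anything.
const { tools } = await client.listTools();
console.log(tools.map((t) => t.name));

// Hypothetical tool call: "search_errors" and its arguments are assumptions;
// substitute a tool name returned by listTools().
const result = await client.callTool({
  name: "search_errors",
  arguments: { query: "checkout-api timeout", statsPeriod: "1h" },
});
console.log(JSON.stringify(result.content, null, 2));

await client.close();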

Querying Sentry via Zine

Query:

Checkout API Sentry errors in the last hour

Zine returns:

  • Sentry errors (87 errors, stack traces, affected users)
  • Slack #incidents (team discussion if any)
  • GitHub recent changes (PRs merged recently)

All unified.


Post-Incident: Automated Documentation

Generate Incident Report

After resolving an incident, use Zine to generate postmortem:

Query in Zine chat:

Generate incident report for checkout API timeout on November 13, 2025

Zine AI compiles:

  1. What happened: Timeline of errors (Sentry data)
  2. Root cause: GitHub PR #567 changed Redis timeout (too aggressive)
  3. Team response: Slack #incidents thread (Bob identified issue, Alice deployed fix)
  4. Resolution: Config rolled back, errors stopped
  5. Follow-up actions: Update runbook, add monitoring for Redis timeouts

Export to Notion: Click "Export" → saves as Notion page in "Incident Reports" database.

Time saved: 30-45 minutes writing postmortem manually.


Real Incident Examples

Example 1: Database Connection Exhaustion

Alert: 4:00 AM - API returning 500 errors

Query Zine:

Database connection errors OR pool exhausted

Returns:

  • Slack #incidents (6 months ago): Same error, resolution: Increase connection pool size
  • GitHub PR #234: Past fix
  • GitHub PR #789: Merged 3 days ago, modified database config (potential cause)
  • Notion runbook: "Database Connection Issues"

Root cause identified in 3 minutes: Recent PR reduced connection pool size (optimization attempt backfired).

Fix: Revert connection pool change.

Downtime: 8 minutes (vs. 45 minutes without Zine context).


Example 2: Redis Cache Eviction Bug

Alert: 2:00 PM - Checkout flow broken

Query Zine:

Checkout OR payment redis OR cache

Returns:

  • Sentry: 143 errors, "Redis key not found"
  • Slack #engineering (last week): "Changed Redis eviction policy to save memory"
  • GitHub PR #567: "Update Redis config" (merged 3 days ago)
  • Slack #engineering (last week): Alice warned: "This might evict active session keys"

Root cause identified in 2 minutes: New eviction policy is too aggressive, evicting session keys prematurely.

Fix: Adjust eviction policy to exclude session keys.

Alice's warning was right: the Slack context meant no surprise, because the team already knew this was a risk.


Example 3: Third-Party API Outage

Alert: 5:30 PM - Payment processing failing

Query Zine:

Payment API errors OR Stripe OR payment gateway

Returns:

  • Slack #incidents: Bob posted 10 minutes ago: "Stripe status page shows outage"
  • Email (from Stripe): Incident notification received 15 minutes ago
  • Notion runbook: "Third-Party Outage Response" (enable fallback payment processor)
  • GitHub: Fallback implementation in payment-service

Root cause identified in 1 minute: Stripe outage (external, not our bug).

Response: Enable fallback processor, notify customers, monitor Stripe status.

No time wasted debugging our code (the Slack context immediately pointed to an external issue).


Best Practices

1. Connect Tools Before Incidents Happen

Don't wait until 3 AM:

  • Set up Slack, GitHub, Sentry integration during normal hours
  • Test queries during calm periods
  • Create saved views for common incident types

Preparation pays off when seconds matter.

2. Document Resolutions in Slack

After fixing:

  • Post resolution in Slack #incidents
  • Include: Root cause, fix applied, prevention steps

Why: Next time this happens (it will), Zine finds this thread immediately.

Example post:

Checkout API timeout resolved.
Root cause: PR #567 set Redis timeout to 1000ms (too low).
Fix: Increased to 3000ms.
Prevention: Added monitoring for Redis timeouts.
Runbook updated in Notion.
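
If you want resolution posts to be consistent (and therefore easy for Zine to find later), it can help to script the posting step. Below is a minimal sketch that sends the template above to a Slack incoming webhook; SLACK_WEBHOOK_URL is a placeholder for a webhook you create for #incidents, and Node 18+ is assumed for the built-in fetch.

// post-resolution.ts: send the resolution template to a Slack incoming webhook.
const webhookUrl = process.env.SLACK_WEBHOOK_URL;
if (!webhookUrl) throw new Error("Set SLACK_WEBHOOK_URL");

const resolution = [
  "Checkout API timeout resolved.",
  "Root cause: PR #567 set Redis timeout to 1000ms (too low).",
  "Fix: Increased to 3000ms.",
  "Prevention: Added monitoring for Redis timeouts.",
  "Runbook updated in Notion.",
].join("\n");

// Incoming webhooks accept a simple JSON payload with a `text` field.
const res = await fetch(webhookUrl, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ text: resolution }),
});
if (!res.ok) throw new Error(`Slack webhook returned ${res.status}`);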

3. Use Time Filters Strategically

Recent changes (last 24-48 hours):

after:24h deploy OR merged

Past incidents (last 6 months):

after:6mo [error-pattern]

Why: New bugs likely caused by recent changes. Historical incidents provide resolution patterns.

4. Create Incident-Specific Saved Views

"Recent Deployments":

source:github merged to:production after:48h

"Open Production Issues":

source:github label:production state:open

"Past Incidents":

source:slack channel:#incidents after:30d

Benefit: One-click access during high-stress incidents.

5. Set Up Proactive Alerts

Don't wait for pages:

  • Alert when Sentry errors spike
  • Alert when Slack #incidents has new thread
  • Alert when GitHub issues labeled "production" are created

Result: Catch issues before they become full outages.
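
To make the "Sentry errors spike" idea concrete, here is a rough sketch of what such a check involves: poll Sentry's issues API and flag a spike when too many unresolved issues have been active in the last hour. The organization and project slugs, the one-hour window, and the threshold of 50 are all assumptions; verify the endpoint and fields against Sentry's API docs before relying on it.

// sentry-spike-check.ts: crude error-spike check against Sentry's REST API.
// SENTRY_TOKEN, "your-org", and "checkout-api" are placeholders.
const token = process.env.SENTRY_TOKEN;
const url =
  "https://sentry.io/api/0/projects/your-org/checkout-api/issues/?query=is:unresolved";

const res = await fetch(url, { headers: { Authorization: `Bearer ${token}` } });
if (!res.ok) throw new Error(`Sentry API returned ${res.status}`);

// Each issue object includes a lastSeen timestamp; count issues active in the last hour.
const issues: Array<{ title: string; lastSeen: string }> = await res.json();
const oneHourAgo = Date.now() - 60 * 60 * 1000;
const active = issues.filter((i) => new Date(i.lastSeen).getTime() > oneHourAgo);

// The threshold of 50 is an arbitrary example; tune it to your traffic.
if (active.length > 50) {
  console.error(`Possible incident: ${active.length} issues active in the last hour`);
  process.exit(1); // a non-zero exit lets a cron/CI job page or post to Slack
} else {
  console.log(`OK: ${active.length} active issues in the last hour`);
}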


Next Steps

Now that you understand incident response with Zine:

  1. Connect Tools: Slack #incidents, GitHub, Notion runbooks
  2. Test During Calm: Practice queries before real incidents
  3. Create Saved Views: For recent changes, open issues, past incidents
  4. Set Up Alerts: Proactive monitoring
  5. Add Sentry MCP: Unified error logs + context
  6. Document Runbook: Update team's incident response process to include Zine



Every minute counts during incidents. Don't waste 30 of them gathering context.

Ready to Build with Graphlit?

Start building AI-powered applications with our API-first platform. The free tier includes 100 credits/month.

No credit card required • 5 minutes to first API call
