
Multimodal Processing: Search Text, Audio, Video, Images, and Code

Zine doesn't just search text. It searches meeting transcripts, video content, images (via OCR), and code (syntax-aware)—all in one unified interface.

Most search systems only handle text documents—PDFs, Word docs, plain text files.

But your team's knowledge also lives in:

  • Meeting recordings (audio, video) - architectural decisions, product reviews
  • Screenshots (images) - error messages, UI designs, whiteboard sessions
  • Videos (YouTube, Loom) - product demos, onboarding tutorials
  • Code files (GitHub) - implementations, not just docs
  • Presentations (PowerPoint, Keynote) - strategy decks, customer pitches

If your search can't handle these formats, you're missing 60%+ of your team's knowledge.

Zine's multimodal processing extracts searchable text from audio, video, images, and code—so you can search:

  • "What did the CTO say about microservices?" (from meeting recording)
  • "Show me error screenshots from Slack" (OCR'd images)
  • "Find the authentication implementation" (syntax-aware code search)
  • "Onboarding video about our deployment process" (video transcripts)

This guide covers all supported formats, how they're processed, and how to search them effectively.


Table of Contents

  1. Supported Formats Overview
  2. Audio Processing
  3. Video Processing
  4. Image Processing (OCR)
  5. Code Processing
  6. Document Processing
  7. Searching Multimodal Content
  8. Best Practices

Supported Formats Overview

All Supported Content Types

Audio Formats:

  • MP3, WAV, M4A, FLAC, OGG
  • Podcast recordings
  • Voice memos
  • Meeting audio
  • Phone call recordings

Video Formats:

  • MP4, MOV, AVI, WMV, MKV
  • Meeting recordings (Zoom, Google Meet, Teams)
  • Product demos (Loom, Screen Studio)
  • YouTube videos (via URL)
  • Training videos

Image Formats (with OCR):

  • JPEG, PNG, GIF, TIFF, BMP, WebP
  • Screenshots
  • Whiteboard photos
  • Diagrams and charts
  • Scanned documents
  • UI mockups

Code Formats (syntax-aware):

  • JavaScript/TypeScript (.js, .ts, .jsx, .tsx)
  • Python (.py)
  • Java (.java)
  • C/C++ (.c, .cpp, .h)
  • Go (.go)
  • Rust (.rs)
  • Ruby (.rb)
  • PHP (.php)
  • Swift (.swift)
  • Kotlin (.kt)
  • 30+ languages supported

Document Formats:

  • PDF (searchable and scanned)
  • Microsoft Word (.doc, .docx)
  • PowerPoint (.ppt, .pptx)
  • Excel (.xls, .xlsx)
  • Google Docs, Sheets, Slides
  • Markdown (.md)
  • Plain text (.txt)
  • HTML (.html)
  • RTF, ODT, LaTeX

Presentation Formats:

  • PowerPoint slides (text + embedded images OCR'd)
  • Keynote exports
  • Google Slides (via Drive connector)

Audio Processing

How Audio is Processed

Step 1: Ingestion

  • Upload audio file or connect feed (Zoom, Google Meet)
  • Zine detects format (MP3, WAV, etc.)

Step 2: Transcription

  • Speech-to-text (OpenAI Whisper or similar)
  • Multi-language support (50+ languages)
  • Speaker diarization (identifies who said what)
  • Timestamp alignment (every utterance timestamped)

Step 3: Indexing

  • Transcript becomes searchable text
  • Timestamps preserved for navigation
  • Speaker names extracted (if available)

Step 4: Enrichment (optional)

  • Entity extraction (people, companies mentioned)
  • Key topics identified
  • Action items extracted
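
Zine's internal pipeline isn't exposed, but steps 2 and 3 are easy to picture with the open-source whisper package: transcribe the file, then index each timestamped segment as its own searchable record. A minimal sketch in Python (the record shape is illustrative, not Zine's schema; speaker diarization would come from a separate model such as pyannote.audio):

# pip install openai-whisper
import whisper

model = whisper.load_model("base")        # small model; larger ones trade speed for accuracy
result = model.transcribe("meeting.mp3")  # full text plus timestamped segments

# Index each segment separately so a search hit can link to an exact moment.
index = [
    {"start": seg["start"], "end": seg["end"], "text": seg["text"].strip()}
    for seg in result["segments"]
]
# A hit on "mobile app" now carries seg["start"], e.g. 222.0 seconds = 00:03:42.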

Supported Audio Sources

Direct Upload:

Upload MP3, WAV, M4A files (up to 500MB)

Meeting Recordings (via connectors):

  • Zoom: Auto-sync cloud recordings
  • Google Meet: Recordings saved to Drive
  • Microsoft Teams: Meeting recordings from OneDrive
  • Loom: Video/audio from Loom library

Podcasts:

  • RSS feed connector (auto-download new episodes)

Voice Memos:

  • Upload from phone (iOS Voice Memos, Android Recorder)
  • Dropbox/Drive sync

Example: Meeting Recording Search

Scenario: Product roadmap review meeting (1 hour)

Uploaded: Zoom recording (meeting.mp4, 350MB)

Zine processes:

  1. Extracts audio track
  2. Transcribes (10 minutes processing)
  3. Identifies speakers: Alice (PM), Bob (Eng Lead), Sarah (Design)
  4. Timestamps every sentence

Transcript indexed:

[00:03:42] Alice: Should we prioritize mobile app in Q4?
[00:04:15] Bob: Mobile is technically complex. I'd estimate 3 months.
[00:04:58] Sarah: From design perspective, mobile needs 2 weeks prep.
[00:06:12] Alice: Let's target Q4 for beta, Q1 for full launch.

Now searchable:

  • Query: "mobile app priority"
    • Returns: This meeting, jump to 00:03:42
  • Query: "Bob's concerns about mobile"
    • Returns: Timestamp 00:04:15 ("technically complex")
  • Query: "Q4 roadmap decisions"
    • Returns: This meeting + other roadmap discussions

Video Processing

How Video is Processed

Step 1: Ingestion

  • Upload video file or connect feed (YouTube, Loom)

Step 2: Audio Extraction

  • Extract audio track from video
  • Transcribe (same process as audio)

Step 3: Visual Analysis (optional)

  • Extract frames (every 5 seconds)
  • OCR any text visible in video (screen shares, slides)
  • Identify scenes/chapters

Step 4: Indexing

  • Transcript + visual text searchable
  • Thumbnail generated
  • Chapter markers (if available)
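
Steps 2 and 3 can be approximated with off-the-shelf tools: ffmpeg to sample frames, pytesseract to OCR on-screen text. A rough sketch, assuming ffmpeg is on PATH and the five-second interval described above (file names are hypothetical; this is not Zine's actual pipeline):

# pip install pytesseract pillow  (also requires the tesseract binary)
import glob, os, subprocess
import pytesseract
from PIL import Image

os.makedirs("frames", exist_ok=True)
# Sample one frame every 5 seconds (fps=1/5), matching the cadence above.
subprocess.run(
    ["ffmpeg", "-i", "demo-checkout-flow.mp4", "-vf", "fps=1/5", "frames/frame_%04d.png"],
    check=True,
)

# OCR each frame; frame number x 5 gives an approximate timestamp.
visual_text = []
for i, path in enumerate(sorted(glob.glob("frames/frame_*.png"))):
    text = pytesseract.image_to_string(Image.open(path)).strip()
    if text:
        visual_text.append({"timestamp_sec": i * 5, "text": text})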

Supported Video Sources

Direct Upload:

Upload MP4, MOV, AVI (up to 2GB)

YouTube (via URL or feed):

Paste YouTube URL → Zine downloads + transcribes

Meeting Platforms:

  • Zoom: Cloud recordings
  • Google Meet: Drive-saved recordings
  • Microsoft Teams: OneDrive recordings

Screen Recording Tools:

  • Loom: Library connector
  • Screen Studio: Export MP4s
  • OBS recordings: Upload MP4s

Example: Product Demo Video Search

Scenario: Loom demo of new feature (15 minutes)

Uploaded: demo-checkout-flow.mp4

Zine processes:

  1. Transcribes narration: "Here's the new checkout flow..."
  2. OCRs screen content: Button labels, form fields
  3. Indexes both transcript + visible text

Now searchable:

  • Query: "checkout flow demo"
    • Returns: This video, plays from start
  • Query: "payment button"
    • Returns: Timestamp where button is shown + mentioned
  • Query: "How do we handle errors in checkout?"
    • Returns: Section of video discussing error handling

Image Processing (OCR)

How Images are Processed

Step 1: Ingestion

  • Uploaded directly or pulled from Slack/Drive/email

Step 2: OCR (Optical Character Recognition)

  • Extract all text visible in image
  • Handle:
    • Screenshots (UI text, code snippets, error messages)
    • Whiteboard photos (handwritten notes, diagrams)
    • Scanned documents (PDFs, receipts)
    • Charts/graphs (axis labels, legends)

Step 3: Indexing

  • Extracted text becomes searchable
  • Image thumbnail preserved
  • Metadata (filename, source, date)
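
A single image goes through the same OCR call; what matters for search is keeping the extracted text together with the metadata listed above. A sketch with pytesseract (the record schema and filename are illustrative):

from datetime import datetime, timezone
import pytesseract
from PIL import Image

path = "redis-error-screenshot-2025-11-13.png"   # hypothetical filename
record = {
    "text": pytesseract.image_to_string(Image.open(path)),  # OCR'd contents
    "filename": path,
    "source": "slack",                                      # where it was pulled from
    "ingested_at": datetime.now(timezone.utc).isoformat(),
}
# "ECONNREFUSED 127.0.0.1:6379" inside record["text"] is now a full-text match.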

Supported Image Sources

Direct Upload:

Upload JPEG, PNG, GIF, TIFF (up to 50MB)

Slack (via connector):

  • Screenshots shared in channels
  • Whiteboard photos
  • Design mockups

Google Drive (via connector):

  • Scanned documents
  • Photos

Email (via connector):

  • Inline images
  • Attachments

Example: Screenshot Search

Scenario: Developer shares error screenshot in Slack

Image content:

[Screenshot of console]
Error: ECONNREFUSED 127.0.0.1:6379
  at RedisClient.connect (/app/redis.js:42)
  at Database.init (/app/db.js:18)

Zine processes:

  1. OCRs screenshot → Extracts text
  2. Indexes error message, stack trace
  3. Links to Slack message context

Now searchable:

  • Query: "Redis ECONNREFUSED error"
    • Returns: This screenshot + Slack thread discussing fix
  • Query: "redis.js line 42"
    • Returns: This screenshot + GitHub file redis.js

Code Processing

How Code is Processed (Syntax-Aware)

Step 1: Ingestion

  • GitHub connector syncs repos
  • Direct file upload

Step 2: Syntax Parsing

  • Language detection (JavaScript, Python, etc.)
  • Parse AST (Abstract Syntax Tree)
  • Identify:
    • Function definitions
    • Class declarations
    • Imports/dependencies
    • Comments

Step 3: Indexing

  • Full-text search (every line)
  • Symbol search (functions, classes)
  • Semantic understanding (what code does)

Step 4: Enrichment

  • Links to GitHub (file, line numbers)
  • Links to related PRs, issues
  • Links to Slack discussions mentioning this code
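
For Python sources, the standard-library ast module shows what "parse the AST and identify definitions" means concretely; other languages would use their own parsers (tree-sitter is a common choice). A sketch (the filename is hypothetical):

import ast

source = open("users.py").read()   # hypothetical file being ingested
tree = ast.parse(source)

# Collect function and class definitions with line numbers so search
# results can deep-link to the exact definition site.
symbols = [
    {"name": node.name, "kind": type(node).__name__, "line": node.lineno}
    for node in ast.walk(tree)
    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
]
# e.g. [{"name": "create_user", "kind": "FunctionDef", "line": 42}, ...]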

Syntax-Aware Search

Traditional search: Keyword match only

Zine code search: Understands code structure

Example:

Query: "createUser function"

Traditional search returns:

  • All files containing string "createUser" (hundreds of matches)

Zine code search returns (ranked):

  1. Function definition: function createUser() in auth-service/users.js
  2. Function calls: Where createUser() is called (usage examples)
  3. Tests: test('createUser should...')
  4. Documentation: README mentioning createUser
  5. Slack discussions: Team discussing createUser implementation
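
That ranking falls out naturally once symbols are indexed separately from plain text: weight a hit on a definition above a call site, and a call site above an arbitrary mention. A toy scorer (the weights and hit shape are invented for illustration):

# Toy ranking: definition > call site > plain-text mention.
WEIGHTS = {"definition": 3.0, "call": 2.0, "text": 1.0}

def score(hit: dict) -> float:
    # hit["kind"] comes from the syntax-aware index (see the AST sketch above);
    # plain full-text matches fall back to the lowest weight.
    return WEIGHTS.get(hit.get("kind"), 1.0)

hits = [
    {"file": "README.md", "kind": "text"},
    {"file": "auth-service/users.js", "kind": "definition"},
    {"file": "routes/signup.js", "kind": "call"},
]
ranked = sorted(hits, key=score, reverse=True)   # definition first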

Supported Languages

Strongly supported (syntax-aware):

  • JavaScript, TypeScript, React (JSX/TSX)
  • Python
  • Java, Kotlin
  • Go
  • Rust
  • C, C++
  • C#
  • Ruby
  • PHP
  • Swift

Text search (still searchable, less syntax awareness):

  • Shell scripts (bash, zsh)
  • SQL
  • YAML, JSON, TOML
  • Markdown
  • HTML, CSS

Example: Code Search

Query: "authentication middleware"

Returns:

  1. GitHub code: middleware/auth.ts (function authenticateRequest())
  2. GitHub PR #234: "Add authentication middleware" (implementation)
  3. Slack #engineering: Discussion about auth approach
  4. Notion: "Auth Architecture" spec (requirements)
  5. GitHub issues: Bug reports mentioning auth middleware

All connected: Spec → Discussion → Implementation → Issues


Document Processing

PowerPoint/Keynote

Processing:

  1. Extract text from slides
  2. OCR embedded images (screenshots, charts)
  3. Extract speaker notes
  4. Index slide order (Slide 1, 2, 3...)

Search:

  • Query: "Q4 roadmap"
    • Returns: Presentation, jumps to relevant slide
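
Slide text and speaker notes can be pulled with python-pptx; embedded images would then go through the OCR step described earlier. A sketch (the deck name is borrowed from the naming example later in this guide):

# pip install python-pptx
from pptx import Presentation

prs = Presentation("Q4-product-roadmap-slides.pptx")
slides = []
for i, slide in enumerate(prs.slides, start=1):
    text = [sh.text_frame.text for sh in slide.shapes if sh.has_text_frame]
    notes = slide.notes_slide.notes_text_frame.text if slide.has_notes_slide else ""
    slides.append({"slide": i, "text": "\n".join(text), "notes": notes})
# A hit on "Q4 roadmap" now carries the slide number to jump to.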

PDF (Scanned Documents)

Processing:

  1. Detect if PDF is searchable (text layer) or scanned (images)
  2. If scanned: OCR every page
  3. Extract tables, charts (with labels)
  4. Index page numbers

Search:

  • Query: "revenue projections"
    • Returns: Financial PDF, page 14
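
The searchable-vs-scanned check in step 1 is straightforward with pypdf: if a page's text layer comes back empty, flag it for OCR. A sketch (the filename is hypothetical; rendering pages to images for OCR, e.g. with pdf2image plus pytesseract, is elided):

# pip install pypdf
from pypdf import PdfReader

reader = PdfReader("financials.pdf")   # hypothetical file
pages = []
for i, page in enumerate(reader.pages, start=1):
    text = (page.extract_text() or "").strip()
    # An empty text layer means the page is scanned and needs OCR.
    pages.append({"page": i, "text": text, "needs_ocr": not text})
# Every indexed chunk keeps its page number, so a hit can say "page 14".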

Excel/Google Sheets

Processing:

  1. Extract text from cells
  2. Preserve table structure (row/column context)
  3. Index sheet names

Search:

  • Query: "Acme Corp pricing"
    • Returns: Pricing spreadsheet, Sheet: "Enterprise Customers", Row 23
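
Preserving row/column context is the key difference from flattening a sheet to plain text. A sketch with openpyxl (the workbook name is hypothetical):

# pip install openpyxl
from openpyxl import load_workbook

wb = load_workbook("pricing.xlsx", read_only=True)
cells = []
for ws in wb.worksheets:
    for r, row in enumerate(ws.iter_rows(values_only=True), start=1):
        for c, value in enumerate(row, start=1):
            if value is not None:
                # Sheet/row/column survive, so a hit can point at
                # Sheet: "Enterprise Customers", Row 23.
                cells.append({"sheet": ws.title, "row": r, "col": c, "text": str(value)})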

Searching Multimodal Content

Unified Search (All Formats)

Query: "Redis performance"

Returns (mixed formats):

  1. Meeting recording (audio): Architecture review discussing Redis
  2. Slack screenshot (image): Redis performance graph (OCR'd labels)
  3. GitHub code: redis-client.ts implementation
  4. Notion doc (text): "Redis Configuration Guide"
  5. Loom video: Demo of Redis integration

All ranked by relevance, all formats unified.

Filtering by Content Type

Search only videos:

type:video Redis performance

Search only code:

type:code createUser function

Search only images:

type:image error screenshot

Search only audio/meetings:

type:audio roadmap discussion

Time-Based Filtering

Recent recordings:

after:7d meeting recording

Old presentations (may be outdated):

before:2024-01-01 roadmap presentation
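
These prefixes are ordinary tokens in the query string, so a filter parser just peels them off before the text search runs. A minimal sketch of that parsing, assuming the 7d-style relative dates shown above:

import re
from datetime import datetime, timedelta, timezone

def parse_query(q: str):
    filters, terms = {}, []
    for tok in q.split():
        m = re.fullmatch(r"(type|after|before):(\S+)", tok)
        if not m:
            terms.append(tok)
            continue
        key, value = m.groups()
        if key in ("after", "before") and value.endswith("d"):
            # Relative dates like 7d become an absolute cutoff.
            value = datetime.now(timezone.utc) - timedelta(days=int(value[:-1]))
        filters[key] = value
    return filters, " ".join(terms)

parse_query("type:video Redis performance")
# -> ({"type": "video"}, "Redis performance")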

Best Practices

1. Name Files Descriptively

Bad:

  • recording.mp4
  • screenshot.png
  • slides.pptx

Good:

  • 2025-11-13-roadmap-review-meeting.mp4
  • redis-error-screenshot-2025-11-13.png
  • Q4-product-roadmap-slides.pptx

Why: Filenames are indexed and help with search.


2. Use Timestamps in Queries

For long recordings:

  • Zine returns timestamp where match occurs
  • Click timestamp → jump to exact moment in video/audio

Example: 1-hour meeting, query "mobile app"

  • Returns: Timestamp 00:34:12 where mobile was discussed
  • Click → plays from 00:34:12
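
Given timestamped segments like the ones in the audio sketch earlier, turning a match into a clickable moment is a lookup plus formatting:

def find_moments(segments, query):
    """Return (hh:mm:ss, text) for each transcript segment matching the query."""
    hits = []
    for seg in segments:
        if query.lower() in seg["text"].lower():
            s = int(seg["start"])
            hits.append((f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}", seg["text"]))
    return hits

# find_moments(index, "mobile app") might return [("00:34:12", "... mobile app ...")]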

3. Combine Multimodal with Text Search

Best queries mix formats:

Query: "authentication system"

Returns:

  • Code (auth-service/)
  • Meeting recordings (arch reviews)
  • Slack discussions
  • Notion specs
  • Screenshots (auth flow diagrams)

Result: Complete picture, all formats.


4. Upload Meeting Recordings Immediately

Don't wait:

  • After meetings, upload recording same day
  • Zine processes in background (10-30 min)
  • Searchable by next meeting

Benefit: Context preserved while fresh.


5. OCR Whiteboards and Sketches

After whiteboard sessions:

  • Take photo
  • Upload to Zine (or share in Slack)
  • Zine OCRs handwritten notes (if legible)

Searchable: Brainstorming sessions, architecture sketches.


6. Connect YouTube for Tutorials

If your team shares YouTube tutorials:

  • Connect YouTube channel/playlist via feed
  • Zine auto-transcribes new videos
  • Search video content like docs

7. Use Dev Mode for Code + Context

When searching code:

  • Use Dev Mode (split view)
  • Left: Code files
  • Right: Related issues, PRs, Slack discussions

Example: Click auth.ts → See related Slack thread about auth decisions.


Next Steps

Now that you understand multimodal processing:

  1. Upload Meeting Recordings: Past architecture reviews, product demos
  2. Connect Slack: Screenshots shared in channels get OCR'd automatically
  3. Connect GitHub: Code becomes searchable alongside discussions
  4. Test Unified Search: Query something discussed in meeting + Slack + code
  5. Use Timeline View: See chronological narrative across all formats


Text is just one format. Your team's knowledge lives in audio, video, images, and code. Search it all.

