
Multimodal Processing: Search Text, Audio, Video, Images, and Code

Zine doesn't just search text. It searches meeting transcripts, video content, images (via OCR), and code (syntax-aware)—all in one unified interface.

Most search systems only handle text documents—PDFs, Word docs, plain text files.

But your team's knowledge also lives in:

  • Meeting recordings (audio, video) - architectural decisions, product reviews
  • Screenshots (images) - error messages, UI designs, whiteboard sessions
  • Videos (YouTube, Loom) - product demos, onboarding tutorials
  • Code files (GitHub) - implementations, not just docs
  • Presentations (PowerPoint, Keynote) - strategy decks, customer pitches

If your search can't handle these formats, you're missing 60%+ of your team's knowledge.

Zine's multimodal processing extracts searchable text from audio, video, images, and code—so you can search:

  • "What did the CTO say about microservices?" (from meeting recording)
  • "Show me error screenshots from Slack" (OCR'd images)
  • "Find the authentication implementation" (syntax-aware code search)
  • "Onboarding video about our deployment process" (video transcripts)

This guide covers all supported formats, how they're processed, and how to search them effectively.


Table of Contents

  1. Supported Formats Overview
  2. Audio Processing
  3. Video Processing
  4. Image Processing (OCR)
  5. Code Processing
  6. Document Processing
  7. Searching Multimodal Content
  8. Best Practices

Supported Formats Overview

All Supported Content Types

Audio Formats:

  • MP3, WAV, M4A, FLAC, OGG
  • Podcast recordings
  • Voice memos
  • Meeting audio
  • Phone call recordings

Video Formats:

  • MP4, MOV, AVI, WMV, MKV
  • Meeting recordings (Zoom, Google Meet, Teams)
  • Product demos (Loom, Screen Studio)
  • YouTube videos (via URL)
  • Training videos

Image Formats (with OCR):

  • JPEG, PNG, GIF, TIFF, BMP, WebP
  • Screenshots
  • Whiteboard photos
  • Diagrams and charts
  • Scanned documents
  • UI mockups

Code Formats (syntax-aware):

  • JavaScript/TypeScript (.js, .ts, .jsx, .tsx)
  • Python (.py)
  • Java (.java)
  • C/C++ (.c, .cpp, .h)
  • Go (.go)
  • Rust (.rs)
  • Ruby (.rb)
  • PHP (.php)
  • Swift (.swift)
  • Kotlin (.kt)
  • 30+ languages supported

Document Formats:

  • PDF (searchable and scanned)
  • Microsoft Word (.doc, .docx)
  • PowerPoint (.ppt, .pptx)
  • Excel (.xls, .xlsx)
  • Google Docs, Sheets, Slides
  • Markdown (.md)
  • Plain text (.txt)
  • HTML (.html)
  • RTF, ODT, LaTeX

Presentation Formats:

  • PowerPoint slides (text + embedded images OCR'd)
  • Keynote exports
  • Google Slides (via Drive connector)

Audio Processing

How Audio is Processed

Step 1: Ingestion

  • Upload audio file or connect feed (Zoom, Google Meet)
  • Zine detects format (MP3, WAV, etc.)

Step 2: Transcription

  • Speech-to-text (OpenAI Whisper or similar)
  • Multi-language support (50+ languages)
  • Speaker diarization (identifies who said what)
  • Timestamp alignment (every utterance timestamped)

Step 3: Indexing

  • Transcript becomes searchable text
  • Timestamps preserved for navigation
  • Speaker names extracted (if available)

Step 4: Enrichment (optional)

  • Entity extraction (people, companies mentioned)
  • Key topics identified
  • Action items extracted
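
Zine's internal pipeline isn't exposed, but steps 2 and 3 are easy to picture with the open-source whisper package: transcribe the file, then index each timestamped segment as its own searchable record. A minimal sketch in Python (the record shape is illustrative, not Zine's schema; speaker diarization would come from a separate model such as pyannote.audio):

# pip install openai-whisper
import whisper

model = whisper.load_model("base")        # small model; larger ones trade speed for accuracy
result = model.transcribe("meeting.mp3")  # full text plus timestamped segments

# Index each segment separately so a search hit can link to an exact moment.
index = [
    {"start": seg["start"], "end": seg["end"], "text": seg["text"].strip()}
    for seg in result["segments"]
]
# A hit on "mobile app" now carries seg["start"], e.g. 222.0 seconds = 00:03:42.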

Supported Audio Sources

Direct Upload:

Upload MP3, WAV, M4A files (up to 500MB)

Meeting Recordings (via connectors):

  • Zoom: Auto-sync cloud recordings
  • Google Meet: Recordings saved to Drive
  • Microsoft Teams: Meeting recordings from OneDrive
  • Loom: Video/audio from Loom library

Podcasts:

  • RSS feed connector (auto-download new episodes)

Voice Memos:

  • Upload from phone (iOS Voice Memos, Android Recorder)
  • Dropbox/Drive sync

Example: Meeting Recording Search

Scenario: Product roadmap review meeting (1 hour)

Uploaded: Zoom recording (meeting.mp4, 350MB)

Zine processes:

  1. Extracts audio track
  2. Transcribes (10 minutes processing)
  3. Identifies speakers: Alice (PM), Bob (Eng Lead), Sarah (Design)
  4. Timestamps every sentence

Transcript indexed:

[00:03:42] Alice: Should we prioritize mobile app in Q4?
[00:04:15] Bob: Mobile is technically complex. I'd estimate 3 months.
[00:04:58] Sarah: From design perspective, mobile needs 2 weeks prep.
[00:06:12] Alice: Let's target Q4 for beta, Q1 for full launch.

Now searchable:

  • Query: "mobile app priority"
    • Returns: This meeting, jump to 00:03:42
  • Query: "Bob's concerns about mobile"
    • Returns: Timestamp 00:04:15 ("technically complex")
  • Query: "Q4 roadmap decisions"
    • Returns: This meeting + other roadmap discussions

Video Processing

How Video is Processed

Step 1: Ingestion

  • Upload video file or connect feed (YouTube, Loom)

Step 2: Audio Extraction

  • Extract audio track from video
  • Transcribe (same process as audio)

Step 3: Visual Analysis (optional)

  • Extract frames (every 5 seconds)
  • OCR any text visible in video (screen shares, slides)
  • Identify scenes/chapters

Step 4: Indexing

  • Transcript + visual text searchable
  • Thumbnail generated
  • Chapter markers (if available)
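
Steps 2 and 3 can be approximated with off-the-shelf tools: ffmpeg to sample frames, pytesseract to OCR on-screen text. A rough sketch, assuming ffmpeg is on PATH and the five-second interval described above (file names are hypothetical; this is not Zine's actual pipeline):

# pip install pytesseract pillow  (also requires the tesseract binary)
import glob, os, subprocess
import pytesseract
from PIL import Image

os.makedirs("frames", exist_ok=True)
# Sample one frame every 5 seconds (fps=1/5), matching the cadence above.
subprocess.run(
    ["ffmpeg", "-i", "demo-checkout-flow.mp4", "-vf", "fps=1/5", "frames/frame_%04d.png"],
    check=True,
)

# OCR each frame; frame number x 5 gives an approximate timestamp.
visual_text = []
for i, path in enumerate(sorted(glob.glob("frames/frame_*.png"))):
    text = pytesseract.image_to_string(Image.open(path)).strip()
    if text:
        visual_text.append({"timestamp_sec": i * 5, "text": text})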

Supported Video Sources

Direct Upload:

Upload MP4, MOV, AVI (up to 2GB)

YouTube (via URL or feed):

Paste YouTube URL → Zine downloads + transcribes

Meeting Platforms:

  • Zoom: Cloud recordings
  • Google Meet: Drive-saved recordings
  • Microsoft Teams: OneDrive recordings

Screen Recording Tools:

  • Loom: Library connector
  • Screen Studio: Export MP4s
  • OBS recordings: Upload MP4s

Example: Product Demo Video Search

Scenario: Loom demo of new feature (15 minutes)

Uploaded: demo-checkout-flow.mp4

Zine processes:

  1. Transcribes narration: "Here's the new checkout flow..."
  2. OCRs screen content: Button labels, form fields
  3. Indexes both transcript + visible text

Now searchable:

  • Query: "checkout flow demo"
    • Returns: This video, plays from start
  • Query: "payment button"
    • Returns: Timestamp where button is shown + mentioned
  • Query: "How do we handle errors in checkout?"
    • Returns: Section of video discussing error handling

Image Processing (OCR)

How Images are Processed

Step 1: Ingestion

  • Uploaded directly or pulled from Slack/Drive/email

Step 2: OCR (Optical Character Recognition)

  • Extract all text visible in image
  • Handle:
    • Screenshots (UI text, code snippets, error messages)
    • Whiteboard photos (handwritten notes, diagrams)
    • Scanned documents (PDFs, receipts)
    • Charts/graphs (axis labels, legends)

Step 3: Indexing

  • Extracted text becomes searchable
  • Image thumbnail preserved
  • Metadata (filename, source, date)
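
A single image goes through the same OCR call; what matters for search is keeping the extracted text together with the metadata listed above. A sketch with pytesseract (the record schema and filename are illustrative):

from datetime import datetime, timezone
import pytesseract
from PIL import Image

path = "redis-error-screenshot-2025-11-13.png"   # hypothetical filename
record = {
    "text": pytesseract.image_to_string(Image.open(path)),  # OCR'd contents
    "filename": path,
    "source": "slack",                                      # where it was pulled from
    "ingested_at": datetime.now(timezone.utc).isoformat(),
}
# "ECONNREFUSED 127.0.0.1:6379" inside record["text"] is now a full-text match.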

Supported Image Sources

Direct Upload:

Upload JPEG, PNG, GIF, TIFF (up to 50MB)

Slack (via connector):

  • Screenshots shared in channels
  • Whiteboard photos
  • Design mockups

Google Drive (via connector):

  • Scanned documents
  • Photos

Email (via connector):

  • Inline images
  • Attachments

Example: Screenshot Search

Scenario: Developer shares error screenshot in Slack

Image content:

[Screenshot of console]
Error: ECONNREFUSED 127.0.0.1:6379
  at RedisClient.connect (/app/redis.js:42)
  at Database.init (/app/db.js:18)

Zine processes:

  1. OCRs screenshot → Extracts text
  2. Indexes error message, stack trace
  3. Links to Slack message context

Now searchable:

  • Query: "Redis ECONNREFUSED error"
    • Returns: This screenshot + Slack thread discussing fix
  • Query: "redis.js line 42"
    • Returns: This screenshot + GitHub file redis.js

Code Processing

How Code is Processed (Syntax-Aware)

Step 1: Ingestion

  • GitHub connector syncs repos
  • Direct file upload

Step 2: Syntax Parsing

  • Language detection (JavaScript, Python, etc.)
  • Parse AST (Abstract Syntax Tree)
  • Identify:
    • Function definitions
    • Class declarations
    • Imports/dependencies
    • Comments

Step 3: Indexing

  • Full-text search (every line)
  • Symbol search (functions, classes)
  • Semantic understanding (what code does)

Step 4: Enrichment

  • Links to GitHub (file, line numbers)
  • Links to related PRs, issues
  • Links to Slack discussions mentioning this code
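
For Python sources, the standard-library ast module shows what "parse the AST and identify definitions" means concretely; other languages would use their own parsers (tree-sitter is a common choice). A sketch (the filename is hypothetical):

import ast

source = open("users.py").read()   # hypothetical file being ingested
tree = ast.parse(source)

# Collect function and class definitions with line numbers so search
# results can deep-link to the exact definition site.
symbols = [
    {"name": node.name, "kind": type(node).__name__, "line": node.lineno}
    for node in ast.walk(tree)
    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
]
# e.g. [{"name": "create_user", "kind": "FunctionDef", "line": 42}, ...]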

Syntax-Aware Search

Traditional search: Keyword match only

Zine code search: Understands code structure

Example:

Query: "createUser function"

Traditional search returns:

  • All files containing string "createUser" (hundreds of matches)

Zine code search returns (ranked):

  1. Function definition: function createUser() in auth-service/users.js
  2. Function calls: Where createUser() is called (usage examples)
  3. Tests: test('createUser should...')
  4. Documentation: README mentioning createUser
  5. Slack discussions: Team discussing createUser implementation
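
That ranking falls out naturally once symbols are indexed separately from plain text: weight a hit on a definition above a call site, and a call site above an arbitrary mention. A toy scorer (the weights and hit shape are invented for illustration):

# Toy ranking: definition > call site > plain-text mention.
WEIGHTS = {"definition": 3.0, "call": 2.0, "text": 1.0}

def score(hit: dict) -> float:
    # hit["kind"] comes from the syntax-aware index (see the AST sketch above);
    # plain full-text matches fall back to the lowest weight.
    return WEIGHTS.get(hit.get("kind"), 1.0)

hits = [
    {"file": "README.md", "kind": "text"},
    {"file": "auth-service/users.js", "kind": "definition"},
    {"file": "routes/signup.js", "kind": "call"},
]
ranked = sorted(hits, key=score, reverse=True)   # definition first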

Supported Languages

Strongly supported (syntax-aware):

  • JavaScript, TypeScript, React (JSX/TSX)
  • Python
  • Java, Kotlin
  • Go
  • Rust
  • C, C++
  • C#
  • Ruby
  • PHP
  • Swift

Text search (still searchable, less syntax awareness):

  • Shell scripts (bash, zsh)
  • SQL
  • YAML, JSON, TOML
  • Markdown
  • HTML, CSS

Example: Code Search

Query: "authentication middleware"

Returns:

  1. GitHub code: middleware/auth.ts (function authenticateRequest())
  2. GitHub PR #234: "Add authentication middleware" (implementation)
  3. Slack #engineering: Discussion about auth approach
  4. Notion: "Auth Architecture" spec (requirements)
  5. GitHub issues: Bug reports mentioning auth middleware

All connected: Spec → Discussion → Implementation → Issues


Document Processing

PowerPoint/Keynote

Processing:

  1. Extract text from slides
  2. OCR embedded images (screenshots, charts)
  3. Extract speaker notes
  4. Index slide order (Slide 1, 2, 3...)

Search:

  • Query: "Q4 roadmap"
    • Returns: Presentation, jumps to relevant slide
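
Slide text and speaker notes can be pulled with python-pptx; embedded images would then go through the OCR step described earlier. A sketch (the deck name is borrowed from the naming example later in this guide):

# pip install python-pptx
from pptx import Presentation

prs = Presentation("Q4-product-roadmap-slides.pptx")
slides = []
for i, slide in enumerate(prs.slides, start=1):
    text = [sh.text_frame.text for sh in slide.shapes if sh.has_text_frame]
    notes = slide.notes_slide.notes_text_frame.text if slide.has_notes_slide else ""
    slides.append({"slide": i, "text": "\n".join(text), "notes": notes})
# A hit on "Q4 roadmap" now carries the slide number to jump to.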

PDF (Scanned Documents)

Processing:

  1. Detect if PDF is searchable (text layer) or scanned (images)
  2. If scanned: OCR every page
  3. Extract tables, charts (with labels)
  4. Index page numbers

Search:

  • Query: "revenue projections"
    • Returns: Financial PDF, page 14
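
The searchable-vs-scanned check in step 1 is straightforward with pypdf: if a page's text layer comes back empty, flag it for OCR. A sketch (the filename is hypothetical; rendering pages to images for OCR, e.g. with pdf2image plus pytesseract, is elided):

# pip install pypdf
from pypdf import PdfReader

reader = PdfReader("financials.pdf")   # hypothetical file
pages = []
for i, page in enumerate(reader.pages, start=1):
    text = (page.extract_text() or "").strip()
    # An empty text layer means the page is scanned and needs OCR.
    pages.append({"page": i, "text": text, "needs_ocr": not text})
# Every indexed chunk keeps its page number, so a hit can say "page 14".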

Excel/Google Sheets

Processing:

  1. Extract text from cells
  2. Preserve table structure (row/column context)
  3. Index sheet names

Search:

  • Query: "Acme Corp pricing"
    • Returns: Pricing spreadsheet, Sheet: "Enterprise Customers", Row 23
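
Preserving row/column context is the key difference from flattening a sheet to plain text. A sketch with openpyxl (the workbook name is hypothetical):

# pip install openpyxl
from openpyxl import load_workbook

wb = load_workbook("pricing.xlsx", read_only=True)
cells = []
for ws in wb.worksheets:
    for r, row in enumerate(ws.iter_rows(values_only=True), start=1):
        for c, value in enumerate(row, start=1):
            if value is not None:
                # Sheet/row/column survive, so a hit can point at
                # Sheet: "Enterprise Customers", Row 23.
                cells.append({"sheet": ws.title, "row": r, "col": c, "text": str(value)})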

Searching Multimodal Content

Unified Search (All Formats)

Query: "Redis performance"

Returns (mixed formats):

  1. Meeting recording (audio): Architecture review discussing Redis
  2. Slack screenshot (image): Redis performance graph (OCR'd labels)
  3. GitHub code: redis-client.ts implementation
  4. Notion doc (text): "Redis Configuration Guide"
  5. Loom video: Demo of Redis integration

All ranked by relevance, all formats unified.

Filtering by Content Type

Search only videos:

type:video Redis performance

Search only code:

type:code createUser function

Search only images:

type:image error screenshot

Search only audio/meetings:

type:audio roadmap discussion

Time-Based Filtering

Recent recordings:

after:7d meeting recording

Old presentations (may be outdated):

before:2024-01-01 roadmap presentation
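
These prefixes are ordinary tokens in the query string, so a filter parser just peels them off before the text search runs. A minimal sketch of that parsing, assuming the 7d-style relative dates shown above:

import re
from datetime import datetime, timedelta, timezone

def parse_query(q: str):
    filters, terms = {}, []
    for tok in q.split():
        m = re.fullmatch(r"(type|after|before):(\S+)", tok)
        if not m:
            terms.append(tok)
            continue
        key, value = m.groups()
        if key in ("after", "before") and value.endswith("d"):
            # Relative dates like 7d become an absolute cutoff.
            value = datetime.now(timezone.utc) - timedelta(days=int(value[:-1]))
        filters[key] = value
    return filters, " ".join(terms)

parse_query("type:video Redis performance")
# -> ({"type": "video"}, "Redis performance")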

Best Practices

1. Name Files Descriptively

Bad:

  • recording.mp4
  • screenshot.png
  • slides.pptx

Good:

  • 2025-11-13-roadmap-review-meeting.mp4
  • redis-error-screenshot-2025-11-13.png
  • Q4-product-roadmap-slides.pptx

Why: Filenames are indexed and help with search.


2. Use Timestamps in Queries

For long recordings:

  • Zine returns timestamp where match occurs
  • Click timestamp → jump to exact moment in video/audio

Example: 1-hour meeting, query "mobile app"

  • Returns: Timestamp 00:34:12 where mobile was discussed
  • Click → plays from 00:34:12
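
Given timestamped segments like the ones in the audio sketch earlier, turning a match into a clickable moment is a lookup plus formatting:

def find_moments(segments, query):
    """Return (hh:mm:ss, text) for each transcript segment matching the query."""
    hits = []
    for seg in segments:
        if query.lower() in seg["text"].lower():
            s = int(seg["start"])
            hits.append((f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}", seg["text"]))
    return hits

# find_moments(index, "mobile app") might return [("00:34:12", "... mobile app ...")]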

3. Combine Multimodal with Text Search

Best queries mix formats:

Query: "authentication system"

Returns:

  • Code (auth-service/)
  • Meeting recordings (arch reviews)
  • Slack discussions
  • Notion specs
  • Screenshots (auth flow diagrams)

Result: Complete picture, all formats.


4. Upload Meeting Recordings Immediately

Don't wait:

  • After meetings, upload recording same day
  • Zine processes in background (10-30 min)
  • Searchable by next meeting

Benefit: Context preserved while fresh.


5. OCR Whiteboards and Sketches

After whiteboard sessions:

  • Take photo
  • Upload to Zine (or share in Slack)
  • Zine OCRs handwritten notes (if legible)

Searchable: Brainstorming sessions, architecture sketches.


6. Connect YouTube for Tutorials

If your team shares YouTube tutorials:

  • Connect YouTube channel/playlist via feed
  • Zine auto-transcribes new videos
  • Search video content like docs

7. Use Dev Mode for Code + Context

When searching code:

  • Use Dev Mode (split view)
  • Left: Code files
  • Right: Related issues, PRs, Slack discussions

Example: Click auth.ts → See related Slack thread about auth decisions.


Next Steps

Now that you understand multimodal processing:

  1. Upload Meeting Recordings: Past architecture reviews, product demos
  2. Connect Slack: Screenshots shared in channels get OCR'd automatically
  3. Connect GitHub: Code becomes searchable alongside discussions
  4. Test Unified Search: Query something discussed in meeting + Slack + code
  5. Use Timeline View: See chronological narrative across all formats


Text is just one format. Your team's knowledge lives in audio, video, images, and code. Search it all.

