
Technical Infrastructure

Vector Database

A system that stores embeddings for similarity search—useful for retrieval, but insufficient for time-aware structured memory with entities and relationships.

A vector database is a specialized storage system optimized for storing, indexing, and querying high-dimensional vector embeddings. It enables fast semantic similarity search: given a query embedding, find the most similar vectors (and their associated content) based on distance metrics like cosine similarity or Euclidean distance.
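Similarity search reduces to a distance computation over vectors. As a minimal sketch (with made-up three-dimensional embeddings standing in for real model output), cosine similarity can be computed and used to rank stored vectors against a query:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = [0.1, 0.9, 0.0]
docs = {
    "doc_a": [0.1, 0.8, 0.1],  # close to the query in vector space
    "doc_b": [0.9, 0.0, 0.1],  # far from the query
}
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked[0])  # doc_a ranks first
```

Real embeddings have hundreds or thousands of dimensions, and production databases avoid this brute-force scan with approximate indexes, but the ranking principle is the same.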

Vector databases power retrieval-augmented generation (RAG), semantic search, and recommendation systems. However, they don't provide structured memory—they lack entity linking, relationship graphs, temporal awareness, and reasoning capabilities. For agent memory, vector databases are one component (retrieval), not a complete solution.

The key takeaway is knowing when vector search is sufficient (semantic similarity tasks) and when you need additional layers (knowledge graphs, timelines, entity resolution) for robust agent memory.

Why it matters

  • Enables semantic similarity search: Find documents, messages, or data "similar in meaning" to a query—without exact keyword matches.
  • Powers RAG systems: Retrieve relevant context based on query embeddings, then pass to LLMs for grounded generation.
  • Scales to large datasets: Vector databases index millions of embeddings for fast approximate nearest neighbor (ANN) search.
  • Supports multimodal retrieval: Store embeddings for text, images, audio—enable cross-modal search (e.g., "find images similar to this text").
  • Clarifies limitations: Vector similarity is not structured memory—no entity linking, no relationships, no temporal awareness.
  • Informs hybrid architectures: Combine vector databases (semantic search) with knowledge graphs (structured reasoning) for robust agent memory.

How it works

Vector databases operate through embedding storage and similarity search:

  • Ingestion → Content (documents, messages, images) is converted into vector embeddings using models like OpenAI text-embedding-3, Cohere, or custom encoders.
  • Indexing → Embeddings are indexed using algorithms like HNSW (Hierarchical Navigable Small World) or IVF (Inverted File Index) for fast ANN search.
  • Query Embedding → When a user queries "show me documents about agent memory," the query is converted into a vector.
  • Similarity Search → The database finds the k-nearest neighbor embeddings to the query vector (e.g., top 10 most similar).
  • Result Retrieval → Associated content (document text, metadata) for the top matches is returned to the application.

This pipeline enables semantic retrieval but doesn't track entities, relationships, or time-based context.
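The pipeline above can be sketched end to end. This is a toy in-memory store, not a real vector database: the `embed` function is a stand-in bag-of-words counter over a tiny fixed vocabulary (a real system would call a learned model like OpenAI text-embedding-3), and search is a brute-force scan rather than an ANN index like HNSW:

```python
import math
from collections import Counter

VOCAB = ["agent", "memory", "vector", "refund", "policy", "login"]

def embed(text):
    # Toy embedding: term counts over a fixed vocabulary.
    # Real systems use learned embedding models instead.
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

class VectorStore:
    def __init__(self):
        self.items = []  # (doc_id, embedding, content) triples

    def ingest(self, doc_id, text):
        # Ingestion: convert content to a vector and store both together.
        self.items.append((doc_id, embed(text), text))

    def search(self, query, k=2):
        # Query embedding + brute-force k-NN by cosine similarity.
        qv = embed(query)
        scored = sorted(self.items, key=lambda it: cosine(qv, it[1]), reverse=True)
        return [(doc_id, text) for doc_id, _, text in scored[:k]]

store = VectorStore()
store.ingest("d1", "agent memory and vector search")
store.ingest("d2", "refund policy for customers")
store.ingest("d3", "login and password help")

print(store.search("documents about agent memory", k=1))
```

Note what the store does not track: there is no notion of which entity a document concerns, how documents relate, or when facts were true, which is exactly the gap the surrounding sections describe.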

Comparison & confusion to avoid

| Term | What it is | What it isn't | When to use |
| --- | --- | --- | --- |
| Vector Database | Storage for embeddings with fast similarity search | Structured memory with entities, relationships, and timelines | Semantic search and RAG—not agent memory or reasoning |
| Knowledge Graph | Structured graph of entities and relationships | Similarity-based retrieval—graphs enable traversal and reasoning | Relationship queries and multi-hop reasoning |
| Agent Memory | Durable, structured, time-aware context for agents | A retrieval system—memory includes entities, timelines, and state | Multi-session continuity, learning, ownership tracking |
| Relational Database | Tables with foreign keys and joins | Optimized for vector similarity—RDBs don't do semantic search | Transactional workloads with fixed schemas |

Examples & uses

RAG for document Q&A
1,000 company documents are chunked and embedded. User asks: "What's our refund policy?" The query is embedded, and the vector database retrieves the 5 most similar chunks. An LLM generates an answer based on retrieved context.
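The generation half of this flow is mostly prompt assembly. A hedged sketch, with hypothetical chunk text and the LLM call itself left out, showing how retrieved chunks get grounded into a prompt:

```python
def build_rag_prompt(question, retrieved_chunks):
    # Number each retrieved chunk so the LLM (and the answer) can cite it.
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(retrieved_chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Hypothetical chunks returned by the vector database for the query.
chunks = [
    "Refunds are issued within 30 days of purchase.",
    "Refund requests require the original receipt.",
]
prompt = build_rag_prompt("What's our refund policy?", chunks)
print(prompt)
```

In a real pipeline the `chunks` list comes from the top-k similarity search, and `prompt` is sent to an LLM for the final grounded answer.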

Semantic search for support tickets
Thousands of support tickets are embedded. An agent searches for "authentication errors" (no exact keyword needed). The vector database returns semantically similar tickets like "login failure," "SSO not working," "password reset issues."

Recommendation system
User interactions are embedded (e.g., articles read, products viewed). When a user visits, the system queries for similar embeddings to recommend related content or products.

Limitation: No entity or temporal reasoning
User asks: "What did Alice work on last week?" Vector search finds documents mentioning Alice, but can't track ownership changes, state transitions, or time-based context—knowledge graphs and temporal memory are required.

Best practices

  • Chunk content appropriately: Embedding a 100-page document as one vector loses granularity—chunk into sections or paragraphs for precise retrieval.
  • Use hybrid search: Combine vector similarity with keyword filters or metadata (e.g., "semantically similar AND from last month").
  • Re-embed when content changes: If a document is updated, re-generate its embedding—stale embeddings degrade retrieval quality.
  • Monitor embedding drift: Model updates (e.g., new OpenAI embedding version) change vector spaces—re-embed corpus or handle version mismatches.
  • Combine with knowledge graphs: Vector search finds relevant content; knowledge graphs provide structured reasoning—use both for agent memory.
  • Implement metadata filtering: Store timestamps, authors, tags alongside embeddings—enables filtered retrieval (e.g., "similar to this AND authored by Alice").
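The hybrid-search and metadata-filtering practices above can be sketched together: pre-filter candidates on stored metadata, then rank the survivors by vector similarity. The records, two-dimensional embeddings, and field names here are all made up for illustration:

```python
import math
from datetime import date

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

# Hypothetical (embedding, metadata) records stored side by side.
records = [
    ([0.9, 0.1], {"id": "r1", "author": "Alice", "date": date(2024, 10, 5)}),
    ([0.8, 0.2], {"id": "r2", "author": "Bob",   "date": date(2024, 10, 6)}),
    ([0.1, 0.9], {"id": "r3", "author": "Alice", "date": date(2024, 9, 1)}),
]

def filtered_search(query_vec, predicate, k=5):
    # Metadata filter first, similarity ranking second.
    candidates = [(emb, meta) for emb, meta in records if predicate(meta)]
    candidates.sort(key=lambda r: cosine(query_vec, r[0]), reverse=True)
    return [meta["id"] for _, meta in candidates[:k]]

# "Semantically similar AND authored by Alice"
hits = filtered_search([1.0, 0.0], lambda m: m["author"] == "Alice")
print(hits)  # r2 is excluded by the filter; r1 outranks r3 on similarity
```

Production vector databases expose this as a filter clause on the search call rather than a Python predicate, but the pre-filter-then-rank shape is the same.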

Common pitfalls

  • Assuming vector DB equals agent memory: Vector databases provide retrieval, not structured memory—agents need entity linking, timelines, and reasoning layers.
  • No entity resolution: If "Alice Johnson" and "Alice J." are embedded separately, they're treated as different—knowledge graphs handle canonicalization.
  • Ignoring temporal context: Vector search doesn't understand "what changed last week" or "who owned this in October"—temporal memory is required.
  • Over-relying on embeddings: Semantic similarity is imperfect—hybrid approaches (vector + keyword + graph) improve accuracy.
  • Not handling embedding updates: When models improve or content changes, embeddings need refresh—plan for versioning and re-indexing.

See also


See how Graphlit combines vector search with structured memory → Agent Memory Platform

Ready to build with Graphlit?

Start building agent memory and knowledge graph applications with the Graphlit Platform.

Vector Database | Graphlit Agent Memory Glossary