What is a Vector Database?
A vector database is a specialized database optimized for storing and querying high-dimensional embedding vectors. It supports fast nearest-neighbor search across millions or billions of vectors using approximate nearest-neighbor (ANN) algorithms like HNSW, IVF, or DiskANN, typically with sub-100ms query latency at scale.
Overview
Vector databases are the storage and retrieval layer that makes RAG and semantic search practical at production scale. Without one, finding the embeddings most similar to a query in a 10-million-vector corpus means comparing the query against every stored vector: 10 million dot products per query, which at typical embedding dimensions (hundreds to thousands of components) adds up to billions of multiply-add operations. Vector databases avoid this with approximate nearest-neighbor algorithms that build indexes (HNSW graphs, IVF clusters, DiskANN on-disk graphs), enabling sub-100ms search across massive corpora. They also handle the operational concerns naive solutions miss: hybrid search combining vectors with metadata filters, multi-tenancy, replication, backup, and access control. Choosing a vector database is a meaningful architectural decision, because switching later means re-indexing your entire corpus.
How vector databases work
Two operations matter: insert (store an embedding vector with optional metadata) and query (find the K nearest vectors to a query vector by cosine similarity, dot product, or Euclidean distance). The challenge is running queries at scale. Naive brute force compares the query with all N stored vectors, so its cost grows linearly with corpus size (O(N·d) for d-dimensional vectors): too slow for interactive queries over millions of vectors. Approximate nearest neighbor (ANN) algorithms trade a small accuracy loss (typically 95-99% recall vs exact search) for orders-of-magnitude speedup. HNSW (Hierarchical Navigable Small World) builds a multi-layer graph for fast traversal; it is among the fastest in-memory ANN indexes and is used by most modern databases. IVF (Inverted File Index) partitions vectors into clusters and searches only the clusters nearest the query: strong for very large corpora. DiskANN keeps a graph index on SSD, extending ANN to billion-scale corpora that don't fit in RAM.
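To make the trade-off concrete, here is a minimal pure-Python sketch that compares exact brute-force search with a toy IVF-style index. It is an illustration under assumptions, not how any particular database implements IVF: the function names, the cluster count, the `n_probe` parameter, and the random data are all hypothetical, and real systems use optimized native code rather than Python loops.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def brute_force_search(query, vectors, k):
    """Exact search: score every stored vector, then keep the best k."""
    ranked = sorted(range(len(vectors)), key=lambda i: dot(query, vectors[i]), reverse=True)
    return ranked[:k]


def assign(vectors, centroids):
    """Put each vector id into the posting list of its most similar centroid."""
    lists = [[] for _ in centroids]
    for idx, v in enumerate(vectors):
        best = max(range(len(centroids)), key=lambda c: dot(v, centroids[c]))
        lists[best].append(idx)
    return lists


def build_ivf(vectors, n_clusters, n_iters=5, seed=0):
    """Toy IVF index: a few k-means-style rounds, then one posting list per centroid."""
    rng = random.Random(seed)
    centroids = [vectors[i] for i in rng.sample(range(len(vectors)), n_clusters)]
    dim = len(vectors[0])
    for _ in range(n_iters):
        for c, members in enumerate(assign(vectors, centroids)):
            if members:
                mean = [sum(vectors[i][d] for i in members) / len(members) for d in range(dim)]
                centroids[c] = normalize(mean)
    return centroids, assign(vectors, centroids)


def ivf_search(query, vectors, centroids, lists, k, n_probe):
    """Approximate search: score only vectors in the n_probe closest clusters."""
    probed = sorted(range(len(centroids)), key=lambda c: dot(query, centroids[c]), reverse=True)[:n_probe]
    candidates = [i for c in probed for i in lists[c]]
    candidates.sort(key=lambda i: dot(query, vectors[i]), reverse=True)
    return candidates[:k]


rng = random.Random(42)
vectors = [normalize([rng.gauss(0, 1) for _ in range(8)]) for _ in range(1000)]
query = normalize([rng.gauss(0, 1) for _ in range(8)])

exact = brute_force_search(query, vectors, k=10)
centroids, lists = build_ivf(vectors, n_clusters=10)
approx = ivf_search(query, vectors, centroids, lists, k=10, n_probe=3)

recall = len(set(exact) & set(approx)) / len(exact)
scanned = sum(len(lists[c]) for c in range(len(lists)))
print(f"recall@10 when probing 3 of 10 clusters: {recall:.2f} (index holds {scanned} vectors)")
```

Raising `n_probe` scans more clusters and pushes recall toward 1.0 at the cost of more comparisons; probing every cluster reduces to exact search.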
Hybrid search and metadata filtering
Pure vector search isn't enough for most production RAG. Real systems need to combine vector similarity with keyword search (BM25 for exact term matching), metadata filters (only return documents owned by user X, only documents from 2024 onward), and access control (role-based permissions enforced at query time). Modern vector databases (Qdrant, Weaviate, Pinecone) support these combinations natively, though the mechanisms differ by product. The performance trade-off matters: filtering AFTER vector search only sees a fixed top-K, so it can return fewer than K results and miss matching documents that ranked outside it; filtering BEFORE or during the search requires the database to maintain inverted indexes on metadata alongside vector indexes. Applying the filter as part of the search, rather than afterward, is the approach to prefer in production.
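The difference between filtering after and before the search is easy to see on a tiny corpus. The sketch below is illustrative only: the documents, the `owner` field, and both function names are made up, and the pre-filter here is a plain linear scan where a real database would consult a metadata index.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k(query, candidates, k):
    return sorted(candidates, key=lambda d: dot(query, d["vec"]), reverse=True)[:k]


def post_filter_search(query, docs, k, predicate):
    """Search first, then drop non-matching hits: may return fewer than k."""
    return [d for d in top_k(query, docs, k) if predicate(d)]


def pre_filter_search(query, docs, k, predicate):
    """Restrict the candidate set first, then search inside it."""
    return top_k(query, [d for d in docs if predicate(d)], k)


docs = [
    {"id": "b1", "owner": "bob", "vec": [1.0, 0.0]},
    {"id": "b2", "owner": "bob", "vec": [0.95, 0.05]},
    {"id": "b3", "owner": "bob", "vec": [0.9, 0.1]},
    {"id": "a1", "owner": "alice", "vec": [0.6, 0.4]},
    {"id": "a2", "owner": "alice", "vec": [0.5, 0.5]},
    {"id": "a3", "owner": "alice", "vec": [0.1, 0.9]},
]
query = [1.0, 0.0]


def owned_by_alice(doc):
    return doc["owner"] == "alice"


post = post_filter_search(query, docs, k=2, predicate=owned_by_alice)
pre = pre_filter_search(query, docs, k=2, predicate=owned_by_alice)

print([d["id"] for d in post])  # [] : both top-2 hits belong to bob
print([d["id"] for d in pre])   # ['a1', 'a2']
```

Post-filtering returns nothing for alice even though she owns relevant documents, because bob's documents fill the unfiltered top-K.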
Choosing a vector database
Three axes of choice: managed vs self-hosted, scale, and feature set. Managed services (Pinecone, Weaviate Cloud, Qdrant Cloud) let you skip operational overhead at the cost of higher pricing and vendor lock-in. Self-hosted options (Qdrant, Weaviate, Milvus, pgvector) give sovereignty and control at the cost of operational complexity. By product:
- pgvector + Postgres: handles up to roughly 10M vectors; a good fit when you already run Postgres.
- Pinecone: managed and easy to adopt; the default for fast time-to-production.
- Qdrant: best self-hosted option, with strong hybrid search.
- Weaviate: graph + vector hybrid for complex relationships.
- Milvus: hyperscale (billions of vectors), with the operational overhead to match.
Use cases
- RAG retrieval over enterprise document corpora
- Semantic product search for e-commerce
- Recommendation systems (similar items, similar users)
- Image and multimedia search via multimodal embeddings
- Anomaly and fraud detection using historical pattern embeddings
- Code search and code-intelligence systems
- Conversational memory storage for long-running agent systems
Examples in production
Pinecone
Pinecone is the most-deployed managed vector database: used by thousands of companies including Notion, Klarna, and Microsoft. Strong performance at scale, mature hybrid search, simple API.
Qdrant
Qdrant is the leading self-hosted vector database in 2026: Rust-based for performance, strong hybrid search, excellent operational story for sovereign deployment.
pgvector + Postgres
pgvector turns Postgres into a competent vector database via a single extension. Used by Supabase, Neon, and any team that wants vector search without adopting a new database.
Weaviate
Weaviate combines vector search with graph relationships and structured metadata. Strong choice when your retrieval needs to traverse explicit relationships (knowledge graphs over documents).
Vector Database compared to alternatives
| Alternative | Choose Vector Database when | Choose alternative when |
|---|---|---|
| Elasticsearch / OpenSearch (full-text search engines that have added vector search support) | Dedicated vector databases for vector-first workloads where ANN performance and hybrid search architecture matter. | Elasticsearch when you already run it for full-text search and your vector workload is a small extension. Native vector databases will outperform on pure vector search. |
| pgvector + Postgres (Postgres extension that adds vector indexing and similarity search) | Dedicated vector databases when you need >10M vectors, advanced features (multi-tenancy, replication strategies for vectors), or maximum performance. | pgvector when you already run Postgres, your scale is modest (<10M vectors), and operational simplicity beats peak performance. Excellent default for prototypes and many production systems. |
| Brute-force in-memory search with FAISS or NumPy (direct vector comparison without an indexing database) | Vector database when you need persistence, multi-user access, hybrid search, metadata filtering, or any operational feature beyond raw search. | FAISS / NumPy for prototypes, single-user research, or batch processing where persistence and concurrent access don't matter. |
Common pitfalls
- Wrong index choice: HNSW is fast but RAM-hungry; IVF scales further but with higher latency. Picking based on hype instead of workload characteristics is a common mistake.
- Insufficient metadata indexing: filtering by user/date/permission requires inverted indexes alongside vector indexes. Skipping this leads to search-then-filter (post-filtering) patterns that miss results.
- Ignoring sharding strategy: as vector count grows past 10M-50M, sharding becomes necessary. Plan for this in initial architecture or face painful rebuilds.
- Dimensional explosion: 3,072-dim vectors at 100M scale require >1TB RAM for in-memory indexes. Use lower-dim embeddings or DiskANN-style on-disk indexing.
- Forgetting access control: most tutorials show vector search without permissions. Production requires filter-first retrieval enforced by the database: bolting permissions on top of unscoped retrieval is broken.
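The memory figure in the dimensional-explosion pitfall can be checked with simple arithmetic. This sketch assumes 4-byte float32 components (an assumption; the figure above does not state a precision) and counts raw vector storage only, so graph links and other index overhead would come on top. The helper name is hypothetical.

```python
def raw_vector_bytes(n_vectors, dims, bytes_per_component=4):
    """Raw storage for the vectors alone, assuming float32 by default."""
    return n_vectors * dims * bytes_per_component


high_dim = raw_vector_bytes(100_000_000, 3072)
low_dim = raw_vector_bytes(100_000_000, 768)

print(f"3,072 dims: {high_dim / 1e12:.2f} TB")  # 1.23 TB
print(f"  768 dims: {low_dim / 1e12:.2f} TB")   # 0.31 TB
```

Dropping from 3,072 to 768 dimensions cuts raw storage by a factor of four, which is why lower-dimensional embeddings are one of the two remedies listed.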
Questions about Vector Database.
pgvector scales to roughly 10M vectors with HNSW indexing on a well-tuned Postgres instance. Past that, query latency degrades and you should consider dedicated vector databases. For most teams under 10M vectors, pgvector is the right answer because operational simplicity beats marginal performance gains.
Handle access control with filter-first retrieval: encode user permissions into vector metadata, then filter the index by those permissions BEFORE running the similarity search. Pinecone, Qdrant, and Weaviate all support this natively. NEVER bolt access control on after retrieval: any gap in the downstream check lets sensitive context reach the model.
With HNSW indexing on a properly-resourced database, expect 5-30ms query latency for corpora under 10M vectors. At 100M+, latency creeps to 50-100ms. Add 100-300ms of network latency for managed services. Total RAG query budget (embedding + vector search + reranking + LLM): typically 500-2,000ms.
Modern vector databases (Qdrant, Weaviate, Pinecone) support hybrid search natively: combining ANN similarity with BM25 keyword scoring and metadata filters. The combination consistently outperforms pure vector search for retrieval quality. Always evaluate hybrid vs vector-only on your golden dataset.
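One common way to combine a vector ranking with a keyword ranking is reciprocal rank fusion (RRF), sketched below. This is an assumption chosen for illustration, not a description of how any of the named databases fuse scores internally; the document ids and the constant `k=60` (a conventional default) are also illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked id lists: each list contributes 1 / (k + rank) per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


vector_ranking = ["d1", "d2", "d3", "d4"]  # e.g. from ANN similarity
keyword_ranking = ["d3", "d1", "d5"]       # e.g. from BM25

fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
print(fused)  # ['d1', 'd3', 'd2', 'd5', 'd4']
```

Documents that rank well in both lists (d1, d3) rise to the top, while documents found by only one retriever still appear further down. RRF uses ranks rather than raw scores, so the two retrievers need no score normalization.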