AI engineering glossary

What is a Vector Database?

A vector database is a specialized database optimized for storing and querying high-dimensional embedding vectors. It supports fast nearest-neighbor search across millions or billions of vectors using approximate nearest-neighbor (ANN) algorithms like HNSW, IVF, or DiskANN, typically with sub-100ms query latency at scale.

Last updated 2026-04-27 · BearPlex AI Engineering Team

Overview

Vector databases are the storage and retrieval layer that makes RAG and semantic search practical at production scale. Without a vector database, finding the most similar embedding to a query in a 10-million-vector corpus would require comparing the query against every stored vector: 10 million dot products per query, which amounts to billions of multiply-add operations once each vector has hundreds or thousands of dimensions. Vector databases solve this with approximate nearest-neighbor algorithms that build indexes (HNSW graphs, IVF clusters, DiskANN's disk-resident graphs) enabling sub-100ms search across massive corpora. They also handle the operational concerns naive solutions miss: hybrid search combining vectors with metadata filters, multi-tenancy, replication, backup, and access control. Choosing the right vector database is a meaningful architectural decision: switching later means re-indexing your entire corpus.

How vector databases work

Two key operations: insert (store an embedding vector with optional metadata) and query (find the K nearest vectors to a query vector by cosine similarity, dot product, or Euclidean distance). The challenge is running queries at scale. Naive brute force compares the query against all N stored vectors, so it is O(N) per query: too slow for millions of vectors under interactive latency budgets. Approximate nearest neighbor (ANN) algorithms trade a small accuracy loss (typically 95-99% recall vs exact search) for orders-of-magnitude speedup. HNSW (Hierarchical Navigable Small World) builds a multi-layer graph for fast traversal: it is typically among the fastest in-memory ANN methods and is used by most modern databases. IVF (Inverted File Index) partitions vectors into clusters and queries only the clusters nearest the query: strong for very large corpora. DiskANN extends ANN to billion-scale corpora that don't fit in RAM by keeping most of the index on SSD.
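The trade-off between exact and approximate search can be sketched in a few lines. This is a toy illustration, not how any particular database implements its index: it uses only the Python standard library, random data, and a simplified IVF-style index whose centroids are sampled at random (real systems typically train them with k-means). The names exact_top_k, build_ivf, ivf_top_k, and nprobe are chosen for this example.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_top_k(query, vectors, k):
    """Brute force: score every stored vector, O(N) per query."""
    scored = sorted(((cosine(query, v), i) for i, v in enumerate(vectors)), reverse=True)
    return [i for _, i in scored[:k]]


def build_ivf(vectors, n_clusters, rng):
    """Toy IVF index: sample random vectors as centroids (an assumption made
    for brevity) and assign each vector to its most similar centroid."""
    centroids = rng.sample(vectors, n_clusters)
    buckets = [[] for _ in range(n_clusters)]
    for i, v in enumerate(vectors):
        best = max(range(n_clusters), key=lambda c: cosine(v, centroids[c]))
        buckets[best].append(i)
    return centroids, buckets


def ivf_top_k(query, vectors, index, k, nprobe):
    """Approximate search: scan only the nprobe clusters nearest the query."""
    centroids, buckets = index
    ranked = sorted(range(len(centroids)),
                    key=lambda c: cosine(query, centroids[c]), reverse=True)
    candidates = [i for c in ranked[:nprobe] for i in buckets[c]]
    scored = sorted(((cosine(query, vectors[i]), i) for i in candidates), reverse=True)
    return [i for _, i in scored[:k]], len(candidates)


rng = random.Random(0)  # fixed seed so runs are repeatable
vectors = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(2000)]
query = [rng.gauss(0, 1) for _ in range(16)]
index = build_ivf(vectors, n_clusters=20, rng=rng)

exact = exact_top_k(query, vectors, k=10)
approx, scanned = ivf_top_k(query, vectors, index, k=10, nprobe=5)
recall = len(set(exact) & set(approx)) / len(exact)
print(f"scanned {scanned} of {len(vectors)} vectors, recall@10 = {recall:.2f}")
```

Raising nprobe scans more clusters and pushes recall toward 1.0 at the cost of more comparisons; probing every cluster reproduces the exact result. Uniformly random data like this is a hard case for clustering, so recall here is likely lower than on real embeddings.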

Hybrid search and metadata filtering

Pure vector search isn't enough for most production RAG. Real systems need to combine vector similarity with: keyword search (BM25 for exact term matching), metadata filters (only return documents owned by user X, only documents from 2024+), and access control (role-based permissions enforced at query time). Modern vector databases (Qdrant, Weaviate, Pinecone) support all three natively. The performance trade-off matters: filtering AFTER vector search can return fewer than K results, or none at all, because matching documents outside the top-K are never considered; filtering BEFORE requires the database to maintain inverted indexes alongside vector indexes. Filter-then-search, the architecture Pinecone uses, is the production standard.
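The difference between filtering after and before the similarity search shows up even on five documents. The sketch below is a minimal in-memory illustration in plain Python, not any vendor's API: the document ids, owners, vectors, and function names are all made up for the example, and a real database would use an index rather than a list scan.

```python
def dot(a, b):
    """Dot-product similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query, docs, k):
    """Rank docs by similarity to the query and keep the best k."""
    return sorted(docs, key=lambda d: dot(query, d["vector"]), reverse=True)[:k]


def post_filter_search(query, docs, k, owner):
    """Search first, filter afterwards: matches outside the top-k are lost."""
    return [d for d in top_k(query, docs, k) if d["owner"] == owner]


def pre_filter_search(query, docs, k, owner):
    """Filter first, then search only the permitted documents."""
    allowed = [d for d in docs if d["owner"] == owner]
    return top_k(query, allowed, k)


# Hypothetical corpus: alice's documents happen to be closest to the query.
docs = [
    {"id": "a1", "owner": "alice", "vector": [0.9, 0.1]},
    {"id": "a2", "owner": "alice", "vector": [0.8, 0.2]},
    {"id": "a3", "owner": "alice", "vector": [0.7, 0.3]},
    {"id": "b1", "owner": "bob", "vector": [0.6, 0.4]},
    {"id": "b2", "owner": "bob", "vector": [0.1, 0.9]},
]
query = [1.0, 0.0]

print([d["id"] for d in post_filter_search(query, docs, k=2, owner="bob")])  # []
print([d["id"] for d in pre_filter_search(query, docs, k=2, owner="bob")])   # ['b1', 'b2']
```

With post-filtering, bob gets nothing back because both top-2 hits belong to alice. With pre-filtering, the search only ever sees bob's documents, which is also why this pattern suits access control.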

Choosing a vector database

Three axes of choice: managed vs self-hosted, scale, and feature set. Managed (Pinecone, Weaviate Cloud, Qdrant Cloud) lets you skip operational overhead at the cost of higher pricing and vendor lock-in. Self-hosted (Qdrant, Weaviate, Milvus, pgvector) gives sovereignty and control at the cost of operational complexity. By product:

  • pgvector + Postgres: handles up to ~10M vectors; a natural fit when you already run Postgres.
  • Pinecone: managed, easy, the default for fast time-to-production.
  • Qdrant: best self-hosted option with strong hybrid search.
  • Weaviate: graph + vector hybrid for complex relationships.
  • Milvus: hyperscale (billions of vectors) with the operational overhead to match.

Use cases

  • RAG retrieval over enterprise document corpora
  • Semantic product search for e-commerce
  • Recommendation systems (similar items, similar users)
  • Image and multimedia search via multimodal embeddings
  • Anomaly and fraud detection using historical pattern embeddings
  • Code search and code-intelligence systems
  • Conversational memory storage for long-running agent systems

Examples in production

Pinecone

Pinecone is the most-deployed managed vector database: used by thousands of companies including Notion, Klarna, and Microsoft. Strong performance at scale, mature hybrid search, simple API.

Qdrant

Qdrant is the leading self-hosted vector database in 2026: Rust-based for performance, strong hybrid search, excellent operational story for sovereign deployment.

pgvector + Postgres

pgvector turns Postgres into a competent vector database via a single extension. Used by Supabase, Neon, and any team that wants vector search without adopting a new database.

Weaviate

Weaviate combines vector search with graph relationships and structured metadata. Strong choice when your retrieval needs to traverse explicit relationships (knowledge graphs over documents).

Vector Database compared to alternatives

Elasticsearch / OpenSearch
Full-text search engines that have added vector search support.
  • Choose a vector database when: the workload is vector-first and ANN performance and hybrid search architecture matter.
  • Choose the alternative when: you already run Elasticsearch for full-text search and your vector workload is a small extension. Native vector databases will outperform it on pure vector search.

pgvector + Postgres
Postgres extension that adds vector indexing and similarity search.
  • Choose a vector database when: you need >10M vectors, advanced features (multi-tenancy, replication strategies for vectors), or maximum performance.
  • Choose the alternative when: you already run Postgres, your scale is modest (<10M vectors), and operational simplicity beats peak performance. Excellent default for prototypes and many production systems.

In-process vector search (FAISS, NumPy)
Library-level vector comparison, brute-force or indexed, without a database around it.
  • Choose a vector database when: you need persistence, multi-user access, hybrid search, metadata filtering, or any operational feature beyond raw search.
  • Choose the alternative when: you are building prototypes, doing single-user research, or running batch processing where persistence and concurrent access don't matter.

Common pitfalls

  • Wrong index choice: HNSW is fast but RAM-hungry; IVF scales further but with higher latency. Picking based on hype instead of workload characteristics is a common mistake.
  • Insufficient metadata indexing: filtering by user/date/permission requires inverted indexes alongside vector indexes. Skipping this forces search-then-filter (post-filtering) patterns that miss matching results outside the top-K.
  • Ignoring sharding strategy: as vector count grows past 10M-50M, sharding becomes necessary. Plan for this in initial architecture or face painful rebuilds.
  • Dimensional explosion: 3,072-dim float32 vectors at 100M scale require >1TB RAM for the raw vectors alone, before index overhead. Use lower-dim embeddings or DiskANN-style on-disk indexing.
  • Forgetting access control: most tutorials show vector search without permissions. Production requires filter-first retrieval enforced by the database: bolting permissions on top of unscoped retrieval is broken.
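
The memory figure in the "dimensional explosion" pitfall is simple arithmetic, sketched below. It assumes 4 bytes per value (float32) and counts only the raw vectors; index structures such as HNSW graph links add more on top. The 768-dimension and float16 rows are hypothetical comparison points, not recommendations from the text above.

```python
def raw_vector_bytes(num_vectors, dims, bytes_per_value=4):
    """Bytes needed to hold the raw vectors alone (float32 by default).
    Index structures such as HNSW graph links add further overhead."""
    return num_vectors * dims * bytes_per_value


TB = 10**12  # decimal terabyte

full = raw_vector_bytes(100_000_000, 3072)       # float32, 3,072 dims
smaller = raw_vector_bytes(100_000_000, 768)     # float32, 768 dims (hypothetical)
half = raw_vector_bytes(100_000_000, 3072, 2)    # float16, 3,072 dims (hypothetical)

print(f"{full / TB:.2f} TB")     # 1.23 TB
print(f"{smaller / TB:.2f} TB")  # 0.31 TB
print(f"{half / TB:.2f} TB")     # 0.61 TB
```

Running the same calculation for your own vector count and embedding size before choosing an index type gives an early signal of whether an in-memory index is realistic.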

FAQ

Questions about vector databases.

There are three solid defaults when choosing a vector database: Pinecone for managed (fastest time-to-production, strong scale), Qdrant for self-hosted (best sovereignty, strong hybrid search), and pgvector for teams already on Postgres at modest scale. We benchmark for the specific workload before committing: query patterns, scale, and metadata complexity all influence the right choice.

pgvector scales to roughly 10M vectors with HNSW indexing on a well-tuned Postgres instance. Past that, query latency degrades and you should consider dedicated vector databases. For most teams under 10M vectors, pgvector is the right answer because operational simplicity beats marginal performance gains.

Handle access control with filter-first retrieval: encode user permissions into vector metadata, then filter the index by those permissions BEFORE running the similarity search. Pinecone, Qdrant, and Weaviate all support this natively. NEVER bolt access control on after retrieval: by the time a post-hoc check runs, sensitive context may already have passed through the model.

With HNSW indexing on a properly-resourced database, expect 5-30ms query latency for corpora under 10M vectors. At 100M+, latency creeps to 50-100ms. Add 100-300ms of network latency for managed services. Total RAG query budget (embedding + vector search + reranking + LLM): typically 500-2,000ms.

Modern vector databases (Qdrant, Weaviate, Pinecone) support hybrid search natively: combining ANN similarity with BM25 keyword scoring and metadata filters. The combination consistently outperforms pure vector search for retrieval quality. Always evaluate hybrid vs vector-only on your golden dataset.
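
One common way to combine a vector ranking with a keyword ranking is reciprocal rank fusion (RRF). The sketch below is a generic illustration, not the fusion method of any specific database named above, and those products may weight or normalize scores differently. The document ids are hypothetical, and k=60 is the constant conventionally used with RRF.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of document ids. Each list contributes
    1 / (k + rank) per document, with rank starting at 1."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["doc3", "doc1", "doc7"]   # hypothetical ANN results, best first
keyword_hits = ["doc1", "doc9", "doc3"]  # hypothetical BM25 results, best first

print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# ['doc1', 'doc3', 'doc9', 'doc7']
```

Documents that appear in both lists (doc1, doc3) rise above documents found by only one retriever, which is the behavior hybrid search relies on. RRF uses ranks rather than raw scores, so it needs no score normalization between the two retrievers.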
