How do I evaluate embedding model quality?

Build a golden dataset of (query, relevant_document, irrelevant_document) triples, typically 200-1,000 triples curated by domain experts. For each candidate embedding model, measure recall@k (the share of queries whose relevant document appears in the top k results) and MRR (mean reciprocal rank, the average of 1/rank of the first relevant result) on the golden set. Pick a single metric (usually recall@10) as the primary decision driver; the model with the highest score on that metric wins.
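
The two metrics can be sketched in a few lines of Python. This is a minimal illustration, assuming one relevant document per query; the function names and the toy `golden` and `results` dictionaries are hypothetical, and `results` stands in for whatever ranked IDs your retriever returns.

```python
from typing import Dict, List


def recall_at_k(ranked_ids: List[str], relevant_id: str, k: int) -> float:
    """1.0 if the relevant document appears in the top k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0


def reciprocal_rank(ranked_ids: List[str], relevant_id: str) -> float:
    """1 / rank (1-indexed) of the relevant document, or 0.0 if it is absent."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0


def evaluate(results: Dict[str, List[str]], golden: Dict[str, str], k: int = 10) -> Dict[str, float]:
    """Average recall@k and MRR over every query in the golden set."""
    n = len(golden)
    recall = sum(recall_at_k(results[q], rel, k) for q, rel in golden.items()) / n
    mrr = sum(reciprocal_rank(results[q], rel) for q, rel in golden.items()) / n
    return {f"recall@{k}": recall, "mrr": mrr}


# Toy data (hypothetical): query id -> relevant doc id, and query id -> ranked results.
golden = {"q1": "d1", "q2": "d5"}
results = {"q1": ["d3", "d1", "d2"], "q2": ["d5", "d4", "d9"]}
scores = evaluate(results, golden, k=2)
print(scores)
```

Run the same harness once per candidate model and compare the primary metric; the irrelevant document in each triple is not needed for these two metrics but is useful for spot-checking hard negatives.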

Should I use OpenAI embeddings or open-source embeddings?

OpenAI embeddings (text-embedding-3-small/large) are the default for most production work: strong quality, simple API, predictable pricing. Open-source models (BGE, E5, and others available through the sentence-transformers library) are competitive on quality and are required for sovereign deployments, where data cannot leave your own infrastructure. We benchmark both on every project; results depend heavily on the domain.

How much do embeddings cost in production?

OpenAI text-embedding-3-small: ~$0.00002 per 1K tokens. text-embedding-3-large: ~$0.00013 per 1K tokens. Cohere and Voyage are in a similar order of magnitude. For a corpus of 1M chunks averaging 500 tokens each (500M tokens in total), indexing costs roughly $10-65 with these managed APIs. Self-hosted open-source models cost only the GPU/CPU cycles.
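
The indexing estimate is simple arithmetic. A small sketch, using the approximate per-1K-token prices quoted above and assuming 1M chunks of 500 tokens each; the function name is hypothetical.

```python
def indexing_cost_usd(num_chunks: int, avg_tokens_per_chunk: int, price_per_1k_tokens: float) -> float:
    """One-off cost of embedding a corpus through a pay-per-token API."""
    total_tokens = num_chunks * avg_tokens_per_chunk
    return total_tokens / 1000 * price_per_1k_tokens


# Prices are the approximate figures from the text, not live quotes.
small = indexing_cost_usd(1_000_000, 500, 0.00002)  # about 10 USD
large = indexing_cost_usd(1_000_000, 500, 0.00013)  # about 65 USD
print(round(small, 2), round(large, 2))
```

Query-time embedding is billed the same way, so multiply expected monthly query tokens by the same price to estimate the recurring cost.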

Can I switch embedding models without re-indexing?

No: different embedding models produce vectors in different spaces, so they're not comparable. Switching requires re-embedding your entire corpus and rebuilding your vector index. Plan this carefully; pick a strong embedding model upfront and budget for migration if you switch later.

AI engineering glossary

What is an Embedding?

An embedding is a numerical vector representation of text, image, or other data (typically a dense array of 384 to 4,096 floating-point numbers) that encodes semantic meaning in a high-dimensional space, where similar concepts are mathematically close to each other and dissimilar concepts are far apart.

Last updated 2026-04-27 | BearPlex AI Engineering Team

Overview

Embeddings are the foundational data structure underlying nearly every modern AI retrieval system. The intuition: words and concepts that mean similar things should have similar mathematical representations. A high-quality embedding model maps 'doctor' and 'physician' to vectors that are close together in 1,024-dimensional space, while 'doctor' and 'banana' are far apart. This geometric structure makes it possible to search by meaning instead of keywords (semantic search), to find nearest neighbors in a knowledge base (RAG), to cluster related concepts (topic modeling), and to measure similarity between arbitrary pieces of content. The quality of your embedding model directly determines the quality of any system built on top of it.

How embeddings work

An embedding model takes input (text, image, audio, etc.) and produces a fixed-length vector, typically 768, 1,024, 1,536, or 3,072 dimensions for modern text embeddings. The model is trained on huge corpora of text using contrastive objectives: pairs of related items (queries and documents that answer them, paraphrases, translations) are pulled together in the embedding space, while unrelated items are pushed apart. After training, semantically similar inputs reliably produce similar vectors. To compare embeddings, you typically use cosine similarity (the cosine of the angle between two vectors), which ranges from -1 to 1: values close to 1 mean very similar, values close to 0 mean unrelated, and negative values mean the vectors point in opposing directions.
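
Cosine similarity itself is a short computation. The sketch below uses made-up 3-dimensional vectors purely for illustration; real embeddings have hundreds or thousands of dimensions and come from a model, not from hand-written lists.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors (hypothetical values, not real model output).
doctor = [0.9, 0.1, 0.0]
physician = [0.8, 0.2, 0.1]
banana = [0.0, 0.1, 0.9]

sim_close = cosine_similarity(doctor, physician)  # high: similar direction
sim_far = cosine_similarity(doctor, banana)       # low: nearly orthogonal
print(round(sim_close, 3), round(sim_far, 3))
```

If vectors are already normalized to unit length, as many embedding APIs return them, the dot product alone gives the same value.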

Choosing an embedding model

The trade-offs are quality, dimensionality, and cost. Higher-quality models (OpenAI text-embedding-3-large, Cohere embed-v4, Voyage-3-large) produce better retrieval results but cost more per call and produce larger vectors that need more vector database storage. Smaller models (OpenAI text-embedding-3-small, Cohere embed-v4-light, BGE small) are dramatically cheaper and store more compactly but with measurable quality drop on hard queries. For most production RAG, we benchmark 3-4 candidate models on a representative golden dataset before committing: embedding quality dominates downstream system quality and switching later is expensive.

Multimodal and specialized embeddings

Beyond text: image embeddings (CLIP, JinaCLIP) for visual search, audio embeddings (Whisper-derived) for podcast search, code embeddings (Voyage code, OpenAI code-search) for repository search, and multimodal embeddings (CLIP, ImageBind) that unify text and images in the same space. Specialized embeddings: Cohere's multilingual embed-v4 for cross-language retrieval, Voyage's domain-tuned models for legal/medical/finance, BGE's matryoshka-trained embeddings that let you truncate to lower dimensions at inference time without retraining.
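
Matryoshka truncation amounts to keeping the leading dimensions and re-normalizing. A minimal sketch, assuming the vector came from a model that was actually trained with matryoshka representation learning (truncating other models' vectors generally hurts quality badly); the function name is hypothetical.

```python
import math
from typing import List, Sequence


def truncate_embedding(vector: Sequence[float], dims: int) -> List[float]:
    """Keep the first `dims` values and rescale the result to unit length."""
    if not 0 < dims <= len(vector):
        raise ValueError("dims must be between 1 and the original dimensionality")
    head = list(vector[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to rescale
    return [x / norm for x in head]


# Toy 3-dimensional vector truncated to 2 dimensions.
short = truncate_embedding([3.0, 4.0, 12.0], dims=2)
print(short)
```

Re-normalizing matters because cosine and dot-product scores are only comparable across vectors of the same length and scale; queries and documents must be truncated to the same number of dimensions.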

Use cases

RAG retrieval: embed the user query and document chunks, then find the nearest neighbors
Semantic search: search by meaning instead of keywords (e.g., enterprise knowledge bases)
Recommendation systems: find similar products, similar users, similar content
Clustering and topic modeling: automatically group related documents
Deduplication and near-duplicate detection (legal documents, content moderation)
Cross-language retrieval (find German documents matching an English query)
Anomaly detection (flag content unlike anything in your historical corpus)
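
Most of the use cases above reduce to the same operation: score every stored vector against a query vector and keep the closest ones. A brute-force sketch with hypothetical document IDs and toy vectors; a production system would delegate this step to a vector database with an approximate nearest-neighbor index.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; assumes non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec: Sequence[float], index: Dict[str, Sequence[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k documents most similar to the query, best first."""
    scored = [(doc_id, _cosine(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy index (hypothetical IDs and values standing in for real embeddings).
index = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "careers": [0.0, 0.1, 0.9],
}
hits = top_k([0.8, 0.2, 0.0], index, k=2)
print(hits)
```

The same function covers recommendation (query with an item's vector) and near-duplicate detection (keep only hits above a similarity threshold).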

Examples in production

OpenAI text-embedding-3 family

OpenAI's text-embedding-3-small (1,536 dims) and text-embedding-3-large (3,072 dims) are the most-used commercial embedding models in 2026. They offer strong general-purpose quality, predictable pricing, and matryoshka-style truncation through the API's dimensions parameter.

Cohere Embed v4

Cohere's embed-v4 supports 100+ languages, 4 modes (search query, search document, classification, clustering), and matryoshka truncation. Strong multilingual quality.

Voyage AI

Voyage AI publishes domain-tuned embedding models (voyage-law-2, voyage-finance-2, voyage-code-2) that consistently outperform generic models on their respective domains in benchmarks.

BGE / sentence-transformers (open source)

The BGE (BAAI General Embedding) family provides high-quality open-source embeddings that match commercial APIs on many benchmarks. The sentence-transformers library is the standard interface for loading and running them.

Embedding compared to alternatives

Alternative	Choose embeddings when	Choose the alternative when
Keyword search (BM25): classical lexical search that ranks documents by exact term overlap with the query	Query and documents use different words for the same concepts (semantic search), or natural-language queries don't match document phrasing.	Exact term matching matters (product SKUs, error codes, named entities) or interpretability is critical. Best practice: hybrid search using both.
Knowledge graph: structured representation of entities and relationships	You need fuzzy similarity over unstructured content where the relevant relationships aren't pre-modeled.	Relationships are explicit, structured, and require multi-hop reasoning (organizational hierarchies, drug interactions, legal precedents).
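
The hybrid search mentioned in the BM25 row needs some way to merge a lexical ranking with a vector ranking. The page does not say which method to use; one widely used option is reciprocal rank fusion (RRF), sketched here with hypothetical document IDs. The constant 60 is the value conventionally used with RRF, not a tuned setting.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists; each list contributes 1 / (k + rank) per document."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Toy rankings (hypothetical): one from BM25, one from embedding search.
bm25_ranking = ["sku-123", "doc-a", "doc-b"]
vector_ranking = ["doc-a", "doc-c", "sku-123"]
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)
```

RRF works on ranks rather than raw scores, so it avoids having to put BM25 scores and cosine similarities on a common scale.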

Common pitfalls

Mixing embedding models: chunks embedded with one model and queries embedded with another don't align. Pick one and stick with it; re-embedding the entire corpus to switch models is expensive.
Wrong dimensionality choice: 3,072-dim vectors are great for accuracy but bloat your vector database and increase serving latency. 1,024-1,536 is the sweet spot for most production systems.
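Ignoring the 'mode' parameter: many models embed queries and documents differently, for example through an input-type parameter (Cohere, Voyage) or text prefixes such as 'query: ' and 'passage: ' (E5). Mixing them up silently degrades retrieval.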
No quality benchmark: deploying without a golden dataset means you have no signal when embedding quality drifts. Build the eval harness before the retrieval pipeline.
Embedding huge chunks: most embedding models have 512-8,192 token limits. Depending on the provider, chunks longer than the limit are either rejected or silently truncated, in which case everything past the limit is lost.
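
To stay under a model's token limit, split long documents before embedding them. The sketch below is a simplification: it treats whitespace-separated words as tokens, whereas a real pipeline should count tokens with the target model's own tokenizer. The function name and the overlap parameter are illustrative choices, not something the pitfalls above prescribe.

```python
from typing import List


def chunk_tokens(tokens: List[str], max_tokens: int, overlap: int = 0) -> List[List[str]]:
    """Split a token list into windows of at most max_tokens, sharing `overlap` tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be non-negative and smaller than max_tokens")
    step = max_tokens - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks


# Whitespace split is a stand-in for a real tokenizer.
words = "one two three four five six seven".split()
chunks = chunk_tokens(words, max_tokens=3, overlap=1)
print(chunks)
```

A small overlap keeps sentences that straddle a boundary retrievable from either neighboring chunk, at the cost of embedding a few tokens twice.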

Related terms

Vector Database, RAG, Semantic Search

FAQ

Questions about Embedding.

1,024-1,536 dimensions is the sweet spot for most production systems: accuracy stays high while storage cost and serving latency stay manageable. 3,072 dimensions (OpenAI text-embedding-3-large) is overkill for most use cases. Models with matryoshka representation learning let you truncate to lower dimensions without re-embedding: useful for cost optimization.
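
The storage side of this trade-off is easy to estimate. A rough sketch, assuming vectors are stored as 32-bit floats and ignoring index structures, metadata, and replication, all of which add to the real footprint; the function name is hypothetical.

```python
def raw_vector_storage_gb(num_vectors: int, dims: int, bytes_per_value: int = 4) -> float:
    """Raw size of the vectors alone, in gigabytes (10^9 bytes)."""
    return num_vectors * dims * bytes_per_value / 1e9


# Compare common dimensionalities for a corpus of 1M vectors.
for dims in (1024, 1536, 3072):
    print(dims, round(raw_vector_storage_gb(1_000_000, dims), 3), "GB")
```

Storage grows linearly with dimensionality, so truncating a matryoshka-capable model from 3,072 to 1,024 dimensions cuts raw vector storage to a third.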
