What is an Embedding?
An embedding is a numerical vector representation of text, image, or other data (typically a dense array of 384 to 4,096 floating-point numbers) that encodes semantic meaning in a high-dimensional space, where similar concepts are mathematically close to each other and dissimilar concepts are far apart.
Overview
Embeddings are the foundational data structure underlying nearly every modern AI retrieval system. The intuition: words and concepts that mean similar things should have similar mathematical representations. A high-quality embedding model maps 'doctor' and 'physician' to vectors that are close together in 1,024-dimensional space, while 'doctor' and 'banana' are far apart. This geometric structure makes it possible to search by meaning instead of keywords (semantic search), to find nearest neighbors in a knowledge base (RAG), to cluster related concepts (topic modeling), and to measure similarity between arbitrary pieces of content. The quality of your embedding model directly determines the quality of any system built on top of it.
How embeddings work
An embedding model takes input (text, image, audio, etc.) and produces a fixed-length vector, typically 768, 1,024, 1,536, or 3,072 dimensions for modern text embeddings. Text models are typically trained on large corpora using contrastive objectives: pairs of related items (queries and documents that answer them, paraphrases, translations) are pulled together in the embedding space, while unrelated items are pushed apart. After training, semantically similar inputs reliably produce similar vectors. To compare embeddings, you typically use cosine similarity (the cosine of the angle between two vectors). It ranges from -1 to 1: values close to 1 mean very similar, and values close to 0 mean unrelated.
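As a minimal sketch of the comparison step, the function below computes cosine similarity in plain Python. The three-dimensional vectors are invented for illustration and are not output from any real model; real embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors (hypothetical values, chosen only to illustrate the geometry).
doctor = [0.9, 0.1, 0.0]
physician = [0.8, 0.2, 0.0]
banana = [0.0, 0.1, 0.9]

print(cosine_similarity(doctor, physician))  # close to 1
print(cosine_similarity(doctor, banana))     # close to 0
```

Many vector databases store unit-length vectors, in which case the dot product alone gives the same ranking.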
Choosing an embedding model
The trade-offs are quality, dimensionality, and cost. Higher-quality models (OpenAI text-embedding-3-large, Cohere embed-v4, Voyage-3-large) produce better retrieval results but cost more per call and produce larger vectors that need more vector database storage. Smaller models (OpenAI text-embedding-3-small, Cohere embed-v4-light, BGE small) are dramatically cheaper and store more compactly, at the cost of a measurable quality drop on hard queries. For most production RAG, we benchmark 3-4 candidate models on a representative golden dataset before committing, because embedding quality dominates downstream system quality and switching later is expensive.
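To make the storage side of the trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes vectors are stored as uncompressed 32-bit floats and ignores index overhead, metadata, and quantization, all of which vary by vector database.

```python
BYTES_PER_FLOAT32 = 4  # assumption: uncompressed float32 storage


def raw_vector_storage_gb(num_vectors: int, dims: int) -> float:
    """Raw vector storage in GB (10**9 bytes), excluding index overhead."""
    return num_vectors * dims * BYTES_PER_FLOAT32 / 1e9


for dims in (384, 1024, 1536, 3072):
    size = raw_vector_storage_gb(1_000_000, dims)
    print(f"{dims:>5} dims: {size:.2f} GB per 1M vectors")
```

Under these assumptions, doubling the dimensionality doubles the raw footprint, which is why the dimension count deserves the same scrutiny as per-call price.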
Multimodal and specialized embeddings
Embeddings extend beyond text. Image embeddings (CLIP, JinaCLIP) power visual search, audio embeddings (Whisper-derived) power podcast search, code embeddings (Voyage code, OpenAI code-search) power repository search, and multimodal embeddings (CLIP, ImageBind) place text and images in the same space. Specialized options include Cohere's multilingual embed-v4 for cross-language retrieval, Voyage's domain-tuned models for legal, medical, and finance content, and BGE's matryoshka-trained embeddings, which let you truncate to lower dimensions at inference time without retraining.
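Matryoshka truncation is simple on the client side: keep the leading components and rescale to unit length. The sketch below uses a hypothetical helper name and a toy vector. It is only meaningful for models trained with a matryoshka objective, because truncating an ordinary embedding can discard most of its signal.

```python
import math


def truncate_embedding(vec: list[float], dims: int) -> list[float]:
    """Keep the first `dims` components and rescale to unit length."""
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("truncated vector has zero length")
    return [x / norm for x in head]


# Toy 4-dimensional vector standing in for a real embedding.
full = [3.0, 4.0, 12.0, 0.0]
short = truncate_embedding(full, 2)
print(short)  # approximately [0.6, 0.8]
```

Renormalizing matters because cosine and dot-product scores are only interchangeable for unit-length vectors.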
Use cases
- RAG retrieval: embed the user query and document chunks, then find the nearest neighbors
- Semantic search: search by meaning instead of keywords (e.g., enterprise knowledge bases)
- Recommendation systems: find similar products, similar users, similar content
- Clustering and topic modeling: automatically group related documents
- Deduplication and near-duplicate detection (legal documents, content moderation)
- Cross-language retrieval (find German documents matching an English query)
- Anomaly detection (flag content unlike anything in your historical corpus)
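Most of the use cases above reduce to the same operation: rank stored vectors by similarity to a query vector. The following is a brute-force sketch with invented document ids and toy vectors. A production system would usually replace the linear scan with an approximate nearest-neighbor index in a vector database.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], corpus: dict[str, list[float]], k: int = 3):
    """Return the k (doc_id, score) pairs most similar to query_vec.

    `corpus` maps a document id to its precomputed embedding.
    """
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical ids and toy vectors standing in for real model output.
corpus = {
    "refund-policy": [0.9, 0.1, 0.1],
    "shipping-times": [0.1, 0.9, 0.2],
    "returns-faq": [0.8, 0.3, 0.1],
}
query = [0.85, 0.2, 0.1]  # imagine this encodes "how do I get my money back?"

for doc_id, score in top_k(query, corpus, k=2):
    print(doc_id, round(score, 3))
```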
Examples in production
OpenAI text-embedding-3 family
OpenAI's text-embedding-3-small (1,536 dims) and text-embedding-3-large (3,072 dims) are the most-used commercial embedding models in 2026. They offer strong general-purpose quality and predictable pricing, and both support matryoshka-style truncation through the API's dimensions parameter.
Cohere Embed v4
Cohere's embed-v4 supports 100+ languages, 4 modes (search query, search document, classification, clustering), and matryoshka truncation. Strong multilingual quality.
Voyage AI
Voyage AI publishes domain-tuned embedding models (voyage-law-2, voyage-finance-2, voyage-code-2) that consistently outperform generic models on their respective domains in benchmarks.
BGE / sentence-transformers (open source)
The BGE (BAAI General Embedding) family provides high-quality open-source embeddings that match commercial APIs on many benchmarks. The sentence-transformers library is the standard interface.
Embedding compared to alternatives
| Alternative | Choose Embedding when | Choose alternative when |
|---|---|---|
| Keyword search (BM25): classical lexical search ranking documents by exact term overlap with the query | Embeddings when query and documents use different words for the same concepts (semantic search), or when natural-language queries don't match document phrasing. | BM25 when exact term matching matters (product SKUs, error codes, named entities) or when interpretability is critical. Best practice: hybrid search using both. |
| Knowledge graph: structured representation of entities and relationships | Embeddings for fuzzy similarity over unstructured content where the relevant relationships aren't pre-modeled. | Knowledge graph when relationships are explicit, structured, and require multi-hop reasoning (organizational hierarchies, drug interactions, legal precedents). |
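The table recommends hybrid search but does not say how to merge the two result lists. One common option, shown here as an illustration and not as the method this article prescribes, is reciprocal rank fusion (RRF), which combines rankings without needing comparable scores. The document ids are invented, and k=60 is the conventional default constant.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document ids; each list is ordered best first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes more for documents it ranks higher.
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical results from a BM25 index and an embedding index.
bm25_ranking = ["sku-123", "doc-a", "doc-b"]
embedding_ranking = ["doc-a", "doc-c", "sku-123"]

fused = reciprocal_rank_fusion([bm25_ranking, embedding_ranking])
print(fused)
```

Documents that appear near the top of both lists rise to the top of the fused list, while an exact-match hit found only by BM25 still survives.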
Common pitfalls
- Mixing embedding models: chunks embedded with one model and queries embedded with another don't align. Pick one and stick with it; re-embedding the entire corpus to switch models is expensive.
- Wrong dimensionality choice: 3,072-dim vectors are great for accuracy but bloat your vector database and increase serving latency. 1,024-1,536 is the sweet spot for most production systems.
- Ignoring the 'mode' parameter: some models embed queries and documents differently (Cohere and Voyage through an input type setting, E5 through text prefixes). Using the wrong mode, or mixing modes, silently degrades retrieval.
- No quality benchmark: deploying without a golden dataset means you have no signal when embedding quality drifts. Build the eval harness before the retrieval pipeline.
- Embedding huge chunks: most embedding models have input limits of 512-8,192 tokens. Depending on the API, longer chunks are either rejected or truncated, and truncation silently drops everything past the limit.
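The chunk-size pitfall can be avoided by splitting text before it is embedded. The sketch below uses overlapping windows and, as a loud simplification, treats whitespace-separated words as tokens. A real pipeline should count tokens with the target model's own tokenizer, and the default limits here are illustrative only.

```python
def split_into_chunks(text: str, max_tokens: int = 512, overlap: int = 64) -> list[str]:
    """Split text into overlapping windows of at most max_tokens tokens.

    ASSUMPTION: whitespace-separated words stand in for tokens. Real
    tokenizers usually produce more tokens than words, so leave headroom.
    """
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be non-negative and smaller than max_tokens")
    words = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # the current window already reached the end of the text
    return chunks


document = " ".join(f"word{i}" for i in range(1000))
chunks = split_into_chunks(document, max_tokens=512, overlap=64)
print(len(chunks), [len(c.split()) for c in chunks])
```

The overlap keeps sentences that straddle a boundary retrievable from at least one chunk.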
Questions about Embedding.
Build a golden dataset of (query, relevant_document, irrelevant_document) triples, typically 200-1,000 triples curated by domain experts. For each candidate embedding model, measure recall@k and MRR (mean reciprocal rank) on the golden set. The model with the highest scores wins; pick a single metric (usually recall@10) as the primary decision driver.
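A minimal sketch of the two metrics, assuming each golden query has exactly one relevant document (in which case recall@k is simply the hit rate). The function names and toy results are illustrative; `results` pairs each query's ranked document ids with the id of its relevant document.

```python
def recall_at_k(ranked_ids: list[str], relevant_id: str, k: int) -> float:
    """1.0 if the relevant document is in the top k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0


def reciprocal_rank(ranked_ids: list[str], relevant_id: str) -> float:
    """1 / rank of the relevant document, or 0.0 if it was not retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0


def evaluate(results: list[tuple[list[str], str]], k: int = 10) -> dict[str, float]:
    """Average both metrics over (ranked_ids, relevant_id) pairs."""
    if not results:
        raise ValueError("results must not be empty")
    n = len(results)
    return {
        f"recall@{k}": sum(recall_at_k(r, rel, k) for r, rel in results) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in results) / n,
    }


# Toy retrieval results for three golden queries (invented ids).
results = [
    (["d1", "d7", "d3"], "d1"),  # relevant document at rank 1
    (["d4", "d2", "d9"], "d2"),  # relevant document at rank 2
    (["d5", "d6", "d8"], "d0"),  # relevant document not retrieved
]
print(evaluate(results, k=3))
```

Running the same harness over each candidate model's results gives directly comparable numbers.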
OpenAI embeddings (text-embedding-3-small/large) are the default for most production work: strong quality, simple API, predictable pricing. Open-source (BGE, sentence-transformers, E5) is competitive on quality and required for sovereign deployment. We benchmark both on every project; results depend heavily on domain specifics.
OpenAI text-embedding-3-small: ~$0.00002 per 1K tokens ($0.02 per 1M). text-embedding-3-large: ~$0.00013 per 1K tokens ($0.13 per 1M). Cohere and Voyage are in a similar order of magnitude. For a corpus of 1M chunks averaging 500 tokens each (500M tokens in total), indexing costs roughly $10-65 with these managed APIs. Self-hosted open-source models cost only the GPU/CPU cycles.
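The indexing estimate follows from simple arithmetic, sketched below. The prices are the approximate figures quoted above and may be out of date, so treat them as placeholders and check current provider pricing.

```python
def indexing_cost_usd(num_chunks: int, avg_tokens_per_chunk: int,
                      usd_per_1k_tokens: float) -> float:
    """One-time cost of embedding a corpus at a flat per-token price."""
    total_tokens = num_chunks * avg_tokens_per_chunk
    return total_tokens / 1000 * usd_per_1k_tokens


# Prices as quoted in this article (assumed, not fetched from any API).
small = indexing_cost_usd(1_000_000, 500, 0.00002)
large = indexing_cost_usd(1_000_000, 500, 0.00013)
print(f"small: ${small:.2f}, large: ${large:.2f}")  # small: $10.00, large: $65.00
```

Query-time embedding adds ongoing cost on top of this one-time figure, and re-embedding after a model switch repeats it in full.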
No: different embedding models produce vectors in different spaces, so they're not comparable. Switching requires re-embedding your entire corpus and rebuilding your vector index. Plan this carefully; pick a strong embedding model upfront and budget for migration if you switch later.
Need help implementing Embedding?
BearPlex builds production AI systems that use Embedding for Fortune 500s and high-growth scale-ups. Outcome-based pricing. 90-day embedded sprints.