Embeddings

Numerical representations of text, images, or other data that capture semantic meaning in a format AI models can process.

What are embeddings?

Embeddings are numerical representations—lists of numbers (vectors)—that capture the meaning of text, images, or other data in a format AI can work with.

When you convert text to an embedding, similar meanings get similar numbers. "Dog" and "puppy" have embeddings close together in vector space, while "dog" and "motorcycle" are far apart.

Example embedding (simplified—real ones have hundreds or thousands of dimensions):

"happy" → [0.8, 0.1, 0.9, -0.2, ...]
"joyful" → [0.79, 0.12, 0.88, -0.18, ...]  // Very similar
"sad" → [-0.7, 0.2, -0.8, 0.3, ...]        // Very different

This numerical representation lets us do mathematical operations on meaning—finding similar documents, clustering related concepts, or searching by semantic similarity.
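
As a concrete sketch, here is cosine similarity computed over the toy vectors above in plain Python (real systems use optimized vector libraries):

import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|): 1.0 means same direction, -1.0 opposite
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

happy = [0.8, 0.1, 0.9, -0.2]
joyful = [0.79, 0.12, 0.88, -0.18]
sad = [-0.7, 0.2, -0.8, 0.3]

print(cosine_similarity(happy, joyful))  # ~0.9998: nearly identical meaning
print(cosine_similarity(happy, sad))     # ~-0.96: opposite meaning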

How do embeddings work?

Creating embeddings:

  1. An embedding model (like OpenAI's text-embedding-3-small) processes your text
  2. The model outputs a fixed-size vector (e.g., 1536 dimensions)
  3. This vector captures semantic meaning learned during training
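
As an illustration of steps 1-3, here is roughly how this looks with OpenAI's Python client (assumes the openai package is installed and OPENAI_API_KEY is set):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Dogs are loyal companions.",
)

vector = response.data[0].embedding  # a plain list of floats
print(len(vector))  # 1536 dimensions for this model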

Measuring similarity:

To find how similar two texts are, calculate the distance between their embeddings:

  • Cosine similarity: Measures the angle between vectors (most common)
  • Euclidean distance: Measures straight-line distance
  • Dot product: Fast similarity measure; for unit-length (normalized) vectors it equals cosine similarity

Higher cosine similarity = more semantically similar.
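
A sketch of all three measures with NumPy:

import numpy as np

a = np.array([0.8, 0.1, 0.9, -0.2])
b = np.array([0.79, 0.12, 0.88, -0.18])

cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
euclidean = np.linalg.norm(a - b)  # lower means more similar
dot = np.dot(a, b)  # equals cosine similarity when a and b have unit length

print(cosine, euclidean, dot)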

Why this works:

Embedding models are trained on massive text datasets, where they learn that words appearing in similar contexts have similar meanings. This principle, known as distributional semantics, is what gets encoded into the geometry of the vector space.

Types of embeddings

Text embeddings
Convert words, sentences, or documents to vectors. Used for search, classification, and RAG.

  • OpenAI text-embedding-3-small/large
  • Cohere embed-v3
  • BGE, E5 (open source)

Image embeddings
Convert images to vectors. Used for visual search and image similarity.

  • CLIP (OpenAI)
  • ViT (Vision Transformer)

Multimodal embeddings
Embed different types of data in the same vector space, enabling cross-modal search (find images using text queries).

  • CLIP
  • ImageBind

Code embeddings
Optimized for programming languages. Used for code search and similarity.

  • CodeBERT
  • StarEncoder

Custom embeddings
Fine-tuned for specific domains (legal, medical, scientific) to improve accuracy in specialized contexts.

What are embeddings used for?

Semantic search
Find documents by meaning, not just keywords. "How do I cancel my subscription?" matches "Ending your membership" even without shared words.
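
A minimal semantic-search sketch (the model name and client are illustrative; any embedding provider works the same way):

import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def embed(text):
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

docs = ["Ending your membership", "Upgrading your plan", "Viewing billing history"]
doc_vectors = [embed(d) for d in docs]  # in production, precompute and store these

query_vector = embed("How do I cancel my subscription?")
best = max(zip(docs, doc_vectors), key=lambda dv: cosine(query_vector, dv[1]))
print(best[0])  # expected: "Ending your membership", despite no shared keywords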

RAG (Retrieval-Augmented Generation)
Embeddings power the retrieval step, finding relevant documents to include in AI prompts.

Recommendations
Find similar products, articles, or content by comparing embeddings.

Clustering
Group similar items together. Useful for organizing large document collections or understanding data patterns.

Classification
Classify text by comparing its embedding to examples from each category.

Anomaly detection
Identify outliers: documents or data points with embeddings far from the norm.

Deduplication
Find near-duplicate content by comparing embedding similarity.

How to choose an embedding model

Consider these factors:

Dimension size
Larger dimensions (1536, 3072) capture more nuance but cost more to store and search. Smaller dimensions (384, 768) are faster and cheaper.

Performance benchmarks
Check MTEB (Massive Text Embedding Benchmark) scores for your use case: retrieval, classification, clustering, etc.

Context length
How much text can the model embed at once? Ranges from 512 tokens to 8K+ tokens.

Language support
Some models work well only in English; others handle 100+ languages.

Cost
API-based models charge per token. Self-hosted open-source models have infrastructure costs.

Popular choices:

Model                    Dimensions  Best for
text-embedding-3-small   1536        General purpose, cost-effective
text-embedding-3-large   3072        Maximum quality
Cohere embed-v3          1024        Multilingual, RAG
BGE-large                1024        Open source, self-hosted

Embedding best practices

Chunk appropriately
For long documents, split into chunks before embedding. 200-500 tokens per chunk typically works well. Include overlap between chunks to preserve context.
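
A minimal chunking sketch (it splits on whitespace as a rough stand-in for tokens; real pipelines usually count model tokens):

def chunk_text(text, chunk_size=300, overlap=50):
    # Sizes here are in words, not tokens; adjust for your tokenizer.
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks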

Preserve context
When chunking, include metadata (document title, section headers) in each chunk so the embedding captures context.

Normalize vectors
For cosine similarity, normalize embeddings to unit length. Many models do this automatically.
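
For example, with NumPy:

import numpy as np

def normalize(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)  # unit length, so dot product == cosine similarity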

Batch requests
Embed multiple texts in a single API call for efficiency. Most APIs support batching.
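
For instance, OpenAI's embeddings endpoint accepts a list of inputs, so one request can replace many (a sketch, same assumptions as the earlier examples):

from openai import OpenAI

client = OpenAI()
texts = ["First document...", "Second document...", "Third document..."]

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts,  # batched: one API call for all three texts
)
vectors = [item.embedding for item in response.data]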

Cache embeddings
Store computed embeddings rather than regenerating them. Embeddings are deterministic for a given model and version: the same input produces the same output.
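
A minimal in-memory cache keyed by a hash of the input text (a sketch; production systems usually persist embeddings in a database or vector store):

import hashlib

_cache = {}

def embed_cached(text, embed_fn):
    # embed_fn is any function mapping text -> vector
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = embed_fn(text)
    return _cache[key]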

Use the same model
Query embeddings and document embeddings must use the same model. You can't mix OpenAI embeddings with Cohere embeddings.

Test retrieval quality
Before deploying, test that your embedding and retrieval setup actually returns relevant results for real queries.