Hybrid Retrieval

Why This Matters

Neither pure keyword search nor pure semantic search is sufficient on its own. Keyword search misses synonyms ("car" won't match "automobile"). Semantic search can miss exact terms ("error code XYZ-123"). Hybrid retrieval runs both and fuses their rankings into a single result list.

The Intuition

Imagine searching for a restaurant. Semantic search is like asking a friend: "Know any good Italian places with outdoor seating?" Keyword search is like Ctrl+F on a list: "patio Italian downtown." Neither alone is perfect, but together they cover each other's blind spots.

How It Works

Query: "How does BCI handle behavior retrieval?"

┌────────────────────┐    ┌────────────────────┐
│ Semantic Search    │    │ Keyword Search     │
│ (FAISS/vectors)    │    │ (BM25/TF-IDF)      │
│ → top 20 by        │    │ → top 20 by        │
│   cosine sim       │    │   lexical score    │
└─────────┬──────────┘    └─────────┬──────────┘
          │                         │
          └────────────┬────────────┘
                       │
            [Reciprocal Rank Fusion]
                       │
              Final top-k results
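
The fusion step in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the document IDs and both input rankings are made up, the function name `reciprocal_rank_fusion` is hypothetical, and k = 60 is an assumed default (a commonly used value, not one taken from this page). Each input list stands in for the top results returned by one retriever, ordered best-first.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_k=5):
    """Fuse several best-first lists of document IDs using RRF.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. k=60 is an assumed default, not a required value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_k]


# Made-up rankings standing in for the two retrievers in the diagram.
semantic_hits = ["doc_a", "doc_b", "doc_c", "doc_d"]
keyword_hits = ["doc_c", "doc_a", "doc_e", "doc_b"]

results = reciprocal_rank_fusion([semantic_hits, keyword_hits], top_k=3)
for doc_id, score in results:
    print(f"{doc_id}: {score:.4f}")
```

With these inputs the fused order is doc_a, doc_c, doc_b: documents that rank well in both lists rise above documents that appear in only one. Because RRF uses only ranks, the raw cosine and BM25 scores never need to be put on a common scale.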

Fusion Strategies

Strategy                      Description                             Trade-off
Reciprocal Rank Fusion (RRF)  Score = Σ 1/(k + rank_i)                Simple, robust, little tuning (k is a constant, commonly 60)
Weighted combination          Score = α × semantic + (1-α) × keyword  Needs score normalization and tuning of α
Cross-encoder re-ranking      Re-rank union with a neural model       Best quality, most expensive
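
To make the weighted-combination row concrete, here is a small sketch under stated assumptions. Cosine similarities and BM25 scores live on different scales, so this example min-max normalizes each score set before mixing them; that normalization choice, the helper names, the treatment of a missing document as score 0, and all the numbers are illustrative assumptions rather than anything prescribed by this page.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as equally relevant.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def weighted_fusion(semantic_scores, keyword_scores, alpha=0.5):
    """Score = alpha * semantic + (1 - alpha) * keyword, on normalized scores.

    A document missing from one retriever's results contributes 0 for that
    retriever (an assumption of this sketch).
    """
    sem = min_max_normalize(semantic_scores)
    kw = min_max_normalize(keyword_scores)
    combined = {
        doc_id: alpha * sem.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(sem) | set(kw)
    }
    # Highest combined score first; ties broken by document ID.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Made-up raw scores: cosine similarities and BM25-style scores.
semantic_scores = {"doc_a": 0.82, "doc_b": 0.74, "doc_c": 0.55}
keyword_scores = {"doc_c": 12.0, "doc_a": 4.0, "doc_d": 8.0}

for alpha in (0.7, 0.3):
    order = [doc_id for doc_id, _ in weighted_fusion(semantic_scores, keyword_scores, alpha)]
    print(f"alpha={alpha}: {order}")
```

With α = 0.7 the order is doc_a, doc_b, doc_c, doc_d; with α = 0.3 it flips to doc_c, doc_d, doc_a, doc_b. That sensitivity is the tuning cost listed in the table: α has to be chosen against representative queries.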

When to Use

  • Semantic only: Conceptual questions ("What is attention?")
  • Keyword only: Exact matches ("error code E1234")
  • Hybrid: Real-world queries that mix concepts and specifics
