Why This Matters
Neither pure keyword search nor pure semantic search is perfect. Keyword search misses synonyms ("car" won't match "automobile"). Semantic search can miss exact terms ("error code XYZ-123"). Hybrid retrieval combines both for the best of both worlds.
The Intuition
Imagine searching for a restaurant. Semantic search is like asking a friend: "Know any good Italian places with outdoor seating?" Keyword search is like Ctrl+F on a list: "patio Italian downtown." Neither alone is perfect, but together they cover each other's blind spots.
How It Works
Query: "How does BCI handle behavior retrieval?"
↓
┌──────────────────┐ ┌──────────────────┐
│ Semantic Search │ │ Keyword Search │
│ (FAISS/vectors) │ │ (BM25/TF-IDF) │
│ → top 20 by │ │ → top 20 by │
│ cosine sim │ │ term frequency │
└────────┬─────────┘ └────────┬─────────┘
│ │
└───────┬───────────────┘
↓
[Reciprocal Rank Fusion]
↓
Final top-k resultsFusion Strategies
| Strategy | Description | Trade-off |
|---|---|---|
| Reciprocal Rank Fusion (RRF) | Score = Σ 1/(k + rank_i) | Simple, robust, no tuning |
| Weighted combination | Score = α × semantic + (1-α) × keyword | Needs tuning of α |
| Cross-encoder re-ranking | Re-rank union with a neural model | Best quality, most expensive |
When to Use
- Semantic only: Conceptual questions ("What is attention?")
- Keyword only: Exact matches ("error code E1234")
- Hybrid: Real-world queries that mix concepts and specifics
See Also
- RAG — The broader pattern hybrid retrieval improves
- FAISS — The vector search component
- Behavior Retrieval In Practice — Hybrid retrieval in action