Quick reference for core terminology. Each entry links to a deeper concept page where available.
Attention — Mechanism that lets a model weigh the importance of different parts of input relative to each other. Core of the Transformer architecture.
BCI (Behavior-Conditioned Inference) — Amprealize's RAG implementation for retrieving and injecting procedural behaviors into agent context. See BCI In Practice.
Chain of Thought (CoT) — Prompting technique that asks the model to show its reasoning step by step, improving accuracy on complex tasks. See Prompt Engineering.
Context Window — Maximum number of tokens a model can process in a single call. Ranges from roughly 4K to 200K+ tokens, depending on the model.
Cosine Similarity — The cosine of the angle between two vectors. Used to compare embeddings. Range: -1 (opposite) to 1 (identical direction).
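As a minimal sketch (plain Python, no libraries): cosine similarity is the dot product of two vectors divided by the product of their norms.

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (||a|| * ||b||)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Same direction -> 1.0, orthogonal -> 0.0, opposite -> -1.0
print(cosine_similarity([1, 0], [2, 0]))
```

Note that magnitude is ignored: `[1, 0]` and `[2, 0]` score 1.0 because they point the same way, which is why cosine similarity suits embeddings whose direction, not length, carries meaning.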
Embedding — Dense vector representation of text that captures semantic meaning. See Embeddings.
FAISS — Facebook AI Similarity Search. Library for efficient nearest-neighbor search over vectors. See FAISS.
Few-Shot — Providing examples in the prompt to guide model behavior. Zero-shot = no examples, one-shot = one example.
Fine-Tuning — Continuing to train a pre-trained model on domain-specific data. Expensive but bakes knowledge into weights.
Hallucination — When a model generates confident but factually incorrect content. Arises because models produce plausible text from learned patterns rather than verified facts; grounding techniques such as RAG reduce it.
Hybrid Retrieval — Combining semantic (vector) search with keyword (BM25/TF-IDF) search. See Hybrid Retrieval.
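One common way to merge the two ranked lists is Reciprocal Rank Fusion (RRF); this is an illustrative sketch, not necessarily the fusion strategy any particular system uses, and it assumes each retriever returns documents in best-first order:

```python
def reciprocal_rank_fusion(rankings, k=60):
    # Each document's fused score is the sum of 1 / (k + rank)
    # across every ranked list it appears in (k=60 is the
    # conventional default from the RRF literature).
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["a", "b", "c"]   # semantic search results (illustrative)
keyword_hits = ["b", "d", "a"]  # BM25 results (illustrative)
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Documents ranked well by both retrievers (here `"b"` and `"a"`) rise to the top, which is the point of hybrid retrieval: each method covers the other's blind spots.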
Inference — Running a trained model to produce output. See Inference & Generation.
LLM (Large Language Model) — Neural network with billions of parameters trained on text to predict next tokens. GPT-4, Claude, Llama are LLMs.
MCP (Model Context Protocol) — Open standard for connecting LLM applications to external tools, resources, and data sources. Amprealize exposes its functionality as MCP tools.
Multi-Agent — System where multiple specialized AI agents collaborate on tasks. See Multi-Agent Orchestration.
RAG (Retrieval-Augmented Generation) — Pattern of retrieving relevant context before generating to improve accuracy. See RAG.
Temperature — Parameter that scales logits before sampling, controlling randomness in generation. Values near 0 approach deterministic (greedy) decoding; 1 samples from the model's unmodified distribution; higher values flatten it. See Inference.
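A minimal sketch of how temperature reshapes a distribution (the logit values below are made up for illustration; real implementations divide logits by T before the softmax, and T must be > 0 since T = 0 is handled as greedy decoding):

```python
import math

def apply_temperature(logits, temperature):
    # Divide logits by T, then softmax: T < 1 sharpens the
    # distribution, T > 1 flattens it. Assumes temperature > 0.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # illustrative values
sharp = apply_temperature(logits, 0.5)  # more probability on the top token
flat = apply_temperature(logits, 2.0)   # probability spread more evenly
```

At T = 0.5 the top token dominates; at T = 2.0 the three options are much closer to uniform, which is why high temperatures read as "more creative" and low ones as "more focused."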
Token — Basic unit of text that LLMs process; roughly 4 characters or ¾ of an English word. See Tokenization.
Top-p (Nucleus Sampling) — Sampling strategy that restricts sampling to the smallest set of highest-probability tokens whose cumulative probability exceeds p, then renormalizes over that set. See Inference.
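A minimal sketch of the nucleus-filtering step (the probability values are illustrative; a real sampler would then draw a token from the returned distribution):

```python
def top_p_filter(probs, p):
    # Keep highest-probability tokens until cumulative mass
    # reaches p, then renormalize over the kept set.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

# With p=0.9, the 0.05 tail token is cut and the rest renormalized.
filtered = top_p_filter([0.5, 0.3, 0.15, 0.05], p=0.9)
```

Unlike a fixed top-k cutoff, the nucleus adapts its size: when the model is confident, few tokens survive; when the distribution is flat, many do.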
Transformer — Neural network architecture based on self-attention. Foundation of all modern LLMs. See Transformers.
Vector Database — Storage system optimized for similarity search over embedding vectors. See FAISS.
Bitter Lesson — Richard Sutton's 2019 observation that general methods leveraging computation consistently outperform approaches leveraging hand-crafted human knowledge. Implies that search and learning are the only strategies that scale indefinitely. See The Bitter Lesson & Search at Scale.
Context Fragment — A discrete unit of information explicitly loaded into an agent's context window by the harness. Each fragment represents a deliberate design decision about what the model needs to see. See Agent Harnesses & Context Fragments.
Experiential Memory — Agent memory accumulated over interactions, analogous to human episodic memory. Includes both raw traces and distilled higher-level patterns derived from those traces. See Experiential Memory for AI Agents.
Harness — The orchestration layer that wraps an LLM: populating the context window, routing outputs, managing state between calls, and enforcing boundaries. Frameworks like LangChain and Anthropic's Agent SDK are harness implementations. See Agent Harnesses & Context Fragments.
Memory Distillation — The process of converting raw agent experience traces into compact, generalizable, retrievable higher-level primitives. Analogous to how humans consolidate episodic memories into general knowledge. See Experiential Memory for AI Agents.