Dictionary

50 key terms and concepts in context engineering, token economics, and AI-assisted development.

A

Agent Hooks

architecture

Deterministic shell commands that run at key lifecycle points during agent sessions (e.g., PreToolUse, PostToolUse, SessionStart, SessionStop). Unlike instructions that guide behavior, hooks execute code with guaranteed outcomes — useful for enforcing security policies, automating code quality checks, or creating audit trails.

hooks, agents, lifecycle, automation
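
For example, a hook registry can be sketched as a mapping from lifecycle events to shell commands (a hypothetical illustration in Python, not any particular agent's configuration format):

```python
import subprocess

# Hypothetical hook registry: lifecycle event -> deterministic shell command.
HOOKS = {
    "PreToolUse": "echo 'audit: tool about to run'",
    "PostToolUse": "echo 'audit: tool finished'",
}

def fire_hook(event: str) -> int:
    """Run the shell command registered for a lifecycle event.

    Returns the command's exit code; a non-zero code could be used to
    block the pending tool call (e.g. to enforce a security policy).
    """
    command = HOOKS.get(event)
    if command is None:
        return 0  # no hook registered for this event: allow by default
    return subprocess.run(command, shell=True, capture_output=True).returncode
```

Unlike a prompt instruction, the command either runs or it doesn't, which is what makes hooks suitable for audit trails and policy enforcement.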

Agent Orchestration

architecture

A pattern where multiple specialized subagents collaborate to achieve a complex goal, each operating in its own dedicated context window. Enables context efficiency (no shared overflow), specialization (different models per task), and parallel execution of independent subtasks. Examples include conductor-based systems with planning, implementation, and review agents.

orchestration, agents, parallel, multi-agent

Agent Skills

architecture

Reusable packages of domain expertise defined in SKILL.md files that provide specialized capabilities, knowledge, and refined workflows to AI agents. Skills can be scoped to specific domains like testing strategies, API design, or performance optimization, and can be invoked as slash commands or loaded automatically by the model.

skills, agents, reusable, customization

Attention Budget

context management

The effective amount of context a model can meaningfully attend to. While context windows may be large (128K+), attention quality degrades with length. The practical attention budget is often smaller than the raw token limit.

context, attention, limits

Automatic Context Compaction

memory strategies

An API-level feature (e.g., Anthropic’s compaction_control parameter) that automatically summarizes conversation history when token usage exceeds a configurable threshold. Can achieve 50-60% token reduction transparently, without application-level code changes.

compaction, automatic, api

B

Batch Processing

token economics

An API tier that processes requests asynchronously at reduced cost, ideal for non-time-sensitive workloads like bulk analysis or testing. Typically offers the lowest per-token pricing.

pricing, batch, optimization

Break-Even Point

caching

The number of requests after which prompt caching becomes cost-effective compared to standard pricing. Calculated from cache write cost vs. cumulative savings from cache reads.

caching, roi, analysis

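
Using the multipliers from the Cache Write and Cache Read entries (a 25% write surcharge and a 90% read discount, both stated below), the break-even count can be sketched as:

```python
import math

def cache_break_even(write_mult: float = 1.25, read_mult: float = 0.10) -> int:
    """Number of cache *reads* needed before caching beats standard pricing.

    Without caching, n+1 identical requests cost (n+1) * base.
    With caching: one write (write_mult * base) plus n reads (read_mult * base).
    Break-even when write_mult + n * read_mult <= 1 + n.
    """
    extra_write_cost = write_mult - 1.0   # surcharge paid on the first request
    saving_per_read = 1.0 - read_mult     # discount earned on each reuse
    return math.ceil(extra_write_cost / saving_per_read)
```

With those typical multipliers, a single cache hit already pays for the write: the 25% surcharge is smaller than the 90% saving on one reuse.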
C

Cache Read

caching

The cost of using a previously cached prompt prefix. Significantly cheaper than standard input (typically 90% discount). The break-even point depends on how many times the cached prefix is reused.

caching, cost, read

Cache Write

caching

The cost of storing a prompt prefix in the cache for the first time. Typically 25% more expensive than standard input pricing, but pays for itself when the same prefix is reused multiple times.

caching, cost, write

Chain of Thought (CoT)

prompt engineering

A prompting technique where the model is instructed to show its reasoning step by step before giving a final answer. Improves accuracy on complex tasks but increases output token usage.

prompts, reasoning, technique

Cobb-Douglas Model

evaluation

An economic production function adapted for AI cost analysis. Models the relationship between token inputs (quantity and quality) and output quality, helping find the optimal trade-off between cost and performance.

economics, modeling, optimization
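
A minimal sketch of the idea; the exponents, scale, and exact functional form here are illustrative assumptions, not published coefficients:

```python
def output_quality(tokens: float, token_quality: float,
                   alpha: float = 0.3, beta: float = 0.7,
                   scale: float = 1.0) -> float:
    """Cobb-Douglas-style production function: Q = scale * T^alpha * q^beta.

    tokens: quantity of context tokens supplied.
    token_quality: a 0-1 relevance score for those tokens.
    Diminishing returns: doubling tokens multiplies Q by 2^alpha, not by 2,
    so past some point extra context costs more than the quality it buys.
    """
    return scale * (tokens ** alpha) * (token_quality ** beta)
```

With beta > alpha, as here, improving token relevance raises output quality faster than adding more tokens, which is the usual argument for curation over stuffing.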

Compaction

context management

When a session nears its token limit, the assistant summarizes critical details — such as architectural decisions and unresolved bugs — while discarding redundant tool outputs. This reclaims context budget without losing essential information.

context, compression, long-horizon

Context Editing

memory strategies

An advanced context management technique where a secondary model reviews and removes stale or redundant information from the conversation before the next turn. Like an auto-cleaner that tidies the desk. Can achieve up to 84% token reduction while maintaining coherence.

editing, context, optimization

Context Engineering

context management

The discipline of designing and managing the information provided to an AI model to maximize output quality while minimizing token costs. Encompasses prompt design, file selection, caching strategy, and context window management.

context, engineering, optimization

Context Pollution

context management

The accumulation of irrelevant, redundant, or misleading information in the context window that degrades model performance. Includes distractors, context rot, and poor structural patterns. Fighting context pollution is a core challenge of context engineering.

context, pollution, quality

Context Rot

context management

As more tokens are added to a conversation, the model’s ability to accurately retrieve specific pieces of information from the context decreases. Long conversations suffer from degraded attention, making early details harder to recall.

context, degradation, tokens

Context Token Threshold

memory strategies

A configurable token count (typically between 5,000 and 150,000) at which automatic context compaction is triggered. When conversation tokens exceed this threshold, the system summarizes older turns to reclaim budget.

compaction, threshold, configuration

Context Window

context management

The maximum number of tokens (input + output) a model can process in a single request. Think of it as the AI’s short-term memory or desk — everything must fit on the desk at once. Ranges from 8K to 2M tokens depending on the model.

tokens, context, limits

Copilot Memory

memory strategies

A persistent cross-session memory store for GitHub Copilot that lets the agent save and recall important information across chat sessions. The agent recognizes when to store facts (e.g., 'always ask clarifying questions') and retrieves relevant memories to inform future responses, eliminating the need to repeatedly provide the same context.

memory, persistence, copilot, cross-session

Cost per Million Tokens (MTok)

token economics

Standard pricing unit for LLM APIs. For example, Claude Sonnet 4.5 costs $3/MTok input and $15/MTok output. This metric allows comparison across providers and models.

pricing, cost, comparison

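
A request's cost then follows directly from its token counts (a sketch using the Sonnet 4.5 rates quoted above):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float = 3.0, output_price: float = 15.0) -> float:
    """Cost in dollars, with prices expressed per million tokens (MTok)."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# e.g. a 10K-token prompt with a 1K-token answer:
# 10,000 * $3/MTok + 1,000 * $15/MTok = $0.03 + $0.015 = $0.045
```
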
D

Distractors

context management

Files or code snippets that are topically related to the query but do not contain the answer. These can cause the model to lose focus or hallucinate, degrading output quality. Effective context engineering actively filters distractors.

context, pollution, relevance

E

Extended Thinking

prompt engineering

A model capability where additional compute is used for internal reasoning before generating a response. The thinking tokens consume budget but can dramatically improve quality on hard problems.

reasoning, thinking, quality

F

Function Calling / Tool Use

architecture

The ability of a model to invoke external functions or tools during generation. The model outputs structured parameters that the client executes, returning results back into the context. Enables agentic workflows.

tools, functions, agents

H

High-Signal Tokens

token economics

The objective of context engineering: provide the smallest possible set of tokens that maximize the likelihood of correct code generation. Every token should contribute meaningfully to the model’s understanding.

tokens, optimization, quality

I

Input Tokens

token economics

Tokens sent to the model in a request, including system prompts, conversation history, and user messages. Input tokens are typically cheaper than output tokens and form the bulk of context window usage.

tokens, input, pricing

J

JIT Context (Just-in-Time)

context management

A strategy where context is loaded dynamically at runtime rather than pre-loaded. The IDE or agent fetches only the files, symbols, or data needed for the current step — similar to how modern IDEs lazy-load imports. Reduces waste and keeps the context window focused.

context, jit, dynamic

L

Lightweight Identifiers

context management

The assistant maintains references (file paths, stored queries) and dynamically loads only the necessary data at runtime using tools like grep, head, or tail. This avoids stuffing the full content into the context window upfront.

context, references, efficiency

M

MCP Apps

architecture

Rich, interactive UI components rendered by MCP (Model Context Protocol) servers directly in the chat client. Enables models to display visualizations like flame graphs, flowcharts, and data dashboards inline, providing visual context alongside text-based responses.

mcp, visualization, interactive, tools

Memory Tool

memory strategies

An external persistent storage mechanism (like a filing cabinet) that the model can read from and write to across sessions. Unlike the context window (short-term desk), the memory tool persists information permanently. Used for user preferences, project knowledge, and cross-session continuity.

memory, persistence, external

Message Steering

context management

An agent interaction pattern where a follow-up message signals the currently running request to yield after finishing the active tool execution, then processes the new message immediately. Used to redirect an agent heading in the wrong direction without cancelling or waiting for the full response to complete.

agents, steering, queueing, interaction

O

Output Tokens

token economics

Tokens generated by the model in response. Output tokens are typically 3-5x more expensive than input tokens. Controlling output length through instructions and max_tokens is a key cost optimization lever.

tokens, output, pricing

P

Progressive Disclosure

context management

Instead of loading an entire codebase — which would immediately overwhelm the attention budget — modern agents use JIT (just-in-time) context. The assistant dynamically loads only the necessary data at runtime, revealing information progressively as needed.

context, jit, optimization

Prompt

prompt engineering

The complete set of instructions and context sent to an AI model in a single request. Includes system instructions, user message, conversation history, and any retrieved context. The quality of the prompt directly determines the quality of the output.

prompts, fundamentals, instructions

Prompt Caching

caching

A technique where frequently-used prompt prefixes are stored server-side, allowing subsequent requests with the same prefix to be processed at reduced cost (typically 90% cheaper). Requires a minimum token threshold to activate.

caching, optimization, cost

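
With Anthropic's Messages API, for instance, a prefix is marked cacheable via a cache_control field. The payload sketch below follows Anthropic's published prompt-caching format, but check the current API reference before relying on it:

```python
def build_cached_request(system_prompt: str, user_message: str,
                         model: str) -> dict:
    """Build a Messages API payload whose system prompt is marked cacheable.

    The first call pays the cache-write surcharge; later calls that reuse
    the identical prefix are billed at the discounted cache-read rate.
    """
    return {
        "model": model,
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                "cache_control": {"type": "ephemeral"},  # cache this prefix
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }
```

Because caching matches on exact prefixes, stable content (system prompt, reference docs) should come first and volatile content (the user turn) last.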
R

RAG (Retrieval-Augmented Generation)

architecture

A pattern where relevant documents are retrieved from a knowledge base and injected into the context before generation. Enables models to reference up-to-date or domain-specific information without fine-tuning.

rag, retrieval, architecture

S

Structural Patterns

context management

The research finding that models often perform differently on shuffled versus logically structured context: the placement and organization of information within the context window affects retrieval accuracy and generation quality.

context, structure, research

Structured Note-taking

context management

The agent maintains an external NOTES.md or to-do list to track dependencies and progress across thousands of steps. After a context reset, it can read these notes back to restore essential state without replaying the full history.

context, persistence, notes
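
A minimal sketch of the pattern, with a plain NOTES.md file as the external store (the file name and entry format are illustrative):

```python
from pathlib import Path

NOTES = Path("NOTES.md")

def record_progress(done: str, next_step: str) -> None:
    """Append a compact progress entry the agent can re-read after a reset."""
    with NOTES.open("a") as f:
        f.write(f"- DONE: {done}\n- NEXT: {next_step}\n")

def restore_state() -> str:
    """Read the notes back to rebuild essential state without full history."""
    return NOTES.read_text() if NOTES.exists() else ""
```

The notes cost a few dozen tokens to re-read, versus replaying thousands of turns of raw history.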

Subagent

architecture

An isolated agent invoked by a parent agent to handle a subtask in its own dedicated context window. Subagents prevent context overflow in the main agent by running independently, and can execute in parallel for tasks that can be split into independent parts.

subagent, agents, context-isolation, delegation

Summarization

memory strategies

A context management strategy that condenses the full conversation history into a compact executive summary before each new turn. Preserves the big picture but loses verbatim detail and adds latency from the summarization LLM call.

summary, context, compression

Summary Prompt

memory strategies

A custom instruction given to the compaction system that controls how conversation history is summarized. Allows domain-specific summarization — for example, a customer service bot can be told to always preserve order IDs, account numbers, and resolution status in the summary.

compaction, summary, customization

System Prompt

prompt engineering

Instructions provided at the beginning of a conversation to set the model’s behavior, personality, and constraints. System prompts are ideal candidates for caching since they remain constant across requests.

prompts, system, instructions

T

Tab Relevance Scoring

architecture

A technique used in IDE extensions to rank open editor tabs by their relevance to the current task. Factors include import relationships, path similarity, edit recency, and diagnostic overlap.

extension, relevance, scoring

Terminal Sandboxing

architecture

A security mechanism that restricts file system and network access for terminal commands executed by AI agents. Sandboxed commands have read/write access only to the current workspace, and network access can be limited to trusted domains. Helps mitigate risks from agent-executed commands.

security, sandbox, agents, terminal

Thinking Tokens

token economics

Tokens generated by a model's internal reasoning process before producing a visible response. Thinking tokens consume context budget but can dramatically improve quality on complex tasks. Anthropic models now support interleaved thinking between tool calls, and the thinking budget is configurable.

thinking, reasoning, tokens, budget

Token

token economics

The fundamental unit of text processing for LLMs. A token is roughly 3-4 characters or 0.75 words in English. All API pricing is based on token counts. Understanding tokenization is essential for cost estimation.

tokens, fundamentals, pricing
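
That rule of thumb yields a quick back-of-envelope estimator (real tokenizers vary by model and language, so treat this as a rough guide only):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4-characters-per-token rule of thumb."""
    return max(1, round(len(text) / 4))
```

For billing-accurate counts, use the provider's tokenizer or token-counting endpoint instead.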

Token Budget Dashboard

architecture

A real-time visualization showing current token usage across the context window, broken down by category (system prompt, conversation history, file contents, tool outputs). Helps developers stay within limits.

extension, dashboard, monitoring

Tool Result Clearing

context management

A lighter form of compaction where the raw results of previous tool calls (like long terminal outputs or file reads) are cleared to save space, while keeping the conclusions and decisions derived from them.

context, tools, optimization
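
A sketch over a message list, assuming tool outputs are stored as dicts with role "tool" (a simplification of real message schemas):

```python
def clear_tool_results(messages: list[dict], keep_last: int = 2) -> list[dict]:
    """Replace old raw tool outputs with a stub, keeping recent ones intact.

    Conclusions drawn from the outputs live in assistant messages,
    which are left untouched.
    """
    tool_indices = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    stale = set(tool_indices[:-keep_last] if keep_last else tool_indices)
    return [
        {**m, "content": "[tool result cleared]"} if i in stale else m
        for i, m in enumerate(messages)
    ]
```
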

Trimming (Last-N)

memory strategies

A simple context management strategy that keeps only the last N conversation turns and discards older ones. Fast and predictable but loses early context completely. Best for simple chatbots where recent context matters most.

trimming, context, turns
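
The strategy is only a few lines over a message list (a sketch; the roles and message shape are assumptions, and a real implementation would always retain the system prompt, as here):

```python
def trim_last_n(messages: list[dict], n: int) -> list[dict]:
    """Keep the system prompt (if any) plus only the last n conversation turns."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    return system + (turns[-n:] if n > 0 else [])
```
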

Turn Limit

memory strategies

A configuration parameter that triggers automatic context compaction after a specified number of conversation turns, regardless of token count. Provides predictable compaction intervals for applications with consistent turn sizes.

compaction, turns, threshold

X

XML Tagging

prompt engineering

Using tags like <background_information>, <tool_guidance>, <constraints> to clearly separate different types of instructions in prompts. This structural technique helps models parse complex multi-section prompts more reliably.

prompts, xml, structure
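
For example, a multi-section prompt assembled with the tags named above (the helper name and section contents are illustrative):

```python
def build_tagged_prompt(background: str, guidance: str, constraints: str) -> str:
    """Assemble a prompt whose sections are delimited by descriptive XML tags."""
    return (
        f"<background_information>\n{background}\n</background_information>\n"
        f"<tool_guidance>\n{guidance}\n</tool_guidance>\n"
        f"<constraints>\n{constraints}\n</constraints>"
    )
```

The tag names carry meaning, so the model can tell an instruction from reference material even when sections are long.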