Context Engineering
Wiki
58 articles from arXiv, OpenAI, Anthropic, and Google AI, plus built-in terms. Auto-fetched and searchable.
Engineering Tagging Languages for DSLs
To keep a DSL clean, readable and reusable in different contexts, it is useful to define a separate tagging language. A tag model logically adds information to the tagged DSL model while technically...
Data Engineering for Scaling Language Models to 128K Context
We study the continual pretraining recipe for scaling language models' context lengths to 128K, with a focus on data engineering. We hypothesize that long context modeling, in particular the...
Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents
Large language models (LLMs) have achieved success in acting as agents, which interact with environments through tools such as search engines. However, LLMs are optimized for language generation...
How Important Is Tokenization in French Medical Masked Language Models?
Subword tokenization has become the prevailing standard in the field of natural language processing (NLP) over recent years, primarily due to the widespread utilization of pre-trained language...
Token Weighting for Long-Range Language Modeling
Many applications of large language models (LLMs) require long-context understanding, but models continue to struggle with such tasks. We hypothesize that conventional next-token prediction training...
On the solution existence and stability of polynomial optimization problems
This paper introduces and investigates a regularity condition in the asymptotic sense for optimization problems whose objective functions are polynomial. Under this regularity condition, the...
Caching with rental cost and zapping
The \emph{file caching} problem is defined as follows. Given a cache of size $k$ (a positive integer), the goal is to minimize the total retrieval cost for the given sequence of requests to files. A...
StruQ: Defending Against Prompt Injection with Structured Queries
Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated applications, which perform text-based tasks by utilizing their advanced language understanding capabilities. However,...
How and Where to Translate? The Impact of Translation Strategies in Cross-lingual LLM Prompting
Despite advances in the multilingual capabilities of Large Language Models (LLMs), their performance varies substantially across different languages and tasks. In multilingual retrieval-augmented...
Exploiting Context to Identify Lexical Atoms -- A Statistical View of Linguistic Context
Interpretation of natural language is inherently context-sensitive. Most words in natural language are ambiguous and their meanings are heavily dependent on the linguistic context in which they are...
Towards Adaptive Context Management for Intelligent Conversational Question Answering
This paper introduces an Adaptive Context Management (ACM) framework for Conversational Question Answering (ConvQA) systems. The key objective of the ACM framework is to optimize the...
Impacts of National Cultures on Managerial Decisions of Engaging in Core Earnings Management
This study investigates the impact of Hofstede's cultural dimensions on abnormal core earnings management in multiple national cultural contexts. We employ an Ordinary Least Squares (OLS) regression...
EVOR: Evolving Retrieval for Code Generation
Retrieval-augmented generation (RAG) has recently been applied successfully to code generation. However, existing pipelines for retrieval-augmented code generation (RACG) employ static knowledge...
AR-RAG: Autoregressive Retrieval Augmentation for Image Generation
We introduce Autoregressive Retrieval Augmentation (AR-RAG), a novel paradigm that enhances image generation by autoregressively incorporating k-nearest neighbor retrievals at the patch level. Unlike...
Intelligent Interaction Strategies for Context-Aware Cognitive Augmentation
Human cognition is constrained by processing limitations, leading to cognitive overload and inefficiencies in knowledge synthesis and decision-making. Large Language Models (LLMs) present an...
How to work with large language models
Large language models are functions that map text to text. Given an input string of text, a large language model predicts the text that should come next.
Techniques to improve reliability
When GPT-3 fails on a task, what should you do?
Related resources from around the web
People are writing great tools and papers for improving outputs from GPT. Here are some cool ones we've seen:
How to count tokens with tiktoken
A cookbook notebook showing how to count the tokens in a text string with tiktoken, OpenAI's open-source tokenizer, before sending it to the API.
How to stream completions
By default, when you request a completion, the entire completion is generated before being sent back in a single response. This cookbook notebook shows how to stream the completion as it is generated instead.
Prompt Caching
Claude API Documentation
Prompt Engineering Overview
Claude API Documentation
Chain of Thought Prompting
Comprehensive guide to prompt engineering techniques for Claude's latest models, covering clarity, examples, XML structuring, thinking, and agentic systems.
Context Windows
Claude API Documentation
Long Context Window Tips
Claude API Documentation
Token Counting
Claude API Documentation
Use XML Tags in Prompts
Claude API Documentation
Extended Thinking
Claude API Documentation
Context Caching
Learn how to use Context Caching in the Gemini API
Long Context
Learn about how to get started building with long context (1 million context window) on Gemini.
Tokens
Gemini API Documentation
Prompting Strategies
Gemini API Documentation
System Instructions
Get started building chat and text-generation applications with the Gemini API.
Code Execution
Learn how to use the Gemini API code execution feature.
Progressive Disclosure
Instead of loading an entire codebase—which would immediately overwhelm the attention budget—modern agents use JIT context. The assistant dynamically loads only the necessary data at runtime.
Lightweight Identifiers
The assistant maintains references (file paths, stored queries) and dynamically loads only the necessary data at runtime using tools like grep, head, or tail.
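A minimal sketch of this just-in-time loading pattern, in pure Python. The function and its parameters are hypothetical; it stands in for the grep-style tools an agent would actually call, returning only matching lines (with their file-path identifiers) instead of whole files.

```python
import re
from pathlib import Path

def load_relevant_snippets(root: str, pattern: str, max_lines: int = 20) -> list[str]:
    """Grep-style JIT loading: scan files under `root` and return only the
    lines matching `pattern`, each tagged with its lightweight identifier
    (path and line number), instead of placing whole files in context."""
    matches = []
    for path in Path(root).rglob("*.py"):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if re.search(pattern, line):
                matches.append(f"{path}:{lineno}: {line.strip()}")
                if len(matches) >= max_lines:
                    return matches
    return matches
```

The `max_lines` cap matters: even a targeted search must be bounded so one query cannot flood the attention budget the pattern is meant to protect.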
Compaction
When a session nears its token limit, the assistant summarizes critical details—such as architectural decisions and unresolved bugs—while discarding redundant tool outputs.
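A minimal sketch of compaction, assuming a simple chat-message list. In a real agent the summary would come from an LLM call; here the first line of each folded message is a stand-in, and the 4-characters-per-token estimate is a rough heuristic, not an exact count.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def compact(messages: list[dict], limit: int, keep_recent: int = 4) -> list[dict]:
    """If the conversation exceeds `limit` estimated tokens, fold everything
    except the last `keep_recent` messages into one summary message."""
    total = sum(estimate_tokens(m["content"]) for m in messages)
    if total <= limit or len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = "; ".join(m["content"].splitlines()[0][:80] for m in old)
    return [{"role": "system", "content": f"Summary of earlier turns: {summary}"}] + recent
```

Keeping the most recent turns verbatim while summarizing the rest preserves the details the model needs next, such as the decisions and open bugs the entry mentions.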
Tool Result Clearing
A light touch form of compaction where the raw results of previous tool calls (like long terminal outputs) are cleared to save space.
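A minimal sketch of this clearing step, assuming a message list where tool results carry `role="tool"` (the role name is an assumption; APIs differ). Only the contents are replaced, so the record that each call happened survives.

```python
def clear_old_tool_results(messages: list[dict], keep_last: int = 2) -> list[dict]:
    """Replace the content of all but the most recent `keep_last` tool-result
    messages with a short placeholder, freeing context without deleting
    the surrounding conversation."""
    tool_indices = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    to_clear = set(tool_indices[:-keep_last]) if keep_last else set(tool_indices)
    return [
        {**m, "content": "[tool output cleared]"} if i in to_clear else m
        for i, m in enumerate(messages)
    ]
```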
Structured Note-taking
The agent may maintain an external NOTES.md or a to-do list to track dependencies and progress across thousands of steps, which it can read back into its context after a reset.
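The mechanism can be sketched in a few lines: an append-only notes file that survives context resets. The helper names are hypothetical; a real agent would write and read the file through its file tools.

```python
from pathlib import Path

def note(notes_file: Path, entry: str) -> None:
    """Append one bullet to the agent's external notes file."""
    with notes_file.open("a") as f:
        f.write(f"- {entry}\n")

def restore_notes(notes_file: Path) -> str:
    """Read the notes back into context after a reset; empty if none exist."""
    return notes_file.read_text() if notes_file.exists() else ""
```

Because the file lives outside the context window, its size does not count against the token budget until the agent chooses to read it back.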
Distractors
Files or code snippets that are topically related to the query but do not contain the answer can cause the model to lose focus or hallucinate.
Context Rot
As more tokens are added, the model's ability to accurately retrieve needles of information from the haystack of the codebase decreases.
XML Tagging
Use tags like <background_information>, <tool_guidance>, <constraints> to clearly separate different types of instructions in system prompts.
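A minimal sketch of assembling such a prompt; the three section contents are placeholders, but the tag names are the ones listed above.

```python
def build_system_prompt(background: str, tool_guidance: str, constraints: str) -> str:
    """Separate instruction types with XML tags so the model can tell
    background facts apart from tool guidance and hard constraints."""
    return (
        f"<background_information>\n{background}\n</background_information>\n"
        f"<tool_guidance>\n{tool_guidance}\n</tool_guidance>\n"
        f"<constraints>\n{constraints}\n</constraints>"
    )
```

The payoff is unambiguous boundaries: a constraint can never be misread as optional background because it sits inside its own clearly labeled section.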
High-Signal Tokens
The objective is to provide the smallest possible set of high-signal tokens that maximize the likelihood of the correct code generation.
Structural Patterns
Research suggests that models often perform better on shuffled or unstructured context than on logically structured haystacks, impacting how they process long files.
Agent Skills
Reusable packages of domain expertise defined in SKILL.md files that provide specialized AI agent capabilities. Introduced as GA in VS Code 1.109, skills can be invoked as slash commands or loaded...
Agent Hooks
Deterministic shell commands that execute at key lifecycle points during agent sessions. Unlike instructions, hooks run code with guaranteed outcomes for security policies, quality checks, or audit...
Agent Orchestration
A multi-agent pattern where specialized subagents collaborate on complex tasks, each operating in its own dedicated context window. Provides context efficiency, specialization with different models,...
Message Steering
An agent interaction pattern where follow-up messages redirect a running agent request. The agent yields after the active tool execution and processes the new message. Alternatives include request...
Terminal Sandboxing
A security mechanism restricting file system and network access for agent-executed terminal commands. Sandboxed commands have read/write access only to the workspace directory, and network access can...
Thinking Tokens
Tokens generated during a model's internal reasoning process before producing a visible response. Thinking tokens consume context budget but improve quality on complex tasks. Anthropic models support...
A Survey of Context Engineering for Large Language Models
Context Engineering is a formal discipline that transcends simple prompt design to encompass the systematic optimization of information payloads for LLMs. This survey of 1,400+ papers covers context retrieval, processing, management, RAG, memory systems, tool-integrated reasoning, and multi-agent architectures.
Token-Budget-Aware LLM Reasoning
LLM reasoning chains are unnecessarily long and can be compressed by including a token budget in the prompt. This framework dynamically estimates a token budget per problem based on reasoning complexity, reducing token costs with only a slight performance reduction.
Agentic Much? Adoption of Coding Agents on GitHub
The first large-scale empirical study of coding agent adoption across 129,134 GitHub projects finds an estimated adoption rate of 15.85–22.60% by late 2025 — very high for a technology only months old. Agentic tools like Cursor, Claude Code, and Codex are rapidly replacing traditional code completion.
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
ACE treats the system prompt as an evolving playbook that accumulates strategies through generation, reflection, and curation. It achieves +10.6% on agent benchmarks and +8.6% on finance tasks while significantly reducing adaptation latency and rollout cost.
Context Branching for LLM Conversations: A Version Control Approach to Exploratory Programming
ContextBranch applies version-control semantics (checkpoint, branch, switch, inject) to LLM conversations, reducing context size by 58.1% in exploratory programming. A 39% average performance drop in multi-turn conversations motivates structured context management.
Codified Context: Infrastructure for AI Agents in a Complex Codebase
A three-component codified context infrastructure — hot-memory constitution, 19 specialist agents, and cold-memory knowledge base — deployed across 283 sessions on a 108,000-line C# codebase, preventing LLMs from forgetting project conventions across sessions.
SaaS Bridge Session: Context Engineering in Practice — Feedback Report
Summary of the SaaS Bridge developer session (March 2026) where Tokalator was introduced to ~90 developers. Key feedback themes: standalone CLI demand, turn-count visibility, and minor UI bugs.
TechCareer Community Session: Developer Feedback on Context Engineering Tools
Summary of the TechCareer community session (March 2026) introducing Tokalator to ~80 developers. Feedback covered CLI workflows, turn-budget indicators, and minor bugs. TechCareer is an open developer community.