Builders Talk #1 — AI-Native Software Development
Vahid Faraji · Senior Applied AI Specialist @ Kariyer.net
Vahid breaks down the critical factors that determine how effective AI assistants are: the distinction between the context window and the prompt, token economics, cost management, and plan-driven workflows in the IDE with GitHub Copilot.
Concepts & Strategies
The core ideas from the talk, each with a definition and a concrete example.
The Attention Budget
Every AI model has a fixed-size "window" it can look at — the context window. Think of it like a desk: you can only spread out so many papers before things start falling off the edge. Every word you send (system instructions, chat history, code files, tool outputs) takes up space on that desk. The total capacity is measured in tokens — roughly ¾ of a word each. When the desk is full, the model starts forgetting things.
GPT-4o has a 128K token window. A typical VS Code workspace with 15 open tabs, linter output, and chat history can easily reach 60K tokens — already half the budget before you even ask your question.
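A back-of-the-envelope sketch of that budget, assuming the common "about 4 characters per token" rule of thumb (a real tokenizer gives exact counts; `estimateTokens` here is only an approximation):

```typescript
// Rough token estimate: ~4 characters per token is a common rule of
// thumb for English text; a real tokenizer is exact, this is not.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Sum the pieces an IDE typically sends alongside your question.
function contextBudgetUsed(pieces: string[]): number {
  return pieces.reduce((sum, p) => sum + estimateTokens(p), 0);
}

const WINDOW = 128_000; // e.g. GPT-4o's context window

const used = contextBudgetUsed([
  "system instructions...",
  "open file contents...",
  "linter output...",
]);
console.log(`${used} of ${WINDOW} tokens used`);
```

Running this over your actual open files quickly shows how workspace clutter eats the budget before your question even arrives.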
Information Overload (Context Rot)
As the context window fills up, the model's ability to find and use specific information degrades — even if the task itself stays the same. This is "Context Rot." It happens because the model's internal attention spreads thinner across more tokens. The result: it misses details, repeats itself, or confidently gives wrong answers.
Imagine searching for a single sentence in a 10-page document vs. a 500-page book. The sentence hasn't changed, but your ability to find it has. The same happens inside the model. Research shows accuracy drops most when the relevant fact blends semantically into surrounding text — and related-but-wrong information causes more errors than random noise.
Strategy: Trimming
The simplest fix: drop older messages. A "Last-N" approach keeps only the most recent conversation turns. It has zero added latency and gives perfect recall for recent context. The trade-off? The model completely forgets early instructions or long-term goals — what you might call "conversation amnesia."
You start a coding session by saying "always use TypeScript strict mode." After 20 turns of back-and-forth, the trimmer drops that early instruction. The model starts generating plain JavaScript. The fix: pin critical instructions in the system prompt where they won't get trimmed.
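A minimal sketch of Last-N trimming that also applies the fix above: system messages are pinned and never dropped, while only the most recent N conversation turns survive. The `Message` shape is illustrative:

```typescript
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

// Last-N trimming: pin system messages, keep only the newest N turns.
// Everything older than the window simply disappears.
function trimLastN(history: Message[], n: number): Message[] {
  const pinned = history.filter((m) => m.role === "system");
  const turns = history.filter((m) => m.role !== "system");
  return [...pinned, ...turns.slice(-n)];
}
```

Because the trim is by position, not importance, anything you cannot afford to lose belongs in the pinned system prompt, exactly as the example above suggests.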
Strategy: Compaction & Summarisation
Instead of throwing away old messages entirely, summarise them. A "shadow prompt" — a compressed XML or Markdown block — replaces the raw history with a synthetic summary. This preserves the gist while freeing up token space. The risk is "Context Poisoning": if the summary contains a hallucination or error, it becomes permanent truth for every future turn.
After 30 turns of debugging a React component, the system compresses the history into: "<summary>User is fixing a useEffect cleanup bug in Dashboard.tsx. Attempted solutions: dependency array fix (failed), ref-based approach (partial success).</summary>" — now the model has room for the next attempt without losing the thread.
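A compaction sketch under the same assumptions: when the history grows past a threshold, everything except the last few turns is replaced by a synthetic `<summary>` block. The `summarize` callback is a placeholder; in practice you would ask the model itself to write the summary:

```typescript
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

// Compaction: replace old turns with a shadow-prompt summary while
// keeping the most recent turns verbatim. If `summarize` hallucinates,
// that error becomes permanent context ("Context Poisoning").
function compact(
  history: Message[],
  keepRecent: number,
  summarize: (old: Message[]) => string
): Message[] {
  if (history.length <= keepRecent) return history;
  const old = history.slice(0, history.length - keepRecent);
  const recent = history.slice(-keepRecent);
  const shadow: Message = {
    role: "system",
    content: `<summary>${summarize(old)}</summary>`,
  };
  return [shadow, ...recent];
}
```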
Strategy: Context Isolation (Sub-agents)
Complex tasks get delegated to sub-agents that run in their own isolated context windows. The detailed search results, tool logs, and intermediate steps stay in the sub-agent's window — only the final answer flows back to the main agent. This prevents the main context from getting cluttered with noise.
You ask the main agent to refactor authentication across your app. It spawns a Search Sub-agent that reads 40 files, greps for auth patterns, and builds a dependency map — all in its own context. The main agent only receives: "Auth is handled in 3 files: auth.ts, middleware.ts, session.ts. Here are the entry points." Clean.
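The isolation boundary can be sketched as a function scope: the sub-agent's verbose working log lives only inside its own call, and the caller receives nothing but the compact result. The `searchAgent` here is a hypothetical stand-in:

```typescript
// A sub-agent produces a detailed log plus a compact result.
type SubAgent<T> = (task: string) => { log: string[]; result: T };

// Context isolation: only `result` crosses back to the main agent;
// the log (search hits, tool output, intermediate steps) is discarded.
function delegate<T>(agent: SubAgent<T>, task: string): T {
  const { result } = agent(task);
  return result;
}

// Hypothetical search sub-agent: "reads" files in its own context,
// returns only the digest the main agent needs.
const searchAgent: SubAgent<string> = (_task) => {
  const log = ["read auth.ts", "read middleware.ts", "read session.ts"];
  return { log, result: `Auth handled in ${log.length} files` };
};
```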
Your IDE Is the Context Engine
"Your environment is the prompt." In modern AI-assisted coding, the IDE itself feeds signals into the context window — open files, linter errors, terminal output, project structure, git diffs. The quality of these signals (high signal-to-noise ratio) directly determines the quality of the AI's output. Managing what your IDE sends is context engineering in practice.
VS Code's Copilot agent reads your active file, related imports, linter errors, and terminal output. If you have 25 irrelevant tabs open, those dilute the signal. Closing unrelated files is literally improving your AI's reasoning — this is why the Tokalator extension shows a "Budget Level" indicator (Low/Medium/High) so you can see the impact in real time.
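One way to picture that signal filtering, as a sketch rather than how Copilot actually decides: score each open tab's relevance to the active file and drop the noise before it reaches the prompt. The shared-directory heuristic here is an assumption, chosen only for illustration:

```typescript
interface Tab {
  path: string;
  content: string;
}

// Illustrative relevance filter: keep tabs that share at least
// `minShared` leading directory segments with the active file.
function relevantTabs(tabs: Tab[], activePath: string, minShared = 1): Tab[] {
  const activeDirs = activePath.split("/").slice(0, -1);
  return tabs.filter((t) => {
    const dirs = t.path.split("/").slice(0, -1);
    const shared = dirs.filter((d, i) => activeDirs[i] === d).length;
    return shared >= minShared;
  });
}
```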
The Plan Agent Workflow
Modern AI coding agents follow a structured 4-phase workflow instead of jumping straight to writing code. Discovery: the agent explores your codebase to understand the structure. Alignment: it asks clarifying questions so it doesn't guess wrong. Design: it writes a step-by-step plan with specific file locations. Refinement: it double-checks decisions and adds verification criteria.
You type /plan "add dark mode support". The agent first discovers your CSS architecture (Tailwind? CSS modules?), then asks: "Should dark mode be system-preference-based or toggle-based?" — preventing a wrong assumption that would waste 20 minutes. Only after your answer does it produce a file-by-file implementation plan.
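The four phases form a strict pipeline, which can be sketched as a tiny state machine. The phase names come from the talk; the code shape is illustrative, not Copilot's actual API:

```typescript
// The Plan Agent's phases, in order. Each phase's output feeds the
// next; Alignment is where wrong assumptions get caught early.
type Phase = "discovery" | "alignment" | "design" | "refinement";

const PLAN_PHASES: Phase[] = [
  "discovery",
  "alignment",
  "design",
  "refinement",
];

// Advance through the workflow; returns null once refinement is done.
function nextPhase(current: Phase): Phase | null {
  const i = PLAN_PHASES.indexOf(current);
  return i >= 0 && i < PLAN_PHASES.length - 1 ? PLAN_PHASES[i + 1] : null;
}
```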
Message Steering & Thinking Tokens
You don't have to wait for the AI to finish if it's heading in the wrong direction. "Message Steering" lets you send a correction mid-task. Combined with "Thinking Tokens" — where the model shows its internal reasoning process as it works — you can see exactly when and why the model goes off track, and redirect immediately.
The agent starts refactoring your auth module and you see in the thinking tokens: "I'll convert this to a class-based approach..." — but you prefer functions. You send "keep it functional, no classes" while it's still working. The agent adjusts course without starting over.
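Mechanically, steering can be sketched as a queue the agent drains between work steps, so a correction lands at the next step boundary instead of forcing a restart. This shape is an assumption, not VS Code's implementation:

```typescript
// Steering sketch: corrections pile up while the agent works and are
// applied at the next step boundary, without restarting the task.
class SteeringQueue {
  private pending: string[] = [];

  // Called by you, mid-task.
  send(correction: string): void {
    this.pending.push(correction);
  }

  // Called by the agent between steps; empties the queue.
  drain(): string[] {
    const out = this.pending;
    this.pending = [];
    return out;
  }
}
```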
The Product Brain
Treat product requirements like a living codebase, not a static document. Unstructured inputs (Slack messages, emails, user feedback) flow through an agentic synthesis process that updates a living spec. That spec then drives structured actions: generating PRs, updating roadmaps, assigning tasks. The "Product Brain" is a sidecar repository that captures the reasoning behind every decision.
A customer support ticket says "users can't find the export button." The Product Brain agent processes this, updates the spec with a new requirement ("move export to top-level toolbar"), and generates a draft PR with the proposed UI change — all tracked in a reasoning log you can audit.
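The auditable part of that loop can be sketched in a few lines: every spec change carries the raw input that caused it, so the reasoning trail is reconstructable later. All names here are illustrative:

```typescript
// Product Brain sketch: each decision links the unstructured input
// (ticket, Slack message) to the structured requirement it produced.
interface Decision {
  input: string;
  requirement: string;
  timestamp: number;
}

class ProductBrain {
  private log: Decision[] = [];

  // Synthesize a requirement from raw input and record why.
  process(input: string, requirement: string): Decision {
    const d: Decision = { input, requirement, timestamp: Date.now() };
    this.log.push(d);
    return d;
  }

  // The reasoning log you can audit.
  auditTrail(): Decision[] {
    return [...this.log];
  }
}
```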
Real-Time Budget Tracking: The Tokalator
The Tokalator is a VS Code extension that acts as a real-time context budget calculator. It shows how many tokens your current session is consuming, previews the cost of your next turn, and provides one-click cleanup of low-relevance tabs to prevent attention dilution. It turns the invisible "attention budget" into something you can see and manage.
Your Tokalator dashboard shows: Budget Level: HIGH (warning). 85K of 128K tokens used. Top consumers: 3 test files (22K tokens, low relevance). One click on "Clean low-relevance tabs" drops you to 63K tokens — Medium budget — and the model's next response is noticeably sharper.
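The Low/Medium/High indicator reduces to a ratio check. The cutoffs below (45% and 60%) are assumptions picked so the dashboard numbers above work out; they are not Tokalator's documented thresholds:

```typescript
type BudgetLevel = "Low" | "Medium" | "High";

// Illustrative budget-level thresholds; the real extension may use
// different cutoffs.
function budgetLevel(used: number, window: number): BudgetLevel {
  const ratio = used / window;
  if (ratio < 0.45) return "Low";
  if (ratio < 0.6) return "Medium";
  return "High";
}
```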
The Context Engineering Checklist
Five rules for effective context management. Altitude: keep system instructions specific enough to be useful, flexible enough not to conflict with varied tasks. Hygiene: trim redundant tool outputs and stale messages regularly. Structure: use XML/Markdown tags to section your context so the model can navigate it. Memory: use just-in-time retrieval for large datasets instead of dumping everything upfront. The golden rule: treat context as a finite resource, because it is.
Before a complex refactoring session: (1) Pin your coding standards in the system prompt. (2) Close all unrelated tabs. (3) Structure your request with clear sections: "## Goal", "## Constraints", "## Files to modify". (4) Let the agent discover dependencies via search instead of pasting code. (5) Check your Tokalator budget level before each major prompt.
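Step 3 of the checklist can be automated with a small prompt builder that emits the sectioned layout from the example. The function name and section titles are illustrative:

```typescript
// Build a sectioned request so the model can navigate it, per the
// "Structure" rule: clear Goal / Constraints / Files headings.
function structuredPrompt(
  goal: string,
  constraints: string[],
  files: string[]
): string {
  return [
    "## Goal",
    goal,
    "## Constraints",
    ...constraints.map((c) => `- ${c}`),
    "## Files to modify",
    ...files.map((f) => `- ${f}`),
  ].join("\n");
}
```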
Resources & Tools
Collections, release notes, and context management tools referenced in the talk.
Awesome Copilot — Collections
Community-curated agents, prompts, and instructions for GitHub Copilot.
VS Code v1.109 Release Notes
Latest agentic features: Plan Agent, Message Steering, Sub-agents, Copilot Memory, and more.
Context7
Up-to-date documentation and code examples pulled directly into your prompt. No stale training data.
OneContext
Open-source context management lab — tools and research for building context-aware AI systems.
Agentation
Platform and patterns for building production-grade autonomous agents with structured context flows.