
Builders Talk #1 — AI-Native Software Development

Vahid Faraji · Senior Applied AI Specialist @ Kariyer.net

Friday, February 13, 2026 · 12:30 – 17:00 · Istanbul

Vahid breaks down the critical factors that determine the efficiency of AI assistants: the distinction between the context window and the prompt, token economics, cost management, and IDE planning with GitHub Copilot.

Concepts & Strategies

12 core ideas from the talk — each with a definition and a concrete example.

The Attention Budget

Every AI model has a fixed-size "window" it can look at — the context window. Think of it like a desk: you can only spread out so many papers before things start falling off the edge. Every word you send (system instructions, chat history, code files, tool outputs) takes up space on that desk. The total capacity is measured in tokens — roughly ¾ of a word each. When the desk is full, the model starts forgetting things.

Example

GPT-4o has a 128K token window. A typical VS Code workspace with 15 open tabs, linter output, and chat history can easily reach 60K tokens — already half the budget before you even ask your question.
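The arithmetic above can be sketched with the talk's rule of thumb that one token is roughly ¾ of a word (about 4 characters of English text). This is a back-of-the-envelope estimator, not a real tokenizer; all names and the sample inputs are illustrative.

```python
WINDOW = 128_000  # e.g. GPT-4o's 128K-token context window

def estimate_tokens(text: str) -> int:
    """Very rough estimate: ~4 characters per token (about 3/4 of a word)."""
    return max(1, len(text) // 4)

def budget_report(sources: dict[str, str], window: int = WINDOW) -> dict:
    """Break down estimated token usage per context source (tabs, history, ...)."""
    counts = {name: estimate_tokens(text) for name, text in sources.items()}
    used = sum(counts.values())
    return {"per_source": counts, "used": used,
            "free": window - used, "pct_used": round(100 * used / window, 1)}

# Illustrative workspace: system prompt + open tabs + chat history
report = budget_report({
    "system_prompt": "You are a careful coding assistant. " * 50,
    "open_tabs": "const handler = () => {};\n" * 2000,
    "chat_history": "user: fix the flaky test\nassistant: on it\n" * 400,
})
```

Summing per-source estimates like this makes it obvious how much of the "desk" is already occupied before you type a single word of your actual question.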

Information Overload (Context Rot)

As the context window fills up, the model's ability to find and use specific information degrades — even if the task itself stays the same. This is "Context Rot." It happens because the model's internal attention spreads thinner across more tokens. The result: it misses details, repeats itself, or confidently gives wrong answers.

Example

Imagine searching for a single sentence in a 10-page document vs. a 500-page book. The sentence hasn't changed, but your ability to find it has. The same happens inside the model. Research shows accuracy drops most when the relevant fact blends semantically into surrounding text — and related-but-wrong information causes more errors than random noise.

Strategy: Trimming

The simplest fix: drop older messages. A "Last-N" approach keeps only the most recent conversation turns. It has zero added latency and gives perfect recall for recent context. The trade-off? The model completely forgets early instructions or long-term goals — what you might call "conversation amnesia."

Example

You start a coding session by saying "always use TypeScript strict mode." After 20 turns of back-and-forth, the trimmer drops that early instruction. The model starts generating plain JavaScript. The fix: pin critical instructions in the system prompt where they won't get trimmed.
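A minimal sketch of the Last-N trimmer with the pinning fix from the example: system messages are kept out of the trimmable region so early instructions survive. The message shape (`role`/`content` dicts) is an assumption borrowed from common chat APIs.

```python
def trim_last_n(messages: list[dict], n: int) -> list[dict]:
    """Keep any system messages (pinned) plus only the N most recent turns."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-n:]

# 1 pinned instruction + 30 conversational turns
history = [{"role": "system", "content": "Always use TypeScript strict mode."}]
history += [{"role": "user", "content": f"turn {i}"} for i in range(30)]

trimmed = trim_last_n(history, n=5)  # system message + last 5 turns remain
```

Without the pinning step, the strict-mode instruction would be the first thing dropped, which is exactly the "conversation amnesia" failure described above.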

Strategy: Compaction & Summarisation

Instead of throwing away old messages entirely, summarise them. A "shadow prompt" — a compressed XML or Markdown block — replaces the raw history with a synthetic summary. This preserves the gist while freeing up token space. The risk is "Context Poisoning": if the summary contains a hallucination or error, it becomes permanent truth for every future turn.

Example

After 30 turns of debugging a React component, the system compresses the history into: "<summary>User is fixing a useEffect cleanup bug in Dashboard.tsx. Attempted solutions: dependency array fix (failed), ref-based approach (partial success).</summary>" — now the model has room for the next attempt without losing the thread.
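The compaction flow can be sketched as follows. The `summarise` stub stands in for a real LLM summarisation call; the thresholds (`keep_recent`, `max_turns`) are illustrative assumptions.

```python
def summarise(messages: list[dict]) -> str:
    """Stub for an LLM call that compresses old turns into a shadow block."""
    return f"<summary>{len(messages)} earlier turns compressed.</summary>"

def compact(messages: list[dict], keep_recent: int = 4,
            max_turns: int = 10) -> list[dict]:
    """Replace all but the most recent turns with one synthetic summary message."""
    if len(messages) <= max_turns:
        return messages  # still under budget: no compaction needed
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    shadow = {"role": "system", "content": summarise(old)}
    return [shadow] + recent

history = [{"role": "user", "content": f"debug attempt {i}"} for i in range(30)]
compacted = compact(history)  # 1 shadow summary + 4 recent turns
```

Note the Context Poisoning risk lives entirely inside `summarise`: whatever it emits becomes permanent "truth" for every later turn, so real systems should keep summaries conservative and verifiable.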

Strategy: Just-in-Time Retrieval

Instead of dumping everything into the context window upfront ("context stuffing"), let the agent discover what it needs layer by layer. It uses tools like grep, ls, and read_file to pull in only the relevant code when it needs it. Think of it this way: the agent shouldn't memorise the whole library — it should know how to use the card catalogue.

Example

You ask the AI to refactor a payment module. Instead of opening all 30 files at once, the agent first reads the directory structure, then opens only the payment service file, discovers it imports a validator, opens that too — building understanding step by step, keeping the context window clean.

Strategy: Context Isolation (Sub-agents)

Complex tasks get delegated to sub-agents that run in their own isolated context windows. The detailed search results, tool logs, and intermediate steps stay in the sub-agent's window — only the final answer flows back to the main agent. This prevents the main context from getting cluttered with noise.

Example

You ask the main agent to refactor authentication across your app. It spawns a Search Sub-agent that reads 40 files, greps for auth patterns, and builds a dependency map — all in its own context. The main agent only receives: "Auth is handled in 3 files: auth.ts, middleware.ts, session.ts. Here are the entry points." Clean.
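A sketch of the isolation boundary: the sub-agent's noisy scratch context lives only inside the function, and the caller receives just the compact summary. The grep-style search here is a stand-in for real tools, and the file contents are invented.

```python
def search_subagent(files: dict[str, str], pattern: str) -> str:
    """Explore noisily in a private scratch context; return only a summary."""
    scratch = []  # the sub-agent's context: every raw hit stays in here
    for path, text in files.items():
        for line in text.splitlines():
            if pattern in line:
                scratch.append(f"{path}: {line}")
    hits = sorted({h.split(":")[0] for h in scratch})
    # Only this one compact line flows back to the main agent:
    return f"'{pattern}' is handled in {len(hits)} files: {', '.join(hits)}"

repo = {
    "auth.ts": "export function login() {}\n// auth token check",
    "middleware.ts": "// verifies auth header on every request",
    "ui/button.tsx": "// renders a button, nothing sensitive here",
}
answer = search_subagent(repo, "auth")
```

The parent never sees the raw grep hits, only the one-line answer, which is precisely what keeps its own window clean.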

Your IDE Is the Context Engine

"Your environment is the prompt." In modern AI-assisted coding, the IDE itself feeds signals into the context window — open files, linter errors, terminal output, project structure, git diffs. The quality of these signals (high signal-to-noise ratio) directly determines the quality of the AI's output. Managing what your IDE sends is context engineering in practice.

Example

VS Code's Copilot agent reads your active file, related imports, linter errors, and terminal output. If you have 25 irrelevant tabs open, those dilute the signal. Closing unrelated files is literally improving your AI's reasoning — this is why the Tokalator extension shows a "Budget Level" indicator (Low/Medium/High) so you can see the impact in real time.

The Plan Agent Workflow

Modern AI coding agents follow a structured 4-phase workflow instead of jumping straight to writing code. Discovery: the agent explores your codebase to understand the structure. Alignment: it asks clarifying questions so it doesn't guess wrong. Design: it writes a step-by-step plan with specific file locations. Refinement: it double-checks decisions and adds verification criteria.

Example

You type /plan "add dark mode support". The agent first discovers your CSS architecture (Tailwind? CSS modules?), then asks: "Should dark mode be system-preference-based or toggle-based?" — preventing a wrong assumption that would waste 20 minutes. Only after your answer does it produce a file-by-file implementation plan.
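The four phases can be sketched as a small state machine. The phase names come from the talk; the gating rule (no advancing past Alignment while clarifying questions are open) is an illustrative assumption about how such an agent might enforce the workflow.

```python
PHASES = ["discovery", "alignment", "design", "refinement"]

class PlanAgent:
    """Toy 4-phase plan agent: Discovery -> Alignment -> Design -> Refinement."""

    def __init__(self):
        self.phase = "discovery"
        self.open_questions: list[str] = []

    def advance(self) -> str:
        # Guard: don't start designing while assumptions are unresolved.
        if self.phase == "alignment" and self.open_questions:
            raise RuntimeError("answer clarifying questions before Design")
        i = PHASES.index(self.phase)
        self.phase = PHASES[min(i + 1, len(PHASES) - 1)]
        return self.phase

agent = PlanAgent()
agent.advance()  # discovery -> alignment
agent.open_questions.append("toggle-based or system-preference dark mode?")
# advance() here would raise until the question is resolved
agent.open_questions.clear()
agent.advance()  # alignment -> design
```

Encoding the gate makes the "ask before you guess" discipline mechanical rather than a matter of model mood.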

Message Steering & Thinking Tokens

You don't have to wait for the AI to finish if it's heading in the wrong direction. "Message Steering" lets you send a correction mid-task. Combined with "Thinking Tokens" — where the model shows its internal reasoning process as it works — you can see exactly when and why the model goes off track, and redirect immediately.

Example

The agent starts refactoring your auth module and you see in the thinking tokens: "I'll convert this to a class-based approach..." — but you prefer functions. You send "keep it functional, no classes" while it's still working. The agent adjusts course without starting over.

The Product Brain

Treat product requirements like a living codebase, not a static document. Unstructured inputs (Slack messages, emails, user feedback) flow through an agentic synthesis process that updates a living spec. That spec then drives structured actions: generating PRs, updating roadmaps, assigning tasks. The "Product Brain" is a sidecar repository that captures the reasoning behind every decision.

Example

A customer support ticket says "users can't find the export button." The Product Brain agent processes this, updates the spec with a new requirement ("move export to top-level toolbar"), and generates a draft PR with the proposed UI change — all tracked in a reasoning log you can audit.
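The pipeline shape (unstructured input → spec update → structured action, with an auditable reasoning log) can be sketched as below. Everything here, from the function name to the action schema, is an illustrative assumption, not the actual system from the talk.

```python
def process_feedback(spec: dict, reasoning_log: list,
                     feedback: str, requirement: str) -> dict:
    """Turn one unstructured input into a spec update plus a structured action."""
    # 1. Update the living spec (immutably, so old versions stay auditable).
    spec = {**spec, "requirements": spec["requirements"] + [requirement]}
    # 2. Record why the change was made in the sidecar reasoning log.
    reasoning_log.append({"input": feedback, "decision": requirement})
    # 3. Emit a structured downstream action (here: a draft PR stub).
    action = {"type": "draft_pr", "title": f"feat: {requirement}"}
    return {"spec": spec, "action": action}

log: list = []
result = process_feedback(
    spec={"requirements": ["export to CSV"]},
    reasoning_log=log,
    feedback="users can't find the export button",
    requirement="move export to top-level toolbar",
)
```

The key property is the log: every spec mutation carries the raw input that motivated it, which is what makes the reasoning auditable later.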

Real-Time Budget Tracking: The Tokalator

The Tokalator is a VS Code extension that acts as a real-time context budget calculator. It shows how many tokens your current session is consuming, previews the cost of your next turn, and provides one-click cleanup of low-relevance tabs to prevent attention dilution. It turns the invisible "attention budget" into something you can see and manage.

Example

Your Tokalator dashboard shows: Budget Level: HIGH (warning). 85K of 128K tokens used. Top consumers: 3 test files (22K tokens, low relevance). One click on "Clean low-relevance tabs" drops you to 63K tokens — Medium budget — and the model's next response is noticeably sharper.
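A budget-level indicator in the spirit of the dashboard above might map usage percentage to bands. The thresholds here are illustrative assumptions chosen to reproduce the example's numbers, not the Tokalator's actual cutoffs.

```python
def budget_level(used: int, window: int = 128_000) -> str:
    """Map token usage to a coarse Low/Medium/High budget band (assumed cutoffs)."""
    pct = used / window
    if pct < 0.40:
        return "LOW"
    if pct < 0.65:
        return "MEDIUM"
    return "HIGH"

before = budget_level(85_000)  # the 85K/128K state from the example
after = budget_level(63_000)   # after cleaning low-relevance tabs
```

Surfacing a coarse band instead of a raw number is the design point: you only need to know when you've crossed into attention-dilution territory, not the exact count.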

The Context Engineering Checklist

Five rules for effective context management. Altitude: keep system instructions specific enough to be useful, flexible enough not to conflict with varied tasks. Hygiene: trim redundant tool outputs and stale messages regularly. Structure: use XML/Markdown tags to section your context so the model can navigate it. Memory: use just-in-time retrieval for large datasets instead of dumping everything upfront. The golden rule: treat context as a finite resource, because it is.

Example

Before a complex refactoring session: (1) Pin your coding standards in the system prompt. (2) Close all unrelated tabs. (3) Structure your request with clear sections: "## Goal", "## Constraints", "## Files to modify". (4) Let the agent discover dependencies via search instead of pasting code. (5) Check your Tokalator budget level before each major prompt.
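Step (3) of the checklist, structuring the request into tagged sections, can be sketched as a small prompt builder. The section names follow the example above; the function and its parameters are illustrative.

```python
def structured_prompt(goal: str, constraints: list[str],
                      files: list[str]) -> str:
    """Assemble a request with clearly labelled sections the model can navigate."""
    parts = [
        "## Goal", goal,
        "## Constraints", *[f"- {c}" for c in constraints],
        "## Files to modify", *[f"- {f}" for f in files],
    ]
    return "\n".join(parts)

prompt = structured_prompt(
    goal="Add dark mode support",
    constraints=["Tailwind only", "no new dependencies"],
    files=["src/theme.ts", "src/App.tsx"],
)
```

Because the sections are explicit, later compaction or trimming can also operate on them selectively, e.g. a summariser can compress history while leaving "## Constraints" untouched.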

Resources & Tools

Collections, release notes, and context management tools referenced in the talk.

context-engineering · token-economics · github-copilot · prompt-engineering · refactoring