Agent Fundamentals
The architecture behind modern AI coding assistants is surprisingly simple. Here's everything you need to know about how agents work.
What Makes a Model “Agentic”?
Agentic Models
Biased toward action through tool calling. They “chase” tool calls like digital squirrels seeking nuts.
Oracle Models
Prefer reasoning over action. They think deeply before acting, sometimes to the point of over-analysis.
The Agent Loop
This is the ENTIRE architecture of modern coding agents.
User input → context
The user provides a task or question
Model generates response
May include a tool call
Check for tool call
Does response contain a function call?
If yes: execute tool
Run the tool, add result to context
Loop back to step 2
Continue until no tool calls remain
If no: return response
Final answer delivered to user
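The loop above fits in a dozen lines. A minimal sketch in Python, where `call_model` and `run_tool` are placeholders for your model API and tool executor, and the message shapes are illustrative rather than any vendor's exact format:

```python
def agent_loop(user_input, call_model, run_tool):
    """Minimal agent loop: keep calling the model until it stops requesting tools."""
    context = [{"role": "user", "content": user_input}]      # step 1: user input → context
    while True:
        response = call_model(context)                       # step 2: model generates response
        context.append(response)
        if "tool_call" not in response:                      # step 3: check for a tool call
            return response["content"]                       # no tool call → final answer
        name, args = response["tool_call"]
        result = run_tool(name, args)                        # step 4: execute the tool...
        context.append({"role": "tool", "content": result})  # ...add result, loop to step 2
```

Everything else in a coding agent (harness, tool registry, context management) hangs off this one `while` loop.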
Loop-in-a-Loop: Inner vs Outer
Claude Code and Codex both behave agentically (plan → tool calls → iterate). Codex documents its harness explicitly. Ralph adds an external meta-loop around the whole process. The outer loop solves a different problem than the inner loop: Ralph doesn't replace the inner loop; it wraps around it.
Inner Loop
Codex / Claude Code
Think → call tools → observe → think → repeat → respond
This is the “harness / agent loop” that orchestrates tool calls and context.
✓ Great for: “finish this coding task” in one session
Outer Loop
Ralph
while :; do cat PROMPT.md | claude-code ; done
“Ralph is a technique… in its purest form, Ralph is a Bash loop.” — Geoff Huntley
✓ Great for: “keep going forever, one slice at a time”
The Key Idea
Each iteration starts with a fresh agent run (empty chat context), then reloads state from disk
This is the fundamental difference. The inner loop accumulates context. The outer loop resets it.
How the Outer Loop Works
Bash loop feeds PROMPT.md into Claude
Or Codex, Amp, any agent with CLI access
The prompt says: read IMPLEMENTATION_PLAN.md, pick one next task
Agent knows to focus on a single slice of work
Agent does a small slice of work, validates (tests/lint), updates plan files, optionally commits, then exits
Process terminates cleanly after completing task
Bash loop restarts the agent immediately
Fresh process, fresh context window
Agent reads the updated plan and repeats
Picks the next task, continues until project is done
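The same restart loop, sketched in Python rather than bash. `run_agent` is a placeholder for launching any CLI agent as a fresh subprocess; the file names come from the steps above:

```python
from pathlib import Path

def ralph_outer_loop(run_agent,
                     plan_path="IMPLEMENTATION_PLAN.md",
                     prompt_path="PROMPT.md"):
    """Ralph's outer loop: restart a fresh agent until the plan file is empty.
    run_agent(prompt) stands in for launching the agent CLI as a subprocess."""
    while Path(plan_path).read_text().strip():   # stop once no tasks remain on disk
        prompt = Path(prompt_path).read_text()   # same PROMPT.md every iteration
        run_agent(prompt)                        # fresh process = fresh context window
        # the agent does one slice, updates the plan on disk, commits, and exits
```

Note that the loop itself holds no state: all memory lives in the plan file and the repo, exactly as the bash one-liner intends.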
Memory is NOT the Chat Context
What persists between runs:
Repo State
Files on disk
IMPLEMENTATION_PLAN.md
Task tracking
AGENTS.md
Agent instructions
Git History
Commits
File-based state + process restart
Comparing the Two Loops
Inner Loop
Many tool calls per session
Iterative reasoning
Works until it decides it's done
Must handle growing context (compaction)
Outer Loop
Restarts the whole program
Forces fresh context every “turn”
Turns a single-run agent into a repeatable worker pipeline, bounded and checkpointed
Uses filesystem as shared state
Why Two Loops? First Principles
Memory
Context Windows Degrade
- Prompts get huge
- Quality gets weird (“context rot”)
- Compaction loses details
Control
Who is in Charge?
- Codex: the agent is the orchestrator
- Ralph: bash is the boss
If the agent loops or hallucinates:
Normal: burns tokens forever
Ralph: capped by “1 task + exit”
Reliability
Checkpointing
- Hard checkpoint every iteration
- Committed code + updated plan
- Repo is single source of truth
Single Long-Running Agent
(Codex / Claude Code)
- Tries to finish everything in one “life”
- Needs compaction + prompt management
- Great UX, more “assistant-y”
Factory Line
(Ralph)
- Each worker gets hired, does 1 job, clocks out
- The factory (bash + repo) persists forever
- Great autonomy + cost control
Inner loop = how the agent thinks and uses tools.
Outer loop = how you keep spawning the agent until the whole project is done, with fresh context each time.
The Five Primitives
Every coding agent needs these tools. That's it.
Read
Load file contents into context
List
Enumerate directories and structure
Bash
Execute shell commands
Edit
Modify existing files
Search
Find patterns with ripgrep
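A sketch of the five primitives as a tool registry. The name → (description, implementation) shape is illustrative, not any particular vendor's tool API, and plain `grep` stands in for ripgrep here:

```python
import subprocess
from pathlib import Path

# The five primitives every coding agent needs, as name → (description, implementation).
# Schema shape is illustrative; real agents also attach JSON parameter schemas.
TOOLS = {
    "read":   ("Load file contents into context",
               lambda path: Path(path).read_text()),
    "list":   ("Enumerate directories and structure",
               lambda path=".": sorted(p.name for p in Path(path).iterdir())),
    "bash":   ("Execute shell commands",
               lambda cmd: subprocess.run(cmd, shell=True, capture_output=True,
                                          text=True).stdout),
    "edit":   ("Modify existing files (simple old→new replacement)",
               lambda path, old, new: Path(path).write_text(
                   Path(path).read_text().replace(old, new))),
    "search": ("Find patterns (grep stands in for ripgrep)",
               lambda pattern, path=".": subprocess.run(
                   ["grep", "-rn", pattern, path],
                   capture_output=True, text=True).stdout),
}
```

The descriptions matter as much as the implementations: they are what the model reads when deciding which tool to call, and each one costs context tokens.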
Context Window Truths
Advertised ≠ Usable
Sonnet: 200k advertised, ~176k usable
More Context = Worse Performance
Models degrade as context fills up
Clear Context Between Activities
Fresh context = better results
MCP Servers Consume Context
Each tool description uses tokens
One Task, One Context
Context Pollution
If you start building an API controller, then research meerkats, don't expect the final design to stay focused. The model conflates information from unrelated tasks in the same context.
Clean Slate
Clear the context window after each activity. One task, one context. This keeps the model focused and prevents cross-contamination between unrelated work.
Fewer Tools = Better Results
Every tool you register consumes context tokens, and the model must evaluate all available tools for every single response.
Example: MCP Tool Bloat
76K tokens for MCP tools = only 100K usable context remaining. Performance degrades measurably with too many tools registered.
“Less is more, folks. Less is more.”
Start with the 5 primitives. Add tools only when proven necessary.
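The budget arithmetic from the example above, as a quick sanity check. The ~24k "reserved" figure is inferred from the 200k-advertised vs ~176k-usable gap mentioned earlier, not a published number:

```python
def usable_context(advertised, provider_overhead, tool_tokens):
    """Rough context budget: what's left for your actual task."""
    return advertised - provider_overhead - tool_tokens

# Numbers from this section: 200k advertised, ~24k overhead (→ ~176k usable),
# 76k consumed by MCP tool definitions.
remaining = usable_context(200_000, 24_000, 76_000)  # → 100_000 tokens for real work
```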
The Harness Layer
When you type a message to an AI agent, that's not the full prompt the model sees. Your message gets wrapped in a harness layer — invisible system instructions that shape how the model behaves.
System Prompt
Top-level instructions: persona, safety rules, capabilities
Harness / Tool Registrations
OS info (bash vs PowerShell), available tools, behavioral guidelines
Your Message
The actual task or question you typed
Model Response
Output shaped by all layers above
What's in the Harness?
- Tool definitions (name, description, parameters)
- Operating system context
- Coding conventions and style guides
- Safety and ethical boundaries
Why It Matters
- You never see the full prompt
- Harness content consumes your context
- Explains why same question → different results in different tools
“The harness prompt is where your tool registrations go. It contains information such as the operating system you're running.”
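A sketch of how the layers stack. The message format is illustrative, not any specific vendor's API; the point is that your message is only the last entry in what the model actually receives:

```python
def build_prompt(system_prompt, harness, user_message):
    """Stack the layers: the model sees all three, in order.
    Message format is illustrative, not any vendor's exact API."""
    return [
        {"role": "system", "content": system_prompt},  # persona, safety, capabilities
        {"role": "system", "content": harness},        # tools, OS info, conventions
        {"role": "user",   "content": user_message},   # the only part you typed
    ]

# Hypothetical contents, for illustration only:
messages = build_prompt(
    "You are a careful coding assistant.",
    "OS: linux (bash). Tools: read, list, bash, edit, search.",
    "Fix the failing test in utils.py",
)
```

Two different tools can wrap the identical user message in very different harnesses, which is why the same question produces different answers across products.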
Guidance, Not Guarantee
LLMs are non-deterministic. The same prompt can produce different outputs each time. Your instructions are guidance, not rules the model must follow.
Same Prompt, Different Outputs
Run 1
“The function validates user input...”
Run 2
“This validates the user's input data...”
Run 3
“Validates incoming user data...”
Same question, three semantically similar but textually different responses
Temperature: The Randomness Dial
0.0 (Deterministic): same output every time
0.7 (Balanced): the common default
1.0+ (Creative): highly varied output
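Under the hood, temperature divides the logits before softmax sampling: T near 0 collapses to argmax, while higher T flattens the distribution. A self-contained sketch of that mechanism (not any provider's actual sampler):

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=random):
    """Scale logits by 1/T, softmax, then sample an index.
    T == 0 is treated as pure argmax (deterministic)."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)                               # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()                              # sample from the distribution
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r <= cum:
            return i
    return len(logits) - 1
```

At T = 0 the top logit always wins; raise T and lower-probability tokens start getting picked, which is exactly the run-to-run variance shown above.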
What This Means
You can include guidance in your prompts, and it's just guidance. The model might not follow it exactly, especially for edge cases.
What To Do
Test your prompts. Iterate. Spend time playing with the models. Expect some variance and build systems that handle it.
“Through prompt evaluation, tuning, and spending time playing with the models” — that's how you get reliable results, not by expecting deterministic behavior.
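One way to "build systems that handle it": validate each output and retry. Here `generate` and `validate` are placeholders for your model call and whatever checks fit the task (run the tests, parse the JSON, lint):

```python
def generate_with_retries(generate, validate, max_attempts=3):
    """Handle non-determinism: retry until an output passes validation.
    generate() and validate(output) are caller-supplied placeholders."""
    last = None
    for _ in range(max_attempts):
        last = generate()
        if validate(last):
            return last
    raise RuntimeError(f"no valid output after {max_attempts} attempts: {last!r}")
```

Because each run is an independent sample, a check-and-retry wrapper often converts "usually right" into "reliably right" at the cost of extra tokens.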
The Democratization
No Moat in Agent Building
- ~300 lines of code for a basic agent
- Mini-SWE-agent: 68% on SWE-bench
- Canva now requires AI in interviews
The Real Competitive Advantage
It's not in building the agent — it's in understanding and execution.
“Your coworkers will take your job, not AI”
— the ones who learn this stuff