Agent Fundamentals

The architecture behind modern AI coding assistants is surprisingly simple. Here's everything you need to know about how agents work.

What Makes a Model “Agentic”?

🐿️

Agentic Models

Biased toward action through tool calling. They “chase” tool calls like digital squirrels seeking nuts.

Claude Sonnet · Kimi K2
🔮

Oracle Models

Prefer reasoning over action. They think deeply before acting, sometimes to the point of over-analysis.

Claude Opus · GPT-4o

The Agent Loop

This is the ENTIRE architecture of modern coding agents.

1

User input → context

The user provides a task or question

2

Model generates response

May include a tool call

3

Check for tool call

Does response contain a function call?

4

If yes: execute tool

Run the tool, add result to context

5

Loop back to step 2

Continue until no tool calls remain

6

If no: return response

Final answer delivered to user

Steps 4-5: Loop until no more tool calls
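The six steps above can be sketched in a few lines of Python. The model call is stubbed out here with a hypothetical `call_model` that "requests" one tool call and then answers; a real implementation would send the context to an LLM API and parse its tool-call output.

```python
def call_model(context):
    # Stand-in for a real LLM API call: request one tool call,
    # then answer once a tool result is in the context.
    if not any(m["role"] == "tool" for m in context):
        return {"tool_call": {"name": "read", "args": {"path": "main.py"}}}
    return {"text": "done"}

def execute_tool(call):
    # In practice this dispatches to read/list/bash/edit/search.
    return f"<contents of {call['args']['path']}>"

def agent_loop(user_input):
    context = [{"role": "user", "content": user_input}]   # step 1
    while True:
        response = call_model(context)                    # step 2
        if "tool_call" in response:                       # step 3
            result = execute_tool(response["tool_call"])  # step 4
            context.append({"role": "tool", "content": result})
            continue                                      # step 5: loop
        return response["text"]                           # step 6

print(agent_loop("explain main.py"))  # -> done
```

That `while True` really is the entire architecture; everything else is tooling around it.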

Loop-in-a-Loop: Inner vs Outer

Claude Code and Codex both behave agentically (plan → tool calls → iterate); Codex documents its harness explicitly. Ralph adds an external meta-loop around the whole process. The outer loop solves a different problem than the inner loop: Ralph does not replace the inner loop, it wraps around it.

🔄

Inner Loop

Codex / Claude Code

Think → call tools → observe → think → repeat → respond

This is the “harness / agent loop” that orchestrates tool calls and context.

Great for: “finish this coding task” in one session

♻️

Outer Loop

Ralph

while :; do cat PROMPT.md | claude-code ; done

“Ralph is a technique… in its purest form, Ralph is a Bash loop.” — Geoff Huntley

Great for: “keep going forever, one slice at a time”

💡

The Key Idea

Each iteration starts with a fresh agent run (empty chat context), then reloads state from disk

This is the fundamental difference. The inner loop accumulates context. The outer loop resets it.

How the Outer Loop Works

1

Bash loop feeds PROMPT.md into Claude

Or Codex, Amp, any agent with CLI access

2

The prompt says: read IMPLEMENTATION_PLAN.md, pick one next task

Agent knows to focus on a single slice of work

3

Agent does a small slice of work, validates (tests/lint), updates plan files, optionally commits, then exits

Process terminates cleanly after completing task

4

Bash loop restarts the agent immediately

Fresh process, fresh context window

5

Agent reads the updated plan and repeats

Picks the next task, continues until project is done
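The five steps above are just the bash one-liner with a stopping condition. A rough Python equivalent, with `run_agent` standing in for launching a fresh agent process (names and plan format here are illustrative, not a real API):

```python
def outer_loop(run_agent, plan, max_iters=100):
    """Restart the agent until the plan has no open tasks."""
    for _ in range(max_iters):
        open_tasks = [t for t in plan if not t["done"]]
        if not open_tasks:
            return plan            # project finished
        run_agent(plan)            # fresh process, fresh context window
    return plan

# A fake agent for illustration: one slice of work, then "exit".
def fake_agent(plan):
    for task in plan:
        if not task["done"]:
            task["done"] = True    # 1 task + exit
            return

plan = [{"name": "parse config", "done": False},
        {"name": "add tests", "done": False}]
outer_loop(fake_agent, plan)
print(all(t["done"] for t in plan))  # -> True
```

Note that the loop itself holds no memory: all state lives in `plan` (on disk, in Ralph's case), which is exactly what lets each iteration start clean.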

Memory is NOT the Chat Context

What persists between runs:

📁

Repo State

Files on disk

📋

IMPLEMENTATION_PLAN.md

Task tracking

🤖

AGENTS.md

Agent instructions

📜

Git History

Commits

File-based state + process restart
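One way to see how file-based state works: a markdown checklist in IMPLEMENTATION_PLAN.md acts as the task queue, and each fresh run just scans for the first unchecked box. The checklist format is an assumption for illustration; real plan files vary.

```python
# Hypothetical plan-file contents; `[x]` means done, `[ ]` means open.
PLAN = """\
- [x] parse config file
- [ ] add unit tests
- [ ] wire up CI
"""

def next_task(plan_text):
    """Return the first unchecked task, or None if the plan is done."""
    for line in plan_text.splitlines():
        if line.startswith("- [ ]"):
            return line[len("- [ ] "):]
    return None

print(next_task(PLAN))  # -> add unit tests
```

Because the agent rewrites this file before exiting, the next run picks up exactly where the last one left off, with zero chat context carried over.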

Comparing the Two Loops

🔄Inner Loop (Claude Code / Codex)

Many tool calls per session

Iterative reasoning

Works until it decides it's done

Must handle growing context (compaction)

♻️Outer Loop (Ralph)

Restarts the whole program

Forces fresh context every “turn”

Turns a single-run agent into a repeatable worker pipeline, bounded and checkpointed

Uses filesystem as shared state

Why Two Loops? First Principles

⚠️

Memory

Context Windows Degrade

  • Prompts get huge
  • Quality gets weird (“context rot”)
  • Compaction loses details

Codex: prompt caching + compaction
Ralph: kill → restart → reload from files
🎮

Control

Who is in Charge?

  • Codex: Agent is orchestrator
  • Ralph: Bash is the boss

If agent loops/hallucinates:

Normal: burns tokens forever

Ralph: capped by “1 task + exit”

Reliability

Checkpointing

  • Hard checkpoint every iteration
  • Committed code + updated plan
  • Repo is single source of truth
“Deterministically bad — it fails in repeatable ways, so you can tune it.”
🧠

Single Long-Running Agent

(Codex / Claude Code)

  • Tries to finish everything in one “life”
  • Needs compaction + prompt management
  • Great UX, more “assistant-y”
🏭

Factory Line

(Ralph)

  • Each worker gets hired, does 1 job, clocks out
  • The factory (bash + repo) persists forever
  • Great autonomy + cost control

Inner loop = how the agent thinks and uses tools.

Outer loop = how you keep spawning the agent until the whole project is done, with fresh context each time.

The Five Primitives

Every coding agent needs these tools. That's it.

📖

Read

Load file contents into context

📁

List

Enumerate directories and structure

Bash

Execute shell commands

✏️

Edit

Modify existing files

🔍

Search

Find patterns with ripgrep
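The five primitives can be expressed as a small dispatch table. This sketch uses Python's standard library, with `re` standing in for ripgrep and the argument names chosen for illustration; real harnesses describe these tools in JSON Schema.

```python
import pathlib
import re
import subprocess

# The five primitives as callables the model can invoke by name.
TOOLS = {
    "read":   lambda path: pathlib.Path(path).read_text(),
    "list":   lambda path=".": sorted(p.name for p in pathlib.Path(path).iterdir()),
    "bash":   lambda cmd: subprocess.run(
                  cmd, shell=True, capture_output=True, text=True).stdout,
    "edit":   lambda path, old, new: pathlib.Path(path).write_text(
                  pathlib.Path(path).read_text().replace(old, new)),
    "search": lambda pattern, text: [
                  line for line in text.splitlines() if re.search(pattern, line)],
}

def dispatch(name, **args):
    """Route a model's tool call to the matching primitive."""
    return TOOLS[name](**args)

print(dispatch("search", pattern="def ", text="def f():\n    pass"))  # -> ['def f():']
```

Everything else an agent does, from refactors to test runs, composes out of these five calls.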

Context Window Truths

⚠️

Advertised ≠ Usable

Sonnet: 200k advertised, ~176k usable

📉

More Context = Worse Performance

Models degrade as context fills up

🧹

Clear Context Between Activities

Fresh context = better results

🔌

MCP Servers Consume Context

Each tool description uses tokens

One Task, One Context

Context Pollution

If you start building an API controller, then research meerkats, don't expect the final design to stay focused. The model conflates information from unrelated tasks in the same context.

Clean Slate

Clear the context window after each activity. One task, one context. This keeps the model focused and prevents cross-contamination between unrelated work.

Fewer Tools = Better Results

Every tool you register consumes context tokens, and the model must evaluate all available tools for every single response.

📊

Example: MCP Tool Bloat

76K tokens for MCP tools = only 100K usable context remaining. Performance degrades measurably with too many tools registered.

“Less is more, folks. Less is more.”

Start with the 5 primitives. Add tools only when proven necessary.
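The arithmetic behind that example, using the section's own figures (the numbers are the examples quoted above, not measurements):

```python
# Sonnet: 200K advertised, ~176K actually usable.
USABLE_CONTEXT = 176_000

# Tokens consumed by MCP tool registrations in the bloat example.
MCP_TOOL_OVERHEAD = 76_000

remaining = USABLE_CONTEXT - MCP_TOOL_OVERHEAD
print(remaining)  # -> 100000
```

Nearly half the usable window gone before the conversation starts, which is why tool count is a budget decision, not a convenience.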

The Harness Layer

When you type a message to an AI agent, that's not the full prompt the model sees. Your message gets wrapped in a harness layer — invisible system instructions that shape how the model behaves.

🎛️

System Prompt

Top-level instructions: persona, safety rules, capabilities

🔧

Harness / Tool Registrations

OS info (bash vs PowerShell), available tools, behavioral guidelines

💬

Your Message

The actual task or question you typed

🤖

Model Response

Output shaped by all layers above

What's in the Harness?

  • Tool definitions (name, description, parameters)
  • Operating system context
  • Coding conventions and style guides
  • Safety and ethical boundaries

Why It Matters

  • You never see the full prompt
  • Harness content consumes your context
  • Explains why same question → different results in different tools

“The harness prompt is where your tool registrations go. It contains information such as the operating system you're running.”
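A sketch of how a harness might assemble the full prompt from those layers. The layering is the point; the role names follow common chat-API conventions, and the field contents are illustrative.

```python
def build_prompt(system, harness, user_message):
    """Stack the layers: your message is only the last one."""
    return [
        {"role": "system", "content": system},   # persona, safety rules
        {"role": "system", "content": harness},  # OS, tools, conventions
        {"role": "user", "content": user_message},
    ]

harness = "OS: linux (bash). Tools: read, list, bash, edit, search."
messages = build_prompt(
    "You are a coding assistant.", harness, "Fix the failing test.")
print(len(messages))  # -> 3
```

Two tools wrapping the same model with different harness layers will produce different answers to an identical user message, which is the "same question, different results" effect above.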

Guidance, Not Guarantee

LLMs are non-deterministic. The same prompt can produce different outputs each time. Your instructions are guidance, not rules the model must follow.

Same Prompt, Different Outputs

Run 1

“The function validates user input...”

Run 2

“This validates the user's input data...”

Run 3

“Validates incoming user data...”

Same question, three semantically similar but textually different responses

Temperature: The Randomness Dial

0.0

Deterministic

Same output

0.7

Balanced

Common default

1.0+

Creative

Highly varied
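Under the hood, temperature divides the logits before the softmax, which is why values near 0 are effectively deterministic and high values are varied. A toy illustration with made-up logits:

```python
import math

def softmax_with_temperature(logits, t):
    """Scale logits by 1/t, then softmax (max-subtracted for stability)."""
    scaled = [l / t for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
low = softmax_with_temperature(logits, 0.1)   # nearly all mass on the top token
high = softmax_with_temperature(logits, 2.0)  # much flatter distribution
print(round(low[0], 3), round(high[0], 3))
```

At t=0.1 the top token gets essentially all the probability; at t=2.0 the runner-up tokens stay in play, so repeated sampling produces the varied phrasings shown above.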

⚠️

What This Means

You can include guidance in your prompts, and it's just guidance. The model might not follow it exactly, especially for edge cases.

What To Do

Test your prompts. Iterate. Spend time playing with the models. Expect some variance and build systems that handle it.

“Through prompt evaluation, tuning, and spending time playing with the models” — that's how you get reliable results, not by expecting deterministic behavior.

The Democratization

No Moat in Agent Building

  • ~300 lines of code for a basic agent
  • Mini-SWE-agent: 68% on SWE-bench
  • Canva now requires AI in interviews

The Real Competitive Advantage

It's not in building the agent — it's in understanding and execution.

“Your current workers will take your job, not AI”

— that is, the people who learn this stuff

Tools Using This Pattern

Cursor·AI-first code editor
Windsurf·AI coding assistant
Claude Code·Anthropic's coding agent
Copilot·GitHub's AI pair programmer
Amp·AI development platform

Explore Model Sizes

See how different model sizes affect capability and how tools augment them.

View Models