How AI Agents Actually Work
The first time I watched an agent work end-to-end, I realized most explanations skip the boring, crucial details. Everyone talks about the magic of reasoning, but no one shows you the scaffolding that makes it happen. This article is the scaffolding.
An agent is fundamentally different from a chatbot. A chatbot answers your question. An agent decides what to do to answer your question. That decision-making loop—perceive, think, act, observe—is where everything lives.
The Agent Loop: Perceive → Think → Act → Observe
Every agent, from the simplest to the most complex, runs a loop. Here's the pseudocode:
loop:
1. PERCEIVE: Get the current state
- Read the user's message
- Retrieve relevant context from memory
- Observe tool outputs from the last step
2. THINK: Reason about what to do
- LLM processes state and generates thoughts
- LLM decides: call a tool or respond directly?
3. ACT: Execute the decision
- If tool call: invoke the function
- If response: return answer to user
4. OBSERVE: Record what happened
- Store tool output in context
- Update state for next loop iteration
Exit when: response given OR max steps exceeded
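The loop above can be sketched as a minimal Python driver. The `llm_decide` function here is a hypothetical stand-in for a real LLM call; the dict-based message format and the tool registry are illustrative assumptions, not a specific vendor's API:

```python
# Minimal sketch of the perceive → think → act → observe loop.
# llm_decide is a hypothetical stand-in for a real LLM client:
# it returns either ("respond", text) or ("tool", name, args).

def run_agent(user_input, llm_decide, tools, max_steps=10):
    context = [{"role": "user", "content": user_input}]   # PERCEIVE
    for _ in range(max_steps):
        decision = llm_decide(context)                    # THINK
        if decision[0] == "respond":
            return decision[1]                            # ACT: final answer
        _, name, args = decision
        output = tools[name](**args)                      # ACT: tool call
        context.append({"role": "tool", "name": name,     # OBSERVE
                        "content": output})
    return "Step limit reached without a final answer."

# Demo with a scripted "LLM": call one tool, then answer.
def scripted_llm(context):
    if any(m["role"] == "tool" for m in context):
        return ("respond", "Done: " + context[-1]["content"])
    return ("tool", "echo", {"text": "hello"})

answer = run_agent("test", scripted_llm, {"echo": lambda text: text.upper()})
```

Everything that follows in this article is elaboration on those fifteen-odd lines: better thinking, richer tools, smarter context assembly.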
Here's how the loop looks visually — every agent, regardless of complexity, runs this cycle:
flowchart TD
U([User / Environment]) -->|input| P[PERCEIVE]
P --> T[THINK]
T -->|tool call| A[ACT]
T -->|final answer| R([Respond to User])
A --> O[OBSERVE]
O --> P
style U fill:#f5f5f4,stroke:#e7e5e4
style R fill:#f5f5f4,stroke:#e7e5e4
style P fill:#fafaf9,stroke:#a8a29e
style T fill:#fafaf9,stroke:#a8a29e
style A fill:#fafaf9,stroke:#a8a29e
style O fill:#fafaf9,stroke:#a8a29e
This looks abstract. Let me make it concrete with a real example.
Concrete Example: "Book me a flight to Denver"
Step 1: PERCEIVE
User input: "I need a flight to Denver tomorrow, budget $300"
Memory context: [Previous bookings show user prefers morning departures]
Tool outputs: [None yet, first iteration]
Step 2: THINK
The LLM sees that the goal requires searching flights. It doesn't have flight data, so it needs to call a tool.
LLM reasoning (in context window):
I need to: find flights to Denver tomorrow under $300.
Available tools: search_flights, check_weather, book_flight
The search_flights tool takes: destination, date, budget, preferences
I have all that. Let me call search_flights.
Step 3: ACT
Tool call: search_flights(
destination="Denver",
date="2026-04-05",
budget_usd=300,
time_preference="morning"
)
Step 4: OBSERVE
The tool returns:
[
{flight_id: "UA123", departure: "7:30 AM", price: "$280"},
{flight_id: "DL456", departure: "6:45 AM", price: "$295"}
]
Next Loop (Iteration 2):
- PERCEIVE: Now the context includes the flight results above
- THINK: Agent decides both options fit budget, recommends the cheapest + earliest
- ACT: Agent responds directly to user (no tool call)
- OBSERVE: Conversation ends, or user asks follow-up
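The THINK step in iteration 2 reduces to a filter-and-sort over the tool output. A toy sketch using the results above (the price parsing and the price-then-departure tie-break are assumptions on my part):

```python
from datetime import datetime

# Tool output from iteration 1, as shown above.
flights = [
    {"flight_id": "UA123", "departure": "7:30 AM", "price": "$280"},
    {"flight_id": "DL456", "departure": "6:45 AM", "price": "$295"},
]

def pick_flight(flights, budget_usd):
    # Keep only flights within budget, parsing "$280" -> 280.
    affordable = [f for f in flights
                  if int(f["price"].lstrip("$")) <= budget_usd]
    # Prefer the cheapest; break ties by earliest departure.
    return min(affordable, key=lambda f: (
        int(f["price"].lstrip("$")),
        datetime.strptime(f["departure"], "%I:%M %p").time(),
    ))

best = pick_flight(flights, budget_usd=300)
```

In a real agent this logic lives inside the LLM's reasoning rather than in code, which is exactly why the THINK step needs the raw tool output in context.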
This loop repeats until the agent outputs a response (ACT) or hits a maximum step limit.
Tool Use: How LLMs Actually Call Functions
This is where most tutorials handwave. Here's what actually happens.
An LLM doesn't "call" functions like a normal program. Instead, it generates text that describes the function call. The agent runtime parses that text and executes the actual function.
The Actual Mechanism
The LLM never directly touches your code. Instead:
- Prompt includes tool definitions: The system prompt lists all available tools in a structured format (usually JSON Schema).
- LLM generates tool-use syntax: Depending on your API (OpenAI, Anthropic, etc.), the LLM generates something like:
Based on the flights shown, I should book the cheapest option.
<tool_use>
{
"name": "book_flight",
"input": {
"flight_id": "UA123",
"passenger_name": "Alice Johnson"
}
}
</tool_use>
- Runtime parses and executes: Your agent code detects the tool-use block, extracts the function name and parameters, calls the actual function, and gets a result.
- Result fed back to LLM: The tool result is added to the context, and the loop continues.
Here's pseudocode for the tool-calling runtime:
def run_agent_step(user_input, memory_context, available_tools, max_steps=10):
    # Prepare the prompt: system prompt first, then prior context,
    # then the newest user message
    system_prompt = build_system_prompt(available_tools)
    messages = (
        [{"role": "system", "content": system_prompt}]
        + memory_context
        + [{"role": "user", "content": user_input}]
    )
    for _ in range(max_steps):
        # Get LLM response (which may include tool calls)
        response = llm.generate(messages)
        # Parse tool calls from the response text
        tool_calls = parse_tool_calls(response.text)
        if not tool_calls:
            # No tool calls: the agent responded directly
            return response.text
        # Execute the tools
        results = []
        for call in tool_calls:
            function = available_tools[call.name]
            result = function(**call.input)
            results.append({
                "tool": call.name,
                "input": call.input,
                "output": result,
            })
        # Feed the results back into the context and loop again
        messages += [
            {"role": "assistant", "content": response.text},
            {"role": "user", "content": format_tool_results(results)},
        ]
    raise RuntimeError("Max steps exceeded without a final response")
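The parse_tool_calls helper is where the "text in, function call out" translation happens. Here's one way it could work against the <tool_use> format shown earlier; the tag syntax and the ToolCall shape are illustrative, not any particular vendor's wire format:

```python
import json
import re
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str
    input: dict

def parse_tool_calls(text):
    # Find every <tool_use>...</tool_use> block in the raw LLM output
    # and parse the JSON payload inside it into a ToolCall.
    calls = []
    for block in re.findall(r"<tool_use>(.*?)</tool_use>", text, re.DOTALL):
        payload = json.loads(block)
        calls.append(ToolCall(name=payload["name"], input=payload["input"]))
    return calls

# The example generation from earlier in the article:
example = """Based on the flights shown, I should book the cheapest option.
<tool_use>
{"name": "book_flight", "input": {"flight_id": "UA123", "passenger_name": "Alice Johnson"}}
</tool_use>"""

calls = parse_tool_calls(example)
```

Production APIs return tool calls as structured fields rather than making you regex the raw text, but the principle is identical: the model emits a description, the runtime does the calling.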
Key insight: The LLM never sees the actual function code. It only sees the tool description (name, parameters, purpose). It's generating text that describes what to do. Your runtime interprets that text.
Planning Strategies: ReAct and Chain-of-Thought
An agent without a planning strategy is like driving without checking a map. You might get somewhere, but probably not where you intended.
ReAct: Reason + Act
ReAct (Reasoning + Acting) is the dominant pattern in production agents. The idea: make the LLM's reasoning explicit before each tool call.
The LLM generates something like:
Thought: I need to find flights to Denver for tomorrow.
Action: search_flights
Action Input: {"destination": "Denver", "date": "2026-04-05", "budget_usd": 300}
Observation: [Flight results...]
Thought: The user prefers morning flights. UA123 at 7:30 AM is $280. I should recommend this.
Action: respond_to_user
Action Input: {"message": "I found a flight departing at 7:30 AM for $280..."}
Why this works: Explicit reasoning gives the LLM a hook to catch its own mistakes. If it thinks "I'll call search_flights" but then emits a call to book_flight, something is obviously wrong.
Chain-of-Thought (CoT)
Chain-of-Thought is simpler: the LLM writes out its reasoning step-by-step before deciding.
Let me think through this:
1. The user wants a flight tomorrow
2. I don't have access to flight databases directly
3. I should use the search_flights tool
4. It needs destination, date, and budget
5. I have all three pieces of information
6. Let me call the tool now...
CoT is useful when the task is complex but doesn't require repeated tool calls. ReAct is better when tools are involved.
The Four Types of Memory
Agents without memory are like people with amnesia—they repeat themselves and can't learn. But not all memory is the same.
1. In-Context (Short-Term) Memory
This is the conversation history stuffed into the prompt. Every message, tool output, and observation lives in the context window.
Pros:
- Fast (no lookups needed)
- Clear (the LLM sees everything)
- Works immediately
Cons:
- Limited by context window size (on the order of 100K-200K tokens for current models)
- Expensive (you pay per token, including all history)
- Forces you to truncate old conversations
When I built systems at Amazon processing millions of requests, we learned: in-context memory alone doesn't scale. A 10-turn conversation uses maybe 2K tokens, but a 100-turn conversation becomes expensive.
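The cost problem is easy to quantify: because each turn resends the entire history, the total tokens you're billed for grow roughly quadratically with turn count. A back-of-the-envelope model, assuming a flat 200 tokens per message:

```python
# Rough cost model: every turn resends the whole conversation so far,
# so cumulative tokens billed grow quadratically with turn count.
TOKENS_PER_MESSAGE = 200  # assumed flat average

def total_tokens(turns):
    # Turn k resends k prior messages plus the new one.
    return sum((k + 1) * TOKENS_PER_MESSAGE for k in range(turns))

short_convo = total_tokens(10)    # 10-turn conversation
long_convo = total_tokens(100)    # 100-turn conversation
```

Ten times the turns costs roughly ninety times the tokens. That arithmetic is why every production agent eventually adds truncation, summarization, or external memory.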
2. External Vector Store (Semantic Memory)
Imagine you have 1000 conversations with an agent. You can't fit them all in context. Instead, you embed them into a vector database and retrieve the most relevant ones.
User: "Do I prefer morning or evening flights?"
Retrieve from vector store:
→ similarity("prefer morning") → [past_convo_1, past_convo_2, ...]
→ Add top 3 to context
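Retrieval itself is just nearest-neighbor search over embeddings. A toy version with hand-rolled cosine similarity; the 3-dimensional vectors are made up for illustration, where a real system would use an embedding model and a vector database:

```python
import math

# Toy semantic memory: each entry pairs stored text with a (made-up)
# embedding vector. Real vectors come from an embedding model.
memories = [
    ("user prefers morning departures", [0.9, 0.1, 0.0]),
    ("user is vegetarian",              [0.0, 0.8, 0.3]),
    ("user flies United when possible", [0.7, 0.0, 0.4]),
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, k=2):
    # Rank all memories by similarity to the query, keep the top k.
    ranked = sorted(memories, key=lambda m: cosine(query_vec, m[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]
```

A query vector close to the "morning flights" direction pulls back the departure and airline preferences while skipping the dietary fact, which is exactly the meaning-not-keyword behavior the pros list below describes.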
Pros:
- Handles unlimited historical data
- Semantic search (finds relevant context by meaning, not keyword)
- Reduces context window bloat
Cons:
- Retrieval quality matters (bad embeddings = bad memories)
- Latency (extra database lookup)
- Can mix up similar but different contexts
3. Key-Value Store (Episodic Memory)
For facts that don't change often, use a simple KV store. "User's preferred airline: United", "User's home city: San Francisco".
Before PERCEIVE step:
facts = kv_store.get("user_preferences")
→ {"airline": "United", "home_city": "San Francisco"}
→ Add to context as: "User prefers United flights and lives in San Francisco"
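This needs nothing fancier than a dict with persistence. A minimal in-memory sketch of the lookup-and-inject step; the fact-to-sentence formatting is my assumption, and a production version would persist to Redis or SQLite:

```python
# Minimal fact store: structured facts in, a context string out.
class FactStore:
    def __init__(self):
        self._facts = {}

    def update(self, key, value):
        self._facts[key] = value

    def as_context(self):
        # Render stored facts as a single sentence for the prompt.
        if not self._facts:
            return ""
        parts = [f"{k.replace('_', ' ')}: {v}"
                 for k, v in self._facts.items()]
        return "Known user facts: " + "; ".join(parts)

store = FactStore()
store.update("preferred_airline", "United")
store.update("home_city", "San Francisco")
```

The hard part isn't storage, it's the update logic: deciding when a new conversation should overwrite "preferred_airline" versus when the user was just making a one-off choice.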
Pros:
- Precise (structured facts)
- Fast (direct lookup)
- Easy to update
Cons:
- Requires manual structure (you decide what to store)
- Static (facts don't capture nuance)
- Update logic can be tricky
4. Structured Database (Procedural Memory)
Some agents learn procedures. "When booking a flight, always check weather first." This lives in a database of learned rules or workflows.
In my experience, most production agents use a mix of all four:
- In-context for the current turn
- Vector store for relevant historical context
- KV store for user facts
- Procedural for learned patterns
The tradeoff: more memory systems = more complexity. But a single memory type will eventually fail.
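Wiring the stores together happens in the PERCEIVE step, where each one contributes a slice of the prompt. A sketch under the assumption that the stores expose simple retrieve/as_context interfaces (the names and message shapes are hypothetical):

```python
# Hypothetical PERCEIVE step: assemble context from three memory
# systems before the LLM sees anything.
def build_context(user_msg, recent_turns, vector_store, fact_store):
    context = []
    facts = fact_store.as_context()        # KV store: stable user facts
    if facts:
        context.append({"role": "system", "content": facts})
    for memory in vector_store.retrieve(user_msg, k=3):  # semantic recall
        context.append({"role": "system",
                        "content": f"Relevant history: {memory}"})
    context.extend(recent_turns)           # in-context: current session
    context.append({"role": "user", "content": user_msg})
    return context
```

The ordering is a design choice: stable facts first, retrieved history next, and the live conversation last, so the newest information sits closest to where the model generates.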
Here's how these memory types fit into the agent loop:
flowchart LR
subgraph Memory["Memory Systems"]
IC["In-Context (current session)"]
VS["Vector Store (semantic search)"]
KV["Key-Value Store (user facts)"]
DB["Structured DB (learned rules)"]
end
PERCEIVE -->|load into context| IC
VS -->|top-k relevant| PERCEIVE
KV -->|user preferences| PERCEIVE
DB -->|procedures| PERCEIVE
IC --> THINK
THINK --> ACT
ACT -->|store outcomes| VS
ACT -->|update facts| KV
style Memory fill:#fafaf9,stroke:#e7e5e4
From Theory to Practice
Let me tie this together with a more complete example. Here's a simplified booking agent:
SYSTEM PROMPT:
You are a flight booking assistant. You have access to:
- search_flights(destination, date, budget): searches flights
- check_weather(city, date): checks weather forecast
- book_flight(flight_id, passenger_name): books a flight
- respond(message): sends a message to user
Always use ReAct format: Thought → Action → Observation
USER INPUT:
"Book me a flight to Denver tomorrow. I hate rain."
ITERATION 1:
Thought: The user wants a flight and cares about weather.
I should check weather first, then search flights.
Action: check_weather
Action Input: {"city": "Denver", "date": "2026-04-05"}
Tool output:
{"forecast": "sunny, 72°F"}
ITERATION 2:
Thought: Great weather in Denver. Now find flights.
Action: search_flights
Action Input: {
"destination": "Denver",
"date": "2026-04-05",
"budget_usd": null
}
Tool output:
{"flights": [
{"id": "UA123", "time": "7:30 AM", "price": "$280"},
{"id": "DL456", "time": "2:15 PM", "price": "$250"}
]}
ITERATION 3:
Thought: Both flights are available. User said no rain (weather is good).
I should ask which one they prefer before booking.
Action: respond
Action Input: {
"message": "I found two flights to Denver tomorrow.
Weather is sunny and 72°F (great!).
Option 1: United at 7:30 AM ($280)
Option 2: Delta at 2:15 PM ($250)
Which would you prefer?"
}
Agent responds and waits for user input.
Notice: The agent reasoned about the problem (thought) before acting. It even checked weather proactively because the user mentioned rain. This is what separates agents from chatbots.
Key Takeaways
- Agents run a loop: perceive → think → act → observe
- Tool use works by LLM text generation + runtime parsing
- ReAct (explicit reasoning) is the pattern that works in production
- Memory comes in flavors: in-context (fast), vector store (semantic), KV (precise), procedural (learned)
- The complexity scales with ambition, but the fundamentals stay the same
Next article: why all of this falls apart at scale.