Multi-Agent Systems: Orchestration Patterns That Work

A single agent can book a flight. Can it also check weather, compare prices, find hotels, and make a complete travel plan? Eventually, you hit a wall.

The wall is cognitive load. An agent trying to do too much gets confused. It forgets the main goal. It makes contradictory decisions. It's like asking one person to simultaneously be a travel agent, meteorologist, and financial advisor.

That's where multi-agent systems come in. Instead of one powerful agent, you use many specialized agents and an orchestrator to coordinate them.

In my experience building systems at Amazon, we learned that this is incredibly powerful but also incredibly easy to mess up. This article covers the patterns that work.

Why Single Agents Hit Limits

Let's say you want an agent that:

- Searches flights
- Checks weather
- Searches hotels
- Compares prices
- Checks visa requirements
- Books flights and hotels
- Generates an itinerary

A single agent trying to do all this:

Agent prompt:
"You are a travel planning assistant.
 You can search flights, check weather, search hotels, etc.
 Plan a complete trip to [destination]."

Agent's challenge:
- 7 different domains of expertise
- 7 different error modes
- 7 tools competing for attention
- Huge prompt (all tool descriptions)
- High context usage
- Easy to lose track of the goal

Real problem: When an agent has 20 tools, it often forgets why it's calling them. It searches flights without remembering the original budget. It books hotels without checking if the flight time makes sense.

The Orchestrator-Worker Pattern

The solution: one orchestrator agent that delegates to specialized worker agents.

flowchart TD
    U([User]) --> O[Orchestrator Agent]
    O --> FW[Flight Worker]
    O --> HW[Hotel Worker]
    O --> WW[Weather Worker]
    O --> IW[Itinerary Worker]
    FW -->|flight results| O
    HW -->|hotel results| O
    WW -->|weather data| O
    IW -->|final plan| U
    style U fill:#f5f5f4,stroke:#e7e5e4
    style O fill:#fafaf9,stroke:#a8a29e

User input: "Plan a trip to Denver for 3 days, budget $2000"

Orchestrator Agent:
"I need to:
 1. Find flights (delegate to FlightWorker)
 2. Find hotels (delegate to HotelWorker)
 3. Check weather (delegate to WeatherWorker)
 4. Create itinerary (aggregate results)

 Let me start..."

Orchestrator calls FlightWorker:
"Find flights to Denver for 3 days, under $600 total
 (leaving $1400 for hotel and activities)"

FlightWorker searches and returns:
"Best option: United 7:30 AM, $280 round-trip"

Orchestrator calls HotelWorker:
"Find hotels in Denver, 2 nights, budget $600"

HotelWorker searches and returns:
"Best option: Marriott downtown, $300/night, $600 total"

Orchestrator calls WeatherWorker:
"What's the weather in Denver April 5-7?"

WeatherWorker returns:
"Sunny, 65°F"

Orchestrator synthesizes:
"Here's your trip:
 - Flight: United 7:30 AM ($280)
 - Hotel: Marriott downtown ($600)
 - Weather: Sunny, 65°F
 - Total: $880 (under budget!)
 - Free time: $1120 for activities"

How It Works

Orchestrator structure:

class Orchestrator:
    def __init__(self):
        self.workers = {
            "flights": FlightWorker(),
            "hotels": HotelWorker(),
            "weather": WeatherWorker(),
            "itinerary": ItineraryWorker()
        }

    def plan_trip(self, destination, duration, budget):
        # Step 1: Decompose goal
        plan = {
            "destination": destination,
            "duration": duration,
            "budget": budget,
            "flight_budget": budget * 0.3,  # 30% for flights
            "hotel_budget": budget * 0.5,   # 50% for hotels
            "activities_budget": budget * 0.2  # 20% for activities
        }

        # Step 2: Delegate to workers
        flights = self.workers["flights"].search(
            destination,
            budget=plan["flight_budget"]
        )

        hotels = self.workers["hotels"].search(
            destination,
            duration,
            budget=plan["hotel_budget"]
        )

        weather = self.workers["weather"].check(destination)

        # Step 3: Aggregate results
        itinerary = self.workers["itinerary"].create(
            flights=flights,
            hotels=hotels,
            weather=weather,
            budget=plan
        )

        return itinerary

Each worker is a simple, focused agent:

class FlightWorker:
    def search(self, destination, budget):
        # Only does one thing: find flights.
        # Small prompt, single purpose, easy to test.
        # `api` stands in for your flight-search client (not shown here).
        return api.search_flights(destination, budget)

class HotelWorker:
    def search(self, destination, duration, budget):
        # Only does one thing: find hotels
        return api.search_hotels(destination, duration, budget)

Key insight: Workers don't need to be LLM-based. They can be simple functions that call APIs. Only the orchestrator needs reasoning.
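To make that concrete, here is a minimal sketch in which every worker is a plain function and the orchestrator only splits the budget, delegates, and aggregates. The in-memory flight and hotel data, the function names, and the 30/50 budget split are illustrative assumptions; a real system would call actual search APIs.

```python
from dataclasses import dataclass

# Hypothetical in-memory inventory standing in for real flight/hotel APIs.
FLIGHTS = [
    {"carrier": "United", "price": 280},
    {"carrier": "Delta", "price": 650},
]
HOTELS = [
    {"name": "Marriott downtown", "nightly": 300},
    {"name": "Budget Inn", "nightly": 120},
]


def flight_worker(budget: float) -> list[dict]:
    """Return flights at or under the budget, cheapest first."""
    matches = [f for f in FLIGHTS if f["price"] <= budget]
    return sorted(matches, key=lambda f: f["price"])


def hotel_worker(nights: int, budget: float) -> list[dict]:
    """Return hotels whose total stay cost fits the budget, cheapest first."""
    matches = [
        {**h, "total": h["nightly"] * nights}
        for h in HOTELS
        if h["nightly"] * nights <= budget
    ]
    return sorted(matches, key=lambda h: h["total"])


@dataclass
class TripPlan:
    flight: dict
    hotel: dict
    total: float
    remaining: float


def plan_trip(nights: int, budget: float) -> TripPlan:
    """Orchestrator: split the budget, delegate, then aggregate."""
    flights = flight_worker(budget * 0.3)        # 30% for flights
    hotels = hotel_worker(nights, budget * 0.5)  # 50% for hotels
    if not flights or not hotels:
        raise ValueError("no options within budget")
    flight, hotel = flights[0], hotels[0]
    total = flight["price"] + hotel["total"]
    return TripPlan(flight, hotel, total, budget - total)


plan = plan_trip(nights=2, budget=2000)
print(plan.flight["carrier"], plan.hotel["name"], plan.total, plan.remaining)
```

No LLM is involved anywhere in this sketch. In practice you would add reasoning to the orchestrator first, and to a worker only when its task needs judgment.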

The Critic Pattern

Sometimes you want multiple agents to tackle the same problem independently, each from its own perspective, and then have a critic agent weigh their answers and reconcile the conflicts.

User: "How should I structure my React component?"

Expert 1 (Performance Agent):
"Use React.memo to prevent re-renders.
 Split into smaller components.
 Use useMemo for expensive calculations."

Expert 2 (Maintainability Agent):
"Create a custom hook to isolate logic.
 Use descriptive component names.
 Keep components under 100 lines."

Expert 3 (Accessibility Agent):
"Add ARIA labels.
 Ensure keyboard navigation.
 Use semantic HTML."

Critic Agent reviews all three:
"Great suggestions. I recommend:
 1. Use React.memo (Expert 1)
 2. Create custom hook (Expert 2)
 3. Add ARIA labels (Expert 3)
 4. Keep components under 100 lines (Expert 2)
 Note: Performance optimization via React.memo should
 come after ensuring accessibility (Expert 3 priority)."

Implementation

import json

class ExpertPanel:
    def __init__(self):
        self.experts = [
            PerformanceExpert(),
            MaintainabilityExpert(),
            AccessibilityExpert()
        ]
        self.critic = CriticAgent()

    def evaluate(self, problem):
        # Get independent opinions
        opinions = []

        for expert in self.experts:
            opinion = expert.analyze(problem)
            opinions.append({
                "expert": expert.name,
                "opinion": opinion
            })

        # Have critic synthesize
        synthesis = self.critic.synthesize(problem, opinions)

        return synthesis

class CriticAgent:
    def synthesize(self, problem, opinions):
        # Prompt: evaluate all opinions
        prompt = f"""
        Problem: {problem}

        Opinions:
        {json.dumps(opinions, indent=2)}

        Synthesize these opinions into a single recommendation.
        Explain tradeoffs. Prioritize when opinions conflict.
        """

        return llm.generate(prompt)

When to use: Complex problems with multiple perspectives. Technical decisions with tradeoffs.
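Here is a small runnable sketch of the same flow, with a deterministic critic standing in for the LLM call so the merging logic is easy to see. The expert functions, the numeric priorities, and the merge rule are all assumptions made for illustration; an LLM-based critic would weigh tradeoffs in free text instead.

```python
from typing import Callable

# Each hypothetical expert returns (priority, recommendation) pairs.
# A lower priority number means "do this first".
Opinion = list[tuple[int, str]]


def performance_expert(problem: str) -> Opinion:
    return [(3, "Use React.memo"), (2, "Split into smaller components")]


def maintainability_expert(problem: str) -> Opinion:
    return [(2, "Create a custom hook"), (2, "Split into smaller components")]


def accessibility_expert(problem: str) -> Opinion:
    return [(1, "Add ARIA labels"), (1, "Use semantic HTML")]


def critic(problem: str, experts: dict[str, Callable[[str], Opinion]]) -> list[dict]:
    """Merge opinions: dedupe advice, keep the most urgent priority,
    and record which experts backed each recommendation."""
    merged: dict[str, dict] = {}
    for name, expert in experts.items():
        for priority, advice in expert(problem):
            entry = merged.setdefault(
                advice, {"advice": advice, "priority": priority, "sources": []}
            )
            entry["priority"] = min(entry["priority"], priority)
            entry["sources"].append(name)
    # sorted() is stable, so ties keep the order in which advice was first seen.
    return sorted(merged.values(), key=lambda e: e["priority"])


experts = {
    "performance": performance_expert,
    "maintainability": maintainability_expert,
    "accessibility": accessibility_expert,
}
ranked = critic("How should I structure my React component?", experts)
for item in ranked:
    print(item["priority"], item["advice"], item["sources"])
```

The experts never see each other's output, which keeps their opinions independent. Only the critic sees everything.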

Execution Patterns: Sequential vs. Parallel

Sequential Execution

One worker finishes, next worker starts:

FlightWorker (starts)
  ↓
  (searches, takes 2s)
  ↓
FlightWorker (finishes)

HotelWorker (starts)
  ↓
  (searches, takes 2s)
  ↓
HotelWorker (finishes)

Total time: 4s

Pros: Simple, easy to debug, one thing at a time

Cons: Slow, orchestrator waits on each worker

Parallel Execution

Multiple workers run simultaneously:

FlightWorker (starts)  ─┐
                        │
HotelWorker (starts)  ──┼─ Run in parallel
                        │
WeatherWorker (starts) ─┘

Wait for all to finish

Total time: 2s (instead of 6s if the three ran sequentially)

Pros: Fast, utilizes resources

Cons: Harder to debug, harder to handle failures
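You can see the timing difference with a quick experiment. This sketch uses a fake worker that just sleeps, so the function names and the 0.2s delay are placeholders rather than real search calls.

```python
import time
from concurrent.futures import ThreadPoolExecutor

NAMES = ["flights", "hotels", "weather"]


def fake_worker(name: str, delay: float = 0.2) -> str:
    """Simulate an I/O-bound worker (e.g. an API call) by sleeping."""
    time.sleep(delay)
    return f"{name} done"


def run_sequential() -> tuple[list[str], float]:
    start = time.perf_counter()
    results = [fake_worker(n) for n in NAMES]
    return results, time.perf_counter() - start


def run_parallel() -> tuple[list[str], float]:
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(NAMES)) as pool:
        # map() returns results in input order, regardless of finish order.
        results = list(pool.map(fake_worker, NAMES))
    return results, time.perf_counter() - start


seq_results, seq_time = run_sequential()
par_results, par_time = run_parallel()
print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

Threads help here because the workers spend their time waiting on I/O. CPU-heavy workers would need processes instead.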

Implementation

from concurrent.futures import ThreadPoolExecutor

class OrchestratorParallel(Orchestrator):
    # Inherits self.workers from the Orchestrator class shown earlier.
    def plan_trip(self, destination, duration, budget):
        with ThreadPoolExecutor(max_workers=3) as executor:
            # Submit all three tasks before waiting on any of them
            flight_task = executor.submit(
                self.workers["flights"].search,
                destination,
                budget * 0.3
            )
            hotel_task = executor.submit(
                self.workers["hotels"].search,
                destination,
                duration,
                budget * 0.5
            )
            weather_task = executor.submit(
                self.workers["weather"].check,
                destination
            )

            # Wait for all to complete. result() blocks, and re-raises
            # any exception the worker raised.
            flights = flight_task.result()
            hotels = hotel_task.result()
            weather = weather_task.result()

        # Now aggregate (aggregate() is not shown; it would hand the
        # results to the itinerary worker)
        return self.aggregate(flights, hotels, weather)

Real choice: Use parallel when workers are independent (flights and hotels). Use sequential when one depends on another (check weather, then choose activity).
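Failure handling is the part of parallel execution that usually gets skipped. One way to approach it, sketched below, is to collect each worker's outcome separately so that one broken worker doesn't discard the others' results. The three worker functions are stand-ins (the hotel one fails on purpose), and the `gather` helper and its error format are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def search_flights(destination: str) -> list[dict]:
    return [{"carrier": "United", "price": 280}]


def search_hotels(destination: str) -> list[dict]:
    # Simulated outage so the error path is exercised.
    raise ConnectionError("hotel API unreachable")


def check_weather(destination: str) -> dict:
    return {"forecast": "Sunny", "high_f": 65}


def gather(destination: str, timeout_s: float = 5.0) -> tuple[dict, dict]:
    """Run independent workers in parallel; return (results, errors) by name."""
    tasks = {
        "flights": search_flights,
        "hotels": search_hotels,
        "weather": check_weather,
    }
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = {name: pool.submit(fn, destination) for name, fn in tasks.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=timeout_s)
            except FutureTimeout:
                # Note: leaving the `with` block still waits for the thread
                # to finish; the timeout only stops us waiting on result().
                errors[name] = "timed out"
            except Exception as exc:
                errors[name] = f"{type(exc).__name__}: {exc}"
    return results, errors


results, errors = gather("Denver")
print("ok:", sorted(results))
print("failed:", errors)
```

With results and errors separated, the orchestrator can decide per worker whether to retry, fall back, or return a partial plan.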

Agent-to-Agent Communication: A2A Protocol

How do agents talk to each other? There needs to be a standard format.

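Google published A2A (Agent2Agent), an open protocol that lets agents discover and call each other. The real protocol defines its own message schema on top of JSON-RPC 2.0 over HTTP; the example below is a simplified illustration of the idea, not the actual wire format: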

Agent A calls Agent B:

Request:
{
  "agent_id": "agent_a",
  "target_agent": "agent_b",
  "task": "search_hotels",
  "parameters": {
    "city": "Denver",
    "nights": 2,
    "budget": 600
  },
  "context": {
    "user_preferences": {"luxury": false},
    "constraints": {"must_have_wifi": true}
  }
}

Agent B responds:

Response:
{
  "status": "success",
  "result": {
    "hotels": [
      {
        "name": "Marriott",
        "price": 300,
        "rating": 4.5,
        "has_wifi": true
      }
    ],
    "search_metadata": {
      "results_count": 1,
      "search_time_ms": 245
    }
  }
}

Benefits:

- Standard format (all agents use same protocol)
- Includes context (Agent B knows user preferences)
- Includes metadata (helps orchestrator debug)
- Easy to extend (can add custom fields)
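If you roll your own envelope in this style, it helps to define it once in code and validate it on receipt. The sketch below mirrors the simplified request and response shown above; it is not the official A2A schema, and the `AgentRequest` class and `handle` function are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any

REQUIRED = {"agent_id", "target_agent", "task", "parameters"}


@dataclass
class AgentRequest:
    agent_id: str
    target_agent: str
    task: str
    parameters: dict[str, Any]
    context: dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "AgentRequest":
        data = json.loads(raw)
        if not isinstance(data, dict):
            raise ValueError("request must be a JSON object")
        missing = REQUIRED - data.keys()
        if missing:
            raise ValueError(f"missing fields: {sorted(missing)}")
        return cls(**data)  # raises TypeError on unknown fields


def handle(raw: str) -> str:
    """Hypothetical Agent B: parse the request, dispatch on task, reply as JSON."""
    try:
        req = AgentRequest.from_json(raw)
    except (ValueError, TypeError) as exc:
        return json.dumps({"status": "error", "error": str(exc)})
    if req.task != "search_hotels":
        return json.dumps({"status": "error", "error": f"unknown task: {req.task}"})

    # Made-up inventory; the context travels with the request, so the
    # worker can honor user constraints without asking the orchestrator.
    hotels = [{"name": "Marriott", "price": 300, "rating": 4.5, "has_wifi": True}]
    if req.context.get("constraints", {}).get("must_have_wifi", False):
        hotels = [h for h in hotels if h["has_wifi"]]
    return json.dumps({"status": "success", "result": {"hotels": hotels}})


request = AgentRequest(
    agent_id="agent_a",
    target_agent="agent_b",
    task="search_hotels",
    parameters={"city": "Denver", "nights": 2, "budget": 600},
    context={"constraints": {"must_have_wifi": True}},
)
reply = json.loads(handle(request.to_json()))
bad_reply = json.loads(handle('{"task": "search_hotels"}'))
print(reply["status"], bad_reply["status"])
```

Returning a structured error instead of raising keeps the caller's handling uniform: every reply has a `status` field to branch on.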

Failure Modes of Multi-Agent Systems

Just because you split agents doesn't mean it works. Here are common failure modes:

1. Lost Context

The orchestrator gives each worker only a piece of the goal, so the worker never sees the original goal and cannot check its result against it.

Original goal: "Plan a trip under $2000"

Orchestrator tells FlightWorker:
"Find flights under $600"

FlightWorker searches:
→ Finds United flight for $500
→ Doesn't check if this leaves enough budget for hotel
→ Recommends it anyway

Orchestrator tells HotelWorker:
"Find hotel under $1000"

HotelWorker searches:
→ Books 5-star hotel for $900/night
→ Total trip cost: $500 + $900 × 2 = $2300
→ Over budget!

Mitigation: Pass full context to workers, not just partial requests:

# Instead of this:
flights = flight_worker.search(
    destination="Denver",
    budget=600
)

# Do this:
flights = flight_worker.search(
    destination="Denver",
    budget=600,
    context={
        "total_budget": 2000,
        "other_costs": 1400,  # hotel + activities: 2000 - 600
        "reason": "User planning a 3-day trip"
    }
)
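Passing context only helps if the worker uses it. Here is one way a worker might check its pick against the whole-trip budget. The candidate flights, the status strings, and the numbers are made up for illustration.

```python
def search_flights(destination: str, budget: float, context: dict) -> dict:
    """Hypothetical worker: pick the cheapest flight, then sanity-check it
    against the whole-trip budget carried in `context`."""
    # Made-up candidates standing in for a real flight search.
    candidates = [
        {"carrier": "United", "price": 500},
        {"carrier": "Frontier", "price": 310},
    ]
    affordable = [f for f in candidates if f["price"] <= budget]
    if not affordable:
        return {"status": "no_match", "flight": None}

    best = min(affordable, key=lambda f: f["price"])
    remaining = context["total_budget"] - context["other_costs"] - best["price"]
    return {
        # Flag the problem instead of silently recommending the flight.
        "status": "ok" if remaining >= 0 else "over_total_budget",
        "flight": best,
        "remaining": remaining,
    }


ok = search_flights("Denver", 600, {"total_budget": 2000, "other_costs": 1400})
over = search_flights("Denver", 600, {"total_budget": 2000, "other_costs": 1800})
print(ok["status"], ok["remaining"])      # ok 290
print(over["status"], over["remaining"])  # over_total_budget -110
```

The second call models the failure above: the hotel already consumed $1800, so even the cheapest flight blows the budget, and the worker says so.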

2. Orchestrator Bottleneck

The orchestrator becomes the bottleneck. Every decision goes through it.

If orchestrator has to:
- Check worker results
- Validate outputs
- Handle worker failures
- Retry workers
- Aggregate results

Then orchestrator becomes complex and slow.

Mitigation: Let workers be autonomous. Only orchestrate high-level goals:

# Heavy orchestration (bad)
class OrchestratorBad:
    def book_trip(self, destination):
        flights = self.search_flights(destination)

        # Check flights
        if not flights:
            return {"error": "No flights"}

        # Check weather
        weather = self.check_weather(destination)

        # Book only if good weather
        if weather.rain_probability < 0.3:
            booking = self.book_flight(flights[0])
            return booking
        else:
            return {"error": "Bad weather"}

# Light orchestration (good)
class OrchestratorGood:
    def book_trip(self, destination):
        # Delegate everything to workers
        plan = self.trip_planner_agent.plan(destination)
        return plan

# TripPlannerAgent handles all logic internally
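What "autonomous" can look like in practice: the worker owns its retries and output validation, and hands the orchestrator a finished result or a clear failure. This is a sketch under assumptions; `FlakyFlightAPI` is a fake client that fails a fixed number of times, and the retry counts and backoff are arbitrary.

```python
import time


class FlakyFlightAPI:
    """Hypothetical API client that fails a fixed number of times, then succeeds."""

    def __init__(self, failures: int):
        self.failures = failures
        self.calls = 0

    def search(self, destination: str) -> list[dict]:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("temporary outage")
        # The second record is deliberately malformed.
        return [{"carrier": "United", "price": 280}, {"carrier": "Bad", "price": -1}]


class FlightWorker:
    """Handles its own retries and validation so the orchestrator doesn't have to."""

    def __init__(self, api, max_attempts: int = 3, backoff_s: float = 0.01):
        self.api = api
        self.max_attempts = max_attempts
        self.backoff_s = backoff_s

    def search(self, destination: str) -> dict:
        last_error = None
        for attempt in range(1, self.max_attempts + 1):
            try:
                raw = self.api.search(destination)
            except ConnectionError as exc:
                last_error = exc
                if attempt < self.max_attempts:
                    time.sleep(self.backoff_s * attempt)  # linear backoff
                continue
            # Validate before returning: drop records with nonsensical prices.
            valid = [f for f in raw if f["price"] > 0]
            return {"status": "ok", "flights": valid, "attempts": attempt}
        return {"status": "failed", "error": str(last_error),
                "attempts": self.max_attempts}


recovered = FlightWorker(FlakyFlightAPI(failures=2)).search("Denver")
gave_up = FlightWorker(FlakyFlightAPI(failures=5)).search("Denver")
print(recovered)
print(gave_up)
```

The orchestrator now only branches on `status`. It never sees the transient errors or the bad record.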

3. Worker Conflicts

Two workers make conflicting decisions.

FlightWorker finds flight at 11 PM (cheap!)
HotelWorker finds hotel 30 miles from airport
ItineraryWorker realizes: can't get from airport to hotel at midnight

Result: Invalid plan

Mitigation: Have workers share constraints:

constraints = {
    "flight_arrival_time": "between 6 AM and 10 PM",
    "hotel_distance_from_airport": "< 20 miles",
    "total_travel_time": "< 2 hours"
}

flights = flight_worker.search(destination, constraints=constraints)
hotels = hotel_worker.search(destination, constraints=constraints)
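String constraints like the ones above are fine for an LLM-based worker, but function-based workers need something they can evaluate. A possible structured version, with made-up candidate data and hypothetical field names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraints:
    earliest_arrival_hour: int = 6      # 6 AM
    latest_arrival_hour: int = 22       # 10 PM
    max_hotel_miles_from_airport: float = 20.0


def filter_flights(flights: list[dict], c: Constraints) -> list[dict]:
    """Keep flights that land inside the allowed arrival window."""
    return [
        f for f in flights
        if c.earliest_arrival_hour <= f["arrival_hour"] <= c.latest_arrival_hour
    ]


def filter_hotels(hotels: list[dict], c: Constraints) -> list[dict]:
    """Keep hotels closer to the airport than the limit."""
    return [
        h for h in hotels
        if h["miles_from_airport"] < c.max_hotel_miles_from_airport
    ]


# Made-up candidates: the cheapest option in each list violates a constraint.
flights = [
    {"carrier": "RedEye Air", "price": 180, "arrival_hour": 23},
    {"carrier": "United", "price": 280, "arrival_hour": 14},
]
hotels = [
    {"name": "Far Lodge", "nightly": 150, "miles_from_airport": 30},
    {"name": "Marriott downtown", "nightly": 300, "miles_from_airport": 8},
]

shared = Constraints()
valid_flights = filter_flights(flights, shared)
valid_hotels = filter_hotels(hotels, shared)
print(valid_flights[0]["carrier"], valid_hotels[0]["name"])
```

Because both workers filter against the same frozen object, neither can quietly pick the cheap option that breaks the other's assumptions.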

4. Cascading Failures

One worker fails. Orchestrator doesn't know how to recover.

FlightWorker fails (API down)
Orchestrator stops
Entire trip planning fails

Should have fallen back to:
- Show cached flights
- Show flights from yesterday
- Ask user to try again

Mitigation: Build explicit fallback logic:

def search_flights_with_fallback(destination, budget):
    try:
        return flight_api.search(destination, budget)
    except APIDownError:
        # Fallback 1: Use cached results from last hour
        cached = cache.get_flights(destination)
        if cached:
            return cached

        # Fallback 2: Use historical data
        historical = db.get_avg_flights(destination)
        if historical:
            return historical

        # Fallback 3: Inform user (only reached if both fallbacks are empty)
        raise UserError(
            "Can't search flights right now. "
            "Try again in a few minutes."
        )
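If several workers need the same try-then-fall-back behavior, the chain can be pulled into a small helper. This is a sketch, not a prescribed design: `with_fallbacks` and the three stub sources are hypothetical, and it also returns which source answered so the caller can tell the user when data is stale.

```python
from typing import Any, Callable


def with_fallbacks(steps: list[tuple[str, Callable[[], Any]]]) -> tuple[str, Any]:
    """Try each (label, callable) in order; return (label, value) for the
    first one that succeeds with a non-empty result."""
    errors = []
    for label, step in steps:
        try:
            value = step()
        except Exception as exc:
            errors.append(f"{label}: {exc}")
            continue
        if value:  # treat empty results as a miss and keep going
            return label, value
        errors.append(f"{label}: empty result")
    raise RuntimeError("all sources failed: " + "; ".join(errors))


# Stub sources: live API is down, cache is empty, historical data exists.
def live_search() -> list[dict]:
    raise ConnectionError("API down")


def cached_search() -> list[dict]:
    return []


def historical_search() -> list[dict]:
    return [{"carrier": "United", "avg_price": 295}]


source, flights = with_fallbacks([
    ("live", live_search),
    ("cache", cached_search),
    ("historical", historical_search),
])
print(source, flights)
```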

When NOT to Use Multi-Agent Systems

Multi-agent sounds powerful. But it adds complexity. Sometimes a single agent is better.

Use single agent if:

- Problem is simple (1-3 tools)
- Tools are tightly coupled (can't separate)
- Low latency is critical (orchestration adds overhead)
- Domain is narrow (expert focus possible)

Use multi-agent if:

- Problem is complex (5+ domains)
- Tools are independent
- Accuracy matters more than latency
- Different tools have different requirements

Real rule: Don't split agents until you have to.

Practical Example: Customer Support System

Here's a real multi-agent system:

from concurrent.futures import ThreadPoolExecutor

class CustomerSupportOrchestrator:
    def __init__(self):
        self.agents = {
            "intent": IntentClassifier(),  # What does user want?
            "faq": FAQAgent(),              # Is this a common question?
            "account": AccountAgent(),      # Access account data?
            "billing": BillingAgent(),      # Billing issue?
            "technical": TechnicalAgent(),  # Technical issue?
            "resolver": ResolverAgent()     # Final resolution
        }

    def handle_support_request(self, user_input, user_id):
        # Step 1: Classify intent
        intent = self.agents["intent"].classify(user_input)
        # Returns: "billing", "technical", "refund", etc.

        # Step 2: Try FAQ
        faq_result = self.agents["faq"].search(user_input)
        if faq_result["confidence"] > 0.9:
            return {"type": "faq", "answer": faq_result["answer"]}

        # Step 3: Gather context (parallel)
        # Submit every lookup first, then collect the results, so the
        # two calls overlap instead of running back to back.
        billing_task = account_task = None
        with ThreadPoolExecutor(max_workers=2) as executor:
            if intent in ["billing", "refund"]:
                billing_task = executor.submit(
                    self.agents["billing"].get_history,
                    user_id
                )
            if "account" in user_input.lower():
                account_task = executor.submit(
                    self.agents["account"].get_details,
                    user_id
                )

            billing_context = billing_task.result() if billing_task else None
            account_context = account_task.result() if account_task else None

        # Step 4: Resolve
        resolution = self.agents["resolver"].resolve(
            user_input,
            intent=intent,
            billing_context=billing_context,
            account_context=account_context
        )

        return resolution

This system:

- Classifies the problem once
- Gathers only needed context
- Delegates resolution to specialized agent
- Scales easily (add more agents)
- Maintains focus (each agent has one job)

Key Takeaways

Multi-agent systems work when:

- Single orchestrator coordinates specialized workers
- Workers are autonomous and focus on one domain
- Communication is standard (via A2A or similar)
- Execution is efficient (parallel when possible)
- Failures are handled (explicit fallbacks)

Common patterns:

- Orchestrator-Worker: One coordinator, many specialists
- Critic Pattern: Multiple experts, one critic
- Parallel Execution: Run independent workers simultaneously
- Sequential with Feedback: Later workers learn from earlier results

Don't split agents until you need to. But when problems get complex, multi-agent systems are often the only way forward.

Start simple. Add agents as needed.