Step one of the 12 steps is admitting you have a problem.
So here goes: I deploy agents I don’t need, for tasks a function call would handle, and I live in a constant state of multi-step agent sprawl.
OpenClaw on my VPS. Hermes on my Telegram. Open models on Ollama, some custom (+ abandoned) ones on my Ubuntu machine. Cursor agents, my daily driver. Claude, also in the melee, in a busy terminal. An OpenAI API key gathering dust until the next time Codex is the toast of the town, which is roughly every Tuesday-ish. At work, there’s another custom cast of badge-wearing, serious agents inside the IDE: chatbots that aggressively police any non-conforming prompt.
I use them for research, code generation, chat, task automation, and essentially anything that frees up time for strategic work. After years of managing people, it’s like having another pod of engineers to manage, only these ones are always on. This is possibly the future of management in tech – Agent Manager, Sr. Agent Manager and so on, but I digress.
In most projects, I keep coming back to a single structured LLM call after a few rounds of trying agents and hitting agent fatigue.
It’s like going back to Point Break, the real deal, instead of enduring eleven versions of the Fast and the Furious franchise.
Taxonomy
Anthropic’s “Building Effective Agents” post draws a line that most engineers blur:
- Workflows are LLMs and tools orchestrated through predefined code paths. You control the flow. The model fills in the blanks.
- Agents are systems where the LLM dynamically directs its own process. The model decides what to do next, which tools to call, when to stop.
Most of what folks are calling agents are possibly workflows. Not a diss – they are powerful, predictable and mostly do what you need them to do. However, calling them agents is a bit of a stretch. I’m definitely not in the elite tier of folks running workflows for 24 hours at a stretch, and I don’t really have use cases for those.
A single agent making a single tool call isn’t sprawl. Sprawl is when each step spawns another step, and you can no longer predict the shape of the execution at write time.
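In code, the difference comes down to who owns the control flow. Here is a toy sketch, where `model` is a hypothetical stand-in for any LLM call (a plain function from prompt to text, not a real client). The workflow’s call count is fixed when you write it; the agent’s depends on what the model says at run time.

```python
from typing import Callable

# Hypothetical stand-in for any LLM call: prompt in, text out.
Model = Callable[[str], str]


def workflow(model: Model, request: str) -> str:
    """Workflow: code fixes the path. Always exactly two calls, in this order."""
    outline = model(f"Outline: {request}")
    return model(f"Expand this outline: {outline}")


def agent(model: Model, request: str, max_steps: int = 10) -> list[str]:
    """Agent: the model's output decides whether another step happens."""
    history = [request]
    for _ in range(max_steps):
        action = model("\n".join(history))  # the model chooses the next action
        if action == "DONE":                # ...and when to stop
            break
        history.append(action)
    return history
```

The `max_steps` cap is the only thing bounding the agent; everything inside it is decided at run time, which is exactly the “shape you can’t predict at write time.”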
The pattern I see and fall into myself is reaching for “agents” because everyone with a bullhorn is talking about them. The Anthropic post puts it this way: “For many applications, optimizing single LLM calls with retrieval and in-context examples is usually enough.”
OpenAI says the same thing from a different angle: start with a single-agent system and only reach for a multi-agent design when the task genuinely exceeds what one model + tools can handle in a loop.
Escape from XML hell
For a recent project, I had to generate XML for a complex system: hundreds of attributes, channel-specific prefixes, cab IR slot integers. If you ask an LLM to write that XML directly, you get hallucinated attribute names, invalid floats, and a corrupt preset file.
The architecture I landed on was a single structured LLM call:
- LLM receives a system prompt with domain knowledge (era tables, tone myths, compensations)
- LLM receives a user prompt with the request (and optionally, spectral features from a reference MP3)
- LLM emits one JSON object – a `ToneDescriptor` with normalized knobs, enums, and human-readable notes
- Python validates the descriptor and translates it step by step to XML
The LLM never touches the XML. Python owns the output. The model is a proposal generator; the code is the approver.
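```python
# Single structured call: model proposes, Python validates and converts.
# Assumes `client` is an Ollama Python client (e.g. ollama.Client()) and
# ToneDescriptor is a Pydantic v2 model; the _load/_build helpers aren't shown.
response = client.chat(
    model=model,
    messages=[
        {"role": "system", "content": _load_system_prompt()},
        {"role": "user", "content": _build_user_prompt(query, guitar, tuning, audio_summary)},
    ],
    format=ToneDescriptor.model_json_schema(),  # constrain output to the schema
    options={"temperature": 0.4},
)
```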
Essentially it needed one call, one thing to evaluate and one thing to log.
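The snippet above stops at the model call. The other half, “Python validates the descriptor and translates it step by step to XML”, might look roughly like the sketch below. It is a stdlib-only stand-in for the Pydantic model the snippet implies (a dataclass with checks instead of `model_validate_json`), and every field, enum value, range, and XML element or attribute name here is hypothetical, not the real preset format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from enum import Enum


class AmpModel(str, Enum):
    # Hypothetical enum values; the real descriptor's enums aren't shown.
    CLEAN = "clean"
    CRUNCH = "crunch"
    HIGH_GAIN = "high_gain"


@dataclass(frozen=True)
class ToneDescriptor:
    amp: AmpModel
    gain: float        # normalized knob, 0.0 to 1.0
    cab_ir_slot: int   # integer IR slot (hypothetical range 1-128)
    notes: str = ""    # human-readable, never written to XML

    def __post_init__(self) -> None:
        # The approver: anything out of range is rejected before any XML exists.
        if not 0.0 <= self.gain <= 1.0:
            raise ValueError(f"gain out of range: {self.gain}")
        if isinstance(self.cab_ir_slot, bool) or not isinstance(self.cab_ir_slot, int):
            raise ValueError(f"cab_ir_slot must be an int: {self.cab_ir_slot!r}")
        if not 1 <= self.cab_ir_slot <= 128:
            raise ValueError(f"cab_ir_slot out of range: {self.cab_ir_slot}")


def from_model_output(data: dict) -> ToneDescriptor:
    """Build a descriptor from parsed model JSON; raises KeyError/ValueError if invalid."""
    return ToneDescriptor(
        amp=AmpModel(data["amp"]),  # unknown enum value -> ValueError
        gain=float(data["gain"]),
        cab_ir_slot=data["cab_ir_slot"],
        notes=str(data.get("notes", "")),
    )


def to_xml(d: ToneDescriptor) -> str:
    """Translate step by step; Python picks every element name and number format."""
    root = ET.Element("Preset")
    # Scale the normalized knob to a hypothetical 0-10 device range.
    ET.SubElement(root, "Amp", {"model": d.amp.value, "gain": f"{d.gain * 10:.2f}"})
    ET.SubElement(root, "Cab", {"irSlot": str(d.cab_ir_slot)})
    return ET.tostring(root, encoding="unicode")
```

Because `to_xml` only ever sees a validated descriptor, a hallucinated attribute name has nowhere to land: the model never writes one.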
Needless to say, I wanted to get with the times and did try the agent path first: tool loops, ReAct-style chains, “let the browser agent browse the preset folder, let the reader agent read the presets, let the writer agent write the presets, let the judge agent judge the quality”... you get the drift.
It got muddy fast, and I lost control of the code. Overall, it was expensive, harder to test, and offered zero improvement in output quality. It was like the Nic Cage version of The Wicker Man. I had no idea what the hell was going on, and after a few rounds I wasn’t even sure I was in the right file.
Here’s roughly what the agent version looked like before I abandoned it:
```python
# The agent version - what I tried and abandoned
# The model decides what to call, in what order, how many times.
# Step count: unknown. Cost: unknown. Failure mode: partial XML written to disk.
# Debugging: good luck - which tool call produced the bad attribute?
tools = [
    {"name": "read_preset_folder", "description": "List available presets"},
    {"name": "read_preset_file", "description": "Read a specific preset's XML"},
    {"name": "write_preset", "description": "Write XML to a .pdpreset file"},
]
# run_agent_loop (not shown) drives the model/tool-call loop for up to 10 turns.
response = run_agent_loop(system_prompt, user_query, tools, max_turns=10)
```
Compare to the single structured call that replaced it:
```python
# The single structured call - what actually shipped.
# The model emits JSON. Python validates it. Nothing partial reaches the filesystem.
response = client.chat(
    model=model,
    messages=[
        {"role": "system", "content": _load_system_prompt()},
        {"role": "user", "content": _build_user_prompt(query, guitar, tuning, audio_summary)},
    ],
    format=ToneDescriptor.model_json_schema(),
    options={"temperature": 0.4},
)
# Either ToneDescriptor validates or it raises.
descriptor = ToneDescriptor.model_validate_json(_extract_json(response["message"]["content"]))
```
The agent could run up to ten turns (`max_turns=10`), each one a potential failure point; this has one.
The decision you should actually be making
Here’s the rule of thumb I’ve settled on:
Can you define the output schema before you write the prompt?
If you can, you want a workflow. Define the schema, bake your domain knowledge into the system prompt, validate the output, translate it directly. It’s not glamorous, but it works. It’s also a lot easier to test and debug.
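Here is what “easier to test” can look like in practice, as a sketch. It assumes a hypothetical CI-job schema (the `REQUIRED_KEYS` set) and a `call_model` callable that you inject: in production it would wrap your real client, and in tests it returns canned JSON, so the whole path runs offline and deterministically. The `test_` functions follow pytest’s naming convention but also run as plain functions.

```python
import json
from typing import Callable

# Hypothetical schema for a generated CI job: required top-level keys.
REQUIRED_KEYS = {"name", "runs_on", "steps"}


def generate_job(request: str, call_model: Callable[[str], str]) -> dict:
    """One structured call, then validation. `call_model` is injected so tests can stub it."""
    job = json.loads(call_model(request))
    if not isinstance(job, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - set(job)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if not isinstance(job["steps"], list) or not job["steps"]:
        raise ValueError("steps must be a non-empty list")
    return job


# Tests need no network, no API key, and no agent loop: just canned model output.
def test_accepts_valid_output() -> None:
    job = generate_job(
        "build my C project",
        lambda _prompt: '{"name": "build", "runs_on": "ubuntu-latest", "steps": ["make"]}',
    )
    assert job["steps"] == ["make"]


def test_rejects_missing_steps() -> None:
    try:
        generate_job("build", lambda _prompt: '{"name": "build", "runs_on": "ubuntu-latest"}')
    except ValueError as err:
        assert "steps" in str(err)
    else:
        raise AssertionError("expected ValueError")
```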
If you can’t – the problem is genuinely open-ended, the number of steps is unknowable, and the model needs to decide what to do next based on what it finds – then you want an agent.
The distinction maps almost perfectly to problem type:
| Problem | Output schema known? | Right pattern |
|---|---|---|
| Infrastructure-as-code config from requirements | Yes – Terraform / Pulumi schema | Single structured call |
| CI pipeline config | Yes – GitHub Actions / GitLab schema | Single structured call |
| AdTech audience segment definitions | Yes – DSP segment JSON | Single structured call |
| Game NPC behavior config | Yes – engine-specific format | Single structured call |
| “Debug this production incident” | No – steps unknown | Agent |
| “Research and write a report on X” | No – scope open-ended | Agent |
Workflow problems tend to masquerade as agent problems – the rule above is how you tell them apart.
So when DO you use agents?
I’m not anti-agent. I use them daily, even with the sprawl I mentioned earlier. But the ones earning their complexity are doing things like:
- Debugging production incidents where the number of log queries is unknowable upfront
- Research tasks that require deciding whether to go deeper or pivot based on what’s found
- Code generation tasks where the files touched depend on what the model discovers
- Choosing models based on the task at hand
Essentially anything where there is a need to act, observe, decide, repeat. That’s the right problem shape for an agent.
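Stripped to its skeleton, that act, observe, decide loop looks something like this sketch. The tool names, their canned outputs, and the JSON action format are all hypothetical, and `decide` stands in for a model call. What matters is the shape: how many times the loop runs depends on what the tools return, which you can’t know when you write the code.

```python
import json
from typing import Callable


# Hypothetical stub tools for an incident-debugging agent.
def query_logs(service: str) -> str:
    return f"{service}: 3 timeouts to db in the last 5 min"


def check_db(_: str) -> str:
    return "db: connection pool exhausted"


TOOLS: dict[str, Callable[[str], str]] = {"query_logs": query_logs, "check_db": check_db}


def run_agent(decide: Callable[[list[str]], str], task: str, max_turns: int = 10) -> str:
    """Act, observe, decide, repeat, with a hard cap on turns.

    `decide` is a hypothetical model call: given the transcript so far, it returns
    JSON, either {"tool": name, "arg": value} or {"final": answer}.
    """
    transcript = [f"task: {task}"]
    for _ in range(max_turns):
        step = json.loads(decide(transcript))           # decide
        if "final" in step:
            return step["final"]
        observation = TOOLS[step["tool"]](step["arg"])  # act (unknown tool -> KeyError)
        transcript.append(f"{step['tool']}({step['arg']}) -> {observation}")  # observe
    raise RuntimeError(f"no answer after {max_turns} turns")
```

`max_turns` plays the same role as `max_turns=10` in the abandoned preset agent: it bounds cost, not correctness.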
Everything else is probably a workflow – and a workflow with a tight schema is faster, cheaper, and easier to debug than any agent you’ll ship this quarter.
The meta-problem
I remember when microservices were kewl and the way forward was to break a monolith into services to minimize complexity per unit. But that ended up creating a distributed monolith that has all the downsides of both. Service A can’t deploy without Service B, every change ripples through a tangle of APIs, and you now have network latency on top of your original problems.
Agent sprawl is teaching us the same lesson about LLM calls. Wrapping a single inference call in a multi-step agent loop sounds sophisticated. You get orchestration, delegation, tool use, the whole “LinkedIn-influencer-certified” agentic buffet. But what you actually ship is an agentic single call: a system with all the unpredictability of agents and none of the flexibility that justifies them. You’re trading determinism for the illusion of intelligence.
The engineers I see shipping reliable LLM-powered systems always ask “do I actually need the model to decide what to do next?” and, more often than not, the honest answer is no.