this is an interesting piece by Dex Horthy of HumanLayer (an ai startup working in the LLMOps space). it mimics heroku's "12-factor app" methodology (of which i was honestly unaware, my inexperience is showing 🙈). i've listed some takeaways below!
(a) mix non-determinism into a deterministic system
I’ve been surprised to find that most of the products out there billing themselves as “AI Agents” are not all that agentic. A lot of them are mostly deterministic code, with LLM steps sprinkled in at just the right points to make the experience truly magical.
reasons for this include —
approaching agentic capabilities as a "refactor" of existing code makes things more tractable
a small, focused approach ensures you can get results TODAY, while preparing you to slowly expand agent scope as LLM context windows become more reliable. (If you’ve refactored large deterministic code bases before, you may be nodding your head right now.)
you get the benefits of modularization
- Manageable Context: Smaller context windows mean better LLM performance
- Clear Responsibilities: Each agent has a well-defined scope and purpose
- Better Reliability: Less chance of getting lost in complex workflows
- Easier Testing: Simpler to test and validate specific functionality
- Improved Debugging: Easier to identify and fix issues when they occur
this makes sense to me — throwing away a bunch of application code (in the short term) seems rather silly. and thinking of "making your application more agentic" as a slow refactor toward increasingly LLM-dependent code as capabilities improve seems way more likely to succeed than the "build out a new agent product" route
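to make that concrete, here's roughly what "mostly deterministic code with an LLM step sprinkled in" could look like (the ticket pipeline and the LLM stand-in are hypothetical, not from the post):

```python
# hypothetical support-ticket flow: deterministic code with one LLM step sprinkled in
def classify_with_llm(text: str) -> str:
    # stand-in for a real LLM call that returns "billing", "bug", etc.
    return "bug" if "error" in text.lower() else "billing"

def handle_ticket(body: str) -> str:
    # deterministic: validate and normalize input
    body = body.strip()
    if not body:
        return "closed:empty"

    # non-deterministic: one small, well-scoped LLM step
    category = classify_with_llm(body)

    # deterministic: routing stays plain old code
    routes = {"billing": "queue:billing", "bug": "queue:engineering"}
    return routes.get(category, "queue:triage")

print(handle_ticket("i keep getting an error on checkout"))  # queue:engineering
```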
(b) llms are stateless functions
or as dex puts it —
Make your agent a stateless reducer
one existing bias this post made me realize i have is that i inherently think of llms as "chats" — a continuous thread of inputs from the user and messages from the LLM. but most LLM APIs are not inherently stateful; that's usually an abstraction added by the "agent library" applications are built on top of.
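here's a tiny sketch of that framing (all the names here are mine, not the post's): the agent step is a pure function of the state you hand it, and the "chat" is just whatever you choose to serialize into the next context window.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentState:
    events: tuple = ()  # everything that has happened so far, in order

def render_context(events: tuple) -> str:
    # you own this serialization; it doesn't have to look like a chat log
    return "\n".join(str(e) for e in events)

def fake_llm(context: str) -> dict:
    # stand-in for a real completion call
    return {"type": "next_action", "action": "done"}

def agent_step(state: AgentState, event: dict) -> AgentState:
    # stateless reducer: same (state, event) in, same new state out
    context = render_context(state.events + (event,))
    action = fake_llm(context)
    return AgentState(events=state.events + (event, action))

state = agent_step(AgentState(), {"type": "user_request", "text": "deploy v1.2"})
print(state.events)
```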
some neat examples of this from the post —
compacting errors in the context window
you don’t need to just put the raw error back on, you can completely restructure how it’s represented, remove previous events from the context window, or whatever deterministic thing you find works to get an agent back on track.
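a sketch of that kind of deterministic cleanup (the event shapes are made up): collapse back-to-back failures into one short summary before building the next context window.

```python
def compact_errors(events: list[dict]) -> list[dict]:
    compacted: list[dict] = []
    for event in events:
        prev = compacted[-1] if compacted else None
        if event["type"] == "tool_error" and prev and prev["type"] == "tool_error":
            # collapse consecutive failures into a single summary entry
            count = prev.get("count", 1) + 1
            compacted[-1] = {
                "type": "tool_error",
                "count": count,
                "summary": f"{count} failed attempts, latest: {event['message'][:100]}",
            }
        else:
            compacted.append(dict(event))
    return compacted

history = [
    {"type": "tool_call", "tool": "deploy"},
    {"type": "tool_error", "message": "timeout connecting to cluster"},
    {"type": "tool_error", "message": "timeout connecting to cluster"},
    {"type": "tool_error", "message": "permission denied"},
]
print(compact_errors(history)[-1])  # one summary entry instead of three raw errors
```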
using non-chat style context windows
(you can) Enable agents to be triggered by non-humans, e.g. events, crons, outages, whatever else. They may work for 5, 20, 90 minutes, but when they get to a critical point, they can contact a human for help, feedback, or approval.
pre-fetching context before the LLM call
If there’s a high chance that your model will call tool X, don’t waste token round trips telling the model to fetch it
You might as well just fetch [the data] and include them in the context window
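for instance (my example, not the post's): if an incident agent will almost certainly ask for recent deploys, just fetch them up front and write them into the first context window.

```python
def fetch_recent_deploys(service: str) -> list[str]:
    # stand-in for a real deterministic lookup (db query, api call, etc.)
    return [f"{service} v1.4.2 at 14:02", f"{service} v1.4.1 at 09:30"]

def build_incident_context(incident: str, service: str) -> str:
    # pre-fetched data goes straight in; no "please call the deploys tool" round trip
    deploys = "\n".join(fetch_recent_deploys(service))
    return (
        f"incident: {incident}\n"
        f"recent deploys:\n{deploys}\n"
        "what should we do next?"
    )

print(build_incident_context("checkout latency spike", "payments"))
```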
(c) own as much as you can
a lot of the rules (2: Own your prompts, 3: Own your context window, 8: Own your control flow) boiled down to "don't give up too much to frameworks / apis". it argues that the simplicity of existing "agent frameworks" tends to obfuscate key parts of the process that are necessary to ensure your system is reliable (especially when models, best practices, and capabilities are changing as rapidly as they are).
(d) tool calls can be used creatively
for example, as factor 7 states, you can Contact humans with tool calls. the same tool call can also be handled differently depending on the state of the application or its context, abstracting away some of the complexity as described in factor 4.
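a rough sketch of that idea (the tool name and state fields are invented): the model emits one `request_human_approval` tool call, and the application decides how that actually reaches a person based on its own state.

```python
def handle_tool_call(call: dict, app_state: dict) -> dict:
    if call["tool"] == "request_human_approval":
        # same tool call from the model, different delivery depending on app state
        if app_state.get("channel") == "slack":
            return {"status": "pending", "via": "slack message to the on-call channel"}
        if app_state.get("urgency") == "high":
            return {"status": "pending", "via": "page the on-call engineer"}
        return {"status": "pending", "via": "daily email digest"}
    return {"status": "unhandled", "tool": call["tool"]}

print(handle_tool_call({"tool": "request_human_approval"}, {"channel": "slack"}))
print(handle_tool_call({"tool": "request_human_approval"}, {"urgency": "high"}))
```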
(e) everything is context engineering
Remember: The context window is your primary interface with the LLM. Taking control of how you structure and present information can dramatically improve your agent’s performance.
- Context includes: prompts, instructions, RAG documents, history, tool calls, memory
- Optimize context format for token efficiency and LLM understanding
- Consider custom context formats beyond standard message-based approaches
- Structure information for maximum density and clarity
- Include error information in formats that help LLMs recover
- Control what information gets passed to LLMs (filter sensitive data)
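as a toy example of "custom context formats" (the tags and event shapes are mine): render the whole history into one dense block instead of a standard message list, keeping only what the model actually needs.

```python
def render_context(events: list[dict]) -> str:
    lines = []
    for e in events:
        if e["type"] == "user_request":
            lines.append(f"<request>{e['text']}</request>")
        elif e["type"] == "tool_result":
            # keep results terse and drop fields the model doesn't need
            lines.append(f"<result tool='{e['tool']}'>{e['data']}</result>")
        elif e["type"] == "error":
            lines.append(f"<error>{e['message']}</error>")
    return "\n".join(lines)

print(render_context([
    {"type": "user_request", "text": "why did deploy 142 fail?"},
    {"type": "tool_result", "tool": "get_logs", "data": "OOMKilled at 14:02"},
]))
```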
a great read, and something that is pushing me further from SDKs in my own experiments and more towards trying to homebrew something. i might try using the BAML language linked throughout this post
and now a haiku!
if it doesn’t have
llms all the way down
is it an agent?