Micro agents belong in the interface

Most agent demos still feel like a person standing next to the software rather than a part of it.

You open a chat panel. You type a request. Something rummages around behind a spinner, narrates its inner monologue, and eventually hands you an answer. Sometimes that is genuinely useful. More often it feels like the interface has shrugged, turned to one side, and outsourced the moment to a talkative intern.

There is a tell in all of it: the agent behaves like a guest. It has to be greeted, shown around, asked to do things, and thanked. A guest is a lot of work, and a guest is always slightly in the way.

The more I build Trakkr, the more I want the opposite. Not a guest, a fixture. Something built into the room that you forget is there until you flip a switch and it does exactly one useful thing.

I have started calling these micro agents. Not one grand autonomous system. Not a bot with a name and a face. A micro agent is one small piece of product intelligence, pointed at one job, sitting exactly where the user already has intent. It reads the data for this screen, investigates one narrow question, and hands back a shape the interface already knows how to draw.

That last part is the whole game, and I will come back to it.

Why this is suddenly possible#

For most of the last decade, putting judgment into a product meant paying for it twice. Once to build the rules, and then forever, to stop them going stale. Judgment was a capital expense. You spent it only where the return was obvious, which meant that a thousand smaller moments in a product, the ones that each needed a little thought but not a lot, got a generic string and a fallback. Nobody could justify a hand-built pipeline for a tenth of a feature.

What changed is not that models learned to think. It is the price of a thought.

When a grounded answer costs a fraction of a cent and arrives in about a second, the question flips. It stops being "can we justify building intelligence here?" and becomes "is there any reason not to put a little analyst behind this button?" You would never staff a rules engine to write next week's email opener. You will happily spend a tenth of a cent on one.

Most of the Trakkr agents I run sit on DeepSeek V4 Flash through its OpenAI-compatible endpoint. No fine tuning, no orchestration framework, no agent platform. The economics are almost rude:

The economics

Data already in hand. One call in, JSON out.

fresh input6k@ $0.14/M

cache-read0@ $0.0028/M

output900@ $0.28/M

Cost per run$0.0011$1 buys ~915 runs

Repeated context inside a loop bills at the cache-read rate, $0.0028 per million tokens, one fiftieth of fresh input. That is what makes a multi-step loop cheap enough to sit inside a button. GTM work costs more, but it is GTM: there is real money on the other side.

The number that does the real work there is the cache read. A multi-step agent re-sends a growing transcript on every turn, and that would be expensive if every token were full price. It is not. Repeated context bills at one fiftieth of fresh input, so a loop that reads a few pages and calls a few tools still lands in fractions of a cent. That one line in a pricing table is what lets judgment live inside a button instead of inside a nightly batch job.

Once a unit of judgment is that cheap, the interesting decision is where you spend it. My answer: behind the surfaces people already understand.

In Trakkr the agentic work is almost never exposed as "talk to our AI". It sits behind a crawler action in the monitoring queue, a read on why an optimization matters, a content idea grounded in prompts where the brand is missing, the weekly email, a go-to-market note for a sales team. Each of those needs judgment. None of them needs a general-purpose assistant. They need a small analyst that can look at the right slice of data and come back with something concrete.

That changes the design problem completely. If the agent is the whole product, the UX has to teach the user what the agent can even do, which is exactly where the blank box and the blinking cursor come from. If the agent is behind a widget, the surrounding interface has already done that teaching. The user knows why they are on this screen. The agent only has to be good at the next move.

It is easier to show this than to argue it. Here is one, running in this paragraph.

Live, in this paragraph

reading crawler overviewlisting candidate pagesfetching acme.com/pricingchecking the answer blockcompiling the action

A real recorded run against an example brand. The card is the exact shape Trakkr ships: a title, a grounded reason, the evidence it stands on, and a first step. It cost about a tenth of a cent and named the page it inspected.

No chat, no preamble. You arrived with intent, "why is this page underperforming", and a small thing read the right data, looked at the actual page, and handed back a card the product already knows how to render. The intelligence is real. The container is boring on purpose.

Two shapes#

Almost every micro agent I run is one of two shapes.

The first is a one-shot synthesizer. We already have the data, so we pass the relevant slice in, the brand profile, the prompt gaps, the citation evidence, the observed queries, in a single call, tell the model exactly what JSON to return, and render it. The content-ideas agent works like this. It is not brainstorming from vibes. It reads prompts where the brand is absent, sees which third-party pages are winning the citations, notes who is being named instead, and returns a ranked set of ideas with a conservative read on what is actually winnable.

The second shape is a tool-calling loop. Here we hand the model a small tool registry and let it investigate. It looks at an overview, lists candidate pages, inspects citation overlap, fetches a real URL, and when it has seen enough, calls one terminal tool to submit its findings. That terminal call is the output contract: when submit_actions fires, the loop stops and the product has a structured payload to compile.

The loop is deliberately boring.

The loop

deepseek-v4-flashthinking: disabledmax steps 18

01get_crawl_overviewcrawl health, top bots, newly seen bots
02list_candidate_pages12 pages flagged by bot activity
03get_citation_overlappages AI engines already cite
04fetch_url(acme.com/pricing)plain GET · 38 words · no answer block
05get_crawler_accessrobots.txt allows GPTBot, blocks CCBot
06fetch_url(blog.acme.com/compare)plain GET · schema present · 1.2k words
07submit_actions3 actions · 2 page notes · loop stopsexit

runningstep 1 / 18

Model call. Tool calls. Run the tools. Feed results back. Stop when the terminal tool fires. The boringness is the feature: a bounded, legible loop is a shippable one.

Model call. Tool calls. Run the tools. Feed the results back. Stop when the terminal tool fires. Cap the steps. Track why it stopped. Turn hidden thinking off so the latency and the token budget go to tool use and output, not to a monologue nobody reads. The boringness is the feature. A bounded, legible loop is a loop you can ship.

Let the agent pick its context#

Here is the part I find genuinely new, and the reason small agents can beat the big one even on the big one's own turf.

For years, the craft of putting a model into a product was context engineering: a developer deciding, in advance, exactly what to stuff into the prompt. Get it right and the model had what it needed. Get it wrong, or let it drift, and you were either starving the model or drowning it. Either way the decision was made at build time, by a human, once.

Cheap fast models let you move that decision to runtime, and hand it to the model.

Instead of one giant prompt, you give the agent a set of cheap tools that each return a clean slice of the world, and you let it ask for the slices this particular question needs. The data is pre-sharded into questions. The agent composes its own working set. Try it:

Context routing

The question on the screen

brand overview1.2k

visibility breakdown2.4k

prompt gaps3.1k

what's working2.8k

competitor landscape4.6k

citation gaps1.9k

content & pages6.7k

perception2.2k

existing actions1.5k

live page read · fetch_url4.2k

Context routed8.9ktokens into the prompt

21.7k of stale context never enters the prompt.

The model asks for the slice each question needs. The rest never enters the prompt. Smaller context is not just cheaper, it is faster and more accurate, because the model is not reading past facts that do not matter.

The win is not only a smaller bill. A smaller, sharper context is faster and more accurate, because the model is not reading past a competitor landscape it never needed in order to answer a question about citations. Bloat is the enemy of all three at once: cost, latency, and correctness. Context routing lets you keep all three.

And the agents keep their own house in order as they go. When a loop's transcript gets long, the model writes itself a summary of what it has learned, the older tool results get dropped, and it carries on with a lighter context. Housekeeping, done by the one thing that knows what still matters.

The sharpest version of this is letting the agent go and fetch ground truth that no stored slice contains. Crawler logs can tell us a bot visited a page. They cannot tell us what the page says: whether the answer is buried in JavaScript, whether the schema is present, whether a plain request returns an almost empty document. So the agent gets a fetch tool, and it reads the page the rough way a crawler does.

Ground truth

GET acme.com/pricing

Simple, transparent pricing

Pick the plan that scales with you.

Starter$19Choose

Team$49Choose

Scale$99Choose

rendered in JavaScript

Crawler logs can tell us a bot visited a page. They cannot tell us what the page says. The agent fetches the page itself and reads the rough view a bot gets, so the recommendation is "this page returns 38 words to a plain GET", not "add schema" in the abstract.

This is where the useful recommendations come from. Not "add schema" in the abstract, but "this page wins bot attention and returns thirty-eight words to a plain request". The card does not feel like generic advice. It feels like the product noticed something, because it did.

The interface still owns the contract#

The dangerous version of all this is letting free-form model output leak straight into the structure of your product.

I do the opposite, and it is the rule that makes the rest safe to ship. The model gets room to think and write like a person. Then the application soft-compiles what it wrote into the product's contract.

For a crawler action, the agent writes a free recommendation: a title, what to do, why it matters, the evidence, a first step, the URLs it touches, the bots involved, a severity, an effort, an impact, a confidence. The compiler then maps that onto the nearest valid action type the database knows about.

The important word is map, not reject.

The contract

What the model wrote

Add FAQ schema to the comparison page so AI can quote a direct answer

The page ranks well for humans but returns no structured data, so engines summarise a competitor instead.

Nearest valid action_type

create_content_for_gap
create_comparison_content
add_schema_markup
fix_meta_descriptions
pitch_to_publication
update_cited_page
unblock_ai_crawlers
target_query_cluster

Map, do not reject. If the agent writes something slightly off-catalogue but insightful, the system maps it to the nearest valid type and keeps the richness in the free text. The rigid bits of the app stay valid. The intelligent bits stay alive.

If the agent writes something slightly off-catalogue but genuinely insightful, the system does not throw it away. It maps the item to the closest valid type and keeps the richness in the free-text fields. The rigid parts of the app stay valid. The intelligent parts stay alive. Numbers get clamped, confidences get bounded, foreign URLs get filtered out of owned-page actions. Users do not care whether an idea came from a perfect enum. They care whether the card in front of them is concrete, grounded, and worth doing.

The boring loop is the point#

The micro-agent pattern only works if the user can trust the surface it feeds. In Trakkr that comes down to a handful of unglamorous constraints, applied everywhere:

use real data, never generic category advice
ask for exact JSON when the data is already in hand
reach for tools when the model needs to investigate
turn hidden thinking off when latency matters
keep temperatures low so the same inputs read the same way
cache by a hash of the inputs when nothing has changed
track the token cost of every run
validate and clamp anything numeric
never treat one failed plain request as proof that a brand blocks bots

None of those is exciting on its own. Together they are what make an agentic UI feel calm instead of jumpy. The user never sees the step cap, the tool order, or the cost ledger. They see a card that loads quickly, names the page it inspected, shows its evidence, and does not pretend to know more than it does.

That is the good kind of invisible.

The product lesson#

The temptation with agents is to build a destination. A new place in the nav called Agent. A blank box. A magical employee you have to manage.

I think the quieter pattern wins.

Put small agents behind the moments where a user already needs help deciding. Let the interface frame the job. Give the model just enough agency to go and look at the world, and just enough cheap context to look at the right part of it. Then force it to hand back a shape the product can own.

In Trakkr I do not want one agent that replaces the product. I want dozens of small ones that make it sharper: one for crawler actions, one for content ideas, one for optimization reads, one for the weekly synthesis, one for go-to-market intelligence. Each is small enough to understand, small enough to test, small enough to price, and small enough to delete the day it stops earning its place.

That, oddly, is what makes them feel powerful. Not the size of the agent, but the number of small, sure-footed places you are finally willing to put one.

Micro agents belong in the interface

Why this is suddenly possible#

The unit is the widget#

Two shapes#

Let the agent pick its context#

The interface still owns the contract#

The boring loop is the point#

The product lesson#

Mack Grenfell

Claim detection and fact checking for AI SEO

Building Long-Tail Keyword Lists for Programmatic SEO

Conversion Lift Tests are Dead; Transitioning to Geo-Experiments

What if Performance Advertising isn't Just an Analytics Scam?