Unsupervised Work

Mar 25, 2026 · 9 min read

Every AI coding agent makes silent decisions that shape your architecture. Plan mode does not fix this. It moves the same invisible choices into a document you skim and approve. A real plan emerges from conversation, not from a wall of markdown.

Coding agents call it plan mode. The agent disappears, comes back with a finished plan, and asks you to approve it. That is not planning. It is unsupervised work with a rubber stamp at the end.

Every coding agent makes decisions you never agreed to. “Should this be a new file or extend the existing one?” The agent picks. “Zod schema or plain type?” The agent picks. “Refactor the caller or add an adapter?” The agent picks.
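To make the "Zod schema or plain type" fork concrete, here is a minimal sketch. The names are hypothetical, and a hand-rolled guard stands in for Zod so the example has no dependencies, but the trade-off is the same: a plain type checks nothing at runtime, while a schema rejects bad payloads at the boundary.

```typescript
// Option A: plain type. It exists only at compile time, so a
// malformed payload passes through untouched at runtime.
type ToolCall = { name: string; args: Record<string, unknown> };

// Option B: runtime validation. A hand-rolled guard here, standing
// in for what a Zod schema would give you.
function isToolCall(value: unknown): value is ToolCall {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.name === "string" &&
    typeof v.args === "object" &&
    v.args !== null
  );
}

// With option A this payload "succeeds" and fails somewhere deeper;
// with option B it is rejected right here, at the boundary.
const payload: unknown = JSON.parse('{"name":"search"}');
const accepted = isToolCall(payload); // false: "args" is missing
```

Neither option is wrong, but they push failures to different places: option A surfaces them deep in the call stack, option B concentrates them at the parse site. That is exactly the kind of choice the agent makes without telling you.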

Each of these is a decision you did not know happened. The code works, so nobody questions it. But the architecture drifts in a direction nobody chose. That is not a bug. It is invisible technical debt, decisions baked into the code that no human consciously made. You only notice it if you review carefully, and by then three things are already built on top of that decision. Fixing it means reworking all of them. So you leave it.

If you do not review carefully (and be honest, not every diff gets a close read) you do not notice at all. The code ships. It works. Months later someone asks “why is this architected this way?” and nobody knows. That is how codebases accumulate decisions that nobody made. Not through negligence, but through a tool that makes choices on your behalf and never tells you.

I have seen this happen in Acolyte. Early on, the agent added fallback logic everywhere: retries, alternate paths, defensive branches I never asked for. It worked, but it quietly changed how the system behaved. In another case, it chose to communicate tool calls using strings and regex parsing instead of structured JSON. Nothing broke. Undoing that decision later took nearly a week.
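To illustrate why that second decision was expensive, here is a hypothetical sketch of the two approaches. The wire format shown is invented for the example, not Acolyte's actual one, but the failure modes are representative.

```typescript
// Fragile: tool calls travel as free-form text and the host fishes
// them out with a regex. Nesting, quotes, or slight format drift
// silently break the match. (Hypothetical format.)
function regexStyle(text: string): { name: string; rawArgs: string } | null {
  const m = text.match(/^CALL (\w+)\((.*)\)$/);
  return m ? { name: m[1], rawArgs: m[2] } : null;
}

// Structured: the agent emits JSON and the host parses it. Malformed
// output fails loudly at the boundary instead of half-matching.
function jsonStyle(text: string): { name: string; args: unknown } {
  const parsed = JSON.parse(text) as { name: string; args: unknown };
  return { name: parsed.name, args: parsed.args };
}
```

The regex version works on day one, which is exactly why nothing breaks and nobody questions it. The cost shows up later, when every new tool format has to thread through the same brittle pattern.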

The plan that nobody reads

The industry response is plan mode. Before the agent writes code, it produces a plan. You read the plan, approve it, and the agent executes. That is the theory. Here is the practice.

The agent goes away for thirty seconds. It comes back with a wall of markdown. Fifteen numbered steps. Nested sub-bullets. A “considerations” section. Maybe a “risks” section that lists things that will not actually happen.

You do not read it. Not really. You skim the headings. You check that the first step sounds right and the last step sounds like what you asked for. You say “looks good.” You hope the middle is fine.

Every developer does this. Not because they are lazy, but because the cognitive load of evaluating a complete plan is enormous. You would need to hold the entire plan in your head, cross-reference it against the codebase, and spot the decisions you disagree with, all from a static document you did not help write.

That is not planning. That is a code review before the code exists, and it has the same failure mode: things slip through.

Worse than no plan

When the agent works without a plan and makes a bad decision, you see it in the diff. You push back. The feedback loop is tight.

When the agent produces a plan and you approve it, you feel committed. You already said yes. Now when the diff looks wrong, you wonder if maybe you missed something in the plan, or if this is what you agreed to. The plan created a false sense of alignment, and that false sense makes you less likely to question the result.

The decisions are still hidden. They just moved from the code to the plan document. The same silent choices are made in the same invisible way. The only difference is that now you signed off on them without knowing.

What planning actually looks like

Think about the best planning sessions you have had with a colleague. Standing at a whiteboard, or sitting together at a desk. Nobody presents a finished plan.

One person grabs a marker, sketches a box, says “what if we start here.” The other person says “that works, but what about this edge case.” They add another box. They disagree about the third step. They try two approaches and pick the one that survives scrutiny.

The plan does not exist before the conversation. It emerges from it. Neither person could have produced it alone, because the value comes from the pushback, the questions, and the course corrections that happen in real time. At no point does someone disappear for thirty seconds and come back with a document for the other person to approve.

The cognitive load stays low because you never see the whole plan at once. You see one or two steps, react, and then see the next. By the time the plan is complete, you have already discussed every piece of it. There is nothing to skim because you were there for all of it.

The moment that matters

The biggest source of wasted work is not wrong code. It is the agent making a decision you would have disagreed with, then building three steps on top of it before you notice.

On a whiteboard, your colleague would pause at those moments. “I am thinking we put this in a new file, unless you would rather extend the toolkit?” That two-second exchange prevents a refactor later. It is the cheapest possible intervention at the most valuable possible time.

Current plan modes do not surface these moments. The agent resolves every branch internally and presents the result. Some tools present the branch as a multiple-choice question: “Option A, Option B, Option C.” That is not collaboration. That is a test. The agent already did the thinking and narrowed the options. The user is just rubber-stamping a decision someone else framed.

On a whiteboard, your colleague does not hand you a quiz. They say “I am not sure how to handle this part.” The uncertainty is shared. Both people think about it together and the direction emerges.

The code review moment

Every developer who works with AI agents has had this experience. You ask the agent to implement something. It works. You start reviewing the code and something feels off. Not a bug, but a design choice. The agent used a workaround where a proper abstraction existed. Or it added a layer of indirection that complicates the code for no clear reason. Or it picked an approach that is technically correct but goes against how the rest of the codebase works.

You think: “why didn’t we think about this before it wrote the code?”

That is the moment plan mode should prevent. Not by producing a document you approve in advance, but by making that decision visible when it is still cheap to change, during the conversation, before a single line of code is written.

An agent that surfaces its assumptions is not slower. It is faster. The first approach sticks because both sides agreed on it. There is nothing to rework on review.

What this requires

Whiteboard planning works because of three things that current plan modes lack.

Incremental building. The plan grows through dialogue. The agent adds a step, the user reacts, the agent adds the next. The user never sees fifteen steps at once. They see the plan unfold one piece at a time, with space to intervene at every point.

Shared uncertainty. When the agent hits a fork, it does not present options. It shares the problem. “This step touches the stream handler and the lifecycle. What is your instinct?” Both sides think about it. The direction emerges from the conversation, not from a predetermined list.

The user can sketch too. If you say “start with the contract,” the agent takes that as the first step and builds around it. If you say “I think we should handle the error case first,” the agent adjusts. Your ideas shape the plan directly, not through approve/reject cycles on someone else’s proposal.
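As a sketch only (this is not Acolyte's implementation, and every name here is invented), the three properties suggest a simple data shape: the plan is a growing list of entries, each one either a step contributed by the agent or the user, or an open question that represents shared uncertainty.

```typescript
type Author = "agent" | "user";

// A plan entry is either a concrete step, attributed to whoever
// sketched it, or an unresolved fork surfaced as a question.
type PlanEntry =
  | { kind: "step"; author: Author; text: string }
  | { kind: "open_question"; text: string };

type Plan = PlanEntry[];

// Incremental building: entries are appended one at a time, with
// room for the user to react before the next one is added.
function addEntry(plan: Plan, entry: PlanEntry): Plan {
  return [...plan, entry];
}

let plan: Plan = [];
// The user sketches first; the agent builds around it.
plan = addEntry(plan, { kind: "step", author: "user", text: "start with the contract" });
plan = addEntry(plan, { kind: "step", author: "agent", text: "define the stream handler interface" });
// Shared uncertainty: a fork is surfaced, not resolved internally.
plan = addEntry(plan, { kind: "open_question", text: "this also touches the lifecycle. What is your instinct?" });
```

The point of the shape is what it forbids: there is no operation that produces fifteen steps at once, and an open question is a first-class entry rather than something the agent resolves silently.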

Autonomy needs alignment

Autonomy without alignment is just unsupervised work.

The agent guessing faster is not the same as the agent doing what you want. Most agents that call themselves autonomous are really just unsupervised. They make all the decisions themselves and hope you agree. When you ask them to plan first, they do the same unsupervised work but add a document at the beginning.

The difference between unsupervised work and genuine autonomy is whether the human had input on the direction before execution started.

Collaborative planning gives you both. Alignment upfront through conversation, then genuine autonomy during execution. The agent does not need to ask questions mid-task because the questions were already answered during planning. It does not need to guess your preferences because you already shaped the approach together.

That is how you build an agent that does what you would expect. Not by making the model smarter at guessing, but by making the planning collaborative enough that guessing is no longer necessary.

Why I shipped without it

Acolyte originally had four modes: chat, explore, work, and verify. Explore was the closest thing to planning, but it was really just work mode with read-only permissions. I never developed it further because I could see where it would lead.

I recently created a plan skill that works the way this post describes. It is a single file you can drop into any project. The agent has a conversation with you about the design, reads the code instead of guessing, and only summarizes when you both agree on the direction.
