Giving the agents job titles · Building in the Open

For a while I treat the AI like one person who can do anything. Write the code, check the code, ship the code. It is fast. It is also marking its own homework.

That is the part that bothers me. The same agent that writes the code decides the code is good. I have never run a team that way. You do not hand one person the whole job and hope they catch their own mistakes.

On the left, one overwhelmed robot trying to do every job at once, tools flying around it. On the right, a tidy team of specialist robots, each a different color, each with one role: Product Manager, Engineer, Code Reviewer, Security Reviewer, Architect, QA. — One agent wearing every hat, marking its own homework — versus a team where each agent has one job and nobody grades their own work.

So I stop. I write down the roles I would actually staff if this were people.

Builders, the ones who write the code. Reviewers who read it after, one looking for things that are simply wrong, another looking only for security holes. A coach whose job is to critique the work and push it harder. A PM. A QA agent. An architect to think about the shape of things before anyone writes a line.

None of these is a separate program, and none is a person. Each is the same underlying AI pointed at one job, given its own instructions and its own boundary to stay inside. The roles are real. The team of employees is a metaphor, and a useful one.

Then I set the one rule that makes it mean something. A builder does not get to call its own work done. A reviewer has to check it first. The build does not count until it passes someone else’s read.

Each agent gets one job and gets good at it. The builder builds. The security reviewer thinks about nothing but how this could be broken into. The coach is allowed to be harsh. Nobody is asked to be excellent at everything, which is the one thing the single do-everything agent was quietly failing at.

The output changes almost immediately. Not because any one agent got smarter. Because the work now runs a gauntlet before it reaches me, the same way it would if a real team built it.

Learnings

The thing that made the work better was not a better agent. It was splitting one agent into specialists and refusing to let any of them approve themselves. A reviewer who did not write the code sees what the writer is blind to.

I had been treating “fast” as the goal. One agent doing everything is fast. But fast and unchecked is just a quicker way to ship the same mistakes. The roster is slower per step and far better at the end.

The smallest change carried the most weight: a builder cannot mark its own work done. Once a different agent had to sign off, “done” started meaning something.