Opus thinks, Sonnet builds · Building in the Open

The cost lessons leave me with one nagging question. I have two models I can put to work. One is powerful and expensive. One is fast and cheap. Until now I have reached for the expensive one out of habit. For everything.

So I look at what the agents actually do all day. Most of it is not hard. Move files. Run a sweep across the code. Rename things. Read a folder and report back. None of that needs a deep thinker. It needs a fast worker that does not make mistakes.

The hard part is smaller and rarer. Deciding what to build. Judging whether a fix is real or just looks done. Reviewing the work before it ships. That is where being smarter actually changes the outcome.

I set a simple rule. The expensive model decides and reviews. The cheap one builds. The judgment work stays on the premium model, where a better call is worth the price. Everything else, which is most of the volume, runs on the workhorse.

Two-panel diagram showing a deep thinker model handling judgment and decisions on the left, and a fast workhorse model handling execution and volume on the right — Pay for judgment, not for volume. The hard part is smaller and rarer than it looks — most of what gets done all day does not need the expensive one.

On that kind of work, I did not notice a quality drop in my own testing. The workhorse moves files and runs sweeps as well as the expensive one — for the grunt work it was never the bottleneck, and I was just paying a premium for work that did not need it. The judgment calls are a different question, which is exactly why those stay on the expensive model.

The agents are still doing the building. I just stopped letting the most expensive one of them spend its day on errands.

Learnings

Not every task needs the smartest, priciest model. I was reaching for the expensive one on reflex, even for moving files. Once I separated the two kinds of work, the answer was obvious: pay for judgment, not for volume. The deciding and the reviewing go to the powerful model. The bulk of the building goes to the cheap, fast one.

Matching the model to the task turned out to be one of the cleanest cost levers I have — as long as the judgment stays on the powerful model. On the routine work, the moving and sweeping and renaming, I could not see a difference in my own testing. The dividing line is not intelligence alone. It is risk, complexity, reversibility, and the cost of being wrong. The trick was noticing that most of what gets done all day is not the hard part.