
The Right Tool for the Job

Beginner to intermediate | Set up once, then iterate continuously | @AISmallBizGuru
Result

Hierarchical routing of models and local tools by task to keep token costs down | Multi-model division-of-labor agent delivery workflow

For

People running multi-model coding/agent workflows who want to control token cost and quality

I keep seeing stories about companies burning absurd amounts of money on AI, and I get how it happens. Once “AI strategy” becomes a budget line, people start treating every task like it needs the biggest model, the fattest context window, the fanciest agent harness, and a burn rate that makes the CFO start stress-chewing printer paper.

My approach is much more boring: use the right tool for the job.

Right now my personal AI stack is roughly two $20-ish monthly plans, plus some pay-as-you-go usage. ChatGPT is my main architecture, planning, reasoning, writing, and product-thinking partner. Gemini is also in the stack because it gives me another strong model family, and the plan also comes with useful Google ecosystem benefits like cloud storage. Then I use OpenRouter and OpenCode Zen for pay-as-you-go model access when I want broader model choice, coding firepower, or a cheaper model that is good enough for a specific task.

All in, I am probably around $60/month most months: about $20 each for ChatGPT and Gemini, plus maybe another $20 in OpenRouter/OpenCode Zen usage. That is less than, or roughly comparable to, a single high-end “pro” AI subscription. It is also enough to build a surprising amount if you are disciplined about which model does what.

The pattern is simple. I use the strongest general model where thinking quality matters most: architecture, tradeoffs, design direction, PRDs, MVP specs, decomposition, and the prompts that will later be handed to coding agents or cheaper models. That is where I want the good judgment. Bad architecture is expensive. Bad planning is expensive. A sloppy spec can waste hours of agent time. So I do not cheap out on the thinking layer.

But once the plan is clear, a lot of the execution does not need the most expensive model in the drawer. If the task is “add this tab,” “wire this endpoint,” “refactor this component,” “write this test,” or “implement this well-scoped prompt,” then lesser models can often do the work just fine. Some of the Chinese models are genuinely good at code. My current favorite is Kimi 2.6. It is not always the model I want making the architectural call, but for scoped coding work, it can be very strong.

I also have Codex usage included through my ChatGPT plan, so I use the good model there up to my limits. But I try not to spend that quota like a drunk sailor with a token cannon. I keep some in reserve for the places where it matters most: debugging bad outputs, reviewing messy diffs, explaining why a lesser model went sideways, or getting a stuck implementation unstuck.
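
The reserve idea can be pictured as a small budget tracker. This is only a sketch under stated assumptions, not how Codex or any subscription actually meters usage: the `Quota` class, its abstract "units", and the `high_value` flag are hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Quota:
    """Hypothetical tracker for a limited premium-model allowance."""
    limit: int      # total units available in the period (made-up unit)
    reserve: int    # units held back for debugging, review, unsticking
    used: int = 0

    def remaining(self) -> int:
        return self.limit - self.used

    def spend(self, cost: int, high_value: bool = False) -> bool:
        """Spend `cost` units if allowed; return True when the spend went through.

        Routine work must leave the reserve untouched.
        High-value work may draw the balance all the way down to zero.
        """
        floor = 0 if high_value else self.reserve
        if self.remaining() - cost < floor:
            return False
        self.used += cost
        return True


quota = Quota(limit=100, reserve=30)
quota.spend(60)                    # routine work, allowed
quota.spend(20)                    # routine work, refused: would eat the reserve
quota.spend(20, high_value=True)   # debugging a bad output, allowed
```

The design choice is that the caller has to declare a task high-value before it can touch the reserve, which makes the "is this worth the good tokens?" decision explicit.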

The local stack matters too. For me, that means Cmux and LazyVim. Cmux gives me a practical way to run and manage multiple agent/code sessions without turning my desktop into a tab graveyard. LazyVim gives me the editor environment I actually want to live in: fast, keyboard-driven, extensible, close to the files, and comfortable for real development instead of just prompt-and-pray coding. The AI tools are powerful, but I still want a local cockpit where I can inspect diffs, move through the codebase, run tests, edit by hand, and keep the whole process grounded.

That is an important part of my workflow. I am not trying to outsource all judgment to the model. The better pattern is human direction, strong-model planning, cheaper-model execution, local inspection, and then another round of review when needed. Cmux and LazyVim are part of that loop because they keep the work close to the repo instead of turning development into a collection of disconnected chatbot transcripts.
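
As a rough sketch, that loop could be written as a function that lets a cheaper model attempt the task, runs local checks, and only calls the stronger model for review when the checks fail. Everything here is an assumption for illustration: `run_step` and its three callables are hypothetical stand-ins for whatever agents, test commands, and review prompts you actually use.

```python
from typing import Callable, Optional, Tuple


def run_step(
    task: str,
    execute: Callable[[str, Optional[str]], str],   # cheaper model (hypothetical)
    passes_checks: Callable[[str], bool],           # local tests / diff inspection
    review: Callable[[str, str], str],              # stronger model (hypothetical)
    max_attempts: int = 2,
) -> Tuple[Optional[str], int]:
    """Try cheap execution first; ask the strong model for feedback only on failure.

    Returns (result, attempts). A result of None means the loop gave up and
    the task goes back to the human.
    """
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        result = execute(task, feedback)
        if passes_checks(result):
            return result, attempt
        # Spend premium tokens only after a concrete failure.
        feedback = review(task, result)
    return None, max_attempts
```

The point of the shape is that the expensive call sits behind a failed local check rather than in front of every task.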

This is where a lot of AI spend goes sideways. Companies do not just overspend because models are expensive. They overspend because they have no routing discipline. Every task goes to the premium model. Every workflow gets an agent. Every internal prototype gets enterprise ceremony. Every “what if” experiment gets built like it has to survive Black Friday traffic.

The right question is not “what is the best model?” The right question is “what does this step actually need?” Architecture needs judgment. PRDs need clarity. Specs need precision. Code generation needs competence. Debugging needs reasoning. Bulk edits need cheap throughput. Summaries need “good enough.” Exploration needs optionality. Those are not the same job, and they do not all deserve the same model.
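
One way to make "what does this step actually need?" concrete is a plain lookup table from step type to model tier. This is a minimal sketch, not a definitive implementation: the `Tier` names, the step keys, and the tier assigned to each step are one reading of the paragraph above, and no real model or provider API is involved.

```python
from enum import Enum
from typing import Optional


class Tier(Enum):
    PREMIUM = "premium"  # strongest general model: judgment-heavy steps
    MID = "mid"          # capable cheaper model: bounded execution
    CHEAP = "cheap"      # cheapest model that is good enough


# Hypothetical step -> tier table; adjust to your own stack.
ROUTES = {
    "architecture": Tier.PREMIUM,   # needs judgment
    "prd": Tier.PREMIUM,            # needs clarity
    "spec": Tier.PREMIUM,           # needs precision
    "debugging": Tier.PREMIUM,      # needs reasoning
    "code_generation": Tier.MID,    # needs competence
    "exploration": Tier.MID,        # needs optionality (assumed pay-as-you-go lane)
    "bulk_edit": Tier.CHEAP,        # needs cheap throughput
    "summary": Tier.CHEAP,          # needs "good enough"
}


def route(step: str, default: Optional[Tier] = None) -> Tier:
    """Return the tier for a step type.

    Unknown steps raise unless a default is given, so nothing reaches the
    premium model by accident.
    """
    key = step.strip().lower()
    if key in ROUTES:
        return ROUTES[key]
    if default is not None:
        return default
    raise ValueError(f"no route defined for step type: {step!r}")


print(route("architecture").value)  # premium
print(route("bulk_edit").value)     # cheap
```

Raising on unknown step types is a deliberate choice in this sketch: it forces a routing decision instead of silently sending new kinds of work to the most expensive lane.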

That is why I like a mixed stack. ChatGPT handles the big-picture thinking and higher-order planning. Gemini gives me another serious model family and useful ecosystem value. OpenRouter gives me model selection and pay-as-you-go experimentation across providers. OpenCode Zen gives me another execution lane for coding workflows. Codex gives me a strong integrated coding assistant, but with limits I want to use intentionally. Cmux and LazyVim keep the actual development loop local, inspectable, and under my control.

Grok is technically in the pile too because I have it as part of being a verified X user, but I do not do much coding with it yet. Maybe that changes. Maybe it does not. That is another part of the point: tools earn their place by being useful in a specific lane, not because they are shiny.

This is also why I like writing good planning prompts. A strong model can turn a fuzzy idea into a clean implementation brief. That brief can then be handed to a cheaper or more code-specialized model. The expensive model does not have to hammer every nail. Sometimes its highest-value job is making sure the nail, board, and hammer are all properly identified before the cheaper carpenter shows up.
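
To illustrate what such a brief might look like once it leaves the planning conversation, here is a small sketch that renders one into a prompt for the execution model. The `ImplementationBrief` class, its fields, and the example file path are all hypothetical; the source does not prescribe any particular brief format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ImplementationBrief:
    """Hypothetical structure for a planning-model output handed to a cheaper model."""
    goal: str
    files: List[str] = field(default_factory=list)        # files the agent may touch
    constraints: List[str] = field(default_factory=list)  # things it must not do
    acceptance: List[str] = field(default_factory=list)   # how "done" is checked

    def to_prompt(self) -> str:
        """Render the brief as plain text, skipping empty sections."""
        sections = [
            ("Goal", [self.goal]),
            ("Files in scope", self.files),
            ("Constraints", self.constraints),
            ("Done when", self.acceptance),
        ]
        lines: List[str] = []
        for title, items in sections:
            if not items:
                continue
            lines.append(f"{title}:")
            lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)


brief = ImplementationBrief(
    goal="Add a Settings tab",
    files=["src/App.tsx"],              # made-up path for illustration
    acceptance=["existing tests pass"],
)
print(brief.to_prompt())
```

Keeping scope and acceptance criteria as explicit fields is the part that matters; the cheaper model gets a bounded task instead of a fuzzy idea.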

That is not anti-premium-model. I love the good models. I use them constantly. But I do not think “use the best model for everything” is a strategy. It is a procurement problem wearing a hoodie.

The better approach is tiered intelligence. Use the best model for direction. Use capable cheaper models for bounded execution. Use specialized tools where they fit. Use pay-as-you-go when you need flexibility. Use subscription quotas intentionally. Keep reserve capacity for debugging. And maybe most importantly, do not build enterprise plumbing for a shed.

For an individual developer, that keeps the monthly spend sane. For a company, the same idea scales conceptually even if the numbers are bigger. The waste usually comes from treating AI as magic instead of workflow. Once you understand the workflow, you can route tasks intelligently.

That is the real lesson: AI spend should be tied to task value, not model prestige.

The right tool for the job is not always the biggest model. Sometimes it is the best model. Sometimes it is the cheapest model that will not embarrass you. Sometimes it is a coding agent. Sometimes it is a planning conversation. Sometimes it is Cmux and LazyVim with a human reviewing the diff like civilization still matters. And sometimes the smartest move is to spend the good tokens up front so the cheap tokens do not wander into a ditch later.
