Skip navigation

bafonins.xyz
↵ All articles

A Second Opinion for Claude Code: Wiring in Codex

Table of contents

Introduction

I spend most of my coding day inside Claude Code, and it is genuinely good. But after a while you start to notice its habits. It reaches for certain patterns, it is confident in a particular way, and when it is wrong, it tends to be wrong consistently. That is not a knock on the model. Every model has a personality baked in by its training, and that personality comes with blind spots you stop seeing precisely because you look at them all day.

The fix that worked for me is embarrassingly simple: before I let Claude build anything non-trivial, I make it run the plan past a different model. In my case that second model is GPT-5.5, reached through Codex, OpenAI’s coding harness. This article is about how I wired the two together and why I think the idea is worth stealing.

The problem with one opinion

When you ask a single agent to plan and then execute, the planning and the reviewing happen in the same head. The model that decided the approach is the same model that decides the approach is fine. There is no friction, no disagreement, nobody in the room saying “wait, why this and not that?”.

For small changes this is fine. For anything with architectural weight, multi-file edits, a tricky migration, or a decision that is annoying to walk back, that lack of friction is exactly where mistakes slip through. You only find out the plan was flawed after the model has happily written three hundred lines around it, and now you are paying tokens to unwind work that should never have started.

A second model breaks that loop. It did not fall in love with the plan, so it has no reason to defend it.

There is a catch, though. The second model has to come from a different family. When I had an Opus or Sonnet session review code that another Opus or Sonnet session had written, the critique was toothless. It rarely pushed back, and more than once I watched it describe a pattern in almost the same words the first session had used to justify writing it. Same training, same instincts, same blind spots. To get any real friction you need a model raised somewhere else, which is why the reviewer here is GPT-5.5 and not a second Claude.

The dual-planning skill

What I landed on is a small Claude Code skill I call dual-planning. The shape of it is a two-stage protocol: Claude drafts, Codex critiques, Claude revises, and only then does the plan reach me. Here is the flow:

  1. Claude drafts a plan: given the task, it writes out the files it intends to touch, the architectural choices it is making, and the risks it can already see. Nothing gets built yet.
  2. The plan goes to Codex: the draft is handed to a Codex agent through the mcp__codex__codex MCP call. Codex answers with structured JSON, a verdict of either APPROVED or NEEDS_REVISION, plus any issues sorted by severity into critical, major, and minor.
  3. Claude answers back: for each issue it either accepts the point and amends the plan, or pushes back with a reason the original choice should stand. The revised plan returns to Codex on the same thread through mcp__codex__codex-reply, so the conversation keeps its context instead of starting cold every round.
  4. It iterates, but not forever: the back and forth runs for up to three rounds. If the two models still disagree after that, the skill stops trying to settle it alone.
  5. Disagreements come to me: when something is genuinely stuck, I get a clean question with the two positions side by side. Here is Claude’s approach, here is Codex’s approach, pick one. The final plan arrives with a collapsible history of what was argued and how it landed.

A couple of knobs make it practical. Codex runs on gpt-5.5, and the reasoning effort scales with the stakes, from low for a trivial fix up to xhigh for security or critical logic, sitting at medium by default. The working directory is just passed through as !pwd so Codex reviews against the real project.

Tip!

Reach for this on the tasks that are expensive to get wrong, like architectural decisions and multi-file changes. For a one-line bug fix the extra round trip is not worth it, and the skill is deliberately tuned to stay out of the way there.

Why this is worth the trouble

Two things make this more than a gimmick.

The first is obvious. You catch real problems earlier. A model trained by a different lab sees things differently, so Codex flags edge cases Claude waved past, questions a dependency choice, or notices that the “simple” approach is hiding a concurrency bug. And when Codex is wrong and Claude defends the original plan, that is useful too. The reasoning is now out in the open instead of assumed.

The second is easier to miss. It saves tokens in your Claude session. Reviewing a plan is cheap. Building the wrong thing and then unbuilding it is not. A few hundred tokens of critique up front spares you the much larger bill of generating code on a bad foundation, noticing halfway through, and asking Claude to tear it all out and start again. The critique also comes back as a short list of structured points rather than a long meandering exchange, so your main context window stays lean.

There is a quieter benefit too. Because the critique lands as structured output, the friction I mentioned earlier becomes something you can actually read. Instead of a vague feeling that the plan might be off, you get a list of named concerns with severities attached. That is a much better starting point for the one decision a human should still own: which way to go when two capable models honestly disagree.

Credit where it is due

This idea is not original to me. I came at it after seeing claude-octopus , a Claude Code plugin that takes the multi-model concept much further. Where my skill is a focused Claude-and-Codex handshake, Octopus orchestrates up to eight providers across a full Discover, Define, Develop, Deliver workflow, with consensus thresholds and adversarial review built in. If the second-opinion idea clicks for you and you want the maximalist version, that project is the place to look.

My skill is the small, single-purpose cousin of that idea: just enough orchestration to get a second model’s read on a plan, and nothing more.

Try it yourself

Two pieces have to be in place: Codex wired into Claude Code as an MCP server, and the skill itself. For the first, install the Codex CLI and register it as a server. This is where the mcp__codex__* calls in the skill come from:

npm install -g @openai/codex
claude mcp add --transport stdio codex -- codex mcp-server

One nice thing about this setup is the economics. If you already pay for ChatGPT but barely open it now that most of your work happens in Claude, Codex gives that subscription something to do. Even the $20 plan comes with fairly generous limits, and because the skill only ever sends Codex a plan to look over rather than a whole coding session, I have never come close to hitting them.

The skill itself is a single file. Grab it from the gist below, drop it into your skills directory, and the next time you ask Claude to plan something with real weight to it, you will get two opinions instead of one.

dual-planning skill (GitHub gist)

Conclusion

The mental model I keep coming back to is a code review that happens before the code exists. We already accept that a second person reading our pull requests catches things we cannot see in our own work. Doing the same thing one step earlier, at the plan, with a model that thinks differently from the one writing the code, is the same instinct applied at a cheaper moment. It costs a little time, it saves a lot of tokens, and every so often it stops you from building the wrong thing beautifully.