I’ve run into this exact situation several times recently. They were all tasks that I’m positive Claude Code could have handled in the past, but now it often can’t get over the finish line, while Codex will one-shot the fix just like the old Claude Code could. I’ve even tried having Claude Code redo some old tasks it had implemented correctly in the past, and now it simply can’t.

My guess is that the model is at fault rather than the harness; for whatever reason, I believe Opus is much worse than it used to be. I suppose it could somehow be Claude Code’s fault instead. For the time being, though, Codex is much better, which I never thought I’d be saying.

I plan to run tests using Pi so that both models get the same system prompt and harness, but I suspect it’s only the subscription-tier Claude Code that has gotten worse, and we’re not allowed to use that with Pi.