A later commenter mentioned an AI version of TDD, and I lean heavy into that. I structure the process so it’s explicit what observable outcomes need to work before it returns, and it needs to actually test to validate they work. Cause otherwise yeah I’ve had them fail so hard they report total success when the program can’t even compile.
The setup I use that’s helped a lot of shortcomings is thorough design, development, and technical docs, Claude Code with Claude 4.5 Sonnet them Opus, with search and other web tools. Brownfield designs and off the shelf components help a lot, keeping in mind quality is dependent on tasks being in distribution.
A later commenter mentioned an AI version of TDD, and I lean heavy into that. I structure the process so it’s explicit what observable outcomes need to work before it returns, and it needs to actually test to validate they work. Cause otherwise yeah I’ve had them fail so hard they report total success when the program can’t even compile.
The setup I use that’s helped a lot of shortcomings is thorough design, development, and technical docs, Claude Code with Claude 4.5 Sonnet them Opus, with search and other web tools. Brownfield designs and off the shelf components help a lot, keeping in mind quality is dependent on tasks being in distribution.