AI Is Coding Faster Than You Can Review. Now What?

I use Claude Code extensively these days. I put in a prompt and it generates hundreds of lines of code. It shows me the changes before applying them. Do I read the changes fully? I used to. But no longer. And that’s for two reasons. First, it is very hard to suggest edits because the changes come in chunks and it’s hard to have a full picture of what it is trying to achieve (no, checklists don’t help). Second, and more importantly, I am slow! I am the bottleneck in the way of great potential. AI gives me working code with just a few back-and-forths. I get in its way if I read every line.

I am not alone. Our small team—including the product manager—codes as fast as AI code generators. And we are among many teams trying to realize their ideas with these tools.

Assuming most of the code will be written by AI, how important is it to review the generated code? How can we keep up and not become the bottleneck? I’ll try to answer these questions and describe my approach.

Why do we care about code reviews?

Traditionally, we do code reviews for two main reasons. First, to ensure the change delivers the desired functionality and does so without side effects. Second, to verify it passes the expected quality bar.

The functionality piece is usually covered by different tests, both manual and automatic, and is relatively straightforward. Reviewing for quality is more challenging because quality means many different things—it’s often subjective and difficult to test.

Interestingly, a good review is one that cares about quality too; quality is the long-term play. As Martin Fowler describes in his article, “high quality software is cheaper to produce” and ultimately sells better.

But if AI is writing most of the code, is “quality” or more specifically “internal quality” still a real factor? Does “cruft” get in the way? With the current state of AI, I believe yes and yes. Quality still matters and cruft definitely gets in the way.

Quality and AI

When it comes to quality, many decisions stem from the current state of the project and our architectural vision for the software. Current AI tools are optimized for local tasks. They cannot make trade-offs based on short-term or long-term goals or respond to business changes. We still need humans to guide the overall design. For humans to do this effectively, they need to understand the code—perhaps not at line-by-line granularity, but at least at function-level granularity.

Moreover, software is built through collaboration. Each of us prompts AI differently and generates code with different considerations. Eventually, we need to merge these changes. Merging low-quality code is challenging. Even if AI handles the merging, it would leave behind considerable cruft—undesirable for the reasons mentioned above.

Generate correct code without reviewing every line

AI is not perfect. It can confidently generate code that is subtly wrong or uses outdated/insecure libraries. Hallucination remains an issue. However, I expect these problems will resolve quickly as tools and models improve. Meanwhile, effective prompting goes a long way toward generating functionally correct code.

Until we have perfect AI, we should maintain near-perfect test coverage. This builds trust in new functionality and prevents AI from introducing regressions in unrelated parts (I run into unintended changes in unrelated code more often than you might think).

Previously, writing tests was cumbersome for most developers. This was likely because tests require considerable boilerplate code, which isn’t fun to write. When starting a new project, I prioritize making it easy to write tests. With AI, achieving this state is even simpler.
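To make that concrete, here is a minimal sketch of what “easy to write tests” can look like, assuming a Python project using pytest. The application factory and helpers (create_app, init_schema, test_client) are hypothetical; the point is that shared fixtures absorb the boilerplate so each new test is only a few lines.

```python
# conftest.py -- shared fixtures so individual tests stay short.
# A sketch assuming a Python/pytest project; the app factory,
# in-memory database, and helper names are hypothetical.
import pytest

from myapp import create_app          # hypothetical application factory
from myapp.db import init_schema      # hypothetical schema helper


@pytest.fixture
def app():
    """Build the app against a throwaway in-memory database."""
    app = create_app(database_url="sqlite:///:memory:")
    init_schema(app)
    return app


@pytest.fixture
def client(app):
    """HTTP test client; tests just call client.get()/client.post()."""
    return app.test_client()


# A test now needs no setup boilerplate at all:
def test_healthcheck(client):
    assert client.get("/health").status_code == 200
```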

I strongly advise writing test cases manually. Better yet, write tests first—a practice known as TDD (test-driven development). This ensures our understanding is complete and that we’ve considered edge cases and ambiguous aspects of the functionality. In my experience, this exercise helps us craft better prompts and reduces back-and-forth with code assistants.
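As an illustration, here is what a test-first spec might look like before I prompt at all. The parse_duration function and its module are invented for this example; the value is that the edge cases I had to think through here translate directly into instructions in the prompt.

```python
# test_duration.py -- written *before* asking the AI for an implementation.
# parse_duration() is a hypothetical function; the point is that the edge
# cases below become explicit requirements in the prompt.
import pytest

from myapp.duration import parse_duration  # does not exist yet


def test_parses_minutes_and_seconds():
    assert parse_duration("2m30s") == 150


def test_parses_bare_seconds():
    assert parse_duration("45s") == 45


def test_rejects_empty_input():
    with pytest.raises(ValueError):
        parse_duration("")


def test_rejects_negative_values():
    with pytest.raises(ValueError):
        parse_duration("-5s")
```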

What about quality?

Laying out preferences is the first step. I use claude.md to describe the architecture, the reasoning behind it, and the conventions I want followed. I include details about important folders, their relationships, and guidelines for creating and placing new files. Other AI tools likely have similar mechanisms to specify preferences. Sharing these instructions among collaborators ensures everyone follows the same standards. Keeping them updated is also essential.
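To give a flavor, an excerpt from a claude.md might look like the sketch below. The folder names and rules are invented for illustration, but the shape (architecture and its reasoning first, then concrete conventions) is what I aim for.

```markdown
# claude.md (illustrative excerpt; folder names and rules are examples)

## Architecture
- `api/` holds HTTP handlers only; business logic lives in `core/`.
- `core/` must not import from `api/` or `infra/`.
- `infra/` wraps the database and external services behind interfaces defined in `core/`.

## Conventions
- A new endpoint gets a service function in `core/` plus tests in `tests/core/`.
- Use the shared logger in `core/logging.py`; do not call `print()` or create new loggers.
- Database access goes through the repository classes; no raw SQL in handlers.
```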

The prompt itself is equally important. Beyond describing functionality, the author should think about non-functional requirements and any constraints, then translate them to instructions in the prompt.

The reason I believe this translation is better done manually is that non-functional requirements are usually about architecture, and how they are realized varies with the current stage of the project and the expected next state.

Consider scalability as an example. For a project in its infancy, scalability is of no importance, and since there are not many dependencies (microservices, third-party APIs, etc.), local improvements are enough. But after product-market fit, when growth is happening, there are probably dependencies on different services, infrastructure patterns, databases, and so on. Scalability is no longer local to the code; it depends on many external factors, not all of which are known to the code assistant or controlled by code. If the author does not know how to do this translation, discussing it with AI definitely helps, but ultimately the prompt author should make the decision and instruct the AI what to do.

The other tools at our disposal to ensure quality and correctness are static code analyzers, linters, and AI code reviewers. Setting up hooks to automatically run these checks before accepting new changes is a must if you don’t want to review the code line-by-line.
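As a sketch, the gate can be as simple as one script that a git pre-commit hook (or your AI tool’s own hook mechanism) runs before a change is accepted. The specific tools here, ruff for linting and formatting plus pytest with a coverage floor, are assumptions; substitute whatever your project already uses.

```python
#!/usr/bin/env python3
# check.py -- a single entry point for the automated gate, e.g. wired up
# as a git pre-commit hook. A sketch: the tool choices (ruff, pytest with
# a coverage threshold) are assumptions, not requirements.
import subprocess
import sys

CHECKS = [
    ["ruff", "check", "."],                                   # lint / static analysis
    ["ruff", "format", "--check", "."],                       # formatting
    ["pytest", "-q", "--cov=myapp", "--cov-fail-under=90"],   # tests + coverage floor
]


def main() -> int:
    for cmd in CHECKS:
        print(f"running: {' '.join(cmd)}")
        if subprocess.run(cmd).returncode != 0:
            print("check failed; fix it (or have the AI fix it) before accepting the change")
            return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())
```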

Shift the focus

Most of what I discussed above is not new; these are practices that high-quality teams followed even before the age of AI assistants. Here, I am not advocating blindly accepting the changes as suggested by the AI. Rather, I want to find a pragmatic process to leverage AI as much as we can, as fast as we can. That means shifting the focus from reviewing the changes line-by-line to a more targeted, architecture-centric approach.

So, what does this look like for me? After AI is done with a task, the first thing I do is run git diff. I’m looking for deviations from the patterns we’ve already established in our codebase, or those I specifically laid out in claude.md. The big question I’m asking myself is: “Is this change actually moving us towards the system we intend to build, or is the AI subtly steering us off course?” It’s easy for an AI focused on a local task to miss the bigger architectural picture.

Next, I check if the AI added any new libraries or dependencies. This is crucial. I look into their credibility, quality, and how they’re actually being used in the generated code. I’ve often found myself asking Claude to regenerate a solution using a different library that our team trusts more, or one that’s a better fit than what the AI initially picked.

Then, my attention turns to data structures and anything touching the persistence layer. If there are new data definitions, or changes to existing structures or database schemas, do they make sense in the grand scheme of things? This is also where I hunt for common anti-patterns – things like N+1 queries popping up where they shouldn’t, or risky shortcuts like using simple string interpolation for database queries.
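For illustration, here are both anti-patterns in miniature, using a hypothetical orders/customers schema and a generic DB-API-style db.execute helper.

```python
# Illustrative only; the schema and the db helper are hypothetical.

# Anti-pattern 1: N+1 queries -- one query for the orders, then one more per order.
orders = db.execute("SELECT id, customer_id FROM orders WHERE status = ?", ("open",))
for order in orders:
    customer = db.execute(
        "SELECT name FROM customers WHERE id = ?", (order["customer_id"],)
    )  # executed once per row

# Better: fetch everything in one round trip with a join.
rows = db.execute(
    """
    SELECT o.id, c.name
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE o.status = ?
    """,
    ("open",),
)

# Anti-pattern 2: string interpolation in SQL -- an injection risk.
db.execute(f"SELECT * FROM users WHERE email = '{email}'")

# Better: parameterized queries.
db.execute("SELECT * FROM users WHERE email = ?", (email,))
```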

If there are large chunks of new code, or areas where most changes happened, I take a closer look. Does anything seem overly complex for what it’s trying to achieve? If my gut says “yes,” I dive deeper to pinpoint why it feels that way and then instruct the AI to regenerate that specific part, giving it more precise instructions to simplify.

A quick check on cross-cutting concerns is also on my list. Things like logging, error handling, telemetry, and authentication/authorization need to be consistent. I want to make sure the AI isn’t inventing its own way of doing these, but rather sticking to our project’s standards.
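One way I make this checkable is to funnel cross-cutting concerns through shared helpers, so the review question becomes “did the AI use them?” rather than “is this ad-hoc logging acceptable?”. A sketch with hypothetical names:

```python
# core/observability.py -- hypothetical shared helpers; the point is that
# generated code should call these rather than invent its own logging or
# error handling.
import functools
import logging

logger = logging.getLogger("myapp")


def instrumented(operation: str):
    """Decorator giving every handler the same logging and error shape."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            logger.info("start %s", operation)
            try:
                result = fn(*args, **kwargs)
                logger.info("done %s", operation)
                return result
            except Exception:
                logger.exception("failed %s", operation)
                raise
        return wrapper
    return decorator


# Generated code is expected to reuse the decorator, not reimplement it:
@instrumented("create_invoice")
def create_invoice(payload):
    ...
```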

And before I sign off on the AI’s work, I think about our non-functional requirements (NFRs) – specifically scalability, performance, and resilience. Considering the current stage of our project, I look for any obvious issues or design choices the AI made that might cause headaches down the line.

Finally, and this is a vital step, I update claude.md. If the AI introduced a new pattern that I like and want to see more of, or if it did something I definitely want to avoid in future changes, I make a note of it. This helps the AI get better aligned with our needs over time.

When it comes to peer review, I start by focusing on the “intent” and “rationale”. Looking at the original prompt and the chat history (or at least a summary of the AI’s plan) is essential. It gets everyone on the same page about what was asked and helps me spot if something critical was misunderstood by the AI or missed in the prompt. After that, the review process is very similar to how I review my own AI-generated changes.

Closing Thoughts

Shifting the way we do code reviews isn’t a decision to be taken lightly, and I know it might raise some eyebrows. It’s true that when we don’t check each line, we might be trading away a few things.

There’s the risk of subtle bugs that even a good test suite might miss. There’s also the question of learning opportunities – how do developers, especially those more junior, truly learn the craft or the intricacies of a codebase if they’re not regularly reading through new code? And, perhaps a less tangible but still important point, does our intuitive “feel” for the codebase diminish if we’re not constantly deep in the details?

These are valid concerns. My hope is that the kind of architecture-centric review I’ve described, coupled with strong testing practices (especially TDD) and automated checks, helps to mitigate these risks.

As for learning, it might shift. The learning curve now involves crafting effective prompts, understanding architectural trade-offs at a higher level, and critically evaluating the output. The “deep dives” into complex AI-generated sections can still provide those “craft” learning moments.

Ultimately, this approach is about finding a pragmatic balance. AI is undeniably accelerating how we build software. The challenge is to harness that speed responsibly, ensuring we maintain quality and understanding, even if the way we achieve it looks a bit different than it used to. It’s an evolving space, and I’m sure our processes will continue to adapt.