Why Coding Copilots Deliver Less Than Expected...For Now
Expectations were always unrealistic, but impact is real and improving
Coding was one of the first breakout applications of the GenAI wave, starting with the launch of GitHub Copilot. Early studies suggested dramatic productivity gains, with GitHub’s research indicating a potential 50%+ speed-up. This sparked hopes that GenAI could revolutionize the speed (and, by extension, the productivity) of software development. However, the field data I could locate from software engineering teams shows a more tempered reality: actual productivity improvements range from 0% to 28%, averaging ~15%, which is in line with my anecdotal experience.

Sources for these figures are linked in the footnotes below.
Three important reasons drive this sizeable discrepancy:
1. The Fallacy That Coding Is All Software Engineers Do
The misconception that software engineering is mostly coding is the primary reason for inflated expectations of GenAI's impact. In reality, developers spend only 10-40% of their time on actual coding (the low and high ends of published estimates); the rest is split between meetings, system design, testing, troubleshooting, and other tasks.
When coding assistants make coding faster, they speed up only a small part of a developer's working time. A simple, Amdahl's-Law-style calculation illustrates the concept: even if GenAI doubled coding productivity (a best-case scenario), it could only contribute a 5-20% productivity gain overall.
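Here is that back-of-the-envelope arithmetic as a minimal sketch (the 10-40% coding share and the 2x speedup are the assumptions stated above):

```python
# Back-of-the-envelope: overall gain from speeding up only the coding slice.
# Assumptions from the text: coding is 10-40% of a developer's time,
# and GenAI doubles coding speed (a best-case scenario).

def overall_time_saved(coding_share: float, coding_speedup: float) -> float:
    """Fraction of total working time saved when only coding gets faster."""
    return coding_share * (1 - 1 / coding_speedup)

for share in (0.10, 0.40):
    gain = overall_time_saved(share, coding_speedup=2.0)
    print(f"coding share {share:.0%} -> overall gain {gain:.0%}")

# coding share 10% -> overall gain 5%
# coding share 40% -> overall gain 20%
```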

2. Lab Environment vs. Real-World Complexity
A second reason for the disconnect lies in the difference between controlled, lab-based testing environments and the realities of complex, real-world projects. In labs, tasks are usually very well specified, isolated, and performed on clean, small codebases. By contrast, real-world project goals and requirements are often ambiguous, and changes need to be made to sprawling codebases and involve cross-functional dependencies. This is why teams working on new projects or smaller features tend to report much more benefit.
In addition, when AI-generated code contains errors, developers may need more time to untangle and fix those mistakes than they would have needed to write the code themselves. A study by Uplevel found that debugging AI-generated errors adds a layer of complexity, as developers trace convoluted errors back to the generated code.
Lastly, these tools perform best in widely used languages like Python and JavaScript, whereas code quality is lower for niche or older languages (e.g., Haskell or Fortran).
3. Lack of Clarity in Repurposing Saved Time
Even if developers gain back valuable time, the absence of clear guidance on how to best use this time limits tools’ impact. Without specific, prioritized tasks lined up for developers, productivity gains become incidental rather than strategic. To avoid this trap, enterprises need a concrete approach to redeploying saved time effectively, turning potential productivity into tangible gains for the business.
Does This Mean GenAI in Software Development Is Overhyped?
Not really. A few years ago, gaining a 10% boost in productivity in one year would have seemed incredible. It only seems weak when the expectation is unreasonably set at 50%!
It’s important for leaders to set reasonable expectations to avoid a trap where genuinely useful technology is deemed a dud simply because it did not live up to overly ambitious expectations.
That said, these tools already offer clear benefits in specific scenarios, and GenAI's impact will grow as the tools improve.
Here are several practical applications where GenAI can add immediate value:
Large Codebase Refactoring: Amazon's CEO reported that these tools saved 4,500 developer-years when refactoring old codebases. While the number is ambitious, it highlights the savings potential of well-implemented AI tools.

Since GenAI is particularly adept at identifying repetitive code patterns in extensive codebases, it makes sense that it would be ideal for large-scale refactoring projects (see the sketch after this list).
Language Familiarization: Coding assistants help developers new to a language or framework by providing boilerplate code, function suggestions, and examples of idiomatic syntax and structures. This means that, at the very least, developers might be more versatile and onboard to new projects more quickly.
Reintroducing Occasional Developers: For those who don’t code regularly, like former engineers in leadership roles or citizen engineers, these tools can act as a quick reference or even a partner in simpler coding tasks. This lowers the barriers to participation, potentially broadening the base of developers and enriching software output.
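To make "repetitive code patterns" concrete, here is a small, hypothetical before/after of the kind of mechanical transformation coding assistants handle well (the example is illustrative, not drawn from any of the cited studies):

```python
# Before: the same validation pattern copy-pasted across many handlers.
def create_user(payload: dict) -> None:
    if "email" not in payload:
        raise ValueError("missing field: email")
    if "name" not in payload:
        raise ValueError("missing field: name")
    # ... create the user ...

# After: the repeated pattern extracted into one helper, applied everywhere.
def require_fields(payload: dict, *fields: str) -> None:
    """Raise if any required field is absent from the payload."""
    for field in fields:
        if field not in payload:
            raise ValueError(f"missing field: {field}")

def create_user(payload: dict) -> None:
    require_fields(payload, "email", "name")
    # ... create the user ...
```

Multiplied across hundreds of files, this is exactly the kind of change that is tedious for humans but straightforward for a pattern-matching assistant.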
Beyond coding assistants, GenAI tools can also contribute in other ways that free up software developers’ time.
Automating Test Creation: AI tools can auto-generate test cases, catching edge cases that might otherwise be missed (a minimal sketch follows this list).
Running Autonomous Testing: Executing these tests with GenAI agents allows for rapid quality assurance.
Generating Documentation: AI-written documentation can lower the cognitive load for developers who need to understand legacy code or onboard new engineers.
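As a rough illustration of automated test creation, here is a minimal sketch using the OpenAI Python client (the model name, prompt, and parse_price function are illustrative assumptions, not a recommended setup; any code-capable LLM API would work similarly):

```python
# Hypothetical sketch: asking an LLM to draft pytest tests for a function.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SOURCE = '''
def parse_price(raw: str) -> float:
    """Convert a string like "$1,299.99" to a float."""
    return float(raw.replace("$", "").replace(",", ""))
'''

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[
        {"role": "system",
         "content": "You write pytest unit tests. Cover edge cases: "
                    "empty strings, malformed input, negative values."},
        {"role": "user", "content": f"Write pytest tests for this function:\n{SOURCE}"},
    ],
)

print(response.choices[0].message.content)  # human review before committing
```

The generated tests still need human review, but drafting the edge cases is exactly the tedious part this automates.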
The Road Ahead: Evolving Capabilities
As the field matures, new tools and advances promise to address current limitations.
Better tools: Cursor and Sourcegraph are making strides in enhancing AI’s understanding of codebases, which should reduce errors and improve the relevance of suggestions, especially in large and complex codebases.
Better models: GenAI models like OpenAI's o1 are improving multi-step reasoning and self-evaluation capabilities. They enable the AI to review and correct its own work, increasing the quality of the first answer it returns.
Agents: Emerging AI agents, such as Devin or Cosine Genie, are showing promise. These agents can autonomously resolve over 20% of GitHub issues on a leading benchmark (the full SWE-bench), an impressive leap from under 5% just a year ago.
Conclusion
While the initial 50% productivity claims were unrealistic, the current 10-15% gains are a meaningful start. The key for leaders is to:
Set the right expectations
Prioritize areas where GenAI will have the greatest impact (refactoring, smaller codebases, popular languages and frameworks)
Explore GenAI uses beyond just coding assistants (documentation, testing)
Keep an eye on coding agents, as they are improving rapidly.
1 Electronic Manufacturing Company - MIT/NBER
2 Singapore GovTech - arXiv link
3 ZoomInfo - CIO Magazine
4 Bain & Co - 2024 Tech Report
5 Jellyfish - blog post
6 Uplevel - report