What’s the problem?
Here at Google, we have thousands of engineers working on our code base every day. In fact, as previously noted, 50% of the Google code base changes every month. That’s a lot of code and a lot of people. In order to ensure that our code base stays healthy, Google primarily employs unit testing and code review for all new check-ins. When a piece of code is ready for submission, not only should all the current tests pass, but new tests should also be written for any new functionality. Once the tests are green, the code reviewer swoops in to make sure that the code is doing what it is supposed to, and stamps the legendary “LGTM” (Looks Good To Me) on the submission, and the code can be checked in.
However, Googlers work every day on increasingly more complex problems, providing the features and availability that our users depend on. Some of these problems are necessarily difficult to grapple with, leading to code that is unavoidably difficult. Sometimes, that code works very well, and is deployed without incident. Other times, the code creates issues again and again, as developers try to wrestle with the problem. For the sake of this article, we’ll call this second class of code “hot spots”. Perhaps a hot spot is resistant to unit testing, or maybe a very specific set of conditions can lead the code to fail. Usually, our diligent, experienced, and fearless code reviewers are able to spot any issues and resolve them. That said, we’re all human, and sneaky bugs are still able to creep in. We found that it can be difficult to realize when someone is changing a hot spot versus generally harmless code. Additionally, as Google’s code base and teams increase in size, it becomes more unlikely that the submitter and reviewer will even be aware that they’re changing a hot spot.
In order to help identify these hot spots and warn developers, we looked at bug prediction. Bug prediction uses machine-learning and statistical analysis to try to guess whether a piece of code is potentially buggy or not, usually within some confidence range. Source-based metrics that could be used for prediction are how many lines of code, how many dependencies are required and whether those dependencies are cyclic. These can work well, but these metrics are going to flag our necessarily difficult, but otherwise innocuous code, as well as our hot spots. We’re only worried about our hot spots, so how do we only find them? Well, we actually have a great, authoritative record of where code has been requiring fixes: our bug tracker and our source control commit log! The research (for example, FixCache) indicates that predicting bugs from the source history works very well, so we decided to deploy it at Google.
How it works
In the literature, Rahman et al. found that a very cheap algorithm actually performs almost as well as some very expensive bug-prediction algorithms. They found that simply ranking files by the number of times they’ve been changed with a bug-fixing commit (i.e. a commit which fixes a bug) will find the hot spots in a code base. Simple! This matches our intuition: if a file keeps requiring bug-fixes, it must be a hot spot because developers are clearly struggling with it.
More at Hacker News.