Shift Left: Code Review

Code review can mean a wide variety of things such as formal code review or automated code analysis. For this article I am talking about when a developer has a patch to some code base and would like someone else to review it before it goes into the main repository.

Just the thought of having a code review will frequently cause developers to spend extra time making sure a patch is as good as it can be. Code reviews typically catch bugs, design flaws, missed edge cases, inconsistent or confusing code style, and more.

There are several core ideas for code reviews, the more of them that are followed the more successful the technique will be.

  • Review of the code by one or more persons before committing it to the central repository.
  • Record comments about a patch so suggestions are not forgotten as a patch changes.
  • Patches should be easy to review and test.
  • The developer shouldn't feel like the code review is holding them back or that they are waiting for a review before they can continue working.
  • Reviews should integrate easily with the existing work flow.

With only these five requirements it shouldn't be too hard to do. So what are the most common ways that developers do code reviews?

Email pass-around

The most common form of code review I have seen is the email mailing-list of commits. It is so common that even when you find any other technique you will often still find this one running too. This is mostly because it is "the example" of how to add a hook to your revision control system and usually extremely easy to setup if not done for free. It works by having a hook in the primary code repository that sends an e-mail to a mailing-list with the diff of every commit. This of course fails the first and most important requirement which is to get the review before committing. The idea is that everyone reads commits they are interested in and replies to the mailing-list when they spot something wrong which another commit fixes. In practice most commits are not reviewed and most people read only a few random commits before marking them all as read. This of course only gets worse as the project grows and the volume of mail on this mailing-list increases to the point where no one reviews. To top it off when you see some code that works, but has a few confusing variable names you wont say anything. In fact it takes quite a bit for anyone to actually hit the reply and comment on a patch; code that doesn't compile, possible security bug, introduction of bad public API or some other major and obvious bug. And this works both ways, the committers know that they can get away with it so they don't make their commits as polished as they could.

Cons: By the time the e-mail goes out to the mailing list the patch is already committed into the repository. Once the code is in the repository it is much harder to ask someone to make smaller fixes and can even be taken as an insults. In practice there is an extremely low volume of actual reviews that are done.

Pros: Having nothing to do with code review this method is often best to occasionally let you see what is going on elsewhere in the project. In fact it has spawn a whole sub world that digest commits and write up news reports

Over the shoulder

Less common in the open source world (and in the corporate world) is the in person code review. This is where you have someone physically next to you while you explain the patch. While better it has its own drawbacks. When you go over to do a code review for someone else they have usually been sitting around for five minutes waiting for a code review and quickly walk you through the code. Because you are face to face, you can often ask any question and quickly determine if the developer had explored all the options you can think of. You can ask if the unit tests have run, if there is an example, etc. At the end of the day the results vary wildly with some reviews being in depth while others being just a quick glance. When this is the primary code review method and the person that would normally review some code is not available (vacation, sick, out of office) the commit might go in without any review.

Pros: Face to face reviews can be very quick and I can quickly turn into pair programming. The process of explaining the problem and patch can often times cause the developer to think of edge cases themselves to test. Unrelated this is one way to teach someone about a piece of code.

Cons: Many times the reviewer can feel rushed as they try to digest code that the developer quickly explained. The developers tells the reviewer the patch is correct even before the reviewer has reviewed the code. The developer who is best suited to review the patch might not be around and a developer who does not know the code and is not as able to properly review the code is drafted into doing it.

E-Mail diff

Bits of all of the above, a diff is passed around through e-mail which can spawn a thread of comments. The biggest downside is that there is a higher cost to entry and people are less likely to send smaller or simpler patches via e-mail. Often times the only patches that are sent via e-mail are larger ones that go through a number of revisions, and yet the code being e-mailed around isn't in a revision control system loosing out on the ability to see the diff between versions. Because the developer resends the entire patch every time they make a change the more revisions that are sent in a thread the less actual reviewing will be done.

PasteBin

The past few years I have seen the rise of paste websites such as http://paste.lisp.org/ that provide a simple way to paste some text. Usually no login is required and it takes only seconds to do. Developers will generate a diff and paste it onto these sites and then ask for a review over IRC or another chat system. There are even command line tools to make pasting code straight onto the website even easier such as 'svn diff | lpaste'. Pasting while cheap and easy has drawbacks such as requiring the other person to be there right then and the paste is often missing full context such as what was just above a diff. Because pasting is usually done in conjunction with chat the paste sites don't provide a way to add comments to a patch.

Pros: Very easy to do. Can be done with someone that is remote. As it is just text it is not constrained by any system or process. Also used sometimes for a quick way to send patches or other files rather then e-mail. Because you are also chatting via IRC you know that patch is being reviewed right there and then vs e-mail which could be a week.

Cons: The patch is frequently only read and almost never applied to a working repository to test even if it even compiles. This is just another tool that has absolutely no integration with any other tools you use such as the source repository. Feedback is always done via another format such as IRC. Requires live interaction between both parties and requires the manually pushing of the diff up to the website in the first place to get a review. Many paste sites have an expiration of a month and sometimes developers set it to one day causing frustration the next day when the reviewer tries to access it.

WebTool

In the past year or two several code review website tools have popped up. http://www.review-board.org/ and Rietveld are the one that had obtained some blog press. The review tools all try to improve upon the paste idea by managing patches. You can add new patches, request reviews from developers, add comments to patches, and update the patch. When I first heard of the tool I though it was fantastic and played around with it, but ended up not using it for very long. Fundamentally it requires a lot of work on the part of the developer. For every patch there is a lot of manual work that needs to be done. Even when the patch is approved you still have to submit the patch and then mark it as committed in the tool. Not to mention you need to have yet another way to watch for updates to your patches and for patches you need to review. When testing out the systems I only ever used one patch, but in production I had many patches, some of them on the same file. This caused problems as the patches were approved and other patches got stale and the tool would get confused. Fundamentally a review system needs to have the foundations of a revision control system to properly work and be closely tied into the revision control system that is used. Most of the review sites out there are little more then paste projects that got scaled up, they will let you down. To further illustrate this: you can patch your patches. This causes the review system to keep what is a branch on top of the current tree that you can view. This eventually will break by not behave like your real revision control system does. The most common one I saw was where I wrote patch A, someone else submitted B and then I updated A and uploaded it. Rietveld would freak out and break. And because the review systems are so big they eventually go down for a day here or there. All of your patches are up on the system when the review system goes down you can't do anything either. Just as pasting got popular because it was so very easy, a web tool probably wont get very popular because it is just to much work.

Is there any better options?

So after that whirlwind tour of what is currently in use out there I want to present something I have been using the past few months that takes advantages of distributed revision control and a little known feature of GitHub.

One behavior I have witnessed is that once a developers pastes a patch they wont do anything until they get a review. They won't commit until they get a review and they don't want to work on the next task until they commit the first. Nice catch-22. Using a distributed revision control system such as Git (or any system where branches are cheap) a developer can commit to the branch, post the patch for review, and continuing hacking in another branch giving the reviewer(s) as much time as they want to review the code. Patches sitting in their own branch can be worked on and revises for as long as needed without feeling as pressured to merge it 'right now' as often happens when a system does not use branches.

So just switching to Git solves one annoying code review problem, but can it do more? Several months ago I started using GitHub for hosting one of my open source projects. Every developer has a fork of the main repository (including me). When a branch (often just one change) is ready it is pushed up to GitHub and a pull request is made. When I get a pull request I can view the patch which is the same as e-mail pass-around and paster, but because the patch is in the source repository I can do all of the following very cheaply:

  • Review the patch in the context of the entire file. I can glance at just a patch, view the entire file, its history, other branches, and well anything I want because I have the full repository right there.
  • Review the changelog entry to make sure it is following the project format, spelling, etc.
  • Checkout the code and try it myself. The number of times I have actually downloaded and applied a patch from a paster is very small, maybe twice a year. But with GitHub almost every time I have fetched the repository, built the branch, played with the new feature / bug fix, ran the auto tests, etc. It is very cheap and easy. When I get a pull request from user foo, I run git fetch foo to grab their changes and because the patch is in its own branch compiling and running the patch is as easy as checking out their branch. No patching required.

Another nice aspect is that when viewing the change on GitHub you can click on any line number and add a comment right there. Add as many comments as you want to the different issues you spot. When the developer commits the next version you don't have to review the whole patch, you only have to review the patch to the patch to make sure that what you commented on was fixed. Because the branch is a real Git branch and not some files on a website review tool it understands patches to patches and when upstream is rebased just fine and never has issues.

Unlike review-board there is no extra work involved to review code. I get code review features for free with doing my normal everyday committing and pushing. When I get a pull request that is perfectly fine I can just merge it and reply back with a "Thanks!". But if I want to go in-depth with a review and leave comments and have them upload a new versions I can. If I want multiple reviews of one of my changes I can easily get that with multiple pull requests (with a comment just to review and not merge). To get reviews of my code I already push up my branches so it is nothing more then going to GitHub and clicking on the pull request button.

Reviewing with GitHub isn't perfect. I would like to have an RSS of all the comments I have left and another one of all the comments others have left on my repository. Currently they are simply placed in your news feed. But what is in place today works well enough and is far beyond in my opinion the other code review options.

With distributed revision control systems and sites like GitHub or Gitorious code review is such a simple add-on to already proven workflows, it's essentially free.

Previous
Previous

JavaScript speed, the browser wars, and the death of IE6

Next
Next

Code coverage and more using Valgrind's Callgrind