Code coverage and more using Valgrind's Callgrind

From Valgrind's callgrind manual:

Callgrind is a Valgrind tool for profiling programs. The collected data consists of the number of instructions executed on a run, their relationship to source lines, and call relationship among functions together with call counts. Optionally, a cache simulator (similar to cachegrind) can produce further information about the memory access behavior of the application.

The result is that when you run a program through Callgrind, the files it produces contain a plethora of data just waiting to be mined. I hacked together some tooling to pull out data for tasks including:

  • Watching a function for performance regressions
  • Verifying that no new libraries are loaded at runtime, or that all expected libraries are loaded
  • Listing every source file used, to find hot code or, conversely, dead code
  • Listing every function and line used
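To give a feel for how accessible this data is, here is a minimal C++ sketch (not the tooling described in this post) that reads a callgrind output file and collects the loaded objects (ob=), the source files (fl=, fi=, fe=), the self cost per function (fn=), and every source line with nonzero cost. It assumes the default `positions: line`, reads only the first event column, skips the inclusive cost line that follows each calls= line, and handles the "(id) name" compression and the +n/-n/* relative positions in a simplified way. CallgrindSummary and parseCallgrind are hypothetical names.

```cpp
#include <cctype>
#include <istream>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>

// What one callgrind output file says about a run (hypothetical helper type).
struct CallgrindSummary {
    std::set<std::string> objects;                    // ob=/cob=: binaries and shared libraries
    std::set<std::string> files;                      // fl=/fi=/fe=: source files
    std::map<std::string, unsigned long long> selfIr; // fn= name -> self cost (first event)
    std::set<std::pair<std::string, long>> lines;     // (source file, line) with nonzero cost
};

// Name compression: "(3) name" defines id 3, and a later bare "(3)" reuses it.
static std::string resolveName(const std::string& value,
                               std::map<std::string, std::string>& table) {
    if (value.empty() || value[0] != '(') return value;
    const std::size_t close = value.find(')');
    if (close == std::string::npos) return value;
    const std::string id = value.substr(0, close + 1);
    if (close + 1 < value.size()) table[id] = value.substr(close + 2); // skip ") "
    return table[id];
}

CallgrindSummary parseCallgrind(std::istream& in) {
    CallgrindSummary s;
    std::map<std::string, std::string> obIds, flIds, fnIds; // separate id spaces
    std::string line, fnFile, curFile, fn;
    long lastPos = 0;
    bool callCostNext = false; // the cost line after calls= is the call's inclusive cost
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;
        const std::size_t eq = line.find('=');
        const std::string key = eq == std::string::npos ? "" : line.substr(0, eq);
        const std::string val = eq == std::string::npos ? "" : line.substr(eq + 1);
        if (key == "ob" || key == "cob") { s.objects.insert(resolveName(val, obIds)); continue; }
        if (key == "fl") { fnFile = curFile = resolveName(val, flIds); s.files.insert(curFile); continue; }
        if (key == "fi" || key == "fe") { curFile = resolveName(val, flIds); s.files.insert(curFile); continue; }
        if (key == "cfi" || key == "cfl") { resolveName(val, flIds); continue; } // only registers the id
        // Assumption: a new fn= switches back to the file given by the last fl=.
        if (key == "fn") { fn = resolveName(val, fnIds); curFile = fnFile; continue; }
        if (key == "cfn") { resolveName(val, fnIds); continue; }
        if (key == "calls") { callCostNext = true; continue; }
        const char c = line[0];
        if (!(std::isdigit(static_cast<unsigned char>(c)) || c == '+' || c == '-' || c == '*'))
            continue; // header lines such as "events: Ir", jump=/jcnd= lines, etc.
        std::istringstream fields(line);
        std::string posTok;
        unsigned long long cost = 0;
        fields >> posTok >> cost;  // with several events, only the first is read
        long pos = lastPos;        // "*" means the same position as the previous line
        if (posTok[0] == '+' || posTok[0] == '-') pos = lastPos + std::stol(posTok);
        else if (posTok != "*") pos = std::stol(posTok);
        lastPos = pos;
        if (callCostNext) { callCostNext = false; continue; } // not self cost
        s.selfIr[fn] += cost;
        if (cost > 0) s.lines.insert({curFile, pos});
    }
    return s;
}
```

Comparing `selfIr` for a watched function between two runs is one way to flag a regression, and diffing `objects` against an expected list covers the library check.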

The last one can also be used for fun things such as stripping a binary down to the absolute minimum for 96K and similar size-limited contests.

Combining this data with a C++ parser (rpp in my case, but llvm also works), you can generate pretty good code coverage statistics: at a minimum, which functions are never called, and beyond that, which lines have been executed. It turns out Callgrind can do even better, because its "--collect-jumps=yes" option adds jump information (which conditional branches were taken, and how often) to the callgrind file, which enables much better coverage statistics.
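The coverage arithmetic itself is small. Here is a minimal C++ sketch, assuming you already have two sets of the same kind of key: what the parser says could run (function signatures, or (file, line) pairs) and what the callgrind file says did run. Coverage, computeCoverage, FunctionKey, and LineKey are hypothetical names. In practice most of the effort goes into making both sides spell names the same way; Callgrind normally writes demangled names, so the parser side has to produce matching signatures.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

using FunctionKey = std::string;              // e.g. a demangled signature
using LineKey = std::pair<std::string, int>;  // (source file, line number)

// Coverage of a set of "coverable" items by what actually ran.
template <typename Key>
struct Coverage {
    std::size_t total = 0;
    std::size_t hit = 0;
    std::vector<Key> missed;  // coverable but never executed, in sorted order
    double percent() const { return total ? 100.0 * hit / total : 0.0; }
};

// `coverable` comes from the C++ parser, `executed` from the callgrind output.
// Executed keys the parser does not know about (e.g. library code) are ignored.
template <typename Key>
Coverage<Key> computeCoverage(const std::set<Key>& coverable, const std::set<Key>& executed) {
    Coverage<Key> c;
    for (const Key& k : coverable) {
        ++c.total;
        if (executed.count(k)) ++c.hit;
        else c.missed.push_back(k);
    }
    return c;
}
```

The same template covers both levels: with FunctionKey, `missed` is the list of functions that are never called; with LineKey, it is the list of lines never executed.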

With branching information for a line like the following, you know both that the line was executed and which branches were taken:

1035    if (isFull) {

If isFull is true in every test, the line statistics might suggest that you have full coverage of the function, but the branching information shows that you still need a test where isFull is false.
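The decision itself can be sketched in a few lines of C++. This assumes you have already extracted, for each conditional jump, how often it executed and how often it was taken; with --collect-jumps=yes Callgrind records conditional jumps as jcnd= lines, but check the exact field layout in the Callgrind file-format documentation for your Valgrind version before parsing them. CondJump and classify are hypothetical names. Note that which outcome corresponds to isFull being true depends on the generated code: an `if` is often compiled as a jump over its body, in which case "taken" means the condition was false.

```cpp
#include <cstdint>

// Which outcomes of one conditional jump were observed during the run.
enum class BranchCoverage { NotExecuted, OnlyTaken, OnlyFallThrough, Both };

// Counts for one conditional jump (hypothetical record, filled from the jcnd= data).
struct CondJump {
    std::uint64_t executed;  // times the jump instruction ran
    std::uint64_t taken;     // times the jump was actually taken
};

BranchCoverage classify(const CondJump& j) {
    if (j.executed == 0) return BranchCoverage::NotExecuted;
    if (j.taken == 0) return BranchCoverage::OnlyFallThrough;    // never jumped
    if (j.taken >= j.executed) return BranchCoverage::OnlyTaken; // always jumped
    return BranchCoverage::Both;                                 // both directions seen
}
```

Any result other than Both marks a branch direction that no test exercises, even though line coverage reports the line as executed.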

This becomes even more valuable when a single line contains multiple possible jumps, such as the following:

633     if (model && columnClicked) {
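Because && short-circuits, a compiler often emits one conditional jump per operand here, so the line can have four branch outcomes rather than two. A minimal C++ sketch that totals observed outcomes per line, assuming (hypothetically) that each jump has already been reduced to its source line plus executed and taken counts. A jump that never ran presumably leaves no record in the callgrind file (for example, the jump for columnClicked if model is always null), so the "possible" count here is only a lower bound unless it comes from the parser or the disassembly. LineJump and branchOutcomesPerLine are hypothetical names.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// One conditional jump attributed to a source line (hypothetical input record).
struct LineJump {
    int line;
    std::uint64_t executed;  // times the jump instruction ran
    std::uint64_t taken;     // times the jump was taken
};

// For each line: (outcomes observed, outcomes possible), counting two outcomes
// (taken and fall-through) for every recorded conditional jump.
std::map<int, std::pair<int, int>> branchOutcomesPerLine(const std::vector<LineJump>& jumps) {
    std::map<int, std::pair<int, int>> perLine;
    for (const LineJump& j : jumps) {
        std::pair<int, int>& counts = perLine[j.line];
        counts.second += 2;
        if (j.taken > 0) ++counts.first;           // taken at least once
        if (j.executed > j.taken) ++counts.first;  // fell through at least once
    }
    return perLine;
}
```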

Now that you have good code coverage, you might as well gamify it. By combining the C++ parser with git blame and git log, it was possible to write a tool that determines who "owns" the code and generates a webpage with a public scoreboard showing the code, its owners, and its code coverage. Provide handy links explaining how to easily improve coverage, and let people's desire to be at the top take over.
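One plausible way to assign ownership is the last author of each line according to `git blame --line-porcelain`, which repeats the full commit header (including an `author` line) before every source line and prefixes the source text itself with a tab. The C++ sketch below parses that output for a single file and turns it into a per-author scoreboard. It uses blame only (the tool described here also used git log), assumes the `coverable` and `executed` line sets come from the parser and the callgrind data, and parseLinePorcelain, OwnerScore, and scoreboard are hypothetical names.

```cpp
#include <algorithm>
#include <istream>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Map final line number -> author, from `git blame --line-porcelain <file>` output.
std::map<int, std::string> parseLinePorcelain(std::istream& in) {
    std::map<int, std::string> owner;
    std::string line, author;
    int finalLine = 0;
    bool expectHeader = true;  // each entry starts with "<sha> <orig> <final> [<n>]"
    while (std::getline(in, line)) {
        if (expectHeader) {
            std::istringstream header(line);
            std::string sha;
            int origLine = 0;
            header >> sha >> origLine >> finalLine;
            expectHeader = false;
        } else if (!line.empty() && line[0] == '\t') {
            owner[finalLine] = author;  // the tab-prefixed source line ends the entry
            expectHeader = true;
        } else if (line.compare(0, 7, "author ") == 0) {
            author = line.substr(7);    // "author-mail", "author-time" do not match
        }
    }
    return owner;
}

struct OwnerScore {
    std::string author;
    int covered = 0;
    int total = 0;
    double percent() const { return total ? 100.0 * covered / total : 0.0; }
};

// Rank authors by the share of their coverable lines that were executed.
std::vector<OwnerScore> scoreboard(const std::map<int, std::string>& owner,
                                   const std::set<int>& coverable,
                                   const std::set<int>& executed) {
    std::map<std::string, OwnerScore> byAuthor;
    for (int line : coverable) {
        const auto it = owner.find(line);
        if (it == owner.end()) continue;
        OwnerScore& s = byAuthor[it->second];
        s.author = it->second;
        ++s.total;
        if (executed.count(line)) ++s.covered;
    }
    std::vector<OwnerScore> board;
    for (const auto& entry : byAuthor) board.push_back(entry.second);
    std::sort(board.begin(), board.end(), [](const OwnerScore& a, const OwnerScore& b) {
        if (a.percent() != b.percent()) return a.percent() > b.percent();
        return a.author < b.author;  // stable, readable tie-break
    });
    return board;
}
```

Ranking by percentage rather than by raw covered-line count keeps people who own little code competitive, at the cost of letting a single well-tested line put someone at the top; either choice is a policy decision for the scoreboard.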

By standing on the shoulders of giants and combining tools like llvm, which let you parse and walk C++ code, with other rich data sources, you can create powerful new tools.
