Measuring Code
The Good
Personally I’ve always found metrics very useful. For example – code coverage tools like Emma can give you a great insight into where you do and don’t have test coverage. Before embarking on an epic refactor of a particular package, just how much coverage is there? Maybe I should increase the test coverage before I start tearing the code apart.
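Something like this answers that question before you start pulling the code apart – a rough sketch using Python’s coverage.py rather than Emma, with a made-up package name, purely for illustration:

```python
# Measure coverage for just the package we're about to refactor
# ("billing" is a made-up name) by running the existing test suite
# under coverage.py and reporting only on that package.
import coverage
import pytest

cov = coverage.Coverage(source=["billing"])
cov.start()
pytest.main(["tests/"])          # run whatever tests already exist
cov.stop()
cov.save()
cov.report(show_missing=True)    # which lines never get executed?
```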
Another interesting metric can be lines of code. While working in a legacy code base (and who isn’t?), if you can keep velocity consistent (so you’re still delivering features) but keep the volume of inventory the same or less, then you’re making the code less crappy while still delivering value. Any idiot can implement a feature by writing bucket loads of new code, but it takes real craftsmanship to deliver new features and reduce the size of the code base.
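As a rough illustration – a Python sketch assuming a conventional Java source layout, not any particular tool – tracking that volume of inventory can be as crude as counting non-blank lines at the end of every iteration:

```python
# Crude "volume of inventory" metric: non-blank lines of Java source
# under a (hypothetical) src/main directory. If this stays flat or falls
# while features keep shipping, the code base is getting less crappy.
from pathlib import Path

def count_lines(src_dir: str) -> int:
    total = 0
    for path in Path(src_dir).rglob("*.java"):
        with path.open(encoding="utf-8", errors="ignore") as f:
            total += sum(1 for line in f if line.strip())
    return total

if __name__ == "__main__":
    print(count_lines("src/main"))
```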
The Bad
The problem with any metric is who consumes it. The last thing you want is for an over-eager manager to start monitoring it.
“You can’t control what you can’t measure” – Tom DeMarco
Before you know it, there’s a bonus attached to the number of defects raised. Or there’s a code coverage target everyone is “encouraged” to meet.
As soon as there’s management pressure on a metric, smart people will game the system. I’ve lost count of the number of times I’ve seen people gaming code coverage metrics. In an effort to please a well-meaning but fundamentally misguided manager, developers end up writing tests with no assertions. Sure, the code ran and didn’t blow up. But did it do the right thing? Who knows! And if you introduce bugs, will your tests catch them? Hell, no! So your coverage is useless.
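For illustration, a minimal sketch of what that gaming looks like – here in Python with pytest and a hypothetical module, but the trick is the same in any language:

```python
# A "test" written purely to push the coverage number up. It executes the
# production code, so the lines it touches count as covered - but with no
# assertions it can never fail, so it will never catch a regression.
from invoicing import calculate_total  # hypothetical module under test

def test_calculate_total_for_coverage():
    calculate_total(items=[("widget", 3), ("gadget", 1)])  # result ignored!
    # No asserts: if calculate_total starts returning nonsense, this test
    # still passes and the coverage report still looks green.
```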
The target was met but the underlying goal – improving software quality – has not only been missed, it’s now harder to meet in future.
The Ugly
The goal of any metric is to measure something useful about the code base. Take code coverage, for example – really what we’re interested in is defect coverage. That is, out of the universe of all possible defects in our code, how many would cause a failure in at least one test? That’s what we want to know: how protected we are against regressions in the code base.
The trouble is, how can I measure “the universe of all possible defects” in a system? It’s basically unknowable. Instead, we use code coverage as an approximation. Assuming the tests assert that the code did the right thing, the percentage of code they execute is a reasonable estimate of the likelihood that they will catch a given bug. If my tests execute 50% of the code, at best I can catch bugs in that 50%; if there are bugs in the other 50%, there’s zero chance my tests will find them. Code coverage is an upper bound on test coverage – but if your tests are shoddy, real test coverage can be much lower, to the point where tests with no assertions are basically useless.
And this is the difficulty with metrics: measuring what really matters – the quality of our software – is hard, if not impossible. So instead we have to measure what we can, but it isn’t always clear how that relates to our underlying goal.
But what does it mean?
There are some excellent tools out there like Sonar that give you a great overview of your code using a variety of common metrics. The trouble is that developers often don’t know (or care) what they mean. Is a complexity of 17.0 / class good or bad? I’m 5.6% tangled – but maybe there’s a good reason for that. What’s a reasonable target for this code base? And is LCOM4 a good thing or a bad thing? It sounds like a cancer treatment, to be honest.
Sure, if I’m motivated enough I can dig in and figure out what each metric means and we can try and agree reasonable targets and blah blah blah. C’mon, I’m busy delivering business value. I don’t have time for that crap. It’s all just too subtle so it gets ignored. Except by management.
A Better Way
Surely there’s got to be a better way to measure “code quality”?
1. Agree
Whatever you measure, it’s important the team agree on and understand what it means. If there’s a measure half the team don’t agree with, then it’s unlikely to get better. Some people will work towards improving it, others won’t and will let it get worse. The net effect is likely to be heartache and grief all round.
2. Measure What’s Important
You don’t have to measure the “standard” things – like code coverage or cyclomatic complexity. As long as the team agree it’s a useful thing to measure, agree it needs improving, and can commit to improving it – then it’s a useful measure.
A colleague of mine at youDevise spent his 10% time building a tool to track and graph various measures of our code base. But, rather unusually, these weren’t the usual metrics that the big static analysis tools gather – these were much more tightly focused, much more specific to the issues we face. So what kind of things can you measure easily yourself?
- If you have a god class, why not count the number of lines in the file? Less is better.
- If you have a 3rd party library you’re trying to get rid of, why not count the number of references to it?
- If you have a class you’re trying to eliminate, why not count the number of times it’s imported?
These simple measures represent real technical debt we want to remove – by removing technical debt we will be improving the quality of our code base. They can also be incredibly easy to gather: the most naive approach only needs grep & wc.
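For illustration, here’s a sketch of that naive approach in Python – the file, class and package names are made up, and grep & wc one-liners would do just as well:

```python
# Home-grown metrics of the kind described above, gathered by plain text
# searching. The file, class and library names are made up for illustration.
from pathlib import Path

SRC = Path("src/main/java")

def file_line_count(path: str) -> int:
    """Lines in the god class we're trying to shrink."""
    with Path(path).open(encoding="utf-8", errors="ignore") as f:
        return sum(1 for _ in f)

def count_matching_lines(needle: str) -> int:
    """Source lines mentioning a library or class we want rid of."""
    count = 0
    for source_file in SRC.rglob("*.java"):
        with source_file.open(encoding="utf-8", errors="ignore") as f:
            count += sum(needle in line for line in f)
    return count

if __name__ == "__main__":
    print("GodClass.java lines:  ", file_line_count("src/main/java/GodClass.java"))
    print("legacy library refs:  ", count_matching_lines("com.example.legacylib"))
    print("DeprecatedHelper uses:", count_matching_lines("import com.example.DeprecatedHelper"))
```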
It doesn’t matter what you measure: as long as the team believe that whatever you do measure should be improved, it gives you an insight into the quality of your code base, using a measure you actually care about.
3. Make It Visible
Finally, put this on a screen somewhere – next to your build status is good. That way everyone can see how you’re doing and gets a constant reminder that quality is important. This feedback is vital – you can see when things are getting better and, just as importantly, when things start to slip and the graph veers ominously upwards.
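A low-tech way to collect the history behind that graph – again just a sketch, with hypothetical metric names – is to append each day’s numbers to a CSV that whatever drives the screen can chart:

```python
# Append today's metric values to a CSV so whatever drives the team's
# information radiator can graph them over time. Names and values here
# are placeholders - wire in the output of your own counting scripts.
import csv
from datetime import date

def record(metrics: dict, history_file: str = "code-metrics.csv") -> None:
    with open(history_file, "a", newline="") as f:
        writer = csv.writer(f)
        for name, value in metrics.items():
            writer.writerow([date.today().isoformat(), name, value])

record({"god_class_lines": 1842, "legacy_lib_refs": 57})  # example numbers
```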
Keep It Simple, Stupid
Code quality is such an abstract concept it’s impossible to measure. Instead, focus on specific things you can measure easily. The simpler a metric is to understand, the easier it is to improve. If you have to explain what a metric means, you’re doing it wrong. Try and focus on just a few things at any one time – if you’re tracking 100 different metrics it’s going to be sheer luck that, on average, they’re all getting better. If I instead focus on half a dozen, I can remember them – the very least I’ll do is not let them get worse; and if I can, they’ll be clear in my mind so I can improve them.
Do you use metrics? If so, what do you measure? If not, do you think there’s something you could measure?