The State of Testing
When I joined my first large team, I was naive. When I found bad code, I said, “you should leave things better than you found them.” One day I found a method name that didn’t follow conventions, and as pedantic as it sounds, the right thing to do was rename it. So I did. I updated all references, double-checked my change, clicked through to make sure nothing broke, and finally, confident, committed my work.
When that change went live, our advertisers stopped receiving impression data. Our profit came entirely from advertisers. This was disastrous.
The bug: inside a configuration file nobody knew about was a set of hard-coded method names monitored by the logging code. Nobody knew about this file because the logging system had been written a year earlier, and every engineer who worked on it had left or changed departments.
We eliminated these kinds of regressions by developing an automated test suite. According to a publication released by the Department of Defense, half the cost of a project can be QA. The same article goes on to say that by the time a project reaches the ten-year mark, automated tests cut cumulative test days by 97%.
That’s reason enough to adopt automated tests, but there’s a pile of secondary benefits.
- Your code improves. You write cleaner APIs and decouple logic when you’re forced to eat your own dog food.
- You clearly define business logic. You ask questions when you need to define “correct” with true and false.
- You have supplemental documentation. A comment explaining your weird code won’t stop someone from trying to refactor it. However, a failing test gets your point across.
There is a clear return on investment in automated testing, so I was curious how many people do it. The results were depressing. Anywhere from 39% to 54% of developers don’t write any sort of tests, according to the Department of Defense Information Analysis Center (2007), a geospatial developer survey (2008), a .NET developer site (2007), and this consultancy (2008).
This post will eliminate the excuse, “I don’t know where to start.” It will be a primer if you’ve never tested, and provide pointers if you’re already familiar. We’ll see:
- Structure of tests
- Levels of tests
- Tools to improve test adoption
While the examples I use are written in Ruby, most have equivalents in other languages. If you’re on the JVM or CLR, you can even integrate these tools directly through JRuby or IronRuby, respectively.
Structure of a Test Case
As an example, we’ll use a photo cropper I developed for starrup, an actor website service. When an actor uploads a headshot, we present an Ajax interface to crop it for thumbnails. This clearly needs tests: image libraries are ugly, brittle beasts, and we wouldn’t be surprised if they broke between updates.
If we had no experience with test frameworks, we’d write a script like this, which throws an exception if our code stops working:
```ruby
@new_photo = @photo.crop_at(0, 0, 300)

if !@new_photo
  raise "Did not create a photo."
end

if @new_photo.x != 300
  raise "Photo didn't crop."
end
```
That works, but we’re going to run into problems:
- “if/raise” gets ugly fast.
- Using an exception is confusing when the app throws its own exceptions.
- If we run a bunch of tests in a row, we aren’t sure whether one test is affecting another.
- If we run tests independently, our setup/teardown gets redundant.
These problems are all solved by test frameworks. The same test in xUnit style:
```ruby
class CroppingTest < Test::Unit::TestCase
  def setup
    @photo = Photo.find(1) # A 600x600 photo
  end

  def test_cropping
    @new_photo = @photo.crop_at(0, 0, 300)
    assert(@new_photo)
    assert_equal(300, @new_photo.x)
  end

  def test_cropping_fails_too_big
    @new_photo = @photo.crop_at(0, 0, 700)
    assert_equal(600, @new_photo.x)
  end
end
```
Here’s what you get with a framework:
- “assert” is shorthand for the if/raise.
- Asserts throw failures with helpful diagnostic information.
- Each method that starts with “test” is an independent test.
- “setup” runs before every test.
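There’s no magic to `assert`; it’s essentially the if/raise pattern from the earlier script, packaged with a diagnostic message. A stripped-down sketch, not the real Test::Unit source:

```ruby
# Minimal illustration of what an assertion does: raise on a falsy
# value, carrying a message. Real frameworks also catch these failures,
# count them, and print a report instead of crashing.
def assert(condition, message = "Assertion failed")
  raise message unless condition
end

def assert_equal(expected, actual)
  assert(expected == actual,
         "Expected #{expected.inspect}, got #{actual.inspect}")
end
```

The framework runs each `test_*` method inside a rescue block, which is how one failing assertion is reported without stopping the rest of the suite.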
This system is great. Most people who write tests use some variation of this framework. But frequently you’ll run into people who use the framework correctly but miss the big picture. You’ll see giant meandering tests. You’ll see tests that touch a bunch of unrelated functionality, without motivation.
Like the rest of your code, good tests are concise, specific, and to the point. The clearest way to achieve that is to test specific behaviors.
“Behavior-driven development” frameworks like RSpec hammer home this point. They load syntactic sugar onto xUnit:
```ruby
describe "cropping" do
  before(:each) do
    @photo = Photo.find(1) # A 600x600 photo
  end

  it "should crop images" do
    @new_photo = @photo.crop_at(0, 0, 300)
    @new_photo.x.should == 300
  end

  it "should not crop larger crop regions" do
    @new_photo = @photo.crop_at(0, 0, 700)
    @new_photo.x.should == 600
  end
end
```
When you run that, you even get pretty colored reports.
This format approaches natural language, but what if you pushed it one step further? What if you could sit down with business people and write a user story like this:
```gherkin
Feature: Photo cropping
  In order to improve headshot photos
  As a subscriber
  I want to crop photos

  Scenario: Crop a valid image
    Given I have a 600 x 600 image
    When I crop 300 pixels
    Then the width should be 300
```
and what if you could parse it and turn it into executable code? Enter Cucumber.
```ruby
Given /I have a (\d+) x (\d+) image/ do |x, y|
  @photo = Photo.find(:first, :conditions => ['x = ? AND y = ?', x, y])
end

When /I crop (\d+) pixels/ do |width|
  @new_photo = @photo.crop_at(0, 0, width.to_i)
end

Then /the width should be (\d+)/ do |width|
  @new_photo.x.should == width.to_i
end
```
Cucumber adds automation to acceptance testing. You define how to parse the Given/When/Then lines as executable steps, and then you can mix and match them to make testable scenarios.
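Because steps are matched by regular expressions, the same three step definitions above can be recombined into new scenarios without writing any more Ruby. For instance, the oversized-crop case from our unit tests becomes:

```gherkin
Scenario: Crop an oversized region
  Given I have a 600 x 600 image
  When I crop 700 pixels
  Then the width should be 600
```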
Depth of Testing
We’ve tested the image cropping operation. We haven’t tested that incoming HTTP requests call our code, or that we have security in place to prevent users from cropping each other’s photos, or that we redirect users to the photo after we’ve cropped it. To test all of this together, we use integration tests.
Unit tests, as we’ve seen, exercise individual components. Integration tests test the wiring. Unit tests call objects directly, while integration tests make full requests through the HTTP stack. Because integration tests are more fragile, you write fewer of them than unit tests.
There are a few popular ways to perform this “black box” testing. One is Webrat, which uses a fake browser to hit URLs and parse the resulting DOM. Webrat can fill out forms, follow redirects, and more.
```ruby
def test_sign_up
  visit "/"
  click_link "Sign up"
  fill_in "Email", :with => "firstname.lastname@example.org"
  select "Free account"
  click_button "Register"
  # ...
end
```
Selenium RC uses a cross-site-scripting hack to control real browsers and inspect page results. It works with Internet Explorer, Firefox, and Safari.
```ruby
open '/login'
type 'name', name
type 'password', password
click 'submit', :wait => true
```
However, given its Rube Goldberg-like setup, Selenium RC is more prone to false positives than Webrat.
Tools for Monitoring Coverage
Once you know how to do it, testing is easy. The real challenge is the social change to get people on board. How do you get people to treat test failures seriously? How do you ensure your tests are exhaustive? How do you know who is pulling their weight?
CruiseControl is a tool that runs your tests on every commit and emails the team when they break. You install it on a server, point it at your repository, and it handles everything from there. You should already be running tests before every commit, so the real value is the social stigma when a team member breaks something.
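If you use the Ruby port, CruiseControl.rb, the per-project setup is a small config file checked into the project root. A sketch, with a placeholder email address; check the CruiseControl.rb docs for the options your version supports:

```ruby
# cruise_config.rb -- read by CruiseControl.rb on every build.
# Sketch only; 'team@example.org' is a placeholder.
Project.configure do |project|
  project.email_notifier.emails = ['team@example.org']
  project.scheduler.polling_interval = 2.minutes
end
```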
If you don’t have the capacity to devote a server to continuous integration testing, there are third party services like run-code-run that will handle everything for you.
To ensure you’re testing your whole app, there are code coverage tools like rcov. rcov monitors which lines of your production code execute when you run your test suite, then prints a report telling you which code wasn’t run. It even has options to mutate your code as it runs, passing “true” when things should be “false”, to check that tests fail when they should.
Finally, Jacob Dunphy at AT&T Interactive wrote Kablame. It’s a tool that runs “blame” and counts the lines of code written per person in a given directory. He wrote it to figure out who wrote the most and least tests in a given project. You get a top ten, turning the process into a contest.
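I haven’t read Kablame’s source, but the core idea is easy to sketch: run your version control system’s blame over each file and tally lines per author. A hypothetical illustration, not Kablame’s actual code, assuming `git blame`-style output lines:

```ruby
# Tally how many lines each author is blamed for. Hypothetical sketch;
# expects lines shaped like:
#   "abc123 (Alice 2009-06-01 10:00:00 -0700  1) some code"
def tally_authors(blame_lines)
  blame_lines.each_with_object(Hash.new(0)) do |line, counts|
    # Capture the author name between "(" and the commit date.
    counts[$1.strip] += 1 if line =~ /\((.+?)\s+\d{4}-\d{2}-\d{2}/
  end
end
```

Run it over `test/` and `app/` separately and you can compare each person’s test-to-code ratio.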
That’s only the beginning
If this was your first exposure to testing, it was overwhelming, but by no means comprehensive. You may think this is all too hard, or all a waste of time. Didn’t you feel the same way about object-oriented programming? Revision control? When you started with those, did you go yak shaving (Java, design patterns), or waste time on immature tools (CVS, Visual SourceSafe)?
The good news is that testing frameworks and techniques are reaching maturity. The bad news: in a few years, you’ll either know them or go extinct.