The State of Testing

When I joined my first large team, I was naive. When I found bad code, I said, “you should leave things better than you found them.” One day I found a method name that didn’t follow conventions, and as pedantic as it sounds, the right thing to do was to rename it. So I did. I updated every reference, double-checked my change, clicked through the site to make sure nothing broke, and finally, confident, committed my work.

When that change went live, our advertisers stopped receiving impression data. Our revenue came entirely from advertisers. This was disastrous.

The bug: buried in a configuration file was a set of hard-coded method names monitored by the logging code. Nobody knew about this file because the logging system had been written a year earlier, and every engineer who worked on it had since left or changed departments.

We eliminated this type of regression by developing an automated test suite. According to a publication released by the Department of Defense, QA can account for half the cost of a project. The same article goes on to say that by the time a project reaches the ten-year mark, automated tests have cut cumulative test days by 97%.

That’s reason enough to adopt automated tests, but there’s a pile of secondary benefits.

There is a clear return on investment in automated testing, so I was curious how many people actually do it. The results were depressing. Anywhere from 39% to 54% of developers don’t write any sort of tests, according to the Department of Defense Information Analysis Center (2007), a geospatial developer survey (2008), a .NET developer site (2007), and a consultancy (2008).

This post will eliminate the excuse, “I don’t know where to start.” It will be a primer if you’ve never tested, and offer pointers if you’re already familiar. We’ll cover the structure of a test case, the depth of testing, and tools for monitoring coverage.

While the examples I use are written in Ruby, most have equivalents in other languages. If you’re on the JVM or CLR, you can even integrate these tools directly through JRuby or IronRuby, respectively.
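
For instance, here’s a minimal sketch of what that looks like under JRuby: a plain Test::Unit case driving a Java class directly (the ArrayList stands in for whatever JVM code you want to test).

  require 'java'
  require 'test/unit'

  # Under JRuby, Java classes are callable from Ruby, so an ordinary
  # Test::Unit case can exercise JVM code directly.
  class JavaInteropTest < Test::Unit::TestCase
    def test_java_list
      list = java.util.ArrayList.new
      list.add("headshot.jpg")
      assert_equal(1, list.size)
    end
  end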

Structure of a Test Case

As an example, we’ll use a photo cropper I developed for starrup, an actor website service. When an actor uploads a headshot, we present them with an Ajax interface to crop it for thumbnails. This clearly needs tests because image libraries are ugly, brittle beasts, and we wouldn’t be surprised if they broke between updates.

If we had no experience with test frameworks, we’d write a script like this, which raises an exception if our code stops working.

  @photo = Photo.find(1) # A 600x600 photo
  @new_photo = @photo.crop_at(0, 0, 300)
  if !@new_photo
    raise "Did not create a photo."
  end
  if @new_photo.x != 300
    raise "Photo didn't crop."
  end

That works, but we’re going to run into problems: the script dies at the first failure instead of reporting them all, setup code gets copied into every script, and every failure message has to be written by hand.

These problems are all solved by test frameworks. The same test in xUnit style:

  require 'test/unit'

  class CroppingTest < Test::Unit::TestCase

    def setup
      @photo = Photo.find(1) # A 600x600 photo
    end

    def test_cropping
      @new_photo = @photo.crop_at(0, 0, 300)
      assert(@new_photo)
      assert_equal(300, @new_photo.x)
    end

    def test_cropping_fails_when_too_big
      @new_photo = @photo.crop_at(0, 0, 700)
      assert_equal(600, @new_photo.x)
    end

  end

Here’s what you get with a framework: setup is extracted and runs fresh before each test, every test runs even when an earlier one fails, assertions generate their own failure messages, and you get a pass/fail report at the end.

This system is great. Most people who write tests use some variation of this framework. But you’ll frequently run into people who use the framework correctly yet miss the big picture. You’ll see giant, meandering tests. You’ll see tests that touch a pile of unrelated functionality with no clear motivation.

Like the rest of your code, good tests are concise, specific, and to the point. The clearest way to get there is to test one specific behavior at a time.

“Behavior-driven development” frameworks like RSpec hammer this point home by layering syntactic sugar onto the xUnit model.

  describe "cropping" do

    before(:each) do
      @photo = Photo.find(1) # A 600x600 photo
    end
    
    it "should crop images" do
      @new_photo = @photo.crop_at(0, 0, 300)
      @new_photo.x.should == 300
    end
    
    it "should not crop larger crop regions" do
      @new_photo = @photo.crop_at(0, 0, 700)
      @new_photo.x.should == 600
    end
    
  end

When you run that, you even get pretty colored reports.
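
Run it through RSpec’s specdoc formatter and the report reads like documentation. A passing run looks something like this, in green:

  cropping
  - should crop images
  - should not crop regions larger than the image

  Finished in 0.023682 seconds

  2 examples, 0 failures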

This format approaches natural language, but what if you pushed it one step further? What if you could sit down with business people and write a user story like this:

  Feature: Photo cropping
    In order to improve headshot photos
    As a subscriber
    I want to crop photos

    Scenario: Crop a valid image
      Given I have a 600 x 600 image
      When I crop 300 pixels
      Then the width should be 300

and what if you could parse it and turn it into executable code? Enter Cucumber.

  Given /I have a (\d+) x (\d+) image/ do |x, y|
    @photo = Photo.find(:first,
      :conditions => ['x = ? AND y = ?', x, y])
  end

  When /I crop (\d+) pixels/ do |width|
    @new_photo = @photo.crop_at(0, 0, width.to_i)
  end

  Then /the width should be (\d+)/ do |width|
    @new_photo.x.should == width.to_i
  end

Cucumber adds automation to acceptance testing. You define how to parse the Given/When/Then lines as executable steps, and then you can mix and match them to make testable scenarios.
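
Because the steps are parameterized, a new scenario costs nothing to write. This one reuses all three step definitions above to pin down the too-big crop behavior:

  Scenario: Crop a region larger than the image
    Given I have a 600 x 600 image
    When I crop 700 pixels
    Then the width should be 600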

Depth of Testing

We’ve tested the image cropping operation. We haven’t tested that incoming HTTP requests call our code, that security is in place to prevent users from cropping each other’s photos, or that we redirect users to the photo after cropping. To test all of this together, we use integration tests.

Unit tests, as we’ve seen, exercise individual components. Integration tests exercise the wiring: unit tests call objects directly, while integration tests make full requests through the HTTP stack. Because they’re more fragile, you write fewer integration tests than unit tests.

There are a few popular ways to perform this “black box” testing. One is Webrat, which uses a fake browser to hit URLs and parse the resulting DOM. Webrat can fill out forms, follow redirects, and more.

  def test_sign_up
    visit "/"
    click_link "Sign up"
    fill_in "Email", :with => "good@example.com"
    select "Free account"
    click_button "Register"
    # ...
  end

However, it can’t execute JavaScript. If you want that, you need a real browser, and the solution is Selenium RC.

Selenium RC uses a cross-site scripting hack to control real browsers and inspect the resulting pages. It works with Internet Explorer, Firefox, and Safari.

  open '/login'
  type 'name', name
  type 'password', password
  click 'submit', :wait => true

However, given the Rube Goldberg-like setup, Selenium RC is more prone to false positives than Webrat.

John Barnette and Aaron Patterson are working on the cleanest solution yet for JavaScript testing: a Webrat-like system with Mozilla’s SpiderMonkey JavaScript engine embedded inside Ruby (the Johnson project). It’s still in the works, but it’s very promising.

Tools for Monitoring Coverage

Once you know how to do it, testing is easy. The real challenge is the social change of getting people on board. How do you get people to take test failures seriously? How do you ensure your tests are exhaustive? How do you know who is pulling their weight?

CruiseControl is a tool that runs your tests on every commit and emails the team when they break. You install it on a server, point it at your repository, and it handles everything from there. You should already be running the tests before every commit, so the real value is the social stigma when a member of the team breaks something.
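
Setup amounts to a couple of commands. Here’s a rough sketch using CruiseControl.rb; the project name and repository URL are placeholders, and the exact flags may vary by version:

  ./cruise add starrup --url svn://example.com/starrup/trunk
  ./cruise start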

If you don’t have a server to devote to continuous integration, third-party services like Run Code Run will handle everything for you.

To ensure you’re testing your whole app, there are code coverage tools like rcov. rcov watches which lines of your production code execute while the test suite runs, then prints a report of the lines that were never touched. It even has options to mutate your code as it runs, flipping “true” to “false”, to check that your tests fail when the code is wrong.
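
Invoking it is a one-liner. Something like this (the test paths are illustrative) runs the suite under rcov and drops an HTML report into coverage/:

  rcov test/unit/*_test.rb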

Finally, Jacob Dunphy at AT&T Interactive wrote Kablame, a tool that runs “blame” and counts the lines each person wrote in a given directory. He built it to figure out who had written the most, and the fewest, tests on a project. It prints a top-ten list, which turns the whole process into a contest.
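
The idea fits in a dozen lines of Ruby. This is a sketch of the concept rather than Kablame’s actual code, and it assumes a Subversion checkout:

  # Tally lines per author by running `svn blame` over a directory.
  authors = Hash.new(0)
  Dir.glob("test/**/*.rb").each do |file|
    `svn blame #{file}`.each_line do |line|
      _revision, author = line.split # blame output: revision, author, code
      authors[author] += 1
    end
  end

  # Print the top ten as a leaderboard.
  authors.sort_by { |_, count| -count }.first(10).each do |author, count|
    puts "%5d  %s" % [count, author]
  end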

That’s only the beginning

If this was your first exposure to testing, it was probably overwhelming, and by no means comprehensive. You may think this is all too hard, or all a waste of time. But didn’t you feel the same way about object-oriented programming? About revision control? When you started with those, did you go yak shaving (Java, design patterns) or get burned by immature tools (CVS, Visual SourceSafe)?

The good news is that testing frameworks and techniques are reaching maturity. The bad news: in a few years, you’ll either know them or go extinct.