Automating Cucumber test scripts on a Mac

I’m really enjoying my work at the BBC on the Olympics team! I’m primarily implementing Cucumber scenarios to verify the work that the developers are doing. I’m also using other tools for stress testing and performance monitoring.

My work here is leading me to use Cucumber in weird and wonderful ways! Here’s an example. Yesterday i was asked to run a script every half an hour that tests the Olympics video player to see if it’s playing successfully, and if not, to email people to say that it’s broken. Not your usual Cucumber scenario, but amazingly enough, it can be used to do that!

Forgive the jargon below. PID is a programme identifier, and IVP is the Interactive Video Player that we are building.

  Scenario: Check LIVE video player
    Given a live PID
    When I check the IVP
    Then I should email people if anything goes wrong

I can use a combination of javascript hooks and taking screenshots a few seconds apart and comparing them to tell whether the video is running. I decided to run some step definitions that i’ve already defined and rescue any errors to be emailed. There are a few more steps in here but i’ve stripped it down a bit for simplicity.

When /^I check the IVP$/ do
  if @pid
    begin
      step 'I attempt to view the IVP host page with that PID'
      step 'the video should be ready'
      step 'the video should be playing'
    rescue Exception => e
      @error = e.message
    end
  end
end

Notice that there might not actually be a live PID at the moment. It feels very weird, to put conditional logic in Cucumber step definitions like this!
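The screenshot comparison mentioned above could be sketched roughly like this in plain Ruby. The method name frames_differ? is made up for illustration, and the sketch assumes the raw bytes of two screenshots taken a few seconds apart are already to hand. A digest comparison only says that something in the image changed, so cropping to the player area first would probably be needed in practice.

```ruby
require 'digest'

# Hypothetical helper, not part of Cucumber or Capybara.
# Takes the raw bytes of two screenshots taken a few seconds apart and
# reports whether they differ, which suggests the video is still moving.
def frames_differ?(first_screenshot, second_screenshot)
  Digest::SHA256.hexdigest(first_screenshot) !=
    Digest::SHA256.hexdigest(second_screenshot)
end

# Usage with stand-in data; real code would read the screenshot files,
# e.g. File.binread('first.png') and File.binread('second.png').
puts frames_differ?('frame at 0s', 'frame at 3s')
puts frames_differ?('frozen frame', 'frozen frame')
```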

Now comes the email part. I used Pony and sendmail as a simple email sending mechanism.

Then /^I should email people if anything goes wrong$/ do
  if @pid && @error
    subject = "Assurance test failed in #{ENV['ENVIRONMENT']}!"
    body = "#{page.current_url} caused this error: #{@error}"
    recipients = ['send_to@me.co.uk']

    Pony.mail(:to => recipients,
              :subject => subject,
              :body => body,
              :via => :sendmail)
  end
end
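The step definitions above keep only e.message, which can make the email hard to act on. A possible refinement, sketched here with a hypothetical helper called describe_failure, is to include the error class and the first few backtrace lines in the body. This is only an illustration of the idea, not what the original script did.

```ruby
# Hypothetical helper: turns a rescued error into text for an email body,
# keeping the error class and the first few backtrace lines.
def describe_failure(error, max_backtrace_lines = 5)
  lines = ["#{error.class}: #{error.message}"]
  # backtrace is nil for errors that were never raised, hence Array()
  lines.concat(Array(error.backtrace).first(max_backtrace_lines))
  lines.join("\n")
end

begin
  raise ArgumentError, 'video never started playing'
rescue ArgumentError => e
  puts describe_failure(e)
end
```

In the When step this would mean storing @error = describe_failure(e) instead of e.message.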

So, with that working, i set about putting it on a schedule. I thought i could just call the script from a cron task, but it turns out that wouldn’t work because i need to launch Firefox with Selenium to test the Flash player.

I found this cool built-in Mac tool called Automator. You can set up all sorts of tasks into a workflow. I just needed a simple shell script that runs the Cucumber scenario.

And now, we can call that from a cron job every 30 minutes! You might have to hunt around a bit for where it saves the file. For me it’s in my Library/Services.

*/30 * * * * automator /Users/daniea16/Library/Services/IVP\ Test.workflow

Another creative alternative to the cron job is to set up recurring calendar appointments in iCal. You can set an alarm that runs a script. Pretty awesome, hey?! :)

I know Cucumber is meant to be a BDD framework, and was never intended to be used for automated regression testing or assurance testing, but isn’t it cool that with a little bit of imagination you can bend it to fit these alternative uses? :)

Cucumber running headless Selenium with Jenkins (the easy way)

Selenium is a great way to test your javascript from Cucumber. It opens up a browser and you actually see it clicking around, filling in fields and submitting forms right before your eyes. It’s cool.

Jenkins is a brilliant continuous integration and deployment tool. I recently set it up for a new project i’ve started. It checks for git commits, pulls the latest code, runs all the specs and cucumber scenarios, and if they all pass, it deploys the code to five different websites. It’s great because we just have to git push and all five websites will get updated.

But Jenkins runs all its commands on the command line, and it doesn’t actually have a display on which it can run a browser. This isn’t too big a problem. You can use Xvfb, a virtual framebuffer that can emulate a display for you.

The old way to do this was to configure Xvfb to make a display (we usually use 99 to avoid conflicts) and write little shell scripts to start and stop this display, and export an environment variable to make sure the display is used ………… YAWN!
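For comparison, the manual approach amounts to roughly the following. It is sketched in Ruby rather than shell to match the rest of this post, the helper names xvfb_command and with_display are hypothetical, and the screen size is an arbitrary assumption.

```ruby
# A sketch of the manual approach: point processes at display :99 for
# the duration of a block. The helper names are hypothetical.

# Command line for starting Xvfb on the given display number.
def xvfb_command(display_number = 99)
  ['Xvfb', ":#{display_number}", '-screen', '0', '1280x1024x24']
end

# Temporarily sets DISPLAY so that browsers launched inside the block
# use the virtual display, then restores the previous value.
def with_display(display_number = 99)
  previous = ENV['DISPLAY']
  ENV['DISPLAY'] = ":#{display_number}"
  yield
ensure
  ENV['DISPLAY'] = previous
end

puts xvfb_command.join(' ')
with_display { puts ENV['DISPLAY'] }
```

A real script would also start the server, for example with Process.spawn(*xvfb_command), and stop it again afterwards.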

Now enter the gem headless. It does all those boring things behind the scenes, making it much simpler. I got this tip from 8th Light’s blog post Jenkins, RVM, and Selenium. To get headless Selenium you simply have to do the following:

Install Xvfb:

sudo apt-get install xvfb

Require headless in your Gemfile:

gem 'headless'

Add this little snippet to features/support/env.rb:

if ENV['HEADLESS'] == 'true'
  require 'headless'

  headless = Headless.new
  headless.start

  at_exit do
    headless.destroy
  end
end

Tag your cucumber scenarios with @javascript for those that require selenium.

Call cucumber like this:

HEADLESS=true cucumber

It Just Works™!

As an added bonus, when i’m working remotely i’m often tunnelling via SSH to a shared screen on a remote server. I can use the same HEADLESS=true trick to run my selenium scenarios remotely.

How BDD focusses on the ‘why’

Today i got talking with somebody who was unconvinced of the benefits of behaviour driven development. I hope i’m not misquoting them, but they seemed to be saying that BDD doesn’t tell you why something happens, only what happens.

This person also said that BDD doesn’t encourage you to code correctly. That i agree with. You can code poorly with any test tools or development framework. The only thing that’s going to encourage you to code cleanly is your own standards, or those that are instilled into you by requirement, or pair programming, or code review.

To me, the focus on the why is a key difference of BDD as opposed to TDD. Here’s an example for comparison:

Test First Development tells you that you’re going to use an Array of names and it’s going to receive the shuffle method. You write the test because you’ve already decided that’s what you’re going to do. It is brittle because a change in the code will require you to rewrite the test.

Test Driven Development says that somehow some names are going to be input, and after the process they’ll come out in a random order. It doesn’t necessarily mind whether you use an Array, Hash, a shuffle method or write your own randomising function. You let the tests drive you to make those decisions when you need to.

Behaviour Driven Development tells you about me as a key stakeholder, and that i want to draw names in a random order, so that i can assign prizes fairly. It tells a story about the people entering, explaining in natural language that the names are drawn out of a hat, and when they are, the first randomly chosen person picked gets the first prize, and their name is put aside so that they don’t win another prize. How you implement the details to achieve this behaviour is entirely up to you as a developer. But you should understand quite well what you’re expecting to achieve, and why.

Okay, i know, it’s a horrible example because it’s quite difficult to test for randomness. Given an unbiased coin / When i flip the coin / Then i should get … either heads or tails … Anyway, hopefully those examples show how i consider the test methodologies to differ.
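One way round the randomness problem is to check only the properties that must hold whatever order the names come out in. Here is a minimal sketch under that assumption. The draw_prizes method and its rng parameter are invented for illustration and say nothing about how a real draw should be implemented.

```ruby
# Hypothetical prize draw: names go into a hat, and each prize goes to a
# randomly drawn name, which is then put aside so nobody wins twice.
def draw_prizes(names, prizes, rng: Random.new)
  hat = names.dup
  # Never hand out more prizes than there are names in the hat.
  prizes.first(hat.size).each_with_object({}) do |prize, winners|
    winners[prize] = hat.delete_at(rng.rand(hat.size))
  end
end

names = %w[Ann Bob Cat Dan]
winners = draw_prizes(names, ['first prize', 'second prize'])

# These checks hold whatever order the names come out in.
raise 'every prize should have a winner' unless winners.size == 2
raise 'nobody should win twice' unless winners.values.uniq.size == 2
raise 'winners should come from the hat' unless (winners.values - names).empty?
puts winners.inspect
```

The checks describe the behaviour from the story (a winner per prize, nobody wins twice) without caring which names are drawn.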

My mentor Enrique recently helped me to understand BDD, particularly Cucumber scenarios, as documentation of the understanding between the client and the developer. It just so happens that you can execute this documentation to help you write your code, and continue to use it for regression testing.

Now i pass over to you, my readers! Are your opinions different from mine? What have i missed out, misunderstood, or misrepresented? I’m not an expert here! Fortunately, you don’t need to be a BDD expert to benefit from using it! :)

Pretty autospec growl messages

It’s interesting how you can sometimes notice something through someone else’s eyes that you’d previously overlooked. This week i have been mostly working with my apprentice Despo on a Rails project. Despo noticed that running specs in Rails takes a long time to initialize, which is something i am well aware of but i guess i’ve sort of got used to it.

So we decided to do something about it. The first thing we did was set up Spork. If you want to know how to do that i recommend this post by Chris: How to get Spork working NOW on Rails 3, Rspec 2 and Cucumber.

As Chris mentioned at the end of that post, autotest and autotest-growl play well with Spork, so that is what we did next. The results can be quite nice!

Autotest notices which files have changed and starts running the relevant specs:

[Screenshot: Autotest notices modifications]

Growl can be configured to show failure messages in red:

[Screenshot: Growl failure message in red]

Success messages are shown in green:

[Screenshot: Growl success message in green]

Growl can also show when specs are pending:

[Screenshot: Growl pending message]

How did we achieve this?

Firstly you will need to install the gems autotest and autotest-growl. Either include them in a bundle, or install them system-wide if you want to use them across all projects.

If you use RVM for different versions of ruby you might find this command useful:

for version in $(rvm list strings); do rvm use $version@global && gem install autotest && gem install autotest-growl; done

Secondly you will need a .autotest file. You can either put it in your project’s root directory, or in your home directory ~/.autotest

Here is mine so you can see what it looks like:

#!/usr/bin/env ruby
require 'rubygems'
require 'autotest/growl'

Autotest::Growl::image_dir = File.join(ENV['HOME'], '.autotest_images')
Autotest::Growl::show_modified_files = true

Autotest.add_hook :initialize do |autotest|
  %w(.git .DS_Store db log tmp).each do |exception|
    autotest.add_exception(exception)
  end
  autotest.order = :random
end

Thirdly you will need to install Growl and configure it to your liking. I have mine set to Music Video because it’s so impossible to miss!

The only trouble with the Music Video style is it can only display one message at a time, so i make it fade in and last for just 1 second. Happily, with growl colouring, you don’t really need to read the message – the colour tells you immediately what happened.

The colours you need to set are for these Growl priorities:

  • Very Low for passed
  • Moderate for pending
  • Normal for info
  • Emergency for failed (or syntax error)

Here is my Growl configuration:

[Screenshot: Growl configuration]

Finally, if you want cute pictures to come up with your growl messages, you simply need to put them in ~/.autotest_images – named passed.png, pending.png, info.png, failed.png and error.png.

[Image: my autotest images]

Running autotest

Now, of course, you need to know how to run it!

With Rails 3 you need

AUTOFEATURE=true autotest

With Rails 2 it’s

AUTOFEATURE=true autospec

If you only want to run specs and not features, just take away the AUTOFEATURE=true bit.

But what about FSEvent?

The normal behaviour for autotest is to constantly poll your filesystem looking for changes. This is wasteful as it will use a lot of CPU and drain your battery, but there is an alternative. From Mac OS X 10.5 onwards an FSEvent service reports modified files.

To make use of FSEvent, you should be able to just install the autotest-fsevent gem and require 'autotest/fsevent' in your .autotest file.

Unfortunately, when i tried it, the mere presence of the autotest-fsevent gem seems to cause the specs not to run. It notices changes but doesn’t do anything about them. So if anyone can help me to understand what’s up with that, i would be grateful!

In the meantime, i’m just trying to remember to stop autotest whenever i’ve finished using it.

Using Cucumber to test concurrency issues

Recently i encountered a concurrency problem of the type where there is a queue of things to do, and users press a button to be automatically assigned the next item in the queue. The bug report was that two users could get assigned the same item.

My pair programmer and i tried to reproduce the problem using two computers, but we couldn’t. We were only running one Rails instance, but we know that in the production environment there are multiple load-balanced servers pointing to one database, so we had an inkling that we’d be able to reproduce it using multi-threading.

To give it a test, we wrote a Rake task which we ran in two terminal windows to mimic the simultaneous access. The Rake task looked something like this:

require File.join(File.dirname(__FILE__), '..', '..', 'config', 'environment.rb')
namespace :test do
  task :take_next_for, :login do |t, args|
    user = User.find_by_login(args[:login])
    user.take_next_item
    puts user.item.inspect
  end
end

This is easily called by running:

rake test:take_next_for['ann']

We ran it for two users simultaneously and inspected the output. Sure enough they were being assigned the same item.

Since there is only one database, we knew that we could fix it with a carefully placed transaction and lock on the database. But we wanted to add a Cucumber feature so that we could be sure it was working, and to give us confidence that the bug would not come back again in the future.

  Scenario: Two users take next item simultaneously
    Given a user with login "ann"
    And a user with login "bob"
    And an available item called "Item 1"
    And an available item called "Item 2"
    When two users attempt to take the next item at the same time
    Then they should each have taken different items

Notice we can’t actually say who gets which item – it’s a race condition. We can only check that both of them have an item and that they are not the same item. We could alternatively check that both of the items have successfully been taken.

Testing this concurrency issue in Cucumber turned out to be somewhat tricky. We tried using simple Ruby threads in Cucumber, but it wasn’t properly simultaneous. I guess the single Cucumber environment still only does one thing at a time. So it was back to the Rake task.

When /^two users attempt to take the next item at the same time$/ do
  t1 = Thread.new { `RAILS_ENV=cucumber rake test:take_next_for['ann']` }
  t2 = Thread.new { `RAILS_ENV=cucumber rake test:take_next_for['bob']` }
  t1.join
  t2.join
end

We ‘join’ the two threads to make sure they’ve both finished before carrying on.
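The backticks in the step above throw away the exit status, so a Rake task that crashes would go unnoticed until the Then step. A possible variation, sketched with a hypothetical run_simultaneously helper built on Ruby’s standard Open3 library, keeps both the output and the status of each command. The commands below are stand-ins so that the sketch runs anywhere.

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical helper: runs each command in its own thread and returns
# [output, status] pairs in the same order as the commands.
def run_simultaneously(*commands)
  threads = commands.map do |command|
    Thread.new { Open3.capture2e(*command) }
  end
  threads.map(&:value) # Thread#value joins the thread first
end

# Stand-in commands; the step above would pass the two rake invocations.
results = run_simultaneously(
  [RbConfig.ruby, '-e', 'print "ann"'],
  [RbConfig.ruby, '-e', 'print "bob"']
)
results.each do |output, status|
  raise "command failed: #{output}" unless status.success?
  puts output
end
```

Open3.capture2e also accepts an environment hash as its first argument, so RAILS_ENV could be passed as {'RAILS_ENV' => 'cucumber'} rather than as a shell prefix.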

It’s slow because it loads up a whole new Rails environment for each of the Rake tasks, but that is exactly what we want to do, to mimic the concurrency of the production system.

The next problem we encountered was that Cucumber scenarios are run inside a transaction which means that a Rake task running outside of it cannot see the users and items we just created. So we had to tag the scenario as @no-txn so that they would be available externally and @clean-up-afterwards so that we could remove them from the database.

After "@clean-up-afterwards" do
  User.destroy_all
  Item.destroy_all
end

With this in place the Cucumber scenario failed as we hoped it would! Then it was simply a matter of creating a transaction from the moment we find the next item (with a database lock) until we have successfully assigned the item. This is a simplified version of what we ended up with:

class User < ActiveRecord::Base
  has_one :item

  def take_next_item
    transaction do
      # :lock => true adds FOR UPDATE, locking the row until the transaction ends
      item = Item.available.by_priority.find(:first, :lock => true)
      self.item = item
    end
  end
end

The Cucumber scenario passed and the problem was solved. In the live system, if two users now try to take an item at the same time, one of them has to wait a moment until the database has finished assigning to the first user so that it can assign a different item to the second user.
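The same find-then-assign pattern can be illustrated without a database. This is only an in-memory analogy: a Mutex stands in for the database row lock, and ItemPool and take_next_for are hypothetical names, not code from the project.

```ruby
# In-memory analogy only: a Mutex stands in for the database row lock,
# and ItemPool / take_next_for are hypothetical names.
Item = Struct.new(:name, :owner)

class ItemPool
  def initialize(names)
    @items = names.map { |name| Item.new(name, nil) }
    @lock = Mutex.new
  end

  # Finding the next free item and assigning it happen under one lock,
  # so a second caller has to wait and then sees the updated owner.
  def take_next_for(user)
    @lock.synchronize do
      item = @items.find { |candidate| candidate.owner.nil? }
      item.owner = user if item
      item
    end
  end
end

pool = ItemPool.new(['Item 1', 'Item 2'])
taken = %w[ann bob].map { |login| Thread.new { pool.take_next_for(login) } }.map(&:value)
raise 'both users got the same item' unless taken.map(&:name).uniq.size == 2
puts taken.map { |item| "#{item.owner} took #{item.name}" }
```

Without the synchronize block, the find and the assignment could interleave between callers, which is the in-memory equivalent of the original bug.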

How would you have tested a concurrency issue like this? Are there better ways of imitating a multi-server production environment than the solution we came up with?

Javascript BDD with Cucumber and Harmony

Inspired by tooky’s recent post Exploring Harmony for javascript BDD with RSpec i am determined to get better at writing javascript in a BDD kind of way.

On Monday we were privileged to have Corey Haines visit Eden for the day. I paired with Corey for a couple of hours learning about BlueRidge for unit-testing javascript classes in a Rails application. I liked what i learned but i am disappointed by the need to maintain HTML fixtures. I would like to run the specs directly on my application.

I was pairing with Tris today and we really wanted to run javascript straight from Cucumber scenarios, so we had another look at Harmony. The first thing we noticed was that it needed to load a file (rather like a fixture), which we didn’t like very much. We went to see how HolyGrail does it.

Maybe it’s because we’re using Capybara not Webrat, or maybe it’s because we did something wrong, but we couldn’t get HolyGrail to work right. However, a quick peek at the source code gave us a few clues. We pulled out and tweaked the following:

def js(code)
  @__page ||= Harmony::Page.new(page.body.to_s)
  @__page.execute_js(code)
end

This worked pretty nicely for a step involving simple javascript, like:

Then /^the page is titled "([^\"]*)"$/ do |title|
  js('document.title').should == title
end

It was actually pretty exciting when we first saw that working! On to something more meaningful …

  Scenario: Use javascript to hide a box
    Given I am on a page with a box that can be hidden
    When I use javascript to click "Hide"
    Then the box should be hidden

Clicking the link with javascript actually wasn’t hard:

When /^I use javascript to click "([^\"]*)"$/ do |text|
  js("$('a:contains(#{text})')").click
end

We used jQuery to find the right link and click it using Harmony. Harmony sends the ‘click’ method straight through to the javascript object, which is pretty cool.

Ensuring the box disappears was a little more tricky. For the moment we’re just checking that the contents become empty.

Then 'the box should be hidden' do
  js("$('.hide_box')").html.should be_nil
end

Sure enough, when we plug in the unobtrusive jQuery code to bind the click and remove the box, the feature passes! Alright, this is a very small step down a long road, but to me it is very exciting indeed!

Next steps: we have already begun writing a Capybara driver to run Harmony, so that we don’t have to write a separate step definition When I use javascript to click “Hide”. Instead we can simply use the standard When I follow “Hide” and tag the scenario with @harmony to use the alternative driver. That will first attempt to click it with javascript and, if nothing happens, it will follow the link in the normal way. We will also use the driver to handle when the @__page cached copy gets refreshed.

An important next step is to make it a proper integration test, interacting with the full stack, such as clicking something which triggers an AJAX request to the server which must run a bit of code and send the response back to the page. I can foresee this presenting some challenges, but we’ll see!

Exciting things are happening, and this is just the beginning! Web development these days is all about standing on the shoulders of giants. This is only possible thanks to the great work already being done by Harmony, Johnson and SpiderMonkey. I’m hoping that other people will become inspired by the possibilities and take this further. Watch this space for more news!