
Friday 8 June 2012

BDD engines comparison (Cucumber, Freshen, JBehave, NBehave, SpecFlow, Behat)

Introduction

Cucumber is not the only engine supporting natural language instructions; it is just one implementation of a natural language instruction interpreter. The actual language used to write tests is called Gherkin, and it has implementations adapted to different programming languages. Thus we have Cucumber (Ruby), Freshen (Python), JBehave (Java), NBehave (.NET), SpecFlow (.NET) and Behat (PHP).
This list isn't complete, as there are many other similar engines which are simply less popular. All of them share a common set of supported features, but there are also restrictions and abilities specific to each engine. So, the aim of this post is to collect useful features for each of the engines listed above and present them in a comparable form. The key features to be examined are:
  • Documentation availability
  • Flexibility in passing parameters
  • Auto-complete
  • Steps, scenario and feature scoping
  • Complex steps
  • Hooks and pre-conditions
  • Binding to code
  • Formatting flexibility
  • Built-in reports
  • Input data sources support
Each feature has some sub-features which reflect a specific part of the functionality, and those little things make each engine different and, in some cases, unique. So, let's take a closer look at each feature. In order to make comparisons possible, let's introduce a scale of support quality. Each feature can be graded from 0 to 3 by the following criteria:
Grade | Criteria
0     | No support at all
1     | Functionality exists but with serious restrictions
2     | Major functionality exists
3     | Full-featured support

Documentation availability

One of the key factors demonstrating the maturity of an engine is the availability of documentation and its completeness. Indeed, when we start using a new engine, the first thing we usually do is read the documentation, where we can find examples, a list of features, etc. Actually, documentation is part of a software product (this is one of the differences between an application and a software product), so a lack of documentation indicates that the product isn't complete. An additional source of documentation is the various online materials we can find on the Internet: specialized resources, blogs, etc. So, when I estimate the grade for documentation availability, I check the following criteria:
  1. Documentation is available in general (this makes grade 1 at once)
  2. Every feature is described and has examples (if so, it makes grade 2)
  3. There are additional well-grown resources (forums, blogs, user groups) where we can find more information about the engine
First of all, I'd like to mention that all the engines I evaluate in this article have documentation (the official site, actually), so they get a grade of at least 1. The situation with documentation completeness is a bit harder. While Cucumber, JBehave, Freshen, Behat and SpecFlow contain more or less complete documentation, NBehave looks worse than its analogs in this area. At the moment I see that there hasn't been much activity on its official site since last year. I hope NBehave will grow further, but for now there's a lot of work to do, and a lot of it concerns documentation. As for the other engines, let's take a look at the Internet resources available in addition to the documentation. I made a brief search of LinkedIn and Google groups to find discussions related to BDD engines. Of course, we could dig deeper, but even a brief search shows some results. Here are the findings:
  • Cucumber:
    • Cucumber group on LinkedIn - a quite populated place with a big number of active discussions
    • Cukes Google group
  • Freshen - honestly speaking, I didn't find any specialized resource dedicated to Freshen only. Most likely it is discussed in more general forums dedicated to BDD.
  • JBehave:
  • SpecFlow:
  • Behat:
These are external resources which can be found pretty quickly (it took me less than a minute to find them). So, given all the above information, we can evaluate the grades for each BDD engine. The table looks like this:
Engine   | Documentation availability
Cucumber | 3
Freshen  | 2
JBehave  | 3
NBehave  | 1
SpecFlow | 3
Behat    | 3

Flexibility in passing parameters

Passing parameters in natural language instructions is quite a frequent case when we try to re-use existing instructions as much as possible. There are several cases where we should make our instructions flexible to different variations:
  1. Passing the actual value - e.g. we have a code entering some text into the text field. So, we can create common method where the parameter will specify the text to enter
  2. Small variations of the instruction while the action to perform is the same - typically we need either to use some shortened form of the instruction or we just have to re-phrase existing expression to make the overall test more readable
  3. Passing complex structures - sometimes we have to pass a set of data grouped under some specific entity. E.g. we have to create an order filling in all necessary fields; if we want to code it, we should pass a structure containing all the necessary data. Another case is when we just pass multi-line text.
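Case 1 above is what the scripting-language engines implement through regular expression capture groups. A minimal Ruby sketch of the mechanism (the step pattern and phrase are invented for illustration, not taken from any real project):

```ruby
# Sketch: how a BDD engine extracts parameters from a step phrase.
# Each capture group in the pattern becomes one argument of the step code.
step_pattern = /^I enter "([^"]*)" into the "([^"]*)" field$/

phrase = 'I enter "john.doe" into the "Username" field'
match  = step_pattern.match(phrase)

value, field = match.captures
puts "value=#{value}, field=#{field}"
```

The same phrase can therefore drive many different inputs: only the quoted parts change, while a single step definition handles all of them.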
All above points can be reflected in the following functional features:
  1. Regular expressions support
  2. Tables support
  3. Multi-line input support
  4. Extra features - features related to parameter passing which aren't common to all engines but add some additional syntactic sugar to the tests
The extra features were added to reflect additional abilities of several engines. E.g. in JBehave and SpecFlow we can specify which parameter is mapped to which phrase in the text instruction, so we are not tied to the sequence of data passed in text instructions. That also reflects additional JBehave features which are not present in other engines, like parameter converters and parameter injection (a similar thing is available in SpecFlow). NBehave has a unique ability to pass an array as an inline argument (see the Arrays section of its documentation). So, let's take our BDD engines and estimate the grade for each of the above features. The table looks like this:
Engine   | Regular expressions | Tables | Multi-line input | Extra features
Cucumber | 3                   | 3      | 3                | 0
Freshen  | 3                   | 3      | 3                | 0
JBehave  | 2                   | 3      | 0                | 3
NBehave  | 2                   | 3      | 0                | 2
SpecFlow | 2                   | 3      | 3                | 2
Behat    | 3                   | 3      | 3                | 0
As seen from the table, the engines for scripting languages look better, especially with regard to multi-line input. Also, there are some small restrictions with regular expressions. E.g. in Cucumber or Behat I can define just part of the phrase and it will match. Something like:
Then /the main page is open/
And it will match the following phrases:
Then the main page is open
Then I should see the main page is open
So, both phrases fit the regular expression. In JBehave, NBehave and SpecFlow the same result can be achieved either by pattern variants or by wildcards. Both options can help in getting the desired result, but they look more complex than what the scripting-language engines offer. That's why JBehave, NBehave and SpecFlow have grade 2 for regular expressions support. But all those engines are equally good at supporting tables, which is why the corresponding column has the highest grade for all engines.
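The unanchored-pattern behaviour described above can be checked directly in Ruby (a quick sketch using the two phrases from the example):

```ruby
# An unanchored pattern matches anywhere inside the phrase,
# so both step variants below bind to the same definition.
pattern = /the main page is open/

phrases = [
  "Then the main page is open",
  "Then I should see the main page is open"
]

phrases.each do |phrase|
  puts "#{phrase.inspect} matches: #{!!(pattern =~ phrase)}"
end
```

Anchoring the pattern with `^` and `$` would restrict it to the exact phrase only, which is essentially the stricter behaviour of the Java/.NET engines.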

Auto-complete

I'd say it's one of the most useful features while writing tests using natural language instructions. The key problem is that natural language instructions used for BDD tests are artificially natural: the actual code is still behind them. So, we still have to keep the phrases in our memory, which is difficult for a large project where the number of such instructions amounts to hundreds. As a result, we're at risk of having multiple phrases expressing the same thing in slightly different forms, and the entire testing solution grows dramatically because of that. Also, it's more effective if we build our tests from bricks, just selecting the phrases we need rather than inventing new ones from time to time. For this purpose the auto-complete feature is really helpful: you just type some key part of the phrase and select the most appropriate option. If there's such an ability, it's definitely great. Unfortunately this feature is quite rare and mostly represented as IDE plugins for writing stories (which don't bind to the actual code or the regular expressions used for binding). So, it's partially supported by Eclipse plugins. At the moment I know only the SpecFlow plugin for Visual Studio which fully supports auto-complete. The support table for this feature looks like this:
Engine   | Auto-complete support
Cucumber | 1
Freshen  | 1
JBehave  | 1
NBehave  | 1
SpecFlow | 3
Behat    | 1
As seen from the table, auto-complete is currently the weakest feature of the BDD engines.

Steps, scenario and feature scoping

There are two major cases when we need such feature:
  1. We want to run only some sub-group of tests - this is typically done using tags or any other meta information both on feature and scenario level
  2. The same instruction should call different code depending on what the functionality under test is - this feature is rather related to scoped steps when some step definitions are available only if we run some specific tags.
Tagging is supported by all engines (though in different forms) except NBehave (at least this information isn't available in its documentation). The situation with step scoping is not as good: more or less full-featured support can be found only in SpecFlow. JBehave has a steps prioritization feature which is close to step scoping, but it's not exactly what is needed. So, for these features the support table looks like this:
Engine   | Tagging support | Scoped steps support
Cucumber | 3               | 0
Freshen  | 3               | 0
JBehave  | 3               | 1
NBehave  | 0               | 0
SpecFlow | 3               | 3
Behat    | 3               | 0
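Under the hood, tag-based selection is essentially a filter over scenario metadata. A simplified Ruby sketch of what the engines do when we run only a sub-group of tests (the scenario names and tags are invented):

```ruby
# Hypothetical scenarios with tags, roughly as an engine represents them internally.
scenarios = [
  { name: "Successful login",  tags: ["@smoke", "@auth"] },
  { name: "Full order flow",   tags: ["@regression"] },
  { name: "Search by keyword", tags: ["@smoke"] }
]

# Running with something like "--tags @smoke" boils down to a filter:
selected = scenarios.select { |s| s[:tags].include?("@smoke") }
selected.each { |s| puts s[:name] }
```

Scoped steps take the same metadata one level deeper: a step definition is considered a candidate for binding only when the current scenario carries the required tag.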

Complex steps

Each piece of business functionality consists of many smaller actions performing generic operations. The higher the level of test abstraction, the more generic operations are needed to perform the necessary action. At the same time, it's inconvenient to copy/paste a huge number of lines, so it's useful to have a high-level instruction which groups generic actions into some bigger formation. This is quite frequent functionality, and it can always be done using the engine's libraries; so, if we talk about grades, all engines get at least 2 here. But using engine classes inside test code isn't very good. It's much more convenient to call Givens, Whens and Thens explicitly from the code, just as it's done in Cucumber. For instance:
Given /I'm on the search page/ do
    Given "I'm logged into the system"
    When 'I click on the "Search" link'
    And 'wait for page to load'
    Then 'I should see the "Search" page is open'
end
Such constructions make an additional abstraction layer where we write almost no code, only text instructions. After some time such a layer appears anyway, and the earlier we switch to it, the less work we need to do to create new tests (we don't need to write the code implementing the text instructions). The engines for scripting languages like Ruby, Python and PHP support such functionality (proof for Cucumber, proof for Freshen, proof for Behat). Additionally, JBehave has a specific annotation called "Composite" which is very helpful here. So, from the above information we can make the following table:
Engine   | Composite steps
Cucumber | 3
Freshen  | 3
JBehave  | 3
NBehave  | 2
SpecFlow | 2
Behat    | 3

Hooks and pre-conditions

Of course, a lot of tests usually require some initial state before they start running. Typically such actions are implemented using backgrounds. A background is a scenario which runs before each scenario in the feature; usually there's only one background per feature. Backgrounds are supported by most of the engines reviewed here. The exceptions are NBehave and JBehave (I was quite surprised to see that, because it's one of the basic features). Also, some actions need to be done after some event occurs (e.g. we should clean a temporary folder after each test completes). For this, every engine has a specific set of methods called "hooks". These methods allow customizing actions to be done before/after a step, scenario, feature or the whole run. Typically this is useful for custom logging, but you can place any code you want there. Hooks are implemented differently in each engine, but the idea is the same. Unfortunately, I couldn't find anything similar for JBehave and NBehave. Partially, hooks can be replaced with the corresponding methods of the underlying xUnit engines (e.g. JBehave actually wraps JUnit, so we can specify JUnit before-methods), but that's a workaround rather than full-featured support. The engines can be graded in the following table:
Engine   | Backgrounds | Hooks
Cucumber | 3           | 3
Freshen  | 3           | 3
JBehave  | 1           | 1
NBehave  | 0           | 1
SpecFlow | 3           | 3
Behat    | 3           | 3
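The hook mechanism described above is just a set of callbacks invoked around each scenario. A toy Ruby sketch of the idea (this is not any engine's real API, only an illustration of what the engines do internally):

```ruby
# A toy hook registry: blocks registered here run before/after every scenario.
before_hooks = []
after_hooks  = []

before_hooks << proc { puts "cleaning temporary folder" }
after_hooks  << proc { |name| puts "finished scenario: #{name}" }

# What an engine roughly does when executing a scenario:
def run_scenario(name, before_hooks, after_hooks)
  before_hooks.each(&:call)
  puts "running: #{name}"                  # the scenario steps would execute here
  after_hooks.each { |h| h.call(name) }
end

run_scenario("Search by keyword", before_hooks, after_hooks)
```

A background fits the same picture: it is effectively a fixed list of steps that the engine prepends to every scenario of the feature before the before-hooks fire the scenario body.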

Binding to code

This characteristic shows how convenient it is to bind text instructions to code. When we add a new text instruction we should associate it with some executable code; likewise, if a text instruction changes, we should update the binding. The binding level is another layer of code which we have to maintain, and it brings some overhead while developing tests. Let's see how it works using Cucumber as an example. Say we have a method performing some functionality:
def test_method
    # Here are some actions
end
And we have reserved a text instruction which should call this method. Let's say, for example:
When I call test method
In order to make the proper binding we should write additional code like:
When "I call test method" do
    test_method
end
Thus we have an additional code level to maintain, and we have to spend extra time writing that binding. Also, with such an organization there's a risk of mixing business and system logic, because the binding is itself executable code and you can put anything there (including multiple method calls). Additional trouble can happen when you mix method and step definition calls, or even when you try to find where each specific method is called. So, we should pay more attention to code organization. The same problem applies to Behat. The other engines avoid most of these potential problems because they use annotations/attributes for binding. As a result, each text instruction always corresponds to some specific method. Such one-to-one correspondence simplifies navigation through the code and supports better framework organization: we just have core code (implementing business functionality) marked with the corresponding expressions that bind it to text instructions. So, Freshen, JBehave, NBehave and SpecFlow have a serious advantage in this area compared to Cucumber and Behat. The grades can be set in the following way:
Engine   | Binding to code
Cucumber | 2
Freshen  | 3
JBehave  | 3
NBehave  | 3
SpecFlow | 3
Behat    | 2
All the above engines support bindings, but Cucumber and Behat have some inconveniences with them. That's why they have a lower grade than the others.

Formatting flexibility

Since Gherkin was designed to make test representation human-readable, formatting plays an essential role in test design. E.g. it's convenient to indent each scenario, align columns and outline tables. But some engines are still sensitive to formatting, which brings serious restrictions. Thus, JBehave is sensitive to leading tabs in story files, so you can't format such a file nicely for reading. All the other engines don't have such troubles. So, I hope JBehave will get rid of this in the near future, but for now the grades are distributed in the following way:
Engine   | Formatting
Cucumber | 3
Freshen  | 3
JBehave  | 1
NBehave  | 3
SpecFlow | 3
Behat    | 3

Built-in reports

Reporting is one of the most important parts of testing, and of automation as well. It's not enough to say that a test passed or failed; very often we should identify where it failed. That is vital for functional tests, where we perform a sequence of steps with interim checks and the test can fail anywhere, so we should be able to say what the actual reason for the failure was. BDD engines have some advantage in this area: since tests are designed using natural language instructions, we have the ability to track which steps were executed (using hooks we can retrieve the text instruction currently being processed). So, in addition to error messages we can produce steps to reproduce, which seriously simplifies results analysis. Behat and Cucumber even have such built-in formats: e.g. you can specify HTML output and the engines will generate informative reports. Unfortunately, some other engines are based on unit test engines like JUnit or NUnit, and they produce the standard report, which works for unit tests but contains no step information. E.g. SpecFlow works this way: it actually generates NUnit (or MSTest, with the appropriate configuration) code, and all tests are executed as NUnit tests. A similar problem exists for Freshen, JBehave and NBehave. In order to have informative reports we should spend some time customizing the report using hooks or similar mechanisms; however, it's doable anyway. So, let's grade our engines by the current criteria with the following values:
Engine   | Built-in reports
Cucumber | 3
Freshen  | 2
JBehave  | 2
NBehave  | 2
SpecFlow | 2
Behat    | 3
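For the engines without rich built-in reports, the hook-based customization mentioned above usually amounts to collecting each executed step so a failure report can show "steps to reproduce". A Ruby sketch of the idea (the log format and step texts are invented):

```ruby
# Collecting "steps to reproduce" while the test runs, so a failure report
# can list every step that was executed before the error occurred.
executed_steps = []

# Imagine this proc being invoked from a before-step hook with the step text:
log_step = proc { |text| executed_steps << text }

log_step.call("Given I'm logged into the system")
log_step.call('When I click on the "Search" link')

# On failure, the report replays the collected steps:
puts "Steps to reproduce:"
executed_steps.each_with_index { |s, i| puts "  #{i + 1}. #{s}" }
```

This is exactly the information the Cucumber and Behat HTML formatters produce out of the box; with the other engines you assemble it yourself.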

Input data sources support

Sometimes there's a necessity to keep some test data or tests outside of the source code repository. E.g. you can store tests in some external location where other people can make updates, or you can store tests as requirements documents. Anyway, sometimes it's useful to be able to use such shared resources. Also, it's convenient when you can include some part of the tests into another test located in a different file; it just minimizes copy/paste operations. E.g. you have some steps which are used as a background in one feature and you want to re-use them in other features. If you can simply include the needed file, that is much faster than copying repetitive code, which becomes clearly visible during maintenance. So, how are things going in that space? Not very well, actually. More or less serious support for external resources can be found in JBehave: there's an ability to use files not only from the local machine but also from a specific URL (there's even support for Google Docs). Also, Freshen had an ability to use a URL to reference test data. In terms of inclusions Freshen is still good; additionally, it has an ability to specify where the step definitions should be taken from. For the other engines there's no specific information regarding these features. So, the grade table for this chapter looks like this:
Engine   | External data | Inclusions
Cucumber | 0             | 0
Freshen  | 2             | 3
JBehave  | 3             | 2
NBehave  | 0             | 0
SpecFlow | 0             | 0
Behat    | 0             | 0

Overview table and conclusions

Let's put together all grades collected during this article. The overview table looks like:
(Column names are shortened forms of the section headings above; the Overall column is the sum of all grades.)

Engine   | Doc | Regex | Tables | Multi-line | Extra | Auto-complete | Tagging | Scoped steps | Composite | Backgrounds | Hooks | Binding | Formatting | Reports | Ext. data | Inclusions | Overall
Cucumber | 3   | 3     | 3      | 3          | 0     | 1             | 3       | 0            | 3         | 3           | 3     | 2       | 3          | 3       | 0         | 0          | 33
Freshen  | 2   | 3     | 3      | 3          | 0     | 1             | 3       | 0            | 3         | 3           | 3     | 3       | 3          | 2       | 2         | 3          | 37
JBehave  | 3   | 2     | 3      | 0          | 3     | 1             | 3       | 1            | 3         | 1           | 1     | 3       | 1          | 2       | 3         | 2          | 32
NBehave  | 1   | 2     | 3      | 0          | 2     | 1             | 0       | 0            | 2         | 0           | 1     | 3       | 3          | 2       | 0         | 0          | 20
SpecFlow | 3   | 2     | 3      | 3          | 2     | 3             | 3       | 3            | 2         | 3           | 3     | 3       | 3          | 2       | 0         | 0          | 38
Behat    | 3   | 3     | 3      | 3          | 0     | 1             | 3       | 0            | 3         | 3           | 3     | 2       | 3          | 3       | 0         | 0          | 33
As seen from the table, there's no silver bullet: every engine has something the others don't have. Nevertheless, some observations can be made:
  1. Engines for scripting languages have almost the same feature set with some small variations. So, they implement a canonical part of the functionality, and it's quite easy to migrate from one engine to another (if there's a need for it)
  2. JBehave and SpecFlow pay additional attention to unique features which are hardly available in other engines. That brought them additional score, though they still have some problems with fundamental things
  3. Every engine has gaps, meaning all of them have a lot of room to grow. Some features may seem useless for a specific engine, but such features become profitable in some cases.
And finally, I just compared the features of these engines. But that's not the only criterion when we select which engine to use; there are other criteria for tool set selection. Taking into account that the overall grade isn't dramatically different for most of the engines, I can say that the functional differences between them are not so serious. However, the above table shows where we should expect problems and restrictions, so we can be ready to work around them in advance.

10 comments:

  1. Great post. I was just hunting for the right BDD framework for .Net and this post helped me a lot.

    ReplyDelete
  2. There are several BDD engines in Python: Behave, Lettuce, Freshen, but only Freshen was mentioned here. Do you have any comparison among them?

    ReplyDelete
    Replies
    1. I think I will. This article was written quite long ago and some things were changed as well as new capabilities were discovered. So, in the future I think I'll update the comparison with wider range of engines and criteria set.

      Delete
    2. This post was really great! Do you have any update on this?

      Delete
  3. About the "hooks" feature for JBehave, I believe this could be what you were looking for:
    http://jbehave.org/reference/stable/annotations.html

    ReplyDelete
    Replies
    1. Good catch! Thank you. No idea why I didn't find it before (maybe it's something new) but these are definitely hooks. That would make the grade 2 for the feature as minimum.

      Delete
  4. 1. There are free JBehave plugins for IDEA and Eclipse - they support autocompletion
    2. What about Cucumber JVM? Starting from IDEA 12, it officially supports Cucumber-JVM stuff.
    3. I think it's possible to scope tests in jbehave, just add some magic: http://java.dzone.com/articles/how-scope-scenarios-jbehave
    4. JBehave has GivenStories instead of Backgrounds, not sure that backgrounds will appear in JBehave:
    http://jira.codehaus.org/browse/JBEHAVE-392

    ReplyDelete
    Replies
    1. > 1. There are free JBehave plugins for IDEA and Eclipse - they support autocompletion
      > 2. What about Cucumber JVM? Starting from IDEA 12, it officially supports Cucumber-JVM stuff.
      Well, technologies are not static. They're evolving. Not sure about the IDEA 12 but for Java I'm using Eclipse and it also has quite nice editor for features which supports autocompletion.

      The key thing is that IDEA 12 was released in December 2012, while this post was written 6 months before that :-) I was thinking of making a kind of annual update of this overview (I'll probably do that eventually), but such an overview takes too much time to create, plus the scope of such tools became wider as a lot of similar engines appeared.

      > 3. I think it's possible to scope tests in jbehave, just add some magic: http://java.dzone.com/articles/how-scope-scenarios-jbehave

      That's why JBehave has score 1 and not 0 in this overview. It means that such a feature exists, but it's not something done straightforwardly. The link shows another trick for how to do it. You can compare it with SpecFlow, where there's a reserved keyword for that. However, maybe this is for the better: step scoping is something that may cause a lot of errors.

      > 4. JBehave has GivenStories instead of Backgrounds, not sure that backgrounds will appear in JBehave:
      http://jira.codehaus.org/browse/JBEHAVE-392

      GivenStories looks more like tests dependency rather than typical pre-condition which backgrounds are. But yes, it can be used in similar fashion. That's why JBehave scored 1 on Backgrounds feature (not 0)

      Delete
  5. Currently Freshen isn't a good choice for Python. It's tightly integrated with the nose test runner and can't be run without it. Better choices are Lettuce, Behave and Morelia. I wrote about Python's BDD tools on my blog:
    http://stolarscy.com/dryobates/2015-04/bdd_tools_in_python/
    I still search for JavaScript BDD-tool written in the Cucumber's vein. Can you recommend any such tool?

    ReplyDelete
    Replies
    1. As for JavaScript you can take a look at this: https://cukes.info/docs/reference/javascript
      or corresponding GitHub project: https://github.com/cucumber/cucumber-js
      Maybe there is something else but these guys are being developed by the same community as original Cucumber

      Delete