Monday 9 February 2015

NBehave vs SpecFlow Comparison

It's always good to have a choice between various tools/engines for a technology you use. Sometimes, though, that choice becomes a problem of its own. BDD engines are a typical case, especially when we have to choose between similar, widely used engines which seem identical at first glance. Some time ago I made a JBehave vs Cucumber-JVM comparison to spot the differences and comparative characteristics of the most evolved engines in the Java world. And as I can see from the page view statistics, it's quite an interesting topic. At the same time there is the .NET platform, which has another set of BDD engines, and they are quite popular as well. So, in this world we may encounter a question like: which is better, NBehave or SpecFlow?

The answer isn't trivial. When I did the cross-platform BDD engines comparison almost 3 years ago, some of the engines weren't mature enough, or at least their documentation was at a low level. At that time NBehave didn't look good. But a lot has changed since then, and now both NBehave and SpecFlow have turned into full-featured Gherkin interpreter engines. So, choosing the better tool between them isn't so trivial anymore. That means we'll start a new comparison between NBehave and SpecFlow.

So, let's find out who's the winner in this battle!!!

What will be compared. Calculation method

Again, we'll use the same comparison scale as before. We'll take a common group of characteristics and set a grade showing how good each engine is in each area.

The following areas/features will be covered:

  • Documentation
  • Flexibility in passing parameters
  • Data sharing between steps
  • Auto-complete
  • Scoping
  • Composite steps
  • Backgrounds and Hooks
  • Binding to code
  • Formatting flexibility
  • Built-in reports
  • Conformity to Gherkin standards
  • Input Data Sources

We'll keep using a 4-grade scale for evaluating each of the features. The grades are classified in the following way:

Grade | Meaning
0 | Feature is not represented
1 | Feature is partially available with some restrictions
2 | Mainly feature exists but without any extra options
3 | Full-featured support

The above grades are still rather vague and can be subjective. So we'll split each area into specific features and check which of them exist or are missing. Each positive case (when a feature is present) will be treated as one score item. The overall number of score items will be divided into 3 parts, each of which corresponds to a grade. Based on this joint mark we'll be able to estimate each feature grade more or less objectively, backed by specific evidence. Thus, our comparison will be valid and valuable.
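
The mapping from score items to a grade can be sketched in a few lines of Python. This is only one possible reading of the scheme: the rounding rule (round down, but give at least grade 1 as soon as anything is present) and the 7-item area in the loop are assumptions made for illustration, not the exact formula behind the grades in this post.

```python
def grade(present: int, total: int) -> int:
    """Map the number of present score items to a grade from 0 to 3.

    Assumed rule: scale the share of present items onto 0..3 and round
    down, so grade 3 is reserved for a complete feature set; any
    non-empty set gets at least grade 1.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= present <= total:
        raise ValueError("present must be between 0 and total")
    if present == 0:
        return 0  # feature is not represented at all
    return max(1, (3 * present) // total)


# Hypothetical area consisting of 7 score items.
for present in range(8):
    print(f"{present} of 7 items -> grade {grade(present, 7)}")
```

Under this reading an engine missing a single item out of seven still lands on grade 2, since grade 3 requires every item to be present.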

What will NOT be compared

Additionally, there are some things which I won't use for the comparison. They are:

  • Technology support - some engines may run with Mono or something else
  • IDE integration - although it's important, it is excluded as it's hard to define a comparative measure here
  • Test engines support - both BDD tools may run with different test frameworks
  • Additional languages support - .NET is just a platform to be used by multiple programming languages

Why are the above items excluded? In this comparison I'm trying to use metrics and characteristics which can be extended to similar engines for other programming languages. Thus, I'm trying not to be too technology-specific.

So, when you refer to or use this comparison, please don't forget that the above items were not taken into account, and for some people they can be essential. And above all, don't forget to read the system requirements.


Documentation

The documentation is the first entry point when trying to use any library, tool or other software. So, we'll start with it here as well. While evaluating this feature we'll take a look at the following items:

  • Official site - every more or less mature solution should have an official site as an entry point for new people. Otherwise it would be hard to find it among the myriad pages on the Internet
  • Getting started guide - when you're new to an engine there should be a dedicated page to start from. Basically it contains step-by-step instructions for creating a first sample using the engine. Having such a page is a big plus for the documentation
  • General Usage guide - for many different reasons we need to refer to the manual for some specific feature or complex action. So, it would be great to have a consolidated set of pages for reference
  • API documentation - since both engines are used via an API, it would be nice to have them documented on the API level as well
  • Code examples - another reference point demonstrating specific feature capabilities. In most cases a single live sample is much better than tons of documentation pages
  • Specialized forums - every mature solution gathers people around it. They form communities and discussion forums. Such resources are very useful for discussing specific cases or non-standard tricks, which are quite frequent but almost always problem-specific
  • Related blog posts - another helpful type of documentation is various blog posts. They indicate that information is available not just on the official site, and that examples, tips and tricks can be found on pages written by people who aren't developers of the engine itself
  • Features documentation completeness - in this post we'll make an overview of various features. So another good indicator is how completely the features are documented. It means that every feature should have a page describing it

Feature | NBehave | SpecFlow
Official site | NBehave Official Site | SpecFlow Official Site
Getting started guide | Getting Started with NBehave | Getting Started With SpecFlow
General Usage guide | NBehave Documentation | SpecFlow Documentation
API documentation | No API docs available | No API docs available
Code examples | NBehave Examples on GitHub | SpecFlow GitHub Examples
Specialized forums | |
Related blog posts | |
Features documentation completeness | At the time when this post is being written the official documentation has several pages empty. E.g.: Tags, Visual Studio Plugin | During this comparison there were found some places where there's no direct references to the feature itself. Mainly it's related to Gherkin features

Summarizing the above, both NBehave and SpecFlow are about equally well represented in all documentation areas. Both lack API docs, but that's mainly because .NET doesn't have a really full-featured API doc generator, and publishing such docs isn't widely practiced in the .NET world. But the NBehave documentation has some areas missing. Thus, it gets a slightly lower grade.


Flexibility in passing parameters

In this section we'll compare how flexible NBehave and SpecFlow are in building step instructions and varying them to cover more specific cases of the same action. Here we'll check the following options:

  • Parameter variants - in some cases there is a fixed set of possible values to pass as a parameter. Instead of using a generic regular expression we should be able to enumerate the acceptable options. That's what parameter variants do
  • Parameters injection - in some cases the order of the input parameters of a method is not the order which is convenient for our phrases. In such cases we should be able to define explicitly which part of the text goes to which parameter
  • Tabular parameters - when we pass complex structures or arrays of complex structures there should be a mechanism to group them in a granular way, so they stay readable and at the same time are convenient to parse. Tabular parameters are one of the ways to do that
  • Step arguments conversion - the ability to convert input values into compact data structures
  • Examples - these look similar to tabular parameters, but the feature serves a different purpose: running the same scenario against multiple sets of input parameters
  • Multi-line input - a relatively small feature, but it's quite frequently used and it's nice when it is supported
  • Extra features - any other features. They're not as essential as the previous ones, but they're nice to have

So, the following table compares NBehave and SpecFlow against the above features:

Feature | NBehave | SpecFlow
Parameter variants | Simply supported by regular expressions | Simply supported by regular expressions
Parameters injection | Typed steps in NBehave |
Tabular parameters | Tables in NBehave |
Step arguments conversion | Typed steps in NBehave | Step arguments conversion in SpecFlow
Examples | Scenario outlines in NBehave | Gherkin Language in SpecFlow
Multi-line input | Docstrings in NBehave | Gherkin Language in SpecFlow
Formatted input | |

There may be some other small features, but in most cases they're either minor syntactic sugar or missing simply because a specific engine has no need for them.

Although SpecFlow has one feature fewer, both engines provide major but not full-featured support. Thus, we'll grade them both with 2.
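
To make "parameter variants supported by regular expressions" and "step arguments conversion" more concrete, here is a small engine-neutral sketch in Python. The step phrase, the `match_step` helper and the accepted values are all hypothetical; neither NBehave nor SpecFlow is involved, only the idea that a regex alternation enumerates the allowed values and the captured text is converted to a typed value.

```python
import re

# Hypothetical step pattern: the alternation (web|desktop) restricts the
# first parameter to an enumerated set of values; (\d+) captures a number.
STEP = re.compile(r"^I open the (web|desktop) client with (\d+) accounts?$")


def match_step(text):
    """Return (client, count) if the text matches the step, else None."""
    m = STEP.match(text)
    if m is None:
        return None
    client, count = m.groups()
    return client, int(count)  # simple argument conversion: str -> int


print(match_step("I open the web client with 3 accounts"))
print(match_step("I open the mobile client with 3 accounts"))
```

A phrase with a value outside the enumerated set simply doesn't match, so it is reported as an unknown step rather than being passed to the binding with an unexpected argument.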


Data sharing between steps

During test execution there may be cases when we need to remember a value in one step and re-use it in later steps. It's a very frequent case. By default it can be solved with global objects (which are usually static), but that may cause problems with parallel runs. Some engines have built-in storage which is supposed to be more thread-safe and serves as a kind of standard way of sharing data. This feature is called the context. It represents an internal data storage (usually a map) where we store named values and can retrieve them later. Depending on the scope, contexts can be:

  • Binding-specific - localized by class, file or any other type of single resource
  • Scenario-specific - the scope is limited to a specific scenario
  • Feature-specific - the scope is limited to a specific feature
  • Global - the values in this context are visible and accessible everywhere during the entire run

Context | Available in NBehave and SpecFlow
Binding-specific | This is supported by local variables. So, it's all about language features + code binding specifics
Global | This can be done using static objects or singletons. So, it's more about language capabilities

So, both engines support contexts pretty well. Thus, they both get the highest grade.
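
The idea of contexts with different lifetimes can be illustrated with a short Python sketch. The `Context` and `Run` classes are hypothetical and only model the concept (a named-value map that is reset at a scope boundary); they are not the context API of NBehave or SpecFlow.

```python
class Context(dict):
    """A named-value store whose lifetime is tied to a scope."""


class Run:
    """Holds one context per scope and resets it at the scope boundary."""

    def __init__(self):
        self.global_ctx = Context()    # lives for the entire run
        self.feature_ctx = Context()   # reset when a new feature starts
        self.scenario_ctx = Context()  # reset when a new scenario starts

    def start_feature(self):
        self.feature_ctx = Context()
        self.scenario_ctx = Context()

    def start_scenario(self):
        self.scenario_ctx = Context()


run = Run()
run.global_ctx["base_url"] = "http://localhost"

run.start_feature()
run.feature_ctx["user"] = "alice"

run.start_scenario()
run.scenario_ctx["order_id"] = 42   # stored by one step ...
order = run.scenario_ctx["order_id"]  # ... and read by a later step

run.start_scenario()
print("order_id" in run.scenario_ctx)  # scenario data does not leak
print(run.feature_ctx["user"])         # feature data is still available
```

Because each scenario gets a fresh map, values stored by one scenario can't leak into the next, which is the main advantage over plain static fields.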



Auto-complete

The auto-complete feature is really important and gives a real speed-up during test scenario development, as all you need to do is type a few words and the system will suggest all matching phrases. It's extremely convenient and it may be one of the key points when selecting an engine.

Both engines under comparison can have it. NBehave has the NBehave Visual Studio Plugin, which contains an auto-complete feature. SpecFlow has the Visual Studio 2013 Integration feature, which is also delivered as a plugin with auto-complete. Currently, the NBehave plugin is only for Visual Studio 2010 (and is supported by the 2012 version), so it doesn't really work on 2013. But as part of this comparison we only check the ability to support this in principle, and NBehave has it. So, both engines get the highest grade here.



Scoping

If we load step definitions from different libraries, or we apply similar text instructions to different application areas, we may need to use the same key words with different implementations behind them. In most cases that's a bit confusing, but on the other hand it may bring some benefits. E.g. imagine you have an application with different clients (web and desktop) doing the same thing but implemented on different platforms, so that one client is meant for the local desktop while the other is used via a browser. All actions are the same, so the end user test scenarios are the same. But since we interact with different windows and controls, the actual technical core must be different. That's why we need to distribute steps based on some specific attribute. Thus, we apply some steps to the web application only while other steps are applied only to the desktop client. This is one of the examples where we need steps scoping.

So, how good are our engines at it? SpecFlow has a dedicated scoped bindings feature for that. It does exactly what is expected in this area. We may have several step bindings for the same key word, and depending on some attributes they are applied differently. What about NBehave? There's no explicit way to do that. But NBehave itself is started a bit differently: it runs specific assemblies. So, in most cases, if we want to run different versions of steps for the same key word, we just need a separate assembly and run it. I cannot even call it a workaround. It is a feature, but it's a bit weaker than the one in SpecFlow, as we can do a similar trick using SpecFlow configuration if necessary. Thus, SpecFlow gets the higher grade in this area.
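
The web/desktop example can be sketched as a tiny step registry in Python, where two bindings share the same phrase and a tag decides which one applies. The `step` decorator, the `run_step` function and the tag names are hypothetical and only illustrate the general idea of scoped bindings; this is not how either engine is implemented.

```python
import re

# Registry of bindings: (compiled pattern, required tag or None, function).
_registry = []


def step(pattern, tag=None):
    """Register a step binding, optionally scoped to a scenario tag."""
    def decorator(func):
        _registry.append((re.compile(pattern), tag, func))
        return func
    return decorator


@step(r"^I log in as (\w+)$", tag="web")
def login_web(user):
    return f"web login for {user}"


@step(r"^I log in as (\w+)$", tag="desktop")
def login_desktop(user):
    return f"desktop login for {user}"


def run_step(text, scenario_tags):
    """Run the first binding whose pattern matches and whose scope fits."""
    for pattern, tag, func in _registry:
        m = pattern.match(text)
        if m and (tag is None or tag in scenario_tags):
            return func(*m.groups())
    raise LookupError(f"no binding for: {text}")


print(run_step("I log in as alice", {"web"}))
print(run_step("I log in as alice", {"desktop"}))
```

The same scenario text is thus routed to different implementations purely by the tag attached to the scenario.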


Composite steps

Normally, when we design a BDD-based solution, we reserve some key words for low-level actions (like generic clicks and text entries), then we move up to small page actions, e.g. filling in a form, and then we have a separate level where all of the above is described in more general terms reflecting business functionality, e.g. creating a trade. It means that lower level steps may simply be included into higher level steps. Thus we additionally re-use our key words, which is good as it brings more of the advantages of the BDD approach. Generally, the ability to call step definitions from other step definitions is called Composite Steps. So, how are they supported by NBehave and SpecFlow?

Well, NBehave doesn't seem to have this feature explicitly. At least it is not described in the documentation and it's not seen in the code. But as an alternative it has the embedded runner feature. It doesn't cover all potential cases, but at least it gives the ability to invoke features from the code. Thus, we can combine reusable steps into a string containing the feature and get what we need.

As for SpecFlow, it supports calling steps from step definitions pretty well. Thus, we may conclude it has full feature support. This is reflected in the grades.

Although NBehave doesn't support this feature explicitly, the alternative way of doing it covers the same options. In other words, this feature isn't supported directly, but a workaround provides the necessary support.
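
The layering described above (low-level actions re-used by a business-level step) can be sketched in Python as follows. The registry, the step phrases and the `log` list standing in for real UI actions are all hypothetical; the sketch only shows a step definition invoking other steps by their text.

```python
import re

_steps = []  # list of (compiled pattern, function)


def step(pattern):
    """Register a function as the binding for a step phrase."""
    def decorator(func):
        _steps.append((re.compile(pattern), func))
        return func
    return decorator


def run_step(text, log):
    """Find the binding matching the text and run it."""
    for pattern, func in _steps:
        m = pattern.match(text)
        if m:
            return func(log, *m.groups())
    raise LookupError(f"no binding for: {text}")


@step(r'^I enter "([^"]*)" into the (\w+) field$')
def enter_text(log, value, field):
    log.append(f"type {value} into {field}")


@step(r"^I click the (\w+) button$")
def click(log, name):
    log.append(f"click {name}")


@step(r"^I create a trade for (\w+)$")
def create_trade(log, instrument):
    # Composite step: expressed entirely through lower-level steps.
    run_step(f'I enter "{instrument}" into the instrument field', log)
    run_step("I click the submit button", log)


actions = []
run_step("I create a trade for EURUSD", actions)
print(actions)
```

The business-level phrase stays short in the feature file, while the low-level steps remain available for scenarios that need the detail.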

Backgrounds and Hooks

Of course, test activity isn't limited to the scenario steps themselves. In some cases we need to perform some pre-setup activities to make sure our application is in the proper initial state. Additionally, we should be able to handle events which may happen before/after each step/scenario/feature/run etc. All that stuff is covered by backgrounds and hooks.


Backgrounds

Backgrounds are important when we have common pre-conditions which should be applied to every test within a feature. NBehave documents its support for backgrounds directly. SpecFlow also supports this Gherkin feature. So, this feature is fully supported by both engines, and both get the highest grade.



Hooks

Hooks are needed to run a specific action on an engine-specific event. E.g. in some cases we may need to run some code before/after the entire suite or even a specific step. It may be needed for many purposes (e.g. reporting). The comparison will cover the following hook types:

  • Before/After step - runs before/after each step
  • Before/After scenario block - runs before/after each scenario block, like all givens, all whens etc.
  • Before/After scenario - runs before/after each test scenario
  • Before/After feature - runs before/after each feature
  • Before/After run - runs before/after the entire run
  • Tag Filtering - an additional feature which gives the ability to apply hooks only to specific tags. It was added here as it's the most relevant place

Both NBehave and SpecFlow support hooks. NBehave has a list of its hooks, and SpecFlow hooks also have their dedicated description. The common set of hooks is pretty similar and covers the most essential places. But SpecFlow has 2 more options which cannot be found in NBehave. They are:

  • Before/After scenario block - it's not really critical to have, and this type of hook can be implemented using the existing NBehave hooks. But anyway, for SpecFlow it's an additional feature which doesn't exist in NBehave
  • Tag Filtering - this is related to scoped bindings, which are a good feature but not essential (sometimes even harmful). Still, it's something SpecFlow has and NBehave doesn't

The above information can be represented with the following table:

Hook | NBehave | SpecFlow
Before/After step | yes | yes
Before/After scenario block | no | yes
Before/After scenario | yes | yes
Before/After feature | yes | yes
Before/After run | yes | yes
Tag Filtering | no | yes

Taking the above features into account, we can easily estimate grades for both engines: SpecFlow comes out ahead. But although SpecFlow appeared to be better in this area, NBehave didn't lose much, as the missing features aren't used very frequently.
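
How hooks with tag filtering behave can be shown with a small Python sketch. The `hook` decorator, the event names and the `db` tag are hypothetical; the sketch models only the before/after scenario events and is not the hook API of either engine.

```python
# Registered hooks per event: lists of (required tag or None, function).
_hooks = {"before_scenario": [], "after_scenario": []}


def hook(event, tag=None):
    """Register a hook for an event, optionally limited to a tag."""
    def decorator(func):
        _hooks[event].append((tag, func))
        return func
    return decorator


def fire(event, scenario_tags, log):
    """Run every hook of the event whose tag filter fits the scenario."""
    for tag, func in _hooks[event]:
        if tag is None or tag in scenario_tags:
            func(log)


@hook("before_scenario")
def open_session(log):
    log.append("open session")


@hook("before_scenario", tag="db")
def reset_database(log):
    log.append("reset database")


@hook("after_scenario")
def close_session(log):
    log.append("close session")


def run_scenario(name, tags, log):
    fire("before_scenario", tags, log)
    log.append(f"run {name}")
    fire("after_scenario", tags, log)


plain, tagged = [], []
run_scenario("plain", set(), plain)
run_scenario("tagged", {"db"}, tagged)
print(plain)
print(tagged)
```

Without tag filtering, the `reset_database` hook would either run for every scenario or have to check the scenario's tags itself, which is roughly the extra work needed in an engine that lacks this option.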

Binding to code

Both NBehave and SpecFlow use attributes to bind text to the actual implementation, which is the best form of step binding. Thus, both engines get the highest grade in this area.


Formatting flexibility

For this comparison this area is also redundant, as both NBehave and SpecFlow are relatively flexible about feature formatting. Thus, again both engines get the highest grade.


Built-in reports

Reporting is one of the important parts of each test engine, as reports are the major artifact we analyze after a test run completes. People sometimes tend to overestimate the importance of reports, but very few underestimate it. For this comparison we'll take the report types supported by at least one of the engines in this comparison, and we'll also pay attention to reports generated by similar engines for other programming languages. So, let's compare NBehave and SpecFlow against the supported report types:

Report type | NBehave | SpecFlow
Console output
Pretty console output
Structured file (e.g. XML)
Well-formatted readable file (e.g. HTML)
Usage report
Extra report types

SpecFlow is a bit better than NBehave, but only due to one report which SpecFlow supports. So, the difference here is not essential, and the grades reflect that.

Conformity to Gherkin standards

Since Gherkin is a common standard applied not just by NBehave and SpecFlow but also by engines for many other languages, it is important to make sure that our engines conform to it. In some cases that may turn out to be the key selection factor. E.g. it is important if you develop features in different programming languages and want to use the same set of tests across them. Having common feature files should decrease the effort of migrating from one language to another.

So, the table below shows which features are supported by both engines under comparison:

Keyword/attribute | NBehave | SpecFlow
Scenario Outline | yes | yes

Both NBehave and SpecFlow are 100% Gherkin compliant, which is reflected in their grades.

Input Data Sources

Input data sources are needed when we want to share some data or scenario parts across multiple resources. Generally, that should reduce the effort for development and maintenance. On the other hand, we may get additional problems with resolving references at the feature level. Gherkin feature files were never supposed to describe algorithms and other technical stuff. They are mainly targeted at expressing test instructions which should be interpreted explicitly. Thus, ideologically the presence of inclusions is not really appropriate. At the same time, being able to read data from external sources could be useful. But...

External Data

SpecFlow doesn't contain anything which does that directly. NBehave, in turn, has the embedded runner, which may read features passed as a string. Where this string comes from is a separate question, but generally it's possible using built-in functionality. Still, it's rather a workaround than something really designed for reading external sources, so NBehave gains only some points while SpecFlow gains none.



Inclusions

SpecFlow has no support for inclusions. Maybe that's even for the best. NBehave has the embedded runner feature, which can be used for that. It's not 100% of what is required, but it is a really serious feature supporting it. Based on that information, NBehave again comes out ahead in this area.



Summary

It's time to summarize all the results and join them in a total score table. Here it is:

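Engine\Feature | Documentation | Flexibility in passing parameters | Data sharing between steps | Auto-complete | Scoping | Composite steps | Backgrounds and Hooks | Binding to code | Formatting flexibility | Built-in reports | Conformity to Gherkin standards | Input Data Sources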

Well, we've just got equal scores.

Does that mean both engines are equal? Well, probably not. What we should take from this comparison is that:

  • NBehave and SpecFlow have a different balance of features. But the overall quantity and impact of those features compensate for the gaps of each engine
  • Some features were not compared at all (as described in a separate paragraph), so in some cases those excluded features should be taken into account
  • SpecFlow looks like a live project, while NBehave hasn't been updated for a while. However, the new branch for the 0.7 release is still there and, who knows, maybe the project was just suspended

And in general, the comparison will mainly show you what you gain and what you lose by selecting either of the compared engines if you apply them in the same area. But again, don't forget to read the release notes and their specific details.


  1. This was very useful. Thanks.
    You had compared BDD tools across technologies in 2012. Can you please compare them again as many features have got added. Nbehave in those days was no where closer to SpecFlow but now they seem to be at par. So it would be worthwhile to compare with other tools as well.
    I also looked into JBehave and when I compare that with SpecFlow, I believe the automation text fixture generation in SpecFlow saves a hell lot of effort. Such features need to be given higher weightage compared to others for a more balanced points. Is that a fair comment?

    1. That definitely makes sense. I was actually thinking about more frequent comparisons across platforms, e.g. on annual basis as all those engines definitely evolve from time to time and in some areas my posts become out-dated very fast. But it takes a huge amount of time as I need to go through all the engines (which number is growing). Also, calculation system definitely needs an improvement and your example is just another confirmation for this.

  2. really well compiled post. Thanks