Thursday, July 16, 2015

Separation of Tests

Go to any automation conference and there will no doubt be a couple of talks on flaky tests; they are one of the larger pain points when dealing with automation.  There are a number of approaches to reducing the problem of flaky tests (I would suggest watching talks from previous GTAC or Selenium conferences), but here I would like to talk about splitting the test results, or the test executions themselves.

We frequently had a problem where the build was constantly red because at least one test had failed.  From a PM's point of view the benefit of automation stops being visible, testers start to lose hope of ever seeing a green build and, worst of all, the results stop being valued by the team.

We started by moving tests that were flaky or had defects attached to them into a separate run.  We continued to execute these defect/flaky tests, looking to see whether the defect tests failed earlier or started to pass, and making sure that the flaky tests really were flaky.

Unfortunately testers are often pressed for time during projects, so with this setup we still had tests sitting in the regression suite waiting for someone to investigate whether a failure was the result of flakiness or a real defect.  This led us to create a final run called investigation, in which all tests that failed in the previous regression run are rerun, usually straight after the regression run completes.  The results from this help us allocate each test into the correct run (flaky or defect).
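
As a rough sketch of the mechanics, assuming tests carry flaky/defect tags and a run_test callable that reports pass or fail (all of the names below are illustrative rather than taken from our framework), the allocation and the investigation rerun could look like this:

from collections import defaultdict

def allocate(tests):
    """Group tests into runs based on the tags they carry."""
    runs = defaultdict(list)
    for test in tests:
        if "flaky" in test["tags"]:
            runs["flaky"].append(test)
        elif "defect" in test["tags"]:
            runs["defect"].append(test)
        else:
            runs["regression"].append(test)
    return runs

def execute(run_name, tests, run_test):
    """Execute one run and return the tests that failed."""
    failed = [t for t in tests if not run_test(t)]
    print(run_name + ":", len(tests) - len(failed), "passed,", len(failed), "failed")
    return failed

def nightly(tests, run_test):
    runs = allocate(tests)
    failed = execute("regression", runs["regression"], run_test)
    execute("flaky", runs["flaky"], run_test)        # are they still flaky?
    execute("defect", runs["defect"], run_test)      # fixed, or failing earlier?
    execute("investigation", failed, run_test)       # rerun everything that failed in regression

The investigation run simply replays whatever the regression run reported as failed, which is exactly the triage step we were otherwise doing by hand.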

In the future we hope to automate the process of allocating the tests into the correct run.

Monday, July 13, 2015

Breaking vs Failing Tests

So your test failed; what does it mean?  One of the first things you need to determine is whether it failed on the test itself or on the setup.  Sadly I've frequently found that tests fail before they even get to the part of the system they are testing, especially with GUI testing.

The problem with tests failing in this way is that we often report them as failures and this can massively skew our results.  If we have a suite of 50 tests targeted at the Accounts functionality of our system, but the accounts tab has been removed so we are unable to navigate to it, should this be reported as 1 failure or 50?

By marking one test as failing, because it explicitly checks that the accounts tab is available, and the rest as breaking, we solve this problem.  Suddenly we have 1 test failure and 49 grouped breakages, which is far more indicative of the actual state of the system.

So a breaking test is a test that fails before it gets to what it is checking/testing/asserting.  I highly recommend breaking down the failures in your automation reports to show breaking tests separately.
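
As a minimal sketch of how a harness might make this distinction (the SetupError class and the test interface here are purely illustrative), anything that fails before the check is reached is reported as broken, and only genuine assertion failures are reported as failed:

class SetupError(Exception):
    """Raised when a precondition or navigation step cannot be completed."""

def run(test):
    try:
        test.setup()            # e.g. navigate to the accounts tab
    except Exception as err:
        return ("BROKEN", str(err))    # never reached the thing being tested
    try:
        test.check()            # the actual action and assertion
    except AssertionError as err:
        return ("FAILED", str(err))    # reached the check and it failed
    return ("PASSED", "")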

Thursday, June 25, 2015

Gherkin Reuse

Unfortunately there is no silver bullet for automation.  As with every tool there are both positives and negatives; to get the most from an automation tool we need to accentuate the positives and mitigate the negatives.  With Gherkin-based tests we can reuse the same language and stories for multiple tests, mitigating the problem of maintaining stories.

Gherkin best practice states that, unless they are directly testing the GUI, tests should be UI agnostic.  UI-agnostic tests require less frequent changes and are often shorter and easier to maintain.

UI agnostic example:

Given the user has an open account
When they close their account
Then the account has a status of closed

UI specific example:

Given the user has an open account
And the user is on the account status page
When they click close account
And select yes from the prompt
Then the user is taken to the account status page
And the account's status is displayed as closed

Although fictitious, this example is similar to what I've regularly seen written by testers who are new to Gherkin.  The UI-specific test:

  • Will require more maintenance, as the test is bound to the UI and any dialog/button name change needs to be reflected in the test.
  • Is longer, making test sets more difficult to read through quickly.
  • Is difficult to move to another UI/platform.  For example, if we were going to run this test on a mobile device the action click is not as relevant and should probably be replaced by tap.


In the latest framework I've developed, tags on a test cause the test to be run multiple times on different platforms.  For example, a test tagged @soap, @iOS, @chrome would be run as a SOAP test, on an iOS device and in the Chrome browser.  In the UI-agnostic test we can use scoped step bindings to keep the same story but execute it in the way appropriate to the SUT.
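
As a tool-agnostic sketch of the idea (in tools such as SpecFlow this is what scoped bindings provide), the registry below maps the same step text to a different implementation per platform; the step text, platform names and page-object/client calls are purely illustrative:

STEPS = {}  # (step text, platform) -> implementation

def step(text, platform):
    """Register an implementation of a step for one platform."""
    def register(func):
        STEPS[(text, platform)] = func
        return func
    return register

@step("they close their account", platform="chrome")
def close_account_via_browser(context):
    context.browser.click("Close account")     # illustrative page-object calls
    context.browser.confirm_prompt()

@step("they close their account", platform="soap")
def close_account_via_soap(context):
    context.client.call("CloseAccount", account_id=context.account_id)

def run_step(text, platform, context):
    # the same story line executes differently depending on the active platform
    STEPS[(text, platform)](context)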

Additionally, on one of the latest projects we have the same execution platform but different authentication methods.  We were able to set the tests to run by default with both authentication methods, with only one underlying method changing between the two test executions.
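
A minimal sketch of that pattern, with entirely illustrative method names, is a single login step that dispatches on a per-execution configuration value:

def password_login(user):
    return {"user": user, "auth": "password"}

def token_login(user):
    return {"user": user, "auth": "token"}

AUTH_METHODS = {"password": password_login, "token": token_login}

def the_user_is_logged_in(auth_method, user):
    # auth_method is set once per execution; the story itself never changes
    return AUTH_METHODS[auth_method](user)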

One of the complaints often registered against Gherkin tests is that maintaining the stories is time-consuming/difficult.  By having good practices/documentation/training in story creation, and by reusing the tests across different executions, we can continue to gain the benefits of Gherkin testing whilst mitigating some of the difficulties.

Increasing Automation ROI - Reuse of Automation

Both creating and maintaining an automation framework or a suite of automated tests requires a large investment and buy-in from management.  As with all investments, managers are seeking a return on their investment (ROI).  It is often said that the ROI from automation increases after each execution.  Unfortunately this is not always the case; where the product is stable and the test does not alter its data, running the tests multiple times is unlikely to discover any new bugs in the system and thus only provides a sense of security.  Without the test data or the system under test (SUT) changing, a test is unlikely to find any new bugs, so the question becomes: how can we get more value from our tests?

Although difficult to achieve, randomizing the data of a test can increase its value.  Imagine a test as a trail through a forest: once created with set data, each run follows the same trail, never exploring the unknown parts of the forest.  Bugs, in this metaphor, will often be found in the unexplored areas.  Changing the data used by the test can expand its coverage and increase its value.
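
A small sketch of what this can look like in practice (the account fields are invented for illustration): generate the data randomly, but log the seed so that any interesting run can be replayed:

import random, time

seed = int(time.time())
random.seed(seed)
print("test data seed:", seed)     # record the seed so a failing run can be reproduced

def random_account():
    # edge values deliberately included so some runs leave the usual trail
    return {
        "name": "user" + str(random.randint(1, 10000)),
        "balance": random.choice([0, 1, -1, 999999]),
        "currency": random.choice(["GBP", "EUR", "JPY"]),
    }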

In a similar fashion, when tests are executed in a random order, or the test actions themselves execute in a random order (see model-based testing), additional value can be derived from the same tests.
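
As a very small, purely illustrative example of the model-based idea, the walk below picks a random permitted action at each step; a real test would drive the SUT with that action and assert it reports the expected state:

import random

ACTIONS = {
    "open":      {"from": "none",      "to": "open"},
    "suspend":   {"from": "open",      "to": "suspended"},
    "reinstate": {"from": "suspended", "to": "open"},
    "close":     {"from": "open",      "to": "closed"},
}

def random_walk(steps=20):
    state = "none"
    for _ in range(steps):
        allowed = [name for name, t in ACTIONS.items() if t["from"] == state]
        if not allowed:
            break                      # terminal state (e.g. closed) reached
        action = random.choice(allowed)
        state = ACTIONS[action]["to"]
        # here you would drive the SUT and check it reports the same state
        print(action, "->", state)

random_walk()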

The easiest way, though, to increase the ROI of your automation suite is to change the SUT.  This could mean running your tests against any of the following (a sketch of the browser case follows the list):

  • Different operating systems.  Especially valuable now in mobile testing where operating systems are changing more rapidly.
  • Different browsers.
  • Different devices.
  • Different integrated environments/components.
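
A rough sketch of the browser case, using Python's Selenium bindings with a stand-in suite (the smoke check against example.com is purely illustrative):

from selenium import webdriver

def suite_of_tests(driver):
    # stand-in for your real tests; here we just load a page as a smoke check
    driver.get("https://example.com")
    assert "Example" in driver.title

BROWSERS = {"chrome": webdriver.Chrome, "firefox": webdriver.Firefox}

for name, start in BROWSERS.items():
    driver = start()
    try:
        suite_of_tests(driver)
        print(name + ": passed")
    finally:
        driver.quit()

The same loop extends naturally to different devices or integrated environments by changing what the dictionary maps to.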


As a manual tester I've done browser upgrades, and never has my desire to come into work been dampened to the same extent.  So a side benefit of executing automation against different SUTs is that it reduces the mind-numbing, arduous tasks that need to be completed by manual testers.