Thursday, April 21, 2016

Test Brain: how we multi-threaded our testing in C#

When I joined my current role, SpecFlow was already in use as the tool for BAs to write their specs.  I hadn't worked in the C# world before and thought I could quickly build a working mobile testing framework, then easily parallelize it using Selenium Grid and the C# version of whatever Java/Python tooling I had used in the past.  I quickly came to realize that this was not the case: although Microsoft has done a tremendous job recently improving its tooling, it is still behind what I was using 2+ years ago in other languages.

One of the biggest problems with UI tests (and especially mobile ones) is that they are very slow.  With SpecFlow unable to run tests in parallel (there has since been an update that adds this functionality, but I have not tested it myself), we were stuck with test suites whose run times exceeded 8 hours.  As our automation grew (with total testing time hitting 24+ hours), even after splitting the tests by device type and then into bundles, we were still hovering around the 10 hour mark for getting results from our regression runs.

UI tests are not only slow to run, they are also very unreliable.  Look at the schedule of any automation conference and you will see presentations on flaky tests or on making your tests more reliable.

To mitigate these problems we developed the Test Brain.  At its core, the Test Brain is a tool which looks at your infrastructure, determines the number of test threads that can be run in parallel, spawns them, runs the tests and can rerun any failed tests.  The Test Brain has a queue of tests ordered by run priority, and as new builds come in their tests are added to the queue.  Let's go through these in a little more detail.

Number of Test Threads - The Test Brain can run a script which calculates the number of threads that can be executed in parallel.  For example, with mobile tests it checks the Selenium Grid for devices that match the test run criteria and are online and available.  For tests which alter the environment, it might instead check the number of environments that are available and match the build requirements.
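
As a rough sketch of what that calculation can look like for the mobile case, the snippet below asks a Selenium Grid hub how many slots are free and caps the thread count at a configured maximum.  The /grid/api/hub endpoint and the slotCounts field are assumptions about a Grid 2-style hub, and a real script would also filter by device criteria, so treat this as illustrative only.

```csharp
// Illustrative only: assumes a Selenium Grid hub that reports free slot counts
// via /grid/api/hub; the endpoint and cap are examples, not an exact setup.
using System;
using System.Net.Http;
using Newtonsoft.Json.Linq;

public static class ThreadCountCalculator
{
    // Number of test threads = free grid slots, capped by a configured maximum.
    public static int CalculateThreadCount(string hubUrl, int maxThreads)
    {
        using (var client = new HttpClient())
        {
            var json = client.GetStringAsync(hubUrl.TrimEnd('/') + "/grid/api/hub").Result;
            var freeSlots = (int)JObject.Parse(json)["slotCounts"]["free"];
            return Math.Min(freeSlots, maxThreads);
        }
    }
}
```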

Spawning Test Threads - Similar to http://www.clusterrunner.com/, the Test Brain creates copies of the project, distributes them to available resources, then spins up the required number of threads and starts executing tests!
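
A simplified version of the spawning step might look like the following: a shared queue of tests and a fixed number of worker tasks pulling from it.  RunSingleTest is a placeholder for however a worker actually invokes the test runner against its copy of the project, so this is a sketch rather than the real implementation.

```csharp
// Simplified sketch: fan a queue of tests out across a fixed number of worker threads.
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public class TestThreadSpawner
{
    public static void RunInParallel(IEnumerable<string> tests, int threadCount)
    {
        var queue = new ConcurrentQueue<string>(tests);

        var workers = Enumerable.Range(0, threadCount).Select(_ => Task.Run(() =>
        {
            string test;
            while (queue.TryDequeue(out test))
            {
                RunSingleTest(test); // placeholder: launch the test runner against a project copy
            }
        })).ToArray();

        Task.WaitAll(workers);
    }

    private static void RunSingleTest(string test)
    {
        // Placeholder for starting the runner process and recording the result.
    }
}
```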

Rerun Failed Tests – Sometimes we hit issues which are unrelated to our tests, and sometimes tests are just flaky, especially when run on mobile phones (more on this another day), so runs can have rerun counts and pass percentage requirements.  This feeds into our reporting infrastructure so that we can see which tests have been rerun and the result of each run.
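
As an illustration of the kind of policy involved (the counts and thresholds here are made up, not real defaults), a rerun policy can be reduced to something like this:

```csharp
// Illustrative rerun policy: rerun a failing test up to MaxReruns times and treat it
// as passed if the pass rate across all attempts meets the required threshold.
using System;
using System.Collections.Generic;
using System.Linq;

public class RerunPolicy
{
    public int MaxReruns { get; set; }           // e.g. 2 extra attempts
    public double RequiredPassRate { get; set; } // e.g. 0.5 = half the attempts must pass

    public bool RunWithReruns(Func<bool> runTest)
    {
        var results = new List<bool> { runTest() };

        // Only rerun while the latest attempt failed and reruns remain.
        while (!results.Last() && results.Count <= MaxReruns)
        {
            results.Add(runTest());
        }

        var passRate = results.Count(r => r) / (double)results.Count;
        return passRate >= RequiredPassRate;
    }
}
```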

Prioritizing Tests – If a smoke test suite comes into the queue while we still have 1,000 tests left in a regression run, we prioritize finishing the smoke test run first.  Similarly, if we see the same test twice we can cut it out of the queue.
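
The queue itself can be as simple as an ordered list keyed on priority, with a check that drops duplicates; again, a sketch rather than the real code:

```csharp
// Sketch of a priority-ordered test queue that drops duplicate entries.
using System.Collections.Generic;
using System.Linq;

public class TestQueue
{
    private readonly List<QueuedTest> _items = new List<QueuedTest>();

    public void Enqueue(string testName, int priority)
    {
        if (_items.Any(t => t.TestName == testName))
            return; // same test is already queued, cut the duplicate

        _items.Add(new QueuedTest { TestName = testName, Priority = priority });
        _items.Sort((a, b) => b.Priority.CompareTo(a.Priority)); // highest priority first
    }

    public QueuedTest Dequeue()
    {
        if (_items.Count == 0)
            return null;

        var next = _items[0];
        _items.RemoveAt(0);
        return next;
    }

    public class QueuedTest
    {
        public string TestName { get; set; }
        public int Priority { get; set; }
    }
}
```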

We've extended, and are working on further extending, the Test Brain; here are some of the additional pieces of functionality:

Fail Builds Early – For some of our test suites, if a single test fails we need to report that failure instantly, rather than carrying on and executing the rest of the suite.
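
One way this could be wired up (a sketch, not our exact code) is a shared cancellation token that every worker checks between tests:

```csharp
// Sketch of fail-fast behaviour: the first failure cancels a shared token,
// which the worker threads check before picking up their next test.
using System;
using System.Threading;

public class FailFastController
{
    private readonly CancellationTokenSource _cts = new CancellationTokenSource();

    public CancellationToken Token
    {
        get { return _cts.Token; }
    }

    public void ReportResult(string testName, bool passed)
    {
        if (!passed)
        {
            Console.WriteLine("Failing build early: " + testName + " failed.");
            _cts.Cancel(); // workers stop dequeuing once IsCancellationRequested is true
        }
    }
}
```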

Defect Creation – We have yet to implement this feature, as I'm not sure whether it should live in the Test Brain or in Repauto, but being able to instantly raise a defect with the information found during test execution is something we are considering.  Given that Repauto already has some built-in triaging functionality, including manually updating test status, it might be easier for this functionality to live there.

Test Tagging – Similar to defect creation, except this involves changing the tags on test scripts to move them into different test suites.

Number of Tests Run – Each build has a different chance of introducing bugs.  We would like to apply defect density, code complexity, or even just change size/what has changed to each of our builds, and feed this detail into the Test Brain as part of the calculation of which tests should be run.

Tests to be Run – Similar to the above, except here we would like to keep track of which tests have been run against which builds and use this to determine which tests should be run against each build, skipping tests on older builds that are already passing on more recent builds, and so on.

The Test Brain has helped us more than we anticipated; there is an immeasurable difference in the value we get from the automation now that it is fast and reliable.  Our teams now believe in automation and are checking the results daily.

Tuesday, April 19, 2016

Automated Performance Testing

Every company I've worked at previously has done performance testing using OpenSTA, JMeter, VS Test or LoadRunner.  We've run our tests at the end of the project and prayed that everything worked.

At my last company we started doing RUM-style performance testing, reusing Selenium tests to generate HAR files.  This gave us some interesting results, and we ran the tests during development, but we still left running larger loads until the end of the project.

At my current job we face a period of over three months at the end of each project, called stabilization, during which we run performance testing among other activities.  One of our goals has been to shorten this time before each release, and to do this my team has been working on moving our performance testing into the development phase.

So we automated the deployment of each build (using Chef and Octopus Deploy) and the start of performance testing.  This was scheduled to run each night, with results available in the morning.  We would then perform additional performance tests depending on what was needed for the project.  This worked OK, but it required us to store the results from each performance test along with the environment data, and to have these checked after each run.

I've talked before about Repauto, our reporting tool for automation.  We extended it to store performance test results.  Rather than storing the results from each run of our performance testing tool, we installed Prometheus (an open source monitoring tool, similar in spirit to Splunk or the ELK stack).  Prometheus pulls data from the servers and stores it for us to query.
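
For anyone curious about the plumbing, pulling a metric for a run's time window out of Prometheus is just an HTTP call to its query API.  The snippet below is an example only; the server URL, query and step values passed in would be placeholders, not our configuration.

```csharp
// Example only: fetch a metric over a performance run's time window from
// Prometheus via its HTTP API (query_range endpoint).
using System;
using System.Net.Http;

public static class PrometheusClient
{
    public static string QueryRange(string baseUrl, string promQl,
                                    DateTimeOffset start, DateTimeOffset end, string step)
    {
        var url = string.Format(
            "{0}/api/v1/query_range?query={1}&start={2}&end={3}&step={4}",
            baseUrl.TrimEnd('/'),
            Uri.EscapeDataString(promQl),
            start.ToUnixTimeSeconds(),
            end.ToUnixTimeSeconds(),
            step);

        using (var client = new HttpClient())
        {
            return client.GetStringAsync(url).Result; // JSON matrix that Repauto can store and graph
        }
    }
}
```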

In Repauto each performance run has a summary with the most important stats of the run, including the build, the environments used and the test details.  We also have two tabs: one which graphs the data stored in Prometheus using Grafana, and another which details all of the environment stats, which we pull out using Chef.  It looks a little like this.


For each of the projects we have a set of alerts, so that if performance degrades we can quickly look into the run and start assessing what has changed.  Further, we can compare runs looking for any differences in the environments (patches applied, etc.), which we previously had no archive of.  So now we have nightly performance runs and constant, consistent monitoring.  We will soon see whether this helps shorten our stabilization cycle.

Sunday, February 7, 2016

Repauto - Reporting made better

I was lucky enough to present at the 2015 Selenium Conference.  Xiaoxing Hu and I gave a talk on Repauto, our reporting dashboard for automated tests; you can see the talk here.  Unfortunately, open sourcing it has been more difficult than expected, but we have managed to open source a slightly less functional version, which can be found here.

I will write up a more substantial post about the functionality and how we use it, but for now it would be great to hear some feedback.