Health check for Test Automation: Blue green deployment

As a product evolves and grows, with each release adding features to improve the user experience, we as software engineers want to deliver faster without compromising quality. Product developers are responsible not only for the health of the product but also for the health of its automated tests. We write unit tests, UI tests, instrumentation tests, and end-to-end tests to evaluate the product’s health and quality, yet we often overlook the fact that these tests need to be kept healthy too.

“Also common is the test automation group zombie. This zombie is the practice of assigning test automation to a dedicated team of test automators. The appeal is that we can keep developers focused on writing new code instead of writing and maintaining automated tests. The danger is that test automation inevitably lags development, so feedback from testing is delayed in a way that significantly reduces its value.” - Dale Emery

How can we achieve faster delivery of the product?

Approach 1: CI/CD Pipeline

Continuous Integration Continuous Deployment

What is a CI/CD pipeline?

Continuous Integration/Continuous Deployment, also known as CI/CD, is a method for delivering apps to customers efficiently by introducing automation into every stage of app development.

Our team started by building automated test coverage and plugging the tests into CI/CD pipelines that ran every night. Initially this worked fine, but as the products evolved, we started running into challenges.

Challenges with this approach:

  • Automation reliability: Flaky tests reduced confidence in the automation; a test would pass on one run and fail on the next even though the application under test had not changed. Tests that passed when run individually failed when run together because of stale data left behind by the previous test.
  • Device Issues: Sometimes the internet connectivity on the device would go off, the VPN would disconnect automatically, or the Google Play store login would throw an error and require reinstallation of the application.
  • Environment Issues/Reliable Data Issues: A different team would change the configuration of the test environment.
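
The stale-data problem in the first bullet can be illustrated with a minimal sketch. It is not taken from the team's test code: `FakeCartStore`, `STORE`, and the two test names are hypothetical stand-ins for shared test data. Without the reset in `setUp`, the second test would fail whenever it ran after the first, even though each passes on its own.

```python
import unittest


class FakeCartStore:
    """Hypothetical stand-in for shared test data, e.g. a backend test account."""

    def __init__(self):
        self.items = []


# Shared by every test, the way a shared test environment would be.
STORE = FakeCartStore()


class CartTests(unittest.TestCase):
    def setUp(self):
        # Reset shared data before each test so that no test depends on
        # whatever the previous test left behind.
        STORE.items.clear()

    def test_add_item(self):
        STORE.items.append("book")
        self.assertEqual(STORE.items, ["book"])

    def test_cart_starts_empty(self):
        # Runs after test_add_item (unittest orders methods by name).
        self.assertEqual(STORE.items, [])


def run_suite():
    """Run both tests together and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CartTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Resetting state in `setUp` rather than cleaning up at the end of each test is a deliberate choice in this sketch: it still protects a test when the previous one crashed before it could clean up.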

To eliminate some of the issues we faced with the CI/CD pipeline, we tried a new approach.

Approach 2: CI/CD pipeline with a robin role (a slight enhancement of the first approach)

Robin Role: A person is dedicated for a week to look at the test failures, triage the test executions, analyze the failure points, and update the results to the dashboard.

What is the robin role?

In this approach, a person is dedicated for a week to look at the test failures, triage the test executions, analyze the failure points, and update the results to the dashboard.

With this in place, we knew the reasons behind the failures, but nobody was fixing them in parallel, so issues caused by product changes, existing failures, and flakiness kept piling up. As a team, we had spent a good amount of time building a robust automation suite to support our release train by assessing the quality of the product; however, we were now spending more time maintaining those automated tests than identifying product defects.

Challenges with this approach:

  • Inordinate time spent triaging
  • Upcoming releases become bottlenecked on the robin role
  • No targeted approach to resolve the identified issues

This was our initial method for maintaining the quality of the tests and the code, but with the fast-paced growth of the product and its corresponding tests, it was no longer sufficient. This led us to our next approach.

Approach 3: Blue Green Deployment Model

Automated Green Blue Deployment Model

What is Blue Green Deployment?

Blue green deployment is a well-known term in the world of continuous delivery. In its usual form, it involves two identical environments, only one of which serves live traffic: a new release is deployed to the idle environment, verified there, and then traffic is switched over to it. In this article, blue is where a new change lands first, and green is where it goes once everything checks out.

Inspired by Roy Williams’s presentation at the 2014 Google Test Automation Conference (GTAC), “Never Send a Human to do a Machine’s Job: How Facebook uses bots to manage tests”, we adopted this approach for our product by creating two suites: primary and secondary. We started eliminating intermittent failures by keeping tests out of the primary suite until they had proved themselves a fixed number of times. We created a Test Warden Service, which is responsible for tracking the health of the tests. The time a test takes to prove its quality and move to the primary suite is considered its probation period.
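
The probation idea can be sketched in a few lines of Python. The article does not describe how the Test Warden Service is built, so everything here is an assumption for illustration: the `Warden` class, its in-memory storage, and the threshold of 5 consecutive passes (borrowed from the rule for new tests described further on).

```python
from dataclasses import dataclass, field

# Assumption: same threshold as the 5 consecutive runs required of new tests.
REQUIRED_CONSECUTIVE_PASSES = 5


@dataclass
class Warden:
    """Hypothetical in-memory tracker of test health during probation."""

    primary: set = field(default_factory=set)   # tests that finished probation
    streaks: dict = field(default_factory=dict)  # test name -> consecutive passes

    def record(self, test_name: str, passed: bool) -> None:
        """Record one run; promote the test once its pass streak is long enough."""
        if test_name in self.primary:
            return  # demoting primary tests is out of scope for this sketch
        # A failure resets the streak, so probation starts over.
        streak = (self.streaks.get(test_name, 0) + 1) if passed else 0
        self.streaks[test_name] = streak
        if streak >= REQUIRED_CONSECUTIVE_PASSES:
            self.primary.add(test_name)

    def suite_of(self, test_name: str) -> str:
        return "primary" if test_name in self.primary else "secondary"


warden = Warden()
# Two passes, one failure (streak resets), then five passes in a row.
for outcome in [True, True, False, True, True, True, True, True]:
    warden.record("test_login", outcome)
print(warden.suite_of("test_login"))
```

In this sketch the probation period is simply the number of runs recorded before the streak reaches the threshold; a real service would presumably persist this history rather than keep it in memory.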

Adoption of Blue Green Deployment for the existing tests

  • Created a small number of tests and ran them repeatedly
  • Identified the flaky tests from these consecutive runs
  • Separated the flaky tests from the stable ones
  • This gave us two suites: the primary (stable) suite serving as the Green Pipeline and the secondary (unstable) suite serving as the Blue Pipeline
  • Created issues for the flaky tests in the Blue Pipeline
  • Tackled those issues one by one while the Green Pipeline stayed in place and kept running smoothly
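
The separation step above can be sketched as a small function. This is an illustration under stated assumptions, not the team's tooling: `split_suites`, the `history` data, and the issue titles are all hypothetical, and the rule used here is simply "a test that passed every recorded run goes to green, anything else goes to blue".

```python
def split_suites(history):
    """Split tests into (green, blue) lists from their recorded run outcomes.

    history maps a test name to a list of booleans, one per consecutive run.
    """
    green, blue = [], []
    for name in sorted(history):
        outcomes = history[name]
        # Only tests with at least one run and no failures count as stable.
        if outcomes and all(outcomes):
            green.append(name)
        else:
            blue.append(name)
    return green, blue


# Hypothetical outcomes from four consecutive runs of three tests.
history = {
    "test_checkout": [True, True, True, True],
    "test_login": [True, False, True, True],
    "test_search": [True, True, True, True],
}

green, blue = split_suites(history)
# One issue per test in the Blue Pipeline, to be tackled one by one.
issues = [f"Stabilize flaky test: {name}" for name in blue]
```

A test that fails on every run also lands in blue under this rule; telling a consistently broken test apart from a genuinely flaky one would still need triage.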

Adoption of Blue Green Deployment for the new tests

  • New tests are added to the Blue Pipeline
  • A test must prove itself for 5 consecutive runs (one week) in the Blue Pipeline; once its health is established, it is moved to the Green Pipeline
  • If flakiness is detected in the Blue Pipeline, a priority 1 (critical) product bug is created and assigned to the responsible user. A strict timeline of 3 days is followed to eliminate the flakiness; otherwise, the tests are removed from the test suite, which decreases the test coverage for the module
  • Green Pipeline ensures that we can easily identify product defects

Now that the flaky tests are separated from the main automation suite, it is much easier to identify product defects. The execution time of the automated tests is reduced, and very few tests fail.

Benefits of Blue Green Deployment:

  • Test execution time is reduced from around 6 hours to 3 hours.
  • Robin role: The robin now has time to help with other tasks in the sprint. Earlier, the role was considered a 5-pointer task; with this approach, it is reduced to a 1-pointer.
  • Turnaround time to identify an operational reliability issue or a product issue is less than an hour.


The software development process is a learning curve, and changes happen with experience. There is no universally correct way of doing things; one must experiment and find what’s right for their product. With a product used by millions of customers, we cannot compromise on quality at all.

“When the product is right, you don’t have to be a great Marketer” — Lee Iacocca


I would like to extend my gratitude to Reena Singh Kshatriya for co-authoring this article.