
#3 DEVELOPING GREAT HEALTHCARE OUTCOMES FOR AUSTRALIANS – A JOURNEY OF TRUST, COLLABORATION AND CONTINUOUS IMPROVEMENT

Welcome back to our series on the challenges and changes we’ve made delivering a middleware product to the healthcare industry on behalf of our federal government customer. In the first article we introduced some of the changes we’ve applied to our relationship and working practices, and built on that in our second article to describe corresponding changes to how we develop, build and deploy. It’s now time to conclude our journey, starting with our testing practices.


How we test

Our testing practices have historically been a mixture of sporadically implemented and maintained unit and integration tests, and mostly manual system and regression testing. In the last couple of years we've invested (and will continue to invest) in more consistent use of test automation.

Our legacy unit tests had been written using the MSTest framework across many years by many developers. As tends to happen, different developers had different approaches, experience and appetites for writing coded tests, resulting in tests that were inconsistent and that in many cases assumed dependencies, such as databases or data in databases, that were long, long gone! As a result, the tests couldn't be run reliably, so they became poorly maintained, and the cycle continued.

We're now partway through migrating our existing unit and integration tests from the MSTest framework to xUnit (this was the subject of a recent team swarm day), and all new tests are written in xUnit. For us, xUnit provides much better support for the explicit handling of dependencies through its fixtures, and we use them extensively for each test run to perform actions such as:

  • Creating a clean application database in SQL Server LocalDB (using our database upgrade mechanism) and populating it with seed data

  • Ensuring that client certificates required for interacting with external dependencies are installed appropriately (more on that to come)

  • Ensuring our logging configuration is available during unit tests

Using just these fixtures we've eliminated many of the static "test helper" classes (shudder) we previously had throughout our codebase, and provided a framework that consistently and reliably gets tests into a known state before they run and cleans up after them. This allows our developers to focus on what they actually need to test instead of worrying about dependencies as a barrier to even getting started! By making our test dependencies explicit and easing how they're fulfilled, we now focus on writing repeatable unit and integration tests that will run anywhere - including any data or configuration that's specific to the test.
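To make that concrete, here's a minimal sketch of the kind of xUnit class fixture we're describing. The DatabaseFixture and PatientRepositoryTests names, the connection string and the upgrade/seed helpers are illustrative placeholders rather than our actual implementation, and it assumes the Microsoft.Data.SqlClient package:

```csharp
using System;
using Microsoft.Data.SqlClient;
using Xunit;

// Illustrative only: a class fixture that stands up a clean LocalDB database
// and seeds it once for the test class, then cleans up afterwards.
public class DatabaseFixture : IDisposable
{
    public string ConnectionString { get; } =
        @"Server=(localdb)\MSSQLLocalDB;Database=MiddlewareTests;Integrated Security=true";

    public DatabaseFixture()
    {
        // In a real fixture these would call the product's database upgrade
        // mechanism and seed scripts; here they're placeholders.
        CreateCleanDatabase(ConnectionString);
        ApplySeedData(ConnectionString);
    }

    public void Dispose() => DropDatabase(ConnectionString);

    private static void CreateCleanDatabase(string connectionString) { /* run upgrade mechanism */ }
    private static void ApplySeedData(string connectionString) { /* run seed data scripts */ }
    private static void DropDatabase(string connectionString) { /* clean up */ }
}

// Tests declare the dependency explicitly via IClassFixture<T>;
// xUnit constructs the fixture and injects it into the test class.
public class PatientRepositoryTests : IClassFixture<DatabaseFixture>
{
    private readonly DatabaseFixture _db;

    public PatientRepositoryTests(DatabaseFixture db) => _db = db;

    [Fact]
    public void Can_read_seeded_patient()
    {
        using var connection = new SqlConnection(_db.ConnectionString);
        connection.Open();
        // ... exercise the system under test against a known database state
    }
}
```

xUnit creates one DatabaseFixture per test class (or per collection, if shared across classes via a collection fixture), so the cost of standing up the database is paid once rather than per test.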

While this approach is working just fine for our unit tests, we're actually reasonably constrained in how widely we can implement unit tests before we reach the land of integration tests instead. Being middleware, we have dependencies on a number of external services, and the vast majority of our middleware logic revolves around interactions with those services. In a sound architecture these dependencies would be abstracted behind interfaces, and we'd use dependency injection to supply either the concrete dependency or a mock we could test against. Alas, tying in with my previous observation about doing the right things from the start (from now on), the legacy codebase has neither interfaces nor dependency injection. That's certainly something we'll progressively work towards (there's a sketch of what it looks like after the list below), but in the meantime we're left exploring alternatives for satisfying dependencies in unit and integration tests:

  • We have access to an externally hosted "system vendor test" environment for each external service dependency, and historically this has been the environment we've targeted with our integration tests.

    • The Good: The environment exactly replicates the service logic implemented in the corresponding production environment, meaning if our tests fail in this environment, it's significant.

    • The Bad: The environment intermittently and inconsistently returns error responses to service calls. The very next time the same test case is executed it's likely to succeed, but a different test case may fail - or not. So we waste time investigating test failures that turn out to be transient failures in our dependency rather than in our system under test, and once again lose confidence in our integration tests.

    • The Bad: We have limited control over the data in this environment, which is shared with a number of other vendors. So we can very well (and do!) execute a test using test data we thought was in one state, only to find that someone else has modified it in the meantime!

  • We have access to a "simulator" for each of our external service dependencies that can be run alongside our system under test.

    • The Good: We're in complete control of the environment and the data in it.

    • The Bad: The simulator doesn't implement all service operations supplied by the external service dependencies, so we can't execute some test cases.

    • The Bad: The simulator is based on a different codebase to the main codebase for the external services, so may not always reflect the current release.

  • We've developed a set of "mocks" for each external service dependency and the service operations we invoke.

    • The Good: We're in complete control again.

    • The Bad: The mocks don't implement all service operations supplied by the external service dependencies, so we can't execute some test cases.

    • The Bad: We implement minimal logic in the mocks, so they're not really suitable for testing interaction logic. But they are great for performance testing - more on this later!
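To make the direction we're working towards a little more concrete, here's a minimal sketch of an external dependency abstracted behind an interface so a test can substitute a mock. The IHealthRecordService contract, the RecordStatusChecker class and the use of Moq are illustrative assumptions, not our actual service contracts or tooling:

```csharp
using System.Threading.Tasks;
using Moq;
using Xunit;

// Hypothetical abstraction over one of our external service dependencies.
public interface IHealthRecordService
{
    Task<string> GetRecordStatusAsync(string recordId);
}

// Middleware logic takes the dependency via constructor injection,
// so tests can supply either the real client or a mock.
public class RecordStatusChecker
{
    private readonly IHealthRecordService _service;

    public RecordStatusChecker(IHealthRecordService service) => _service = service;

    public async Task<bool> IsActiveAsync(string recordId)
    {
        var status = await _service.GetRecordStatusAsync(recordId);
        return status == "Active";
    }
}

public class RecordStatusCheckerTests
{
    [Fact]
    public async Task Returns_true_when_record_is_active()
    {
        // No external environment, simulator or shared test data required:
        // the dependency is mocked in-process.
        var service = new Mock<IHealthRecordService>();
        service.Setup(s => s.GetRecordStatusAsync("123")).ReturnsAsync("Active");

        var checker = new RecordStatusChecker(service.Object);

        Assert.True(await checker.IsActiveAsync("123"));
    }
}
```

An abstraction like this would also let us choose, per test suite, between the real client, the simulator or a mock, depending on which of the trade-offs above we're willing to accept.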

I'm not sure there's a great answer to this; everywhere we turn there are constraints and deficiencies. In the interim we're working with the supplier of the "system vendor test" environments to try to understand why we encounter intermittent failures, and we're introducing increased resiliency into the code that interacts with these dependencies. This is an area we continue to invest in, because it's important to get right.
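As an illustration of what that increased resiliency can look like, here's a minimal sketch of retrying transient failures on calls to an external dependency. The use of Polly and the CallVendorServiceAsync helper are assumptions for the example, not a description of our actual implementation:

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Polly;
using Polly.Retry;

public static class ResilientServiceCaller
{
    // Retry transient HTTP failures a few times with exponential back-off
    // before surfacing the error to the caller (or failing the test).
    private static readonly AsyncRetryPolicy RetryPolicy = Policy
        .Handle<HttpRequestException>()
        .Or<TaskCanceledException>() // timeouts
        .WaitAndRetryAsync(
            retryCount: 3,
            sleepDurationProvider: attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)));

    public static Task<HttpResponseMessage> CallVendorServiceAsync(
        HttpClient client, string requestUri)
    {
        // Hypothetical helper: wraps a single outbound call in the retry policy.
        return RetryPolicy.ExecuteAsync(() => client.GetAsync(requestUri));
    }
}
```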

As our products become more widely adopted we're seeing a greater emphasis from customers on performance, in both the core and user interface products. Performance testing is not something we've historically been great at; we've tended to do just enough to meet the requirements of a specific project, without really considering repeatability or frameworks. To address the increasing requirements from our customers we're now investing heavily in automated performance testing. We've put together a reasonably robust framework that uses JMeter running inside Docker containers as our test clients, which enables us to scale the clients to simulate any load we desire. To remove our dependency on the external services and provide the responsiveness we need during performance testing, we've used SoapUI to mock each service operation we depend on, with the mocks deployed to Tomcat. We use Azure DevOps Pipelines to deploy a target release to our performance test environment and kick off the performance test run, capturing performance metrics via Telegraf in InfluxDB and reporting on them using Grafana.

Using this approach we've finally been able to benchmark the performance of our core and UI products, with a view to identifying changes that introduce performance regressions in the future. We've also successfully used the framework to identify and eliminate some long-standing performance bottlenecks, in particular improving the responsiveness of a high-use screen in our UI by at least 5x.

We're actively working on maturing our testing capability in other areas too, including:

  • Continuing to automate our regression test suite using tools such as Selenium and Katalon

  • Continuing to evolve our performance testing framework, including consideration of alternative mocks such as Mountebank, and scheduling our performance test suite to run on a regular basis so we can compare the results with previous runs

  • Automating and scheduling vulnerability scans

  • Simplifying and standardising our test reporting requirements, particularly in the areas of test strategy and test summaries


How we implement and support

Not only do we supply our products to customers, we also support them in implementing and using the products. From the very inception of our engagement almost ten years ago, support was fulfilled by the product development team, with nominated team members rotating through support duties.

Support isn't an easy gig. As with anything you produce, over time our products have progressively supported more operations and incorporated more logic, in some cases requiring a deep understanding of not only how the products are technically implemented, but also how they can be deployed and configured, and of the complexities of the healthcare domain itself.

We've made some great advances here too.

For our last major release we introduced the concept of an adoption site: a customer who receives our beta releases across a 4-6 week period, validating our documentation and deployment process and providing what's effectively a pre-release UAT. Because it's now so much simpler to deploy our products, we can provide weekly or daily beta releases, integrating any feedback or bug fixes and working towards delivering a higher quality final release. We had great success with this approach for our major release, and have continued to apply it to our minor releases.

To better support our customers we're transitioning level 1 and level 2 support to a dedicated managed services team, with our product development team only getting involved at level 3. There have been a number of benefits from this:

  • The knowledge required to support our products is made more explicit through needing to be shared across multiple levels of support

  • Knowledge is captured somewhere other than in key team members' heads

  • The managed services team is able to identify and focus on commonly encountered issues

  • The product development team is able to focus on product development, only becoming involved in support requests when absolutely needed

  • We’ve introduced a Knowledge Portal, maintained by the managed services team, that can be accessed by our customers to identify and resolve many commonly encountered issues without even needing to log a support request - imagine that!

  • Support is now actively contributing suggestions back to our product backlog for remediating common issues or making our products easier to support

Again, none of this is revolutionary, but it was needed.

A great example of a support suggestion being integrated back into our products is observability. The consistency of our logging wasn't great, meaning in some cases a customer or support didn't have a lot to go on when diagnosing an issue. And so support requests would fall straight through to level 3, needing a developer to dive into code to understand what was going on. We're now starting to focus on providing better visibility into the operational state of our products, through the following (with a short sketch of what this can look like in code after the list):

  • Making it simple for developers to add logging to our product source code

  • Producing improved and more consistent log output

  • Including useful contextual metadata in logs (semantic logging)

  • Where possible, ensuring logs include a reference to a known error condition in our Knowledge Portal, better supporting self-diagnosis and resolution of issues

  • Correlating logs so we can gain a more complete picture of state for a logical operation

  • Using a logging framework to support configurable destinations for logs

  • Proactively identifying conditions that can lead to issues and using our logging framework to notify an operator
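As a rough sketch of what a few of these points can look like in code, here's an example using a structured logging framework. The choice of Serilog, the knowledge article reference and the CorrelationId property name are illustrative assumptions rather than the product's actual logging implementation:

```csharp
using System;
using Serilog;
using Serilog.Context;

public static class LoggingExample
{
    public static void Main()
    {
        // Configurable destinations: console here, but sinks such as files or
        // a log aggregator can be added through configuration.
        Log.Logger = new LoggerConfiguration()
            .Enrich.FromLogContext()
            .WriteTo.Console()
            .CreateLogger();

        var correlationId = Guid.NewGuid();

        // Correlate all log events for a logical operation.
        using (LogContext.PushProperty("CorrelationId", correlationId))
        {
            // Semantic logging: named properties rather than string concatenation,
            // plus a reference to a known error condition in the Knowledge Portal.
            Log.Warning(
                "Upload rejected for document {DocumentId}. See knowledge article {KnowledgeArticleId}",
                "DOC-42", "KB-1017");
        }

        Log.CloseAndFlush();
    }
}
```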

Building on this we're now exploring options for introducing a more mature operational insights capability, using products such as Elastic Stack, TICK Stack, or Prometheus to capture and aggregate logs and other telemetry, support self-service operational analytics, and produce alerts for sub-optimal operational conditions.

Making our products easier to support helps everyone.


Where next?

We've made huge progress in the last few years, largely thanks to a great team made up of Chamonix and partner team members, who all work by the same set of shared team values. There's still so much we can achieve together though. So we'll continue trusting each other, collaborating and iterating, continuously improving and eliminating waste, all so that we can provide the best value possible.

Thanks for sharing our journey with us!