Thursday 23 March 2023

Visualising Test Coverage

Back in the Product Company World

It has been a long time since I wrote a blog post. When I was a consultant, first at Thoughtworks, then at Codurance, it was easy to find the time to write, and our clients gave us lots of material. It was easy to mash up experiences from different engagements into coherent narratives and learnings without betraying any client confidence, and easy to create personas based on the most dysfunctional or amusing aspects of the people we came across.
Back in the world of product companies, first at Currencycloud and latterly at IMMO, I lost the diversity of material, the ability to anonymise the organisation from which I drew experiences and the ability to create anonymous personae. Only now, after a year at IMMO, do I feel ready to write again and to talk about some of the problems I found when I came here and the solutions we have tried, both successfully and unsuccessfully, to improve the situation.

IMMO Capital

I started work at IMMO as VP Engineering back in April 2022. My initial remit was to grow the engineering maturity of the organisation by understanding the big problems that we had and putting in place plans to address them. The company had just closed its Series B funding and had a list of technology due diligence items that needed to be addressed pretty urgently. 
It became apparent quite quickly that we were going through the growing pains of changing from a startup to a scaleup. Most of the solutions we had in place were built on a low code platform called Zoho. This is a great solution for an organisation that is experimenting and trying to prove and understand a business model, but it cannot be described as a sustainable, long-term solution. It is fair to say that one of our big challenges (on which we have made big strides) has been reducing the reliance of the business on Zoho without compromising our operational efficiency.

Technology Strategy

In my time as a consultant I helped many organisations to put together their Technology Vision and their Technology Strategy. In some cases I was involved in starting the delivery of that strategy.
Our approach was always:
  1. Understand and document the current state of all the organisation's technology.
  2. Understand the desired future state by describing what impediments or constraints the current picture places on our ability to deliver on our business goals.
  3. Describe the steps to get from the current state to the future state in chunks small enough that each can be owned by somebody (preferably an individual) and delivered in a reasonable time frame.
We didn't really have a playbook on how to "do" technology strategy either at Thoughtworks (in my time at TW London, at least) or at Codurance when I started. On reflection I'd say that Thoughtworks was very good at steps 1 and 2 above, and I saw great strides in those steps at Codurance in my time there, but in both cases there wasn't any kind of playbook and certainly no framework to describe step 3 to the client. After a few missteps, reading Eben Hewitt's excellent book and learning a lot from the CTO at Currencycloud, my conclusion was that the best way to achieve step 3 is through OKRs (as long as they are done right). I am putting together a separate post purely about OKRs; I'll link to it here when it is done.

Quality Strategy

While formulating the early version of the Technology Strategy at IMMO it became clear very quickly that we needed to understand better how to achieve "quality". We had CI pipelines in place but the company did not have, or didn't think it had, enough running in AWS to justify investing yet in tooling to achieve CD. There was still a fair bit of manual testing and, whilst there was a very heartening amount of Terraform scripting in evidence, the actual delivery of things into production was being managed by our Platform team on a set release cadence.
The biggest reason I heard time and again from the teams was that we couldn't release on demand because we had to do a load of testing first. So I boiled those objections down to: "how can we be confident that our changes 'work' and that they won't 'break' something that is already out there, so that we can do a release?" I think when we ask ourselves what all of our pre-production environments are for, the answer is that they help us answer those questions.
I therefore took an action to build and help to implement a "Quality Strategy" as part of the wider technology strategy. This is now starting to bear fruit.

The End to End Test Delusion

One of the biggest problems I have seen in an organisation's test strategy (maybe it was never a strategy, just the way things evolved) is when they have a load of "end to end" tests, usually written in something like Selenium / Webdriver, that are supposed to all pass before a release happens. I believe that this anti-pattern evolves because of some or all of these reasons:
  • Unit tests were never written contemporaneously with the code
  • Retrofitting unit tests was tried but failed, probably because:
    • They found that in order to unit test effectively they would need root and branch redesign and refactoring
    • Nobody knows what the bits of code are meant to do, it "just works"
    • Leadership (at different levels of magnification) doesn't understand the value of the effort
    • Product doesn't care, it just wants new features, so anything that doesn't directly contribute to a new feature is seen as a waste of time
  • The solution is, or evolved from, a single, big, hard-to-unit-test monolith. There may still be a rump of this hard-to-test monolith in the middle of most, if not all, flows.
  • There has been no investment in contract testing or some kind of testing that involves, either directly or indirectly, multiple components working together.
  • There is, or was, a QA team who, over time, built bigger and bigger test scripts (sometimes called Regression Packs) that were mandated to be completed before each release. At some point it became clear that this process was unsustainable, so somebody decided to automate all of these tests.
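As an aside, the contract testing mentioned above need not be heavyweight. Here is a minimal sketch in Python of the idea, with all names hypothetical (a real project would use a tool such as Pact, and the contract would live in a shared artifact rather than a dict):

```python
# A shared "contract": the response shape the consumer relies on.
# In real projects this lives in a shared, versioned artifact;
# here it is just a dict for illustration.
CONTRACT = {"id": int, "email": str}

def check_against_contract(payload: dict) -> bool:
    """True if the payload has every contracted field with the right type."""
    return all(
        field in payload and isinstance(payload[field], expected_type)
        for field, expected_type in CONTRACT.items()
    )

# Consumer side: the stub used in the consumer's own fast tests
# must itself honour the contract.
consumer_stub = {"id": 42, "email": "user@example.com"}
assert check_against_contract(consumer_stub)

# Provider side: the provider's test verifies its real response
# (faked here) against the same contract. Extra fields are fine.
provider_response = {"id": 7, "email": "a@b.co", "created": "2023-03-23"}
assert check_against_contract(provider_response)
```

The point is that consumer and provider each verify against the shared contract in their own fast test suites, with no shared environment and no end to end run required.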
Whatever the reason for it, a large number of "end to end" tests is at best an impediment and at worst a serious problem because:
  • Such tests are expensive to run, in terms of infrastructure and time taken waiting
  • The feedback loops are too long
  • They are hard to maintain and usually very flaky
  • They give you a lot of information when they pass (yay! everything works!) but very little information when they fail (end to end test #3423 failed, we don't know why, it could have been a connectivity issue, it could have been...)
The end game of the end to end test delusion usually looks something like this:
  • The tests take so long that there is a "code freeze" on a "release candidate branch" so that they can complete, be "fixed", run successfully and have their defects fixed before the release happens. This is a big bang release, precisely the anti-pattern that automated testing hopes to get away from.
  • The tests are so flaky that everybody ignores every test run and people stop maintaining them (very likely).
  • When tests fail, the team can't wait for them to be fixed before they release, so they comment out the offending test, or skip it somehow, so that they can get their release through the pipeline.
So if we accept that having "too many" end to end tests is a bad thing, we then have two questions to answer:
  1. How many end to end tests should I have?
  2. How can I get "enough" automated testing in place so that I can have confidence that I can release my changes?

A Picture is Worth a Thousand Words

As a consultant I learnt to value models and diagrams above most things. If a model could be expressed in a simple diagram (e.g. Cynefin) then better still. The benefit of a model is that it gives you a frame of reference in which to discuss things. So if your model is accepted as somehow useful, it can start to form the basis of some kind of playbook on how to do things.
Happily, when it comes to the notion of test coverage there are already some diagrams in existence. Sadly, in terms of answering the above questions, I have always found that the existing models, the best known of which (I think) I've listed here, fall short on question 1 and don't address question 2 at all.
In my critiques that follow, please bear in mind that I'm looking at these various models mainly through the prism of the above two questions. I accept that what I see here as a weakness may not be a weakness to somebody else's use case, indeed may well be seen as a strength.

The Test Pyramid

The Test Pyramid is probably the best known way to visualise your test coverage. There are many different versions in existence (some of which add a "cloud" of manual testing at the top of the pyramid), which, to my mind, could be part of its problem. Many of the shortcomings are articulated in the article by Fowler that I've linked to.

Fowler's Test Pyramid

I like the arrows that Martin added to this picture indicating Slow v Fast, Expensive v Cheap. I personally also like to add another two arrows when I talk about this diagram:
  • Tests at the top give you lots of information when they pass, very little when they fail. Tests at the bottom give you very little information when they pass, excellent, targeted information, when they fail. I think it was my Thoughtworks colleague Chris Ford who passed this one on to me.
  • Tests at the top should be owned, written and maintained by QA people (but understood by engineers); tests at the bottom should be owned, written and maintained by engineers (but understood by QA people). This is one that I came up with at some point in the past and, of course, it only makes sense if you have QA as a competence, hopefully embedded in your delivery teams.
Strengths:
  • Easy to understand
  • Clearly shows that you need "most" of the stuff at the bottom and "least" of the stuff at the top.
Weaknesses:
  • There are many different versions, meaning it suffers to a degree from semantic diffusion.
  • It can cause big discussions and misunderstandings as people can get dogmatic about what "integration test", or "service test", or whatever actually means.
  • The outcome of "confidence" or "enough testing to gain confidence" isn't really addressed.

Agile Testing Quadrants

I thought that this model originated with Lisa Crispin in Agile Testing but, as this blog post from Lisa says, it originated earlier with Brian Marick's agile testing matrix. What I really like about it is that it explains what types of tests you should concentrate on for what types of outcomes using two simple dimensions: helping the team v critiquing the product, and business facing v technology facing.

Strengths:
  • Easy to see what type of test to write for what type of outcome you want
  • Describes different stakeholders in quality
Weaknesses:
  • No indication of how much of each type of testing is good or bad
  • Lots of information in here, maybe too much (you need to read the book to drill into the different types of testing)
  • It is of much wider scope than "can I be confident in my release"