Friday, 21 July 2023

The Starling Method


Behavioural Interviews

Behavioural interviews have become more and more popular throughout my working life, so much so that I'd now say they are ubiquitous. I recall that when I was starting out, people would ask questions such as "how would you approach a situation where..." whereas now the question is always, "tell me about a situation where..." This may sound like a small shift but the consequences are massive. We are now dealing with concrete examples with real situations, real people and real results (good or bad), rather than theory that we have read somewhere.

From what I have read on Wikipedia, "The idea is that past behavior is the best predictor of future performance in similar situations. By asking questions about how job applicants have handled situations in the past that are similar to those they will face on the job, employers can gauge how they might perform in future situations." Interestingly, I found exactly the same phrase in this LinkedIn post, but the Wikipedia article cites a 1995 source and the LinkedIn post was published in April 2023, so I think the 1995 reference is probably the original. If we accept the initial premise above, that "...past behaviour is the best predictor of future performance...", then we have to be prepared to go into interviews with some good answers to these questions, because saying "if this happened to me I would..." is not going to cut it.

The STAR Method

Every piece of advice I've seen recommends using the STAR method to answer these questions. Your answer (if you aren't guided into them) should contain the following elements:
  • Situation - what was the context of this story?
  • Task (or Target, I'll come back to this) - what were you asked to do?
  • Actions - what did you do?
  • Results - what happened, and how did it work out?

Situation

This is pretty self-explanatory: what was the situation that meant you had to (e.g.) resolve some conflict in your team? (Conflict resolution is a depressingly frequent visitor to the behavioural interview. Why does everybody assume it is sufficiently common that it has to be raised in every interview? Does it say something about the organisation asking the question?)

Task (or Target)

This is what you were asked to do / had to do. For example, "resolve a conflict". I prefer Target over Task here, but maybe that is just me. My reason is that a lot of stories begin with "I was asked to do..." One thing I learnt in my Thoughtworks days is that delegation should be based on outcomes, not tasks. In my own experience of delegation, which as a manager is extensive, I have learnt that people respond better to being given an outcome and being free, at least to an extent, to work out how to achieve it.

So a while ago I asked a team member to "get us to a place where all our laptops have encrypted hard drives and they can be remotely wiped in the event of loss", rather than, "Find the best MDM for our company." To an extent, this also framed the selection criteria of the MDM by outlining the most important characteristics of the choice. This in turn helped my colleague to provide me with the right level of detail to support the choice that was made.

Actions

This is pretty obvious, I hope. This is what we did.

Results

Again, this should be obvious. What happened? Were you successful? Some sources I have read also tag something about learning onto the end. The Wikipedia article linked above says, "What did you learn from this experience? Have you used this learning since?" I find this coda problematic. Learning should be a first-class citizen.

Learning as a First Class Citizen

All good organisations should embrace learning. If they don't, they will not improve, and if you don't improve, you are going backwards relative to your competitors. Jack Welch is credited with saying, "An organization's ability to learn, and translate that learning into action rapidly, is the ultimate competitive advantage." (although I'm sure I've read a similar quote by Peter Senge).

So I believe that almost everything we do should include a self-reflection (or a team reflection, maybe call it a "retro") on what we learnt. Perhaps you can use the results to ask, "how do I make sure I succeed again?"; often we need to ask, "how do I make sure a similar failure doesn't happen again?" Either way, learning should always follow.

Conclusion - The STARLing Method

I believe learnings should follow experience, and if something is perceived to have "failed", that shouldn't matter as long as appropriate learnings were taken and acted upon. So I propose that we talk about the STARLing Method:
  • Situation
  • Task or Target
  • Actions
  • Results
  • Learnings
Armed with this modified answer, we should no longer be afraid to discuss a situation in a behavioural interview where there may have been a perception of failure. I would even argue that when we talk about a "failed experiment" in Lean experiment terms, we shouldn't talk about failure. The only failed experiment is one with an inconclusive result. If we tested a hypothesis that we thought might increase sales but it didn't increase sales, we have learnt something that doesn't work. That isn't failure, that is learning!

Thursday, 23 March 2023

Visualising Test Coverage

Back in the Product Company World

It has been a long time since I wrote a blog post. When I was a consultant, first at Thoughtworks, then at Codurance, it was easy to take the time to write articles and our clients gave us lots of material. It was easy to mash different experiences up into coherent narratives and learnings without betraying any client trust relationships, and easy to create personas based on the most dysfunctional or amusing aspects of the people we came across.
Back in the world of product companies, first at Currencycloud and latterly at IMMO, I lost the diversity of material and experiences, the ability to anonymise the organisation from which I draw experiences, and the ability to create anonymous personas. Only now, after a year at IMMO, do I feel ready to write again and to talk about some of the problems I found when I came here and the solutions, both successful and unsuccessful, that we have tried in order to improve the situation.

IMMO Capital

I started work at IMMO as VP Engineering back in April 2022. My initial remit was to grow the engineering maturity of the organisation by understanding the big problems that we had and putting in place plans to address them. The company had just closed its Series B funding and had a list of technology due diligence items that needed to be addressed pretty urgently. 
It became apparent quite quickly that we were going through the growing pains of changing from a startup to a scaleup. Most of the solutions that we had in place were built using a low-code platform called Zoho. This is a great solution for an organisation that is experimenting and trying to prove and understand a business model, but it cannot be described as a sustainable or long-term solution. It is fair to say that one of our big challenges (on which we have made big strides) has been reducing the business's reliance on Zoho without compromising our operational efficiency.

Technology Strategy

In my time as a consultant I helped many organisations to put together their Technology Vision and their Technology Strategy. In some cases I was also involved in starting the delivery of the technology strategy.
Our approach was always:
  1. Understand and document the current state of all the organisation's technology.
  2. Understand the desired future state by describing what impediments or constraints the current picture places on our ability to deliver on our business goals.
  3. Describe the steps to get from the current state to the future state in chunks small enough that each can be owned by somebody (preferably an individual) and delivered in a reasonable time frame.
We didn't really have a playbook on how to "do" technology strategies either at Thoughtworks, in my time at TW London at least, and certainly not at Codurance when I started. On reflection I'd say that Thoughtworks was very good at steps 1 and 2 above, and I saw great strides in those steps at Codurance in my time there, but in both cases there wasn't any kind of playbook and certainly no framework for describing step 3 to the client. After a few missteps, reading Eben Hewitt's excellent book and learning a lot from the CTO at Currencycloud, my conclusion was that the best way to achieve step 3 is through OKRs (as long as they are done right). I am putting together a separate post purely about OKRs; I'll link to it here when it is done.

Quality Strategy

While formulating the early version of the Technology Strategy at IMMO it became clear very quickly that we needed to understand better how to achieve "quality". We had CI pipelines in place, but the company did not have, or didn't think it had, enough in AWS to have invested yet in the tooling needed to achieve CD. There was still a fair bit of manual testing and, whilst there was a very heartening amount of Terraform scripting in evidence, the actual delivery of things into production was being managed by our Platform team on a set release cadence.
The biggest reason I heard time and again from the teams was that we couldn't release on demand because we had to do a load of testing. So I boiled those objections down to: "how can we be confident that our changes 'work' and that they won't 'break' something that is already out there, so that we can do a release?" I think when we ask ourselves what all of our pre-production environments are for, the answer is that they help us answer that question.
I therefore took an action to build and help to implement a "Quality Strategy" as part of the wider technology strategy. This is now starting to bear fruit.

The End to End Test Delusion

One of the biggest problems I have seen in an organisation's test strategy (maybe it was never a strategy, just the way things evolved) is when they have a load of "end to end" tests, usually written in something like Selenium / Webdriver, that are all supposed to pass before a release happens. I believe that this anti-pattern evolves for some or all of these reasons:
  • Unit tests were never written contemporaneously with the code
  • Retrofitting unit tests was attempted but failed, probably because:
    • They found that unit testing effectively would need root-and-branch redesign and refactoring
    • Nobody knows what the bits of code are meant to do; it "just works"
    • Leadership (at different levels of magnification) doesn't understand the value of the effort
    • Product doesn't care; it just wants new features, so anything that doesn't directly contribute to a new feature is seen as a waste of time
  • The solution is, or evolved from, a single, big, hard-to-unit-test monolith. There may still be a rump of this hard-to-test monolith in the middle of every flow, or most flows.
  • There has been no investment in contract testing or some other kind of testing that involves, either directly or indirectly, multiple components working together (a sketch of what a simple contract test looks like follows this list).
  • There is, or was, a QA team who built bigger and bigger test scripts over time (sometimes called Regression Packs) that are mandated to be completed before each release. At some point it became clear that this process was unsustainable, so somebody decided to automate all of these tests.
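For anyone unfamiliar with the idea, here is a minimal sketch of the kind of check a contract test performs. It is deliberately hand-rolled rather than using a real tool such as Pact, and the service, field names and values are all hypothetical:

    # A minimal, hand-rolled illustration of a consumer-driven contract test.
    # The "provider" below is a stand-in function; in real life it would be a
    # call to the provider service (or a verification run inside its codebase).

    # The contract: the fields (and types) this consumer actually relies on.
    CONSUMER_CONTRACT = {
        "id": str,
        "status": str,
        "amount_pence": int,
    }

    def provider_get_order(order_id: str) -> dict:
        """Stand-in for the provider's response for a single order."""
        return {
            "id": order_id,
            "status": "CONFIRMED",
            "amount_pence": 12500,
            "internal_notes": "consumers should not depend on this field",
        }

    def test_provider_honours_consumer_contract():
        response = provider_get_order("order-123")
        for field, expected_type in CONSUMER_CONTRACT.items():
            assert field in response, f"provider no longer returns '{field}'"
            assert isinstance(response[field], expected_type), (
                f"'{field}' is no longer a {expected_type.__name__}"
            )

    if __name__ == "__main__":
        test_provider_honours_consumer_contract()
        print("contract satisfied")

The point is that a provider can be verified against each consumer's expectations in isolation, without standing up the whole system, which is exactly the gap that a wall of end to end tests usually ends up papering over.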
Whatever the reason for it, a large suite of "end to end" tests is at best an impediment and at worst a serious problem, because:
  • Such tests are expensive to run, in terms of both infrastructure and time spent waiting
  • The feedback loops are too long
  • They are hard to maintain and usually very flaky
  • They give you a lot of information when they pass (yay! everything works!) but very little information when they fail (end to end test #3423 failed, we don't know why, it could have been a connectivity issue, it could have been...)
The end game of the end to end test delusion usually looks something like this:
  • The tests take so long that there is a "code freeze" on a "release candidate branch" so that the tests can complete, be "fixed" and run successfully, defects can be fixed, and then a release can happen. This is a big bang release and is precisely the anti-pattern that automated testing is supposed to get us away from.
  • The tests are so flaky that everybody ignores every test run and people stop maintaining them (very likely).
  • When tests fail, the team can't wait for them to be fixed before they release, so they comment out the offending test, or skip it somehow, so that they can get their release through the pipeline (a sketch of what this looks like follows below).
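That last point usually ends up looking something like this. It is a hypothetical example (the test name and the ticket reference are made up) using pytest's skip marker:

    import pytest

    # The checkout journey test fails intermittently, nobody knows why, and the
    # release can't wait, so the test quietly stops providing any information.
    @pytest.mark.skip(reason="Flaky since March, see TICKET-1234. TODO: unskip")
    def test_full_checkout_journey():
        ...

Once a handful of tests look like this, the suite has effectively stopped answering the "can we release?" question at all.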
So if we accept that having "too many" end to end tests is bad, we then have two questions to answer:
  1. How many end to end tests should I have?
  2. How can I get "enough" automated testing in place so that I can have confidence that I can release my changes?

A Picture is Worth a Thousand Words

As a consultant I learnt to value models and diagrams above most things. If a model could be expressed in a simple diagram (e.g. Cynefin) then better still. The benefit of a model is that it gives you a frame of reference in which to discuss things. So if your model is accepted as somehow useful, it can start to form the basis of some kind of playbook on how to do things.
Happily, when it comes to the notion of test coverage there are already some diagrams in existence. Sadly, in terms of answering the above questions, I have always found that the existing models, the best known of which (I think) I've listed here, fall short on question 1 and don't address question 2 at all.
In my critiques that follow, please bear in mind that I'm looking at these various models mainly through the prism of the above two questions. I accept that what I see here as a weakness may not be a weakness for somebody else's use case; indeed it may well be seen as a strength.

The Test Pyramid

The Test Pyramid is probably the best-known way to visualise your test coverage. There are many different versions in existence (some of which add a "cloud" of manual testing at the top of the pyramid), which, in my mind, could be part of its problem. Many of the shortcomings are articulated in the article by Fowler that I've linked to.

Fowler's Test Pyramid


I like the arrows that Martin added to this picture indicating Slow v Fast, Expensive v Cheap. I personally also like to add another two arrows when I talk about this diagram:
  • Tests at the top give you lots of information when they pass, but very little when they fail. Tests at the bottom give you very little information when they pass, but excellent, targeted information when they fail. I think it was my Thoughtworks colleague Chris Ford who passed this one on to me (there is a small sketch of the difference after this list).
  • Tests at the top should be owned, written and maintained by QA people (but understood by engineers); tests at the bottom should be owned, written and maintained by engineers (but understood by QAs). This is one that I came up with at some point in the past and, of course, it only makes sense if you have QA as a competence, hopefully embedded in your delivery teams.
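To make that first arrow a little more concrete, here is a minimal sketch; the discount function and the checkout journey are both hypothetical:

    # Bottom of the pyramid: a focused unit test. When it fails, the assertion
    # points straight at the broken behaviour (the discount calculation).
    def apply_discount(total_pence: int, discount_percent: int) -> int:
        return total_pence - (total_pence * discount_percent) // 100

    def test_ten_percent_discount():
        assert apply_discount(10_000, 10) == 9_000

    # Top of the pyramid: an end to end journey (shown only as an outline).
    # When it fails, all you know is that *something* in a long chain of
    # browser, API, database and network went wrong.
    def test_customer_can_complete_checkout():
        # browser = start_browser()
        # log_in(browser); add_item_to_basket(browser); pay(browser)
        # assert "Order confirmed" in browser.page_source
        ...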
Strengths:
  • Easy to understand
  • Clearly shows that you need "most" of the stuff at the bottom and "least" of the stuff at the top.
Weaknesses:
  • There are many different versions, meaning it suffers to a degree from semantic diffusion.
  • It can cause big discussions and misunderstandings as people can get dogmatic about what "integration test", or "service test", or whatever actually means. 
  • The outcome of "confidence" or "enough testing to gain confidence" isn't really addressed.

Agile Testing Quadrants

I thought this originated with Lisa Crispin in Agile Testing, but as this blog post from Lisa says, it originated earlier with Brian Marick's agile testing matrix. What I really like about it is that it explains what types of tests you should concentrate on for what types of outcomes, using two simple dimensions: help the team v critique the product, and business facing v technology facing.

Strengths:
  • Easy to see what type of test to write for what type of outcome you want
  • Describes different stakeholders in quality
Weaknesses:
  • No indication of how much of each type of testing is good or bad
  • Lots of information in here, maybe too much (you need to read the book to drill into the different types of testing)
  • It has a much wider scope than "can I be confident in my release?"