Friday 21 July 2023

The Starling Method


Behavioural Interviews

Behavioural interviews have got more and more popular throughout my working life, so much so that I'd now say they are ubiquitous. I recall that when I was starting out, people would ask questions such as "how would you approach a situation where..." whereas now the question is always, "tell me about a situation where..." This may sounds like a small shift but the consequences are massive. We are now dealing with concrete examples with real situations, real people and real results (good or bad), rather than theory that we have read somewhere.

From what I have read on Wikipedia, "The idea is that past behavior is the best predictor of future performance in similar situations. By asking questions about how job applicants have handled situations in the past that are similar to those they will face on the job, employers can gauge how they might perform in future situations." Interestingly, I found exactly the same phrase in this LinkedIn post, but the Wikipedia article cites a 1995 source and the LinkedIn was posted in April 2023, so I think the 1995 reference is probably the original. So if we accept the initial premise above, "...past behaviour is the best predictor of future performance..." then we have to be prepared to go to interviews with some good answers to these questions because saying "if this happened to me I would..." is not going to cut it.

The STAR Method

Every piece of advice I've seen advises using the STAR method to answer the question. The components of your answer (if you aren't guided into them) should contain the following elements:
  • Situation - what is the context of this story
  • Task (or Target, I'll come back to this) - what you were asked to do
  • Actions - what you did
  • Results - what happened, how did it work out?

Situation

This is pretty self explanatory, what was the situation that meant you had to (e,g.) resolve some conflict in your team? (a depressingly frequent visitor to the behavioural interview, why does everybody assume that conflict resolution is sufficient common that it has to be raised in every interview? Does it say something about the organisation that asked the question?) 

Task (or Target)

This is what you were asked to do / had to do. For example, "resolve a conflict". I prefer Target over task here, but maybe that is just me. My reason is that a lot of stories begin with "I was asked to do..." One thing I learnt in my Thoughtworks days is that delegation should be based on delegation of outcomes, not tasks. So in my personal experience of delegation, which is extensive as a manager, I have learnt that people react better to being given an outcome and being free, at least to an extent, to work out how to achieve that outcome. 

So a while ago I asked a team member to "get us to a place where all our laptops have encrypted hard drives and they can be remotely wiped in the event of loss", rather than, "Find the best MDM for our company." To an extent, this also framed the selection criteria of the MDM by outlining the most important characteristics of the choice. This in turn helped my colleague to provide me with the right level of detail to support the choice that was made.

Actions

This is pretty obvious, I hope. This is what we did.

Results

Again, this should be obvious. What happened? Were you successful? Some sources I have read also tag on the end something about learning. The Wikipedia article linked above says, "What did you learn from this experience? Have you used this learning since?" I find this coda problematic. Learning should be a first class citizen.

Learning as a First Class Citizen

All good organisations should embrace learning. If they don't, they will not improve, if you don't improve, you will be going backwards in comparison to your competitors. Jack Welch is said to have said, "An organization's ability to learn, and translate that learning into action rapidly, is the ultimate competitive advantage." (although I'm sure I've read a similar quote by Peter Senge).

So I believe that almost everything we do should contain a self reflection (or a team reflection, maybe call it a "retro") on what we learnt. Perhaps you can use the results to ask, "how do I make sure I succeed again", often we need to ask, "how do I make sure a similar failure doesn't happen again?" So learning should always follow.

Conclusion - The STARLing Method

I believe learnings should follow experience and I believe that if something is perceived as "failed" that shouldn't matter as long as appropriate learnings were taken and acted upon. So I would propose that we should talk about the STARLing Method:
  • Situation
  • Task or Target
  • Actions
  • Result
  • Learnings
Armed with this modified answer, we should no longer be afraid to discuss a situation in a behavioural interview where there may have been a perception of failure. I would even argue that when we talk about a "failed experiment" in Lean experiment terms, we shouldn't talk about failure. The only failed experiment is one with an inconclusive result. If we tested a hypothesis that we thought might increase sales but it didn't increase sales, we have learnt something that doesn't work. That isn't failure, that is learning!

Thursday 23 March 2023

Visualising Test Coverage

Back in the Product Company World

It has been a long time since I wrote a blog post. When I was a consultant, first at Thoughtworks, then at Codurance, it was easy to take the time to write the articles and our clients gave us lots of material. It was very easy to write articles that combined experiences and mashed them up into coherent experiences and learnings without betraying any client consultancy trust relationships and it was easy to create personas based on the most dysfunctional or amusing aspects of the people that we came across.
Back in the world of product companies, first at Currencycloud and latterly at IMMO, I lost the diversity of material and experiences, the ability to anonymise the organisation to which I draw experiences and the ability to create anonymous personae. Only now, after a year at IMMO, do I feel that I am ready to write again and talk about some of the problems I found when I came here and the solutions we have tried, both successfully and unsuccessfully, to improve the situation.

IMMO Capital

I started work at IMMO as VP Engineering back in April 2022. My initial remit was to grow the engineering maturity of the organisation by understanding the big problems that we had and putting in place plans to address them. The company had just closed its Series B funding and had a list of technology due diligence items that needed to be addressed pretty urgently. 
It became apparent quite quickly that we were going through the growing pains of changing from a startup to a scaleup. Most of the solutions that we had in place were built using a low code framework called Zoho. This is a great solution for an organisation that is experimenting and trying to prove and understand a business model, but it cannot be described as a sustainable or long-term solution. It is fair to say that one of our big problems (which we have made big strides to fix) is reducing the reliance of the business on Zoho without compromising our operational efficiency.

Technology Strategy

In my time as a consultant I helped many organisations to put together their technology Vision and their Technology Strategy. In some cases I was involved in starting the delivery of the technology strategy. 
Our approach was always:
  1. Understand and document the current state of all the organisation's technology.
  2. Understand the desired future state by describing what impediments or constraints the current picture places on our ability to deliver on our business goals.
  3. Describe the steps to get from the current state to the future state in small enough chunks that can be owned by somebody (preferably an individual) and delivered in a reasonable time frame.
We didn't really have a playbook on how to "do" technology strategies either at Thoughtworks, in my time at TW London at least, and certainly not at Codurance when I started. On reflection I'd say that Thoughtworks was very good at steps 1 and 2 above, I saw great strides in those steps at Codurance in my time there, but in both cases, there wasn't any kind of playbook and certainly no framework to describe step 3 to the client. After a few missteps, reading Eben Hewitt's excellent book and learning a lot from the CTO at Currencycloud, my conclusion was that the best way to achieve step 3 is through OKRs (as long as they are done right). I am putting together a post purely about OKRs later. I'll link to it here when it is done.

Quality Strategy

While formulating the early version of the Technology Strategy at IMMO it became clear very quickly that we need to understand better how to achieve "quality". We had CI pipelines in place but the company did not have, or didn't think it had, enough stuff in AWS to have invested yet in tooling to achieve CD. There was still a fair bit of manual testing and, whilst there was a very heartening amount of Terraform scripting in evidence, the actually delivery of things into production was being managed by our Platform team on a set release cadence.
The biggest reason I heard time and again from the teams was that we couldn't release on demand because we had to do a load of testing. So I boiled those objections down to "how can we be confident that our changes 'work' and that they won't 'break" something that is already out there so that we can do a release?" I think when we ask ourselves what all of our pre-production environments are for, the answer is that they help us answer those questions.
I therefore took an action to build and help to implement a "Quality Strategy" as part of the wider technology strategy. This is now starting to bear fruit.

The End to End Test Delusion

One of the biggest problems I have seen in an organisation's test strategy (maybe it was never a strategy, just the way things evolved) is when they have a load of "end to end" tests, usually written in something like Selenium / Webdriver, that are supposed to all pass before a release happens. I believe that this anti-pattern evolves because of some or all of these reasons:
  • Unit tests were never written contemporaneously with the code
  • The effort to retrofit unit tests was tried but failed, probably because:
    • They found that in order to unit test effectively they would need root and branch redesign and refactoring
    • Nobody knows what the bits of code are meant to do, it "just works"
    • Leadership (at different levels of magnification) doesn't understand the value of the effort
    • Product doesn't care, it just wants new features, so anything that doesn't directly contribute to a new feature is seen as a waste of time
  • The solution is, or evolved from, a single, big, hard to unit test monolith. There may still be a rump of this hard to test monolith in the middle of every flow or most flows.
  • There has been no investment in contract testing or some kind of testing that involves, either directly or indirectly, multiple components working together.
  • There is, or was, a QA team who built over time bigger and bigger test scripts (sometimes called Regression Packs) that are mandated to be completed before each release. At some point it became clear that this process was unsustainable so somebody decided to automate all of these tests.
Whatever the reason for it, a large amount of "end to end" tests is at best an impediment and at worst a serious problem because:
  • Such tests are expensive to run, in terms of infrastructure and time taken waiting
  • The feedback loops are too long
  • They are hard to maintain and usually very flaky
  • They give you a lot of information when they pass (yay! everything works!) but very little information when they fail (end to end test #3423 failed, we don't know why, it could have been a connectivity issue, it could have been...)
The end game of the end to end test delusion usually looks something like this:
  • The tests take so long that there is a "code freeze" on a "release candidate branch" so that they can complete, be "fixed", successfully run, defects fixed and then release. This is a big bang release and is precisely the anti-pattern that automated testing hopes to get away from.
  • The tests are so flaky that everybody ignores every test run and people stop maintaining them (very likely).
  • When tests fail, the team can't wait for it to be fixed before they release, so they comment out the offending test, or skip it somehow, so that they can get their release through the pipeline.
So if we accept that "too many" end to end tests is bad, we then have two questions to answer:
  1. How many end to end tests should I have?
  2. How can I get "enough" automated testing in place so that I can have confidence that I can release my changes?

A Picture is Worth a Thousand Words

As a consultant I learnt to value models and diagrams above most things. If a model could be expressed in a simple diagram (e.g. Cynefin) then better still. The benefit of a model is that it gives you a frame of reference in which to discuss things. So if your model is accepted as somehow useful, it can start to form the basis of some kind of playbook on how to do things.
Happily, when it comes to the notion of test coverage there are already some diagrams in existence. Sadly, in terms of answering the above questions, I have always found the existing models, the best known of which (I think) I've listed here, don't fall short on question 1 and don't address question two at all.
In my critiques that follow, please bear in mind that I'm looking at these various models mainly through the prism of the above two questions. I accept that what I see here as a weakness may not be a weakness to somebody else's use case, indeed may well be seen as a strength.

The Test Pyramid

The Test pyramid is probably the best known of the ways to visualise your test coverage. There are many different versions in existence (some of which add a "cloud" of manual testing at the top of the pyramid) which could be part of its problem in my mind. Many of the shortcomings are articulated in the article by Fowler that I've linked to.

Fowler's Test Pyramid


I like the arrows that Martin added to this picture indicating Slow v Fast, Expensive v Cheap. I personally also like to add another two arrows when I talk about this diagram:
  • Tests at the top give you lots of information when they pass, very little when they fail. Tests at the bottom give you very little information when they pass, excellent, targeted information, when they fail. I think it was my Thoughtworks colleague Chris Ford who passed this one on to me.
  • Tests at the top should be owned, written and maintained by a QA people (but understood by engineers), Tests at the bottom should be be owned, written and maintained by Engineers (but understood by QAs. This is one that I came up with at some point in the past and, of course, only makes sense if you have QA as a competence, hopefully embedded in your delivery teams.
Strengths:
  • Easy to understand
  • Clearly shows that you need "most" of the stuff at the bottom and "least" of the stuff at the top.
Weaknesses:
  • There are many different versions, meaning it suffers to a degree from semantic diffusion.
  • It can cause big discussions and misunderstandings as people can get dogmatic about what "integration test", or "service test", or whatever actually means. 
  • The outcome of "confidence" or "enough testing to gain confidence" isn't really addressed.

Agile Testing Quadrants

I thought that this was originated by Lisa Crispin in Agile Testing but as this blog from Lisa says, it originated earlier with Brian Marick's agile testing matrix. What I really like about this is that it explains what types of tests you should concentrate on for what types of outcomes using two simple dimensions, help the team v critique the product and business facing v technology facing.

Strengths:
  • Easy to see what type of test to write for what type of outcome you want
  • Describes different stakeholders in quality
Weaknesses:
  • No indication of how much of each type of testing is good or bad
  • Lost of information in here, maybe too much (you need to read the book to drill into the different types of testing here)
  • It is of much wider scope than "can I be confident in my release"


Tuesday 21 September 2021

Technical Debt Interest and Depreciation

Foreword

I started writing this article in September of last year (2020) when I was still a consultant. It was one of my standard consultancy rants about clients not "getting" things. Having now spent a while out of the consultancy bubble, I finished the article because one of our engineering managers asked me about how we can justify tackling technical debt over prioritising a feature. I referenced this article to him but then realised that I hadn't finished the article. So 90% of this was written in September 2020, then, if I remember correctly, I tweaked the spreadsheet because it was faulty, then today I added a link to the sheet and a conclusion.

The Danger of Dogma

I've written and talked about the danger of dogma in the past but I'd always been talking form the perspective of the consultant. In particular, while I was at ThoughtWorks, I was aware that certain aspects of engineering practice had become so ingrained into the ThoughtWorks culture that many of my colleagues talked dogmatically about using them. Whilst I would certainly support the adherence to these practices, talking dogmatically about, in particular, pair programming, never helped us to persuade sceptical clients of the merits of it. In one case I recall, a colleague of mine who had come through the graduate training program admitted to me that he was incapable of forming an argument in favour of pair programming because it was all that he had ever known in a business context. Thus, dogmatically advocating pair programming was his only possible approach.

Anti-Dogma

Just like patterns are anti-patterns are a thing, I've noticed recently that people can develop powerful negative feelings toward certain practices that I'm starting to think of as anti-dogmas. Commonly I have seem strong opposition to Agile ("it doesn't work here"), pair programming ("why do you need two people to do the work of one person?") and especially Technical Debt ("why do programmers always want to gold plate things?"). If you meet this kind of opposition to what you consider to be sensible solutions to obvious problems it can be extremely wearing. I'm happy to discuss with people what their concerns are and what other solutions may exist but when the other side of the argument is being dogmatic it brooks no possible discussion.

Trying Every Argument

For the past 6 months or so I've been working as a Delivery Principal across a few different accounts and dealing directly and indirectly with various stakeholders. We have had some success at these clients, notably in going through a series of workshops with one sceptical (but not dogmatic) client to illustrate the damage to the business being caused by burgeoning technical debt and the urgent need for us to find a way to start paying it down. 

One of our client stakeholders (I'm going to call him "John" in this article, obviously not his real name), however, has shown himself to be virtually immovable despite all evidence. It is John's firm belief that the only reason why delivery is slow is because the developers don't work hard enough. He has said to me more than once, with a straight face, that if he "saw more urgency" things would get done better. He has continually refused our suggestion to put in place a methodology to measure the four key metrics and thus start to understand what "better" or "worse" may mean for the company, he just knows it isn't good enough. He is dogmatically opposed to, amongst other things, pair programming, TDD, any kind of context sharing and the concept of any development team deploying and running its own code.

Over the course of the whole engagement, now in the region of 15 months, we have tried many different arguments to attempt to persuade John that it is vitally important for the continued existence of their company that they take seriously the technical debt that has been accumulated and the effect that it is having on their ability to deliver new features in a timely fashion. We have failed in this endeavour. No matter how much we talk about "the weight of opinion in the industry" or "recent research" or "authoritative sources", John refuses to engage in these conversations, because he is so convinced that he is right. It is quite staggering that he seems to think he knows better than, amongst others, Mary Poppendieck, Ward Cunningham, Martin Fowler, Kent Beck, Nicole Forsgren and a host of others that most people would consider to be worth listening to.

The Trigger

The trigger to do something different finally came this month. After the relationship with our client had stagnated for a time and it has become increasingly difficult to engage John in conversation (lockdown has had something to do with this admittedly) we asked our senior leadership team if we could consider how we might accelerate the conversation with our client. We want to get to a stage where we can start to work in the way that we know will be better for all parties or work on our exit strategy from the client. In order to ensure that we had done everything we could before considering walking away from our client, myself and my colleague were asked to put together a proposal to take to John that could "reboot" the relationship and hopefully make it better for everybody. 

In His Terms

The substance of our proposal is not very different from something that we proposed just before lockdown. The difference this time is that we know John is not going to respond to any kind of persuasive argument. The only thing we think he may listen to is a finance based argument or an argument that can guarantee, or make more likely, the delivery of features "on time". So I was asked to prepare something that would appeal to John and hopefully that would get him listening to us.

Continual Improvement Program

My first instinct for the proposal was to articulate what giving us the responsibility for a continual improvement program whilst still taking responsibility for the delivery of features (to be agreed) might look like. This approach had worked in one of our other clients in the recent past, enabling us to move from low value tactical "just get it done, I don't care about quality" work to a combination of continual improvement work focused on always improving our ability to deliver the features the client wants. Unfortunately, to get John to listen to this argument would involve him first admitting that there is a problem that needs to be solved. We don't think he understands that there is. He accepts that the solution is not perfect but he doesn't accept that it is valuable to invest money in making it any better. After all, he argues, it works.

Delivery Responsibility

We spoke internally about taking delivery responsibility for the work. This is a tricky thing and could be seen to be the start of a slippery slope towards a fixed price delivery which is something I never want to be involved in. We felt that we had to offer to take delivery responsibility in order to get anywhere in the conversation but in so doing we have to be allowed to deliver. We can't carry on with the current fractured value stream and broken (and hugely costly) automated tests. In order for us to be confident of being able to deliver on whatever expectations we agree we need to be able to address the technical debt items, large and small, that we know are there and impeding delivery. But we fear that despite all evidence John will not give us the empowerment we need to deliver which will make it unacceptable for us to take on delivery responsibility. So how can we possibly close the gap?

Technical Debt Compound Interest

I have read a lot of articles about technical debt incurring interest but couldn't recall ever finding any mention of what the numbers may look like and how they may affect the bottom line. I decided to look again to see if I could find anything that looks like a financial model. I found this article that mentions that different "interest rates" exists, but no other context. This article is fairly typical, talking as it does in generalities and principles but no hard numbers. I did find this research paper quite interesting but actually it was far too complicated for my purposes, I just wanted some kind of quick and dirty model of what technical debt costs.

Building a Model

I thought back to an article I had written for the Codurance website a couple of months ago on the cost of doing nothing. I couldn't remember exactly what I'd written but I do recall that the argument I made was that when you "do nothing", meaning when you refuse to address historical technical debt, things get worse and it gets harder to deliver things. The cost can be evaluated in wasteful processes but the far more interesting number, if you can calculate it, is the cost of not having new features to use. We normally call this opportunity cost. But what is a sensible, or quick and dirty, measure of opportunity cost and how does technical debt influence it?

Inspired by being encouraged to make some assumptions in the costings of my proposal I thought I would play around with what looked like some sensible numbers to see how allowing debt to pile up or conversely paying it down, would affect the return on investment in some reasonable horizon. I used the obvious tool at my disposal - Google Sheets. I constructed a simple model of a development group that can sustain three concurrent streams with a backlog of items that are all scheduled to take (according to estimates) 3 months to complete. They are staggered to start on the first day of each month. 

Parameters and Outputs

The parameters I can vary in the version I have at the time of writing are:
  • Baseline. This represents the current velocity of the teams. If it is 1 then the estimates are accurate. If you set it to a number greater than 1 then the delivery will take longer than the estimate. So if you set it to 1.2 then each delivery will take 20% longer than the estimate.
  • Tech Debt Interest. This parameter represents the rate at which tech debt interest is accumulating or the rate at which it is being paid back. If the organisation does not care about quality and the situation is deteriorating, the number will be greater than 1. If the organisation is making real efforts at continual improvement this number will be less than 1.
  • Feature Value. The model assumes a single monthly value for a completed feature for the purposes of calculating a ROI. This is often hard to calculate in real life but I happened to know that in John's case feature value is fairly well known because most features on the roadmap are implemented in response to signed contracts with clients that have a monthly value.
You can view my spreadsheet here. This is a read only access but you should be able to copy the data and play with it yourself.

Conclusion

The results are quite interesting because of the power of compound interest. If you assume that you incur a cost of 1% per month on your ability to deliver new features you'll see that very quickly the overall financial bottom line deteriorates. In fact, the way I currently have it set up (3 parallel streams, feature value of £50k, 3 months to deliver a feature), that 1% monthly degradation in your ability to deliver features, which will not be noticed month to month, will cost just shy of £0.5M in the first year.

In reality, I think ignoring technical debt will incur a bigger cost than 1% per month (I haven't done the maths in detail though I have worked as a consultant in some places were upwards of 50% of developer time was attributable to "waste" because of tech debt). So if you use a model like this and put in what you think are realistic figures for your organisation, perhaps based on some evidence you gather of time wasted over a time period, you should find you can make a convincing argument for making tackling technical debt a first class deliverable.

Tuesday 10 November 2020

Why Is Your Team Unable to Deliver on Time?

I was asked a while ago by Codurance to write a post with this title which I duly did. As always, our posts to Codurance blogs are peer reviewed before publication and the main feedback I got for it was that it was very good but that the tone was a bit too personal. So I was asked if I could tone it down a bit. When I looked back over the draft I had to agree with the feedback, it definitely was a little too personally opinionated. In my defence, I was rushing to get it finished before I went away on holiday and didn't really read it back. So after I toned it down, it was published to our website here

I did, however, keep the original draft thinking I would put it on my own blog a couple of months after it was originally published to the company blog. So here it is, the original, heavily opinionated, draft...

Why Is Your Team Unable to Deliver on Time?

In this article we examine some of the reasons why a software delivery team or operation might be consistently missing its deadlines. We first examine what “on time” means before examining various common reasons why deliveries may be late. This article is firmly rooted in the problem space and does not seek to go into detail around the solutions that may be appropriate for late deliveries.

The Problem Statement

“Software never gets delivered on time,” is a complaint that I’ve heard many times from business leaders, product owners and technology leaders. Often this refrain is accompanied by the plaintive cry that “we used to be much better than we are”. Often they have tried many things to make the situation better but to no avail. Sometimes it even feels like the harder they try to improve things, the more resources they put into making deliveries work, the more they get delayed. How does this happen and what can we think about doing to help the situation?

Understanding “On Time”

Before discussing why the development teams are not delivering on time, it is important to discuss what “on time” means. 


It is possible that development teams are not being “slow” at all. It could be that unreasonable expectations are being imposed upon them. It could be, for example, that delivery dates for products are mandated by higher management before development teams are even involved. Any organisation should beware of imposing the iron triangle on any development group as it will almost certainly lead to failure of a project or product.


It is tempting for any business to point to a development team and to blame it for late delivery. It is therefore hugely important that for “on time” to be understood, agreed and bought into by all of the interested parties. If a development team is being held accountable for a delivery date that it feels it did not agree to then it will almost certainly not be motivated to meet that date and may well fall back on a blame game because “we never agreed to this anyway!”

Planning Tension

Developers and development teams feel more comfortable when asked to predict a timescale for something small whilst business focused people demand to know how long large things will take to deliver as they are often asked to plan and forecast for large periods of time, sometimes in the multiples of years. Resolving this tension is a subject in itself but in general, it should be understood that it isn’t possible, or valuable, for a development group to predict with certainty more than a few weeks into the future. Any definition of “on time” must be flexible enough to take this into account. Rather than hold teams accountable for slow delivery or poor estimates, it is much more valuable to find a way that will work more usefully for all interested parties.

Lack of People

It is possible that a development team could be failing to deliver on time because of a lack of people. Whilst we have known since the 1970s that simply adding extra people to a software development effort will not necessarily reduce the time to delivery it would be wrong to simply dismiss extra resources as a possible solution to slow delivery. 


If your delivery team is responsible for different products or its work can be easily parallelised because it doesn’t have hard interdependencies, then it could be that the team simply has too much to do and could benefit from extra people. It should be relatively simple to understand if this is the problem with any given team. Beware though that adding extra people into a team could cause problems with sharing context and it might be better to use the available resources to create two new teams rather than expand the existing one.

Silos and Lack of Alignment

Many organisations naturally evolved around technical competences. This seems perfectly logical and works well up to a certain point. I worked for a startup from 2005 until 2015 during which time we grew from five technology focused people to around three hundred. Originally we had a CTO, two developers (of which I was one), a database person, a designer and three non-technical people (CEO, salesperson, and business manager). As we hired more people they fitted into what was a single cross functional team, although “cross functional” was never a phrase we used.


As we grew larger, it seemed natural for the database person to become the manager of the new database people and then later on the developers were split into two groups, one responsible for the website and one responsible for “everything that isn’t the website”. I was the leader of that second group. So we had by this stage three technology focused teams. This still worked reasonably well up until the point that we grew big enough so that it was no longer obvious and visible to everybody what everybody else was doing at any given point. 


This was a strange thing to observe and I can’t say that it was obvious when the moment came. What I now know is that we ultimately started suffering from the malign effects of Conway’s law on our ability to deliver solutions to our users. The value stream was now fragmented between several teams, each with its own goals, priorities and incentives. We now had silos in our technology delivery capability meaning that we had handovers of work between non-aligned teams. The result of handovers is queuing time. Queuing time is waste, pure and simple. It doesn’t matter how “busy” or “utilised” different parts of the value stream are if there are handovers and queues, that queuing time will almost certainly be the biggest factor in your overall delivery time.

Lack of Knowledge, Experience or Expertise

Norm Kerth’s Agile Prime Directive should be read at the beginning of every retrospective meeting. It tells us that:


“Regardless of what we discover, we understand and truly believe that everyone did the best job they could, given what they knew at the time, their skills and abilities, the resources available, and the situation at hand.”


This is an important message and it should always be considered that a team may simply be lacking in “skills and abilities”. Do not discount this as a possible cause of the inability to deliver on time.

General Skills and Experience

Are the developers in our teams capable of delivering the outcomes that we are asking them to deliver? Do we have the necessary experience to understand how to own a delivery process? It is important to understand, articulate and align on the skills that are needed in every team and to have an open and honest conversation around whether the group contains those skills. If you find that your team does not contain the requisite skills or experience then that is the problem that should be tackled.

Domain Knowledge

Do you have a very competent team that has insufficient knowledge of the domain in which it is being asked to work? It could also be possible that an individual in your team is holding all of the domain knowledge in a personal silo. In the former case, you need to examine what kind of product ownership of analysis capability you need to introduce and in the latter case, you will need to examine the mechanisms your teams use to share context internally. For example, are they pairing regularly on all aspects of the delivery, from story creation through development to deployment?

Control Mechanisms

Many organisations, particularly more mature organisations, have control mechanisms in place that was originally designed to support an outcome in an entirely sensible fashion given the prevailing norms of technology and business at the time they were conceived. The problem with many such processes is that they no longer make sense in the modern world of Agile delivery and in some cases they can actively impede value creation. Such processes are sometimes known as Risk Management Theatre.


A good example of this is the traditional change management board or team. In the days when software was delivered on long cycles, perhaps years or more, and changed rarely after initial delivery, it made sense to carefully analyse and record the impact of changes and the potential risks. In the modern Agile world of fast feedback and baking quality in such processes not only do not make sense but actively work against getting value to your customers as quickly as possible.


It is possible that your organisation has many such small processes in place that no longer serve their original purpose and possibly even increase the risks that they were originally intended to mitigate. A good way to identify such processes is to ask people involved in process oriented work what outcome their process supports. If this is a hard question to answer, or that outcome is supported earlier in your delivery cycle, it could be that the process is not needed at all and could be abolished.

Technical Debt

A big vector for slow delivery is the accumulation of technical debt. This is a concept that has been talked about since at least 1992 by Ward Cunningham. The problem is that product owners often do not understand why they should care about it. A myth has built up over the decades that developers want to do things right just to please themselves. The truth is that every missing test, every piece of badly written code, every wrongly named method, every abuse of sensible design standards, has a cost. This cost is generally small in each case but you pay a heavy price in time because the cost of each, like real monetary debt, compounds. Code quality should not be regarded as “gold plating”, rather it should be regarded as the oil that makes the engine run smoothly.


It is important to understand if you are suffering from technical debt and if so to do something to tackle it. Tackling technical debt is not for the benefit of software developers, it is a necessary maintenance task that will help us deliver software on time today and in future. There are many tools available that can be used to analyse codebases, identify code smells and bad design and suggest improvements. Such tools can give objective metrics that can be tracked to help you understand if technical debt is an issue that should be tackled.


Ideally, any given product managed by a good delivery team will not be allowed to build up high levels of technical debt. But if it does, you need a strategy to tackle it. Simple strategies such as the boy scout rule or technical debt days can help but ultimately prevention, as with many things in life, is better than cure.

Automation

In a modern DevOps mindset, the development team takes responsibility for developing and running the application that they produce. Part of this end to end stream of value is deploying the application into some infrastructure. Slow delivery can often be caused by a lack of automation in the processes associated with deployment. In the worst cases, this could mean that you have on-premise servers hosting your software and the act of deploying software to them involves an individual or group of people following a list of manual tasks to get the latest version of the software running on the servers.


The ideal state for deploying software should be that you have automated pipelines that can deploy your software at the touch of a button. Furthermore, not only should the act of deployment be automated and therefore repeatable but the infrastructure on which it is deployed should be itself created or modified through scripts that are run through such automated pipelines. Your goal at every level should be to automate all the things that can be automated. Any manual, repetitive task is not only a waste of time but engenders a massive risk of failure.

Metrics and Throughput

Finally, and perhaps perversely, the methods by which your organisation measures the productivity of your development teams could cause them to deliver slower. There is a not unnatural desire within most management paradigms to understand how well teams and people are performing. Eli Goldratt once said, “Tell me how you will measure me and I will tell you how I will behave”. It is entirely possible that the measurements you are taking are causing the very problem they should be helping to solve.


The only sensible way to measure a team that makes something is to understand throughput. Throughput is defined as the rate of return of value to the company or its customers. The problem is that it is virtually impossible to understand throughput in a software delivery organisation because it is impossible to assess the value of epics, features, stories or tasks. This leads managers to devise proxies for throughput such as (the abuse of) story points, stories completed, lines of code, bugs fixed or anything else you may think of. The trouble with such proxies is that they are not leading indicators of the value returned and they can be gamed easily.


It could be that you can create a sensible, meaningful, game-proof metric that is a good proxy for throughput. It could even be that you can exactly calculate throughput. In my experience, this is rare to non-existent. The best way of assessing value, and understanding whether teams or the whole organisation is performing well, is to measure the four key metrics and iterate on their improvement. These metrics - Lead Time, Deployment Frequency, Mean Time to Restore and Change Fail Rate - have been shown to be an indicator of high performance in delivery teams and organisations.

Conclusion

There are many different possible reasons why teams could be perceived to be persistently unable to deliver on time, each with its own solution. My golden rules to follow when diagnosing and attempting to fix slow delivery are:

  • Understand what “on time” means. Make sure all interested parties are aligned on what “on time” means and are bought-in to any agreed dates.

  • Understand if you have handovers and consider reorganising your business so that you have aligned cross functional teams capable of and empowered to deliver on the entirety of their customer outcomes.

  • Don’t discount simple explanations.

    • Do we have enough people? 

    • Do we have the right skills?

  • Care about technical debt and make sure that everybody knows that it is everybody’s problem if technical debt accumulates.

  • Automate all the things.

  • Only measure meaningful things, if it isn’t possible to measure throughput, consider tracking the four key metrics.


Wednesday 26 August 2020

Pascal's Big Bang

The Big Bang Cycle of Fear

I have written and talked in the past about the Big Bang Cycle of Fear (similar to its cousin, the Tech Debt Cycle). This is a cycle that has happened in countless places over the decades. Its end game - long cycle times, lots of failed releases, low automation and general fear around changing anything - is the state that I have been introduced to at the start of many engagements as a consultant, first for ThoughtWorks and now for Codurance.

The Big Bang Cycle of Fear (extended version) goes:

  1. There was a "bad" release that caused some production problems.
  2. The response is "we have to do more tests" but instead of automating tests, because the codebase is in such a state that test isolation is hard to achieve, we add more manual testing process and possibly more "testers".
  3. In order to be able to "test everything properly" before each release, we do a "code freeze" after a set period and then assign a set period for testing of that "release candidate".
  4. At the end of this testing period nobody is confident that there isn't a problem but the "regression packs" passed with only minor issues so it is probably OK. In any case, the product owners are screaming for the release of "my feature" to go ahead.
  5. Ostensibly to discuss the risks associated with the release and therefore whether the release should go ahead, we have a long and costly meeting with all the team leads, the product owners, some key developers, the head of engineering and possibly the CTO. In reality the purpose of this meeting is to share the blame for the failure when it comes and thus ensure that no single person is blamed and possibly fired.
  6. The release goes ahead.
  7. There is a production disaster and we go back to step one.

Understanding the Issue

The first step to solving any problem is understanding that you have a problem. I think most people involved in a BBCF would recognise that they have a problem. But would they be able to identify the probable root cause of the issue? I've recently read some books about systems thinking and one of the insights in there was that people normally look for solutions to problems at the point at which the problem manifests. I think in the case of our cycle, it isn't clear where the problem happened because it is a cycle. Thus, it might not be clear where the best place to look for a solution will be.

Of course, most organisations flounder around for months, possibly (even probably) years not understanding that they are causing their own problem. A things goes wrong in production and they blame the technology part of the organisation. Of course, they do, it is a technology problem isn't it? Sadly all too often the response of the CTO is also to not understand the issue. So instead of tackling the real cause or fixing the real problem, they decide that they have to do more testing, which manifests in some common anti-patterns.

More Manual Testing

Sometimes an organisation will hire more "testers" for their "QA team", not realising that you can't test quality in after the event. Maybe they insist on a longer code freeze to make sure that they can test everything, not understanding that by doing this, they are exponentially making it it harder to test effectively. The worst way that this gets done is when the organisation already outsources its testing (we do development in house, of course!) because it is seen as lower value, and thus they hire dozens more people at the outsourcing firm, probably several time zones distant, ensuring that any feedback loop is automatically extended by at least a working day.

More Automated Tests

A seemingly more sophisticated response is to decide that what is needed is more automated tests to cover the existing functionality. They have no people available to write these tests, or the skills needed, so they hire a "Test Automation Consultant" to write a suite of tests. This seems on the face of it to be a sensible short term expense to solve a long term problem and it may even appear at first to be effective, a classic anti pattern trait. The trouble is that these "automation test consultants" leave once they have written the tests, nobody will then maintain the tests, they will then start to fail as soon as changes are introduced. Now the company is stuck with pipelines that won't complete and which probably take hours to run because they are "end to end" tests, because that is what you asked for.

How Many Things Can Go Wrong?

A couple of years ago I was working as an adviser for a company who had undergone, and were still undergoing, this big bang cycle of fear. Instinctively it feels wrong to make a big release. The more changes there are to release, the more things can go wrong. I noticed that the testers (separate from the developers) had added tests at each new release which were by hand tests, so each successive release necessitated more testing for any subsequent release. This would have been OK, of course, if the new tests added were all automated and could be run in a reasonable amount of time. Sadly, they weren't automated and so they couldn't be run in a reasonable amount of time. So straight away they had introduced a linear scaling problem.

New Problems

The strange thing, from the perspective of the managers at this company, was that even though they had enough people to run all of these "regression packs" at each stage and thus, they thought, verify their existing functionality, they still experienced strange, unforeseen issues. "How can this be?" they asked, "when we are running all of these regression scripts?" It was clear to us that the answer was simple: over time the design of the system had degraded to the extent that there were all sorts of tight, loose and temporal couplings between the various parts of the system. Nobody had a sense for what these couplings were and nobody had any idea about what they were intended to do, let alone whether they were doing it correctly. So clearly they weren't considering enough things in their testing. They needed to consider interactions between the things.

My Mathematical Question

I went to a meeting with all of the senior managers just after another failed release in which they had released in the region of 30 different things. As usual they had gone through their ritual of blame sharing, which they called "Go / NoGo" so therefore they were safe in the knowledge that no single person in the room was going to carry any cans. In fact, the interactions between the different parts of their solution were so unpredictable that they didn't even know which "thing" had failed. They just knew the release had caused issues.

I started with a simple question:
If I release one thing, how many things can go wrong?

Clearly, the answer here is 1. So I followed up with another simple question:

If I release two things, how many things can go wrong?

At this point there was a bit of a murmur in the room but the general consensus was that two things could go wrong. "I beg to differ", I said, "Thing #1 can go wrong, Thing #2 can go wrong OR the unexpected interaction between Thing #1 and Thing #2 could cause a problem." Nobody argued with this because that was exactly the experience that they had been having.

So the next question I posed was, naturally:

If I release three things, how many things can go wrong?

Now, the maths starts getting a little complex because at this point each of three individual things can go wrong, Each or three interactions between pairs can go wrong or an interaction involving all three things can go wrong. So we now have 3 + 3 + 1 = 7 places to look for our problem. I could see the group starting to appreciate where I was going with this.

Pascal's Triangle

So naturally, my train of thought led me to think that there must be a simple formula that tells us how many things can go wrong when you release N things. At first I thought I was looking at the formula for adding the first N numbers, that is to say N(N + 1) things, or O(N2), which is bad enough, but then I realised it was even worse than that.

You may remember Pascal's Triangle from studying maths at school. My recollection of it was that it was introduced to illustrate the Binomial Theorem. I realised as I was going from 3 things to 4 things... to N things that what I was seeing was the successive rows in Pascal's triangle:


So if you start from row zero (the single 1 at the top) and number the successive rows, then row 5 contains the numbers {1, 5, 10, 10, 5, 1}. If you ignore the initial 1 (which can be regarded as representing number of cases where no things interact with no other things and thus can't be the cause of any issue) you can see that there are 5 single things that can go wrong, 10 pairs of things interacting that can go wrong, 10 triplets interacting that can go wrong, 5 sets of 4 things interacting and a single set of 5 things. So we can see that if we release 5 things, there is a potential 31 things that can cause us problems.

As well as representing the coefficients in a binomial expansion the numbers in the Nth horizontal row of Pascal's Triangle add up to 2N so given that we know the first 1 is irrelevant for this discussion, the conclusion is that if we release N things, we are causing a potential 2N - 1 issues and therefore need to test all of those things to be sure that our release will not fail. Even if we could rule out certain interactions and therefore reduce the overall universe of possible interactions, the number of things that can go wrong, and therefore could need to be tested, when we release N things is still O(2N) and thus of exponential complexity. So the scary answer for my client back in 2018 was that if you as a group agree to release 30 things simultaneously then you need to test 230 - 1 things, which is just over a billion things. It is easy to see that you are stretching the limits of feasibility here and thus why your releases fail every time.

What is the Answer?

The answer is simple and, to a lot of people, obvious. Release one thing at a time. This means reduce your batch sizes until you are doing continuous delivery. Find a way to gain confidence around your changes so that they can be released as soon as they are "done", not weeks later. This could mean you do no new work (or reduced amounts of work) while you pay down technical debt, it could mean assigning developer effort away from new features but probably most importantly it HAS to include taking a long hard look at your Work in Progress (WIP) and aggressively reducing it. 

How can you do this and still deliver value steadily? Well, for a short period of time you can't. You have to accept that your rate of release of new features will reduce. BUT, ask yourself how often you have released new things without causing new problems and then ask yourself what is your REAL rate of return of value to the business? The calculation will be different for every system but ultimately if you don't gain confidence around your working system your throughput will eventually grind to a shuddering halt.

I think the desired end point is clear. We should all be practising continuous delivery which means tiny releases very often. Depending on the current state it could be easier or harder to get there or it may not even be possible without some kind of large scale software modernisation program. Hopefully most people will see that particular state coming and do something about it before it arrives.

Conclusion

The bigger your release the more likely it is to fail. The combinatorial mathematics shows this.

Use Pascal's Big Bang to demonstrate to stakeholders the fallacy of adding extra testing cycles.

To understand your real throughput consider releases as complete only when they have no remaining issues.

Reduce batch sizes aggressively until you are comfortable with continuous delivery.

The best form of cure is prevention! Don't suffer from boiling frog syndrome. If your levels of technical debt are increasing, slowing delivery and causing problems for releases, don't allow your company to fall into one one of the common anti-patterns that inevitably lead to the Big Bang Cycle of Fear.

Tuesday 25 August 2020

On the Giving and Receiving of Feedback

What is Feedback?

I joined ThoughtWorks in 2015 after working at a startup for 10 years and a few other companies before that. The culture shock when I moved to ThoughtWorks from the (hero and blame) culture of my previous employer was huge. It was so huge that I struggled to adjust in my early months and at various stages in the first year was close to failing the probation period and / or quitting.

One of the weirdest things for me at the time was the culture of feedback. In all of my previous jobs personal feedback meant a once a year conversation with a manager in which the manager picked fault with your performance in the previous year and ignored anything good you had done in order to justify a miserable annual pay rise and a disappointing bonus. Feedback was thus only used for bad news, was so infrequent as to give no opportunity to improve and was strictly a one way process which, like most nasty things, only flowed downhill.

When I moved to ThoughtWorks (my current employer, Codurance, has a similar culture of feedback) my experience was very different. Firstly, in the two day induction there was a module on giving and receiving feedback. Secondly, personal feedback for the purposes of salary review was a very different thing. Instead of a manager (neither ThoughtWorks nor Codurance has the concept of reporting lines) using feedback as a way to beat down your salary expectations, the responsibility is put on the individual to gather feedback as and when you see fit in order to improve yourself in your role and possibly to support your argument for a pay raise at a later point. The key point being that it is up to you to seek feedback and also up to you how to solicit it, how to record it and how or whether to act upon it.

Instant Feedback

Instant feedback is extremely useful when used well because, at the risk of sounding obvious, it gives you a short feedback loop. Anybody familiar with Agile values should appreciate the value of short feedback loops. For example, have you ever been in some kind of feedback discussion, focussing on a large period of time, when somebody says to you something like "sometimes you can be arrogant"? This is all well and good and you might be inclined to say, or think, "I'll try to be less arrogant" but the chances are you weren't aware of being arrogant when apparently were being so. You may well, therefore, ask "can you give me an example of when you perceived me as having been arrogant?" Unfortunately, the person talking to you probably can't and therefore you are unsure what behaviour is being thus characterised and you can't act upon this feedback.

Giving Instant Feedback

I was taught to give instant feedback following three simple rules:
  • Do it as early as possible after observing something.
  • Tell the person what they did or said.
  • Tell them how it made you feel.

As Early as Possible

This is quite obvious. The earlier feedback is given referring to a specific incident the fresher in the memory of the feedback giver and receiver it will be. Moreover, you have the readymade example so need to ask for a specific example of a general behaviour.

What Happened

This should be quite clear as well. Tell the person what they did or said that is causing you to give them some feedback. Provided you have followed the first rule of instant feedback, there should be no dispute or interpretation at this stage. You are simply stating a fact of what happened.

How it Made You Feel

This may seem a little strange but is actually extremely important. The reason why you tell the person how it made you feel is because they cannot dispute this. Only you know how something made you feel. The difference between saying "you were trying to make me look small in front of the client" and "your comment made me feel like you didn't respect me" is enormous. The first comment can easily be met with an indignant (and very possibly justified) "No I wasn't!", the second comment can only prompt further useful discussion.

My Experience

Once I was used to dynamic of receiving and giving this type of fast feedback it became hugely useful for me when I received it and hugely satisfying to give it. Yes, there were sometimes hard conversations but almost always the conversation contained at some point a phrase such as "I had no idea that could make people feel like that, thanks for letting me know, please point it out if it happens again." I felt liberated and most importantly never again took part in conversations that went "You do things that to belittle people....it's just the way you talk to people....I can't think of a specific example, but you do it all the time..." The most satisfying aspect of all was that I could see, both when I gave and received feedback like this, is how much more valuable the early, instantly actionable, feedback was compared to if we had waited some considerable time before attempting the same conversation.

The Reinforcing and The Constructive

Just to be clear, this type of feedback giving should also include positive, reinforcing feedback. I realised that just as not having a specific example for behaviours that you think should be curtailed, not having a specific example for a behaviour that should be amplified is extremely frustrating. So my habit became to frequently give small nuggets of feedback, good and bad, but always immediate and actionable. It is my belief that none of that feedback was resented or ill received.

Longer Term Feedback

Longer term feedback is a bit harder to do effectively but unfortunately it seems to be much more common than instant feedback. There are many problems with it, mainly that the passage of time will have dimmed memories of certain incidents as previously mentioned. It also may be way too late to take any useful action on the back of the feedback that has been given as it could be many months after the actions that prompted it.

Why Engage in Long Term Feedback?

Annual Pay Reviews

As I've mentioned above the most likely reason that people may be having a conversation about actions and behaviours that have occurred during some long period of time will be to support the case for pay reviews. Most companies will have an annual pay review involving all their staff in which they have some kind of allegedly fair compensation formula which will reward everybody as they deserve but which in reality will force the whole of the organisation at every level of magnification to rank the peers and force their pay awards into some kind of normalised distribution that will result in a pre determined fixed rise to the total wage bill. Have I mentioned that I'm not a fan of this type of one-way-no-discussion-allowed feedback?

Project or Milestone Retrospectives

Sometimes teams break up, or reform at least, at the end of a project or at some significant milestone. It is also possible, particularly if you are a consultant, that people will regularly rotate in and out of engagements. In recent years I haven't spent more than nine months working for the same client. So it is often seen as appropriate to give and receive some kind of retrospective feedback from the people with whom you have worked for a time and from whom you may soon be separated.

Career Progression Discussions

A third use case for a retrospective style giving and receiving of feedback is to gain some kind of view on your colleagues' perceptions of your readiness or suitability for some kind of career progression. It could be that you believe that you are deserving of a promotion and you want to verify whether you are being reasonable or not. It could be that you have certain blindspots to your own shortcomings, or even your own strengths, and therefore you want to engage your colleagues in order to understand what some of these blind spots might be.

Some organisations, in fact many, will combine the annual pay review conversation with career progression conversations.

So How do I Do It?

how do you approach the giving and receiving of long-term feedback? This is a question that wasn't really addressed in a systematic fashion in the time I was at ThoughtWorks. In fact, other than saying things like "We have a feedback culture" or "We value the giving and receiving of feedback", I haven't seen many organisations that give their people much guidance on how to give and receive periodic feedback. 

Most of the time I have seen people sending emails to a group of colleagues soliciting feedback. More often than not it will say something like "Please reply to this email with some feedback." I never found it easy to respond to such requests. Too often, the response ends up being strong in terms of praise and very weak in terms of actionable points. Sometimes people request a meeting to discuss feedback. In those circumstances it can be even harder to give anything other than bland, "well done" type feedback. I think there are a number of reasons why this is but primarily it is because it is hard for many people to to give constructive feedback. Most of us are just too nice. The other reason I think is similar to what I mentioned above in that it can be very hard to come up with useful examples (because of the passage of time) of things to improve on which makes it difficult to mention certain areas.

Make it Comfortable

Having gone through a couple of frustrating cycles of less than useful feedback I alighted almost by accident on a method that seemed to work well for me, and which I shared with many of my ThoughtWorks colleagues. Firstly, one should recognise that feedback is useful and should be regarded as a gift. Therefore, make it as easy as possible for people to give you feedback. So when I would send an email soliciting feedback, I would give the recipients the option of how they felt most comfortable with giving the feedback. I would typically use something like "I'm happy for you to answer my questions by email, or if you'd prefer, we can meet face to face in the office, over Zoom, or in a pub after work if you like."

Make it Explicit

Finally, and I think this is the key, make your request for feedback more explicit. Instead of saying "please give me some feedback on my performance", try something like:

I'm looking for some feedback on my project performance. In particular could you please consider the following questions:

    • Is there anything I do as a Technical Lead [or whatever] which you think I should be doing less of?
    • Is there anything that you would expect a Technical Lead to be doing that I'm not doing?
    • Is there anything I could have done better in my direct interactions with you?
    • Can you think of any times when you haven't got what you expected from me personally?
    • What do you think I need to do more of to be considered as a Principal [or whatever the next level is]?
    • Is there anything I need to stop doing to be considered as a principal?
My experience was that five or six targeted questions yielded up actionable points that I could actively work on improving or eliminating as appropriate. I would have a pool of perhaps 8 or 10 questions that I would use for a single round of feedback and would share a different subset with different people depending on how we had worked together. For example, I would be more likely to ask another of the technical leads about interactions with our stakeholders, because they were probably in a lot of the same discussions, than I would be to ask the same question to my team's graduate developer.

Conclusion

  • There are two types of (very different) feedback. Instant feedback for immediate, actionable feedback on recent specific events and longer-term feedback on performance over time. Be clear on your reasons for giving and your expectations for receiving both.
  • The golden rules for giving instant feedback are do it soon, stick to the facts of what happened and tell that person how their actions made you feel. Most importantly, do not attempt to speculate or second guess the recipient's motivation for their actions.
  • Make it as comfortable as possible for people to have what could be an uncomfortable exchange.
  • Be explicit in your request for a long term feedback.