Wednesday, 20 September 2017

Risk Management Theatre

Process Theatre?

Some time ago I put a few placeholder headings on my blog for thoughts I was having. One of them was "process theatre", which I was never happy with. Before I dive into it, some historical context...

Security Theatre

I think a lot of people are familiar with the concept of Security Theatre. This is the practice of investing in measures that give the impression of providing security when in fact they do little to improve actual security. We see this in the UK with armed police patrolling London's transport hubs. These people do a wonderful job, I'm sure, but they will not prevent attacks. The real work is done behind the scenes by intelligence people. I guess the problem (for some people, not me) is that this work is not visible.

"Security Theatre" has worked its way into the IT lexicon to describe activities that give somebody, somewhere, comfort that all is well but that may not actually add value in an Agile environment. Consider the insistence of many "infrastructure" teams on installing virus checking software on every production instance. This may have had some merit back in the days of real metal servers but it has no merit now. I've been involved in teams that had to install virus checking software every time we spun up a new instance in AWS. This adds minutes to the start-up time and hence to the deploy time. I have never known this virus checking software to find a virus or highlight any issue. We still have to do it, though, because it is on somebody's checklist somewhere.

So Security Theatre is stuff that looks like security but isn't. In fact it is probably worthless. This is why I was always unhappy with "process theatre": whilst in my mind it described a worthless process, the thing it describes is still definitely a process, so the analogy does not hold.

Reactive Processes

On my last 3 major engagements I worked for mature organisations, including one bank, that were in various states of confusion, unable to innovate and generally lacking in corporate agility. This type of work is often dubbed "Agile Transformation" or "Digital Transformation"; certainly "transformation" needs to be in there somewhere.

The big thing that I kept seeing over and over again was processes that had to be followed no matter what. Such processes were (probably) created in reaction to some bad event that had real consequences at some point in the distant past. The business responds by installing all sorts of governance around that thing, which is forever after regarded as risky. Teams then evolve to own parts of this process. Over time the original intent of the governance is lost. The teams that manage it now exist for the purpose of that process. There is no knowledge of, or desire to understand, what outcome was originally being supported. The end state is that entire teams of people exist whose day-to-day job involves servicing a checklist to ensure that if (when) things go wrong, they don't get blamed.

So I am grateful to my colleague, Robin Weston, who introduced me to the phrase Risk Management Theatre, apparently coined by our esteemed ex-colleague, Jez Humble, to describe such processes. This is much better than what I was thinking. What we have is a process that is seen by some to manage a particular risk but which in fact does nothing to manage that risk. In some cases it actually makes the risk worse.

I was reassured that what I had observed and thought about was a thing, and I have been thinking about it a lot in the course of my consulting work.

Example - Code Freezes and Release Cycles

If you have worked in software delivery for any length of time you will have come across (unless you are very lucky) the "death spiral" of code freeze - testing period - release to production - pray - work late and work on weekends to fix stuff - throw out fixes - rinse and repeat.

Long before Agile was a thing, somebody released some code that broke something in production. This was probably before TDD was a thing; maybe there weren't even any unit tests. Old bugs were reintroduced, nobody really knew what had changed, and nobody could understand why it didn't work in production because "it worked on my machine".

In response to this undeniably bad happening the business decided that it needed to test its releases more thoroughly. This seems like a reasonable response; indeed in 2004 it may well have been a reasonable response. So they created a test team somewhere. In order to do a release now you have to build your application, put it in some environment for testing and have this test team run through their test scripts for a prescribed period before each release. Note that after you've built into this test environment you cannot make any further changes to the code that will be released (unless they are critical bug fixes). This is the code freeze. It exists so that everybody can be sure that the test team is testing the stuff that will end up in production.

Ten years later this governance process still exists even though software development practices have moved on. We now have TDD, pair programming, cross-functional teams, continuous integration and continuous delivery. We now understand much better what we have changed, and moreover we have tested it at the unit level, the integration level and the user journey level. One of the big factors in our confidence is that we changed very little. This means we know what changed, we know what we needed to test and we know how to undo it very quickly if something somehow slipped through the net.

If we still have to go through code freeze and a testing cycle, what happens? We try to cram as many changes as we can into the codebase before the code freeze. We cut corners on our Agile process to beat the deadline. We start to game the process by forcing new features past the code freeze by pretending they are bug fixes. We end up doing a big bang, waterfall style release. This is precisely the thing that caused the original problem. We have good solid practices in place that mean we don't need this whole process, but the insistence of somebody, somewhere, that the process is necessary perpetuates the exact behaviour that made it necessary in the first place. This is why I call this a "death spiral": once it starts it is very hard to stop.
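To put rough numbers on the batch size argument, here is a minimal sketch in Python. It assumes, purely for illustration, that every change independently carries the same small chance of introducing a defect. The 2% figure and the function name are made up for this example and are not measurements from any real team.

```python
def chance_release_has_defect(changes: int, defect_rate: float) -> float:
    """Probability that at least one of `changes` independent changes is faulty.

    Assumes each change is faulty with the same probability `defect_rate`
    (a simplifying assumption, not a measured value).
    """
    return 1 - (1 - defect_rate) ** changes


if __name__ == "__main__":
    defect_rate = 0.02  # hypothetical per-change defect rate
    for changes in (1, 5, 50):
        p = chance_release_has_defect(changes, defect_rate)
        # The number of changes is also the number of suspects to
        # investigate (or roll back) when the release misbehaves.
        print(f"{changes:>3} changes per release: "
              f"{p:.0%} chance of a defect, {changes} suspect(s)")
```

Under these assumptions a release that bundles 50 changes is more likely than not to contain a defect, and when it does there are 50 places to look. A release of one change has a small chance of a defect and exactly one place to look, which is the confidence that a code freeze ends up taking away.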

Corporate Immune System

So why do these teams and processes not get swept away? I think there are many answers, and I don't have all of them by any stretch. A good summary of some of the reasons can be found in the idea of the Corporate Immune System.

The organisation has evolved many processes and they have to be followed. There may be a blame culture within the organisation which prevents bold, innovative employees from challenging the status quo. They think first of avoiding blame and only second of adding value. Innovation means change, which is something to fear for many people. There are managers, or even CXOs, who are heavily invested in the people supporting some of the risk management theatre. These often powerful voices argue strongly for the continued existence of the process that they curate, citing examples from ancient history to show its worth. Other people have written far more eloquently and far more knowledgeably than I about the Corporate Immune System. I'm not saying the corporate immune system, or its side effects, is the only cause of the continuance of risk management theatre, but it certainly doesn't help!

How do we break the cycle of Risk Management Theatre?

I haven't fully formulated my thoughts on solutions yet. I have had varying degrees of success in the teams I worked with on my previous engagements. I'm now working in the public sector, which introduces a whole new dimension of dysfunction that I hadn't previously seen. This is what prompted me to get my thoughts down about Risk Management Theatre as they stand. I want to take that as a baseline to move forward with this new client and, I hope, come up with some solutions that may help even in the public sector. That is very much a work in progress!

My best answer at the moment is to reason with people involved in Risk Management Theatre about the outcomes they are supporting. If you can find a receptive audience with an appetite for positive change, they should be willing to reason out the answer to "what outcome are you supporting?" and extend that thinking to the question of whether the process they are using is the best way to achieve that outcome.

Customer focus is a great reasoning tool here as well. If you can ask the question above in terms of customer focus, i.e. "what customer-facing outcome are you supporting with this process?", this can not only help an individual to reason about a single process but can also help resolve a conflict between opposing processes and goals. For example, we have had success using the customer as a reasoning tool to resolve the tension between the development team, who want rapid change, and the infrastructure team, who oppose all change.

I will return to this subject when I understand better how to challenge (and hopefully reduce) risk management theatre in a public sector environment that cares little for any customer and may not even have a clear understanding of, or consensus about, who its customers are.



