How to Manage Delivery and Risk (Be a Well Rounded Technical Lead, part 3)

Thursday, 29 May 2014

“It is impossible to win the great prizes of life without running risk” 
Theodore Roosevelt

Software projects fail. Sometimes that's good: fast failure is far better than a long, drawn-out death march. However, many projects fail for disappointing reasons, with problems that could have been prevented if the right information had been uncovered and acted upon at the appropriate time.

Be the barometer for technical risk.

Ensure that problems are spotted and acted upon appropriately. Shine a light and discover whether the team's assumptions are correct. Gain visibility of the real issues, and get buy-in to act.

Perfection is the enemy of working software.

Guide the delivery between risk and reward. Ask the questions: 'Can we do the same with less risk? Could we take a little more risk and achieve much more?'

Walking around the practices

Risk identification and analysis - What’s going to cause you to fail?

Your first step is spotting the technical risks that are going to cause trouble. Risk identification starts as simple as making a list of what might go wrong. Listen to your gut feelings and experience.

I ask the team what they see as the challenges and what might keep them awake at night. Ask them to imagine they have just delivered the system and ask: 'What were the high and low points? What went wrong? What was hard and why?'

Once you have a view of the risks, do some finding out. Be scientific and prove your theories. Quickly coding and testing critical scenarios can cast early light on choices that may have huge impact later. Collect and use data. It will enable you to seek trends and create likely predictions.

Identifying risks isn’t a one time action. Take time out regularly to review where everything is at, what’s not working out, and what’s unknown.

Risk Mitigation - not failing when things go wrong

The risks you've identified need managing. Some risks have an impact bigger than can be allowed and must be avoided. However, in many cases we shouldn't be avoiding all risk, we should be reducing the damage it causes.

From a technical point of view, a risk averse environment is one full of checkboxes, procedures and delays. The bar is lowered and made safe for everyone. This is not an environment that invites change or challenge. Often risk is accumulated, not mitigated.

The other extreme features principles like Facebook’s "Move fast and break things", which gives people trust and freedom to do the right thing. Having this flexibility can result in a system and a team that flex and deal with failure better.

The right answer depends on circumstances. I would advise looking to be reactive to failure rather than protective of change. Embrace safer failure and take the risks you can. Be able to learn and continue on error.

Dealing with the unexpected

Unexpected things happen, no matter what you plan for. Blend the way you mitigate your known risks with the way you deal with unexpected failure. Build your system to take 'what-ifs' into account. Make the unexpected less harmful or less likely to happen.

There are some rules to consider:
  • Ensure your system has clarity of function and simplicity. If you don't need it, don't have it.
  • Minimise interactions and dependencies. Ensure you deal well with downstream failures.
  • Ship often. Bugs are easy to fix when code is fresh in people's minds.
  • Have your system tell you when things aren't working.
  • When releasing, segment your users to get feedback without impacting everyone.
There are some articles on architectural and delivery patterns discussed in Further Reading that help to deal with risk and failures.
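
Two of the rules above, segmenting users on release and dealing well with downstream failures, can be sketched in a few lines. This is a minimal illustration rather than a recommended implementation: the names in_rollout and call_with_fallback, the 100-bucket scheme, and the on_failure hook are all assumptions made for the example.

```python
import hashlib


def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """Place a user in one of 100 stable buckets for a feature (hypothetical helper).

    The same user always lands in the same bucket, so raising the
    percentage only ever adds users; it never swaps them around.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent


def call_with_fallback(fetch, fallback, on_failure=print):
    """Call a downstream dependency and degrade instead of crashing (hypothetical helper).

    on_failure is where the system tells you something is not working;
    in a real service it might raise an alert or increment a metric.
    """
    try:
        return fetch()
    except Exception as error:  # deliberately broad for this sketch
        on_failure(f"downstream call failed: {error!r}")
        return fallback
```

A caller might show a new feature only when in_rollout(user_id, "new-checkout", 10) is true, and wrap a recommendations lookup in call_with_fallback so that an outage produces an empty list and a visible warning rather than a broken page.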

Prioritisation - Doing the right things at the right time

Try to do everything at once and you'll fail. Equally, you have to be feeling very lucky to want to rely on fate. Deciding what to do first and identifying what's better to leave till later will have a big impact on your delivery, risk exposure, and learning.

These four perspectives can help when prioritising.
  • Focus on your critical challenges. What’s important, what’s urgent, what’s both, therefore critical? Each demands different actions and time-frames.
  • Focus on opportunity. Consider the principle of the Most Responsible Moment and make timely decisions.
  • Focus on consequence. Balance the severity of impact with the likelihood of it happening.
  • Focus on reward. What would bring the most reward for the least effort? What actions will inform your next steps the most?
Get the team involved in finding out what’s important and choosing what to do next. A good team will function best when you, they, and the mission in front of you are aligned.
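
One common way to make the 'focus on consequence' perspective concrete is to score each risk as likelihood multiplied by impact and sort the list. The sketch below assumes a 0 to 1 likelihood estimate and a 1 to 5 impact scale; the Risk class, the scales, and the example risks are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    name: str
    likelihood: float  # estimated probability, 0.0 to 1.0 (assumed scale)
    impact: int        # severity from 1 (minor) to 5 (critical) (assumed scale)

    @property
    def exposure(self) -> float:
        # Simple product; other weightings are equally plausible.
        return self.likelihood * self.impact


def prioritise(risks):
    """Return risks ordered from highest to lowest exposure."""
    return sorted(risks, key=lambda risk: risk.exposure, reverse=True)


# Invented example entries, not taken from a real project.
risks = [
    Risk("Payment provider outage", likelihood=0.1, impact=5),
    Risk("Search too slow at launch", likelihood=0.6, impact=3),
    Risk("Typo on the help page", likelihood=0.9, impact=1),
]

for risk in prioritise(risks):
    print(f"{risk.exposure:.1f}  {risk.name}")
```

Note that a plain product ranks the rare but critical payment outage last. A score like this is a prompt for discussion with the team, not a replacement for judgement about impacts that are too big to allow.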

Managing Technical Debt and Cross Functional Requirements

Mismanagement of Technical Debt and Cross Functional Requirements is a regular feature of software delivery failures. It often falls to the Tech Lead to spot and highlight the actions and priorities needed in these areas.

Managing the risks of technical debt

The risks of technical debt are the unforeseen costs of a capable team of developers maintaining what they have built. They lose their ability to move fast and adapt. Conversely, avoiding it completely is also detrimental to success.

New technical debt isn't often a big risk. The trade-off of getting features to users to discover their value is often worth the risk. The team that coded it is on hand and has the ability to improve the situation. Avoid investing heavily in code that hasn't yet proved its value.

Old technical debt ain't so good. Much bad code is forgotten knowledge and unrecorded experience; it slows the team down through bugs, confusion and bad assumptions. Don't let technical debt get stale. If code becomes relied on, it should be made solid, understandable and changeable. If it isn't valued, it should be got rid of.

Accept some technical debt, but keep track of your situation and don't rely on code you can't trust. Manage your investments wisely by structuring your code. Modularise, encapsulate, and decouple so that improvements or rewrites can happen simply.
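
As a small sketch of what decoupling can buy you, the example below hides a quick first implementation behind a narrow interface so that a later rewrite does not touch its callers. The pricing domain and every name here (PriceSource, HardCodedPrices, TablePrices, basket_total) are hypothetical.

```python
from typing import Protocol


class PriceSource(Protocol):
    """The only thing callers are allowed to depend on."""

    def price_for(self, sku: str) -> float: ...


class HardCodedPrices:
    """Quick first version: accepted debt, kept behind the interface."""

    def price_for(self, sku: str) -> float:
        return {"apple": 0.5, "pear": 0.75}.get(sku, 0.0)


class TablePrices:
    """A later rewrite; code that uses PriceSource does not change."""

    def __init__(self, table: dict[str, float]) -> None:
        self._table = table

    def price_for(self, sku: str) -> float:
        if sku not in self._table:
            raise KeyError(f"unknown sku: {sku}")
        return self._table[sku]


def basket_total(prices: PriceSource, skus: list[str]) -> float:
    """Depends on the interface, not on either implementation."""
    return sum(prices.price_for(sku) for sku in skus)
```

Swapping HardCodedPrices() for TablePrices(...) at the call site is the whole migration, which keeps the cost of paying down this piece of debt small once the code has proved its value.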

Mind the Cross Functional Requirement gap

Cross Functional Requirements are the demands that make a system function predictably at scale. I often see systems that are unable to deliver a reliable, trustworthy service due to a lack of focus on these CFRs. That users value these as features is often missed until the system breaks.

There is always a lot to do and rarely enough time to do it in. Waiting to build in reliability at the end is fraught with surprises and assumptions. Building it all up front rarely helps either. The system and the features need to be built in step. Split up and prioritise the work needed to support the features as they are built, but do have a roadmap and look ahead to where you need to be growing towards.

You may need to champion the value of the CFRs as part of the delivery. You'll need to understand the risks of not building each CFR, and guide the team and the wider decision makers through the trade-offs, so everyone understands the choices.

Trust and Reputation

When you and your team make mistakes, you want to be seen as a group that worked with the right intentions and can continue to be trusted. For this you need trust and a good reputation. Whereas risk mitigation can be a technical operation, trust comes from sharing and communicating outside the team.

Highlight your successes, share your plans. Involve the people who your product or system matters to. Accept steering in choosing the right level of risk that suits the business drive. Don't cry wolf, but ensure that the decision makers know the risks you've foreseen and what you are doing about them.

Don't deny your mistakes; instead, talk about what you are doing to fix them. Critically, when you don't know, ask for advice, and share your plans for how you are going to find out.

Further Reading

Release It!, a fantastic book, talks through many of the technical risks of complex systems and their interactions. The author lays out patterns and anti-patterns to look for in your systems. Continuous Delivery is a risk management strategy for delivering software. Jez Humble talks about it in this article.

The What, Where And When Is Risk In System Design? video looks at risks from a web operations perspective. Looking further, outside of technology, understanding How Complex Systems Fail can help you both assess what you have built and work out how to build systems that fail less. There is a related video on the subject from a Velocity conference.


Identify your most critical issues and ensure you are safe enough. A balanced approach to risk allows you to gather information and learn as you go, and to focus on what's most important.

Working beyond your team, with the people who your work matters to, can help you make better choices and understand the risks from different perspectives. This is a big subject and one we return to in the next post.
