Expedited but controlled: how to recover from a technology crisis
By Dan Ridge, Charlie Herbert and Phil Nottingham
A technology incident can happen to any business, as news coverage of many big brands over the last several months has shown. But when it does happen, your initial reaction largely determines how quickly business operations can be re-established.
For CIOs and technology leaders, these moments define credibility. You will be communicating to the board while aligning business and technical priorities, and balancing risk against continuity. These decisions will shape long-term trust in your technology leadership.
We’ve supported global enterprises through ransomware containment, outages and critical failures, and those experiences have shaped our structured crisis recovery approach. Across them, we have seen common patterns in how a business can move itself through the crisis and out the other side.
Think of your recovery plan as three layers:
- Control: contain and coordinate
- Stabilise: restore and communicate
- Recover: rebuild and strengthen
Ideally you should have a rough strawman of how this process might work on file (a physical one!) so you can reach for it in the event of a crisis. This is something that can be planned for, and hopefully never used, but having a strategy in place to manage this eventuality feels like a wise decision in the current environment. The following is how we would advise you structure your plan.
Establish control
As quickly as possible, confirm the incident lead and create a command centre.
Create a stakeholder list. Although the technology team will be at the forefront, many other stakeholders need to support the effort. This should be a living document that also includes all external dependencies, such as the service integration partner, cloud and vendor reps.
Depending on the severity of the crisis, in a complex global organisation there will be different internal parties to involve. Mapping out how these teams (for example legal, compliance and investor relations) interact will be an important part of the command centre’s early activity.
Cut through red tape by pre-authorising a small group with decision rights. Define the escalation process and publish a RACI (responsible, accountable, consulted, informed). If it is clear who can do what, it’ll save a lot of wasted time and avoid potential miscommunications.
Concurrently, shut down any systems that might be impacted (working with an excess of caution at this point) and quarantine infected devices to stop the spread.
Stabilise
Once the initial state of crisis is under control, it is time to assess and confirm your plan. This will be an agile state, especially in the early days, but there are some key pointers that will help with this:
Prioritise people
The early stages of a crisis are an all-hands-to-the-pump moment. Your team will be working long hours, possibly without time to go home if they live far away. Make provision for that and look after them. Block out hotel rooms, make plans for catering and ensure there are quiet spaces within the building where they can go if they need time out.
Some people thrive in a crisis and others struggle, and you might be surprised at how particular people react. Make sure you have the support in place that is needed to protect both their energy levels and mental health. Discussions will get heated and tempers might flare. Help your team to treat each other with respect (and forgiveness when it’s needed). Find ways to shield the core team from their normal daily work if necessary so they can focus on the task at hand.
Create morning standups, evening washups and predictable updates in between so everyone in the recovery team is kept informed.
Carve out little moments of joy where you can to celebrate the small wins. A visit from the ice cream van or a pizza delivery can lift the spirits to a surprising degree.
Show the progress
If your systems are down, you might have to go old school with a whiteboard or post-its in the early stages. Whatever it is, find a way to keep everyone updated on tasks, owners and status.
Track progress so that everyone can see it is being made, using whatever measure is meaningful to your project: services restored, users back online, and so on.
It might feel very slow at first, but progress will speed up. Seeing it will be motivating for everyone, and later on you will want to analyse that data, and the decisions behind it, for the audit trail.
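Once systems allow it, the same tasks, owners and status can be kept in a simple append-only log. The following is only a minimal sketch of that idea in Python, not a recommended tool: the class name `RecoveryLog`, its fields, the status labels and the sample owners are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class RecoveryLog:
    """Append-only log of task updates (hypothetical structure for illustration)."""
    entries: list = field(default_factory=list)

    def record(self, task, owner, status, note=""):
        # Timestamp every update so the sequence of decisions can be reviewed later.
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "task": task,
            "owner": owner,
            "status": status,
            "note": note,
        })

    def latest_status(self):
        # Later entries win, giving the current state of each task.
        latest = {}
        for entry in self.entries:
            latest[entry["task"]] = entry["status"]
        return latest

    def percent_restored(self):
        # Share of tracked tasks whose most recent status is "restored".
        latest = self.latest_status()
        if not latest:
            return 0.0
        restored = sum(1 for status in latest.values() if status == "restored")
        return 100.0 * restored / len(latest)


# Sample data is invented purely to show usage.
log = RecoveryLog()
log.record("identity service", "A. Owner", "in progress")
log.record("DNS", "B. Owner", "restored")
log.record("identity service", "A. Owner", "restored", note="compliance check passed")
log.record("VDI platform", "C. Owner", "not started")
```

Because updates are appended rather than overwritten, the full history stays available for the audit trail, while `latest_status()` and `percent_restored()` give the headline view for the daily standup.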
Restore in an orderly way
The reason you need the plan and the decision makers defined is so that you can all agree the order in which systems are restored. If everyone just works on their own piece of the puzzle, you could find it slows you down.
You will need to work differently to usual. Decisions will need to be expedited but controlled. Loosen the normal change process to speed the recovery, then tighten as you release back into production (quarantine and inspection before re-admitting).
Typically, we’d advise that you work on a compliance-checked recovery of your core infrastructure first: identity/authorisation, networking/DNS/IPAM, storage, backups and monitoring. Then you can recover platforms (e.g. VDI/Citrix) before moving on to the business applications. Create a business-unit priority list (revenue, customers, regulatory exposure) to decide which regions/teams get restored first. Communicate this clearly and regularly. Protect the core tech team by making it someone else’s job to communicate this and deal with queries.
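To make the ordering concrete, here is a minimal sketch in Python of how an agreed restore sequence could be derived: tier first (core infrastructure, then platforms, then business applications), then business priority within a tier. The system names, the `business_priority` score and the `restore_order` helper are hypothetical. In practice the score would come from your own assessment of revenue, customers and regulatory exposure.

```python
from dataclasses import dataclass

# Tier order as described above: core infrastructure, platforms, business applications.
TIER_ORDER = {"core": 0, "platform": 1, "application": 2}


@dataclass(frozen=True)
class System:
    name: str
    tier: str                   # "core", "platform" or "application"
    business_priority: int = 0  # hypothetical score; higher means restore sooner


def restore_order(systems):
    """Sort by tier, then by descending business priority, then by name."""
    return sorted(
        systems,
        key=lambda s: (TIER_ORDER[s.tier], -s.business_priority, s.name),
    )


# Sample systems are invented purely to show usage.
systems = [
    System("order-portal", "application", business_priority=9),
    System("dns", "core"),
    System("vdi", "platform"),
    System("identity", "core"),
    System("hr-intranet", "application", business_priority=2),
]

plan = [s.name for s in restore_order(systems)]
print(plan)
```

The value of writing the order down, in code or on a whiteboard, is that it becomes a single list everyone works from rather than each team choosing its own piece of the puzzle.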
Recover
Expect it to take several weeks to reach some form of initial stability, months to get to “mostly normal” and probably a year of ripple work (debt pay-down, decommissioning, process fixes).
There will be impacts on the work that was going on before the crisis: paused programmes and delayed launches, which will need to be accounted for.
Perhaps the one silver lining to a crisis is that it does give you the chance to retire obsolete systems and accelerate long-planned improvements. It is your ‘once in a long time’ opportunity to ask: “If we rebuilt today, would we rebuild it this way?” If the answer is no, change it.
While every crisis is unique, and there are certainly some sector-specific considerations, this broad process works and is certainly a better starting point than a blank sheet of paper.
A crisis is a learning opportunity, so once the dust has settled enough, it gives everyone the chance to reflect on what worked and update the playbook for the future.
If you’d like to explore how your organisation’s crisis readiness and recovery strategy compares with best practice, our experts at prosource.it can help you benchmark and prepare.