Cyber Security

When disaster strikes: Are you prepared? Navigating the essentials of Business Continuity and Disaster Recovery

Richard Heysmond

21 October, 2024

In today's digital landscape, cyberattacks are an ever-present threat, lurking in the shadows, ready to disrupt operations and compromise sensitive data. Even with the most sophisticated security measures in place, breaches are inevitable. When the worst happens, will your business be able to weather the storm and emerge stronger?

The answer lies in having a well-defined Business Continuity (BC) and Disaster Recovery (DR) plan, your lifeline in a crisis. BC ensures your critical operations continue running, even in the face of adversity, while DR focuses on swiftly recovering lost data and infrastructure.

But where do you start if you've never developed such a plan?

Let's demystify the process and explore the essential steps to get you started on the path to resilience.

Step 1: Gather information: Talk to each department

What departmental activities do we consider most critical and why:

Start small . . . we just want the top two at this point!
This tells us what to restore!

What resources are needed to make these critical activities possible:

Access to our network? The internet? Phones, desks, laptops, software?
Do we need people with essential skills or important business records?
This tells us what is needed to perform the restore!

Now we ask, what will happen if we cannot do these activities:

When will we fall out of compliance with laws, regulations or contractual mandates?
When will operational impacts be realised?
When will the backlog of work become unmanageable?
When will financial impacts be realised?
When will we operate at a competitive disadvantage?
On a scale of 1-5, how badly will the loss of critical functionality impact the business within a few hours, a few days or even a few weeks?
From a business perspective, this tells us what needs to be restored, how quickly and in what order, if we want to survive the crisis. However, it is for management to ultimately decide the pecking order!
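As a rough illustration, the 1-5 scores gathered above could be recorded per activity and sorted to produce a first-draft restoration order for management to review. This is only a sketch: the `Activity` class, the three horizon names and the ranking rule (worst short-term impact first) are assumptions made for the example, not a prescribed method.

```python
from dataclasses import dataclass

# Hypothetical horizons, mirroring "a few hours, a few days, a few weeks".
HORIZONS = ("hours", "days", "weeks")


@dataclass
class Activity:
    name: str
    impact: dict  # horizon name -> score on the 1-5 scale

    def validate(self) -> None:
        """Reject scores that fall outside the 1-5 scale."""
        for horizon in HORIZONS:
            score = self.impact[horizon]
            if not 1 <= score <= 5:
                raise ValueError(f"{self.name}: {horizon} score {score} is not in 1-5")


def rank_by_urgency(activities):
    """Sort so that activities hurting soonest and hardest come first.

    Assumed rule: compare the 'hours' score first, then 'days', then 'weeks'.
    """
    for activity in activities:
        activity.validate()
    return sorted(
        activities,
        key=lambda a: tuple(a.impact[h] for h in HORIZONS),
        reverse=True,
    )


# Illustrative scores only; real values come from the departmental interviews.
draft_order = rank_by_urgency([
    Activity("payroll", {"hours": 1, "days": 3, "weeks": 5}),
    Activity("order_processing", {"hours": 4, "days": 5, "weeks": 5}),
    Activity("marketing", {"hours": 1, "days": 1, "weeks": 2}),
])
for activity in draft_order:
    print(activity.name, activity.impact)
```

The output is a starting point for discussion; the final order remains a management decision.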

Alternative strategies:

Are there manual workarounds and how long would they remain viable for?
Could the work be performed remotely?
Could you shift workloads to another part of the business or a third party?
What activities can be safely dropped?
Have you ever had a disruption before and if so, what happened and what lessons were learnt?
This tells us what can be done to keep us mobile in the interim.

Step 2: Using metrics for the planning

For each critical activity, we should now be able to calculate:

THE RECOVERY POINT OBJECTIVE (RPO):

This is the amount of data we can afford to lose, usually expressed as a window of time before the incident.

However, the less data we can afford to lose, the more expensive life gets. For example, a weekly backup on tape is a relatively straightforward and cheap solution for any IT department to implement if we can afford to lose up to a week’s worth of data. However, a tolerance of only two hours’ loss will require something along the lines of disk mirroring or remote journaling! This is a more resource-intensive process to configure, operate and maintain over time.
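The arithmetic behind this trade-off can be sketched in a few lines. The example below assumes, for simplicity, that each backup completes instantly and can always be restored, so the worst-case loss is simply the gap between two backups. The function names are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """If disaster strikes just before the next backup runs,
    everything written since the previous backup is lost."""
    return backup_interval


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """True when the worst-case loss stays within the agreed RPO."""
    return worst_case_data_loss(backup_interval) <= rpo


weekly = timedelta(days=7)
two_hours = timedelta(hours=2)

print(meets_rpo(weekly, rpo=timedelta(days=7)))         # weekly backup, week-long tolerance
print(meets_rpo(weekly, rpo=two_hours))                 # weekly backup, two-hour tolerance
print(meets_rpo(timedelta(minutes=15), rpo=two_hours))  # frequent replication, two-hour tolerance
```

A check like this does not pick the storage solution for us, but it makes it obvious when a cheap schedule cannot satisfy the tolerance the business has asked for.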

BOTTOM LINE: IT’S A QUESTION OF COST BENEFIT!

Management will need to talk to IT! What storage solution offers the best level of protection for the amount of money we are prepared to spend, given the criticality of the function?

This conversation should also elicit some workable procedures. How will the data be saved? How will it be restored? How will we know if it needs restoring in the first place? Who will save it? Who will restore it? How can we check if saving the data and restoring the data works well? Who are the key points of contact?
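One simple way to check that saving and restoring works is to compare a checksum of the original file with a checksum of the restored copy. The sketch below uses SHA-256 from Python's standard library; the helper names are hypothetical, and a real restore test would typically cover whole datasets and application-level checks rather than a single file.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so that large backups do not exhaust memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches_original(original: Path, restored: Path) -> bool:
    """True when the restored file is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(restored)
```

Running a comparison like this as part of a scheduled restore test gives an early warning that backups are silently failing, well before a real incident.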

ACCEPTABLE INTERRUPTION WINDOW (AIW):

This is our second metric: what is the maximum amount of time we can wait for the restoration of a critical service? Exceeding this metric poses an existential threat to the business.

RECOVERY TIME OBJECTIVE (RTO):

Now that we know the AIW, we can work with IT to set our own recovery targets. This is known as the RTO, and it includes the time needed to spin up the service (or any dependencies), configure it, import the data and check everything is hunky-dory.
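To sanity-check a proposed target, the recovery steps can be added up and compared with the AIW. The sketch below assumes the steps run one after another, and the durations are purely illustrative placeholders, not benchmarks; the function names are hypothetical.

```python
from datetime import timedelta


def estimated_recovery_time(steps: dict) -> timedelta:
    """Sum the step durations, assuming they run sequentially."""
    return sum(steps.values(), timedelta())


def fits_within_aiw(steps: dict, aiw: timedelta) -> bool:
    """True when the estimated recovery time does not exceed the AIW."""
    return estimated_recovery_time(steps) <= aiw


# Illustrative durations only; real figures come from IT and from testing.
recovery_steps = {
    "spin_up_service": timedelta(hours=2),
    "configure": timedelta(hours=1),
    "import_data": timedelta(hours=4),
    "verify": timedelta(hours=1),
}

print(estimated_recovery_time(recovery_steps))
print(fits_within_aiw(recovery_steps, aiw=timedelta(hours=12)))
```

If the estimate does not fit, either the architecture needs to get faster or the business needs to revisit what it can tolerate.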

As with the RPO, immediate recovery requires highly resilient architectures and can come at a great cost. However, if the business can tolerate longer delays between the point of disaster and the resumption of the critical path, a weaker SLA offered by a vendor or third party might suffice.

BOTTOM LINE: IT’S STILL A QUESTION OF COST BENEFIT!

Just as before, management and IT need to discuss financially viable options. What recovery times are affordable and protect us from future losses? This too should kick-start conversations about procedures. Who will do what when and how? How will the key players communicate? How can we test any part of the plan to ensure it goes smoothly?

Testing might be nothing more than multiple stakeholders reviewing a checklist of activities, table-topping 'what if' scenarios or running very small-scale interruptions whilst we build experience and confidence.

Let’s look at our final metric. . .

THE MAXIMUM TOLERABLE OUTAGE (MTO):

This asks the question: how long can you remain in contingency? One week? Two? Three?

In part, this depends on how many critical functions you can restore and whether we are talking about full restoration or only 20% of warp drive capacity! But it also depends on the workarounds you might have spotted during STEP 1. If these are comprehensive and successful, we can usually last a little longer.

Step 3: Documentation

Usually, we create a Master BCDR plan that references other critical documents, but remember, we just want the basics.

Cover the aims of the document.
Explain clearly why management requires the business to take BC/DR seriously, such as the fear of financial losses, reputational damage, compliance issues, loss of competitive advantage or damage to employee morale.
Explain the criteria to initiate the plan and the call to stand down. This is more critical than you think. Moving into BC/DR too soon or too late can financially ruin you, so what constitutes a crisis? Is it a natural disaster, a major IT outage, a cyber-attack, the loss of key personnel, a pandemic?
Reference any standards and procedures regarding backup and the restoration of data (RPO).
Reference any standards and procedures regarding the restoration of critical systems (RTO).
Reference the roles and responsibilities of key players, making sure they have the authority and resources to act.

BC/DR planning may seem daunting, but it's a crucial investment in the future of your business. The steps outlined here offer a solid foundation for crafting a plan that aligns with your organisation's unique needs.

In the aftermath of a disaster, the true value of preparedness shines. With a robust BC/DR plan in place, you'll be equipped to navigate challenges, minimise losses, and emerge resilient. "By failing to prepare, you are preparing to fail."
