An Insider’s View of a BCP Failure: What an IT Manager Wants You to Know?

IT Manager’s View on Avoiding a BCP Failure

A business continuity plan (BCP) document is an enterprise IT manager’s crisis playbook. At a deeper level, it provides a snapshot of the IT assets, systems, and processes to be protected at all costs from probable disruptions. In short, it exists to keep the business running during an outage and to limit the outage’s impact.

As a crisis management practice, BCP has evolved since 1970 alongside decades of technological and regulatory developments. Although BCP is a mainstream strategy for recovery and operational resilience, even under non-ideal circumstances, there have been unfortunate real-world examples of BCPs going wrong. In every instance of a BCP failure, an IT manager can gauge the extent and persistence of the damage from the time and cost involved in recovery.

In this blog, we share an IT manager’s perspective on why an enterprise’s BCP might fail.

An IT Manager’s Viewpoint of BCP and Avoiding a BCP Failure

iTech GRC’s dedicated GRC team has demonstrated expertise in helping enterprises upgrade to the Business Continuity Management module within IBM OpenPages. We have collated takeaways from GRC professionals and spoken to iTech’s IT manager for an insider’s view on avoiding risks that can cause BCP failures.

BCPs are built around ‘what ifs’. Why are they necessary?

It’s true that BCPs are based on several ‘what if’ scenarios. They are created to help improve an organization’s preparedness for unexpected incidents. These unexpected incidents are real-world risks that halt business operations and slow down recovery. BCPs take hypotheses about risky events and turn them into probabilities, which in turn help safeguard IT systems, hardware, devices, networks, teams, and resources for business continuity.

A BCP is a detailed policies-and-procedures document outlining org-wide or process-level plans to respond to and recover from impending risks such as:

Cybersecurity attacks like malware, phishing, and ransomware.
Power and network outages, and appliance and device malfunctions.
Physical damage from disasters like fires, floods, and tornadoes.
Data loss from human errors, such as due diligence failures, file deletions, security oversights, missing updates, and social engineering.
Offsite outage from data center downtime.
Lockdowns and pandemics that deeply affect business operations.

If BCP can help plan for uncertainties, why not just avoid them?

That is a bit like settling the age-old ‘chicken or the egg’ riddle: do risks emerge from uncertainties, or do uncertainties exist because of risks? This may warrant endless discussion. But objectively speaking, almost every enterprise requires a BCP. To elaborate, let’s look at some research findings on cybersecurity risks. Typically, every enterprise implements network, application, operation, and data security management protocols. And yet every minute, four companies fall prey to ransomware attacks. Despite remarkable developments in cybersecurity solutions, malware attacks have increased by a whopping 87% over the last decade!

Here is an example of an unpredictable risk. No one anticipated the COVID-19 pandemic. It disrupted even the largest enterprises, despite their stronger technologies and resilient protocols. Ultimately, the cost of continuity and the time for recovery were the common business continuity imperatives for businesses, irrespective of their size, revenue, or industry. Moreover, IT disruptions from disasters and downtime are the most expensive.

How to Predict a BCP Failure?

How would you measure the success of a business continuity plan?

It comes down to two standard metrics: continuity cost and recovery time. BCP budgets include the hidden cost of an outage, alternative IT systems investment, loss of business cost and customer value, employee cost, and non-compliance fines.

A single minute of IT shutdown can cost $9,000, so it makes sense that IT managers and teams strive for high availability.

Downtime is expensive in today’s hyper-digital age. Its cost is measured using the formula:

Downtime cost = average minutes of downtime x cost per minute.

The cost per minute is about $427 for smaller enterprises and $9,000 for larger businesses. In the enterprise space, the average cost of downtime per hour, excluding regulatory penalties and legal fines, is up to $5 million. Research data reveals that hourly downtime costs have increased by 32% over seven years.
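To make the formula concrete, here is a minimal Python sketch that applies it to the per-minute figures quoted above. The function name `downtime_cost` and the 90-minute outage are illustrative assumptions, not part of any BCP standard or tool.

```python
def downtime_cost(minutes_of_downtime: float, cost_per_minute: float) -> float:
    """Downtime cost = minutes of downtime x cost per minute."""
    if minutes_of_downtime < 0 or cost_per_minute < 0:
        raise ValueError("minutes and cost per minute must be non-negative")
    return minutes_of_downtime * cost_per_minute


# Per-minute figures quoted in this article (USD).
SMALL_ENTERPRISE_COST_PER_MINUTE = 427
LARGE_ENTERPRISE_COST_PER_MINUTE = 9_000

# Hypothetical outage length, chosen only for illustration.
outage_minutes = 90

small = downtime_cost(outage_minutes, SMALL_ENTERPRISE_COST_PER_MINUTE)
large = downtime_cost(outage_minutes, LARGE_ENTERPRISE_COST_PER_MINUTE)

print(f"Smaller enterprise: ${small:,.0f}")  # Smaller enterprise: $38,430
print(f"Larger business:    ${large:,.0f}")  # Larger business:    $810,000
```

The estimate covers only the per-minute figure. Regulatory penalties, legal fines, and reputational damage would have to be added separately.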

The recovery time objective (RTO) and recovery point objective (RPO) are essential parameters for measuring your business continuity and disaster recovery (BCDR). RPO is the maximum acceptable amount of data loss during a disruption, expressed as a span of time. It answers how far back the IT administrators must be able to recover data. RTO is the maximum acceptable time for restoring the applications and systems affected during an outage.

RTO and RPO help measure the overall effectiveness of BCDR, identify critical applications and systems that an enterprise needs to continue operations, plan allocation, and meet service-level agreements (SLAs).
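As a rough illustration of how these two objectives can be checked after an incident, the sketch below compares one outage against its targets. It assumes the common reading of RPO as the maximum tolerable window of lost data (time between the last good backup and the outage) and RTO as the maximum tolerable time to restore service. The class names, field names, and timestamps are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryObjectives:
    rto: timedelta  # target: maximum time to restore service
    rpo: timedelta  # target: maximum window of lost data


@dataclass
class Incident:
    last_good_backup: datetime
    outage_start: datetime
    service_restored: datetime


def evaluate(incident: Incident, objectives: RecoveryObjectives) -> dict:
    """Compare what actually happened in an incident against RTO and RPO."""
    data_loss_window = incident.outage_start - incident.last_good_backup
    recovery_time = incident.service_restored - incident.outage_start
    return {
        "data_loss_window": data_loss_window,
        "recovery_time": recovery_time,
        "rpo_met": data_loss_window <= objectives.rpo,
        "rto_met": recovery_time <= objectives.rto,
    }


# Hypothetical targets and incident timeline, for illustration only.
objectives = RecoveryObjectives(rto=timedelta(hours=4), rpo=timedelta(hours=1))
incident = Incident(
    last_good_backup=datetime(2024, 1, 15, 8, 30),
    outage_start=datetime(2024, 1, 15, 9, 0),
    service_restored=datetime(2024, 1, 15, 14, 0),
)

result = evaluate(incident, objectives)
print(result["rpo_met"], result["rto_met"])  # True False
```

In this made-up timeline, only 30 minutes of data are at risk, so the RPO is met, but restoring service takes five hours against a four-hour RTO. A gap like that is the kind of finding a BCP test is meant to surface before a real event.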

These KPIs give realistic recovery estimates with the available IT systems and resources. Unfortunately, research stats reveal that in small-medium enterprises, 16% of executives are unaware of their RTOs, and 24% expect their data to be recovered within 10 minutes after an event! Post-pandemic, small and medium companies are disproportionately susceptible to unforeseen events.

What is the difference between BCP and disaster recovery (DR)?

It is common to find both concepts approached together, and the two disciplines may need to be combined depending on the scenario. They differ in scope, however. BCP takes a broader approach, covering the wide range of risks an enterprise may encounter at a given point. Disaster recovery (DR) plans contain detailed instructions for protecting IT systems and data during disaster scenarios.

How would you define the immediate benefit of BCDR plans?

The most immediate benefits are in the name: business continuity and disaster recovery. A thoroughly planned BCDR reduces downtime, which typically costs millions of dollars, slows down daily operations, and damages credibility and reputation. Lower financial risk from downtime and reduced legal penalties are other merits of a BCDR plan done well.

What are the reasons for a BCP failure?

There’s a long list of reasons for BCP failure. Planning precedes development. Most BCPs address only the lowest-hanging fruit or the most pressing needs that can put the enterprise at risk. The reasons for BCP failures stem from mismatches between IT systems and the users within the enterprise, a lack of training, or a lack of dedicated resources to implement recovery approaches.

BCP success depends on the plan overview and how it is implemented, tested, documented, and communicated with all the stakeholders. That’s just the high-level reason. Below are the questions you need to ask to determine whether your enterprise’s BCP is a failure or a success:

Is the BCP based on business impact analysis (BIA) and risk assessment (RA) data?
Does the BCP document clearly describe the scope of internal and external disruption and its impact on IT assets, systems, and processes?
Does it include the disaster recovery (DR) plan and tasks to manage it?
Does it meet the relevant SLA metrics, such as maximum tolerable downtime (MTD) and maximum tolerable data loss (MTDL)?
Have you documented and updated the training hypothesis and iterations to the BCP?
Is the BCP adequately tested with live exercises and simulations?
Are the results from BCP testing documented and implemented in areas that require extra training?
What are preferred backup mechanisms and systems to ensure business continuity?
How often do you update the BCP and DR documentation?
Where is the alternative DR site? Is it on-premises or cloud-hosted?
In the event of a disaster or an incident, what would be the preferred communication channel?
Does your BCP document succinctly describe the go-to resources, configurations, processes, personnel, external stakeholders, internal points of contact, etc.?
Where would the enterprise plan to store data required for resuming operations after an event?
Is the BCP exercise and plan communicated effectively to personnel (IT and non-IT) across departments, functions, and branches?
Are the employees and personnel given adequate training in BCP procedures?

Any final thoughts?

In a crisis, BCP and DR eliminate unnecessary guesswork. They equip teams with a decent level of certainty to recover and curb further impact. It’s better to be proactive than reactive.

iTech GRC, using OpenPages Business Continuity Management, helps enterprises continue their operations while protecting their resources and systems from disruptive events. The solution’s powerful data visualization features and dashboards allow IT teams to identify dependencies, data relationships, and risk probabilities to stay proactive.

If you are looking for the expertise of certified GRC professionals on planning and maximizing enterprise risk management through BCP, contact us for a discussion.

Talk to iTech’s GRC team today!
