The global outage of Microsoft Windows systems across the airline, banking, and healthcare sectors in July 2024 was an eye-opener for the industry. Ironically, the faulty software update came from CrowdStrike, the cybersecurity company tasked with protecting those systems, and it led to crashes and disruptions across 8.5 million computers. For many, recovery was slow and tedious.

Internal systems at several banks became inoperable, trades were stalled, and channels, tellers, and ATMs were affected. In the Philippines, for instance, at least five major banks’ operations were affected. In India, 10 banks and non-banking financial companies experienced disruptions. The Singapore Exchange’s post-trade system was also disrupted.

The impact is both monetary and reputational. Cyber risk analytics firm CyberCube estimates global insured losses to range from $400 million to $1.5 billion. Parametrix, a cloud monitoring and insurance services company, estimates that 25% of Fortune 500 companies were affected, incurring losses of $5.4 billion (excluding Microsoft), including $1.149 billion in losses for the banking sector.

Financial institutions (FIs) increasingly rely on multiple vendors for a range of enabling technologies, making outsourcing and vendor risk management extremely important. There have been several other vendor-led outages that have disrupted the operations of institutions in recent years.

Challenges in technology risk management and resilience

The interdependence of systems adds complexity, as one mistake in the network can spiral into an industry-wide event. The risk has increased because some large technology companies are not just intertwined across industries but also within the social fabric.

The Asian Banker conducted a survey across 10 leading FIs in APAC to assess their challenges in technology risk management and how their strategy is evolving following the recent outage. It reveals several key challenges in technology risk and vendor management, especially as supply chains are widening and banks have limited visibility into third- and fourth-party vendors.

The lack of transparency in vendors’ technology development and testing is rated as the biggest challenge by 40% of FIs. Inadequate testing and risk measures by vendors are considered the most challenging issue by 30% of FIs, while interdependence of systems is considered a significant challenge by 60%.

Ensuring operational resilience, and the ability to manage disaster recovery and business continuity with speed, is critical. Banks have been investing in this area but still face challenges.

The survey shows that the high cost of maintaining multiple vendors to build redundancy is the ‘most challenging factor’ for 20% of FIs and ‘a significant challenge’ for 50%. About 60% consider the high cost of building internal resilience ‘a significant challenge’. Internal technology talent also remains among the top challenges.

Key learnings from the CrowdStrike outage

The Asian Banker’s survey reveals that 80% of FIs plan to build more redundancies and strengthen their resilience and disaster recovery processes; 60% plan to improve change management and governance; and 50% will reassess vendor concentration and supply chain threats, as well as strengthen their software testing protocols.

FIs are rethinking their technology risk management strategy. “The CrowdStrike incident raises questions about managing vendor concentration and endpoint software contagion risk. It also provides several lessons as we plan our annual resilience activities,” commented John Howard Medina, chief operating officer at the Philippine Bank of Communication (PBCom).

Test, test, and test again
The incident highlights the need for stronger tests and controls by vendors, third parties, and FIs themselves. Any change to critical systems, such as operating systems, must undergo much more stringent checks and testing. CrowdStrike’s software had access to the Windows kernel (the core of the operating system), which is how a faulty update was able to crash the machines. Microsoft will now need to reassess its own controls and core access.

“Customers need to be aware of other kernel drivers running in their Windows environment. Additionally, updates should be reviewed carefully before being rolled out on critical systems. For added safety, banks might consider continuing with a previous version when a new update is released, acknowledging the trade-off of not having the latest updates to their infrastructure protected by such security solutions,” opined RAKBANK’s Tushar Vartak, executive vice president and head of information, cyber security, and fraud prevention.

Any software change must be well tested and rolled out in phases. FIs should test the software in a protected environment before implementing it.

“One key learning is that FIs must first test any update in a sandbox environment before it is released. If we roll it out in our network without testing, it could cause a contagion effect. With several of these security agents in our networks, one can become complacent, especially if they are software-as-a-service cloud-hosted,” said Medina. He added that his bank has a policy of sandboxing all endpoint updates and rolling them out in batches.
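The sandbox-then-batches approach Medina describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PBCom's or any vendor's actual process: the function names (`staged_rollout`, `sandbox_ok`, `apply_update`, `is_healthy`), the batch size, and the endpoint names are all hypothetical placeholders for whatever deployment tooling a bank actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class RolloutResult:
    updated: List[str]  # endpoints that received the update
    halted: bool        # True if the rollout stopped early
    reason: str         # why it stopped, or "completed"


def batches(endpoints: List[str], size: int) -> Iterable[List[str]]:
    """Yield consecutive slices of at most `size` endpoints."""
    for start in range(0, len(endpoints), size):
        yield endpoints[start:start + size]


def staged_rollout(
    endpoints: List[str],
    sandbox_ok: Callable[[], bool],        # hypothetical: runs the update in a sandbox
    apply_update: Callable[[str], None],   # hypothetical: pushes the update to one host
    is_healthy: Callable[[str], bool],     # hypothetical: post-update health check
    batch_size: int = 2,
) -> RolloutResult:
    # Gate 1: nothing reaches production unless the sandbox test passes.
    if not sandbox_ok():
        return RolloutResult([], True, "sandbox test failed")

    updated: List[str] = []
    for batch in batches(endpoints, batch_size):
        for host in batch:
            apply_update(host)
            updated.append(host)
        # Gate 2: stop before the next batch if any host in this one is unhealthy.
        unhealthy = [h for h in batch if not is_healthy(h)]
        if unhealthy:
            return RolloutResult(updated, True, f"unhealthy after update: {unhealthy}")
    return RolloutResult(updated, False, "completed")


if __name__ == "__main__":
    # Made-up fleet; "atm-01" is simulated as failing its health check.
    fleet = ["teller-01", "teller-02", "atm-01", "atm-02", "branch-pc-01"]
    result = staged_rollout(
        fleet,
        sandbox_ok=lambda: True,
        apply_update=lambda host: None,
        is_healthy=lambda host: host != "atm-01",
    )
    print(result)
```

In this sketch a bad update reaches at most one batch before the rollout halts, which is the containment effect that batching is meant to provide. A real process would also need a rollback step, which is omitted here.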

Vartak raised a red flag about the increasing trend of using AI in testing. “AI significantly contributes to code writing and testing, enhancing productivity through automated deployment pipelines. However, manual testing remains crucial for identifying vulnerabilities in business logic that automated processes might miss.”

Operational resilience becomes urgent
It’s not just about business continuity; it’s about resiliency and the ability to recover quickly in any contingency. FIs must reassess and prepare for any emergency, ensuring additional investments in redundancies and backups.

“In this case, the endpoint failure was not where resilience had been built in. Resilience should cover the entire network, and banks should test their resilience as often as possible. To create endpoint resilience, enterprises can plan for alternative access points if the critical or frontline endpoints fail,” commented Medina.

This becomes more relevant as more workloads move to the cloud, which calls for greater resilience. Medina pointed out that FIs can implement secondary or tertiary levels of redundancy depending on how much they are willing to spend, and that redundancy can be added across different cloud providers. He shared that when PBCom started hosting on the cloud, its environment was also mirrored in a secondary geographical region, with a tertiary on-premises backup in case both the primary and secondary sites became inaccessible.
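The tiered arrangement described above (primary, secondary region, on-premises backup) amounts to an ordered fallback. A minimal Python sketch follows, assuming a hypothetical `is_reachable` health probe and placeholder site labels. It does not represent PBCom's actual configuration.

```python
from typing import Callable, List, Optional, Tuple

# Ordered by preference; tier names and descriptions are illustrative placeholders.
SITES: List[Tuple[str, str]] = [
    ("primary", "cloud region A"),
    ("secondary", "cloud region B"),
    ("tertiary", "on-premises backup"),
]


def pick_site(
    is_reachable: Callable[[str], bool],  # hypothetical probe, e.g. a health-check call
    sites: List[Tuple[str, str]] = SITES,
) -> Optional[str]:
    """Return the tier name of the first reachable site, or None if all are down."""
    for tier, _description in sites:
        if is_reachable(tier):
            return tier
    return None


if __name__ == "__main__":
    # Simulate both cloud sites being inaccessible.
    down = {"primary", "secondary"}
    print(pick_site(lambda tier: tier not in down))
```

Each additional tier adds cost, which is the trade-off Medina points to: the list can be as short or as long as the institution is willing to fund.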

Regulators are already stressing the importance of resilience and stronger technology risk management.

The EU’s Digital Operational Resilience Act (DORA) sets operational resilience requirements for FIs, with rules on protection, detection, containment, recovery, and repair capabilities against incidents related to information and communication technology. These cover technology risk management, incident reporting, operational resilience testing, and third-party risk monitoring. FIs in the EU will need to comply by January 2025.

The Monetary Authority of Singapore issued Guidelines on Outsourcing for banks in 2023, detailing the responsibilities of senior management in ensuring sound risk management, a framework for risk evaluation, outsourcing, and business continuity.

Widening supply chains, increasing concentration risks
The CrowdStrike incident highlighted how many endpoints across various organisations were secured by a single system.

“The financial industry must evaluate solutions integrated within their ecosystem that are dependent on third-party vendors. The availability of banking infrastructure depends not only on internal protection but also on how third parties manage their infrastructure. Complete supply chain visibility is crucial, particularly with fourth-party relationships. Ensuring transparency and understanding of the origin and development process of code from vendors is vital for maintaining security,” opined Vartak.

This demands stronger governance protocols and requires vendors to work together swiftly to resolve such issues. They must rigorously reassess their technology risk, identify and quantify all risks in the system, and ensure that they have a stronger governance framework in place.

Plan and prepare for the next incident
Despite stronger processes and prevention measures, FIs can expect to face more outages and risks to their IT systems. Faster recovery and business continuity will require a well-planned response. Built-in resilience across systems, together with detailed recovery, incident response, and communication strategies, is essential. Strong investments in technology backups and redundancies must be complemented by staff training and stronger risk processes.