Businesses around the world have suffered massive outages to their Windows workstations linked to a flawed update released by cybersecurity firm CrowdStrike.
“CrowdStrike is actively working with customers affected by a flaw discovered in one content update for Windows hosts,” the company’s CEO, George Kurtz, said in a statement. “Mac and Linux hosts are not affected. This is not a security incident or cyber attack.”
The company, which acknowledged “reports of [Blue Screens of Death] on Windows hosts,” also said it had identified the problem and deployed a fix for its Falcon Sensor product, urging customers to refer to the support portal for the latest updates.
For systems that have already been affected, the mitigation steps are as follows:
- Boot into Windows in Safe Mode or the Windows Recovery Environment
- Navigate to the C:\Windows\System32\drivers\CrowdStrike directory
- Find the file named “C-00000291*.sys” and delete it
- Restart your computer or server normally
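The file-matching step above can be sketched in Python. This is a minimal illustration only, not CrowdStrike's official tooling: the function name `remove_matching_files` and the `dry_run` flag are assumptions made for this example, and a Python interpreter is typically not available in Safe Mode or the Windows Recovery Environment, where the deletion is normally done by hand. The directory and filename pattern are the ones given in the steps.

```python
from pathlib import Path

# Directory and filename pattern taken from the mitigation steps above.
CROWDSTRIKE_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
PATTERN = "C-00000291*.sys"


def remove_matching_files(directory, pattern=PATTERN, dry_run=True):
    """Return the files in `directory` that match `pattern`.

    Hypothetical helper for illustration. With dry_run=True (the default)
    nothing is deleted; with dry_run=False each matching file is removed.
    """
    matches = sorted(p for p in Path(directory).glob(pattern) if p.is_file())
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches


if __name__ == "__main__":
    # Dry run: only report what would be deleted.
    for path in remove_matching_files(CROWDSTRIKE_DIR):
        print(f"would delete: {path}")
```

Defaulting to a dry run is a deliberate choice in this sketch: the matches can be reviewed before anything is removed from a driver directory.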
Notably, the incident also affected Google Cloud Compute Engine, causing Windows virtual machines using CrowdStrike’s csagent.sys to crash and enter an unexpected reboot state.
“After automatically receiving a flawed patch from CrowdStrike, Windows virtual machines crash and fail to reboot,” Google said. “Windows virtual machines that are currently up and running should no longer be affected.”
Microsoft Azure has also posted a similar update, stating that it had received reports of successful recovery from some customers who attempted multiple restarts of affected virtual machines, and that several reboots may be required (as many as 15 have been reported).
Amazon Web Services (AWS), for its part, said it has taken steps to mitigate the problem for as many Windows instances, Windows WorkSpaces, and AppStream applications as possible, recommending that customers still affected “take steps to restore connectivity.”
Security researcher Kevin Beaumont said, “I got the CrowdStrike driver that they pushed through auto-update. I don’t know how it happened, but the file is not a properly formatted driver and causes Windows to crash every time.”
“CrowdStrike is a state-of-the-art EDR product applied to everything from POS to ATMs, etc. This will likely be the largest ‘cyber’ incident in the world in terms of impact,” Beaumont added.
Airlines, financial institutions, food and retail chains, hospitals, hotels, news organizations, railway networks, and telecommunications companies are among the many businesses affected. CrowdStrike shares fell 15% in U.S. premarket trading.
“The current event, even in July, looks set to be one of the most significant cyber issues of 2024,” said Omer Grossman, Chief Information Officer (CIO) of CyberArk, in a statement shared with The Hacker News. “The damage to business processes globally is dramatic. The failure is related to a software update of the CrowdStrike EDR product.”
“This is a product that runs with elevated privileges that protects endpoints. A malfunction in it can, as we see in the current incident, cause the operating system to crash.”
The recovery is expected to take several days because the problem must be addressed manually, endpoint by endpoint, by booting each machine into Safe Mode and removing the offending driver, Grossman said, adding that the root cause of the failure would be of “great interest.”
Jake Moore, global security advisor at Slovak cybersecurity company ESET, told The Hacker News that the incident highlights the need to have multiple “fail-safes” in place and to diversify IT infrastructure.
“Upgrading and maintaining systems and networks can inadvertently introduce small errors that can have wide-ranging consequences, as CrowdStrike customers are experiencing today,” said Moore.
“Another aspect of this incident relates to ‘diversity’ in the use of large-scale IT infrastructure. This applies to critical systems such as operating systems (OS), cybersecurity products, and other globally deployed (scaled) applications. Where diversity is low, a single technical incident, let alone a security issue, can lead to global outages with substantial consequences.”
The development comes as Microsoft recovers from a separate outage of its own that caused problems with Microsoft 365 apps and services, including Defender, Intune, OneNote, OneDrive for Business, SharePoint Online, Windows 365, Viva Engage and Purview.
“A configuration change to some of our Azure backend workloads caused an interruption between storage and compute resources, resulting in connectivity failures, affecting downstream Microsoft 365 services that depend on those connections,” the tech giant said.
Omkhar Arasaratnam, general manager of OpenSSF, said the Microsoft-CrowdStrike outages highlight the fragility of monoculture supply chains and emphasized the importance of diverse technology stacks to improve resilience and security.
“Mono-culture supply chains (one operating system, one EDR) are inherently fragile and susceptible to systemic failures, as we’ve seen,” noted Arasaratnam. “Good systems engineering tells us that changes to these systems should be introduced gradually, seeing the effect in small tranches, not all at once. More diverse ecosystems can tolerate rapid change because they are resilient to systemic challenges.”
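The idea of introducing changes gradually “in small tranches” can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not a description of how CrowdStrike or Microsoft actually deploy updates: the function `staged_rollout`, its parameters, and the tranche percentages are all hypothetical, and `apply_update` and `is_healthy` stand in for whatever deployment and health-check mechanisms an organization really uses.

```python
def staged_rollout(hosts, apply_update, is_healthy,
                   percents=(1, 10, 50, 100), max_failure_rate=0.01):
    """Update hosts in growing tranches, halting if a tranche is unhealthy.

    Hypothetical sketch. `percents` are cumulative percentages of the fleet.
    Returns (updated_hosts, completed); completed is False if the rollout
    was halted before reaching every host.
    """
    updated = []
    for percent in percents:
        # Cumulative number of hosts that should be updated after this stage
        # (integer ceiling of len(hosts) * percent / 100).
        target = min(len(hosts), (len(hosts) * percent + 99) // 100)
        tranche = hosts[len(updated):target]
        if not tranche:
            continue
        for host in tranche:
            apply_update(host)
        updated.extend(tranche)
        # Check the effect on this tranche before widening the rollout.
        failures = sum(1 for host in tranche if not is_healthy(host))
        if failures / len(tranche) > max_failure_rate:
            return updated, False  # stop before touching the remaining hosts
    return updated, True
```

In this sketch, an update that breaks every host it touches is halted after the first tranche (1% of the fleet) rather than reaching all hosts at once.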