Cybersecurity company CrowdStrike has published a root cause analysis detailing the Falcon Sensor software update failure that crippled millions of Windows devices worldwide.
The “Channel File 291” incident, as originally outlined in its preliminary Post Incident Review (PIR), was traced to a content validation issue that arose after the company introduced a new Template Type to provide visibility into, and detection of, new attack techniques that abuse named pipes and other Windows inter-process communication (IPC) mechanisms.
Specifically, the incident is related to a problematic content update deployed via the cloud, with the company describing it as a “confluence” of several issues that led to the crash. Chief among them was a mismatch between the 21 inputs passed to the Content Validator via the IPC Template Type and the 20 supplied to the Content Interpreter.
CrowdStrike said the parameter mismatch was not detected during “multiple layers” of the testing process, in part due to the use of wildcard matching criteria for the 21st input during testing and in the initial IPC Template Instances that were delivered between March and April 2024.
In other words, the new version of Channel File 291, released on July 19, 2024, was the first IPC Template Instance to make use of the 21st input parameter field. The lack of a specific test case for non-wildcard matching criteria in the 21st field meant that the problem was not flagged until the Rapid Response Content was shipped to the sensors.
“Sensors that received the new version of Channel File 291 carrying the problematic content were exposed to a latent out-of-bounds read issue in the Content Interpreter,” the company said.
“At the next IPC notification from the operating system, the new IPC Template Instances were evaluated, specifying a comparison against the 21st input value. The Content Interpreter expected only 20 values. Therefore, the attempt to access the 21st value produced an out-of-bounds memory read beyond the end of the input data array and resulted in a system crash.”
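The failure mode described above can be modeled in a few lines. What follows is a simplified Python sketch, not CrowdStrike's code: the `WILDCARD` marker, the `matches` function and the list-based inputs are all assumptions made for illustration, and Python raises an `IndexError` where the kernel driver performed an invalid memory read.

```python
WILDCARD = "*"  # hypothetical marker meaning "match anything in this field"

def matches(criteria, inputs):
    """Return True if every non-wildcard criterion equals its input.

    Deliberately performs no bounds check, mirroring the flaw described
    in the report: criteria may cover more fields than inputs provides.
    """
    for index, expected in enumerate(criteria):
        if expected == WILDCARD:
            continue  # wildcard fields are never read from inputs
        if inputs[index] != expected:  # fails if index >= len(inputs)
            return False
    return True

inputs = [f"value{i}" for i in range(20)]      # the interpreter supplies 20 values

early_instance = [WILDCARD] * 21               # 21 fields, all wildcards
print(matches(early_instance, inputs))         # True: field 21 is never read

july_instance = [WILDCARD] * 20 + ["value20"]  # field 21 is non-wildcard
try:
    matches(july_instance, inputs)
except IndexError as exc:
    print("out-of-bounds read:", exc)
```

In this model the all-wildcard instance passes because the 21st field is skipped, which is consistent with the report's account of why the earlier instances did not expose the mismatch.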
Besides validating the number of input fields in the Template Type at sensor compile time to address the issue, CrowdStrike said it also added runtime input array bounds checks to the Content Interpreter to prevent out-of-bounds memory reads, and corrected the number of inputs provided by the IPC Template Type.
“The added bounds check prevents the Content Interpreter from performing an out-of-bounds access of the input array and crashing the system,” the post notes. “The additional check adds an extra layer of runtime validation that the size of the input array matches the number of inputs expected by the Rapid Response Content.”
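The two runtime safeguards quoted above, a bounds check on each read and a check that the input array size matches what the content expects, might look roughly like the following. This is an illustrative Python sketch only; `read_input`, `matches_checked` and the `validate_size` flag are hypothetical names, not part of the Falcon sensor.

```python
WILDCARD = "*"  # hypothetical marker meaning "match anything in this field"

def read_input(inputs, index):
    """Bounds-checked read: return None instead of reading past the end."""
    if 0 <= index < len(inputs):
        return inputs[index]
    return None

def matches_checked(criteria, inputs, validate_size=True):
    """Evaluate criteria against inputs without ever reading out of bounds."""
    # Layer 1: refuse content whose field count differs from the input array.
    if validate_size and len(criteria) != len(inputs):
        return False
    for index, expected in enumerate(criteria):
        if expected == WILDCARD:
            continue
        # Layer 2: every individual read is bounds-checked.
        value = read_input(inputs, index)
        if value is None or value != expected:
            return False
    return True

inputs = ["x"] * 20
bad_content = [WILDCARD] * 20 + ["x"]   # compares against a 21st value

print(matches_checked(bad_content, inputs))                       # False
print(matches_checked(bad_content, inputs, validate_size=False))  # False
```

Either layer alone turns the mismatch into a failed match rather than a crash; keeping both reflects the "extra layer" wording in the quote.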
Additionally, CrowdStrike said it plans to increase test coverage during Template Type development to include test cases for non-wildcard matching criteria for each field in all (future) Template Types.
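One way to picture that kind of coverage is a generator that produces a test instance per field, each with a non-wildcard criterion in exactly one position. The Python sketch below is an assumption-laden illustration: `non_wildcard_cases`, `naive_matches` and `failing_fields` are invented names, and the unchecked evaluator stands in for the pre-fix interpreter.

```python
WILDCARD = "*"  # hypothetical marker meaning "match anything in this field"

def non_wildcard_cases(field_count, probe="probe"):
    """Yield (position, criteria) with only that position non-wildcard."""
    for position in range(field_count):
        criteria = [WILDCARD] * field_count
        criteria[position] = probe
        yield position, criteria

def naive_matches(criteria, inputs):
    """Unchecked evaluator, standing in for the interpreter under test."""
    return all(c == WILDCARD or inputs[i] == c for i, c in enumerate(criteria))

def failing_fields(field_count, inputs):
    """Return the 1-based field numbers whose test case reads out of bounds."""
    failures = []
    for position, criteria in non_wildcard_cases(field_count):
        try:
            naive_matches(criteria, inputs)
        except IndexError:
            failures.append(position + 1)
    return failures

# 21 fields defined by the template type, 20 values supplied at runtime.
print(failing_fields(21, ["x"] * 20))  # [21]
```

Under these assumptions, a per-field sweep flags field 21 immediately, whereas an all-wildcard test suite would report nothing.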
Some of the non-sensor updates are also expected to address the following gaps:
- The content validation tool is being modified to add new checks that ensure that content in template instances does not include match criteria that match on more fields than are provided as input to the Content Interpreter
- The content validation tool is being modified to only allow wildcard matching criteria in the 21st field, which prevents the out-of-bounds access in sensors that only provide 20 inputs
- The content configuration system has been updated with new test routines to ensure that every new template instance is tested, even though the initial template instance is already tested with the template type at creation
- The content configuration system has been updated with additional deployment levels and acceptance checks
- The Falcon platform has been updated to give customers more control over the delivery of Rapid Response Content
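The first two items in that list amount to a pre-deployment check on each template instance. A minimal Python sketch of such a check follows; `validate_instance` and its parameters are hypothetical, and the real validation tool is certainly more involved.

```python
WILDCARD = "*"  # hypothetical marker meaning "match anything in this field"

def validate_instance(criteria, interpreter_input_count):
    """Return a list of problems; an empty list means the instance passes."""
    problems = []
    for position, criterion in enumerate(criteria):
        # Fields beyond what the interpreter supplies may only be wildcards.
        if position >= interpreter_input_count and criterion != WILDCARD:
            problems.append(
                f"field {position + 1} uses a non-wildcard criterion but the "
                f"interpreter only supplies {interpreter_input_count} inputs"
            )
    return problems

all_wildcards = [WILDCARD] * 21
reads_field_21 = [WILDCARD] * 20 + ["pipe-name"]

print(validate_instance(all_wildcards, 20))   # []
print(validate_instance(reads_field_21, 20))  # one problem, naming field 21
```

Rejecting the instance at validation time means the content never reaches a sensor that supplies only 20 inputs.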
Last but not least, CrowdStrike said it has engaged two independent third-party software security vendors to conduct a further review of the Falcon sensor code for both security and quality assurance. It is also carrying out an independent review of the end-to-end quality process from development through deployment.
It further pledged to work with Microsoft as Windows introduces new ways to perform security functions in user space as opposed to relying on a kernel driver.
“The CrowdStrike kernel driver is loaded from an early phase of system boot to allow the sensor to observe and defend against malware that launches before user-mode processes start,” it said.
“Providing up-to-date security content (such as CrowdStrike’s Rapid Response Content) to these kernel capabilities enables the sensor to defend systems against a rapidly evolving threat landscape without making changes to kernel code. Rapid Response Content is configuration data; it is not code or a kernel driver.”
The root cause analysis was published as Delta Air Lines said it has “no choice” but to seek damages from CrowdStrike and Microsoft for the massive disruption, citing an estimated $500 million in lost revenue and additional costs associated with thousands of canceled flights.
Both CrowdStrike and Microsoft have since responded to the criticism, saying that they were not to blame for the multi-day outage and that Delta had rejected their offers of on-site assistance, indicating that the carrier’s problems may run much deeper than its Windows machines failing as a result of a faulty security update.