CrowdStrike didn't cause a security incident (but the incident was material)
For legal reasons, this is satire.
Read time: 6 minutes
Hi again, it's meoward. I received the following note from a subscriber last week:
Dear hacker cat, I was sitting on my front porch, sipping an iced tea, when I thought: where did meoward go? Please come back. A security tool caused a global outage, and we don't know what it means.
Well baby I'm back. And if you're new here, this is how this newsletter works:
I show up one summer afternoon and take you out for a Big Mac. And yes, the ice cream machine is still broken.
I tell you, "I want you back in my life."
I drop you back off at your house. I don't call again for another year.
Have you missed me and have you been to therapy lately?
New Balance sneakers, approved by dads everywhere.
Cloudstruck at Starbucks
A couple weeks ago, I was on a road trip driving across the country, when I pulled into a Starbucks for a little pick-me-up. As I walked into the store, I noticed a handwritten note on the door that said "Cash only." I thought that was pretty odd, so I asked someone in line what was up and they said "Yeah, everything is down because of something called CrowdStrike."
"Wait, what? The EDR?"
"I don't know. Something about Microsoft and blue screens."
And that's how I learned about the CrowdStrike update that crashed computers around the world, causing widespread disruptions. It also introduced everyone to a favorite pastime of mine: photographing Blue Screens of Death (BSODs) in the wild.
We know what the lazy Halloween costume is gonna be this year.
For those of us who have been rolling out security tools for a while, this probably brought back memories of a similar incident in April 2010, when a content update from McAfee caused BSODs on Microsoft Windows systems across the globe. It seems that the CEO of CrowdStrike, George Kurtz, is no stranger to tech disruptions.
What caused the BSOD?
CrowdStrike pushes what they call Rapid Response Content updates to all their agents, typically in response to threat intel about new adversary techniques. On July 19th, CrowdStrike released one of these updates to all Windows hosts running sensor version 7.11 and above.
That's right: all of them. Ship it!
And this update contained problematic content, which caused Windows systems around the world to crash due to an out-of-bounds (OOB) memory read.
An OOB memory read, huh?
The short (and slightly inaccurate) version from 360 (English version here):
CrowdStrike's CSAgent.sys driver implements a custom virtual machine engine, which is used to gather all that fancy EDR data.
The Rapid Response Content update (the infamous C-00000291-00000000-00000009.sys) was pushed to Windows systems around the world.
The direct cause of the BSOD: when the CSAgent.sys driver attempted to process the content update, it caused an OOB memory read during opcode verification.
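To make "OOB read during opcode verification" a little less abstract, here is a toy sketch in Python. It is not CrowdStrike's code, and every name in it (Instruction, verify, run, MATCH_FIELD) is made up for illustration. The assumption behind it: content tells an interpreter which input field to read, and a verifier should reject any index that falls outside the fields that actually exist before anything runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    opcode: str       # hypothetical opcode name, e.g. "MATCH_FIELD"
    field_index: int  # which input field this instruction wants to read


def verify(program, field_count):
    """Return a list of problems; an empty list means every read is in bounds."""
    problems = []
    for pc, ins in enumerate(program):
        # Check the lower bound too: a negative index would not fail in
        # Python, it would silently read from the end of the list.
        if not 0 <= ins.field_index < field_count:
            problems.append(
                f"instruction {pc} ({ins.opcode}) reads field "
                f"{ins.field_index}, but only {field_count} fields exist"
            )
    return problems


def run(program, fields):
    """Execute the program only if verification passes."""
    problems = verify(program, len(fields))
    if problems:
        # Reject the content instead of attempting the bad read.
        raise ValueError("; ".join(problems))
    return [fields[ins.field_index] for ins in program]


if __name__ == "__main__":
    fields = ["proc.exe", "C:\\temp", "pipe-name"]
    ok = [Instruction("MATCH_FIELD", 0), Instruction("MATCH_FIELD", 2)]
    bad = [Instruction("MATCH_FIELD", 3)]  # one past the end
    print(run(ok, fields))
    print(verify(bad, len(fields)))
```

Python would at least raise an exception on the unchecked read. Kernel-mode C offers no such courtesy: an unchecked index reads whatever sits past the array, and an invalid read inside a driver can take the whole machine down with it.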
The InfoSec community reacts
I love drama. As I continued my drive across the country, every time I stopped for food or gas I was refreshing my feeds to see what people were saying. Here are the three narratives I saw:
1. Wow, a lot of companies use CrowdStrike.
It's impressive to see their market dominance. But a lot of people were left asking: "Have we become too reliant on one or two big companies for everything that we use?"
I'm writing this on a plane to Las Vegas right now, and there are two guys sitting in front of me talking about how glad they were that their companies weren't impacted. And then the one guy says, "You know, right before I headed out on this trip my CISO asked me why are all our competitors using CrowdStrike and we aren't? Maybe we should be using CrowdStrike, too."
There's been a lot of consolidation in IT, specifically in the EDR space. Sure, there are still lots of niche tools out there (and they're probably annoyed I said that), but your options today are either:
Something your dad would install
Some poor tool that used to be great before it was gobbled up by private equity
The underdog that lacks a lot of features today, but has a niche audience
The market leaders (you know them)
And if your market research and proofs of concept only look at the technology from a feature perspective, the market leaders will win every time. They've made sure of it.
2. I'm so glad we're more enlightened and run macOS.
To be clear, this issue had nothing to do with the operating system. Yes, macOS moved system extensions out of the kernel. And yes, they can and still do cause panics. Ask me how I know. This group of elitists should serve as a reminder to always support your colleagues when they're having a really bad day/week. Because it's just a matter of time.
3. Why does CrowdStrike not do rolling updates?
Bugs can and do happen. But at some point in CrowdStrike's history, a decision was made to NOT do canary deployments for these content updates, and instead push them out production-wide. And there was nothing customers could have done to prevent it: delivery of this content was non-configurable.
This was not just a failure to test. This was not just a failure to do proper QA. This was not just an engineering design failure.
This was a failure at the leadership level.
The ability to push an update to systems globally without a staged rollout is wildly risky. And the lack of technical controls, processes, and policies that allowed it to happen is scary. But it tells us exactly what leadership was prioritizing, and what they weren't.
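For the curious, here is roughly what "canary first, everyone else later" can look like. This is a minimal sketch under stated assumptions, not how CrowdStrike or any other vendor actually ships content: the ring percentages, the is_healthy callback, and the 1% failure threshold are all hypothetical.

```python
import hashlib

# Hypothetical rollout rings, as cumulative percentages of the fleet.
RINGS = [1, 10, 50, 100]


def bucket(host_id: str) -> int:
    """Map a host to a stable bucket in 0..99 so ring membership never changes."""
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def staged_rollout(hosts, is_healthy, max_failure_rate=0.01):
    """Push to each ring in turn and stop at the first unhealthy one.

    is_healthy is a hypothetical callback that reports whether a host
    survived the update. Returns (updated_hosts, halted).
    """
    updated = set()
    for percent in RINGS:
        new_hosts = [h for h in hosts if bucket(h) < percent and h not in updated]
        if not new_hosts:
            continue
        updated.update(new_hosts)
        failures = sum(1 for h in new_hosts if not is_healthy(h))
        if failures / len(new_hosts) > max_failure_rate:
            # Halt here: the remaining rings never receive the update.
            return updated, True
    return updated, False


if __name__ == "__main__":
    fleet = [f"host-{i}" for i in range(1000)]
    updated, halted = staged_rollout(fleet, is_healthy=lambda h: False)
    print(f"halted={halted}, hosts touched: {len(updated)} of {len(fleet)}")
```

With a bad update, the blast radius is whatever landed in the first ring instead of every Windows host on the planet. A real pipeline would also need soak time between rings and a way to tell a crashed host from a merely offline one, which this sketch skips.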
What we should all learn
I've been deploying endpoint security tools for a long time. It's always been terrible. Yes, we've gotten more capabilities and yes, they're a lot easier to use. But quality control has always suffered. Ask me how I know.
This global outage should serve as a reminder of what we've outsourced. We've outsourced a lot of our security maturity, change management, supply chain issues, and resiliency. We made the mistake of assuming that a company with global market dominance would prioritize stability and quality before product features. And if you think by choosing the underdog EDR you'll avoid these issues, I think our current economic environment would say otherwise. Companies, even private ones, are having to find a profitability narrative, and that's led to an increase in layoffs and archaic return-to-office orders in a struggle to increase profits and senior leadership control. So what gets impacted? Quality controls. Testing. The basics.
And we've gotten so far up the stack that we no longer understand the basics of how our tools work or the impact they could have, and yet we deploy this software to all our laptops and servers.
Instead of feeding the constant demand for new features, we need to implore our vendors to commit the time to improve quality controls. Instead of asking what percentage of the MITRE ATT&CK framework their tool covers, let's start with the basics: How does this thing work? How can change be introduced? What risks do those create? Are we comfortable with those risks?
You know, the boring stuff.
It's time to re-evaluate the risks we thought we outsourced.