
šŸˆ CrowdStrike didnā€™t cause a security incident (but the incident was material)

For legal reasons, this is satire.

Read time: 6 minutes

Hi again, it's meoward – I received the following note from a subscriber last week:

Dear hacker cat, I was sitting on my front porch, sipping an iced tea, when I thought: where did meoward go? Please come back. A security tool caused a global outage, and we don't know what it means.

Well baby I'm back. And if you're new here, this is how this newsletter works:

  1. I show up one summer afternoon and take you out for a Big Mac. And yes, the ice cream machine is still broken.

  2. I tell you, "I want you back in my life."

  3. I drop you back off at your house. I don't call again for another year.

Have you missed me and have you been to therapy lately?

New Balance sneakers, approved by dads everywhere.

Cloudstruck at Starbucks

A couple weeks ago, I was on a road trip driving across the country, when I pulled into a Starbucks for a little pick-me-up. As I walked into the store, I noticed a handwritten note on the door that said "Cash only." I thought that was pretty odd, so I asked someone in line what was up and they said "Yeah, everything is down because of something called CrowdStrike."

"Wait, what? The EDR?"

"I donā€™t know. Something about Microsoft and blue screens."

And that's how I learned about the CrowdStrike update that crashed computers around the world, causing widespread disruptions. It also introduced everyone to a favorite pastime of mine: photographing Blue Screens of Death (BSODs) in the wild.

We know what the lazy Halloween costume is gonna be this year.

For those of us who have been rolling out security tools for a while, this probably brought back memories of a similar incident in April 2010, when a content update from McAfee caused BSODs on Microsoft Windows systems across the globe. CrowdStrike's CEO, George Kurtz, was McAfee's CTO at the time, so he's no stranger to tech disruptions.

What caused the BSOD?

CrowdStrike pushes what they call Rapid Response Content updates to all their agents, typically in response to threat intel about new adversary techniques. On July 19th, CrowdStrike released one of these updates to all Windows hosts running sensor version 7.11 and above.

That's right – all of them. Ship it!

And this update contained problematic content, which caused Windows systems around the world to crash due to an out-of-bounds (OOB) memory read.

An OOB memory read, huh?

The short (and slightly inaccurate) version from 360 (English version here):

  1. CrowdStrike's CSAgent.sys driver implements a custom virtual machine engine, which is used to gather all that fancy EDR data.

  2. The Rapid Response Content update (the infamous C-00000291-00000000-00000009.sys) was pushed to Windows systems around the world.

  3. The direct cause of the BSOD: when CSAgent.sys attempted to process the content update, it triggered an OOB memory read during opcode verification (sketched below).
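
Here's a tiny, purely illustrative C sketch of that class of bug. None of this is CrowdStrike's actual code: the structures, field names, and the "known opcode" check are all made up. It just shows how a parser that trusts a count field inside the very content it's verifying can walk off the end of its buffer. In user space that's a segfault; inside a kernel driver it's a bugcheck, which Windows presents as a BSOD.

```c
/*
 * Illustrative only: a toy "content file" parser that trusts a count field
 * from the file it is parsing. Not CrowdStrike's code; all names are made up.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct content_header {
    uint32_t entry_count;   /* how many opcode entries supposedly follow */
};

struct opcode_entry {
    uint32_t opcode;
    uint32_t operand;
};

/* Returns 0 if every opcode is "known", -1 otherwise. */
static int verify_opcodes(const uint8_t *buf, size_t buf_len)
{
    const struct content_header *hdr = (const struct content_header *)buf;
    const struct opcode_entry *entries =
        (const struct opcode_entry *)(buf + sizeof(*hdr));

    (void)buf_len;  /* <-- the bug: the loop below never checks against this */

    /* The loop bound comes from the file itself, so a malformed update makes
     * this read far past the end of buf: an out-of-bounds read. A safe
     * version would first require
     *   sizeof(*hdr) + (uint64_t)hdr->entry_count * sizeof(*entries) <= buf_len
     */
    for (uint32_t i = 0; i < hdr->entry_count; i++) {
        if (entries[i].opcode > 0x20)   /* arbitrary "known opcode" range */
            return -1;
    }
    return 0;
}

int main(void)
{
    /* Simulate a content file whose header claims ~1M entries but carries
     * none. Running this will read out of bounds and most likely crash. */
    uint8_t bad_update[sizeof(struct content_header)] = {0};
    uint32_t claimed = 1u << 20;
    memcpy(bad_update, &claimed, sizeof(claimed));

    printf("verify result: %d\n", verify_opcodes(bad_update, sizeof(bad_update)));
    return 0;
}
```

The boring fix is a bounds check before the loop; the boring safety net is catching a crash like this in a small canary ring rather than on every Windows host at once.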

The InfoSec community reacts

I love drama. As I continued my drive across the country, every time I stopped for food or gas I was refreshing my feeds to see what people were saying. Here are the three narratives I saw:

1. Wow, a lot of companies use CrowdStrike.

It's impressive to see their market dominance. But a lot of people were left asking: "Have we become too reliant on one or two big companies for everything that we use?"

I'm writing this on a plane to Las Vegas right now, and there are two guys sitting in front of me talking about how glad they were that their companies weren't impacted. And then the one guy says, "You know, right before I headed out on this trip my CISO asked me why all our competitors are using CrowdStrike and we aren't. Maybe we should be using CrowdStrike, too."

There's been a lot of consolidation in IT, specifically in the EDR space. Sure, there are still lots of niche tools out there (and they're probably annoyed I said that), but your options today boil down to:

  1. Something your dad would install

  2. Some poor tool that used to be great before it was gobbled up by private equity

  3. The underdog that lacks a lot of features today, but has a niche audience

  4. The market leaders (you know them)

And if your market research and proof of concepts only look at the technology from a feature perspective, the market leaders will win every time. They've made sure of it.

2. I'm so glad we're more enlightened and run macOS.

To be clear, this issue had nothing to do with the operating system. Yes, macOS moved system extensions out of the kernel. And yes, they can and still do cause panics. Ask me how I know. This group of elitists should serve as a reminder to always support your colleagues when they're having a really bad day/week. Because it's just a matter of time.

3. Why does CrowdStrike not do rolling updates?

Bugs can and do happen. But at some point in CrowdStrike's history, a decision was made NOT to do canary deployments for these content updates, and instead push them out production-wide. And there was nothing customers could have done to prevent it: delivery of this content was non-configurable.
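
For contrast, here's roughly what that missing control looks like. This is a made-up sketch, not any vendor's real rollout code: the host IDs, ring percentages, and hash choice are all mine. The idea is just that each host lands in a stable bucket, and new content only reaches hosts inside the currently enabled percentage, which you widen only after the smaller rings stay healthy.

```c
/*
 * Purely illustrative sketch of a staged (canary) rollout gate.
 * Host IDs, ring sizes, and the hash are all invented for this example.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* FNV-1a: a tiny, stable hash so each host always lands in the same bucket. */
static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (uint8_t)*s++;
        h *= 16777619u;
    }
    return h;
}

/*
 * Rollout stages expressed as "percent of the fleet that may receive the new
 * content". You only advance to the next stage after health signals (crash
 * rate, agent heartbeats) stay clean for the previous one.
 */
static const int rollout_stages[] = { 1, 5, 25, 100 };

static bool host_gets_update(const char *host_id, int current_stage)
{
    int bucket = (int)(fnv1a(host_id) % 100);   /* 0..99, stable per host */
    return bucket < rollout_stages[current_stage];
}

int main(void)
{
    const char *fleet[] = { "host-a", "host-b", "host-c", "host-d", "host-e" };
    int stage = 0;   /* 1% canary; real logic would advance this over hours/days */

    for (size_t i = 0; i < sizeof(fleet) / sizeof(fleet[0]); i++) {
        printf("%s -> %s\n", fleet[i],
               host_gets_update(fleet[i], stage) ? "new content" : "hold back");
    }
    return 0;
}
```

It's a few dozen lines of logic plus the operational discipline to watch crash telemetry between stages.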

This was not just a failure to test. This was not just a failure to do proper QA. This was not just an engineering design failure.

This was a failure at the leadership level.

The ability to push an update to systems globally without a staged rollout is wildly risky. And the lack of technical controls, processes, and policies that allowed it to happen is scary. But it tells us exactly what leadership was prioritizing, and what they weren't.

What we should all learn

I've been deploying endpoint security tools for a long time. It's always been terrible. Yes, we've gotten more capabilities and yes, they're a lot easier to use. But quality control has always suffered. Ask me how I know.

This global outage should serve as a reminder of what we've outsourced. We've outsourced a lot of our security maturity, change management, supply chain issues, and resiliency. We made the mistake of assuming that a company with global market dominance would prioritize stability and quality over product features. And if you think that by choosing the underdog EDR you'll avoid these issues, I think our current economic environment would say otherwise. Companies, even private ones, are having to find a profitability narrative, and that's led to an increase in layoffs and archaic return-to-office orders in a struggle to increase profits and senior leadership control. So what gets impacted? Quality controls. Testing. The basics.

And we've gotten so far up the stack that we no longer understand the basics of how our tools work or the impact they could have – and yet we deploy this software to all our laptops and servers.

Instead of feeding the constant demand for new features, we need to implore our vendors to commit the time to improve quality controls. Instead of asking what percentage of the MITRE ATT&CK framework their tool covers, let's start with the basics: How does this thing work? How can change be introduced? What risks do those create? Are we comfortable with those risks?

You know, the boring stuff.

It's time to re-evaluate the risks we thought we outsourced.
