Many of my cousins were unable to work last Friday because of a CrowdStrike incident. I got a few questions asking me to explain why local IT admins were going laptop to laptop, kiosk to kiosk, or server to server to get them running again. “Why doesn’t rebooting just work?” After all, The IT Crowd did tell us the solution to most problems is to “try turning it off and then on again.” My (poor) attempt at an explanation was that the machines had all entered a mode where they couldn’t boot. In order to boot they needed to be put into safe mode and a specific file needed to be deleted. But this was too high level. It didn’t let people understand what was actually happening. And it didn’t explain what CrowdStrike Falcon was. So here’s my metaphor.
A computer (be it server, laptop, kiosk, whatever) is like a little building. At the bottom of the building you have a lobby that has elevators to the higher floors (where the real work happens), pipes, a mail room, a receptionist, and the supervisory staff that keep the place running. Anything that comes from the outside (email, photos, input from keyboards, USB disks, etc.) gets routed by the hard-working supervisors on the ground floor to the correct people on the higher floors who do actual work with it. Actual work could be approving airline ticket purchases, posting cat pics, or scheduling a surgery. Our supervisors don’t know or care – they are hard workers who do their job well. And if the people on the higher floors have trash or need to send outgoing mail, they drop it off on the ground floor and the supervisors take care of it. They do the Basic Input and Output. We need them for the building to function.
Background
CrowdStrike makes security software. Every building needs security. Instead of hiring a guard to march around the building you can just pay CrowdStrike (C/S). The C/S guard is then in your building 24/7 and has a massive rolodex full of photos of “bad guys.” CrowdStrike sends new photos via email to all of its guards stationed around the world. This way “bad guys” spotted in one building can’t break into other buildings. This service has been rock solid and people pay lots of money for it.
What Happened
Thursday night C/S accidentally sent a picture that wouldn’t go into the rolodex (maybe it was the wrong size? Maybe it was a mirror? Maybe it wasn’t a picture but an MP3?). We don’t know why, but it wouldn’t fit into the rolodex. Every guard tried to get it into the rolodex but it wouldn’t work. The guard started dedicating more and more of the building to his rolodex project. Pretty soon he had the entire lobby dedicated to this strange obsession. The building supervisor saw the guard getting angry trying to jam that picture into the rolodex and pulled the fire alarm, kicking everyone out of the building. “It’s ok, we’ll all come back in and work can get started again,” the supervisor thought. There’s a proper way to enter the building. The building super and staff go in first, then the C/S guard, then the tenants. So they get lined up to go in. Super and staff go in. The C/S guard enters. And the first thing he does is try to get that stupid poster-sized picture into his rolodex. “I’m going to need some more space,” the guard says while cracking his knuckles and rolling up his sleeves. The super sees this and pulls the alarm again. Now this building is stuck in a cycle. At some point the supervisor throws up a big blue screen of death. No matter how many times they try, they can’t get work to start.
CrowdStrike Takes Action
C/S realizes its mistake and sends an email replacing that last bad photo with the correct photo of a level 5 muchacho. But every guard already has that old, bad “photo” and will not check his email until he figures out how to get the bad photo into the rolodex. So we need some manual intervention.
The Fix
The building and people in it have no way to fix this cycle they are in. They just can’t get any work done.
Thankfully the helpful local town fire chief (your friendly neighborhood IT person) comes by and puts the building into a “safe mode”. Only the supervisor is let in to turn on the building, and then the fire chief. The super goes in. The guard is left out in the cold with everyone else. Don’t worry, he’s telling everyone about his great rolodex. While he’s bragging, the super lets the fire chief in to poke around. The fire chief destroys that blasted picture and then tells everyone to get out and enter again. This time the guard goes in, looks at the rolodex, and then checks his email. Ahh, there’s a new photo from C/S that has the latest level 5 muchacho in it. He adds it to his rolodex and the building functions again.
Now that fire chief just has to go to every other building in town and then things will start to get better.
Oh yeah, also, since the buildings in your town are highly dependent on each other, the fire chief then has to coordinate with chiefs across town and have buildings start in certain orders... It’s going to be a long night!
Some disclaimers: This was written for a non-technical audience as a way to explain a portion of the issue, as I understood it, with limited information and limited understanding. If you’re reading this after I wrote it, you have more information than I did when I wrote it. If you made it this far, you might want to get information from a person that doesn’t mix modern cyber security with a “rolodex” metaphor.