Crowdstrike thoughts
TL;DR: no, it's not your beloved "Install Linux" moment.

First, here is how I understand this incident. This part is important, because a misunderstanding would lead to wrong conclusions (and that can happen if I'm not correct here).

CrowdStrike has an "early antivirus" type of program that analyzes your system to detect yet-unidentified threats. To do this it needs kernel-level access, which is achieved through their driver.

Any crash of an app with this access level will cause a BSOD, kernel panic, or so on, depending on your favorite OS.

Can Microsoft block this access level? No, because they are obliged to allow it via the EU court (and let's be fair: if they blocked kernel-level access, we would read much more Microsoft-blaming from Linux fans).

So, was it a CrowdStrike driver update that caused the outage? Due to the limitations of kernel-level driver updates, shipping a new driver takes a long time, so updating it for every new virus isn't an option. Instead they made some configuration files that we can simplify as virus databases (because technically that's what they are).

And "virus database" update caused kernel-level driver error and BSOD.

As far as I understand, Windows has some default self-repair mechanisms that couldn't work, because the faulty driver runs early in the boot process, causing endless reboots.

So, could similar things happen under Linux? Well, it would be unfair to speak of what could have happened, because it [has already happened before](https://www.reddit.com/r/debian/comments/1c8db7l/linuximage61020_killed_all_my_debian_vms/).

Next, we've seen online discussions about tools that could be used to fix the problem. The most interesting solution I've found was an SSH server built into the initramfs for remote maintenance.

Well, if you can put an SSH server into the initramfs, you may also end up with an antivirus automatically installed into the initramfs, causing problems at the early boot stages.

That doesn't mean it would actually end up in the initramfs, or that a kernel panic at that early stage would necessarily cause the same problems. But it does mean that Linux is not protected from the same kind of problem.

The next topic is manual updates. I understand that automatic app updates may cause problems. I understand that app/driver updates should be tested before being applied to critical infrastructure.

But do you really expect people to manually test every virus database update before applying it across the company? Test every new virus database, released a few times a day! In that case, let's also manually test every document file people have to deal with, to make sure it doesn't break your work applications, before allowing users to read it. Why not?

So, who do we have to blame?

CrowdStrike? Yes, they fucked up twice. The first time by allowing the driver to break on a broken config update. The second time by sending a broken update.

Admins who don't disable automatic virus database updates? Sorry, but no.

Microsoft? Sorry, but this is a case where Microsoft is not the one to blame.

Regulators that allowed critical infrastructure to "put all its eggs into a single basket"? Maybe.

Linux fans spreading misinformation? Yes.

Yes, I'm also a Linux fan. Linux is more convenient for me than Windows. But please don't act like some trash political propagandists spreading misinformation to achieve their goals.

Thank you.
The Fediverse will eat me for lunch blobcatcatnom

@yura Welp, you're absolutely right.

Given you're a catboy I'd still eat you though… 😘

@yura Linux is only a kernel - the only point of comparison with Windows is that both are proprietary software.

I tested a similar kind of bug on GNU Linux-libre - a module that dereferences a null pointer - and it did not crash, it just oops'd, unlike the NT kernel, which responds to a dereference of low "protected" memory by triggering a bluescreen.
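A minimal module along those lines looks something like this (a sketch assuming a standard kernel build setup, not the exact code used for the test):

```c
// Minimal sketch of a null-dereference test module. Loading it makes
// the init function fault: Linux prints an oops and kills the loading
// process, and the rest of the system keeps running (unless
// panic_on_oops is set).
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Deliberate NULL dereference to demonstrate an oops");

static int __init nullderef_init(void)
{
    int *volatile p = NULL;   /* volatile keeps the compiler from eliding it */

    pr_info("nullderef: dereferencing NULL now\n");
    return *p;                /* page fault in kernel mode -> oops */
}

static void __exit nullderef_exit(void)
{
    pr_info("nullderef: never reached, the load already failed\n");
}

module_init(nullderef_init);
module_exit(nullderef_exit);
```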

ClownStrike did offer a LiGNUx version which was just as buggy - previously part of it consisted of a Linux module, which was so badly programmed that it could trigger kernel panics - although each user had the ability to test updates prior to deployment, meaning only the test servers would require fixing after a broken update and not every single computer in the botnet.

ClownStrike for LiGNUx now utilizes an eBPF program; eBPF is an in-kernel VM designed to try to make it impossible for software running inside it to crash Linux.
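For comparison, a minimal eBPF program looks something like this (a sketch assuming libbpf's headers; the point is that the in-kernel verifier proves every memory access safe at load time, so an unsafe program is rejected instead of crashing Linux at run time):

```c
// SPDX-License-Identifier: GPL-2.0
// Minimal eBPF sketch built against libbpf's bpf_helpers.h.
// The verifier checks this program before it is allowed to run;
// an out-of-bounds or wild-pointer access would make the load fail
// rather than bring the kernel down.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

char LICENSE[] SEC("license") = "GPL";

SEC("tracepoint/syscalls/sys_enter_openat")
int log_openat(void *ctx)
{
    bpf_printk("openat observed\n");
    return 0;
}
```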
@Suiseiseki I remember it being mentioned somewhere that they have a version of their shitty software that runs on GNU/Linux systems. I'm curious, does anyone even trust them anymore after that massive fuck-up?
@yura
cw: Crowdstrike
@yura I have heard that some people whose company paid additional money to have their updates staged were also f'ed up by the fact that the faulty update ignored that staging. So, like, CrowdStrike is the most to blame, yeah.

But Microsoft could've made some privileged API for such functionality instead of officially signing kernel modules that seemingly execute unsigned code/pseudocode at kernel level. They are pointing fingers at the EU, but it doesn't matter, because they have already announced that they are making such APIs.
re: Crowdstrike thoughts
@papush @yura As well as developers of kernels that include widespread ambient authority with no fault isolation or fault tolerance.

Microkernel OSes would've been largely unaffected. Microsoft's own Singularity/Midori OSes (monolithic high-isolation LBS OSes) would *also* have been unaffected (and access to the privileged core would've been a "lolno, fuckoff").
@papush @lispi314 Yeah, it has. Drivers operating in user mode won't crash the system if they fail.

But drivers operating in kernel mode will. https://learn.microsoft.com/en-us/windows-hardware/drivers/gettingstarted/user-mode-and-kernel-mode
@getimiskon >I'm curious, does anyone even trust them anymore after that massive fuck-up?
It's pretty much used only by box-tickers who install whatever it takes to tick the "defend against compromise" box, even if what is installed compromises the system, so the sort that actually install it will take anything - I've read about how that sort had allocated test machines in the past, after realizing how buggy the software was.
@Suiseiseki that makes sense, and I believe there are many more "box-tickers" out there than we think
re: Crowdstrike thoughts
@yura

>Can Microsoft block this access level? No, because they are obliged to allow it via the EU court

They in fact wanted to do this some time ago and the EU blocked it. They proposed a security API that would give more access to the inner workings of the kernel, but the EU considered it anti-competitive because of the possibility that smaller companies wouldn't have access to the API.

>As far as I understand, Windows has some default self-repair mechanisms that couldn't work, because the faulty driver runs early in the boot process, causing endless reboots.

The typical self-repair mechanisms would have prevented all of this if CrowdStrike hadn't labeled their driver as a boot-start driver that must always be loaded when starting Windows.

>So, could similar things happen under Linux? Well, it would be unfair to speak of what could have happened, because it has already happened before

RHEL was also affected a month or so later, after a kernel update. They have a history of issuing broken updates or of their software breaking systems.

>But do you really expect people to manually test every virus database update before applying it across the company?

Yes, this should absolutely be expected. In an enterprise environment, no update that wasn't tested should be applied to production systems. It's an industry standard to partially roll out updates in enterprise, usually in an N, N-1, and N-2 fashion (N being the newest version).

Version N should only be installed on test systems that do nothing but test new updates, and the updates should be installed automatically as they come out. This stage is usually called test. The next stage is staging, where N-1 versions are installed; these are approved manually and run on non-critical systems where outages are only annoying to deal with. The last stage is production, where N-2 versions live; they are also manually approved. At this point everything should have been tested for at least a few days, ideally a week or two, and everything should be fine.
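As a toy illustration of the N / N-1 / N-2 idea (a hypothetical sketch; no real product's policy engine looks like this), the ring policy boils down to something like:

```c
#include <stdio.h>

/* Hypothetical N / N-1 / N-2 ring policy. "newest" is N, the latest
 * released version of the sensor/content. */
typedef enum { RING_TEST, RING_STAGING, RING_PRODUCTION } ring_t;

static int version_for_ring(int newest, ring_t ring)
{
    switch (ring) {
    case RING_TEST:       return newest;      /* N: auto-applied, sacrificial boxes */
    case RING_STAGING:    return newest - 1;  /* N-1: manually approved, non-critical */
    case RING_PRODUCTION: return newest - 2;  /* N-2: manually approved, critical */
    }
    return newest - 2;                        /* unknown ring: be conservative */
}

int main(void)
{
    int newest = 42;   /* arbitrary example version number */
    printf("test=%d staging=%d production=%d\n",
           version_for_ring(newest, RING_TEST),
           version_for_ring(newest, RING_STAGING),
           version_for_ring(newest, RING_PRODUCTION));
    return 0;
}
```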

Now, the problem is that the CrowdStrike software allows this kind of granular control over updates, but some updates are allowed to bypass these policies completely, and this was one of them. This turned out to be the issue I described in my first post about this as "seemingly updating faster than you can update the policy that disables auto-updates".

Consider the parts of the post I left out as something I fully agree with.