Booting Automotive ECUs Really Fast with Modern Security Features
Brendan Le Foll, BMW Car IT GmbH [Open Source Summit EU 2022]

Modern cars have dozens of ECUs. Nowadays there is typically a central ECU called Node0 running Linux. The approach is to keep things simple by doing as much as possible in Linux itself.

Booting fast really means doing the things that must be available quickly early on, before the entire system is ready. In addition, the system has to have decent security.

There are regulatory reasons why some functionalities need to be available fast, but in general as a user you just want your car to be ready as soon as possible when you turn on the ignition.

Decent security means:

  • integrity protection (i.e. verified boot);
  • secure key storage and ECU authentication (so you can’t steal an ECU and install it in another vehicle);
  • IPC security policies and mandatory access controls as defense in depth;
  • encryption of customer data.

How to boot fast? Boot early, as soon as the car is opened. Suspend to RAM (STR) can be a lot faster, but you can’t rely on it when the car is not used for weeks. However, STR gets the entire system running very fast, whereas otherwise only the critical systems are up quickly. Hibernation instead of STR is not an option in automotive: it wears out the flash in less than 15 years.
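The flash wear argument is simple arithmetic: every hibernation writes a RAM image to flash, and flash only tolerates a limited number of program/erase cycles. The sketch below shows the shape of that estimate. The function name, its parameters, and the numbers in the usage note are illustrative assumptions, not figures from the talk, and it assumes wear leveling spreads the writes over the whole device.

```python
def years_until_worn_out(capacity_gb, pe_cycles, image_gb,
                         cycles_per_day, write_amplification=1.0):
    """Estimate how many years of hibernation a flash device survives.

    All inputs are assumptions supplied by the caller:
    capacity_gb          usable flash capacity
    pe_cycles            rated program/erase cycles per cell
    image_gb             size of one hibernation image
    cycles_per_day       hibernate/resume cycles per day
    write_amplification  extra physical writes per logical write
    """
    # Total data the device can absorb before reaching its rated endurance.
    total_endurance_gb = capacity_gb * pe_cycles
    # Physical data written per year by hibernation alone.
    written_per_year_gb = image_gb * write_amplification * cycles_per_day * 365
    return total_endurance_gb / written_per_year_gb


if __name__ == "__main__":
    # Placeholder values only; plug in the numbers for a real device.
    print(years_until_worn_out(capacity_gb=64, pe_cycles=3000,
                               image_gb=8, cycles_per_day=10,
                               write_amplification=2.0))
```

With placeholder inputs like these the result lands well below a typical vehicle lifetime, and that is before counting any other writes to the same device.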

Before Linux starts, the biggest chunk of time goes to loading the kernel and verifying it, although the verification itself doesn’t slow down boot much (e.g. smaller keys don’t help much). About 1s is already gone at that point. Note that you can also do verified boot on the Cortex-M and Cortex-R microcontrollers.

Modern flash (eMMC) is pretty fast. UFS flash devices have several LUNs with different speed properties. The kernel, however, creates a separate block device for every partition in a UFS device. These take some time to enumerate, especially with udev. The solution is to hardcode this in the kernel (because it’s not actually dynamic anyway).

Boot ROMs in many SoCs support A/B boot from Boot LUNs based on a register.
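As a rough illustration of register-based A/B selection, the sketch below models the decision a boot ROM might make. The register layout (bit 0 for the slot, bits 1-2 for remaining attempts) and the fallback on exhausted attempts are hypothetical choices for this example. The talk only states that the selection is based on a register, and real SoCs define their own layouts.

```python
from typing import Tuple

# Hypothetical register layout, for illustration only.
SLOT_BIT = 0x1      # bit 0: 0 selects Boot LUN A, 1 selects Boot LUN B
TRIES_SHIFT = 1     # bits 1-2: remaining boot attempts for the active slot
TRIES_MASK = 0x3


def select_boot_lun(reg: int) -> Tuple[str, int]:
    """Return the Boot LUN to use and the updated register value."""
    slot = reg & SLOT_BIT
    tries = (reg >> TRIES_SHIFT) & TRIES_MASK
    if tries == 0:
        # The active slot used up its attempts: switch to the other slot
        # and give it a fresh set of attempts.
        slot ^= 1
        tries = TRIES_MASK
    # This boot consumes one attempt.
    tries -= 1
    new_reg = (tries << TRIES_SHIFT) | slot
    return ("B" if slot else "A"), new_reg


if __name__ == "__main__":
    lun, reg = select_boot_lun(0b110)  # slot A, three attempts left
    print(lun, bin(reg))
```

In a scheme like this, software that has booted successfully would presumably reset the attempt counter, so that only a slot that repeatedly fails to come up triggers the switch.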

To make the kernel load fast, move everything you can to modules. RAM hotplugging helps as well: start with less RAM for booting and add the rest later. The scheduler and frequency scaling have an impact. So does big.LITTLE: you want to put as much as possible on the large cores at boot. The load tracking scheme (WALT vs. PELT) plays a role here as well.
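One way to "add the rest later" is the kernel's memory hotplug sysfs interface, where each memory block has a `state` file that accepts `online`. The sketch below assumes the extra RAM is already visible as offline memory blocks; whether the speaker's system onlines memory from userspace like this is not stated in the talk. The function name and the `sysfs_root` parameter are made up for this example.

```python
import os
import re


def online_offline_memory(sysfs_root="/sys/devices/system/memory"):
    """Bring every offline memory block online.

    Returns the names of the blocks whose state was changed.
    Needs root privileges when run against the real sysfs tree.
    """
    changed = []
    for name in sorted(os.listdir(sysfs_root)):
        # Only memoryN directories are memory blocks; skip other entries.
        if not re.fullmatch(r"memory\d+", name):
            continue
        state_path = os.path.join(sysfs_root, name, "state")
        with open(state_path) as f:
            state = f.read().strip()
        if state == "offline":
            with open(state_path, "w") as f:
                f.write("online")
            changed.append(name)
    return changed
```

Running something like this only after the time-critical services are up keeps the cost of initializing the extra memory out of the critical boot path.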

For verifying the rootfs, they copied what Android does. All r/w partitions are noexec, and all exec partitions use dm-verity, with the AVB2 format from Android. dm-verity also allows you to skip module and firmware signing (because they’re already verified by dm-verity), which speeds up module and firmware loading a little. Of course, you then need to restrict module and firmware loading to that partition.
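The reason dm-verity makes separate module and firmware signatures redundant is that every block read from the partition is checked against a hash tree whose root is trusted. The sketch below illustrates that idea only. It is not the dm-verity on-disk format (it omits the salt, padding, and superblock), and the function names and fan-out constant are assumptions for this example.

```python
import hashlib
from typing import List

FANOUT = 128  # assumed: a 4096-byte hash block holds 128 SHA-256 digests


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(blocks: List[bytes]) -> List[List[bytes]]:
    """Return hash levels, from leaf digests (level 0) up to a single root."""
    levels = [[_h(b) for b in blocks]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([_h(b"".join(prev[i:i + FANOUT]))
                       for i in range(0, len(prev), FANOUT)])
    return levels


def verify_block(index: int, block: bytes,
                 levels: List[List[bytes]], root: bytes) -> bool:
    """Check one data block against the trusted root hash.

    The stored levels are untrusted; only `root` is assumed authentic.
    """
    digest = _h(block)
    for level in levels[:-1]:
        # The stored hash at this level must match what we computed.
        if level[index] != digest:
            return False
        # Hash the whole group of siblings to get the parent digest.
        group = index // FANOUT
        digest = _h(b"".join(level[group * FANOUT:(group + 1) * FANOUT]))
        index = group
    return digest == root


if __name__ == "__main__":
    data = [bytes([i % 256]) * 64 for i in range(300)]
    tree = build_tree(data)
    print(verify_block(5, data[5], tree, tree[-1][0]))
```

Because only the blocks actually read are hashed, the cost is spread over the boot instead of being paid up front. A kernel module stored on such a partition is covered by the same check as any other file.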

For TrustZone, an arbitration daemon is needed, so it has to be started before you can do anything with it. That daemon depends on a bunch of other things which are slow to start. OP-TEE itself is relatively quick to load, although it steals CPU cycles from the kernel boot. Arm SystemReady will standardize how the boot process is done, which should make this work easier to carry over between SoC vendors.

Systemd is very flexible. They added an early boot stage that runs before sysinit and even udev. This allows you to get some userspace up and running very early. Of course, those applications need to have really minimal dependencies. Things that need to be in there include dbus, basic hardware information, systemd-networkd, and a bunch of startup devices (since udev is not ready yet).

udev is tricky. Replay of coldboot events is unpredictable. It’s also tricky to prioritize the triggers. You also want to avoid re-triggering events. They contributed a number of patches to udev, but it’s hard to make them generic.

Polkit is really big and therefore slow. It uses JavaScript as well… Recent patches replace mozjs with duktape, which is much smaller, but it’s still big and slow. It’s difficult to avoid since it’s used by dbus to secure the IPC. So they created their own, called smolkit, which will be published soon. Inspired by [...], it is a drop-in replacement for polkit (so it still uses JavaScript???), but it skips everything that tries to communicate with a user. An alternative would be Debian’s polkit, but it is only a little bit better, and it is not well maintained upstream.

They use a lot of containers. These are normally not subject to the strict boot time requirements, but they are still needed for full functionality. They compared LXC with different podman-based solutions, and LXC is just a lot faster to boot. With LXC they are, however, missing orchestration features.