Large Scale Deployments for Automated Kernel Testing - Dave Pigott, Linaro [Open Source Summit EU 2018]

LAVA (Linaro Automated Validation Architecture) deploys a test image, runs it and gathers results.

LAVA is now in its second iteration.

LAVA places some requirements on a device. When it is powered on, the device should boot without anyone having to press a button. Multiple devices are connected to one server, so each one must be identifiable; for example, the USB-to-serial converter needs a unique ID. The device must also have serial connectivity, not just, e.g., a network connection.
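The identification requirement can be illustrated with a small sketch. It assumes that each USB-to-serial adapter exposes a unique serial string that appears in its device name (as in the Linux `/dev/serial/by-id` naming scheme). The board names, serial strings, device names and the `map_boards` helper are all hypothetical and are not taken from LAVA.

```python
from typing import Dict, List

# Hypothetical inventory: adapter serial string -> board name.
ADAPTER_TO_BOARD: Dict[str, str] = {
    "A9007Vxx": "board-01",
    "A9007Vyy": "board-02",
}


def map_boards(device_names: List[str], inventory: Dict[str, str]) -> Dict[str, str]:
    """Return board name -> device name for every adapter with a known serial.

    A device name matches when it contains the adapter's serial string.
    Unknown adapters are ignored; a board whose adapter is missing is simply
    absent from the result.
    """
    found: Dict[str, str] = {}
    for name in device_names:
        for serial, board in inventory.items():
            if serial in name:
                found[board] = name
    return found


# Illustrative device names, made up for this sketch.
names = [
    "usb-FTDI_FT232R_USB_UART_A9007Vyy-if00-port0",
    "usb-FTDI_FT232R_USB_UART_A9007Vxx-if00-port0",
    "usb-Unknown_Adapter_ZZZ-if00-port0",
]
mapping = map_boards(names, ADAPTER_TO_BOARD)
```

Because the lookup depends only on the adapter's ID, the result does not change with the order in which the adapters are enumerated, which is what matters when many boards hang off one server.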

If the board gets bricked, it is power-cycled through a PDU (accessed through an abstraction layer). If a button needs to be pushed, the button is replaced with a relay. There may also be an infrastructure problem, e.g. a serial cable has fallen off, so LAVA runs a health check if something seems to be off. If the board is completely bricked, it is possible to reflash it (in a board-dependent way), perhaps using an additional relay. That requires soldering, which does not scale well.
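The recovery steps above can be sketched as an escalation: health check, then power cycle, then reflash. This is only an illustration of the idea, not LAVA's implementation. The `PowerControl` interface, the `FakePDUPort` stand-in, the `recover` function and the exact order of escalation are assumptions made for this example.

```python
from abc import ABC, abstractmethod
from typing import Callable, List


class PowerControl(ABC):
    """Hypothetical abstraction over whatever PDU or relay powers a board."""

    @abstractmethod
    def off(self) -> None: ...

    @abstractmethod
    def on(self) -> None: ...


class FakePDUPort(PowerControl):
    """Stand-in for a real PDU outlet; it only records the commands it gets."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def off(self) -> None:
        self.log.append("off")

    def on(self) -> None:
        self.log.append("on")


def recover(
    power: PowerControl,
    health_check: Callable[[], bool],
    reflash: Callable[[], None],
) -> str:
    """Try progressively more invasive recovery steps (assumed ordering)."""
    if health_check():
        return "healthy"
    # Step 1: power-cycle the board through the abstraction layer.
    power.off()
    power.on()
    if health_check():
        return "recovered by power cycle"
    # Step 2: board-dependent reflash, e.g. triggered through an extra relay.
    reflash()
    if health_check():
        return "recovered by reflash"
    return "needs manual attention"


# Simulated board: fails two health checks, then passes after the reflash.
results = iter([False, False, True])
port = FakePDUPort()
reflashed: List[bool] = []
outcome = recover(port, lambda: next(results), lambda: reflashed.append(True))
```

Keeping the power control behind an interface means the same recovery logic could drive different PDU models or a relay board, which is the point of the abstraction layer mentioned in the talk.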

Linaro has had many iterations of sdmux (i.e. an emulated SD card that can be switched between the host and the target, so that a recovery image can be put on it). But it doesn’t work well: it doesn’t work on all boards, it works for a while but then degrades, …

Connectivity is an issue. Quality cables are important (serial, USB, network): a high-quality serial cable is worth hours of engineering time. Examples are FTDI serial cables and shielded USB cables. USB hubs should have per-port power control (Cambrionix).

Boards come in a wide range of form factors, so various types of racks are needed. Smaller boards go on monitor shelves. If a device comes in a case, it is repackaged into a 1U case, with multiple small boards sharing one case.

To manage the server infrastructure (there are several LAVA deployments in the Cambridge lab), they use Salt and Ansible. For LAVA hacking sessions (i.e. submitting a job that allows a developer to SSH into the board), VPN access is needed.

They have a staging instance to test a new release of LAVA. It is also used for testing a board’s new firmware/bootloader before it is deployed all over the world.

For scalability, they are developing a 1U carrier that can mount 16 (small) boards.