Update on automated testing systems [Automated Testing Summit 2019]

In a series of lightning talks, contributors gave status updates on the various automated testing systems.

LKFT (Milosz Wasilewski)

LKFT tests arm32, arm64, i386, and x86_64 on various hardware; the 32-bit architectures are tested in QEMU on 64-bit servers. It runs various test frameworks: LTP, libhugetlbfs, perf, v4l2, kvm-unit-tests, s-suite (an I/O benchmark), and kselftests. It runs about 25,000 tests on each push of the LTS kernels, the latest stable, and master. It also runs other test suites on Android kernels. It doesn't find very many bugs; often failures are due to the test suites themselves.

LKFT uses OpenEmbedded for the build. LKFT 2.0, however, will align with KernelCI: don't rebuild the rootfs, and don't use fastboot flashing but an NFS-based root filesystem where possible. They also want to create pieces of filesystem that can be overlaid as needed for a specific test. They're also working on parsing the logs to find errors and warnings, and TAP will be used to extract test data. For reporting, they're looking for a good reporting and analysis tool, and they're looking at kcidb to aggregate results with other projects. An important concern is identifying flaky results.
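
As a rough illustration of the kind of TAP extraction mentioned above (not LKFT's actual code; the format details and sample names are assumptions), a minimal sketch can pull pass/fail results out of a TAP-formatted log with a simple regular expression:

    import re

    # Matches basic TAP result lines such as:
    #   ok 1 - syscalls.read01
    #   not ok 2 - syscalls.write03 # SKIP: unsupported configuration
    TAP_LINE = re.compile(r'^(not )?ok (\d+)(?: - (.*))?$')

    def parse_tap(log_text):
        """Yield (test_number, name, status) tuples from a TAP-formatted log."""
        for line in log_text.splitlines():
            match = TAP_LINE.match(line.strip())
            if not match:
                continue
            status = 'fail' if match.group(1) else 'pass'
            name = (match.group(3) or '').strip()
            # A SKIP directive overrides the raw ok/not ok result.
            if '# SKIP' in name.upper():
                status = 'skip'
            yield int(match.group(2)), name, status

    sample = "ok 1 - syscalls.read01\nnot ok 2 - syscalls.write03"
    for number, name, status in parse_tap(sample):
        print(number, name, status)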

Fuego (Tim Bird)

Fuego is its own Debian-derived Linux distribution with Jenkins, the test execution core, and a series of tests, all inside a Docker container. It is targeted at product testing, so a lot of the tests are high-level integration and networking tests. It's not testing upstream kernels, it's testing specific product kernels. The tests themselves are part of a Fuego install; with luck, users upstream their tests when they are sufficiently generic. Most of the test contributions come from distros. A volume is mounted inside the Docker container to get to toolchains and the like. Tests are built from source and are cross-built (except the toolchain). The firmware itself is not built, however; it's assumed to already be present on the target, but they will (have to) add that.

A test consists of a test script plus parsing, analysis, and visualization. Multiple transports are supported: ssh, adb, serial. Test scripts should make minimal assumptions about the target, e.g. no awk, only POSIX shell. Fuego uses Jenkins as the front-end (launching tests) and back-end (visualizing results), but there's a command-line tool too. The latest release also has a Jenkins-less install so it can be integrated with other systems, or run without the container (i.e. in your own environment); the latter can be used to run the whole thing, including building, on the target itself. It supports tests from other frameworks (Linaro, ptest) and running under LAVA (as a prototype). The back-end is configurable, so it should be easy to add kcidb support.

They would like to add the monitors from 0-day, to get additional data next to the test results. They want to continue integrating with other systems like labgrid, beaker, … They'll have to add support for provisioning the software on the target board; the SUT artefacts should ideally be pulled from an artefact server so the whole thing can be integrated in a pipeline. They'd also like to add hardware testing; there's a separate talk about that. fserver is a test object server: it stores tests, artefacts, and results, and it can be used to deliver requests from one host to another, which helps to support distributed operation. It is not complete.

A remark at the end of the talk: one thing we're not sharing enough is pass criteria. A lot of the tests are benchmarks, which give a number, not a pass/fail. The number you get does not just depend on the software under test, but also on the hardware on which it runs; e.g. a filesystem benchmark will give different results on a Kingston SD card and a Samsung SD card. Fuego has created an object to describe the hardware and associate the pass/fail number with it, but people are not actually sharing this data.
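
To make the pass-criteria point concrete, here is a minimal sketch (not Fuego's actual mechanism; the device names, fields, and thresholds are invented for illustration) of turning a benchmark number into a pass/fail verdict by looking up a reference value for the specific hardware it ran on:

    # Hypothetical reference values: expected sequential-write throughput (MB/s)
    # per storage device. The point is that these baselines have to be shared
    # alongside the results for anyone else to reproduce the pass/fail decision.
    REFERENCE_MBPS = {
        "kingston-sd-32gb": 18.0,
        "samsung-evo-sd-64gb": 42.0,
    }

    def benchmark_verdict(device_id, measured_mbps, tolerance=0.10):
        """Pass if the measurement is within `tolerance` of the reference
        for this specific device; fail otherwise."""
        reference = REFERENCE_MBPS.get(device_id)
        if reference is None:
            # No baseline for this hardware: the raw number alone is meaningless.
            return "unknown"
        return "pass" if measured_mbps >= reference * (1 - tolerance) else "fail"

    print(benchmark_verdict("kingston-sd-32gb", 17.5))   # pass
    print(benchmark_verdict("samsung-evo-sd-64gb", 30))  # fail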

KernelCI (Kevin Hilman)

From the beginning, the goal of KernelCI has been to test on a wide variety of hardware. Today it covers x86_64, arm, arm64, mips, arc, and riscv. There are multiple dimensions of variation, so a lot of combinations: multiple kernel trees (stable, next, various subsystem trees), multiple kconfig options (all upstream defconfigs plus various fragments), and multiple compilers (gcc and clang, in multiple versions). Tests are mostly boot tests, but also a bit more: graphics tests (on appropriate devices), v4l2-compliance, suspend/resume, and a USB smoke test. The focus is on running others' test suites, not on creating KernelCI-specific ones.

KernelCI wants to increase collaboration and avoid fragmentation, even more so now that it has joined the Linux Foundation.

The challenge right now is to make sense of all the data that KernelCI is producing: it collects a lot of results, logs, and artifacts from a huge matrix of configurations, and big-data analysis could be done on this. kcidb is a BigQuery database with test results from various projects. It is very minimal now, so a lot of data is lost, but it has to be common to all the projects that push to it. Once the database is there, various existing tools can be used to analyse it. This effort started just after Plumbers, with KernelCI, CKI, and LKFT.
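
The idea of aggregating results in a common minimal format can be illustrated with a small sketch. This is not the actual kcidb schema (the field names below are assumptions for illustration only); it just shows the trade-off of keeping only the fields every project can supply:

    import json

    # Illustrative only: a deliberately minimal result record of the kind a shared
    # database could accept from any CI system. Field names are assumptions, not
    # the real kcidb schema.
    def make_record(origin, kernel_revision, test_name, status):
        return {
            "origin": origin,                # which CI system submitted this
            "revision": kernel_revision,     # the kernel commit that was tested
            "test": test_name,
            "status": status,                # e.g. "pass", "fail", "skip"
            # Anything more detailed (logs, hardware, kconfig) is project-specific
            # and gets lost unless the common schema grows to cover it.
        }

    records = [
        make_record("example-ci", "v5.4-rc5", "ltp.syscalls.read01", "pass"),
        make_record("example-ci", "v5.4-rc5", "kselftest.net.udpgso", "fail"),
    ]
    print(json.dumps(records, indent=2))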

Google and Microsoft have joined the project, which has added a lot of compute power. This makes it possible to add even more config options. They want to keep on expanding the hardware. It’s quite easy to set up a lab to contribute.

The test scripts have been refactored to be more independent of Jenkins, so they can also be used in other CI pipelines.

CKI (Veronika Kabatova)

CKI is Red Hat's testing infrastructure. Its goal is to keep new bugs from getting into the kernel. It takes a number of upstream trees: stable, stable-next, arm-next, rdma, and rt-devel; for x86_64 only, it also tests scsi, net-next, and mainline. The focus is on x86_64, aarch64, ppc64{,le}, and s390x.

CKI uses GitLab CI. It is triggered from patchwork, git repos, Koji, or COPR. The tests run in Beaker; there's a separate talk about that. The pipeline is merge, build, publish, test.

KPET is a tool that analyses patches and selects tests based on what has changed. It reduces the number of tests that need to be run and so shortens the feedback loop; it also avoids false positives, because irrelevant tests are not run.
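
As a rough illustration of the idea (not KPET's actual implementation; the path patterns and test names below are invented), selecting tests for a patch can be as simple as mapping the touched paths to test suites:

    import fnmatch

    # Hypothetical mapping from kernel source paths to relevant test suites.
    PATH_TO_TESTS = {
        "fs/ext4/*": ["xfstests-ext4"],
        "net/*": ["kselftest-net", "netperf-smoke"],
        "drivers/usb/*": ["usb-smoke-test"],
    }

    def select_tests(changed_files):
        """Return the set of test suites relevant to the files touched by a patch."""
        selected = set()
        for path in changed_files:
            for pattern, tests in PATH_TO_TESTS.items():
                if fnmatch.fnmatch(path, pattern):
                    selected.update(tests)
        return selected

    # A patch touching only ext4 triggers only the ext4-related suite.
    print(select_tests(["fs/ext4/inode.c", "fs/ext4/super.c"]))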

Tests are integrated in collaboration with the developers of the tests themselves.

A future goal is to have targeted tests per subsystem.

SLAV (Pawel Wieczorek)

For SLAV, it is important that the DUT can also be used outside the test infrastructure, so DUT control plays a central role. On top of that sit a target manager to access the DUT and a test scheduler.

They developed a target manager device, but since they are a research institute, they're not able to sell it.

SLAV is a proof of concept. See the slides for a feature list. They're still polishing it, but it's now a free-time activity for the developers. They are trying to set up a MuxPi-less environment to show that it's usable for a wider audience. However, MuxPi boards can now be purchased, and the firmware of the microcontroller has been open-sourced. They're also documenting the design decisions. They're using the https://elinux.org/Test_Glossary terminology to make sure SLAV stays aligned with other projects.

Dmitry

I couldn’t follow what this was about :-(

There is a repository of 3000 reproducers of crashes.

To communicate test results, a bit of consensus is starting to emerge about how it should be done: it would be done with feeds, like the public-inbox mailing list archive.

Open Testing (Kevin Hilman, c/o Guillaume Tucker)

To have a CI system you need a long pipeline:

  • code
  • build
  • run
  • process (collect logs)
  • store
  • analyse
  • report

A problem is if one of these blocks remains proprietary.
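
As a purely hypothetical sketch of that pipeline (none of the function names or data shapes come from any of the projects discussed), each stage can be thought of as a plain function feeding the next, so that any single stage can in principle be swapped out or shared:

    # Hypothetical CI pipeline: each stage is a plain function so any of them can
    # be replaced by another implementation, or by a shared service (e.g. a common
    # results database behind the store/analyse/report stages).
    def code(tree):        return {"tree": tree, "commit": "HEAD"}
    def build(src):        return {**src, "kernel": "bzImage"}
    def run(build_out):    return {**build_out, "results": [("boot", "pass")]}
    def process(run_out):  return {**run_out, "logs": ["console.log"]}
    def store(processed):  return processed            # e.g. push to a database
    def analyse(stored):   return {"regressions": []}  # compare with history
    def report(analysis):  print("report:", analysis)

    stages = [code, build, run, process, store, analyse, report]
    data = "mainline"
    for stage in stages:
        data = stage(data)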

One of the first areas where we can share is reporting, or rather, dumping all the results together so that shared reporting becomes possible.

https://designing-for-automated-testing.readthedocs.io has a collaborative document that tracks what you need to do to make a board suitable for test automation.