A Survey of Open Source Test Definitions - Tim Bird, Sony [Automated Testing Summit 2019]

Different test frameworks define and organise tests in different ways. Tim did a survey to try to propose a somewhat standard schema that would allow tests to be exchanged between frameworks.

It is not the goal to unify all the ecosystems. If there’s a new syscall test, for example, it’s still better to add it to LTP. The long-term vision is to have a kind of test store where you can download and use existing tests.

The test definitions are used in several parts of the CI loop (a rough sketch follows the list below):

  • metadata to run the test itself
  • prerequisites
  • output parsing
  • result formatting
  • result analysis (pass/fail)
  • visualisation control
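
As a rough sketch (not any particular framework’s real format), a single test definition written in plain shell might touch each of these parts as follows; the test name hello_test and the "RESULT: OK" output marker are made up for illustration:

    #!/bin/sh
    # Metadata to run the test (often kept in a separate yaml/json file instead).
    TEST_NAME="hello_test"
    TEST_TIMEOUT=300    # seconds

    # Prerequisites: skip cleanly if they are not met.
    command -v gcc >/dev/null 2>&1 || { echo "SKIP: $TEST_NAME (gcc missing)"; exit 0; }

    # Run the test program and capture its output.
    ./hello_test > output.log 2>&1

    # Output parsing, pass/fail analysis and result formatting.
    if grep -q "RESULT: OK" output.log; then
        echo "PASS: $TEST_NAME"
    else
        echo "FAIL: $TEST_NAME"
        exit 1
    fi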

Tim looked at which files each framework uses, which languages those files are written in, and which metadata elements they contain. The frameworks he surveyed were Fuego, Linaro, 0day, Yocto, CKI, Jenkins and SLAV.

Files

Fuego uses:

  • yaml file with metadata
  • source tarball for program
  • json file with variable definitions
  • … see slides

Linaro:

  • … see slides

Yocto:

  • basically a single python file
  • sometimes more depending on the test

0day:

  • PKGBUILD with test metadata
  • install instructions in shell
  • test itself in shell
  • yaml file with metadata for execution

CKI:

  • index file with metadata, triggers, scheduling data
  • Makefile with test phases
  • metadata file is created from Makefile
  • runtest.sh to execute

Jenkins:

  • Single config.xml file that contains everything

SLAV:

  • yaml file
  • some extra files

Looking at them all together: everything uses shell at some point, many use yaml for something, and there are lots of custom file formats or languages.

Metadata elements

See the slides or the spreadsheet on the wiki for details of the different elements in the different systems.

In Linaro, the test instructions are very freeform. For example, they may include extra dependencies to pull in.

0day has a lot of info around execution control. This is probably used for scheduling, but Tim isn’t sure.

CKI has a lot of fields, but many of them are deprecated and many others are just informational.

Jenkins isn’t comparable to any of the others. Only the scheduling attributes are somewhat related. And of course it doesn’t have anything that relates to the boards themselves.

There is some information that is generally present:

  • name
  • description
  • license
  • version
  • package_version

But those are mostly informational anyway.

In execution control, the only common ones are timeout and source reference. The rest differs wildly between different schedulers.

Dependencies are also handled very differently, but there are a few commonalities: kconfig dependencies and package dependencies (though sometimes the package is just pulled in as part of the test). A problem with package dependencies is that package names differ between distros. There are two possible approaches to dependencies: either the dependency is pulled in explicitly by the test runner (e.g. during the build), or the test is only executed on a machine where the dependency is already satisfied. Root permission is a good example: one framework will just skip the test, another will sudo. Sometimes the dependency is satisfied by the test script itself, e.g. the test does the sudo. A lot of dependency information is not declarative at all, only imperative in the code itself.
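
For example, the skip-versus-escalate choice for a root requirement could look like this when it is imperative in the test script; the ROOT_POLICY variable and the need_root helper are hypothetical names, not something the surveyed frameworks share:

    # Hypothetical helper: either skip the test or escalate with sudo.
    need_root() {
        [ "$(id -u)" -eq 0 ] && return 0
        if [ "${ROOT_POLICY:-skip}" = "sudo" ]; then
            exec sudo "$0" "$@"    # re-run this test script as root
        else
            echo "SKIP: test requires root"
            exit 0
        fi
    }

    need_root "$@"
    # ... the actual test body runs here as root ...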

The commonalities in the test instructions are the cleanup, build, run and teardown phases.
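
A hedged sketch of what those phases can look like when the instructions are imperative shell; the phase-as-argument dispatch and the mytest name are assumptions, not a convention the frameworks actually agree on:

    #!/bin/sh
    # One function per phase; the runner calls the script with the phase name.
    do_cleanup()  { rm -rf build output.log; }
    do_build()    { make -C src; }
    do_run()      { ./src/mytest > output.log 2>&1; }
    do_teardown() { pkill -f mytest 2>/dev/null || true; }

    case "$1" in
        cleanup)  do_cleanup ;;
        build)    do_build ;;
        run)      do_run ;;
        teardown) do_teardown ;;
        *) echo "usage: $0 {cleanup|build|run|teardown}" >&2; exit 2 ;;
    esac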

Output parsing has almost no commonality at all; the only thing that is somewhat common is that some tests use regular expressions.
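
For instance, a regex-based parser in shell might reduce a log to pass/fail counts; the "TEST <name>: PASS/FAIL" line format is an assumption about the output, not a shared convention:

    # Count per-case results matching a hypothetical "TEST <name>: PASS|FAIL" format.
    passes=$(grep -c -E '^TEST [A-Za-z0-9_]+: PASS$' output.log)
    fails=$(grep -c -E '^TEST [A-Za-z0-9_]+: FAIL$' output.log)
    echo "passed=$passes failed=$fails"
    [ "$fails" -eq 0 ]    # overall verdict: non-zero exit if any case failed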

Results analysis (thresholds for benchmarks, which tests are allowed to fail, …), result formatting and visualisation have even less in common. Often they are simply not part of the test setup at all.
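
Where it does exist, benchmark analysis usually boils down to comparing a parsed number against a threshold, roughly like this (the metric name and threshold value are made up):

    # Compare a parsed throughput figure against a threshold.
    value=$(awk '/^throughput:/ { print $2 }' output.log)
    THRESHOLD=100
    if awk -v v="$value" -v t="$THRESHOLD" 'BEGIN { exit !(v >= t) }'; then
        echo "PASS: throughput $value >= $THRESHOLD"
    else
        echo "FAIL: throughput $value < $THRESHOLD"
    fi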

Results format

It would be nice to have a common results format, because then results could be stored in a common database. kcidb is becoming the standard here.
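
As a rough illustration only (this is not kcidb’s actual schema), a common result record would need to carry at least the test identity, the environment and the outcome; all field names and values below are made up:

    # Emit one result record as JSON from a test wrapper script.
    cat <<EOF
    {
      "test": "ltp.syscalls.openat01",
      "status": "PASS",
      "duration_s": 12.4,
      "arch": "arm64",
      "kernel": "5.3.0",
      "board": "beaglebone-black"
    }
    EOF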

Creating test definition harmony

It is clear that you can’t write converters from each infrastructure to every other one, so everyone would have to convert to a single common format. But how do you do that? For declarative formats it’s still somewhat doable, but what about code?

The next question is where to store the standard format.

The only way to get there is to try to execute other frameworks’ tests, and see how they map.

We also have to converge on some metadata aspects.

  • Converge on test phases
  • Match the naming for pre-requisites and dependencies
  • Harmonize test variable names. For example, iozone has a variable that specifies how many loops to perform. It would be nice if everyone used LOOPS for this.
  • Create a common shellscript library to perform common functions, such as starting services or extracting parts of logs. Similarly, if tests do things like installing a package, it would be nice if the function for doing that always had the same name (a sketch follows below).
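
A minimal sketch of what such a library could contain; the helper names (start_service, install_package) and the LOOPS convention are proposals for illustration, not an agreed standard:

    # common-test-lib.sh: tests would source this and call the same helpers.

    start_service() {      # hide init-system differences
        if command -v systemctl >/dev/null 2>&1; then
            systemctl start "$1"
        else
            /etc/init.d/"$1" start
        fi
    }

    install_package() {    # hide distro package-manager differences
        if command -v apt-get >/dev/null 2>&1; then apt-get install -y "$1"
        elif command -v dnf >/dev/null 2>&1; then dnf install -y "$1"
        else opkg install "$1"
        fi
    }

    # Harmonized variable name: every test that loops would honour LOOPS.
    LOOPS=${LOOPS:-10}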

Tim proposes creating a server that can host test definition objects. It is best to start with a small schema, like kcidb did. Tim has already started something like this. The goal would be a repository where you can browse, download and execute tests.
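
What consuming such a store could look like for a user, as a sketch only (the URL, archive layout and entry-point name are entirely hypothetical):

    # Fetch a test definition object from a hypothetical test store and run it.
    TEST=hello_test
    curl -fsSL "https://teststore.example.org/tests/$TEST.tar.gz" | tar xz
    cd "$TEST" && sh ./run.sh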

Yocto has a tool that creates test results in a common format.

One idea is for every project to contribute a common test in their own format, so that it at least becomes possible to compare the different projects. As it stands, it took Tim a lot of effort just to discover the metadata and other elements of the different frameworks.

Since manpower is limited, you have to prioritize. The priority should certainly be to converge on the test results, not so much on the test metadata. This actually forces people to put information into their test definitions, because test results are only useful if you also know which test was run, on which architecture, under which conditions, and so on. If the metadata in the test results is standardised, that effectively standardises the test definition metadata as well.