Different test frameworks define and organise tests in different ways. Tim did a survey to try to propose a somewhat standard schema that would allow tests to be exchanged between frameworks.
It is not the goal to unify all the ecosystems. If there is a new syscall test, for example, it is better to add it to LTP. The long-term vision is to have a kind of test store where you can download and use existing tests.
The test definitions are used in several parts of the CI loop.
Tim looked at the files, the languages and the individual elements used by Fuego, Linaro, 0day, Yocto, CKI, Jenkins and SLAV.
Looking at them all together, every framework uses shell at some point. Many use YAML for something. And there are lots of custom file formats and languages.
See the slides or the spreadsheet on the wiki for details of the different elements in the different systems.
In Linaro, the test instructions are very freeform; for example, they may include extra dependencies to pull in.
0day has a lot of info around execution control. This is probably used for scheduling, but Tim isn’t sure.
CKI has a lot of fields, but many of them are deprecated and many others are just informational.
Jenkins isn’t comparable to any of the others. Only the scheduling attributes are somewhat related. And of course it doesn’t have anything that relates to the boards themselves.
There is some information that is generally present across the systems (see the spreadsheet for the exact fields), but those fields are mostly informational anyway.
In execution control, the only common ones are timeout and source reference. The rest differs wildly between different schedulers.
Dependencies are also very different, but there are a few commonalities: kconfig dependencies and package dependencies (though sometimes the package is just pulled in as part of the test). A problem with package dependencies is that the name differs between distros. There are two possible approaches to dependencies: either the dependency is pulled in explicitly by the test runner (e.g. during the build), or the test is only executed on a machine where the dependency is already satisfied. Root permission is a similar example: one framework will simply skip the test, another will use sudo. And sometimes the dependencies are satisfied by the test script itself, e.g. the test does the sudo. A lot of dependency information is not declarative at all; it only exists imperatively in the code itself.
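As a rough illustration only, a declarative encoding of such dependencies might look like the sketch below. The field names (kconfig, packages, needs_root) are invented for this example and do not come from any existing framework; a runner could decide per dependency whether to install, skip or escalate.

```python
# Hypothetical, purely illustrative dependency block for a test definition.
# None of these field names are taken from an existing framework.
dependencies = {
    "kconfig": ["CONFIG_FTRACE=y"],      # kernel config options the test needs
    "packages": {                        # package names differ per distro,
        "debian": ["iperf3"],            # so they may have to be listed per family
        "fedora": ["iperf3"],
    },
    "needs_root": True,                  # a runner may sudo, or just skip the test
}

def can_run(dep, have_root, installed_packages, distro_family):
    """Decide whether a runner can execute the test, or should skip it."""
    if dep["needs_root"] and not have_root:
        return False                     # alternative policy: re-run under sudo
    wanted = dep["packages"].get(distro_family, [])
    return all(p in installed_packages for p in wanted)
```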
Commonalities in the test instructions are cleanup, build, run and teardown.
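To make that shared structure concrete, a minimal phase-based test definition could be sketched as below. The phase names follow the commonalities above; the test name, the shell fragments and the runner are invented for illustration.

```python
import subprocess

# Hypothetical test definition built around the phases most frameworks share.
# The shell fragments are placeholders, not a real test.
test = {
    "name": "example-benchmark",
    "phases": {
        "build":   "make -C tests/example",
        "run":     "tests/example/run.sh",
        "cleanup": "rm -rf /tmp/example-output",
    },
}

def run_test(definition):
    """Execute the phases in order; a real runner would capture output per phase."""
    for phase in ("build", "run", "cleanup"):
        cmd = definition["phases"].get(phase)
        if cmd:
            subprocess.run(cmd, shell=True, check=(phase != "cleanup"))
```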
Output parsing has no commonality at all. The only thing that is somewhat common is that some tests use regular expressions.
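For example, a parser that maps raw test output to per-test-case results with a regular expression might look like the following sketch. The output format and the regex are invented; every framework has its own.

```python
import re

# Hypothetical output format: one line per test case, e.g. "TEST mytest.read: PASS".
RESULT_RE = re.compile(r"^TEST (?P<name>\S+): (?P<status>PASS|FAIL|SKIP)$")

def parse_output(text):
    """Return a dict mapping test case name to PASS/FAIL/SKIP."""
    results = {}
    for line in text.splitlines():
        m = RESULT_RE.match(line.strip())
        if m:
            results[m.group("name")] = m.group("status")
    return results

print(parse_output("TEST mytest.read: PASS\nTEST mytest.write: FAIL\n"))
# {'mytest.read': 'PASS', 'mytest.write': 'FAIL'}
```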
Results analysis (thresholds for benchmarks, which tests are allowed to fail, …), formatting and visualisation have even less in common. Often they are simply not part of the test setup.
It would be nice to have a common results format, because then results could be stored in a common database. kcidb is becoming the standard here.
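As a sketch only: in such a common format, each test run boils down to a small record along the lines of the example below. The field names here are loosely inspired by kcidb but are not the actual kcidb schema; the real schema should be taken from the kcidb project.

```python
import json

# Illustrative result record; field names are approximations, not the real kcidb schema.
result = {
    "origin": "my-lab",                  # which CI system produced the result
    "test_path": "ltp.syscalls.read01",  # identifies which test was run
    "status": "PASS",
    "environment": {"arch": "arm64", "board": "beaglebone-black"},
    "start_time": "2019-10-31T10:00:00Z",
}

print(json.dumps(result, indent=2))
```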
It is clear that you cannot write converters from every infrastructure to every other one, so everyone would have to convert to a single format. But how do you do that? For declarative formats it is still somewhat doable, but what about code?
The next question is where to store the standard format.
The only way to get there is to try to execute other frameworks' tests and see how they map.
We also have to converge on some metadata aspects.
Tim proposes to create a server that can host test definition objects. Best to start with a small schema, like kcidb. Tim already started something like this. The goal would be to have a repository where you can browse, download and execute tests.
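A client for such a repository could be as simple as the sketch below. The server URL, the endpoint and the JSON layout are all hypothetical (Tim's actual prototype may look quite different); the downloaded object would resemble the phase-based definition sketched earlier.

```python
import json
import urllib.request

# Hypothetical endpoint of a test-definition server; not a real service.
SERVER = "http://testserver.example.org/api/tests"

def fetch_test(name):
    """Download a test definition object (JSON) by name from the hypothetical server."""
    with urllib.request.urlopen(f"{SERVER}/{name}") as resp:
        return json.load(resp)

# Usage sketch (only works if such a server actually exists):
# definition = fetch_test("example-benchmark")
# run_test(definition)   # using the phase runner sketched above
```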
Yocto has a tool that creates test results in a common format.
An idea is for every project to contribute a common test in its own format, so that it is at least possible to compare the different projects. As it is, it took Tim a lot of effort just to find out about all the metadata etc. for the different frameworks.
Since the amount of manpower is limited, you have to prioritize. The priority should certainly be to converge on the test results, not so much on the test metadata. This actually forces people to put information in their test definitions, because test results are only useful if you also know which test was run, on which arch, under which conditions, etc. So if the metadata in the test results is standardised, that also standardises the test definition metadata.