New Ways Out of the Struggle of Testing Embedded Devices - Chris Fiege, Pengutronix e.K. [Automated Testing Summit 2019]

Chris is an electrical engineer. He supports the software developers with the hardware side of the story.

The Pengutronix lab environment consists of DUTs, power supply switches, network-connected serial ports, Ethernet switches, and GPIO switches. There is also a central CAN bus to which devices can be attached, and the same applies to USB. A single test can span multiple devices. A test server sits in the same rack as the devices, together with a WiFi access point, a Bluetooth device, and additional USB devices.

The lab infrastructure contains lots of USB devices. The USB interfaces of the DUTs sometimes misbehave, but the hubs don’t always behave properly either. And because so many USB devices are connected to the server, the bus sometimes runs out of USB addresses.
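The address shortage follows from USB itself: a device address is 7 bits wide and address 0 is reserved for devices that are still being enumerated, so one bus offers at most 127 addresses, and every hub consumes one of them as well. A rough budget check for a planned rack might look like the following minimal sketch. The ports-per-hub value and the per-DUT device count are hypothetical, and real host controllers may support fewer devices than the protocol limit.

```python
import math

# USB device addresses are 7 bits wide; address 0 is reserved for devices that
# are still being enumerated, which leaves 127 addresses per bus. Hubs need an
# address just like any other device.
USB_ADDRESS_LIMIT = 127


def hubs_needed(devices: int, ports_per_hub: int = 7) -> int:
    """Rough lower bound: ignores hub ports used up by cascading hubs."""
    return math.ceil(devices / ports_per_hub)


def address_budget(devices: int, ports_per_hub: int = 7,
                   limit: int = USB_ADDRESS_LIMIT) -> tuple[int, bool]:
    """Return (addresses used, whether they fit on one bus)."""
    used = devices + hubs_needed(devices, ports_per_hub)
    return used, used <= limit


# Hypothetical rack: every DUT brings 3 USB devices
# (e.g. serial adapter, SD mux, the DUT's own USB port).
for duts in (20, 40):
    used, fits = address_budget(duts * 3)
    print(f"{duts} DUTs -> {used} addresses, fits on one bus: {fits}")
# 20 DUTs -> 69 addresses, fits on one bus: True
# 40 DUTs -> 138 addresses, fits on one bus: False
```

Because cascaded hubs eat ports of their parent hubs, the real hub count is higher than this estimate, so the check errs on the optimistic side.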

EMC becomes a problem because the lab is so compact. There are ground loops between the USB, Ethernet shield, serial, 1-wire, CAN, … buses, as well as capacitive coupling through the power supplies. As a result, problems are sometimes triggered by switching big loads, and some of the USB problems may be due to that.

The lab uses 1-wire for a lot of the switching because it is compact. This has two problems. First, 1-wire is accessed over USB, so if USB breaks down, so does 1-wire. Second, the owfs server sometimes seems to lose devices, and this is difficult to debug.
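One way to notice lost 1-wire devices early is to compare the devices owfs currently lists with the set the lab configuration expects. The sketch below assumes owfs’s default naming, where each device appears as a directory called `<family>.<id>` (two hex digits, a dot, twelve hex digits). The mount point and device IDs used with it are hypothetical.

```python
import os
import re

# Assumption: owfs default device naming "FF.IIIIIIIIIIII" (family code, dot,
# 48-bit serial number in upper-case hex). Entries such as "bus.0",
# "settings" or "uncached" do not match and are ignored.
DEVICE_NAME = re.compile(r"[0-9A-F]{2}\.[0-9A-F]{12}")


def device_ids(entries):
    """Filter a directory listing down to 1-wire device IDs."""
    return {name for name in entries if DEVICE_NAME.fullmatch(name)}


def missing_devices(mountpoint, expected):
    """Return the expected device IDs that owfs does not currently list."""
    return set(expected) - device_ids(os.listdir(mountpoint))
```

Run periodically, e.g. `missing_devices("/mnt/1wire", {"29.0123456789AB"})` with a hypothetical mount point, this turns a silently vanished switch into a logged event. Since owfs may cache directory listings, a lost device can take a while to show up as missing.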

There will be more automated testing in the future, but also more interactive work on the central test setup. Therefore, the reliability needs to increase. Some ideas to get there:

  • Decentralize the test approach, i.e. use multiple test servers and connect fewer devices to each of them. Even better, use an embedded device as the test server, so that real hardware interfaces can be used for GPIO, 1-wire and CAN. The impact of a single problem is then much smaller. Such a small controller should cost $200-300.
  • Ethernet doesn’t cause many problems, so it can stay centralized.
  • There should still be a centralized test server that is x86_64.
  • A drawback of having so many test servers is that scalability problems pop up: updating the OS, and the reliability of power supplies and storage.
  • Replace 1-wire and (some) USB with a CAN bus. This allows galvanic isolation between the bus and the DUT. However, it adds complexity because a microcontroller is needed on each node. Also, some protocol has to be run on top of the CAN bus.
  • Currently, the DUT’s power is switched together with the PSU supplied by the customer. However, this means switching high loads, which may be the source of some of the problems. It would therefore be better to switch the 5/12V supply on the device itself. This also has the advantage that the inrush current can be controlled, and it allows current and voltage to be measured.
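The protocol on top of CAN does not need to be complicated for simple switching nodes. The following minimal sketch defines a made-up frame layout (it is not CANopen or any other existing protocol) that packs a command and a node address into an 11-bit standard identifier and carries the output number and its state in the payload. All names and numbers are hypothetical.

```python
import struct
from dataclasses import dataclass

# Hypothetical frame layout, invented for illustration:
#   11-bit standard CAN ID = command (5 bits) << 6 | node_id (6 bits)
#   payload                = output index (1 byte), state (1 byte)
CMD_SET_OUTPUT = 0x01
NODE_BITS = 6
MAX_NODE_ID = (1 << NODE_BITS) - 1  # 63


@dataclass(frozen=True)
class CanFrame:
    can_id: int
    data: bytes


def encode_set_output(node_id: int, output: int, on: bool) -> CanFrame:
    """Build a frame asking node `node_id` to switch `output` on or off."""
    if not 0 <= node_id <= MAX_NODE_ID:
        raise ValueError(f"node_id must be 0..{MAX_NODE_ID}")
    if not 0 <= output <= 0xFF:
        raise ValueError("output must fit in one byte")
    can_id = (CMD_SET_OUTPUT << NODE_BITS) | node_id
    return CanFrame(can_id, struct.pack("BB", output, int(on)))


def decode(frame: CanFrame) -> tuple[int, int, int, bool]:
    """Split a frame back into (command, node_id, output, on)."""
    command = frame.can_id >> NODE_BITS
    node_id = frame.can_id & MAX_NODE_ID
    output, state = struct.unpack("BB", frame.data[:2])
    return command, node_id, output, bool(state)
```

Because CAN arbitration lets the lowest identifier win, putting the command in the upper bits gives lower-numbered commands priority across all nodes. On Linux, such frames could be sent through SocketCAN, for example with the python-can library. That part is left out here.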

To distinguish between a hardware failure and a test failure, the test infrastructure should check for some known-good state. LAVA and labgrid have something like this, but it doesn’t cover everything, e.g. a flaky USB device.
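One way to separate infrastructure problems from real test failures is to run known-good checks before the test and, if the test fails, again afterwards. A check that only fails after the test (such as a USB serial adapter that disappeared mid-run) points at the lab rather than at the software on the DUT. The sketch below is framework-agnostic and does not use the LAVA or labgrid APIs. The checks themselves are hypothetical callables supplied by the caller.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple

Check = Callable[[], bool]


class Outcome(Enum):
    PASS = "pass"
    TEST_FAILURE = "test failure"
    INFRA_FAILURE = "infrastructure failure"


def failed_checks(checks: Dict[str, Check]) -> List[str]:
    """Names of all known-good checks that currently fail."""
    return [name for name, check in checks.items() if not check()]


def run_classified(checks: Dict[str, Check],
                   test: Callable[[], bool]) -> Tuple[Outcome, List[str]]:
    # Known-good state before the test: if it is broken, don't run the test.
    before = failed_checks(checks)
    if before:
        return Outcome.INFRA_FAILURE, before
    if test():
        return Outcome.PASS, []
    # The test failed: re-check the infrastructure to catch flaky hardware
    # (e.g. a USB device that dropped off the bus during the run).
    after = failed_checks(checks)
    if after:
        return Outcome.INFRA_FAILURE, after
    return Outcome.TEST_FAILURE, []
```

The checks could be things like "serial console device node exists" or "power switch answers". Re-checking only after a failure keeps the cost low for passing runs.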

The audience contributed some ideas of what this test hardware should include:

  • Power switches
  • Isolated Ethernet (the link can be brought down, put on the bridge, or isolated). However, a central managed switch can also do this. The labgrid setup already controls the VLANs of the different ports, and link control could be added to that via SNMP.
  • CAN bus seems to be more interesting than capes for extending the setup with specialised equipment (ADC, temperature control, …).
  • Sometimes high-speed extension devices are needed. USB3 can be used for that. But then of course the host needs to support USB3.
  • For some use cases, the host needs to be an x86_64.
  • An SDmux controlled directly (without USB) would be useful as well.
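Link control through a managed switch can use the standard IF-MIB object `ifAdminStatus` (OID 1.3.6.1.2.1.2.2.1.7, indexed by the port’s `ifIndex`), where 1 means up and 2 means down. The following minimal sketch builds and runs a Net-SNMP `snmpset` command. The switch host name, community string and `ifIndex` are hypothetical, the mapping from front-panel port to `ifIndex` is switch-specific, and this is not how labgrid itself implements it.

```python
import subprocess
from typing import List

# IF-MIB::ifAdminStatus; the port's ifIndex is appended to the OID.
IF_ADMIN_STATUS_OID = "1.3.6.1.2.1.2.2.1.7"
ADMIN_UP = 1
ADMIN_DOWN = 2


def link_command(host: str, community: str, if_index: int, up: bool) -> List[str]:
    """Build a Net-SNMP snmpset call (SNMPv2c) that sets a port up or down."""
    value = ADMIN_UP if up else ADMIN_DOWN
    return ["snmpset", "-v2c", "-c", community, host,
            f"{IF_ADMIN_STATUS_OID}.{if_index}", "i", str(value)]


def set_link(host: str, community: str, if_index: int, up: bool) -> None:
    """Run snmpset; raises CalledProcessError if the command fails."""
    subprocess.run(link_command(host, community, if_index, up), check=True)
```

For example, `set_link("switch.lab.example", "private", 5, up=False)` would take a (hypothetical) DUT port down. SNMPv2c sends the community string in clear text, so SNMPv3 may be preferable on a shared network.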