Power Debugging with JTAG - Patrick Titiano & Alexandre Bailon, BAYLIBRE [Open Source Summit EU 2018]

With JTAG, it is possible to do non-intrusive, realtime, OS-agnostic, multi-architecture profiling.


Start with a demo: realtime visualisation of CPU load and CPU bus accesses. For this, no modifications to the target application code or kernel are needed.

Access to profiling information should be non-intrusive: it should not interrupt the CPU execution flow or change power states. But most of the available monitoring tools run on the target itself (ftrace, perf, powertop, …).

So, we want everything to run on a host instead of on the target. We don’t want to rebuild the code for profiling. We want a generic way to describe the SoC so that generic debugging and visualisation tools can be developed, much like the device tree does for configuring the kernel.

libSoCCA enables non-intrusive access to SoC registers through JTAG. It abstracts the platform through SVD files. The target OS is not involved at all, and the host can be anything since libSoCCA is written in Python.

Why JTAG? It’s the only existing solution for non-intrusive access. Most SoCs support it. OpenOCD is a generic way to use it.

The SVD (System View Description) file is an XML description of the SoC registers. With the SVD file, it’s possible to make a generic visualiser that accesses registers and their bitfields by name.
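As a rough illustration of what such a description enables, the sketch below parses a tiny SVD-style fragment and builds a name-to-address map. The peripheral, register and field names are hypothetical, and `load_register_map` is an invented helper, not part of libSoCCA. Only the `bitOffset`/`bitWidth` form of field description is handled here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed SVD fragment; real files describe many peripherals.
SVD_TEXT = """
<device>
  <name>EXAMPLE_SOC</name>
  <peripherals>
    <peripheral>
      <name>TIMER0</name>
      <baseAddress>0x40010000</baseAddress>
      <registers>
        <register>
          <name>CTRL</name>
          <addressOffset>0x08</addressOffset>
          <fields>
            <field>
              <name>ENABLE</name>
              <bitOffset>0</bitOffset>
              <bitWidth>1</bitWidth>
            </field>
            <field>
              <name>PRESCALER</name>
              <bitOffset>4</bitOffset>
              <bitWidth>3</bitWidth>
            </field>
          </fields>
        </register>
      </registers>
    </peripheral>
  </peripherals>
</device>
"""


def load_register_map(svd_text):
    """Return {"PERIPH.REG": (address, {field: (bit_offset, bit_width)})}."""
    root = ET.fromstring(svd_text.strip())
    regmap = {}
    for periph in root.iter("peripheral"):
        base = int(periph.findtext("baseAddress"), 0)
        for reg in periph.iter("register"):
            # Absolute address = peripheral base + register offset.
            address = base + int(reg.findtext("addressOffset"), 0)
            fields = {
                f.findtext("name"): (
                    int(f.findtext("bitOffset"), 0),
                    int(f.findtext("bitWidth"), 0),
                )
                for f in reg.iter("field")
            }
            key = f"{periph.findtext('name')}.{reg.findtext('name')}"
            regmap[key] = (address, fields)
    return regmap


regmap = load_register_map(SVD_TEXT)
address, fields = regmap["TIMER0.CTRL"]
print(hex(address), fields)
```

A visualiser built on such a map only needs the name `TIMER0.CTRL`; the address and bit positions come from the description file.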

In this case, we want to access the memory without stopping the CPU, rather than using JTAG to insert a breakpoint as is usually done. libSoCCA abstracts this interface, so other access methods are possible as well in case JTAG is not available. With the SVD files, registers and bitfields can be accessed by name from Python.
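The idea of hiding the access method behind an interface can be sketched as follows. This is not libSoCCA's actual API: `FakeBackend`, `read32` and `read_field` are hypothetical names, and the fake backend simply returns canned values where a real one would issue a memory read over JTAG.

```python
class FakeBackend:
    """Stand-in for a JTAG (or other) transport; returns canned 32-bit words."""

    def __init__(self, memory):
        self.memory = dict(memory)

    def read32(self, address):
        return self.memory[address]


def read_field(backend, address, bit_offset, bit_width):
    """Read a register through the backend and extract one bitfield."""
    word = backend.read32(address)
    mask = (1 << bit_width) - 1
    return (word >> bit_offset) & mask


# Hypothetical register at 0x40010008 holding 0x51 (binary 0101 0001).
backend = FakeBackend({0x40010008: 0x00000051})
prescaler = read_field(backend, 0x40010008, bit_offset=4, bit_width=3)
enable = read_field(backend, 0x40010008, bit_offset=0, bit_width=1)
print(prescaler, enable)  # prints "5 1"
```

Because `read_field` only depends on `read32`, swapping JTAG for another transport would not change the code that decodes bitfields.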

An architecture abstraction layer gives a generic interface to subsystems like the clocks and the PMU. Currently supported architectures are ARMv7 and ARMv8, on the Amlogic S905X, NXP i.MX7ULP and STM32F4.

PMUgraph is an application that uses the PMU to plot a live view of the CPU load and memory accesses. The overhead is limited to 480 bytes per second over the interconnect for a 10Hz update rate (on a 400MByte/s bus). Even compared to the 500KBps JTAG speed, this is negligible.
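To put those figures in proportion, here is the arithmetic spelled out, assuming decimal units (1 MByte = 10^6 bytes, 1 KB = 10^3 bytes), which the talk did not specify.

```python
OVERHEAD_BYTES_PER_S = 480
UPDATE_RATE_HZ = 10
BUS_BYTES_PER_S = 400e6   # assumption: 1 MByte = 10**6 bytes
JTAG_BYTES_PER_S = 500e3  # assumption: 1 KB = 10**3 bytes

# Bytes transferred for each refresh of the plot.
bytes_per_update = OVERHEAD_BYTES_PER_S / UPDATE_RATE_HZ

# Fraction of the available bandwidth consumed by the polling.
bus_share = OVERHEAD_BYTES_PER_S / BUS_BYTES_PER_S
jtag_share = OVERHEAD_BYTES_PER_S / JTAG_BYTES_PER_S

print(f"{bytes_per_update:.0f} bytes per update")
print(f"{bus_share:.6%} of the bus bandwidth")
print(f"{jtag_share:.3%} of the JTAG bandwidth")
```

That works out to 48 bytes per update, roughly 0.00012% of the bus bandwidth and about 0.1% of the JTAG link.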

A clock tool is under development to snapshot the clock states. This allows realtime clock tree visualisation, but also realtime clock control and detecting power state changes using watchpoints.

The dream is to have a single tool that can snapshot all state: power consumption, CPU load, …

Problems encountered while doing this:

  • ARM CoreSight is not very well documented for anything other than normal debugging of the CPU. Instead, there is an overload of information that is not relevant. Not all SoCs turn out to have the same debugging features. For example, the OMAP3 doesn’t allow reading the PMU directly from JTAG; it has to be done by stopping the CPU and using CP15.
  • Not many platforms provide SVD files. In addition, the files that exist are incomplete, e.g. the PMU is not described. SVD files don’t support include directives, so they wrote a tool to merge SVD files.
  • OpenOCD is not always easy to set up. The telnet API is not really machine-accessible because warning messages are mixed with normal output. The OpenOCD Python library has some shortcomings, e.g. it doesn’t handle async events in a race-free way. It should be rewritten using a different API than telnet.
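The mixed-output problem can be illustrated with a small sketch that picks memory-dump lines out of a raw reply. It assumes `mdw` results look like an address, a colon, then 8-digit hex words. The warning text and the `parse_mdw` helper are invented for illustration. This is only a filter: it does not address the race on async events mentioned above.

```python
import re

# Assumed shape of a result line: "0x40010008: 00000051" (address, then hex words).
MDW_LINE = re.compile(r"^(0x[0-9a-fA-F]+):((?:\s+[0-9a-fA-F]{8})+)\s*$")


def parse_mdw(raw_output):
    """Return {address: [words]} from raw output, skipping unrelated lines."""
    result = {}
    for line in raw_output.splitlines():
        match = MDW_LINE.match(line.strip())
        if match:
            address = int(match.group(1), 16)
            result[address] = [int(w, 16) for w in match.group(2).split()]
    return result


# Hypothetical captured reply: the warning line is invented for illustration.
raw = (
    "Warn : example warning interleaved with the reply\n"
    "0x40010008: 00000051 \n"
    "> "
)
print(parse_mdw(raw))
```

Pattern matching like this is fragile, since any unrelated line that happens to match would be misread, which is one reason a structured API is preferable to scraping telnet output.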

Future work:

  • Support for watchpoints/breakpoints to reduce the amount of polling required.
  • Integrate in CI framework.
  • libSoCCA should be reentrant so there can be multiple users of the JTAG interface.
  • Documentation, more SoCs, more applications.