Why Embedded Cameras are Difficult, and How to Make Them Easy - Laurent Pinchart, Ideas on Board [Open Source Summit EU 2018]

libcamera is a new project to make dealing with cameras easier.

Video4Linux (V4L2) is the kernel API for cameras. The userspace API was modelled on PC hardware. When embedded systems started being supported, the architectures were pretty simple, basically a single pipeline. But then hardware became more complex, with many possible processing blocks. So exposing a camera as a simple device was not enough anymore; userspace needs more fine-grained control over the components: the media controller API. But this made things more complicated for userspace as well: the application has to interact with several device nodes, not just one, each exposing a large number of ioctls. So now userspace needs some abstraction to avoid re-implementing applications for every platform.
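To picture the simple single-device model the talk contrasts with: a capture application only talks to one /dev/video* node through a handful of ioctls. A minimal sketch (the device path, resolution and format are arbitrary examples, error handling is omitted):

    // Classic single-node V4L2 capture: one device, a few ioctls.
    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <linux/videodev2.h>

    int main()
    {
        int fd = open("/dev/video0", O_RDWR);

        v4l2_format fmt{};
        fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        fmt.fmt.pix.width = 1280;
        fmt.fmt.pix.height = 720;
        fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_YUYV;
        ioctl(fd, VIDIOC_S_FMT, &fmt);          // negotiate resolution/format

        v4l2_requestbuffers req{};
        req.count = 4;
        req.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        req.memory = V4L2_MEMORY_MMAP;
        ioctl(fd, VIDIOC_REQBUFS, &req);        // allocate capture buffers

        int type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        ioctl(fd, VIDIOC_STREAMON, &type);      // start streaming
        // ... mmap the buffers, then VIDIOC_QBUF / VIDIOC_DQBUF per frame ...
        return 0;
    }

With a media-controller pipeline, the same application would additionally have to open the media device and find, configure and link the subdevice nodes, which is exactly the complexity that needs to be abstracted.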

libv4l emulates the V4L2 interface and supports plugins for different types of platforms. But not much has been developed there, and it would not have been enough anyway: the complexity is indeed hidden from the application, but it offers no way to use the new features of the hardware.

libcamera is a new library, still in the early stages of development. There is an architecture but not much more than that. It will offer interfaces for different framework integrations, e.g. GStreamer, V4L2 compatibility and the Android Camera HAL, as well as a new C++-based native API (possibly with bindings for other languages). The V4L2 compatibility layer allows you to use the camera in pretty much the way you can today: just capture a stream, set the resolution, things like that. The Android Camera API does expose many of the features that cameras have today; currently vendor implementations reinvent the wheel all the time. The idea is that there could be a standard implementation of the Android Camera API on top of libcamera, and that the vendor-specific parts move into libcamera. That way, they are available not just in Android but also to other applications.

The central concept in libcamera is the camera device. This is how the user sees it, so a phone would have two cameras (even if the back camera actually has two sensors). libcamera can enumerate the cameras; the underlying device nodes are hidden from libcamera users. Each camera device exposes capabilities (i.e. features). There will be a very large number of capabilities, in the hundreds. So to make it easier for libcamera users, there will be profiles: even if some capability is not supported, the profile might be able to emulate it.
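Since the native API does not exist yet, the following is only an illustrative sketch of the concept; every name in it (CameraManager, Camera, the capability strings) is an assumption, not a real interface:

    #include <memory>
    #include <set>
    #include <string>
    #include <vector>

    struct Camera {
        std::string name;                    // "back", "front": what the user sees
        std::set<std::string> capabilities;  // potentially hundreds of features

        // A profile would group capabilities and could emulate missing ones.
        bool supports(const std::string &cap) const
        {
            return capabilities.count(cap) != 0;
        }
    };

    struct CameraManager {
        // Enumerate camera devices; the /dev/video* and media controller
        // nodes behind them stay hidden from the application.
        std::vector<std::shared_ptr<Camera>> cameras() const
        {
            // A real implementation would probe the system; this stands in
            // for a phone with two user-visible cameras.
            auto back = std::make_shared<Camera>(Camera{"back", {"still-capture"}});
            auto front = std::make_shared<Camera>(Camera{"front", {}});
            return {back, front};
        }
    };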

Out of the camera come different streams, e.g. a live stream and a still capture at a different resolution. For all streams, there will be per-frame controls (e.g. exposure time, flash). A problem with the current V4L2 API is that the controls are decoupled from the stream, so you can't predict which frame a control will apply to. This makes it very hard to do things like AWB or AE. Per-frame controls synchronise the controls with the stream.
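One way to picture per-frame controls is as a capture request that bundles a buffer with the controls that must apply to exactly that frame. This is a hypothetical sketch, not the actual API:

    #include <cstdint>
    #include <map>
    #include <string>

    struct Buffer {};   // memory for one captured frame

    struct Request {
        Buffer *buffer = nullptr;
        // Controls for this frame only, e.g. {"exposure-time", 10000} or
        // {"flash", 1}. With plain V4L2 the same controls are set on the
        // device node and land on "some" future frame, which is what breaks
        // AE/AWB feedback loops.
        std::map<std::string, std::int64_t> controls;
    };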

libcamera will include algorithms for AE, AWB and AF. Currently these are closed-source implementations from vendors, sometimes implemented on the camera itself, but often also as closed-source blobs running on the SoC. The idea is to support those binary blobs as well, but also to create an ecosystem that allows those algorithms to be implemented in open source.

Within a camera device, a large part is device-agnostic, in contrast to the completely closed-source Camera HAL implementations that vendors have now. Vendors can insert vendor-specific components: image processing algorithms and a pipeline handler.

The Camera Devices Manager keeps track of camera devices and handles hotplugging and enumeration.
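Its role can be sketched as an object the application registers hotplug callbacks with; the names below are invented for illustration only:

    #include <functional>
    #include <memory>
    #include <utility>
    #include <vector>

    struct Camera;

    class CameraDeviceManager {
    public:
        using HotplugHandler = std::function<void(std::shared_ptr<Camera>)>;

        // Register callbacks fired when a camera appears or disappears.
        void onCameraAdded(HotplugHandler h) { added_.push_back(std::move(h)); }
        void onCameraRemoved(HotplugHandler h) { removed_.push_back(std::move(h)); }

    private:
        std::vector<HotplugHandler> added_;
        std::vector<HotplugHandler> removed_;
    };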

The pipeline handler of a camera device sets up the pipeline on the SoC. Typically, the stream comes in from the sensor over e.g. CSI, goes into memory, and from there the ISP processes it into multiple output streams and statistics. That requires buffer handling, configuring the pipeline and scheduling the streams, which is done by the pipeline handler. It can be really simple if the ISP is simple, but it can become more complicated. So it is vendor specific, but the concepts of buffer handling and scheduling are generic. The pipeline handler will be open source; camera vendors will have to upstream it.
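The split could be pictured as a small vendor-implemented interface behind the generic core; all names here are invented for illustration:

    #include <vector>

    struct StreamConfiguration;   // resolution, format and role of one stream
    struct Request;               // buffers plus per-frame controls

    class PipelineHandler {
    public:
        virtual ~PipelineHandler() = default;

        // Program the sensor, CSI receiver and ISP for the requested streams.
        virtual bool configure(const std::vector<StreamConfiguration> &streams) = 0;

        // Queue one request; the handler schedules it through the pipeline
        // and fills the output buffers and the statistics.
        virtual void queueRequest(Request *request) = 0;
    };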

There will be a 3A API to communicate between the generic libcamera core and a vendor-specific (closed-source) component for the image processing algorithms. However, this is untrusted vendor code, so that component will be sandboxed, using an IPC mechanism and a separate process with seccomp limitations. The same API will exist on both sides of the IPC, so for the vendor it is invisible. The image processing algorithms will not have direct access to the devices. This way, we can see how they use the devices (because everything has to go through libcamera), and they can't hide unfair hardware access behind an open source camera driver.
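The "same API on both sides of the IPC" idea is essentially a proxy: the core calls an abstract 3A interface, and a proxy implementation forwards the call to the sandboxed vendor process. A sketch with invented names and a placeholder body:

    struct Statistics {};       // ISP statistics for one frame
    struct ControlValues {};    // exposure, gains, ... computed for a later frame

    class Algorithms3A {
    public:
        virtual ~Algorithms3A() = default;
        // Feed statistics for one frame, get back controls to apply.
        virtual ControlValues process(const Statistics &stats) = 0;
    };

    // Lives in the libcamera process: it serialises the call, sends it to the
    // separate, seccomp-restricted vendor process over a socket and returns
    // the deserialised reply. The vendor code never touches device nodes.
    class Algorithms3AProxy : public Algorithms3A {
    public:
        ControlValues process(const Statistics &stats) override
        {
            // Placeholder for the real IPC round trip.
            (void)stats;
            return {};
        }

    private:
        int ipc_fd_ = -1;   // IPC channel to the sandboxed helper process
    };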

For V4L2 compatibility, there will be an LD_PRELOAD library that emulates the ioctls. This has to be fully transparent to support closed-source applications.
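The standard trick is a shim that interposes on ioctl(): V4L2 requests (type 'V') go to the emulation layer, everything else is forwarded to the real implementation. A minimal sketch, where handle_v4l2() is only a placeholder for the emulation:

    #ifndef _GNU_SOURCE
    #define _GNU_SOURCE          // for RTLD_NEXT
    #endif
    #include <cstdarg>
    #include <dlfcn.h>
    #include <linux/videodev2.h>
    #include <sys/ioctl.h>

    static bool handle_v4l2(int fd, unsigned long request, void *arg)
    {
        // Placeholder: translate the V4L2 request into libcamera operations.
        (void)fd; (void)request; (void)arg;
        return false;            // not handled, fall through to the kernel
    }

    extern "C" int ioctl(int fd, unsigned long request, ...)
    {
        va_list ap;
        va_start(ap, request);
        void *arg = va_arg(ap, void *);
        va_end(ap);

        // Emulate V4L2 requests, forward everything else to the real ioctl().
        if (_IOC_TYPE(request) == 'V' && handle_v4l2(fd, request, arg))
            return 0;

        using ioctl_fn = int (*)(int, unsigned long, ...);
        static ioctl_fn real_ioctl =
            reinterpret_cast<ioctl_fn>(dlsym(RTLD_NEXT, "ioctl"));
        return real_ioctl(fd, request, arg);
    }

Built as a shared object and loaded with LD_PRELOAD, an unmodified, even closed-source, V4L2 application never notices it is talking to libcamera.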