After a few years, it’s time for an update on what has happened in V4L2 land. There are two major topics: support for hardware codecs, and testing.
See the slides for various links and resources.
Hardware codecs are interesting because they involve a lot of different companies. There are two types of codec: stateful (the codec keeps its own state from one frame to the next and parses the bytestream itself) and stateless (userspace has to hand it the parsed data and the previous frames). The state is needed for predicted frames, i.e. P frames and B frames. Stateful codecs have already been supported for a long time. The main new thing is that the details are accurately documented now, e.g. how to do seeks and how to drain the decoder. Several drivers have been updated to follow this specification.
A vicodec driver has been written to test the API. This is a virtual stateful codec without underlying hardware.
Stateless codecs don’t have their own memory: the user has to provide the parsed bytestream and the reference frames, so the parsing is done in userspace. No stateless encoders exist for now, but they will appear in the future. The work to document the API for stateless codecs is ongoing. A stateless vicodec driver exists as well, developed in an Outreachy project.
Stateless codecs use a new framework, the Request API. There are currently two drivers: cedrus for Allwinner, and hantro for Rockchip and i.MX8. All of this is in staging until the API is mature. The control definitions are also not yet in the public uapi headers, because they will still change. For now this is tested with MPEG-2, H.264 and VP8.
The Request API is the main reason why it took so long to get stateless codec support. Hans had already implemented a Configuration Store API for ChromeOS, which was more geared towards setting brightness, contrast etc. on a per-frame level. The basic idea is to create a request object and associate configuration data and video buffers with it. The request is queued, and the driver programs it into the hardware. Currently it is only used by codecs, but it can be used for complex camera pipelines as well. For cameras, however, the API still needs to be improved; there’s no target date for this.
So stateless decoders expect the state to be set through controls. The controls depend on the codec type (e.g. MPEG-2, H.264), but are not hardware specific. This is possible because the standard specifies exactly what the bytestream contains. For encoders there can be more hardware variation, so more controls are possible, but there are no stateless encoders for the time being.
The userspace API uses two device nodes:
/dev/mediaX for the Requests and
/dev/videoX for the video buffers.
Userspace has to parse the headers out of the bytestream. This happens in userspace because parsing is risky (it may trigger buffer overflows) and hard, e.g. due to packet loss. There is also policy involved, so you want to be able to customize it. This is why people who write decoder software prefer stateless decoders: they get more control.
So when userspace parses 10 frames out of the bytestream, it creates 10 output video buffers. These are given to the decoder in decode order rather than display order: first the I frame, then the P frame, and then the B frames that come between them. The output buffer has a flag to say it is linked to a request, and its timestamp field carries a tag instead of a real timestamp, to link it to the specific request.
Userspace queues the result buffer (capture buffer) on the video device, and the request (with the encoded data) on the media device. It then waits for a signal on the media device that the request has completed, and dequeues both the decoded frame from the video device and the spent output buffer that held the encoded data. For H.264 it is a bit more complicated because it supports slicing (partial frames). This improves latency, because you can already get a partially decoded frame before the entire bytestream has been read in. Some stateless decoders decode each slice into a separate buffer, but others require that the same buffer is used for subsequent slices.
Is there a difference in performance between stateless and stateful decoders? It’s impossible to say in general; it depends a lot on the hardware, of course. It would be great if we could also bypass the parsing in the stateful codecs and use them as stateless ones.
Zerocopy is supported to some extent. The headers have to be parsed in any case. The encoded frame data is passed in as a pointer, so it can be zerocopied. However, drivers may have requirements that trigger an extra copy in the kernel.
Finally, there has been a lot of work on the virtual drivers, which are used for various kinds of testing. vivid is run as part of kernelci. The compliance tests can still be improved a lot, and documentation has become a focus. All of this is very useful for finding corner cases. syzkaller also runs on the virtual drivers.