Deferred Problem: Issues With Complex Dependencies Between Devices in Linux Kernel - Andrzej Hajda, Samsung [Open Source Summit EU 2018]

The kernel tries to represent devices in a nice tree, but this is not always accurate.

Some hardware has a single interface (PCI, USB): all signals go over that one bus. Power and clocks don’t need to be supplied separately. Such devices are usually discoverable as well. This forms a nice and easy tree.

A second class of devices, however, has multiple interfaces: power supply, clock lines, GPIOs (interrupts), data buses. The data bus may even be absent. This requires close cooperation between connected devices. So it is no longer a tree: it becomes a graph, and not necessarily an acyclic one.

An example is an MHL 3.0 transmitter, which converts HDMI to MHL. It has an I2C (slave) interface, 10 power lines connected to different power supplies (typically 2), an interrupt line, a clock line, a reset line, TMDS and MHL lines for the data, and a hotplug line to detect plugging. In addition it is also an I2C slave for DDC, an SPI slave, has HSIC for tunneling USB over MHL, and has 9 GPIOs.

The kernel has the concept of a device (representing a physical entity), a driver (software to operate a device) and a bus (a device to which other devices can be attached). To match a driver to a device: every time a device is registered, it is matched against registered drivers. Every time a driver is registered, it is matched against devices which are not bound yet. Both devices and drivers may be registered or unregistered at any time. Similarly, drivers may be bound/unbound at any time (through sysfs for example).
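The matching rules above can be sketched as a toy model in Python. This is not kernel code: the `Bus` class and the device and driver names are hypothetical, and matching is reduced to a name lookup, whereas real buses match on IDs or compatible strings.

```python
class Bus:
    """Toy model of the driver core's matching rules (not kernel code)."""

    def __init__(self):
        self.devices = {}   # device name -> bound driver name, or None
        self.drivers = {}   # driver name -> set of device names it supports

    def _try_bind(self, device):
        # Bind the first registered driver that supports an unbound device.
        if self.devices[device] is not None:
            return
        for driver, supported in self.drivers.items():
            if device in supported:
                self.devices[device] = driver
                return

    def register_device(self, device):
        # A new device is matched against all registered drivers.
        self.devices[device] = None
        self._try_bind(device)

    def register_driver(self, driver, supported):
        # A new driver is matched against all devices that are still unbound.
        self.drivers[driver] = set(supported)
        for device in self.devices:
            self._try_bind(device)

    def unbind(self, device):
        # Manual unbind, comparable to unbinding a driver through sysfs.
        self.devices[device] = None


bus = Bus()
bus.register_device("panel")                  # no driver yet: stays unbound
bus.register_driver("panel-drv", ["panel"])   # binds the waiting device
bus.register_driver("hdmi-drv", ["hdmi"])     # nothing to bind yet
bus.register_device("hdmi")                   # binds immediately
```

Either registration order ends with both devices bound, which is the point of matching on both events.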

A resource is something that is provided by a provider (usually a device) and needed by a consumer (usually another device) to be able to operate. Resources include clocks, power, GPIOs and interrupts, but also bridges, panels (i.e. displays), … Sometimes a resource is optional, i.e. the device still works without it but loses some functionality (e.g. a reset line).

When a device is bound to a driver, it is probed. Usually, all the resources are gathered at that time. However, the provider may not have been probed yet at that point. A resource can also disappear when its provider's driver is unbound (this currently leads to a crash). A consumer can disappear too, but that case is handled (the consumer releases its resources). If an optional resource is not present during probe, you have to poll for it to be able to start using it if it appears later. Also, before a device is probed, it is not possible to determine which resources it needs. Finally, there can be semi-circular dependencies where two devices use resources from each other. To handle that, resource gathering has to be split into phases: each device gathers the resources it can acquire and provides the resources it can provide with those. But again, that would require some kind of polling because the probe function has already finished. Note that this is the same problem as with an optional resource.

To solve the probe order problem, there are a few options. You can rely on the initcall stages, or on the order in which things are specified in the device tree, so that the providers are registered before the consumers. However, that is very fragile (e.g. it doesn’t work with modules) and is a nightmare to maintain (e.g. there is a limited number of initcall stages).

Deferred probe is another solution. During initcall, the probe function can ask to be deferred (by returning -EPROBE_DEFER). The device then remains unbound, the kernel tries to bind it again in late initcall, and this can be retried multiple times. This is obviously suboptimal: devices are reprobed multiple times, which causes significant delays. It doesn’t handle resource disappearance or optional resources. If the resource really isn’t available, the probe will be re-attempted every time a new device or driver is registered, indefinitely.
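The cost of reprobing can be illustrated with a small simulation. This is a simplified sketch, not the kernel's algorithm: the `run_probes` helper and the device names are made up, and the retry loop simply re-runs the deferred list in passes for as long as a pass makes progress. Only the errno value 517 for `EPROBE_DEFER` is taken from the kernel.

```python
EPROBE_DEFER = 517  # value of the kernel's EPROBE_DEFER errno


def run_probes(order, needs):
    """Probe devices in `order`; `needs` maps a device to its providers.

    Returns (devices in bind order, number of probe calls,
    devices that are still deferred at the end).
    """
    bound, deferred, calls = [], [], 0

    def probe(dev):
        nonlocal calls
        calls += 1
        # A probe succeeds only if every provider is already bound.
        if all(p in bound for p in needs.get(dev, [])):
            bound.append(dev)
            return 0
        return -EPROBE_DEFER

    for dev in order:
        if probe(dev) == -EPROBE_DEFER:
            deferred.append(dev)

    # Retry the deferred list as long as a pass makes progress.
    progress = True
    while deferred and progress:
        progress = False
        for dev in list(deferred):
            if probe(dev) == 0:
                deferred.remove(dev)
                progress = True
    return bound, calls, deferred


needs = {"panel": ["bridge"], "bridge": ["regulator"], "regulator": []}

# Worst registration order: consumers first.
bound, calls, deferred = run_probes(["panel", "bridge", "regulator"], needs)

# Best registration order: providers first.
_, best_calls, _ = run_probes(["regulator", "bridge", "panel"], needs)

# A provider that never shows up: the consumer stays deferred.
_, _, stuck = run_probes(["panel"], {"panel": ["bridge"]})
```

In this model the worst order needs 6 probe calls for 3 devices, against 3 calls for the best order, and `panel` in the last run is never bound.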

To deal with resources disappearing, you can simply prevent the provider from disappearing, e.g. by using module refcounting - but then its driver can still be unbound, so you have to disallow unbinding as well.

Another solution is the device link concept. This explicitly models the link between a consumer and a provider. The driver core guarantees that the consumer is not probed until the provider is ready and, vice versa, that the consumer is unbound before the provider is. This is still quite straightforward, and works well with runtime PM. However, it doesn’t solve the optional resource or semi-circular dependency problems (there is still a single probe). It also still requires deferring the probe (so reprobing). It also only works for resources that are associated with a device, but some resources aren’t. The link also has to be created by someone. Also, the resource is created before the provider's probe function has finished, so the consumer can start using it before the provider's probe function has finished. Does the provider guarantee that it can operate correctly if its probe function has not finished yet? This breaks the rule that a consumer is not probed before its provider's probe has finished.
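The two ordering guarantees of device links can be shown with a toy model. This is a sketch and not the kernel API: the `LinkedDevices` class, its methods and the device names are hypothetical.

```python
class LinkedDevices:
    """Toy model of consumer/supplier links (not the kernel API)."""

    def __init__(self):
        self.suppliers = {}    # consumer -> set of its suppliers
        self.bound = []        # devices that are currently bound
        self.unbound_log = []  # order in which devices were unbound

    def add_link(self, consumer, supplier):
        self.suppliers.setdefault(consumer, set()).add(supplier)

    def probe(self, dev):
        # Refuse to probe a consumer whose suppliers are not all bound.
        if not all(s in self.bound for s in self.suppliers.get(dev, ())):
            return False
        if dev not in self.bound:
            self.bound.append(dev)
        return True

    def unbind(self, dev):
        # Unbind all bound consumers of `dev` first, then `dev` itself.
        for consumer, sups in self.suppliers.items():
            if dev in sups and consumer in self.bound:
                self.unbind(consumer)
        if dev in self.bound:
            self.bound.remove(dev)
            self.unbound_log.append(dev)


links = LinkedDevices()
links.add_link("panel", "bridge")
links.add_link("bridge", "regulator")

too_early = links.probe("panel")   # False: "bridge" is not bound yet
links.probe("regulator")
links.probe("bridge")
links.probe("panel")

links.unbind("regulator")          # consumers go first
```

After the final call, `links.unbound_log` is `["panel", "bridge", "regulator"]`: unbinding the regulator tears down its direct and indirect consumers before the regulator itself. The refused early probe of `panel` still has to be retried by someone, which is the reprobing cost mentioned above.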

Another solution is the components framework (currently used in DRM). It synchronizes multiple devices. A master device lists the components that it wants to probe. The component devices get probed independently, and each component has a bind callback. So the probe just registers things, and the bind itself is deferred until all components are ready (through a master bind callback). This handles things fully correctly, but it’s pretty specialised and still doesn’t support optional components.
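The register-now, bind-later idea can be sketched as follows. This is a toy model rather than the DRM component helper: the `Master` class, its method names and the component names are illustrative assumptions, and binding components in alphabetical order is an arbitrary choice made only to keep the output deterministic.

```python
class Master:
    """Toy model of a master that binds once all components are present."""

    def __init__(self, wanted, log):
        self.wanted = set(wanted)  # names of the components the master needs
        self.present = {}          # component name -> bind callback
        self.bound = False
        self.log = log

    def component_add(self, name, bind_cb):
        # Called from a component's probe: it only registers the component.
        self.present[name] = bind_cb
        self._try_bind()

    def component_del(self, name):
        # A component going away tears the whole aggregate down first.
        if self.bound:
            self.bound = False
            self.log.append("master unbind")
        self.present.pop(name, None)

    def _try_bind(self):
        if self.bound or not self.wanted.issubset(self.present):
            return
        # All components are present: run the master bind, then each
        # component's bind callback.
        self.log.append("master bind")
        for name in sorted(self.wanted):
            self.present[name](self.log)
        self.bound = True


log = []
master = Master(["crtc", "encoder"], log)
master.component_add("encoder", lambda l: l.append("bind encoder"))
log_after_first = list(log)   # still empty: "crtc" is missing
master.component_add("crtc", lambda l: l.append("bind crtc"))
master.component_del("encoder")
```

Note that the model has no notion of an optional component: the master binds only when every listed component is present, which mirrors the limitation mentioned above.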

So, what is still missing: doing bind/probe in the optimal order; support optional resources; unbinding of providers; handling of consumers/providers that are not devices.

Andrzej proposes a new resource tracking framework. Instead of deferring, the consumer registers callbacks that are called when the requested resource appears, and finishes the probe successfully. So this introduces a new state, separate from “Probed”, in which the device is really functional. In the Probed state the device has requested its resources; in the Functional state it has actually got them. When resources are removed, it goes back to the Probed state. With optional resources, you get multiple functional states. Optional resources are only requested when the device goes to the basic functional state. What is still needed is some kind of identifier to identify the resource before it is really available, e.g. a DT node or a pair of (provider name, resource name).
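The Probed/Functional state machine can be sketched with callbacks. This is only an illustration of the idea, not the posted patches: `ResourceRegistry`, `Consumer` and the resource names are hypothetical, resources are identified by a (provider name, resource name) pair as suggested above, and optional resources are left out.

```python
class ResourceRegistry:
    """Toy registry that notifies consumers when resources come and go."""

    def __init__(self):
        self.available = set()  # resource ids that are currently provided
        self.watchers = {}      # resource id -> list of interested consumers

    def request(self, res_id, consumer):
        self.watchers.setdefault(res_id, []).append(consumer)
        if res_id in self.available:
            consumer.on_change(res_id, True)

    def publish(self, res_id):
        self.available.add(res_id)
        for consumer in self.watchers.get(res_id, []):
            consumer.on_change(res_id, True)

    def withdraw(self, res_id):
        self.available.discard(res_id)
        for consumer in self.watchers.get(res_id, []):
            consumer.on_change(res_id, False)


class Consumer:
    def __init__(self, registry, required):
        self.required = set(required)
        self.have = set()
        self.state = "probed"
        # "Probe": only register interest, then return successfully.
        for res_id in self.required:
            registry.request(res_id, self)

    def on_change(self, res_id, present):
        if present:
            self.have.add(res_id)
        else:
            self.have.discard(res_id)
        # Functional only while every required resource is present.
        self.state = "functional" if self.have == self.required else "probed"


registry = ResourceRegistry()
registry.publish(("pmic", "vdd"))
panel = Consumer(registry, [("pmic", "vdd"), ("clk-ctrl", "pclk")])
state_after_probe = panel.state      # "probed": the clock is missing

registry.publish(("clk-ctrl", "pclk"))
state_after_clock = panel.state      # "functional"

registry.withdraw(("pmic", "vdd"))
state_after_withdraw = panel.state   # back to "probed"
```

The consumer is never reprobed: its probe runs once, and every later transition is driven by a callback.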

Patches were posted a while ago but they still need a lot of work.