The Linux Capabilities Model - Michael Kerrisk, man7.org Training and Consulting [Open Source Summit EU 2019]

The traditional UNIX privilege model is extremely coarse: there is only the superuser and normal users. Capabilities are an attempt to mitigate this problem.

Capabilities break the superuser's power down into smaller pieces (currently 38). See man 7 capabilities. A traditional setuid-root program gets all of these capabilities. The idea is to replace setuid root with set-capability programs.

Processes have capabilities, but executable files can have them as well. File capabilities are given to the process when it execve’s the file, much like setuid.

A capability set is a bitmask representing a group of capabilities. Each process (actually thread) has 3 (actually more) capability sets:

  • permitted
  • effective
  • inheritable

Files have 3 corresponding capability sets. Inheritable capability sets were a mistake; they’re not usable in practice. Process capability sets can be seen in /proc/PID/status (the CapInh, CapPrm and CapEff fields). getpcaps shows them in more-or-less human-readable form.
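As a rough illustration of the bitmask representation, the following Python sketch decodes the hexadecimal Cap* fields of /proc/PID/status into capability names. The helper names (decode_caps, read_proc_caps) are made up for this example, and the name table assumes the 38 capabilities and bit numbers of <linux/capability.h> at the time of the talk; bits defined by newer kernels are shown as raw bit numbers.

```python
# Capability names indexed by bit number (assumed to match
# <linux/capability.h> for the 38 capabilities mentioned in the talk).
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ",
]


def decode_caps(mask):
    """Return the names of the capabilities whose bits are set in mask."""
    names = []
    bit = 0
    while mask:
        if mask & 1:
            # Unknown (newer) bits are reported by number instead of by name.
            names.append(CAP_NAMES[bit] if bit < len(CAP_NAMES) else f"bit_{bit}")
        mask >>= 1
        bit += 1
    return names


def read_proc_caps(pid="self"):
    """Read the capability sets of a process as {field name: bitmask}."""
    wanted = ("CapInh", "CapPrm", "CapEff", "CapBnd", "CapAmb")
    sets = {}
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            key, _, value = line.partition(":")
            if key in wanted:
                sets[key] = int(value.strip(), 16)  # fields are hex strings
    return sets
```

On a Linux system, passing each mask returned by read_proc_caps() through decode_caps() gives roughly the information that getpcaps prints.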

A process can “raise” (i.e. acquire) and “lower” (i.e. drop) capabilities. libcap is a library that simplifies the underlying APIs. Effective capabilities can be raised only if they’re in the permitted set. Dropping a capability from the permitted set means that it can never be reacquired (except by execve’ing another program). Effective capabilities can be dropped and then raised again later.
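The raise/lower rules can be captured in a toy model. This is only a simulation of the semantics described above, not a call into the kernel or libcap; the class and method names are invented for the example.

```python
class CapState:
    """Toy model of a process's permitted and effective capability sets."""

    def __init__(self, permitted):
        self.permitted = set(permitted)
        self.effective = set()

    def raise_effective(self, cap):
        # A capability can only become effective if it is permitted.
        if cap not in self.permitted:
            raise PermissionError(f"{cap} is not in the permitted set")
        self.effective.add(cap)

    def lower_effective(self, cap):
        # Reversible: the capability stays permitted and can be raised again.
        self.effective.discard(cap)

    def drop_permitted(self, cap):
        # Irreversible (short of execve): the capability leaves both sets,
        # because the effective set must stay a subset of the permitted set.
        self.permitted.discard(cap)
        self.effective.discard(cap)
```

A capability-aware program would typically keep a capability lowered most of the time, raise it just around the privileged operation, and drop it from the permitted set once it is no longer needed.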

For files, the permitted set holds the capabilities that the process acquires in its permitted set on execve. The file effective set is really a single bit that says whether those permitted capabilities become effective immediately. Without it, the process starts out with no effective capabilities and has to raise them itself. Normally you would not set the effective bit, but it’s useful for programs that are not capability-aware.

setcap assigns capabilities to a file. They are stored as an extended attribute (xattr, security.capability) in the filesystem. Obviously, you need privilege to do this: CAP_SETFCAP. getcap shows a file’s capabilities. libcap has corresponding functions, as well as conversion from/to strings.
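To make the on-disk form a bit more concrete, here is a hedged sketch that reads the security.capability xattr and unpacks it. The layout used (a little-endian 32-bit magic/flags word followed by 32-bit permitted/inheritable pairs) is my reading of struct vfs_cap_data in <linux/capability.h>, so treat it as an assumption; the function names are hypothetical. Note how the file "effective set" really is a single flag bit.

```python
import os
import struct

VFS_CAP_REVISION_MASK = 0xFF000000
VFS_CAP_FLAGS_EFFECTIVE = 0x000001
# Number of 32-bit (permitted, inheritable) pairs per revision (assumed).
_PAIRS_BY_REVISION = {0x01000000: 1, 0x02000000: 2, 0x03000000: 2}


def parse_vfs_cap_data(raw):
    """Decode the raw bytes of a security.capability xattr."""
    (magic_etc,) = struct.unpack_from("<I", raw, 0)
    pairs = _PAIRS_BY_REVISION[magic_etc & VFS_CAP_REVISION_MASK]
    words = struct.unpack_from("<" + "I" * (2 * pairs), raw, 4)
    permitted = inheritable = 0
    for i in range(pairs):
        # Pair i carries bits 32*i .. 32*i+31 of each set.
        permitted |= words[2 * i] << (32 * i)
        inheritable |= words[2 * i + 1] << (32 * i)
    return {
        "permitted": permitted,
        "inheritable": inheritable,
        "effective": bool(magic_etc & VFS_CAP_FLAGS_EFFECTIVE),
    }


def read_file_caps(path):
    """Return the decoded file capabilities, or None if the file has none."""
    try:
        raw = os.getxattr(path, "security.capability")  # Linux only
    except OSError:
        return None  # no such attribute, or xattrs not supported
    return parse_vfs_cap_data(raw)
```

In practice getcap or libcap is the right tool for this; the sketch is only meant to show what setcap leaves behind in the filesystem.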

During execve, the file’s permitted capabilities are ANDed with the process’s bounding set (usually everything) before they end up in the new permitted set. The capability bounding set thus makes it possible to irreversibly drop capabilities so that children can never reacquire them, even by execve’ing a setcap program.

In practice, inheritable capabilities have now been superseded by ambient capabilities. The problem is that a program with capabilities sometimes wants to run a child program that has no file capabilities. However, on execve, the child gets the capabilities of the file, not the capabilities of the parent process. So there has to be a way to make sure that some capabilities are inherited by children, i.e. ORed with the file capabilities. A process can copy capabilities from its permitted set to the ambient set, and those will be carried over execve, but only if the binary is not privileged (i.e. it has no file capabilities and is not set-UID or set-GID). Reality is a little more complicated. See the slides for the formal details.
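The execve rules for the bounding and ambient sets can be written down as a small function. The sketch below follows the transformation formulas as given in capabilities(7), as far as I recall them, and deliberately ignores the special handling of UID 0 and set-UID-root binaries; the ProcCaps/FileCaps types and the execve_transform name are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcCaps:
    permitted: int = 0
    effective: int = 0
    inheritable: int = 0
    bounding: int = 0
    ambient: int = 0


@dataclass(frozen=True)
class FileCaps:
    permitted: int = 0
    inheritable: int = 0
    effective: bool = False   # the single "effective" bit
    privileged: bool = False  # has file capabilities or is set-UID/set-GID


def execve_transform(p, f):
    """Compute the capability sets of a process after execve of file f."""
    # Ambient capabilities survive only if the file is not privileged.
    ambient = 0 if f.privileged else p.ambient
    # File permitted caps are limited by the bounding set; ambient is ORed in.
    permitted = (p.inheritable & f.inheritable) | (f.permitted & p.bounding) | ambient
    # The file effective bit decides whether permitted caps start out effective.
    effective = permitted if f.effective else ambient
    # Inheritable and bounding sets are carried over unchanged.
    return ProcCaps(permitted, effective, p.inheritable, p.bounding, ambient)
```

For example, with CAP_NET_RAW (bit 13) removed from the bounding set, execve’ing a binary that has CAP_NET_RAW in its file permitted set yields an empty permitted set, while the same capability held in the ambient set passes straight through to an unprivileged binary.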

Capabilities are complicated to understand and less familiar to sysadmins. They’re also more work to program. Some capabilities can easily be used to get full root access, and this is not always obvious: according to grsecurity, about half of them can be used that way. Capabilities are also often too broad: a particular program needs to do only one specific thing, but the capability allows a lot of other things as well. Part of the reason is that a kernel developer who adds a new privileged operation has to decide whether to reuse an existing capability bit or create a new one, and usually an existing bit is reused. CAP_SYS_ADMIN in particular allows many different things. It accounts for over 45% of the capability checks in the 5.2 kernel.