Networking: From the Ethernet MAC to the Link Partner - Maxime Chevallier & Antoine Ténart, Bootlin [Open Source Summit EU 2018]

This talk is about the physical and data link layers of the network stack. Ethernet has existed for more than 30 years; over that time it has evolved a lot and become more complicated.

The physical layer is the electrical specification of how data is put on the wire (or air, optics, …), and how bits are converted into this signal.

The data link layer adds framing to actually transfer data, between two directly connected nodes. You need a network layer on top of it to transport data with hops.

In Ethernet, the data link layer is the MAC (Media Access Control); the physical layer (PHY) is connected to the MAC through an MDIO bus for control and (R)(G)MII for data.

More complicated setups have different connectors (RJ45, SFP (= Small Form-factor Pluggable)). Example: the MacchiatoBin has 4 Ethernet ports and 6 connectors (RJ45 and SFP). So on some MAC ports there are two connectors connected to a single PHY, and the link has to be reconfigured depending on which connector is in use. One port has the MAC directly connected to an SFP+ cage, without a PHY.

Both the MAC and the PHY have a driver in Linux, represented by struct net_device and struct phy_device respectively. Sometimes the MAC and PHY are in a single package, and the PHY is handled directly in the MAC driver.

From userspace, ethtool controls the MAC (and also the PHY in case they are in one package). mii-tool is deprecated, but it can still dump the PHY status.

The MII (Media-Independent Interface) handles the MAC-to-PHY connection (for data). It includes GMII, RGMII, SGMII, XGMII, XAUI. The MDI (Media-Dependent Interface) connects the PHY to the physical medium (cable). There are dozens of media (with corresponding IEEE 802.3 standards).

The link mode (e.g. 1000Base-T) has a notation that describes the speed (1000), band (Base = baseband), medium (twisted pair, PCB, copper, fiber, …), encoding (e.g. X = 8b/10b) and number of lanes (= number of wires used, if larger than 1).
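To make the notation concrete, here is a minimal sketch of a parser that splits a link-mode name into its parts. The names LinkMode and parse_link_mode are made up for this illustration, and the regular expression only covers the common "<speed>Base-<letters><lanes>" shape, not every name in IEEE 802.3 (older names such as 10Base5 are rejected).

```python
import re
from typing import NamedTuple


class LinkMode(NamedTuple):
    speed_mbps: int   # speed in Mbit/s
    band: str         # "BASE" for baseband
    suffix: str       # medium/encoding letters, e.g. "T", "SR", "KX"
    lanes: int        # 1 when the name carries no lane count


# <number>[G]<band>-<letters>[<lanes>], e.g. 1000Base-T or 10GBase-KX4
_PATTERN = re.compile(r"^(\d+)(G?)(BASE|BROAD|PASS)-([A-Z]+)(\d*)$", re.IGNORECASE)


def parse_link_mode(name: str) -> LinkMode:
    """Split a link-mode name into speed, band, suffix and lane count."""
    match = _PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognised link mode: {name!r}")
    number, giga, band, suffix, lanes = match.groups()
    speed = int(number) * (1000 if giga else 1)
    return LinkMode(speed, band.upper(), suffix.upper(), int(lanes) if lanes else 1)


if __name__ == "__main__":
    for mode in ("1000Base-T", "10GBase-KX4", "100Base-TX"):
        print(mode, parse_link_mode(mode))
```

The suffix is kept as an opaque string on purpose: mapping letters such as T, K or X to a medium or encoding needs a lookup table taken from the standard, which this sketch does not attempt.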

Inside the PHY, there are three main components: PCS (Physical Coding Subsystem) is the codec of the MII. PMA (Physical Medium Attachment) translates between PCS and PMD and does e.g. collision detection. PMD (Physical Medium Dependent) is the codec for the physical medium. The PCS is what is important to Linux because that’s what we talk to.

MDIO is the control interface of the PHY. It’s similar to I2C: a 2-wire serial addressable bus (up to 32 PHYs on one bus). It is used to access PHY configuration and status registers. The original scheme (Clause 22) has 5-bit register addresses and 16-bit data. Clause 45 extends this to 16-bit addresses and different subdevices in the PHY (PCS, PMA, PMD).
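As an illustration of the original 5-bit addressing scheme, the sketch below packs the bits of a Clause 22 management frame: a 2-bit start pattern (01), a 2-bit opcode (10 for read, 01 for write), the 5-bit PHY address, the 5-bit register address, a 2-bit turnaround and 16 bits of data. The function names are invented for this example and the 32-bit preamble of ones is left out; it shows the bit layout only and is not how a Linux MDIO bus driver is written.

```python
ST_CLAUSE22 = 0b01   # start-of-frame pattern
OP_WRITE = 0b01
OP_READ = 0b10
TA_WRITE = 0b10      # turnaround bits driven by the MAC on a write


def _check_5bit(value: int, what: str) -> None:
    if not 0 <= value < 32:
        raise ValueError(f"{what} must fit in 5 bits, got {value}")


def clause22_header(opcode: int, phy_addr: int, reg_addr: int) -> int:
    """Return the 14 header bits: ST(2) OP(2) PHYAD(5) REGAD(5)."""
    _check_5bit(phy_addr, "PHY address")
    _check_5bit(reg_addr, "register address")
    return (ST_CLAUSE22 << 12) | (opcode << 10) | (phy_addr << 5) | reg_addr


def clause22_write_frame(phy_addr: int, reg_addr: int, data: int) -> int:
    """Return the full 32-bit write frame (preamble not included)."""
    if not 0 <= data <= 0xFFFF:
        raise ValueError("data must fit in 16 bits")
    header = clause22_header(OP_WRITE, phy_addr, reg_addr)
    return (header << 18) | (TA_WRITE << 16) | data


if __name__ == "__main__":
    # Write 0x1140 to register 0 of the PHY at address 1.
    print(hex(clause22_write_frame(1, 0, 0x1140)))
    # Header of a read of register 2 on the same PHY; the PHY then
    # drives the turnaround and the 16 data bits itself.
    print(hex(clause22_header(OP_READ, 1, 2)))
```

The two 5-bit fields are where the limits of 32 PHYs per bus and 32 registers per PHY come from.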

Each PHY has an ID which can be used to bind it to the correct driver. So it is sufficient to specify in the device tree that it’s a standard PHY, and the driver can be auto-discovered. The register sets are standardized so there is not too much variation between devices. phylib does most of the work.
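The ID-based binding can be sketched as follows. The 32-bit ID is built from the two standard identifier registers (registers 2 and 3), and each driver declares an ID plus a mask so that the low bits, typically a revision number, are ignored. The driver names and ID values in the table are invented for illustration, and the fallback to a generic driver is modelled in the simplest possible way.

```python
from typing import List, Tuple

# (driver name, phy_id, phy_id_mask): hypothetical entries, not real devices.
DRIVERS: List[Tuple[str, int, int]] = [
    ("example-phy-a", 0x12345670, 0xFFFFFFF0),
    ("example-phy-b", 0x0ABC1200, 0xFFFFFF00),
]


def phy_id_from_regs(id_reg2: int, id_reg3: int) -> int:
    """Combine the two 16-bit identifier registers into one 32-bit ID."""
    return ((id_reg2 & 0xFFFF) << 16) | (id_reg3 & 0xFFFF)


def find_driver(phy_id: int) -> str:
    """Return the first driver whose masked ID matches, else a generic one."""
    for name, drv_id, mask in DRIVERS:
        if (phy_id & mask) == (drv_id & mask):
            return name
    return "generic-phy"


if __name__ == "__main__":
    phy_id = phy_id_from_regs(0x1234, 0x5673)   # revision 3 of example-phy-a
    print(hex(phy_id), find_driver(phy_id))
```

Because the registers are standardized, a generic driver can still bring up a basic link when no specific driver matches.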

The parallel MII variants need a lot of pins, especially at higher speeds. So at higher speeds, serial differential links are used with a SerDes block on each end. SGMII is not a real standard but is often used: 4 differential pairs, with what is called a Base-X PCS. It runs at 1 Gbps, but four SGMII links can be combined onto a single 5 Gbps link (QSGMII).

Because of all this variation, there is now an explicit phy_interface_t to represent the MAC-to-PHY connector, and it’s described in the device tree.

To advertise the supported speeds, not only the PHY capabilities have to be taken into account, but also those of the MAC and of the link between them. So the software sets what will be advertised.
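One way to picture this is as a set intersection: only modes that the MAC, the PHY and the MAC-to-PHY interface all support get advertised, and the link then settles on the best mode that both ends advertise. The sketch below is a simplified model with made-up function names; modes are plain (speed, duplex) tuples rather than the kernel's link-mode bitmaps, and "best" just means highest speed, with full duplex preferred.

```python
from typing import Optional, Set, Tuple

Mode = Tuple[int, str]   # (speed in Mbit/s, "half" or "full")


def advertised_modes(mac: Set[Mode], phy: Set[Mode], interface: Set[Mode]) -> Set[Mode]:
    """Modes that can be advertised: supported by MAC, PHY and their link."""
    return mac & phy & interface


def resolve(local_adv: Set[Mode], partner_adv: Set[Mode]) -> Optional[Mode]:
    """Pick the fastest mode both sides advertise, preferring full duplex."""
    common = local_adv & partner_adv
    if not common:
        return None
    return max(common, key=lambda mode: (mode[0], mode[1] == "full"))


if __name__ == "__main__":
    mac = {(10, "half"), (10, "full"), (100, "full"), (1000, "full"), (10000, "full")}
    phy = {(10, "full"), (100, "full"), (1000, "full"), (10000, "full")}
    # Assume the MAC-to-PHY link tops out at 1 Gbit/s.
    link = {(10, "half"), (10, "full"), (100, "full"), (1000, "full")}
    local = advertised_modes(mac, phy, link)
    partner = {(100, "half"), (100, "full"), (1000, "full"), (10000, "full")}
    print(sorted(local), resolve(local, partner))
```

In this example the PHY could do 10 Gbit/s, but it is not advertised because the assumed MAC-to-PHY link cannot carry it.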

SFP is hot-pluggable and can have a PHY. Since it’s hot-pluggable, you no longer have a fixed MAC-to-PHY link. Also, part of the PHY (PCS) can be embedded in the MAC and a SerDes link is used to connect to the PHY. So, on the MacchiatoBin with two connectors on one PHY, there are different ways it can be configured. If the RJ45 is used, it’s the normal way. With a passive SFP transceiver, the normal PHY is used as well. But if the PHY is on the SFP, it has to be dynamically added.

To deal with this dynamism, the phylink infrastructure was added. It handles reconfiguration of the PCS in the MAC. It will make sure everything is configured in the right way before sending data. When the MAC is created, the phylink is also created but the link is down. When the MAC is started, phylink connects to the PHY and powers it up. It is configured in its default configuration (as specified in device tree). When the PHY establishes a link with a partner with a certain speed, phylink will reconfigure both the MAC and PHY to use a compatible link between MAC and PHY.
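The sequence described above can be pictured with a toy state model. This is not the kernel's phylink API: the class, its methods, the interface names and the speed-to-interface table are all assumptions made up for this sketch, which only mirrors the order of events (create with link down, start and power up the PHY with the default interface, then reconfigure once the partner's speed is known).

```python
class ToyPhylink:
    """Toy model of the sequence above; every name here is hypothetical."""

    # Assumed mapping from negotiated speed to MAC-to-PHY interface mode.
    INTERFACE_FOR_SPEED = {
        10000: "10gbase-r",
        2500: "2500base-x",
        1000: "sgmii",
        100: "sgmii",
        10: "sgmii",
    }

    def __init__(self, default_interface: str) -> None:
        # Created together with the MAC: nothing is connected, link is down.
        self.default_interface = default_interface
        self.interface = None
        self.phy_powered = False
        self.link_up = False

    def start(self) -> None:
        # MAC is started: connect to the PHY, power it up and apply the
        # default interface (the one the device tree would specify).
        self.phy_powered = True
        self.interface = self.default_interface

    def partner_link_up(self, speed_mbps: int) -> None:
        # PHY reports a link with its partner: reconfigure the MAC-to-PHY
        # interface to one that can carry the negotiated speed.
        if not self.phy_powered:
            raise RuntimeError("PHY reported link before being started")
        self.interface = self.INTERFACE_FOR_SPEED[speed_mbps]
        self.link_up = True

    def partner_link_down(self) -> None:
        self.link_up = False


if __name__ == "__main__":
    pl = ToyPhylink(default_interface="10gbase-r")
    pl.start()
    pl.partner_link_up(1000)
    print(pl.interface, pl.link_up)
```

The point of the model is the ordering: data only flows once both ends of the MAC-to-PHY link agree on the interface mode.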