Introduction to HyperBus Memory Devices
- Vignesh Raghavendra [Open Source Summit EU 2019]

HyperBus is a low-pin-count parallel bus for devices with more modest requirements than DDR2/3 memory. It has 8 data lines and a clock line that is sometimes differential. Data is transferred on both the rising and the falling clock edge (double data rate). It is used for both flash and RAM, and both can sit on the same bus. Read throughput can go up to 400 MB/s.

HyperFlash draws upon the legacy features of both parallel and serial memories.

HyperRAM is self-refresh DRAM with a HyperBus interface. An additional data strobe is needed to indicate data validity, because the device cannot respond while a refresh is in progress.

A HyperBus transaction has a command phase, a wait phase and a data phase. The command phase is always driven by the master. The wait phase is imposed by the device, which is either waiting for a refresh to finish or for data to become available.

The command is 48 bits (bits 47 down to 0). Bit 47 indicates read or write. Bit 46 indicates the target address space (the memory itself or the config registers). Bit 45 indicates how a burst is interpreted: linear or wrapped. A wrapped burst is useful for filling a cache line (get the data you need first, then get the rest of the cache line). Bits 44-16 are the half-page address. The half-page is the smallest unit on which ECC is calculated. It is 16 bytes.
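As a rough illustration of the layout above, the following C sketch packs the fields into a 48-bit value and splits it into the six bytes that go out over the 8 data lines. The names hb_make_ca and hb_ca_to_bytes are made up for this example. The polarity of the flag bits (1 = read, 1 = register space, 1 = linear), the use of bits 2-0 for the word within the half-page, and the most-significant-byte-first ordering are assumptions based on my reading of the HyperBus specification, not something stated in the talk, so check them against the device datasheet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag bits of the 48-bit command word (polarity is an assumption). */
#define HB_CA_READ       (1ULL << 47)  /* 1 = read, 0 = write */
#define HB_CA_REG_SPACE  (1ULL << 46)  /* 1 = config registers, 0 = memory */
#define HB_CA_LINEAR     (1ULL << 45)  /* 1 = linear burst, 0 = wrapped */
#define HB_CA_HALF_PAGE_SHIFT 16       /* half-page address lives in bits 44-16 */
#define HB_CA_WORD_MASK  0x7u          /* word within the half-page, bits 2-0 */

/* Build a command word from a 16-bit-word address (hypothetical helper). */
static uint64_t hb_make_ca(bool read, bool reg_space, bool linear,
                           uint32_t word_addr)
{
    uint64_t ca = 0;

    if (read)
        ca |= HB_CA_READ;
    if (reg_space)
        ca |= HB_CA_REG_SPACE;
    if (linear)
        ca |= HB_CA_LINEAR;

    /* 8 words of 16 bits make up one 16-byte half-page. */
    ca |= (uint64_t)(word_addr >> 3) << HB_CA_HALF_PAGE_SHIFT;
    ca |= word_addr & HB_CA_WORD_MASK;
    return ca;
}

/* Split the command word into 6 bytes, most significant byte first. */
static void hb_ca_to_bytes(uint64_t ca, uint8_t out[6])
{
    assert((ca >> 48) == 0);  /* only 48 bits are defined */
    for (int i = 0; i < 6; i++)
        out[i] = (uint8_t)(ca >> (40 - 8 * i));
}
```

For example, a linear read of memory at word address 0x12345 gives hb_make_ca(true, false, true, 0x12345) == 0xA00024680005: half-page 0x2468, word 5 within that half-page.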

HyperFlash follows CFI extended command set 0002, so the generic driver for it should work. It comes up in direct read mode, which is very useful for boot.

To write to flash, it first has to be unlocked with a two-step unlock sequence. Data is buffered before going to flash. The last step of the programming sequence is a confirm command.
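The sequence can be sketched in C as below. This is only a model: bus_write16 is a hypothetical stand-in that records writes in a log instead of touching hardware, and the addresses and command values (0xAA at 0x555, 0x55 at 0x2AA, 0x25 to start a buffered write, 0x29 to confirm) are the usual values for the 0002 command set as I understand them, not figures given in the talk. Verify them against the datasheet of the actual part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Log of bus writes; stands in for real hardware access (hypothetical). */
struct bus_write {
    uint32_t addr;
    uint16_t val;
};

#define BUS_LOG_MAX 64
static struct bus_write bus_log[BUS_LOG_MAX];
static size_t bus_log_len;

static void bus_write16(uint32_t addr, uint16_t val)
{
    if (bus_log_len < BUS_LOG_MAX) {
        bus_log[bus_log_len].addr = addr;
        bus_log[bus_log_len].val = val;
        bus_log_len++;
    }
}

/* Assumed command values for the 0002 command set (word addresses). */
#define ADDR_UNLOCK1      0x555
#define ADDR_UNLOCK2      0x2AA
#define CMD_UNLOCK1       0xAA
#define CMD_UNLOCK2       0x55
#define CMD_WRITE_BUFFER  0x25
#define CMD_CONFIRM       0x29

/* Program nwords 16-bit words starting at word address dst in one sector. */
static void flash_buffer_program(uint32_t sector, uint32_t dst,
                                 const uint16_t *data, size_t nwords)
{
    assert(nwords > 0);

    /* Step 1: two-step unlock. */
    bus_write16(ADDR_UNLOCK1, CMD_UNLOCK1);
    bus_write16(ADDR_UNLOCK2, CMD_UNLOCK2);

    /* Step 2: open the write buffer and announce the word count minus one. */
    bus_write16(sector, CMD_WRITE_BUFFER);
    bus_write16(sector, (uint16_t)(nwords - 1));

    /* Step 3: fill the buffer; nothing reaches the flash array yet. */
    for (size_t i = 0; i < nwords; i++)
        bus_write16(dst + (uint32_t)i, data[i]);

    /* Step 4: confirm, which commits the buffer to flash. */
    bus_write16(sector, CMD_CONFIRM);
}
```

A real driver would then wait for the device to report completion before issuing the next command.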

Address space overlays are used to access the control space, e.g. the CFI space, persistence registers and vendor-specific areas. There are commands to switch between address spaces.

On the host side, there are two types of controllers. Dedicated HyperBus controllers only understand the HyperBus protocol. They usually support memory-mapped access to flash, similar to SDRAM controllers. This allows XIP. Multi-IO serial controllers, on the other hand, generally don’t support memory-mapped access but require an explicit transfer to memory.

In the kernel, support was merged in v5.3, but only for HyperFlash with MMIO-capable controllers. It hooks into the existing CFI framework. The map framework forwards the CFI commands over MMIO. The HyperBus framework defines the address ranges and handles probing. Some changes to the CFI framework were needed. For example, to check the status after an erase, the CFI framework would look at the status lines, but with HyperFlash you need to poll a status register.
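Polling a status register instead of watching status lines could look roughly like the sketch below. It runs against a tiny simulated device, not real hardware, and it is not the kernel's code. The names sim_flash, sim_write16, sim_read16 and wait_ready are invented for the example. The status read command (0x70 written to address 0x555), the "device ready" flag in bit 7 and the assumption that the status overlay lasts for a single read are my understanding of typical HyperFlash parts and should be checked against the datasheet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ADDR_UNLOCK1     0x555
#define CMD_STATUS_READ  0x70        /* assumed status register read command */
#define SR_DEVICE_READY  (1u << 7)   /* assumed "device ready" status bit */

/* Simulated flash: stays busy for a number of status polls. */
struct sim_flash {
    int busy_polls;     /* remaining polls that will report "busy" */
    bool status_mode;   /* true while the status overlay is selected */
};

static void sim_write16(struct sim_flash *f, uint32_t addr, uint16_t val)
{
    if (addr == ADDR_UNLOCK1 && val == CMD_STATUS_READ)
        f->status_mode = true;
}

static uint16_t sim_read16(struct sim_flash *f, uint32_t addr)
{
    (void)addr;
    if (!f->status_mode)
        return 0xFFFF;           /* plain array read of erased flash */

    f->status_mode = false;      /* model: overlay lasts for one read */
    if (f->busy_polls > 0) {
        f->busy_polls--;
        return 0;                /* ready bit clear: still busy */
    }
    return SR_DEVICE_READY;
}

/* Poll until ready; returns the number of polls used, or -1 on timeout. */
static int wait_ready(struct sim_flash *f, int max_polls)
{
    assert(max_polls > 0);
    for (int i = 0; i < max_polls; i++) {
        sim_write16(f, ADDR_UNLOCK1, CMD_STATUS_READ);
        if (sim_read16(f, 0) & SR_DEVICE_READY)
            return i + 1;
    }
    return -1;
}
```

The point of the sketch is that every poll costs a command write plus a read, which is why the existing CFI code path needed changing.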

To create a HyperBus controller (HBMC) driver, you just need to implement a few ops functions:

  • read16: read 16 bits of data, used to read from non-default address space e.g. CFI.
  • write16
  • copy_from: read data from flash array
  • copy_to
  • calibrate: for the pin timings

The difference between read16/write16 and copy_from/copy_to is that the latter may use aligned accesses and discard the surplus data, while the former must access exactly the requested word.
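To make that distinction concrete, here is a small userspace model of such an ops table. It is not the kernel's definition: the hb_ops and hb_dev types, the sim_* functions, the 64-byte array standing in for the memory-mapped window and the "0 means success" return value of calibrate are all assumptions made for the example. Only the five callback names come from the list above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct hb_dev;

/* Hypothetical ops table modelled on the five callbacks listed above. */
struct hb_ops {
    uint16_t (*read16)(struct hb_dev *dev, unsigned long addr);
    void (*write16)(struct hb_dev *dev, unsigned long addr, uint16_t val);
    void (*copy_from)(struct hb_dev *dev, void *to, unsigned long from,
                      size_t len);
    void (*copy_to)(struct hb_dev *dev, unsigned long to, const void *from,
                    size_t len);
    int (*calibrate)(struct hb_dev *dev);
};

struct hb_dev {
    uint8_t mem[64];            /* stands in for the memory-mapped window */
    const struct hb_ops *ops;
};

/* read16/write16: touch exactly the one 16-bit word that was asked for. */
static uint16_t sim_read16(struct hb_dev *dev, unsigned long addr)
{
    uint16_t v;

    assert(addr % 2 == 0 && addr + 2 <= sizeof(dev->mem));
    memcpy(&v, &dev->mem[addr], sizeof(v));
    return v;
}

static void sim_write16(struct hb_dev *dev, unsigned long addr, uint16_t val)
{
    assert(addr % 2 == 0 && addr + 2 <= sizeof(dev->mem));
    memcpy(&dev->mem[addr], &val, sizeof(val));
}

/* copy_from: fetch whole aligned 32-bit words, keep only the wanted bytes. */
static void sim_copy_from(struct hb_dev *dev, void *to, unsigned long from,
                          size_t len)
{
    unsigned long start = from & ~3UL;
    unsigned long end = (from + len + 3) & ~3UL;
    uint8_t bounce[sizeof(dev->mem)];

    assert(end <= sizeof(dev->mem));
    for (unsigned long a = start; a < end; a += 4)
        memcpy(&bounce[a - start], &dev->mem[a], 4);
    memcpy(to, &bounce[from - start], len);  /* surplus bytes are dropped */
}

static void sim_copy_to(struct hb_dev *dev, unsigned long to,
                        const void *from, size_t len)
{
    assert(to + len <= sizeof(dev->mem));
    memcpy(&dev->mem[to], from, len);
}

/* calibrate: nothing to tune in the model; 0 means success here. */
static int sim_calibrate(struct hb_dev *dev)
{
    (void)dev;
    return 0;
}

static const struct hb_ops sim_ops = {
    .read16 = sim_read16,
    .write16 = sim_write16,
    .copy_from = sim_copy_from,
    .copy_to = sim_copy_to,
    .calibrate = sim_calibrate,
};
```

Reading 3 bytes at offset 5 through copy_from fetches the aligned word at offset 4 and throws away its first byte, which is harmless for the flash array but could have side effects in a register-like address space. That is why those go through read16.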

The hyperbus_device structure is generic though currently only HyperFlash is supported.

In the device tree, you populate the hbmc as a memory controller and put the flash devices as children.

For userspace, HyperFlash is just an MTD device like any other.

HyperFlash is now part of the JEDEC xSPI (following QSPI) specification. So it’s one of the standardized ways of programming serial flash. It’s a separate profile from traditional SPI NOR. The main difference in signaling is that in SPI NOR, the command and address phases are separate, while in HyperFlash they are mixed. For compatibility, HyperFlash powers up in SPI mode (single line). It switches to HyperBus mode when a configuration register bit is set. A Serial Flash Discoverable Parameters (SFDP) table allows the parameters to be read from the flash in a vendor-agnostic way.

In the kernel, there’s a spi-mem layer to abstract SPI memory devices. Its operations assume a command-address-data sequence, with a 1-byte command and a 4-byte address. For HyperFlash, it would need to be extended with a new function or additional arguments. That way, a single kernel driver could talk to both HyperFlash and SPI NOR devices.

At the moment, write performance is quite slow because writes are done at byte granularity. DMA would also be useful, but buffers coming from flash filesystems like UBIFS are vmalloc’ed, so they are not physically contiguous and are difficult to use for DMA.