Acceleration techniques for white-box CPEs

This blog post provides a quick introduction to, and comparison of, some of the acceleration technologies commonly available on white-box CPE (customer premises equipment), sometimes also referred to as “universal CPE” or uCPE.

Classical customer premises equipment has traditionally relied on specialized network processors to deliver packet-processing performance. Standard x86 hardware, however, was originally designed for general-purpose compute tasks, and, especially when combined with a “plain vanilla” Linux implementation, it delivers disappointing performance for data communication purposes unless expensive x86 CPUs are used. To address this concern, a number of software and hardware acceleration techniques have been introduced to meet the performance requirements imposed on today’s CPEs.

The processing context for white-box CPE is an environment that provides a small virtualized infrastructure at the customer premises, where multiple VMs (Virtual Machines) hosting VNFs (Virtual Network Functions) are created and run in a Linux environment. Service chaining is established between the different VMs, resulting in a final customer service that can be configured and adapted through a customer portal. In this setup, VNFs and VMs need to communicate either with the outside world through a NIC (Network Interface Card) or with another VNF for service chaining.

DPDK (Data Plane Development Kit)

In the case of white-box CPEs, DPDK provides a framework and a set of techniques to accelerate data packet processing and circumvent the bottlenecks encountered in standard Linux processing. DPDK is implemented in software and bypasses the Linux kernel and its network stack to establish a high-speed data path for rapid packet processing, typically by polling the NIC from dedicated cores instead of relying on interrupts. Its great advantage is that it produces significant performance improvements without hardware modifications. Although DPDK was originally developed for Intel-based processor environments, it is now also available on other processor architectures such as Arm.
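
To give a flavor of the programming model, below is a minimal sketch of a DPDK forwarding loop in C. Port and queue setup (memory pool creation, rte_eth_dev_configure(), rte_eth_dev_start()) is omitted for brevity, and the port numbers are hypothetical.

```c
/* Minimal DPDK forwarding loop sketch; error handling and full
 * port/queue setup are omitted, and port IDs 0/1 are hypothetical. */
#include <stdlib.h>
#include <rte_eal.h>
#include <rte_debug.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

int main(int argc, char **argv)
{
    /* Initialize the Environment Abstraction Layer: hugepages,
     * core pinning and PCI device scanning, all outside the kernel. */
    if (rte_eal_init(argc, argv) < 0)
        rte_exit(EXIT_FAILURE, "EAL initialization failed\n");

    /* ... rte_eth_dev_configure(), RX/TX queue setup and
     * rte_eth_dev_start() for ports 0 and 1 would go here ... */

    struct rte_mbuf *bufs[BURST_SIZE];
    for (;;) {
        /* Poll the NIC directly: no interrupts, no kernel stack. */
        uint16_t nb_rx = rte_eth_rx_burst(0, 0, bufs, BURST_SIZE);
        if (nb_rx == 0)
            continue;

        /* Forward the burst out of the second port. */
        uint16_t nb_tx = rte_eth_tx_burst(1, 0, bufs, nb_rx);

        /* Free any packets the TX queue could not accept. */
        for (uint16_t i = nb_tx; i < nb_rx; i++)
            rte_pktmbuf_free(bufs[i]);
    }
    return 0;
}
```

The loop polls the NIC continuously from a dedicated core, trading one busy CPU core for a packet rate that the interrupt-driven kernel path cannot match.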

AES-NI (Advanced Encryption Standard New Instructions)

AES-NI is an extension to the x86 instruction set, introduced by Intel, that accelerates encryption and decryption of data packets using the AES standard. Without these instructions, encryption and decryption take considerably longer, as AES is a very compute-intensive task. The encryption is done at the data plane level and is used to secure data communications over Wide Area Networks.
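
As a minimal sketch of what AES-NI looks like at the code level, the function below encrypts one 16-byte block with AES-128 intrinsics. It assumes the 11 round keys have already been expanded (key expansion is omitted), and the function name is illustrative.

```c
/* AES-128 single-block encryption using AES-NI intrinsics.
 * Assumes the 11 round keys are already expanded; compile with -maes. */
#include <wmmintrin.h>  /* AES-NI intrinsics */

static __m128i aes128_encrypt_block(__m128i block, const __m128i rk[11])
{
    block = _mm_xor_si128(block, rk[0]);        /* initial AddRoundKey */
    for (int i = 1; i < 10; i++)
        block = _mm_aesenc_si128(block, rk[i]); /* rounds 1-9 */
    return _mm_aesenclast_si128(block, rk[10]); /* final round, no MixColumns */
}
```

Each _mm_aesenc_si128 call executes a complete AES round in a single instruction, which is where the speedup over a pure software implementation comes from.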

QAT (QuickAssist Technology)

This technology is based on hardware acceleration and is available from Intel only. It reduces the load on the CPU by offloading compute-intensive operations such as bulk cryptography, public key cryptography and compression onto a specialized processor. QAT can be integrated into the main CPU (as on some of the Intel Rangeley processors) or can be added as a separate hardware accelerator chip. The difference between QAT and AES-NI is that QAT offloads encryption-related tasks from the CPU entirely, while AES-NI merely provides specialized instructions that let the CPU itself compute AES operations faster.
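
To illustrate the offload model, the sketch below shows one common way an application can route its OpenSSL crypto operations to a QAT device, via Intel's QAT engine for OpenSSL. The engine id "qatengine" and its presence on the system are assumptions; stock OpenSSL does not ship this engine.

```c
/* Route OpenSSL crypto through a QAT hardware engine (sketch).
 * Assumes Intel's QAT engine for OpenSSL is installed and registered
 * under the id "qatengine"; existing EVP-based crypto code then runs
 * unchanged, with the heavy lifting offloaded to the QAT device. */
#include <stdio.h>
#include <openssl/engine.h>

int enable_qat(void)
{
    ENGINE_load_builtin_engines();

    ENGINE *e = ENGINE_by_id("qatengine"); /* assumed engine id */
    if (e == NULL) {
        fprintf(stderr, "QAT engine not available\n");
        return -1;
    }
    if (!ENGINE_init(e)) {  /* bring up the QAT device */
        ENGINE_free(e);
        return -1;
    }
    /* Make QAT the default for every algorithm it supports. */
    ENGINE_set_default(e, ENGINE_METHOD_ALL);
    ENGINE_free(e); /* drop our structural reference */
    return 0;
}
```

The point of this pattern is that the application itself does not change: the engine intercepts the crypto calls and dispatches them to the accelerator.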

PCI (Peripheral Component Interconnect) passthrough

PCIe (PCI Express) is a high-speed communication bus present on computer systems that connects the CPU to peripheral devices such as Network Interface Cards (NICs). One way to accelerate communication between a VM and a NIC is to use PCI-passthrough, which gives the VM exclusive, direct access to the NIC over the PCI bus without traversing the host Linux kernel. This mechanism is only possible if the hardware device is not shared between different VMs. The advantage is that overall performance improves and CPU processing power is freed for other, more productive tasks.
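
As a rough sketch of how a device is prepared for passthrough on Linux, the program below rebinds a NIC from its kernel driver to vfio-pci through sysfs, after which a hypervisor such as QEMU/KVM can hand the device to a VM. The PCI address is hypothetical, the program must run as root, and IOMMU support (Intel VT-d or AMD-Vi) must be enabled.

```c
/* Rebind a PCI NIC to the vfio-pci driver so a VM can claim it
 * directly (sketch; the PCI address 0000:01:00.0 is hypothetical). */
#include <stdio.h>

static int write_str(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");
    if (!f) { perror(path); return -1; }
    fprintf(f, "%s", val);
    return fclose(f);
}

int main(void)
{
    const char *dev = "0000:01:00.0"; /* hypothetical NIC address */

    /* Tell the PCI core that only vfio-pci may claim this device. */
    write_str("/sys/bus/pci/devices/0000:01:00.0/driver_override",
              "vfio-pci");
    /* Detach the current kernel driver (e.g. the NIC driver). */
    write_str("/sys/bus/pci/devices/0000:01:00.0/driver/unbind", dev);
    /* Ask the PCI core to re-probe, which binds vfio-pci. */
    write_str("/sys/bus/pci/drivers_probe", dev);
    return 0;
}
```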

SR-IOV (Single Root I/O Virtualization)

SR-IOV is a specification that allows a single PCIe device to appear as multiple separate PCIe devices, so-called virtual functions (VFs), that can be assigned to different VMs. With SR-IOV, multiple VMs can access the NIC directly without the overhead of traversing the host Linux kernel, thereby accelerating communications. It can also be used to offload service chaining from the CPU, as it provides a mechanism for VMs to communicate with each other without involving the CPU. In this way it is a valid alternative to service chaining with OVS (Open vSwitch), which depends on the processing power of the CPU. The actual processing of data packets between the different VNFs is then performed by the hardware and buffers of the Network Interface Card.
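
A brief sketch of how virtual functions are typically created on Linux follows: the desired VF count is written to the NIC's sriov_numvfs attribute in sysfs. The interface name and VF count below are hypothetical, and the NIC and its driver must support SR-IOV.

```c
/* Create SR-IOV virtual functions on a NIC via sysfs (sketch;
 * the interface name "eth0" and the VF count are hypothetical). */
#include <stdio.h>

int main(void)
{
    /* Each VF created here appears as its own PCI device that can
     * be passed to a VM, while switching between VFs happens in the
     * NIC hardware rather than in a CPU-based virtual switch. */
    FILE *f = fopen("/sys/class/net/eth0/device/sriov_numvfs", "w");
    if (!f) { perror("sriov_numvfs"); return 1; }
    fprintf(f, "4");  /* request four virtual functions */
    fclose(f);
    return 0;
}
```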

Most of these acceleration techniques will improve performance by a factor of 5 or more, but the choice of whether or not to use them depends on the expected price/performance ratio of the services being deployed. In addition, it is important to align the different acceleration techniques in a service chain, as overall performance will only be as good as the weakest link in the chain. Only in this way is it possible to maximize and optimize the performance of white-box CPE for a given combination of services.