Implementing DPDK and Xen
With DPDK, packet processing is performed at the application layer in the virtual machine. Receive processing is based on polling the receive interface (using the EAL) rather than on interrupts. Interrupts require a fair amount of overhead in the “normal” case, but when interrupts must be propagated from the host operating system to the hypervisor to the guest operating system to the application, they end up being very expensive.
With DPDK, threads communicate through the use of shared memory queues. DPDK provides mechanisms for lockless (i.e. no blocking or synchronization needed) ring buffers that support single and multiple writers as well as single and multiple readers. These are called “rte-rings”.
For Rx (Receive Processing), the NIC (Network Interface) performs a Direct Memory transfer (DMA) to a buffer ring. If the NIC supports Receive Side Scaling (RSS), it can queue packets to different threads on different cores based on packet filters setup on the NIC. This increases the packet processing performance by spreading it over multiple cores.
The Proc function (Processing) can be scaled by decomposing it into a sequence of steps that can be organized as a pipeline. For example, if the processing is organized as three sequential steps, the three threads can be assigned to different cores and once the pipeline is full the system is, in effect, working on three packets simultaneously. There are a couple of different models for this depending on whether RSS is being used (which is shown in the top example in the following diagram.
The Tx (Transit Processing) is initiated by putting the packet on a shared queue. The Tx process can then transfer the packet to the NIC for transmission.
DPDK is intended for a solution where all of the threads are running on the same Virtual Machine.
Packet Processing with XEN
Not all packet processing solutions are designed to run on a single Virtual Machine. There are administrative reasons for splitting the system across multiple virtual machines. For example, if the packet stream represents multiple customers, then it might be desirable to split the processing across multiple VMs to provide separation and protection between customers as well as facilitating billing. There are also functional reasons for splitting the system across multiple virtual machines. For example, if the server is providing both a client focused capability (such as DHCP) and also a network service such as an IP Router, then running these on different VMs makes sense.
DPDK can provide performance benefits within a single VM, but splitting the processing across VMs is a bit more problematic since each VM appears as an independent and self-contained machine, each with its own memory. Providing communications and data movement between these VMs could be done using a networking function (i.e. transmitting and receiving packets through a virtual switch), but there are performance problems with this. Using shared memory between processes is much faster.
This is where XEN can play a part. XEN provides inter-VM shared memory using a page grant mechanism. The XEN hypervisor runs in Domain 0 (what we normally think of as kernel mode). It has access to the hardware page tables and memory management functions. Most of the work is done in the User Domain, however. So, a process in Dom 0 (e.g., a NIC driver) or Dom U in a VM (e.g., an application) can share a page but making a request to XEN. XEN enters the page into a grant table and returns a handle for the page. The handle can then be provided to another process running on another VM to grant access to the page. This mechanism is called “xenstore”.
XEN provides support for non-blocking ring buffers in xenstore similar to DPDK rte-rings. These are called xenstore-rings.
In the final section we compare and summarize the two solutions. XEN and DPDK Solutions: Alternative to Polling >
By Larry S.