- EXPERT CONSULTING
- OUR EXPERTISE
- OUR SERVICES
- INTELLECTUAL CAPITAL
- BLOGS & NEWS
Intel’s Data Plane Development Kit (DPDK) has emerged as the key enabler for building the high-performance data planes needed by network functions virtualization applications. DPDK is a set of software libraries and drivers that can be integrated with virtual network functions (VNFs) or into virtual switches (or both) for fast packet performance in a server powered by an Intel® architecture processor. DPDK, initially developed by Intel, is now a Linux Foundation open source project.
Communications service providers are turning to NFV to help improve the agility and lower the costs of their network services. DPDK is one of the critical underlying technologies that improves data plane throughput so that VNFs deliver the deterministic response like the legacy, appliance-based applications they are replacing. With a long list of DPDK project implementations, combined with years of network software technology expertise, Northforge works with companies seeking a performance edge for their NFV applications.
Download Intel Network Builder’s solution brief on Northforge’s work with DPDK applications here.
In a building, each floor depends on the strength of the floors below it. Ensuring that the fifth floor is reinforced provides very little comfort if there is a structural problem on the third floor. In order to ensure that the building keeps standing, you need to reinforce every floor starting from the bottom.
And so it is with protocol stacks. Providing security at the transport layer (e.g., TLS) has questionable value if the packet exchange is compromised at the network or data link layer. Yet, we rarely worry about protecting these layers.
This new technical brief from Northforge describes some of the common attacks that can occur at the data link layer and how MAC-layer Security or MACsec (IEEE std 802.1AE™) can be used to provide hop-by-hop or end-to-end authentication and encryption to protect the lowest floors of your protocol building.
For a technical brief on MACsec, download here.
High performance I/O from a NIC
XEN and DPDK can be complementary. Clearly an EAL can be used within a VM running on top of the XEN hypervisor to provide high performance I/O from a NIC. However XEN and DPDK can also be alternatives for implementing a solution to the same problem. Consider a system that needs to perform the following sequence of functions
Using DPDK in a single VM, this could be decomposed as follows:
The solution can be implemented natively on top of XEN in multiple VMs like this:
The world of using virtual machines to implement virtualized network functions is still new. Each application is different, either administratively, functionally, or in performance requirements and therefore each will need an implementation that addresses its specific requirements. DPDK and XEN are two of the tools that can assist with these design problems.
If you need help designing and implementing your VNFs call Northforge Innovations. This is what we do.
By Larry S.
Implementing DPDK and Xen
With DPDK, packet processing is performed at the application layer in the virtual machine. Receive processing is based on polling the receive interface (using the EAL) rather than on interrupts. Interrupts require a fair amount of overhead in the “normal” case, but when interrupts must be propagated from the host operating system to the hypervisor to the guest operating system to the application, they end up being very expensive.
With DPDK, threads communicate through the use of shared memory queues. DPDK provides mechanisms for lockless (i.e. no blocking or synchronization needed) ring buffers that support single and multiple writers as well as single and multiple readers. These are called “rte-rings”.
For Rx (Receive Processing), the NIC (Network Interface) performs a Direct Memory transfer (DMA) to a buffer ring. If the NIC supports Receive Side Scaling (RSS), it can queue packets to different threads on different cores based on packet filters setup on the NIC. This increases the packet processing performance by spreading it over multiple cores.
The Proc function (Processing) can be scaled by decomposing it into a sequence of steps that can be organized as a pipeline. For example, if the processing is organized as three sequential steps, the three threads can be assigned to different cores and once the pipeline is full the system is, in effect, working on three packets simultaneously. There are a couple of different models for this depending on whether RSS is being used (which is shown in the top example in the following diagram.
The Tx (Transit Processing) is initiated by putting the packet on a shared queue. The Tx process can then transfer the packet to the NIC for transmission.
DPDK is intended for a solution where all of the threads are running on the same Virtual Machine.
Not all packet processing solutions are designed to run on a single Virtual Machine. There are administrative reasons for splitting the system across multiple virtual machines. For example, if the packet stream represents multiple customers, then it might be desirable to split the processing across multiple VMs to provide separation and protection between customers as well as facilitating billing. There are also functional reasons for splitting the system across multiple virtual machines. For example, if the server is providing both a client focused capability (such as DHCP) and also a network service such as an IP Router, then running these on different VMs makes sense.
DPDK can provide performance benefits within a single VM, but splitting the processing across VMs is a bit more problematic since each VM appears as an independent and self-contained machine, each with its own memory. Providing communications and data movement between these VMs could be done using a networking function (i.e. transmitting and receiving packets through a virtual switch), but there are performance problems with this. Using shared memory between processes is much faster.
This is where XEN can play a part. XEN provides inter-VM shared memory using a page grant mechanism. The XEN hypervisor runs in Domain 0 (what we normally think of as kernel mode). It has access to the hardware page tables and memory management functions. Most of the work is done in the User Domain, however. So, a process in Dom 0 (e.g., a NIC driver) or Dom U in a VM (e.g., an application) can share a page but making a request to XEN. XEN enters the page into a grant table and returns a handle for the page. The handle can then be provided to another process running on another VM to grant access to the page. This mechanism is called “xenstore”.
XEN provides support for non-blocking ring buffers in xenstore similar to DPDK rte-rings. These are called xenstore-rings.
In the final section we compare and summarize the two solutions. Comparing XEN and DPDK Solutions – A summary
By Larry S.
In general, packet processing applications follow a standard regimen:
The Rx part is, more or less, the same regardless of the type of packet processing. The Proc part is the heart of the application.
This could be an Ethernet Switching application, an IP Routing application, Deep Packet Inspection (DPI), or a protocol process such as DHCP.
In gateway applications (routing and switching) the Tx is usually a different interface than the Rx interface whereas with a protocol process the Tx is usually a response to the sender and therefore to the same interface as the Rx.
The performance requirements vary from application to application, but a 1Gbps Ethernet can transport approximately 1.5 million packets per second, so the requirements could be very steep.
One approach to achieving high performance is to employ multiple cores. For Rx, this can be done by using a hardware capability called Receive Side Scaling (RSS). With RSS the Network Interface (NIC) can queue packets to different threads on different cores based on fields in the packet. For Proc, this can be done by breaking the processing into multiple tasks and creating a pipeline. (sidebar on pipeline processing). Tx usually requires very little processing so there is little motivation for partitioning it.
By effectively using multiple cores, the packet processing performance of the system is greatly enhanced because the system is processing multiple packets simultaneously.
Based on the general packet processing model described here, we will next show how this model can be implemented using DPDK and XEN: Packet Processing with DPDK and XEN
By Andrei C.
– No host OS + paravirtualization support = performance improvement
XEN is a hypervisor. A hypervisor is a supervisory program (think, operating system) that provides support for virtual machines. Parallels (Parallels), VMWare (Dell), and Virtual Box (Oracle) are all hypervisors. They provide an environment that hosts a number of processes (virtual machines) where each virtual machine believes it is running on the underlying hardware. Each virtual machine contains a guest operating system (e.g., Windows, macOS, Linux) and one or more processes/applications running within the guest operating system. Each of these hypervisors sits on top of a host operating system (e.g., Windows, macOS, Linux).
It is common to run a hypervisor like Virtual Box on a Mac and load one or more Windows virtual machines in order to run applications that only run on Windows.
XEN is a bit different in several ways from the three hypervisors listed above. First, it is a “bare metal” system. It runs as the lowest level, right on top of the hardware — there is no host operating system. As you’d expect, this improves performance and efficiency.
Second, XEN supports paravirtualization. With paravirtualization, XEN provides APIs for many system functions and the guest operating system in the virtual machines can be rebuilt to access system resources and the I/O subsystem via this API.
There are several Linux distributions that have been recompiled to use the XEN interface. This can improve performance and also allow stronger support for virtual machines on CPUs that don’t have good VM support (some x86s, older ARMs, etc.) XEN also provides support for accessing the underlying hardware like the other hypervisors (this is called Hardware Virtual Machine, HVM). The final difference is that XEN is open-source.
One of the major problems with implementing network processing applications in virtual machines is implementing high performance I/O. Packet processing applications often need to process tens of thousands or even millions of packets per second. This is difficult at the application layer, in general, but even more difficult when the system I/O calls are made to a guest operating system which has to access the hard through the hypervisor and the host operating system.
The Data Plane Development Kit (DPDK) is a solution to this problem. DPDK is a framework that provides for creating software libraries tailored for specific hardware architectures (e.g., X86) and specific operating systems (e.g., Linux). These libraries (called Environment Abstraction Layer or EAL) provide high performance, generic (i.e., hardware and OS independent) access to hardware and operating resources including the I/O subsystem.
Using the DPDK EAL allows the development of high performance user-mode packet processing applications which can also be tuned to exploit multi-core CPUs.
Next, we provide an overview of a general packet processing model: General Packet Processing Model
By Larry S.
First in a five-part series
It is increasingly common to implement packet processing functions in virtual machines. This is what Network Functions Virtualization (NFV) is all about.
The most common implementation model for network functions has been to replicate the functions in the devices that are distributed around the network. This is the easiest way to do it, but taking a step back it becomes clear that this is not the most efficient way to do it both from a management point of view as well as from an overall efficiency point of view.
Consider DHCP for example. A network could have 50 edge routers running DHCP, but there is no reason why DHCP has to run in each edge router. The traffic load associated with DHCP is small so it makes sense to run each of the 50 DHCP instances as a Virtual Machine (VM) or possibly as a thread of a DHCP VM on a centralized server. There is the efficiency benefit in statistically multiplexing the computational load (all 50 instances are never running at the same time) and several management benefits such as only having to update a single system to fix bugs and add features and only having to configuration a single local system. Additionally, modern cloud computing technology provides the ability to migrate VMs for redundancy and to cloud burst for unexpected load peaks.
NFV requires high-performance software-based implementations of these packet processing functions in a virtual machine environment. These implementations must be able to read packets from the network, process them, and send packets out to the network. There are a variety of tools and techniques for implementing packet processing in Virtual Machines. In this blog series we will discuss two common, but very different approaches. One is the XEN Hypervisor and the other is the Data Plane Development Kit (DPDK). Depending on the functionality required, these two solutions can be used independently or concurrently. Over a sequence of blog posts we will describe these two approaches and how they can be used to implement high-performance packet-processing software.
In the next blog post we provide a brief overview of the XEN hypervisor and DPDK.
Are you looking to reduce costs and improve latency? The Multi Stage Content Aware Engine of the Broadcom StrataXGS allows for additional packet inspection and manipulation in the switching chip rather than using an external device like an FPGA, NPU or software on a host processor. That helps to reduce cost and also reduces packet processing latency by staying in the chip. It also enables a degree of Packet Inspection and packet handling decisions based on that inspection without needing to go off chip, again giving you cost reduction and latency improvement.
This content aware engine capability of Broadcom enables customers to replace FPGAs, redirect traffic to applications and develop diagnostic tools. The Broadcom XGS Content Aware engine provide the following capabilities:
The Content Aware Engine can be further classified in the following phases:
At Northforge we can help you program the content aware engine as per the application requirement and speed up your development process by fulfilling the required functionality by using the Broadcom chipset. We have hands-on experience with our customers in replacing their FPGAs and applications development like IPTV with the content aware engine of Broadcom to fulfill their complex deployment scenarios in the field.
Let’s say you want to develop a service where a SmartHome monitoring system detects smoke and a phone call is generated to alert the homeowner. With the ability to receive external events from any monitoring software, there’s a free, open source communications software for creating voice and messaging products for that.
FreeSWITCH is a softswitch for PBX applications which can create that phone call alert and then connect the homeowner to the 911 operator. In this situation, it allows you to generate calls for automation systems where you play audio files, collect user input, and then decide to make another call and have two parties talk to each other. In this case, with FreeSWITCH you can also receive external events from external programs, such as the home monitoring system, and can generate calls remotely. Its built-in Interactive Voice Response (IVR) system lets you play a pre-recorded audio file, collect user input (usually dtmf digits) and make decisions — either to play something else or to generate another call, hang up or basically anything else within the phone system. Some of this functionality is provided by external systems that are integrated using FreeSWITCH modules.
There’s more to FreeSWITCH than this one use case. You can use FreeSWITCH to develop text-to-speech engines where you can turn any text into an audio file and then play it, or have your audio files professionally pre-recorded and then play it while still receiving user input. Or the reverse, use it for speech recognition such as when you want cell phone voicemail messages to also appear in a text format. It’s very suitable for voicemail and eFax applications. FreeSWITCH works with Linux and Windows, and supports English, French, German and Russian languages.
FreeSWITCH is quite popular with telecommunications software developers who need to create automated telemarketing programs. Using FreeSWITCH as a base, you can generate automated telemarketing calls where FreeSwitch generates the call, plays a pre-recorded audio file and then hangs up.
It can be used to develop services which receive incoming calls to the FreeSWITCH for one leg of the call, play an audio file and based on what the user wants to do, connect to another application, such as voicemail. Or create a service where when the user receives a voicemail, they can also receive an email with an attached audio file telling them that a voicemail is waiting for them.
Our Northforge team has created voice and messaging services based on FreeSWITCH for several customers who want to expand or start new services. Based on our experience with this open source software, we can get our customers telecom programs up and running quickly.
In last few years mobile communications have dramatically increased in popularity and usage. This growth has inspired a development of advanced communication protocols offering higher throughput and reliability over wireless links.
Much of wireless technology is based on the principle of direct point-to-point communication, where participating nodes “speak” directly to a centralized access point.
However, there is an alternative, “multi-hop” approach, where the nodes communicate to each other using other nodes as relays for traffic if the endpoint is out of direct communication range.
Mobile Ad hoc NETwork (MANET), described here, uses the multi-hop model.
Wikipedia describes MANET (Mobile Ad Hoc Network) as a continuously self-configuring, infrastructure-less network of mobile devices connected wirelessly.
All the nodes (devices) are wireless, mobile and equal (no access points, base stations, or any other kind of infrastructure).
The best comparison could be a cellular network WITHOUT base stations, where all the phones need to create multi-hop mesh network.
Infrastructure-based Network (i.e. Cellular):
Infrastructure-less Network (MANET – Mobile Ad hoc Network):
These networks are self-configuring and can be set up randomly and on-demand. Such networks can have dynamically changing multi-hop topologies, composed of, likely, bandwidth-constrained wireless links.
The concept of the mobile ad-hoc network suggests the incorporation of routing functionality into mobile nodes, in other words all nodes should be able to act as routers for each other.
Since an infrastructure-based network is always a better solution than an infrastructure-less network in the meaning of network performance, MANET is relevant only in cases when laying the infrastructure is impossible or is not practical:
MANET – Layer-3 Routing Core
Ad-hoc networks are not restricted to special hardware or a certain link layer. MANET is a routing core (Layer-3 routing protocols) running on top of any possible Layer-2 wireless medium that is able to provide connectivity between the neighboring (1-hop) nodes:
It is important to note a difference between MANET routing and traditional IP routing. Routing in fixed networks is based on aggregation combined with best matching. When a packet is to be forwarded, the routing table is consulted and the packet is transmitted on the interface registered with a route containing the best match for the destination, i.e. all hosts within the same subnet are available on a single one-hop network segment via routers. However, in MANETs nodes route traffic by transmitting packets on the interface it has arrived from.
Aggregation is not required in MANETs, as all routing is host based and for all destinations within MANET, a sender has a specific route.
There are two principal approaches for route maintenance in MANET – reactive and proactive:
Northforge implemented the Optimized Link State Routing Protocol (OLSR). OLSR is an IP routing protocol optimized for mobile and wireless ad hoc networks. The protocol was integrated in a commercial routing stack suite. OLSR is documented in RFC3626 and uses the link-state scheme in an optimized manner to propagate topology information. The optimization is based on a technique called MultiPoint Relaying.
OLSR operation mainly consists of updating and maintaining information in routing tables. The data in these tables is based on received control traffic and the control traffic is generated based on information retrieved from the tables.
A general MANET network is illustrated below:
Key to Success: Network Simulation
Network simulation is designed for characterizing, creating and validating the communication solutions, computer networks and distributed or parallel systems. It enables the prediction of network behavior and performance. One can create, run and analyze any desired communication scenario.
Generally, a simulation is the only method that allows continuous development, testing and debugging of a network comprised of hundreds and thousands of mobile MANET nodes, since a standard lab won’t do, and field tests are expensive, difficult to operate and non-deterministic.
One of the challenges during the development was testing OLSR with various topologies, e.g. two nodes, three 1-hop neighbors or 2-hop neighbor. In order to validate correct behavior of OLSR, it was important to emulate the dynamic nature of MANET, where nodes can roam around, come up and come down. To address these challenges, we decided to deploy a virtualized test environment, based on Linux containers (LXC), thus enabling execution of multiple OS instances on a single x86 machine.
In addition, Network Simulation has a very important practical usage: it can be supplied as a Network Planning System add-on product for MANET core.
This application supplies the following abilities:
We learned a lot during this MANET project, specifically the ability to interpret and implement RFCs in MANET situations and how network simulation based on LXC is a key to success. We’ll be taking this knowledge to future MANET projects as mobile ad-hoc networks grow in use in this growing mobile network environment.
Authors: Oleg P. and Sasha I.