Implementing DPDK and Xen
With DPDK, packet processing is performed at the application layer in the virtual machine. Receive processing is based on polling the receive interface (using the EAL) rather than on interrupts. Interrupts require a fair amount of overhead in the “normal” case, but when interrupts must be propagated from the host operating system to the hypervisor to the guest operating system to the application, they end up being very expensive.
With DPDK, threads communicate through the use of shared memory queues. DPDK provides mechanisms for lockless (i.e. no blocking or synchronization needed) ring buffers that support single and multiple writers as well as single and multiple readers. These are called “rte-rings”.
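The single-writer/single-reader case described above can be sketched as follows. This is a toy model of the idea behind rte-rings, not the actual DPDK API; the class and method names are hypothetical.

```python
# Toy single-producer/single-consumer lockless ring, illustrating the idea
# behind DPDK's rte_ring (a hypothetical sketch, not the DPDK API).

class SpscRing:
    """Fixed-size ring; one writer and one reader can proceed without locks
    because each index is written by only one side."""
    def __init__(self, size):
        assert size > 0 and (size & (size - 1)) == 0, "size must be a power of two"
        self.buf = [None] * size
        self.mask = size - 1
        self.head = 0  # written only by the producer
        self.tail = 0  # written only by the consumer

    def enqueue(self, item):
        if self.head - self.tail == len(self.buf):
            return False            # ring full
        self.buf[self.head & self.mask] = item
        self.head += 1              # publish only after the slot is written
        return True

    def dequeue(self):
        if self.tail == self.head:
            return None             # ring empty
        item = self.buf[self.tail & self.mask]
        self.tail += 1
        return item
```

The multi-writer and multi-reader variants that DPDK also supports require atomic compare-and-swap updates of the head and tail indices, which this sketch omits.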
For Rx (Receive Processing), the NIC (Network Interface Card) performs a Direct Memory Access (DMA) transfer to a buffer ring. If the NIC supports Receive Side Scaling (RSS), it can queue packets to different threads on different cores based on packet filters set up on the NIC. This increases packet processing performance by spreading the work over multiple cores.
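Conceptually, RSS hashes header fields of each packet to choose a receive queue, which keeps all packets of a flow on the same core. A minimal sketch, using CRC32 as a stand-in for the Toeplitz hash that real NICs compute over a configured key:

```python
# Hypothetical sketch of what RSS does in the NIC: hash the flow's header
# fields to pick a receive queue, so each flow stays on one core.

import zlib

def rss_queue(src_ip, dst_ip, src_port, dst_port, num_queues):
    """Map a flow's 4-tuple to a queue index. Real NICs use a Toeplitz
    hash over a configured key; CRC32 stands in for it here."""
    key = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % num_queues
```

Because the hash is deterministic, every packet of a given flow lands in the same queue, so per-flow state never needs to be shared across cores.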
The Proc function (Processing) can be scaled by decomposing it into a sequence of steps that can be organized as a pipeline. For example, if the processing is organized as three sequential steps, the three threads can be assigned to different cores, and once the pipeline is full the system is, in effect, working on three packets simultaneously. There are a couple of different models for this depending on whether RSS is being used (the RSS case is shown in the top example in the following diagram).
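The three-step decomposition above can be sketched as a chain of stages. The stage names here are hypothetical; in a real deployment each stage would run as its own thread on its own core, connected to the next by shared-memory rings, whereas this sketch simply chains them in order.

```python
# Sketch of a three-stage Proc pipeline (stage names are illustrative).
# In practice each stage runs on its own core, connected by rings.

def parse(pkt):       # stage 1: extract header fields
    pkt["parsed"] = True
    return pkt

def classify(pkt):    # stage 2: pick a forwarding class
    pkt["class"] = "fast" if pkt.get("port") == 80 else "slow"
    return pkt

def rewrite(pkt):     # stage 3: update headers for Tx
    pkt["ttl"] -= 1
    return pkt

PIPELINE = [parse, classify, rewrite]

def process(pkt):
    for stage in PIPELINE:
        pkt = stage(pkt)
    return pkt
```

With the stages on separate cores, three packets are in flight at once: while packet N is being rewritten, packet N+1 is being classified and packet N+2 parsed.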
Tx (Transmit Processing) is initiated by putting the packet on a shared queue. The Tx process can then transfer the packet to the NIC for transmission.
DPDK is intended for a solution where all of the threads are running on the same Virtual Machine.
Not all packet processing solutions are designed to run on a single Virtual Machine. There are administrative reasons for splitting the system across multiple virtual machines. For example, if the packet stream represents multiple customers, then it might be desirable to split the processing across multiple VMs to provide separation and protection between customers as well as facilitating billing. There are also functional reasons for splitting the system across multiple virtual machines. For example, if the server is providing both a client focused capability (such as DHCP) and also a network service such as an IP Router, then running these on different VMs makes sense.
DPDK can provide performance benefits within a single VM, but splitting the processing across VMs is a bit more problematic since each VM appears as an independent and self-contained machine, each with its own memory. Providing communications and data movement between these VMs could be done using a networking function (i.e. transmitting and receiving packets through a virtual switch), but there are performance problems with this. Using shared memory between processes is much faster.
This is where XEN can play a part. XEN provides inter-VM shared memory using a page grant mechanism. The XEN control software runs in Domain 0 (what we normally think of as kernel mode), which has access to the hardware page tables and memory management functions. Most of the work is done in the User Domain (Dom U), however. So, a process in Dom 0 (e.g., a NIC driver) or in Dom U in a VM (e.g., an application) can share a page by making a request to XEN. XEN enters the page into a grant table and returns a handle for the page. The handle can then be provided to a process running in another VM to grant it access to the page.
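The grant flow described above can be modeled in a few lines. This is a toy illustration of the idea, not the real Xen hypercall interface; all names are hypothetical.

```python
# Toy model of the page-grant idea (not the real Xen API): the hypervisor
# records which domain may access which page and hands back a grant handle
# that the owner can pass to another VM.

class Hypervisor:
    def __init__(self):
        self._grants = {}        # handle -> (owner, page, grantee)
        self._next = 1

    def grant_access(self, owner, page, grantee):
        handle = self._next
        self._next += 1
        self._grants[handle] = (owner, page, grantee)
        return handle            # owner sends this handle to the grantee

    def map_grant(self, domain, handle):
        owner, page, grantee = self._grants[handle]
        if domain != grantee:
            raise PermissionError("domain was not granted this page")
        return page              # grantee can now read/write the shared page
```

The key point is that both domains end up with access to the same physical page, so data moves between VMs without being copied through a virtual network.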
XEN also provides support for non-blocking ring buffers built on top of these granted pages, similar to DPDK rte-rings; Xen's inter-domain I/O rings follow this model.
In the final section we compare and summarize the two solutions.
Comparing XEN and DPDK Solutions – A Summary
By Larry S.
In general, packet processing applications follow a standard regimen:
The Rx part is, more or less, the same regardless of the type of packet processing. The Proc part is the heart of the application.
This could be an Ethernet Switching application, an IP Routing application, Deep Packet Inspection (DPI), or a protocol process such as DHCP.
In gateway applications (routing and switching) the Tx is usually a different interface than the Rx interface whereas with a protocol process the Tx is usually a response to the sender and therefore to the same interface as the Rx.
The performance requirements vary from application to application, but a 1Gbps Ethernet can transport approximately 1.5 million packets per second, so the requirements could be very steep.
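The 1.5 million packets-per-second figure comes from the worst case of back-to-back minimum-size Ethernet frames; the arithmetic is worth spelling out:

```python
# Where the ~1.5 Mpps figure comes from: the worst case is back-to-back
# minimum-size frames. Each 64-byte frame also costs 8 bytes of preamble
# and a 12-byte inter-frame gap on the wire.

LINK_BPS = 1_000_000_000           # 1 Gbps
WIRE_BYTES = 64 + 8 + 12           # frame + preamble + inter-frame gap

pps = LINK_BPS / (WIRE_BYTES * 8)
print(round(pps))                  # roughly 1.49 million packets per second
```

At that rate the application has well under a microsecond of CPU time per packet, which is why interrupt-per-packet designs fall over and polling designs like DPDK exist.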
One approach to achieving high performance is to employ multiple cores. For Rx, this can be done by using a hardware capability called Receive Side Scaling (RSS). With RSS the Network Interface Card (NIC) can queue packets to different threads on different cores based on fields in the packet. For Proc, this can be done by breaking the processing into multiple tasks and creating a pipeline. Tx usually requires very little processing, so there is little motivation for partitioning it.
By effectively using multiple cores, the packet processing performance of the system is greatly enhanced because the system is processing multiple packets simultaneously.
Based on the general packet processing model described here, we will next show how this model can be implemented using DPDK and XEN:
Packet Processing with DPDK and XEN
By Andrei C.
– No host OS + paravirtualization support = performance improvement
XEN is a hypervisor. A hypervisor is a supervisory program (think, operating system) that provides support for virtual machines. Parallels (Parallels), VMware (Dell), and VirtualBox (Oracle) are all hypervisors. They provide an environment that hosts a number of processes (virtual machines) where each virtual machine believes it is running on the underlying hardware. Each virtual machine contains a guest operating system (e.g., Windows, macOS, Linux) and one or more processes/applications running within the guest operating system. Each of these hypervisors sits on top of a host operating system (e.g., Windows, macOS, Linux).
It is common to run a hypervisor like Virtual Box on a Mac and load one or more Windows virtual machines in order to run applications that only run on Windows.
XEN differs in several ways from the three hypervisors listed above. First, it is a “bare metal” system. It runs at the lowest level, right on top of the hardware; there is no host operating system. As you’d expect, this improves performance and efficiency.
Second, XEN supports paravirtualization. With paravirtualization, XEN provides APIs for many system functions and the guest operating system in the virtual machines can be rebuilt to access system resources and the I/O subsystem via this API.
There are several Linux distributions that have been recompiled to use the XEN interface. This can improve performance and also allow stronger support for virtual machines on CPUs that don’t have good VM support (some x86s, older ARMs, etc.). XEN also provides support for accessing the underlying hardware like the other hypervisors (this is called Hardware Virtual Machine, HVM). The final difference is that XEN is open source.
One of the major problems with implementing network processing applications in virtual machines is implementing high-performance I/O. Packet processing applications often need to process tens of thousands or even millions of packets per second. This is difficult at the application layer in general, but even more difficult when the system I/O calls are made to a guest operating system, which has to access the hardware through the hypervisor and the host operating system.
The Data Plane Development Kit (DPDK) is a solution to this problem. DPDK is a framework for creating software libraries tailored to specific hardware architectures (e.g., x86) and specific operating systems (e.g., Linux). These libraries (called the Environment Abstraction Layer, or EAL) provide high-performance, generic (i.e., hardware- and OS-independent) access to hardware and operating system resources, including the I/O subsystem.
Using the DPDK EAL allows the development of high performance user-mode packet processing applications which can also be tuned to exploit multi-core CPUs.
Next, we provide an overview of a general packet processing model:
General Packet Processing Model
By Larry S.
First in a five-part series
It is increasingly common to implement packet processing functions in virtual machines. This is what Network Functions Virtualization (NFV) is all about.
The most common implementation model for network functions has been to replicate the functions in the devices distributed around the network. This is the easiest approach, but taking a step back makes it clear that it is not the most efficient one, from either a management point of view or an overall efficiency point of view.
Consider DHCP, for example. A network could have 50 edge routers running DHCP, but there is no reason why DHCP has to run in each edge router. The traffic load associated with DHCP is small, so it makes sense to run each of the 50 DHCP instances as a Virtual Machine (VM), or possibly as a thread of a DHCP VM, on a centralized server. There is an efficiency benefit in statistically multiplexing the computational load (all 50 instances are never running at the same time) and several management benefits, such as only having to update a single system to fix bugs and add features and only having to configure a single local system. Additionally, modern cloud computing technology provides the ability to migrate VMs for redundancy and to cloud burst for unexpected load peaks.
NFV requires high-performance software-based implementations of these packet processing functions in a virtual machine environment. These implementations must be able to read packets from the network, process them, and send packets out to the network. There are a variety of tools and techniques for implementing packet processing in Virtual Machines. In this blog series we will discuss two common, but very different approaches. One is the XEN Hypervisor and the other is the Data Plane Development Kit (DPDK). Depending on the functionality required, these two solutions can be used independently or concurrently. Over a sequence of blog posts we will describe these two approaches and how they can be used to implement high-performance packet-processing software.
In the next blog post we provide a brief overview of the XEN hypervisor and DPDK.
Are you looking to reduce costs and improve latency? The Multi-Stage Content Aware Engine of the Broadcom StrataXGS allows additional packet inspection and manipulation to be performed in the switching chip rather than in an external device such as an FPGA, NPU, or software on a host processor. Keeping this processing on-chip reduces both cost and packet processing latency, and it enables a degree of packet inspection, and packet handling decisions based on that inspection, without ever going off-chip.
This content aware engine capability enables customers to replace FPGAs, redirect traffic to applications, and develop diagnostic tools. The Broadcom XGS Content Aware Engine provides the following capabilities:
The Content Aware Engine can be further classified in the following phases:
At Northforge we can help you program the content aware engine as per the application requirement and speed up your development process by fulfilling the required functionality by using the Broadcom chipset. We have hands-on experience with our customers in replacing their FPGAs and applications development like IPTV with the content aware engine of Broadcom to fulfill their complex deployment scenarios in the field.
Let’s say you want to develop a service where a SmartHome monitoring system detects smoke and a phone call is generated to alert the homeowner. With the ability to receive external events from any monitoring software, there’s free, open source communications software for creating voice and messaging products for that.
FreeSWITCH is a softswitch for PBX applications that can create that phone call alert and then connect the homeowner to the 911 operator. It allows you to generate calls for automation systems: play audio files, collect user input, and then decide to make another call and have two parties talk to each other. FreeSWITCH can also receive events from external programs, such as the home monitoring system, and can generate calls remotely. Its built-in Interactive Voice Response (IVR) system lets you play a pre-recorded audio file, collect user input (usually DTMF digits), and make decisions: play something else, generate another call, hang up, or do essentially anything else within the phone system. Some of this functionality is provided by external systems integrated through FreeSWITCH modules.
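The play-prompt/collect-digits/decide loop at the heart of an IVR can be sketched abstractly. This is a toy decision table, not FreeSWITCH's actual dialplan or Event Socket API; the menu entries and action names are hypothetical.

```python
# Toy model of the IVR decision step described above (hypothetical menu,
# not FreeSWITCH's dialplan syntax): map collected DTMF digits to an action.

MENU = {
    "1": ("play", "office_hours.wav"),      # play another prompt
    "2": ("bridge", "voicemail"),           # connect to another application
    "9": ("originate", "911_operator"),     # generate a new outbound call
}

def ivr_decide(digits):
    """Return the action for the collected digits, or a re-prompt."""
    return MENU.get(digits, ("play", "invalid_choice.wav"))
```

In a real deployment this decision logic would live in a dialplan or in an external script driven by FreeSWITCH events, but the shape of it, digits in, action out, is the same.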
There’s more to FreeSWITCH than this one use case. You can use FreeSWITCH to develop text-to-speech engines where you can turn any text into an audio file and then play it, or have your audio files professionally pre-recorded and then play them while still receiving user input. Or the reverse: use it for speech recognition, such as when you want cell phone voicemail messages to also appear in text format. It’s very suitable for voicemail and eFax applications. FreeSWITCH works with Linux and Windows, and supports English, French, German, and Russian languages.
FreeSWITCH is quite popular with telecommunications software developers who need to create automated telemarketing programs. Using FreeSWITCH as a base, you can generate automated telemarketing calls where FreeSwitch generates the call, plays a pre-recorded audio file and then hangs up.
It can be used to develop services which receive incoming calls to the FreeSWITCH for one leg of the call, play an audio file and based on what the user wants to do, connect to another application, such as voicemail. Or create a service where when the user receives a voicemail, they can also receive an email with an attached audio file telling them that a voicemail is waiting for them.
Our Northforge team has created voice and messaging services based on FreeSWITCH for several customers who want to expand or start new services. Based on our experience with this open source software, we can get our customers telecom programs up and running quickly.
In the last few years, mobile communications have dramatically increased in popularity and usage. This growth has inspired the development of advanced communication protocols offering higher throughput and reliability over wireless links.
Much of wireless technology is based on the principle of direct point-to-point communication, where participating nodes “speak” directly to a centralized access point.
However, there is an alternative, “multi-hop” approach, where the nodes communicate to each other using other nodes as relays for traffic if the endpoint is out of direct communication range.
Mobile Ad hoc NETwork (MANET), described here, uses the multi-hop model.
Wikipedia describes MANET (Mobile Ad Hoc Network) as a continuously self-configuring, infrastructure-less network of mobile devices connected wirelessly.
All the nodes (devices) are wireless, mobile and equal (no access points, base stations, or any other kind of infrastructure).
The best comparison is a cellular network WITHOUT base stations, where all the phones need to create a multi-hop mesh network.
Infrastructure-based Network (i.e. Cellular):
Infrastructure-less Network (MANET – Mobile Ad hoc Network):
These networks are self-configuring and can be set up randomly and on-demand. Such networks can have dynamically changing multi-hop topologies, composed of, likely, bandwidth-constrained wireless links.
The concept of the mobile ad-hoc network suggests the incorporation of routing functionality into mobile nodes, in other words all nodes should be able to act as routers for each other.
Since an infrastructure-based network generally outperforms an infrastructure-less one, MANET is relevant only in cases where laying the infrastructure is impossible or impractical:
MANET – Layer-3 Routing Core
Ad-hoc networks are not restricted to special hardware or a certain link layer. MANET is a routing core (Layer-3 routing protocols) running on top of any possible Layer-2 wireless medium that is able to provide connectivity between the neighboring (1-hop) nodes:
It is important to note a difference between MANET routing and traditional IP routing. Routing in fixed networks is based on aggregation combined with best matching. When a packet is to be forwarded, the routing table is consulted and the packet is transmitted on the interface registered with the route containing the best match for the destination, i.e., all hosts within the same subnet are reachable on a single one-hop network segment via routers. In MANETs, however, a node forwards traffic by transmitting packets on the same interface the packet arrived on.
Aggregation is not required in MANETs, as all routing is host based and for all destinations within MANET, a sender has a specific route.
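The contrast between the two lookup styles can be sketched side by side. This is an illustrative comparison under simplified assumptions, not an implementation of any particular routing stack.

```python
# Sketch contrasting the two lookup styles described above: a fixed-network
# table aggregates prefixes and picks the longest match, while a MANET
# table keeps one specific next-hop entry per known host.

import ipaddress

def longest_prefix_match(table, dst):
    """table: {prefix: next_hop} — classic fixed-network lookup."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, nh in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, nh)
    return best[1] if best else None

def manet_lookup(host_routes, dst):
    """host_routes: {host_ip: next_hop} — one specific route per destination."""
    return host_routes.get(dst)
```

Host-based tables grow with the number of nodes rather than the number of subnets, which is acceptable in a MANET precisely because membership is bounded and topology, not address structure, determines reachability.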
There are two principal approaches for route maintenance in MANET – reactive and proactive:
Northforge implemented the Optimized Link State Routing Protocol (OLSR). OLSR is an IP routing protocol optimized for mobile and wireless ad hoc networks. The protocol was integrated in a commercial routing stack suite. OLSR is documented in RFC3626 and uses the link-state scheme in an optimized manner to propagate topology information. The optimization is based on a technique called MultiPoint Relaying.
OLSR operation mainly consists of updating and maintaining information in routing tables. The data in these tables is based on received control traffic and the control traffic is generated based on information retrieved from the tables.
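The MultiPoint Relaying optimization mentioned above works by electing a small subset of 1-hop neighbors that together cover the entire 2-hop neighborhood, so only those relays re-flood topology messages. A greedy sketch of the selection, simplified from the heuristic in RFC 3626:

```python
# Greedy sketch of OLSR MultiPoint Relay (MPR) selection, simplified from
# RFC 3626: pick 1-hop neighbors until every 2-hop neighbor is covered.

def select_mprs(two_hop_via):
    """two_hop_via: {one_hop_neighbor: set of 2-hop nodes reachable via it}."""
    uncovered = set().union(*two_hop_via.values())
    mprs = set()
    while uncovered:
        # pick the neighbor covering the most still-uncovered 2-hop nodes
        best = max(two_hop_via, key=lambda n: len(two_hop_via[n] & uncovered))
        mprs.add(best)
        uncovered -= two_hop_via[best]
    return mprs
```

Because only MPRs retransmit flooded control traffic, the number of redundant retransmissions drops sharply in dense networks, which is the main saving over naive link-state flooding.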
A general MANET network is illustrated below:
Key to Success: Network Simulation
Network simulation is designed for characterizing, creating and validating the communication solutions, computer networks and distributed or parallel systems. It enables the prediction of network behavior and performance. One can create, run and analyze any desired communication scenario.
Generally, simulation is the only method that allows continuous development, testing, and debugging of a network comprised of hundreds or thousands of mobile MANET nodes: a standard lab won’t do, and field tests are expensive, difficult to operate, and non-deterministic.
One of the challenges during the development was testing OLSR with various topologies, e.g., two nodes, three 1-hop neighbors, or a 2-hop neighbor. In order to validate correct behavior of OLSR, it was important to emulate the dynamic nature of MANET, where nodes can roam around, come up, and go down. To address these challenges, we decided to deploy a virtualized test environment based on Linux containers (LXC), enabling execution of multiple OS instances on a single x86 machine.
In addition, Network Simulation has a very important practical usage: it can be supplied as a Network Planning System add-on product for MANET core.
This application supplies the following abilities:
We learned a lot during this MANET project, specifically the ability to interpret and implement RFCs in MANET situations and how network simulation based on LXC is a key to success. We’ll be taking this knowledge to future MANET projects as mobile ad-hoc networks grow in use in this growing mobile network environment.
Authors: Oleg P. and Sasha I.
With the Internet of Things gaining popularity, embedded devices are getting more attention. An operating system that’s also getting its share of attention is OpenWrt, an OS that’s primarily for embedded networking devices. It’s based on the Linux kernel and primarily used on embedded devices to route network traffic – essentially a Linux distribution for your router.
If you are developing an application, OpenWrt gives you the framework to build it without having to build complete firmware around it, giving you the ability to fully customize devices in ways you never imagined. OpenWrt isn’t the ideal solution for everyone, but it is flexible and can be installed on a variety of routers. OpenWrt does have a web interface, but if all you want is a polished web interface, you’re probably better off with another replacement router firmware.
Having a modular Linux distribution on your router opens up many possibilities: setting up a proper VPN on your OpenWrt router; running server software such as a web server, IRC server, or BitTorrent tracker on the router to use less power than a computer (perfect for lightweight servers); creating a separate wireless guest network for security purposes; or capturing and analyzing network traffic.
OpenWrt is being adopted in many consumer grade and small business products, but out of the box, OpenWrt lacks many of the functions required for an enterprise class router to make it ready for larger enterprises or carriers to use.
An enterprise class router needs many functions such as a CLI management interface; multiple native routing protocols; multicast; traffic management (queuing, policy based routing, etc.); security (NAT, ACL, VPN, AAA, etc.); interface to manage L2 switch; flow control; OAM for hardware; statistics such as RMON and others; flash image management; and much more.
Some of the key features OpenWrt lacks are hardware monitoring; storing two or more firmware images (for rollback etc.); CLI; extensive statistics; traffic management; some AAA (TACACS+, Radius); and many multicast features. Extensive development is required to build out OpenWrt for use in an enterprise router. In addition, extensive QA would be required to ensure proper operation and robustness in an enterprise environment.
For one customer we looked at whether OpenWrt (release 14.07) would be a suitable option for their enterprise router. The goal was to adapt OpenWrt to meet their needs, taking consumer grade quality and completeness and converting it to enterprise grade quality and completeness. We identified 51 features in the customer’s requirements which weren’t supported in OpenWrt along with a unified CLI needed to cover all features for L2 switches and L3 routers. We would need to apply our experience and support to new development work in the customer’s infrastructure, management interfaces, native protocols, multicast, encapsulation, traffic management, routing IPv4/IPv6, L2, security, and L3 OAM.
Though OpenWrt is not suitable for all environments, it may be time to start thinking about what’s possible with OpenWrt for your applications. If you’re hit with challenges, we’ve been there too, and we can help develop a solution that takes advantage of the flexibility OpenWrt has to offer.
While the expectation in the network industry was that Network Functions Virtualization (NFV) would primarily be for deployments in the cloud, in particular in data centers, the surprise is that NFV is proving to be most effective when used on virtual Customer Premise Equipment (vCPE). Sure, NFV and its cousin SDN implementations were originally focused on the cloud, but the reality is that vCPE and on-the-cloud-edge applications have, at the moment, the most to benefit from NFV adoption.
The first reason is a much faster ROI. Enterprises are always looking for ways to save on network equipment costs. It is much easier to assess the savings generated by deploying vCPE, either with Commercial Off-The-Shelf (COTS) equipment or with a hybrid Wide Area Network (WAN) Integrated Access Device (IAD) equipped with one or more x86 blades, for very specific and well-defined Virtual Network Functions (VNF): firewall, switch, router, and PBX.
The second reason is the benefit of having consolidated network management. Virtual CPE based on x86 offers the possibility of using a single network management application to configure, provision and monitor all VNFs, as opposed to a more cumbersome and expensive Network Management System (NMS) solution requiring the integration of various Element Management Systems (EMS) from different OEMs. And even compared with the edge VNFs configuration in the cloud data center, the vCPE network management is more cost effective.
The third, but no less important, reason is security. Security is a main concern today in network communications. By definition, access is a security boundary, and it happens on the edge between the cloud and the customer premises. It therefore makes sense to implement the firewall, Deep Packet Inspection (DPI), and Distributed Denial of Service (DDoS) mitigation solutions as VNFs on customer premises.
Because of our experience in developing unique solutions for distributed NFV at the customer edge, we’re working with several OEMs and ODMs to develop NFV solutions such as a virtual PBX (vIPBX) running as a VNF on vCPE. The surprise to the industry may be that NFV is proving to be most effective when used on vCPE, but we’ve already been doing some trailblazing here.
Author: Stefan M.
The ongoing exponential growth of traffic moving through the interconnected world drives a continuous upgrade cycle for equipment in the data path and control network. For home and business subscribers, gigabit connectivity per termination is now in reach. This drives the upgrade cycle, which touches all the sub-systems of the network element.
Whether the data path is primarily mobile backhaul, Metro Ethernet, or GPON, it is likely to have somewhere within it, an Ethernet switching component. One of the most common upgrades of this component is the move from 10G to 25/40/100G. This blog post summarizes Northforge’s recent experience in creating a network element upgrade.
The upgrade started with the switching silicon. Generally, the change of switching silicon will also trigger software changes. These may include an upgrade of the switch SDK, the control processor hardware, the control plane OS, and whatever version of L2/L3 protocol stacks are running on the hardware. These low level changes can require substantial development resources.
When, for example, Broadcom switching silicon is found in the Ethernet infrastructure, the move to 25/40/100G will include an upgrade of the Broadcom switch to Trident II, Trident II+, or Tomahawk. This means that a version of the Broadcom SDK that supports one of the latest chips will also be needed – another upgrade. There is typically Layer 2 and Layer 3 control plane software. Examples of this software layer would be Broadcom’s FASTPATH®, Metaswitch’s Network Interconnect, or IP Infusion’s ZebOS®. Most likely this software was dependent on an earlier version of the SDK and will also need to be upgraded.
Since the switch hardware has been changed, the platform vendor may choose a new multi-core processor to improve performance, reduce power, or decrease component cost. Once the control processor is selected, a version of the OS that supports all of the software upgrades listed above will have to be selected. This requires a new board support package (BSP) for that OS and control processor. All of the low level hardware and software needs to be integrated and tested.
Where this upgrade process can get tricky is at the application layer. The application layer will typically have been designed to be largely independent of the bandwidth provided by the underlying hardware. Its user interface and features will be well known to users and operations personnel. It will have been thoroughly tested by QA. In short, it may be a requirement that the upper layers of the application undergo no modification whatsoever. This dictates that an interface between the application layer and the lower layers of the software will have to be developed, isolating the application layers from those changes.
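The isolation layer described above amounts to coding the application against a stable driver interface and writing one adapter per generation of switch silicon. A minimal sketch; the class and method names are hypothetical, not any vendor's SDK:

```python
# Sketch of the isolation layer described above: the application codes to a
# stable interface, and each generation of switch silicon gets its own
# adapter (all names here are hypothetical, not a real SDK).

from abc import ABC, abstractmethod

class SwitchDriver(ABC):
    """Stable interface the application layer depends on."""
    @abstractmethod
    def set_port_speed(self, port, gbps): ...
    @abstractmethod
    def add_l2_entry(self, mac, port): ...

class LegacySdkDriver(SwitchDriver):
    """Adapter wrapping the old 10G-era SDK calls."""
    def set_port_speed(self, port, gbps):
        return f"legacy_sdk: port {port} -> {gbps}G"
    def add_l2_entry(self, mac, port):
        return f"legacy_sdk: {mac} via port {port}"

class TomahawkSdkDriver(SwitchDriver):
    """Adapter wrapping the new 25/40/100G-era SDK calls."""
    def set_port_speed(self, port, gbps):
        return f"new_sdk: port {port} -> {gbps}G"
    def add_l2_entry(self, mac, port):
        return f"new_sdk: {mac} via port {port}"

def configure(driver: SwitchDriver):
    # Application-layer code: unchanged when the silicon and SDK change.
    return driver.set_port_speed(1, 100)
```

Swapping `LegacySdkDriver` for `TomahawkSdkDriver` changes nothing above the interface, which is exactly the property that lets the well-tested application layer ship unmodified.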
At the end of the upgrade cycle, the users and administrators still have a familiar look and feel to the service. However, the bandwidth has been increased by as much as a factor of 10. This enables the delivery of more data to more people in a constrained cost and power envelope. Using Northforge’s development resources, highly skilled in networking hardware and software, the entire upgrade cycle was accomplished in a timely and cost-effective manner.
Northforge Innovations can help you in upgrading your network connected equipment to the latest and greatest Ethernet technologies. Our experience with management plane, control plane, data plane, BSPs, Switching SDKs, and network application development will help ensure the success of your Ethernet upgrade project.
Author: Harry N.