
The Northforge development team brings deep networking DNA to each project. With expertise extending beyond software development services, our team tackles our customers’ most demanding challenges and delivers innovative solutions. Here are some perspectives on today’s hottest technologies and trends from Northforge’s team of experts.

Our Latest Blog Article

Northforge Utilizes its DPDK Expertise to Develop High-Performance NFV Data Plane Applications

 

Blog Index

Building Integrity into your Network

The XEN + DPDK Alternative to Pulling

There’s a Better Approach for Increased Performance and Efficiency

Use Multiple Cores in Your General Packet Processing Model to Enhance Performance  

XEN, Not Your Regular Hypervisor, and DPDK

Hold On – There’s a More Efficient Way to do Packet Processing in Virtual Machines

Speed Up Your Development Process with Functionality of Broadcom StrataXGS

FreeSWITCH Gives You Options for Voice and Messaging Services

Consider MANET (Mobile Ad Hoc Network) for when you can’t do direct point-to-point

Looking at OpenWrt as an Option for Enterprise Routers

The Surprise of NFV

Upgrading from 1G to 25/40/100G and Beyond

Opportunities for More Advanced SDN-based Traffic Classification

OCP Networking — Will It Catch on Throughout the Industry?

Mitigating Malicious Attacks that Interrupt Network Services — Distributed Denial of Service (DDoS)

How to Improve Packet Buffering Performance with Broadcom Smart-Buffer Technology

The Lost Battles of Network Security

How Attackers of Network Security Craft their Attack

Network Security from the Perspective of Security Vendors

Control Plane Protocols for Streaming Video Services

How to Reduce Latency and Improve Audio Fidelity

How “Dynamic Pathing” Provides Benefits Not Achievable with Static Physical Device Chains

How Automatic Scaling and Optimal Pathing Benefits Service Function Chaining

OPNET Modeler (currently, Riverbed Modeler)

Taking IP PBX to the next level with Network Functions Virtualization

Are we approaching the era of Dynamic Spectrum Auction?

Will 5G Deliver on Expectations?

Implementing hardware accelerated IP route cache in Linux using Broadcom StrataXGS multilayer switch

Migration of existing applications to Cavium OCTEON using its SDK/ADK and/or Linux Kernel for acceleration

Northforge Increases Performance of Mission Critical Device Powered by Cavium OCTEON® Processor

What Not to Bring into the Cloud

Challenges that need to be overcome before NFV can be commercialized

How is the industry going to implement Network Functions Virtualization?

What is NFV and why is it significant to the networking industry?

What is OpenID and why is it important?

Why use DPI for Virtualized Network Traffic Analysis?

SIP Security: How Much is Enough?

OpenStack Virtualized Network Simply Explained

OpenStack Metrics Simply Explained

OpenStack Simply Explained

Network Traffic Increase and NP based DPI

What Does CSCF Do in Your IMS Network?

A Comparison of VoIP Platforms: Asterisk vs. Freeswitch

Measuring the Business Value of Cloud Computing

Advancing Network and Security Performance

IMS: Delivering Multimedia over Next Generation Networks

Puzzled over a Cloud Launch

AppDev – More than Mobile

Development Strategy 2.0

What’s Your Cloud Vision

 

 

 

August 25, 2017 – Northforge Utilizes its DPDK Expertise to Develop High-Performance NFV Data Plane Applications

Intel’s Data Plane Development Kit (DPDK) has emerged as the key enabler for building the high-performance data planes needed by network functions virtualization applications. DPDK is a set of software libraries and drivers that can be integrated with virtual network functions (VNFs) or into virtual switches (or both) for fast packet performance in a server powered by an Intel® architecture processor. DPDK, initially developed by Intel, is now a Linux Foundation open source project.

Communications service providers are turning to NFV to help improve the agility and lower the costs of their network services. DPDK is one of the critical underlying technologies that improves data plane throughput so that VNFs deliver deterministic response comparable to the legacy, appliance-based applications they are replacing. With a long list of DPDK project implementations, combined with years of network software technology expertise, Northforge works with companies seeking a performance edge for their NFV applications.

Download Intel Network Builder’s solution brief on Northforge’s work with DPDK applications here.

 

July 17, 2017 – Building Integrity into your Network

In a building, each floor depends on the strength of the floors below it. Ensuring that the fifth floor is reinforced provides very little comfort if there is a structural problem on the third floor. In order to ensure that the building keeps standing, you need to reinforce every floor starting from the bottom.

And so it is with protocol stacks. Providing security at the transport layer (e.g., TLS) has questionable value if the packet exchange is compromised at the network or data link layer. Yet, we rarely worry about protecting these layers.

This new technical brief from Northforge describes some of the common attacks that can occur at the data link layer and how MAC-layer Security or MACsec (IEEE std 802.1AE™) can be used to provide hop-by-hop or end-to-end authentication and encryption to protect the lowest floors of your protocol building.

For a technical brief on MACsec, download here.

 

June 26, 2017 – The XEN + DPDK Alternative to Pulling

High performance I/O from a NIC

XEN and DPDK can be complementary. Clearly, an EAL can be used within a VM running on top of the XEN hypervisor to provide high performance I/O from a NIC. However, XEN and DPDK can also be alternatives for implementing a solution to the same problem. Consider a system that needs to perform the following sequence of functions:

Using DPDK in a single VM, this could be decomposed as follows:

 

The solution can be implemented natively on top of XEN in multiple VMs like this:

 

 

The world of using virtual machines to implement virtualized network functions is still new. Each application is different, either administratively, functionally, or in performance requirements and therefore each will need an implementation that addresses its specific requirements. DPDK and XEN are two of the tools that can assist with these design problems.

If you need help designing and implementing your VNFs call Northforge Innovations. This is what we do.

By Larry S.

June 9, 2017 – There’s a Better Approach for Increased Performance and Efficiency

Implementing DPDK and Xen

With DPDK, packet processing is performed at the application layer in the virtual machine. Receive processing is based on polling the receive interface (using the EAL) rather than on interrupts. Interrupts require a fair amount of overhead in the “normal” case, but when interrupts must be propagated from the host operating system to the hypervisor to the guest operating system to the application, they end up being very expensive.

With DPDK, threads communicate through the use of shared memory queues. DPDK provides mechanisms for lockless (i.e. no blocking or synchronization needed) ring buffers that support single and multiple writers as well as single and multiple readers. These are called “rte-rings”.
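To make the idea of a non-blocking ring concrete, here is a minimal sketch of a single-producer/single-consumer ring in Python. It is not DPDK's API: the class name `SpscRing` and its methods are hypothetical, and Python does not give the memory-ordering guarantees that a real lockless ring relies on. It only illustrates the index discipline, in which the producer alone advances the head and the consumer alone advances the tail.

```python
class SpscRing:
    """Toy single-producer/single-consumer ring buffer (conceptual only)."""

    def __init__(self, size):
        # A power-of-two size lets us wrap indices with a cheap bit mask.
        if size < 2 or size & (size - 1):
            raise ValueError("size must be a power of two >= 2")
        self._slots = [None] * size
        self._mask = size - 1
        self._head = 0  # next slot to write; only the producer advances it
        self._tail = 0  # next slot to read; only the consumer advances it

    def enqueue(self, item):
        """Producer side: return False instead of blocking when full."""
        if self._head - self._tail == len(self._slots):
            return False
        self._slots[self._head & self._mask] = item
        self._head += 1  # publish only after the slot has been written
        return True

    def dequeue(self):
        """Consumer side: return None instead of blocking when empty."""
        if self._tail == self._head:
            return None
        item = self._slots[self._tail & self._mask]
        self._tail += 1
        return item
```

Because neither side ever waits on the other, a full or empty ring is reported to the caller, who can poll again later. This matches the polling style of receive processing described above.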

There’s a better approach for increased performance and efficiency

For Rx (Receive Processing), the NIC (Network Interface) performs a Direct Memory Access (DMA) transfer to a buffer ring. If the NIC supports Receive Side Scaling (RSS), it can queue packets to different threads on different cores based on packet filters set up on the NIC. This increases the packet processing performance by spreading it over multiple cores.
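The flow-to-queue mapping behind RSS can be sketched in a few lines. This is an illustration under stated assumptions rather than what any particular NIC does. Real NICs typically use a keyed Toeplitz hash and an indirection table, while the hypothetical `rss_queue` function below substitutes CRC32 to stay short.

```python
import zlib


def rss_queue(src_ip, dst_ip, src_port, dst_port, proto, num_queues):
    """Map a flow's 5-tuple to a receive queue index.

    CRC32 is only a stand-in for the NIC's real hash function.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return zlib.crc32(key) % num_queues
```

The useful property is that every packet of a given flow lands on the same queue, and therefore on the same core, so per-flow state does not have to be shared between cores.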

The Proc function (Processing) can be scaled by decomposing it into a sequence of steps that can be organized as a pipeline. For example, if the processing is organized as three sequential steps, the three threads can be assigned to different cores and once the pipeline is full the system is, in effect, working on three packets simultaneously. There are a couple of different models for this depending on whether RSS is being used (the RSS case is shown in the top example in the following diagram).
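The benefit of a full pipeline can be checked with a small timing model. The function below is a hypothetical sketch, not part of DPDK. It assumes each stage runs on its own core, handles one packet at a time, and that all packets are available at time 0.

```python
def pipeline_finish_times(num_packets, stage_times):
    """Return the time at which each packet leaves the last stage.

    A packet enters a stage once it has left the previous stage and
    that stage has finished with the preceding packet.
    """
    stage_free_at = [0.0] * len(stage_times)
    finish = []
    for _ in range(num_packets):
        t = 0.0  # assumption: every packet is ready at time 0
        for i, cost in enumerate(stage_times):
            start = max(t, stage_free_at[i])
            t = start + cost
            stage_free_at[i] = t
        finish.append(t)
    return finish
```

With three stages of one time unit each, `pipeline_finish_times(5, [1, 1, 1])` gives finish times of 3, 4, 5, 6 and 7. After the initial fill, one packet completes per time unit, whereas a single core would complete one every three units.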

Implementing DPDK and Xen fig 2

The Tx (Transmit Processing) is initiated by putting the packet on a shared queue. The Tx process can then transfer the packet to the NIC for transmission.

DPDK is intended for a solution where all of the threads are running on the same Virtual Machine.

Packet Processing with XEN

Not all packet processing solutions are designed to run on a single Virtual Machine. There are administrative reasons for splitting the system across multiple virtual machines. For example, if the packet stream represents multiple customers, then it might be desirable to split the processing across multiple VMs to provide separation and protection between customers as well as facilitating billing. There are also functional reasons for splitting the system across multiple virtual machines. For example, if the server is providing both a client focused capability (such as DHCP) and also a network service such as an IP Router, then running these on different VMs makes sense.

DPDK can provide performance benefits within a single VM, but splitting the processing across VMs is a bit more problematic since each VM appears as an independent and self-contained machine, each with its own memory. Providing communications and data movement between these VMs could be done using a networking function (i.e. transmitting and receiving packets through a virtual switch), but there are performance problems with this. Using shared memory between processes is much faster.

This is where XEN can play a part. XEN provides inter-VM shared memory using a page grant mechanism. The XEN hypervisor runs in Domain 0 (what we normally think of as kernel mode). It has access to the hardware page tables and memory management functions. Most of the work is done in the User Domain, however. So, a process in Dom 0 (e.g., a NIC driver) or Dom U in a VM (e.g., an application) can share a page by making a request to XEN. XEN enters the page into a grant table and returns a handle for the page. The handle can then be provided to another process running on another VM to grant access to the page. This mechanism is called “xenstore”.
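The grant-and-handle flow described above can be modeled with a toy table. This is purely conceptual. `GrantTable`, `grant` and `map` are hypothetical names chosen for illustration and are not XEN's interfaces, and a Python `bytearray` stands in for a memory page.

```python
import itertools


class GrantTable:
    """Toy model of a hypervisor-side table of shared pages."""

    def __init__(self):
        self._entries = {}  # handle -> (owner, page, allowed peer)
        self._next_handle = itertools.count(1)

    def grant(self, owner, page, peer):
        """Owner asks the hypervisor to share `page` with `peer`."""
        handle = next(self._next_handle)
        self._entries[handle] = (owner, page, peer)
        return handle

    def map(self, requester, handle):
        """Peer presents the handle to gain access to the shared page."""
        owner, page, peer = self._entries[handle]
        if requester != peer:
            raise PermissionError(f"{requester} was not granted this page")
        return page  # same object, so writes are visible to both sides
```

A driver domain could call `grant("dom0-nic-driver", page, "domU-app")` and pass the returned handle to the application VM. The application then calls `map("domU-app", handle)` and reads the data in place, with no copy through a virtual switch.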

XEN provides support for non-blocking ring buffers in xenstore similar to DPDK rte-rings. These are called xenstore-rings.

Implementing DPDK and Xen fig 3

In the final section we compare and summarize the two solutions: Comparing XEN and DPDK Solutions – A Summary

 By Larry S.

 

May 22, 2017 – Use Multiple Cores in Your General Packet Processing Model to Enhance Performance

In general, packet processing applications follow a standard regimen:

  •     Receive a packet (Rx)
  •     Process a packet (Proc)
  •     Transmit a packet (Tx)

The Rx part is, more or less, the same regardless of the type of packet processing. The Proc part is the heart of the application.

Use multiple cores in your general packet processing model to enhance performance fig b

This could be an Ethernet Switching application, an IP Routing application, Deep Packet Inspection (DPI), or a protocol process such as DHCP.

In gateway applications (routing and switching) the Tx is usually a different interface than the Rx interface whereas with a protocol process the Tx is usually a response to the sender and therefore to the same interface as the Rx.

The performance requirements vary from application to application, but a 1 Gbps Ethernet link can transport approximately 1.5 million minimum-size packets per second, so the requirements could be very steep.
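The 1.5 million figure follows from standard Ethernet framing overhead. The short calculation below assumes minimum-size 64-byte frames, each accompanied on the wire by an 8-byte preamble and a 12-byte inter-frame gap. The constant and function names are ours.

```python
LINE_RATE_BPS = 1_000_000_000  # 1 Gbps
MIN_FRAME_BYTES = 64           # minimum Ethernet frame, including FCS
PREAMBLE_BYTES = 8             # preamble plus start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12      # mandatory idle time between frames


def max_packets_per_second(line_rate_bps, frame_bytes):
    """Upper bound on frames per second for a given frame size."""
    bits_on_wire = (frame_bytes + PREAMBLE_BYTES + INTERFRAME_GAP_BYTES) * 8
    return line_rate_bps / bits_on_wire


pps = max_packets_per_second(LINE_RATE_BPS, MIN_FRAME_BYTES)
```

Each minimum-size frame occupies 84 bytes (672 bits) on the wire, so `pps` comes to roughly 1.49 million. That leaves a budget of about 670 nanoseconds per packet if a single core has to keep up.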

One approach to achieving high performance is to employ multiple cores. For Rx, this can be done by using a hardware capability called Receive Side Scaling (RSS). With RSS the Network Interface (NIC) can queue packets to different threads on different cores based on fields in the packet. For Proc, this can be done by breaking the processing into multiple tasks and creating a pipeline. (sidebar on pipeline processing). Tx usually requires very little processing so there is little motivation for partitioning it.

Use multiple cores in your general packet processing model to enhance performance fig b

By effectively using multiple cores, the packet processing performance of the system is greatly enhanced because the system is processing multiple packets simultaneously.

Based on the general packet processing model described here, we will next show how this model can be implemented using DPDK and XEN: Packet Processing with DPDK and XEN

By Andrei C.

May 1, 2017 – XEN, Not Your Regular Hypervisor, and DPDK

No host OS + paravirtualization support = performance improvement

XEN is a hypervisor. A hypervisor is a supervisory program (think, operating system) that provides support for virtual machines. Parallels (Parallels), VMware (Dell), and VirtualBox (Oracle) are all hypervisors. They provide an environment that hosts a number of processes (virtual machines) where each virtual machine believes it is running on the underlying hardware. Each virtual machine contains a guest operating system (e.g., Windows, macOS, Linux) and one or more processes/applications running within the guest operating system. Each of these hypervisors sits on top of a host operating system (e.g., Windows, macOS, Linux).

It is common to run a hypervisor like VirtualBox on a Mac and load one or more Windows virtual machines in order to run applications that only run on Windows.

XEN is a bit different in several ways from the three hypervisors listed above. First, it is a “bare metal” system. It runs at the lowest level, right on top of the hardware, with no host operating system. As you’d expect, this improves performance and efficiency.

Second, XEN supports paravirtualization. With paravirtualization, XEN provides APIs for many system functions and the guest operating system in the virtual machines can be rebuilt to access system resources and the I/O subsystem via this API.

There are several Linux distributions that have been recompiled to use the XEN interface. This can improve performance and also allow stronger support for virtual machines on CPUs that don’t have good VM support (some x86s, older ARMs, etc.). XEN also provides support for accessing the underlying hardware like the other hypervisors (this is called Hardware Virtual Machine, HVM). The final difference is that XEN is open-source.

DPDK

One of the major problems with implementing network processing applications in virtual machines is implementing high performance I/O. Packet processing applications often need to process tens of thousands or even millions of packets per second. This is difficult at the application layer, in general, but even more difficult when the system I/O calls are made to a guest operating system which has to access the hardware through the hypervisor and the host operating system.

The Data Plane Development Kit (DPDK) is a solution to this problem. DPDK is a framework that provides for creating software libraries tailored for specific hardware architectures (e.g., x86) and specific operating systems (e.g., Linux). These libraries (called the Environment Abstraction Layer, or EAL) provide high performance, generic (i.e., hardware and OS independent) access to hardware and operating system resources, including the I/O subsystem.

Using the DPDK EAL allows the development of high performance user-mode packet processing applications which can also be tuned to exploit multi-core CPUs.

Next, we provide an overview of a general packet processing model: General Packet Processing Model

By Larry S.

April 10, 2017 – Hold on – There’s a More Efficient Way to do Packet Processing in Virtual Machines

First in a five-part series

It is increasingly common to implement packet processing functions in virtual machines. This is what Network Functions Virtualization (NFV) is all about.

The most common implementation model for network functions has been to replicate the functions in the devices that are distributed around the network. This is the easiest way to do it, but taking a step back it becomes clear that this is not the most efficient way to do it, from either a management or an overall efficiency point of view.

Consider DHCP, for example. A network could have 50 edge routers running DHCP, but there is no reason why DHCP has to run in each edge router. The traffic load associated with DHCP is small, so it makes sense to run each of the 50 DHCP instances as a Virtual Machine (VM), or possibly as a thread of a DHCP VM, on a centralized server. There is an efficiency benefit in statistically multiplexing the computational load (all 50 instances are never running at the same time) and several management benefits, such as only having to update a single system to fix bugs and add features and only having to configure a single local system. Additionally, modern cloud computing technology provides the ability to migrate VMs for redundancy and to cloud burst for unexpected load peaks.

NFV requires high-performance software-based implementations of these packet processing functions in a virtual machine environment. These implementations must be able to read packets from the network, process them, and send packets out to the network.

There are a variety of tools and techniques for implementing packet processing in Virtual Machines. In this blog series we will discuss two common, but very different approaches. One is the XEN Hypervisor and the other is the Data Plane Development Kit (DPDK). Depending on the functionality required, these two solutions can be used independently or concurrently.

Over a sequence of blog posts we will describe these two approaches and how they can be used to implement high-performance packet-processing software. In the next blog post we provide a brief overview of the XEN hypervisor and DPDK.

March 6, 2017 – Speed up your development process with functionality of Broadcom StrataXGS

Are you looking to reduce costs and improve latency? The Multi Stage Content Aware Engine of the Broadcom StrataXGS allows for additional packet inspection and manipulation in the switching chip rather than using an external device like an FPGA, NPU or software on a host processor. That helps to reduce cost and also reduces packet processing latency by staying in the chip. It also enables a degree of packet inspection, and packet handling decisions based on that inspection, without needing to go off chip, again giving you cost reduction and latency improvement.

This content aware engine capability of Broadcom enables customers to replace FPGAs, redirect traffic to applications and develop diagnostic tools. The Broadcom XGS Content Aware Engine provides the following capabilities:

  • Dropping of frames that are identified as potential network security risks like DDoS packets
  • Forwarding of select control packets to the CPU, such as IGMP, OAM etc.
  • Assignment of new priority, Vlan ID, or VRF for the selected traffic stream
  • Counting or metering an ingress flow across multiple ports
  • Redirection of a select flow to a new egress port
  • Redirecting or mirroring traffic based on the egress port

These capabilities are achieved at different stages of the pipeline: before L2 lookup (VCAP), at the end of the ingress stage (ICAP), and at the end of the egress stage (ECAP). Broadcom achieves these capabilities with the help of TCAMs in the chipset.

The Content Aware Engine can be further classified in the following phases:

  • Selection phase: where the packets are matched and selected as per the configured rules/entries
  • Action phase: as per the configured rules, the packets are subject to the following actions
    • Drop the packet
    • Redirect the packet
    • Copy the packet to the host CPU
    • Modify the packet
    • Note: In some cases, more than one action is required. Broadcom allows actions to be combined, for example dropping the packet and copying it to the host CPU.
  • Statistics phase: the user can enable the statistic feature, so that engine counts the numbers of packets and bytes processed by the corresponding rule
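
The three phases above can be sketched as a small software model. This is a conceptual illustration only, not Broadcom's SDK. The `Rule` class, the `classify` function, the field names and the default `forward` action are all assumptions made for the example. Matching uses value/mask pairs, in the spirit of a TCAM lookup.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    match: dict            # field name -> (value, mask), TCAM style
    actions: list          # e.g. ["drop", "copy_to_cpu"]
    packet_count: int = 0  # statistics phase counters
    byte_count: int = 0

    def matches(self, pkt):
        return all((pkt.get(name, 0) & mask) == (value & mask)
                   for name, (value, mask) in self.match.items())


def classify(rules, pkt, length):
    """Return the actions of the first matching rule, updating its counters."""
    for rule in rules:                # selection phase: first match wins
        if rule.matches(pkt):
            rule.packet_count += 1    # statistics phase
            rule.byte_count += length
            return rule.actions       # action phase
    return ["forward"]                # assumed default when nothing matches
```

For example, a rule matching `{"ip_proto": (2, 0xFF)}` with the action `["copy_to_cpu"]` would send IGMP packets (IP protocol 2) to the host CPU while counting them, and a second rule could combine `drop` and `copy_to_cpu` for a suspicious flow.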

At Northforge we can help you program the content aware engine to meet your application requirements and speed up your development process by implementing the required functionality in the Broadcom chipset. We have hands-on experience helping customers replace their FPGAs and develop applications such as IPTV with the Broadcom content aware engine to handle complex deployment scenarios in the field.

 

December 12, 2016 – FreeSWITCH gives you options for voice and messaging services

Let’s say you want to develop a service where a SmartHome monitoring system detects smoke and a phone call is generated to alert the homeowner. With the ability to receive external events from any monitoring software, there’s free, open source communications software for creating voice and messaging products for that. FreeSWITCH is a softswitch for PBX applications which can create that phone call alert and then connect the homeowner to the 911 operator.

In this situation, it allows you to generate calls for automation systems where you play audio files, collect user input, and then decide to make another call and have two parties talk to each other. In this case, with FreeSWITCH you can also receive external events from external programs, such as the home monitoring system, and can generate calls remotely. Its built-in Interactive Voice Response (IVR) system lets you play a pre-recorded audio file, collect user input (usually DTMF digits) and make decisions, either to play something else or to generate another call, hang up or basically anything else within the phone system. Some of this functionality is provided by external systems that are integrated using FreeSWITCH modules.

There’s more to FreeSWITCH than this one use case. You can use FreeSWITCH to develop text-to-speech engines where you can turn any text into an audio file and then play it, or have your audio files professionally pre-recorded and then play them while still receiving user input. Or the reverse: use it for speech recognition, such as when you want cell phone voicemail messages to also appear in a text format. It’s very suitable for voicemail and eFax applications. FreeSWITCH works with Linux and Windows, and supports English, French, German and Russian languages.

FreeSWITCH is quite popular with telecommunications software developers who need to create automated telemarketing programs. Using FreeSWITCH as a base, you can generate automated telemarketing calls where FreeSWITCH generates the call, plays a pre-recorded audio file and then hangs up. It can be used to develop services which receive incoming calls to FreeSWITCH for one leg of the call, play an audio file and, based on what the user wants to do, connect to another application, such as voicemail. Or create a service where, when the user receives a voicemail, they also receive an email with an attached audio file telling them that a voicemail is waiting for them.

Our Northforge team has created voice and messaging services based on FreeSWITCH for several customers who want to expand or start new services. Based on our experience with this open source software, we can get our customers’ telecom programs up and running.

 

October 27, 2016 – Consider MANET (Mobile Ad Hoc Network) for When You Can’t do Direct Point-to-Point

Infrastructure-less Network

In the last few years, mobile communications have dramatically increased in popularity and usage. This growth has inspired the development of advanced communication protocols offering higher throughput and reliability over wireless links. Much of wireless technology is based on the principle of direct point-to-point communication, where participating nodes “speak” directly to a centralized access point. However, there is an alternative, “multi-hop” approach, where the nodes communicate with each other using other nodes as relays for traffic if the endpoint is out of direct communication range. Mobile Ad hoc NETwork (MANET), described here, uses the multi-hop model.

Wikipedia describes MANET (Mobile Ad Hoc Network) as a continuously self-configuring, infrastructure-less network of mobile devices connected wirelessly. All the nodes (devices) are wireless, mobile and equal (no access points, base stations, or any other kind of infrastructure). The best comparison could be a cellular network WITHOUT base stations, where all the phones need to create a multi-hop mesh network.

Infrastructure-based Network (i.e. Cellular)

Infrastructure-less Network (MANET – Mobile Ad hoc Network)

These networks are self-configuring and can be set up randomly and on-demand. Such networks can have dynamically changing multi-hop topologies, composed of, likely, bandwidth-constrained wireless links. The concept of the mobile ad-hoc network suggests the incorporation of routing functionality into mobile nodes; in other words, all nodes should be able to act as routers for each other.

Need

Since an infrastructure-based network is always a better solution than an infrastructure-less network in terms of network performance, MANET is relevant only in cases when laying the infrastructure is impossible or is not practical:

  • Natural disasters: for rescue forces
  • Remote areas / difficult terrain, i.e. pit mines, tunnels, mountains, deserts, so on
  • Military, paramilitary, rescue, anti-terror forces
  • Others: Vehicular ad hoc networks, distributed sensor network, smartphones ad hoc network…

MANET – Layer-3 Routing Core

Ad-hoc networks are not restricted to special hardware or a certain link layer. MANET is a routing core (Layer-3 routing protocols) running on top of any possible Layer-2 wireless medium that is able to provide connectivity between the neighboring (1-hop) nodes:

MANET – Layer-3 Routing Core

It is important to note a difference between MANET routing and traditional IP routing. Routing in fixed networks is based on aggregation combined with best matching. When a packet is to be forwarded, the routing table is consulted and the packet is transmitted on the interface registered with a route containing the best match for the destination, i.e. all hosts within the same subnet are available on a single one-hop network segment via routers. However, in MANETs, nodes route traffic by retransmitting packets on the interface they arrived on. Aggregation is not required in MANETs, as all routing is host based and, for all destinations within the MANET, a sender has a specific route.

There are two principal approaches for route maintenance in MANET – reactive and proactive:

  • Reactive routing protocols set up traffic routes on-demand (examples – Ad hoc On-demand Distance Vector, Dynamic Source Routing)
  • Proactive routing protocols dynamically maintain a full understanding of the topology (examples – Optimized Link State Routing Protocol, Babel)
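
The difference between aggregated best-match routing and host-based routing described above can be illustrated with two small lookup functions. This is a sketch using Python’s standard `ipaddress` module. The function names, table layouts and next-hop labels are made up for the example and are not taken from any routing stack.

```python
import ipaddress


def longest_prefix_match(table, dst):
    """Fixed-network style lookup: the most specific covering prefix wins."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, next_hop)
    return best[1] if best else None


def host_route_lookup(table, dst):
    """MANET style lookup: one exact entry per known destination."""
    return table.get(dst)
```

With a fixed-network table such as `{"0.0.0.0/0": "gw", "10.1.0.0/16": "r1", "10.1.2.0/24": "r2"}`, a packet for 10.1.2.7 is sent toward `r2`, while the MANET table needs an explicit entry for 10.1.2.7 itself. This is why MANET routing tables grow with the number of nodes rather than the number of subnets.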

Northforge implemented the Optimized Link State Routing Protocol (OLSR). OLSR is an IP routing protocol optimized for mobile and wireless ad hoc networks. The protocol was integrated in a commercial routing stack suite. OLSR is documented in RFC 3626 and uses the link-state scheme in an optimized manner to propagate topology information. The optimization is based on a technique called MultiPoint Relaying. OLSR operation mainly consists of updating and maintaining information in routing tables. The data in these tables is based on received control traffic and the control traffic is generated based on information retrieved from the tables. A general MANET network is illustrated below:

A General MANET network

Key to Success: Network Simulation

Network simulation is designed for characterizing, creating and validating communication solutions, computer networks and distributed or parallel systems. It enables the prediction of network behavior and performance. One can create, run and analyze any desired communication scenario. Generally, a simulation is the only method that allows continuous development, testing and debugging of a network comprised of hundreds or thousands of mobile MANET nodes, since a standard lab won’t do, and field tests are expensive, difficult to operate and non-deterministic.

One of the challenges during the development was testing OLSR with various topologies, e.g. two nodes, three 1-hop neighbors or a 2-hop neighbor. In order to validate correct behavior of OLSR, it was important to emulate the dynamic nature of MANET, where nodes can roam around, come up and go down. To address these challenges, we decided to deploy a virtualized test environment, based on Linux containers (LXC), thus enabling execution of multiple OS instances on a single x86 machine.

In addition, Network Simulation has a very important practical usage: it can be supplied as a Network Planning System add-on product for the MANET core. This application supplies the following abilities:

  • Planning of node movements
  • Showing the dynamic status of network topology and connectivity (map or canvas based)
  • Operational planning based on communication conditions
  • Planning of the communications infrastructure
  • Verification & comparison of communication solutions
  • “What if” analysis of real actual situation by changing the scenario
  • Real terrain, realistic radio and propagation models

We learned a lot during this MANET project, specifically the ability to interpret and implement RFCs in MANET situations and how network simulation based on LXC is a key to success. We’ll be taking this knowledge to future MANET projects as mobile ad-hoc networks grow in use in this growing mobile network environment.

Authors: Oleg Palakov and Sasha Ilin

 

September 7, 2016 – Looking at OpenWrt as an Option for Enterprise Routers

With the Internet of Things gaining popularity, embedded devices are getting more attention. An operating system that’s also getting its share of attention is OpenWrt, an OS that’s primarily for embedded networking devices. It’s based on the Linux kernel and primarily used on embedded devices to route network traffic – essentially a Linux distribution for your router. If you are developing an application, OpenWrt gives you the framework to build an application without having to build a complete firmware around the application, giving you the ability to fully customize devices in ways you never imagined.

While OpenWrt isn’t the ideal solution for everyone, OpenWrt is flexible and can be installed on various routers. OpenWrt has a web interface, but if you just want more than a web interface, you’re probably better off with another replacement router firmware. Having a modular Linux distribution available on your router gives you lots of opportunities, including setting up a proper VPN on your OpenWrt router; running server software such as a web server, IRC server, or BitTorrent tracker on a router to use less power than on a computer (perfect for lightweight servers); creating a special wireless guest network for security purposes; or capturing and analyzing network traffic.

OpenWrt is being adopted in many consumer grade and small business products, but out of the box, OpenWrt lacks many of the functions required for an enterprise class router to make it ready for larger enterprises or carriers to use. An enterprise class router needs many functions such as a CLI management interface; multiple native routing protocols; multicast; traffic management (queuing, policy based routing, etc.); security (NAT, ACL, VPN, AAA, etc.); an interface to manage the L2 switch; flow control; OAM for hardware; statistics such as RMON and others; flash image management; and much more. Some of the key features OpenWrt lacks are hardware monitoring; storing two or more firmware images (for rollback etc.); CLI; extensive statistics; traffic management; some AAA (TACACS+, RADIUS); and many multicast features.

Extensive development is required to build out OpenWrt for use in an enterprise router. In addition, extensive QA would be required to ensure proper operation and robustness in an enterprise environment. For one customer we looked at whether OpenWrt (release 14.07) would be a suitable option for their enterprise router. The goal was to adapt OpenWrt to meet their needs, taking consumer grade quality and completeness and converting it to enterprise grade quality and completeness. We identified 51 features in the customer’s requirements which weren’t supported in OpenWrt, along with a unified CLI needed to cover all features for L2 switches and L3 routers. We would need to apply our experience and support to new development work in the customer’s infrastructure, management interfaces, native protocols, multicast, encapsulation, traffic management, IPv4/IPv6 routing, L2, security, and L3 OAM.

Though OpenWrt is not suitable for all environments, it may be time for you to start thinking about what’s possible with OpenWrt for your applications. If you run into challenges, we’ve been there too, and we can help develop a solution that takes advantage of the flexibility OpenWrt has to offer.

August 7, 2016 – The Surprise of NFV

While the expectation in the network industry was that Network Functions Virtualization (NFV) would primarily be for deployments in the cloud, in particular in data centers, the surprise is that NFV is proving to be most effective when used on virtual Customer Premises Equipment (vCPE). Sure, NFV and its cousin SDN were originally focused on the cloud, but the reality is that vCPE and cloud-edge applications currently stand to benefit the most from NFV adoption. Why?

The first reason is a much faster ROI. Enterprises are always looking for ways to save on network equipment costs. It is much easier to assess the savings generated by deploying vCPE, either with Commercial Off-the-Shelf (COTS) equipment or with a hybrid Wide Area Network (WAN) Integrated Access Device (IAD) equipped with one or more x86 blades, for very specific and well-defined Virtual Network Functions (VNFs): firewall, switch, router, and PBX.

The second reason is the benefit of consolidated network management. Virtual CPE based on x86 offers the possibility of using a single network management application to configure, provision and monitor all VNFs, as opposed to a more cumbersome and expensive Network Management System (NMS) solution requiring the integration of various Element Management Systems (EMS) from different OEMs. Even compared with configuring the edge VNFs in the cloud data center, vCPE network management is more cost effective.

The third reason, no less important, is security. Security is a main concern today in network communications. By definition, access is a security characteristic, and it happens on the edge between the cloud and the customer premises. It therefore makes sense to implement the firewall, Deep Packet Inspection (DPI), and Distributed Denial of Service (DDoS) mitigation solutions as VNFs on customer premises.
Because of our experience in developing unique solutions for distributed NFV at the customer edge, we’re working with several OEMs and ODMs to develop NFV solutions such as a virtual PBX (vIPBX) running as a VNF on vCPE. The surprise to the industry may be that NFV is proving to be most effective when used on vCPE, but we’ve already been doing some trailblazing here. Author: Stefan M.

 

July 7, 2016 – Upgrading from 1G to 25/40/100G and Beyond

The ongoing exponential growth of traffic moving through the interconnected world drives a continuous upgrade cycle for equipment in the data path and control network. For home and business subscribers, gigabit connectivity per termination is now in reach. This drives the upgrade cycle, which touches all the sub-systems of the network element. Whether the data path is primarily mobile backhaul, Metro Ethernet, or GPON, it is likely to have somewhere within it, an Ethernet switching component. One of the most common upgrades of this component is the move from 10G to 25/40/100G. This blog post summarizes Northforge’s recent experience in creating a network element upgrade. The upgrade started with the switching silicon. Generally, the change of switching silicon will also trigger software changes. These may include an upgrade of the switch SDK, the control processor hardware, the control plane OS, and whatever version of L2/L3 protocol stacks are running on the hardware. These low level changes can require substantial development resources. When, for example, Broadcom switching silicon is found in the Ethernet infrastructure, the move to 25/40/100 G will include an upgrade of the Broadcom switch to Trident II, Trident II+, or Tomahawk. This means that a version of the Broadcom SDK that supports one of the latest chips will also be needed – another upgrade. There is typically Layer 2 and Layer 3 control plane software. Examples of this software layer would be Broadcom’s FASTPATH®, Metaswitch’s Network Interconnect, or IP Infusion’s ZebOS®. Most likely this software was dependent on an earlier version of the SDK and will also need to be upgraded. Since the switch hardware has been changed, the platform vendor may choose a new multi-core processor to improve performance, reduce power, or decrease component cost. Once the control processor is selected, a version of the OS that supports all of the software upgrades listed above will have to be selected. 
This requires a new board support package (BSP) for that OS and control processor. All of the low level hardware and software needs to be integrated and tested. Where this upgrade process can get tricky is at the application layer. The application layer will typically have been designed to be largely independent of the bandwidth provided by the underlying hardware. Its user interface and features will be well known to users and operations personnel. It will have been thoroughly tested by QA. In short, it may be a requirement that the upper layers of the application undergo no modification whatsoever. This dictates that an interface between the application layer and the lower layers of the software will have to be developed, isolating the application layers from those changes. At the end of the upgrade cycle, the users and administrators still have a familiar look and feel to the service. However, the bandwidth has been increased by as much as a factor of 10x. This enables the delivery of more data to more people in a constrained cost and power envelope. Using Northforge’s development resources, highly skilled in networking hardware and software, the entire upgrade cycle was accomplished in a timely and cost-effective manner. Northforge Innovations can help you in upgrading your network connected equipment to the latest and greatest Ethernet technologies. Our experience with management plane, control plane, data plane, BSPs, Switching SDKs, and network application development will help ensure the success of your Ethernet upgrade project. Author: Harry N.

June 16, 2016 – Opportunities for More Advanced SDN-based Traffic Classification

With mobile technologies, the shift to the cloud paradigm, and increased demand for content delivery networks, traffic classification is becoming more and more complex. In this blog, let’s discuss the need for more advanced traffic classification in an SDN environment. This includes a definition of traffic classification for development of traffic classification products.

Definition of network traffic classification

Network traffic classification categorizes traffic flows into different classes determined by networking protocols (L2-L7), traffic types, services and applications. The purpose of such classification includes:

  • Traffic profiling
  • Distributed Denial of Service (DDoS) defense
  • Content aware forwarding and load balancing
  • Data leak prevention
  • Anti-malware
  • Service level agreement enforcement
  • Network monitoring and troubleshooting

A traffic flow is represented by the classic 5-tuple: the source IP, the source port, the destination IP, the destination port and the protocol number.

Levels of Classifications

There are four main levels of classification:

  • Protocol
  • Web service
  • Application protocol
  • Standard protocol

Classifying at the protocol level means detecting traffic of an application or service flow by the protocols being used, for example, ETH, IP, TCP, DNS, HTTP, RTP, RTSP, SMTP and so on. The web service level classification inspects the traffic flows more deeply to classify web applications such as Yahoo, Google or YouTube. There are two kinds of application classification. The first is application-level matching of traffic flows to internet applications which use both standard and proprietary protocols in communications. Skype and uTorrent are typical examples of such applications. The second is standard protocol application matching, which covers applications using only standard internet protocols. If the classifier is only able to identify the traffic as HTTP or SSL, the classification is considered incorrect or incomplete.

Classification techniques

Classifier designs employ various techniques for identifying network traffic, ranging from port-based identification and payload-based identification to statistical classification and behavioral classification. Classifying an application based on its port number means inspecting the port number in packet headers and matching it against the standard TCP or UDP port numbers registered with the IANA. The payload-based classification technique is the classical deep packet inspection (DPI) method, in which the software finds particular patterns (signatures) inside individual packets or a flow of packets. Since the payload-based implementation inspects each packet in the network traffic, it comes with a high computational cost.

Port-based and payload-based inspection are the two methods that are widely implemented in a DPI engine. A classifier implementing only port-based inspection is no longer effective with internet traffic. The main reason is that non-legitimate web services and applications hide themselves behind well-known ports.
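To make the port-based technique concrete, here is a minimal sketch in Python. The `FlowKey` type, the `classify_by_port` function and the tiny port table are illustrative assumptions, not part of any DPI product; the table holds only a handful of IANA-registered ports.

```python
from typing import NamedTuple


class FlowKey(NamedTuple):
    """The classic 5-tuple that identifies a traffic flow (hypothetical type)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: int  # IP protocol number: 6 = TCP, 17 = UDP


# A small, illustrative subset of IANA-registered (protocol, port) pairs.
WELL_KNOWN_PORTS = {
    (6, 25): "SMTP",
    (6, 80): "HTTP",
    (6, 443): "HTTPS",
    (6, 554): "RTSP",
    (6, 53): "DNS",
    (17, 53): "DNS",
}


def classify_by_port(flow: FlowKey) -> str:
    """Label a flow from its ports alone; return 'unknown' if no port matches."""
    # Check the destination port first, then the source port (reply direction).
    for port in (flow.dst_port, flow.src_port):
        label = WELL_KNOWN_PORTS.get((flow.protocol, port))
        if label is not None:
            return label
    return "unknown"
```

Note that any application tunnelling over TCP port 443 receives the same "HTTPS" label here, which is exactly the weakness of port-only classification.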
At the same time, the payload-based method is unable to inspect encrypted traffic and thus becomes ineffective with encrypted traffic flows. Another approach to classifying encrypted network traffic is the “flow association” technique, which recognizes network traffic based on pre-determined factors. For example, SIP signalling packets might contain the port numbers and IP addresses that the voice traffic will use.

The statistical classifier tackles the problem of finding a particular class of application flows that are not easily detected by the port-based and payload-based methods, either because the traffic is encrypted or because the flows are asymmetric. Statistical classification is considered a more advanced technique and is being actively studied in academia. Newer DPI products are incorporating this technique. Statistical classification is based on statistical characteristics, data mining techniques, and more specifically, machine learning (ML) algorithms. More advanced statistical classification requires machine learning because it has to learn different traffic patterns from large datasets. Machine learning uses characteristics of the sequence of internet packets to identify the class of a flow.

There are broadly four types of machine learning defined in the research literature: classification, clustering, association and numeric prediction. Only classification and clustering are widely adopted in network traffic classification. Briefly, classification uses large known sample datasets to build rules for classification. Software engines use these computed rules to identify flows in new datasets. Since such learning requires known datasets, it is a type of supervised learning. Clustering first identifies patterns of flows and then groups the flows with similar patterns into clusters. Clustering is unsupervised learning because users do not provide any known datasets for building patterns.
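Whether the learning is supervised or unsupervised, the input is a set of per-flow features computed from the packet sequence. The sketch below shows one plausible way to compute three such features. The function name `flow_features`, the `(timestamp, size)` packet representation and the default sample of 10 packets are assumptions made for illustration, and inter-arrival time is one reasonable reading of a timing feature.

```python
from statistics import mean
from typing import Dict, List, Tuple

Packet = Tuple[float, int]  # (arrival time in seconds, size in bytes)


def flow_features(packets: List[Packet], sample_size: int = 10) -> Dict[str, float]:
    """Compute simple statistical features over the first packets of a flow."""
    sample = packets[:sample_size]
    if len(sample) < 2:
        raise ValueError("need at least two packets to compute inter-arrival times")
    times = [t for t, _ in sample]
    sizes = [s for _, s in sample]
    # Gaps between consecutive packet arrivals.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    duration = times[-1] - times[0]
    return {
        "mean_packet_size": mean(sizes),
        "mean_inter_arrival": mean(gaps),
        "mean_bit_rate": (sum(sizes) * 8 / duration) if duration > 0 else 0.0,
    }
```

A feature vector like this would be fed to whichever classifier or clustering algorithm is chosen; the features themselves need no payload access, so they remain available for encrypted flows.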
Simple statistical classification uses readily available metrics. These metrics can be collected and computed to find the type of traffic flow. For example, some classifiers use a sample size of 10 packets to determine the mean packet size, mean arrival time, and mean bit rate to classify Skype traffic. There are many other statistical approaches to classifying such traffic. All these methods attempt to classify a flow of traffic with lower computation costs and without dedicated hardware deployments for each type of application.

The behavioral classification technique observes the entirety of network traffic received by the endpoint (host), seeking to identify the type of application by analyzing the network traffic patterns generated by the target host. This is a holistic approach to looking for anomalies in a network.

DPI in SDN

With the recent development of software defined networking (SDN), traffic classification can be deployed separately as a service in an SDN network. This approach removes DPI (payload-based classifier) software components from middleboxes such as the IDS, IPS, load balancer, firewall, and anti-malware. The DPI is placed in a strategic location in the network so that each packet only needs to be inspected once, and the classification result can be reused by all the middleboxes. The DPI classifier could further be put into dedicated hardware with network processors providing much higher throughput. However, the end-user applications do not need to be on the dedicated hardware. This will simplify development, integration and deployment of traffic classification software into end applications.

This approach requires introducing a DPI controller to orchestrate the operation of multiple DPI engines and pass along the results to subscribers (the network application software). To give an example of such an application, an HTTP/HTTPS load-balancing system might only be interested in HTTP traffic classification results.
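A minimal sketch of that subscription model follows. The `DpiController` class and its `subscribe` and `publish` methods are hypothetical names invented for this illustration, not an existing SDN controller API; a real deployment would carry results over a network protocol rather than in-process callbacks.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[tuple, str], None]


class DpiController:
    """Hypothetical controller that fans classification results out to subscribers."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, label: str, callback: Callback) -> None:
        """Register interest in flows classified with the given label."""
        self._subscribers[label].append(callback)

    def publish(self, flow: tuple, label: str) -> int:
        """Called by a DPI engine once per classified flow.

        Returns the number of subscribers that were notified.
        """
        callbacks = self._subscribers.get(label, [])
        for callback in callbacks:
            callback(flow, label)
        return len(callbacks)


# Usage: a load balancer subscribes to HTTP and HTTPS results only.
controller = DpiController()
seen = []
for wanted in ("HTTP", "HTTPS"):
    controller.subscribe(wanted, lambda flow, label: seen.append((flow, label)))

flow = ("10.0.0.5", 51514, "192.0.2.10", 80, 6)
controller.publish(flow, "HTTP")  # delivered to the load balancer
controller.publish(flow, "SIP")   # no subscriber, so nothing is delivered
```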
In this approach, the load balancer does not need to re-inspect other irrelevant packets for layer 2 protocols. The load balancer simply subscribes to HTTP and HTTPS traffic classification results, processes them (if needed) to find out the network load conditions, and acts accordingly.

A DPI solution with SDN usually means more centralized control of flows. In this case, statistical and behavioral classification techniques in conjunction with appropriate traffic steering techniques can be applied to get the best results. DPI as a service, together with virtual network function (VNF) appliances, can be deployed with a template-based network function virtualization orchestration technique. Re-deploying or deploying an additional “classifier” should take much less time when there are new VNFs onboard.

Northforge has developed products that utilize each of the approaches to classification defined in this blog (i.e., Protocol, Web Service, Application Protocol, and Standard Protocol). Most recently it has been defining signatures enabling differentiation amongst Web Services. This has led to technology enabling classification of packets without decrypting or decompressing.

Need help with traffic classification? Northforge has the skills and expertise in accelerating, securing, and ensuring the fidelity of data packets traversing the internet. Our focus on the cloud infrastructure enables us to develop custom products to solve the most complex technological challenges. Author: Michael L.

 

May 31, 2016 – OCP Networking — Will It Catch on Throughout the Industry?

The networking industry is starting to embrace the Open Compute Project (OCP). It’s probably too early to speculate how OCP networking will pan out, but it is hard to contain our excitement. OCP is an industry association originating from the largest data center hardware customers. Together they represent a huge market, and hardware manufacturers are falling in line to join and satisfy their needs. Luckily, the big guys’ needs align well with those of the rest of us. Everybody wants things to work simply, cheaply, and interchangeably. “Hardware” means racks, mounts, and power distribution all the way through servers and storage to networking equipment. Participating companies share their in-house designs hoping they will catch on and become mass-manufactured COTS products, driving down prices with economies of scale.

Networking

On the networking front, aside from hardware aspects like form factor and power, interchangeability is taken to new levels: all networking equipment (switches, routers, firewalls, load balancers, etc.) is expected to be available for purchase as “Bare Metal” hardware, with an open bootloader and the ability to install a third-party “Network Operating System” (NOS). The same way we can purchase generic x86 servers today and install any OS and applications of our choosing, in the future we might be able to purchase networking hardware from Juniper and run Cisco’s IOS on it. We are quite far from such a utopia, and it may never come to be, but there is already a bewildering variety of networking equipment available and a nice selection of NOS-es, including an open source Linux distribution to take as a base and roll your own. The key to compatibility is ONIE, the Open Network Installation Environment bootloader.
Hardware

The uptake from network equipment manufacturers (often abbreviated as ODMs, Original Design Manufacturers) is huge: alongside their traditional, integrated products, they now offer many models of ONIE-capable switches, routers, appliances, and network-server hybrids. Many more are being demonstrated as prototypes at trade shows such as the OCP Summit. Despite the variety of appearances, port counts and speeds, throughput and capabilities, all these boxes share roughly the same internal design principles. There is the host processor part (often an Atom-class x86 board, but it can be PPC or ARM), and the switching silicon (a Broadcom Trident variant, a Cavium Xpliant, or similar). The ASIC (or more accurately, ASSP, Application Specific Standard Product) is usually connected to the PCIe bus of the host. The host often has more non-volatile storage than regular networking equipment: onboard Flash, SD card, or SSD. None of these has a significant price impact nowadays. Hardware is sold either bundled with a NOS license, in which case the NOS is preinstalled but replaceable, or with only ONIE.

Software

Aside from ONIE, the minimal Linux boot environment which enables the installation of a NOS, the switch vendors must provide the means to configure or program their networking silicon. Traditionally, suppliers like Broadcom and Cavium have kept their SDKs and toolchains under tight control, available only to their partners, including ODMs and select expert software development consulting service providers like Northforge Innovations. Because these are low-level APIs, access to them would be of little value to organizations without teams of dedicated engineers with specialized experience with each vendor and product. A welcome change to this state of affairs is the introduction of open APIs from the vendors: OF-DPA, OpenNSL, Switch Abstraction Interface, etc.
These open SDKs are accessible for a much wider audience both from the availability and ease of use point of view: They are freely downloadable as a bundle of a closed-source binary and open-source headers, and the APIs presented are typically higher level. The trade-off here is simplified usability at the cost of less low-level, direct control. The exposed functionality while limited, matches fairly well what a custom application developer might want to do with a Whitebox switch: OF-DPA for example provides an API closely resembling the OpenFlow table hierarchy, making it an obvious choice for any OpenFlow agent type application. The open SDKs are evolving, the vendors are listening, and their classic SDK partners have the means to extend and enhance the offerings. Network Operating Systems are built on top of the foundations of ONIE and the open or closed SDKs. Typically, these NOSes are custom Linux distributions containing:

  • An ONIE-compatible installer,
  • Driver support for the special hardware in the Whitebox switch including GPIO and I2C access to LEDs, SFPs, power- and thermal management,
  • And most importantly, the PCIe interface of the switching ASIC.

The free Open Network Linux distribution, a development platform for custom applications, stops short right there. Full-feature NOS-es typically also include applications to do routing, switching, monitoring, and/or firewalling. Complete with OAM features, when (pre-)installed on a Whitebox switch they offer a polished, ready-made customer experience comparable to that of buying classical, non-Whitebox networking equipment from the likes of Cisco or Juniper. Examples include Cumulus Linux, BigSwitch SwitchLight OS, Pica8 PicOS, or even the free and open source OpenSwitch NOS founded by HP and other leading networking industry companies.

The alternative to these turnkey NOS-es is rolling your own. Take, for example, Open Network Linux as the base, pick the SDK for your target platform and application, and implement your own routing/switching logic. Open source components can be cherry-picked for vanilla inclusion or forked to add custom functionality. Hyperscalers (the Googles and Facebooks of the world) have the talent and resources to do it in house; smaller players can take advantage of the growing ecosystem of consulting or software development services.

It is probably too early to speculate how OCP networking will pan out, but it is hard to contain our excitement. Will it remain the purview of only the biggest players? Will a handful of NOS offerings cover every conceivable need and obviate independent development? Or is an “ONIE compatible switch” our generation’s “IBM compatible PC”?

OCP Summit 2016

The 2016 OCP Summit in San Jose was altogether a relatively big event for such a fresh and specialized industry group as the Open Compute Project. The tradeshow floor, talks, workshops, and announcements were all split across the many areas of interest involved, of which Networking is only one. Still, the two days demanded a rather tight schedule to meet all the network-related vendors, see the most relevant talks, and drop by the Hackathon.
Many vendors and member companies were represented at the CEO, CTO or VP level, indicating their interest in this promising new field, and acknowledging the Summit was more than just a trade show. We all went there not just to promote ourselves, but also to be part of the discussion on where OCP is headed, what new applications it will enable, and how we can all work together to make it pan out and succeed. There was an almost tangible, but hard to describe, anticipation in the air. We all felt the Summit was not so much about one-upping each other, but more about being part of something big and genuinely useful for everyone whose life is impacted by technology.

Northforge Innovations developed networking applications with switch ODMs and switch ASIC providers long before the OCP movement. Our long-term relationships and experience with Broadcom, Cavium, Mellanox and Intel enable us to provide solutions and services on Whitebox platforms just like we’ve been doing on closed platforms. We go beyond OF-DPA, OpenNSL and OpenXPS, typically working with the underlying SDKs accessible only to select partner companies. The Open Compute Project promises to open up this fascinating field to newcomers. We applaud and support these efforts, wish to welcome everyone to our world, and offer our services and partnership to anyone in need of in-depth expertise in this domain. Author: Gabor Sz

 

May 2, 2016 – Mitigating Malicious Attacks that Interrupt Network Services — Distributed Denial of Service (DDoS)

Attacks on networks are happening daily around the world, and network operators need to protect their networks from attempts to disrupt their services. A DDoS attack is a malicious attempt from multiple sources to make a server or network resource unavailable to its intended users. It is achieved by saturating memory, processors or network bandwidth, which results in a temporary or indefinite interruption or suspension of services to the legitimate users. Attackers gain control of a possibly large number of end hosts and concurrently launch a DDoS attack. Because of the distributed nature of such attacks, firewalls are not suitable for DDoS protection. In fact, the new generation of stateful firewalls can be a target and victim of stateful DDoS attacks, causing the whole network behind the firewall to become inaccessible (we will discuss stateful attacks later on).

Before going into the details of DDoS, let us clarify the following terms:

  • Attacker: The system(s) launching the DDoS attack from multiple end nodes.
  • Malicious users: End nodes which have been hacked and are remotely controlled by attackers. Usually attackers try to infect the end nodes with some malware to gain control of them and launch attacks from there.
  • Legitimate users: Non-malicious end nodes that require services from a network node under DDoS attack.
  • Victim: The targeted network nodes, e.g. DNS servers, webservers, firewalls, etc.
  • Botnet: A network of zombie computers programmed to receive commands without the owners’ knowledge.

Main types of DDoS attacks:

Volumetric attack

Attackers generate a large volume of traffic (it may go above 100 Gbps), which can saturate network bandwidth or overload the memory and CPU of a host or network element. The defense against volumetric attacks can be frustrated in two ways. First, if the attackers forge the source IP addresses (e.g. in TCP SYN attacks, see below).
Secondly, if a large botnet is created and used to launch the volumetric attack from inside the subnet.

Application-level attack

This type of attack happens at the application level of the OSI seven-layer reference model. An application-level attack does not require high traffic volume. It sends “smart requests” which cause the target to spend a lot of CPU power, memory and/or network bandwidth to process and respond to the malicious requests. For example, a request for a non-existent file on a webserver can force the target to perform an extensive disk search. In a second example, an HTTP request for a large file can consume large amounts of memory buffers. Another example is a web-application request for a summary report of one year’s account transactions at a bank, which causes a lot of processing in the backend database server.

State-exhaustion attack

Attackers can exploit the stateful processing of layer 3 and layer 4 protocols to saturate the memory of the nodes. Examples are Slowloris attacks or Ping of Death attacks (see below).

Zero-day attacks

A zero-day attack can be defined as any previously unseen type of DDoS attack that common methods of protection may not be capable of handling because there is no known signature (behavior and traffic patterns) available to the DDoS protection system. An adaptive DDoS protection system should have a way to detect and react to such zero-day attacks.

Multi-vector DDoS attacks

Attackers do not use a single type but use multiple types of DDoS attacks at the same time. That makes the detection and mitigation of the attacks much more difficult.

Well-known DDoS attacks

TCP SYN Flood Attack (volumetric attacks)

A classic example of a DDoS attack is the TCP SYN flood with IP spoofing. It can be launched from a single node by sending TCP SYN requests with a spoofed source IP address.
Since the current Internet routing protocols only take the destination addresses into account, the spoofed packets with random source IP addresses have no difficulty in reaching the destination node. The node will allocate memory structures to accommodate the future flow to be opened by each TCP SYN packet. A high number of TCP SYN packets can exhaust memory and crash the server.

Reflection and amplification DDoS attacks (volumetric attacks)

Let us start with DNS reflection and amplification attacks. Since the DNS protocol is based on the connectionless UDP transport protocol, an attacker can send a DNS request without establishing a connection. The DNS request has a forged source IP address, which is the IP address of the victim. The DNS server will respond to the DNS request with a DNS response sent to the forged IP address in the request. The victim will receive DNS responses without having sent DNS requests. The attackers can increase the DNS response traffic by increasing the rate of DNS requests and by using specific requests that result in a larger response size (the name DNS amplification comes from the large response size). For example, a DNS ANY request of 64 bytes will result in a 512-byte DNS response. Other reflection and amplification attacks exist and work the same way as the DNS one. Examples are NTP, SSDP, BitTorrent, RIPv1, mDNS, CharGEN, QOTD, SNMP, NetBIOS Name Server and RPC port map (Open Network), Sentinel (a license server) reflection and amplification attacks.

Peer-to-peer DDoS attacks (volumetric attacks)

In the case of peer-to-peer DDoS attacks, the file sharing traffic is routed to the victim’s server.

Slowloris DDoS attacks (state-exhaustion attacks)

The attackers open as many HTTP connections as possible. Then they try to keep the connections open by slowly sending partial requests. This way they can keep the established HTTP connections open without needing to send high-volume traffic.
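Returning briefly to the DNS amplification figures quoted earlier in this section (a 64-byte request producing a 512-byte response), the arithmetic can be checked with a few lines of Python. The function names and the request rate of 10,000 per second are assumptions chosen for illustration, and only DNS payload sizes are counted, ignoring IP and UDP headers.

```python
def amplification_factor(request_bytes: int, response_bytes: int) -> float:
    """Ratio of reflected response size to attacker request size."""
    if request_bytes <= 0:
        raise ValueError("request size must be positive")
    return response_bytes / request_bytes


def traffic_bps(packets_per_second: float, packet_bytes: int) -> float:
    """Bandwidth in bits per second for a given packet rate and size."""
    return packets_per_second * packet_bytes * 8


# Figures from the text: 64-byte request, 512-byte response.
factor = amplification_factor(64, 512)

# Assumed rate, for illustration only: 10,000 spoofed requests per second.
attacker_bps = traffic_bps(10_000, 64)   # what the attacker has to send
victim_bps = traffic_bps(10_000, 512)    # what arrives at the victim
```

With these numbers the factor is 8: the attacker spends about 5 Mbps of upstream bandwidth and the victim receives about 41 Mbps, before counting any additional reflectors.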
Nuke or Ping of Death DDoS attacks (state-exhaustion attacks)

Attackers send corrupt and large fragmented ICMP packets to the victim. The large ICMP packets are reassembled on the victim side, which needs a large number of buffers to reassemble the IP fragments. The attackers can also manipulate the fragments so that reassembly on the victim’s system keeps holding the buffers until a timeout occurs. The purpose is to keep the memory buffers in use as long as possible so that memory is not available for other connections.

How to detect DDoS attacks

DDoS attacks need to be detected, and actions need to be taken to mitigate them. Every DDoS protection system will usually make some false positive or false negative decisions about legitimate users or attackers. False positive decisions prevent legitimate users from accessing the service and cause negative business impacts. Active mechanisms are often used to help DDoS protection systems differentiate legitimate users from attackers. Examples are CAPTCHA tests to mitigate webserver attacks or RST cookies to fight against TCP SYN flood attacks (we will discuss those methods later). These active mechanisms cause inconvenience to the legitimate users. A DDoS detection mechanism therefore needs to decide whether a network is under a DDoS attack and, based on that, which mitigation actions to take. Otherwise, no actions are taken, to avoid false decisions and the cost of active mechanisms. DDoS detection can use the following mechanisms for the decision:

  • Rate-based detection: if the rate of a certain type of traffic to a node exceeds a predefined threshold, the node is considered to be under DDoS attack. Rate measurement can be done in the DDoS detection system itself or by processing flow information sent by nodes, routers, and switches using protocols such as NetFlow and sFlow.
  • Memory and CPU usage at end nodes: DDoS detection systems can query and monitor the resource utilization of the nodes under protection using a network management protocol such as SNMP, or monitoring software such as Nagios.
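
To make the rate-based mechanism above more concrete, here is a minimal sketch of a sliding-window rate check. It is only an illustration: the class name `RateDetector`, the one-second window, and the 100 packets-per-second threshold are assumptions chosen for the example, not values from any particular product.

```python
from collections import deque


class RateDetector:
    """Flags a destination when its packet rate over a sliding window exceeds a threshold."""

    def __init__(self, threshold_pps, window_s=1.0):
        self.threshold_pps = threshold_pps  # assumed threshold, tune per deployment
        self.window_s = window_s
        self.arrivals = {}  # destination -> deque of recent packet timestamps

    def observe(self, dst, timestamp):
        """Record one packet to `dst`; return True if `dst` now looks under attack."""
        q = self.arrivals.setdefault(dst, deque())
        q.append(timestamp)
        # Drop arrivals that have fallen out of the window.
        while q and q[0] <= timestamp - self.window_s:
            q.popleft()
        return len(q) / self.window_s > self.threshold_pps


detector = RateDetector(threshold_pps=100)
# 50 packets spread over the first second: below the threshold.
quiet = [detector.observe("10.0.0.5", t / 50) for t in range(50)]
# 500 packets in the following second: above the threshold.
flood = [detector.observe("10.0.0.5", 1.0 + t / 500) for t in range(500)]
print("alarm during quiet period:", any(quiet))
print("alarm during flood:", any(flood))
```

In a real deployment the timestamps would more likely come from flow records exported by routers and switches than from per-packet inspection, and the threshold would be derived from a traffic baseline.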

How to mitigate DDoS attacks

There is no efficient single mitigation technique for all types of DDoS attacks. Each type of attack requires a different mitigation technique or a combination of mitigation techniques.

  • TCP SYN flood attack: the DDoS protection system intercepts the SYN packet and does not forward it to the targeted node; instead it sends a SYN-ACK message with a cookie value. If the sender responds with an ACK carrying the same cookie value, the sender is considered a legitimate user and the next TCP connection establishment attempt will be allowed.
  • Volumetric attacks: A UDP-based reflection and amplification attack can be mitigated by putting a rate limit on the traffic coming to the node under attack.
  • Web application attacks: These attacks can be mitigated by turning on the CAPTCHA challenge. Legitimate users are those who can solve the challenge. The drawback is that the CAPTCHA is often not welcomed by end users and could have a negative impact on the business offering the services (such as banking or P2P services).

The DDoS mitigation system can use an IP reputation list or build its own blacklist and whitelist by gradually observing user behavior and analyzing related statistics of user traffic. The DDoS mitigation system can also block all or part of the traffic from certain geographical areas, depending on which customers the protected service prioritizes. This method is not effective if the attack is dispersed across different geographical areas, and it is not suitable for international businesses.

Needs and challenges of DDoS attacks

DDoS attacks are becoming more and more prevalent, and larger. Reports from Akamai, Arbor Networks, and Verisign indicate attacks in 2014 and 2015 with peak traffic volumes above 100 Gbps and above 100 Mpps. The biggest DDoS attack in history [1] occurred at the beginning of 2016. It targeted the BBC at an estimated rate of 602 Gbps, almost twice the rate of the largest DDoS attack Arbor Networks reported in 2015. With the introduction of cloud computing and the Internet of Things (IoT), attackers can get control of large numbers of virtual machines and launch DDoS attacks from there. DDoS protection systems have become an important security appliance to avoid service disruption and prevent the loss of revenue. A report from DigiCert [2] indicates that DDoS attacks cost 40% of businesses at least $100,000 for every hour of down-time.

Northforge Innovations identifies the key challenges for a DDoS protection system to be twofold: first, the ability to handle large amounts of traffic; second, the ability to mitigate zero-day attacks. To address the first challenge, Northforge employs disrupted/optimized data-path processing and DPI techniques. The second challenge is met through the implementation of user-behavior analysis techniques.

References:
[1] http://www.csoonline.com/article/3020292/cyber-attacks-espionage/ddos-attack-on-bbc-may-have-been-biggest-in-history.html
[2] https://blog.digicert.com/ddos-trends-predictions-for-2016/

 

April 6, 2016 – How to Improve Packet Buffering Performance with Broadcom Smart-Buffer Technology

Broadcom’s Smart Buffer Technology can provide cost effective performance scaling of cloud applications in Broadcom StrataXGS data center switches. The function of a network switch is to receive packets on an input port, apply switching and routing decisions as per the configurations that are set on the L2/L3 switches, identify the outgoing port(s) and send the packet out. Whenever a switch receives a traffic burst, the output port will not be able to process the complete traffic as it is receiving on the input port. As a result, we need to implement store and forward logic at the input port to have lossless transmission of the packets. The buffering capability of the input port determines how many packets can be buffered by the port before the output port could send the packet out of the switch. Whenever the burst is high/continuous, it is very likely the buffering capability of the input port could be exhausted and packets could get dropped randomly. This results in poor and unpredictable behavior. The buffering capability of a port is determined not only by the amount of buffer space allocated for the port, but also by the Memory Management Unit architecture which plays a vital role in absorbing such types of burst in real time. In cloud-based data centers, it is evident that “Bursty” traffic patterns are very high. Therefore it is key for anyone who is designing a cloud-based data center to have good buffering algorithms implemented in the switches, so that it could absorb such traffic to its maximum capability. This approach could minimize the loss of traffic and as a result have better connectivity and reliability. The traffic patterns in a cloud environment are typically dynamic and uncertain. 
Examples of this type of traffic are Hadoop MapReduce, distributed file systems used in Big Data analytics, distributed caching related to high performance transaction processing, streaming media services, and many other demanding and high bandwidth computing processes.

It is worth taking a close look at the traffic pattern characteristics of Big Data workloads such as Hadoop and MapReduce, since these comprise a high percentage of the traffic in data centers. In the Hadoop Distributed File System (HDFS), operations such as input file loading and result file writing give rise to network burstiness due to a high amount of data replication across cluster nodes in a very short time span. The data shuffling phase in MapReduce also tends to create many-to-one bursts when multiple mapper nodes terminate and send their results to reducer nodes in the network.

Even though it is recommended to migrate from TCP collapse to Priority Flow Control (PFC), Quantized Congestion Notification (QCN), and Data Center TCP (DCTCP), these protocols are not widely implemented in today’s Web and other cloud-based networks. That’s because it would require a costly and complex upgrade of the hardware and software in multiple nodes of the network. In addition, these protocols do not solve the complete problem; they handle only short-lived traffic flows or micro-bursts.

One solution to the above problem is to use Broadcom’s Smart-Buffer technology, which offers a proven approach to delivering cost-effective packet buffer performance and is well suited to modern data center switches running cloud applications. The StrataXGS switch architecture with Smart-Buffer technology incorporates a scalable multi-pipeline design interconnected through a centralized MMU architecture. Further, its packet buffer is right-sized and dynamically shared across all ports for excellent burst absorption. Its architecture enables global admission control, queuing, policing and shaping functions. 
Smart-Buffer delivers optimal buffer utilization and burst absorption for data center workloads by taking a holistic approach to buffer management – using real-life data center traffic scenarios to maximize overall throughput and lossless behavior. To summarize:

  • Excellent burst absorption
  • Fair shared buffer pool access
  • Port throughput isolation
  • Traffic independent performance
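
To illustrate why a dynamically shared buffer absorbs bursts better than a fixed per-port slice, here is a small sketch. It is not Broadcom's Smart-Buffer algorithm, which is not described in this post; it uses a generic dynamic-threshold admission rule in which a queue may grow up to `alpha` times the currently free buffer space. The function names, the 1000-cell buffer, the 10 ports, and `alpha = 1.0` are all assumptions made for the example.

```python
def static_admit(queue_len, total_cells, num_ports):
    """Static partition: each port owns an equal, fixed slice of the buffer."""
    return queue_len < total_cells // num_ports


def dynamic_admit(queue_len, total_cells, used_cells, alpha=1.0):
    """Dynamic threshold: a queue may grow up to alpha times the currently free space."""
    free = total_cells - used_cells
    return queue_len < alpha * free


def absorb_burst(burst_cells, total_cells=1000, num_ports=10, shared=True):
    """Send a burst to one port while the other ports are idle and nothing drains.

    Returns the number of cells accepted before the first drop.
    """
    queue_len = 0
    for _ in range(burst_cells):
        if shared:
            ok = dynamic_admit(queue_len, total_cells, used_cells=queue_len)
        else:
            ok = static_admit(queue_len, total_cells, num_ports)
        if not ok:
            break
        queue_len += 1
    return queue_len


print("static partition accepts:", absorb_burst(400, shared=False))
print("shared pool accepts:     ", absorb_burst(400, shared=True))
```

With these assumed numbers, the statically partitioned port starts dropping once its fixed slice is full, while the shared pool lets the bursting port borrow idle space and still keeps part of the buffer in reserve for other ports.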

At Northforge, we can help you design systems for Cloud Data Center environments that perform well and are cost-effective for enterprise networks. Northforge also provides services for integrating systems with the Broadcom SDK so that they work seamlessly in a cloud environment. Reference: https://www.broadcom.com/collateral/etp/SBT-ETP100.pdf

 

March 21, 2016 – Lost Battles of Network Security

While these products and strategies that we discussed in the first post of this three-part blog series might seem comprehensive (when taken as a whole), successful attacks are still common. Security experts point out that certain basic vulnerabilities are still not addressed. In an IBM white paper titled Stepping Up the Battle Against Advanced Threats, the notion of the “three lost battles” is raised to point to why successful attacks are still common. The three battles noted there are User Education, Patching and Secure Code Development, and Malware Protection. User Education – Users fall for a variety of ruses with potentially devastating effects. By falling for a spear phishing attack, executing a malicious attachment, or visiting a legitimate but compromised site, access to internal systems may be achieved. Once having internal access, detailed study of the internal security architecture is possible, resulting in an attack crafted to the specific vulnerabilities discovered. Many firms are lax in actually educating and testing employees with respect to their security awareness. Patching and Secure Code Development – As the number of endpoints grows exponentially, so does the code base embedded within those endpoints. This results in a huge opportunity for “zero-day” exploits, i.e. those exploits that leverage previously un-discovered vulnerabilities in the code bases and therefore the likelihood that existing security measures will fail. To quote from a recent article at sdxcentral.com, “British Telecom (BT) has deployed security from Cisco in its data centers to combat security threats that, according to the carrier, have increased 1,000 percent in the last 13 months.” In the article, Sam Rastogi, a Cisco senior product marketing manager, asks “Why would security threats increase 1,000 percent? It’s because of the massive growth in Internet-connected devices and the Internet of Things (IoT). 
There are simply more entry points for threats.”

Malware detection: blacklisting and behavior analysis – Given the exponential growth of code in the endpoints, secure code development in itself is not likely to thwart all zero-day attacks. It is here that predictive methods come into play. For example, look for suspicious patterns, probes, or other activity that suggest an attacker might be trying to determine the code base in a target (remember that much of the code in the target is likely to have an open-source origin) and limit access to that device or raise an alert to the security administrator.

In fact, all three of the “lost battles” need to be surmounted, and this calls for quite a different product approach than point products targeted at specific phases of the attack. IBM has a buzz phrase for addressing this view: “Actionable Integration of Vulnerability Intelligence”, i.e. protection that spans the entire attack and provides a direct means of applying and enforcing it.

In conclusion, we’ve seen that there’s a disconnect between how some of the security product industry views the threat landscape, how the threats are designed and delivered, and how security experts view the most serious threats. Security breaches are growing at exponential rates and hiring demand for security experts is skyrocketing. In a recent web post at The Telegraph (1/13/16) it was noted that demand for cyber security experts has quadrupled to a record high over the last year following data breaches at TalkTalk, Sony and Ashley Madison. This is not a solvable problem but rather an ongoing fundamental aspect of cyber existence that will require human and financial resources for the foreseeable future.

March 3, 2016 – How Attackers of Network Security Craft Their Attack

While a network security attack will typically operate on the three phases (before, during, & after) discussed in the last blog post, the successful attacker will coordinate actions across all three phases to maximize the likelihood of success. The “before” will consist of multiple scans of the target to find those scanning approaches that produce the most information about the target. This phase may span months, but attackers are patient. Small actions each of which is undetected can be used to build the attack profile. The attack scenario will incorporate the results of those scans. The attack is launched – or not. The infection technology can be very well obscured by, for example “sleeping” for a long period of time after access has been gained until an action by a user (e.g. some mouse clicks) or external event causes activation of the attack. This characteristic can be effective in preventing detection of the threat even in instrumented “sandbox environments”. Even encryption is not a panacea – certificate compromise is common and many attacks take place over VPN’s created by the attacker once a certificate is compromised. Probing continues to see how the target defends itself against the initial attack and the strategy may be modified to further the continuance of the attack. At some point, if the attacker is successful in maintaining connection to the target, the goal of the attacker will be realized. At this point the attacker may take steps to conceal the nature of the attack. This concealment can include removal of the threat, or modification of the threat into a different form that preserves some aspect of the breach for subsequent attacks but protects the threat from detection by post-attack analysis. The above is a highly generalized view of the attack profile. Far more complexity is found in the real world. 
For example, looking at a set of specific attacks and threats such as Point of Sale, Web Apps, Malicious Insiders, Physical Theft, Crimeware, Card Skimmers, Denial of Service, and Cyber-espionage, each has its own attack profile. Additionally, each of these threats typically targets a different vertical market (e.g. retail, energy, public, financial, manufacturing, health, and travel). The range of variations between threat types and attack surface in the vertical market increases the difficulty of finding an off-the-shelf solution for the vertical seeking protection. That said, the perennial advice, slightly modified, still holds, “an ounce of prevention is worth a pound of cure”. Also there are behavioral and systemic weaknesses that do not fall nicely into the simple three-phase model. In the next post we will look at more sophisticated attack strategies and some of the high level failings that fall outside of the point approaches used in the three-phase model.

February 4, 2016 – Network Security from the Perspective of Security Vendors

It seems that nearly every day there are reports of a new type of security attack or some major security breach. Many more go unreported. Overall, the complexity of information technology architectures and their widely distributed nature encourage novel attack schemes that can bring attackers huge financial, commercial, and political rewards. This blog will look at the security challenge from three different perspectives. First, security product vendors have their views of how to best address this market. Second, attackers view the challenge as how to best circumvent the vendors’ products. And lastly, researchers and security experts weigh in on why successful attacks occur in spite of the numerous security products available in the market. In this blog post and the next two posts we’ll look at each of these three perspectives and the dynamics between them.

Security Vendors’ Perspective

Security vendors frequently characterize their products as addressing a specific phase of the security attack. At a high level these phases might be called “Before”, “During”, and “After”. This is a logical characterization as the defense technology employed at each phase of the attack can be optimized to a specific vulnerability. The terminology used here to characterize this three-phase model is largely drawn from Cisco, which provides a good starting point. The NIST model of Identify, Protect, Detect, Respond and Recover can also be mapped to the three phases.

Before

Actions appropriate to the “before” phase can be characterized by the phrase “Discover, Enforce, & Harden”. In the “before” phase, offerings include firewall, patch management, application control, vulnerability management, VPN/encryption, and network access control. These products cover the critical “protective” steps.

During

Actions appropriate to the “during” phase can be characterized by the phrase “Detect, Block, & Defend”. In the “during” phase, offerings include intrusion protection, anti-virus, and email/web content filtering. Here the attack is in progress or just a “click” away.

After

Actions appropriate to the “after” phase can be characterized by the phrase “Scope, Contain, & Remediate”. In the “after” phase, offerings include intrusion detection, log management, and SIEM (Security Information and Event Management).

It is obvious that this three-phase approach, with well-defined functions supplied to each phase, provides an excellent model to support a wide range of product offerings. This is a product manager’s dream scenario, as each product can have a well-defined feature set whose value can be easily communicated to potential customers. The product can be optimized for the specific defense type it provides, and unrelated vulnerabilities do not have to be addressed.

This segmented approach addresses the “necessary” aspect of the problem, but is it “sufficient”? The answer is largely dependent on the value of the target. For example, a residential or small business wireless router typically does not sit in front of the same pricey equipment and highly valued assets as the network of a financial institution or government organization. The cost and complexity of providing protection should match the “value” of the assets being protected. The obvious risk is a mismatch between the level of protection and the value of the assets. And even here there is the case where a large number of low-value endpoints (say thousands of personal computers) are breached and federated into a much larger attack (e.g. DDoS).

However, sophisticated attackers are only interested in the success of the attack and will typically craft the attack to span all three phases mentioned above. “Before” – probe the target to identify weaknesses; “During” – launch the attack; “After” – clean up evidence of the target having been breached or leave the threat in place undetectably. The attacker’s perspective will be examined in the next blog post.

December 16, 2015 – Control Plane Protocols for Streaming Video Services

The explosion of streaming video happening across all networks is forcing changes to the infrastructure that governs and controls the delivery of the video. Traditional video delivery control infrastructure was an isolated island of specialized servers delivering specific video content across a dedicated network for video. The evolution of IP networks, online video, and user experiences with video has brought about significant changes in the manner and infrastructure that controls video streams for end-users to watch. The end-user experience for consuming streaming video includes many new devices, such as phones, tablets, PCs, and IP-enabled TVs, as opposed to the traditional TV of the past. Content is no longer limited to a live event or the current broadcast of a specific channel. Live events, recorded content, newly released content, and time-shifted broadcasts are just a few of the many different types of video streams that are surging over the Internet and IP networks today. The marriage of the new content with the choices of screened devices by which to display the video has placed many new demands upon the infrastructure built by Service Providers (SP) to deliver video to their customers. The infrastructure to deliver the variety of video content to the many different devices used by consumers to watch video is no longer an isolated island of functionality. The constant changes in security, authentication, location, content, social media influence, and other factors have pushed SPs to build a multi-vendor based delivery network to meet the ever changing demands of consumers. This network infrastructure enables browsing for content, selecting the content, viewing the content now or preserving the content for later viewing. To implement those functionalities, based on the network, a certain number of Video Control Protocols are being used. 
Northforge has done work in the area of providing monitoring and troubleshooting capability of session-related messages and information flow between the primary components of the video service network. Message exchanges (request/response) are correlated into transactions, and transactions further correlated into sessions. Based on the transaction and session data collected, performance data is provided to the network administrator. At the same time, the network administrator has the ability to view each Protocol Data Unit (PDU) in a decoded format and observe the association to a particular transaction and session. There are challenges within each use case, most notably the need to monitor different Control Plane Protocols without impacting performance of other protocols. Different protocols require different techniques. With an HTTP Based Control Protocol (HBCP), each node can use a different application protocol. To allow operational troubleshooting and performance monitoring, a new technique was required to measure the quantity and characteristics of user sessions attempting to join video services being controlled by the session control servers. With Binary Based Control Protocol (BBCP), where message lengths are not specified, to achieve the required packets per second (PPS) rate within the constraints of the processor performance and memory capacity of the network interface card, a heuristic technique for flow identification is being evaluated. To address some of these challenges, we analyzed interface specifications and packet captures from the HBCP and BBCP deployments and proposed designs to reduce consumption of resource capacity of the network interface. For transaction correlation using multiple responses and mapping error codes to specific transactions, we developed a customized solution. So far, we have achieved:

  • A way to provide operational troubleshooting and performance monitoring capabilities to service providers deploying session control functionality for video services. Performance measurements of the session attempts will allow short-term and long-term evaluation and improvement of the overall performance of video services and the infrastructure as part of a network.
  • A method to search text patterns and correlate large, fragmented HTTP PDUs from different interfaces into video control sessions, at a high packet-per-second rate.
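
The correlation step described above (messages into transactions, transactions into sessions) can be sketched in a few lines. This is only an illustration of the idea and not the customized solution mentioned in this post: the message fields `session_id`, `txn_id`, `kind`, `ts`, and `status` are hypothetical, and real control protocols would need protocol-specific parsing to extract them.

```python
from collections import defaultdict


def correlate(messages):
    """Group request/response messages into transactions, then transactions into sessions.

    Each message is a dict with hypothetical keys:
    session_id, txn_id, kind ("request" or "response"), ts (seconds), status.
    """
    transactions = {}
    for msg in sorted(messages, key=lambda m: m["ts"]):
        key = (msg["session_id"], msg["txn_id"])
        txn = transactions.setdefault(key, {"request_ts": None, "responses": []})
        if msg["kind"] == "request":
            txn["request_ts"] = msg["ts"]
        else:
            txn["responses"].append(msg)  # a transaction may carry several responses

    sessions = defaultdict(list)
    for (session_id, txn_id), txn in transactions.items():
        final = txn["responses"][-1] if txn["responses"] else None
        complete = final is not None and txn["request_ts"] is not None
        sessions[session_id].append({
            "txn_id": txn_id,
            "latency": final["ts"] - txn["request_ts"] if complete else None,
            "status": final["status"] if final else None,
        })
    return dict(sessions)


messages = [
    {"session_id": "s1", "txn_id": 1, "kind": "request", "ts": 0.00, "status": None},
    {"session_id": "s1", "txn_id": 1, "kind": "response", "ts": 0.05, "status": 100},
    {"session_id": "s1", "txn_id": 1, "kind": "response", "ts": 0.25, "status": 200},
    {"session_id": "s1", "txn_id": 2, "kind": "request", "ts": 0.50, "status": None},
    {"session_id": "s2", "txn_id": 1, "kind": "request", "ts": 0.10, "status": None},
    {"session_id": "s2", "txn_id": 1, "kind": "response", "ts": 0.50, "status": 404},
]
sessions = correlate(messages)
for session_id, txns in sessions.items():
    print(session_id, txns)
```

Per-session lists like these are the kind of raw material from which performance data (transaction latency, error codes per session, unanswered requests) could be reported to a network administrator.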

We are pleased with the results of this project to date and will publish more information as we proceed.

December 2, 2015 – How to Reduce Latency and Improve Audio Fidelity

More and more businesses and residential telephone subscribers are moving to Internet-based service providers. According to the Canadian Radio-television and Telecommunications Commission (CRTC), there were 5.5 million retail “Voice Over Internet Protocol” (VoIP) residential telephone lines in Canada in 2013 [1] – representing nearly 50% of that market. These numbers will only increase in the future, as the arguments for choosing VoIP are so compelling: caller ID, easily-added features such as online contact lists, black lists of blocked callers, virtual numbers, physical portability (bring your VoIP line with you on holidays!), multi-way conferences, and, perhaps most significantly, dramatically reduced costs. In spite of the abundant benefits of using Internet-based telephone service, one of the common criticisms of VoIP telephony centres on voice quality – the listener’s perception of the fidelity of the audio being received or transmitted. The continuous audio of speech is broken down, typically, into 50 “packets” of 20ms duration each, for every second of sound being transmitted. Those 50 packets have to be digitized, transmitted to the party at the other end of the call, received, and converted back to analog audio, for the listener to hear. The choice of digitizing strategy (“codec”) will determine the best-case perception of quality; audio quality from that level can be eroded due to problems with the transmission of those digitized packets of sound – each packet must arrive in a timely fashion, ideally in the order in which they were transmitted. But with internet data transmission, VoIP typically cannot rely on such “guaranteed service”. Packets can arrive out of order, delayed, or they may indeed not arrive at all. Latency is the term used to describe the delay incurred by a packet of data in transit from its origin to its destination. Contributing to latency are factors such as:

  • Physical distance between the two endpoints
  • Number of physical “hops” required to be traversed
  • Bandwidth of the physical ‘hops’
  • Demands upon the network(s) being used by other network users at the same time

How much does latency erode our perception of audio quality? In an attempt to quantify audio quality, a measure called the Mean Opinion Score (MOS) was developed. Originally, the MOS was obtained by having a panel of listeners offer their opinions on the audio quality of sounds in controlled conditions, but in the VoIP world, these subjective measures have been quantified in terms of the hazards faced by packet data on a network: latency, jitter (the variation in latency) and packet loss. These MOS numbers range from 1 (“bad quality”) through 3 (“fair quality”) to 5 (“excellent quality”). Historically, PSTN (circuit switched) calls have a MOS score in the 4.0 to 4.5 range and cellular networks a MOS score in the 3.5 to 4.0 range. Even ignoring packet loss, latency can have a significant impact on the MOS. Here’s a chart showing the dependence of MOS upon latency, measured in milliseconds:

[Chart: MOS versus latency in milliseconds]

We can see that any latency values greater than about half a second (500 milliseconds) will probably result in MOS values less than 3 – anywhere from “fair” quality (3) to “poor” quality (2) and “bad” audio quality (1). So what can we do to reduce latency? First we have to know where latencies arise. According to [2], internet transmission latency comes from four basic sources:

  1. Propagation. This is a physical limitation; information is limited by the speed of light; travel across North America (say, 5000km) must take at least 16ms; travel to the moon (about 380,000km) is going to take at least 1250ms.
  2. Processing. Doing analog to digital conversion, compression, and encryption on the sending end, and then decryption, decompression, and digital to analog conversion on the receiving end, all take some amount of time.
  3. Multiplexing. Assuming that we’re transmitting our audio over a shared network (as opposed to a dedicated one of which we are the sole users), delays are introduced when other users’ data must be interleaved with our own, or when a flood of other users’ data causes our stream to be delayed.
  4. Grouping/Batching. Typically audio is digitized in 20ms packets; the packet cannot be transmitted until the entire 20ms interval is converted. This introduces a 20ms delay at minimum, assuming the digitization is instantaneous (which it won’t be). On slower or congested links, the effects of packet serialization–known as jitter–may be more prevalent. The techniques to manage this, such as buffering, may introduce additional delays. Decompression methods used by some codecs may also introduce delays when lining up packets.
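
The speed-of-light and batching figures in the list above are easy to check with a few lines of arithmetic. The sketch below uses the vacuum speed of light as a best case (signals in fibre or copper travel more slowly, so real delays are larger); the function names are just for this example.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # vacuum; signals in fibre or copper travel slower


def propagation_delay_ms(distance_km):
    """Lower bound on one-way delay imposed by the speed of light."""
    return distance_km / SPEED_OF_LIGHT_KM_S * 1000


def minimum_one_way_delay_ms(distance_km, packet_ms=20.0):
    """Propagation bound plus the time spent filling one audio packet."""
    return propagation_delay_ms(distance_km) + packet_ms


for label, km in [("across North America", 5_000), ("Earth to Moon", 380_000)]:
    print(f"{label}: {propagation_delay_ms(km):.1f} ms propagation, "
          f"{minimum_one_way_delay_ms(km):.1f} ms with 20 ms packetization")
```

Processing and multiplexing delays come on top of these two terms, so the result is a floor rather than an estimate of what a real call will see.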

So what can be done to reduce latency?

  1. Make sure the audio stream’s transmission path is as short as possible, all other things being equal. Of course, this is often out of our control. Choice of network backbones, failovers, and re-routings all occur more or less outside of users’ control
  2. Ensure that there is adequate processing horsepower and memory at both endpoints. Insufficient computing power can result in slowdowns and interruptions to completing the work at hand – encoding and decoding the audio stream. Some encode/decode schemes (codecs) have greater processing burdens than others. In all cases, VoIP demands timely processing
  3. Ensure that there is adequate bandwidth on all legs of the network(s) over which the audio is being transmitted. The more traffic being handled, the more likely it happens that some of the audio packets are delayed. “Bursty” traffic can be especially problematic in this respect. If our audio has to share a network with bursty or high-volume (i.e. close to network capacity) traffic, the timeliness of our transmissions will likely be impacted (which is by definition high latency). Whenever possible, prioritize your audio traffic by specifying the appropriate QoS (Quality of Service [3]) or DiffServ (Differentiated Services [4]) settings on any network equipment in your control.
  4. Optimize the trade-off between packet size and overhead ratio. The overhead for a single RTP packet is fixed; make the packet too small, and you’ll be wasting a lot of bandwidth; make it too large, and you’ll be adding to latency.
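
The packet size trade-off in the last point can be put into numbers. The sketch below assumes a 64 kbit/s codec (the G.711 rate) and counts 40 bytes of IPv4, UDP, and RTP headers per packet while ignoring link-layer framing; both are assumptions for illustration, and other codecs or IPv6 would shift the figures.

```python
HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP headers, no link-layer framing


def packet_stats(packet_ms, codec_kbps=64.0):
    """Return (payload bytes, overhead fraction, on-wire kbit/s) for one packet interval."""
    payload = codec_kbps * 1000 / 8 * packet_ms / 1000
    total = payload + HEADER_BYTES
    packets_per_second = 1000 / packet_ms
    wire_kbps = total * 8 * packets_per_second / 1000
    return payload, HEADER_BYTES / total, wire_kbps


for ms in (10, 20, 40):
    payload, overhead, wire = packet_stats(ms)
    print(f"{ms:>2} ms: payload {payload:.0f} B, overhead {overhead:.0%}, {wire:.0f} kbit/s")
```

Shorter packets lower the batching delay but spend a larger share of the bandwidth on headers; longer packets do the opposite, which is why 20ms is a common compromise.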

Further to the multiplexing issue above, much low-level work has been done with respect to a problem known as “bufferbloat” [5]. Routers and switches have to buffer incoming packets when those packets’ outgoing destinations are “busy”, either through congestion or just contention. For many years, there was a tendency to design these devices with ever-larger buffers for handling such situations. Research has shown, however, that this tendency has actually caused breakage of congestion-avoidance algorithms, resulting in greater, and more variable, latencies. So in addition to the other strategies mentioned above, selecting the right networking equipment is also very important to latency minimization.

References:
[1] http://www.crtc.gc.ca/eng/publications/reports/policymonitoring/2014/cmr5.htm figure 5.2.1
[2] Workshop on Reducing Internet Latency, 2013. Internet Society (internetsociety.org)
[3] http://en.wikipedia.org/wiki/Quality_of_service
[4] http://en.wikipedia.org/wiki/Differentiated_services
[5] http://en.wikipedia.org/wiki/bufferbloat

November 11, 2015 – How “Dynamic Pathing” Provides Benefits Not Achievable with Static Physical Device Chains

In our last blog post we explored how Service Function Chaining (SFC) differs from the traditional model of creating a sequence of functions. Another key difference is that the path through the SFC can be “dynamic”, as noted in this Intel white paper. The dynamic qualities of the SFC can take many different forms. For example, classifiers at the chain ingress can control which elements of the service chain data passes through via metadata applied to the packet (see illustration below). Additionally, the entire service chain can be reconfigured dynamically as service needs change (scaling).

Traditionally, paths through elements have been controlled by L2/L3 switches/routers. As topologies have become more complex, overlay (virtual) networks have been applied to manage these topologies. The additional capability inherent in SFCs is that these paths can be service-aware – not just source/destination aware – on (potentially) a packet-by-packet basis.

A very simple example is illustrated below. Client1 is making HTTP requests; Client2 wishes to establish a SIP connection to the vIPBX. At a high level all traffic entering/leaving the SFC is directed to the Firewall. Client1’s traffic passes only through the Firewall and SFFs (Service Function Forwarders) before reaching the terminating service; Client2’s traffic passes through the Firewall and SFFs and reaches a different terminating service. The path is determined by the interaction of the “classifier” (a logical function) and the SFFs. The logical function performed by the classifier could take many forms:

  • It might effectively do nothing and rely on the flow tables present in the SFF (which were populated by some other mechanism).
  • It might directly manipulate the flow tables based on how it classified the traffic.
  • It might attach “service headers” to the traffic that the SFFs could match against their flow tables.

Service headers are currently a subject of much discussion and can range from something as simple as an MPLS-like tag to a rich “network service header” as described in various IETF drafts (see RFC 7498 for a general SFC problem statement).

[Figure: NII blog diagram]

In conclusion, the above model can be compared to the initial model of a fixed sequence of ordered physical devices connected in series (likely through static L2 paths). The capacity of the SFC to (auto) scale and (auto) path provides valuable benefits not realizable with static physical device chains.
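The third classifier option above (attaching a service header that the SFFs match against their flow tables) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Packet` class, the `SERVICE_PATHS` table, the path IDs and the port-based classification rule are all hypothetical and are not taken from any IETF draft or product.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Packet:
    src: str
    dst_port: int
    # Hypothetical "service header": an ID naming the path to follow.
    service_path_id: Optional[int] = None
    visited: List[str] = field(default_factory=list)


# Hypothetical SFF flow table: service path ID -> ordered list of hops.
SERVICE_PATHS = {
    1: ["firewall", "web_server"],  # HTTP traffic (Client1 in the example)
    2: ["firewall", "vipbx"],       # SIP traffic (Client2 in the example)
}


def classify(packet: Packet) -> Packet:
    """Classifier: attach a service path ID based on the destination port."""
    if packet.dst_port in (80, 443):
        packet.service_path_id = 1
    elif packet.dst_port == 5060:
        packet.service_path_id = 2
    return packet


def forward(packet: Packet) -> Packet:
    """SFF: match the service header against the flow table and steer the packet."""
    hops = SERVICE_PATHS.get(packet.service_path_id)
    if hops is None:
        packet.visited.append("drop")  # no matching path for this traffic
    else:
        packet.visited.extend(hops)
    return packet
```

Changing the contents of `SERVICE_PATHS` at run time changes the path for all subsequent packets without touching the classifier, which is the service-aware, dynamic behaviour the post describes.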

October 26, 2015 – How Automatic Scaling and Optimal Pathing Benefits Service Function Chaining

One of the obvious questions relating to Service Function Chaining (SFC) is how does it differ from the traditional model of creating a sequence of functions (FW/IPS, ADC, vWAN, Router), whether in specialized hardware or dedicated VMs, and passing traffic between those functions in a switched data plane? While that model could be considered to be a service chain, it lacks two qualities that bring the maximum possible benefit to a service chain: automatic scaling and optimal pathing through the service chain. A simple high level definition of a service function chain (SFC) is that an SFC is the ordered (though not necessarily static) set of functions needed to provide a given service. While there is much discussion about basing these SFCs on Virtual Network Functions (VNF), ideally all classes of network functions could be accommodated by the SFC framework. Typically the four classes of network functions include:

  1. Traditional fixed function dedicated appliances (e.g. a dedicated Next Gen Firewall appliance)
  2. Augmented COTS hardware (e.g. a COTS server augmented with a PCIe network processor card)
  3. Fixed function COTS (e.g. a vendor provides a COTS box with certain functions pre-installed and qualified)
  4. Elastic COTS (e.g. a COTS server with sufficient resources to host a “scalable” set of VNFs)

(1.) Must be accommodated to utilize existing assets. (2.) Must be accommodated to address performance requirements that exceed what a reasonable COTS platform (or platforms) can provide. (3.) Enables vendors to provide a pre-validated set of functions that is stable and predictable. (4.) Provides a VNF framework that can scale up/down, in/out, and be flexibly located within the network.

In principle, an SFC could be constructed from any mix of the four elements above. The emerging standards encompass the different mixes at a high level. However, there is one common element that applies in all cases: the existence of a centralized controller that has enough built-in intelligence to know how to construct these chains, and that can create, deploy, and scale multiple instances of these chains without manual intervention once the topology has been created. That centralized “controller” is assumed to exist and to enable the benefits in the discussion below. Details of that controller will not be covered, but it is assumed that it can provide the needed services. Secondly, the most compelling benefits are provided by class (4.), and that will be the focus of what follows.

Dynamic Service Function Chains

One way to look at the “dynamic” nature of SFCs composed of elements of class (4.) is to consider two aspects: the first is “scaling” (out/in, up/down) and the second is dynamic paths through the SFC.

“Scaling”

The conventional definition of scale out/in is to add/remove resource instances to accommodate need. Scale up/down implies the modification of a resource to accommodate need. In the context of a service chain this could be extended to include the introduction of specific resource types (e.g. introducing a new function into the service chain during a DDoS attack, as opposed to just adding more elements to absorb the traffic increase).
One of the compelling features of an SDN/NFV-enabled network is the ability to respond to peak demand for services (cell phone service at events such as Mobile World Congress is a frequently used example). To provide this response, more resources need to be pulled in during the demand spike and released back into a service pool once that spike has passed. While this could be done, in principle, by manual intervention, the cost and complexity of that intervention are prohibitive.

What sort of events could trigger scaling in a service chain? For scale out/in, in a fully automated environment, the load on the elements in the service chain is monitored and resources are automatically added (scale out) or removed (scale in) to accommodate the current load. Alternatively, a customer might request a higher quality of service for a specific period of time or event via a customer portal. Here new resources might not be pulled into the service chain, but changes to the parameters of the service chain would result in the dynamically provisioned user being given priority over “best effort” users.

To illustrate a scale up/down scenario, taking the DDoS reference above, consider a virtual CCAP (Converged Cable Access Platform) that might control as many as 6000 DOCSIS downstream channels. It would be uneconomical and logistically complex to instantiate more of these platforms to respond to a DDoS attack. It would be more efficient to redirect attack traffic to devices that could accommodate the attack load effectively. In any of these scenarios the resources could be distributed (limited by interconnection capacity), and with the existence of the centralized controller, scaling can be enabled across locations.

In our next blog post, we’ll look at “dynamic pathing” and at an example of how the capacity of the SFC to (auto) scale and (auto) path provides valuable benefits not realizable with static physical device chains.
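The automated scale out/in behaviour described above can be sketched as a simple threshold rule. This is a minimal illustration, not a description of any particular controller: the function name, the 80%/30% thresholds and the instance limits are assumptions chosen for the example.

```python
def scale_decision(loads, high=0.8, low=0.3, min_instances=1, max_instances=10):
    """Return the desired instance count for one service function.

    loads holds one utilisation value (0.0 to 1.0) per running instance.
    The thresholds and limits are hypothetical, for illustration only.
    """
    instances = len(loads)
    average = sum(loads) / instances
    if average > high and instances < max_instances:
        return instances + 1  # scale out: add an instance
    if average < low and instances > min_instances:
        return instances - 1  # scale in: release an instance to the pool
    return instances          # load is within the target band
```

A controller would call such a function periodically for each element in the chain. Keeping a gap between the `high` and `low` thresholds is a common way to avoid adding and removing instances on every small load change.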

August 25, 2015 – OPNET Modeler (currently, Riverbed Modeler)

Network Simulation

Wikipedia describes network simulation as “a technique where a program models the behavior of a network by calculating the interaction between the different network entities (hosts/routers, data links, packets, etc.) using mathematical formulas.” But beyond the basic description, how can you use network simulation to predict and check your network systems? Network simulation is designed for characterizing, creating and validating communication solutions, computer networks and distributed or parallel systems. It enables prediction of network behavior and network performance. One can create, run and analyze any desired communication scenario. Generally, simulation is the only method that allows continuous testing and debugging of networks composed of hundreds or thousands of communication elements (devices, hosts, routers, switches, servers and so on), since a standard lab won’t do, and field tests are expensive, difficult to operate and non-deterministic.

Simulation Platforms

There is a variety of simulation platforms available, some of them open source and some of them commercial: OPNET, QualNet, OMNET++, NS-3 … Common characteristics of all simulation platforms are:

  • Determinism – the simulation of the same scenario will give exactly the same results after each run (same sequence of events, same phenomena and same bugs)
  • Discrete event simulation (DES) – the running scenario is represented by an ordered sequence of well-defined events. Each event has two components: the scheduled time and the trigger of the event’s handling mechanism. To implement a DES mechanism, a simulation platform manages its own timeline that has nothing in common with actual time; i.e. a half-hour scenario of a complex system composed of dozens of devices, events, packets and so on may take hours or even days to execute.
  • All simulation platforms are PC GUI applications (Windows or/and Linux)
  • Model Library – implemented and ready for use standard models: CISCO routers and switches, Check Point firewalls, hosts, servers, application traffic models (HTTP, FTP, email, …), physical links (cables, optics, wireless – even based on real terrain), protocols (routing – OSPF, BGP, …, data link and MAC (Ethernet, Wi-Fi, LTE, …), transport and application layer protocols)
  • Integrated Development Environment (IDE) – comprehensive ability to implement your own model of any component in simulation. IDE supplies the ability to create a finite-state machine and describe each state’s behaviour using C/C++
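The determinism and discrete event simulation characteristics listed above can be illustrated with a very small event loop. This is a generic sketch, not how OPNET or any other platform is implemented; the `Simulator` class, the `send`/`receive` handlers and the 0.5 second link delay are all assumptions made for the example.

```python
import heapq


class Simulator:
    """Minimal discrete event simulator with its own simulated timeline."""

    def __init__(self):
        self.now = 0.0    # simulated time, unrelated to wall-clock time
        self._queue = []  # heap of (time, sequence, handler, args)
        self._seq = 0     # tie-breaker so equal-time events keep their order
        self.log = []

    def schedule(self, delay, handler, *args):
        """Schedule handler(sim, *args) to run `delay` simulated seconds from now."""
        heapq.heappush(self._queue, (self.now + delay, self._seq, handler, args))
        self._seq += 1

    def run(self):
        """Process events in time order until none remain."""
        while self._queue:
            self.now, _, handler, args = heapq.heappop(self._queue)
            handler(self, *args)


def send(sim, name):
    sim.log.append((sim.now, "send", name))
    sim.schedule(0.5, receive, name)  # hypothetical link delay


def receive(sim, name):
    sim.log.append((sim.now, "recv", name))


def run_scenario():
    sim = Simulator()
    sim.schedule(1.0, send, "pkt-A")
    sim.schedule(1.0, send, "pkt-B")
    sim.run()
    return sim.log
```

Because events are ordered only by simulated time and insertion sequence, calling `run_scenario()` twice produces exactly the same event log, which is the determinism property described in the first bullet.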

Additional characteristics that could be present in a simulation platform:

  • Analyzer Tool – statistics of network behaviour and performance: bandwidth, failures, packet loss, delays, bottlenecks … Network behaviour visualization and performance charts
  • System-in-the-Loop (SITL) – interface that connects operational communication systems and applications (Real World) with simulation of communication systems and applications (Simulated World)
  • Parallel Simulation – capability that allows running the simulation scenario in distributed manner in order to decrease the time the scenario is running – multi-core and/or multi-processor, distributed (multi-computer) simulation

OPNET Modeler

OPNET Modeler, currently known as Riverbed Modeler, is a commercial simulation platform provided by Riverbed Technology. It has a very rich library of standard models, mostly supplied by the vendors themselves. This library allows the composition of almost any existing network on top of the simulation, and the analysis of networks to compare the impact of different scenarios and technologies on end-to-end behavior. OPNET Modeler also has a very friendly IDE for development of your own devices, protocols, network mechanisms and algorithms across the communication stack. As a simulation platform, OPNET Modeler can be used for the following purposes or for any combination of them:

  • Network Planning: planning, inspection, predictive “what if” analysis, and optimization for communication networks; interaction with Network Management for improving real existing networks
  • Development of new components: development of protocols, algorithms and applications based on simulation. Important: it is possible to implement a cross-platform framework that allows real operational source code to be developed and run on top of OPNET simulation. That means that exactly the same implementation runs in real equipment and in simulation: the so-called “One truth” principle (operational source code in simulation).
  • Communication Test Bed and Laboratory Extension – System-in-the-Loop interface allows extending communication test bed with simulated equipment. The real hardware, software application and users can interact with numerous virtual devices within the simulation model, potentially avoiding the need for an expensive test lab.

OPNET Modeler Basics

Modeling

OPNET Modeler is a PC GUI application that allows graphical construction of a communication network in a three-tiered hierarchy:

  1.  Network Model: composed of nodes, links and subnets
  2. Node Model: composed of node building blocks – processors, queues, transceivers and interfaces between them
  3. Process Model: composed of a finite-state machine diagram, blocks of C/C++ code, OPNET kernel procedures

To be continued … watch for follow-on blog

 

June 10, 2015 – Taking IP PBX to the next level with Network Functions Virtualization

The latest telephone technology isn’t limited to fancier smart phones. Today, telecom companies are also advancing their communications and networking technologies to add a wealth of flexible and advanced business calling features. They’re taking advantage of a variety of technologies to allow their customers to tap into new IP telephone services at a small price.

First, let’s take a recent advancement in telephony systems: VoIP (Voice over Internet Protocol). VoIP uses a packet switched network, like the Internet, to pass digitized voice data from one point to another, allowing telecommunications companies to squeeze more conversations into the same amount of bandwidth.

Many companies use a PBX (Private Branch Exchange) telephone network to minimize cost. Instead of having a single telephone line for every office or department, each used for only a fraction of the time, the company can reduce this to a few lines with the use of a PBX while still having a telephone unit in each office. All internal calls are routed internally while calls to the outside take any of the available outside lines. But older PBX systems are not equipped to handle VoIP calls because they were created and perfected before VoIP technology was introduced, whereas the newer systems, called IP PBX (Internet Protocol Private Branch eXchange), support VoIP.

As VoIP gains widespread acceptance, companies and manufacturers have been motivated to develop IP PBX systems in order to have the advantages now available with VoIP. Utilizing VoIP in a PBX system can result in a seamless integration where users can use the same phone to dial outside numbers or call a branch office in another country via VoIP. An advanced PBX system can lessen a company’s phone bill by such a huge margin that most companies that need to replace their older PBX systems have opted to add VoIP support by purchasing and installing an IP PBX system or by using a cloud-based, hosted IP PBX.
In part due to its cost savings, VoIP is becoming the future of PBX systems. But why stop there when you could use NFV (network functions virtualization) to reduce the cost of developing the IP PBX systems? NFV is a network architecture concept that enables the creation of communication services by connecting or chaining multiple network node functions that are all virtualized. IP PBX systems can benefit from virtualization. When deployed in a virtualized CPE device, an IP-PBX provides VoIP service to an enterprise network.

At Northforge, we’re working to design and implement a small footprint IP-PBX VM that can run on a single core of an Intel Atom class processor in a fan-less CPE (Customer Premises Equipment) device. The IP-PBX performance target is 60 concurrent calls, and it needs to run concurrently with two additional VMs (virtual machines) on a virtual switch and hypervisor on the Atom system without degradation of speech quality.

An IP-PBX is typically deployed as a dedicated physical server in an enterprise network, where all resources such as processors, memory, and network access are dedicated. When it is deployed as a virtual machine instance in a virtualized CPE device, it competes for resources with the other virtual machines co-located on the same CPE device. This competition for constrained resources makes it challenging for a virtualized IP-PBX to provide VoIP service with predictable and sustainable performance. Preliminary prototyping has shown that an IP-PBX as a single VM on an Atom class processor suffers from speech quality issues (broken speech) with 60 concurrent calls.
[Figure: IP PBX diagram]

We set up a virtualization lab, developed a virtualized open source IP PBX and conducted experiments with it to characterize its resource usage when there are 60 sustained simultaneous calls on an Atom-class processor, and to determine what performance bottlenecks need to be addressed. While the work is still in progress, we hope to be able to establish a test environment with 60 simultaneous calls in which 120 registered extensions are used. Our target is to have the IP PBX run with 60% CPU consumption in steady state with 60 concurrent call sessions, along with two additional VMs running on the processor and 1 Gbps of ingress and egress traffic including voice traffic, and to maintain a PESQ (Perceptual Evaluation of Speech Quality) score of 4.0 or better while the system is processing 1 Gbps of traffic.

By taking advantage of the newer IP PBX systems, and adding the technology advancements that can be gained from Network Functions Virtualization, Northforge is working to reduce the cost of developing IP PBX systems for its customers, which in turn will bring more services at more affordable costs to the end customer, the consumer.

May 20, 2015 – Are we approaching the era of Dynamic Spectrum Auction?

In a previous blog, we talked about the 5G requirements (latency < 1 msec. and bandwidth > 1 Gbps) and services that can be considered true 5G use cases. We also mentioned that 5G enabled services will require a delay time of less than 1 msec. and that to meet this requirement content servers will have to be less than 1 km from the user’s device. This constraint of 1 km creates the need, in terms of interconnection, for greater co-operation between operators. According to GSMA, ultimately, it would make sense for a single network infrastructure to be implemented and shared between operators. This concept of a single network infrastructure seems to align quite well with the dream of an open market where new entrants could coexist harmoniously with giant operators via LTE virtualization, as LTE virtualization could enable Auction-Based Dynamic Spectrum Sharing (ABDSS).

LTE virtualization is becoming a reality. The LTE network is composed of two main components: the Evolved Radio Access Network (E-RAN) and the Evolved Packet Core (EPC). The E-RAN is composed of a collection of eNodeBs, while the EPC comprises different network elements: MMEs, Serving Gateway (S-GW), PDN Gateway (P-GW), PCRF, and HSS. Figure 1 (taken from ref. 1) shows the basic architecture of the LTE network. According to many equipment vendors, EPC virtualisation solutions are available, but E-RAN virtualisation solutions are not. Because the E-RAN/eNodeB is the entity responsible for accessing the radio channels and scheduling the air interface resources, it has to be virtualized to enable ABDSS.

[Figure 1: basic architecture of the LTE network]

[Figure 2: framework architecture for E-RAN/eNodeB virtualization]

There is much ongoing research on E-RAN and eNodeB virtualization. The framework architecture that seems to generate consensus is depicted in Figure 2. It adds a Hypervisor layer that does not exist in the current non-virtualized E-RAN/eNodeB architecture.
In the non-virtualized E-RAN architecture, bandwidth is preconfigured and the eNodeB MAC scheduler takes care of subdividing it into Physical Resource Blocks (PRBs), which are the smallest units that can be assigned to a user. In a virtualized environment, the Hypervisor layer will take care of managing the overall spectrum and scheduling PRBs between different virtual operators. Is the infrastructure technology nearly ready? Cloud computing platforms like OpenStack, Eucalyptus, OpenNebula, and many others contain most of the functions required for the implementation (Networking, Hypervisor Management, Control and Monitoring of physical/virtual Infrastructure, etc.). The only remaining question is: what is the financial interest for the giant operators in joining this effort? References:

  1. 3GPP TS23.401 V13.2.0
  2. Networking 2011 Workshops: Realizing the Broker Based Dynamic Spectrum Allocation through LTE Virtualization and Uniform Auctioning. Yasir Zaki, Manzoor Ahmed Khan, Liang Zhao
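To make the Hypervisor’s role in this post more concrete, the sketch below splits a pool of PRBs between virtual operators in proportion to their bids. This is a deliberately simplified assumption and not the auction mechanism from the referenced paper: the function name, the proportional rule and the default pool of 100 PRBs (the number in a 20 MHz LTE carrier) are chosen only for illustration.

```python
def allocate_prbs(bids, total_prbs=100):
    """Split total_prbs between operators in proportion to their bids.

    bids maps an operator name to a non-negative bid value. The proportional
    rule is a hypothetical stand-in for a real auction mechanism.
    """
    total_bid = sum(bids.values())
    if total_bid <= 0:
        return {operator: 0 for operator in bids}

    # Ideal (fractional) share for each operator, then round down.
    shares = {op: bid * total_prbs / total_bid for op, bid in bids.items()}
    allocation = {op: int(share) for op, share in shares.items()}

    # Hand out the PRBs lost to rounding, largest fractional part first
    # (ties broken by operator name so the result is reproducible).
    leftover = total_prbs - sum(allocation.values())
    order = sorted(bids, key=lambda op: (-(shares[op] - allocation[op]), op))
    for op in order[:leftover]:
        allocation[op] += 1
    return allocation
```

For example, bids of 5, 3 and 2 from three operators yield 50, 30 and 20 PRBs. A real Hypervisor would repeat such an allocation every scheduling interval and hand each operator’s share to that operator’s own MAC scheduler.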

 

March 2, 2015 – Will 5G Deliver on Expectations?

The next big shift in mobile will come with greater capacity needs and with the requirement to process data much, much faster. While mobile users are still getting familiar with the capabilities of 4G, network operators will be racing to develop fifth-generation network technology in time for 5G, which is expected to be available by 2020. At the turn of the decade, 5G is expected to have 1,000 times the capacity of 4G. This will open up many possibilities for new services, not only for mobile phones but for wearable devices and new Internet-connected hardware that will be developed in the Internet of Things era. Today with 4G, streaming games in real time is really limited on mobile phones, but it will be a given requirement once 5G is in place and for the devices and appliances now being dreamed up by Internet of Things designers.

In anticipation of 5G, GSMA published a whitepaper on 5G last December to clarify what 5G really is and what it means in the technological sense. GSMA pointed out two points of view about 5G that seem to result in contradictory requirements. One view indicates that 5G will deliver a substantial reduction in end-to-end latency and a tremendous increase in data speed. The other view sees 5G as a consolidation of 2G, 3G, 4G, Wi-Fi and other innovations providing greater coverage and always-on reliability. To address this confusion, the GSMA whitepaper tries to provide answers to three questions:

  1. What is (and what isn’t) 5G?
  2. What are the real 5G use cases?
  3. What are the real implications of 5G for mobile operators?

From the GSMA perspective, a new technology should address requirements that the previous technology was unable to satisfy. For 5G, which weaknesses of 4G need to be solved is still a question mark. This concept is summarized in the table below.

[Image: 5G expectations table]

The two visions of 5G (Hyper-Connectivity and Next Generation Radio Access) that are driving the work initiatives on 5G have identified eight requirements:

  1. 10Gbps connections to end points in the field (i.e. not theoretical maximum)
  2. 1 millisecond end-to-end round trip delay (latency)
  3. 1000x bandwidth per unit area
  4. 10-100x number of connected devices
  5. (Perception of) 99.999% availability
  6. (Perception of) 100% coverage
  7. 90% reduction in network energy usage
  8. Up to ten-year battery life for low power, machine-type devices

GSMA does not see requirements 3 to 8 as technical requirements but rather as aspirational statements from network operators on how networks should be built; i.e. cheap and reliable. GSMA only sees requirements 1 (>1Gbps DL speed) and 2 (<1ms latency) as measurable network deliverables. The table below illustrates the use cases.

[Image: 5G use cases table]

Note that to achieve downlink speeds greater than 1Gbps, new RAN technology will be required. For latency, services that require a delay of less than 1msec must have their server content close to the user device.

[Image: architecture with content servers close to the user device]

The architecture depicted above will require close collaboration between network operators (one network approach). Perhaps there will be no need for frequency auctions.

When should we expect to see the first 5G demo? SK Telecom announced last July that it had signed an agreement with Ericsson to develop 5G technology in time to demonstrate a network at the 2018 Winter Olympics in Pyeongchang. But will 5G address the weaknesses we are now experiencing with 4G, such as the flawed handoff between Wi-Fi and cellular? Will we still have latency issues and non-seamless handover between carriers? There’s time between now and 2020 to address these issues, but it will be interesting to see how 5G solves the issues of 4G, whether it lives up to its real potential, and, perhaps more interesting, how the ways consumers use the technology will change.
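The link between the 1 msec latency requirement and server placement can be checked with simple arithmetic. The sketch below is a rough estimate under assumptions that are not in the GSMA paper: signals travel at about 200,000 km/s in optical fiber (roughly two thirds of the speed of light), and the `processing_s` values used in the examples are purely hypothetical.

```python
FIBER_SPEED_KM_PER_S = 200_000  # assumption: about 2/3 of the speed of light


def max_server_distance_km(rtt_budget_s, processing_s=0.0,
                           speed_km_per_s=FIBER_SPEED_KM_PER_S):
    """Upper bound on the one-way distance to a server for a round-trip budget.

    processing_s is the part of the budget consumed by everything other than
    propagation (radio access, queuing, server processing).
    """
    propagation_budget_s = rtt_budget_s - processing_s
    if propagation_budget_s <= 0:
        return 0.0
    # The signal covers the distance twice (there and back).
    return speed_km_per_s * propagation_budget_s / 2
```

With the whole 1 msec spent on propagation, the bound is about 100 km. If, hypothetically, 0.99 msec is consumed by radio access and processing, only about 1 km remains, which shows why the practical distance is far shorter than the propagation-only bound.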

Ref: https://gsmaintelligence.com/files/analysis/?file=141208-5g.pdf

January 28, 2015 – Implementing hardware accelerated IP route cache in Linux using Broadcom StrataXGS multilayer switch

By using cost-effective switching solutions, such as those from Broadcom, equipment manufacturers can reduce the cost of their solution and meet performance targets.  An L3 switch can be implemented using hardware L2 packet switching and CPU based IP forwarding. L2 data packets don’t need to be modified, therefore the L2 throughput can achieve 1Gbps wire speeds. The Linux IP stack has a robust and efficient implementation. Along with multiple routing protocols suites, both open source and proprietary, Linux is a common choice for implementing a software based router. Although, the Linux IP forwarding code is very efficient, network equipment vendors tend to use underpowered CPUs to drive the cost down. Some of the Broadcom switches have an integrated CPU in them to reduce the cost of a product even further. Moreover the same CPU can be used for management, network control and L3 forwarding simultaneously. This approach introduces performance issues during high traffic loads. For example, when the device forwards FTP traffic destined to another network element, device management suffers delays. A network administrator might even get a time out while configuring the device under heavy traffic. Under more severe circumstances, the network control packets (like OSPF hello packets) are dropped or delayed until the routing protocol considers the link to be unreliable. If this ever happens the routing software declares the link unavailable and notifies all routers on the network about a topology change.  After the topology change the FTP traffic is redirected to avoid the unreliable link. As the packet flow over that link decreases, the routing protocol restores the link. This link flapping usually continues until the FTP transfer is completed. 
To solve the performance problem of a software based router, and in order to avoid starvation of management and control protocols, IP forwarding can be delegated to the hardware, thus freeing the CPU to handle more urgent packets. One such solution, the Broadcom StrataXGS line of products, is a multilayer switch that can be used as an enterprise solution with full line speed L2 switching and line speed L3 forwarding for up to 2K most recently used destinations. The StrataXGS switch is usually equipped with a limited amount of memory for the L3 table. This L3 table is an exact match table and its content needs to be maintained by software running on the CPU. At startup, the L3 table is empty and all IP packets are trapped (sent) to the Linux IP stack. Since the IP stack maintains the full routing table learned from the routing software running in user space, Linux is able to make IP forwarding decisions. As part of the destination IP address lookup, a routing cache entry is created by the Linux implementation. Using this routing cache, all following destination lookups will be much faster. When the routing entry is actually used for forwarding, a hardware L3 entry is added asynchronously, if needed. When the L3 entry is eventually added to the hardware, the Broadcom switch will stop trapping L3 packets to the CPU and will forward them at wire speed. An independent kernel activity, running in the background, periodically scans the L3 table. Any L3 entries not used for a specified time are removed to free up precious memory. When a routing cache entry is invalidated as a result of the link going down or being removed by the routing software, the L3 entry should also be deleted from the hardware. The removal of L3 hardware entries should not rely on the aging method described above. Consider the following scenario: the routing software finds a better route towards some destination. 
It redirects the route to another link, the routing cache entry is deleted, but the L3 entry is not. The existing traffic to the destination is forwarded in hardware towards the previous next hop.  Since the traffic is not stopped, the L3 entry is considered in use and never deleted. Therefore, the routing cache removal routine should also delete the appropriate L3 hardware entry. Another pitfall of this type of implementation is that once the traffic is forwarded in hardware, the software routing cache is not used and its entries will age gradually, causing routing cache flushing periodically. To resolve the problem, the aging activity that scans the L3 table, should synchronize the usage flag between the hardware and software caches. The suggested solution is not a replacement for proper traffic management, since when the L3 table is not populated yet or if it’s already full, IP packets will be sent to the CPU for IP forwarding decision. IP packets trapped to the CPU to be forwarded should have the least priority and should also be rate limited by the Broadcom hardware before sending them to the CPU. This simple implementation allows equipment manufacturers to reduce the cost of their solution while providing added value for the enterprise or small office clients by adding L3 functionality accelerated in hardware.
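The interplay between the software routing cache and the hardware L3 table described in this post can be modelled in a few lines of Python. This is a toy model under stated assumptions, not Broadcom SDK or Linux kernel code: the class names, the `hit` and `idle` fields and the scan-count aging rule are all hypothetical, and the hardware entry is added synchronously here for simplicity, whereas the post describes it being added asynchronously.

```python
class HardwareL3Table:
    """Toy model of a small exact-match L3 table (not the Broadcom SDK)."""

    def __init__(self, capacity=2048):
        self.capacity = capacity
        self.entries = {}  # dest_ip -> {"next_hop", "hit", "idle"}

    def forward(self, dest_ip):
        """Fast path: return the next hop, or None to trap the packet to the CPU."""
        entry = self.entries.get(dest_ip)
        if entry is None:
            return None
        entry["hit"] = True  # usage flag read later by the aging scan
        return entry["next_hop"]


class RouteCache:
    """Software routing cache that keeps the hardware table in step."""

    def __init__(self, hw, max_idle_scans=2):
        self.hw = hw
        self.max_idle_scans = max_idle_scans
        self.sw = {}  # dest_ip -> {"next_hop", "used"}

    def software_forward(self, dest_ip, next_hop):
        """Slow path for trapped packets: cache the route and offload it."""
        self.sw[dest_ip] = {"next_hop": next_hop, "used": True}
        if len(self.hw.entries) < self.hw.capacity:
            self.hw.entries[dest_ip] = {"next_hop": next_hop, "hit": False, "idle": 0}
        return next_hop

    def invalidate(self, dest_ip):
        """Route changed or link down: remove both entries, do not wait for aging."""
        self.sw.pop(dest_ip, None)
        self.hw.entries.pop(dest_ip, None)

    def aging_scan(self):
        """Periodic scan: age out idle entries and sync usage to the software cache."""
        for dest_ip in list(self.hw.entries):
            entry = self.hw.entries[dest_ip]
            if entry["hit"]:
                entry["hit"] = False
                entry["idle"] = 0
                if dest_ip in self.sw:
                    self.sw[dest_ip]["used"] = True  # keep the software entry alive
            else:
                entry["idle"] += 1
                if entry["idle"] >= self.max_idle_scans:
                    del self.hw.entries[dest_ip]
```

The two pitfalls from the text map onto `invalidate`, which deletes the hardware entry immediately instead of relying on aging, and onto the usage synchronization inside `aging_scan`, which stops the software cache from expiring routes that are only being used in hardware.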

January 7, 2015 – Migration of existing applications to Cavium OCTEON using its SDK/ADK and/or Linux Kernel for acceleration

If you are developing a processor intensive network application, such as encryption/decryption, Virtual Private Networks, intrusion detection and prevention, or Network Attached Storage appliances, you’ll need a highly-integrated security and network application processor chip that delivers tremendous network throughput, such as the Cavium OCTEON® processor. Designed for increasing network application performance for enterprises, data centers and service provider infrastructures, the Cavium OCTEON family of multi-core MIPS64 processors is a scalable, high-performing solution for intelligent networking applications ranging from 100Mbps to 100Gbps. This family of processors is targeted at intelligent networking, wireless, control plane, and storage applications, and is being used in a variety of OEM networking and storage equipment, including routers, switches, unified threat management (UTM) appliances, content-aware switches, application-aware gateways, triple-play gateways, WLAN and 3G/4G access and aggregation devices, storage arrays, storage networking equipment, servers, and intelligent NICs. OCTEON offers two options for application development – use Cavium’s ported Linux or Cavium’s Simple Executive for core OCTEON data plane applications. If you don’t need the full throughput of Cavium’s OCTEON, then using the first option is a good approach since it gives you the benefit of being able to quickly port and use any application written for a standard Linux distribution. With this approach your standard application that’s been developed for a standard Linux distribution will pretty much run with minimum changes on OCTEON, while you still get to take advantage of many components of the OCTEON hardware. It’s a faster development time since you only need bare minimum changes to migrate to OCTEON and you’re still able to use open source software written for Linux. 
This is often done for control plane applications on OCTEON since these applications do not need to run at line rates. With the second option, writing core OCTEON data plane applications, you don’t have a full OS (Linux). OCTEON provides a bare minimum boot loader (a port of the open source U-Boot) and then your application deals directly with the hardware. Here you can get the best of OCTEON’s throughput and avoid the overheads (scheduling, context switches, etc.) introduced by an operating system. However, you have to write applications from scratch using the OCTEON SDK and you cannot quickly port code written for standard Linux. But with this approach, you can fully utilize the capabilities of the OCTEON chip. For example, if you were creating a large real world application, you’d select this option for most of your data plane processing, using most of the cores of an OCTEON chip and reserving one or two cores for other functions. The reserved cores might be running the first option (explained above, i.e. Linux) for the control plane and for sending accumulated data stats to the external world. Many view the OS kernel as part of the performance solution, but it can instead be part of the problem. By running software without any OS between itself and the hardware, we have the liberty to take packet handling, memory management etc. out of the hands of the kernel and put them into the application, where they can be done efficiently without wasting cycles on, say, context switching. We program it ourselves on bare hardware, and we can do it fast enough for the performance requirements that are needed. We let Linux handle the control plane and let the application handle the data plane. Cavium’s OCTEON gives the ability to do this when needed, and it even adds specialized hardware that runs in true parallel fashion (i.e. there is no OS process scheduling interfering with your code and slowing it down). 
OCTEON can do true multicore processing by sending different flows (for example, the traffic between one user session and a website) to different cores. Because this distribution is done in hardware, it is far faster than doing it in software with locking, as in, say, a Linux kernel. Usually the processing bottleneck is not the processor itself but the I/O, scheduling and other aspects that OCTEON greatly speeds up with specialized hardware. At Northforge we’ve developed solutions for customers that take advantage of both the control and data planes in many OCTEON-based applications, whether you need a gigabit WiFi access point, a firewall for a large institution or other network applications. We can help you take advantage of a scalable processor line that offers high performance and integration for many of your advanced security, networking and application acceleration needs.
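
To make the per-flow distribution described above more concrete, here is a minimal software sketch of the idea. It is only an illustration: OCTEON does this in hardware, and nothing below is part of the OCTEON SDK. The function names, the CRC32 hash and the core count of 8 are assumptions chosen for the example.

```python
import zlib

NUM_CORES = 8  # hypothetical core count, not tied to any specific OCTEON model


def flow_key(src_ip, src_port, dst_ip, dst_port, proto):
    """Build a direction-independent key for a flow.

    The two endpoints are sorted so that packets travelling in either
    direction of the same session produce the same key.
    """
    low, high = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return f"{low[0]}:{low[1]}|{high[0]}:{high[1]}|{proto}".encode()


def core_for_packet(src_ip, src_port, dst_ip, dst_port, proto, num_cores=NUM_CORES):
    """Pick a core index by hashing the flow key.

    Every packet of a flow lands on the same core, so per-flow state
    never has to be shared (or locked) between cores.
    """
    return zlib.crc32(flow_key(src_ip, src_port, dst_ip, dst_port, proto)) % num_cores


if __name__ == "__main__":
    print(core_for_packet("10.0.0.5", 40000, "93.184.216.34", 443, "tcp"))
```

The design point is that the mapping is a pure function of the packet headers, which is what allows it to be computed without any lock or shared table.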

November 10, 2014 – Northforge Increases Performance of Mission Critical Device Powered by Cavium OCTEON® Processor

Making devices perform faster is always a challenge, but it is a challenge that drives us to develop the very best solutions. Recently Northforge was tasked with increasing the performance of a mission critical Cavium OCTEON®-based device by a factor of 3-4X, certainly a tall order. To realize that magnitude of improvement, we calculated that the device needed to take advantage of new hardware acceleration libraries, but these typically operate in an environment that requires more memory than was available. The device has a small memory footprint design, with all software and the OS residing in 256MB of flash memory, so any changes we made had to fit into the available memory. We achieved our new, more efficient design by converting the existing applications to use the Cavium OCTEON Application Development Kit (ADK), which lets them take greater advantage of the hardware acceleration features embedded in the OCTEON. But no challenge is complete without some technological obstacles. To get the faster performance we had to migrate the device software to the Cavium ADK so that we could use the latest acceleration libraries. Our challenge was that the standard ADK acceleration libraries only come with a full version of glibc, a C standard library that requires more memory than was available in the flash memory of the customer’s device. To solve this problem, Northforge modified the build of the standard Cavium ADK to use a much smaller C library, uClibc, instead of glibc, thus reducing the memory requirements while gaining access to the acceleration libraries. By integrating uClibc into the Cavium ADK we completely replaced glibc, and because all device applications could continue to use the same APIs, we didn’t have to modify any of the other device software.
Our new design, along with the complete device system load, still fits within the 256MB flash storage limit of the device. Northforge also developed drivers to bypass some kernel network processing steps and used the hardware acceleration in the Cavium OCTEON to gain even better performance. The result: overall performance improved by 7 times, and a very happy customer!

Our Cavium Expertise

Working in partnership with your engineers, Northforge can ease the integration of Cavium products to complete your solution quickly and get your products to market faster. For some examples of how our expertise helped companies to increase performance and capacity, click here.

September 25, 2014 – What Not to Bring into the Cloud

As enterprises and service providers look to offer on-demand computing resources and need to manage large networks of virtual machines, many companies are looking to the OpenStack cloud computing platform as a way to gain the flexibility they need to design their cloud. OpenStack provides that flexibility without requiring proprietary hardware or software, and it can integrate with legacy systems and third-party technologies. OpenStack is a free and open-source cloud computing software platform that’s primarily deployed as an IaaS (infrastructure as a service) solution to control processing, storage and networking resources throughout a data center. It can be used for processing big data with tools like Hadoop, for scaling computing to meet demand, and for high-performance computing (HPC) environments that handle diverse and intensive workloads.

We recently deployed a software system for a client on OpenStack to study the feasibility of running it there. That system was a stack of six instances and four networks described by an OpenStack Heat template. It sounded easy enough. But here is the twist: since a non-virtualized version of that system would still exist, it was decided that we would deploy an unmodified version of the system to OpenStack to reduce the number of differences between the two types of deployment. In the end, a few things felt just like a legacy system that you try to bend to fit a new environment. The system is being actively developed, but many important design choices had been made long before people started to be concerned about virtualization and clouds. Here are a few points to illustrate that.

The Uncommon Linux Distribution

In our case, the guest OS had to be an uncommon Linux distribution over which we had no control, and it wasn’t cloud-ready like most mainstream Linux distributions.
Because of that, our guest OSes had to run without Virtio’s paravirtualization drivers (paravirtualization is a virtualization technique that presents to virtual machines a software interface that is similar, but not identical, to that of the underlying hardware). Theoretically, this is not a problem because OpenStack offers what you need to live without paravirtualization. For example, with some effort, you can live without Virtio disks by using IDE disks. But judging by the number of issues we had to deal with because we were avoiding Virtio, this is not a well-traveled road, and it took more effort. We had to visit OpenStack’s issue tracker to find workarounds.

Another issue with our guest OS was its serial console. The serial console is what OpenStack offers for logging in to your instance without depending on network connectivity to that instance. The serial console of our guest wasn’t exactly what OpenStack expected, so it took some more effort to get it working, whereas it worked at the first attempt with mainstream Linux distributions.

The Forbidden DHCP Server

One of the instances of that system provided a DHCP server, which wasn’t working at first. After some traffic capture and analysis, we quickly saw that OpenStack was dropping DHCP responses. This was by design, since such responses could interfere with some of the fundamental gears of OpenStack. A quick Internet search shows that there is one workaround. You can change a kernel setting to disable that packet filtering, but it has a few side effects: it disables all of OpenStack’s security group filtering and all Linux bridge filtering on the host. Also, nobody can say which other side effects will appear in the mid to long term. It wasn’t a perfect workaround, but it let us run on an unpatched OpenStack to further explore the feasibility of the project.

The Link Aggregation Emulation

One of the things needed for the system to work was link aggregation.
Without it, some configuration logic within the instances would get confused. It is more accurate to say we needed link aggregation emulation, since all virtual aggregated links will most likely end up on the same physical cable anyway, cancelling any advantage provided by link aggregation. There is no reason to design for that in the cloud, and it made a few tools unhappy when we tried to create 2 NICs on the same network for the same instance. In our case “nova boot” and “heat stack-create” refused to do it while “nova interface-attach” was fine.

Post Heat Deployment Steps

The deployment was encoded in a Heat template. At the end we were left with two things that could not be deployed from within the Heat template: an IDE volume (not a Virtio one, as explained earlier) and the ports for the link aggregation (because “heat stack-create” didn’t support them). Although these were pretty simple steps for anybody comfortable with the OpenStack client apps (nova, neutron, cinder …), they added more detail to a deployment procedure that was already lengthy enough.

Therefore, when you move software systems to the cloud, bringing over infrastructure details whose designs predate virtualization introduces another source of risk. Whether you can afford to break the dependencies on these infrastructure details is, of course, a topic for another discussion.

August 6, 2014 –  Challenges that need to be overcome before NFV can be commercialized

In this final post of our three-part series on Network Functions Virtualization, commonly known as NFV, let’s look at what’s holding us back from commercializing NFV. We know that as carrier revenue growth flattens, the cost to grow network footprint and capacity will become unmanageable. In fact, we are already seeing that adding subscribers, as businesses and consumers shift their subscriptions from one company to the next, is driving marketing costs higher. In the face of saturation, carriers need to take a good look at their network operations to drive costs down and remain profitable.

Over the next five years, carriers will be controlling their CapEx and OpEx costs as they increasingly shift their spending from expensive networking hardware to software-based solutions. By 2018, software-based networking solutions will have taken a significant share of the traditional carrier networking hardware market, which was about $150 billion in 2013. NFV will be a highly disruptive factor for many legacy service provider equipment suppliers, carriers and enterprises. Carriers see the need for increased network flexibility and application awareness as key to remaining competitive. Further, they must drastically reduce the time to market for new revenue generating features by breaking the hold that equipment suppliers have over them with proprietary designs that are expensive and slow to innovate.

But what needs to be overcome before NFV can be commercialized? Here are several considerations:

  • Portability / Interoperability – A unified interface needs to be defined that decouples software instances from the underlying hardware, as represented by virtual machines and their hypervisors. This has to work with all sorts of VMs, e.g. from VMware or VirtualBox to a kernel-based one from another provider.
  • Performance trade-off – Some decrease in performance has to be taken into account; it is inevitable when moving from specialized hardware (e.g. acceleration engines) designed for one specific purpose to more generalized industry standard hardware. The key is to minimize the impact by, say, using a hypervisor that can scale or parallelize the work across more of the available hardware resources and take advantage of numbers.
  • Migration and coexistence of legacy platforms – NFV would need to coexist with operators’ legacy equipment and be compatible with their existing functions for a gradual and seamless takeover. Legacy systems can’t converge until NFV provides a solid migration path.
  • Automation – Scalability of NFV is strictly tied to automation. Anything less would produce the same sort of issues that operators already face when scaling physical infrastructures. This should cover a consistent orchestration architecture to control the entire VNF with software.
  • Security and stability – More software generally translates into “more prone to security attacks”, so operators need to be assured that this migration does not come at the cost of security and thereby the reliability of their services. The same assurance needs to extend to running a variety of heterogeneous appliances from different hardware vendors. Tightly integrated security software is needed to cover that ground.
  • Simplicity and integration – Seamlessly integrating multiple virtual appliances from different vendors with servers and hypervisors, all of which need to be controlled via standard software interfaces, is a daunting task. On top of this, the user interface of the software must remain simple enough to chain with other appliances to provide another NFV service.

A proof of concept needs to address most of the above issues to be able to make it into any production environment. NFV is a key priority for carriers, and not only are tier 1 carriers supportive of this software-based approach, they are leading efforts to standardize and implement solutions over the upcoming years.
NFV is in several carrier trials and many vendors have committed to key NFV focused roadmaps. But for these initial adoption efforts to come to fruition, these challenges need to be addressed in the trials being implemented this year and in the production deployments over the next few years.

July 2, 2014 – How is the industry going to implement Network Functions Virtualization?

Over the next several years, carriers and enterprise organizations will need to implement Network Functions Virtualization (NFV) in order to curb network CapEx/OpEx spending and to be able to jump into markets with new services more quickly. By virtualizing network services, IT managers will be able to deliver the application performance once only attained with high-cost proprietary hardware and do so at a lower cost.

Setting the new standard

To assure that the industry abides by standard requirements as well as design approaches, an industry specification group of the European Telecommunications Standards Institute (ETSI) has produced several specifications on use cases, architecture and design of virtualizing network functions. The group represents more than 150 companies including 28 service providers such as AT&T, BT, China Mobile, Dell, HP, NEC, Verizon, and Orange. One of the key points resulting from the efforts of the ETSI working group is that NFV can be adopted incrementally by doing one step at a time rather than complete revamp of an entire service provider system. With this approach, operators can use NFV to minimize risks to their existing business and at the same time let them migrate and gain advantages of this advanced technology.

NFV Use Cases

The virtualization we have today uses virtual machines (VMs) running on hypervisors, which have already separated Network Functions (NFs) from their dependency on hardware. This results in the physical hardware being shared by multiple VNFs in the form of virtual machines. If this approach is taken one step further, to a bigger pooling of hardware resources by VNFs, it could achieve the objective laid out for NFV. Example scenarios that can be considered as a first step for mobile networks are EPC (Evolved Packet Core) and IMS (IP Multimedia Subsystem) Network Functions. Some of the prospective candidates for virtualization could be the MME (Mobility Management Entity), the S-GW and P-GW (Serving Gateway and Packet Data Network Gateway), as well as base stations, all of which use different wireless standards. The EPC and IMS NFs can be consolidated on the same hardware resources. BS (base station) functions, e.g. the PHY/MAC/network stacks that handle different wireless standards (2G/3G, LTE/WiMAX), can share hardware in a centralized environment using dynamic resource allocation. Another target area might be Content Delivery Networks, which use servers to cache data for various types of applications. Dedicated cache hardware isn’t needed on a per operator, per provider basis. Instead, these could be consolidated using virtualized caches, resulting in better resource utilization.

How to achieve NFV goal and Service Chaining

A service provider who follows the NFV design will implement one or more virtualized network functions or VNFs. A VNF by itself does not automatically provide a usable product or service to provider’s customers. Rather it provides a software interface to utilizing the hardware’s capability without going into the details of specific hardware. To build more complex services, the notion of service chaining is used, where multiple VNFs are used in combination to deliver a service. A core concept here is that we should isolate a type of network function that could be standardized for usage on generic hardware. Then all variations are controlled via software. Several such VNFs could be combined to produce a specific Network Service and depending upon the needs of each, it could utilize a different set of VNFs to achieve its goal. Since coming up with a new service often means switching control only in software, the process is faster, more scalable and reliable compared to standard networking services that we use today.
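
The service chaining idea above can be sketched in a few lines. This is only a toy model under stated assumptions: packets are plain dictionaries, each VNF is an ordinary function, and the names `firewall`, `nat` and `build_service` (as well as the blocked port and the translated address) are invented for the illustration rather than taken from any NFV framework.

```python
from typing import Callable, Optional

Packet = dict
VNF = Callable[[Packet], Optional[Packet]]  # a VNF returns None to drop a packet


def firewall(packet: Packet) -> Optional[Packet]:
    """Hypothetical firewall VNF: drop Telnet traffic, pass everything else."""
    if packet["dst_port"] == 23:
        return None
    return packet


def nat(packet: Packet) -> Optional[Packet]:
    """Hypothetical NAT VNF: rewrite the source address to a public one."""
    translated = dict(packet)  # copy so the caller's packet is not mutated
    translated["src_ip"] = "203.0.113.1"
    return translated


def build_service(*vnfs: VNF) -> VNF:
    """Chain VNFs, in order, into one network service."""
    def service(packet: Packet) -> Optional[Packet]:
        for vnf in vnfs:
            packet = vnf(packet)
            if packet is None:  # an earlier VNF dropped the packet
                return None
        return packet
    return service


if __name__ == "__main__":
    internet_access = build_service(firewall, nat)
    print(internet_access({"src_ip": "10.0.0.5", "dst_port": 80}))
```

Defining a new service here means passing a different list of functions to `build_service`, which mirrors the point that variations are controlled in software rather than by rewiring hardware.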

Using existing technologies to build NFV

At its core, cloud computing is about hardware virtualization. Hypervisors, as well as the use of a vSwitch (virtual Ethernet switch) to connect traffic between virtual machines and physical interfaces, are some of the technologies existing today that can be used and extended to build NFV upon. Similarly, since NFV makes decisions in software, technologies such as Deep Packet Inspection (which exists today entirely in software) can be used to gain more insight into the network and help with decision making, reducing reliance on hardware. NFV won’t be accomplished overnight, but with ETSI standards in place, existing technologies and a one-step-at-a-time approach, carriers and enterprises can benefit from NFV today and start to build an NFV system that will reduce costs and allow them to venture into new opportunities in the years to come.

June 9, 2014 – What is NFV and why is it significant to the networking industry?

With the increasing number of smart phones and mobile devices, more businesses and enterprises are depending on improved network communications. Consumers are demanding more information and want everyday lifestyle decisions to be handled online. Carriers face the dilemma of managing unsustainable network traffic growth. Network traffic and the corresponding cost of maintaining networks are already outpacing revenue growth, and predictions forecast that it will grow increasingly worse over the next decade. To sustain profitability, carriers are looking to network operations to drive long-term network cost reductions and accelerate the delivery of new revenue generating services. Carriers are particularly eager to drive CapEx down and take advantage of affordable higher performance computing to achieve that goal. By using standard servers, they not only reduce the equipment cost but, more importantly, reduce size and power requirements. With Network Functions Virtualization, or NFV, they can realize cost and network flexibility benefits by virtualizing many common wide area network functions. Although it will take years to implement, NFV is expected to affect network spending and reshape the networking vendor landscape. NFV virtualizes network services via software, giving carriers in software the application performance that historically was only found in high-cost proprietary hardware. NFV replaces dedicated network hardware with virtualization software running on commodity servers: it decouples network functions from proprietary hardware appliances and implements them in software. Services provided by a network operator, such as DNS, firewalls, NAT, etc., are traditionally delivered through proprietary hardware that is physically inserted into the network. The concept of NFV suggests bringing in virtual machines (VMs) running on industry grade server hardware to perform all of these functions in software.
This also means that the actual processing, for say a firewall, may or may not be done on the node physically attached to the network but rather anywhere on a VM. NFV foresees a fully virtualized infrastructure which should be delivered using networking components that consolidate all differential features in software and only using standardized hardware so it can be scaled up or down without any physical changes.

The Motivation for Taking the NFV Route

Network operators are motivated to adopt NFV in their networks by the cost of purchasing and maintaining proprietary hardware. Most operators already have a variety of proprietary hardware, but introducing a new network service often requires additional hardware, plus more space and power to accommodate these boxes. That approach is costly, time intensive, error prone and not scalable. Operators therefore try to allocate the maximum bandwidth that might be needed, but this is often a waste of physical resources that can’t be used in other scenarios. Moreover, recovery from a single point of failure requires physical intervention and often unacceptable downtime. When the hardware reaches end of life, it requires a complete cycle of procurement, design, integration and deployment.

Benefitting from NFV

NFV intends to solve several of these problems by leveraging standard IT virtualization techniques on consolidated, generic high volume servers, switches and storage. These could be placed in data centers, on network nodes or in the end user premises. The immediate benefits are:

  • Orchestration – The network creates, monitors and repairs VNF instances in software. We can already compare remotely setting up a server on the Amazon cloud with having to host it locally, set it up and expand its capabilities. NFV would take that a step further.
  • Reduction of onsite visits – NFV can help reduce truck rolls to customer sites, which is a huge cost saving for operators.
  • Cost effectiveness for consumers – NFV would bill customers only for the services provided (time, number of users, bandwidth) instead of charging the whole setup cost.
  • Efficient resource utilization – Any unutilized resources and computing power could be consumed by other services run remotely in software.
  • Cost effectiveness of hardware needs – The bulk of the unique functionality for a particular network service would be in software, leading to more homogeneous hardware needs that could be produced in bulk and would thereby be less expensive.
  • Lower operating costs – Improved network management efficiency would be a huge leap, i.e. being able to create, modify, experiment with and shut down a network in software can result in dramatic improvements in downtime.
  • Flexibility and agility – Scaling up or down of services is just a matter of configuration to quickly address changing demands as well as providing room for experimentation.

With NFV, carriers’ long-term goals of driving down total CapEx spending and significantly reducing networking equipment costs can be achieved. This shift from proprietary hardware-based implementations to software can lower the barriers to entry for new equipment suppliers, drive down pricing and open up new revenue sources for many non-networking suppliers.

May 12, 2014 – What is OpenID and why is it important?

In today’s web environment the authentication of users is a very frequent task. Users are asked many times to enter repetitive information in order to create yet another account. But their digital identity on ad hoc systems might be only protected by a very thin security layer that is prone to online attacks and intrusions. Plus, for each website where a user creates an account, they need to remember their distinct credentials and ensure that their credentials are strong enough and not prone to brute force network attacks. To address these concerns, the OpenID standard was created to allow users to use an existing account to sign into multiple websites, without needing to create or remember new passwords. With OpenID, you provide your password to your third-party identity provider and that OpenID provider confirms your identity to the websites that you sign into. After you create an account with your preferred OpenID provider you use it for signing on to any website which accepts OpenID authentication. By authenticating users through third-party services, this open standard eliminates the need for webmasters to provide their own systems to handle the authentication and allows users to consolidate their digital identities.

OpenID vs. OAuth

While there are some similarities between the two standards, OpenID’s role is strictly limited to being an authentication provider. As such, it simply confirms that you are who you say you are, and its responsibility stops there. OAuth, on the other hand, functions as an authorization provider: it gives a consumer application controlled access to third-party resources that are not part of that application, without handing it your user name and password. As an example, an OpenID authentication token from Google only tells a web application that the user is who the user claims to be, whereas a Google API protected by OAuth will give a third party limited access to, say, your Google Calendar, provided that you grant access to it explicitly. Other OpenID providers include Yahoo, Blogger and Flickr, just to name a few. OpenID facilitates username and password management while making it possible to ensure that users manage their logins with a trusted authority. Let’s look at an example which shows how such an authentication method can be applied to an element management system.

OpenID for Authentication: An Example

Element management (or FCAPS) applications deliver vital and high volume data at high speed to Operation Support Systems (OSS) to help network operators control cost, increase revenue and provide exceptional end customer experience. Fault, capacity, operational measurement and billing data of a service providing node is aggregated by the dedicated element management application. The FCAPS data is then stored into files (or other formats) and forwarded to the OSS. The data exchange usually depends on simple authentication with password or sometimes no authentication. File transfer protocols require password tokens to authenticate the FCAPS applications with the target OSS system. A secure file transfer protocol ensures data is encrypted during transfer. However, the passwords stored in the element manager disk are vulnerable to theft. At the element management level, there is often no centralized design mechanism for storing the OSS passwords. Such passwords are only entered once into the element manager during configuration time. The passwords are stored based upon individual security requirements. The password could be encrypted while it is stored on disk. However, the element manager disk could be replaced or hot swapped out of the system. In such a scenario, the encrypted password could be deciphered with sufficient time and the OSS is then vulnerable to attack or unauthorized access. Alternatively, we can use a public/private key based authentication, however provisioning a key-based mechanism on an element manager is cumbersome. Moreover, losing the private key creates the same vulnerability as the previous scenario. Using OpenID as an ID management protocol can eliminate the need for storing passwords of OSS systems on each element manager node in the network. The trust between the OSS and the element manager is established via an independent centralized security server. 
The operator security server is trusted by the OSS systems and the security server is usually security hardened against attacks. The key advantages of implementing the OpenID protocol at the node level are:

  1. Reduce unwanted password exposure in the operator network running FCAPS applications.
  2. Eliminate the chance of defective software design for handling passwords. Not all programmers and testers have the same level of awareness of how to handle OSS passwords. With OpenID, OSS passwords no longer need to be stored on the element manager disk.
  3. Establish trust between the network level security server and the element manager independently without being tied to a particular transfer protocol and a type of application.
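
The trust relationship behind these three points can be illustrated with a deliberately simplified sketch. It is not the OpenID 2.0 wire protocol: an HMAC-signed assertion stands in for the provider's response, the classes `SecurityServer` and `OSS` and all method names are hypothetical, and node registration with the security server is assumed to happen out of band. A real deployment would also need nonces and expiry on assertions.

```python
import hashlib
import hmac
import secrets


class SecurityServer:
    """Hypothetical stand-in for the operator's central identity provider."""

    def __init__(self):
        self._signing_key = secrets.token_bytes(32)  # never leaves the server
        self._registered = set()

    def register(self, node_id):
        self._registered.add(node_id)

    def _sign(self, node_id):
        return hmac.new(self._signing_key, node_id.encode(), hashlib.sha256).hexdigest()

    def issue_assertion(self, node_id):
        """Vouch for a registered element manager."""
        if node_id not in self._registered:
            raise PermissionError(f"unknown node: {node_id}")
        return node_id, self._sign(node_id)

    def verify(self, node_id, signature):
        """Let a relying party (the OSS) check an assertion with the server."""
        return hmac.compare_digest(self._sign(node_id), signature)


class OSS:
    """Relying party: trusts the security server, stores no node passwords."""

    def __init__(self, security_server):
        self.security_server = security_server
        self.received = []

    def accept_upload(self, node_id, signature, data):
        if not self.security_server.verify(node_id, signature):
            return False
        self.received.append((node_id, data))
        return True


if __name__ == "__main__":
    server = SecurityServer()
    server.register("element-manager-01")
    oss = OSS(server)
    node, sig = server.issue_assertion("element-manager-01")
    print(oss.accept_upload(node, sig, b"fcaps-data"))
```

Note that the element manager in this sketch holds only a short-lived assertion and never an OSS password, which is the property the list above is describing.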

By implementing the OpenID standard protocol at the element management system level, password security can be improved and the need for additional systems and processes can be eliminated. Reference: http://openid.net/specs/openid-authentication-2_0.html

February 20, 2014 – Why use DPI for Virtualized Network Traffic Analysis?

Over time, more and more data is flowing over our networks, and traffic issues can have a big impact on our ability to do our jobs, or even make or break a business. Monitoring the performance of these networks so that corrective and/or preventive actions can be taken is essential. Let’s take a look at how network monitoring data has evolved, starting with the decades-old MIB-II (RFC 1213), which provides basic traffic statistics at the interface level, such as packets in and packets out. This was later augmented by RMON, which provided a more detailed level of statistics (e.g. per protocol). As traffic grew and more services became network based, this information was found to be insufficient for effective performance monitoring, and a better understanding of how data flows through the network was needed. Enter Cisco’s NetFlow (standardized later as IPFIX) and sFlow. These standards provide flow level statistics that give more insight into application traffic behavior. Today, applications and networks are continuing to evolve and we see the move towards increased cloud computing and network virtualization. The methods listed above are becoming insufficient for performance monitoring in more complex environments such as a software defined data center. Deep Packet Inspection (DPI) provides the flexibility to analyze network traffic well beyond previous techniques. Let’s take a closer look at the flow level techniques.

sFlow

Based on the packet sampling concept explored as early as 1992, sFlow is a method that monitors network traffic by sampling 1 in every N packets in a flow passing through a device interface, where N is the configured sampling ratio. The sampled data is extracted from the Layer 2 to Layer 7 header information and sent to a data collector using the sFlow protocol for further processing. This data can be used for bandwidth monitoring, real time traffic analytics, anomaly detection and IP level billing. An sFlow export is carried in a UDP packet that holds the sampling rate and the index of the interface on which the packets were sampled. The UDP packet also contains multiple samples, each being part of the Ethernet frame of a sampled packet. Once such a UDP packet reaches 1500 bytes, or a time limit expires, the flow exporter sends it to analyzer software for further analysis. The analyzer processes the exported data to provide one or more of the following: network health monitoring, troubleshooting assistance, protocol and application usage patterns, identification of security issues, policy enforcement and billing functions.
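
The sampling and batching behavior described above can be sketched as follows. This is a simplified model, not an sFlow implementation: it samples deterministically on every Nth frame, it ignores the time limit, and it represents a datagram as a Python dictionary rather than the real sFlow encoding. The class name `SamplingExporter` and the 128-byte header slice are assumptions for the example.

```python
class SamplingExporter:
    """Toy 1-in-N packet sampler that batches samples into datagrams."""

    def __init__(self, sampling_rate, interface_index, header_bytes=128, max_datagram=1500):
        self.sampling_rate = sampling_rate      # N: sample 1 in every N frames
        self.interface_index = interface_index  # interface the frames were seen on
        self.header_bytes = header_bytes        # how much of each frame to keep
        self.max_datagram = max_datagram        # size limit for one export
        self.seen = 0
        self.pending = []
        self.pending_size = 0
        self.exported = []                      # datagrams "sent" to the analyzer

    def observe(self, frame):
        """Count a frame and keep the start of every Nth one as a sample."""
        self.seen += 1
        if self.seen % self.sampling_rate != 0:
            return
        sample = frame[:self.header_bytes]
        # Export first if this sample would push the datagram past the limit.
        if self.pending and self.pending_size + len(sample) > self.max_datagram:
            self.flush()
        self.pending.append(sample)
        self.pending_size += len(sample)

    def flush(self):
        """Export whatever is pending (a real exporter also does this on a timer)."""
        if self.pending:
            self.exported.append({
                "sampling_rate": self.sampling_rate,
                "interface_index": self.interface_index,
                "samples": self.pending,
            })
            self.pending = []
            self.pending_size = 0


if __name__ == "__main__":
    exporter = SamplingExporter(sampling_rate=10, interface_index=3)
    for _ in range(1000):
        exporter.observe(bytes(200))  # 1000 dummy 200-byte frames
    exporter.flush()
    print(len(exporter.exported), "datagrams exported")
```

Because each datagram carries the sampling rate, an analyzer can scale the sample counts back up to estimate the total traffic on the interface.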

NetFlow

NetFlow was introduced on Cisco routers to allow IP traffic to be collected for monitoring purposes as it enters an interface. NetFlow-collected traffic information is typically organized into flow records that are carried in UDP packets for exporting to an analysis tool. The data captured in a flow record contains a layer 3 header, layer 3 routing information, an index of the ingress and egress interfaces, flow start time and termination time, number of packets in the flow, and other information. Depending on the implementation, NetFlow can be configured to capture flow records for every packet in a flow or for every one out of N packets.
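
As a rough illustration of what a flow record holds, the sketch below groups packets into records keyed by their addressing fields and ingress interface. It is a simplified model rather than the NetFlow export format: the field names, the choice of key and the `build_flow_records` helper are assumptions made for the example, and every packet is counted (no 1-in-N sampling).

```python
from dataclasses import dataclass


@dataclass
class FlowRecord:
    first_seen: float   # flow start time
    last_seen: float    # time of the most recent packet in the flow
    packets: int = 0
    octets: int = 0


def build_flow_records(packets):
    """Aggregate per-packet observations into per-flow records."""
    records = {}
    for p in packets:
        key = (p["src_ip"], p["dst_ip"], p["src_port"], p["dst_port"],
               p["proto"], p["in_if"])
        rec = records.get(key)
        if rec is None:
            rec = records[key] = FlowRecord(first_seen=p["ts"], last_seen=p["ts"])
        rec.last_seen = p["ts"]
        rec.packets += 1
        rec.octets += p["length"]
    return records


if __name__ == "__main__":
    web = {"src_ip": "10.0.0.5", "dst_ip": "198.51.100.7", "src_port": 40000,
           "dst_port": 443, "proto": 6, "in_if": 1}
    dns = {"src_ip": "10.0.0.5", "dst_ip": "198.51.100.53", "src_port": 53000,
           "dst_port": 53, "proto": 17, "in_if": 1}
    observed = [
        {**web, "ts": 1.0, "length": 60},
        {**dns, "ts": 1.5, "length": 80},
        {**web, "ts": 2.0, "length": 1500},
    ]
    for key, record in build_flow_records(observed).items():
        print(key, record)
```

An exporter would periodically pack records like these into UDP packets for the analysis tool, which is the step this sketch leaves out.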

IPFIX

The IETF defined IPFIX, based on NetFlow v9, to supersede NetFlow.

Comparison of sFlow and NetFlow

NetFlow/IPFIX can only capture IP based traffic flow information, limiting its work to layer 3 interfaces for IP traffic. In comparison, sFlow can work from layer 2 to layer 7 headers, and is not dependent on layer 3 routing or next hop information. sFlow can work with non-IP traffic. sFlow enables flow analysis beyond that of NetFlow.

Value of DPI

In a virtualized network, neither sFlow nor NetFlow operating on a physical switch or router is able to see the traffic flow within the virtualized network running on top of the physical network. While sFlow and NetFlow operating on a virtual switch (e.g. Open vSwitch, OVS) are able to provide visibility of traffic flow between source and destination points, this leaves an information gap about the traffic flowing through virtualized networks over a long path in a physical network. DPI is the right tool to fill this gap. For traffic flows running over tunnels through a long path in a physical network, DPI can be deployed anywhere in the physical network to see through all tunnels on all layers of traffic that sFlow and NetFlow operating in physical switches/routers cannot see. Additionally, many applications have key information in the payload, which only DPI can access. For complete application-aware traffic analysis, DPI is needed and can be used to enhance analysis beyond what can be derived from NetFlow and sFlow.

At Northforge, we are committed to the development of leading-edge software for network infrastructure. DPI and DPI device/application management represent one of the fastest growing segments where our software services experience is helping our customers rapidly develop their platforms. To further accelerate our customers’ time-to-market, Northforge has partnered with several chipset manufacturers who have silicon support for DPI functions. Northforge is uniquely positioned to offer very experienced software services, from chipset porting through to DPI management frameworks and beyond. We invite you to contact Northforge and see how our services can assist you.

January 22, 2014 – SIP Security: How Much is Enough?

Everybody wants to think that their telephone service is secure. While the security of phone communications has always been a concern for users and service providers alike, it has become an even bigger one in the Internet and VoIP era, and the recent events involving the NSA are more proof of the complexity and sensitivity of this issue. SIP is the most widely used signalling communications protocol for controlling voice calls over Internet Protocol networks. But what do we mean by “secure”, and what is a reasonable benchmark of adequate security? A good starting point for thinking about SIP security might be that we want to:

  • maintain confidentiality
  • assure integrity
  • guarantee service availability (including Quality of Service)

Confidentiality obviously includes the idea that only the intended participants, the calling and the called parties, should be able to listen in to the call. You probably don’t want third parties listening to your calls, or having your calls recorded without your consent. But confidentiality may also go beyond this to include maintaining some privacy about who you are calling, how much personal information you expose, what your calling patterns are, and so on. Integrity has to do with being certain that the call participants are who you think they are. You want to be confident that, for instance, when calling your IT support line or telephone banking line, you are not being connected with someone masquerading as those roles (“At the tone, please enter your account number followed by your password”). Service availability includes the prevention of denial of service attacks, but it would also encompass the prevention of unauthorized access, where, for instance, system resources are being used by third parties for unauthorized purposes, including toll fraud. It could also reasonably include the preservation of expected levels of service and quality. A further question that is worth thinking about is “Secure as compared with what?”. A reasonable benchmark for comparing the security of SIP-based VoIP telephony might be that of Plain Old Telephony Service (“POTS”). POTS set the bar very high for service availability, although a determined attacker could create a short-term denial-of-service attack by call “flooding” (having many callers target one particular endpoint). Similarly, POTS delivered a high level of call integrity – endpoints were known, as billing tended to validate them and systems usually required physical access for changes to their configuration settings. Confidentiality was reasonably-well assured, although operators, receptionists, and others might still have access to calls involving others.

How Does SIP Telephony Compare?

In the absence of security measures, a SIP system may be vulnerable to many threats:

  • Registration Hijacking: An attacker can appear to be someone they aren’t; your calls may be transparently re-directed to the wrong place.
  • Snooping: Your calls may go to the right place, but unauthorized third parties can listen in on your conversations (or Fax transmissions…).
  • Message Tampering: What you say to your correspondent may not be what they hear; it is possible to modify the audio stream that is being transmitted.
  • Session Control: An intruder may be able to forward, put on hold, record, or terminate your call, contrary to your intentions.
  • Denial of Service: A traditional Internet Protocol hazard; SIP is vulnerable to this. Endpoints can be rendered unusable, as can lines or other telephony resources, and more!

What Can We Do to Mitigate These Risks?

In order to reduce the chances of someone eavesdropping on your conversations, ensuring that your audio stream is encrypted by using Secure RTP (SRTP) instead of just RTP, and that the flow is endpoint-to-endpoint, will help a lot. If your audio is not encrypted, any node along the data path can intercept, listen in, or possibly even modify, your audio. “Any node along the data path” can include routers on your LAN, your ISP, internet backbone providers, indeed even the coffee shop whose WiFi hotspot you may be using. SRTP puts this kind of attack out of the reach of most, if not all, attackers. Similarly, encryption of the signalling path, using SIPS, instead of vanilla SIP, will reduce the chances of an attacker manipulating the call control, or learning too much peripheral information about your calls. When connecting across untrusted networks, such as public WiFi access, using a VPN is wise; that way untrusted local nodes would have less chance of even being aware of what you are doing – none of your plaintext signalling or unencrypted media would be visible.
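As a small illustration of checking whether a call is set up with these protections, the Python sketch below looks at two things: whether the signalling URI uses the `sips:` scheme, and whether each media line in an SDP body uses a secure RTP profile (profile names containing `SAVP`, such as `RTP/SAVP`). The function names are hypothetical, and the check is deliberately simplistic; it ignores other ways of negotiating media encryption.

```python
def signalling_is_secure(uri: str) -> bool:
    """True when the SIP URI uses the secure `sips:` scheme."""
    return uri.strip().lower().startswith("sips:")


def media_is_encrypted(sdp: str) -> dict:
    """Map each SDP media type to True when its transport profile is an SRTP profile."""
    result = {}
    for line in sdp.splitlines():
        if not line.startswith("m="):
            continue
        # Media line layout: m=<media> <port> <proto> <fmt> ...
        fields = line[2:].split()
        if len(fields) >= 3:
            media, proto = fields[0], fields[2]
            result[media] = "SAVP" in proto.upper()
    return result
```

A monitoring or provisioning tool could use checks like these to flag calls that fall back to plain SIP or RTP.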

Last Thoughts About Security

It’s important to remember that there is no such thing as absolute security. It is almost certainly impossible to be completely protected from all possible threats and still have a usable system. In part this is because, as security demands increase, the costs to implement security responses increase as well: in computing overhead, bandwidth usage and protocol complexity. Security always involves a cost-benefit calculation; we need to be aware of what can go wrong, what kinds of damage can result (financial, reputational, or others), and how much time, effort or capital we’re willing to dedicate to preventing these things from happening. Security threats are always evolving: new threats are discovered and new vulnerabilities in protocols or networks are uncovered, so doing these cost-benefit calculations is an ongoing activity. Security is a game of cat-and-mouse.

Finally, there is such a thing as too much security: sometimes the only response we can devise to some threat is to eliminate the feature that exposes that threat. This probably results in an overall less useful system. We need to keep in mind what sorts of hazards we are concerned about, what the chances are of those things actually happening, and what costs we are willing to incur to protect ourselves or our organizations against those threats.

For More Information

  • What Does Security Mean?
  • Basic Vulnerability Issues for SIP Security
  • Session Initiation Protocol
  • Liars & Outliers: Bruce Schneier

Northforge

Northforge provides development expertise in IP Communications, VoIP, SIP/SDP/RTP/RTCP, H.323, and more. Please contact us for information on how we can make your next VoIP project a success!

December 17, 2013 – OpenStack Virtualized Network Simply Explained

This is the final blog in the OpenStack Cloud software platform series. This blog gathers some of the most interesting and up to date information available regarding OpenStack’s virtualized network.

OpenStack Virtualized Network

It is essential that an application understand how to plan their network topology on the cloud to ensure its security, performance and robustness. The information was collected from the following sources:

 

Neutron [aka Quantum]

The OpenStack component, Neutron (also known as Quantum in older releases of OpenStack), provides network connectivity as a service. This allows OpenStack tenants to build advanced network topologies for their virtual instances. Advanced network services, such as Load Balancer as a Service, Firewall as a Service, etc., can be plugged into a tenant’s network. In the example below, a dedicated network node, running the Quantum server, provides DHCP and Layer 2 and 3 networking services (routing and floating IP addresses) for the private networks of two tenants.

[Image: network node serving the private networks of two tenants] [10]

There are three types of networks:

  • Management Network which provides connectivity between OpenStack nodes of the cluster.
  • Data (tenant) Network which provides connectivity between VM instances of a tenant. It can be used to set up private networks to ensure tenant isolation.
  • External Network which provides VM instances with Internet access and a pre-allocated range of IP addresses that are routed on the Internet (“floating” IPs which can be assigned to VMs later).

If an OpenStack provider is running a private cloud dedicated to a single service, then an advanced network topology is not required. But if the cloud provider chooses to use a VPN to segregate groups of subscribers or to re-sell the service, then it is essential that the Quantum service is configured to monitor/manage the network components and to provide scalability, security and high availability.

Load Balancing

Load balancing can be achieved by using the built-in feature of OpenStack called LBaaS (Load-Balancer-as-a-Service). LBaaS is a Quantum extension that introduces a load balancing feature set into the core. It is a sub-project of Quantum and is now included in the Grizzly release of OpenStack. [11] LBaaS is built on the popular HAProxy open source load balancer, which has a reputation for being fast, efficient (in terms of processor and memory usage) and stable. LBaaS can be managed via its own set of APIs or conveniently from within the OpenStack dashboard (Horizon). Existing hardware or software nodes (VMs) can be easily assigned to the load balancer and configured from within the dashboard or via the API.

The load balancer supports various dynamic load distribution algorithms, including round robin, sticky session, sticky IP, weighted round robin, etc. It is essential that traffic is sent based on the actual load; depending on the algorithm, it may need to be dampened to prevent overloading a blade by sending it too much traffic on an instantaneous basis. Using the OpenStack service APIs, it should be possible to monitor the load on the VMs and, if required, spawn additional VMs. The load balancer will dynamically become aware of these additional compute resources and distribute traffic to them accordingly.
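To make the weighted round robin idea concrete, here is a minimal Python sketch of one common variant, often called "smooth" weighted round robin, which interleaves picks instead of sending bursts to the heaviest backend. It is not taken from LBaaS or HAProxy; the backend names and weights are hypothetical.

```python
def smooth_weighted_round_robin(weights):
    """Yield backend names forever, in proportion to their integer weights."""
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    while True:
        # Every backend earns credit equal to its weight on each round.
        for name, weight in weights.items():
            current[name] += weight
        # The backend with the most credit is chosen and pays back the total.
        chosen = max(current, key=current.get)
        current[chosen] -= total
        yield chosen
```

For example, with weights `{"vm-a": 3, "vm-b": 1}`, three of every four requests go to `vm-a`. A load-aware balancer could adjust the weights over time from monitored VM load rather than keeping them fixed.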

Miscellaneous Application or System Considerations

It is critical that a cloud application successfully log and monitor adverse behaviour for fault isolation and troubleshooting purposes where there could be many other co-resident applications. Automated deployment of OpenStack can be a huge time-saver in terms of setting up cloud test environments for rigorous application testing. The information was collected from the following sources:

 

Fault Monitoring

An IAAS platform opens up another avenue for faults to affect an application, and the application’s management system should try to monitor the health of the IAAS, as it can have an adverse impact on the application’s services. In a dedicated system, it is normal to monitor the health of the infrastructure; some of this needs to be carried forward to the IAAS environment.

Audit Trail

Each OpenStack node has a logging service with the standard logging levels: DEBUG, INFO, AUDIT, WARNING, ERROR, CRITICAL, TRACE. The logging level can be set for each node, and custom logging statements can be added to the log. An open source tool, StackTach, can be used to collect and report the notifications sent by the compute nodes. [12] This information can be used as an audit trail to identify and isolate issues that affect the application that is running on the cloud.

Automated Deployment

It is important for an OpenStack service provider to be familiar with the steps involved in deploying OpenStack, but for any large-scale cloud deployment an automated installation mechanism is essential for efficient, repeatable and successful deployments. Automated installation can also aid continuous integration and verification efforts for an application development team deploying to OpenStack. Infrastructure management tools, such as Puppet or Chef, could be used to build an automated OpenStack installer. [13]

Footnotes

[10] http://www.mirantis.com/blog/a-new-agent-management-approach-in-quantum-for-openstack-grizzly/
[11] https://wiki.openstack.org/wiki/Quantum/LBaaS
[12] http://docs.openstack.org/trunk/openstack-ops/content/logging_monitoring.html#stacktach
[13] http://docs.openstack.org/grizzly/openstack-compute/admin/content/ch_openstack-compute-automated-installations.html

Northforge

Northforge has combined its technical expertise in cloud computing/SaaS software development with its extensive network infrastructure experience to deliver multiple Cloud and SaaS technology projects. We understand the design, development and UI requirements to take the nebulous out of your next cloud-based project.

December 3, 2013 – OpenStack Metrics Simply Explained

This is the second blog in our OpenStack software platform series.  This blog gathers some of the most interesting and up to date information regarding OpenStack performance metrics.

OpenStack Metrics

A cloud deployment environment changes as commodity hardware is seamlessly added or removed in response to increasing amounts of subscribers, applications and/or resellers.  A tenant application on the cloud must collect and monitor metrics in order to ensure their performance benchmarks are being met and to see trending data which can lead to product improvements.  The information was collected from the following sources:

The OpenStack component for collecting metrics in OpenStack is called Ceilometer. The component was initially intended for collecting data for customer billing, but it is evolving to become the infrastructure for collecting OpenStack measurements. It provides a unique point of contact to acquire measurements across all current OpenStack components. The type of data and frequency of collection can be configured by OpenStack deployers. The data is collected by monitoring notifications or by polling OpenStack infrastructure components, and is written to a database. The data can be accessed via a secure REST API and, in the future, from Horizon, the OpenStack Web console.

[Image: Ceilometer architecture] [7]

For example, data from the Nova compute node includes:

NAME | TYPE | UNIT | ORIGIN | NOTE
instance | Gauge | instance | both | Duration of instance
instance:<type> | Gauge | instance | both | Duration of instance <type> (OpenStack types)
memory | Gauge | MB | notification | Volume of RAM in MB
cpu | Cumulative | ns | pollster | CPU time used
cpu_util | Gauge | % | pollster | CPU utilisation
vcpus | Gauge | vcpu | notification | Number of active VCPUs
disk.read.request | Cumulative | request | pollster | Number of read requests
disk.write.request | Cumulative | request | pollster | Number of write requests
disk.read.bytes | Cumulative | B | pollster | Volume of read in B
disk.write.bytes | Cumulative | B | pollster | Volume of write in B
disk.root.size | Gauge | GB | notification | Size of root disk in GB allocated to all active virtual instances
disk.ephemeral.size | Gauge | GB | notification | Size of ephemeral disk in GB allocated to all active virtual instances
network.incoming.bytes | Cumulative | B | pollster | Number of incoming bytes on the network
network.outgoing.bytes | Cumulative | B | pollster | Number of outgoing bytes on the network
network.incoming.packets | Cumulative | packets | pollster | Number of incoming packets
network.outgoing.packets | Cumulative | packets | pollster | Number of outgoing packets
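The table mixes Cumulative meters (ever-increasing counters such as `cpu`) and Gauge meters (point-in-time values such as `cpu_util`). As a hedged illustration of how the two relate, the Python sketch below derives a utilisation percentage from two samples of a cumulative CPU-time counter. The function name and the division by the number of VCPUs are assumptions for the example, not a description of how Ceilometer itself computes `cpu_util`.

```python
def cpu_util_percent(prev_cpu_ns, curr_cpu_ns, prev_ts, curr_ts, vcpus=1):
    """Turn two cumulative CPU-time samples (ns) into a utilisation gauge (%).

    prev_ts and curr_ts are the sample times in seconds.
    """
    elapsed_ns = (curr_ts - prev_ts) * 1e9
    if elapsed_ns <= 0 or vcpus <= 0:
        raise ValueError("samples must be in time order and vcpus must be positive")
    used_ns = curr_cpu_ns - prev_cpu_ns
    # Fraction of the available CPU time (elapsed time on each VCPU) that was used.
    return 100.0 * used_ns / (elapsed_ns * vcpus)
```

For instance, 5 seconds of CPU time consumed over a 10 second interval on a single VCPU corresponds to 50% utilisation. The same differencing applies to the other cumulative meters, for example to turn `network.incoming.bytes` into a byte rate.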

[8] If you want to actively monitor OpenStack nodes and network services, the open source Nagios application can be used. Nagios runs on a Linux machine connected to the network and receives its data from an NRPE add-on agent running on the target OpenStack components. NRPE is configured to monitor system resources on the remote machine. The Nagios server is responsible for receiving, storing and presenting the collected data to the user.

[Image: Nagios monitoring with the NRPE add-on] [9]

These performance and usage metrics can be used to effectively tune and scale the OpenStack deployment running your application. The data can be analyzed from test and production environments to monitor performance, resolve bottlenecks, verify virtual hardware models, plan capacity and optimize cloud resources.

Part 3, the final posting in this OpenStack software platform blog series, will offer some of the most interesting and up to date information available regarding OpenStack’s virtualized network.

The information for this blog was collected from the following sources:

  • http://www.infoworld.com/d/cloud-computing – Infoworld’s CloudComputing site.
  • http://www.openstack.org/ – OpenStack organization’s web site
  • http://www.mirantis.com/ – Mirantis OpenStack service vendor
  • http://www.redhat.com/products/cloud-computing/openstack/ – Red Hat’s OpenStack Deployment
  • http://devstack.org/ – DevStack OpenStack deployment scripts
  • http://www.morphlabs.com/ – MorphLabs Cloud Consultants
  • http://www.nagios.org/ – Nagios element and networking monitoring system

Footnotes

[7] http://docs.openstack.org/developer/ceilometer/architecture.html
[8] http://docs.openstack.org/developer/ceilometer/measurements.html
[9] http://nagios.sourceforge.net/docs/nagioscore/3/en/addons.html#nrpe

Northforge

Northforge has combined its technical expertise in cloud computing/SaaS software development with its extensive network infrastructure experience to deliver multiple Cloud and SaaS technology projects. We understand the design, development and UI requirements to take the nebulous out of your next cloud-based project.

November 14, 2013 – OpenStack Simply Explained

Northforge has created a three-part blog series that looks at some of the most interesting and up to date information available on the internet regarding the OpenStack software platform. OpenStack is a collection of open source projects that can provide cloud management of network and computing resources. It is backed by a significant number of dominant players in the high tech industries, such as IBM, HP, Dell, Cisco and AT&T, to name a few.

OpenStack Overview

OpenStack is an open source platform that lets you build an Infrastructure as a Service (IAAS) cloud that runs on commodity hardware and scales massively. [1] The main components of OpenStack are:

[Image: OpenStack Platform] [2]

      • Object Store (codenamed “Swift”) provides object storage. It allows you to store or retrieve files (but not mount directories like a fileserver).
      • Image (codenamed “Glance”) provides a catalog and repository for virtual disk images. These disk images are most commonly used in OpenStack Compute.
      • Compute (codenamed “Nova”) provides virtual servers upon demand.
      • Dashboard (codenamed “Horizon”) provides a modular web-based user interface for all the OpenStack services. With this web GUI, you can perform most operations on your cloud like launching an instance, assigning IP addresses and setting access controls.
      • Identity (codenamed “Keystone”) provides authentication and authorization for all the OpenStack services. It also provides a service catalog of services within a particular OpenStack cloud.
      • Network (codenamed “Neutron”, also known as “Quantum” in earlier releases) provides “network connectivity as a service” between interface devices managed by other OpenStack services (most likely Nova). The service works by allowing users to create their own networks and then attach interfaces to them.
      • Block Storage (codenamed “Cinder”) provides persistent block storage to guest VMs. [3]

 

OpenStack Hardware Scaling

Applications that have performance models based on hardware requirements must consider how OpenStack manages the CPUs, memory and storage in order to maintain their performance benchmarks in the cloud. The information was collected from the following sources:

 

Flavors

An OpenStack virtual machine is created based on a virtual hardware template called a flavor. The flavor defines the number of virtualized CPU cores (each a portion of a physical CPU core that is assigned to a virtual machine), the memory, and the storage used by the virtual machine. The default OpenStack set of flavors is:

[Image: OpenStack Flavors]

[4] Additional flavors can be created to match the recommended hardware requirements for the application. For an existing application, in the near term, a flavor can be specified that matches the performance of the dedicated hardware previously used. In the future, an application’s engineering rules can be modified to incorporate smaller units of compute resources, allowing operators to be more granular in OpenStack flavor size. It should be possible to fit an equation to the performance curve such that the size of the OpenStack flavor can be left up to the operator. In either case, an agent could be developed to validate the resource allocation (CPU, RAM, storage, disk swapping, etc.) that is incorporated into the application.

Although non-persistent storage is disk space, that disk space is lost if the VM is terminated for any reason. Care must be taken to ensure persistent data is kept in permanent storage; in OpenStack, persistent block storage is provided by ‘Cinder’.

Virtual Instances

Virtual instances are hosted on an OpenStack compute node. Ideally, all virtual instances will be the same flavor if the compute node is dedicated to supporting a single service. The scheduler will create virtual instances on a compute node until either the virtual CPU core limit or the virtual memory limit is reached on that compute node.

The default CPU overcommit ratio (increasing the number of virtual CPU cores on a compute node at the cost of reduced performance) is 16 virtual cores to 1 physical core, but for CPU-intensive applications a ratio of 4:1 or 2:1 is more applicable, and if warranted it could be set to 1:1. Increasing the number of virtual instances will adversely affect the performance of the instances. For an existing application that provides engineering rules with precise performance modeling, a ratio of 1:1 can be mandated for deterministic behavior. The formula for the number of virtual instances on a compute node is:

$$\text{virtual instances} = \frac{\text{CPU overcommit ratio (virtual cores per physical core)} \times \text{number of physical cores}}{\text{number of virtual cores per instance}}$$

[5]

Memory can also be overcommitted. The default overcommit ratio is 1.5:1, but overcommitting memory will also adversely affect instance performance, so do so only after testing your particular use case. As with CPU allocation, a ratio of 1:1 can be mandated for deterministic behavior where warranted. The RAM and CPU overcommit ratios can be set to 1:1 initially, so that a virtual instance (with a flavor that matches the application’s hardware requirements) would be equivalent to, say, a server blade that satisfies the current engineering rules for the application. It is recommended to test slightly higher overcommit ratios to see if a system could scale to support more virtual instances while satisfying the performance criteria for the application.

Additional compute nodes can be added easily to OpenStack when it is necessary to scale out horizontally. Ideally, the new compute node should be just as powerful as the existing compute nodes so that instance capacity increases linearly. Over time, compute hardware becomes more powerful, and more virtual instances can be supported. However, care needs to be taken to ensure application performance is not adversely affected.

The most popular hypervisor used by OpenStack is KVM, though LXC, QEMU, UML, VMware vSphere, Xen, PowerVM, Hyper-V and Bare Metal are also supported. All new OpenStack features are implemented and tested on KVM, and it gets the most support on OpenStack forums. It is best suited for Linux guests. [6]

The physical location of the VMs may be a factor affecting performance. Consider the case where two VMs need to transfer a lot of data between themselves to accomplish a task. The impact on performance of the physical VM location can be summarized as follows:

[Image: OpenStack VM Location]
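Returning to the instance-count formula given earlier in this section, a short Python sketch can make the capacity arithmetic concrete. It applies the CPU formula as stated and, as an assumption for illustration, applies the same form of calculation to memory, then takes the smaller of the two, since the scheduler stops at whichever limit is reached first. The function and parameter names are hypothetical.

```python
import math


def max_instances(physical_cores, vcpus_per_instance, cpu_overcommit,
                  physical_ram_mb, ram_mb_per_instance, ram_overcommit):
    """Estimate how many instances of one flavor fit on a compute node."""
    # CPU limit: (overcommit ratio * physical cores) / virtual cores per instance.
    by_cpu = math.floor(cpu_overcommit * physical_cores / vcpus_per_instance)
    # Memory limit, computed the same way (an assumption for this sketch).
    by_ram = math.floor(ram_overcommit * physical_ram_mb / ram_mb_per_instance)
    # Whichever limit is reached first caps the node.
    return min(by_cpu, by_ram)
```

For example, a node with 12 physical cores and 65536 MB of RAM hosting a flavor with 2 VCPUs and 4096 MB fits 6 instances at 1:1 ratios (CPU-bound). Raising only the CPU ratio to 16:1 makes memory the limit, at 16 instances.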

Data Storage

Data storage options available to a virtual instance in OpenStack are:

  • Local file system (managed by Nova) on the compute node
  • Block storage volumes (Cinder) on the block storage node
  • Software Image repository (Glance) on an image server

A bottleneck with virtual machines is local disk I/O by the hypervisor, since all virtual instances have their local file systems on the compute node’s local hard disk. The problem can be mitigated by having an SSD on the compute node with a high IOPS (I/O operations per second) rate and by migrating critical data to a block storage node. If the compute node goes down and the applications running on the virtual instances must be recovered in their current state, then critical data must be stored outside of the virtual instance on an external block storage node. The block storage is attached to the virtual machine via iSCSI. Having a separate storage node also allows scaling and redundancy of block storage independently of the compute node. The block storage node should have a high IOPS rate and be reliably accessible over the network. The Glance component can store snapshots of the virtual instance, which can be used to back up and quickly restore a virtual instance with an application.

A database, such as Oracle RAC, is treated as an external entity to the OpenStack environment and is accessed through standard approaches such as JDBC and ODBC. In the longer term, the database server could be run in a virtual instance.

Part 2 of this blog series regarding the OpenStack software platform will gather some of the most interesting and up to date information available on the internet regarding OpenStack performance metrics.

The information for this blog was collected from the following sources:

  • http://www.infoworld.com/d/cloud-computing – Infoworld’s CloudComputing site.
  • http://www.openstack.org/ – OpenStack organization’s web site
  • http://www.mirantis.com/ – Mirantis OpenStack service vendor
  • http://www.redhat.com/products/cloud-computing/openstack/ – Red Hat’s OpenStack Deployment
  • http://devstack.org/ – DevStack OpenStack deployment scripts
  • http://www.morphlabs.com/ – MorphLabs Cloud Consultants
  • http://www.nagios.org/ – Nagios element and networking monitoring system

Footnotes

[1] http://docs.openstack.org/trunk/openstack-ops/content/index.html
[2] http://www.redhat.com/products/cloud-computing/openstack/
[3] http://docs.openstack.org/folsom/openstack-object-storage/admin/content/components-of-openstack.html
[4] http://docs.openstack.org/trunk/openstack-ops/content/scaling.html#starting
[5] http://www.youtube.com/watch?v=0RRdKknfRUc (OpenStack Capacity Planning)
[6] http://docs.openstack.org/trunk/openstack-ops/content/scaling.html

Northforge

Northforge has combined its technical expertise in cloud computing/SaaS software development with its extensive network infrastructure experience to deliver multiple Cloud and SaaS technology projects. We understand the design, development and UI requirements to take the nebulous out of your next cloud-based project.

October 9, 2013 – Network Traffic Increase and NP based DPI

The ubiquity of connected devices is putting demands on networks to deliver more data packets ever faster. Your smartphone, tablet, vehicle, television, home and even your medical device depend on the network and its ability to operate efficiently, effectively and securely. Network operators need to have better insight into how their networks are operating to ensure they are meeting their customers’ needs. Network monitoring with Deep Packet Inspection helps to deliver that insight. This blog looks at Deep Packet Inspection and how the use of Network Processors (NP) can meet the real-time inspection requirements that are present in today’s networks.

DPI Applications

Deep Packet Inspection (DPI) is widely used at different packet handling stages ranging from the desktop to core networks. DPI is used to analyze traffic from Layer 2 up to Layer 7 for purposes such as traffic profiling, intrusion detection, intrusion prevention, content-aware forwarding and load balancing, data leak prevention, anti-malware, service level agreement enforcement, network monitoring and troubleshooting, and many others. Central to many applications of DPI is pattern matching, which compares byte streams in packet flows against a set of pre-defined patterns (signatures). Each application may have its own way of using the pattern matching result. Apart from conventional pattern matching techniques, DPI also involves classification of traffic that does not have a fixed signature. Central to DPI in all scenarios is the ability to accomplish all this at wire speed without compromising the percentage of traffic that can be classified.
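The signature matching described above can be sketched in Python as a per-flow matcher that also catches patterns split across packet boundaries. This is a naive illustration only: the class name and the signatures are hypothetical, and a substring scan like this would not run at wire speed, where multi-pattern automata or hardware matching engines are typically used.

```python
class FlowMatcher:
    """Match byte signatures against one flow's payload stream, packet by packet."""

    def __init__(self, signatures):
        self.signatures = dict(signatures)  # name -> byte pattern
        # Keep enough trailing bytes to catch a pattern split across two packets.
        self.keep = max(len(p) for p in self.signatures.values()) - 1
        self.tail = b""
        self.matched = set()

    def feed(self, payload: bytes) -> set:
        """Scan the next packet payload; return all signature names seen so far."""
        data = self.tail + payload
        for name, pattern in self.signatures.items():
            if pattern in data:
                self.matched.add(name)
        self.tail = data[-self.keep:] if self.keep > 0 else b""
        return set(self.matched)
```

For example, a matcher built with `{"sip-invite": b"INVITE sip:"}` reports no match after a packet ending in `INVI`, and then reports `sip-invite` once the next packet begins with `TE sip:`.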

Network Traffic Increase and NP based DPI

All recent studies show exponential growth in internet traffic, and the trend is expected to continue in years to come [1]. The growth of internet traffic has increased link speeds from 1Gbps to 10Gbps and to 100Gbps, presenting an increasing challenge for DPI to be applied to incoming traffic at network inspection points at wire speed. This is further aggravated by the need to support DPI over a rapidly evolving set of protocols and services.

Recent progress in network processor design has shown that DPI implementations leveraging network processors can be very effective in performing DPI on incoming traffic to meet the real-time response/decision requirements of various applications. As with many other technologies, careful consideration needs to be given to design to get the most out of the network processor. With many applications and services, packet inspection from L2 to L7 needs to consider how to divide and map the DPI algorithm to the network processor’s capabilities, such as available memory and number of processor cores. A further example is the inspection of dynamic-length packets such as HTTP, or of protocols that use non-standard values in various fields (such as non-standard ports); these situations can stress available memory or other resources.

Northforge Experience in DPI and Network Processors

Northforge software engineers have integration and porting experience, using network processors from different companies, including EzChip, Cavium and PMC-Sierra, and have developed and enhanced DPI algorithms for various protocols. The Northforge network processor team has in-depth knowledge and experience in various DPI processing stages, e.g., packet re-assembly, pre-classification, traffic classification via patterns and heuristics, load distribution, packet inspection (L2-L7), pattern signature matching, and flow profile policy application. Talk to us about how we can help solve your DPI needs.

References

[1] http://arstechnica.com/business/2012/05/bandwidth-explosion-as-internet-use-soars-can-bottlenecks-be-averted/

September 11, 2013 – What Does CSCF (Call Session Control Function) Do in Your IMS Network?

The CSCF (Call Session Control Function) is a collection of functional capabilities that play an essential role in the IMS core network. The CSCF is responsible for the signaling that controls communication between IMS User Equipment (UE) and IMS enhanced services across different network accesses and domains. The CSCF controls session establishment and teardown, as well as user authentication, network security and QoS (Quality of Service). At present, four types of CSCF are defined according to the functional capability offered. One or several of these functional capabilities may be hosted by a physical network node within the IMS network domain:

P-CSCF (Proxy-CSCF)

The P-CSCF is the first point of contact between the UEs and the IMS network. Because it acts as a SIP proxy, all SIP requests and responses from/to UEs traverse the P-CSCF. The P-CSCF may be located in either the user’s home network or in the visited network for handling roaming. The P-CSCF supports several important functions:

  • Validates the correctness of SIP messages with IMS UEs according to SIP standard rules.
  • Ensures the security of the messages between UEs and the IMS network using IPsec or TLS security associations.
  • Authenticates and asserts the identity of the UE.
  • Compresses the messages, ensuring the efficient transmission of SIP messages over narrowband channels.

The P-CSCF may support Policy Enforcement capabilities for authorizing media plane resources, bandwidth, and QoS management. In addition, the P-CSCF can also generate charging information to be collected by charging network nodes.

I-CSCF (Interrogating-CSCF)

The I-CSCF is a SIP proxy located at the edge of an administrative IMS domain. Its IP address is published in the Domain Name System (DNS) of the domain (using NAPTR and SRV types of DNS records), so that remote servers can find it and use it as a forwarding point (e.g., for registering) for SIP packets to this IMS domain. The I-CSCF implements a Diameter (RFC 3588) interface to the HSS (Home Subscriber Server), and queries the HSS to retrieve the address of the S-CSCF for a UE to perform SIP registration. Being a SIP proxy, the I-CSCF forwards SIP message requests and responses to the S-CSCF. Additionally, the I-CSCF may encrypt parts of the SIP messages to secure any sensitive information. Typically the IMS network includes a number of I-CSCF nodes for the purpose of scalability and redundancy. The I-CSCF is usually located in the IMS home network.
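To make the DNS step concrete, the sketch below shows how a remote server might choose among several SRV records that point at I-CSCF nodes. It is a simplified illustration in Python, not a resolver: the `SrvRecord` type, the `pick_srv` function, and the host names are assumptions, and the selection logic (lowest priority first, then a weighted random choice) only approximates the ordering procedure in RFC 2782.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SrvRecord:
    """The fields of a DNS SRV record that matter for target selection."""
    priority: int
    weight: int
    port: int
    target: str


def pick_srv(records: List[SrvRecord], rng: Optional[random.Random] = None) -> SrvRecord:
    """Pick one record: lowest priority wins, weight biases the choice within it."""
    if not records:
        raise ValueError("no SRV records to choose from")
    rng = rng or random.Random()
    lowest = min(record.priority for record in records)
    candidates = [record for record in records if record.priority == lowest]
    total_weight = sum(record.weight for record in candidates)
    if total_weight == 0:
        # All weights are zero: treat the candidates as equally likely.
        return rng.choice(candidates)
    weights = [record.weight for record in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical records for an IMS domain; names and ports are illustrative.
records = [
    SrvRecord(priority=10, weight=60, port=5060, target="icscf1.ims.example.net"),
    SrvRecord(priority=10, weight=40, port=5060, target="icscf2.ims.example.net"),
    SrvRecord(priority=20, weight=0, port=5060, target="icscf-backup.ims.example.net"),
]
chosen = pick_srv(records)
```

Publishing several records this way is what lets a deployment with a number of I-CSCF nodes share load across them and fall back to a backup node.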

S-CSCF (Serving-CSCF)

The S-CSCF is a central function of the signaling plane in the IMS core network. An S-CSCF node acts as a SIP registrar, and in some cases as a SIP redirect server. It is responsible for processing the location registration of each UE, user authentication, and call routing and processing. Similar to the I-CSCF, the S-CSCF supports Diameter Cx and Dx interfaces to the HSS, which it uses to download the authentication information and user profile of the registering UEs for authentication purposes. All of the SIP signaling from/to the IMS UEs traverses the serving S-CSCF allocated to them during the registration process. The S-CSCF also provides SIP message routing and service triggering. It also enforces the policy of the network operator and keeps users from performing unauthorized operations. The S-CSCF is always located in the home network. A number of S-CSCFs may be deployed for the sake of scalability and redundancy.
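The registrar role can be pictured as a table that maps each user's public address to the contact address where the UE can currently be reached, with an expiry time. The Python sketch below is a toy model of that table only. The `RegistrarBindings` class and its method names are hypothetical, it keeps a single contact per address, and it leaves out everything else an S-CSCF does, including authentication against the HSS.

```python
import time
from typing import Dict, Optional, Tuple


class RegistrarBindings:
    """Toy in-memory table: address-of-record -> (contact URI, expiry time)."""

    def __init__(self) -> None:
        self._bindings: Dict[str, Tuple[str, float]] = {}

    def register(self, aor: str, contact: str, expires_s: int,
                 now: Optional[float] = None) -> None:
        """Store or refresh a binding; an expiry of 0 removes it (de-registration)."""
        now = time.time() if now is None else now
        if expires_s <= 0:
            self._bindings.pop(aor, None)
            return
        self._bindings[aor] = (contact, now + expires_s)

    def lookup(self, aor: str, now: Optional[float] = None) -> Optional[str]:
        """Return the current contact for an address, or None if absent or expired."""
        now = time.time() if now is None else now
        entry = self._bindings.get(aor)
        if entry is None:
            return None
        contact, expiry = entry
        if expiry <= now:
            # The binding has lapsed; drop it so the table does not grow stale.
            del self._bindings[aor]
            return None
        return contact


# Illustrative addresses only.
bindings = RegistrarBindings()
bindings.register("sip:alice@ims.example.net", "sip:alice@10.0.0.5:5060", expires_s=600)
```

Call routing then becomes a lookup: a request addressed to the public identity is forwarded to whatever contact is currently bound to it.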

E-CSCF (Emergency-CSCF)

Compared to other CSCFs, the E-CSCF is a newly defined entity in the IMS network. As its name indicates, the E-CSCF is responsible for handling emergency call service. Once the P-CSCF detects that a received SIP message request is for an emergency call, it forwards that SIP message to the E-CSCF. Then, the E-CSCF contacts the Location Retrieval Function (LRF) to get the location of the UE for routing the emergency call appropriately. The E-CSCF can be located either in a home network or in a visited network.

Northforge Innovations is a trusted provider of intellectual capital to world-leading IMS CSCF providers. Working closely with these customers, Northforge engineers have analyzed, designed, implemented and verified key features on IMS CSCF products, and provided technical support. Northforge has the technical knowledge and innovative implementation skills to support the development of your IMS CSCF products.

July 17, 2013 – A Comparison of VOIP Platforms:  Asterisk vs. FreeSWITCH

Voice over IP (VOIP) is both a technique and a technology for communicating by transmitting voice and multimedia over IP as sessions. It is often at the core of internet telephony applications, and provides signaling, setup and audio services, among other features. Many of these applications are built upon one of two popular software platforms: Asterisk and FreeSWITCH.

Asterisk

Asterisk, a software based private branch exchange (PBX) solution, can provide much of the required functionality for a VOIP product. Asterisk was one of the first software based PBX solutions. An open source solution, it was created in 1999 and, with sponsorship provided by Digium, was released for production in 2005. It is released under dual licenses: the GNU GPL (General Public License) and a proprietary license to allow further development of proprietary solutions based on Asterisk. It has a large user base, in over 170 countries with more than a million Asterisk based systems in use. While Asterisk is able to satisfy much of a VOIP requirement, its use historically came with obstacles. The number of concurrent calls that it could originally handle was limited for the caller throughput that was expected, and there were issues with scalability. With increased load, there were also stability issues. Extending Asterisk to provide related services, such as voice mail, was not an easy venture. Combined with issues relating to licensing and support going forward, another option was needed.

FreeSWITCH

In 2006, a group of former contributing Asterisk developers created an alternative solution called FreeSWITCH. Inspired by the modular design of the Apache Web Server, their goal was to use this modular approach to produce improved scalability and stability over multiple platforms. Built upon a state-machine model, it was designed so that each call/channel operates on its own separate and unique thread. Freely available open source components were used as building blocks, such as the Sofia-SIP open source SIP user agent library developed at Nokia Research Centre. A new open-source telephony platform was born. Overall, there are a number of key areas in which FreeSWITCH has some advantages over Asterisk:

Performance: Although the FreeSWITCH project team does not release official performance numbers, there are many sources for this information available. Reported gains have been as much as a four- to ten-times improvement in the number of concurrent calls supported. The addition of the Sofia-SIP stack provides a reliable, industry-proven implementation for communications control.

Stability: The Asterisk design relied upon shared resources, including threads, to provision multiple calls. This resulted in occasional deadlocking, race conditions, and potential data corruption in high capacity environments. FreeSWITCH was designed so that each call has unique control of its own resources, and so that shared resources are managed by core functionality through a layered API.

Versatility: Asterisk does provide a versatile platform for extending functionality and building new applications; however, FreeSWITCH provides even more. Both allow interfacing with languages and environments that use streams and sockets for communications. FreeSWITCH, however, supports multiple languages and environments such as C/C++, Python, Perl, Lua, JavaScript and .NET. The FreeSWITCH core library is also easily embeddable in other applications.

Configuration/Design: Sometimes cited as an advantage, Asterisk utilizes plain text files in its approach to configuration and dialplan design, which can simplify administration and setup. Conversely, FreeSWITCH configuration is based upon XML, which may make manual maintenance of configuration files a bit more involved. There are other open source applications available that may simplify that task for FreeSWITCH, however. Where XML becomes an advantage is in automating these tasks. FreeSWITCH also adds better support for regular expressions, and more call properties against which to match, allowing for more advanced dialplan design. SQL is also available and can be extended for even more advanced administration features. A growing FreeSWITCH development community also assures a vibrant platform on which new features can be innovated.

The Right Tool For The Job: While Asterisk is a bit more mature, and possibly more suitable for more specific and traditional PBX requirements, FreeSWITCH is a much better solution when the requirements are more diverse. It is a platform that offers more possibilities beyond simple VoIP telephony. As Anthony Minessale, the author of FreeSWITCH, has stated, “Asterisk is an open source PBX. FreeSWITCH is an open source soft switch”. For performance, stability and versatility reasons, FreeSWITCH is the right tool for a diverse VOIP solution.
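As an illustration of the Configuration/Design point about automating XML, the sketch below generates one dialplan extension from Python's standard library. The element and attribute names (`extension`, `condition` with `field` and `expression`, `action` with `application` and `data`) follow the usual FreeSWITCH XML dialplan layout; the `build_extension` helper, the extension name, the pattern, and the bridge target are hypothetical values chosen for the example.

```python
import re
import xml.etree.ElementTree as ET


def build_extension(name: str, pattern: str, application: str, data: str) -> ET.Element:
    """Build one <extension> that matches the dialed number against a regex."""
    # Fail early on a malformed pattern. This checks Python regex syntax,
    # which is close to, but not identical to, the PCRE syntax FreeSWITCH uses.
    re.compile(pattern)
    extension = ET.Element("extension", {"name": name})
    condition = ET.SubElement(
        extension,
        "condition",
        {"field": "destination_number", "expression": pattern},
    )
    ET.SubElement(condition, "action", {"application": application, "data": data})
    return extension


# Hypothetical rule: four-digit numbers starting with 10 are bridged to a local user.
extension = build_extension("local_extensions", r"^(10[0-9]{2})$", "bridge", "user/$1")
xml_text = ET.tostring(extension, encoding="unicode")
```

Because the rule is ordinary data until it is serialized, the same helper could be driven from a database query or a provisioning script, which is where an XML-based configuration tends to pay off.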

“Tale of the Tape”

[Image: Asterisk vs. FreeSWITCH]

Northforge

Northforge has the necessary skills and knowledge to make VOIP solutions happen. Industry knowledge of telephone switching and networking, with a focus on VoIP protocols and SIP, allows Northforge engineers to design modules on the FreeSWITCH platform to meet new goals. Strong development skills in C, Python and Perl, combined with experience with XML, embedded Linux and varied database technologies, not only allow Northforge to implement FreeSWITCH as the core of a new solution, but also allow other technologies to be integrated for more comprehensive solutions. With FreeSWITCH at the new core of a VoIP/multimedia-based service, there are significant advantages including performance and functionality. Combined with its advanced SIP functionality and increased codec support, FreeSWITCH opens up new possibilities.

July 2, 2013 – Measuring the Business Value of Cloud Computing

Source: Treadway, John, April 26, 2013. “Measuring the Business Value of Cloud Computing.” http://www.cloudtp.com/measuring-the-business-value-of-cloud-computing/

My favorite and least favorite question I get is the same: “Can you help me build a business case and ROI for cloud computing?” Well, yes… and no. The issue is that cloud computing has such a massive impact on how IT is delivered that many of the metrics and KPIs that are typically used at many enterprises don’t capture it. I mean, how do you measure agility – really? In the past I have broken this down into 3 buckets. Although some people have more, these are the key three:

Agility

Agility is reducing cycle time from ideation to product (or system delivery), which is incredibly difficult to measure given that it’s hard to do apples to apples when every product/project is unique. You can do this in terms of Agile methodology task points and the number of points per fixed time frame sprint on average over time. Most IT shops do not really measure developer productivity in any way at the moment so it’s pretty hard to get the baseline let alone any changes. Agility, like developer productivity, is notoriously difficult to quantify. I have done some work on quantifying developer downtime and productivity, but Agility is almost something you have to take on faith. It’s the real win for cloud computing, no matter how else you slice it.

Efficiency

In a highly automated cloud environment with resource lifecycle management and open self-service on-demand provisioning, the impetus for long-term hoarding of resources is eliminated. Reclamation of resources, only using what you need today because it’s like water (cheap and readily available), coupled with moving dev/test tasks to public clouds when at capacity (see Agility above), can reduce the dev/test infrastructure footprint radically (50% or more). Further, elimination of manual processes will reduce labor as an input to TCO for IT. In a smaller dev/test lab I know of, with only 600 VMs at any given time, 4 FTE onshore roles were converted to 2 FTE offshore resources.

There’s a very deep book on this topic that came out recently from Joe Weinman called Cloudonomics (www.cloudonomics.com). One of the key points is to be able to calculate the economics of a hybrid model where your base level requirements are met with a fixed infrastructure and your variable demand above the base is met with an elastic model. A key quote (paraphrase): “A utility model costs less even though it costs more.” Cloudonomics is based on a paper entitled The Mathematical Proof of the Inevitability of Cloud Computing, which can be summarized in the following graphic:

[Figure: Mathematical Proof of the Inevitability of Cloud Computing. Source: Joe Weinman in “Cloudonomics”]

A hybrid model is the most cost-effective, which is “obvious” on the surface but now rigorously proven (?) by the math. P = Peak. T = Time. U = the utility price premium.

If you add the utility pricing model in Joe Weinman’s work to some of the other levers I listed above, you get a set of interesting metrics. Most IT shops will focus on this to provide the ROI only. They are the ones who are missing the key point on Agility. However, I do understand the project budgeting dance, and if you can’t show an ROI that the CFO will bless, you might not get the budget unless the CEO is a true believer.
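The hybrid argument above can be checked with a few lines of arithmetic. The Python sketch below uses a deliberately simplified cost model, not the formulation from the book: owned capacity costs 1 per unit per period whether or not it is used, utility capacity costs the premium U per unit per period but only when used, and the demand series is invented for the example. The function names are hypothetical.

```python
from typing import List, Tuple


def hybrid_cost(demand: List[int], fixed_capacity: int, utility_premium: float) -> float:
    """Total cost when demand above the owned capacity overflows to a utility."""
    # Owned capacity is paid for in every period, used or not.
    owned = fixed_capacity * len(demand)
    # Overflow is paid for only when it happens, at the utility premium.
    overflow = sum(max(0, d - fixed_capacity) for d in demand)
    return owned + utility_premium * overflow


def cheapest_fixed_capacity(demand: List[int], utility_premium: float) -> Tuple[int, float]:
    """Try every owned-capacity level from 0 to the peak and keep the cheapest."""
    if not demand:
        raise ValueError("demand must not be empty")
    costs = [(hybrid_cost(demand, c, utility_premium), c) for c in range(max(demand) + 1)]
    best_cost, best_capacity = min(costs)
    return best_capacity, best_cost


# Made-up demand: a steady base of 2 units with a single spike to 10.
demand = [2, 2, 2, 2, 2, 2, 2, 2, 10, 2]
U = 2.0  # utility capacity costs twice as much per unit-period as owned capacity

all_utility = hybrid_cost(demand, 0, U)          # 56.0
all_owned = hybrid_cost(demand, max(demand), U)  # 100.0
best_capacity, best_cost = cheapest_fixed_capacity(demand, U)  # (2, 36.0)
```

With these numbers, owning only the base of 2 units and renting the spike costs 36, against 56 for renting everything and 100 for owning the peak, even though each rented unit costs twice as much. That is the sense in which a utility model can cost less even though it costs more.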

Quality

What is the impact of removing human error (though initially inserting systematic error until you work it through)? Many IT shops still provision security manually in their environments, and there are errors. How do you quantify the reputation risk of allowing an improperly secured resource to be used to steal PII data? It’s millions or worse. You can quantify the labor savings (Efficiency above), but you can also show the reduction in operational risk in IT through improved audit performance and easier regulatory compliance certification. Again, this is all through automation. IT needs to get on the bandwagon and understand the fundamental laws of nature here: for 50-80% of your work even in a regulated environment, a hybrid utility model is both acceptable (risk/regulation) and desirable (agility, economics, and quality).

Do a Study?

The only way to break all of this down financially is to do a Value Engineering study and use this to do the business case. You need to start with a process review from the outside (developer) in (IT) and the inside (IT) out (production systems). Show the elimination of all of the manual steps. Show the reduced resource footprint and related capex by eliminating hoarding behavior. Show reduced risk and lower costs by fully automating the provisioning of security in your environment. Show the “cloudonomics” of a hybrid model to offset peak demand and cyclicality or to eliminate or defer the expense of a new data center (that last VM with a marginal cost of $100 million anybody?).

History Lesson

In 1987 the stock market crashed and many trading floors could not trade because they lacked real-time position keeping systems. Traders went out and bought Sun workstations, installed Sybase databases, and built their own. They didn’t wait for IT to solve the problem – they did it themselves. That’s what happens with all new technology innovation. The same thing happened with Salesforce.com. Sales teams just started using it and IT came in afterwards to integrate and customize it. It was obviously a good solution because people were risking IT’s displeasure by using it anyway. If you really want to know if cloud computing has any business value, take a look at your corporate credit card expenses and find out who in your organization is already using public clouds – with or without your permission. It’s time to stop calculating possible business value and start realizing actual returns from the cloud.

About the author: John Treadway is a Senior Vice President for Cloud Technology Partners.

June 10, 2013 – Advancing Network and Security Performance

Source: Smith, Matt, May 17, 2013. “Advancing Network and Security Performance.” http://blog.smartbear.com/performance/advancing-network-and-security-performance/

While server virtualization has actually been available for more than a decade, many IT professionals still refer to it as a relatively new technology. The comfort and familiarity of a world made up of only physical servers caused many to resist even looking into the virtual platform, but changes in the overall technology landscape are forcing some to re-think implementing this technology, especially because of its enhanced performance. Here, we will take a closer look at some of those benefits.

Cost Reduction

One of the first benefits noted by IT professionals was the significant cost reduction in moving network servers from physical-only to virtual servers. One very sophisticated virtual server, with all the respective bells and whistles needed to keep it operating at maximum efficiency, may come with a hefty price tag; however, it generally only comes to a fraction of the cost of replacing a number of physical servers and equipping each of them with the respective software necessary. Plus, the energy costs would obviously drop significantly.

Recovery Time

The benefits hardly stop there, though. The ability to recover from server malfunctioning increases exponentially with a virtual server. For example, if your network operates with eight individual physical servers and one of them experiences a major malfunction, that server stays down until the computer can be fixed (which, as we all know, can take days). By stark contrast, with a network running on, say, two or three virtual servers, when one of those servers goes down, the machines operating off that one failed virtual server can be brought up again on one of the other servers in – get this – mere minutes.

Task Management

Wait, it gets better. With a virtual server, you almost always purchase a hypervisor operating system. In effect, the hypervisor software manages the tasks in the cloud to assure everything is functioning properly and in an orderly manner. By investing in a more advanced hypervisor operating system, downtime is virtually eliminated because the software can be programmed to instantly and automatically transfer the operating functions from a failed system to the other server(s).
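The automatic transfer described above can be pictured as a small placement problem: when a host fails, each of its virtual machines is restarted on whichever surviving host is least loaded. The Python sketch below illustrates only that idea. The `fail_over` function and the host and VM names are hypothetical, and the number of VMs per host stands in for the CPU, memory and storage checks a real hypervisor would make.

```python
from typing import Dict, List


def fail_over(placement: Dict[str, List[str]], failed_host: str) -> Dict[str, List[str]]:
    """Return a new placement with the failed host's VMs moved to the survivors."""
    if failed_host not in placement:
        raise KeyError(failed_host)
    # Copy the surviving hosts so the caller's placement is left untouched.
    survivors = {host: list(vms) for host, vms in placement.items() if host != failed_host}
    if not survivors:
        raise RuntimeError("no surviving host to restart VMs on")
    for vm in placement[failed_host]:
        # Pick the host with the fewest VMs; break ties by host name.
        target = min(survivors, key=lambda host: (len(survivors[host]), host))
        survivors[target].append(vm)
    return survivors


# Illustrative cluster of three hosts.
placement = {
    "host-a": ["vm1", "vm2", "vm3"],
    "host-b": ["vm4"],
    "host-c": ["vm5", "vm6"],
}
after_failure = fail_over(placement, "host-a")
```

In this example the three machines from the failed host end up spread across the two survivors, so neither one absorbs the whole load.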

Data Backup

Anyone who has worked on a network for any length of time probably has nightmares surrounding disaster recovery. When your network of physical-only servers is lost due to some unexpected occurrence (like tornadoes, fires, etc.), it can take days or even weeks of nearly round-the-clock effort to restore each of those servers, especially when your network is operating from a number of separate physical-only servers. Virtualization eliminates the challenge of individually rebuilding each server; instead, the network administrators can rebuild one single host server and reinstall the hypervisor software. Virtual machines are backed up onto tape; thus, once the host server and hypervisor are back in place, the data backed up onto that tape can easily be used to restart the virtual machines. This process will likely save you days, or even weeks, in the event of an actual disaster situation.

The virtualization world, no matter which platform you choose, will offer up such buzzwords as distributed resource scheduling, fault tolerance, live migration, high availability and storage migration. In short, these buzzwords simply suggest an ability to continue running seamlessly as well as the capacity to recover swiftly from any unexpected outages. This capacity is virtually unheard of with physical servers and can make your work life much easier.

Finally, speaking of network buzzwords, you’ve undoubtedly been inundated with talk about moving, sooner or later, to “the cloud.” Whether you’re for it or against it, the move to network virtualization brings you one giant leap closer to a move to “the cloud” whenever your organization chooses to implement it.

To properly deploy your network into virtualization, the same types of best practices that would routinely be observed with physical servers should also be in effect with virtualized servers. The lack of proper management of a virtual system can leave your network open to malware and other security threats. However, with the proper understanding, implementation and backups, the virtual network can open you up to performance and security you previously thought technologically improbable.

About the author: Matt Smith is a Dell employee who writes to help raise awareness on the topic of Virtualization and other network management subjects.

April 16, 2013 – IMS: Delivering Multimedia over Next Generation Networks

The IP Multimedia Subsystem (IMS) is a standards-based, all-IP next generation network architecture framework. It enables openness, flexibility, adaptability, functional reusability, policy access and charging controls for the delivery of legacy and innovative services across fixed and mobile access networks that support person-to-person, person-to-machine, and machine-to-machine (M2M) communications. IMS gives network operators and service providers the ability to control and charge for the traffic and services offered to the end users. Users served by an IMS network are able to access their communication services and applications while roaming as if they were in their home network, and from their device of choice.

Additionally, IMS encourages a multi-vendor offering of a variety of data products, services and solutions. Network operators and service providers are no longer locked in to a one-vendor-does-all offering. With IMS, network operators and service providers have access to a wider set of innovative products while being able to place controls on CAPEX and OPEX.

The IMS network architecture defines several functional components that could each be offered in individual nodes or combined in a single network node. The IMS functional components have been defined to serve specific network planes:

  • user plane (access control, routing and negotiation of multimedia media streams – voice, video, messaging),
  • control plane (routing and control of signaling protocols such as SIP, Diameter to enable the communication set up between the interested parties), and
  • services plane (triggering and interworking with network provider and 3rd party applications to deliver innovative communication services) 

IMS open interfaces and procedures have been defined and standardized by respected world organizations and standards bodies such as 3GPP, 3GPP2, ITU, IETF, PacketCable, OMA, and GSMA. The defined standard interfaces and procedures enable:

  • the interworking of the IMS functional components within and across network planes,
  • deployment of multi-vendor best of breed IMS products and solutions,
  • independence from specific network access technology,
  • flexible service access by users at any location, at any time, from different devices and client applications

The IMS network infrastructure can be seen as the glue required to support a feature rich and flexible ecosystem that can satisfy present and future communication service requirements  based on the strengths and individual capabilities of components and network technologies across domains:

  • circuit switch,
  • cable and fixed packet data,
  • 2G/2.5G/3G/4G LTE and WLAN mobile networks

At the core of the network control plane a set of key functional entities called “Call Session Control Function” (CSCF) have been identified. Four kinds of CSCFs are currently defined:

  • Proxy CSCF (P-CSCF)
  • Interrogating CSCF (I-CSCF)
  • Serving CSCF (S-CSCF)
  • Emergency CSCF (E-CSCF)

Some or all of these CSCF functions can be combined in a single network element or can be deployed in individual network elements. The CSCFs are responsible for establishing and controlling multimedia, single-party and multi-party communication sessions, ensuring secure and high quality communication, and supporting user roaming, service triggering and delivery across different network access technologies, network domains and communication devices.

Northforge’s software engineering consultants have contributed to the enhancements and innovation of world class IMS products that enable communication services and solutions for fixed, WLAN, and 2.5G/3G/4G-LTE mobile networks. The Northforge IMS product development team, working closely with customers, has analyzed, proposed, designed and implemented key software feature optimizations, as well as innovative enhancements, in compliance with the latest 3GPP IMS and IETF SIP standards.


February 4, 2013 – Puzzled over a Cloud Launch

Puzzled over how to approach a cloud application or migration? Consider an RPM (Rapid Prototype Model).

Preparing to migrate your technology application to the cloud is challenging and in some cases absolutely puzzling. Conceptualizing everything from interface design to functionality is difficult even for the most advanced developers and product engineers. That’s where Northforge Innovations’ RPM (Rapid Prototype Model) is invaluable, creating a working model of your application in as little as two weeks. RPM helps address the variables and engineering challenges of any cloud migration project by developing a functional model of the application in advance of full-on development.

RPM delivers:

  • A functional prototype of the application allowing for further conceptualization of design or advancement of application build requirements
  • Product roadmap planning – enabling product managers to define requirements for current and future application releases
  • Cost analysis of advancing project development allowing you to define necessary and nice to have feature sets and associated costs for every aspect of development
  • A usable product demo for internal review and evaluation from customers and technology partners allowing you to alter features and functionality in advance of finalizing design and development strategy


January 29, 2013 – AppDev – More than Mobile

Engineering on the Move

For mobility, developing for devices is now the standard, because devices are how employees, customers and consumers are engaging with your technology and your competitors’ technology. There are multiple choices available to develop, manage, resell and/or host products. However, it is still an evolving technology and infrastructure, which is why the right choices need to be made at the initialization stage to ensure flexibility is integrated throughout the product and business sides of your company. Products available on mobile devices need to keep pace with the exponential revolution in technology and devices. However, the majority of software products continue to be sold with the traditional on-premise deployment model. Technology companies that have yet to transition to a SaaS delivery model face a number of challenges:

  • Engineering and maintenance of legacy technology
  • Integration of other vendors’ SaaS products with premise based technology
  • Increased competition and more competitive pricing models from new or evolved SaaS-based competitors

What are your challenges? Add to our list, or call us and see how we can help you address those challenges today and for the future.

December 6, 2012 – Development Strategy 2.0

If your next development strategy or software release doesn’t involve the cloud or SaaS, it should. Here are 10 reasons for making the move to the cloud from one of the leaders of the SaaS cloud movement, Salesforce.com.

Simply put, cloud computing is computing based on the internet. Where in the past people would run applications or programs from software downloaded on a physical computer or server in their building, cloud computing allows people to access the same kinds of applications through the internet. It is a solution growing in popularity, especially amongst SMEs. The CRN predicts that by 2014, small businesses will spend almost $100 billion on cloud computing services. So why are so many businesses moving to the cloud? It’s because cloud computing increases efficiency, helps improve cash flow and offers many more benefits.

The second a company needs more bandwidth than usual, a cloud-based service can instantly meet the demand because of the vast capacity of the service’s remote servers. In fact, this flexibility is so crucial that 65% of respondents to an InformationWeek survey said “the ability to quickly meet business demands” was an important reason to move to cloud computing.

When companies start relying on cloud-based services, they no longer need complex disaster recovery plans. Cloud computing providers take care of most issues, and they do it faster. Aberdeen Group found that businesses which used the cloud were able to resolve issues in an average of 2.1 hours, nearly four times faster than businesses that didn’t use the cloud (8 hours). The same study found that mid-sized businesses had the best recovery times of all, taking almost half the time of larger companies to recover.

In 2010, UK companies spent 18 working days per month managing on-site security alone. But cloud computing suppliers do the server maintenance, including security updates, themselves, freeing up their customers’ time and resources for other tasks.

Cloud computing services are typically pay as you go, so there’s no need for capital expenditure at all. And because cloud computing is much faster to deploy, businesses have minimal project start-up costs and predictable ongoing operating expenses.

Cloud computing increases collaboration by allowing all employees, wherever they are, to sync up and work on documents and shared apps simultaneously, and follow colleagues and records to receive critical updates in real time. A survey by Frost & Sullivan found that companies which invested in collaboration technology had a 400% return on investment.

As long as employees have internet access, they can work from anywhere. This flexibility positively affects knowledge workers’ work-life balance and productivity. One study found that 42% of working adults would give up some of their salary if they could telecommute, and on average they would take a 6% pay cut.

According to one study, “73% of knowledge workers collaborate with people in different time zones and regions at least monthly”. If a company doesn’t use the cloud, workers have to send files back and forth over email, meaning only one person can work on a file at a time and the same document has tonnes of names and formats. Cloud computing keeps all the files in one central location, and everyone works off of one central copy. Employees can even chat to each other whilst making changes together. This whole process makes collaboration stronger, which increases efficiency and improves a company’s bottom line.

Some 800,000 laptops are lost each year in airports alone. This can have some serious monetary implications, but when everything is stored in the cloud, data can still be accessed no matter what happens to a machine.

The cloud grants SMEs access to enterprise-class technology. It also allows smaller businesses to act faster than big, established competitors. A study on disaster recovery eventually concluded that companies that didn’t use the cloud had to rely on tape backup methods and complicated procedures to recover: slow, laborious things which cloud users simply don’t use, allowing David to once again out-manoeuvre Goliath.

Businesses using cloud computing only use the server space they need, which decreases their carbon footprint. Using the cloud results in at least 30% less energy consumption and carbon emissions than using on-site servers. And again, SMEs get the most benefit: for small companies, the cut in energy use and carbon emissions is likely to be 90%.

November 6, 2012 – What’s Your Cloud Vision?

Business Challenges for Cloud Engineering 101

The challenges that initially sparked software and technology companies to move to a resource model for product development, engineering, testing, and support have grown exponentially with the advent of SaaS and Cloud technology. Today’s Cloud and SaaS applications bring forward completely new and evolving challenges. One is finding the right skill sets and engineering talent to architect new products or migrate existing products for SaaS, Cloud or Mobile applications. SaaS and Cloud technology also opens up entirely new questions. A question that comes up at Northforge, “How can we ensure our customers’ data is secure and scalable if they are on the public cloud, or should we host them in-house?”, is one that many traditional technology companies aren’t adept at answering. Integrating premise-based applications with SaaS applications in the Cloud is another significant challenge. Mobility offers huge innovation for companies but also huge challenges. The challenge of finding and recruiting the technology talent to build solutions that can support the new SaaS/Cloud reality is a difficult one to solve without considering leveraging the talent and expertise of the Northforge Innovations engineering team.

As an example, Google’s OpenID authenticated token only tells a web application that the user is who the user claims to be, whereas an API provided by Google will give a third party limited access to, say, your Google Calendar, provided that you grant access to it explicitly. Other OpenID providers include Yahoo, Blogger and Flickr, just to name a few.

OpenID for Authentication: An Example