With the growth of mobile technologies, the shift to the cloud paradigm, and increased demand for content delivery networks, traffic classification is becoming increasingly complex. In this blog, let’s discuss the need for more advanced traffic classification in an SDN environment, starting with a definition of traffic classification as it applies to developing traffic classification products.
Definition of network traffic classification
Network traffic classification categorizes traffic flows into different classes determined by networking protocols (L2-L7), traffic types, services and applications. Common purposes of such classification include:
- Traffic profiling
- Distributed Denial of Service (DDoS) defense
- Content-aware forwarding and load balancing
- Data leak prevention
- Service level agreement enforcement
- Network monitoring and troubleshooting
A traffic flow is represented by the classic 5-tuple: the source IP, the source port, the destination IP, the destination port and the protocol number.
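A minimal sketch of using the 5-tuple as a flow key, assuming packets arrive as already-parsed `(five_tuple, packet_length)` records. The names `FiveTuple`, `canonical_key` and `group_into_flows` are hypothetical, and treating a flow as bidirectional (both directions share one key) is an assumption; some classifiers key each direction separately.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class FiveTuple(NamedTuple):
    """The classic flow key: source/destination IP and port plus protocol number."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: int  # IANA protocol number, e.g. 6 = TCP, 17 = UDP


def canonical_key(t: FiveTuple) -> FiveTuple:
    """Map both directions of a conversation to the same key.

    Assumption: flows are bidirectional. Endpoints are ordered lexicographically;
    the order is arbitrary but consistent, which is all a lookup key needs.
    """
    if (t.src_ip, t.src_port) <= (t.dst_ip, t.dst_port):
        return t
    return FiveTuple(t.dst_ip, t.dst_port, t.src_ip, t.src_port, t.protocol)


def group_into_flows(packets: Iterable[Tuple[FiveTuple, int]]) -> Dict[FiveTuple, List[int]]:
    """Group (five_tuple, packet_length) records into per-flow lists of lengths."""
    flows: Dict[FiveTuple, List[int]] = defaultdict(list)
    for five_tuple, length in packets:
        flows[canonical_key(five_tuple)].append(length)
    return dict(flows)
```

A client request and the server's reply land in the same flow, while a DNS query from the same host forms a separate flow.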
Levels of Classification
There are four main levels of classification:
- Protocol
- Web service
- Application protocol
- Standard protocol
Classifying at the protocol level means identifying an application or service flow by the protocols it uses, for example ETH, IP, TCP, DNS, HTTP, RTP, RTSP, SMTP and so on. Web service level classification inspects traffic flows more deeply to identify web applications such as Yahoo, Google or YouTube. There are two kinds of application classification. The first, application protocol classification, matches traffic flows to internet applications that communicate using both standard and proprietary protocols; Skype and uTorrent are typical examples. The second, standard protocol classification, matches applications that use only standard internet protocols. At these levels, a classifier that can only identify the traffic as HTTP or SSL has produced an incorrect or incomplete classification.
Classifier designs employ various techniques for identifying network traffic, ranging from port-based and payload-based identification to statistical and behavioral classification. Port-based classification inspects the port numbers in packet headers and matches them against the well-known TCP or UDP port numbers registered with IANA. Payload-based classification is the classic deep packet inspection (DPI) method, in which the software searches for particular patterns (signatures) inside individual packets or across the packets of a flow. Because the payload-based approach inspects every packet of the traffic, it comes with a high computational cost.
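The two basic techniques can be sketched as follows, assuming the classifier sees the protocol number, both ports and the first payload bytes of a flow. The port table is a small illustrative subset of the IANA registry, and the regular-expression signatures are hypothetical examples, not signatures taken from any DPI product.

```python
import re
from typing import Optional

# Small, illustrative subset of IANA-registered ports: (IP protocol number, port) -> label.
WELL_KNOWN_PORTS = {
    (6, 25): "SMTP",
    (6, 53): "DNS",
    (17, 53): "DNS",
    (6, 80): "HTTP",
    (6, 443): "HTTPS",
    (6, 554): "RTSP",
}


def classify_by_port(protocol: int, src_port: int, dst_port: int) -> Optional[str]:
    """Port-based identification: look up the server-side port (try dst first, then src)."""
    for port in (dst_port, src_port):
        label = WELL_KNOWN_PORTS.get((protocol, port))
        if label is not None:
            return label
    return None


# Hypothetical signatures, anchored at the start of the first payload of a flow.
SIGNATURES = [
    ("HTTP", re.compile(rb"^(GET|POST|HEAD|PUT|DELETE|OPTIONS) \S+ HTTP/1\.[01]\r\n")),
    ("SMTP", re.compile(rb"^220[ -][^\r\n]*SMTP", re.IGNORECASE)),  # server greeting banner
    ("TLS", re.compile(rb"^\x16\x03[\x00-\x04]")),  # TLS/SSL handshake record header
]


def classify_by_payload(payload: bytes) -> Optional[str]:
    """Payload-based identification: return the label of the first matching signature."""
    for label, pattern in SIGNATURES:
        if pattern.search(payload):
            return label
    return None
```

For a flow on TCP port 80 whose first payload is a TLS handshake, `classify_by_port` reports HTTP while `classify_by_payload` reports TLS, which illustrates how port-only classification can be fooled. Production engines typically compile many signatures into a multi-pattern matcher rather than looping over regular expressions one by one.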
Port-based and payload-based inspection are the two methods most widely implemented in DPI engines. A classifier that implements only port-based inspection is no longer effective for internet traffic, mainly because illegitimate web services and applications hide behind well-known ports. At the same time, the payload-based method cannot inspect encrypted payloads, so it becomes ineffective for encrypted traffic flows. Another approach to classifying encrypted traffic is the “flow association” technique, which recognizes a flow from information determined beforehand, typically learned from another, already classified flow. For example, SIP signalling packets might contain the port numbers and IP addresses that the associated voice traffic will use.
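A minimal sketch of flow association for the SIP example, assuming the SDP body has already been extracted from a SIP message whose signalling is visible to the classifier (not itself encrypted). Only the `c=` and `m=` lines of SDP (RFC 4566) are parsed; `expected_media_flows`, `FlowAssociator` and the `RTP/<media>` labels are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def expected_media_flows(sdp_body: str) -> List[Tuple[str, int, str]]:
    """Extract (ip, port, media type) for each accepted media stream in an SDP body.

    Handles only c= and m= lines; a media-level c= overrides the session-level one.
    A port of 0 means the stream was rejected, so it is skipped.
    """
    session_ip: Optional[str] = None
    media: List[dict] = []
    for raw in sdp_body.splitlines():
        line = raw.strip()
        if line.startswith("c="):
            # c=<nettype> <addrtype> <connection-address>[/ttl][/count]
            addr = line[2:].split()[2].split("/")[0]
            if media:
                media[-1]["ip"] = addr
            else:
                session_ip = addr
        elif line.startswith("m="):
            # m=<media> <port>[/count] <proto> <fmt> ...
            fields = line[2:].split()
            media.append({"type": fields[0], "port": int(fields[1].split("/")[0]), "ip": None})
    return [(m["ip"] or session_ip, m["port"], m["type"]) for m in media if m["port"] != 0]


class FlowAssociator:
    """Hypothetical helper: remembers flows announced in signalling and labels them later."""

    def __init__(self) -> None:
        self.expected: Dict[Tuple[str, int], str] = {}

    def learn_from_sdp(self, sdp_body: str) -> None:
        for ip, port, media_type in expected_media_flows(sdp_body):
            self.expected[(ip, port)] = f"RTP/{media_type}"

    def classify(self, dst_ip: str, dst_port: int) -> Optional[str]:
        """Label a new flow by its destination alone, without inspecting its payload."""
        return self.expected.get((dst_ip, dst_port))
```

Because the label comes from the signalling flow, the media flow itself can stay encrypted and still be recognized.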
The statistical classifier tackles the problem of identifying classes of application flows that are not easily detected by the port-based and payload-based methods, either because the traffic is encrypted or because the flows are asymmetric. Statistical classification is considered a more advanced technique and is being actively studied in academia; newer DPI products are incorporating it.
Statistical classification is based on statistical characteristics of traffic, data mining techniques and, more specifically, machine learning (ML) algorithms. More advanced statistical classification requires machine learning because it must learn to distinguish traffic patterns from large datasets.
Machine learning uses characteristics of a flow's sequence of packets to identify the flow's class. The research literature broadly defines four types of machine learning: classification, clustering, association and numeric prediction. Only classification and clustering are widely adopted in network traffic classification. Briefly, classification uses large sample datasets with known labels to build classification rules, which a software engine then applies to identify flows in new datasets. Because this learning requires labelled datasets, it is a form of supervised learning. Clustering first identifies patterns of flows and then groups flows with similar patterns into clusters. Clustering is unsupervised learning, because users do not provide any labelled datasets for building the patterns.
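To make the supervised/unsupervised distinction concrete, here is a minimal sketch using the simplest representative of each: a nearest-centroid classifier (supervised) and k-means (unsupervised). These are stand-ins chosen for brevity, not the algorithms any particular study or product uses. The two features (mean packet size in bytes, mean inter-arrival time in ms) and the class labels are hypothetical; a real classifier would also normalize features, since raw Euclidean distance here is dominated by packet size.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def centroid(points: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of feature vectors."""
    return tuple(sum(p[i] for p in points) / len(points) for i in range(len(points[0])))


# Supervised: learn from labelled flows (nearest-centroid classifier).
def train(labelled: List[Tuple[Vector, str]]) -> Dict[str, Vector]:
    """Build one 'rule' per known class: the centroid of its training flows."""
    by_label: Dict[str, List[Vector]] = {}
    for features, label in labelled:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(xs) for label, xs in by_label.items()}


def predict(model: Dict[str, Vector], features: Vector) -> str:
    """Assign a new flow to the class whose centroid is closest."""
    return min(model, key=lambda label: math.dist(model[label], features))


# Unsupervised: group unlabelled flows (k-means clustering).
def kmeans(points: List[Vector], k: int, iterations: int = 20, seed: int = 0):
    """Group flows with similar features; clusters stay unnamed until someone inspects them."""
    centers = random.Random(seed).sample(points, k)
    clusters: List[List[Vector]] = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(centers[i], p))
            clusters[nearest].append(p)
        # Recompute centers; keep the previous center if a cluster came out empty.
        centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters
```

`train` needs the labels up front, which is what makes it supervised; `kmeans` only ever sees the feature vectors.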
Simple statistical classification uses readily available metrics that can be collected and computed to determine the type of a traffic flow. For example, some classifiers use a sample of 10 packets to compute the mean packet size, mean packet inter-arrival time and mean bit rate in order to classify Skype traffic. There are many other statistical approaches to classifying such traffic. All of these methods attempt to classify traffic flows at a lower computational cost and without deploying dedicated hardware for each type of application.
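The per-flow metrics in the example can be computed cheaply from packet timestamps and sizes alone, without touching the payload. This is a sketch under the assumption that packets arrive as `(arrival_time_seconds, size_bytes)` pairs; the function name is hypothetical, and since the thresholds used to recognize Skype are not given here, the sketch stops at producing the feature vector that a rule set or ML model would consume.

```python
from statistics import mean
from typing import List, Optional, Tuple


def early_flow_features(
    packets: List[Tuple[float, int]], n: int = 10
) -> Optional[Tuple[float, float, float]]:
    """Compute simple statistics over the first n packets of a flow (n >= 2).

    packets: (arrival_time_seconds, size_bytes) pairs in arrival order.
    Returns (mean packet size in bytes, mean inter-arrival time in seconds,
    mean bit rate in bit/s), or None until the flow has n packets.
    """
    if n < 2 or len(packets) < n:
        return None
    window = packets[:n]
    times = [t for t, _ in window]
    sizes = [s for _, s in window]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    duration = times[-1] - times[0]
    # Approximate rate: all bytes in the window over the first-to-last arrival span.
    bit_rate = sum(sizes) * 8 / duration if duration > 0 else float("inf")
    return mean(sizes), mean(gaps), bit_rate
```

Only the first n packets are examined, so the cost per flow is fixed no matter how long the flow lasts.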
The behavioral classification technique observes all of the network traffic of an endpoint (host) and seeks to identify the type of application by analyzing the traffic patterns that the target host generates. It is a holistic approach that can also be used to look for anomalies in a network.
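A minimal sketch of host-level behavioral profiling, assuming the classifier has outbound flow records of the form `(src_ip, dst_ip, dst_port)`. It summarizes each host by how many distinct peers and destination ports it contacts. The rule in `label_host` and its thresholds are hypothetical placeholders that illustrate the idea, not tuned or published values.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def host_profiles(flows: Iterable[Tuple[str, str, int]]) -> Dict[str, Dict[str, int]]:
    """Summarize per-host behavior from (src_ip, dst_ip, dst_port) flow records:
    how many distinct peers and distinct destination ports each host contacts."""
    peers: Dict[str, Set[str]] = defaultdict(set)
    ports: Dict[str, Set[int]] = defaultdict(set)
    for src, dst, dst_port in flows:
        peers[src].add(dst)
        ports[src].add(dst_port)
    return {host: {"peers": len(peers[host]), "ports": len(ports[host])} for host in peers}


def label_host(profile: Dict[str, int], min_peers: int = 50, min_ports: int = 50) -> str:
    """Hypothetical rule: talking to many peers on many different ports is typical of
    peer-to-peer or scanning behavior. Thresholds are placeholders, not tuned values."""
    if profile["peers"] >= min_peers and profile["ports"] >= min_ports:
        return "p2p-or-scan-like"
    return "client-server-like"
```

Unlike the per-flow techniques, this never looks at a single flow in isolation; the signal comes from the pattern across all of a host's flows.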
DPI in SDN
With the recent development of software defined networking (SDN), traffic classification can be deployed separately, as a service, in an SDN network. This approach removes DPI (payload-based classifier) software components from middle-boxes such as IDSs, IPSs, load balancers, firewalls and anti-malware appliances. The DPI is placed at a strategic location in the network so that each packet needs to be inspected only once, and the classification result can be reused by all the middle-boxes.
The DPI classifier could further be placed on dedicated hardware with network processors, providing much higher throughput, while the end-user applications do not need to run on that dedicated hardware. This simplifies the development, integration and deployment of traffic classification in end applications. The approach requires introducing a DPI controller to orchestrate the operation of multiple DPI engines and pass the results along to subscribers (the network application software). For example, an HTTP/HTTPS load-balancing system might be interested only in HTTP/HTTPS traffic classification results. In this approach, the load balancer does not need to re-inspect irrelevant packets, such as those of layer 2 protocols. It simply subscribes to HTTP and HTTPS traffic classification results, processes them (if needed) to determine the network load conditions, and acts accordingly.
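The controller/subscriber arrangement can be sketched as a simple publish/subscribe registry. This is an in-process stand-in under stated assumptions: a real deployment would carry results over a message bus or RPC, and `ClassificationResult`, `DpiController` and `LoadBalancerApp` are hypothetical names, not part of any product API.

```python
from collections import defaultdict
from typing import Callable, Dict, List, NamedTuple


class ClassificationResult(NamedTuple):
    flow_id: str  # e.g. a string rendering of the flow's 5-tuple
    label: str    # e.g. "HTTP", "HTTPS", "DNS"
    engine: str   # which DPI engine produced the result


Subscriber = Callable[[ClassificationResult], None]


class DpiController:
    """Hypothetical controller: DPI engines publish each flow's label once;
    network applications subscribe only to the labels they care about."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Subscriber]] = defaultdict(list)

    def subscribe(self, label: str, callback: Subscriber) -> None:
        self._subscribers[label].append(callback)

    def publish(self, result: ClassificationResult) -> int:
        """Deliver a result to every subscriber of its label; return how many were notified."""
        callbacks = self._subscribers.get(result.label, [])
        for callback in callbacks:
            callback(result)
        return len(callbacks)


class LoadBalancerApp:
    """Toy subscriber: counts classified flows per label as a crude load signal."""

    def __init__(self) -> None:
        self.flow_counts: Dict[str, int] = defaultdict(int)

    def on_result(self, result: ClassificationResult) -> None:
        self.flow_counts[result.label] += 1
```

The load balancer subscribes to "HTTP" and "HTTPS" only, so a DNS result published by an engine reaches no one and costs the load balancer nothing.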
A DPI solution with SDN usually means more centralized control of flows. In this case, statistical and behavioral classification techniques, combined with appropriate traffic steering techniques, can be applied to get the best results. DPI as a service, together with virtual network function (VNF) appliances, can be deployed using template-based network functions virtualization (NFV) orchestration. Redeploying a classifier, or deploying an additional one, should take much less time when new VNFs are onboarded.
Northforge has developed products that use each of the classification levels defined in this blog (Protocol, Web Service, Application Protocol and Standard Protocol). Most recently, it has been defining signatures that differentiate among Web Services, which has led to technology that classifies packets without decrypting or decompressing them.
Need help with traffic classification? Northforge has the skills and expertise to accelerate, secure and ensure the fidelity of data packets traversing the internet. Our focus on cloud infrastructure enables us to develop custom products that solve the most complex technological challenges.
Author: Michael L.