Implementing a hardware-accelerated IP route cache in Linux using a Broadcom StrataXGS multilayer switch
By using cost-effective switching solutions, such as those from Broadcom, equipment manufacturers can reduce the cost of their products while still meeting performance targets. An L3 switch can be implemented using hardware L2 packet switching and CPU-based IP forwarding. L2 data packets don’t need to be modified, so L2 throughput can reach 1 Gbps wire speed.
The Linux IP stack is robust and efficient. Combined with the many routing protocol suites available for it, both open source and proprietary, this makes Linux a common choice for implementing a software-based router.
Although the Linux IP forwarding code is very efficient, network equipment vendors tend to use underpowered CPUs to drive the cost down. Some Broadcom switches integrate a CPU to reduce the cost of a product even further. Moreover, the same CPU can be used for management, network control and L3 forwarding simultaneously.
This approach introduces performance issues under high traffic loads. For example, when the device forwards FTP traffic destined for another network element, device management suffers delays. A network administrator might even get a timeout while configuring the device under heavy traffic. Under more severe circumstances, network control packets (such as OSPF hello packets) are dropped or delayed until the routing protocol considers the link unreliable. When this happens, the routing software declares the link unavailable and notifies all routers on the network about a topology change. After the topology change, the FTP traffic is redirected to avoid the unreliable link. As the packet flow over that link decreases, the routing protocol restores the link. This link flapping usually continues until the FTP transfer is completed.
To solve the performance problem of a software-based router and avoid starving the management and control protocols, IP forwarding can be delegated to the hardware, freeing the CPU to handle more urgent packets.
One such solution, the Broadcom StrataXGS line of products, is a multilayer switch that can be used as an enterprise solution with full line speed L2 switching and line speed L3 forwarding for up to 2K most recently used destinations. The StrataXGS switch is usually equipped with a limited amount of memory for the L3 table. This L3 table is an exact match table and its content needs to be maintained by software running on the CPU.
At startup, the L3 table is empty and all IP packets are trapped (sent) to the Linux IP stack. Since the IP stack maintains the full routing table learned from the routing software running in user space, Linux is able to make IP forwarding decisions. As part of the destination IP address lookup, the Linux implementation creates a routing cache entry. With this routing cache, subsequent lookups for the same destination are much faster. When the routing cache entry is actually used for forwarding, a hardware L3 entry is added asynchronously, if needed. Once the L3 entry has been added to the hardware, the Broadcom switch stops trapping packets for that destination to the CPU and forwards them at wire speed.
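The slow-path/fast-path handoff described above can be modeled in a few lines of user-space Python. This is only a sketch under stated assumptions, not kernel or Broadcom SDK code: the names `L3Table`, `Router`, `forward` and `run_deferred_work` are hypothetical, the exact-match table is a plain dictionary, and the asynchronous hardware update is modeled as a queue drained by a separate call.

```python
import ipaddress
from collections import deque


class L3Table:
    """Hypothetical model of the switch's exact-match L3 table."""

    def __init__(self, capacity=2048):
        self.capacity = capacity
        self.entries = {}  # destination IP (str) -> next hop

    def lookup(self, dst):
        return self.entries.get(dst)

    def add(self, dst, next_hop):
        if dst not in self.entries and len(self.entries) >= self.capacity:
            return False  # table full: this destination stays on the slow path
        self.entries[dst] = next_hop
        return True


class Router:
    """Hypothetical model of the CPU side: routing table, cache, deferred work."""

    def __init__(self, routes, hw):
        # routes: list of (prefix, next_hop) learned from the routing software
        self.routes = [(ipaddress.ip_network(p), nh) for p, nh in routes]
        self.route_cache = {}   # destination IP -> next hop
        self.pending = deque()  # destinations waiting for a hardware entry
        self.hw = hw

    def _full_lookup(self, dst):
        # Longest-prefix match over the full routing table.
        addr = ipaddress.ip_address(dst)
        matches = [(net.prefixlen, nh) for net, nh in self.routes if addr in net]
        return max(matches, key=lambda m: m[0])[1] if matches else None

    def forward(self, dst):
        next_hop = self.hw.lookup(dst)
        if next_hop is not None:
            return ("hardware", next_hop)  # forwarded without involving the CPU
        # Hardware miss: the packet is trapped to the CPU.
        next_hop = self.route_cache.get(dst)
        if next_hop is None:
            next_hop = self._full_lookup(dst)
            if next_hop is None:
                return ("drop", None)
            self.route_cache[dst] = next_hop
        self.pending.append(dst)  # request the hardware entry asynchronously
        return ("software", next_hop)

    def run_deferred_work(self):
        # Models the asynchronous step that programs the hardware.
        while self.pending:
            dst = self.pending.popleft()
            next_hop = self.route_cache.get(dst)
            if next_hop is not None:
                self.hw.add(dst, next_hop)
```

In this model the first packet to a destination is forwarded in software, and packets arriving after `run_deferred_work()` has run are answered from the hardware table, provided the table still had room.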
An independent kernel activity, running in the background, periodically scans the L3 table. Any L3 entries not used for a specified time are removed to free up precious memory.
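One pass of such an aging scan might look like the following user-space sketch. It assumes that each hardware entry exposes a "hit" indication that is set when the entry is used and can be read and cleared by software; the `HwEntry` class, its fields and the `age_l3_table` function are hypothetical names chosen for illustration.

```python
class HwEntry:
    """Hypothetical model of one L3 table entry."""

    def __init__(self, next_hop, now):
        self.next_hop = next_hop
        self.hit = False       # assumed to be set by hardware on use
        self.last_used = now   # maintained by the scanning software


def age_l3_table(table, now, max_idle):
    """Run one aging pass over `table` (dict: destination -> HwEntry).

    Entries used since the previous pass get their timestamp refreshed;
    entries idle for at least `max_idle` time units are removed.
    Returns the list of removed destinations.
    """
    removed = []
    for dst, entry in list(table.items()):
        if entry.hit:
            entry.hit = False
            entry.last_used = now
        elif now - entry.last_used >= max_idle:
            del table[dst]
            removed.append(dst)
    return removed
```

The scan interval and `max_idle` are tuning parameters: a shorter idle time frees table space sooner but sends intermittent flows back to the CPU more often.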
When a routing cache entry is invalidated, because the link went down or the routing software removed the route, the corresponding L3 entry should be deleted from the hardware as well.
The removal of L3 hardware entries should not rely on the aging method described above. Consider the following scenario: the routing software finds a better route towards some destination. It redirects the route to another link, the routing cache entry is deleted, but the L3 entry is not. The existing traffic to the destination is forwarded in hardware towards the previous next hop. Since the traffic is not stopped, the L3 entry is considered in use and never deleted. Therefore, the routing cache removal routine should also delete the appropriate L3 hardware entry.
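The coupling between the two caches can be illustrated with a small user-space model. This is a sketch only: `RouteCache` and its methods are hypothetical names, and the hardware table is represented by a dictionary rather than by real switch programming calls.

```python
class RouteCache:
    """Hypothetical software routing cache that mirrors removals to hardware."""

    def __init__(self, hw_table):
        self.entries = {}         # destination IP -> next hop
        self.hw_table = hw_table  # model of the L3 table: destination -> next hop

    def insert(self, dst, next_hop):
        self.entries[dst] = next_hop

    def invalidate(self, dst):
        # Remove the software entry and the matching hardware entry together,
        # so the hardware cannot keep forwarding to a stale next hop.
        self.entries.pop(dst, None)
        self.hw_table.pop(dst, None)

    def flush(self):
        # A full cache flush (e.g. after a topology change) clears both sides.
        self.entries.clear()
        self.hw_table.clear()
```

After `invalidate()`, the next packet to that destination misses in hardware, is trapped to the CPU, and is resolved against the updated routing table.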
Another pitfall of this type of implementation is that once traffic is forwarded in hardware, the software routing cache is no longer consulted, so its entries gradually age out and are periodically flushed even though the flows are still active. To resolve the problem, the aging activity that scans the L3 table should synchronize the usage flag between the hardware and software caches.
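A minimal sketch of that synchronization step, in user-space Python, is shown here. It assumes the hardware entry carries a readable and clearable "hit" flag and that the software cache entry carries a last-use timestamp; the dictionary layout and the name `sync_usage` are hypothetical.

```python
def sync_usage(hw_table, route_cache, now):
    """Propagate hardware usage to the software routing cache.

    hw_table:    dict destination -> {"hit": bool}
    route_cache: dict destination -> {"last_use": timestamp}
    Returns the destinations whose software entries were refreshed.
    """
    refreshed = []
    for dst, hw_entry in hw_table.items():
        if hw_entry["hit"]:
            hw_entry["hit"] = False  # re-arm the flag for the next scan
            sw_entry = route_cache.get(dst)
            if sw_entry is not None:
                # Mark the software entry as recently used so it does not age out
                # while the hardware is still forwarding traffic for it.
                sw_entry["last_use"] = now
                refreshed.append(dst)
    return refreshed
```

Because the aging scan already reads the usage information of every hardware entry, refreshing the software entry in the same pass adds little extra work.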
The suggested solution is not a replacement for proper traffic management: while the L3 table is not yet populated, or when it is already full, IP packets are still sent to the CPU for a forwarding decision. IP packets trapped to the CPU for forwarding should be given the lowest priority and should also be rate limited by the Broadcom hardware before they reach the CPU.
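The text does not say which rate-limiting mechanism the hardware uses. As an illustration of the policy only, the sketch below models a token bucket, a common way to cap a packet rate while allowing short bursts. The class name, the parameters and the choice of a token bucket are all assumptions; in a real device the limit would be configured in the switch, not computed by the CPU.

```python
class TokenBucket:
    """Illustrative packet-rate limiter for traffic trapped to the CPU."""

    def __init__(self, rate_pps, burst):
        self.rate = rate_pps        # tokens (packets) added per second
        self.burst = burst          # maximum number of stored tokens
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # time of the previous decision, in seconds

    def allow(self, now):
        # Refill according to elapsed time, capped at the burst size.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True   # packet may be sent to the CPU
        return False      # packet is dropped before reaching the CPU
```

Limiting only the to-be-forwarded class of trapped packets leaves CPU time available for management and routing protocol packets, which is the starvation problem this design sets out to avoid.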
This simple implementation allows equipment manufacturers to reduce the cost of their solution while providing added value for the enterprise or small office clients by adding L3 functionality accelerated in hardware.