BigFoot Networks Killer NIC: Killer Marketing or Killer Product?
by Gary Key on October 31, 2006 2:00 AM EST - Posted in Networking
Technology behind the Killer NIC
We will not spend several pages and display numerous charts in an attempt to explain in absolute detail how the networking architecture and technology operates. Instead we will provide a high-level technology overview, which will hopefully provide the basic information needed to show why there are advantages in offloading data packet processing from the CPU to a dedicated processing unit. Other technologies such as RDMA and onloading are available, but in the interest of space and keeping our readers awake we will not detail those options.
The basic technology the Killer NIC utilizes has been in the corporate server market for a few years. One of the most prevalent technologies, and the one our Killer NIC is based upon, is the TCP/IP Offload Engine (TOE). TOE technology (okay, that phrase deserves a laugh) is basically designed to offload all tasks associated with protocol processing from the main system processor and move them to a TOE-enabled network interface card (TNIC). TOE technology also consists of software extensions to existing TCP/IP stacks within the operating system that enable the use of these dedicated hardware data planes for packet processing.
The process required to place data inside TCP/IP packets can consume a significant number of CPU cycles depending upon the size of the packets and the amount of traffic. These dedicated cards have proven very effective at relieving the CPU of TCP/IP packet processing, resulting in greater system performance from the server. The process allows the system's CPU to recover the lost cycles, so applications that are CPU bound are no longer affected by TCP/IP processing. This technology is very beneficial in a corporate server or datacenter environment where there is a heavy volume of traffic that usually consists of large blocks of data being transferred, but does it really belong on your desktop where the actual CPU overhead is generally minimal? Before we address this question we need to take a further look at how the typical NIC operates.
The standard NIC available today usually relies on the host to process TCP/IP operations in software, which can create substantial system overhead depending upon the network traffic on the host machine. Typically the areas that create increased system overhead are data copies along with protocol and interrupt processing. When a NIC receives a typical data packet, a series of interactions with the CPU begins in order to handle the data and route it to the appropriate application. The CPU is first notified that there is a data packet waiting; generally the processor then reads the packet header to determine the contents of the data payload. It then requests the data payload and, after verifying it, delivers it to the waiting application.
These data packets are buffered or queued on the host system. Depending upon the size and volume of the packets, this constant fetching of information can create additional delays due to memory latencies and/or poor buffer management. The majority of standard desktop NICs also incorporate hardware checksum support and additional software enhancements to help eliminate transmit-data copies. This is advantageous when combined with packet prioritization techniques that control and enhance outbound traffic with intelligent queuing algorithms.
However, these same NICs cannot eliminate the receive-data copy routines that consume the majority of processor cycles in this process. A TNIC performs protocol processing on its own dedicated processor before placing the data on the host system. TNICs will generally use zero-copy algorithms to place the packet data directly into the application buffers or memory. This bypasses the normal series of handshakes between the processor, NIC, memory, and application, resulting in greatly reduced system overhead depending upon the packet size.
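To make the copy overhead concrete, here is a minimal Python sketch of our own making (purely illustrative; the function names are hypothetical and do not represent any real driver code or BigFoot's firmware). It contrasts the conventional receive path, where each frame's payload is copied into a kernel socket buffer and then again into the application's buffer, with a TNIC-style path where the reassembled data lands directly in the application buffer.

```python
# Illustrative model only: hypothetical names, not real driver or TNIC code.

MSS = 1460  # typical TCP payload bytes carried by one Ethernet frame

def conventional_receive(frames, app_buffer):
    """Host CPU copies every frame twice: into the socket buffer, then to the app."""
    socket_buffer = bytearray()
    copies = 0
    for frame in frames:
        socket_buffer += frame       # copy #1: frame payload -> kernel socket buffer
        copies += 1
    app_buffer += socket_buffer      # copy #2: socket buffer -> application buffer
    copies += 1
    return copies

def tnic_style_receive(frames, app_buffer):
    """Card reassembles the stream and places it straight into the app buffer."""
    app_buffer += b"".join(frames)   # zero-copy placement from the host CPU's view
    return 0                         # no host-side copies

payload = bytes(32 * 1024)           # the 32KB example used in the next paragraph
frames = [payload[i:i + MSS] for i in range(0, len(payload), MSS)]

print("conventional host copies:", conventional_receive(frames, bytearray()))  # 24
print("TNIC-style host copies:  ", tnic_style_receive(frames, bytearray()))    # 0
```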
Most corporate or data center networks deal with large data payloads, typically in the 8KB to 64KB range (though we fully understand this can vary greatly). Our example will involve the receipt of a 32KB application data block, which usually results in thirty or more interrupt-generating events between the host system and a typical NIC. Each of these events is required to buffer the information, process the Ethernet packets back into usable data, handle the associated acknowledgements, and deliver the data to the waiting application. This process basically reverses itself if a reply is generated by the application and returned to the sender. The entire exchange can create significant protocol-processing overhead, memory latencies, and interrupt delays on the host system. We need to reiterate that our comments about "significant" system overhead are geared towards a corporate server or datacenter environment and not the typical desktop.
Depending upon the application and network traffic, a TNIC can greatly reduce the network transaction load on the host system by changing the transaction process from one event per Ethernet packet to one event per application network I/O. The 32KB application data transfer now becomes a single data-path offload process that moves all data packet processing to the TNIC. This eliminates the thirty or so interrupts along with the majority of the system overhead required to process this single transaction. In a data center or corporate server environment with large content delivery requirements to multiple users, the savings in system overhead due to network transactions can have a significant impact. In some instances replacing a standard NIC in the server with a TNIC has almost the same effect as adding another CPU. That's an impressive savings in cost and power requirements, but once again is this technology needed on the desktop?
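As a rough sanity check of those numbers, the arithmetic works out as follows. This is our own back-of-the-envelope sketch, assuming a standard 1460-byte TCP payload per Ethernet frame and one acknowledgement generated for every two frames received; real event counts depend on window size and ACK behavior.

```python
# Back-of-the-envelope event count for the 32KB receive example (assumptions above).
import math

PAYLOAD_BYTES = 32 * 1024   # 32KB application transfer from the example
MSS = 1460                  # assumed TCP payload bytes per Ethernet frame

data_frames = math.ceil(PAYLOAD_BYTES / MSS)   # frames the host must handle: 23
ack_events = data_frames // 2                  # assumed delayed ACKs: 11
host_events = data_frames + ack_events         # roughly 34 interrupt/processing events

print(f"{data_frames} data frames + {ack_events} ACKs = ~{host_events} host events")
print("with full offload to a TNIC: 1 completion event for the whole 32KB I/O")
```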
BigFoot Networks believes it is and we will see what they have to say about it and their technology next.
87 Comments
Gary Key - Wednesday, November 1, 2006 - link
We have been trying to develop a benchmark for BF2142 and our issues always revolve around the Titan when it is full. ;-) I tried BF2142 right before we ended testing with the Killer NIC and could not tell any difference with it. However, I did not benchmark while we were trying to develop a benchmark. If I get a chance I will go back and try it with the new drivers.
soydeedo - Wednesday, November 1, 2006 - link
cool beans. thanks for that quick first impression. i was just curious if it could somehow benefit from the packet optimization etc. anywho, keep us posted should you find something noteworthy with the new drivers. =)
goinginstyle - Wednesday, November 29, 2006 - link
Any update on BF2142?
Nehemoth - Tuesday, October 31, 2006 - link
Now I just need AnandTech to review this: http://www.hfield.com/wifire.htm
yyrkoon - Wednesday, November 1, 2006 - link
Looks like a flat panel, and you'd do better with a 21-23dB gain Andrew, trust me; I've had the last two years to play with both since we've been on wireless internet for about that long. We have just now switched (tonight, just got the hardware) to AT&T 'Wi-Max', and it is much, much better than our previous provider using 802.11g. Get this, it doesn't even need a directional antenna; just set it next to a window (such is true in our case) and you're getting an instant 2.52Mbit from a tower 8 miles away. It's pretty damned cool, and I didn't believe it myself until I hooked up a neighbor's system for him, and he's got it in a window that sits on the opposite side of his house from the tower. Although, from the little technical information the tech support team was able to provide me with, it's only available in our town, and only if you can't get DSL; supposedly this is some sort of trial service for them, to determine whether it's feasible to set up in other areas *shrug*. Nothing like downloading at 200+ KB/s; I've seen it swing as high as 800+ KB/s.
feelingshorter - Tuesday, October 31, 2006 - link
Buddy, that thing is realistic. Don't tell me you never heard of a directional antenna?!?!? That's all it is. No, it's not overpriced, because good antennas cost a lot, and it does stop your internet from dropping.
Frumious1 - Tuesday, October 31, 2006 - link
Only problem is it's completely impractical for laptops where you move around a lot. For desktops, if you want a consistent quality connection, just run the damn wire and be done with it. The fastest wireless 802.11 stuff can't even come close to 100 Mbit for typical use, let alone gigabit!
yyrkoon - Tuesday, October 31, 2006 - link
I have to admit, I'm a bit disappointed in you fellas for not even benching the inexpensive Intel PCI-E NIC (Intel Pro 1000 PCI-E, http://www.tigerdirect.com/applications/searchtool...), or at least comparing the two. For $40 USD, this card should perform very close to, if not better than, the $300 USD 'snake oil' NIC.
*sigh*
Gary Key - Tuesday, October 31, 2006 - link
We tested the Intel PRO/1000 PT and the Koutech PEN120 PCI-Express Gigabit adapters. Both adapters scored slightly less than the NVIDIA NIC across the board in our tests so we did not show the results. Both cards support Linux so that is a plus, but then again we were reviewing a NIC designed for Windows-based gaming.
yyrkoon - Wednesday, November 1, 2006 - link
Hmm, guess I missed that review. However, the last review I saw on the Intel PCI and onboard Intel solutions (a year or so ago from *ahem* THW) showed both of those leading the pack. Of course, I guess the Killer NIC wasn't available at that time . . .