BigFoot Networks Killer NIC: Killer Marketing or Killer Product?
by Gary Key on October 31, 2006 2:00 AM EST - Posted in Networking
Technology behind the Killer NIC
We will not spend several pages and display numerous charts in an attempt to explain in absolute detail how the networking architecture and technology operate. Instead we will provide a high-level technology overview, which will hopefully supply the basic information needed to show why there are advantages in offloading data packet processing from the CPU to a dedicated processing unit. Other technologies such as RDMA and onloading are available, but in the interest of space and keeping our readers awake we will not detail those options.
The basic technology the Killer NIC utilizes has been in the corporate server market for a few years. One of the most prevalent technologies in use, and the one our Killer NIC is based upon, is the TCP/IP Offload Engine (TOE). TOE technology (okay, that phrase deserves a laugh) is basically designed to offload all tasks associated with protocol processing from the main system processor and move them to a TOE-enabled network interface card (TNIC). TOE technology also includes software extensions to existing TCP/IP stacks within the operating system that enable the use of these dedicated hardware data planes for packet processing.
The process required to place data inside TCP/IP packets can consume a significant amount of CPU cycles depending upon the size of the packets and the amount of traffic. These dedicated cards have proven very effective in relieving the CPU of TCP/IP packet processing, resulting in greater system performance from the server. The offload allows the system's CPU to recover those lost cycles so that applications that are CPU bound are no longer affected by TCP/IP processing. This technology is very beneficial in a corporate server or datacenter environment where there is a heavy volume of traffic that usually consists of large blocks of data being transferred, but does it really belong on your desktop where the actual CPU overhead is generally minimal? Before we address this question we need to take a further look at how the typical NIC operates.
The standard NIC available today usually processes TCP/IP operations in software, which can create substantial system overhead depending upon the network traffic on the host machine. Typically the areas that create increased system overhead are data copies along with protocol and interrupt processing. When a NIC receives a typical data packet, a series of interactions with the CPU begins in order to handle the data and route it to the appropriate application. The CPU is first notified that a data packet is waiting; the processor then generally reads the packet header to determine the contents of the data payload. It next requests the data payload and, after verifying it, delivers it to the waiting application.
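As a rough illustration of that per-packet sequence, here is a minimal sketch in C that models the host-side steps for a single inbound packet on a conventional NIC. It is a user-space simulation for illustration only, not driver code; the header and payload sizes and the step breakdown are our assumptions, chosen simply to show where the CPU gets interrupted and where it copies data.

/* Simplified model of the per-packet receive path on a conventional NIC.
 * Illustrative simulation only, not actual driver code; sizes are assumed. */
#include <stdio.h>
#include <string.h>

#define HEADER_BYTES   40     /* assumed IP + TCP header size */
#define PAYLOAD_BYTES  1460   /* assumed payload of one Ethernet-sized packet */

static int interrupts;        /* CPU notifications generated */
static int bytes_copied;      /* data moved by the host CPU */

static void receive_one_packet(char *app_buffer)
{
    char nic_frame[HEADER_BYTES + PAYLOAD_BYTES];  /* frame as the NIC presents it */
    char host_buffer[PAYLOAD_BYTES];               /* host-side staging buffer */
    char header[HEADER_BYTES];

    memset(nic_frame, 0, sizeof(nic_frame));

    interrupts++;                                  /* 1. NIC notifies the CPU a packet is waiting */

    memcpy(header, nic_frame, HEADER_BYTES);       /* 2. CPU reads the header to identify the payload */

    memcpy(host_buffer, nic_frame + HEADER_BYTES,  /* 3. CPU requests and buffers the payload */
           PAYLOAD_BYTES);
    bytes_copied += PAYLOAD_BYTES;

    memcpy(app_buffer, host_buffer, PAYLOAD_BYTES);/* 4. verified payload is copied to the application */
    bytes_copied += PAYLOAD_BYTES;
}

int main(void)
{
    char app_buffer[PAYLOAD_BYTES];

    receive_one_packet(app_buffer);
    printf("interrupts: %d, bytes copied by CPU: %d\n", interrupts, bytes_copied);
    return 0;
}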
These data packets are buffered or queued on the host system. Depending upon the size and volume of the packets this constant fetching of information can create additional delays due to memory latencies and/or poor buffer management. The majority of standard desktop NICs also incorporate hardware checksum support and additional software enhancements to help eliminate transmit-data copies. This is advantageous when combined with packet prioritization techniques to control and enhance outbound traffic with intelligent queuing algorithms.
However, these same NICs cannot eliminate the receive-data copy routines that consume the majority of processor cycles in this process. A TNIC performs protocol processing on its dedicated processor before placing the data on the host system. TNICs will generally use zero-copy algorithms to place the packet data directly into the application buffers or memory. This routine bypasses the normal process of handshakes between the processor, NIC, memory, and application resulting in greatly reduced system overhead depending upon the packet size.
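For contrast, here is an equally simplified sketch of the zero-copy idea: the TNIC's own processor finishes protocol processing and deposits the payload directly into a buffer the application set aside ahead of time, so the host sees one completion event and performs no intermediate copies. The function and variable names are assumptions for illustration; this is not BigFoot's actual driver interface.

/* Conceptual model of a zero-copy TNIC receive. Names and sizes are assumed. */
#include <stdio.h>
#include <string.h>

#define PAYLOAD_BYTES 1460

static int host_interrupts;
static int host_bytes_copied;    /* stays at zero: the card places the data itself */

/* Stands in for the card's DMA engine writing straight into application memory. */
static void tnic_place_payload(char *app_buffer, size_t len)
{
    memset(app_buffer, 0, len);  /* payload lands in place; no host CPU copy is counted */
}

int main(void)
{
    char app_buffer[PAYLOAD_BYTES];  /* buffer the application registered up front */

    tnic_place_payload(app_buffer, sizeof(app_buffer));
    host_interrupts++;               /* single completion event for the whole I/O */

    printf("host interrupts: %d, bytes copied by host CPU: %d\n",
           host_interrupts, host_bytes_copied);
    return 0;
}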
Most corporate or data center networks deal with large data payloads, typically 8KB up to 64KB in size (though we fully understand this can vary greatly). Our example will involve the receipt of a 32KB application payload, which usually results in thirty or more interrupt-generating events between the host system and a typical NIC. Each of these events is required to buffer the information, break the data into Ethernet packets, process the incoming acknowledgements, and send the data to the waiting application. This process basically reverses itself if a reply is generated by the application and returned to the sender. The entire exchange can create significant protocol-processing overhead, memory latencies, and interrupt delays on the host system. We need to reiterate that our comments about "significant" system overhead are geared towards a corporate server or datacenter environment and not the typical desktop.
Depending upon the application and network traffic, a TNIC can greatly reduce the network transaction load on the host system by changing the transaction process from one event per Ethernet packet to one event per application network I/O. The 32KB application payload is now handled as a single data-path offload process that moves all packet processing to the TNIC. This eliminates the thirty or so interrupts along with the majority of the system overhead required to process this single payload. In a data center or corporate server environment with large content delivery requirements to multiple users, the savings in system overhead due to network transactions can have a significant impact. In some instances replacing a standard NIC in the server with a TNIC has almost the same effect as adding another CPU. That's an impressive savings in cost and power requirements, but once again, is this technology needed on the desktop?
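To put rough numbers on the example above, the short sketch below works out how many Ethernet-sized segments a 32KB application payload breaks into, assuming a typical 1460-byte TCP payload per 1500-byte frame (our assumption; real values vary with the network configuration). With acknowledgement handling and buffer management added on top of the roughly two dozen segments, you land in the neighborhood of the thirty-plus host events mentioned earlier, versus a single completion event when the entire I/O is offloaded to the TNIC.

/* Back-of-the-envelope segment count for a 32KB application payload. */
#include <stdio.h>

int main(void)
{
    const int payload_bytes = 32 * 1024;  /* 32KB application payload */
    const int mss_bytes     = 1460;       /* assumed TCP payload per Ethernet frame */

    int segments = (payload_bytes + mss_bytes - 1) / mss_bytes;  /* round up: ~23 */

    printf("Ethernet segments for a 32KB payload: %d\n", segments);
    printf("Host events with a standard NIC: at least one per segment\n");
    printf("Host events with a TNIC: one per application I/O\n");
    return 0;
}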
BigFoot Networks believes it is and we will see what they have to say about it and their technology next.
87 Comments
stmok - Tuesday, October 31, 2006 - link
Yeah, I do agree. It's targeted at the wrong crowd. This product should really be for the hardcore enthusiasts. (I'm talking about those who actually use the command line on a regular basis). You don't expect clueless Windows users to start tinkering with Linux, do you? :)
As for SLI and Crossfire? It's a bloody joke.
You buy two video cards today, and in 12 months time, they'll be outperformed by a single next generation video card. Yeah, money well spent there, isn't it?
stmok - Tuesday, October 31, 2006 - link
To be honest, if they opened up the specs for the card and worked with the community, you'd have a different product. (So they only focus on selling hardware and advising enthusiasts on how to develop software solutions for the card.)
yyrkoon - Tuesday, October 31, 2006 - link
So the fact that Intel's NIC cards regularly perform better than at least 99% of the competition, and the fact they have made a PCI-E card, is completely lost on you? BTW the price of the Intel card is FAR less . . .
Zebo - Tuesday, October 31, 2006 - link
It's you who is stupid. With video you get your money's worth, unlike this POS: anywhere from 60-75% improvement moving to that second card in SLI/xfire config.
mlau - Tuesday, October 31, 2006 - link
as i said, i think this card is targeted at the wrong crowd. but then i don't doubt that the windows network stack is a POS and offloading it completely to a piece of hardware will free the host cpu for other tasks.

as for sli/xfire, performance improvements are almost not noticeable (and sometimes perf decreases). no one except a few impressionable 12 year olds cares about your fps in fear and other shooters. i play games to be entertained and not to watch the fps meter and tell my "friends" that "oooo i can play far cry in 2560x1200 8aa16af and still get 120 fps!!!1!!11oneone, you cant!!". you people are pathetic.
Frumious1 - Tuesday, October 31, 2006 - link
"as for sli/xfire, performance improvements are almost not noticeable"Clearly you have never used a higher end gaming PC on modern title. I can assure that the improvements are VERY noticeable if you play with a larger LCD (even 1920x1200) and want smooth frame rates, or if you even load up Oblivion at moderate resolutions. Yes, an increase from 100 to 170 FPS in some titles is basically meaningless, but going from 20 to 35 FPS in Oblivion makes the difference between sluggish and smooth gameplay. Whether or not it's worth the price is up for debate, but just because you can't afford it and don't play enough games to justify the purchase doesn't make is pathetic.
BTW, I've got news for you, moron: 12 year olds are NOT the people running SLI/Crossfire setups! But then your penis envy probably blinds you to that fact. Even in Linux, I doubt this card is worth the price of admission. $280 for another "coprocessor"? Lovely, except in another week or so $250 would add two more CPU cores and make the whole situation meaningless. Now let's just hope Vista has network stack improvements so that multiple cores are truly useful for offloading audio and network tasks in games. Actually, that's probably at least partially a matter of getting game developers to do things more threaded-like.
Hey Gary, did you test Quake 4 with a non-SMP configuration? I understand Q4 optimizations for SMP essentially consist of running the client and server code in separate threads, so maybe the server is already offloaded and there's nothing new for the Killer to do? Gee why can't other devs do this? Lazy bums!
KAZANI - Tuesday, October 31, 2006 - link
"Whether or not it's worth the price is up for debate, but just because you can't afford it and don't play enough games to justify the purchase doesn't make is pathetic."To my mind going into a 600$ expenditure so that you can play overhyped duds such as Oblivion counts as pathetic. I am still not convinced that it's the heavy gaming that warrants dual-GPU's and not dual-GPU's warranting heavy gaming.
bob661 - Tuesday, October 31, 2006 - link
In MY mind (the ONLY mind that's important), people that criticize others' choices in computer hardware and games ARE indeed pathetic. I AM convinced that you are a jealous, self-righteous asshole that probably drives in the left lane on the freeway at the speed limit because no one needs to go faster than the almighty YOU.
rushfan2006 - Wednesday, November 1, 2006 - link
Agreed... there are a lot of people being dicks on this thread. I just don't understand it. If you don't like a game or something, just don't buy it - you can give your opinion about it so long as it offers some kind of value -- calling out the performance or problems with the product. But to go after someone's buying choice and then call them names just gets ridiculous... it's like, grow the hell up already.
KAZANI - Wednesday, November 1, 2006 - link
Excuse me? You're spending $600 to play Oblivion and you're telling me to "grow up"? DUDE, YOU NEED TIME OFF THE COMPUTER!