BigFoot Networks Killer NIC: Killer Marketing or Killer Product?
by Gary Key on October 31, 2006 2:00 AM EST
Posted in: Networking
Technology behind the Killer NIC
We will not spend several pages and numerous charts attempting to explain in absolute detail how the networking architecture and technology operate. Instead we will provide a high-level overview that should supply the basic information needed to show why there are advantages in offloading data packet processing from the CPU to a dedicated processing unit. Other technologies such as RDMA and onloading exist, but in the interest of space and keeping our readers awake we will not detail those options here.
The basic technology behind the Killer NIC has been in the corporate server market for a few years. One of the most prevalent technologies in that space, and the one the Killer NIC is based upon, is the TCP/IP Offload Engine (TOE). TOE technology (okay, that phrase deserves a laugh) is designed to offload the tasks associated with protocol processing from the main system processor and move them to a TOE-enabled network interface card (TNIC). TOE technology also consists of software extensions to the operating system's existing TCP/IP stack that enable the use of this dedicated hardware data plane for packet processing.
The process of encapsulating application data inside TCP/IP packets can consume a significant number of CPU cycles depending upon the size of the packets and the amount of traffic. Dedicated offload cards have proven very effective at relieving the CPU of TCP/IP packet processing, resulting in greater system performance from the server. Offloading allows the system's CPU to recover those lost cycles, so CPU-bound applications are no longer affected by TCP/IP processing. This technology is very beneficial in a corporate server or datacenter environment where a heavy volume of traffic usually consists of large blocks of data being transferred, but does it really belong on your desktop, where the actual CPU overhead is generally minimal? Before we address this question we need to take a closer look at how the typical NIC operates.
The standard NIC available today relies on the host to process TCP/IP operations in software, which can create substantial system overhead depending upon the network traffic on the host machine. The areas that typically create this overhead are data copies along with protocol and interrupt processing. When a NIC receives a data packet, a series of interactions with the CPU begins in order to handle the data and route it to the appropriate application. The CPU is first notified that a data packet is waiting; the processor then reads the packet header to determine what the data payload contains, requests the payload, and after verifying it, delivers it to the waiting application.
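To make that sequence concrete, here is a minimal user-space sketch in C (an illustration only, not a real driver) of the per-packet work just described: an interrupt-style notification, a header read, and a copy of the payload into the application's buffer. The structure layout and sizes are assumptions chosen for the example.

```c
/* A minimal user-space sketch (not a real driver) of the per-packet work a
 * conventional NIC pushes onto the host CPU: an interrupt-style notification,
 * a header read, and a copy of the payload into the application's buffer.
 * Sizes and structures are illustrative assumptions, not taken from real hardware. */
#include <stdio.h>
#include <string.h>

#define MAX_PAYLOAD 1460            /* typical TCP payload of one Ethernet frame */

struct frame {
    unsigned char header[54];       /* Ethernet + IP + TCP headers (illustrative) */
    unsigned char payload[MAX_PAYLOAD];
    int           payload_len;
};

/* Models the host's response to one NIC interrupt. */
static void on_packet_interrupt(const struct frame *f,
                                unsigned char *app_buf, size_t app_buf_len)
{
    /* 1. Read the header to identify the connection and the payload length. */
    int len = f->payload_len;

    /* 2. Verify and copy the payload into the waiting application's buffer.
     *    This receive-side copy is the step a TNIC later eliminates. */
    if (len > 0 && (size_t)len <= app_buf_len)
        memcpy(app_buf, f->payload, (size_t)len);

    /* 3. The application is then notified that its data is ready. */
}

int main(void)
{
    unsigned char app_buf[MAX_PAYLOAD];
    struct frame f = { .payload_len = MAX_PAYLOAD };

    on_packet_interrupt(&f, app_buf, sizeof(app_buf));
    printf("one frame handled: 1 interrupt, 1 header read, 1 payload copy\n");
    return 0;
}
```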
These data packets are buffered or queued on the host system, and depending upon their size and volume this constant fetching of information can create additional delays due to memory latencies and/or poor buffer management. The majority of standard desktop NICs also incorporate hardware checksum support and software enhancements to help eliminate transmit-data copies, which is advantageous when combined with packet prioritization techniques that use intelligent queuing algorithms to control outbound traffic.
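As a side note on what "hardware checksum support" actually takes off the CPU, the snippet below sketches the standard 16-bit one's-complement Internet checksum (RFC 1071) used by IP, TCP, and UDP; on a NIC with checksum offload this summation runs on the card instead of the host. The sample header bytes are illustrative values only.

```c
/* The 16-bit one's-complement Internet checksum (RFC 1071) that IP, TCP, and
 * UDP use. On NICs with checksum offload this loop runs on the card rather
 * than the host CPU; it is shown here only to illustrate the offloaded work. */
#include <stdint.h>
#include <stdio.h>

static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {                       /* sum consecutive 16-bit words */
        sum += (uint32_t)(data[0] << 8 | data[1]);
        data += 2;
        len  -= 2;
    }
    if (len == 1)                           /* pad a trailing odd byte with zero */
        sum += (uint32_t)(data[0] << 8);
    while (sum >> 16)                       /* fold carries back into 16 bits */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;                  /* one's complement of the sum */
}

int main(void)
{
    /* A 20-byte IPv4 header with its checksum field zeroed (example values). */
    const uint8_t header[] = { 0x45, 0x00, 0x00, 0x3c, 0x1c, 0x46, 0x40, 0x00,
                               0x40, 0x06, 0x00, 0x00, 0xac, 0x10, 0x0a, 0x63,
                               0xac, 0x10, 0x0a, 0x0c };
    printf("checksum: 0x%04x\n", inet_checksum(header, sizeof(header)));
    return 0;
}
```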
However, these same NICs cannot eliminate the receive-data copies that consume the majority of processor cycles in this process. A TNIC performs protocol processing on its own dedicated processor before placing the data on the host system, and it will generally use zero-copy techniques to place packet data directly into the application's buffers. This bypasses the normal sequence of handshakes between the processor, NIC, memory, and application, resulting in greatly reduced system overhead depending upon the packet size.
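A minimal sketch of the zero-copy pattern described above may help: the application posts its own buffer ahead of time, the card's processor finishes the protocol work and places the assembled payload straight into that buffer, and the host sees one completion instead of one interrupt and one copy per frame. The structure and function names here are assumptions for illustration, not an actual TNIC API.

```c
/* An illustrative model of zero-copy receive: the application owns the buffer,
 * the card (simulated here) places the reassembled payload directly into it,
 * and the host handles a single completion event. Not a real TNIC interface. */
#include <stdio.h>
#include <string.h>

struct rx_request {
    unsigned char *app_buf;     /* application-owned destination buffer */
    size_t         capacity;
    size_t         completed;   /* bytes the card has placed so far */
};

/* Stands in for the card: protocol processing already happened on its own
 * processor, so the host-side "work" is just one completion notification. */
static void card_deliver(struct rx_request *req, const unsigned char *stream, size_t len)
{
    size_t n = len < req->capacity ? len : req->capacity;
    memcpy(req->app_buf, stream, n);    /* models the card's DMA into app memory */
    req->completed = n;
}

int main(void)
{
    static unsigned char app_buf[32 * 1024];
    static unsigned char stream[32 * 1024];   /* pretend 32 KB arrived off the wire */
    struct rx_request req = { app_buf, sizeof(app_buf), 0 };

    card_deliver(&req, stream, sizeof(stream));
    printf("one completion event, %zu bytes already in the application buffer\n",
           req.completed);
    return 0;
}
```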
Most corporate or data center networks deal with large application data payloads, typically in the 8 KB to 64 KB range (though we fully understand this can vary greatly). Our example will involve the receipt of a 32 KB application payload, which usually results in thirty or more interrupt-generating events between the host system and a typical NIC. These events are required to buffer the information, assemble the data into Ethernet packets, process the incoming acknowledgements, and deliver the data to the waiting application. The process basically reverses itself if the application generates a reply to return to the sender. All of this can create significant protocol-processing overhead, memory latencies, and interrupt delays on the host system. We need to reiterate that our comments about "significant" system overhead are geared towards a corporate server or datacenter environment and not the typical desktop.
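To put rough numbers behind that claim (assuming a standard 1500-byte Ethernet MTU carrying roughly 1460 bytes of TCP payload per frame), a 32 KB receive spans about 23 full-size frames; add the acknowledgements the host must generate and the transmit completions it must then process, and the event count easily climbs past thirty before the application ever sees its data.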
Depending upon the application and network traffic, a TNIC can greatly reduce the network transaction load on the host system by changing the transaction model from one event per Ethernet packet to one event per application network I/O. The 32 KB receive in our example now becomes a single data-path offload that moves all packet processing to the TNIC, eliminating the thirty or so interrupts along with the majority of the system overhead required to handle that data. In a data center or corporate server environment with large content delivery requirements to multiple users, the savings in system overhead due to network transactions can have a significant impact; in some instances, replacing a standard NIC in a server with a TNIC has almost the same effect as adding another CPU. That's an impressive savings in cost and power requirements, but once again, is this technology needed on the desktop?
BigFoot Networks believes it is, and next we will see what they have to say about their technology.
87 Comments
mlau - Tuesday, October 31, 2006 - link
Correct, I haven't (I do have bills to pay and don't waste what's left on improving my laptop). To me it's absolutely not worth shelling out $500 so that Oblivion runs with 5 fps more. Reducing the resolution costs nothing, and with the saved money you can buy loads of beer, which will make playing that game much more interesting. :)

The card is too expensive for what it offers, and its benefits will vanish with the next CPU generation, no doubt. What makes the card interesting is the integrated offload of all of Linux's filtering/routing. The card is marketed to the wrong crowd.
PS: I think ATI and NVIDIA need to be congratulated for finding another reason for gamers to shell out money. (And look, ATI also wants you to buy 3 cards in the near future for another completely useless thing: physics "simulation". I bet hundreds of people can't wait to post benchmarks showing how it improved their framerates and how "physically correct" the dust now settles in $GAME.)
rushfan2006 - Wednesday, November 1, 2006 - link
Agreed. I am a gamer, a very long time gamer btw... if that counts for anything... LOL... I've always built my own gaming boxes throughout the years, so I think I have some relevant experience to base my opinions on. Though the guy is a bit brutish in how he makes his remarks, factually I believe he's correct that right now, with the state of technology, the price:performance ratio for dual cards in games is just not there. If I'm going to invest a total of $1000 (two cards) I'd want to see DRAMATIC improvements. Now we all have our own standards, so let me define mine: even a 10% performance gain for that investment is NOT "dramatic" to me. Research the benchmarks from your favorite tech sites, don't take my word for it; the benchmarks speak for themselves.

As for the topic of this Killer NIC: for me personally, as a gamer, it's just a waste of money, and the concept of it kind of makes me laugh to be honest.
imaheadcase - Tuesday, October 31, 2006 - link
I agree, CrossFire/SLI is not all that at all. It's just a marketing tool to make gamers think they need it. The difference, though, is that it has some nice uses other than games. Games should be the LAST thing people think about when getting SLI/CrossFire.
Frumious1 - Tuesday, October 31, 2006 - link
I'm not sure if you're being sarcastic or idiotic. Hopefully the former? Marketing tools are about peddling something that has a negligible impact. You know, convincing people to upgrade from a 2.4 GHz E6600 to a 2.93 GHz X6800 for three times the cost... a maximum performance increase of 22% for a 200% price hike! CrossFire and SLI, on the other hand, can give up to a 90% (and usually at least 50%) performance increase for a 100% price increase. Yup, that's totally marketing. So are large LCDs, because those are completely useless. (Yes, that's sarcasm.)
feelingshorter - Tuesday, October 31, 2006 - link
How about spend 300 bucks and buy Windows? My BitDefender works just fine as a firewall and doesn't use 300 dollars worth of CPU. Hell, with 300 bucks you can buy a new CPU and offset any performance hit from using a software firewall!
Hypernova - Tuesday, October 31, 2006 - link
But as the review says, currently there is still NOTHING that shows the potential of FNapps. This is the card's 2nd biggest selling point, yet there's still nothing to show for it.
cosmotic - Tuesday, October 31, 2006 - link
I don't think the "per second" is appropriate.