Conclusions So Far

Of one thing we are sure: the "cheaper, smaller, higher volume option historically wins" is a very weak argument to make when claiming that ARM SoCs will overtake Intel in the server market. It is hard to make all of the puzzle pieces come together: performance, power, volume, and software. Low prices and volume are not enough. We would love to see some real competition in the server market, but Intel is a lot better positioned today to fend off attacks than the RISC players were back in the 90s.

The current ARM server SoCs are a lot more powerful than Calxeda's ECX-1000, but they do not face a hopelessly outdated Atom S1200 anymore. The Atom C2000 is a huge step forward and the Xeon E3 has continued to evolve in such a way that even eight of the best ARM cores cannot deliver more raw integer processing power than a quad-core E3 with SMT. Meanwhile, the Xeon-D will offer all the advantages of the high performance "Broadwell" architecture, the flexibility of Intel's Turbo Boost, Intel's excellent process technology, and the highly integrated Atom C2000 SoC in one very competitive package.

The first – albeit very rough – performance data indicates that the server ARMada is not ready (yet?) to take on the best Intel Xeons in a broad range of server applications, at least in terms of performance. However, the ARM challengers do have an opportunity. Despite the massive number of Intel SKUs, Intel's market segmentation is rather crude and assumes that all customers can easily be categorized into three (maybe four) large groups: For low budgets, get the low range Xeon E3 (e.g. E3-1220 v3). Pay a bit more and you get Hyper-Threading and higher clock speeds (E3-1240 v3). Pay slightly more and you get another speed bump. Pay much more and you get four memory channels. We'll throw in more cores and a larger cache as a bonus (Xeon E5).

What if I have a badly scaling HPC application (low core count) that needs a lot of memory bandwidth? There is no Xeon E3 with quad channel. What if I need massive amounts of memory but moderate processing power? The Xeon E3 only supports 32GB. What if my application needs lots of cores and bandwidth but does not benefit from large and slow LLC caches? There is no Xeon E5 for that; I can only choose one of the most expensive E5s. And these examples are not invented; applications like these exist in the real world and are not exotic exceptions. What if my application benefits from a certain hardware accelerator? Buy a few 100k of SoCs and we'll talk. Intel's market segmentation is based largely on the assumption that every need (I/O, caches, memory bandwidth, memory capacity) is proportional to processing power.

 

The ARM based challengers have the potential to serve those "odd" but relatively large markets better. The cost to develop new SoCs is lower and ARMv8 has the inherent RISC advantage of spending fewer transistors on ISA complexity. This lowers the Intel advantage of process technology leadership.

Cavium has a clear focus and targets the scale-out, telecom, and storage markets. We are very curious how the first chip which is specialized for "scale-out" applications will perform. It has been a long time since we have seen such a specialized SoC and it is crystal clear that performance will vary a lot depending on the application. Our first impression is that the chip will be ideal running lots of network intensive virtual machines on top of a hypervisor, such as Xen or KVM.

AppliedMicro's X-Gene seems to target a much wider range of applications, attacking the Intel Xeon E3 and the fastest Atom C2000. The hardware accelerators and quad-channel memory should give it an edge in some server applications while staying close enough in others. Much will depend on how quickly the X-Gene 2 is available in real servers. The X-Gene 2 "ShadowCat" is already up and running, so we have high hopes.

Broadcom seems to have a similar approach. Broadcom is late but is a market leader with deep pockets and an impressive list of customers. The same is true for Qualcomm. But we needs specs and not just broad and vague statements before we dedicate more words to the server plans of Qualcomm.

AMD's Opteron A1100 is definitely betting on undercutting Intel's low-end Xeons in price and features. Everything about it screams "time to market, inexpensive but proven low power design". The more ambitious AMD ARM SoCs will come later, however, as the current A1100 is missing a crucial feature: a link to the Freedom Fabric. The network fabric is a critical feature as OEMs can then build a low power, high performance networked micro server cluster. It was the strongest point of the Calxeda based servers as it kept power per node low, offered very low latency network, and lowered the investments in expensive network gear (Cisco et al.). AMD is a well known brand with the enterprise folks and has a lot of unique server/HPC IP.

Last but not least, many enterprises in the IT world including HP, Facebook and Google want to see more competition in the server market. So all ARM licensees can count on some goodwill to make it happen.

We from our side have been preparing as well. We have developed several new benchmarks to test this new breed of servers. Hard numbers say more than just words, but you'll have to wait for part two of this series for those.

 

 

The RISC Advantage
Comments Locked

78 Comments

View All Comments

  • hojnikb - Tuesday, December 16, 2014 - link

    Wow, i have never motherboard that simple :)
  • CajunArson - Tuesday, December 16, 2014 - link

    OK you devote another huge block of text to the typical x86 complexity myth* followed by: Oh, but the ARM chips are superior because they have special-purpose processors that overcome their complete lack of performance (both raw & performance per watt).

    Uhm... WTF?? I need to have a proprietary, poorly documented add-on processor to make my software work well now? How is that a "standard"? How is requiring a proprietary add-on processor that's not part of any standard and requires boatloads of software cruft working in a "reduced instruction set architecture" exactly?

    I might as well take the AVX instruction set for modern x86... which is leagues ahead of anything that ARM has available, and say that x86 is now a "RISC" architecture because the AVX part of x86 is just as clean or cleaner than anything ARM has. I'll just conveniently forget about the rest of x86 just like the ARM guys conveniently forget about all the non-standard "application accelerators" that are required to actually make their chips compete with last-year's Atoms.

    * Maybe in a micro-controller setting where you are using a PIC or Arduino the x86 decoding is a real issue, but in a server? Please. Considering the only hard numbers you have show a 2013-model Atom beating a 2015-model ARM server processor, you'll have to try harder.
  • hlmcompany - Tuesday, December 16, 2014 - link

    The article describes ARM chips as becoming more competitive, but still lagging behind...not that they're superior.
  • Kevin G - Tuesday, December 16, 2014 - link

    The coprocessor idea is something stems from mainframe philosophy. Historically things like IO requests and encryption were always handled by coprocessors in this market.

    The reason coprocessors faded away outside of the mainframe market is that it was generally cheaper to do a software implementation. Now with power consumption being more critical than ever, coprocessors are seen as a means to lower overall platform power while increasing performance.

    Philosophically, there is nothing that would prevent the x86 line from doing so and for the exact same reasons. In fact with PCIe based storage and NVMe on the horizon in servers, I can see Intel incorporating a coprocessor to do parity calculations for RAID 5/6 in there SoCs.
  • kepstin - Tuesday, December 16, 2014 - link

    Intel has already added some instructions in avx and avx2 that vastly improve the performance of software raid5 and 6; the Haswell chip in my laptop has the Linux software raid implementation claiming 24GiB/s raid5 with avx, and 23GiB/s raid6 with avx2 (per core).
  • MrSpadge - Tuesday, December 16, 2014 - link

    Of course additional power draw for more complex instruction deconding mattes in servers: today they are driven by power-efficiency! The transistors may not matter as much, but in a multi-core environment they add up. Using the quoted statement from AMD of "only 10% more transistors" means one could place 11 RISC cores in the same area for the same cost as 10 otherwise identical x86 cores. Johan said it perfectly with "the ISA is not a game changer, but it matters".

    And you completely misunderstood him regarding the accelerators. Intel is producing "CPUs for everyone" and hence only providing few accelerators or special instructions. In the ARM ecosystem it's obvious that vendors are searching niches and are willing to provide custom solutions for it - hence the chance is far higher that they provide some accelerator which might be game-changing for some applications.

    This doesn't mean the architecture has to rely entirely on them, neither does it mean they have to be undocumented. The accelerators do not even have to be faster than software solutions, as long as they're easy enough to work with and provide significant power savings. Intel is doing just that with special-purpose hardware in their own GPUs.

    And don't act as if much would have changed in the Atom space ever since 22 nm Silvermont cores appeared. It doesn't matter if it's from 2013 or 2015 - it's all just the same core.
  • OreoCookie - Tuesday, December 16, 2014 - link

    What's with all the unnecessary piss and vinegar?

    All CPU vendors rely increasingly on specialized silicon, newer Intel CPUs feature special crypto instructions (AES-NI) and Quick Sync, for instance. Adding special purpose hardware to augment the system (in the past usually done for performance reasons) is quite old, just think of hardware RAID cards and video »accelerators« (which are not called GPUs). The reason that Intel doesn't add more and more of these is that they build general purpose CPUs which are not optimized for a specific workload (the article gives a few examples). In other environments (servers, mobile) the workload is much more clearly defined, and you can indeed take advantage of accelerators.

    The biggest advantage of ARM cpus is flexibility -- the ARM ecosystem is built on the idea to tailor silicon to your demands. This is also a substantial reason why Intel's efforts in the mobile market have been lackluster. Recently, Synology announced a new professional NAS (the DS2015xs) which was ARM-based rather than Intel-based. Despite its slower CPU cores, the throughput of this thing is massive -- in part, because it sports two (!) 10 GBit ethernet ports out of the box. Vendors are looking for niches where ARM-based servers could gain a foothold, so they are trying a lot of things and see what sticks.
  • goop666666 - Saturday, December 20, 2014 - link

    LOL! Most of the comments here like this one seem to be written by people who think computers should all be like gaming machines or something.

    Here'a tip: no-one cares about "complexity," "standardization," "RISC," or anything else you mention. All they care about in the target market for ARM server chips is price, performance and power, and I mean ALL THREE.

    On this Intel cannot compete. They sell wildly overpriced legacy hardware propped up by massive R&D expenditures and they're wedded to that model. The rest of the industry is wedded to the new and cheap model. Just like how the industry moved to mobile devices and Intel stood still, this change will also wash over Intel while they sit still in denial.

    There's a reason why Intel stock has gone no-where for years.
  • nlasky - Monday, December 22, 2014 - link

    Jan 8, 2010, Intel stock price $20.83. Dec 19, 2010, Intel stock price $36.37. If by gone no-where in for years you mean increased by 70% I guess you would be correct. Intel can't compete because they are wedded to their model? They have a profit margin of 20% and an operating margin of 27%. They could easily cut prices to compete with any ARM offerings. Servers have been around forever, unlike the mobile computing platform. Intel has an even larger stranglehold on this industry than ARM has in the mobile space. Here's a tip - stop spewing a bunch of uniformed nonsense just to make an argument.
  • nlasky - Monday, December 22, 2014 - link

    *Dec 19, 2014

Log in

Don't have an account? Sign up now