More Details on Elemental's GPU Accelerated H.264 Encoder
by Anand Lal Shimpi on June 23, 2008 10:52 PM EST
Posted in: GPUs
The Deal with BadaBOOM
Due out in Q3, BadaBOOM is going to be the consumer version of the encoder. It will be an "affordable" program designed for those users who want to quickly take a video file and convert it to another format without playing with settings like bitrate. This is the application we were given a chance to preview.
Under the RapiHD brand, Elemental will deliver a professional version of its encoding/transcoding software. This application will offer more options than BadaBOOM, letting you manually select bitrate and resolution, among other quality settings.
As we mentioned before, the software was developed using CUDA and thus will only run on a CUDA-enabled NVIDIA GPU. NVIDIA has a full list here but in short, anything from the GeForce 8, GeForce 9 or GeForce GTX 280/260 families will work.
Mr. Blackman told us that the company isn't specifically tied to NVIDIA hardware and that as Intel's Larrabee and AMD/ATI solutions come to light, it may evaluate bringing the technology to more platforms. But for now, this will only work if you have a CUDA-enabled GPU (and as such, it stands to be one of the biggest non-gaming killer apps for NVIDIA hardware).
Performance
In our testing we found that even though performance improved tremendously over a CPU-only encode, the process still required a fast host CPU (the Core 2 Extreme QX9770 was at 25 - 30% CPU utilization). It turns out that there are two factors at work here.
According to Mr. Blackman, NVIDIA's initial CUDA release didn't have the streaming mechanism that lets the CPU keep working in parallel with the GPU. This functionality was added in later versions of CUDA, but the early beta we tested was developed using the initial CUDA release. Once the CPU and GPU can work in parallel, the load on the CPU should drop.
Secondly, it's worth pointing out that only some parts of the codec parallelize well (motion compensation, motion estimation, DCT and iDCT); other parts of the pipeline (syntax decoding, variable length coding, CABAC) are not so well suited to NVIDIA's array of Streaming Processors.
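To get a feel for why the serial stages matter, here is a rough sketch based on Amdahl's law. It is not Elemental's model: the 70% parallel fraction and the 20x GPU speedup are hypothetical numbers picked purely for illustration, and the function name is our own.

```python
def overall_speedup(parallel_fraction: float, parallel_speedup: float) -> float:
    """Amdahl's law: whole-job speedup when only part of the work is accelerated."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if parallel_speedup <= 0.0:
        raise ValueError("parallel_speedup must be positive")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part runs at its original speed; only the parallel part shrinks.
    return 1.0 / (serial_fraction + parallel_fraction / parallel_speedup)


if __name__ == "__main__":
    # Hypothetical values, NOT measurements from Elemental's encoder.
    PARALLEL_FRACTION = 0.70  # share of encode time in ME/MC/DCT/iDCT (assumed)
    GPU_SPEEDUP = 20.0        # speedup of those stages on the GPU (assumed)
    print(f"Overall speedup: {overall_speedup(PARALLEL_FRACTION, GPU_SPEEDUP):.2f}x")
```

With these made-up inputs, the stages that stay serial end up dominating the encode time, so the overall gain is far smaller than the speedup of the GPU-friendly stages alone.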
Elemental also indicated that performance scales linearly with the number of SPs in the GPU, so presumably the GeForce GTX 280 should be nearly 90% faster (at least at the GPU-accelerated functions) than a GeForce 9800 GTX.
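The "nearly 90%" figure can be checked with simple arithmetic. The sketch below assumes, as the estimate above does, that throughput depends only on the number of SPs; it ignores shader clock and memory bandwidth differences. The SP counts (128 and 240) are taken from NVIDIA's published specifications rather than from Elemental, and the helper name is our own.

```python
# Stream processor counts from NVIDIA's published specifications.
SP_COUNT = {
    "GeForce 9800 GTX": 128,
    "GeForce GTX 280": 240,
}


def relative_gain_percent(baseline: str, candidate: str) -> float:
    """Percent speedup of candidate over baseline, assuming linear scaling with SP count."""
    ratio = SP_COUNT[candidate] / SP_COUNT[baseline]
    return (ratio - 1.0) * 100.0


if __name__ == "__main__":
    gain = relative_gain_percent("GeForce 9800 GTX", "GeForce GTX 280")
    print(f"GTX 280 vs 9800 GTX: {gain:.1f}% faster")  # prints 87.5% faster
```

This only bounds the gain for the GPU-accelerated functions; the parts of the pipeline that stay on the CPU are unaffected.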
SLI Support?
As we found in our GT200 article, in most cases NVIDIA's fastest graphics card is actually the GeForce 9800 GX2, with its pair of G92s on a single card. Unfortunately, Elemental's software will not split up a single video stream for processing across multiple GPUs - so for this application, NVIDIA's fastest option would be the GeForce GTX 280.
There is an exception, however: if you do have multiple GPUs in your system, the professional version of Elemental's software will let you output to two different resolution/bitrate targets at the same time - with each GPU handling a different transcode stream.
Final Words
We've still got a couple of months before Elemental's software makes its official debut, but it's honestly the most exciting non-gaming application we've seen for NVIDIA's hardware.
We have already given Elemental some feedback as to features we'd like to see in the final version of the software (including support for .m2ts and .evo files as well as .mkv input/output). If there's anything you'd like to see, leave it in the comments and we'll pass along the thread to Elemental.
Elemental's software, if it truly performs the way we've seen here, has the potential to be a disruptive force in both the GPU and CPU industries. On the GPU side it would give NVIDIA hardware a significant advantage over AMD's GPUs, and on the CPU side it would upset the balance between NVIDIA and Intel. Video encoding has historically been an area where Intel's CPUs have done very well, but if the fastest video encoder ends up being an NVIDIA GPU, it could mean that video encoding performance would be microprocessor agnostic: you'd just need a good NVIDIA GPU.
If you're wondering why Intel is trying to launch Larrabee next year, this is as good a consumer example as you're going to get.
50 Comments
mmntech - Tuesday, June 24, 2008
ATI still has the software for AVIVO encoding in their drivers section. However, it only works with certain X1000 series cards. Somebody did make a crack of the program to work on all cards but I couldn't get it to work. I have no idea why they stopped it for the HD series. So far the only exercise my HD 3850 is getting lately is Folding@Home and Compiz. I agree they really missed the boat on GPU encoding. It would be nice to rip and encode a full DVD movie to DivX or AVC without it taking two hours.
djc208 - Tuesday, June 24, 2008
I would think ATI is either seriously re-thinking that plan or trying to find someone to pair with for similar software on their cards. Their new "shoot for the middle" strategy is good but this will give nVidia not only an advantage in the graphics card market but also help sell more high end cards. ATI may not be able to compete on the high end but they can't afford not to compete in this new space at all.
shiggz - Tuesday, June 24, 2008
I hope they include a PS3 H.264 compatible profile. I've started noticing that lots of my old 4-6 year old downloaded video files are just not very compatible these days. So I've started encoding all of my files with the PS3 profile in Nero. Hoping that they will 99% be compatible with PS4 or even later, thus giving me hopefully another decade or more of guaranteed hardware compatibility.
ViRGE - Tuesday, June 24, 2008
Speaking of profiles, did Elemental say what profiles BadaBoom and their professional applications will support? I've been told that the BadaBoom beta was limited to the Baseline profile, which is fine if you're encoding for mobile devices but not very useful for your backup scenario. Will the final version be able to encode material with High Profile features?
ltcommanderdata - Tuesday, June 24, 2008
Well, I definitely can't wait for OpenCL to avoid these platform dependent situations. In any case, I wonder if trying to reduce CPU load should be the only option. As in, if a GPU can encode very quickly, why not have an option where multiple CPU cores can be loaded to make things even faster while leaving a single core free for other system tasks. I'm sure there are lots of cases where most of a quad-core processor would be free anyways, so why not use them too?
phatboye - Tuesday, June 24, 2008
I was against CUDA from the start. There really needs to be an Open API so we don't get tied to one developer's hardware. Not sure if OpenCL will be it but I do hope something comes soon.
ltcommanderdata - Tuesday, June 24, 2008
I imagine it could be a bit difficult to deal with all the different architectures out there. I believe only nVidia's new GT200 series has IEEE 754 compliant 64-bit support, while the older 8-series and 9-series were only 32-bit. ATI's 2k-series and 3k-series only had partial IEEE 754 64-bit support and the 1k-series were GPGPU capable as well through their Pixel Shaders. With so many different platforms on the market right now, it'll be interesting to see where they find the common ground or whether OpenCL will try to focus more on setting the standard for future generations. I suppose OpenCL could use OpenGL's extensions approach which will presumably allow all current GPGPU forms to be supported, but that will still leave developers having to optimize for each platform.
SlyNine - Tuesday, June 24, 2008
Also, unless I missed it, what quality level did the GPU transcode equal: Fast, Balanced or Insane Quality?
Anand Lal Shimpi - Tuesday, June 24, 2008
We couldn't really do a direct comparison - the bitrates put it at somewhere between balanced and insane.
Take care,
Anand
Manabu - Wednesday, June 25, 2008
>We couldn't really do a direct comparison - the bitrates put it
>at somewhere between balanced and insane.
To make a comparison, you only need to set the x264 encoder to produce an encode of the same size as your GPU-accelerated one. It doesn't matter that the GPU-accelerated one isn't tweakable, because x264 is. Bitrate is only size (bits) / time (seconds). Then you can compare quality. If the quality is too high in both to make a good comparison, then a reduction of bitrate would be necessary, to see which one loses quality faster.
And I couldn't get exact information on AutoMKV profiles, but maybe the fastest profile tries to retain a minimum of decency. DarkShikari speculates that you can reach 240+ fps on a quad-core, on a 720p stream. http://forum.doom9.org/showpost.php?p=1152036&...