More Details on Elemental's GPU Accelerated H.264 Encoder
by Anand Lal Shimpi on June 23, 2008 10:52 PM EST - Posted in GPUs
The Deal with BadaBOOM
Due out in Q3, BadaBOOM is going to be the consumer version of the encoder. It will be an "affordable" program designed for those users who want to quickly take a video file and convert it to another format without playing with settings like bitrate. This is the application we were given a chance to preview.
Under the RapiHD brand, Elemental will deliver a professional version of their encoder/transcoding software. This application will allow you more options than BadaBOOM, letting you select bitrate and resolution, among other quality settings, manually.
As we mentioned before, the software was developed using CUDA and thus will only run on a CUDA-enabled NVIDIA GPU. NVIDIA has a full list here, but in short, anything from the GeForce 8, GeForce 9 or GeForce GTX 280/260 families will work.
Mr. Blackman told us that the company isn't specifically tied to using NVIDIA hardware and that as Intel's Larrabee and AMD/ATI solutions come to light it may evaluate bringing the technology to more platforms. But for now, this will only work if you have a CUDA-enabled GPU (and as such, it stands to be one of the biggest non-gaming killer apps for NVIDIA hardware).
Performance
In our testing we found that even though performance improved tremendously over a CPU-only encode, the process still required a fast host CPU (the Core 2 Extreme QX9770 was at 25 - 30% CPU utilization). It turns out that there are two factors at work here.
According to Mr. Blackman, NVIDIA's initial CUDA release didn't have the streaming mechanism that allows the CPU to do work in parallel with the GPU. This functionality was added in later versions of CUDA, but the early beta we tested was developed using the initial CUDA release. Once the CPU and GPU can work in parallel, the load on the CPU should drop.
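To put a rough shape on what overlapping CPU and GPU work buys you, here is a minimal sketch of an idealized two-stage pipeline. It is a generic timing model, not a description of Elemental's encoder: the function names and the per-frame costs (2 ms of CPU work, 5 ms of GPU work, 1000 frames) are assumptions chosen purely for illustration.

```python
def serial_time(frames: int, cpu_ms: float, gpu_ms: float) -> float:
    """CPU and GPU take turns, so every frame costs the sum of both stages."""
    return frames * (cpu_ms + gpu_ms)


def overlapped_time(frames: int, cpu_ms: float, gpu_ms: float) -> float:
    """Idealized two-stage pipeline: the CPU and GPU work on different frames
    at the same time, so the slower stage sets the steady-state pace."""
    if frames == 0:
        return 0.0
    return cpu_ms + gpu_ms + (frames - 1) * max(cpu_ms, gpu_ms)


# Hypothetical per-frame costs, not measured from BadaBOOM.
FRAMES, CPU_MS, GPU_MS = 1000, 2.0, 5.0
print(f"serial:     {serial_time(FRAMES, CPU_MS, GPU_MS):.0f} ms")
print(f"overlapped: {overlapped_time(FRAMES, CPU_MS, GPU_MS):.0f} ms")
```

In this model the CPU's share of the work is hidden behind the GPU stage whenever the GPU is the slower of the two, which is the sort of benefit the streaming mechanism is meant to enable.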
Secondly, it's worth pointing out that only parts of the codec are highly parallelizable (motion compensation, motion estimation, DCT and iDCT); other parts of the pipeline (syntax decoding, variable length coding, CABAC) are not so well suited for NVIDIA's array of Streaming Processors.
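The effect of having only part of the pipeline parallelizable can be estimated with Amdahl's law. The sketch below is a back-of-the-envelope illustration: the 80% parallel fraction and the 20x GPU speedup are hypothetical numbers, not figures from Elemental or from our testing.

```python
def amdahl_speedup(parallel_fraction: float, parallel_speedup: float) -> float:
    """Overall speedup when only `parallel_fraction` of the original runtime
    is accelerated by a factor of `parallel_speedup` (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if parallel_speedup <= 0:
        raise ValueError("parallel_speedup must be positive")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / parallel_speedup)


# Hypothetical split: 80% of encode time in GPU-friendly stages,
# which the GPU runs 20x faster than the CPU did.
overall = amdahl_speedup(parallel_fraction=0.8, parallel_speedup=20.0)
print(f"overall speedup: {overall:.2f}x")
```

With those assumed numbers, the serial 20% caps the overall gain at 5x no matter how fast the GPU gets, which is why the CPU-bound stages still matter.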
Elemental also indicated that performance scales linearly with the number of SPs in the GPU, so presumably the GeForce GTX 280 should be nearly 90% faster (at least at the GPU-accelerated functions) than a GeForce 9800 GTX.
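The "nearly 90%" figure follows from the SP counts alone: the GeForce GTX 280 has 240 SPs and the GeForce 9800 GTX has 128. A quick check, assuming throughput is proportional to SP count and ignoring differences in shader clock and memory bandwidth (the function name is ours, for illustration only):

```python
# Stream processor (SP) counts for the two cards being compared.
SP_COUNT = {"GeForce GTX 280": 240, "GeForce 9800 GTX": 128}


def linear_scaling_gain(fast_sps: int, slow_sps: int) -> float:
    """Fractional speed gain if throughput scales with SP count alone."""
    return fast_sps / slow_sps - 1.0


gain = linear_scaling_gain(SP_COUNT["GeForce GTX 280"], SP_COUNT["GeForce 9800 GTX"])
print(f"{gain:.1%} faster")  # prints "87.5% faster"
```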
SLI Support?
As we found in our GT200 article, in most cases NVIDIA's fastest graphics solution is actually the pair of G92s found on a single GeForce 9800 GX2. Unfortunately, Elemental's software will not split up a single video stream for processing across multiple GPUs, so for this application NVIDIA's fastest GPU would be the GeForce GTX 280.
There is an exception, however: if you do have multiple GPUs in your system, the professional version of Elemental's software will let you output to two different resolution/bitrate targets at the same time, with each GPU handling a different transcode stream.
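The scheduling rule described here (one whole transcode target per GPU, never one stream split across GPUs) can be sketched as follows. The `Target` type, the `assign_targets` function and the example resolutions and bitrates are hypothetical; nothing here reflects Elemental's actual interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """One output of a transcode job (hypothetical structure)."""
    width: int
    height: int
    bitrate_kbps: int


def assign_targets(targets: list[Target], gpu_count: int) -> dict[int, Target]:
    """Give each output target its own GPU; a target is never split across GPUs."""
    if len(targets) > gpu_count:
        raise ValueError("more output targets than GPUs")
    return {gpu_id: target for gpu_id, target in enumerate(targets)}


# Two outputs from the same source on a hypothetical two-GPU system.
jobs = assign_targets([Target(1920, 1080, 8000), Target(640, 360, 1000)], gpu_count=2)
for gpu_id, target in jobs.items():
    print(f"GPU {gpu_id}: {target.width}x{target.height} @ {target.bitrate_kbps} kbps")
```

Under this rule a second GPU only helps when there is a second output to produce; a single-target job runs on one GPU regardless of how many are installed.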
Final Words
We've still got a couple of months before Elemental's software makes its official debut, but it's honestly the most exciting non-gaming application we've seen for NVIDIA's hardware.
We have already given Elemental some feedback as to features we'd like to see in the final version of the software (including support for .m2ts and .evo files as well as .mkv input/output). If there's anything you'd like to see, leave it in the comments and we'll pass along the thread to Elemental.
Elemental's software, if it truly performs the way we've seen here, has the potential to be a disruptive force in both the GPU and CPU industries. On the GPU side it would give NVIDIA hardware a significant advantage over AMD's GPUs, and on the CPU side it would upset the balance between AMD and Intel. Video encoding has historically been an area where Intel's CPUs have done very well, but if the fastest video encoder ends up being an NVIDIA GPU, it could mean that video encoding performance would be microprocessor agnostic; you'd just need a good NVIDIA GPU.
If you're wondering why Intel is trying to launch Larrabee next year, this is as good a consumer example as you're going to get.
50 Comments
tuteja1986 - Tuesday, June 24, 2008 - link
err :( IQ testing needed. Anyways, I use AVIVO converter to do quick conversions for my Zen. It takes me 3 mins to do a whole 40 min episode of a TV show or 6 mins for a movie.
rhangman - Tuesday, June 24, 2008 - link
You would need to at least compare profiles. How many reference frames, etc. More advanced settings require more resources to encode (and decode).
Really comes down to actually viewing the encodes though. If it can't touch x264's quality, then it doesn't matter how much faster it is. The scene will stick with x264 anyway just like they did with Xvid over DivX, 3ivX, etc. So it is x264 selling players, hardware decoding graphics cards, etc. just like it was Xvid selling Standalone DivX players.
JonnyDough - Wednesday, June 25, 2008 - link
Agreed. I don't rip CDs to my computer at low bit rate either. The higher the quality (even if you can't see it) the better. Just because your computer monitor can't display uber-high res doesn't mean that the massive tv set or projector you buy a few years down the road can't. There's simply no point in building a library of movies and music if you aren't going to rip it in the highest quality possible. If you need to transfer it to a limited sized mobile storage device THEN you convert it on the fly. With larger than 1TB hard drives on the way I don't see who wouldn't want to store their movies in the sharpest image possible.
7oby - Tuesday, June 24, 2008 - link
[quote]You would need to at least compare profiles. How many reference frames, etc. More advanced settings require more resources to encode (and decode).[/quote]
The bitrate hardly has an impact on encoding speed. As far as I understood CABAC is done on the 30% loaded CPU here. That means if you change the bitrate, the quality will change, but the encoding speed should basically be the same.
As you said: profile comparison gives some more information. I expect Baseline Profile at most since it's a derived product:
http://elementaltechnologies.com/products.php?id=4
Maybe it's only intra frame:
http://elementaltechnologies.com/products.php?id=1
One would need more advanced quality tests to tell e.g. how good the motion estimation works. If this one is bad, you will need additional bandwidth to compensate.
In any case: I think this is an innovative product for the intended target use case of transcoding movies for iPhones, PS3, HTPC etc.
x264 developers recently turned away from CUDA, although they started experimenting in December 2007:
http://forums.nvidia.com/lofiversion/index.php?t53...
Anand Lal Shimpi - Tuesday, June 24, 2008 - link
The problem is that the beta of BadaBOOM doesn't expose any of what it's doing to the end user; we'll have to wait for the pro version for that, it seems.
-A
rhangman - Tuesday, June 24, 2008 - link
Should be able to extract the raw video stream and do some analysis on it though. Get an idea what the encoder is doing. For a visual comparison you don't need to know anyway.
At any rate, in terms of encoding, speed is only half the equation.
Anand Lal Shimpi - Tuesday, June 24, 2008 - link
Agreed :) Working on that part, I may wait until the next beta though so we have the latest code at our disposal.
-A
Rainman200 - Tuesday, June 24, 2008 - link
That's great news, thanks Anand. Also do compare with Ripbot as well, as it uses more bleeding edge versions of x264 with patches for film grain optimization and more.
Also a comparison against the constant quality mode would be interesting. I use Ripbot (default high profile) with a setting of 18 and with most movies I can barely tell the difference on the HDTV vs the original. On a 3GHz quad core 2 it takes at worst about 1 hour 30 mins on average for a film, usually shorter, but noisy/grainy movies like Downfall or Minority take that little bit longer.
Will be very interesting to see how RapiHD fares in terms of image quality.
SlyNine - Tuesday, June 24, 2008 - link
Make it support other transcoding functions, like for audio WMA-MP3. I know that for the most part the only thing that really needs this type of boost is HQ H.264 Blu-ray movies (and HD-DVD).
But it would be awesome to accelerate the other stuff as well.
icrf - Tuesday, June 24, 2008 - link
Those are pretty lightweight compared to H.264, so there's probably little reason. Besides, writing a codec in CUDA is no small task. I think they should focus on making the H.264 encoder as flexible as possible.
I assume the feature request is more to do with features of the application doing the encoding, and not about what code to run on the GPU itself. My encoding settings of choice tend to be a Level 4.1 compliant MPEG4 file, H.264 video and 5.1 AAC-LC audio. Ideally, it should use any VFW codecs installed and dump to the various common containers (avi/mkv/mp4/ogm with things like mov/wmv being distant seconds).