Speech Recognition - Ready for Prime Time?
by Jarred Walton on April 21, 2006 9:00 AM EST
The Contenders
When I first made the decision to try out speech recognition, there was an overwhelming favorite on the market: Dragon NaturallySpeaking. I had never used it before, but I'd heard about it and it was generally well-regarded. I picked up a copy of Dragon NaturallySpeaking 8.0 Preferred and commenced using it. The training process took about 20 minutes, another 20 or 30 minutes was spent scanning my documents for words and speech patterns, and then it was basically done and I was ready to start dictating. I've now been using Dragon NaturallySpeaking for several months, and during that time training has further improved the accuracy.
Dragon isn't a particularly cheap piece of software, but when you consider the versatility it offers and the fact that I've already spent about $700 on a desk, chair, and keyboard in an attempt to make an "ergonomic workspace," spending another $100-$200 is hardly a concern. The $100 Standard version has reduced functionality, though apparently the only major difference is that it lacks the ability to transcribe recordings. For home and personal use, you can get a discount on the Preferred version and buy it for $160. Unless that extra $60 is really important to you, I would recommend going with the Preferred version -- you never know when the ability to transcribe a recording will come in handy.
Of course, Microsoft Office 2003 also has built-in speech recognition. I have never heard anyone really talk about it, and I have never tried it myself, but having become familiar with Dragon NaturallySpeaking I figured it was only fair that I give Microsoft's product a shot. After all, practically every business in the world has a copy of Microsoft Office 2003 installed, so perhaps there isn't even a need to go out and purchase separate speech recognition software. One other item that may be of interest is how much processing time each product needs. Voice recognition may or may not benefit from dual core processors, but there's only one way to find out.
I conducted testing on several systems, but eventually settled on using one for the actual benchmarking. If there's interest, I can go back and look at performance on other systems, but for the most part I have found that modern Pentium/Athlon systems are sufficient - with a few exceptions that I'll get to in a moment.
Test System:
AMD Athlon X2 3800+ @ 2.60 GHz (10x260HTT)
2x1024MB Patriot SBLK @ DDR-433 (CPU/12)
Western Digital 250GB 16MB SATA-2 HDD
I began using Dragon NaturallySpeaking on a single core Athlon 64 3200+ socket 754 Newcastle (@2.42 GHz) -- my old primary system, which I have been using for about 18 months. I finally broke down recently and decided it was time to move on to a dual core setup for my main system. Both systems are of course overclocked, because that's the type of user I am. Since this is a look at a software technology as opposed to a hardware article, the system clock speed isn't particularly relevant except as a guideline of what level of performance you can expect.
The major reason for the upgrade is gaming - the old AGP 6800GT wasn't cutting it anymore, and the only reasonable upgrade required PCI Express. (That should tell you something about the amount of processing power most business tasks require - the 754 platform is still more than sufficient for most people!) I figured since I was already switching to socket 939, there was no reason not to add a second processor core. That extra core does help out when I'm trying to do multiple things at once, and Dragon does tend to consume a decent amount of resources. MMO gamers might find it useful as a way of chatting without having to type (and it might just cut down on the use of annoying abbreviations if more people did it, but I digress...). When I'm only dictating, though, I don't really notice the difference between my old system and my new system as far as speech recognition is concerned.
So how do you test and benchmark speech recognition packages? The more real world a test is the better, and what could be more real world than an article written for our web site? How about this very article? I'm going to take the first two pages of the article in their present form (minus the Isaac Asimov quote and potentially some later edits) and dictate the text into a sound file. All punctuation will be dictated, and I will edit the final sound file to remove any speech errors. The final sound file will be played back for both speech recognition packages, and with 1181 words of text we can come up with an accuracy rating.
This first sound file is basically my "dictation voice". There are two elements to training a speech recognition program: first, it learns to recognize your voice; second, you learn to adapt your voice to improve accuracy. After creating this first sound file, I realized that my voice didn't sound very normal to me. I'm okay with that, but I decided a second sound file was needed to stress test the software packages. I read the text a second time for this sound file, with a few minor updates to the text, but this time I spoke in a more natural voice and I didn't go back to correct any errors. I won't count any of my errors against the accuracy score, but this will hopefully provide additional insight into how these two voice recognition packages perform.
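The text above does not spell out how recognition errors are tallied, so the following is only a minimal sketch of one common approach: count the word-level edits (substitutions, insertions, and deletions) between the original text and the recognized text, then divide by the number of words in the original. The function names and the sample sentences are hypothetical, and the tokenizer ignores punctuation, which is a simplification given that the punctuation was dictated in the actual test.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_errors(reference, hypothesis):
    """Return (errors, reference_word_count).

    errors is the word-level Levenshtein distance: the minimum number of
    substitutions, insertions, and deletions needed to turn the reference
    text into the recognized (hypothesis) text.
    """
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    previous = list(range(len(hyp) + 1))  # distances for an empty reference
    for i, ref_word in enumerate(ref, start=1):
        current = [i]  # turning i reference words into nothing: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # reference word was dropped
                current[j - 1] + 1,      # extra word was inserted
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1], len(ref)


def accuracy(reference, hypothesis):
    """Word accuracy in percent; assumed metric, not the article's own."""
    errors, total = word_errors(reference, hypothesis)
    if total == 0:
        raise ValueError("reference text contains no words")
    return 100.0 * (1 - errors / total)


if __name__ == "__main__":
    # Hypothetical sample: "about" misrecognized as "a bout".
    original = "The training process took about twenty minutes."
    recognized = "The training process took a bout twenty minutes."
    errors, total = word_errors(original, recognized)
    print(f"{errors} errors in {total} words: "
          f"{accuracy(original, recognized):.1f}% accurate")
```

Under this metric a single misheard word can count as more than one error (here, one substitution plus one insertion), and heavy insertion of extra words can push the score below zero, so the numbers are best used to compare packages on the same recording rather than as absolute figures.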
38 Comments
JarredWalton - Friday, April 21, 2006 - link
That's definitely true -- if you look at how accuracy scales with CPU usage, doubling and even tripling the processor time comes with only incremental increases in accuracy. I do have to say that I noticed it being a little sluggish on my single core system when I was multitasking, but obviously I push my computers a little harder than a lot of people. Depending on what you're willing to live with in terms of speed, I'm sure both Dragon and Microsoft speech recognition can work on a Pentium III level system.
LanceM - Friday, April 21, 2006 - link
So is that selection typical Asimov? If so, it has convinced me to never bother reading any of his works.
His ideas/plots/etc. may be interesting, but I don't think I could handle phrases like, "as if she were some dried-up, old-maid teacher." Give me Joseph Conrad or William Faulkner.
Dfere - Monday, April 24, 2006 - link
Asimov is classic sci-fi pulp, which usually had a gritty detective-novel appeal. His works are in large part murder mystery type novels. You have to understand the nature of the literature, the history and the author. I don't think a critique is deserved until then.
Most sci-fi writers of any ability first master imaginative concepts and apply them, even Drake and Stirling.
I give kudos to the staff for including literary comments; the poster who said this should not be a book of the month club lives a very one-dimensional life.
Shoal07 - Friday, April 21, 2006 - link
What makes Asimov special is that many of his ideas in science fiction are coming true today or are at least on the horizon. Asimov shaped the way many of us picture the future.
goinginstyle - Friday, April 21, 2006 - link
Why does the Anandtech staff revert to literary quotes in their reviews now? This is a computer website, not a book club.
JarredWalton - Friday, April 21, 2006 - link
I read Asimov's Foundation series as a teenager, and I loved it. He gave me lots of fanciful dreams about where technology might go in the future, and even though some of the writing styles have changed over the years, I still find a lot of these old sci-fi books to be entertaining. You should try reading War of the Worlds if you think that quote was bad. LOL
Sorry if some of you didn't like the quote. Everyone has their own dislikes and likes, but in the end it's just an introduction. I hope to one day be able to yell at my computer and have it properly understand what I say, as well as the context (i.e., yelling means something is going wrong, and maybe it can help me out). Will we ever get there? Probably some day, but whether it happens in our lifetimes or not is anyone's guess.
NegativeEntropy - Saturday, April 22, 2006 - link
I like the use of quotes -- though it does remind me a bit of being in English/writing class ("Always do something in the introduction to get your audience's attention...").
On the subject of "classic" Sci-fi writers, I also still enjoy old school Heinlein. Though his characters can get a bit repetitive across his pile of works, many of the science ideas are still valid (and I share much of his apparent personal philosophy).
On the actual article -- thanks for doing it. I have been curious where this technology was at in terms of every day usage and hardware requirements.
Regarding CPU usage, it's possible DNS attempts to use whatever resources are available based on preferences. i.e. on minimum, it attempts to impact the system minimally, regardless of the CPU resources available; say 25% on min, 50% on med and 95% on max with the percentage staying relatively consistent from a P3 1GHz to an A64 2.6GHz. This would explain its reported good scaling from system to system. If you want to test it, underclock your A64 system to half its frequency and compare utilization at the medium setting.
kristof007 - Friday, April 21, 2006 - link
Here at Anandtech you can always count on finding something else. Great article! I tried out speech recognition a few years back and I got frustrated with it over one thing or another so I just dropped it and went back to typing. I've been typing for about 8 years now. I never learned the "proper" way to type where every finger has a spot. Anyway, I hope Vista will make speech recognition WAAY better so that it could be used around the OS AND for speech recognition.
Thanks for the article!