Speech Recognition - Ready for Prime Time?
by Jarred Walton on April 21, 2006 9:00 AM EST
Processor Utilization - Rapid Speech
Accuracy was obviously far lower with the faster, more casual delivery. What happens with processor utilization?
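[Chart: Dictation Processor Utilization - DNS8 and MSWord at Maximum, Medium, and Minimum Accuracy]
[Chart: Transcription Processor Utilization - DNS8 at Maximum, Medium, and Minimum Accuracy]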
Processor requirements climb sharply with a faster speech delivery, at least with Dragon NaturallySpeaking. Perhaps my enunciation suffers when I'm speaking fast, but the result is not only lower accuracy but also longer speech recognition times.
Our test system was (barely) able to keep up with our regular delivery when dictating, so the fact that it requires 15.7 minutes on maximum accuracy to process an 8:11 file shows how important "user training" really is. When dictating at 120 to 130 wpm, both software packages are able to keep up. At 146 wpm, the maximum accuracy setting ends up processing only 77 words per minute. Throw in the increased number of errors, and rapid delivery is definitely not the preferred way to use either of these packages. If "Natural" means "Fast" to you, you might want to mentally rename DNS to Dragon "Enunciate-And-Speak-Slower" Speaking.
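As a rough sanity check on those figures, here is a minimal Python sketch of the arithmetic. The function names and the idea of a "real-time factor" (processing time divided by audio length) are our own illustration, not something reported by either speech package, and the inputs are the rounded numbers quoted above.

```python
def real_time_factor(processing_minutes, audio_minutes):
    """Processing time divided by audio length; above 1.0 means the recognizer falls behind."""
    return processing_minutes / audio_minutes


def effective_wpm(speaking_wpm, rtf):
    """Words per minute actually delivered; capped at the speaking rate when rtf <= 1."""
    return speaking_wpm / max(rtf, 1.0)


# Figures quoted in the article (rounded, so the result is approximate).
audio_minutes = 8 + 11 / 60      # the 8:11 recording
processing_minutes = 15.7        # maximum accuracy processing time
speaking_wpm = 146               # rapid delivery rate

rtf = real_time_factor(processing_minutes, audio_minutes)
throughput = effective_wpm(speaking_wpm, rtf)

print(f"real-time factor: {rtf:.2f}x")
print(f"effective throughput: {throughput:.0f} wpm")
```

With these rounded inputs the sketch lands at a bit over 1.9x real time and roughly 76 wpm, in line with the 77 wpm figure; the small gap presumably comes from rounding in the quoted speaking rate and times.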
Transcribe mode is also slower, but it doesn't take nearly as big a hit. Also interesting is that both the transcription mode and the dictation mode manage to max out processor usage with Dragon NaturallySpeaking, yet live dictation takes 2 to 5 times as much CPU time. Clearly, the full speech UI is putting a decent load on the CPU. That makes sense, as continually trying to determine whether the user has spoken a special command (e.g. trying to access the File menu) would create some overhead. Still, being two to three times slower seems a bit extreme. Of course, keep in mind that during normal writing you can rarely speak at full speed for several minutes; normally, you'll take frequent pauses to think of exactly what you want to say next.
38 Comments
FrankyJunior - Sunday, April 30, 2006
For anyone that wants to try Dragon, I just noticed that the preferred version is in the CompUSA ad today for $99. Never would have looked twice at it if I hadn't read this article yesterday.
NullSubroutine - Thursday, April 27, 2006
are we to the day when i say 'computer' and it does what i want, and when i time travel by going around the sun ill be confused when they hand me a mouse and keyboard when wanting to use a computer?
JarredWalton - Thursday, April 27, 2006
Almost. And if you go around the sun *backwards* you can travel through time in the other direction. :D
quanta - Tuesday, April 25, 2006
How about a review based on VoiceBox Technologies (http://www.voicebox.com) products? It was demonstrated on Discovery Channel, and it seems to work without extensive voice training, and it actually _understands_ human speech. The Discovery Channel clip can be found here: http://www.exn.ca/dailyplanet/view.asp?date=3/13/2...
rico - Tuesday, April 25, 2006
Where did you find Dragon Pro for $160? I thought it usually cost about $800. Thanks.
JarredWalton - Tuesday, April 25, 2006
Heh, sorry - got "Preferred" and "Professional" mixed up. I'm not entirely sure what Pro includes, i.e. "Comes with a full set of network deployment tools." Trying to surf through Nuance's site is a bit tricky, and finding prices takes some effort as well. I think the only difference between Standard and Preferred is the ability to transcribe recordings in Preferred - can anyone confirm for sure? I asked Nuance and didn't get a reply.
Tabah - Sunday, April 23, 2006
Excellent article/review. Here's the question I've been wondering. Personally I use DNS for blogging and generally anything that requires excessive typing. A friend of mine on the other hand swears by IBM ViaVoice. Any chance we could get a comparison article/review at a later date?
JarredWalton - Tuesday, April 25, 2006
I will try to get in touch with IBM. I'm sure they wouldn't mind participating in a follow-up article.
Tabah - Tuesday, April 25, 2006
Oddly enough ViaVoice is licensed by Nuance so you might have a better chance talking to them. The main reason I'd like to see a comparison between VV and DNS isn't so much because they're made/released by the same company, but because of the cost difference between them. Like I said before I really like DNS but VV at the high end (VV Pro USB vs DNS Pro) is still a few hundred dollars cheaper.
Poser - Sunday, April 23, 2006
Listening to the dictation files, I was amazed that all the punctuation was spoken. I would have expected that they would (or could) be replaced by using a non-speech sound. Something along the lines of a click of the tongue for a comma -- there's a good number of distinct sounds you can make with your tongue that we don't have words for but that anyone could recognize and make. Think of "The Gods Must be Crazy" and the language used by the Kalahari bushmen for an extreme example. Also, thanks for the article, it was really interesting and potentially very helpful! I'll hold off until Vista hits and I see some comparisons, but I'm certain now that I'll end up using one of the two.