Speech recognition: The “what”, “why” and “how”
Published Date: April 18, 2020
For years, from the middle of the last decade onward, speech recognition technology seemed more of a gimmick than a tool of real value to mainstream consumers. Lately, however, it has improved considerably: most modern smartphones now offer a host of voice-activated features that “actually work”. Programs such as Google Now and the iPhone’s Siri can handle specific programmed tasks like finding restaurants or dialing contacts, and smartphones are also getting much better at free-form speech recognition, such as taking dictated text messages or e-mails. How did computers get so much better at understanding speech, especially when the first systems were so poor?
Almost any word can begin a sentence, so the first word can be any one of tens of thousands. If every word were equally likely in every position, even a vocabulary of only 10,000 words would give a five-word utterance 10,000 to the fifth power, or 10^20, possible word sequences. Faced with such odds, compounded by noisy audio from poor-quality microphones without noise cancellation, building a practical speech recognition system once seemed hopeless.
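The arithmetic behind this combinatorial explosion is easy to check. The sketch below assumes a hypothetical 10,000-word vocabulary (the article gives no exact figure) and counts every possible five-word sequence when any word may appear in any position.

```python
# Hypothetical vocabulary size; the article only says "tens of thousands" of words.
VOCAB_SIZE = 10_000
UTTERANCE_LENGTH = 5


def num_word_sequences(vocab_size: int, length: int) -> int:
    """Count distinct word sequences if every word can appear in every position."""
    return vocab_size ** length


total = num_word_sequences(VOCAB_SIZE, UTTERANCE_LENGTH)
print(f"{total:,}")  # prints 100,000,000,000,000,000,000
```

Each extra word multiplies the count by the vocabulary size, which is why a recognizer cannot simply compare the audio against every possible sentence.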
But words do not appear in random order, so the computer does not have to pick blindly from the whole vocabulary. Instead, the software estimates how likely it is that a given word was spoken based on the surrounding words, drawing on statistical models derived from vast repositories of digital documents and from the previous utterances of other users. This statistical approach is what allowed developers of speech recognition software to succeed at a task that once seemed impossible.
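A minimal sketch of this idea, assuming a bigram language model with add-one smoothing: it scores candidate transcriptions by how often each word follows the previous one in a training corpus. The corpus, the candidate phrases and the function names (`train_bigrams`, `sentence_prob`) are hypothetical illustrations; real recognizers use far larger corpora and richer models, and combine these language scores with acoustic scores from the audio.

```python
from collections import Counter

# Hypothetical toy corpus; real systems train on vast collections of text.
CORPUS = [
    "it is easy to recognize speech",
    "software can recognize speech today",
    "we walked along a nice beach",
    "it is hard to recognize speech in noise",
]


def train_bigrams(sentences):
    """Count word pairs and how often each word is used as a context."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        contexts.update(words[:-1])                 # every word that has a successor
        bigrams.update(zip(words[:-1], words[1:]))  # (previous word, next word)
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return contexts, bigrams, vocab


def sentence_prob(sentence, contexts, bigrams, vocab):
    """P(sentence) under an add-one (Laplace) smoothed bigram model."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(words[:-1], words[1:]):
        prob *= (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))
    return prob


contexts, bigrams, vocab = train_bigrams(CORPUS)

# Two acoustically similar hypotheses for the same audio (hypothetical).
candidates = ["recognize speech", "wreck a nice beach"]
best = max(candidates, key=lambda c: sentence_prob(c, contexts, bigrams, vocab))
print(best)  # prints recognize speech
```

Smoothing gives unseen word pairs a small nonzero probability, so an unusual but genuine utterance is penalized rather than ruled out entirely.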
Statistical models now power all kinds of language applications, including machine translation. Older translation software tried to break a sentence down into its grammar and meaning and then recompose it in the new language. The best modern systems instead estimate how likely a given rendering in the target language is to be a correct translation of the input, based on a large body of human-translated material on which the computer has been trained.
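The core of the statistical approach can be illustrated with a count-based estimate of translation probabilities. The sketch below is a simplification in the style of classic statistical machine translation, assuming word pairs have already been aligned from human translations; the aligned pairs and function names are hypothetical, and real systems work with phrases, add a target-language model, and (in many current systems) use neural networks instead of counts.

```python
from collections import Counter, defaultdict

# Hypothetical (source, target) word pairs aligned from human-translated text.
ALIGNED_PAIRS = [
    ("maison", "house"),
    ("maison", "house"),
    ("maison", "home"),
    ("chat", "cat"),
    ("chat", "cat"),
]


def train_translation_table(pairs):
    """Estimate P(target | source) by relative frequency in the aligned pairs."""
    pair_counts = Counter(pairs)
    source_counts = Counter(src for src, _ in pairs)
    table = defaultdict(dict)
    for (src, tgt), count in pair_counts.items():
        table[src][tgt] = count / source_counts[src]
    return table


def most_likely_translation(table, source):
    """Return the highest-probability target word, or None if the source is unseen."""
    options = table.get(source)
    if not options:
        return None
    return max(options, key=options.get)


table = train_translation_table(ALIGNED_PAIRS)
print(table["maison"])                           # house has 2/3, home has 1/3
print(most_likely_translation(table, "maison"))  # prints house
```

The more human-translated text the model sees, the more reliable these relative frequencies become, which is why training data matters so much for this kind of system.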
Computers become more useful to us the more they learn about us, both collectively and individually. Increasingly, the question for consumers is how much personal information they are willing to give up in return for more helpful and reliable services.
Contact Person: Mr. Vijendra Singh
Contact Email: email@example.com