“Hey, Siri, what’re the top priorities on my schedule today?”
Today, we may think of this as a relatively simple request for our handy voice assistants to accomplish. And yet, it’s the culmination of decades of progress in Automatic Speech Recognition (ASR) technology.
So, what is ASR? At its essence, ASR involves teaching machines to listen and understand our words, then act. But how did we even reach the point where a phrase like “OK Google” could unlock a world of capabilities?
From Thomas Edison’s dictation machine to Siri’s casual chat, the history of ASR is a tale of innovation that transformed not just the gadgets in our pockets but how we function in modern society.
Let’s trace the steps that brought voice commands to life.
1800s: The Origins of Dictation
1879 – Inventor Thomas Edison unveils the world’s first dictation machine
1950s: The Infancy of Speech Recognition Technology
1952 – Bell Laboratories introduces “Audrey,” an innovative machine able to recognize the digits 0 through 9 spoken by its developer, H.K. Davis, with an accuracy rate exceeding 90%.
1960s: From Digits to Spoken Words
1962 – IBM showcases its “Shoebox” machine, which is capable of understanding 16 English words as well as numerical digits.
1970s: From Words to Complete Sentences
1971–1976 – The DARPA Speech Understanding Research (SUR) program, funded by the U.S. Department of Defense, advances speech recognition technologies with potential uses in both military and civilian contexts.
1976 – Researchers at Carnegie Mellon University develop “Harpy,” an advanced ASR system built on the DARPA SUR program. It could comprehend 1,011 words and recognize some complete sentences, a significant advancement in the field of speech recognition.
1980s: From a Few Hundred Words to Several Thousand | The Decade of HMM
The ’80s were a pivotal decade for ASR technologies. The introduction of the statistical method known as the “Hidden Markov Model (HMM)” became an inflection point that revolutionized language modeling, enhancing accuracy and laying the groundwork for even more sophisticated ASR.
Unlike earlier approaches that matched sound patterns directly against stored templates, an HMM treats speech as a sequence of hidden states (such as phonemes) that emit observable acoustic signals, allowing the system to compute the most probable phoneme, and ultimately word, sequence for a given utterance.
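To make the idea concrete, here is a minimal sketch of HMM decoding with the Viterbi algorithm. The phonemes, observation labels, and every probability below are invented purely for illustration; a real ASR system uses thousands of context-dependent states with probabilities learned from training data.

```python
# Toy Hidden Markov Model: given a sequence of acoustic observations,
# find the most likely phoneme sequence using the Viterbi algorithm.
# All states, observations, and probabilities are made up for illustration.

states = ["k", "ae", "t"]                     # hypothetical phonemes for "cat"
start_p = {"k": 0.8, "ae": 0.1, "t": 0.1}     # P(first phoneme)
trans_p = {                                   # P(next phoneme | current phoneme)
    "k":  {"k": 0.2, "ae": 0.7, "t": 0.1},
    "ae": {"k": 0.1, "ae": 0.3, "t": 0.6},
    "t":  {"k": 0.1, "ae": 0.2, "t": 0.7},
}
emit_p = {                                    # P(acoustic frame | phoneme)
    "k":  {"burst": 0.7, "voiced": 0.1, "hiss": 0.2},
    "ae": {"burst": 0.1, "voiced": 0.8, "hiss": 0.1},
    "t":  {"burst": 0.4, "voiced": 0.1, "hiss": 0.5},
}

def viterbi(obs):
    """Return (best phoneme path, its probability) for an observation sequence."""
    # best[s] = (probability, path) of the best path ending in state s so far
    best = {s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        best = {
            s: max(
                ((p * trans_p[prev][s] * emit_p[s][o], path + [s])
                 for prev, (p, path) in best.items()),
                key=lambda candidate: candidate[0],
            )
            for s in states
        }
    prob, path = max(best.values(), key=lambda candidate: candidate[0])
    return path, prob

path, prob = viterbi(["burst", "voiced", "hiss"])
print(path)  # most likely phoneme path for the toy observations
```

The key point is the one the 1980s breakthrough rested on: instead of matching a whole word against a template, the decoder scores every possible phoneme sequence and keeps only the most probable one at each step.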
Mid ’80s – IBM introduces “Tangora,” an HMM-based voice-activated typewriter boasting a 20,000-word spoken vocabulary.
1987 – Worlds of Wonder releases Julie, marketed as the world’s most intelligent doll and the first “fully interactive toy”: its DSP chip allowed it to respond to and generate basic speech.
1990s: Microprocessors Change the ASR Game
Before the ’90s, Automatic Speech Recognition systems relied on discrete dictation: the speaker had to pause after every single word so the technology could accurately recognize it. In the 1990s, the advent of faster microprocessors transformed the field, enabling quicker and more accurate recognition of natural speech patterns.
1990 – “Dragon Dictate” makes history as the world’s first speech recognition software tailored for consumer use, revolutionizing how individuals interact with computers and opening the door to a new era of speech-enabled applications.
1997 – Later in the decade, Dragon releases “Dragon NaturallySpeaking,” the first continuous speech recognition product, capable of understanding natural speech at up to 100 words per minute.
2000s: Speech Recognition Technology Becomes Faster and More Accurate
While the technology evolved continuously over the decade, the most significant milestone was the introduction of Google’s Voice Search app, which exposed tens of millions of users to speech recognition. It also allowed Google to gather petabytes of voice data that could be used to advance the technology and improve prediction accuracy.
2010s: The Digital Assistant Explosion
The 2010s witnessed a remarkable and rapid rise in smartphone market penetration. At the start of the decade, just 20% of the population owned smartphones. But, in a few short years, the technology progressed rapidly, making smartphones an indispensable part of daily life. By the year 2020, an astounding 72.2% of the population had a smartphone in their pocket.1
This drove a surge in speech recognition software and apps, sparked by the release of smart speakers and digital assistants like Siri and Alexa.
2011 – Apple introduces and launches Siri, the world’s first intelligent digital assistant on a phone.
2017 – Google’s machine learning algorithms achieve 95% English word accuracy rate, which is equivalent to human capabilities.
2020s: AI and ASR
The 2010s may have been defined by the rise of digital assistants, but the 2020s are shaping up to be the decade where AI’s influence on ASR technology becomes truly transformative, in terms of both acoustics and semantics. Some of the notable advancements include:2,3,4
Optimization Techniques – By harnessing innovations such as Faster-whisper and NVIDIA-wav2vec2, the ASR industry has been able to significantly reduce both training and inference times while making ASR tech more accessible and deployable.
Generative AI – Generative AI is heralding a revolution in human-digital interaction, employing avatars, Textless NLP, and innovative models like VALL-E for direct audio processing, voice cloning, and flexible, context-aware applications.
Conversational AI – Conversational AI is rapidly advancing with personal assistants like Alexa and Siri, evolving from text-based systems to sophisticated voice-based interfaces, with a focus on interoperability, nuanced communication, support for diverse accents, and multi-task learning frameworks for a wide array of spoken language tasks.
Global reach with multilingual ASR – With the introduction of multilingual speech recognition systems, companies are now making their applications and services available to a global audience.
Enhanced accessibility through automated captions – Live video content is now more accessible and inclusive than ever thanks to automated captioning.
AI-driven accuracy enhancements – AI continues to drive unprecedented levels of accuracy in advanced speech recognition technology. Through continuous learning and adaptation, ASR software is becoming more intuitive and responsive, paving the way for future innovations.
The Future of ASR Technologies
The evolution of AI technologies like machine learning (ML), deep learning (DL), natural language processing (NLP), neural networks, and ASR has accelerated exponentially in recent years. This rapid growth is pushing the boundaries of speech recognition and is poised to continue its transformational influence at an unprecedented pace over the coming decade.
With many practical applications in the legal industry—from contract review and negotiation to litigation prediction and analytics, legal research, transcription and more—speech recognition and artificial intelligence in law firms will continue to gain adoption.
Julie Feller is the Head of Marketing at U.S. Legal Support. Prior to U.S. Legal Support, Julie worked at Abacus Data Systems (now Caret Legal) providing legal technology platforms and services to legal professionals across the country.
Content published on the U.S. Legal Support blog is reviewed by professionals in the legal and litigation support services field to help ensure accurate information.