History of ASR Technologies

August 31, 2023

“Hey, Siri, what’re the top priorities on my schedule today?”

Today, we may think of this as a relatively simple request for our handy voice assistants to accomplish. And yet, it’s the culmination of decades of progress in Automatic Speech Recognition (ASR) technology.

So, what is ASR? At its essence, ASR involves teaching machines to listen and understand our words, then act. But how did we even reach the point where a phrase like “OK Google” could unlock a world of capabilities?

From Thomas Edison’s dictation machine to Siri’s casual chat, the history of ASR is a tale of innovation that has transformed not just the gadgets in our pockets but how we function in modern society.

Let’s trace the steps that brought voice commands to life.

1800s: The Origins of Dictation

1879 – Inventor Thomas Edison unveils the world’s first dictation machine.

1950s: The Infancy of Speech Recognition Technology

1952 – Bell Laboratories introduces “Audrey,” a machine that recognizes the voice of its developer, K. H. Davis, and identifies spoken digits from 0 to 9 with an accuracy rate exceeding 90%.

1960s: From Digits to Spoken Words

1962 – IBM showcases its “Shoebox” machine, which can understand 16 spoken English words, including the digits 0 through 9.

1970s: From Words to Complete Sentences

1971–1976 – The U.S. Department of Defense funds the DARPA Speech Understanding Research (SUR) program to advance voice recognition technologies with potential uses in both military and civilian contexts.

1976 – Researchers at Carnegie Mellon University develop “Harpy,” an advanced ASR system built on the foundations of the DARPA SUR program. It can comprehend 1,011 words and recognize some complete sentences, marking a significant advancement in the field of speech recognition.

1980s: From a Few Hundred Words to Several Thousand | The Decade of HMM

The ’80s were a pivotal decade for ASR technologies. The adoption of a statistical method known as the Hidden Markov Model (HMM) was an inflection point that revolutionized language modeling, enhancing accuracy and laying the groundwork for even more sophisticated ASR.

Unlike traditional approaches that relied solely on word recognition and sound patterns, HMM brought a new dimension by enabling the prediction of the most probable phonemes to follow a given phoneme.
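To make the idea concrete, here is a minimal Python sketch of a toy HMM. It is an illustration only, not a description of any historical system: the three phonemes, the coarse acoustic labels ("burst," "vowel," "stop"), and every probability below are invented assumptions. The first function picks the most probable phoneme to follow a given one, and the second uses the standard Viterbi algorithm to find the most probable phoneme sequence for a series of observed sounds.

```python
# Toy hidden Markov model. Hidden states are phonemes; observations are
# coarse acoustic labels. All names and probabilities are invented.
states = ["k", "ae", "t"]

# Probability that an utterance starts with each phoneme.
start_p = {"k": 0.6, "ae": 0.2, "t": 0.2}

# Probability of moving from one phoneme to the next.
trans_p = {
    "k": {"k": 0.1, "ae": 0.8, "t": 0.1},
    "ae": {"k": 0.1, "ae": 0.2, "t": 0.7},
    "t": {"k": 0.3, "ae": 0.3, "t": 0.4},
}

# Probability that a phoneme produces each acoustic label.
emit_p = {
    "k": {"burst": 0.7, "vowel": 0.1, "stop": 0.2},
    "ae": {"burst": 0.1, "vowel": 0.8, "stop": 0.1},
    "t": {"burst": 0.3, "vowel": 0.1, "stop": 0.6},
}


def most_likely_next(phoneme):
    """Return the phoneme most likely to follow the given phoneme."""
    return max(trans_p[phoneme], key=trans_p[phoneme].get)


def viterbi(observations):
    """Return the most probable phoneme sequence for the observations."""
    # best[s] holds (probability, path) for the best path ending in s.
    best = {
        s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states
    }
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            # Choose the predecessor that maximizes the path probability.
            prob, path = max(
                ((best[prev][0] * trans_p[prev][s], best[prev][1])
                 for prev in states),
                key=lambda pair: pair[0],
            )
            new_best[s] = (prob * emit_p[s][obs], path + [s])
        best = new_best
    return max(best.values(), key=lambda pair: pair[0])[1]


print(most_likely_next("k"))
print(viterbi(["burst", "vowel", "stop"]))
```

With these made-up numbers, the sketch decodes the three sounds as the phonemes of the word “cat.” Real systems work with far larger phoneme inventories and with probabilities learned from recorded speech.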

Mid-’80s – IBM introduces “Tangora,” an HMM-based voice-activated typewriter with a 20,000-word spoken vocabulary.

1987 – Worlds of Wonder releases Julie, marketed as the world’s most intelligent doll. It is the first “fully interactive toy,” thanks to a DSP chip that allows it to respond to and generate basic speech.

1990s: Microprocessors Change the ASR Game

Before the ’90s, ASR systems relied on discrete dictation, which required the speaker to pause after every word so that the technology could recognize each one accurately. In the 1990s, the advent of faster microprocessors brought a transformative shift, enabling faster and more accurate speech pattern recognition.

1990 – “Dragon Dictate” makes history as the world’s first speech recognition software tailored for consumer use, revolutionizing how individuals interact with computers and opening the door to a new era of speech-enabled applications.

1997 – Later in the decade, Dragon releases “Dragon NaturallySpeaking,” the first product capable of recognizing continuous speech at up to 100 words per minute.

2000s: Speech Recognition Technology Becomes Faster and More Accurate

While the technology evolved continuously over the decade, the most significant milestone was the introduction of Google’s Voice Search app. The app exposed tens of millions of users to speech recognition technology, and it allowed Google to gather petabytes of voice data that could be used to advance the technology and improve its predictions.

2010s: The Digital Assistant Explosion

The 2010s witnessed a remarkable rise in smartphone market penetration. At the start of the decade, just 20% of the U.S. population owned a smartphone. Within a few short years, smartphones had become an indispensable part of daily life, and by 2020, 72.2% of the population had one in their pocket.¹

Because of this, there was a large increase in speech recognition software and apps, sparked by the release of smart speakers and digital assistants like Siri or Alexa.

2011 – Apple launches Siri, the world’s first intelligent digital assistant on a phone.

2017 – Google’s machine learning algorithms achieve a 95% word accuracy rate for English, which is on par with human capabilities.
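For readers curious about how a figure like this is typically calculated, the sketch below shows one common convention: word accuracy is taken as 1 minus the word error rate (WER), where WER counts the substituted, deleted, and inserted words relative to the length of a reference transcript. This is an assumption about the general metric, not a description of Google’s own evaluation, and the function name and sample sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # dist[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete every remaining reference word
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# Hypothetical example: one wrong word in a ten-word request.
reference = "hey siri what are the top priorities on my schedule"
hypothesis = "hey siri what are the top priority on my schedule"
wer = word_error_rate(reference, hypothesis)
accuracy = 1 - wer
print(f"WER: {wer:.0%}, word accuracy: {accuracy:.0%}")
```

Under this convention, a 95% word accuracy rate means roughly one error for every 20 words spoken.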

2020s: AI and ASR

The 2010s may have been defined by the rise of digital assistants, but the 2020s are shaping up to be the decade where AI’s influence on ASR technology becomes truly transformative, in terms of both acoustics and semantics. Some of the notable advancements include:²,³,⁴

Optimization Techniques – By harnessing innovations such as Faster-whisper and NVIDIA-wav2vec2, the ASR industry has been able to significantly reduce both training and inference times while making ASR tech more accessible and deployable.

Generative AI – Generative AI is heralding a revolution in human-digital interaction, employing avatars, Textless NLP, and innovative models like VALL-E for direct audio processing, voice cloning, and flexible, context-aware applications.

Conversational AI – Conversational AI is rapidly advancing with personal assistants like Alexa and Siri, evolving from text-based systems to sophisticated voice-based interfaces, with a focus on interoperability, nuanced communication, support for diverse accents, and multi-task learning frameworks for a wide array of spoken language tasks.

Global reach with multilingual ASR – With the introduction of multilingual speech recognition systems, companies are now making their applications and services available to a global audience.

Enhanced accessibility through automated captions – Live video content is now more accessible and inclusive than ever thanks to automated captioning.

AI-driven accuracy enhancements – AI continues to drive unprecedented levels of accuracy in advanced speech recognition technology. Through continuous learning and adaptation, ASR software is becoming more intuitive and responsive, paving the way for future innovations.

The Future of ASR Technologies

The evolution of AI technologies like machine learning (ML), deep learning (DL), natural language processing (NLP), neural networks, and ASR has accelerated exponentially in recent years. This rapid growth is pushing the boundaries of speech recognition and is poised to continue its transformational influence at an unprecedented pace over the coming decade.

With many practical applications in the legal industry—from contract review and negotiation to litigation prediction and analytics, legal research, transcription and more—speech recognition and artificial intelligence in law firms will continue to gain adoption.

If you are looking for reliable and accurate litigation support services, we can help. At U.S. Legal Support, we are constantly monitoring technologies, including AI and ASR. Whether you need court reporting, medical record retrieval, legal translation, or more, we can act as your litigation support partner.

Sources:

1. Statista. Smart Phone Penetration Rate as Share of the Population in the US. https://www.statista.com/statistics/201183/forecast-of-smartphone-penetration-in-the-us/
2. Towards Data Science. Overcoming Automatic Speech Recognition Challenges: The Next Frontier. https://towardsdatascience.com/overcoming-automatic-speech-recognition-challenges-the-next-frontier-e26c31d643cc
3. NVIDIA. Essential Guide To Automatic Speech Recognition Technology. https://developer.nvidia.com/blog/essential-guide-to-automatic-speech-recognition-technology/
4. Customerzone360. AI Is Driving Greater Accuracy in Advanced Speech Recognition. https://www.customerzone360.com/topics/customer/articles/455823-ai-driving-greater-accuracy-advanced-speech-recognition.htm#

Julie Feller

Julie Feller is the Vice President of Marketing at U.S. Legal Support, where she leads innovative marketing initiatives. With a proven track record in the legal industry, Julie previously served at Abacus Data Systems (now Caret Legal), where she played a pivotal role in providing cutting-edge technology platforms and services to legal professionals nationwide.

Editorial Policy

Content published on the U.S. Legal Support blog is reviewed by professionals in the legal and litigation support services field to help ensure accurate information. The information provided in this blog is for informational purposes only and should not be construed as legal advice for attorneys or clients.
