Talking speech tech

Specialized technology areas tend to have their own jargon, and speech tech is quickly generating an alphabet soup of acronyms. Here are definitions of some of the key terms.

Automatic Speech Recognition (ASR) -- systems that use voice recognition to replace keypad entry in telephone voice menus. These are the systems that ask callers to speak the digits 0 through 9.

Computer Telephony Integration (CTI) -- combines data systems with voice systems to provide enhanced telephone services.

Dual-Tone Multi-Frequency (DTMF) -- the type of audio signals produced by a touch-tone telephone. Each key press sends a pair of tones, one low frequency and one high frequency.
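
To make the dual-tone idea concrete, here is a minimal Python sketch that maps a keypad symbol to its two tones and synthesizes the summed signal. The frequency table is the standard DTMF layout; the function names, the 8 kHz sample rate and the 0.1-second duration are illustrative assumptions, not part of any particular telephony product.

```python
import math

# Standard DTMF frequencies in Hz: one low tone per keypad row,
# one high tone per keypad column.
LOW_HZ = (697, 770, 852, 941)
HIGH_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key):
    """Return the (low, high) tone pair in Hz for one keypad symbol."""
    if len(key) != 1:
        raise ValueError(f"expected a single key, got {key!r}")
    for row, symbols in enumerate(KEYPAD):
        col = symbols.find(key)
        if col != -1:
            return LOW_HZ[row], HIGH_HZ[col]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key, duration_s=0.1, sample_rate=8000):
    """Synthesize one key press as a list of floats in [-1.0, 1.0].

    The sample rate and duration are assumed values for illustration.
    """
    low, high = dtmf_frequencies(key)
    count = round(duration_s * sample_rate)
    samples = []
    for n in range(count):
        t = n / sample_rate
        # Sum the two sine waves and halve so the result stays in range.
        value = 0.5 * (math.sin(2 * math.pi * low * t)
                       + math.sin(2 * math.pi * high * t))
        samples.append(value)
    return samples
```

Calling dtmf_frequencies("5") returns the row tone for the second keypad row and the column tone for the second column. A receiving system does the reverse: it detects which two frequencies are present and looks up the key.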

Grammars -- in speech tech circles, ''grammars'' are the phrases a user might say that a speech engine can recognize.
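
As a rough illustration, a grammar can be thought of as a lookup from expected phrases to the action each one triggers. The Python sketch below assumes the speech engine has already turned audio into text; the phrases, the action labels and the function names are hypothetical, and real engines use richer grammar formats that allow optional words and alternatives.

```python
import re

# Hypothetical grammar: each recognizable phrase maps to an action label.
GRAMMAR = {
    "check balance": "BALANCE",
    "transfer funds": "TRANSFER",
    "speak to an agent": "AGENT",
}


def normalize(utterance):
    """Lowercase the text and drop punctuation and extra whitespace."""
    return " ".join(re.findall(r"[a-z0-9']+", utterance.lower()))


def match_grammar(utterance, grammar=GRAMMAR):
    """Return the action for an in-grammar phrase, or None otherwise."""
    return grammar.get(normalize(utterance))
```

An utterance outside the grammar returns None, which is the point at which a voice application would typically replay its prompt.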

Interactive Voice Response (IVR) -- an automated telephone information system to which callers respond by using the keypad or by speaking words. The system communicates with callers using a combination of fixed voice menus and real-time data from databases.

Prompts -- phrases that a voice system plays back to callers, indicating which information the system needs next. For example: ''Please enter your credit card number.''

Speech Application Language Tags (SALT) -- extensions to HTML, XHTML and XML for voice recognition and synthesized speech output. SALT is the newest specification to emerge from the speech market. It is designed to support ''multimodality,'' including audio, video, text and graphics, depending on the hardware.

Speaker recognition (sometimes called voice authentication) -- refers to systems that can distinguish and confirm the identity of the individual speaking to them. Speaker recognition can be further subdivided into speaker identification, which determines which speaker, from among a set of registered speakers, produced a given utterance; and speaker verification, which accepts or rejects a speaker's identity claim.
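
The difference between identification and verification can be sketched in a few lines of Python. This example assumes each registered speaker is represented by a short numeric voiceprint vector and compares vectors by cosine similarity; the vectors, the names, the 0.95 threshold and the functions are all hypothetical stand-ins for whatever a real system extracts from audio.

```python
import math

# Hypothetical enrolled voiceprints, one feature vector per registered speaker.
ENROLLED = {
    "alice": [0.9, 0.1, 0.3],
    "bob": [0.2, 0.8, 0.5],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(sample, enrolled=ENROLLED):
    """Identification: pick the registered speaker closest to the sample."""
    return max(enrolled, key=lambda name: cosine_similarity(sample, enrolled[name]))


def verify(sample, claimed, enrolled=ENROLLED, threshold=0.95):
    """Verification: accept or reject a claimed identity.

    The threshold is an assumed value chosen for this toy data.
    """
    if claimed not in enrolled:
        return False
    return cosine_similarity(sample, enrolled[claimed]) >= threshold
```

Identification always returns one of the registered names, while verification answers only yes or no to a single claim, which is why the two are usually treated as separate tasks.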

Speech engine -- software that either processes speech input or produces speech output.

Speech recognition -- refers to applications and systems that ''understand'' language, regardless of the speaker. It takes the form of a range of applications, from shrink-wrapped dictation programs that live on a desktop to sophisticated business apps that allow customers to interact with a computer over the telephone.

Text-to-Speech (TTS) -- TTS systems convert text into synthesized speech output. These systems were first designed to allow blind users to listen to written material. Today, TTS is used extensively to convey financial data, e-mail messages and other information via telephone.

Voice User Interface (VUI) -- the speech tech equivalent of a GUI, typically residing on a PDA or smart phone. A VUI is more sophisticated than an IVR system, and offers a wider range of commands than simply ''yes'' or ''no.''

Voice browser -- allows users to access the Web using speech synthesis, pre-recorded audio and speech recognition.

Voice portal -- offers a variety of Web-based services on a speech-enabled platform accessible from a telephone. A consumer voice portal is an interface for consumer information, such as newsletters, sports and stocks, typically offered by service providers. An enterprise voice portal provides an integrated telephony interface to a wide range of enterprise applications and information.

VoiceXML (VXML) -- a markup language designed to create audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony and mixed-initiative conversations.

See the following related stories:
Giving applications a voice, by John K. Waters
Multiple modes, by John K. Waters
Speech specs, by John K. Waters

About the Author

John K. Waters is a freelance writer based in Silicon Valley.