In-Depth

Speech specs

''Historically, speech has been complicated to implement largely because the standards had not been developed to actually write speech applications,'' said Sunil Soares, director of product management at IBM's Pervasive Computing Division. ''Over the past three years, that has begun to change. You can think of voice today as being where the Web was in 1994, when we had static Web pages and PCs. We didn't know what to do with all of the technology and how to implement it.''

The emergence of a new specification (Speech Application Language Tags, or SALT) and the maturation of an older one (VoiceXML) are beginning to provide a sense of stability in the speech industry.

Voice Extensible Markup Language (VoiceXML) was written by the VoiceXML Forum, which contributed it to the World Wide Web Consortium (W3C) standards body. VoiceXML has been around for about two-and-a-half years now, and there are more than 600 vendors and service providers who currently adhere to that particular standard for development.

The W3C defines VoiceXML as a markup language ''designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed-initiative conversations. Its major goal is to bring the advantages of Web-based development and content delivery to interactive voice response applications.''
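
To make that definition concrete, here is a minimal sketch of what a VoiceXML dialog looks like, generated from Python so it can be checked mechanically. It assumes VoiceXML 2.0 element names (vxml, form, field, prompt, grammar, filled, submit). The grammar file cities.grxml, the field name and the submit URL are hypothetical placeholders, not anything described by the vendors quoted here.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.0 namespace (assumed version)


def build_city_dialog(submit_url: str) -> str:
    """Return a small VoiceXML document that asks the caller for a city."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "weather"})

    # A field collects one piece of spoken (or DTMF) input from the caller.
    field = ET.SubElement(form, "field", {"name": "city"})

    # The prompt is rendered with synthesized speech.
    prompt = ET.SubElement(field, "prompt")
    prompt.text = "Which city would you like the weather for?"

    # The grammar limits what the recognizer listens for.
    # "cities.grxml" is a hypothetical grammar file.
    ET.SubElement(field, "grammar",
                  {"src": "cities.grxml", "type": "application/srgs+xml"})

    # Once the field is filled, post the result to a Web server,
    # much as an HTML form would.
    filled = ET.SubElement(field, "filled")
    ET.SubElement(filled, "submit", {"next": submit_url, "namelist": "city"})

    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_city_dialog("http://example.com/weather"))
```

The point of the sketch is the Web-style division of labor: the dialog is a document served over HTTP, and the caller's answer goes back to an ordinary Web server.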

SpeechWorks was one of the earliest companies to embrace VoiceXML. The company's flagship product line, OpenSpeech, is a speech-recognition solution optimized for VoiceXML. ''We were the first company to introduce a line of products built from the ground up to support VoiceXML,'' said Steve Chambers, chief marketing officer at SpeechWorks. ''For us, it has been good because everyone wants a standard. It delivers investment protection.''

Chambers expects most speech applications appearing in the near term to be VoiceXML-based, primarily because the standard has been around for a while. But another speech standard, SALT, has received a lot of support from some very big players.

SALT was created by the SALT Forum, a group of technology companies working together to accelerate the development of speech technologies in telephony and so-called multimodal systems. The founding members of the group are SpeechWorks, Intel, Cisco Systems, Philips, Comverse and Microsoft. Formed in October 2001, the SALT Forum now claims more than 50 member organizations; it released the 1.0 version of SALT earlier this year.

According to James Mastan, director of marketing for Microsoft's .NET Speech Technologies, the SALT spec defines a set of lightweight tags as extensions to commonly used Web-based programming languages. ''The idea,'' Mastan said, ''was not to reinvent the wheel, but to take advantage of the existing Web infrastructure and standards, and to simply add some lightweight standards that allow developers to add speech to their Web applications in an integrated fashion.''

Basically, the SALT tags allow developers to add speech interfaces to Web content and applications using familiar tools and techniques. In ''multimodal'' applications, the tags can be added to support speech input and output, either as standalone events or jointly with other interface options, such as speaking while pointing to the screen with a stylus, Mastan said. In telephony applications, the tags provide a programming interface to manage the speech-recognition and text-to-speech resources needed to conduct interactive dialogs with the caller through a speech-only interface.
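
The multimodal case can be sketched as follows: an ordinary Web page with a text box, plus a few SALT tags that let the user speak the value instead of typing it. This is an illustration under stated assumptions, not Microsoft's or the SALT Forum's sample code. The element and attribute names (listen, grammar, bind, prompt, targetelement) and the namespace URI follow the SALT 1.0 specification as best understood and should be checked against it. The ids, the grammar file cities.grxml and the XPath value are hypothetical.

```python
import xml.etree.ElementTree as ET

# SALT namespace URI (assumed; verify against the SALT 1.0 specification).
SALT_NS = "http://www.saltforum.org/2002/SALT"
ET.register_namespace("salt", SALT_NS)


def salt(tag: str) -> str:
    """Return the namespace-qualified name for a SALT element."""
    return f"{{{SALT_NS}}}{tag}"


def build_city_page() -> str:
    """Return an HTML page whose text box can also be filled by voice."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")

    # Ordinary HTML input; tapping it starts speech recognition.
    ET.SubElement(body, "input", {
        "type": "text",
        "id": "txtCity",
        "onclick": "recoCity.Start()",
    })

    # Spoken prompt (text-to-speech output).
    prompt = ET.SubElement(body, salt("prompt"), {"id": "askCity"})
    prompt.text = "Which city?"

    # Speech input: recognize against a grammar, then copy the
    # recognized value into the text box.
    listen = ET.SubElement(body, salt("listen"), {"id": "recoCity"})
    ET.SubElement(listen, salt("grammar"), {"src": "cities.grxml"})
    ET.SubElement(listen, salt("bind"),
                  {"targetelement": "txtCity", "value": "//city"})

    return ET.tostring(html, encoding="unicode")


if __name__ == "__main__":
    print(build_city_page())
```

Note how little is added to the page: the HTML stays as it was, and the speech behavior arrives as a handful of extra tags, which is the ''lightweight'' approach Mastan describes.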

The SALT specification is designed to work equally well on traditional computers, handheld devices, home electronics, telematics devices (such as in-car navigation systems) and mobile phones.

''What's really going to matter here from an app development perspective is the types of tools available to application developers to enable them to build these multimodal applications,'' said Peter Gavalakis, marketing manager at Intel, ''not the SALT tags in and of themselves. But you need some standard or at least an open specification that an industry ecosystem can develop around.''

SALT-based offerings are already coming down the product pipeline. In May, Microsoft announced the beta release of its .NET Speech SDK, a Web developer tool that the Redmond software maker billed as the first product based on the SALT spec. Philips is reportedly building a SALT-based browser and a telephony platform for SALT. HeyAnita, a speech hosting company, is developing a SALT-based browser for its hosted speech platform. Carnegie Mellon University is developing an open-source SALT browser, which the university expects to be available by the end of the year. Kirusa, a company that is heavily involved in the multimodal application area, is focusing on building multimodal wireless apps around SALT.

Microsoft's Mastan believes that both SALT and VoiceXML will be around for a while, adding that there is some discussion among standards bodies about convergence of the two in the future.

Microsoft's entrance into this market has received mixed reviews, but is generally considered a good thing.

''Microsoft threw a monkey wrench in the gears with SALT,'' said Meta Group analyst Earl Perkins. ''But it's had both a positive and negative effect. It drew attention to a growing market, because Microsoft never enters a market unless they realize there's money to be made. But on the other hand, they introduced another standard, so there may be a bit of a delay while vendors sort out how they're going to support both of them.''

See the following related stories:
Giving applications a voice, by John K. Waters
Talking speech tech, by John K. Waters
Multiple modes, by John K. Waters

About the Author

John K. Waters is a freelance writer based in Silicon Valley.