Voice-Enabled Interactive Applications
An Introduction to Voice-Enabled Interactive Applications
Rapid advances in speech recognition and related technologies have created enormous opportunities for enterprises to offer a new interface to existing software applications, for their customers as well as their employees and vendors. The appeal of this technology is that it uses the human voice for information access over the telephone. Since the end user needs only a telephone to reach voice-enabled applications, their potential reach is the 1.3 billion users with telephone access. Set against the roughly 250 million computers in use, this figure makes it clear why applications of this nature are taking off. Analysts predict that this market will grow to $5 billion by 2005, with more than 45 million wireless phone subscribers also using the voice interface for their day-to-day information needs.
Enterprises are looking to voice-enabled applications to give them a competitive advantage in today’s market. They already provide customers, vendors and employees with valuable information through call centers staffed by live agents or through interactive web applications, so integrating a voice-based query interface adds value to the existing application by increasing its reach and providing clear differentiation. This phase in the life cycle of enterprise applications thus represents a coming together of the telephone and the Internet, with the early adopters reaping the maximum rewards.
This paper describes the business and technology factors that make voice-enabled applications the easiest user interface ever. It also analyzes the different approaches to creating voice-based solutions that make the best use of this opportunity.
Voice-Enabled Solutions – Current Scenario
The telephone has always been the ubiquitous instrument sitting in the corner of your home or office. Its potential has always been deemed enormous, but it has never been fully realized. One of the most popular approaches adopted by enterprises has been to have call center agents handle calls from customers, vendors and employees. Enterprises have provided everything from weather updates to bank balance reports this way. This method of providing information has the inherent disadvantage of being expensive. Interactive Voice Response (IVR) systems then positioned themselves as the alternative to live agents, providing round-the-clock self-service to people seeking enterprise information. But the limitations of the telephone keypad and the multiple levels of constantly changing menus have limited the success of IVRs. Modern cellular telephones and Personal Digital Assistants (PDAs) have moved in to address the deficiencies of IVR systems. These devices use the Wireless Application Protocol (WAP) and give the user a lightweight browser for accessing information. But once again, the limitations of this browser, security issues and the tiny keyboard have hindered their rapid adoption.
With telecommunications equipment and speech recognition technologies becoming cheap and widely available, voice-enabled applications have become relatively inexpensive to design and deploy. Voice offers enterprises an incredible opportunity to expand their user base and differentiate themselves from their competition. In addition, voice front-end hosting companies and Application Service Providers (ASPs) have come of age, providing cheap and quick alternatives to building a solution from the ground up.
Computing hardware has leapfrogged in the past decade, making processors powerful yet inexpensive and hard disk storage abundant. On the software front, speech recognition algorithms have made dramatic advances. Text-to-speech (TTS) technology has also improved. The adoption of a standard voice markup language, such as VoiceXML, is driving voice-enabled services, just as HTML fueled the development of the Internet.
Hence, the overall cost and complexity of developing a speech-based solution have fallen to the point where such solutions are viable even for small businesses. Operations ranging from basic information queries to very complex financial transactions can be conducted efficiently using speech. Enterprises are thus able to serve their customers and employees more effectively.
The voice-enabled application interface is the perfect complement to information systems based on an Internet delivery model. People have become used to real-time information over the Internet, so extending applications to serve people over the telephone is a natural transition. The telephone is also the perfect medium for the vast majority of the populace who may be “technology-shy” or without access to the Internet. People who are away from their homes or offices can continue to use these applications from their mobile phones, which is another big advantage.
Thus all the pieces of the puzzle have fallen into place: inexpensive hardware, advanced software, reliable voice ASPs, accepted VXML standards, application development frameworks, experienced system integrators and the ubiquitous telephone.
What Is A "Voice-Enabled Application"?
In its most generic sense, a voice-enabled application can be defined as "speech-enabled access to enterprise information". Software applications in an enterprise have traditionally provided users with information through a computer user interface. This concept was extended to provide information to users anywhere on earth over the Internet, with the user interface delivered through a web browser. Now the voice-based browser has emerged as a more efficient interface for obtaining information for users who have access to a telephone. Apart from information access through a “pull”, enterprise applications can also be designed to “push” information to users.
A voice-enabled service can provide users with access to such information as news, weather, traffic, stock quotes, driving directions, or restaurant guides. Beyond these simple uses, users of voice-enabled solutions can also carry out financial transactions, update enterprise databases or obtain quick customer service. A voice interface can also provide a front-end for phone-based messaging that integrates Web access with more traditional technologies like voice mail, email and fax. Service providers are also experimenting with giving users access to their email over the telephone.
Voice technology has the potential to dramatically improve customer satisfaction, increase revenues and reduce agent costs when used in e-Business. Enterprises such as major airlines, financial services companies, and overnight delivery companies are adopting public-network-hosted speech recognition technology to provide customers with such services as travel information and reservations, order entry and tracking, banking as well as stock trading.
The Pieces In A Voice-Enabled Application
The voice-enabled applications sector is still emerging. Services are typically provided by the following categories of companies.
Voice Recognition
Major vendors of speech recognition software include IBM, Nuance, Philips Electronics NV and SpeechWorks International. In the United States, Nuance and SpeechWorks hold a major share of the market and support multiple languages.
Voice Portals
Voice portals are companies whose basic business involves building, hosting, and marketing voice portal services targeted at particular audiences. Examples are TelSurf Networks and HeyAnita. Each of these voice portals provides its own mix of information services directly to businesses and consumers. These portals also introduced the concept of the computer reading out your emails to you, a marvel of the advances in text-to-speech.
Hybrid Internet Portals
A second kind of provider is the traditional Internet portal that wants to extend its reach via the phone. Lycos struck an agreement with Quack.com to allow people to access Lycos over the telephone. Similarly, America Online has entered into licensing agreements for SpeechWorks technology to develop voice portals that complement its online services.
Hosting Service Providers
A third major category of voice-enabled service providers consists of telecommunications or Internet service providers (ISPs) that want to increase customer loyalty and maximize network usage with branded portal services that they host in the network, obtain from third-party suppliers, or both. For example, Talk2.com is partnering with wireless companies that want to add value to their packages.
Telera and Netbytel are examples of hosting service providers that allow ISPs and voice portal companies to offer enhanced voice services without having to build and maintain the technology infrastructure. The ISP does not need to be an expert in a particular technology or application, such as speech recognition or telecommunications, and can instead concentrate on launching new services and growing its business. In short, the ISP can select the best-of-breed application that fits its business model and stay focused on gaining and retaining customers, not on maintaining technology.
Voice ASP
A rapidly emerging category of providers is the voice Application Service Provider (voice ASP). These providers help to quickly integrate enterprise applications and call centers with the required telephony, call handling and speech recognition front-ends. They also provide enterprises with security, scalability, monitoring and access reports. Examples are Tellme, BeVocal, and Voxeo.
Voice Middleware and Applications Services Providers
provides the industry’s first voice middleware and application framework, and Calsoft (Calsoftgroup.com) provides industrial-strength enterprise voice application solutions to automate call centers and to support logistics needs, financial transactions and customer service.
Analysis Of Technology Gains
Technology advancements are speeding the emergence of voice-enabled applications. Most significant are the speech-related technologies, which have advanced very rapidly over the past decade.
Automatic Speech Recognition
Rapid advancements in Automatic Speech Recognition (ASR) have made it well accepted. Earlier speech applications recognized only a small vocabulary of a few words, but the accuracy and vocabulary size of ASR engines have improved dramatically, fueled by refined algorithms, dramatic increases in processing power, and lower costs. Today's speech systems support naturally spoken phrases and do not require prior training. This has allowed application voice interfaces to become speaker independent.
Continuous Speech Processing (CSP)
Continuous Speech Processing (CSP) is a breakthrough in support for large-vocabulary, host-based speech recognition. It enhances existing speech technologies with additional algorithms that enable large-scale systems with thousands of lines of speech recognition, and it allows developers to build speech recognition applications more cost-effectively. CSP supports features such as “barge-in”, which allows a user to interrupt a speech prompt by speaking over it while the speech recognizer still understands what is spoken during the interruption. The enhanced technology also enables applications to recognize voice commands more accurately, making them easier to use and increasing customer satisfaction.
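The barge-in behavior described above can be illustrated with a toy sketch. This is not how any particular CSP product works: the function name, the frame-by-frame model and the energy threshold are all assumptions made for illustration, and a real system would also need echo cancellation so that the prompt itself is not mistaken for the caller.

```python
def play_with_barge_in(prompt_frames, caller_energy, threshold=0.5):
    """Play a prompt frame by frame, stopping as soon as the caller speaks.

    prompt_frames: prompt audio frames (any objects), in playback order.
    caller_energy: energy of the caller's audio channel for each frame.
    threshold:     hypothetical energy level treated as "caller is speaking".

    Returns (frames_played, barge_in_index); barge_in_index is None when the
    prompt finished without interruption.
    """
    played = []
    for index, (frame, energy) in enumerate(zip(prompt_frames, caller_energy)):
        if energy >= threshold:
            # The caller has started talking: stop the prompt here so the
            # recognizer can process the caller's audio from this frame on.
            return played, index
        played.append(frame)
    return played, None
```

In this sketch the prompt stops at the first frame where the caller's channel exceeds the threshold, and the returned index tells the application where recognition should begin.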
Text-to-speech (TTS)
Once information has been accessed, it needs to be communicated to the user. One way to do this is via text-to-speech, or TTS. TTS is increasingly being used to read out email and enterprise information to callers. Real-world applications, such as email read over the phone, are made possible by preprocessors that handle data such as acronyms, contractions and differences in intonation. Lernout & Hauspie is one of the principal TTS vendors, with support for multiple languages.
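As a rough illustration of what such a preprocessor does, the sketch below expands a few acronyms and contractions before text is handed to a TTS engine. The lookup tables and the function name are hypothetical, and intonation handling is left out entirely; production preprocessors are far more elaborate.

```python
import re

# Illustrative lookup tables only; a real preprocessor would use much larger ones.
ACRONYMS = {"ASAP": "as soon as possible", "FYI": "for your information"}
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "I'm": "I am"}


def normalize_for_tts(text):
    """Expand known acronyms and contractions so they are read out naturally."""

    def expand(match):
        word = match.group(0)
        # Fall back to the word itself when it is in neither table.
        return ACRONYMS.get(word) or CONTRACTIONS.get(word) or word

    # Match whole words, allowing one internal apostrophe (as in "can't").
    return re.sub(r"[A-Za-z]+(?:'[A-Za-z]+)?", expand, text)
```

Words that appear in neither table pass through unchanged, so the function can be applied safely to arbitrary email text.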
Voice XML
Just as the growth of the World Wide Web was accelerated by the development of the HTML markup standard, the acceptance of VXML as a universal standard for voice-based services can be expected to propel the growth of these services.
VoiceXML (Voice eXtensible Markup Language) will allow providers to open up voice-enabled services to customers using voice interfaces. It is designed to support synthesized speech (text-to-speech), recognition of spoken input, recognition of dual-tone multi-frequency (DTMF) key input, recording of spoken input, and telephony call handling. Enterprises can build voice-enabled applications using the same technology they used to create visual Web sites, significantly reducing the cost of building and delivering new capabilities to telephone customers. Because voice-based solutions serve VXML over HTTP, the integration with back-end databases can be shared with any HTML applications.
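To make the parallel with Web development concrete, here is a minimal sketch of a server-side function that generates a VoiceXML 2.0 page, much as a Web application would generate HTML. The function name, the form and field names, and the submit URL are assumptions for illustration. The built-in `digits` field type is assumed to be supported by the voice platform, since support for built-in grammars is optional in VoiceXML 2.0.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.0 namespace


def build_balance_dialog(submit_url):
    """Return a minimal VoiceXML 2.0 document as a string.

    The dialog asks for an account number, spoken or keyed in as DTMF,
    and submits it over HTTP to submit_url (a hypothetical back-end endpoint).
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "balance"})

    # A field of type "digits" relies on the platform's built-in digits
    # grammar (assumed available), which accepts speech and DTMF input.
    field = ET.SubElement(form, "field", {"name": "account", "type": "digits"})
    prompt = ET.SubElement(field, "prompt")
    prompt.text = "Please say or key in your account number."

    # Once the field is filled, send its value to the back end.
    filled = ET.SubElement(field, "filled")
    ET.SubElement(filled, "submit", {"next": submit_url, "namelist": "account"})

    return ET.tostring(vxml, encoding="unicode")
```

The back end that receives the submitted account number can be the same one that serves the enterprise's HTML pages; only the markup returned to the caller differs.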
Emergence of Testing Tools
The success of speech-based applications depends on such factors as the phrasing of voice prompts, as well as on other behavioral factors. Speech technology providers have created powerful tools to ease rapid deployment. These tools can reduce the time it takes to build a new application from several person-years to a matter of weeks.
Homegrown, Off-The-Shelf Or Outsourced – The Million-Dollar Question
Designing and developing a voice-enabled solution is more complex than just assembling the pieces. Enterprises need to focus on their core business: improving the bottom line while raising the level of customer service. Time-to-market is everything, and getting to market quickly requires working with an experienced application integrator who understands both the technology and the requirements of the business. Needs also change very quickly, so enterprises look to solution providers that are flexible and offer a choice of open, standards-based solutions.
Another factor often overlooked is the importance of support, maintenance and training services. These services, which also allow for faster adoption of voice-enabled services, are an integral part of a complete offering.
Enterprises are faced with three major choices for deploying a voice-enabled solution:
- Build and deploy the solution themselves
- Purchase a system from a technology vendor
- Have their solution externally hosted
Whichever option is chosen, the following evaluation criteria become paramount when considering a system and vendor:
- Does the chosen solution consist of components built to open standards?
- Is it scalable?
- Is the system easily modifiable and upgradeable?
- Does the platform provide easy plug and play to add new features?
- Can the vendor provide relatively inexpensive support through the entire application life cycle?
Homegrown – building the solution from the ground up
Many enterprises could choose to buy all of the components that make up the voice-enabled solution (including telephony equipment, network interface, software development tools, application platforms, testing tools and the necessary computing hardware) and build it themselves. One option is to purchase the hardware components separately, integrate the platform and develop the application. Alternatively, the enterprise might take advantage of a new level of building block: a fully integrated, application-ready platform. This “hybrid” option consists of a pre-configured server platform that contains all the necessary voice hardware on which to base the solution. With either option, system integrators with product visualization experience can help put the solution together more quickly and easily, without the need to retain specialized development resources in-house.
Building a solution can be advantageous for an enterprise if a great degree of control over the technology and platforms is required, with no dependence on external entities. Often, this approach is adopted to meet the requirements of a diverse customer base. However, building your own solution can also have significant disadvantages. The challenge of acquiring and integrating these complex components in a suitable time frame can be significant. The cost of building a solution from scratch and then keeping the technology current, especially when it is not part of the core business, can be considerable.
Off-the-shelf – buy and customize your solution
Since building a solution from the ground up is not for everyone, many choices are available to the enterprise that would rather focus on end-user satisfaction and leave the integration to a dependable solution integrator. As the market for voice-enabled applications matures, enterprises learn to depend on the application visualization and development skills of integrators, who can help them evaluate ready-made solutions available in the market and then customize them to fit the enterprise's needs. When evaluating sources for a complete solution, enterprises should look for competent solution providers that offer services across the whole spectrum: evaluation of ready-made packages, integration, customization, testing, deployment and support on any desired platform. The solution provider would also assist in optimizing the use of existing hardware components to deliver a reliable and bulletproof application. Applications developed this way also provide a ready basis for future modifications.
Outsource – ASP-hosted solution
On the data side, the Internet has evolved to a model where space, hardware, and certain content can be hosted by one or more third parties. Logically, the voice-enabled applications market has evolved in the same direction. A fully managed hosting solution provides the advantages of a bandwidth-rich environment that is built and monitored to ensure mission-critical reliability. Such facilities allow enterprises to deploy solutions almost immediately, without having to manage any telephony hardware, speech recognition software or bandwidth. In addition to dramatically decreasing time-to-market, outsourcing offers scale and expertise while allowing enterprises to concentrate on their core business instead of on the technology. Upfront investments are minimal compared to the other options, and enterprises need not worry about the technology becoming obsolete.
How do you go about implementing your voice-enabled solution?
Calsoft (Calsoftgroup.com), as a provider of cutting-edge technology solutions, can help you succeed in this fast-moving, highly complex market. Calsoft offers complete application design, development, support and consulting services, as well as interfacing with third-party voice ASPs, to help get your voice-enabled solutions up and running fast. We offer a variety of development models for maximum choice and flexibility, all highly scalable and robust. Calsoft can help you evaluate the solutions and application development frameworks that are right for your business and that will help you maximize your opportunity in this exploding market.


