Infrastructure
Three infrastructure components are necessary to build
an interactive voice response system:
Hardware
The hardware is a normal server system,
with the following differences:
- Interactive voice response systems use more RAM: typically 2 gigabytes (GB)
or more.
- Typically no databases or internet applications are installed on speech dialogue servers.
- Special ISDN boards from Dialogic or Aculab are used, which differ
from normal ISDN boards in the following points:
  - One or more S2M ports are supported. Every
    S2M port allows 30 concurrent calls.
  - The cards have integrated technology
    that supports special telephony functions, such as:
    - Echo suppression
    - Hissing suppression
    - Conferencing
    - Call transfer to external phone numbers
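As a rough capacity sketch based on the figure above (30 concurrent calls
per S2M port), the number of ports needed for a given peak load can be
estimated as follows. The function names and the loads used in the usage
note are illustrative assumptions, not ENAiKOON values.

```python
import math

# From the text: each S2M port carries 30 concurrent calls.
CALLS_PER_S2M_PORT = 30


def s2m_ports_needed(peak_concurrent_calls: int) -> int:
    """Return the smallest number of S2M ports that can carry the peak load."""
    if peak_concurrent_calls < 0:
        raise ValueError("peak_concurrent_calls must not be negative")
    return math.ceil(peak_concurrent_calls / CALLS_PER_S2M_PORT)


def max_concurrent_calls(ports: int) -> int:
    """Return how many calls the given number of S2M ports can carry at once."""
    if ports < 0:
        raise ValueError("ports must not be negative")
    return ports * CALLS_PER_S2M_PORT
```

For example, a hypothetical peak of 100 simultaneous callers would need
s2m_ports_needed(100) ports, which rounds up to the next whole port.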
ENAiKOON decided to use Dialogic boards. Dialogic
was bought by Intel in 1999 and is today the market leader with
a market share of approx. 70%.
Standard Software
Operating System (OS)
Interactive voice response systems typically use operating systems
like Windows NT or Windows 2000. The most important reason for this is
that the drivers and the software libraries were usually implemented
for Windows systems. Linux implementations have become available only
recently.
So far ENAiKOON has not migrated its IVR systems to Linux for stability
reasons.
ENAiKOON is planning to migrate as soon as possible in order to
standardize its web server farm and its voice server farm.
Speech recognition system
There are various providers of good speech recognition software, such as
IBM, Philips and Nuance. ENAiKOON chose speech recognition software from
Nuance because we found this system to suit our needs. Nuance's adaptation
to the English and German languages is done very well.
Nuance Communications is the world leader in this type of software.
Companies like British Airways, Deutsche Telekom, Lloyds TSB Bank, SAS
and Telia Mobile use Nuance software.
Text-to-Speech Engine
Speech synthesis is the computer-based conversion of
written text into spoken text. This method is normally used when the text
output is extremely dynamic. In all other cases ENAiKOON uses trained
speakers for its voice systems.
ScanSoft is listed on the stock exchange (Nasdaq: SSFT), with over 500
employees and subsidiaries all over the world. Its integrated TTS modules
are used by leading manufacturers in telecommunications, mobile
communications and vehicle communications, and allow voice-enabled
components for UMTS, IVR, telematics products, wireless and Pocket PC
solutions.
ScanSoft has recently been acquired by Nuance.
Database
Depending on the size of the database, we use MySQL or Oracle. In both
cases the database is installed on a separate server. This ensures that
the voice application and the database application do not influence each
other.
Application
The application represents the logic of the deployment.
The application defines how the user is greeted, which information
must be given at what time, and how the system must react. Furthermore,
the application has some typical speech application elements, such as
a grammar: a list of words and phrases that the computer can understand.
Application Development
The application development is split into the following
parts:
Specification
The specification describes in detail which tasks the voice
application has to cover and how this should happen. Furthermore,
it describes the impression the caller should have of the system (e.g.
more serious or more youthful).
Development of a Program Flow Chart
The flow chart defines exactly in which steps the program has to "flow":
which questions are to be asked in which situations, which answers
are to be expected, and how the system reacts to each answer. Furthermore,
it records which data is available to the application.
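One way to picture such a flow chart is as a small table of steps, each
with a question, the expected answers and the step that follows each
answer. The sketch below is only an illustration under assumed names: the
Step class, the FLOW table, the two questions and the city names are
hypothetical and not taken from an ENAiKOON application.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One node of the flow chart: a question and the reaction to each answer."""
    prompt: str
    # Maps each expected answer to the name of the next step.
    transitions: dict = field(default_factory=dict)
    # Step to go to when the answer was not expected (here: ask again).
    fallback: str = ""


# Hypothetical two-question flow for a travel information line.
FLOW = {
    "ask_destination": Step(
        prompt="Where do you want to go?",
        transitions={"berlin": "ask_departure", "hamburg": "ask_departure"},
        fallback="ask_destination",
    ),
    "ask_departure": Step(
        prompt="When do you want to leave?",
        transitions={"now": "done", "tomorrow": "done"},
        fallback="ask_departure",
    ),
    "done": Step(prompt="Thank you. Goodbye."),
}


def run_flow(answers, start="ask_destination"):
    """Walk the flow with a list of caller answers; return the prompts played."""
    played = []
    current = start
    pending = iter(answers)
    while True:
        step = FLOW[current]
        played.append(step.prompt)
        if not step.transitions:  # terminal step: nothing more to ask
            return played
        answer = next(pending, None)
        if answer is None:  # no more input, e.g. the caller hung up
            return played
        current = step.transitions.get(answer.strip().lower(), step.fallback)
```

Calling run_flow(["Berlin", "now"]) walks through both questions and ends
with the goodbye prompt; an unexpected answer repeats the current question.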
Grammar and Dictionary
The grammar is a list of words and phrases that the application
must be able to handle, grouped by the question
the system asks.
Example: When asked "When do you want to leave?", the user can
answer in different ways, such as "now", "tomorrow" or "on
the 14th of January at 6 o'clock".
No matter what is answered, the system must calculate a concrete date
and time so that a database query can be successful.
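The following sketch shows how the example answers could be turned into
a concrete date and time. It is a minimal illustration, not the method
used by the speech recognition software: the function name, the regular
expression and the conventions chosen here are assumptions ("tomorrow"
means the same time on the next day, hours are read as a 24-hour clock,
and a date that has already passed is moved to the following year).

```python
import re
from datetime import datetime, timedelta
from typing import Optional

MONTHS = {
    "january": 1, "february": 2, "march": 3, "april": 4,
    "may": 5, "june": 6, "july": 7, "august": 8,
    "september": 9, "october": 10, "november": 11, "december": 12,
}

# Accepts e.g. "on the 14th of january at 6 o'clock" or "14. january".
ABSOLUTE = re.compile(
    r"(?:on\s+)?(?:the\s+)?(?P<day>\d{1,2})(?:\.|st|nd|rd|th)?\s+(?:of\s+)?"
    r"(?P<month>[a-z]+)(?:\s+at\s+(?P<hour>\d{1,2})(?:\s+o'clock)?)?"
)


def resolve_departure(utterance: str, now: datetime) -> Optional[datetime]:
    """Map a recognized answer to a concrete datetime, or None if not understood."""
    text = utterance.strip().lower()
    if text == "now":
        return now
    if text == "tomorrow":
        return now + timedelta(days=1)

    match = ABSOLUTE.fullmatch(text)
    if match is None:
        return None
    month = MONTHS.get(match["month"])
    if month is None:
        return None
    day = int(match["day"])
    hour = int(match["hour"]) if match["hour"] else 0

    # Try this year first, then next year if the date has already passed.
    for year in (now.year, now.year + 1):
        try:
            candidate = datetime(year, month, day, hour)
        except ValueError:  # impossible date or hour, e.g. 31 April
            continue
        if candidate >= now:
            return candidate
    return None
```

The returned datetime can then be passed to the database query; a None
result means the answer is not covered and the question should be asked
again.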
The dictionary contains a transcription of all words and names. It
determines how a word that is pronounced in a certain way is
written. For example, Google is pronounced "Gu:gel".
This means that if a caller says "Gugel", a machine without a dictionary
would understand "Gugel", not "Google".
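In its simplest form, the dictionary lookup can be pictured as a mapping
from a pronunciation to the written word. Real speech recognition
products use their own dictionary formats and phonetic alphabets; the
plain mapping and the function name below are assumptions made for
illustration only.

```python
# Hypothetical pronunciation dictionary: transcription -> written form.
PRONUNCIATIONS = {
    "gu:gel": "Google",
}


def written_form(transcription: str) -> str:
    """Return the spelling for a pronunciation; unknown ones pass through."""
    return PRONUNCIATIONS.get(transcription.strip().lower(), transcription)
```

With the entry in place, written_form("Gu:gel") yields the written name;
a pronunciation without an entry is returned exactly as it was heard,
which is the situation described above for a machine without a dictionary.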
Coding of the Application
Coding the application means that the previously defined routines are
programmed into the system. ENAiKOON does this by using predefined libraries,
which are then extended with the special elements of the application.
Example: The query for a certain time is already implemented, so only this
existing function has to be used. While coding and testing, the speech
output is realized with the help of the TTS engine (text-to-speech). This
makes it easier to change the announcements during testing.
Program generators are often used when programming voice applications,
which helps to speed up the coding process.
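The switch between TTS output during testing and recorded speakers in
operation could be sketched as follows. The prompt catalogue, the file
path and all names are hypothetical; the sketch only decides which kind
of output to use and does not call a real TTS engine or telephony board.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Announcement:
    kind: str     # "tts" or "recording"
    payload: str  # text to synthesize, or path of the recorded audio file


# Hypothetical prompt catalogue: the texts are easy to edit while testing.
PROMPT_TEXTS = {
    "greeting": "Welcome to the travel information line.",
    "goodbye": "Thank you. Goodbye.",
}

# Hypothetical recordings made by a trained speaker for live operation.
RECORDINGS = {
    "greeting": "prompts/greeting.wav",
}


def announcement(prompt_id: str, testing: bool) -> Announcement:
    """Choose TTS while testing (or when no recording exists), else the recording."""
    if testing or prompt_id not in RECORDINGS:
        return Announcement("tts", PROMPT_TEXTS[prompt_id])
    return Announcement("recording", RECORDINGS[prompt_id])
```

Because the announcement texts live in one table, changing a prompt
during testing means editing a string rather than re-recording audio.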
Testing Phase
During the testing phase, the voice application
is tested by a small number of people who were not involved in the
development. The test shows whether these people get along with the
application or whether there are problems that need to be solved. The
answers that the computer did not understand
are noted and implemented later.
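Noting the answers that were not understood can be as simple as counting
them per question, so that frequent ones can be added to the grammar
afterwards. The class below is a hypothetical sketch of such a log, not
part of any speech recognition product.

```python
from collections import Counter, defaultdict


class UnrecognizedLog:
    """Collects answers the system did not understand, grouped by question."""

    def __init__(self):
        self._by_question = defaultdict(Counter)

    def note(self, question: str, utterance: str) -> None:
        """Record one answer that was not understood for the given question."""
        self._by_question[question][utterance.strip().lower()] += 1

    def candidates(self, question: str, min_count: int = 1) -> list:
        """Return answers heard at least min_count times, most frequent first."""
        counts = self._by_question.get(question, Counter())
        return [text for text, n in counts.most_common() if n >= min_count]
```

After a test session, candidates("When do you want to leave?") would list
the phrases that testers actually used but the grammar did not yet cover.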
Operation Phase
The system is switched live as soon
as the customer accepts the status of the interactive
voice response system.