Title: Europe-Swas-Artificial-Intelligence
Moderator: HERON::BUCHANAN
Created: Fri Jun 03 1988
Last Modified: Thu Aug 04 1994
Last Successful Update: Fri Jun 06 1997
Number of topics: 442
Total number of notes: 1429
293.0. "The Computer is Listening" by YIPPEE::MCGREGOR () Mon Mar 11 1991 22:56
Maybe of interest. A "side-effect" of a recent trip to the UK...
Anyone out there have any opinions on where we are with respect to
VOICE, or have worked on a project with a voice component?
George
Speech Technology (Dragon, Marconi and the Edinburgh CSTR)
---------------------------------------------------------
George McGregor / Emerging Tech. IST - Valbonne
This short informal report is the result of a brief trip in Feb 91 to
Aptech Ltd. in Newcastle and the Centre for Speech Technology Research
in Edinburgh.
Aptech are the European distributors for the Dragon speech technology
products (Dragon is based in Newton, MA).
Aptech/Dragon
-------------
The Dragon products are among the most advanced commercially available speech
technology products.
Aptech is a small company which provides voice technology and the associated
hardware and software (e.g. they can provide the complete PC system
configuration necessary to run Dragon-Dictate). Dragon told us to contact
Aptech for European support.
They seem to have basically two lines of business, both built on Dragon
technology:
- Automatic dictation using Dragon-Dictate. Integration with different word
processing packages.
- Voice-driven applications, using a product Aptech developed called
VOXDRIVE, based on DragonWriter technology.
These two areas comprise 90% of their business.
Dragon Dictate
--------------
This is a discrete-utterance speech recognition system: there needs to be a
pause between words. It can be used to replace the PC keyboard.
It is speaker independent, and trains via error correction during use.
It has a base vocabulary of 25,000 words built in, and is user-extendable by
up to 5,000 more.
Technical details on how it functions are scarce, but I have one 3-page paper
that gives some information.
It needs:
- an MS-DOS PC
- 8 MB of RAM to use the total 30,000-word dictionary
- 8 MB of space on a hard disk
The demo was impressive. After each word, a menu of possible choices
appears. If you say another word, the system moves on, accepting the
first choice in the menu. If you don't like the default, you can
select another by saying e.g. "select 2". There are a variety of commands
to make corrections (only a couple of words back) and to enter spell mode.
In spell mode, as you type or speak the characters of the word, the system
uses this new information along with the acoustic information to modify the
menu of possible choices. When a new word is encountered you have to spell
it out. Next time, the system gets it right.
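To make the menu behaviour concrete, here is a rough sketch of how I picture
it. This is my own illustration, not Dragon's code: the function names, the
toy scores and the simple prefix filter are all assumptions. Each candidate
word has an acoustic score, the menu shows the best few, "select N" picks an
entry, and letters given in spell mode remove candidates that do not match.

```python
# Hypothetical sketch only: names and scores are invented for illustration,
# nothing here is taken from the Dragon documentation.

def rank_candidates(acoustic_scores, spelled_prefix="", menu_size=3):
    """Return the best candidates, best first, that match the letters spelled so far."""
    prefix = spelled_prefix.lower()
    matching = [
        (word, score)
        for word, score in acoustic_scores.items()
        if word.lower().startswith(prefix)
    ]
    # Higher score means a better acoustic match; ties are broken alphabetically.
    matching.sort(key=lambda item: (-item[1], item[0]))
    return [word for word, _ in matching[:menu_size]]


def choose(menu, command=None):
    """Accept the default (first) entry, or the entry named by 'select N'."""
    if command is None:
        return menu[0]
    verb, number = command.split()
    if verb != "select":
        raise ValueError(f"unknown command: {command}")
    return menu[int(number) - 1]


# Toy acoustic scores for one spoken word (assumed values).
scores = {"there": 0.41, "their": 0.39, "they're": 0.35, "three": 0.20, "tree": 0.10}

menu = rank_candidates(scores)
print(menu)                            # menu shown after the word is spoken
print(choose(menu))                    # speaking the next word accepts the default
print(choose(menu, "select 2"))        # explicit choice of the second entry
print(rank_candidates(scores, "thr"))  # spell mode: only words starting "thr" remain
```

The real system presumably combines the spelled letters with the acoustic
evidence in a more subtle way than a hard prefix filter; the sketch only
shows the general idea of the menu shrinking as letters arrive.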
A few points not specifically in the documentation:
- It needs a special noise-cancelling microphone, and the basic Dragon
technology does not perform well over phone links, for example.
- The 25,000 preset word models contain a lot of duplication and useless
material, e.g. plurals and other derivatives are stored separately, and there
are a lot of proper nouns (e.g. names of all the North American Indian
tribes!). If you hit the 30,000 barrier, however, the least used words are
replaced.
- A "language model" is used to order the choices in the possible-word
menu according to context (and possibly to narrow down the active words
checked against at each step).
This is statistical, based on analysis of a large corpus of text. It
seems not to be grammar based. For example, in the documentation, for the
sentence "People who do *their* presentations", the menu for *their* is in
the order THERE, THEIR, THEY'RE, which seems to indicate the absence of an
underlying grammar.
- The task of conversion to another language seems to be reasonably complex,
since the 25,000-word basic vocabulary is American English specific, and the
corpus-based "language model" has to be developed again for each language.
It is difficult to assess just how much "English specific" handling is
hardwired into the core of the system. A company in Brussels (Lernout and
Hauspie) has the task of creating different language versions, but the
feeling at Aptech was that this work was not advancing quickly. The plan is
a 5-10 year effort. (Why so long?) The first target language is Spanish.
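As an illustration of the "language model" point above, here is a minimal
sketch of a purely statistical ordering. It is my guess at the general
technique (a bigram model with add-one smoothing), not Dragon's actual
method; the toy corpus and all names are invented. It shows how counting
word pairs in a corpus, with no grammar at all, can put THERE ahead of
THEIR after the word "do".

```python
from collections import Counter

# Hypothetical sketch: a bigram model is assumed here for illustration;
# the source does not say what kind of statistics Dragon uses.


def train_bigrams(corpus):
    """Count single words and adjacent word pairs in a list of sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        words = sentence.lower().split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def order_menu(candidates, previous_word, unigrams, bigrams):
    """Order candidate words by the smoothed probability of following previous_word."""
    vocab_size = len(unigrams)

    def prob(word):
        # Add-one smoothing so unseen pairs still get a small probability.
        return (bigrams[(previous_word, word)] + 1) / (unigrams[previous_word] + vocab_size)

    return sorted(candidates, key=lambda w: (-prob(w), w))


# Toy corpus (invented): "do there" occurs more often than "do their".
corpus = [
    "what did you do there",
    "what can we do there",
    "people do their work",
    "they're here",
]
unigrams, bigrams = train_bigrams(corpus)

# Homophones the acoustics alone cannot separate.
print(order_menu(["their", "there", "they're"], "do", unigrams, bigrams))
```

A model like this has to be rebuilt from a new corpus for every language,
which fits the point above about conversion being a sizeable task.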
DRAGONWRITER AND VOXDRIVE
------------------------
VOXDRIVE allows you to build voice-driven interfaces to applications.
The vocabulary can therefore be more limited than with DragonDictate.
The underlying system is called DragonWriter, also from
Dragon. With DragonWriter, the board is the same as for Dictate, but the
driver software is different. DragonWriter is a toolbox for
building voice applications. Aptech have added a friendlier
interface to the underlying system. The voice-driven
interface is defined as a series of contexts, each of which defines the
words possible at that point (finite state graphs). This is used to guide
the word recognition.
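The idea of contexts as a finite state graph can be sketched as follows.
This is only my illustration: the context names, the words and the
functions are invented, and I have not seen how VOXDRIVE actually stores
its definitions. Each context lists the words that are legal at that point
and the context each word leads to, so the recogniser only has to choose
among a handful of active words.

```python
# Hypothetical contexts for a small voice-driven application.
# Each context maps a legal word to the context that follows it.
CONTEXTS = {
    "main": {"open": "file", "print": "main", "quit": "done"},
    "file": {"report": "main", "letter": "main", "cancel": "main"},
    "done": {},
}


def active_words(context):
    """Words the recogniser needs to consider in this context."""
    return sorted(CONTEXTS[context])


def step(context, heard_candidates):
    """Accept the best-scoring candidate that is legal in the current context.

    heard_candidates is assumed to be ordered best acoustic match first.
    Returns (accepted_word, next_context); the word is None if nothing fits.
    """
    for word in heard_candidates:
        if word in CONTEXTS[context]:
            return word, CONTEXTS[context][word]
    return None, context


print(active_words("main"))
# "pen" scores best acoustically but is not legal here, so "open" is taken.
print(step("main", ["pen", "open", "print"]))
print(step("file", ["print", "report"]))
print(step("file", ["quit"]))
```

Restricting the active words this way is what lets a voice-driven
interface get by with a much smaller vocabulary than dictation.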
MARCONI TALKMAN
---------------
A commercial product aimed at the factory floor. It consists of a PC tool,
which allows you to build the voice-driven task dialogue, and a belt-worn box
and microphone which you can carry around while you stocktake, for example.
This box handles a 250-word vocabulary, and has text-to-speech synthesis built
in. You download the results into the PC via an optical link.
EDINBURGH Centre for Speech Technology Research
------------------------------------------------
50 researchers. They have a speaker-dependent continuous speech system
(OSPREY) and some patented speaker verification stuff which reduces the
data needed to identify a speaker to tens of bits! They claim that OSPREY could
be ported to another European language with 2 months' work.
Conclusion
----------
It is important to be aware of commercially available technology in this field,
since projects with a voice component are starting to appear.
DragonWriter and Voxdrive are of more immediate interest to us than
DragonDictate, because these systems allow voice interfaces (a la DIDDLY)
to be easily integrated/added to other systems.
We will try to at least get the OSPREY PC board from Edinburgh.
We are getting Dictate, Writer, and Voxdrive for a 6-week appraisal period.
We should consider installing them in the customer centre after this.
I have handouts on all of these systems, and a video of OSPREY at my desk.
If you want more technical detail these should help.
George
T.R | Title | User | Personal Name | Date | Lines |
293.1 | Thanks - that was interesting! | MUNSBE::BRITTAIN | Peter, ACT-IS/IT Munich @UFC *773-3102 | Tue Apr 16 1991 16:44 | 1 |