T.R | Title | User | Personal Name | Date | Lines |
---|
1401.1 | | WJG::GUINEAU | | Wed May 04 1988 08:34 | 27 |
|
I had thought of this awhile back (before I even knew about Amiga, in fact,
probably before Amiga was even born!).
Voice Recognition is no easy task! The human voice is extremely
flexible and variable. Even the same person speaking the same *word*
will have trouble matching exactly a pre-digitized record.
The question becomes: "Is this new input *close enough* to the stored one?"
But "close enough" is a hard thing to determine. (there are people at major
universities doing just this kind of thing, and they don't make it look easy!)
I had originally thought of doing "slope analysis" on the waveform. This is
my own dreamed-up screwball method of selecting a sample rate and then
comparing the slope between each pair of adjacent sample points with those of the
original waveform. If all the slopes of the whole sample are within some
tolerance of the original, it's a match.
The problem here is finding a common starting point. Phase shifting the sample
back and forth may help solve this (i.e. Didn't match?, Hmm shift the whole
sample x number of sample periods 'left' and try again...)
This must sound ridiculous!
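As a rough illustration only, the slope idea above might be sketched like this in Python. The names `slopes` and `find_offset`, the sample values and the tolerance are made up for the example; it assumes the new input is at least as long as the stored word and simply tries every starting offset in turn.

```python
def slopes(samples):
    """Return the difference between each pair of adjacent sample points."""
    return [b - a for a, b in zip(samples, samples[1:])]


def find_offset(stored, incoming, tolerance):
    """Slide the stored word along the input; return the first offset at
    which every slope is within `tolerance`, or None if none matches."""
    reference = slopes(stored)
    length = len(stored)
    for offset in range(len(incoming) - length + 1):
        window = slopes(incoming[offset:offset + length])
        if all(abs(r - w) <= tolerance for r, w in zip(reference, window)):
            return offset
    return None


# Hypothetical data: the stored word, then the same shape shifted by two
# samples and slightly distorted.
stored = [0, 2, 5, 3, 1]
incoming = [9, 9, 0, 3, 5, 3, 2, 4]
print(find_offset(stored, incoming, tolerance=1))
```

With real speech a single fixed tolerance is unlikely to be enough, for the reasons given above, so this is a starting point rather than a recognizer.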
John
|
1401.2 | ... | LEDS::ACCIARDI | | Wed May 04 1988 09:34 | 35 |
| Didn't DEC have a whole group dedicated to a product called 'DECTalk'
or some such? I remember reading articles about their work long
before I joined the company.
I don't know squat about voice recognition, but I do perform a little
signal analysis now and then on an HP dynamic signal analyzer.
We attempt to characterize spindle ball bearings by their unique
defects. We do this by passing the signal from a capacitive
displacement transducer, differentiated twice, through the analyzer
and performing an FFT on the signal. We can now identify specific
spindles by their frequency content. Unfortunately, the phase
information, although not completely lost, is difficult to extract
without performing an inverse FFT.
The gist of this is that the Amiga needs a good, fast FFT algorithm.
I haven't seen any yet, although I believe one of the waveform editing
packages has 'FFT' as one of its drop down menu choices.
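For reference, a textbook radix-2 FFT is short enough to sketch. The Python below illustrates the general technique (recursive Cooley-Tukey); it is not code from any Amiga package and is not tuned for speed. `fft` and `peak_bin` are names chosen for the example, and the input length is assumed to be a power of two.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT (unnormalized)."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def peak_bin(samples):
    """Index of the strongest frequency bin, ignoring DC."""
    spectrum = fft(samples)
    return max(range(1, len(samples) // 2), key=lambda k: abs(spectrum[k]))


# Hypothetical test signal: five full cycles of a sine in 64 samples.
tone = [math.sin(2 * math.pi * 5 * i / 64) for i in range(64)]
print(peak_bin(tone))
```

The magnitudes `abs(spectrum[k])` are the "frequency content" used to tell spindles apart; the phase is still there in the complex values if it is needed.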
In fact, it kind of irks me that all the decent signal analysis
software exists for Pee Cees. With great graphics, a super fast
expansion bus, and reasonably fast processor, the Amiga could be
a killer signal processing system.
Anyway, I don't know if voice recognition should be performed in
the time domain or the frequency domain. Maybe both?
Re: .0
There are lots of sound sampling hardware/software packages available.
The sampled sounds are stored as IFF standard files to allow
interchange with other packages and music programs. The size of
the sampled sound and duration depends on the sampling rate you
select. They can get very huge in a hurry.
Ed.
|
1401.3 | | 45384::MASON | | Wed May 04 1988 10:37 | 36 |
|
Yes they did, I have one. But from what I know about the DECTalk
(I didn't get a manual with it) the DECTalk only responds to data
input via the keyboard. There is no input for a microphone only
for headphones. You can change any one of the seven on-board voices
inside the DECTalk to be any voice you like and you can change the
way certain words sound.
I am sorry but I found your method incredibly complex to the eye
and got completely lost when trying to understand your method for
comparing input. Would you like to explain this in some naive user
sort of way? Please.
Re .1
This is quite an interesting idea. A way to find the correct starting
point for comparison could be to take say three or four points
and as soon as you find a match for these in both patterns, you
could even the two waves up and do your comparison. A way to be
sure that the same person is speaking regardless of variations in
speech could be to only accept the input if say 85% plus of the
points in the input are the same. This would not require 100%
matches and would allow the user access. Obviously the machine
is going to require some kind of specific input. If a person speaks
in a high pitched voice on purpose the machine is bound to recognise
their input as valid. This is a good way of doing this and would
not be too difficult to code providing I could interpret the file
containing the wave. Any thoughts??
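A minimal sketch of that scheme in Python, assuming the wave has already been read into a list of numbers. The anchor length of four points, the per-point tolerance and the helper names are illustrative guesses; only the 85% figure comes from the note above.

```python
def align(stored, incoming, anchor_len=4, tolerance=2):
    """Find the first offset where the first few stored points match."""
    anchor = stored[:anchor_len]
    for offset in range(len(incoming) - len(stored) + 1):
        window = incoming[offset:offset + anchor_len]
        if all(abs(a - b) <= tolerance for a, b in zip(anchor, window)):
            return offset
    return None


def match_fraction(stored, incoming, offset, tolerance=2):
    """Fraction of stored points that agree with the aligned input."""
    window = incoming[offset:offset + len(stored)]
    hits = sum(abs(a - b) <= tolerance for a, b in zip(stored, window))
    return hits / len(stored)


def accept(stored, incoming, threshold=0.85):
    """Accept the input if it lines up and enough points agree."""
    offset = align(stored, incoming)
    if offset is None:
        return False
    return match_fraction(stored, incoming, offset) >= threshold


# Hypothetical stored pattern and a noisy, shifted copy of it.
stored = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100,
          90, 80, 70, 60, 50, 40, 30, 20, 10, 5]
noisy = list(stored)
noisy[7] += 50
noisy[12] += 50
print(accept(stored, [0, 0, 0] + noisy + [0]))
```

Here 18 of the 20 points agree (90%), so the input is accepted; with four bad points the fraction drops to 80% and it is rejected.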
It looks like I have stepped into some dodgy ground here which is
more difficult than I thought. Never mind, I still think I will
give it a shot even if I do end up facing a brick wall in every
direction.
Keep the info coming in,
Paul.
|
1401.4 | HearSay | ANGORA::SMCAFEE | Steve McAfee | Wed May 04 1988 10:41 | 10 |
|
You might want to find a couple of papers on the HearSay voice
recognition system. This uses a blackboard approach to solving
the problem. I don't think they ever got it working in real time,
but I believe it did work...
Sorry, I don't have any references at hand. Pick up any recent
AI textbook and try the index/bibliography.
- steve
|
1401.5 | lots of false starts in this area | SAUTER::SAUTER | John Sauter | Wed May 04 1988 11:31 | 34 |
| I worked at an AI lab in the 1960s, and one of the sub-groups there
was working on voice recognition. They were not very successful,
as I recall. The problem is to extract cues from the speech waveform
that can be used to match against the model.
My memory is hazy, but I think they were using frequency-domain
analysis: they had chosen three frequency bands, and they measured
the amplitude in each band every few milliseconds. They then tried
to match this pattern with the patterns recorded earlier, to recognize
the sentence. They tried to recognize sentences rather than words
because a sentence has lots more cues than a word.
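To make the idea concrete, here is a hedged Python sketch of measuring the amplitude in a few frequency bands every few milliseconds. The band edges, the 10 ms frame and the 8000 Hz sample rate are assumptions for illustration; the values the lab actually used are not known. A direct DFT is used for clarity rather than speed.

```python
import cmath
import math


def band_amplitudes(frame, sample_rate, bands):
    """Sum the DFT amplitudes that fall inside each (low, high) band in Hz."""
    n = len(frame)
    result = []
    for low, high in bands:
        total = 0.0
        for k in range(1, n // 2):
            if low <= k * sample_rate / n < high:
                coeff = sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                            for i, x in enumerate(frame))
                total += abs(coeff) * 2 / n
        result.append(total)
    return result


def cue_pattern(samples, sample_rate, bands, frame_ms=10):
    """One list of band amplitudes per frame of `frame_ms` milliseconds."""
    frame_len = int(sample_rate * frame_ms / 1000)
    return [band_amplitudes(samples[s:s + frame_len], sample_rate, bands)
            for s in range(0, len(samples) - frame_len + 1, frame_len)]


# Hypothetical input: 20 ms of a 500 Hz tone, and three assumed bands.
rate = 8000
tone = [math.sin(2 * math.pi * 500 * i / rate) for i in range(160)]
bands = [(200, 900), (900, 2500), (2500, 4000)]
for row in cue_pattern(tone, rate, bands):
    print([round(value, 3) for value in row])
```

The resulting list of rows is the kind of pattern that would then be compared with patterns recorded earlier.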
Shortly before I left I got a copy of their data base and tried
to synthesize a waveform that would produce the same cues. I wanted
to see if I could recognize the sentence by hearing it, reasoning
that if I couldn't then they weren't using the right information
to create their patterns. I played the waveforms through the digital
music interface developed for John Chowning's experiments in computer
music. It took a lot of imagination to understand the sentence
from the sound I produced, so I concluded that they weren't gathering
the right information.
It may be that my synthesis program wasn't working correctly--I
was never sure what constants they were multiplying the amplitudes
by in each frequency band. Also, such programs are very hard to
debug. I should have run its output back through the recognizer
to see if it "recognized" the sentence I was synthesizing, but I
never did.
Voice recognition is not a simple task. I'm sure it's made progress
since my experience in it, but if you start from scratch you will
have to make all of the early mistakes over again. As one or two
of the previous replies said, start from an AI textbook.
John Sauter
|
1401.6 | Yes it's been done | MQFSV2::DESROSIERS | Tout est possible | Wed May 04 1988 12:31 | 12 |
| There are a number of chips that do just that, I saw an article
in a French magazine (Micro Systemes) that used some uPD series
chips to do voice recognition and Steve Ciarcia of Byte magazine
had an article on voice rec. using a different chip, the whole thing
hooked up to an Apple II or to a C64. Mind you these things could
not do speech to text and had a limited number of words that could
be learned and recognized, but at the price they were going for,
and the fact that it was done on such lowly machines made the whole
thing worthwhile.
Jean
|
1401.7 | This sounds promising | 45384::MASON | | Wed May 04 1988 12:53 | 24 |
| Really!! You haven't got any more information, have you, about the
issue number that the article appeared in for Byte Magazine?? This
would be fantastic if you have. What about the French magazine??
It doesn't matter that it is in French; I would be able to get it
translated. If I could find out who supplies either of these chips
it could greatly reduce my task, and maybe the company has developed
the chip even further by now. Any help would be much
appreciated.
RE .5
If I can not succeed with the assistance of these chips mentioned
then I may just do that. I realise that this is going to be a very
big task but I am under no pressure as this is for the pure job
satisfaction of trying to complete such a task. It is inevitably
going to be a long process but what have I got to lose apart from
a few restless nights.
Thanks everybody for the input.
Regards,
Paul.
|
1401.8 | Just don't try using it with a cold... | TEACH::ART | Art Baker, DC Training Center (EKO) | Wed May 04 1988 14:21 | 22 |
|
I have the relevant issues of BYTE at home; I'll
upload the publication info for you tonite. The
software they use for recognition was pretty simple-
minded; it takes the output of some LPC chips and
compares the speech-parameters generated by new
input against the stored parameters of the words
it has been trained with. When it finds a match,
it assumes that's what you must have said. (For their
purposes, "match" is defined to be whatever is closest
in speech-parameter-space; that takes care of some of
the fuzziness associated with human speech.) Usual
restrictions: limited vocabulary, speaker dependent,
has to be trained to hear everything you plan to say,
discrete utterances (i.e. no connected speech).
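The "closest in speech-parameter-space" rule is easy to sketch. The Python below is an illustration under assumptions: the three-number parameter vectors and the word list are invented, and real LPC output would have more parameters per utterance. `nearest_word` is a name chosen for the example, not part of the kit's software.

```python
import math


def nearest_word(templates, params):
    """Return the trained word whose stored parameters are closest
    (Euclidean distance) to the parameters of the new utterance."""
    return min(templates, key=lambda word: math.dist(templates[word], params))


# Hypothetical trained vocabulary: one parameter vector per word.
templates = {
    "yes": [1.0, 0.2, 0.1],
    "no": [0.1, 0.9, 0.4],
}
print(nearest_word(templates, [0.8, 0.3, 0.2]))
```

Because the closest template always wins, anything you say is "recognized" as some word in the vocabulary, which is one way to end up with snowshoes.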
Circuit Cellar Inc sells the whole thing as a kit;
unfortunately, it only comes with C64 or Apple II
interfaces. They might be able to help you rework
it a little.
("No, choose, Doctor !" ... "Snowshoes, Doctor?")
|
1401.9 | Just a thought... | DYO780::WILDER | U comes before V in the alphabet | Wed May 04 1988 15:07 | 21 |
| A more economical and reliable approach might be to consider using
touch-tone hardware for your audio input. I don't know if this
would fit in with how you wish your security system to work though
I'd suggest it. I can well appreciate that the challenge of voice
recognition might be an overriding consideration. Sounds like fun.
When I worked at General Motors, they had an operator network country-
wide that allowed you to call a local number in most cities, give
an access number, and then be connected to any long-distance number.
GM then eliminated most of the human operators by installing a voice
recognition system to enter your access code and destination number.
It ran on a large IBM system and only had to understand 12 words
(zero thru nine, yes, and no) and would usually work although not
always on the first try. If it just couldn't understand anything
you said, as a fallback it would ring you thru to a human operator.
It's a tough problem. Of course in this scenario, it may have only
had to understand 12 words but it had to understand anyone who said
those 12 words. A little different problem than trying to match
the same word spoken by the same person.
dan
|
1401.10 | Voice-recognition phone story | OLIVER::OSBORNE | Blade Walker | Wed May 04 1988 15:59 | 24 |
| Just a little story about voice recognition:
A friend of mine (Bob) has a voice recognition telephone. He has to "train"
it by speaking the name to be dialed many times, and then entering the
phone number associated with the spoken name.
So one day he wanted to demonstrate it to me and another friend. He spoke
my name several times, and the phone ignored it. In my usual intrusive way,
I asked to try it, Bob said it wouldn't recognize my voice, since I hadn't
"trained" it. I tried anyway, and my other friend said, "Nah, Bob's voice
is more nasal". So I pinched my nose and said my name again, and the phone
dialed my number.
Voice recognition has a way to go. Distinguish between two people? I wouldn't
try to get it past Rich Little...
In a book titled "Making your own Robot", or something similar, Tod Loofburrow
describes a voice-recognition system implemented on a KIM-1. This is a pretty
primitive computer, I think it did frequency comparison to an averaged
set of samples, picking the closest. If you're interested in the book, I'll
see if I can find it at home.
John O.
|
1401.11 | | WJG::GUINEAU | | Wed May 04 1988 19:57 | 9 |
|
Funny. I just realized *why* you want to use voice recognition - Security
for Amiga...
Well, seeing as I can walk to your machine boot (or hit C-A-A) and then
bang on control C till startup-sequence aborts...
John
|
1401.12 | | 45384::MASON | | Thu May 05 1988 06:32 | 25 |
| Yes you could do that, but I am not using it so much to stop somebody
gaining access to the machine, full stop; I want to be able to
limit what a user of the machine can do. Since MS-DOS doesn't have
any way of entering a password like VMS, and it doesn't have a key
on the front like an IBM, what else can I do to stop nasty little
people using my machine?? There is no sure way to stop anybody
getting into an MS-DOS system. I mean, even on the Rainbow, when
people thought they had it fixed, all you have to do is stick an
MS-DOS disk in drive a:, boot MS-DOS off drive a:, and then swap
over to either E: or F:, and there you go: you are straight into the
hard disk and can have a field day. There is no secure MS-DOS system,
but unless you know about things like ctrl C on startup you shouldn't
be able to get into the system. Besides, some security is better
than no security, isn't it??
Please do get me a copy of that article I would really appreciate
it. I think I am going to drop the idea of a Voice Recognition
chip. It doesn't seem flexible enough for my needs. Maybe I should
consider something like telling a user to put their middle finger
onto a template and then comparing their finger print with one of
the authorised ones on disk. Who could forge something like a finger
print ( apart from James Bond )??
Paul.
-----
|
1401.13 | are you out in the jungle or what? | YIPPEE::GOULNIK | OogaboogaBox type | Thu May 05 1988 07:35 | 16 |
|
It seems you're into pattern recognition, one way or another.
Another idea then is to use the mouse, a lightpen or any such,
and ask the would be user to draw his signature, or any stored
drawing for that matter. It's not any easier than speech recognition
but at least does not require additional equipment. In any case
you might want to have a go at some neural-net stuff, which
potentially exhibits the features you're looking for: learning by
examples, reasonable output given a noisy/partial/distorted input
and fast response time. The major problem is in the training, which
can take quite a long time, but I suppose it can be fun. There is a
conference on that topic by the way on TLE::NEURAL_NETS.
Having said that, I think you'd be better off buying a secure case,
or locking the door.
Iv.
|
1401.14 | hardware protection | WJG::GUINEAU | | Thu May 05 1988 08:49 | 12 |
|
I don't know much about the Amiga BUS (yet) but I imagine you could put
some security stuff in a ROM that took over and made you do *something*
before allowing the machine to boot (like pressing the mouse buttons
in some order, or hitting certain keys...) Although without the OS
up, you'd be on your own as far as getting input from devices...
But that would sure keep people guessing... (Hmmm, poor guy. His Amiga
won't even boot!)
John
|
1401.15 | | 45384::MASON | | Thu May 05 1988 09:41 | 12 |
| Yes, I thought about locking all the peripherals, RAM and ROM unless
they had gone through the correct channels and been given clearance.
However as mentioned if the user controls out of the routine before
this is performed then no security. Unless of course I did something
to the machine every time I turned it off. This would be one hell
of a feature and save a lot of money and time. The mouse clicking
idea is very good. Or even a password typed in but not echoed.
What would happen if there was a bug in it or you forgot the password??
I mean how could you ever get back into your AMIGA again. It could
cost a fortune developing this.
Anybody got any suggestions on how I could approach this??
|
1401.16 | Qmouse,timesaver,or fool them | TRUMAN::LEIMBERGER | | Thu May 05 1988 10:11 | 12 |
| There is a small utility I pulled off Usenet a long time ago. It
is called QMOUSE. When you boot the Amiga it will
poll the mouse, and if it sees the left? button depressed it will
use an alternate startup routine. I have an Amiga 1000 with Timesaver,
and this allows one to use a four-letter password to disable the
keyboard. I don't know if it can be used on the 2000. Of course
it is connected to the keyboard cable, so it won't work on a 500.
While not in the realm of pattern recognition, one could always
set the foreground and background colors to the same color, and this
may confuse the unsuspecting (confuses me if I do it by mistake).
bill
|
1401.17 | Idea? | MQFSV2::DESROSIERS | Tout est possible | Thu May 05 1988 10:24 | 12 |
| Paul,
I'll rummage through my clippings for the article in the French
magazine; even if you don't build it, it's a nice circuit and could
give you ideas for other projects.
Now I don't know if this is a good idea, but since some weird
persons made viruses that lived in the boot blocks, could the
boot blocks be infected with a password program?
Jean
|
1401.18 | Password signature. | THEONE::PARSONS | Down-under computing... | Thu May 05 1988 19:43 | 8 |
| A recent short piece on Australian TV mentioned a password protection
scheme that involved measuring the time between keypresses when
typing the password, so not only do you have to have the right
password, but it has to be typed in the same manner as the owner
of the password. Sort of typing recognition idea as opposed to voice
recognition. I guess computing after partying would be out, in that
case. Regards Guy.
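How the scheme on TV actually worked was not described, so the following Python is only a guess at the general idea: record the times of the keypresses, compare the gaps between them with the owner's enrolled rhythm, and require the password itself to match as well. The millisecond timestamps, the 80 ms tolerance and the function names are all assumptions.

```python
def intervals(timestamps_ms):
    """Gaps between successive keypress times, in milliseconds."""
    return [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]


def rhythm_matches(enrolled_ms, attempt_ms, tolerance_ms=80):
    """True if every gap is within `tolerance_ms` of the enrolled gap."""
    reference = intervals(enrolled_ms)
    observed = intervals(attempt_ms)
    return len(reference) == len(observed) and all(
        abs(r - o) <= tolerance_ms for r, o in zip(reference, observed))


def check_login(password, typed, enrolled_ms, attempt_ms):
    """Require both the right password and the right typing rhythm."""
    return typed == password and rhythm_matches(enrolled_ms, attempt_ms)


# Hypothetical four-key password with the owner's enrolled timing.
enrolled = [0, 200, 350, 800]
print(check_login("abcd", "abcd", enrolled, [5000, 5220, 5400, 5830]))
print(check_login("abcd", "abcd", enrolled, [0, 500, 1000, 1500]))
```

Only the gaps are compared, so it does not matter when the typing starts, just how it is paced.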
|
1401.19 | I think I've got it | 45384::MASON | | Fri May 06 1988 05:38 | 10 |
| Yes well I think I have come up with the best solution. It might
not be high tech or look very flash, but fitting a padlock onto
the cover over my Amiga is one hell of a way to stop people hacking.
Thanks for the info but I think I'll let the professionals deal
with Voice/Keyboard/Password recognition. I'm not up to it.
Thanks everybody,
Paul.
|