
Conference hydra::amiga_v1

Title: AMIGA NOTES
Notice: Join us in the *NEW* conference - HYDRA::AMIGA_V2
Moderator: HYDRA::MOORE
Created: Sat Apr 26 1986
Last Modified: Wed Feb 05 1992
Last Successful Update: Fri Jun 06 1997
Number of topics: 5378
Total number of notes: 38326

1401.0. "Voice Recognition??" by 45384::MASON () Wed May 04 1988 07:51

    Hi,
    
    I read this notes conference more regularly than I write to it,
    since most of my questions seem to be quite common.  However, there
    is one area that is leaving me very confused indeed.
    
    This area is music on the Amiga, or to be more specific, digitising
    sounds.  I am not particularly interested in making music on my
    Amiga, but I am very interested in the concept of digitising sounds,
    storing these on disk and then creating a security system on my
    Amiga through voice recognition.  Sounds like something out of a
    movie, doesn't it?  I know that this concept is perfectly possible
    provided I have two crucial items.  The first is the correct software
    to write the sounds to disk.  My question here is how many
    seconds of sound you can store in a 1 megabyte A500.  If I want
    to store more, is it possible to write each megabyte of sound to
    disk and then merge the whole lot into one file?  The second piece
    of software, which I think is going to be the real killer, is one
    that will store the user's input and compare it with that in the
    file.  Are there any relatively generic pieces of software which
    can compare input sounds with sounds on disk??  That's what I thought.
    Any suggestions on how I could write one??  Going back to the first
    piece of software, I would want it to be able to manipulate the input
    and distort it in a large number of ways.  You know, the usual reverse
    etc etc.   No price limit,  I just want the best of the best.
    
    All help gratefully received,
    
    Thanks,
    
    Paul.
T.R  Title  User  Personal Name  Date  Lines
1401.1  WJG::GUINEAU  Wed May 04 1988 08:34  27 lines
I had thought of this awhile back (before I even knew about Amiga, in fact, 
probably before Amiga was even born!).

Voice Recognition is no easy task!  The human voice is extremely
flexible and variable. Even the same person speaking the same *word*
will have trouble exactly matching a pre-digitized recording.

The question becomes: "Is this new input *close enough* to the stored one?"

But "close enough" is a hard thing to determine. (there are people at major
universities doing just this kind of thing, and they don't make it look easy!)

I had originally thought of doing "slope analysis" on the waveform. This
(my own dreamed-up screwball method) means selecting a sample rate and then
comparing the relative slopes between each sample point with those of the
original waveform. If all the slopes of the whole sample are within some
tolerance of the original, it's a match.

The problem here is finding a common starting point. Phase shifting the sample
back and forth may help solve this (i.e. Didn't match?, Hmm shift the whole
sample x number of sample periods 'left' and try again...)


This must sound ridiculous!

John
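
The slope comparison and the shift-and-retry idea described in .1 can be sketched in a few lines of Python. This is only an illustration under assumptions: samples are plain lists of integers, the new recording is at least as long as the stored one, and the function names and tolerance value are hypothetical rather than taken from any Amiga package.

```python
def slopes(samples):
    """Return the difference between each pair of neighbouring sample points."""
    return [b - a for a, b in zip(samples, samples[1:])]


def slopes_match(reference, window, tolerance):
    """True if every slope in window is within tolerance of the reference slope."""
    ref, win = slopes(reference), slopes(window)
    return len(ref) == len(win) and all(
        abs(r - w) <= tolerance for r, w in zip(ref, win)
    )


def find_match(reference, candidate, tolerance):
    """Slide the reference along the candidate one sample period at a time.

    Returns the first offset at which all slopes agree, or None if no
    offset matches.
    """
    n = len(reference)
    for offset in range(len(candidate) - n + 1):
        if slopes_match(reference, candidate[offset:offset + n], tolerance):
            return offset
    return None


stored = [0, 3, 5, 4, 0]            # hypothetical pre-digitized word
spoken = [0, 0, 0, 3, 6, 4, 0, 0]   # same shape, two samples late, slightly off
print(find_match(stored, spoken, tolerance=1))
```

Comparing slopes rather than raw values ignores a constant offset in the signal, but it does nothing about a word spoken louder, faster or slower, which is part of why "close enough" is so hard to pin down.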
1401.2  ...  LEDS::ACCIARDI  Wed May 04 1988 09:34  35 lines
    Didn't DEC have a whole group dedicated to a product called 'DECTalk'
    or some such?  I remember reading articles about their work long
    before I joined the company.
    
    I don't know squat about voice recognition, but I do perform a little
    signal analysis now and then on an HP dynamic signal analyzer. 
    We attempt to characterize spindle ball bearings by their unique
    defects.  We do this by passing the signal from a capacitive
    displacement transducer, differentiated twice, through the analyzer
    and performing an FFT on the signal.  We can now identify specific
    spindles by their frequency content.  Unfortunately, the phase
    information, although not completely lost, is difficult to extract
    without performing an inverse FFT.
    
    The gist of this is that the Amiga needs a good, fast FFT algorithm.
    I haven't seen any yet, although I believe one of the waveform editing
    packages has 'FFT' as one of its drop-down menu choices.  
                                    
    In fact, it kind of irks me that all the decent signal analysis
    software exists for Pee Cees.  With great graphics, a super fast
    expansion bus, and reasonably fast processor, the Amiga could be
    a killer signal processing system.
                                      
    Anyway, I don't know if voice recognition should be performed in
    the time domain or the frequency domain.  Maybe both?
    
    Re: .0
    
    There are lots of sound sampling hardware/software packages available.
    The sampled sounds are stored as IFF standard files to allow
    interchange with other packages and music programs.  The size of
    the sampled sound depends on its duration and the sampling rate you
    select.  The files can get very large in a hurry.
    
    Ed.
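
The "how many seconds fit in a megabyte" question from .0 comes down to simple arithmetic once a sampling rate is chosen. A rough sketch follows, assuming uncompressed 8-bit mono samples (one byte per sample); the sampling rates listed are hypothetical examples, and a real 1 megabyte machine will have less than a full megabyte free once the system and the sampling program are loaded.

```python
def seconds_of_sound(memory_bytes, sample_rate_hz, bytes_per_sample=1, channels=1):
    """How many seconds of raw samples fit in memory_bytes."""
    return memory_bytes / (sample_rate_hz * bytes_per_sample * channels)


def bytes_needed(seconds, sample_rate_hz, bytes_per_sample=1, channels=1):
    """How many bytes a raw recording of the given length takes."""
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


ONE_MEGABYTE = 1024 * 1024
for rate in (8192, 16384, 28000):   # hypothetical sampling rates in Hz
    secs = round(seconds_of_sound(ONE_MEGABYTE, rate), 1)
    print(rate, "Hz:", secs, "seconds per megabyte")
```

Doubling the sampling rate halves the recording time for the same memory, which is why the files grow so quickly.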
1401.3  45384::MASON  Wed May 04 1988 10:37  36 lines
    
    Yes they did, I have one.  But from what I know about the DECTalk
    (I didn't get a manual with it), the DECTalk only responds to data
    input via the keyboard.  There is no input for a microphone, only
    a socket for headphones.  You can change any one of the seven on-board
    voices inside the DECTalk to be any voice you like, and you can change
    the way certain words sound.
    I am sorry but I found your method incredibly complex to the eye
    and got completely lost when trying to understand your method for
    comparing input.  Would you like to explain this in some naive user
    sort of way? Please.
    
    Re .1
    
    This is quite an interesting idea.  A way to find the correct starting
    point for comparison could be to take say three or four points
    and as soon as you find a match for these in both patterns, you
    could line the two waves up and do your comparison.  A way to be
    sure that the same person is speaking regardless of variations in
    speech could be to only accept the input if say 85% plus of the
    points in the input are the same.  This would not require 100% 
    matches and would allow the user access.  Obviously the machine
    is going to require some kind of specific input.  If a person speaks
    in a high pitched voice on purpose the machine is bound to recognise
    their input as valid.  This is a good way of doing this and would
    not be too difficult to code providing I could interpret the file
    containing the wave.  Any thoughts??
 
    It looks like I have stepped into some dodgy ground here which is
    more difficult than I thought.  Never mind,  I still think I will
    give it a shot even if I do end up facing a brick wall in every
    direction.
    
    Keep the info coming in,
    
    Paul.
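
The "accept if 85% plus of the points match" rule suggested above might look something like this in Python. It is a sketch only: it assumes the two waves have already been lined up, that samples are plain integers, and the tolerance and threshold defaults are hypothetical values chosen for illustration.

```python
def fraction_matching(reference, candidate, tolerance):
    """Fraction of aligned sample points that agree within tolerance."""
    pairs = list(zip(reference, candidate))   # stops at the shorter wave
    if not pairs:
        return 0.0
    hits = sum(1 for r, c in pairs if abs(r - c) <= tolerance)
    return hits / len(pairs)


def accept(reference, candidate, tolerance=2, threshold=0.85):
    """Grant access when enough points agree; 100% is not required."""
    return fraction_matching(reference, candidate, tolerance) >= threshold


stored = list(range(0, 100, 5))     # 20 hypothetical sample points
attempt = list(stored)
attempt[3] += 10                    # two points disagree: 18 of 20 match
attempt[11] -= 10
print(fraction_matching(stored, attempt, tolerance=2), accept(stored, attempt))
```

The threshold trades convenience against security: lowering it lets the owner in more reliably, but also lets in more impostors.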
1401.4  HearSay  ANGORA::SMCAFEE  Steve McAfee  Wed May 04 1988 10:41  10 lines
    
    You might want to find a couple of papers on the HearSay voice
    recognition system.  This uses a blackboard approach to solving
    the problem.  I don't think they ever got it working in real time,
    but I believe it did work...
    
    Sorry, I don't have any references at hand.  Pick up any recent
    AI textbook and try the index/bibliography.
    
    - steve
1401.5  lots of false starts in this area  SAUTER::SAUTER  John Sauter  Wed May 04 1988 11:31  34 lines
    I worked at an AI lab in the 1960s, and one of the sub-groups there
    was working on voice recognition.  They were not very successful,
    as I recall.  The problem is to extract cues from the speech waveform
    that can be used to match against the model.
    
    My memory is hazy, but I think they were using frequency-domain
    analysis: they had chosen three frequency bands, and they measured
    the amplitude in each band every few milliseconds.  They then tried
    to match this pattern with the patterns recorded earlier, to recognize
    the sentence.  They tried to recognize sentences rather than words
    because a sentence has lots more cues than a word.
    
    Shortly before I left I got a copy of their data base and tried
    to synthesize a waveform that would produce the same cues.  I wanted
    to see if I could recognize the sentence by hearing it, reasoning
    that if I couldn't then they weren't using the right information
    to create their patterns.  I played the waveforms through the digital
    music interface developed for John Chowning's experiments in computer
    music.  It took a lot of imagination to understand the sentence
    from the sound I produced, so I concluded that they weren't gathering
    the right information.
    
    It may be that my synthesis program wasn't working correctly--I
    was never sure what constants they were multiplying the amplitudes
    by in each frequency band.  Also, such programs are very hard to
    debug.  I should have run its output back through the recognizer
    to see if it "recognized" the sentence I was synthesizing, but I
    never did.
    
    Voice recognition is not a simple task.  I'm sure it's made progress
    since my experience in it, but if you start from scratch you will
    have to make all of the early mistakes over again.  As one or two
    of the previous replies said, start from an AI textbook.
        John Sauter
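
The frequency-domain scheme recalled in .5 (a few bands, measured every few milliseconds) can be illustrated roughly as follows. This is not a reconstruction of that lab's program: the band edges, the 10 ms frame length and the function names are all assumptions, and a slow textbook DFT stands in for whatever analysis was actually used.

```python
import cmath
import math


def band_energies(frame, sample_rate, bands):
    """Energy of one frame in each (low_hz, high_hz) band, via a naive DFT."""
    n = len(frame)
    energies = []
    for low, high in bands:
        total = 0.0
        for k in range(n // 2 + 1):
            freq = k * sample_rate / n          # centre frequency of bin k
            if low <= freq < high:
                coeff = sum(
                    x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)
                )
                total += abs(coeff) ** 2
        energies.append(total)
    return energies


def band_pattern(samples, sample_rate, bands, frame_ms=10):
    """One list of band energies per frame: the pattern to be matched."""
    frame_len = int(sample_rate * frame_ms / 1000)
    return [
        band_energies(samples[i:i + frame_len], sample_rate, bands)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


BANDS = [(200, 800), (800, 1800), (1800, 3500)]   # hypothetical band edges in Hz
tone = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(160)]
pattern = band_pattern(tone, 8000, BANDS)
print(len(pattern), "frames; strongest band per frame:",
      [frame.index(max(frame)) for frame in pattern])
```

A pure 1000 Hz test tone lands entirely in the middle band. Real speech spreads its energy across all three, and the way that spread changes over time is the pattern a recognizer would try to match.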
1401.6  Yes it's been done  MQFSV2::DESROSIERS  Tout est possible  Wed May 04 1988 12:31  12 lines
    There are a number of chips that do just that.  I saw an article
    in a French magazine (Micro Systemes) that used some uPD series
    chips to do voice recognition, and Steve Ciarcia of Byte magazine
    had an article on voice recognition using a different chip, the whole
    thing hooked up to an Apple II or to a C64.  Mind you, these things could
    not do speech to text and had a limited number of words that could
    be learned and recognized, but the price they were going for,
    and the fact that it was done on such lowly machines, made the whole
    thing worthwhile.
    
    Jean
    
1401.7  This sounds promising  45384::MASON  Wed May 04 1988 12:53  24 lines
    Really!!  You haven't got any more information, have you, about the
    issue number of Byte magazine that the article appeared in??  This
    would be fantastic if you have.  What about the French magazine??
    It doesn't matter that it is in French; I would be able to get it
    translated.  If I could find out who supplies either of these chips
    it could greatly reduce my task, and maybe the company has developed
    the chip even further by now, to a higher level.  Any help would be
    very much appreciated.
    
    RE .5
    
    If I can not succeed with the assistance of these chips mentioned
    then I may just do that.  I realise that this is going to be a very
    big task but I am under no pressure as this is for the pure job
    satisfaction of trying to complete such a task.  It is inevitably
    going to be a long process but what have I got to lose apart from
    a few restless nights.
    
    
    Thanks everybody for the input.
    
    Regards,
    
    Paul.
1401.8  Just don't try using it with a cold...  TEACH::ART  Art Baker, DC Training Center (EKO)  Wed May 04 1988 14:21  22 lines
	I have the relevant issues of BYTE at home; I'll 
	upload the publication info for you tonight.  The
	software they use for recognition was pretty simple-
	minded; it takes the output of some LPC chips and
	compares the speech-parameters generated by new 
	input against the stored parameters of the words
	it has been trained with.  When it finds a match,
	it assumes that's what you must have said. (For their
	purposes, "match" is defined to be whatever is closest
	in speech-parameter-space; that takes care of some of
	the fuzziness associated with human speech.)  Usual
	restrictions: limited vocabulary, speaker dependent,
	has to be trained to hear everything you plan to say,
	discrete utterances (i.e. no connected speech).

	Circuit Cellar Inc sells the whole thing as a kit;
	unfortunately, it only comes with C64 or Apple II
	interfaces.  They might be able to help you rework
	it a little.

	("No, choose, Doctor !" ... "Snowshoes, Doctor?")
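
The "closest in speech-parameter-space" matching described in .8 is essentially a nearest-neighbour lookup. A minimal sketch, assuming each trained word is stored as a short list of numbers; the words, the parameter values and the function name are made up for illustration and do not reflect what the LPC chips actually output.

```python
import math


def closest_word(templates, params):
    """Return the trained word whose stored parameters are nearest to params."""
    return min(templates, key=lambda word: math.dist(templates[word], params))


# Hypothetical stored parameters for a three-word vocabulary.
trained = {
    "yes": [0.9, 0.1, 0.3],
    "no": [0.2, 0.8, 0.5],
    "stop": [0.5, 0.5, 0.9],
}
print(closest_word(trained, [0.8, 0.2, 0.3]))
```

Because the nearest entry always wins, this kind of matcher returns some word even for input that resembles none of them, which fits the "Snowshoes, Doctor?" style of failure. A distance cut-off would be one way to let it say "not recognised" instead.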
1401.9  Just a thought...  DYO780::WILDER  U comes before V in the alphabet  Wed May 04 1988 15:07  21 lines
    A more economical and reliable approach might be to consider using
    touch-tone hardware for your audio input.  I don't know if this
    would fit in with how you wish your security system to work, but
    I'd suggest it.  I can well appreciate that the challenge of voice
    recognition might be an overriding consideration.  Sounds like fun.
    
    When I worked at General Motors, they had a countrywide operator
    network that allowed you to call a local number in most cities, give
    an access number, and then be connected to any long-distance number.
    GM then eliminated most of the human operators by installing a voice
    recognition system to enter your access code and destination number.
    It ran on a large IBM system and only had to understand 12 words
    (zero thru nine, yes, and no) and would usually work although not
    always on the first try.  If it just couldn't understand anything
    you said, as a fallback it would ring you thru to a human operator.
    It's a tough problem.  Of course in this scenario, it may have only
    had to understand 12 words but it had to understand anyone who said
    those 12 words.  A little different problem than trying to match
    the same word spoken by the same person.
    
    dan
1401.10  Voice-recognition phone story  OLIVER::OSBORNE  Blade Walker  Wed May 04 1988 15:59  24 lines
Just a little story about voice recognition:

A friend of mine (Bob) has a voice recognition telephone. He has to "train"
it by speaking the name to be dialed many times, and then entering the
phone number associated with the spoken name. 

So one day he wanted to demonstrate it to me and another friend. He spoke
my name several times, and the phone ignored it. In my usual intrusive way,
I asked to try it, Bob said it wouldn't recognize my voice, since I hadn't
"trained" it. I tried anyway, and my other friend said, "Nah, Bob's voice
is more nasal". So I pinched my nose and said my name again, and the phone
dialed my number.

Voice recognition has a way to go. Distinguish between two people? I wouldn't
try to get it past Rich Little...

In a book titled "Making your own Robot", or something similar, Tod Loofbourrow
describes a voice-recognition system implemented on a KIM-1. This is a pretty
primitive computer; I think it did frequency comparison to an averaged
set of samples, picking the closest. If you're interested in the book, I'll
see if I can find it at home.


John O.
1401.11  WJG::GUINEAU  Wed May 04 1988 19:57  9 lines
Funny. I just realized *why* you want to use voice recognition - Security
for Amiga...

Well, seeing as I can walk up to your machine, boot it (or hit C-A-A) and then
bang on control C till startup-sequence aborts...


John
1401.12  45384::MASON  Thu May 05 1988 06:32  25 lines
    Yes you could do that, but I am not using it so much to stop somebody
    gaining access to the machine, full stop; I want to be able to
    limit what a user of the machine can do.  Since MS-DOS doesn't have
    any way of entering a password like VMS, and it doesn't have a key
    on the front like an IBM, what else can I do to stop nasty little
    people using my machine??  There is no sure way to stop anybody
    getting into an MS-DOS system, even on the Rainbow when
    people thought they had it fixed.  All you have to do is stick an
    MS-DOS disk in drive A:, boot MS-DOS off drive A: and then swap
    over to either E: or F:, and there you go: you are straight into the
    hard disk and can have a field day.  There is no secure MS-DOS system,
    but unless you know about things like ctrl C on startup you shouldn't
    be able to get into the system.  Besides, some security is better
    than no security, isn't it??
    
    Please do get me a copy of that article; I would really appreciate
    it.  I think I am going to drop the idea of a Voice Recognition
    chip.  It doesn't seem flexible enough for my needs.  Maybe I should
    consider something like telling a user to put their middle finger
    onto a template and then comparing their fingerprint with one of
    the authorised ones on disk.  Who could forge something like a
    fingerprint (apart from James Bond)??
    
    Paul.
    -----
1401.13  are you out in the jungle or what?  YIPPEE::GOULNIK  OogaboogaBox type  Thu May 05 1988 07:35  16 lines
	It seems you're into pattern recognition, one way or another.
	Another idea then is to use the mouse, a lightpen or any such,
	and ask the would be user to draw his signature, or any stored
	drawing for that matter. It's not any easier than speech recognition
	but at least does not require additional equipment. In any case
	you might want to have a go at some neural-net stuff, which
	potentially exhibits the features you're looking for: learning by
	example, reasonable output given a noisy/partial/distorted input
	and fast response time. The major problem is in the training, which
	can take quite a long time, but I suppose it can be fun. By the way,
	there is a conference on that topic at TLE::NEURAL_NETS.

	Having said that, I think you'd be better off buying a secure case,
	or locking the door.
Iv.
1401.14  hardware protection  WJG::GUINEAU  Thu May 05 1988 08:49  12 lines
I don't know much about the Amiga BUS (yet) but I imagine you could put
some security stuff in a ROM that took over and made you do *something* 
before allowing the machine to boot (like pressing the mouse buttons
in some order, or hitting certain keys...) Although without the OS
up, you'd be on your own as far as getting input from devices...

But that would sure keep people guessing... (Hmmm, poor guy. His Amiga
won't even boot!)

John

1401.15  45384::MASON  Thu May 05 1988 09:41  12 lines
    Yes,  I thought about locking all the peripherals, RAM and ROM unless
    they had gone through the correct channels and been given clearance.
    However, as mentioned, if the user control-Cs out of the routine before
    this is performed then there is no security.  Unless of course I did something
    to the machine every time I turned it off.  This would be one hell
    of a feature and save a lot of money and time.  The mouse clicking
    idea is very good.  Or even a password typed in but not echoed.
    What would happen if there was a bug in it or you forgot the password??
    I mean how could you ever get back into your AMIGA again.  It could
    cost a fortune developing this.
    Anybody got any suggestions on how I could approach this??
    
1401.16  Qmouse,timesaver,or fool them  TRUMAN::LEIMBERGER  Thu May 05 1988 10:11  12 lines
    There is a small utility I pulled off Usenet a long time ago.  It
    is called QMOUSE.  When you boot the Amiga it will
    poll the mouse, and if it sees the left? button depressed it will
    use an alternate startup routine.  I have an Amiga 1000 with Timesaver,
    and this allows one to use a four-letter password to disable the
    keyboard.  I don't know if it can be used on the 2000.  Of course
    it is connected to the keyboard cable, so it won't work on a 500.
    While not in the realm of pattern recognition, one could always
    set the foreground and background colors to the same color, and this
    may confuse the unsuspecting.  (It confuses me if I do it by mistake.)
    
    							bill
1401.17  Idea?  MQFSV2::DESROSIERS  Tout est possible  Thu May 05 1988 10:24  12 lines
    Paul,
    
    	I'll rummage through my clippings for the article in the French
    magazine.  Even if you don't build it, it's a nice circuit and could
    give you ideas for other projects.
    
    	Now I don't know if this is a good idea, but since some weird
    persons have made viruses that live in the boot blocks, could the
    boot blocks be infected with a password program?
    
    Jean
    
1401.18  Password signature.  THEONE::PARSONS  Down-under computing...  Thu May 05 1988 19:43  8 lines
    A recent short piece on Australian TV mentioned a password protection
    scheme that involved measuring the time between keypresses when
    typing the password, so not only do you have to have the right
    password, but it has to be typed in the same manner as the owner
    of the password. Sort of typing recognition idea as opposed to voice
    recognition. I guess computing after partying would be out, in that
    case.                       Regards   Guy.
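
The keypress-timing idea mentioned in .18 can be sketched as a comparison of the gaps between keystrokes. This is an illustration only, assuming the program can record a timestamp in seconds for each key of the password; the function names and the 0.05 second tolerance are hypothetical, and the password text itself would still be checked separately.

```python
def intervals(timestamps):
    """Time gaps between consecutive keypresses."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]


def rhythm_matches(enrolled, attempt, tolerance_s=0.05):
    """True if every gap in the attempt is within tolerance of the enrolled gap."""
    ref, new = intervals(enrolled), intervals(attempt)
    return len(ref) == len(new) and all(
        abs(r - n) <= tolerance_s for r, n in zip(ref, new)
    )


owner = [0.00, 0.20, 0.35, 0.80]       # hypothetical enrolled keypress times
same_hand = [5.00, 5.22, 5.36, 5.78]   # later attempt, similar rhythm
stranger = [0.00, 0.50, 1.00, 1.50]    # right keys, even plodding rhythm
print(rhythm_matches(owner, same_hand), rhythm_matches(owner, stranger))
```

Only the gaps are compared, so it does not matter when the typing starts. A fixed tolerance is the crudest possible choice; averaging several enrolment runs would give a fairer picture of how much the owner's own rhythm varies.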
    
1401.19  I think I've got it  45384::MASON  Fri May 06 1988 05:38  10 lines
    Yes well I think I have come up with the best solution.  It might
    not be high tech or look very flash, but fitting a padlock onto
    the cover over my Amiga is one hell of a way to stop people hacking.
    
    Thanks for the info but I think I'll let the professionals deal
    with Voice/Keyboard/Password recognition.  I'm not up to it.
    
    Thanks everybody,
    
    Paul.