
Conference rusure::math

Title:Mathematics at DEC
Moderator:RUSURE::EDP
Created:Mon Feb 03 1986
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2083
Total number of notes:14613

1482.0. "Changing the Speed of Voice and Music?" by GAUSS::ASFOUR () Fri Aug 16 1991 12:17

    I'm not sure which notesfile to post this in, so I figured
    I'd start here.
    
    I've scanned a couple of papers describing several methods to
    change the speed of speech without changing its spectral
    characteristics. 
    
    e.g. play back speech slower without making it sound lower in pitch
    or distorting it.
    
    Does anyone know anything about any such techniques? Does anyone
    know if these techniques (or their equivalents) would work equally 
    well with other forms of sound such as CD quality music? 
    
    I'm working on a multi-media A/D project and we would like to be able 
    to synchronize several sources of audio and video together. Sometimes,
    the audio could take a longer or shorter time than the video, and we
    would like to be able to "shrink" or "expand" it so that they would 
    fit.
    
    Any pointers to "cookbook" methods to implement these (as opposed
    to translating the math in these papers to programs) would be 
    a great help. 
    
    		thanks.
    			Yousif.
    
    P.S. The papers I'm referring to are found in _Speech Enhancement_,
    edited by Jae S. Lim, Prentice-Hall Signal Processing Series, (c) 1983.
    One paper in the book is "Time-Scale Modification of Speech Based on 
    Short-Time Fourier Analysis", M.R. Portnoff, taken from the IEEE 
    Transactions on Acoustics, Speech, and Signal Processing, Vol. 29, No. 3,
    June 1981.
T.R  Title  User  Personal Name  Date  Lines
1482.1. "" by ALLVAX::JROTH (I know he moves along the piers) Fri Aug 16 1991 16:06 (16 lines)
    If you have access to the usenet newsgroups, it may be worth posting
    a query in the comp.dsp group.

    Pitch/time shifting is well known, but does involve some trickery - you
    have to "splice" pieces of waveforms together at the zero crossings to
    avoid glitches.  Companies like Lexicon are experts at this.
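
As a rough illustration of the splicing idea, here is a minimal Python sketch that lengthens a signal by repeating one segment, cutting only at rising zero crossings so the joins do not introduce a step. The function names, the plain list-of-floats representation, and the choice of rising crossings are assumptions made for this example; it is not how Lexicon or any other commercial unit does it.

```python
def nearest_rising_zero_crossing(samples, start):
    """Return the first index >= start where the signal goes from
    non-positive to positive, or None if no such index exists."""
    for i in range(max(start, 1), len(samples)):
        if samples[i - 1] <= 0.0 < samples[i]:
            return i
    return None


def repeat_segment_at_zero_crossings(samples, start, min_length):
    """Return a copy of `samples` with one segment played twice.

    The segment begins at the first rising zero crossing at or after
    `start` and ends at the first rising zero crossing at least
    `min_length` samples later (both are hypothetical parameters).
    """
    a = nearest_rising_zero_crossing(samples, start)
    if a is None:
        return list(samples)
    b = nearest_rising_zero_crossing(samples, a + min_length)
    if b is None:
        return list(samples)
    # samples[a:b] starts and ends just after a rising crossing, so
    # repeating it keeps the waveform continuous at each join.
    return list(samples[:b]) + list(samples[a:b]) + list(samples[b:])
```

Repeating (or dropping) many such segments spread across the recording changes its duration while each sample is still played at the original rate, which is why the pitch stays put.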

    Another source (possibly) would be the application notes for DSP
    products from Motorola, TI, and Analog Devices.  There are public
    bulletin boards with code online as well.

    Actually, with DSP addons becoming so widely available even for PC's
    I expect shareware to become available for a whole gamut of signal
    processing tasks.  Lots of people are working in this area!

    - Jim
1482.2. "comp.dsp" by HGRD01::CLCHEUNG () Mon Aug 19 1991 02:29 (7 lines)
    
    There has been some discussion of this topic in comp.dsp over the past week.
    
    I can post it or send mail to you if you need it.
    
    -CL
    
1482.3. "try DNEAST::COMMUSIC conference" by BRSTR1::SYSMAN (Dirk Van de moortel) Tue Aug 20 1991 04:12 (11 lines)
I know that with a pitch to MIDI converter you can (with a microphone) record
a (not too complicated) voice into a sequencer (MIDI recorder). From there on
you should be able to play back at any speed without the pitch being distorted.

Have a look in the DNEAST::COMMUSIC conference... this will open a whole new
world for you.
If you post your question in COMMUSIC, you'll surely get lots of hints...

Good luck!

Dirk
1482.4. "Could you please post comp.dsp discussion?" by GAUSS::ASFOUR () Tue Aug 20 1991 10:43 (8 lines)
    re .2
        Thanks. Could you please post the comp.dsp discussion, or
        mail it to me at 3D::ASFOUR?
    
    re others
    	Thanks for the pointers. I'll follow up on them.
    
    			Yousif.
1482.5. "comp.dsp - 1" by HGRD01::CLCHEUNG () Wed Aug 21 1991 00:04 (71 lines)
Here is the first extraction from comp.dsp.

---------------------------------------------------------

Dear comp.dsp readers,

Thanks to everyone who responded to my questions about slowing down
speech files!

Here's my summary. Sorry about the delay...

---

The 2nd of July I wrote:

>I need algorithms or source code for a routine to slow down
>a speech file WITHOUT affecting the frequency domain.

---

Here are some replies I got:

>From: Greg Sandell ([email protected])
>
>What you want is a phase vocoder. There is C code for such a
>program in F. R. Moore's "The Elements of Computer Music",
>Prentice Hall, 1990.

>From: Peter Silsbee ([email protected])
>
>You might want to check out an article by Keith Lent in the Dec. 1989
>(I think) Computer Music Journal. It describes an algorithm for pitch-
>shifting which maintains the spectral *envelope* of a sound.
                                                     
>From: Brett Ninness ([email protected])
>
>I'm very interested in trying to time scale speech without altering
>its spectral content. The main papers in the area are:
>
>M. R. Portnoff, "Time-Scale Modification of Speech Based on Short-Time
>Fourier Analysis", IEEE ASSP-29 No 3 June 1981
>
>and
>
>T. F. Quatieri, "Speech Transformations Based on a Sinusoidal
>Representation" in IEEE ASSP-34 No 6 December 1986.

---

I also got a reply describing a simple, straightforward method to do
this. It goes like this:

One plays a portion of, say 20-100 ms, repeats it and skips to the next
20-100 ms chunk. One could probably play 50 ms, repeat the last 20 ms,
and so on, to slow down just a little bit.

I tried this method and it worked. Of course, it introduces quite a lot
of noise and distorts the sound when you try to slow it down too much.

I think it sounded pretty good at half the speed though.
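
For anyone who wants to try the repeat-and-skip method, here is a minimal Python sketch under some assumptions: the audio is a plain list of samples, and the function name and the `chunk_ms` / `repeat_ms` parameters are made up for this illustration. There is no crossfade at the joins, so expect the clicks and noise described above.

```python
def slow_down_by_repetition(samples, sample_rate, chunk_ms=50, repeat_ms=50):
    """Stretch `samples` by replaying the tail of every chunk.

    With repeat_ms == chunk_ms every chunk is played twice (half speed).
    With chunk_ms=50 and repeat_ms=20, only the last 20 ms of each 50 ms
    chunk is replayed, which slows things down by a smaller amount.
    """
    chunk = max(1, int(sample_rate * chunk_ms / 1000))
    repeat = min(chunk, max(0, int(sample_rate * repeat_ms / 1000)))
    out = []
    for start in range(0, len(samples), chunk):
        block = samples[start:start + chunk]
        out.extend(block)
        # Replay the tail of the block. Pitch is unchanged because every
        # sample is still played back at the original sample rate.
        out.extend(block[max(0, len(block) - repeat):])
    return out
```

For example, one second of 8 kHz audio passed through `slow_down_by_repetition(samples, 8000)` comes back two seconds long; with `repeat_ms=20` it comes back 1.4 seconds long.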


                                Thanks again to all who responded!

                                        - Anders Ohlsson
                                        - [email protected]

--------------------------------------------------------------------------


                  
1482.6. "comp.dsp - 2" by HGRD01::CLCHEUNG () Wed Aug 21 1991 00:05 (31 lines)
The second extraction

-----------------------------------------------------------------

Subject: Slow it down...

Someone recently asked in comp.dsp:

Is there a way to slow down the rate of (e.g.) speech, without altering the
pitch?

Several decades ago, the Radiophonic Workshop of the BBC used a modified tape
machine to do this.  They recorded the speech normally onto the tape, then
played it back with a rotating head.  The speed of delivery of the speech
depends on the "absolute" rate at which the tape moves (i.e. the rate at
which it comes off the spools), whereas the pitch depends on the "absolute"
rate and also on the rotation speed of the playback head.  Since you can
control these speeds independently, you have independent control over the
speed & pitch.
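
The relationship described above can be written down as a tiny Python sketch. This is a simplified model made up for illustration, not a description of the BBC machine: it assumes the delivery rate scales with the tape speed alone, the pitch scales with the tape speed relative to the head, and the head speed is counted as positive when the head surface moves against the tape's direction of travel. The function and parameter names are hypothetical.

```python
def rotating_head_playback(tape_speed, head_speed, record_speed):
    """Return (tempo_ratio, pitch_ratio) relative to the original recording.

    tape_speed   -- speed at which the tape comes off the spool on playback
    head_speed   -- speed of the rotating head surface, positive when it
                    moves against the tape (assumed sign convention)
    record_speed -- tape speed used when the recording was made
    All three must use the same units, e.g. inches per second.
    """
    tempo_ratio = tape_speed / record_speed
    pitch_ratio = (tape_speed + head_speed) / record_speed
    return tempo_ratio, pitch_ratio
```

Under these assumptions, a tape recorded at 15 ips and played at 7.5 ips with the head surface moving at 7.5 ips against the tape gives `(0.5, 1.0)`: half the delivery speed at the original pitch. The model says nothing about what happens each time a head segment leaves the tape and the next one picks up, which is where the audible joins come from.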

There are one or two obvious problems with this approach, but
I think it's very clever.

----
Tony Fisher
Dept. of Computer Science, The University of York, York YO1 5DD, U.K.
Tel. +44 904 432738 or 432722
Janet:	  [email protected]
Internet: fisher%[email protected]
UUCP:	  [email protected] (..!uunet!mcsun!reading!minster!fisher)
    
1482.7. "comp.dsp - 3" by HGRD01::CLCHEUNG () Wed Aug 21 1991 00:06 (35 lines)
The third extraction.
-----------------------------------------------------------------------------
In comp.dsp, [email protected] writes:

>     Is there a way to slow down the rate of (e.g.) speech, without altering the
>     pitch?
> 
>     Several decades ago, the Radiophonic Workshop of the BBC used a modified tape
>     machine to do this.  They recorded the speech normally onto the tape, then
>     played it back with a rotating head.  The speed of delivery of the speech
> 
>     ----
>     Tony Fisher

Back in the 1960's, there was an article in Popular Electronics 
describing a system like this with the trademarked name "Squeech".
You could call a phone number and listen to a recording of it.

We now return to the 1990's!  You can buy a cassette recorder from GE 
called the Fastrak which allows a continuously variable control of speech
from 15% slower than normal up to 100% faster than normal without varying
the pitch.  There is a note on the unit that says the IC that does it (no
rotating heads!) is licensed from Variable Speech Corp.  It appears to
do the same thing as the rotating heads, only electronically (probably
uses a Reticon analog delay line, or something like that).

Also, our company's voice mail system (ASPEN by Octel) has a feature
where you can speed up playback of your messages.  The ASPEN implementation
is considerably better than the GE Fastrak system (i.e. it sounds better).

It ought to be fairly easy to implement it with a DSP chip.

Rick Karlquist
[email protected]

1482.8. "comp.dsp - 4" by HGRD01::CLCHEUNG () Wed Aug 21 1991 00:07 (35 lines)
The final one.  Please follow comp.dsp for any further posts.

--------------------------------------------------------------------------------
In article <[email protected]> [email protected] (Rick Karlquist) writes:
>In comp.dsp, [email protected] writes:
>>  Is there a way to slow down the rate of (e.g.) speech, without altering the
>>  pitch?
>  There is a note on the unit that says the IC that does it (no
>rotating heads!) is licensed from Variable Speech Corp.  It appears to
>do the same thing as the rotating heads, only electronically (probably
>uses a Reticon analog delay line, or something like that).

Back in 1974-75, Electronics Magazine had a writeup on the VSC circuit,
with a demonstration on one of those floppy plastic phonograph records.
It sounded weird, with a lot of low frequency hum/modulation, but better 
than most voice synthesizers I've played with.

They used an analog delay line with an (exponentially?) swept clock 
to stretch portions of the signal.  The rest of the signal got
thrown out.  Each sample was fairly long, 20 to 50 milliseconds.
There was some mention of detecting the start of a syllable, and 
using it to select that sample portion.  To speed up slower speech, 
I guess they recycle the samples through the delay.

Digitally this could be done with a (variable clocked) A/D feeding
a FIFO that feeds a D/A, where the FIFO would hold a contiguous block 
of samples with a duration of 20 milliseconds or so.

This only works because speech has a low baud rate (20 Hz or so) and
you can tolerate throwing out half of the signal (in the time domain).
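
The "throw out part of the signal" idea can be sketched in software as well. The Python below is a plain block-discard version, not the swept-clock VSC circuit: it keeps the first part of every block and drops the rest, so playback gets faster while each kept sample is still played at the original rate. The function name, the list-of-samples representation, and the `block_ms` / `keep_fraction` parameters are assumptions for this illustration, and there is no smoothing at the joins.

```python
def speed_up_by_discarding(samples, sample_rate, block_ms=40, keep_fraction=0.5):
    """Shorten `samples` by keeping only the start of each block.

    block_ms      -- block length in milliseconds (the posts above mention
                     20 to 50 ms)
    keep_fraction -- fraction of each block that is kept; 0.5 doubles the
                     playback speed
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    block = max(1, int(sample_rate * block_ms / 1000))
    keep = max(1, int(block * keep_fraction))
    out = []
    for start in range(0, len(samples), block):
        # Keep the first `keep` samples of this block, discard the rest.
        out.extend(samples[start:start + keep])
    return out
```

With 8 kHz audio and the defaults, each 320-sample block contributes its first 160 samples, so one second of input becomes half a second of output.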

Mark Zenier  [email protected]  [email protected]

---------------------------------------------------------------------------