
Conference rusure::math

Title:Mathematics at DEC
Moderator:RUSURE::EDP
Created:Mon Feb 03 1986
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2083
Total number of notes:14613

1665.0. "Calculating a statistical likelihood....." by LARVAE::TREVENNOR_A (A child of init) Thu Sep 17 1992 04:41

Hi Mathematicians!
    
	X posted from the SAS conference, can anyone here assist please?
    
    AT.
    
    
                   <<< YIELD::PAGESWAP$:[NOTES$LIBRARY]SAS.NOTE;1 >>>
                         -< SAS statistical software >-
================================================================================
Note 33.0             Statisticians I need your expertise!               1 reply
TRUCKS::TREVENNOR_A "A child of init"                38 lines   7-SEP-1992 14:15
--------------------------------------------------------------------------------
    Hi there,
    
    		My name is Alan Trevennor. I am a software specialist in
    the UK. I need help with a statistical problem. This was the only 
    conference I could find whose title seemed even remotely applicable to
    a question to which I need an answer. If anyone can mail me a more
    suitable conference location I'll be happy to remove this note there.
    
    I need the help of a statistician to answer a question which is well
    beyond my poor abilities to even attempt!
         
    On an audio CD each item (track) is held as data. The data is held in
    blocks of data called frames. Each frame on the disk has a number -
    they number from zero on up to some large number (26843545600 I think).
    
    Now, each CD contains a table of contents (TOC) entry which specifies
    at which frame number each item (track) on the disk commences and how
    many frames it plays for. A CD can contain up to (I believe) 64
    items and, although they must not overlap, they can start almost any
    place on the disk and be of any length other than zero. So,
    to my question: What is the statistical likelihood of any two CDs
    having identical tables of contents - yet actually containing different
    material? Is it a million to one chance, or better?
    
    Why do I want to know this? Well, although the slots are provided in
    the CD data format the record companies do not currently put
    information onto CDs which uniquely identifies a disk (like who it is by
    and what the tracks are called etc). If my software wants to know what
    is in the CD player I have to tell it. If I can be pretty darn sure by
    reading the TOC from a disk and recognising it that I have the right
    disk it sounds like a good interim solution. But how sure can I be? Is
    it an answerable question?
    
    Thanks in any case      
    Regards
    Alan Trevennor
    UK UNIX and Multimedia consultant.
    
    
T.R  Title  User  Personal Name  Date  Lines

1665.1. "Bad news and good news." by CADSYS::COOPER (Topher Cooper) Thu Sep 17 1992 18:35 (36 lines)

    I'm afraid that there just isn't enough information here.  We need to
    know how long an entry is on the average.  We need to know how much
    variation there tends to be on that length.  We need to know the
    pattern of lengths that are typical.  We need to know the same
    information for "gaps" which do not fall into any TOC entry.  We need
    to know the degree of tendency for the TOC entries to be ordered.  We
    need to know if there is any tendency to "program" a short entry after
    a long one, or other such patterns.  We need to know (depending on how
    we model this, it may or may not be independent) the probability of
    there being exactly 64 entries, the probability of there being exactly
    63 entries, etc.

    To illustrate the problem: what if there were a commonly used
    convention -- used, say, on half the CDs produced by one company which
    published 10% of the collection -- which simply divides the available
    material into 64 equal length tracks?  Then any calculation based on
    random material is moot: roughly 5% of your material will have the
    same "signature".

    You also have to watch out for the birthday paradox here.  There is
    about one chance in 365 that two given people have the same birthday,
    but there is better than 1 chance in 2 that *some* two people out of a
    group of 23 will have the same birthday (this ignores the tendency of
    people to be born at certain times of year -- which improves the
    chances).
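
    A minimal sketch of the birthday calculation, assuming 365 equally
    likely and independent birthdays (leap years and seasonal effects are
    ignored). The function names are made up for illustration. The same
    product applies to CDs if "days" is read as the number of equally
    likely TOC signatures and the group size as the size of the collection.

```python
def shared_birthday_probability(group_size: int, days: int = 365) -> float:
    """Chance that at least two members of the group share a birthday.

    Assumes every birthday is independent and uniform over `days` days.
    """
    p_all_distinct = 1.0
    for already_taken in range(group_size):
        # The next person must avoid every birthday already taken.
        p_all_distinct *= (days - already_taken) / days
    return 1.0 - p_all_distinct


def smallest_group_for(target: float, days: int = 365) -> int:
    """Smallest group size whose shared-birthday chance reaches `target`."""
    size = 1
    while shared_birthday_probability(size, days) < target:
        size += 1
    return size


if __name__ == "__main__":
    print("two given people:", shared_birthday_probability(2))
    print("group size for better than even odds:", smallest_group_for(0.5))
```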

    The good news is that the space is large enough that my intuition says
    that if there is even a moderate amount of variation, if CDs with only
    a few entries are rare, and there are no conventions leading to greater
    uniformity in subpopulations, and you aren't talking about really large
    CD collections (i.e., you're talking about a personal collection rather
    than a studio or CD company library) then you probably come in easily
    under the one-in-a-million chance of failure.  Of course, my intuition
    could be wrong.

				    Topher

1665.2. "A formula, but caveat user..." by SSAG::LARY (Laughter & hope & a sock in the eye) Thu Sep 17 1992 21:48 (14 lines)

As Topher says, the biggest problem is how to deal with systematic
nonrandomness. If you can deal with that, and if your "discriminator"
has a large number N of values all about equally likely, and you want to
make the probability of a "false discrimination" less than some small
value p, then you should limit the size of your collection to S = SQRT(2*N*p).

For example, let's say the CDs hold popular music. If it's a good assumption
that a CD holds at least 8 songs and the "seconds" part of the duration of a
song (the "ss" in "mm:ss") is more-or-less uniformly random, then you have a 
discriminator with 60^8 values. If you want to limit the chance of a false
discrimination to one in a million you must limit the size of your CD
collection to about 18,000 CDs.

[a more precise relationship between S, N and p, for S << N, is: (1-S/(2*N))^(S-1) = 1-p]
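
A rough sketch of both relationships in Python, for checking the arithmetic
in the 60^8 example. It assumes, as above, that all N discriminator values
are equally likely; the function names are made up for illustration.

```python
import math


def max_collection_size(n_values: float, p: float) -> float:
    """Approximate collection size S = sqrt(2*N*p) for a false-match chance p."""
    return math.sqrt(2.0 * n_values * p)


def false_match_probability(collection_size: int, n_values: float) -> float:
    """Solve (1 - S/(2*N))^(S-1) = 1 - p for p, valid when S << N.

    log1p and expm1 keep precision when S/(2*N) is extremely small.
    """
    s = collection_size
    return -math.expm1((s - 1) * math.log1p(-s / (2.0 * n_values)))


if __name__ == "__main__":
    n = 60 ** 8  # eight tracks, each with a uniformly random "seconds" value
    size = max_collection_size(n, 1e-6)
    print("collection size limit:", size)
    print("chance of a false match at that size:",
          false_match_probability(int(size), n))
```

The two functions should agree closely whenever S is tiny compared with N,
which is the only regime in which either formula is meant to be used.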

1665.3. by AUSSIE::GARSON Thu Sep 17 1992 23:42 (17 lines)

    re .0
    
    Not a mathematical answer but...my mate's CD player can already do
    this. When you insert a CD the display shows the name of the CD. I
    asked him how it works. It turns out that he entered the names himself
    which are stored in some sort of non-volatile memory. This implies to
    me that the CD player can uniquely identify the CD (and then uses the
    key to access its 'database'). [Also stored in the database is the
    default track sequence you would like played for that CD so if you don't
    like a track you just leave it out etc.]
    
    Anyway that left the next question...how does it identify the CD? One
    guess I came up with is that it can read the bar-coded CD number that
    seems to be etched on the disk. Anybody know the answer?
    
    So...why not try the hi-fi buffs conference or ask in a specialist
    hi-fi store?

1665.4. "A tick in the TOC solution I think!" by LARVAE::TREVENNOR_A (A child of init) Fri Sep 18 1992 14:55 (33 lines)
    
    RE: .all
    
    	Thanks! I have had good responses here and offline. The consensus
    seems to be that using commercially available CDs (as opposed to
    special purpose single track disks) the likelihood of getting two the
    same is not appreciable, and in any case the consequences of a clash
    are not catastrophic (embarrassing, yes: "Here M'lady is a multimedia 
    presentation on the life of Beethoven incorporating clips from his
    musical works, like this one.......just a mo' - that's ZZ Top...oooops).
    
     So, I think I'll go with my plan to use the TOC as a footprint.
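
    A minimal sketch of what a TOC footprint lookup could look like. It
    assumes the TOC has already been read from the drive (not shown) as a
    list of (start frame, length in frames) pairs, one per track. All
    names and the sample frame numbers are hypothetical.

```python
import hashlib
import struct


def toc_fingerprint(toc):
    """Hash a TOC, given as (start_frame, length_in_frames) pairs, to a hex key."""
    digest = hashlib.sha1()
    # Include the track count so TOCs of different sizes cannot be confused.
    digest.update(struct.pack(">I", len(toc)))
    for start_frame, length in toc:
        digest.update(struct.pack(">QQ", start_frame, length))
    return digest.hexdigest()


catalogue = {}  # fingerprint -> description typed in by the user


def register_disc(toc, description):
    """Remember the description for the disc with this TOC."""
    catalogue[toc_fingerprint(toc)] = description


def identify_disc(toc):
    """Return the stored description, or None if the TOC is unknown."""
    return catalogue.get(toc_fingerprint(toc))


if __name__ == "__main__":
    sample_toc = [(0, 15000), (15150, 20000), (35300, 18000)]  # made-up values
    register_disc(sample_toc, "Example disc A")
    print(identify_disc(sample_toc))
```

    Hashing is a convenience here, not a requirement: the list of pairs
    could be stored and compared directly. A clash between two different
    discs with identical TOCs would still go undetected either way.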
    
    RE: .-1 
    
    	I looked into whether it was possible to buy a database of barcodes
    which translated disc codes to text, but the record companies which I
    contacted (RCA and Phonogram) said they had such a database on paper
    only, or that they had one which they had never thought about selling
    and wouldn't feel comfortable selling. RCA hinted darkly that they were
    "looking at this area". When I thought about the time involved in
    putting something like that together and keeping it up to date (loadsa
    record companies to deal with on a rolling basis) I backed off quite
    fast. If you can find out how your friend's system does it then I'd be
    very interested. I'll also be quite surprised if it isn't TOC based.
    Could you mail me (to fangio::alan please) if you find out? Many
    thanks!
    
    Thanks to all for your excellent responses.
    
    Regards
    Alan T.