T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
1665.1 | Bad news and good news. | CADSYS::COOPER | Topher Cooper | Thu Sep 17 1992 18:35 | 36 |
| I'm afraid that there just isn't enough information here. We need to
know how long an entry is on the average. We need to know how much
variation there tends to be on that length. We need to know the
pattern of lengths that are typical. We need to know the same
information for "gaps" which do not fall into any TOC entry. We need
to know the degree of tendency for the TOC entries to be ordered. We
need to know if there is any tendency to "program" a short entry after
a long one, or other such patterns. We need to know (depending on how
we model this, it may or may not be independent) the probability of
there being exactly 64 entries, the probability of there being exactly
63 entries, etc.
To illustrate the problem: what if there were a commonly used
convention -- used, say, on half the CDs produced by one company which
published 10% of the collection -- which simply divides the available
material into 64 equal length tracks? Then any calculation based on
random material is moot: roughly 5% of your material will have the
same "signature".
You also have to watch out for the birthday paradox here. There is
about one chance in 365 that two particular people have the same birthday,
but there is better than 1 chance in 2 that *some* two people out of a
group of 23 will have the same birthday (this ignores
the tendency of people to be born at certain times of year -- which
improves the chances).
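The birthday computation is easy to check numerically. A minimal sketch in Python (the function name is mine; it assumes 365 equally likely birthdays, per the simplification in the note above):

```python
# Probability that at least two people in a group of n share a birthday,
# assuming 'days' equally likely birthdays (ignores leap years and the
# seasonal clustering mentioned above, which only raises the probability).
def collision_prob(n, days=365):
    p_no_collision = 1.0
    for k in range(n):
        p_no_collision *= (days - k) / days
    return 1.0 - p_no_collision

print(round(collision_prob(23), 3))  # 23 is the smallest group where this tops 1/2
```

The same effect is why a CD collection of size S behaves like a "group" being checked against a discriminator with N values: collisions become likely far sooner than 1-in-N intuition suggests.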
The good news is that the space is large enough that my intuition says
that if there is even a moderate amount of variation, if CDs with only
a few entries are rare, and there are no conventions leading to greater
uniformity in subpopulations, and you aren't talking about really large
CD collections (i.e., you're talking about a personal collection rather
than a studio or CD company library) then you probably come in easily
under the one-in-a-million chance of failure. Of course, my intuition
could be wrong.
Topher
|
1665.2 | A formula, but caveat user... | SSAG::LARY | Laughter & hope & a sock in the eye | Thu Sep 17 1992 21:48 | 14 |
| As Topher says, the biggest problem is how to deal with systematic
nonrandomness. If you can deal with that, and if your "discriminator"
has a large number N of values all about equally likely, and you want to
make the probability of a "false discrimination" less than some small
value p, then you should limit the size of your collection to S = SQRT(2*N*p).
For example, let's say the CDs hold popular music. If it's a good assumption
that a CD holds at least 8 songs and the "seconds" part of the duration of a
song (the "ss" in "mm:ss") is more-or-less uniformly random, then you have a
discriminator with 60^8 values. If you want to limit the chance of a false
discrimination to one in a million you must limit the size of your CD
collection to about 18,000 CDs.
[a more precise relationship between S << N and p is: (1-S/(2*N))^(S-1) = 1-p]
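The rule of thumb above is easy to evaluate; a short sketch (function name mine) plugging in the worked example of 60^8 discriminator values and a one-in-a-million tolerance:

```python
import math

# Birthday-bound approximation from the note above: the collection size S
# that keeps the false-discrimination probability under p, given a
# discriminator with N roughly equally likely values.
def max_collection_size(N, p):
    return math.sqrt(2 * N * p)

N = 60 ** 8   # 8 songs, "seconds" field uniform over 0..59
p = 1e-6      # one-in-a-million tolerance
print(int(max_collection_size(N, p)))  # about 18,000 CDs, as in the note
```

Note the square-root dependence: tightening p by a factor of 100 only shrinks the allowed collection by a factor of 10.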
|
1665.3 | | AUSSIE::GARSON | | Thu Sep 17 1992 23:42 | 17 |
| re .0
Not a mathematical answer but...my mate's CD player can already do
this. When you insert a CD the display shows the name of the CD. I
asked him how it works. It turns out that he entered the names himself
which are stored in some sort of non-volatile memory. This implies to
me that the CD player can uniquely identify the CD (and then uses the
key to access its 'database'). [Also stored in the database is the
default track sequence you would like played for that CD so if you don't
like a track you just leave it out etc.]
Anyway that left the next question...how does it identify the CD? One
guess I came up with is that it can read the bar-coded CD number that
seems to be etched on the disk. Anybody know the answer?
So...why not try the hi-fi buffs' conference or ask in a specialist
hi-fi store?
|
1665.4 | A tick in the TOC solution I think! | LARVAE::TREVENNOR_A | A child of init | Fri Sep 18 1992 14:55 | 33 |
|
RE: .all
Thanks! I have had good responses here and offline. The consensus
seems to be that with commercially available CDs (as opposed to
special-purpose single-track disks) the likelihood of getting two the
same is not appreciable, and in any case the consequences of a clash
are not catastrophic (embarrassing, yes: "Here M'lady is a multimedia
presentation on the life of Beethoven incorporating clips from his
musical works, like this one.......just a mo' - that's ZZ Top...oooops).
So, I think I'll go with my plan to use the TOC as a footprint.
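One way the footprint idea can be sketched: derive a fixed-size key from the track timings in the TOC and use it to index the local database. This is purely illustrative (the scheme, hash choice, and names are mine, not from the thread):

```python
import hashlib

# Hypothetical TOC "footprint": join the per-track durations (seconds) and
# hash them into a short fixed-size key for database lookup. Two discs with
# identical track timings would collide -- the "ZZ Top" scenario above.
def toc_footprint(track_lengths_sec):
    raw = ",".join(str(t) for t in track_lengths_sec)
    return hashlib.sha1(raw.encode()).hexdigest()[:16]

disc = [214, 187, 305, 243, 199, 221, 176, 254]  # made-up timings
key = toc_footprint(disc)
```

Hashing is only a convenience for fixed-size keys; the collision odds are governed by the track-timing variation discussed in .1 and .2, not by the hash.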
RE: .-1
I looked into whether it was possible to buy a database of barcodes
which translated discs codes to text, but the record companies which I
contacted (RCA and Phonogram) said they had such a database on paper
only, or that they had one which they had never thought about selling
and wouldn't feel comfortable selling. RCA hinted darkly that they were
"looking at this area". When I thought about the time involved in
putting something like that together and keeping it up to date (loadsa
record companies to deal with on a rolling basis) I backed off quite
fast. If you can find out how your friend's system does it then I'd be
very interested. I'll also be quite surprised if it isn't TOC based.
Could you mail me (to fangio::alan please) if you find out? Many
thanks!
Thanks to all for your excellent responses.
Regards
Alan T.
|