T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
1665.1 | Bad news and good news. | CADSYS::COOPER | Topher Cooper | Thu Sep 17 1992 18:35 | 36 |
| I'm afraid that there just isn't enough information here. We need to
know how long an entry is on the average. We need to know how much
variation there tends to be on that length. We need to know the
pattern of lengths that are typical. We need to know the same
information for "gaps" which do not fall into any TOC entry. We need
to know the degree of tendency for the TOC entries to be ordered. We
need to know if there is any tendency to "program" a short entry after
a long one, or other such patterns. We need to know (depending on how
we model this, it may or may not be independent) the probability of
there being exactly 64 entries, the probability of there being exactly
63 entries, etc.
To illustrate the problem: what if there were a commonly used
convention -- used, say, on half the CDs produced by one company which
published 10% of the collection -- which simply divides the available
material into 64 equal length tracks? Then any calculation based on
random material is moot: roughly 5% of your material will have the
same "signature".
You also have to watch out for the birthday paradox here. There is
about one chance in 365 that two particular people have the same birthday,
but there is better than 1 chance in 2 that *some* two people out of a
group of 23 will have the same birthday (this ignores
the tendency of people to be born at certain times of year -- which
improves the chances).
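The birthday computation is easy to check numerically. A minimal sketch in Python (the function name is mine; it assumes 365 equally likely birthdays, per the simplification in the note above):

```python
# Probability that at least two people in a group of n share a birthday,
# assuming 'days' equally likely birthdays (ignores leap years and the
# seasonal clustering mentioned above, which only raises the probability).
def collision_prob(n, days=365):
    p_no_collision = 1.0
    for k in range(n):
        p_no_collision *= (days - k) / days
    return 1.0 - p_no_collision

print(round(collision_prob(23), 3))  # 23 is the smallest group where this tops 1/2
```

The same effect is why a CD collection of size S behaves like a "group" being checked against a discriminator with N values: collisions become likely far sooner than 1-in-N intuition suggests.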
The good news is that the space is large enough that my intuition says
that if there is even a moderate amount of variation, if CDs with only
a few entries are rare, and there are no conventions leading to greater
uniformity in subpopulations, and you aren't talking about really large
CD collections (i.e., you're talking about a personal collection rather
than a studio or CD company library) then you probably come in easily
under the one-in-a-million chance of failure. Of course, my intuition
could be wrong.
Topher
|
1665.2 | A formula, but caveat user... | SSAG::LARY | Laughter & hope & a sock in the eye | Thu Sep 17 1992 21:48 | 14 |
| As Topher says, the biggest problem is how to deal with systematic
nonrandomness. If you can deal with that, and if your "discriminator"
has a large number N of values all about equally likely, and you want to
make the probability of a "false discrimination" less than some small
value p, then you should limit the size of your collection to S = SQRT(2*N*p).
For example, let's say the CDs hold popular music. If it's a good assumption
that a CD holds at least 8 songs and the "seconds" part of the duration of a
song (the "ss" in "mm:ss") is more-or-less uniformly random, then you have a
discriminator with 60^8 values. If you want to limit the chance of a false
discrimination to one in a million you must limit the size of your CD
collection to about 18,000 CDs.
[a more precise relationship between S << N and p is: (1-S/(2*N))^(S-1) = 1-p]
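The rule of thumb above is easy to evaluate; a short sketch (function name mine) plugging in the worked example of 60^8 discriminator values and a one-in-a-million tolerance:

```python
import math

# Birthday-bound approximation from the note above: the collection size S
# that keeps the false-discrimination probability under p, given a
# discriminator with N roughly equally likely values.
def max_collection_size(N, p):
    return math.sqrt(2 * N * p)

N = 60 ** 8   # 8 songs, "seconds" field uniform over 0..59
p = 1e-6      # one-in-a-million tolerance
print(int(max_collection_size(N, p)))  # about 18,000 CDs, as in the note
```

Note the square-root dependence: tightening p by a factor of 100 only shrinks the allowed collection by a factor of 10.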
|
1665.3 | | AUSSIE::GARSON | | Thu Sep 17 1992 23:42 | 17 |
| re .0
Not a mathematical answer but...my mate's CD player can already do
this. When you insert a CD the display shows the name of the CD. I
asked him how it works. It turns out that he entered the names himself
which are stored in some sort of non-volatile memory. This implies to
me that the CD player can uniquely identify the CD (and then uses the
key to access its 'database'). [Also stored in the database is the
default track sequence you would like played for that CD so if you don't
like a track you just leave it out etc.]
Anyway that left the next question...how does it identify the CD? One
guess I came up with is that it can read the bar-coded CD number that
seems to be etched on the disk. Anybody know the answer?
So...why not try the hi-fi buffs' conference or ask in a specialist
hi-fi store?
|
1665.4 | A tick in the TOC solution I think! | LARVAE::TREVENNOR_A | A child of init | Fri Sep 18 1992 14:55 | 33 |
|
RE: .all
Thanks! I have had good responses here and offline. The consensus
seems to be that with commercially available CDs (as opposed to
special-purpose single-track disks) the likelihood of getting two the
same is not appreciable, and in any case the consequences of a clash
are not catastrophic (embarrassing, yes: "Here M'lady is a multimedia
presentation on the life of Beethoven incorporating clips from his
musical works, like this one.......just a mo' - that's ZZ Top...oooops).
So, I think I'll go with my plan to use the TOC as a footprint.
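One way the footprint idea can be sketched: derive a fixed-size key from the track timings in the TOC and use it to index the local database. This is purely illustrative (the scheme, hash choice, and names are mine, not from the thread):

```python
import hashlib

# Hypothetical TOC "footprint": join the per-track durations (seconds) and
# hash them into a short fixed-size key for database lookup. Two discs with
# identical track timings would collide -- the "ZZ Top" scenario above.
def toc_footprint(track_lengths_sec):
    raw = ",".join(str(t) for t in track_lengths_sec)
    return hashlib.sha1(raw.encode()).hexdigest()[:16]

disc = [214, 187, 305, 243, 199, 221, 176, 254]  # made-up timings
key = toc_footprint(disc)
```

Hashing is only a convenience for fixed-size keys; the collision odds are governed by the track-timing variation discussed in .1 and .2, not by the hash.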
RE: .-1
I looked into whether it was possible to buy a database of barcodes
which translated discs codes to text, but the record companies which I
contacted (RCA and Phonogram) said they had such a database on paper
only, or that they had one which they had never thought about selling
and wouldn't feel comfortable selling. RCA hinted darkly that they were
"looking at this area". When I thought about the time involved in
putting something like that together and keeping it up to date (loadsa
record companies to deal with on a rolling basis) I backed off quite
fast. If you can find out how your friend's system does it then I'd be
very interested. I'll also be quite surprised if it isn't TOC based.
Could you mail me (to fangio::alan please) if you find out? Many
thanks!
Thanks to all for your excellent responses.
Regards
Alan T.
|