T.R | Title | User | Personal Name | Date | Lines |
---|
1068.1 | | DWOVAX::YOUNG | Sharing is what Digital does best. | Wed Apr 26 1989 07:54 | 1 |
| Because about 50% of all numbers used start with a 1.
|
1068.2 | Where's my slipstick when I need it? | NIZIAK::YARBROUGH | I PREFER PI | Wed Apr 26 1989 12:55 | 6 |
| Someone once remarked that his log table was dirtier near the front, for
essentially the same reason. Of all numbers occurring in nature, a
disproportionate number - I think it's something like ln(radix) - start
with 1. Someone want to work out the exact probability? You can get a good
approximation by measuring the length between 1 and 2 on a slide rule, if
you can still find one. - Lynn
|
1068.3 | Topher will be here soon | HERON::BUCHANAN | Andrew @vbo DTN 828-5805 | Wed Apr 26 1989 16:27 | 84 |
| Hang on. When we say "about 50% of all numbers used start with
a 1" and "something like ln(radix) start with 1. Perhaps someone can
work it out." there is a modelling step involved. What is the basic
probability of a number n being used in nature? And how should that
relate to housenumbers in Galway (Ireland?)
Let me boringly abstract.
We have some probability distribution, p(n), over the whole numbers,
{1,2,...}. We have a bunch of b functions, fb_i(n), from the whole numbers to
the naturals, where b is a base (10 normally) and i runs from 0 through b-1.
fb_i(n) is much easier to describe than to write down. Loosely,
fb_i(n) is the number of occasions the symbol i appears in the representation
in base b of the number n.
What we're interested in is the expectation, Eb_i:
Eb_i = sum(i=1..%) p(n).fb_i(n)
where % denotes infinity.
In particular, what can we say about the relative magnitudes of
the Eb_i for different i?
Clearly this is dependent on p(n). We don't know what p(n) is
exactly, but we do know:
(1) sum(i=1..%) p(n) = 1
(2) for all n, p(n) >= p(n+1)
since it is reasonable to assume that if a street containing house number n+1,
it has house number n. (Random aside, this is not always true. In Edinburgh,
Scotland, where I grew up, the house on the left was number 1, and the house
on the right was number 11. Our house number? 61, of course.)
Now it seems to me that (1) & (2) are enough to prove:
i > i' >= 1 => Eb_i =< Eb_i'
i >= 1 => Eb_0 =< Eb_i
Let me write Eb_i in a slightly non-obvious way as:
sum(c=0..%)sum(k=0..b^c-1)sum(j=0..%) p(j.b^(c+1) + i.b^c + k) for i <> 0
and the very similar:
sum(c=0..%)sum(k=0..b^c-1)sum(j=1..%) p(j.b^(c+1) + k) for i = 0
There is a 1-1 correspondence between the addends of Eb_i and Eb_i':
which maps p(j.b^(c+1) + i.b^c + k) to p(j.b^(c+1) + i'b^c + k). By the
monotonicity of p, we are proved. Comparing i and 0, the result is very
similar except that there are some extra addends corresponding to j=0, which
do not appear in Eb_i.
However, to go further, I think we need to postulate a specific
distribution for the numbers. Let's say that:
p(n) = (1-�).�^(n-1)
where � is in (0,1).
One calculation we can do now is to work out what the expected
number of symbols (digits, where b=10) is:
Eb = sum(i=0..b-1) Eb_i = sum(c=0..%)sum(k=b^c..%) p(k)
= (1/�) sum(c=0..%) �^(b^c)
which I don't think has any neat analytic form. In our case, the answer is
1 + �^9 + �^99 + �^999 + ...
So this means tht we shouldn't expect to see a neat analytic form
for all the constituent Eb_i, and indeed I don't.
Eb_0 = sum(c=0..%) (1-�^(b^c)).(1/�).�^(b^(c+1))/(1-�^(b^(c+1)))
Eb_i = sum(c=0..%) �^(i.b^c-1).(1-�^(b^c))/(1-�^(b^(c+1)))
In no case is there a hint of log(b). Anyone see a log lurking
anywhere?
Andrew.
|
1068.4 | Leading digits. | CADSYS::COOPER | Topher Cooper | Wed Apr 26 1989 17:58 | 30 |
| We're dealing with several different things here.
First the door number problem: We know that streets in the town rarely
have numbers higher than 40 on them (a given). If a typical street has
numbers 1-20, then there is one 1 for 1, ten more for the first digit
of 10-19, and one more, in addition, for twelve 1's out of the twenty
house numbers (involving 31 separate digits). After that (until we get
to 100) there is one 1 out of every 10 house numbers. If there are few
streets with over 40 numbers, than it is not unreasonable to suppose
that most streets have twenty house numbers or a few more. 50% seems
reasonable.
Next the ubiquity of leading low digits. There is a well documented
and oft observed tendency for most of the leading (most significant)
digits to be low (1,2,3) rather than high (7,8,9) in tables of figures.
This is not quite the same as "numbers in nature", since it involves
tabulations of frequently quite artificial quantities and does not
involve quantities like fundamental constants at all. Closer
examination does seem to show a rough logrithmic scale. I've seen
several theoretical explanations of this phenomena, most but not all of
which are similar. The most recent I've seen is an explanation by
Mandelbrot in terms of fractals. It would not be surprising if several
different mechanisms (e.g., rounding practices, rescaling to produce
"nice figures",the effects of overlapping, approximately uniform
distributions, and circumstances like the house numbering phenomenon)
contribute, in different tables to different extents. If you wish to
observe this phenomenon yourself pick up virtually any almanak, pick
some tables, and start counting leading digits.
Topher
|
1068.5 | Not a leading-digits problem | POOL::HALLYB | The Smart Money was on Goliath | Thu Apr 27 1989 09:49 | 24 |
| The interesting part of the problem -- that of leading digits -- has
been around for almost 100 years. I think Knuth has it worked out
somewhere in his first 3 volumes, and it IS logarithmic.
It's not clear how much THAT problem has to do with the question raised
by .0, given that house numbering more resembles the numbers you'd find
in telephone book rather than an almanac. Topher's analysis assumes
conveniently a mean "last house number" of 20, thereby assigning lots of
houses (NPI) to "1" and ignoring the 2-rich 20-29 range. The question
in .0 asked specifically about running out of 1s but not 2s, so it seems
a bit unfair to arbitrarily select 20 as the mean "last house number".
But as it turns out, the same conclusions arise even if you assume a
mean "last house number" of 30. With a mean of 30 you'll have some
streets that go to 31, fewer that go to 32 (edge to the 1s), and some
streets that only go to 27 or 28 (edge to the 1s again). And if you
select a mean "last house number" less than 30 you're certainly leaving
the 2s behind. And with a practical maximum of 40 it's hard to imagine
a mean "last house number" much larger than 30.
So of course they run out of 1s if they order in the same quantities
as the others. And they must have an absolute glut of 0s.
John
|
1068.6 | Very interesting | TRIBES::CREAN | Never play leapfrog with a unicorn | Thu Apr 27 1989 10:32 | 9 |
| Well, that was interesting. I recall hearing once of 'Zipf'
distribution, applied to a problem like this: though perhaps
this is a red herring....
Nice to learn that somebody else uses a slide rule. I still have
mine from the olden days and I find it a tremendous aid to
computation. I keep it beside my terminal and whenever somebody
grabs at my calculator I swipe at their knuckles with it.
|
1068.7 | There isn't one answer. | CADSYS::COOPER | Topher Cooper | Thu Apr 27 1989 12:42 | 30 |
| RE: .5
The problem with saying anyone has "worked the problem out" is that it
is a very complex phenomenon. People can make various assumptions
about how numbers are selected and manipulated for inclusion and they
will get rough or exact exponential laws (if they don't their
assumptions are incorrect or incomplete). It's very hard to prove that
any answer is the complete right one for all or even most tables or
books of tables. I suspect that no single answer *is* the complete
correct one for all tables. Logrithmic laws seem to pop out very
naturally in a variety of ways.
As for the house numbering problem, first off I did *not* talk about
a *mean* "last house number" but a typical last house number.
The closest formal concept is "mode". I wasn't "cheating" when I
used 20 as a number, because I was not attempting to predict that one
would be used heavily. Rather I went from an observation (that ones
disappear disportionately quickly) and posited the value of a
parameter (that the largest house number was typically around 20) that
would explain that observation (other explanations, are, of course
possible; for example, the kid's at the local high school use them to
make signs for sporting events which say "WE ARE NUMBER 1"). I did
not assume it, rather I deduced it. Of course, I checked if it was
plausible, which, with a general upper limit of 40, it seemed to me to
be. Moderate overshoot or undershoot of that parameter, weakens the
extent of the phenomenon but does not do so so much that it fails to
explain the observations (as you point out). The technical term is
that the estimate of the parameter is robust within the problem.
Topher
|
1068.8 | A statistically correct proof, I see | POOL::HALLYB | The Smart Money was on Goliath | Thu Apr 27 1989 14:52 | 20 |
| .7> Rather I went from an observation (that ones
.7> disappear disportionately quickly) and posited the value of a
.7> parameter (that the largest house number was typically around 20) that
.7> would explain that observation
Well seeing as how .4 begins with:
.4> have numbers higher than 40 on them (a given). If a typical street has
.4> numbers 1-20, then there is one 1 for 1, ten more for the first digit
it looks to me like you're starting with a desired solution and then
noticing that it fits the observation, thus deducing
.4> that most streets have twenty house numbers or a few more.
All of which is correct as far as it goes. But that alone is not
sufficient to answer the question, you also need to show the parameter
is robust. My error was failing to deduce that you did that.
John
|
1068.9 | My fault. | CADSYS::COOPER | Topher Cooper | Thu Apr 27 1989 17:08 | 10 |
| Not my day for making myself clear...
I was *not* trying to say that I was right and that you were wrong. On
the contrary, you were quite right to critisize .4. I was only
attempting to explain what I had meant to explain before, but had
failed. Obviously, in doing so, I failed to properly explain what it
was that I was explaining. I certainly hope that this isn't a case of
infinite regress :-).
Topher
|
1068.10 | Try this for insight | NIZIAK::YARBROUGH | I PREFER PI | Thu Apr 27 1989 17:28 | 31 |
| The following BASIC program counts the digits in the sequences 0..n, where
n takes on the values from 0 to 99. It shows that there is a tendency
toward the smaller digits (except for 0, since leading 0's are ignored).
The digits, their frequency, and the percentage of total are displayed.
In other words, if you start numbering things and tend to run out of
objects before you get to a power of 10, this program shows the expected
frequency of digits used.
10 for i = 0 to 9
20 bin(i) = 0
30 next i
100 for i = 0 to 99
110 for j = 0 to i
120 jj% = j
130 if jj% = 0 goto 200
140 kk% = jj% - 10%*(jj%/10%)
150 bin(kk%) = bin(kk%) + 1%
160 jj% = jj%/10%
170 goto 130
200 next j
210 next i
220 print using " #", i; for i = 1 to 9
230 print using " #", 0
310 tot = bin(0)
320 for i = 1 to 9
330 print using " #####", bin(i);
335 tot = tot + bin(i)
340 next i
350 print using " #####", bin(0)
410 print using " .####", bin(i)/tot; for i = 1 to 9
430 print using " .####", bin(0)/tot
|
1068.11 | A sample... | TRIBES::CREAN | Never play leapfrog with a unicorn | Fri Apr 28 1989 10:47 | 10 |
| As an experiment, I collected the house-numbers of all the
people in the department. They are as follows:
21 85 114 25 8 42 22 16 34 68
It is only a small sample, but the range surprises me. We do
not have many long streets in Galway, but the long ones do
have more houses in them....
|