T.R | Title | User | Personal Name | Date | Lines |
---|
1735.1 | | STAR::ABBASI | i am therfore i think | Fri Mar 26 1993 15:06 | 42 |
|
Some more notes to look at, in case they might be relevant:
406 TOOLS::STAN 14-DEC-1985 0 Math Notes Statistics
738 NAC::PICKETT 23-JUL-1987 1 Statistical Inference
823 SPYDER::TURNER 3-FEB-1988 1 Statistical Packages Available?
1065 BEING::POSTPISCHIL 21-APR-1989 9 Statistics Books
1145 MILKWY::JANZEN 30-OCT-1989 17 Chi-Square X^2 calculation (sta
1184 DWOVAX::YOUNG 22-JAN-1990 10 HELP! Compound Statistical Esti
1394 SMEGIT::ARNOLD 5-MAR-1991 24 Elementary (?) Statistics
1447 ATSE::GOODWIN 23-MAY-1991 23 Statistics can be (mis)used to
1481 CSSE::NEILSEN 15-AUG-1991 11 another statistical conundrum
1483 SOLVIT::DESMARAIS 22-AUG-1991 4 Statistical Data Anal in the Co
1489 MEIS::SCHAUBER 4-SEP-1991 3 Statistical Functions Lib.
1610 MINDER::WIGLEYA 18-MAY-1992 2 Wanted: Sanity check on my rust
1625 EPIK::FINNERTY 10-JUN-1992 4 Computing the t statistic for r
1665 LARVAE::TREVENNOR_A 17-SEP-1992 4 Calculating a statistical likel
1710 MARVA2::RAK 13-JAN-1993 8 Needed: Slick Statistic Trick
1735 GOCELT::RAK 26-MAR-1993 0 Statistical Mean and Variance T
End of requested listing
|
1735.2 | Mean variance is the mean of the variances. | CADSYS::COOPER | Topher Cooper | Fri Mar 26 1993 16:39 | 63 |
| I'm going to answer your second question first, about the proper
"average" variance to use.
Brief answer: use the sum of the variances divided by n. You can
take the square root of that and get an "average" standard deviation,
if you wish.
Explanation: If you have a bunch of ordinary variables x1, x2, ..., xn
and you want to take the "average", you add them together and then
"rescale" the sum to the same units as the originals by dividing by n.
What you end up with is something that is a kind of rough stand-in
for the separate values, which we will call x*.
Now imagine that you do the same thing over and over again. Each time
you take the average. Each set of values xi is a sample. There will
be a distribution of values that each of the xi will take (perhaps
each of the xi will have the same distribution of values, perhaps
not). In other words, we are now talking about a set of *random*
variables, which we will call X1, X2, ..., Xn. The x* (the average)
produced with each sample also will have a distribution associated
with it (which can, in principle at least, be derived from the
distributions of the Xi), and is therefore "really" a random variable
X*. The mean (pretend that doesn't mean the same thing as "average")
and the variance/standard deviation are descriptors of a distribution,
the mean being an indicator of its "location" and the
variance/standard deviation being indicators of its "scale". We want
to know what the mean and the variance of X* are.
Let's add one more random variable, X+, which is the sum of the Xi's.
X* is then X+/n. What is the mean of X*? Well, the mean is a "linear
operator". That has two important consequences:
1) The mean of the sum of random variables is the sum of their
means. ( E(A+B) = E(A) + E(B) ).
2) The mean of a random variable divided by a constant is the
same as dividing the mean of the random variable by the
constant. ( E(A/c) = E(A)/c )
So the mean of X+ is the sum of the means of the Xi, and the mean of
X* (= X+/n) is the mean-of-X+ divided by n. This is what justifies
the procedure of finding the "average" mean by taking the average of
the means.
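A quick numerical check of this (a sketch in Python; numpy and the
particular sizes and seed are my own assumptions, not part of the
original note):

    import numpy as np

    rng = np.random.default_rng(0)
    # four subsamples of equal size 50 (equal sizes matter; with
    # unequal sizes you need a weighted average of the means)
    samples = [rng.normal(loc=10.0, scale=2.0, size=50) for _ in range(4)]

    pooled = np.concatenate(samples)
    mean_of_means = np.mean([s.mean() for s in samples])

    print(pooled.mean())    # mean of all 200 values
    print(mean_of_means)    # average of the four subsample means
    # the two printed values agree, up to floating-point rounding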
What about the *variance*? Here you have to be careful about which
variance is meant. The variance is *not* linear the way the mean is:
Var(A+B) = Var(A) + Var(B) holds only for independent variables, and
Var(A/c) = Var(A)/c^2, so the variance of X* itself is the average of
the variances divided by n. But the variance you actually want (the
one that describes the pooled data) *is* a mean: it is the mean of
the squared deviations from the overall mean. So, provided the
subsample means agree with the overall mean, the variance of the
pooled data is the average of the subsample variances; if they do not
agree, you must also add in the variance of the subsample means (the
"between" component).
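To make that caveat concrete, here is a small check of the
decomposition (again a Python sketch with invented data; the "law of
total variance" framing is mine, not the original note's):

    import numpy as np

    rng = np.random.default_rng(1)
    # three equal-sized subsamples, one with a shifted mean
    samples = [rng.normal(loc=mu, scale=2.0, size=50)
               for mu in (10.0, 10.0, 14.0)]
    pooled = np.concatenate(samples)

    within  = np.mean([s.var() for s in samples])   # mean of the variances
    between = np.var([s.mean() for s in samples])   # variance of the means
    print(pooled.var())      # total variance of the pooled data
    print(within + between)  # identical (ddof=0 keeps the identity exact)
    # when the subsample means agree, between ~ 0 and the mean of the
    # variances alone reproduces the pooled variance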
The standard deviation is not even that well behaved. Since it does
represent a "scale", you can multiply it by a scalar and get something
meaningful, but you should never add standard deviations to each
other; the sum doesn't mean anything. Average the variances first,
then take the square root.
Note that since the "average" is the same thing as the "mean", the
above says that you will get the same result however you divide the
sample up into equal-sized subsamples and take the averages of the
averages: you will recover the mean and the variance of the whole
thing.
I'll try to address your other problem as soon as I have a moment.
Topher
|
1735.3 | How about using SPC? | YIELD::FANG | | Tue Mar 30 1993 14:59 | 49 |
| Another way to look at your questions is via the methodology of
Statistical Process Control (SPC). As long as you're collecting
independent samples from a stable process, you can check the mean and
variance of each new sample against your past historical data. SPC
provides a nice simple graphical technique to apply the tests. I'll
attempt to describe the tests as best I can.
The mean test would be accomplished by testing each new sample mean
against a confidence interval (usually .9973) described by the process
mean +/- 3*sigma/sqrt(n), where n=50 in this case and sigma is the
estimate of the population sigma. The process mean is the mean of all
the values in your database, or the mean of your sample means (where
each sample represents 50 values). Sigma can be estimated from all
your historical data using the standard formula; i.e., there is no
need to deal with sample variances or standard deviations when you
have all the historical data from which to calculate the total sigma.
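As a sketch of that mean test in Python (numpy is assumed, and the
historical data and n=50 here are invented placeholders):

    import numpy as np

    rng = np.random.default_rng(2)
    # stand-in for the historical database (invented numbers)
    historical = rng.normal(loc=10.0, scale=2.0, size=5000)
    n = 50                                  # size of each new sample

    mu    = historical.mean()              # estimate of the process mean
    sigma = historical.std(ddof=1)         # estimate of the population sigma

    ucl = mu + 3 * sigma / np.sqrt(n)      # upper control limit
    lcl = mu - 3 * sigma / np.sqrt(n)      # lower control limit

    def mean_in_control(sample):
        """True when a new sample's mean lies inside the 3-sigma limits."""
        return lcl <= np.mean(sample) <= ucl

    print(mean_in_control(rng.normal(10.0, 2.0, n)))   # in control: True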
The variance test can be accomplished by testing the variance against
limits set by:

                      ______
       Upper Limit =  VAR(x)/(n-1) * Chi^2(alpha/2, n-1)

                      ______
       Center Line =  VAR(x)

                      ______
       Lower Limit =  VAR(x)/(n-1) * Chi^2(1-alpha/2, n-1)

where alpha is the risk of a Type I error,
      Chi^2(alpha,n) is the upper alpha percentage point of the
          chi-square distribution with n degrees of freedom, and
      ______
      VAR(x) is the mean variance.
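In code the variance limits might look like this (a sketch; scipy is
assumed, and the value of the mean variance is made up):

    from scipy.stats import chi2

    n       = 50      # sample size
    alpha   = 0.0027  # Type I risk matching the 3-sigma mean test
    var_bar = 4.2     # mean sample variance from historical data (invented)

    # Chi^2(alpha/2, n-1) above is an *upper* percentage point, while
    # scipy's ppf is the lower-tail quantile, hence 1 - alpha/2 here.
    ucl = var_bar / (n - 1) * chi2.ppf(1 - alpha / 2, n - 1)
    lcl = var_bar / (n - 1) * chi2.ppf(alpha / 2, n - 1)
    print(lcl, var_bar, ucl)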
Actually, it is more common to test the standard deviation rather than
the variance in SPC.
The SPC approach was started in the 1920s by Shewhart. There are
multiple `tests' one can apply to the mean and sigma to look for
anything unusual in the sample, all with the intent of keeping the
alpha and beta risks of the tests as small as possible.
Hope this is of some small help.
Peter
|
1735.4 | Question on SPC rejection region | MARVA1::RAK | | Thu Apr 01 1993 10:25 | 7 |
|
Thanks for the information on SPC. It will help us. From your
explanation of the mean test, I wonder why the rejection region
is so small (.0027) compared to the conventional .05 in
textbook hypothesis testing.
Lam
|
1735.5 | False rejections more costly. | CADSYS::COOPER | Topher Cooper | Thu Apr 01 1993 18:09 | 39 |
| RE: .4
Different purposes. The number concerned represents the rate of
"false positives" you are willing to tolerate.
In conventional hypothesis testing the false positive is a false
rejection of the null hypothesis. That is, you will tentatively
conclude that something "interesting" is going on (tentatively because
if it is important enough there will presumably be at least one
replication, and if it is less important it will still probably lead
to later contradictions that will cause it to be challenged; only if
it is completely unimportant will it not be re-examined).
The cost of a false positive is relatively low -- when it is not,
stricter criteria are used (e.g., the second standard of .01). A high
alpha means fewer false negatives -- you are less likely to miss
something interesting.
In SPC if you get a "positive" then you throw away the batch, shut
down the line for inspection, send everyone to the bomb shelters, order
additional, costly tests or whatever. A false positive is expensive
and you cannot tolerate as high a level as you can in research.
Therefore you set it low.
I'm not sure just where that particular standard value came from. It's
not clear that a standard value is appropriate at all. In research the
costs and the rewards are quite intangible. It is, therefore, hard to
do detailed cost/benefit analysis to determine the proper alpha. A
rule of thumb is therefore appropriate (though whether the one we have
ended up with is the one that we *should* be using is an interesting
question). In SPC, generally, the costs and benefits of various
choices are much easier to quantify and so cost/benefit analysis would
seem to be the way to go.
(This is probably argued in the SPC literature, of which I am almost
totally ignorant.)
Topher
|
1735.6 | And... | YIELD::FANG | | Tue Apr 06 1993 16:30 | 19 |
| I think the previous reply (.5) summed it up quite nicely. I would only
add the following:
The .0027 is with only 1 `rule' applied. Very commonly, the Western
Electric rules and/or Nelson rules are applied which increase alpha to
a level ~.01 or greater. Examples are: 2 of 3 points beyond the 2-sigma
limit, 6 consecutive points increasing or decreasing in the same
direction. So, the alpha risk isn't really that low. The advantage of
applying more rules (weighed against the increased alpha) is the
greater `statistical power' of the test: the ability to reject Ho when
H1 is true.
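For illustration, the first of those rules might be coded like this (a
plain-Python sketch; the center line mu, sigma, and the sample points
are my assumptions):

    def two_of_three_beyond_2sigma(points, mu, sigma):
        """Western Electric rule: flag when 2 of 3 consecutive points
        fall beyond the 2-sigma limit on the same side of center."""
        for i in range(len(points) - 2):
            window = points[i:i + 3]
            high = sum(1 for p in window if p > mu + 2 * sigma)
            low  = sum(1 for p in window if p < mu - 2 * sigma)
            if high >= 2 or low >= 2:
                return True
        return False

    # e.g. two of three points above mu + 2*sigma = 14:
    print(two_of_three_beyond_2sigma([14.5, 9.8, 14.2], 10.0, 2.0))  # True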
Topher is right in that the economics of the situation should drive the
choices. There is a chapter in Montgomery's ``Introduction to
Statistical Quality Control'' (Wiley) entitled "Economic Design of
Control Charts". You can guess what this chapter is about!
Peter
|