| .1 gives the definition of the arithmetic mean, which is almost always
meant when the word mean is unqualified. Note that this is the same as
the average, in the usual use of both terms.
There is also a geometric mean (the nth root of the product of n
numbers) and a harmonic mean, whose definition I forgot.
|
| The harmonic mean is the sample size divided by the sum of the
reciprocals of the data points. But that's not important here.
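In case it helps to see the three side by side, here is a minimal
Python sketch of the definitions (the sizes list is invented sample
data; Python's statistics module also ships mean, geometric_mean, and
harmonic_mean built in):

    import math

    def arithmetic_mean(xs):
        # the usual "average": the sum of the values over the count
        return sum(xs) / len(xs)

    def geometric_mean(xs):
        # the nth root of the product of n numbers
        return math.prod(xs) ** (1 / len(xs))

    def harmonic_mean(xs):
        # the sample size divided by the sum of the reciprocals
        return len(xs) / sum(1 / x for x in xs)

    sizes = [100, 250, 900, 1210, 3084]   # invented file sizes, in blocks
    for f in (arithmetic_mean, geometric_mean, harmonic_mean):
        print(f.__name__, f(sizes))
    # for positive data, arithmetic >= geometric >= harmonic always holds
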
Note that the mean size is 3084 blocks and the median is 1210 blocks.
This is typical of a skewed distribution, with lots of small files and
a few very large files. If you were to plot the sizes you would get
something like:
.
..
....
.........
..............
......................
where the Y-axis is the size and the X-axis is one point per file,
sorted by increasing file size. The average is probably the wrong
number to use; I would prefer the median, or perhaps the 90th
percentile if you're looking at upper bounds.
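To make that concrete, here is a rough sketch of pulling the median
and the 90th percentile out of the same data. The file_sizes list is
invented, and this uses the simple nearest-rank percentile, which is
one convention among several:

    import math

    def percentile(xs, p):
        # nearest-rank percentile: sort, then take the ceil(p% * n)-th value
        xs = sorted(xs)
        return xs[max(0, math.ceil(p / 100 * len(xs)) - 1)]

    file_sizes = [120, 300, 450, 800, 1210, 1500, 2200, 4100, 9800, 30000]
    print("median:         ", percentile(file_sizes, 50))   # 1210
    print("90th percentile:", percentile(file_sizes, 90))   # 9800
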
John
|
| re: <<< Note 1175.0 by EXIT26::ZIKA >>>
> I must determine the "mean" of 324 file sizes and data access times.
Based on the comments in .3, you probably need to think more about
*why* you must determine this.
Assuming this is a typical business problem, you are evaluating two
file access methods, and want to know which is "better". Depending on
circumstances, this could be a very difficult question.
> The Average file size is 3084 blocks and the average access times are
> 80 seconds using method one and 71 seconds using method 2.
[ just a nit: it can't really be taking you 80 seconds to access a
file, can it? Or is this over a wide area net? ]
>
> The Median file size is 1210 blocks with access times of 38 method one
> and 29 seconds using method 2.
This makes method two look better by both measures. Just for fun, you
could throw in the third common measure, the mode (defined as the most
common value). Determine the relative performance of the two methods
for the most common file size.
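A sketch of that check, with an invented (size, method-1 time,
method-2 time) record per file standing in for the real measurements:

    from collections import Counter

    # invented per-file records: (size in blocks, method 1 s, method 2 s)
    samples = [(512, 30, 25), (512, 28, 24), (1024, 40, 31),
               (512, 33, 26), (2048, 55, 47), (1024, 42, 35)]

    # the mode: the most common file size in the sample
    mode_size, count = Counter(s for s, _, _ in samples).most_common(1)[0]

    at_mode = [(t1, t2) for s, t1, t2 in samples if s == mode_size]
    avg1 = sum(t1 for t1, _ in at_mode) / len(at_mode)
    avg2 = sum(t2 for _, t2 in at_mode) / len(at_mode)
    print(f"mode: {mode_size} blocks ({count} files)")
    print(f"at the mode, method 1 averages {avg1:.1f}s, method 2 {avg2:.1f}s")
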
Then if you have not been statted out, you could divide all files into
quartiles (the smallest one-fourth, the next smallest, the next, and
the largest one-fourth). If method two is better for all quartiles, then
you can be pretty sure that it is the better method in general.
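One way to sketch that quartile breakdown, again with a handful of
invented records where the real run would have all 324:

    # invented (size in blocks, method 1 seconds, method 2 seconds) records
    samples = [(120, 12, 10), (300, 15, 11), (450, 20, 14), (800, 25, 19),
               (1210, 38, 29), (1500, 41, 33), (4100, 60, 52), (30000, 80, 71)]
    samples.sort(key=lambda r: r[0])              # order by file size

    n = len(samples)
    for q in range(4):
        chunk = samples[q * n // 4 : (q + 1) * n // 4]
        avg1 = sum(r[1] for r in chunk) / len(chunk)
        avg2 = sum(r[2] for r in chunk) / len(chunk)
        print(f"quartile {q + 1}: method 1 {avg1:.1f}s, method 2 {avg2:.1f}s")
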
Trouble comes if one method is better for the most common files, but
another is better for uncommon files. It gets really nasty if the
uncommon files are the ones you care most about. Then you have to back
off and ask what "better" means in the global sense. For example,
you may want to maximize the throughput of the system. Or you may want
to minimize the upper bound of the response time. Your concept of
goodness then determines which kind of average or other statistic you
should use.
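For instance, those two notions of goodness pick different numbers out
of the very same measurements. A sketch, with the per-file times
invented as before (the totals stand in for throughput, the worst case
for the response-time bound):

    method1_times = [12, 15, 20, 25, 38, 41, 60, 80]   # invented, seconds
    method2_times = [10, 11, 14, 19, 29, 33, 52, 71]

    # maximizing throughput: minimize the total time over the whole workload
    print("total time:", sum(method1_times), "vs", sum(method2_times))

    # bounding response time: minimize the worst (or, say, 95th percentile) case
    print("worst case:", max(method1_times), "vs", max(method2_times))
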
|