
Conference rusure::math

Title:Mathematics at DEC
Moderator:RUSURE::EDP
Created:Mon Feb 03 1986
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2083
Total number of notes:14613

1175.0 "Help needed with formula for 'MEAN'" by EXIT26::ZIKA () Thu Jan 04 1990 10:31

    I must determine the "mean" of 324 file sizes and data access times.
    
    The average file size is 3084 blocks, and the average access times are
    80 seconds using method one and 71 seconds using method two.
    
    The median file size is 1210 blocks, with access times of 38 seconds
    using method one and 29 seconds using method two.
    
    The sum of the squares of the block sizes for all 324 is 1.31939E10
    The sum of the squares of method 1 access times for all 324 is 18,090,568
    The sum of the squares of method 2 access times for all 324 is 16,988,112
    
    Not having a probability reference book available, I'll venture a guess
    that the formula to determine a "Mean" is
    
    | [(A**2) + (B**2) + (C**2) ...] |
    | ------------------------------ | ** .5
    |             n                  |
    
    or written otherwise square_root_of((sum_of_squares_of_all_numbers)/n)
    
    Could someone please either produce the "means" or verify the above
    formula?  -- Thanks in advance --- Chris
T.R  Title  User  Personal Name  Date  Lines
1175.1  BEING::POSTPISCHIL "Always mount a scratch monkey."  Thu Jan 04 1990 10:34  (7 lines)
    Re .0:
    
    The mean of a set of numbers is the average of the numbers -- the total
    of the numbers divided by the number of numbers.
    
    
    				-- edp 
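For instance, here is how that arithmetic mean compares with the root-mean-square formula guessed in .0. The block sizes below are invented for illustration, chosen so the arithmetic mean comes out to 3084:

```python
# Contrast the arithmetic mean (.1) with the sum-of-squares
# formula proposed in .0, which is actually the root mean square.
sizes = [100, 250, 1210, 3000, 10860]   # invented file sizes in blocks

mean = sum(sizes) / len(sizes)                          # total / count
rms = (sum(x * x for x in sizes) / len(sizes)) ** 0.5   # formula from .0

print(mean)   # 3084.0
print(rms)    # larger than the mean for any non-constant data
```

The two agree only when every value is identical; for skewed data like these, the root mean square is pulled well above the arithmetic mean by the large values.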
1175.2 "more means"  PULSAR::WALLY "Wally Neilsen-Steinhardt"  Thu Jan 04 1990 12:15  (6 lines)
    .1 gives the definition of the arithmetic mean, which is almost always
    meant when the word mean is unqualified.  Note that this is the same as
    the average, in the usual use of both terms.
    
    There is also a geometric mean, the nth root of the product of n
    numbers.  There is also a harmonic mean, whose definition I forgot.
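All three classical means can be computed with Python's standard library; the sample values here are made up just to show the relationship between them:

```python
import math

xs = [2.0, 4.0, 8.0]   # invented sample

arithmetic = sum(xs) / len(xs)               # (2 + 4 + 8) / 3
geometric = math.prod(xs) ** (1 / len(xs))   # nth root of the product
harmonic = len(xs) / sum(1 / x for x in xs)  # n over the sum of reciprocals

# For positive data: arithmetic >= geometric >= harmonic.
print(arithmetic, geometric, harmonic)
```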
1175.3 "Don't be average"  VMSDEV::HALLYB "The Smart Money was on Goliath"  Thu Jan 04 1990 13:11  (19 lines)
    The harmonic mean is the sample size divided by the sum of the
    reciprocals of the data points.  But that's not important here.
    
    Note that the mean size is 3084 and the median is 1210.  This is
    typical of a skewed distribution, with lots of small files and a
    few very large files.  If you were to plot a histogram you would
    get something like:				       .
    						     ..
    						 ....
    					.........
    			  ..............
    ......................
    
    Where the Y-axis is the size and the X-axis is one point per file,
    sorted by increasing file size.  The average is probably the wrong
    number to use; I would prefer the median.  Or, perhaps the 90th
    percentile if you're looking at upper bounds.
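    Pulling the median and a nearest-rank 90th percentile out of a skewed
    sample might look like this; the sizes below are invented, many small
    and a few huge, to mimic the distribution sketched above:

```python
import math
import statistics

# Invented, heavily skewed "file sizes": many small, a few huge.
sizes = sorted([30, 40, 55, 80, 120, 200, 350, 900, 2500, 12000])

med = statistics.median(sizes)
# Nearest-rank 90th percentile: the smallest value with at least
# 90% of the data at or below it.
p90 = sizes[math.ceil(0.9 * len(sizes)) - 1]

print(med)  # 160.0
print(p90)  # 2500
```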
    
      John
1175.4 "didn't think it could get this complicated?"  PULSAR::WALLY "Wally Neilsen-Steinhardt"  Fri Jan 05 1990 15:04  (39 lines)
    re:                        <<< Note 1175.0 by EXIT26::ZIKA >>>

>    I must determine the "mean" of 324 file sizes and data access times.
    
    Based on the comments in .3, you probably need to think more about 
    *why* you must determine this.
    
    Assuming this is a typical business problem, you are evaluating two
    file access methods, and want to know which is "better".  Depending on
    circumstances, this could be a very difficult question.
    
>    The average file size is 3084 blocks, and the average access times are
>    80 seconds using method one and 71 seconds using method two.
    
    [ just a nit: it can't really be taking you 80 seconds to access a
    file, can it?  Or is this over a wide area net? ]
    
>    
>    The median file size is 1210 blocks, with access times of 38 seconds
>    using method one and 29 seconds using method two.
    
    This makes method two look better by both measures.  Just for fun, you
    could throw in the third common measure, the mode (defined as the most
    common value).  Determine the relative performance of the two methods
    for the most common file size.
    
    Then if you have not been statted out, you could divide all files into
    quartiles (the smallest one-fourth, the next smallest, the next and the
    largest one fourth).  If method two is better for all quartiles, then
    you can be pretty sure that it is the better method in general.
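    The quartile comparison might be sketched like this; the records are
    invented (and arranged so method two wins in every quartile, the easy
    case described above):

```python
# (file size in blocks, method-1 seconds, method-2 seconds) -- invented
records = [
    (120, 5, 4), (300, 9, 7), (800, 20, 15), (1210, 38, 29),
    (2000, 55, 48), (3500, 90, 80), (6000, 140, 120), (12000, 300, 260),
]
records.sort(key=lambda r: r[0])   # order by file size

q = len(records) // 4              # files per quartile
for i in range(4):
    chunk = records[i * q:(i + 1) * q]
    m1 = sum(r[1] for r in chunk) / len(chunk)
    m2 = sum(r[2] for r in chunk) / len(chunk)
    print(f"quartile {i + 1}: method one {m1:.1f}s, method two {m2:.1f}s")
```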
    
    Trouble comes if one method is better for the most common files, but
    another is better for uncommon files.  It gets really nasty if the
    uncommon files are the ones you care most about.  Then you have to back
    off and ask what does "better" mean in the global sense.  For example,
    you may want to maximize the throughput of the system.  Or you may want
    to minimize the upper bound of the response time.  Your concept of
    goodness then determines which kind of average or other statistic you 
    should use.