T.R | Title | User | Personal Name | Date | Lines |
--------------------------------------------------------------------------
652.1 | variance^1/2 | MODEL::YARBROUGH | | Mon Jan 19 1987 15:59 | 12 |
| > What is the formula for the standard deviation for n samples?
It's the square root of the mean of the squares of the differences between
the observed values and the population mean.
pop.mean = (sum(1..n) x[i])/n
variance = (sum(1..n) (pop.mean-x[i])^2)/n
std. dev. = sqrt (variance)
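For example, for the eight observations 2, 4, 4, 4, 5, 5, 7, 9:
    pop.mean  = (2+4+4+4+5+5+7+9)/8 = 40/8 = 5
    variance  = (9+1+1+1+0+0+4+16)/8 = 32/8 = 4
    std. dev. = sqrt(4) = 2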
Caveat: this calculation is subject to severe rounding errors.
|
652.2 | another version | ESTORE::ROOS | | Tue Jan 20 1987 14:39 | 16 |
| Two things:
1. Concerning .1's reply: the standard deviation for the population
   has an n in the denominator, but the standard deviation for a
   sample of the population has an n-1 in the denominator.
2. Another version for S.D.:
   variance = (sum(1..n) x[i]^2 - (sum(1..n) x[i])^2 / n) / n
              (for the population)
   variance = (sum(1..n) x[i]^2 - (sum(1..n) x[i])^2 / n) / (n-1)
              (for a sample of a population)
S.D. = sqrt (variance)
|
652.3 | | CLT::GILBERT | eager like a child | Wed Jan 21 1987 01:12 | 2 |
| I seem to recall that the standard deviation in .1 *is* numerically
stable. The version in .2 can suffer from large round-off errors.
|
652.4 | | COGITO::ROTH | | Wed Jan 21 1987 09:09 | 4 |
| I also agree with .3; the version in .2 is sometimes convenient for
analysis, but round-off errors can leave a negative value under the
square root.
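As a quick illustration (my sketch, in Pascal; 'real' is assumed to
carry roughly 15-16 significant digits, and how badly things go depends
on the floating point format in use), apply both formulas to data whose
mean is large relative to its spread. The true population variance of
{offset+1, offset+2, offset+3} is 2/3, which the two-pass form of .1
recovers while the shortcut form of .2 loses to cancellation:

    program cancel;
    const
      n = 3;
      offset = 1000000000.0;        { large mean, tiny spread }
    var
      i : integer;
      x, sum, sumsq, mean, ss : real;
    begin
      sum := 0; sumsq := 0;
      for i := 1 to n do            { one pass: accumulate sums }
      begin
        x := offset + i;
        sum := sum + x;
        sumsq := sumsq + x * x
      end;
      mean := sum / n;
      writeln('shortcut variance = ', (sumsq - sum * sum / n) / n);
      ss := 0;
      for i := 1 to n do            { second pass: squared deviations }
      begin
        x := offset + i;
        ss := ss + (x - mean) * (x - mean)
      end;
      writeln('two-pass variance = ', ss / n)
    end.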
- Jim
|
652.5 | One Pass Algorithm with Two Pass Accuracy | VAXAGB::BELDIN | Dick Beldin - 'Truth will Out' | Mon May 04 1987 12:33 | 46 |
|
A common problem is to calculate the standard deviation from data
stored in a file in a single pass, using a fixed amount of memory.
The following algorithm provides accuracy equivalent to the two-pass
calculation (1st pass: mean; 2nd pass: mean squared deviation).
Let n symbolize the number of observations already seen,
x = the most recently read value from the file,
mean = the arithmetic mean of all values read so far,
sumsquares = the sum of squared deviations from the mean.
Initialize the following (real) variables (in Pascal notation):
mean := 0;
sumsquares := 0;
n := 0;
Then, for each observation, execute the following:
begin
Read_an_Observation(x);
n := n+1;
      d := ( x - mean ) / n;
      mean := mean + d;
      sumsquares := sumsquares + n * (n-1) * d * d;
end;
After the last observation is processed, calculate
Population_Variance := sumsquares / n;
Sample_Variance := sumsquares / (n-1);
and
the standard deviations are the square roots of the respective
variances.
This algorithm has been known for some twenty years. I no longer have
any references to it.
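For completeness, here is one way the recurrence might be packaged as a
full program - my sketch, not part of the original note - assuming one
observation per input line. (The recurrence is usually attributed to
B. P. Welford, 1962.)

    program runstats;
    var
      n : integer;
      x, d, mean, sumsquares : real;
    begin
      mean := 0; sumsquares := 0; n := 0;
      while not eof do
      begin
        readln(x);                          { Read_an_Observation }
        n := n + 1;
        d := (x - mean) / n;
        mean := mean + d;
        sumsquares := sumsquares + n * (n-1) * d * d
      end;
      if n > 1 then
      begin
        writeln('mean               = ', mean);
        writeln('population std dev = ', sqrt(sumsquares / n));
        writeln('sample std dev     = ', sqrt(sumsquares / (n-1)))
      end
    end.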
|
652.6 | Single-pass calculation of quantiles | SSDEVO::LARY | | Wed May 06 1987 18:54 | 12 |
| In a similar vein, there is an article in the October 1985 issue of
Communications of the ACM on a heuristic algorithm for calculating
arbitrary quantiles (a p-quantile of a distribution, 0<=p<=1, is the value
below which 100p percent of the distribution lies - the 0.5-quantile is
the median) with a single pass through the data and a very small amount
of working storage. It is claimed that this algorithm, run on a set of
samples of a distribution, approximates any quantile essentially as well
as the brute-force approach (which involves partially ordering the data,
and so takes as much memory as sorting it), provided the distribution
does not have a discontinuity near the desired quantile.
One of the authors of the paper, Raj Jain, works for Digital.
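If memory serves, the paper is R. Jain and I. Chlamtac, "The P-square
algorithm for dynamic calculation of quantiles and histograms without
storing observations". For contrast, here is a sketch of the brute-force
approach it improves on (mine, not from the paper): store everything,
sort, and index. The capacity and the index formula are assumptions;
the latter is one common convention among several.

    program quantile;
    const
      maxn = 1000;                  { assumed capacity }
    var
      a : array[0..maxn] of real;
      n, i, j, k : integer;
      p, t : real;
    begin
      p := 0.5;                     { which quantile: 0.5 = median }
      a[0] := -1.0e30;              { sentinel for the insertion sort }
      n := 0;
      while (n < maxn) and not eof do
      begin
        n := n + 1;
        readln(a[n])
      end;
      for i := 2 to n do            { insertion sort }
      begin
        t := a[i];
        j := i - 1;
        while a[j] > t do
        begin
          a[j+1] := a[j];
          j := j - 1
        end;
        a[j+1] := t
      end;
      if n > 0 then
      begin
        k := trunc(p * (n - 1)) + 1;
        writeln('p-quantile = ', a[k])
      end
    end.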
|
652.7 | What Comprises The STD DEVIATION ? | ADCSRV::RBROWN | Are there no work houses ? | Mon Jul 01 1991 11:13 | 16 |
| Picking up on the standard deviation question: we're looking at it as
a measure of "confidence" for performance data. That is, the closer the
std deviation is to 0, the more likely our performance numbers are to
be what they should be. The std deviation is overlaid on a series of
bar charts; hence, if the std deviation is low, the corresponding bar
is probably more likely to be true. A bar with a high std deviation may
mean that the bar represents a large series of peaks/valleys, perhaps a
runaway process, etc. - such a bar would be brought into question.
Question, though: what percentage of the overall data is represented by
the std deviation? We believe it to be 80%, that is, 80 percent of
the numbers will fall into the range being specified. I've checked
through several books and can't seem to find a figure for this.
Thanks !
|
652.8 | 67% is a better (but not perfect) coverage rate | PULPO::BELDIN_R | | Mon Jul 01 1991 11:57 | 20 |
| There is no simple answer.
When the distribution is approximately normal, about 2/3 of the
observations will be within one standard deviation of the mean.
Any skew, excessive flattening or peakedness will distort this figure.
You can see the impact by running the calculations with the distribution
functions given in most introductory texts on probability and statistics.
Approximate normality is common where the deviations are numerous, small,
and as likely to be high as low. If a single very large deviation
dominates the randomness, normality will typically be violated.
As long as you don't base any critical decisions on the 2/3 figure, it
is a reasonable approximation for practical work.
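If you want to check the 2/3 figure empirically, here is a quick Monte
Carlo sketch (mine, not part of the reply above). It assumes a Random
function returning a uniform real in [0,1), as many Pascal dialects -
though not ISO Pascal - provide, and uses the Box-Muller transform to
manufacture normal deviates:

    program coverage;
    const
      trials = 20000;
      pi = 3.14159265358979;
    var
      i, hits : integer;
      u1, u2, z : real;
    begin
      hits := 0;
      for i := 1 to trials do
      begin
        u1 := Random;               { assumed: uniform real in [0,1) }
        u2 := Random;
        { Box-Muller: two uniforms -> one N(0,1) deviate }
        z := sqrt(-2.0 * ln(1.0 - u1)) * cos(2.0 * pi * u2);
        if abs(z) < 1.0 then        { within one sigma of the mean }
          hits := hits + 1
      end;
      writeln('fraction within one sigma = ', hits / trials)
      { expect about 0.68 for a normal distribution }
    end.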
Dick
|
652.9 | | VMSDEV::HALLYB | The Smart Money was on Goliath | Mon Jul 01 1991 22:22 | 12 |
| You really should consider pitching some percentage of your datapoints
as outliers. In performance work especially, you get oddball timings
that represent disk read errors or spurious datacomm path outages or...
While the -frequency- of such outliers may be important, their -value-
is almost surely unreliable and will tend to distort your other points.
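A minimal sketch of the trimming idea (my illustration - the 10% trim
fraction and the data are made up, and the sample is assumed already
sorted):

    program trimmed;
    const
      n = 10;
    var
      a : array[1..n] of real;
      i, lo, hi, m : integer;
      sum, mean, ss : real;
    begin
      { a sorted sample with one wild timing at the end }
      a[1] := 1.0;  a[2] := 1.1;  a[3] := 1.2;  a[4] := 1.2;
      a[5] := 1.3;  a[6] := 1.3;  a[7] := 1.4;  a[8] := 1.5;
      a[9] := 1.6;  a[10] := 95.0;
      lo := 1 + n div 10;           { drop the lowest 10% ... }
      hi := n - n div 10;           { ... and the highest 10% }
      m := hi - lo + 1;
      sum := 0;
      for i := lo to hi do
        sum := sum + a[i];
      mean := sum / m;
      ss := 0;
      for i := lo to hi do
        ss := ss + (a[i] - mean) * (a[i] - mean);
      writeln('trimmed mean = ', mean,
              '  trimmed s.d. = ', sqrt(ss / (m - 1)))
    end.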
Also be sure of the distribution you are measuring. Interarrival times,
for example, are often exponential in nature and the standard deviation
really isn't very helpful there, if you know what I "mean".
John
|
652.10 | Is it normal? | PAKORA::PFANG | | Tue Jul 02 1991 05:31 | 16 |
| You can get an idea of how closely your data follow a normal (aka
Gaussian) distribution by plotting them on a normal probability axis. If
the points fall approximately on a straight line, you get some confidence
that the data may be normally distributed. If the line has curvature to
it, you may be dealing with a different distribution, for example the
exponential (as mentioned in the previous reply). If you get points at
the ends that don't fall on the line, you may have outliers (also
mentioned previously).
Do you really want the standard deviation, or do you want some kind of
confidence interval for your data? The standard deviation has a direct
interpretation if your data is normally distributed. But if you have
another situation (not normal and/or outliers) then there are more
`robust' measures of the variation of the data.
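Here is a rough sketch of computing the normal scores for such a plot.
The inverse normal uses approximation 26.2.23 from Abramowitz & Stegun
(absolute error below 4.5e-4); the (i - 0.5)/n plotting position is one
common convention, and the sample data are made up:

    program nplot;
    const
      n = 5;
    var
      a : array[1..n] of real;
      i : integer;
    function invnorm(q : real) : real;
    { inverse normal CDF, 0 < q < 1, via A&S 26.2.23 }
    var
      p, t, z : real;
    begin
      if q < 0.5 then p := q else p := 1.0 - q;
      t := sqrt(-2.0 * ln(p));
      z := t - (2.515517 + t * (0.802853 + t * 0.010328)) /
               (1.0 + t * (1.432788 + t * (0.189269 + t * 0.001308)));
      if q < 0.5 then invnorm := -z else invnorm := z
    end;
    begin
      { a made-up sample, assumed already sorted }
      a[1] := 9.8; a[2] := 10.1; a[3] := 10.2;
      a[4] := 10.4; a[5] := 10.9;
      for i := 1 to n do            { value, normal score }
        writeln(a[i], '  ', invnorm((i - 0.5) / n))
    end.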
Peter
|
652.12 | what am I missing? | NOVA::FINNERTY | lies, damned lies, and the CAPM | Wed Jul 27 1994 16:26 | 8 |
|
re: expected value
what does E(r) = .80 mean? Do you mean that the average outcome
over all possible outcomes is a 20% loss? Sounds unattractive,
to say the least!
|
652.13 | "30" is a private jokelet | VMSDEV::HALLYB | Fish have no concept of fire | Wed Jul 27 1994 17:31 | 15 |
| I think E(r) == 0 implies a fair game, so a loss would be negative.
You probably need to know the underlying distribution to answer the
question. If you assume a near-normal distribution then
... use a Monte Carlo simulation to compare any two strategies. With a
little bit of work you could probably whip up a neat 3-d graph showing
probability of ruin-before-doubling as a function of mean and variance.
Unless Jim ("On the other hand, maybe 30 is about right" :-) Finnerty
comes up with a closed form solution.
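In case anyone wants to try it before Jim does, here is a bare-bones
sketch of that simulation. Assumptions: normal per-round returns,
illustrative mu and sigma, ruin at 0 and doubling at 2 from a start of
1, and a Random function returning a uniform real in [0,1):

    program ruin;
    const
      trials = 2000;
      mu = 0.0;                     { mean return per round: fair game }
      sigma = 0.05;                 { std deviation per round }
      pi = 3.14159265358979;
    var
      t, ruins : integer;
      bank, u1, u2 : real;
    begin
      ruins := 0;
      for t := 1 to trials do
      begin
        bank := 1.0;
        while (bank > 0.0) and (bank < 2.0) do
        begin
          u1 := Random;
          u2 := Random;
          { Box-Muller normal deviate, scaled and shifted }
          bank := bank + mu +
                  sigma * sqrt(-2.0 * ln(1.0 - u1)) * cos(2.0 * pi * u2)
        end;
        if bank <= 0.0 then
          ruins := ruins + 1
      end;
      writeln('P(ruin before doubling) approx ', ruins / trials)
      { with mu = 0 and a start halfway, expect about 1/2 }
    end.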
John (forget the thinking, just go for 30)
|