
Conference rusure::math

Title: Mathematics at DEC
Moderator: RUSURE::EDP
Created: Mon Feb 03 1986
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 2083
Total number of notes: 14613

1827.0. "Running Maximum" by RTL::GILBERT () Wed Dec 22 1993 18:28

    This problem is perhaps more algorithmic than mathematical, but I'll
    let it be subjected to your staggering intellects [i.e., I'll stoop
    to flattery to get a good solution].

    An algorithm that gives a 'running average' of the last (say) 1000
    values is fairly straightforward: keep a circular buffer of the
    last 1000 values and adjust the average (in O(1) time) for each
    successive number in the sequence.
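
    A minimal sketch of that running-average bookkeeping, for concreteness;
    the names (running_avg, running_avg_push) and the WINDOW constant are
    made up for illustration, and the struct should be zero-initialized
    ("running_avg ra = {0};"):

        #include <stddef.h>

        #define WINDOW 1000     /* the "last 1000 values" above */

        typedef struct {
            double buf[WINDOW]; /* circular buffer of the most recent values */
            size_t next;        /* slot that the next value will overwrite   */
            size_t count;       /* filled slots, <= WINDOW                   */
            double sum;         /* running sum of the filled slots           */
        } running_avg;

        /* Push one value and return the average of the last WINDOW values
           (or of everything seen so far while the buffer is still filling).
           O(1) per call. */
        double running_avg_push(running_avg *ra, double x)
        {
            if (ra->count == WINDOW)
                ra->sum -= ra->buf[ra->next];  /* value falling out of the window */
            else
                ra->count++;
            ra->buf[ra->next] = x;
            ra->sum += x;
            ra->next = (ra->next + 1) % WINDOW;
            return ra->sum / (double)ra->count;
        }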
    
    What is a good way to produce a 'running maximum'?  Scanning the
    previous 1000 values for each new number in the sequence is too slow.
1827.1. by WRKSYS::ARUMUGHAM () Wed Dec 22 1993 18:46 (3 lines)
    Is this a trick question? Why not just keep track of the current
    maximum, and if the new number is greater, make that the current
    maximum.
1827.2. "Throwing away old values" by WIDGET::KLEIN () Wed Dec 22 1993 22:48 (60 lines)
Strictly speaking, you need to keep the last N (1000) numbers in a list,
so that as you get a new one you can throw the oldest one away.  If that
oldest one was the maximum, then you need to "recompute" or scan for the
new maximum, which may be one of the entries that you already had or may
be the new one.

It can be shown that the worst case (a long sequence of monotonically
decreasing values) requires the full storage of the entire reference
stream of N values, since the new maximum will always be the N'th oldest
entry.

Thus, you will require storage for N entries.

Even so, there are a couple of computational optimizations that are
worth doing.

Obviously, treat the 1000-entry array as a circular buffer with a pointer
to the next entry to be filled in.  Don't copy an entry once you have added
it to the array.  Once the ring is full, adding a new entry overwrites the
oldest entry.

Another optimization is to keep a reference count for the maximum value.

When removing the oldest entry, if it is equal to the maximum value, decrement
the reference count.  When the reference count reaches zero, scan the entire
list to find the new maximum and the number of entries that are equal to
that value.  If the entry being removed is less than the existing maximum,
simply remove it and continue.

Then, when adding the new entry to the list, if it is equal to the already
existing maximum value, increment the reference count.  If it is greater than
the already existing maximum value, replace the maximum value with the
new value and set the reference count to one.  If it is less than the
already existing maximum, simply add it and continue.
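
A minimal sketch of this reference-counted scheme, assuming integer samples;
the names (running_max, running_max_push, rescan, WINDOW) are illustrative
only, and rescan() is the full O(N) pass described above.  Zero-initialize
with "running_max rm = {0};".

    #include <stddef.h>

    #define WINDOW 1000                 /* N from .0 */

    typedef struct {
        int    buf[WINDOW];             /* last WINDOW values, circular        */
        size_t next;                    /* slot that the next value overwrites */
        size_t count;                   /* filled slots, <= WINDOW             */
        int    max;                     /* current window maximum              */
        size_t max_refs;                /* window entries equal to max         */
    } running_max;

    /* Full rescan: recompute max and its reference count. */
    static void rescan(running_max *rm)
    {
        size_t i;
        rm->max = rm->buf[0];
        rm->max_refs = 0;
        for (i = 0; i < rm->count; i++) {
            if (rm->buf[i] > rm->max)      { rm->max = rm->buf[i]; rm->max_refs = 1; }
            else if (rm->buf[i] == rm->max) rm->max_refs++;
        }
    }

    /* Push one value, return the maximum of the last WINDOW values seen. */
    int running_max_push(running_max *rm, int x)
    {
        int need_rescan = 0;

        if (rm->count == WINDOW) {                 /* evict the oldest value      */
            int old = rm->buf[rm->next];
            if (old == rm->max && --rm->max_refs == 0)
                need_rescan = 1;                   /* last copy of max is leaving */
        } else {
            rm->count++;
        }

        rm->buf[rm->next] = x;                     /* overwrite the evicted slot  */
        rm->next = (rm->next + 1) % WINDOW;

        if (rm->count == 1 || x > rm->max) {       /* new value becomes the max   */
            rm->max = x;
            rm->max_refs = 1;
            need_rescan = 0;
        } else if (x == rm->max) {                 /* another copy of the max     */
            rm->max_refs++;
            need_rescan = 0;
        }

        if (need_rescan)                           /* max really went down        */
            rescan(rm);

        return rm->max;
    }

The rescan only triggers when the last copy of the maximum leaves the window
and the incoming value doesn't restore it, which is exactly the case described
above.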

This optimization wins when there are several entries with the same values
because it avoids a rescan until the maximum value *really* goes down.  There
is no rescan at all if the value goes up.

If the number of entries in the history list (N) is large relative to the
domain (the set of values each entry can take), there will be a large number
of duplicates.  Since the likelihood of duplicates goes up with the number of
entries, and the cost of a rescan also grows with the number of entries, the
benefits of this optimization increase quadratically as N increases.

Reducing storage costs is harder.

It would be possible to discard entries that have values smaller than
those in the entries to either side of them.  More formally, an entry between
two other entries that have greater values than the entry in question
(as long as the number of entries between the two bounding entries is less
than N) can be discarded, since it is guaranteed never to contribute
to the overall maximum.

So the storage problem becomes one of encoding a sparse array of N entries
into less than N units of storage.  But as the problem is defined, maintaining
accuracy in the worst case requires that the full array be stored.

-steve-
1827.3. "A shot at O(1)?" by SSAG::LARY (Laughter & hope & a sock in the eye) Thu Dec 23 1993 03:38 (77 lines)
I think you are going to be saddled with O(log N) time per step, unless
the range of the inputs is small relative to N (in which case the ref-count
stuff in .2 will apply).

One way to think of this is as an N-element tournament sort where instead of
throwing away the largest entry in every step you throw away the oldest entry.
It's not an exact analogy, because the random ranking of the numbers you throw
away causes the tournament tree to become unbalanced, but I think with the
proper choice of data structures (AVL trees or skip lists) it's still O(log N).

The only trick I can see to try to get O(1) time from this is to use the
fact that when you enter a new number X into the window, you not only throw
out the number that entered the window N steps ago but you can also throw out
every number < X in the window, since none of them will ever become the
running maximum now that X is in there.  This leads to an algorithm of the form:

    /* Disclaimer: it's late and I'm going skiing in the morning, so this code
       probably won't work, but maybe the idea behind it can be salvaged...
    */

    int array[N]; /* circular (wraparound) window ranked in descending order */
    int t_stamp[N];  /* parallel window to "array" holding time stamps */
    int winsize, offset; /* number of entries in window, index of 1st entry */
    int x, pos, time;

    winsize = 0;
    offset = 0;   /* initialize offset of window in "array" */
    time = 0;
    while (more_input_elements())
    {
	time = (time + 1) % N;	/* advance time, modulo N */

	/* Throw out the oldest entry if it is too old. The largest entry
	   in the window, i.e. the one at position "offset", is always the
	   oldest because it threw out all older smaller entries when it was
	   entered into the window.
	*/

	if (winsize > 0 && t_stamp[offset] == time)
	{
	    offset = (offset + 1) % N;
	    winsize--;
	}
	x = new_input_value();

	/* The routine bin_search(array, winsize, offset, x) returns the index
	   into "array" of the first entry in the window (i.e. the array
	   indices "offset" through "offset+winsize-1 mod N", wrapping around)
	   whose value is less than "x"; if all entries in the window
	   are >= "x" then bin_search returns the value "offset+winsize mod N".
	*/

	pos = bin_search(array, winsize, offset, x);

	/* Truncate the window so as to throw out all entries < x */

	winsize = pos - offset;
	if (winsize < 0) winsize += N;

	/* add new entry to the window, time-stamp it, and bump window size */

 	array[pos] = x;
	t_stamp[pos] = time;
	winsize++;

	/* The running maximum is the largest element in the window */

	output_running_maximum(array[offset]);
    }

Now, obviously in pathological cases (like an input stream in descending order)
this algorithm takes O(log N) running time (dominated by the binary search);
just as obviously, if the input is in ascending order it's O(1). The interesting
question is, what is its running time for RANDOM input data? Time to get out
Knuth and look at all the good stuff on algorithm analysis, except I'm going
to bed...
							Richie
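
For reference, here is one way the bin_search helper that .3 leaves undefined
might be written, assuming (as the comments above state) that the window is
kept in non-increasing order and that N is the window size; this is only a
sketch of the missing piece, not code from the original reply:

    /* Return the index into "array" of the first window entry whose value is
       less than x, scanning the window array[offset .. offset+winsize-1 mod N]
       in order; return (offset + winsize) % N if every entry is >= x.  Because
       the window is non-increasing, "value < x" is monotone along it, so a
       binary search over logical positions works. */
    static int bin_search(const int array[], int winsize, int offset, int x)
    {
        int lo = 0, hi = winsize;          /* logical positions in the window */
        while (lo < hi)
        {
            int mid = (lo + hi) / 2;
            if (array[(offset + mid) % N] < x)
                hi = mid;                  /* first entry < x is at or before mid */
            else
                lo = mid + 1;              /* entries up to mid are all >= x      */
        }
        return (offset + lo) % N;
    }

The wraparound shows up only in the index computation; the search itself runs
over logical positions 0..winsize-1.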
1827.4. "O(log log N) per step" by SSAG::LARY (Laughter & hope & a sock in the eye) Tue Jan 04 1994 22:19 (6 lines)
I ran the algorithm in .3 for various values of N; the average size of the
window is O(log N), so the algorithm averages O(log log N) time per step.
Better than I feared, though not as good as I'd hoped.

Actually, the average number of values in the window appears to be equal to
SUM(i=1 to N)(1/i), i.e. the Nth harmonic number, or about ln(N) + 0.577
(Euler's constant).
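
One illustrative way to reproduce that measurement (this driver is made up:
it keeps the candidate window as a plain descending array rather than the
circular buffer of .3, since only the window size is of interest, uses rand()
for input, and needs -lm for log()):

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N     1000        /* window size                     */
    #define STEPS 1000000     /* number of random values to feed */

    int main(void)
    {
        static double val[N];   /* surviving candidates, in non-increasing order */
        static long   age[N];   /* step at which each candidate entered          */
        int    size = 0;
        long   step;
        double total = 0.0;

        srand(12345);
        for (step = 0; step < STEPS; step++) {
            double x = (double)rand() / RAND_MAX;
            int i;

            /* drop the front candidate once it is older than N steps */
            if (size > 0 && age[0] <= step - N) {
                for (i = 1; i < size; i++) { val[i - 1] = val[i]; age[i - 1] = age[i]; }
                size--;
            }

            /* discard every candidate smaller than x, then append x */
            while (size > 0 && val[size - 1] < x)
                size--;
            val[size] = x;
            age[size] = step;
            size++;

            total += size;      /* the running maximum itself would be val[0] */
        }

        printf("average window size %.3f vs ln(N)+0.577 = %.3f\n",
               total / STEPS, log((double)N) + 0.577);
        return 0;
    }

The printout compares the measured average against ln(N) + 0.577, which for
N = 1000 is about 7.48.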
1827.5. "Finite (relatively small) number of values." by CADSYS::COOPER (Topher Cooper) Wed Jan 05 1994 14:56 (11 lines)
    Just thought that I would point out the obvious and mention that if
    there are a fixed number of possible values (M), an appropriate table
    of size M combined with the previously mentioned fixed-length queue can
    reduce the time to O(M) with a pretty small proportionality factor on
    M.  Space required would be O(M+N) (with "N" being the window size).

    There might be some clever data structure to reduce the time to O(1)
    with such a space requirement, and something like O(log(M)) seems even
    more likely.

				Topher
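
    A minimal sketch of the table-of-counts idea in .5, assuming the samples
    are integers in 0..DOMAIN-1; the names (table_max, table_max_push, WINDOW,
    DOMAIN) are illustrative, and the struct is zero-initialized with
    "table_max tm = {0};":

        #include <stddef.h>

        #define WINDOW 1000     /* N: window size                    */
        #define DOMAIN 1000     /* M: values must lie in 0..DOMAIN-1 */

        typedef struct {
            int    buf[WINDOW];     /* last WINDOW values, circular            */
            size_t count[DOMAIN];   /* how many window entries have each value */
            size_t next;            /* slot that the next value overwrites     */
            size_t filled;          /* filled slots, <= WINDOW                 */
            int    max;             /* current window maximum                  */
        } table_max;

        /* Push one value (0 <= x < DOMAIN) and return the window maximum. */
        int table_max_push(table_max *tm, int x)
        {
            if (tm->filled == WINDOW) {
                int old = tm->buf[tm->next];        /* value leaving the window */
                if (--tm->count[old] == 0 && old == tm->max) {
                    /* last copy of the maximum left: walk down the table to
                       the next value still present (O(M) in the worst case) */
                    while (tm->max > 0 && tm->count[tm->max] == 0)
                        tm->max--;
                }
            } else {
                tm->filled++;
            }

            tm->buf[tm->next] = x;
            tm->next = (tm->next + 1) % WINDOW;
            tm->count[x]++;
            if (tm->filled == 1 || x > tm->max)
                tm->max = x;

            return tm->max;
        }

    The worst case is still an O(M) walk down the table when the maximum's
    count empties out, matching the bound quoted above.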
1827.6. "Refined estimate." by CADSYS::COOPER (Topher Cooper) Wed Jan 05 1994 15:13 (10 lines)
RE: .5 (myself)

    Some brief reflection on my reasons for believing that the previous
    method was "O(M) with a pretty small proportionality factor on the M"
    led to the following refinement (conjecture, or "a result justified by
    some amount of handwaving"): the expected (not worst-case) time is
    of order M/N^2.  That means that if you can afford the space for the
    additional table and M < N^2 log log N, then it's pretty well justified.

				Topher
1827.7. "please tell us more" by RANGER::BRADLEY (Chuck Bradley) Thu Jan 06 1994 11:52 (9 lines)
What is the reason for this problem?  Where did it come from?
The running average is usually a substitute for the average, which
is not available.
Why use the running max when the max so far is easier to compute and is a
better estimate of the max of the complete set?

The math is interesting, the problem is interesting, and I suspect the
context for the problem would be even more interesting.

1827.8. by RTL::GILBERT () Thu Jan 06 1994 14:28 (18 lines)
Steve, Richie, Topher,
	Many Thanks!

Answering a few questions:

This problem comes from presenting the 'errors' in implementations of the
elementary math functions (sqrt, sin, tan, etc.).

There are many possible 'error' values (M in reply .5), but they are easily
rounded to give M=1000 or fewer distinct values.

The distribution of numbers is pseudo-random; over N=1000 consecutive points
they look as if a linear congruential random number generator produced them.
Over N=100000 points, they look more like the sum of a few such RNGs.

The easy approach is to simply output the max after every 1000 values, and
reset to look for a new one, but Richie's curious queue made posing the
question worthwhile.