T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
1827.1 | | WRKSYS::ARUMUGHAM | | Wed Dec 22 1993 18:46 | 3 |
| Is this a trick question? Why not just keep track of the current
maximum and, if the new number is greater, make that the current
maximum?
|
1827.2 | Throwing away old values | WIDGET::KLEIN | | Wed Dec 22 1993 22:48 | 60 |
| Strictly speaking, you need to keep the last N (1000) numbers in a list,
so that as you get a new one you can throw the oldest one away. If that
oldest one was the maximum, then you need to "recompute" or scan for the
new maximum, which may be one of the entries that you already had or may
be the new one.
It can be shown that the worst case (a long sequence of monotonically
decreasing values) requires the full storage of the entire reference
stream of N values, since the new maximum will always be the N'th oldest
entry.
Thus, you will require storage for N entries.
Even so, there are a couple of computational optimizations that are
worth doing.
Obviously, treat the 1000-entry array as a circular buffer with a pointer
to the next entry to be filled in. Don't copy an entry once you have added
it to the array. Once the ring is full, adding a new entry overwrites the
oldest entry.
Another optimization is to keep a reference count for the maximum value.
When the oldest entry is removed, if it is equal to the maximum value, decrement
the reference count. When the reference count goes to zero, scan the entire
list to find the new maximum and the number of entries that are equal to
that value. If the value of the entry being removed is less than the
already existing maximum, simply remove it and continue.
Then, when adding the new entry to the list, if it is equal to the already
existing maximum value, increment the reference count. If it is greater than
the already existing maximum value, replace the maximum value with the
new value and set the reference count to one. If it is less than the
already existing maximum, simply add it and continue.
This optimization wins when there are several entries with the same values
because it avoids a rescan until the maximum value *really* goes down. There
is no rescan at all if the value goes up.
If the number of entries in the
history list (N) is large relative to the domain (set of values each entry
can take), then there will be a large number of duplicates. Since
the likelihood of duplicates goes up as the number of entries goes up, and the
cost of a rescan also grows with the number of entries, the benefits of this
optimization increase quadratically as N increases.
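For concreteness, here is a minimal C sketch of the circular buffer with the
reference-counted maximum described above (illustrative only; the names WINDOW,
ring, and push_value are invented for this sketch, not from the original note):

#include <limits.h>

#define WINDOW 1000                   /* N: size of the history window              */

static int ring[WINDOW];              /* circular buffer of the last WINDOW values  */
static int next = 0;                  /* index of the next slot to overwrite        */
static int count = 0;                 /* number of entries currently in the ring    */
static int cur_max = INT_MIN;         /* running maximum of the window              */
static int max_refs = 0;              /* entries in the ring equal to cur_max       */

/* Add one value; return the maximum of the last WINDOW (or fewer) values. */
int push_value(int x)
{
    int i;

    /* Evicting the oldest entry: if it carries the maximum, drop one reference. */
    if (count == WINDOW && ring[next] == cur_max)
        max_refs--;

    /* Overwrite the oldest slot (or fill a fresh one); entries are never copied. */
    ring[next] = x;
    next = (next + 1) % WINDOW;
    if (count < WINDOW)
        count++;

    /* Adding the new entry: bump the reference count or install a new maximum. */
    if (x > cur_max)
    {
        cur_max = x;
        max_refs = 1;
    }
    else if (x == cur_max)
        max_refs++;

    /* Rescan only when the last copy of the old maximum has left the window. */
    if (max_refs == 0)
    {
        cur_max = INT_MIN;
        for (i = 0; i < count; i++)
        {
            if (ring[i] > cur_max)
            {
                cur_max = ring[i];
                max_refs = 1;
            }
            else if (ring[i] == cur_max)
                max_refs++;
        }
    }
    return cur_max;
}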
Reducing storage costs is harder.
It would be possible to discard entries that have values smaller than
those in the entries to either side of them. More formally, an entry between
two other entries that have greater values than the entry in question
(as long as the number of entries between the two bounding entries is less
than N) can be discarded, since it is guaranteed never to contribute
to the overall maximum.
So the storage problem can become one of encoding a sparse array of N entries
into some storage less than N. But as the problem is defined, being able
to maintain accuracy in the worst case requires that the full array be stored.
-steve-
|
1827.3 | A shot at O(1)? | SSAG::LARY | Laughter & hope & a sock in the eye | Thu Dec 23 1993 03:38 | 77 |
| I think you are going to be saddled with O(log N) time per step, unless
the range of the inputs is small relative to N (in which case the ref count
stuff in .2 will apply).
One way to think of this is as an N-element tournament sort where instead of
throwing away the largest entry in every step you throw away the oldest entry.
It's not an exact analogy, because the random ranking of the numbers you throw
away causes the tournament tree to become unbalanced, but I think with the
proper choice of data structures (AVL trees or skip lists) it's still O(log N).
The only trick I can see to try to get O(1) time from this is to utilize the
fact that when you enter a new number X into the window, you not only throw
out the number that entered the window N steps ago but you can also throw out
every number < X in the window, since none of them will ever get to become the
running maximum now that X is in there. This leads to an algorithm of the form:
/* Disclaimer: it's late and I'm going skiing in the morning, so this code
   probably won't work, but maybe the idea behind it can be salvaged...
*/

#define N 1000              /* size of the sliding window */

int array[N];               /* circular (wraparound) window ranked in descending order */
int t_stamp[N];             /* parallel window to "array" holding time stamps */
int winsize, offset;        /* number of entries in window, index of 1st entry */
int x, pos, time;

winsize = 0;
offset = 0;                 /* initialize offset of window in "array" */
time = 0;

while (more_input_elements())
{
    time = (time + 1) % N;  /* advance time, modulo N */

    /* Throw out the oldest entry if it is too old.  The largest entry
       in the window, i.e. the one at position "offset", is always the
       oldest, because it threw out all older, smaller entries when it was
       entered into the window.
    */
    if (winsize > 0 && t_stamp[offset] == time)
    {
        offset = (offset + 1) % N;
        winsize--;
    }

    x = new_input_value();

    /* The routine bin_search(array, winsize, offset, x) returns the index
       into "array" of the first entry in the window (i.e. the array
       indices "offset" through "offset+winsize-1 mod N", wrapping around)
       whose value is less than "x"; if all entries in the window
       are >= "x" then bin_search returns the value "offset+winsize mod N".
    */
    pos = bin_search(array, winsize, offset, x);

    /* Truncate the window so as to throw out all entries < x */
    winsize = pos - offset;
    if (winsize < 0) winsize += N;

    /* Add the new entry to the window, time-stamp it, and bump the window size */
    array[pos] = x;
    t_stamp[pos] = time;
    winsize++;

    /* The running maximum is the largest element in the window */
    output_running_maximum(array[offset]);
}
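For what it's worth, here is one way the bin_search routine described in the
comment above might look (an illustrative guess at the intended helper, not part
of the original reply; it assumes the same compile-time constant N as the sketch
above):

int bin_search(int array[], int winsize, int offset, int x)
{
    int lo = 0, hi = winsize;          /* logical positions within the window       */
    while (lo < hi)
    {
        int mid = (lo + hi) / 2;
        if (array[(offset + mid) % N] < x)
            hi = mid;                  /* first entry < x is at or before "mid"     */
        else
            lo = mid + 1;              /* entries at or before "mid" are all >= x   */
    }
    return (offset + lo) % N;          /* equals "offset+winsize mod N" if none < x */
}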
Now, obviously in pathological cases (like an input stream in descending order)
this algorithm takes O(log N) time per step (dominated by the binary search);
just as obviously, if the input is in ascending order it's O(1). The interesting
question is, what is its running time for RANDOM input data? Time to get out
Knuth and look at all the good stuff on algorithm analysis, except I'm going
to bed...
Richie
|
1827.4 | O(log log N) per step | SSAG::LARY | Laughter & hope & a sock in the eye | Tue Jan 04 1994 22:19 | 6 |
| I ran the algorithm in .3 for various values of N; the average size of the
window is O(log N), so the algorithm averages O(log log N) time per step.
Better than I feared, though not as good as I'd hoped.
Actually, the average number of values in the window appears to be equal to
SUM(i=1 to N)(1/i), or ln(N)+0.57...
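That sum is the harmonic number H(N), which is about ln(N) + 0.5772... (Euler's
constant). A hand-waving way to see why it appears (my own reasoning, not from .3,
and it assumes random inputs with no ties): the value that arrived i steps ago is
still in the window exactly when it is larger than everything that arrived after
it, i.e. when it is the maximum of the last i values, which happens with
probability 1/i. Summing over the window,

    E[window size] = SUM(i=1 to N)(1/i) = H(N), or about ln(N) + 0.5772...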
|
1827.5 | Finite (relatively small) number of values. | CADSYS::COOPER | Topher Cooper | Wed Jan 05 1994 14:56 | 11 |
| Just thought that I would point out the obvious and mention that if
there are a fixed number of possible values (M), an appropriate table
of size M combined with the previously mentioned fixed length queue can
reduce the time to O(M) with a pretty small proportionality factor on
M. Space required would be O(M+N) (with "N" being the window size).
There might be some clever data structure to reduce the time to O(1)
with such a space requirement, and something like O(log(M)) seems even
more likely.
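An illustrative C sketch of that table-plus-queue idea (the names and the
assumption that the M possible values are mapped to integers 0..M-1 are mine,
not from the original reply):

#define M 1000                        /* number of distinct possible values           */
#define N 1000                        /* window size                                  */

static int queue[N];                  /* circular queue of the last N values (0..M-1) */
static int head = 0, filled = 0;      /* next slot to fill, current fill count        */
static int count_of[M];               /* how many window entries hold each value      */
static int cur_max = -1;              /* current running maximum; -1 while empty      */

/* Add one value in 0..M-1; return the maximum of the last N (or fewer) values. */
int push_value(int x)
{
    if (filled == N)                  /* window full: evict the oldest value          */
    {
        int old = queue[head];
        if (--count_of[old] == 0 && old == cur_max)
        {
            /* The last copy of the maximum just left the window: scan down the
               table for the next occupied value; this is the O(M) worst-case step. */
            while (cur_max >= 0 && count_of[cur_max] == 0)
                cur_max--;
        }
    }
    else
        filled++;

    queue[head] = x;
    head = (head + 1) % N;
    count_of[x]++;
    if (x > cur_max)
        cur_max = x;
    return cur_max;
}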
Topher
|
1827.6 | Refined estimate. | CADSYS::COOPER | Topher Cooper | Wed Jan 05 1994 15:13 | 10 |
| RE: .5 (myself)
Some brief reflection on my reasons for believing that the previous
method was "O(M) with a pretty small proportionality factor on the M"
led to the following refinement (a conjecture, or "a result justified by
some amount of handwaving"): the expected (not worst-case) time is
of order M/N². That means that if you can afford the space for the
additional table and M < N²·loglogN, then it's pretty well justified.
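For a rough sense of scale (my own numbers, purely illustrative): with, say,
M = N = 1000, the conjectured extra cost is of order M/N² = 0.001 per step, and
M = 1000 is far below N²·loglogN (a few million), so the table would be
comfortably justified if the conjecture holds.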
Topher
|
1827.7 | please tell us more | RANGER::BRADLEY | Chuck Bradley | Thu Jan 06 1994 11:52 | 9 |
| What is the reason for this problem? Where did it come from?
The running average is usually a substitute for the average of the complete
set, which is not available.
Why use the running max when the max so far is easier to compute and is a
better estimate of the max of the complete set?
The math is interesting, the problem is interesting, and I suspect the
context for the problem would be even more interesting.
|
1827.8 | | RTL::GILBERT | | Thu Jan 06 1994 14:28 | 18 |
| Steve, Richie, Topher,
Many Thanks!
Answering a few questions:
This problem comes from presenting the 'errors' in the implementation of
elementary math functions (sqrt, sin, tan, etc.).
There are many possible 'error' values (M in reply .5), but they are easily
rounded to give M=1000 or fewer distinct values.
The distribution of numbers is pseudo-random; for N=1000 consecutive points,
they look as though a linear congruential random number generator produced them.
For N=100000, it's more like the sum of a few such RNGs.
The easy approach is to simply output the max after every 1000 values, and
reset to look for a new one, but Richie's curious queue made posing the
question worthwhile.
|