| Generically, a B-Tree is a fairly complex data structure used to organize
ordered lists of data. A number of hub products use B-Trees internally to manage
the repeater MAC address database, the list of last addresses seen on each
repeater port. Every manageable repeater has this list for its own ports;
management agent devices which do repeater proxy management, such as the MAM,
DECagent 90, and proxying repeaters like the DECrepeater 90TS, also have this
list for all the repeaters they manage.
On any given module, entries get added to the MAC address database as they are
detected. Repeaters will immediately get these addresses from their MAC chips.
Management agents will get addresses by regular polling of the managed
repeaters, really just sampling the total stream of MAC addresses flowing
through a given port, since the polling rate is much lower than the packet rate.
Each time a module adds an entry to its database, it timestamps the entry. If
the module gets the same entry again, it updates the timestamp. Periodically,
the database is swept to check for old entries to remove, known as "aging out".
When one of these entries is found, it is deleted from the database. Thus an
address that is seen only once on a repeater port will soon age out and be
removed from the database. An address that is seen repeateadly will remain in
the database indefinitely, until that node remains quiet for a while.
B-Trees support add and remove operations. Under certain conditions in this
implementation of B-Trees, adding an entry requires more heap memory to be
allocated to the B-Tree structures. At other times, new memory does not need to
be allocated. Correspondingly, under certain conditions, when an entry is
removed, heap memory is freed from the B-Tree structures, while at other times
memory does not need to be freed. The frequency of adds and removals, and of
memory allocations and deallocations, is dependent on the turnover of MAC
addresses detected. The database holds up to 256 entries, and will not permit
new entries when full. Aging out frees up entries.
The MAC address database for a small network of continuously active nodes will
stabilize over a period of time, with the same entries remaining in the database
forever. However, if an address is not seen for too long a period (because a
node was down, or the MAC address polling of a port just happened to sample
different addresses on a port with multiple stations), an entry will be removed
occasionally. This may also happen with server nodes which periodically
advertise their presence, then stay quiet for a while. In more active networks,
or networks where nodes are bursty (send traffic for a while, then stay quiet
for a while), removals may take place more frequently. The degenerate case is a
large (specifically, much more than 256 MAC addresses), active network, where
whole batches of MAC addresses age out at the same time and new addresses
replace them rapidly, causing the database to repeatedly grow to maximum, then
deflate, then fill up again. I like to refer to this behavior as "churning the
database", because it causes so many continuous additions and removals.
As noted above, under certain conditions (which are non-determinstic, so don't
even *think* of trying to avoid it!), removal of an entry causes heap memory to
be freed. The B-Tree bug (a bug in this implementation, not in B-Trees as a data
structure) was that it would not free *all* of the memory it had allocated. Each
module has a fixed amount of heap memory that in normal use is repeatedly
allocated and freed by a number of operations; the amount of memory freed must
be equal to the amount allocated. The B-Tree code was consuming more memory than
it was releasing each time, so that over time it consumed all available heap
memory. This is a generic class of bug known as a "memory leak". The memory is
there, but all the software data structures have lost track of it, so it appears
to be gone. This would then generate the error reported in the log, when some
part of the MAM code attempted to allocate heap memory and found none available.
The fix was to free the correct amount of memory each time.
In a network with a fairly stable MAC address database, it takes a while to
consume all the heap. Entry 4 in .0 shows a timestamp of 186569200, in 100 ms
increments. This is 518 hours (21 days). Entry 3 is 608 hours (25 days).
Networks that really churn the database may causes a module to crash more
frequently. Predicting the failure time is virtually impossible, but once this
behavior is identified, it should be fairly repeatable given consistent network
traffic.
|
| Actually, a B-TREE is a "balanced multiway tree", in this case of order 16. This
means each node has up to 16 children, where a binary tree has only 2. It's
primary advantage over binary trees, and the major source of complexity, is that
it is always balanced, regardless of the order of adds or removes, with every
subtree having the same number of levels of children. This avoids the degenerate
case that binary trees are prone to where all children are down the same branch,
making the tree act in reality like a simple linear list and voiding all its
performance benefits.
Sorry, couldn't resist the pedantic explanation!
|