T.R | Title | User | Personal Name | Date | Lines |
---|
1601.1 | sample branches | MOCA::BELDIN_R | All's well that ends | Wed Apr 29 1992 13:02 | 18 |
| Re: <<< Note 1601.0 by CLARID::DEVAL "0-� Carefull with that vax eug�ne �-0" >>>
One quick and dirty suggestion:
STATISTICAL ANALYTIC APPROACH:
1) Sample a few branches. (You need to be able to estimate
what fraction of the entire network you are including.)
2) Sample the traffic along those branches at various times
of day. Calculate some statistics like mean, min, max,
number of bytes per [second, minute, hour, day] depending
on just how much data you collect.
3) Extrapolate your sample statistics from 2) to the entire
network using the sampling fraction in 1).
Dick
|
1601.3 | If I understand you correctly... | CADSYS::COOPER | Topher Cooper | Wed Apr 29 1992 16:47 | 57 |
| Not absolutely sure I understand the problem so first let me try to
describe it.
You have a network of N nodes, with n links (n will be even since they
are paired, back and forth, but I don't think that that matters).
Bytes are transmitted between the N nodes over the n links following
paths which minimize the number of links that the byte needs to be
transmitted over. You want to know the number of bytes transmitted
(where a byte transmitted from machine A through machine B to machine C
is counted once). What you have is the number of bytes transmitted
through each of the n links, which means that you overcount each link.
If that is a fair summary, here is a solution.
First of all you can assume that every message is sent to an adjacent
node. That means that there is *no* overcounting and this provides
a maximum number of messages consistent with your n link-counts. That
maximum is simply the sum of your n link-counts.
The minimum is less trivial but not too hard.
Imagine a path from one machine, A, to another, B, over some number of
links,p (p is the length of the path). The path is directional and is
the minimum path from A to B. The maximum amount of traffic along that
(directional) path is the minimum number of bytes transmitted over any
of its links. Any byte transmitted from A to B will be counted (in the
maximum count) p times in the maximum count. The maximum number of
counts for the path in the maximum count is therefore, the p times
the minimum number of bytes transmitted over any of its links.
To obtain the minimum possible traffic we want to see how much over-
counting could possibly have occured. Here is the procedure:
Start with a "minimum count", C, of 0. Have a list of all N(N-1) paths.
Let B[i] be the maximum number of bytes transmitted along path i (see
above), and p[i] be the length of path i. Find that path, j, which has
a maximum value for B[j]*p[j].
Add B[j] to C. Subtract B[j] from each of the links in path j. At
least one of those links (more if there were ties for minimum traffic
link) will go to zero. Drop every path which includes those links
(including path j) from the list (you don't actually have to do this
since their maximum traffic becomes zero, but its probably better to
get them out of the way -- a table indexing paths by links might come
in handy).
Repeat until you run out of (non-zero traffic) paths. C will be your
answer.
I think that that will produce an absolute minimum, but I haven't
absolutely convinced myself. In practice it shouldn't matter: it is
certainly conservative enough so it can be taken as a practical lower
limit unless some very bizzare things are taking place in the network.
Hope that helps.
Topher
|
1601.4 | Time for a SWAG | SSAG::LARY | Laughter & hope & a sock in the eye | Wed Apr 29 1992 17:24 | 27 |
| I think that to get a tighter result than Topher's bounds in .3 you will need
more information. Some information of a heuristic nature that might help would
be:
- Stick a network monitor on each link and sample sources and destinations of
messages, which will let you construct a profile of the actual distribution of
the number of hops per message on the links
- Make assumptions about the magnitude of the traffic between any pair of
sites and use those assumption to derive the path length distribution.
The kinds of assumptions that you might be able to reasonably defend are:
* Assume the traffic between any pair of sites is proportional to the product
of the number of Digital employees at the two sites
* Assume the traffic between any pair of sites is proportional to the product
of the number of nodes at the two sites
* Any better assumption than the above based on the business activity and
interrelations of the sites - f'rinstance, information tends to flow out of
Geneva more than it flows in :-)
If you do any of the above, you wind up with a function p(L,d,n) which is the
estimated probability that traffic on link L in direction d is part of an n-hop
path; then, if the traffic on each link is T(L,d), your "best guess" at the
total amount of traffic would be SUM(T(L,d)*p(L,d,n)/n) where the sum is over
all L, d, and n.
|
1601.5 | | TRACE::GILBERT | Ownership Obligates | Fri May 01 1992 12:10 | 28 |
| .3> Find that path, j, which has a maximum value for B[j]*p[j].
I'll call this the B*p greedy algorithm. It doesn't quite work.
If something like it *will*, then I think the expression should
be B[j] * (p[j] - 1). I'll explain.
Whenever B bytes of traffic on two adjacent links are 'connected' (i.e.,
made part of a longer path) by the algorithm, the bytes 'transmitted'
decreases by B. When links form a path of length p, there are only p-1
'connections' between the links, and the bytes 'transmitted' is reduced
by B*(p-1). Thus, I think B*(p-1) should be better.
Also, the following is a counterexample to the B*p greedy algorithm (but
note that the B*(p-1) greedy algorithm minimizes the transmissions).
X ----- Y ----- Z
5 2
The B*p greedy algorithm takes:
5 bytes connecting X-Y, then
2 bytes connecting Y-Z,
for a total transmission of 5+2 = 7.
The B*(p-1) greedy algorithm takes:
2 bytes connecting X-Y-Z, then
3 bytes connecting Y-Z,
for a total transmission of 2+3 = 5.
|
1601.6 | Good point. | CADSYS::COOPER | Topher Cooper | Fri May 01 1992 13:53 | 6 |
| RE: .5
Your right. The "overcount" (p-1) is more appropriate then the "count"
(p).
Topher
|