Title: | DECmcc user notes file. Does not replace IPMT. |
Notice: | Use IPMT for problems. Newsletter location in note 6187 |
Moderator: | TAEC::BEROUD |
Created: | Mon Aug 21 1989 |
Last Modified: | Wed Jun 04 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 6497 |
Total number of notes: | 27359 |
3438.0. "Success Stories for V1.2" by CTHQ3::WOODCOCK () Mon Jul 27 1992 12:55
Hi there,
Just thought it might be helpful to start a success note to help the sellers
of MCC. I seem to be asked somewhat regularly how we get the job done so I
wrote this up. Anyone with a success story might help the cause with a reply
regardless of whether the technical description is included. At times any
support conference can get a little 'tense' because the focus is always on
what's broken rather than what works. For the record:
**************** WE'VE NEVER HAD IT SO GOOD ****************
************************* THANKS ***************************
Sincerely,
brad...
............................................................................
ESC DECmcc V1.2 Implementation Overview
Brad Woodcock
ESC Consulting
last revision 7/27/92
INTRODUCTION:
This write-up describes the DECmcc V1.2 implementation for the
Enterprise Service's Center (ESC) of Digital Equipment Corporation. It is
intended as a textual description of the implementation, for use as a
learning/marketing aid for others.
DECmcc V1.2 provides an integrated approach to managing objects using the
EMA platform. This approach has led to the product being very granular for
managing multiple objects in similar fashions. At times granularity can be
perceived as complexity; based on our success with the product, I view it as
functional richness. Granularity in the context of this product results in
multiple methods of managing objects, depending on the business needs.
This implies that the ESC solution and methods described below are not the only
available options, only one that meets specific ESC management needs. In
fact, the ESC's use of any management product is simplistic due to the size of
the overall management environment. As the ESC's overall management needs change,
so does the implementation of DECmcc.
ENVIRONMENT:
DECmcc is used within the ESC mainly in the Data Networks arena. These
networks include the direct monitoring and management of DEC's internal
network backbone (BB) structures for DECnet_IV, TCP/IP, and WATN (Wide Area
Terminal Network) within the U.S. Future needs to be evaluated include OSI
routers as the network transitions, PBXs, and direct X25 product management.
The following is a breakdown of each protocol's direct management needs from
the ESC today:
DECnet: ~70 DECnet_IV routers (DECrouter 2000s)
24 Load Hosts (MicroVAXes)
~80 DECnet circuits (BB->BB & Regional connects into BB)
TCP/IP: ~10 IP routers (Wellfleet)
~15 IP circuits (BB->BB)
WATN: ~100 WATN hosts
The DECmcc platform includes an 8810 VMS V5.4-3 multi-tasked system for the
monitoring of all networks. This system is in use for historical reasons as it
has always been used for network monitoring; the monitoring applications have
simply been extended/migrated to DECmcc. Graphical workstations are used as
display devices for managing network entities and maps from DECmcc on the 8810.
Entity registration is accomplished using a private DECdns namespace
implemented directly on the 8810 system. The private namespace is also partly
a historical decision. The ESC implementation dates back to 1990, when issues of
security, scalability, and administration prevented DECmcc's use in the
corporate namespace. These issues have since been resolved, and evaluation of
(and transition to) the corporate namespace for the ESC's use of DECmcc
will be done as time permits.
DOMAIN and MAP DESCRIPTIONS:
There are two map structures in use today; one for DECnet/IP and one for WATN.
The basic DECnet/IP structure's top level map (named .WORLDBB) contains 17
second level domains (named <site_code>-<decnet_area>). The WORLDBB map has
a backdrop of the US, Western Europe and a blow-out of New England. This was
created using AUTOcad then converted to the DECmcc format using an internally
developed tool found in the NOTED::MCC conference. The domains are
geographically placed on the map, with the lines between them drawn as NODE4
CIRCUIT child entities. The global NODE4 entities are also placed within this
domain but outside the visible area of the map (off to the side). This enables
the lines to change color with alarms and events. All IP routers are also placed
within a section of the map, with lines drawn as SNMP INTERFACE entities so that
they change color as well. Site domains and IP routers are depicted with custom
icons created in DECpaint.
Twelve of the 17 second-level domains each contain a connectivity diagram
showing all DECnet routers and load hosts managed by the ESC within the
appropriate US sites. Viewing the WORLDBB map gives the status of the entire US
DECnet and IP backbones, including all DECnet regional and international
connectivity into the US backbone.
WATN is built on the public Tymnet network, using Tymnet nodes and
Xyplex-based hosts. The top-level domain (.WATN.WATN) contains 16 domains
with names reflecting Tymnet nodes across the US. Each domain contains a data
collector icon and a series of reference entities representing logical hosts
being monitored for availability/status.
DECnet/IP MONITORING:
We currently have two methods for DECnet monitoring (events and polling). We
use DECnet events (4.7 and 4.10) only for updating the map (batch activity
would be too resource intensive for an environment of this size). DECnet events are
'sinked' from all routers to the DECmcc node. The DECmcc node is also set up
as a local sink using the MCC_DNA4_EVL process. A NOTIFY request is done at
the top level and individual TARGET commands are done for each domain for
updating lines on the map in real time.
Notify Command (worldbb only):
Notify Domain .worldbb Entity List = (node4 * circuit *),
Events = (circuit down circuit fault,circuit up)
The above command is saved and recalled by MCC at every map startup. Both
events are put into the same request so they will CORRELATE and change color
properly to reflect the status of links. The TARGET commands are used to
define the severity (color) for both events for EACH domain.
Target Commands (each domain):
EVENT SOURCE EVENT TARGET SEVERITY
-----------------------------------------------------------------------
node4 * circuit * circuit up clear
node4 * circuit * circuit down circuit fault critical
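The correlation behavior can be pictured with a small Python model. This is not DECmcc code: the `Event` type, the `line_colors` function, and the router/circuit names are hypothetical, and the model simply assumes that the most recent up or down event for a circuit determines its color, as the TARGET table implies.

```python
from dataclasses import dataclass

# Event name -> severity (map color), copied from the TARGET table above.
TARGET_SEVERITY = {
    "circuit up": "clear",                     # DECnet event 4.10
    "circuit down circuit fault": "critical",  # DECnet event 4.7
}

@dataclass
class Event:
    node: str     # hypothetical router name
    circuit: str  # hypothetical circuit name
    name: str     # DECnet Phase IV event name as reported to DECmcc

def line_colors(events):
    """Replay events in arrival order; the latest up/down event for a circuit sets its color."""
    colors = {}
    for ev in events:
        severity = TARGET_SEVERITY.get(ev.name)
        if severity is not None:  # events outside the TARGET table leave the line unchanged
            colors[(ev.node, ev.circuit)] = severity
    return colors

events = [
    Event("RTR_A", "SYN-0", "circuit down circuit fault"),
    Event("RTR_B", "SYN-1", "circuit down circuit fault"),
    Event("RTR_A", "SYN-0", "circuit up"),
]
print(line_colors(events))
# {('RTR_A', 'SYN-0'): 'clear', ('RTR_B', 'SYN-1'): 'critical'}
```

Keeping both events in one NOTIFY request is what makes this "last event wins" behavior possible; with separate requests, an up event would have nothing to clear.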
The second method for monitoring is polling alarms. Because we use events for
real-time updates to the map, polling is set to only every 30 minutes as a
'backup' and off-hours monitor. For DECnet circuits the following alarm is used
for each domain. Note that each router resides in only ONE second-level domain,
so each router is polled only once. All circuits not in use are in the
OFF state. When these rules fire they update log files, send mail, and call
DECalert (paging/voice) once per continuous link outage. The alarms also
change the map color.
DECnet ALARM rule (12, one per domain):
MCC> show domain .pko-24 rule * all char
Domain LUVBOT_NS:.pko-24 Rule poll_PKO-24
AT 6-JUL-1992 11:38:00 Characteristics
Examination of attributes shows:
Alarm Fired Procedure = DISK$MCC:[MCC.COM]CKT_DOWN.COM;1
Alarm Exception Procedure = DISK$MCC:[MCC.COM]NODE_DOWN.COM;1
Batch Queue = "mcc$batch"
Expression = (node4 * circuit * substate <>
none,at every 0:30:0)
Severity = Critical
Probable Cause = Unknown
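The two design points of this rule (wildcarding with `substate <> none`, and each router living in only one domain) can be sketched in Python. This is an illustrative model only: it assumes that both OFF circuits and fully running ON circuits report no substate, so only circuits stuck in a transitional substate fire; the function names and router/domain names are hypothetical.

```python
def circuits_firing(domain_circuits):
    """Return the (router, circuit) pairs that 'substate <> none' would report.

    domain_circuits maps (router, circuit) -> substate, with None standing in
    for 'no substate' (assumed for OFF circuits and running ON circuits).
    """
    return sorted(key for key, substate in domain_circuits.items() if substate is not None)

def routers_in_several_domains(domains):
    """domains maps domain name -> set of router names; returns routers that would be polled twice."""
    seen = {}
    for domain, routers in domains.items():
        for router in routers:
            seen.setdefault(router, []).append(domain)
    return {router: sorted(ds) for router, ds in seen.items() if len(ds) > 1}

poll = {
    ("RTR_A", "SYN-0"): None,             # running normally
    ("RTR_A", "SYN-1"): None,             # unused, set OFF
    ("RTR_B", "SYN-0"): "synchronizing",  # link trying to come up
}
print(circuits_firing(poll))  # [('RTR_B', 'SYN-0')]
```

An empty result from `routers_in_several_domains` corresponds to the "each router is polled only once" property described above.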
The ESC will also implement a rule for watching CIRCUIT DOWNS for better
management of bouncing circuits.
The following rule is used for IP circuits. Note that wildcarding cannot be
accomplished with IP because unused backup circuits are still seen with an
ifOperStatus of DOWN. Therefore an alarm for each circuit (15) is needed.
IP ALARM rule:
MCC> show domain .worldbb rule PALO_ALTO_W_5 all char
Domain LUVBOT_NS:.worldbb Rule PALO_ALTO_W_5
AT 6-JUL-1992 11:40:52 Characteristics
Examination of attributes shows:
Alarm Fired Procedure = DISK$MCC:[MCC.COM]CKT_DOWN.COM;1
Alarm Exception Procedure = DISK$MCC:[MCC.COM]NODE_DOWN.COM;1
Batch Queue = "mcc$batch"
Expression = (snmp LUVBOT_NS:.PALO_ALTO_W inter 5
ifoperstatus=down,at every 0:30:0)
Severity = Critical
Probable Cause = Unknown
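Because each IP rule differs only in router and interface number, the 15 per-circuit rules lend themselves to generation from a list of in-use interfaces. The Python sketch below only builds rule names and expression strings in the format shown above (patterned on PALO_ALTO_W_5); it does not produce DECmcc rule-creation commands, and the `in_use` table and helper names are hypothetical.

```python
NAMESPACE = "LUVBOT_NS"
POLL_INTERVAL = "0:30:0"

def ip_rule(router, if_index):
    """Rule name and expression for one interface, patterned on PALO_ALTO_W_5."""
    name = f"{router}_{if_index}"
    expression = (f"(snmp {NAMESPACE}:.{router} inter {if_index} "
                  f"ifoperstatus=down,at every {POLL_INTERVAL})")
    return name, expression

def ip_rules(in_use):
    """in_use maps router -> ifIndex values of live circuits; idle backups are simply left out."""
    return [ip_rule(router, idx)
            for router, indexes in sorted(in_use.items())
            for idx in sorted(indexes)]

for name, expr in ip_rules({"PALO_ALTO_W": [5]}):
    print(name, expr)
```

Listing only live interfaces is the explicit-list equivalent of the wildcard that cannot be used here, since idle backup interfaces would otherwise fire with ifOperStatus DOWN.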
As noted above, the command procedures used by the alarm rules are
multi-purpose. These procedures write to two different log files: one for the
current month and one for the current half-hour. Each has a time stamp built
into its filename. In many instances we are polling both ends of a DECnet
circuit because of the wildcarding used in the alarms, so the alarm procedure
checks a configuration file to ensure the circuit is reported through batch
from only one end. Also, if the same circuit entry is present in the last
half-hour log file, mail and DECalert calls are not issued. This avoids
continuous, unnecessary mail during long outages. During off-hours, a
menu-driven procedure uses these log files to give the status (current, daily
errors, monthly errors) of all the networks within seconds. This procedure can
also be 'launched' from the map application pull-down.
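The decision logic of the alarm procedure can be sketched in Python. The real procedure is a DCL command file whose contents are not given here, so the filenames, the `REPORTING_END` stand-in for the configuration file, and the action names are all hypothetical; the sketch only mirrors the two rules described: report a circuit from one end only, and skip mail/DECalert if the circuit already appears in the previous half-hour log.

```python
from datetime import datetime

# Hypothetical stand-in for the configuration file: only the designated
# reporting end of each link is listed, keyed by (router, circuit).
REPORTING_END = {
    ("RTR_A", "SYN-0"): "LINK_A_B",
}

def log_names(now):
    """Hypothetical monthly and half-hour log filenames with time stamps built in."""
    half = 0 if now.minute < 30 else 30
    return f"CKT_{now:%Y%m}.LOG", f"CKT_{now:%Y%m%d_%H}{half:02d}.LOG"

def alarm_actions(router, circuit, last_half_hour_links, config=REPORTING_END):
    """Decide what one fired alarm should do."""
    link = config.get((router, circuit))
    if link is None:
        return []  # far end of a link that its partner router already reports
    actions = ["log_monthly", "log_half_hour"]
    if link not in last_half_hour_links:
        actions += ["mail", "decalert"]  # new outage: notify once
    return actions

print(log_names(datetime(1992, 7, 27, 11, 38)))
```

Logging continues on every poll of a long outage, so the monthly file still counts each failed poll, while notification happens only once.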
WATN MONITORING:
The WATN monitor was built using data collectors. An X.25 connection into a
Tymnet node allows us to retrieve related events for this network. A program
was written which accepts these events and sends them to DECmcc using the data
collector functionality. The MCC_EVC_SINK process is run for this purpose.
A notify command is issued automatically at each map startup:
Notify Domain .watn.watn Entity List = (collector *),
Events = (any event)
This allows host status to be reflected on the map as hosts become
unavailable or available on the network; it is also used for Tymnet vendor
management. An alarm rule was also written for sending mail, calling DECalert,
and updating log files. The network status can also be determined with the same
menu-driven command file described above, providing complete integration of
network status for all networks. The alarm rule is as follows:
MCC> show mcc 0 alarms rule watn_host_status all char
MCC 0 ALARMS RULE watn_host_status
AT 24-JUL-1992 16:58:59 Characteristics
Examination of attributes shows:
Procedure = DISK$MCC:[MCC.COM]WATN_HOST_STATUS.COM;1
Queue = "mcc$batch"
Expression = (occurs(collector * any event))
Perceived Severity = Indeterminate
Probable Cause = Unknown
An additional alarm rule is used to verify the operation of the DECmcc system
itself and is meant to fire at every half-hour interval. It updates the map and
places an entry into the current half hour status log file. The
characteristics are as follows:
MCC> show domain .worldbb rule test_poll all char
Domain LUVBOT_NS:.worldbb Rule test_poll
AT 27-JUL-1992 11:07:53 Characteristics
Examination of attributes shows:
Alarm Fired Procedure = DISK$MCC:[MCC.COM]CKT_DOWN.COM;1
Alarm Exception Procedure = DISK$MCC:[MCC.COM]NODE_DOWN.COM;1
Batch Queue = "mcc$batch"
Expression = (node4 LUVBOT buffer size>500,at
every 0:30:0)
Severity = Minor
Probable Cause = Unknown
All the above alarm rules for all networks are enabled within a single batch
process. Rules are enabled within the process on a per-domain basis, with a
90-second wait between domains to give the wildcarded rules time to
execute and to spread the load. The alarm process shuts down at midnight
each night and restarts automatically.
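The staggering arithmetic can be checked with a short Python sketch. The domain list is an assumption inferred from the rules above (12 DECnet site domains, .worldbb for the IP and test_poll rules, and MCC 0 for the WATN rule), and the helper names are hypothetical; the point is that 14 domains spaced 90 seconds apart are all enabled within 13 × 90 = 1170 seconds, well inside one 30-minute poll interval.

```python
WAIT_SECONDS = 90       # the minute-and-a-half wait between domains
POLL_SECONDS = 30 * 60  # rules poll "at every 0:30:0"

def enable_offsets(domains):
    """Seconds after batch-job start at which each domain's rules are enabled."""
    return {domain: i * WAIT_SECONDS for i, domain in enumerate(domains)}

def fits_in_one_poll_interval(domains):
    """True if the last domain is enabled before the first domain's rules poll again."""
    return (len(domains) - 1) * WAIT_SECONDS < POLL_SECONDS

# Hypothetical domain list: 12 DECnet site domains plus .worldbb and MCC 0.
domains = [f".site-{i}" for i in range(12)] + [".worldbb", "MCC 0"]
print(enable_offsets(domains)["MCC 0"])  # 1170
```

Because each rule's 30-minute cycle starts when it is enabled, the offsets also persist as staggered poll times, which is what spreads the load.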
DECnet/IP METRICS:
Availability metrics are derived from the monthly log files created by the
alarms fired during a given month. A procedure determines the number of
circuits, the days in the month, and the poll rate, then searches the log file
for the number of errors. The procedure then calculates availability for routers
and circuits for both DECnet and IP.
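A minimal Python sketch of that arithmetic follows. The exact formula used by the ESC procedure is not given, so this assumes each logged error corresponds to one failed 30-minute poll of one circuit; the function names are hypothetical. For example, 80 circuits in July (31 days) give 80 × 31 × 48 = 119,040 polls.

```python
import calendar

def polls_per_day(poll_minutes=30):
    """Number of polls per circuit per day at the given poll interval."""
    return 24 * 60 // poll_minutes

def availability(year, month, circuits, errors, poll_minutes=30):
    """Fraction of circuit polls in the month that did not log an error."""
    days = calendar.monthrange(year, month)[1]  # days in the month
    total_polls = circuits * days * polls_per_day(poll_minutes)
    return 1.0 - errors / total_polls

print(round(availability(1992, 7, 80, 119), 4))  # 0.999
```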
The other major concern for metrics is circuit utilization. These metrics are
used for current performance issues and also long term trend analysis for
upgrade assessments. RECORDed information for all DECnet and IP circuits is
set up as follows:
DECnet circuit counters: hourly
DECnet circuit char.: daily
DECnet line char.: daily
IP interface counters: hourly
IP interface status: daily
These are the minimum attributes that must be recorded for DECmcc to calculate
circuit/interface statistics. The hourly statistics available through DECmcc
commands with this setup are sufficient for 99% of the current performance
issues analyzed on these circuits.
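For readers unfamiliar with how counters become utilization, the generic calculation is sketched below in Python. It is not necessarily the exact computation DECmcc performs; it assumes a per-direction byte count taken from two hourly counter samples and a line speed taken from the recorded line characteristics, and the helper names and values are hypothetical.

```python
def utilization(bytes_delta, interval_seconds, line_speed_bps):
    """Fraction of one direction's line capacity used during the interval."""
    if interval_seconds <= 0 or line_speed_bps <= 0:
        raise ValueError("interval and line speed must be positive")
    return (bytes_delta * 8) / (interval_seconds * line_speed_bps)

def counter_delta(previous, current):
    """Change in a counter between two hourly samples; None if the counter was reset."""
    return current - previous if current >= previous else None

# 14,400,000 bytes sent in one hour on a 64 kbps line:
print(utilization(14_400_000, 3600, 64_000))  # 0.5
```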
Long-term trends and upgrade assessments require a roll-up process that
reduces the data to a practical volume. Daily and monthly averages are required
for ESC needs. Over the years 7x24 data has proven ineffective, so a 5x8
approach has been taken. Focusing on 5x8 is more practical because this
timeframe is more user-sensitive (read: interactive) to performance issues.
7x24 tends to 'water down' averages and potentially cover up problems until
it's too late (e.g., the phone rings).
Procedures have been developed which automate this process, requiring as
little interaction as possible. A procedure runs by itself each night, producing
overall utilization and congestion figures for each circuit for the 'working'
hours (M-F). A single entry is made into a file for each circuit each night. A
monthly process averages these numbers into a single entry for the entire month
in a separate file. The result of this method is two files for each circuit
being managed: one holding an entry for each working day's averages and another
containing the monthly averages of all working days. Graphs can be created on an
individual basis for any circuit from either the daily or monthly data. However,
graph production is not automated to produce a graph each month for each
circuit, due to the scope of the management environment.
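The nightly and monthly roll-ups can be sketched in Python. The original does not say which eight hours make up the working day, so the 08:00-16:00 window is an assumption, as are the function names and sample values; the nightly step averages working-hour samples into one daily entry and the monthly step averages those daily entries.

```python
from datetime import datetime, timedelta
from statistics import mean

WORK_START, WORK_END = 8, 16  # assumed 8-hour window, 08:00-16:00

def is_working_hour(ts):
    """Monday-Friday within the assumed working window."""
    return ts.weekday() < 5 and WORK_START <= ts.hour < WORK_END

def daily_average(samples):
    """samples: (timestamp, utilization) hourly pairs for one day -> 5x8 average or None."""
    values = [u for ts, u in samples if is_working_hour(ts)]
    return mean(values) if values else None

def monthly_average(daily_averages):
    """Average of the working-day entries, skipping days with no working hours."""
    values = [d for d in daily_averages if d is not None]
    return mean(values) if values else None

# A Monday with 60% load during working hours and an idle line otherwise.
monday = datetime(1992, 7, 27)
day = [(monday + timedelta(hours=h), 0.6 if 8 <= h < 16 else 0.0) for h in range(24)]
print(daily_average(day))  # 0.6
```

Averaging the same day over all 24 hours gives roughly 0.2, which illustrates how a 7x24 figure can 'water down' a busy working day.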
Future implementations will most likely be converted to using EXPORT and Rdb
features.
3438.1. by CSOADM::ROTH (I'm getting closer to my home...) Wed Jul 29 1992 09:59
Thank you for this informative post!
Lee