POLYCENTER System Watchdog for OpenVMS
V2.2 ECO04 Release Notes
Order Number: AA-QBUZA-TE
February 1997
This guide describes the release information for
the POLYCENTER System Watchdog version 2.2-12 (V2.2
ECO04) software on OpenVMS VAX and OpenVMS Alpha
platforms.
Revision/Update Information: This guide supersedes the
                             release note information
                             for version 2.2
Operating System:            OpenVMS VAX and Alpha
                             versions 6.1 and 6.2
Product Version:             2.2-12
__________________________________________________________
February 1997
The information in this document is subject to change
without notice and should not be construed as a commitment
by Digital Equipment Corporation. Digital Equipment
Corporation assumes no responsibility for any errors that
may appear in this document.
The software described in this document is furnished under
a license and may be used or copied only in accordance
with the terms of such license. No responsibility is
assumed for the use or reliability of software on
equipment that is not supplied by Digital Equipment
Corporation or its affiliated companies.
Restricted Rights: Use, duplication, or disclosure by the
U.S. Government is subject to restrictions as set forth in
subparagraph (c)(1)(ii) of the Rights in Technical Data
and Computer Software clause at DFARS 252.227-7013.
Copyright © 1997 by Digital Equipment Corporation.
All Rights Reserved.
Printed in U.S.A.
The following are trademarks of Digital Equipment
Corporation: CI, DEC, DECnet, DECtalk, DECwindows,
EventCentral, HSC, LAT, MASSBUS, OpenVMS, PDP, RA,
UNIBUS, ULTRIX, VAX, VAX DOCUMENT, VAX Volume Shadowing,
VAXcluster, VAXstation, VMS, VT, VXT, and the DIGITAL logo.
This document was prepared using VAX DOCUMENT, Version
2.1.
1 Introduction
This document contains information discovered too late
for inclusion in the documentation (User's Guide and
Installation Guide) that describes this product. If a hard
copy of this document is shipped with software kits, it
supersedes any soft copy of this document on the media.
POLYCENTER System Watchdog for OpenVMS version 2.2 ECO04
runs on both of the following:
o OpenVMS Alpha V6.1 and V6.2
o OpenVMS VAX V6.1 and V6.2
This document supplements the basic descriptions of the
product found in the following documents:
o POLYCENTER System Watchdog User's Guide, AA-PSY3B-TE
o POLYCENTER System Watchdog Installation Guide,
AA-PSY2B-TE
o POLYCENTER System Watchdog Software Product Description
(SPD 41.42.04, AE-PT5XE-TE)
2 Prerequisite Software
2.1 Operating System:
OpenVMS VAX or Alpha version 6.1 or 6.2 is required for
both POLYCENTER System Watchdog Consolidator and Agent.
2.2 Other Software:
o On OpenVMS V6.1, DEC Ada runtime V6.2 must be installed
before POLYCENTER System Watchdog V2.2 ECO04.
o DECwindows Motif is required to run the windows profile
editor.
o UCX is required to access POLYCENTER System Watchdog
Agents using TCP/IP.
3 Installation Warnings
The POLYCENTER System Watchdog for OpenVMS V2.2 ECO04 kit
is a complete product kit. This means you do not need to
have installed the base POLYCENTER System Watchdog V2.2
kit prior to installing the ECO04. The license product
authorization keys remain unchanged, as well as the main
product documentation set (user and installation guides).
_____________________ Important _____________________
Because POLYCENTER System Watchdog V2.2 ECO04 was
produced using the latest DEC Ada compiler (which
helped improve the software reliability and
performance), this kit will not install on OpenVMS
V6.1 unless the new DEC Ada RTL V6.2 is already
installed.
This new runtime library is fully compatible with
OpenVMS V6.1, though it is delivered as a separate
kit.
You can check the version of the runtime currently
installed on your system using the following
command:
$ analyze/image/interactive sys$share:adartl.exe
_____________________________________________________
The Installation Guide provides complete instructions
for installing this product, but these particular items
deserve special attention:
3.1 License Management Facility Support Details
The POLYCENTER System Watchdog Consolidator Installation
Guide includes a general discussion of the License
Management Facility support included in the Consolidator.
This section explains the details of the Consolidator LMF
support that were not finalized soon enough to be included
in the Installation Guide.
3.1.1 POLY-SWDCON-USER License Type
The System Watchdog Consolidator can be run with either of
two licenses, POLY-SWDCON or POLY-SWDCON-USER. The POLY-
SWDCON license allows polling of System Watchdog Agents
with no fixed limit. The POLY-SWDCON-USER license allows
polling of a fixed set of named client nodes determined by
the conditions of the license.
When using the POLY-SWDCON-USER license, the System
Watchdog Consolidator is licensed based on the number
of named System Watchdog Agent client nodes that the
Consolidator can poll. Each client that is listed in the
profile requires 100 active license units. The licensing
terms and conditions listed in the System Watchdog
Consolidator Software Product Description (SPD) specify
that each license unit must be assigned to a specific
named client node. The System Watchdog Consolidator,
however, neither maintains nor checks for a long-term,
one-to-one relationship between licenses and client nodes.
Instead, during Consolidator startup or reconfiguration,
client nodes are granted licenses in alphabetical order
until the license unit limits are exceeded. If aggregate
license limits are exceeded, a message for each node that
has been ignored will appear at the command line and in
the log file. Additional warning messages may appear for
queues, processes, and other node objects that refer to
nodes that have been ignored. Because of this current lack
of client name to licensed name enforcement, the failed
clients may not be the same as the clients that were most
recently added to the profile.
If the System Watchdog Consolidator fails to start at all,
check that a license named POLY-SWDCON or POLY-SWDCON-USER
with sufficient license units is registered and loaded on
the system.
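As a quick check for the situation described above, the
standard OpenVMS license commands can show whether either
license is known to the system. This is only a sketch: it
assumes the product names are exactly as written in this
section and that the licenses are held in the default
license database.

```dcl
$ ! Show the licenses currently loaded (active) on this node
$ show license POLY-SWDCON
$ show license POLY-SWDCON-USER
$ ! List what is registered in the license database, with unit counts
$ license list/full POLY-SWDCON
$ license list/full POLY-SWDCON-USER
```

With the POLY-SWDCON-USER license, the number of active
units divided by 100 gives the number of client nodes that
can be polled, following the rule stated in this section.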
3.1.2 License State Verification
If you find that you are not seeing events from a System
Watchdog Agent that you had included in your profile,
first check to see whether that Agent is listed in a
SENSE WATCHDOGS SHOW CONSOLIDATOR /FULL command with the
status 'Disabled'. If the Agent does not appear at all
in the watchdog information list, use the SENSE WATCHDOGS
RECONFIGURE command to see if a recheck of the license
status of the client Agent will report that the client has
been ignored.
3.1.3 License Enforcement
Because of limitations in the current license management
facility (LMF), the enforcement of the licensing by
the POLYCENTER System Watchdog Consolidator when using
the POLY-SWDCON-USER license may be more lenient than
the explicit rights granted by the System Watchdog
Consolidator Software Product Description. This present
temporary leniency does not extend any additional rights
beyond those stated in the SPD. Digital may increase the
level of enforcement in the System Watchdog Consolidator
code in a future release as new versions of the license
management facility become available and make it possible
to enforce matching of client names to licenses.
4 Enhancements to POLYCENTER System Watchdog V1.0
The following enhancements have been made to System Watchdog:
o The entire product was ported to OpenVMS Alpha.
o The RRD43 and RRD44 (CD-ROM readers) no longer generate
the software write locked message.
o New StorageWorks HSJ controllers are supported.
o A new logical name, SNS$TIME_DIFFERENCE_DELTA, can be
defined to customize the time interval difference
between two nodes. The default value is "00:05:00".
o Wildcarding is now supported for BATCHJOB and PROCESS
MISSING attributes: POLYCENTER System Watchdog for
OpenVMS can monitor the existence (report the
absence) of a process or batch job whose attributes
match user-specified patterns (process name and UIC for
the PROCESS MISSING event; batch job name, queue name
and username for the BATCHJOB MISSING event).
o Logical names within the profile, error and output file
specifications on the START CONSOLIDATOR command are now
translated at Controller level, which allows the use of
process or job level logical names instead of system
level names only.
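As an illustration of the new SNS$TIME_DIFFERENCE_DELTA
logical name, the following sketch sets the value to ten
minutes. The ten-minute value and the use of the system
logical name table are assumptions; these notes state only
the default value and its format.

```dcl
$ ! Assumption: defined system-wide so that the System Watchdog processes see it
$ define/system SNS$TIME_DIFFERENCE_DELTA "00:10:00"
$ ! Confirm the translation
$ show logical/system SNS$TIME_DIFFERENCE_DELTA
```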
5 Problems Corrected by System Watchdog V2.2
________________________Note ________________________
For runtime identification, the version number is
V2.2-02.
_____________________________________________________
1. In the Consolidator interface with DECtalk: in some cases,
after a failed attempt to output a message with
DECtalk, the Consolidator would hang. The DECtalk
initialization and termination sequences have been corrected
to handle those exceptional cases properly.
2. HSC problem event now correctly reported, even if the
HSC is booted after System Watchdog startup.
3. Disabled memory pages count reported now consistent
with SHOW MEMORY.
4. Actions were triggered even if the concerned event was
set to 'Not checked'. Now fixed.
5. Quotation marks included in object identifiers in the
profile were badly interpreted when passed as parameters
to action routines: the single P1 parameter appeared as
several separate parameters, which could overflow the
command procedure limit of eight parameters. This is now
fixed.
6. The reported time difference is now consistent with the
usual time zone rules.
7. SENSE WATCHDOG SHOW LOG now can be stopped by ^C.
8. Improvement of the error messages the Consolidator
yields when the error, output or log files creation
fails due to access rights.
9. Avoid Consolidator hang on write operations to error,
output or logfile when running out of disk space.
10.Restore original event messages order upon delivery to
action routines: Message "stacks" turned into queues.
11.User programs no longer fail when calling the
sns$shr.exe shareable image entries (sns$add_message,
sns$delete_message) using the parameter defaulting
mechanism.
12.The window sizing was incorrect because the windows
were not designed to handle all the combinations of
monitor resolution and system dots-per-inch density
settings. Now fixed.
13.Node name specification on external message handling
command line now correctly initialized.
14.Mailbox names are now case insensitive.
15.Input typed ahead by the user in the CLI profile editor
is now correctly taken into account.
16.Deleting a class or an external class from the
DECwindows editor before displaying at least one no
longer exits the editor.
17.CLI editor '@' command now parsed even if not left
justified.
18.Both profile editors now correctly handle UICs containing
dollar signs ('$').
19.The SNS$EDIT> SENSE WATCHDOG MODIFY EXTERNAL_MESSAGE
command no longer fails when it exceeds 36 characters
in length after a SHOW ALL operation.
20.The limit on the number of path blocks System Watchdog
can handle is raised to 160, which allows it to deal
with any large cluster configuration.
21.Recursive logical name definitions for HSC names no
longer cause the System Watchdog Agent to enter a
tight loop when it tries to translate them.
22.The MOTIF profile editor sometimes led to
inconsistencies when used to edit external class event
lists. As a result, the Consolidator missed the external
events whose patterns were added by means of the
MOTIF configuration editor. This is now fixed.
23.The profile data validation event (VAL) was not raised
for disks the user specified for the Disk Near Full
(DNF) event check when the provided names were not
valid disk names (e.g. when they contained non-leading
underscores). The user data validation mechanism is now
fully effective in that respect.
24.When the SNS callable interface was used to ADD or
DELETE messages, 2 channels (one for MBxxx and the
other for NETxxx) were allocated but not released if
the DECnet connection was rejected. Now all channels are
released in case of a DECnet connection problem.
25.The Controller commands to manage external messages now
correctly interpret double quotes:
SNS> ADD MESSAGE -
_SNS> "This is a new message with ""a quoted"" substring"
This results in the creation of the following external message:
This is a new message with "a quoted" substring
26.The Editor now correctly interprets double quotes on
the following command:
SNS$EDIT> ADD ACTION MAIL /MODE=SPAWN /COMMAND= -
"mail/subject=""System Watchdog [|P7] |P1"" nl: SOMEONE"
27.The effect of <NEXT SCREEN> and <PREVIOUS SCREEN> keys
on the Controller continuous events display is no
longer reversed.
6 Problems Corrected by System Watchdog V2.2 ECO01
________________________Note ________________________
For runtime identification, the version number is
V2.2-03.
_____________________________________________________
1. The OpenVMS Agent can now be polled using TCP/IP. This
was not possible with the previous version which was
limited to DECnet access only.
This ability relies on the presence of the UCX TCP/IP
services.
When the UCX layered product is installed and started,
the Agent declares itself, at its own startup time,
as TCP/IP service number 251 (tunable through the
SNS$TCPIP_SERVICE_NUMBER logical name).
This completes the TCP/IP implementation in System
Watchdog: Consolidators running on any platform
(OpenVMS, OSF/1 or ULTRIX) may use TCP/IP to connect to
Agents running on any platform (OpenVMS, OSF/1, ULTRIX,
HP-UX, SunOS, AIX, ...).
2. The TCP/IP communications error handling is enhanced.
Namely, under some conditions the Consolidator process
did not release the BGnnnn devices allocated by UCX on
its behalf upon a TCP/IP connection request which could
not complete successfully. The BGnnnn devices are now
correctly released in all cases.
3. Using the DECnet transport layer, the node UNReachable
event was triggered prematurely, even in cases where that
state was short-lived.
Now the Consolidator retries the connection a short
time later, prior to generating that event. The number
of retries (1 by default) is tunable by defining the
SNS$UNR_RETRY_NUMBER logical name so that it translates
to the string representation of the desired integer.
4. Action routines are no longer triggered for each
and every message concerning a node newly detected as
unreachable.
The messages concerning that node are now set aside
until the node is reachable again, instead of being
removed.
A guard delay for saving the messages of a temporarily
unreachable node was implemented and is tunable using the
SNS$UNR_KEEP_MESSAGES_DELAY logical name. If the node
does not become reachable again within that delay, the
related messages are removed.
The SNS$UNR_KEEP_MESSAGES_DELAY logical name should
translate to an OpenVMS time interval string
representation in the HH:MM:SS format. The default delay
is 24:00:00.
5. A new logical name, SNS$CONSOLIDATOR_BASE_PRIORITY,
can be defined to customize the base priority of newly
started Consolidators. The default priority was
raised from 2 to 3 (batch priority). For even better
performance, the Consolidator priority may be raised
to 4 (interactive process priority) with little impact
on the host system resource usage.
6. The Consolidator output and error logfiles did not
accept concealed logical names within their pathname
specifications. This limitation is removed.
7. The Consolidator logfile specification now accepts
logical names (and concealed logical names as the
pathname device part as well).
8. A new problem in System Watchdog V2.2 prevented the
reuse of existing Consolidator logfiles... This is now
fixed.
9. The following Controller problem was fixed:
The following command file did not work:
$ sense watchdog
edit profile sns$profiles:sns$profile.dat
show node gemini/all
exit
$ exit
Whereas this one worked correctly:
$ sense watchdog edit profile -
sns$profiles:sns$profile.dat
show node gemini/all
exit
$ exit
Similarly, still in a command file:
$ sense watchdog
start cons/access=world/wait/info="Bob's Consolidator"
failed, whereas this worked:
$ sense watchdog start consolidator -
/access=world/wait/info="Bob's Consolidator"
Now either method is valid.
10.A correction in Shadow Set names validation allows
their specification by logical name.
11.Host based shadow set members were incorrectly flagged
(with the DiSk State problem event code) as in mount
verification timeout state, after they were detected
in mount verification in progress state and the mount
verification finally succeeded. This is now corrected.
12.The SNS$SHR.EXE shareable image entries (external
messages management API) no longer fail when called
intensively.
13.Sample modules written in C language provided in the
SNS$EXAMPLES directory are now ported to DEC C. In
addition, the SNS$FEED_PCM.C source file for the sample
POLYCENTER Console Manager feeder was tidied up.
14.New samples have been added in the SNS$EXAMPLES
directory (process in MWAIT state event check and
sample action routines).
15.The installation procedure is now more flexible
about the transport layer configuration: among other
minor changes, new NCL commands were introduced to
better handle installation and configuration over
DECnet Phase V.
16.The "Warning: OSFNOD is a cluster alias ..." validation
message is no longer issued when polling an OSF/1
node through DECnet Phase V.
Moreover, the node name included in event messages
generated on such an Agent is now always as
specified in the profile.
17.The Consolidator polling interval was limited to
the [0..999] range, which proved too restrictive. Now, the
available range for the polling interval value is
[0..65535].
18.The DECtalk set LINE parameter now accepts logical
names. These logical names are translated
at Consolidator start or reconfiguration time.
19.The profile editor now outputs the DECtalk set phone
numbers with surrounding double quotes when the
/FORMAT=COMMAND option is in use in conjunction with
the SHOW DECTALK_SET editor command.
This avoids any disruption when reusing the resulting
DECtalk set definitions when the phone numbers
contain spaces or punctuation characters such as the
exclamation point (as happens in pager dialing numbers).
20.The Agent no longer randomly enters a deadlock state
as sometimes happened on Alpha multiprocessor systems
(such as DEC7620, DEC7720, etc.) following
some network I/O operations (this resulted in the Agent
process looping or waiting forever in LEF state).
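The logical names introduced in items 1, 3, 4 and 5 of this
section can be set as sketched below. The values are
illustrative only, and placing the definitions in the system
logical name table before the product is started is an
assumption; these notes give only the names, the defaults
and the value formats.

```dcl
$ ! Item 1: TCP/IP service number the Agent declares (251 is the default)
$ define/system SNS$TCPIP_SERVICE_NUMBER 251
$ ! Item 3: retry the DECnet connection 3 times before raising UNR (default 1)
$ define/system SNS$UNR_RETRY_NUMBER 3
$ ! Item 4: keep messages of an unreachable node for 12 hours (default 24:00:00)
$ define/system SNS$UNR_KEEP_MESSAGES_DELAY "12:00:00"
$ ! Item 5: start new Consolidators at base priority 4 (default 3)
$ define/system SNS$CONSOLIDATOR_BASE_PRIORITY 4
```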
7 Problems Corrected by System Watchdog V2.2 ECO02
________________________Note ________________________
For runtime identification, the version number is
V2.2-05.
_____________________________________________________
1. The TCP/IP connection timeout events, which used to be
reported under the UNReachable node event code, are now
logged under the OTH event code in order to ease their
identification, especially at action routine level.
2. Security violations when accessing the Agent via TCP/IP
are now checked correctly.
3. The extra vertical bar "|" trailing the action routine
8th parameter was removed.
4. The Agent now installs and starts correctly when DECnet
is not available, thus allowing its use in TCP/IP only
environments.
However, this implied dropping the Agent retries upon
failed attempts to declare itself as a DECnet object
(e.g. at node reboot, when System Watchdog Agent is
started before DECnet is up and running). As this
might be a problem for sites relying on DECnet as their
main transport layer, the SNS$ENFORCE_DECNET logical
name is provided to restore the old
behaviour. Simply add the following line to the
SYS$STARTUP:SNS$STARTUP.COM file:
$ define/system SNS$ENFORCE_DECNET YES
5. After the ECO01 installation, the Controller commands
to handle external messages no longer displayed a success
message, despite the presence of the /LOG
qualifier. The normal behaviour is now restored.
$ sense watchdog add message/LOG
%SNS-S-ADDED Message successfully added.
6. The ECO01 somehow broke the SYS$SHARE:SNS$SHR.EXE
entries. This is now fixed.
7. Timeout delays for DECnet network connection requests
(from Consolidator, Controller or calls to SNS$SHR.EXE
entries) and acknowledgements (on Agent side) are
now tunable using the SNS$DECNET_CONNECTION_TIMEOUT
logical name. Before this change, the timeout delay
upon connection requests, as built in DECnet, was about
40s, and the acknowledgement could even hang forever in
certain situations. Now the default for both operations
is 15 seconds.
Note that the logical name should be defined in the
system logical name table before the relevant part of
POLYCENTER System Watchdog is launched.
Thanks to the above modification, the Agent no longer
randomly enters a deadlock state as sometimes happened
on Alpha multiprocessor systems (such as Digital Alpha
servers 2100, 7620, 7720, etc.) following
some local DECnet connectivity operations.
8. The Agent now fetches the local node name from the
"SCSNODE" SYSGEN parameter instead of the SYS$NODE
system logical name.
9. The configuration editor now formats negative time
differences correctly in the output of the
following command:
SNS> SHOW NODE <NODENAME>/FORMAT=COMMAND
10.The processes running SNS$WATCHDOG.EXE,
SNS$REJECT_ONCE.EXE and SNS$REJECT_ALWAYS.EXE no longer
crash when receiving connection requests with node names
longer than 10 characters.
11.The Controller, when used within a DCL command file,
hung when reading a command stream not properly
terminated with the EXIT clause. It can now handle
such unterminated command streams.
12.In some environments, one may wish to check node
reachability via both DECnet and TCP/IP. To achieve
this without receiving each event twice for the
nodes concerned, one would logically declare the same
nodes twice in the configuration file, once with their
DECnet names and once with their TCP/IP names, and hook
empty classes (i.e. event class with all event checks
disabled) to one of those two declarations, to avoid
getting duplicated events. This did not work before
ECO02, as the external events were requested
and duplicated anyway. ECO02 now allows such a
configuration, eliminating the duplicated external
events.
13.The Consolidator no longer hangs on Alpha, as
sometimes happened with ECO01 after a few hours of
normal operation.
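Items 7 and 8 of this section can be illustrated with the
following sketch. The format of the timeout value (a plain
number of seconds) is an assumption, since these notes state
only the 15-second default; the 30-second value is purely
illustrative. The last two commands simply display the two
node name sources that item 8 distinguishes.

```dcl
$ ! Item 7: assumption, value given as a number of seconds (default is 15 seconds)
$ define/system SNS$DECNET_CONNECTION_TIMEOUT 30
$ ! Item 8: the SCSNODE SYSGEN parameter, now used by the Agent
$ write sys$output f$getsyi("SCSNODE")
$ ! Item 8: the SYS$NODE logical name, previously used by the Agent
$ show logical SYS$NODE
```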
8 Problems Corrected by System Watchdog V2.2 ECO03
________________________Note ________________________
For runtime identification, the version number is
V2.2-11.
_____________________________________________________
1. The communication protocol between the Consolidator
and the Agent was again reviewed with respect to error
handling and variable initialization, which should
result in enhanced reliability for both processes.
A deadlock possibility was removed. The code was
substantially simplified so that the very
same routines are used for Agent/Consolidator dialogues,
whether they are solicited through a Consolidator request,
a Controller command to add or delete external messages
or a call to any entry of the SNS$SHR.EXE shareable
image.
2. The node name length limit is now 16 characters instead
of 6 for configuration, communication, internal structures
and action purposes. Display is still limited to node
names 6 characters long.
3. When an Agent became unreachable for at least a poll
time, then was reachable again, the message sublist
displayed by the Controller continuous display was
not updated (messages seemed to "disappear", as they
were no longer displayed until the Controller was
relaunched, though they were still present in the
Consolidator message list). This is now fixed.
4. Under some conditions, the SNS$SHR.EXE shareable image
entries would raise Ada exceptions to the calling
program. This should no longer happen.
5. The shadow set related event messages were not
consolidated with the cluster alias when simultaneously
detected by distinct members of the same cluster. This
is now corrected.
6. The ECO02 left a couple of places where the SYS$NODE
logical name translation would be used instead of the
SYSGEN parameter SCSNODE. This could cause problems on
non-DECnet nodes and it is now fixed.
7. Added support for HSD controllers.
8. A new logical name SNS$LOOPING_PROCESS_CPU_RATIO was
created to allow users to tune the percentage of
CPU usage beyond which a process may be detected as
looping on the Agent side (it acts on the pre-selection
criteria, as those processes in the selection which
performed I/Os are then discarded anyway). This logical
name should be defined in the system logical name
table, and may be redefined while the Agent is running.
The value is an integer within the 0 to 100 range, and
the default is 25. Depending on your system usage,
reasonable values presumably are between 20 (more
sensitive) and 50 (more permissive).
Example:
$ define/system SNS$LOOPING_PROCESS_CPU_RATIO 30
9. A check for the DEC Ada RTL version was added in the
installation procedure as it should not be any lower
than V6.2. Separate kits are available to install
DEC Ada RTL V6.2 on OpenVMS V6.1 (This is absolutely
harmless to other applications). Please contact your
local Digital support services.
10.The SNS$STARTUP.COM procedure was modified to activate
the node synonym option for the DECnet/OSI session
control application SNS$WATCHDOG. This enables node
short names usage.
11.The System Watchdog software was modified in order to
keep on working after the year 2000.
9 Problems Corrected by System Watchdog V2.2 ECO04
________________________Note ________________________
To help you identify the installed software version,
the System Watchdog V2.2 ECO04 Consolidator displays
V2.2-12, upon the following command:
SNS> SHOW CONSOLIDATOR/FULL
_____________________________________________________
1. Cluster-wide events consolidation was improved.
2. Nodename length limit is now 16 characters instead of
only 6. (This limit should ideally be a greater
value, such as 255. It is kept at 16 so that the System
Watchdog V2.2 ECO04 Consolidator remains compatible with
any subversion of V2.2 Agents, and to avoid profile
conversions. A major improvement in this area will
happen in the next point release.) The Controller
displays (both historical and continuous) now adapt
to the longest nodename to be output.
3. External event message classes with a first event text
filter set to priority NOT_CHECKED were considered as
empty, even if one of the subsequent entries was valid.
As a consequence, even in the case the very next entry
was, say, a catch-all pattern, external events from
the nodes associated with the considered external event
message class were simply ignored. This is now fixed.
________________________Note ________________________
In certain circumstances (after several socket
errors have occurred) the Consolidator may not deallocate
socket devices (BGxxx: devices) when polling
nodes through TCP/IP. The symptoms are that the
Consolidator no longer polls remote Agents
through TCP/IP (no more quota) and the 'show
process' DCL command on the Consolidator process
shows plenty of BGxxx: devices still allocated.
This is a known problem of the DEC C RTL and you
have to install the following ECOs to fix it:
- VAXACRT02_061 for OpenVMS VAX V6.1
- AXPACRT04_061 for OpenVMS Alpha V6.1
_____________________________________________________
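To look for the symptom described in the note above, the
following sketch uses standard DCL commands. The symbol
cons_pid and its value are hypothetical; replace the value
with the process identification of your own Consolidator
process.

```dcl
$ ! cons_pid is a hypothetical symbol; set it to the Consolidator's process ID
$ cons_pid = "2020011A"
$ ! The display includes the devices allocated to that process
$ show process/identification='cons_pid'
$ ! Lists the BG (socket) devices known on the node
$ show device bg
```

A long list of BG devices still allocated to the
Consolidator process suggests that the DEC C RTL ECO listed
above for your platform is missing.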