T.R | Title | User | Personal Name | Date | Lines |
---|
480.1 | Loop | ZENDIA::DBIGELOW | Innovate, Integrate, Evaporate | Wed Nov 16 1994 21:21 | 14 |
| Tom,
If I read you correctly, the customer is in an
infinite loop. As the console data comes out of the
pseudoterminal, it gets put back out to the
pseudoterminal which gets put out to the pseudoterminal
which gets put out to ... and so forth.
You could suggest to the customer that he/she replace
the "ALL" with a list of systems. That way, they are
not watching themselves (the pseudo terminal).
Dave
|
480.2 | They actually use a list of systems ... | DECAUX::VNATIG::KARASEK | Thomas KARASEK @AUI | Thu Nov 17 1994 10:49 | 20 |
| Dave,
Sorry for the misleading information. In fact they do use a list of systems
along with the 'console watch' command. I have only a single system configured,
which is watched and output through the pseudo terminal.
This "system" is in fact connected to a DECserver port (port #1), which itself is
connected to another port of that DECserver (port #2).
All I do is copy small text files to port #2, which simulates console
input to port #1.
It takes only a few lines of text (e.g. 20 to 30 lines) before the control
daemon falls into RWMBX.
I raised BYTLM to 300000 and PGFLQUOTA to 200000 for the account CM runs
under. DEFMBXQUOTA is currently 10000 (the default of 1056 didn't work at all).
Since the customer absolutely wants to watch the incoming data, he is very
concerned about this problem.
Any help is greatly appreciated,
regards, Tom.
|
480.3 | | OPG::PHILIP | And through the square window... | Thu Nov 17 1994 11:19 | 24 |
| Tom,
My guess is that there is a condition where the watch process
has received some data which it is trying to log with PCM.
However, the watch process's mailbox has filled up, so
PCM has hung in RWMBX. Because PCM is in this state, it
cannot service the data that the watch interface has put into
the pseudo-terminal, so it's all deadlocked and hung up!!!!
In this situation, raising quotas will only postpone the
inevitable.
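A toy Python model of why raising quotas only postpones the hang. It assumes (hypothetically; this is not PCM code) that the watch process takes one line and then blocks writing its echo to the pseudo terminal, and that the controller does not read that pseudo terminal until it has queued the whole burst. Mailbox usage counts message bytes only, so it will not reproduce the exact line counts reported in .2.

```python
def lines_before_rwmbx(line_lengths, mailbox_quota):
    """Return how many lines the controller handles before it would block.

    Toy model only (assumed behaviour, not PCM source):
    - the watch process takes the first line straight away, echoes it to its
      pseudo terminal and then waits for the controller to read that echo;
    - the controller keeps copying console lines into the watch mailbox and
      only reads the pseudo terminal once the burst is finished.
    Mailbox usage counts message bytes only (real VMS accounting adds
    overhead). Returns None if the whole burst fits.
    """
    used = 0  # bytes waiting in the watch process's mailbox
    for i, length in enumerate(line_lengths):
        if i == 0:
            continue  # dequeued immediately by the watch process
        if used + length > mailbox_quota:
            return i  # controller would wait in RWMBX: nobody can drain the mailbox
        used += length
    return None


burst = [80] * 200  # 200 console lines of 80 bytes each
for quota in (1056, 10000, 100000):
    print(quota, lines_before_rwmbx(burst, quota))
```

In this model a bigger quota only moves the blocking point (from line 14 to line 126 for the first two quotas); with 100000 bytes the 200-line burst fits, but a 2000-line burst still stops at line 1251.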
I am not sure that I would want to support this kind of "abuse"
of the software.
I am also not sure what the customer is trying to do here!! Why
do they want to log their console output twice (once normally
via the console connection and a second time via the "watch")?
Please explain what your customer is trying to do, we may be
able to come up with a more efficient mechanism for them.
Cheers,
Phil
|
480.4 | | DECAUX::VNATIG::KARASEK | Thomas KARASEK @AUI | Thu Nov 17 1994 13:01 | 13 |
| Phil,
Thanks for your quick answer.
I just verified what the customer really wants to do:
There are workstations in his cluster that do not have any physical console
connection. The idea is to capture the console data of these stations via pseudo
devices.
Redirecting the console output to a decserver port would be a rather costly
solution.
Do you have any simpler ideas?
Thanks, Tom.
|
480.5 | | OPG::PHILIP | And through the square window... | Thu Nov 17 1994 13:35 | 9 |
| Tom,
I am sorry, I am a little confused!! How was the customer
getting the output into a pseudo-terminal from the
workstations if they didn't want to use a DECserver?
Cheers,
Phil
|
480.6 | some more confusion ... | DECAUX::VNATIG::KARASEK | Thomas KARASEK @AUI | Thu Nov 17 1994 13:58 | 21 |
| The way it worked up to VCS V1.4 was rather simple:
The OPCOM messages of these systems show up on other cluster members as
well, for example:
    %%%%%%%%%%% OPCOM 9-NOV-1994 11:43:51.50 %%%%%%%%%%%
    Message from user AUDIT$SERVER on VNORHM
These messages were filtered according to the node name and then directed
to a pseudo terminal. So in fact this is not the 'real' console output, but
it may be enough for tracing most of the system events.
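The node-name filtering described here could look roughly like the following Python sketch. It assumes each OPCOM message starts with a %%%%% OPCOM banner and carries a "Message from user ... on NODE" line, as in the example; other OPCOM message types (operator requests, for instance) would need their own patterns. The second banner timestamp and the message bodies in the sample are placeholders.

```python
import re

# Assumed formats, modelled on the example message in this note.
BANNER = re.compile(r"^%+\s+OPCOM\s+\S+\s+\S+\s+%+$")
ORIGIN = re.compile(r"^Message from user (\S+) on (\S+)")


def split_messages(lines):
    """Group a stream of OPCOM output into messages, each starting at a banner."""
    messages, current = [], None
    for line in lines:
        if BANNER.match(line.strip()):
            current = [line]
            messages.append(current)
        elif current is not None:
            current.append(line)
    return messages


def messages_for_node(lines, node):
    """Keep only messages whose 'Message from user ... on NODE' line names node."""
    wanted = []
    for msg in split_messages(lines):
        for line in msg[1:]:
            m = ORIGIN.match(line.strip())
            if m and m.group(2).upper() == node.upper():
                wanted.append(msg)
                break
    return wanted


stream = [  # placeholder sample
    "%%%%%%%%%%% OPCOM 9-NOV-1994 11:43:51.50 %%%%%%%%%%%",
    "Message from user AUDIT$SERVER on VNORHM",
    "(message text)",
    "%%%%%%%%%%% OPCOM 9-NOV-1994 11:44:02.10 %%%%%%%%%%%",
    "Message from user SYSTEM on OTHER1",
    "(message text)",
]
for msg in messages_for_node(stream, "vnorhm"):
    print("\n".join(msg))
```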
All the customer wants is to record a very limited set of events
(OPCOM messages) and feed them to a dummy system, which has its own icon
and reacts to events (by changing its color).
I'm getting more and more convinced that the simplest solution would be to
set up action routines for the 'really' connected system, which just take the
events and put them to the appropriate FTA device of our dummy system.
cheers, Tom.
|
480.7 | | CSC32::BUTTERWORTH | Gun Control is a steady hand. | Thu Nov 17 1994 19:12 | 9 |
| I guess I don't understand why hooking up the real consoles of these
workstations is so costly. DECservers are cheap these days and so
is DECconnect cable, and bear in mind that a node that uses a
pseudo-terminal still requires a license. If you're already paying for
a license, then for my money I'd rather spend a little more and
hook up the real consoles!
Regards,
Dan
|
480.8 | same as monitoring peripheral devices ... | DECAUX::VNATIG::KARASEK | Thomas KARASEK @AUI | Fri Nov 18 1994 10:28 | 18 |
| Dan,
You are perfectly right. But they also want to monitor X.25 routers and items
which really do not have any physical console line.
So, in my opinion, this should be possible the same way as it is for
peripheral devices.
In fact we are doing nothing more than what John Becker suggested in your TIMA
article:
"[PLY_CM] Monitoring A Peripheral Device for PCM Event Notification"
It works perfectly unless a number of lines come in in quick
succession. It looks like the pseudo terminal is not able to keep up with the
rate of incoming data. This results in the control daemon hanging in the
RWMBX deadlock. Since this affects not only that specific pseudo terminal
but the whole PCM interface, it should be considered a serious bug, which
at least two of our customers are really concerned about.
regards, Tom.
|
480.9 | | CSC32::BUTTERWORTH | Gun Control is a steady hand. | Fri Nov 18 1994 17:58 | 14 |
| >It does work perfectly, unless there are a number of lines coming in in short
>sequence. It looks like the pseudo terminal is not able to keep up with the
>speed of incomming data. This will result in hanging the control daemon into the
>RWMBX-deadlock. Since this does not only affect that specific pseudo terminal,
>but the whole PCM-interface, this should be considered as a serious bug, which
>at least two of our customers are really concerned about.
I agree that this particular problem needs to be analyzed and fixed.
I just wanted to understand the real need here, hence the reply in -2.
I want to see if I can reproduce this one.
Regs,
Dan
|
480.10 | | CSC32::BUTTERWORTH | Gun Control is a steady hand. | Fri Nov 18 1994 20:45 | 44 |
| >you are perfectly right. But they also want to monitor X.25 routers and
>items which really do not have any physical console line.
> So, for my opinion, this should be possible the same way as applies to
>peripheral devices.
I just reread this and I have to ask the question: are there
applications that the customer currently uses to talk to these
routers and other boxes that don't have a console? Example: our own
LPS20s don't have a physical console, but there is an application
called LPS$CONSOLE that is used to interface with these systems, as in
effect they have a soft console. The right way to monitor LPS20s
is not to use a pseudo terminal with a watch command but rather a
pseudo-terminal that performs MCR LPS$CONSOLE. This is the way it
was done with VCS. If these boxes do indeed have a "console" application
similar to LPS$CONSOLE, then use that instead of the WATCH interface.
And Phil's "guess" as to what is happening is right on. I just
reproduced this and here is what happens:
The pseudo-terminal process that is running the watch command is
attempting to write a line of data to TT:, which is of course the
FTA device itself. The controller daemon is attempting to place a
message into the mailbox that was created for the watch image
to read from, and this mailbox is full because the watch process hasn't been
able to process the messages quickly enough. This puts the controller
into RWMBX, which means it can't read the messages the watch command is
trying to output to the pseudo terminal, nor can it read any messages
from any other node that it happens to be controlling! I don't think
this would happen if the pseudo-node that is running watch was under
control of a different daemon than that which is running the node
or nodes we are trying to watch. This would remove the circularity of
the I/O situation described above. Bear in mind you would never want to
use nodename ALL as you are now watching the pseudo-node(s) that is/are
also running the WATCH command!!! So, as Dave was alluding to earlier,
you have created an infinite loop if you use nodename ALL. This would
also make the proposed workaround of forcing the pseudo-node(s) running
WATCH onto a different child daemon than the nodes you're WATCHing
impossible to achieve.
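The circular I/O described above is a classic circular wait. A minimal Python sketch of checking it with a wait-for graph, a standard deadlock-detection technique; the graph is a hypothetical model of the processes in this reply, not PCM internals. Each party maps to the parties it is blocked on, and a cycle means deadlock.

```python
def has_cycle(waits_for):
    """Return True if the wait-for graph contains a cycle (i.e. a deadlock).

    waits_for maps each party to the parties it is currently blocked on.
    Depth-first search with three-state marking: reaching a node that is
    still on the current path (IN_PROGRESS) means a back edge, i.e. a cycle.
    """
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {}

    def visit(node):
        state[node] = IN_PROGRESS
        for nxt in waits_for.get(node, ()):
            s = state.get(nxt, UNVISITED)
            if s == IN_PROGRESS:
                return True
            if s == UNVISITED and visit(nxt):
                return True
        state[node] = DONE
        return False

    return any(state.get(n, UNVISITED) == UNVISITED and visit(n)
               for n in list(waits_for))


# Hypothetical model: one daemon controls both the watched node and
# the pseudo-node running WATCH.
same_daemon = {
    "controller A": ["watch process"],  # in RWMBX, waiting for watch to read its mailbox
    "watch process": ["controller A"],  # writing to TT: (the FTA device) that A services
}

# Workaround: the pseudo-node's FTA device is serviced by another daemon.
separate_daemons = {
    "controller A": ["watch process"],
    "watch process": ["controller B"],
    "controller B": [],
}

print(has_cycle(same_daemon))       # True
print(has_cycle(separate_daemons))  # False
```

Without the cycle, controller A can still be slowed down by a full mailbox, but it is no longer waiting on itself, so the mailbox eventually drains.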
The scary part is I understand what I just wrote .....;-}
Regs,
Dan
|
480.11 | | DECAUX::VNATIG::KARASEK | Thomas KARASEK @AUI | Mon Nov 21 1994 10:05 | 34 |
| Hi Dan,
Your 'scary' description of what happens seems to summarize this problem
exactly.
> I just reread this and I have to ask the question: Are there
> applications that the customer currently uses to talk to these
> routers and other boxes that don't have a console.
They don't have anything like that yet. As I mentioned in one of the earlier
replies, they only want to extract specific operator messages from one system
and feed them into some application (such as a pseudo terminal), making another box
in the C3 interface react to these messages.
> Bear in mind you would never want to
> use nodename ALL as you are now watching the pseudo-node(s) that is/are
> also running the WATCH command!!!
O.K. Please forget about 'nodename ALL', since this was only a mistake in the
first note; neither the customer nor I used it that way. In fact we
are both specifying one single system only.
> I don't think
> this would happen if the pseduo-node that is running watch was under
> control of a different daemon than that which is running the node
> or nodes we are trying to watch.
I agree. This sounds pretty logical. But I suspect this means not using the
'watch' command at all, and writing an own daemon instead. I would be grateful
for any ideas on how this could be managed.
Thanks for your investigations so far,
regards, Tom.
|
480.12 | | CSC32::BUTTERWORTH | Gun Control is a steady hand. | Tue Nov 22 1994 20:50 | 19 |
| >But I suspect this means not to use the 'watch'-command at all, and write
> an own daemon instead. I would be thankful about any ideas how this
>could be managed.
I'm not sure what you mean by "write an own Daemon". If you mean write
your own console controller daemon, I would say that it is unnecessary
and difficult at best. When PCM starts up, the database is read and
the parent daemon starts one child controller daemon for
every 16 systems in the database. Now this value can be adjusted with
the config editor, but *IT IS TOTALLY UNSUPPORTED TO DO SO*. The bottom
line is that if you change it and something breaks, don't expect it to be
fixed. The magic CC Editor commands are SET/SHOW HIDDEN.
A question: You say you are looking for certain opcom messages. Are
these opcom messages somehow related to the routers and other boxes
you are trying to manage?
Regs,
Dan
|
480.13 | | DECAUX::VNATIG::KARASEK | Thomas KARASEK @AUI | Wed Nov 23 1994 13:09 | 15 |
| Hi Dan !
I used the *strictly unsupported* method to create one child controller
process per system for my test configuration (1 real system, 1 pseudo terminal),
and it really did work around the RWMBX problem (as expected).
Though this certainly means a lot of overhead, it could be used as a
workaround at the customer's site as well. Since we only have to make sure that
the pseudo terminal and the system it is watching use different daemons,
there may be a more efficient way to accomplish this.
So, what are the criteria for assigning systems to a daemon?
(Order of appearance in the configuration script, alphabetical order, ...)
Thanks, Tom.
|
480.14 | | OPG::PHILIP | And through the square window... | Wed Nov 23 1994 13:45 | 14 |
| Tom,
The systems get assigned to daemons in the order they appear
in the database; disabled systems are skipped when the assignment
is done.
There certainly is nothing wrong with setting hosts per controller
to 1; however, it will eat up a lot of process slots.
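Using the assignment rule described here (database order, disabled systems skipped, up to N hosts per controller), a Python sketch can check whether a pseudo-node and the node it watches would end up on different daemons. The database entries are hypothetical, and the functions only model the rule as stated in this note, not PCM's actual code.

```python
from math import ceil


def assign_daemons(systems, hosts_per_controller=16):
    """Map each enabled system to a child controller daemon number.

    Sketch of the rule described in this thread (not PCM source): systems are
    taken in database order, disabled systems are skipped, and each daemon
    takes up to hosts_per_controller systems. Numbering from 0 is illustrative.
    """
    enabled = [name for name, is_enabled in systems if is_enabled]
    return {name: i // hosts_per_controller for i, name in enumerate(enabled)}


def daemon_count(systems, hosts_per_controller=16):
    """Number of child controller processes the rule above would create."""
    enabled = sum(1 for _, is_enabled in systems if is_enabled)
    return ceil(enabled / hosts_per_controller)


# Hypothetical database: names and order are made up for illustration.
db = [("BOOT1", True), ("NODE2", True), ("OLDNODE", False), ("WATCHER", True)]

for hosts in (16, 2, 1):
    a = assign_daemons(db, hosts)
    # hosts per controller, processes created, pseudo-node on its own daemon?
    print(hosts, daemon_count(db, hosts), a["BOOT1"] != a["WATCHER"])
```

Under this model, ordering the database so the pseudo-node falls past a daemon boundary separates it from the node it watches with fewer extra processes than one daemon per system.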
I still don't understand why you need to create this "feedback" loop
though.
Cheers,
Phil
|
480.15 | | CSC32::BUTTERWORTH | Gun Control is a steady hand. | Wed Nov 23 1994 20:30 | 24 |
| > I still dont understand why you need to create this "feedback" loop
> though.
Phil,
Here's why I think he wishes to do this. Sites have always wanted
event notification on peripherals, such that the C3 icon for a
peripheral changes color just as a "regular" service node's would.
Let's say we have a cluster with a TA90 tape drive with device name
$1$MUA0:. We could create a pseudo-node called TA90 and a scan profile
to search for strings containing $1$mua0, since the OPCOM messages that are sent
to nodes in a cluster would of course contain the device name. So
we use a pseudo-node and its scan profile to search for
strings generated by OPCOM that concern the tape drive.
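A Python sketch of the matching such a scan profile needs. It is not PCM's scan-profile syntax; it only illustrates one pitfall worth checking: a plain substring search for $1$MUA1 would also fire on $1$MUA10, so this version matches the device name as a whole token, case-insensitively, with an optional trailing colon. The sample lines are made up, not real OPCOM text.

```python
import re


def device_pattern(device):
    """Case-insensitive pattern matching `device` as a whole token.

    The lookarounds reject matches glued to other name characters, so
    "$1$MUA1" does not fire on "$1$MUA10". A trailing colon is optional.
    """
    name = re.escape(device.rstrip(":"))
    return re.compile(rf"(?<![\w$]){name}:?(?![\w$])", re.IGNORECASE)


def lines_for_device(lines, device):
    """Return the lines that mention `device`."""
    pattern = device_pattern(device)
    return [line for line in lines if pattern.search(line)]


sample = [  # made-up lines, not real OPCOM text
    "Device $1$MUA0: needs attention",
    "Device $1$MUA01: needs attention",
    "request concerning $1$mua0",
]
print(lines_for_device(sample, "$1$MUA0:"))
```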
Unless the PCM engine is part of the same cluster that it monitors or
is the standalone node that owns the tape drive, we can't just do a
spawn command and repl/enable. We'll be forced to use WATCH to get the
messages. The only other option I can think of is a DECnet task-to-task
setup, which would be kind of messy and could generate *lots* of
Ethernet traffic if something really weird happens that causes a flurry
of messages.
Regs,
Dan
|
480.16 | | OPG::PHILIP | And through the square window... | Wed Nov 23 1994 21:39 | 10 |
| Dan,
I think I see now. So, if you were to define the "subsystem" field
correctly for each event and we then used that to create a "hierarchical"
C3 display such that you could "zoom" into a system and see a bunch of icons
representing each of the subsystems, would that do what you and your
customers want?
Cheers,
Phil
|
480.17 | | CSC32::BUTTERWORTH | Gun Control is a steady hand. | Thu Nov 24 1994 00:41 | 8 |
| I think that would be a *GREAT* use for the subsystem field, but some
customers have already started using it for other logical groupings of
events. For example, VMSCluster CNXMAN messages can be placed into a
"Cluster" subsystem. I can see somebody griping about it. How about
a peripheral field a la the VCS C3, and using that to trigger the icon
color?
Dan
|
480.18 | | OPG::PHILIP | And through the square window... | Thu Nov 24 1994 09:54 | 14 |
| Dan,
I don't want to add new fields to the event records at this stage for the
next release, so I can't put in a peripheral field.
The subsystem field was intended in readiness for what I proposed.
I have already asked Dave to put this in the C3, but he has a lot of other
stuff to do, so it's just a matter of time and priorities.
How anyone can think that a "cluster" is a subsystem is beyond me!!
Cheers,
Phil
|
480.19 | .15 describes exactly what we were looking for ... | DECAUX::VNATIG::KARASEK | Thomas KARASEK @AUI | Thu Nov 24 1994 12:55 | 24 |
| One example of a customer's configuration:
6 systems
4 HSCs
1 translan bridge
(All of the above are physically connected to PCM.)
1 X.25-router (--> = 1 pseudo terminal)
19 Satellite VAXstations (--> = 1 pseudo terminal)
So one pseudo terminal is used for event notification of all the satellites.
All we want to do is trigger on a few events, like "lost connection to ...",
"established connection to ...".
To accomplish this, we need to watch one of the boot nodes in the pseudo terminal
(without logging the data to file again) and trigger on the specified events.
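The per-satellite triggering described here could be sketched in Python as a small state tracker fed by the watched boot node's output. The CNXMAN message wording in the pattern and sample is an assumption to check against the site's real console output; the resulting up/down state per satellite is what would drive the dummy system's icon color.

```python
import re

# Assumed wording of the cluster connection messages; verify on site.
EVENT = re.compile(
    r"\b(lost|established) connection to (?:system )?([A-Za-z0-9_$]+)",
    re.IGNORECASE,
)


def satellite_states(lines):
    """Return {node: 'up' or 'down'} based on the last connection event per node."""
    states = {}
    for line in lines:
        m = EVENT.search(line)
        if m:
            verb, node = m.group(1).lower(), m.group(2).upper()
            states[node] = "down" if verb == "lost" else "up"
    return states


log = [  # assumed message wording, hypothetical node names
    "%CNXMAN,  Lost connection to system WS01",
    "%CNXMAN,  Established connection to system WS02",
    "%CNXMAN,  Established connection to system WS01",
    "%CNXMAN,  Lost connection to system WS02",
]
print(satellite_states(log))  # {'WS01': 'up', 'WS02': 'down'}
```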
In the meantime I used the suggestion to create one controller process per
configured system as a workaround. This seems to work pretty well (at least for
that limited number of nodes).
However, I think this is worth addressing in future releases.
Thanks & regards, Tom.
|
480.20 | | CSC32::BUTTERWORTH | Gun Control is a steady hand. | Mon Nov 28 1994 18:32 | 13 |
| >However, I think this would be worth being addressed in future releases.
I'm not sure I agree after seeing the real configuration that you
posted in -1. I see an ulterior motive on the customer's part, and that
is that he wants event notification on 19 nodes without having to buy
19 licenses and hook them up the right way.
Phil?
Regs,
Dan
|
480.21 | | OPG::PHILIP | And through the square window... | Mon Nov 28 1994 21:14 | 6 |
|
I'm with you on this one, Dan; the customer should really be doing this the
"right" way, that is with separate licences and real connections.
Cheers,
Phil
|
480.22 | | OPCO::TSG_SJM | Coming live to you from Rosebery | Tue Nov 29 1994 03:31 | 7 |
| FWIW, I was having nightly problems with processes in RWMBX. I
used the "unsupported fix" and changed the number of systems per
control process down to 5, and haven't had a problem since. Or am I just
covering something up?
Thanks
Steve
|
480.23 | | CSC32::BUTTERWORTH | Gun Control is a steady hand. | Tue Nov 29 1994 18:55 | 12 |
| >or am I just covering something up.
Part of me says yes, part of me says no. There probably is a design
limitation rearing its ugly head when you have a reasonably
busy system and each controller is handling 16 nodes. Decrease the
amount of work on each controller and the problem goes away. Is it a
bug/design limitation where the code should be changed, or do
we really need to reevaluate the amount of work each controller
should have to do? I really haven't made up my mind on this one!
Regs,
Dan
|
480.24 | | OPG::PHILIP | And through the square window... | Tue Nov 29 1994 20:15 | 15 |
|
We made some tradeoffs when we ported our code from ULTRIX to
OpenVMS; these were based on the time it would take to port the code as
opposed to the efficiency of the result. Based upon our experiences, V2.0
*should* be a lot less memory hungry, more resilient and hopefully faster.
So the bottom line is that you are covering up some of our design
deficiencies. Lowering the hosts per controller from the default of 16 should
be no problem if you have enough system resources (memory, etc.) for all the
extra processes that will get created. Beware, though, that raising the hosts
per controller above 16 could have dramatic effects because of open file limits,
etc.
Cheers,
Phil
|
480.25 | RE- .20, .21 | VNOTSC::KARASEK | Thomas KARASEK @AUI | Wed Nov 30 1994 11:36 | 14 |
| Hi Phil, Dan !
I don't quite agree on that, since the RWMBX problem is not due to the fact
that events for more than one node are scanned, but to the use of the
'CONSOLE WATCH' command.
Nor do I think this is a license violation, since all the
information is derived from a physically connected (and fully licensed) system.
We could also set up whatever event scan and action routine we want on that
system without violating the license agreements.
In fact we are consuming 1 additional license for the pseudo terminal, on which
the specific events are actually scanned.
regards, Tom.
|
480.26 | | CSC32::BUTTERWORTH | Gun Control is a steady hand. | Wed Nov 30 1994 19:17 | 29 |
| >I don't quite agree on that, since the RWMBX-problem is not due to the
>fact, that events for more than one nodes are scanned, but to the usage of
>the 'CONSOLE WATCH' - command.
That wasn't the point, actually. I've had RWMBX problems even when a
site isn't doing the WATCH trick; it was due to a similar problem and
typically seen on busy systems.
>However, I don't think this is a license violation as well, since all
>the information is derived from a physically connected (and fully licensed)
>system.
I never said it was a license violation but rather a way to get around
buying more licenses and hooking these systems up the right way.
>We could also setup whatever event scan and action routine we want on
>that system, without overcomming license agreements.
No argument here at all. This is a question of what the customer's real
motive is.
> In fact we are consuming 1 additional license for the pseudo terminal,
>on which the specific events are actually scanned.
I'm well aware of this.
Regs,
Dan
|