Conference csc32::consolemanager

Title:POLYCENTER Console Manager
Notice:Kits, Scans, Docs on CSC32:: as PCM$KITS:,PCM$DOCS:, PCM$SCANS:
Moderator:CSC32::BUTTERWORTH
Created:Thu Aug 06 1992
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:1541
Total number of notes:6564

865.0. "LAT connection problem with 1.6" by SNOOTY::HAWLEYI (Mr Flibble says: Game over boys) Thu Jul 13 1995 14:38

    
    Hi, I've got another customer who has just installed PCM 1.6 to fix a
    certain problem but is now unable to get anything to connect.
    I will probably IPMT this if there is no quick "oh, silly me!" answer, as
    this is a pretty high-profile affair and we may lose the business.
    
    The customer's configuration is shown below. The two VAX systems used
    for monitoring were originally configured identically to run VCS, one
    as the live system and the other as a "hot standby".
    For security reasons, the systems have dual Ethernet controllers so
    that the VCS DECserver can sit on a hidden LAN.
    
    As part of the customer's migration plan towards PCM, he upgraded one of
    the systems to VMS 6.1 and PCM 1.5...
    
                 +-------++-------++--------++--------+
                 | VAX A || VAX B ||  VAX C ||  VAX D |
                 |       ||       ||        ||        |
                 +---+---++---+---++---+----++---+----+          Main LAN
           ----------+---+----+--------+-----+---+-----------------------
                         |                   |
                     +---+---+           +---+---+
                     |VCS    |           |PCM    |
                     |SYSTEM |           |SYSTEM |
                     +---+---+           +---+---+
                         |                   |             Hidden VCS LAN
           ------+--------+----------+-----------------------------------
                                  |
                             +----+----+
                             |DECserver|
                             +---------+
    
    
    PCM 1.5 (and PCM 1.5A) were causing them various problems with process
    dumps etc., so I supplied them with PCM 1.6.
             
    Once they had installed PCM 1.6, they could not establish a connection to
    any of the connected systems.
             
    The events list window keeps reporting
    "Console not found - Managed Console line not available"
    for each of the systems.
    
    When he selects one of the icons in the C3, the Connect option is
    greyed out.
             
    By repeating MC LATCP SHO PORT several times, he could see PCM
    creating the LTA ports, which would then disappear again.
             
    The customer checked the terminal server and port setups... all okay.
             
    If he created the LTA port manually and did a SET HOST/DTE to it,
    he could access the system console, so it isn't a communication problem.
             
    Also, if he closed PCM and started up VCS on the other system, 
    it could access the consoles with no problems.
    
    As he was sure that he had changed nothing on the PCM system apart from 
    the new version of PCM, he re-installed the old 1.5A software as a test.
             
    With 1.5A he could establish a connection to the consoles again.
             
    He then re-installed PCM 1.6, and once again cannot establish a connection
    to the consoles.
            
    *  Has there been a fundamental change in the LAT connection/creation
       method in 1.6?
    
    *  Will console$verify help?
    
    
    Thanks in advance,
    
    Ian Hawley.
     
             
            
865.1. by OPG::PHILIP (And through the square window...) Thu Jul 13 1995 15:54 (17 lines)
Ian,

  Can you do the following...

  1) Shut PCM down

  2) Define/Sys Console$Debug "TERMINAL"

  3) Define/Sys Console$Debug_Level 6144

  4) Start up PCM V1.6

  When the errors have occurred, shut PCM down and post one of the
  controller_nn.log files here.

Cheers,
Phil
865.2. "info" by SNOOTY::HAWLEYI (Mr Flibble says: Game over boys) Thu Jul 13 1995 18:25 (65 lines)
    
    Phil,
    
    Here's the info:
    
    Author: Ian G Strachan, VSS, BCO      
    Date: 13-Jul-1995
    Posted-date: 12-Jul-1995
    
    $ set noon
    $ save_ver = f$verify (0)
    $ EXIT
    $ !
    $ ! Start a Child Controller process, name_num 1, child_num 1
    $ !                                                                
    $ CHILD :== $CONSOLE$IMAGE:CONSOLE$DAEMON.EXE
    $ CHILD "child" 1
    POLYCENTER Console Manager
    Console Controller Daemon Version V1.6-100
    Copyright (c) 1995 Digital Equipment Corporation. All Rights Reserved
    
     SYS$ASSIGN - Assigning Channel to LAT Device.
    Attempting to Map  Lat Terminal Start
    Cancelling QIOw timer  (status = 1, iosb[0] = 1)
    Attempting to Map  Lat Terminal End
    Attempting connect to Lat Terminal
    QIOw timer Timeout procedure called, cancelling I/O
    Cancelling QIOw timer  (status = 1, iosb[0] = 44)
    Connected to Lat Terminal, Status = 1
    iosb status was not normal value was <44>
     CMTerminalGetErrorMessages - Code is       : -190
     CMTerminalGetErrorMessages - Errno_Val is  : 44
     CMTerminalGetErrorMessages - Transport  is : 1
    Deleting LAT port 
     SYS$DASSGN - Deassigning Channel from LAT terminal->chan in Close.
     SYS$ASSIGN - Assigning Channel to LAT Device.
    Attempting to Map  Lat Terminal Start
    Cancelling QIOw timer  (status = 1, iosb[0] = 1)
    Attempting to Map  Lat Terminal End
    Attempting connect to Lat Terminal
    QIOw timer Timeout procedure called, cancelling I/O
    Cancelling QIOw timer  (status = 1, iosb[0] = 44)
    Connected to Lat Terminal, Status = 1
    iosb status was not normal value was <44>
     CMTerminalGetErrorMessages - Code is       : -190
     CMTerminalGetErrorMessages - Errno_Val is  : 44
     CMTerminalGetErrorMessages - Transport  is : 1
    Deleting LAT port 
     SYS$DASSGN - Deassigning Channel from LAT terminal->chan in Close.
     SYS$ASSIGN - Assigning Channel to LAT Device.
    Attempting to Map  Lat Terminal Start
    Cancelling QIOw timer  (status = 1, iosb[0] = 1)
    Attempting to Map  Lat Terminal End
    Attempting connect to Lat Terminal
    QIOw timer Timeout procedure called, cancelling I/O
    Cancelling QIOw timer  (status = 1, iosb[0] = 44)
    Connected to Lat Terminal, Status = 1
    iosb status was not normal value was <44>
     CMTerminalGetErrorMessages - Code is       : -190
     CMTerminalGetErrorMessages - Errno_Val is  : 44
     CMTerminalGetErrorMessages - Transport  is : 1
    Deleting LAT port 
     SYS$DASSGN - Deassigning Channel from LAT terminal->chan in Close.
    
    ...repeat to fade!...
865.3. by OPG::PHILIP (And through the square window...) Thu Jul 13 1995 18:45 (25 lines)
Ian,


>>     SYS$ASSIGN - Assigning Channel to LAT Device.
>>    Attempting to Map  Lat Terminal Start
>>    Cancelling QIOw timer  (status = 1, iosb[0] = 1)
>>    Attempting to Map  Lat Terminal End
>>    Attempting connect to Lat Terminal
>>    QIOw timer Timeout procedure called, cancelling I/O
>>    Cancelling QIOw timer  (status = 1, iosb[0] = 44)
>>    Connected to Lat Terminal, Status = 1

  It would appear that we stalled trying to open the LTA
  device, because our 5-second timer went off!!

  Now, the question is: why did our connect QIOW to the LAT device
  stall for so long? The status of 44 (SS$_ABORT) returned
  when we did the cancel is normal, because we did the abort
  ourselves.
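The numbers in the log are OpenVMS condition values: an odd value means success, and the low three bits give the severity. A minimal Python sketch (`decode_condition` is a hypothetical helper, not part of PCM) that splits the two values seen above, 1 and 44, into the standard condition-value fields:

```python
# Severity codes held in bits 0-2 of an OpenVMS condition value.
SEVERITY = {0: "warning", 1: "success", 2: "error", 3: "informational", 4: "severe"}


def decode_condition(value: int) -> dict:
    """Split a 32-bit OpenVMS condition value into its main fields."""
    return {
        "success": bool(value & 1),                      # odd values indicate success
        "severity": SEVERITY.get(value & 0x7, "reserved"),
        "message": (value >> 3) & 0x1FFF,               # bits 3-15
        "facility": (value >> 16) & 0xFFF,              # bits 16-27; 0 is SYSTEM
    }


# Values taken from the controller log in .2.
for code in (1, 44):
    print(code, decode_condition(code))
```

For 44 this gives severity "severe" (shown as -F- in VMS messages), message number 5 in the SYSTEM facility, which is consistent with SS$_ABORT; 1 is plain success, matching "Status = 1" on the connect.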

  Question: when you did a "set host/lat", how long did it take
  to actually connect?

Cheers,
Phil
865.4. "change in 1.6?" by SNOOTY::HAWLEYI (Mr Flibble says: Game over boys) Fri Jul 14 1995 16:18 (10 lines)
    
    Phil,
    
    Is this a big change in 1.6?
    Can this be changed so that it allows more time?
    
    My customer doesn't think it takes 5 secs to establish a connection...
    but if it works under 1.5A, how come it doesn't work under 1.6?
    
    Ian.
865.5. by OPG::PHILIP (And through the square window...) Fri Jul 14 1995 18:11 (51 lines)
Ian,

  Looking a little more closely at the log output, it would appear
  something weird is happening...

>>     SYS$ASSIGN - Assigning Channel to LAT Device.
>>    Attempting to Map  Lat Terminal Start
>>    Cancelling QIOw timer  (status = 1, iosb[0] = 1)
>>    Attempting to Map  Lat Terminal End
>>    Attempting connect to Lat Terminal

  The above has issued the QIOW to connect to the LAT device. Before
  we issued this QIO, we called SYS$SETIMR for 5 seconds.

>>    QIOw timer Timeout procedure called, cancelling I/O

  We are in the timer's AST routine here, meaning it took 5 seconds
  (maybe!!!), so we SYS$CANCEL the connect QIOW.

>>    Cancelling QIOw timer  (status = 1, iosb[0] = 44)

  The QIOW has returned, but neither its status nor its IOSB[0] value is
  SS$_CANCEL, so we assume that the timer is still running and do a
  SYS$CANTIM on it...

  Now, IOSB[0] is 44, meaning the QIOW completed with SS$_ABORT. This
  normally happens when there is a problem with the terminal server (the
  port has hung up or something). Is there any chance of the customer
  using TSM or NCP to connect to the terminal server and doing a SHOW USER
  to see if something has grabbed the port on the server?

  The message I would have expected here if the QIOW terminated because of
  the SYS$CANCEL is an IOSB of SS$_CANCEL and a status of SS$_CANCEL
  resulting in debug output saying something like ...

QIOw was cancelled  (status = xx, iosb[0] = xx)
    Resetting status to SS$_TIMEOUT
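The completion logic described above can be sketched in Python as a rough, non-VMS illustration. Everything here is an assumption: `threading.Timer` stands in for SYS$SETIMR and its AST, a `threading.Event` for the SYS$CANCEL request, and `guarded_open` and `fake_lat_connect` are hypothetical names, not PCM code.

```python
import threading
from enum import Enum


class Status(Enum):
    # Symbolic stand-ins for the VMS status codes named in this note
    # (numeric values deliberately not modelled).
    NORMAL = "SS$_NORMAL"
    ABORT = "SS$_ABORT"
    CANCEL = "SS$_CANCEL"
    TIMEOUT = "SS$_TIMEOUT"


def guarded_open(connect, timeout_s):
    """Run a blocking connect under a watchdog timer (illustrative sketch)."""
    cancel_requested = threading.Event()
    # Stands in for SYS$SETIMR plus an AST routine that issues SYS$CANCEL.
    watchdog = threading.Timer(timeout_s, cancel_requested.set)
    watchdog.start()
    status = connect(cancel_requested)  # stands in for the connect QIOW
    if status is Status.CANCEL:
        # Our own cancel ended the I/O, so report it as a timeout.
        return Status.TIMEOUT
    # Any other completion: assume the timer may still be pending,
    # cancel it (the SYS$CANTIM step) and pass the status through.
    watchdog.cancel()
    return status


def fake_lat_connect(delay_s, status_when_cancelled=Status.CANCEL):
    """Hypothetical connect that takes delay_s unless cancelled first."""
    def connect(cancel_requested):
        if cancel_requested.wait(delay_s):
            return status_when_cancelled
        return Status.NORMAL
    return connect


if __name__ == "__main__":
    print(guarded_open(fake_lat_connect(0.1), 1.0))   # Status.NORMAL
    print(guarded_open(fake_lat_connect(2.0), 0.2))   # Status.TIMEOUT
    # The case in the log: the timer fired, but the I/O ended with ABORT.
    print(guarded_open(fake_lat_connect(2.0, Status.ABORT), 0.2))  # Status.ABORT
```

The third call mirrors the log in .2: the watchdog fired, but the I/O came back with SS$_ABORT rather than SS$_CANCEL, so the code takes the "timer still running" branch and the timeout is never reported as such.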

  Now, it could be that the timer completed prematurely because we don't
  use an event flag on it (we have had problems like this before). So I
  have added an event flag to the SYS$SETIMR call. This change will be in
  the FT ECO kit, which we will release on Monday; it was to be today, but
  we have had quite a busy week. Can your customer try this ECO kit to see
  if it fixes their problems? If it doesn't, I will tell you how to increase
  the 5-second timer and we will see if that makes a difference.

Cheers,
Phil


 
865.6. by 29067::BUTTERWORTH (Gun Control is a steady hand.) Fri Jul 14 1995 20:45 (12 lines)
    >  Now, it could be that the timer was completed prematurely because we
    >dont use an event flag on it (we have had problems like this before) so,
    >what
    
    Phil,
      The event flag is *irrelevant* to the actual firing of the timer. If you
    specify 5 seconds, you'll get 5 seconds unless someone does a SET TIME
    command. Period - the end. The event flag is set when the timer
    expires.
    
    Regs,
      Dan
865.7. by OPG::PHILIP (And through the square window...) Sat Jul 15 1995 16:19 (8 lines)
Dan,

  In which case I don't know what is happening here, except that the timer
  did fire, meaning the attempt to connect to the server took at least 5
  seconds. This would indicate either a LAT or a terminal server problem to me.

Cheers,
Phil
865.8. "ou est le patch?" ("where is the patch?") by SNOOTY::HAWLEYI (Mr Flibble says: Game over boys) Mon Jul 17 1995 11:50 (23 lines)
    
    Phil,
    
    But does this explain why it works in 1.5A and not in 1.6?
    The LAT/terminal server setup is the same.
    
    We have connected to the terminal server and done a SHOW USER, and
    there's NOTHING with any hold on the port.
    
    Everything seems to point to a change in operation in 1.6 that is
    incompatible with my customers setup.
    
    I'd like to put the ECO patch on, but the customer tells me that he is
    not normally allowed to put FT software on the system, although we may
    be able to make an exception. Where is the kit?
    
    Also, if you could tell me how to change the 5-second timer, I would be
    very grateful, as this would be a lot simpler and we are running out of
    time on this one.
    
    Thanks for all your help,
    
    Ian Hawley.
865.9. by OPG::PHILIP (And through the square window...) Mon Jul 17 1995 14:22 (13 lines)
Ian,

  The patch kit isn't ready yet; sometime today or tomorrow, we hope.

  In the character cell editor, type "SET HIDDEN"; what you want to
  change is the value of "Console Open Timeout".

  Please remember, if you or your customer reports a problem and
  these hidden values have been changed WITHOUT A VERY VERY GOOD
  REASON then you are on your own.

Cheers,
Phil
865.10. "fixed" by SNOOTY::HAWLEYI (Mr Flibble says: Game over boys) Mon Jul 17 1995 18:00 (12 lines)
    
    Philip,
    
    Increasing the "Console Open Timeout" value has fixed the problem!
    So, testing continues...!
    
    I still can't see why it works under 1.5 but not under 1.6, but I
    guess mine is not to reason why!
    
    Thanks.
    
    Ian.
865.11. by OPG::PHILIP (And through the square window...) Mon Jul 17 1995 18:23 (11 lines)
Hmm,

  It would be better if we understood this a little more. We chose 5 seconds
  because we figured you would need a pretty bad network for it to
  take that long to open the LAT connection. I would still be inclined to
  take a close look at the customer's LAN to see why it's taking so long to
  open. Looking back at the code, it would appear that it worked in V1.5
  because this timer wasn't implemented for LAT in that version!

Cheers,
Phil
865.12. "lan probs" by SNOOTY::HAWLEYI (Mr Flibble says: Game over boys) Tue Jul 18 1995 11:48 (12 lines)
    
    Well, due to the way their network is set up, it should take a little
    longer for it to establish a connection (dual Ethernet = twice the
    work?). It takes more than 10 seconds in reality. I'm trying to suggest
    to the customer that he has a network problem. However, he is happy
    with the fix (it's set to 20 seconds). Whatever, it's definitely not a
    PCM problem. Let's hope that, now he can test 1.6 properly, he doesn't
    have a repeat of the console extract problems that plagued him in 1.5A!
    
    Thanks,
    
    Ian.
865.13. by OPG::PHILIP (And through the square window...) Tue Jul 18 1995 13:23 (10 lines)
Ian,

  Your customer should be made aware that he could wait up to 16 * 20
  (320) seconds before his child controllers are up and running properly.
  Nearly 5 and a half minutes is an awfully long time, during which he won't
  be able to do ANYTHING on any of the systems' consoles because the daemon
  won't be ready to accept connects!!!!!!
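The 320-second figure is simple multiplication. Assuming, as 16 * 20 implies, that 16 console opens can each wait out the full timeout one after another (what the 16 counts is our reading; the note does not say), a quick Python check compares the 5-second default with the customer's 20 seconds. `worst_case_startup_s` is a hypothetical helper:

```python
def worst_case_startup_s(serial_opens, open_timeout_s):
    """Upper bound if each open waits the full timeout, one after another."""
    return serial_opens * open_timeout_s


for timeout in (5, 20):  # the shipped default and the customer's setting
    total = worst_case_startup_s(16, timeout)
    minutes, seconds = divmod(total, 60)
    print(f"timeout {timeout:2d} s -> up to {total} s ({minutes} min {seconds} s)")
```

With 20 seconds this prints up to 320 s (5 min 20 s), against 80 s (1 min 20 s) with the default, which is why raising the timeout is not free.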

Cheers,
Phil
865.14. "5½ minutes!!!" by SNOOTY::HAWLEYI (Mr Flibble says: Game over boys) Tue Jul 18 1995 18:23 (11 lines)
    
    Philip,
    
    UUUUuuuuuuuuuurgh!
    
    I'll tell him. I'm not very conversant with communications so I can't
    suggest where the problem may lie but we will work something out.
    
    Thanks for all your help,
    
    Ian.
865.15. "multiple LAT Links!" by 60549::SIMMONDS (Universe of Indifference) Mon Mar 11 1996 23:59 (13 lines)
    Re: .*
    
    There is definitely a case for a longer default Console Open Timeout
    value: the configuration in .0 matches the one my customer is using,
    and we too saw the failure to connect to any terminal server ports on
    servers connected to LAT links other than the default (LAT$LINK).
    Obviously the additional time is taken by LTDRIVER/LATACP trying to
    reach the server via each LAT link in turn.
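That explanation suggests a back-of-the-envelope sizing rule for Console Open Timeout. A Python sketch under loudly stated assumptions: the links are tried strictly one after another, each link that cannot reach the server costs a fixed time, and the per-link and connect times below are made up for illustration, not measured. Both function names are hypothetical.

```python
def estimated_open_s(links_tried_first, per_link_s, connect_s):
    """Rough LAT open time if each LAT link is tried in turn (assumed model).

    links_tried_first -- links tried before the one that reaches the server
    per_link_s        -- time spent on each link that cannot reach it
    connect_s         -- time to connect once the right link is used
    """
    return links_tried_first * per_link_s + connect_s


def timeout_is_enough(open_timeout_s, links_tried_first, per_link_s, connect_s):
    """True if the open timeout exceeds the estimated open time."""
    return open_timeout_s > estimated_open_s(links_tried_first, per_link_s, connect_s)


# Hypothetical figures: LAT$LINK on the main LAN is tried first and the
# DECserver sits on the hidden LAN behind the second link.
for timeout in (5, 20):
    print(timeout, timeout_is_enough(timeout, links_tried_first=1, per_link_s=8, connect_s=2))
```

With these invented figures the estimate is 10 s, so the 5-second default fails and 20 seconds passes; real values would have to be measured on the customer's LAN.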
    
    Where should I enter a QAR for this?
    
    Thanks,
    John.
865.16. by CSC32::BUTTERWORTH (Gun Control is a steady hand.) Tue Mar 12 1996 12:17 (7 lines)
    John,
      No QARs for released versions, so please IPMT this. What's the maximum
    value you have found necessary?
    
    Regards,
       Dan
    
865.17. "Temp. workaround" by 16660::ADKINS Tue Mar 12 1996 15:02 (8 lines)
    Well, one quick but sleazy workaround I've found is to define a service
    on the server. I was getting the timeout problem, but after defining
    a service on the server, my connections came up quickly. It looks like
    the service broadcast enters the server node information (LAT link and
    address) in the LAT database.
    
    Jim Adkins
    
865.18. by CSC32::BUTTERWORTH (Gun Control is a steady hand.) Wed Mar 13 1996 12:23 (3 lines)
    That's a great tip, Jim. Thanks much!!
    
    Dan