T.R | Title | User | Personal Name | Date | Lines |
---|
865.1 | | OPG::PHILIP | And through the square window... | Thu Jul 13 1995 15:54 | 17 |
| Ian,
Can you do the following...
1) Shut PCM down
2) Define/Sys Console$Debug "TERMINAL"
3) Define/Sys Console$Debug_Level 6144
4) Start up PCM V1.6
When the errors have occurred, shut PCM down and post one of the
controller_nn.log files here.
Cheers,
Phil
|
865.2 | info | SNOOTY::HAWLEYI | Mr Flibble says: Game over boys | Thu Jul 13 1995 18:25 | 65 |
|
Phil,
Here's the info:
Author: Ian G Strachan, VSS, BCO
Date: 13-Jul-1995
Posted-date: 12-Jul-1995
$ set noon
$ save_ver = f$verify (0)
$ EXIT
$ !
$ ! Start a Child Controller process, name_num 1, child_num 1
$ !
$ CHILD :== $CONSOLE$IMAGE:CONSOLE$DAEMON.EXE
$ CHILD "child" 1
POLYCENTER Console Manager
Console Controller Daemon Version V1.6-100
Copyright (c) 1995 Digital Equipment Corporation. All Rights Reserved
SYS$ASSIGN - Assigning Channel to LAT Device.
Attempting to Map Lat Terminal Start
Cancelling QIOw timer (status = 1, iosb[0] = 1)
Attempting to Map Lat Terminal End
Attempting connect to Lat Terminal
QIOw timer Timeout procedure called, cancelling I/O
Cancelling QIOw timer (status = 1, iosb[0] = 44)
Connected to Lat Terminal, Status = 1
iosb status was not normal value was <44>
CMTerminalGetErrorMessages - Code is : -190
CMTerminalGetErrorMessages - Errno_Val is : 44
CMTerminalGetErrorMessages - Transport is : 1
Deleting LAT port
SYS$DASSGN - Deassigning Channel from LAT terminal->chan in Close.
SYS$ASSIGN - Assigning Channel to LAT Device.
Attempting to Map Lat Terminal Start
Cancelling QIOw timer (status = 1, iosb[0] = 1)
Attempting to Map Lat Terminal End
Attempting connect to Lat Terminal
QIOw timer Timeout procedure called, cancelling I/O
Cancelling QIOw timer (status = 1, iosb[0] = 44)
Connected to Lat Terminal, Status = 1
iosb status was not normal value was <44>
CMTerminalGetErrorMessages - Code is : -190
CMTerminalGetErrorMessages - Errno_Val is : 44
CMTerminalGetErrorMessages - Transport is : 1
Deleting LAT port
SYS$DASSGN - Deassigning Channel from LAT terminal->chan in Close.
SYS$ASSIGN - Assigning Channel to LAT Device.
Attempting to Map Lat Terminal Start
Cancelling QIOw timer (status = 1, iosb[0] = 1)
Attempting to Map Lat Terminal End
Attempting connect to Lat Terminal
QIOw timer Timeout procedure called, cancelling I/O
Cancelling QIOw timer (status = 1, iosb[0] = 44)
Connected to Lat Terminal, Status = 1
iosb status was not normal value was <44>
CMTerminalGetErrorMessages - Code is : -190
CMTerminalGetErrorMessages - Errno_Val is : 44
CMTerminalGetErrorMessages - Transport is : 1
Deleting LAT port
SYS$DASSGN - Deassigning Channel from LAT terminal->chan in Close.
...repeat to fade!...
|
865.3 | | OPG::PHILIP | And through the square window... | Thu Jul 13 1995 18:45 | 25 |
| Ian,
>> SYS$ASSIGN - Assigning Channel to LAT Device.
>> Attempting to Map Lat Terminal Start
>> Cancelling QIOw timer (status = 1, iosb[0] = 1)
>> Attempting to Map Lat Terminal End
>> Attempting connect to Lat Terminal
>> QIOw timer Timeout procedure called, cancelling I/O
>> Cancelling QIOw timer (status = 1, iosb[0] = 44)
>> Connected to Lat Terminal, Status = 1
It would appear that we stalled trying to open the LTA
device because our 5 second timer went off!!
Now, the question is: why did our QIOW connect to the LAT device
stall for so long? The status of 44 (SS$_ABORT) returned
when we did the cancel is normal because we did the abort
ourselves.
Question: when you did a "set host/lat", how long did it take
to actually connect?
Cheers,
Phil
|
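The pattern described in .3 (arm a timer, issue a blocking connect, and cancel the connect if the timer fires first) can be sketched in Python. This is only an illustration of the idea, not PCM's code: `open_with_timeout`, `slow_connect`, and the result strings are hypothetical names, and `threading.Timer` stands in for SYS$SETIMR with an AST routine.

```python
import threading


def open_with_timeout(connect, timeout_s):
    """Run connect(cancel_event) under a watchdog timer.

    Returns (result, timed_out). Hypothetical helper for illustration only.
    """
    cancel = threading.Event()   # set when the pending connect should give up
    fired = threading.Event()    # set when the watchdog timer actually expired

    def on_timeout():
        # Plays the role of the timer routine: record that we fired,
        # then cancel the outstanding connect.
        fired.set()
        cancel.set()

    timer = threading.Timer(timeout_s, on_timeout)
    timer.start()
    try:
        result = connect(cancel)  # blocks until connected or cancelled
    finally:
        timer.cancel()            # harmless if the timer has already fired
    return result, fired.is_set()


def slow_connect(delay_s):
    """Build a fake connect that needs delay_s seconds to succeed."""
    def connect(cancel):
        # Event.wait returns True if the cancel arrived before delay_s elapsed.
        if cancel.wait(delay_s):
            return "aborted"
        return "connected"
    return connect
```

With a connect that needs longer than the timeout, the sketch reports an aborted open with `timed_out` set, which is the situation the log in .2 appears to show.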
865.4 | change in 1.6? | SNOOTY::HAWLEYI | Mr Flibble says: Game over boys | Fri Jul 14 1995 16:18 | 10 |
|
Phil,
Is this a big change in 1.6?
Can this be changed so that it allows more time?
My customer doesn't think it takes 5 secs to establish a connection...
but if it works under 1.5A, how come it doesn't work under 1.6?
Ian.
|
865.5 | | OPG::PHILIP | And through the square window... | Fri Jul 14 1995 18:11 | 51 |
| Ian,
Looking a little more closely at the log output, it would appear
something weird is happening...
>> SYS$ASSIGN - Assigning Channel to LAT Device.
>> Attempting to Map Lat Terminal Start
>> Cancelling QIOw timer (status = 1, iosb[0] = 1)
>> Attempting to Map Lat Terminal End
>> Attempting connect to Lat Terminal
The above has done the QIOW to connect to the LAT device. Before
we did this QIO we called SYS$SETIMR for 5 seconds.
>> QIOw timer Timeout procedure called, cancelling I/O
We are in the timer's AST routine here, meaning it took 5 seconds
(maybe!!!), so we SYS$CANCEL the QIOW for the connect.
>> Cancelling QIOw timer (status = 1, iosb[0] = 44)
The QIOW has returned, but neither its status nor IOSB[1] value is
SS$_CANCEL, so we assume that the timer is still running and do a
SYS$CANTIM on it...
Now the IOSB[0] is 44, meaning the QIOW completed with SS$_ABORT. This
normally happens when there is a problem with the terminal server (the
port has hung up or something). Is there any chance of the customer
using TSM or NCP to connect to the terminal server and doing a SHOW USER
to see if something has grabbed the port on the server?
The message I would have expected here if the QIOW terminated because of
the SYS$CANCEL is an IOSB of SS$_CANCEL and a status of SS$_CANCEL
resulting in debug output saying something like ...
QIOw was cancelled (status = xx, iosb[0] = xx)
Resetting status to SS$_TIMEOUT
Now, it could be that the timer was completed prematurely because we don't
use an event flag on it (we have had problems like this before). So what
I have done is add an event flag to the SYS$SETIMR call. This change will
be in the FT ECO kit which we will release on Monday; it was to be today,
but we have had quite a busy week. Can your customer try this ECO kit to see
if it fixes their problems? If it doesn't, then I will tell you how to increase
the 5 second timer and we will see if that makes a difference.
Cheers,
Phil
|
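The decision that .5 describes after the QIOW returns can be written out as a small Python sketch. It is a reading of Phil's description, not the daemon's source: the function name `after_qiow` and the outcome labels are hypothetical, the values 1 and 44 come from the log, and the numeric values used for SS$_CANCEL and SS$_TIMEOUT are assumptions that should be checked against the system's own definitions.

```python
SS_NORMAL = 1      # "status = 1" in the log above
SS_ABORT = 44      # SS$_ABORT, as stated in the thread
SS_CANCEL = 2096   # ASSUMED numeric value for SS$_CANCEL; verify before relying on it
SS_TIMEOUT = 556   # ASSUMED numeric value for SS$_TIMEOUT; verify before relying on it


def after_qiow(status, iosb0):
    """Return (outcome, debug_lines) for a completed connect QIOW.

    outcome is a hypothetical label: "timeout", "failed" or "ok".
    """
    lines = []
    if status == SS_CANCEL or iosb0 == SS_CANCEL:
        # The I/O ended because our own timer routine cancelled it.
        lines.append(f"QIOw was cancelled (status = {status}, iosb[0] = {iosb0})")
        lines.append("Resetting status to SS$_TIMEOUT")
        return "timeout", lines
    # Otherwise the timer is assumed to be still running, so it is cancelled.
    lines.append(f"Cancelling QIOw timer (status = {status}, iosb[0] = {iosb0})")
    if iosb0 != SS_NORMAL:
        lines.append(f"iosb status was not normal value was <{iosb0}>")
        return "failed", lines
    return "ok", lines
```

Feeding in the values from the log (`after_qiow(1, 44)`) takes the "timer still running" branch and then reports the abnormal IOSB, matching the two debug lines seen in .2 rather than the "QIOw was cancelled" output Phil expected.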
865.6 | | 29067::BUTTERWORTH | Gun Control is a steady hand. | Fri Jul 14 1995 20:45 | 12 |
| > Now, it could be that the timer was completed prematurely because we
>dont use an event flag on it (we have had problems like this before) so,
>what
Phil,
The event flag is *irrelevant* to the actual firing of the timer. If you
specify 5 seconds, you'll get 5 seconds unless someone does a SET TIME
command. Period - the end. The event flag is set when the timer
expires.
Regs,
Dan
|
865.7 | | OPG::PHILIP | And through the square window... | Sat Jul 15 1995 16:19 | 8 |
| Dan,
In which case I don't know what is happening here, except that the timer did
fire, meaning it took at least 5 seconds to try the connect to the server.
This would indicate either a LAT or terminal server problem to me.
Cheers,
Phil
|
865.8 | ou est le patch? | SNOOTY::HAWLEYI | Mr Flibble says: Game over boys | Mon Jul 17 1995 11:50 | 23 |
|
Phil,
But does this explain why it works in 1.5A and not in 1.6?
The LAT/terminal server setup is the same.
We have connected to the terminal server and done a SHOW USER and
there's NOTHING with any hold on the port.
Everything seems to point to a change in operation in 1.6 that is
incompatible with my customer's setup.
I'd like to put the ECO patch on but the customer tells me that he is
not allowed to put FT software on the system normally, but we may be
able to make an exception. Where is the kit?
Also, if you could tell me how to change the 5 second timer I would be
very grateful as this would be a lot simpler and we are running out of
time on this one.
Thanks for all your help,
Ian Hawley.
|
865.9 | | OPG::PHILIP | And through the square window... | Mon Jul 17 1995 14:22 | 13 |
| Ian,
The patch kit isn't ready yet; sometime today or tomorrow, we hope.
In the character cell editor, type "SET HIDDEN". What you want to
change is the value of "Console Open Timeout".
Please remember, if you or your customer reports a problem and
these hidden values have been changed WITHOUT A VERY VERY GOOD
REASON then you are on your own.
Cheers,
Phil
|
865.10 | fixed | SNOOTY::HAWLEYI | Mr Flibble says: Game over boys | Mon Jul 17 1995 18:00 | 12 |
|
Philip,
Increasing the "Console Open Timeout" value has fixed the problem!
So, testing continues...!
I still can't see why it works under 1.5 but not under 1.6, but I
guess mine is not to reason why!
Thanks.
Ian.
|
865.11 | | OPG::PHILIP | And through the square window... | Mon Jul 17 1995 18:23 | 11 |
| Hmm,
It would be better if we understood this a little more. We chose 5 seconds
as we figured that you would need to have a pretty bad network for it to
take that long to open the LAT connection. I would still be inclined to
have a close look at the customer's LAN to see why it's taking so long to
open. Looking back at the code, it would appear that it worked in V1.5
because this timer wasn't implemented for LAT in that version!
Cheers,
Phil
|
865.12 | lan probs | SNOOTY::HAWLEYI | Mr Flibble says: Game over boys | Tue Jul 18 1995 11:48 | 12 |
|
Well, due to the way their network is set up it should take a little
longer for it to establish a connection (dual ethernet = twice the
work?). It takes more than 10 seconds in reality. I'm trying to suggest
to the customer that he has a network problem. However, he is happy
with the fix (it's set to 20 seconds). Whatever, it's definitely not a
PCM problem. Let's hope that, now he can test 1.6 properly, he doesn't
have a repeat of the console extract problems that plagued him in 1.5A!
Thanks,
Ian.
|
865.13 | | OPG::PHILIP | And through the square window... | Tue Jul 18 1995 13:23 | 10 |
| Ian,
Your customer should be made aware that he could wait up to 16 * 20
(320) seconds before his child controllers are up and running properly.
Nearly 5 and a half minutes is an awfully long time during which he won't
be able to do ANYTHING on any of the systems' consoles because the daemon
won't be ready to accept connects!!!!!!
Cheers,
Phil
|
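The arithmetic in .13 is easy to check. The sketch below assumes, as the "16 * 20" figure suggests, that up to 16 opens are attempted one after another and that each may run to the full timeout; the function name is hypothetical.

```python
def worst_case_startup_seconds(opens, open_timeout_s):
    """Worst case if every open runs to its full timeout, one after another."""
    return opens * open_timeout_s


total = worst_case_startup_seconds(16, 20)
minutes, seconds = divmod(total, 60)
print(f"{total} s = {minutes} min {seconds} s")  # prints "320 s = 5 min 20 s"
```

Under the same assumption, the 5 second value Phil mentions as the original choice gives 16 * 5 = 80 seconds.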
865.14 | 5½ minutes!!! | SNOOTY::HAWLEYI | Mr Flibble says: Game over boys | Tue Jul 18 1995 18:23 | 11 |
|
Philip,
UUUUuuuuuuuuuurgh!
I'll tell him. I'm not very conversant with communications so I can't
suggest where the problem may lie but we will work something out.
Thanks for all your help,
Ian.
|
865.15 | multiple LAT Links! | 60549::SIMMONDS | Universe of Indifference | Mon Mar 11 1996 23:59 | 13 |
| Re: .*
There is definitely a case for a longer default interval for the
Console Open Timeout value : the configuration in .0 matches the
one that my Customer is using and we too saw the failure to connect to
any terminal server ports on servers connected to LAT LINKs other than
the default (LAT$LINK).. obviously the additional time is taken by
LTDRIVER/LATACP trying to reach the server via each LAT link in turn..
Where should I enter a QAR for this?
Thanks,
John.
|
865.16 | | CSC32::BUTTERWORTH | Gun Control is a steady hand. | Tue Mar 12 1996 12:17 | 7 |
| John,
No QAR for released versions, so please IPMT this. What's the maximum
value you have found necessary?
Regards,
Dan
|
865.17 | Temp. workaround | 16660::ADKINS | | Tue Mar 12 1996 15:02 | 8 |
| Well, one quick but sleazy workaround I've found is to define a service
on the server. I was getting the timeout problem, but after defining
a service on the server, my connections came up quickly. It looks like
the service broadcast enters the server node information (LAT link and
address) in the LAT database.
Jim Adkins
|
865.18 | | CSC32::BUTTERWORTH | Gun Control is a steady hand. | Wed Mar 13 1996 12:23 | 3 |
| That's a great tip, Jim. Thanks much!!
Dan
|