
Conference help::decnet-osi_for_vms

Title:DECnet/OSI for OpenVMS
Moderator:TUXEDO::FONSECA
Created:Thu Feb 21 1991
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:3990
Total number of notes:19027

3879.0. "ECO-6 for V6.3 and DNS Server nodes" by COMICS::WEIR (John Weir, UK Country Support) Thu Feb 20 1997 12:27

The intention of this note is to let you know what might happen if you
install ECO-6 for V6.3 on a DNS Server node. If you then panic, you will
probably mess up your DNS namespace (as a couple of my Customers have); if
you keep calm, you can dig your way out of the situation.

As far as I know, there are no particular problems with ECO-6 on DNS Clerks or
systems with LOCAL as their primary name service. Therefore, over 99% of nodes
will probably be improved by installing ECO-6. Only DNS Server nodes may
suffer, in particular those with DECdns configured as the primary name
service.

The problem on DNS Server nodes occurs when the CDI cache entry for the Server
node itself is flushed. Several events can flush that entry, so you may want
to avoid as many of the following causes as possible!

	1. Installation of ECO-6 (cache file format is changed, so the entire
	   cache is deleted)

	2. NET$CONFIGURE options 1 or 2 rename the node and flush the node's
	   own entries

	3. Explicit use of NCL FLUSH on the node's own name or on "*"

	4. In a very large network the cache might become full and older
	   entries might be purged.

	5. During normal operation, old entries are periodically purged from
	   the CDI cache. The purge interval is set by the "Session Control
	   Naming Cache Timeout". By default, NET$CONFIGURE sets the value
	   in NET$SEARCHPATH_STARTUP.NCL to 30 days. You may edit this,
	   but if you use NET$CONFIGURE option 2 to change it, your node's
	   entry will be flushed anyway!

	6. During the boot process, between the startup of NET$ACP and the
	   execution of NET$SEARCHPATH_STARTUP, there is a 20-second window
	   during which the cache timeout takes on the hard-coded value of
	   7 days. During this window the node will almost certainly attempt
	   to look up its own name, and if its own cache entry is more than
	   7 days old it will be deleted from the CDI cache. The hard-coded
	   7-day value can be overridden by defining the logical CDI_CACHE_TTL.
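
Cause 6 amounts to a simple age check made while the 7-day value is in force. The Python sketch below models it under stated assumptions. The names `CacheEntry` and `survives_boot_lookup` are hypothetical. Entry age is treated as a single count of seconds, and the reported behaviour of negative CDI_CACHE_TTL values is not modelled. This is an illustration, not how NET$ACP is actually implemented.

```python
from dataclasses import dataclass
from typing import Optional

SECONDS_PER_DAY = 24 * 60 * 60
# Hard-coded timeout in force before NET$SEARCHPATH_STARTUP runs (7 days, per the note).
BOOT_WINDOW_TTL_SECONDS = 7 * SECONDS_PER_DAY


@dataclass
class CacheEntry:
    name: str
    age_seconds: int  # hypothetical: seconds since the entry was last refreshed


def survives_boot_lookup(entry: CacheEntry, cdi_cache_ttl: Optional[int] = None) -> bool:
    """Return True if a lookup during the boot window would keep the entry.

    cdi_cache_ttl models the CDI_CACHE_TTL logical, in seconds; None means the
    logical is not defined, so the hard-coded 7-day value applies.
    Negative values are not modelled.
    """
    ttl = BOOT_WINDOW_TTL_SECONDS if cdi_cache_ttl is None else cdi_cache_ttl
    # An entry older than the timeout is deleted when the node looks up its own name.
    return entry.age_seconds <= ttl


own_entry = CacheEntry("NS:.XYZ.FRED", age_seconds=10 * SECONDS_PER_DAY)
print(survives_boot_lookup(own_entry))                        # False: 10 days > 7 days
print(survives_boot_lookup(own_entry, 60 * SECONDS_PER_DAY))  # True: TTL raised to 60 days
```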

In summary, if you have installed ECO-6 on your DNS Server, or if you need to
install it, take the following two actions and avoid the other causes listed
above, and you will probably be OK. (Admittedly, if you are installing ECO-6
you cannot avoid cause #1!)

	1. Edit NET$SEARCHPATH_STARTUP.NCL to set the Naming Cache Timeout
	   to a large number of days.

	2. Edit SYLOGICALS.COM to include something similar to the following,
	   where the timeout is in seconds:

		$ def /sys CDI_CACHE_TTL 5184000	! 60 days
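
The CDI_CACHE_TTL value is the number of days multiplied by 86,400 seconds per day, so 60 days gives 5,184,000. The hypothetical Python helper below is not part of DECnet/OSI. It computes that value and formats a SYLOGICALS.COM line in the same style as the one shown. It refuses zero and negative inputs, because negative settings are reported to have odd side effects.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def cdi_cache_ttl_seconds(days: int) -> int:
    """Convert a timeout in days to the seconds value expected by CDI_CACHE_TTL."""
    if days <= 0:
        # Negative values reportedly have odd side effects, so refuse them here.
        raise ValueError("days must be a positive integer")
    return days * SECONDS_PER_DAY


def sylogicals_line(days: int) -> str:
    """Build a DCL DEFINE line like the one suggested for SYLOGICALS.COM."""
    return f"$ def /sys CDI_CACHE_TTL {cdi_cache_ttl_seconds(days)}\t! {days} days"


print(cdi_cache_ttl_seconds(60))  # 5184000
print(cdi_cache_ttl_seconds(90))  # 7776000
print(sylogicals_line(60))
```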

Setting CDI_CACHE_TTL to any negative value is supposed to disable the
timeout. I can't confirm that it does, although I can say that it has
strange side effects, such as making the "Naming Cache Timeout" value
non-displayable by NCL.

If you are unlucky enough to have an ECO-6 DNS Server system lose its own
CDI cache entry then you may get a variety of symptoms, mostly involving
logical link connect attempts hanging and eventually timing out. Even
"SET HOST 0" is likely to hang and then timeout.

When in this state, the problem is how to restore the node's own CDI cache
entry. Sometimes the system seems to recover spontaneously, but more often it
is necessary to carry out the following sequence. It has been used multiple
times and has succeeded both in our lab and on Customer networks:

        1. Sometimes a reboot is required. A side effect of the
           problem is that NET$ACP can consume all its virtual
           address space, and the only way out is to reboot.

        2. Use NCL to rename the node into LOCAL: (for example,
           NCL RENAME NEW NAME = LOCAL:.FRED). Do not try to be
           clever and use NET$CONFIGURE option 2, as this will
           flush the cache as fast as it renames!

        3. Use DNS$DIAG to disable node verification, to prevent
           every logical link from resulting in recursive lookups
           to DNS:

           $ mc dns$diag
           DECdns Server Diagnostics - Version V2.020   (Dec  2 1996)
           diag> disable node_verification

        4. Use DNS$DIAG to disable ACS, as your access control is
           almost certainly not set up to allow access from
           unvalidated LOCAL: connections:

           $ mc dns$diag
           DECdns Server Diagnostics - Version V2.020   (Dec  2 1996)
           diag> enable acs_override

        5. Carry out operations such as SET HOST to the proper
           *fullname* (i.e. with the namespace name, etc.):

                $ SET HOST NS:.XYZ.FRED

        6. Use NCL to rename the node back to its proper name
           and check that it's still working:

                $ NCL RENAME NEW NAME = NS:.XYZ.FRED
                $ SET HOST 0

        7. Force the CDI cache to be written to disk, using
           CDI trace to monitor CDI activity until the cache
           write occurs (15 minutes):

           $ mc ncl set ses con nam cac check int +0-08:00:00
           $ mc cdi$trace

        8. Dump the CDI cache to a file, search it for references
           to LOCAL:, and then flush them:

                $ @tt:/out=x.x
                _$ mc cdi_cache_dump
                _$ exit
                $ search x.x local
                $ mc ncl flu ses con nam cac ent "LOCAL:.FRED"

        9. Make sure the CDI cache is written to disk before the
           next reboot:

           $ mc ncl set ses con nam cac check int +0-08:00:00
           $ mc cdi$trace

        10. Restore normal DECdns access control:

           $ mc dns$diag
           DECdns Server Diagnostics - Version V2.020   (Dec  2 1996)
           diag> enable node_verification
           diag> disable acs_override
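
In step 8 the dump is read by eye. If the dump is large, the search-and-flush part could be scripted. The Python sketch below assumes the dump has been saved as text (x.x in step 8) and that each cached name appears as a plain token such as LOCAL:.FRED. The real cdi_cache_dump output format is not described here, so the pattern may need adjusting. The function name `local_flush_commands` is hypothetical. It only builds NCL command lines in the same abbreviated form as step 8 and does not execute anything.

```python
import re
from typing import List

# Assumption: LOCAL: names appear as tokens like "LOCAL:.FRED" or "LOCAL:.A.B".
LOCAL_NAME = re.compile(r"LOCAL:\.[A-Z0-9_$-]+(?:\.[A-Z0-9_$-]+)*", re.IGNORECASE)


def local_flush_commands(dump_text: str) -> List[str]:
    """Return one NCL flush command per distinct LOCAL: name found in the dump."""
    commands = []
    seen = set()
    for match in LOCAL_NAME.finditer(dump_text):
        name = match.group(0)
        # Treat names differing only in case as the same entry; keep the first spelling.
        key = name.upper()
        if key in seen:
            continue
        seen.add(key)
        commands.append(f'$ mc ncl flu ses con nam cac ent "{name}"')
    return commands


sample_dump = "Entry LOCAL:.FRED\nEntry NS:.XYZ.FRED\nentry local:.fred\n"
for command in local_flush_commands(sample_dump):
    print(command)
```

Reviewing the generated list before running anything is deliberate: a stray match should never turn into a flush of the node's proper NS: entry.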


Regards,

	John Weir
3879.1. by IAMOSI::LEUNG Wed Mar 12 1997 22:30 (23 lines)
Clarification needed:

I have a customer who needs to apply ECO6 because it contains a fix for a crash
in net$transport_nsp (NETNOSTATE), and the system is a DECdns server.

Do they have to do these steps BEFORE applying ECO6:
>
>	1. Edit the NET$SEARCHPATH_STARTUP.NCL to set the Timeout to be
>	   a large number of days.
>
>	2. Edit SYLOGICALS.COM to include something similar to the following,
>	   where the timeout is in units of 1 second
>
>		$ def /sys CDI_CACHE_TTL 5184000	! 60 days

Is there an official patch from Engineering?

If the node were to lose its own CDI cache entry, after doing the steps
mentioned (renaming, dns$diag, etc.) and rebooting, is it then OK to use
commands that will flush CDI cache entries?

Thanks
Dennis
3879.2. "Preliminary fixes, only, so far" by COMICS::WEIR (John Weir, UK Country Support) Thu Mar 13 1997 03:59 (68 lines)
	Dennis,

>Clarification needed :
>
>I have a customer who needs to apply ECO6 because it contains a fix for a crash
>in net$transport_nsp (NETNOSTATE) and the system is a DECdns server.
>
>Do they have to do these steps BEFORE applying ECO6 :
>>
>>	1. Edit the NET$SEARCHPATH_STARTUP.NCL to set the Timeout to be
>>	   a large number of days.
>>
>>	2. Edit SYLOGICALS.COM to include something similar to the following,
>>	   where the timeout is in units of 1 second
>>
>>		$ def /sys CDI_CACHE_TTL 5184000	! 60 days

	Until fixes are widely available, do the above, as you suggest.

	Note: This will not completely avoid the problem. When you
	upgrade to ECO-6 the CDI cache is deleted, so the node's own name
	is flushed along with everything else. Also, if you rename the
	node, its name will be flushed. But at least if you do the above,
	you should not suffer from periodic and possibly unexpected
	flushing of the node's own entry.

	The base note contains the full procedure for recovering, and it
	has been used on several Customer sites as well as in our lab.
	Quite often you do not have to use the full procedure, and
	sometimes things recover spontaneously. So, provided you give the
	Customer the recovery instructions, the impact of the cache entry
	being flushed is not disastrous and can be recovered from quite
	quickly. Don't panic, just be prepared (and being prepared means
	knowing how to recover quickly).

>
>Is there an official patch from Engineering?

	No, but I have successfully tested the preliminary fixes. I don't
	have authority to distribute them on Easynet (so don't ask me). If
	you need the fixes then ask Engineering or IPMT.

	The preliminary fix is to set a flag on the node's own CDI cache
	entry, and on that of its Alias, so that they do not get flushed by
	periodic cache timeouts. However, these entries can still be
	flushed by renaming the node (a bad idea on a DNS Server anyway).
	So be aware that you might still need the recovery procedure even
	with the fixes. But if you don't mess around (e.g. by renaming your
	DNS Server), then once the fixes are in you should not need the
	recovery procedure any more...
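
	The preliminary fix, as described above, behaves like a "pinned"
	flag: periodic timeout purges respect it, but an explicit flush
	(including the one implied by a rename) ignores it. The Python
	toy model below only illustrates that description. The class
	`CdiCacheModel` and its methods are hypothetical and say nothing
	about how the real fix is coded. It also shows that repeating a
	flush of an entry that is already gone changes nothing.

```python
from typing import Dict, Set

SECONDS_PER_DAY = 24 * 60 * 60


class CdiCacheModel:
    """Toy model: cached name -> age in seconds, plus a set of pinned names."""

    def __init__(self) -> None:
        self.entries: Dict[str, int] = {}
        self.pinned: Set[str] = set()

    def add(self, name: str, age_seconds: int = 0, pinned: bool = False) -> None:
        self.entries[name] = age_seconds
        if pinned:
            self.pinned.add(name)
        else:
            self.pinned.discard(name)

    def purge_expired(self, timeout_seconds: int) -> None:
        """Periodic purge: remove entries older than the timeout unless pinned."""
        for name, age in list(self.entries.items()):
            if age > timeout_seconds and name not in self.pinned:
                del self.entries[name]

    def flush(self, name: str) -> None:
        """Explicit flush (or rename): ignores the pin; a missing name is a no-op."""
        self.entries.pop(name, None)
        self.pinned.discard(name)


cache = CdiCacheModel()
cache.add("NS:.XYZ.FRED", age_seconds=40 * SECONDS_PER_DAY, pinned=True)  # node's own entry
cache.add("NS:.XYZ.OTHER", age_seconds=40 * SECONDS_PER_DAY)
cache.purge_expired(30 * SECONDS_PER_DAY)
print(sorted(cache.entries))  # ['NS:.XYZ.FRED']
cache.flush("NS:.XYZ.FRED")
cache.flush("NS:.XYZ.FRED")   # second flush changes nothing
print(cache.entries)          # {}
```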

>If the node were to lose its own CDI cache, after doing the steps mentioned
>(renaming, dns$diag, etc...) and rebooting, is it then ok to use commands that
>will flush cdi cache entries?

	I don't understand the question. Flushing the node's entry from the
	CDI cache may cause the problem, but if the node's entry has already
	been flushed, then you may repeat the flush as many times as you
	like without making the situation any worse ;-)

	i.e. flushing an entry that is already gone is a no-op.

	Regards,

		John


3879.3. "Clarification" by NNTPD::"[email protected]" (Paul Sture) Tue May 13 1997 11:51 (9 lines)
In note 3879.0, step 1 recommends setting the timeout in
NET$SEARCHPATH_STARTUP.NCL to be a "large number of days". The default appears
to be 90 days. What do you suggest for this "large number"?

TIA

Paul Sture
[Posted by WWW Notes gateway]
3879.4. "90 days should be enough (maybe)" by COMICS::WEIR (John Weir, UK Country Support) Wed May 14 1997 06:43 (16 lines)
	Paul,

>
>to be a "large number of days". The default appears to be 90 days. What do you
>suggest for this "large number"?
>

	That all depends upon how quickly you think the fix will be
	produced and how readily your Customer will upgrade ...

	Engineering originally suggested 90 days. The default in
	NET$CONFIGURE is 30 days (I think).

	John

3879.5. "Large no. of days" by NNTPD::"[email protected]" (rtoms::support_pw) Wed May 21 1997 11:33 (13 lines)
>	That all depends upon how quickly you think the fix will be
>        produced and how readily your Customer will upgrade ...
>
>        Engineering originally suggested 90 days. The default in
>        NET$CONFIGURE is 30 days (I think).

Thanks for your reply. It's for our own DNS systems in Germany, so I've got
control over applying the fix once it arrives.

Regards

Paul Sture
[Posted by WWW Notes gateway]