Conference lassie::ucx

Title: DEC TCP/IP Services for OpenVMS
Notice: Note 2-SSB Kits, 3-FT Kits, 4-Patch Info, 7-QAR System
Moderator: ucxaxp.ucx.lkg.dec.com::TIBBERT
Created: Thu Nov 17 1994
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 5568
Total number of notes: 21492

5293.0. "Secondary DNS load balancing failed after reload" by GIDDAY::TAN () Mon Mar 03 1997 19:19

	Strange problem with dynamic load balancing with UCX V4.1 ECO#4!

	One of our in-house systems has been configured as a BIND server,
with dynamic load balancing on two cluster aliases. This server is a secondary
DNS server, downloading host files from a UNIX primary server; the aim is to
eventually make this UCX system the primary DNS server.

	When BIND first started, the two cluster aliases behaved exactly as
expected: the highest-rated hosts were accessed. This worked until the system
received a host file update from the UNIX primary server; after that, only one
of the cluster aliases worked, i.e. access to the second cluster alias always
gets put through to the host which is also the old-style cluster impersonator
node.

	To illustrate:

	2 cluster aliases, each with 3 hosts and an impersonator:

	snofs2.sno.dec.com   -	16.153.0.8   (snofs2) *impersonator
				16.153.0.72  (snov27)
				16.153.0.75  (snov19)
				16.153.0.74  (snov15)

	bbq.sno.dec.com	     -	16.153.0.93  (bbq)    *impersonator
				16.153.0.198 (dwspr)
				16.153.0.249 (snov09)
				16.153.0.151 (brews)


ucx show config bind:

Cluster:      bbq.sno.dec.com

Cluster:      snofs2.sno.dec.com

Cluster:      bbq

Cluster:      snofs2

Secondary
  Domain:     SNO.DEC.COM                   File: SNO_DEC_COM.DB
  Host:       16.153.0.76

Secondary
  Domain:     0.153.16.IN-ADDR.ARPA         File: 0_153_16_IN-ADDR.ARPA
  Host:       16.153.0.76

Cache
  Domain:     .                             File: NAMED.CA




and in SNO_DEC_COM.DB:


bbq             IN      A       16.153.0.93
                IN      A       16.153.0.249
                IN      A       16.153.0.198
                IN      A       16.153.0.151
snofs2          IN      A       16.153.0.8
                IN      A       16.153.0.72
                IN      A       16.153.0.75
                IN      A       16.153.0.74
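
For readers unfamiliar with the zone file layout above: a line that starts
with white space has no owner field and inherits the owner of the previous
record, so each alias really carries four A records. A minimal sketch of that
rule in Python follows; parse_a_records and ZONE are names made up for this
illustration, and the parser only handles plain "IN A" lines like the ones
shown (no TTLs, $ORIGIN, or other record types).

```python
ZONE = """\
bbq             IN      A       16.153.0.93
                IN      A       16.153.0.249
                IN      A       16.153.0.198
                IN      A       16.153.0.151
snofs2          IN      A       16.153.0.8
                IN      A       16.153.0.72
                IN      A       16.153.0.75
                IN      A       16.153.0.74
"""


def parse_a_records(zone_text):
    """Return {owner: [addresses]} for simple 'IN A' lines, in file order."""
    records = {}
    owner = None
    for line in zone_text.splitlines():
        line = line.split(";", 1)[0]      # drop any trailing comment
        if not line.strip():
            continue
        fields = line.split()
        # A line that begins with white space inherits the previous owner.
        if not line[0].isspace():
            owner = fields.pop(0)
        if owner is not None and len(fields) == 3 and fields[:2] == ["IN", "A"]:
            records.setdefault(owner, []).append(fields[2])
    return records


if __name__ == "__main__":
    for name, addrs in parse_a_records(ZONE).items():
        print(name, addrs)
```

The file order for bbq (93, 249, 198, 151) is worth keeping in mind when
reading the show host listings further down.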

$ mc ucx$metricview
Host                                                    Rating
----                                                    ------
16.153.0.72     snov27.sno.dec.com                        65
16.153.0.74     snov15.sno.dec.com                        37
16.153.0.75     snov19.sno.dec.com                        28
16.153.0.198    dwspr.sno.dec.com                         32
16.153.0.151    brews.sno.dec.com                         30

Ignore 16.153.0.249, which doesn't have the metric server running.

The ucx$bind_startup.log file showed the following:

UCX BIND Server Debug message -- Tue Mar  4 09:59:25 1997
LB: updating highest rating

CLUSTER HOST - RATING METRIC TABLE
CLUSTER_HOST:  8009910 RATING_METRIC: 999999    16.153.0.8      snofs2*
CLUSTER_HOST: 48009910 RATING_METRIC: 65        16.153.0.72     snov27
CLUSTER_HOST: 4b009910 RATING_METRIC: 29        16.153.0.75     snov19
CLUSTER_HOST: 4a009910 RATING_METRIC: 39        16.153.0.74     snov15
CLUSTER_HOST: 5d009910 RATING_METRIC: 999999    16.153.0.93     bbq*
CLUSTER_HOST: f9009910 RATING_METRIC: 999999    16.153.0.249    snov09
CLUSTER_HOST: c6009910 RATING_METRIC: 31        16.153.0.198    dwspr
CLUSTER_HOST: 97009910 RATING_METRIC: 30        16.153.0.151    brews
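
The CLUSTER_HOST values in the table line up with the addresses next to them
if the hex number is read as the 4-byte address printed as a little-endian
integer (8009910 is 08 00 99 10, i.e. 16.153.0.8 read from the low byte up).
That reading is inferred from the table itself, not from UCX documentation.
A small Python sketch, with cluster_host_to_ip as a hypothetical helper name,
makes it easy to put names on the entries in the later log extract, which
carries no addresses:

```python
import socket
import struct


def cluster_host_to_ip(hex_value):
    """Decode a CLUSTER_HOST hex value from the debug log to dotted-quad form.

    Assumption: the log prints the network-order address as a little-endian
    32-bit integer, so the low-order byte is the first octet.
    """
    value = int(hex_value, 16)
    # Packing little-endian restores the original in-memory byte order.
    return socket.inet_ntoa(struct.pack("<I", value))


if __name__ == "__main__":
    for entry in ("5d009910", "f9009910", "c6009910", "97009910"):
        print(entry, cluster_host_to_ip(entry))
```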

So far so good, and everything appeared to be in order, but a ucx> show host
bbq.sno.dec.com/server=ucx-dns showed that bbq is at the top of the list!
(The snofs2 cluster alias worked, however, at all times.)

     BIND database

Server:   16.153.0.190     snoman.sno.dec.com

Host address    Host name

16.153.0.93     BBQ.SNO.DEC.COM    <--- impersonator node, rating of 999999
16.153.0.249    BBQ.SNO.DEC.COM
16.153.0.198    BBQ.SNO.DEC.COM    <--- this should have been top of the list!
16.153.0.151    BBQ.SNO.DEC.COM


Note that this only happened after a database reload from the primary server;
if we then stopped the ucx$bind process, the above command worked again
(until the next reload):

Ripper_> ucx show host bbq.sno.dec.com/server=snoman.sno.dec.com

     BIND database

Server:   16.153.0.190     snoman.sno.dec.com

Host address    Host name

16.153.0.198    BBQ.SNO.DEC.COM
16.153.0.93     BBQ.SNO.DEC.COM
16.153.0.249    BBQ.SNO.DEC.COM
16.153.0.151    BBQ.SNO.DEC.COM

but the bind_startup.log showed more or less the same thing:

UCX BIND Server Debug message -- Tue Mar  4 10:11:39 1997
LB: updating highest rating

CLUSTER HOST - RATING METRIC TABLE
CLUSTER_HOST: 5d009910 RATING_METRIC: 999999
CLUSTER_HOST: f9009910 RATING_METRIC: 999999
CLUSTER_HOST: c6009910 RATING_METRIC: 29
CLUSTER_HOST: 97009910 RATING_METRIC: 28


It would appear, therefore, that when one queries the BIND server for a cluster
alias, the DNS server gets a metric update, as evidenced by the BIND log file,
but somehow this is not conveyed to the resolver. This appears to happen only
with the second cluster alias; the first one always works.
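
To make the expected behaviour concrete, here is a rough Python sketch of the
ordering rule as it can be inferred from the two listings in this note: the
address with the highest real rating moves to the top and the rest stay in
zone file order. This is a guess from the observed output, not the actual UCX
algorithm; expected_order is a hypothetical name, and treating 999999 as a
"no usable metric" marker is likewise an assumption.

```python
NO_METRIC = 999999  # assumed marker for hosts without a usable rating


def expected_order(addresses, ratings):
    """Move the highest-rated address to the front, keep the rest in place."""
    rated = [a for a in addresses if ratings.get(a, NO_METRIC) != NO_METRIC]
    if not rated:
        # Nothing to balance on: return the zone file order unchanged.
        return list(addresses)
    best = max(rated, key=lambda a: ratings[a])
    return [best] + [a for a in addresses if a != best]


if __name__ == "__main__":
    # bbq addresses in zone file order, ratings from the 10:11:39 log extract.
    bbq = ["16.153.0.93", "16.153.0.249", "16.153.0.198", "16.153.0.151"]
    ratings = {
        "16.153.0.93": 999999,
        "16.153.0.249": 999999,
        "16.153.0.198": 29,
        "16.153.0.151": 28,
    }
    print(expected_order(bbq, ratings))
```

With the ratings from the second log extract this reproduces the good listing
(198, 93, 249, 151). The failing listing is simply the zone file order, which
is what one would see if the rating were never applied to the bbq records.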

I have UCX$BIND_METRIC_DBG_LEVE defined (as 1), but no metric log was created.

This problem is delaying the implementation of the UCX DNS server, which is
meant to take advantage of dynamic load balancing.

Any comments welcome.

/David, Sydney CSC


                           