
Conference smurf::ase

Title:ase
Moderator:SMURF::GROSSO
Created:Thu Jul 29 1993
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2114
Total number of notes:7347

2063.0. "More asemgr/cmon/tractd problems" by DONVAN::HARRIS () Thu May 15 1997 14:15

    We're trying to put together a system that will be used to develop 
    training materials for OPS in a Production Server environment.  It 
    seems that this isn't an easy task, since we've found a lot of "gotchas" 
    that don't show up in the installation guides, but have to be tracked 
    down via notes and other resources.

    The latest is a problem with the tractd daemon.  It is somewhat similar 
    to what I've seen reported in several notes here (listed in 1765.1), yet 
    not quite the same.  Here's our setup:

    o  Two AlphaServer 2000s with I/O module upgrade to support MC
    o  Five RZ28s on the shared SCSI bus
    o  Digital UNIX V4.0B
    o  Production Server V1.4
    o  OPS V7.3.2.2.0

    Looking at all of the hints and suggestions, here's what I've done.  

    1.  Check to make sure cluster is up and running.
    2.  Attempt to use cluster monitor
    3.  Attempt to use asemgr
    4.  Checked host names for underscores
    5.  Checked to see that all hostnames are in /etc/hosts file.
    6.  Checked to see that both networks are registered in /etc/networks.
    7.  Try running '# netstat | more' to see active connections
    8.  Try running '# netstat | wc -l' to (line) count connections
    9.  Use lsof 
   10.  Look at the file:  /var/admin/syslog.dated/*BOOT-DATE*

    The results are attached below.  Given the following information, can 
    someone please tell me if I'm a candidate for the patch described in 
    notes 1396.0 and/or 1658.2?   We're desperately trying to get course 
    materials together before funding runs out at the end of the fiscal 
    year, but are running out of time, just setting up the hardware and 
    software.

    I've also added a quick reference to how we're actually trying to 
    get to these devices via OPS, at the end, if that's of any help.

    Peggy

========================================================================

    1.  Check to make sure cluster is up and running.

% cnxshow

Cluster View from mcdove

Director: mccanary   Suspended: No 

Node monitor using tie-breaking disk: Not required

Hostname      Cluster I/F   CS_ID      Incarnation          Comm Okay  Member
-----------------------------------------------------------------------------
canary.zko.d  mccanary      0003,0002  0000000000011438        Yes      Yes
dove.zko.dec  mcdove        0004,0001  000000000009d372        Yes      Yes
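
    The membership table above can also be checked mechanically.  What 
    follows is a minimal Python sketch, assuming the six-column layout 
    seen in this one cnxshow listing.  The function name cluster_members 
    is hypothetical, and the parsing is inferred from this sample only, 
    not from any documented cnxshow format.

```python
SAMPLE_CNXSHOW = """\
Hostname      Cluster I/F   CS_ID      Incarnation          Comm Okay  Member
-----------------------------------------------------------------------------
canary.zko.d  mccanary      0003,0002  0000000000011438        Yes      Yes
dove.zko.dec  mcdove        0004,0001  000000000009d372        Yes      Yes
"""

def cluster_members(cnxshow_text):
    """Parse the member rows that follow the dashed separator line."""
    members = []
    in_table = False
    for line in cnxshow_text.splitlines():
        if line.startswith("-----"):
            in_table = True          # rows after this line are members
            continue
        fields = line.split()
        if in_table and len(fields) == 6:
            host, iface, cs_id, incarnation, comm_ok, member = fields
            members.append({
                "host": host,
                "iface": iface,
                "comm_ok": comm_ok == "Yes",
                "member": member == "Yes",
            })
    return members

# Report any node that is not communicating or not a member.
for m in cluster_members(SAMPLE_CNXSHOW):
    if not (m["comm_ok"] and m["member"]):
        print("problem with", m["iface"])
```

    With the listing above, nothing is reported, which matches the 
    observation that the connection manager itself looks healthy.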

========================================================================

    2.  Attempt to use cluster monitor

       Cluster Monitor
Digital Equipment Corporation
       Cluster Monitor
       Copyright 1996
Initializing.  Please wait...
-----------------------------

    Cluster Monitor: Warning
!Cannot initialize TRACT library.
        +--+
        |OK|
        +--+

========================================================================

    3.  Attempt to use asemgr

% su
Password:
# 
# asemgr
..........
Initialize failed.
# 
========================================================================

    4.  Checked host names for underscores

    Our node names are 'dove' and 'canary', with memory channel names of 
    mcdove and mccanary.  The only places we had underscores were in our 
    DRD service names, which we changed to be dashes (such as ODB-DRD1).  
    There is an underscore in the /etc/hosts name cluster_cnx, but I was 
    under the impression that we have no choice in that one.
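
    For anyone repeating this check, here is a small Python sketch that 
    lists every name or alias containing an underscore in hosts-style 
    text.  The function name names_with_underscores is hypothetical, and 
    the sample is a trimmed copy of the entries quoted in this note.

```python
SAMPLE_HOSTS = """\
127.0.0.1   localhost
10.0.0.1    mccanary.zko.dec.com mccanary   #  TCRPS MEMORY CHANNEL
10.0.0.2    mcdove.zko.dec.com mcdove       #  TCRPS MEMORY CHANNEL
10.0.0.42   cluster_cnx
"""

def names_with_underscores(hosts_text):
    """Return host names and aliases that contain '_', in file order."""
    flagged = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop trailing comments
        if not line:
            continue
        fields = line.split()
        # fields[0] is the address; the remaining fields are names/aliases
        for name in fields[1:]:
            if "_" in name and name not in flagged:
                flagged.append(name)
    return flagged

print(names_with_underscores(SAMPLE_HOSTS))
```

    On the sample only cluster_cnx is flagged, which is the entry 
    mentioned above.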

========================================================================

    5.  Checked to see that all hostnames are in /etc/hosts file.

canary:  /etc/hosts

#
127.0.0.1	localhost
16.30.48.190    canary.zko.dec.com      canary	#   2.962  ZKO01-3 Lab
10.0.0.1	mccanary.zko.dec.com mccanary	  #  TCRPS MEMORY CHANNEL
10.0.0.2	mcdove.zko.dec.com mcdove         #  TCRPS MEMORY CHANNEL
10.0.0.42	cluster_cnx	

16.30.48.189    dove.zko.dec.com        dove	#   2.963  ZKO01-3 Lab
16.30.48.185    mmstuf.zko.dec.com      mmstuff	#   2.579  ZKO01-3 Lab
16.30.16.214    rdwngs.zko.dec.com      rdwngs  #   2.93 2D54/3006
   .
   .
   .
------------------------------------------------------------------------
dove:  /etc/hosts

#
127.0.0.1	localhost
10.0.0.1        mccanary.zko.dec.com    mccanary #
10.0.0.2        mcdove.zko.dec.com      mcdove  #
10.0.0.42	cluster_cnx	

16.30.48.189    dove.zko.dec.com        dove	#   2.963  ZKO01-3 Lab
16.30.48.190    canary.zko.dec.com      canary	#   2.962  ZKO01-3 Lab
   .
   .
   .
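
    The "all hostnames are in /etc/hosts" check can be sketched the same 
    way.  This is a minimal Python example, assuming the list of required 
    names is simply the node names, the memory channel names and 
    cluster_cnx as used in this note; missing_hosts is a hypothetical 
    helper name.

```python
DOVE_HOSTS = """\
127.0.0.1    localhost
10.0.0.1     mccanary.zko.dec.com    mccanary #
10.0.0.2     mcdove.zko.dec.com      mcdove  #
10.0.0.42    cluster_cnx
16.30.48.189 dove.zko.dec.com        dove    #   2.963  ZKO01-3 Lab
16.30.48.190 canary.zko.dec.com      canary  #   2.962  ZKO01-3 Lab
"""

# Assumed list of names the cluster software needs to resolve.
REQUIRED = ["dove", "canary", "mcdove", "mccanary", "cluster_cnx"]

def missing_hosts(hosts_text, required):
    """Return the required names that never appear as a name or alias."""
    known = set()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        known.update(fields[1:])     # skip the address in fields[0]
    return [name for name in required if name not in known]

print(missing_hosts(DOVE_HOSTS, REQUIRED))
```

    Running it against each node's file separately shows whether the two 
    copies have drifted apart.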
========================================================================

    6.  Checked to see that both networks are registered in /etc/networks.

        ** I'm not sure what to check for on this one, since I missed 
        any indication of editing this file in the installation guide.

canary:  /etc/networks

#
# Syntax: network_name  network_number [alias_1,...,alias_n] [ #comments ]
#
# network_name    name of the network supplied by the network administrator
# network_number  network number assigned to the network by the NIC
# alias		  other names or abbreviations for this network
# #comments       text following the comment character (#) is ignored
#
loop 127 loopback
------------------------------------------------------------------------
dove:  /etc/networks

# Syntax: network_name  network_number [alias_1,...,alias_n] [ #comments ]
#
# network_name    name of the network supplied by the network administrator
# network_number  network number assigned to the network by the NIC
# alias		  other names or abbreviations for this network
# #comments       text following the comment character (#) is ignored
#
loop 127 loopback
========================================================================

    7.  Try running '# netstat | more' to see active connections

    Got SEVERAL repeats of the following sequence:

# netstat | more
Active Internet connections
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp        0      0  dove.2529              turky.print-sr         CLOSE_WAIT
tcp        0      0  mcdove.1023            mcdove.538             ESTABLISHED
tcp      280      0  mcdove.538             mcdove.1023            ESTABLISHED
tcp        0      0  mcdove.1023            mcdove.570             ESTABLISHED
tcp        0      0  mcdove.570             mcdove.1023            ESTABLISHED
tcp        0      0  mcdove.1023            mcdove.602             ESTABLISHED
tcp        0      0  mcdove.602             mcdove.1023            ESTABLISHED
tcp        0      0  mcdove.1023            mcdove.634             ESTABLISHED
tcp      280      0  mcdove.634             mcdove.1023            ESTABLISHED
tcp        0      0  mcdove.1023            mcdove.666             ESTABLISHED
tcp        0      0  mcdove.666             mcdove.1023            ESTABLISHED
tcp        0      0  mcdove.1023            mcdove.698             ESTABLISHED
tcp      280      0  mcdove.698             mcdove.1023            ESTABLISHED
tcp        0      0  mcdove.1023            mcdove.730             ESTABLISHED
 .         .      .    .
 .         .      .    .
 .         .      .    .
========================================================================

    8.  Try running '# netstat | wc -l' to check line count

# netstat | wc -l
      1044
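
    A raw line count does not show which port the connections pile up 
    on.  The following Python sketch tallies connections per port, 
    assuming the column layout of the netstat output in step 7; the 
    function name connections_per_port is hypothetical.

```python
from collections import Counter

SAMPLE_NETSTAT = """\
Active Internet connections
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp        0      0  dove.2529              turky.print-sr         CLOSE_WAIT
tcp        0      0  mcdove.1023            mcdove.538             ESTABLISHED
tcp      280      0  mcdove.538             mcdove.1023            ESTABLISHED
tcp        0      0  mcdove.1023            mcdove.570             ESTABLISHED
"""

def connections_per_port(netstat_text):
    """Count, for each port, the connections that use it at either end."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] not in ("tcp", "udp"):
            continue                      # skip headers and blank lines
        # Addresses look like host.port; the port follows the last dot.
        ports = {addr.rsplit(".", 1)[-1] for addr in (fields[3], fields[4])}
        for port in ports:
            counts[port] += 1
    return counts

for port, n in connections_per_port(SAMPLE_NETSTAT).most_common(3):
    print(port, n)
```

    On the full output, a single port such as 1023 dominating the tally 
    would point at one daemon holding most of the connections.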

========================================================================

    9.  Use lsof (but I'm not sure what this does)

# lsof -i :1023
lsof: WARNING: access /.lsof_dove: No such file or directory
lsof: WARNING: created device cache file: /.lsof_dove
COMMAND     PID     USER   FD   TYPE       DEVICE   SIZE/OFF  INODE NAME
tractd      248     root    4u  inet   0x027bf700        0t0    TCP *:1023
tractd      248     root    5u  inet   0x027bea00        0t0    TCP mcdove.zko.dec.com:1022->dove.zko.dec.com:1023
tractd      248     root    6u  inet   0x027bfa00        0t0    TCP mcdove.zko.dec.com:1021->dove.zko.dec.com:1023
tractd      248     root    7u  inet   0x02517f00        0t0    TCP mcdove.zko.dec.com:1020->dove.zko.dec.com:1023
   .
   .  (long list repeats)
   .
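
    The lsof listing shows which process owns the sockets on a port.  To 
    put a number on the "long list", here is a minimal Python sketch 
    that counts inet descriptors per process.  It assumes the nine-column 
    layout shown above; sockets_per_process is a hypothetical name.

```python
from collections import Counter

SAMPLE_LSOF = """\
lsof: WARNING: created device cache file: /.lsof_dove
COMMAND     PID     USER   FD   TYPE       DEVICE   SIZE/OFF  INODE NAME
tractd      248     root    4u  inet   0x027bf700        0t0    TCP *:1023
tractd      248     root    5u  inet   0x027bea00        0t0    TCP mcdove.zko.dec.com:1022->dove.zko.dec.com:1023
tractd      248     root    6u  inet   0x027bfa00        0t0    TCP mcdove.zko.dec.com:1021->dove.zko.dec.com:1023
tractd      248     root    7u  inet   0x02517f00        0t0    TCP mcdove.zko.dec.com:1020->dove.zko.dec.com:1023
"""

def sockets_per_process(lsof_text):
    """Return {(command, pid): number of inet descriptors}."""
    counts = Counter()
    for line in lsof_text.splitlines():
        fields = line.split()
        # Data rows have 9 fields, a numeric PID, and TYPE == "inet".
        if len(fields) < 9 or not fields[1].isdigit() or fields[4] != "inet":
            continue
        counts[(fields[0], int(fields[1]))] += 1
    return dict(counts)

print(sockets_per_process(SAMPLE_LSOF))
```

    A count in the hundreds for tractd would be consistent with the 
    "all network reserved ports in use" errors in the syslog below.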

========================================================================

   10.  Look at the file:  /var/admin/syslog.dated/*BOOT-DATE*

May 14 16:45:41 dove cnxpingd: starting
May 14 16:45:41 dove cnxpingd: waiting to register with kernel agent
May 14 16:45:42 dove cnxagentd: starting
May 14 16:45:53 dove ASE: local HSM Warning: Can't ping mccanary over the SCSI bus
May 14 16:45:53 dove ASE: local HSM Notice: Able to ping mccanary over the network
May 14 16:45:55 dove ASE: local HSM ***ALERT: HSM_PATH_STATUS:10.0.0.1:UP
May 14 16:45:55 dove ASE: local HSM Notice: member mccanary is UP
May 14 16:45:55 dove ASE: local HSM ***ALERT: network ping to host mccanary is working but SCSI ping is not
May 14 16:45:55 dove ASE: mcdove Agent Notice: initializing agent... stopping all services
May 14 16:46:03 dove mountd[933]: startup
May 14 16:46:07 dove ASE: mccanary Director Notice: stopped service ODB-DRD3 on mccanary
May 14 16:46:07 dove ASE: mcdove Agent Notice: starting service ODB-DRD3
May 14 16:46:07 dove xntpd[1077]: xntpd version=3.4x
May 14 16:46:07 dove xntpd[1077]: tickadj = 1, tick = 976, tvu_maxslew = 1023
May 14 16:46:07 dove xntpd[1077]: precision = 976 usec
May 14 16:46:09 dove ASE: mccanary Director Notice: stopped service ODB-DRD1 on mccanary
May 14 16:46:09 dove ASE: mcdove Agent Notice: starting service ODB-DRD1
May 14 16:46:09 dove ASE: mccanary Director Notice: stopped service ODB-DRD2 on mccanary
May 14 16:46:09 dove ASE: mcdove Agent Notice: starting service ODB-DRD2
May 14 16:46:10 dove inetd[1197]: bootp/udp: unknown service
May 14 16:46:10 dove ASE: mccanary Director Notice: started service ODB-DRD3 on mcdove
May 14 16:46:13 dove xntpd[1077]: Previous time adjustment didn't complete
May 14 16:46:21 dove ASE: mccanary Director Notice: started service ODB-DRD2 on mcdove
May 14 16:46:21 dove ASE: mccanary Director Notice: started service ODB-DRD1 on mcdove
May 14 16:46:21 dove ASE: mccanary Director Notice: agent on mcdove came ONLINE
May 14 16:46:27 dove xntpd[1077]: Previous time adjustment didn't complete
May 14 16:47:51 dove xntpd[1077]: Previous time adjustment didn't complete
May 14 16:50:59 dove xntpd[1077]: synchronized to 204.123.2.71, stratum=2
May 14 16:50:59 dove xntpd[1077]: time reset (step) 0.133463 s
May 14 16:50:59 dove xntpd[1077]: synchronisation lost
May 14 16:51:00 dove xntpd[1077]: Previous time adjustment didn't complete
May 14 16:53:38 dove xntpd[1077]: Previous time adjustment didn't complete
May 14 16:55:49 dove xntpd[1077]: synchronized to 16.1.0.21, stratum=2
May 14 16:56:06 dove ASE: mccanary Director Warning: Director exiting...
May 14 16:56:06 dove ASE: mcdove Agent Notice: starting a new director
May 14 16:56:07 dove ASE: mccanary Agent Notice: agent on mcdove will start director
May 14 16:56:10 dove xntpd[1077]: Previous time adjustment didn't complete
May 14 16:56:11 dove xntpd[1077]: Previous time adjustment didn't complete
May 14 16:56:16 dove ASE: mcdove AseMgr Notice: msgSvcSend: peer hung up before we got reply
May 14 16:56:16 dove ASE: mcdove AseMgr Warning: blocking send of ASE_INQ_SERVICES failed or channel hung up
May 14 16:56:16 dove ASE: mcdove AseMgr Notice: reconnected to director
May 14 16:56:30 dove ASE: mccanary Agent Notice: restarting Agent!
May 14 16:56:30 dove ASE: mccanary Director Notice: deleted member mccanary
May 14 16:56:30 dove ASE: mccanary Director Notice: changed the ASE member list
May 14 16:56:31 dove ASE: mccanary Director Notice: stored a new ASE configuration database
May 14 17:00:11 dove ASE: local HSM Notice: Able to ping mccanary over the network
May 14 17:00:11 dove ASE: local HSM Notice: Able to ping mccanary over the network
May 14 17:00:12 dove ASE: mccanary Director Notice: changed the ASE member list
May 14 17:00:22 dove ASE: mccanary Director Notice: stored a new ASE configuration database
May 14 17:00:26 dove ASE: local HSM Warning: Can't ping mccanary over the SCSI bus
May 14 17:00:28 dove ASE: local HSM ***ALERT: HSM_PATH_STATUS:10.0.0.1:UP
May 14 17:00:28 dove ASE: local HSM Notice: member mccanary is UP
May 14 17:00:28 dove ASE: local HSM ***ALERT: network ping to host mccanary is working but SCSI ping is not
May 14 17:05:28 dove xntpd[1077]: Previous time adjustment didn't complete
May 14 17:11:38 dove xntpd[1077]: Previous time adjustment didn't complete
May 14 17:13:44 dove last message repeated 5 times
May 14 17:16:59 dove last message repeated 6 times
May 14 17:16:59 dove xntpd[1077]: *** No more 'Prev time adj didn't complete'
May 14 17:19:14 dove xntpd[1077]: synchronized to 16.1.0.4, stratum=1
May 14 20:41:18 dove xntpd[1077]: synchronized to 204.123.2.71, stratum=2
May 14 20:41:38 dove xntpd[1077]: synchronized to 16.1.0.21, stratum=2
May 14 20:42:04 dove xntpd[1077]: synchronized to 204.123.2.71, stratum=2
May 14 20:51:59 dove xntpd[1077]: synchronized to 16.1.0.4, stratum=1
May 14 21:06:56 dove xntpd[1077]: time reset (step) 0.420126 s
        .
        .   more of same
        .
May 15 06:58:00 dove xntpd[1077]: synchronized to 16.1.0.21, stratum=2
May 15 06:58:00 dove xntpd[1077]: synchronisation lost
May 15 07:02:33 dove xntpd[1077]: synchronized to 16.1.0.21, stratum=2
May 15 07:03:20 dove xntpd[1077]: synchronized to 16.1.0.4, stratum=1
May 15 08:52:13 dove ASE: local AseMgr Error: msgSvcCreate: all network reserved ports in use!
May 15 08:52:13 dove ASE: local AseMgr Error: can't create message service
May 15 08:52:13 dove ASE: local AseMgr Error: Initialize failed.
May 15 08:52:19 dove ASE: local AseMgr Error: msgSvcCreate: all network reserved ports in use!
May 15 08:52:19 dove ASE: local AseMgr Error: can't create message service
May 15 08:52:19 dove ASE: local AseMgr Error: Initialize failed.
        .
        .  more of same...
        .
May 15 09:07:38 dove ASE: local AseMgr Error: msgSvcCreate: all network reserved ports in use!
May 15 09:07:38 dove ASE: local AseMgr Error: can't create message service
May 15 09:07:38 dove ASE: local AseMgr Error: Initialize failed.
May 15 09:07:43 dove ASE: local AseMgr Error: Initialize failed.
May 15 09:52:26 dove xntpd[1077]: synchronized to 204.123.2.71, stratum=2
May 15 10:30:05 dove xntpd[1077]: time reset (step) -0.340619 s
May 15 10:30:05 dove xntpd[1077]: synchronized to 16.1.0.21, stratum=2
May 15 10:30:05 dove xntpd[1077]: synchronisation lost
May 15 10:35:39 dove xntpd[1077]: synchronized to 16.1.0.4, stratum=2
May 15 11:07:13 dove ASE: local AseMgr Error: msgSvcCreate: all network reserved ports in use!
May 15 11:07:13 dove ASE: local AseMgr Error: can't create message service
May 15 11:07:13 dove ASE: local AseMgr Error: Initialize failed.
        .
        .  more of same...
        .
May 15 11:08:03 dove ASE: local AseMgr Error: msgSvcCreate: all network reserved ports in use!
May 15 11:08:03 dove ASE: local AseMgr Error: can't create message service
May 15 11:08:03 dove ASE: local AseMgr Error: Initialize failed.
May 15 11:08:08 dove ASE: local AseMgr Error: Initialize failed.
May 15 11:21:08 dove xntpd[1077]: synchronized to 204.123.2.71, stratum=2
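
    The xntpd chatter makes the ASE messages hard to spot.  This Python 
    sketch counts selected messages and records when each first appeared.  
    The search strings are taken from the log above; the function name 
    first_and_count is hypothetical.

```python
SAMPLE_LOG = """\
May 14 16:45:53 dove ASE: local HSM Warning: Can't ping mccanary over the SCSI bus
May 14 16:46:07 dove xntpd[1077]: xntpd version=3.4x
May 14 17:00:26 dove ASE: local HSM Warning: Can't ping mccanary over the SCSI bus
May 15 08:52:13 dove ASE: local AseMgr Error: msgSvcCreate: all network reserved ports in use!
"""

NEEDLES = ["Can't ping", "all network reserved ports in use"]

def first_and_count(log_text, needles):
    """Map each search string to (number of hits, timestamp of first hit)."""
    result = {}
    lines = log_text.splitlines()
    for needle in needles:
        hits = [line for line in lines if needle in line]
        # A syslog timestamp such as "May 14 16:45:53" is 15 characters.
        first = hits[0][:15] if hits else None
        result[needle] = (len(hits), first)
    return result

for needle, (count, first) in first_and_count(SAMPLE_LOG, NEEDLES).items():
    print(count, first, needle)
```

    In this log the SCSI ping warning appears at boot, many hours before 
    the first reserved-port error, which helps when deciding what to 
    chase first.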

========================================================================

    FYI... The disks on the shared SCSI bus are divided into several 
    partitions.  All of the partitions for one disk are combined into 
    one service.  Service names are ODB-DRD1, ODB-DRD2 and ODB-DRD3.  
    Symbolic links have been set up, including the following:

	.../OraData/TEST/big_data.dbf -> /dev/rdrd/drd17
	.../OraData/TEST/control01.ctl -> /dev/rdrd/drd3
	.../OraData/TEST/redoODB101.log -> /dev/rdrd/drd1
	.../OraData/TEST/system.dbf -> /dev/rdrd/drd4
	.../OraData/TEST/tsp_rbs1.dbf -> /dev/rdrd/drd7

    These file names were then used to create the OPS database.  However, 
    when we tried to get back to the data, we got an error.  It's hard to 
    narrow this down to Oracle, when we can't even get back into the asemgr 
    to check the status of the service.

% svrmgrl

Oracle Server Manager Release 2.3.2.0.0 - Production

Copyright (c) Oracle Corporation 1994, 1995. All rights reserved.

Oracle7 Server Release 7.3.2.2.0 with the 64-bit option - Production Release
With the distributed, parallel query and Parallel Server options
PL/SQL Release 2.3.2.2.0 - Production

SVRMGR> connect internal
Connected to an idle instance.
SVRMGR> 
SVRMGR> shutdown abort
ORACLE instance shut down.
SVRMGR> 
SVRMGR> startup pfile=$ORACLE_HOME/dbs/initTEST.ora
ORACLE instance started.
Total System Global Area       4802496 bytes
Fixed Size                       52424 bytes
Variable Size                  4225784 bytes
Database Buffers                491520 bytes
Redo Buffers                     32768 bytes
Database mounted.
ORA-01157: cannot identify data file 1 - file not found
ORA-01110: data file 1: '/Layers/Oracle/OraData/TEST/system.dbf'
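
    One thing that can be ruled out quickly for ORA-01157 is a symbolic 
    link whose target is missing.  The sketch below is a minimal Python 
    check, not a diagnosis: a link can resolve correctly and the device 
    can still be unusable if the DRD service is not running.  The helper 
    name dangling_links is hypothetical; the example path is the one 
    quoted in the ORA-01110 message.

```python
import os

def dangling_links(paths):
    """Return (link, target) pairs for symlinks whose target is missing."""
    broken = []
    for path in paths:
        # os.path.exists() follows symlinks, so it is False for a link
        # that points at a nonexistent file or device node.
        if os.path.islink(path) and not os.path.exists(path):
            broken.append((path, os.readlink(path)))
    return broken

if __name__ == "__main__":
    print(dangling_links(["/Layers/Oracle/OraData/TEST/system.dbf"]))
```

    An empty list only means that no link in the given list is dangling; 
    paths that are not symlinks at all are skipped.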

2063.1. "Hints" by NNTPD::"[email protected]" (Dave Cherkus) Thu May 15 1997 15:09
If I were you, I would start with the 'can't ping over SCSI bus'
problem in the log file.  You will never have a stable ASE til
this is fixed (i.e. asemgr won't work).  Fixing this also
might help get tractd unstuck.

So, why are SCSI pings failing?

Some hints:
  - Are you using a supported SCSI controller (i.e. KZPSA)?
  - Did you set the SCSI IDs for each of these controllers?
    (Hint: one should be 6, the other should be 7, and the 
    one you set to 6 needs to be changed using console cmds)
  - Is there a disk that conflicts with SCSI id 6?  Try
    getting things to work with one disk first, then add
    disks one by one.
  - What are the console 'show' commands indicating?  Are all
    disks visible from all hosts?  Does 'sho devices' show
    both scsi controllers?  ASE won't work till this checks
    out
  - Is your scsi termination done properly?
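
The SCSI ID hints above amount to a uniqueness rule: every adapter and
every disk on the shared bus needs its own ID.  A minimal Python sketch
of that rule follows.  The host and disk IDs in the examples are
illustrative values, and scsi_id_conflicts is a hypothetical name.

```python
def scsi_id_conflicts(host_ids, disk_ids):
    """Return (id, first_owner, second_owner) for every duplicated ID.

    host_ids maps a host name to its adapter's ID on the shared bus;
    disk_ids lists the IDs of the disks on that bus.
    """
    owners = list(host_ids.items()) + [(f"disk@{i}", i) for i in disk_ids]
    seen = {}
    conflicts = []
    for owner, scsi_id in owners:
        if scsi_id in seen:
            conflicts.append((scsi_id, seen[scsi_id], owner))
        else:
            seen[scsi_id] = owner
    return conflicts

# Two adapters left at the same ID collide with each other.
print(scsi_id_conflicts({"dove": 7, "canary": 7}, [1, 2, 3]))

# One adapter at 6 and one at 7, disks at 1 through 5: no conflict.
print(scsi_id_conflicts({"dove": 6, "canary": 7}, [1, 2, 3, 4, 5]))
```

A clean result here says nothing about cabling or termination, which
still have to be checked by hand.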

Dave
[Posted by WWW Notes gateway]

2063.2. "console output on 1st (dove)" by DONVAN::HARRIS () Fri May 16 1997 12:27
    Here's the output from the show config, show device, and show pk* console 
    commands on 'dove'.
    
dove console output
-------------------

P00>>>show config | more
                        Digital Equipment Corporation
                           AlphaServer 2000 4/200
                          
SRM Console V4.7-143             VMS PALcode V5.56-6, OSF PALcode X1..45-12

Component       Status           Module ID
CPU 0              P             B2020-AA DECchip (tm) 21064-3
Memory 0           P             B2023-BA 128 MB
I/O                              B2111-AA
                                 dva0.0.0.1000.0         RX26/RX23
                                 
Slot   Option                    Hose 0, Bus 0, PCI
  1    NCR 53C810                pka0.7.0.1.0            SCSI Bus ID 7
                                 dka0.0.0.1.0            RZ28
                                 dka100.1.0.1.0          RZ28
                                 dka200.2.0.1.0          RZ26
                                 dka300.3.0.1.0          RZ26L
                                 dka600.6.0.1.0          RRD43
                                 mka500.5.0.1.0          TLZ07 (not on canary)
  2    Intel 82375EB                                     Bride to Bus 1, EISA
  6    DECchip 21040-AA          ewa0.0.0.6.0            08-00-2B-E5-A5-31
  7    DEC PCI MC                                        Rev: b, mca0
  8    DEC KZPSA                 pkb0.6.0.8.0            SCSI Bus ID 6
                                 dkb100.1.0.8.0          RZ28D
                                 dkb200.2.0.8.0          RZ28
                                 dkb300.3.0.8.0          RZ28D
                                 dkb400.4.0.8.0          RZ28B
                                 dkb500.5.0.8.0          RZ28
                                 
Slot   Option                    Hose 0, Bus 1, EISA
  1    DE422
  7    Compaq Qvision

P00>>>
P00>>>show device
dka0.0.0.1.0               DKA0                  RZ28   442D
dka100.1.0.1.0             DKA100                RZ28   442D
dka200.2.0.1.0             DKA200                RZ26   392A (different?)
dka300.3.0.1.0             DKA300               RZ26L   442D
dka400.4.0.1.0             DKA400               RZ28D   0008 (not on canary)
dka600.6.0.1.0             DKA600               RRD43   1084
dkb100.1.0.8.0             DKB100               RZ28D   0008
dkb200.2.0.8.0             DKB200                RZ28   442D
dkb300.3.0.8.0             DKB300               RZ28D   0008
dkb400.4.0.8.0             DKB400               RZ28B   0003
dkb500.5.0.8.0             DKB500                RZ28   442D
dva0.0.0.1000.0            DVA0             RZ26/RX23
jkb707.7.0.8.0             JKB707             DIGITAL   ffff (different)
mka500.5.0.1.0             MKA500               TLZ07   4BE0 (not on canary)
ewa0.0.0.6.0               EWA0     08-00-2B-E5-A5-31                                
pka0.7.0.1.0               PKA0         SCSI Bus ID 7
pkb0.6.0.8.0               PKB0         SCSI Bus ID 6    F01 A10
P00>>>
P00>>>show pk*
pka0_disconnect      1
pka0_fast            1
pka0_host_id         7
pkb0_fast            1
pkb0_host_id         6
pkb0_termpwr         1
P00>>>boot

2063.3. "console output from 2nd (canary)" by DONVAN::HARRIS () Fri May 16 1997 12:28
    
    Here's the output from the show config, show device, and show pk* 
    console commands on 'canary'.
    
Canary Console Output
---------------------

P00>>>show config | more
                        Digital Equipment Corporation
                           AlphaServer 2000 4/200
                          
SRM Console V4.7-143             VMS PALcode V5.56-6, OSF PALcode X1..45-12

Component       Status           Module ID
CPU 0              P             B2020-AA DECchip (tm) 21064-3
Memory 0           P             B2023-BA 128 MB
I/O                              B2111-AA
                                 dva0.0.0.1000.0         RX26/RX23
                                 
Slot   Option                    Hose 0, Bus 0, PCI
  1    NCR 53C810                pka0.7.0.1.0            SCSI Bus ID 7
                                 dka0.0.0.1.0            RZ28
                                 dka100.1.0.1.0          RZ28
                                 dka200.2.0.1.0          RZ26
                                 dka300.3.0.1.0          RZ26L
                                 dka600.6.0.1.0          RRD43
  2    Intel 82375EB                                     Bride to Bus 1, EISA
  6    DECchip 21040-AA          ewa0.0.0.6.0            08-00-2B-E5-C2-64
  7    DEC PCI MC                                        Rev: b, mca0
  8    DEC KZPSA                 pkb0.7.0.8.0            SCSI Bus ID 7
                                 dkb100.1.0.8.0          RZ28D
                                 dkb200.2.0.8.0          RZ28
                                 dkb300.3.0.8.0          RZ28D
                                 dkb400.4.0.8.0          RZ28B
                                 dkb500.5.0.8.0          RZ28
                                 jkb607.6.0.8.0          DIGITAL (not on dove)
                                 
Slot   Option                    Hose 0, Bus 1, EISA
  1    DE422
  7    Compaq Qvision

P00>>>
P00>>>show device
dka0.0.0.1.0               DKA0                  RZ28   442D
dka100.1.0.1.0             DKA100                RZ28   442D
dka200.2.0.1.0             DKA200                RZ26   T386
dka300.3.0.1.0             DKA300               RZ26L   442D
dka600.6.0.1.0             DKA600               RRD43   1084
dkb100.1.0.8.0             DKB100               RZ28D   0008
dkb200.2.0.8.0             DKB200                RZ28   442D
dkb300.3.0.8.0             DKB300               RZ28D   0008
dkb400.4.0.8.0             DKB400               RZ28B   0003
dkb500.5.0.8.0             DKB500                RZ28   442D
dva0.0.0.1000.0            DVA0             RZ26/RX23
jkb607.6.0.8.0             JKB607             DIGITAL   ffff
ewa0.0.0.6.0               EWA0     08-00-2B-E5-C2-64                                
pka0.7.0.1.0               PKA0         SCSI Bus ID 7
pkb0.7.0.8.0               PKB0         SCSI Bus ID 7    F01 A10
P00>>>
P00>>>show pk*
pka0_disconnect      1
pka0_fast            1
pka0_host_id         7
pkb0_fast            1
pkb0_host_id         7
pkb0_termpwr         1
P00>>>boot

2063.4. "No conclusive evidence, but..." by NNTPD::"[email protected]" (Dave Cherkus) Mon May 19 1997 10:33
The only tidbit in the data is that on dove the sho conf output
does not list jkb707 whereas its show dev output does.  This
name jkb707 corresponds to canary's scsi controller.  I'm not
sure how to interpret this.  It does correspond to the problem
reported in the logs: dove can't ping canary over the scsi bus.
It is a *hint* that there still is a scsi bus issue. 
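
That kind of comparison between 'show config' and 'show device' can be
done mechanically.  Here is a minimal Python sketch, assuming device
names always have the form seen in the console output in .2 and .3
(three lowercase letters, a unit number, then four numeric fields).  The
function names are hypothetical and the samples are trimmed copies of
dove's slot 8 entries.

```python
import re

# Matches console device names such as "dkb100.1.0.8.0".
DEVICE_RE = re.compile(r"\b[a-z]{3}\d+(?:\.\d+){4}\b")

DOVE_CONFIG = """\
  8    DEC KZPSA                 pkb0.6.0.8.0            SCSI Bus ID 6
                                 dkb100.1.0.8.0          RZ28D
                                 dkb200.2.0.8.0          RZ28
"""

DOVE_DEVICE = """\
dkb100.1.0.8.0             DKB100               RZ28D   0008
dkb200.2.0.8.0             DKB200                RZ28   442D
jkb707.7.0.8.0             JKB707             DIGITAL   ffff
pkb0.6.0.8.0               PKB0         SCSI Bus ID 6    F01 A10
"""

def device_names(console_text):
    """Collect every device name that appears in the console text."""
    return set(DEVICE_RE.findall(console_text))

def config_vs_device(show_config_text, show_device_text):
    """Return (only in 'show device', only in 'show config'), sorted."""
    cfg = device_names(show_config_text)
    dev = device_names(show_device_text)
    return sorted(dev - cfg), sorted(cfg - dev)

print(config_vs_device(DOVE_CONFIG, DOVE_DEVICE))
```

The same two functions can compare one node's 'show device' output
against the other's, which makes the differences easier to spot.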

If it were me I would continue working under the presumption
that it is indeed a scsi bus cabling and/or termination issue.
Did you yank the terminating resistors from the DWZZB (presuming 
it is not at the end of the bus)?  If you are doing external 
termination did you yank the terminators from the KZPSAs? Can 
you try shorter cables?  Do you have a second set of cables you 
can try instead?


[Posted by WWW Notes gateway]

2063.5. "Update & steps we took" by DONVAN::HARRIS () Mon May 19 1997 14:52
    Here's an update...  We did the following, and are now able to see all
    disks from both machines.  

    o added new-wire_method=0

    Rebuilt the kernel, updating the parameter new-wire_method=0.  This 
    fixed the delay in connecting to a database from the OPS service 
    manager, but there were still troubles accessing the raw devices.  
    btw... I was told by an OPS technical person to use new-wire_method, 
    and that seemed to make a difference.  However, I've seen it spelled
    differently in this conference.  Can someone clarify?  

    o cluster_map_create 

    Tried to run cluster_map_create and got an rcmgr error.  Looked 
    in the .rchosts files and noticed that the host names were dove 
    and canary instead of mcdove and mccanary.  Added the correct 
    host names on both sides.  Then, reran cluster_map_create.

    o  ase_fix_config

    Was finally able to successfully run CMON on both machines.  canary 
    appeared to see all of the shared disks, but dove only saw about half 
    of them.  So, we ran /var/ase/sbin/ase_fix_config and specified the 
    shared SCSI bus to be numbered using the default (it was numbered 1, 
    but we changed it to 16, the default response in ase_fix_config).

        Name    Controller  Slot  Bus   Slot
    0)  scsi0   psiop0      0     pci0  1
    1)  scsi16  pza0        0     pci0  8

    o  Ran cmon

    This time, when we ran cmon, it started on both machines, and both 
    machines saw all of the disks correctly.
    
    I'm not sure exactly which of these steps fixed the problem.  Thanks 
    for the assistance, and directing us to take a closer look at the SCSI 
    connections.  Any additional feedback is welcome, since I plan to go 
    back and take a closer look to see exactly what is different after 
    these changes.

    Peggy