
Conference smurf::ase

Title:	ase

Moderator:	SMURF::GROSSO

Created:	Thu Jul 29 1993
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	2114
Total number of notes:	7347

2063.0. "More asemgr/cmon/tractd problems" by DONVAN::HARRIS () Thu May 15 1997 13:15

    We're trying to put together a system that will be used to develop 
    training materials for OPS in a Production Server environment.  It 
    seems that this isn't an easy task, since we've found a lot of "gotchas" 
    that don't show up in the installation guides, but have to be tracked 
    down via notes and other resources.

    The latest is a problem with the tractd daemon.  It is somewhat similar 
    to what I've seen reported in several notes here (listed in 1765.1), yet 
    not quite the same.  Here's our setup:

    o  Two AlphaServer 2000's with I/O module upgrade to support MC
    o  Five rz28's on the shared SCSI
    o  Digital UNIX V4.0B
    o  Production Server V1.4
    o  OPS V7.3.2.2.0

    Looking at all of the hints and suggestions, here's what I've done.  

    1.  Check to make sure cluster is up and running.
    2.  Attempt to use cluster monitor
    3.  Attempt to use asemgr
    4.  Checked host names for underscores
    5.  Checked to see that all hostnames are in /etc/hosts file.
    6.  Checked to see that both networks are registered in /etc/networks.
    7.  Try running '# netstat | more' to see active connections
    8.  Try running '# netstat | wc -l' to (line) count connections
    9.  Use lsof 
   10.  Look at the file:  /var/admin/syslog.dated/*BOOT-DATE*

    The results are attached below.  Given the following information, can 
    someone please tell me if I'm a candidate for the patch described in 
    notes 1396.0 and/or 1658.2?   We're desperately trying to get course 
    materials together before funding runs out at the end of the fiscal 
    year, but are running out of time, just setting up the hardware and 
    software.

    I've also added a quick reference to how we're actually trying to 
    get to these devices via OPS, at the end, if that's of any help.

    Peggy

========================================================================

    1.  Check to make sure cluster is up and running.

% cnxshow

Cluster View from mcdove

Director: mccanary   Suspended: No 

Node monitor using tie-breaking disk: Not required

Hostname      Cluster I/F   CS_ID      Incarnation          Comm Okay  Member
-----------------------------------------------------------------------------
canary.zko.d  mccanary      0003,0002  0000000000011438        Yes      Yes
dove.zko.dec  mcdove        0004,0001  000000000009d372        Yes      Yes

========================================================================

    2.  Attempt to use cluster monitor

       Cluster Monitor
Digital Equipment Corporation
       Cluster Monitor
       Copyright 1996
Initializing.  Please wait...
-----------------------------

    Cluster Monitor: Warning
!Cannot initialize TRACT library.
        +--+
        |OK|
        +--+

========================================================================

    3.  Attempt to use asemgr

% su
Password:
# 
# asemgr
..........
Initialize failed.
# 
========================================================================

    4.  Checked host names for underscores

    Our node names are 'dove' and 'canary', with memory channel names of 
    mcdove and mccanary.  The only places we had underscores were in our 
    DRD service names, which we changed to be dashes (such as ODB-DRD1).  
    There is an underscore in the /etc/hosts name cluster_cnx, but I was 
    of the impression that we have no choice in that one.
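
    For anyone repeating this check, here is a minimal sketch of how the 
    underscore scan could be scripted.  The function name 
    find_underscore_names is made up for illustration and is not part of 
    ASE or Production Server; it only assumes the usual /etc/hosts layout 
    (address first, then names and aliases, '#' starting a comment).

```shell
#!/bin/sh
# Hypothetical helper (not an ASE tool): print every host name or alias
# that contains an underscore, reading /etc/hosts-format text on stdin.
find_underscore_names() {
    awk '{
        sub(/#.*/, "")                 # drop trailing comments
        for (i = 2; i <= NF; i++)      # field 1 is the address
            if ($i ~ /_/) print $i
    }'
}
```

    Run as 'find_underscore_names < /etc/hosts' on each member.  On our 
    files the only name it should report is cluster_cnx.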

========================================================================

    5.  Checked to see that all hostnames are in /etc/hosts file.

canary:  /etc/hosts

#
127.0.0.1	localhost
16.30.48.190    canary.zko.dec.com      canary	#   2.962  ZKO01-3 Lab
10.0.0.1	mccanary.zko.dec.com mccanary	  #  TCRPS MEMORY CHANNEL
10.0.0.2	mcdove.zko.dec.com mcdove         #  TCRPS MEMORY CHANNEL
10.0.0.42	cluster_cnx	

16.30.48.189    dove.zko.dec.com        dove	#   2.963  ZKO01-3 Lab
16.30.48.185    mmstuf.zko.dec.com      mmstuff	#   2.579  ZKO01-3 Lab
16.30.16.214    rdwngs.zko.dec.com      rdwngs  #   2.93 2D54/3006
   .
   .
   .
------------------------------------------------------------------------
dove:  /etc/hosts

#
127.0.0.1	localhost
10.0.0.1        mccanary.zko.dec.com    mccanary #
10.0.0.2        mcdove.zko.dec.com      mcdove  #
10.0.0.42	cluster_cnx	

16.30.48.189    dove.zko.dec.com        dove	#   2.963  ZKO01-3 Lab
16.30.48.190    canary.zko.dec.com      canary	#   2.962  ZKO01-3 Lab
   .
   .
   .
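
    A rough sketch of the same check in script form, so that it can be 
    repeated on both members.  missing_hosts is a hypothetical name, and 
    the list of names to look for is whatever your own cluster requires; 
    nothing here comes from the installation guide.

```shell
#!/bin/sh
# Hypothetical helper: read /etc/hosts-format text on stdin and print each
# name given as an argument that does NOT appear as a host name or alias.
missing_hosts() {
    hosts_text=$(sed 's/#.*//')        # strip comments once, keep the rest
    for name in "$@"; do
        printf '%s\n' "$hosts_text" |
            awk -v want="$name" '
                { for (i = 2; i <= NF; i++) if ($i == want) found = 1 }
                END { exit !found }' ||
            echo "$name"
    done
}
```

    For example, 'missing_hosts dove canary mcdove mccanary < /etc/hosts' 
    prints nothing when all four names are present.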
========================================================================

    6.  Checked to see that both networks are registered in /etc/networks.

        ** I'm not sure what to check for on this one, since I missed 
        any indication of editing this file in the installation guide.

canary:  /etc/networks

#
# Syntax: network_name  network_number [alias_1,...,alias_n] [ #comments ]
#
# network_name    name of the network supplied by the network administrator
# network_number  network number assigned to the network by the NIC
# alias		  other names or abbreviations for this network
# #comments       text following the comment character (#) is ignored
#
loop 127 loopback
------------------------------------------------------------------------
dove:  /etc/networks

# Syntax: network_name  network_number [alias_1,...,alias_n] [ #comments ]
#
# network_name    name of the network supplied by the network administrator
# network_number  network number assigned to the network by the NIC
# alias		  other names or abbreviations for this network
# #comments       text following the comment character (#) is ignored
#
loop 127 loopback
========================================================================

    7.  Try running '# netstat | more' to see active connections

    Got SEVERAL repeats of the following sequence:

# netstat | more
Active Internet connections
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp        0      0  dove.2529              turky.print-sr         CLOSE_WAIT
tcp        0      0  mcdove.1023            mcdove.538             ESTABLISHED
tcp      280      0  mcdove.538             mcdove.1023            ESTABLISHED
tcp        0      0  mcdove.1023            mcdove.570             ESTABLISHED
tcp        0      0  mcdove.570             mcdove.1023            ESTABLISHED
tcp        0      0  mcdove.1023            mcdove.602             ESTABLISHED
tcp        0      0  mcdove.602             mcdove.1023            ESTABLISHED
tcp        0      0  mcdove.1023            mcdove.634             ESTABLISHED
tcp      280      0  mcdove.634             mcdove.1023            ESTABLISHED
tcp        0      0  mcdove.1023            mcdove.666             ESTABLISHED
tcp        0      0  mcdove.666             mcdove.1023            ESTABLISHED
tcp        0      0  mcdove.1023            mcdove.698             ESTABLISHED
tcp      280      0  mcdove.698             mcdove.1023            ESTABLISHED
tcp        0      0  mcdove.1023            mcdove.730             ESTABLISHED
 .         .      .    .
 .         .      .    .
 .         .      .    .
========================================================================

    8.  Try running '# netstat | wc -l' to check line count

# netstat | wc -l
      1044
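
    A raw line count does not say how many reserved ports are tied up, 
    so here is a minimal sketch that counts distinct local ports in the 
    reserved range.  It assumes BSD-style netstat output (local address 
    in column 4, port after the last dot) and assumes the range of 
    interest is 512-1023; the function name is made up.

```shell
#!/bin/sh
# Hypothetical helper: count distinct local TCP ports in the range 512-1023
# from netstat output on stdin.  Assumes "host.port" in column 4.
count_reserved_ports() {
    awk '$1 ~ /^tcp/ {
        n = split($4, part, ".")
        port = part[n]                 # text after the last dot
        if (port ~ /^[0-9]+$/ && port + 0 >= 512 && port + 0 <= 1023)
            seen[port] = 1
    }
    END { c = 0; for (p in seen) c++; print c }'
}
```

    Usage would be 'netstat -an | count_reserved_ports'; the -n form keeps 
    ports numeric, so service names such as print-sr do not hide them.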

========================================================================

    9.  Use lsof (but I'm not sure what this does)

# lsof -i :1023
lsof: WARNING: access /.lsof_dove: No such file or directory
lsof: WARNING: created device cache file: /.lsof_dove
COMMAND     PID     USER   FD   TYPE       DEVICE   SIZE/OFF  INODE NAME
tractd      248     root    4u  inet   0x027bf700        0t0    TCP *:1023
tractd      248     root    5u  inet   0x027bea00        0t0    TCP mcdove.zko.dec.com:1022->dove.zko.dec.com:1023
tractd      248     root    6u  inet   0x027bfa00        0t0    TCP mcdove.zko.dec.com:1021->dove.zko.dec.com:1023
tractd      248     root    7u  inet   0x02517f00        0t0    TCP mcdove.zko.dec.com:1020->dove.zko.dec.com:1023
   .
   .  (long list repeats)
   .
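
    Since the lsof list is long, a small sketch for totalling it per 
    command may help show which process holds the sockets.  The function 
    name is hypothetical, and the column layout is assumed to match the 
    output above (command name in column 1).

```shell
#!/bin/sh
# Hypothetical helper: tally open-socket lines per command from lsof output
# on stdin, skipping the header row and lsof's own WARNING lines.
sockets_per_command() {
    awk '$1 != "COMMAND" && $1 != "lsof:" && NF >= 8 { count[$1]++ }
         END { for (cmd in count) print cmd, count[cmd] }' | sort
}
```

    For example: 'lsof -i :1023 | sockets_per_command'.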

========================================================================

   10.  Look at the file:  /var/admin/syslog.dated/*BOOT-DATE*

May 14 16:45:41 dove cnxpingd: starting
May 14 16:45:41 dove cnxpingd: waiting to register with kernel agent
May 14 16:45:42 dove cnxagentd: starting
May 14 16:45:53 dove ASE: local HSM Warning: Can't ping mccanary over the SCSI bus
May 14 16:45:53 dove ASE: local HSM Notice: Able to ping mccanary over the network
May 14 16:45:55 dove ASE: local HSM ***ALERT: HSM_PATH_STATUS:10.0.0.1:UP
May 14 16:45:55 dove ASE: local HSM Notice: member mccanary is UP
May 14 16:45:55 dove ASE: local HSM ***ALERT: network ping to host mccanary is working but SCSI ping is not
May 14 16:45:55 dove ASE: mcdove Agent Notice: initializing agent... stopping all services
May 14 16:46:03 dove mountd[933]: startup
May 14 16:46:07 dove ASE: mccanary Director Notice: stopped service ODB-DRD3 on mccanary
May 14 16:46:07 dove ASE: mcdove Agent Notice: starting service ODB-DRD3
May 14 16:46:07 dove xntpd[1077]: xntpd version=3.4x
May 14 16:46:07 dove xntpd[1077]: tickadj = 1, tick = 976, tvu_maxslew = 1023
May 14 16:46:07 dove xntpd[1077]: precision = 976 usec
May 14 16:46:09 dove ASE: mccanary Director Notice: stopped service ODB-DRD1 on mccanary
May 14 16:46:09 dove ASE: mcdove Agent Notice: starting service ODB-DRD1
May 14 16:46:09 dove ASE: mccanary Director Notice: stopped service ODB-DRD2 on mccanary
May 14 16:46:09 dove ASE: mcdove Agent Notice: starting service ODB-DRD2
May 14 16:46:10 dove inetd[1197]: bootp/udp: unknown service
May 14 16:46:10 dove ASE: mccanary Director Notice: started service ODB-DRD3 on mcdove
May 14 16:46:13 dove xntpd[1077]: Previous time adjustment didn't complete
May 14 16:46:21 dove ASE: mccanary Director Notice: started service ODB-DRD2 on mcdove
May 14 16:46:21 dove ASE: mccanary Director Notice: started service ODB-DRD1 on mcdove
May 14 16:46:21 dove ASE: mccanary Director Notice: agent on mcdove came ONLINE
May 14 16:46:27 dove xntpd[1077]: Previous time adjustment didn't complete
May 14 16:47:51 dove xntpd[1077]: Previous time adjustment didn't complete
May 14 16:50:59 dove xntpd[1077]: synchronized to 204.123.2.71, stratum=2
May 14 16:50:59 dove xntpd[1077]: time reset (step) 0.133463 s
May 14 16:50:59 dove xntpd[1077]: synchronisation lost
May 14 16:51:00 dove xntpd[1077]: Previous time adjustment didn't complete
May 14 16:53:38 dove xntpd[1077]: Previous time adjustment didn't complete
May 14 16:55:49 dove xntpd[1077]: synchronized to 16.1.0.21, stratum=2
May 14 16:56:06 dove ASE: mccanary Director Warning: Director exiting...
May 14 16:56:06 dove ASE: mcdove Agent Notice: starting a new director
May 14 16:56:07 dove ASE: mccanary Agent Notice: agent on mcdove will start director
May 14 16:56:10 dove xntpd[1077]: Previous time adjustment didn't complete
May 14 16:56:11 dove xntpd[1077]: Previous time adjustment didn't complete
May 14 16:56:16 dove ASE: mcdove AseMgr Notice: msgSvcSend: peer hung up before we got reply
May 14 16:56:16 dove ASE: mcdove AseMgr Warning: blocking send of ASE_INQ_SERVICES failed or channel hung up
May 14 16:56:16 dove ASE: mcdove AseMgr Notice: reconnected to director
May 14 16:56:30 dove ASE: mccanary Agent Notice: restarting Agent!
May 14 16:56:30 dove ASE: mccanary Director Notice: deleted member mccanary
May 14 16:56:30 dove ASE: mccanary Director Notice: changed the ASE member list
May 14 16:56:31 dove ASE: mccanary Director Notice: stored a new ASE configuration database
May 14 17:00:11 dove ASE: local HSM Notice: Able to ping mccanary over the network
May 14 17:00:11 dove ASE: local HSM Notice: Able to ping mccanary over the network
May 14 17:00:12 dove ASE: mccanary Director Notice: changed the ASE member list
May 14 17:00:22 dove ASE: mccanary Director Notice: stored a new ASE configuration database
May 14 17:00:26 dove ASE: local HSM Warning: Can't ping mccanary over the SCSI bus
May 14 17:00:28 dove ASE: local HSM ***ALERT: HSM_PATH_STATUS:10.0.0.1:UP
May 14 17:00:28 dove ASE: local HSM Notice: member mccanary is UP
May 14 17:00:28 dove ASE: local HSM ***ALERT: network ping to host mccanary is working but SCSI ping is not
May 14 17:05:28 dove xntpd[1077]: Previous time adjustment didn't complete
May 14 17:11:38 dove xntpd[1077]: Previous time adjustment didn't complete
May 14 17:13:44 dove last message repeated 5 times
May 14 17:16:59 dove last message repeated 6 times
May 14 17:16:59 dove xntpd[1077]: *** No more 'Prev time adj didn't complete'
May 14 17:19:14 dove xntpd[1077]: synchronized to 16.1.0.4, stratum=1
May 14 20:41:18 dove xntpd[1077]: synchronized to 204.123.2.71, stratum=2
May 14 20:41:38 dove xntpd[1077]: synchronized to 16.1.0.21, stratum=2
May 14 20:42:04 dove xntpd[1077]: synchronized to 204.123.2.71, stratum=2
May 14 20:51:59 dove xntpd[1077]: synchronized to 16.1.0.4, stratum=1
May 14 21:06:56 dove xntpd[1077]: time reset (step) 0.420126 s
        .
        .   more of same
        .
May 15 06:58:00 dove xntpd[1077]: synchronized to 16.1.0.21, stratum=2
May 15 06:58:00 dove xntpd[1077]: synchronisation lost
May 15 07:02:33 dove xntpd[1077]: synchronized to 16.1.0.21, stratum=2
May 15 07:03:20 dove xntpd[1077]: synchronized to 16.1.0.4, stratum=1
May 15 08:52:13 dove ASE: local AseMgr Error: msgSvcCreate: all network reserved ports in use!
May 15 08:52:13 dove ASE: local AseMgr Error: can't create message service
May 15 08:52:13 dove ASE: local AseMgr Error: Initialize failed.
May 15 08:52:19 dove ASE: local AseMgr Error: msgSvcCreate: all network reserved ports in use!
May 15 08:52:19 dove ASE: local AseMgr Error: can't create message service
May 15 08:52:19 dove ASE: local AseMgr Error: Initialize failed.
        .
        .  more of same...
        .
May 15 09:07:38 dove ASE: local AseMgr Error: msgSvcCreate: all network reserved ports in use!
May 15 09:07:38 dove ASE: local AseMgr Error: can't create message service
May 15 09:07:38 dove ASE: local AseMgr Error: Initialize failed.
May 15 09:07:43 dove ASE: local AseMgr Error: Initialize failed.
May 15 09:52:26 dove xntpd[1077]: synchronized to 204.123.2.71, stratum=2
May 15 10:30:05 dove xntpd[1077]: time reset (step) -0.340619 s
May 15 10:30:05 dove xntpd[1077]: synchronized to 16.1.0.21, stratum=2
May 15 10:30:05 dove xntpd[1077]: synchronisation lost
May 15 10:35:39 dove xntpd[1077]: synchronized to 16.1.0.4, stratum=2
May 15 11:07:13 dove ASE: local AseMgr Error: msgSvcCreate: all network reserved ports in use!
May 15 11:07:13 dove ASE: local AseMgr Error: can't create message service
May 15 11:07:13 dove ASE: local AseMgr Error: Initialize failed.
        .
        .  more of same...
        .
May 15 11:08:03 dove ASE: local AseMgr Error: msgSvcCreate: all network reserved ports in use!
May 15 11:08:03 dove ASE: local AseMgr Error: can't create message service
May 15 11:08:03 dove ASE: local AseMgr Error: Initialize failed.
May 15 11:08:08 dove ASE: local AseMgr Error: Initialize failed.
May 15 11:21:08 dove xntpd[1077]: synchronized to 204.123.2.71, stratum=2
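
    The ASE messages are easy to lose among the xntpd lines, so here is 
    a minimal sketch for pulling them out and counting repeats.  The 
    function name is made up, and the match on "Error", "Warning" and 
    "ALERT" is only an assumption based on the wording in this log; note 
    that it also catches informational ALERT lines such as 
    HSM_PATH_STATUS ... UP.

```shell
#!/bin/sh
# Hypothetical helper: summarize ASE Error/Warning/ALERT messages from a
# syslog file on stdin, most frequent first.
ase_problem_summary() {
    awk '/ ASE: / && /(Error|Warning|ALERT)/ {
        msg = $0
        sub(/^.* ASE: /, "", msg)      # strip timestamp and host name
        count[msg]++
    }
    END { for (m in count) print count[m] "\t" m }' | sort -rn
}
```

    Feed it the dated syslog file for the boot in question, for example 
    'ase_problem_summary < daemon.log' from that directory (the file name 
    here is an assumption; use whichever file holds the ASE lines).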

========================================================================

    FYI... The disks on the shared SCSI bus are divided into several 
    partitions.  All of the partitions for one disk are combined into 
    one service.  Service names are ODB-DRD1, ODB-DRD2 and ODB-DRD3.  
    Symbolic links have been set up, including the following:

	.../OraData/TEST/big_data.dbf -> /dev/rdrd/drd17
	.../OraData/TEST/control01.ctl -> /dev/rdrd/drd3
	.../OraData/TEST/redoODB101.log -> /dev/rdrd/drd1
	.../OraData/TEST/system.dbf -> /dev/rdrd/drd4
	.../OraData/TEST/tsp_rbs1.dbf -> /dev/rdrd/drd7

    These file names were then used to create the OPS database.  However, 
    when we tried to get back to the data, we got an error.  It's hard to 
    narrow this down to Oracle, when we can't even get back into the asemgr 
    to check the status of the service.

% svrmgrl

Oracle Server Manager Release 2.3.2.0.0 - Production

Copyright (c) Oracle Corporation 1994, 1995. All rights reserved.

Oracle7 Server Release 7.3.2.2.0 with the 64-bit option - Production Release
With the distributed, parallel query and Parallel Server options
PL/SQL Release 2.3.2.2.0 - Production

SVRMGR> connect internal
Connected to an idle instance.
SVRMGR> 
SVRMGR> shutdown abort
ORACLE instance shut down.
SVRMGR> 
SVRMGR> startup pfile=$ORACLE_HOME/dbs/initTEST.ora
ORACLE instance started.
Total System Global Area       4802496 bytes
Fixed Size                       52424 bytes
Variable Size                  4225784 bytes
Database Buffers                491520 bytes
Redo Buffers                     32768 bytes
Database mounted.
ORA-01157: cannot identify data file 1 - file not found
ORA-01110: data file 1: '/Layers/Oracle/OraData/TEST/system.dbf'
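
    Since ORA-01157 complains that the file is not found, one thing that 
    can be checked without asemgr is whether the symbolic links still 
    resolve.  This is a minimal sketch with a made-up function name.  It 
    tests only that each link's target exists on the local node and says 
    nothing about whether the DRD service behind the device is started.

```shell
#!/bin/sh
# Hypothetical helper: for every symbolic link in the given directory,
# report whether its target currently exists.
check_links() {
    dir=$1
    for link in "$dir"/*; do
        [ -L "$link" ] || continue     # ignore anything that is not a link
        if [ -e "$link" ]; then        # -e follows the link to its target
            echo "ok      $link"
        else
            echo "BROKEN  $link"
        fi
    done
}
```

    For the database above that would be 
    'check_links /Layers/Oracle/OraData/TEST', run on both members.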

T.R	Title	User	Personal Name	Date	Lines
2063.1	Hints	NNTPD::"[email protected]"	Dave Cherkus	Thu May 15 1997 14:09	23

    If I were you, I would start with the 'can't ping over SCSI bus' 
    problem in the log file.  You will never have a stable ASE until this 
    is fixed (i.e. asemgr won't work).  Fixing this also might help get 
    tractd unstuck.

    So, why are SCSI pings failing?  Some hints:

    - Are you using a supported SCSI controller (i.e. KZPSA)?

    - Did you set the SCSI IDs for each of these controllers?  
      (Hint: one should be 6, the other should be 7, and the one you 
      set to 6 needs to be changed using console cmds)

    - Is there a disk that conflicts with SCSI id 6?  Try getting things 
      to work with one disk first, then add disks one by one.

    - What are the console 'show' commands indicating?  Are all disks 
      visible from all hosts?  Does 'sho devices' show both scsi 
      controllers?  ASE won't work till this checks out.

    - Is your scsi termination done properly?

    Dave

    [Posted by WWW Notes gateway]
2063.2	console output on 1st (dove)	DONVAN::HARRIS		Fri May 16 1997 11:27	77

    Here's the output from the show config, show device, and show pk* 
    console commands on 'dove'.

dove console output
-------------------

P00>>>show config | more

                      Digital Equipment Corporation
                         AlphaServer 2000 4/200

SRM Console V4.7-143    VMS PALcode V5.56-6, OSF PALcode X1..45-12

Component     Status     Module ID
CPU 0           P        B2020-AA DECchip (tm) 21064-3
Memory 0        P        B2023-BA 128 MB
I/O                      B2111-AA
                         dva0.0.0.1000.0    RX26/RX23

Slot  Option              Hose 0, Bus 0, PCI
 1    NCR 53C810          pka0.7.0.1.0       SCSI Bus ID 7
                          dka0.0.0.1.0       RZ28
                          dka100.1.0.1.0     RZ28
                          dka200.2.0.1.0     RZ26
                          dka300.3.0.1.0     RZ26L
                          dka600.6.0.1.0     RRD43
                          mka500.5.0.1.0     TLZ07      (not on canary)
 2    Intel 82375EB                          Bride to Bus 1, EISA
 6    DECchip 21040-AA    ewa0.0.0.6.0       08-00-2B-E5-A5-31
 7    DEC PCI MC                             Rev: b, mca0
 8    DEC KZPSA           pkb0.6.0.8.0       SCSI Bus ID 6
                          dkb100.1.0.8.0     RZ28D
                          dkb200.2.0.8.0     RZ28
                          dkb300.3.0.8.0     RZ28D
                          dkb400.4.0.8.0     RZ28B
                          dkb500.5.0.8.0     RZ28

Slot  Option              Hose 0, Bus 1, EISA
 1    DE422
 7    Compaq Qvision
P00>>>
P00>>>show device
dka0.0.0.1.0       DKA0      RZ28       442D
dka100.1.0.1.0     DKA100    RZ28       442D
dka200.2.0.1.0     DKA200    RZ26       392A    (different?)
dka300.3.0.1.0     DKA300    RZ26L      442D
dka400.4.0.1.0     DKA400    RZ28D      0008    (not on canary)
dka600.6.0.1.0     DKA600    RRD43      1084
dkb100.1.0.8.0     DKB100    RZ28D      0008
dkb200.2.0.8.0     DKB200    RZ28       442D
dkb300.3.0.8.0     DKB300    RZ28D      0008
dkb400.4.0.8.0     DKB400    RZ28B      0003
dkb500.5.0.8.0     DKB500    RZ28       442D
dva0.0.0.1000.0    DVA0      RZ26/RX23
jkb707.7.0.8.0     JKB707    DIGITAL    ffff    (different)
mka500.5.0.1.0     MKA500    TLZ07      4BE0    (not on canary)
ewa0.0.0.6.0       EWA0      08-00-2B-E5-A5-31
pka0.7.0.1.0       PKA0      SCSI Bus ID 7
pkb0.6.0.8.0       PKB0      SCSI Bus ID 6  F01 A10
P00>>>
P00>>>show pk*
pka0_disconnect    1
pka0_fast          1
pka0_host_id       7
pkb0_fast          1
pkb0_host_id       6
pkb0_termpwr       1
P00>>>boot

================== RFC 822 Headers ==================
Return-Path: [email protected]
Received: by donvan.zko.dec.com (UCX V4.1-12, OpenVMS V7.0 VAX);
	Fri, 16 May 1997 11:01:41 -0400
Received: by canary.zko.dec.com; id AA02096; Fri, 16 May 1997 10:59:26 -0400
Date: Fri, 16 May 1997 10:59:26 -0400
From: Peggy Harris <[email protected]>
Message-Id: <[email protected]>
Apparently-To: [email protected]
2063.3	console output from 2nd (canary)	DONVAN::HARRIS		Fri May 16 1997 11:28	76

    Here's the output from the show config, show device, and show pk* 
    console commands on 'canary'.

Canary Console Output
---------------------

P00>>>show config | more

                      Digital Equipment Corporation
                         AlphaServer 2000 4/200

SRM Console V4.7-143    VMS PALcode V5.56-6, OSF PALcode X1..45-12

Component     Status     Module ID
CPU 0           P        B2020-AA DECchip (tm) 21064-3
Memory 0        P        B2023-BA 128 MB
I/O                      B2111-AA
                         dva0.0.0.1000.0    RX26/RX23

Slot  Option              Hose 0, Bus 0, PCI
 1    NCR 53C810          pka0.7.0.1.0       SCSI Bus ID 7
                          dka0.0.0.1.0       RZ28
                          dka100.1.0.1.0     RZ28
                          dka200.2.0.1.0     RZ26
                          dka300.3.0.1.0     RZ26L
                          dka600.6.0.1.0     RRD43
 2    Intel 82375EB                          Bride to Bus 1, EISA
 6    DECchip 21040-AA    ewa0.0.0.6.0       08-00-2B-E5-C2-64
 7    DEC PCI MC                             Rev: b, mca0
 8    DEC KZPSA           pkb0.7.0.8.0       SCSI Bus ID 7
                          dkb100.1.0.8.0     RZ28D
                          dkb200.2.0.8.0     RZ28
                          dkb300.3.0.8.0     RZ28D
                          dkb400.4.0.8.0     RZ28B
                          dkb500.5.0.8.0     RZ28
                          jkb607.6.0.8.0     DIGITAL    (not on dove)

Slot  Option              Hose 0, Bus 1, EISA
 1    DE422
 7    Compaq Qvision
P00>>>
P00>>>show device
dka0.0.0.1.0       DKA0      RZ28       442D
dka100.1.0.1.0     DKA100    RZ28       442D
dka200.2.0.1.0     DKA200    RZ26       T386
dka300.3.0.1.0     DKA300    RZ26L      442D
dka600.6.0.1.0     DKA600    RRD43      1084
dkb100.1.0.8.0     DKB100    RZ28D      0008
dkb200.2.0.8.0     DKB200    RZ28       442D
dkb300.3.0.8.0     DKB300    RZ28D      0008
dkb400.4.0.8.0     DKB400    RZ28B      0003
dkb500.5.0.8.0     DKB500    RZ28       442D
dva0.0.0.1000.0    DVA0      RZ26/RX23
jkb607.6.0.8.0     JKB607    DIGITAL    ffff
ewa0.0.0.6.0       EWA0      08-00-2B-E5-C2-64
pka0.7.0.1.0       PKA0      SCSI Bus ID 7
pkb0.7.0.8.0       PKB0      SCSI Bus ID 7  F01 A10
P00>>>
P00>>>show pk*
pka0_disconnect    1
pka0_fast          1
pka0_host_id       7
pkb0_fast          1
pkb0_host_id       7
pkb0_termpwr       1
P00>>>boot

================== RFC 822 Headers ==================
Return-Path: [email protected]
Received: by donvan.zko.dec.com (UCX V4.1-12, OpenVMS V7.0 VAX);
	Fri, 16 May 1997 11:01:50 -0400
Received: by canary.zko.dec.com; id AA02083; Fri, 16 May 1997 10:59:36 -0400
Date: Fri, 16 May 1997 10:59:36 -0400
From: Peggy Harris <[email protected]>
Message-Id: <[email protected]>
Apparently-To: [email protected]
2063.4	No conclusive evidence, but...	NNTPD::"[email protected]"	Dave Cherkus	Mon May 19 1997 09:33	17

    The only tidbit in the data is that on dove the sho conf output does 
    not list jkb707 whereas its show dev output does.  This name jkb707 
    corresponds to canary's scsi controller.  I'm not sure how to 
    interpret this.  It does correspond to the problem reported in the 
    logs: dove can't ping canary over the scsi bus.  It is a hint that 
    there still is a scsi bus issue.

    If it were me I would continue working under the presumption that it 
    is indeed a scsi bus cabling and/or termination issue.  Did you yank 
    the terminating resistors from the DWZZB (presuming it is not at the 
    end of the bus)?  If you are doing external termination did you yank 
    the terminators from the KZPSAs?  Can you try shorter cables?  Do you 
    have a second set of cables you can try instead?

    [Posted by WWW Notes gateway]
2063.5	Update & steps we took	DONVAN::HARRIS		Mon May 19 1997 13:52	43

    Here's an update...  We did the following, and are now able to see 
    all disks from both machines.

    o  added new-wire_method=0

       Rebuilt the kernel, updating the parameter new-wire_method=0.  
       This fixed the delay in connecting to a database from the OPS 
       service manager, but there were still troubles accessing the raw 
       devices.

       btw... I was told by an OPS technical person to use 
       new-wire_method, and that seemed to make a difference.  However, 
       I've seen it spelled differently in this conference.  Can someone 
       clarify?

    o  cluster_map_create

       Tried to run cluster_map_create and got an rcmgr error.  Looked 
       in the .rchosts files and noticed that the host names were dove 
       and canary instead of mcdove and mccanary.  Added the correct 
       host names on both sides.  Then, reran cluster_map_create.

    o  ase_fix_config

       Was finally able to successfully run CMON on both machines.  
       canary appeared to see all of the shared disks, but dove only saw 
       about half of them.  So, we ran /var/ase/sbin/ase_fix_config and 
       specified the shared SCSI bus to be numbered using the default 
       (it was numbered 1, but we changed it to 16, the default response 
       in ase_fix_config).

             Name     Controller   Slot   Bus    Slot
          )  scsi0    psiop0       0      pci0   1
         1)  scsi16   pza0         0      pci0   8

    o  Ran cmon

       This time, when we ran cmon, it started on both machines, and 
       both machines saw all of the disks correctly.

    I'm not sure exactly which of these steps fixed the problem.  Thanks 
    for the assistance, and directing us to take a closer look at the 
    SCSI connections.  Any additional feedback is welcome, since I plan 
    to go back and take a closer look to see exactly what is different 
    after these changes.

    Peggy