Title: ase
Moderator: SMURF::GROSSO
Created: Thu Jul 29 1993
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 2114
Total number of notes: 7347
2063.0. "More asemgr/cmon/tractd problems" by DONVAN::HARRIS () Thu May 15 1997 14:15
We're trying to put together a system that will be used to develop
training materials for OPS in a Production Server environment. It
seems that this isn't an easy task, since we've found a lot of "gotchas"
that don't show up in the installation guides, but have to be tracked
down via notes and other resources.
The latest is a problem with the tractd daemon. It is somewhat similar
to what I've seen reported in several notes here (listed in 1765.1), yet
not quite the same. Here's our setup:
o Two AlphaServer 2000's with I/O module upgrade to support MC
o Five rz28's on the shared SCSI
o Digital UNIX V4.0B
o Production Server V1.4
o OPS V7.3.2.2.0
Looking at all of the hints and suggestions, here's what I've done.
1. Check to make sure cluster is up and running.
2. Attempt to use cluster monitor
3. Attempt to use asemgr
4. Checked host names for underscores
5. Checked to see that all hostnames are in /etc/hosts file.
6. Checked to see that both networks are registered in /etc/networks.
7. Try running '# netstat | more' to see active connections
8. Try running '# netstat | wc -l' to (line) count connections
9. Use lsof
10. Look at the file: /var/admin/syslog.dated/*BOOT-DATE*
The results are attached below. Given the following information, can
someone please tell me if I'm a candidate for the patch described in
notes 1396.0 and/or 1658.2? We're desperately trying to get course
materials together before funding runs out at the end of the fiscal
year, but are running out of time, just setting up the hardware and
software.
I've also added, at the end, a quick description of how we're trying
to reach these devices via OPS, in case that's of any help.
Peggy
========================================================================
1. Check to make sure cluster is up and running.
% cnxshow
Cluster View from mcdove
Director: mccanary Suspended: No
Node monitor using tie-breaking disk: Not required
Hostname Cluster I/F CS_ID Incarnation Comm Okay Member
-----------------------------------------------------------------------------
canary.zko.d mccanary 0003,0002 0000000000011438 Yes Yes
dove.zko.dec mcdove 0004,0001 000000000009d372 Yes Yes
========================================================================
2. Attempt to use cluster monitor
Cluster Monitor
Digital Equipment Corporation
Cluster Monitor
Copyright 1996
Initializing. Please wait...
-----------------------------
Cluster Monitor: Warning
!Cannot initialize TRACT library.
+--+
|OK|
+--+
========================================================================
3. Attempt to use asemgr
% su
Password:
#
# asemgr
..........
Initialize failed.
#
========================================================================
4. Checked host names for underscores
Our node names are 'dove' and 'canary', with memory channel names of
mcdove and mccanary. The only places we had underscores were in our
DRD service names, which we changed to be dashes (such as ODB-DRD1).
There is an underscore in the /etc/hosts name cluster_cnx, but I was
of the impression that we have no choice in that one.
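
A check like this one can be scripted rather than done by eye. The following is only a minimal sketch, assuming the usual /etc/hosts layout (address first, then names, '#' starting a comment); the function name is made up for illustration and is not part of ASE.

```python
def names_with_underscores(hosts_text):
    """Return host names and aliases in /etc/hosts-style text that contain '_'."""
    flagged = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop trailing comments
        fields = line.split()
        if len(fields) < 2:
            continue                            # blank, comment-only, or address-only line
        for name in fields[1:]:                 # fields[0] is the IP address
            if "_" in name:
                flagged.append(name)
    return flagged


# Sample lines taken from the hosts files shown in this note.
sample = """\
#
127.0.0.1 localhost
10.0.0.1 mccanary.zko.dec.com mccanary # TCRPS MEMORY CHANNEL
10.0.0.2 mcdove.zko.dec.com mcdove # TCRPS MEMORY CHANNEL
10.0.0.42 cluster_cnx
"""

print(names_with_underscores(sample))
```

On the sample it flags only cluster_cnx, which matches what we found by hand.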
========================================================================
5. Checked to see that all hostnames are in /etc/hosts file.
canary: /etc/hosts
#
127.0.0.1 localhost
16.30.48.190 canary.zko.dec.com canary # 2.962 ZKO01-3 Lab
10.0.0.1 mccanary.zko.dec.com mccanary # TCRPS MEMORY CHANNEL
10.0.0.2 mcdove.zko.dec.com mcdove # TCRPS MEMORY CHANNEL
10.0.0.42 cluster_cnx
16.30.48.189 dove.zko.dec.com dove # 2.963 ZKO01-3 Lab
16.30.48.185 mmstuf.zko.dec.com mmstuff # 2.579 ZKO01-3 Lab
16.30.16.214 rdwngs.zko.dec.com rdwngs # 2.93 2D54/3006
.
.
.
------------------------------------------------------------------------
dove: /etc/hosts
#
127.0.0.1 localhost
10.0.0.1 mccanary.zko.dec.com mccanary #
10.0.0.2 mcdove.zko.dec.com mcdove #
10.0.0.42 cluster_cnx
16.30.48.189 dove.zko.dec.com dove # 2.963 ZKO01-3 Lab
16.30.48.190 canary.zko.dec.com canary # 2.962 ZKO01-3 Lab
.
.
.
========================================================================
6. Checked to see that both networks are registered in /etc/networks.
** I'm not sure what to check for on this one, since I missed
any indication of editing this file in the installation guide.
canary: /etc/networks
#
# Syntax: network_name network_number [alias_1,...,alias_n] [ #comments ]
#
# network_name name of the network supplied by the network administrator
# network_number network number assigned to the network by the NIC
# alias other names or abbreviations for this network
# #comments text following the comment character (#) is ignored
#
loop 127 loopback
------------------------------------------------------------------------
dove: /etc/networks
# Syntax: network_name network_number [alias_1,...,alias_n] [ #comments ]
#
# network_name name of the network supplied by the network administrator
# network_number network number assigned to the network by the NIC
# alias other names or abbreviations for this network
# #comments text following the comment character (#) is ignored
#
loop 127 loopback
========================================================================
7. Try running '# netstat | more' to see active connections
Got SEVERAL repeats of the following sequence:
# netstat | more
Active Internet connections
Proto Recv-Q Send-Q Local Address Foreign Address (state)
tcp 0 0 dove.2529 turky.print-sr CLOSE_WAIT
tcp 0 0 mcdove.1023 mcdove.538 ESTABLISHED
tcp 280 0 mcdove.538 mcdove.1023 ESTABLISHED
tcp 0 0 mcdove.1023 mcdove.570 ESTABLISHED
tcp 0 0 mcdove.570 mcdove.1023 ESTABLISHED
tcp 0 0 mcdove.1023 mcdove.602 ESTABLISHED
tcp 0 0 mcdove.602 mcdove.1023 ESTABLISHED
tcp 0 0 mcdove.1023 mcdove.634 ESTABLISHED
tcp 280 0 mcdove.634 mcdove.1023 ESTABLISHED
tcp 0 0 mcdove.1023 mcdove.666 ESTABLISHED
tcp 0 0 mcdove.666 mcdove.1023 ESTABLISHED
tcp 0 0 mcdove.1023 mcdove.698 ESTABLISHED
tcp 280 0 mcdove.698 mcdove.1023 ESTABLISHED
tcp 0 0 mcdove.1023 mcdove.730 ESTABLISHED
. . . .
. . . .
. . . .
========================================================================
8. Try running '# netstat | wc -l' to check line count
# netstat | wc -l
1044
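
The raw line count does not say how many of those connections sit on reserved ports, which is what the "all network reserved ports in use" error in the syslog (step 10) complains about. Below is a rough sketch for counting them. It assumes the reserved range in question is 512-1023 (the range rresvport() conventionally draws from; whether asemgr uses that call is a guess), and that the netstat lines look like the ones in step 7. The function name is hypothetical. Ports that netstat prints as service names (such as print-sr) are skipped, so 'netstat -n' output would give a more complete count.

```python
def reserved_local_ports(netstat_text, low=512, high=1023):
    """Collect distinct local TCP ports in [low, high] from netstat output."""
    ports = set()
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[0] != "tcp":
            continue                        # skip headers and non-TCP lines
        # Local address is "host.port"; the port follows the last dot.
        _host, _sep, port = fields[3].rpartition(".")
        if port.isdigit() and low <= int(port) <= high:
            ports.add(int(port))
    return ports


# A few lines from the netstat listing in step 7.
sample = """\
Active Internet connections
Proto Recv-Q Send-Q Local Address Foreign Address (state)
tcp 0 0 dove.2529 turky.print-sr CLOSE_WAIT
tcp 0 0 mcdove.1023 mcdove.538 ESTABLISHED
tcp 280 0 mcdove.538 mcdove.1023 ESTABLISHED
tcp 0 0 mcdove.1023 mcdove.570 ESTABLISHED
tcp 0 0 mcdove.570 mcdove.1023 ESTABLISHED
"""

used = reserved_local_ports(sample)
print(len(used), "reserved ports in use:", sorted(used))
```

If the count on the live system comes out close to the size of the range (512 ports), that would be consistent with the error message.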
========================================================================
9. Use lsof (but I'm not sure what this does)
# lsof -i :1023
lsof: WARNING: access /.lsof_dove: No such file or directory
lsof: WARNING: created device cache file: /.lsof_dove
COMMAND PID USER FD TYPE DEVICE SIZE/OFF INODE NAME
tractd 248 root 4u inet 0x027bf700 0t0 TCP *:1023
tractd 248 root 5u inet 0x027bea00 0t0 TCP mcdove.zko.dec.com:1022->dove.zko.dec.com:1023
tractd 248 root 6u inet 0x027bfa00 0t0 TCP mcdove.zko.dec.com:1021->dove.zko.dec.com:1023
tractd 248 root 7u inet 0x02517f00 0t0 TCP mcdove.zko.dec.com:1020->dove.zko.dec.com:1023
.
. (long list repeats)
.
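
To see which process is holding the sockets, the lsof listing can be tallied by command and PID. This is just a sketch that assumes the column layout shown above (COMMAND first, PID second); the function name is made up for illustration.

```python
from collections import Counter


def sockets_per_command(lsof_text):
    """Count lsof output lines per (command, pid) pair."""
    counts = Counter()
    for line in lsof_text.splitlines():
        fields = line.split()
        # Data lines have a numeric PID in the second column; the header
        # and the "lsof: WARNING" lines do not.
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        counts[(fields[0], int(fields[1]))] += 1
    return counts


# Lines copied from the lsof output above.
sample = """\
lsof: WARNING: access /.lsof_dove: No such file or directory
lsof: WARNING: created device cache file: /.lsof_dove
COMMAND PID USER FD TYPE DEVICE SIZE/OFF INODE NAME
tractd 248 root 4u inet 0x027bf700 0t0 TCP *:1023
tractd 248 root 5u inet 0x027bea00 0t0 TCP mcdove.zko.dec.com:1022->dove.zko.dec.com:1023
tractd 248 root 6u inet 0x027bfa00 0t0 TCP mcdove.zko.dec.com:1021->dove.zko.dec.com:1023
tractd 248 root 7u inet 0x02517f00 0t0 TCP mcdove.zko.dec.com:1020->dove.zko.dec.com:1023
"""

for (command, pid), n in sockets_per_command(sample).most_common():
    print(command, pid, n)
```

In our case every line in the long list belonged to tractd (PID 248), so a single large count for that pair is what to expect.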
========================================================================
10. Look at the file: /var/admin/syslog.dated/*BOOT-DATE*
May 14 16:45:41 dove cnxpingd: starting
May 14 16:45:41 dove cnxpingd: waiting to register with kernel agent
May 14 16:45:42 dove cnxagentd: starting
May 14 16:45:53 dove ASE: local HSM Warning: Can't ping mccanary over the SCSI bus
May 14 16:45:53 dove ASE: local HSM Notice: Able to ping mccanary over the network
May 14 16:45:55 dove ASE: local HSM ***ALERT: HSM_PATH_STATUS:10.0.0.1:UP
May 14 16:45:55 dove ASE: local HSM Notice: member mccanary is UP
May 14 16:45:55 dove ASE: local HSM ***ALERT: network ping to host mccanary is working but SCSI ping is not
May 14 16:45:55 dove ASE: mcdove Agent Notice: initializing agent... stopping all services
May 14 16:46:03 dove mountd[933]: startup
May 14 16:46:07 dove ASE: mccanary Director Notice: stopped service ODB-DRD3 on mccanary
May 14 16:46:07 dove ASE: mcdove Agent Notice: starting service ODB-DRD3
May 14 16:46:07 dove xntpd[1077]: xntpd version=3.4x
May 14 16:46:07 dove xntpd[1077]: tickadj = 1, tick = 976, tvu_maxslew = 1023
May 14 16:46:07 dove xntpd[1077]: precision = 976 usec
May 14 16:46:09 dove ASE: mccanary Director Notice: stopped service ODB-DRD1 on mccanary
May 14 16:46:09 dove ASE: mcdove Agent Notice: starting service ODB-DRD1
May 14 16:46:09 dove ASE: mccanary Director Notice: stopped service ODB-DRD2 on mccanary
May 14 16:46:09 dove ASE: mcdove Agent Notice: starting service ODB-DRD2
May 14 16:46:10 dove inetd[1197]: bootp/udp: unknown service
May 14 16:46:10 dove ASE: mccanary Director Notice: started service ODB-DRD3 on mcdove
May 14 16:46:13 dove xntpd[1077]: Previous time adjustment didn't complete
May 14 16:46:21 dove ASE: mccanary Director Notice: started service ODB-DRD2 on mcdove
May 14 16:46:21 dove ASE: mccanary Director Notice: started service ODB-DRD1 on mcdove
May 14 16:46:21 dove ASE: mccanary Director Notice: agent on mcdove came ONLINE
May 14 16:46:27 dove xntpd[1077]: Previous time adjustment didn't complete
May 14 16:47:51 dove xntpd[1077]: Previous time adjustment didn't complete
May 14 16:50:59 dove xntpd[1077]: synchronized to 204.123.2.71, stratum=2
May 14 16:50:59 dove xntpd[1077]: time reset (step) 0.133463 s
May 14 16:50:59 dove xntpd[1077]: synchronisation lost
May 14 16:51:00 dove xntpd[1077]: Previous time adjustment didn't complete
May 14 16:53:38 dove xntpd[1077]: Previous time adjustment didn't complete
May 14 16:55:49 dove xntpd[1077]: synchronized to 16.1.0.21, stratum=2
May 14 16:56:06 dove ASE: mccanary Director Warning: Director exiting...
May 14 16:56:06 dove ASE: mcdove Agent Notice: starting a new director
May 14 16:56:07 dove ASE: mccanary Agent Notice: agent on mcdove will start director
May 14 16:56:10 dove xntpd[1077]: Previous time adjustment didn't complete
May 14 16:56:11 dove xntpd[1077]: Previous time adjustment didn't complete
May 14 16:56:16 dove ASE: mcdove AseMgr Notice: msgSvcSend: peer hung up before we got reply
May 14 16:56:16 dove ASE: mcdove AseMgr Warning: blocking send of ASE_INQ_SERVICES failed or channel hung up
May 14 16:56:16 dove ASE: mcdove AseMgr Notice: reconnected to director
May 14 16:56:30 dove ASE: mccanary Agent Notice: restarting Agent!
May 14 16:56:30 dove ASE: mccanary Director Notice: deleted member mccanary
May 14 16:56:30 dove ASE: mccanary Director Notice: changed the ASE member list
May 14 16:56:31 dove ASE: mccanary Director Notice: stored a new ASE configuration database
May 14 17:00:11 dove ASE: local HSM Notice: Able to ping mccanary over the network
May 14 17:00:11 dove ASE: local HSM Notice: Able to ping mccanary over the network
May 14 17:00:12 dove ASE: mccanary Director Notice: changed the ASE member list
May 14 17:00:22 dove ASE: mccanary Director Notice: stored a new ASE configuration database
May 14 17:00:26 dove ASE: local HSM Warning: Can't ping mccanary over the SCSI bus
May 14 17:00:28 dove ASE: local HSM ***ALERT: HSM_PATH_STATUS:10.0.0.1:UP
May 14 17:00:28 dove ASE: local HSM Notice: member mccanary is UP
May 14 17:00:28 dove ASE: local HSM ***ALERT: network ping to host mccanary is working but SCSI ping is not
May 14 17:05:28 dove xntpd[1077]: Previous time adjustment didn't complete
May 14 17:11:38 dove xntpd[1077]: Previous time adjustment didn't complete
May 14 17:13:44 dove last message repeated 5 times
May 14 17:16:59 dove last message repeated 6 times
May 14 17:16:59 dove xntpd[1077]: *** No more 'Prev time adj didn't complete'
May 14 17:19:14 dove xntpd[1077]: synchronized to 16.1.0.4, stratum=1
May 14 20:41:18 dove xntpd[1077]: synchronized to 204.123.2.71, stratum=2
May 14 20:41:38 dove xntpd[1077]: synchronized to 16.1.0.21, stratum=2
May 14 20:42:04 dove xntpd[1077]: synchronized to 204.123.2.71, stratum=2
May 14 20:51:59 dove xntpd[1077]: synchronized to 16.1.0.4, stratum=1
May 14 21:06:56 dove xntpd[1077]: time reset (step) 0.420126 s
.
. more of same
.
May 15 06:58:00 dove xntpd[1077]: synchronized to 16.1.0.21, stratum=2
May 15 06:58:00 dove xntpd[1077]: synchronisation lost
May 15 07:02:33 dove xntpd[1077]: synchronized to 16.1.0.21, stratum=2
May 15 07:03:20 dove xntpd[1077]: synchronized to 16.1.0.4, stratum=1
May 15 08:52:13 dove ASE: local AseMgr Error: msgSvcCreate: all network reserved ports in use!
May 15 08:52:13 dove ASE: local AseMgr Error: can't create message service
May 15 08:52:13 dove ASE: local AseMgr Error: Initialize failed.
May 15 08:52:19 dove ASE: local AseMgr Error: msgSvcCreate: all network reserved ports in use!
May 15 08:52:19 dove ASE: local AseMgr Error: can't create message service
May 15 08:52:19 dove ASE: local AseMgr Error: Initialize failed.
.
. more of same...
.
May 15 09:07:38 dove ASE: local AseMgr Error: msgSvcCreate: all network reserved ports in use!
May 15 09:07:38 dove ASE: local AseMgr Error: can't create message service
May 15 09:07:38 dove ASE: local AseMgr Error: Initialize failed.
May 15 09:07:43 dove ASE: local AseMgr Error: Initialize failed.
May 15 09:52:26 dove xntpd[1077]: synchronized to 204.123.2.71, stratum=2
May 15 10:30:05 dove xntpd[1077]: time reset (step) -0.340619 s
May 15 10:30:05 dove xntpd[1077]: synchronized to 16.1.0.21, stratum=2
May 15 10:30:05 dove xntpd[1077]: synchronisation lost
May 15 10:35:39 dove xntpd[1077]: synchronized to 16.1.0.4, stratum=2
May 15 11:07:13 dove ASE: local AseMgr Error: msgSvcCreate: all network reserved ports in use!
May 15 11:07:13 dove ASE: local AseMgr Error: can't create message service
May 15 11:07:13 dove ASE: local AseMgr Error: Initialize failed.
.
. more of same...
.
May 15 11:08:03 dove ASE: local AseMgr Error: msgSvcCreate: all network reserved ports in use!
May 15 11:08:03 dove ASE: local AseMgr Error: can't create message service
May 15 11:08:03 dove ASE: local AseMgr Error: Initialize failed.
May 15 11:08:08 dove ASE: local AseMgr Error: Initialize failed.
May 15 11:21:08 dove xntpd[1077]: synchronized to 204.123.2.71, stratum=2
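
With this much xntpd chatter in the file, it can help to pull out just the ASE warnings, alerts, and errors and count the repeats. The sketch below assumes the ASE line format seen above ("ASE: <node> <component> <severity>: <message>"); the pattern and function names are only for illustration.

```python
import re
from collections import Counter

# node, component, severity, message, as they appear in the log above.
ASE_LINE = re.compile(r"ASE: (\S+) (\S+) (Notice|Warning|Error|\*\*\*ALERT): (.*)")


def summarize_ase(log_text):
    """Count ASE messages that are not plain Notices, keyed by
    (component, severity, message)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ASE_LINE.search(line)
        if match and match.group(3) != "Notice":
            counts[(match.group(2), match.group(3), match.group(4))] += 1
    return counts


# A handful of lines from the log above.
sample = """\
May 14 16:45:53 dove ASE: local HSM Warning: Can't ping mccanary over the SCSI bus
May 14 16:45:53 dove ASE: local HSM Notice: Able to ping mccanary over the network
May 14 16:46:07 dove xntpd[1077]: precision = 976 usec
May 14 17:00:26 dove ASE: local HSM Warning: Can't ping mccanary over the SCSI bus
May 15 08:52:13 dove ASE: local AseMgr Error: msgSvcCreate: all network reserved ports in use!
"""

for (component, severity, message), n in summarize_ase(sample).most_common():
    print(n, component, severity, message)
```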
========================================================================
FYI... The disks on the shared SCSI bus are divided into several
partitions. All of the partitions for one disk are combined into
one service. Service names are ODB-DRD1, ODB-DRD2 and ODB-DRD3.
Symbolic links have been set up, including the following:
.../OraData/TEST/big_data.dbf -> /dev/rdrd/drd17
.../OraData/TEST/control01.ctl -> /dev/rdrd/drd3
.../OraData/TEST/redoODB101.log -> /dev/rdrd/drd1
.../OraData/TEST/system.dbf -> /dev/rdrd/drd4
.../OraData/TEST/tsp_rbs1.dbf -> /dev/rdrd/drd7
These file names were then used to create the OPS database. However,
when we tried to get back to the data, we got an error. It's hard to
narrow this down to Oracle, when we can't even get back into the asemgr
to check the status of the service.
% svrmgrl
Oracle Server Manager Release 2.3.2.0.0 - Production
Copyright (c) Oracle Corporation 1994, 1995. All rights reserved.
Oracle7 Server Release 7.3.2.2.0 with the 64-bit option - Production Release
With the distributed, parallel query and Parallel Server options
PL/SQL Release 2.3.2.2.0 - Production
SVRMGR> connect internal
Connected to an idle instance.
SVRMGR>
SVRMGR> shutdown abort
ORACLE instance shut down.
SVRMGR>
SVRMGR> startup pfile=$ORACLE_HOME/dbs/initTEST.ora
ORACLE instance started.
Total System Global Area 4802496 bytes
Fixed Size 52424 bytes
Variable Size 4225784 bytes
Database Buffers 491520 bytes
Redo Buffers 32768 bytes
Database mounted.
ORA-01157: cannot identify data file 1 - file not found
ORA-01110: data file 1: '/Layers/Oracle/OraData/TEST/system.dbf'
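
Since the data files are symbolic links to DRD devices, one thing worth ruling out after a "file not found" is a link whose target is not present on this node. The following is a small sketch of that check, not a diagnosis of the ORA-01157 itself. The function name is hypothetical, and the only path used is the one Oracle printed above.

```python
import os


def check_links(paths):
    """Return (path, reason) pairs for entries that are missing, are not
    symbolic links, or point at a target that does not exist."""
    problems = []
    for path in paths:
        if not os.path.islink(path):
            problems.append((path, "missing or not a symbolic link"))
        elif not os.path.exists(path):      # exists() follows the link
            problems.append((path, "dangling link to " + os.readlink(path)))
    return problems


# Path taken from the ORA-01110 message; add the other data files as needed.
for path, reason in check_links(["/Layers/Oracle/OraData/TEST/system.dbf"]):
    print(path, "->", reason)
```

Running it on both nodes would show whether the links resolve on one member but not the other.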
T.R | Title | User | Personal Name | Date | Lines |
---|
2063.1 | Hints | NNTPD::"[email protected]" | Dave Cherkus | Thu May 15 1997 15:09 | 23 |
| If I were you, I would start with the 'can't ping over SCSI bus'
problem in the log file. You will never have a stable ASE until
this is fixed (i.e. asemgr won't work). Fixing this
might also help get tractd unstuck.
So, why are SCSI pings failing?
Some hints:
- Are you using a supported SCSI controller (i.e. KZPSA)?
- Did you set the SCSI IDs for each of these controllers?
(Hint: one should be 6, the other should be 7, and the
one you set to 6 needs to be changed using console cmds)
- Is there a disk that conflicts with SCSI id 6? Try
getting things to work with one disk first, then add
disks one by one.
- What are the console 'show' commands indicating? Are all
disks visible from all hosts? Does 'sho devices' show
both scsi controllers? ASE won't work till this checks
out.
- Is your scsi termination done properly?
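
The ID questions above boil down to one rule: every adapter and every disk on the shared bus needs its own SCSI ID. Here is a rough sketch of that check. The function name and the example values are hypothetical, not taken from any real configuration.

```python
def bus_id_conflicts(host_ids, device_ids):
    """host_ids: dict of node name -> SCSI ID of its adapter on the shared bus.
    device_ids: SCSI IDs used by the disks on that bus.
    Returns a list of human-readable conflict descriptions."""
    problems = []
    seen = {}                                   # SCSI ID -> node that owns it
    for node, scsi_id in sorted(host_ids.items()):
        if scsi_id in seen:
            problems.append(f"{node} and {seen[scsi_id]} both use host ID {scsi_id}")
        else:
            seen[scsi_id] = node
    for scsi_id in sorted(set(device_ids)):
        if scsi_id in seen:
            problems.append(f"a device shares ID {scsi_id} with the adapter on {seen[scsi_id]}")
    return problems


# Hypothetical example: both adapters left at the same ID.
print(bus_id_conflicts({"nodeA": 7, "nodeB": 7}, [1, 2, 3]))
# Hypothetical example: adapters at 6 and 7, but a disk also jumpered to 6.
print(bus_id_conflicts({"nodeA": 6, "nodeB": 7}, [1, 2, 6]))
```

An empty list means the IDs are at least distinct; it says nothing about cabling or termination.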
Dave
[Posted by WWW Notes gateway]
|
2063.2 | console output on 1st (dove) | DONVAN::HARRIS | | Fri May 16 1997 12:27 | 77 |
| Here's the output from the show config, show device, and show pk* console
commands on 'dove'.
dove console output
-------------------
P00>>>show config | more
Digital Equipment Corporation
AlphaServer 2000 4/200
SRM Console V4.7-143 VMS PALcode V5.56-6, OSF PALcode X1..45-12
Component Status Module ID
CPU 0 P B2020-AA DECchip (tm) 21064-3
Memory 0 P B2023-BA 128 MB
I/O B2111-AA
dva0.0.0.1000.0 RX26/RX23
Slot Option Hose 0, Bus 0, PCI
1 NCR 53C810 pka0.7.0.1.0 SCSI Bus ID 7
dka0.0.0.1.0 RZ28
dka100.1.0.1.0 RZ28
dka200.2.0.1.0 RZ26
dka300.3.0.1.0 RZ26L
dka600.6.0.1.0 RRD43
mka500.5.0.1.0 TLZ07 (not on canary)
2 Intel 82375EB Bridge to Bus 1, EISA
6 DECchip 21040-AA ewa0.0.0.6.0 08-00-2B-E5-A5-31
7 DEC PCI MC Rev: b, mca0
8 DEC KZPSA pkb0.6.0.8.0 SCSI Bus ID 6
dkb100.1.0.8.0 RZ28D
dkb200.2.0.8.0 RZ28
dkb300.3.0.8.0 RZ28D
dkb400.4.0.8.0 RZ28B
dkb500.5.0.8.0 RZ28
Slot Option Hose 0, Bus 1, EISA
1 DE422
7 Compaq Qvision
P00>>>
P00>>>show device
dka0.0.0.1.0 DKA0 RZ28 442D
dka100.1.0.1.0 DKA100 RZ28 442D
dka200.2.0.1.0 DKA200 RZ26 392A (different?)
dka300.3.0.1.0 DKA300 RZ26L 442D
dka400.4.0.1.0 DKA400 RZ28D 0008 (not on canary)
dka600.6.0.1.0 DKA600 RRD43 1084
dkb100.1.0.8.0 DKB100 RZ28D 0008
dkb200.2.0.8.0 DKB200 RZ28 442D
dkb300.3.0.8.0 DKB300 RZ28D 0008
dkb400.4.0.8.0 DKB400 RZ28B 0003
dkb500.5.0.8.0 DKB500 RZ28 442D
dva0.0.0.1000.0 DVA0 RZ26/RX23
jkb707.7.0.8.0 JKB707 DIGITAL ffff (different)
mka500.5.0.1.0 MKA500 TLZ07 4BE0 (not on canary)
ewa0.0.0.6.0 EWA0 08-00-2B-E5-A5-31
pka0.7.0.1.0 PKA0 SCSI Bus ID 7
pkb0.6.0.8.0 PKB0 SCSI Bus ID 6 F01 A10
P00>>>
P00>>>show pk*
pka0_disconnect 1
pka0_fast 1
pka0_host_id 7
pkb0_fast 1
pkb0_host_id 6
pkb0_termpwr 1
P00>>>boot
================== RFC 822 Headers ==================
Return-Path: [email protected]
Received: by donvan.zko.dec.com (UCX V4.1-12, OpenVMS V7.0 VAX);
Fri, 16 May 1997 11:01:41 -0400
Received: by canary.zko.dec.com; id AA02096; Fri, 16 May 1997 10:59:26 -0400
Date: Fri, 16 May 1997 10:59:26 -0400
From: Peggy Harris <[email protected]>
Message-Id: <[email protected]>
Apparently-To: [email protected]
|
2063.3 | console output from 2nd (canary) | DONVAN::HARRIS | | Fri May 16 1997 12:28 | 76 |
|
Here's the output from the show config, show device, and show pk*
console commands on 'canary'.
Canary Console Output
---------------------
P00>>>show config | more
Digital Equipment Corporation
AlphaServer 2000 4/200
SRM Console V4.7-143 VMS PALcode V5.56-6, OSF PALcode X1..45-12
Component Status Module ID
CPU 0 P B2020-AA DECchip (tm) 21064-3
Memory 0 P B2023-BA 128 MB
I/O B2111-AA
dva0.0.0.1000.0 RX26/RX23
Slot Option Hose 0, Bus 0, PCI
1 NCR 53C810 pka0.7.0.1.0 SCSI Bus ID 7
dka0.0.0.1.0 RZ28
dka100.1.0.1.0 RZ28
dka200.2.0.1.0 RZ26
dka300.3.0.1.0 RZ26L
dka600.6.0.1.0 RRD43
2 Intel 82375EB Bridge to Bus 1, EISA
6 DECchip 21040-AA ewa0.0.0.6.0 08-00-2B-E5-C2-64
7 DEC PCI MC Rev: b, mca0
8 DEC KZPSA pkb0.7.0.8.0 SCSI Bus ID 7
dkb100.1.0.8.0 RZ28D
dkb200.2.0.8.0 RZ28
dkb300.3.0.8.0 RZ28D
dkb400.4.0.8.0 RZ28B
dkb500.5.0.8.0 RZ28
jkb607.6.0.8.0 DIGITAL (not on dove)
Slot Option Hose 0, Bus 1, EISA
1 DE422
7 Compaq Qvision
P00>>>
P00>>>show device
dka0.0.0.1.0 DKA0 RZ28 442D
dka100.1.0.1.0 DKA100 RZ28 442D
dka200.2.0.1.0 DKA200 RZ26 T386
dka300.3.0.1.0 DKA300 RZ26L 442D
dka600.6.0.1.0 DKA600 RRD43 1084
dkb100.1.0.8.0 DKB100 RZ28D 0008
dkb200.2.0.8.0 DKB200 RZ28 442D
dkb300.3.0.8.0 DKB300 RZ28D 0008
dkb400.4.0.8.0 DKB400 RZ28B 0003
dkb500.5.0.8.0 DKB500 RZ28 442D
dva0.0.0.1000.0 DVA0 RZ26/RX23
jkb607.6.0.8.0 JKB607 DIGITAL ffff
ewa0.0.0.6.0 EWA0 08-00-2B-E5-C2-64
pka0.7.0.1.0 PKA0 SCSI Bus ID 7
pkb0.7.0.8.0 PKB0 SCSI Bus ID 7 F01 A10
P00>>>
P00>>>show pk*
pka0_disconnect 1
pka0_fast 1
pka0_host_id 7
pkb0_fast 1
pkb0_host_id 7
pkb0_termpwr 1
P00>>>boot
================== RFC 822 Headers ==================
Return-Path: [email protected]
Received: by donvan.zko.dec.com (UCX V4.1-12, OpenVMS V7.0 VAX);
Fri, 16 May 1997 11:01:50 -0400
Received: by canary.zko.dec.com; id AA02083; Fri, 16 May 1997 10:59:36 -0400
Date: Fri, 16 May 1997 10:59:36 -0400
From: Peggy Harris <[email protected]>
Message-Id: <[email protected]>
Apparently-To: [email protected]
|
2063.4 | No conclusive evidence, but... | NNTPD::"[email protected]" | Dave Cherkus | Mon May 19 1997 10:33 | 17 |
| The only tidbit in the data is that on dove the sho conf output
does not list jkb707 whereas its show dev output does. This
name jkb707 corresponds to canary's scsi controller. I'm not
sure how to interpret this. It does correspond to the problem
reported in the logs: dove can't ping canary over the scsi bus.
It is a *hint* that there still is a scsi bus issue.
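
This kind of comparison between two console listings can be done mechanically. The sketch below assumes SRM-style device names of the form xxxN.N.N.N.N (as in the posted output); the pattern and function names are only for illustration.

```python
import re

# Matches names such as dkb100.1.0.8.0 or pkb0.6.0.8.0.
DEVICE = re.compile(r"\b[a-z]{3}\d+(?:\.\d+){4}\b")


def device_names(console_text):
    """Return the set of device names mentioned in console output."""
    return set(DEVICE.findall(console_text))


def only_in_second(first_text, second_text):
    """Device names that appear in second_text but not in first_text."""
    return sorted(device_names(second_text) - device_names(first_text))


# Excerpts from dove's 'show config' and 'show device' output (shared bus only).
show_config = """\
8 DEC KZPSA pkb0.6.0.8.0 SCSI Bus ID 6
dkb100.1.0.8.0 RZ28D
dkb200.2.0.8.0 RZ28
"""
show_device = """\
dkb100.1.0.8.0 DKB100 RZ28D 0008
dkb200.2.0.8.0 DKB200 RZ28 442D
jkb707.7.0.8.0 JKB707 DIGITAL ffff
pkb0.6.0.8.0 PKB0 SCSI Bus ID 6 F01 A10
"""

print(only_in_second(show_config, show_device))
```

The same two functions can be pointed at the listings from the two hosts to see which devices one node reports and the other does not.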
If it were me I would continue working under the presumption
that it is indeed a scsi bus cabling and/or termination issue.
Did you yank the terminating resistors from the DWZZB (presuming
it is not at the end of the bus)? If you are doing external
termination did you yank the terminators from the KZPSAs? Can
you try shorter cables? Do you have a second set of cables you
can try instead?
[Posted by WWW Notes gateway]
|
2063.5 | Update & steps we took | DONVAN::HARRIS | | Mon May 19 1997 14:52 | 43 |
| Here's an update... We did the following, and are now able to see all
disks from both machines.
o added new-wire_method=0
Rebuilt the kernel, updating the parameter new-wire_method=0. This
fixed the delay in connecting to a database from the OPS service
manager, but there were still troubles accessing the raw devices.
btw... I was told by an OPS technical person to use new-wire_method,
and that seemed to make a difference. However, I've seen it spelled
differently in this conference. Can someone clarify?
o cluster_map_create
Tried to run cluster_map_create and got an rcmgr error. Looked
in the .rchosts files and noticed that the host names were dove
and canary instead of mcdove and mccanary. Added the correct
host names on both sides. Then, reran cluster_map_create.
o ase_fix_config
Was finally able to successfully run CMON on both machines. canary
appeared to see all of the shared disks, but dove only saw about half
of them. So, we ran /var/ase/sbin/ase_fix_config and specified the
shared SCSI bus to be numbered using the default (it was numbered 1,
but we changed it to 16, the default response in ase_fix_config).
    Name     Controller  Slot  Bus   Slot
0)  scsi0    psiop0      0     pci0  1
1)  scsi16   pza0        0     pci0  8
o Ran cmon
This time, when we ran cmon, it started on both machines, and both
machines saw all of the disks correctly.
I'm not sure exactly which of these steps fixed the problem. Thanks
for the assistance, and directing us to take a closer look at the SCSI
connections. Any additional feedback is welcome, since I plan to go
back and take a closer look to see exactly what is different after
these changes.
Peggy
|