Conference smurf::ase

Title: ase
Moderator: SMURF::GROSSO
Created: Thu Jul 29 1993
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 2114
Total number of notes: 7347

2072.0. "TruCluster & OPS recovery" by HGOV08::NANDAN () Tue May 20 1997 10:12

    
    We have configured a TruCluster Production Server setup as 
    follows:
                    System1         System2
                    -------         -------
    Host name       as4100a         as4100b
    Member name     mc1             mc2
    
    The DIGITAL UNIX version is 4.0B, with all the 4.0B patches, the 
    KZPSA patch, and new-wire-method=0. The TCR version is 1.4.
    
    The shared storage is in an SW300 with dual-redundant HSZ50s.
    
    We are running Oracle Parallel Server (OPS) 7.3.2.3 on this 
    setup, with the three Oracle patches 420001, 424307, and 425425 
    applied.
    
    We have run the standard checks: mc_cable and mc_diag at the 
    console level, then the usual clu_ivp and cnxshow at the UNIX level.
    
    The services run on mc1. We update a row from as4100a/mc1 but do 
    not commit the transaction.
    
    We then try to update the same row from as4100b/mc2. Because the 
    row is locked, this update waits and does not go through.
    
    We then turn off as4100a/mc1, which holds both the services and 
    the lock on the database row. On the second system, as4100b/mc2, 
    the update continues to hang.
    
    Through 'asemgr' we can see that the service has relocated to 
    as4100b/mc2.
    
    Only when we boot as4100a/mc1 again and it comes back up is the 
    lock on the row released and the update goes through.
    
    Because the lock is released only when the failed system reboots, 
    we feel the problem lies with TruCluster rather than with Oracle. 
    Had the Oracle jobs continued to hang even after the reboot, we 
    would have suspected Oracle recovery.
    
    Incidentally, we found that we had to change the access 
    permissions on the '/dev/rdrd/drd*' files on both nodes. 
    Otherwise, we can neither create the databases nor start them up.
    
    We would appreciate any suggestions about this.
    
    Thanks in advance.
    
    Nandan
     

2072.1. "Trouble still..." by HGOV08::NANDAN () Wed May 21 1997 10:29
    
    Since then, we have replaced the SW300/HSZ50 with a BA350 to rule 
    out the SW300/HSZ50 as the cause of the problem.
    
    The software was reloaded.
    
    Now the problem is worse: even after the system that was turned 
    off is rebooted, the lock on the database row is not released.
    
    Below are the console log and the daemon.log from as4100b, in 
    that order.
    
    This is the second time we have had to postpone a demo for an 
    important customer because we cannot get TruCluster set up. We 
    would appreciate any pointers from anyone. We would also 
    appreciate the name of someone we could call and speak to, 
    ideally someone who has successfully installed and demonstrated 
    TruCluster with OPS.
    
    Thanks in advance.
    
    Nandan
    -----------------------------------------------------------------
Console log for as4100b
cnxagent: added node mc1
cnxagent: mc2 is now a cluster member
dlm_agent: resuming lock activity
cnxagent: resuming
rmerror_state_change: unit = 0  Err_reg = 0x1400 node = 0
memory channel - removing node 0
rmerror_int: Error_count = 6 unit = 0 Err_reg = 0x1400 Node = 0
ccomsub: state change detected by this node via callback
ccomsub: state change: run 1 new 7 fixed 6 cpu 1
cnxlock: acquired director lock: entering CNX_RUN state
cnxagent: communication error detected
cnxagent: breaking channel to mc1
dlm_agent: suspending lock activity
cnxagent: disconnecting channel to mc1
ccomsub: state change: end 1 new 7 fixed 7 cpu 1
ccomsub: Successfully reconfigured for member 0 down
cnxagent: suspending
Cluster Memory Channel primary adapter is online.
	Rev 14 adapter is the primary channel (pci bus 1, slot 4)
	connected to a virtual hub (VH1) as node 1.
skipping test/delay for VH0/VH1 system
memory channel status request from node 0
memory channel request from node 0
memory channel update request from node 0
memory channel - adding node 0
-------------------------------------------------------------------------------
daemon.log for as4100b
May 21 18:11:28 as4100b cnxpingd: starting
May 21 18:11:28 as4100b cnxmond: Tie-breaker disk is required
May 21 18:11:28 as4100b cnxmond: Found 0 tie-breaker disks
May 21 18:11:28 as4100b cnxmond: Cluster join inhibited until disks are configured
May 21 18:11:28 as4100b cnxpingd: waiting to register with kernel agent
May 21 18:11:29 as4100b cnxagentd: starting
May 21 18:11:33 as4100b ASE: local HSM Notice: Able to ping mc1 over the network
May 21 18:11:33 as4100b ASE: local HSM Notice: Able to ping mc1 over the SCSI bus
May 21 18:11:33 as4100b cnxpingd: Tie-breaker disk not configured... waiting
May 21 18:11:35 as4100b ASE: local HSM ***ALERT: HSM_PATH_STATUS:10.0.0.1:UP
May 21 18:11:35 as4100b ASE: local HSM Notice: member mc1 is UP
May 21 18:11:36 as4100b ASE: mc2 Agent Notice: initializing agent... stopping all services
May 21 18:15:44 as4100b cnxpingd: Tie-breaker disk configured
May 21 18:56:41 as4100b ASE: mc1 Agent Notice: adding service oracle
May 21 18:56:41 as4100b ASE: mc2 Agent Notice: adding service oracle
May 21 18:56:43 as4100b ASE: mc1 Director Notice: added service oracle
May 21 18:56:43 as4100b ASE: mc1 AseMgr Notice: Added service oracle
May 21 18:56:43 as4100b ASE: mc1 Agent Notice: starting service oracle
May 21 18:56:45 as4100b ASE: mc1 Director Notice: started oracle on mc1
May 21 18:56:45 as4100b ASE: mc1 Director Notice: stored a new ASE configuration database
May 21 20:28:08 as4100b ASE: mc1 Agent Notice: stopping service oracle
May 21 20:28:10 as4100b ASE: mc1 Director Notice: stopped oracle on mc1
May 21 20:28:10 as4100b ASE: mc2 Agent Notice: starting service oracle
May 21 20:28:12 as4100b ASE: mc1 Director Notice: started oracle on mc2
May 21 20:28:12 as4100b ASE: mc1 AseMgr Notice: Relocated service oracle to mc2
May 21 20:31:11 as4100b ASE: mc2 Agent Notice: stopping service oracle
May 21 20:31:12 as4100b ASE: mc1 Director Notice: stopped oracle on mc2
May 21 20:31:12 as4100b ASE: mc1 Agent Notice: starting service oracle
May 21 20:31:14 as4100b ASE: mc1 AseMgr Notice: Relocated service oracle to mc1
May 21 20:31:14 as4100b ASE: mc1 Director Notice: started oracle on mc1
May 21 20:37:54 as4100b cnxmond: changed alias with : /sbin/ifconfig mc0 alias 10.0.0.42 netmask 255.255.255.0
May 21 20:37:55 as4100b cnxmgrd: starting
May 21 20:37:56 as4100b ASE: local HSM Warning: Can't ping mc1 over the SCSI bus
May 21 20:38:01 as4100b ASE: local HSM Warning: Can't ping mc1 over the network
May 21 20:38:01 as4100b ASE: local HSM ***ALERT: HSM_PATH_STATUS:10.0.0.1:DOWN
May 21 20:38:01 as4100b ASE: local HSM Warning: member mc1 is DOWN
May 21 20:38:07 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:38:07 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:38:09 as4100b cnxpingd: error reporting event
May 21 20:38:09 as4100b cnxmond: mark active mc2
May 21 20:38:16 as4100b ASE: mc2 Agent Notice: starting a new director
May 21 20:38:17 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:38:17 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:38:18 as4100b ASE: local Director ***ALERT: Member mc1 is not available
May 21 20:38:18 as4100b ASE: mc2 Agent Notice: starting service oracle
May 21 20:38:23 as4100b ASE: mc2 Director Notice: started service oracle on mc2
May 21 20:38:24 as4100b cnxmond: recovery delay completed
May 21 20:38:27 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:38:27 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:38:37 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:38:37 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:38:47 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:38:47 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:38:57 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:38:57 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:39:07 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:39:07 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:39:17 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:39:17 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:39:27 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:39:27 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:39:37 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:39:37 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:39:37 as4100b ASE: mc2 AseMgr Error: ASE_INQ_SERVICES request to director failed
May 21 20:39:47 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:39:47 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:39:57 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:39:57 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:40:07 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:40:07 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:40:17 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:40:17 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:40:27 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:40:27 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:40:37 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:40:37 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:40:47 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:40:47 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:40:57 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:40:57 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:41:07 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:41:07 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:41:17 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:41:17 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:41:17 as4100b ASE: mc2 AseMgr Error: ASE_INQ_SERVICES request to director failed
May 21 20:41:27 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:41:27 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:41:37 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:41:37 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:41:47 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:41:47 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:41:57 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:41:57 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:42:07 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:42:07 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:42:17 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:42:17 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:42:27 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:42:27 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:42:37 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:42:37 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:42:47 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:42:47 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:42:57 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:42:57 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:42:57 as4100b ASE: mc2 AseMgr Error: ASE_INQ_SERVICES request to director failed
May 21 20:43:07 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:43:07 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:43:17 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:43:17 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:43:27 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:43:27 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:43:37 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:43:37 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:43:47 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:43:47 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:43:57 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:43:57 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:44:07 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:44:07 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:44:17 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:44:17 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:44:22 as4100b ASE: local HSM Notice: Able to ping mc1 over the network
May 21 20:44:22 as4100b ASE: local HSM Notice: Able to ping mc1 over the SCSI bus
May 21 20:44:22 as4100b ASE: local HSM ***ALERT: HSM_PATH_STATUS:10.0.0.1:UP
May 21 20:44:22 as4100b ASE: local HSM Notice: member mc1 is UP
May 21 20:44:27 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:44:27 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:44:28 as4100b cnxmond: mark active mc1
May 21 20:44:31 as4100b ASE: mc1 Agent Notice: initializing agent... stopping all services
May 21 20:44:33 as4100b ASE: mc2 Director Notice: agent on mc1 came ONLINE
May 21 20:44:37 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:44:37 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:44:37 as4100b ASE: mc2 AseMgr Error: ASE_INQ_SERVICES request to director failed
May 21 20:44:47 as4100b ASE: mc2 AseMgr Warning: timeout waiting on Reply to ASE_INQ_SERVICES
May 21 20:44:47 as4100b ASE: mc2 AseMgr Notice: director request timed out, retrying...
May 21 20:44:51 as4100b ASE: mc2 AseMgr Notice: msgSvcSend: peer hung up before we got reply
May 21 20:44:52 as4100b ASE: mc2 AseMgr Warning: blocking send of ASE_INQ_SERVICES failed or channel hung up
May 21 20:44:52 as4100b ASE: mc2 AseMgr Notice: reconnected to director
--------------------------------------------------------------------------------
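Most of the daemon.log above is repeated ASE_INQ_SERVICES timeouts, which makes the failover timeline hard to follow. The following is a minimal Python sketch, not part of any TruCluster or ASE tooling, that keeps only the member up/down, membership, and service start/stop/relocation messages. The match patterns are assumptions derived solely from the message text quoted in this note, not from an official ASE message catalog, and the script name summarize_ase.py is hypothetical.

```python
import re
import sys

# Assumed patterns, copied from the daemon.log messages quoted above;
# they are NOT an official list of ASE/cnx messages.
EVENT_PATTERNS = [
    re.compile(r"member (\S+) is (UP|DOWN)"),
    re.compile(r"Member (\S+) is not available"),
    re.compile(r"mark active (\S+)"),
    re.compile(r"agent on (\S+) came ONLINE"),
    re.compile(r"(started|stopped) (?:service )?(\S+) on (\S+)"),
    re.compile(r"Relocated service (\S+) to (\S+)"),
]

# Syslog-style prefix as seen above: "May 21 20:38:01 as4100b <message>"
LINE_RE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) (\S+) (.*)$")


def summarize(lines):
    """Return (timestamp, message) pairs for state-change events only.

    Lines without a syslog timestamp (e.g. console output) are skipped,
    as are all messages that match none of EVENT_PATTERNS.
    """
    events = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue
        timestamp, _host, message = m.groups()
        if any(p.search(message) for p in EVENT_PATTERNS):
            events.append((timestamp, message))
    return events


if __name__ == "__main__":
    # Hypothetical usage: python summarize_ase.py daemon.log
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            for ts, msg in summarize(f):
                print(ts, msg)
```

Run over the log above, it would keep lines such as "member mc1 is DOWN" at 20:38:01, "started service oracle on mc2" at 20:38:23, and "member mc1 is UP" at 20:44:22, and drop the AseMgr timeout and retry messages.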