Title: | + OpenVMS Clusters - The best clusters in the world! + |
Notice: | This conference is COMPANY CONFIDENTIAL. See #1.3 |
Moderator: | PROXY::MOORE |
Created: | Fri Aug 26 1988 |
Last Modified: | Fri Jun 06 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 5320 |
Total number of notes: | 23384 |
I have been working with a customer and a Field Engineer on a brand new Alpha 4100 cluster running V7.1. The Memory Channel comes up as "offline". Each system is configured identically with CIPCA, CCMAA-AA, 100Mbit Ethernet, and 1.5GB memory. They share a common system disk off an HSJ50 (they have several of these), and there is a quorum disk, too. The MC_SERVICES_Pn parameters are set correctly, and the MC and PM drivers do get loaded. An SDA> show ports lists PMA as valid. The SCA path between these machines is the CIPCA. They have PEDRIVER loaded, but that is not being used. There is no MC hub, just a point-to-point 3m cable. I am looking for any ideas that you folks may have as to why the MC shows "offline". An SDA> show device MCA0 or PMA0 has 00000000's for device status. This is the first "problem" call we have received here in the CSC for MC on OpenVMS. Are there any diagnostics or anything we can use to try to figure this out? Thanks, Steve
T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
5240.1 | EEMELI::MOSER | Orienteers do it in the bush... | Sun Mar 02 1997 12:58 | 18 | |
Since you have a virtual hub, one is the master and one is the slave. Have you correctly set the jumpers on the MC board (CCMAA)? You could check this from the running system via SDA> show dev mca0/pdt, and do this on both machines. One should be the virtual master and one the virtual slave.

You mention that the MC_xxx sysgen parameters are set correctly. By this I assume you mean that all 'current' values are the same as the 'default', right?

As a next step, I would do the following:
- power-cycle both machines
- then boot first the virtual hub master system all the way up
- check that MCA0 is 'unavailable'
- now boot the virtual hub slave
- both MCA0 should become online

/cmos
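The jumper check and boot order described above can be sketched as a small sanity check. This is an illustration only: the role strings, node names, and function names are hypothetical, the roles would have to be read by hand from the SDA> show dev mca0/pdt output on each node, and nothing here talks to a real system.

```python
# Hypothetical labels for the two jumper settings; these are NOT strings
# that SDA is known to print.
MASTER = "virtual_hub_master"
SLAVE = "virtual_hub_slave"


def check_virtual_hub(roles):
    """Return a list of problems for a two-node, hubless MC configuration.

    roles maps node name -> jumper role as read by hand on that node.
    An empty list means one master and one slave were found.
    """
    problems = []
    if len(roles) != 2:
        problems.append("a virtual hub connects exactly two nodes")
    masters = [n for n, r in roles.items() if r == MASTER]
    slaves = [n for n, r in roles.items() if r == SLAVE]
    if len(masters) != 1:
        problems.append(f"expected 1 master, found {len(masters)}")
    if len(slaves) != 1:
        problems.append(f"expected 1 slave, found {len(slaves)}")
    return problems


def boot_order(roles):
    """Suggested boot order per the steps above: master first, then slave."""
    # False sorts before True, so the master comes out first.
    return sorted(roles, key=lambda n: roles[n] != MASTER)
```

For example, `check_virtual_hub({"NODEA": MASTER, "NODEB": MASTER})` reports both a surplus master and a missing slave, which is the kind of mis-strapping the jumper check is meant to catch.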
5240.2 | CSC32::S_DANNEN | Live long and slobber | Mon Mar 03 1997 09:00 | 12 | |
Yes, all of the MC_SERVICES_Pn are at default, except _P7. I was hoping that changing it would give us some additional verbiage during boot to help determine the cause of the problem. I will ask the Field Engineer to go on-site and verify the jumpers, use your procedure to verify that it is working, and post a reply here. Would anyone have any information as to whether the following are available on-line? EK-PCIRM-SV-A01 & EK-PCIRM-UG-A01 Thanks! Steve
5240.3 | Almost working now, but... | CSC32::S_DANNEN | Live long and slobber | Fri Mar 14 1997 18:36 | 33 |
It turns out that the jumpers were set incorrectly. The local Field Engineers set them up for virtual master and virtual slave, and got this on boot:

  %CNXMAN, Using local access method for quorum disk
  %CNXMAN, Established "connection" to quorum disk
  %MCA0 CPU00: Init retries exceeded, going off-line.
  %MCA0 CPU00: Too many initialization retries.

The MCA0 from $ show device is again offline.

We then booted the master first, then the slave, and now we do indeed see the MCA0 and PMA0 in "Unavailable". A $ show clu/cont with LPORT shows the PMA0 port correctly. The customer rebooted the machines - out of sequence - slave first then master, and again got the %MCA0 CPU00: Too many initialization retries.

Is this the intended behavior, to be forced to boot the master first and then the slave to get these things to synch up and become usable? If it is, this really shoots holes in the "High availability model" for OpenVMS Clusters. Since the hardware is strapped for master and slave, I would hope that this would be automagic, and not need operator intervention to get things right.

/steve
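A captured console log could be screened for the two MCA0 failure messages quoted above with a short script. This is a minimal sketch: the regular expression is built only from those two messages, the function name is hypothetical, and it is an assumption that a second adapter (MCB0) would report in the same form. Other Memory Channel driver messages may exist and would not be matched.

```python
import re

# Built only from the two messages quoted in this note:
#   %MCA0 CPU00: Init retries exceeded, going off-line.
#   %MCA0 CPU00: Too many initialization retries.
# ASSUMPTION: other MC adapters (e.g. MCB0) use the same message layout.
MC_INIT_FAILURE = re.compile(
    r"%(?P<device>MC[A-Z]\d+)\s+CPU\d+:\s+"
    r"(?P<text>Init retries exceeded|Too many initialization retries)"
)


def find_mc_init_failures(console_text):
    """Return (device, message) pairs for each matching console message."""
    return [
        (m.group("device"), m.group("text"))
        for m in MC_INIT_FAILURE.finditer(console_text)
    ]
```

An empty result only means these two messages were not seen; it does not show that the adapter came online.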
5240.4 | STAR::NCARR | Talk dates & features - but never together.... | Mon Mar 17 1997 15:57 | 13 | |
The boot order dependency for Memory Channel clusters without a hub (master first, then the slave) is a temporary restriction. This problem was found very late in testing, and does not occur with all AlphaServer pairings. (It's system speed dependent.) The restriction will be lifted in the soon-to-be-available Memory Channel patch kit. If this becomes a serious issue at your customer, loan them a proper Memory Channel hub until the patch kit comes through. This problem does not occur with real hubs. (Of course, I realise that hubs are not exactly thick on the ground....)
5240.5 | thanks, anticipating the patch | CSC32::S_DANNEN | Live long and slobber | Tue Mar 18 1997 12:10 | 7 |
Thanks Nick, the customer is not yet in production for this cluster so they can wait for the patch to become available. We have recently started to see several calls come into the CSC about this boot dependency issue, and a few Field Engineers have been swapping the hardware in an attempt to correct the behavior. /steve
5240.6 | STAR::NCARR | Talk dates & features - but never together.... | Tue Mar 18 1997 14:39 | 2 | |
I should have added that this restriction is not documented anywhere. The patch kit will be available before we could update the documentation....
5240.7 | MCSHOW.EXE | EVMS::SCHUETZ | VMS Clusters Memory Channel 381-6075 | Wed Apr 30 1997 18:33 | 70 |
There exists a small utility that uses the call interface to show some basic information on Memory Channel. If someone could host it, I'll MAIL/FOREIGN it to you.

$run MCSHOW.EXE
Memory Channel information for community 0     0 is MCA, 1 is MCB etc
device_name = MCA
adapter_type = 49
adapter_version = e            "e" is V1.5 - CCMAA-AB 65MB
                               "b" is V1.0 - CCMAA-AA 35MB
software_revision = 1
PDT = FFFFFFFF80CA5200
adapter is REAL_HUB_SLAVE      or VIRTUAL_HUB_MASTER/SLAVE
>If this line is blank, you have a real hardware problem.
node ID = 4                    0=Master, 1=Slave, else = slot on hub
max nodes = 0
current state = 66
total error count = 0
tags size = 81920              function of MC_SERVICES_P3
regions size = 32768           _P4
locks size = 32768
channels size = 10526720       (_P6+56) * _P9 * 64 + some more
>THIS IS REAL PHYSICAL MEMORY THAT GETS CONSUMED !
>You can lower P6 to the SYSGEN MIN of 544, and P9 to 50 or less
>if you're not pushing a lot of traffic to get back memory if
>you need to. In the TIMA kit, MC will go OFFLINE unless you have at
>least 32 MB free AFTER MC is installed. i.e. don't put 2 adapters on
>a 64 MB system.
channel msg size = 992         MC_SERVICES_P6
sync addr ptr = 0
 = 3a
 = 0
 = 83c24000
 = 2
 = 0
>On a hub, the node numbers come from which linecard you're plugged into.
MC node 0 state is 0, name is , device .
MC node 1 state is 1, name is BEEF , device MCA.
MC node 2 state is 0, name is , device .
MC node 3 state is 1, name is TSTDOS , device MCA.
MC node 4 state is 1, name is LYNX03 , device MCA.
MC node 5 state is 0, name is , device .
MC node 6 state is 1, name is PRMMC4 , device MCA.
MC node 7 state is 0, name is , device .

From a fully-redundant cluster:

MC node 0 state is 1, name is RAWHI2 , device MCA.
MC node 1 state is 1, name is LYNX05 , device MCA.
MC node 2 state is 0, name is , device .
MC node 3 state is 1, name is PRMMC2 , device MCA.
MC node 4 state is 1, name is PRMMC3 , device MCA.
MC node 5 state is 1, name is FLAM29 , device MCA.
MC node 6 state is 1, name is SABL5 , device MCA.
MC node 7 state is 1, name is SABL6 , device MCA.

MC node 0 state is 1, name is RAWHI2 , device MCB.
MC node 1 state is 1, name is LYNX05 , device MCB.
MC node 2 state is 0, name is , device .
MC node 3 state is 1, name is PRMMC2 , device MCB.
MC node 4 state is 1, name is PRMMC3 , device MCB.
MC node 5 state is 1, name is FLAM29 , device MCB.
MC node 6 state is 1, name is SABL5 , device MCB.
MC node 7 state is 1, name is SABL6 , device MCB.

Note that VMS does NOT require you to wire up MCB in the same order as MCA, but it sure makes maintenance simpler if you do.

/Chris
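The channel-memory formula and the "MC node" lines in the MCSHOW output above lend themselves to a small worked example. This is a sketch under stated assumptions: the function names are hypothetical, the formula covers only the (_P6+56) * _P9 * 64 term (the note says the real figure is "+ some more"), and the line pattern is inferred from the sample output in this note rather than from any MCSHOW documentation.

```python
import re


def channel_bytes_estimate(p6, p9):
    """Lower bound on channel memory from the note: (_P6 + 56) * _P9 * 64.

    The note adds "+ some more", so the value MCSHOW reports is larger.
    """
    return (p6 + 56) * p9 * 64


# Pattern inferred from the sample lines, e.g.
#   MC node 1 state is 1, name is BEEF , device MCA.
#   MC node 0 state is 0, name is , device .
NODE_LINE = re.compile(
    r"MC node (\d+) state is (\d+), name is\s*([^\s,]*)\s*, device\s*(\w*)\s*\."
)


def parse_nodes(text):
    """Return (node, state, name, device) tuples from MCSHOW node lines."""
    return [
        (int(node), int(state), name, device)
        for node, state, name, device in NODE_LINE.findall(text)
    ]
```

With _P6 = 992 as shown in the output and an assumed _P9 of 150 (the note does not give the value), the formula yields 10,060,800 bytes, somewhat below the reported 10526720, which is consistent with the "+ some more". At the suggested _P6 = 544 and _P9 = 50 it yields 1,920,000 bytes.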