
Conference spezko::cluster

Title: + OpenVMS Clusters - The best clusters in the world! +
Notice: This conference is COMPANY CONFIDENTIAL. See #1.3
Moderator: PROXY::MOORE
Created: Fri Aug 26 1988
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 5320
Total number of notes: 23384

5274.0. "Boot driver initialization failure" by LEMAN::NEUWEILER () Wed Apr 02 1997 09:02

Our customer has an 8400 5/350 with 4 CPUs and 6 GB of memory,
which is part of a cluster with a DEC7000-720 and a DEC3000-300.
Since the upgrade of OpenVMS to V7.1, he has encountered problems at
each shutdown.
The sequence of events during the shutdown on the 8400 is as follows:

**** Boot driver initialization routine returned failure
**** Memory dump canceled. IOVector = 00000000, Flags = 02016874

access violation fault

	PCB = 0011E120 (entry)
	PC  = 00122F28
	VA  = 0014F388 00122F28
exception context saved starting at 00120CC0

GPRs:
     ......

dump of active call frames:

     .....

Brk 0 at 00067724

00067724 ! BPT

Eh?

The boot driver initialization errors also show up on the other
nodes, but without crashing the console. (I have no console output
handy from these systems.) There is a common system disk on an HSJ.

I have not found any known problem in this area. What could be wrong?

Thanks for help,

	Joerg


Cross posted in MSBCS::TURBOLASER
    
T.R  Title  User  Personal Name  Date  Lines
5274.1. "This problem should be escalated thru IPMT/CLD for proper prioritization" by STAR::JFRAZIER (What color is a chameleon on a mirror?) Wed Apr 02 1997 14:27 (26 lines)

Joerg,

This problem should be escalated thru IPMT/CLD for proper prioritization,
and a formal response.

In the meantime, if possible ...
Please provide us with:

#1 All the console variables, especially those relating to
booting and dumping. Consoles vary but some form of "SHOW *" should give you
the lot. Perhaps "SHOW DEV" also.

#2 Set bit 1 in the SYSGEN parameter DUMPSTYLE. Currently it's
probably set to 9 (compressed selective dump), so make it 11 (bits 0,1,3 set).
If you're using DOSD, it might today be 13 and you'll need to set it to 15.
                                                                            

#3 The entire console log of the shutdown.

#4 From the console >>> sho config, and >>> sho boot*, and >>> sho dump*

#5 What is the boot adaptor?

Regards

James	:-)
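
Editor's sketch of the DUMPSTYLE arithmetic in item #2 above, written in Python purely as a calculator. The values 9, 11, 13 and 15 and the "compressed selective dump" and DOSD descriptions come from the note. The label for bit 1 is an assumption and should be checked against the SYSGEN parameter documentation.

```python
# DUMPSTYLE is treated as a plain bitmask here.
# Labels for bits 0, 2 and 3 follow the note (9 = compressed selective
# dump, DOSD adds bit 2). The label for bit 1 is an assumption.
DUMPSTYLE_BITS = {
    0: "selective dump",
    1: "full console output (assumed label)",
    2: "dump off system disk (DOSD)",
    3: "compressed dump",
}


def set_bit(value, bit):
    """Return value with the given bit number turned on."""
    return value | (1 << bit)


def describe(value):
    """List the labels of every bit that is set in value."""
    return [name for bit, name in DUMPSTYLE_BITS.items() if value & (1 << bit)]


if __name__ == "__main__":
    for current in (9, 13):
        new = set_bit(current, 1)
        print(current, "->", new, describe(new))
```

Setting bit 1 is idempotent, so applying it to a value that already has the bit set leaves the value unchanged.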

5274.2. "more info" by LEMAN::NEUWEILER () Fri Apr 04 1997 08:48 (309 lines)

Here is some more information. We also took one of the
system disk shadow members and tried to boot it on a local
adapter. The problem is the same. Due to time constraints we
could not do more.
I suspect that there might be something wrong with the
system disk. At the next opportunity we will try to boot a freshly
installed copy of VMS from another disk. (Standalone backup
booted from a disk executes the shutdown OK.)

P00>>>show *
arc_enable          	OFF             
auto_action         	RESTART         
boot_dev            	dua313.7.0.5.0 dua313.8.0.5.0 
			dua323.8.0.5.0 dua323.7.0.5.0
boot_file           	                
boot_osflags        	1,0             
boot_reset          	OFF             
bootdef_dev         	dua313.7.0.5.0 dua313.8.0.5.0 
			dua323.8.0.5.0 dua323.7.0.5.0
booted_dev          	dua313.7.0.5.0  
booted_file         	                
booted_osflags      	1,0             
char_set            	0
console             	serial          
console_mode        	basic           
cpu                 	0               
cpu_enabled         	ffff
cpu_primary         	ffff
d_group             	field           
d_omit              	                
d_passes            	1
d_runtime           	0
d_verbose           	0
dump_dev            	dua313.7.0.5.0 dua313.8.0.5.0 
			dua323.8.0.5.0 dua323.7.0.5.0
enable_audit        	ON              
fru_table           	ON              
graphics_background 	4               
graphics_foreground 	7               
graphics_page       	0               
graphics_switch     	-1              
graphics_sync       	0               
graphics_type       	VIDEO           
interleave          	default         
kbd_hardware_type   	PCXAL           
kzpsa0_fast         	1
kzpsa0_host_id      	7
kzpsa0_termpwr      	1
language            	36
language_name       	English (American)
license             	MU              
os_type             	OpenVMS         
pal                 	VMS PALcode V1.19-4, OSF PALcode V1.21-4
simm_callout        	OFF             
sys_model_num       	8400            
sys_serial_num      	qv              
tta0_baud           	9600            
tta0_page           	0               
tta0_type           	VIDEO           
tty_dev             	0               
version             	V4.8-6, 12-FEB-1997 16:31:47
P00>>>show device
polling for units on kzpsa0, slot 1, bus 0, hose0...
kzpsa0.7.0.1.0     dka     TPwr 1 Fast 1 Bus ID 7   L01  A10    
dka100.1.0.1.0     DKa100                    RZ28  442D
dka200.2.0.1.0     DKa200                    RZ28  442D
dka300.3.0.1.0     DKa300                    RZ28  442C
dka400.4.0.1.0     DKa400                    RZ28  442D
dka500.5.0.1.0     DKa500                   RZ29B  0014
dka600.6.0.1.0     DKa600                    RZ28  442D
polling for units on cipca0, slot 5, bus 0, hose0...
cipca_a.1.0.5.0    dua     CI Bus ID 1
dua345.2.0.5.0     $1$DUA345 (HSJ002)        HSX0
dua355.2.0.5.0     $1$DUA355 (HSJ002)        HSX0
dua700.2.0.5.0     $1$DUA700 (HSJ002)        HSX1
dua710.2.0.5.0     $1$DUA710 (HSJ002)        HSX1
dua730.2.0.5.0     $1$DUA730 (HSJ002)        HSX1
dua750.2.0.5.0     $1$DUA750 (HSJ002)        HSX0
dua999.2.0.5.0     $1$DUA999 (HSJ002)        HSX0
dua121.5.0.5.0     $1$DUA121 (HSJ005)        HSX1
dua122.5.0.5.0     $1$DUA122 (HSJ005)        HSX1
dua126.5.0.5.0     $1$DUA126 (HSJ005)        HSX1
dua124.6.0.5.0     $1$DUA124 (HSJ006)        HSX1
dua125.6.0.5.0     $1$DUA125 (HSJ006)        HSX1
dua127.6.0.5.0     $1$DUA127 (HSJ006)        HSX1
dua101.7.0.5.0     $1$DUA101 (HSJ007)        HSX0
dua110.7.0.5.0     $1$DUA110 (HSJ007)        HSX0
dua130.7.0.5.0     $1$DUA130 (HSJ007)        HSX1
dua140.7.0.5.0     $1$DUA140 (HSJ007)        HSX0
dua311.7.0.5.0     $1$DUA311 (HSJ007)        HSX0
dua313.7.0.5.0     $1$DUA313 (HSJ007)        HSX0
dua323.7.0.5.0     $1$DUA313 (HSJ007)        HSX0
dua331.7.0.5.0     $1$DUA331 (HSJ007)        HSX0
dua333.7.0.5.0     $1$DUA333 (HSJ007)        HSX0
dua340.7.0.5.0     $1$DUA340 (HSJ007)        HSX0
dua351.7.0.5.0     $1$DUA351 (HSJ007)        HSX0
mua600.7.0.5.0     $1$MUA600 (HSJ007)        TZ87
mua620.7.0.5.0     $1$MUA620 (HSJ007)        TSZ7
dua104.8.0.5.0     $1$DUA104 (HSJ008)        HSX0
dua120.8.0.5.0     $1$DUA120 (HSJ008)        HSX0
dua323.8.0.5.0     $1$DUA323 (HSJ008)        HSX0
dua341.8.0.5.0     $1$DUA341 (HSJ008)        HSX0
dua343.8.0.5.0     $1$DUA343 (HSJ008)        HSX0
dua350.8.0.5.0     $1$DUA350 (HSJ008)        HSX0
mua610.8.0.5.0     $1$MUA610 (HSJ008)        TZ87
mua620.8.0.5.0     $1$MUA620 (HSJ008)        TSZ7
polling for units on kzpaa0, slot 11, bus 0, hose0...
pkb0.7.0.11.0      kzpaa1              SCSI Bus ID 7
dkb0.0.0.11.0      DKb0                     RRD43  0064
P00>>>
P00>>>show config

        Name                  Type   Rev  Mnemonic  
  TLSB
  0++   KN7CD-AB              8014  0000  kn7cd-ab0   
  1++   KN7CD-AB              8014  0000  kn7cd-ab1   
  2+    MS7CC                 5000  0000  ms7cc0      
  6+    MS7CC                 5000  0000  ms7cc1      
  7+    MS7CC                 5000  0000  ms7cc2      
  8+    KFTHA                 2000  0D02  kftha0      

  C0 PCI connected to kftha0              pci0    
  1+    KZPSA                81011  0000  kzpsa0      
  5+    CIPCA              6601095  0001  cipca0      
  8+    DEC PCI FDDI         F1011  0000  pfi0        
  B+    KZPAA                11000  0002  kzpaa0      



				.
				.

    OpenVMS (TM) Alpha Operating System, Version V7.1   
				.
				.
%SHADOW-I-VOLPROC, DSA0: shadow master has changed.  Dump file WILL 
be written if system crashes.  Volume Processing in progress.
				.
				.
$ @sys$system:shutdown


	SHUTDOWN -- Perform an Orderly System Shutdown
	            on node SUPRA1

How many minutes until final shutdown [0]: 
Reason for shutdown [Standalone]: 
Do you want to spin down the disk volumes [NO]? 
Do you want to invoke the site-specific shutdown procedure [YES]? no
Should an automatic system reboot be performed [NO]? 
When will the system be rebooted [later]: 

Shutdown options (enter as a comma-separated list):
 REMOVE_NODE         Remaining nodes in the cluster should adjust quorum
 CLUSTER_SHUTDOWN    Entire cluster is shutting down
 REBOOT_CHECK        Check existence of basic system files
 SAVE_FEEDBACK       Save AUTOGEN feedback information from this boot
 DISABLE_AUTOSTART   Disable autostart queues

Shutdown options [NONE]: 

%SHUTDOWN-I-OPERATOR, this terminal is now an operator's console
%%%%%%%%%%%  OPCOM   3-APR-1997 19:27:44.28  %%%%%%%%%%%
Operator status for operator _SUPRA1$OPA0:
CENTRAL, PRINTER, TAPES, DISKS, DEVICES, CARDS, NETWORK, CLUSTER, SECURITY,
LICENSE, OPER1, OPER2, OPER3, OPER4, OPER5, OPER6, OPER7, OPER8, OPER9, OPER10,
OPER11, OPER12


%SHUTDOWN-I-DISLOGINS, interactive logins will now be disabled
%SET-I-INTSET, login interactive limit = 0, current interactive value = 1
%SHUTDOWN-I-STOPQUEUES, the queues on this node will now be stopped

SHUTDOWN message on SUPRA1 from user SYSTEM at _SUPRA1$OPA0:   19:27:45
SUPRA1 will shut down in 0 minutes; back up later.  Please log off node SUPRA1.
Standalone


1 terminal has been notified on SUPRA1.
%SHUTDOWN-I-STOPUSER, all user processes will now be stopped
%SHUTDOWN-I-STOPAUDIT, the security auditing subsystem will now be shut down
%%%%%%%%%%%  OPCOM   3-APR-1997 19:27:45.88  %%%%%%%%%%%
Message from user AUDIT$SERVER on SUPRA1
Security alarm (SECURITY) and security audit (SECURITY) on SUPRA1, system id: 10
251
Auditable event:          Audit server shutting down
Event time:                3-APR-1997 19:27:45.88
PID:                      20200416        
Username:                 SYSTEM          


%SHUTDOWN-I-STOPCPU, the secondary processors will now be stopped
%SMP-I-STOPPED, CPU #01 has been stopped.
%SYSTEM-I-CPUSTOPPING, trying to stop CPU 1 after it reaches quiescent state
%SMP-I-STOPPED, CPU #02 has been stopped.
%SYSTEM-I-CPUSTOPPING, trying to stop CPU 2 after it reaches quiescent state
%SMP-I-STOPPED, CPU #03 has been stopped.
%SYSTEM-I-CPUSTOPPING, trying to stop CPU 3 after it reaches quiescent state
%SHUTDOWN-I-REMOVE, all installed images will now be removed
%SHUTDOWN-I-DISMOUNT, all volumes will now be dismounted
%DISM-I-INSWPGFIL, 1 swap or page file installed on volume
%DISM-I-MARKEDDMT, _$1$DKA100: has been marked for dismount
%DISM-I-INSWPGFIL, 1 swap or page file installed on volume
%DISM-I-MARKEDDMT, _$1$DKA200: has been marked for dismount
%DISM-I-INSWPGFIL, 1 swap or page file installed on volume
%DISM-I-MARKEDDMT, _$1$DKA300: has been marked for dismount
%DISM-I-INSWPGFIL, 1 swap or page file installed on volume
%DISM-I-MARKEDDMT, _$1$DKA400: has been marked for dismount
%%%%%%%%%%%  OPCOM   3-APR-1997 19:27:49.28  %%%%%%%%%%%
Message from user SYSTEM on SUPRA1
_SUPRA1$OPA0:, SUPRA1 shutdown was requested by the operator.

%%%%%%%%%%%  OPCOM   3-APR-1997 19:27:49.28  %%%%%%%%%%%
Logfile was closed by operator _SUPRA1$OPA0:
Logfile was SUPRA1::SYS$SYSROOT:[SYSMGR]OPERATOR.LOG;170

%%%%%%%%%%%  OPCOM   3-APR-1997 19:27:49.33  %%%%%%%%%%%

Operator _SUPRA1$OPA0: has been disabled, username SYSTEM

**** Boot driver initialization routine returned failure
**** Memory dump canceled. IOVector = 00000000, Flags = 02016074
unexpected exception/interrupt through vector 810
Unexpected IO device interrupt : 810
process idle, pcb = 0006B350

 pc: 00000000 00068130  ps: 10000000 00000000
 r2: 00000000 000686E8  r5: 00000000 00000001
 r3: 00000000 00105C38  r6: 00000000 00020190
 r4: 00000000 0002038C  r7: 00000000 0006B470

Overlay name                       memadr   topadr    size   ref
turbo                               20000    7d400  381952  0
xdelta                              7d420    8b820   58368  1

exception context saved starting at 0006C340

GPRs:
  0: 00000000 0000001F  16: 00000000 00000000
  1: 00000000 00000010  17: 00000001 00037348
  2: 00000000 0005ADA8  18: 00000000 0006C2F8
  3: 00000000 000636C0  19: 00000000 00000025
  4: 00000000 0002038C  20: 00000000 00000016
  5: 00000000 00000001  21: 00000000 00000000
  6: 00000000 00020190  22: 00000000 00054DA8
  7: 00000000 0006B470  23: 00000000 0000001F
  8: 00000000 0002038C  24: 00000000 000001AD
  9: 00000000 00000001  25: 00000000 00000001
 10: 00000000 0006B350  26: 00000000 000321DC
 11: 00000000 0002038C  27: 00000000 00068DD0
 12: 00000000 000636E8  28: 00000000 00000000
 13: 00000000 0010F240  29: 00000000 0006C4D0
 14: 00000000 00054D28  30: 00000000 0006C490
 15: 00000000 00020368

dump of active call frames:

PC  =  0006812C
PD  =  0005ADA8
FP  =  0006C4D0
SP  =  0006C490

R2 R3 R4 R5 R6 R7 R8 R9 R10 R11 R12 R29 saved starting at 0006C4D8

R2  =  00001000
R3  =  00068C90
R4  =  0006C540
R5  =  0007D250
R6  =  000001F0
R7  =  00000000
R8  =  00000000
R9  =  0006C540
R10 =  00000002
R11 =  000202F0
R12 =  00067548
R29 =  00000000

unexpected exception/interrupt through vector 400
Breakpoint Trap

Nested exception - console restarting
Initializing...

F   E   D   C   B   A   9   8   7   6   5   4   3   2   1   0   NODE #
                            A   M   M   .   .   .   .   .   P   TYP
                            o   +   +   .   .   .   .   .  ++   ST1
                            .   .   .   .   .   .   .   .  EB   BPD
                            o   +   +   .   .   .   .   .  ++   ST2
                            .   .   .   .   .   .   .   .  EB   BPD
                            +   +   +   .   .   .   .   .  ++   ST3
                            .   .   .   .   .   .   .   .  EB   BPD

                +   .   .   +   .   .   +   .   .   .   +   .   C0 PCI +
.   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   C1
.   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   C2
.   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   C3

                            .  A1  A0   .   .   .   .   .   .   ILV
                            . 2GB 2GB   .   .   .   .   .   .   4GB
AlphaServer 8400 Console V4.8-6, 12-FEB-1997 16:31:47, SROM V3.1
Configuring I/O adapters...
kzpsa0, slot 1, bus 0, hose0
cipca0, slot 5, bus 0, hose0
pfi0, slot 8, bus 0, hose0
kzpaa0, slot 11, bus 0, hose0

CPU 0 booting

(boot dua313.7.0.5.0 -flags 1,0)

(This was with 2GB of memory removed)
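
Editor's sketch for reading the boot_dev and dump_dev values in the console output above. It assumes the usual console device name form of name.node.channel.slot.hose, which agrees with the "slot 5, bus 0, hose0" polling line for cipca0 and with the HSJ node numbers in the SHOW DEVICE listing. The class and function names are hypothetical.

```python
from typing import NamedTuple


class ConsoleDevice(NamedTuple):
    name: str     # device and unit, e.g. "dua313"
    node: int     # bus node (here the CI node of the HSJ)
    channel: int
    slot: int     # PCI slot of the adapter
    hose: int


def parse_device(text):
    """Split one console device name into its five dotted fields."""
    name, node, channel, slot, hose = text.split(".")
    return ConsoleDevice(name, int(node), int(channel), int(slot), int(hose))


if __name__ == "__main__":
    # Value of boot_dev as shown by "show *" in this note.
    boot_dev = "dua313.7.0.5.0 dua313.8.0.5.0 dua323.8.0.5.0 dua323.7.0.5.0"
    for dev in map(parse_device, boot_dev.split()):
        print(f"{dev.name}: node {dev.node}, slot {dev.slot}, hose {dev.hose}")
```

Read this way, the four boot paths are two units (313 and 323), each reachable through CI nodes 7 and 8 behind the CIPCA in slot 5.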
    
5274.3. "problem found" by LEMAN::NEUWEILER () Tue Apr 15 1997 04:45 (5 lines)

    The problem has been found. It occurs when the parameter
    SHADOW_SYS_DISK = 1 and the patch ALPSHAD01_071 is installed.
    Replacing the SYS$SHDRIVER from the patch with the original
    clears the problem.
    I opened an IPMT.

5274.4. "no ALPSHAD01_071 no RAID?" by TIMABS::FREPPEL (Mosquito ergo summm...) Tue Apr 15 1997 10:27 (11 lines)

    Joerg,
    
    >Replacing the SYS$SHDRIVER from the patch with the original
    >clears the problem.
    Will we then be able to run RAID? (ALPSHAD01_071 claims to remove an
    incompatibility with the RAID software.)
    
    >I opened an IPMT.
    Could you provide the cfs-number please?
    
    Raymond.

5274.5. "" by VMSSG::FRIEDRICHS (Ask me about Young Eagles) Tue Apr 15 1997 15:40 (19 lines)

    Be VERY careful here!!!
    
    ALPSHAD01_071 included not only SYS$SHDRIVER.EXE, but also SYSINIT, SDA,
    SHOW, SYS$BASE_IMAGE and SDA$SHARE (I think that is all of them)...
    
    Anyways, you had better have matching SYSINIT/SYS$SHDRIVER images! 
    Either SSB or SHAD01_071...
    
    A problem such as the one in .0 sure sounds like they just copied the new
    SYS$SHDRIVER to their system, rather than installing the kit.  And the
    fact that it works correctly if *just* the SYS$SHDRIVER is moved back to
    the SSB version supports this theory.
    
    I'm sure I'll be seeing the IPMT shortly, so you will probably hear the
    same question!! :-)
    
    Cheers,
    jeff
    
5274.6. "tested locally" by LEMAN::NEUWEILER () Wed Apr 16 1997 05:28 (5 lines)

    .4 The IPMT number is CFS.50473
    
    .5 I did not touch the customer's system; the customer is still
       running with ALPSHAD01_071 installed. I did all the tests on
       my workstation.

5274.7. "" by VMSSG::FRIEDRICHS (Ask me about Young Eagles) Wed Apr 16 1997 11:35 (17 lines)

    Closure in the works...
    
    ALPSHAD01_071 changed the organization of the SHAD$ structure.  As a 
    result, images that use the SHAD$ also needed to be shipped (ie SDA,
    SHOW and SYSINIT).  We failed to include EXCEPTION.EXE which also
    uses the SHAD to determine which disk is the master member of the
    shadow system disk (so that the dump gets written to the correct
    member).
    
    ALPSHAD02_071 is being kitted as I write.  The only change will be the
    inclusion of EXCEPTION.EXE (ie the SYS$SHDRIVER will be identical to
    the one supplied in ALPSHAD01_071).  It should be available within a
    day or two.  
    
    Cheers,
    jeff
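
Editor's sketch of the failure mode described in .7: a reader built against the old layout of a structure picks up the wrong field once the writer uses a new layout. This is a generic Python illustration only. The field names and both layouts are hypothetical, since the real SHAD$ definition is not given in this thread.

```python
import struct

# Hypothetical layouts, little-endian 32-bit fields.
OLD_LAYOUT = "<II"    # (master_unit, member_count)
NEW_LAYOUT = "<III"   # (flags, master_unit, member_count), new field first


def write_new(flags, master_unit, member_count):
    """Pack a record the way the updated driver would (hypothetical)."""
    return struct.pack(NEW_LAYOUT, flags, master_unit, member_count)


def read_master_new(buf):
    """Reader rebuilt against the new layout."""
    _flags, master_unit, _count = struct.unpack_from(NEW_LAYOUT, buf)
    return master_unit


def read_master_old(buf):
    """Reader still built against the old layout."""
    master_unit, _count = struct.unpack_from(OLD_LAYOUT, buf)
    return master_unit


if __name__ == "__main__":
    buf = write_new(flags=1, master_unit=313, member_count=2)
    print("rebuilt reader sees master unit:", read_master_new(buf))
    print("stale reader sees master unit:  ", read_master_old(buf))
```

The stale reader does not fail outright. It returns the flags field as if it were the master unit, which is the kind of silent mismatch that makes every image using the structure part of the same kit.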