
Conference turris::digital_unix

Title:DIGITAL UNIX(FORMERLY KNOWN AS DEC OSF/1)
Notice:Welcome to the Digital UNIX Conference
Moderator:SMURF::DENHAM
Created:Thu Mar 16 1995
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:10068
Total number of notes:35879

8660.0. "System Crash Query " by CGOOA::SMALL () Thu Jan 30 1997 18:21

    Hi folks,
    
    Below is the output of a crash-data file from a Digital UNIX system
    (V3.2D) that has been running at a customer site for about 9 months.
    The system has been experiencing intermittent crashes during overnight
    processing over the last 3 months.
    
    The system is an X400/X500 mail server running DECnet as well as TCP/IP,
    plus ADVFS, Polycenter products (scheduler, watchdog), NSR and all the
    X400/X500 products.
    
    
    My quick glance at the messages suggests that the problem is hardware:
    	"panic (cpu 0): Machine check - Hardware error"
    
    But the problem is intermittent and the system always recovers.
    The system has crashed about 12 times in the last 2 months. However,
    since it never goes down while anyone is logged on and using it, we
    didn't notice for a while. The crash-data files are all pretty much
    the same.
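
    A quick way to check that the files really are alike is to tally the
    _panic_string line from each one. The following is only a sketch in
    Python: the function names are my own, and the file pattern in the
    loader is an assumption about where the crash-data files are kept, not
    something taken from a Digital tool.

```python
import glob
import re
from collections import Counter

# Matches a line such as:
#   _panic_string:  0xfffffc000062e410 = "Machine check - Hardware error"
PANIC_RE = re.compile(
    r'^\s*_panic_string:\s+0x[0-9a-fA-F]+\s+=\s+"([^"]*)"', re.MULTILINE
)


def panic_string(crash_text):
    """Return the _panic_string value from one crash-data text, or None."""
    match = PANIC_RE.search(crash_text)
    return match.group(1) if match else None


def tally_panics(crash_texts):
    """Count how often each panic string occurs across several files."""
    return Counter(panic_string(text) for text in crash_texts)


def load_crash_texts(pattern="/var/adm/crash/crash-data.*"):
    """Read every file matching pattern (the default path is an assumption)."""
    texts = []
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as handle:
            texts.append(handle.read())
    return texts
```

    Calling tally_panics(load_crash_texts()) gives one count per distinct
    panic string; a file with no _panic_string line is counted under None.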
    
    What I would appreciate is at least a good notion of whether we have a
    hardware problem (call MCS) or a software problem (dig deeper).
    
    Since I haven't found "A thumbnail guide to crash data files" I thought
    I'd post it here.
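
    One check that needs no guide: the _crashtime and _boottime structs in
    the file hold tv_sec values in seconds since the epoch, so their
    difference should agree with the _uptime line. A minimal Python sketch
    of that arithmetic follows; the helper names are hypothetical and the
    regular expression assumes the struct layout shown in this file.

```python
import re


def tv_sec(crash_text, field):
    """Return tv_sec from an entry like '_crashtime:  struct { tv_sec = N'."""
    pattern = re.escape(field) + r":\s+struct\s*\{\s*tv_sec\s*=\s*(\d+)"
    match = re.search(pattern, crash_text)
    if match is None:
        raise ValueError("field not found: " + field)
    return int(match.group(1))


def uptime_hours(crash_text):
    """Hours between boot and crash, from the two tv_sec values."""
    seconds = tv_sec(crash_text, "_crashtime") - tv_sec(crash_text, "_boottime")
    return seconds / 3600.0
```

    With the values in this file (854629801 and 854589000) the difference
    is 40801 seconds, which rounds to the 11.33 hours reported by _uptime.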
    
    --------------------------------
    
    #
    # Crash Data Collection (Version 1.4)
    #
    _crash_data_collection_time: Thu Jan 30 05:16:07 PST 1997
    _current_directory: /
    _crash_kernel: /var/adm/crash/vmunix.38
    _crash_core: /var/adm/crash/vmcore.38
    _crash_arch: alpha
    _crash_os: Digital UNIX
    _host_version: Digital UNIX V3.2D-2 (Rev. 41.64); Thu Oct  3 18:21:23
    PDT 1996 
    _crash_version: Digital UNIX V3.2D-2 (Rev. 41.64); Thu Oct  3 18:21:23
    PDT 1996 
    
    _crashtime:  struct {
        tv_sec = 854629801
        tv_usec = 988688
    } 
    _boottime:  struct {
        tv_sec = 854589000
        tv_usec = 981856
    } 
    _config:  struct {
        sysname = "OSF1"
        nodename = "eis1.bchydro.bc.ca"
        release = "V3.2"
        version = "41.64"
        machine = "alpha"
    } 
    _cpu:  43 
    _system_string:  0xffffffffff800a20 = "AlphaServer 1000 4/266" 
    _ncpus:  1 
    _avail_cpus:  1 
    _partial_dump:  1 
    _physmem(MBytes):  127 
    _panic_string:  0xfffffc000062e410 = "Machine check - Hardware error" 
    _paniccpu:  0 
    _panic_thread:  0xfffffc00079bdb80 
    _preserved_message_buffer_begin: 
    struct {
        msg_magic = 0x63061
        msg_bufx = 0x158
        msg_bufr = 0x6c2
        msg_bufc = "_revision	= 0x3
      esc_int0	= 0xa1
      esc_int1	= 0xef
      esc_elcr0	= 0x0
      esc_elcr1	= 0x0
      esc_last_eisa	= 0xff
      esc_nmi_stat	= 0x20
    
      pci_ir	= 0xff
      pci_imr	= 0x1
      svr_mgr	= 0xd4
    panic (cpu 0): Machine check - Hardware error
    syncing disks... DUMP.prom: dev RAID 0 11 0 0 0 0 0, block 131072
    DUMP.prom: dev RAID 0 11 0 0 0 0 0, block 131072
    de: OSF version 1.46
    pci0 at nexus
    psiop0 at pci0 slot 6
    Loading SIOP: script 800300, reg 82008000, data 406ec2a0
    scsi0 at psiop0 slot 0
    rz5 at scsi0 bus 0 target 5 lun 0 (DEC     RRD45   (C) DEC  1645)
    tz6 at scsi0 bus 0 target 6 lun 0 (DEC     TLZ07     (C)DEC 553A)
    eisa0 at pci0
    ace0 at eisa0
    ace1 at eisa0
    lp0 at eisa0
    fdi0 at eisa0
    fd0 at fdi0 unit 0
    vga0 at eisa0
     640x480 (Cirrus  )
    vga0: Cirrus Logic CL-GD5424 (SVGA) 512 Kbytes
    Initializing xcr0.  Please wait.
    Initializing xcr0.  Please wait.
    Initializing xcr0.  Please wait.
    Initializing xcr0.  Please wait.
    xcr0 at pci0 slot 11
    re0 at xcr0 unit 0 (unit status = ONLINE, raid level = 1)
    re1 at xcr0 unit 1 (unit status = ONLINE, raid level = JBOD)
    tu0: DECchip 21040-AA: Revision: 2.3
    tu0 at pci0 slot 13
    tu0: DEC TULIP Ethernet Interface, hardware address: 08-00-2B-E6-01-74
    tu0: console mode: selecting UTP (10BaseT) port
    gpc0 at eisa0
    lvm0: configured.
    lvm1: configured.
    dli: configured
    SuperLAT. Copyright 1993 Meridian Technology Corp. All rights reserved.
    cam_logger: CAM_ERROR packet
    cam_logger: bus 0 target 6 lun 0 
    ss_device_reset_done
    Bus device reset has been performed
    ADVFS: using 1152 buffers containing 9.00 megabytes of memory
    Node ID is 08-00-2b-e6-01-74 (from device tu0)
    dna_netman: configured
    dna_dli: configured
    Node UID is 34ee8fe0-7a43-11d0-800c-08002be60174
    dna_base: configured
    dna_xti: configured
    AlphaServer 1000 4/266 machine check type 0x670.
      retry		= 0x0
      mchk_code	= 0x0
      paltemp[1]	= 0x8908b5c8
      paltemp[2]	= 0x4
      paltemp[3]	= 0x0
      paltemp[4]	= 0x3dc0
      paltemp[5]	= 0x28
      paltemp[6]	= 0x0
      paltemp[7]	= 0x4200
      paltemp[8]	= 0x400
      paltemp[9]	= 0x0
      paltemp[10]	= 0x4e2ab0
      paltemp[11]	= 0x0
      paltemp[12]	= 0x4e2e50
      paltemp[13]	= 0x4e2e80
      paltemp[14]	= 0x4e2ee0
      paltemp[15]	= 0x4e2c50
      paltemp[16]	= 0x4e2960
      paltemp[17]	= 0x1a
      paltemp[18]	= 0x1ffeeea0
      paltemp[19]	= 0x8908b600
      paltemp[20]	= 0x65f200
      paltemp[21]	= 0x0
      paltemp[22]	= 0x626e6e6e
      paltemp[23]	= 0x80
      paltemp[24]	= 0x0
      paltemp[25]	= 0x10000
      paltemp[26]	= 0xd
      paltemp[27]	= 0x0
      paltemp[28]	= 0x3262000
      paltemp[29]	= 0x0
      paltemp[30]	= 0x1
      paltemp[31]	= 0x5f03a58
      exc_addr	= 0x383dba
      exc_sum	= 0x0
      msk		= 0x0
      iccsr		= 0x4
      pal_base	= 0x14000
      hier		= 0x1cd0
      hirr		= 0x0
      mm_csr	= 0x5b11
      dc_stat	= 0x3
      dc_addr	= 0xffffffff
      abox_ctl	= 0x942e
      biu_stat	= 0x254
      biu_addr	= 0x752c1e0
      biu_ctl	= 0x10002227
      fill_syndrome	= 0x0
      fill_adr	= 0x6100
      va		= 0x6170
      bc_tag	= 0x7614
    
      coma_gcr	= 0x7fb20034
      coma_edsr	= 0xffffa000
      coma_ter	= 0x7fb27fe0
      coma_elar	= 0x7fb20800
      coma_ehar	= 0x7fb20820
      coma_ldlr	= 0x7fb2c9bf
      coma_ldhr	= 0x6fb10031
      coma_base0	= 0x6fb10000
      coma_base1	= 0x6fb10000
      coma_base2	= 0x22310000
      coma_base3	= 0xffff0000
      coma_cnfg0	= 0x22310067
      coma_cnfg1	= 0x22310000
      coma_cnfg2	= 0x22310000
      coma_cnfg3	= 0x7fb20000
    
      epic_dcsr	= 0x801e0019
      epic_pear	= 0x802560
      epic_sear	= 0x12dfff0
      epic_tbr1	= 0x3b2000
      epic_tbr2	= 0x0
      epic_pbr1	= 0x8c0000
      epic_pbr2	= 0x40080000
      epic_pmr1	= 0x700000
      epic_pmr2	= 0x3ff00000
      epic_harx1	= 0x80000000
      epic_harx2	= 0x0
      epic_pmlt	= 0xff
      epic_tag0	= 0x80e000
      epic_tag1	= 0x810000
      epic_tag2	= 0x812000
      epic_tag3	= 0x814000
      epic_tag4	= 0x801000
      epic_tag5	= 0x807000
      epic_tag6	= 0x803000
      epic_tag7	= 0x80c000
      epic_data0	= 0x44bc
      epic_data1	= 0x68c0
      epic_data2	= 0x6d62
      epic_data3	= 0x6bde
      epic_data4	= 0x6e4
      epic_data5	= 0x6ea
      epic_data6	= 0x6e6
      epic_data7	= 0x6b26
    
      pceb_vid	= 0x8086
      pceb_did	= 0x482
      pceb_revision	= 0x5
      pceb_command	= 0x7
      pceb_status	= 0x200
      pceb_latency	= 0xf8
      pceb_control	= 0x60
      pceb_arbcon	= 0x9d
      pceb_arbpri	= 0x4
    
      esc_id	= 0xf
      esc"
    } 
    _preserved_message_buffer_end: 
    _kernel_process_status_begin: 
      PID	COMM
    00000	kernel idle
    00001	init
    00003	kloadsrv
    00019	update
    00079	fax_1
    00082	mwatch_1
    00085	mwatch_2
    00088	sstgw
    11357	mta_irchild
    22659	psw_sensor_eth
    00154	syslogd
    00156	binlogd
    11468	mta_irchild
    00208	routed
    22750	sh
    11548	mta_remote_api_s
    00285	portmap
    00287	nfsiod
    00288	nfsiod
    00289	nfsiod
    00290	nfsiod
    00291	nfsiod
    00292	nfsiod
    00293	nfsiod
    00296	rpc.statd
    00298	rpc.lockd
    11566	mta_remote_api_s
    22832	sched_agent_comm
    00324	dnalimd
    00327	dnaevld
    00369	dnascd
    00370	dnansd
    00371	dnaksd
    00375	dnsadv
    00379	dtssd
    00382	dnanoded
    00394	dnamopd
    00422	osaknmd
    00438	mta
    00449	smtpgw_ea
    00459	dxd_dsad
    01494	xdm
    01510	dxconsole
    00500	smtpgw
    00504	sendmail
    00593	mold
    00596	internet_mom
    00600	mgrAgentd_mom
    00615	snmp_pe
    00621	inetd
    00626	cron
    00640	mta_irserver
    00641	mta_mpserver
    00642	mta_rlserver
    00643	mta_wjserver
    00644	mta_remote_api_s
    00645	mta_remote_api_s
    00654	lpd
    00669	psw_agent
    02746	dnsclerk
    00710	ibxlookupd
    00711	ibxd
    00747	rpcd
    00755	mta_irchild
    00756	mta_rlchild
    00759	mta_irchild
    00809	sched_agent
    00825	sched_listener
    00832	sched_txm
    00846	sched_engine
    00850	sqlexec
    00851	sqlexec
    22372	sched_agent_job_
    00880	xdm
    00892	Xdec
    00910	sstinfo
    00922	_upsd
    00939	namon_server
    00940	namon_server
    00946	nsrd
    00965	nsrexecd
    00970	getty
    00973	nsrmmdbd
    00977	nsrmmd
    00979	nsrindexd
    _kernel_process_status_end: 
    _current_pid:  22832 
    _current_tid:  0xfffffc00079bdb80 
    _proc_thread_list_begin: 
    thread 0xfffffc00079bdb80 stopped at  [boot:1746 ,0xfffffc00004e5d8c]	
    Source not available
    _proc_thread_list_end: 
    _dump_begin: 
    >  0 boot(0x0, 0x4, 0x31372, 0x31373, 0x1)
    ["../../../../src/kernel/arch/alpha/machdep.c":1746,
    0xfffffc00004e5d8c]
    
       1 panic(s = 0xfffffc0000616b10 = "thread_block: interrupt level
    call") ["../../../../src/kernel/bsd/subr_prf.c":673,
    0xfffffc0000442c78]
    pcpu = 0xb3f001600000001
    i = 4464936
    bootopt = 7
    mycpu = 7011304
    spl = 5
    prevcc = 18446739675667504948
    nextcc = 18446739675670051736
    timer = -4294967292
    limit = -4397913889912
    
       2 thread_block() ["../../../../src/kernel/kern/sched_prim.c":1748,
    0xfffffc0000475198]
    thread = 0xfffffc00079bdb80
    new_thread = 0xfffffc0000659220
    mycpu = 0
    myprocessor = 0x20
    s = 5
    pset = 0xfffffc0000477a08
    prev = 0xfffffc00006c8a48
    
       3 thread_preempt(thread = 0xfffffc00079bdb80, processor =
    0xfffffc0000154100) ["../../../../src/kernel/kern/sched_prim.c":3460,
    0xfffffc0000477a14]
    s = 5
    pri = 6656544
    pset = 0xfffffc00006b6208
    
       4 call_disk(vdp = 0xfffffc0007e7a388, ioAmt = 16384, blk = 384096,
    ioList = 0xffffffff87f0a008, s = 0xffffffff8908ab38)
    ["../../../../src/kernel/msfs/osf/msfs_io.c":1237, 0xfffffc0000404638]
    bp = 0xffffffff87f0a008
    s = 2
    th = 0xfffffc00079bdb80
    
       5 bs_startio(vdp = (nil), s = 0xffffffff8908ab38, flushFlag =
    8039488) ["../../../../src/kernel/msfs/osf/msfs_io.c":1497,
    0xfffffc0000404b18]
    error = -2014057872
    ioAmt = 16384
    ioList = 0xffffffff87f0a008
    iop = 0xfffffc00002a3f70
    pages = 9591592
    devVirt = 0x114e000
    offset = 3921
    kpte = 0x1
    scratch = union {
        quadword = 18446739675663040512
        PTE_BITFIELD = struct {
            _v = 0
            _for = 0
            _fow = 0
            _foe = 0
            _asm = 0
            _gh = 0
            _prot = 0
            _exec = 0
            _wire = 0
            _seg = 0
            _lw_wire = 0
            _gh_shared = 0
            _soft = 0
            _lw_wire_count = 0
            _pfn = 4294966272
        }
    }
    sz = -2014162080
    blks = 0
    i = 115815816
    mask = 0
    vdBlk = 0
    svdBlk = 6656136
    readCnt = 15856
    rw = -2144937512
    msk = 1
    qhdr = 0xffffffff87f3eae8
    toFlush = 4
    
       6 flush_vols(dmnp = 0x1, s = 0x1, forceFlushFlag = -1995919576)
    ["../../../../src/kernel/msfs/bs/bs_qio.c":2663, 0xfffffc00003c6138]
    vdi = 8039488
    vdp = 0xf10
    
       7 get_freebuf(dmnp = 0xfffffc000217e008, bc = 0xffffffff8908aaf0, s
    = 0xffffffff8908ab38)
    ["../../../../src/kernel/msfs/bs/bs_buffer2.c":4430,
    0xfffffc00003d5ba8]
    bp = 0x1ea2000
    tmp = 0xfffffc0007e7a388
    heldPg = 1
    dmni = 1
    i = -2014057624
    i = 3950620
    lim = 3921
    
       8 bs_pinpg_one_int(bfPageRefH = 0xffffffff8908ae20, bfPageAddr =
    0xffffffff8908ae18, bfap = 0xffffffff8026d798, bsPage = 3856, refHint =
    BS_NIL, noReadMask = 1, pl = 0xfffffc00007e9640, putflag = 1)
    ["../../../../src/kernel/msfs/bs/bs_buffer2.c":3217,
    0xfffffc00003d3f00]
    bp = 0x1
    tmp = 0xf0e7032f09da9
    hbp = 0xfffffc0006e73588
    sts = 0
    i = 0
    s = -1995919576
    haveBcache = 1
    doRead = 0
    res = -2014057624
    listLen = 0
    ioListp = 0x1153fff
    wait = 0
    desCnt = 18153472
    doFlush = 0
    shouldBlock = 0
    ubcsts = 0
    i = 18153472
    lim = 0
    bfPageSize = 132621192
    maskbit = 6681088
    pp = 0xfffffc00003c4c3c
    ioDescp = 0x2
    
       9 bs_pinpg_clone(bfPageRefH = 0xffffffff8908ae20, bfPageAddr =
    0xffffffff8908ae18, bfap = 0xffffffff8026d798, bsPage = 3856, refHint =
    BS_NIL, ftxH = struct {
        hndl = 0
        level = 0
        dmnh = 0
    }, noReadMask = 1, pl = 0xfffffc00007e9640, putflag = 1)
    ["../../../../src/kernel/msfs/bs/bs_buffer2.c":2907,
    0xfffffc00003d3870]
    sts = 8296000
    bfSetp = 0xfffffc00003d3238
    
      10 bs_pinpg_put(bfPageRefH = 0xffffffff8908ae20, bfPageAddr =
    0xffffffff8908ae18, bfAccessH = 31588352, bsPage = 3856, refHint =
    BS_NIL, noReadMask = 1, pl = 0xfffffc00007e9640)
    ["../../../../src/kernel/msfs/bs/bs_buffer2.c":2478,
    0xfffffc00003d32b8]
    bfap = 0xfffffc0006e73580
    bs_pinpg_fn_p = 0xfffffc00003f9ff4
    
      11 msfs_putpage(0xfffffc0007634c00, 0x2, 0x1, 0x40,
    0xfffffc0004a6ad00)
    ["../../../../src/kernel/msfs/osf/msfs_misc.c":1813,
    0xfffffc00003fa098]
    
      12 ubc_flush_dirty(0x2, 0x40, 0xfffffc000065f200, 0xfffffc0004b57200,
    0xfffffc0000452bd8) ["../../../../src/kernel/vfs/vfs_ubc.c":3055,
    0xfffffc00002a2c08]
    
      13 mntflushbuf(mountp = 0xfffffc0000006010, flags = 0)
    ["../../../../src/kernel/vfs/vfs_bio.c":1427, 0xfffffc0000452bcc]
    vp = 0xfffffc0000613420
    nvp = 0x670
    
      14 boot(0x0, 0x0, 0xfffffc000062e410, 0x4f20, 0x730)
    ["../../../../src/kernel/arch/alpha/machdep.c":1675,
    0xfffffc00004e5c3c]
    
      15 panic(s = 0xfffffc000062e410 = "Machine check - Hardware error")
    ["../../../../src/kernel/bsd/subr_prf.c":757, 0xfffffc0000442e34]
    pcpu = 0xfffffc0000744030
    i = -1995918576
    bootopt = 7
    mycpu = 1
    spl = 7
    prevcc = 18446739675663065104
    nextcc = 18446739675663065088
    timer = -4398041208068
    limit = 4072
    
      16 kn22a_machcheck(0x670, 0xfffffc0000006000, 0xffffffff8908b2a8,
    0xfffffc0001fb5780, 0x0)
    ["../../../../src/kernel/arch/alpha/hal/kn22a.c":2477,
    0xfffffc000050ebc8]
    
      17 mach_error(0x0, 0xfffffc0000006000, 0xffffffff8908b2a8,
    0xfffffffdff7fc000, 0xfffffffdff000000)
    ["../../../../src/kernel/arch/alpha/hal/cpusw.c":826,
    0xfffffc00005026f4]
    
      18 _XentInt(0x0, 0xfffffc0000383db8, 0xfffffc000065f200, 0x0,
    0x7e006) ["../../../../src/kernel/arch/alpha/locore.s":997,
    0xfffffc00004e2bbc]
    
      19 u_anon_faultpage(0x0, 0x140023990, 0x14000add0, 0x5,
    0xfffffc0000410134) ["../../../../src/kernel/vm/u_mape_anon.c":1002,
    0xfffffc0000383db4]
    
    _dump_end: 
    
    warning: Files compiled -g3: parameter values probably wrong
    _kernel_thread_list_begin: 
    thread 0xfffffc0007ef4000 stopped at   [thread_run:2282
    ,0xfffffc0000475cb4]	 Source not available
    thread 0xfffffc0007ef4400 stopped at   [thread_block:1914
    ,0xfffffc00004754c8]	 Source not available
    thread 0xfffffc0007f26800 stopped at   [thread_block:1914
    ,0xfffffc00004754c8]	 Source not available
    thread 0xfffffc0007f26c00 stopped at   [thread_block:1899
    +0x28,0xfffffc0000475458]	 Source not available
    thread 0xfffffc0007f27000 stopped at   [thread_block:1899
    +0x28,0xfffffc0000475458]	 Source not available
    thread 0xfffffc0007f27400 stopped at   [thread_block:1914
    ,0xfffffc00004754c8]	 Source not available
    thread 0xfffffc0007f27800 stopped at   [thread_block:1899
    +0x28,0xfffffc0000475458]	 Source not available
    thread 0xfffffc0007f27c00 stopped at   [thread_block:1914
    ,0xfffffc00004754c8]	 Source not available
    thread 0xfffffc0007ce2000 stopped at   [thread_block:1914
    ,0xfffffc00004754c8]	 Source not available
    thread 0xfffffc0007ce2400 stopped at   [thread_block:1914
    ,0xfffffc00004754c8]	 Source not available
    thread 0xfffffc0007ce2800 stopped at   [thread_block:1899
    +0x28,0xfffffc0000475458]	 Source not available
    thread 0xfffffc0007ce2c00 stopped at   [thread_block:1899
    +0x28,0xfffffc0000475458]	 Source not available
    thread 0xfffffc0007ce3400 stopped at   [thread_block:1914
    ,0xfffffc00004754c8]	 Source not available
    thread 0xfffffc0007ce3800 stopped at   [thread_block:1914
    ,0xfffffc00004754c8]	 Source not available
    thread 0xfffffc0007ce3c00 stopped at   [thread_block:1914
    ,0xfffffc00004754c8]	 Source not available
    thread 0xfffffc000747e000 stopped at   [thread_block:1899
    +0x28,0xfffffc0000475458]	 Source not available
    thread 0xfffffc000747e400 stopped at   [thread_block:1899
    +0x28,0xfffffc0000475458]	 Source not available
    thread 0xfffffc000747e800 stopped at   [thread_block:1899
    +0x28,0xfffffc0000475458]	 Source not available
    thread 0xfffffc000747ec00 stopped at   [thread_block:1899
    +0x28,0xfffffc0000475458]	 Source not available
    thread 0xfffffc000747f000 stopped at   [thread_block:1914
    ,0xfffffc00004754c8]	 Source not available
    _kernel_thread_list_end: 
    _savedefp:  0xffffffff8908b950 
    _kernel_memory_fault_data_begin:  
    struct {
        fault_va = 0x0
        fault_pc = 0x0
        fault_ra = 0x0
        fault_sp = 0x0
        access = 0x0
        status = 0x0
        cpunum = 0x0
        count = 0x0
        pcb = (nil)
        thread = (nil)
        task = (nil)
        proc = (nil)
    } 
    _kernel_memory_fault_data_end:  
    Invalid character in input
    _uptime: 11.33 hours
    
    paniccpu: 0x0 
    machine_slot[paniccpu]: struct {
        is_cpu = 0x1
        cpu_type = 0xf
        cpu_subtype = 0x11
        running = 0x1
        cpu_ticks = {
            [0] 0x76f75b
            [1] 0x5
            [2] 0x32819f
            [3] 0x1d476c2
            [4] 0x1034
        }
        clock_freq = 0x400
        error_restart = 0x0
        cpu_panicstr = 0xfffffc000062e410 = "Machine check - Hardware
    error"
        cpu_panic_thread = 0xfffffc00079bdb80
    } 
    tset machine_slot[paniccpu].cpu_panic_thread: 
    Begin Trace for machine_slot[paniccpu].cpu_panic_thread: 
    >  0 boot(0x0, 0x4, 0x31372, 0x31373, 0x1)
    ["../../../../src/kernel/arch/alpha/machdep.c":1746,
    0xfffffc00004e5d8c]
       1 panic(s = 0xfffffc0000616b10 = "thread_block: interrupt level
    call") ["../../../../src/kernel/bsd/subr_prf.c":673,
    0xfffffc0000442c78]
       2 thread_block() ["../../../../src/kernel/kern/sched_prim.c":1748,
    0xfffffc0000475198]
       3 thread_preempt(thread = 0xfffffc00079bdb80, processor =
    0xfffffc0000154100) ["../../../../src/kernel/kern/sched_prim.c":3460,
    0xfffffc0000477a14]
       4 call_disk(vdp = 0xfffffc0007e7a388, ioAmt = 0x4000, blk = 0x5dc60,
    ioList = 0xffffffff87f0a008, s = 0xffffffff8908ab38)
    ["../../../../src/kernel/msfs/osf/msfs_io.c":1237, 0xfffffc0000404638]
       5 bs_startio(vdp = (nil), s = 0xffffffff8908ab38, flushFlag =
    0x7aac40) ["../../../../src/kernel/msfs/osf/msfs_io.c":1497,
    0xfffffc0000404b18]
       6 flush_vols(dmnp = 0x1, s = 0x1, forceFlushFlag =
    0xffffffff8908af28) ["../../../../src/kernel/msfs/bs/bs_qio.c":2663,
    0xfffffc00003c6138]
       7 get_freebuf(dmnp = 0xfffffc000217e008, bc = 0xffffffff8908aaf0, s
    = 0xffffffff8908ab38)
    ["../../../../src/kernel/msfs/bs/bs_buffer2.c":4430,
    0xfffffc00003d5ba8]
       8 bs_pinpg_one_int(bfPageRefH = 0xffffffff8908ae20, bfPageAddr =
    0xffffffff8908ae18, bfap = 0xffffffff8026d798, bsPage = 0xf10, refHint
    = BS_NIL, noReadMask = 0x1, pl = 0xfffffc00007e9640, putflag = 0x1)
    ["../../../../src/kernel/msfs/bs/bs_buffer2.c":3217,
    0xfffffc00003d3f00]
       9 bs_pinpg_clone(bfPageRefH = 0xffffffff8908ae20, bfPageAddr =
    0xffffffff8908ae18, bfap = 0xffffffff8026d798, bsPage = 0xf10, refHint
    = BS_NIL, ftxH = (...), noReadMask = 0x1, pl = 0xfffffc00007e9640,
    putflag = 0x1) ["../../../../src/kernel/msfs/bs/bs_buffer2.c":2907,
    0xfffffc00003d3870]
      10 bs_pinpg_put(bfPageRefH = 0xffffffff8908ae20, bfPageAddr =
    0xffffffff8908ae18, bfAccessH = 0x1e20000, bsPage = 0xf10, refHint =
    BS_NIL, noReadMask = 0x1, pl = 0xfffffc00007e9640)
    ["../../../../src/kernel/msfs/bs/bs_buffer2.c":2478,
    0xfffffc00003d32b8]
      11 msfs_putpage(0xfffffc0007634c00, 0x2, 0x1, 0x40,
    0xfffffc0004a6ad00)
    ["../../../../src/kernel/msfs/osf/msfs_misc.c":1813,
    0xfffffc00003fa098]
      12 ubc_flush_dirty(0x2, 0x40, 0xfffffc000065f200, 0xfffffc0004b57200,
    0xfffffc0000452bd8) ["../../../../src/kernel/vfs/vfs_ubc.c":3055,
    0xfffffc00002a2c08]
      13 mntflushbuf(mountp = 0xfffffc0000006010, flags = 0x0)
    ["../../../../src/kernel/vfs/vfs_bio.c":1427, 0xfffffc0000452bcc]
      14 boot(0x0, 0x0, 0xfffffc000062e410, 0x4f20, 0x730)
    ["../../../../src/kernel/arch/alpha/machdep.c":1675,
    0xfffffc00004e5c3c]
      15 panic(s = 0xfffffc000062e410 = "Machine check - Hardware error")
    ["../../../../src/kernel/bsd/subr_prf.c":757, 0xfffffc0000442e34]
      16 kn22a_machcheck(0x670, 0xfffffc0000006000, 0xffffffff8908b2a8,
    0xfffffc0001fb5780, 0x0)
    ["../../../../src/kernel/arch/alpha/hal/kn22a.c":2477,
    0xfffffc000050ebc8]
      17 mach_error(0x0, 0xfffffc0000006000, 0xffffffff8908b2a8,
    0xfffffffdff7fc000, 0xfffffffdff000000)
    ["../../../../src/kernel/arch/alpha/hal/cpusw.c":826,
    0xfffffc00005026f4]
      18 _XentInt(0x0, 0xfffffc0000383db8, 0xfffffc000065f200, 0x0,
    0x7e006) ["../../../../src/kernel/arch/alpha/locore.s":997,
    0xfffffc00004e2bbc]
      19 u_anon_faultpage(0x0, 0x140023990, 0x14000add0, 0x5,
    0xfffffc0000410134) ["../../../../src/kernel/vm/u_mape_anon.c":1002,
    0xfffffc0000383db4]
    End Trace for machine_slot[paniccpu].cpu_panic_thread: 
    
    "cpu_data" is not an array
    _stack_trace[0]_begin: 
    >  0 boot(0x0, 0x4, 0x31372, 0x31373, 0x1)
    ["../../../../src/kernel/arch/alpha/machdep.c":1746,
    0xfffffc00004e5d8c]
       1 panic(s = 0xfffffc0000616b10 = "thread_block: interrupt level
    call") ["../../../../src/kernel/bsd/subr_prf.c":673,
    0xfffffc0000442c78]
       2 thread_block() ["../../../../src/kernel/kern/sched_prim.c":1748,
    0xfffffc0000475198]
       3 thread_preempt(thread = 0xfffffc00079bdb80, processor =
    0xfffffc0000154100) ["../../../../src/kernel/kern/sched_prim.c":3460,
    0xfffffc0000477a14]
       4 call_disk(vdp = 0xfffffc0007e7a388, ioAmt = 16384, blk = 384096,
    ioList = 0xffffffff87f0a008, s = 0xffffffff8908ab38)
    ["../../../../src/kernel/msfs/osf/msfs_io.c":1237, 0xfffffc0000404638]
       5 bs_startio(vdp = (nil), s = 0xffffffff8908ab38, flushFlag =
    8039488) ["../../../../src/kernel/msfs/osf/msfs_io.c":1497,
    0xfffffc0000404b18]
       6 flush_vols(dmnp = 0x1, s = 0x1, forceFlushFlag = -1995919576)
    ["../../../../src/kernel/msfs/bs/bs_qio.c":2663, 0xfffffc00003c6138]
       7 get_freebuf(dmnp = 0xfffffc000217e008, bc = 0xffffffff8908aaf0, s
    = 0xffffffff8908ab38)
    ["../../../../src/kernel/msfs/bs/bs_buffer2.c":4430,
    0xfffffc00003d5ba8]
       8 bs_pinpg_one_int(bfPageRefH = 0xffffffff8908ae20, bfPageAddr =
    0xffffffff8908ae18, bfap = 0xffffffff8026d798, bsPage = 3856, refHint =
    BS_NIL, noReadMask = 1, pl = 0xfffffc00007e9640, putflag = 1)
    ["../../../../src/kernel/msfs/bs/bs_buffer2.c":3217,
    0xfffffc00003d3f00]
       9 bs_pinpg_clone(bfPageRefH = 0xffffffff8908ae20, bfPageAddr =
    0xffffffff8908ae18, bfap = 0xffffffff8026d798, bsPage = 3856, refHint =
    BS_NIL, ftxH = (...), noReadMask = 1, pl = 0xfffffc00007e9640, putflag
    = 1) ["../../../../src/kernel/msfs/bs/bs_buffer2.c":2907,
    0xfffffc00003d3870]
      10 bs_pinpg_put(bfPageRefH = 0xffffffff8908ae20, bfPageAddr =
    0xffffffff8908ae18, bfAccessH = 31588352, bsPage = 3856, refHint =
    BS_NIL, noReadMask = 1, pl = 0xfffffc00007e9640)
    ["../../../../src/kernel/msfs/bs/bs_buffer2.c":2478,
    0xfffffc00003d32b8]
      11 msfs_putpage(0xfffffc0007634c00, 0x2, 0x1, 0x40,
    0xfffffc0004a6ad00)
    ["../../../../src/kernel/msfs/osf/msfs_misc.c":1813,
    0xfffffc00003fa098]
      12 ubc_flush_dirty(0x2, 0x40, 0xfffffc000065f200, 0xfffffc0004b57200,
    0xfffffc0000452bd8) ["../../../../src/kernel/vfs/vfs_ubc.c":3055,
    0xfffffc00002a2c08]
      13 mntflushbuf(mountp = 0xfffffc0000006010, flags = 0)
    ["../../../../src/kernel/vfs/vfs_bio.c":1427, 0xfffffc0000452bcc]
      14 boot(0x0, 0x0, 0xfffffc000062e410, 0x4f20, 0x730)
    ["../../../../src/kernel/arch/alpha/machdep.c":1675,
    0xfffffc00004e5c3c]
      15 panic(s = 0xfffffc000062e410 = "Machine check - Hardware error")
    ["../../../../src/kernel/bsd/subr_prf.c":757, 0xfffffc0000442e34]
      16 kn22a_machcheck(0x670, 0xfffffc0000006000, 0xffffffff8908b2a8,
    0xfffffc0001fb5780, 0x0)
    ["../../../../src/kernel/arch/alpha/hal/kn22a.c":2477,
    0xfffffc000050ebc8]
      17 mach_error(0x0, 0xfffffc0000006000, 0xffffffff8908b2a8,
    0xfffffffdff7fc000, 0xfffffffdff000000)
    ["../../../../src/kernel/arch/alpha/hal/cpusw.c":826,
    0xfffffc00005026f4]
      18 _XentInt(0x0, 0xfffffc0000383db8, 0xfffffc000065f200, 0x0,
    0x7e006) ["../../../../src/kernel/arch/alpha/locore.s":997,
    0xfffffc00004e2bbc]
      19 u_anon_faultpage(0x0, 0x140023990, 0x14000add0, 0x5,
    0xfffffc0000410134) ["../../../../src/kernel/vm/u_mape_anon.c":1002,
    0xfffffc0000383db4]
    _stack_trace[0]_end: 
    
    _savedefp_exception_frame_(savedefp/33X): 
    ffffffff8908b950:  0000000000072008 0000000000000000
    ffffffff8908b960:  0000000000000001 0000000000022000
    ffffffff8908b970:  0000000000003dc0 0000000000000028
    ffffffff8908b980:  0000000000000000 000000000000000d
    ffffffff8908b990:  000000000000001a 000000000008bac8
    ffffffff8908b9a0:  0000000140023e20 000000011ffff838
    ffffffff8908b9b0:  0000000000000000 0000000140023990
    ffffffff8908b9c0:  000000014000add0 0000000000000005
    ffffffff8908b9d0:  000000011ffff168 000003ffc01db7f0
    ffffffff8908b9e0:  000000011ffee8a5 000003ffc01de2f0
    ffffffff8908b9f0:  0000000000000000 0000000000000282
    ffffffff8908ba00:  0000000000000000 000003ff804bb160
    ffffffff8908ba10:  000003ff8051bc40 000000011fffef30
    ffffffff8908ba20:  000000011ffffba0 0000000000000008
    ffffffff8908ba30:  000003ff8051bcc0 000003ffc01ac880
    ffffffff8908ba40:  000000000007dfc8 0000000000000000
    ffffffff8908ba50:  000000000000fde8
    _savedefp_exception_frame_ptr:  0xffffffff8908b950 
    _savedefp_stack_pointer:  0x11ffffba0 
    _savedefp_processor_status:  0x8 
    _savedefp_return_address:  0x3ff804bb160 
    _savedefp_pc:  0x3ff8051bcc0 
    _savedefp_pc/i:  
    l1 address 0x3ff8051bcc0 not mapped, pte 0x0
    
    can't read from process (address 0x3ff8051bcc0)
    _savedefp_return_address/i:  
    l1 address 0x3ff804bb160 not mapped, pte 0x0
    
    can't read from process (address 0x3ff804bb160)
    _kernel_memory_fault_data.fault_pc/i:  
    
    can't read from process (address 0x0)
    _kernel_memory_fault_data.fault_ra/i:  
    
    can't read from process (address 0x0)
    
    _kdbx_sum_start:
    Hostname : eis1.bchydro.bc.ca
    cpu: AlphaServer 1000 4/266	avail: 1
    Boot-time:	Wed Jan 29 17:50:00 1997
    Time:	Thu Jan 30 05:10:01 1997
    Kernel : OSF1 release V3.2 version 41.64 (alpha)
    _kdbx_sum_end:
    _kdbx_swap_start:
    
           Swap device name              Size       In Use       Free
    --------------------------------  ----------  ----------  ----------
    /dev/re0b                            131072k      20616k     110456k
                                          16384p       2577p      13807p
    
    /dev/re1b                            131072k      19200k     111872k
                                          16384p       2400p      13984p
    
    /dev/re1d                            262144k      18872k     243272k
                                          32768p       2359p      30409p
    --------------------------------  ----------  ----------  ----------
    Total swap partitions:    3          524288k      58688k     465600k
                                          65536p       7336p      58200p
    _kdbx_swap_end:
    _kdbx_proc_start:
    Addr        PID   PPID  PGRP  UID   NICE SIGCATCH P_SIG    Event      
    Flags
    =========== ===== ===== ===== ===== ==== ======== ======== ===========
    ============
    k0x07eed210     0     0     0     0    0 00000000 00000000        NULL
    in sys
    k0x07e9b210     1     0     1     0    0 307a7eff 00000000        NULL
    in pagv exec
    k0x0746f210     3     1     2     0    0 00004006 00000000        NULL
    in pagv exec
    k0x07467210    19     1    19     0    0 00002000 00000000        NULL
    in pagv
    k0x021ad210    79     1    33   106    0 00006000 00000000        NULL
    in pagv ctty exec
    k0x019cd210    82     1    33   106    0 00004000 00000000        NULL
    in pagv ctty exec
    k0x017b4210    85     1    33   106    0 00004000 00000000        NULL
    in pagv ctty exec
    k0x017b5210    88     1    33   106    0 00004000 00000000        NULL
    in pagv ctty exec
    k0x01c5a210 11357   640   438     0    0 20000000 00000000        NULL
    in pagv exec
    k0x052da210 22659   669   669     0    0 00000006 00000000        NULL
    in pagv exec
    k0x01fea210   154     1   154     0    0 00086001 00000000        NULL
    in pagv
    k0x017a5210   156     1   156     0    0 00004001 00000000        NULL
    in pagv
    k0x00ebe210 11468   640   438     0    0 20000000 00000000        NULL
    in pagv exec
    k0x017a4210   208     1   208     0    0 20006003 00000000        NULL
    in pagv
    k0x0667c210 22750   809 22750     0    0 60007eff 00000000        NULL
    in pagv exec
    k0x00ebf210 11548   645   438     0    0 00080000 00000000        NULL
    in pagv
    k0x021ac210   285     1   285     0    0 00080628 00000000        NULL
    in pagv
    k0x00deb210   287     1   287     0    0 00000000 00000000        NULL
    in pagv
    k0x00dea210   288   287   287     0    0 00000000 00000000        NULL
    in pagv
    k0x07663210   289   287   287     0    0 00000000 00000000        NULL
    in pagv
    k0x07662210   290   287   287     0    0 00000000 00000000        NULL
    in pagv
    k0x033f8210   291   287   287     0    0 00000000 00000000        NULL
    in pagv
    k0x033f9210   292   287   287     0    0 00000000 00000000        NULL
    in pagv
    k0x06e76210   293   287   287     0    0 00000000 00000000        NULL
    in pagv
    k0x06e77210   296     1     0     0    0 00002000 00000000        NULL
    in pagv ctty
    k0x078ca210   298     1   298     0    0 00002000 00000000        NULL
    in pagv
    k0x0667d210 11566   645   438     0    0 00080000 00000000        NULL
    in pagv
    k0x079bd210 22832 22372 22750     0    0 00001ef8 00000000        NULL
    in pagv exec
    k0x019cc210   324     1     0     0    0 00084003 00000000        NULL
    in pagv
    k0x018cd210   327   324   327     0    0 20082000 00000000        NULL
    in pagv exec
    k0x0319a210   369   324     0     0    0 00887efb 00000000        NULL
    in pagv exec
    k0x0319b210   370   369     0     0    0 00084000 00000000        NULL
    in pagv exec
    k0x06c56210   371   369     0     0    0 00080000 00000000        NULL
    in pagv exec
    k0x00e6c210   375   324   375     0    0 00484007 00000000        NULL
    in pagv exec
    k0x00e6d210   379     1   379     0    0 00004eff 00000000        NULL
    in pagv
    k0x018cc210   382   324     0     0    0 00000000 00000000        NULL
    in pagv exec
    k0x00e13210   394   324   394     0    0 00001ef8 00000000        NULL
    in pagv exec
    k0x078cb210   422     1   422     0    0 00000000 00000000        NULL
    in pagv exec
    k0x05e2b210   438     1   438     0    0 61885eaf 00000000        NULL
    in pagv exec
    k0x05a2f210   449     1   449     0    0 00081ef8 00000000        NULL
    in pagv exec
    k0x07837210   459     1   459     0    0 000018d0 00000000        NULL
    in pagv exec
    k0x027ec210  1494   880  1494     0    0 20000002 00000000        NULL
    in pagv
    k0x078c7210  1510     1  1499     0    0 00000000 00000000        NULL
    in pagv
    k0x01c5b210   500   449   449     0    0 60004607 00000000        NULL
    in pagv exec
    k0x06c57210   504     1     0     0    0 00086000 00000000        NULL
    in pagv
    k0x00e12210   593     1     0     0    0 00001ef8 00000000        NULL
    in pagv
    k0x05a2e210   596     1     0     0    0 00001ef8 00000000        NULL
    in pagv
    k0x027ed210   600     1     0     0    0 00001ef8 00000000        NULL
    in pagv
    k0x05e2a210   615     1     0     0    0 00001ef8 00000000        NULL
    in pagv
    k0x021cb210   621     1   621     0    0 00086001 00000000        NULL
    in pagv
    k0x021ca210   626     1   626     0    0 00002000 00000000        NULL
    in pagv
    k0x00fc1210   640   438   438     0    0 00082000 00000000        NULL
    in pagv exec
    k0x02ba9210   641   438   438     0    0 00082000 00000000        NULL
    in pagv exec
    k0x02ba8210   642   438   438     0    0 00082000 00000000        NULL
    in pagv exec
    k0x03f4e210   643   438   438     0    0 00082000 00000000        NULL
    in pagv exec
    k0x03f4f210   644   640   438     0    0 00080000 00000000        NULL
    in pagv exec
    k0x0365a210   645   640   438     0    0 00080000 00000000        NULL
    in pagv exec
    k0x039d4210   654     1   654     0    0 00084007 00000000        NULL
    in pagv
    k0x041cf210   669     1   669     0    0 60084007 00000000        NULL
    in pagv
    k0x04921210  2746   375   375     0    0 40005efb 00000000        NULL
    in pagv exec
    k0x079bc210   710     1   626     0    0 00080ef8 00000000        NULL
    in pagv
    k0x041ce210   711     1   626     0    0 00001ef8 00000000        NULL
    in pagv exec
    k0x027b2210   747     1   747     0    0 00001ef8 00000000        NULL
    in pagv
    k0x027b3210   755   640   438     0    0 20000000 00000000        NULL
    in pagv exec
    k0x039d5210   756   642   438     0    0 20000000 00000000        NULL
    in pagv exec
    k0x0746e210   759   640   438     0    0 20000000 00000000        NULL
    in pagv exec
    k0x03f67210   809     1   809     0    0 00001ef8 00000000        NULL
    in pagv exec
    k0x00c57210   825     1   825     0    0 00001ef8 00000000        NULL
    in pagv exec
    k0x01e7f210   832     1   832     0    0 00001ef8 00000000        NULL
    in pagv exec
    k0x05694210   846     1   846     0    0 00001ef8 00000000        NULL
    in pagv exec
    k0x00c56210   850   832   832     0    0 00005ef8 00000000        NULL
    in pagv exec
    k0x078c6210   851   846   846     0    0 00005ef8 00000000        NULL
    in pagv exec
    k0x07acd210 22372 22750 22750     0    0 60087afb 00000000        NULL
    in pagv exec
    k0x01e7e210   880     1     0     0    0 20084003 00000000        NULL
    in pagv
    k0x03878210   892   880   892     0   -2 00004003 00000000        NULL
    in pagv exec
    k0x03832210   910     1    99   106    0 00004000 00000000        NULL
    in pagv ctty exec
    k0x03f22210   922     1   922     0    0 10004002 00000000        NULL
    in pagv
    k0x03f23210   939     1   939     0    0 00001ef8 00000000        NULL
    in pagv
    k0x07836210   940   939   940     0    0 00001ef8 00000000        NULL
    in pagv
    k0x03f66210   946     1   946     0    0 00084003 00000000        NULL
    in pagv
    k0x04920210   965     1   965     0    0 00084003 00000000        NULL
    in pagv
    k0x07466210   970     1   970     0    0 00000000 00000000        NULL
    in pagv ctty exec
    k0x03d56210   973   946   946     0    0 00084003 00000000        NULL
    in pagv exec
    k0x07acc210   977   946   946     0  -15 20084003 00000000        NULL
    in pagv exec
    k0x05695210   979   946   946     0    0 00084003 00000000        NULL
    in pagv exec
    _kdbx_proc_end:
    
    Audit subsystem disabled
    
    No audit data to be saved
    #
    _crash_data_collection_finished:
    ----------------------------------------------------------------------
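
Each process record in the kdbx listing above is wrapped over two lines: a
line starting with a k0x address, then a line of flag words. For anyone who
wants to sort through it, a rough Python sketch for rejoining the records
follows. The field names (addr, pid, ppid, pgrp) are an assumption based on
how the values line up (for example, 288 through 293 all show 287 in the
third column); the remaining columns are left uninterpreted, and the function
names are made up for illustration.

```python
def join_proc_records(lines):
    """Rejoin kdbx proc records whose flags wrapped onto a second line."""
    records = []
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue
        if fields[0].startswith("k0x"):
            # Start of a new record. Column names are assumed, not documented.
            records.append({
                "addr": fields[0],
                "pid": int(fields[1]),
                "ppid": int(fields[2]),
                "pgrp": int(fields[3]),
                "rest": fields[4:],   # columns left uninterpreted
                "flags": [],
            })
        elif records:
            # Continuation line: flag words such as "in pagv exec".
            records[-1]["flags"].extend(fields)
    return records


def children_of(records, pid):
    """Return the PIDs whose (assumed) parent PID column equals pid."""
    return [r["pid"] for r in records if r["ppid"] == pid]


# Three records copied from the listing above.
sample = [
    "k0x00deb210   287     1   287     0    0 00000000 00000000        NULL",
    "in pagv",
    "k0x00dea210   288   287   287     0    0 00000000 00000000        NULL",
    "in pagv",
    "k0x06e77210   296     1     0     0    0 00002000 00000000        NULL",
    "in pagv ctty",
]
records = join_proc_records(sample)
print(children_of(records, 287))
print(records[2]["flags"])
```

Feeding it the whole listing would give one record per process, which makes
it easier to see which processes hang off a given parent.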
    
    
8660.1. "please try CANASTA Mail Server ..." by HAN::HALLE (Volker Halle MCS @HAO DTN 863-5216) Thu Jan 30 1997 19:54 (12 lines)
    
    Before posting a crash-data file in notes, PLEASE try the CANASTA Mail
    Server to find out whether this is a known problem.
    
    To learn how to use the CANASTA Mail Server, just send mail to
    
    	[email protected]
    
    with a subject-line of:	HELP
    
    Volker.
8660.2. "" by SMURF::MENNER (it's just a box of Pax..) Thu Jan 30 1997 21:18 (1 line)
    What version of DECnet OSI is running?  If it's not V3.2B - upgrade!
8660.3. "thanks" by TROOA::16.154.72.29::small () Fri Jan 31 1997 11:43 (13 lines)
Thanks for the reference to CANASTA, I'll start that process.

As for DECnet OSI, I'm not at the customer site to check today. It was 
initially installed last May, and I also installed some patches for a DECnet 
problem that was causing the system to crash last summer. Those patches seem 
to have done the trick, since the system ran fine until about November. 
Anyway, I'll check the versions of everything, since it's about time they 
upgraded to 3.2g of Unix, provided all those Polycenter products are 
supported at that rev level.

thanks for the notes.
Stephen Small

8660.4. "" by netrix.lkg.dec.com::thomas (The Code Warrior) Fri Jan 31 1997 12:50 (1 line)
Since 3.2B came out after that, you need to upgrade DECnet/OSI.
8660.5. "Output from Canasta / Machinechk" by TROOA::16.154.72.8::small () Tue Feb 04 1997 14:16 (44 lines)
Hi,

I checked the DECnet subsets with setld; they are all *321 versions. This was 
taken from some hardcopy setld output I kept in my files. Is this 3.2b?

I ran the crash-data file through Canasta. It asked me to get the machinechk 
tools from mvblab::sable, so I brought the AlphaStation 1000 version over and 
ran the query through it.

Here are the replies:
>	biu_stat = 254
>			Bit 2 is set - Tag Address Parity Error
>			The Ev4 requested an external cycle
>			the cycle is being performed  = write block
>			bit 11 is clear - Dcache fill reference
>			the failing quadword is = 0
>	epic_dcsr = 0x801e0019
>	coma_edsr = 0xfffa000
>	mchk_code = 0x0
>		The error code entered above has the following meaning
>	the Pal error code entered is unknown
>	
>	Type C to continue: C
>	Replace the CPU CARD
>
>	Checking for multiple errors in the registers
>	Type C to continue:
 returns to main menu

This would tell me that it is a hardware problem and that I should replace the 
CPU card in the system. Thanks for all your help. If a CPU replacement doesn't 
solve the problem, I'll let you know. 
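
The two bit tests the tool reports for biu_stat (bit 2 set, bit 11 clear) can 
be sanity-checked by hand. A minimal Python sketch follows. The tool prints 
254 with no prefix, so whether that is hex or decimal is an assumption; the 
sketch tries both readings. The bit meanings are only the ones the tool itself 
printed, and the helper names are made up for illustration.

```python
def is_set(value, bit):
    """True if the given bit (0 = least significant) is set in value."""
    return bool((value >> bit) & 1)


def set_bits(value):
    """List the positions of all set bits, lowest first."""
    return [b for b in range(value.bit_length()) if is_set(value, b)]


# Try both readings of the printed value "254".
for label, biu_stat in (("hex", 0x254), ("decimal", 254)):
    print(label, "set bits:", set_bits(biu_stat))
    # Bit 2: reported by the tool as Tag Address Parity Error.
    print("  bit 2 set:", is_set(biu_stat, 2))
    # Bit 11 clear: reported by the tool as a Dcache fill reference.
    print("  bit 11 set:", is_set(biu_stat, 11))
```

Under either reading, bit 2 comes out set and bit 11 comes out clear, which 
matches what the tool printed.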

PS. I will actually be recommending the customer upgrade all the software they 
are running to current release levels. That is, once I figure out what is 
supported at what release, i.e. System Watchdog with Polycenter Scheduler 
with X500 with X400 with DECnet OSI with Digital UNIX 3.? with Netview with 
NSR with .........   Do we have a software release matrix somewhere that could 
help me sort all this out? Then it's firmware time!!!

The fun continues - thanks again

SS
8660.6. "321 = 3.2a" by RHETT::MOORE () Tue Feb 04 1997 16:06 (2 lines)
    321 is 3.2a
    325 is 3.2b
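
The mapping above, combined with the earlier advice to upgrade if DECnet OSI 
is not at V3.2B, can be written down as a small lookup. This is only a sketch: 
the table holds just the two suffixes given in this topic, the subset names in 
the example are placeholders rather than real DECnet subset names, and the 
function names are made up for illustration.

```python
# Subset-name suffix to DECnet OSI version, as stated in this topic only.
SUFFIX_TO_VERSION = {"321": "3.2a", "325": "3.2b"}


def decnet_version(subset_name):
    """Map a subset name to a version by its last three characters.

    Returns None when the suffix is not in the table.
    """
    return SUFFIX_TO_VERSION.get(subset_name.strip()[-3:])


def needs_upgrade(subset_name):
    """True if the subset is known and is not at 3.2b; None if unknown."""
    version = decnet_version(subset_name)
    if version is None:
        return None
    return version != "3.2b"


# Placeholder subset names, not taken from a real setld listing.
for name in ("EXAMPLE321", "EXAMPLE325", "EXAMPLE999"):
    print(name, decnet_version(name), needs_upgrade(name))
```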