Title: DIGITAL UNIX (FORMERLY KNOWN AS DEC OSF/1)
Notice: Welcome to the Digital UNIX Conference
Moderator: SMURF::DENHAM
Created: Thu Mar 16 1995
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 10068
Total number of notes: 35879
8660.0. "System Crash Query " by CGOOA::SMALL () Thu Jan 30 1997 18:21
Hi folks,
Below is the output of a crash-data file from a Digital system (v3.2d)
which has been running at a customer site for about 9 months. The
system has been experiencing intermittent crashes during overnight
processing over the last 3 months.
The system is an X400/X500 mail server running DECnet as well as TCP/IP,
plus ADVFS, the Polycenter products (Scheduler, Watchdog), NSR and all the
X400/X500 products.
A quick glance at the messages suggests that the problem is hardware:
"panic (cpu 0): Machine check - Hardware error"
However, the problem is intermittent and the system always recovers.
The system has crashed about 12 times in the last 2 months. However,
since it never goes down while anyone is logged on and using it, we
didn't notice for a while. The crash-data files are all pretty much
the same.
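Since the crash-data files all look alike, one quick way to confirm it is to tally the `_panic_string` line from each of them. Below is a minimal Python sketch, assuming the files sit in `/var/adm/crash` (the directory shown in the `_crash_kernel` line of the dump) and are named `crash-data.N`; that naming is an assumption to check on the system.

```python
import glob
import os
import re
from collections import Counter

# Matches lines such as:
#   _panic_string: 0xfffffc000062e410 = "Machine check - Hardware error"
PANIC_RE = re.compile(r'^_panic_string:\s*\S+\s*=\s*"(.*)"')


def panic_string(path):
    """Return the panic string recorded in one crash-data file, or None."""
    with open(path, errors="replace") as f:
        for line in f:
            m = PANIC_RE.match(line.strip())
            if m:
                return m.group(1)
    return None


def tally_panics(paths):
    """Count how many crash-data files report each panic string."""
    counts = Counter()
    for path in paths:
        counts[panic_string(path) or "<no _panic_string line>"] += 1
    return counts


if __name__ == "__main__":
    # Assumed location and naming; adjust to where the files actually live.
    files = sorted(glob.glob(os.path.join("/var/adm/crash", "crash-data.*")))
    for panic, n in tally_panics(files).most_common():
        print(f"{n:3d}  {panic}")
```

If every file reports the same machine check, that supports the "pretty much the same" impression before digging into the register values.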
What I would appreciate is at least a good notion of whether we have a
hardware problem (call MCS) or a software problem (dig deeper).
Since I haven't found "A thumbnail guide to crash data files", I thought
I'd post it here.
--------------------------------
#
# Crash Data Collection (Version 1.4)
#
_crash_data_collection_time: Thu Jan 30 05:16:07 PST 1997
_current_directory: /
_crash_kernel: /var/adm/crash/vmunix.38
_crash_core: /var/adm/crash/vmcore.38
_crash_arch: alpha
_crash_os: Digital UNIX
_host_version: Digital UNIX V3.2D-2 (Rev. 41.64); Thu Oct 3 18:21:23
PDT 1996
_crash_version: Digital UNIX V3.2D-2 (Rev. 41.64); Thu Oct 3 18:21:23
PDT 1996
_crashtime: struct {
tv_sec = 854629801
tv_usec = 988688
}
_boottime: struct {
tv_sec = 854589000
tv_usec = 981856
}
_config: struct {
sysname = "OSF1"
nodename = "eis1.bchydro.bc.ca"
release = "V3.2"
version = "41.64"
machine = "alpha"
}
_cpu: 43
_system_string: 0xffffffffff800a20 = "AlphaServer 1000 4/266"
_ncpus: 1
_avail_cpus: 1
_partial_dump: 1
_physmem(MBytes): 127
_panic_string: 0xfffffc000062e410 = "Machine check - Hardware error"
_paniccpu: 0
_panic_thread: 0xfffffc00079bdb80
_preserved_message_buffer_begin:
struct {
msg_magic = 0x63061
msg_bufx = 0x158
msg_bufr = 0x6c2
msg_bufc = "_revision = 0x3
esc_int0 = 0xa1
esc_int1 = 0xef
esc_elcr0 = 0x0
esc_elcr1 = 0x0
esc_last_eisa = 0xff
esc_nmi_stat = 0x20
pci_ir = 0xff
pci_imr = 0x1
svr_mgr = 0xd4
panic (cpu 0): Machine check - Hardware error
syncing disks... DUMP.prom: dev RAID 0 11 0 0 0 0 0, block 131072
DUMP.prom: dev RAID 0 11 0 0 0 0 0, block 131072
de: OSF version 1.46
pci0 at nexus
psiop0 at pci0 slot 6
Loading SIOP: script 800300, reg 82008000, data 406ec2a0
scsi0 at psiop0 slot 0
rz5 at scsi0 bus 0 target 5 lun 0 (DEC RRD45 (C) DEC 1645)
tz6 at scsi0 bus 0 target 6 lun 0 (DEC TLZ07 (C)DEC 553A)
eisa0 at pci0
ace0 at eisa0
ace1 at eisa0
lp0 at eisa0
fdi0 at eisa0
fd0 at fdi0 unit 0
vga0 at eisa0
640x480 (Cirrus )
vga0: Cirrus Logic CL-GD5424 (SVGA) 512 Kbytes
Initializing xcr0. Please wait.
Initializing xcr0. Please wait.
Initializing xcr0. Please wait.
Initializing xcr0. Please wait.
xcr0 at pci0 slot 11
re0 at xcr0 unit 0 (unit status = ONLINE, raid level = 1)
re1 at xcr0 unit 1 (unit status = ONLINE, raid level = JBOD)
tu0: DECchip 21040-AA: Revision: 2.3
tu0 at pci0 slot 13
tu0: DEC TULIP Ethernet Interface, hardware address: 08-00-2B-E6-01-74
tu0: console mode: selecting UTP (10BaseT) port
gpc0 at eisa0
lvm0: configured.
lvm1: configured.
dli: configured
SuperLAT. Copyright 1993 Meridian Technology Corp. All rights reserved.
cam_logger: CAM_ERROR packet
cam_logger: bus 0 target 6 lun 0
ss_device_reset_done
Bus device reset has been performed
ADVFS: using 1152 buffers containing 9.00 megabytes of memory
Node ID is 08-00-2b-e6-01-74 (from device tu0)
dna_netman: configured
dna_dli: configured
Node UID is 34ee8fe0-7a43-11d0-800c-08002be60174
dna_base: configured
dna_xti: configured
AlphaServer 1000 4/266 machine check type 0x670.
retry = 0x0
mchk_code = 0x0
paltemp[1] = 0x8908b5c8
paltemp[2] = 0x4
paltemp[3] = 0x0
paltemp[4] = 0x3dc0
paltemp[5] = 0x28
paltemp[6] = 0x0
paltemp[7] = 0x4200
paltemp[8] = 0x400
paltemp[9] = 0x0
paltemp[10] = 0x4e2ab0
paltemp[11] = 0x0
paltemp[12] = 0x4e2e50
paltemp[13] = 0x4e2e80
paltemp[14] = 0x4e2ee0
paltemp[15] = 0x4e2c50
paltemp[16] = 0x4e2960
paltemp[17] = 0x1a
paltemp[18] = 0x1ffeeea0
paltemp[19] = 0x8908b600
paltemp[20] = 0x65f200
paltemp[21] = 0x0
paltemp[22] = 0x626e6e6e
paltemp[23] = 0x80
paltemp[24] = 0x0
paltemp[25] = 0x10000
paltemp[26] = 0xd
paltemp[27] = 0x0
paltemp[28] = 0x3262000
paltemp[29] = 0x0
paltemp[30] = 0x1
paltemp[31] = 0x5f03a58
exc_addr = 0x383dba
exc_sum = 0x0
msk = 0x0
iccsr = 0x4
pal_base = 0x14000
hier = 0x1cd0
hirr = 0x0
mm_csr = 0x5b11
dc_stat = 0x3
dc_addr = 0xffffffff
abox_ctl = 0x942e
biu_stat = 0x254
biu_addr = 0x752c1e0
biu_ctl = 0x10002227
fill_syndrome = 0x0
fill_adr = 0x6100
va = 0x6170
bc_tag = 0x7614
coma_gcr = 0x7fb20034
coma_edsr = 0xffffa000
coma_ter = 0x7fb27fe0
coma_elar = 0x7fb20800
coma_ehar = 0x7fb20820
coma_ldlr = 0x7fb2c9bf
coma_ldhr = 0x6fb10031
coma_base0 = 0x6fb10000
coma_base1 = 0x6fb10000
coma_base2 = 0x22310000
coma_base3 = 0xffff0000
coma_cnfg0 = 0x22310067
coma_cnfg1 = 0x22310000
coma_cnfg2 = 0x22310000
coma_cnfg3 = 0x7fb20000
epic_dcsr = 0x801e0019
epic_pear = 0x802560
epic_sear = 0x12dfff0
epic_tbr1 = 0x3b2000
epic_tbr2 = 0x0
epic_pbr1 = 0x8c0000
epic_pbr2 = 0x40080000
epic_pmr1 = 0x700000
epic_pmr2 = 0x3ff00000
epic_harx1 = 0x80000000
epic_harx2 = 0x0
epic_pmlt = 0xff
epic_tag0 = 0x80e000
epic_tag1 = 0x810000
epic_tag2 = 0x812000
epic_tag3 = 0x814000
epic_tag4 = 0x801000
epic_tag5 = 0x807000
epic_tag6 = 0x803000
epic_tag7 = 0x80c000
epic_data0 = 0x44bc
epic_data1 = 0x68c0
epic_data2 = 0x6d62
epic_data3 = 0x6bde
epic_data4 = 0x6e4
epic_data5 = 0x6ea
epic_data6 = 0x6e6
epic_data7 = 0x6b26
pceb_vid = 0x8086
pceb_did = 0x482
pceb_revision = 0x5
pceb_command = 0x7
pceb_status = 0x200
pceb_latency = 0xf8
pceb_control = 0x60
pceb_arbcon = 0x9d
pceb_arbpri = 0x4
esc_id = 0xf
esc"
}
_preserved_message_buffer_end:
_kernel_process_status_begin:
PID COMM
00000 kernel idle
00001 init
00003 kloadsrv
00019 update
00079 fax_1
00082 mwatch_1
00085 mwatch_2
00088 sstgw
11357 mta_irchild
22659 psw_sensor_eth
00154 syslogd
00156 binlogd
11468 mta_irchild
00208 routed
22750 sh
11548 mta_remote_api_s
00285 portmap
00287 nfsiod
00288 nfsiod
00289 nfsiod
00290 nfsiod
00291 nfsiod
00292 nfsiod
00293 nfsiod
00296 rpc.statd
00298 rpc.lockd
11566 mta_remote_api_s
22832 sched_agent_comm
00324 dnalimd
00327 dnaevld
00369 dnascd
00370 dnansd
00371 dnaksd
00375 dnsadv
00379 dtssd
00382 dnanoded
00394 dnamopd
00422 osaknmd
00438 mta
00449 smtpgw_ea
00459 dxd_dsad
01494 xdm
01510 dxconsole
00500 smtpgw
00504 sendmail
00593 mold
00596 internet_mom
00600 mgrAgentd_mom
00615 snmp_pe
00621 inetd
00626 cron
00640 mta_irserver
00641 mta_mpserver
00642 mta_rlserver
00643 mta_wjserver
00644 mta_remote_api_s
00645 mta_remote_api_s
00654 lpd
00669 psw_agent
02746 dnsclerk
00710 ibxlookupd
00711 ibxd
00747 rpcd
00755 mta_irchild
00756 mta_rlchild
00759 mta_irchild
00809 sched_agent
00825 sched_listener
00832 sched_txm
00846 sched_engine
00850 sqlexec
00851 sqlexec
22372 sched_agent_job_
00880 xdm
00892 Xdec
00910 sstinfo
00922 _upsd
00939 namon_server
00940 namon_server
00946 nsrd
00965 nsrexecd
00970 getty
00973 nsrmmdbd
00977 nsrmmd
00979 nsrindexd
_kernel_process_status_end:
_current_pid: 22832
_current_tid: 0xfffffc00079bdb80
_proc_thread_list_begin:
thread 0xfffffc00079bdb80 stopped at [boot:1746 ,0xfffffc00004e5d8c]
Source not available
_proc_thread_list_end:
_dump_begin:
> 0 boot(0x0, 0x4, 0x31372, 0x31373, 0x1)
["../../../../src/kernel/arch/alpha/machdep.c":1746,
0xfffffc00004e5d8c]
1 panic(s = 0xfffffc0000616b10 = "thread_block: interrupt level
call") ["../../../../src/kernel/bsd/subr_prf.c":673,
0xfffffc0000442c78]
pcpu = 0xb3f001600000001
i = 4464936
bootopt = 7
mycpu = 7011304
spl = 5
prevcc = 18446739675667504948
nextcc = 18446739675670051736
timer = -4294967292
limit = -4397913889912
2 thread_block() ["../../../../src/kernel/kern/sched_prim.c":1748,
0xfffffc0000475198]
thread = 0xfffffc00079bdb80
new_thread = 0xfffffc0000659220
mycpu = 0
myprocessor = 0x20
s = 5
pset = 0xfffffc0000477a08
prev = 0xfffffc00006c8a48
3 thread_preempt(thread = 0xfffffc00079bdb80, processor =
0xfffffc0000154100) ["../../../../src/kernel/kern/sched_prim.c":3460,
0xfffffc0000477a14]
s = 5
pri = 6656544
pset = 0xfffffc00006b6208
4 call_disk(vdp = 0xfffffc0007e7a388, ioAmt = 16384, blk = 384096,
ioList = 0xffffffff87f0a008, s = 0xffffffff8908ab38)
["../../../../src/kernel/msfs/osf/msfs_io.c":1237, 0xfffffc0000404638]
bp = 0xffffffff87f0a008
s = 2
th = 0xfffffc00079bdb80
5 bs_startio(vdp = (nil), s = 0xffffffff8908ab38, flushFlag =
8039488) ["../../../../src/kernel/msfs/osf/msfs_io.c":1497,
0xfffffc0000404b18]
error = -2014057872
ioAmt = 16384
ioList = 0xffffffff87f0a008
iop = 0xfffffc00002a3f70
pages = 9591592
devVirt = 0x114e000
offset = 3921
kpte = 0x1
scratch = union {
quadword = 18446739675663040512
PTE_BITFIELD = struct {
_v = 0
_for = 0
_fow = 0
_foe = 0
_asm = 0
_gh = 0
_prot = 0
_exec = 0
_wire = 0
_seg = 0
_lw_wire = 0
_gh_shared = 0
_soft = 0
_lw_wire_count = 0
_pfn = 4294966272
}
}
sz = -2014162080
blks = 0
i = 115815816
mask = 0
vdBlk = 0
svdBlk = 6656136
readCnt = 15856
rw = -2144937512
msk = 1
qhdr = 0xffffffff87f3eae8
toFlush = 4
6 flush_vols(dmnp = 0x1, s = 0x1, forceFlushFlag = -1995919576)
["../../../../src/kernel/msfs/bs/bs_qio.c":2663, 0xfffffc00003c6138]
vdi = 8039488
vdp = 0xf10
7 get_freebuf(dmnp = 0xfffffc000217e008, bc = 0xffffffff8908aaf0, s
= 0xffffffff8908ab38)
["../../../../src/kernel/msfs/bs/bs_buffer2.c":4430,
0xfffffc00003d5ba8]
bp = 0x1ea2000
tmp = 0xfffffc0007e7a388
heldPg = 1
dmni = 1
i = -2014057624
i = 3950620
lim = 3921
8 bs_pinpg_one_int(bfPageRefH = 0xffffffff8908ae20, bfPageAddr =
0xffffffff8908ae18, bfap = 0xffffffff8026d798, bsPage = 3856, refHint =
BS_NIL, noReadMask = 1, pl = 0xfffffc00007e9640, putflag = 1)
["../../../../src/kernel/msfs/bs/bs_buffer2.c":3217,
0xfffffc00003d3f00]
bp = 0x1
tmp = 0xf0e7032f09da9
hbp = 0xfffffc0006e73588
sts = 0
i = 0
s = -1995919576
haveBcache = 1
doRead = 0
res = -2014057624
listLen = 0
ioListp = 0x1153fff
wait = 0
desCnt = 18153472
doFlush = 0
shouldBlock = 0
ubcsts = 0
i = 18153472
lim = 0
bfPageSize = 132621192
maskbit = 6681088
pp = 0xfffffc00003c4c3c
ioDescp = 0x2
9 bs_pinpg_clone(bfPageRefH = 0xffffffff8908ae20, bfPageAddr =
0xffffffff8908ae18, bfap = 0xffffffff8026d798, bsPage = 3856, refHint =
BS_NIL, ftxH = struct {
hndl = 0
level = 0
dmnh = 0
}, noReadMask = 1, pl = 0xfffffc00007e9640, putflag = 1)
["../../../../src/kernel/msfs/bs/bs_buffer2.c":2907,
0xfffffc00003d3870]
sts = 8296000
bfSetp = 0xfffffc00003d3238
10 bs_pinpg_put(bfPageRefH = 0xffffffff8908ae20, bfPageAddr =
0xffffffff8908ae18, bfAccessH = 31588352, bsPage = 3856, refHint =
BS_NIL, noReadMask = 1, pl = 0xfffffc00007e9640)
["../../../../src/kernel/msfs/bs/bs_buffer2.c":2478,
0xfffffc00003d32b8]
bfap = 0xfffffc0006e73580
bs_pinpg_fn_p = 0xfffffc00003f9ff4
11 msfs_putpage(0xfffffc0007634c00, 0x2, 0x1, 0x40,
0xfffffc0004a6ad00)
["../../../../src/kernel/msfs/osf/msfs_misc.c":1813,
0xfffffc00003fa098]
12 ubc_flush_dirty(0x2, 0x40, 0xfffffc000065f200, 0xfffffc0004b57200,
0xfffffc0000452bd8) ["../../../../src/kernel/vfs/vfs_ubc.c":3055,
0xfffffc00002a2c08]
13 mntflushbuf(mountp = 0xfffffc0000006010, flags = 0)
["../../../../src/kernel/vfs/vfs_bio.c":1427, 0xfffffc0000452bcc]
vp = 0xfffffc0000613420
nvp = 0x670
14 boot(0x0, 0x0, 0xfffffc000062e410, 0x4f20, 0x730)
["../../../../src/kernel/arch/alpha/machdep.c":1675,
0xfffffc00004e5c3c]
15 panic(s = 0xfffffc000062e410 = "Machine check - Hardware error")
["../../../../src/kernel/bsd/subr_prf.c":757, 0xfffffc0000442e34]
pcpu = 0xfffffc0000744030
i = -1995918576
bootopt = 7
mycpu = 1
spl = 7
prevcc = 18446739675663065104
nextcc = 18446739675663065088
timer = -4398041208068
limit = 4072
16 kn22a_machcheck(0x670, 0xfffffc0000006000, 0xffffffff8908b2a8,
0xfffffc0001fb5780, 0x0)
["../../../../src/kernel/arch/alpha/hal/kn22a.c":2477,
0xfffffc000050ebc8]
17 mach_error(0x0, 0xfffffc0000006000, 0xffffffff8908b2a8,
0xfffffffdff7fc000, 0xfffffffdff000000)
["../../../../src/kernel/arch/alpha/hal/cpusw.c":826,
0xfffffc00005026f4]
18 _XentInt(0x0, 0xfffffc0000383db8, 0xfffffc000065f200, 0x0,
0x7e006) ["../../../../src/kernel/arch/alpha/locore.s":997,
0xfffffc00004e2bbc]
19 u_anon_faultpage(0x0, 0x140023990, 0x14000add0, 0x5,
0xfffffc0000410134) ["../../../../src/kernel/vm/u_mape_anon.c":1002,
0xfffffc0000383db4]
_dump_end:
warning: Files compiled -g3: parameter values probably wrong
_kernel_thread_list_begin:
thread 0xfffffc0007ef4000 stopped at [thread_run:2282
,0xfffffc0000475cb4] Source not available
thread 0xfffffc0007ef4400 stopped at [thread_block:1914
,0xfffffc00004754c8] Source not available
thread 0xfffffc0007f26800 stopped at [thread_block:1914
,0xfffffc00004754c8] Source not available
thread 0xfffffc0007f26c00 stopped at [thread_block:1899
+0x28,0xfffffc0000475458] Source not available
thread 0xfffffc0007f27000 stopped at [thread_block:1899
+0x28,0xfffffc0000475458] Source not available
thread 0xfffffc0007f27400 stopped at [thread_block:1914
,0xfffffc00004754c8] Source not available
thread 0xfffffc0007f27800 stopped at [thread_block:1899
+0x28,0xfffffc0000475458] Source not available
thread 0xfffffc0007f27c00 stopped at [thread_block:1914
,0xfffffc00004754c8] Source not available
thread 0xfffffc0007ce2000 stopped at [thread_block:1914
,0xfffffc00004754c8] Source not available
thread 0xfffffc0007ce2400 stopped at [thread_block:1914
,0xfffffc00004754c8] Source not available
thread 0xfffffc0007ce2800 stopped at [thread_block:1899
+0x28,0xfffffc0000475458] Source not available
thread 0xfffffc0007ce2c00 stopped at [thread_block:1899
+0x28,0xfffffc0000475458] Source not available
thread 0xfffffc0007ce3400 stopped at [thread_block:1914
,0xfffffc00004754c8] Source not available
thread 0xfffffc0007ce3800 stopped at [thread_block:1914
,0xfffffc00004754c8] Source not available
thread 0xfffffc0007ce3c00 stopped at [thread_block:1914
,0xfffffc00004754c8] Source not available
thread 0xfffffc000747e000 stopped at [thread_block:1899
+0x28,0xfffffc0000475458] Source not available
thread 0xfffffc000747e400 stopped at [thread_block:1899
+0x28,0xfffffc0000475458] Source not available
thread 0xfffffc000747e800 stopped at [thread_block:1899
+0x28,0xfffffc0000475458] Source not available
thread 0xfffffc000747ec00 stopped at [thread_block:1899
+0x28,0xfffffc0000475458] Source not available
thread 0xfffffc000747f000 stopped at [thread_block:1914
,0xfffffc00004754c8] Source not available
_kernel_thread_list_end:
_savedefp: 0xffffffff8908b950
_kernel_memory_fault_data_begin:
struct {
fault_va = 0x0
fault_pc = 0x0
fault_ra = 0x0
fault_sp = 0x0
access = 0x0
status = 0x0
cpunum = 0x0
count = 0x0
pcb = (nil)
thread = (nil)
task = (nil)
proc = (nil)
}
_kernel_memory_fault_data_end:
Invalid character in input
_uptime: 11.33 hours
paniccpu: 0x0
machine_slot[paniccpu]: struct {
is_cpu = 0x1
cpu_type = 0xf
cpu_subtype = 0x11
running = 0x1
cpu_ticks = {
[0] 0x76f75b
[1] 0x5
[2] 0x32819f
[3] 0x1d476c2
[4] 0x1034
}
clock_freq = 0x400
error_restart = 0x0
cpu_panicstr = 0xfffffc000062e410 = "Machine check - Hardware
error"
cpu_panic_thread = 0xfffffc00079bdb80
}
tset machine_slot[paniccpu].cpu_panic_thread:
Begin Trace for machine_slot[paniccpu].cpu_panic_thread:
> 0 boot(0x0, 0x4, 0x31372, 0x31373, 0x1)
["../../../../src/kernel/arch/alpha/machdep.c":1746,
0xfffffc00004e5d8c]
1 panic(s = 0xfffffc0000616b10 = "thread_block: interrupt level
call") ["../../../../src/kernel/bsd/subr_prf.c":673,
0xfffffc0000442c78]
2 thread_block() ["../../../../src/kernel/kern/sched_prim.c":1748,
0xfffffc0000475198]
3 thread_preempt(thread = 0xfffffc00079bdb80, processor =
0xfffffc0000154100) ["../../../../src/kernel/kern/sched_prim.c":3460,
0xfffffc0000477a14]
4 call_disk(vdp = 0xfffffc0007e7a388, ioAmt = 0x4000, blk = 0x5dc60,
ioList = 0xffffffff87f0a008, s = 0xffffffff8908ab38)
["../../../../src/kernel/msfs/osf/msfs_io.c":1237, 0xfffffc0000404638]
5 bs_startio(vdp = (nil), s = 0xffffffff8908ab38, flushFlag =
0x7aac40) ["../../../../src/kernel/msfs/osf/msfs_io.c":1497,
0xfffffc0000404b18]
6 flush_vols(dmnp = 0x1, s = 0x1, forceFlushFlag =
0xffffffff8908af28) ["../../../../src/kernel/msfs/bs/bs_qio.c":2663,
0xfffffc00003c6138]
7 get_freebuf(dmnp = 0xfffffc000217e008, bc = 0xffffffff8908aaf0, s
= 0xffffffff8908ab38)
["../../../../src/kernel/msfs/bs/bs_buffer2.c":4430,
0xfffffc00003d5ba8]
8 bs_pinpg_one_int(bfPageRefH = 0xffffffff8908ae20, bfPageAddr =
0xffffffff8908ae18, bfap = 0xffffffff8026d798, bsPage = 0xf10, refHint
= BS_NIL, noReadMask = 0x1, pl = 0xfffffc00007e9640, putflag = 0x1)
["../../../../src/kernel/msfs/bs/bs_buffer2.c":3217,
0xfffffc00003d3f00]
9 bs_pinpg_clone(bfPageRefH = 0xffffffff8908ae20, bfPageAddr =
0xffffffff8908ae18, bfap = 0xffffffff8026d798, bsPage = 0xf10, refHint
= BS_NIL, ftxH = (...), noReadMask = 0x1, pl = 0xfffffc00007e9640,
putflag = 0x1) ["../../../../src/kernel/msfs/bs/bs_buffer2.c":2907,
0xfffffc00003d3870]
10 bs_pinpg_put(bfPageRefH = 0xffffffff8908ae20, bfPageAddr =
0xffffffff8908ae18, bfAccessH = 0x1e20000, bsPage = 0xf10, refHint =
BS_NIL, noReadMask = 0x1, pl = 0xfffffc00007e9640)
["../../../../src/kernel/msfs/bs/bs_buffer2.c":2478,
0xfffffc00003d32b8]
11 msfs_putpage(0xfffffc0007634c00, 0x2, 0x1, 0x40,
0xfffffc0004a6ad00)
["../../../../src/kernel/msfs/osf/msfs_misc.c":1813,
0xfffffc00003fa098]
12 ubc_flush_dirty(0x2, 0x40, 0xfffffc000065f200, 0xfffffc0004b57200,
0xfffffc0000452bd8) ["../../../../src/kernel/vfs/vfs_ubc.c":3055,
0xfffffc00002a2c08]
13 mntflushbuf(mountp = 0xfffffc0000006010, flags = 0x0)
["../../../../src/kernel/vfs/vfs_bio.c":1427, 0xfffffc0000452bcc]
14 boot(0x0, 0x0, 0xfffffc000062e410, 0x4f20, 0x730)
["../../../../src/kernel/arch/alpha/machdep.c":1675,
0xfffffc00004e5c3c]
15 panic(s = 0xfffffc000062e410 = "Machine check - Hardware error")
["../../../../src/kernel/bsd/subr_prf.c":757, 0xfffffc0000442e34]
16 kn22a_machcheck(0x670, 0xfffffc0000006000, 0xffffffff8908b2a8,
0xfffffc0001fb5780, 0x0)
["../../../../src/kernel/arch/alpha/hal/kn22a.c":2477,
0xfffffc000050ebc8]
17 mach_error(0x0, 0xfffffc0000006000, 0xffffffff8908b2a8,
0xfffffffdff7fc000, 0xfffffffdff000000)
["../../../../src/kernel/arch/alpha/hal/cpusw.c":826,
0xfffffc00005026f4]
18 _XentInt(0x0, 0xfffffc0000383db8, 0xfffffc000065f200, 0x0,
0x7e006) ["../../../../src/kernel/arch/alpha/locore.s":997,
0xfffffc00004e2bbc]
19 u_anon_faultpage(0x0, 0x140023990, 0x14000add0, 0x5,
0xfffffc0000410134) ["../../../../src/kernel/vm/u_mape_anon.c":1002,
0xfffffc0000383db4]
End Trace for machine_slot[paniccpu].cpu_panic_thread:
"cpu_data" is not an array
_stack_trace[0]_begin:
> 0 boot(0x0, 0x4, 0x31372, 0x31373, 0x1)
["../../../../src/kernel/arch/alpha/machdep.c":1746,
0xfffffc00004e5d8c]
1 panic(s = 0xfffffc0000616b10 = "thread_block: interrupt level
call") ["../../../../src/kernel/bsd/subr_prf.c":673,
0xfffffc0000442c78]
2 thread_block() ["../../../../src/kernel/kern/sched_prim.c":1748,
0xfffffc0000475198]
3 thread_preempt(thread = 0xfffffc00079bdb80, processor =
0xfffffc0000154100) ["../../../../src/kernel/kern/sched_prim.c":3460,
0xfffffc0000477a14]
4 call_disk(vdp = 0xfffffc0007e7a388, ioAmt = 16384, blk = 384096,
ioList = 0xffffffff87f0a008, s = 0xffffffff8908ab38)
["../../../../src/kernel/msfs/osf/msfs_io.c":1237, 0xfffffc0000404638]
5 bs_startio(vdp = (nil), s = 0xffffffff8908ab38, flushFlag =
8039488) ["../../../../src/kernel/msfs/osf/msfs_io.c":1497,
0xfffffc0000404b18]
6 flush_vols(dmnp = 0x1, s = 0x1, forceFlushFlag = -1995919576)
["../../../../src/kernel/msfs/bs/bs_qio.c":2663, 0xfffffc00003c6138]
7 get_freebuf(dmnp = 0xfffffc000217e008, bc = 0xffffffff8908aaf0, s
= 0xffffffff8908ab38)
["../../../../src/kernel/msfs/bs/bs_buffer2.c":4430,
0xfffffc00003d5ba8]
8 bs_pinpg_one_int(bfPageRefH = 0xffffffff8908ae20, bfPageAddr =
0xffffffff8908ae18, bfap = 0xffffffff8026d798, bsPage = 3856, refHint =
BS_NIL, noReadMask = 1, pl = 0xfffffc00007e9640, putflag = 1)
["../../../../src/kernel/msfs/bs/bs_buffer2.c":3217,
0xfffffc00003d3f00]
9 bs_pinpg_clone(bfPageRefH = 0xffffffff8908ae20, bfPageAddr =
0xffffffff8908ae18, bfap = 0xffffffff8026d798, bsPage = 3856, refHint =
BS_NIL, ftxH = (...), noReadMask = 1, pl = 0xfffffc00007e9640, putflag
= 1) ["../../../../src/kernel/msfs/bs/bs_buffer2.c":2907,
0xfffffc00003d3870]
10 bs_pinpg_put(bfPageRefH = 0xffffffff8908ae20, bfPageAddr =
0xffffffff8908ae18, bfAccessH = 31588352, bsPage = 3856, refHint =
BS_NIL, noReadMask = 1, pl = 0xfffffc00007e9640)
["../../../../src/kernel/msfs/bs/bs_buffer2.c":2478,
0xfffffc00003d32b8]
11 msfs_putpage(0xfffffc0007634c00, 0x2, 0x1, 0x40,
0xfffffc0004a6ad00)
["../../../../src/kernel/msfs/osf/msfs_misc.c":1813,
0xfffffc00003fa098]
12 ubc_flush_dirty(0x2, 0x40, 0xfffffc000065f200, 0xfffffc0004b57200,
0xfffffc0000452bd8) ["../../../../src/kernel/vfs/vfs_ubc.c":3055,
0xfffffc00002a2c08]
13 mntflushbuf(mountp = 0xfffffc0000006010, flags = 0)
["../../../../src/kernel/vfs/vfs_bio.c":1427, 0xfffffc0000452bcc]
14 boot(0x0, 0x0, 0xfffffc000062e410, 0x4f20, 0x730)
["../../../../src/kernel/arch/alpha/machdep.c":1675,
0xfffffc00004e5c3c]
15 panic(s = 0xfffffc000062e410 = "Machine check - Hardware error")
["../../../../src/kernel/bsd/subr_prf.c":757, 0xfffffc0000442e34]
16 kn22a_machcheck(0x670, 0xfffffc0000006000, 0xffffffff8908b2a8,
0xfffffc0001fb5780, 0x0)
["../../../../src/kernel/arch/alpha/hal/kn22a.c":2477,
0xfffffc000050ebc8]
17 mach_error(0x0, 0xfffffc0000006000, 0xffffffff8908b2a8,
0xfffffffdff7fc000, 0xfffffffdff000000)
["../../../../src/kernel/arch/alpha/hal/cpusw.c":826,
0xfffffc00005026f4]
18 _XentInt(0x0, 0xfffffc0000383db8, 0xfffffc000065f200, 0x0,
0x7e006) ["../../../../src/kernel/arch/alpha/locore.s":997,
0xfffffc00004e2bbc]
19 u_anon_faultpage(0x0, 0x140023990, 0x14000add0, 0x5,
0xfffffc0000410134) ["../../../../src/kernel/vm/u_mape_anon.c":1002,
0xfffffc0000383db4]
_stack_trace[0]_end:
_savedefp_exception_frame_(savedefp/33X):
ffffffff8908b950: 0000000000072008 0000000000000000
ffffffff8908b960: 0000000000000001 0000000000022000
ffffffff8908b970: 0000000000003dc0 0000000000000028
ffffffff8908b980: 0000000000000000 000000000000000d
ffffffff8908b990: 000000000000001a 000000000008bac8
ffffffff8908b9a0: 0000000140023e20 000000011ffff838
ffffffff8908b9b0: 0000000000000000 0000000140023990
ffffffff8908b9c0: 000000014000add0 0000000000000005
ffffffff8908b9d0: 000000011ffff168 000003ffc01db7f0
ffffffff8908b9e0: 000000011ffee8a5 000003ffc01de2f0
ffffffff8908b9f0: 0000000000000000 0000000000000282
ffffffff8908ba00: 0000000000000000 000003ff804bb160
ffffffff8908ba10: 000003ff8051bc40 000000011fffef30
ffffffff8908ba20: 000000011ffffba0 0000000000000008
ffffffff8908ba30: 000003ff8051bcc0 000003ffc01ac880
ffffffff8908ba40: 000000000007dfc8 0000000000000000
ffffffff8908ba50: 000000000000fde8
_savedefp_exception_frame_ptr: 0xffffffff8908b950
_savedefp_stack_pointer: 0x11ffffba0
_savedefp_processor_status: 0x8
_savedefp_return_address: 0x3ff804bb160
_savedefp_pc: 0x3ff8051bcc0
_savedefp_pc/i:
l1 address 0x3ff8051bcc0 not mapped, pte 0x0
can't read from process (address 0x3ff8051bcc0)
_savedefp_return_address/i:
l1 address 0x3ff804bb160 not mapped, pte 0x0
can't read from process (address 0x3ff804bb160)
_kernel_memory_fault_data.fault_pc/i:
can't read from process (address 0x0)
_kernel_memory_fault_data.fault_ra/i:
can't read from process (address 0x0)
_kdbx_sum_start:
Hostname : eis1.bchydro.bc.ca
cpu: AlphaServer 1000 4/266 avail: 1
Boot-time: Wed Jan 29 17:50:00 1997
Time: Thu Jan 30 05:10:01 1997
Kernel : OSF1 release V3.2 version 41.64 (alpha)
_kdbx_sum_end:
_kdbx_swap_start:
Swap device name Size In Use Free
-------------------------------- ---------- ---------- ----------
/dev/re0b 131072k 20616k 110456k
16384p 2577p 13807p
/dev/re1b 131072k 19200k 111872k
16384p 2400p 13984p
/dev/re1d 262144k 18872k 243272k
32768p 2359p 30409p
-------------------------------- ---------- ---------- ----------
Total swap partitions: 3 524288k 58688k 465600k
65536p 7336p 58200p
_kdbx_swap_end:
_kdbx_proc_start:
Addr PID PPID PGRP UID NICE SIGCATCH P_SIG Event
Flags
=========== ===== ===== ===== ===== ==== ======== ======== ===========
============
k0x07eed210 0 0 0 0 0 00000000 00000000 NULL
in sys
k0x07e9b210 1 0 1 0 0 307a7eff 00000000 NULL
in pagv exec
k0x0746f210 3 1 2 0 0 00004006 00000000 NULL
in pagv exec
k0x07467210 19 1 19 0 0 00002000 00000000 NULL
in pagv
k0x021ad210 79 1 33 106 0 00006000 00000000 NULL
in pagv ctty exec
k0x019cd210 82 1 33 106 0 00004000 00000000 NULL
in pagv ctty exec
k0x017b4210 85 1 33 106 0 00004000 00000000 NULL
in pagv ctty exec
k0x017b5210 88 1 33 106 0 00004000 00000000 NULL
in pagv ctty exec
k0x01c5a210 11357 640 438 0 0 20000000 00000000 NULL
in pagv exec
k0x052da210 22659 669 669 0 0 00000006 00000000 NULL
in pagv exec
k0x01fea210 154 1 154 0 0 00086001 00000000 NULL
in pagv
k0x017a5210 156 1 156 0 0 00004001 00000000 NULL
in pagv
k0x00ebe210 11468 640 438 0 0 20000000 00000000 NULL
in pagv exec
k0x017a4210 208 1 208 0 0 20006003 00000000 NULL
in pagv
k0x0667c210 22750 809 22750 0 0 60007eff 00000000 NULL
in pagv exec
k0x00ebf210 11548 645 438 0 0 00080000 00000000 NULL
in pagv
k0x021ac210 285 1 285 0 0 00080628 00000000 NULL
in pagv
k0x00deb210 287 1 287 0 0 00000000 00000000 NULL
in pagv
k0x00dea210 288 287 287 0 0 00000000 00000000 NULL
in pagv
k0x07663210 289 287 287 0 0 00000000 00000000 NULL
in pagv
k0x07662210 290 287 287 0 0 00000000 00000000 NULL
in pagv
k0x033f8210 291 287 287 0 0 00000000 00000000 NULL
in pagv
k0x033f9210 292 287 287 0 0 00000000 00000000 NULL
in pagv
k0x06e76210 293 287 287 0 0 00000000 00000000 NULL
in pagv
k0x06e77210 296 1 0 0 0 00002000 00000000 NULL
in pagv ctty
k0x078ca210 298 1 298 0 0 00002000 00000000 NULL
in pagv
k0x0667d210 11566 645 438 0 0 00080000 00000000 NULL
in pagv
k0x079bd210 22832 22372 22750 0 0 00001ef8 00000000 NULL
in pagv exec
k0x019cc210 324 1 0 0 0 00084003 00000000 NULL
in pagv
k0x018cd210 327 324 327 0 0 20082000 00000000 NULL
in pagv exec
k0x0319a210 369 324 0 0 0 00887efb 00000000 NULL
in pagv exec
k0x0319b210 370 369 0 0 0 00084000 00000000 NULL
in pagv exec
k0x06c56210 371 369 0 0 0 00080000 00000000 NULL
in pagv exec
k0x00e6c210 375 324 375 0 0 00484007 00000000 NULL
in pagv exec
k0x00e6d210 379 1 379 0 0 00004eff 00000000 NULL
in pagv
k0x018cc210 382 324 0 0 0 00000000 00000000 NULL
in pagv exec
k0x00e13210 394 324 394 0 0 00001ef8 00000000 NULL
in pagv exec
k0x078cb210 422 1 422 0 0 00000000 00000000 NULL
in pagv exec
k0x05e2b210 438 1 438 0 0 61885eaf 00000000 NULL
in pagv exec
k0x05a2f210 449 1 449 0 0 00081ef8 00000000 NULL
in pagv exec
k0x07837210 459 1 459 0 0 000018d0 00000000 NULL
in pagv exec
k0x027ec210 1494 880 1494 0 0 20000002 00000000 NULL
in pagv
k0x078c7210 1510 1 1499 0 0 00000000 00000000 NULL
in pagv
k0x01c5b210 500 449 449 0 0 60004607 00000000 NULL
in pagv exec
k0x06c57210 504 1 0 0 0 00086000 00000000 NULL
in pagv
k0x00e12210 593 1 0 0 0 00001ef8 00000000 NULL
in pagv
k0x05a2e210 596 1 0 0 0 00001ef8 00000000 NULL
in pagv
k0x027ed210 600 1 0 0 0 00001ef8 00000000 NULL
in pagv
k0x05e2a210 615 1 0 0 0 00001ef8 00000000 NULL
in pagv
k0x021cb210 621 1 621 0 0 00086001 00000000 NULL
in pagv
k0x021ca210 626 1 626 0 0 00002000 00000000 NULL
in pagv
k0x00fc1210 640 438 438 0 0 00082000 00000000 NULL
in pagv exec
k0x02ba9210 641 438 438 0 0 00082000 00000000 NULL
in pagv exec
k0x02ba8210 642 438 438 0 0 00082000 00000000 NULL
in pagv exec
k0x03f4e210 643 438 438 0 0 00082000 00000000 NULL
in pagv exec
k0x03f4f210 644 640 438 0 0 00080000 00000000 NULL
in pagv exec
k0x0365a210 645 640 438 0 0 00080000 00000000 NULL
in pagv exec
k0x039d4210 654 1 654 0 0 00084007 00000000 NULL
in pagv
k0x041cf210 669 1 669 0 0 60084007 00000000 NULL
in pagv
k0x04921210 2746 375 375 0 0 40005efb 00000000 NULL
in pagv exec
k0x079bc210 710 1 626 0 0 00080ef8 00000000 NULL
in pagv
k0x041ce210 711 1 626 0 0 00001ef8 00000000 NULL
in pagv exec
k0x027b2210 747 1 747 0 0 00001ef8 00000000 NULL
in pagv
k0x027b3210 755 640 438 0 0 20000000 00000000 NULL
in pagv exec
k0x039d5210 756 642 438 0 0 20000000 00000000 NULL
in pagv exec
k0x0746e210 759 640 438 0 0 20000000 00000000 NULL
in pagv exec
k0x03f67210 809 1 809 0 0 00001ef8 00000000 NULL
in pagv exec
k0x00c57210 825 1 825 0 0 00001ef8 00000000 NULL
in pagv exec
k0x01e7f210 832 1 832 0 0 00001ef8 00000000 NULL
in pagv exec
k0x05694210 846 1 846 0 0 00001ef8 00000000 NULL
in pagv exec
k0x00c56210 850 832 832 0 0 00005ef8 00000000 NULL
in pagv exec
k0x078c6210 851 846 846 0 0 00005ef8 00000000 NULL
in pagv exec
k0x07acd210 22372 22750 22750 0 0 60087afb 00000000 NULL
in pagv exec
k0x01e7e210 880 1 0 0 0 20084003 00000000 NULL
in pagv
k0x03878210 892 880 892 0 -2 00004003 00000000 NULL
in pagv exec
k0x03832210 910 1 99 106 0 00004000 00000000 NULL
in pagv ctty exec
k0x03f22210 922 1 922 0 0 10004002 00000000 NULL
in pagv
k0x03f23210 939 1 939 0 0 00001ef8 00000000 NULL
in pagv
k0x07836210 940 939 940 0 0 00001ef8 00000000 NULL
in pagv
k0x03f66210 946 1 946 0 0 00084003 00000000 NULL
in pagv
k0x04920210 965 1 965 0 0 00084003 00000000 NULL
in pagv
k0x07466210 970 1 970 0 0 00000000 00000000 NULL
in pagv ctty exec
k0x03d56210 973 946 946 0 0 00084003 00000000 NULL
in pagv exec
k0x07acc210 977 946 946 0 -15 20084003 00000000 NULL
in pagv exec
k0x05695210 979 946 946 0 0 00084003 00000000 NULL
in pagv exec
_kdbx_proc_end:
Audit subsystem disabled
No audit data to be saved
#
_crash_data_collection_finished:
----------------------------------------------------------------------
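As a sanity check on the dump: the `tv_sec` fields of `_crashtime` and `_boottime` are seconds since the Unix epoch (UTC). The short Python sketch below converts them to Pacific Standard Time (UTC-8, assumed because the dump prints PST timestamps) and reproduces the `_uptime` line.

```python
from datetime import datetime, timedelta, timezone

# tv_sec values copied from the _crashtime and _boottime structs in the dump.
CRASH_SEC = 854629801
BOOT_SEC = 854589000

# Pacific Standard Time, assumed because the dump prints PST timestamps.
PST = timezone(timedelta(hours=-8), "PST")


def to_pst(tv_sec):
    """Convert a Unix timestamp in seconds to a PST datetime."""
    return datetime.fromtimestamp(tv_sec, tz=PST)


def uptime_hours(boot_sec, crash_sec):
    """Hours between boot and crash, rounded to two decimals like _uptime."""
    return round((crash_sec - boot_sec) / 3600, 2)


if __name__ == "__main__":
    fmt = "%a %b %d %H:%M:%S %Y"
    print("boot  :", to_pst(BOOT_SEC).strftime(fmt))
    print("crash :", to_pst(CRASH_SEC).strftime(fmt))
    print("uptime:", uptime_hours(BOOT_SEC, CRASH_SEC), "hours")
```

This prints a boot time of Wed Jan 29 17:50:00 1997 and a crash time of Thu Jan 30 05:10:01 1997, matching the `Boot-time` and `Time` lines in the kdbx summary, and an uptime of 11.33 hours (40801 seconds).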
8660.1. "please try CANASTA Mail Server ..." by HAN::HALLE (Volker Halle MCS @HAO DTN 863-5216) Thu Jan 30 1997 19:54
Before posting a crash-data file in Notes, PLEASE try the CANASTA Mail
Server to find out whether this is a known problem.
To learn how to use the CANASTA Mail Server, just send mail to
[email protected]
with the subject line: HELP
Volker.
8660.2. by SMURF::MENNER (it's just a box of Pax..) Thu Jan 30 1997 21:18
What version of DECnet OSI is running? If it's not V3.2B, upgrade!
8660.3. "thanks" by TROOA::16.154.72.29::small () Fri Jan 31 1997 11:43
Thanks for the reference to CANASTA; I'll start that process.
As for DECnet OSI, I'm not at the customer site to check today. It was
initially installed last May, and last summer I also installed some
patches for it because it was causing the system to crash. Those patches
seemed to do the trick, since the system ran fine until about November.
Anyway, I'll check the versions of everything, since it's about time they
upgraded the system to V3.2G of Digital UNIX, provided all those
Polycenter products are supported at that rev level.
Thanks for the notes.
Stephen Small
8660.4. by netrix.lkg.dec.com::thomas (The Code Warrior) Fri Jan 31 1997 12:50
Since 3.2B came out after that, you need to upgrade DECnet/OSI.
8660.5. "Output from Canasta / Machinechk" by TROOA::16.154.72.8::small () Tue Feb 04 1997 14:16
Hi,
I checked the DECnet subsets with setld; they are all *321 versions. (This
was taken from some hardcopy setld output I kept in my files.) Is this
3.2B?
I ran the crash-data file through Canasta. It asked me to get the machinechk
tools from mvblab::sable, so I brought the AlphaServer 1000 version over and
ran the query through it.
Here are the replies:
> biu_stat = 254
> Bit 2 is set - Tag Address Parity Error
> The Ev4 requested an external cycle
> the cycle is being performed = write block
> bit 11 is clear - Dcache fill reference
> the failing quadword is = 0
> epic_dcsr = 0x801e0019
> coma_edsr = 0xfffa000
> mchk_code = 0x0
> The error code entered above has the following meaning
> the Pal error code entered is unknown
>
> Type C to continue: C
> Replace the CPU CARD
>
> Checking for multiple errors in the registers
> Type C to continue:
(returns to main menu)
This tells me that it is a hardware problem and that I should replace the
CPU card in the system. Thanks for all your help; if a CPU replacement
doesn't solve the problem, I'll let you know.
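The machinechk verdict rests on a few bits in `biu_stat`, which the dump logs as `biu_stat = 0x254`. The minimal Python sketch below tests them by hand. Bit 2 (tag address parity error) and bit 11 (clear means a Dcache fill reference) follow the tool output quoted above. The positions of the cycle-type field (bits 6:4, with code 5 meaning write block) and the failing-quadword field (bits 13:12) are assumptions, chosen to agree with the tool's output; check them against the CPU's hardware reference before relying on them.

```python
BIU_STAT = 0x254  # value from the machine check logout in the dump


def bit(value, n):
    """Return True if bit n of value is set."""
    return (value >> n) & 1 == 1


def field(value, lo, width):
    """Extract a width-bit field whose lowest bit is bit lo."""
    return (value >> lo) & ((1 << width) - 1)


# Bit meanings as reported by the machinechk tool output quoted above.
tag_parity_error = bit(BIU_STAT, 2)   # "Tag Address Parity Error"
bit11_set = bit(BIU_STAT, 11)         # clear => "Dcache fill reference"

# ASSUMED field layout: cycle type in bits 6:4, failing quadword in bits 13:12.
CYCLE_NAMES = {5: "write block"}      # only the code seen in this dump
cycle = field(BIU_STAT, 4, 3)
quadword = field(BIU_STAT, 12, 2)

if __name__ == "__main__":
    print("tag address parity error:", tag_parity_error)
    print("Dcache fill reference" if not bit11_set else "bit 11 set (not a Dcache fill)")
    print("cycle:", CYCLE_NAMES.get(cycle, f"code {cycle}"))
    print("failing quadword:", quadword)
```

Under those assumptions, 0x254 decodes to a tag address parity error on a write block cycle, Dcache fill, quadword 0, which is the same picture the tool reported.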
PS. I will actually be recommending that the customer upgrade all the
software they are running to current release levels, once I figure out what
is supported at what release, i.e. System Watchdog with Polycenter Scheduler
with X500 with X400 with DECnet OSI with Digital UNIX 3.? with Netview with
NSR with ......... Do we have a software release matrix somewhere that could
help me sort all this out? Then it's firmware time!!!
The fun continues. Thanks again,
SS
8660.6. "321 = 3.2a" by RHETT::MOORE () Tue Feb 04 1997 16:06
321 is 3.2a
325 is 3.2b
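Putting 8660.2, 8660.4 and this reply together: the numeric suffix on the DECnet/OSI subset names encodes the version, and the advice is to be at V3.2B or later. Here is a small Python sketch of that check. Only the two suffix/version pairs given here are known, the assumption that a larger suffix means a later release is inferred from just those two, and the subset name in the example (`DNABASE321`) is hypothetical.

```python
import re

# Suffix-to-version pairs stated in this thread; anything else is unknown.
SUFFIX_VERSION = {"321": "V3.2A", "325": "V3.2B"}
MINIMUM_SUFFIX = 325  # V3.2B, the level recommended in 8660.2 and 8660.4


def suffix_of(subset_name):
    """Return the trailing 3-digit suffix of a setld subset name, or None."""
    m = re.search(r"(\d{3})$", subset_name)
    return m.group(1) if m else None


def subset_version(subset_name):
    """Map a subset name to a DECnet/OSI version, if the suffix is known."""
    suffix = suffix_of(subset_name)
    return SUFFIX_VERSION.get(suffix) if suffix else None


def needs_upgrade(subset_name):
    """True if the suffix is below 325 (assumes suffixes grow by release)."""
    suffix = suffix_of(subset_name)
    if suffix is None:
        raise ValueError(f"no version suffix in {subset_name!r}")
    return int(suffix) < MINIMUM_SUFFIX


if __name__ == "__main__":
    name = "DNABASE321"  # hypothetical subset name ending in *321
    print(name, subset_version(name), "upgrade" if needs_upgrade(name) else "ok")
```

For the *321 subsets reported in 8660.5, this flags V3.2A as below V3.2B, i.e. the DECnet/OSI upgrade recommended earlier in the thread.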