
Conference turris::digital_unix

Title:DIGITAL UNIX(FORMERLY KNOWN AS DEC OSF/1)
Notice:Welcome to the Digital UNIX Conference
Moderator:SMURF::DENHAM
Created:Thu Mar 16 1995
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:10068
Total number of notes:35879

9146.0. "3.2C simple_lock panic after ttyclose()" by NETRIX::"[email protected]" (Jason Orendorf) Wed Mar 12 1997 15:23

Customer at a secure site (vmcore.X and vmunix.X are not available) has a
2100 4/275 with 3 CPUs running DU 3.2C with all patches applied (latest
dupatch kit).  When more than one CPU is enabled, the system panics
immediately after the X login screen is displayed, with "simple_lock: time
limit exceeded".  The crash-data shows:

simple_lock: time limit exceeded
  pc of caller:		0xfffffc000042a018
  lock address:		0xfffffc0047df51e0
  current lock state:	0x000000000000001f (cpu=0,pc=0xfffffc000000001c,busy)

Stack from the panicking CPU (CPU 0):
0 boot
1 panic
2 simple_lock_fault
3 simple_lock_time_violation
4 ttyclose
5 wsclose
6 gpcclose
7 abc_cnclose
8 cnclose
9 speclose
10 clearalias
11 revoke
12 syscall
13 _Xsyscall

With X disabled, the system comes up fine with no panic.  Re-enabling X and
disabling CPU 1 and CPU 2 also eliminates the problem, so it seems doubtful
that the graphics card is at fault.

Any suggestions?

TIA

-Jason

p.s.:  Customer did manage to fax me 2 crash-data files, so if more info from
those will help, I can provide it.

[Posted by WWW Notes gateway]

9146.1. by SMURF::DENHAM (Digital UNIX Kernel) Wed Mar 12 1997 21:14 (9 lines)

    The current lock state values look bogus. If this is easy to reproduce,
    have them reboot in lockmode 4 so we can get more data about
    who's got what lock. With a lock timeout like this, either
    someone else has the lock and is "stuck" holding it, or forgot
    to unlock it, or the lock is corrupted or is no longer
    a lock. We'd need to be able to see the other threads to
    see whether we can find another owner of the lock.
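
To make the first failure mode concrete, here is a minimal user-space sketch of a lock with an acquisition time limit. It is not the Digital UNIX simple_lock code; the class `TimedLock`, the `owner` field, and the "cpu0"/"cpu1" labels are hypothetical names chosen for illustration. It only shows why a holder that never unlocks turns into a "time limit exceeded" report for the next caller, and why knowing the owner matters for diagnosis.

```python
import threading


class TimedLock:
    """User-space analogue of a lock with an acquisition time limit (illustrative only)."""

    def __init__(self, limit_s):
        self._lock = threading.Lock()
        self._limit_s = limit_s
        self.owner = None  # recorded so a timeout can report who holds the lock

    def acquire(self, who):
        # Give up after the time limit instead of waiting forever.
        if not self._lock.acquire(timeout=self._limit_s):
            raise TimeoutError(
                f"time limit exceeded: {who} waited on lock held by {self.owner}"
            )
        self.owner = who

    def release(self):
        self.owner = None
        self._lock.release()


def demo():
    lock = TimedLock(limit_s=0.1)
    lock.acquire("cpu1")      # the holder that "forgot to unlock"
    try:
        lock.acquire("cpu0")  # second caller runs into the time limit
    except TimeoutError as exc:
        return str(exc)
    return "acquired"
```

Without the recorded owner, the second caller could only report that it timed out, which is roughly the situation in the crash-data above, where the lock state values look bogus.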
    
    
9146.2. "Crashes changed after hardware mods" by NETRIX::"[email protected]" (Jason Orendorf) Wed Mar 26 1997 15:05 (137 lines)

For some reason, the customer hasn't been able to set lockmode to 4 to get
more info as requested.  Instead, an FE made some drastic hardware changes
(replaced backplane, CPUs, and memory).  Since the hardware changes, the
simple_lock panics have stopped, but now the machine is panicking with
several different "kernel memory fault"s.  Maybe these will help?

To review:  this is a secure site, and the system crashes at X startup with
multiple CPUs enabled *most* of the time.  If it doesn't crash immediately,
it stays up and runs just fine, but getting it to stay up requires
multiple boot attempts/crashes.  The most recent setld patch kit for 3.2C
has been applied.

CANASTA found a match for crash-data.70 below (SRQ C960228-5443) but the
case was closed due to lack of customer interest - no solution provided.

All suggestions appreciated.

-Jason

crash-data.68:
trap: invalid memory read access from kernel mode
    faulting virtual address:     0x000000060000000c
    pc of faulting instruction:   0xfffffc000042a06c
    ra contents at time of fault: 0xfffffc000042a068
    sp contents at time of fault: 0xffffffff88fbf380
panic (cpu 0): kernel memory fault
Begin Trace for machine_slot[paniccpu].cpu_panic_thread: 
>  0 boot(0x0, 0x0, 0xfffffc00005acc90, 0x4f20, 0x0)
["../../../../src/kernel/arch/alpha/machdep.c":1744, 0xfffffc0000466650]
   1 panic(s = 0xfffffc00005acc90 = "kernel memory fault")
["../../../../src/kernel/bsd/subr_prf.c":757, 0xfffffc00004267b4]
   2 trap() ["../../../../src/kernel/arch/alpha/trap.c":1243,
0xfffffc0000475184]
   3 _XentMM(0x0, 0xfffffc000042a06c, 0xfffffc00005ee9b0, 0x600000000, 0x35)
["../../../../src/kernel/arch/alpha/locore.s":1307, 0xfffffc0000463744]
   4 ttyclose(0x0, 0x0, 0xfffffc0007f73760, 0x800100000000, 0x600000000)
["../../../../src/kernel/bsd/tty.c":1380, 0xfffffc000042a068]
   5 wsclose(0x0, 0x0, 0xfffffc00056a2e00, 0x1, 0x0)
["../../../../src/kernel/io/dec/ws/ws_device.c":862, 0xfffffc000050d910]
   6 gpcclose(0x0, 0x0, 0x0, 0xfffffc000059b990, 0xfffffc0000219540)
["../../../../src/kernel/io/dec/eisa/gpc.c":716, 0xfffffc00004ddfa0]
   7 abc_cnclose(0x0, 0x0, 0x10000000000, 0xfffffc0000219540,
0xfffffc0000000000)
["../../../../src/kernel/arch/alpha/hal/dec2000_cons.c":175,
0xfffffc0000485480]
   8 cnclose(0x10000000000, 0xfffffc0000219540, 0xfffffc0000000000, 0x0,
0xfffffc0000432ad8) ["../../../../src/kernel/arch/alpha/hal/cons_sw.c":175,
0xfffffc0000481558]
   9 speclose(0xfffffc0000000000, 0x0, 0x2000, 0xffffffffffffffff, 0x0)
["../../../../src/kernel/vfs/spec_vnops.c":2306, 0xfffffc0000432ad4]
  10 clearalias(0xfffffc00056a2e00, 0x0, 0xfffffc0007f7a100,
0xfffffc00078ba000, 0xfffffc00056a2e00)
["../../../../src/kernel/vfs/spec_vnops.c":1030, 0xfffffc00004306f0]
  11 revoke(0xfffffc0002ecd210, 0xffffffff88fbf8c8, 0xffffffff88fbf8b8, 0x0,
0xfffffc000046362c) ["../../../../src/kernel/vfs/vfs_syscalls.c":3278,
0xfffffc000043f934]
  12 syscall(0xffffffff88fbc000, 0xfffffc0000200100, 0x0, 0x14000a7f8, 0x38)
["../../../../src/kernel/arch/alpha/syscall_trap.c":538, 0xfffffc0000474004]
  13 _Xsyscall(0x8, 0x120012658, 0x14000a540, 0x11fffff90, 0x192)
["../../../../src/kernel/arch/alpha/locore.s":1094, 0xfffffc0000463534]
End Trace for machine_slot[paniccpu].cpu_panic_thread: 
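
As a cross-check against the original simple_lock panic, the sketch below pulls the function names out of the crash-data.68 trace, compares them with the stack from the first panic, and measures the distance between the two pcs. The regular expression and the helper names `frame_names` and `common_tail` are illustrative, not part of any crash tooling, and only the frame-header lines of the trace are reproduced (plus one continuation line to show that it is skipped).

```python
import re

# A frame header is an optional ">" marker, the frame number, then "name(".
FRAME_RE = re.compile(r"^>?\s*(\d+)\s+(\w+)\(")

TRACE_68 = '''
>  0 boot(0x0, 0x0, 0xfffffc00005acc90, 0x4f20, 0x0)
   1 panic(s = 0xfffffc00005acc90 = "kernel memory fault")
   2 trap() ["../../../../src/kernel/arch/alpha/trap.c":1243,
0xfffffc0000475184]
   3 _XentMM(0x0, 0xfffffc000042a06c, 0xfffffc00005ee9b0, 0x600000000, 0x35)
   4 ttyclose(0x0, 0x0, 0xfffffc0007f73760, 0x800100000000, 0x600000000)
   5 wsclose(0x0, 0x0, 0xfffffc00056a2e00, 0x1, 0x0)
   6 gpcclose(0x0, 0x0, 0x0, 0xfffffc000059b990, 0xfffffc0000219540)
   7 abc_cnclose(0x0, 0x0, 0x10000000000, 0xfffffc0000219540,
   8 cnclose(0x10000000000, 0xfffffc0000219540, 0xfffffc0000000000, 0x0,
   9 speclose(0xfffffc0000000000, 0x0, 0x2000, 0xffffffffffffffff, 0x0)
  10 clearalias(0xfffffc00056a2e00, 0x0, 0xfffffc0007f7a100,
  11 revoke(0xfffffc0002ecd210, 0xffffffff88fbf8c8, 0xffffffff88fbf8b8, 0x0,
  12 syscall(0xffffffff88fbc000, 0xfffffc0000200100, 0x0, 0x14000a7f8, 0x38)
  13 _Xsyscall(0x8, 0x120012658, 0x14000a540, 0x11fffff90, 0x192)
'''

# Stack from the first (simple_lock) panic in 9146.0, frames 0 through 13.
FIRST_PANIC = [
    "boot", "panic", "simple_lock_fault", "simple_lock_time_violation",
    "ttyclose", "wsclose", "gpcclose", "abc_cnclose", "cnclose",
    "speclose", "clearalias", "revoke", "syscall", "_Xsyscall",
]


def frame_names(trace_text):
    """Return function names from frame-header lines, skipping continuation lines."""
    names = []
    for line in trace_text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            names.append(m.group(2))
    return names


def common_tail(a, b):
    """Return the longest run of frames shared at the outer (syscall) end of both stacks."""
    n = 0
    while n < len(a) and n < len(b) and a[-1 - n] == b[-1 - n]:
        n += 1
    return a[len(a) - n:]


shared = common_tail(FIRST_PANIC, frame_names(TRACE_68))

lock_caller_pc = 0xfffffc000042a018  # "pc of caller" in the simple_lock panic
fault_pc = 0xfffffc000042a06c        # "pc of faulting instruction" in crash-data.68
pc_delta = fault_pc - lock_caller_pc
```

Here `shared` runs from ttyclose down to _Xsyscall, and `pc_delta` is 0x54 bytes, or 21 four-byte Alpha instructions. That suggests, but does not prove, that the earlier lock timeout and this memory fault occur in the same stretch of ttyclose().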

crash-data.70:
trap: invalid memory read access from kernel mode
    faulting virtual address:     0x0000000000000008
    pc of faulting instruction:   0xfffffc000041100c
    ra contents at time of fault: 0xfffffc00004105c8
    sp contents at time of fault: 0xffffffffa9533630
panic (cpu 0): kernel memory fault
Begin Trace for machine_slot[paniccpu].cpu_panic_thread: 
>  0 boot(0x0, 0x0, 0xfffffc00005acc90, 0x4f20, 0x0)
["../../../../src/kernel/arch/alpha/machdep.c":1744, 0xfffffc0000466650]
   1 panic(s = 0xfffffc00005acc90 = "kernel memory fault")
["../../../../src/kernel/bsd/subr_prf.c":757, 0xfffffc00004267b4]
   2 trap() ["../../../../src/kernel/arch/alpha/trap.c":1243,
0xfffffc0000475184]
   3 _XentMM(0x0, 0xfffffc000041100c, 0xfffffc00005ee9b0, 0xfffffc000e36bd00,
0xfffffc000715d688) ["../../../../src/kernel/arch/alpha/locore.s":1307,
0xfffffc0000463744]
   4 closef(0xfffffc00004105c8, 0xffffffffa9530000, 0xffffffffa9533708,
0xfffffc0037050f30, 0xfffffc0000379cc0)
["../../../../src/kernel/bsd/kern_descrip.c":1378, 0xfffffc0000411008]
   5 close(0xfffffc000715d210, 0xffffffffa95338c8, 0xffffffffa95338b8,
0xfffffc0000474b7c, 0x0) ["../../../../src/kernel/bsd/kern_descrip.c":1071,
0xfffffc00004105c4]
   6 syscall(0x3, 0x3ff8000edcc, 0x0, 0x0, 0x6)
["../../../../src/kernel/arch/alpha/syscall_trap.c":534, 0xfffffc0000473fd8]
   7 _Xsyscall(0x8, 0x3ff800daee8, 0x3ffc0091560, 0x0, 0x1)
["../../../../src/kernel/arch/alpha/locore.s":1094, 0xfffffc0000463534]
End Trace for machine_slot[paniccpu].cpu_panic_thread: 

crash-data.71:
trap: invalid memory write access from kernel mode
    faulting virtual address:     0x000000000042a058
    pc of faulting instruction:   0xfffffc0000457788
    ra contents at time of fault: 0xfffffc0000339144
    sp contents at time of fault: 0xffffffffa0447610
panic (cpu 1): kernel memory fault
Begin Trace for machine_slot[paniccpu].cpu_panic_thread: 
>  0 stop_secondary_cpu() ["../../../../src/kernel/arch/alpha/cpu.c":376,
0xfffffc000045fd68]
   1 panic(s = 0xfffffc00005aa1d8 = "event_timeout: panic request")
["../../../../src/kernel/bsd/subr_prf.c":669, 0xfffffc00004265ec]
   2 event_timeout(func = 0xfffffc0000426840, arg = 0xfffffc0000655c18,
timeout = 0x0) ["../../../../src/kernel/arch/alpha/cpu.c":837,
0xfffffc0000460b18]
   3 xcpu_puts(s = 0xffffffffa0447158, prfbufp = 0xfffffc0000655c18)
["../../../../src/kernel/bsd/subr_prf.c":810, 0xfffffc00004268a4]
   4 printf(va_alist = 0xfffffc00005a6350)
["../../../../src/kernel/bsd/subr_prf.c":355, 0xfffffc0000425bf4]
   5 panic(s = 0xfffffc00005acc90 = "kernel memory fault")
["../../../../src/kernel/bsd/subr_prf.c":719, 0xfffffc000042675c]
   6 trap() ["../../../../src/kernel/arch/alpha/trap.c":1243,
0xfffffc0000475184]
   7 _XentMM(0x4, 0xfffffc0000457788, 0xfffffc00005ee9b0, 0xfffffc0005ae5b80,
0xfffffc0000651908) ["../../../../src/kernel/arch/alpha/locore.s":1307,
0xfffffc0000463744]
   8 remque(0x4, 0xfffffc0000457788, 0xfffffc00005ee9b0, 0xfffffc0005ae5b80,
0xfffffc0000651908) ["../../../../src/kernel/kern/queue.c":165,
0xfffffc0000457784]
   9 find_to(0xb96, 0x1, 0xfffffc0000200d50, 0xfffffc0000200858,
0xfffffc00003390c0) ["../../../../src/kernel/streams/str_env.c":404,
0xfffffc0000339140]
  10 str_timeout(0xfffffc0005ae5b80, 0x0, 0xfffffc000040ce94,
0xffffffffa0444000, 0x0) ["../../../../src/kernel/streams/str_env.c":384,
0xfffffc000033909c]
  11 softclock_scan(usermode = 0x0)
["../../../../src/kernel/bsd/kern_clock.c":1027, 0xfffffc000040d264]
  12 hardclock(pc = 0xfffffc000045a74c = "p^^7\242^A", ps = 0x0)
["../../../../src/kernel/bsd/kern_clock.c":851, 0xfffffc000040ce90]
  13 _XentInt(0x0, 0xfffffc000045a74c, 0xfffffc00005ee9b0, 0x3fff, 0x45b3b0)
["../../../../src/kernel/arch/alpha/locore.s":917, 0xfffffc0000463374]
  14 idle_thread() ["../../../../src/kernel/kern/sched_prim.c":3042,
0xfffffc000045a748]
End Trace for machine_slot[paniccpu].cpu_panic_thread: 
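
The three faulting virtual addresses can also be taken apart with simple arithmetic. The sketch below splits each one into its upper and lower 32 bits and labels the pattern. The labels, the `describe` helper, and the 0x1000 "small offset" threshold are assumptions made for illustration; whether any of these values was really a structure pointer being dereferenced cannot be confirmed from the crash-data alone.

```python
# Prefix seen on every kernel text and data address in the traces above.
KERNEL_PREFIX = 0xfffffc0000000000

FAULTS = {
    "crash-data.68": 0x000000060000000c,
    "crash-data.70": 0x0000000000000008,
    "crash-data.71": 0x000000000042a058,
}


def describe(va, small=0x1000):
    """Split a faulting address into (upper part, low 32 bits, pattern label)."""
    base = va & ~0xffffffff  # everything above the low 32 bits
    low = va & 0xffffffff
    if va < small:
        kind = "null-plus-offset"   # looks like a field access through a zero pointer
    elif base == 0:
        kind = "low-32-bits-only"   # looks like a kernel address with its upper half lost
    else:
        kind = "bogus-high-base"    # small offset from a non-kernel upper half
    return base, low, kind


report = {name: describe(va) for name, va in FAULTS.items()}

# crash-data.68: the upper part equals the 0x600000000 value passed to ttyclose.
offset_68 = FAULTS["crash-data.68"] - 0x600000000

# crash-data.71: what the address would be with the kernel prefix restored.
restored_71 = KERNEL_PREFIX | FAULTS["crash-data.71"]
```

For crash-data.68 the fault is 0xc bytes past the 0x600000000 value that appears in the ttyclose and _XentMM arguments. For crash-data.70 it is offset 8 from zero. For crash-data.71 the restored value 0xfffffc000042a058 lies between the two ttyclose pcs reported earlier (0xfffffc000042a018 and 0xfffffc000042a06c), which may be coincidence but would be consistent with a corrupted or truncated pointer.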

[Posted by WWW Notes gateway]