T.R | Title | User | Personal Name | Date | Lines |
---|
577.1 | try this | MILORD::BISHOP | The punishment that brought us peace was upon Him | Fri May 09 1997 10:30 | 22 |
| What version of VMS?
Set AUTO_ACTION to RESTART and set DUMPSTYLE to 3 (I want to see how
far into BUGCHECK it gets before the second KSTAKNV occurs.)
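A sketch of one way to set these, assuming the standard SRM console and
SYSGEN (check the parameter help on your version before relying on it).
DUMPSTYLE 3 sets bit 0 (selective dump) and bit 1 (full console output):

```dcl
>>> SET AUTO_ACTION RESTART

$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE CURRENT
SYSGEN> SET DUMPSTYLE 3
SYSGEN> WRITE CURRENT
SYSGEN> EXIT
```

Adding DUMPSTYLE = 3 to SYS$SYSTEM:MODPARAMS.DAT as well should keep a
later AUTOGEN run from resetting the value.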
And use the following hack to force the KSTAKNV...
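        .psect  data,rd,noexe,wrt
zero:   .long   0                       ; empty argument list (count = 0)
        .psect  code,rd,exe,nowrt
foo::   .call_entry
        pushab  zero                    ; arglst: address of the empty list
        pushab  bar                     ; routin: routine to run in kernel mode
        calls   #2,sys$cmkrnl           ; needs CMKRNL privilege
        movl    #ss$_normal,r0
        ret
bar::   .call_entry
        calls   #0,bar                  ; recurse until the kernel stack overflows
        ret                             ; never reached
        .end    foo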
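To build and run it on the Alpha, something like the following should
do, assuming the source is saved as FOO.MAR (a hypothetical file name)
and the account is authorized for CMKRNL. It will crash the system, so
only run it in a test window:

```dcl
$ MACRO/MIGRATION FOO.MAR
$ LINK FOO
$ SET PROCESS/PRIVILEGES=CMKRNL
$ RUN FOO
```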
- Richard.
|
577.2 | thanks, sounds like a good idea.... | GIDDAY::FLAWN | | Fri May 09 1997 10:34 | 9 |
| Sorry, it's 6.2-1H3. Yes, setting DUMPSTYLE to 3 sounds good as we'll
get more console output - this was discussed briefly this afternoon. The
customer may go back to 1GB for a bit (though they really need the
extra memory). Thanks for the code idea - I was thinking of just
continually increasing KSP but doing repeated calls to overflow the
stack looks much better and more realistic.
Regards and thanks,
Dave.
|
577.3 | Questions, Suggestions... | XDELTA::HOFFMAN | Steve, OpenVMS Engineering | Fri May 09 1997 10:50 | 22 |
|
What version of firmware is in use, what version of OpenVMS is in use,
and what are the settings of the various console environment variables,
including BOOT_RESET? Are you getting any odd messages during a system
bootstrap -- messages possibly associated with a too-low ERLBUFFERPAGES
-- or anything odd in the error logs? Is shadowing (with the current
ECO) and/or the V7.1 compatibility kit (with the current ECO) in use?
What is the output from the commands:
>>> SHOW FRU
>>> INFO 1
>>> INFO 2
>>> INFO 3
>>> INFO 4
There is a known RMS problem that can crop up on some V6.2-vintage
systems, see the ALPRMSnn_nnn ECO kit for details.
Also note, the AlphaServer 4100 (Rawhide) series conference is located at
MVBLAB::ALPHASERVER_4100.
|
577.4 | thanks, some good ideas I hadn't been into | GIDDAY::FLAWN | | Fri May 09 1997 11:47 | 32 |
| Thanks Steve,
Firmware is latest, 4.8-6, OpenVMS 6.2-1H3. BOOT_RESET is OFF.
I'm not sure which way I should go on that one given what we're
seeing?
Yes - this system did not have ERLBUFFERPAGES set up right - I have now
corrected that - the error log was, as I understand it, writing time
stamps but I think we need this right to ensure we get error info. It
is now.
Not sure on VOLSHAD - hadn't occurred to me. So you're suggesting
ALPSHAD as a precaution? Also sounds good.
Thanks for the RMS one - that's a mandatory so should be recommended.
I've found a possible KRNLSTAKNV from NETACP which may be relevant.
I can get the INFO/SHOW FRU output (it's been captured).
I'm not sure what registers to look at for this sort of thing (right
now I only have hardcopy).
The system has a DEFPA (FDDI), 2 x KZPDA, 1 x CIPCA, S3 VGA, KFPSA.
Same fault even if we move the DEFPA to the other PCI bus or remove the
VGA card.....
I've run down the 4100 track but we think it's software. We don't know
why it doesn't dump. Can't find any h/w issues.
Regards and thanks,
Dave.
|
577.5 | see also note 589 in MVBLAB::ALPHASERVER_4100 | GIDDAY::FLAWN | | Fri May 09 1997 12:05 | 4 |
| Sorry, I forgot (that's what happens at 1am!): note 589 in the 4100
conference explores this and I've been in contact with 4100 eng.
Sorry - this wasn't really a cross post but I should have
mentioned it.
|
577.6 | First Step... | XDELTA::HOFFMAN | Steve, OpenVMS Engineering | Fri May 09 1997 12:52 | 3 |
|
Turn BOOT_RESET *on*.
|
577.7 | | UTRTSC::utoras-198-48-95.uto.dec.com::JurVanDerBurg | Change mode to Panic! | Tue May 13 1997 02:11 | 9 |
| The kernel stack not valid crash with RMS occurs when a kernel mode caller
(netacp for example) calls RMS while there's heavy kernel stack usage
(for example from host based raid). It took me a while to find that one.
Increasing kernelstackpages does not help in that case, but you should be
able to get a valid crashdump. If it loops then I'm starting to think about
a hardware problem.
Jur.
|
577.8 | thanks, yes, could be hardware or software.... | GIDDAY::FLAWN | | Tue May 13 1997 05:51 | 25 |
| Thanks,
Yes, I agree the symptom does look like a hardware problem but I can't
see anything we haven't covered there unless it's some kind of rev
level incompatibility. We've now applied the RMS kit along with a
DECnet Phase IV ECO. It looks like DUMPSTYLE was already set to 3; I
must have missed that before, but the parameter output I have shows we
get this same scenario with that DUMPSTYLE value.
If the problem is hardware it's very specific - the kernel stack not
valid halts are the only problem we appear to have. When the customer
returns to 2GB memory (which they need to do soon, but may not be for
about 10 days yet) if the problem recurs I'm probably going to have to
escalate it as an OpenVMS problem while at the same time (if the
customer will take the hits) trying more hardware by using 1GB modules
and, if necessary, subbing the whole machine with a 2100.
We also see this same problem with 1.5 GB, so I suppose that's some
sort of pointer down the hardware side.... using the extra memory
slots... The 4100 motherboard does handle row/column address logic for
memory but I can't see how this could reliably get us a kernel stack
not valid..... and nothing else going wrong.
Regards,
Dave.
|