T.R | Title | User | Personal Name | Date | Lines |
---|
193.1 | DG He say this !! | KERNEL::ADAMS | Brian Adams CSC-Viables '833-3026 | Sun Nov 06 1994 14:04 | 62 |
|
About the training, I intended to do a few more talks, the most
important one being the calling standard, as that is the thing that has changed
the most. The plan was for me to do this, IO internals and maybe some others, and
for Ian Megarity to do one on memory management and maybe some others.
(It would have taken a while for both of us to prepare these
as they are really complicated on AXP.)
However, when Steve left suddenly just when I was starting to teach
this stuff, Brian said that we would have to can it for a few months due to not
enough bums on seats! I doubt if anything has changed! Plus Ian and I have
been quite busy recently, as has everyone else.
What I propose as a workaround is to create a notes file. When we close
an AXP call, try to write it up with a bit of detail of what's going on and
put the relevant parts on the call. Folks can ask questions and add comments
(as replies to the note) and hopefully explain things if they are not clear.
Also, if you guys can archive the call's directory contents (dump, plus any temporary
files) onto ta90 (use the note no. as the label), then if you want to check
the dump at a later stage or use it as material for a talk, it will be easy
to get it back.
You guys let me know if this is a good idea; if so, put this message
in the notes file to see what the others think. I will also copy Ian, Geoff
and Brian to see if they want to put their calls in this notes file.
(The easiest way to do this would be to have a queue, say sys_archive.
Any calls worthy of this treatment can go into that queue; if anyone gets time
they can archive the directory contents, delete them and return the call
to whoever worked on it. Better this way, as it makes it easier for ccd should
the call be re-opened.)
About Ruth Goldenberg, do you mean the internals book or bugcheck.mem?
I think there is an AXP version of bugcheck.mem that you guys should
have in the library; if you haven't got it let me know, I will print a copy off.
Note: the stack patterns for the bugchecks look quite different to VAX,
but that is due more to the change in calling standard than anything else.
If you mean the internals book, there are AXP internals books around. I am
sure you guys have some; if not, I know Ian and maybe Geoff have copies (I don't).
Most of the time the VAX internals are near enough; you can find out the gory
details/differences from the sources. If you haven't got a hard copy let me
know - I can print some off from some PostScript files.
(Steve copied some over a while back, but better to check with me
first as I will check to see if there are any later versions in the States.)
Note that in many areas the VAX internals principles are much the
same; the details are different due to the difference in architecture, calling
standard, size of data items, naming conventions etc. This is why I started off
teaching the architecture and was going to do the calling standard. Once you
know that stuff everything slots into place a bit better.
If we have some time when I can do some more talks let me know, I
will need some time to prepare though.
dg.
|
193.2 | -yes | KERNEL::ANTHONY | | Mon Nov 07 1994 23:09 | 29 |
|
this is interesting!!
I sat in on a couple of Dave's talks in the summer (seems like
a long time ago!) and honestly remember very little...
I think the way to learn this stuff is by example.. It's so
much easier to have something to refer back to.. go through
a worked example and then apply the knowledge to the problem at
hand.
We probably need no more than 2-3 bugcheck examples to start
with, say one each of INVEXCEPTN, SSRVEXCEPT, and PGFIPLHI.
What we need is CLEAR CONCISE analysis, not reams of
data structures that mean nothing unless you are a guru
yourself. Something like the more detailed Stars stuff, but
with more explanation and step-by-step detail.
I still think Dave needs to go ahead and give the talks, but
base the talks on the articles and if necessary re-visit them
at a deeper level another time.
This will need a LOT of work to set up.. DG are you volunteering?
:-)
we can use the systems_tech notes file for this?
Brian
|
193.3 | DG he speaketh again | COMICS::GLEDHILL | | Thu Nov 10 1994 19:26 | 24 |
| I think I must have used reams of data structures in .1 as I don't think it
was clear to anyone but me.
What I thought was that we didn't have time to do any more talks,
whether they were on internals details OR going thru dump files (when I did
the talks before it was to only 2 or 3 folks at a time).
(Well, it was you, Brian, who told me to can the talks until it got less busy.)
So I was suggesting that I (and everyone else closing calls) write up the Alpha
crashes we close and put them in the notes file, and people can read them at
a spare moment or at home etc, and ask questions (via replies to the notes).
If we ever get any time then we can go and do talks on these crashes or
internals or whatever.
I am quite happy to talk as much as you like, but have to balance that against
less time to take calls. As you will have noticed recently I have been taking
more calls from VMS and also taking some holiday over the last few ages (so you
guys probably wouldn't notice).
So what are you saying, is the bum/seat ratio such that we can do talks again?
dg.
|
193.4 | A plea for some practical instruction | KERNEL::BLAND | Norman Bland 833 3797 CSC, Basingstoke | Sat Nov 12 1994 10:47 | 51 |
|
OK, Norman B is joining the debate.
From my perspective, there are some issues which need to be understood.
o - We have varying skill levels in analysing VAX system bugchecks
(forget Alpha systems for the moment). There is a large gulf
between the 'very best' and the 'not so good'. Although we need some
focus on analysing Alpha system bugchecks, I do not think we should
abandon assisting those who are in the 'not so good' category (that
includes me), in improving their skills on VAX system bugchecks.
o - Some time ago, when I and others in the old RDC group were improving
their bugcheck analysis skills, we were placed into the FAST TRACK
group; bugcheck analysis was done (in the main) by the other group
called BUGCHECK ANALYSIS & 9000 (or something similar). Skills were
not totally lost but it did not help when some of the ex FAST TRACK
were placed into the, now, SYSTEMS group.
o - Even if we have sufficient people in the group now (bums on seats),
attending seminars which theorise about OpenVMS AXP is of little
use. Whilst some theory is necessary, what I need (and I believe
some other Systems engineers need) is a practical approach to analysing
bugchecks. When I am analysing an instruction stream within an
AXP bugcheck, when I am considering what data structure I should be
looking at, when I am considering what that structure contains, I
need to do it when actually viewing a 'real' bugcheck, with someone
beside me with the appropriate skills explaining what an
instruction is doing, what commands are available for analysis, the
data structures involved, and why he is using a particular
troubleshooting technique.
o - Whilst the idea of saving bugchecks for later study appears to be
a good idea for improving bugcheck analysis skills, I cannot for
the life of me see how we will get the time to do this without
having a serious impact on the manning levels within the group.
o - The idea of writing up details of bugchecks that have been analysed
and placing them in a notesfile would, I believe, be useful. This
would enable us to ask questions with replies as to 'how' and 'why'
etc.
OK, I am not sure how we do it but I have tried to summarise what
I require to improve my bugcheck skill level. This requires some
theory and a lot of practical work. Different individuals within
the group will have different needs. I need a skilled person to
'teach' me, not to blind me with science. Whilst I may never be as
good as the best within our group, I know that I could be
considerably better at bugcheck analysis given the appropriate
knowledge.
|
193.5 | | COMICS::GLEDHILL | | Sun Nov 13 1994 10:12 | 114 |
| I had a similar conversation with Brian yesterday...
I am probably out of step here, depending on what you mean by analyzing
bugchecks (I guess), but in my opinion there isn't a way to 'analyze bugchecks'.
A bugcheck is just a dump of the system at the time. If you understand how
the system is supposed to work then you look at the dump and see that it
is not doing what you expect etc...
I have never been on a bugcheck analysis course, never sat in with anyone who
analyses them for a living. I just make it up as I go along... (bit like sex,
you can read all the books, but at the end of the day you just get on with it,
use your imagination (and any handy tools lying about)). I think once you try
and pin it down with techniques you lose it.
What I am getting at is that if I could say 'when you get a particular crash
then you do this, that, then the other' it would be easy. Might as well disband
the group and write some software to do it. In my opinion the theory is the
most essential part of it, if you want to get past the diagnosing stage
of the job. Me and probably IM, GJ etc learned that by sitting at home looking
thru the internals books, source code etc as much as anything else.
You need to understand how the system as a whole fits together. When you
look at a crash you see that the system was in a particular place. You ask
yourself questions: how did it get there, should it be there, what else
was going on at the time, what went on before... It's no different to any
other form of troubleshooting.
As far as finding your way through the mechanics of a dump (ie stacks etc)
the most important things to do are know the architecture and the calling
standard as I already said above. Without that you don't really have a chance
to work out what's going on, esp. on AXP. (As a good example of that, one of the
early bugs in AXP VMS was when DECnet used R31 to store some stuff in...)
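For anyone who hasn't met it yet: on Alpha, R31 always reads as zero and anything written to it is thrown away, which is presumably why using it to store something showed up as a bug. A minimal Python sketch of that behaviour follows. The class and function names are made up for illustration; this is a toy model, not VMS or DECnet code.

```python
class AlphaIntRegs:
    """Toy model of the 32 Alpha integer registers R0..R31 (illustrative only)."""

    ZERO_REG = 31              # R31 is hardwired to zero on Alpha
    MASK = (1 << 64) - 1       # registers are 64 bits wide

    def __init__(self):
        self._regs = [0] * 32

    def write(self, n, value):
        if not 0 <= n < 32:
            raise IndexError("no such register: R%d" % n)
        if n == self.ZERO_REG:
            return             # a write to R31 is silently discarded
        self._regs[n] = value & self.MASK

    def read(self, n):
        if not 0 <= n < 32:
            raise IndexError("no such register: R%d" % n)
        return 0 if n == self.ZERO_REG else self._regs[n]


def stash_and_restore(regs, scratch, value):
    """Hypothetical helper: park a value in a scratch register, then read it back."""
    regs.write(scratch, value)
    return regs.read(scratch)


regs = AlphaIntRegs()
print(hex(stash_and_restore(regs, 22, 0x1234)))   # an ordinary register keeps the value
print(hex(stash_and_restore(regs, 31, 0x1234)))   # R31 loses it and reads back as zero
```

When you see R31 in an instruction in a dump, read it as "the constant zero", never as somewhere a value might be kept.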
Once you work out where you are you can look at the sources and work out what's
supposed to be happening, then you compare that with the dump and see what
is really happening. Once you find the discrepancy you're on your way.
Trouble is, if you don't understand how VMS is supposed to work, often the sources
don't make a lot of sense. That was the good thing about the old VMS TCD for
level 7: it made sure that you understood a lot of that stuff, like (e.g.)
the I/O database,
system services and how they are dispatched,
synchronization (mutexes, spinlocks etc.),
memory management (PTEs, PFNs, PFLs etc.),
interrupt and AST dispatching,
and so on.
some more comments...
> o - Even if we have sufficient people in the group now (bums on seats),
> attending seminars which theorise about OpenVMS AXP is of little
> use. Whilst some theory is necessary, what I need (and I believe
> some other Systems engineers need) is a practical approach to analysing
> bugchecks. When I am analysing an instruction stream within an
> AXP bugcheck, when I am considering what data structure I should be
> looking at, when I am considering what that structure contains, I
> need to do it when actually viewing a 'real' bugcheck, with someone
> beside me with the appropriate skills explaining what an
> instruction is doing, what commands are available for analysis, the
> data structures involved, and why he is using a particular
> troubleshooting technique.
This may help, but most bugchecks are different; the danger with this is relying
on copying what someone else did last time and trying to apply it to the next
one.
A lot of analysis is not a one-stop process. Most calls I look at for a while,
get an idea of what's going on, read a couple of things, ask a chap, go on to something
else or down the pub, and come back to it later. I find I work out a lot of stuff
when I am not thinking about it. If you try too hard it makes it hard. This is
why I often say to you chaps I will have a look later. If it was going through
calls already closed, though, that wouldn't be a problem.
A lot of the above you can work out for yourself. Should be able to work out
what the instruction does from the book and what I tried to explain in the
talk. When you find the stuff in the sources that should tell you what
data-structures are supposed to be around, you can have a look at them and
see what's going on. However you will need the background theory to understand
where that bit of code/structure fits in/does etc.
The commands for analysis are mostly listed under help (there are a couple of
undocumented ones). Don't forget there are DCL utilities that you can use to save
time (sort, diff, search for example).
Don't get me wrong, I am not saying practical demonstration would not help.
But if you rely on that alone without using your understanding and imagination
as well that demonstration will probably only be of use on how to solve THAT
bugcheck.
> o - Whilst the idea of saving bugchecks for later study appears to be
> a good idea for improving bugcheck analysis skills, I cannot for
> the life of me see how we will get the time to do this without
> having a serious impact on the manning levels within the group.
Do you mean the time archiving them or looking at them later?? Maybe we could just
keep the customer tape as I suggested in an earlier note. Or automate it using
SLS or something.
> o - The idea of writing up details of bugchecks that have been analysed
> and placing them in a notesfile would, I believe, be useful. This
> would enable us to ask questions with replies as to 'how' and 'why'
> etc.
Not only useful for the people reading it, if you write it up as you go along
I often find it helps the analysis. Once you try explaining it in words you can
notice mistakes/false assumptions and so on. I think we should all do this
whether we put the call in the notesfile at the end or not. (Don't think all
calls need to go in, just those that seem to have educational value.)
I could go on, but think I better go do some work...
dg.
|
193.6 | TCD would help - practical+theory would help | KERNEL::BLAND | Norman Bland 833 3797 CSC, Basingstoke | Sun Nov 13 1994 14:25 | 63 |
|
> depending on what you mean by analyzing bugchecks
I guess the real problem is not understanding the VMS internals well
enough.
> As far as finding your way through the mechanics of a dump (ie stacks etc)
> the most important things to do are know the architecture and the calling
> standard as I already said above.
Don't understand them well enough. Sitting in an office and going through the
theory without having an example/examples to work through, is of little use
to me.
> Trouble is, if you don't understand how VMS is supposed to work, often the sources
> don't make a lot of sense. That was the good thing about the old VMS TCD for
> level 7: it made sure that you understood a lot of that stuff, like (e.g.)
Absolutely. I have attempted to restart TCD with Brian Lindley but lately the
number of people on shift and the number of calls in our queue have made this
unrealistic. My main aim was to do VMS, starting from a level and working up.
The hope was that the learning would help with bugcheck analysis and with other
software-related calls.
> A lot of the above you can work out for yourself. Should be able to work out
> what the instruction does from the book and what I tried to explain in the
> talk.
I do try but often get stuck; if I didn't, I would not be writing this note.
> When you find the stuff in the sources that should tell you what
> data-structures are supposed to be around, you can have a look at them and
> see what's going on.
If you understand the internals well enough.
> However you will need the background theory to understand
> where that bit of code/structure fits in/does etc.
Yes. But please please let us do this in the context of looking either at
'real' bugchecks or at a 'live' system.
> Don't get me wrong, I am not saying practical demonstration would not help.
So why can't we combine some theory with practical?
> Do you mean the time archiving them or looking at them later??
NO. What I meant was the time to review the cases (bugchecks) with someone who
has a good understanding, in order to learn something.
> Not only useful for the people reading it, if you write it up as you go along
This will have to be a new discipline; namely ensuring that relevant information
from the dump is saved and placed into a notesfile.
> I often find it helps the analysis. Once you try explaining it in words you can
> notice mistakes/false assumptions and so on. I think we should all do this
I see this as being something that could be VERY useful.
Norman B
|
193.7 | | COMICS::GLEDHILL | | Sun Nov 13 1994 15:38 | 31 |
| I just had a chat with Norman about this on the phone, but he had to go off
and take a call!
I think that proves the point about lack of time to do this sort of stuff.
I think we will have to make time for me to finish what I started in the
summer. We will have to check with Paul first and then book some time.
It was my intention shortly after going thru the instructions to do a talk on
the calling standard, which would have included REAL stacks - but got canned.
The idea being after the theory to go through a stack printout and work out what
everything was on there. I think this is probably the sort of thing that you
want.
This should help in diagnosing (ie working out how we got to where we are).
To get any further we need to do some internals stuff. As I said to Norm on
the phone, I don't know what the prerequisites are for the internals course, but
what I did was read the architecture and system programming documentation.
This gave me a good enough overview to get something out of the course. (I did
read the first few chapters of the internals book first as well, on the advice
of the great Ian Megarity.)
As we also discussed on the phone, I don't think there are any short cuts; either
we or the company (or both) have got to make time for this.
What about the notes file? Do you want to set one up? Shall I do it, or is
someone else in charge of this sort of thing??
dg.
|
193.8 | how about.. | KERNEL::ANTHONY | | Mon Nov 14 1994 19:52 | 37 |
|
OK, how about we start a three-pronged attack on this?
1 DG sets up an entry in this notesfile for discussion of
one crashdump that we have on the system. (AXP dump)
Dave please choose one which is the most appropriate.
Give a pointer to the dump. AND GIVE US ALL A WEEK TO HAVE
A LOOK AT IT!! (no cheating, don't look at the call update!!)
We can add replies to start the analysis. If we
are way off, DG can give hints. We should all end up understanding
the thought processes needed to go through that PARTICULAR dump.
You should not be afraid of replying and looking a fool if you are
wrong.. this is a learning exercise!!
If worthwhile we start on another dump.. over a period of time, we
will build our expertise, and have a record of analysis to refer
to when we are stuck on a 'real' dump.
2 Dave: create ANOTHER entry for this notesfile that will be used as a
learning tool to understand the AXP calling standard. I see it
firstly as a write-up on the standard by DG, followed by Q&As.
If this is successful, DG chooses another topic and we go through
the same process..
3 THEN after Christmas, when we have reasonable manning, we schedule
DG to run seminars on the calling standard etc. We will have
examples (here) to refer to and have sufficient understanding
before the seminar such that the info DG gives is much more
meaningful..
What does the team think?
we could start this NOW!!!
Brian
|
193.9 | OpenVMS VAX/AXP Internals and Data Structures TBI | KERNEL::BLAND | Norman Bland 833 3797 CSC, Basingstoke | Tue Nov 15 1994 15:03 | 11 |
|
I have not had time to investigate this yet but the following TBIs are
available in TIMATOOLS. I have my fingers crossed.
Norman
6-TBI EY-Q157E-L0-0001 OpenVMS AXP Internals and Data Structures I
6-TBI EY-Q158E-L0-0001 OpenVMS AXP Internals and Data Structures II
16-TBI EY-Q159E-L0-0001 OpenVMS VAX Internals and Data Structures I
16-TBI EY-Q160E-L0-0001 OpenVMS VAX Internals and Data Structures II
|
193.10 | Looks worth a shot. | KERNEL::ADAMS | Brian Adams CSC-Viables '833-3026 | Thu Nov 17 1994 21:09 | 9 |
|
I've had a look at the Alpha versions of these and they look
VERY useful. So much so, that I'm going to take a copy of the two
student guides and listings, and work my way through these, in my
own time.
Don't know how long it will take, but with some practical in the
office, to back it up, it might be as good as a course !!
|
193.11 | good!! | KERNEL::ANTHONY | | Thu Nov 17 1994 22:16 | 6 |
|
please make a copy available for our library
cheers
Brian
|