
Conference koolit::vms_curriculum

Title: VMS Curriculum
Moderator: SUPER::MARSH
Created: Thu Nov 01 1990
Last Modified: Sun Aug 25 1996
Last Successful Update: Fri Jun 06 1997
Number of topics: 185
Total number of notes: 2026

105.0. "SYSNET III -- Monitoring the System" by SUPER::REGNELL (Smile!--Payback is a MOTHER!) Tue Mar 19 1991 14:54

    
105.1. "Module 10 Posted for Review" by WHEEL::MOSTEIKA (Paul @ GSF/B21, DTN 264 (884)-2832) Thu Sep 12 1991 14:16 (8 lines)
Please note that Module 10 - System Monitoring, is currently available on node
SUPER for your technical/content and organizational review.

	SUPER::ES$REVIEW:[SYSNET_III]SYSNETIII_CHAP10.PS

					Regards,

					Paul M.

105.2. "Comments from DC Training Center" by TEACH::WENDY Fri Sep 27 1991 13:28 (66 lines)
                        SYSTEM MONITORING
                          10


1-9
I don't think this page really fits in here. We are talking
about monitoring a cluster here, not improving performance.  Anyway this
tells them how to relieve a bottleneck, but not how to tell if they have
one, which would include another monitor display.  Maybe show them the 
commands to point the system to those files if you move them to another
disk.  But really I think the page should go.

1-10  What do you do with this info? As the instructor note says,
you can't do much with this info, so why show it to them? It just
generates questions that you can't answer because most of the
display is useless.

1-11  Same as above.

1-12a  It says arriving and departing network packet rates are roughly
equivalent.  What if they are not? What does that mean? Put that in here.

1-14  I don't think INIT is a good command to start off with here. Why
not just get into SHOW CLUSTER and assume you get the default display?
Since you are removing software and votes, why not show them what the
display looks like before you do that?

1-15  What do they do with the info in example 1-11?  How would you know
if any of this is wrong?

1-23  If we have troubleshooting in this module maybe the module
should be called System Monitoring and Trouble Shooting.

1-26  Table 1-4  The bottom part of the diagram is confusing. The possible
causes do not correspond with the troubleshooting technique.

1-27  Again the bottom of this chart stinks.  It was bad in SMII and
here it is again.  For example, "Type C and hit return."  What are
you talking about, where do you type this and why?  I have a feeling
you are talking about cancelling mount verification, but how can anyone
figure that out?  Very, very vague.

Also, "Take appropriate action": you can't be so vague.  What is appropriate
action? That is what they will want to know.

1-29 Do $Search/window=
That is much more useful.
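
For anyone unsure what the window qualifier buys you: a windowed search
prints lines of context around each match instead of the matching line
alone. Below is a minimal Python sketch of that idea. It is not the DCL
SEARCH utility; the function name, the sample log lines, and the context
sizes are all made up for illustration.

```python
def search_window(lines, needle, before=2, after=2):
    """Return one block of context lines per match (hypothetical helper).

    Each block holds up to `before` lines ahead of the match, the matching
    line itself, and up to `after` lines following it. The match is
    case-insensitive. Overlapping blocks are not merged.
    """
    needle = needle.lower()
    blocks = []
    for i, line in enumerate(lines):
        if needle in line.lower():
            start = max(0, i - before)              # clamp at top of file
            end = min(len(lines), i + after + 1)    # clamp at end of file
            blocks.append(lines[start:end])
    return blocks


if __name__ == "__main__":
    # Invented sample text, standing in for a log file under review.
    sample = [
        "startup complete",
        "mounting disk",
        "device error on unit 0",
        "retrying",
        "mount verification in progress",
        "operation resumed",
    ]
    for block in search_window(sample, "error", before=1, after=2):
        print("\n".join(block))
        print("-" * 20)
```

The point for the student page is the same: the lines around the match are
usually what tells you why the entry was logged.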

1-31a. Dynamic is the number of pages that have been found as bad since the
last boot.

1-33 This is not enough. Either leave it out totally and restrict it to
the appendix or put more in the main part of the chapter.

1-35 typo.  It is errorlog.sys not errorlog.sass.

1-38  Put something in the instructor notes about what to get out of this
display.

1-45  Will they know what a machine check is or why you do a machine
check? What will they do with this report?  Maybe put something in here.

1-46  What do you do with this info?

Wendy

105.3. "Is there a more recent version?" by NWGEDU::RODENBURG (Ed. Services, The Netherlands) Fri Nov 08 1991 10:46 (7 lines)
    I want to review the material I received. Is there a more recent
    version of the chapter than the one included in the complete
    course guide that was presented?
    
    Otherwise I will start reviewing the current one next Monday.
    
    Joop

105.5. by SUPER::MATTHEWS Fri Nov 22 1991 15:34 (2 lines)
    In the pilot teach, this chapter took approximately 2-1/2 hours. (The
    appendix was not covered.)

105.6. "New Version of System Monitoring is Available" by HARDY::MOSTEIKA (Paul, ZKO1-1/D42 DTN 381 (881)-1075) Tue Dec 17 1991 15:43 (16 lines)
You will find a new version of the Monitoring Module in the review area (as well 
as a revised Performance Module). 

SUPER>dire ES$REVIEW:[SYSNET_III]SYSNETIII_CHAP9.PS;1

Directory ES$REVIEW:[SYSNET_III]

SYSNETIII_CHAP9.PS;1
                       1329  11-DEC-1991 15:12:39.86  11-DEC-1991 15:12:43.61

All your comments were carefully considered, and you'll find that almost all
were implemented.

Thanks for all the review comments.

						Paul M.

105.7. "Trouble shooting needs to be part of the title" by SOAEDS::TRAYSER (Seniority means a bigger shovel!) Fri Feb 28 1992 02:13 (70 lines)
This is primarily the SW/HW trouble shooting modules from the SM II course.
No significant problems with this module, although it has a few displays 
that I know came from V4 VMS.  
  
Overall this chapter has some of the better instructor pages, quite useful
and in many cases a very clear explanation of the student page -- nice job!

9-9 - 9-18 --
     These pages jump back and forth between VAXclusters and DECnet.  I'd
     suggest starting with the DECnet displays, this helps define some of 
     the VAXcluster displays.  I'd suggest something like:

       Start with 9-8, then move to 9-14 through 9-18.  Then go back to 
       9-7 (maybe even add a MONITOR CLUSTER -- remember that 65% of my
       students did NOT go to SysNet 2) and then on to 9-9 through 9-13.

9-10a, last line --
     Last I checked this wasn't true for all cases.  If a node is already 
     in the cluster using a quorum disk and another node wants to join but
     with its differently named quorum disk, the disk is ignored and 
     (providing all else is fine) the VAX joins without its choice of 
     drive.

9-14 - 9-15 --
     I'd suggest combining these pages.  Also, don't turn on all events.
     Show what logging we have on and maybe show adding a certain item
     or two.  And, so long as we are here, how about explaining why 
     we chose to be looking at this stuff.

9-14 - 9-16 --
     OK, so where do we go from here?  If this is supposed to be a
     task-oriented course, this lesson on events needs to lead somewhere. We
     look at events, we turn them on, but....

9-17, 1st bullet --
     "Network activity can be monitored..." and the first bullet mentions
     LIST command in NCP.  I don't know of any list commands that will 
     display Network activity.
 
9-17, whole page --
     We seem to be telling them that we can show them lots of stuff, but
     we don't show them any.

9-18 --
     WAIT!  Why are we zeroing counters before we show them something?!  The
     page title does not match the page contents.  Add a SHOW command or just
     something useful with the counters.  For that matter, let's leave the
     zeroing to the DECnet Mgr I course.

9-20a, 4th paragraph --
     Nit - misspelled "DECnet sofyware..."

9-20, 4th bullet --
     Change this to "Is it reasonable to swap in another device?", to make 
     sure that if a disk pack goes bad they don't start swapping it to another
     drive, thus destroying several drives - a real "classic" mistake.

9-22, last sentence --
     "The final step is to reboot the node" -- HA!  Kill this line, this
     is bogus.  Rebooting the VAX may not fix the problem!!  Just end it with
     something like "contact your support organization".

     Actually this entire page doesn't make any sense.  The problem statement 
     seems to be that the VAX is working but only from the console.  But the
     instructor's page is talking about a system hang.  If the system is hung
     how can we run LAT$STARTUP??  

  More later....

$

105.8. "Thanks for the Comments" by SUPER::MOSTEIKA (Paul, ZKO1-1/D42 DTN 381 (881)-1075) Mon Mar 02 1992 10:00 (30 lines)
Keep those comments coming. It would've been nice to have them earlier, but
I know how it is when you're teaching.

The general theme of this module was to monitor on a routine basis, for the
purpose of avoiding system degradation, whether it's performance related,
errors, or what have you. It also serves as a prelude to the Performance module.

We wanted to have them use the tools available to do this. While they're
looking at an error log, we might as well explain what it is they're looking at.
I beefed up the IG pages and reports (thanks for the attaboy). You must realize
that some of these reports couldn't be created, and had to be edited to conform
to the newer version. Even the manual has some old reports in it. If you have
any that you feel are good, send 'em to us. When I was teaching, I got my best
examples from the field: students, support...

It does seem jumpy, but it's organized by tool/utility: MONITOR, SHOW, NCP.
Almost all of the review stuff was ripped out - page counts. But personally, I
think it should be there.

It was decided to remove the info on events because the students need to know 
the different DECnet layers to make sense out of this. Where do we go? The 
instructor will tell them the important event classes to monitor.

Thanks for the feedback. We are compiling everyone's notes for the update.

Remember, if you have a good example, a bad example, a good lab (we have enough
bad ones ;-)   ), send them to us. I'm still waiting on those labs Greg!

						Regards,
						Paul M.

105.9. "The rest of the module..." by SOAEDS::TRAYSER (Seniority means a bigger shovel!) Mon Mar 02 1992 23:29 (44 lines)
More...
  
9-7 --
    I think we need something different here.  I'd suggest a MONITOR CLUSTER.
    It's mostly a variation of MONITOR SYSTEM so it's easily understood.  The
    current display leads to numerous questions about MSCP Servers, which have
    not been discussed in detail in this course series.

9-10 - 9-13 --
    All of these pages have slight formatting problems on the margins, 
    especially if the line has a bullet.

9-13 --
    Kill this page.  We haven't discussed SYSAPs in any detail yet.  I'd
    suggest you toss in information on the WRITE, SAVE and PAN commands.  And
    maybe the SHOW_CLUSTER$INIT logical would fit here well; I always teach
    these and I get a good response.

9-23, #1 --
    What is @CRASH on a uVAX-II equivalent to?  How about a VAXstation?  This
    was mentioned on 9-22a, but how about an example for reference?

9-29, DRA0, NODE1_SYS_V4 --
     Looks like an ancient system with RM05s or RM03s.  WOW!  Now, how about
     some updated listings so we don't have to discuss old hardware info.     

9-30 - 9-33 --
     I don't usually teach VAXsim since, like the instructor's page states,
     this is really a Customer Services (F.S.) tool.  *IF* I was to teach it,
     I would definitely do it AFTER I taught the 'built-in' stuff like error
     log and Analyze.  Please move it behind error log or in an appendix.

9-35 - 9-36 --
    One of these is mostly redundant with the other; please delete one or
    maybe consolidate these pages.

9-37, bullet #1 --
    Comment on bottom should begin with "SWISH$" to be more technically complete.

9-39, bullet 2 --
     Indentation problems, see 9-41 for proper indenting.

  
  $

105.10. by HARDY::MATTHEWS Fri Mar 06 1992 08:46 (2 lines)
    page 9-53, which I'm sure you all teach in great detail :-) "exits" in
    the last line of figure 9-5 should be "exist."