|
Feedback on review of CLUSTER_CHAP1 Ron Stahly 5/6/92
-----------------------------------------------------------------
Page 1-5a
Guidelines for VAXcluster Configurations is now at rev -005
Note also that this document is available to customers.
Note that the VAXcluster Principles manual is internal use only.
There is also a Version 5.4 VAXcluster Principles Update
order number EK-VAXCP-UP-001 (also internal use only)
A "new" VAXcluster principles manual(based on version 5.5) is
in the works and should be available in the next few months.
Plans are to also make this available to customers.
-------------------------------------------------------------
Page 1-5 locatio - needs an "n"
-------------------------------------------------------------
Page 1-7
Add FDDI to the interconnects, maybe list network interconnects
with Ethernet and FDDI
I would suggest that you add to your DSSI quote of 4 megabytes,
the statement of 32 megabits/sec so that you are using the same
term (megabits) that was used for CI and NI speed. This makes the
comparison easier.
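For reference, the conversion behind the 32 megabits/sec figure can be sketched as below. This is only an illustration: the function name is made up for this note, and it assumes 8 bits per byte with no allowance for protocol overhead.

```python
def megabytes_to_megabits(mb_per_sec):
    """Convert a transfer rate from megabytes/sec to megabits/sec (8 bits per byte)."""
    return mb_per_sec * 8

# DSSI figure quoted on page 1-7, restated in the units used for CI and NI
dssi_megabits = megabytes_to_megabits(4)
print(dssi_megabits)
```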
-------------------------------------------------------------
Page 1-8a
On the Star Coupler discussion, you say "These may be either VAX
systems or HSC subsystems in any combination that makes sense"
We limit the number of CPUs attached to a star coupler to 16
This is stated in the V5.5 SPD
Also note that multiple star couplers can be used in the cluster.
-------------------------------------------------------------
Page 1-8 CI VAXcluster System Advantages
add Maximum of 16 VMS systems after Minimum of one VMS system
-------------------------------------------------------------
Page 1-10a - under security data.
The statements: "Using DSSI does not change the essential
character of ethernet-based system, since the connectivity is
established over the ethernet. It does, however, offer the
option of providing dual-hosted system disks to the cluster."
are quite misleading. They are true if the DSSI adapter is a KFQSA
on a QBUS, but incorrect for all other DSSI adapters: those
found on 3300/3400 and 4200/4300/4500/4600, and the KFMSA (XMI based).
All of these adapters allow full SCS traffic and are preferred
over the Ethernet during connection formation.
-------------------------------------------------------------
Page 1-11a
Discussing SCS communication over the DSSI. You state supported
only for 3300/3400 systems using the EDA640 adapters and PIDRIVER.
This is also true for the embedded adapters on 4200/4300/4500/4600
and the KFMSA(XMI-based) adapters using PADRIVER.
-------------------------------------------------------------
Page 1-11 statement: The Ethernet cable is a single point of failure.
So is the ethernet adapter, but we now support multiple adapters,
and you could use multiple segments. This is not the issue that
it used to be.
-------------------------------------------------------------
Page 1-12a
Again SCS and the DSSI.
Add EDAs on 4200/4300/4500/4600 and KFMSA(XMI-based) or drop
the list and keep the point of KFQSA does not support.
-------------------------------------------------------------
Page 1-12 DSSI
add EDA on 4200(single DSSI bus)
add EDA on 4300/4500/4600 (two DSSI bus)
add KFMSA(XMI based) on 6xxx/9xxx ( two DSSI bus)
also you discuss Dual-Host, but not Multi-Host. We support
several multi-host systems of three nodes today, and soon more.
maybe add more details to instructor page 1-12.a
-------------------------------------------------------------
Page 1-15 Mixed-Interconnect - statement on DSSI service for
up to 2 MicroVAX members per DSSI
We currently have configurations available of 3 VAX systems
-------------------------------------------------------------
Page 1-20a
Work around for tape ported to a single member
Starting with VMS V5.5 we can serve TMSCP tapes to other
nodes in the cluster.
-------------------------------------------------------------
Page 1-21a - Reference to JBCSYSQUE.DAT - needs updating!
-------------------------------------------------------------
Page 1-21 - Reference to JBCSYSQUE.DAT - needs updating!
-------------------------------------------------------------
Page 1-26 - VAXcluster-Supplied components
Add TMSCP server
-------------------------------------------------------------
Page 1-27a VAXcluster Troubleshooting Reference Set
These are now revision -002
Very good reference
Also available as a set (EK-VCSRS-PK-002 I think, but will check)
-------------------------------------------------------------
Page 1-28 Statement: Cluster state transitions occur when a node
joins or leaves the cluster, and when the cluster recognizes
a quorum disk.
add to the statement about the quorum disk, that a transition
also occurs when the quorum disk becomes unavailable
-------------------------------------------------------------
Page 1-29a Instructors notes on changing quorum
use of IPC to regain quorum should also be included here in
the instructor notes.
-------------------------------------------------------------
Page 1-30a Instructor notes for resources on lock manager
could add: Distributed Lock Manager course material, and
the Version 5.4 VAXcluster Principles Update (EK-VAXCP-UP-001)
-------------------------------------------------------------
Page 1-31a wording on the statement (VMS V5.4) When a member
leaves the cluster(SHUTDOWN), part of the process before the
cluster state transition is a Lock Database Rebuild.
I really disagree with this wording. A rebuild is not really
performed before the transition; instead, before the transition
begins, any resources mastered on the node shutting down are
re-mastered on another node interested in the resource.
-------------------------------------------------------------
Page 1-31 on LOCKDIRWT and dynamic remastership.
Your info "The distribution procedure operates as follows:"
is for version 5.2 - 5.4-3
Version 5.5 information is also needed here.
I can provide a write-up if needed.
-------------------------------------------------------------
Page 1-32 statement: You may wish to set LOCKDIRWT to zero on
MicroVAX and workstation members.
I would suggest using the term satellite nodes rather than
specifically pointing out MicroVAX and workstation members.
-------------------------------------------------------------
Page 1-35 SHOW DEVICE?FULL
I think you wanted a / and not a ?
-------------------------------------------------------------
Page 1-36 - Reference to JBCSYSQUE.DAT - needs updating!
-------------------------------------------------------------------------------
Two things that I want to ask concerning the course material:
-------------------------------------------------------------
1) We need to present current release information. On the issue of
BATCH/PRINT: should materials contain only V5.5 release
information, or should they contain both pre-V5.5 and V5.5
material (especially the instructor pages)?
2) The chapter appears to contain VAXcluster System Management
material with some new additions. Will the VAXcluster System
management course be updated and continue to be delivered, or
will these two new courses replace it ? ( Andy and/or Sherry
can you please respond to this question...)
Ron Stahly
DTN 523-2134
719-260-2134
NEURON::STAHLY
|
| Sorry, but 2 & 1/2 weeks was the best I could do...
These are some of the items I've found that need further explanation or
corrections for the Cluster module. I'll *NOT* be repeating RON's or
SUSAN's comments, as they are on target and need to be incorporated.
General comment -- please make sure to mention somewhere in the early pages
of the book (or this chapter) that the material constantly uses the word
'cluster' as common 'slang' for the word VAXcluster, which is a trademarked
word.
1-5a, 2nd sentence --
"Details about common environment and multiple environment are in the
'Building' module." But we mention them on 1-6 (and 1-6a). This will
probably require that these topics are discussed in some detail at this
time. Can we not move it all to the Building Chapter?
By the way, I've searched through the VAXcluster manual and didn't find any
references to anything called a 'Heterogeneous' cluster. Did we make that
up?
1-5, 8th bullet --
"Potentially perform I/O to any disk OR TAPE storage subsystem in the
cluster" -- new as of 5.5
1-6a, 1st sentence --
I suggest that you make an extra overhead of figure 1-3 and keep that
available all week -- very good example to use when you need a sample
configuration to display.
1-6, definition of multiple environment --
If you read the sentence defining this environment it is possible to think
that you can have 2 'clusters' sharing one set of hardware. Make sure that
there is a note either on this page or the instructor's page that specifies
that all systems attached to a common star coupler must be in the same
cluster.
Or...kill this page entirely and bring up in the Building Chapter.
1-7a, 3rd paragraph --
It states that we are to use NI VAXcluster instead of LAVc, but commonly
throughout this module are many references to Local Area VAXcluster and LAVc
that need to be updated -- see 1-10a, 1-10, and 1-11a for examples.
4th paragraph --
Please remove the word 'even', as it provides no benefit.
1-7, 5th bullet --
Last I checked DSSI can now span 20 meters.
1-8a, last bullet --
SPD limits the VAX CPUs to 16, but neither the hardware nor the software
enforces this. There are many sites (in and out of DEC) with 17+ VAX systems
on the extended CIs.
1-9 --
Can we get rid of the shadowed-out part of the diagram? It is not useful
and provides too easy an avenue for questions before their time.
Add the tape drive to the HSCs.
You picked the CPUs, now pick the HSCs. I suggest a HSC50 and a HSC60, this
will show both old and new HSCs (like the 785 in the cluster) work fine.
The disks could be almost any disk - RA70, RA72, RP06, etc. Let's just drop
the types of disks entirely from the description since they are NOT members.
1-10a --
I think Ron mentioned Dual-ported on a later page, but there are numerous
references that need to be updated to Multi-Hosted system disks throughout
the entire chapter.
1-10 --
"Boot server", "Boot Server", "Bootserver" or "bootserver"?? This page shows
two different spellings (3rd bullet from top and 2nd sentence from end). Can
we be consistent?
4th bullet, 2nd dash --
"Sends *initial* VMS image to node", since DECnet/MOP is only responsible for
NISCS_LAA.EXE and TERTIARY_VMB.EXE.
1-11a --
Move the DSSI discussion to 1-12a.
1-11, 9th bullet --
Since we can have Multi-hosted system disks via DSSI there isn't the need
for a quorum disk, so: "...and POSSIBLY a quorum disk..."
1-12a --
This page is very dated. Multi-host vs Dual-host, no new hardware listed,
etc. I think Ron mentioned this as well, but what we need is a generic
listing that won't date itself. Such as using "for example" or "a partial
list" type words to keep compatible if not current.
1-12 --
1st sentence needs "(ISE)" after "Integrated Storage Element" if we are to
use the acronym later in that paragraph. Therefore remove it from bullet 2.
3rd bullet needs updated hardware, list the 6000 and 4000 hardware.
1-13 --
Valid diagram, but old. How about updating with 4000s and 6000s? Also, if
we have the KFQSA listed, why not show the part for the other DSSI cable?
1-14 --
Why does the Instructor page specify V5.4-3 and the student page says V5.5?
1-15a, 1st sentence --
A pair of 6000s connected via both CI and DSSI *are* a MI cluster. The
Ethernet hardware is *NOT* required! More correct: "Therefore, each
SATELLITE and BOOT SERVER must reside on the Ethernet." There might be
other ways to word it, but be careful -- MI does *NOT* imply strictly
CI-NI based clusters anymore.
1-15 --
3rd bullet is misleading. Last I checked all CI based cluster members *must*
share a common star coupler, although some CPUs may have multiple Star
Couplers.
The quote from the SPD (V5.4) is...
The maximum number of VAX CPU's supported in a VAXcluster system is 96.
Up to 32 systems may be systems other than single user workstations.
It also is misleading, but 16 is still the maximum 'supported' VAX systems
on a Star Coupler.
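To keep the three numbers straight, here is a rough sketch of a check against the limits quoted above. The constants come from these notes (the V5.4 SPD quote plus the 16-per-star-coupler figure); the function name and the sample configurations are made up for illustration. These are support limits only, since neither the hardware nor the software enforces them.

```python
# Support limits as quoted in these notes (V5.4 SPD); not enforced by VMS.
MAX_CPUS = 96
MAX_NON_WORKSTATION = 32
MAX_PER_STAR_COUPLER = 16

def unsupported_reasons(workstations, other_systems, star_coupler_counts):
    """List the ways a hypothetical configuration exceeds the quoted limits."""
    reasons = []
    if workstations + other_systems > MAX_CPUS:
        reasons.append("more than 96 VAX CPUs in the cluster")
    if other_systems > MAX_NON_WORKSTATION:
        reasons.append("more than 32 systems other than single-user workstations")
    for number, count in enumerate(star_coupler_counts, start=1):
        if count > MAX_PER_STAR_COUPLER:
            reasons.append(f"star coupler {number} has {count} VAX systems (limit 16)")
    return reasons

# 60 workstations + 20 other systems, one star coupler with 16 CPUs
print(unsupported_reasons(60, 20, [16]))
# 70 workstations + 30 other systems, star couplers with 17 and 8 CPUs
print(unsupported_reasons(70, 30, [17, 8]))
```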
1-17 --
The paragraph defining VAX PROCESSORS lists VAX or MicroVAX, but leaves off
VAXstation and VAX-11 systems. Best to just drop MicroVAX and let VAX
account for all of them.
1-18a, 2nd paragraph --
This leads us to believe that only HSC or DSSI based devices can be shared.
Misleading since almost any disk (or tape) attached *locally* can also be
served. This needs to be reworded to not exclude these devices.
1-19, 4th bullet --
VAX 4000 (not a MicroVAX or a VAXstation) is also a valid satellite. Maybe
just adding "...and selected VAX systems" might fix it.
1-20a --
1st bullet needs to add TAPE DRIVES
8th bullet states that the Lock manager is in the Software Module, but it
seems to be a very complete discussion on pages 30-33 in THIS chapter.
Last bullet, insert the following bullet for this one:
"o Serve the Tape Drive if it is capable (see 1-35a for a list)"
1-20, 1st sentence --
Change "are" to "in"
1-21a --
1st bullet, "...must complete on that node,..." is a confusing statement.
"Users can recover *more* quickly..." "More"? More quickly than what? Just
drop the word MORE.
1-21, 4th bullet --
"All" is *NOT* correct. Tried serving an RX23? Change ALL to MOST.
1-22, 4th bullet --
No longer true, see page 35 for details. How about "Available to any system
in the cluster provided that it is served."
By the way, Page 1-18 is a bulleted version of Pages 1-21 and 1-22. Can we
kill page 1-18?
1-23, 3rd bullet --
This is partially incorrect. For proper system management of a VAXcluster
you are required to have DECnet. HOWEVER, the SYSMAN utility does NOT
require DECnet to operate correctly in a cluster. MONITOR commands like
MONITOR CLUSTER and MONITOR NODE do need it, but SYSMAN only uses DECnet to
get to nodes that are OUTSIDE of the cluster; otherwise it uses SCS like
any good Cluster utility.
1-25, table headers --
Why is the Multiprocessor listed as a 6000? 9000 is also valid. Drop the
CPU type and let the header read "Multiprocessor".
1-28, 1st paragraph --
"...if a majority of the expected VOTING MEMBER NODES are functioning."
WRONG!! Not correct! We don't care how many *NODES* are present, what
we are looking for is "...a majority of the expected VOTES from MEMBER
NODES are present." The only time the first statement will be true is if
they all have the same number of votes. If the systems have varying
numbers of votes then it won't work.
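A small sketch of the votes-versus-nodes distinction, in case it helps with the rewording. It assumes quorum is derived as (EXPECTED_VOTES + 2) / 2 rounded down, which is my understanding of how VMS computes it; the node names and vote counts are invented for the example.

```python
def quorum(expected_votes):
    """Quorum value, assuming (EXPECTED_VOTES + 2) / 2 with the fraction dropped."""
    return (expected_votes + 2) // 2

def has_quorum(votes_present, expected_votes):
    """True when the votes contributed by the members present reach quorum."""
    return sum(votes_present) >= quorum(expected_votes)

# Hypothetical three-node cluster with unequal votes
votes = {"BIGVAX": 3, "SAT1": 1, "SAT2": 1}
expected_votes = sum(votes.values())

# BIGVAX is down; the two one-vote nodes remain
up = ["SAT1", "SAT2"]
majority_of_nodes = len(up) > len(votes) / 2
cluster_has_quorum = has_quorum([votes[name] for name in up], expected_votes)

print(majority_of_nodes, cluster_has_quorum)
```

Here a majority of the voting nodes is present, yet the cluster does not have quorum, because the two remaining nodes hold only 2 of the 5 expected votes.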
1-30a --
For resources also see the Digital Technical Journal on VAXclusters.
1-30, 2nd bullet --
So, what is a Lock Value Block, and what does it do and why does a system
manager need to know what it is? Drop "...through the lock value block"
as it adds no benefit.
Last bullet should also list DIGITAL-written applications.
1-31 --
2nd bullet, 1st dash -- Nodes with a Zero value do *NOT* participate in the
directory function.
"Eagerness" -- please define! Do the systems block the other systems? Is it
a timing issue? Some type of interprocessor signal?
4th bullet -- when did this happen? Is there a reference for this? Please
give more info on instructor's page.
1-32a, 3rd bullet, 2nd dash --
Poor wording, please rewrite.
1-32, last bullet --
replace "MicroVAX and workstations" with "slow CPUs and satellites"
1-34a, last bullet --
replace "<REFERENCE>...stuff..." with a proper reference.
1-34, 1st bullet, 5th dash --
If you are going to specify "...mixed-interconnect..." then you need a
parenthesis similar to the one in the previous bullet.
1-35a, last 2 bullets --
Aren't these disks??
1-35 --
1st bullet. We don't need the cluster course to be a 'new features' or
a 'release notes' type material. Let's just put things on the student
pages that are pertinent to VAXcluster *management*. Drop the reference
of SDA and put it on the instructor's page. The extra 'stuff' (like the
bottom 1/2 of 1-34a) is OK on instructor pages, but PLEASE keep the info
on the student pages focused on the purpose of the course and chapter.
Last two bullets -- put 'compliant' behind TMSCP.
1-36, 4th bullet --
Print queues don't have job limits. There is no 'ratio' for print queues.
These need to be two separate bullets.
1-37a, 1st bullet --
We tease them here. Point them to OPC$ENABLE_OPA0 in SYLOGICALS.COM.
1-38, 1st bullet --
Where are VAXstations??
|