T.R | Title | User | Personal Name | Date | Lines |
---|
423.1 | | KONING::KONING | Paul Koning, NI1D | Tue Dec 17 1991 11:25 | 8 |
| I have no idea if there are any plans for a DAS DEMFA, or how hard it would
be to create one. (My guess: not all that hard.)
Note that you can achieve the same sort of benefit (or actually, significantly
MORE benefit) by using two separate SAS adapters. Obviously that's a
higher cost solution but it is available today.
paul
|
423.2 | At what cost vs 2 DEMFAs ? | LARVAE::HARVEY | Baldly going into the unknown... | Tue Dec 17 1991 13:11 | 35 |
|
Hi Steve
As someone very involved with MDF clusters at present I have to admit
I've not considered, or been asked about this aspect of resilience when
putting MDF systems together... Mainly 'cos we don't make DAS adapters !
Besides we've been too busy looking at redundant and Dual-Homing bridges
and concentrators and dual-ring topologies etc....
Paul implies in .1 that a DAS DEMFA would be cheaper than 2 separate ones
- is this true ? Regardless of this a single board still represents a
single point of failure in the overall design. It all depends on what
aspect of resilience you're looking at doesn't it ?!
From my limited exposure so far in the area of MDF clusters it is wise to
tread carefully and work with the customers to ascertain what their views
and requirements are on the subject of resilience and "disaster
tolerance". Requirements vary enormously from "just" remote disk vaulting
through to complete application/cpu redundancy and failover, and
depending upon what level of support is required, so do the costs !
I think that a DAS DEMFA might be a useful component to have in the
"armoury" of products, especially if it was cheaper than 2 DEMFAs. It
would give us other configuration options to propose to customers,
according to their requirements. But quite how much demand there is for
them .....
In the MDF deliveries we're currently working on, there is a considerable
amount of work involved in documenting the numerous caveats associated
with modes of operation in the event of the myriad failure scenarios. A
DAS DEMFA would be the icing on the cake.... ;>)
Happy christmas to all my readers
Rog (should I send this note twice to ensure it gets entered ???!!!###)
|
423.3 | | COMICS::WOODWARD | Smile! | Wed Dec 18 1991 09:31 | 11 |
| Thanks for the comments so far. I can't see that it should be technically
difficult to make a DAS demfa (except for the size of the bulkhead connector :-)
and would hope it wasn't much more expensive than a SAS device, but I have no
real idea how much takeup there would be.
I feel the big benefits of das v's sas for a system is that the failover is
at the network level, and generally I'd expect this to be much quicker/less
disruptive than making the application have to cope. Is this a reasonable
assumption (or is it too much of a 'length of string' question) ?
Steve
|
423.4 | | STAR::PARRIS | _ 13,26,42,96... What comes next? | Wed Dec 18 1991 11:02 | 7 |
| > I feel the big benefits of das v's sas for a system is that the failover is
> at the network level, and generally I'd expect this to be much quicker/less
> disruptive than making the application have to cope.
I think maybe there's a misconception here. By using two SAS adapters in a
single system, attached to separate rings, you've effectively turned it into a
DAS *system*. Failover can be completely transparent to the application.
|
423.5 | | WELSWS::LOWRY | | Wed Dec 18 1991 12:18 | 20 |
| I know of a customer who is designing a computer centre using an
FDDI backbone. One of the requirements is automatic failover for DECnet
phase 4 and LAT.
They have budgeted for 2 DEMFA interfaces in each of their ten or so
Vaxes, but have found that they cannot connect them to the same ring
and expect DECnet to work. So, one answer is 2 rings connected by a
router. But then LAT will not fail over!
To get LAT to failover requires a bridge between the two rings which
filters out DECnet. But we have no 100/100 bridge product yet!
To bridge the two rings we need to use 2 10/100 bridges!
The customer is not ready to implement DECnet Phase V although they
are looking at the NIS6xx product with interest.
A (much) easier solution would be a DAS DEMFA.
|
423.6 | | KONING::KONING | Paul Koning, NI1D | Wed Dec 18 1991 18:37 | 24 |
| Re .4: There is no such thing as a "DAS system". And as far as the term
is used in FDDI, two SAS cards in a node is a COMPLETELY different thing
than a single DAS in that node.
As far as coping faster, that depends. The crucial difference here is that
the DAS provides an active and a standby connection to the concentrator
tree, but only one MAC. So the higher layer protocols can't see anything
different from the SAS -- in particular they do not have to deal with multiple
addresses out there. However, it does take a bit of time (not a whole lot;
perhaps a second) for the standby connection to take over. Also, dual homing
protects against only a few faults, specifically cable or port failures but
not partitioning problems.
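
A minimal Python sketch of the point above, for illustration only: a dual-homed DAS has two connections but one MAC, so a port failover does not change the address the higher layers see. The class, the port names and the MAC value used in the usage note are all hypothetical, and the model ignores the roughly one second the standby needs to take over.

```python
class DualHomedStation:
    """Hypothetical model: one MAC address, a primary and a standby port."""

    def __init__(self, mac: str):
        self.mac = mac
        self.port_ok = {"primary": True, "standby": True}
        self.active = "primary"

    def port_failed(self, port: str) -> None:
        """Record a cable or port fault; fail over if the active port was hit."""
        self.port_ok[port] = False
        if port == self.active:
            other = "standby" if port == "primary" else "primary"
            # Only switch if the other connection is still usable.
            self.active = other if self.port_ok[other] else None

    def visible_address(self):
        """Higher layers see the same single MAC whichever port is active."""
        return self.mac if self.active else None
```

For example, after `s = DualHomedStation("08-00-2B-00-00-01")` and `s.port_failed("primary")`, `s.active` is `"standby"` and `s.visible_address()` is unchanged. This models only the cable/port faults mentioned above, not ring partitioning.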
Conversely, dual adapters means the higher layers do see two separate links,
and have to handle that. As you know, DECnet phase 4 doesn't like this.
Failover may be slower or faster compared to dual homing, that depends on
how you define the rules. For example, a protocol could use one adapter until
it observes repeated timeouts, then switch to the other. That costs several
timeout periods. A different approach is to send everything on both ports.
That's less efficient when everything is up, but in exchange you get INSTANT
recovery on failure. DECnet Phase V multi-link endnode support allows either,
I believe.
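
To make the two rules concrete, here is a rough Python sketch. It is not how DECnet or any DEC driver is implemented; the `Adapter` class, the `max_timeouts` threshold and the function names are all made up for illustration. It only contrasts "switch after repeated timeouts" with "send everything on both ports".

```python
from dataclasses import dataclass


@dataclass
class Adapter:
    """Hypothetical stand-in for one LAN adapter."""
    name: str
    up: bool = True

    def send(self, frame: str) -> bool:
        # True means the frame was acknowledged, False means it timed out.
        return self.up


def send_with_switchover(adapters, frame, state, max_timeouts=3):
    """Rule 1: stay on one adapter until max_timeouts consecutive timeouts.

    state is a dict with 'active' (adapter index) and 'timeouts' (count).
    """
    if adapters[state["active"]].send(frame):
        state["timeouts"] = 0
        return True
    state["timeouts"] += 1
    if state["timeouts"] >= max_timeouts:
        # Give up on this adapter and move to the next one.
        state["active"] = (state["active"] + 1) % len(adapters)
        state["timeouts"] = 0
    return False


def send_on_all(adapters, frame):
    """Rule 2: send every frame on every adapter; succeed if any copy arrives."""
    results = [a.send(frame) for a in adapters]
    return any(results)


def attempts_until_delivered(adapters, max_timeouts=3, limit=20):
    """Count send attempts under rule 1 until a frame gets through."""
    state = {"active": 0, "timeouts": 0}
    for attempt in range(1, limit + 1):
        if send_with_switchover(adapters, "data", state, max_timeouts):
            return attempt
    return None
```

With the first adapter down and the second up, rule 1 needs `max_timeouts` failed attempts before the next attempt succeeds on the other adapter, while `send_on_all` succeeds on the first try at the cost of sending every frame twice.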
paul
|
423.7 | Yes,.. but (unfortunately) that's not implemented | STAR::SALKEWICZ | It missed... therefore, I am | Thu Dec 19 1991 11:10 | 33 |
| re .6
>Conversely, dual adapters means the higher layers do see two separate links,
>and have to handle that. As you know, DECnet phase 4 doesn't like this.
The wording seems a bit too nice here,.. I would say that
"DECnet Phase IV can not handle this,.. no way,. no how"
>Failover may be slower or faster compared to dual homing, that depends on
>how you define the rules. For example, a protocol could use one adapter until
>it observes repeated timeouts, then switch to the other. That costs several
>timeout periods. A different approach is to send everything on both ports.
>That's less efficient when everything is up, but in exchange you get INSTANT
>recovery on failure. DECnet Phase V multi-link endnode support allows either,
>I believe.
Yes,.. a protocol *could* do that. Trouble is, the current
implementation (and from all the knowledge I have, this
is the implementation we are "stuck" with in Phase IV
semi-permanently,.. if not permanently) of DECnet-VAX
Phase IV does not, can not, and will not do that.
Other protocols have been updated/changed recently (LAT,
Clusters) to deal with this,.. but DECnet remains unable
to have more than one device/connection to a LAN.
I would love to see this change made to Phase IV,... but
I don't have the time to work the political machine. I
do understand that this is not a "trivial" change,.. but it
sure seems to be one customers could use. Phase V fixes things,..
but if it isn't available,....????
/Bill
|
423.8 | how do clusters do it ... | COMICS::WOODWARD | Smile! | Thu Dec 19 1991 12:31 | 5 |
| The original note was prompted by a proposed cluster environment. How does LAVc
cope with the 2 controllers and would a dual-homed device be faster to recover
on, say, a disconnected cable than the protocol/application itself ?
Steve
|
423.9 | Here's how Ni-SCA Dual DEMFA do it | VERELL::BOAEN | | Thu Dec 19 1991 22:26 | 18 |
|
The NI-SCA transport (aka PEDRIVER) will detect and use both adapters
if they are available. However, for any given partner VMS system it will pick
one and use it until either it detects a significant latency improvement via
the other DEMFA, or it fails to get a HELLO message on the preferred DEMFA to
that destination, or it decides that it can't get a packet acknowledged after
several timeout/retransmit attempts. In cases of real failures, the loss of a HELLO
will most likely be the trigger to switch paths. This will occur in ~3sec.
It takes 31 timeout/retry cycles at ~2sec each before it gives up on retransmits.
I think that if the DEMFA decides the ring is gone it may fail frames queued for
transmission and these may trigger fail-over even faster. However, I'm not
positive about this.
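
Taking the approximate figures in this note at face value, the arithmetic can be sketched in Python as below. The function names are hypothetical and the defaults are just the "~3sec", "31 cycles" and "~2sec" values quoted above; this is not PEDRIVER code.

```python
def retransmit_give_up_seconds(cycles=31, seconds_per_cycle=2.0):
    """Time before retransmission is abandoned: cycles * seconds per cycle."""
    return cycles * seconds_per_cycle


def first_trigger(hello_loss_seconds=3.0, cycles=31, seconds_per_cycle=2.0):
    """Return which of the two failure triggers would fire first."""
    candidates = {
        "hello_loss": hello_loss_seconds,
        "retransmit_exhausted": retransmit_give_up_seconds(
            cycles, seconds_per_cycle
        ),
    }
    # The trigger with the smallest elapsed time wins.
    return min(candidates, key=candidates.get)
```

With the default figures the retransmit limit works out to 31 x 2 = 62 seconds, so the HELLO loss at about 3 seconds is the trigger that fires first, which matches the statement above.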
'Gards,
Verell
BTW, The above numbers are subject to change, and certainly shouldn't be
discussed outside of DEC.
|
423.10 | not quite | STAR::SALKEWICZ | It missed... therefore, I am | Fri Dec 20 1991 14:23 | 31 |
| .9 is really misleading
Clusters/PEDRIVER has what is called a listen timeout. That has
been reduced to 8 seconds for V5.4-3 which is the first VMS release
to support multiple LAN adapters for clusters. So if the local PEDRIVER
does not hear from a given node on a given adapter for longer than
listen timeout, that channel is declared as dead, and another channel
/adapter will be used if available.
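
As a rough sketch only (this is not PEDRIVER source), the listen timeout rule could be modelled in Python as follows. The `Channel` class and its methods are hypothetical; the 8 second default is the figure quoted above.

```python
LISTEN_TIMEOUT_SECONDS = 8.0  # figure quoted above for V5.4-3


class Channel:
    """Hypothetical record of one (remote node, local adapter) path."""

    def __init__(self, adapter: str, now: float = 0.0):
        self.adapter = adapter
        self.last_heard = now

    def heard(self, now: float) -> None:
        """Note that something arrived from the remote node on this path."""
        self.last_heard = now

    def is_dead(self, now: float, listen_timeout=LISTEN_TIMEOUT_SECONDS) -> bool:
        # Dead only when silent for LONGER than the listen timeout.
        return now - self.last_heard > listen_timeout


def live_channels(channels, now):
    """Channels that are still usable at time 'now'."""
    return [c for c in channels if not c.is_dead(now)]
```

If one channel was last heard at t=0 and another at t=5, then at t=9 only the second is still live, and traffic would move to it.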
If both channels are alive, PEDRIVER will use the path with the lowest
latency. PEDRIVER measures latency constantly for all channels. It
is conceivable that two adapters could be in use simultaneously to
communicate with a single remote cluster node.
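
The lowest-latency rule amounts to something like the Python sketch below. It is illustrative only: `pick_channel` and the dictionary layout are assumptions, and how PEDRIVER actually measures or smooths latency is not described here.

```python
def pick_channel(latencies):
    """Pick the live channel with the lowest measured latency.

    latencies maps a channel name to its latest latency measurement,
    or to None if that channel is currently considered dead.
    """
    alive = {name: value for name, value in latencies.items() if value is not None}
    if not alive:
        return None  # no usable path to this node
    return min(alive, key=alive.get)
```

Because the choice is re-evaluated as measurements change, the selected adapter can differ from one remote node to the next.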
In the case of a DEMFA cable being unplugged, the device will interrupt
the driver (FXDRIVER) with a status change. FXDRIVER waits 5 seconds to
see if the ring becomes available. If the ring does become available
again within 5 seconds, nothing happens. All the IO that was queued
to the device and to the driver gets processed as though nothing had
happened. If the unavailability of the ring persists for more than five
seconds, the driver declares a fatal error, of which PEDRIVER gets
immediate notification. At that point, PEDRIVER will immediately switch
to an alternate channel/adapter if one is available.
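
The cable-unplugged sequence can be summarised in a small Python sketch. It is a simplification, not FXDRIVER or PEDRIVER code; the function name and the outcome strings are hypothetical, and only the 5 second wait comes from the description above.

```python
RING_GRACE_SECONDS = 5.0  # how long the driver waits for the ring to return


def ring_outage_outcome(outage_seconds, grace=RING_GRACE_SECONDS,
                        alternate_available=True):
    """Describe what happens after the ring is unavailable for a given time."""
    if outage_seconds <= grace:
        # Ring came back in time: queued I/O is processed as if nothing happened.
        return "resume_queued_io"
    # Outage outlasted the wait: fatal error reported, PEDRIVER reacts at once.
    if alternate_available:
        return "switch_to_alternate"
    return "no_path"
```

So a 3 second outage is ridden out on the same adapter, while a 6 second outage moves traffic to the other adapter if there is one.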
Whew,.. that was a mouthful.
Anyway, I hope it clears some things up.
/Bill
|