T.R | Title | User | Personal Name | Date | Lines |
---|
1551.1 | Bright ideas... | CHRISB::BRIENEN | DECmcc Bridge|Station|SNMP Management. | Thu Sep 26 1991 10:56 | 13 |
| The errors you are seeing probably come from the MCC_EA routines (which
are used by Bridge AM, TransLAN AM, Ethernet AM, and Concentrator AM).
The type of error would indicate that the Target entity is not the problem,
but rather the Ethernet Host Port (guess: it's being reset due to some error
condition).
Can you provide more information about the system these errors are
appearing on (e.g., what are the ethernet host ports? DELUAs? DEBNAs?
what type/version of vax system/vms software? Is the ethernet host
port being heavily used?)...
Chris
|
1551.2 | They aren't member of any idle club ! | ANTIK::WESTERBERG | Stefan Westerberg CS Stockholm | Thu Sep 26 1991 12:55 | 8 |
|
I have seen this type of error on a VAXstation 3100 M38 and a VAX 4000-300.
The VMS version on both systems is 5.4-2.
The load on the 3100 is about 3 exports every 120s and 11 alarms every 60s, and
on the 4000 it is 30 exports every 120s and 30 alarms every 30s.
/Stefan
|
1551.3 | Still problems | STKMCC::LUND | | Fri Oct 18 1991 11:26 | 18 |
| Hello
We have patched the ezdriver on the 4000-300 (CSCPAT_0252) but the problem still
exists. I have seen this on different sites and the problem seems to be
related to the load on the MCC host.
Is 30 exports every 120s and 30 alarms every 60s (not 30) too much for MCC on a
VAX 4000-300?
We are using the Translan AM for almost all alarms and exports, accessing the
bridges via 10 mbit Ethernet.
If the error message indicates a problem on the MCC host this should be
investigated.
No problems are seen on the MCC host ethernet line counters and VMS version is
5.4-2
Regards Niklas.
|
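To put the load figures from .3 in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes (this is not stated anywhere in the thread) that each export and each alarm rule issues exactly one request per interval; the real number of packets per request depends on the AM and is not known here.

```python
# Rough request-rate estimate from the figures quoted in .3.
# ASSUMPTION: one request per export or alarm rule per interval.

def requests_per_second(count: int, interval_s: float) -> float:
    """Average request rate for `count` items, each polled once per interval."""
    return count / interval_s

export_rate = requests_per_second(30, 120)  # 30 exports every 120 s
alarm_rate = requests_per_second(30, 60)    # 30 alarms every 60 s
total_rate = export_rate + alarm_rate

print(f"exports: {export_rate:.2f}/s, alarms: {alarm_rate:.2f}/s, "
      f"total: {total_rate:.2f}/s")
# prints "exports: 0.25/s, alarms: 0.50/s, total: 0.75/s"
```

Under that assumption the average rate is below one request per second, so any trouble is more likely to come from bursts (many rules firing at the same instant) than from the average load.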
1551.4 | "120s" seconds? | TOOK::CALLANDER | MCC = My Constant Companion | Tue Oct 29 1991 16:44 | 14 |
| How much stuff are you set up to record/export? All partitions
or only some? And by 120s do you mean every 120 seconds
(2 minutes)? If so, then that's 1 minute between rules and
2 between exports... If that is the case it might be that the
intervals are a bit close together, based upon the load on
your net, load on your system, memory in the system, and what type of
devices you are looking at.
I will send this off to the Translan guy to see if he knows what the
characterization of their AM is in regards to overhead/response time to
show requests.
thanks
|
1551.5 | try longer intervals; less system load. | TOOK::MCPHERSON | i'm only 5 foot one... | Tue Oct 29 1991 17:18 | 39 |
|
The briefest exporting interval I've ever been able to make work (with
the Translan AM) was 00:05:00. Note that this was with *no* alarms
outstanding and no other exporting going on.
I haven't done any sort of workload characterization of the Translan
AM. Has anyone else?
MCC> set person doug opinion_flag = true
PERSON MCDOUG_NS:.Doug
AT 29-OCT-1991 17:12:31 Characteristics
Modification completed successfully.
opinion_flag = TRUE
MCC>
MCC> show person doug Personal Opinion
PERSON MCDOUG_NS:.Doug
AT 29-OCT-1991 17:14:31 Characteristics
"I'm not sure it's even worth it to try to export on that
brief an interval (for long-term exporting) since you'll
create Sagans and Sagans of attribute data records... and
your reporter (DTR32 or DECdecision) is going to have to
dutifully plow through it..."
MCC>
MCC> set person doug opinion_flag = false
BRIDGE KAJUN_NS:.br4
PERSON MCDOUG_NS:.Doug
AT 29-OCT-1991 17:14:35 Characteristics
Problem modifying attribute.
opinion_flag = TRUE
MCC> exit
|
1551.6 | get help now ;-) | MKNME::DANIELE | | Tue Oct 29 1991 17:29 | 1 |
| You been doin' this too long Doug.
|
1551.7 | Customer aren't happy ! | ANTIK::WESTERBERG | Stefan Westerberg CS Stockholm | Wed Nov 06 1991 17:19 | 14 |
| This is a very irritating exception that constantly lights up the screen.
And the customer isn't happy with that at all.
A load of 25 Translan exports with a 2 minute duration and about 120 alarms
with durations spanning from 1 hour to 15s doesn't sound like an overwhelming
load for an 8 VUPS VAX 4000-300. In fact it seems that we have to increase the
number of alarms to close to 700. So this problem has to be solved if we are
going to be able to trust that the alarms triggered are live, not false!
Is there anybody else who has seen this type of behaviour?
Need a fix for this very soon.
/Stefan
|
1551.8 | Did you try lengthening the EXPORT interval? | MCDOUG::MCPHERSON | My object paradigm needs integration... | Thu Nov 07 1991 13:09 | 24 |
| Did you try lengthening the export interval on the Translans (as I suggested
earlier)?
Again, the shortest export interval that *I* was able to make work was 5 min.
That might help lessen the load on the ethernet interface somewhat.
This is just a guess:
When you do an
NCP> sho known line count
NCP> show known circuit count
are you seeing a lot of "System buffer unavailable" or similar counters?
If so, you _may_ be able to alleviate the problem by upping some sysgen
parameters (that elude me right now... Maybe lrpcount? srpcount? dunno. Help?)
I know you're looking for a *solution* and not more questions, but please try to
work with us to isolate the problem.
If anyone else out there has any ideas, please feel free to chime in.
./doug
|
1551.9 | Some explanations | STKMCC::LUND | Niklas Lund | Thu Nov 07 1991 15:49 | 66 |
| Hello Doug !
Thanks for helping us with this problem.
>>Did you try lengthening the export interval on the Translans (as I suggested
>>earlier)?
No, we haven't, and that's because this is a LIVE network monitoring system.
We are managing a big financial Value Added Network, with +30 Translan bridges
in it.
We must be able to detect problems, like broken lines, in the network within
30 seconds.
The utilization graphs that we produce daily for each 64 Kbps line should not
have polling intervals longer than 120s (60s for the most important lines).
We are exporting all line attributes on 5 Translan bridges with +5 synch ports
active.
The exports work well most of the time and we get an RDB database that
has a size of 23000 blocks each day.
The alarms are changed to START with 2 seconds "duration".
Like this
Enable mcc 0 alarms rule bridge1_line2_nofwd, at start (+00:00:02)
Enable mcc 0 alarms rule bridge1_line3_nofwd, at start (+00:00:02)
Enable mcc 0 alarms rule bridge1_line4_nofwd, at start (+00:00:02)
.
.
Remember that we have seen these errors even on systems that have maybe 30%
of the above described load and that the AMs vary from Translan and Bridge to
Ethernet station AM.
The problem is seen most on VAX 4000-300 systems.
I have included two more error messages of the same type that show up
as exceptions, but much less frequently.
Exception: The requested operation cannot be completed
%MCC-E-TRANSMITERROR, error trying to transmit a packet
%SYSTEM-F-DEVINACT, device inactive
Exception: The requested operation cannot be completed
%MCC-E-RECEIVEERROR, error trying to receive a packet
%SYSTEM-F-DEVINACT, device inactive
Exception: The requested operation cannot be completed
%MCC-F-STRTDEVERROR, start Ethernet device failed.
%SYSTEM-F-BADPARAM, bad parameter value
(This one is only seen when using Ethernet station AM)
>>When you do an
>> NCP> sho known line count
>> NCP> show known circuit count
>>are you seeing a lot of "System buffer unavailable" or similar counters?
>>
>>If so, you _may_ be able to alleviate the problem by upping some sysgen
>>parameters (that elude me right now... Maybe lrpcount? srpcount?)
No problems are seen in line and circuit counters, no pool expansion either.
The load on the customer's Ethernet is 5-10% with peaks up to 30%.
Regards Niklas
|
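The Enable commands quoted in .9 follow a regular naming pattern, so for a site with hundreds of rules they could be generated rather than typed. The Python sketch below only builds the command text; the rule name pattern and the `at start (+00:00:02)` clause are copied from the examples in .9, while the function name, the bridge name argument, and the line range are illustrative assumptions. It does not talk to MCC in any way.

```python
from typing import Iterable, List

def enable_commands(bridge: str, lines: Iterable[int],
                    delay: str = "00:00:02") -> List[str]:
    """Build one Enable command per line, using the rule naming seen in .9.

    ASSUMPTION: rules are named <bridge>_line<N>_nofwd, as in the examples.
    """
    return [
        f"Enable mcc 0 alarms rule {bridge}_line{n}_nofwd, at start (+{delay})"
        for n in lines
    ]

# Reproduce the three commands shown in .9 (lines 2, 3 and 4 of bridge1).
for command in enable_commands("bridge1", range(2, 5)):
    print(command)
```

The output can be redirected to a file and reviewed before it is fed to MCC as a command procedure.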
1551.10 | Snake eyes. sorry. | MCDOUG::MCPHERSON | My object paradigm needs integration... | Thu Nov 07 1991 16:06 | 16 |
| Whew. I dunno Niklas.... From your description (and of course the meaning
of the error you're getting) it doesn't sound like there's anything that can
be done to help you from within the Translan AM.
Unless you can get some flexibility on the export interval, there's nothing
further I can think of to help you.
I hope someone else can come up with something.
/doug.
P.S. You do know that Digital must *purchase* the Translan AM for ALL USE
other than for use within DEC and for demo purposes, yes? I trust that the
appropriate monies and licenses have changed hands for the usage of the Translan
AM in this network, or we (Digital) are liable for breach of contract
(among other things).
|
1551.11 | This needs to be hidden from the user. | CHRISB::BRIENEN | DECmcc Bridge|Station|SNMP Management. | Thu Nov 07 1991 17:06 | 25 |
| This error is relatively common (at least to us) when pounding on the
Ethernet device.
It has nothing to do with CPU utilization, and is not a "code bug" in
the MCC_EA Routines (they're just reporting what happens).
There are two possible solutions to the problem, both involve hiding
the problem from the user (and neither is easy to patch into V1.1):
(1) Modify the MCC_EA Routines to do retries when encountering
the device inactive error - this wouldn't be the first time we
did something like this (e.g., there is special code in
place which handles the DELUA "differently")
(2) Tell AM developers to do the retries themselves if they don't
want to bother the user with this information - the set of
AMs using the EA routines is still fairly small, so this
isn't as big of a deal as one would think.
We will be looking at which makes sense very soon. This decision will be
based partly on how long the fix would take to implement and the risk
associated with making the change (e.g., change in the MCC_EA at this
point is more risky than having the AMs do retries).
Chris Brienen
|
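Both options in .11 come down to the same idea: retry the operation a few times when the device reports itself inactive, and only surface the error if the retries run out. The Python sketch below illustrates that idea in general terms. It is not the MCC_EA code or any AM's code; `DeviceInactiveError`, `with_retries`, and `flaky_transmit` are hypothetical names invented for the illustration, and the retry count and delay are arbitrary.

```python
import time

class DeviceInactiveError(Exception):
    """Hypothetical stand-in for a %SYSTEM-F-DEVINACT failure."""

def with_retries(operation, attempts=3, delay_s=0.5, sleep=time.sleep):
    """Call operation(); on DeviceInactiveError retry, up to `attempts` calls."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except DeviceInactiveError:
            if attempt == attempts:
                raise  # out of retries: let the caller see the error
            sleep(delay_s)  # give the device a moment to come back

# Usage sketch: a fake transmit that fails twice, then succeeds.
calls = {"n": 0}

def flaky_transmit():
    calls["n"] += 1
    if calls["n"] < 3:
        raise DeviceInactiveError("device inactive")
    return "sent"

result = with_retries(flaky_transmit, delay_s=0.0)
print(result, "after", calls["n"], "attempts")
```

Whether this wrapper lives in the shared routines (option 1) or in each caller (option 2) is exactly the trade-off .11 describes; the logic itself is the same either way.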
1551.12 | Maybe we'll just _hide_ it next time.. | MCDOUG::MCPHERSON | My object paradigm needs integration... | Thu Nov 07 1991 17:29 | 7 |
| Thanks for the note, Chris.
That which we cannot fix, we hide. Fair enough.
I'll add this to Vitalink's engineering "To Do" list for the next
release of the Translan AM.
/doug
|
1551.13 | Filed as MCC_INTERNAL QAR#1335 [Priority 3] | CHRISB::BRIENEN | DECmcc Bridge|Station|SNMP Management. | Fri Nov 08 1991 15:58 | 0 |
1551.14 | PATCH ? | ANTIK::WESTERBERG | Stefan Westerberg CS Stockholm | Fri Nov 15 1991 05:48 | 9 |
|
When could we expect a patch for this problem ?
At one customer site where we have about 700 alarm rules we get 10 to 32
%MCC-E-TRANSMITERROR per hour !
A patch for this problem is badly needed !
/Stefan
|
1551.15 | Please re-read .11 | TOOK::MCPHERSON | My object paradigm needs integration... | Fri Nov 15 1991 08:26 | 21 |
| I do understand your urgency, but I think Chris made it pretty clear:
The change would need to be made either
a) to the mcc_ea routines or
b) to the AM(s) that are calling the mcc_ea routines (in this case,
the Translan AM)
"a" is fairly risky, given the amount of work that's focused on getting the
1.2 stuff out the door. Also, the mcc_ea routines really are working the way
that they're *supposed* to. The _calling routine_ should really handle the
retries on failure.
"b" is really the _correct_ thing to do, but you'll need to go to Vitalink to
get them to make a patch (or new .exe). AM maintenance *is* Vitalink's
responsibility (personally, I doubt that they'll be able to get you a
patch for the Translan AM any quicker than we could fix the mcc_ea routines).
I know this is exactly what you _don't_ want to hear, but my input is: pick one
option and work it through the appropriate escalation mechanism(s), Digital's
for 'a' and Vitalink's for 'b'.
/doug
|