T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
1244.1 | link down | STAR::CYPRYCH | | Tue Aug 08 1989 17:42 | 6 |
| %decw-e-connectabort (my spelling may be slightly off)
means that the link between the server and the client was disconnected.
The client generates this message if the server node becomes
unreachable, or if the logical link (depending on what type of
transport you're using) goes down.
|
1244.2 | decw-e-cnxabort | STAR::CYPRYCH | | Tue Aug 08 1989 17:44 | 2 |
| the spelling is %decw-e-cnxabort (.1)
|
1244.3 | And the answer IS.. | HYDRA::COAR | Have you mutated yet to-day? | Thu Aug 17 1989 14:10 | 9 |
| The implication here is that, if your applications bomb with this message, your
transport is flakey. In the case of DECnet, it is equivalent to
%SYSTEM-F-PATHLOST, path to network partner node lost
(among others). N'est-ce pas?
#ken :-)}
|
1244.4 | basically true | STAR::CYPRYCH | | Thu Aug 17 1989 15:40 | 18 |
| Yes, basically, although your transport isn't necessarily
"flakey" (though it could be)...
You could have just rebooted the server node, which disconnects
the network link. Someone could have shut DECnet down.
Basically: the server could have shut down, the machine
could have shut down, DECnet could have shut down,
the link could have aborted (for a flakey reason),
or the node could have become "unreachable".
So yes, "path to network partner node lost" is what happens,
but there can be many reasons. Taking down the session
altogether disconnects links too.
I think that covers most of the reasons... but there may
be one or two more.
|
1244.5 | Or MAX BROADCAST NONROUTERS too low | SEWANE::MASSEY | I left my heart in Software Services. | Thu Aug 31 1989 15:21 | 23 |
| Here's the solution that worked for us in St. Louis:
<<< QUEEN::PIX1:[PUBLIC.NOTES]EPIC.NOTE;6 >>>
-< You can't go wrong with DECwrite >-
================================================================================
Note 1959.21 DECwrite or DECwindows error? 21 of 21
DCC::HAGARTY "Essen, Trinken und Shaggen..." 13 lines 18-AUG-1989 04:24
-< Network configuration! >-
--------------------------------------------------------------------------------
Ahhh Gi'day...
Sounds like the infamous BROADCAST NONROUTERS problem! MAKE SURE THAT
THIS IS DONE ON ALL SYSTEMS IN THE LAN, but firstly on yours...
Count the number of nonrouters on the LAN (say, in the region of 300),
and do a:
$ MC NCP SET EXEC MAX BROADCAST NONROUTERS 512
$ MC NCP DEF EXEC MAX BROADCAST NONROUTERS 512
This will stop the timeouts happening to the other nodes in the LAN! On
big machines, make it 1024!
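(To check that both databases took the new value -- SHOW reads the
running, volatile value and LIST reads the permanent one:)
$ MC NCP SHOW EXECUTOR CHARACTERISTICS
$ MC NCP LIST EXECUTOR CHARACTERISTICS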
|
1244.6 | Only applicable to routers, not endnodes | MIPSBX::thomas | The Code Warrior | Thu Aug 31 1989 18:32 | 1 |
|
|
1244.7 | I'm experiencing a similar kind of problem... | ASHBY::FEATHERSTON | Ed Featherston | Wed Sep 06 1989 17:09 | 13 |
| I have an MI cluster with two 8800s and 34 VAXstations, all running DECwindows.
Almost all the DECwindows applications run remotely on the 8800s. At
least once or twice a day, everyone loses from one to all of their remote
DECwindows applications at the same time. The logfiles show the connection
aborted error message. I am going nuts trying to figure out the cause. The
applications that get aborted are not all on the same 8800, and some
applications keep running without a problem.
(Further info: each 8800 has 128 MB of memory and runs VMS 5.1 and DECwindows
V1; the VAXstations are a mixture of VS-IIs, VS-II/GPXs, and VS-2000s.)
Any ideas as to what I should be looking for/at?
|
1244.8 | Boot requests | CASEE::CLEOVOULOU | Marios Cleovoulou | Thu Sep 07 1989 08:04 | 26 |
| I'll bet the cause is either cluster transitions or boot request
multicasts. We almost totally cured the same problem by:
a) isolating our MI cluster from the rest of the ethernet with bridges,
b) giving NETACP lots of memory, by use of the NETACP$... logicals.
   Note: when a boot request comes in, NETACP goes through the
   _entire_ nodename database, starting at node 1.1 and going upwards,
   looking for an entry with a matching HW address (implemented by
   people in area 2, right :-). NETACP runs at high priority and pages
   the system to death if it doesn't have enough memory.
c) defining the boot parameters for our satellites against "fake"
   nodes in area 1, so NETACP finds them quickly (we are really in
   area 51!),
d) defining HW addresses but not load assist parameters under fake area
   1 nodes for nodes still on our ethernet segment but NOT part of
   our cluster, so that NETACP finds them and FAILS to load them
   quickly, rather than thrashing through the entire database only to
   not find them (see the sketch after this list).
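(A sketch of what c) and d) look like in NCP; the node numbers, names, and
hardware addresses below are made up, and the load assist entries are
whatever CLUSTER_CONFIG would normally have set up for your satellites:)
$ MC NCP
NCP> ! c) a satellite we DO load, defined under a fake area 1 node
NCP> DEFINE NODE 1.10 NAME SAT1 HARDWARE ADDRESS 08-00-2B-01-02-03 -
         LOAD ASSIST AGENT SYS$SHARE:NISCS_LAA.EXE -
         LOAD ASSIST PARAMETER $1$DUA0:[SYS10.]
NCP> ! d) a node on our segment but NOT in our cluster: HW address only,
NCP> !    no load parameters, so NETACP fails the request quickly
NCP> DEFINE NODE 1.11 NAME OTHER1 HARDWARE ADDRESS 08-00-2B-04-05-06
NCP> EXIT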
Regards,
Marios
|
1244.9 | re: boot requests and cluster transition | ASHBY::FEATHERSTON | Ed Featherston | Thu Sep 07 1989 09:14 | 14 |
| Thanks for the info. We already have some of your suggestions in place:
1. We are on a separate ethernet segment isolated behind a bridge
   (a requirement for clusters in Hudson).
2. We had already given NETACP lots of memory (WSQUOTA of 10000;
   in the last 14 days of uptime the max working set size was 7600).
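(For the record, roughly how that was done; the NETACP$ logical names here
are an assumption from memory -- check them against the DECnet-VAX
documentation. They must be defined before DECnet is started:)
$ ! Assumed logical names -- verify before use
$ DEFINE/SYSTEM/EXECUTIVE_MODE NETACP$MAXIMUM_WORKING_SET 10000
$ DEFINE/SYSTEM/EXECUTIVE_MODE NETACP$EXTENT 10000
$ @SYS$MANAGER:STARTNET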
We are in area 6, so the search doesn't take long, but I hadn't thought
about the impact of systems we don't load being on the cable. I like the idea
of using fake area 1 nodes to handle that and will give it a try.
Is there anything that can help the cluster transitions?
|
1244.10 | | MARVIN::WARWICK | Well, that'll never work | Thu Sep 07 1989 09:46 | 13 |
|
RE: Cluster transitions.
Assuming that cluster transitions really are causing your problem (use
SHOW CLUSTER to see whether a node entering or leaving coincides with
your problem occurring -- there's a sketch below), see the conference
ELKTRA::CLUSTER, where the subject is discussed at interminable length!
There are several things you can do to tune a cluster to make the
transitions short. I have a 34-38 node LAVC with two uVAX 3600s as boot
nodes, and we just do not notice satellites coming and going at all.
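(A quick way to catch transitions in the act -- a sketch; the SHOW CLUSTER
field name is from memory:)
$ SHOW CLUSTER/CONTINUOUS
Command> ADD TRANSITION_TIME
Then compare the transition timestamps against the times the windows die.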
Trevor
|
1244.11 | Update on reply .7 | ROLL::FEATHERSTON | Ed Featherston | Wed Sep 13 1989 10:46 | 11 |
| We seem to have minimized the frequency of the problem using the
MAX BROADCAST NONROUTERS suggestion in an earlier reply, but we have one
guaranteed way of producing it: adding a new satellite
to the cluster (nodes coming and going don't appear to trigger the
problem, though). We are now trying to isolate what actually happens
differently at that time (while the users scream, since each time
we add a new node they are guaranteed to lose some windows) as opposed
to the normal comings and goings of satellites.
/ed/
|
1244.12 | Update on the update | ASHBY::FEATHERSTON | Ed Featherston | Wed Sep 20 1989 10:32 | 7 |
| We seem to have the last of the problem solved. When adding new nodes into the
cluster, the local disk of the new node was initially MSCP-served so that page
and swap files could be built. By not taking this option, we were able to add
new nodes to the cluster without any disconnects.
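(For anyone who wants to pin the serving behaviour down explicitly rather
than relying on the CLUSTER_CONFIG prompt, the relevant SYSGEN parameters
are MSCP_LOAD and MSCP_SERVE_ALL -- a sketch, with values meant as "load
the MSCP server but serve no local disks"; check them for your VMS version:)
$ ! In SYS$SYSTEM:MODPARAMS.DAT on the node, then run AUTOGEN:
MSCP_LOAD = 1        ! load the MSCP server at boot
MSCP_SERVE_ALL = 0   ! do not serve local disks to the cluster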
/ed/
|
1244.13 | | DECWIN::JMSYNGE | James M Synge, VMS Development | Wed Sep 20 1989 17:18 | 4 |
| Why do you think this would make a difference?
James
|
1244.14 | Not sure as to why it made a difference... | ROLL::FEATHERSTON | Ed Featherston | Fri Sep 22 1989 13:28 | 20 |
| ...when we determined that adding a node caused the problem, whereas rebooting
did not, the only obvious difference between the two scenarios was the
MSCP serving, so we tried it and, voila, adding a node no longer caused the
problem.
A guess as to the reason: when the disk is MSCP served, all the satellites
are forced to see the disk, which requires some amount of resource in the
page pool area. All the workstations are fairly tight on resources, so
possibly this was enough to push them over the edge (just a guess; we don't
have the time or resources to verify it).
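(If anyone wants to test that guess, pool usage on a workstation can be
eyeballed with:)
$ SHOW MEMORY/POOL/FULL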
As a side note, we are still not preventing it 100% of the time, but with the
changes mentioned previously we have drastically reduced the frequency of the
occurrences. (The MAX BROADCAST NONROUTERS item makes a BIG difference for us.
The value was inadvertently reduced on one of our nodes the other day, and
suddenly people were losing stuff left and right. As soon as we raised it back
up, things stabilized very quickly.)
/ed/
|