| Not entirely.
Unfortunately, the files aren't well interlocked for simultaneous access
from multiple cluster nodes (or even multiple instances of NET$CONFIGURE on
the same node).
The config procedure merely builds the new script in a .TMP file, then
renames that file to become the active script when it's done. It may not
even be careful about which .TMP file it picks up, so when it promotes a
.TMP file to the active script, it could actually be grabbing a work in
progress from some other node.
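A minimal sketch of the safer pattern, written in Python on a POSIX host purely as an analogy (it is not NET$CONFIGURE's actual code, and the function name publish_script is hypothetical): each writer builds the new script in a temp file whose name is unique to that writer (node name, process ID and a random suffix), and renames only that file over the active script. This removes the "picked up another node's .TMP" failure, although it does not stop two simultaneous writers from overwriting each other's changes.

```python
import os
import socket
import tempfile


def publish_script(active_path: str, new_contents: str) -> None:
    """Write new_contents to a temp file owned by this writer, then atomically
    replace active_path with it (hypothetical helper, POSIX analogy only)."""
    directory = os.path.dirname(os.path.abspath(active_path))
    # Node name + PID in the prefix, plus mkstemp's random part, make the
    # temp name unique, so no writer can mistake another's file for its own.
    prefix = f".{os.path.basename(active_path)}.{socket.gethostname()}.{os.getpid()}."
    fd, tmp_path = tempfile.mkstemp(prefix=prefix, suffix=".tmp", dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(new_contents)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the data is on disk before the rename
        # Promote only the file this writer created; the rename is atomic.
        os.replace(tmp_path, active_path)
    except BaseException:
        # On failure, clean up our own temp file and never touch anyone else's.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```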
I can think of at least two ways to fix this bug, but obviously haven't
done it yet. For starters, I'd suggest a QAR (or equivalent) to make sure
the problem doesn't get forgotten.
That said, the best we can probably do right now is make sure nothing
catastrophic happens; it's still a terribly bad idea to have multiple
systems simultaneously update a cluster-common application script. In
other words, a "fix" (as suggested in the previous paragraph) would not
eliminate all your problems, because short of moving to some formally
interlocked database, I think we're always going to have problems with two
NET$CONFIGUREs banging on the same config file at the same time.
One question in return. Since I know so little about what goes on outside
the DECnet software proper, it surprises me that something performs
application delete/add operations on startup. Is this a first-time
(configuration) thing, or something ongoing? What processes are involved,
and what's supposed to happen? (I ask so that future edits can better
accommodate whatever software needs they don't accommodate today.)
|
| Thanks for the replies.
I assume that the RSM folks did not trust the net object database held in
the NCL scripts, so they made sure their applications get defined when
their product starts up (which happens on every reboot).
Of course (Vera, you are right!) we also run modified RSM startup command
files. But as you know, whenever you forget to change all the necessary
startup files, you can hit the same problem again. This happens when new
systems are introduced by installing them from the original product files.
Another way to prevent concurrent modifications by multiple NET$CONFIGURE
processes would be to maintain an RMS file in the common area and take a
clusterwide lock on it for synchronisation. The disadvantage is that this
could slow down the boot process even more, but a meaningful OPCOM message
would help in tracking that behaviour.
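The locking idea can be sketched the same way. This Python version is only an analogy under loud assumptions: it takes an advisory fcntl.flock on a lock file, which serializes writers on a single POSIX host only. A real clusterwide lock on VMS would go through the distributed lock manager ($ENQ/$DEQ), not anything shown here. The name exclusive_lock, the timeout values, and the stderr message (standing in for an OPCOM message) are all hypothetical.

```python
import contextlib
import fcntl
import os
import sys
import time


@contextlib.contextmanager
def exclusive_lock(lock_path: str, timeout: float = 30.0, poll: float = 0.5):
    """Hold an exclusive advisory lock on lock_path for the with-block
    (hypothetical helper; host-local, not clusterwide)."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        deadline = time.monotonic() + timeout
        warned = False
        while True:
            try:
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                break  # lock granted
            except BlockingIOError:
                if not warned:
                    # Stand-in for an OPCOM message: say why startup is waiting.
                    print(f"waiting for configuration lock {lock_path}", file=sys.stderr)
                    warned = True
                if time.monotonic() >= deadline:
                    raise TimeoutError(f"could not lock {lock_path} within {timeout}s")
                time.sleep(poll)
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the flock
```

A writer would wrap its entire read-modify-write of the script in `with exclusive_lock(...)`, so a second configuration run waits (and reports that it is waiting) instead of interleaving its edits with the first.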
To complete this note I will open an IPMT.
Regards
Kurt
|