T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
331.1 | SCS services, maybe? | FROST::HARRIMAN | DEC 41-BLANK-03, Harriman,Paul J., qty 1 | Thu Oct 09 1986 09:53 | 1 |
|
|
331.2 | Read Chapter 12 of System Services Ref. Man. | QUILL::NELSON | JENelson | Thu Oct 09 1986 13:51 | 28 |
|
Yes, the lock manager will do exactly what you want.
Here's what you do:
Your scheduler process gets started (on just one node of your
cluster!) and $ENQs a lock for EXclusive access, then converts
the lock to PW, specifying a value block (value blocks are discussed
later) and a blocking AST routine called DOORBELL. The scheduler
then $HIBERnates.
When the DOORBELL routine runs, it converts the lock to NL mode,
then $ENQWs a request to get the lock back in EX mode. Once
the request is satisfied, you copy the value block to local
storage, re-initialize the value block, and convert the lock
back to PW, specifying DOORBELL as your blocking AST routine.
The scheduler can then act on the contents of the value block.
Value blocks are used to pass information (up to 16 bytes) between
processes.
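For reference, the value block is just 16 extra bytes tacked onto the
end of the lock status block you pass to $ENQ. In C it might be
declared like this (the type name is one I made up):

    #include <lckdef.h>             /* LCK$K_..MODE, LCK$M_.. flags    */

    /* Lock status block with the optional 16-byte value block on the
       end.  The lock manager copies the block IN to your LKSB when a
       request is granted, and OUT to the lock database when you
       convert down from PW or EX -- in both cases only if you set
       LCK$M_VALBLK in the $ENQ flags.                                 */
    typedef struct {
        unsigned short status;      /* SS$_... completion status       */
        unsigned short reserved;
        unsigned int   lkid;        /* lock id, filled in by $ENQ      */
        char           valblk[16];  /* the value block itself          */
    } LKSB;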
The program that wants to communicate with the scheduler $ENQWs
a request for the lock in EX mode. Once granted, it fills in
the value block, and converts the lock to NL mode.
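In C, the writer's side might look something like this sketch (the
resource name SCHED_DOORBELL and the helper are made up for
illustration; error checking is mostly omitted, and the final $DEQ is
just tidying up):

    #include <descrip.h>
    #include <lckdef.h>
    #include <starlet.h>
    #include <string.h>

    typedef struct {                /* lock status block + value block */
        unsigned short status, reserved;
        unsigned int   lkid;
        char           valblk[16];
    } LKSB;

    static $DESCRIPTOR(resnam, "SCHED_DOORBELL");  /* made-up name     */

    int send_to_scheduler(const char msg[16])
    {
        LKSB w;
        /* $ENQW args: efn, lkmode, lksb, flags, resnam, parid,
           astadr, astprm, blkast, acmode, nullarg.  Queue a NEW
           EX-mode lock and wait; the grant is what fires the
           scheduler's DOORBELL blocking AST.                          */
        int st = sys$enqw(0, LCK$K_EXMODE, &w, LCK$M_VALBLK, &resnam,
                          0, 0, 0, 0, 0, 0);
        if (!(st & 1)) return st;
        memcpy(w.valblk, msg, 16);  /* fill in the message             */
        /* Converting down to NL with LCK$M_VALBLK writes the value
           block back into the lock database for the scheduler.        */
        sys$enqw(0, LCK$K_NLMODE, &w, LCK$M_CONVERT | LCK$M_VALBLK,
                 0, 0, 0, 0, 0, 0, 0);
        return sys$deq(w.lkid, 0, 0, 0);
    }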
Hope this has been clear.
JENelson
|
331.3 | will try it, thanks - | RUMOR::FALEK | The TU58 King | Thu Oct 09 1986 14:27 | 4 |
| Thanks, I'll write a little test program to try this.
(I also entered this as note 1731 in VMSnotes and got some response
there; it says essentially the same thing as .2)
|
331.4 | | CLT::GILBERT | eager like a child | Thu Oct 09 1986 19:37 | 31 |
| Yes, but what if...
What if two processes simultaneously want to communicate with
the scheduler? Processes A and B $ENQW EX mode lock requests.
The scheduler's DOORBELL routine runs, converts the lock to
NL mode, and $ENQs a request to get the lock back in EX mode
(your note had this last one as '$ENQW', which is incorrect).
Now process A is granted the EX lock, fills the value block,
and converts to NL mode. Process B gets the lock, fills the
value block (thereby trashing what process A wrote there), and
converts to NL mode. Then the scheduler is granted its EX lock.
Note that the scheduler receives only one of the two messages.
Instead, a process that wants to communicate with the scheduler
should $ENQ (it's okay to wait here, if you like) an EX mode request.
When it's granted, check whether the value block already contains a
message. While there's a message in the value block, convert to NL
mode, and $ENQ another EX mode conversion request. When there isn't
a message in the value block, fill the value block, and convert the
lock to NL mode.
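Here's a rough C sketch of that writer loop (the resource name, the
helper, and the all-zeros-means-no-message convention are placeholders
of mine; a real program would check every status):

    #include <descrip.h>
    #include <lckdef.h>
    #include <starlet.h>
    #include <string.h>

    typedef struct {                /* lock status block + value block */
        unsigned short status, reserved;
        unsigned int   lkid;
        char           valblk[16];
    } LKSB;

    static $DESCRIPTOR(resnam, "SCHED_DOORBELL");  /* placeholder name */

    static int block_empty(LKSB *l) /* no message pending?             */
    {
        int i;
        for (i = 0; i < 16; i++)
            if (l->valblk[i] != 0) return 0;
        return 1;
    }

    int send_to_scheduler(const char msg[16])
    {
        static LKSB w;
        /* NEW EX-mode lock; it's okay to wait here.                   */
        sys$enqw(0, LCK$K_EXMODE, &w, LCK$M_VALBLK, &resnam,
                 0, 0, 0, 0, 0, 0);
        /* While an earlier message is still unconsumed, step aside
           and requeue behind the scheduler's pending conversion.      */
        while (!block_empty(&w)) {
            sys$enqw(0, LCK$K_NLMODE, &w, LCK$M_CONVERT,
                     0, 0, 0, 0, 0, 0, 0);
            sys$enqw(0, LCK$K_EXMODE, &w, LCK$M_CONVERT | LCK$M_VALBLK,
                     0, 0, 0, 0, 0, 0, 0);
        }
        memcpy(w.valblk, msg, 16);
        /* The down-conversion to NL stores the message.               */
        return sys$enqw(0, LCK$K_NLMODE, &w,
                        LCK$M_CONVERT | LCK$M_VALBLK,
                        0, 0, 0, 0, 0, 0, 0);
    }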
And before the scheduler's DOORBELL routine converts the lock
to NL mode, it should clear the value block to indicate that
it contains no message.
Instead of clearing the value block, you could use the flags (to
the $ENQ service) and a lock-status-block status of SS$_VALNOTVALID
to indicate whether the value block contains a message.
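And a matching rough sketch of the scheduler side with both fixes
folded in; note the $ENQ (not $ENQW) when re-requesting EX inside the
blocking AST (again, the names are placeholders and status checks are
omitted):

    #include <descrip.h>
    #include <lckdef.h>
    #include <starlet.h>
    #include <string.h>

    typedef struct {
        unsigned short status, reserved;
        unsigned int   lkid;
        char           valblk[16];
    } LKSB;

    static LKSB lksb;
    static char message[16];        /* local copy of the value block   */
    static $DESCRIPTOR(resnam, "SCHED_DOORBELL");

    static void doorbell(int astprm);

    /* Completion AST: our queued EX conversion was granted, so a
       writer has come and gone and the value block holds its message. */
    static void got_msg(int astprm)
    {
        memcpy(message, lksb.valblk, 16);
        /* Down to PW (granted immediately), re-arming DOORBELL.       */
        sys$enqw(0, LCK$K_PWMODE, &lksb, LCK$M_CONVERT,
                 0, 0, 0, 0, doorbell, 0, 0);
        sys$wake(0, 0);             /* let the main loop act on it     */
    }

    /* Blocking AST: a writer is asking for the lock in EX mode.       */
    static void doorbell(int astprm)
    {
        memset(lksb.valblk, 0, 16); /* clear: "no message"             */
        /* Down-convert to NL; LCK$M_VALBLK stores the cleared block.
           Down-conversions are granted at once, so waiting here is
           safe even at AST level.                                     */
        sys$enqw(0, LCK$K_NLMODE, &lksb, LCK$M_CONVERT | LCK$M_VALBLK,
                 0, 0, 0, 0, 0, 0, 0);
        /* $ENQ, not $ENQW: queue the EX conversion and return.        */
        sys$enq(0, LCK$K_EXMODE, &lksb, LCK$M_CONVERT | LCK$M_VALBLK,
                0, 0, got_msg, 0, 0, 0, 0);
    }

    int main(void)
    {
        /* New lock in EX, then down to PW with the blocking AST.      */
        sys$enqw(0, LCK$K_EXMODE, &lksb, 0, &resnam,
                 0, 0, 0, 0, 0, 0);
        sys$enqw(0, LCK$K_PWMODE, &lksb, LCK$M_CONVERT,
                 0, 0, 0, 0, doorbell, 0, 0);
        for (;;) {
            sys$hiber();            /* DOORBELL/GOT_MSG wake us        */
            /* ... act on message[] ... */
        }
    }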
|
331.5 | DECnet is not so bad | TAV02::NITSAN | Nitsan Duvdevani, Digital Israel | Thu Oct 16 1986 05:29 | 8 |
| re .0
> I don't want the overhead of setting up DECnet links to the node the
> scheduler is running on.
In a small "benchmark" we ran about a year ago (on a small cluster),
DECnet communication (over the CI) was more efficient than the
distributed lock manager.
|
331.6 | DECnet links don't HAVE to be slow | CRATE::COBB | Danny Cobb, DSS Eng, LKG | Mon Oct 20 1986 13:18 | 6 |
| Lou, instead of creating/deleting processes for logical links,
write your own program that declares itself a DECnet object and
handles the incoming links. It's fast, the multithreading isn't
too tough to handle, and your connects are practically instantaneous.
Danny
|
331.7 | Lock manager mechanism works great! | RUMOR::FALEK | ex-TU58 King | Wed Oct 22 1986 14:51 | 14 |
| Re: .6 (Having the server declare itself as a network object and
handle incoming DECnet connects) - that would certainly work, and has
the advantage (not relevant for this particular application) of also
working on a wide-area network. But... I've coded the mechanism
described in .2 and .4 using $ENQ and it works great!! Connections
appear to the user as being nearly instantaneous. I don't think
DECnet connects to a server can work as quickly.
The distributed lock manager is neat stuff! The first time I read
the documentation it seemed confusing, but it is simple to use once
you get the concepts down. Our group is putting all our workstations
on a LAVC, so maybe I can use some of my new-found knowledge to make
some useful cluster utilities, like f'rinstance something to cause
execution of a VMS command on all nodes of a cluster at once.
|
331.8 | How can I pass a message to all nodes of cluster? | FALEK::FALEK | ex-TU58 King | Sat Nov 15 1986 23:57 | 72 |
| I now have my job-creating scheduler program's user-interface working
cluster-wide, using mechanisms discussed earlier in this note.
Users can talk to the scheduler from any node, but jobs get created
only on the node the scheduler is running on. There can be only
one copy of the scheduler running per cluster.
It would be useful to generalize things further. I'd like to add
a "NODE" field to the (common diskfile) database and run a scheduler
on EACH node of the cluster. If the node field in the database
is blank, the job may be run on any CPU in the cluster, otherwise
just on the specified CPU.
I came up with a lock-based communications mechanism to help implement
this, but when I tried it out, I found that my design was wrong. I'm
going to describe it here (and why it doesn't work) in hopes that
readers of this note can offer hints to help me get around the
problem.
My design has one of the schedulers be "master", and all others
slaves. The master is the one that holds the "KO" lock in exclusive
mode. Since a scheduler never gives this up once it gets it, the
master is the first scheduler started in the cluster. If the node
crashes or someone kills the master process, one of the slaves
will get the KO_LOCK and the KO_AST that goes along with the lock
will make it the new master. The KO_AST sets a bit in the scheduler
that tells it that it is the master, and also sets up the
user-interface lock mechanism described in earlier responses to
this note. The user-interface always talks to the master scheduler.
(This stuff, the passing of master-ship and the user-interface always
talking to the master from any node, works correctly)
Now let's say that the master wants to send a message to one of the
slaves. Messages consist of a flag-byte, a destination node, and the
message text. My original (doesn't work) design had each scheduler, at
startup time, enqueue a request for the "Round Robin" (RR) lock
in exclusive mode, specifying the "RR_AST" to be delivered when
the EX-mode request is granted.
When the scheduler acting as master gets the EX-mode RR-lock, it keeps
it until it has a message to circulate to all the slaves. To circulate
a message, it puts the message in the value block, downgrades the lock
to NL-mode, and $ENQs another request to get the lock back in EX-mode.
When a scheduler acting as slave gets the lock, it reads the message
in the value block, sets the flag-byte to "H" if the message was
for it, and then downgrades the lock to NL, thereby letting the next
guy get it. It then $ENQs a request to get the lock again in EX-mode.
Eventually the master should get the EX-mode lock again, look at the
flag byte to see if the message was accepted by anybody, and then keep
the lock in EX-mode until it has another message to circulate. I
reasoned that when the master downgrades the lock and then requests
the upgrade, the conversion request goes at the end of the queue and
so all slaves should get a crack at it before the master gets it again.
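In C, the moving parts of this design look roughly like the following
(the lock name, the value-block layout, and the stubbed node test are
illustrative choices of mine; error checking omitted):

    #include <descrip.h>
    #include <lckdef.h>
    #include <starlet.h>
    #include <string.h>

    typedef struct {
        unsigned short status, reserved;
        unsigned int   lkid;
        char           valblk[16];  /* [0] = flag byte, rest = node + message */
    } LKSB;

    static LKSB rr;
    static $DESCRIPTOR(rr_resnam, "RR_LOCK");      /* illustrative name */

    /* Hypothetical test: does the message name this node?  A real
       version would compare against SYI$_NODENAME from $GETSYIW.      */
    static int for_this_node(const char *valblk)
    {
        return 0;                   /* stub                            */
    }

    /* --- slave side ------------------------------------------------ */

    static void rr_ast(int astprm)  /* completion AST: we now hold EX  */
    {
        if (for_this_node(rr.valblk)) {
            /* ... act on the message ... */
            rr.valblk[0] = 'H';     /* mark it accepted                */
        }
        /* Pass the lock on: down to NL (storing the flag byte), then
           queue another EX conversion with this same AST.             */
        sys$enqw(0, LCK$K_NLMODE, &rr, LCK$M_CONVERT | LCK$M_VALBLK,
                 0, 0, 0, 0, 0, 0, 0);
        sys$enq(0, LCK$K_EXMODE, &rr, LCK$M_CONVERT | LCK$M_VALBLK,
                0, 0, rr_ast, 0, 0, 0, 0);
    }

    static void slave_startup(void)
    {
        /* NEW EX-mode request; RR_AST fires when (if!) it's granted.  */
        sys$enq(0, LCK$K_EXMODE, &rr, LCK$M_VALBLK, &rr_resnam,
                0, rr_ast, 0, 0, 0, 0);
    }

    /* --- master side (already holds the RR lock at EX) ------------- */

    static int circulate(const char *msg)
    {
        rr.valblk[0] = 0;           /* flag byte: not yet accepted     */
        strncpy(rr.valblk + 1, msg, 15);
        /* Publish: the down-conversion to NL stores the value block.  */
        sys$enqw(0, LCK$K_NLMODE, &rr, LCK$M_CONVERT | LCK$M_VALBLK,
                 0, 0, 0, 0, 0, 0, 0);
        /* Requeue for EX; the conversion goes to the end of the
           queue, so the slaves should get their turns first.          */
        sys$enqw(0, LCK$K_EXMODE, &rr, LCK$M_CONVERT | LCK$M_VALBLK,
                 0, 0, 0, 0, 0, 0, 0);
        return rr.valblk[0] == 'H'; /* did any slave take it?          */
    }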
Unfortunately, my mechanism only works for up to 1 slave.
The problem is that new slaves can never get the lock for the first
time (even if they first request it in NL: mode and try to upgrade)
because apparently the lock CONVERSION queue must be empty before any
NEW locks can ever be granted. Since in my scheme either the first slave
or the master will always have an outstanding request for a conversion
of the RR lock to EX-mode, no new slaves can ever acquire the lock for the
first time.
Is this analysis correct? I've gotta believe VMS must do this sort of
thing all the time, so there must be some way to do it!
Can anybody think of a scheme whereby the master can pass a message to
the servers on ALL nodes of the cluster?
lou falek
|
331.9 | see also CSSE32::CLUSTER | FALEK::FALEK | ex-TU58 King | Thu Nov 20 1986 23:09 | 6 |
| I posted my question about using the lock manager to send a message
to all nodes of a cluster to CSSE32::CLUSTER (note 302) and got many
useful suggestions. There are pitfalls, because the rules about which
conversions can and can't be blocked by what are complex, and the
documentation doesn't make them all that clear.
But what I want can be made to work.
|