T.R | Title | User | Personal Name | Date | Lines |
---|
2491.1 | sys$cluster_node logical | CHRLIE::HUSTON | | Tue Mar 30 1993 13:03 | 36 |
|
If your customer is running the FCS at distribution level 0 then there
is one partition per cluster. It has up to n+1 names, where n is the
number of nodes in the cluster and the +1 is for a cluster alias,
if present.
If they are running the FCS at distribution level 1 and have a DNS
namespace set up with partition objects in it, then they can have
any number of partitions on the cluster.
I assume they have level 0 running and no DNS namespace.
Do both systems share the OA$DATA_SHARE directory? This is where the
partition.dat file lives, which is what defines what drawers are in
the partition.
One way I have seen to get the no Distribution error is that the
logical sys$cluster_node is incorrectly defined. It should have the
two colons on the end. In other words, if the following happens:
$sho log sys$cluster_node
REVEALS: nod as opposed to node:: then you will see this happen.
The FCS blindly chops off the two characters from the length of the
logical. It does not first check to see if the logical is correct.
The bug number for this is THR-17429 in the IOSG data base.
If this is the case, have your customer find out why sys$cluster_node
is wrong, fix it and re-start the FCS. Make sure that when the system
is rebooted sometime in the future that the logical is correctly
defined.
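To illustrate the truncation described above, here is a minimal Python
sketch. It is not the FCS source code; the function names and the sample
value NODE are hypothetical, and it only assumes that the server drops
the last two characters of the logical without validating them.

```python
def blind_chop(logical_value):
    """Mimic the unchecked behaviour: always drop the last two characters."""
    return logical_value[:-2]


def checked_node_name(logical_value):
    """Hypothetical safer variant: insist on the trailing '::' before stripping."""
    if not logical_value.endswith("::"):
        raise ValueError(
            "sys$cluster_node should end in '::', got %r" % logical_value
        )
    return logical_value[:-2]


# Correctly defined logical: the colons are removed, the node name survives.
print(blind_chop("NODE::"))

# Incorrectly defined logical (no colons): two characters of the name are lost.
print(blind_chop("NODE"))
```

With the logical defined correctly the sketch yields NODE; with the colons
missing it yields NO, which is a name that would match no partition.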
--Bob
|
2491.2 | | KERNEL::LOAT | Keep passing the open windows... | Tue Mar 30 1993 15:08 | 18 |
|
Hi Bob
Both partition records point to OA$DATA_SHARE:PARTITION.DAT and both
nodes share the OA$DATA area.
They are running the FCS at distribution level 0. I've asked the
customer to check the logical SYS$CLUSTER_NODE, and this is defined as
ALIAS::, which looks okay to me.
If they are using a cluster alias, should they have a partition record
for the cluster alias as well as the two nodes?
Ideas?
Steve.
|
2491.3 | Turn on FCS tracing and try again | CHRLIE::HUSTON | | Tue Mar 30 1993 17:54 | 15 |
|
Steve,
All the cluster alias does is give users another way to name drawers
on that system; they will all be in the same partition.
Have him turn on FCS tracing and then do whatever he is doing to
get the no DSO license returned. This should show who they are
trying to connect to. Look for a record like "remote connect
started" or something like that; it should show what partition they
tried to connect to. This will give some indication of whether it is
their system or whether they really are trying to go out over the
network.
--Bob
|
2491.4 | More info. | KERNEL::LOAT | Keep passing the open windows... | Wed Apr 07 1993 16:29 | 136 |
|
Well, Bob, here's the information. The customer was logged into node
GIV002 under the account LAWTON_RI_YOUK. He was accessing a drawer
owned by an account YORIL. The node GIV002 is in a cluster along with
GIV001, and the cluster alias is GIV003. There is a partition record
for each node (GIV001 and GIV002) which points to the same data file.
When he did this, he was trying to create a document in a shared
drawer, and it said 'Creating document', then it said 'Document will
not be created.' Doing a Gold-W gave the message 'Partition not
found.', and then the message that he didn't have the DSO license.
He's got an FCS running on each node correctly, and both are defined
as LOCAL.
Below is the trace log file for the FCS.
Does this help?
Steve
--------------------------------------------------------------------------------
SESSION ID: 6885904
OAFC FUNCTION: OafcSetServer
TRACE EVENT: Task Complete
EVENT TIME 5-APR-1993 15:07:08.02
STATUS: 55803913
STRING1 IS: ALLIN1
SESSION ID: 6865904
OAFC FUNCTION: OafcShowServer
TRACE EVENT: Task 5tart
EVENT TIME: 5-APR-1993 15:07:09.02
STRING1 IS: ALLIN1
SESSION ID: 6885904
OAFC FUNCTION: OafcShowserver
TRACE EVENT: Task Complete
EVENT TIME: 5-APR-1993 15:07:09.08
STATUS: 55803913
STRING1 IS: ALLIN1
SESSION ID: 6687024
OAFC FUNCTION: OafcOpenCabinetW
TRACE EVENT: Task Start
EVENT TIME: 15-APR-1993 15:08:12.48
SESSION ID: 6887024
OAFC FUNCTION: OafcOpenCabinetW
TRACE EVENT: Connection Rcv'd
EVENT TIME: 5-APR-1993 15:08:12.61
FILE CABINET NAME: GIV002.LAWTON_RI_YOUK
STRING1 IS: GIVOO2
STRING2 IS: YORIL
SESSION ID: 6887024
OAFC FUNCTION: OafcOpenCabinetW
TRACE EVENT: Connection Granted
EVENT TIME: 5-APR-1993 15:08:13.86
FILE CABINET NAME: GIV002.LAWTON_RI_YOUK
STRING1 IS: YORIL
SESSION ID: 6867024
OAFC FUNCTION: OafcOpenCabinetW
TRACE EVENT: Task Complete
EVENT TIME: 5-APR-1993 15:O8:13.91
FILE CABINET NAME: GIV002.LAWTON_RI_YOUK
STATUS: 55803913
STRING1 IS: YORIL
SESSION ID: 6887024
OAFC FUNCTION: OafcListW
TRACE EVENT: Task Start
EVENT TIME: 5-APR-1993 15:06:14.05
FILE CABINET NAME: GIV002.LAWTON_RI_YOUK
STRINGl IS: YORIL
SESSION ID: 6887024
OAFC FUNCTION: OafcListW
TRACE EVENT: Task Complete
EVENT TIME: 5-APR-1993 15:00:15.86
FILE CABINET NAME: GIV002.LAWTON_RI_YOUK
STATUS: 55603970
STRING1 IS: YORIL
SESSION ID: 6887024
TRACE EVENT: Disconnect Done
EVENT TIME: 5-APR-1993 15:08:15.92
FILE CABINET NAME: GIV002 .LAWTON_RI_YOUK
SESSION IO: 6885904
OAFC FUNCTION: OafcShowServer
TRACE EVENT: Task Stast
EVENT TIME: 5-APR-1993 15:08:31.69
STRING1 IS: ALLIN1
SESSION ID: 6885904
OAFC FUNCTION: OafCShowServer
TRACE EVENT: Task Complete
EVENT TIME: 5-APR-1993 15:08:37.87
STATUS: 55603913
STRING1 IS: ALLIN1
SESSION ID: 6885904
OAFC FUNCTION: OafcShowServer
TRACE EVENT: Task Start
EVENT TIME: 5-APR-1993 15:08:42.20
STRINGl IS: ALLIN1
SESSION IO: 6885904
OAFC FUNCTION: OafcShowServer
TRACE EVENT; Task Complete
EVENT TIME: 5-APR-1993 15:08:42.46
STATUS: 55803913
STRING1 IS: ALLIN1
SESSION ID: 6885904
OAFC FUNCTION: OafcSetServer
TRACE EVENT: Task Start
EVENT TIME: 5-APR-1993 15:05:43.12
STRING1 IS: ALLIN1
|
2491.5 | something is confused, I know I am | CHRLIE::HUSTON | | Wed Apr 07 1993 17:05 | 112 |
|
There are some things here that are not making sense to me, maybe
there is just a misunderstanding between us.
>GIV001, and the cluster alias is GIV003. There is a partition record
>for each node (GIV001 and GIV002) which points to the same data file.
What do you mean there is a partition record for each node? Partition
record where? You should have one file: OA$DATA_SHARE:PARTITION.DAT
and it should contain drawer records, not node records. You say you
have two of these files, this is wrong unless you are using DNS
naming.
>When he did this, he was trying to create a document in a shared
>drawer, and it said 'Creating document', then it said 'Document will
>not be created.' Doing a Gold-W gave the message 'Partition not
>found.', and then the message that he didn't have the DSO license.
The trace file does not show a create being attempted, either by
an explicit call to create or to reserve. It shows a list (IOS index)
being done, and this is failing. There is nothing about a broker being
attempted, so you should not be getting the DSO error back.
Here is the relevant part of the trace file, even this seems wrong...
>SESSION ID: 6687024
>OAFC FUNCTION: OafcOpenCabinetW
>TRACE EVENT: Task Start
>EVENT TIME: 15-APR-1993 15:08:12.48
>
>
>SESSION ID: 6887024
>OAFC FUNCTION: OafcOpenCabinetW
>TRACE EVENT: Connection Rcv'd
>EVENT TIME: 5-APR-1993 15:08:12.61
>FILE CABINET NAME: GIV002.LAWTON_RI_YOUK
>STRING1 IS: GIVOO2
>STRING2 IS: YORIL
>
>
>SESSION ID: 6887024
>OAFC FUNCTION: OafcOpenCabinetW
>TRACE EVENT: Connection Granted
>EVENT TIME: 5-APR-1993 15:08:13.86
>FILE CABINET NAME: GIV002.LAWTON_RI_YOUK
>STRING1 IS: YORIL
>
>
>SESSION ID: 6867024
>OAFC FUNCTION: OafcOpenCabinetW
>TRACE EVENT: Task Complete
>EVENT TIME: 5-APR-1993 15:O8:13.91
>FILE CABINET NAME: GIV002.LAWTON_RI_YOUK
>STATUS: 55803913
>STRING1 IS: YORIL
>
This section simply says that a VMS user on node GIV002, username YORIL,
is opening his file cabinet, which is owned by the A1 username
LAWTON_RI_YOUK. Is this correct? Is the person doing an
ALLIN1/USER=LAWTON_RI_YOUK? Or is the A1 account name for YORIL
simply called LAWTON_RI_YOUK?
>SESSION ID: 6887024
>OAFC FUNCTION: OafcListW
>TRACE EVENT: Task Start
>EVENT TIME: 5-APR-1993 15:06:14.05
>FILE CABINET NAME: GIV002.LAWTON_RI_YOUK
>STRINGl IS: YORIL
>
>SESSION ID: 6887024
>OAFC FUNCTION: OafcListW
>TRACE EVENT: Task Complete
>EVENT TIME: 5-APR-1993 15:00:15.86
>FILE CABINET NAME: GIV002.LAWTON_RI_YOUK
>STATUS: 55603970
>STRING1 IS: YORIL
This means he tried to list something in the FC (probably to get a list
of drawers) and it failed with a status of 55603970. I cannot find this
error anyplace; all FCS error codes are prefixed with 5580. We do have
an error 55803970, which is OafcInternalError. If this is the case,
please check sys$manager:oafc$server.log for an error log at the same
time as the above trace record.
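The prefix check described above can be automated over a trace file. The
following Python sketch is only an illustration: the parser, the function
names and the SAMPLE text are hypothetical, and it assumes that each trace
record begins with a SESSION ID line and consists of "KEY: value" lines as
in the excerpts quoted in this note.

```python
FCS_STATUS_PREFIX = "5580"  # per this note: all FCS error codes start with 5580


def parse_trace(text):
    """Split trace text into records; a new record starts at each SESSION ID line."""
    records = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if not sep:
            continue  # not a "KEY: value" line
        key, value = key.strip(), value.strip()
        if key == "SESSION ID":
            records.append({})
        if records:
            records[-1][key] = value
    return records


def suspicious_statuses(records):
    """Return the records whose STATUS does not carry the expected FCS prefix."""
    return [
        rec for rec in records
        if "STATUS" in rec and not rec["STATUS"].startswith(FCS_STATUS_PREFIX)
    ]


# Two records abbreviated from the trace quoted above.
SAMPLE = """\
SESSION ID: 6887024
OAFC FUNCTION: OafcOpenCabinetW
TRACE EVENT: Task Complete
STATUS: 55803913
SESSION ID: 6887024
OAFC FUNCTION: OafcListW
TRACE EVENT: Task Complete
STATUS: 55603970
"""

flagged = suspicious_statuses(parse_trace(SAMPLE))
for rec in flagged:
    print(rec["OAFC FUNCTION"], rec["STATUS"])
```

On the sample, only the OafcListW record with status 55603970 is flagged.
A flagged status is a hint that the trace file was mistyped or edited, not
proof of a server fault.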
>SESSION ID: 6887024
>TRACE EVENT: Disconnect Done
>EVENT TIME: 5-APR-1993 15:08:15.92
>FILE CABINET NAME: GIV002 .LAWTON_RI_YOUK
This is also confusing. It says that someone requested that this
connection to the FCS be disconnected and it is finished being
disconnected. The confusing part is that there also should be a record
for the disconnect starting or being requested.
This really looks like someone edited this file and cut out too much
information. Can you get a complete trace file if this is the case?
Also, can you get a dir/full on OA$DATA_SHARE:PARTITION.DAT from both
GIV001 and GIV002, and get them to do a dump/rec on
OA$DATA_SHARE:PARTITION_MASTER.DAT (hopefully empty)? I would also like
the output from $show cluster, $sho log sys$cluster_node and
$sho log sys$node on both GIV001 and GIV002.
Something simply is not making sense here and we have to find out
what and why.
--Bob
|
2491.6 | Closed! | KERNEL::LOAT | Keep passing the open windows... | Thu Apr 15 1993 10:17 | 10 |
|
Well, just to confuse the issue even more, the customer did a full
cluster reboot last night and the problem has disappeared! I've asked
her to monitor the situation and, if they get this problem again,
to call the CSC.
Thanx for the help
Steve.
|
2491.7 | | BUSHIE::SETHI | Ahhhh (-: an upside down smile from OZ | Fri May 28 1993 03:10 | 19 |
|
Hi Bob,
This is the same customer as mentioned in note 2563.11 with the logfile
reference.
Manage Partitions shows 4 nodes: VAX1, VAX6, VAX8 and CLUS2. Previously
only VAX8 was being displayed; now all of a sudden the other nodes are
being displayed as well. The server attribute Distribution is set to
OFF, i.e. distribution level 0, and the customer has said that they have
always had it set to OFF, the default value. I asked the customer to
look at the partition record (read from MP) for each of the entries and
they all share the same partition.dat.
The question is, how did this happen? Can we remove the other nodes,
i.e. VAX1, VAX6 and CLUS2, without impacting the system?
Regards,
Sunil
|
2491.8 | Strange, but any side-effects? | IOSG::STANDAGE | | Fri May 28 1993 10:41 | 16 |
|
Sunil,
The Manage Partitions subsystem should only be used when your server is
running with DNS enabled (i.e. distribution ON). It's quite believable
that some spooky things will happen if you're running with distribution
OFF. Having said that, it doesn't really explain why you're suddenly
seeing different information.
Running with distribution OFF means there is only ONE partition on the
system (OA$DATA_SHARE:PARTITION.DAT).
Kevin.
|
2491.9 | too many strange things happening | ROMA::TEP | | Tue Jun 01 1993 03:26 | 15 |
|
Hi Kevin,
>-< Strange, but any side-effects? >-
I cannot answer this question; see note 2628.2 (random locking up of
both shared and non-shared drawers) and note 2563.11 (problems with FCS
and %MCC-E-FATAL_FW...).
The site has too many problems with the FCS and strange things happening
for me to give you much back apart from saying have a look at the
logfile on RIPPER::USER$TSC:[SETHI]OAFC$SERVER.DIGITAL_COPY;1.
Regards,
Sunil
|