
Conference iosg::all-in-1_v30

Title:*OLD* ALL-IN-1 (tm) Support Conference
Notice:Closed - See Note 4331.1 to move to IOSG::ALL-IN-1
Moderator:IOSG::PYE
Created:Thu Jan 30 1992
Last Modified:Tue Jan 23 1996
Last Successful Update:Fri Jun 06 1997
Number of topics:4343
Total number of notes:18308

2491.0. "How many partitions?" by KERNEL::LOAT (Keep passing the open windows...) Tue Mar 30 1993 11:18

    
    VMS 5.5 ALL-IN-1 3.0
    
    A customer has got a two node cluster and he's got a single ALL-IN-1
    system running on both nodes, so they share the OA$DATA, OA$LIB etc
    locations.
    
    How many partitions should he be able to select in SM MFC MP (given
    that they are not using DSO)?
    
    I thought that you should have one partition for each ALL-IN-1 system,
    but I've got a customer with a two node ALL-IN-1 system, who's got a
    partition for each of the two nodes. Also, on our production system at
    the CSC (which is also a two node cluster) we've got two partitions.
    The difference is that cross-drawer operations on our system work,
    whereas on the customer's system, it gives an error message saying that
    they don't have the distributed sharing option (which they don't).
    
    What is correct, one partition for each node in your ALL-IN-1 system, or
    one record for your ALL-IN-1 system, regardless of the number of nodes?
    If the former, any ideas why the customer is seeing these errors?
    
    Thanx
    
    Steve.
    
    
T.R  Title  User  Personal Name  Date  Lines
2491.1  sys$cluster_node logical  CHRLIE::HUSTON  Tue Mar 30 1993 13:03  36
    
    If your customer is running the FCS at distribution level 0 then there
    is one partition per cluster. It has up to n+1 names where n is the
    number of nodes in the cluster and +1 is there for a cluster alias
    if present.
    
    If they are running the FCS at distribution level 1 and have a DNS
    namespace setup with partition objects in it, then they can have
    any number of partitions on the cluster.
    
    I assume they have level 0 running and no DNS namespace.
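
    To make the level 0 naming rule concrete, here is a minimal Python
    sketch. It is not ALL-IN-1 or FCS code; the helper name is made up
    for illustration, and the node names and alias are the ones quoted
    later in this topic.

```python
def partition_names(nodes, cluster_alias=None):
    """Names by which the single level 0 partition can be addressed.

    Hypothetical helper: one name per cluster node, plus one more
    if a cluster alias is defined (the "n+1" rule).
    """
    names = list(nodes)
    if cluster_alias:
        names.append(cluster_alias)
    return names


# Two nodes plus a cluster alias: three names, still one partition.
names = partition_names(["GIV001", "GIV002"], cluster_alias="GIV003")
print(len(names), names)
```

    Whatever the number of names, they all refer to the same partition
    at level 0.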
    
    Do both systems share the OA$DATA_SHARE directory? This is where the
    partition.dat file lives, which is what defines what drawers are in
    the partition.
    
    One way I have seen to get the no Distribution error is that the
    logical sys$cluster_node is incorrectly defined. It should have the
    two colons on the end. In other words, if the following happens:
    
    $sho log sys$cluster_node
    
    REVEALS: nod as opposed to node:: then you will see this happen.
    
    The FCS blindly chops off the two characters from the length of the 
    logical. It does not first check to see if the logical is correct.
    The bug number for this is THR-17429 in the IOSG data base.
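
    The effect of chopping two characters without checking them can be
    sketched in a few lines of Python. This is only an illustration of
    the behaviour described above, not the FCS source; both function
    names are hypothetical.

```python
def chop_blindly(logical_value):
    # Mimics the described behaviour: drop the last two characters
    # without checking that they are "::".
    return logical_value[:-2]


def node_from_logical(logical_value):
    # Safer variant: only strip the suffix if it really is "::".
    if not logical_value.endswith("::"):
        raise ValueError(f"expected trailing '::' in {logical_value!r}")
    return logical_value[:-2]


print(chop_blindly("NODE::"))  # NODE
print(chop_blindly("NODE"))    # NO  (silently wrong node name)
```

    With the colons missing, the truncated name no longer matches the
    local node, which would be consistent with the request being
    treated as one for a remote partition.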
    
    If this is the case, have your customer find out why sys$cluster_node
    is wrong, fix it and re-start the FCS. Make sure that when the system
    is rebooted sometime in the future that the logical is correctly 
    defined.
    
    --Bob
    
2491.2  KERNEL::LOAT  Keep passing the open windows...  Tue Mar 30 1993 15:08  18
    
    Hi Bob
    
    Both partition records point to OA$DATA_SHARE:PARTITION.DAT and both
    nodes share the OA$DATA area.
    
    They are running the FCS at distribution level 0. I've asked the
    customer to check the logical SYS$CLUSTER_NODE, and this is defined as
    ALIAS::, which looks okay to me.
    
    If they are using a cluster alias, should they have a partition record
    for the cluster alias as well as the two nodes?
    
    Ideas?
    
    Steve.
    
    
2491.3  Turn on FCS tracing and try again  CHRLIE::HUSTON  Tue Mar 30 1993 17:54  15
    
    Steve,
    
    All the cluster alias does is give users another way to name drawers
    on that system; they will all be in the same partition.
    
    Have him turn on FCS tracing and then do whatever he is doing to
    get the no DSO license error returned. This should show who they are
    trying to connect to. Look for a record like "remote connect
    started" or something like that; it should show what partition they
    tried to connect to. This will give some indication of whether it is
    their own system or whether they really are trying to go out over
    the network.
    
    --Bob
    
2491.4  More info.  KERNEL::LOAT  Keep passing the open windows...  Wed Apr 07 1993 16:29  136
    
    Well, Bob, here's the information. The customer was logged into node
    GIV002 under the account LAWTON_RI_YOUK. He was accessing a drawer
    owned by an account YORIL. The node GIV002 is in a cluster along with
    GIV001, and the cluster alias is GIV003. There is a partition record
    for each node (GIV001 and GIV002) which points to the same data file.
    
    When he did this, he was trying to create a document in a shared
    drawer, and it said 'Creating document', then it said 'Document will
    not be created.' Doing a Gold-W gave the message 'Partition not
    found.', and then the message that he didn't have the DSO license.
    
    He's got an FCS running on each node correctly, and both FCS are defined
    as LOCAL.
    
    Below is the trace log file for the FCS.
                                                                          
    Does this help?
    
    Steve
    
--------------------------------------------------------------------------------

SESSION ID: 6885904
OAFC FUNCTION: OafcSetServer
TRACE EVENT: Task Complete
EVENT TIME: 5-APR-1993 15:07:08.02
STATUS: 55803913
STRING1 IS: ALLIN1


SESSION ID: 6865904
OAFC FUNCTION: OafcShowServer
TRACE EVENT: Task Start
EVENT TIME: 5-APR-1993 15:07:09.02
STRING1 IS: ALLIN1


SESSION ID: 6885904
OAFC FUNCTION: OafcShowserver
TRACE EVENT: Task Complete
EVENT TIME: 5-APR-1993 15:07:09.08
STATUS: 55803913
STRING1 IS: ALLIN1


SESSION ID: 6687024
OAFC FUNCTION: OafcOpenCabinetW
TRACE EVENT: Task Start
EVENT TIME: 15-APR-1993 15:08:12.48


SESSION ID: 6887024
OAFC FUNCTION: OafcOpenCabinetW
TRACE EVENT: Connection Rcv'd
EVENT TIME: 5-APR-1993 15:08:12.61
FILE CABINET NAME: GIV002.LAWTON_RI_YOUK
STRING1 IS: GIV002
STRING2 IS: YORIL


SESSION ID: 6887024
OAFC FUNCTION: OafcOpenCabinetW
TRACE EVENT: Connection Granted
EVENT TIME: 5-APR-1993 15:08:13.86
FILE CABINET NAME: GIV002.LAWTON_RI_YOUK
STRING1 IS: YORIL


SESSION ID: 6867024   
OAFC FUNCTION: OafcOpenCabinetW
TRACE EVENT: Task Complete
EVENT TIME: 5-APR-1993 15:08:13.91
FILE CABINET NAME: GIV002.LAWTON_RI_YOUK
STATUS: 55803913
STRING1 IS: YORIL


SESSION ID: 6887024
OAFC FUNCTION: OafcListW
TRACE EVENT: Task Start
EVENT TIME: 5-APR-1993 15:06:14.05
FILE CABINET NAME: GIV002.LAWTON_RI_YOUK
STRING1 IS: YORIL

SESSION ID: 6887024
OAFC FUNCTION: OafcListW
TRACE EVENT: Task Complete
EVENT TIME: 5-APR-1993 15:00:15.86
FILE CABINET NAME: GIV002.LAWTON_RI_YOUK
STATUS: 55603970
STRING1 IS: YORIL


SESSION ID: 6887024
TRACE EVENT: Disconnect Done
EVENT TIME: 5-APR-1993 15:08:15.92
FILE CABINET NAME: GIV002.LAWTON_RI_YOUK


SESSION ID: 6885904
OAFC FUNCTION: OafcShowServer
TRACE EVENT: Task Start
EVENT TIME: 5-APR-1993 15:08:31.69
STRING1 IS: ALLIN1


SESSION ID: 6885904
OAFC FUNCTION: OafcShowServer
TRACE EVENT: Task Complete
EVENT TIME: 5-APR-1993 15:08:37.87
STATUS: 55603913
STRING1 IS: ALLIN1


SESSION ID: 6885904
OAFC FUNCTION: OafcShowServer
TRACE EVENT: Task Start
EVENT TIME: 5-APR-1993 15:08:42.20
STRING1 IS: ALLIN1


SESSION ID: 6885904
OAFC FUNCTION: OafcShowServer
TRACE EVENT: Task Complete
EVENT TIME: 5-APR-1993 15:08:42.46
STATUS: 55803913
STRING1 IS: ALLIN1


SESSION ID: 6885904
OAFC FUNCTION: OafcSetServer
TRACE EVENT: Task Start
EVENT TIME: 5-APR-1993 15:05:43.12
STRING1 IS: ALLIN1
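
    A trace dump in this layout (blank-line-separated records of
    "KEY: value" lines) is easy to load for closer inspection. The
    following is a minimal Python sketch, not a DEC-supplied tool; the
    function name is hypothetical and the sample text is only modelled
    on the records above, not copied from the customer's log.

```python
import re

# Illustrative sample in the same layout as the trace records.
SAMPLE = """\
SESSION ID: 6887024
OAFC FUNCTION: OafcListW
TRACE EVENT: Task Start
EVENT TIME: 5-APR-1993 15:08:14.05

SESSION ID: 6887024
OAFC FUNCTION: OafcListW
TRACE EVENT: Task Complete
EVENT TIME: 5-APR-1993 15:08:15.86
STATUS: 55803970
"""


def parse_trace(text):
    """Return one dict per blank-line-separated trace record."""
    records = []
    for block in re.split(r"\n\s*\n", text.strip()):
        record = {}
        for line in block.splitlines():
            # Split on the first colon only, so timestamps stay intact.
            key, sep, value = line.partition(":")
            if sep:
                record[key.strip()] = value.strip()
        if record:
            records.append(record)
    return records


for rec in parse_trace(SAMPLE):
    print(rec["SESSION ID"], rec["OAFC FUNCTION"],
          rec["TRACE EVENT"], rec.get("STATUS", "-"))
```

    Lines without a colon are skipped rather than guessed at, so a
    damaged label simply goes missing from its record instead of
    producing a wrong field.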
    
2491.5  something is confused, I know I am  CHRLIE::HUSTON  Wed Apr 07 1993 17:05  112
    
    There are some things here that are not making sense to me, maybe 
    there is just a misunderstanding between us.
    
    >GIV001, and the cluster alias is GIV003. There is a partition record
    >for each node (GIV001 and GIV002) which points to the same data file.
    
    What do you mean there is a partition record for each node? Partition
    record where? You should have one file: OA$DATA_SHARE:PARTITION.DAT
    and it should contain drawer records, not node records. You say you
    have two of these files, this is wrong unless you are using DNS
    naming.
    
    >When he did this, he was trying to create a document in a shared
    >drawer, and it said 'Creating document', then it said 'Document will
    >not be created.' Doing a Gold-W gave the message 'Partition not
    >found.', and then the message that he didn't have the DSO license.
    
    The trace file does not show a create being attempted, either by 
    an explicit call to create or to reserve. It shows a list (IOS index)
    being done and this is failing. There is nothing about a broker being
    attempted so you should not be getting the DSO error back.
    
    Here is the relevant part of the trace file, even this seems wrong...
    
>SESSION ID: 6687024
>OAFC FUNCTION: OafcOpenCabinetW
>TRACE EVENT: Task Start
>EVENT TIME: 15-APR-1993 15:08:12.48
>
>
>SESSION ID: 6887024
>OAFC FUNCTION: OafcOpenCabinetW
>TRACE EVENT: Connection Rcv'd
>EVENT TIME: 5-APR-1993 15:08:12.61
>FILE CABINET NAME: GIV002.LAWTON_RI_YOUK
>STRING1 IS: GIV002
>STRING2 IS: YORIL
>
>
>SESSION ID: 6887024
>OAFC FUNCTION: OafcOpenCabinetW
>TRACE EVENT: Connection Granted
>EVENT TIME: 5-APR-1993 15:08:13.86
>FILE CABINET NAME: GIV002.LAWTON_RI_YOUK
>STRING1 IS: YORIL
>
>
>SESSION ID: 6867024   
>OAFC FUNCTION: OafcOpenCabinetW
>TRACE EVENT: Task Complete
>EVENT TIME: 5-APR-1993 15:08:13.91
>FILE CABINET NAME: GIV002.LAWTON_RI_YOUK
>STATUS: 55803913
>STRING1 IS: YORIL
>

    This section simply says that a VMS user on node GIV002, username YORIL
    is opening his file cabinet which is owned by the A1 username
    LAWTON_RI_YOUK.  Is this correct? Is the person doing an 
    ALLIN1/USER=LAWTON_RI_YOUK? or is the A1 account name for YORIL 
    simply called LAWTON_RI_YOUK?
    
>SESSION ID: 6887024
>OAFC FUNCTION: OafcListW
>TRACE EVENT: Task Start
>EVENT TIME: 5-APR-1993 15:06:14.05
>FILE CABINET NAME: GIV002.LAWTON_RI_YOUK
>STRING1 IS: YORIL
>
>SESSION ID: 6887024
>OAFC FUNCTION: OafcListW
>TRACE EVENT: Task Complete
>EVENT TIME: 5-APR-1993 15:00:15.86
>FILE CABINET NAME: GIV002.LAWTON_RI_YOUK
>STATUS: 55603970
>STRING1 IS: YORIL

    This means he tried to list something in the FC (probably to get a list
    of drawers) and it failed with a status of 55603970. I cannot find this
    error anyplace; all FCS error codes are prefixed with 5580. We do have
    an error 55803970 that is OafcInternalError. If this is the case, please
    check the sys$manager:oafc$server.log for an error log at the same time
    as the above trace record.
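
    The prefix rule gives a quick sanity check on status values copied
    out of a trace. Below is a minimal Python sketch; the function name
    is hypothetical, and the only code-to-name mapping it knows is the
    one stated in this note.

```python
# Only mapping given in this note; other FCS codes are not listed here.
KNOWN_STATUSES = {"55803970": "OafcInternalError"}


def classify_status(status):
    """Classify a status string using the 5580 prefix rule."""
    if not status.startswith("5580"):
        return "not an FCS code (check for a transcription error)"
    return KNOWN_STATUSES.get(status, "FCS code, meaning not listed here")


for status in ("55803913", "55603970", "55803970"):
    print(status, "->", classify_status(status))
```

    A value such as 55603970 fails the prefix test, which is why it is
    worth checking against the original log before searching for it.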
    

>SESSION ID: 6887024
>TRACE EVENT: Disconnect Done
>EVENT TIME: 5-APR-1993 15:08:15.92
>FILE CABINET NAME: GIV002.LAWTON_RI_YOUK

    This is also confusing, it says that someone requested that this 
    connection to the FCS be disconnected and it is finished being
    disconnected. The confusing part is that there also should be a record
    for the disconnect starting or being requested.
    
    This really looks like someone edited this file and cut out too much 
    information. If this is the case, can you get a complete trace file?
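
    One way to spot where a trace has been trimmed is to pair up the
    "Task Start" and "Task Complete" records for each session and
    function. This is a minimal Python sketch with a hypothetical
    function name; it only knows the two event names that appear in the
    trace, and the sample tuples are illustrative.

```python
def unpaired_tasks(records):
    """Report Task Start / Task Complete records whose partner is missing.

    records: iterable of (session_id, function, event) tuples in log order.
    """
    open_tasks = []  # (session, function) seen at Task Start, not yet completed
    problems = []
    for session, function, event in records:
        key = (session, function)
        if event == "Task Start":
            open_tasks.append(key)
        elif event == "Task Complete":
            if key in open_tasks:
                open_tasks.remove(key)
            else:
                problems.append((session, function,
                                 "Task Complete without Task Start"))
    problems.extend((s, f, "Task Start without Task Complete")
                    for s, f in open_tasks)
    return problems


sample = [
    ("6885904", "OafcSetServer", "Task Complete"),
    ("6885904", "OafcShowServer", "Task Start"),
    ("6885904", "OafcShowServer", "Task Complete"),
    ("6885904", "OafcSetServer", "Task Start"),
]
for problem in unpaired_tasks(sample):
    print(problem)
```

    Unpaired records at the very start or end of a file usually just
    mean the trace was cut there; unpaired records in the middle are
    the ones worth asking about.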
    
    Also, can you get a dir/full on OA$DATA_SHARE:PARTITION.DAT from both
    GIV001 and GIV002? Get them to do a dump/rec on 
    OA$DATA_SHARE:PARTITION_MASTER.DAT (hopefully empty) as well, and send
    the output from $show cluster, $sho log sys$cluster_node and
    $sho log sys$node on both GIV001 and GIV002.
    
    Something simply is not making sense here and we have to find out 
    what and why.
    
    --Bob
    

    
2491.6  Closed!  KERNEL::LOAT  Keep passing the open windows...  Thu Apr 15 1993 10:17  10
    
    Well, just to confuse the issue even more, the customer did a full
    cluster reboot last night and the problem has disappeared! I've asked
    her to monitor the situation and, if they get this problem again, to
    call the CSC.
    
    Thanx for the help
    
    Steve.
    
2491.7  BUSHIE::SETHI  Ahhhh (-: an upside down smile from OZ  Fri May 28 1993 03:10  19
    Hi Bob,
    
    This is the same customer as mentioned in note 2563.11 with the logfile
    reference.
    
    Manage partition shows 4 nodes: VAX1, VAX6, VAX8 and CLUS2. Only
    VAX8 was being displayed before; now, all of a sudden, the other nodes
    are being displayed as well. The server attribute Distribution is set
    to OFF, i.e. distribution level 0, and the customer has said that they
    have always had it set to OFF, the default value. I asked the customer
    to look at the partition record (read from MP) for each of the nodes,
    and they all share the same partition.dat.
    
    The question is, how did this happen? Can we remove the other nodes,
    i.e. VAX1, VAX6 and CLUS2, without impacting the system?
    
    Regards,
    
    Sunil
2491.8  Strange, but any side-effects?  IOSG::STANDAGE  Fri May 28 1993 10:41  16
    
    
    Sunil,
    
    The Manage Partitions subsystem should only be used when your server is
    running with DNS enabled (i.e. distribution ON). It's quite believable
    that some spooky things will happen if you're running with distribution
    OFF. Having said that, it doesn't really explain why you're suddenly
    seeing different information.
    
    Running with distribution OFF means there is only ONE partition on the
    system (OA$DATA_SHARE:PARTITION.DAT).
    
    Kevin.
    
    
2491.9  too many strange things happening  ROMA::TEP  Tue Jun 01 1993 03:26  15
    Hi Kevin,
    
    >-< Strange, but any side-effects? >-
    
    I cannot answer this question. See note 2628.2 (random locking up of
    both shared and non-shared drawers) and note 2563.11 (problems with
    FCS and %MCC-E-FATAL_FW...).
    
    The site has too many problems with the FCS and strange things happening
    for me to give you much back, apart from suggesting you have a look at
    the logfile on RIPPER::USER$TSC:[SETHI]OAFC$SERVER.DIGITAL_COPY;1.
    
    Regards, 
    
    Sunil