
Conference tuxedo::dce-products

Title:DCE Product Information
Notice:Kit Info - See 2.*-4.*
Moderator:TUXEDO::MAZZAFERRO
Created:Fri Jun 26 1992
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2269
Total number of notes:10003

2227.0. "CDS dying replica" by VIRGIN::BILL (BILL is my lastname !!!) Thu Apr 24 1997 06:27

Hi friends

A customer is having strange problems with CDS replicas.

He is using DCE 1.4 on VMS 6.2.

He has two CDS servers, both holding CDS replicas for some
directories. Lately he discovered that CDSD was looping (100% CPU)
on the "replica" CDS server.

He saw that a skulk was still pending:

cdscp show server:
            Skulks Initiated = 1
            Skulks Completed = 0

The directory where the skulk was pending had the master copy on this node.
So he moved the master copy to the other server:
cdscp set dir to new epoch master Masternode exclude "hanging server"

This worked fine and the cdsd stopped looping. 

Now the questions:

cdscp show /.:/hosts/gdcw9e ( the replicated directory ) 
....
                     Timeout = :
                  Expiration = 1997-04-24-12:05:54.419
                   Extension = +1-00:00:00.000I0.000
                      MyName = /.../og.rzc2.ptt.com/hosts/gdcw9e
        CDS_DirectoryVersion = 3.0
            CDS_ReplicaState = dying replica
             CDS_ReplicaType = readonly
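
For checking output like the two listings above from a script, here is a
minimal Python sketch. It assumes the output keeps the "Name = value" layout
shown in this note. The helper names parse_attributes, skulk_pending and
is_dying are made up for illustration and are not part of cdscp. Reading
"initiated greater than completed" as a pending skulk is the customer's
interpretation of the counters, not a documented rule.

```python
# Sample text copied from the cdscp output quoted in this note.
SAMPLE = """\
            Skulks Initiated = 1
            Skulks Completed = 0
                     Timeout = :
            CDS_ReplicaState = dying replica
             CDS_ReplicaType = readonly
"""


def parse_attributes(text):
    """Turn 'Name = value' lines into a dict; lines without ' = ' are skipped."""
    attrs = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if sep:
            attrs[name.strip()] = value.strip()
    return attrs


def skulk_pending(attrs):
    """Assumption: more skulks initiated than completed means one is pending."""
    initiated = int(attrs.get("Skulks Initiated", 0))
    completed = int(attrs.get("Skulks Completed", 0))
    return initiated > completed


def is_dying(attrs):
    """True when the replica state string starts with 'dying'."""
    return attrs.get("CDS_ReplicaState", "").lower().startswith("dying")


if __name__ == "__main__":
    attrs = parse_attributes(SAMPLE)
    print("skulk pending:", skulk_pending(attrs))
    print("replica dying:", is_dying(attrs))
```

The counters and the replica state come from two different commands in the
note (show server and show of the directory); they are combined in one sample
here only to keep the sketch short.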

What does "dying replica" mean? I remember that DNS had a tool
to fix such a replica. Surgeon?

Does the Expiration time mean that this replica will disappear
after this time?

Is there a fast way to remove a directory which was excluded?
If I try to write a new copy of the directory to this clearinghouse,
I see that the TLOG file grows very fast, but nothing happens.
I use the command:
cdscp set dir  to new epoch  master Masterserver readonly "hanging server"

Any comments are greatly appreciated...

Marco


2227.1. "perhaps corrupt database" by TUXEDO::ZEE (There you go.) Thu May 01 1997 19:02 (30 lines)

Sorry for the delay - on vacation, then sick.

A corrupt database could cause the CDS server process to take 100% of
the CPU.  I'm not sure whether DCE V1.4 includes the database fix that
addresses this behavior.  You should run the surgeon tool with -scanrx
on the .checkpoint file to check for corruption, then use the tool to
excise the bad data.  A previous bug caused index records to be placed
incorrectly in the B-tree, so traversing the tree would result in an
infinite loop.

>What does "dying replica" mean? I remember that DNS had a tool

This is a direct result of the "cdscp set dir to new epoch" command when
you exclude a clearinghouse.  The replica state will change from On to
Dying.  After a successful skulk, the replica state should change from
Dying to Dead.  My guess is the skulk is not returning, perhaps because
of the looping above.
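
The state changes described above can be sketched as a small transition
table in Python. This is only an illustration of the description in this
reply, not CDS code: the event names (new_epoch_exclude, skulk_succeeded,
skulk_failed) are invented, and apart from "dying replica" the state labels
are placeholders rather than the exact strings cdscp prints.

```python
from enum import Enum


class ReplicaState(Enum):
    ON = "on"                 # placeholder label
    DYING = "dying replica"   # string as shown in the cdscp output above
    DEAD = "dead"             # placeholder label


# (current state, hypothetical event) -> next state
TRANSITIONS = {
    (ReplicaState.ON, "new_epoch_exclude"): ReplicaState.DYING,
    (ReplicaState.DYING, "skulk_succeeded"): ReplicaState.DEAD,
}


def next_state(state, event):
    """Return the new state, or the unchanged state if no rule matches."""
    return TRANSITIONS.get((state, event), state)


if __name__ == "__main__":
    state = ReplicaState.ON
    for event in ("new_epoch_exclude", "skulk_failed", "skulk_succeeded"):
        state = next_state(state, event)
        print(event, "->", state.value)
```

A skulk that never completes leaves the replica in the dying state, which
matches what the customer is seeing.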

>Does the Expiration time mean that this replica will disappear
>after this time?

I believe these fields (Timeout, Expiration, Extension) belong to the
attribute listed just above Timeout: in the output, probably
CDS_ParentPointer, so they would not be an expiry time for the replica.

>Is there a fast way to remove a directory which was excluded?

I think you mean "replica" rather than "directory".  Removing it should
be fast if the skulk is successful.

--Roger

2227.2. "Replica still dying..." by VIRGIN::BILL (BILL is my lastname !!!) Mon May 12 1997 11:29 (23 lines)

Hi Roger

The "poor" replica is still dying. I tried to reproduce a dying replica
with the following steps:

- cdscp set dir to new epoch, excluding a clearinghouse
- cdscp delete replica from that clearinghouse

In this state I have my dying replica. As soon as I recreate the
replica, the state is back to on, as expected.

As far as I understand the exclude should only be used if
you intend to bring the replica back to life and NOT if you'll delete
it afterwards.

Anyway, the customer is not able to recreate the replica (TLOG grows rapidly),
nor is he able to remove the dying replica. Is there any hard way to
get rid of such a replica? Is it possible that the mentioned bug is still in
the VMS CDS?

Thanks for any comments.

/Marco

2227.3. by TUXEDO::ZEE (There you go.) Mon May 12 1997 12:42 (25 lines)

>As far as I understand the exclude should only be used if
>you intend to bring the replica back to life and NOT if you'll delete
>it afterwards.

This is generally true, since if you wish to delete a replica, you
do not need to "new epoch exclude" it first, just delete it.

I have been assuming that the directory in question is not the root
directory.  Also, are there any other directories replicated at
this clearinghouse containing the replica you wish to delete?

>Anyway, the customer is not able to recreate the replica (TLOG grows rapidly),
>nor is he able to remove the dying replica. Is there any hard way to
>get rid of such a replica? Is it possible that the mentioned bug is still in
>the VMS CDS?

Creating or recreating a replica would cause the TLOG file to grow rapidly.
To remove the dying replica, try skulking the directory and note the
error if it fails.  Yes, the mentioned bug may be in that version of VMS
CDS, but someone from VMS DCE would need to verify that.  There is the
brute force method of deleting that clearinghouse altogether, but you
would need to clean up all of the other directories that are replicated
there.
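
To capture the skulk error text for a report, a small Python wrapper along
these lines might help. It is a sketch under assumptions: it uses the usual
cdscp forms "set directory <name> to skulk" and "delete replica <name>
clearinghouse <name>", the helper names are invented, and the clearinghouse
name /.:/replica_ch in the usage note is hypothetical. How cdscp is invoked
on OpenVMS may differ, so adjust the command list as needed.

```python
import subprocess


def skulk_command(directory):
    """Build the argument list for skulking one directory (assumed syntax)."""
    return ["cdscp", "set", "directory", directory, "to", "skulk"]


def delete_replica_command(directory, clearinghouse):
    """Build the argument list for deleting one replica (assumed syntax)."""
    return ["cdscp", "delete", "replica", directory,
            "clearinghouse", clearinghouse]


def run_and_capture(argv):
    """Run a command and return (exit status, combined output text)."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr
```

For example, run_and_capture(skulk_command("/.:/hosts/gdcw9e")) returns the
exit status together with whatever error text cdscp printed, and
delete_replica_command("/.:/hosts/gdcw9e", "/.:/replica_ch") builds the
follow-up command once the skulk goes through.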

--Roger

2227.4. "Database Corruption fix may not be in VMS" by STAR::SWEENEY Mon May 12 1997 13:08 (7 lines)

If the database corruption fix mentioned in .1 was released in Digital Unix
ECO 1 for 1.3, then I do not believe OpenVMS has picked up the fix.  Roger,
I will contact you offline about the fix.  We have all the source differences
for all the ECO 1 kit changes, but are having a difficult time determining
exactly which source module changes are required for the database corruption
fix.

Dave