
Conference noted::dtss

Title:DTSS_NOTE
Moderator:TUXEDO::BARYIAMES
Created:Mon Jul 31 1989
Last Modified:Wed May 28 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:624
Total number of notes:2671

619.0. "Time sometimes skipped one month to the future" by TAGAUS::AURAND () Wed Feb 19 1997 10:08

Hi,

one of our customers has a strange problem with DTSS (OpenVMS Alpha V6.2,
DECnet/OSI V6.3-ECO#3).

Sometimes the time on the DTSS server skips exactly one month into the future
(19.02.97 16:00 -> 19.03.97 16:00), and the time on some of the clerks, but not
all of them, changes to the new date.
Most of the time this has happened during the night, when usually nobody is
working. They have installed a time provider (HOPF clock ?) on the server and
set the server required parameter on all systems to one.

Could a problem in the time provider lead to such behaviour ?

	Many thanks for your help

		Andreas Aurand
T.R  Title  User  Personal Name  Date  Lines
619.1. "Might be the provider ?" by BULEAN::OLSON () Wed Feb 19 1997 17:12 (9 lines)
    
    
    
      A time provider could cause this.  When this happens, is the time
    server affected or just some of the clerks ?  How many time servers
    do you have ?
    
    - Mark
    
619.2. "Only one server" by TAGAUS::AURAND () Thu Feb 20 1997 04:55 (8 lines)
    They have just one time server, and the last time this happened the time
    server was also affected (at least this is what they told me).
    Is there any reason why only some of the clerks are affected and not
    all ?
    
    	Best regards
    
    		Andreas
619.3. "Well, this configuration isn't too fault tolerant, but..." by STEVMS::PETTENGILL (mulp) Mon Feb 24 1997 20:51 (19 lines)
DECdts does try to tolerate faults and this is what seems to be happening
here.

If the time from a server does not intersect with the current time, then
the clerk algorithm will discard this timestamp.  Now my guess is that
the nodes that rejected the bogus time are actually configured as servers,
or they're running different versions of DECnet.

The reason that I suspect that the nodes that didn't jump are actually
servers is that a server always uses its own timestamp as one of the server
timestamps used to meet the required minimum.
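
The intersection test described above can be illustrated with a minimal
Python sketch. It is not DECdts code: the TimeInterval class, the
intersects() helper, and the sample inaccuracies are assumptions made up for
illustration. A time is modelled as a midpoint plus or minus an inaccuracy,
and two times agree only if their intervals overlap.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TimeInterval:
    """A time expressed as midpoint +/- inaccuracy (hypothetical model)."""
    midpoint: datetime
    inaccuracy: timedelta

    @property
    def low(self) -> datetime:
        return self.midpoint - self.inaccuracy

    @property
    def high(self) -> datetime:
        return self.midpoint + self.inaccuracy


def intersects(a: TimeInterval, b: TimeInterval) -> bool:
    """Two intervals intersect when neither ends before the other begins."""
    return a.low <= b.high and b.low <= a.high


# A node that has been running for a while has a small inaccuracy.
local = TimeInterval(datetime(1997, 2, 19, 16, 0, 0), timedelta(seconds=2))
# A server timestamp that jumped one month ahead (assumed inaccuracy).
bogus = TimeInterval(datetime(1997, 3, 19, 16, 0, 0), timedelta(seconds=1))
# A plausible server timestamp, one second off.
good = TimeInterval(datetime(1997, 2, 19, 16, 0, 1), timedelta(seconds=1))

print(intersects(local, bogus))  # the bogus timestamp would be discarded
print(intersects(local, good))   # the plausible timestamp is usable
```

In this model a node with a tight local interval never overlaps a timestamp
that is a month off, which matches the behaviour of the nodes that did not
jump.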

Note that even when you have a node configured with an external time
provider (TP), you should have additional servers to
	1) provide the time when the server with the TP is unavailable
	2) detect faults in the server with the TP, and in the TP itself

To determine the cause of the problem, my recommendation is to turn on the
TP's tracing and to enable DECnet event logging on the server for DTSS.
619.4. "" by TAGEIN::AURAND () Wed Feb 26 1997 06:17 (6 lines)
    Hi
    
    I will give the information to the customer and we will see if we can
    get more details about this problem.
    
    	Many thanks for your help  -Andreas
619.5. "Wrong time in SYS.EXE ????" by TAGEIN::AURAND () Thu Feb 27 1997 07:04 (15 lines)
    Hi,
    
    recently the customer rebooted the system without the time provider and
    the system came up with a time of 1-JAN-2000. Looking at the
    OPERATOR.LOG we saw that the time was already wrong before DTSS was
    started. A VMS colleague told me that there could be a wrong time in
    SYS.EXE (whatever this means) and that the customer should do a SET
    TIME without any parameter to update the timestamp in SYS.EXE.
    
    Could it be that the 'wrong time' in SYS.EXE also had a 'bad' influence
    on DTSS ?
    
    	Many thanks for your help
    
    		Andreas
619.6. "Detect TP failure ???" by TAGEIN::AURAND () Thu Feb 27 1997 08:27 (16 lines)
    Hi,
    
    > detect faults in the server with the TP, and in the TP itself
    
    one more question: Is there any way to detect a fault in the time
    received from the time provider if there is no second DTSS server
    available ? (At least I couldn't find any DTSS parameter.)
    
    The customer has received some information from the people who sell the
    time provider software, and they told him that the received time can
    skip because of transmission failures.
    
    	Many thanks for your help
    
    		Andreas
    
619.7. "Think about it, how can you compare one answer to itself and figure out if it's correct" by STEVMS::PETTENGILL (mulp) Thu Feb 27 1997 22:55 (38 lines)
I give you a time and you have no one else to ask about the time.  Is the
time I give you correct?  Let's say I give you 1999, then you shut down and
reboot.  Now I give you 1997.  Is 1997 correct or incorrect?  How would you
decide?  Remember that you've written 1999 onto the disk in several places.

Now suppose the time provider would give you an indication that it was wrong,
say it tells you the time is 1999 but it's probably wrong.  What do you do?
What do you do if it says the time is 1997, but that's probably wrong?

If you have two sources of time, then you can decide that they agree or
disagree.  What you need to understand is that when VMS boots, it sets
the time to the time in the BB_WATCH or to a delta time added to the time
written into the exec base image, with an inaccuracy of "infinity".
When you check that time against the time provided by a time provider,
the VMS time always intersects with the time given by the time provider
no matter how incorrect it is.
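
As a rough illustration of why an infinite inaccuracy defeats the check, the
Python sketch below models times as plain seconds with a +/- inaccuracy. The
interval() and intersects() helpers and the sample values are hypothetical,
not taken from DECdts or VMS.

```python
import math


def interval(t: float, inaccuracy: float) -> tuple[float, float]:
    """Return (low, high) bounds for a time t in seconds +/- inaccuracy."""
    return (t - inaccuracy, t + inaccuracy)


def intersects(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """Intervals intersect when neither ends before the other begins."""
    return a[0] <= b[1] and b[0] <= a[1]


MONTH = 28 * 86400.0  # 19-FEB-1997 to 19-MAR-1997 is 28 days

# Freshly booted system: some time value, but infinite inaccuracy.
booted = interval(0.0, math.inf)

# Whatever the time provider claims, the check passes.
correct_tp = interval(5.0, 0.5)
month_ahead_tp = interval(MONTH, 0.5)

print(intersects(booted, correct_tp))
print(intersects(booted, month_ahead_tp))
```

Both checks succeed, so a freshly booted system has no basis for rejecting a
provider that is a month off.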

So, if you have a single time server, and it obviously must be the one with
the time provider, a faulty time provider will hose your entire enterprise.
If you have three time servers and one time provider, then the one with the
TP will be detected as faulty if the TP is giving bogus time, assuming that
the other two time servers have been up long enough to have set their time
to a relatively small inaccuracy.  If the entire site has lost power
and then all systems reboot without any humans to verify things, then a
faulty TP will cause all systems to end up with a bogus time, but the "good
news" is that they will be synchronized to the same bogus time and will
therefore be consistent in their error.
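
The three-server case can be sketched in Python as follows. This is a
deliberately simplified majority check, not the actual DECdts
synchronization algorithm; suspect_servers(), the server names, and the
inaccuracy values are assumptions for illustration only.

```python
import math

Interval = tuple[float, float]  # (low, high) bounds in seconds


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def suspect_servers(intervals: dict[str, Interval]) -> list[str]:
    """Flag servers whose interval overlaps fewer than half of the others."""
    suspects = []
    for name, own in intervals.items():
        others = [iv for other, iv in intervals.items() if other != name]
        agreeing = sum(overlaps(own, iv) for iv in others)
        if agreeing * 2 < len(others):
            suspects.append(name)
    return suspects


MONTH = 28 * 86400.0  # size of the jump reported in this topic

# Case 1: the two servers without a TP have been up long enough to have a
# small inaccuracy, so the month-ahead server stands out.
settled = {
    "tp_server": (MONTH - 1.0, MONTH + 1.0),
    "server_b": (-2.0, 2.0),
    "server_c": (-1.5, 2.5),
}

# Case 2: site-wide power failure, everybody reboots with infinite
# inaccuracy, so nothing contradicts the faulty TP.
rebooted = {
    "tp_server": (MONTH - 1.0, MONTH + 1.0),
    "server_b": (-math.inf, math.inf),
    "server_c": (-math.inf, math.inf),
}

print(suspect_servers(settled))
print(suspect_servers(rebooted))
```

In the first case only the server with the TP is flagged; in the second
nobody is flagged, which is the situation where all systems end up
consistently wrong.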

Site power recovery is when most of the problems occur.  If you have a
network as large as DEC's then you can count on all the other sites getting
the time back on track soon after the WAN links recover.  Otherwise, you
should have several external sources of time, with the best one being a
human.  If a person checks the time on one system soon after the power
returns, then he can verify whether the TP is working by looking at the
time and the inaccuracy.  If the inaccuracy is still infinite, then he
can set the time on that system to the current time plus or minus several
minutes and in ten minutes a faulty TP will not be able to screw up more
than one time server, assuming that you have 3 or more servers.
619.8. "Thaaaaaaaaaaaaaaaanks" by TAGEIN::AURAND () Fri Feb 28 1997 04:01 (3 lines)
    Many thanks for your explanation. 
    
    	Andreas