T.R | Title | User | Personal Name | Date | Lines |
---|
619.1 | Might be the provider ? | BULEAN::OLSON | | Wed Feb 19 1997 17:12 | 9 |
|
A time provider could cause this. When this happens, are the time
server(s) affected or just some of the clerks? How many time servers
do you have?
- Mark
|
619.2 | Only one server | TAGAUS::AURAND | | Thu Feb 20 1997 04:55 | 8 |
| They have just one time server, and the last time this happened the time
server was also affected (at least this is what they told me).
Is there any reason why only some of the clerks are affected and not
all?
Best regards
Andreas
|
619.3 | Well, this configuration isn't too fault tolerant, but... | STEVMS::PETTENGILL | mulp | Mon Feb 24 1997 20:51 | 19 |
| DECdts does try to tolerate faults and this is what seems to be happening
here.
If the time from a server does not intersect with the current time, then
the clerk algorithm will discard this time stamp. Now my guess is that
the nodes that rejected the bogus time are actually configured as servers,
or they're running different versions of DECnet.
The reason that I suspect that the nodes that didn't jump are actually
servers is that a server always uses its own timestamp as one of the server
timestamps used to meet the required minimum.
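The intersection test described above can be sketched roughly as follows. This is a simplified illustration, not the actual DECdts clerk algorithm: it assumes each timestamp is a time plus a symmetric inaccuracy, and the names TimeInterval, intersects and accept_server_times are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeInterval:
    """A time value with a symmetric inaccuracy, both in seconds (hypothetical type)."""
    time: float
    inaccuracy: float

    @property
    def low(self) -> float:
        return self.time - self.inaccuracy

    @property
    def high(self) -> float:
        return self.time + self.inaccuracy


def intersects(a: TimeInterval, b: TimeInterval) -> bool:
    """Two intervals intersect when neither lies entirely before the other."""
    return a.low <= b.high and b.low <= a.high


def accept_server_times(local, server_times):
    """Keep only the server timestamps whose interval overlaps the local one."""
    return [s for s in server_times if intersects(local, s)]


# Illustrative values only: a local clock, one plausible server, one bogus server.
local = TimeInterval(1000.0, 2.0)
good = TimeInterval(1001.0, 1.0)
bogus = TimeInterval(90_000_000.0, 1.0)
accepted = accept_server_times(local, [good, bogus])
```

Here the bogus timestamp is dropped because its interval lies entirely after the local interval, which is the behaviour attributed above to the nodes that did not jump.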
Note that even when you have a node configured with an external time
provider (TP), you should have additional servers to
1) provide the time when the server with the TP is unavailable
2) detect faults in the server with the TP, and in the TP itself
To determine the cause of the problem, my recommendation is to turn on the
TP's tracing and to enable DECnet event logging on the server for DTSS.
|
619.4 | | TAGEIN::AURAND | | Wed Feb 26 1997 06:17 | 6 |
| Hi
I will give the information to the customer and we will see if we can
get more details about this problem.
Many thanks for your help -Andreas
|
619.5 | Wrong time in SYS.EXE ???? | TAGEIN::AURAND | | Thu Feb 27 1997 07:04 | 15 |
| Hi,
recently the customer rebooted the system without the time provider and
the system came up with a time of 1-JAN-2000. Looking at
OPERATOR.LOG, we saw that the time was already wrong before DTSS was
started. A VMS colleague told me that there could be a wrong time in
SYS.EXE (whatever this means) and that the customer should do a SET
TIME without any parameter to update the timestamp in SYS.EXE.
Could it be that the 'wrong time' in SYS.EXE also had a 'bad' influence
on DTSS?
Many thanks for your help
Andreas
|
619.6 | Detect TP failure ??? | TAGEIN::AURAND | | Thu Feb 27 1997 08:27 | 16 |
| Hi,
> detect faults in the server with the TP, and in the TP itself
one more question: Is there any way to detect a fault in the time
received from the time provider if there is no second DTSS server
available? (At least I couldn't find any DTSS parameter.)
The customer has received some information from the people who sell the
time provider software, and they told him that the received time can
skip because of transmission failures.
Many thanks for your help
Andreas
|
619.7 | Think about it, how can you compare one answer to itself and figure out if it's correct | STEVMS::PETTENGILL | mulp | Thu Feb 27 1997 22:55 | 38 |
| I give you a time and you have no one else to ask about the time. Is the
time I give you correct? Let's say I give you 1999, then you shut down and
reboot. Now I give you 1997. Is 1997 correct or incorrect? How would you
decide? Remember that you've written 1999 onto the disk in several places.
Now suppose the time provider gave you an indication that it was wrong, say
it tells you the time is 1999 but it's probably wrong. What do you do?
What do you do if it says the time is 1997, but that's probably wrong?
If you have two sources of time, then you can decide that they agree or
disagree. What you need to understand is that when VMS boots, it sets
the time to the time in the BB_WATCH or to a delta time added to the time
written into the exec base image with an inaccuracy of "infinity".
When you check that time against the time provided by a time provider,
the VMS time always intersects with the time given by the time provider,
no matter how incorrect the provider's time is.
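A small sketch of why an inaccuracy of "infinity" defeats the check: the interval around the boot time covers every possible instant, so any provider time falls inside it. The helper names and the numeric values below are hypothetical and chosen only for illustration; this is not DECdts code.

```python
import math


def bounds(time, inaccuracy):
    """Return the (earliest, latest) instants covered by a time and its inaccuracy."""
    return (time - inaccuracy, time + inaccuracy)


def intersects(a, b):
    """True when the two (earliest, latest) ranges share at least one instant."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical values, in seconds since an arbitrary epoch.
just_booted = bounds(856_000_000.0, math.inf)  # boot-time clock, inaccuracy "infinity"
settled = bounds(856_000_000.0, 0.5)           # same clock after it has synchronized
bogus_tp = bounds(946_684_800.0, 0.1)          # provider reporting a wildly wrong time

boot_accepts_bogus = intersects(just_booted, bogus_tp)
settled_accepts_bogus = intersects(settled, bogus_tp)
```

Under these assumptions the freshly booted node cannot reject the bogus provider time, while the same node with a small inaccuracy can.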
So, if you have a single time server and it obviously must be the one with
the time provider, a faulty time provider will hose your entire enterprise.
If you have three time servers and one time provider, then the one with the
TP will be detected as faulty if the TP is giving bogus time, assuming that
the other two time servers have been up long enough to have set their time
to a relatively small inaccuracy. If the entire site has lost power
and then all systems reboot without any humans to verify things, then a
faulty TP will cause all systems to end up with a bogus time, but the "good
news" is that they will be synchronized to the same bogus time and will
therefore be consistent in their error.
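The three-server case can be illustrated with a simple majority rule: a server is suspect when its interval overlaps the intervals of no more than half of all servers, itself included. This is only a sketch of the idea, not the algorithm DECdts uses, and suspect_servers, the server names and the values are all invented for the example.

```python
import math


def bounds(time, inaccuracy):
    """Return the (earliest, latest) instants covered by a time and its inaccuracy."""
    return (time - inaccuracy, time + inaccuracy)


def overlaps(a, b):
    """True when the two (earliest, latest) ranges share at least one instant."""
    return a[0] <= b[1] and b[0] <= a[1]


def suspect_servers(reports):
    """reports maps a server name to (time, inaccuracy); return the suspect names."""
    intervals = {name: bounds(t, i) for name, (t, i) in reports.items()}
    suspects = []
    for name, own in intervals.items():
        # Count every interval this one overlaps, including its own.
        agreeing = sum(1 for other in intervals.values() if overlaps(own, other))
        if agreeing * 2 <= len(intervals):  # no strict majority agrees with it
            suspects.append(name)
    return sorted(suspects)


# Hypothetical values, in seconds since an arbitrary epoch.
normal = {
    "tp_server": (946_684_800.0, 0.1),  # fed by a faulty TP
    "server_b": (856_000_000.0, 0.5),
    "server_c": (856_000_000.2, 0.5),
}
after_power_failure = {
    "tp_server": (946_684_800.0, 0.1),
    "server_b": (856_000_000.0, math.inf),  # just rebooted, inaccuracy "infinity"
    "server_c": (856_000_000.2, math.inf),
}
single = {"tp_server": (946_684_800.0, 0.1)}

normal_suspects = suspect_servers(normal)
power_failure_suspects = suspect_servers(after_power_failure)
single_suspects = suspect_servers(single)
```

With these inputs only the first configuration flags tp_server. After a site-wide reboot the other two intervals are infinite and overlap everything, and with a single server there is nothing to compare against, so in both cases nothing is flagged.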
Site power recovery is when most of the problems occur. If you have a
network as large as DEC's then you can count on all the other sites getting
the time back on track soon after the WAN links recover. Otherwise, you
should have several external sources of time, with the best one being a
human. If a person checks the time on one system soon after the power
returns, then he can verify whether the TP is working by looking at the
time and the inaccuracy. If the inaccuracy is still infinite, then he
can set the time on that system to the current time plus or minus several
minutes and in ten minutes a faulty TP will not be able to screw up more
than one time server, assuming that you have 3 or more servers.
|
619.8 | Thaaaaaaaaaaaaaaaanks | TAGEIN::AURAND | | Fri Feb 28 1997 04:01 | 3 |
| Many thanks for your explanation.
Andreas
|