T.R | Title | User | Personal Name | Date | Lines |
---|
3975.1 | Yes, something is strange... | STKHLM::WEBJORN | Gullik Webjörn Network Advisory | Wed May 28 1997 12:18 | 19 |
|
This has also been observed at DAGAB, where time is provided from a
GPS receiver. When the receiver drops satellites, all servers drift
away. When the correct time comes back, the server with the provider
quickly leaves the other gang of three, so that the times do not intersect.
They cannot follow the abrupt change from bad consensus time
to good provider time.
We have been unable to manipulate the server setting so that
the problem goes away, without setting unreasonable numbers
on the non-provider servers.
Very interested to hear what happens with your case...
Gullik
|
3975.2 | your solution/workaround? | UTOPIE::FRUEHWIRTH_M | | Wed May 28 1997 18:07 | 11 |
| hi Gullik,
> We have been unable to manipulate the server setting so that
> the problem goes away, without setting unreasonable numbers
> on the non-provider servers.
what was (is) your solution?
having one server (with time-provider) and all other are clerks?
best regards
martin
|
3975.3 | re:This has also been observed at DAGAB... | TWICK::PETTENGILL | mulp | Thu May 29 1997 00:38 | 18 |
| I'd like to see the server trace files used to generate graphs for a case
when this happens.
The only way that this can happen is that the clocks on the servers are
drifting faster than the spec'd inaccuracy.
This is based on the theory behind DTSS.
And the theory is backed up by many years of managing a set of time servers
where the external time provider occasionally fails. There were periods
where the global time servers were afu and reporting times that were wildly
offset from true with huge inaccuracies and other times where a number
of the global time servers were faulty. At no time have I seen the local
servers fail to intersect with the server(s) with external time providers.
In fact, since my long-time time provider is the dialup service ACTS,
my TP enables and disables itself dynamically, so the TP is coming and
going frequently.
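The claim above, that intervals only stop intersecting when a clock drifts faster than its spec'd inaccuracy grows, can be sketched as follows. This is a simplified model and not DTSS code: the function names, the linear drift model and the drift rates are assumptions chosen for illustration; only the 200 ms inaccuracy and the 12 hour outage length echo figures mentioned in this topic.

```python
def interval(true_time, elapsed, actual_drift, assumed_drift, base_inacc):
    """Return (low, high) announced by a free-running clock.

    Hypothetical model: the clock error grows at actual_drift (s/s),
    while the announced inaccuracy grows at assumed_drift (s/s),
    starting from base_inacc seconds.
    """
    reported = true_time + actual_drift * elapsed
    inacc = base_inacc + assumed_drift * elapsed
    return reported - inacc, reported + inacc


def intersects(a, b):
    """True if the closed intervals a and b share at least one point."""
    return max(a[0], b[0]) <= min(a[1], b[1])


OUTAGE = 12 * 3600                  # seconds without the time provider
provider = (-0.2, 0.2)              # recovered TP server: true time +/- 200 ms

# Drift within the assumed bound: the interval still contains true time.
within_spec = interval(0.0, OUTAGE, 4e-5, 5e-5, 0.2)
# Drift faster than the assumed bound: the interval has left true time behind.
out_of_spec = interval(0.0, OUTAGE, 1e-4, 5e-5, 0.2)

print(intersects(within_spec, provider))
print(intersects(out_of_spec, provider))
```

As long as the real drift stays at or below the assumed drift, the announced interval always contains the true time, so it must intersect any other correct interval, however narrow.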
|
3975.4 | | COMEUP::SIMMONDS | loose canon | Fri May 30 1997 00:29 | 5 |
| I agree with .3 .. mayhaps the Inaccuracy of the Server with TP attached
never gets reduced when the local TP device 'recovers' after failure..
(programming error in the DTSS$PROVIDER being used)
John.
|
3975.5 | Observed behaviour not intuitive. | STKHLM::WEBJORN | Gullik Webjörn Network Advisory | Fri May 30 1997 06:15 | 40 |
| re: .2
We have not been able to solve this properly yet.
The customer wants 4 time servers to be able to take one down without
getting below 'servers required = 3' default.
That way other departments can run 'stock' decnet+
Clocks on the machines also seem to drift excessively. This requires
the customer to schedule a 'sync dtss set clock true' frequently,
so that if a server is rebooted, the hardware clock will be within
a reasonable accuracy. The provider gives a time +/- 6 mS at the server.
Announcing accuracy with +/- 200 mS gives too small an interval.
This means that even if the customer knows time is +/- 6 mS,
some *very much larger* inaccuracy must be used ( 10-20 SECONDS).
Otherwise, the drift when GPS time is not available becomes
large enough that servers will not intersect, and hence not resync.
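A rough back-of-the-envelope version of this reasoning, as a sketch only: it treats the announced inaccuracies as fixed and ignores any growth of inaccuracy between synchronizations, and the relative drift rate of 1e-4 s/s is an assumed figure, not a measurement from this site. The function name is hypothetical.

```python
def seconds_until_disjoint(inacc_a, inacc_b, relative_drift):
    """Time for two clocks that start in agreement to stop intersecting.

    Two intervals centre +/- inaccuracy intersect while the distance
    between their centres is at most inacc_a + inacc_b, so with a
    constant relative drift (s/s) that takes (inacc_a + inacc_b) / drift.
    """
    return (inacc_a + inacc_b) / relative_drift


DRIFT = 1e-4  # assumed relative drift between two servers, s/s

# Both sides announce +/- 200 mS.
narrow = seconds_until_disjoint(0.2, 0.2, DRIFT)
# One side announces +/- 200 mS, the other +/- 10 S.
wide = seconds_until_disjoint(0.2, 10.0, DRIFT)

print(round(narrow / 3600, 1), "hours")
print(round(wide / 3600, 1), "hours")
```

Under these assumptions the narrow setting survives roughly an hour of outage and the wide setting more than a day, which is the trade-off described above.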
We thought that if the server WITH provider announced time with
200 mS, and the other 3 servers were set up with a larger
inaccuracy, that the gang-of-three would sync up when they found
out that the 'good' server was back, weighting its time
estimate higher and pulling all 3 back in line.
( Does a 'better' accuracy weigh better in the calculations ??? )
The TP program switches between 2 inaccuracies according to the GPS 'REAL'
time flag. Currently good accuracy is set at 200 mS and
bad accuracy ( GPS freewheeling ) is set at +/- 5 S.
If and when the good server dropped out, all four would drift SLOWLY
due to the 'mass' of the four servers. We don't understand why this
strategy fails.
Gullik
|
3975.6 | Ok, let's see if we can diagnose the problem here... | TWICK::PETTENGILL | mulp | Fri May 30 1997 21:34 | 35 |
| What you are describing is counter to my experience.
I've written a number of notes about how DTSS works in the DTSS conference
and the DTSS documentation seems pretty good to me, so let's get some
data for me to look at.
$run SYS$COMMON:[SYSHLP.EXAMPLES.DTSS]DTSS$GRAPH will give you some
instructions, but I will make my request specific:
Issue the following command on the system with the TP and at least one
server, preferably all of them:
$mc ncl set dtss synch trace true
This will start a log file, although it will take a while before it
appears.
Ideally, this would run for say six hours, then the GPS TP is disabled
for 6-12 hours, and then the GPS TP is re-enabled. Even more ideal,
the problem you describe would have occurred.
During this time, there should be no "$mc ncl set dtss" commands issued.
At the end of at least 24 hours, send me the sys$manager:dtss$inacc.log
file from each of the servers, being sure to label each log file according
to the server that it came from. I will also need the Ethernet addresses
of each system.
For a bonus, you would include a DECnet event log for all the DTSS server
systems.
Or alternately, you could send me the event logging for all the DTSS servers.
I've lost the command file that I've used previously, so I'll create a
new set of NCL commands to collect the DTSS events from all the systems
in one log file and post them later.
|
3975.7 | power down the timeproviderclock itself? | UTOPIE::FRUEHWIRTH_M | | Tue Jun 03 1997 13:10 | 18 |
| re .-1
>Ideally, this would run for say six hours, then the GPS TP is disabled
>for 6-12 hours, and then the GPS TP is re-enabled. Even more ideal,
>the problem you describe would have occurred.
Should we manually power down the timeproviderclock itself,
or does it mean that i should disable the whole DTSS on the DTSSserver
which has the timeproviderclock attached?
>Or alternately, you could send me ...
I will supply you with the requested information as soon as possible;
it depends on the customer ...
best regards
martin
|
3975.8 | | PISGAH::PETTENGILL | mulp | Wed Jun 04 1997 01:28 | 10 |
| >Should we manually power down the timeproviderclock itself,
>or does it mean that i should disable the whole DTSS on the DTSSserver
>which has the timeproviderclock attached?
Just disconnect or disable the timeprovider software or hardware.
One of the points of doing this is to see how this system's time
behaves relative to the other systems in the LAN. Disabling the
server would defeat the whole purpose.
|
3975.9 | | COMEUP::SIMMONDS | loose canon | Wed Jun 04 1997 04:46 | 14 |
| Re: .5
| ( Does a 'better' accuracy weigh better in the calculations ??? )
Certainly, provided the interval with low inaccuracy intersects with
intervals from a majority of other Servers.
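To make the "intersects with a majority" condition concrete, here is a generic Marzullo-style sketch that finds the region covered by the largest number of intervals. It is not the DTSS implementation; the function name and the sample intervals (offsets in seconds from true time) are made up for illustration.

```python
def best_intersection(intervals):
    """Return (count, low, high) for the region covered by the most intervals."""
    events = []
    for low, high in intervals:
        events.append((low, 0))   # 0 = start; sorts before an end at the same point
        events.append((high, 1))  # 1 = end
    events.sort()

    best = count = 0
    best_low = best_high = None
    for i, (point, kind) in enumerate(events):
        if kind == 0:
            count += 1
            if count > best:
                # The region of maximal overlap runs to the next event.
                best = count
                best_low = point
                best_high = events[i + 1][0]
        else:
            count -= 1
    return best, best_low, best_high


provider = (-0.2, 0.2)  # TP server back on true time, +/- 200 mS

# Three drifted servers (about +3 S off) announcing +/- 5 S:
# the narrow provider interval lies inside all of them and wins.
wide = [(-2.0, 8.0), (-1.5, 8.5), (-2.5, 7.5), provider]
print(best_intersection(wide))

# The same three servers announcing only about +/- 200 mS:
# the provider no longer intersects the gang of three and is left out.
narrow = [(2.8, 3.2), (2.9, 3.3), (2.7, 3.1), provider]
print(best_intersection(narrow))
```

In the first case the result is the provider's own narrow interval, so its time dominates; in the second the three drifted servers agree only with each other.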
Btw, a Server with a TP will not query other Servers' times when it
synchronizes.. its Inaccuracy is under your TP program's control.
You're lucky to receive a helping hand from Mike.. resolution
shouldn't be too far off once he gets your synch. trace...
John.
|
3975.10 | traces -> next week | UTOPIE::FRUEHWIRTH_M | | Wed Jun 04 1997 06:30 | 9 |
| > You're lucky to receive a helping hand from Mike.. resolution
> shouldn't be too far off once he gets your synch. trace...
i will supply you with the trace-files during the next week
(my customer is low on manpower due to holiday season).
best regards
martin
|