T.R | Title | User | Personal Name | Date | Lines |
---|
456.1 | | DUCATI::LASTOVICA | Is it possible to be totally partial? | Thu Mar 27 1997 18:22 | 4 |
| > time it crashed the cluster.
I'd suggest calling Digital for some VMS analysis of why the
cluster crashed. I'd hate to think that it was collect.
|
456.2 | Active vs Static monitoring? | M5::BLITTIN | | Fri Mar 28 1997 13:29 | 6 |
| re: .1 Ct will contact DEC.
In the meantime, Ct reran the monitor against a static file and
everything seemed to run ok. Since the problem occurred while the
collection was active, does the monitor have any problem identifying
the end of the active collection if and when it hits it?
|
456.3 | end of file information in a lock value block | OMYGOD::LAVASH | Same as it ever was... | Fri Mar 28 1997 14:37 | 29 |
| If you are monitoring a collection in progress, you should really be using
the /interval qualifier.
If not, you are looking at all kinds of bogus data.
We have a 32K default cache that gets flushed when full. If you don't use
a flush interval, you can get "old" data on the flush, which makes looking
at it in real time pointless.
The flush interval flushes data to disk at a regular interval, which
keeps it all consistent for the monitor.
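
As a rough illustration of the difference between "flush when full" and
"flush on an interval", here is a minimal Python sketch. It is not the
collector's code: the class name, method names, and the injectable clock are
hypothetical, and only the 32K default size and the idea of a timed flush come
from the description above.

```python
import time


class FlushingCache:
    """Hypothetical write cache: flushes when full, or when a flush interval expires."""

    def __init__(self, sink, capacity=32 * 1024, flush_interval=None,
                 clock=time.monotonic):
        self.sink = sink                      # callable that receives flushed bytes
        self.capacity = capacity              # 32K default, as in the note
        self.flush_interval = flush_interval  # seconds; None = flush only when full
        self.clock = clock                    # injectable so the timing can be tested
        self.buffer = bytearray()
        self.last_flush = clock()

    def write(self, record: bytes):
        # Flush first if this record would overflow the cache.
        if self.buffer and len(self.buffer) + len(record) > self.capacity:
            self.flush()
        self.buffer += record
        self.tick()

    def tick(self):
        # Timed flush: only happens when a flush interval was requested.
        if (self.flush_interval is not None and self.buffer
                and self.clock() - self.last_flush >= self.flush_interval):
            self.flush()

    def flush(self):
        if self.buffer:
            self.sink(bytes(self.buffer))
            self.buffer.clear()
        self.last_flush = self.clock()
```

With `flush_interval=None`, a record can sit in the cache for as long as it
takes to fill it, which is the "old" data a real-time reader would see. With
an interval set, `tick()` bounds how stale the data on disk can get.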
For static data we can pre-sort the records in the file and pick them off
as needed.
Monitor is actually two processes. One, the data channel, tries to stay at the
end of the .dat file, reading records in as fast as possible and updating
global sections that the monitor process reads from.
If the data channel hits end of file, it checks the lock and lock value
block for the file to see if any new data has come in. Actually, it may issue
a blocking AST to be automatically notified when the file contents have
changed. I can't remember exactly; it's been about 5 years...
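
The "stay at the end of the file and check for new data" idea can be sketched
roughly as below. This is a Python illustration under loud assumptions, not
the data channel's implementation: the function name is hypothetical, records
are assumed to be newline-delimited, and a plain file-size comparison stands in
for the lock value block check (or blocking AST) mentioned above.

```python
import os


def read_new_records(path, offset):
    """Return (records, new_offset) for complete records written past offset."""
    # Stand-in for the lock value block check: has the file grown past our position?
    if os.path.getsize(path) <= offset:
        return [], offset              # at end of file, nothing new yet
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only whole records; a partial trailing record waits for the next pass.
    end = data.rfind(b"\n") + 1
    records = data[:end].splitlines()
    return records, offset + end
```

A reader would call this in a loop, carrying `new_offset` forward each time.
Leaving a half-written trailing record for the next pass is one way to avoid
handing incomplete data to whatever consumes the records.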
Anyway, they should use interval if they are doing on-line monitoring.
If that makes their problem go away then I'd say ignore the other problem.
George
|
456.4 | /flush=00:00:02 | M5::BLITTIN | | Fri Mar 28 1997 14:57 | 6 |
|
They are using /flush set to 00:00:02.
I'm having him contact DEC to evaluate the crash dump...
Thank you for the reply...
|
456.5 | couple things to try | OMYGOD::LAVASH | Same as it ever was... | Fri Mar 28 1997 16:38 | 10 |
| Then again, if it's a heavily loaded system and they are using the 2-second
interval, perhaps all the concentrated writing is causing the problems...
Have them change the flush interval to 5 seconds, and bump the monitoring
interval to 5 or 10 seconds...
See if that helps. Or they may need to tune some process/system
parameters...
George
|